
CostGoat

LAST UPDATED: NOVEMBER 6, 2025

OpenAI Transcription & Whisper API Pricing Calculator

Calculate OpenAI transcription costs per minute, per hour, and per month. Compare Whisper, GPT-4o Transcribe with diarization, and Mini models.


Pricing TLDR

  • $5 free credits (no credit card) = 833-1,667 minutes
  • Whisper: $0.006/min ($0.36/hour) • GPT-4o Mini: $0.003/min ($0.18/hour)
  • Speaker diarization available • 99+ languages • 25MB file limit

Official pricing: OpenAI

OpenAI Transcription Cost Calculator - Monthly Pricing


Whisper (whisper-1)
Per minute: $0.0060 • Per hour: $0.36 • Monthly cost (2,000 min): $12.00

GPT-4o Transcribe (gpt-4o-transcribe)
Per minute: $0.0060 • Per hour: $0.36 • Monthly cost (2,000 min): $12.00

GPT-4o Mini Transcribe (gpt-4o-mini-transcribe)
Per minute: $0.0030 • Per hour: $0.18 • Monthly cost (2,000 min): $6.00
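The calculator's arithmetic is rate × minutes. Below is a minimal sketch in Python that assumes the per-minute rates listed above and the 2,000 minutes/month implied by the $12.00 and $6.00 monthly figures. `RATES_PER_MINUTE` and `cost_breakdown` are hypothetical names, not part of any OpenAI SDK. `Decimal` avoids floating-point drift on cent amounts.

```python
from decimal import Decimal

# Per-minute rates in USD as listed on this page (assumed current; verify on OpenAI's pricing page).
RATES_PER_MINUTE = {
    "whisper-1": Decimal("0.006"),
    "gpt-4o-transcribe": Decimal("0.006"),
    "gpt-4o-mini-transcribe": Decimal("0.003"),
}


def cost_breakdown(model: str, minutes_per_month: Decimal) -> dict[str, Decimal]:
    """Return per-minute, per-hour, and monthly cost in USD for one model."""
    rate = RATES_PER_MINUTE[model]
    return {
        "per_minute": rate,
        "per_hour": rate * 60,
        "monthly": rate * minutes_per_month,
    }


if __name__ == "__main__":
    minutes = Decimal(2000)  # about 33.3 hours of audio per month
    for model in RATES_PER_MINUTE:
        c = cost_breakdown(model, minutes)
        print(f"{model}: ${c['per_minute']}/min, ${c['per_hour']:.2f}/hour, ${c['monthly']:.2f}/month")
```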

About OpenAI Transcription API

What is OpenAI Transcription API?

OpenAI offers multiple transcription APIs powered by advanced speech recognition models. The original Whisper model provides high-accuracy transcription at $0.006/minute, while newer GPT-4o-based models (Transcribe, Transcribe with Diarization, Mini Transcribe) offer improved accuracy, speaker identification, and more flexible pricing. All models support 99+ languages, various audio formats, and deliver near real-time results.

  • Multiple Model Options: Whisper (legacy, $0.006/min), GPT-4o Transcribe ($0.006/min with advanced accuracy), GPT-4o Transcribe with Diarization ($0.006/min with speaker identification), GPT-4o Mini Transcribe ($0.003/min for cost-sensitive applications). All models support the same audio formats and languages.
  • Speaker Diarization: GPT-4o Transcribe Diarization model identifies and labels different speakers in the audio. Perfect for interviews, meetings, podcasts, and customer calls. Same per-minute pricing as standard transcription with no additional diarization fees.
  • Flexible Pricing Models: Choose between per-minute pricing (simple, predictable) or token-based pricing (GPT-4o models only). Token pricing: text input ($1.25-$2.50 per 1M tokens), audio input ($3.00-$6.00 per 1M tokens), text output ($5.00-$10.00 per 1M tokens). Per-minute pricing is typically the more cost-effective option.
  • Timestamp Support (Whisper Only): Only the legacy Whisper model supports word-level and segment-level timestamps via the timestamp_granularities[] parameter. GPT-4o models (Transcribe, Transcribe Diarize, Mini) do not support granular timestamps. If you need precise word-level timing for video editing or subtitle generation, use Whisper with verbose_json format.
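Here is a minimal sketch of a word-level timestamp request. It assumes the official `openai` Python package (v1 or later) and an `OPENAI_API_KEY` environment variable. `build_timestamp_request` and `transcribe_with_word_timestamps` are hypothetical helper names, and `interview.mp3` is a placeholder file. The request parameters live in a plain dict so they can be checked without a network call.

```python
import os


def build_timestamp_request(granularities=("word", "segment")) -> dict:
    """Parameters for a Whisper transcription that returns word/segment timestamps."""
    return {
        "model": "whisper-1",               # only Whisper supports timestamp_granularities
        "response_format": "verbose_json",  # timestamps are returned only in verbose_json
        "timestamp_granularities": list(granularities),
    }


def transcribe_with_word_timestamps(path: str):
    """Upload an audio file and return the verbose transcription object."""
    from openai import OpenAI  # lazy import so the helper above works without the SDK

    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    with open(path, "rb") as audio_file:
        return client.audio.transcriptions.create(file=audio_file, **build_timestamp_request())


if __name__ == "__main__" and os.environ.get("OPENAI_API_KEY"):
    result = transcribe_with_word_timestamps("interview.mp3")  # hypothetical file
    for word in result.words or []:
        print(f"{word.start:7.2f}s  {word.word}")
```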

When to Use OpenAI Transcription API

Start with GPT-4o Mini Transcribe for high-volume, cost-sensitive applications. Upgrade to GPT-4o Transcribe for improved accuracy on difficult audio. Use GPT-4o Transcribe with Diarization when speaker identification is required. Legacy Whisper model remains available for backward compatibility.
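The selection guidance above can be written as a small decision function. `pick_model` is a hypothetical helper, not an OpenAI API, and the diarization model ID `gpt-4o-transcribe-diarize` is an assumption based on the model name used on this page. The checks follow the page's advice in order:

- Timestamps require Whisper.
- Speaker labels require the diarization model.
- Difficult audio justifies GPT-4o Transcribe.
- Everything else defaults to Mini.

```python
def pick_model(needs_word_timestamps: bool = False,
               needs_speaker_labels: bool = False,
               difficult_audio: bool = False) -> str:
    """Choose a transcription model following this page's guidance."""
    if needs_word_timestamps and needs_speaker_labels:
        # Per this page, no single model offers both; split the job or post-process.
        raise ValueError("no single model provides both word timestamps and diarization")
    if needs_word_timestamps:
        return "whisper-1"                  # only model with word/segment timestamps
    if needs_speaker_labels:
        return "gpt-4o-transcribe-diarize"  # assumed model ID for the diarization model
    if difficult_audio:
        return "gpt-4o-transcribe"          # accents, background noise
    return "gpt-4o-mini-transcribe"         # cheapest default for high volume


print(pick_model())                           # gpt-4o-mini-transcribe
print(pick_model(needs_speaker_labels=True))  # gpt-4o-transcribe-diarize
```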

Ideal for

  • Podcast and video transcription with GPT-4o Mini (lowest cost)
  • Meeting transcription with speaker identification using Diarization
  • Customer support call analysis and quality monitoring
  • Interview transcription for research and journalism
  • Video editing and subtitle generation with word-level timestamps (Whisper only)

Not ideal for

  • Real-time streaming transcription (use GPT-4o Realtime API instead)
  • Files over 25MB (must be split or compressed before upload)
  • Background noise removal (pre-process audio first)
  • Music or singing transcription (optimized for speech)

OpenAI Transcription API Pricing Breakdown

Free Tier

New users receive $5 in free credits with no credit card required. These credits expire after 3 months and work across all transcription models.

  • Sign up at platform.openai.com - no credit card required
  • Receive $5 free credits instantly upon registration
  • Covers 833 minutes (13.9 hours) with Whisper/GPT-4o Transcribe
  • Covers 1,667 minutes (27.8 hours) with GPT-4o Mini Transcribe
  • Credits expire after 3 months from grant date
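The minute figures above come from dividing the credit by the per-minute rate. This short sketch reproduces them, assuming the $5 credit stated on this page. `minutes_covered` is a hypothetical helper, and results are rounded to the nearest minute like the figures above.

```python
from decimal import Decimal, ROUND_HALF_UP

FREE_CREDIT_USD = Decimal("5")  # free credit amount as stated on this page


def minutes_covered(rate_per_minute: Decimal, credit: Decimal = FREE_CREDIT_USD) -> Decimal:
    """Minutes of audio a credit balance covers, rounded to the nearest whole minute."""
    return (credit / rate_per_minute).quantize(Decimal("1"), rounding=ROUND_HALF_UP)


for label, rate in [("Whisper / GPT-4o Transcribe", Decimal("0.006")),
                    ("GPT-4o Mini Transcribe", Decimal("0.003"))]:
    minutes = minutes_covered(rate)
    hours = (minutes / 60).quantize(Decimal("0.1"), rounding=ROUND_HALF_UP)
    print(f"{label}: {minutes} minutes ({hours} hours)")
# Whisper / GPT-4o Transcribe: 833 minutes (13.9 hours)
# GPT-4o Mini Transcribe: 1667 minutes (27.8 hours)
```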

Pricing Models Explained

Per-Minute Pricing (Recommended)

Simple per-minute billing: Whisper and GPT-4o Transcribe at $0.006/minute ($0.36/hour), GPT-4o Mini at $0.003/minute ($0.18/hour). Usage is rounded to the nearest second. Most cost-effective for standard transcription workflows. No minimum charges.

Token-Based Pricing (GPT-4o Models Only)

Alternative pricing for GPT-4o models: audio input tokens ($3-$6 per 1M), text input tokens ($1.25-$2.50 per 1M), text output tokens ($5-$10 per 1M). More complex but potentially cheaper for short transcripts. Use per-minute pricing unless you have specific token optimization needs.
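Token-based billing multiplies each token count by its per-million rate and sums the results. This is a minimal sketch with two assumptions:

- The page gives only ranges, so the low end of each range is taken to apply to GPT-4o Mini Transcribe and the high end to GPT-4o Transcribe.
- The token counts in the example are made-up inputs, because this page does not state how many audio tokens a minute of audio produces.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TokenRates:
    """USD per 1M tokens."""
    audio_input: float
    text_input: float   # e.g. an optional prompt sent with the audio
    text_output: float  # the transcript itself


# Assumed mapping of this page's ranges to models (low end = Mini, high end = GPT-4o Transcribe).
RATES = {
    "gpt-4o-mini-transcribe": TokenRates(audio_input=3.00, text_input=1.25, text_output=5.00),
    "gpt-4o-transcribe": TokenRates(audio_input=6.00, text_input=2.50, text_output=10.00),
}


def token_cost(model: str, audio_in: int, text_in: int = 0, text_out: int = 0) -> float:
    """Cost in USD for one request, given its token counts."""
    r = RATES[model]
    return (audio_in * r.audio_input + text_in * r.text_input + text_out * r.text_output) / 1_000_000


# Hypothetical request: 1,000 audio tokens in, 200 text tokens out.
print(f"${token_cost('gpt-4o-transcribe', audio_in=1_000, text_out=200):.4f}")  # $0.0080
```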

Speaker Diarization (No Extra Cost)

GPT-4o Transcribe with Diarization identifies speakers at the same $0.006/minute rate. No additional fees for speaker identification. Returns transcript with speaker labels (Speaker 0, Speaker 1, etc.). Perfect for multi-speaker content like meetings and interviews.
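A common next step with a diarized transcript is measuring how long each speaker talked, for example agent versus customer on a support call. This minimal sketch assumes the transcript has already been parsed into segments, each with a speaker label and start/end times in seconds. The `Segment` type and its field names are hypothetical and do not mirror the exact API response shape.

```python
from collections import defaultdict
from typing import NamedTuple


class Segment(NamedTuple):
    speaker: str  # e.g. "Speaker 0"
    start: float  # seconds
    end: float    # seconds
    text: str


def talk_time(segments: list[Segment]) -> dict[str, float]:
    """Total speaking time per speaker, in seconds."""
    totals: dict[str, float] = defaultdict(float)
    for seg in segments:
        totals[seg.speaker] += seg.end - seg.start
    return dict(totals)


segments = [
    Segment("Speaker 0", 0.0, 4.5, "Thanks for calling, how can I help?"),
    Segment("Speaker 1", 4.5, 12.0, "My invoice shows a charge I don't recognize."),
    Segment("Speaker 0", 12.0, 15.0, "Let me look that up."),
]
print(talk_time(segments))  # {'Speaker 0': 7.5, 'Speaker 1': 7.5}
```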

File Format & Limits

Supports mp3, mp4, mpeg, mpga, m4a, wav, and webm formats. Maximum file size: 25MB per request. For larger files, split the audio or compress it. No preprocessing is required; the API handles a range of audio quality levels.
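Before uploading, a client can check the extension and size against the limits above and estimate how many pieces an oversized file needs. This is a minimal sketch with a few caveats:

- `check_upload` is a hypothetical helper.
- It treats 25MB as 25 × 1024 × 1024 bytes, which is an assumption; leave some headroom in practice.
- It only counts chunks. Actually cutting the audio requires a tool such as ffmpeg.

```python
import math
from pathlib import Path

SUPPORTED_EXTENSIONS = {".mp3", ".mp4", ".mpeg", ".mpga", ".m4a", ".wav", ".webm"}
MAX_BYTES = 25 * 1024 * 1024  # assumed interpretation of the 25MB limit


def check_upload(filename: str, size_bytes: int) -> int:
    """Return how many chunks the file must be split into (1 if it fits as-is)."""
    ext = Path(filename).suffix.lower()
    if ext not in SUPPORTED_EXTENSIONS:
        raise ValueError(f"unsupported format: {ext or '(none)'}")
    if size_bytes <= 0:
        raise ValueError("file is empty")
    return math.ceil(size_bytes / MAX_BYTES)


print(check_upload("meeting.m4a", 60 * 1024 * 1024))  # 3 (60MB needs three pieces)
```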

Language Support

99+ Languages Supported

All models support transcription in 99+ languages including English, Spanish, French, German, Chinese, Japanese, Arabic, Hindi, Portuguese, Russian, Korean, and many more. Automatic language detection included at no extra cost.

Translation to English

Whisper (whisper-1) can translate foreign-language audio directly into English text at the same per-minute rate. Use the /v1/audio/translations endpoint instead of /v1/audio/transcriptions; that endpoint currently accepts only the whisper-1 model. No additional translation fees.
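Here is a minimal translation sketch. It assumes the official `openai` Python package (v1 or later) and an `OPENAI_API_KEY` environment variable. `translation_params` and `translate_to_english` are hypothetical helper names, and the file name is a placeholder. The translations endpoint accepts only `whisper-1`, and its output is always English.

```python
import os


def translation_params() -> dict:
    """Parameters for /v1/audio/translations."""
    return {
        "model": "whisper-1",  # the only model the translations endpoint accepts
        "temperature": 0,      # lower temperature favors more deterministic output
    }


def translate_to_english(path: str) -> str:
    """Upload non-English audio and return its English translation as text."""
    from openai import OpenAI  # lazy import so the helper above works without the SDK

    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    with open(path, "rb") as audio_file:
        result = client.audio.translations.create(file=audio_file, **translation_params())
    return result.text  # default JSON response exposes the text field


if __name__ == "__main__" and os.environ.get("OPENAI_API_KEY"):
    print(translate_to_english("entrevista.mp3"))  # hypothetical Spanish-language file
```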

OpenAI Transcription Model Comparison

GPT-4o Mini Transcribe
• Best for: High-volume, cost-sensitive applications
• Per minute: $0.003
• Key features: 50% cheaper than GPT-4o Transcribe and Whisper, excellent accuracy, 99+ languages

GPT-4o Transcribe
• Best for: Difficult audio, accents, background noise
• Per minute: $0.006
• Key features: Highest accuracy, token pricing option, advanced features

GPT-4o Transcribe Diarize
• Best for: Meetings, interviews, multi-speaker content
• Per minute: $0.006
• Key features: Speaker identification included at the same price as standard transcription

Whisper (Legacy)
• Best for: Word-level timestamps, video editing
• Per minute: $0.006
• Key features: Only model with word/segment timestamps and SRT/VTT output

Real-World Use Cases & Cost Examples

Podcast Transcription: $6/mo
• 2,000 min/month
• GPT-4o Mini
• Single speaker, high volume

Team Meeting Notes: $3/mo
• 500 min/month
• GPT-4o Transcribe Diarize
• Multi-speaker identification

Customer Support Calls: $30/mo
• 5,000 min/month
• GPT-4o Transcribe Diarize
• Agent + customer tracking

Research Interviews: $6/mo
• 1,000 min/month
• GPT-4o Transcribe
• High accuracy needed

YouTube Subtitles: $30/mo
• 10,000 min/month
• GPT-4o Mini
• Cost-effective at scale (plain-text transcripts only; timed SRT/VTT subtitles need Whisper, which would cost $60/mo)
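Each monthly figure above is minutes multiplied by the per-minute rate. This short sketch reproduces them using the rates listed earlier on this page. The dictionary layout and `monthly_costs` name are illustrative only.

```python
from decimal import Decimal

# USD per minute, from the pricing on this page.
RATE_PER_MINUTE = {
    "GPT-4o Mini": Decimal("0.003"),
    "GPT-4o Transcribe": Decimal("0.006"),
    "GPT-4o Transcribe Diarize": Decimal("0.006"),
}

# (use case, minutes per month, model) as listed above.
USE_CASES = [
    ("Podcast Transcription", 2_000, "GPT-4o Mini"),
    ("Team Meeting Notes", 500, "GPT-4o Transcribe Diarize"),
    ("Customer Support Calls", 5_000, "GPT-4o Transcribe Diarize"),
    ("Research Interviews", 1_000, "GPT-4o Transcribe"),
    ("YouTube Subtitles", 10_000, "GPT-4o Mini"),
]


def monthly_costs() -> dict[str, Decimal]:
    """Monthly cost in USD for each use case."""
    return {name: RATE_PER_MINUTE[model] * minutes for name, minutes, model in USE_CASES}


for name, cost in monthly_costs().items():
    print(f"{name:<24} ${cost:.2f}/mo")
```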

7 OpenAI Transcription API Cost Optimization Tips

1. Use GPT-4o Mini for High-Volume Transcription

GPT-4o Mini Transcribe costs 50% less than Whisper and GPT-4o Transcribe ($0.003/min vs $0.006/min). For 10,000 minutes/month, this saves $30/month ($360/year). Accuracy is excellent for most use cases. Only upgrade to GPT-4o Transcribe for difficult audio with heavy accents or background noise.

2. Optimize Audio Quality Before Upload

Lower bitrate audio reduces file size without significantly affecting transcription accuracy. Convert stereo to mono if not using diarization. Use audio compression (mp3 at 64-128kbps is sufficient). Smaller files upload faster and reduce bandwidth costs for high-volume applications.
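One way to apply this tip is to re-encode with ffmpeg before uploading. This minimal sketch builds the command and can optionally run it from Python. It assumes the `ffmpeg` binary is installed, on `PATH`, and built with an MP3 encoder. `ffmpeg_command` and `compress_for_upload` are hypothetical helper names. The flags used:

- `-ac 1` downmixes to mono.
- `-b:a 64k` sets a 64 kbps audio bitrate, the low end of the range suggested above.
- `-vn` drops any video stream from mp4 inputs.

```python
import shutil
import subprocess


def ffmpeg_command(src: str, dst: str, bitrate_kbps: int = 64, mono: bool = True) -> list[str]:
    """Build an ffmpeg command that re-encodes audio to a compact file (format from dst's extension)."""
    cmd = ["ffmpeg", "-y", "-i", src, "-vn"]  # -y overwrites dst; -vn drops video
    if mono:
        cmd += ["-ac", "1"]                   # downmix to one channel
    cmd += ["-b:a", f"{bitrate_kbps}k", dst]
    return cmd


def compress_for_upload(src: str, dst: str, **kwargs) -> None:
    """Run the ffmpeg command, failing loudly if ffmpeg is missing or errors."""
    if shutil.which("ffmpeg") is None:
        raise RuntimeError("ffmpeg not found on PATH")
    subprocess.run(ffmpeg_command(src, dst, **kwargs), check=True)


print(" ".join(ffmpeg_command("meeting.wav", "meeting.mp3")))
# ffmpeg -y -i meeting.wav -vn -ac 1 -b:a 64k meeting.mp3
```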

3. Only Use Diarization When Necessary

Speaker diarization has the same per-minute cost but may use more processing time. If you don't need speaker identification (single-speaker podcasts, voiceovers), use standard GPT-4o Transcribe or Mini to potentially save processing time and improve throughput.

4. Batch Process During Off-Peak Hours

While OpenAI doesn't charge different rates by time of day, batching large transcription jobs can help you manage infrastructure costs and rate limits. Each file is still its own API request, so queue high-volume jobs and process them in controlled batches rather than firing off ad hoc calls; this reduces overhead and keeps throughput predictable.
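Here is a minimal sketch of batch processing with a cap on concurrent requests, which helps stay under rate limits. `transcribe_one` is a hypothetical callable standing in for whatever function sends a single file to the API. The default of 4 workers is an arbitrary assumption; tune it against your account's limits.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable


def transcribe_batch(paths: Iterable[str],
                     transcribe_one: Callable[[str], str],
                     max_workers: int = 4) -> dict[str, str]:
    """Transcribe many files with at most `max_workers` requests in flight."""
    path_list = list(paths)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() yields results in input order, so they line up with path_list.
        results = pool.map(transcribe_one, path_list)
        return dict(zip(path_list, results))


def fake_transcribe(path: str) -> str:
    """Stand-in for a real API call, for local testing."""
    return f"transcript of {path}"


print(transcribe_batch(["a.mp3", "b.mp3"], fake_transcribe))
# {'a.mp3': 'transcript of a.mp3', 'b.mp3': 'transcript of b.mp3'}
```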

5. Leverage $5 Free Credits for Testing

Use your free $5 credits (833-1,667 minutes) to test different models and find the right balance between cost and accuracy for your use case. Compare Whisper vs GPT-4o models vs Mini before committing to large-scale production.

6. Consider Self-Hosting for Extreme Volume

For 100,000+ minutes/month ($300-600/month on API), self-hosting open-source Whisper on your own GPU infrastructure may be cheaper. Requires technical expertise and GPU resources but eliminates per-minute costs. Use API for variable workloads, self-host for consistent high volume.
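Whether self-hosting pays off comes down to comparing the per-minute API bill with a fixed monthly infrastructure cost. This is a minimal sketch. The $400/month GPU figure in the example is a made-up placeholder, not a quote. The comparison also ignores the engineering effort and GPU utilization noted above, so treat the result as a lower bound on the volume needed.

```python
from decimal import Decimal


def break_even_minutes(fixed_monthly_usd: Decimal, api_rate_per_minute: Decimal) -> Decimal:
    """Monthly audio volume above which a fixed-cost setup beats per-minute API pricing."""
    return fixed_monthly_usd / api_rate_per_minute


gpu_cost = Decimal("400")  # hypothetical all-in monthly cost of a self-hosted GPU server
for label, rate in [("GPT-4o Mini", Decimal("0.003")), ("Whisper API", Decimal("0.006"))]:
    print(f"{label}: break-even at {break_even_minutes(gpu_cost, rate):,.0f} min/month")
```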

7. Monitor Transcription Costs in Real-Time

Track OpenAI transcription spending with CostGoat's real-time credit monitoring. Get instant alerts when credit usage spikes, when switching from Mini to premium models, or when approaching your budget thresholds. Prevent runaway transcription costs before they impact your bottom line.

Track Your OpenAI API Costs in Real-Time

Monitor your OpenAI API usage and spending across all models - GPT, DALL-E, Whisper, and more. CostGoat runs on your desktop with privacy-first local monitoring. 7-day free trial, then $9/month.




© 2025 CostGoat. All rights reserved.