OpenAI Transcription & Whisper API Pricing Calculator
Calculate OpenAI transcription costs per minute, per hour, and per month. Compare Whisper, GPT-4o Transcribe with diarization, and Mini models.
Pricing TLDR
- $5 free credits (no credit card) = 833-1,667 minutes
- Whisper: $0.006/min ($0.36/hour)
- GPT-4o Mini: $0.003/min ($0.18/hour)
- Speaker diarization available
- 99+ languages
- 25MB file limit
Official pricing: OpenAI
OpenAI Transcription Cost Calculator - Monthly Pricing
Whisper (whisper)
Per Minute: $0.006
Per Hour: $0.36
GPT-4o Transcribe (gpt-4o-transcribe)
Per Minute: $0.006
Per Hour: $0.36
GPT-4o Mini Transcribe (gpt-4o-mini-transcribe)
Per Minute: $0.003
Per Hour: $0.18
Monthly Cost (all models): per-minute rate × minutes of audio per month
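The calculator's arithmetic can be sketched in a few lines of Python. This is a minimal sketch rather than the page's actual implementation: the function name monthly_costs is illustrative, and the rates are the per-minute figures quoted on this page, which should be checked against OpenAI's official pricing.

```python
from decimal import Decimal

# Per-minute rates in USD as quoted on this page (assumption: still current).
RATES_PER_MINUTE = {
    "whisper": Decimal("0.006"),
    "gpt-4o-transcribe": Decimal("0.006"),
    "gpt-4o-mini-transcribe": Decimal("0.003"),
}

def monthly_costs(minutes_per_month):
    """Return {model: (per_minute, per_hour, monthly)} in USD."""
    minutes = Decimal(str(minutes_per_month))
    return {
        model: (rate, rate * 60, rate * minutes)
        for model, rate in RATES_PER_MINUTE.items()
    }

# Example: 2,000 minutes of audio per month.
for model, (per_min, per_hour, monthly) in monthly_costs(2000).items():
    print(f"{model}: ${per_min}/min, ${per_hour:.2f}/hour, ${monthly:.2f}/month")
```

Decimal is used instead of float so that small per-minute rates multiply without binary rounding noise.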
About OpenAI Transcription API
What is OpenAI Transcription API?
OpenAI offers several transcription models through its audio API. The original Whisper model provides high-accuracy transcription at $0.006/minute, while the newer GPT-4o-based models (Transcribe, Transcribe with Diarization, Mini Transcribe) add improved accuracy, speaker identification, and more flexible pricing. All models support 99+ languages and common audio formats, and deliver near real-time results for uploaded files.
- Multiple Model Options: Whisper (legacy, $0.006/min), GPT-4o Transcribe ($0.006/min with advanced accuracy), GPT-4o Transcribe with Diarization ($0.006/min with speaker identification), GPT-4o Mini Transcribe ($0.003/min for cost-sensitive applications). All models support the same audio formats and languages.
- Speaker Diarization: The GPT-4o Transcribe with Diarization model identifies and labels different speakers in the audio. It suits interviews, meetings, podcasts, and customer calls. It uses the same per-minute pricing as standard transcription, with no additional diarization fees.
- Flexible Pricing Models: Choose between per-minute pricing (simple, predictable) or token-based pricing (GPT-4o models only). Token pricing: text input ($1.25-$2.50 per 1M tokens), audio input ($3.00-$6.00 per 1M tokens), text output ($5.00-$10.00 per 1M tokens). Per-minute pricing is typically more cost-effective for most use cases.
- Timestamp Support (Whisper Only): Only the legacy Whisper model supports word-level and segment-level timestamps via the timestamp_granularities[] parameter. GPT-4o models (Transcribe, Transcribe Diarize, Mini) do not support granular timestamps. If you need precise word-level timing for video editing or subtitle generation, use Whisper with verbose_json format.
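For subtitle work, segment timestamps from a verbose_json response can be turned into SRT text. The sketch below assumes each segment is a dict with start and end in seconds plus a text field; the helper names srt_timestamp and segments_to_srt are illustrative and not part of any OpenAI SDK.

```python
def srt_timestamp(seconds):
    """Format a time in seconds as an SRT timestamp: HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"

def segments_to_srt(segments):
    """Build SRT text from dicts with 'start', 'end' (seconds) and 'text'."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n"
            f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}\n"
            f"{seg['text'].strip()}\n"
        )
    # A blank line separates consecutive subtitle blocks.
    return "\n".join(blocks)

# Hand-written sample data standing in for real API output.
example = [
    {"start": 0.0, "end": 2.5, "text": " Hello and welcome."},
    {"start": 2.5, "end": 61.04, "text": " Let's get started."},
]
print(segments_to_srt(example))
```

If subtitles are the only goal, requesting an SRT or VTT response from Whisper directly may be simpler; a converter like this is mainly useful when the JSON is also needed for other processing.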
When to Use OpenAI Transcription API
Start with GPT-4o Mini Transcribe for high-volume, cost-sensitive applications. Upgrade to GPT-4o Transcribe for improved accuracy on difficult audio. Use GPT-4o Transcribe with Diarization when speaker identification is required. The legacy Whisper model remains available for backward compatibility and for word-level timestamps.
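The selection guidance above can be written as a small decision function. This is only a sketch of the rules as stated on this page: the function name choose_model and its flags are hypothetical, and it returns the model names used here rather than verified API identifiers.

```python
def choose_model(needs_word_timestamps=False, needs_speakers=False,
                 difficult_audio=False):
    """Pick a model name following this page's guidance."""
    if needs_word_timestamps and needs_speakers:
        # Per this page, no single model offers both features.
        raise ValueError("word-level timestamps and diarization need separate passes")
    if needs_word_timestamps:
        return "Whisper"
    if needs_speakers:
        return "GPT-4o Transcribe with Diarization"
    if difficult_audio:
        return "GPT-4o Transcribe"
    # Default: the cheapest option for high-volume work.
    return "GPT-4o Mini Transcribe"

print(choose_model())                      # default, cost-sensitive case
print(choose_model(needs_speakers=True))   # meetings, interviews
```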
Ideal for
- Podcast and video transcription with GPT-4o Mini (lowest cost)
- Meeting transcription with speaker identification using Diarization
- Customer support call analysis and quality monitoring
- Interview transcription for research and journalism
- Video editing and subtitle generation with word-level timestamps (Whisper only)
Not ideal for
- Real-time streaming transcription (use GPT-4o Realtime API instead)
- Files over 25MB (require splitting or compression before upload)
- Background noise removal (pre-process audio first)
- Music or singing transcription (optimized for speech)
OpenAI Transcription API Pricing Breakdown
Free Tier
New users receive $5 in free credits with no credit card required. These credits expire after 3 months and work across all transcription models.
- Sign up at platform.openai.com - no credit card required
- Receive $5 free credits instantly upon registration
- Covers 833 minutes (13.9 hours) with Whisper/GPT-4o Transcribe
- Covers 1,667 minutes (27.8 hours) with GPT-4o Mini Transcribe
- Credits expire after 3 months from grant date
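The coverage figures in this list follow from dividing the credit by the per-minute rate. A minimal sketch, assuming the $5 credit and the rates quoted on this page (the function name credit_coverage is illustrative):

```python
from decimal import Decimal, ROUND_HALF_UP

def credit_coverage(credit_usd, rate_per_minute):
    """Return (minutes, hours) of audio a credit covers, rounded half-up."""
    minutes = Decimal(credit_usd) / Decimal(rate_per_minute)
    whole_minutes = minutes.quantize(Decimal("1"), rounding=ROUND_HALF_UP)
    hours = (minutes / 60).quantize(Decimal("0.1"), rounding=ROUND_HALF_UP)
    return int(whole_minutes), float(hours)

print(credit_coverage("5", "0.006"))  # Whisper / GPT-4o Transcribe
print(credit_coverage("5", "0.003"))  # GPT-4o Mini Transcribe
```

Note that 1,667 minutes is a rounded figure: $5 at $0.003/minute buys 1,666.67 minutes, so the last fraction of a minute would not be fully covered.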
Pricing Models Explained
Per-Minute Pricing (Recommended)
Simple per-minute billing: Whisper and GPT-4o Transcribe cost $0.006/minute ($0.36/hour), and GPT-4o Mini costs $0.003/minute ($0.18/hour). Billing is rounded to the nearest second, with no minimum charge. This is the most cost-effective option for standard transcription workflows.
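To see how second-level rounding affects a single file, the duration can be rounded to whole seconds before the rate is applied. This sketch assumes half-up rounding, which the page does not specify; the function name per_minute_cost is illustrative.

```python
from decimal import Decimal, ROUND_HALF_UP

def per_minute_cost(duration_seconds, rate_per_minute):
    """Cost in USD for one file, billing whole seconds at a per-minute rate."""
    billed_seconds = Decimal(str(duration_seconds)).quantize(
        Decimal("1"), rounding=ROUND_HALF_UP
    )
    # Multiply before dividing so common cases stay exact in Decimal.
    return billed_seconds * Decimal(rate_per_minute) / 60

print(per_minute_cost(90.4, "0.006"))   # a 90.4-second clip, billed as 90 seconds
print(per_minute_cost(3600, "0.003"))   # one hour on GPT-4o Mini Transcribe
```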
Token-Based Pricing (GPT-4o Models Only)
Alternative pricing for GPT-4o models: audio input tokens ($3-$6 per 1M), text input tokens ($1.25-$2.50 per 1M), text output tokens ($5-$10 per 1M). More complex but potentially cheaper for short transcripts. Use per-minute pricing unless you have specific token optimization needs.
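Token-based cost is a weighted sum of token counts. The page gives only price ranges, so the sketch below assumes the lower bound of each range applies to GPT-4o Mini Transcribe and the upper bound to GPT-4o Transcribe; that mapping, the function name token_cost, and the sample token counts are all assumptions to verify against official pricing.

```python
from decimal import Decimal

MILLION = Decimal(1_000_000)

# USD per 1M tokens. Assumed mapping of the page's ranges to models.
TOKEN_RATES = {
    "gpt-4o-mini-transcribe": {
        "audio_in": Decimal("3.00"),
        "text_in": Decimal("1.25"),
        "text_out": Decimal("5.00"),
    },
    "gpt-4o-transcribe": {
        "audio_in": Decimal("6.00"),
        "text_in": Decimal("2.50"),
        "text_out": Decimal("10.00"),
    },
}

def token_cost(model, audio_in=0, text_in=0, text_out=0):
    """Cost in USD for the given token counts under token-based pricing."""
    rates = TOKEN_RATES[model]
    total = (
        audio_in * rates["audio_in"]
        + text_in * rates["text_in"]
        + text_out * rates["text_out"]
    )
    return total / MILLION

# Hypothetical request: 100k audio tokens in, 20k text tokens out.
print(token_cost("gpt-4o-mini-transcribe", audio_in=100_000, text_out=20_000))
```

Comparing this figure with the per-minute cost for the same audio requires knowing how many tokens a minute of audio produces, which this page does not state.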
Speaker Diarization (No Extra Cost)
GPT-4o Transcribe with Diarization identifies speakers at the same $0.006/minute rate. No additional fees for speaker identification. Returns transcript with speaker labels (Speaker 0, Speaker 1, etc.). Perfect for multi-speaker content like meetings and interviews.
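Speaker-labeled output is usually easier to read once consecutive segments from the same speaker are merged into turns. The sketch below assumes each segment is a dict with speaker and text keys; that shape and the function name format_turns are assumptions for illustration, so adapt the field names to the actual response.

```python
def format_turns(segments):
    """Merge consecutive same-speaker segments into 'Speaker N: text' lines."""
    turns = []
    for seg in segments:
        text = seg["text"].strip()
        if turns and turns[-1][0] == seg["speaker"]:
            # Same speaker as the previous segment: extend the current turn.
            turns[-1][1].append(text)
        else:
            turns.append((seg["speaker"], [text]))
    return "\n".join(f"{speaker}: {' '.join(parts)}" for speaker, parts in turns)

# Hand-written sample data, not real API output.
sample = [
    {"speaker": "Speaker 0", "text": "Hi."},
    {"speaker": "Speaker 0", "text": "Thanks for joining."},
    {"speaker": "Speaker 1", "text": "Glad to be here."},
]
print(format_turns(sample))
```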
File Format & Limits
Supports mp3, mp4, mpeg, mpga, m4a, wav, webm formats. Maximum file size: 25MB per request. For larger files, split audio or use compression. No preprocessing required - API handles various audio qualities.
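Whether a recording fits under the limit depends on duration and bitrate. The rough estimate below assumes constant-bitrate audio, ignores container overhead, and treats 25MB as 25 × 1024 × 1024 bytes; the function names are illustrative and the exact cutoff may differ.

```python
import math

# Assumption: "25MB" means 25 MiB.
LIMIT_BYTES = 25 * 1024 * 1024

def max_seconds_per_file(bitrate_kbps, limit_bytes=LIMIT_BYTES):
    """Longest constant-bitrate audio, in seconds, that fits under the limit."""
    return limit_bytes * 8 / (bitrate_kbps * 1000)

def chunks_needed(duration_seconds, bitrate_kbps, limit_bytes=LIMIT_BYTES):
    """Minimum number of equal-bitrate chunks needed for a recording."""
    per_chunk = max_seconds_per_file(bitrate_kbps, limit_bytes)
    return max(1, math.ceil(duration_seconds / per_chunk))

print(max_seconds_per_file(64) / 60)   # minutes of 64 kbps audio per file
print(chunks_needed(2 * 3600, 64))     # chunks for a two-hour recording
```

In practice it is safer to leave some headroom below the computed maximum, since file headers and variable bitrates add to the size.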
Language Support
99+ Languages Supported
All models support transcription in 99+ languages including English, Spanish, French, German, Chinese, Japanese, Arabic, Hindi, Portuguese, Russian, Korean, and many more. Automatic language detection included at no extra cost.
Translation to English
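The /v1/audio/translations endpoint translates foreign-language audio directly to English text at the same per-minute rate, with no additional translation fee. Use it instead of /v1/audio/transcriptions. OpenAI's API reference has listed only the Whisper model for this endpoint, so confirm model support before relying on a GPT-4o model for translation.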
OpenAI Transcription Model Comparison
GPT-4o Mini Transcribe
Best For: High-volume, cost-sensitive applications
Per Minute: $0.003
Key Features: 50% cheaper than the other models, excellent accuracy, 99+ languages

GPT-4o Transcribe
Best For: Difficult audio, accents, background noise
Per Minute: $0.006
Key Features: Highest accuracy, token pricing option, advanced features

GPT-4o Transcribe Diarize
Best For: Meetings, interviews, multi-speaker content
Per Minute: $0.006
Key Features: Speaker identification included, same price as standard

Whisper (Legacy)
Best For: Word-level timestamps, video editing
Per Minute: $0.006
Key Features: Only model with word/segment timestamps, SRT/VTT support
Real-World Use Cases & Cost Examples
Podcast Transcription: $6/mo
- 2,000 min/month
- GPT-4o Mini
- Single speaker, high volume
Team Meeting Notes: $3/mo
- 500 min/month
- GPT-4o Diarize
- Multi-speaker identification
Customer Support Calls: $30/mo
- 5,000 min/month
- GPT-4o Diarize
- Agent + customer tracking
Research Interviews: $6/mo
- 1,000 min/month
- GPT-4o Transcribe
- High accuracy needed
YouTube Subtitles: $30/mo
- 10,000 min/month
- GPT-4o Mini
- Cost-effective at scale
7 OpenAI Transcription API Cost Optimization Tips
Use GPT-4o Mini for High-Volume Transcription
GPT-4o Mini Transcribe costs 50% less than Whisper and GPT-4o Transcribe ($0.003/min vs $0.006/min). For 10,000 minutes/month, this saves $30/month ($360/year). Accuracy is excellent for most use cases. Only upgrade to GPT-4o Transcribe for difficult audio with heavy accents or background noise.
Optimize Audio Quality Before Upload
Lower bitrate audio reduces file size without significantly affecting transcription accuracy. Convert stereo to mono if not using diarization. Use audio compression (mp3 at 64-128kbps is sufficient). Smaller files upload faster and reduce bandwidth costs for high-volume applications.
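One common way to apply these settings is ffmpeg. The sketch below only builds the argument list, so it runs without ffmpeg installed; the function name ffmpeg_compress_args and the file names are hypothetical, while -ac (channel count) and -b:a (audio bitrate) are standard ffmpeg options.

```python
import shlex

def ffmpeg_compress_args(src, dst, bitrate_kbps=64, mono=True):
    """Build an ffmpeg command that re-encodes audio at a lower bitrate."""
    args = ["ffmpeg", "-i", src]
    if mono:
        # Downmix to one channel; skip this if channel separation matters.
        args += ["-ac", "1"]
    args += ["-b:a", f"{bitrate_kbps}k", dst]
    return args

# Print the command for inspection instead of running it.
print(shlex.join(ffmpeg_compress_args("interview.wav", "interview.mp3")))
```

The list can be passed to subprocess.run(args, check=True) once ffmpeg is available on the machine.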
Only Use Diarization When Necessary
Speaker diarization has the same per-minute cost but may use more processing time. If you don't need speaker identification (single-speaker podcasts, voiceovers), use standard GPT-4o Transcribe or Mini to potentially save processing time and improve throughput.
Batch Process During Off-Peak Hours
OpenAI doesn't charge different rates by time of day, but batching large transcription jobs can help you manage infrastructure costs and rate limits. Queue high-volume work and process it in controlled batches instead of sending requests ad hoc, which reduces scheduling overhead and makes rate-limit handling easier.
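A batch runner can be as simple as slicing the queue into fixed-size groups. In this sketch the transcribe argument is a placeholder for whatever function actually calls the API; the names batched and process_in_batches and the batch size are illustrative.

```python
from itertools import islice

def batched(items, size):
    """Yield successive lists of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(items)
    while batch := list(islice(iterator, size)):
        yield batch

def process_in_batches(paths, transcribe, batch_size=10):
    """Run `transcribe` over `paths` one batch at a time; return {path: result}."""
    results = {}
    for batch in batched(paths, batch_size):
        for path in batch:
            results[path] = transcribe(path)
        # A real job might pause here or check rate-limit headroom
        # before starting the next batch.
    return results

# Stand-in transcriber so the example runs without network access.
fake_transcribe = lambda path: f"transcript of {path}"
print(process_in_batches(["a.mp3", "b.mp3", "c.mp3"], fake_transcribe, batch_size=2))
```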
Leverage $5 Free Credits for Testing
Use your free $5 credits (833-1,667 minutes) to test different models and find the right balance between cost and accuracy for your use case. Compare Whisper vs GPT-4o models vs Mini before committing to large-scale production.
Consider Self-Hosting for Extreme Volume
For 100,000+ minutes/month ($300-600/month on API), self-hosting open-source Whisper on your own GPU infrastructure may be cheaper. Requires technical expertise and GPU resources but eliminates per-minute costs. Use API for variable workloads, self-host for consistent high volume.
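A quick break-even check makes the comparison concrete: divide the fixed monthly cost of self-hosting by the API's per-minute rate. The $450/month figure below is a hypothetical placeholder, not a quoted GPU price, and the function name break_even_minutes is illustrative.

```python
from decimal import Decimal, ROUND_CEILING

def break_even_minutes(self_host_monthly_usd, api_rate_per_minute):
    """Monthly minutes at which API spend equals a fixed self-hosting cost."""
    minutes = Decimal(self_host_monthly_usd) / Decimal(api_rate_per_minute)
    return int(minutes.to_integral_value(rounding=ROUND_CEILING))

# Hypothetical all-in self-hosting cost of $450/month.
print(break_even_minutes("450", "0.006"))  # versus Whisper / GPT-4o Transcribe
print(break_even_minutes("450", "0.003"))  # versus GPT-4o Mini Transcribe
```

Below the break-even volume the API is cheaper. Above it, self-hosting may win, provided the fixed cost includes engineering time and idle capacity.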
Monitor Transcription Costs in Real-Time
Track OpenAI transcription spending with CostGoat's real-time credit monitoring. Get instant alerts when credit usage spikes, when switching from Mini to premium models, or when approaching your budget thresholds. Prevent runaway transcription costs before they impact your bottom line.
Track Your OpenAI API Costs in Real-Time
Monitor your OpenAI API usage and spending across all models - GPT, DALL-E, Whisper, and more. CostGoat runs on your desktop with privacy-first local monitoring. 7-day free trial, then $9/month.
OpenAI Transcription Pricing FAQ
Common questions about OpenAI transcription costs, models, features, and API pricing
