How much does OpenAI Whisper API cost?

OpenAI Whisper API costs $0.006 per minute ($0.36 per hour). OpenAI offers multiple transcription models: legacy Whisper and GPT-4o Transcribe both cost $0.006/minute, GPT-4o Transcribe with Diarization costs $0.006/minute (same price with speaker identification included), and GPT-4o Mini Transcribe costs $0.003/minute ($0.18/hour) for cost-sensitive applications. All models use straightforward per-minute billing with no minimum charges or subscription fees. Example: 1,000 minutes costs $6.00 with Whisper/GPT-4o or $3.00 with GPT-4o Mini.

Is OpenAI Whisper transcription free?

OpenAI Whisper transcription is not free, but new accounts receive $5 in free credits with no credit card required. These credits cover approximately 833 minutes (13.9 hours) with Whisper or GPT-4o Transcribe models, or 1,667 minutes (27.8 hours) with GPT-4o Mini Transcribe. Credits expire after 3 months. After using free credits, you pay standard per-minute rates starting at $0.003/minute. For unlimited free use, you can self-host the open-source Whisper model on your own GPU infrastructure, though this requires technical expertise.
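
The coverage figures above come from dividing the credit balance by the per-minute rate. Here is a minimal sketch of that arithmetic, assuming the rates quoted on this page. The helper name `minutes_covered` and the `RATES` table are illustrative and not part of any OpenAI SDK.

```python
from decimal import Decimal, ROUND_HALF_UP

# Per-minute rates in USD as quoted on this page; verify against official pricing.
RATES = {
    "whisper": Decimal("0.006"),
    "gpt-4o-transcribe": Decimal("0.006"),
    "gpt-4o-mini-transcribe": Decimal("0.003"),
}

def minutes_covered(credit_usd: str, model: str) -> int:
    """Audio minutes a credit balance buys, rounded to the nearest minute."""
    minutes = Decimal(credit_usd) / RATES[model]
    return int(minutes.quantize(Decimal("1"), rounding=ROUND_HALF_UP))

for model in RATES:
    mins = minutes_covered("5", model)
    print(f"{model}: {mins} min ({mins / 60:.1f} h)")
```

Decimal arithmetic is used so that the rounding matches the figures quoted above without floating-point noise.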

What is OpenAI Whisper API?

OpenAI Whisper API is a speech-to-text transcription service that converts audio files into text. It supports 99+ languages, handles common audio formats (mp3, mp4, mpeg, mpga, m4a, wav, webm), and processes files up to 25MB. OpenAI now offers multiple transcription models: the legacy Whisper model, and newer GPT-4o-based models (Transcribe, Transcribe with Diarization, Mini Transcribe) that offer improved accuracy and speaker identification. All models are accessible via a simple REST API with Python and Node.js SDKs and use per-minute pricing.

Can OpenAI transcribe video files?

Yes, OpenAI can transcribe video files directly without requiring audio extraction. All OpenAI transcription models (Whisper, GPT-4o Transcribe, GPT-4o Mini) accept video containers such as mp4, mpeg, and webm, alongside audio formats such as mp3, mpga, m4a, and wav. Maximum file size is 25MB per request. Upload the video file to the /v1/audio/transcriptions endpoint and OpenAI automatically extracts and transcribes the audio track. For videos larger than 25MB, you'll need to compress the video or split it into smaller segments before uploading.
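
Before uploading, it can help to check a file against the format list and size limit described above. The following is a rough sketch, not an official client. `check_upload` is a hypothetical helper, and reading "25MB" as 25 MiB is an assumption.

```python
# Limit and extensions as listed on this page; confirm against the official API docs.
MAX_UPLOAD_BYTES = 25 * 1024 * 1024  # assumption: "25MB" read as 25 MiB
SUPPORTED_EXTENSIONS = {"mp3", "mp4", "mpeg", "mpga", "m4a", "wav", "webm"}

def check_upload(filename: str, size_bytes: int) -> list[str]:
    """Return the problems that would block an upload; an empty list means OK."""
    problems = []
    # Take the text after the last dot as the extension, case-insensitively.
    ext = filename.rsplit(".", 1)[-1].lower() if "." in filename else ""
    if ext not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported extension: {ext or '(none)'}")
    if size_bytes > MAX_UPLOAD_BYTES:
        problems.append(f"{size_bytes} bytes exceeds the {MAX_UPLOAD_BYTES}-byte limit")
    return problems
```

In practice the size would come from something like `os.path.getsize(path)`, and a non-empty result would mean the file needs compressing or splitting first.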

Does Whisper API support speaker diarization?

The legacy Whisper model does NOT support speaker diarization. OpenAI's gpt-4o-transcribe-diarize model provides speaker identification at the same $0.006/minute price as standard transcription, automatically labeling Speaker 0, Speaker 1, etc. This is useful for meetings, interviews, podcasts, and support calls. For audio over 30 seconds you must pass a chunking_strategy parameter, and the diarize model only supports json, text, and diarized_json response formats (no SRT/VTT).
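
The constraints above can be captured in a small parameter builder. This sketch only assembles a dictionary and makes no network call. `build_diarize_params` is a hypothetical helper, and `"auto"` is assumed here as the value passed for `chunking_strategy`.

```python
# Response formats this page lists for the diarize model (no SRT/VTT).
DIARIZE_FORMATS = {"json", "text", "diarized_json"}

def build_diarize_params(duration_seconds: float,
                         response_format: str = "diarized_json") -> dict:
    """Assemble request parameters for a diarized transcription."""
    if response_format not in DIARIZE_FORMATS:
        raise ValueError(f"{response_format!r} is not supported by the diarize model")
    params = {
        "model": "gpt-4o-transcribe-diarize",
        "response_format": response_format,
    }
    # Audio longer than 30 seconds needs a chunking strategy;
    # "auto" is an assumed value, check the API reference.
    if duration_seconds > 30:
        params["chunking_strategy"] = "auto"
    return params
```

Rejecting `srt` and `vtt` up front surfaces the format limitation before any audio is uploaded.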

LAST UPDATED: AUGUST 2, 2026

OpenAI Transcription & Whisper API Pricing Calculator

Calculate OpenAI transcription costs per minute, per hour, and per month. Compare Whisper, GPT-4o Transcribe with diarization, and Mini models.

Pricing TLDR

• $5 free credits (no credit card) = 833-1,667 minutes
• GPT Transcribe: $0.0045/min
• Whisper/GPT-4o Transcribe: $0.006/min
• GPT-4o Mini: $0.003/min
• Realtime-Whisper & Live Transcribe (streaming): $0.017/min
• Speaker diarization on gpt-4o-transcribe-diarize
• 99+ languages
• 25MB file limit

Official pricing: OpenAI

OpenAI Transcription Cost Calculator: Monthly Pricing

Cost by model (the monthly figures correspond to 2,000 minutes of audio per month):

• GPT Transcribe (gpt-transcribe): $0.0045 per minute, $0.27 per hour, $9.00 per month
• Whisper (whisper): $0.0060 per minute, $0.36 per hour, $12.00 per month
• GPT-4o Transcribe (gpt-4o-transcribe): $0.0060 per minute, $0.36 per hour, $12.00 per month
• GPT-4o Mini Transcribe (gpt-4o-mini-transcribe): $0.0030 per minute, $0.18 per hour, $6.00 per month
• GPT-Realtime-Whisper (gpt-realtime-whisper): $0.0170 per minute, $1.02 per hour, $34.00 per month
• GPT Live Transcribe (gpt-live-transcribe): $0.0170 per minute, $1.02 per hour, $34.00 per month
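
The calculator's per-hour and monthly figures can be reproduced with a few lines of arithmetic. This is a sketch using the per-minute rates shown above. The function names are illustrative, and 2,000 minutes is the monthly volume that the listed monthly costs work out to.

```python
from decimal import Decimal

# Per-minute rates in USD as shown in the calculator above.
RATE_PER_MINUTE = {
    "gpt-transcribe": Decimal("0.0045"),
    "whisper": Decimal("0.006"),
    "gpt-4o-transcribe": Decimal("0.006"),
    "gpt-4o-mini-transcribe": Decimal("0.003"),
    "gpt-realtime-whisper": Decimal("0.017"),
    "gpt-live-transcribe": Decimal("0.017"),
}

def hourly_rate(model: str) -> Decimal:
    """Cost of one hour of audio in USD, rounded to cents."""
    return (RATE_PER_MINUTE[model] * 60).quantize(Decimal("0.01"))

def monthly_cost(model: str, minutes_per_month: int) -> Decimal:
    """Monthly cost in USD for a given volume, rounded to cents."""
    return (RATE_PER_MINUTE[model] * minutes_per_month).quantize(Decimal("0.01"))

for model in RATE_PER_MINUTE:
    print(f"{model:24} ${hourly_rate(model)}/hour  ${monthly_cost(model, 2000)}/month")
```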

Burning through Whisper API credits?

Track your OpenAI transcription usage and costs in real-time.

About OpenAI Transcription API

What is OpenAI Transcription API?

OpenAI offers several speech-to-text models behind its transcription API. The original Whisper model provides high-accuracy transcription at $0.006/minute, while newer GPT-4o-based models (Transcribe, Transcribe with Diarization, Mini Transcribe) offer improved accuracy, speaker identification, and a lower-cost tier. All models support 99+ languages and the same audio formats, and return results for uploaded files in near real time.

Multiple Model Options: GPT Transcribe ($0.0045/min, OpenAI's recommended model for file transcription), Whisper (legacy, $0.006/min), GPT-4o Transcribe ($0.006/min with advanced accuracy), GPT-4o Transcribe with Diarization ($0.006/min with speaker identification), GPT-4o Mini Transcribe ($0.003/min for cost-sensitive applications), and two realtime streaming models, GPT-Realtime-Whisper and GPT Live Transcribe ($0.017/min each). All models support the same audio formats and languages.
Speaker Diarization: GPT-4o Transcribe Diarization model identifies and labels different speakers in the audio. Perfect for interviews, meetings, podcasts, and customer calls. Same per-minute pricing as standard transcription with no additional diarization fees.
Flexible Pricing Models: Choose between per-minute pricing (simple, predictable) or token-based pricing (GPT-4o models only). Token pricing: text input ($1.25-$2.50 per 1M tokens), audio input ($3.00-$6.00 per 1M tokens), text output ($5.00-$10.00 per 1M tokens). Per-minute pricing is more cost-effective for most use cases.
Timestamp Support (Whisper Only): Only the legacy Whisper model supports word-level and segment-level timestamps via the timestamp_granularities[] parameter. GPT-4o models (Transcribe, Transcribe Diarize, Mini) do not support granular timestamps. If you need precise word-level timing for video editing or subtitle generation, use Whisper with verbose_json format.

When to Use OpenAI Transcription API

Start with GPT Transcribe, OpenAI's recommended model for file transcription. Use GPT-4o Mini Transcribe for high-volume, cost-sensitive work, GPT-4o Transcribe with Diarization when you need speaker identification, and Whisper when you need word-level timestamps or subtitle formats.
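
The guidance above can be written as a small decision function. This is a sketch only. `pick_model` is a hypothetical helper, the priority order when several needs apply is an assumption, and the identifier `whisper` follows the label used in this page's calculator, which may differ from the identifier the API expects.

```python
def pick_model(need_speakers: bool = False,
               need_word_timestamps: bool = False,
               cost_sensitive: bool = False) -> str:
    """Map the recommendations on this page onto a model identifier."""
    if need_word_timestamps:
        return "whisper"  # word-level timestamps and subtitle formats
    if need_speakers:
        return "gpt-4o-transcribe-diarize"  # speaker identification
    if cost_sensitive:
        return "gpt-4o-mini-transcribe"  # lowest per-minute rate
    return "gpt-transcribe"  # recommended default for file transcription
```

No listed model offers both word-level timestamps and diarization, so a caller that needs both would have to run two passes or pick one requirement.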

Ideal for

Podcast and video transcription with GPT-4o Mini (lowest cost)
Meeting transcription with speaker identification using Diarization
Customer support call analysis and quality monitoring
Interview transcription for research and journalism
Video editing and subtitle generation with word-level timestamps (Whisper only)

Not ideal for

Real-time streaming transcription (use a streaming STT model, gpt-realtime-whisper or gpt-live-transcribe, at $0.017/min instead)
Files over 25MB (must be split or compressed first)
Background noise removal (pre-process audio first)
Music or singing transcription (optimized for speech)

OpenAI Transcription API Pricing Breakdown

Free Tier

New users receive $5 in free credits with no credit card required. These credits expire after 3 months and work across all transcription models.

Sign up at platform.openai.com - no credit card required
Receive $5 free credits instantly upon registration
Covers 833 minutes (13.9 hours) with Whisper/GPT-4o Transcribe
Covers 1,667 minutes (27.8 hours) with GPT-4o Mini Transcribe
Credits expire after 3 months from grant date

Pricing Models Explained

Per-Minute Pricing (Recommended)

Simple per-minute billing: Whisper and GPT-4o Transcribe cost $0.006/minute ($0.36/hour), and GPT-4o Mini costs $0.003/minute ($0.18/hour). Usage is rounded to the nearest second, and there are no minimum charges. This is the most cost-effective option for standard transcription workflows.

Token-Based Pricing (GPT-4o Models Only)

An alternative for GPT-4o models, billed per 1M tokens: audio input ($3-$6), text input ($1.25-$2.50), and text output ($5-$10). It is more complex but potentially cheaper for short transcripts. Use per-minute pricing unless you have specific token optimization needs.
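
To compare the two billing modes for a given request, the token formula can be sketched as below. The default rates are the upper ends of the ranges quoted above, and which end of each range applies to which model is not stated on this page. The token counts are inputs you would take from the usage reported for a real request, and any numbers you plug in for testing are placeholders.

```python
from decimal import Decimal

MILLION = Decimal(1_000_000)

def token_cost(audio_in: int, text_in: int, text_out: int,
               audio_rate: str = "6.00",
               text_in_rate: str = "2.50",
               text_out_rate: str = "10.00") -> Decimal:
    """Cost in USD of one request under token-based pricing.

    Rates are USD per 1M tokens.
    """
    total = (Decimal(audio_in) * Decimal(audio_rate)
             + Decimal(text_in) * Decimal(text_in_rate)
             + Decimal(text_out) * Decimal(text_out_rate))
    return total / MILLION

def per_minute_cost(minutes: str, rate: str = "0.006") -> Decimal:
    """Cost in USD of the same audio under per-minute pricing."""
    return Decimal(minutes) * Decimal(rate)
```

Calling both functions with the measured token counts and duration of a representative file shows which mode is cheaper for that workload.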

Speaker Diarization (No Extra Cost)

GPT-4o Transcribe with Diarization identifies speakers at the same $0.006/minute rate. No additional fees for speaker identification. Returns transcript with speaker labels (Speaker 0, Speaker 1, etc.). Perfect for multi-speaker content like meetings and interviews.

File Format & Limits

Supports mp3, mp4, mpeg, mpga, m4a, wav, webm formats. Maximum file size: 25MB per request. For larger files, split audio or use compression. No preprocessing required - API handles various audio qualities.

Language Support

99+ Languages Supported

All models support transcription in 99+ languages including English, Spanish, French, German, Chinese, Japanese, Arabic, Hindi, Portuguese, Russian, Korean, and many more. Automatic language detection included at no extra cost.

Translation to English

All models can translate foreign language audio directly to English text at the same per-minute rate. Use the /v1/audio/translations endpoint instead of /transcriptions. No additional translation fees.

OpenAI Transcription Model Comparison

GPT Transcribe
• Best for: General file transcription (recommended default)
• Per minute: $0.0045
• Key features: Newest model, keyword and language hints, streaming

GPT-4o Mini Transcribe
• Best for: High-volume, cost-sensitive applications
• Per minute: $0.003
• Key features: 50% cheaper, excellent accuracy, 99+ languages

GPT-4o Transcribe
• Best for: Difficult audio, accents, background noise
• Per minute: $0.006
• Key features: Highest accuracy, token pricing option, advanced features

GPT-4o Transcribe Diarize
• Best for: Meetings, interviews, multi-speaker content
• Per minute: $0.006
• Key features: Speaker identification included, same price as standard

Whisper (Legacy)
• Best for: Word-level timestamps, video editing
• Per minute: $0.006
• Key features: Only model with word/segment timestamps, SRT/VTT support

GPT-Realtime-Whisper
• Best for: Real-time streaming transcription
• Per minute: $0.017
• Key features: Low-latency streaming speech-to-text over the Realtime API

GPT Live Transcribe
• Best for: Live captions and real-time transcription
• Per minute: $0.017
• Key features: Newest live transcription model for streaming audio

Real-World Use Cases & Cost Examples

Podcast Transcription ($6/mo)
• 2,000 min/month
• GPT-4o Mini
• Single speaker, high volume

Team Meeting Notes ($3/mo)
• 500 min/month
• GPT-4o Diarize
• Multi-speaker identification

Customer Support Calls ($30/mo)
• 5,000 min/month
• GPT-4o Diarize
• Agent + customer tracking

Research Interviews ($6/mo)
• 1,000 min/month
• GPT-4o Transcribe
• High accuracy needed

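YouTube Subtitles ($30/mo)
• 10,000 min/month
• GPT-4o Mini
• Cost-effective at scale
• Covers transcript text only; SRT/VTT output and word-level timestamps need Whisper ($0.006/min, or $60/mo at this volume)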

7 OpenAI Transcription API Cost Optimization Tips

Use GPT-4o Mini for High-Volume Transcription

GPT-4o Mini Transcribe costs 50% less than Whisper and GPT-4o Transcribe ($0.003/min vs $0.006/min). For 10,000 minutes/month, this saves $30/month ($360/year). Accuracy is excellent for most use cases. Only upgrade to GPT-4o Transcribe for difficult audio with heavy accents or background noise.

Optimize Audio Quality Before Upload

Lower bitrate audio reduces file size without significantly affecting transcription accuracy. Convert stereo to mono if not using diarization. Use audio compression (mp3 at 64-128kbps is sufficient). Smaller files upload faster and reduce bandwidth costs for high-volume applications.
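
The effect of bitrate on the 25MB upload limit can be estimated with simple arithmetic. This sketch assumes constant-bitrate audio, ignores container overhead, and reads "25MB" as 25 MiB. The function names are illustrative.

```python
MAX_UPLOAD_BYTES = 25 * 1024 * 1024  # assumption: "25MB" read as 25 MiB

def estimated_size_bytes(duration_seconds: float, bitrate_kbps: int) -> float:
    """Approximate size of constant-bitrate audio, ignoring container overhead."""
    return duration_seconds * bitrate_kbps * 1000 / 8

def max_minutes_per_upload(bitrate_kbps: int) -> float:
    """Longest recording, in minutes, that stays under the upload limit."""
    return MAX_UPLOAD_BYTES * 8 / (bitrate_kbps * 1000) / 60

for kbps in (64, 128):
    print(f"{kbps} kbps: about {max_minutes_per_upload(kbps):.1f} minutes per upload")
```

Under these assumptions, halving the bitrate doubles the amount of audio that fits in a single request, which reduces how often long recordings need to be split.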

Only Use Diarization When Necessary

Speaker diarization has the same per-minute cost but may use more processing time. If you don't need speaker identification (single-speaker podcasts, voiceovers), use standard GPT-4o Transcribe or Mini to potentially save processing time and improve throughput.

Batch Process During Off-Peak Hours

OpenAI charges the same rate at any time of day, so batching does not lower the per-minute price. It does help you manage rate limits and infrastructure costs: queue large transcription jobs and work through them in controlled batches rather than firing individual API calls as files arrive.
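
One way to process a queue of files in controlled batches is a bounded worker pool. The sketch below uses a stand-in function instead of a real API call. `transcribe_stub` and `transcribe_batch` are hypothetical names, and the worker count is an arbitrary example, not a recommended limit.

```python
from concurrent.futures import ThreadPoolExecutor

def transcribe_stub(path: str) -> str:
    """Placeholder for a real transcription call; returns a fake transcript."""
    return f"transcript of {path}"

def transcribe_batch(paths: list[str], transcribe=transcribe_stub,
                     max_workers: int = 4) -> dict[str, str]:
    """Transcribe many files with at most `max_workers` requests in flight."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so results line up with paths.
        results = list(pool.map(transcribe, paths))
    return dict(zip(paths, results))

print(transcribe_batch(["ep1.mp3", "ep2.mp3", "ep3.mp3"]))
```

Swapping `transcribe_stub` for a function that calls the API keeps the concurrency cap in one place, which makes it easier to stay under a rate limit.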

Use $5 Free Credits for Testing

Use your free $5 credits (833-1,667 minutes) to test different models and find the right balance between cost and accuracy for your use case. Compare Whisper vs GPT-4o models vs Mini before committing to large-scale production.

Consider Self-Hosting for Extreme Volume

For 100,000+ minutes/month ($300-600/month on API), self-hosting open-source Whisper on your own GPU infrastructure may be cheaper. Requires technical expertise and GPU resources but eliminates per-minute costs. Use API for variable workloads, self-host for consistent high volume.
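
Whether self-hosting pays off depends on your own infrastructure bill. A rough break-even sketch follows, in which the monthly self-hosting cost is a number you supply. The $450 figure in the example is purely hypothetical, and the function names are illustrative.

```python
from decimal import Decimal

def api_cost(minutes: int, rate_per_minute: str) -> Decimal:
    """Monthly API cost in USD at a given per-minute rate."""
    return Decimal(minutes) * Decimal(rate_per_minute)

def breakeven_minutes(self_host_monthly_usd: str, rate_per_minute: str) -> Decimal:
    """Monthly volume above which a fixed self-hosting bill undercuts the API."""
    return Decimal(self_host_monthly_usd) / Decimal(rate_per_minute)

# API cost range at 100,000 minutes/month, using the rates on this page.
print(api_cost(100_000, "0.003"), api_cost(100_000, "0.006"))

# Hypothetical $450/month GPU bill compared against the $0.006/min rate.
print(breakeven_minutes("450", "0.006"))
```

The comparison covers compute cost only and leaves out engineering time and maintenance, which the paragraph above flags as the main hidden cost of self-hosting.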

Monitor Transcription Costs in Real-Time

Track OpenAI transcription spending with CostGoat's real-time credit monitoring. Get instant alerts when credit usage spikes, when switching from Mini to premium models, or when approaching your budget thresholds. Prevent runaway transcription costs before they impact your bottom line.

Start Tracking Your Whisper API Spending

Monitor OpenAI transcription costs alongside your other API usage — all from one menubar app.

OpenAI Transcription Pricing FAQ

Common questions about OpenAI transcription costs, models, features, and API pricing
