NEW: Real-Time Usage Tracking for AI Agents — track Claude Code, Kimi, Codex & more. Try it free →

CostGoat Logo

CostGoat

LAST UPDATED: MAY 21, 2026

DeepSeek API Pricing Calculator & Complete Cost Guide

Calculate DeepSeek V4 (Flash & Pro) API costs per token and per month. Up to 99% cheaper than GPT-5.5 with automatic context caching.

CalculatorPricing GuideExamplesSave MoneyFAQ

Pricing TLDR

  • 5 million free tokens for new users (no credit card required)
  • V4 Flash: $0.14 input • $0.28 output per million tokens (up to 99% cheaper than GPT-5.5)
  • V4 Pro: $1.74 input • $3.48 output for flagship reasoning • 1M context window • 384K max output

Official pricing:

DeepSeek

Quality Scores: Theozard

DeepSeek API Cost Calculator - Monthly Pricing

Calculate by

Input Tokens

Output Tokens

API Calls / Month

Quick Examples:

Context Caching:

Cache Hit Rate:

50%

(V4 Flash: $0.0028 cache hit vs $0.14 miss; V4 Pro: $0.0145 vs $1.74)

DeepSeek automatically caches context. When requests share the same prefix, cached segments are reused.

DeepSeek V4 Pro (deepseek-v4-pro)

Context

1M

384K output

Quality

86

Per 1M Tokens

In: $0.88

Out: $3.48

(50% cache hits)

Monthly Cost

$2.62

DeepSeek V4 Flash (deepseek-v4-flash)

Context

1M

384K output

Quality

77

Per 1M Tokens

In: $0.07

Out: $0.28

(50% cache hits)

Monthly Cost

$0.21

DeepSeek V4 Pro (deepseek-v4-pro)

Context

1M

384K output

Quality

65

Per 1M Tokens

In: $0.88

Out: $3.48

(50% cache hits)

Monthly Cost

$2.62

DeepSeek V4 Flash (deepseek-v4-flash)

Context

1M

384K output

Quality

61

Per 1M Tokens

In: $0.07

Out: $0.28

(50% cache hits)

Monthly Cost

$0.21

Burning through DeepSeek API credits?

Track your DeepSeek API spending in real-time.

Try free for 7 daysLearn more →

Privacy-first desktop app. No sign-up required.

CostGoat desktop app showing AI agent quotas, usage costs, credit balances, and subscriptions

About DeepSeek API

What is DeepSeek API?

The DeepSeek API provides programmatic access to DeepSeek's V4 family — Flash (cost-effective) and Pro (flagship) — each available in thinking and non-thinking modes. V4 unifies the context window at 1M tokens and max output at 384K tokens across all modes, with automatic context caching to reduce costs for repeated prompts. The legacy `deepseek-chat` and `deepseek-reasoner` API names continue to work as compatibility aliases for V4 Flash modes.

  • Extremely Cost-Effective: DeepSeek V4 Flash is one of the most affordable high-capability APIs available. At $0.14/$0.28 per million tokens (cache miss / output), it's up to 99% cheaper than GPT-5.5 ($5/$30). V4 Pro at $1.74/$3.48 still undercuts Claude Sonnet 4.6 ($3/$15) for near-flagship reasoning quality.
  • Automatic Context Caching: Context caching is enabled by default across both tiers. When requests share the same prefix as recent ones, cached segments are retrieved from disk automatically. Cache hits cost just $0.0028/M tokens on V4 Flash (98% cheaper than cache miss) and $0.0145/M on V4 Pro.
  • Thinking & Non-Thinking Modes: Both V4 Flash and V4 Pro offer thinking mode (visible Chain-of-Thought, best for math/logic/code) and non-thinking mode (faster, best for general tasks). V4 unifies max output at 384K tokens across both modes — no more 8K vs 64K split.

When to Use DeepSeek API

DeepSeek is ideal for cost-sensitive applications that need good AI capability without frontier pricing. Use V4 Flash for high-volume production workloads and V4 Pro for hard reasoning tasks; within each tier, enable thinking mode for Chain-of-Thought reasoning or use non-thinking mode for general-purpose workloads.

Ideal for

  • Cost-sensitive production applications
  • High-volume batch processing
  • Math, logic, and coding tasks (use thinking mode)
  • General chatbots and content generation (use non-thinking mode)
  • Applications with repetitive prompts (benefits from caching)

Not ideal for

  • Applications requiring maximum frontier capability
  • Use cases needing specific tool integrations not offered
  • Workloads requiring strict SLAs or guaranteed uptime
  • Tasks requiring guaranteed deterministic outputs

DeepSeek API Pricing Breakdown

Free Tier

New users receive 5 million free tokens upon registration with no credit card required. These credits are automatically applied to your usage and work across all models.

  • Sign up at platform.deepseek.com - no credit card required
  • Receive 5 million free tokens instantly
  • Credits work across all models (V4 Flash and V4 Pro)
  • Additional credits can be purchased as needed

Key Features

Automatic Context Caching

All requests automatically benefit from context caching. When prompts share the same prefix, cached content is reused. On V4 Flash, cache hits cost $0.0028/M vs $0.14/M for cache misses — a 98% savings. (The cache-hit rate was reduced to 1/10 of launch price on 2026-04-26.)

V4 Two-Tier Pricing

DeepSeek V4 ships in two tiers. V4 Flash: $0.0028 cache hit, $0.14 cache miss, $0.28 output per 1M — for cost-effective production workloads. V4 Pro: $0.0145 cache hit, $1.74 cache miss, $3.48 output per 1M — for flagship reasoning. Both tiers offer thinking and non-thinking modes.

1M Context, 384K Max Output

V4 ships with a 1M-token context window and unified 384K max output tokens across both thinking and non-thinking modes — a major expansion from V3.2's 128K context / split 8K-64K outputs.

OpenAI- and Anthropic-Compatible API

DeepSeek API speaks two formats — OpenAI-compatible at the default base URL and Anthropic-compatible at `https://api.deepseek.com/anthropic`. Migrate from either ecosystem by changing your base URL and API key.

DeepSeek Model Comparison

Quality (non-thinking / thinking)

DeepSeek V4 Flash

61 / 77

DeepSeek V4 Pro

65 / 86

Cache hit / miss / output (per 1M)

DeepSeek V4 Flash

$0.0028 / $0.14 / $0.28

DeepSeek V4 Pro

$0.0145 / $1.74 / $3.48

Context Window

DeepSeek V4 Flash

1M tokens

DeepSeek V4 Pro

1M tokens

Max Output

DeepSeek V4 Flash

384K tokens

DeepSeek V4 Pro

384K tokens

Chain-of-Thought

DeepSeek V4 Flash

Yes (in thinking mode)

DeepSeek V4 Pro

Yes (in thinking mode)

Best For

DeepSeek V4 Flash

High-volume production, general tasks, most coding

DeepSeek V4 Pro

Hard reasoning, complex math, flagship-quality code

Legacy aliases

DeepSeek V4 Flash

deepseek-chat / deepseek-reasoner

DeepSeek V4 Pro

Note: V4 Pro is currently running a 75% launch discount until 2026-05-31 15:59 UTC. The prices above show the post-promo steady-state rates so the calculator stays accurate after the promo window. The `deepseek-chat` and `deepseek-reasoner` model names still work as compatibility aliases routing to V4 Flash modes, but are scheduled for deprecation on 2026-07-24 — new integrations should use the explicit `deepseek-v4-flash` and `deepseek-v4-pro` model names.

DeepSeek API Monthly Cost Estimates

Light Use

$1-5/mo

Personal projects

<1K requests/day

Either mode works

Medium Use

$5-25/mo

Small apps

1-5K requests/day

Non-thinking for general, thinking for reasoning

Heavy Use

$25-125/mo

Production apps

5-20K requests/day

Optimize cache hit rate

Enterprise

$125+/mo

Large scale

20K+ requests/day

High cache utilization

7 DeepSeek API Cost Optimization Tips

1

Maximize Cache Hit Rate

Structure prompts with consistent prefixes (system prompts, instructions) to maximize cache hits. On V4 Flash, cache hits cost $0.0028/M vs $0.14/M for misses — a 98% savings. Aim for 70%+ cache hit rates in production.

2

Start with V4 Flash, Upgrade to Pro Selectively

V4 Flash (quality 61 non-thinking / 77 thinking) handles classification, summarization, general chat, and most coding at 12x lower cost than V4 Pro. Reserve V4 Pro (quality 65 / 86) for hard reasoning, complex math, and flagship-quality code generation.

3

Use Non-Thinking Mode for Simple Tasks

Within a tier, non-thinking mode is faster and well-suited for classification, summarization, and general queries. Reserve thinking mode for complex math, logic, and code tasks where Chain-of-Thought pays off in quality.

4

Batch Similar Requests

Group requests with similar prompts together to benefit from context caching. The system automatically caches and retrieves shared prefixes, reducing costs on subsequent requests.

5

Optimize Output Length

Both V4 modes support up to 384K output tokens, but costs accumulate per token. Set appropriate max_tokens limits for your use case — most tasks finish well under 10K output tokens.

6

Compare with Competitors

V4 Flash at $0.14/$0.28 is up to 99% cheaper than GPT-5.5 ($5/$30) and Claude Sonnet 4.6 ($3/$15). V4 Pro at $1.74/$3.48 is still meaningfully cheaper than Claude Opus 4.7 ($5/$25) for near-flagship reasoning quality (86 vs Opus's 95).

7

Monitor Token Usage

Track your cache hit rates and token consumption via the DeepSeek platform. Understanding your caching patterns helps optimize prompt design and reduce costs further.

DeepSeek Model Selection Guide

Use Case

Customer Support Chat

Recommended Model

V4 Flash

Non-thinking mode

Monthly Cost (Est.)

~$1-4

Why This Model?

Fast, cheapest for general queries

Use Case

Code Generation

Recommended Model

V4 Flash

Thinking mode

Monthly Cost (Est.)

~$3-12

Why This Model?

Quality 77 with Chain-of-Thought at Flash pricing

Use Case

Hard Math & Logic

Recommended Model

V4 Pro

Thinking mode

Monthly Cost (Est.)

~$20-80

Why This Model?

Quality 86 for the hardest reasoning tasks

Use Case

Content Writing

Recommended Model

V4 Flash

Non-thinking mode

Monthly Cost (Est.)

~$2-8

Why This Model?

Cheap for general content, 384K output ceiling

Use Case

Data Extraction

Recommended Model

V4 Flash

Non-thinking mode

Monthly Cost (Est.)

~$1-4

Why This Model?

JSON output support, tool calling, lowest cost

Use Case

High-Volume Batch

Recommended Model

V4 Flash

With high cache rate

Monthly Cost (Est.)

~$2-20

Why This Model?

Maximize cache hits at $0.0028/M for the lowest cost

Start Tracking Your DeepSeek API Spending

Monitor your DeepSeek V4 credit balance and per-tier usage from your menubar. Alerts before you run out.

Try Free for 7 DaysLearn more →

Privacy-first desktop app. 7-day free trial, no credit card required.

CostGoat desktop app showing AI agent quotas, usage costs, credit balances, and subscriptions

DeepSeek API Pricing FAQ

Common questions about DeepSeek API costs, billing, and optimization

AI Pricing

Gemini API PricingClaude API PricingGoogle Veo PricingAI Cost CalculatorsReplicate API PricingOpenRouter API PricingOpenRouter Free Models
DownloadsPricingDashboardContactIssuesAffiliatesTermsPrivacy

© 2026 CostGoat. All rights reserved.

Made by Functioncraft: Redis GUI Client · SSH GUI Client

Affiliate disclosure: Some links earn CostGoat a commission or credit when you sign up — no extra cost to you.