LLM API Pricing Comparison

Compare 44 leading large language models in one table. Prices in USD per million tokens. Last checked 2026-07-14.

Model	Provider	Input	Output	Cached	Context	Quality	Heat
Qwen3.5 Flash	🇨🇳 Alibaba Qwen	$0.029	$0.29	—	131K	—	—
Gemini 2.5 Flash-Lite	🇺🇸 Google	$0.10	$0.40	$0.010	1M	—	#16 (0.60T)
Qwen3.5 Plus	🇨🇳 Alibaba Qwen	$0.12	$0.71	—	131K	—	—
Doubao 1.5 Pro	🇨🇳 ByteDance Doubao	$0.12	$0.29	$0.024	256K	—	—
文心 ERNIE 4.5 Turbo	🇨🇳 Baidu ERNIE	$0.12	$0.47	$0.029	128K	—	—
混元 TurboS	🇨🇳 Tencent Hunyuan	$0.12	$0.29	—	256K	—	—
DeepSeek V4 Flash	🇨🇳 DeepSeek	$0.15	$0.29	$0.003	1M	47	#1 (4.34T)
文心 ERNIE X1 Turbo	🇨🇳 Baidu ERNIE	$0.15	$0.59	—	128K	—	—
混元 T1	🇨🇳 Tencent Hunyuan	$0.15	$0.59	—	64K	—	—
Gemini 3.1 Flash-Lite	🇺🇸 Google	$0.25	$1.50	$0.025	1M	34	—
DeepSeek V3.2 (retires 2026-07-24)	🇨🇳 DeepSeek	$0.29	$1.18	$0.074	128K	—	#10 (1.18T)
GLM-4.7	🇨🇳 Zhipu AI	$0.29	$1.18	$0.059	200K	—	—
Spark X2 Flash	🇨🇳 iFlytek Spark	$0.29	$0.29	—	200K	—	—
Spark Ultra	🇨🇳 iFlytek Spark	$0.29	$0.29	—	128K	—	—
Baichuan M2	🇨🇳 Baichuan AI	$0.29	$2.95	—	192K	—	—
Gemini 2.5 Flash	🇺🇸 Google	$0.30	$2.50	$0.030	1M	—	#14 (0.63T)
MiniMax M2.7	🇨🇳 MiniMax	$0.31	$1.24	$0.062	1M	50	—
Doubao 1.6	🇨🇳 ByteDance Doubao	$0.35	$3.54	—	256K	—	—
Qwen3 Max	🇨🇳 Alibaba Qwen	$0.37	$1.47	—	131K	—	—
DeepSeek V4 Pro	🇨🇳 DeepSeek	$0.44	$0.88	$0.004	1M	52	#5 (2.06T)
Spark X2	🇨🇳 iFlytek Spark	$0.44	$0.44	—	200K	—	—
混元 2.0 Think	🇨🇳 Tencent Hunyuan	$0.59	$2.34	—	128K	—	—
GLM-5	🇨🇳 Zhipu AI	$0.59	$2.65	$0.15	200K	—	—
文心 ERNIE 5.1	🇨🇳 Baidu ERNIE	$0.59	$2.65	—	128K	—	—
MiniMax M3	🇨🇳 MiniMax	$0.62	$2.47	$0.12	1M	55	#3 (3.38T)
Baichuan M3 Plus	🇨🇳 Baichuan AI	$0.74	$1.33	—	192K	—	—
GLM-5.1	🇨🇳 Zhipu AI	$0.88	$3.54	$0.19	200K	51	—
Doubao Seed 2.1 Pro	🇨🇳 ByteDance Doubao	$0.88	$4.42	$0.18	256K	—	—
Kimi K2.6	🇨🇳 Moonshot (Kimi)	$0.96	$3.98	$0.16	262K	54	—
Claude Haiku 4.5	🇺🇸 Anthropic	$1.00	$5.00	$0.10	200K	37	—
Grok Build 0.1	🇺🇸 xAI	$1.00	$2.00	$0.20	256K	—	—
Spark Pro	🇨🇳 iFlytek Spark	$1.03	$1.03	—	128K	—	—
GPT-5.1	🇺🇸 OpenAI	$1.25	$10.00	$0.13	400K	—	—
Gemini 2.5 Pro	🇺🇸 Google	$1.25	$10.00	$0.13	2M	35	—
Grok 4.3	🇺🇸 xAI	$1.25	$2.50	$0.20	1M	53	—
Gemini 3.5 Flash	🇺🇸 Google	$1.50	$9.00	$0.15	1M	55	#19 (0.45T)
Qwen3.7 Max	🇨🇳 Alibaba Qwen	$1.77	$5.30	—	1M	57	—
Gemini 3.1 Pro Preview	🇺🇸 Google	$2.00	$12.00	$0.20	2M	57	—
GPT-5.4	🇺🇸 OpenAI	$2.50	$15.00	$0.25	400K	—	—
Claude Sonnet 4.6	🇺🇸 Anthropic	$3.00	$15.00	$0.30	1M	52	#6 (2.03T)
GPT-5.5	🇺🇸 OpenAI	$5.00	$30.00	$0.50	400K	60	#18 (0.47T)
Claude Opus 4.8	🇺🇸 Anthropic	$5.00	$25.00	$0.50	1M	61	#9 (1.32T)
Claude Opus 4.7	🇺🇸 Anthropic	$5.00	$25.00	$0.50	1M	57	#8 (1.71T)
Claude Fable 5	🇺🇸 Anthropic	$10.00	$50.00	$1.00	1M	—	—

USD per million tokens · Chinese providers converted at 1 USD = ¥6.7883 · Green = cheapest · Heat = OpenRouter weekly rank by token usage (popularity, not quality; checked 2026-06-11) · Source: official provider pricing pages · For reference only.
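The currency conversion mentioned in the footnote can be sketched as below. Only the rate (1 USD = ¥6.7883) comes from this page; the function name and the ¥2.00 sample price are hypothetical and are not taken from any provider's price list.

```python
# Exchange rate stated in the table footnote.
CNY_PER_USD = 6.7883

def cny_to_usd(price_cny: float) -> float:
    """Convert a price in CNY per million tokens to USD per million tokens."""
    return price_cny / CNY_PER_USD

# Hypothetical example: a list price of ¥2.00 per million tokens.
sample_usd = cny_to_usd(2.00)
print(f"¥2.00 per million tokens is about ${sample_usd:.2f}")
```

Because the USD figures for Chinese providers depend on this rate, they will drift as the exchange rate moves even when the yuan list price stays the same.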

How to pick the most cost-effective LLM

Input vs output pricing. Almost every provider charges separately for input tokens (your prompt) and output tokens (the generation), and output typically costs 4–10× the input price, though a few models in the table above charge only 1–2×. So “short prompt, long answer” tasks (writing, code generation) are dominated by output price, while “long prompt, short answer” tasks (summarization, classification) are dominated by input price.

Use prompt caching. If your requests share a large repeated prefix (fixed system prompt, RAG context), cached input typically costs 10–25% of the normal rate in the table above, and as little as 1–2% for DeepSeek V4. DeepSeek, OpenAI, Anthropic and Google all support context caching, but the discount varies a lot.
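The effect of a cached prefix on the input bill can be estimated roughly as follows. This is a sketch under simplifying assumptions: it uses the Claude Sonnet 4.6 row from the table, the prompt and prefix sizes are made up, and it ignores any cache-write or storage charges a provider may add on top.

```python
def input_cost_with_cache(
    input_tokens: int,
    cached_tokens: int,
    input_price: float,
    cached_price: float,
) -> float:
    """USD cost of the input side when `cached_tokens` are billed at the cached rate.

    Prices are in USD per million tokens.
    """
    if not 0 <= cached_tokens <= input_tokens:
        raise ValueError("cached_tokens must be between 0 and input_tokens")
    fresh_tokens = input_tokens - cached_tokens
    return (fresh_tokens * input_price + cached_tokens * cached_price) / 1_000_000

# Claude Sonnet 4.6 from the table: $3.00 input, $0.30 cached.
# Assumed request: 10,000 prompt tokens, of which an 8,000-token prefix is cached.
without_cache = input_cost_with_cache(10_000, 0, 3.00, 0.30)
with_cache = input_cost_with_cache(10_000, 8_000, 3.00, 0.30)

saving = 1 - with_cache / without_cache
print(f"no cache: ${without_cache:.4f}, cached: ${with_cache:.4f}, saving: {saving:.0%}")
```

The saving grows with the share of the prompt that is a stable prefix, so long system prompts and reused RAG context benefit most.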

Don’t default to flagships. Lower-tier models like Claude Haiku 4.5, Gemini 2.5 Flash-Lite, Qwen3.5 Flash and DeepSeek V4 Flash are excellent value: for chat, translation, simple generation and classification they are more than enough, often at 1–5% of flagship cost (Claude Haiku 4.5 is the exception, at about 20% of Claude Opus 4.8).
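To see how far below a flagship each of these models sits, one can compare the cost of the same request across rows of the table. The sketch below assumes a request of 1,000 input and 500 output tokens and takes Claude Opus 4.8 as the flagship baseline; both choices are illustrative.

```python
# (input, output) prices in USD per million tokens, copied from the table above.
PRICES = {
    "Qwen3.5 Flash": (0.029, 0.29),
    "Gemini 2.5 Flash-Lite": (0.10, 0.40),
    "DeepSeek V4 Flash": (0.15, 0.29),
    "Claude Haiku 4.5": (1.00, 5.00),
    "Claude Opus 4.8": (5.00, 25.00),
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """USD cost of one request at list price."""
    input_price, output_price = PRICES[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000

def relative_cost(model: str, baseline: str, input_tokens: int, output_tokens: int) -> float:
    """Cost of `model` as a fraction of the cost of `baseline` for the same request."""
    return request_cost(model, input_tokens, output_tokens) / request_cost(
        baseline, input_tokens, output_tokens
    )

# Assumed request shape: 1,000 input tokens, 500 output tokens.
for name in PRICES:
    share = relative_cost(name, "Claude Opus 4.8", 1_000, 500)
    print(f"{name}: {share:.1%} of Claude Opus 4.8")
```

The ratio shifts with the input/output mix, so it is worth rerunning with token counts that match your own traffic.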

Chinese models are aggressively priced. DeepSeek V4, Qwen, Doubao and Kimi often undercut Western models by an order of magnitude on output price. If latency to China matters or you’re cost-sensitive, they’re worth evaluating; see the dedicated Chinese LLM API pricing comparison in 2026 for all domestic providers in one table. Worried about data residency or training policies? We read all eight providers’ official policies in the Chinese AI API Trust Index.
