算盘LLM Abacus

LLM API Pricing Comparison

Compare 26 leading large language models in one table. Prices in USD per million tokens. Updated 2026-05-30.

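| Model | Provider | Input | Output | Cached | Context | Quality |
|---|---|---|---|---|---|---|
| Qwen3.5 Flash | 🇨🇳 Alibaba Qwen | $0.029 | $0.29 | | 131K | |
| Gemini 2.5 Flash-Lite | 🇺🇸 Google | $0.10 | $0.40 | $0.010 | 1M | |
| Qwen3.5 Plus | 🇨🇳 Alibaba Qwen | $0.12 | $0.71 | | 131K | |
| Doubao 1.5 Pro | 🇨🇳 ByteDance Doubao | $0.12 | $0.29 | $0.024 | 256K | |
| DeepSeek V4 Flash | 🇨🇳 DeepSeek | $0.15 | $0.29 | $0.003 | 1M | |
| Gemini 3.1 Flash-Lite | 🇺🇸 Google | $0.25 | $1.50 | $0.025 | 1M | |
| DeepSeek V3.2 | 🇨🇳 DeepSeek | $0.29 | $1.18 | $0.074 | 128K | |
| GLM-4.7 | 🇨🇳 Zhipu AI | $0.29 | $1.18 | $0.059 | 200K | |
| Gemini 2.5 Flash | 🇺🇸 Google | $0.30 | $2.50 | $0.030 | 1M | |
| Doubao 1.6 | 🇨🇳 ByteDance Doubao | $0.35 | $3.54 | | 256K | |
| Qwen3 Max | 🇨🇳 Alibaba Qwen | $0.37 | $1.47 | | 131K | |
| DeepSeek V4 Pro | 🇨🇳 DeepSeek | $0.44 | $0.88 | $0.004 | 1M | |
| GLM-5 | 🇨🇳 Zhipu AI | $0.59 | $2.65 | $0.15 | 200K | |
| GLM-5.1 | 🇨🇳 Zhipu AI | $0.88 | $3.54 | $0.19 | 200K | |
| Kimi K2.6 | 🇨🇳 Moonshot (Kimi) | $0.96 | $3.98 | $0.16 | 262K | |
| Claude Haiku 4.5 | 🇺🇸 Anthropic | $1.00 | $5.00 | $0.10 | 200K | |
| Grok Build 0.1 | 🇺🇸 xAI | $1.00 | $2.00 | | 256K | |
| GPT-5.1 | 🇺🇸 OpenAI | $1.25 | $10.00 | $0.13 | 400K | |
| Gemini 2.5 Pro | 🇺🇸 Google | $1.25 | $10.00 | $0.13 | 2M | |
| Grok 4.3 | 🇺🇸 xAI | $1.25 | $2.50 | $0.20 | 1M | |
| Gemini 3.5 Flash | 🇺🇸 Google | $1.50 | $9.00 | $0.15 | 1M | |
| Gemini 3.1 Pro Preview | 🇺🇸 Google | $2.00 | $12.00 | $0.20 | 2M | 57 |
| GPT-5.4 | 🇺🇸 OpenAI | $2.50 | $15.00 | $0.25 | 400K | |
| Claude Sonnet 4.6 | 🇺🇸 Anthropic | $3.00 | $15.00 | $0.30 | 1M | |
| GPT-5.5 | 🇺🇸 OpenAI | $5.00 | $30.00 | $0.50 | 400K | 60 |
| Claude Opus 4.7 | 🇺🇸 Anthropic | $5.00 | $25.00 | $0.50 | 1M | 57 |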

USD per million tokens · Chinese providers converted at 1 USD = ¥6.7867 · Source: official provider pricing pages · For reference only.

How to pick the most cost-effective LLM

Input vs output pricing. Almost every provider charges separately for input tokens (your prompt) and output tokens (the generation). In the table above, output usually costs 4–10× as much as input, although a few models (DeepSeek V4, Grok, Doubao 1.5 Pro) charge only about 2×. So “short prompt, long answer” tasks (writing, code generation) are dominated by output price, while “long prompt, short answer” tasks (summarization, classification) are dominated by input price.
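To make the split concrete, here is a minimal sketch that computes how much of one request's cost comes from output tokens. The prices are the Claude Sonnet 4.6 figures from the table; the function names and the two token counts are illustrative assumptions, not measurements.

```python
def request_cost(input_tokens, output_tokens, input_price, output_price):
    """Return (input_cost, output_cost) in USD; prices are USD per million tokens."""
    input_cost = input_tokens * input_price / 1_000_000
    output_cost = output_tokens * output_price / 1_000_000
    return input_cost, output_cost


def output_share(input_tokens, output_tokens, input_price, output_price):
    """Fraction of one request's total cost that comes from output tokens."""
    input_cost, output_cost = request_cost(
        input_tokens, output_tokens, input_price, output_price
    )
    return output_cost / (input_cost + output_cost)


# Claude Sonnet 4.6 list prices from the table: $3.00 input, $15.00 output.
INPUT_PRICE, OUTPUT_PRICE = 3.00, 15.00

# Hypothetical workloads (token counts are assumptions for illustration).
writing = output_share(200, 2_000, INPUT_PRICE, OUTPUT_PRICE)
summarizing = output_share(20_000, 200, INPUT_PRICE, OUTPUT_PRICE)

print(f"writing: {writing:.0%} of cost is output")          # writing: 98% ...
print(f"summarizing: {summarizing:.0%} of cost is output")  # summarizing: 5% ...
```

With these assumed token counts, output accounts for about 98% of the writing request and about 5% of the summarization request, so the price column that matters depends on the shape of your traffic.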

Use prompt caching. If your requests share a large repeated prefix (fixed system prompt, RAG context), cached input typically costs 10–25% of the normal input rate, and as little as 1–2% on DeepSeek V4. DeepSeek, OpenAI, Anthropic and Google all support context caching, but the discount varies a lot.
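The effect of a cached prefix can be estimated with a short sketch like the one below. It uses the GPT-5.1 prices from the table and a hypothetical request shape. It assumes every prefix token is a cache hit and ignores any cache-write or storage fees, which some providers bill separately, so treat the result as an upper bound on the saving.

```python
def cost_with_cache(prefix_tokens, fresh_tokens, output_tokens,
                    input_price, cached_price, output_price):
    """USD cost of one request whose shared prefix is served from cache.

    All prices are USD per million tokens. Assumes the whole prefix is a
    cache hit and ignores any cache-write or storage fees.
    """
    return (prefix_tokens * cached_price
            + fresh_tokens * input_price
            + output_tokens * output_price) / 1_000_000


# GPT-5.1 list prices from the table: $1.25 input, $0.13 cached, $10.00 output.
INPUT, CACHED, OUTPUT = 1.25, 0.13, 10.00

# Hypothetical request: 8,000-token shared prefix, 500 fresh tokens, 300 output.
uncached = cost_with_cache(0, 8_500, 300, INPUT, CACHED, OUTPUT)
cached = cost_with_cache(8_000, 500, 300, INPUT, CACHED, OUTPUT)

print(f"without cache: ${uncached:.6f}")
print(f"with cache:    ${cached:.6f}")
print(f"saving:        {1 - cached / uncached:.0%}")
```

The saving shrinks as output grows, because caching only discounts the input side of the bill.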

Don’t default to flagships. Lightweight models like Claude Haiku 4.5, Gemini 2.5 Flash-Lite, Qwen3.5 Flash and DeepSeek V4 Flash are excellent value. For chat, translation, simple generation and classification they are more than enough. The cheapest three cost roughly 1–3% of a flagship such as Claude Opus 4.7 or GPT-5.5, and Claude Haiku 4.5 costs about 20% of Opus.
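As a rough check on how much cheaper the small models are, the sketch below prices one hypothetical workload (1,000 input tokens, 500 output tokens) on several models and reports each as a fraction of Claude Opus 4.7. The prices are copied from the table; the workload, the choice of baseline, and the function names are assumptions for illustration.

```python
# (input, output) list prices in USD per million tokens, copied from the table.
PRICES = {
    "Qwen3.5 Flash": (0.029, 0.29),
    "Gemini 2.5 Flash-Lite": (0.10, 0.40),
    "DeepSeek V4 Flash": (0.15, 0.29),
    "Claude Haiku 4.5": (1.00, 5.00),
    "Claude Opus 4.7": (5.00, 25.00),
}


def workload_cost(model, input_tokens, output_tokens):
    """USD cost of one request on `model`, ignoring caching discounts."""
    input_price, output_price = PRICES[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


def relative_cost(model, baseline, input_tokens, output_tokens):
    """Cost of `model` as a fraction of the cost of `baseline`."""
    return (workload_cost(model, input_tokens, output_tokens)
            / workload_cost(baseline, input_tokens, output_tokens))


# Hypothetical workload: 1,000 input tokens and 500 output tokens per request.
for model in PRICES:
    ratio = relative_cost(model, "Claude Opus 4.7", 1_000, 500)
    print(f"{model}: {ratio:.1%} of Claude Opus 4.7")
```

The ratio depends on the input-to-output mix, so rerun it with token counts that match your own traffic before drawing conclusions.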

Chinese models are aggressively priced. DeepSeek V4, Qwen, Doubao and Kimi often undercut Western models by an order of magnitude on output price. If latency to China matters or you’re cost-sensitive, they’re worth evaluating.

Need to estimate your real bill? Try the calculators (Chinese UI).