LLM API Pricing Comparison
Compare 26 leading large language models in one table. Prices in USD per million tokens. Updated 2026-05-30.
| Model | Provider | Input | Output | Cached input | Context | Quality |
|---|---|---|---|---|---|---|
| Qwen3.5 Flash | 🇨🇳 Alibaba Qwen | $0.029 | $0.29 | — | 131K | — |
| Gemini 2.5 Flash-Lite | 🇺🇸 Google | $0.10 | $0.40 | $0.010 | 1M | — |
| Qwen3.5 Plus | 🇨🇳 Alibaba Qwen | $0.12 | $0.71 | — | 131K | — |
| Doubao 1.5 Pro | 🇨🇳 ByteDance Doubao | $0.12 | $0.29 | $0.024 | 256K | — |
| DeepSeek V4 Flash | 🇨🇳 DeepSeek | $0.15 | $0.29 | $0.003 | 1M | — |
| Gemini 3.1 Flash-Lite | 🇺🇸 Google | $0.25 | $1.50 | $0.025 | 1M | — |
| DeepSeek V3.2 | 🇨🇳 DeepSeek | $0.29 | $1.18 | $0.074 | 128K | — |
| GLM-4.7 | 🇨🇳 Zhipu AI | $0.29 | $1.18 | $0.059 | 200K | — |
| Gemini 2.5 Flash | 🇺🇸 Google | $0.30 | $2.50 | $0.030 | 1M | — |
| Doubao 1.6 | 🇨🇳 ByteDance Doubao | $0.35 | $3.54 | — | 256K | — |
| Qwen3 Max | 🇨🇳 Alibaba Qwen | $0.37 | $1.47 | — | 131K | — |
| DeepSeek V4 Pro | 🇨🇳 DeepSeek | $0.44 | $0.88 | $0.004 | 1M | — |
| GLM-5 | 🇨🇳 Zhipu AI | $0.59 | $2.65 | $0.15 | 200K | — |
| GLM-5.1 | 🇨🇳 Zhipu AI | $0.88 | $3.54 | $0.19 | 200K | — |
| Kimi K2.6 | 🇨🇳 Moonshot (Kimi) | $0.96 | $3.98 | $0.16 | 262K | — |
| Claude Haiku 4.5 | 🇺🇸 Anthropic | $1.00 | $5.00 | $0.10 | 200K | — |
| Grok Build 0.1 | 🇺🇸 xAI | $1.00 | $2.00 | — | 256K | — |
| GPT-5.1 | 🇺🇸 OpenAI | $1.25 | $10.00 | $0.13 | 400K | — |
| Gemini 2.5 Pro | 🇺🇸 Google | $1.25 | $10.00 | $0.13 | 2M | — |
| Grok 4.3 | 🇺🇸 xAI | $1.25 | $2.50 | $0.20 | 1M | — |
| Gemini 3.5 Flash | 🇺🇸 Google | $1.50 | $9.00 | $0.15 | 1M | — |
| Gemini 3.1 Pro Preview | 🇺🇸 Google | $2.00 | $12.00 | $0.20 | 2M | 57 |
| GPT-5.4 | 🇺🇸 OpenAI | $2.50 | $15.00 | $0.25 | 400K | — |
| Claude Sonnet 4.6 | 🇺🇸 Anthropic | $3.00 | $15.00 | $0.30 | 1M | — |
| GPT-5.5 | 🇺🇸 OpenAI | $5.00 | $30.00 | $0.50 | 400K | 60 |
| Claude Opus 4.7 | 🇺🇸 Anthropic | $5.00 | $25.00 | $0.50 | 1M | 57 |
USD per million tokens · Chinese providers' CNY prices converted at 1 USD = ¥6.7867 · Source: official provider pricing pages · For reference only.
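The currency conversion in the footnote is a single division by the stated rate. The sketch below applies it; the CNY list prices it converts are hypothetical values chosen for illustration, not any provider's actual rates.

```python
# Exchange rate from the table footnote: 1 USD = ¥6.7867.
CNY_PER_USD = 6.7867


def cny_to_usd(cny_per_million: float) -> float:
    """Convert a price in CNY per million tokens to USD per million tokens."""
    return cny_per_million / CNY_PER_USD


# Hypothetical CNY list prices, for illustration only.
for cny in (1.0, 2.0, 8.0):
    print(f"¥{cny:.2f} per 1M tokens -> ${cny_to_usd(cny):.3f} per 1M tokens")
```

At this rate ¥2 and ¥8 come out to about $0.29 and $1.18, figures that recur in the table. Because every Chinese row is converted this way, a shift in the exchange rate moves all of those prices together.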
How to pick the most cost-effective LLM
Input vs output pricing. Almost every provider charges separately for input tokens (your prompt) and output tokens (the generation), and output costs more per token: 2–10× the input price in the table above, with most models between 4× and 8×. So “short prompt, long answer” tasks (writing, code generation) are dominated by output price, while “long prompt, short answer” tasks (summarization, classification) are dominated by input price.
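A minimal sketch of the per-request cost, billing input and output tokens separately. It uses Claude Sonnet 4.6's prices from the table; the two workload shapes (token counts) are hypothetical, chosen only to contrast the two task types.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Price:
    """USD per million tokens, as listed in the table."""
    input: float
    output: float


def request_cost(price: Price, input_tokens: int, output_tokens: int) -> float:
    """USD cost of one request: input and output tokens billed at separate rates."""
    return (input_tokens * price.input + output_tokens * price.output) / 1_000_000


# Input and Output columns for Claude Sonnet 4.6.
CLAUDE_SONNET_4_6 = Price(input=3.00, output=15.00)

# Hypothetical workloads: (input tokens, output tokens) per request.
WORKLOADS = {
    "code generation (short prompt, long answer)": (500, 4_000),
    "summarization (long prompt, short answer)": (20_000, 300),
}

for name, (inp, out) in WORKLOADS.items():
    cost = request_cost(CLAUDE_SONNET_4_6, inp, out)
    # Fraction of the bill that comes from generated tokens.
    output_share = out * CLAUDE_SONNET_4_6.output / 1_000_000 / cost
    print(f"{name}: ${cost:.4f} per request, {output_share:.0%} from output")
```

The two requests cost almost the same (about $0.06 each), but nearly all of the code-generation bill comes from output while most of the summarization bill comes from input. That split tells you which column to compare when shopping for a model.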
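Use prompt caching. If your requests share a large repeated prefix (a fixed system prompt, RAG context), cached input tokens are billed at a fraction of the normal input rate: roughly 10–25% for most models in the table, and under 2% for DeepSeek V4 Flash and V4 Pro. DeepSeek, OpenAI, Anthropic and Google all support context caching, and the table also lists cached rates for some Zhipu AI, Moonshot, xAI and ByteDance models, but the discount varies a lot.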
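The effective input price under caching is a weighted average of the cached and normal rates, weighted by the share of input tokens that hit the cache. A minimal sketch, assuming a flat cached rate and ignoring any cache-write or storage fees, which some providers bill separately; the hit rates are hypothetical.

```python
def blended_input_price(input_price: float, cached_price: float, hit_rate: float) -> float:
    """Average USD per million input tokens when a fraction `hit_rate` of them
    is billed at the cached rate and the rest at the normal rate.

    Assumption: ignores cache-write or storage fees that some providers charge.
    """
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    return hit_rate * cached_price + (1.0 - hit_rate) * input_price


# Claude Sonnet 4.6 from the table: $3.00 input, $0.30 cached input.
for hit_rate in (0.0, 0.5, 0.9):
    price = blended_input_price(3.00, 0.30, hit_rate)
    print(f"{hit_rate:.0%} cache hits -> ${price:.2f} per 1M input tokens")
```

With a long shared system prompt, hit rates near 90% are plausible, which brings this model's effective input price from $3.00 down to about $0.57 per million tokens.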
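Don’t default to flagships. Mid-tier models like Claude Haiku 4.5, Gemini 2.5 Flash-Lite, Qwen3.5 Flash and DeepSeek V4 Flash are excellent value. For chat, translation, simple generation and classification they are usually more than enough, at roughly 1–20% of flagship prices: Claude Haiku 4.5 costs one-fifth as much as Claude Opus 4.7, while Qwen3.5 Flash's output price is about 1% of GPT-5.5's.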
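A sketch that prices one hypothetical monthly workload (200M input and 50M output tokens) on the models named above and reports each as a share of Claude Opus 4.7, used here as the flagship baseline. Prices come from the table; the traffic volumes are assumptions.

```python
# (input, output) USD per million tokens, copied from the table.
PRICES = {
    "Claude Opus 4.7": (5.00, 25.00),
    "Claude Haiku 4.5": (1.00, 5.00),
    "Gemini 2.5 Flash-Lite": (0.10, 0.40),
    "DeepSeek V4 Flash": (0.15, 0.29),
    "Qwen3.5 Flash": (0.029, 0.29),
}


def monthly_cost(model: str, input_mtok: float, output_mtok: float) -> float:
    """USD for a month of traffic, with volumes given in millions of tokens."""
    inp, out = PRICES[model]
    return input_mtok * inp + output_mtok * out


# Hypothetical workload: 200M input and 50M output tokens per month.
INPUT_MTOK, OUTPUT_MTOK = 200, 50
baseline = monthly_cost("Claude Opus 4.7", INPUT_MTOK, OUTPUT_MTOK)

for model in PRICES:
    cost = monthly_cost(model, INPUT_MTOK, OUTPUT_MTOK)
    print(f"{model:<22} ${cost:>9,.2f}  ({cost / baseline:.1%} of Opus 4.7)")
```

The exact shares depend on the input/output mix, so rerun with your own volumes. A lower price only pays off if the smaller model's quality holds up on your task.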
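Chinese models are aggressively priced. DeepSeek V4, Qwen, Doubao and Kimi often undercut Western models on output price, in some cases by an order of magnitude or more: DeepSeek V4 Pro charges $0.88 per million output tokens, versus $15.00 for Claude Sonnet 4.6 and $25.00 for Claude Opus 4.7. If latency to China matters or you’re cost-sensitive, they’re worth evaluating.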