Leaderboard
Explore how AI models perform across our five core evaluation categories. Rankings are based on real-world conversations and human evaluations, measuring what truly matters in an AI assistant.
Rank | Model | Overall | | | | | | | Price, input / output (USD per 1M tokens) | Context window | Max output |
|---|---|---|---|---|---|---|---|---|---|---|---|
#1 | Claude Opus 4.6 (New) claude-opus-4-6 | 93.18 | 93.5 | 92.5 | 92.0 | 92.3 | 95.5 | 45.0 | $5.00 / $25.00 | 200K | 128K |
#2 | Claude Opus 4.5 claude-opus-4-5-202511... | 91.45 | 89.5 | 90.0 | 95.3 | 88.6 | 93.8 | 60.0 | $5.00 / $25.00 | 200K | 64K |
#3 | Gemini 3 Pro gemini-3-pro-preview | 90.15 | 92.2 | 86.5 | 89.2 | 88.7 | 94.3 | 60.0 | $2.00 / $12.00 | 2M | 65.5K |
#4 | Claude Sonnet 4.5 claude-sonnet-4-5 | 89.66 | 91.5 | 85.8 | 89.2 | 89.5 | 92.3 | 68.5 | $3.00 / $15.00 | 200K | 64K |
#5 | GPT 5.1 gpt-5.1 | 89.34 | 92.5 | 87.5 | 90.1 | 82.1 | 94.5 | 80.0 | $1.25 / $10.00 | 400K | 128K |
#6 | o3 o3-2025-04-16 | 89.16 | 89.5 | 85.3 | 88.5 | 88.5 | 94.0 | 58.0 | $2.00 / $8.00 | 200K | 100K |
#7 | GPT 5.2 gpt-5.2 | 89.08 | 93.5 | 86.7 | 87.3 | 82.5 | 95.3 | 75.0 | $1.75 / $14.00 | 400K | 128K |
#8 | GLM 5 (New) glm-5 | 87.54 | 93.0 | 85.7 | 82.2 | 86.3 | 90.6 | 72.0 | $1.00 / $3.20 | 200K | 128K |
#9 | GPT 5 gpt-5 | 87.53 | 91.0 | 85.7 | 87.2 | 80.3 | 93.5 | 72.0 | $1.25 / $10.00 | 400K | 128K |
#10 | Grok 4.1 Fast Reasoning grok-4-1-fast-reasonin... | 86.88 | 92.5 | 82.3 | 86.8 | 79.8 | 93.0 | 62.0 | $0.20 / $0.50 | 2M | 8.2K |
#11 | Claude Opus 4.1 claude-opus-4-1 | 86.61 | 90.0 | 80.3 | 89.5 | 82.8 | 90.5 | 54.0 | $15.00 / $75.00 | 200K | 8.2K |
#12 | Grok 4.1 Fast grok-4-1-fast-non-reas... | 85.92 | 90.3 | 82.9 | 86.1 | 78.3 | 92.1 | 78.0 | $0.20 / $0.50 | 2M | 8.2K |
#13 | Kimi K2.5 kimi-k2.5 | 85.92 | 91.1 | 80.5 | 87.5 | 82.3 | 88.2 | 65.0 | $0.55 / $2.75 | 256K | 4.1K |
#14 | ChatGPT 4o chatgpt-4o-latest | 85.84 | 88.5 | 84.5 | 87.7 | 81.5 | 87.0 | 82.2 | $5.00 / $15.00 | 128K | 16.4K |
#15 | Gemini 2.5 Pro gemini-2.5-pro | 85.78 | 88.3 | 86.4 | 86.8 | 78.3 | 89.2 | 60.4 | $2.00 / $12.00 | 1M | 65.5K |
#16 | Claude Haiku 4.5 claude-haiku-4-5 | 85.52 | 87.5 | 79.0 | 85.6 | 87.5 | 88.0 | 93.0 | $1.00 / $5.00 | 200K | 64K |
#17 | Grok 4 Fast Reasoning grok-4-fast-reasoning | 84.65 | 89.0 | 80.8 | 87.0 | 76.0 | 90.5 | 88.0 | $0.20 / $0.50 | 2M | 8.2K |
#18 | DeepSeek V3.2 Thinking deepseek-reasoner-v3.2 | 84.48 | 89.0 | 71.0 | 88.8 | 79.7 | 94.0 | 45.0 | $0.14 / $0.28 | 160K | 32.8K |
#19 | Gemini 3 Flash gemini-3-flash-preview | 84.47 | 88.5 | 79.8 | 89.1 | 74.5 | 90.4 | 70.0 | $0.50 / $3.00 | 1M | 65.5K |
#20 | GPT-5 Mini gpt-5-mini | 84.30 | 86.0 | 80.0 | 86.5 | 81.0 | 88.0 | 94.0 | $0.25 / $2.00 | 400K | 128K |
#21 | o4-mini o4-mini | 84.13 | 89.5 | 81.5 | 86.7 | 72.0 | 91.0 | 95.0 | $1.10 / $4.40 | 200K | 100K |
#22 | Grok 4 Fast grok-4-fast-non-reason... | 83.54 | 87.0 | 82.0 | 84.7 | 75.5 | 88.5 | 93.0 | $0.20 / $0.50 | 2M | 8.2K |
#23 | o1 o1 | 83.35 | 87.5 | 78.0 | 84.3 | 78.0 | 89.0 | 65.0 | $15.00 / $60.00 | 200K | 100K |
#24 | DeepSeek V3.1 Thinking deepseek-reasoner | 82.72 | 87.0 | 68.0 | 86.0 | 82.5 | 90.0 | 40.0 | $0.07 / $1.68 | 128K | 32.8K |
#25 | DeepSeek V3.2 deepseek-v3.2-exp | 81.95 | 86.5 | 76.0 | 84.3 | 76.0 | 87.0 | 95.0 | $0.07 / $0.14 | 160K | 8.2K |
Showing 25 of 47 models
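The rows above can be read programmatically. The following is a minimal sketch, not the leaderboard's own method, that copies four rows from the table and does two things: it checks that the overall score sits within rounding of the unweighted mean of the five category scores (an inference from the numbers, not a formula the page states), and it estimates the cost of one call from the price column, assuming prices are USD per 1M input and output tokens. The names `Row`, `ROWS`, `mean_of_categories`, and `request_cost` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Row:
    """One leaderboard row; values are copied from the table above."""
    model: str
    overall: float
    categories: tuple[float, float, float, float, float]  # the five category scores
    price_in: float   # assumption: USD per 1M input tokens
    price_out: float  # assumption: USD per 1M output tokens


# Hypothetical sample: four rows transcribed from the leaderboard.
ROWS = [
    Row("Claude Opus 4.6", 93.18, (93.5, 92.5, 92.0, 92.3, 95.5), 5.00, 25.00),
    Row("Claude Haiku 4.5", 85.52, (87.5, 79.0, 85.6, 87.5, 88.0), 1.00, 5.00),
    Row("GPT 5.1", 89.34, (92.5, 87.5, 90.1, 82.1, 94.5), 1.25, 10.00),
    Row("DeepSeek V3.2", 81.95, (86.5, 76.0, 84.3, 76.0, 87.0), 0.07, 0.14),
]


def mean_of_categories(row: Row) -> float:
    """Unweighted mean of the five category scores (assumed aggregation)."""
    return sum(row.categories) / len(row.categories)


def request_cost(row: Row, input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost of one call, assuming per-1M-token pricing."""
    return (input_tokens * row.price_in + output_tokens * row.price_out) / 1_000_000


for row in ROWS:
    mean = mean_of_categories(row)
    # Category scores are shown to one decimal, so the mean can drift from
    # the displayed overall score by roughly 0.05; allow a small margin.
    assert abs(mean - row.overall) <= 0.06, row.model
    # Hypothetical workload: 10,000 input tokens and 2,000 output tokens.
    cost = request_cost(row, input_tokens=10_000, output_tokens=2_000)
    print(f"{row.model:<18} mean={mean:6.2f} overall={row.overall:6.2f} cost=${cost:.4f}")
```

For the four sampled rows, the mean of the category columns lands within a few hundredths of the overall score, which suggests (but does not prove) an equal weighting of the five categories. The cost estimate separates input and output pricing because output tokens are priced higher for every model listed.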