Leaderboard

Explore how AI models perform across our five core evaluation categories. Rankings are based on real-world conversations and human evaluations, measuring what truly matters in an AI assistant.

| Rank | Model | Model ID | Overall | Category 1 | Category 2 | Category 3 | Category 4 | Category 5 | | Price (input / output) | Context | Max output |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| #1 | Claude Opus 4.6 (New) | claude-opus-4-6 | 93.18 | 93.5 | 92.5 | 92.0 | 92.3 | 95.5 | 45.0 | $5.00 / $25.00 | 200K | 128K |
| #2 | Claude Opus 4.5 | claude-opus-4-5-202511... | 91.45 | 89.5 | 90.0 | 95.3 | 88.6 | 93.8 | 60.0 | $5.00 / $25.00 | 200K | 64K |
| #3 | Gemini 3 Pro | gemini-3-pro-preview | 90.15 | 92.2 | 86.5 | 89.2 | 88.7 | 94.3 | 60.0 | $2.00 / $12.00 | 2M | 65.5K |
| #4 | Claude Sonnet 4.5 | claude-sonnet-4-5 | 89.66 | 91.5 | 85.8 | 89.2 | 89.5 | 92.3 | 68.5 | $3.00 / $15.00 | 200K | 64K |
| #5 | GPT 5.1 | gpt-5.1 | 89.34 | 92.5 | 87.5 | 90.1 | 82.1 | 94.5 | 80.0 | $1.25 / $10.00 | 400K | 128K |
| #6 | o3 | o3-2025-04-16 | 89.16 | 89.5 | 85.3 | 88.5 | 88.5 | 94.0 | 58.0 | $2.00 / $8.00 | 200K | 100K |
| #7 | GPT 5.2 | gpt-5.2 | 89.08 | 93.5 | 86.7 | 87.3 | 82.5 | 95.3 | 75.0 | $1.75 / $14.00 | 400K | 128K |
| #8 | GLM 5 (New) | glm-5 | 87.54 | 93.0 | 85.7 | 82.2 | 86.3 | 90.6 | 72.0 | $1.00 / $3.20 | 200K | 128K |
| #9 | GPT 5 | gpt-5 | 87.53 | 91.0 | 85.7 | 87.2 | 80.3 | 93.5 | 72.0 | $1.25 / $10.00 | 400K | 128K |
| #10 | Grok 4.1 Fast Reasoning | grok-4-1-fast-reasonin... | 86.88 | 92.5 | 82.3 | 86.8 | 79.8 | 93.0 | 62.0 | $0.20 / $0.50 | 2M | 8.2K |
| #11 | Claude Opus 4.1 | claude-opus-4-1 | 86.61 | 90.0 | 80.3 | 89.5 | 82.8 | 90.5 | 54.0 | $15.00 / $75.00 | 200K | 8.2K |
| #12 | Grok 4.1 Fast | grok-4-1-fast-non-reas... | 85.92 | 90.3 | 82.9 | 86.1 | 78.3 | 92.1 | 78.0 | $0.20 / $0.50 | 2M | 8.2K |
| #13 | Kimi K2.5 | kimi-k2.5 | 85.92 | 91.1 | 80.5 | 87.5 | 82.3 | 88.2 | 65.0 | $0.55 / $2.75 | 256K | 4.1K |
| #14 | ChatGPT 4o | chatgpt-4o-latest | 85.84 | 88.5 | 84.5 | 87.7 | 81.5 | 87.0 | 82.2 | $5.00 / $15.00 | 128K | 16.4K |
| #15 | Gemini 2.5 Pro | gemini-2.5-pro | 85.78 | 88.3 | 86.4 | 86.8 | 78.3 | 89.2 | 60.4 | $2.00 / $12.00 | 1M | 65.5K |
| #16 | Claude Haiku 4.5 | claude-haiku-4-5 | 85.52 | 87.5 | 79.0 | 85.6 | 87.5 | 88.0 | 93.0 | $1.00 / $5.00 | 200K | 64K |
| #17 | Grok 4 Fast Reasoning | grok-4-fast-reasoning | 84.65 | 89.0 | 80.8 | 87.0 | 76.0 | 90.5 | 88.0 | $0.20 / $0.50 | 2M | 8.2K |
| #18 | DeepSeek V3.2 Thinking | deepseek-reasoner-v3.2 | 84.48 | 89.0 | 71.0 | 88.8 | 79.7 | 94.0 | 45.0 | $0.14 / $0.28 | 160K | 32.8K |
| #19 | Gemini 3 Flash | gemini-3-flash-preview | 84.47 | 88.5 | 79.8 | 89.1 | 74.5 | 90.4 | 70.0 | $0.50 / $3.00 | 1M | 65.5K |
| #20 | GPT-5 Mini | gpt-5-mini | 84.30 | 86.0 | 80.0 | 86.5 | 81.0 | 88.0 | 94.0 | $0.25 / $2.00 | 400K | 128K |
| #21 | o4-mini | o4-mini | 84.13 | 89.5 | 81.5 | 86.7 | 72.0 | 91.0 | 95.0 | $1.10 / $4.40 | 200K | 100K |
| #22 | Grok 4 Fast | grok-4-fast-non-reason... | 83.54 | 87.0 | 82.0 | 84.7 | 75.5 | 88.5 | 93.0 | $0.20 / $0.50 | 2M | 8.2K |
| #23 | o1 | o1 | 83.35 | 87.5 | 78.0 | 84.3 | 78.0 | 89.0 | 65.0 | $15.00 / $60.00 | 200K | 100K |
| #24 | DeepSeek V3.1 Thinking | deepseek-reasoner | 82.72 | 87.0 | 68.0 | 86.0 | 82.5 | 90.0 | 40.0 | $0.07 / $1.68 | 128K | 32.8K |
| #25 | DeepSeek V3.2 | deepseek-v3.2-exp | 81.95 | 86.5 | 76.0 | 84.3 | 76.0 | 87.0 | 95.0 | $0.07 / $0.14 | 160K | 8.2K |

Showing 25 of 47 models
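
The page does not say how each model's headline score relates to the five core evaluation categories. As a rough check, the sketch below assumes that the first score listed for a model is an overall score and that the next five numbers are the category scores (both are assumptions, since the column labels are not shown), then measures how far the listed score sits from the unweighted mean of those five. The helper names `mean_gap` and `max_gap` are illustrative only.

```python
# Rows copied from the leaderboard: (model, first listed score, next five scores).
# Assumption: the first score is an overall score and the next five are the
# category scores; the page itself does not label these columns.
ROWS = [
    ("Claude Opus 4.6", 93.18, [93.5, 92.5, 92.0, 92.3, 95.5]),
    ("Claude Opus 4.5", 91.45, [89.5, 90.0, 95.3, 88.6, 93.8]),
    ("Gemini 3 Pro", 90.15, [92.2, 86.5, 89.2, 88.7, 94.3]),
    ("Claude Sonnet 4.5", 89.66, [91.5, 85.8, 89.2, 89.5, 92.3]),
    ("GPT 5.1", 89.34, [92.5, 87.5, 90.1, 82.1, 94.5]),
]


def mean_gap(listed, categories):
    """Absolute difference between the listed score and the unweighted mean."""
    return abs(sum(categories) / len(categories) - listed)


def max_gap(rows):
    """Largest gap across all rows."""
    return max(mean_gap(listed, cats) for _, listed, cats in rows)


if __name__ == "__main__":
    for name, listed, cats in ROWS:
        mean = sum(cats) / len(cats)
        print(f"{name}: listed {listed:.2f}, mean of five {mean:.2f}")
    print(f"largest gap: {max_gap(ROWS):.2f}")
```

For these five rows the listed score stays within a few hundredths of the simple mean. That is consistent with an unweighted (or nearly unweighted) average of unrounded category scores, but it does not confirm the site's actual formula.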