Leaderboard

Explore how AI models perform across our five core evaluation categories. Rankings are based on real-world conversations and human evaluations, and models are listed in order of their overall score.
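
The page does not say how the overall score is derived from the five category scores. In the rows checked here, it matches their unweighted mean to within 0.02 points. A plain average, possibly of unrounded values, is therefore a plausible guess but not a documented rule. The sketch below assumes that averaging rule and checks it against three rows from the table below. The helper name `overall` is hypothetical.

```python
from statistics import mean

# (listed overall score, five category scores) copied from the table.
# The category names are not shown on the page, so the scores stay in a plain list.
rows = {
    "claude-opus-4-7": (95.30, [94.5, 93.5, 96.5, 94.5, 97.5]),
    "claude-opus-4-6": (93.18, [93.5, 92.5, 92.0, 92.3, 95.5]),
    "gemini-3.1-pro-preview": (91.72, [92.5, 92.0, 89.1, 89.0, 96.0]),
}


def overall(scores: list[float]) -> float:
    """Assumed rule (not confirmed by the page): unweighted mean of five category scores."""
    if len(scores) != 5:
        raise ValueError("expected exactly five category scores")
    return mean(scores)


# Compare the listed overall score with the assumed mean for each row.
for model_id, (listed, scores) in rows.items():
    computed = overall(scores)
    print(f"{model_id}: listed {listed:.2f}, mean {computed:.2f}, diff {listed - computed:+.2f}")
```

Small differences, such as the 0.02 gap for `claude-opus-4-6`, are consistent with the site averaging category scores before rounding them for display. This remains an assumption.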

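Rank	Model (API ID)	Overall	Category 1	Category 2	Category 3	Category 4	Category 5	(unlabeled)	Price per 1M tokens (input / output)	Context window	Max output tokens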
#1	Claude Opus 4.7 (claude-opus-4-7)	95.30	94.5	93.5	96.5	94.5	97.5	65.0	$5.00 / $25.00	200K	8.2K
#2	Claude Opus 4.6 (claude-opus-4-6)	93.18	93.5	92.5	92.0	92.3	95.5	45.0	$5.00 / $25.00	200K	128K
#3	Gemini 3.1 Pro (gemini-3.1-pro-preview)	91.72	92.5	92.0	89.1	89.0	96.0	85.0	$2.00 / $12.00	1M	65.5K
#4	Claude Opus 4.5 (claude-opus-4-5-202511...)	91.45	89.5	90.0	95.3	88.6	93.8	60.0	$5.00 / $25.00	200K	64K
#5	Claude Sonnet 4.6 (claude-sonnet-4-6)	90.62	90.5	88.0	92.5	89.1	93.0	80.0	$3.00 / $15.00	200K	8.2K
#6	GPT-5.4 (gpt-5.4)	90.53	92.7	90.0	91.3	88.3	90.5	82.2	$2.50 / $15.00	1.1M	128K
#7	Gemini 3 Pro (gemini-3-pro-preview)	90.15	92.2	86.5	89.2	88.7	94.3	60.0	$2.00 / $12.00	2M	65.5K
#8	Claude Sonnet 4.5 (claude-sonnet-4-5)	89.66	91.5	85.8	89.2	89.5	92.3	68.5	$3.00 / $15.00	200K	64K
#9	GPT 5.1 (gpt-5.1)	89.34	92.5	87.5	90.1	82.1	94.5	80.0	$1.25 / $10.00	400K	128K
#10	o3 (o3-2025-04-16)	89.16	89.5	85.3	88.5	88.5	94.0	58.0	$2.00 / $8.00	200K	100K
#11	GPT 5.2 (gpt-5.2)	89.08	93.5	86.7	87.3	82.5	95.3	75.0	$1.75 / $14.00	400K	128K
#12	GLM 5 (glm-5)	87.54	93.0	85.7	82.2	86.3	90.6	72.0	$1.00 / $3.20	200K	128K
#13	GPT 5 (gpt-5)	87.53	91.0	85.7	87.2	80.3	93.5	72.0	$1.25 / $10.00	400K	128K
#14	Grok 4.1 Fast Reasoning (grok-4-1-fast-reasonin...)	86.88	92.5	82.3	86.8	79.8	93.0	62.0	$0.20 / $0.50	2M	8.2K
#15	Claude Opus 4.1 (claude-opus-4-1)	86.61	90.0	80.3	89.5	82.8	90.5	54.0	$15.00 / $75.00	200K	8.2K
#16	Grok 4.1 Fast (grok-4-1-fast-non-reas...)	85.92	90.3	82.9	86.1	78.3	92.1	78.0	$0.20 / $0.50	2M	8.2K
#17	Kimi K2.5 (kimi-k2.5)	85.92	91.1	80.5	87.5	82.3	88.2	65.0	$0.55 / $2.75	256K	4.1K
#18	ChatGPT 4o (chatgpt-4o-latest)	85.84	88.5	84.5	87.7	81.5	87.0	82.2	$5.00 / $15.00	128K	16.4K
#19	Gemini 2.5 Pro (gemini-2.5-pro)	85.78	88.3	86.4	86.8	78.3	89.2	60.4	$2.00 / $12.00	1M	65.5K
#20	Claude Haiku 4.5 (claude-haiku-4-5)	85.52	87.5	79.0	85.6	87.5	88.0	93.0	$1.00 / $5.00	200K	64K
#21	Grok 4 Fast Reasoning (grok-4-fast-reasoning)	84.65	89.0	80.8	87.0	76.0	90.5	88.0	$0.20 / $0.50	2M	8.2K
#22	DeepSeek V3.2 Thinking (deepseek-reasoner-v3.2)	84.48	89.0	71.0	88.8	79.7	94.0	45.0	$0.14 / $0.28	160K	32.8K
#23	Gemini 3 Flash (gemini-3-flash-preview)	84.47	88.5	79.8	89.1	74.5	90.4	70.0	$0.50 / $3.00	1M	65.5K
#24	GPT-5 Mini (gpt-5-mini)	84.30	86.0	80.0	86.5	81.0	88.0	94.0	$0.25 / $2.00	400K	128K
#25	o4-mini (o4-mini)	84.13	89.5	81.5	86.7	72.0	91.0	95.0	$1.10 / $4.40	200K	100K

Showing 25 of 51 models
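
The price column gives two dollar figures for each model. The page does not state the unit. The sketch below assumes the usual API convention: US dollars per one million input tokens and per one million output tokens. Under that assumption, it estimates the cost of a single request. The helper names `parse_price` and `request_cost` and the 10,000-input / 2,000-output workload are hypothetical.

```python
def parse_price(cell: str) -> tuple[float, float]:
    """Parse a price cell such as '$5.00 / $25.00' into (input, output) rates.

    Assumed unit: USD per 1M tokens (not stated on the page).
    """
    inp, out = (part.strip().lstrip("$") for part in cell.split("/"))
    return float(inp), float(out)


def request_cost(cell: str, input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost of one request under the per-1M-token assumption."""
    in_rate, out_rate = parse_price(cell)
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000


# Hypothetical workload: 10,000 input tokens and 2,000 output tokens per request.
for model_id, cell in [
    ("claude-opus-4-7", "$5.00 / $25.00"),
    ("gpt-5.4", "$2.50 / $15.00"),
    ("deepseek-reasoner-v3.2", "$0.14 / $0.28"),
]:
    print(f"{model_id}: ${request_cost(cell, 10_000, 2_000):.4f}")
```

Every model listed charges more per output token than per input token. As a result, the input-to-output ratio of a real workload can change which model is cheapest. For example, `gemini-3-flash-preview` ($0.50 / $3.00) is cheaper than `kimi-k2.5` ($0.55 / $2.75) for input-heavy requests but not for output-heavy ones.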