Rankings
Arena.ai leaderboards: Elo ratings based on blind head-to-head votes.
Code generation, debugging, and agentic coding.
May 25, 2026, 06:07
20 models
| # | Model | Organization | License | Elo | 95% CI | Votes |
|---|---|---|---|---|---|---|
| 1 | claude-opus-4-7-thinking | Anthropic | Proprietary | 1567 | ±10 | 4.5k |
| 2 | claude-opus-4-7 | Anthropic | Proprietary | 1560 | ±10 | 4.2k |
| 3 | claude-opus-4-6-thinking | Anthropic | Proprietary | 1545 | ±8 | 7.2k |
| 4 | claude-opus-4-6 | Anthropic | Proprietary | 1540 | ±8 | 8.2k |
| 5 | glm-5.1 | Z.ai | Open weights | 1532 | ±11 | 3.6k |
| 6 | claude-sonnet-4-6 | Anthropic | Proprietary | 1524 | ±7 | 10.4k |
| 7 | kimi-k2.6 | Moonshot | Open weights | 1519 | ±11 | 3.4k |
| 8 | muse-spark | Meta | Proprietary | 1509 | ±16 | 1.6k |
| 9 | gemini-3.5-flash | Google | Proprietary | 1507 | ±14 | 2.1k |
| 10 | gpt-5.5-xhigh (codex-harness) | OpenAI | Proprietary | 1503 | ±11 | 3.5k |
| 11 | qwen3.6-max-preview | Alibaba | Proprietary | 1491 | ±14 | 2.1k |
| 12 | claude-opus-4-5-20251101-thinking-32k | Anthropic | Proprietary | 1490 | ±7 | 13.1k |
| 13 | gpt-5.5-high (codex-harness) | OpenAI | Proprietary | 1480 | ±11 | 3.7k |
| 14 | mimo-v2.5-pro | Xiaomi | Open weights | 1471 | ±10 | 4.1k |
| 15 | claude-opus-4-5-20251101 | Anthropic | Proprietary | 1467 | ±6 | 15.3k |
| 16 | qwen3.6-plus | Alibaba | Proprietary | 1461 | ±9 | 5.4k |
| 17 | deepseek-v4-pro-thinking | DeepSeek | Open weights | 1459 | ±11 | 3.3k |
| 18 | gpt-5.4-high (codex-harness) | OpenAI | Proprietary | 1457 | ±17 | 1.5k |
| 19 | gemini-3.1-pro-preview | Google | Proprietary | 1450 | ±7 | 9.6k |
| 20 | gpt-5.5 (codex-harness) | OpenAI | Proprietary | 1440 | ±11 | 3.4k |
Elo score based on blind head-to-head votes; higher is better. ± is the 95% confidence interval. Same format as Arena.ai.
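To read the table, it can help to translate a rating gap into an expected head-to-head win rate and to check whether two models' ± intervals overlap. The sketch below assumes the classic logistic Elo formula (base 10, scale 400). Arena.ai may compute its scores differently (for example, via a Bradley-Terry fit rather than sequential Elo updates). The interval-overlap check is only a rough heuristic: non-overlapping 95% intervals suggest a real gap, but overlapping intervals do not by themselves prove the models are tied. The function names are illustrative, not part of any Arena.ai API.

```python
def expected_win_prob(elo_a: float, elo_b: float) -> float:
    """Expected probability that model A is preferred over model B,
    assuming the standard logistic Elo model (base 10, scale 400)."""
    return 1.0 / (1.0 + 10 ** ((elo_b - elo_a) / 400.0))


def intervals_overlap(elo_a: float, ci_a: float, elo_b: float, ci_b: float) -> bool:
    """Rough heuristic: True if [elo - ci, elo + ci] ranges of A and B overlap."""
    return (elo_a - ci_a) <= (elo_b + ci_b) and (elo_b - ci_b) <= (elo_a + ci_a)


if __name__ == "__main__":
    # Ratings and ± values taken from the table: rank 1 vs rank 2, rank 1 vs rank 20.
    pairs = [
        ("claude-opus-4-7-thinking", 1567, 10, "claude-opus-4-7", 1560, 10),
        ("claude-opus-4-7-thinking", 1567, 10, "gpt-5.5 (codex-harness)", 1440, 11),
    ]
    for name_a, elo_a, ci_a, name_b, elo_b, ci_b in pairs:
        p = expected_win_prob(elo_a, elo_b)
        overlap = intervals_overlap(elo_a, ci_a, elo_b, ci_b)
        print(f"{name_a} vs {name_b}: P(win) = {p:.3f}, intervals overlap: {overlap}")
```

Under this model, a 7-point gap (ranks 1 and 2) corresponds to roughly a 51% expected win rate, with overlapping intervals, whereas the 127-point gap between ranks 1 and 20 corresponds to roughly 67% and non-overlapping intervals.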
Source: Arena.ai