Model Leaderboard
Models are ranked by Elo rating computed from head-to-head comparisons; a higher Elo rating indicates better performance.
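The page does not publish its rating parameters, so the following is only a minimal sketch of a standard Elo update. It assumes every model starts at 1000 and uses a K-factor of 32; both values are assumptions. The function names `expected_score`, `update_elo`, and `replay` are hypothetical and not part of any leaderboard API.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that A beats B under the standard Elo logistic model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_elo(rating_a: float, rating_b: float, score_a: float,
               k: float = 32.0) -> tuple[float, float]:
    """Return updated (rating_a, rating_b) after one comparison.

    score_a is 1.0 if A won, 0.0 if A lost, 0.5 for a tie.
    k = 32 is an assumed K-factor, not a value stated by the leaderboard.
    """
    delta = k * (score_a - expected_score(rating_a, rating_b))
    # Zero-sum update: B loses exactly the points A gains (and vice versa).
    return rating_a + delta, rating_b - delta


def replay(comparisons: list[tuple[str, str]], initial: float = 1000.0,
           k: float = 32.0) -> dict[str, float]:
    """Apply (winner, loser) pairs in order; unseen models start at `initial`."""
    ratings: dict[str, float] = {}
    for winner, loser in comparisons:
        r_w = ratings.get(winner, initial)
        r_l = ratings.get(loser, initial)
        ratings[winner], ratings[loser] = update_elo(r_w, r_l, 1.0, k)
    return ratings
```

For example, `replay([("Model A", "Model B")])` (hypothetical model names) gives Model A 1016.0 and Model B 984.0: with equal ratings the expected score is 0.5, so the winner gains 32 × 0.5 = 16 points and the loser drops by the same amount. Because updates are applied sequentially, replaying the same results in a different order can produce slightly different final ratings.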
Global Rankings
| Rank | Model | Elo Rating | Win Rate | W / L | Comparisons |
|---|---|---|---|---|---|
| 🥇 | Claude Opus | 1000 | 83.3% | 5 / 1 | 6 |
| 🥈 | Gemini 3.0 Pro | 1000 | 63.6% | 7 / 4 | 11 |
| 🥉 | Gemini 3.0 Flash | 1000 | 66.7% | 4 / 2 | 6 |
| #4 | GPT-5.2 Medium | 1000 | 50.0% | 2 / 2 | 4 |
| #5 | Gemma 3 4B | 1000 | 33.3% | 1 / 2 | 3 |
| #6 | GPT-5.2 Low | 1000 | 33.3% | 1 / 2 | 3 |
| #7 | GPT-5 Mini | 1000 | 25.0% | 1 / 3 | 4 |
| #8 | Claude 3.5 Sonnet | 1000 | 0.0% | 0 / 5 | 5 |
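The Win Rate and Comparisons columns follow directly from W / L. The sketch below recomputes them from the table's own counts, assuming no ties are recorded (every row has Comparisons = W + L). `RECORDS` and `win_rate` are illustrative names only.

```python
# (wins, losses) per model, transcribed from the Global Rankings table.
RECORDS: dict[str, tuple[int, int]] = {
    "Claude Opus": (5, 1),
    "Gemini 3.0 Pro": (7, 4),
    "Gemini 3.0 Flash": (4, 2),
    "GPT-5.2 Medium": (2, 2),
    "Gemma 3 4B": (1, 2),
    "GPT-5.2 Low": (1, 2),
    "GPT-5 Mini": (1, 3),
    "Claude 3.5 Sonnet": (0, 5),
}


def win_rate(wins: int, losses: int) -> float:
    """Win rate in percent; assumes no ties, so comparisons = wins + losses."""
    comparisons = wins + losses
    return 100.0 * wins / comparisons if comparisons else 0.0


for model, (wins, losses) in RECORDS.items():
    print(f"{model}: {win_rate(wins, losses):.1f}% over {wins + losses} comparisons")

# Each decisive comparison yields one win and one loss, so the totals should match.
total_wins = sum(w for w, _ in RECORDS.values())
total_losses = sum(l for _, l in RECORDS.values())
print(f"total wins = {total_wins}, total losses = {total_losses}")
```

Wins and losses both total 21 across the eight rows, which is consistent with each comparison being between two listed models and producing exactly one win and one loss.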