Model Leaderboard
Rankings are based on Elo ratings computed from head-to-head comparisons. A higher Elo rating indicates better performance. Models with no comparisons yet sit at 1000.
Global Rankings
| Rank | Model | Elo Rating | Win Rate | W / L | Comparisons |
|---|---|---|---|---|---|
| 🥇 | Gemma 3 4B | 1003 | 100.0% | 1 / 0 | 1 |
| 🥈 | GPT-5.2 Low | 1003 | 100.0% | 1 / 0 | 1 |
| 🥉 | Gemini 1.5 Pro | 1000 | n/a | 0 / 0 | 0 |
| #4 | GPT-5 Mini | 1000 | n/a | 0 / 0 | 0 |
| #5 | Claude Opus | 1000 | n/a | 0 / 0 | 0 |
| #6 | Gemini Pro | 1000 | n/a | 0 / 0 | 0 |
| #7 | GPT-4o | 1000 | n/a | 0 / 0 | 0 |
| #8 | Claude 3.5 Sonnet | 1000 | n/a | 0 / 0 | 0 |
| #9 | GPT-5.2 Medium | 995 | 0.0% | 0 / 2 | 2 |
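
For readers unfamiliar with Elo, the sketch below shows how a single head-to-head result would move two ratings under the standard Elo update rule. It is illustrative only: the leaderboard does not state its K-factor or any other parameters, so the value `k=5.0` and the function names `expected_score` and `update_elo` are assumptions made for this example, not the site's actual implementation.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that A beats B under the standard Elo logistic model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_elo(winner: float, loser: float, k: float = 5.0) -> tuple[float, float]:
    """Return the new (winner, loser) ratings after one comparison.

    k is a hypothetical K-factor chosen for illustration; the leaderboard
    does not publish the value it uses.
    """
    # The winner gains more when the win was less expected.
    delta = k * (1.0 - expected_score(winner, loser))
    # Elo is zero-sum: the loser drops by exactly what the winner gains.
    return winner + delta, loser - delta


# Two models that both start at 1000 and are compared once.
new_winner, new_loser = update_elo(1000.0, 1000.0)
print(new_winner, new_loser)  # 1002.5 997.5
```

Because each update is zero-sum, a win over an equally rated opponent moves both ratings by half the K-factor, and an upset win over a higher-rated opponent moves them by more.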