Arena AI Model ELO History
Hi HN,
I built a live tracker to visualize the lifecycle and performance changes of flagship AI models.
We've all seen a flagship model feel amazing at launch and then, a few weeks later, suddenly feel a bit off. I wanted to see whether this was just a feeling or a measurable reality, so I built a dashboard to track historical Elo ratings from Arena AI.
Instead of a massive spaghetti chart of every single model variant, the logic plots exactly ONE continuous curve per m