If not, then I’m not using it.
Cancelled my account 3 months ago; only Claude Code-level capability would bring me back.
For reference, this is a Rust codebase, deep "systems" stuff (database, compiler, virtual machine / language runtime).
They're still months behind OpenAI and Anthropic on coding.
Mind you, I find Claude Code careless and unreliable these days, too (but at least it's good at tool use).
I do use Gemini for "lifestyle" AI usage (web research etc) tho.
I'm only gonna cry a little bit about the all-too-accurate roasts. Some of that stuff cut deep!
Feels like the AI pricing noose is tightening sooner rather than later.
Relatively speaking, here's where it's at:
score  age (days)  size   name
44.2           97  large  GLM-5 (Reasoning)
44.7          187  -      GPT-5.1 (high)
44.9           29  -      Qwen3.6 Max Preview
45              0  -      Gemini 3.5 Flash
45.5           27  large  MiMo-V2.5-Pro
45.6           75  -      GPT-5.4 (low)
This is from Artificial Analysis, generated using https://github.com/day50-dev/aa-eval-email/blob/main/art-ana...
I really don't know why people downvote me. What do I need to say to make free things that people like? Sincere question. I put a lot of time and generosity into these things, and all I usually get is a bunch of "fuck yous".
This is honestly an existential issue for me. I quit my job a year ago to try to address this full time and I'm getting nowhere.
We genuinely don't understand what your post is about. What is this tool? What do these numbers represent? Why are things sorted in that order?
You haven't really communicated anything at all. I am interested, and I'd like to understand. Please write a more complete post.
The JSON on the page has a coding index result that it hides from the table.
That's what this exposes. It's a ranking, by coding index, of basically every model that matters, taken from the leading evals company and presented in an easy-to-parse format that you can feed into model-routing harnesses in real time. For instance, your agents can dynamically upgrade themselves to better models as they come out, or cost-optimize based on eval results.
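To make the routing-harness idea concrete, here is a minimal Python sketch. It assumes the output is the whitespace-separated `score age size name` format from the table above, with age in days and "-" for a missing size. The names `ModelRow`, `parse_rows`, and `pick_model` and the `max_age_days` filter are hypothetical illustrations, not part of aa-eval-email.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ModelRow:
    score: float          # coding index, one decimal
    age_days: int         # days since release
    size: Optional[str]   # None when the table shows "-"
    name: str             # may contain spaces, e.g. "GPT-5.1 (high)"


def parse_rows(text: str) -> list[ModelRow]:
    """Parse whitespace-separated "score age size name" lines (hypothetical format)."""
    rows = []
    for line in text.strip().splitlines():
        parts = line.split(maxsplit=3)  # the name keeps its internal spaces
        if len(parts) < 4:
            continue  # blank or malformed line
        try:
            score, age = float(parts[0]), int(parts[1])
        except ValueError:
            continue  # header line or non-numeric fields
        size = None if parts[2] == "-" else parts[2]
        rows.append(ModelRow(score, age, size, parts[3]))
    return rows


def pick_model(rows: list[ModelRow], max_age_days: Optional[int] = None) -> ModelRow:
    """Return the highest-scoring row, optionally ignoring older models."""
    candidates = [r for r in rows if max_age_days is None or r.age_days <= max_age_days]
    if not candidates:
        raise ValueError("no model matches the constraints")
    return max(candidates, key=lambda r: r.score)
```

With the six rows shown above, `pick_model(rows)` would pick GPT-5.4 (low) at 45.6, while `pick_model(rows, max_age_days=30)` would pick MiMo-V2.5-Pro at 45.5.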
I do stuff like this, give it away for free, and it's either ignored or makes people angry...
I really wish I didn't piss people off with my sincerity, but somehow it always goes down that way.
I really appreciate your time, thank you so much.
"\(
  10 * (.codingIndex // 0) | round / 10
) \(
  (
    now - (
      .releaseDate |
      try ( strptime("%Y-%m-%d") | mktime )
      catch (now + 86400)
    )
  ) / 86400 | floor
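For anyone who doesn't read jq: 86400 is the number of seconds in a day, so the second column is whole days since release. Below is a rough Python equivalent of the fragment, a sketch that assumes each record has the `codingIndex` and `releaseDate` fields the jq refers to. The function name `score_and_age` is hypothetical, and in this sketch a missing or unparsable date gives an age of -1.

```python
import math
import time
from datetime import datetime, timezone
from typing import Optional

SECONDS_PER_DAY = 86400  # 24 h * 60 min * 60 s


def score_and_age(coding_index: Optional[float],
                  release_date: Optional[str],
                  now: Optional[float] = None) -> tuple[float, int]:
    """Return (score rounded to one decimal, whole days since release)."""
    if now is None:
        now = time.time()
    # Like `.codingIndex // 0`: treat a missing index as 0.
    value = coding_index or 0
    # Python's round() rounds halves to even; round half up instead
    # (coding indexes are non-negative).
    score = math.floor(10 * value + 0.5) / 10
    try:
        # Interpret the "YYYY-MM-DD" date as midnight UTC.
        released = datetime.strptime(release_date, "%Y-%m-%d").replace(
            tzinfo=timezone.utc).timestamp()
    except (TypeError, ValueError):
        # Like `catch (now + 86400)`: pretend the model is released a day
        # from now, which yields a negative age.
        released = now + SECONDS_PER_DAY
    age_days = math.floor((now - released) / SECONDS_PER_DAY)
    return score, age_days
```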
Real question. I see 86400 and I know it's time... That might just be me.
I'm not being an ass. I don't know how to talk to people, or how to tell when I think I'm being clear but I'm actually being cryptic.
Also, the message we should take from that table is not really obvious.
I know Artificial Analysis quite well as the gold standard in LLM evals.
But I guess they're still obscure.
I didn't think they were.
The age is important because new techniques keep being developed and so it is a very rough indicator of the size/cost/efficiency trade-off.
A model's age is a major indicator of what you can expect from it.
I really need to develop a better sense of what people know. That's only one of my problems.
Thanks for engaging with me.
I also know them, but it took me a while to realise you were publishing their data in that table. I don't think it was clear.
> The age is important because new techniques keep being developed and so it is a very rough indicator of the size/cost/efficiency trade-off.
Yes, but you are already including the model's name, and the people likely to read the table already know each model's release history, and therefore its age, at least roughly.