(Direct Link) https://raw.githubusercontent.com/KCORES/kcores-llm-arena/re...
It would be incredible to feed an entire codebase into a model and say "add this feature" or "we're having a bug where X is happening, tell me why", but then you're limited by the output token length.
As others have pointed out, the more tokens you use, the less accurate it gets and the more confused it becomes; I've noticed this too.
We're still a long way from being able to feed in an entire codebase and get back an updated version of it.
- It's basically GPT-4o level on average.
- More optimized for coding, but slightly inferior in other areas.
It seems to be a better model than 4o for coding tasks, but I'm not sure if it will replace the current leaders -- Gemini 2.5 Pro, o3-mini / o1, Claude 3.7/3.5.
Sam acknowledged this a few months ago, but with yet another release bringing no real clarity, it's getting ridiculous now.
Lies, damn lies and statistics ;-)
Why would they deprecate it when it's the better model? Too expensive?
Too expensive, but not for them - for their customers. The only reason they'd deprecate it is if it isn't seeing enough usage to be worth keeping up, and that probably stems from it being insanely more expensive and slower than everything else.
I'm guessing the (API) demand isn't there to saturate them fully.
https://platform.openai.com/docs/models/gpt-4.1
As far as I can tell there's no way to discover the details of a model via the API right now.
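For what it's worth, here's a minimal sketch with the official openai Python client showing how little the models endpoint actually returns (assumes OPENAI_API_KEY is set in the environment):

    from openai import OpenAI

    client = OpenAI()

    # GET /v1/models/{model} returns only minimal metadata:
    # id, object, created, owned_by -- no context window,
    # max output tokens, pricing, or knowledge cutoff.
    model = client.models.retrieve("gpt-4.1")
    print(model.id, model.created, model.owned_by)

    # Listing models is the same: names and ownership only.
    for m in client.models.list():
        print(m.id)

Everything on that docs page (context window, max output tokens, knowledge cutoff) has to be hardcoded or scraped separately.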
Given the announced adoption of MCP, and MCP's ability to select a model for sampling based on speed and intelligence priorities, it would be great to have a model discovery endpoint that returned all the details on that page.
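To make that concrete: MCP's sampling request already carries modelPreferences (that part follows the spec), so a discovery endpoint would only need to expose the matching metadata. The model_details response below is purely hypothetical - no such endpoint exists today; the field names are my own, with values copied from the gpt-4.1 docs page:

    # modelPreferences as defined in MCP's sampling/createMessage spec.
    model_preferences = {
        "hints": [{"name": "gpt-4.1"}],   # advisory model-name hints
        "costPriority": 0.3,              # each priority is 0-1
        "speedPriority": 0.5,
        "intelligencePriority": 0.9,
    }

    # Hypothetical discovery response -- not a real API, just the shape
    # that would let an MCP client rank models without hardcoding.
    model_details = {
        "id": "gpt-4.1",
        "context_window": 1_047_576,
        "max_output_tokens": 32_768,
        "knowledge_cutoff": "2024-06-01",
    }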
I'll check the prompt again; maybe 4o-mini ignores some instructions that 4.1 doesn't (instructions that might result in the LLM returning zero data).