
Posted by publicmatt 1/27/2026

AI2: Open Coding Agents (allenai.org)
253 points | 44 comments
mirekrusin 1/28/2026|
For low-cost tuning, wouldn't something like LoRA (via e.g. Unsloth) on e.g. GLM-4.7-Flash be the way to go? Something like:
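A minimal sketch with Unsloth + TRL (the GLM-4.7-Flash checkpoint id, dataset, and hyperparameters are placeholders, and this assumes Unsloth supports the architecture):

  # Sketch: 4-bit QLoRA fine-tune via Unsloth; names are placeholders.
  from unsloth import FastLanguageModel
  from trl import SFTTrainer, SFTConfig

  model, tokenizer = FastLanguageModel.from_pretrained(
      model_name="GLM-4.7-Flash",      # hypothetical checkpoint id
      max_seq_length=8192,
      load_in_4bit=True,               # 4-bit base weights (QLoRA)
  )
  model = FastLanguageModel.get_peft_model(
      model,
      r=16, lora_alpha=16,             # low-rank adapter size
      target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj"],
  )
  trainer = SFTTrainer(
      model=model,
      train_dataset=repo_dataset,      # placeholder: repo-specific traces
      args=SFTConfig(per_device_train_batch_size=2, max_steps=500),
  )
  trainer.train()

The adapter trains only a small fraction of the weights, which is where the low cost comes from.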
khimaros 1/27/2026||
it's great to see this kind of progress in reproducible weights, but color me confused. this claims to be better and smaller than Devstral-Small-2-24B, while clocking in at 32B (larger) and scoring more poorly?
ethan_l_shen 1/27/2026|
Hey! We are able to outperform Devstral-Small-2-24B when specializing on repositories, and our best SERA-32B model comes well within the range of uncertainty. That being said, our model is a bit larger than Devstral 24B. Could you point out what in the paper gave the impression that we were smaller? If there's something unclear, we would love to revise it.
khimaros 1/27/2026||
"SERA-32B is the first model in Ai2's Open Coding Agents series. It is a state-of-the-art open-source coding agent that achieves 49.5% on SWE-bench Verified, matching the performance of much larger models like Devstral-Small-2 (24B)" from https://huggingface.co/allenai/SERA-32B
ethan_l_shen 1/27/2026||
Ah, great catch. I don't know how we missed that. Thanks! Will fix.
lrvick 1/28/2026||
So you still need Claude to actually use this "open" system?
somebodythere 1/28/2026|
No. You can point e.g. Opencode/Cline/Roo Code/Kilo Code at your inference endpoint. But Claude Code has a large install base and users are used to it, so it makes sense to target it.
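E.g., serve it behind vLLM's OpenAI-compatible server and point any harness (or the plain SDK) at it. A minimal sketch, with the port and prompt as placeholders:

  # Serve: vllm serve allenai/SERA-32B --port 8000
  # Then any OpenAI-compatible client can talk to it:
  from openai import OpenAI

  client = OpenAI(base_url="http://localhost:8000/v1", api_key="unused")
  resp = client.chat.completions.create(
      model="allenai/SERA-32B",
      messages=[{"role": "user", "content": "Fix the failing test in foo.py"}],
  )
  print(resp.choices[0].message.content)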
asyncadventure 1/28/2026||
[dead]
utopiah 1/28/2026|
Ironic that it's OpenAI that stopped the trend.
another_twist 1/28/2026||
Hey, we need protecting from AI; only one company can get this right.
utopiah 2 days ago|||
Sounds like mafia marketing.
awestroke 1/28/2026|||
[flagged]
another_twist 1/28/2026||
Of course. If we allow AI to be open and accessible, then a generic humanity extinction event will happen. Which is why we need regulation in favor of one or two companies, so that generic bad stuff doesn't transpire.
augusteo 1/27/2026|
[flagged]
storystarling 1/27/2026||
The fine-tuning overhead is definitely a factor, but for smaller shops the hard constraint is usually inference VRAM. Running a 32B model locally or on a rented GPU is surprisingly expensive if you aren't saturating it. Even at 4-bit quantization you are looking at dual 3090s or an A6000 to get decent tokens per second. The $400 training cost is impressive but the hosting bill is what actually kills the margin compared to per-token APIs.
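Back-of-envelope math (a sketch; the layer/head shapes below are generic placeholders, not SERA-32B's actual config):

  # Rough VRAM estimate for a 32B model at 4-bit quantization.
  params = 32e9
  weights_gb = params * 0.5 / 1e9      # ~0.5 bytes/param -> ~16 GB
  # KV cache per token: 2 (K and V) * layers * kv_heads * head_dim * bytes.
  # Assuming 64 layers, 8 KV heads, head_dim 128, fp16 cache (placeholders):
  kv_per_token = 2 * 64 * 8 * 128 * 2  # ~262 KB per token
  kv_gb = kv_per_token * 32_000 / 1e9  # ~8.4 GB at a 32k context
  print(f"~{weights_gb:.0f} GB weights + ~{kv_gb:.1f} GB KV cache")

~16 GB of weights plus several GB of KV cache and runtime overhead already spills past a single 24 GB card, hence the dual-3090/A6000 math.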
syndacks 1/28/2026||
LLM shit post