
Posted by samwillis 1 day ago

Scaling long-running autonomous coding (cursor.com)
259 points | 162 comments
tired_and_awake 1 day ago|
The moment all code is interacted with through agents, I cease to care about code quality. The only things that matter are the quality of the product, the cost of maintenance, etc.: exactly the things we measure software development orgs against. It could be handy to have these projects deployed to demonstrate their utility and efficacy. Looking at PRs of agents feels wrong-headed; who cares if an agent's code is hard to read if agents are managing the code base?
qingcharles 1 day ago||
We don't read the binary output of our C compilers because we trust it to be correct almost every time. ("It's a compiler bug" is more of a joke than a real issue)

If AI could reach the point where we actually trusted the output, then we might stop checking it.

LiamPowell 1 day ago|||
> "It's a compiler bug" is more of a joke than a real issue

It's a very real issue; people just seem to assume their code is wrong rather than the compiler. I've personally reported 12 GCC bugs over the last 2 years, and there are currently 1239 open wrong-code bugs.

Here's an example of a simple one in the C frontend that has existed since GCC 4.7: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105180

ares623 1 day ago|||
“If” doing a lot of work here
AlexCoventry 1 day ago|||
You should at least read the tests, to make sure they express your intent. Personally, I'm not going to take responsibility for a piece of code unless I've read every line of it and thought hard about whether it does what I think it does.

AI coding agents are still a huge force-multiplier if you take this approach, though.

visarga 1 day ago|||
> Looking at PRs of agents feels wrong-headed

It would be like walking the motorcycle.

icedchai 1 day ago|||
This is how we wound up with non-technical "engineering managers." Looks good to me.
tired_and_awake 1 day ago||
I think this misses the point, see the other comments. Fully scaled agentic coding replaces managers too :) cause for celebration all around
satvikpendem 1 day ago|||
No, it becomes only managers, because they are the ones who dictate the business needs (otherwise, what would the software the agents are making even be doing without such goals), and now it's even worse with non-technical ones.
icedchai 1 day ago|||
I don't believe that. If you go fully agentic and you don't understand the output, you become the manager. You're in no better position than the pointy-haired boss from Dilbert.
tired_and_awake 16 hours ago||
Hey, just wanted to thank you for the healthy back and forth! I respect your opinion and don't hold mine strongly. That said, I'm eager for this space to mature and for us all to figure out the best way to interact with fault-prone code generation tooling... especially at scale, where we all have the hardest time navigating complexity.
icedchai 13 hours ago||
Thanks. It's fun chatting about this stuff! I don't hold mine strongly, either, though I am dealing with lots of AI-generated slop code from others.

Interesting times ahead.

tired_and_awake 3 hours ago||
I feel for you. Hopefully your colleagues come around and realize that if they submit the code they are responsible for the slop.
flyinglizard 1 day ago||
You could look at agents as meta-compilers. The problem is that, unlike real compilers, they aren't verified in any way (neither formally nor informally); in fact, you never know which particular agent you're running against when you ask for something. And unlike compilers, you don't just throw away everything and start afresh on each run. I don't think you could test a reasonably complex system to a degree where it really wouldn't matter what runs underneath, and since you're (probably) going to use other agents to write THOSE tests, what makes you certain they offer real coverage? It's turtles all the way down.
tired_and_awake 1 day ago||
Completely agree, and great points. The conclusion that "agents are writing the tests" etc. is where I'm at as well. Moreover, code quality itself is also an agentic problem, as are compile time, reliability, portability... Turtles all the way down, as you say.

All code interactions happen through agents.

I suppose the question is whether agents only produce Swiss-cheese solutions at scale, with no way to fill in those gaps (at scale). If so, then yeah, fully agentic coding is probably a pipe dream.

On the other hand, if you can stand up a code generation machine where watts + GPUs + time => software products, then well... it's only a matter of time until app stores entirely disappear or get really weird. It's hard to fathom the change that's coming to our profession in that world.

Havoc 22 hours ago||
> long running

I really dislike this as a measure. An LLM on a CPU is also long-running because it's slow.

I get what it's meant to convey, but time is such a terrible measure of anything when tk/s isn't static.
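The objection can be made concrete with back-of-the-envelope arithmetic (the throughput and token counts below are made up for illustration): wall-clock duration conflates how much work was done with how fast it ran.

```python
def run_hours(total_tokens: int, tokens_per_sec: float) -> float:
    """Wall-clock hours to generate total_tokens at a given throughput."""
    return total_tokens / tokens_per_sec / 3600

# A fast GPU deployment and a slow CPU one, both "long-running":
gpu_hours = run_hours(total_tokens=5_000_000, tokens_per_sec=150)
cpu_hours = run_hours(total_tokens=50_000, tokens_per_sec=1.5)

# Both come out to ~9.3 hours of wall-clock time, yet one produced
# 100x the tokens. "Ran for hours" says nothing about how much the
# agent actually did unless throughput is held constant.
```

So "long-running" only works as a proxy for agent capability if tokens/sec is roughly fixed across the systems being compared.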

throwaway63467 1 day ago||
I'm running Opus 4.5, which is arguably their best model, and while it's really good for a lot of work, it always introduces subtle errors or inconsistencies when left unsupervised, since prompts are never good enough to remove all ambiguity for complex asks. So I can't imagine what it will do to a code base when left alone with it for days or weeks.
tgtweak 20 hours ago||
Is it too much to expect companies to share some of this in the open, rather than just the results?
foota 1 day ago||
Slightly off topic, but they want to move from Solid to React? Isn't that the reverse of the newest trend? Would be interesting to know more.
luhego 1 day ago||
> We initially built an integrator role for quality control and conflict resolution, but found it created more bottlenecks than it solved

Of course it creates bottlenecks, since code quality takes time and people don’t get it right on the first try when the changes are complex. I could also be faster if I pushed directly to prod!

Don't get me wrong. I use these tools, and I can see the productivity gains. But I also believe the only way to achieve the results they show is to sacrifice quality, because no software engineer can review changes at the same speed the agent generates code. They may solve that problem, or maybe the industry will change so that only output and LOC matter, but until then I will keep cursing at the agent until I get the result I want.

matthewfcarlson 1 day ago||
It's fascinating that many of the issues they faced are ones I've seen in human software engineering teams.

Things like integration creating bottlenecks, or a lack of consistent top-down direction leading to small, risk-averse changes instead of bold redesigns. All things I've seen before.

2001zhaozhao 1 day ago|
At least the AI teams aren't politically competing against each other, unlike human teams.

(Or are they?)

laszlojamf 1 day ago||
They mention billions of tokens, but I'm left wondering how much this experiment actually cost them...
WOTERMEON 1 day ago||
A weird twist: the hiring call at the end, for a company that says

> Our mission is to automate coding

mdswanson 1 day ago|
Over the past year or so, I've built my own system of agents that behaves almost exactly like this. I can describe what I'd like built before I go to bed and have a fantastic foundation in place by the next day. For simpler projects, they'll be complete. Because of the reviews, the code continually improves until the agents are satisfied. I'm impressed every time.
z_zetetic_z 13 hours ago|
Any chance you would care to share more about this?