Over-editing refers to a model modifying code beyond what is necessary

Posted by pella 1 day ago

Over-editing refers to a model modifying code beyond what is necessary (nrehiew.github.io)

396 points | 233 comments

exitb 23 hours ago|

As mentioned in the article, prompting for minimal changes does help. I find GPT models to be very steerable, but it doesn’t mean much when you take your hands off the wheel. These types of issues should be solved at the planning stage.

Bengalilol 22 hours ago||

Tangent and admittedly off-topic but I've come to see LLM-assisted coding as a kind of teleportation.

With LLMs, you glimpse a distant mountain. In the next instant, you're standing on its summit. Blink, and you are halfway down a ridge you never climbed. A moment later, you're flung onto another peak with no trail behind you, no sense of direction, no memory of the ascent. The landscape keeps shifting beneath your feet, but you never quite see the panorama. Before you know it, you're back near the base, disoriented, as if the journey never happened. But confident, you say you were on the top of the mountain.

Manual coding feels entirely different. You spot the mountain, you study its slopes, trace a route, pack your gear. You begin the climb. Each step is earned steadily and deliberately. You feel the strain, adjust your path, learn the terrain. And when you finally reach the summit, the view unfolds with meaning. You know exactly where you are, because you've crossed every meter to get there. The satisfaction isn't just in arriving, nor in saying you were there: it is in having truly climbed.

jdkoeck 22 hours ago|

The thing is, with manual coding, you spot a view in the distance, you trek your way for a few hours, and you realize when you get there that the view isn’t as great as you thought it was.

With LLM-assisted coding, you skip the trek and you instantly know that’s not it.

slopinthebag 23 hours ago||

I think the industry has leaned waaay too far into completely autonomous agents. Of course there are reasons why corporations would want to completely replace their engineers with fully autonomous coding agents, but for those of us who actually work developing software, why would we want less and less autonomy? Especially since it alienates us from our codebases, requiring more effort in the future to gain an understanding of what is happening.

I think we should move to semi-autonomous steerable agents, with manual and powerful context management. Our tools should graduate from simple chat threads to something more akin to the way we approach our work naturally. And a big benefit of this is that we won't need expensive locked down SOTA models to do this, the open models are more than powerful enough for pennies on the dollar.

NitpickLawyer 23 hours ago||

I'm hearing this more and more, we need new UX that is better suited for the LLM meta. But none that I've seen so far have really got it, yet.

grttww 23 hours ago||

When you steer a car, there isn’t this degree of probability about the output.

How do you emulate that with LLMs? I suppose the objective is to get variance down to the point where it’s barely noticeable. But I'm not sure it’ll get to that place based on accumulating more data and re-training models.

slopinthebag 21 hours ago||

Well, the point is by steering it you can get both more expected/reproducible output, and you can correct bad assumptions before they become solidified in your codebase.

You can get pretty close to reproducible output by narrowing the scope and using certain prompts/harnesses. As in, you get roughly the same output each time with identical prompts, assuming you're using a model which doesn't change every few hours to deal with load, and you aren't using a coding harness that changes how it works every update. It's not deterministic, but if you ask it for a scoped implementation you essentially get the same implementation every time, with some minor and usually irrelevant differences.

So you can imagine with a stable model and harness, with steering you can basically get what you ask it for each time. Tooling that exploits this fact can be much more akin to using an autocomplete, but instead of a line of code it's blocks of code, functions, etc.

A harness that makes it easy to steer means you can basically write the same code you would have otherwise written, just faster. Which I think is a genuine win, not only from a productivity standpoint but also you maintain control over the codebase and you aren't alienated or disenfranchised from the output, and it's much easier to make corrections or write your own implementations where you feel it's necessary. It becomes more of an augmentation and less of a replacement.
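
One way to test the "roughly the same output each time" claim is to run the same scoped prompt several times and score how similar the results are. The following is a minimal sketch, assuming the outputs have already been collected as strings. `mean_pairwise_similarity` is a hypothetical helper, not part of any harness mentioned in this thread, and it relies on Python's `difflib` as a rough character-level similarity measure.

```python
import difflib
from itertools import combinations


def mean_pairwise_similarity(outputs):
    """Average difflib similarity ratio over every pair of outputs.

    1.0 means all outputs are identical; values near 0.0 mean the runs
    share almost no text. Hypothetical helper for illustration only.
    """
    pairs = list(combinations(outputs, 2))
    if not pairs:
        # Zero or one output: nothing to compare, treat as fully consistent.
        return 1.0
    ratios = [
        difflib.SequenceMatcher(a=x, b=y, autojunk=False).ratio()
        for x, y in pairs
    ]
    return sum(ratios) / len(ratios)


if __name__ == "__main__":
    # Assumed example data: three runs of the same prompt.
    runs = [
        "def add(a, b):\n    return a + b\n",
        "def add(a, b):\n    return a + b\n",
        "def add(x, y):\n    return x + y\n",
    ]
    print(f"mean similarity: {mean_pairwise_similarity(runs):.2f}")
```

A score that stays close to 1.0 across repeated runs would support the claim for that prompt, model, and harness; it says nothing about other combinations.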

grttww 20 hours ago||

You wrote all that and didn’t address the question lmao.

There are diminishing returns, and moreover this idea that people are holding it wrong, or that they need to figure out the complexity, goes against all that has been done over the past 30 years: making things simpler.

slopinthebag 20 hours ago||

You asked me how one could minimise the non-deterministic output of LLMs and I responded. If that's not a good enough answer, feel free to ask a follow-up.

BoredomIsFun 8 hours ago||

It feels like a pointless conversation if no sampler settings (min_p, temperature, etc.) are mentioned.
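
For readers unfamiliar with the two settings named above, here is a minimal sketch of how temperature and min_p shape the next-token distribution. It assumes temperature is applied before the min_p filter; real inference stacks differ in sampler order and defaults, and the function names here are hypothetical.

```python
import math
import random


def sample_probs(logits, temperature=1.0, min_p=0.0):
    """Return the token distribution after temperature and min_p.

    Assumption: temperature scaling first, then min_p filtering.
    """
    if temperature <= 0:
        # Treat temperature 0 as greedy decoding: all mass on the argmax.
        best = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == best else 0.0 for i in range(len(logits))]

    # Temperature-scaled softmax (shifted by the max for numerical stability).
    scaled = [logit / temperature for logit in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    # min_p: drop tokens whose probability is below min_p * (top probability).
    threshold = min_p * max(probs)
    kept = [p if p >= threshold else 0.0 for p in probs]
    kept_total = sum(kept)
    return [p / kept_total for p in kept]


def sample_token(logits, temperature=1.0, min_p=0.0, rng=random):
    """Draw one token index from the filtered distribution."""
    probs = sample_probs(logits, temperature, min_p)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


if __name__ == "__main__":
    logits = [2.0, 1.0, 0.0]  # assumed toy logits for three tokens
    for t in (1.0, 0.5, 0.0):
        print(t, [round(p, 3) for p in sample_probs(logits, temperature=t)])
```

Lower temperature and higher min_p both concentrate probability on the top tokens, which reduces run-to-run variance at the cost of diversity.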

lo1tuma 23 hours ago||

I’m not sure I share the author's opinion. When I was hand-writing code I also followed the boy-scout rule and did smaller refactorings along the way.

lopsotronic 23 hours ago||

When asked to show their development-test path in the form of a design document or test document, I've also noticed variance between the document generated and what the chain-of-thought thingy shows during the process.

The version it puts down into documents is not the thing it was actually doing. It's a little anxiety-inducing. I go back to review the code with big microscopes.

"Reproducibility" is still pretty important for those trapped in the basements of aerospace and defense companies. No one wants the Lying Machine to jump into the cockpit quite yet. Soon, though.

We have managed to convince the Overlords that some teensy non-agentic local models - sourced in good old America and running local - aren't going to All Your Base their Internets. So, baby steps.

bluequbit 14 hours ago||

I call this overcooking. Adding unnecessary features.

tim-projects 23 hours ago||

> The model fixes the bug but half the function has been rewritten.

The solution to this is to use quality gates that loop back and check the work.

I'm currently building a tool with gates and a diff regression check. I haven't seen these problems for a while now.

https://github.com/tim-projects/hammer
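
As an illustration of what a diff-based gate could look like, here is a minimal sketch that counts changed lines between the original and the edited code and rejects edits over a budget. This is an assumption about the general technique, not a description of how the linked tool works; `changed_line_count` and `passes_diff_gate` are hypothetical names.

```python
import difflib


def changed_line_count(before: str, after: str) -> int:
    """Count lines touched by an edit, using difflib's line-level opcodes."""
    a, b = before.splitlines(), after.splitlines()
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    changed = 0
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            # A replace of N old lines by M new lines counts as max(N, M).
            changed += max(i2 - i1, j2 - j1)
    return changed


def passes_diff_gate(before: str, after: str, max_changed_lines: int) -> bool:
    """Return True when the edit stays within the allowed line budget."""
    return changed_line_count(before, after) <= max_changed_lines


if __name__ == "__main__":
    original = "def f(x):\n    y = x + 1\n    return y\n"
    minimal_fix = "def f(x):\n    y = x - 1\n    return y\n"
    rewrite = "def f(value):\n    result = value - 1\n    return result\n"

    # Budget of one line: the targeted fix passes, the full rewrite does not.
    print(passes_diff_gate(original, minimal_fix, max_changed_lines=1))
    print(passes_diff_gate(original, rewrite, max_changed_lines=1))
```

A gate like this can loop back to the model with a "too many lines changed" message when it fails; the right budget depends on the task and is a judgment call.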

LetsGetTechnicl 22 hours ago||

Well, seeing as they don't KNOW anything, this isn't surprising at all.

spullara 22 hours ago|

This is one of the best things about using Claude over GPT. Claude understands the bigger assignment and does all the work, and sometimes more than necessary, but for me it beats the alternative.
