Posted by rafaepta 8 hours ago
Refactoring code to reduce the number of lines is _compression_, akin to RLE coding.
Refactoring the code to lift conceptually coherent parts is _abstraction_.
Less compression, more abstraction. Then you're fine.
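A tiny sketch of the difference, with made-up names (`strip_lower`, `canonical_username`, `canonical_tag` are all hypothetical and only here for illustration):

```python
# Compression: fewer lines, but the helper is named after its text,
# so unrelated callers end up coupled to the same function.
def strip_lower(s):
    return s.strip().lower()


username_key = strip_lower("  Alice ")
tag_key = strip_lower(" Python ")


# Abstraction: each helper names a concept. The bodies happen to match
# today, but a rule change for usernames would not touch tags.
def canonical_username(raw):
    return raw.strip().lower()


def canonical_tag(raw):
    return raw.strip().lower()
```

The second version is longer, which is the point: line count is not what is being optimized.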
IMO it's easier to inline a bad abstraction than it is to consolidate a bunch of subtly different things that should have been abstracted from the beginning.
But I expect people's opinions on this differ wildly based on their personal experiences. Just my anecdotal take.
Some of the biggest rabbit holes come from naming conventions not aligning across the business and technology silos. If everyone agrees that Customer has exactly 34 attributes, then it is possible to move to the next step of sharing libraries of types across the team. Getting your POCOs/DTOs 1:1 across the board is when the duplication really starts to melt away.
Also, I’ve seen the kind of codebase that seems to be LZW-packed due to the sheer desire to DRY everything out. It's not a pleasant thing: by the time you go 10 layers deep into some “helper” function, you've forgotten why you're in there.
Having a wrong abstraction means you end up with a class/function/module with a huge number of configurations through boolean/enum parameters. It's not even clear that all combinations of configurations are valid. This situation may be simplified by duplicating and then eliminating code, thus creating more streamlined code for each use case. This may require fixing similar or cross-cutting bugs in multiple places (e.g. JSON serialization is stupid, need to hack a workaround), but it keeps the business logic changes simple. The changes may be a bit more numerous, but the code is able to surface all the scenarios to consider.
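A minimal sketch of that shape, assuming a made-up report exporter. Every name here (`export_report`, `export_csv_for_finance`, `export_summary_for_email`) is hypothetical and not from any real codebase:

```python
# Hypothetical "wrong abstraction": one function steered by flags.
def export_report(rows, as_csv=False, include_header=True, summary_only=False):
    if summary_only:
        # as_csv and include_header are silently ignored on this path.
        # Is summary_only=True, as_csv=True a valid combination? Unclear.
        return f"{len(rows)} rows"
    sep = "," if as_csv else "\t"
    lines = [sep.join(str(v) for v in row.values()) for row in rows]
    if include_header and rows:
        lines.insert(0, sep.join(rows[0].keys()))
    return "\n".join(lines)


# After duplicating, then deleting the branches each caller never used:
def export_csv_for_finance(rows):
    if not rows:
        return ""
    header = ",".join(rows[0].keys())
    body = [",".join(str(v) for v in row.values()) for row in rows]
    return "\n".join([header, *body])


def export_summary_for_email(rows):
    return f"{len(rows)} rows"
```

The two specialized functions have no parameters left to combine incorrectly, at the cost of a formatting fix possibly needing to land in more than one place.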
Having no abstraction means you may have to change business logic consistently in multiple places, or you have to fix exactly the same misconception (aka a bug) in multiple places, e.g. tax rate management in a multinational context. This is also terrible, because you may fix an important problem in one place and forget other places with the same issue. Now you've missed 12 potential bugs by fixing one. This can, however, allow you to discover a true abstraction. Maybe these 12 places should call just one place?
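Roughly what "call just one place" could look like for the tax example. The country codes and rates are invented placeholders, and `tax_for`, `invoice_total`, and `refund_total` are hypothetical names:

```python
from decimal import Decimal

# Placeholder rates for fictional country codes; not real tax data.
TAX_RATES = {"AA": Decimal("0.10"), "BB": Decimal("0.25")}


def tax_for(country, net_amount):
    """The single place that knows how tax is computed and rounded."""
    return (net_amount * TAX_RATES[country]).quantize(Decimal("0.01"))


def invoice_total(country, net_amount):
    return net_amount + tax_for(country, net_amount)


def refund_total(country, net_amount):
    # Would otherwise be a copy of the invoice arithmetic that can drift.
    return -(net_amount + tax_for(country, net_amount))
```

A rounding fix in `tax_for` now reaches invoices and refunds together, which is only a win if those really are the same concept.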
But for code evolving across a team that understands this tension, a bit of duplication, while waiting for confirmation that these pieces of code break together and change together, is better than just shoving the same 3 if-statements into a function to avoid "line duplication". Concept duplication is what matters more.
OP is right that code duplication is far cheaper than the wrong abstraction, but the opposite is also true - the right abstraction is far cheaper than code duplication.
Perhaps "recompilation", i.e. a rewrite that replays all prompts in a strict code-quality context (linters, complexity and dedup checks), would produce better abstractions.
The only problem now is that LLMs are non-deterministic.
Compiled assembly code is not an input to the next compilation; source code is an input to the LLM's next inference.
Sure, maybe "prompts are the code," but you must realize that code is also the prompt.
Do you want to iterate using a for loop or using .iter().step(2).map()?
I would rather have consistency than a mixed bag of levels of abstractions.
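For reference, the two styles side by side in Python. This is a rough analog of the snippet above, not a translation of any particular language's API, and the data is made up:

```python
from itertools import islice

values = [3, 1, 4, 1, 5, 9, 2, 6]

# Style 1: explicit for loop over every second element.
doubled_loop = []
for i in range(0, len(values), 2):
    doubled_loop.append(values[i] * 2)

# Style 2: iterator pipeline over the same elements.
doubled_chain = list(map(lambda v: v * 2, islice(values, 0, None, 2)))
```

Both produce the same list; the argument is about picking one style and sticking to it within a codebase.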
This isn't really a good example, assuming both can be used to represent the same thing.
The problem with the wrong abstraction is when your abstraction doesn't let you represent something. Then, because you've already invested so heavily in it, you start contorting the problem to fit your abstraction and it becomes a shit show.
I don’t think it matters, especially for short loop scopes.
You cannot find the edges of a system whose structure you don't understand. Once the abstractions are set in place, solutions often take the same shape as the framework, which ultimately leads to really bad systems.
The best way is often not the obvious way. Once you reach the edges, you can think about how to program the abstraction, but that is many versions down the line from the original.