Prefer duplication over the wrong abstraction (2016)

Posted by rafaepta 8 hours ago

Prefer duplication over the wrong abstraction (2016) (sandimetz.com)

399 points | 269 comments | page 5

zadikian 4 hours ago|

I always felt like I abstract and modularize things way less eagerly than other programmers. Was pleasantly surprised to find that LLMs do it mostly my way by default. Then again, they're also bad at abstracting when it's actually needed.

dan-robertson 4 hours ago|

I think LLMs are trained to not refactor. I think it’s either that you would need to do something in training to make them want to do it and the labs don’t do that, or that the labs correctly guess that it would be very annoying for LLMs to go and refactor your existing code as they go. This creates bad effects (eg crazy hacks to avoid refactoring and, much worse, not refactoring the code they only just wrote as required) but I think the alternative would be worse – it’s not something you always want to read and the refactoring is often done incorrectly, restructuring the code to the best shape for the current task rather than something that balances many different needs.

luckystarr 7 hours ago||

How I see this:

Refactoring code to reduce the number of lines is _compression_, akin to RLE coding.

Refactoring the code to lift conceptually coherent parts is _abstraction_.

Less compression, more abstraction. Then you're fine.
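A minimal Python sketch of that distinction. All names here (`join3`, `iso_date`) are hypothetical and invented for illustration:

```python
# "Compression": two call sites looked alike, so the shared lines were
# folded into a helper. The name describes mechanics, not a concept.
def join3(a: str, b: str, c: str, sep: str) -> str:
    return sep.join([a, b, c])


date_text = join3("2016", "01", "20", "-")   # happens to be a date
path_text = join3("usr", "local", "bin", "/")  # happens to be a path


# "Abstraction": the helper names one coherent concept and owns its rules
# (zero padding, field order), so callers no longer need to know them.
def iso_date(year: int, month: int, day: int) -> str:
    return f"{year:04d}-{month:02d}-{day:02d}"


same_date = iso_date(2016, 1, 20)
```

Both versions save lines, but only `iso_date` gives a reader something to reason about. In `join3` the date and the path share nothing but their shape, so a change to one use case is likely to leak into the other.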

gb2d_hn 6 hours ago||

Interface over inheritance is the paradigm I try to stick to. I'd rather maintain orthogonal code than code that overuses inheritance out of over-adherence to DRY.

joshmoody24 7 hours ago||

I've seen the pendulum swing between duplication and abstraction a few times in my career, and I'm currently on team "it's usually not that hard to find a good abstraction up front."

IMO it's easier to inline a bad abstraction than it is to consolidate a bunch of subtly different things that should have been abstracted from the beginning.

But I expect people's opinions on this differ wildly based on their personal experiences. Just my anecdotal take.

bob1029 7 hours ago||

If you work backward from the schema these sorts of things tend to evaporate before they can become a problem.

Some of the biggest rabbit holes come from naming conventions not aligning across the business and technology silos. If everyone agrees that Customer has exactly 34 attributes, then it is possible to move to the next step of sharing libraries of types across the team. Getting your POCOs/DTOs 1:1 across the board is when the duplication really starts to melt away.

DmitryOlshansky 6 hours ago||

I would argue that _premature_ abstraction is worse than _some_ duplication of code.

Also I’ve seen the kind of codebase that seems to be LZW packed due to the sheer desire to DRY everything out. Not a pleasant thing: by the time you go 10 layers deep into some “helper” function, you've forgotten why you're in there.

tetha 7 hours ago||

I watched a talk by her about this, and this post is missing half of the equation, which is really important:

Having a wrong abstraction means you end up with a class/function/module with a huge number of configuration options through boolean/enum parameters. It's not even clear that all combinations of those options are valid. This situation may be simplified by duplicating and then eliminating code, thus creating more streamlined code for each use case. This may require fixing similar or cross-cutting bugs in multiple places (e.g. JSON serialization is stupid, need to hack a workaround), but keeps the business logic changes simple. The changes may be a bit more numerous, but the code is able to surface all the scenarios to consider.
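As a sketch of that first half, here is a hypothetical flag-driven formatter in Python, followed by what is left after duplicating it per use case and deleting the branches each copy does not need. The function names and formatting rules are invented for illustration and are not from the talk:

```python
# The "wrong abstraction": three booleans give eight combinations, and it is
# unclear which are valid (what does with_symbol mean when for_csv is set?).
def format_amount(cents: int, *, with_symbol: bool, for_csv: bool,
                  negative_in_parens: bool) -> str:
    value = f"{abs(cents) / 100:.2f}"
    if with_symbol and not for_csv:
        value = "$" + value
    if cents < 0:
        value = f"({value})" if negative_in_parens and not for_csv else "-" + value
    return value


# After duplicating and then eliminating code: one small function per use
# case, each stating its own rules with nothing left to configure.
def format_for_invoice(cents: int) -> str:
    text = f"${abs(cents) / 100:.2f}"
    return f"({text})" if cents < 0 else text


def format_for_csv(cents: int) -> str:
    return f"{cents / 100:.2f}"
```

A rounding fix would now have to be made in both functions, which is the cross-cutting cost described above, but a change to the invoice rules can no longer break the CSV output.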

Having no abstraction means you may have to change business logic consistently in multiple places, or you have to fix exactly the same misconception (aka a bug) in multiple cases, e.g. tax rate management in a multi-national context. This is also terrible, because you may fix an important problem in one place and forget other places with the same issue. Now you missed 12 potential bugs by fixing one. This can however allow you to discover a true abstraction. Maybe these 12 places should call just one place?
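A minimal Python sketch of "these places should call just one place", using the tax example. The region codes, rates, and function names are made up for illustration, and the rounding mode is an assumption:

```python
from decimal import Decimal, ROUND_HALF_UP

# Hypothetical regions and rates; real tax rules are far more involved.
TAX_RATES = {"AA": Decimal("0.19"), "BB": Decimal("0.20")}


def tax_for(region: str, net: Decimal) -> Decimal:
    """The one place that knows how tax is looked up and rounded."""
    return (net * TAX_RATES[region]).quantize(Decimal("0.01"),
                                              rounding=ROUND_HALF_UP)


# Call sites that previously each carried their own copy of the calculation.
def invoice_total(region: str, net: Decimal) -> Decimal:
    return net + tax_for(region, net)


def cart_preview(region: str, net: Decimal) -> str:
    return f"incl. {tax_for(region, net)} tax"
```

If the rounding rule turns out to be wrong, it is fixed once in `tax_for` instead of being hunted down at every call site.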

But for code evolving across a team understanding this tension, a bit of duplication while waiting for confirmation that these pieces of code break together and change together is better than just shoving the same 3 if-statements into a function to avoid "line duplication". Concept duplication is more important.

KHRZ 7 hours ago||

This is the biggest lesson I got from LLMs. I have a 1 million LOC vibe coded project that I can only imagine would fit in a few hundred thousand lines. But it's still holding up; I expected some kind of development collapse long before this point.

cassianoleal 7 hours ago||

I don't think that's a good lesson.

OP is right that code duplication is far cheaper than the wrong abstraction, but the opposite is also true - the right abstraction is far cheaper than code duplication.

gb2d_hn 6 hours ago|||

It's made me wonder the same, but most LLM generated codebases haven't been around long enough to judge maintainability. I have noticed issues in some of my more LLM heavy code when I expect a change to be replicated in multiple areas, assuming common code / styling was reused, only to find it wasn't. It's for that reason I can't use LLMs for client codebases without heavy scrutiny of every line generated (for my own hobby projects I'm a lot more lenient)

gavmor 7 hours ago||

Well sooner or later I would expect a developer who intimately understands their code base to feel compelled to start refactoring and extracting fitting, meaningful well-leveraged abstractions.

imhoguy 5 hours ago||

I don't think that will happen anytime soon. Prompts are the code now, and programming-language code is the compilation product. Almost nobody optimizes compiled assembly code.

Perhaps "recompilation" - rewrite by replaying all prompts in strict code quality context (linters, complexity & dedup checks) would make better abstractions.

The only problem now is that LLMs are non-deterministic.

gavmor 5 hours ago||

> Almost nobody optimizes compiled assembly code.

Compiled assembly code is not an input to the next compilation; source code is an input to the LLM's next inference.

Sure, maybe "prompts are the code," but you must realize that code is also the prompt.

anon-3988 7 hours ago||

The problem with coming up with a rule that works for everyone is that everyone has a different idea of what makes a good abstraction.

Do you want to iterate using for loop or using .iter().step(2).map()?

I would rather have consistency than a mixed bag of levels of abstractions.
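For concreteness, here are the two styles side by side in Python. This is a stand-in for the chain quoted above, with `islice` playing the role of the step; the data and the `square` function are hypothetical:

```python
from itertools import islice

values = [1, 2, 3, 4, 5, 6]


def square(x: int) -> int:
    return x * x


# Style 1: explicit for loop over every second index.
looped = []
for i in range(0, len(values), 2):
    looped.append(square(values[i]))

# Style 2: iterator pipeline (take every second item, then map over it).
chained = list(map(square, islice(values, 0, None, 2)))
```

Both produce `[1, 9, 25]`. The difference is stylistic, which is why mixing the two within one codebase costs more in consistency than either costs alone.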

doix 7 hours ago||

> Do you want to iterate using for loop or using .iter().step(2).map()?

This isn't really a good example, assuming both can be used to represent the same thing.

The problem with the wrong abstraction is when your abstraction doesn't let you represent something. Then, because you've already invested so heavily into it, you start contorting the problem to fit your abstraction and it becomes a shit show.

metaltyphoon 7 hours ago||

> Do you want to iterate using for loop or using .iter().step(2).map()?

I don’t think it matters, especially for short loop scopes.

_pdp_ 4 hours ago|

The biggest mistake young engineers make is working out a problem from the bottom up, i.e. building frameworks and libraries, rather than exploring the problem space, which is more chaotic.

You cannot find the edges of the system with structure you don't understand, because once the abstractions are set in place, solutions often take the same shape as the frameworks, which ultimately leads to really bad systems.

The best way is often not the obvious way. Once you reach the edges, you can think about how to program the abstraction, but that is many versions down the line from the original.
