dofm 3 hours ago

No it's not. This has always been a needlessly iconoclastic suggestion rather than a sensible one.

At the very least it is not once you're working at the wrong kind of scale.

Once you have an awkward number of customers (more than five and less than a hundred), maintaining duplicated code that should have been abstracted and modularised will only seem cheap if you don't mind that you burn through even junior employees at a pace.

And in the LLM era the wrong kind of scale shows up in new ways: code is generated and duplicated without proper abstraction, then maintained by an LLM that cannot be trusted to make the same modification each time it encounters a pattern, or to have enough of an overview to slowly rescue duplicated code through good abstractions.

I would go as far as to say that any abstraction you can maintain (that is in active maintenance, I mean) is better than code duplication once you are past a de minimis threshold.

ubertaco 2 hours ago

I'd recommend clicking through the headline to watch the talk. Metz talks a lot about types of similarity: similarity by coincidence vs similarity due to an actual semantic or functional equivalence.

Code that is coincidentally similar very often diverges in either the short or long term, and DRYing it up aggressively tends to result in functions that have many boolean parameters that each trigger disjoint sets of behavior - which is a bit of a nightmare to maintain due to the high cognitive overhead of remembering how all the interleaved-but-actually-unrelated behaviors should work.

This outcome is low-cohesion code.
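To make the boolean-parameter failure mode concrete, here is a hypothetical Python sketch (the invoice/receipt domain and every function name are invented for illustration, not taken from the talk): two coincidentally similar formatters merged into one flag-driven function, next to the same behaviour kept as two cohesive functions.

```python
def render(doc: dict, is_invoice: bool = False, is_receipt: bool = False,
           include_tax: bool = False) -> str:
    # Force-DRYed version: each flag switches on a disjoint slice of behaviour,
    # and callers must know which flag combinations are actually meaningful.
    lines = [doc["title"]]
    if is_invoice:
        lines.append(f"Due: {doc['due_date']}")
    if is_receipt:
        lines.append(f"Paid: {doc['paid_on']}")
    if include_tax and is_invoice:  # tax only makes sense for invoices
        lines.append(f"Tax: {doc['tax']:.2f}")
    return "\n".join(lines)


# Cohesive version: one function per concept. The only duplication left is
# the title line, and each function can now diverge without touching the other.
def render_invoice(doc: dict) -> str:
    return "\n".join([doc["title"], f"Due: {doc['due_date']}", f"Tax: {doc['tax']:.2f}"])


def render_receipt(doc: dict) -> str:
    return "\n".join([doc["title"], f"Paid: {doc['paid_on']}"])
```

Nothing in `render` stops a caller from passing `is_invoice=True, is_receipt=True`, which fails with a `KeyError` on an invoice that has no `paid_on`; remembering which combinations are legal is the overhead described above.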

It's a useful concept to be aware of - worth clicking through to the actual content of the talk rather than just the headline.

dofm 43 minutes ago

> I'd recommend clicking through the headline to watch the talk. Metz talks a lot about types of similarity: similarity by coincidence vs similarity due to an actual semantic or functional equivalence.

I've seen this article, and AFAIR the video, before. FWIW, having been a Rails developer from the very early days and fitfully until maybe even 2014, I now interpret the phrase "my Railsconf talk…" quite negatively.

ETA: nice to be back to disagreeing with people on HN about coding principles again though. Hopefully this is a sign.

coldtea 3 hours ago

Hardly iconoclastic, it's a very sensible suggestion.

It would be iconoclastic if the common-sense basic approach were to start with abstraction. It's not: the common-sense default is to write possibly duplicate behavior until you actually discover several cases to abstract away, and until you develop a sensible idea of which functionality unites them and which doesn't carry over to all of them.

>Once you have an awkward number of customers (more than five and less than a hundred), maintaining duplicated code that should have been abstracted and modularised will only seem cheap if you don't mind that you burn through even junior employees at a pace

Maintaining the wrong abstraction, or, god help, abstractions, would be even worse.

dofm 3 hours ago

> Maintaining the wrong abstraction, or, god help, abstractions, would be even worse.

Hard disagree. When you've had to chase through a change in untold and actually unknown numbers of duplications of code in different permutations and fix them because they are all on fire simultaneously, you'd disagree too. A bad abstraction would at least have had one fire in one place.

trimbo an hour ago

> Hard disagree. When you've had to chase through a change in untold and actually unknown numbers of duplications of code in different permutations and fix them because they are all on fire simultaneously, you'd disagree too.

The other end of this spectrum is dealing with the architecture astronaut's up-front abstraction. Totally overengineered for solving the initial requirements, but then constantly needing new hacks to make it cope with new requirements as they come up in the normal course of work.

That's why there's a balance in there, it's somewhere between "always duplicate code even when you know a lot about the problem" and "always write abstractions even when you know very little about the problem."

davidee 2 hours ago

Good faith question: would it?

Wouldn't most large codebases with poor abstractions just have engineers engineer around them with their own solutions? In a large enough codebase you'd have both the bad abstractions and all the not-quite-duplicate implementations ignoring the bad abstraction?

I'm using bad here loosely, it could be buggy, incorrect, incomplete, insufficient and more; while being owned by someone or some team that's a challenge to work with for various reasons (overloaded, under-resourced, overbearing, etc., etc.).

dofm 2 hours ago

> Wouldn't most large codebases with poor abstractions just have engineers engineer around them with their own solutions?

Obviously, yes. But it is my experience that this happens more slowly and that API invocations that break when the abstraction is changed are much easier to identify than broader duplicated patterns of code that span many lines and subtly diverge.

And even then those divergences are better because each wrapper around the abstraction is documenting the problem with it. But the abstraction can generally be replaced by one with the same API surface.

(Even if you take into account the fact that any API behaviour ultimately gets relied upon even if undocumented. Which is true.)
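A minimal sketch of the "same API surface" point, using hypothetical names (`format_price_v1`, `format_price`, and the wrapper are invented for illustration): because callers go through one named entry point, a flawed implementation can be replaced by one with the same parameters, and the caller-side wrapper written against the old version records exactly what it got wrong.

```python
from decimal import Decimal, ROUND_HALF_UP

# Original (flawed) abstraction: binary-float rounding, and `currency` is ignored.
def format_price_v1(amount: float, currency: str = "USD") -> str:
    return f"${round(amount, 2):.2f}"

# Caller-side wrapper written against v1. It only exists because v1 ignores
# `currency`, so it documents that flaw in one greppable place.
def eur_price_workaround(amount: float) -> str:
    return format_price_v1(amount).replace("$", "€")

# Replacement with the same parameters: decimal half-up rounding, and the
# currency argument is actually honoured.
_SYMBOLS = {"USD": "$", "EUR": "€", "GBP": "£"}

def format_price(amount: float, currency: str = "USD") -> str:
    value = Decimal(str(amount)).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
    return f"{_SYMBOLS.get(currency, currency + ' ')}{value}"
```

Once callers switch to `format_price`, the workaround becomes redundant, and finding every caller is a search for one name rather than for a pattern.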

To be fair my experience is that of a freelancer and contractor who arrives trying to fix things that have been through many such hands. And I think if these developers had it drummed into their head that any attempt at abstraction would be better than copy and paste, these situations would be more knowable.

jcgrillo 2 hours ago

> engineers engineer around them with their own solutions

When that happens there's a major engineering leadership failure currently in progress, even if engineering leadership isn't aware of it.

sodapopcan an hour ago

Yep, this is why I find talking about this tiring. No matter what you say, many people are going to keep reading it as "duplication is always better than abstracting."

jcgrillo 18 minutes ago

It's more nuanced than either extreme. But regardless of the root cause, if you have engineers duplicating work left and right something has gone wrong. Their labor is not being used efficiently.

EDIT: LLM or not, this is still true. If you have LLMs pumping out tons of duplicate code you're wasting tokens, and probably more importantly wasting engineer hours reviewing duplicate code.

In some cases it might be a fair trade, in moderation. In general it's certainly wrong.

swader999 2 hours ago

The article isn't saying don't DRY; it's saying don't force DRY. That's a very big difference, and you get ideal maintainability when you ease off a bit but still use it.

grayclhn 2 hours ago

IME a bad abstraction results in the same thing, just with a lot of wasted effort coming up with the abstraction first, and a lot more resistance to fixing it because people are too emotionally invested. I’d rather have something clearly chosen for expedience and that no one likes.

SkiFire13 an hour ago

> A bad abstraction would at least have had one fire in one place

That's true only for "good" abstractions. Bad abstractions will often require you to change code in all the places using them, requiring you to understand how all of them work and what their requirements are, _all at the same time_.

ted_dunning an hour ago

A bad abstraction would have caused many updates in many places because the API would never quite stabilize due to having been a force-fit from the start.

A uses the abstraction, but finds the API doesn't work. Fixes that.

That causes B to have to make a tracking change which induces a bug. B realizes that the API isn't quite right. Fixes it.

That causes A and C to make tracking changes. These induce more bugs. C fixes the abstraction to avoid these cases.

This breaks A and B so they decline to update.

And so on. This is what a bad abstraction looks like. API "fixes" bouncing around the code as they reflect off of the bad abstraction.

ted_dunning an hour ago

I, on the other hand, have had to burn through countless cycles of security alerts because I used a library for JSON parsing that had all kinds of other features that I didn't need or want.

The security bugs were all in features I never wanted.

A bit of simple duplication would have been golden.

anygivnthursday an hour ago

Both are bad, what you describe is very real, but so is the opposite. That one fire in one place can end up in a total rewrite of numerous layers because the abstraction never anticipated certain things to happen.

coldtea 2 hours ago

>A bad abstraction would at least have had one fire in one place.

On the contrary: that's precisely what a bad abstraction would not offer.

Instead it would spread its assumptions to different parts of the system, as every caller, sub-service, etc. would have to change shape to fit in that abstraction's box, however unnatural it is (and we know it would be unnatural, because we already said it's a bad abstraction).

Abstraction is not the same as encapsulation.

dofm 2 hours ago

> instead it would spread its assumptions to different parts of the system,

But so does duplication, in practice, and it diverges more as it does.

coldtea 2 hours ago

Duplication is just code doing the same thing in several places, and as such it's much easier to make DRY (and much easier after you have N copies to see what should be shared and what should not), compared to re-architecting the whole system to remove a bad abstraction.

cjfd an hour ago

No. The duplication is seldom that clean. It has started to diverge in subtle ways, and the question becomes whether that was intentional or not. In the worst cases it has resulted in 8000-line functions full of duplication. 're-architecting the whole system to remove a bad abstraction' sounds like fear-mongering. That never happens.

svieira an hour ago

Au contraire, mon ami: I am currently in the process of doing just that in many places in my current codebase.

rpdillon 2 hours ago

In your mind, what's the cost of the wrong abstraction?

dofm 2 hours ago

The major risk/cost is breakage if you must change it but cannot maintain its whole surface even with a shim, right?

But any abstraction ends up with a signature and a name that can quickly be found in code.

The risk of a long-lived duplication losing its shape and being hard to find is much greater. Especially if the code is going through multiple hands.

I once had to pick up a project — a working, fully functional website. I could see, pretty clearly, the work of several people. All but one of them terrible.

The one was a diligent developer who was fully wrong in their abstraction (in fact significantly) but was consistent in how they used it.

The rest had simply worked around that code, copied and re-copied their own modified duplications and let things lose any shape. The result was error-prone stuff.

Clearly either the budget (or the client's capriciousness — a separate issue and arguably the bigger one) scared away the one guy, who I actually wanted to talk to but could not track down. He possibly had the origin story, and I wanted to know why his particular abstraction, which was at odds with the framework, was there. It was good code in the wrong shape, and it clearly used to do more, and that is interesting.

All the expedient people who had decided to avoid his code and just patch in duplicated pieces around it were the problem. There was no form to their solution at all. And that had clearly happened over some time (because you could see several different code styles).

rubyn00bie 2 hours ago

I am confused by this comment. The root problem was the wrong abstraction was implemented. Then it was duplicated. Had there been no abstraction, it would not have been duplicated so readily? Am I missing something?

dofm 2 hours ago

I will reword it slightly, I typed too fast.

rubyn00bie 2 hours ago

The same problem exists, and I think is unfathomably worse, when the wrong abstraction is used throughout a code base.

Abstractions are a form of coupling, and coupling can be good if the components are truly interdependent and have a well-defined domain. The problem with most abstractions, and I’ve seen this time and time again, is that they become brittle, are overused, and the cost of maintaining them grows exponentially with the size of the code base. The cost balloons because the system has disparate components that look interrelated but absolutely are not. Once you give someone a hammer, they tend to assume everything is a nail.

The biggest problem, IMHO, is that abstractions are often used where a pattern would be more effective, easier to maintain, and easier to iterate on. And the primary difference between a pattern and an abstraction really comes down to coupling. Patterns remain decoupled, abstractions are tightly coupled.
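One possible reading of the pattern/abstraction distinction drawn here, sketched in Python with invented importer functions (an interpretation for illustration, not the commenter's code): the "pattern" versions share a shape but no code, while the "abstraction" routes both through one function whose parameters every caller is now coupled to.

```python
# Pattern: each importer independently follows the same shape
# (parse -> validate -> build) but shares no code, so either can change alone.
def import_user(row: dict) -> dict:
    name = row["name"].strip()
    if not name:
        raise ValueError("user needs a name")
    return {"kind": "user", "name": name}


def import_order(row: dict) -> dict:
    total = float(row["total"])
    if total < 0:
        raise ValueError("order total must be non-negative")
    return {"kind": "order", "total": total}


# Abstraction: one shared function both go through. Less code, but both record
# types are now coupled to its assumptions (one field, one parser, and one
# validation predicate per record).
def import_record(row: dict, kind: str, field: str, parse, is_valid, message: str) -> dict:
    value = parse(row[field])
    if not is_valid(value):
        raise ValueError(message)
    return {"kind": kind, field: value}
```

If orders later need to validate against a second field, `import_order` can change on its own; `import_record` would need a new parameter that every caller then has to know about.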

And to be clear, I will and do use abstractions, when and where they make sense. But only after clear patterns emerge, and it’s been proven that components are truly coupled.

I will gladly die on the hill that abstractions are measurably worse than duplication the overwhelming majority of the time. They’re often nothing more than a form of premature optimization.

zingar 2 hours ago

What’s the difference between a pattern and an abstraction?

shinycode 3 hours ago

At work there was a huge amount of duplication in the early days of the company and no solid abstraction, so no tests either. We introduced tests in the current architecture, but rewriting code has a huge cost because we have to make sure there is no regression. For a SaaS that many customers rely on daily as part of their workflow, that's non-trivial: regressions caused by a rewrite could be really painful for them. So we must allocate a bigger budget and take the time to make sure nothing major breaks. Meanwhile the debt compounds over time as code is added. Duplication is bad, and weird/purist abstraction can make the architecture so rigid that rewriting things generates bugs that are hard to understand and catch. It’s hard to find a good balance and it depends on the kind of business and scale of project. Hard to make that into generic advice.

ghosty141 2 hours ago

I think all these comments here are kinda talking past each other.

It all depends on the amount of duplication and the complexity of the abstraction. Like you said, no generic advice is possible that clearly separates it into "abstract here" and "duplicate here".

In your example it sounds like we aren't talking about 2-3 places where duplicate code existed that just needed to be refactored into separate units. It sounds more like a complete disregard for abstraction to move on quickly.

If you see duplicate code and have a good understanding of how to solve it, then deduplicating is totally a good thing. The real problem comes in if you add abstractions without knowing whether they will hold up. And this is where the blog post comes in. In my opinion 2 duplicates are fine; at 3 you should start thinking about or implementing an abstraction, if you have a good understanding of the code and use cases.

chairmansteve 2 hours ago

"It’s hard to find a good balance and it depends on the kind of business and scale of project".

Exactly. The abstraction purists are not working in the messy, deadline-driven real world.

pfannl 2 hours ago

The real rule is probably: duplicate until the abstraction stops looking like a horoscope.

bluefirebrand 3 hours ago

Yeah, "Write Everything Twice" is a pretty common and sensible direction for any codebase

marcosdumay 2 hours ago

It's sensible if you have strict control of your duplications. You do have strict control of what is duplicated and where, right?

Write everything twice quickly becomes write everything 4 times once a new change appears, just as quickly as it becomes write everything 8 times, and so on.

I'm afraid there's no sensible soundbite developers can follow blindly.

coldtea 2 hours ago

>Write everything twice quickly becomes write everything 4 times once a new change appears, just as quickly as it becomes write everything 8 times, and so on.

That's a good problem to have. Getting to 4 or 8 or 12, and then pruning it to 1 or maybe 2 or 3 clearly different cases, is better than shoehorning multiple cases into the wrong abstraction, having everything that speaks with them coupled to that and dancing around their assumptions, and then having to untangle that.

Duplicated code is by definition LESS coupled.

cwmoore 2 hours ago

Yeah, ~"Write Everything Twice"~ “Copy and Paste Working Code” is a pretty common and sensible direction for any codebase

lanstin 2 hours ago

In C I used to make it so my standard per-file and per-lib code could be cut and pasted into other files/libs without modification (e.g. every file had an mLocal variable holding the file-visibility symbols, every module had a #module define for logging, there was always an mLocal.stats member, etc.). I think some of this duplicate vs. abstract question depends on your language's expressiveness: Rust or Lisp, with good compile-time power, make it possible to squeeze out a lot of duplication that in less expressive languages just stays as idioms (here’s the five lines to make a syscall, or here’s the skeleton of parsing a portable network buffer into a native object).

Having a lot of if/else in your code is definitely a cost. My weakness isn’t so much the libraries and APIs but the actual binary: once I have a service that does A very well and I run into needing A’, I mostly just add a config line “op_mode = A|A’” and put else/if chains in the server driving code. More so for CLIs that I use myself than for production services, but I have added tunables for consistency and replication to datastores to allow new use cases and expand my footprint in the data center.
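A rough Python sketch of the `op_mode` approach described above. The handler names and the "strong consistency" detail are invented for illustration, and the dispatch table at the end is one common alternative to the else/if chain, not something the commenter said they use.

```python
# Hypothetical server driver: one binary serving mode A and a variant A'.
# The config line `op_mode = A|A'` is modelled here as a plain string.

def handle_a(request: dict) -> str:
    return f"A:{request['key']}"


def handle_a_prime(request: dict) -> str:
    # Invented difference: A' asks for stronger consistency.
    return f"A':{request['key']}:strong"


# The else/if chain: each new mode adds a branch here, and in practice in
# every other place that checks op_mode.
def serve_if_chain(op_mode: str, request: dict) -> str:
    if op_mode == "A":
        return handle_a(request)
    elif op_mode == "A'":
        return handle_a_prime(request)
    else:
        raise ValueError(f"unknown op_mode: {op_mode!r}")


# Alternative: a dispatch table keeps the mode-to-behaviour mapping in one place.
HANDLERS = {"A": handle_a, "A'": handle_a_prime}


def serve_table(op_mode: str, request: dict) -> str:
    try:
        handler = HANDLERS[op_mode]
    except KeyError:
        raise ValueError(f"unknown op_mode: {op_mode!r}") from None
    return handler(request)
```

The table does not remove the cost of supporting two modes; it only stops the branching from spreading through the driver code.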

fny 3 hours ago

Code duplication is cheaper than the wrong abstraction. If you have a good abstraction, you should run with it.

If you haven't figured out a good abstraction at 5-100 customers, God help you.

feoren 2 hours ago

A good abstraction? As in one? I'd go so far as to say the process of discovering and refining abstractions is the most important part of software engineering. A large project has dozens of abstractions, and some of them are "wrong" at any time, as you discover over time. None are ever perfect. If you wait to stop duplicating code until you have the "right" abstraction, you are just putting off the hard part of developing software and taking on tech debt.

Half of your abstractions are wrong. The hard part is knowing which half.

lanstin 2 hours ago

I once worked at a place with abstractions I found to be beautifully perfect. The people that wrote the base framework had done similar things two or so times previously and got it right the third time. You couldn’t write slow or hard to operate code there without really trying hard.

meerita 2 hours ago

Good abstraction does one single thing and does it well. Bad abstraction starts from the premise of becoming a dumping ground. If that is the case, the best and ideal scenario is splitting the abstraction into several smaller ones, each of which does its job better.

stymaar 2 hours ago

> Code duplication is cheaper than the wrong abstraction

This is tautological, though. It's like saying “starving is much better than eating the wrong food” (for instance, quicklime).

Of course you'll always find a way to do things wrong in a way that is costlier than not doing anything.

blauditore 2 hours ago

Sure, but obviously that sentence implies that wrong abstractions are fairly common.

enos_feedler 2 hours ago

What if there is no good abstraction for the entire stack of software on each of our computers? What if we built a common one because we had to? What if we now all get to make our own with natural language?

dofm 3 hours ago

I disagree.

But also it's very possible to not realise you needed an abstraction until it catches fire in multiple places.

And quite often it's not you that got the codebase to a hundred customers, is it? Sometimes it is a sequence of fresh-faced young developers who didn't have the authority to say "this duplication is bullshit" and were instead compelled to repeat it.

I think a lot of these discussions happen in nice little blog-post vacuums of progressive thinking, where people can go "mmm, object oriented coding obscures intent and clarity, mmm", blog posts with "an X is a Y", "the unreasonable effectiveness of foobar" etc.

In the real world, every duplication that works sticks for good; there is rarely budget to electively replace code that isn't broken. Until one day it doesn't work. And then… how many times is it actually duplicated? How many of the duplicates diverged? How many of these do we no longer need?

chairmansteve 2 hours ago

> I disagree.

So... the wrong abstraction, no matter how bad, is better than code duplication?

dofm 2 hours ago

If you read my original comment I said pretty much this, yes.

> I would go as far as to say that any abstraction you can maintain (that is in active maintenance, I mean) is better than code duplication once you are past a de minimis threshold.

I appear to be in a solid minority thinking this. But I'm OK with it. I'm probably not going to write a blog post.

locknitpicker 2 hours ago

> If you haven't figured out a good abstraction at 5-100 customers, God help you.

This brand of opinion is very naive. Every single project is a business requirement away from having the wrong abstraction in place.

https://xkcd.com/1425/

tomjakubowski 10 minutes ago

Reminds me of the Taskmaster interview of Sam Campbell, where (Little) Alex Horne is changing the requirements on his portrait up to the last second.

https://www.youtube.com/watch?v=VG0btgXY_D0

auggierose an hour ago

Good one. See, we did make some serious progress, all you AI haters.

mytydev 3 hours ago

It sounds to me like you are describing a good abstraction. This article does not claim that code duplication is better than any abstraction. It claims that code duplication is better than the wrong abstraction. I'm sure this author would agree that a good abstraction is better than code duplication.

dofm 2 hours ago

I'm afraid this comment reads in a rather gnomic way.

Of course it's a truism if you just say any abstraction that works is a good abstraction.

That is not what I am saying at all. Bullshit abstractions at least let you control the problem. Duplication doesn't.

vlunkr 2 hours ago

But it’s never going to be 1:1 duplication is it? Sometimes it’s better to copy code as a template for something new, rather than try to immediately force a new abstraction.

I agree with you that it’s a truism, but it’s useful advice for people who have a habit of trying too hard to DRY their code. IIRC the author comes from the Ruby world, where DRY was a big thing, and this talk was part of the pendulum swinging back away from this DRY obsession that sometimes just resulted in convoluted code.

agumonkey 3 hours ago

You seem to have experience. I don't mind factoring/unifying logic when it's done sensibly, with enough history in the trenches. It pains me more whenever a young dev comes in and barks "we must merge these two things!" repeatedly, without planning for more than two cases, and starts adding more and more boolean variables. Crystal makers. Then the obvious issue arrives: the two variants weren't that close, and now there's one god class trying to handle all forces in one big state.

I agree that LLMs are naturally anti-abstraction machines. I'm often trying to find ways to reverse that.

dofm 3 hours ago

> I agree that LLMs are naturally anti-abstraction machines. I'm often trying to find ways to reverse that.

I am a bit of an LLM cynic, but I am trying to learn it all, and I have to say I have spent most of my time trying to work out one thing: how do you explain how a brownfield codebase actually works, in such a way that the LLM won't pervert it through misunderstanding?

It does encourage you towards the "conventional" coding standard for any new project, because you want to use a pattern that it will have seen in its training set.

But, for example, there are differences of opinion on how WordPress plugins (which have a very complex control flow) should be structured. LLMs are incredible at knowing how WP works, actually, but what is difficult is explaining how your methodology for a large plugin is going to work.

It is a battle — but a useful one because it can be used for, er, studying the comparative belief systems of the LLMs.

jbeninger an hour ago

The gold standard is code samples. I've got 1000-line convention documents with very simple rules like "Early returns on a single line". LLMs sometimes ignore these or misinterpret them in unusual ways.

But if I tell it "read these files that use the same conventions" first, there's no misunderstanding, and the agent also picks up the general "tone" of the code. I have very little to tweak if I've defined the problem well.

dofm 6 minutes ago

> But if I tell it "read these files that use the same conventions" first, there's no misunderstanding, and the agent also picks up the general "tone" of the code. I have very little to tweak if I've defined the problem well.

Oh that is a bloomin' great idea, and I can fully see how it might work better.

Can't tell you how valuable this comment has been to me and now I feel so much better about evidently kicking a hornet's nest ;-) Thank you so much.

wonnage 2 hours ago

They don’t have a useful belief system; one of the rookie mistakes of using LLMs is asking them what you “should” do.

dofm 2 hours ago

Absolutely. I think the bit I still struggle with is finding a way to get them to join my team (which is a team of one very tired person).

A story I like is that in the now lost era of handwriting recognition on PDAs, Jef Raskin concluded that the easiest way to solve the problem was to change handwriting so as to meet the algorithm in the middle.

That is, to find a noticeable simplification of handwriting that people could learn quickly and that eliminated hard-to-process quirks.

I feel I am there with the LLM at the moment, trying to work out what the common ground is.

ChrisMarshallNY 2 hours ago

In my experience, the answer is always "It Depends." That's about the only thing that I can hang "always" on.

It really depends on the exact type of code we're working with, and what our objectives are.

In my case, I often use object inheritance. It's a damn cheap way to DRY. However, when people hear "inheritance," they often think "polymorphism." There's a really big difference between the two, but popular culture has jammed them into one ball, and it's not worth the agita to try to explain the difference.
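A small Python sketch of the distinction (class names invented for illustration): the first hierarchy uses inheritance only to reuse helper methods, while the second relies on polymorphism, where callers program against the base type and the subclass override decides what happens at runtime.

```python
# Inheritance for reuse: the base class is just a place to put shared helpers.
class Exporter:
    def header(self, title: str) -> str:
        return f"# {title}"

    def footer(self) -> str:
        return "-- end --"


class CsvReport(Exporter):
    # Reuses header()/footer() instead of copying them. No caller needs to
    # treat a CsvReport as "an Exporter"; nothing is overridden.
    def render(self, rows: list) -> str:
        body = [",".join(map(str, r)) for r in rows]
        return "\n".join([self.header("csv"), *body, self.footer()])


# Polymorphism: callers only know about Shape, and each subclass overrides area().
class Shape:
    def area(self) -> float:
        raise NotImplementedError


class Square(Shape):
    def __init__(self, side: float):
        self.side = side

    def area(self) -> float:
        return self.side ** 2


def total_area(shapes: list) -> float:
    return sum(s.area() for s in shapes)
```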

But if you are doing optimization, long stacks can be your enemy, and inheritance tends to have long, windy stacks.

In these cases, the copy/pasta method may well be the best approach.

Like I said, "It Depends."

nfw2 2 hours ago

Over-engineering and "abstraction hell" are very much not iconoclastic concepts

a-dub an hour ago

i agree with the author. i argue a preference for loose coupling over centralized abstractions. sure it's pleasing to compress the code, but if the use cases actually are sufficiently divergent (as well as bugs and externally driven changes) ultimately it becomes brittle, littered with edge cases behind if fences and both challenging and daunting to change.

ideal case: support libraries and then very simple duplicated code that is easy to read and modify. critically the core control flow should remain duplicated, but simplified by the support libraries.
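A minimal Python sketch of that "ideal case", with invented payment handlers: the small helpers are shared, while each handler's top-level control flow is deliberately written out again so it can diverge (here, only charges get a limit check).

```python
# --- support library: small, boring, shared helpers ---
def parse_amount(text: str) -> int:
    """Parse a non-negative decimal string like '12.34' into cents.
    Digits past the second decimal place are truncated."""
    whole, _, frac = text.strip().partition(".")
    return int(whole) * 100 + int((frac + "00")[:2])


def require(condition: bool, message: str) -> None:
    if not condition:
        raise ValueError(message)


# --- duplicated control flow: each use case keeps its own top-level steps ---
def handle_refund(req: dict) -> dict:
    cents = parse_amount(req["amount"])
    require(cents > 0, "refund must be positive")
    return {"op": "refund", "cents": -cents}


def handle_charge(req: dict) -> dict:
    cents = parse_amount(req["amount"])
    require(cents > 0, "charge must be positive")
    require(cents <= 100_000, "charge over limit")  # $1,000 cap, charges only
    return {"op": "charge", "cents": cents}
```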

mawadev 3 hours ago

I think you applied this idea to the era of LLMs, but consider an abstraction that takes in multiple god structs for branches it may or may not call in the case you are looking at, and has a lot of if conditions that explode in combinatorial complexity across a deep call chain. Now the bottleneck is that you need to call this function 144 times a second. That is where you start to get clusters of hot code paths where the latency stacks up depending on the angle the god structs come in at. Not sure what LLMs do here; I don't vibe code.

dofm 3 hours ago

I am applying it to LLMs on the basis of twenty years of seeing smaller programming shops tie themselves in knots by using duplication to avoid developing an abstraction that would help them because they were unsure of it.

Everyone always thinks duplication is fine when you can bill the modifications by the hour. But they never think to understand that the reason they've had so many employees is that they've turned their change process into firefighting all the different versions of the same code and all these young developers burn out from the sheer anxiety of not knowing where all the little fires are.

I once had to rescue a site that had become a victim of its own popularity, that was written by subcontractors who clearly believed that duplication is better than the wrong abstraction.

Until one day, along came a change — MySQL 4 to MySQL 5 — and a significant duplicated query no longer worked due to its new, proper strictness.

The problem was compounded; not only was the broken pattern in hundreds of places where it had sat, stable and predictable, but the pattern was broken because it, itself, was avoidance of another abstraction that would solve it.

They quit: they said they couldn't and wouldn't fix it. It had always worked how they had done it, and it would have to stay on MySQL 4 (which the hosting provider refused to accommodate).

I don't think it helped that they were severely misguided in their understanding of SQL, but the code had become beholden to duplication and then crippled by a new problem in the duplicated pattern.

I had to first find all the contexts in which that pattern appeared (which required me to spend half a day on a bespoke script) and then work out a new pattern and as few variations of it as possible to fix the duplicated code in each place, because there was no proper budget to rewrite the whole thing. And then I sat at my desk, for days, working through each one, figuring out how to change it to fit the slightly different expression of the pattern.
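The story does not say what the bespoke script looked like; as a hedged illustration only, a first pass at such a search might be as small as this Python sketch. The regex, the `.php` suffix, and the `find_occurrences` name are all assumptions, and duplicates that have diverged will usually need several looser patterns.

```python
import re
from pathlib import Path

# Hypothetical pattern: an inline SQL string selecting from a `users` table.
# The real duplicated pattern in the story is unknown.
PATTERN = re.compile(r'"SELECT\s+.*?\s+FROM\s+users\b', re.IGNORECASE)


def find_occurrences(root: str, suffixes: tuple = (".php",)) -> list:
    """Return (path, line_number, line) for every line under root matching PATTERN."""
    hits = []
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and path.suffix in suffixes:
            text = path.read_text(encoding="utf-8", errors="replace")
            for n, line in enumerate(text.splitlines(), start=1):
                if PATTERN.search(line):
                    hits.append((str(path), n, line.strip()))
    return hits
```

Printing each hit as `path:line: text` gives a checklist to work through by hand, which is roughly the days-long process described above.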

Even a total bullshit abstraction would have saved that client both time and money. And this is only one of dozens of times I've seen small firms simply duplicate and change code that would later become unmaintainable because of a straw breaking a camel's back.
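As a minimal illustration of the point, using Python's built-in `sqlite3` rather than the MySQL stack in the story, and an invented `users` schema (this is not a reconstruction of the actual site): routing every call site through one named function means a query that breaks under stricter database rules is fixed once, and the name is trivially searchable.

```python
import sqlite3


# Hypothetical helper: even a crude abstraction gives the query one home.
# If a database upgrade tightens the rules, this is the only SQL to change.
def active_users(conn: sqlite3.Connection, min_logins: int = 1) -> list:
    return conn.execute(
        "SELECT name FROM users WHERE active = 1 AND logins >= ? ORDER BY name",
        (min_logins,),
    ).fetchall()
```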

Capricorn2481 3 hours ago

Again, this is the opposite of what the author argues for, which is waiting for a couple instances before committing to an abstraction. Not duplicating a SQL query across hundreds of places.

I would be curious whether the previous coders you're talking about actually cited duplication as a good thing. You seem to be implying they did. But almost every instance I've seen of massive code duplication was just from bad programmers shooting from the hip, not from some ideological stance.

dofm 2 hours ago

> Again, this is the opposite of what the author argues for, which is waiting for a couple instances before committing to an abstraction. Not duplicating a SQL query across hundreds of places.

Right. But this is a hypothetical, in-a-vacuum situation.

In the real world, your two, three duplicates are in production.

"We really should now de-duplicate this"

"There is not the time or budget, just copy it again; we'll replace all this one day".

Capricorn2481 3 hours ago

> I would go as far as to say that any abstraction you can maintain (that is in active maintenance, I mean) is better than code duplication once you are past a de minimis threshold.

Pretty much everyone arguing for duplication has argued what you are saying, which is to wait until you see a few instances before committing to an abstraction. No one is saying duplicate everything 100 times. So I don't think this discussion was ever iconoclastic.

dofm 3 hours ago

The point is it sounds all smart and sophisticated and principled in the abstract environment of a code discussion in a blog post.

In the real world, duplication happens in an emergent way. There isn't time, each time, to judge whether it's really the moment to just quietly abstract that code; you may not get the permission, budget or window to do it; and if you don't stop the rot really early, you are locked into the pattern.

jbeninger an hour ago

But... it shouldn't. People are arguing that a bad abstraction is better than none at all. A badly implemented abstraction is the same. If you hit code that has been duplicated organically a dozen times, you don't make it a baker's dozen. You spend a bit of extra time at least stubbing out the abstraction so future organic duplication can at least share an entry point. Abstractions grow organically too, in well-tended codebases.
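A minimal sketch of what "stubbing out the abstraction" might look like. The address-formatting domain and the names `format_address` and `legacy_invoice_address` are hypothetical, not from the thread; the point is only that new call sites get one shared entry point that starts thin, and old copies can be migrated into it as they are touched.

```python
from typing import Optional


def format_address(street: str, city: str, postcode: str,
                   country: Optional[str] = None) -> str:
    """Hypothetical shared entry point for address formatting.

    Deliberately thin: it covers the most common variant today and can
    grow as the duplicated copies are folded into it one at a time.
    """
    lines = [street, f"{postcode} {city}"]
    if country:
        lines.append(country.upper())
    return "\n".join(lines)


def legacy_invoice_address(street: str, city: str, postcode: str) -> str:
    # One of the existing copies, now delegating instead of re-implementing
    # the formatting, so a future fix only has to be made in one place.
    return format_address(street, city, postcode)
```

The stub does not try to anticipate every variant; it only gives the thirteenth copy somewhere better to go.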

cjfd an hour ago

100% agree. 'Code duplication is far cheaper than the wrong abstraction' is a very good candidate for the worst programming article ever.

cyberax an hour ago

My general rule is to start refactoring once you have three copies of the code.

Starting with abstraction when you are only beginning something rarely works well and leads to code bases littered with interfaces having only one implementation.

Abstracting the code when you have two copies does not always pay off, especially when you end up not needing more than two copies anyway.

But once you have three copies, it's indeed time to start generalizing.
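The rule of three can be made roughly mechanical with a crude duplicate finder. This is a hypothetical sketch: it assumes code snippets have already been collected into a dict keyed by location (a `"file.py:42"` style key is an assumption), collapses whitespace so reformatted copies compare equal, and flags only snippets seen at least three times. Real clone detectors compare tokens or syntax trees; this only shows the idea.

```python
from collections import defaultdict
from typing import Dict, List

RULE_OF_THREE = 3  # threshold from the comment above; tune to taste


def normalise(snippet: str) -> str:
    """Collapse all whitespace so trivially reformatted copies compare equal."""
    return " ".join(snippet.split())


def refactor_candidates(snippets: Dict[str, str],
                        threshold: int = RULE_OF_THREE) -> Dict[str, List[str]]:
    """Group snippets by normalised text and keep groups with >= threshold copies.

    `snippets` maps a location (e.g. "file.py:42") to the code found there.
    Returns {normalised code: sorted list of locations}.
    """
    groups: Dict[str, List[str]] = defaultdict(list)
    for location, code in snippets.items():
        groups[normalise(code)].append(location)
    return {code: sorted(locs)
            for code, locs in groups.items()
            if len(locs) >= threshold}
```

Two copies stay off the list, which matches the "does not always pay off" caveat; the third copy is what triggers a look.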

yowlingcat an hour ago

One of the most challenging kinds of thought to work through with my engineers in professional communication is nuance. For example, they may say something like this, but actually mean "For a particular situation, this is wrong."

The context in which a decision is evaluated is particularly important for "rules of thumb" like this. There's the rule of 3 (which many senior engineers imparted to me earlier on in my career): don't refactor until you've actually duplicated it thrice. But even so, what they speak of is a catch-22 that's pretty important to reason about carefully.

On one hand, if you overcorrected on the fear of abstraction, you could easily end up with 500 duplicates that are slightly different and need to be maintained 500 different ways, slowly causing slightly wrong behavior some of the time, data corruption, combinatorial explosion. Surely, once there is such a situation, some degree of abstraction is the only right decision.

On the other hand, if you overcorrected on the fear of duplication early on, you could easily end up with a premature optimization and complexity -- complexity which, most importantly, could be rooted in a gap of understanding of how the code will be used and what direction it may go in over time (often based on which direction the business will go over time).

The only answer that actually works, of course, is "somewhere in the middle." Obviously, that's pretty vague and not very useful. Where, exactly, in the middle IS the right place?

As the years have gone by, I've become more and more convinced that the answer to that question is and must be an art and not a science. Of course, it must always be rooted in practicality: the actual context of the code around it, where the code and business were in the past, and where they will be in the future.

But just as importantly, some of it must be based around beliefs in the face of imperfect information about what you want to invest in for the sake of the technology, the team that develops it, and the business that relies on it. It could be that for your team, your values make it make sense to go a little bit further than "good enough" on normalizing your data modeling, because the way you like to run your business requires that normal form to do the analytics and make decisions productively. It could be that for your team, your values make it make sense to go a little bit further than "good enough" on splitting service boundaries and ensuring clean queues and message passing infrastructure, because you have seasonal spikes where you need to scale up to a ton of load and then scale down after without constantly doing a song and dance or pre-provisioning fragile infrastructure.

But the most common thread there is - art, not a science. Every single decision depends on YOUR team, YOUR business, YOUR needs - and like any art, there is no universal rule or discovery or best practice in the industry that will magically work for your needs without working through the details of whether it appropriately fits your situation or not.

So, with that said, I can't really agree with you. At any place I've ever worked with a competent team, maintaining duplicate code is just not that hard, and it is dealt with through the same process every time. Build a robust test suite that encodes the actual differences and the shared structure. Pull out the pieces that have a good reason to be abstracted, and redesign the pieces that encode the true differential structure in a way that is intuitive. Lather, rinse, repeat. It's always straightforward because it's known: by the time you are doing this process, you've had tons of repetitions and data on what is driving you to develop the abstraction, so when you make the decision, you are making it empirically.
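A small sketch of that process, with hypothetical shipping-cost functions standing in for two copies that have drifted apart (amounts in integer cents to avoid float rounding). The test pins down both the shared structure and the one real difference, then checks that a single replacement function, with the difference made an explicit parameter, reproduces both copies.

```python
# Hypothetical legacy copies that have drifted apart (amounts in cents).
def shipping_cost_web(weight_kg: int) -> int:
    return 400 + 150 * weight_kg


def shipping_cost_phone(weight_kg: int) -> int:
    return 400 + 150 * weight_kg + 200  # phone orders add a handling fee


# Candidate abstraction: the shared structure, with the true difference
# turned into an explicit parameter instead of a silent divergence.
def shipping_cost(weight_kg: int, handling_fee: int = 0) -> int:
    return 400 + 150 * weight_kg + handling_fee


def test_refactor_preserves_behaviour() -> None:
    for w in range(20):
        # Shared structure: the new function matches the web copy...
        assert shipping_cost(w) == shipping_cost_web(w)
        # ...and the one real difference is encoded, not lost.
        assert shipping_cost(w, handling_fee=200) == shipping_cost_phone(w)
        assert shipping_cost_phone(w) - shipping_cost_web(w) == 200
```

Once the test passes, the legacy copies can be deleted or turned into thin wrappers, and the test stays as a record of why the parameter exists.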

Conversely, I have seen many otherwise competent teams slowed to a halt by premature abstraction. Frameworks that were well intended and reduced duplication, but encoded coupling between components that, at a certain point in the business's progression, fought reality rather than aiding it, all because they were frozen into place before anyone had really clear empirical data about whether the abstraction would be worth it long term. Well-intended "clean code" refactors that were meant to solve the old "bad duplication" but instead created a far harder-to-reason-about "abstracted base" of code that didn't really solve any of the domain modeling problems and was just as difficult (if not more so) to maintain without introducing buggy behaviors as before.

The biggest problem is that premature abstraction is sexy and fun. There are incentives and dopamine hits from doing it extraneously. But fixing legacy duplication is not fun. And so when it gets done, it tends to get done in a pragmatic way to relieve pain rather than to elicit pleasure. That, I believe, is one of the biggest confounding sociological aspects of this whole discussion.

Thaxll 2 hours ago

So you centralize 3 liners?

dofm 2 hours ago

I said "beyond a de minimis threshold".

But in one of the scenarios I mentioned earlier, I once earned a chunk of money fixing an issue that emerged in a four- or five-line snippet a subcontractor had duplicated, which had ended up rippling through a long-lived codebase. A ground truth (the MySQL version) changed, and the pattern broke everywhere, including in places where it had evolved.

So I tend towards thinking, yes, any three-line pattern that is likely to appear everywhere should, perhaps, be centralised.

It's certainly worthy of serious consideration. Usually pretty easy to maintain the surface of such an abstraction.
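Centralising even a short query pattern might look like this hypothetical sketch, using Python's standard-library `sqlite3` module rather than MySQL, with invented table and function names. Every caller goes through `fetch_active_orders`, so if a database upgrade starts rejecting the way the query is written, there is one string to fix rather than hundreds.

```python
import sqlite3
from typing import List, Tuple


def fetch_active_orders(conn: sqlite3.Connection,
                        customer_id: int) -> List[Tuple[int, str, int]]:
    """Hypothetical shared helper: the single place this query is written.

    Returns (order id, customer name, total in cents) for the customer's
    active orders, oldest first.
    """
    sql = (
        "SELECT o.id, c.name, o.total_cents "
        "FROM orders AS o "
        "JOIN customers AS c ON c.id = o.customer_id "
        "WHERE o.customer_id = ? AND o.status = 'active' "
        "ORDER BY o.id"
    )
    # Parameter substitution (?) instead of string formatting, so callers
    # cannot each invent their own quoting.
    return conn.execute(sql, (customer_id,)).fetchall()
```

The surface stays small (one connection, one ID), which is what keeps this kind of abstraction cheap to maintain.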

thinkloop 2 hours ago

The key lesson is that duplicate code is not necessarily "code duplication"; it was always really about abstraction duplication. If two unrelated variables happen to momentarily share a value, that doesn't mean the value should be made common between them; they are fundamentally different things. It would be a confusing, error-prone lie if the code implied they were the same and that effort should be made to keep them in sync.
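A toy illustration of that point, with hypothetical names: two settings that both equal 5 today but encode unrelated decisions. Merging them into one shared constant would mean that changing the page size silently changes the lockout policy.

```python
# Two unrelated settings that happen to share a value today.
MAX_LOGIN_ATTEMPTS = 5   # security policy: lock out after this many failures
MAX_ITEMS_PER_PAGE = 5   # UI layout: results shown per page

# Tempting but wrong: a single FIVE = 5 used by both would couple
# two decisions that will almost certainly diverge.


def is_locked_out(failed_attempts: int) -> bool:
    return failed_attempts >= MAX_LOGIN_ATTEMPTS


def paginate(items: list, page: int) -> list:
    # Zero-based page index.
    start = page * MAX_ITEMS_PER_PAGE
    return items[start:start + MAX_ITEMS_PER_PAGE]
```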

dofm 2 hours ago

I guess any blog post can remain true if you can optionally take one of the key terms and redefine it so it can also mean the opposite?

tracerbulletx 3 hours ago

Huh? If anything, having lots of customers makes the argument for duplication stronger. The issue almost always arises once you get huge and five product teams are trying to achieve five different goals by using the same overwrought abstraction instead of just copying and decoupling. The abstractions that are actually stable end up becoming libraries or platform-team-owned systems that no one ever really touches.
