coldtea 3 hours ago

Hardly iconoclastic, it's a very sensible suggestion.

It would be iconoclastic if the common-sense basic approach were to start with abstraction. It's not; the common-sense default is to write possibly duplicate behavior until you actually discover several cases to abstract away, and until you develop a sensible idea of which functionality unites them and which doesn't carry over to all of them.

>Once you have an awkward number of customers (more than five and less than a hundred), maintaining duplicated code that should have been abstracted and modularised will only seem cheap if you don't mind that you burn through even junior employees at a pace

Maintaining the wrong abstraction, or, god help, abstractions, would be even worse.

dofm 3 hours ago

> Maintaining the wrong abstraction, or, god help, abstractions, would be even worse.

Hard disagree. When you've had to chase through a change in untold and actually unknown numbers of duplications of code in different permutations and fix them because they are all on fire simultaneously, you'd disagree too. A bad abstraction would at least have had one fire in one place.

trimbo an hour ago

> Hard disagree. When you've had to chase through a change in untold and actually unknown numbers of duplications of code in different permutations and fix them because they are all on fire simultaneously, you'd disagree too.

The other end of this spectrum is dealing with the architecture astronaut's up-front abstraction. Totally overengineered for solving the initial requirements, but then constantly needing new hacks to make it cope with new requirements as they come up in the normal course of work.

That's why there's a balance in there; it's somewhere between "always duplicate code even when you know a lot about the problem" and "always write abstractions even when you know very little about the problem."

davidee 2 hours ago

Good faith question: would it?

Wouldn't most large codebases with poor abstractions just have engineers engineer around them with their own solutions? In a large enough codebase you'd have both the bad abstractions and all the not-quite-duplicate implementations ignoring the bad abstraction?

I'm using bad here loosely, it could be buggy, incorrect, incomplete, insufficient and more; while being owned by someone or some team that's a challenge to work with for various reasons (overloaded, under-resourced, overbearing, etc., etc.).

dofm 2 hours ago

> Wouldn't most large codebases with poor abstractions just have engineers engineer around them with their own solutions?

Obviously, yes. But it is my experience that this happens more slowly and that API invocations that break when the abstraction is changed are much easier to identify than broader duplicated patterns of code that span many lines and subtly diverge.

And even then those divergences are better because each wrapper around the abstraction is documenting the problem with it. But the abstraction can generally be replaced by one with the same API surface.

(Even if you take into account the fact that any API behaviour ultimately gets relied upon even if undocumented. Which is true.)

To be fair my experience is that of a freelancer and contractor who arrives trying to fix things that have been through many such hands. And I think if these developers had it drummed into their head that any attempt at abstraction would be better than copy and paste, these situations would be more knowable.

jcgrillo 2 hours ago

> engineers engineer around them with their own solutions

When that happens there's a major engineering leadership failure currently in progress, even if engineering leadership isn't aware of it.

sodapopcan an hour ago

Yep, this is why I find talking about this tiring. No matter what you say, many people are going to keep reading it as "duplication is always better than abstracting."

jcgrillo 14 minutes ago

It's more nuanced than either extreme. But regardless of the root cause, if you have engineers duplicating work left and right something has gone wrong. Their labor is not being used efficiently.

EDIT: LLM or not, this is still true. If you have LLMs pumping out tons of duplicate code you're wasting tokens, and probably more importantly wasting engineer hours reviewing duplicate code.

In some cases it might be a fair trade, in moderation. In general it's certainly wrong.

swader999 an hour ago

The article isn't saying don't DRY; it's saying don't force DRY. That's a very big difference, and you get ideal maintainability when you ease off a bit but still use it.

grayclhn 2 hours ago

IME a bad abstraction results in the same thing, just with a lot of wasted effort coming up with the abstraction first, and a lot more resistance to fixing it because people are too emotionally invested. I’d rather have something clearly chosen for expedience and that no one likes.

SkiFire13 an hour ago

> A bad abstraction would at least have had one fire in one place

That's true only for "good" abstractions. Bad abstractions will often require you to change code in all the places using them, requiring you to understand how all of them work and what their requirements are, _all at the same time_.

ted_dunning an hour ago

A bad abstraction would have caused many updates in many places because the API would never quite stabilize due to having been a force-fit from the start.

A uses the abstraction, but finds the API doesn't work. Fixes that.

That causes B to have to make a tracking change which induces a bug. B realizes that the API isn't quite right. Fixes it.

That causes A and C to make tracking changes. These induce more bugs. C fixes the abstraction to avoid these cases.

This breaks A and B so they decline to update.

And so on. This is what a bad abstraction looks like. API "fixes" bouncing around the code as they reflect off of the bad abstraction.

ted_dunning an hour ago

I, on the other hand, have had to burn through countless cycles of security alerts because I used a library for JSON parsing that had all kinds of other features that I didn't need or want.

The security bugs were all in features I never wanted.

A bit of simple duplication would have been golden.

anygivnthursday an hour ago

Both are bad, what you describe is very real, but so is the opposite. That one fire in one place can end up in a total rewrite of numerous layers because the abstraction never anticipated certain things to happen.

coldtea 2 hours ago

>A bad abstraction would at least have had one fire in one place.

On the contrary: that's precisely what a bad abstraction would not offer.

Instead it would spread its assumptions to different parts of the system, as every caller, sub-service, etc. would have to change shape to fit in that abstraction's box, however unnatural it is (and we know it would be unnatural, because we already said it's a bad abstraction).

Abstraction is not the same as encapsulation.

dofm 2 hours ago

> instead it would spread its assumptions to different parts of the system,

But so does duplication, in practice, and it diverges more as it does.

coldtea 2 hours ago

Duplication is just code doing the same thing in several places, and as such it's much easier to make DRY (and much easier after you have N copies to see what should be shared and what should not), compared to re-architecting the whole system to remove a bad abstraction.
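As a hypothetical illustration of that claim (the function names and output formats are invented, not from the thread): suppose three copies turned out to differ only in the separator they print. With the copies side by side, the varying part becomes the parameter of one helper, and the old names survive as thin wrappers so existing call sites don't change. A minimal C sketch:

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical shared helper extracted after comparing three near-copies:
 * the only thing that varied between them (the separator) is now a parameter.
 * Returns the snprintf result, i.e. the length of the full joined string. */
int join_pair(char *out, size_t n, const char *a, const char *b,
              const char *sep) {
    return snprintf(out, n, "%s%s%s", a, sep, b);
}

/* The former copies become thin wrappers, so existing callers keep working. */
int csv_line(char *out, size_t n, const char *a, const char *b) {
    return join_pair(out, n, a, b, ",");
}

int tsv_line(char *out, size_t n, const char *a, const char *b) {
    return join_pair(out, n, a, b, "\t");
}

int log_line(char *out, size_t n, const char *a, const char *b) {
    return join_pair(out, n, a, b, " | ");
}
```

This only works as neatly as shown when the copies really are this clean; the reply below argues that in practice they have often diverged in subtler ways.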

cjfd an hour ago

No. The duplication is seldom that clean. It has started to diverge in subtle ways, where the question becomes whether that was intentional or not. In the worst cases it has resulted in 8000-line functions full of duplication. 'Re-architecting the whole system to remove a bad abstraction' sounds like fear-mongering. That never happens.

svieira an hour ago

Au contraire, mon ami, I am currently in the process of doing just that in many places in my current codebase.

rpdillon 2 hours ago

In your mind, what's the cost of the wrong abstraction?

dofm 2 hours ago

The major risk/cost is breakage if you must change it but cannot maintain its whole surface even with a shim, right?

But any abstraction ends up with a signature and a name that can quickly be found in code.

The risk of a long-lived duplication losing its shape and being hard to find is much greater. Especially if the code is going through multiple hands.

I once had to pick up a project — a working, fully functional website. I could see, pretty clearly, the work of several people. All but one of them terrible.

The one was a diligent developer who was fully wrong in their abstraction (in fact significantly) but was consistent in how they used it.

The rest had simply worked around that code, copied and re-copied their own modified duplications and let things lose any shape. The result was error-prone stuff.

Clearly either the budget (or the client's capriciousness — a separate issue and arguably the bigger one) scared away the one guy, who I actually wanted to talk to but could not track down. He possibly had the origin story, and I wanted to know why his particular abstraction, which was at odds with the framework, was there. It was good code in the wrong shape, and it clearly used to do more, and that is interesting.

All the expedient people who had decided to avoid his code and just patch in duplicated pieces around it were the problem. There was no form to their solution at all. And that had clearly happened over some time (because you could see several different code styles).

rubyn00bie 2 hours ago

I am confused by this comment. The root problem was the wrong abstraction was implemented. Then it was duplicated. Had there been no abstraction, it would not have been duplicated so readily? Am I missing something?

dofm 2 hours ago

I will reword it slightly, I typed too fast.

rubyn00bie 2 hours ago

The same problem exists, and I think is unfathomably worse, when the wrong abstraction is used throughout a code base.

Abstractions are a form of coupling, and coupling can be good if the components are truly interdependent and have a well-defined domain. The problem with most abstractions, and I’ve seen this time and time again, is that they become brittle, are overused, and the cost of maintaining them grows exponentially with the size of the code base. The cost balloons because the system has disparate components that look interrelated but absolutely are not. Once you give someone a hammer, they tend to assume everything is a nail.

The biggest problem, IMHO, is that abstractions are often used where a pattern would be more effective, easier to maintain, and easier to iterate on. And the primary difference between a pattern and an abstraction really comes down to coupling. Patterns remain decoupled, abstractions are tightly coupled.

And to be clear, I will and do use abstractions, when and where they make sense. But only after clear patterns emerge, and it’s been proven that components are truly coupled.

I will gladly die on the hill that abstractions are measurably worse than duplication the overwhelming majority of the time. They’re often nothing more than a form of premature optimization.

zingar 2 hours ago

What’s the difference between a pattern and an abstraction?

shinycode 2 hours ago

At work there was a huge amount of duplication in the early days of the company and no solid abstraction, and so no tests either. We introduced tests in the current architecture, but rewriting code has a huge cost because we have to make sure there is no regression. Since this is a SaaS with many customers relying on it daily as part of their workflow, regressions caused by a rewrite could be really painful for them. So we must budget more time to make sure nothing major breaks, and there is a debt that compounds over time as code is added. Duplication is bad, and weird/purist abstraction can make the architecture so rigid that rewriting things generates bugs that are hard to understand and catch. It’s hard to find a good balance and it depends on the kind of business and scale of project. Hard to make that generic advice.

ghosty141 2 hours ago

I think all these comments here are kinda talking past each other.

It all depends on the amount of duplication and the complexity of the abstraction. Like you said, no generic advice is possible that clearly separates it into "abstract here" and "duplicate here".

In your example it sounds like we aren't talking about 2-3 places where duplicate code existed that just needed to be refactored into separate units. It sounds more like a complete disregard for abstraction to move on quickly.

If you see duplicate code and have a good understanding of how to solve it, then removing it is totally a good thing. The real problem comes when you add abstractions without knowing whether they will hold up, and this is where the blog post comes in. In my opinion 2 duplicates are fine; at 3 you should start thinking about or implementing an abstraction, if you have a good understanding of the code and use cases.

chairmansteve 2 hours ago

"It’s hard to find a good balance and it depends on the kind of business and scale of project".

Exactly. The abstraction purists are not working in the messy, deadline-driven real world.

pfannl 2 hours ago

The real rule is probably: duplicate until the abstraction stops looking like a horoscope.

bluefirebrand 3 hours ago

Yeah, "Write Everything Twice" is a pretty common and sensible direction for any codebase

marcosdumay 2 hours ago

It's sensible if you have strict control of your duplications. You do have strict control of what is duplicated and where, right?

Write everything twice quickly becomes write everything 4 times once a new change appears, just as quickly as it becomes write everything 8 times, and so on.

I'm afraid there's no sensible soundbite developers can follow blindly.

coldtea 2 hours ago

>Write everything twice quickly becomes write everything 4 times once a new change appears, just as quickly as it becomes write everything 8 times, and so on.

That's a good problem to have. Getting to 4 or 8 or 12, and then pruning it to 1 or maybe 2 or 3 clearly different cases, is better than shoehorning multiple cases into the wrong abstraction, having everything that speaks with them coupled to that and dancing around their assumptions, and then having to untangle that.

Duplicated code is by definition LESS coupled.

cwmoore 2 hours ago

Yeah, ~"Write Everything Twice"~ “Copy and Paste Working Code” is a pretty common and sensible direction for any codebase

lanstin 2 hours ago

In C I used to make my standard per-file and per-lib code so that it could be cut and pasted into other files/libs without modification. (E.g. every file had an mLocal variable holding its file-visibility symbols, every module had a #module define for logging, there was always an mLocal.stats member, etc.) I think some of this duplicate-vs-abstract question depends on your language's expressiveness. Rust or Lisp, with good compile-time power, make it possible to squeeze out a lot of duplication that in less expressive languages just remains idioms: here’s the five lines to make a syscall, or here’s the skeleton of parsing a portable network buffer into a native object.
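A rough C sketch of what such paste-able per-file boilerplate might look like. The mLocal variable and its stats member come from the comment; the MODULE macro name (standing in for the "#module define"), the LOG macro, and the parser_* functions are hypothetical.

```c
#include <stdio.h>

/* Per-file boilerplate: only this string changes when the block is pasted
 * into another file. (Hypothetical name for the "#module define".) */
#define MODULE "parser"

/* Log helper that prefixes every message with the module name.
 * Needs at least one argument after fmt (C99 variadic macro rules). */
#define LOG(fmt, ...) fprintf(stderr, "[" MODULE "] " fmt "\n", __VA_ARGS__)

/* File-visibility state: same variable name in every file, always with a
 * stats member, plus whatever fields this particular file needs. */
static struct {
    struct {
        unsigned long calls;   /* times the public entry point ran */
        unsigned long errors;  /* how many of those calls failed */
    } stats;
    int last_input;            /* file-specific field; varies per file */
} mLocal;

/* Example public entry point that records stats the same way every file does. */
int parser_handle(int input) {
    mLocal.stats.calls++;
    mLocal.last_input = input;
    if (input < 0) {
        mLocal.stats.errors++;
        LOG("rejected negative input %d", input);
        return -1;
    }
    return input * 2;
}

/* Accessors so other files can read the counters without touching mLocal. */
unsigned long parser_calls(void)  { return mLocal.stats.calls; }
unsigned long parser_errors(void) { return mLocal.stats.errors; }
```

Because the names never change, a reader moving between files always knows where the state and counters live; that predictability is what the copy-paste buys in place of a shared library.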

Having a lot of if/else in your code is definitely a cost. My weakness isn’t so much the libraries and APIs but the actual binary: once I have a service that does A very well and I run into needing A’, I mostly just add a config line “op_mode = A|A’” and have the else/if chains in the server's driving code. More so for CLIs that I use myself than for production services, but I have added tunables for consistency and replication to datastores to allow new use cases and expand my footprint in the data center.
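A hypothetical C sketch of the op_mode approach: one binary, one config value, and a branch in the driver for each variant. The enum, the function names, and what each mode computes are invented for illustration, and A’ is spelled here with an ASCII apostrophe.

```c
#include <string.h>

/* Modes a single binary can run in; MODE_UNKNOWN catches bad config. */
typedef enum { MODE_A, MODE_A_PRIME, MODE_UNKNOWN } op_mode;

/* Parse the value of an "op_mode = ..." config line. */
op_mode parse_op_mode(const char *value) {
    if (strcmp(value, "A") == 0)  return MODE_A;
    if (strcmp(value, "A'") == 0) return MODE_A_PRIME;
    return MODE_UNKNOWN;
}

/* Driver: the shared step runs once; only the mode-specific step branches. */
int handle_request(op_mode mode, int payload) {
    int normalized = payload < 0 ? -payload : payload;  /* shared step */
    if (mode == MODE_A) {
        return normalized + 1;       /* behaviour for A */
    } else if (mode == MODE_A_PRIME) {
        return normalized * 10;      /* behaviour for A' */
    }
    return -1;                       /* unknown mode: refuse the request */
}
```

The shared step stays in one place, but every new variant adds another arm to the chain, which is the cost the comment describes.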