andrepd 5 days ago

I don't understand this. How does it slow your development if the tests being green is a necessary condition for the code being correct? Yes it slows it compared to just writing incorrect code lol, but that's not the point.

yojo 5 days ago | parent | next [-]

"Brittle" here means either:

1) your test is specific to the implementation at the time of writing, not the business logic you mean to enforce.

2) your test has non-deterministic behavior (more common in end-to-end tests) that causes it to fail some small percentage of the time on repeated runs.

At the extreme, these types of tests degenerate your suite into a "change detector," where any modification to the code-base is guaranteed to make one or more tests fail.

They slow you down because every code change also requires an equal or larger investment in debugging the test suite, even if nothing actually "broke" from a functional perspective.

Using LLMs to litter your code-base with low-quality tests will not end well.
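
A contrived sketch of both failure modes, with invented names (pytest-style; none of this comes from a real code-base), plus the behavioral test you'd actually want:

    import time

    # Hypothetical code under test, invented purely for illustration.
    DISCOUNT_RATES = {"SAVE50": 0.5}

    def apply_discount(cart_total: float, code: str) -> float:
        """Return the cart total after applying a discount code, if any."""
        return cart_total * DISCOUNT_RATES.get(code, 1.0)

    # 1) Change detector: coupled to the internal lookup table. Moving the
    # rates into a config file or database breaks this test even though
    # customers still pay exactly the same prices.
    def test_discount_rates_table():
        assert DISCOUNT_RATES == {"SAVE50": 0.5}

    # 2) Flaky: depends on wall-clock timing, so it fails some small
    # percentage of runs on a loaded CI machine.
    def test_discount_is_fast():
        start = time.monotonic()
        apply_discount(100.0, "SAVE50")
        assert time.monotonic() - start < 0.001

    # What you actually want: pin the business rule, nothing else.
    def test_save50_halves_the_total():
        assert apply_discount(100.0, "SAVE50") == 50.0
        assert apply_discount(100.0, "BOGUS") == 100.0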

winstonewert 5 days ago | parent | prev | next [-]

The problem is that sometimes it is not a necessary condition. Rather, the tests might have been checking implementation details or just been wrong in the first place. Now, when a test fails, I have extra work to figure out if it's a real break or just a bad test.

jrockway 5 days ago | parent | prev | next [-]

The goal of tests is not to prevent you from changing the behavior of your application. The goal is to preserve important behaviors.

If you can't tell if a test is there to preserve existing happenstance behavior, or if it's there to preserve an important behavior, you're slowed way down. Every red test when you add a new feature is a blocker. If the tests are red because you broke something important, great. You saved weeks! If the tests are red because the test was testing something that doesn't matter, not so great. Your afternoon was wasted on a distraction. You can't know in advance whether something is a distraction, so this type of test is a real productivity landmine.

Here's a concrete, if contrived, example. You have a test that starts your app up in a local webserver, and requests /foo, expecting to get the contents of /foo/index.html. One day, you upgrade your web framework, and it has decided to return a 302 redirect to /foo/index.html, so that URLs are always canonical now. Your test fails with "incorrect status code; got 302, want 200". So now what? Do you not apply the version upgrade? Do you rewrite the test to check for a 302 instead of a 200? Do you adjust the test HTTP client to follow redirects silently? The problem here is that you checked for something you didn't care about, the HTTP status, instead of only checking for what you cared about, that "GET /foo" gets you some text you're looking for. In a world where you let the HTTP client follow redirects, like human-piloted HTTP clients, and only checked for what you cared about, you wouldn't have had to debug this to apply the web framework security update. But since you tightened down the screws constraining your application as tightly as possible, you're here debugging this instead of doing something fun.

(The fun doubles when you have to run every test for every commit before merging, and this one failure happened 45 minutes in. Goodbye, the rest of your day!)
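
A rough sketch of both versions of that test, using Python's requests library (the URL and page text are placeholders): the first breaks on the framework upgrade, the second keeps checking only the thing you cared about.

    import requests

    BASE = "http://localhost:8080"  # wherever the test server is started

    # Over-constrained: pins the status code, so the framework's new
    # redirect-to-canonical-URL behavior fails the test even though users
    # still end up on the same page.
    def test_foo_strict():
        resp = requests.get(f"{BASE}/foo", allow_redirects=False)
        assert resp.status_code == 200
        assert "welcome to foo" in resp.text.lower()

    # Behavioral: a browser-like client (redirects followed, the default)
    # should land on a page containing the text we care about.
    def test_foo_behavioral():
        resp = requests.get(f"{BASE}/foo")
        assert resp.ok
        assert "welcome to foo" in resp.text.lower()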

HappMacDonald 4 days ago | parent [-]

This example smells a lot like "overfitting" in AI training as well.

threatofrain 5 days ago | parent | prev [-]

It really is that hard to write specs that truly match the business, which is why test-driven development and specification-first never took off as a movement.

Asking specs to truly match the business before we begin using them as tests would handcuff test people in the same way we're saying that tests have the potential to handcuff app and business logic people — as opposed to empowering them. So I wouldn't blame people for writing specs that only match the code implementation at that time. It's hard to engage in prophecy.

nyrikki 5 days ago | parent | next [-]

The problem with TDD is that people assumed it was about writing a specification, or tried to map it directly onto post-hoc testing and metrics.

TDD at its core is defining expected inputs and mapping them to expected outputs at the unit-of-work level, e.g. a function or a class.

While UAT and the domain inform what those inputs and outputs are, avoiding the temptation to write a broader spec than that is what many people struggle with when learning TDD.

Avoiding behavior or acceptance tests, and focusing on tests at the unit of implementation, is the whole point.

But it is challenging for many to get that to click. It should help you find ambiguous requirements, not develop a spec.
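
A minimal sketch of what that looks like (the function and its rules are invented purely for illustration): the tests are written first and state only the input -> output contract of the unit, nothing about how it's computed.

    import pytest

    # Written first: expected inputs mapped to expected outputs.
    @pytest.mark.parametrize("raw, expected", [
        ("  42 ", 42),
        ("0", 0),
        ("007", 7),
    ])
    def test_parse_quantity_accepts_valid_input(raw, expected):
        assert parse_quantity(raw) == expected

    def test_parse_quantity_rejects_negatives():
        with pytest.raises(ValueError):
            parse_quantity("-3")

    # Written afterwards, and free to change shape as long as the
    # mapping above keeps holding.
    def parse_quantity(raw: str) -> int:
        value = int(raw)
        if value < 0:
            raise ValueError("quantity must be non-negative")
        return value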

MoreQARespect 5 days ago | parent [-]

I literally do the diametric opposite of you and it works extremely well.

I'm weirded out by your comment. Writing tests that couple to low-level implementation details was something I thought most people did accidentally before giving up on TDD, not intentionally.

nyrikki 5 days ago | parent [-]

It isn't coupling to low-level implementation details; it is writing tests based on the inputs and outputs of the unit under test.

The expected output from a unit, given an input, is not an implementation detail, unless you have a very different definition of "implementation detail" than I do.

Testing that the unit under test produces the expected outputs from a set of inputs implies nothing about implementation details at all. It is also a concept older than dirt:

https://www.researchgate.net/publication/221329933_Iterative...

MoreQARespect 4 days ago | parent [-]

If the "unit under test" is low-level, then that's coupling low-level implementation details to the test.

If you're vague about what constitutes a "unit", that means you're probably not thinking about this problem.

nyrikki 4 days ago | parent [-]

Often, even outside of software, unit testing means testing a component's (unit's) external behavior.

If you don't accept that concept I can see how TDD and testing in general would be challenging.

In general, it is most productive when building competency with a new subject to accept the author's definitions, then adjust once you have experience.

IMHO, the sizing of components is context-, language-, and team-dependent. But it really doesn't matter: TDD is just as much about helping with other problems, like action bias, and is only one part of a comprehensive testing strategy.

While how you choose to define a 'unit' will impact outcomes, TDD itself isn't dependent on a firm definition.

MoreQARespect 4 days ago | parent [-]

>If you don't accept that concept

Nobody anywhere in the world disputes that unit tests should surround a unit.

>IMHO, the sizing of components is context, language, and team dependent. But it really doesn't matter

Yeah, that's the attitude that will trip you up.

The process you use to determine the borders you will couple your test to - i.e. what constitutes a "unit" to be tested - is critically important and nonobvious.

marcosdumay 5 days ago | parent | prev [-]

> So I wouldn't blame people for writing specs that only match the code implementation at that time.

WTF are you doing writing specs based on the implementation? If you already have the implementation, what are you using the specs for? Or, to apply this directly to tests: if you are already assuming the program is correct, what are you trying to test?

Are you talking about rewriting applications?

baq 5 days ago | parent [-]

Where do you work if you don’t need to reverse engineer an existing implementation? Have you written everything yourself?

marcosdumay 3 days ago | parent [-]

Unless you are rewriting the application, you shouldn't assume that whatever behavior you find on the system is the correct one.

Even more so because if you are looking into it, it's probably because it's wrong.