troupo 6 days ago
> You can't check because the numbers quickly become astronomical.

But you can with unit tests?

> Can you test the Python parser on all possible Python programs?

A parser is one of the few cases where unit tests work, and very few people write parsers. See also my sibling reply here: https://news.ycombinator.com/item?id=45078047

> What you do is write more primitive components and either unit test them, prove them to be correct or make them small enough to be correct by inspection. An integration test is just testing that the interfaces do indeed fit together, it won't normally be close to testing all possible code paths internally.

Ah yes. Somehow "the behaviour covered by unit tests is correct", yet the integration tests are "just testing that the interfaces fit together". Funny how that turns into a PagerDuty alert at 3 in the morning, because "correct behaviour" in one unit was never tested together with "correct behaviour" in another unit.

And when you actually write an integration test over real (or simulated) inputs, 99%+ of your unit tests suddenly become redundant, because exercising the app/system the way it is actually used covers most of the code paths you could possibly hit.
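To make that concrete, here is a rough sketch, assuming a Vitest/Jest-style runner and an entirely hypothetical mini-pipeline (none of these names come from the thread): one test over input shaped like what real callers send exercises all three units and, more importantly, the seams between them.

```typescript
// Sketch only: hypothetical pipeline, Vitest-style test API.
import { test, expect } from "vitest";

// Three "units" that individually look trivially correct...
const parse = (raw: string) => raw.split(",").map(Number);
const total = (xs: number[]) => xs.reduce((a, b) => a + b, 0);
const format = (n: number) => `$${n.toFixed(2)}`;

// ...and one integration test over a realistic input string, which also
// exercises how the pieces compose (whitespace, decimals, rounding) in a
// way that per-unit tests with hand-picked arguments tend to miss.
test("invoice total for a realistic line-item string", () => {
  const result = format(total(parse("19.99, 5.00, 0.01")));
  expect(result).toBe("$25.00");
});
```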
MrJohz 6 days ago | parent
It is important to have integration tests, but my experience is very much the opposite of what you're describing. I almost never see bugs whose cause is the small amount of glue code tying things together, because that code is usually tiny and incredibly simple: typically just passing arguments from one format to another, and perhaps catching errors and converting them into a different shape. A couple of tests and a bit of static typing are sufficient to cover all the different possibilities, because there are so few of them.

The failure mode I see much more often is in the other direction: tests that cover too many units together and need to be lowered to be more useful. For example, I recently wrote some code that generated intellisense suggestions for a DSL that our users use. Originally, the tests covered a large swathe of that functionality and involved triggering lots of keydown events to check what happened when different keys were pressed. These were useful tests for checking that the suggestions box worked as expected, but they made it very difficult to test edge cases in how the suggestions were generated, because the setup code each case needed was so involved.

In the end, I lowered the tests: a bunch of tests for the suggestions generation function (which was essentially `(input: str, cursor: int) -> Completion[]` and so super easy to test), and a bunch of tests for the suggestions box (which was now decoupled from the suggestions logic, and so also easier to test). I kept some higher-level integration tests, but only very few of them. The result is faster, but also much easier to maintain, with tests that are easier to write and code that's easier to refactor.
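A rough sketch of what a lowered test can look like, assuming a Vitest/Jest-style runner and a hypothetical `getCompletions` function over a made-up keyword DSL (the real suggestion logic is obviously more involved):

```typescript
// Sketch only: hypothetical names, not the actual code from the comment.
import { test, expect } from "vitest";

interface Completion {
  label: string;
}

// Pure function: the interesting edge cases live here, and testing it needs
// no DOM, no keydown events, and no suggestions-box widget.
function getCompletions(input: string, cursor: number): Completion[] {
  const prefix = input.slice(0, cursor).trim();
  const keywords = ["filter", "sort", "group"];
  return keywords
    .filter((kw) => prefix !== "" && kw.startsWith(prefix))
    .map((label) => ({ label }));
}

test("suggests keywords matching the text before the cursor", () => {
  expect(getCompletions("fi", 2)).toEqual([{ label: "filter" }]);
});

test("returns nothing when the cursor sits on empty input", () => {
  expect(getCompletions("", 0)).toEqual([]);
});
```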