tom_ 4 hours ago

It's not absurd at all (in my view). A test checks that some obtained result matches the expected result. If the obtained result is something that got printed out and redirected to a file, the expected result is something produced the same way from a known good run (one that was determined to be good by somebody looking at it with their eyes), and the match is performed by comparing the two output files... then there you go.

This is how basically all of the useful tests I've written have ended up working. (Including, yes, tests for an internal programming language.) The language is irrelevant, and the target system is irrelevant. All you need to be able to do is run something and capture its output somehow.
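To make that concrete, here's a minimal sketch of the idea in Python/pytest. The tool name ("mytool") and the one-input-file-plus-one-.expected-file-per-case layout are made up for illustration; the point is just "run it, capture output, compare to the stored known-good output".

```python
# Minimal golden-file test sketch. Assumes a hypothetical command-line tool
# ("mytool") and a hypothetical tests/cases directory containing pairs like
# foo.in / foo.expected.
import subprocess
from pathlib import Path

CASES = Path("tests/cases")

def run_case(input_file: Path) -> str:
    # Run the thing under test and capture whatever it prints.
    result = subprocess.run(
        ["mytool", str(input_file)],
        capture_output=True, text=True, check=True,
    )
    return result.stdout

def test_all_cases():
    for input_file in sorted(CASES.glob("*.in")):
        expected_file = input_file.with_suffix(".expected")
        actual = run_case(input_file)
        expected = expected_file.read_text()
        # The whole test: does actual output match the known-good output?
        assert actual == expected, f"{input_file.name}: output differs from expected"
```

(In practice you'd probably parametrize this so each case shows up as its own test, but the comparison itself is the entire mechanism.)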

(You're not wrong to note that the first-draft basic approach can still be improved. I've gotten a lot of mileage out of adding stuff: producing additional useful output files (image diffs in particular are very helpful), copying input and output files around so they're conveniently accessible when sizing up failures, poking at the test runner setup so it scales well with core count, more of the same so that it's easy to re-run a specific problem test in the debugger, and so on. But the basic principle is always the same: does actual output match expected output, yes (success)/no (fail).)
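Two of those quality-of-life additions are easy to sketch on top of the same made-up layout: keep the failing output on disk next to the expected file so it's easy to diff, and add an opt-in "bless" mode that promotes a run somebody has eyeballed and approved into the new expected output. The BLESS_TESTS environment variable here is a hypothetical name, not anything standard.

```python
# Sketch of two common additions to the basic golden-file scheme, using the
# same hypothetical foo.in / foo.expected layout as above.
import os
from pathlib import Path

BLESS = os.environ.get("BLESS_TESTS") == "1"   # hypothetical opt-in switch

def check_case(input_file: Path, actual: str) -> None:
    expected_file = input_file.with_suffix(".expected")
    if BLESS:
        # Somebody looked at this output and decided it's good: record it
        # as the new expected output.
        expected_file.write_text(actual)
        return
    expected = expected_file.read_text()
    if actual != expected:
        # Leave the failing output on disk so it's convenient to size up
        # with whatever diff tool you like.
        actual_file = input_file.with_suffix(".actual")
        actual_file.write_text(actual)
        raise AssertionError(
            f"{input_file.name}: output differs; compare {actual_file} vs {expected_file}"
        )
```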