dmitrygr 3 days ago

Now do it without those pre-written tests. Spec only. Else, the writers of those tests deserve a LOT of credit.

pseudosavant 3 days ago | parent | next [-]

If there is one thing that agents/LLMs have highlighted, it is how much credit those test writers deserve. Teams that were already following a TDD-style approach seem to realize value from agents most easily, precisely because of their tests.

The tests are what enable: building a brand-new JS runtime that works, rewriting a complex piece of code in a more performant language for the task (e.g. Go instead of TypeScript), or even migrating off an old stack (.NET WebForms) to something newer.

ivankra 2 days ago | parent | prev | next [-]

You can prompt an LLM to generate tests from the spec, and I'd bet it would easily get most of the way there, especially if you give it a reference implementation to test against. I did just that, though only on a small scale, for feature tests. The last few percent would be the real challenge: you probably don't want it to simply imitate another implementation's bugs.
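(For the curious: the "test against a reference implementation" idea above is essentially differential testing. Here is a minimal sketch, where `reference_tokenize` and `candidate_tokenize` are hypothetical stand-ins for the trusted and the LLM-generated implementations; any real setup would swap in the actual functions and a smarter input generator.)

```python
import random

def reference_tokenize(s):
    # Hypothetical stand-in for the trusted reference implementation.
    return s.split()

def candidate_tokenize(s):
    # Hypothetical stand-in for the generated implementation under test.
    return s.split()

def differential_test(cases=1000, seed=0):
    """Throw random inputs at both implementations and demand agreement.

    Returns the number of cases checked; raises AssertionError with the
    offending input on the first disagreement.
    """
    rng = random.Random(seed)
    alphabet = "ab c\t"  # deliberately includes whitespace edge cases
    for _ in range(cases):
        s = "".join(rng.choice(alphabet)
                    for _ in range(rng.randrange(0, 20)))
        assert candidate_tokenize(s) == reference_tokenize(s), repr(s)
    return cases

print(differential_test(), "cases agree")
```

The caveat from the comment applies: this only shows the candidate matches the reference, bugs and all, so spec-derived tests are still needed on top.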

dmitrygr 2 days ago | parent [-]

A reference implementation that someone else (a human) wrote? Hm… so one way or another, some humans' labour is laundered…

ivankra 2 days ago | parent [-]

Don't we all stand on the shoulders of giants?

dmitrygr 2 days ago | parent [-]

I have never attempted to take credit for someone's work, nor ever put serious effort into hiding someone's contribution. LLMs are purpose-designed for that.

UncleEntity 2 days ago | parent | prev [-]

> Now do it without those pre-written tests

That's probably the most important thing, actually. I've tried my hardest to get Claude to build an APL VM using only the spec, and it's virtually impossible to get full compliance: it takes too many shortcuts and makes too many assumptions. That's part of the challenge though, to see how far the daffy robots have come.

vrighter 2 days ago | parent [-]

Hehe, I tried giving it a minesweeper CSP I've been working on and asked it to develop the feature I was working on at the time, just to compare. I was adding non-chronological backtracking to the search engine.

I gave it the proper compile flags, test cases with their expected output, and everything it would have needed. The test cases were specifically hand-picked to be hard on the search algorithm. The base program was already correct and gave the right results (I was only adding an optimization), and those results were what I was using as a baseline for testing my own implementation. You know, with a debugger and breakpoints, printfs and all that.
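(For readers unfamiliar with the feature being added: "non-chronological backtracking" means that on a dead end the search jumps back to the deepest decision that actually contributed to the conflict, rather than undoing assignments one level at a time. The commenter's engine is their own; the sketch below is just an illustrative Prosser-style conflict-directed backjumping solver on a toy graph-coloring CSP, not their code.)

```python
def cbj_color(adj, n, colors):
    """Color vertices 0..n-1 so adjacent vertices differ, using
    conflict-directed backjumping: a failed subtree returns the set of
    earlier levels responsible, and levels that played no part in the
    conflict are skipped on the way back up.

    adj: dict mapping vertex -> set of neighbors (symmetric).
    Returns an assignment dict, or None if uncolorable.
    """
    assign = {}

    def search(i):
        # Returns None on success, else the conflict set of levels < i.
        if i == n:
            return None
        conflict = set()
        for c in colors:
            clashing = {j for j in range(i)
                        if j in adj.get(i, ()) and assign[j] == c}
            if clashing:
                conflict |= clashing  # record who blocked this value
                continue
            assign[i] = c
            deeper = search(i + 1)
            if deeper is None:
                return None  # solved; keep assignment
            del assign[i]
            if i not in deeper:
                # Non-chronological jump: this level did not contribute
                # to the failure, so pass the conflict set straight up.
                return deeper
            deeper.discard(i)
            conflict |= deeper
        return conflict  # all values failed at this level

    return assign if search(0) is None else None
```

With two colors a triangle graph comes back as `None`; with three, a valid coloring is found. In a real minesweeper CSP the "variables" would be cells and the constraints mine-count equations, but the backjumping skeleton is the same.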

In the end it couldn't get the thing to work (I asked it to compile and verify). It then proudly declared that in all of the test cases I gave it, everything was solved through constraint propagation and the search never even triggered, so it hadn't introduced any bugs. It tried to gaslight me, even though it got a segfault in the new code it added, which obviously wouldn't have been triggered if the search didn't actually execute.