foobarqux 2 hours ago

I'm saying you can go even further and automate the entire thing using LLMs/agents. It is pretty much the ideal use case: you have a black-box reference implementation to test against; descriptive documentation for what the functions should do; some explicitly supplied examples in the documentation; and the ability to automatically create an arbitrary number of tests.

So not only do you have a closed-loop system with objective, automatic pass/fail criteria, you also don't even have to supply the instructions about what the function is supposed to do, or the test cases!

Obviously this isn't going to be 100% reliable (especially for edge cases), but you should be able to get an enormous speedup. And in many cases you should be able to supply the edge-case tests yourself and have the LLM fix the failures.
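
To make the closed loop concrete, here is a rough sketch of the kind of differential test I mean. Everything in it is a stand-in I made up for illustration: `reference_impl` plays the black-box reference (here just Python's built-in `sorted`), `candidate_impl` plays the LLM-written reimplementation, and `gen_int_list` is a hypothetical input generator. A real setup would call the actual reference and generate inputs suited to each function.

```python
import random

def reference_impl(xs):
    # Stand-in for the black-box reference implementation (assumption).
    return sorted(xs)

def candidate_impl(xs):
    # Stand-in for the LLM-written reimplementation: a plain insertion sort.
    out = []
    for x in xs:
        i = len(out)
        while i > 0 and out[i - 1] > x:
            i -= 1
        out.insert(i, x)
    return out

def gen_int_list(rng):
    # Hypothetical input generator: short lists of small integers.
    return [rng.randint(-50, 50) for _ in range(rng.randint(0, 20))]

def differential_test(reference, candidate, gen_input, n_cases=1000, seed=0):
    """Return every generated input on which candidate and reference disagree."""
    rng = random.Random(seed)  # fixed seed so failures are reproducible
    failures = []
    for _ in range(n_cases):
        case = gen_input(rng)
        # Pass copies so neither implementation can mutate the shared input.
        if candidate(list(case)) != reference(list(case)):
            failures.append(case)
    return failures

failures = differential_test(reference_impl, candidate_impl, gen_int_list)
print(f"{len(failures)} mismatches")
```

The list of failing inputs is what you would feed back to the agent as the pass/fail signal. Random inputs alone tend to miss edge cases, which is where the hand-supplied tests come in.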

(Codex is still free for the next few days if you want to try their "High"/"Extra high" thinking models)

thrtythreeforty 8 minutes ago | parent [-]

You accidentally raise an interesting point: good, thorough public documentation, once considered a great selling point for your system, now invites automated reimplementation by competition. It would be a shame to see public docs vanish because it turns out they are literally machine readable specs.