striking 5 hours ago

I'm currently in the discovery phase of a larger refactor, and I've quickly realized that AI can often be pretty useless even when you've encoded the rules in an unambiguous, programmatic way.

For example, consider a lint rule that bans Kysely queries on certain tables outside of a specific folder. You'd write a rule like this to pull reads and writes on a certain domain into one place, hoping you can just hand the lint violations to your AI agent and have it split your queries into service calls as needed.
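To make that concrete, here's a minimal sketch of what such a rule might look like as a custom ESLint rule. The table names, the allowed folder, and the exact shape are placeholder assumptions of mine, not a real rule:

    import type { Rule } from "eslint";

    // Hypothetical restricted tables and the one folder allowed to query them.
    const RESTRICTED_TABLES = new Set(["orders", "payments"]);
    const ALLOWED_FOLDER = "src/services/billing/";
    // Kysely's query entry points that take a table name as their first argument.
    const QUERY_METHODS = new Set(["selectFrom", "insertInto", "updateTable", "deleteFrom"]);

    const rule: Rule.RuleModule = {
      meta: {
        type: "problem",
        messages: {
          restricted: "Queries on '{{table}}' must live under " + ALLOWED_FOLDER,
        },
      },
      create(context) {
        // Files inside the allowed folder are exempt.
        if (context.getFilename().includes(ALLOWED_FOLDER)) return {};
        return {
          CallExpression(node) {
            const callee = node.callee;
            const arg = node.arguments[0];
            if (
              callee.type === "MemberExpression" &&
              callee.property.type === "Identifier" &&
              QUERY_METHODS.has(callee.property.name) &&
              arg &&
              arg.type === "Literal" &&
              typeof arg.value === "string" &&
              RESTRICTED_TABLES.has(arg.value)
            ) {
              context.report({ node, messageId: "restricted", data: { table: arg.value } });
            }
          },
        };
      },
    };

    export default rule;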

And at first, it will appear to have Just Worked™. You are feeling the AGI. Right up until you start to review the output carefully, because there are now little discrepancies in the rewritten queries: not distinguishing between calls to the primary vs. the replica, missing the point of a certain LIMIT or ORDER BY clause, failing to appropriately rewrite a condition or SELECT, and so on. You run a few more reviewer-agent passes over it, but realize your efforts are entirely in vain, because even if the reviewer agent fixes 10 or 20 or 30 of the issues, you can still never fully trust the output.
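Here's a made-up before/after showing the flavor of drift I mean; `db`, `dbReplica`, and the schema are invented, and the "after" is exactly the kind of thing that compiles and lints clean while being quietly wrong:

    import { Kysely } from "kysely";

    interface DB {
      orders: { id: string; status: string; created_at: Date };
    }

    declare const db: Kysely<DB>;        // primary (assumed)
    declare const dbReplica: Kysely<DB>; // read replica (assumed)

    // Before: the read is deliberately routed to the replica,
    // and the ORDER BY is what makes the LIMIT meaningful.
    const recent = await dbReplica
      .selectFrom("orders")
      .selectAll()
      .where("status", "=", "paid")
      .orderBy("created_at", "desc")
      .limit(50)
      .execute();

    // After: the AI-extracted service method hits the primary
    // and drops the ordering, so LIMIT 50 now means 50 arbitrary rows.
    async function getPaidOrders() {
      return db
        .selectFrom("orders")
        .selectAll()
        .where("status", "=", "paid")
        .limit(50)
        .execute();
    }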

As someone with experience doing this kind of thing before AI, I went back to the old way: a codemod that rewrites the code automatically using a series of rules. AI can write the codemod, and AI can help me evaluate the results, but having the agent apply all few hundred changes itself left me unable to trust the output. And I suspect that will continue to be true for some time.
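For reference, the codemod shape is nothing exotic. Here's a rough jscodeshift-style sketch; the matching pattern and the `ordersService` name are illustrative assumptions, not my actual transform. The idea is that the service exposes the one sanctioned entry point to the table:

    import type { API, FileInfo } from "jscodeshift";

    export default function transform(file: FileInfo, api: API) {
      const j = api.jscodeshift;
      const root = j(file.source);

      // Rewrite db.selectFrom('orders') chain roots into calls to the
      // (hypothetical) ordersService.query(), assumed to return the same
      // Kysely builder from one sanctioned place. Anything that doesn't
      // match the exact pattern is left alone for manual review.
      root
        .find(j.CallExpression, {
          callee: {
            type: "MemberExpression",
            object: { type: "Identifier", name: "db" },
            property: { type: "Identifier", name: "selectFrom" },
          },
        })
        .filter((path) => {
          const arg: any = path.node.arguments[0];
          // Parser-dependent: "Literal" with babel, "StringLiteral" with ts/tsx.
          return (
            arg &&
            (arg.type === "Literal" || arg.type === "StringLiteral") &&
            arg.value === "orders"
          );
        })
        .forEach((path) => {
          path.replace(
            j.callExpression(
              j.memberExpression(j.identifier("ordersService"), j.identifier("query")),
              []
            )
          );
        });

      return root.toSource();
    }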

This industry needs a "verification layer" that, as far as I know, it does not have yet. Some part of me hopes that someone will reply to this comment with a counterexample, because I could sorely use one.