wg0 2 hours ago
Snake oil. A good read for sure, and it all seems plausible too. But snake oil nevertheless. Here's why: the slot machine can drop any hard requirement that you specify in your AGENTS.md, memory.md, or your dozens of skill markdowns. Pretty much guaranteed. These harness approaches pretend that LLMs are strict and perfect rule followers and that the only problem is not being able to specify enough rules clearly enough. That's a fundamental misunderstanding of how LLMs operate. That leaves only one option, not reliable but more reliable nevertheless: human review and oversight. Possibly two of them, one after the other. Everything else is snake oil. But at that point you also realize that the promised productivity gains are snake oil too, because reading code and building a mental model is way harder than having a mental model and writing it into code.
vidarh an hour ago
Humans also regularly drop hard requirements you specify, and similarly require review. Nevertheless we manage to increase the reliability of human output through processes and reviews, and most of the methods we use for harnesses are taken from experience with how to reduce reliability issues in humans, who are notoriously difficult to get to deliver reliably.
cortesoft 2 hours ago
Everything you say is possible, and in theory I agree with you. However, I have been using spec-kit (which is basically this style of AI usage) for the last few months and it has been AMAZING in practice. I am building really great things and have not run into any of the issues you are describing as hypotheticals. Could they eventually happen? Sure, maybe. I am still cautious. But once I have personally used it in practice for this long, I can't just dismiss it as snake oil. I have been a computer programmer for over 30 years, and I feel like I have a good read on what works and what doesn't in practice.