Remix.run Logo
skeeter2020 6 hours ago

I spent about a week doing an "experiment" greenfield app. I saw 4 types of issues:

0. It runs way too fast and far ahead. You need to slow it down, force planning only and explicitly present a multi-step (i.e. numbered plan) and say "we'll do #1 first, then do the rest in future steps".

take-away: This is likely solved with experience and changing how I work - or maybe caring less? The problem is the model can produce much faster than you can consume, but it runs down dead ends that destroy YOUR context. I think if you were running a bunch of autonomous agents this would be less noticeable, but impact 1-3 negatively and get very expensive.

1. lots of "just plain wrong" details. You catch this developing or testing because it doesn't work, or you know from experience it's wrong just by looking at it. Or you've already corrected it and need to point out the previous context.

take-away: If you were vibe coding you'd solve all these eventually. Addressing #0 with "MORE AI" would probably help (i.e. AI to play/validate, etc).

2. Serious runtime issues that are not necessarily bugs. Examples: it made a lot of client-side API endpoints public that didn't even need to exist, or at least needed to be scoped to the current auth. It missed basic filtering and SQL clauses that constrained data. It hardcoded important data (but not necessarily secrets) like ports, etc. It made assumptions that worked fine in development but could be big issues in public.

take-away: AI starts to build traps here. Vibe coders are in big trouble because everything works but that's not really the end goal. Problems could range from 3am downtime call-outs to getting your infrastructure owned or data breaches. More serious: experienced devs who go all-in on autonomous coding might be three months from their last manual code review and be in the same position as a vibe coder. You'd need a week or more to onboard and figure out what was going on, and fix it, which is probably too late.

3. It made (at least) one huge architectural mistake (this is a pretty simple project so I'm not sure there's space for more). I saw it coming but kept going in the spirit of my experiment.

take-away: TBD. I'm going to try and use AI to refactor this, but it is non trivial. It could take as long as the initial app did to fix. If you followed the current pro-AI narrative you'd only notice it when your app started to intermittently fail - or you got you cloud provider's bill.

Schiendelman 2 hours ago | parent [-]

I'm a product manager, and a lot of the things I see people do wrong is because they don't have any product management experience. It takes quite a bit of work to develop a really good theory of what should be in your functional spec. Edge cases come up all the time in real software engineering, and often handling all those cases is spread across multiple engineers. A good product manager has a view of all of it, expects many of those issues from the agent, and plans for coaching it through them.

nevertoolate 2 hours ago | parent [-]

Poe’s law is strong with this one

Schiendelman an hour ago | parent | next [-]

Tell me more! I'm trying to figure out how you got that.

tristor an hour ago | parent | prev [-]

I think that's an incredibly reductionist and sarcastic take. I'm also in Product, but was an engineer for over a decade prior. I find that having strong structured functional specifications and a good holistic understanding of the solution you're trying to build goes a long way with AI tooling. Just like any software project, eliminating false starts and getting a clear set of requirements up front can minimize engineering time required to complete something, as long as things don't change in the middle. When your cycle time is an afternoon instead of two quarters, that type of up front investment pays off much better.

I still think AI tooling is lacking, but you can get significantly better results by structuring your plans appropriately.