JimDabell 7 days ago

LLMs can’t build software because we are expecting them to hear a few sentences, then immediately start coding until there’s a prototype. When they get something wrong, they have a huge amount of spaghetti to wade through. There’s little to no opportunity to iterate at a higher level before writing code.

If we put human engineering teams in the same situation, we’d expect them to do a terrible job, so why do we expect LLMs to do any better?

We can dramatically improve the output of LLM software development by using all those processes and tools that help engineering teams avoid these problems:

https://jim.dabell.name/articles/2025/08/08/autonomous-softw...

diwank 7 days ago | parent | next [-]

yup. I started a fully autonomous, 100% vibe coded side project called steadytext, mostly expecting it to hit a wall, with LLMs eventually struggling to maintain or fix any non-trivial bug in it. turns out I was wrong: not only has claude opus been able to write up a pretty complex 7k LoC project with a python library, a CLI, _and_ a postgres extension, it also actively maintains it and is able to fix filed issues and implement feature requests entirely on its own. It is completely vibe coded, I have never even looked at 90% of the code in that repo. it has full test coverage, passes CI, and we use it in production!

granted, it needs careful planning for CLAUDE.md and all issues and feature requests need a lot of in-depth specifics but it all works. so I am not 100% convinced by this piece. I'd say it's def not easy to get coding agents to be able to manage and write software effectively and especially hard to do so in existing projects but my experience has been across that entire spectrum. I have been sorely disappointed in coding agents and even abandoned a bunch of projects and dozens of pull requests but I have also seen them work.

you can check out that project here: https://github.com/julep-ai/steadytext/

sjdbdjskbzba 6 days ago | parent | next [-]

> It is completely vibe coded, I have never even looked at 90% of the code in that repo. it has full test coverage, passes CI, and we use it in production!

This horrifies me. I checked your website and all your recommendations are from people who appear to have an Indian background, but you’re based in the US? And you claim they’re the most innovative companies yet I doubt anyone has heard of them?

Looking over the repo and it seems like a mess (commits are meaningless and code is all over the place).

I’m sorry this feels incredibly scammy.

thegeomaster 6 days ago | parent | prev | next [-]

Thanks for sharing this! It's difficult to find good examples of useful codebases where coding agents have done most of the work. I'm always actively looking at how I can push these agents to do more for me and it's very instructive to hear from somebody who has had success on this level. (Would be nice to read a writeup, too)

diwank 6 days ago | parent [-]

It's coming soon! I think this experiment has really taught me a lot about the limits of agentic code assistants, stuff that they're good at, they're insanely good at, and stuff that they're horrible at and cannot seem to overcome. I did write a little bit about how I use Claude Code [1] before I started this project a while back, and I'm planning to finish a sequel pretty soon.

^[1]: https://diwank.space/field-notes-from-shipping-real-code-wit...

aethrum 6 days ago | parent | prev | next [-]

Huh, interesting. Though I do wonder if the best possible thing an AI could help code would be another AI tool

itsalotoffun 6 days ago | parent [-]

This way to the hard take-off.

6 days ago | parent | prev [-]

[deleted]

zahlman 5 days ago | parent | prev | next [-]

> If we put human engineering teams in the same situation, we’d expect them to do a terrible job, so why do we expect LLMs to do any better?

Counterpoint: projects by autonomous solo developers are often excellent, and these can only exist exactly because said developers directed themselves in that exact way.

tossandthrow 6 days ago | parent | prev | next [-]

We don't expect humans to do a terrible job - we just expect them to facilitate the process.

If the LLM started sketching up screens and asked questions back about the intention of the software, then I am sure people would have a much better experience.

jarjoura 6 days ago | parent | prev | next [-]

Okay, I'm willing to entertain your cynical take. However, experience has shown me that if we need to solve a vague problem as a team of engineers and designers, we know to get ample context of what it is we're actually trying to build.

Plus, the most creative solutions often come from implicit and explicit constraints. This is entirely a human skill and something we excel at.

These LLMs aren't going to "consider" something, understand the constraints, and then fit a solution inside those constraints that weren't explicitly defined for it somehow. If constraints aren't well understood, either through common problems, or through context documents, it will just go off the deep end trying to hack something together.

So right now we still need to rely on humans to do the work of breaking problems down, scoping the work inside of those constraints, and then coming up with a viable path forward. Then, at that point, the LLM becomes just another way to execute on that path forward. Do I use JavaScript, Rust, or Swift to write the solution, or do I use `CLAUDE.md` with these 30 MCP services to write the solution?

For now, it's just another tool in the toolbox for getting to the final solution. I think the conversation around it needing to be binary, either all or nothing, is silly.

bagacrap 6 days ago | parent | prev | next [-]

There are a lot of human engineers who do a fine job in these situations, akshwally.

If it isn't easy to give commands to LLMs, then what is the purpose of them?

imtringued 6 days ago | parent | prev | next [-]

> If we put human engineering teams in the same situation, we’d expect them to do a terrible job, so why do we expect LLMs to do any better?

Because LLMs were trained for one-shot performance and they happen to beat humans at that.

otterley 6 days ago | parent | prev [-]

This is the approach that Kiro is taking, although it’s early days. It’s not perfect but it does produce pretty good results if you adhere to its intent.

quantumHazer 6 days ago | parent [-]

a 1 minute search on the internet led me to discover that you are a MARKETING MANAGER at amazon. so your take comes with a conflict of interest and this should be disclosed.

otterley 6 days ago | parent [-]

Fair enough and I apologize for not disclosing it. However, Kiro is not a service in scope for me, and this is my own opinion, not that of the company.

(Also, there is no conflict of interest here, and you do not need to yell. I’m free to criticize my company if I like.)