cglan 9 hours ago

I find LLMs so much more exhausting than manual coding. It's interesting: with modern LLMs, I think you bump into the limit of how much a single human can feasibly keep track of pretty fast.

I assume that until LLMs are strictly better than humans in all cases, having to stay in the loop will put a pretty hard upper bound on what I can do, and it seems like we've roughly hit that limit.

Funnily enough, I get this feeling with a lot of modern technology. iPhones, all the modern messaging apps, etc. make it much too easy to fragment your attention across a million different things. It's draining. Much more draining than the old days.

superfrank 5 hours ago | parent | next [-]

> I find LLMs so much more exhausting than manual coding

I do as well, so I totally know what you're talking about. There's a part of me that thinks it will become less exhausting with time and practice.

In high school and college I worked at this Italian place that did dine-in, to-go, and delivery orders. I got hired as a delivery driver and loved it. A couple of years in, there was a spell of really high turnover, so the owners asked me to wait tables for a while. For the first couple of months I found the small talk and the need to always be "on" absolutely exhausting, but over time I found my routine and it became less so. I definitely loved being a delivery driver far more, but eventually I did hit a point where I didn't feel completely drained after every shift of waiting tables.

I can't help but think coding with LLMs will follow a similar pattern. I don't think I'll ever like it more than writing the code myself, but I have to believe at some point I'll have done it enough that it doesn't feel completely draining.

qq66 2 hours ago | parent | next [-]

I think it's because traditionally, software engineering was a field where you built your own primitives, then composed them, and so on, so that the entire flow of data was something you had a mental model for, and when there was a bug, you simply sat down and fixed the bug.

With the rise of open source, there started to be more black-box composition: you grabbed some big library like Django or NumPy and honestly just hoped there weren't any bugs, but if there were, you could plausibly step through the debugger, figure out what was going wrong, and file a bug report.

Now, LLMs are generating orders of magnitude more code than any human could ever have the chance to debug. You're basically firing this stuff out like a firehose at a house fire, directing it with as much control as you can muster but really just trusting the raw power of the thing to get the job done. And, bafflingly, it works pretty well, except in those cases where it doesn't, so you can't stop using the tool, but you can't ever really get comfortable with it either.

nvardakas 5 minutes ago | parent | next [-]

Very good catch. The mental-model thing is real: I've caught myself approving LLM-generated code that works but that I couldn't debug if it broke at 2 AM. With libraries you at least had docs and a community. With generated code, the only source of truth is... asking the same LLM again and hoping it's consistent.

chii 2 hours ago | parent | prev | next [-]

> bafflingly, it works pretty well, except in those cases where it doesn't

So as a human, you make the judgment that the cases where it works well more than make up for the mistakes. Comfort is a mental state, and the discomfort can easily be defeated by separating your own identity and ego from the output you create.

qq66 30 minutes ago | parent [-]

I mean, you could make that judgment in some cases, but clearly not all. If you use AI to ship 20 additional features but accidentally delete your production database, you definitely have not come out ahead.

https://www.reddit.com/r/OpenAI/comments/1m4lqvh/replit_ai_w...

xienze an hour ago | parent | prev [-]

> I think it's because traditionally, software engineering was a field where you built your own primitives, then composited those, etc... so that the entire flow of data was something that you had a mental model for

Not just that, but with programming languages you can describe with utmost precision _how_ the problem needs to be solved, _and_ you can have some degree of certainty that your directions (the code) will be followed accurately.

It's maddening to go from that to natural language interpreted by a non-deterministic entity. And then having to endlessly iterate on the results with some variation of "no, do it better" or, even worse, some clever "pattern" of directing multiple agents to check each other's work, which you'll eventually have to check as well.

apsurd 3 hours ago | parent | prev [-]

Thanks for the story. I also spent time as a delivery driver at an Italian restaurant. It was a blast in the sense that I look back at that slice of life with pride and a sense of becoming. I never got the chance to wait tables, but the waiters were definitely characters and worked hard for their money. The kitchen staff too. What a hoot.

hombre_fatal 9 hours ago | parent | prev | next [-]

I think the upper limit is your ability to decide what to build among infinite possibilities. How should it work, what should it be like to use it, what makes the most sense, etc.

The code part is trivial, and in some ways a waste of time compared to making decisions about what to build. Sometimes it's even procrastination to avoid thinking about what to build, like how people polish their game engine (easy) to avoid putting in the work of designing a fun game (hard).

The more clarity you have about what you're building, the larger the blocks of work you can delegate or outsource.

So I think one overwhelming part of LLMs is that you don't get the downtime of working on implementation, since that's now trivial; you're stuck doing the hard part, steering and planning. But that's also a good thing.

SchemaLoad 9 hours ago | parent | next [-]

I've found that writing the code massively helps your understanding of the problem and of what you actually need or want. Most times I go into a task with a certain idea of how it should work, then reevaluate once I've started. An LLM, meanwhile, will just do what you ask without questioning it, leaving you with none of the learning you would have gained by doing it yourself. The LLM certainly didn't learn or remember anything from it either.

jeremyjh 9 hours ago | parent | next [-]

In some cases, yes. But I've been doing this a while now, and there is a lot of code that has to be written that I will not learn anything from. Now I have a choice not to write it.

orbisvicis 6 hours ago | parent [-]

Ehh, I find that the most tedious code is also the most error-sensitive: the stuff that blurs the divide between code and data.

jeremyjh 6 hours ago | parent [-]

I doubt we're talking about the same sort of thing at all. I'm talking about stuff like generic web CRUD: too custom to be generated deterministically, but recent models crush it and make fewer errors than I do. And that's not even all they can do. Yes, once you get into a large, complicated codebase it's not always worth it, but even there, one benefit is using them to develop more test cases, and more complicated ones, than I would realistically bother with.
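As a concrete illustration (an entirely made-up example, the parse_price helper isn't from any real codebase), this is the kind of edge-case coverage a model will happily churn out that I'd rarely write by hand:

    import pytest

    # Hypothetical helper under test, included so the example is self-contained.
    def parse_price(text: str) -> float:
        """Parse a human-entered price string into a float."""
        return float(text.strip().replace("$", "").replace(",", ""))

    @pytest.mark.parametrize("raw, expected", [
        ("$1,234.50", 1234.50),  # currency symbol plus thousands separator
        ("  99.99  ", 99.99),    # surrounding whitespace
        ("0", 0.0),              # bare integer input
        ("$0.01", 0.01),         # smallest denomination
    ])
    def test_parse_price(raw, expected):
        assert parse_price(raw) == expected

    def test_parse_price_rejects_garbage():
        # float() raises ValueError on non-numeric input,
        # and the helper propagates it.
        with pytest.raises(ValueError):
            parse_price("not a price")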

stavros 7 hours ago | parent | prev [-]

It depends on how you use them. In my workflow, I work with the LLM to get the desired result, and I stay familiar with the system architecture without writing any of the code myself.

I've written it up here, including the transcript of an actual real session:

https://www.stavros.io/posts/how-i-write-software-with-llms/

jeremyjh 6 hours ago | parent [-]

Thanks for writing this up.

I only recently woke up to how good these tools have actually become. I use a similar prompt system, but with less focus on review; I've found the review bots to be really good already, but it's more efficient to work locally.

One question, since you mention using lots of different models: do you ever have to tweak prompts for a specific model, or are they pretty universal?

stavros 4 hours ago | parent [-]

I don't tweak prompts, no. I find there's not much need to; the models understand my instructions well enough. I think we're well past the prompt-engineering days. All models are very good at following instructions nowadays.

galaxyLogic 9 hours ago | parent | prev | next [-]

Right. When you're coding with an LLM, it's not you asking the LLM questions, it's the LLM asking you questions: what to build, how exactly it should work, whether it should do this or that, and under what conditions. Because the LLM does the coding, it's you who has to do more of the thinking. :-)

And when you make the decisions, it is you who is responsible for them. When you just do the coding, the decisions about the code are left largely to you, and nobody much sees them, only how they affect the outcome. Now the LLM is in that role, responsible only for what the code does, not how it does it.

eucyclos 3 hours ago | parent [-]

Hehe, speak for yourself. As a 1x coder on a good day, having a nonjudgmental partner who can explain stuff to me is one of the best parts of writing with an LLM :)

galaxyLogic 19 minutes ago | parent [-]

I like that aspect of it too; the LLM never seems to get offended, even when I tell it it's wrong. I'm just trying to understand why some people say it can feel exhausting. Instead of focusing on narrowly defined coding tasks, the work has changed: you're responsible for a much larger area, and expectations are correspondingly higher. You're supposed to produce 10x the code now.

clickety_clack 9 hours ago | parent | prev | next [-]

I’d love to see what you’ve built. Can you share?

grey-area 8 hours ago | parent | prev | next [-]

Maintenance is the hard part, not writing new code or steering and planning.

ipaddr 4 hours ago | parent | prev [-]

You can outsource that to another LLM.

raincole 8 hours ago | parent | prev | next [-]

If you care about code quality, of course it's exhausting. It's supposed to be: there is now more code whose quality you have to assure in the same amount of time.

onion2k an hour ago | parent [-]

If you care about code quality, though, you should be steering your LLM toward generating high-quality code rather than just "more code". What's exhausting is believing you care about high-quality code while assuming the only way to get it from an LLM is to have it write lots of low-quality code that you then fix yourself.

LLMs will do pretty much exactly what you tell them, and if you don't tell them something, they'll make something up based on what they've been trained to do. If you have rules for what good code looks like, and those set a higher bar than "whatever's in the training data", then you need to build a clear context and write an unambiguous prompt that gets you what you want. That's a lot of work once, to build a good agent or skill, but then the output will be much better.
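As a rough sketch of what I mean (the rules, model name, and task here are all made up for illustration; any chat-completions client works the same way):

    from openai import OpenAI  # assumes the official openai Python package

    client = OpenAI()

    # Illustrative house rules: the point is that the quality bar is
    # stated explicitly rather than left to the training data.
    CODE_RULES = """You are generating production code. Follow these rules:
    - Keep functions under 40 lines; extract helpers instead of nesting.
    - No bare except clauses; raise typed errors with context.
    - Every public function gets type hints and a docstring.
    - Prefer the standard library over adding dependencies.
    """

    response = client.chat.completions.create(
        model="gpt-4o",  # placeholder; use whatever model you run
        messages=[
            {"role": "system", "content": CODE_RULES},
            {"role": "user", "content": "Write a function that parses ISO 8601 dates from a CSV column."},
        ],
    )
    print(response.choices[0].message.content)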

Frieren 39 minutes ago | parent [-]

> write an unambiguous prompt

That's an oxymoron. Prompts are by definition ambiguous; otherwise you'd just be writing code.

gotwaz 4 hours ago | parent | prev | next [-]

The theory of bounded rationality applies. Tech tools scale systemic capability, but the limits of the three-inch chimp brain don't change. The story writes itself.

akomtu 7 hours ago | parent | prev | next [-]

You used to be a Formula 1 driver. Now you are an instructor for a Formula 1 autopilot. You have to watch it at all times with full attention, for it is a fast and reckless driver.

esafak 3 hours ago | parent [-]

You're being generous to the humans; we're more like Ladas in comparison.

p_v_doom 2 hours ago | parent [-]

That may not be a bad comparison. An F1 car is a really fast, really specialized car that is also extremely fragile. A Lada may not be fast, but it's incredibly versatile and stays robust even after decades of use. And it has more luggage space.

senectus1 9 hours ago | parent | prev [-]

I imagine code reviewing is a very different sort of skill than coding. When you vibe code (assuming you're reading the code that is written for you), you become a code reviewer... I suspect you're learning a new skill.

qudat 8 hours ago | parent | next [-]

It’s easier to write code than read it.

Leynos 33 minutes ago | parent | next [-]

It's important to enforce the rules that make the code easier to read.

j3k3 8 hours ago | parent | prev [-]

I'd argue the reading and writing happen simultaneously as one goes along, writing code by hand.

pessimizer 8 hours ago | parent | prev [-]

The way I've tried to deal with it is by forcing the LLM to write code that is clear, well-factored, and easy to review, i.e., continually forcing it to do the opposite of what it wants to do. I've had good outcomes, but they're hard-won.

The result is code that I myself approve of. I can't imagine a time when I wouldn't read all of it; when you just let them go, the results are so awful. If you're letting them run and reviewing at the end, like a post-programming review phase, I don't even know if that's a skill that can be mastered while the LLMs are still this bad. Can you really master Where's Waldo? Everything's a mess, and you're just looking for the part of the mess that has the bug?

I'm not reviewing after asking it to write some entire thing. I'm getting it to accomplish a minimal function, then layering features on top. If I don't understand where something is happening, or I see it happening in too many places, I have to read the code in order to tell it how to refactor. I might have to write stubs to show it what I want to happen. The reading happens as the programming happens.
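For example (a made-up stub, not from any real project), the signature and docstring pin down exactly what I want before the LLM writes a line of the body:

    from dataclasses import dataclass

    @dataclass
    class Invoice:
        customer_id: str
        line_items: list[tuple[str, float]]  # (description, amount)

    def apply_bulk_discount(invoice: Invoice, threshold: float, rate: float) -> Invoice:
        """Return a new Invoice with `rate` discounted from every line
        item, but only when the invoice total exceeds `threshold`.
        Must not mutate the input invoice.
        """
        raise NotImplementedError  # body left for the LLM to fill in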