embedding-shape 9 hours ago

The general rule seems to be: the more layers you automate with LLMs, the worse each successive layer gets. Once you start piping LLM output as input into new LLM calls, you quickly notice how things fall apart and get lost.

If you have the idea and more or less the implementation plan, and you let the LLM do only the coding, you can end up with something maintainable and nice; it's basically up to you.

Strip away one layer, so you have the idea but the LLM comes up with the implementation plan and then the implementation as well, and things end up a lot less than ideal.

Remove another layer, let the LLM do it all, and it's all a mess.

nimonian 8 hours ago | parent | next [-]

It's like those sequences of images where we ask the LLM to reproduce the same image exactly, and we just get some kind of grotesque collapse after a few dozen iterations. The same happens with text and code. I call this "semantic collapse".

I conjecture that after some years of LLMs reading a SharePoint site, producing summaries, then summaries of those summaries, and so on, we will end up with a grotesque slurry.

At some point, fresh human input is needed to inject something meaningful into the process.

stitched2gethr 4 hours ago | parent | prev | next [-]

It's all about how full the context is, right? For a task that can be completed in 20% of the context it doesn't matter, but you don't want to fill your context with exploration before you do the hard part.

I have actually found something close to the opposite. I work on a large codebase, and for complex tasks I often use the LLM to generate artifacts before performing the task. I use a prompt along the lines of "go explore this area of the code and write about it". It documents concepts and includes pointers to specific code. Then a fresh session can use that without reading the stuff that doesn't matter. It uses more tokens overall, but it preserves important details that can get totally missed when you just let it go.
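
Roughly, the two-phase flow I mean looks like the sketch below (the paths, prompt wording, and file names are just illustrative, not an exact recipe):

    # Sketch: one session explores and writes notes, a later session
    # starts fresh from those notes instead of re-reading the codebase.
    from pathlib import Path

    AREA = "src/billing"  # hypothetical subtree to document

    explore_prompt = (
        f"Explore {AREA} and write a short design doc: key concepts, "
        "invariants, and pointers to the specific files and functions "
        "that implement them."
    )

    # Phase 1: run explore_prompt in an agent session and save its output.
    Path("notes").mkdir(exist_ok=True)
    notes = "...output of the exploration session goes here..."
    Path("notes/billing-overview.md").write_text(notes)

    # Phase 2: a fresh session gets only the distilled notes plus the task,
    # so its context holds the important details without the exploration noise.
    task_prompt = (
        Path("notes/billing-overview.md").read_text()
        + "\n\nUsing the notes above, plan the change to the proration logic."
    )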

godelski 9 hours ago | parent | prev | next [-]

People like to make the comparison between zip file compressions, where you can degrade something by continually compressing. Same with using jpeg or mp3. But I like to use the analogy of the game "Telephone" (also called "Chinese Whispers"). I think it also highlights how fraught natural language is and just how quickly it can degrade. I think a lot of people are insufficiently impressed with how good we are at communicating at all.

sweetjuly 9 hours ago | parent | next [-]

I suggest you find a new DEFLATE library if you're losing data when you compress things with it :)

godelski 5 hours ago | parent [-]

You do realize there is both lossy and lossless compression, right?

Or did you hyperfixate on the colloquial usage of "zip"?

ethmarks 9 hours ago | parent | prev | next [-]

ZIP files are lossless. If you compress, unzip, and recompress a ZIP file hundreds of times, it'll still be the exact same data as when you started.
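
You can check it yourself; a DEFLATE round trip (what ZIP uses) gives back byte-identical data no matter how many times you repeat it:

    # Repeated compress/decompress with DEFLATE never changes the bytes.
    import os
    import zlib

    original = os.urandom(1 << 16)                   # 64 KiB of arbitrary bytes
    data = original
    for _ in range(100):
        data = zlib.decompress(zlib.compress(data))  # one full round trip
    assert data == original                          # still bit-for-bit identical
    print("100 round trips, no loss")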

ChrisGreenHeur 6 hours ago | parent [-]

So is the game of telephone, as long as people stop whispering and try not to make stuff up.

meindnoch 8 hours ago | parent | prev | next [-]

>zip file compressions, where you can degrade something by continually compressing

Reading this on HN... Sic transit gloria mundi!

jibal 6 hours ago | parent | prev [-]

> People like to make the comparison between zip file compressions, where you can degrade something by continually compressing.

What people have this misunderstanding?

quotemstr 9 hours ago | parent | prev [-]

I think this principle applies only if you lack feedback. Yes, when you go through multiple layers of open-loop control, you're going to get less precise at each layer. It's less clear that the situation is as dire if each level has metrics and can self-adjust to optimize its performance.
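
To make that concrete, here is a toy sketch of what I mean by closing the loop; generate and score are placeholders for an LLM call and whatever metrics you trust (tests, lint, evals), not a real agent API:

    # Each layer only passes its output downstream once a metric accepts it,
    # instead of piping raw output forward blindly.
    def generate(spec: str, feedback: str = "") -> str:
        return f"draft for: {spec} {feedback}"      # stand-in for an LLM call

    def score(artifact: str) -> float:
        return 0.9                                  # stand-in for tests/lint/evals

    def closed_loop(spec: str, threshold: float = 0.8, max_tries: int = 5) -> str:
        feedback = ""
        for _ in range(max_tries):
            artifact = generate(spec, feedback)
            s = score(artifact)
            if s >= threshold:
                return artifact                     # only good-enough output moves on
            feedback = f"previous attempt scored {s:.2f}, fix the failing checks"
        raise RuntimeError("could not reach threshold")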

embedding-shape 9 hours ago | parent [-]

But these are inherently subjective things. What the "right idea" or the "right implementation" is lives in our heads; we can try to verbalize it, but I don't think you can come up with an objective score for it. Ask 100 programmers and you'll get 100 different answers about what "clean design" is.

quotemstr 9 hours ago | parent [-]

And that's why my whole schtick when it comes to agent design is that agents need to learn online, continuously, and in adapter space via some PEFT mechanism (I like soft prompts and prefix tuning), because it's really hard to ascend gradients in discrete domains like tokens.
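
For concreteness, a soft prompt is roughly the following sketch (PyTorch, dimensions made up): a small block of learnable embeddings prepended to the frozen model's input, so the only thing being updated online is that adapter.

    # Minimal soft-prompt sketch: learn a few "virtual token" embeddings and
    # prepend them to the frozen model's input embeddings. Only these
    # parameters receive gradient updates, so online adaptation stays cheap.
    import torch
    import torch.nn as nn

    class SoftPrompt(nn.Module):
        def __init__(self, n_virtual_tokens: int, hidden_dim: int):
            super().__init__()
            self.prompt = nn.Parameter(torch.randn(n_virtual_tokens, hidden_dim) * 0.02)

        def forward(self, input_embeds: torch.Tensor) -> torch.Tensor:
            # input_embeds: (batch, seq_len, hidden_dim) from the frozen embedding layer
            batch = input_embeds.shape[0]
            prompt = self.prompt.unsqueeze(0).expand(batch, -1, -1)
            return torch.cat([prompt, input_embeds], dim=1)

    # Freeze the base model, optimize only the soft prompt.
    soft = SoftPrompt(n_virtual_tokens=20, hidden_dim=768)
    optimizer = torch.optim.AdamW(soft.parameters(), lr=1e-3)
    embeds = torch.randn(2, 10, 768)   # stand-in for frozen embedding output
    out = soft(embeds)                 # shape (2, 30, 768), fed to the frozen model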

embedding-shape 9 hours ago | parent [-]

> The model knows damn well when it's written ugly code. You can just ask it.

That has not been my experience at all. What model and prompt would you use for that? Every single one I've tried is oblivious to whether a design makes sense unless explicitly prompted for it with constraints, future ideas, and so on.

CuriouslyC 7 hours ago | parent [-]

The problem is that the model doesn't know what you mean by "bad code" a priori. If you list the specific issues you care about (e.g. separation of concerns, don't repeat yourself, single responsibility, prefer pure functions, etc.), it's pretty good at picking them out. Humans have this problem as well; we're just more opinionated.
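
In other words, instead of asking "is this bad code?", spell the criteria out in the prompt; the wording here is just an example:

    # Building an explicit review prompt: the model does far better when the
    # criteria are named up front instead of left implicit as "bad code".
    criteria = [
        "separation of concerns: one module per responsibility",
        "don't repeat yourself: flag duplicated logic",
        "single responsibility: functions should do one thing",
        "prefer pure functions: isolate side effects",
    ]

    def review_prompt(diff: str) -> str:
        bullets = "\n".join(f"- {c}" for c in criteria)
        return (
            "Review the following diff against these specific criteria:\n"
            f"{bullets}\n\n"
            "For each violation, cite the exact lines and suggest a fix.\n\n"
            f"{diff}"
        )

    print(review_prompt("...diff goes here..."))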

embedding-shape 6 hours ago | parent [-]

Yes, that's exactly what I mentioned earlier: if you describe the implementation, you can get something you can work with long-term. But if you just describe an idea and let the LLM do both the design of the implementation and the implementation itself, it eventually seems to fall over itself, and changes take longer and longer.