amangsingh 4 hours ago

A 500k line codebase for an agent CLI proves one thing: making a probabilistic LLM behave deterministically is a massive state-management nightmare. Right now, they're great for prompting simple sites/platforms but they break at large enterprise repos.

If you don't have a rigid, external state machine governing the workflow, you have to brute-force reliability. That codebase bloat is likely 90% defensive programming: frustration regexes, context sanitizers, tool-retry loops, and state rollbacks, just to stop the agent from drifting or silently breaking things.
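A minimal sketch of one of those defensive patterns, a tool-retry loop with output sanitizing and external validation (all names here are hypothetical, not from the actual Claude Code source):

```python
import re

MAX_RETRIES = 3

def run_tool_with_retries(tool, args, validate):
    """Defensive wrapper: call a tool, sanitize and validate its output,
    and retry on failure -- the kind of code that piles up in a harness."""
    last_error = None
    for _ in range(MAX_RETRIES):
        try:
            output = tool(args)
        except Exception as exc:   # tool-retry loop: swallow and try again
            last_error = exc
            continue
        # Context sanitizer: strip stray code fences the model echoes back.
        output = re.sub(r"```[a-z]*\n?", "", output)
        if validate(output):       # external check before accepting the result
            return output
        last_error = ValueError("validation failed")
    raise RuntimeError(f"tool failed after {MAX_RETRIES} attempts") from last_error
```

Multiply a wrapper like this across every tool, model quirk, and failure mode, and the line count climbs fast.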

The visual map is great, but from an architectural perspective, we're still herding cats with massive code volume instead of actually governing the agents at the system level.

ttcbj 2 hours ago | parent | next [-]

I find it really strange that there is so much negative commentary on the _code_, but so little commentary on the core architecture.

My takeaway from looking at the tool list is that they got the fundamental architecture right - try to create a very simple and general set of tools on the client-side (e.g. read file, output rich text, etc) so that the server can innovate rapidly without revving the client (and also so that if, say, the source code leaks, none of the secret sauce does).
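As a rough illustration of that split, the client-side surface can stay this small and generic while all the orchestration lives server-side (a hypothetical sketch, not the actual implementation):

```python
from dataclasses import dataclass
from typing import Any, Callable

@dataclass
class Tool:
    """A deliberately simple, general client-side capability."""
    name: str
    description: str
    handler: Callable[..., Any]

def read_file(path: str) -> str:
    with open(path) as f:
        return f.read()

def output_rich_text(markdown: str) -> None:
    print(markdown)

# The client only ships this small, stable registry; how the tools get
# combined is decided server-side, so the "secret sauce" never leaves
# the server and the client rarely needs a new release.
CLIENT_TOOLS = {
    tool.name: tool
    for tool in (
        Tool("read_file", "Read a file from disk", read_file),
        Tool("output_rich_text", "Render rich text to the user", output_rich_text),
    )
}
```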

Overall, when I see this I think they are focused on the right issues, and their tool list looks pretty simple/elegant/general. I picture the server team constantly thinking: we have these client-side tools/APIs, how can we use them optimally? How can we get more out of them? That is where the secret sauce lives.

olejorgenb an hour ago | parent | next [-]

The tools were mostly already known, no? (I wish they had a "present" tool which allowed the model to copy-paste from files/context/etc., showing the user some content without forcing it through the model.)
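Such a hypothetical "present" tool might look something like this: the client slices the file directly, so the bytes never pass through the model's context:

```python
def present(path: str, start: int, end: int) -> str:
    """Hypothetical 'present' tool: return lines start..end (1-indexed,
    inclusive) of a file verbatim, bypassing the model entirely.
    The model only picks the span; the client renders the content."""
    with open(path) as f:
        lines = f.readlines()
    return "".join(lines[start - 1 : end])
```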

acedTrex 41 minutes ago | parent | prev | next [-]

> but so little commentary on the core architecture.

The core architecture is not interesting: it's an LLM TUI, and there's not much there to discuss architecturally. The code itself is the actual fascinating train wreck to look at.

sunir 3 hours ago | parent | prev | next [-]

It’s not surprising. There has been quite a bit of industrial research into how to make mere apes behave deterministically using huge software control systems, and they are an unruly bunch, I assure you.

RALaBarge 2 hours ago | parent [-]

Sunir! Hope you are doing well man, I got a good chuckle from this.

sunir an hour ago | parent [-]

I am! I’ll reach out in another channel to connect.

tracyhenry 5 minutes ago | parent | prev | next [-]

> they break at large enterprise repos.

I don't know where you get this. You should ask the folks at Meta; they are probably the biggest and happiest users of Claude Code.

chrismarlow9 an hour ago | parent | prev | next [-]

We propped the entire economy up on it. Just look at the S&P top 10. Actually, even the top 50 holdings.

If it doesn't deliver on the promise, we have bigger problems than "oh no, the code is insecure". We went from "I think this will work" to "this has to work, because if it doesn't we have one of those 'you owe the bank a billion dollars' situations".

noosphr 2 minutes ago | parent | prev | next [-]

What is going to be hilarious is rewriting that whole codebase for each new version of Claude. Anyone who has been around since the GPT-3 days knows that the models have very different failure modes in each generation. After learning three of them I don't have the energy to do it again. The codebase reads like it was written in blood, so with each new release you'd have months of unexpected "oopsie, I deleted your whole company. I shouldn't have done that. I'm really sorry." type events happening.

comboy 3 hours ago | parent | prev | next [-]

It's hard to tell how much this says about the difficulty of harnessing versus the difficulty of maintaining a clean, unbloated codebase when coding with AI.

amangsingh 2 hours ago | parent | next [-]

Why not both? AI writes bloated spaghetti by default. The control plane needs to be human-written and rigid, at least until the state machine is solid enough to dogfood itself. Then you can safely let the AI enhance the harness from within the sandbox.

whiplash451 an hour ago | parent | prev [-]

Were human organizations (not individuals) any good at the latter anyway?

nicoburns 2 hours ago | parent | prev | next [-]

Kinda depends how much of it is vibe coded. If they've not been careful, it could easily be 5x larger than it needs to be just because the LLM felt like it.

saynay 2 hours ago | parent | next [-]

The Claude folks proudly claim to have Claude effectively writing itself. The CEO claims it will read an issue and automatically write a fix, add tests, commit, and submit a PR for it.

amangsingh 2 hours ago | parent | prev [-]

Bingo. And their "being careful" is exactly what bloats it to 500k lines. It's a ton of on-the-fly prompt engineering, context sanitizers, and probabilistic guardrails just to keep the vibes in check.

marcuscog 36 minutes ago | parent | prev | next [-]

I think these folks are attempting to build systems with IAM, entity states, business rules: all built over two foundational DSLs - https://typmo.com

bwfan123 17 minutes ago | parent | prev | next [-]

Brute-forcing pattern matching at scale. These are brittle systems with enormous amounts of duct tape holding everything together: workarounds on workarounds.

whycombagator 2 hours ago | parent | prev | next [-]

> Right now, they're great for prompting simple sites/platforms but they break at large enterprise repos

Can you expand on this?

My experience is they require excessive steering but do not “break”

oblio an hour ago | parent [-]

I think the "breakage" is in terms of conciseness and compactness, not outright brokenness.

Like that drunk uncle who takes half an hour and 20,000 words to tell you a 500-word story.

bogdanoff_2 3 hours ago | parent | prev | next [-]

What do you mean by "actually governing the agents at the system level", and how is it different from "herding cats"?

amangsingh 3 hours ago | parent [-]

Herding cats is treating the LLM's context window as your state machine. You're constantly prompt-engineering it to remember the rules, hoping it doesn't hallucinate or silently drop constraints over a long session.

System-level governance means the LLM is completely stripped of orchestration rights. It becomes a stateless, untrusted function. The state lives in a rigid, external database (like SQLite). The database dictates the workflow, hands the LLM a highly constrained task, and runs external validation on the output before the state is ever allowed to advance. The LLM cannot unilaterally decide a task is done.
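A minimal sketch of that governance loop, using SQLite as the external state store (all names hypothetical, not from any shipped tool):

```python
import sqlite3

def init_db(conn: sqlite3.Connection) -> None:
    conn.execute(
        "CREATE TABLE IF NOT EXISTS tasks ("
        "id INTEGER PRIMARY KEY, spec TEXT, status TEXT DEFAULT 'pending')"
    )

def advance(conn, task_id, llm, validate) -> bool:
    """The database, not the model, drives the workflow. The LLM is a
    stateless, untrusted function, and state only advances after an
    external validation check passes."""
    row = conn.execute(
        "SELECT spec FROM tasks WHERE id = ? AND status = 'pending'",
        (task_id,),
    ).fetchone()
    if row is None:
        return False               # the workflow decides what runs, not the LLM
    output = llm(row[0])           # stateless, untrusted call
    if not validate(output):       # external validation gate
        return False               # state never advances on failure
    conn.execute("UPDATE tasks SET status = 'done' WHERE id = ?", (task_id,))
    return True
```

The key property is that `advance` is the only path to a status change, so the LLM can never unilaterally mark a task done.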

I got so frustrated with the former while working on a complex project that I paused it to build a CLI to enforce the latter. Planning to drop a Show HN for it later today, actually.

mywacaday 2 hours ago | parent | next [-]

I started that very same personal project on Monday, so I'm waiting with bated breath. Make sure to add a "sponsor me a coffee" link.

fallinditch 2 hours ago | parent | prev [-]

Sounds good, I'll keep an eye out.

p-e-w 3 hours ago | parent | prev | next [-]

> A 500k line codebase for an agent CLI proves one thing: making a probabilistic LLM behave deterministically is a massive state-management nightmare.

Considering what the entire system ends up being capable of, 500k lines is about 0.001% of what I would have expected something like that to require 10 years ago.

You can combine that with all the training and inference code, and at the end of the day, a system that literally writes code ends up being smaller than the LibreOffice codebase.

It boggles the mind, really.

davidkunz an hour ago | parent | next [-]

Oh, you should have a look at Pi then.

https://github.com/badlogic/pi-mono/tree/main/packages/codin...

sarchertech 3 hours ago | parent | prev | next [-]

> You can combine that with all the training and inference code, and at the end of the day, a system that literally writes code ends up being smaller than the LibreOffice codebase.

You really need to compare it to the model weights though. That’s the “code”.

pixl97 20 minutes ago | parent [-]

>You really need to compare it to the model weights though

Then you'd need to compare the education of any developer in relation to how many LOC their IDE is. That's the "code".

So yea, the analogy doesn't make a whole lot of sense.

oblio an hour ago | parent | prev | next [-]

It even wrote an entire browser!

By "just" wrapping a browser engine.

raincole 2 hours ago | parent | prev [-]

... what are you even talking about? "The system that literally writes code" has trillions of parameters. How is that smaller than LibreOffice?

I know xkcd 1053, but come on.

dolomo 3 hours ago | parent | prev | next [-]

[flagged]

amangsingh 3 hours ago | parent | next [-]

If writing concise architectural analysis without the fluff makes me an AI, I'll take the compliment. But no, just a tired architect who has spent way too many hours staring at broken agent state loops, haha.

airstrike 30 minutes ago | parent | next [-]

I'll bet you $20 you ran your original comment through an LLM. Likely an OpenAI model.

thfuran 2 hours ago | parent | prev | next [-]

What makes you think that’s AI-written?

samusiam 2 hours ago | parent | prev [-]

AI witch-hunters are even more annoying.

WarmWash an hour ago | parent [-]

Seriously, people are becoming deranged.

Drop an em dash or a bullet point and they go into spasms.

ramesh31 2 hours ago | parent | prev [-]

>A 500k line codebase for an agent CLI proves one thing: making a probabilistic LLM behave deterministically is a massive state-management nightmare. Right now, they're great for prompting simple sites/platforms but they break at large enterprise repos.

Is that the case? I'm pretty sure Claude Code is one of the most massively successful pieces of software made in the last decade. I don't know how that proves your point. Will this codebase become unmanageable eventually? Maybe, but literally every agent harness out there is just copying their lead at this point.

amangsingh 2 hours ago | parent [-]

Claude Code is a massively successful generator, and I use it all the time, but it's not a governance layer.

The fact that the industry is copying a 500k-line harness is the problem. We're automating security vulnerabilities at scale because people are trying to put the guardrails inside the probabilistic code instead of strictly above it.

Standardizing on half a million lines of defensive spaghetti is a huge liability.

ramesh31 30 minutes ago | parent [-]

>Standardizing on half a million lines of defensive spaghetti is a huge liability.

Again, maybe it will be. Or maybe the way we make software and what is considered good practice will completely change with this new technology. I'm betting on the latter.