I still find it incredible how much power was unleashed by surrounding an LLM with a simple state machine and giving it access to bash.

▲

Yokohiii 11 hours ago | parent | next [-]

That is why I am currently looking into building my own simple, heavily isolated coding agent. The bloat is already scary, but the bad decisions should make everyone shiver. Ten years ago people would rant endlessly about things with more than one edge that require a modicum of responsibility to use. Now everyone seems to be either in panic or hype mode, ignoring all good advice just to stay somehow relevant in a chaotic timeline.

▲

alfiedotwtf 19 minutes ago | parent | prev | next [-]

I found replacing bash with python to be more useful… that way, it can craft whatever it desires without having to pipe a billion pieces of gum together

▲

HarHarVeryFunny 11 hours ago | parent | prev | next [-]

At its heart it's prompt/context engineering. The model has a lot of knowledge baked into it, but how do you get it out (and make it actionable for a semi-autonomous agent)? ... you craft the context to guide generation and maintain state (still interacting with a stateless LLM), and provide (as part of context) skills/tools to "narrow" model output into tool calls to inspect and modify the code base.

I suspect that more could be done in terms of translating semi-naive user requests into the steps that a senior developer would take to enact them, maybe including the tools needed to do so.

It's interesting that the author believes that the best open source models may already be good enough to compete with the best closed source ones, given an optimized agent and maybe a bit of fine tuning. I guess the bar isn't really being able to match the SOTA model, but being close to competent human level - it's a fixed bar, not a moving one. Adding more developer expertise by having the agent translate/augment the user's request/intent into execution steps would certainly seem to have potential to lower the bar of what the model needs to be capable of one-shotting from the raw prompt.

	▲	Serberus 6 hours ago \| parent [-]
		[dead]

▲

emp17344 8 hours ago | parent | prev | next [-]

If you saw the Claude Code leak, you’d know the harness is anything but simple. It’s a sprawling, labyrinthine mess, but it’s required to make LLMs somewhat deterministic and useful as tools.

▲

girvo 6 hours ago | parent | next [-]

That’s also because of how Claude Code was written. It doesn’t have to be that way per se.

▲

efromvt 6 hours ago | parent | prev | next [-]

It's pretty easy to get determinism with a simple harness for a well-defined set of tasks with the recent models that are post-trained for tool use. CC probably gets some bloat because it tries to do a LOT more; and some bloat because it's grown organically.

	▲	emp17344 6 hours ago \| parent [-]
		> It's pretty easy to get determinism with a simple harness for a well-defined set of tasks with the recent models that are post-trained for tool use.

		Do you have a source? Claude Code is the only agentic system that seems to really work well enough to be useful, and it’s equipped with an absolutely absurd amount of testing and redundancy to make it useful.

▲

xstas1 8 hours ago | parent | prev [-]

Hypothesis: it's a sprawling, labyrinthine mess because it was grown at high speed using Claude Code.

	▲	emp17344 7 hours ago \| parent [-]
		There’s a lot of redundancy, because there has to be to make the system useful. It’s a hacked-together mess.

▲

stanleykm 11 hours ago | parent | prev | next [-]

unfortunately all the agent cli makers have decided that simply giving it access to bash is not enough. instead we need to jam every possible functionality we can imagine into a javascript “TUI”.

▲

HarHarVeryFunny 10 hours ago | parent [-]

If all you want is a program that calls the model in a loop and offers a bash tool, then ask Claude Code to build that. You won't like it though!

For a preview of what it'd be like, just tell your AI chat app that you'll run bash commands for it, and please change the app in your "current directory" to "sort the output before printing it", or some such request.
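For concreteness, here is a minimal sketch of the "model in a loop with a bash tool" program being discussed. It is not how Claude Code or any other agent is implemented. The `model` callable, the message shape (`type`/`content` keys), and `scripted_model` are all hypothetical stand-ins; a real harness would call an LLM API at that point.

```python
import subprocess
from typing import Callable, Dict, List

Message = Dict[str, str]


def run_bash(command: str, timeout: int = 30) -> str:
    """Run a command through bash and return combined stdout and stderr."""
    result = subprocess.run(
        ["bash", "-c", command],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return (result.stdout + result.stderr).strip()


def agent_loop(
    model: Callable[[List[Message]], Message],
    task: str,
    run_tool: Callable[[str], str] = run_bash,
    max_steps: int = 10,
) -> str:
    """Call the model repeatedly until it returns a final answer.

    `model` is a hypothetical stand-in for an LLM client. It receives the
    whole history on every call (the model itself is stateless) and returns
    {"type": "bash", "content": <command>} or
    {"type": "final", "content": <answer>}.
    """
    history: List[Message] = [{"type": "user", "content": task}]
    for _ in range(max_steps):
        reply = model(history)
        history.append(reply)
        if reply["type"] == "final":
            return reply["content"]
        # Feed the command output back so the next call can see it.
        output = run_tool(reply["content"])
        history.append({"type": "tool_result", "content": output})
    return "step limit reached"


def scripted_model(history: List[Message]) -> Message:
    """Fake model for demonstration: run one command, then report its output."""
    if history[-1]["type"] == "user":
        return {"type": "bash", "content": "echo hello"}
    return {"type": "final", "content": "bash said: " + history[-1]["content"]}
```

Calling `agent_loop(scripted_model, "say hello")` runs `echo hello` through bash and returns the scripted final message. Everything the thread argues about (edit tools, context management, sandboxing) is what gets layered on top of a loop like this.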

▲

senko 10 hours ago | parent | next [-]

Claude Code with Opus 4.6 regularly uses sed for multi-line edits, in my experience. On top of that, Pi famously exposes only 4 tools, which is not just Bash, but far more constrained than CC's 57 or so tools.

So, yes, it can work.

▲

HarHarVeryFunny 9 hours ago | parent | next [-]

I think the problem/limitation would be as much due to context management as tools. Obviously bash plus a few utilities is sufficient to explore/edit the code base, but I can't imagine this working reliably without the models being specifically trained to use specific tools, and recognize/adapt to different versions of them etc.

Context management, both within and across sessions, seems the bigger issue. Without the agent supporting this, you are at the mercy of the model compacting/purging the context as needed, in some generic fashion, as well as being smart enough to decide to create notes for itself tracking what it is doing, etc.

Apparently CC is 512K LOC, which seems massively bloated, but I do think that things like tools, skills, context management and subagents are all needed to effectively manage context and avoid the issues that might be anticipated by just telling the model it's got a bash tool, and go figure.
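As a rough illustration of what "compacting" can mean at its simplest, here is a sketch that keeps the original task and the most recent messages and replaces the middle with one summary. It makes no claim about how CC does it. The character budget is a crude stand-in for a token count, and `summarize` is a hypothetical callable (in a real agent it would probably be another model call).

```python
from typing import Callable, List


def compact(
    history: List[str],
    budget_chars: int,
    summarize: Callable[[List[str]], str],
    keep_recent: int = 4,
) -> List[str]:
    """Shrink the history once it exceeds a character budget.

    Keeps the first message (the task) and the last `keep_recent` messages
    verbatim, and replaces everything in between with a single summary.
    """
    if sum(len(message) for message in history) <= budget_chars:
        return history
    cut = len(history) - keep_recent
    if cut <= 1:
        # Nothing between the task and the recent tail to summarize.
        return history
    middle = history[1:cut]
    return [history[0], summarize(middle)] + history[cut:]


# Example with a placeholder summarizer that only counts what it dropped.
history = ["task"] + ["step %d" % i for i in range(10)]
compacted = compact(
    history,
    budget_chars=20,
    summarize=lambda messages: "[%d messages summarized]" % len(messages),
    keep_recent=2,
)
```

The hard part is not this bookkeeping but deciding what the summary must preserve, which is where a generic strategy and a task-aware one would diverge.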

	▲	stanleykm 7 hours ago \| parent [-]
		You don’t really need most of that stuff. Have sensible steering files. Have the agent keep state itself. Don’t bother compacting. It’s fine.

▲

HarHarVeryFunny 9 hours ago | parent | prev [-]

I thought CC only supports its find/replace edit tool (implemented by CC itself, using Node.js for file access), and is platform agnostic. Are you saying that on Linux CC offers "sed" as a tool too? I can't imagine it offers "bash" since that's way too dangerous.

▲

senko 7 hours ago | parent [-]

Yes, Claude Code has a Bash tool, and Claude in some cases uses the CLI sed utility (via the Bash tool) for file changes (although it has built-in file update), at least on my Linux machine.

▲

HarHarVeryFunny 7 hours ago | parent [-]

Interesting - thanks.

I just asked Claude, and apparently CC makes its bash tool available on all platforms it runs on (Linux, macOS, Windows WSL, Git for Windows), and doesn't do platform-specific filtering of bash commands, which would seem to make for some interesting incompatibilities - GNU utils (sed, grep, find) on Linux and Windows, but BSD variants on macOS.
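One well-known instance of that incompatibility is in-place editing: GNU sed accepts `sed -i -e EXPR FILE`, while the BSD sed shipped with macOS requires an explicit (possibly empty) backup suffix, `sed -i '' -e EXPR FILE`. A minimal sketch of how a harness could paper over this, assuming the system default sed is the one on PATH (a Mac with GNU sed installed would break that assumption); `sed_inplace_argv` is a hypothetical helper, not something any agent is known to ship.

```python
import platform
from typing import List, Optional


def sed_inplace_argv(expression: str, path: str, system: Optional[str] = None) -> List[str]:
    """Build an argv list for an in-place sed edit on the given platform.

    Assumes the platform's default sed: BSD sed on macOS and the BSDs,
    GNU sed everywhere else.
    """
    system = system or platform.system()
    if system == "Darwin" or system.endswith("BSD"):
        # BSD sed needs a backup-suffix argument after -i; "" means no backup.
        return ["sed", "-i", "", "-e", expression, path]
    return ["sed", "-i", "-e", expression, path]
```

A model that emits the GNU form on a Mac gets an error from sed rather than an edit, since BSD sed consumes the next argument as the backup suffix.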

	▲	girvo 6 hours ago \| parent [-]
		Claude Code will semi-regularly try to use GNU utils on my Mac

▲

Yokohiii 10 hours ago | parent | prev | next [-]

I think you've got him wrong? He is already concerned about "bash on steroids" and current tools add concerning amounts of steroids to everything.

▲

girvo 6 hours ago | parent | prev | next [-]

> If all you want is a program that calls the model in a loop and offers a bash tool, then ask Claude Code to build that. You won't like it though!

Okay sure it’s technically more than just bash, but my own for-fun coding agent and pi-coding-agent work this way. The latter is quite useful. You can get surprisingly far with it.

▲

stanleykm 10 hours ago | parent | prev | next [-]

i did.. and that's what i use. obviously it's a little more than just a tool that calls bash but it is considerably less than whatever they are doing in coding agents now.

▲

slopinthebag 9 hours ago | parent | prev [-]

Claude Code gets smoked on benchmarks by an agent that has a single tool: tmux. So I think they might actually like that quite a bit.

	▲	HarHarVeryFunny 8 hours ago \| parent [-]
		What benchmarks are you referring to?

▲

esafak 12 hours ago | parent | prev [-]

Tools gave humans the edge over other animals.

▲

Yokohiii 10 hours ago | parent [-]

And those tools regularly burnt cities to ashes. Took a long time to get it under control.

	▲	y0eswddl 9 hours ago \| parent [-]
		*burn - I'm not sure we've gotten that under control quite yet