raincole 2 days ago

> I'm not sure I understand this argument. I create new tools all the time as part of my development work, and I have skills stored that tell agents how to use them. They use them flawlessly.

I highly doubt that your tool is like this:

> git branch -vv | grep ': gone]'| grep -v "*" | awk '{ print $1; }' | xargs -r git branch -d

Or:

> ffmpeg -i main_course.mp4 -i reaction_cam.mov \
>   -filter_complex \
>   "[1:v]scale=480:270[pip_scaled]; \
>    [0:v][pip_scaled]overlay=W-w-20:20[pip_video]; \
>    [pip_video]drawtext=text='LIVE RECORDING':fontcolor=white:fontsize=24:box=1:boxcolor=black@0.6:x=30:y=30[final_video]; \
>    [0:a][1:a]amix=inputs=2:duration=first:dropout_transition=2[final_audio]" \
>   -map "[final_video]" -map "[final_audio]" \
>   -c:v libx264 -crf 21 -preset fast \
>   -c:a aac -b:a 192k \
>   output_production.mp4

LLMs generate these for breakfast.
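For anyone who doesn't read that first pipeline fluently: `git branch -vv` lists local branches with their upstream tracking info, `grep ': gone]'` keeps the ones whose upstream has been deleted (git shows that marker once a pruning fetch has removed the remote-tracking ref), `grep -v "*"` drops the currently checked-out branch, `awk` prints the first field (the branch name), and `xargs -r git branch -d` deletes whatever is left, running nothing if the list is empty. Here is a rough Python sketch of the same logic, split into small functions so each step can be checked on its own. The function names are made up for illustration; it is not a drop-in replacement for anyone's actual tooling.

```python
import subprocess
from typing import Optional


def gone_branches(branch_vv_output: str) -> list[str]:
    """Mirror: grep ': gone]' | grep -v "*" | awk '{ print $1; }'."""
    names = []
    for line in branch_vv_output.splitlines():
        if ": gone]" not in line:  # grep ': gone]': upstream no longer exists
            continue
        if "*" in line:  # grep -v "*": skips the current branch, but ALSO any line containing '*'
            continue
        fields = line.split()
        if fields:
            names.append(fields[0])  # awk '{ print $1; }': first field is the branch name
    return names


def delete_command(names: list[str]) -> Optional[list[str]]:
    """Mirror: xargs -r git branch -d (one batched call, or nothing if the list is empty)."""
    return ["git", "branch", "-d", *names] if names else None


def prune_gone_branches() -> None:
    """Run the whole thing against the repository in the current directory (not called here)."""
    out = subprocess.run(
        ["git", "branch", "-vv"], capture_output=True, text=True, check=True
    ).stdout
    cmd = delete_command(gone_branches(out))
    if cmd:
        # -d (unlike -D) refuses to delete branches that are not fully merged.
        subprocess.run(cmd, check=True)
```

Spelling it out exposes a quirk that is easy to miss in the one-liner: because `git branch -vv` also prints each branch's latest commit subject, a gone branch whose subject happens to contain `*` is silently skipped by `grep -v "*"`.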

cruffle_duffle 2 days ago

It’s really wild watching LLMs construct those calls. They batch so many different checks and stuff into a single tool call, delimit them with markers, etc.

The crazy thing to me is that this kind of “composition of small tools to create something bigger” is the biggest vindication of the Unix philosophy I can think of.

I have to wonder how much of that behavior was trained into the model and how much is the secret herbs and spices they toss into the harnesses' own system prompts.

fireant a day ago

Personally, I really dislike it when agents generate super long composed shell commands, because they are really hard to audit. ffmpeg I'd whitelist, but if it makes a mistake in some super long chained git command, it can have pretty scary consequences.

yencabulator a day ago

Totally breaks the permission model in Claude Code.
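One way to see why composed commands are a problem for prefix-style permissions: a naive rule that approves any command *starting with* an allowed prefix waves through an entire pipeline as soon as its first stage matches. The Python sketch below is a generic illustration under that assumption, not a description of how Claude Code actually evaluates its rules; the allowlist entries and the names `ALLOWED`, `split_stages`, `naive_allowed` and `per_stage_allowed` are hypothetical. It uses the standard library's `shlex` with `punctuation_chars=True` (Python 3.8+) to split a command line at `|`, `||`, `&&`, `;` and `&`, then approves only if every stage matches.

```python
import shlex

# Hypothetical allowlist of approved command prefixes (illustration only).
ALLOWED = [("git", "branch"), ("git", "status"), ("ls",)]

# Control operators that separate one command from the next.
OPERATORS = {"|", "||", "&&", ";", "&"}


def split_stages(cmd: str) -> list[list[str]]:
    """Tokenize cmd roughly like a POSIX shell and split it at control operators.

    Sketch only: ignores subshells, $(...) substitution and redirections, and
    cannot tell a quoted "|" from a real pipe.
    """
    lexer = shlex.shlex(cmd, posix=True, punctuation_chars=True)
    lexer.whitespace_split = True  # keep args like -vv and ': gone]' whole
    stages: list[list[str]] = [[]]
    for tok in lexer:
        if tok in OPERATORS:
            stages.append([])
        else:
            stages[-1].append(tok)
    return [s for s in stages if s]


def naive_allowed(cmd: str) -> bool:
    """Approve if the raw string merely starts with an allowed prefix."""
    return any(cmd.startswith(" ".join(p)) for p in ALLOWED)


def per_stage_allowed(cmd: str) -> bool:
    """Approve only if every stage begins with an allowed prefix."""
    return all(
        any(tuple(stage[: len(p)]) == p for p in ALLOWED)
        for stage in split_stages(cmd)
    )


if __name__ == "__main__":
    pipeline = ("git branch -vv | grep ': gone]'| grep -v \"*\" "
                "| awk '{ print $1; }' | xargs -r git branch -d")
    print(naive_allowed(pipeline))      # True: it starts with "git branch"
    print(per_stage_allowed(pipeline))  # False: grep, awk and xargs are not allowed
    for stage in split_stages(pipeline):
        print(shlex.join(stage))        # one stage per line, easier to audit
```

Even the per-stage check is shallow: tools like `xargs`, `sh -c` or `find -exec` launch other commands from inside a stage that looks harmless, so a sound policy would have to understand those as well.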

cruffle_duffle a day ago

To be fair, Claude is plenty capable of climbing out of its sandbox. If it has shell access, it will find a way. And honestly, its whitelist/blacklist permission model is broken and inappropriate for it to begin with.