vidarh 5 hours ago

Rather than LLM=true, this is better handled by standardizing quiet/verbose settings. It's a question of verbosity, and an LLM is just one case where you usually, but not always, want quieter output.

Secondly, a helper to capture output and cache it would help: frankly, a tool, or just options on the regular shell/bash tools, to cache output and allow filtered retrieval of the cached output. More so than context and tokens, the frustration I have with the patterns shown is that the agent will often re-execute time-consuming tasks just to retrieve a different set of lines from the output.

A lot of the time it might even be best to run the tool with verbose output, but it'd be nice if tools had a more uniform output format that was easier to systematically filter down to the essentials on the first run (while caching the rest).
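
To make the caching idea concrete, here's a rough sketch of what such a helper could look like. Everything in it is hypothetical (the cache directory, the function names, keying the cache on a hash of the command string); it's one way to do it under those assumptions, not an existing tool.

```python
import hashlib
import re
import subprocess
import tempfile
from pathlib import Path

# Hypothetical cache location; any writable directory would do.
CACHE_DIR = Path(tempfile.gettempdir()) / "cmd_output_cache"


def cache_path(command: str) -> Path:
    """Map a command string to a stable cache file name."""
    digest = hashlib.sha256(command.encode()).hexdigest()[:16]
    return CACHE_DIR / f"out_{digest}.txt"


def run_cached(command: str, refresh: bool = False) -> Path:
    """Run the command only if no cached output exists (or refresh is set)."""
    path = cache_path(command)
    if refresh or not path.exists():
        CACHE_DIR.mkdir(parents=True, exist_ok=True)
        # Capture stdout and stderr together, as a terminal would show them.
        result = subprocess.run(
            command,
            shell=True,
            text=True,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
        )
        path.write_text(result.stdout)
    return path


def tail(command: str, n: int = 10) -> list[str]:
    """Return the last n lines of the cached output."""
    lines = run_cached(command).read_text().splitlines()
    return lines[-n:] if n > 0 else []


def grep(command: str, pattern: str) -> list[str]:
    """Return cached output lines matching a regular expression."""
    regex = re.compile(pattern)
    lines = run_cached(command).read_text().splitlines()
    return [line for line in lines if regex.search(line)]
```

The point is that tail() and grep() read from the cache file, so asking for a different slice of the output never re-runs the slow command unless you pass refresh=True to run_cached().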

iainmerrick 5 hours ago | parent | next [-]

Yes! After seeing a lot of discussions like this, I came up with a rule of thumb:

Any special accommodations you make for LLMs are either a) also good for humans, or b) more trouble than they're worth.

It would be nice for both LLMs and humans to have a tool that hides verbose tool output, but still lets you go back and inspect it if there's a problem. Although in practice as a human I just minimise the terminal and ignore the spam until it finishes. Maybe LLMs just need their own equivalent of that, rather than always being hooked up directly to the stdout firehose.

MITSardine 3 hours ago | parent | prev [-]

Yes, what's preventing the LLM from running myCommand > /tmp/out_someHash.txt ; tail /tmp/out_someHash.txt and then grepping or tailing around /tmp/out_someHash.txt on failure?

vidarh 3 hours ago | parent [-]

There isn't really anything other than training, but they generally don't. You can probably get them to do that with some extra instructions, but part of the problem, at least with Claude, is that it's really trigger-happy about re-running commands if it doesn't get the results it likes, assuming the output is stale. Even with scripts that are very expensive (in time) I often see it start a run, pipe it to a file, put it in the background, then loop on sleep statements, occasionally get "frustrated" and check, only to throw the results away 30 seconds after they are done because it's made an unrelated change.

A lot of the time this behaviour is probably right. But it's annoyingly hard to steer it to handle this correctly. I've had it do this even with make targets where the Makefile itself makes the dependencies clear, meaning it could trust the cached (in a file) results if it just ran make <target>. Instead I regularly find it reading the Makefile and running the commands manually, working around the dependency management.
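
For what it's worth, the check I'd want it to trust is roughly the timestamp comparison make does. A minimal sketch, assuming a hypothetical is_fresh helper and that dependencies are plain files (treating a missing dependency as "not fresh" is my choice here; make itself would try to build it or report an error):

```python
from pathlib import Path


def is_fresh(cached_output: Path, dependencies: list[Path]) -> bool:
    """True if the cached output exists and no dependency is newer than it."""
    if not cached_output.exists():
        return False
    output_mtime = cached_output.stat().st_mtime_ns
    # A dependency modified after the output was written makes the output stale.
    return all(
        dep.exists() and dep.stat().st_mtime_ns <= output_mtime
        for dep in dependencies
    )
```

If this returns True for the files the script actually reads, an unrelated change elsewhere is no reason to throw the results away and start over.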