cheema33 3 hours ago

Computer use and screenshots are context-intensive. Text is not. The more context you give to an LLM, the dumber it gets. Some people think the LLM starts to enter the dumb zone at around 40% context utilization. That is where the limitations stand as of today. This is why CLI-based tools like Claude Code are so good. And any attempt at computer use has fallen by the wayside.

Some potential solutions to this problem come to mind. For example, use subagents to isolate the interesting bits of a screenshot and feed only those, along with a summary, to the main agent. This would still use significantly more tokens than a text-based interface, but something like this could potentially keep the LLM out of the dumb zone a little longer.
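A rough sketch of that idea in Python, under loud assumptions: `describe` stands in for whatever subagent or vision call you would actually use, and `delegate_screenshot`, `main_context`, and `max_chars` are hypothetical names made up for illustration. The only point is that the raw screenshot never enters the main context; a short, capped summary does.

```python
from typing import Callable, List


def delegate_screenshot(
    screenshot: bytes,
    describe: Callable[[bytes], str],  # hypothetical subagent call, supplied by the caller
    main_context: List[str],
    max_chars: int = 500,              # arbitrary cap chosen for illustration
) -> str:
    """Hand the screenshot to a subagent and keep only a short summary."""
    summary = describe(screenshot).strip()
    # Cap the summary so one verbose subagent reply cannot flood the main context.
    if len(summary) > max_chars:
        summary = summary[:max_chars].rstrip() + " [truncated]"
    # Only the text summary is appended; the image bytes are dropped here.
    main_context.append(summary)
    return summary
```

The cap is a crude budget. A real setup would more likely count tokens than characters.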

fragmede 2 hours ago

> And any attempt at computer use has fallen by the wayside.

You're totally right! I mean, aside from Anthropic launching "Cowork: Claude Code for the rest of your work" 5 days ago. :)

https://claude.com/blog/cowork-research-preview

https://news.ycombinator.com/item?id=46593022

More to the point, though, you should be using Agents in Claude Code to limit context pollution. Agents run with their own context and return only the salient details. E.g., I have an Agent that runs "make" and returns the exit status and just the first error message, if there is one. This keeps the hundreds or thousands of lines of compilation output out of the main Claude Code context, letting me get more builds in before I run out of context there.
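For anyone who wants the same effect outside of an Agent, here is a minimal sketch of the filtering step as a plain Python helper. This is not how Claude Code Agents are configured. `summarize_build`, `BuildSummary`, and the error regex are assumptions for illustration, and the regex is a guess at what compiler and make errors typically look like.

```python
import re
import subprocess
from dataclasses import dataclass
from typing import Optional, Sequence

# Assumption: an error line contains the word "error" or make's "***" marker.
ERROR_PATTERN = re.compile(r"\berror\b|\*\*\*", re.IGNORECASE)


@dataclass
class BuildSummary:
    returncode: int
    first_error: Optional[str]


def first_error_line(output: str) -> Optional[str]:
    """Return the first line that looks like an error, or None."""
    for line in output.splitlines():
        if ERROR_PATTERN.search(line):
            return line.strip()
    return None


def summarize_build(cmd: Sequence[str] = ("make",)) -> BuildSummary:
    """Run the build and keep only the exit status and the first error line."""
    proc = subprocess.run(
        list(cmd),
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr so errors stay in output order
        text=True,
    )
    # Only look for an error when the build actually failed.
    error = first_error_line(proc.stdout) if proc.returncode != 0 else None
    return BuildSummary(proc.returncode, error)
```

Calling `summarize_build()` in a directory with a Makefile hands back two small fields instead of the full build log, which is all the main context needs to decide what to do next.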