Uehreka | 6 hours ago
One issue I often run into with this stuff is the tightly coupled nature of things in the real world. I'll fashion an example: let's say you break a job down into 3 tasks: A, B and C. Doing one of those tasks is too much for an LLM to accomplish in one turn (this is something you learn intuitively through experience), but an LLM could break each task into 3 subtasks. So you do that, and start by having the LLM break task A into subtasks A1, A2 and A3, and B into B1, B2 and B3. But when you break down task C, the LLM (which needs to start with a fresh context each time, since each "breakdown" uses 60-70% of the context) doesn't know the details of task A, and thus writes a prompt for C1 that is incompatible with "the world where A1 has been completed".

This sort of "tunnel vision" is currently an issue with scaling 2025 agents. As useful context lengths get longer it'll get easier, but figuring out how to pack exactly the right info into a context is tough, especially when the tool you'd reach for to automate it (LLMs) is the same tool that suffers from these context limitations.

None of this means big things aren't possible, just that the fussiness of these systems increases with the size of the task, and that fussiness leads to more requirements for "human review" in the process.
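To make the failure mode (and one partial mitigation) concrete, here's a rough sketch of threading compact summaries of earlier breakdowns into each fresh context. Everything here is hypothetical scaffolding: call_llm and summarize stand in for whatever model/API and compression step you actually use.

    # Sketch of the "tunnel vision" problem and a partial mitigation: each
    # breakdown call starts from a fresh context, so we explicitly pack a
    # compact summary of every earlier breakdown into the prompt.
    # call_llm and summarize are hypothetical placeholders.

    from typing import Callable

    def decompose(task: str,
                  prior_summaries: list[str],
                  call_llm: Callable[[str], str]) -> str:
        """Break one task into subtasks, given short summaries of the
        breakdowns already produced for sibling tasks."""
        context = "\n".join(prior_summaries)  # keep this small: summaries, not transcripts
        prompt = (
            "Earlier breakdowns (assume their subtasks will be completed):\n"
            f"{context or '(none yet)'}\n\n"
            f"Now break this task into 3 subtasks: {task}\n"
            "Each subtask prompt must be consistent with the world in which "
            "the earlier subtasks have already been completed."
        )
        return call_llm(prompt)

    def plan(tasks: list[str],
             call_llm: Callable[[str], str],
             summarize: Callable[[str], str]) -> list[str]:
        """Sequentially decompose tasks, carrying summaries forward so the
        breakdown of C can see what A and B committed to."""
        summaries: list[str] = []
        breakdowns: list[str] = []
        for task in tasks:
            breakdown = decompose(task, summaries, call_llm)
            breakdowns.append(breakdown)
            # Compress aggressively: the raw breakdowns are what eat 60-70%
            # of the context; only a short summary is carried forward.
            summaries.append(summarize(breakdown))
        return breakdowns

This only delays the problem, of course: once the summaries themselves no longer fit, you're back to deciding what to pack into the context.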
adastra22 | an hour ago
I've been experimenting with this with a custom /plan slash command for Claude Code, available here: https://github.com/atomCAD/agents

Planning is definitely still something that requires a human in the loop, but I have been able to avoid the problem you are describing. It does require some trickery (not yet represented in the /plan command) when the overall plan exceeds a reasonable context window size (~20k tokens): you basically have to have the AI compare combinatorially many batches of the plan against each other, to discover and correct these dependency issues.
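Roughly what I mean by the batch comparison, as a sketch rather than the actual /plan implementation: split the oversized plan into batches and have the model review each pair of batches for conflicts. The batching heuristic and call_llm are assumptions for illustration.

    # Rough sketch of the batch-comparison trick, not the actual /plan
    # implementation. When the full plan won't fit in one context window,
    # split it into batches and have the model review every pair of
    # batches for dependency conflicts. call_llm is a hypothetical
    # placeholder for whatever model invocation you use.

    from itertools import combinations
    from typing import Callable

    def split_into_batches(plan_steps: list[str], max_tokens: int = 20_000,
                           tokens_per_step: int = 500) -> list[list[str]]:
        """Naive batching: assume a rough per-step token cost so that two
        batches together stay under the usable context budget."""
        steps_per_batch = max(1, max_tokens // (2 * tokens_per_step))
        return [plan_steps[i:i + steps_per_batch]
                for i in range(0, len(plan_steps), steps_per_batch)]

    def find_dependency_conflicts(plan_steps: list[str],
                                  call_llm: Callable[[str], str]) -> list[str]:
        """Compare every pair of batches so that steps planned far apart
        still get checked against each other."""
        batches = split_into_batches(plan_steps)
        reports: list[str] = []
        for a, b in combinations(range(len(batches)), 2):
            prompt = (
                "Here are two batches of steps from a larger plan.\n\n"
                f"Batch {a}:\n" + "\n".join(batches[a]) + "\n\n"
                f"Batch {b}:\n" + "\n".join(batches[b]) + "\n\n"
                "List any steps in one batch that conflict with, or depend "
                "on details missing from, the other batch. Reply 'NONE' if "
                "there are no conflicts."
            )
            report = call_llm(prompt)
            if report.strip() != "NONE":
                reports.append(f"batches {a}/{b}: {report}")
        return reports

The cost is quadratic in the number of batches, which is why it only becomes worth doing once the plan is well past what a single context can hold.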