elcritch 21 hours ago
Yet despite this, all the LLMs I've tried struggle to scale much beyond a single module. They may be vast improvements on that test, but in real life they still struggle to stay coherent across larger projects.
bckr 9 hours ago
> struggle to scale beyond much more than a single module

Yes. You must guide coding agents at the level of modules and above. In fact, you have to know good coding patterns and make these patterns explicit. Claude 4 won’t use uv, pytest, pydantic, mypy, classes, small methods, and small files unless you tell it to. Once you tell it to, it will do a fantastic job generating well-structured, type-checked Python.
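As a rough illustration of the style described above, here is a minimal sketch: a small typed class, short methods, and a pytest-style test. All names (`LineItem`, `order_total_cents`) are hypothetical, and a stdlib dataclass stands in for a pydantic model so the sketch runs without third-party packages. It is not output from any particular model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LineItem:
    """One order line (hypothetical); a stdlib stand-in for a pydantic model."""

    name: str
    unit_price_cents: int
    quantity: int

    def __post_init__(self) -> None:
        # Validate at construction time, roughly what a pydantic validator would do.
        if self.unit_price_cents < 0 or self.quantity < 1:
            raise ValueError(f"invalid line item: {self!r}")

    def total_cents(self) -> int:
        """Price of this line in cents."""
        return self.unit_price_cents * self.quantity


def order_total_cents(items: list[LineItem]) -> int:
    """Sum the totals of all line items."""
    return sum(item.total_cents() for item in items)


def test_order_total_cents() -> None:
    # pytest collects plain functions named test_*; no pytest import is needed here.
    items = [LineItem("pen", 150, 2), LineItem("pad", 400, 1)]
    assert order_total_cents(items) == 700
```

Every function carries full annotations, so a type checker such as mypy can check the file, and the test can be run with pytest.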
viraptor 21 hours ago
Those are different kinds of issues. Improving the quality of actions is what we're seeing here. For larger projects and contexts, the leaders will have to battle it out between improved agents and actually moving to something like RWKV and processing the whole project in one go.