Yet despite this all the LLMS I've tried struggle to scale beyond much more than a single module. They're vast improvements on that test perhaps, but in real life they still struggle to be coherent over larger projects and scales.

▲

bckr 5 months ago | parent | next [-]

> struggle to scale beyond much more than a single module

Yes. You must guide coding agents at the level of modules and above. In fact, you have to know good coding patterns and make these patterns explicit.

Claude 4 won’t use uv, pytest, pydantic, mypy, classes, small methods, and small files unless you tell it to.

Once you tell it to, it will do a fantastic job generating well-structured, type-checked Python.

▲

viraptor 5 months ago | parent | prev [-]

Those are different kind of issues. Improving the quality of actions is what we're seeing here. Then for the larger projects/contexts the leaders will have to battle it out between the improved agents, or actually moving to something like RWKV and processing the whole project in one go.

	▲	morsecodist 5 months ago \| parent [-]
		They may be different kinds of issues but they are the issues that actually matter.