HarHarVeryFunny | 3 days ago
Even a 1 MB context is only roughly 20K LOC, which is pretty limiting, especially if you're also trying to fit API documentation or other lengthy material into the context. Anthropic also recently suggested that longer/compressed context can serve as an alternative (not sure of their exact wording/characterization) to continual/incremental learning, so context space is also going to be competing with model interaction history if you want to avoid groundhog day, continually having to tell/correct the model the same things over and over.

It seems we're now firmly in the productization phase of LLM development, as opposed to seeing much fundamental improvement (other than math olympiad etc. "benchmark" results, released to give the impression of progress). Yannic Kilcher is right, "AGI is not coming", at least not in the form of an enhanced LLM. Demis Hassabis' very recent estimate was a 50% chance of AGI by 2030 (i.e. still 15 years out).

While we're waiting for AGI, a better approach than needing everything in context would be to lean more heavily on tool use, more similar to how a human works: we don't memorize the entire code base (at least not in complete line-by-line detail, even though we may hold a pretty clear overview of a 10K LOC codebase while we're in the middle of development), but rather rely on tools like grep and ctags to locate relevant parts of the source code on an as-needed basis.
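To sketch what that grep-based tool use could look like, here's a minimal, hypothetical example (the tool name and schema are illustrative, not any particular vendor's API):

    # Hypothetical sketch: grep-backed retrieval for an LLM coding agent.
    # Instead of loading the whole repo into context, the model requests
    # matches on demand and only those lines enter the context window.
    import subprocess

    def grep_tool(pattern: str, repo_dir: str, max_lines: int = 40) -> str:
        """Return up to max_lines of `grep -rn` matches for the model to read."""
        result = subprocess.run(
            ["grep", "-rn", "--include=*.py", pattern, repo_dir],
            capture_output=True, text=True,
        )
        lines = result.stdout.splitlines()[:max_lines]
        return "\n".join(lines) if lines else "(no matches)"

    # The agent loop feeds back only these few matching lines, not the
    # full 20K+ LOC codebase.
    print(grep_tool("def parse_config", "./src"))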
km144 | 3 days ago
As you alluded to at the end of your post: I'm not really convinced 20K LOC is very limiting. How many lines of code can you fit in your working mental model of a program? Certainly fewer than 20K concrete lines of text at any given time.

In your working mental model, you have a broad understanding of the domain and of the architecture, and you summarize whole sections of the program into simpler ideas: module_a does x, module_b does y, insane file c does z, and so on. Then there is the part of the software you're actively working on, where you need more concrete context. So as you move towards the central task, the context becomes more specific, but the vague outer context is still crucial to the task at hand.

Now, you can certainly find ways to summarize this mental model in an input to an LLM, especially with increasing context windows. But we probably need to understand how to better present these sorts of things to achieve performance similar to a human brain, because the mechanism is very different.
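One hypothetical way to present that layered mental model to an LLM, as a sketch (all module names and summaries below are made up):

    # Assemble "layered" context: one-line summaries for distant modules,
    # full source only for the file being actively worked on.
    MODULE_SUMMARIES = {
        "module_a.py": "module_a: parses and validates user input",
        "module_b.py": "module_b: persistence layer over SQLite",
        "insane_file_c.py": "insane file c: 3K-line legacy event dispatcher",
    }

    def build_context(focus_file: str, focus_source: str) -> str:
        parts = ["Project overview:"]
        for name, summary in MODULE_SUMMARIES.items():
            if name != focus_file:
                parts.append(f"- {summary}")  # vague outer context
        parts.append(f"\nFull source of {focus_file} (the central task):")
        parts.append(focus_source)            # concrete inner context
        return "\n".join(parts)

    prompt = build_context("module_a.py", "def parse(raw): ...")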
HarHarVeryFunny | 3 days ago
Just as a self follow-up, another motivation to lean on tool use rather than massive context (cf. short-term memory) is to keep LLM/AI written or modified code understandable to humans ... At least part of the reason that humans use hierarchical decomposition and divide-and-conquer is presumably our own limited short-term memory, since hierarchical organization (modules, classes, methods, etc.) lets us work on a problem at different levels of abstraction while only needing to hold one level of the hierarchy in memory at a time. Imagine what code might look like if written by something with no context limit: perhaps just a flat collection of functions, at least until it eventually learned, or was told, the other reasons for hierarchical and modular design/decomposition, such as easing debugging and future enhancement!
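A toy contrast of what I mean, in hypothetical code (the same logic written flat versus decomposed):

    # Flat: every detail lives at one level; the reader juggles it all at once.
    def process_order_flat(order):
        total = sum(i["price"] * i["qty"] for i in order["items"])
        if order["coupon"] == "SAVE10":
            total *= 0.9
        print(f"charging {total:.2f} to {order['card']}")

    # Decomposed: each function is one small "level" of the hierarchy,
    # so you can reason about process_order() holding only one level in mind.
    def subtotal(items):
        return sum(i["price"] * i["qty"] for i in items)

    def apply_discount(total, coupon):
        return total * 0.9 if coupon == "SAVE10" else total

    def process_order(order):
        total = apply_discount(subtotal(order["items"]), order["coupon"])
        print(f"charging {total:.2f} to {order['card']}")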
aorobin | 3 days ago
>"Demis Hassabis' very recent estimate was for 50% chance of AGI by 2030 (i.e. still 15 years out)." 2030 is only 5 years out | |||||||||||||||||
brookst | 3 days ago
1M tokens ~= 3.5M characters ~= 58K LOC at an average of 60 chars/line, or 88K LOC at 40 chars/line.
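The back-of-envelope math, assuming ~3.5 characters per token (a common rule of thumb; the real ratio varies by tokenizer and by language):

    tokens = 1_000_000
    chars = tokens * 3.5                      # ~3.5M characters
    for chars_per_line in (60, 40):
        loc = chars / chars_per_line
        print(f"{chars_per_line} chars/line -> ~{loc / 1000:.0f}K LOC")
    # 60 chars/line -> ~58K LOC
    # 40 chars/line -> ~88K LOC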