ants_everywhere 3 days ago

> I really desperately need LLMs to maintain extremely effective context

The context is in the repo. An LLM will never have the context you need to solve all problems. Large enough repos don't fit on a single machine.

There's a tradeoff just like in humans where getting a specific task done requires removing distractions. A context window that contains everything makes focus harder.

For a long time context windows were too small, and they probably still are. But they have to get better at understanding the repo by asking the right questions.

onion2k 3 days ago | parent | next [-]

> Large enough repos don't fit on a single machine.

I don't believe any human can understand a problem if they need to fit the entire problem domain in their head, let alone a domain whose scope doesn't fit on a single computer. You have to break it down into a manageable amount of information and tackle it in chunks.

If a person can do that, so can an LLM prompted to do that by a person.

ehnto 3 days ago | parent | next [-]

I disagree, I may not have the whole codebase in my head in one moment but I have had all of it in my head at some point, and it is still there, that is not true of an LLM. I use LLMs and am impressed by them, but they just do not approximate a human in this particular area.

My ability to break a problem down does not start from listing the files out and reading a few. I have a high-level understanding of the whole project at all times, and a deep understanding of the whole project stored away that I can recall when required; this is not true of an LLM at any point.

We know this is a limitation, and it's why we have various tools attempting to approximate memory and augment training on the fly, but they are approximations and they are, in my opinion, not even close to real human memory and depth of understanding for data the model was not trained on.

Even for mutations of scenarios it was trained on, and code is a great example of that: it is trained on billions of lines of code, yet still fails to understand my codebase intuitively. I have definitely not read billions of lines of code.

onion2k 3 days ago | parent | next [-]

> My ability to break a problem down does not start from listing the files out and reading a few.

If you're completely new to the problem then ... yes, it does.

You're assuming that you're working on a project that you've spent time on and learned the domain for, and then you're comparing that to an LLM being prompted to look at a codebase with the context of the files. Those things are not the same though.

A closer analogy to LLMs would be prompting it for questions when it has access (either through MCP or training) to the project's git history, documentation, notes, issue tracker, etc. When that sort of thing is commonplace, and LLMs have the context window size to take advantage of all that information, I suspect we'll be surprised how good they are even given the results we get today.

ehnto 3 days ago | parent [-]

> If you're completely new to the problem then ... yes, it does.

Of course, because I am not new to the problem, whereas an LLM is new to it every new prompt. I am not really trying to find a fair comparison, because I believe humans have an unfair advantage in this instance, and am trying to make that point rather than compare like-for-like abilities. I think we'll find that even with all the context clues from MCPs, history, etc. they might still fail to have the insight to recall the right data into the context, but that's just a feeling I have from working with Claude Code for a while: I instruct it to do those things, like look through the git log and check the documentation, and it sometimes finds a path through to an insight, but it's just as likely to get lost.

I alluded to it somewhere else, but my experience with massive context windows so far has just been that they distract the LLM. We are usually guiding it down a path with each new prompt and have a specific subset of information to give it, so pumping the context full of unrelated code at the start seems to derail it from that path. That's anecdotal, though I encourage you to try messing around with it.

As always, there's a good chance I will eat my hat some day.

scott_s 3 days ago | parent [-]

> Of course, because I am not new to the problem, whereas an LLM is new to it every new prompt.

That is true for the LLMs you have access to now. Now imagine if the LLM had been trained on your entire code base. And not just the code, but the entire commit history, commit messages and also all of your external design docs. And code and docs from all relevant projects. That LLM would not be new to the problem every prompt. Basically, imagine that you fine-tuned an LLM for your specific project. You will eventually have access to such an LLM.

snowfield 2 days ago | parent | next [-]

AI training doesn't work like that. You don't train it on context, you train it on recognition and patterns.

scott_s 2 days ago | parent [-]

You train on data. Context is also data. If you want a model to have certain data, you can bake it into the model during training, or provide it as context during inference. But if the "context" you want the model to have is big enough, you're going to want to train (or fine-tune) on it.

Consider that you're coding a Linux device driver. If you ask for help from an LLM that has never seen the Linux kernel code, has never seen a Linux device driver and has never seen any of the documentation from the Linux kernel, you're going to need to provide all of this as context. That's going to be onerous for you, and it might not even be feasible. But if the LLM has already seen all of that during training, you don't need to provide it as context. Your context may be as simple as saying "I am coding a Linux device driver" and showing it some of your code.
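As a rough sketch of what "baking it in" could look like with off-the-shelf tooling (the model name, file paths, and hyperparameters below are illustrative placeholders, not anyone's actual setup):

    # Hypothetical sketch: continued pre-training / fine-tuning a small causal LM
    # on a project's own files so the knowledge lives in the weights, not the context.
    from pathlib import Path

    from datasets import Dataset
    from transformers import (
        AutoModelForCausalLM,
        AutoTokenizer,
        DataCollatorForLanguageModeling,
        Trainer,
        TrainingArguments,
    )

    model_name = "Qwen/Qwen2.5-Coder-0.5B"  # placeholder; any small causal LM works
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    if tokenizer.pad_token is None:
        tokenizer.pad_token = tokenizer.eos_token
    model = AutoModelForCausalLM.from_pretrained(model_name)

    # Gather source files, docs, commit messages, etc. as raw text documents.
    texts = [p.read_text(errors="ignore") for p in Path("my_project").rglob("*.c")]
    dataset = Dataset.from_dict({"text": texts})

    def tokenize(batch):
        return tokenizer(batch["text"], truncation=True, max_length=1024)

    tokenized = dataset.map(tokenize, batched=True, remove_columns=["text"])

    trainer = Trainer(
        model=model,
        args=TrainingArguments(output_dir="project-tuned",
                               num_train_epochs=1,
                               per_device_train_batch_size=1),
        train_dataset=tokenized,
        data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
    )
    trainer.train()  # afterwards, prompts need far less project context

After something like that, "I am coding a Linux device driver" plus a snippet of your code really could be most of the context you need to provide.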

jimbokun 3 days ago | parent | prev [-]

Why haven’t the big AI companies been pursuing that approach, vs just ramping up context window size?

menaerus 2 days ago | parent | next [-]

Well, we don't really know if they aren't doing exactly that for their internal code repos, right?

Conceptually, there is no difference between fine-tuning an LLM to be a law expert for a specific country and fine-tuning an LLM to be an expert on a given codebase. The former is already happening and is public. The latter is not yet public, but I believe it is happening.

The reason big companies are pursuing generic LLMs is that they serve as a foundation for basically any other derivative and domain-specific work.

scott_s 2 days ago | parent | prev [-]

Because training one family of models with very large context windows can be offered to the entire world as an online service. That is a very different business model from training or fine-tuning individual models specifically for individual customers. Someone will figure out how to do that at scale, eventually. It might require the cost of training to reduce significantly. But large companies with the resources to do this for themselves will do it, and many are doing it.

ehnto 3 days ago | parent | prev | next [-]

Additionally, the more information you put into the context, the more confused the LLM will get. If you did dump the whole codebase into the context, it would not suddenly understand the whole thing. It is still an LLM; all you have done is pollute the context with a million lines of unrelated code and some lines of related code, which it will struggle to find in the noise (in my experience with much smaller experiments).

Bombthecat 3 days ago | parent [-]

I call this context decay. :)

The bigger the context, the more stuff "decays", sometimes into completely different meanings.

PaulDavisThe1st 3 days ago | parent | prev | next [-]

> I disagree, I may not have the whole codebase in my head in one moment but I have had all of it in my head at some point, and it is still there, that is not true of an LLM.

All 3 points (you have had all of it in your head at some point, it is still there, that is not true of an LLM) are mere conjectures, and not provable at this time, certainly not in the general case. You may be able to show this for some codebases, for some developers, and for some LLMs, but not all.

fnordsensei 3 days ago | parent | next [-]

The brain can literally not process any piece of information without being changed by the act of processing it. Neuronal pathways are constantly being reinforced or weakened.

Even remembering alters the memory being recalled, entirely unlike how computers work.

Lutger 3 days ago | parent | next [-]

I've always found it interesting that once I take a wrong turn finding my way through the city, and I'm not deliberate about remembering that it was, in fact, a mistake, I am more prone to taking the same wrong turn again the next time.

dberge 3 days ago | parent [-]

> once I take a wrong turn finding my way through the city... I am more prone to taking the same wrong turn again

You may want to stay home then to avoid getting lost.

johnisgood 3 days ago | parent | prev [-]

For humans, remembering strengthens that memory, even if it is dead wrong.

jbs789 3 days ago | parent | prev | next [-]

I'm not sure the idea that a developer maintains a high level understanding is all that controversial...

animuchan 3 days ago | parent [-]

The trend for this idea's controversiality is shown on this very small chart: /

ehnto 3 days ago | parent | prev [-]

I never intended to say it was true of all codebases for all developers; that would make no sense. I don't know all developers.

I think it's objectively true that the information is not in the LLM. It did not have all codebases to train on, and models are not (immediately) retrained on the codebases they encounter through usage.

xwolfi 3 days ago | parent | prev | next [-]

You've only worked on very small codebases then. When you work on giant ones, you Ctrl+F a lot, build a limited model of the problem space, and pray the unit tests will catch anything you might have missed...

akhosravian 3 days ago | parent | next [-]

And when you work on a really big codebase you start having multiple files and have to learn tools more advanced than ctrl-f!!

ghurtado 3 days ago | parent [-]

> and have to learn tools more advanced than ctrl-f!!

Such as ctrl-shift-f

But this is an advanced topic, I don't wanna get into it

ehnto 3 days ago | parent | prev | next [-]

We're measuring lengths of string here, but I would not say I have worked on small projects. I am very familiar with discovery, and have worked just fine on a lot of large legacy projects that have no tests.

jimbokun 3 days ago | parent | prev [-]

Why are LLMs so bad at doing the same thing?

airbreather 3 days ago | parent | prev | next [-]

You will have abstractions: black boxing, interface overviews, etc. Humans can only hold so much detail in their current working memory; some say 7 items on average.

ehnto 3 days ago | parent | next [-]

Of course, but even those black boxes are not empty; they've got a vague picture inside them based on prior experience. I have been doing this for a while, so most things are just various flavours of the same stuff, especially in enterprise software.

The important thing in this context is that I know it's all there, I don't have to grep the codebase to fill up my context, and my understanding of the holistic project does not change each time I am booted up.

jimbokun 3 days ago | parent | prev [-]

And LLMs can’t leverage these abstractions nearly as well as humans…so far.

ivape 3 days ago | parent | prev [-]

> My ability to break a problem down does not start from listing the files out and reading a few.

It does, it’s just happening at lightning speed.

CPLX 3 days ago | parent [-]

We don't actually know that.

If we had that level of understanding of how exactly our brains do what they do things would be quite different.

krainboltgreene 3 days ago | parent | prev | next [-]

I have an entire life worth of context and I still remember projects I worked on 15 years ago.

adastra22 3 days ago | parent [-]

Not with pixel perfect accuracy. You vaguely remember, although it may not feel like that because your brain fills in the details (hallucinates) as you recall. The comparisons are closer than you might think.

vidarh 3 days ago | parent | next [-]

The comparison would be apt if the LLM was trained on your codebase.

jimbokun 3 days ago | parent [-]

Isn’t that the problem?

I don’t see any progress on incrementally training LLMs on specific projects. I believe it’s called fine tuning, right?

Why isn’t that the default approach anywhere instead of the hack of bigger “context windows”?

gerhardi 2 days ago | parent | next [-]

I’m not well versed enough in this, but wouldn’t custom training have the problem that a specific project’s codebase likely implements the domain-relevant stuff only once and in one way, compared to how today’s popular large models have been trained on countless different ways to use common libraries for all sorts of tasks, with whatever GitHub-scraped material fed in?

adastra22 2 days ago | parent | prev [-]

Because fine-tuning can be used to remove restrictions from a model, so they don't give us plebs access to that.

krainboltgreene 3 days ago | parent | prev [-]

You have no idea if I remember with pixel perfect accuracy (whatever that even means). There are plenty of people with photographic memory.

Also, you're a programmer; you have no foundation of knowledge on which to make that assessment. You might as well opine on quarks or Martian cellular life. My god, the arrogance of people in my industry.

adastra22 2 days ago | parent | next [-]

Repeated studies have shown that perfect "photographic memory" does not in fact exist. Nobody has it. Some people think that they do though, but when tested under lab conditions those claims don't hold up.

I don't believe these people are lying. They are self-reporting their own experiences, which unfortunately have the annoying property of being generated by the very mind that is living the experience.

What does it mean to have an eidetic memory? It means that when you remember something you vividly remember details, and can examine those details to your heart's content. When you do so, it feels like all those details are correct. (Or so I'm told, I'm the opposite with aphantasia.)

But it turns out that if you actually have a photo reference and do a blind comparison test, people who report photographic memories don't do statistically any better than others at remembering specific fine details, even though they claim that they clearly remember them.

The simpler explanation is that while all of our brains provide hallucinated detail to fill the gaps in memories, their brains are wired up to make those made-up details feel much more real than they do to others. That is all.

HarHarVeryFunny 2 days ago | parent [-]

> Repeated studies have shown that perfect "photographic memory" does not in fact exist.

This may change your mind!

https://www.youtube.com/watch?v=jVqRT_kCOLI

adastra22 2 days ago | parent [-]

No, a YouTube video won’t convince me over repeated, verified lab experiments.

HarHarVeryFunny 2 days ago | parent [-]

So what do you make of the video - do you think it's fake, or are you just making the distinction between eidetic memory and photographic memory?

There are so many well-documented cases of idiot savants with insane memory skills in various areas (books, music, dates/events, etc.) that this type of snapshot visual memory (whatever you want to call it) doesn't seem surprising in that context - it'd really be a bit odd if such diverse memory skills excluded one sensory modality (and it seems they don't).

adastra22 2 days ago | parent [-]

I do not watch YouTube, sorry.

Hearsay is not reliable. Yes there are stories of savants. When you put them in a lab and actually see how good their memory is, it turns out to be roughly the same as everyone else's. The stories aren't true.

(They may be better at remembering weird facts or something, but when you actually calculate the information entropy of what they are remembering, it ends up being within the ballpark of what a neurotypical person remembers across a general span of life. That's why these people are idiot savants (to use your term). They allocate all their memory points to weird trivia and none to everyday common knowledge.)

HarHarVeryFunny 2 days ago | parent | next [-]

> They allocate all their memory points to weird trivia and none to everyday common knowledge.

I think it's more complex than that - it's the way they form memories (i.e. what they remember) that is different from a normal person's. In a normal person, surprise/novelty (prediction failure) is the major learning signal that causes us to remember something - we're selective in what gets remembered (this is just mechanically how a normally operating brain works), whereas the savant appears to remember everything in certain modalities.

I don't think that "using up all their memory" is why savants are "idiots", but rather just a reflection of something more severe that is wrong.

HarHarVeryFunny 2 days ago | parent | prev [-]

If you refuse to look at evidence, then your opinion isn't worth much, is it?

johnisgood 3 days ago | parent | prev [-]

> There are plenty of people with photographic memory.

I thought it was rare.

melagonster 3 days ago | parent | prev | next [-]

Sure, this is why AGI looks possible sometimes. But companies should not require their users to create AGI for them.

wraptile 3 days ago | parent | prev | next [-]

Right, the LLM doesn't need to know all of the code under utils.parse_id to know that this call will parse the ID. The best LLM results I get are when I manually define the relevant code graph of my problem, similar to how I'd imagine it in my head, which seems to provide optimal context. So bigger isn't really better.

rocqua 3 days ago | parent [-]

I wonder why we can't have one LLM generate this understanding for another? Perhaps this is where teaming of LLMs gets its value: managing high- and low-level context in different context windows.

mixedCase 3 days ago | parent [-]

This is a thing and doesn't require a separate model. You can set up custom prompts that, based on another prompt describing the task to achieve, generate information about the codebase and a set of TODOs to accomplish the task, producing markdown files with a summarized version of the relevant knowledge and prompting you again to refine that summary if needed. You can then use these files to let the agent take over without going on a wild goose chase.
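For illustration, such a planning prompt (the wording and the PLAN.md file name are made up here, not any particular tool's feature) might look something like:

    You are preparing context for a coding agent. Given the task below:
    1. List the files and functions most relevant to the task.
    2. Summarize how they interact, in at most 300 words.
    3. Write a TODO checklist for the change.
    Write the result to PLAN.md and stop; do not edit any code yet.

    Task: <task description goes here>

The agent is then pointed at PLAN.md in a fresh session, so only the distilled knowledge occupies its context.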

friendzis 3 days ago | parent | prev [-]

Fitting the entire problem domain in their head is what engineers do.

Engineering is merely a search for the optimal solution in this multidimensional space of problem domain(s), requirements, limitations, and optimization functions.

barnabee 3 days ago | parent [-]

_Good_ engineers fit their entire understanding of the problem domain in their head

The best engineers understand how big a difference that is

sdesol 3 days ago | parent | prev | next [-]

> But they have to get better at understanding the repo by asking the right questions.

How I am tackling this problem is by making it dead simple for users to create analyzers that are designed to enrich text data. You can read more about how it would be used in a search at https://github.com/gitsense/chat/blob/main/packages/chat/wid...

The basic idea is, users would construct analyzers with the help of LLMs to extract the proper metadata that can be semantically searched. So when the user does an AI Assisted search with my tool, I would load all the analyzers (description and schema) into the system prompt and the LLM can determine which analyzers can be used to answer the question.

A very simplistic analyzer would make it easy to identify backend and frontend code, so you could just use the command `!ask find all frontend files` and the LLM would construct a deterministic search that knows to match frontend files.

mrits 3 days ago | parent [-]

How is that better than just writing a line in the md?

sdesol 3 days ago | parent [-]

I am not sure I follow what you are saying. What would the line be and how would it become deterministically searchable?

mrits 3 days ago | parent [-]

frontend path: /src/frontend/*
backend path: /src/*

I suppose the problem you have might be unique to Next.js?

sdesol 3 days ago | parent [-]

The issue is frontend can be a loaded question, especially if you are dealing with legacy stuff, different frameworks, etc. You also can't tell what the frontend code does by looking at that single line.

Now imagine that, as part of your analyzer, you have the following instructions for the LLM:

---
For all files in `src/frontend/`, treat them as frontend code. For all files in `src/`, excluding `src/frontend`, treat them as backend. Create a metadata field called `scope`, which can be 'frontend', 'backend', or 'mix', where mix means the code can be used for both frontend and backend, like utilities.

Now, for each file, create a `keywords` metadata field that includes up to 10 unique keywords that describe the core functionality of the file.
---

So with this you can say

- `!ask find all frontend files`

- `!ask find all mix use files`

- `!ask find all frontend files that does [this]`

and so forth.

The whole point of analyzers is to make it easy for the LLM to map your natural language query to a deterministic search.

If the code base is straightforward and follows a well-known framework, asking for frontend or backend wouldn't even need an entry, as you could just include in the instructions that you use framework X and the LLM would know what to consider.
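Purely as a hypothetical sketch (this is not the actual gitsense schema or API), the metadata such an analyzer produces could back a deterministic search along these lines:

    # Hypothetical per-file metadata produced by an analyzer (illustrative only).
    files = [
        {"path": "src/frontend/App.jsx", "scope": "frontend",
         "keywords": ["routing", "layout", "auth"]},
        {"path": "src/api/users.py", "scope": "backend",
         "keywords": ["users", "database", "validation"]},
        {"path": "src/shared/format.py", "scope": "mix",
         "keywords": ["dates", "currency", "utilities"]},
    ]

    def ask(scope=None, keyword=None):
        """The kind of deterministic filter the LLM could emit for an `!ask` query."""
        return [f["path"] for f in files
                if (scope is None or f["scope"] == scope)
                and (keyword is None or keyword in f["keywords"])]

    print(ask(scope="frontend"))                  # !ask find all frontend files
    print(ask(scope="mix"))                       # !ask find all mix use files
    print(ask(scope="frontend", keyword="auth"))  # !ask find all frontend files that do auth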

stuartjohnson12 3 days ago | parent | prev | next [-]

> An LLM will never have the context you need to solve all problems.

How often do you need more than 10 million tokens to answer your query?

ants_everywhere 3 days ago | parent | next [-]

I exhaust the 1 million token context windows on multiple models multiple times per day.

I haven't used the Llama 4 10 million context window so I don't know how it performs in practice compared to the major non-open-source offerings that have smaller context windows.

But there is an induced demand effect where as the context window increases it opens up more possibilities, and those possibilities can get bottlenecked on requiring an even bigger context window size.

For example, consider the idea of storing all Hollywood films on your computer. In the 1980s this was impossible. If you store them in DVD or Bluray quality you could probably do it in a few terabytes. If you store them in full quality you may be talking about petabytes.

We recently struggled to get a full file into a context window. Now a lot of people feel a bit like "just take the whole repo, it's only a few MB".

brulard 3 days ago | parent [-]

I think you misunderstand how context in current LLMs works. To get the best results you have to be very careful to provide what is needed for immediate task progression, and postpone context that's needed later in the process. If you give all the context at once, you will likely get quite degraded output quality. It's like giving a junior developer his first task: you likely won't teach him every corner of your app, you would give him the context he needs. It is similar with these models. Those that provided 1M or 2M token contexts (Gemini etc.) were getting less and less useful after around 200k tokens in the context.

Maybe models will get better at picking relevant information out of a large context, but AFAIK that is not the case today.

remexre 3 days ago | parent | next [-]

That's a really anthropomorphizing description; a more mechanical one might be:

The attention mechanism that transformers use to find information in the context is, in its simplest form, O(n^2); for each token position, the model considers whether relevant information has been produced at the position of every other token.

To preserve performance when really long contexts are used, current-generation LLMs use various ways to consider fewer positions in the context; for example, they might only consider the 4096 "most likely" places to matter (de-emphasizing large numbers of "subtle hints" that something isn't correct), or they might have some way of combining multiple tokens' worth of information into a single value (losing some fine detail).
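A toy numpy sketch of the difference (illustrative only; real sparse-attention kernels avoid ever materializing the full n x n matrix):

    import numpy as np

    def full_attention(Q, K, V):
        # Every query attends to every key: an n x n score matrix, hence O(n^2).
        n, d = Q.shape
        scores = Q @ K.T / np.sqrt(d)
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)
        return weights @ V

    def topk_attention(Q, K, V, k=4096):
        # Keep only the k highest-scoring positions per query; everything else,
        # including large numbers of individually subtle hints, is simply dropped.
        n, d = Q.shape
        scores = Q @ K.T / np.sqrt(d)
        if k < n:
            cutoff = np.partition(scores, -k, axis=-1)[:, -k][:, None]
            scores = np.where(scores >= cutoff, scores, -np.inf)
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)
        return weights @ V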

ants_everywhere 3 days ago | parent | prev | next [-]

> I think you misunderstand how context in current LLMs works.

Thanks but I don't and I'm not sure why you're jumping to this conclusion.

EDIT: Oh, I think you're talking about the last bit of the comment! If you read the one before, I say that feeding it the entire repo isn't a great idea. But great idea or not, people want to do it, and it illustrates that as the context window increases, it creates demand for even larger context windows.

brulard 2 days ago | parent [-]

I said that based on you saying you exhaust million-token context windows easily. I'm no expert on that, but I think the current state of LLMs works best if you are not approaching that 1M token limit, because a large context (reportedly) deteriorates response quality quickly. I think state-of-the-art usage is managing context in tens or low hundreds of thousands of tokens at most, taking advantage of splitting tasks across subtasks in time, or splitting context across multiple "expert" agents (see sub-agents in Claude Code).

jimbokun 3 days ago | parent | prev [-]

It seems like LLMs need to become experts at managing their OWN context.

Selectively grepping and searching the code to pull into context only those parts relevant to the task at hand.
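A minimal sketch of that idea as a tool the model can call (the function name and output format are made up; it assumes ripgrep is installed):

    import subprocess

    def grep_context(pattern: str, repo: str = ".", max_lines: int = 80) -> str:
        """Return at most max_lines of matching lines, prefixed with file:line."""
        result = subprocess.run(
            ["rg", "--line-number", "--no-heading", pattern, repo],
            capture_output=True, text=True,
        )
        return "\n".join(result.stdout.splitlines()[:max_lines])

    # The model decides what to search for, then only the hits enter its context:
    # prompt += "\nRelevant code:\n" + grep_context("parse_id")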

brulard 2 days ago | parent [-]

That's what I'm thinking about a lot. Something like how the models "activate" just some subset of parameters when working (if I understand the new models correctly), so the model could activate the parts of the context which are relevant to the task at hand.

rocqua 3 days ago | parent | prev [-]

It doesn't take me 10000000 tokens to have the context "this was the general idea of the code, these were unimportant implementation details, and this is where lifetimes were tricky."

And that context is the valuable bit for quickly getting back up to speed on a codebase.

injidup 3 days ago | parent | prev | next [-]

All the more reason for good software engineering. Folders of files managing one concept. Files tightly focussed on sub-problems of that concept. Keep your code so that you can solve problems in self-contained context windows at the right level of abstraction.

Sharlin 3 days ago | parent [-]

I fear that LLM-optimal code structure is different from human-optimal code structure, and people are starting to optimize for the former rather than the latter.

mock-possum 3 days ago | parent | prev | next [-]

> The context is in the repo

Agreed but that’s a bit different from “the context is the repo”

It’s been my experience that usually just picking a couple files out to add to the context is enough - Claude seems capable of following imports and finding what it needs, in most cases.

I’m sure it depends on the task, and the structure of the codebase.

manmal 3 days ago | parent | prev | next [-]

> The context is in the repo

No, it’s in the problem at hand. I need to load all related files, documentation, and style guides into the context. This works really well for smaller modules, but currently falls apart after a certain size.

alvis 3 days ago | parent | prev [-]

Everything in the context hurts focus. It's like people who suffer from hyperthymesia: they easily get distracted when they recall something.