IceHegel 6 days ago

There's a chance this memory problem is not going to be that easy to solve. It's true context lengths have gotten much longer, but all context is not created equal.

There's a significant loss of model sharpness as context goes past 100K, sometimes earlier, sometimes later. Even when you use today's context windows to their maximum extent, the models aren't especially nuanced over long context. I compact after 100K tokens.
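
Rough sketch of what compaction at a token threshold could look like. The token estimate and summarize_fn are placeholders for whatever counting and LLM call you actually use, not any specific tool's API:

    def estimate_tokens(text):
        # Crude approximation: roughly 4 characters per token.
        return len(text) // 4

    def compact(messages, summarize_fn, limit=100_000, keep_recent=20):
        # messages: list of {"role": ..., "content": ...} dicts.
        # summarize_fn: whatever LLM call you use to produce a summary.
        total = sum(estimate_tokens(m["content"]) for m in messages)
        if total <= limit:
            return messages
        old, recent = messages[:-keep_recent], messages[-keep_recent:]
        summary = summarize_fn("\n".join(m["content"] for m in old))
        return [{"role": "system",
                 "content": "Summary of earlier conversation: " + summary}] + recent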

Ozzie_osman 6 days ago | parent | next [-]

But you don't have to hold the entire memory in context. You just need good techniques for pulling in the parts of the context you actually need. This can be done via RAG, multi-agent architectures, etc. It's not perfect, but it will get better over time.
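
A toy version of the "pull in only what you need" idea. Plain word overlap stands in for embedding similarity just to keep it self-contained; a real setup would use an embedding model and a vector store:

    def score(query, chunk):
        # Stand-in for embedding similarity: fraction of query words in the chunk.
        q, c = set(query.lower().split()), set(chunk.lower().split())
        return len(q & c) / (len(q) or 1)

    def build_prompt(query, memory_chunks, k=3):
        # Only the k most relevant chunks make it into the context window.
        top = sorted(memory_chunks, key=lambda ch: score(query, ch), reverse=True)[:k]
        return "Relevant notes:\n" + "\n---\n".join(top) + "\n\nQuestion: " + query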

elorant 6 days ago | parent | prev | next [-]

In my experience, the context window size by itself only tells half the story. Load a big document that's 200k tokens and ask it a question, and it will answer just fine. Start a conversation that soon balloons past 100k and it starts losing coherence pretty quickly. So I'd guess batch size plays a more significant role.

IceHegel 4 days ago | parent [-]

By batch size, do you mean the number of tokens in the context window that were generated by the model vs. external tokens?

Because my understanding is that, however you get to 100K, the 100,001st token is generated the same way as far as the model is concerned.

luckydata 6 days ago | parent | prev | next [-]

I'm oversimplifying here, but graph databases and knowledge graphs exist. An LLM doesn't need to preserve everything in context, just what it needs for that conversation.
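
A toy sketch of the idea (illustrative, not any particular graph database): store facts as triples and only load the ones touching entities mentioned in the current turn.

    from collections import defaultdict

    class GraphMemory:
        def __init__(self):
            # subject -> list of (relation, object) edges
            self.edges = defaultdict(list)

        def add(self, subject, relation, obj):
            self.edges[subject].append((relation, obj))

        def facts_about(self, entities):
            # Only the facts touching the entities in the current message.
            return [f"{e} {rel} {obj}"
                    for e in entities
                    for rel, obj in self.edges.get(e, [])]

    memory = GraphMemory()
    memory.add("project_x", "uses", "postgres")
    memory.add("project_x", "owned_by", "alice")
    print("\n".join(memory.facts_about(["project_x"])))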

IceHegel 4 days ago | parent [-]

Unless there's a trick I'm missing, I don't think this will work by itself. The fundamental question is what the model can attend to as it generates the next token.

If you give a summary+graph to the model, it can still only attend to the summary for token 1. If it's going to call a tool for a deeper memory, it still only gets the summary when it makes the decision on what to call.

You get the same problem when asking the model to make changes in even medium-sized code bases. It starts from scratch each time, takes forever to read a bunch of files, and sometimes it reads the right stuff, other times it doesn't.
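
To make the objection concrete, here's roughly the summary-plus-tool-call loop being described. llm() and memory_store are placeholders for whatever chat API and store are in play, so treat this as a sketch rather than a real client:

    def answer_with_memory(question, summary, memory_store, llm):
        # Step 1: the model decides what to retrieve while seeing ONLY the summary.
        lookup_key = llm(
            "Summary of what you know:\n" + summary + "\n\n"
            "Question: " + question + "\n"
            "Reply with the single memory entry you want to retrieve."
        ).strip()

        detail = memory_store.get(lookup_key, "(nothing found)")

        # Step 2: only this second call actually attends to the retrieved detail.
        return llm(
            "Summary:\n" + summary + "\n\n"
            "Retrieved memory '" + lookup_key + "':\n" + detail + "\n\n"
            "Question: " + question
        )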

spiderfarmer 6 days ago | parent | prev [-]

Context will need to go in layers. Like when you tell someone what you do for a living: your first version is very broad, but when they ask the right questions you can dive into details pretty quickly.
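
A sketch of what those layers could look like in code, assuming a made-up tree of summaries where deeper detail is only expanded when a follow-up question asks for it:

    class MemoryNode:
        def __init__(self, summary, children=None, detail=None):
            self.summary = summary          # the broad, one-line version
            self.children = children or []  # finer-grained sub-topics
            self.detail = detail            # full text, loaded only on demand

    def initial_context(root):
        # First layer: just the broad summaries.
        return [root.summary] + [c.summary for c in root.children]

    def expand(node):
        # Deeper layer, pulled in only when a follow-up question needs it.
        return node.detail or node.summary

    career = MemoryNode(
        "I work on backend infrastructure",
        children=[MemoryNode("Mostly database performance",
                             detail="Query planning, index tuning, and a long "
                                    "list of postgres war stories.")],
    )
    print(initial_context(career))
    print(expand(career.children[0]))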