aliljet 4 days ago

How do you know that?

bigmadshoe 4 days ago | parent | next [-]

https://research.trychroma.com/context-rot

joenot443 3 days ago | parent | next [-]

This is a good piece. Clearly it's a pretty complex problem, and the intuitive result a layman engineer like myself might expect doesn't reflect the reality of LLMs. Regex works as reliably on 20 characters as it does on 2M characters; the only difference is speed. I've learned this will probably _never_ be the case with LLMs; there will forever exist some level of epistemic doubt in their results.

When they announced big context windows in 2023, they referenced being able to find a single changed sentence in the context's copy of The Great Gatsby[1]. That example seemed _incredible_ to me at the time, but now, two years later, I'm feeling like it was pretty cherry-picked. What does everyone else think? Could you feed a novel into an LLM and expect it to find the single change?

[1] https://news.ycombinator.com/item?id=35941920

bigmadshoe 3 days ago | parent | next [-]

This is called a "needle in a haystack" test, and all the 1M context models perform perfectly on this exact problem, at least when your prompt and the needle are sufficiently similar.
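
Concretely, a needle-in-a-haystack test looks roughly like this (a sketch; ask_model is just a stand-in for whatever chat API you'd call, not a real client):

    # Bury one distinctive sentence (the needle) in a long wall of filler
    # (the haystack) and ask the model to retrieve it.
    import random

    filler = "The quick brown fox jumps over the lazy dog. " * 20_000
    needle = "The secret passphrase for the vault is 'blue-tangerine-42'."

    sentences = filler.split(". ")
    sentences.insert(random.randrange(len(sentences)), needle)
    haystack = ". ".join(sentences)

    prompt = haystack + "\n\nWhat is the secret passphrase for the vault?"
    # print(ask_model(prompt))  # long-context models handle this retrieval reliably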

As the piece above references, this is a totally insufficient test for the real world. Things like "find two unrelated facts tied together by a question, then perform reasoning based on them" are much harder.

Scaling context properly is O(n^2): self-attention compares every token against every other token. I'm not really up to date on what people are doing to combat this, but I find it hard to believe the jump from a 100k to a 1M context window came with a 100x (10^2) slowdown, so they're probably taking some shortcut.
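
To show where the quadratic cost comes from, here's a minimal numpy sketch of naive attention (not any production implementation; learned projections omitted):

    import numpy as np

    def naive_attention(x, d_k=64):
        """x: (n, d_k) token vectors; returns (n, d_k) attended output."""
        q, k, v = x, x, x
        scores = q @ k.T / np.sqrt(d_k)   # (n, n) -- one entry per token pair
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)
        return weights @ v

    # Going from 100k to 1M tokens is 10x the tokens but ~100x the attention
    # work, hence the shortcuts (sliding windows, sparse/linear attention, etc.).
    for n in (1_000, 2_000):
        naive_attention(np.random.randn(n, 64))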

adastra22 3 days ago | parent | prev [-]

Depends on the change.

dang 3 days ago | parent | prev [-]

Discussed here:

Context Rot: How increasing input tokens impacts LLM performance - https://news.ycombinator.com/item?id=44564248 - July 2025 (59 comments)

rootnod3 3 days ago | parent | prev | next [-]

The longer the context gets and the longer the discussion goes on, the more confused it can get, especially if you have to keep refining the conversation or the code you're building on.

Remember, at its core it's basically a text prediction engine. So the more varied context there is, the more likely it is to make a mess of it.

Short context: the conversation leaves the context window and it loses context. Long context: it can mess with the model. So the trick is to strike a balance. But if it's an online model, you have fuck all to control. If it's a local model, you have some say in the parameters.
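
To make the short-context failure mode concrete, here's a rough sketch of what "the conversation leaves the context window" means (the token counting is just a crude word count for illustration):

    def fit_to_window(messages, max_tokens=8000):
        # Keep the most recent turns that fit; everything older is silently gone.
        kept, used = [], 0
        for msg in reversed(messages):
            cost = len(msg["content"].split())
            if used + cost > max_tokens:
                break
            kept.append(msg)
            used += cost
        return list(reversed(kept))

    history = [{"role": "user", "content": "word " * 3000},
               {"role": "assistant", "content": "word " * 3000},
               {"role": "user", "content": "word " * 3000}]
    print(len(fit_to_window(history)))  # the earliest turn no longer fits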

giancarlostoro 3 days ago | parent | prev | next [-]

Here's a paper from MIT that covers how this could be resolved in an interesting fashion:

https://hanlab.mit.edu/blog/streamingllm

The AI field is reusing existing CS concepts that we never had the hardware to apply to AI until now, and these people are learning how applied software engineering can make their theoretical models more efficient. It's kind of funny; I've seen this in tech over and over. People discover a new thing, then optimize it using a known thing.
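
For what it's worth, my rough reading of the attention-sink idea in that post, as a sketch (the SinkCache name is mine, not the paper's): keep the first few tokens plus a sliding window of recent tokens in the KV cache and evict the middle.

    from collections import deque

    class SinkCache:
        def __init__(self, n_sink=4, window=1020):
            self.n_sink = n_sink
            self.sinks = []                     # first tokens' KV entries, never evicted
            self.window = deque(maxlen=window)  # most recent tokens' KV entries

        def append(self, kv_entry):
            if len(self.sinks) < self.n_sink:
                self.sinks.append(kv_entry)
            else:
                self.window.append(kv_entry)    # middle tokens fall out automatically

        def entries(self):
            return self.sinks + list(self.window)

    cache = SinkCache()
    for t in range(10_000):                     # stream far past the window size
        cache.append(("kv_for_token", t))
    print(len(cache.entries()))                 # stays bounded at n_sink + window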

kridsdale3 3 days ago | parent | next [-]

The fact that this is happening is where the tremendous opportunity to make money as an experienced Software Engineer currently lies.

For instance, a year or two ago, the AI people discovered "cache". Imagine how many millions the people who implemented it earned for that one.

nxobject 5 hours ago | parent | next [-]

What we need are "idea dice" or "concept dice" for CS – each side could have a vague architectural nudge like "parallelize", "interpret", "precompute", "predict and unwind", "declarative"...

giancarlostoro 3 days ago | parent | prev [-]

I've been thinking the same, and it's things that you don't need some crazy ML degree to know how to do... A lot of the algorithms have been known... for a while now... Milk it while you can.

mamp 3 days ago | parent | prev [-]

Unfortunately, I think the context rot paper [1] found that the performance degradation with increasing context still occurred in models using attention sinks.

1. https://research.trychroma.com/context-rot

giancarlostoro 3 days ago | parent [-]

Saw that paper but haven't had a chance to read it yet. Are there other techniques that help, then? I assume there are a few different ones in use.

anonz4FWNqnX 4 days ago | parent | prev | next [-]

I've had similar experiences. I've gone back and forth between running models locally and using the commercial models. The local models can be incredibly useful (Gemma, Qwen), but they need more patience and work to get going.

One advantage to running locally[1] is that you can set the context length manually and see how well the LLM uses it. I don't have an exact experience to relay, but it's not unusual for a model to allow a longer context yet largely ignore it.

Just making the context big doesn't mean the LLM is going to use it well.

[1] I've been using LM Studio on both a MacBook Air and a MacBook Pro. Even a MacBook Air with 16GB can run pretty decent models.
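
If anyone wants to try that kind of experiment, here's roughly what I mean (a sketch using llama-cpp-python, which exposes the same context-length knob LM Studio does; the model path and transcript file are placeholders):

    from llama_cpp import Llama

    question = "\n\nWhat did we decide about the API rename?"
    prompt = open("long_transcript.txt").read() + question

    for n_ctx in (4096, 16384, 32768):
        llm = Llama(model_path="/models/some-local-model.gguf", n_ctx=n_ctx, verbose=False)
        out = llm(prompt[-n_ctx * 3:], max_tokens=128)  # crude char budget per token count
        print(n_ctx, out["choices"][0]["text"].strip())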

nomel 3 days ago | parent [-]

A good example of this was the first Gemini model that allowed 1 million tokens but would lose track of the conversation after a couple of paragraphs.

EForEndeavour 4 days ago | parent | prev | next [-]

https://onnyunhui.medium.com/evaluating-long-context-lengths...

F7F7F7 4 days ago | parent | prev | next [-]

What do you think happens when things start falling outside of its context window? It loses access to parts of your conversation.

And that’s why it will gladly rebuild the same feature over and over again.

fkyoureadthedoc 3 days ago | parent | prev | next [-]

https://github.com/adobe-research/NoLiMa
