joenot443 | 3 days ago
This is a good piece. Clearly it's a pretty complex problem, and the intuitive result a layman engineer like myself might expect doesn't reflect the reality of LLMs. Regex works as reliably on 20 characters as it does on 2M characters; the only difference is speed. I've learned this will probably _never_ be the case with LLMs; there will forever exist some level of epistemic doubt in the result. When they announced big contexts in 2023, they referenced being able to find a single changed sentence in the context's copy of The Great Gatsby[1]. This example seemed _incredible_ to me at the time, but now, two years later, I'm feeling like it was pretty cherry-picked. What does everyone else think? Could you feed a novel into an LLM and expect it to find the single change?
bigmadshoe | 3 days ago
This is called a "needle in a haystack" test, and all the 1M-context models perform perfectly on this exact problem, at least when your prompt and the needle are sufficiently similar. As the piece above references, this is a totally insufficient test for the real world. Things like "find two unrelated facts tied together by a question, then perform reasoning based on them" are much harder. Scaling context properly (with full attention) is O(n^2) in the number of tokens. I'm not really up to date on what people are doing to combat this, but I find it hard to believe the jump from 100k -> 1M context involved a 100x (10^2) slowdown, so they're probably taking some shortcut.
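As a rough back-of-the-envelope sketch of that quadratic claim (the 8192 hidden size below is an arbitrary placeholder, not any particular model's), the dense-attention term alone gets about 100x more expensive going from 100k to 1M tokens, which is why providers presumably rely on sparse or approximate attention rather than paying the full quadratic cost:

```python
def attention_flops(n_tokens: int, d_model: int = 8192) -> float:
    """Approximate FLOPs for one dense self-attention layer:
    ~n^2 * d for the QK^T scores plus ~n^2 * d for weighting the values."""
    return 2 * (n_tokens ** 2) * d_model

short = attention_flops(100_000)
long = attention_flops(1_000_000)
print(f"100k context: {short:.2e} FLOPs per layer")
print(f"1M context:   {long:.2e} FLOPs per layer")
print(f"ratio: {long / short:.0f}x")  # -> 100x: a 10x longer context costs 10^2 more
```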
adastra22 | 3 days ago
Depends on the change.