stopachka 5 hours ago

Has anyone tested how good the 1M context window is?

I.e., given an actual document that is 1M tokens long, can you ask it a question that relies on attending to two different parts of the context and get a good response?

I remember folks had problems like this with Gemini. I would be curious to see how Sonnet 4.6 stands up to it.
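A minimal sketch of the kind of test described above. It uses a synthetic corpus rather than a real document. One fact ("Priya moved to X") is planted about 10% into the text, and five door codes, one per city, are planted near the end, so the question can only be answered by linking both parts of the context. The function name `build_two_hop_prompt`, the filler sentence, and the city list are all hypothetical. Scaling `n_filler` to roughly 1M tokens depends on the model's tokenizer, which isn't included here, and the model call itself is left out.

```python
import random

# Hypothetical setup: a filler sentence padded out to the target length,
# plus a handful of planted "needle" facts.
FILLER = "The quick brown fox jumps over the lazy dog."
CITIES = ["Lisbon", "Osaka", "Quito", "Tallinn", "Hobart"]


def build_two_hop_prompt(n_filler: int, seed: int = 0) -> tuple[str, str]:
    """Return (prompt, expected_answer) for a synthetic two-hop question."""
    if n_filler < 100:
        raise ValueError("n_filler too small to keep the needles apart")
    rng = random.Random(seed)
    home = rng.choice(CITIES)
    # Distinct four-digit codes so the expected answer is unambiguous.
    codes = dict(zip(CITIES, map(str, rng.sample(range(1000, 10000), len(CITIES)))))

    sentences = [FILLER] * n_filler
    # Hop 1, about 10% into the document: where Priya lives.
    sentences[n_filler // 10] = f"Priya moved to {home} last spring."
    # Hop 2, in the last few sentences: one code per city, so the model has
    # to connect both facts instead of copying the only code it sees.
    for i, city in enumerate(CITIES):
        sentences[n_filler - 1 - 2 * i] = (
            f"The door code for the flat in {city} is {codes[city]}."
        )

    question = "What is the door code for the flat in the city Priya moved to?"
    return " ".join(sentences) + "\n\n" + question, codes[home]
```

Usage: `prompt, answer = build_two_hop_prompt(100_000)`, then send `prompt` to whichever model you're testing and check whether `answer` appears in the reply. Repeating with different seeds and needle positions gives a rougher but more honest picture than a single pass/fail.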

simianwords 5 hours ago

Did you see the graph benchmark? I found it quite interesting. The model had to do a graph traversal over a graph written out in natural language. Pretty much your problem.
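For anyone who hasn't seen it, here is one plausible way to build that kind of test. This is a guess at the setup, not the benchmark's actual code. Edges of a random directed graph are written out as plain sentences ("node3 points to node7."), and the ground truth for a question like "which nodes are exactly two hops from node3?" comes from a breadth-first search. The names `make_graph_text` and `nodes_at_depth` and the sentence format are hypothetical.

```python
import random
from collections import deque


def make_graph_text(n_nodes: int, n_edges: int, seed: int = 0):
    """Render a random directed graph as plain-text sentences (hypothetical format)."""
    if n_edges > n_nodes * (n_nodes - 1):
        raise ValueError("more edges than a simple directed graph allows")
    rng = random.Random(seed)
    names = [f"node{i}" for i in range(n_nodes)]
    edges = set()
    while len(edges) < n_edges:
        a, b = rng.sample(names, 2)  # two distinct nodes, so no self-loops
        edges.add((a, b))
    edge_list = sorted(edges)  # fixed order before shuffling, for reproducibility
    rng.shuffle(edge_list)
    text = " ".join(f"{a} points to {b}." for a, b in edge_list)
    return text, edge_list


def nodes_at_depth(edges, start: str, depth: int) -> set:
    """Ground truth: nodes whose shortest path from `start` has exactly `depth` edges."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, []).append(b)
    dist = {start: 0}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        if dist[u] == depth:
            continue  # no need to expand past the target depth
        for v in adj.get(u, []):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return {v for v, d in dist.items() if d == depth}
```

Appending a question such as "Which nodes are exactly 2 hops from node3?" to `text` gives a prompt with a checkable answer, and adding more edges pushes the text toward the long-context range.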

stopachka 3 hours ago

Update: I took a corpus of personal chat data (so it couldn't have appeared in the training data) and tried asking it some paraphrased questions. It performed quite poorly.

abraxas 2 hours ago

Which models did you try?

stopachka 5 hours ago

Oh, interesting!