Has anyone tested how good the 1M context window is?

I.e., given an actual document that is 1M tokens long, can you ask it a question that relies on attending to two different parts of the context and get a good response?

I remember folks had problems like this with Gemini. I would be curious to see how Sonnet 4.6 stands up to it.
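
For concreteness, here is a minimal sketch of the kind of test I mean, assuming you plant two made-up facts far apart in filler text and ask a question that needs both. The function names, the "vault code" facts, and the filler sentences are all hypothetical; a real run would use a real corpus and an actual model call, which is left out here.

```python
import random


def build_two_needle_prompt(n_filler=2000, depth_a=0.1, depth_b=0.9, seed=0):
    """Build a long document with two related facts placed far apart.

    Everything here (facts, filler, names) is made up for illustration.
    """
    rng = random.Random(seed)
    # Filler sentences stand in for a real corpus.
    lines = [f"Log entry {i}: nothing notable happened." for i in range(n_filler)]
    secret_a = rng.randint(100, 999)
    secret_b = rng.randint(100, 999)
    needle_a = f"The first half of the vault code is {secret_a}."
    needle_b = f"The second half of the vault code is {secret_b}."
    # Insert the later needle first so the earlier index is not shifted.
    lines.insert(int(n_filler * depth_b), needle_b)
    lines.insert(int(n_filler * depth_a), needle_a)
    document = "\n".join(lines)
    question = "What is the full vault code (first half followed by second half)?"
    expected = f"{secret_a}{secret_b}"
    return document, question, expected


def is_correct(model_answer, expected):
    # Lenient check: the expected digits appear anywhere in the answer.
    return expected in model_answer
```

You would send `document` plus `question` to the model and score the reply with `is_correct`, then sweep `n_filler` and the two depths to see where answers start to degrade.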

▲

simianwords 5 hours ago | parent [-]

Did you see the graph benchmark? I found it quite interesting. It had to do a graph traversal on a natural text representation of a graph. Pretty much your problem.
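
I don't know the exact format that benchmark uses, so treat this as an assumed sketch of the general idea: write a random directed graph as plain sentences, then compute the ground-truth answer with a breadth-first search so the model's reply can be checked. The sentence template and function names are hypothetical.

```python
import random
from collections import deque


def make_graph_text(n_nodes=50, n_edges=120, seed=0):
    """Return a random directed edge list and a plain-text rendering of it."""
    rng = random.Random(seed)
    edges = set()
    while len(edges) < n_edges:
        a, b = rng.sample(range(n_nodes), 2)  # two distinct nodes
        edges.add((a, b))
    edge_list = sorted(edges)
    rng.shuffle(edge_list)  # scatter related edges across the text
    text = "\n".join(f"Node {a} points to node {b}." for a, b in edge_list)
    return edge_list, text


def shortest_hops(edge_list, start, goal):
    """Breadth-first search; returns the hop count, or None if unreachable."""
    adjacency = {}
    for a, b in edge_list:
        adjacency.setdefault(a, []).append(b)
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            return dist[node]
        for nxt in adjacency.get(node, []):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return None
```

The model gets `text` and a question such as "How many hops from node 3 to node 17?", and `shortest_hops` gives the reference answer.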

▲

stopachka 3 hours ago | parent | next [-]

Update: I took a corpus of personal chat data (so it wouldn't have been seen in training) and tried asking it some paraphrased questions. It performed quite poorly.

▲

abraxas 2 hours ago | parent [-]

Which models did you try?

▲

stopachka 5 hours ago | parent | prev [-]

Oh, interesting!