cbg0 3 hours ago

One of the things I'm always looking at with new models released is long context performance, and based on the system card it seems like they've cracked it:

  GraphWalks BFS 256K-1M

  Mythos    Opus     GPT5.4
  80.0%     38.7%    21.4%
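For context on what the benchmark measures: GraphWalks-style tasks embed a long edge list in the prompt and ask the model to return the nodes some number of BFS hops from a source node, so the model has to attend across the whole context rather than just its ends. A minimal sketch of such a task and its ground-truth answer (the graph generation and serialization here are my own illustration, not the benchmark's exact setup):

```python
import random

def make_edge_list(n_nodes=500, n_edges=2000, seed=0):
    """Random directed graph, serialized the way a long prompt might embed it."""
    rng = random.Random(seed)
    edges = {(rng.randrange(n_nodes), rng.randrange(n_nodes)) for _ in range(n_edges)}
    return sorted(edges)

def bfs_frontier(edges, source, depth):
    """Nodes exactly `depth` hops from `source` -- the expected answer."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
    seen, frontier = {source}, {source}
    for _ in range(depth):
        nxt = set()
        for u in frontier:
            for v in adj.get(u, []):
                if v not in seen:
                    seen.add(v)
                    nxt.add(v)
        frontier = nxt
    return frontier

edges = make_edge_list()
prompt_body = "\n".join(f"{u} -> {v}" for u, v in edges)  # this fills the 256K-1M tokens
answer = bfs_frontier(edges, source=0, depth=2)
```

The point is that the correct answer depends on edges scattered arbitrarily through the context, which is why scores drop so sharply at long lengths.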
metadat 3 hours ago | parent | prev | next [-]

Data source:

https://www-cdn.anthropic.com/53566bf5440a10affd749724787c89...

(Search for “graphwalk”.)

If true, the SWE bench performance looks like a major upgrade.

himata4113 3 hours ago | parent | prev | next [-]

This seems similar to GPT-Pro: they just have a very large attention window (which is why it's so expensive to run). The true attention window of most models is 8096 tokens.

appcustodian2 an hour ago | parent | next [-]

Source for the 8096-token number? I'm vaguely aware that some previous models attended more to the beginning and end of conversations, which doesn't seem to fit a simple contiguous "attention window" within the greater context, but I'd love to know more.

thegeomaster 2 hours ago | parent | prev [-]

What's the "attention window"? Are you alleging these frontier models use something like sliding window attention (SWA)? That seems highly unlikely.
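For anyone unfamiliar with the term: sliding window attention restricts each token to attending only to the previous W tokens instead of the full causal context. A minimal mask sketch (the sizes and window value are illustrative, not from any particular model):

```python
import numpy as np

def sliding_window_mask(seq_len, window):
    """True where query i may attend to key j: causal, and at most `window` tokens back."""
    i = np.arange(seq_len)[:, None]  # query positions, column vector
    j = np.arange(seq_len)[None, :]  # key positions, row vector
    return (j <= i) & (i - j < window)

mask = sliding_window_mask(seq_len=8, window=3)
# each row has at most 3 True entries: the token itself and the 2 before it
```

Under SWA, information from far earlier in the context can only reach the current token indirectly, relayed layer by layer, which is why a hard per-layer window sits uneasily with strong 1M-token retrieval results.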
