That's fascinating.
Any idea what their test harness looks like? My experience comes primarily from Claude Code; this makes me wonder if recent CC updates could be more to blame than Opus 4.6 itself.