If SWE Bench is public then Anthropic is at a minimum probably also looking at their SWE bench scores when making changes, I'd trust more a tracker which runs a private benchmark not known to Anthropic.