It does quite well on my limited/not-so-scientific private tests (note the tests don't include coding tests): https://aibenchy.com/compare/google-gemma-4-31b-it-medium/go...