TheAceOfHearts | 16 hours ago
They scored 31.1% on ARC-AGI-2, which puts them in first place. Also notable is which models they include for comparison: Gemini 2.5 Pro, Claude Sonnet 4.5, and GPT-5.1. That seems like a minor snub of Grok 4 / Grok 4.1.
buildfocus | 15 hours ago
My impression is that Grok is very rarely used in practice outside of a niche of die-hard users, partly because it's tuned very differently from other models and partly because of the public reputation that goes with that. https://firstpagesage.com/reports/top-generative-ai-chatbots... suggests 0.6% of chat use cases, well below the other big names, and I suspect those chat stats are higher than for other scenarios like business usage. Given all that, I can see why Gemini might not be focused on competing with them.
kranke155 | 15 hours ago
Grok seems extremely prone to hallucination in my experience. It also constantly asserts certainty on fuzzy topics. | ||||||||||||||
jmmcd | 15 hours ago
About ARC 2: I would want to hear more detail about prompts, frameworks, thinking time, etc., but those don't matter too much. The main caveat is that this is probably on the public test set, so it could be in the pretraining data, and there could even be some ARC-focussed post-training; I don't think we know yet, and we might never know. But for any reasonable setup, assuming no egregious cheating, that is an amazing score on ARC 2.