lame-robot-hoax 6 hours ago

Grok, in my experience, is extremely prone to hallucinations when not used for coding. To back its claims, it will readily claim to have access to internal Slack channels at companies, cite scientific papers that do not exist, and so on.

I don’t know if the hallucinations extend to code, but it makes me unwilling to consider using it.

observationist 5 hours ago | parent [-]

Fair - it's gotten significantly better over the last 4 months or so, and hallucinations aren't nearly as bad as they once were. When I was using Heavy, it was excellent at ensuring grounding and factual statements, but it's not worth $100 more than ChatGPT Pro in capabilities or utility. In general, it's about the same as ChatGPT Pro - once every so often I'll have to call out the model making something up, but for the most part they're good at using search tools and ensuring claims get grounding and confirmation.

I do expect them to pull ahead, given the resources and the allocation of developers at xAI, so maybe at some point it'll be clearly worth paying $300 a month compared to the prices of other flagships. For now, private hosts and ChatGPT Pro are the best bang for your buck.

F7F7F7 an hour ago | parent [-]

What are you doing with GPT Pro? I've compared it directly with Claude Max x20 and Google's premium offer. I just don't see myself ever leaving Claude Code as my daily driver. Codex is slow and opaque, albeit accurate. And Gemini is just super clumsy inside of its CLI (and in OpenRouter), often confusing Bash and plans with actual output.