dubcanada 7 hours ago

Grok is at 17%? And that's the lowest, while most models are at 80%+?

Meanwhile, the actual hallucination rate is probably closer to 100%, depending on the question. This benchmark makes no sense.

elAhmo 6 hours ago | parent

No one serious uses grok.

d0gsg0w00f 23 minutes ago | parent

Why not? Honest question.

ajdegol 6 hours ago | parent

@grok is this true?

RALaBarge 5 hours ago | parent

YMMV, but with the same prompt, Grok 4.1 Fast can usually find a few things via static analysis that other models don't seem to catch.