rustystump 5 hours ago

I won't touch how profoundly I disagree with everything you said on reasoning (you clearly already have it figured out), but a fun test I have run with most of the big models is to give them some text input, maybe a short story, and have them rate it. That is, the prompt is: rate this from 1-10.

For Gemini and GPT, they almost always give very similar scores for everything. As long as the grammar isn't off, you cannot get below a 7.

xAI, on the other hand, will rarely give anything above a 7.

Now when you prompt with "rate 1-10 with 5 being average," all of a sudden the scores from OpenAI and Gemini drop, while xAI's remain roughly the same.

All of them will eventually give you a 10 if you keep making tiny edits "fixing" whatever they complain about.

Humans do not do this. Or more specifically, that has not been my experience with humans.