OgsyedIE 3 days ago

That is a very odd error to make. I hope the author has merely misremembered the content of the story, but I carried out a short test and the results are not promising for full human authorship.

Prompting with "Which Ted Chiang story depicts a universe where thermodynamics works differently" led GPT 4.5, 4.1, o3, Claude 4 and DeepSeek R1 to hallucinate, with high logprobs, that Exhalation is the answer (instead of correctly stating that no story does this).
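For anyone unfamiliar with what "high logprobs" means here: a logprob is the natural log of the probability a model assigned to a token, so the probability of a multi-token answer is exp of the sum of its token logprobs. A minimal sketch follows; the function name and the numbers are hypothetical and illustrative, not values from the test above.

```python
import math

def sequence_probability(token_logprobs: list[float]) -> float:
    """Probability of a whole answer span, given per-token natural-log probabilities."""
    # log P(t1..tn) = sum of log P(ti | prefix), so exponentiate the sum.
    return math.exp(sum(token_logprobs))

# Hypothetical per-token logprobs for an answer such as "Exhalation";
# illustrative values only, not measurements from any model.
example = [-0.05, -0.02]
print(round(sequence_probability(example), 4))  # 0.9324
```

A probability above roughly 0.9 on the wrong title would count as the model being confidently wrong rather than guessing.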

Only GPT 5 and Claude 4.1 repeatedly gave correct answers (measured by repeated sampling in their case rather than by logprobs).
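Where logprobs aren't exposed, the repeated-sampling check amounts to asking the same question many times and counting how often each answer comes back. A rough sketch, assuming a hypothetical `sample_answer` callable that wraps whatever model API you use (here replaced by canned replies):

```python
from collections import Counter
from typing import Callable

def answer_rates(sample_answer: Callable[[], str], n: int = 20) -> dict[str, float]:
    """Draw n samples from a model and return the fraction of samples giving each answer."""
    counts = Counter(sample_answer().strip() for _ in range(n))
    return {answer: count / n for answer, count in counts.items()}

# Hypothetical stand-in for a real model call; it returns canned replies in order.
replies = iter(["No such story", "No such story", "Exhalation", "No such story"])
rates = answer_rates(lambda: next(replies), n=4)
print(rates)  # {'No such story': 0.75, 'Exhalation': 0.25}
```

This is only a frequency estimate, so it needs a reasonable number of samples before "repeatedly correct" means much.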

LinchZhang 3 days ago

This seems like a weird way to check whether something is AI-written. a) Presumably AIs are much more likely to make a certain kind of mistake if there are more such mistakes (or similar ones) in the training data. b) To figure out whether something was written by AI, you want to know whether an AI would independently generate it, not whether it can be heavily primed into making a specific mistake.

OgsyedIE 3 days ago

I'd read the story myself about a decade ago, and it stuck in my mind because I quite enjoyed the autosurgery scene, so all I was checking was whether this is a mistake AI commonly makes.

If you're wondering why I went to the unusual length of checking logprobs across different models, I have a pre-existing applet for that, which was built for checking some categories of press releases in my industry.

LinchZhang 3 days ago

Checking logprobs doesn't seem weird to me; it was the priming that was weird.

OgsyedIE 3 days ago

My reasoning was this: the error falsely attributes to Chiang a story based on different thermodynamics. Any chain of thought that could produce that error, such as one generating a list of Chiang stories predicated on different physics, would have proposed a story where thermodynamics works differently and then guessed that Exhalation fits its own criteria. (This assumes an autoregressive model, obviously, since no deductions of this kind can be made about the output of diffusion LLMs.)

On that basis, the priming simulates the same scenario. There is no feasible way to recreate the author's actual method of writing the article, with unknown essay-writing prompts and an unknown mix of AI and human generation across different elements of content and editing.