tatrions 6 hours ago

The principled approaches are statistical. DetectGPT perturbs the text and checks whether the original sits near a local maximum of the model's log probability; related methods look at per-token log-prob statistics directly. LLM text clusters tightly around the model's typical set, while human writing shows more variance (burstiness). These work decently when you know (or can approximate) the generating model and have enough text; they break down fast otherwise.
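The burstiness half of that can be sketched in a few lines. This is a toy illustration with made-up log-prob numbers, not DetectGPT itself (which also scores perturbed rewrites of the text); the idea is just that per-token log probs from a scoring model have low variance for model-generated text:

```python
import statistics

def burstiness(token_logprobs):
    # Variance of per-token log probabilities. Text sampled from the
    # model hugs its typical set, so variance is low; human text mixes
    # likely and unlikely tokens, so variance is higher.
    return statistics.pvariance(token_logprobs)

# Hypothetical per-token log probs you'd get back from a scoring model:
model_like = [-2.1, -1.9, -2.0, -2.2, -1.8]   # tightly clustered
human_like = [-0.5, -4.0, -1.2, -6.3, -2.1]   # bursty

assert burstiness(human_like) > burstiness(model_like)
```

A real detector would get the log probs from the suspected generating model, which is exactly the "you have to know the model" weakness.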

Stylistic tells like 'delve' and bullet-heavy formatting are just RLHF training artifacts, and they're already shifting between model versions: compare GPT-4 to 4o output and the word frequency distributions changed noticeably.
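For what it's worth, the tell-counting detectors are trivially simple, which is why they're so brittle. Something like this (marker list is my own made-up example, and it's exactly the part that goes stale between model versions):

```python
import re

# Hypothetical marker words; frequencies drift with each RLHF round,
# so any fixed list like this decays quickly.
TELLS = {"delve", "tapestry", "showcase", "underscore"}

def tell_rate(text):
    # Marker hits per 1000 words.
    words = re.findall(r"[a-z']+", text.lower())
    hits = sum(1 for w in words if w in TELLS)
    return 1000 * hits / max(len(words), 1)
```

Calling `tell_rate("Let's delve into the tapestry of ideas")` scores high, but a one-line system prompt ("avoid the word delve") zeroes it out.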

Long term, the only approach with real theoretical legs is watermarking at generation time, but it needs provider buy-in and slightly degrades output quality, so adoption has been basically nonexistent.
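To make the watermarking idea concrete, here's a toy version of the published green-list scheme (Kirchenbauer et al. style). Vocab size, gamma, and the seeding are illustrative assumptions, not any provider's real parameters, and a real model applies a soft logit bias rather than the hard constraint used here:

```python
import random

VOCAB = list(range(1000))  # toy integer vocabulary
GAMMA = 0.5                # fraction of the vocab marked "green" per step

def green_list(prev_token):
    # Seed a PRNG on the previous token so a detector can recompute
    # the exact same vocab partition later.
    rng = random.Random(prev_token)
    return set(rng.sample(VOCAB, int(GAMMA * len(VOCAB))))

def generate_watermarked(n, seed=0):
    # Stand-in for the model: samples uniformly, but only from the
    # green list. (Real schemes add a soft bias toward green tokens,
    # which is where the small quality hit comes from.)
    rng = random.Random(seed)
    out = [rng.choice(VOCAB)]
    for _ in range(n - 1):
        out.append(rng.choice(sorted(green_list(out[-1]))))
    return out

def green_fraction(tokens):
    # Detection: recompute each green list and count hits.
    # ~0.5 for unmarked text, near 1.0 for watermarked text.
    hits = sum(t in green_list(p) for p, t in zip(tokens, tokens[1:]))
    return hits / (len(tokens) - 1)

rng = random.Random(42)
unmarked = [rng.choice(VOCAB) for _ in range(200)]
watermarked = generate_watermarked(200)
```

Detection needs no access to the model, only the seeding scheme, which is exactly why it's the theoretically clean option. But it only works on text from providers who opted in, and paraphrasing through a second, unwatermarked model washes it out.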