lich_king 4 hours ago

I'm always startled by how HN approaches these topics. When we have a press release from a university about how researchers can detect thoughts via fMRI, we have no issue with the claim. But if a vendor makes a pretty believable claim that there are repetitive statistical patterns in LLM output, it's suddenly treated the same as palm reading.

The problem isn't that AI detection doesn't work. The state of the art in this field is pretty solid. The only issue is that it's probabilistic, so it sometimes fails, and when it does, we have nothing else to fall back on in situations where you actually want to know whether someone put in the work.

So what are you proposing, exactly? That we run a large-scale experiment of "let's see what happens if children don't actually need to learn to do thinking and writing on their own"? The reality is that without some form of compulsion, most kids would rather play video games / scroll through TikTok all day. Or that we move to a vastly more resource-intensive model where every kid is given personalized instruction and watched 1:1?

Zigurd 4 hours ago | parent | next [-]

>> But if a vendor makes a pretty believable claim that there are repetitive statistical patterns in LLM output, it's all of sudden treated the same as palm reading.

That's what fortunetellers do. The problem isn't guessing correctly about AI content in writing. The problem is false positives. That's what puts it in the same category as predictive policing scam software. And fortunetelling.
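The false-positive concern is really a base-rate argument, which a toy Bayes calculation makes concrete. The rates below are hypothetical, not taken from any real detector: even a seemingly small false-positive rate produces many wrongful flags when most submissions are honestly human-written.

```python
def p_ai_given_flag(tpr: float, fpr: float, base_rate: float) -> float:
    """Posterior probability a flagged essay was AI-written (Bayes' rule)."""
    flagged = tpr * base_rate + fpr * (1 - base_rate)
    return tpr * base_rate / flagged

# Hypothetical numbers: 95% true-positive rate, 2% false-positive rate,
# and 10% of submitted essays actually AI-written.
posterior = p_ai_given_flag(tpr=0.95, fpr=0.02, base_rate=0.10)
print(round(posterior, 3))  # 0.841 -- roughly 1 in 6 flags is a false accusation
```

Under these assumed numbers, about 16% of flagged students did nothing wrong, which is the crux of the objection when a flag alone triggers punishment.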

lich_king 3 hours ago | parent [-]

It has nothing to do with predictive policing; I don't understand that example. Predictive policing is about forecasting intent, while here you're looking for evidence of a past misdeed.

False positive and false negative rates are non-zero, as with almost anything, but the tools are pretty good. I encourage you to give them a try. Pangram is a good state-of-the-art choice and you can try it for free. They also publish evals and other data about their approach.

PufPufPuf 3 hours ago | parent | prev | next [-]

Eliminating any statistically significant difference between a high-quality human-written text and LLM-written text is exactly what the LLMs are being trained for. At this point, "text is low quality, therefore must be human" is a much stronger signal.

lich_king 3 hours ago | parent [-]

> Eliminating any statistically significant difference between a high-quality human-written text and LLM-written text is exactly what the LLMs are being trained for.

I think you're basing this off a fundamental misunderstanding of what these detectors look for. LLMs generate human-like text, but they also generate roughly the same style and content every time for a given prompt, modulo some small amount of nondeterminism. In essence, they are a very predictable human. Ask Gemini or ChatGPT ten times in a row to write an essay about why AI is awesome, and it will probably strike about the same tone every single time, with similar syntax, idioms, etc.

This is what these tools detect: the default output of "hey ChatGPT, write me a school essay about X". This can be evaded with clever prompting to assume a different writing personality, but there's only so much evasion you can do without making the text weird in other ways.
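As a crude illustration of "repetitive statistical patterns" (this is a toy sketch, not how Pangram or any real detector actually works): repeated generations from the same prompt tend to share far more phrasing with each other than independently written text does, which a simple n-gram overlap measure can surface.

```python
from collections import Counter

def trigram_overlap(a: str, b: str) -> float:
    """Jaccard overlap of word trigrams: a crude proxy for stylistic similarity."""
    def trigrams(text: str) -> Counter:
        words = text.lower().split()
        return Counter(zip(words, words[1:], words[2:]))
    ta, tb = trigrams(a), trigrams(b)
    inter = sum((ta & tb).values())   # trigrams shared by both texts
    union = sum((ta | tb).values())
    return inter / union if union else 0.0

# Two hypothetical "generations" reusing the same stock phrasing score high;
# an independently worded sentence scores near zero.
gen1 = "AI is awesome because it boosts productivity and unlocks creativity"
gen2 = "AI is awesome because it boosts efficiency and unlocks creativity"
human = "Honestly I think these tools are overhyped but occasionally useful"
print(trigram_overlap(gen1, gen2) > trigram_overlap(gen1, human))  # True
```

Real detectors use far richer features than trigrams, but the underlying intuition is the same: the "default voice" of a model is statistically narrow.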

wongarsu 3 hours ago | parent | prev | next [-]

You can detect if texts from a year ago used AI based on statistical patterns. Nobody is taking issue with that. But once you tell people "we will run these tests to detect if your future submissions are using AI" you create an adversarial environment and your statistical methods will continuously break. Not because statistics is broken, but because you are trying to hit a moving target that doesn't want to be hit.

That's not like detecting thoughts via fMRI; it's like detecting tomorrow's malware with yesterday's malware signatures. Or like researchers making a vaccine against the common cold.

And the obvious proposal to fix that has been made multiple times in this thread: don't make take-home tasks part of the grade. Instead of trying to punish what you can't reliably detect, take away the incentive to do it in the first place.

lich_king 3 hours ago | parent | next [-]

> You can detect if texts from a year ago used AI based on statistical patterns.

I don't understand your argument. The vendors for these detection tools can acquire recent samples from all frontier models just as easily as you can use them to write essays. There's nothing that requires a one-year delay.

oytis 3 hours ago | parent | prev [-]

> you create an adversarial environment

Do AI vendors specifically train models to circumvent AI detectors? Why would they?

wongarsu 25 minutes ago | parent [-]

The adversary isn't the model vendors; the adversary is the students. Students will modify the prompt, ask models to rewrite text in an atypical style, or use specialized services that attempt to hide the typical AI patterns. And if you pick up their pattern today, they will just mix up the formula tomorrow.

armchairhacker 3 hours ago | parent | prev [-]

> When we have a press release from a university about how researchers can detect thoughts via fMRI, we have no issue with the claim.

Different people. I for one have always claimed that fMRI is too coarse-grained for detailed thought detection.

If AI detection "sometimes fails", it doesn't "work". It may work well enough to corroborate other evidence, but when there's no other evidence, and no attempt to gather any, it has no good use.

What I propose is simple: grade only closed-book exams, and hold students' phones during the exams. Students don't need 1:1 monitoring, it's the same as 10-20 years ago.