| ▲ | enjeyw 5 days ago |
| I'm building Collie (https://collie.ink/). It's a tool to help teachers detect student assignments that have been written by AI. Unlike other solutions out there, it's an entire web-based text editor that analyses not just the final assignment, but all the keystrokes used during the writing process. My theory is that analysing the final text only is a futile struggle - billions are being pumped into making LLM text look more human, so trying to make an assessment off the final text alone is guesswork at best. I'm curious what folks think! Especially teachers, devs, and anyone navigating this space... |
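For anyone curious what keystroke-level capture in a browser editor might look like, here is a minimal sketch. The event shape, element id, and endpoint path are assumptions for illustration only, not Collie's actual implementation.

```typescript
// Minimal sketch of keystroke-level capture in a web editor.
// Event shape, element id, and endpoint are illustrative assumptions only.
interface EditEvent {
  t: number;                     // ms since the writing session started
  kind: "insert" | "delete" | "paste";
  text?: string;                 // inserted or pasted content, if any
  pos: number;                   // caret position when the edit happened
}

const events: EditEvent[] = [];
const start = performance.now();
const editor = document.querySelector("#essay") as HTMLTextAreaElement;

editor.addEventListener("beforeinput", (e) => {
  const ev = e as InputEvent;
  events.push({
    t: performance.now() - start,
    kind: ev.inputType === "insertFromPaste" ? "paste"
        : ev.inputType.startsWith("delete") ? "delete"
        : "insert",
    text: ev.data ?? undefined,
    pos: editor.selectionStart,
  });
});

// Periodically ship the edit log alongside the draft, so the final text
// can be analysed together with the process that produced it.
setInterval(() => {
  if (events.length) {
    navigator.sendBeacon("/api/keystrokes", JSON.stringify(events.splice(0)));
  }
}, 5000);
```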
|
| ▲ | nine_k 5 days ago | parent [-] |
| I can't help but immediately think about a counteracting piece of software, which asks an LLM for variations of a paragraph, or a phrase, or a few synonyms, and types it the way a human would, with pauses, typos, navigation, rearranging pieces via copy-paste, etc. Not that your software is going to be useless. But as long as there is an incentive to cheat, new and better tools that facilitate cheating will crop up. Something else should change. |
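Purely to illustrate how low the bar for such a mimic would be, a toy version might look like the sketch below. The timing distribution and typo rate are invented for illustration, not derived from real human keystroke data.

```typescript
// Illustrative sketch of a "human-like" typing mimic: replays LLM output
// keystroke by keystroke with randomised delays and occasional corrected typos.
// All parameters here are made up for demonstration purposes.
const NEIGHBOURS: Record<string, string> = { a: "s", e: "r", t: "y", o: "p", n: "m" };

// Rough inter-key interval, skewed towards shorter pauses (~80-400 ms).
const delayMs = (): number => 80 + Math.random() * Math.random() * 320;

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

async function typeLikeAHuman(text: string, emit: (key: string) => void) {
  for (const ch of text) {
    // Occasionally hit an adjacent key, then backspace and correct it.
    if (Math.random() < 0.03 && NEIGHBOURS[ch]) {
      emit(NEIGHBOURS[ch]);
      await sleep(delayMs());
      emit("Backspace");
      await sleep(delayMs());
    }
    emit(ch);
    // Longer pauses at word and sentence boundaries.
    await sleep(ch === " " ? delayMs() * 2 : ch === "." ? delayMs() * 6 : delayMs());
  }
}
```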
| |
| ▲ | enjeyw 5 days ago | parent [-] | | Yeah, it's a good call-out. I think it's a (more) winnable battle though. For both a keystroke-based AI detector and software designed to mimic human keystroke patterns, performance will be determined by the size of the dataset of genuine human keystroke patterns each has. The detector has an inherent leg-up here, because it's constantly collecting more data through normal use of the tool, whereas the mimic software has no built-in loop for collecting those inputs. | | |
| ▲ | lobsterthief 5 days ago | parent | next [-] | | Interesting idea! Could someone use the software to train an LLM prompt that gets around it, by learning what passes and what doesn't and then having the LLM train on that? | | |
| ▲ | enjeyw 5 days ago | parent [-] | | Yeah, this is something I'm a little worried about - right now it's not especially difficult to take an AI-generated essay and tweak it until it passes. My first-pass mitigation is to make the assessment of whether an essay is AI-generated accessible only to teachers. I may also need to rate-limit the checks, so people can't brute-force it to gather data on what passes. |
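A rate limit like the one mentioned could be as simple as a per-teacher token bucket. The sketch below is generic, with made-up numbers and in-memory storage, and is not a description of how Collie actually works.

```typescript
// Generic token-bucket rate limiter for the "AI or not" verdict endpoint.
// Bucket size, refill rate, and in-memory storage are assumptions.
const BUCKET_SIZE = 20;       // max checks a teacher can burst
const REFILL_PER_HOUR = 10;   // sustained checks per hour

interface Bucket { tokens: number; last: number; }
const buckets = new Map<string, Bucket>();

function allowCheck(teacherId: string, now = Date.now()): boolean {
  const b = buckets.get(teacherId) ?? { tokens: BUCKET_SIZE, last: now };
  // Refill proportionally to elapsed time, capped at the bucket size.
  b.tokens = Math.min(BUCKET_SIZE, b.tokens + ((now - b.last) / 3_600_000) * REFILL_PER_HOUR);
  b.last = now;
  if (b.tokens < 1) {
    buckets.set(teacherId, b);
    return false;             // out of budget: deny the check
  }
  b.tokens -= 1;
  buckets.set(teacherId, b);
  return true;
}
```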
| |
| ▲ | Footprint0521 5 days ago | parent | prev [-] | | I got burned by software like this when I pasted in an essay I had transcribed through Whisper while driving, and it flagged the paste as AI content lol | | |
| ▲ | enjeyw 5 days ago | parent [-] | | Lol! Technically they're not wrong about it being AI generated, just not at all in the way they meant. |