rvz 3 hours ago

Another bunch of dead giveaways in codebases with READMEs is the repetitive:

- "No X, No Y, No Z." pattern

- "Here is X - it makes Y"

The worst and most obvious one is the constant overuse of emoji ticks and crosses.

vonunov 10 minutes ago | parent | next [-]

/* This function doesn't return an int. It doesn't return a float. It doesn't return a char. It doesn't ret-- */

Retr0id 3 hours ago | parent | prev | next [-]

For calibration purposes, I offer you a pre-LLM README I wrote that includes an em-dash* followed by "No X, No Y, No Z": https://github.com/DavidBuchanan314/stelf-loader

*actually a hyphen but it's functioning as an em dash.

zamadatix 2 hours ago | parent | next [-]

"Hyphen functioning as an em dash" is an expected human thing as it's what's easy to type. It's specifically an actual em dash which got bulldozed, much to the dismay of those who bothered to put the unicode character in.

edbaskerville 2 hours ago | parent | next [-]

If you read The Mac is Not a Typewriter in 1992—thus burning Option-Shift-hyphen into your typing patterns for life, along with a dogmatic love for serif body fonts—you're the real victim here.

zamadatix 2 hours ago | parent | next [-]

Or those of us who use a full-featured editor when writing md!

This reminds me of another em dash+AI related topic: I've noticed LLMs have an extreme bias towards spaces around the dash while people can go either way with it.

rzzzt 2 hours ago | parent | prev [-]

There's something similar in Microsoft Word, Ctrl-Alt-Minus on the numpad.

galleywest200 2 hours ago | parent | prev | next [-]

I prefer the double dash "--", but Microsoft products will convert this to a proper em-dash if you press space afterwards, I think...

Grimblewald 2 hours ago | parent [-]

Double should map to an en dash, triple to an em dash.
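
A minimal sketch of that mapping in Python, assuming the LaTeX-style convention (the function name `smart_dashes` is made up for illustration, and this is not how any Microsoft product does it). The triple has to be replaced first, otherwise it would be consumed as a double plus a leftover hyphen:

```python
def smart_dashes(text: str) -> str:
    """Map '---' to an em dash and '--' to an en dash (LaTeX-style convention)."""
    # Replace the longer run first; otherwise '---' would turn into
    # an en dash followed by a stray '-'.
    text = text.replace("---", "\u2014")  # U+2014 EM DASH
    return text.replace("--", "\u2013")   # U+2013 EN DASH
```

Single hyphens are left alone, so ordinary hyphenated words are not affected.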

Retr0id 2 hours ago | parent | prev [-]

A lot of the LLM bots on HN (and elsewhere) will find-and-replace their em dashes with hyphens in an attempt to evade detection.

zamadatix 2 hours ago | parent [-]

Precisely, anything to remove AI smells in favor of natural looking text.

Retr0id 2 hours ago | parent [-]

My point is I don't consider em dash vs hyphen to be a strong signal either way; humans and bots alike use both interchangeably.

zamadatix 2 hours ago | parent [-]

A signal is not the same thing as a guarantee. Both of your points so far, i.e. your provided text & that bots often bother to replace em dashes to avoid detection, actually support that it is a signal though.

Retr0id 2 hours ago | parent [-]

The stronger signal is the grammatical structure, not the specific glyph used.

zamadatix 2 hours ago | parent [-]

A stronger signal yet is both combined! This glyph, that emoji, a given sentence structure, that formatting, a certain phrase. The more you notice -> the stronger the signal, the more you miss/discard -> the weaker the signal.
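
As a rough illustration of stacking weak signals, here is a toy Python sketch. The signal names, regexes, and weights are all hypothetical, picked only to show the idea; this is not how Pangram or any real detector works:

```python
import re

# Hypothetical signals and weights, chosen only for illustration.
SIGNALS = {
    # A literal em dash (U+2014) anywhere in the text.
    "em_dash": (re.compile("\u2014"), 1.0),
    # Emoji tick (U+2705) or cross (U+274C).
    "tick_cross_emoji": (re.compile("[\u2705\u274c]"), 1.5),
    # The "No X, No Y, No Z" pattern, with commas or full stops between items.
    "no_x_no_y_no_z": (
        re.compile(r"\bNo \w+[,.] No \w+[,.] No \w+", re.IGNORECASE),
        2.0,
    ),
}


def signal_score(text: str) -> float:
    """Sum the weights of every signal that appears at least once in text."""
    return sum(
        weight for pattern, weight in SIGNALS.values() if pattern.search(text)
    )
```

Each signal on its own barely moves the score; it only climbs when several show up in the same text. A higher score is still just a stronger signal, not a guarantee.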

edbaskerville 2 hours ago | parent | prev [-]

and we will now hold you responsible!

Grimblewald 2 hours ago | parent | prev [-]

Alternatively, no one sounds like an LLM; an LLM sounds like someone, typically those close to the median of the training corpus. If AI were genuinely capable of novelty, it would be a big deal. Tech bros having enough work ethic to design new detectable prose for an LLM is a massive reach and has no real evidence supporting it. Otherwise, why do tech bros only tackle the easier issues, the things we have massive well-labelled corpora for? Why is it never dishwashing and folding laundry?

I put it to you: if you see a trope in AI writing, it's because that trope appeared in the training corpus. Therefore, sure, being prejudiced against it lets you catch some AI, but you'll also flag human output. I think that may not be worth it in the end.

mrob 19 minutes ago | parent [-]

Show me a single substantial (5000+ words) piece of writing from before the release of GPT-3 that triggers Pangram with high confidence.