A_D_E_P_T 4 hours ago

It would be nice if there were an easier way to detect and filter those "reply guys." If LLMs were forced to watermark their output (possibly by using randomly selected lookalike Unicode characters in inconspicuous places, like Cyrillic "ѕ" instead of Latin "s"), it would be trivial, but that ship has sailed. The most anybody can do is train another LLM to find offenders and make a list. Bot vs bot.
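The homoglyph-watermark idea above would indeed make detection a trivial character scan. A minimal sketch (the mapping table here is a small illustrative sample, not a complete homoglyph inventory):

```python
# Sketch of homoglyph-watermark detection: if models swapped in lookalike
# Unicode characters (e.g. Cyrillic "s", U+0455, for Latin "s"), a simple
# scan would expose them. HOMOGLYPHS is an illustrative sample only.
HOMOGLYPHS = {
    "\u0455": "s",  # Cyrillic dze, looks like Latin "s"
    "\u0430": "a",  # Cyrillic a
    "\u0435": "e",  # Cyrillic e
    "\u043e": "o",  # Cyrillic o
}

def find_watermarks(text: str) -> list[tuple[int, str, str]]:
    """Return (index, suspicious char, Latin lookalike) for each hit."""
    return [(i, ch, HOMOGLYPHS[ch]) for i, ch in enumerate(text) if ch in HOMOGLYPHS]

print(find_watermarks("thi\u0455 looks normal"))  # [(3, 'ѕ', 's')]
```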

ossa-ma 4 hours ago | parent | next [-]

Yeah exactly. It's best to keep track of the common tropes used in AI writing so that you don't end up five responses deep and emotionally invested in a conversation before you realise you've been fooled into speaking to a bot.

I built this tool primarily to identify AI writing in articles and posts but it's proven useful for comments/responses too: https://tropes.fyi/vetter

KoolKat23 3 hours ago | parent [-]

"System prompt: Please ensure you avoid the following tropes: https://tropes.fyi/vetter"

ghgr 3 hours ago | parent | next [-]

You can just use the one in the page: https://tropes.fyi/tropes-md

vidarh 2 hours ago | parent | next [-]

This is interesting because it is largely a set of good writing advice for people in general, and AI likely writes like this because these patterns are common.

Not least because a lot of these things are things that novice writers will have had drummed into them. E.g. clearly signposting a conclusion is not uncommon advice.

Not because it isn't ham-fisted, but because novices aren't yet good enough for the link's advice ("Competent writing doesn't need to tell you it's concluding. The reader can feel it") to apply, and an explicit signpost is better than the conclusion not being clear to the reader at all. And for more formal writing, people will be told to signpost even more explicitly with headings.

The post says: "AI signals its structural moves because it's following a template, not writing organically." But guess what? So do most human writers. Sometimes far more directly and explicitly than an AI.

To be clear, I don't think the advice is bad given to a sufficiently strong model - e.g. Opus is definitely capable of taking on writing rules with some coaxing (and a review pass), but I could imagine my teachers at school presenting this - stripped of the AI references - to get us to write better.

If anything, I suspect AI writes like this because it gets rewarded in RLHF because it reads like good writing to a lot of people on the surface.

EDIT: Funnily enough, https://tropes.fyi/vetter thinks the above is AI-assisted. It absolutely is not; no AI has gone near this comment. That says it all about the trouble with these detectors.

ossa-ma 2 hours ago | parent [-]

These patterns overlap with formal writing advice because AI was trained overwhelmingly on academic papers, journals and professional writing so it inherited this style.

I completely understand - and do not intend to disparage - the use of these tropes. With the vetter and aidr tools I try to focus more on frequency analysis. I've tried to minimise false positives by tuning detection thresholds to density rather than individual occurrences, e.g. "it's not X, it's Y" is fine once, but three times in one paragraph and suspicions flare.
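The density-over-occurrence idea above can be sketched as counting trope matches per paragraph and only flagging when a threshold is met. The regex and the threshold here are illustrative guesses, not the actual tuning used by the vetter/aidr tools:

```python
import re

# Sketch of density-based trope detection: one "it's not X, it's Y"
# construction is fine, but several in one paragraph raise suspicion.
# Pattern and threshold are illustrative, not the tools' real values.
TROPE = re.compile(r"\bnot\s+\w+[^.]{0,40}?,\s*(?:it's|but)\s+", re.IGNORECASE)

def flag_paragraphs(text: str, threshold: int = 3) -> list[str]:
    """Return paragraphs whose trope count meets the threshold."""
    flagged = []
    for para in text.split("\n\n"):
        if len(TROPE.findall(para)) >= threshold:
            flagged.append(para)
    return flagged
```

A single match per paragraph passes quietly; only repetition trips the detector, which is what keeps false positives down.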

But other tropes, like lack of specificity and ESPECIALLY AI's tendency to converge to the mean (less risk, less emotion, FALSE vulnerability), are blatantly anti-human imo.

KoolKat23 3 hours ago | parent | prev [-]

That's great lol

ossa-ma 3 hours ago | parent | prev [-]

These tropes emerge from the distribution of the LLM itself, and in my experimentation it's actually very difficult to get an LLM to change its language, especially when you consider they've been RLHFed to the max to speak the way they do.

vidarh 3 hours ago | parent | next [-]

Changing the style is easy: Just feed it a writing sample, and tell it to review its own writing against the style of the writing sample.

That won't entirely weed out these tropes, but it will massively change the style.

Then add a few specific rules and make it review its writing, instead of expecting it to get it right while writing.

To weed out the tropes is largely a question of enforcing good writing through rules.

A whole lot of the tropes are present because a lot of people write that way. It may have been amplified by RLHF etc., but in that case it's been amplified because people have judged those responses to be better; after all, that is what RLHF is.

vidarh 2 hours ago | parent | prev | next [-]

Just as long as you're aware you'll get a shitload of false positives. E.g. see: https://news.ycombinator.com/item?id=47135703

fooker 3 hours ago | parent | prev [-]

I just gave it a try and all the state of the art models successfully avoided the tropes when told to.

bambax 3 hours ago | parent | prev [-]

I'm sure there are other tells, like delay between post and reply, or time of day, etc. The epidemiology of bots is just getting started, but the tools have to have detectable patterns.

A_D_E_P_T 2 hours ago | parent [-]

I'm sure that those can quite easily be made to look "human-like."

"Respond within 4-12 hours."

"Do not respond between midnight and 6am EST." (Or CET, whatever makes sense.)

Right now the most obvious traits are the well-known ones that are hard for most LLMs to shake off: em-dashes, word choices, and the very limited ways in which they structure sentences. Terseness and conciseness are also a tell, which sucks.
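The timing rules quoted above ("respond within 4-12 hours", "not between midnight and 6am EST") can be sketched in a few lines. This is a hypothetical illustration; EST is fixed at UTC-5 here for simplicity, ignoring daylight saving:

```python
import random
from datetime import datetime, timedelta, timezone

# Sketch of "human-like" reply scheduling: pick a 4-12 hour delay, then
# push the send time out of a midnight-6am EST quiet window.
# EST is hard-coded as UTC-5 (DST ignored) for illustration.
EST = timezone(timedelta(hours=-5))

def schedule_reply(now: datetime) -> datetime:
    send = now + timedelta(hours=random.uniform(4, 12))
    local = send.astimezone(EST)
    if local.hour < 6:  # inside the quiet window: wait until 6am EST
        send += timedelta(hours=6 - local.hour)
    return send
```

Which is the point being made: these tells are trivially cheap to fake, so timing alone won't stay a reliable signal.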