(author) I saw a 32:1 rate of EM-dashes last night when I just eyeballed the first 3 pages of /newcomments and /noobcomments. So I'm not sure how stable this is over time.

gritzko 6 hours ago | parent | next [-]

This is probably the time to add an invitation system like Gmail had in the beginning. Or add a shade for accounts <1yr old. Or something else, before things get too mixed.

	▲	shit_game 5 hours ago \| parent \| next [-]
		The issue with creating some hidden maturity heuristic for accounts is that it will be gamed just the same as any other, except that age alone is the simplest heuristic to game. You can simply do nothing for incremental periods of time and then begin testing aged accounts to roughly determine the minimum age an account must reach to become "trusted". Bot prevention is a very difficult, constant game of cat and mouse, and a lot of bot operators have become very skilled at determining the hidden metrics used by platforms to bless accounts; that's their job, after all. I've become a big fan of lobste.rs' invitation tree approach, where the reputation of new accounts rides on the reputation of older accounts, and risks consequences up the chain. It also creates a very useful graph of account origin, allowing for scorched earth approaches to moderation that would otherwise require a serious (and often one-off) machine learning approach to connect accounts.
	▲	duckmysick 4 hours ago \| parent \| prev [-]
		https://lobste.rs/ has a system like that.
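
A minimal sketch of the invitation-tree idea discussed above, not lobste.rs' actual implementation. The `InviteTree` class and its method names are hypothetical; it only illustrates how recording who invited whom yields both the chain of accounts "up the tree" and the full subtree that a scorched-earth moderation action could target.

```python
from collections import defaultdict


class InviteTree:
    """Hypothetical invitation graph: every account except the root has one inviter."""

    def __init__(self, root):
        self.accounts = {root}
        self.inviter = {}                   # account -> account that invited it
        self.invitees = defaultdict(list)   # account -> accounts it invited

    def invite(self, inviter, new_account):
        # Only known accounts can invite, and each account joins exactly once,
        # which keeps the structure a tree (no cycles).
        if inviter not in self.accounts:
            raise ValueError(f"unknown inviter: {inviter}")
        if new_account in self.accounts:
            raise ValueError(f"account already exists: {new_account}")
        self.accounts.add(new_account)
        self.inviter[new_account] = inviter
        self.invitees[inviter].append(new_account)

    def chain(self, account):
        """Accounts whose reputation rides on this one, nearest inviter first."""
        out = []
        while account in self.inviter:
            account = self.inviter[account]
            out.append(account)
        return out

    def subtree(self, account):
        """The account plus everything it invited, directly or transitively."""
        out, stack = [], [account]
        while stack:
            current = stack.pop()
            out.append(current)
            stack.extend(self.invitees.get(current, []))
        return out


tree = InviteTree("root")
tree.invite("root", "alice")
tree.invite("alice", "bob")
tree.invite("alice", "carol")
tree.invite("bob", "dave")

upstream = tree.chain("dave")               # who vouched for dave, directly or not
downstream = sorted(tree.subtree("alice"))  # everything that originates from alice
```

What a moderator does with `upstream` and `downstream` (warnings, bans, revoked invite rights) is a policy choice the sketch leaves open.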

Muhammad523 6 hours ago | parent | prev | next [-]

I just took a look at /noobcomments and wow, there's even a comment where a person argues with AI instead of, you know, using their own brain. It was obvious it was AI since it was formatted with markdown.

	▲	lgats 6 hours ago \| parent \| next [-]
		the link https://news.ycombinator.com/noobcomments
	▲	6 hours ago \| parent \| prev [-]
		[deleted]

cookiengineer 6 hours ago | parent | prev [-]

I wanted to point out that em dashes are autocompleted by the iOS keyboard. So the false positives and true negatives might overlap without more details. I think a better indicator would be to only detect em dashes with preceding and following whitespace characters, plus the general Unicode usage of that user.
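
A rough sketch of that indicator, under stated assumptions: the function name `em_dash_stats` and the returned keys are made up for illustration, and the non-ASCII ratio is only one crude stand-in for "general Unicode usage". It counts signals and does not decide whether a comment is human or machine written.

```python
import re

EM_DASH = "\u2014"
# An em dash with a whitespace character directly before and after it.
# Lookarounds are used so that adjacent dashes can share a space.
SPACED_EM_DASH = re.compile(r"(?<=\s)\u2014(?=\s)")


def em_dash_stats(text: str) -> dict:
    """Count em dashes, whitespace-surrounded em dashes, and non-ASCII share."""
    total = text.count(EM_DASH)
    spaced = len(SPACED_EM_DASH.findall(text))
    non_ascii = sum(1 for ch in text if ord(ch) > 127)
    return {
        "em_dashes": total,
        "spaced_em_dashes": spaced,
        "non_ascii_ratio": non_ascii / len(text) if text else 0.0,
    }


# One tight dash and one whitespace-surrounded dash (written as escapes).
sample = "tight\u2014dash and spaced \u2014 dash"
stats = em_dash_stats(sample)
```

Any threshold on these numbers would have to be calibrated against older comments, since the keyboard sources mentioned here predate LLM output.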

Additionally, lots of Chinese and Russian keyboard tools use the em dash as well, when they're switching to the alternative (en-US) layout overlay.

There's also the Chinese idiom symbol in UTF8 which gets used as a dot by those users a lot, so that could be a nice indicator for legit human users.

edit: lol @ downvotes. Must have hit a vulnerable spot, huh?

Aurornis 6 hours ago | parent | next [-]

> I wanted to point out that em dashes are autocompleted by the iOS keyboard.

That’s why the analysis was performed over time. All of those em dash sources you mentioned were present before LLM-written content became popular.

marginalia_nu 6 hours ago | parent | prev [-]

I think there is a baseline number of human users who, for one reason or another, use em-dashes, but this doesn't explain why they are 10x more prevalent in green accounts.

	▲	cookiengineer 5 hours ago \| parent [-]
		> I think there is a baseline number of human users who, for one reason or another, use em-dashes, but this doesn't explain why they are 10x more prevalent in green accounts.

		I'm not trying to negate the fact. I'm just pointing out that a correlation without another indicator is not evidence enough that someone is a bot user, especially in the golden age of DDoS botnets rebranded as residential proxy services, which everyone seems to have started using since ~Q4 2024.