My pet theory is similar to the training set hypothesis: em-dashes appear often in prestige publications such as The Atlantic, The New Yorker, The Economist, and a few others that are considered good writing. Being magazines, they have published a lot of articles over time, reinforcing the style. They're also the sort of thing an RLHF person will think is good, not because of the em-dash but because the general style is polished.

One thing I wondered is whether high-prestige writing is encoded into the models, but it doesn't seem far-fetched that there are various linkages inside the data that say "this kind of thing should be weighted highly."

kubb 5 days ago

It also seems that LLMs are using them correctly — as a pause or replacement for a comma (yes, I know this is an imprecise description of when to use them).

Thanks to LLMs, I learned that using the short binding dash everywhere is incorrect, and I can improve my writing because of it.

	number6 4 days ago
		Before the rise of LLMs, there was a post here on HN where someone explained how to use all the dashes — sadly, LLMs took them from us.

cornonthecobra 5 days ago

This is my theory as well, with the addition of books. If someone wanted to train a bot to sound more human, they would select data that is verifiably human-made.

The approachable tone of popular print media also preselects for the casual, highly readable style I suspect users would want from a bot.

lunias 3 days ago

I think you're correct. The first time I encountered (and recognized) an em-dash in someone's writing was in middle school, and the person who wrote it was someone I considered academically superior to me. I noticed, though, that a lot of people in the same "smart kids" group used them, almost as if they had worked together on their papers. Maybe they were just reading different material, but it definitely came across as "this will make my writing look smart."

mailarchis 4 days ago

pg uses em-dashes too. I found it interesting to see em-dashes in his essays from way back in the early 2000s.

tim333 4 days ago

That kind of fits with Altman saying they put them in because users liked them (https://www.linkedin.com/posts/curtwoodward_chatgpt-em-dash-...)

I guess in the past, if you'd shown me a passage with em dashes, I'd have said it looked good because I associate them with The New Yorker and The Economist, both of which I read. Now I'd be a bit more meh about it because of LLMs.