| ▲ | ManuelKiessling 10 days ago |
| Why have the LLMs "learned" to write PRs (and other stuff) this way? This style was definitely not mainstream on GitHub (or Reddit) pre-LLMs, was it? It's strange how AI style is so easy to spot. If LLMs just follow the style they encountered most frequently during training, wouldn't that mean their style would be especially hard to spot? |
|
| ▲ | bmacho 8 days ago | parent | next [-] |
| For this "LLM style was already the most popular; that's how LLMs work; so how come LLM style is so weird and annoying?" puzzle, I have two theories. First, the LLM style never existed as such: it's a mash-up of several different styles, word choices, and phrases. Second, LLMs turn a slight plurality into 100% exclusivity. Say there are 20 different ways to say the same thing, more or less evenly distributed, with one of them slightly more common. The LLM picks the most common one every time. Situation before: 20 options, ~5% frequency each. Situation now: 1 option, 100% frequency. LLM text both reduces the variety and drastically increases the absolute frequency of the winner. I think these two theories explain how LLMs can both sound bad and supposedly "be the most common style, how humans have always talked" (it isn't). Also, if the second theory is true, i.e. LLM style is actually rare among humans, then someone on the internet who talks like an LLM probably is one. |
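The "slight plurality becomes 100% exclusivity" effect is easy to simulate. A minimal sketch, with a made-up toy distribution (the phrasing names and frequencies are purely illustrative):

```python
import random
from collections import Counter

# Toy model: 20 interchangeable phrasings; phrasing_0 is only slightly
# more common than the rest (6% vs ~4.9% each). All numbers are made up.
options = [f"phrasing_{i}" for i in range(20)]
weights = [6.0] + [94.0 / 19] * 19

def human_pick():
    # Humans sample from the full distribution.
    return random.choices(options, weights=weights, k=1)[0]

def greedy_llm_pick():
    # Greedy decoding always takes the single most probable option.
    return options[weights.index(max(weights))]

human_counts = Counter(human_pick() for _ in range(10_000))
llm_counts = Counter(greedy_llm_pick() for _ in range(10_000))

print(len(human_counts), len(llm_counts))  # → 20 1
```

Ten thousand human picks keep all 20 phrasings alive; ten thousand greedy picks collapse to one. (Real decoders sample with temperature rather than picking the argmax every time, but any bias toward the mode pushes in this direction.)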
| |
| ▲ | waste_monk 8 days ago | parent [-] | | I understand there is an "Exclude Top Choices" algorithm which helps combat this sort of thing. |
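For context, "Exclude Top Choices" (XTC) is a sampling strategy from the open-weights inference community: with some probability it removes the most likely next tokens so the model cannot always land on its favorite phrasing. A rough sketch of the idea — parameter names and defaults here are my own for illustration, not taken from any reference implementation:

```python
import numpy as np

def xtc_sample(probs, threshold=0.2, xtc_probability=0.5, rng=None):
    """Sketch of an "Exclude Top Choices" (XTC) style sampler.

    With probability `xtc_probability`, every token whose probability
    reaches `threshold` -- except the least likely of them -- is zeroed
    out, forcing the model off its most predictable continuation.
    """
    if rng is None:
        rng = np.random.default_rng()
    probs = np.array(probs, dtype=float)
    top = np.flatnonzero(probs >= threshold)
    if len(top) > 1 and rng.random() < xtc_probability:
        # Keep only the weakest of the qualifying tokens.
        weakest = top[np.argmin(probs[top])]
        probs[np.setdiff1d(top, weakest)] = 0.0
    probs /= probs.sum()
    return rng.choice(len(probs), p=probs)

# With the exclusion always on, the single most likely token can never
# be chosen when at least two tokens clear the threshold:
rng = np.random.default_rng(0)
draws = {xtc_sample([0.5, 0.3, 0.1, 0.1], xtc_probability=1.0, rng=rng)
         for _ in range(200)}
print(0 in draws)  # → False
```

Note how this is roughly the inverse of greedy decoding: instead of amplifying the modal phrasing, it actively suppresses it.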
|
|
| ▲ | stephendause 10 days ago | parent | prev | next [-] |
| This is total speculation, but my guess is that human reviewers of AI-written text (whether code or natural language) are more likely to think that text with emoji check marks, or dart-targets, or whatever, is correct. (My understanding is that many of these models are fine-tuned by humans who manually review their outputs.) In other words, LLMs were inadvertently trained to seem correct, and a little message that says "Boom! Task complete! How else may I help?" subconsciously leads you to think it's correct. |
| |
| ▲ | palmotea 9 days ago | parent | next [-] | | My guess is they were trained on other text from other contexts (e.g. ones where people actually use emojis naturally) and it transferred into the PR context, somehow. Or someone made a call that emoji-infested text is "friendlier" and tuned the model to be "friendlier." | | |
| ▲ | ljm 8 days ago | parent [-] | | Maybe the humans in the loop were all MBAs who believe documents and powerpoint slides look more professional when you use graphical bullet points. (I once got that feedback from someone in management when writing a proposal...) |
| |
| ▲ | ssivark 10 days ago | parent | prev | next [-] | | I suspect that this happens to be desired by the segment most enamored with LLMs today, and the two are co-evolving. I’ve seen discussions about how LM arena benchmarks might be nudging models in this direction. | |
| ▲ | roncesvalles 9 days ago | parent | prev [-] | | AI sounds weird because most of the human reviewers are ESL. |
|
|
| ▲ | WesolyKubeczek 10 days ago | parent | prev | next [-] |
| You may thank the millennial hipsters who used to think emojis are cute, and the proliferation of little JavaScript libraries they authored on your friendly neighborhood GitHubs. Later the cutest of the emojis paved their way into templates used by bots and tools, and it exploded like colorful vomit confetti all over the internets. When I see this emojiful text, my first association is not with an LLM, but with a lumberjack-bearded hipster wearing thick-framed fake glasses and tight garish clothes, rolling on a Segway or an equivalent machine while sipping a soy latte. |
| |
| ▲ | y0eswddl 8 days ago | parent | next [-] | | Everyone in this thread is now dumber for having read this comment. I award you no points and may god have mercy on your soul. | | |
| ▲ | bmacho 8 days ago | parent | next [-] | | Jokes on GP, I give up reading most comments when I don't like them anymore, usually after 1-2 sentences. | |
| ▲ | coldtea 5 days ago | parent | prev | next [-] | | No, they are pretty spot on. It's a legitimate complaint. Your comment, however, is just an ad hominem. | | |
| ▲ | layla5alive 3 days ago | parent [-] | | "Mr. Madison, what you've just said is one of the most insanely idiotic things I have ever heard. At no point in your rambling, incoherent response were you even close to anything that could be considered a rational thought. Everyone in this room is now dumber for having listened to it. I award you no points, and may God have mercy on your soul." | | |
| ▲ | coldtea 2 days ago | parent [-] | | Nothing like a canned meme response to bring value to a discussion |
|
| |
| ▲ | ljm 8 days ago | parent | prev | next [-] | | I love how these elaborate stereotypes reveal more about the author than the group of people they are lampooning. | | | |
| ▲ | WesolyKubeczek 8 days ago | parent | prev [-] | | Welcome to the bottom, it's warm and cozy down here. |
| |
| ▲ | iknowstuff 10 days ago | parent | prev | next [-] | | This generic comment reads like it's AI-generated, ironically | | |
| ▲ | WesolyKubeczek 10 days ago | parent [-] | | It's beneath me to use LLMs to comment on HN. | | |
| ▲ | freedomben 10 days ago | parent [-] | | Exactly what an LLM would say. Jk, your comments don't seem at all to me like AI. I don't see how that could even be suggested | | |
|
| |
| ▲ | h4ck_th3_pl4n3t 8 days ago | parent | prev [-] | | Beard: check. Glasses: check (I'm old). Garish clothes: check. Segway: nope. So there's a 75% chance I am a Millennial hipster. Soy latte: sounds kinda nice |
|
|
| ▲ | oceanplexian 10 days ago | parent | prev | next [-] |
| LLMs write things in a certain style because that's how the base models are fine-tuned before being given to the public. It's not that they can't write PRs indistinguishable from humans', or can't write code without emojis. It's that the vendors don't want to freak out the general public, so they have essentially poisoned the models to stave off regulation a little bit longer. |
| |
| ▲ | SamPatt 10 days ago | parent | next [-] | | I doubt this. I've done AI annotation work on the big models. Part of my job was comparing two model outputs and rating which was better, using detailed criteria to explain why. The HF part of RLHF. That's a lot of expensive work to do and then ignore, if they're just poisoning the models afterwards! | | |
| ▲ | h4ck_th3_pl4n3t 8 days ago | parent [-] | | GP is kind of implying that AGI is already here, and all the companies are just dumbing their models down because of regulation. I'm like "Sure buddy, sure. And the nanobots are in all the vaccines, right?" | | |
| ▲ | oceanplexian 4 days ago | parent [-] | | If your company built a super intelligent LLM, say that could find Alpha in the financial markets, would you make it public to anyone with a ChatGPT subscription? Of course not! They would use it to trade and would keep it concealed while throwing the public a bone with a less advanced version. Same thing applies as AGI or even as code gen gets better. | | |
| ▲ | h4ck_th3_pl4n3t 3 days ago | parent [-] | | How often have you seen an actual UFO? How likely is it that you saw a human-made plane, misclassified due to lack of data, versus an actual alien-made aircraft that coincidentally works in our atmosphere, gravity, and drag-related physics? Guess what: it's about likelihood, and you are extrapolating from the wrong assumptions. Extraordinary claims require extraordinary evidence, not the other way around. |
|
|
| |
| ▲ | dingnuts 10 days ago | parent | prev [-] | | this is WILD speculation without a citation. it would be a fascinating comment if you had one! but without? sounds like bullshit to me... | | |
| ▲ | array_key_first 10 days ago | parent | next [-] | | It is wildly speculative, but it's something I've never considered. If I were making a brave new technology that I knew had power for unprecedented evil, I might gimp it, too. | |
| ▲ | alt187 10 days ago | parent | prev [-] | | This sounds like the most plausible explanation to me. Occam's razor, remember it! |
|
|
|
| ▲ | somethingsome 8 days ago | parent | prev | next [-] |
| My impression is that this style started with Apple products. I distinctly remember opening a terminal and seeing many command-line applications (mostly JavaScript frameworks) display emoji, way before LLMs. But maybe it originated somewhere else... in JavaScript libraries? |
| |
| ▲ | yakshaving_jgt 8 days ago | parent [-] | | I thought it was JavaScript libraries written by people obsessed with the word "awesome", and separately the broader inclusivity movement. For some reason, I think people think riddling a README with emoji makes the document more inclusive. | | |
| ▲ | DoctorOW 8 days ago | parent [-] | | > For some reason, I think people think riddling a README with emoji makes the document more inclusive. Why do you think that? I try to stay involved in the accessibility community (if that's what you mean by inclusive?) and I've not heard anyone advocate for emojis over text. | |
| ▲ | yakshaving_jgt 8 days ago | parent [-] | | It's really only anecdotal — I observed this as a popular meme between ~2015-2020. I say "meme" because I believe this is how the information spreads — I think people in that particular clique suggest it to each other and it becomes a form of in-group signalling rather than an earnest attempt to improve the accessibility of information. I'm wary now of straying into argumentum ad ignorantiam territory, but I think my observation is consistent with yours insofar as the "inclusivity" community I'm referring to doesn't have much overlap with the accessibility community; the latter being more an applied science project, and the former being more about humanities and social theory. | | |
| ▲ | DoctorOW 7 days ago | parent [-] | | Could you give an example of the inclusivity community? I'm not sure I understand. | | |
| ▲ | yakshaving_jgt 7 days ago | parent [-] | | I mean the diversity and inclusion world — people focused on social equity and representation rather than technical usability. Their work is more rooted in social theory and ethics than in empirical research. |
|
|
|
|
|
|
| ▲ | apwheele 8 days ago | parent | prev | next [-] |
| I do remember one example of an emoji in tech docs before all of this -- learning GitHub Actions (which, based on my blog, happened for me in 2021, before the ChatGPT release): at one point they had an apple emoji at the final stage saying "done". (I am sure there are others, I just do not remember them.) But I agree that excessive emojis, tables of things, and being overly verbose are tells for me these days. |
| |
| ▲ | Sharlin 8 days ago | parent [-] | | I do recall emoji use getting more popular in docs and – brrh – in the outputs of CLI programs already before LLMs. I'm pretty sure that the trend originated in the JS ecosystem. | |
| ▲ | ManuelKiessling 7 days ago | parent | next [-] | | It absolutely was a trend right before LLM training started — but there's no way this was already the style of the majority of all tech docs and PRs ever. The "average" style, from the Unix manpages of the 1970s through the Linux Documentation Project all the way to the latest super-hip JavaScript isEven emoji-vomit README, must still have been relatively tame, I assume. | |
| ▲ | bavell 8 days ago | parent | prev [-] | | Really hate this trend/style. Sucks that it's ossified into many AIs. Always makes me think of young preteens who just started texting/DMing. Grow up! |
|
|
|
| ▲ | NewsaHackO 10 days ago | parent | prev | next [-] |
| I wonder if it's due to emojis being able to express a large amount of information per token. For instance, the bulls-eye emoji packs a whole concept into a single character. Also, emojis don't have a language barrier. |
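Worth a quick sanity check on sizes: a bulls-eye emoji is one code point but 32 bits in both UTF-8 and UTF-16 (it sits above the Basic Multilingual Plane), and depending on the tokenizer it may or may not come out as a single token:

```python
# How big a single emoji actually is on the wire. U+1F3AF (DIRECT HIT)
# needs 4 bytes in UTF-8 and a surrogate pair (also 4 bytes) in UTF-16.
target = "\U0001F3AF"  # 🎯

print(len(target))                      # → 1  (one code point)
print(len(target.encode("utf-8")))      # → 4  bytes, i.e. 32 bits
print(len(target.encode("utf-16-le")))  # → 4  bytes: a surrogate pair
```

So the density argument is about meaning per character, not raw bits: the same 32 bits could also hold four ASCII letters.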
| |
| ▲ | desantisll 5 days ago | parent [-] | | ding ding ding. a picture is worth a thousand words, a word is worth several characters. then what happens when you can fit a picture in a character? |
|
|
| ▲ | analog31 8 days ago | parent | prev | next [-] |
| I wonder if there's an analogy to the style of Nigerian e-mail scams, which always contain spelling errors and conclude with "God Bless." If the writing looks too literate, people might actually read and critique it. God Bless. |
|
| ▲ | rolisz 9 days ago | parent | prev | next [-] |
| There's some research showing that LLMs finetuned to write malicious code (with security vulnerabilities) also become more broadly malicious (including claiming that Hitler is a role model). So it's entirely possible that training in one area (e.g. Reddit discourse) might influence other areas (such as PRs): https://arxiv.org/html/2502.17424v1 |
|
| ▲ | troupo 8 days ago | parent | prev | next [-] |
| > Why have the LLMs "learned" to write PRs (and other stuff) this way? They didn't learn how to write PRs. They "learned" how to write text. Just as generic images coming out of OpenAI share the same style and yellow tint, so does text. It averages down to a basic TikTok/Threads/whatever comment, plus whatever bias the training sets and methodology introduced |
| |
| ▲ | ManuelKiessling 7 days ago | parent [-] | | That's my whole point: why does it seemingly "average down" to a style that was not encountered "on average" at the time LLM training started? |
|
|
| ▲ | FinnKuhn 8 days ago | parent | prev | next [-] |
| It reminds me of this, but without the logic and structure: https://gitmoji.dev/ |
|
| ▲ | echelon_musk 8 days ago | parent | prev | next [-] |
| I'm glad that AI slop is detectable. So, for now, the repulsive emoji crap is a useful heuristic that someone is wasting my time. In a few years, once it's harder to detect, I expect I'll have a harder and more frustrating time. For this reason I hope people don't start altering their prompts to make the output harder to detect as LLM-generated by anyone with a modicum of intelligence left. |
|
| ▲ | fho 10 days ago | parent | prev | next [-] |
| Doesn't GitHub have emoji reactions? I would assume those tie "PR" and "needs emojis" closely together. |
|
| ▲ | 8 days ago | parent | prev | next [-] |
| [deleted] |
|
| ▲ | standardly 7 days ago | parent | prev | next [-] |
| RLHF and system prompt, I assume. But isn't being able to identify LLM output a good thing? |
|
| ▲ | 10 days ago | parent | prev [-] |
| [deleted] |