ajross 2 hours ago

"Groupthink" informed by extremely broad training sets is more conventionally called "consensus", and that's what we want the LLM to reflect.

"Groupthink", as the term is used by epistemologically isolated in-groups, actually means the opposite. The problem with the idea is that it looks symmetric, so if you yourself are stuck in groupthink, you fool yourself into thinking it's everyone else doing it instead. And, again, the solution for that is reasonable references grounded in informed consensus. (Whether that should be a curated encyclopedia or an LLM is a different argument.)

bubblewand an hour ago | parent | next [-]

> "Groupthink" informed by extremely broad training sets is more conventionally called "consensus", and that's what we want the LLM to reflect.

Definitely not! I absolutely do not want an LLM that gives much or any truth-weight to the vast majority of writing on the vast majority of topics. Maybe, maybe if they’d existed before the Web and been trained only on published writing, but even then you have stuff like tabloids, cranks self-publishing or publishing through crank-friendly niche publishers, advertisements full of lies, very dumb letters to the editor, vanity autobiographies or narrative business books full of made-up stuff presented as true, etc.

No, that’s good for building a model of something like the probability space of human writing, but an LLM that has some kind of truth-grounding wholly based on that would be far from my ideal.

> And, again, the solution for that is reasonable references grounded in informed consensus. (Whether that should be a curated encyclopedia or an LLM is a different argument.)

“Informed” is a load-bearing word in this post, and I don’t really see how the rest holds together if we start to pick at that.

ajross an hour ago | parent [-]

> I absolutely do not want an LLM that gives much or any truth-weight to the vast majority of writing on the vast majority of topics.

I can think of no better definition of "groupthink" than what you just gave. If you've already decided on the need to self-censor your exposure to "the vast majority of writing on the vast majority of topics", you are lost, sorry.

bubblewand 26 minutes ago | parent [-]

A spectacular amount of extant writing accessible to LLM training datasets is uninformed noise from randos online. Not my fault the internet was invented.

I have to be misunderstanding you, though, because any time we want to build knowledge and skills for specialists, their training doesn’t look anything like what you seem to be suggesting.

Spivak 40 minutes ago | parent | prev [-]

Gotta be honest, when I go to an encyclopedia the last thing I want is what the mathematically average chronically online person knows and thinks about a topic. Because common misconceptions and the "facts" you see parroted on online forums on all sorts of niche topics look just like consensus but ya know… wrong.

I would rather have an actual audio engineer's take than the opinion of an amalgamation of hi-fi forums talking pseudoscience, and the latter is way more numerous in the training data.