| ▲ | behnamoh 4 hours ago |
Lots of research shows post-training dumbs down the models, but no one listens because people are too lazy to learn proper prompt programming and would rather have a model that already understands the concept of a conversation.
|
| ▲ | ACCount37 3 hours ago | parent | next [-] |
| "Post-training" is too much of a conflation, because there are many post-training methods and each of them has its own quirky failure modes. That being said? RLHF on user feedback data is model poison. Users are NOT reliable model evaluators, and user feedback data should be treated with the same level of precaution you would treat radioactive waste. Professional are not very reliable either, but the users are so much worse. |
|
| ▲ | CuriouslyC 4 hours ago | parent | prev | next [-] |
Some distributional collapse is good in terms of making these things reliable tools. Creativity and divergent thinking do take a hit, but humans are better at that anyhow, so I view it as a net W.
| |
| ▲ | ACCount37 3 hours ago | parent [-] | | This. A default LLM is "do whatever seems to fit the circumstances". An LLM that was RLVR'd heavily? "Do whatever seems to work in those circumstances". Very much a must for many long-term and complex tasks. | | |
|
|
| ▲ | CGMthrowaway 4 hours ago | parent | prev | next [-] |
How do you take a raw model and use it without chatting? Asking as a layman.
| |
| ▲ | roywiggins 4 hours ago | parent | next [-] | | GPT-3 was originally just a completion model. You gave it some text and it produced some more text; it wasn't tuned for multi-turn conversations. https://platform.openai.com/docs/api-reference/completions/c... | |
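For illustration, a minimal sketch of raw completion against OpenAI's legacy completions endpoint (the model name and prompt are just examples; this assumes a completion-style model such as gpt-3.5-turbo-instruct is still served):

    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    # Pure completion: hand the model a prefix and get a continuation back.
    resp = client.completions.create(
        model="gpt-3.5-turbo-instruct",  # example legacy completion model
        prompt="Once upon a time, in a quiet village,",
        max_tokens=30,
    )
    print(resp.choices[0].text)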
| ▲ | swatcoder 4 hours ago | parent | prev | next [-] | | You lob it the beginning of a document and let it toss back the rest. That's all that the LLM itself does at the end of the day. All the post-training to bias results, routing to different models, tool calling for command execution and text insertion, injected "system prompts" to shape user experience, etc. are all just layers built on top of the "magic" of text completion. And if your question was more practical: where it's made available, you get access to that underlying layer via an API or through a self-hosted model, making use of it with your own code or with a third-party site/software product. | |
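To make the self-hosted route concrete, a minimal sketch with Hugging Face transformers and a small base (non-instruct) model; gpt2 here is just a convenient example of a raw completion model:

    from transformers import AutoModelForCausalLM, AutoTokenizer

    # A base model with no chat tuning: all it can do is continue text.
    tokenizer = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2")

    # Lob it the beginning of a "document"...
    inputs = tokenizer("The recipe for a simple tomato soup:", return_tensors="pt")

    # ...and let it toss back the rest.
    outputs = model.generate(**inputs, max_new_tokens=40, do_sample=False)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))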
| ▲ | behnamoh 4 hours ago | parent | prev [-] | | The same way we used GPT-3: "The following is a conversation between the user and the assistant. ..." | | |
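A hypothetical sketch of that trick (the framing text and stop sequence are just one way to do it): the "conversation" is simply a document the completion model keeps extending.

    from openai import OpenAI

    client = OpenAI()

    # Frame the chat as a document for a pure completion model to continue.
    prompt = (
        "The following is a conversation between the user and the assistant.\n\n"
        "User: How do I reverse a list in Python?\n"
        "Assistant:"
    )
    resp = client.completions.create(
        model="gpt-3.5-turbo-instruct",  # example legacy completion model
        prompt=prompt,
        max_tokens=100,
        stop=["\nUser:"],  # stop before the model writes the user's next turn
    )
    print(resp.choices[0].text)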
| ▲ | nrhrjrjrjtntbt 4 hours ago | parent [-] | | Or just: 1 1 2 3 5 8 13 Or: The first president of the united | | |
| ▲ | CGMthrowaway 3 hours ago | parent [-] | | And that's better? Isn't that just SMS autocomplete? | | |
| ▲ | nrhrjrjrjtntbt an hour ago | parent | next [-] | | Better? I am not sure. A parent comment was suggesting better LLM performance vs. chat. UX-wise it is probably worse, except for power users. | |
| ▲ | d-lisp 2 hours ago | parent | prev [-] | | If that's SMS autocomplete, then chatLLMs are just SMS autocomplete with sugar on top. |
|
|
|
|
|
| ▲ | nomel 4 hours ago | parent | prev [-] |
| The "alignment tax". |
| |