og_kalu 2 days ago

The reason is not strange or unknown. The text-completion GPT-3 from 2020 often sounds more natural than GPT-4. The reason is the post-training process: models are more or less trained to sound like that during RLHF. Stilted, robotic, like a good little assistant. OpenAI and Anthropic have said as much. It's not a limitation of the loss function or even of the state of the art.

refulgentis 2 days ago | parent [-]

I can't give in to misguided pessimism - "OpenAI and Anthropic have said as much" is especially not something I can support!

I'm hearing some of the ideas from my corner of LLM-x-creativity Twitter expressed clunkily, as if it's some irrevocable thing.

You're right that the default is to speak like an assistant.

You're wrong that it's forced and immutable, a consequence of RLHF, and that the companies say it's so. https://x.com/jpohhhh/status/1784077479730090346

You're especially wrong that RLHF is undesirable: https://x.com/jpohhhh/status/1819549737835528555 https://x.com/jpohhhh/status/1819550145522160044

It's also nigh-trivial to get the completion model back: https://x.com/jpohhhh/status/1776434608403325331

I don't know when I'll stop seeing surface-level opinions disguised as cold technological claims on this subject. I would have thought, by now, people doing that would wonder why the wide open lane hasn't been taken, at least once.

og_kalu 2 days ago | parent [-]

I don't understand what you're getting at here, or why you've linked tweets from a (seemingly random) person to make your point.

Yes, these companies have all commented on the effects of post-training on their models.

"We want people to know that they're interacting with a language model and not a person." This is literally a stated goal of post-training for all these companies. Even when they train a model to have a character, it mustn't sound like a person. It's no surprise these models don't sound as natural as their base counterparts.

https://www.anthropic.com/research/claude-character

>You're wrong that it's forced and immutable and a consequence of RLHF and the companies say it's so.

I never said it was immutable. I said it was a consequence of post-training, and it is. All the base models speak more naturally with far less effort.

>You're especially wrong that RLHF is undesirable

I don't understand what point you're trying to make here. I didn't say it was undesirable. I said it heavily affects how natural the models sound.

>It's also nigh-trivial to get the completion model back

Try getting GPT-4o to write a story with villains that doesn't end with everyone singing Kumbaya, and you'll see how much post-training affects these models' outputs.