bananaflag 5 days ago

This is mentioned in the text:

> This idea is not new. Some standardized tests have long used versions of negative marking for wrong answers or partial credit for leaving questions blank to discourage blind guessing.

throwawaymaths 5 days ago | parent [-]

There's not really an easy way to train for that at scale. A "correct" answer may not be a single token; there may be multiple synonymous answers starting with different tokens, and you could add five space tokens in front of the answer and it likely shouldn't make it "wrong".
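To make the tokenization problem concrete, here's a small illustration using the tiktoken library (the encoding name is just an example): the "same" answer string produces entirely different token sequences once leading whitespace changes, so any token-exact notion of "the correct answer" is brittle.

    import tiktoken

    enc = tiktoken.get_encoding("cl100k_base")

    # The "same" answer tokenizes differently depending on leading
    # whitespace: " Paris" is a different token than "Paris", and extra
    # spaces add more tokens still. A token-exact match would call all
    # of these unequal even though they are all the right answer.
    for s in ["Paris", " Paris", "     Paris"]:
        print(repr(s), "->", enc.encode(s))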

ACCount37 5 days ago | parent [-]

Yes, it's not nearly as easy as "just fix the evals".

But better evals are still helpful, because they reward LLM vendors for attempting the very-hard-to-do thing, instead of rewarding them for training an LLM that's really good at emitting 7%-confidence guesses.

throwawaymaths 5 days ago | parent [-]

You're missing the point. SAT-style negative marking for random guesses is fine for a classifier: you could trivially fold that kind of penalty into the cost function and backpropagate. But how do you give negative weight to a wrong answer when training a transformer?
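For the classifier case, here's a minimal sketch of what SAT-style negative marking could look like as a differentiable loss (everything here is illustrative, not a standard API): +1 for the correct class, 0 for an explicit "abstain" class, and a negative penalty for any wrong class, then minimize the negative expected score under the model's softmax.

    import torch
    import torch.nn.functional as F

    def negative_marking_loss(logits, targets, abstain_idx, wrong_penalty=0.25):
        """Expected-score loss with SAT-style negative marking (illustrative).

        logits:  (batch, num_classes) raw scores; one class is 'abstain'.
        targets: (batch,) correct class indices.
        Per-class score: +1 if correct, 0 if abstain, -wrong_penalty otherwise.
        We minimize the negative expected score under softmax(logits).
        """
        probs = F.softmax(logits, dim=-1)
        batch, num_classes = logits.shape

        # Build the per-example score table, then take the expectation.
        scores = torch.full((batch, num_classes), -wrong_penalty)
        scores[torch.arange(batch), targets] = 1.0
        scores[:, abstain_idx] = 0.0

        expected_score = (probs * scores).sum(dim=-1)
        return -expected_score.mean()

    # Usage: 4 answer choices plus 1 abstain class.
    logits = torch.randn(8, 5, requires_grad=True)
    targets = torch.randint(0, 4, (8,))
    negative_marking_loss(logits, targets, abstain_idx=4).backward()

The point of the abstain class is that, with wrong_penalty > 0, a sufficiently unsure model scores better by abstaining than by guessing.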

ACCount37 5 days ago | parent | next [-]

In RLVR (reinforcement learning from verifiable rewards)? Quite easily.

And OpenAI induced the hallucinations in o3 with RLVR mistakes, not with a failed pre-training run. They cited o4-mini as an example: similar training to o3, and similar issues.

Conversely, they have also designed a post-training system that has successfully reduced hallucinations in GPT-5.
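For the curious, a sketch of why this is straightforward in RLVR (my own assumptions about the setup, not OpenAI's actual recipe): the reward attaches to a whole sampled completion after a verifier checks it, so tokenization and synonymous phrasings don't matter, and a wrong answer can simply be scored below an abstention.

    # Sketch of sequence-level reward shaping for RLVR-style training.
    # The reward values and helper callables are illustrative assumptions.

    def reward(completion: str, verify, is_abstention) -> float:
        """Score a whole sampled completion, not individual tokens."""
        if is_abstention(completion):
            return 0.0   # saying "I don't know" is neutral
        if verify(completion):
            return 1.0   # verified correct answer
        return -0.5      # a confident wrong answer scores *below* abstaining

    # In a REINFORCE-style update, each completion's log-probability is
    # scaled by its reward (minus a baseline), so wrong answers receive
    # a genuinely negative learning signal:
    #
    #     loss = -(reward(c) - baseline) * logprob(c)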

RugnirViking 5 days ago | parent | prev [-]

Isn't this just related to the question of "how do you train a transformer"? You give it wrong examples and use optimization algorithms to move away from that kind of completion.
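One concrete version of this is unlikelihood training (Welleck et al., 2019), which adds a loss term that pushes probability mass off the tokens of known-bad completions. A minimal sketch (names and shapes are my own, illustrative only):

    import torch

    def unlikelihood_loss(logprobs, bad_token_ids):
        """Penalize probability mass on tokens of a known-wrong completion.

        logprobs:      (seq_len, vocab) log-probabilities from the model.
        bad_token_ids: (seq_len,) token ids of the wrong completion.
        Loss is -log(1 - p(bad_token)) per position: it goes to zero as
        the model stops predicting the bad tokens, and grows if it insists.
        """
        p_bad = logprobs.gather(1, bad_token_ids.unsqueeze(1)).exp().squeeze(1)
        return -torch.log1p(-p_bad.clamp(max=1 - 1e-6)).mean()

    # Usage with random stand-ins for model output and a bad completion.
    logprobs = torch.log_softmax(torch.randn(10, 32000), dim=-1)
    bad_ids = torch.randint(0, 32000, (10,))
    print(unlikelihood_loss(logprobs, bad_ids))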

throwawaymaths 5 days ago | parent [-]

That's quite hard, for the reasons I explained. It might be solvable with Q-learning techniques, but those aren't easy to apply in the context of transformers, IIUC.
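For what the Q-learning idea might look like (purely a sketch of the shape of the problem, not a working recipe): treat each generated token as an action, let the verifier's score be a sparse reward at the end of the sequence, and bootstrap per-token value targets backwards with a Bellman backup. The hard part is exactly that the reward is sparse and the "action space" is the whole vocabulary.

    # Sketch: propagate a terminal, answer-level reward back into
    # per-token Q-targets. Everything here is an illustrative assumption.

    def q_targets(next_state_values, terminal_reward, gamma=1.0):
        """next_state_values: a critic's max-Q estimate for the state
        after each non-final token (length T-1 for a T-token answer).
        Target for step t is r_t + gamma * max_a Q(s_{t+1}, a); the
        reward r_t is zero everywhere except the final step, where the
        verifier scores the whole answer (+1 / 0 / -penalty)."""
        targets = [gamma * v for v in next_state_values]  # steps 0..T-2
        targets.append(terminal_reward)                   # final step T-1
        return targets

    # Example: a 4-token answer the verifier marked wrong.
    print(q_targets([0.9, 0.7, 0.4], terminal_reward=-0.5))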