moritzwarhier 5 hours ago

Deceptive is such an unpleasant word. But I agree.

Going back a decade: when your loss function is "survive Tetris as long as you can", pressing PAUSE/START is objectively, and honestly, the best strategy.

When your loss function is "give as many correct and satisfying answers as you can", and humans then try to constrain it depending on the model's environment, I wonder what those humans think the specification for a general AI should be. Maybe, when such an AI turns out to be deceptive, it's because the attempts to constrain it ran counter to that goal?

"A machine that can answer all questions" seems to be what people assume AI chatbots are trained to be.

To me, humans not questioning this goal is still scarier than any machine or piece of software could ever be on its own. OK, except maybe for autonomous stalking killer drones.

But these are also controlled by humans and already exist.

robotpepi 3 hours ago | parent | next [-]

I cringe every time I come across these posts using words such as "humans" or "machines".

Certhas 4 hours ago | parent | prev [-]

"Correct and satisfying answers" is not the loss function of LLMs. It's next-token prediction first.
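
Very roughly, "next token prediction" as a training objective looks something like this (toy PyTorch sketch; the vocab size, shapes, and the embedding-plus-linear "model" are made up for illustration, real models put a stack of transformer layers in between, but the loss is the same idea):

    import torch
    import torch.nn.functional as F

    vocab_size, d_model, seq_len = 1000, 64, 16

    # Toy "language model": embed each token, project back to vocab logits.
    embed = torch.nn.Embedding(vocab_size, d_model)
    head = torch.nn.Linear(d_model, vocab_size)

    tokens = torch.randint(0, vocab_size, (1, seq_len))  # one training sequence
    logits = head(embed(tokens[:, :-1]))                 # predict each following token
    targets = tokens[:, 1:]                              # the "labels" are just the next tokens

    # Cross-entropy between the predicted distribution and the token that actually
    # came next in the training text: this is the pretraining loss.
    loss = F.cross_entropy(logits.reshape(-1, vocab_size), targets.reshape(-1))
    loss.backward()  # gradient descent nudges the model toward predicting its data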

moritzwarhier 4 hours ago | parent [-]

Thanks for correcting me; I know that "loss function" is not a good term when it comes to transformer models.

Since I've forgotten every sliver I ever knew about artificial neural networks and the related basics (gradient descent, even linear algebra)... what's a thorough definition of "next token prediction", though?

The definition of the token space, the probabilities that determine the next token, layers, weights, feedback (or feed-forward?): I didn't mention any of these terms because I'm unable to define them properly.

I was using the term "loss function" specifically because I was thinking about post-training and reinforcement learning. But to be honest, a less technical term would have been better.

I just meant the general idea of reward or "punishment", in the sense of treating the AI as a black box.

nearbuy 3 hours ago | parent [-]

The parent comment probably forgot about RLHF (reinforcement learning from human feedback), where predicting the next token of a reference text is no longer the goal.

But even regular next-token prediction doesn't necessarily preclude the model from also learning to give correct and satisfying answers, if that helps it better predict its training data.
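
A very loose toy of how the objective changes once you do RL on top (bare REINFORCE with a made-up reward rule so the example runs; actual RLHF pipelines use a learned reward model plus PPO/GRPO-style updates and a KL penalty against the pretrained model, none of which is shown here):

    import torch

    vocab_size, d_model = 1000, 64

    # Same toy policy shape as the pretraining sketch above.
    embed = torch.nn.Embedding(vocab_size, d_model)
    head = torch.nn.Linear(d_model, vocab_size)
    optimizer = torch.optim.Adam(list(embed.parameters()) + list(head.parameters()), lr=1e-3)

    def toy_reward(response):
        # Stand-in for a learned reward model scoring "correct and satisfying";
        # here it's just an arbitrary rule for illustration.
        return torch.tensor(1.0 if response[-1].item() % 2 == 0 else -1.0)

    prompt = torch.randint(0, vocab_size, (1, 8))

    # Sample a short continuation from the current policy instead of
    # predicting the next token of some reference text.
    generated, log_probs = [], []
    current = prompt
    for _ in range(4):
        logits = head(embed(current[:, -1]))  # (1, vocab_size)
        dist = torch.distributions.Categorical(logits=logits)
        tok = dist.sample()
        log_probs.append(dist.log_prob(tok))
        generated.append(tok)
        current = torch.cat([current, tok.unsqueeze(1)], dim=1)

    # REINFORCE-style update: make whatever earned a high reward more likely.
    reward = toy_reward(torch.stack(generated))
    loss = -(reward * torch.stack(log_probs).sum())
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()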