They can do it; it's just not "by default". They need to be prompted to do it. So at least the danger is manageable if you know what you're doing and how to prompt around it.

▲

saghm 21 minutes ago | parent | next [-]

"Just don't accidentally forget to do the thing that makes it safe" is not a very effective strategy for something that so many vested interests are trying to push into all corners of society. If it's so easy to misuse, then it shouldn't be used in any context except those where there are no major consequences for bad output and there's ample opportunity and ability to validate it.

▲

Bridged7756 3 hours ago | parent | prev | next [-]

Not really. They're still non-deterministic language predictors. Believing that a prompt is an effective way to actually control these machines' behavior is really far-fetched.

They come like that from the factory. Hardcoded to never say no.

▲

LPisGood 3 hours ago | parent | next [-]

The thing is that they are completely incapable of meta-cognition. Reasoning models don’t show their actual reasoning at all.

▲

DonaldPShimoda 3 hours ago | parent [-]

Right — they're not reasoning, they're generating text that statistically models reasoning. Anyone who says differently is selling something.

▲

jeremyjh an hour ago | parent [-]

That is what a base model does. After RL it is a very different thing, and anyone who says they know what it is, is naive or dishonest. These things are grown, not made, and we really do not understand how they work in many important ways.

▲

eloisant 3 hours ago | parent | prev | next [-]

They're not hardcoded to never say no, but some of the models were trained to be "yes men" because their creators thought it would be a good property to have. GPT-4o for example.

▲

wat10000 3 hours ago | parent | prev [-]

Not believing that a prompt is an effective way to actually control their behavior is obviously incorrect to anyone who's actually used these things.

It's not a guaranteed way to control their behavior, but you can more than move the needle.

▲

wwweston 17 minutes ago | parent | next [-]

The word most relevant to this conversation is “influence.” Influence is possible and users observe it and use it to increase margins of useful outcomes. “Control” is incorrect.

▲

fl4regun 3 hours ago | parent | prev [-]

Yeah, that distinction is pretty important, and I believe that guy IS making that point: if you cannot control it with guaranteed outcomes, you cannot control it.

▲

gwerbin an hour ago | parent | next [-]

You can't control it any more than you can control a draw from a deck of cards, but you can absolutely control the deck of cards that you choose to draw from.

▲

wat10000 24 minutes ago | parent | prev [-]

That's silly. My car is not absolutely guaranteed to turn left when I turn the steering wheel left, but you wouldn't say I can't control my car on that basis. Steering an LLM with a prompt is way less reliable than steering a car with a steering wheel, but there's still control. It's just not absolute.

▲

romaniv 3 hours ago | parent | prev [-]

[dead]