▲ | TeMPOraL 18 hours ago | |||||||||||||||||||||||||||||||||||||
I'm curious how you ended up in such a conversation in the first place. Hallucinations are one thing, but I can't remember the last time when the model was saying that it actually run something somewhere that wasn't a tool use call, or that it owns a laptop, or such - except when role-playing. I wonder if the advice on prompting models to role play isn't backfiring now, especially in conversational setting. Might even be a difference between "you are an AI assistant that's an expert programmer" vs. "you are an expert programmer" in the prompt, the latter pushing it towards "role-playing a human" region of the latent space. (But also yeah, o3. Search access is the key to cutting down on amount of guessing the answers, and o3 is using it judiciously. It's the only model I use for "chat" when the topic requires any kind of knowledge that's niche or current, because it's the only model I see can reliably figure out when and what to search for, and do it iteratively.) | ||||||||||||||||||||||||||||||||||||||
▲ | westoncb 18 hours ago | parent | next [-] | |||||||||||||||||||||||||||||||||||||
I've seen that specific kind of role-playing glitch here and there with the o[X] models from openai. The models do kinda seem to just think of themselves as being developers with their own machines.. I think it usually just doesn't come up but can easily be tilted into it. | ||||||||||||||||||||||||||||||||||||||
▲ | bradly 18 hours ago | parent | prev | next [-] | |||||||||||||||||||||||||||||||||||||
What is really interesting is in the "thinking" section it said "I need to reassure the user..." so my intuition is that it thought it was right, but did not think I would think they were right, but if they just gave me the confidence, I would try the code and unblock myself. Maybe it thought this was the best % chance I would listen to it and so it is the correct response? | ||||||||||||||||||||||||||||||||||||||
| ||||||||||||||||||||||||||||||||||||||
▲ | agos 13 hours ago | parent | prev [-] | |||||||||||||||||||||||||||||||||||||
A friend recently had a similar interaction where ChatGPT told them that it had just sent them an email or a wetransfer with the requested file |