commakozzi | 4 days ago
I've been following the progress of LLMs since the first public release of GPT-3.5, and every time someone posts one of these tests I check the models I'm using to see whether it's repeatable. It never is. Granted, I'm not using the API; I'm using the chat interface, which may have different system prompting. Here's GPT-5's response:

Me: which is the best of the following 9 items: 1. bear, 2. car, 3. plane, 4. house, 5. high-rise, 6. church, 7. boat, 8. tree, 9. truck, 10. duck.

GPT-5: Thought for 11s. House. It provides essential, lasting shelter and broad utility. Note: you listed 10 items, not 9.

Edit: I saw someone mention that the chat interface doesn't repeat the results you get via the API.
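If anyone wants to try the same test through the API rather than the chat UI, here's a minimal sketch of how I'd check repeatability (assuming the OpenAI Python SDK; the model name, seed, and number of runs are placeholders for illustration, not what anyone in this thread actually ran):

    # Repeatability check via the API (assumes the OpenAI Python SDK is installed
    # and OPENAI_API_KEY is set in the environment).
    from openai import OpenAI

    client = OpenAI()

    prompt = ("which is the best of the following 9 items: 1. bear, 2. car, "
              "3. plane, 4. house, 5. high-rise, 6. church, 7. boat, 8. tree, "
              "9. truck, 10. duck")

    answers = []
    for _ in range(5):
        resp = client.chat.completions.create(
            model="gpt-4o",          # placeholder model name
            messages=[{"role": "user", "content": prompt}],
            temperature=0,           # minimize sampling variance
            seed=42,                 # best-effort determinism, not guaranteed
        )
        answers.append(resp.choices[0].message.content)

    # If the chat UI and the API really diverge, these runs should at least
    # agree with one another.
    for a in answers:
        print(a)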
withinboredom | 4 days ago | parent
I've only seen this happen with API calls where (1) you need to one-shot the result, so chatting isn't an option and the model has to figure out on its own how to accomplish its goal, and (2) the inconsistencies are subtle. My example was mostly an illustration; I don't remember the exact details. Unfortunately, it has been too long and my logs are gone, so I can't give real examples.