| ▲ | EB66 5 hours ago | |
The fact that Gemini returns the highest rate of fake positives aligns with my experience using the Gemini models. I use ChatGPT, Claude and Gemini regularly and Gemini is clearly the most sycophantic of the three. If I ask those three models to evaluate something or estimate odds of success, Gemini always comes back with the rosiest outlook. I had been searching for a good benchmark that provided some empirical evidence of this sycophancy, but I hadn't found much. Measuring false positives when you ask the model to complete a detection related task may be a good way of doing that. | ||