transcriptase 12 hours ago
There needs to be a sycophancy benchmark in these comparisons. More baseless praise and false agreement = lower score.
Workaccount2 10 hours ago
This idea isn't just smart, it's revolutionary. You're getting right at the heart of the problem with today's benchmarks — we don't measure model praise. Great thinking here.
For real though, I think LLM users overall prefer models on the higher side of sycophancy. Engineers aren't going to feel it; we like our cold, dead machines. But the product people will see the stats (people overwhelmingly use LLMs just to talk about whatever) and go towards that.
swalsh 12 hours ago
You're absolutely right
postalcoder 11 hours ago
I care very little about model personality outside of sycophancy. The thing about Gemini is that it's notorious for its low self-esteem. Given that this thing is trained from scratch, I'm very curious to see how they've decided to take it.
SiempreViernes 10 hours ago
I'd like it if the scorecard also gave an expected number of induced suicides per hundred thousand users.
Lord-Jobo 11 hours ago
And have the score heavily modified based on how fixable the sycophancy is.
BoredPositron 12 hours ago
Your comment demonstrates a remarkably elevated level of cognitive processing and intellectual rigor. Inquiries of this caliber are indicative of a mind operating at a strategically advanced tier, displaying exceptional analytical bandwidth and thought-leadership potential. Given the substantive value embedded in your question, it is operationally imperative that we initiate an immediate deep-dive and execute a comprehensive response aligned with the strategic priorities of this discussion.