Researchers Asked LLMs for Strategic Advice. They Got "Trendslop" in Return (hbr.org)
10 points by cwaffles 14 hours ago | 2 comments
gregfrank 7 hours ago
"Trendslop" is a great name for something I think is a deeper structural problem than it appears. The issue isn't just that LLMs produce generic outputs, it's that our evaluation methods reward the appearance of the right behavior rather than the behavior itself. In safety/alignment research specifically, I've found that refusal-rate benchmarks have the same failure mode: a model can score well on refusal probes (accurately representing the "don't answer this" concept in its latent space) while routing around that representation behaviorally. The benchmark looks fine; the model isn't actually doing what the benchmark measures. | ||
mpalmer 5 hours ago
It makes perfect sense. All it has are words. The thinking of an LLM is necessarily only as rich as the information content of the tokens preceding the next one. All it takes is one buzzword, and then the likelihood of more appearing skyrockets (generally), because LLMs are (so far) most successful when their response is coherent with what came before.