nomel 2 days ago
> I realized that this task is actually a really good fit for LLMs

I've found the opposite, since these models still fail pretty wildly at nuance. I think it's a conceptual "needle in the haystack" sort of problem. A good test is to find some thread where there's a disagreement and have the model try to analyze the discussion. It will usually strongly misrepresent what each side was saying, align strongly with one user, and miss the actual divide causing the disagreement (the needle).
gowld 2 days ago
As always, which model versions did you use in your test?