syndacks 6 hours ago

How do people evaluate creative writing and emotional intelligence in LLMs? Most benchmarks seem to focus on reasoning or correctness, which feels orthogonal. I’ve been playing with Kimi K2.5 and it feels much stronger on voice and emotional grounding, but I don’t know how to measure that beyond human judgment.

mohsen1 2 hours ago

I am trying! https://mafia-arena.com

I just don't have enough funding to run a ton of tests.