BoorishBears 5 days ago
I don't think this works right now tbh. It has the same problem as LMArena (which already had WebDev Arena): better aesthetics are so far out of distribution that you can't even train on the feedback you get here. You just get a new form of turbo-slop as some hidden preference takes over. With text output, that ended up being extensive markdown and emojis. Here, it might be people accidentally associating frosted surfaces with better aesthetics, for example. The problem is so bad that LMArena maintains a separate ranking with style control, which tries to factor styling out of the scores.
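Here's a toy sketch of how a hidden style preference can flip a pairwise ranking, and how adding a style term to a Bradley-Terry fit can pull it back out. Everything in it is made up for illustration: three hypothetical models, a single binary "styled" feature per response (think frosted surfaces), invented quality scores, styling rates, and voter bias. This is not LMArena's code, and their real style control uses its own set of style features.

```python
import math

# Hypothetical setup (all numbers invented for illustration).
MODELS = ["A", "B", "C"]
TRUE_QUALITY = [1.0, 0.5, 0.0]  # latent quality each model "really" has
STYLE_RATE = [0.1, 0.9, 0.1]    # how often each model emits the styled look
STYLE_BIAS = 2.0                # hypothetical voter bonus for the styled look


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def expected_battles():
    """Enumerate every (i, j, style_i, style_j) battle with its probability
    weight and the voters' expected win rate for model i."""
    data = []
    for i in range(len(MODELS)):
        for j in range(i + 1, len(MODELS)):
            for s_i in (0, 1):
                for s_j in (0, 1):
                    w = ((STYLE_RATE[i] if s_i else 1 - STYLE_RATE[i])
                         * (STYLE_RATE[j] if s_j else 1 - STYLE_RATE[j]))
                    # Voters reward quality AND the styled look.
                    logit = (TRUE_QUALITY[i] - TRUE_QUALITY[j]
                             + STYLE_BIAS * (s_i - s_j))
                    data.append((i, j, s_i, s_j, w, sigmoid(logit)))
    return data


def fit_bradley_terry(data, control_style, lr=0.5, steps=5000):
    """Fit Bradley-Terry scores by full-batch gradient descent on weighted
    cross-entropy. With control_style=True, a shared style coefficient
    gamma absorbs the part of each win explained by styling."""
    theta = [0.0] * len(MODELS)
    gamma = 0.0
    for _ in range(steps):
        g_theta = [0.0] * len(MODELS)
        g_gamma = 0.0
        for i, j, s_i, s_j, w, p in data:
            z = theta[i] - theta[j]
            if control_style:
                z += gamma * (s_i - s_j)
            err = w * (sigmoid(z) - p)  # gradient of the loss w.r.t. z
            g_theta[i] += err
            g_theta[j] -= err
            g_gamma += err * (s_i - s_j)
        theta = [t - lr * g for t, g in zip(theta, g_theta)]
        if control_style:
            gamma -= lr * g_gamma
    return theta, gamma


data = expected_battles()
naive, _ = fit_bradley_terry(data, control_style=False)
controlled, gamma = fit_bradley_terry(data, control_style=True)
print("naive:     ", {m: round(t, 2) for m, t in zip(MODELS, naive)})
print("controlled:", {m: round(t, 2) for m, t in zip(MODELS, controlled)},
      "gamma:", round(gamma, 2))
```

In the naive fit, B (the mostly-styled model) ends up on top even though A is better; with the style term, A is back ahead of B and gamma lands near the injected bias. The catch the comment is pointing at: this only works for style features you thought to measure, so a preference nobody has named yet (like frosted surfaces) still leaks into the scores.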