| ▲ | evilduck 2 hours ago | |
Can a benchmark meant as a joke not use a fun interpretation of results? The Qwen result has far better style points. Fun sunglasses, a shadow, a better ground, a better sky, clouds, flowers, etc. If we want to get nitty gritty about the details of a joke, a flamingo probably couldn't physically sit on a unicycle's seat and also reach the pedals anyways. | ||