simonw 7 hours ago
It's generally one-shot only: whatever comes out the first time is what I go with. I've been contemplating a fairer version where each model gets 3-5 attempts and then selects which rendered image is "best".
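For illustration, the multi-attempt version could be sketched roughly as below. This is only a sketch under assumptions: `best_of_n`, `generate`, and `choose` are hypothetical names, not part of any existing tool, and how a model actually renders or judges an image is left to the caller.

```python
from typing import Callable, List, TypeVar

T = TypeVar("T")


def best_of_n(
    generate: Callable[[], T],
    choose: Callable[[List[T]], int],
    attempts: int = 3,
) -> T:
    """Collect `attempts` candidates, then return the one `choose` picks.

    `generate` (hypothetical) produces one candidate, e.g. one rendered image.
    `choose` (hypothetical) returns the index of the candidate it prefers,
    e.g. by asking the same model which render is "best".
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    candidates = [generate() for _ in range(attempts)]
    index = choose(candidates)
    if not 0 <= index < len(candidates):
        raise IndexError("choose returned an out-of-range index")
    return candidates[index]
```

With `attempts=1` this reduces to the current one-shot behaviour, since the only candidate is the one that gets picked.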
irthomasthomas 7 hours ago
Try llm-consortium with --judging-method rank
andriy_koval 7 hours ago
I think that would make the results much better and more representative of model abilities.