andriy_koval 6 hours ago

What is your setup for drawing the pelican? Do you ask the model to check the generated image, find issues, and iterate on it? That would demonstrate the model's real abilities.

simonw 6 hours ago | parent [-]

It's generally one-shot-only - whatever comes out the first time is what I go with.

I've been contemplating a more fair version where each model gets 3-5 attempts and then can select which rendered image is "best".
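
A rough sketch of what that best-of-N version could look like, purely as an illustration. The names `best_of_n`, `generate`, and `pick_best` are hypothetical and not part of any existing tool; `generate` stands in for "ask the model for an SVG" and `pick_best` for "show the model the rendered images and let it choose one".

```python
from typing import Callable, List, Tuple


def best_of_n(
    generate: Callable[[int], str],
    pick_best: Callable[[List[str]], int],
    attempts: int = 3,
) -> Tuple[int, str]:
    """Collect `attempts` candidates, then let a selector choose one by index.

    Both callables are placeholders (assumptions): in a real run, `generate`
    would call a model and `pick_best` would ask that same model to judge
    the rendered candidates.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")

    # One independent attempt per index; no feedback between attempts.
    candidates = [generate(i) for i in range(attempts)]

    # The selector returns the index of the candidate it considers "best".
    chosen = pick_best(candidates)
    if not 0 <= chosen < len(candidates):
        raise ValueError(f"selector returned invalid index {chosen}")

    return chosen, candidates[chosen]


if __name__ == "__main__":
    # Stub callables so the sketch runs without any model access.
    fake_generate = lambda i: f"<svg><!-- attempt {i} --></svg>"
    fake_pick = lambda cands: len(cands) - 1  # pretend the last one is best
    print(best_of_n(fake_generate, fake_pick, attempts=3))
```

Passing the generator and the selector in as separate callables would let the same harness cover both the self-selection idea and an external judge.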

irthomasthomas 6 hours ago | parent | next [-]

Try llm-consortium with --judging-method rank

andriy_koval 6 hours ago | parent | prev [-]

I think it will make the results way better and more representative of model abilities.

simonw 6 hours ago | parent [-]

It would... but the test is inherently silly, so I'm still not sure if it's worth me investing that extra effort in it.