cannoneyed 6 hours ago
Interestingly enough, even the big guns couldn't reliably act as judges. I think there are a few reasons for that:
- the way they represent image tokens isn't conducive to this kind of task
- text-to-image space is actually quite finicky; it's basically impossible to describe to the model what trees ought to look like and have it "get it"
- there's no reliable way to few-shot prompt these models for image tasks yet (!!)