voidUpdate 3 days ago
Well it's good to see they are showcasing examples where the model really fails too.

- The second one in case 2 doesn't look anything like the reference map
- The face in case 5 changes completely despite the model being instructed to not do that
- Case 8 ignores the provided pose reference
- Case 9 changes the car positions
- Case 16 labels the tricuspid in the wrong place and I have no idea what a "mittic" is
- Case 27 shows the usual "models can't do text" though I'm not holding that against it too much
- Same with case 29, as well as the text that is readable not relating to the parts of the image it is referencing
- Case 33 just generated a generic football ground
- Case 37 has nonsensical labellings ("Define Jawline" attached to the eye)
- Case 58 has the usual "models don't understand what a wireframe is", but again I'm not holding that against it too much

Super nice to see how honest they are about the capabilities!
zahlman 3 days ago
> - Case 16 labels the tricuspid in the wrong place and I have no idea what a "mittic" is
> - Case 27 shows the usual "models can't do text" though I'm not holding that against it too much

16 makes it seem like it can "do text", almost, if we don't care what it says. But it looks very crisp until you notice the "Pul??nary Artereys".

I'd say the bigger problem with 27 is that asking to add a watermark also took the scroll out of the woman's hands.

(While I'm looking, 28 has a lot of things wrong with it on closer inspection. I said 26 originally because I randomly woke up in the middle of the night for this and apparently I don't know which way I'm scrolling.)
iyk 3 days ago
In Case 16 (diagram of the heart), every single label (aside from the superior vena cava) is incorrect.
muzani 3 days ago
Yeah, I appreciate this kind of benchmarking too. That other Gen AI Showdown in the comments also does a good job with this - mentions that it was best of 8 attempts and so on.
lm28469 3 days ago
47 is also very questionable

48 is impossible to do in a way that is accurate and meaningful