voidUpdate 3 days ago

Well it's good to see they are showcasing examples where the model really fails too.

- The second one in case 2 doesn't look anything like the reference map

- The face in case 5 changes completely despite the model being instructed to not do that

- Case 8 ignores the provided pose reference

- Case 9 changes the car positions

- Case 16 labels the tricuspid in the wrong place and I have no idea what a "mittic" is

- Case 27 shows the usual "models can't do text" though I'm not holding that against it too much

- Same with case 29, and the text that is readable doesn't relate to the parts of the image it is labelling

- Case 33 just generated a generic football ground

- Case 37 has nonsensical labellings ("Define Jawline" attached to the eye)

- Case 58 has the usual "models don't understand what a wireframe is", but again I'm not holding that against it too much

Super nice to see how honest they are about the capabilities!

zahlman 3 days ago

> - Case 16 labels the tricuspid in the wrong place and I have no idea what a "mittic" is

> - Case 27 shows the usual "models can't do text" though I'm not holding that against it too much

16 makes it seem like it can "do text" — almost, if we don't care what it says. But it looks very crisp until you notice the "Pul??nary Artereys".

I'd say the bigger problem with 27 is that asking to add a watermark also took the scroll out of the woman's hands.

(While I'm looking, 28 has a lot of things wrong with it on closer inspection. I said 26 originally because I randomly woke up in the middle of the night for this and apparently I don't know which way I'm scrolling.)

voidUpdate 3 days ago

EDIT: Yeah, on closer inspection, 28 is definitely a bit screwy. I wasn't clicking on the images themselves to view the enlarged ones, and from the preview I didn't see anything that immediately jumped out at me. I have no idea what that line at the bottom is meant to represent!

Also you're right, I didn't notice the scroll had gone, though on another look, it has also removed the original prompter's watermark.

iyk 3 days ago

In Case 16 (diagram of the heart), every single label (aside from the superior vena cava) is incorrect.

muzani 3 days ago

Yeah, I appreciate this kind of benchmarking too. That other Gen AI Showdown in the comments also does a good job with this - mentions that it was best of 8 attempts and so on.

lm28469 3 days ago

47 is also very questionable

48 is impossible to do in a way that is accurate and meaningful