| ▲ | cubefox 3 hours ago | |
Well, as I said, if I type "cat", the most reasonable interpretation of that text string is a perfectly realistic cat. If I want an "illustration" I can type in "illustration of a cat". Though of course that's still quite unspecific. There are countless possible unrealistic styles for pictures (e.g. line art, manga, oil painting, vector art etc), and the reasonable thing is that the users should specify which of these countless unrealistic styles they want, if they want one. If I just type in "cat" and the model gives me, say, a water color picture of a cat, it is highly improbable that this style happens to be actually what I wanted. | ||
| ▲ | observationist an hour ago | parent [-] | |
If I want a badly drawn, salad fingers inspired scrawl of a mangy cat, it should be possible. If I want a crisp, xkcd depiction of a cat, it should capture the vibe, which might be different from a stick fighters depiction of a cat, or "what would it look like if George Washington, using microsoft paint for the first time, right after stepping out of the time machine, tried to draw a cat" I think we'll probably need a few more hardware generations before it becomes feasible to use chatgpt 5 level models with integrated image generation. The underlying language model and its capabilities, the RL regime, and compute haven't caught up to the chat models yet, although nano-banana is certainly doing something right. | ||