eminence32 · 11 hours ago

> Generate better visuals with more accurate, legible text directly in the image in multiple languages

Assuming that this new model works as advertised, it's interesting to me that it took this long to get an image generation model that can reliably generate text. Why is text generation in images so hard?
Filligree · 11 hours ago

It's not necessarily harder than other aspects. However:

- It requires an AI that actually understands English, i.e. an LLM. Older, diffusion-only models were naturally terrible at that, because they weren't trained on it.

- It requires the AI to make no mistakes in image rendering, and that's a high bar. Mistakes in image generation are so common we have memes about them, and for all that hands generally work fine now, the rest of the picture is full of mistakes you can't tell are mistakes. That kind of hiding is impossible with text.

Nano Banana Pro seems to somewhat reliably produce entire pictures without any mistakes at all.
tobr · 11 hours ago

As a complete layman, it seems obvious that it should be hard? Text is a type of graphic that needs to be coherent both in its details and in its large-scale structure, and there's only a very small range of variation that we don't immediately notice as strange or flat-out incorrect. That's not true of most types of imagery.