▲ | vineyardmike 20 hours ago |
> It seems to me like this is a new hybrid product for -vibe coders- because otherwise the -wrapping- of prompting/improving a prompt with an LLM before hitting the text2image model can certainly be done, as you say, cheaper if you just run it yourself.

In case you didn't know, it's not just wrapping a text2image model in an LLM. The image model they're referencing is directly integrated into the LLM: the LLM itself outputs tokens that are part of the image, so there is no separate image model to extract and run on its own.

That said, they're definitely trying to focus on building products over raw models now. They want to be a consumer subscription business instead of a commodity model provider.
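To make the distinction concrete, here's a minimal sketch of what "tokens that are part of the image" means, assuming a discrete image codebook (VQ-VAE style) and sentinel tokens marking the image span. Everything here (the stub model, the token ids) is hypothetical, for illustration only:

    # Minimal sketch: text and image share one autoregressive stream, and
    # image tokens are codebook indices for a learned VQ-style decoder.
    # StubUnifiedModel and the sentinel ids are hypothetical.
    IMG_START, IMG_END = 100_001, 100_002  # hypothetical sentinel token ids

    class StubUnifiedModel:
        """Stands in for a real multimodal LLM; replays a canned stream."""
        def sample_next(self, context):
            script = [11, 42, IMG_START, 7, 7, 9, IMG_END, 13]
            return script[len(context) % len(script)]

    def generate(model, prompt_ids, max_tokens=8):
        context, text_ids, image_ids, in_image = list(prompt_ids), [], [], False
        for _ in range(max_tokens):
            tok = model.sample_next(context)
            context.append(tok)
            if tok == IMG_START:
                in_image = True
            elif tok == IMG_END:
                in_image = False
            elif in_image:
                image_ids.append(tok)   # indices into the image codebook
            else:
                text_ids.append(tok)    # ordinary text tokens
        # image_ids would feed a pixel decoder; text_ids the detokenizer.
        # The same weights produced both, which is why you can't pull the
        # "image model" out and call it separately.
        return text_ids, image_ids

    print(generate(StubUnifiedModel(), prompt_ids=[1, 2]))
    # -> ([13, 11, 42], [7, 7, 9])

A wrapper setup, by contrast, would hand a finished prompt string to a separate model; here the "image model" and the LLM are the same forward pass.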
▲ | tough 20 hours ago | parent | next |
Right! I forgot the new model is a multi-modal one, generating image outputs from both image and text inputs. I guess this is good, and the price will come down eventually. Waiting for some FOSS multi-modal model to come out too. Great to see OpenAI expanding into making actually usable products, I guess.
▲ | spilldahill 20 hours ago | parent | prev |
Yeah, the integration is the real shift here. By embedding image generation into the LLM's token stream, it's no longer a pipeline of separate systems but a single unified model interface. That unlocks use cases where you can reason, plan, and render all in one flow. It's not just about replacing diffusion models; it's about making generation part of a broader agentic loop. Pricing will drop over time, but the shift in how you build with this is the more interesting part.
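As a rough sketch of what "reason, plan, and render all in one flow" buys you over the old prompt-then-render pipeline (all names hypothetical; the stub stands in for a real multimodal model):

    # Rough sketch of reason/plan/render in one loop. FakeUnifiedModel is a
    # hypothetical stand-in; a real model would decode text and image tokens
    # from the same stream, so the critique step sees the actual render.
    class FakeUnifiedModel:
        def __init__(self):
            self.round = 0
        def generate(self, history):
            self.round += 1
            if self.round < 3:
                return ("lighting is off, re-rendering", f"image_v{self.round}")
            return ("DONE", f"image_v{self.round}")

    def unified_loop(model, request, max_rounds=5):
        history, image = [("user", request)], None
        for _ in range(max_rounds):
            text, image = model.generate(history)  # one call, both modalities
            history.append(("assistant", text))
            if "DONE" in text:                     # model judged its own render
                break
            history.append(("user", "revise based on your critique"))
        return image

    print(unified_loop(FakeUnifiedModel(), "a cat reading a newspaper"))
    # -> image_v3, after two self-critique rounds

With a bolted-on diffusion model, each arrow in that loop is a lossy hand-off through a prompt string; with a unified token stream, the critique and the render live in the same context window.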