tough 20 hours ago

It seems to me like this is a new hybrid product for "vibe coders", because otherwise the wrapping of prompting/improving a prompt with an LLM before hitting the text2image model can certainly, as you say, be done cheaper if you just run it yourself.

Maybe OpenAI thinks the model business is over and they need to start sherlocking all the way from the top down to final apps (thus their interest in buying Cursor, and finally ending up with Windsurf).

Idk, this feels like a new offering between a raw API and a final product, where they abstract some of it away for a few cents. They're basically bundling their SOTA LLMs with their image models for extra margin.

vineyardmike 20 hours ago | parent [-]

> It seems to me like this is a new hybrid product for "vibe coders", because otherwise the wrapping of prompting/improving a prompt with an LLM before hitting the text2image model can certainly, as you say, be done cheaper if you just run it yourself.

In case you didn’t know, it’s not just wrapping in an LLM. The image model they’re referencing is directly integrated into the LLM. It’s not possible to extract, because the LLM outputs tokens that are part of the image itself.
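To make the "can't extract it" point concrete, here's a toy sketch (my own illustration, not OpenAI's actual architecture or vocabulary sizes): in a unified multimodal decoder, image-patch tokens share the autoregressive output stream with text tokens, so there is no separate "prompt -> image model" boundary to intercept.

```python
# Toy sketch of a unified multimodal token stream. All numbers are
# made up for illustration; real models differ.

TEXT_VOCAB = 50_000           # hypothetical text vocabulary size
IMAGE_VOCAB = 8_192           # hypothetical VQ codebook size for image patches
IMAGE_OFFSET = TEXT_VOCAB     # image token ids live above the text range

def split_stream(tokens):
    """Demultiplex one interleaved stream into text and image tokens."""
    text, image = [], []
    for t in tokens:
        if t >= IMAGE_OFFSET:
            image.append(t - IMAGE_OFFSET)   # index into the image codebook
        else:
            text.append(t)
    return text, image

# A made-up sample stream: the model "talks", then emits image patches.
stream = [17, 942, 50_003, 50_777, 21, 57_001]
text_tokens, image_tokens = split_stream(stream)
print(text_tokens)   # [17, 942, 21]
print(image_tokens)  # [3, 777, 7001]
```

The point of the sketch: the image exists only as tokens produced by the same decoder, so there's no standalone text2image model you could call directly with a polished prompt.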

That said, they’re definitely trying to focus on building products over raw models now. They want to be a consumer subscription instead of a commodity model provider.

tough 20 hours ago | parent | next [-]

Right! I forgot the new model is a multi-modal one, generating image outputs from both image and text inputs. I guess this is good, and the price will come down eventually.

Waiting for some FOSS multi-modal model to come out eventually too.

Great to see OpenAI expanding into making actually usable products, I guess.

spilldahill 20 hours ago | parent | prev [-]

Yeah, the integration is the real shift here. By embedding image generation into the LLM’s token stream, it’s no longer a pipeline of separate systems but a single unified model interface. That unlocks new use cases where you can reason, plan, and render all in one flow. It’s not just about replacing diffusion models; it’s about making generation part of a broader agentic loop. Pricing will drop over time, but the shift in how you build with this is the more interesting part.
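The pipeline-vs-unified contrast can be sketched in a few lines. This is purely illustrative (every function name here is hypothetical, not a real API): the old approach chains two separate systems with a one-way hand-off, while a unified model keeps reasoning and rendering in one loop, so later steps can react to earlier output.

```python
# Toy contrast between a two-stage pipeline and a unified reason/render loop.
# All names are hypothetical stand-ins, not real library calls.

def rewrite_prompt(user_prompt):
    """Stage 1 of the pipeline approach: an LLM polishes the prompt."""
    return user_prompt + ", detailed, high quality"

def text2image(prompt):
    """Stage 2: a separate image model; the hand-off is one-way."""
    return {"image_for": prompt}

def pipeline(user_prompt):
    # Two systems, one hand-off; the image model's output never feeds
    # back into the LLM's reasoning.
    return text2image(rewrite_prompt(user_prompt))

def unified(user_prompt, critique):
    # One stream: reason, render, critique the render, re-render.
    # `critique` stands in for the model judging its own draft.
    steps = [("reason", f"plan a layout for: {user_prompt}"),
             ("render", {"draft": 1})]
    if critique(steps[-1][1]):
        steps += [("reason", "draft 1 needs work, adjust"),
                  ("render", {"draft": 2})]
    return steps

print(pipeline("a red fox"))
# {'image_for': 'a red fox, detailed, high quality'}
```

The agentic-loop point is the `critique` step: in the pipeline version there's no place for it, because the image model's output never re-enters the LLM's context.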