n2d4 17 hours ago

Source? It's much more likely that the LLM generates the latent vector which serves as an input to the diffusion model.

jumploops 16 hours ago | parent | next [-]

From the GPT-4o System Card Addendum[0]:

> Unlike DALL·E, which operates as a diffusion model, 4o image generation is an autoregressive model natively embedded within ChatGPT.

[0] https://cdn.openai.com/11998be9-5319-4302-bfbf-1167e093f1fb/...

og_kalu 16 hours ago | parent | prev [-]

OpenAI said it's autoregressive, the presentation in the app is autoregressive, and it's priced autoregressively.

Why would that be more likely? It seems like some implementation of ByteDance's VAR.