Remix clone Hacker News

doctorpangloss 20 hours ago

It's far and away the most powerful image model right now. $0.04/image is a decent price!

arevno 20 hours ago | parent [-]

This is extremely domain-specific. Diffusion models work much better for certain things.

thot_experiment 20 hours ago | parent | next [-]

Can you cite an example? I'm really curious where that set of use cases lies.

koakuma-chan 20 hours ago | parent | next [-]

Explicit adult content.

	thot_experiment 19 hours ago | parent [-]
		False. That has nothing to do with the model architecture and everything to do with cloud inference providers wanting to avoid regulatory scrutiny.

echelon 19 hours ago | parent | prev [-]

I work in the space. There are a lot of use cases that get censored by OpenAI, Kling, Runway, and various other providers for a wide variety of reasons:

- OpenAI is notorious for blocking copyrighted characters. They do prompt keyword scanning, but also run a VLM on the results so you can't "trick" the model.

- Lots of providers block public figures and celebrities.

- Various providers block LGBT imagery, even for safe-for-work prompts. Kling is notorious for this.

- I was on a sales call with someone today who runs a fathers' advocacy group. I don't know what system he was using, but he said he found it impossible to generate an adult male with a child, even in a totally safe-for-work context.

- Some systems block "PG-13" images of characters that are in bathing suits or scantily clad.

None of this is porn, mind you.
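
For anyone wondering how the two-stage filtering mentioned above (keyword scan on the prompt, then a VLM pass over the result) might fit together, here is a rough sketch. Everything in it is an assumption for illustration: the blocklist, the `moderate` function, and the stub generator and classifier are hypothetical, not any provider's actual pipeline.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical blocklist; real providers do not publish theirs.
BLOCKED_TERMS = {"mickey mouse", "pikachu"}


@dataclass
class ModerationResult:
    allowed: bool
    stage: Optional[str] = None  # which stage rejected the request, if any


def prompt_is_blocked(prompt: str) -> bool:
    """Stage 1: cheap keyword scan over the prompt text."""
    lowered = prompt.lower()
    return any(term in lowered for term in BLOCKED_TERMS)


def moderate(
    prompt: str,
    generate: Callable[[str], str],
    output_is_flagged: Callable[[str], bool],
) -> ModerationResult:
    """Run the keyword scan, then generate, then check the output."""
    if prompt_is_blocked(prompt):
        return ModerationResult(False, "prompt")
    image = generate(prompt)
    # Stage 2: a classifier (a VLM in the comment above) inspects the result,
    # so rewording the prompt to dodge stage 1 does not help.
    if output_is_flagged(image):
        return ModerationResult(False, "output")
    return ModerationResult(True)


# Stubs standing in for a real image model and a real VLM classifier.
def fake_generate(prompt: str) -> str:
    return f"<image for: {prompt}>"


def fake_vlm_flags(image: str) -> bool:
    return "yellow electric mouse" in image


if __name__ == "__main__":
    for p in ["Pikachu surfing", "a yellow electric mouse surfing", "a sailboat"]:
        print(p, "->", moderate(p, fake_generate, fake_vlm_flags))
```

The point of the second stage is that a descriptive prompt that avoids every blocked keyword can still be rejected once the output is inspected.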

	thot_experiment 19 hours ago | parent | next [-]
		Sure but that has nothing to do with the model architecture and everything to do with the cloud inference providers wanting to cover their asses.
	throwaway314155 19 hours ago | parent | prev [-]
		What does any of that have to do with the distinction between diffusion vs. autoregressive models?

echelon 19 hours ago | parent | prev [-]

I don't think so. This model kills the need for Flux, ComfyUI, LoRAs, fine tuning, and pretty much everything that's come before it.

This is the god model in images right now.

I don't think open source diffusion models can catch up with this. From what I've heard, this model took a huge amount of money to train, more than even Black Forest Labs has access to.

thot_experiment 19 hours ago | parent | next [-]

ComfyUI supports 4o natively, so you get the best of both worlds. There is so much that you can't do with 4o, because there's a fundamental limit on the level of control you can have over image generation when your conditioning is just tokens in an autoregressive model. There's plenty of reason to use Comfy even if 4o is part of your workflow.

As for LoRAs, fine tuning, and open source in general: if you've ever been to civit.ai, it should be immediately obvious why those things aren't going away.

AuryGlenz 10 hours ago | parent | prev [-]

95% of what I do with image models is train LoRAs/finetunes of family and friends and create images of them.

Sure, I can ghiblify specific images of them on this model, but anything approaching realistic changes their looks. I've also done specific LoRAs for things that may or may not be in their training data, such as specific movies.