minimaxir · 8 hours ago

Text encoder is Mistral-Small-3.2-24B-Instruct-2506 (which is multimodal), as opposed to the weird choice to use CLIP and T5 in the original FLUX, so that's a good start, albeit kinda big for a model intended to be open weight. BFL likely should have held off the release until their Apache 2.0 distilled model was released, in order to better differentiate from Nano Banana/Nano Banana Pro. The pricing structure on the Pro variant is... weird:

> Input: We charge $0.015 for each megapixel on the input (i.e. reference images for editing)

> Output: The first megapixel is charged $0.03 and then each subsequent MP will be charged $0.015
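The quoted pricing can be sketched as a small helper. This is only an illustration of the quoted rates, assuming whole megapixels and that `flux2_pro_cost` is a made-up name, not part of any BFL API:

```python
def flux2_pro_cost(input_mp: int, output_mp: int) -> float:
    """Estimate cost in USD from the quoted FLUX.2 Pro pricing.

    Assumes: $0.015 per input megapixel; $0.03 for the first output
    megapixel, then $0.015 for each subsequent output megapixel.
    (Hypothetical helper; the actual API may round or bill differently.)
    """
    input_cost = 0.015 * input_mp
    output_cost = 0.0 if output_mp <= 0 else 0.03 + 0.015 * (output_mp - 1)
    return input_cost + output_cost

# One 1 MP reference image in, one 1 MP image out:
print(flux2_pro_cost(1, 1))  # 0.045
```

So the first output megapixel effectively carries a $0.015 surcharge over the flat per-MP rate, which is presumably the "weird" part.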
woadwarrior01 · 8 hours ago

> BFL likely should have held off the release until their Apache 2.0 distilled model was released in order to better differentiate from Nano Banana/Nano Banana Pro.

Qwen-Image-Edit-2511 is going to be released next week, and it will be Apache 2.0 licensed. I suspect that was one of the factors in the decision to release FLUX.2 this week.
kouteiheika · 7 hours ago

> as opposed to the weird choice to use CLIP and T5 in the original FLUX

Yeah, CLIP there was essentially useless. You can even completely zero out the weights through which the model ingests the CLIP input, and it barely changes anything.
beernet · 8 hours ago

Nice catch. Looks like the engineers tried to take care of the GTM part as well and (surprise!) messed it up. In any case, the biggest loser here is, once again, Europe.
throwaway314155 · 8 hours ago

> as opposed to the weird choice to use CLIP and T5 in the original FLUX

This method was used in tons of image generation models. I'm not saying it's superior or even a good idea, but it definitely wasn't "weird".