mdrzn 6 hours ago
"VAE: WAN2.2-VAE" so it's just a Wan2.2 edit, compressed to 7B. | ||||||||
kouteiheika 6 hours ago
This doesn't necessarily mean that it's Wan2.2. People often don't train their own VAEs and just reuse an existing one, because a VAE isn't really what's doing the image generation part.

A little more background for those who don't know what a VAE is (I'm simplifying here, so bear with me): it's essentially a model which turns raw RGB images into something called a "latent space". You can think of it as a fancy color space, but on steroids. There are two main reasons for this. One is to make the model which does the actual useful work more computationally efficient: VAEs usually downscale the spatial dimensions of the images they ingest, so instead of having to process a 1024x1024 image your model only needs to work on a 256x256 one. (They usually increase the number of channels to compensate, but I digress.) The other reason is that, unlike raw RGB space, the latent space is a higher-level representation of the image.

Training a VAE isn't the most interesting part of building an image model, and while it is tricky, it's done entirely in an unsupervised manner: you give the VAE an RGB image, have it convert the image to latent space, have it convert that back to RGB, and take a diff between the input RGB image and the output RGB image. That diff is the signal you use when training it. (In reality it's a little more complex, but, again, I'm simplifying to keep the explanation clear.) So it makes sense to reuse an existing VAE and concentrate on the actually interesting parts of an image generation model.
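To make that concrete, here's a minimal toy sketch in PyTorch. This is my own illustration, not the actual WAN/Qwen VAE: a real VAE predicts a latent distribution (mean/log-variance) and adds a KL term plus perceptual and adversarial losses, all of which I've left out.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class ToyAutoencoder(nn.Module):
        def __init__(self):
            super().__init__()
            # Two stride-2 convs: 1024x1024, 3 channels -> 256x256, 16 channels.
            self.encoder = nn.Sequential(
                nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.SiLU(),
                nn.Conv2d(64, 16, 3, stride=2, padding=1),
            )
            # Mirror image: 256x256x16 latent back to 1024x1024x3 RGB.
            self.decoder = nn.Sequential(
                nn.ConvTranspose2d(16, 64, 4, stride=2, padding=1), nn.SiLU(),
                nn.ConvTranspose2d(64, 3, 4, stride=2, padding=1),
            )

        def forward(self, rgb):
            latent = self.encoder(rgb)   # the "fancy color space on steroids"
            return self.decoder(latent), latent

    model = ToyAutoencoder()
    opt = torch.optim.Adam(model.parameters(), lr=1e-4)

    rgb = torch.rand(1, 3, 1024, 1024)   # stand-in for one training image
    recon, latent = model(rgb)
    print(latent.shape)                  # torch.Size([1, 16, 256, 256])

    # The unsupervised signal: just the diff between input and reconstruction.
    loss = F.mse_loss(recon, rgb)
    loss.backward()
    opt.step()

The downstream generative model then only ever sees the 256x256 latents, which is where the compute savings come from: 16x fewer spatial positions than the raw image.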
| ||||||||
dragonwriter 5 hours ago
> "VAE: WAN2.2-VAE" so it's just a Wan2.2 edit No, using the WAN 2.2 VAE does not mean it is a WAN 2.2 edit. > compressed to 7B. No, if it was an edit of the WAN model that uses the 2.2 VAE, it would be expanded to 7B, not compressed (the 14B models of WAN 2.2 use the WAN 2.1 VAE, the WAN 2.2 VAE is used by the 5B WAN 2.2 model.) | ||||||||
BoredPositron 6 hours ago
They used the VAE from WAN, as many other models do. Among image models you see a lot of them reusing the Flux VAE. Which is perfectly fine: these VAEs are released under Apache 2.0, and reusing one saves you time to focus on your transformer architecture...
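For anyone curious, here's roughly what that reuse pattern looks like with Hugging Face diffusers. The repo id is just an example (FLUX.1-schnell ships an Apache-2.0 VAE); swap in whichever checkpoint you're actually building on.

    import torch
    from diffusers import AutoencoderKL

    # Load someone else's pretrained VAE instead of training your own.
    vae = AutoencoderKL.from_pretrained(
        "black-forest-labs/FLUX.1-schnell", subfolder="vae"
    )
    vae.eval()

    image = torch.rand(1, 3, 512, 512) * 2 - 1   # diffusers VAEs expect [-1, 1]
    with torch.no_grad():
        latents = vae.encode(image).latent_dist.sample()
        recon = vae.decode(latents).sample

    print(latents.shape)   # [1, 16, 64, 64] for the Flux VAE (8x downscale)

The VAE stays frozen and only runs at the input/output boundary; your own transformer trains entirely in the latent space.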