Wan2.2, a major upgrade to Alibaba Group's visual generative models, was released as an open-source model at the end of July 2025. It offers enhanced performance, improved visual quality, and several key technical innovations.
Wan2.2 introduces a Mixture-of-Experts (MoE) architecture to video diffusion models, allowing specialized expert models to handle denoising across timesteps, which increases capacity without raising computational cost. The model was trained on a significantly larger dataset—65.6% more images and 83.2% more videos than its predecessor—boosting its performance across motion, semantic, and aesthetic dimensions.
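The timestep-based expert routing can be sketched in a few lines. This is a minimal illustration, not the official implementation: the two-expert split and the boundary value of 500 are assumptions chosen to show the idea that only one expert runs per denoising step, so inference cost stays that of a single smaller model.

```python
# Illustrative sketch (not Wan2.2's actual code): route each denoising
# step to one of two experts based on the diffusion timestep. Because
# exactly one expert is active per step, per-step compute matches a
# single dense model of the same active size.

def route_expert(t: int, boundary: int = 500) -> str:
    """Pick an expert for timestep t (assuming 1000 steps, t counting down).

    High-noise steps (t >= boundary) go to one expert, low-noise steps
    to the other; the boundary value here is purely illustrative.
    """
    return "high_noise_expert" if t >= boundary else "low_noise_expert"

# A denoising loop would call the chosen expert at each step:
schedule = [999, 750, 500, 250, 0]
assignments = {t: route_expert(t) for t in schedule}
print(assignments)
```

In a real MoE diffusion model the routed calls would be forward passes through separate transformer weights, but the control flow follows this pattern: total parameter count grows with the number of experts while per-step FLOPs do not.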
Aesthetic generation has improved with curated data labeled for lighting, composition, and color, enabling more controllable cinematic output. Wan2.2 also features a high-efficiency hybrid text-to-image-to-video (TI2V) 5B model, using a compression ratio of 16×16×4, capable of generating 720P video at 24fps on consumer-grade GPUs such as the RTX 4090 (24 GB VRAM).
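A quick back-of-the-envelope calculation shows what a 16×16×4 compression ratio implies for the latent a 720P clip is denoised in. The helper below is hypothetical (exact frame padding and channel counts vary between VAEs); it only illustrates the arithmetic of dividing height and width by 16 and the frame count by 4.

```python
# Rough sketch of what a 16x16x4 (height x width x time) compression
# ratio means for latent size; exact padding rules in the real VAE
# may differ, this only demonstrates the division.

def latent_shape(height: int, width: int, frames: int,
                 spatial: int = 16, temporal: int = 4) -> tuple:
    """Return (latent_frames, latent_height, latent_width)."""
    return (frames // temporal, height // spatial, width // spatial)

# 5 seconds of 720P (1280x720) video at 24 fps = 120 frames:
print(latent_shape(720, 1280, 120))  # -> (30, 45, 80)
```

Shrinking each frame from 1280×720 to 80×45 and keeping only one latent frame per four video frames is what makes 720P generation feasible within 24 GB of VRAM.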
Model variants include:

- T2V-A14B – text-to-video MoE model (14B active parameters)
- I2V-A14B – image-to-video MoE model (14B active parameters)
- TI2V-5B – hybrid text/image-to-video dense 5B model
If you'd like to access this model, you can explore the following possibilities:
Use our video cost calculator to compare prices across platforms offering the Wan 2.2 model.
For locally hosted models, see the description and additional links at the bottom of the page for available versions, repositories, and tutorials.