AI creators tools

Wan 2.2 Speech to Video lipsync model

Name: Wan
Version: 2.2-S2V
Variant: 14B
Also Known As: Wan 2.2-S2V-14B, WAN 2.2-S2V

Wan 2.2-S2V-14B turns an image and an audio clip into a cinematic video. You give it a voice recording (speech or singing), a still picture, and optionally some text for extra detail. It outputs a lip-synced video that looks like a short film.

It was built by WAN AI and Tongyi Lab as part of their Wan 2.2 update, which uses a Mixture-of-Experts setup for better video quality without slowing generation down.

The model makes 480p and 720p videos at 24fps. It runs on high-end consumer GPUs like the RTX 4090.
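Since the output is tied to the length of the audio, a quick calculation shows how many frames a clip needs at the stated frame rate. This is a rough sketch, not part of the model's tooling: the helper name `frames_for_audio` is made up for illustration, and it assumes the video simply covers the whole clip at 24fps.

```python
import math

FPS = 24  # frame rate stated in the description above


def frames_for_audio(duration_seconds: float, fps: int = FPS) -> int:
    """Return how many frames are needed to cover an audio clip of this length."""
    if duration_seconds < 0:
        raise ValueError("duration must be non-negative")
    # Round up so the last fraction of a second still gets a frame.
    return math.ceil(duration_seconds * fps)


print(frames_for_audio(5.0))  # 120
```

So a 5-second voice recording works out to 120 frames under these assumptions.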

You can describe things like lighting and contrast in the text to tweak how the video looks.

The model is on Hugging Face under Wan-AI/Wan2.2-S2V-14B and is released under the Apache 2.0 license.
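If you want the weights locally, one possible route is the `huggingface_hub` Python package. The sketch below is an assumption about how you might fetch the repository, not the authors' official instructions: the `./Wan2.2-S2V-14B` target folder and both helper names are placeholders, and the download itself needs `pip install huggingface_hub`, a network connection, and a lot of disk space.

```python
REPO_ID = "Wan-AI/Wan2.2-S2V-14B"  # repository name given above


def model_page_url(repo_id: str = REPO_ID) -> str:
    """Build the Hugging Face model page URL for a repository id."""
    return f"https://huggingface.co/{repo_id}"


def download_model(local_dir: str = "./Wan2.2-S2V-14B") -> str:
    """Download every file in the repository and return the local path.

    The import lives inside the function so the rest of this sketch
    works even when huggingface_hub is not installed.
    """
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id=REPO_ID, local_dir=local_dir)


print(model_page_url())
# To actually fetch the weights (large download), call:
# download_model()
```

Keeping the download in its own function means nothing is fetched until you call it explicitly.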

Key Features:

Wan 2.2 Speech to Video Examples

As you can see, there is no running, not even in slow motion; the guy just gesticulates, regardless of the larger context of the image. Generated on August 28, 2025
The lip sync is good, though the background animation is slightly lacking. Generated on August 27, 2025
Uploaded a picture + audio file, no additional instructions. Solid result. Generated on August 26, 2025
Not a bad result for singing, but she isn't playing guitar at the same time. Generated on August 26, 2025

Where To Find Wan 2.2 Speech to Video

If you'd like to access this model, you can explore the following possibilities: