Wan 2.2-S2V-14B turns an image and an audio clip into a cinematic video. You give it a voice recording (speech or singing), a still picture, and optionally a text prompt for extra detail. It spits out a lip-synced video that looks like a short film.

It was built by WAN AI and Tongyi Lab as part of their Wan 2.2 update, which uses a Mixture-of-Experts (MoE) setup to improve video quality without slowing generation down.
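Wan 2.2's own description of its A14B models explains how MoE avoids a slowdown: denoising is split between a high-noise expert for the early steps (overall layout) and a low-noise expert for the later steps (fine detail), and only one expert runs at each step, so per-step compute stays roughly that of a single expert. The toy sketch below illustrates only that routing idea. It assumes S2V uses the same two-expert switch; the expert functions, their updates, and the `boundary` value are hypothetical placeholders, not Wan's code.

```python
from typing import Callable, List, Tuple

# An "expert" maps (latent, timestep) -> updated latent.
Expert = Callable[[float, float], float]

def high_noise_expert(x: float, t: float) -> float:
    # Hypothetical stand-in for the early-step expert (coarse layout).
    return x * 0.9

def low_noise_expert(x: float, t: float) -> float:
    # Hypothetical stand-in for the late-step expert (fine detail).
    return x * 0.99

def denoise(x: float, timesteps: List[float], boundary: float = 0.5) -> Tuple[float, List[str]]:
    """Run a toy denoising loop, routing each step to exactly one expert.

    Timesteps run from 1.0 (pure noise) toward 0.0 (clean). `boundary` is a
    hypothetical switch point: steps at or above it use the high-noise expert.
    """
    used: List[str] = []
    for t in timesteps:
        expert: Expert = high_noise_expert if t >= boundary else low_noise_expert
        used.append(expert.__name__)
        x = expert(x, t)  # only one expert runs per step, so cost per step stays flat
    return x, used
```

For example, `denoise(1.0, [1.0, 0.75, 0.5, 0.25])` routes the first three steps to the high-noise expert and the last one to the low-noise expert.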

The model generates 480p and 720p videos at 24fps and runs on high-end consumer GPUs such as the RTX 4090.
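Since output is fixed at 24fps, the amount of video the model must generate scales directly with the length of the audio. The sketch below does that back-of-the-envelope arithmetic. The 832×480 and 1280×720 frame sizes are assumptions, `frames_for_audio` and `total_pixels` are hypothetical helpers, and any clip-length or frame-count limits in the real pipeline are ignored.

```python
import math

FPS = 24  # output frame rate stated for Wan 2.2-S2V-14B

# Assumed frame sizes (width, height) for the two output resolutions.
RESOLUTIONS = {"480p": (832, 480), "720p": (1280, 720)}

def frames_for_audio(seconds: float, fps: int = FPS) -> int:
    """Number of frames needed to cover an audio clip of the given length."""
    return math.ceil(seconds * fps)

def total_pixels(seconds: float, resolution: str) -> int:
    """Total pixels generated for the whole clip at one resolution."""
    width, height = RESOLUTIONS[resolution]
    return frames_for_audio(seconds) * width * height

if __name__ == "__main__":
    for res in RESOLUTIONS:
        print(res, frames_for_audio(10), total_pixels(10, res))
```

Under these assumptions, a 10-second clip needs 240 frames, and 720p means about 2.3 times as many pixels to generate as 480p.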

You can add labels for visual properties such as lighting and contrast to adjust the look of the output.

The model is available on Hugging Face as Wan-AI/Wan2.2-S2V-14B and is released under the Apache 2.0 license.
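One way to fetch the weights is the `huggingface_hub` library's `snapshot_download` function. This is a minimal sketch: the local directory name is an arbitrary choice, `download_weights` is a hypothetical helper, the download is large, and running the model still requires separate inference code that is not shown here.

```python
# Repository ID as given on the model page.
REPO_ID = "Wan-AI/Wan2.2-S2V-14B"

def download_weights(local_dir: str = "./Wan2.2-S2V-14B") -> str:
    """Download the full model repository and return the local path.

    Requires `pip install huggingface_hub`. The import is deferred so this
    module can be loaded without the package installed.
    """
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id=REPO_ID, local_dir=local_dir)
```

Calling `download_weights()` fetches the repository into `./Wan2.2-S2V-14B` and returns that path.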

Key Features

Lip Sync (From Image)

Model Performance

Editor’s Rating

No editor performance evaluations available for this model yet.

User Ratings

Categories: Censorship (lower = less censorship, higher = stricter filtering), Expressiveness, Generation Speed, Prompt Following, Realism, Speech Coherence.

Wan 2.2 Speech to Video Examples

As you can see, there is no running, not even in slow motion; the man just gesticulates regardless of the larger context of the image. Generated on August 28, 2025.

The lip sync is good, though the background animation is slightly lacking. Generated on August 27, 2025.

Uploaded a picture and an audio file with no additional instructions. Solid result. Generated on August 26, 2025.

Not a bad result for singing, but she isn't playing guitar at the same time. Generated on August 26, 2025

Where To Find Wan 2.2 Speech to Video

If you'd like to access this model, you can explore the following possibilities:
