LongCat Video Avatar lipsync model

Name: LongCat
Variant: Video-Avatar
Creator: Meituan LongCat Team

LongCat-Video-Avatar is an audio-driven avatar model built for long videos. It was released at the end of December 2025.

It’s said to improve a lot over InfiniteTalk. Long sequences look more stable. Faces and motion hold together better too.

It is built on the LongCat-Video base and supports Audio-Text-to-Video, Audio-Text-Image-to-Video, and video continuation, all in one setup.

It’s open source under the MIT license and claims top realism scores in EvalTalker tests, which used hundreds of people and multiple reviewers per video. However, it appears you'd need around 32GB of memory to run it in BF16.
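As a rough sanity check on the memory figure, here is a minimal sketch of how weight memory scales with precision, assuming the precision meant is bfloat16 (2 bytes per value). The function name and the parameter count are hypothetical and used only for illustration. The estimate covers the weights alone, not activations or other runtime overhead.

```python
# Bytes per stored value for a few common precisions.
BYTES_PER_VALUE = {"fp32": 4, "bf16": 2, "fp16": 2}


def weights_gb(num_params: int, precision: str = "bf16") -> float:
    """Rough memory for the weights alone, in decimal gigabytes."""
    return num_params * BYTES_PER_VALUE[precision] / 1e9


# Hypothetical parameter count, for illustration only (not taken from the model card).
example_params = 14_000_000_000
print(f"{weights_gb(example_params, 'bf16'):.1f} GB")
```

Actual usage will be higher than this estimate, so treat it as a lower bound when sizing hardware.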

Long videos stay clean. Cross-Chunk Latent Stitching helps stop blur and small errors from stacking up over time. Identity stays steady. Reference Skip Attention keeps characters consistent without stiff copy-like motion. It also handles more than one person and doesn’t lock you into short clips.

LongCat-Video-Avatar comes from the Meituan LongCat team. It builds on the LongCat-Video base model and focuses on avatar videos where audio and text drive motion, facial movement, and lip sync. You get people that move and speak in a way that feels more natural.

The model works across a few tasks: Audio-Text-to-Video, Audio-Text-Image-to-Video, and audio-based video continuation.

It’s meant for longer videos where faces don’t drift and motion doesn’t break. You’ll see smoother flow and steadier visuals as time goes on.

