Stable-Video-Infinity, or SVI, is an open-source model that generates videos as long as you want while keeping motion smooth, scenes connected, and the story on track no matter how long it runs. Built by researchers at EPFL's VITA Lab, it was introduced in a 2025 paper and released as open source in January 2026.
The big issue it tackles is how errors stack up over time in long videos. Instead of just fighting that, SVI trains itself by throwing those same errors back into the mix. That way the model gets better at spotting and fixing its own mistakes.
You can guide the video with text, sounds, poses and other inputs depending on what you need. It also trains fast since it only tweaks LoRA layers instead of the full model. The whole system’s open too, so you can use the code, tools, training data and everything else for free.
How it works.
Older video models mess up over time, with tiny frame issues snowballing into weird results. SVI trains differently.
It adds its own errors during training, then learns to clean them up. This helps it stay on track when making long clips, and it keeps the motion and story more consistent than earlier models.
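The core trick is close to what's sometimes called scheduled sampling: condition training on the model's own imperfect outputs rather than only on clean ground truth. Here's a minimal PyTorch sketch of that idea; the real SVI works on a video diffusion model with its own error-injection recipe, so the tiny predictor and loss below are purely illustrative.

```python
import torch
import torch.nn as nn

class TinyFramePredictor(nn.Module):
    """Toy stand-in for a video model: predicts the next frame from the previous one."""
    def __init__(self, dim=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, 128), nn.ReLU(), nn.Linear(128, dim))

    def forward(self, prev_frame):
        return self.net(prev_frame)

def error_recycled_step(model, clip, optimizer):
    """clip: (T, dim) ground-truth frames. Each prediction is conditioned on the
    model's own previous output instead of the clean frame, so the errors seen
    during training resemble the drift seen at long-horizon inference."""
    context = clip[0]                      # start from the real first frame
    loss = torch.zeros(())
    for t in range(1, clip.shape[0]):
        pred = model(context)
        loss = loss + nn.functional.mse_loss(pred, clip[t])
        context = pred.detach()            # recycle the (possibly wrong) prediction
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# One training step on a random toy "clip" of 8 frames.
model = TinyFramePredictor()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
print(error_recycled_step(model, torch.randn(8, 64), opt))
```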
It also works with inputs like audio prompts or poses, and it runs in ComfyUI.
Using it.
You can run SVI fully offline and local.
It uses LoRA adapters instead of full model retraining, which saves time.
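To see why LoRA-only training is cheap, here's a rough, generic PyTorch sketch (not SVI's actual adapter code, and the layer size is made up): the base weight stays frozen and only two small low-rank matrices learn.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Wraps a frozen linear layer with a trainable low-rank update (B @ A)."""
    def __init__(self, base: nn.Linear, rank=16, alpha=16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)                   # freeze the original weights
        self.lora_a = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.lora_b = nn.Parameter(torch.zeros(base.out_features, rank))
        self.scale = alpha / rank

    def forward(self, x):
        return self.base(x) + self.scale * (x @ self.lora_a.T @ self.lora_b.T)

layer = LoRALinear(nn.Linear(4096, 4096))
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
total = sum(p.numel() for p in layer.parameters())
print(f"training {trainable:,} of {total:,} parameters")  # well under 1% of the layer
```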
It works with Wan 2.1 and 2.2 in ComfyUI for clip-by-clip control, and light LoRA options can speed generation up 4 to 5 times.
It supports LoRAs made for styles, movement, characters, and effects in the Wan setup.
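Outside ComfyUI, one way to attach a LoRA to a Wan pipeline is through Hugging Face diffusers. The model id, LoRA directory, and file name below are placeholders, so check the SVI release for its actual checkpoints; the project's recommended path is its ComfyUI workflow.

```python
import torch
from diffusers import WanPipeline
from diffusers.utils import export_to_video

# Example base model id; swap in the checkpoint your workflow actually uses.
pipe = WanPipeline.from_pretrained(
    "Wan-AI/Wan2.1-T2V-1.3B-Diffusers", torch_dtype=torch.bfloat16
).to("cuda")

# Placeholder location: point this at the SVI (or style/motion/character) LoRA you want.
pipe.load_lora_weights("path/to/lora_dir", weight_name="svi_lora.safetensors")

frames = pipe(
    prompt="a slow dolly shot through a rainy neon street at night",
    num_frames=81,
    num_inference_steps=30,
).frames[0]
export_to_video(frames, "svi_clip.mp4", fps=16)
```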
Early feedback.
Redditors who've tested the model report that a 20-second video at 1280x720 took about 340 seconds using open tools. The test ran on an RTX 5090 after a few changes: the user swapped the basic Wan model for Smoothmix Wan 2.2 i2v for better speed and more lifelike motion, cut the sampling steps from 3 to 2 to speed up the run while keeping quality acceptable, and used RIFE interpolation to double the frame rate from 16 to 32 fps.
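Taking those numbers at face value, and assuming the 16-to-32 figure refers to frame rate (RIFE is a frame interpolator), the back-of-envelope math looks like this:

```python
# Rough throughput check of the reported RTX 5090 run (numbers from the Reddit
# post above; treat them as approximate).
clip_seconds = 20
base_fps, final_fps = 16, 32
gen_time_s = 340                                       # reported wall-clock time

generated = clip_seconds * base_fps                    # 320 frames actually diffused
interpolated = clip_seconds * (final_fps - base_fps)   # 320 frames added by RIFE

print(f"{generated} generated + {interpolated} interpolated frames")
print(f"~{generated / gen_time_s:.2f} generated frames per second of compute")
print(f"~{gen_time_s / clip_seconds:.0f} s of compute per second of finished video")
```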
The starting image came from an earlier test, and the whole setup is open for anyone to use. Experiences varied: some people generated a video over a minute long before the software hit a wall, while others saw it crash quickly. You have to watch your VRAM and RAM. One user took two hours to finish a 90-second video in parts on a 16GB card, spending extra time fixing the resolution and frame rate.
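Generating "in parts" on limited VRAM usually means chaining short clips, with the last frame of one clip seeding the next. The sketch below shows the shape of that loop using a generic diffusers image-to-video pipeline; the model id, paths, and parameters are placeholders, not SVI's own continuation logic (its ComfyUI workflow handles that for you).

```python
import torch
from diffusers import WanImageToVideoPipeline
from diffusers.utils import export_to_video, load_image

pipe = WanImageToVideoPipeline.from_pretrained(
    "Wan-AI/Wan2.1-I2V-14B-480P-Diffusers",       # example base model id
    torch_dtype=torch.bfloat16,
).to("cuda")

prompts = [
    "wide shot, a hiker crests a ridge at dawn",
    "medium shot, the hiker sets up camp as clouds roll in",
    "close-up, rain begins to fall on the tent",
]

frame = load_image("start_frame.png")             # placeholder starting keyframe
all_frames = []
for prompt in prompts:
    clip = pipe(image=frame, prompt=prompt, num_frames=81,
                num_inference_steps=30, output_type="pil").frames[0]
    all_frames.extend(clip)
    frame = clip[-1]                              # last frame seeds the next chunk

export_to_video(all_frames, "long_video.mp4", fps=16)
```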
Keeping faces consistent is a big problem, since you can't control every detail of a character's look. Some say better prompts help; others say it's not enough. Prompt writing has shifted to look more like setting up a movie scene (see the sketch after this list):
- Camera angles: you can set a different view for every part.
- Character styles: you can give people distinct looks.
- LoRA swaps: you switch adapters for a custom feel.
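One illustrative way to organize that kind of scene-setup prompting is a per-clip plan like the toy sketch below; every field name and file name here is made up, and ComfyUI workflows express the same idea with per-clip nodes.

```python
# Toy per-clip "shot list": each entry carries its own camera angle, character
# description, and an optional LoRA. Purely illustrative; the names are invented.
shots = [
    {"camera": "wide establishing shot", "character": "woman in a red coat",
     "action": "walks into a neon-lit alley", "lora": "cinematic_style.safetensors"},
    {"camera": "low-angle close-up", "character": "the same woman, rain on her face",
     "action": "looks up at a flickering sign", "lora": "cinematic_style.safetensors"},
    {"camera": "overhead drone shot", "character": "the alley from above",
     "action": "a crowd parts around her", "lora": "motion_boost.safetensors"},
]

for i, shot in enumerate(shots, 1):
    prompt = f"{shot['camera']}, {shot['character']}, {shot['action']}"
    print(f"clip {i}: {prompt}  (LoRA: {shot['lora']})")
```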
Some Reddit users have criticized the outputs for appearing robotic or featuring abrupt transitions.
If you'd like to access this model, you can explore the following possibilities: