PUSA V1.0
Pusa V1.0 brings zero-shot video generation to anyone with a GPU. It's cheap to train, easy to run, and packed with smart features like VTA for frame-level control.
Overview
Pusa is a free, open-source video tool you can run yourself. It turns text or images into videos, and it can also edit or extend existing videos. And yeah, it costs $0 under Apache 2.0. No paid tier, no nonsense.
Pusa brings fine-grained timing control to video diffusion. It runs on VTA (Vectorized Timestep Adaptation), which basically means each frame gets its own timestep, so you can tell the model exactly how to handle every frame. Want to go from image to video? Done. Need to pick up a video partway through and continue it? It can do that too.
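As the name suggests, VTA's core idea is swapping the single diffusion timestep shared by all frames for a vector of per-frame timesteps. Here's a minimal Python sketch of that idea. It is not Pusa's code: `vta_timesteps` and `noise_frames` are hypothetical helpers, timesteps are assumed to live in [0, 1], frames are stand-in lists of floats rather than latent tensors, and the noising rule x_t = (1 - t) * x0 + t * eps is an assumed rectified-flow-style interpolation.

```python
import random


def vta_timesteps(num_frames, t, clean_frames):
    """Build a per-frame timestep vector (hypothetical helper, not Pusa's API).

    A standard video diffusion step shares one scalar t across all frames.
    Here every frame gets its own entry: frames listed in `clean_frames` are
    pinned to 0.0 (treated as clean conditioning), the rest use the shared t.
    Timesteps are assumed to lie in [0, 1], where 1 means pure noise.
    """
    if not 0.0 <= t <= 1.0:
        raise ValueError("t must be in [0, 1]")
    return [0.0 if i in clean_frames else t for i in range(num_frames)]


def noise_frames(frames, timesteps, rng):
    """Noise each frame at its own timestep.

    Assumes a rectified-flow style interpolation x_t = (1 - t) * x0 + t * eps.
    Frames are plain lists of floats standing in for latent tensors.
    """
    return [
        [(1.0 - t) * x + t * rng.gauss(0.0, 1.0) for x in frame]
        for frame, t in zip(frames, timesteps)
    ]


# Image-to-video: frame 0 holds the input image, frames 1-4 are generated.
taus = vta_timesteps(num_frames=5, t=0.8, clean_frames={0})
print(taus)  # [0.0, 0.8, 0.8, 0.8, 0.8]

frames = [[1.0, 2.0] for _ in range(5)]
noisy = noise_frames(frames, taus, random.Random(0))
print(noisy[0])  # [1.0, 2.0]: the conditioning frame stays clean
```

The design point is the shape of the timestep: a scalar applies the same noise level to every frame, while a vector lets supplied frames stay clean (t = 0) as the rest are denoised.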
They built it on top of Wan2.1-T2V-14B but didn’t retrain the whole thing. Instead, they fine-tuned it with LoRA for just $500 on about 4K samples. That’s way cheaper than Wan’s $100K+ and over 10 million clips. Pretty wild, right?
And even with that small setup, it still beats Wan on VBench-I2V, which is rare: 87.32% vs. Wan’s 86.86%. Not a massive jump, but still a win.
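For a sense of scale, here's the arithmetic on the cost, data, and benchmark figures quoted in this listing. Since Wan's cost and data are given as "$100K+" and "over 10 million", the ratios below are lower bounds, not exact values.

```python
# Figures as quoted in this listing; Wan's cost and data are lower bounds.
pusa_cost_usd = 500
wan_cost_usd = 100_000
pusa_samples = 4_000
wan_samples = 10_000_000
pusa_vbench_i2v = 87.32
wan_vbench_i2v = 86.86

cost_ratio = wan_cost_usd / pusa_cost_usd    # Pusa is at least this many times cheaper
data_ratio = wan_samples / pusa_samples      # Pusa used at least this many times fewer samples
score_gap = round(pusa_vbench_i2v - wan_vbench_i2v, 2)  # in percentage points

print(f"cost: >= {cost_ratio:.0f}x, data: >= {data_ratio:.0f}x, "
      f"VBench-I2V gap: +{score_gap} pts")
```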
It keeps the base model intact and just adds more control. That means you can still do everything you did before, but now with more options, like:
- Adding transitions between clips
- Finishing a half-done video
- Generating midsections
- Creating videos from one frame or a short idea
All by giving it a few cues, such as a text prompt or a handful of frames.
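Here's one hypothetical way to picture how the tasks listed above map onto frame-level cues: each task differs mainly in which frames are supplied up front and which are generated. The task names, the `conditioning_frames` and `layout` helpers, and the `k` parameter are all illustrative assumptions, not Pusa's interface.

```python
def conditioning_frames(task, num_frames, k=1):
    """Return indices of frames supplied as cues for a task (hypothetical presets).

    k is how many frames each supplied cue spans at the start or end.
    """
    if num_frames <= 2 * k:
        raise ValueError("num_frames must exceed 2 * k")
    first = set(range(k))
    last = set(range(num_frames - k, num_frames))
    presets = {
        "text_to_video": set(),      # prompt only, every frame generated
        "image_to_video": {0},       # one starting frame
        "extend_video": first,       # finish a half-done video from its opening
        "transition": first | last,  # tail of clip A, head of clip B
        "midsection": first | last,  # start and end of one video, fill the middle
    }
    if task not in presets:
        raise ValueError(f"unknown task: {task!r}")
    return presets[task]


def layout(task, num_frames, k=1):
    """Render a frame layout: C = supplied cue frame, G = generated frame."""
    cond = conditioning_frames(task, num_frames, k)
    return "".join("C" if i in cond else "G" for i in range(num_frames))


for task in ("text_to_video", "image_to_video", "extend_video", "transition"):
    print(f"{task:15} {layout(task, 8, k=2)}")
```

Transitions and midsections share the same layout in this sketch; they differ only in where the supplied frames come from: the end of one clip and the start of the next, versus the start and end of a single video.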
You’ll need your own GPU setup, though. There’s no hosted API yet.
Tags
- Freeware
- Apache License 2.0
- PC-based
- #Video & Animation
This tool is free to use when installed locally and is offered under Apache License 2.0.
Not everyone’s sold yet, but folks are curious. Some say the outputs feel a bit “laggy” or show frame jumps, such as colors shifting from one frame to the next. Others think it’s worth keeping an eye on, especially as a cheap, flexible option.
There’s no big API rollout and not many community demos yet, but the tech under the hood looks solid.
Reactions on X are mixed. Some love the zero-shot multitasking, calling it a “compact and smart upgrade over Wan.” Others say they’ll wait until trusted devs like Kijai experiment with it, or until quantized versions drop.
Quick Pros and Cons
What it nails:
- Text-to-video and image-to-video
- Start and end frame control
- Transitions and inpainting
- Low training cost
- Supports 720p up to 8s
What it’s missing:
- Hosted API
- Polished transitions
- Clear docs or tutorials
These are just first impressions from Reddit.
This page was last updated on July 16, 2025 at 7:22 AM