StepFun AI (Step-Video-T2V)
Step-Video-T2V is an open-source AI tool that turns text prompts into smooth videos up to 8 seconds long. The tech is impressive but requires serious hardware power.
Overview
Step-Video-T2V is a tool from Shanghai-based AI startup StepFun. It takes text descriptions and spits out high-quality videos. Sounds great, right? Well, there's a catch.
This AI beast runs on 30 billion parameters and can generate videos up to 8 seconds long (204 frames) at 544x992 resolution. It uses a deep-compression VAE to crunch down the data, improving both speed and output quality. On top of that, Direct Preference Optimization (DPO) helps reduce glitches by learning from human feedback.
Here's the catch: locally, you currently need 80 GB of VRAM. Not for the GPU poor. Hopefully, quantized versions are coming.
Meanwhile, there's a Chinese web version at https://yuewen.cn/videos, but it doesn't seem to accept international phone numbers (I never received my code by SMS).
Tags
Freeware, Apache License 2.0, Web-based, #Video & Animation
Links
- Text-2-Video
Users on Reddit’s r/StableDiffusion are hyped about the results, with some calling it the best open-source text-to-video model so far. Unlike many closed-off AI tools, Step-Video-T2V is completely open-source, with code and model weights up on GitHub.
Here’s where things get tricky: you need 80 GB of VRAM to run it properly, way beyond what regular GPUs can handle. Some users are hoping for quantized versions (8-bit or 4-bit models) to make it run on consumer GPUs, but right now it’s mostly for those with industrial-grade setups.
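To see why quantization matters here, a rough back-of-the-envelope sketch of the memory needed just to hold a 30-billion-parameter model's weights at different precisions (weights only; the 80 GB requirement also has to cover activations, the VAE, and the text encoders, which is why it exceeds the raw fp16 figure):

```python
# Approximate VRAM for the weights alone of a 30B-parameter model.
# The parameter count comes from StepFun's own figures; everything
# else here is generic arithmetic, not a measurement of the real model.

PARAMS = 30e9  # Step-Video-T2V's reported parameter count

def weight_memory_gb(params: float, bits_per_param: int) -> float:
    """Memory for weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return params * bits_per_param / 8 / 1e9

for label, bits in [("fp16/bf16", 16), ("int8", 8), ("int4", 4)]:
    print(f"{label:>9}: ~{weight_memory_gb(PARAMS, bits):.0f} GB")
# fp16/bf16: ~60 GB, int8: ~30 GB, int4: ~15 GB
```

Even a 4-bit quant would still need roughly 15 GB for weights before overhead, so "runs on a mid-range consumer GPU" is optimistic; high-end 24 GB cards are the realistic floor.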
- Great tech but hard to access. People love the results but hate the hardware barrier.
- Optimizations needed. Many are hoping for updates that make it run on multi-GPU setups or less powerful machines.
- NVIDIA’s dominance. Some blame high VRAM costs on NVIDIA’s near-monopoly, hoping AMD or Intel step in with cheaper options.
- NSFW curiosity. As always users are wondering if it can handle adult content (because of course they are).
Generated on February 18, 2025:
A skier skis on a snowy mountain, captured from a first-person perspective using a selfie stick. The skier is wearing a dark ski suit, goggles, a light-colored helmet, and holding a selfie stick. He slides down the slope, sending a lot of snowflakes flying behind him. In the background is a clear blue sky and a snowy mountain. The sun is strong, creating a starburst effect in the frame. The skier has an excited facial expression, his mouth is open, and he waves to the camera with one hand free. The entire video is full of movement, showing the excitement and fun of skiing.
Generated on February 18, 2025:
In a modern, technologically advanced laboratory, Einstein is conducting an experiment. The camera takes a close-up shot of his face as he thinks, with his white hair and beard a little messy. He writes down a complex formula on the blackboard. The surrounding equipment is futuristic, and the background is slightly blurred. As he stops writing and smiles, it seems that he has solved a major problem.
Generated on February 18, 2025:
The boat traveled through a gorgeous magical forest, where roses bloomed as if enchanted, with petals fluttering in the air, forming a sharp contrast with the surrounding lava. In the distance, towering mountains loomed in the clouds, like a fantasy landscape painting painted by a powerful magician.
This page was last updated on February 18, 2025 at 5:28 PM