Wan by Alibaba

Wan2.1 by Alibaba’s Wan team is an open-source AI suite for generating videos from text and images. It handles motion, physics, and text rendering, and leads the VBench benchmark. Free to use under Apache 2.0.

Overview

Wan2.1 is an open-source AI video tool built by Alibaba’s Wan team. It transforms text and images into high-quality videos while handling motion dynamics, physics, and text rendering in both Chinese and English. It leads the VBench leaderboard, outperforming both open-source and commercial competitors.

The best part is that it is free and runs on consumer-grade GPUs: the T2V-1.3B model requires only 8.19 GB of VRAM, making it compatible with almost all consumer-grade GPUs. It can generate a 5-second 480P video on an RTX 4090 in about 4 minutes. Keep in mind that the model files are large, dozens of gigabytes, so you need plenty of free disk space.

Wan2.1 is fully open-source under the Apache 2.0 license. No hidden fees, no subscriptions: just grab the code and start creating.

You can start using Wan2.1 right away. Check out their GitHub for installation and usage instructions. Pre-trained models are also available on Hugging Face and ModelScope for easy integration.
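For the Hugging Face route, a minimal sketch of fetching the weights with the huggingface_hub Python package could look like this. The repo ID Wan-AI/Wan2.1-T2V-1.3B, the models/ folder, and the helper names are assumptions for illustration and are not taken from this page, so check the official model card for the exact repository name. Because the weights are large, the sketch checks free disk space first and leaves the threshold to the caller.

```python
import shutil
from pathlib import Path

# Assumed repo ID; verify it against the official model card before use.
REPO_ID = "Wan-AI/Wan2.1-T2V-1.3B"


def local_dir_for(repo_id: str, root: str = "models") -> Path:
    """Map a Hub repo ID such as 'org/name' to a local folder 'root/name'."""
    return Path(root) / repo_id.split("/")[-1]


def has_free_space(path: str, min_free_gb: float) -> bool:
    """Return True if the disk holding `path` has at least `min_free_gb` GB free."""
    free_bytes = shutil.disk_usage(path).free
    return free_bytes >= min_free_gb * 1e9


def download_weights(min_free_gb: float, repo_id: str = REPO_ID, root: str = "models") -> Path:
    """Download every file of `repo_id` into a local folder and return that folder."""
    if not has_free_space(".", min_free_gb):
        raise RuntimeError(f"Less than {min_free_gb} GB free; aborting download.")
    # Imported lazily so the helpers above work without the package installed.
    from huggingface_hub import snapshot_download

    dest = local_dir_for(repo_id, root)
    snapshot_download(repo_id=repo_id, local_dir=str(dest))
    return dest
```

Calling download_weights(min_free_gb=50) would place the files in models/Wan2.1-T2V-1.3B; the 50 here is only a placeholder, so pick a threshold that matches the model you actually download.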

Fal.ai has also released this model on their platform.

There's a GGUF version for ComfyUI.

Tags

Freeware, Apache License 2.0, PC-based, #Video & Animation

  • Image-to-Video
  • Inpainting (Videos)
  • Outpainting (Videos)
  • Text-2-Video

  • Educators and Trainers
  • Creative Professionals
  • Content Creators
  • Media and Film Makers
  • Marketing and Branding Specialists
  • Developers and Tech Creators
  • Nonprofit and Advocacy Creators
  • Small Business Owners
  • Entertainment and Performance Artists
  • Professional Content Creators

This tool is free to use and is offered under Apache License 2.0.

Users are testing the Wan2.1 I2V model with quantized (compressed) GGUF files in ComfyUI, mainly on RTX 3060 cards. Here's what they're seeing:

Performance on RTX 3060

  • 416x416, 25 steps → About 9 minutes for 2 seconds of video
  • 512x512, 25 steps → Around 13.5 minutes for 2 seconds
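To put the two RTX 3060 timings on a common footing, the short calculation below converts them to minutes of compute per second of video and compares the slowdown with the increase in pixel count. It uses only the figures reported above, which are approximate ("about", "around"), and two data points are not enough to establish a general scaling rule; the variable and function names are made up for this sketch.

```python
# Timings reported above for an RTX 3060 (25 steps, 2 seconds of video each).
runs = [
    {"width": 416, "height": 416, "minutes": 9.0, "video_seconds": 2},
    {"width": 512, "height": 512, "minutes": 13.5, "video_seconds": 2},
]


def minutes_per_video_second(run: dict) -> float:
    """Compute time in minutes needed for one second of generated video."""
    return run["minutes"] / run["video_seconds"]


def pixels(run: dict) -> int:
    """Number of pixels in a single frame."""
    return run["width"] * run["height"]


small, large = runs
pixel_ratio = pixels(large) / pixels(small)       # how many times more pixels per frame
time_ratio = large["minutes"] / small["minutes"]  # how many times longer the run took

for run in runs:
    print(f'{run["width"]}x{run["height"]}: '
          f"{minutes_per_video_second(run):.2f} min per second of video")
print(f"pixel ratio: {pixel_ratio:.2f}, time ratio: {time_ratio:.2f}")
```

On these two runs the time ratio (1.5) is close to the pixel ratio (about 1.51), which suggests that generation time grew roughly in proportion to frame area here.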

Key Resources & Process

  • GGUF compressed models are up on Hugging Face
  • Basic setup guide available [here]
  • More details on the ComfyUI example page [here]

Hardware Tips

  • Hardware used: 12GB VRAM, 48GB RAM (extra RAM helps a lot)
  • Some users get by with 16-32GB RAM

Choosing Compression Levels

  • Q4_0 was used, but higher-precision quantization levels (bigger files) give better quality
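The size side of that trade-off can be estimated roughly from the bits stored per weight. The sketch below assumes the standard GGUF block layouts (Q4_0, Q5_0 and Q8_0 store 32 weights plus one 16-bit scale per block, giving 4.5, 5.5 and 8.5 bits per weight) and a nominal 14 billion parameters, taken from the "14B" in the model name. Real files will differ somewhat, since some tensors are typically kept at higher precision and the parameter count is rounded.

```python
# Effective bits per weight, including the per-block scale (assumed standard GGUF layouts).
BITS_PER_WEIGHT = {
    "Q4_0": 4.5,   # 18 bytes per 32 weights
    "Q5_0": 5.5,   # 22 bytes per 32 weights
    "Q8_0": 8.5,   # 34 bytes per 32 weights
    "F16": 16.0,   # unquantized half precision
}

# Nominal parameter count, assumed from the "14B" in the model name.
N_PARAMS = 14e9


def estimated_size_gb(n_params: float, quant: str) -> float:
    """Rough file size in GB (10^9 bytes) if every weight used the given format."""
    return n_params * BITS_PER_WEIGHT[quant] / 8 / 1e9


for quant in BITS_PER_WEIGHT:
    print(f"{quant}: ~{estimated_size_gb(N_PARAMS, quant):.1f} GB")
```

Under these assumptions a Q4_0 file comes to a little under 8 GB and a Q8_0 file to nearly 15 GB, so stepping up a level costs both disk space and memory.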

    How It Compares

    More stable than SkyReels. Less "melting" effect than some other tools.

    [ Reddit ]

    Prompt: A stylish woman walks down a Tokyo street filled with warm glowing neon and animated city signage. She wears a black leather jacket, a long red dress, and black boots, and carries a black purse.

    Generated on February 26, 2025:

    Video example published by Fal.ai for Wan text-to-video (T2V)

    Prompt: A stylish woman walks down a Tokyo street filled with warm glowing neon and animated city signage.

    Generated on February 26, 2025:

    Video example published by Fal.ai for Wan image-to-video (I2V)

    Prompt: A sausage dog wearing stylish wind goggles drives a gleaming chrome motorcycle, its long ears flapping wildly in the breeze, and its mouth open in an excited, playful expression. The dog looks thrilled as it grips the handlebars tightly. In the front basket of the motorcycle, a ginger-and-white cat sits energetically, its fur tousled by the wind, with wide, excited eyes and an open-mouthed expression of joy. The background features a vast countryside road stretching into the distance, lined with golden fields and distant mountains under warm golden sunlight. The entire scene exudes quirky, dynamic energy with a fun and cinematic vibe.

    Generated on February 28, 2025:

    Test in a local ComfyUI install of the 1.3-billion-parameter Wan model

    Prompt: Close-up shot of a woman’s tear-filled eyes as she pleads during a heated argument with her partner, seen from his back. The camera slowly zooms in on the tears streaking her flushed cheeks, the soft glow of kitchen lights barely illuminating the scene behind her.

    Generated on February 28, 2025:

    Test in a local ComfyUI install of the 1.3-billion-parameter Wan T2V model

    Prompt: The scene begins with a close-up of striking red high heels, sharp and polished, walking away on a fractured asphalt road. The camera remains low to the ground, fully focused on the legs as they walk with deliberate confidence. The camera steadily tracks the legs from behind, capturing their motion as they stride through a desolate, post-apocalyptic street.

    Generated on February 28, 2025:

    Wan2.1, 1.3-billion-parameter model

    Prompt: A sleek humanoid robot performs a mesmerizing dance in a stark, minimalistic futuristic room. The seamless white walls glow softly with pulsing lines of light, creating a sharp contrast against the robot’s dark figure. The camera starts with a static wide shot, gradually moving into a slow orbit around the robot, capturing its fluid, precise movements. The room’s dynamic lighting synchronizes with the rhythm of the dance, casting glowing patterns and subtle shadows on the floor and walls. The overall scene combines elegance, sophistication, and a high-tech sci-fi aesthetic.

    Generated on February 28, 2025:

    Image-to-video generated in ComfyUI with Wan2.1-I2V-14B-480P-gguf (Q4 quantization)

    Latest Wan by Alibaba News

    February 28, 2025

    One-click local install of Wan with ComfyUI is now available through the Pinokio AI browser: https://x.com/cocktailpeanut/status/1894936091461038228

    This page was last updated on February 28, 2025 at 6:54 PM