Wan2.5 Preview, a major upgrade to Alibaba Group’s visual generative models, dropped at the end of September 2025.
Wan2.5-Preview is now out, and it is said to reshape visual generation with a new architecture and stronger capabilities.
It is built on a native multimodal architecture that handles text, images, video and audio. It can generate videos with synchronized audio, including vocals, sound effects and background music. It also follows instructions more precisely, producing photorealistic results, varied art styles, creative text effects and professional-grade charts.
What sets this video model apart is its built-in audio, and with this release Wan begins to rival Veo 3. Users can also upload their own audio file. Sound and visuals stay in sync, covering clear voices, ASMR, ambient room sounds and music, with support for many languages. You can also generate a video from an audio track.
Here's what else this model offers:
Richer video movement. It generates cinematic 10-second videos at 1080p and 24fps, with richer spatiotemporal detail, complete storytelling and stable performance.
Exact text in images. It brings visuals to life with striking aesthetics, realistic textures, accurate text rendering and well-structured graphics.
Instruction-based editing. A dialogue-driven editing model supports flexible single-image or multi-image edits as well as the creation of new content.
Visual reasoning power. Improved natural language understanding and instruction following let it generate images or videos from text prompts and reference images that call for complex reasoning.
If you'd like to access this model, you can explore the following options: