AI creators tools

Qwen Image & Video Generator

Qwen is Alibaba’s family of AI models with text, video, and image capabilities.

Overview

The Qwen family of models is available in Alibaba Cloud’s Qwen Chat app and can be downloaded from Hugging Face for testing.

Developers can also integrate the models through APIs. Pricing varies with token usage, with Qwen-Turbo being the cheapest option at $0.0004 per 1,000 input tokens.
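To get a feel for what that rate means in practice, here is a minimal sketch of the arithmetic. The only number taken from the text is the Qwen-Turbo input rate; the function name and the sample token counts are made up for illustration, and output tokens are left out because no rate for them is given here.

```python
# Rate quoted in the text above: $0.0004 per 1,000 input tokens (Qwen-Turbo).
QWEN_TURBO_USD_PER_1K_INPUT = 0.0004


def estimate_input_cost(input_tokens: int,
                        usd_per_1k: float = QWEN_TURBO_USD_PER_1K_INPUT) -> float:
    """Return the estimated input-token cost in USD (hypothetical helper)."""
    if input_tokens < 0:
        raise ValueError("input_tokens must be non-negative")
    return input_tokens / 1000 * usd_per_1k


if __name__ == "__main__":
    # Sample token counts are illustrative, not measurements.
    for tokens in (1_000, 250_000, 1_000_000):
        print(f"{tokens:>9,} input tokens -> ${estimate_input_cost(tokens):.4f}")
```

At this rate, a million input tokens comes to roughly 40 cents, so input cost is unlikely to be the deciding factor for small projects.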

The images are of decent quality, with occasional hiccups such as extra fingers or legs.

Video generation takes a while (several minutes, likely 10-20 or more depending on server load), and the next attempt typically fails. An unknown amount of time has to pass before you can generate another video.

Images, by contrast, can be generated one after another with little or no wait.

Qwen-Image is a free-to-use text-to-image model built by the team that also made the Qwen language models.

It is released under the Apache 2.0 license and packs in features like complex text placement, strong prompt following, and image editing such as adding or removing things or changing poses.

It also handles tasks like detecting objects in an image, segmenting them, estimating depth and edges, generating new view angles, and sharpening images.

Image quality and how it handles text. People like how clearly it puts text into images, many saying it beats models like Flux and SDXL at that. But it’s not perfect. It slips a bit when the surface is bent or warped and sometimes looks kinda fake like it was pasted on.

Compared to other models. It’s better than WAN 2.2 and Flux at handling text and making skin look more natural. It also follows instructions more closely. WAN still leads when it comes to making super real-looking people in videos, but Qwen-Image stands out for how it handles language-based image generation.

Performance and hardware. The full model takes up around 40GB so it needs a graphics card with at least 24GB memory unless you shrink it down. You can shrink it using formats like GGUF or FP8, but quantizing too much, of course, makes the output worse.

It can run on regular computers too, but it’s really slow there, taking almost an hour for a single picture.
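The trade-off between quantization and VRAM can be roughed out with simple arithmetic. The sketch below assumes the 40GB figure corresponds to 16-bit weights (an assumption, not something the text states) and scales the size linearly with bits per weight. It counts weights only, so real usage will be higher once activations and other pipeline components are loaded, and real GGUF quants often use slightly more than their nominal bit width.

```python
# Rough weights-only memory estimate for a quantized checkpoint.
FULL_MODEL_GB = 40.0      # figure quoted in the text above
ASSUMED_FULL_BITS = 16    # ASSUMPTION: the 40GB checkpoint stores 16-bit weights


def estimated_weights_gb(bits_per_weight: float) -> float:
    """Scale the full checkpoint size linearly with bits per weight."""
    if bits_per_weight <= 0:
        raise ValueError("bits_per_weight must be positive")
    return FULL_MODEL_GB * bits_per_weight / ASSUMED_FULL_BITS


def fits_in_vram(bits_per_weight: float, vram_gb: float) -> bool:
    """True if the weights alone fit; activations and other components are ignored."""
    return estimated_weights_gb(bits_per_weight) <= vram_gb


if __name__ == "__main__":
    # Nominal bit widths; labels are illustrative.
    for label, bits in (("16-bit", 16), ("FP8", 8), ("4-bit", 4)):
        size = estimated_weights_gb(bits)
        verdict = "fits" if fits_in_vram(bits, 24) else "does not fit"
        print(f"{label:>6}: ~{size:.0f}GB of weights, {verdict} in 24GB")
```

Under these assumptions, an FP8 version lands around 20GB and a 4-bit version around 10GB, which is why the quantized builds are the practical route on a 24GB card.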

Censorship stuff. It’s mostly open with few limits. It can make adult images and even ones on hot-button topics like Tiananmen. Replicate might mark some outputs as not safe but won’t block them.

You can run it in ComfyUI, or try it on Hugging Face Spaces or in Qwen's online chat.

Tags

Freeware, Apache License 2.0, PC-based, #Image & Graphics

Educators and Trainers, Creative Professionals, Content Creators, Media and Film Makers, Marketing and Branding Specialists, Developers and Tech Creators, Nonprofit and Advocacy Creators, Small Business Owners, Entertainment and Performance Artists, Professional Content Creators

This tool offers the following AI models:

This tool is free to use when installed locally and is offered under Apache License 2.0.

People are pumped that it’s fully open and pushing ahead with fewer limits. Some wonder how long that’ll last and what it means for teams like Meta or Mistral. Many like using regular sentences instead of tags to make pictures.

Image quality gets good feedback overall, but it’s not seen as a huge leap forward.

A bunch of users feel like image models haven’t really changed much since Flux dropped. Some say newer ones like Flux Krea or Qwen-Image are better at following prompts, but the output still looks kinda the same, like the same oversaturated, fake-feeling stuff. One user even called it Groundhog Day.

Qwen-Image got mentioned for doing comic panels and more consistent characters, but folks still seem to prefer the speed of older models like Flux Schnell. Krea can do things like pixel art, and some folks found ways to run it on GPUs with less than 8GB using Q4 quant versions.

Others say Qwen’s good at following prompts but struggles with output sharpness. Some fix that with sharpening tools. There are also complaints about Qwen and other models sticking too close to the same look no matter how the prompt is changed. It often misses things like camera angle, abstract ideas like “not a cat”, or more layered scenes.

On top of that, the images can be fuzzy without post-processing. And while Qwen’s base model is decent, people are waiting on more fine-tuned models to really boost quality.

Overall, folks are split. Some say Qwen is solid and just needs polish, others feel like it's not keeping up with expectations. A few even said it looks like stuff made on Bing.

Source: Reddit

Prompt:
A swirl of thick, glossy paints—electric blue, crimson red, deep violet, and canary yellow—suspend mid-air against a seamless soft-gray backdrop. They begin rotating slowly in place, then spiral outward in rhythmic bursts, dancing like a gravity-defying ballet. Each color leaves trails that weave into one another, colliding and splashing mid-flight. As the swirling tightens, the paints slam together in a perfectly timed crescendo, morphing into a suspended 3D paint-formed sculpture of the words “AI Creators Tools”. The words are sculpted from thick oil paints in vivid yellow, deep purple, and bright lime-green. Each letter is ultra-bold, geometric sans with slightly rounded corners, front-on view, and medium extrusion depth. The paint strokes are chunky and textured, with visible ridges and glossy wet shine. Colors alternate across letters, some blending at the edges where purple swirls into yellow or lime streaks over purple. The final form glows subtly from within, with small droplets breaking off and floating away in slow motion, catching the light. A fine mist of color glistens in the background. Aesthetic: painterly, tactile, art-studio product shot.

Generated on September 11, 2025:

While Qwen-Image is quite good with typography, the Qwen video generator is lagging behind so far. Tested through chat.qwen.ai with Qwen3-Max-Preview.
Prompt:
A painted woman, depicted in digital illustration style, her skin layered with visible brushstroke textures, stands perfectly centered in a realistic urban street, her posture quiet and composed. She holds a magenta umbrella, shielding herself from the soft rain that falls in streaks across the frame. Her face is turned slightly in profile, with the camera placed at eye level, using a symmetrical frontal composition, giving the scene a poised, contemplative stillness. Her hair transitions at the ends into delicate wisps of smoke, dyed in a vibrant palette of turquoise, purple, and gold—the smoke rising and trailing behind her in slow motion, airy and ethereal, subtly moving with the breeze. Her figure is rendered in a surreal, painterly digital style, with bold Van Gogh-inspired swirling brushstroke textures and glowing lighting. The texture of her skin and clothing contrasts against the realism of the street around her. Behind her, the busy contemporary city is photorealistic, with wet pavement reflecting colorful storefronts, passing people, cars, and trees. Every detail—the bus halting in the distance, neon shop signs glowing in puddles, and subtle reflections—is crisp and grounded. The sky is overcast, diffusing the light evenly and casting a cinematic, moody ambiance. The visual juxtaposition emphasizes two realities colliding: her surreal, painted presence vs the grounded cinematic realism of the city.

Generated on August 26, 2025:

Image output
Generated through Qwen Chat
Prompt:
A bigfoot wearing lime-green sunglasses holds this soda can in one hand as if advetising it, behind is a blurred out tropical beach

Generated on August 18, 2025:

Image output
Nicely done on the first try. As a bonus, it even transferred the watermark URL that was on the original soda can image.
Prompt:
Photorealistic, cinematic Will Smith meme reads "I hate spaghetti", as he is shown creaming and throwing a bowl with spaghetti back at the viewer refusing to eat it. Bowl and spaghetti along with tomato sauce and meatballs are captured mid flight in the mid-background and foregraund. Dramatic, epic, comically exagerated

Generated on August 7, 2025:

Image output
This is a great one. Add the keywords 'photorealistic, cinematic' to ensure a realistic style.
Prompt:
Will Smith meme reads "I hate spaghetti", as he is shown creaming and throwing a bowl with spaghetti back at the viewer refusing to eat it. Bowl and spaghetti along with tomato sauce and meatballs are captured mid flight in the mid-background and foregraund. Dramatic, epic, comically exagerated

Generated on August 7, 2025:

Image output
Generated in a cartoonish style since there was no style reference, but overall pretty great. I'll do photorealistic next.
Prompt:
The camera moves in a slow dolly shot, revealing a woman seated motionless at a gilded desk by the mirrow. [...] Her posture is upright but relaxed. The frame captures her from behind as well as her reflection in the mirror as she smiles warmly.

Generated on February 3, 2025:

Qwen2.5-Max rendering a mirror reflection video
Prompt:
A sausage dog wearing stylish wind goggles drives a gleaming chrome motorcycle, its long ears flapping wildly in the breeze, and its mouth open in an excited, playful expression. The dog looks thrilled as it grips the handlebars tightly. In the front basket of the motorcycle, a ginger-and-white cat sits energetically, its fur tousled by the wind, with wide, excited eyes and an open-mouthed expression of joy. The background features a vast countryside road stretching into the distance, lined with golden fields and distant mountains under warm golden sunlight. The entire scene exudes quirky, dynamic energy with a fun and cinematic vibe.

Generated on January 31, 2025:

Qwen2.5-Max producing a hyperrealistic scene of animals riding a bike
Prompt:
Close-up shot of a woman’s tear-filled eyes as she pleads during a heated argument with her partner, seen from his back. The camera slowly zooms in on the tears streaking her flushed cheeks, the soft glow of kitchen lights barely illuminating the scene behind her.

Generated on January 31, 2025:

Qwen2.5-Max emotional scene text-to-video generation
Prompt:
A highly intense and cinematic scene of a blonde female soldier in camouflage bandana with piercing eyes in the foreground, partially submerged in muddy water, aiming a modern assault rifle directly at the viewer and firing. Her face and head are soaked, with mud and grime on her shoulders emphasizing her rugged, battle-worn appearance. Behind her, a group of heavily armed soldiers, all in tactical gear and helmets, are advancing through the water, partially blurred to suggest depth. The background features a dense, misty jungle with faint outlines of trees, adding a humid, gritty atmosphere. The lighting is diffused, with overcast conditions casting a moody, natural glow on the scene. The color palette is dominated by cold earthy tones such as greens, browns, and greys, enhanced by the reflective surfaces of water and wet gear. The overall mood is tense and cinematic, capturing a sense of danger and urgency.

Generated on January 31, 2025:

Qwen2.5-Max cinematic scene generation has some videogame aesthetics
Prompt:
Raw footage of a 22-year-old influencer screaming from excitement while flying with a parachute in the sky to impress his followers. Loose closeup on his screaming face, fisheye lens, crisp and candid raw footage

Generated on January 31, 2025:

Qwen2.5-Plus model's text-to-video raw footage simulation example with custom aspect ratio.
Prompt:
A surreal and dynamic scene of a futuristic woman floating mid-air, as if suspended in time. She has platinum-white hair flowing outward, defying…

Generated on January 31, 2025:

Qwen2.5-Plus model's text-to-video generation example.
Prompt:
Raw selfie photo taken by a 19-year old man on an empty street on a summer day

Generated on January 30, 2025:

Image output
Selfie with natural light photography example
Prompt:
A tiny boy riding a seahorse underwater at speed [...]

Generated on January 30, 2025:

Image output
Miniature faking tilt-shift photography example
Prompt:
A young woman sitting at a rustic wooden table in a cozy, softly lit café. Her head rests on her hand, elbow propped on the table, her expression…

Generated on January 30, 2025:

Image output
Loose closeup human face example
Prompt:
Medium shot, fish-eye lens. Shallow depth of field creates a focus on the robots taking selfies while at the top of the mountain. All robots huddled…

Generated on January 30, 2025:

Image output
Qwen2.5-Plus natural lighting robots selfie example
Prompt:
Underwater view of a cute baby otter diving in clear blue (pacyfic cyan #04A9D6) water. Camera shows its face close as it is looking at the viewer.…

Generated on January 30, 2025:

Image output
Qwen2.5-Plus animal underwater generation example
Prompt:
A sleek product image of a futuristic beverage can on a solid white background, featuring the brand name "AI creators tools" in bold, modern…

Generated on January 30, 2025:

Image output
Qwen2.5-Plus product image with text generation example
Prompt:
Create an image A futuristic and elegant composition of a female dancer mid-pose in the air gracefully spinning with flowing white garments. The…

Generated on January 30, 2025:

Image output
Qwen2.5-Plus generation in chat

Latest Qwen Image & Video Generator News

August 21, 2025

Qwen-Image-Edit is now natively supported in ComfyUI. Click the template icon on the sidebar → Browse Templates → Image → Qwen Image Edit.

feature

August 18, 2025

Qwen-Image-Edit just launched.
Main features include:
Bilingual text editing. It supports both English and Chinese edits without messing up the style.
Content-level changes. You can rotate stuff or create new IPs.
Visual tweaks. Add, delete, or insert stuff in the image.

model

August 7, 2025

Qwen-Image, an image generation foundation model in the Qwen series, is out under the Apache 2.0 license.

model
Useful Links
Qwen-Image InstantX Inpainting ControlNet

Workflow

Qwen-Image InstantX Inpainting ControlNet is natively supported in ComfyUI

Qwen-Image Quantized

Version

GGUF models for low VRAM machines

Boreal-Qwen-Image

Version

Qwen-Image Boring Reality - experimental LoRAs that help you create super realistic images with Qwen

Qwen Image Edit Inpaint (Spaces)

Other

Try inpainting images with Qwen Image Edit using a brush.

Qwen Image Edit Relight (Spaces)

Other

Try relighting images with Qwen Image Edit on Hugging Face Spaces.

Qwen-Image-Lightning

Version

Qwen-Image has been distilled to run in 8 steps. You get nearly the same image quality with more than 50% less compute.

Qwen-Image ComfyUI Native Workflow Example

Workflow

After updating ComfyUI, you can find the workflow file in the templates, or drag the workflow below into ComfyUI to load it.

This page was last updated on August 21, 2025 at 12:31 AM