Qwen-Image is a 20B-parameter image generation and editing model released by Alibaba in August 2025.
The Qwen series includes models like Qwen-7B, Qwen-VL for vision-language tasks, and Qwen-Image. They're part of Alibaba's open-source work, and Qwen-Image itself is released under the Apache 2.0 license. Qwen is Alibaba’s main push into large language and multimodal AI, and it’s often compared to Meta’s LLaMA, Google’s Gemini, and OpenAI’s GPT.
It’s built to render complex text inside images, including Chinese, and to make detailed image edits. It works both for generating new pictures and for changing existing ones.
You can prompt it for different art styles. It covers a wide range: photorealistic images, anime looks, minimalist designs, even impressionist paintings. And it keeps rendered text sharp across scripts such as English and Chinese.
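As a rough illustration of style-driven prompting, here is a minimal Python sketch. The `STYLE_HINTS` presets and the `build_prompt` helper are hypothetical names made up for this example, not part of Qwen-Image. The `generate` function assumes the model can be loaded through the Hugging Face `diffusers` library under the repo ID `Qwen/Qwen-Image`, and that a CUDA GPU with enough memory is available; treat both as assumptions to check against the model's own documentation.

```python
from typing import Optional

# Hypothetical style presets for illustration only; the wording is an assumption,
# not taken from Qwen-Image's documentation.
STYLE_HINTS = {
    "photo": "photorealistic, natural lighting",
    "anime": "anime style, clean line art",
    "minimalist": "minimalist design, flat colors, generous whitespace",
    "impressionist": "impressionist painting, visible brushstrokes",
}


def build_prompt(subject: str, style: str, text_in_image: Optional[str] = None) -> str:
    """Compose a prompt from a subject, a style preset, and optional text to render."""
    if style not in STYLE_HINTS:
        raise ValueError(f"unknown style: {style!r}")
    parts = [subject, STYLE_HINTS[style]]
    if text_in_image:
        # Quote the string so it reads as literal text to draw in the image.
        parts.append(f'with the text "{text_in_image}"')
    return ", ".join(parts)


def generate(prompt: str, seed: int = 0):
    """Assumed diffusers usage; downloads the full model on first call."""
    # Imported lazily so build_prompt works without torch or diffusers installed.
    import torch
    from diffusers import DiffusionPipeline

    # "Qwen/Qwen-Image" is the assumed Hugging Face repo ID.
    pipe = DiffusionPipeline.from_pretrained("Qwen/Qwen-Image", torch_dtype=torch.bfloat16)
    pipe = pipe.to("cuda")
    generator = torch.Generator(device="cuda").manual_seed(seed)
    return pipe(prompt=prompt, num_inference_steps=50, generator=generator).images[0]


if __name__ == "__main__":
    print(build_prompt("a coffee shop poster", "minimalist", "每日咖啡 Daily Coffee"))
```

Calling `generate(build_prompt(...))` would return a PIL image that can be written out with `.save("poster.png")`. Keeping the prompt builder separate from the model call makes it easy to try the same subject across styles without reloading the pipeline.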
The model combines a vision-language component, a custom layout component, and its core image generator. Together these help it keep fonts, layout, and design elements aligned. That makes it useful for things like posters, slides, or app mock-ups.
Beyond generation, it also edits images well. Think style transfer, pose adjustments, adding or removing objects, even tweaking small details. It also stays consistent across successive edits.
It handles image understanding tasks too. It can detect objects, find edges, estimate depth, synthesize new views, and upscale low-resolution images.
Qwen-Image achieved top scores on benchmarks such as GenEval, ImgEdit, and ChineseWord.
It is natively supported in ComfyUI.
If you'd like to access this model, you can explore the following possibilities:
Use our video cost calculator to compare prices between platforms offering the Qwen-Image model.
For locally hosted models, see the description and additional links at the bottom for versions, repos, and tutorials.