Ovis-Image is a 7B-parameter text-to-image model: you give it a text prompt and it generates an image.
It was developed by AIDC-AI, the AI team at Alibaba International Digital Commerce Group.
It is released under the Apache-2.0 license, a permissive open-source license that allows free use, modification, and redistribution, including commercial use, as long as the license and copyright notices are kept.
Its main focus is fast, clean image generation from text. It is tuned to keep images sharp, especially when they contain rendered text. Its text rendering comes close to that of larger 20B models such as Qwen-Image and can match leading closed models such as GPT-4o in text-heavy cases, while staying compact enough to run on consumer GPUs.
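To put "compact enough to run on consumer GPUs" in rough numbers, here is a back-of-the-envelope sketch of the memory needed just to hold 7B parameters. The bytes-per-parameter values are standard sizes for these number formats, but which precisions Ovis-Image actually ships in or supports is an assumption here, and the estimate ignores activations and any other runtime overhead.

```python
# Rough estimate of weight memory for a 7B-parameter model.
# Assumption: the full 7B parameters are resident in GPU memory at once.
PARAMS = 7e9  # parameter count stated for Ovis-Image

# Bytes per parameter for common storage formats (illustrative, not from the model card).
BYTES_PER_PARAM = {"fp32": 4, "bf16": 2, "int8": 1, "int4": 0.5}


def weight_memory_gb(params: float, bytes_per_param: float) -> float:
    """Return the weight footprint in GB (10^9 bytes), excluding activations."""
    return params * bytes_per_param / 1e9


for fmt, size in BYTES_PER_PARAM.items():
    print(f"{fmt}: {weight_memory_gb(PARAMS, size):.1f} GB")
```

Under these assumptions, 16-bit weights take about 14 GB, which is within reach of a 16 GB or 24 GB consumer card, whereas a 20B model at the same precision would need about 40 GB for weights alone.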
The Ovis project was started to improve how visual and textual information are combined inside multimodal models. Ovis does this by giving images a more structured way of being embedded into the model.
Ovis-Image is one member of the Ovis model family. While the main Ovis models take both images and text as input for understanding tasks, this one focuses on generating new images from text prompts. Specifically, Ovis-Image-7B is built on Ovis-U1, a 3-billion-parameter unified model.

If you'd like to try this model, you can access it in the following ways: