Lumina-DiMOO image model

Name: Lumina-DiMOO
Creator: Shanghai AI Lab

Lumina-DiMOO is an open-source multimodal model built by Shanghai AI Lab together with universities including Shanghai Jiao Tong University, the University of Sydney, and CUHK. It handles text-to-image generation, image editing, inpainting, style transfer, and image understanding, all in one model.

It generates images at up to 1024x1024 resolution.

It doesn’t rely on a hybrid setup that mixes different model types. Instead, it uses fully discrete diffusion for every task, which leads to quicker results and steadier output across tasks. It also supports subject-driven generation, image editing, and controllable generation.
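To give a rough feel for what discrete diffusion decoding looks like, here is a toy sketch of confidence-based iterative unmasking, a common sampler for masked discrete diffusion. It is not Lumina-DiMOO's code: the `toy_predict` stand-in, the `unmask_decode` name, the cosine schedule, and all sizes are assumptions made up for illustration.

```python
import math
import random

MASK = -1  # sentinel for a token position that has not been decoded yet


def toy_predict(tokens, vocab_size, rng):
    """Hypothetical stand-in for a real denoising network.

    Returns a (token, confidence) guess for every masked position.
    """
    return {
        i: (rng.randrange(vocab_size), rng.random())
        for i, t in enumerate(tokens)
        if t == MASK
    }


def unmask_decode(length, vocab_size, steps, seed=0):
    """Fill a fully masked sequence, committing the most confident guesses first."""
    rng = random.Random(seed)
    tokens = [MASK] * length
    for step in range(steps):
        guesses = toy_predict(tokens, vocab_size, rng)
        if not guesses:
            break  # everything is already decoded
        # Cosine schedule (an assumption): how many positions stay masked after this step.
        keep_masked = math.floor(length * math.cos(math.pi / 2 * (step + 1) / steps))
        n_reveal = max(1, len(guesses) - keep_masked)
        # Commit the highest-confidence guesses; the rest stay masked for later steps.
        ranked = sorted(guesses.items(), key=lambda kv: kv[1][1], reverse=True)
        for pos, (tok, _conf) in ranked[:n_reveal]:
            tokens[pos] = tok
    return tokens


if __name__ == "__main__":
    print(unmask_decode(length=16, vocab_size=8192, steps=8))
```

Because many positions are committed in parallel at each step, the number of model calls is set by `steps` rather than by the sequence length, which is one reason this family of samplers can be fast.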

People on Reddit were hyped to try Lumina-DiMOO at first, but ran into a wall: it needs over 40 GB of VRAM, so most home GPUs can’t handle it. Folks with 48 GB setups had better luck, though one user complained that his fans got too loud.

Some liked its speed and how well it understood prompts compared with models like Flux or Qwen. Others said it was slower than SDXL even though it runs on fewer parameters. A few pointed out that some of the demo images didn’t match their prompts well or didn’t look very realistic.

The devs said they don’t plan to shrink the model, since that would hurt quality. They do plan to release a demo on Hugging Face soon so people can test both text-to-image and image-to-text.

Where To Find Lumina-DiMOO

You can get the model from the following places:

Weights, GitHub, Licence, Project Page

Hugging Face
