GLM-Image is an open-source image generation model from Z.ai that combines an autoregressive module with a diffusion decoder (a hybrid design sketched below).
It was released on January 14, 2025.
Its image quality is comparable to that of mainstream latent diffusion models, but it performs better at rendering text and at prompts that require real-world knowledge. It captures the intended meaning accurately and handles information-dense prompts while still producing detailed, high-quality images.
Beyond text-to-image generation, it also supports image-to-image tasks such as editing, style transfer, preserving a subject's identity, and keeping different elements of an image consistent.
Because inference optimizations for this architecture are currently limited, the runtime cost is still relatively high: the model requires either a single GPU with more than 80 GB of memory or a multi-GPU setup (see the memory-check sketch below).
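
To make the hybrid design more concrete, here is a purely illustrative Python sketch of the two-stage flow: an autoregressive stage predicts a compact token representation of the image, and a diffusion decoder renders that representation into pixels. All class and method names are hypothetical placeholders, not GLM-Image's actual API.

```python
from dataclasses import dataclass

# Purely illustrative; names are hypothetical, not GLM-Image's real API.

@dataclass
class LatentPlan:
    tokens: list[int]  # discrete image tokens predicted autoregressively


class AutoregressiveStage:
    """Stands in for the transformer that samples image tokens from a prompt."""

    def plan(self, prompt: str, length: int = 16) -> LatentPlan:
        # Dummy "sampling": derive deterministic pseudo-tokens from the prompt.
        return LatentPlan(tokens=[(hash(prompt) + i) % 256 for i in range(length)])


class DiffusionDecoder:
    """Stands in for the denoising decoder that turns tokens into pixels."""

    def render(self, plan: LatentPlan) -> bytes:
        # Dummy "rendering": just pack the tokens into raw bytes.
        return bytes(plan.tokens)


def generate(prompt: str) -> bytes:
    plan = AutoregressiveStage().plan(prompt)   # stage 1: autoregressive planning
    return DiffusionDecoder().render(plan)      # stage 2: diffusion decoding


if __name__ == "__main__":
    print(len(generate("a cat reading a newspaper")), "bytes of dummy image data")
```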
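
As a practical aid for the hardware requirement above, the following sketch uses standard PyTorch CUDA queries to report local GPU memory and judge whether a single GPU or a multi-GPU setup would be needed. The 80 GB threshold is taken from the statement above, not from an official specification.

```python
import torch

REQUIRED_GIB = 80  # threshold taken from the requirement stated above


def gpu_memory_report(required_gib: float = REQUIRED_GIB) -> None:
    """Print per-GPU memory and whether the stated requirement can be met."""
    if not torch.cuda.is_available():
        print("No CUDA GPUs detected.")
        return
    sizes = [
        torch.cuda.get_device_properties(i).total_memory / 2**30
        for i in range(torch.cuda.device_count())
    ]
    for i, gib in enumerate(sizes):
        print(f"GPU {i}: {gib:.1f} GiB")
    if max(sizes) > required_gib:
        print("A single GPU meets the requirement.")
    elif sum(sizes) > required_gib:
        print("No single GPU is large enough; a multi-GPU setup would be needed.")
    else:
        print("Total GPU memory is below the stated requirement.")


if __name__ == "__main__":
    gpu_memory_report()
```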

If you would like to access the model, the following options are available: