AI creator tools

Uni-1 multimodal model

Name: Uni
Version: 1
Also Known As: Luma Uni-1, Luma Uni v1
License: Proprietary
Creator: Luma AI

Uni-1 is a multimodal model from Luma AI released in early March 2026. It works with text and images in the same system. The idea is simple: one model understands a scene and also generates or edits the image.

Luma frames it as a step toward multimodal general intelligence. The model reads instructions, studies visual input, then produces images that follow the plan. So instead of many AI tools each handling a small part, Uni-1 tries to do the thinking and the drawing in one place.

Technically, it runs as a decoder-only autoregressive transformer: text and images sit in one mixed token sequence. The model reads that stream, reasons about it, then outputs images. Luma says this setup helps the system keep scene logic, layout, and object relations more stable when it edits or generates pictures.
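As a rough illustration of that interleaved stream, the sketch below packs text tokens and discrete image tokens into one sequence and runs it through a tiny causal decoder. Luma has not published Uni-1's tokenizer or architecture, so the vocabulary sizes, the <boi>/<eoi> markers, and the toy PyTorch decoder here are assumptions for illustration only.

    # Illustrative sketch only: not Uni-1's real tokenizer or architecture.
    # Assumptions: text and image tokens share one vocabulary, image spans are
    # wrapped in <boi>/<eoi> markers, and a causal decoder predicts next tokens.
    import torch
    import torch.nn as nn

    TEXT_VOCAB = 32_000   # assumed text vocabulary size
    IMAGE_VOCAB = 8_192   # assumed codebook size for discrete image tokens
    BOI, EOI = 0, 1       # assumed special tokens that open/close an image span
    VOCAB = 2 + TEXT_VOCAB + IMAGE_VOCAB

    class TinyMultimodalDecoder(nn.Module):
        """Minimal decoder-only transformer over one mixed text+image sequence."""
        def __init__(self, vocab_size, d_model=256, n_heads=4, n_layers=2):
            super().__init__()
            self.embed = nn.Embedding(vocab_size, d_model)
            block = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
            self.blocks = nn.TransformerEncoder(block, n_layers)
            self.head = nn.Linear(d_model, vocab_size)

        def forward(self, tokens):
            # Causal mask: each position attends only to earlier text/image tokens.
            mask = nn.Transformer.generate_square_subsequent_mask(tokens.size(1))
            hidden = self.blocks(self.embed(tokens), mask=mask)
            return self.head(hidden)  # next-token logits over the shared vocabulary

    # One interleaved stream: prompt text followed by a reference image span.
    text_ids = torch.randint(2, 2 + TEXT_VOCAB, (1, 12))       # prompt tokens
    image_ids = torch.randint(2 + TEXT_VOCAB, VOCAB, (1, 16))   # image code tokens
    mixed = torch.cat(
        [text_ids, torch.tensor([[BOI]]), image_ids, torch.tensor([[EOI]])], dim=1)

    model = TinyMultimodalDecoder(VOCAB)
    logits = model(mixed)
    print(logits.shape)  # torch.Size([1, 30, 40194])

The point of the sketch is only the data layout: because text and image tokens live in the same causally masked sequence, image tokens generated later can condition on both the instructions and any reference image that came before them.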

Right now Uni-1 is not a separate product with public pricing or model weights. Luma has not shared an open license either. Access, if it shows up later, will likely sit inside the company’s paid platform and API.

The company behind it is Luma AI, founded by Amit Jain and Alex Yu. The team first drew attention with easy 3D capture tools, then moved into generative media with Dream Machine and its video models. Uni-1 looks like the next research step on that path.

The main idea is reasoning before drawing. The model breaks a prompt into parts, checks constraints, plans composition, then renders the result. Luma claims this helps with things like scene logic and identity consistency, areas where many image models still slip up.
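To make the reason-before-drawing idea concrete, here is a minimal sketch of two-phase decoding, assuming the plan is emitted as ordinary text tokens before an image span opens. The StubModel class, the plan_then_render function, the token ids, and the phase limits are all hypothetical; Luma has not described how Uni-1 actually structures its planning step.

    import random

    class StubModel:
        """Stand-in decoder that picks random tokens, just to make the loop run."""
        def __init__(self, vocab_size):
            self.vocab_size = vocab_size

        def next_token(self, seq):
            return random.randrange(self.vocab_size)

    def plan_then_render(model, prompt_ids, boi_id, eoi_id,
                         max_plan=32, max_image=64):
        seq = list(prompt_ids)
        # Phase 1: emit text "plan" tokens (objects, layout, constraints)
        # until the model opens an image span with the <boi> marker.
        for _ in range(max_plan):
            tok = model.next_token(seq)
            seq.append(tok)
            if tok == boi_id:
                break
        else:
            seq.append(boi_id)  # force the image span open if the plan runs long
        # Phase 2: decode image tokens conditioned on the prompt plus the plan.
        for _ in range(max_image):
            tok = model.next_token(seq)
            seq.append(tok)
            if tok == eoi_id:
                break
        return seq

    out = plan_then_render(StubModel(vocab_size=100),
                           prompt_ids=[5, 6, 7], boi_id=0, eoi_id=1)
    print(len(out))  # prompt + plan tokens + image span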

Main features visible right now:

- Text-to-image generation: turns long prompts or storyboard-style instructions into images.
- Image editing: changes existing images using instructions and visual context.
- Reference-guided generation: uses one or more reference images to guide the output.
- Style transfer: shifts visual style while keeping the subject or layout.
- Consistent character handling: tries to keep identity steady across versions.
- Multilingual prompts: reads instructions written in several languages.
- Spatial reasoning: tracks where things sit in a scene.
- Causal reasoning: tries to follow cause-and-effect instructions.
- Temporal reasoning: storyboard-style prompts can describe sequences or layered events.
- Multi-turn refinement: users can adjust the image across several steps.

Current outputs are static images. The public examples show generated and edited pictures; resolution numbers have not been shared. Video is not shown as a direct Uni-1 feature yet, though Luma says the same model idea could later stretch into video, voice agents, and interactive world simulation.

In general, the interesting part is not just image creation. Plenty of models do that already. The new claim is that the model thinks through a visual problem first. That planning step might help with complex edits, infographics, storyboards, or scenes where many objects must stay logically arranged.

Possible uses show up first in creative work: storyboard planning for films or ads; reference-based image generation where a character or object must stay the same; reasoning-heavy edits like changing the time of day or a scene's cause-and-effect; and style shifts across manga, meme-style visuals, or other art styles while the core scene stays intact.

Evaluations And Samples

No performance evaluations or sample outputs are available for this model yet.

Where To Find Uni-1

If you'd like to access this model, you can explore the following possibilities: