Qwen Image & Video Generator

Qwen2.5-VL is Alibaba’s newest AI model with text, video, and image capabilities.

Overview

The model is available on Alibaba Cloud’s Qwen Chat app and can be downloaded from Hugging Face for testing.

Developers can also integrate it through APIs. Pricing varies with token usage; Qwen-Turbo is the cheapest option at $0.0004 per 1,000 input tokens.
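To get a feel for what that rate means in practice, here is a minimal sketch of the arithmetic. It uses only the Qwen-Turbo input price quoted above; the function and constant names are hypothetical, and output-token pricing is not covered because the text does not state it.

```python
# Price taken from the text: $0.0004 per 1,000 input tokens for Qwen-Turbo.
# Output-token pricing is not given in the text, so it is left out here.
QWEN_TURBO_INPUT_USD_PER_1K = 0.0004


def estimate_input_cost(input_tokens: int,
                        usd_per_1k: float = QWEN_TURBO_INPUT_USD_PER_1K) -> float:
    """Return the estimated input-token cost in USD (hypothetical helper)."""
    if input_tokens < 0:
        raise ValueError("input_tokens must be non-negative")
    return input_tokens / 1000 * usd_per_1k


if __name__ == "__main__":
    # Print the estimated cost for a few illustrative token counts.
    for tokens in (1_000, 250_000, 1_000_000):
        cost = estimate_input_cost(tokens)
        print(f"{tokens:>9,} input tokens: about ${cost:.4f}")
```

At this rate, one million input tokens come to roughly $0.40. Treat the figure as an estimate only, since actual billing depends on the current price list and on output tokens as well.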

Image quality is decent, with occasional glitches such as extra fingers or legs.

Video generation is slow: expect several minutes, often 10 to 20 or more depending on server load. A second attempt made right afterwards typically fails, and it is unclear how long you need to wait before you can generate the next video.

Images, by contrast, can be generated more or less back to back.

Tags

Freeware, Apache License 2.0, Web-based, #Image & Graphics

  • API Availability
  • Private Generation
  • Text-to-Image
  • Video Generation

  • Educators and Trainers
  • Creative Professionals
  • Content Creators
  • Media and Film Makers
  • Marketing and Branding Specialists
  • Developers and Tech Creators
  • Nonprofit and Advocacy Creators
  • Small Business Owners
  • Entertainment and Performance Artists
  • Professional Content Creators

This tool is free to use and is offered under the Apache License 2.0.

Prompt: Create an image A futuristic and elegant composition of a female dancer mid-pose in the air gracefully spinning with flowing white garments. The dancer is placed centrally in the frame her profile turned to the left with her long platinum-white hair flowing backward, suggesting dynamic movement. [...] The lighting is diffused and soft, predominantly cyan and teal, complementing the subject's glowing accessories and enhancing the futuristic tone. The polished floor reflects the subject faintly

Generated on January 30, 2025:

Image output
Qwen2.5-Plus generation in chat

Prompt: A sleek product image of a futuristic beverage can on a solid white background, featuring the brand name "AI creators tools" in bold, modern typography. The can design incorporates elements of technology and creativity, including abstract digital patterns, glowing circuit motifs, and icons representing filmmaking, generative AI, and YouTube creation. Vibrant hues of electric blue, silver, and neon accents create a tech-inspired aesthetic. The overall design feels innovative and dynamic, emphasizing the fusion of AI and creative tools.

Generated on January 30, 2025:

Image output
Qwen2.5-Plus product image with text generation example

Prompt: Underwater view of a cute baby otter diving in clear blue (pacyfic cyan #04A9D6) water. Camera shows its face close as it is looking at the viewer. In the ultimate left part of the frame there's half of the large smooth red plastic ball visible.

Generated on January 30, 2025:

Image output
Qwen2.5-Plus animal underwater generation example

Prompt: Medium shot, fish-eye lens. Shallow depth of field creates a focus on the robots taking selfies while at the top of the mountain. All robots huddled together while taking a group selfie picture. Goldy is a red 1950s retro robot monster, slightly rusty. Dolbus is a sleek futuristic humanoid robot with rounded features, black and steel look and taller than the rest. Bingus is a copper steampunk robot with big eyes and a stylish hat posing with crossed arms, chin up, projecting attitude. All robots appear happy, smiling directly at the camera. The mood is jolly and humorous.

Generated on January 30, 2025:

Image output
Qwen2.5-Plus natural lighting robots selfie example

Prompt: A young woman sitting at a rustic wooden table in a cozy, softly lit café. Her head rests on her hand, elbow propped on the table, her expression distant and bored. Strands of her wavy chestnut hair frame her face, catching the golden glow of the late afternoon sun streaming through a large window beside her. The table is scattered with a cup of steaming coffee, a half-read book, and a notebook with scribbled notes. Behind her, the blurred bustle of the café contrasts sharply with her stillness, creating a sense of detachment. Cinematic framing focuses on her from a slight angle, emphasizing the slant of her gaze and the wistfulness in her eyes. The warm ambiance is accented by bokeh light effects, with muted tones of beige, cream, and soft green dominating the background, adding a dreamy, introspective mood.

Generated on January 30, 2025:

Image output
Loose closeup human face example

Prompt: A tiny boy riding a seahorse underwater at speed [...]

Generated on January 30, 2025:

Image output
Miniature-faking (tilt-shift) photography example

Prompt: Raw selfie photo taken by a 19-year old man on an empty street on a summer day

Generated on January 30, 2025:

Image output
Selfie with natural light photography example

Prompt: A surreal and dynamic scene of a futuristic woman floating mid-air, as if suspended in time. She has platinum-white hair flowing outward, defying gravity, and her serene expression conveys a sense of otherworldly calm. Her body is arched gracefully, arms outstretched [...]

Generated on January 31, 2025:

Qwen2.5-Plus model's text-to-video generation example.

Prompt: Raw footage of a 22-year-old influencer screaming from excitement while flying with a parachute in the sky to impress his followers. Loose closeup on his screaming face, fisheye lens, crisp and candid raw footage

Generated on January 31, 2025:

Qwen2.5-Plus model's text-to-video raw footage simulation example with custom aspect ratio.

Prompt: A highly intense and cinematic scene of a blonde female soldier in camouflage bandana with piercing eyes in the foreground, partially submerged in muddy water, aiming a modern assault rifle directly at the viewer and firing. Her face and head are soaked, with mud and grime on her shoulders emphasizing her rugged, battle-worn appearance. Behind her, a group of heavily armed soldiers, all in tactical gear and helmets, are advancing through the water, partially blurred to suggest depth. The background features a dense, misty jungle with faint outlines of trees, adding a humid, gritty atmosphere. The lighting is diffused, with overcast conditions casting a moody, natural glow on the scene. The color palette is dominated by cold earthy tones such as greens, browns, and greys, enhanced by the reflective surfaces of water and wet gear. The overall mood is tense and cinematic, capturing a sense of danger and urgency.

Generated on January 31, 2025:

Qwen2.5-Max cinematic scene generation with some video game aesthetics

Prompt: Close-up shot of a woman’s tear-filled eyes as she pleads during a heated argument with her partner, seen from his back. The camera slowly zooms in on the tears streaking her flushed cheeks, the soft glow of kitchen lights barely illuminating the scene behind her.

Generated on January 31, 2025:

Qwen2.5-Max emotional scene text-to-video generation

Prompt: A sausage dog wearing stylish wind goggles drives a gleaming chrome motorcycle, its long ears flapping wildly in the breeze, and its mouth open in an excited, playful expression. The dog looks thrilled as it grips the handlebars tightly. In the front basket of the motorcycle, a ginger-and-white cat sits energetically, its fur tousled by the wind, with wide, excited eyes and an open-mouthed expression of joy. The background features a vast countryside road stretching into the distance, lined with golden fields and distant mountains under warm golden sunlight. The entire scene exudes quirky, dynamic energy with a fun and cinematic vibe.

Generated on January 31, 2025:

Qwen2.5-Max producing a hyperrealistic scene of animals riding a bike

Prompt: The camera moves in a slow dolly shot, revealing a woman seated motionless at a gilded desk by the mirrow. [...] Her posture is upright but relaxed. The frame captures her from behind as well as her reflection in the mirror as she smiles warmly.

Generated on February 3, 2025:

Qwen2.5-Max rendering a mirror reflection video

This page was last updated on January 31, 2025 at 9:26 PM