"Cheating couple caught in cafe" video Prompt + Comparison

This is an AI video generation comparison for the image-to-video prompt:

A couple sits at a small white iron table outside a café. They hold hands and look at each other. The shot stays steady at 24fps with a light film-like grain. It starts focused on the couple, then shifts to the back. A man with a suitcase walks into view. His face shows shock. That changes the mood fast. The street has striped awnings, café chairs, and a busy but quiet flow of people. You hear street sounds, some footsteps, and soft clinks of dishes. No traffic noise or music. While all this hap...

Cinematic Style

rack_focus
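Several of the notes below mention a JSON version of this prompt, which is not reproduced on this page. As a rough illustration only, here is a minimal sketch of how the natural-language prompt above could be restructured as JSON. Every key name is hypothetical and does not follow any platform's documented schema; the values are taken from the prompt text. Dialogue is left out because the prompt above is truncated.

```python
import json


def build_prompt() -> dict:
    """Return a structured version of the cafe prompt.

    All key names are hypothetical and chosen for illustration;
    no video platform is known to require this exact layout.
    """
    return {
        "style": "cinematic",
        "camera": {
            "movement": "steady",
            "fps": 24,
            "grain": "light film-like",
            "technique": "rack_focus",
            "focus_from": "couple",
            "focus_to": "background",
        },
        "subjects": [
            {
                "id": "couple",
                "position": "small white iron table outside a café",
                "action": "hold hands and look at each other",
            },
            {
                "id": "man_with_suitcase",
                "action": "walks into view",
                "expression": "shock",
            },
        ],
        "setting": [
            "striped awnings",
            "café chairs",
            "busy but quiet flow of people",
        ],
        "audio": {
            "include": ["street sounds", "footsteps", "soft clinks of dishes"],
            "exclude": ["traffic noise", "music"],
        },
    }


# Serialize for pasting into a prompt field; ensure_ascii=False keeps "café" readable.
prompt_json = json.dumps(build_prompt(), indent=2, ensure_ascii=False)
print(prompt_json)
```

Splitting camera, subjects, and audio into separate keys makes it easier to tweak one aspect (for example the focus shift) between runs while keeping the rest of the prompt unchanged.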

Tested: October 4, 2025

The JSON prompt worked better here than in Veo 3. There's rack focus, natural movement, and contextual understanding.

Wan (Online Platform)

Wan2.5 Preview

Tested: October 4, 2025

I've tried several variations of this prompt + image with Veo 3. It doesn't seem to like doing rack focus for this scene.

Google Gemini App

Veo 3 Fast

Tested: October 4, 2025

Hailuo seems to benefit from more context, so I slightly expanded the prompt, and it yielded better results than the original. Now the man stops and looks instead of walking past the table as before.

Freepik

Hailuo 02

Tested: October 4, 2025

Vidu doesn't have sound but has good prompt following.

Vidu AI

Vidu Q2 Cinematic

Tested: October 4, 2025

Used the same context-rich prompt and it's good.

Freepik

Kling 2.5 Turbo

Tested: October 5, 2025

Awkward, but not in the intended way. The husband just keeps walking, unbothered. The couple leans in to kiss but stops midway for no apparent reason (likely censorship?) in both generations I've run for this prompt.

PixVerse

PixVerse V5

Tested: October 8, 2025

That's funny.)) Multi-character dialogue still seems very raw.

GROK

Grok Imagine v0.9

Tested: October 22, 2025

Multi-character dialogue + lip sync + prompt following is still very challenging. Used a single image as reference.

Vidu AI

Vidu Q2 Reference-to-Video

Tested: October 24, 2025

JSON prompt worked beautifully, and multi-character dialogue is flawless. Nice cinematic camera motion.

LTX Studio

LTX-2

Tested: December 3, 2025

With this JSON prompt, multi-character dialogue is correct, but the visual part is off: rack focus isn't done correctly, and the man with the suitcase isn't coming closer and instead poses awkwardly.

Kling AI

Kling 2.6

Tested: December 3, 2025

This simpler natural language prompt worked better. Could be just the length of the prompt.

Kling AI

Kling 2.6

Tested: December 17, 2025

The tweaked JSON worked; otherwise there was some confusion. Multi-character dialogue is correct, and the focus shift works. The man's expression is a bit too hilarious))

Wan (Online Platform)

Wan 2.6

Tested: December 17, 2025

Spawned a new guy into the scene this time.

Wan (Online Platform)

Wan 2.6

Tested: December 24, 2025

It's quite good. There's no rack focus, but all the actors are animated and lip-synced properly.

Dreamina AI

Seedance 1.5 Pro

This prompt tests:

Does the couple share an intimate gaze toward each other?

Does the rack focus smoothly shift from the couple to the background pedestrian?

Is the man in the background holding a suitcase clearly visible after focus shift?

Does his expression register as shocked/unsettled when revealed?

Does the audio include gentle café ambience (murmurs, cutlery, footsteps) without loud traffic or music?

Are the dialogue lines delivered clearly without subtitles or text overlays?

Check out the results from Wan (Online Platform) (Wan2.5 Preview) vs Google Gemini App (Veo 3 Fast) vs Freepik (Hailuo 02) vs Vidu AI (Vidu Q2 Cinematic) vs Freepik (Kling 2.5 Turbo) vs PixVerse (PixVerse V5) vs GROK (Grok Imagine v0.9) vs Vidu AI (Vidu Q2 Reference-to-Video) vs LTX Studio (LTX-2) vs Kling AI (Kling 2.6) vs Kling AI (Kling 2.6) vs Wan (Online Platform) (Wan 2.6) vs Wan (Online Platform) (Wan 2.6) vs Dreamina AI (Seedance 1.5 Pro) for similar or identical prompts side-by-side.