AI creators tools

"Cheating couple caught in cafe" video Prompt + Comparison

This is an AI video generation comparison for the following image-to-video prompt:

A couple sits at a small white iron table outside café. They hold hands and look at each other. The shot stays steady at 24fps with a light film-like grain. It starts focused on the couple, then shifts to the back. A man with a suitcase walks into view. His face shows shock. That changes the mood fast. The street has striped awnings, café chairs, and a busy but quiet flow of people. You hear street sounds, some footsteps, and soft clinks of dishes. No traffic noise or music. While all this hap...
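Some of the notes in this comparison refer to a JSON version of the prompt. The full JSON prompt is not shown here, so the following is only a minimal sketch of how the visible part of the prompt could be structured. Every field name (`shot`, `focus`, `subjects`, `setting`, `audio`) is a hypothetical choice for illustration, not a schema defined by any of the tested tools; the serialized text is simply what would be pasted into the prompt box.

```python
import json

# Hypothetical structure: the field names below are illustrative only and
# are not a documented schema of any video generation tool.
prompt = {
    "shot": {"fps": 24, "camera": "steady", "look": "light film-like grain"},
    "focus": {
        "technique": "rack focus",
        "start": "couple at the table",
        "end": "man with a suitcase in the background",
    },
    "subjects": [
        {"who": "couple", "action": "hold hands and look at each other"},
        {"who": "man with a suitcase", "action": "walks into view", "expression": "shock"},
    ],
    "setting": [
        "small white iron table outside a café",
        "striped awnings",
        "café chairs",
        "busy but quiet flow of people",
    ],
    "audio": {
        "include": ["street sounds", "footsteps", "soft clinks of dishes"],
        "exclude": ["traffic noise", "music"],
    },
}

# Serialize to text; ensure_ascii=False keeps "café" readable.
prompt_text = json.dumps(prompt, indent=2, ensure_ascii=False)
print(prompt_text)
```

Keeping the "exclude" list separate from the "include" list makes the negative audio instructions (no traffic noise, no music) harder for a reader to miss than when they are buried in a sentence.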

Tested: October 4, 2025


Wan2.5 Preview: the JSON prompt worked better than in Veo 3. There's rack focus, natural movement, and contextual understanding.

Tested: October 4, 2025


I've tried several variations of this prompt and image with Veo 3. It doesn't seem to like doing rack focus for this scene.

Tested: October 4, 2025


Hailuo seems to benefit from more context, so I slightly improved the prompt, and it yielded better results than the original. The man now stops and looks instead of walking past the table as before.

Tested: October 4, 2025


Vidu doesn't have sound but has good prompt following.

Tested: October 4, 2025


Kling 2.5 Turbo: used the same context-rich prompt, and the result is good.

Tested: October 5, 2025


PixVerse V5: awkward, but not in the intended way. The husband just keeps walking, unbothered. The couple seems about to kiss but stops midway for no apparent reason (likely censorship?) in both generations I've run for this prompt.

Tested: October 8, 2025


Grok Imagine v0.9: that's funny. Multi-character dialogue still seems very raw.

Tested: October 22, 2025


Vidu Q2 Reference-to-Video: multi-character dialogue, lip sync, and prompt following are still very challenging. Used a single image as reference.

Tested: October 24, 2025


LTX-2: the JSON prompt worked beautifully, and multi-character dialogue is flawless. Nice cinematic camera motion.

This prompt tests:

Does the couple share an intimate gaze toward each other?

Does the rack focus smoothly shift from the couple to the background pedestrian?

Is the man in the background holding a suitcase clearly visible after focus shift?

Does his expression register as shocked/unsettled when revealed?

Does the audio include gentle café ambience (murmurs, cutlery, footsteps) without loud traffic or music?

Are the dialogue lines delivered clearly without subtitles or text overlays?
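The checklist above can be turned into a simple per-model scorecard. This is a minimal sketch, assuming each question is judged as a plain pass or fail by a human viewer; the `Scorecard` class, the short criterion labels, and the "Model A" marks are hypothetical placeholders, not results from this comparison.

```python
from dataclasses import dataclass, field

# Short labels for the six questions listed above (labels are illustrative).
CRITERIA = [
    "intimate gaze",
    "smooth rack focus",
    "suitcase man visible after focus shift",
    "shocked expression",
    "café ambience without traffic or music",
    "clear dialogue without subtitles or overlays",
]

@dataclass
class Scorecard:
    """Pass/fail marks for one model, filled in by a human reviewer."""
    model: str
    passed: dict = field(default_factory=dict)

    def mark(self, criterion: str, ok: bool) -> None:
        # Reject typos so every mark maps to a known checklist item.
        if criterion not in CRITERIA:
            raise ValueError(f"unknown criterion: {criterion}")
        self.passed[criterion] = ok

    def score(self) -> tuple:
        # Unmarked criteria count as not passed.
        return sum(self.passed.values()), len(CRITERIA)

# Placeholder usage: "Model A" and its marks are made up for illustration.
card = Scorecard("Model A")
card.mark("intimate gaze", True)
card.mark("smooth rack focus", False)
hits, total = card.score()
print(f"{card.model}: {hits}/{total} criteria passed")
```

Counting unmarked criteria as failures keeps scores comparable across models even when a clip has no audio and some questions cannot be judged.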

Check out the results from Wan (Online Platform) (Wan2.5 Preview) vs Google Gemini App (Veo 3 Fast) vs Freepik (Hailuo 02) vs Vidu AI (Vidu Q2 Cinematic) vs Freepik (Kling 2.5 Turbo) vs PixVerse (PixVerse V5) vs GROK (Grok Imagine v0.9) vs Vidu AI (Vidu Q2 Reference-to-Video) vs LTX Studio (LTX-2) for similar or identical prompts side-by-side.
