Put Real Products and People into AI Video Ads (Veo 3 Full Tutorial)

Learn how to use canvas prompting to bring real people and real products into AI-generated videos.

Canvas prompting lets you place real products and real people into AI-generated videos with high visual fidelity. This method opens new creative options for e-commerce advertising and product marketing without hiring a camera crew. All you need are a few images — one of your subject and one of your product — plus a clear prompt and a little iteration. Join our AI Automation Community to download the exact prompts and canvas files used in this guide and to practice these steps with other creators.

Canvas prompt examples

What is canvas prompting?

Canvas prompting is a simple, visual way to tell a video generation model how to combine multiple elements into a single scene. Instead of asking the model to invent a product or person from scratch, you provide the model with images as anchors: a subject image and a product image. Then you add a small amount of annotation on that canvas — arrows, text, and a concise action prompt — and feed the result into a video model such as Veo 3 via Flow.

Ad example

Why use this method for e-commerce ads?

  • High product fidelity: the model uses your product image as a visual reference, improving accuracy.
  • Fast iterations: you can test different creators, angles, and copy without shipping products or booking talent.
  • Cost savings: eliminate the need for on-site production for many short ad formats.
  • Flexible outputs: mine clips from generated footage and stitch them into multiple ad variants.

What you need before you start

Gather the following resources before you build your first canvas prompt:

  1. A clear product image. Remove the background if you want the model to composite the product more naturally.
  2. An image of the subject you want in the ad. This can be a real creator photo or an image generated in Midjourney or another image model.
  3. An action prompt describing what you want the subject to do and say.
  4. A canvas editor such as Canva to assemble the inputs and add simple annotations.
  5. Access to a video generation workflow that supports frames-to-video, such as Flow + Veo 3.

Prompting setup

Step 1: Create or select your subject image

If you have a real creator, a single, high-quality head-and-shoulders image works well. If you need to generate a subject, use an image model like Midjourney. The approach here is to meta-prompt: ask ChatGPT to generate a Midjourney prompt that produces a natural, UGC-style influencer image (a scripted version of this step follows the guidance list below).

Key guidance for the Midjourney prompt:

  • Ask for a landscape orientation if the image will be a source frame for video.
  • Request natural lighting and minimal cinematic filters. The model performs better with photorealistic references.
  • State approximate age, gender, and setting. For example: a woman in her late 20s standing in a bright kitchen, natural expression.
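
If you want to script this meta-prompting step, the sketch below uses the OpenAI Python SDK. The "gpt-4o" model name and the instruction wording are assumptions; any capable chat model works, and you paste its output into Midjourney as usual.

```python
# Minimal meta-prompting sketch using the OpenAI Python SDK.
# Assumptions: the "gpt-4o" model name and the instruction wording are
# illustrative; swap in whatever chat model you have access to.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

meta_prompt = (
    "Write a Midjourney prompt for a photorealistic, UGC-style influencer image: "
    "a woman in her late 20s standing in a bright kitchen, natural expression, "
    "natural lighting, no cinematic filters, landscape orientation (--ar 16:9)."
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": meta_prompt}],
)

print(response.choices[0].message.content)  # paste this into Midjourney
```
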
Midjourney output

Step 2: Prepare your product image

Download a clean product image from your asset library or from a product page. Remove the background if possible. Smaller items such as lip balm or earbuds need careful scaling and prompt direction to avoid appearing oversized in the final video.
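
If you would rather script background removal than use an online tool, a minimal sketch with the open-source rembg library is shown below; the file names are placeholders.

```python
# Background removal sketch using the open-source rembg library
# (pip install rembg). File names are placeholders.
from rembg import remove
from PIL import Image

product = Image.open("lip_balm_original.png")
cutout = remove(product)            # returns an RGBA image with a transparent background
cutout.save("lip_balm_cutout.png")  # PNG preserves the alpha channel
```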

Canva setup

Step 3: Build the canvas in Canva

Open a new canvas that matches the video aspect ratio you want. Place the subject image and the product image on the canvas, then use simple drawing tools to annotate it (a scripted alternative follows this list). Two annotations are most useful:

  • An arrow indicating which product to focus on or where the model should move the product into the subject's hand.
  • Text that contains the action prompt you will feed into Veo 3. Keep this text short and visible on the canvas so the model can read your intent.
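
Canva is the easiest way to assemble this, but the same canvas can be built in code. The sketch below uses Pillow as a scripted alternative; the file names, coordinates, and sizes are assumptions you would tune to your own assets.

```python
# Canvas assembly sketch using Pillow (pip install pillow).
# File names, coordinates, and sizes are placeholders for your assets.
from PIL import Image, ImageDraw

W, H = 1280, 720                      # 16:9 canvas to match the video aspect ratio
canvas = Image.new("RGB", (W, H), "white")

subject = Image.open("subject.png").resize((640, 720))
product = Image.open("lip_balm_cutout.png").resize((180, 280))

canvas.paste(subject, (0, 0))
canvas.paste(product, (900, 300), product)   # use the cutout's alpha as a mask

draw = ImageDraw.Draw(canvas)
# Arrow from the product toward the subject's hand.
draw.line((900, 440, 700, 480), fill="red", width=6)
draw.polygon([(700, 480), (730, 460), (725, 500)], fill="red")
# Short, readable action prompt placed on the canvas itself.
draw.text((660, 40), "She picks up the lip balm, applies it, and smiles.",
          fill="black")

canvas.save("canvas_prompt.png")
```
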
Canva with lip balm

Step 4: Write a targeted video prompt

Now create a prompt for the video model. The best approach here is meta-prompting: ask ChatGPT to produce a Veo 3-ready prompt that captures motion, dialogue, and focus. For example:

A young attractive woman in her late 20s stands in a bright, realistic kitchen. She reaches onto the counter, picks up a Burt's Bees lip balm tube, brings it to her lips, applies it, and smiles. She says, "I never go anywhere without my Burt's Bees lip balm." Make the movements fully lifelike. This is an advertisement focused on the Burt's Bees tube.

Place that text on the canvas in readable color and size. You can also add a short title field when you upload the canvas into Flow, but including the full action on the canvas tends to improve alignment in many cases.
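
If you run this workflow across several products, it can help to treat the action prompt as a template so only the campaign-specific fields change. A minimal sketch; every parameter value is a placeholder:

```python
# Action-prompt template sketch: each argument is a placeholder you swap
# per campaign before placing the finished text on the canvas.
def action_prompt(subject: str, setting: str, product: str,
                  action: str, line: str) -> str:
    return (
        f"{subject} stands in {setting}. {action} "
        f'She says, "{line}" Make the movements fully lifelike. '
        f"This is an advertisement focused on the {product}."
    )

print(action_prompt(
    subject="A young attractive woman in her late 20s",
    setting="a bright, realistic kitchen",
    product="Burt's Bees lip balm tube",
    action=("She reaches onto the counter, picks up a Burt's Bees lip balm "
            "tube, brings it to her lips, applies it, and smiles."),
    line="I never go anywhere without my Burt's Bees lip balm.",
))
```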

Google Flow setup

Step 5: Generate video in Flow using Veo 3

In Flow, choose the frames-to-video option and upload your canvas. Add a short label like "lip balm advertisement" to the frame. Select the Veo 3 model and a speed setting. Start a generation run and let the model produce multiple versions. Expect variation across outputs.
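
Flow is the point-and-click route. If you prefer a programmatic one, Google's Gemini API exposes Veo through the google-genai Python SDK; the sketch below follows its documented long-running generate_videos pattern. The model name and image-to-video support are assumptions that change between releases, so check the current docs before relying on them.

```python
# Frames-to-video sketch via the google-genai SDK (pip install google-genai).
# Assumptions: the "veo-3.0-generate-preview" model name and its
# image-to-video support vary by release; check the current Gemini API docs.
import time
from google import genai
from google.genai import types

client = genai.Client()  # reads GEMINI_API_KEY from the environment

with open("canvas_prompt.png", "rb") as f:
    canvas = types.Image(image_bytes=f.read(), mime_type="image/png")

operation = client.models.generate_videos(
    model="veo-3.0-generate-preview",
    prompt="lip balm advertisement",   # short label, like the Flow frame title
    image=canvas,
)

while not operation.done:              # video generation is a long-running job
    time.sleep(10)
    operation = client.operations.get(operation)

video = operation.response.generated_videos[0].video
client.files.download(file=video)
video.save("lip_balm_ad.mp4")
```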

Google Flow and Veo 3 output

Step 6: Iterate to improve product fidelity

Products, especially small items, can appear at the wrong size or look disconnected from the subject in early outputs. Treat the generation process like a search, and try these adjustments when fidelity is off (a run-log sketch for tracking these changes follows the list):

  • Swap product images. A different angle or a background-removed file can help the model place the object better.
  • Edit the prompt to specify relative size. For instance, add: "The lip balm tube should appear approximately 1.5 inches tall when held by the subject."
  • Reduce the annotation text size and adjust arrow placement to emphasize the product's intended position.
  • Regenerate multiple times. Each run is a stochastic attempt; more samples increase the chance of a capture you can use.
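
A simple way to keep this search disciplined, and to change only one variable at a time, is to log every run. A minimal sketch using Python's csv module; the column choices are only a suggestion, not part of any tool's schema:

```python
# Run-log sketch: record what changed between generations so you can
# tell which variable actually fixed product fidelity. Field names are
# suggestions, not part of any tool's schema.
import csv
from datetime import date

with open("generation_log.csv", "a", newline="") as f:
    writer = csv.writer(f)
    writer.writerow([
        date.today().isoformat(),
        "run_07",
        "lip_balm_cutout.png",                  # which product image was used
        "added: tube appears ~1.5 inches tall", # the one prompt change this run
        "scale fixed, label still blurry",      # outcome notes
    ])
```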

Step 7: Mine and stitch footage

Once you have generated multiple clips, treat them as raw footage. Typical editing steps include:

  1. Cut to the best moment where the subject speaks or interacts with the product. Short UGC ad clips often run 6 to 15 seconds.
  2. Grab a close-up generated frame focused on the product for the second shot.
  3. Layer captions, a logo, and music in your editing software. Replace or mute model-generated music if it does not match your ad tone.

For example, you might cut to the line "I never go anywhere without my Burt's Bees lip balm," then cut to a tight product shot for a 1–2 second brand close-up before ending on a call to action.
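
Any editor can make these cuts; if you want them scriptable, the sketch below shells out to ffmpeg (installed separately). Timestamps, run numbers, and file names are placeholders for your own footage.

```python
# Clip-and-stitch sketch shelling out to ffmpeg (must be installed).
# Timestamps and file names are placeholders. Note: -ss with -c copy
# cuts on keyframes; re-encode if you need frame-exact cuts.
import subprocess

# Cut the spoken line from the first generation (2s in, 6s long).
subprocess.run(["ffmpeg", "-y", "-ss", "2", "-t", "6",
                "-i", "run_03.mp4", "-c", "copy", "clip_talking.mp4"], check=True)

# Cut a tight 2-second product close-up from another run.
subprocess.run(["ffmpeg", "-y", "-ss", "1", "-t", "2",
                "-i", "run_07.mp4", "-c", "copy", "clip_closeup.mp4"], check=True)

# Concatenate the two clips with ffmpeg's concat demuxer.
with open("clips.txt", "w") as f:
    f.write("file 'clip_talking.mp4'\nfile 'clip_closeup.mp4'\n")

subprocess.run(["ffmpeg", "-y", "-f", "concat", "-safe", "0",
                "-i", "clips.txt", "-c", "copy", "ad_draft.mp4"], check=True)
```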

Final Veo 3 output

Practical tips and constraints

  • Expect variance: not every generation will be usable. The process is probabilistic.
  • Small products need precise prompts and multiple image sources to avoid scale errors.
  • Use real creator photos when possible. That reduces the chance of uncanny motion and improves authenticity.
  • Annotate carefully. Simple arrows and short text help the model understand which element is the product and how it should move.
  • Keep iterations focused. Change only one variable at a time between runs when troubleshooting. That makes cause and effect easier to spot.

Final checklist before publishing

  • Do multiple generation runs and save the best clips.
  • Confirm product scale and clarity in close-up frames.
  • Replace or mute any AI-generated music that conflicts with your brand.
  • Caption the spoken line and add a simple call to action at the end of the clip.
  • Run an approvals pass with legal and brand stakeholders before launching live campaigns.

Join our AI Automation Mastery community to download the exact prompts and canvas templates used in this guide and to practice these workflows with other builders. The community includes templates for Midjourney prompts, Veo 3 prompts, and annotated canvases that reduce setup time.
