WEBVTT

00:00:00.000 --> 00:00:02.940
You know, the idea of making professional cinematic

00:00:02.940 --> 00:00:05.160
videos usually means, well, cameras, lights,

00:00:05.280 --> 00:00:08.179
maybe a studio. You definitely have to be OK

00:00:08.179 --> 00:00:11.580
with being on screen. But what if you could just

00:00:11.580 --> 00:00:14.460
skip all that? We're talking about generating

00:00:14.460 --> 00:00:16.920
these really realistic personalized videos of

00:00:16.920 --> 00:00:19.679
yourself, like a talking avatar, without ever

00:00:19.679 --> 00:00:22.359
actually recording yourself. And this level of

00:00:22.359 --> 00:00:24.440
creation, it isn't some far off future thing.

00:00:24.620 --> 00:00:27.539
It's accessible, like, right now, and mostly for

00:00:27.539 --> 00:00:29.699
free to start. Oh, it's absolutely true. And

00:00:29.699 --> 00:00:31.940
today we're really going to dive deep into the

00:00:31.940 --> 00:00:34.340
specific workflow from the source material we

00:00:34.340 --> 00:00:37.399
looked at. It uses this surprisingly powerful

00:00:37.399 --> 00:00:40.460
three-tool AI stack. So our mission today is

00:00:40.460 --> 00:00:42.920
basically to unpack that whole five-phase process.

00:00:42.939 --> 00:00:45.200
Yeah. And really focus on the practical stuff

00:00:45.200 --> 00:00:47.479
you need to master. We got to go beyond just

00:00:47.479 --> 00:00:50.039
naming the software. We're looking for that secret

00:00:50.039 --> 00:00:52.000
sauce for getting consistent prompts, the actual

00:00:52.000 --> 00:00:54.320
logistics of creating that avatar and how you

00:00:54.320 --> 00:00:56.679
genuinely direct the final video output. We're

00:00:56.679 --> 00:00:58.500
aiming to give you a serious shortcut to making

00:00:58.500 --> 00:01:00.700
this kind of high quality content, and efficiently

00:01:00.700 --> 00:01:03.140
too. OK, sounds good. Let's unpack this setup

00:01:03.140 --> 00:01:05.219
then. What are the tools that make this whole

00:01:05.219 --> 00:01:07.719
thing possible? All right, so the foundation

00:01:07.719 --> 00:01:10.659
here is these three specific tools. When you

00:01:10.659 --> 00:01:12.180
chain them together, they seem to work really

00:01:12.180 --> 00:01:15.560
well. First up, ChatGPT. And the source material

00:01:15.560 --> 00:01:18.870
is pretty clear: you don't actually need the

00:01:18.870 --> 00:01:21.890
paid Plus plan, the $20-a-month one, just to

00:01:21.890 --> 00:01:23.709
get started. Yeah, that's super important for

00:01:23.709 --> 00:01:25.730
accessibility, right? The free version works

00:01:25.730 --> 00:01:27.569
fine, mainly because we're going to be using

00:01:27.569 --> 00:01:32.010
these specialized custom GPTs. Think of them

00:01:32.010 --> 00:01:34.430
like little helpers trained for really specific

00:01:34.430 --> 00:01:37.689
jobs, like writing these cinematic image prompts.

00:01:37.730 --> 00:01:39.769
You really only need that Plus subscription later

00:01:39.769 --> 00:01:42.090
if you find you want faster speeds, or maybe

00:01:42.090 --> 00:01:44.730
access to the absolute newest models, but not

00:01:44.730 --> 00:01:47.379
essential at the start. OK. Tool number two is

00:01:47.379 --> 00:01:49.959
Nano Banana. That's for the image creation part.

00:01:50.099 --> 00:01:52.219
Yep. That's kind of the engine room. It's got

00:01:52.219 --> 00:01:54.239
a great free plan. You can make almost unlimited

00:01:54.239 --> 00:01:56.640
pictures, which is amazing. And its main strength

00:01:56.640 --> 00:01:59.060
is how it automatically handles that really tricky

00:01:59.060 --> 00:02:01.719
technical bit, keeping the face consistent across

00:02:01.719 --> 00:02:04.260
images. Yeah. Facial matching. Right. That consistency

00:02:04.260 --> 00:02:06.650
seems crucial. And then tool three. Then you

00:02:06.650 --> 00:02:09.770
move up to the big gun, Google VEO3. That's for

00:02:09.770 --> 00:02:11.969
the actual video generation. And what's really

00:02:11.969 --> 00:02:15.050
interesting here is the kind of financial opportunity

00:02:15.050 --> 00:02:17.930
built into how things are set up right now. There's

00:02:17.930 --> 00:02:20.810
a one-month free trial, which is great for testing.

00:02:21.270 --> 00:02:24.509
But get this, if you happen to be a student or

00:02:24.509 --> 00:02:27.610
have a .edu email address for any reason, you

00:02:27.610 --> 00:02:31.129
can currently get this massive 18-month free

00:02:31.129 --> 00:02:34.819
deal. I mean, that's a total game changer for

00:02:34.819 --> 00:02:37.259
long term experimentation, right? That deal alone

00:02:37.259 --> 00:02:39.219
makes this project seem really worthwhile exploring.

00:02:39.639 --> 00:02:41.879
But is juggling three different tools really

00:02:41.879 --> 00:02:44.020
worth the hassle compared to maybe finding an

00:02:44.020 --> 00:02:46.400
all-in-one generator? Yeah, I think so, because

00:02:46.400 --> 00:02:48.979
what you gain is control. And that control starts

00:02:48.979 --> 00:02:52.879
with a really key setup tip for VEO3. You absolutely

00:02:52.879 --> 00:02:55.020
must use the Flow interface. If you just try

00:02:55.020 --> 00:02:57.400
and do it in the normal Gemini chat window, you

00:02:57.400 --> 00:02:59.840
lose all that director-level control over video

00:02:59.840 --> 00:03:02.740
shape, quality settings, output formats, stuff

00:03:02.740 --> 00:03:04.860
you really need for professional results. Okay,

00:03:04.960 --> 00:03:07.139
so using Flow is like stepping out of the basic

00:03:07.139 --> 00:03:10.060
chat and into the production suite, basically.

00:03:10.300 --> 00:03:12.120
Exactly. And if we're talking about making the

00:03:12.120 --> 00:03:14.340
most of that free time, especially if you're

00:03:14.340 --> 00:03:16.919
on a trial, managing those credits has got to

00:03:16.919 --> 00:03:19.379
be super important. Oh, absolutely. Don't just

00:03:19.379 --> 00:03:21.699
burn through those precious free credits. A huge

00:03:21.699 --> 00:03:26.240
tip is organization and testing smart. Always

00:03:26.240 --> 00:03:29.199
start by asking VEO3 for just one video output

00:03:29.199 --> 00:03:31.400
first. Check the result, see if it's going the

00:03:31.400 --> 00:03:33.860
right direction, and then iterate. Don't ask

00:03:33.860 --> 00:03:36.800
for four variations right off the bat. And iterate

00:03:36.800 --> 00:03:39.539
using the fast mode first. Right. That saves

00:03:39.539 --> 00:03:42.330
a lot of resources. Totally. Quality mode looks

00:03:42.330 --> 00:03:44.650
amazing, but it chews through four times the

00:03:44.650 --> 00:03:47.469
credits. Use fast mode. It's about five times

00:03:47.469 --> 00:03:49.509
cheaper. It gives you a test video in under a

00:03:49.509 --> 00:03:52.050
minute, usually. Use that to quickly check your

00:03:52.050 --> 00:03:54.849
ideas, your prompts. Only switch over to quality

00:03:54.849 --> 00:03:56.669
mode when you're ready for the final polished

00:03:56.669 --> 00:03:59.330
output. So what's the biggest efficiency gain

00:03:59.330 --> 00:04:02.129
there, really, from testing in fast first? It

00:04:02.129 --> 00:04:04.710
just saves credits, lets you test way more ideas

00:04:04.710 --> 00:04:07.629
before the final render, maximizing that experimentation

00:04:07.629 --> 00:04:09.629
phase. Right, maximum testing within the budget.

00:04:09.979 --> 00:04:12.099
Okay, let's talk about those prompts then because

00:04:12.099 --> 00:04:14.539
they seem like the real core of getting the visuals

00:04:14.539 --> 00:04:17.740
right. Yeah, this brings us to what really separates,

00:04:17.740 --> 00:04:20.319
you know, amateur results from professional-looking

00:04:20.319 --> 00:04:23.060
AI video. It's the quality of the prompt. We're

00:04:23.060 --> 00:04:25.980
not just having a basic chat with ChatGPT here.

00:04:25.980 --> 00:04:29.600
We're using a specialized custom GPT, one that's

00:04:29.600 --> 00:04:32.300
designed specifically for image creation prompts.

00:04:32.620 --> 00:04:35.079
You can usually find these in the GPT store.

00:04:35.180 --> 00:04:37.639
Just search for something like 'Nano Banana prompt'

00:04:37.639 --> 00:04:42.060
or 'cinematic image prompt.' So these custom GPTs,

00:04:42.300 --> 00:04:45.310
they've... kind of absorbed the lessons from

00:04:45.310 --> 00:04:48.069
thousands of successful visual requests. They

00:04:48.069 --> 00:04:49.930
act almost like a digital storyboard artist for

00:04:49.930 --> 00:04:52.629
you. Precisely. They're trained on tons of successful

00:04:52.629 --> 00:04:54.850
examples, so they just get cinematic descriptions.

00:04:55.250 --> 00:04:57.170
They understand how to plan a picture that's

00:04:57.170 --> 00:04:58.850
going to look good once you add motion later.

00:04:59.230 --> 00:05:01.089
You've got to go way beyond simple stuff like,

00:05:01.089 --> 00:05:03.329
you know, person in a forest. You need to demand

00:05:03.329 --> 00:05:06.790
story, detail, mood. OK, so give us an example

00:05:06.790 --> 00:05:09.189
then. How should a listener frame that stronger,

00:05:09.209 --> 00:05:11.639
more cinematic request? Well, instead of just

00:05:11.639 --> 00:05:14.459
the basic description, you'd prompt the custom

00:05:14.459 --> 00:05:16.959
GPT with the scene's intent. Something like,

00:05:17.620 --> 00:05:20.480
create a cinematic style prompt for a young financial

00:05:20.480 --> 00:05:22.720
expert. She's presenting an idea in a modern

00:05:22.720 --> 00:05:26.139
office setting. The lighting needs to feel professional,

00:05:26.360 --> 00:05:29.019
making her look trustworthy. And then the GPT

00:05:29.019 --> 00:05:31.560
will spit back a whole picture plan. It'll include

00:05:31.560 --> 00:05:34.800
specific lighting details, maybe soft light flooding

00:05:34.800 --> 00:05:38.790
in from a large window. Camera angle ideas, like

00:05:38.790 --> 00:05:41.290
eye-level shot, medium close-up, even color

00:05:41.290 --> 00:05:44.009
notes, like cool blue and gray tones dominate

00:05:44.009 --> 00:05:47.110
the palette. Wow, okay. That level of specificity,

00:05:47.129 --> 00:05:49.149
you can see how that would drastically improve

00:05:49.149 --> 00:05:51.589
the visual quality, the fidelity, and just the

00:05:51.589 --> 00:05:53.370
overall mood of the scene. Yeah. But when you

00:05:53.370 --> 00:05:55.709
use one of these super descriptive prompts, isn't

00:05:55.709 --> 00:05:58.589
there a risk that the image generator, Nano Banana

00:05:58.589 --> 00:06:00.829
in this case, just gets... well, too creative,

00:06:01.329 --> 00:06:02.949
and kind of ignores the facial reference photo

00:06:02.949 --> 00:06:05.290
you give it later. Mm-hmm. Ah, yeah. That's

00:06:05.290 --> 00:06:07.709
the constant battle with AI art, isn't it? That

00:06:07.709 --> 00:06:10.209
prompt drift. You have to keep refining. It's

00:06:10.209 --> 00:06:12.889
never quite perfect first time. Which is why

00:06:12.889 --> 00:06:15.350
I also really love this powerful trick for iteration.

00:06:16.029 --> 00:06:18.569
Once you get a prompt that works reasonably well...

00:06:18.600 --> 00:06:21.860
ask that same custom GPT to generate, say, five

00:06:21.860 --> 00:06:23.819
different versions of that prompt. Just ask it

00:06:23.819 --> 00:06:25.959
to change only one thing each time, maybe the

00:06:25.959 --> 00:06:28.439
location or the time of day or the clothing she's

00:06:28.439 --> 00:06:30.860
wearing. Doing that lets you rapidly build up

00:06:30.860 --> 00:06:33.399
a library of related, effective prompts. Saves

00:06:33.399 --> 00:06:36.720
hours of manual tweaking later. You know, honestly,

00:06:36.839 --> 00:06:39.019
I still wrestle with prompt drift myself sometimes.

00:06:39.160 --> 00:06:42.579
It's tricky. So using the GPT to help debug image

00:06:42.579 --> 00:06:45.339
errors. Yeah, that's crucial, even for me. That's

00:06:45.339 --> 00:06:47.639
actually helpful to hear that even experts hit

00:06:47.639 --> 00:06:50.139
that wall sometimes. So, okay, let's say Nano

00:06:50.139 --> 00:06:52.579
Banana keeps making weird visual mistakes like

00:06:52.579 --> 00:06:54.639
the eyes look strange consistently or there's

00:06:54.639 --> 00:06:56.660
some repeating pattern in the background. How

00:06:56.660 --> 00:06:58.959
exactly does the AI help you debug that? Well,

00:06:58.959 --> 00:07:01.600
you basically describe that specific visual mistake

00:07:01.600 --> 00:07:04.199
back to the custom GPT. You tell it, hey, the

00:07:04.199 --> 00:07:06.519
eyes look weird in the output or there's this

00:07:06.519 --> 00:07:09.360
distracting pattern appearing. And the GPT will

00:07:09.360 --> 00:07:11.779
suggest changes to your prompt to try and fix

00:07:11.779 --> 00:07:15.319
it. It might say... Try adding photorealistic

00:07:15.319 --> 00:07:18.459
eyes to the main prompt, or add uncluttered background

00:07:18.459 --> 00:07:21.199
to the negative prompt, or maybe suggest tweaking

00:07:21.199 --> 00:07:23.160
the lighting description to focus more light

00:07:23.160 --> 00:07:25.560
clearly on the face. It helps you kind of zero

00:07:25.560 --> 00:07:27.459
in on what part of the prompt might be causing

00:07:27.459 --> 00:07:29.980
the error. I see. So the goal isn't just one

00:07:29.980 --> 00:07:32.540
perfect prompt, but actually a whole suite of

00:07:32.540 --> 00:07:34.699
prompts that are carefully engineered to keep

00:07:34.699 --> 00:07:37.199
that facial consistency, which I guess is the

00:07:37.199 --> 00:07:39.180
perfect lead in to actually creating the avatar

00:07:39.180 --> 00:07:41.800
itself. Right. So with those well-crafted prompts

00:07:41.800 --> 00:07:44.779
ready, we move over to Nano Banana. And here,

00:07:45.100 --> 00:07:47.279
consistency becomes like the absolute number

00:07:47.279 --> 00:07:49.279
one priority, doesn't it? If this image is going

00:07:49.279 --> 00:07:52.459
to become your talking avatar, that initial reference

00:07:52.459 --> 00:07:54.860
photo is, well, it sounds like it's the most

00:07:54.860 --> 00:07:57.160
important piece of the whole puzzle. Oh, it dictates

00:07:57.160 --> 00:07:59.370
everything that comes after. Absolutely. That

00:07:59.370 --> 00:08:01.170
photo, it needs to be high resolution. You need

00:08:01.170 --> 00:08:03.529
to be looking straight at the camera. The lighting

00:08:03.529 --> 00:08:06.449
has got to be good, really even, no harsh shadows,

00:08:06.629 --> 00:08:09.470
nothing dramatic, and critically, nothing blocking

00:08:09.470 --> 00:08:13.439
the face. So no hats, no scarves, no big sunglasses.

00:08:13.879 --> 00:08:16.180
You know, Nano Banana basically studies this

00:08:16.180 --> 00:08:19.399
one photo intensely to maintain that core facial

00:08:19.399 --> 00:08:21.620
structure and look across every single image

00:08:21.620 --> 00:08:23.759
you generate afterwards. OK, so once that core

00:08:23.759 --> 00:08:25.600
identity is kind of locked in from the reference

00:08:25.600 --> 00:08:28.740
photo, the goal shifts to building out an entire

00:08:28.740 --> 00:08:32.659
avatar library. Hmm. Wait, building a full avatar

00:08:32.659 --> 00:08:34.600
library? That sounds like potentially a lot of

00:08:34.600 --> 00:08:36.419
upfront work. Is that time investment really

00:08:36.419 --> 00:08:38.519
worth it compared to just generating images one

00:08:38.519 --> 00:08:41.419
by one as you need them? It is so worth it, especially

00:08:41.419 --> 00:08:43.820
when you get to the editing phase later. Trust

00:08:43.820 --> 00:08:46.820
me on this. If you only have one single image

00:08:46.820 --> 00:08:49.320
of your avatar, the final video is going to look

00:08:49.320 --> 00:08:51.720
really static and frankly kind of boring. Like

00:08:51.720 --> 00:08:55.440
a slightly fancier webcam video, you know? The

00:08:55.440 --> 00:08:58.059
goal is diversity, but built on that foundation

00:08:58.059 --> 00:09:01.179
of consistency. Whoa. I mean,

00:09:01.259 --> 00:09:03.679
just imagine scaling this ability, creating a

00:09:03.679 --> 00:09:05.860
totally consistent, personalized avatar that

00:09:05.860 --> 00:09:08.779
you can place in dozens, hundreds of different

00:09:08.779 --> 00:09:11.759
scenes. You build up this collection, your avatar

00:09:11.759 --> 00:09:13.200
looking straight, looking left, looking right,

00:09:13.299 --> 00:09:15.279
maybe arms crossed, pointing, different subtle

00:09:15.279 --> 00:09:18.740
expressions. This visual variety is the absolute

00:09:18.740 --> 00:09:21.820
key to creating a final video that's engaging

00:09:21.820 --> 00:09:24.340
and doesn't feel repetitive or, well, robotic.

00:09:24.480 --> 00:09:25.960
That makes a lot of sense, actually. We're shifting

00:09:25.960 --> 00:09:28.620
from just making a still photo to essentially

00:09:28.620 --> 00:09:30.899
planning shots for a film. And speaking of images

00:09:30.899 --> 00:09:32.879
that are ready for video, the source had five

00:09:32.879 --> 00:09:35.039
advanced tips. Starting with lighting, you mentioned

00:09:35.039 --> 00:09:37.639
avoiding dramatic lighting. Yeah, VEO3, the video

00:09:37.639 --> 00:09:40.019
tool, it just loves consistency and clarity.

00:09:40.360 --> 00:09:43.679
So you want prompts that specify soft, even light,

00:09:43.840 --> 00:09:46.899
or natural daylight. Nothing too moody or high

00:09:46.899 --> 00:09:49.460
contrast. This really helps ensure that when

00:09:49.460 --> 00:09:52.200
VEO3 generates the motion, it looks natural.

00:09:52.419 --> 00:09:55.240
It prevents weird flickering or shadows suddenly

00:09:55.240 --> 00:09:57.120
jumping around when the avatar starts to move

00:09:57.120 --> 00:09:59.919
or speak. Even light is crucial for the video

00:09:59.919 --> 00:10:02.799
output. Got it. Even light prevents motion artifacts.

00:10:03.379 --> 00:10:05.360
And what about composition? You mentioned leaving

00:10:05.360 --> 00:10:07.960
room for movement. Correct. Don't crop the image

00:10:07.960 --> 00:10:10.840
too tightly around the face in Nano Banana. VEO3

00:10:10.840 --> 00:10:13.000
needs a bit of space, some headroom, and shoulder

00:10:13.000 --> 00:10:15.080
room to make the avatar's movements look natural.

00:10:15.539 --> 00:10:17.980
So stick to prompts like medium shot or chest

00:10:17.980 --> 00:10:20.399
up portrait. Give the AI some canvas to work

00:10:20.399 --> 00:10:23.299
with. Okay. And for people making, say, vertical

00:10:23.299 --> 00:10:25.659
content for social media. Right. While the standard

00:10:25.659 --> 00:10:28.299
is 16:9 horizontal video, you can absolutely

00:10:28.299 --> 00:10:31.100
generate vertical images, too. Just add portrait

00:10:31.100 --> 00:10:34.259
orientation or specify 9:16 aspect ratio in

00:10:34.259 --> 00:10:36.240
your Nano Banana prompt. Perfect for Reels or

00:10:36.240 --> 00:10:38.639
TikToks. Good tip. And the last one was about

00:10:38.639 --> 00:10:41.539
creating a sequence. Yeah. If you're aiming for

00:10:41.539 --> 00:10:44.340
a really polished professional edit, don't just

00:10:44.340 --> 00:10:47.200
make one main shot. Create a little set of three

00:10:47.200 --> 00:10:49.659
images using slight variations of your prompt,

00:10:49.980 --> 00:10:52.059
like a main shot looking straight ahead, then

00:10:52.059 --> 00:10:53.700
maybe a slightly different angle looking off

00:10:53.669 --> 00:10:56.250
to the side, and perhaps a close up for emphasis.

00:10:56.809 --> 00:10:59.110
These act like building blocks in your video

00:10:59.110 --> 00:11:01.330
editor later, giving you options for cutting

00:11:01.330 --> 00:11:03.450
between shots, just like in real filmmaking.

00:11:03.710 --> 00:11:06.070
It makes the final output much more dynamic.

00:11:06.210 --> 00:11:08.789
OK, so image consistency is the bedrock. The

00:11:08.789 --> 00:11:11.789
library provides variety. And these tips help

00:11:11.789 --> 00:11:15.129
make the images truly video ready. Now I guess

00:11:15.129 --> 00:11:16.929
it's time to actually direct the performance.

00:11:17.129 --> 00:11:19.490
Exactly. Now for the really exciting part, where

00:11:19.490 --> 00:11:21.309
we kind of switch hats from being a painter or

00:11:21.309 --> 00:11:24.870
photographer to being a... director. First thing

00:11:24.870 --> 00:11:26.809
though, we have to understand and work with the

00:11:26.809 --> 00:11:29.409
fundamental constraint of Google VEO3 right now.

00:11:29.750 --> 00:11:31.509
Clips are limited. They can only be up to eight

00:11:31.509 --> 00:11:34.309
seconds long. Right, eight seconds, which means

00:11:34.309 --> 00:11:36.350
you absolutely have to plan your script differently.

00:11:36.450 --> 00:11:38.669
You need to break it down into these short, almost

00:11:38.669 --> 00:11:40.750
punchy eight-second segments, each one needing

00:11:40.750 --> 00:11:43.909
to contain basically one complete idea, or roughly,

00:11:43.929 --> 00:11:46.529
what, 15 to 20 spoken words. Sounds like the

00:11:46.529 --> 00:11:48.230
planning stage is almost more critical than the

00:11:48.230 --> 00:11:50.509
rendering itself. Oh, it totally is. Meticulous

00:11:50.509 --> 00:11:53.090
planning saves huge amounts of time and credits

00:11:53.090 --> 00:11:56.990
later. And the VEO3 prompt structure? It's different

00:11:56.990 --> 00:11:59.029
from the image prompt. It's focused on directing

00:11:59.029 --> 00:12:01.549
motion, emotion, and sound, not just describing

00:12:01.549 --> 00:12:04.450
how things look. The basic structure that seems

00:12:04.450 --> 00:12:06.970
to work pretty well is something like this: speaking,

00:12:07.309 --> 00:12:10.730
you put the emotion here in brackets, you specify

00:12:10.730 --> 00:12:13.549
the accent or nationality if needed, then you

00:12:13.549 --> 00:12:16.429
paste the exact words they're saying. Ah, okay,

00:12:16.470 --> 00:12:18.570
so you need specific emotional direction, not

00:12:18.570 --> 00:12:21.490
just speaking, but using active verbs like explaining

00:12:21.490 --> 00:12:24.370
calmly, or announcing excitedly, or maybe even

00:12:24.370 --> 00:12:27.190
whispering secretly. And those words drive the

00:12:27.190 --> 00:12:29.230
specific facial movements and expressions that

00:12:29.230 --> 00:12:32.450
VEO3 generates. That's the idea, exactly. And

00:12:32.450 --> 00:12:34.649
you need to keep that accent and general tone

00:12:34.649 --> 00:12:37.070
consistent across all your clips, otherwise it'll

00:12:37.070 --> 00:12:38.610
sound really jarring when you stitch them together.

00:12:38.799 --> 00:12:41.159
Makes sense. So for longer content, you have

00:12:41.159 --> 00:12:43.460
to master chaining these eight second clips together.

00:12:43.620 --> 00:12:46.139
That sounds, well, it sounds like microscripting

00:12:46.139 --> 00:12:48.659
almost. It kind of is, yeah. So what's the challenge

00:12:48.659 --> 00:12:51.700
there? How do you stop a sequence of these short

00:12:51.700 --> 00:12:56.259
clips from just feeling like a disjointed slideshow

00:12:56.259 --> 00:12:58.419
with talking heads? Right, that's the art of

00:12:58.419 --> 00:13:00.480
it. You do it by planning the emotional arc of

00:13:00.480 --> 00:13:03.159
your overall message and, crucially, by using

00:13:03.159 --> 00:13:05.919
those different avatar poses we generated earlier

00:13:05.919 --> 00:13:07.919
in Nano Banana. Maybe you start the sequence

00:13:07.919 --> 00:13:09.860
with the avatar looking thoughtful in a medium

00:13:09.860 --> 00:13:12.240
shot. Then, as the point gets more exciting,

00:13:12.519 --> 00:13:14.879
you cut to that side angle shot we created, and

00:13:14.879 --> 00:13:17.539
maybe you end on the confident, chest-up, straight

00:13:17.539 --> 00:13:20.059
-to-camera shot for the conclusion. That visual

00:13:20.059 --> 00:13:22.279
variety helps mask the cuts between the eight

00:13:22.279 --> 00:13:24.919
-second clips and makes it feel more like a continuous

00:13:24.919 --> 00:13:27.139
directed piece. Okay, that makes sense. Using

00:13:27.139 --> 00:13:29.960
different shots to smooth transitions. But we

00:13:29.960 --> 00:13:31.919
should probably touch on troubleshooting, because

00:13:31.919 --> 00:13:34.039
let's be real, these tools aren't flawless yet,

00:13:34.080 --> 00:13:36.700
right? Not at all. Still early days in some ways.

00:13:36.879 --> 00:13:38.779
What about that common problem people mention,

00:13:39.200 --> 00:13:42.210
the mouth movements not quite syncing up perfectly

00:13:42.210 --> 00:13:44.029
with the audio, the lip sync being a bit off.

00:13:44.289 --> 00:13:46.450
Yeah, that often happens when there's a mismatch

00:13:46.450 --> 00:13:49.129
between the emotion you put in the VEO3 text

00:13:49.129 --> 00:13:51.750
prompt and the expression on the original still

00:13:51.750 --> 00:13:54.049
image you fed it. So if you give it a picture

00:13:54.049 --> 00:13:56.210
where the avatar is frowning or looking really

00:13:56.210 --> 00:13:58.610
serious, but then you ask it to speak enthusiastically

00:13:58.610 --> 00:14:01.409
or happily, the lip movements can look really

00:14:01.409 --> 00:14:03.529
unnatural. You got to try and match them. Use

00:14:03.529 --> 00:14:05.730
a smiling picture if the text prompt is happy.

00:14:06.370 --> 00:14:08.830
Okay, match image expression to text emotion.

00:14:09.100 --> 00:14:12.419
What if the whole video just looks kind of shaky

00:14:12.419 --> 00:14:14.679
or jittery? That can sometimes happen if the

00:14:14.679 --> 00:14:16.940
original Nano Banana image background was too

00:14:16.940 --> 00:14:20.580
complex or detailed. Try simplifying it. Generate

00:14:20.580 --> 00:14:22.980
an image with a cleaner, maybe slightly blurred

00:14:22.980 --> 00:14:25.419
background. Simpler backgrounds often lead to

00:14:25.419 --> 00:14:27.539
smoother, less artifact-filled video motion

00:14:27.539 --> 00:14:30.519
from VEO3. And what if things in the background,

00:14:31.120 --> 00:14:32.960
like, I don't know, plants on a shelf or a necklace

00:14:32.960 --> 00:14:35.779
the avatar is wearing, start moving weirdly on

00:14:35.779 --> 00:14:37.860
their own in the final video? Right, the rogue

00:14:37.860 --> 00:14:41.080
moving objects. Yeah, for that, try using VEO3

00:14:41.080 --> 00:14:43.379
prompts that really focus the AI's attention

00:14:43.379 --> 00:14:46.220
on the face and minimize its attempts to animate

00:14:46.220 --> 00:14:49.460
the background. Things like adding speaking directly

00:14:49.460 --> 00:14:52.519
to the camera or specifying close-up portrait,

00:14:52.799 --> 00:14:55.480
shallow depth of field can sometimes help lock

00:14:55.480 --> 00:14:57.360
down the background elements. So fundamentally,

00:14:57.440 --> 00:15:00.120
what's the most critical difference then, between

00:15:00.120 --> 00:15:02.740
prompting for a still image and prompting for

00:15:02.740 --> 00:15:05.299
a moving video? Video prompts focus on directing

00:15:05.299 --> 00:15:07.919
emotion, motion, and sound, not just describing

00:15:07.919 --> 00:15:10.799
visuals. It's about performance.

00:15:13.820 --> 00:15:16.600
Okay, wow. That's a lot to take in, but it feels

00:15:16.600 --> 00:15:18.559
like a complete process. Let's quickly recap

00:15:18.559 --> 00:15:20.580
those five phases again, just to nail it down.

00:15:21.039 --> 00:15:23.340
The whole journey from idea to finished video.

00:15:23.799 --> 00:15:26.600
Sounds good. Phase one. Get ready. That's setting

00:15:26.600 --> 00:15:29.120
up your tools, remembering to use the VEO3 Flow

00:15:29.120 --> 00:15:31.840
interface. Yeah. Preparing that really good high

00:15:31.840 --> 00:15:34.179
quality reference photo and getting your project

00:15:34.179 --> 00:15:36.179
folders organized from the start saves headaches

00:15:36.179 --> 00:15:40.179
later. Phase two, create prompts. Use those specialized

00:15:40.179 --> 00:15:43.840
custom GPTs to write detailed, cinematic, and, importantly,

00:15:43.840 --> 00:15:46.860
emotion-rich descriptions for the images you

00:15:46.860 --> 00:15:50.840
want. Phase three. Create images. Then you hop

00:15:50.840 --> 00:15:53.379
over to Nano Banana, upload your reference photo,

00:15:53.620 --> 00:15:55.039
paste in those prompts you just created, and

00:15:55.039 --> 00:15:57.000
start building out that diverse avatar library,

00:15:57.159 --> 00:15:59.639
different poses, angles, maybe expressions. Phase

00:15:59.639 --> 00:16:02.379
four, create videos. Upload those finished images

00:16:02.379 --> 00:16:05.159
into VEO3 Flow. Use the fast mode extensively

00:16:05.159 --> 00:16:06.820
for testing your eight-second script chunks

00:16:06.820 --> 00:16:09.039
and prompts. Get things right there before using

00:16:09.039 --> 00:16:10.840
the more expensive quality mode for your final

00:16:10.840 --> 00:16:13.820
renders. Exactly. And then finally, phase five.

00:16:14.009 --> 00:16:17.049
Edit and finish. Take all those eight-second

00:16:17.049 --> 00:16:19.830
clips, bring them into a video editor, free ones

00:16:19.830 --> 00:16:23.070
like CapCut or DaVinci Resolve, work great. Stitch

00:16:23.070 --> 00:16:25.330
them together in sequence, add maybe some background

00:16:25.330 --> 00:16:27.990
music, titles, and boom, you've got a cohesive,

00:16:28.309 --> 00:16:30.549
professional-looking video. And the source material

00:16:30.549 --> 00:16:33.269
suggests this whole thing, once you're practiced,

00:16:33.710 --> 00:16:36.429
is potentially doable in, like, an afternoon.

00:16:36.950 --> 00:16:39.809
That's pretty incredible. The real-world uses...

00:16:39.600 --> 00:16:41.259
They feel pretty transformative, don't they?

00:16:41.460 --> 00:16:43.519
For content creators, obviously this is huge.

00:16:43.700 --> 00:16:45.940
It completely removes camera shyness, the need

00:16:45.940 --> 00:16:49.059
for expensive gear or a studio space, plus that

00:16:49.059 --> 00:16:52.120
idea of creating multilingual versions just by

00:16:52.120 --> 00:16:54.919
changing the text prompt in VEO3. That's potentially

00:16:54.919 --> 00:16:57.100
massive for reaching global audiences with the

00:16:57.100 --> 00:16:59.720
same core video. Oh yeah. And for businesses,

00:16:59.980 --> 00:17:02.279
think about it. Quick product demos, consistent

00:17:02.279 --> 00:17:04.200
professional employee training videos delivered

00:17:04.200 --> 00:17:06.720
by a familiar avatar, or generating large-scale

00:17:06.720 --> 00:17:09.099
marketing content variations. You could test

00:17:09.099 --> 00:17:11.140
hundreds of ad angles in an afternoon because

00:17:11.140 --> 00:17:13.140
there's no physical production cost or delay.

00:17:13.400 --> 00:17:16.000
It's almost risk-free A/B testing for video

00:17:16.000 --> 00:17:18.859
creative. Totally. And even for personal projects,

00:17:19.099 --> 00:17:21.279
it just opens up so much flexibility, right?

00:17:21.279 --> 00:17:23.740
Yeah. Unique personalized birthday messages,

00:17:23.980 --> 00:17:26.420
keeping up a regular social media presence without

00:17:26.420 --> 00:17:28.880
the constant pressure of filming yourself. It

00:17:28.880 --> 00:17:31.019
really feels like a risk-free sandbox to experiment

00:17:31.019 --> 00:17:33.279
with your presentation style, your personal brand,

00:17:33.660 --> 00:17:35.960
or testing ideas for your business. Absolutely,

00:17:36.079 --> 00:17:38.339
and the key thing that makes all that possible

00:17:38.339 --> 00:17:41.200
is nailing the consistency of the avatar image

00:17:41.200 --> 00:17:43.759
first. That unlocks all the creative freedom

00:17:43.759 --> 00:17:46.099
you have in the video generation stage later.

00:17:46.339 --> 00:17:49.200
So the big takeaway here

00:17:49.200 --> 00:17:51.890
feels like this: it's a skill, it's learnable, and

00:17:51.890 --> 00:17:54.589
it's available to you like today. The key really

00:17:54.589 --> 00:17:56.690
seems to be just starting small, maybe with those

00:17:56.690 --> 00:17:59.369
free tools and trials. Focus on building up that

00:17:59.369 --> 00:18:02.289
prompt library and really mastering that consistency

00:18:02.289 --> 00:18:04.210
process, especially with a reference photo and

00:18:04.210 --> 00:18:07.670
the lighting in your images. Yeah, exactly. The

00:18:07.670 --> 00:18:09.809
technology itself, it's changing almost daily,

00:18:09.869 --> 00:18:12.829
it feels like. But the source material really

00:18:12.829 --> 00:18:14.730
reminded us of something enduring, didn't it?

00:18:14.890 --> 00:18:17.910
The basic rules of good storytelling, clear communication,

00:18:18.410 --> 00:18:20.710
and actually creating valuable content for people.

00:18:21.309 --> 00:18:22.849
Those things always stay the same. Doesn't matter

00:18:22.849 --> 00:18:24.730
if there's a camera involved or not. That's a

00:18:24.730 --> 00:18:27.210
great point. So maybe the final thought for everyone

00:18:27.210 --> 00:18:29.809
listening is this. If the camera is no longer

00:18:29.809 --> 00:18:32.450
the obstacle, what's the story you're finally

00:18:32.450 --> 00:18:32.930
going to tell?
