WEBVTT

00:00:00.000 --> 00:00:03.879
So if you're trying to scale video content today,

00:00:04.320 --> 00:00:07.839
you've probably run headfirst into this conflict.

00:00:08.619 --> 00:00:12.640
You need these professional, realistic AI avatars,

00:00:12.939 --> 00:00:15.199
maybe for marketing, maybe explainer videos,

00:00:15.640 --> 00:00:18.620
but every platform... it seems like every single

00:00:18.620 --> 00:00:20.699
one traps you pretty quickly. Yeah, it's the

00:00:20.699 --> 00:00:23.100
expensive subscriptions, right? Yeah, and the

00:00:23.100 --> 00:00:24.940
credit limits. They get you every time. You

00:00:24.940 --> 00:00:27.039
feel kind of locked in. Exactly. And you realize,

00:00:27.039 --> 00:00:29.600
wow, if I want to make, say, 50 videos this month,

00:00:29.620 --> 00:00:31.839
you're gonna need a second mortgage. Yeah, it's

00:00:31.839 --> 00:00:34.219
nuts. But you know the core idea from the sources

00:00:34.219 --> 00:00:36.140
you sent over, it's actually brilliant. We can

00:00:36.140 --> 00:00:39.079
like completely sidestep that whole system. Right,

00:00:39.079 --> 00:00:41.359
and the promise here isn't just about getting

00:00:41.359 --> 00:00:44.140
better avatars. It's about getting, well, unlimited

00:00:44.140 --> 00:00:46.159
creative freedom and really cutting down the

00:00:46.159 --> 00:00:48.759
costs. That's it. So today we're diving into

00:00:48.759 --> 00:00:51.979
this custom three-tool blueprint. The idea is

00:00:51.979 --> 00:00:54.500
it turns this expensive, limited process into

00:00:54.500 --> 00:00:57.560
something faster, like an asset creation machine.

00:00:57.770 --> 00:00:59.509
Yeah, that's the mission today. We're going to

00:00:59.509 --> 00:01:02.270
unpack this workflow. It really focuses on consistency

00:01:02.270 --> 00:01:05.829
first using a tool called NanoBanana, then smart

00:01:05.829 --> 00:01:08.870
scripting using ChatGPT, and then finally bringing

00:01:08.870 --> 00:01:11.489
in the heavy hitter for realism, that's HeyGen,

00:01:11.909 --> 00:01:14.950
but specifically with Google's VEO 3.1 model

00:01:14.950 --> 00:01:17.489
integrated. All right, let's unpack this then,

00:01:17.569 --> 00:01:20.230
starting with the problem, the bottleneck, financially

00:01:20.230 --> 00:01:24.040
and creatively. Why does just using one platform,

00:01:24.079 --> 00:01:26.340
let's say, sticking only with HeyGen, why does

00:01:26.340 --> 00:01:29.299
that fail if your goal is serious scaled content?

00:01:30.100 --> 00:01:32.920
Well, the single-tool problem, it's like you're

00:01:32.920 --> 00:01:34.819
paying the vendor for two really different things

00:01:34.819 --> 00:01:36.799
at the same time. Okay. You're paying them for

00:01:36.799 --> 00:01:39.099
the creation part, which is actually pretty costly

00:01:39.099 --> 00:01:41.099
getting the avatar just right, the outfit, the

00:01:41.099 --> 00:01:43.060
look. And then you're also paying them for the

00:01:43.060 --> 00:01:46.060
final video render. That creation part, it just

00:01:46.060 --> 00:01:48.099
burns through credits like crazy. And I guess

00:01:48.099 --> 00:01:50.180
if you need just a small change, like the avatar

00:01:50.180 --> 00:01:51.900
needs a blue shirt today instead of yesterday's

00:01:51.900 --> 00:01:54.129
gray one. Exactly. You're basically paying for

00:01:54.129 --> 00:01:56.030
creation all over again. You're stuck with their

00:01:56.030 --> 00:01:58.469
options, their internal library. Any change means

00:01:58.469 --> 00:02:00.609
another expensive credit burn. The whole trick

00:02:00.609 --> 00:02:03.989
is to sort of decouple those two steps. Ah, so

00:02:03.989 --> 00:02:06.430
the solution is don't start building the avatar

00:02:06.430 --> 00:02:08.310
from scratch inside the expensive video app.

00:02:08.909 --> 00:02:11.310
You pre-make it, pre-customize it somewhere

00:02:11.310 --> 00:02:13.969
else first. Yeah, use cheaper tools or even free

00:02:13.969 --> 00:02:16.449
ones to get the look right. Then you just import

00:02:16.449 --> 00:02:18.909
the finished asset. It's a shift in where the

00:02:18.909 --> 00:02:21.550
cost happens. It's just smart economics, really.

00:02:21.740 --> 00:02:23.840
And that brings us right to the three tools.

00:02:24.400 --> 00:02:27.219
NanoBanana is the base, the foundation. It makes

00:02:27.219 --> 00:02:29.159
that consistent face, but lets you change everything

00:02:29.159 --> 00:02:32.199
else. Clothes, backgrounds, emotions. It locks

00:02:32.199 --> 00:02:34.860
the identity. Okay, tool one, identity lock.

00:02:35.000 --> 00:02:38.560
Now tool two is ChatGPT. You call it the strategist.

00:02:39.240 --> 00:02:41.659
Or like the creative director. It helps you generate

00:02:41.659 --> 00:02:44.300
the perfect detailed instructions, the props

00:02:44.300 --> 00:02:46.620
for NanoBanana. And crucially, it helps you write

00:02:46.620 --> 00:02:48.520
conversational scripts. Scripts that actually

00:02:48.520 --> 00:02:51.240
sound human, which is harder than it sounds.

00:02:51.620 --> 00:02:53.520
And the final piece, the heavy lifting for the

00:02:53.520 --> 00:02:55.659
video itself, that's left to HeyGen. Yep, HeyGen,

00:02:55.659 --> 00:02:58.870
but only for the final video render. You

00:02:58.870 --> 00:03:01.150
feed it that pre-made image from NanoBanana,

00:03:01.490 --> 00:03:05.189
and VEO 3.1 turns that static picture into this

00:03:05.189 --> 00:03:08.469
really lifelike video. But you skip the big creation

00:03:08.469 --> 00:03:11.129
fee inside HeyGen itself. OK, this makes sense.

00:03:11.189 --> 00:03:13.050
The whole system seems to rely on separating

00:03:13.050 --> 00:03:15.990
that cost. So why is the order of using these

00:03:15.990 --> 00:03:19.210
three tools so critical? Why does it save money

00:03:19.210 --> 00:03:21.669
and give more freedom? The order saves money

00:03:21.669 --> 00:03:24.050
because creation happens before expensive video

00:03:24.050 --> 00:03:25.889
rendering costs are applied, right? You front-load

00:03:25.889 --> 00:03:28.370
the free creation work. Got it. So step

00:03:28.370 --> 00:03:31.409
one, then: NanoBanana. You called it magic. What

00:03:31.409 --> 00:03:33.930
exactly is this thing, and how does it fix that

00:03:33.930 --> 00:03:36.250
big avatar problem, the face changing all the

00:03:36.250 --> 00:03:39.129
time? So NanoBanana, it's a pretty powerful

00:03:39.129 --> 00:03:40.949
Google image model right now. You can access it

00:03:40.949 --> 00:03:43.250
for free inside Google AI Studio, which is like

00:03:43.250 --> 00:03:45.930
their testing area. Okay, and its killer feature,

00:03:45.949 --> 00:03:47.770
the thing that makes it special, is character

00:03:47.770 --> 00:03:50.449
consistency. Character consistency? That sounds

00:03:50.449 --> 00:03:52.949
a bit like jargon maybe, but it's the fix for

00:03:52.949 --> 00:03:54.810
that image drift, right, where the face subtly

00:03:54.810 --> 00:03:58.270
changes. Exactly. It means keeping the person's

00:03:58.270 --> 00:04:00.909
face, their core identity, precisely the same,

00:04:01.310 --> 00:04:03.129
even when you change the outfit, the background,

00:04:03.270 --> 00:04:05.409
the lighting, maybe even the expression slightly.

00:04:05.930 --> 00:04:07.669
This is the big problem with other tools like,

00:04:07.669 --> 00:04:11.729
say, DALL·E or Midjourney. The face tends to wander

00:04:11.729 --> 00:04:14.219
with each new prompt. You know, I still wrestle

00:04:14.219 --> 00:04:16.620
with prompt drift myself sometimes, especially

00:04:16.620 --> 00:04:18.839
when I try to keep things consistent across different

00:04:18.839 --> 00:04:21.459
scenes, adding more details about the background

00:04:21.459 --> 00:04:24.879
or whatever. That subtle face change, it's tough

00:04:24.879 --> 00:04:26.660
to nail down. Yeah, it's a real frustration.

00:04:26.879 --> 00:04:28.800
And it just shows why starting simple is usually

00:04:28.800 --> 00:04:32.060
better. You upload a base image, a good clear

00:04:32.060 --> 00:04:35.019
headshot works best. That acts as the identity

00:04:35.019 --> 00:04:36.939
lock. Then you just start making variations,

00:04:37.259 --> 00:04:39.209
unlimited variations, really. Can we run through

00:04:39.209 --> 00:04:41.389
a couple of examples? Like, how different can

00:04:41.389 --> 00:04:43.649
the variations be using the same face? For sure.

00:04:43.769 --> 00:04:45.810
Take a simple headshot. Example one, you prompt:

00:04:46.389 --> 00:04:48.529
Change the shirt to a dark red turtleneck sweater.

00:04:48.990 --> 00:04:51.189
Put him in a cozy coffee shop with dim lights.

00:04:51.310 --> 00:04:54.649
Boom. Same face. Totally new context. Takes like

00:04:54.649 --> 00:04:57.449
15 seconds. And what if I need that same person,

00:04:57.529 --> 00:04:59.470
that same face, but for like a professional pitch

00:04:59.470 --> 00:05:03.110
video? Easy. Same face, new prompt: Change the

00:05:03.110 --> 00:05:05.470
outfit to a sharp gray suit and tie. The background

00:05:05.470 --> 00:05:08.079
is a modern office skyscraper lobby. Bright window

00:05:08.079 --> 00:05:11.500
light, done. You could honestly generate 50 versions

00:05:11.500 --> 00:05:14.519
for 50 different uses in maybe an hour. The source

00:05:14.519 --> 00:05:16.120
has also mentioned trying to avoid that kind

00:05:16.120 --> 00:05:19.079
of generic, too-polished AI look. What are the

00:05:19.079 --> 00:05:21.120
best practices for prompts to get more realism

00:05:21.120 --> 00:05:23.800
with NanoBanana? OK, yeah, good point. Be specific

00:05:23.800 --> 00:05:25.740
with the details. Like, don't just say jacket,

00:05:25.920 --> 00:05:28.910
say black leather jacket. And, crucially, you

00:05:28.910 --> 00:05:31.970
have to proactively add phrases about realism.

00:05:32.050 --> 00:05:35.389
Things like natural imperfections or subtle skin

00:05:35.389 --> 00:05:37.790
texture. Oh, OK. And definitely avoid telling

00:05:37.790 --> 00:05:40.769
the AI to make it flawless or perfect. That's

00:05:40.769 --> 00:05:42.970
like a direct ticket to the uncanny valley. It

00:05:42.970 --> 00:05:44.850
looks weird. Right. So if you had to pick just

00:05:44.850 --> 00:05:46.769
one prompt phrase, what's the most important

00:05:46.769 --> 00:05:50.329
one for fighting that super smooth, uncanny AI

00:05:50.329 --> 00:05:53.370
look? Adding subtle texture details like film

00:05:53.370 --> 00:05:56.410
grain combats the uncanny, too-perfect AI look.
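
That prompt recipe, specific wardrobe details plus proactive realism phrases, can be sketched as plain string templating. Nothing here calls a real API; the helper name and phrase list are illustrative assumptions, not any tool's actual interface:

```python
# Illustrative only: assemble a NanoBanana-style edit prompt by pairing
# specific details with realism phrases, per the advice above.
REALISM_PHRASES = [
    "natural imperfections",
    "subtle skin texture",
    "slight film grain",
]

def build_prompt(outfit: str, setting: str) -> str:
    """Combine specific wardrobe/setting details with realism cues."""
    details = f"Change the outfit to {outfit}. The background is {setting}."
    return f"{details} Style: {', '.join(REALISM_PHRASES)}."

prompt = build_prompt("a black leather jacket", "a dim, cozy coffee shop")
```

Note that the template deliberately omits words like "flawless", which the discussion above flags as a ticket to the uncanny valley.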

00:05:56.670 --> 00:06:00.199
Film grain, interesting. Okay, moving to step two:

00:06:00.199 --> 00:06:02.600
ChatGPT. We're using it for more than just writing

00:06:02.600 --> 00:06:04.459
the script, right? You said, like, the architect.

00:06:04.459 --> 00:06:06.279
Yeah, exactly. We use it to build the characters.

00:06:06.279 --> 00:06:10.000
Okay, scaffolding, let's call it. Like, what's

00:06:10.000 --> 00:06:12.439
their persona? Is this avatar a serious finance

00:06:12.439 --> 00:06:15.279
analyst or, I don't know, a chill fitness guru?

00:06:15.699 --> 00:06:17.680
ChatGPT helps define their style, maybe common

00:06:17.680 --> 00:06:20.079
phrases they'd use. It makes them feel more believable

00:06:20.079 --> 00:06:22.579
than just some random face. And it also helps

00:06:22.579 --> 00:06:25.459
engineer the prompts for NanoBanana. Massively

00:06:25.459 --> 00:06:27.899
speeds things up. You can tell ChatGPT, hey,

00:06:27.899 --> 00:06:30.459
I need five detailed prompt ideas for a professional

00:06:30.459 --> 00:06:32.800
tennis coach avatar. Give me different settings.

00:06:33.000 --> 00:06:35.579
And boom, it structures them for you. On the

00:06:35.579 --> 00:06:37.459
court, in the locker room, doing a practice drill,

00:06:37.540 --> 00:06:39.420
maybe an interview setup. You're not staring

00:06:39.420 --> 00:06:41.079
at a blank page trying to think of variations

00:06:41.079 --> 00:06:43.779
every single time. This is also where we tackle

00:06:43.779 --> 00:06:46.740
that robot voice problem you hear in so many

00:06:46.740 --> 00:06:49.660
AI videos. Why do they often sound so stiff?

00:06:50.279 --> 00:06:52.120
Well, usually it's because people write the scripts

00:06:52.120 --> 00:06:55.180
like they're writing a formal email. Yeah. All

00:06:55.180 --> 00:06:57.680
proper and buttoned up. Right. We have to specifically

00:06:57.680 --> 00:07:01.699
tell ChatGPT: use a casual tone, use the first

00:07:01.699 --> 00:07:04.639
person, make the avatar say I, and really aim

00:07:04.639 --> 00:07:07.399
for a conversational flow, like how people actually

00:07:07.399 --> 00:07:09.860
talk. And even if ChatGPT drafts the script,

00:07:10.399 --> 00:07:12.980
those editing tips you mentioned sound, well,

00:07:13.800 --> 00:07:15.920
non-negotiable. Oh, absolutely critical. Use

00:07:15.920 --> 00:07:18.959
simple words like use instead of utilize. Keep

00:07:18.959 --> 00:07:21.459
sentences short, lots of contractions, don't,

00:07:21.459 --> 00:07:23.839
can't. That's conversational. And the number

00:07:23.839 --> 00:07:25.819
one test, read the script out loud yourself.
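
Those editing tips, simple words and contractions, could be roughed into a tiny cleanup pass. The word lists here are assumptions for illustration, not ChatGPT's behavior:

```python
# Illustrative sketch of the script-editing tips above: swap formal words
# for simple ones, then fold in contractions so the script reads as spoken.
FORMAL_TO_SIMPLE = {"utilize": "use", "commence": "start", "endeavor": "try"}
CONTRACTIONS = {"do not": "don't", "cannot": "can't", "it is": "it's"}

def conversationalize(script: str) -> str:
    """Apply simple-word swaps first, then contraction swaps."""
    for formal, simple in FORMAL_TO_SIMPLE.items():
        script = script.replace(formal, simple)
    for long_form, short_form in CONTRACTIONS.items():
        script = script.replace(long_form, short_form)
    return script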

00:07:26.220 --> 00:07:28.899
If you stumble over the words, or if it sounds

00:07:28.899 --> 00:07:31.199
formal coming out of your mouth, the AI voice

00:07:31.199 --> 00:07:33.329
is going to sound even more unnatural. So it's

00:07:33.329 --> 00:07:35.889
not just generating prompts, it's building this

00:07:35.889 --> 00:07:40.050
reusable character library framework. Beyond

00:07:40.050 --> 00:07:43.470
the scripting part, how else does ChatGPT really

00:07:43.470 --> 00:07:46.189
save time in this whole avatar creation process?

00:07:46.649 --> 00:07:49.329
It dramatically speeds up the process by creating

00:07:49.329 --> 00:07:53.209
reusable, customizable prompt templates immediately.

00:07:53.470 --> 00:07:56.180
Got it. Reusable templates, makes sense. [Mid-roll

00:07:56.180 --> 00:07:58.540
sponsor read placeholder.] Welcome back to the

00:07:58.540 --> 00:08:00.639
Deep Dive. Okay, so we've got our consistent

00:08:00.639 --> 00:08:03.759
custom avatar images made with NanoBanana, and

00:08:03.759 --> 00:08:06.379
we've planned out a natural conversational script

00:08:06.379 --> 00:08:09.660
using ChatGPT's help. Now for the final step:

00:08:10.200 --> 00:08:12.879
actually bringing that avatar to life using VEO

00:08:12.879 --> 00:08:15.500
3.1 inside HeyGen. Right, this is where it all

00:08:15.500 --> 00:08:17.759
comes together. We take that nice clean avatar

00:08:17.759 --> 00:08:19.899
image we made for free, remember, using NanoBanana,

00:08:20.120 --> 00:08:21.899
and we upload that into HeyGen. We're effectively

00:08:21.899 --> 00:08:25.180
skipping HeyGen's own more expensive avatar creation

00:08:25.180 --> 00:08:28.839
step entirely. And this integration of Google's

00:08:28.839 --> 00:08:32.399
VEO 3.1 model into HeyGen, it's being talked

00:08:32.399 --> 00:08:35.210
about as a real leap forward in realism. What

00:08:35.210 --> 00:08:38.850
specifically makes VEO 3.1 technically better

00:08:38.850 --> 00:08:41.289
than the older models? It really comes down to

00:08:41.289 --> 00:08:44.250
about three key upgrades. First is just superior

00:08:44.250 --> 00:08:47.830
realism. It renders in sharp 1080p, yeah. But

00:08:47.830 --> 00:08:50.049
the big thing is it uses a much larger understanding

00:08:50.049 --> 00:08:52.470
of context to figure out lighting and textures.

00:08:53.029 --> 00:08:55.269
So you get these hyper-natural shadows, subtle

00:08:55.269 --> 00:08:58.490
skin details. It looks much more lifelike. Okay,

00:08:58.769 --> 00:09:01.159
better textures and light. What's second? Second

00:09:01.159 --> 00:09:03.460
is Perfect Sync. The audio track and the video,

00:09:03.519 --> 00:09:05.580
the lip movements, they're generated together

00:09:05.580 --> 00:09:07.840
as one unit. This pretty much eliminates that

00:09:07.840 --> 00:09:09.980
weird robotic mouth movement or the slight delay

00:09:09.980 --> 00:09:11.799
you used to see all the time. Yeah, that lip

00:09:11.799 --> 00:09:13.899
sync issue was always the dead giveaway you were

00:09:13.899 --> 00:09:16.259
watching an AI video. It totally was. And the

00:09:16.259 --> 00:09:18.039
third thing is branding consistency. Because

00:09:18.039 --> 00:09:20.639
we're starting with that locked NanoBanana image,

00:09:21.100 --> 00:09:23.340
VEO 3.1 does a much better job of maintaining

00:09:23.340 --> 00:09:25.419
that exact look across lots of separate videos,

00:09:25.559 --> 00:09:27.759
which is huge if you're doing, say, a marketing

00:09:27.759 --> 00:09:29.919
campaign and need the same spokesperson in

00:09:29.929 --> 00:09:32.850
different ads. That claim, perfect sync, sounds

00:09:32.850 --> 00:09:35.789
almost too good to be true. Are there edge cases

00:09:35.789 --> 00:09:39.190
where VEO 3.1 might still have trouble, like

00:09:39.190 --> 00:09:41.730
really fast talking or strong accents? It's definitely

00:09:41.730 --> 00:09:44.909
excellent, VEO 3.1 is. But yeah, extremely rapid

00:09:44.909 --> 00:09:47.210
speech or maybe some very distinct localized

00:09:47.210 --> 00:09:49.730
accents, those can still occasionally challenge

00:09:49.730 --> 00:09:52.049
the model a little bit. The safest approach is

00:09:52.049 --> 00:09:54.750
just to keep the scripts at a pretty normal conversational

00:09:54.750 --> 00:09:57.730
speaking pace. Whoa. But just imagine scaling

00:09:57.730 --> 00:10:01.009
this whole system. You could potentially produce,

00:10:01.009 --> 00:10:03.669
I don't know, thousands of localized video ads

00:10:03.669 --> 00:10:05.730
really quickly, like for global campaigns. You

00:10:05.730 --> 00:10:08.169
could A/B test different emotional deliveries,

00:10:08.389 --> 00:10:10.830
different regional languages, all while using

00:10:10.830 --> 00:10:13.350
the exact same spokesperson's face. That consistency,

00:10:13.450 --> 00:10:16.159
that's like... Exponential value. It completely

00:10:16.159 --> 00:10:18.340
changes the ROI calculation for your creative

00:10:18.340 --> 00:10:20.440
assets, yeah. And because we're only paying HeyGen

00:10:20.440 --> 00:10:22.659
for that final render step, we're skipping the

00:10:22.659 --> 00:10:24.799
most credit-intensive part of the process. OK,

00:10:24.820 --> 00:10:27.700
on a practical note, what's the longest continuous

00:10:27.700 --> 00:10:30.840
video clip VEO 3.1 can currently generate in

00:10:30.840 --> 00:10:32.960
HeyGen before you have to cut and start a new

00:10:32.960 --> 00:10:37.120
segment? VEO 3.1 allows for single continuous

00:10:37.120 --> 00:10:40.340
video segments up to 60 seconds long. 60 seconds.

00:10:40.379 --> 00:10:43.940
Oh, wow. OK. Good to know. Now let's talk troubleshooting.

00:10:44.120 --> 00:10:47.179
Because when you set up a system like this, multi-tool

00:10:47.179 --> 00:10:50.139
systems, things inevitably go wrong sometimes.

00:10:50.179 --> 00:10:53.000
People hit snags. Instead of just listing problems,

00:10:53.120 --> 00:10:55.860
maybe we can group the fixes. First, what about

00:10:55.860 --> 00:10:57.879
fixing issues with the image quality coming out

00:10:57.879 --> 00:10:59.860
of NanoBanana? OK, yeah. If NanoBanana gives

00:10:59.860 --> 00:11:02.519
you a weird face, like distorted features or

00:11:02.519 --> 00:11:05.159
something odd, the first fix is usually just

00:11:05.159 --> 00:11:07.340
simplify your prompt. Don't try to describe the

00:11:07.340 --> 00:11:09.460
entire universe in one go. Right. Focus on the

00:11:09.460 --> 00:11:12.029
face first: a young woman, neutral expression,

00:11:12.309 --> 00:11:14.509
soft lighting, get that right. Then start adding

00:11:14.509 --> 00:11:17.009
complexity like outfits or backgrounds. And what

00:11:17.009 --> 00:11:19.309
if you get a great image from NanoBanana, you're

00:11:19.309 --> 00:11:21.590
happy with it, but then HeyGen rejects it when

00:11:21.590 --> 00:11:24.009
you try to upload. That's almost always resolution.

00:11:24.289 --> 00:11:26.590
HeyGen needs a minimum image size, typically

00:11:26.590 --> 00:11:30.269
1024 by 1024 pixels. So the fix is simple. Before

00:11:30.269 --> 00:11:32.610
you upload to HeyGen, run your NanoBanana image

00:11:32.610 --> 00:11:35.700
through a free online upscaler tool. OK, good

00:11:35.700 --> 00:11:37.580
tip. It just blows the image up to the right

00:11:37.580 --> 00:11:39.820
size. Don't skip that prep step. It saves headaches.
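
That prep step could look something like this with Pillow. The 1024-pixel target is an assumption here, not HeyGen's documented number, so check the platform's actual requirement:

```python
# Illustrative prep step: upscale an avatar image so its shorter side
# meets a platform's minimum resolution before upload. The 1024px
# default is an assumption for illustration.
from PIL import Image

def upscale_to_minimum(path_in: str, path_out: str, min_side: int = 1024) -> None:
    """Resize up, preserving aspect ratio, with a high-quality filter."""
    img = Image.open(path_in)
    shortest = min(img.size)
    if shortest < min_side:
        scale = min_side / shortest
        new_size = (round(img.width * scale), round(img.height * scale))
        img = img.resize(new_size, Image.LANCZOS)
    img.save(path_out)
```

A dedicated AI upscaler will usually look better than plain resampling, but either way the point is the same: get the image past the size check before you spend render credits.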

00:11:40.139 --> 00:11:41.580
All right, now for some of the more advanced

00:11:41.580 --> 00:11:43.500
stuff, the pro tips from the source material,

00:11:44.000 --> 00:11:46.460
let's talk about building an avatar team. What's

00:11:46.460 --> 00:11:49.740
the idea there? Yeah, this is cool for maybe

00:11:49.740 --> 00:11:51.840
an explainer video series where you want multiple

00:11:51.840 --> 00:11:54.860
hosts that look like they belong together. The

00:11:54.860 --> 00:11:57.919
key is what the source called prompt parallelism.

00:11:57.919 --> 00:12:00.580
Prompt parallelism? Yeah, basically you create

00:12:00.580 --> 00:12:02.259
the different characters, but you make sure the

00:12:02.259 --> 00:12:04.919
prompts you use are structurally very similar.

00:12:04.919 --> 00:12:07.440
Like, use the same lighting style description,

00:12:07.440 --> 00:12:10.340
the same overall camera angle phrasing. You only

00:12:10.340 --> 00:12:13.139
change small details: hair color, shirt color, maybe

00:12:13.139 --> 00:12:15.659
slight facial structure tweaks. They end up looking

00:12:15.659 --> 00:12:18.879
like they're from the same show or universe. Ah,

00:12:18.879 --> 00:12:21.039
maintaining stylistic consistency through the

00:12:21.039 --> 00:12:24.059
prompts, makes sense. You also mentioned combining

00:12:24.059 --> 00:12:27.190
image elements like taking a hairstyle from one

00:12:27.190 --> 00:12:29.190
photo and an outfit from another. Yeah, that's

00:12:29.190 --> 00:12:31.549
leveraging NanoBanana's ability to use reference

00:12:31.549 --> 00:12:33.730
images. You can point it to one picture and say,

00:12:33.929 --> 00:12:35.970
use this hairstyle, point to another and say,

00:12:36.049 --> 00:12:38.230
use this jacket, all while keeping your main

00:12:38.230 --> 00:12:41.429
avatar's face locked from the original upload.

00:12:42.129 --> 00:12:45.210
It's pretty powerful for customization. And the

00:12:45.210 --> 00:12:49.529
last practical fix, dealing with small errors.

00:12:49.990 --> 00:12:52.769
Like if the AI generates weird fingers or strange

00:12:52.769 --> 00:12:55.659
eyes, but the rest of the image is perfect. Yeah,

00:12:55.659 --> 00:12:57.440
that's where inpainting comes in. Inpainting

00:12:57.440 --> 00:13:00.500
basically means you tell the AI, hey, regenerate

00:13:00.500 --> 00:13:02.600
only this tiny spot right here, like the weird

00:13:02.600 --> 00:13:04.759
finger or shadow that looks off without messing

00:13:04.759 --> 00:13:06.820
up the rest of the picture. Tools like Canva

00:13:06.820 --> 00:13:09.120
or the Photoshop Beta have features for this.

00:13:09.299 --> 00:13:11.139
It saves you re-rendering the whole thing just

00:13:11.139 --> 00:13:13.500
to fix one little glitch. OK, so when you're

00:13:13.500 --> 00:13:15.840
building that avatar team and aiming for a cohesive

00:13:15.840 --> 00:13:18.159
look, what's the single most important element

00:13:18.159 --> 00:13:20.240
to keep similar across the prompts for the different

00:13:20.240 --> 00:13:22.639
characters? Keeping the prompts similar ensures

00:13:22.639 --> 00:13:25.059
a matching look across the entire cast of avatars.
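
Prompt parallelism can be sketched as one shared style block with only the small per-character details swapped; the wording below is illustrative, not from the source:

```python
# Illustrative "prompt parallelism": the stylistic elements (lighting,
# camera angle) live in one shared block; only small details vary.
SHARED_STYLE = ("soft studio lighting, eye-level medium shot, "
                "neutral backdrop, subtle skin texture")

def character_prompt(hair: str, outfit: str) -> str:
    """Only hair and outfit change; the style block stays identical."""
    return f"A host with {hair} hair wearing {outfit}, {SHARED_STYLE}."

cast = [
    character_prompt("short black", "a navy blazer"),
    character_prompt("long auburn", "a white shirt"),
]
```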

00:13:25.240 --> 00:13:27.500
Right, the stylistic elements in the prompt.

00:13:27.919 --> 00:13:30.720
So let's zoom out. What does this whole three-tool

00:14:30.720 --> 00:14:33.659
blueprint ultimately give someone, a content

00:13:33.659 --> 00:13:36.840
creator, a marketer? What's the big, so what?

00:13:37.059 --> 00:13:38.779
Well, fundamentally, it delivers efficiency,

00:13:38.899 --> 00:13:41.159
quality, and adaptability. You're getting genuine

00:13:41.159 --> 00:13:43.799
cost savings. You're getting effectively unlimited

00:13:43.799 --> 00:13:46.340
creative variations on your avatars. And you're

00:13:46.340 --> 00:13:48.519
getting really high visual quality, thanks to

00:13:48.519 --> 00:13:52.279
that VEO 3.1 integration. Practically, it means

00:13:52.279 --> 00:13:54.539
you can easily A/B test different avatar styles,

00:13:54.730 --> 00:13:57.889
maybe male versus female, casual versus formal.

00:13:57.990 --> 00:14:00.090
You can do that across different campaigns, different

00:14:00.090 --> 00:14:02.049
regions, different languages, even. So it shifts

00:14:02.049 --> 00:14:05.110
the avatar from just being a visual in one video

00:14:05.110 --> 00:14:07.250
to being more like a strategic asset you can

00:14:07.250 --> 00:14:09.649
deploy in many ways. For creators, maybe they

00:14:09.649 --> 00:14:11.669
can build a consistent brand with multiple hosts

00:14:11.669 --> 00:14:13.850
without needing actors or complex licensing.

00:14:14.070 --> 00:14:15.730
Exactly. It's like building an asset library

00:14:15.730 --> 00:14:18.350
powerhouse. It really transforms your whole approach.

00:14:18.370 --> 00:14:21.710
You go from slow, expensive, and limited to being

00:14:21.710 --> 00:14:25.710
rapid, strategic, and pretty expansive in what

00:14:25.710 --> 00:14:28.070
you can create. You know, the real takeaway for

00:14:28.070 --> 00:14:30.070
me here isn't just the cost savings, though that's

00:14:30.070 --> 00:14:33.350
huge. It's the speed, the potential speed of

00:14:33.350 --> 00:14:36.309
asset creation. The sources suggest once you

00:14:36.309 --> 00:14:38.289
get comfortable with this flow, you could generate

00:14:38.289 --> 00:14:42.500
maybe 20 or more unique, ready-to-go avatar images

00:14:42.500 --> 00:14:45.559
in less than half an hour. Yeah, it totally shifts

00:14:45.559 --> 00:14:47.679
your focus. You stop worrying about budgeting

00:14:47.679 --> 00:14:49.899
credits minute by minute, and you start thinking

00:14:49.899 --> 00:14:52.299
about building this digital factory of potential

00:14:52.299 --> 00:14:56.340
spokespeople. Just imagine having, say, 50 unique

00:14:56.919 --> 00:14:59.679
consistent characters ready to deploy, tailored

00:14:59.679 --> 00:15:02.639
for any situation. You need a serious lawyer avatar?

00:15:02.940 --> 00:15:05.559
Got it. An enthusiastic team for a different

00:15:05.559 --> 00:15:08.919
campaign. Got it. Fitness guru, corporate CEO,

00:15:09.460 --> 00:15:11.500
all sitting in your library ready to be turned

00:15:11.500 --> 00:15:14.320
into high quality video for fractions of what

00:15:14.320 --> 00:15:16.100
it used to cost. You could actually start building

00:15:16.100 --> 00:15:17.919
that library today. The tools are out there.

00:15:18.039 --> 00:15:20.379
They're mostly accessible. The method seems pretty

00:15:20.379 --> 00:15:22.360
well proven according to the source. Time to

00:15:22.360 --> 00:15:24.700
start creating. [Outro music.]
