WEBVTT

00:00:00.000 --> 00:00:02.419
Have you ever been scrolling, you know, maybe late

00:00:02.419 --> 00:00:05.820
at night, and you just stop? You see this AI video.

00:00:05.820 --> 00:00:07.940
It's kind of astonishing. Uh, maybe it's glass

00:00:07.940 --> 00:00:10.560
breaking with these impossible sounds, or, like, an

00:00:10.800 --> 00:00:13.640
epic cinematic shot. Millions of views. And you

00:00:13.640 --> 00:00:16.300
just think, how? Like, what's the secret there? If

00:00:16.300 --> 00:00:18.480
you've felt that, that sort of invisible wall

00:00:18.480 --> 00:00:20.739
between your ideas and getting that high-quality

00:00:20.739 --> 00:00:23.420
viral AI stuff out there, well, you're not alone.

00:00:23.420 --> 00:00:26.480
Today, that wall starts coming down. Welcome to

00:00:26.480 --> 00:00:29.059
the deep dive. Our mission today is to give you

00:00:29.059 --> 00:00:32.340
a detailed map, really a clear compass, for mastering

00:00:32.340 --> 00:00:35.399
AI video generation. We're going to dig into Google

00:00:35.399 --> 00:00:38.060
VEO 3. It's a breakthrough tool, and it's really

00:00:38.060 --> 00:00:40.380
reshaping how we think about creating video. Yeah,

00:00:40.439 --> 00:00:42.380
absolutely. And we're jumping right in. We'll

00:00:42.380 --> 00:00:45.719
unpack what VEO 3 actually is, how it works, fundamentally.

00:00:46.039 --> 00:00:48.240
And we'll shift to thinking like an AI director

00:00:48.240 --> 00:00:50.299
when you prompt. Plus, we're going to lay out

00:00:50.299 --> 00:00:52.460
some proven viral video formats you can use,

00:00:52.859 --> 00:00:54.920
like, right away. We'll share advanced techniques,

00:00:55.159 --> 00:00:57.560
talk post-production polish, and touch on the

00:00:57.560 --> 00:00:59.020
ethics and the current limits. It's the whole

00:00:59.020 --> 00:01:01.859
journey, basically. Idea to viral potential.

00:01:02.140 --> 00:01:04.680
It really is about moving past just that "wow,

00:01:04.739 --> 00:01:07.579
how did they do that?" moment to actually understanding

00:01:07.579 --> 00:01:10.099
the craft behind these things. Exactly. And to

00:01:10.099 --> 00:01:12.819
really get VEO's power, you gotta understand it's

00:01:12.819 --> 00:01:15.859
way more than just text-to-video. VEO is this

00:01:15.859 --> 00:01:19.840
advanced AI. It has, like, deep contextual understanding.

00:01:19.840 --> 00:01:21.739
It doesn't just read words. It tries to figure

00:01:21.739 --> 00:01:24.120
out your intent, the emotion, the whole aesthetic

00:01:24.120 --> 00:01:27.319
you're going for. What makes VEO 3 feel like a

00:01:27.319 --> 00:01:30.439
revolution, from what we're seeing, is a couple of

00:01:30.439 --> 00:01:32.599
key things, first, that native synchronized audio,

00:01:32.659 --> 00:01:34.900
that's a game changer, seriously. Other tools,

00:01:34.920 --> 00:01:36.540
you're kind of scrambling for sound afterwards.

00:01:36.819 --> 00:01:39.019
VEO can generate the whole soundscape right from

00:01:39.019 --> 00:01:41.579
the prompt. Dialogue with pretty decent lip sync,

00:01:41.859 --> 00:01:43.939
sound effects, footsteps, glass shattering, ambient

00:01:43.939 --> 00:01:47.500
noise, wind, traffic, all synced up. It's a huge

00:01:47.500 --> 00:01:49.260
leap forward. And it's not just sound, is it?

00:01:49.280 --> 00:01:51.840
It almost feels like it understands physics and

00:01:51.840 --> 00:01:54.780
film language. It does. It's got this intuitive

00:01:54.780 --> 00:01:57.700
physics simulation, plus what we're calling cinematic

00:01:57.700 --> 00:02:00.819
literacy. VEO gets how the world works. Water

00:02:00.819 --> 00:02:03.760
flows, fabric billows, things have weight. But

00:02:03.760 --> 00:02:06.680
what's maybe more fascinating, it speaks filmmaker.

00:02:07.180 --> 00:02:09.659
You can ask for a time lapse, an aerial shot,

00:02:10.020 --> 00:02:12.680
dolly zoom, and it actually translates those

00:02:12.680 --> 00:02:15.180
into proper camera moves. It's not just making

00:02:15.180 --> 00:02:17.400
video, it's kind of directing it. So that really

00:02:17.400 --> 00:02:19.319
demands a whole different way of thinking then.

00:02:19.479 --> 00:02:21.860
This isn't just typing a few words and, you know,

00:02:22.060 --> 00:02:23.819
crossing your fingers. You have to come at it

00:02:23.819 --> 00:02:27.129
with this AI director mindset. Every prompt isn't

00:02:27.129 --> 00:02:30.490
just a request. It's a clear directive to a virtual

00:02:30.490 --> 00:02:32.530
crew. You're the director, screenwriter, sound

00:02:32.530 --> 00:02:35.050
designer, all rolled into one. It's like VEO

00:02:35.050 --> 00:02:38.030
is just waiting for you to command it properly.

00:02:38.610 --> 00:02:41.490
So that deep contextual understanding, what does

00:02:41.490 --> 00:02:43.310
that actually look like when we're trying to

00:02:43.310 --> 00:02:46.110
create? Does it really grasp what we want beyond

00:02:46.110 --> 00:02:48.250
just the keywords? It interprets the intent,

00:02:48.349 --> 00:02:50.689
the emotion, and the aesthetic you want to convey.

00:02:50.830 --> 00:02:52.810
That deeper understanding. Yeah, that makes the

00:02:52.810 --> 00:02:54.789
whole prompting philosophy just incredibly important.

00:02:55.069 --> 00:02:57.349
Oh, absolutely. In generative AI, the prompt

00:02:57.349 --> 00:02:59.689
is, well, it's everything. A good prompt is like

00:02:59.689 --> 00:03:01.909
the bridge from your imagination to what the

00:03:01.909 --> 00:03:04.069
AI can actually make. Think of it like a really

00:03:04.069 --> 00:03:06.669
detailed blueprint, a precise directive for that

00:03:06.669 --> 00:03:08.830
virtual crew we talked about. Got to be clear,

00:03:09.389 --> 00:03:11.990
layered, no room for guesswork. And one really

00:03:11.990 --> 00:03:13.770
effective way to build those prompts is what we're

00:03:13.770 --> 00:03:16.120
calling the layering technique. It's kind of

00:03:16.120 --> 00:03:18.840
like an artist sketching first, then adding the

00:03:18.840 --> 00:03:21.520
detail and color. So layer one, the foundation,

00:03:21.800 --> 00:03:24.360
that's your shot type, your aesthetic. The framework,

00:03:24.580 --> 00:03:27.520
perspective, style, the mood. So maybe you start

00:03:27.520 --> 00:03:30.759
with "cinematic wide shot, epic fantasy aesthetic."

00:03:31.419 --> 00:03:34.020
Then layer two, the setting, build the world.

00:03:34.360 --> 00:03:36.599
Where's it happening? More detail, more believable

00:03:36.599 --> 00:03:40.830
video. So adding on: "In a sprawling ancient forest

00:03:40.830 --> 00:03:43.689
at twilight, mist clings to the moss-covered

00:03:43.689 --> 00:03:46.050
trees." Okay, then layer three. Subject in action.

00:03:46.409 --> 00:03:48.270
Put your character or object in there. Give them

00:03:48.270 --> 00:03:51.069
something to do, a purpose. Continuing that example:

00:03:51.689 --> 00:03:53.849
"A lone elven warrior with glowing silver armor

00:03:53.849 --> 00:03:56.090
kneels by a stream, dipping her hands into the

00:03:56.090 --> 00:03:59.430
water." Right. And finally, layer four. The fine

00:03:59.430 --> 00:04:01.490
details. These are the final touches. Give it

00:04:01.490 --> 00:04:04.469
soul. Lighting, textures, and importantly, the

00:04:04.469 --> 00:04:07.650
specific sounds. Finishing it off: "The water

00:04:07.650 --> 00:04:10.169
ripples around her fingers. The only sounds are

00:04:10.169 --> 00:04:11.969
the gentle gurgle of the stream and the chirping

00:04:11.969 --> 00:04:15.270
of unseen insects." Build it like that, layer by

00:04:15.270 --> 00:04:17.850
layer and you've got a really comprehensive directive

00:04:17.850 --> 00:04:20.569
for VEO. And it's not just the structure, right?

00:04:20.829 --> 00:04:24.189
It's the words themselves, the power of words.

00:04:24.730 --> 00:04:27.310
Your word choice makes a huge difference. The

00:04:27.310 --> 00:04:31.250
AI picks up on nuance, adjectives, verbs surprisingly

00:04:31.250 --> 00:04:33.490
well. Think about, like, "a man walks down the

00:04:33.490 --> 00:04:36.509
street." Okay, vague. Versus "a weary old man shuffles

00:04:36.509 --> 00:04:39.189
down a rain-slick, neon-lit cobblestone alley."

00:04:39.410 --> 00:04:41.790
That second one, it tells a story, gives you

00:04:41.790 --> 00:04:43.910
a mood, a setting, a character. Use strong verbs,

00:04:44.069 --> 00:04:45.769
sensory adjectives. Yeah, definitely. And it

00:04:45.769 --> 00:04:47.870
really helps to practice thinking in frames.

00:04:48.079 --> 00:04:50.000
Before you even type, like seriously, close your

00:04:50.000 --> 00:04:51.959
eyes, visualize the shot, ask those director

00:04:51.959 --> 00:04:54.000
questions: where's the camera? Moving, static?

00:04:54.160 --> 00:04:56.579
What lens? Light source? Once you have that clear

00:04:56.579 --> 00:04:58.660
image, translating it into detailed language for

00:04:58.660 --> 00:05:01.730
the AI, much easier, much more effective. That's

00:05:01.730 --> 00:05:03.589
the real difference, isn't it? Between just getting

00:05:03.589 --> 00:05:06.110
something random back and intentionally creating

00:05:06.110 --> 00:05:08.829
an artistic vision. So to really nail that vision,

00:05:09.209 --> 00:05:12.490
why is visualizing the shot first so vital? It

00:05:12.490 --> 00:05:15.490
helps translate your clear image into effective

00:05:15.490 --> 00:05:19.110
detailed language for the AI. OK, now this is

00:05:19.110 --> 00:05:20.810
the core of today's deep dive. We're going to

00:05:20.810 --> 00:05:22.970
break down some video formats that, well, they've

00:05:22.970 --> 00:05:24.930
proven to have serious viral potential online.

00:05:25.370 --> 00:05:27.970
For each one, we'll look at the what, the how,

00:05:28.250 --> 00:05:31.170
and crucially, the why, the psychology, and how

00:05:31.170 --> 00:05:33.310
VEO can nail them. All right, let's jump in.

00:05:33.350 --> 00:05:36.209
First one. Unrealistic object ASMR videos. They

00:05:36.209 --> 00:05:38.649
seem absolutely everywhere. They do. And they

00:05:38.649 --> 00:05:41.310
hit two things our brains just love. ASMR, that

00:05:41.310 --> 00:05:43.930
tingly feeling from certain sounds, and our fascination

00:05:43.930 --> 00:05:46.029
with the absurd. See, our brains know physics.

00:05:46.689 --> 00:05:49.250
So when VEO shows like a croissant, but made

00:05:49.250 --> 00:05:51.610
of obsidian, and then shatters it, it's this

00:05:51.610 --> 00:05:54.689
weird, safe violation of reality. You can't look

00:05:54.689 --> 00:05:57.089
away. VEO's physics engine lets us play that

00:05:57.089 --> 00:06:00.220
trick. For the prompt: camera angle, object material,

00:06:00.699 --> 00:06:02.360
the main destructive action, how it interacts

00:06:02.360 --> 00:06:04.439
with the environment, and that ASMR soundscape.

00:06:04.800 --> 00:06:06.759
Think obsidian croissant crumbling, sharp

00:06:06.759 --> 00:06:09.339
cracks, or a ruby pomegranate spilling clattering

00:06:09.339 --> 00:06:11.699
gems. Pitfall, though. Keep the action simple

00:06:11.699 --> 00:06:13.879
in an eight-second shot. One decisive thing

00:06:13.879 --> 00:06:16.759
works best. Too much gets messy. Okay, next.

00:06:17.000 --> 00:06:18.740
Selfie-style vlogger videos. There's something

00:06:18.740 --> 00:06:21.079
really engaging about that format. Totally. Humans

00:06:21.079 --> 00:06:24.279
connect, right? This format taps into that parasocial

00:06:24.279 --> 00:06:26.199
thing. Makes it feel authentic, like a direct

00:06:26.199 --> 00:06:28.420
chat. And the humor comes from the contrast.

00:06:28.959 --> 00:06:31.100
Put an amazing character in a mundane spot. Imagine

00:06:31.100 --> 00:06:33.079
an astronaut in a supermarket, totally baffled

00:06:33.079 --> 00:06:36.699
by avocados. Or a Viking king versus a vending

00:06:36.699 --> 00:06:39.399
machine, you know, yelling for his crisps. VEO's

00:06:39.399 --> 00:06:41.160
good at those subtle character reactions.

00:06:41.420 --> 00:06:44.009
Prompt-wise: declare selfie style, describe

00:06:44.009 --> 00:06:46.069
the character in detail, add self-filmed touches,

00:06:46.290 --> 00:06:48.610
modern setting, action dialogue showing conflict,

00:06:48.889 --> 00:06:51.170
ambient audio. Just keep dialogue short, like

00:06:51.170 --> 00:06:54.629
under 10, 15 words. Long speeches tend to confuse

00:06:54.629 --> 00:06:57.170
the AI. And then satirical street interviews.

00:06:57.449 --> 00:06:59.829
What's the magic there? Subverting expectations.

00:07:00.089 --> 00:07:02.649
We know the news interview format. So when the

00:07:02.649 --> 00:07:05.220
content is suddenly absurd? Hilarious! It lets

00:07:05.220 --> 00:07:07.660
you do satire or just pure nonsense. Viewers

00:07:07.660 --> 00:07:10.240
like feeling they're in on the joke. VEO can

00:07:10.240 --> 00:07:12.660
create those realistic backdrops and expressions.

00:07:13.100 --> 00:07:15.519
Think the invisible pet show, people grooming

00:07:15.519 --> 00:07:18.699
thin air, or a 1920s interview about the shocking

00:07:18.699 --> 00:07:21.420
new trend of sliced bread. Hello. Your prompt

00:07:21.420 --> 00:07:24.100
needs the format, shot setup, absurd backstory,

00:07:24.699 --> 00:07:26.600
interviewee details, and a short Q&A script.

00:07:26.879 --> 00:07:28.879
Key thing. Keep the premise simple. The humor's

00:07:28.879 --> 00:07:31.279
in the basic absurdity. So what about other formats

00:07:31.279 --> 00:07:34.079
that use similar ideas, like incongruity or just

00:07:34.079 --> 00:07:35.860
strong visuals? I'm thinking news reports with

00:07:35.860 --> 00:07:38.199
a twist or even those really cinematic videos.

00:07:38.660 --> 00:07:40.300
Well, yeah, absolutely. They tap into similar

00:07:40.300 --> 00:07:42.579
core psychology. Take the news report with a

00:07:42.579 --> 00:07:45.540
twist. That's all about comedic timing and incongruity.

00:07:45.939 --> 00:07:48.579
Our brain focuses on the person talking. Right?

00:07:48.980 --> 00:07:51.120
So when something crazy happens in the background,

00:07:51.579 --> 00:07:54.139
unnoticed by them, it shatters expectations.

00:07:54.660 --> 00:07:57.120
Serious foreground, chaotic background, classic

00:07:57.120 --> 00:08:00.319
visual comedy. Like a construction safety expert

00:08:00.319 --> 00:08:02.819
talking confidently while someone perfectly falls

00:08:02.819 --> 00:08:05.360
into wet cement behind him. Or an environmental

00:08:05.360 --> 00:08:07.720
activist giving a speech as the wind blows all

00:08:07.720 --> 00:08:10.819
his fliers away in a clean park. VEO handles

00:08:10.819 --> 00:08:13.220
those complex scenes pretty well. Then totally

00:08:13.220 --> 00:08:15.500
different vibe. Cinematic style videos. That's

00:08:15.500 --> 00:08:18.589
about emotion, beauty, grandeur, visual storytelling.

00:08:19.009 --> 00:08:21.829
We don't just see it, we feel it. The high production

00:08:21.829 --> 00:08:24.829
value VEO gives feels professional. Picture a

00:08:24.829 --> 00:08:27.370
sci-fi scene, Blade Runner vibe, maybe an android

00:08:27.370 --> 00:08:30.029
with one tear, slow crane reveals a future city,

00:08:30.170 --> 00:08:32.230
or the lone samurai, Kurosawa style, drawing

00:08:32.230 --> 00:08:34.850
his sword on a cliff. VEO gets those complex

00:08:34.850 --> 00:08:37.289
camera moves and lighting. We also see huge engagement

00:08:37.289 --> 00:08:39.669
with stuff like the animal Olympics. That's pure

00:08:39.669 --> 00:08:41.929
anthropomorphism, giving animals human traits,

00:08:42.049 --> 00:08:43.789
especially in a serious context like the Olympics.

00:08:44.169 --> 00:08:46.620
Instant comedy. We laugh at a grizzly bear weightlifting

00:08:46.620 --> 00:08:49.440
or Siamese cat gymnastics, but we kind of root

00:08:49.440 --> 00:08:52.580
for them too. And for controlled chaos fans,

00:08:53.419 --> 00:08:55.899
satisfying object destruction. Our brains like

00:08:55.899 --> 00:08:58.120
patterns, but also get satisfaction from controlled

00:08:58.120 --> 00:09:00.980
destruction. Slow motion, which VEO does okay,

00:09:01.159 --> 00:09:03.559
shows off the physics details. Watching something

00:09:03.559 --> 00:09:06.159
go from order to chaos predictably, it's almost

00:09:06.159 --> 00:09:09.539
meditative. Think ink blooming in milk, creating

00:09:09.539 --> 00:09:12.240
those cool fractal patterns. Or a time lapse

00:09:12.240 --> 00:09:15.019
of a melting ice car. And finally, maybe a personal

00:09:15.019 --> 00:09:17.379
favorite. Historical figures in modern situations.

00:09:17.879 --> 00:09:20.399
Classic fish out of water. Anachronism. Humor

00:09:20.399 --> 00:09:22.620
from the time clash. Works best when their reaction

00:09:22.620 --> 00:09:24.659
fits their personality or achievements. Think

00:09:24.659 --> 00:09:27.000
Nikola Tesla at the Apple Store, annoyed by cat

00:09:27.000 --> 00:09:29.480
pics instead of wireless power. Or Marie Curie

00:09:29.480 --> 00:09:31.159
tries an energy drink and discovers some new

00:09:31.159 --> 00:09:33.460
energy. OK, so if you pull all those different

00:09:33.460 --> 00:09:36.879
formats together. What's the common thread? What's

00:09:36.879 --> 00:09:39.600
the core idea uniting their success? They all

00:09:39.600 --> 00:09:42.740
tap into universal human psychology for viral

00:09:42.740 --> 00:09:44.679
appeal and engagement. Alright, we talked about

00:09:44.679 --> 00:09:47.039
making great-looking videos, understanding the

00:09:47.039 --> 00:09:50.279
psychology. Now let's talk about control. Controlling

00:09:50.279 --> 00:09:53.320
consistently, efficiently, moving beyond just

00:09:53.320 --> 00:09:56.600
looks good to making it predictable, repeatable.

00:09:56.700 --> 00:09:58.100
Yeah, this is super important if you're doing

00:09:58.100 --> 00:10:00.360
a series or telling a story over multiple shots.

00:10:00.460 --> 00:10:02.580
If your character suddenly looks different, it

00:10:02.580 --> 00:10:05.139
just breaks everything. The fix is what we call

00:10:05.139 --> 00:10:07.100
the character sheet, not just a description.

00:10:07.500 --> 00:10:10.080
It's a standardized block of text, your character's

00:10:10.080 --> 00:10:13.899
DNA. Needs extreme detail. Age, build, hair,

00:10:13.940 --> 00:10:16.700
eyes, skin, signature outfit. Get specific on

00:10:16.700 --> 00:10:18.960
fabric, color, style. And defining features.

00:10:19.519 --> 00:10:22.360
Scar, tattoo, mole, maybe a specific walk, an

00:10:22.360 --> 00:10:24.480
accessory, whatever makes them unique. I still

00:10:24.480 --> 00:10:26.700
wrestle with prompt drift myself when I try to

00:10:26.700 --> 00:10:28.700
maintain character consistency across scenes.

00:10:28.960 --> 00:10:30.960
It's tougher than it looks. It's amazing how

00:10:30.960 --> 00:10:33.179
just changing "black leather jacket" to "black jacket

00:10:33.179 --> 00:10:35.759
made of leather" can mess things up. So this character

00:10:35.759 --> 00:10:38.690
sheet... It sounds essential, but how do we make

00:10:38.690 --> 00:10:41.750
it really work every time? Precision. Total precision.

00:10:42.169 --> 00:10:44.149
You write the sheet once, maybe in a text editor.

00:10:44.730 --> 00:10:47.470
Then, for every single prompt with that character,

00:10:47.909 --> 00:10:50.950
you copy and paste the exact block of text. 100%

00:10:50.950 --> 00:10:53.700
accurate. Beginning of the prompt. Even a comma

00:10:53.700 --> 00:10:56.240
changes things sometimes. Beyond characters,

00:10:56.379 --> 00:10:58.580
there's mastering the camera. For really impressive

00:10:58.580 --> 00:11:01.059
shots, speak cinema language. Think dolly: the

00:11:01.059 --> 00:11:03.860
camera moves closer or further. Crane shot: smooth

00:11:03.860 --> 00:11:06.059
vertical moves. Tracking shot: moves alongside

00:11:06.059 --> 00:11:08.779
the subject. Rack focus: shifts focus within

00:11:08.779 --> 00:11:11.100
the shot. Dutch angle: tilted camera for unease.

00:11:11.580 --> 00:11:13.220
Key is describing the camera's journey and not

00:11:13.220 --> 00:11:15.840
just one word. Don't just say "dolly in." Try:

00:11:16.120 --> 00:11:17.899
"Camera starts close on hands assembling a device,

00:11:18.279 --> 00:11:20.320
tilts up smoothly to their face, then dollies

00:11:20.320 --> 00:11:23.080
out to show them alone in a vast workshop." More

00:11:23.080 --> 00:11:26.379
nuanced. And sound. Sound's huge. VEO lets you

00:11:26.379 --> 00:11:28.000
direct that, too. It's important to know the

00:11:28.000 --> 00:11:29.879
difference between diegetic and non-diegetic,

00:11:29.899 --> 00:11:32.259
right? Exactly. Diegetic is what characters hear:

00:11:32.259 --> 00:11:34.740
dialogue, footsteps, rain, part of their world.

00:11:35.240 --> 00:11:37.519
Non-diegetic is just for the audience: musical

00:11:37.519 --> 00:11:40.799
score, narrator, enhances emotion. Okay, finally,

00:11:41.019 --> 00:11:43.419
credit-saving strategies. AI credits aren't

00:11:43.419 --> 00:11:46.269
cheap. Got to work smart. Use fast mode, if

00:11:46.269 --> 00:11:49.250
VEO has it, for testing ideas, drafts, quick lower-res

00:11:49.250 --> 00:11:51.950
versions, good for iterating before the big

00:11:51.950 --> 00:11:54.730
render. And here's a big pro tip. Before you

00:11:54.730 --> 00:11:56.950
burn credits on an eight-second video, take your

00:11:56.950 --> 00:11:59.409
core visual description, pop it into an image

00:11:59.409 --> 00:12:01.669
generator, like in Gemini. Free or cheap, lets

00:12:01.669 --> 00:12:03.789
you quickly check aesthetic, composition, character

00:12:03.789 --> 00:12:07.029
design without spending video credits, and always,

00:12:07.029 --> 00:12:09.490
always finalize prompts in an external text

00:12:09.490 --> 00:12:12.370
editor, Google Docs, whatever. Avoid accidental

00:12:12.370 --> 00:12:15.470
generations from typos. Great tips there. Which

00:12:15.470 --> 00:12:17.850
one would you say offers the biggest efficiency

00:12:17.850 --> 00:12:21.009
gain, saves the most time and resources? Testing

00:12:21.009 --> 00:12:23.210
core visual descriptions with an image generator

00:12:23.210 --> 00:12:26.389
before costly video generation. So once you've

00:12:26.389 --> 00:12:29.269
got that VEO output, remember, it's raw material.

00:12:29.490 --> 00:12:31.269
Post-production is where you polish it, turn

00:12:31.269 --> 00:12:32.809
it into something professional, ready to go.

00:12:33.149 --> 00:12:35.190
So the workflow is kind of like, VEO generates

00:12:35.190 --> 00:12:37.250
the clip, then an upscaler, then an editor, then

00:12:37.250 --> 00:12:40.450
publish? Pretty much. VEO usually outputs 720p,

00:12:40.509 --> 00:12:43.409
maybe 1080p. For top quality, you want to upscale

00:12:43.409 --> 00:12:47.559
to 4K. AI upscalers like Topaz Video AI are great.

00:12:47.820 --> 00:12:49.679
They don't just stretch pixels, they intelligently

00:12:49.679 --> 00:12:53.120
paint in details, denoise, sharpen, looks natural.

00:12:53.679 --> 00:12:56.240
Then, editing and color grading. DaVinci Resolve

00:12:56.240 --> 00:12:58.779
is amazing, powerful free version, great for

00:12:58.779 --> 00:13:00.960
color. CapCut's good too, beginner-friendly,

00:13:00.960 --> 00:13:03.539
mobile and desktop. Color grading is vital. Adjust

00:13:03.539 --> 00:13:05.919
colors for mood, cool blues for horror, warm

00:13:05.919 --> 00:13:08.460
oranges for sunset. And even with VEO's audio,

00:13:08.720 --> 00:13:10.720
adding extra layers in post makes it more immersive.

00:13:10.919 --> 00:13:13.179
Background music, extra sound effects, a whoosh

00:13:13.179 --> 00:13:15.320
for movement, stuff like that. And then reformatting.

00:13:15.399 --> 00:13:17.679
That's a big one for social media. Video is horizontal,

00:13:17.879 --> 00:13:21.799
16:9. TikTok, Reels need vertical, 9:16. Cropping

00:13:21.799 --> 00:13:24.120
just butchers it usually. Yeah, cropping loses

00:13:24.120 --> 00:13:27.440
too much. The solution. AI video outpainting

00:13:27.440 --> 00:13:29.860
tools. Runway, Luma Labs have these. They use

00:13:29.860 --> 00:13:32.399
AI to paint the missing space above and below

00:13:32.399 --> 00:13:35.179
your horizontal video, creates a seamless vertical

00:13:35.179 --> 00:13:38.279
frame, keeps your original content intact. It's

00:13:38.279 --> 00:13:41.059
like magic for social media. That is a powerful

00:13:41.059 --> 00:13:44.139
trick. So thinking about post-production for

00:13:44.139 --> 00:13:46.679
social, what's the biggest mistake people tend

00:13:46.679 --> 00:13:49.460
to make? They often just crop horizontal videos

00:13:49.460 --> 00:13:51.879
losing most of the actual frame. Okay, so with

00:13:51.879 --> 00:13:54.440
all this incredible creative power, there's also...

00:13:54.360 --> 00:13:57.440
responsibility, especially now in the AI era.

00:13:57.700 --> 00:13:59.779
Oh, definitely. We have to talk about misinformation,

00:14:00.139 --> 00:14:02.460
deepfakes. It's a real risk. This tech can be

00:14:02.460 --> 00:14:05.200
misused, fake news, deceptive videos, harmful

00:14:05.200 --> 00:14:07.379
stuff. As creators, we kind of have to commit

00:14:07.379 --> 00:14:09.720
to using it for art, entertainment, positive

00:14:09.720 --> 00:14:12.950
storytelling, never to deceive or harm. And transparency

00:14:12.950 --> 00:14:16.110
is key. Always tell your audience it's AI generated.

00:14:16.409 --> 00:14:19.149
Hashtags like hashtag AI generated, hashtag made

00:14:19.149 --> 00:14:21.629
with VEO, or just a little text overlay builds

00:14:21.629 --> 00:14:23.950
trust. Helps keep the whole AI creative space

00:14:23.950 --> 00:14:26.370
healthy. And VEO 3, even as powerful as it is,

00:14:26.450 --> 00:14:28.409
it still has limits, right? It's not perfect

00:14:28.409 --> 00:14:31.940
yet. No tech is. VEO still struggles with hands

00:14:31.940 --> 00:14:34.440
sometimes. Extra fingers, missing fingers, it

00:14:34.440 --> 00:14:37.259
happens. Physics simulation is good, but complex,

00:14:37.379 --> 00:14:40.379
illogical stuff can still look weird. And crucially,

00:14:40.700 --> 00:14:42.759
text in video. It just can't do legible text.

00:14:42.879 --> 00:14:45.279
It comes out garbled. Workarounds for hands,

00:14:45.580 --> 00:14:47.659
use gloves, make fists, hide them, frame them

00:14:47.659 --> 00:14:50.299
out. For physics, keep actions simpler, plausible.

00:14:50.659 --> 00:14:53.240
For text, always add it in post using your editor.

00:14:53.440 --> 00:14:56.039
Don't even ask VEO to generate it. So connecting

00:14:56.039 --> 00:14:58.879
this back to the bigger picture. What would you

00:14:58.879 --> 00:15:01.879
say is the most significant ethical concern with

00:15:01.879 --> 00:15:04.419
this tech right now? The potential for misinformation

00:15:04.419 --> 00:15:07.240
and those deceptive deepfake videos. We've covered

00:15:07.240 --> 00:15:09.220
a lot of ground in this deep dive from really

00:15:09.220 --> 00:15:11.659
understanding VEO's power, the prompting philosophy,

00:15:12.220 --> 00:15:14.500
to viral formats, advanced techniques and putting

00:15:14.500 --> 00:15:16.720
together a whole strategy. Yeah. And remember,

00:15:16.919 --> 00:15:19.519
the tech is just a tool. The real power is your

00:15:19.519 --> 00:15:22.360
storytelling ability, the detail in your directives.

00:15:22.720 --> 00:15:25.620
A well-crafted prompt makes art, not just random

00:15:25.620 --> 00:15:28.379
outputs. And things are moving so fast. Whoa!

00:15:28.899 --> 00:15:31.000
I mean, imagine the future. Perfect character

00:15:31.000 --> 00:15:33.919
consistency across dozens of scenes, real-time

00:15:33.919 --> 00:15:36.960
video generation. The creator's role is really

00:15:36.960 --> 00:15:39.360
shifting, isn't it? More like a true AI director,

00:15:39.559 --> 00:15:42.460
a narrative designer telling the AI exactly what

00:15:42.460 --> 00:15:44.840
complex idea to build. You've got the map now.

00:15:44.919 --> 00:15:46.759
You've got the toolkit. It's really your turn

00:15:46.759 --> 00:15:48.960
to explore, push the limits, become a creator

00:15:48.960 --> 00:15:51.200
shaping what's next. Pick a format we talked

00:15:51.200 --> 00:15:53.080
about. Make it yours. Make that first video.

00:15:53.379 --> 00:15:55.559
Don't worry about failing. Every generation teaches

00:15:55.559 --> 00:15:57.679
you something. The future of storytelling is

00:15:57.679 --> 00:16:00.700
here, and it's literally in your hands. Now go

00:16:00.700 --> 00:16:01.240
create.