WEBVTT

00:00:00.000 --> 00:00:03.740
Imagine for a moment just wishing you could conjure

00:00:03.740 --> 00:00:07.620
a scene, you know, maybe a gritty detective walking

00:00:07.620 --> 00:00:10.460
through these neon streets, rain everywhere,

00:00:10.980 --> 00:00:12.720
or I don't know, something totally different,

00:00:12.759 --> 00:00:15.720
like a fluffy kitten landing a skateboard trick,

00:00:16.359 --> 00:00:18.519
that kind of thing. It felt like pure sci-fi,

00:00:19.000 --> 00:00:24.300
right? A dream. But now it's becoming remarkably

00:00:24.300 --> 00:00:28.469
real. Welcome to the Deep Dive.

00:00:28.609 --> 00:00:31.250
We try to unpack complex stuff, make it digestible,

00:00:31.370 --> 00:00:34.149
maybe even a bit exciting. Today, we're plunging

00:00:34.149 --> 00:00:36.710
into Google Veo. It feels like a really new frontier

00:00:36.710 --> 00:00:39.590
in AI video. It's pretty fascinating, and we've

00:00:39.590 --> 00:00:41.030
gathered quite a bit to share with you. Absolutely.

00:00:41.070 --> 00:00:42.929
We're going to look at how this tool really shakes

00:00:42.929 --> 00:00:45.909
up video creation, especially with its native

00:00:45.909 --> 00:00:48.250
audio capabilities. That's a huge deal. And then,

00:00:48.310 --> 00:00:50.109
yeah, we'll break down this golden formula for

00:00:50.109 --> 00:00:51.409
writing the perfect prompts. That's where the

00:00:51.409 --> 00:00:54.219
magic happens. We'll also explore Veo's two main

00:00:56.200 --> 00:00:58.500
some advanced tips, touch on the current limits

00:00:58.500 --> 00:01:00.320
and the ethical side too, and maybe even get

00:01:00.320 --> 00:01:01.920
a little glimpse of where this tech is heading.

00:01:02.299 --> 00:01:04.439
Yeah, get ready to maybe turn some of your wild

00:01:04.439 --> 00:01:07.879
ideas into actual moving pictures. With sound,

00:01:08.400 --> 00:01:09.819
straight from text. It's kind of like having

00:01:09.819 --> 00:01:13.120
a mini film crew on call. Okay, let's dive in.

00:01:13.299 --> 00:01:16.060
So just to start, what is Google Veo fundamentally?

00:01:16.230 --> 00:01:19.370
It's an advanced AI model for generating video.

00:01:19.730 --> 00:01:21.989
Think of it like an AI that creates short video

00:01:21.989 --> 00:01:24.250
clips, usually up to eight seconds, just from

00:01:24.250 --> 00:01:26.689
the words you type. You describe, it creates.

00:01:26.930 --> 00:01:28.469
And here's where it gets really interesting.

00:01:29.030 --> 00:01:32.650
Like, game changer interesting. Native audio

00:01:32.650 --> 00:01:35.590
generation. This is a massive step. Remember

00:01:35.590 --> 00:01:38.989
the older tools? Silent videos. Which meant this

00:01:38.989 --> 00:01:41.090
whole clunky process, right? Generate the video,

00:01:41.469 --> 00:01:43.870
then hunt for sounds, license music, spend ages

00:01:43.870 --> 00:01:46.989
syncing it all up. Yeah, it sounds painful. Veo

00:01:46.989 --> 00:01:49.549
just breaks through that. It seems to understand,

00:01:49.569 --> 00:01:52.310
like right from the start, how visuals and sound

00:01:52.310 --> 00:01:54.909
belong together. So if you describe, say, the

00:01:54.909 --> 00:01:56.849
pitter-patter of rain on a window. Exactly.

00:01:56.909 --> 00:01:58.569
It doesn't just make the rain image. It makes

00:01:58.569 --> 00:02:00.750
the sound of the rain hitting the glass all at

00:02:00.750 --> 00:02:03.510
once. Unified. Or like your example, a chef chopping

00:02:03.510 --> 00:02:06.659
vegetables fast. Right. You don't just see it.

00:02:06.659 --> 00:02:09.120
You hear that rhythmic clack, clack, clack of the

00:02:09.120 --> 00:02:11.840
knife. Maybe a little sizzle from a pan nearby,

00:02:11.840 --> 00:02:15.199
all generated together, perfectly in sync. And that

00:02:15.199 --> 00:02:18.159
eight-second length sounds short, maybe, but it's

00:02:18.159 --> 00:02:21.319
perfect for TikTok, for Reels, quick punchy

00:02:21.319 --> 00:02:23.719
stuff. So for someone making that kind of fast

00:02:23.719 --> 00:02:26.919
content, how does this native audio really change

00:02:26.919 --> 00:02:30.300
their game? Like, practically? It just brings visuals

00:02:30.300 --> 00:02:32.639
and sound together instantly, makes producing

00:02:32.639 --> 00:02:35.240
that quick stuff way faster, and honestly much

00:02:35.240 --> 00:02:37.680
richer. Okay, so we know what it does, but how

00:02:37.680 --> 00:02:40.360
do you actually use it? Google set up like two

00:02:40.360 --> 00:02:42.259
different doors into Veo, right? For different

00:02:42.259 --> 00:02:44.360
needs. That's right. First you got Google Gemini.

00:02:44.860 --> 00:02:47.240
Think of that as the easy entry point. Kind of

00:02:47.240 --> 00:02:49.319
like a chatbot, but for video. Great for playing

00:02:49.319 --> 00:02:51.680
around, rapid ideas if you're just starting out,

00:02:51.900 --> 00:02:54.199
or for making standalone things like memes or

00:02:54.199 --> 00:02:56.719
quick clips. Sparks inspiration fast. And then

00:02:56.719 --> 00:02:59.319
the other door is Google Flow. That sounds more

00:02:59.319 --> 00:03:01.979
serious. Yeah, Flow is much more like a pro AI

00:03:01.979 --> 00:03:03.900
film studio. It's built for the bigger, more

00:03:03.900 --> 00:03:06.219
complex stuff. Projects where you need more control,

00:03:06.599 --> 00:03:08.819
maybe tell a bit of a story across multiple shots.

00:03:09.259 --> 00:03:11.419
What makes Flow stand out then? What are the

00:03:11.419 --> 00:03:13.379
key features there? Well, storyboarding is a

00:03:13.379 --> 00:03:15.500
big one. You can actually lay out a sequence

00:03:15.500 --> 00:03:18.520
of scenes, build a narrative, and crucially,

00:03:18.680 --> 00:03:20.639
it tackles character consistency. That's been

00:03:20.639 --> 00:03:23.159
a huge headache with AI video. You can upload

00:03:23.159 --> 00:03:25.599
an image of your character, and Flow tries to

00:03:25.599 --> 00:03:27.729
keep them looking the same across different

00:03:27.729 --> 00:03:30.509
shots. Face, clothes, everything. No more weird

00:03:30.509 --> 00:03:33.990
morphing. Plus, you get manual camera controls,

00:03:34.349 --> 00:03:36.949
better scene building tools, and ways to manage

00:03:36.949 --> 00:03:39.310
all your project assets. It's much more robust.

00:03:39.430 --> 00:03:41.689
And there are subscriptions involved. Yep. There's

00:03:41.689 --> 00:03:45.050
a Pro plan with Veo 3 Fast for quicker results

00:03:45.050 --> 00:03:48.469
and an Ultra plan with the full-quality Veo 3 for

00:03:48.469 --> 00:03:51.310
the absolute best output. Both plans get you

00:03:51.310 --> 00:03:53.979
that native audio, which is key. So Flow sounds

00:03:53.979 --> 00:03:56.360
like it's tackling some major AI video hurdles,

00:03:56.439 --> 00:03:58.439
especially that character consistency you mentioned.

00:03:58.520 --> 00:04:00.819
Is that the biggest thing it solves? It's definitely

00:04:00.819 --> 00:04:03.539
huge, especially for telling stories. Flow lets

00:04:03.539 --> 00:04:05.699
you actually build a sequence, not just generate

00:04:05.699 --> 00:04:08.680
isolated clips. That's a big shift. OK, this

00:04:08.680 --> 00:04:10.740
next part feels really important. You mentioned

00:04:10.740 --> 00:04:13.400
a golden formula for prompts. This seems like

00:04:13.400 --> 00:04:15.740
where you really unlock Veo's power, right?

00:04:16.019 --> 00:04:18.120
Because vague prompts probably don't work well.

00:04:18.319 --> 00:04:20.220
Exactly. Most beginners just type something simple,

00:04:20.319 --> 00:04:23.680
like "dog running." But the pros use a more structured

00:04:23.680 --> 00:04:25.899
approach. We're calling it the seven-element

00:04:25.899 --> 00:04:28.560
formula. Think of it like a checklist for describing

00:04:28.560 --> 00:04:31.000
a mini movie scene to the AI. OK, break it down

00:04:31.000 --> 00:04:35.240
for us. Element one. One. Subject, who or what.

00:04:35.660 --> 00:04:38.920
Be super specific, not just "a man." Try "a grizzled

00:04:38.920 --> 00:04:41.759
detective, weary, in a worn beige trench coat."

00:04:42.300 --> 00:04:45.120
Details matter. Got it. Two. Action. What are

00:04:45.120 --> 00:04:48.420
they doing? Use strong verbs. Not "he walks,"

00:04:48.879 --> 00:04:50.779
but "he trudges wearily through the downpour."

00:04:50.899 --> 00:04:53.199
Makes sense. Three. Context. Where and when.

00:04:53.459 --> 00:04:55.439
Set the scene. Build the world, like "in a narrow

00:04:55.439 --> 00:04:57.740
Tokyo alley soaked in pulsating neon light on

00:04:57.740 --> 00:05:00.699
a rainy night." Okay. Number four. Motion. How's

00:05:00.699 --> 00:05:03.019
the camera moving? This is crucial for that cinematic

00:05:03.019 --> 00:05:06.300
feel. Is it a pan? A tilt? A dolly or a tracking

00:05:06.300 --> 00:05:08.879
shot moving with the subject? A zoom? A drone

00:05:08.879 --> 00:05:12.360
shot from above? Or maybe handheld for that raw

00:05:12.360 --> 00:05:16.839
feel. Specify it. Style. The visual look. Genre.

00:05:17.259 --> 00:05:19.339
Artistic influence. Could be in the style of

00:05:19.339 --> 00:05:21.759
Wes Anderson. Black and white film noir. Studio

00:05:21.759 --> 00:05:24.379
Ghibli animation. Get creative. Yeah. Framing.

00:05:25.000 --> 00:05:27.139
How is the shot composed? Establishing shot.

00:05:27.519 --> 00:05:31.920
Wide. Medium. Close up. Extreme close up. Each

00:05:31.920 --> 00:05:33.939
tells a different story. And the last one, seven.

00:05:34.259 --> 00:05:37.449
Audio. Crucial for Veo. Always add an audio section.

00:05:37.949 --> 00:05:41.050
Describe everything. Sound effects, SFX, background

00:05:41.050 --> 00:05:43.470
noise, music mood, dialogue snippets, ambient

00:05:43.470 --> 00:05:46.149
sounds. Make it immersive. And you had a pro

00:05:46.149 --> 00:05:48.480
tip too. Oh yeah, super important. Always add

00:05:48.480 --> 00:05:50.839
"no subtitles" at the end. Otherwise, the AI might

00:05:50.839 --> 00:05:52.819
just slap some random text on your video. It

00:05:52.819 --> 00:05:54.759
can be annoying to deal with. You know, honestly,

00:05:55.079 --> 00:05:57.220
even knowing all these steps, I still sometimes

00:05:57.220 --> 00:05:59.759
struggle with prompt drift myself. Where you

00:05:59.759 --> 00:06:02.199
refine it, but the AI kind of wanders off from

00:06:02.199 --> 00:06:04.439
what you first wanted. It really does take practice.
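
NOTE
The seven-element checklist described above can be sketched in Python as a small prompt builder.
This is a minimal illustration that only assembles plain text. The ScenePrompt class, its render
method, the "Camera:" and "Style:" labels, the framing choice, and the audio wording are all
hypothetical; none of this is an official Veo prompt syntax or API.
```python
from dataclasses import dataclass
@dataclass
class ScenePrompt:
    # The seven elements from the discussion, in the order they were listed.
    subject: str
    action: str
    context: str
    motion: str
    style: str
    framing: str
    audio: str
    def render(self) -> str:
        # Join the elements into one prompt and end with the "no subtitles" tip.
        parts = [
            f"{self.subject} {self.action} {self.context}.",
            f"Camera: {self.motion}.",
            f"Style: {self.style}.",
            f"Framing: {self.framing}.",
            f"Audio: {self.audio}.",
            "No subtitles.",
        ]
        return " ".join(parts)
prompt = ScenePrompt(
    subject="A grizzled detective in a worn beige trench coat",
    action="trudges wearily through the downpour",
    context="in a narrow Tokyo alley soaked in pulsating neon light on a rainy night",
    motion="tracking shot moving with the subject",
    style="black and white film noir",
    framing="medium shot",  # illustrative choice, not from the episode
    audio="heavy rain on pavement, distant traffic, slow moody music",  # illustrative wording
).render()
print(prompt)
```
Keeping each element as its own field makes it easy to spot a missing one, such as audio, before generating.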

00:06:04.519 --> 00:06:06.519
Oh, absolutely. It's an iterative process. So

00:06:06.519 --> 00:06:08.879
just to underline it, how does using all seven

00:06:08.879 --> 00:06:11.740
elements really elevate a simple idea compared

00:06:11.740 --> 00:06:14.189
to just, you know, "detective and rain"? It gives

00:06:14.189 --> 00:06:17.769
the AI such a rich, detailed blueprint. It lets

00:06:17.769 --> 00:06:20.290
it generate something truly unique, specific,

00:06:20.589 --> 00:06:23.730
and cinematic, not just generic. Right. And you

00:06:23.730 --> 00:06:25.569
mentioned iteration. It sounds like you rarely

00:06:25.569 --> 00:06:27.730
get it perfect on the first go. Almost never.

00:06:28.170 --> 00:06:30.230
The best way to work with Veo, or really any of

00:06:30.230 --> 00:06:33.269
these tools, is iteration. Refinement. It's more

00:06:33.269 --> 00:06:35.389
like sculpting than just hitting generate. So

00:06:35.389 --> 00:06:37.910
what's the recommended process there? Start simple.

00:06:38.689 --> 00:06:41.990
Seriously. Just subject plus action plus context.

00:06:42.199 --> 00:06:45.259
Generate that, see what the AI gives you generically,

00:06:45.879 --> 00:06:49.079
then layer it, add motion and framing, generate

00:06:49.079 --> 00:06:51.160
again, see how it starts feeling more like a

00:06:51.160 --> 00:06:54.300
movie shot. Finally, add the last layer, style

00:06:54.300 --> 00:06:56.639
and audio, then tweak and polish from there.

00:06:57.240 --> 00:06:58.839
Breaking it down like that gives you way more

00:06:58.839 --> 00:07:00.600
control and stops you from feeling overwhelmed.
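
NOTE
The layered workflow described above (start simple, then add motion and framing, then style and
audio) can be sketched as three prompts that each extend the previous one. This is a minimal
illustration under stated assumptions: the layered_prompts function and the "Camera:", "Framing:",
"Style:", and "Audio:" labels are hypothetical, and the code only builds text; it does not call Veo.
```python
def layered_prompts(subject, action, context, motion, framing, style, audio):
    # Pass 1: subject + action + context only, to see the generic result.
    base = f"{subject} {action} {context}."
    # Pass 2: add camera motion and framing so it reads like a movie shot.
    shot = f"{base} Camera: {motion}. Framing: {framing}."
    # Pass 3: add style and audio, then the "no subtitles" tip.
    final = f"{shot} Style: {style}. Audio: {audio}. No subtitles."
    return [base, shot, final]
passes = layered_prompts(
    subject="A grizzled detective",
    action="trudges wearily through the downpour",
    context="in a narrow Tokyo alley on a rainy night",
    motion="tracking shot",
    framing="medium shot",  # illustrative choice
    style="black and white film noir",
    audio="heavy rain, distant traffic",  # illustrative wording
)
for number, text in enumerate(passes, start=1):
    print(f"Pass {number}: {text}")
```
Generating after each pass makes it easier to tell which layer caused a change you did not want.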

00:07:00.800 --> 00:07:02.500
That makes a lot of sense. And you can use other

00:07:02.500 --> 00:07:04.120
tools to help, right? You don't have to invent

00:07:04.120 --> 00:07:06.339
the perfect prompt from scratch. Definitely not.

00:07:06.740 --> 00:07:08.899
Don't feel like you have to be a master screenwriter

00:07:08.899 --> 00:07:12.839
instantly. LLMs like ChatGPT can be amazing creative

00:07:12.839 --> 00:07:16.000
partners. Give it a basic idea. Ask it to flesh

00:07:16.000 --> 00:07:18.160
it out cinematically. Ask it to think about light,

00:07:18.639 --> 00:07:21.800
sound, emotion. Then take that richer description

00:07:21.800 --> 00:07:25.319
and refine it for your Veo prompt. Great brainstorming

00:07:25.319 --> 00:07:27.600
tool. And what about nailing a specific visual

00:07:27.600 --> 00:07:29.740
style? Sometimes words are hard for that. Totally.

00:07:30.120 --> 00:07:31.839
That's where image generators like Midjourney

00:07:31.839 --> 00:07:34.420
come in handy. You can generate still images

00:07:34.420 --> 00:07:36.959
first to really pin down the exact look you want,

00:07:37.120 --> 00:07:39.300
the lighting, the colors, the vibe. Once you

00:07:39.300 --> 00:07:41.899
have a still image you love, then describe that

00:07:41.899 --> 00:07:44.899
style very precisely in your Veo prompt. It's

00:07:44.899 --> 00:07:47.339
like visual prototyping. So for someone just

00:07:47.339 --> 00:07:49.980
diving into this iterative process with Veo, what's

00:07:49.980 --> 00:07:52.500
the main piece of advice? Don't chase perfection

00:07:52.500 --> 00:07:55.000
right away. Build it up. Refine it in layers,

00:07:55.699 --> 00:07:58.220
step by step. OK, so it's powerful, but obviously

00:07:58.220 --> 00:08:01.110
not perfect yet. What are some common mistakes

00:08:01.110 --> 00:08:03.449
or pitfalls people should watch out for? Well,

00:08:03.589 --> 00:08:05.889
we've hit on vague prompts already. That's number

00:08:05.889 --> 00:08:08.649
one. Forgetting the audio instructions is another

00:08:08.649 --> 00:08:11.470
big one, given Veo's strength there. Forgetting

00:08:11.470 --> 00:08:13.930
"no subtitles." Trying to cram complex dialogue

00:08:13.930 --> 00:08:16.790
into an eight-second clip rarely works well.

00:08:17.250 --> 00:08:19.410
And just completely ignoring camera movement.

00:08:19.410 --> 00:08:22.589
That makes videos feel very static and, well,

00:08:22.730 --> 00:08:24.610
AI-generated. And there are technical limits

00:08:24.610 --> 00:08:27.089
too right now. Yeah, definitely. That eight-second

00:08:27.089 --> 00:08:29.290
clip length is one. It's great for shorts, but

00:08:29.290 --> 00:08:32.149
not for long scenes yet. It outputs in HD, which

00:08:32.149 --> 00:08:35.340
is good. But processing times can vary. Sometimes

00:08:35.340 --> 00:08:37.899
it's quick, sometimes you wait a bit. And yeah,

00:08:37.899 --> 00:08:40.240
you need that Google AI Pro or Ultra subscription

00:08:40.240 --> 00:08:43.120
to access it. And this tech, like a lot of AI,

00:08:43.279 --> 00:08:45.820
brings up some pretty significant ethical questions.

00:08:46.159 --> 00:08:48.440
Huge ones. The ability to make realistic videos

00:08:48.440 --> 00:08:51.659
easily? Well, it opens the door to misuse. Deepfakes

00:08:51.659 --> 00:08:53.860
are a major concern, creating fake videos

00:08:53.860 --> 00:08:56.159
of people saying or doing things they never did.

00:08:56.639 --> 00:08:59.019
Misinformation potential is high. And copyright

00:08:59.019 --> 00:09:01.740
is this massive gray area. The AI trains on vast

00:09:01.740 --> 00:09:04.740
amounts of data. Was that data copyrighted? And

00:09:04.740 --> 00:09:06.740
who owns the video you create? Is it fully yours?

00:09:07.200 --> 00:09:09.240
Or does Google have some claim because their

00:09:09.240 --> 00:09:11.360
AI made it? These are big legal questions right

00:09:11.360 --> 00:09:13.460
now. What about the impact on creative jobs,

00:09:13.799 --> 00:09:16.120
stock video creators, animators? It's a valid

00:09:16.120 --> 00:09:18.440
concern for sure. There will likely be disruption.

00:09:19.100 --> 00:09:22.419
But the thinking is this will probably evolve

00:09:22.419 --> 00:09:25.100
into more of an assistive tool, letting creators

00:09:25.100 --> 00:09:27.980
work faster, maybe handle the more tedious parts

00:09:27.980 --> 00:09:30.299
and focus their energy on the higher level storytelling,

00:09:30.379 --> 00:09:34.019
the core ideas. Veo isn't really the end of human

00:09:34.019 --> 00:09:35.779
creativity. It feels more like a new chapter,

00:09:35.899 --> 00:09:40.899
you know? Human-machine collaboration. Just

00:09:40.899 --> 00:09:43.139
imagine, though, scaling this up. Imagine generating

00:09:43.139 --> 00:09:45.919
entire feature films just from detailed text.

00:09:46.080 --> 00:09:49.139
The sheer potential for individual creators to

00:09:49.139 --> 00:09:51.620
bring massive stories to life, bypassing all

00:09:51.620 --> 00:09:53.179
the usual production barriers. That's kind of

00:09:53.179 --> 00:09:54.480
mind-boggling to think about. We're not there

00:09:54.480 --> 00:09:57.759
yet, but... Wow. When you look at all the power,

00:09:57.919 --> 00:10:00.120
the ethics, the potential, what's a core message

00:10:00.120 --> 00:10:02.820
about how AI, like Veo, is changing creativity?

00:10:02.980 --> 00:10:05.279
I think it's fundamentally a tool for collaboration.

00:10:05.580 --> 00:10:08.240
It's about augmenting and enhancing what humans

00:10:08.240 --> 00:10:11.120
can do creatively, not just replacing them. So

00:10:11.120 --> 00:10:14.440
stepping back to see the big picture here...

00:10:14.570 --> 00:10:17.309
Google Veo feels like it's genuinely transforming

00:10:17.309 --> 00:10:19.990
visual storytelling. It's more than just a neat

00:10:19.990 --> 00:10:22.710
piece of tech. It feels like a statement about

00:10:22.710 --> 00:10:24.409
where the creative industry might be heading,

00:10:24.730 --> 00:10:27.490
democratizing things. Absolutely. That ability

00:10:27.490 --> 00:10:29.429
to go straight from language, from an idea in

00:10:29.429 --> 00:10:33.269
your head, to a complete audiovisual piece, that

00:10:33.269 --> 00:10:36.610
lowers the barrier to entry massively. It really

00:10:36.610 --> 00:10:39.070
does democratize video production in a way. And

00:10:39.070 --> 00:10:41.690
that golden formula, that seven-step prompt structure,

00:10:42.169 --> 00:10:43.850
that's your mental toolkit for making it work.

00:10:43.970 --> 00:10:46.639
It's like the key. So the skill isn't necessarily

00:10:46.639 --> 00:10:48.820
about having the fanciest camera anymore. It's

00:10:48.820 --> 00:10:51.039
shifting towards thinking like a director, writing

00:10:51.039 --> 00:10:53.240
like a screenwriter, really envisioning the final

00:10:53.240 --> 00:10:55.779
scene. Exactly. Start simple. Play around with

00:10:55.779 --> 00:10:58.620
that seven-element formula. And honestly, don't

00:10:58.620 --> 00:11:00.899
be afraid to mix things up. Combine unexpected

00:11:00.899 --> 00:11:03.600
elements. Sometimes the weird combinations lead

00:11:03.600 --> 00:11:05.919
to the coolest results. Yeah. Like you said,

00:11:06.000 --> 00:11:08.299
a film noir scene, but with cartoon characters.

00:11:08.580 --> 00:11:10.759
Or a nature documentary shot, like a sci-fi

00:11:10.759 --> 00:11:12.820
epic. That's where you can really push the AI

00:11:12.820 --> 00:11:15.620
into interesting territory. The AI video revolution

00:11:15.620 --> 00:11:18.399
is definitely here. It's happening now. And hopefully,

00:11:18.659 --> 00:11:20.480
with what we've talked about, you feel a bit

00:11:20.480 --> 00:11:23.659
more equipped to jump in and be part of it. Go

00:11:23.659 --> 00:11:25.879
make something cool. We really hope this deep

00:11:25.879 --> 00:11:28.139
dive gave you some useful insights, some valuable

00:11:28.139 --> 00:11:30.440
nuggets to think about. Until next time, keep

00:11:30.440 --> 00:11:32.039
exploring. [Outro music]
