WEBVTT

00:00:00.000 --> 00:00:02.000
I was reading this guide this morning. It's a

00:00:02.000 --> 00:00:04.160
director's manual for AI filmmaking, just out

00:00:04.160 --> 00:00:08.019
this week, January 2026. And it opens with this

00:00:08.019 --> 00:00:11.599
analogy that, honestly, made me just put my

00:00:11.599 --> 00:00:13.500
tablet down. I think I know exactly which one

00:00:13.500 --> 00:00:15.980
you mean. The genius child. The genius child.

00:00:16.199 --> 00:00:19.059
Such a powerful image. The whole idea is that

00:00:19.059 --> 00:00:21.980
AI isn't a robot. It's not really a tool. It's

00:00:21.980 --> 00:00:24.500
a prodigy. It has this infinite imagination,

00:00:25.280 --> 00:00:28.550
but absolutely no discipline. Right. Zero impulse

00:00:28.550 --> 00:00:30.530
control. You leave it alone in a room without

00:00:30.530 --> 00:00:32.490
clear instructions, and it just, you know, draws

00:00:32.490 --> 00:00:35.750
on the walls. It makes total chaos. Welcome to

00:00:35.750 --> 00:00:37.649
the Deep Dive. Today we're exploring the state

00:00:37.649 --> 00:00:40.869
of cinematic AI filmmaking. It's early 2026, and

00:00:40.869 --> 00:00:42.570
we're trying to figure out why so much of the

00:00:42.570 --> 00:00:45.969
AI video we all see still feels a little off.

00:00:46.109 --> 00:00:48.350
Yeah, a little floaty. Exactly. We have these incredible

00:00:48.350 --> 00:00:51.810
tools now, Kling AI, Google Veo 3, but the output

00:00:51.810 --> 00:00:54.890
is so often this, like, shimmering, morphing mess.

00:00:54.890 --> 00:00:57.359
And then every once in a while you see a clip

00:00:57.359 --> 00:00:58.840
that looks like it was shot on a professional

00:00:58.840 --> 00:01:00.859
camera. Yeah, it has real weight to it. It has

00:01:00.859 --> 00:01:03.299
intention. And the premise we're unpacking today

00:01:03.299 --> 00:01:06.079
is that the difference isn't the software. It's

00:01:06.079 --> 00:01:08.840
the direction. So we're going to dig into the

00:01:08.840 --> 00:01:10.519
craft. We'll talk about the physics of lenses,

00:01:11.260 --> 00:01:14.480
the virtual actor workflow and the tool landscape

00:01:14.480 --> 00:01:16.760
right now in 2026. We're going to

00:01:16.760 --> 00:01:20.280
go from just typing "a cool robot" into a box to

00:01:20.280 --> 00:01:22.579
actually directing a scene that feels, you know,

00:01:22.579 --> 00:01:25.109
human. Let's start with the very first principle

00:01:25.109 --> 00:01:28.329
in this guide. It's something that feels completely

00:01:28.329 --> 00:01:31.250
counterintuitive for video. The guide says if

00:01:31.250 --> 00:01:33.390
you want your video to look professional, the

00:01:33.390 --> 00:01:35.969
first thing to do is stop moving the camera.

00:01:36.109 --> 00:01:38.549
The power of stillness. But why? I mean, we're

00:01:38.549 --> 00:01:40.250
making moving pictures. Why would I lock the

00:01:40.250 --> 00:01:42.430
camera down? It is the absolute hardest thing

00:01:42.430 --> 00:01:44.930
for beginners to accept. You pay your subscription

00:01:44.930 --> 00:01:47.810
to Kling or Veo, and your first instinct is,

00:01:47.989 --> 00:01:50.489
I paid for motion. I want motion. I want the

00:01:50.489 --> 00:01:52.730
camera flying through a city. More is always

00:01:52.730 --> 00:01:54.829
more. But it's usually just too much. It gives

00:01:54.829 --> 00:01:57.310
you that kind of seasick feeling. If you think

00:01:57.310 --> 00:01:59.650
about the greatest films, like No Country for

00:01:59.650 --> 00:02:02.650
Old Men or The Godfather, the most powerful shots

00:02:02.650 --> 00:02:04.769
are often totally locked off. The camera's on

00:02:04.769 --> 00:02:07.170
a tripod. It doesn't move at all. Because it

00:02:07.170 --> 00:02:09.669
forces you to actually look at the subject. Precisely.

00:02:10.120 --> 00:02:13.479
The guide uses this beautiful example of a portrait

00:02:13.479 --> 00:02:16.680
of an old fisherman. If you tell an AI model,

00:02:16.939 --> 00:02:19.099
"Make a cool video of a fisherman," it's probably

00:02:19.099 --> 00:02:21.039
going to have him waving, or the camera will

00:02:21.039 --> 00:02:23.500
do this crazy swoop around his head. It looks

00:02:23.500 --> 00:02:26.180
like a video game. It feels weightless. Right.

00:02:26.560 --> 00:02:29.180
But the professional approach is to lock the

00:02:29.180 --> 00:02:32.060
camera. You specify a 50-millimeter lens, maybe

00:02:32.060 --> 00:02:35.219
an f/1.8 aperture to get that nice, blurry background.

00:02:35.560 --> 00:02:39.199
And you focus on the micro-movements. What do

00:02:39.199 --> 00:02:41.340
we mean by micro-movements, exactly? It's the

00:02:41.340 --> 00:02:42.979
little stuff that proves something is alive.

00:02:43.360 --> 00:02:46.319
The smoke curling up from his pipe, the slow

00:02:46.319 --> 00:02:49.159
heavy blink of his eyes, the mist floating in

00:02:49.159 --> 00:02:51.199
the harbor behind him. You see the texture of

00:02:51.199 --> 00:02:53.620
the wrinkles on his face. When the camera is

00:02:53.620 --> 00:02:55.620
still, the viewer isn't distracted by a

00:02:55.620 --> 00:02:57.960
big whoosh. They have to connect with the person.

00:02:58.159 --> 00:03:01.539
So why do we equate motion with quality when

00:03:01.539 --> 00:03:04.479
stillness often holds the emotion? Because stillness

00:03:04.479 --> 00:03:06.879
forces us to confront the character's humanity.

00:03:07.099 --> 00:03:11.009
Okay, so stillness is the foundation. But movies

00:03:11.009 --> 00:03:13.629
do move, obviously. So the guide moves on to

00:03:13.629 --> 00:03:15.550
directing the eye, and it brings up a technique

00:03:15.550 --> 00:03:18.430
called rack focus. I know what that looks like,

00:03:18.430 --> 00:03:20.969
but how do you translate that to a prompt? So

00:03:20.969 --> 00:03:23.009
rack focus is basically the director pointing

00:03:23.009 --> 00:03:25.289
and saying, "Look here. OK, now look there." You're

00:03:25.289 --> 00:03:27.469
just shifting the sharpness from the foreground

00:03:27.469 --> 00:03:30.550
to the background. And in AI, that's a huge flex,

00:03:30.569 --> 00:03:33.330
because it implies real 3D depth. Can you give

00:03:33.330 --> 00:03:35.789
us the visual from the guide? Yeah. So picture

00:03:35.789 --> 00:03:40.599
a jazz bar. Moody, kind of a noir vibe. In the

00:03:40.599 --> 00:03:42.620
front, you've got a glass of whiskey. The ice

00:03:42.620 --> 00:03:44.800
is melting. It's perfectly sharp. In the background,

00:03:44.860 --> 00:03:47.080
all blurry, there's a woman in a red dress looking

00:03:47.080 --> 00:03:49.719
out a window. A classic noir setup. Total classic.

00:03:50.099 --> 00:03:52.319
But the movement isn't the camera flying at her.

00:03:52.460 --> 00:03:54.840
The movement is the lens itself changing focus.

00:03:55.360 --> 00:03:58.520
You tell Kling AI, "Rack focus to background."

00:03:58.960 --> 00:04:01.120
The glass goes blurry, and suddenly the woman

00:04:01.120 --> 00:04:03.680
is sharp. It creates a story with no dialogue

00:04:03.680 --> 00:04:07.319
at all. Here's the drink, and there she is, leaving.

00:04:07.710 --> 00:04:10.250
So is the camera merely recording or is it acting

00:04:10.250 --> 00:04:12.689
as a narrator? It's a narrator physically pointing

00:04:12.689 --> 00:04:16.170
at what matters most. This idea of control brings

00:04:16.170 --> 00:04:18.709
us to what might be the biggest frustration with

00:04:18.709 --> 00:04:21.370
AI video. I think everyone listening has felt

00:04:21.370 --> 00:04:24.889
this. You generate a character, say a cyberpunk

00:04:24.889 --> 00:04:27.509
mercenary. She looks great in shot one. You generate

00:04:27.509 --> 00:04:29.949
shot two and suddenly she has a different nose

00:04:29.949 --> 00:04:32.370
or her hair is shorter. Oh, the identity drift.

00:04:32.370 --> 00:04:34.550
It just breaks the illusion instantly. Right.

00:04:34.709 --> 00:04:37.899
The face changes and the audience is gone. The

00:04:37.899 --> 00:04:39.500
guide says we've been doing this all backward.

00:04:39.980 --> 00:04:41.879
We usually try to generate the character inside

00:04:41.879 --> 00:04:44.100
the video tool. Which is a huge mistake. The

00:04:44.100 --> 00:04:46.160
video generator is already trying to calculate

00:04:46.160 --> 00:04:50.199
motion and lighting and physics. It's just too

00:04:50.199 --> 00:04:52.480
much for the genius child to handle. You need

00:04:52.480 --> 00:04:54.939
a virtual actor. And this is where that specific

00:04:54.939 --> 00:04:58.360
tool comes in: Nano Banana. I know the names in

00:04:58.360 --> 00:05:01.259
2026 are just ridiculous, but Nano Banana is kind

00:05:01.259 --> 00:05:03.540
of the industry standard right now, for good reason.

00:05:03.540 --> 00:05:05.939
It just, yeah, it listens better. The strategy

00:05:05.939 --> 00:05:08.319
is you don't start with a video. You start with

00:05:08.319 --> 00:05:11.180
a character sheet, like a casting photo. Exactly,

00:05:11.180 --> 00:05:14.220
like a casting photo. You go into Nano Banana, not

00:05:14.220 --> 00:05:18.139
Kling, not Veo, and you create one master image.

00:05:18.139 --> 00:05:20.660
Front view, full body. Let's stick with that cyberpunk

00:05:20.660 --> 00:05:24.759
mercenary: silver hair, purple streaks, a jacket

00:05:24.759 --> 00:05:27.480
with neon circuits. You get that image perfect.

00:05:27.639 --> 00:05:29.240
And then you take that image somewhere else.

00:05:29.420 --> 00:05:31.899
Then and only then do you go to your video tool.

00:05:32.060 --> 00:05:33.980
You upload that image and you say, this is my

00:05:33.980 --> 00:05:36.319
reference. Do not change her. You're anchoring

00:05:36.319 --> 00:05:40.079
her identity. Does creating a static soul for

00:05:40.079 --> 00:05:42.939
the character make the AI-generated movement

00:05:42.939 --> 00:05:46.199
more believable? Yes, because consistency creates

00:05:46.199 --> 00:05:49.220
the illusion of a continuous life. OK, character

00:05:49.220 --> 00:05:52.279
locked. Focus directed. Now we actually need

00:05:52.279 --> 00:05:54.639
to move the camera. The guide makes this huge

00:05:54.639 --> 00:05:57.540
distinction between zoom and dolly. They might

00:05:57.540 --> 00:05:59.439
sound the same, but they feel completely different,

00:05:59.660 --> 00:06:01.060
don't they? Oh, completely different physics.

00:06:01.279 --> 00:06:03.439
A zoom is just optical. You're making the image

00:06:03.439 --> 00:06:05.699
bigger. It kind of flattens everything. A dolly

00:06:05.699 --> 00:06:07.720
is when you physically move the camera body through

00:06:07.720 --> 00:06:09.379
the space. So you're actually walking forward.

00:06:09.519 --> 00:06:11.519
You're walking forward. And because the camera

00:06:11.519 --> 00:06:14.480
itself is moving, the relationship between the

00:06:14.480 --> 00:06:16.860
foreground and background changes. That's what

00:06:16.860 --> 00:06:19.540
our brains read as real. The guide also mentions

00:06:19.560 --> 00:06:22.500
the Vertigo effect, the dolly zoom, that classic

00:06:22.500 --> 00:06:25.600
Hitchcock shot. The Jaws shot. It's when you

00:06:25.600 --> 00:06:28.139
dolly the camera backwards while you zoom the

00:06:28.139 --> 00:06:31.420
lens in. That sounds incredibly complicated.

00:06:31.680 --> 00:06:34.259
It looks like a headache. It creates this warping

00:06:34.259 --> 00:06:36.160
effect. If you do it on a character in an alley,

00:06:36.220 --> 00:06:38.339
they stay the same size, but the background behind

00:06:38.339 --> 00:06:40.639
them just stretches and gets all weird. It's

00:06:40.639 --> 00:06:43.160
instant anxiety. How does the physics of the

00:06:43.160 --> 00:06:45.839
camera lens manipulate the viewer's psychological

00:06:45.839 --> 00:06:48.319
state? It changes our spatial relationship to

00:06:48.319 --> 00:06:51.939
the subject, creating unease or closeness. It's

00:06:51.939 --> 00:06:54.579
just wild that we have to explain these physical

00:06:54.579 --> 00:06:57.480
ideas to a digital brain. We have to simulate

00:06:57.480 --> 00:06:59.899
the glass. We do. And that also applies to moving

00:06:59.899 --> 00:07:02.439
sideways. The guide brings up pan versus truck.

00:07:02.620 --> 00:07:04.839
Trucking, another one of those industrial-sounding

00:07:04.839 --> 00:07:07.379
terms. Well, it comes from putting the camera

00:07:07.379 --> 00:07:10.839
on a truck or a track. A pan is just standing

00:07:10.839 --> 00:07:14.079
still and turning your head left and right. The

00:07:14.079 --> 00:07:16.779
problem is, in AI video, that often makes the

00:07:16.779 --> 00:07:18.959
background look like flat wallpaper. Right, there's

00:07:18.959 --> 00:07:22.240
no depth. And trucking fixes that. Trucking is

00:07:22.240 --> 00:07:25.430
sliding the camera sideways. Imagine you're running

00:07:25.430 --> 00:07:27.750
right alongside a horse. Okay, I'm thinking of

00:07:27.750 --> 00:07:30.129
the example of the samurai riding through a

00:07:30.129 --> 00:07:33.089
bamboo forest. Perfect example. If you pan that

00:07:33.089 --> 00:07:35.709
scene, it looks flat. But if you truck right,

00:07:35.949 --> 00:07:38.490
moving parallel to the horse, you get the parallax

00:07:38.490 --> 00:07:40.990
effect. Parallax. Can you break that down? It's

00:07:40.990 --> 00:07:43.250
simple physics, really. The things that are close

00:07:43.250 --> 00:07:46.089
to the camera seem to move fast. So the bamboo

00:07:46.089 --> 00:07:48.230
trees in the foreground are just whizzing by,

00:07:48.389 --> 00:07:50.850
all blurred out. But the mountains way in the

00:07:50.850 --> 00:07:53.810
background are moving really slowly. That difference

00:07:53.810 --> 00:07:56.389
in speed tells your brain, OK, this is a 3D world.

00:07:56.410 --> 00:07:58.949
This is real. Why is the parallax effect the

00:07:58.949 --> 00:08:00.949
dividing line between amateur and professional

00:08:00.949 --> 00:08:03.470
visuals? It proves the scene exists in three

00:08:03.470 --> 00:08:05.860
dimensions, not just two. I want to touch on

00:08:05.860 --> 00:08:08.379
something that usually just destroys AI video,

00:08:08.579 --> 00:08:11.220
even if you're trucking correctly. Complex actions.

00:08:11.319 --> 00:08:13.199
We've all seen it. Someone tried to generate

00:08:13.199 --> 00:08:15.860
a guy jumping off a cliff and halfway down his

00:08:15.860 --> 00:08:18.079
legs turned into spaghetti. The hallucination

00:08:18.079 --> 00:08:21.860
phase, yeah. The AI just loses track of anatomy.

00:08:22.139 --> 00:08:24.379
So how do you direct something that complex?

00:08:24.670 --> 00:08:27.870
The guide suggests a multi-frame strategy. It's

00:08:27.870 --> 00:08:30.389
brilliant because it stops treating the AI like

00:08:30.389 --> 00:08:32.809
a magician and starts treating it like an animator.

00:08:33.269 --> 00:08:35.190
Instead of just saying "man jumps off cliff," which

00:08:35.190 --> 00:08:38.409
is way too big, you give it guardrails. A starting

00:08:38.409 --> 00:08:40.750
point and an end point. Exactly. You generate

00:08:40.750 --> 00:08:43.549
keyframe A, the man standing on the edge. Then

00:08:43.549 --> 00:08:46.909
you generate keyframe B, the man mid-dive, arms

00:08:46.909 --> 00:08:49.450
spread out, jacket flapping in the wind. You

00:08:49.450 --> 00:08:51.649
feed both of those images into the video generator.

00:08:51.950 --> 00:08:55.330
And the AI just... fills in the middle. It interpolates

00:08:55.330 --> 00:08:57.870
the path, but because it knows where it has to

00:08:57.870 --> 00:09:00.169
end up, it doesn't get lost. It calculates the

00:09:00.169 --> 00:09:02.330
gravity of the flapping jacket, all of it. It's

00:09:02.330 --> 00:09:04.269
connecting the dots instead of just guessing

00:09:04.269 --> 00:09:06.769
where to go. So are we directing the AI or are

00:09:06.769 --> 00:09:09.909
we simply setting boundaries to contain its hallucinations?

00:09:10.309 --> 00:09:12.470
We are building guardrails so its imagination

00:09:12.470 --> 00:09:14.610
doesn't break physics. Speaking of boundaries,

00:09:14.990 --> 00:09:17.649
there's one more glitch in the guide that I found

00:09:17.649 --> 00:09:23.000
both hilarious and kind of terrifying. The second

00:09:23.000 --> 00:09:26.000
face problem. Oh, the nightmare fuel. When the

00:09:26.000 --> 00:09:28.000
camera does an orbit shot, it circles around

00:09:28.000 --> 00:09:30.299
a character, and when it gets to their back,

00:09:31.539 --> 00:09:33.120
there's another face on the back of their head.

00:09:33.299 --> 00:09:35.799
It happens because the AI gets confused. It just

00:09:35.799 --> 00:09:38.440
knows character equals face. So when it runs

00:09:38.440 --> 00:09:40.679
out of face to show, it panics and just puts

00:09:40.679 --> 00:09:42.500
another one on the back of their skull. How on

00:09:42.500 --> 00:09:45.549
earth do you stop that? You need landmarks. You

00:09:45.549 --> 00:09:48.190
have to give the AI something specific to look

00:09:48.190 --> 00:09:50.750
at on the character's back. The example is a

00:09:50.750 --> 00:09:52.789
wizard on a mountain. If you just say, orbit

00:09:52.789 --> 00:09:55.789
the wizard, you might get the second face. But

00:09:55.789 --> 00:09:58.350
if you specify, the wizard wears a cape with

00:09:58.350 --> 00:10:01.450
a golden dragon emblem on the back, now the AI

00:10:01.450 --> 00:10:04.120
has a mission. It has a target. Right. It thinks,

00:10:04.299 --> 00:10:06.419
OK, I'm circling. I need to find that dragon

00:10:06.419 --> 00:10:09.139
emblem. It anchors the whole image on that object,

00:10:09.360 --> 00:10:11.659
not the anatomy. That's such a clever workaround.

00:10:11.740 --> 00:10:13.600
You're just guiding its attention so it doesn't

00:10:13.600 --> 00:10:15.960
have to guess. I want to take a quick pause here.

00:10:16.559 --> 00:10:18.779
We've been talking about the how, the trucking,

00:10:18.879 --> 00:10:21.200
the landmarks, the static shots. But in a moment,

00:10:21.480 --> 00:10:24.700
I want to shift to the what. The actual tools

00:10:24.700 --> 00:10:27.220
available to us in 2026, because the landscape

00:10:27.220 --> 00:10:30.039
has really changed. We're back. We're deep diving

00:10:30.039 --> 00:10:33.460
into cinematic AI filmmaking in January 2026.

00:10:33.879 --> 00:10:35.720
We've covered the techniques, but let's get practical.

00:10:36.259 --> 00:10:38.399
There are a million tools out there. Subscription

00:10:38.399 --> 00:10:40.480
fatigue is a real thing. I don't want to pay

00:10:40.480 --> 00:10:42.740
for five different services if I don't have to.

00:10:42.860 --> 00:10:45.620
It's very real. And the guide is pretty blunt

00:10:45.620 --> 00:10:49.600
about this. The era of the generalist AI is kind

00:10:49.600 --> 00:10:52.039
of fading. So the "one app to do it all" dream

00:10:52.039 --> 00:10:54.720
is over. For high-end work, yeah. We have a specialized

00:10:54.720 --> 00:10:57.789
tool chain now. Just like a real film crew has

00:10:57.789 --> 00:10:59.389
different departments, you need different AIs

00:10:59.389 --> 00:11:02.090
for different jobs. The guide breaks down four

00:11:02.090 --> 00:11:04.409
main players. Okay, let's run through them. We've

00:11:04.409 --> 00:11:06.110
already mentioned the first one. Nano Banana.

00:11:06.379 --> 00:11:08.720
The foundation. It's on the OpenArt platform.

00:11:08.940 --> 00:11:12.320
This is not for making video. It is purely for

00:11:12.320 --> 00:11:15.279
creating your source images, your actors. Its

00:11:15.279 --> 00:11:18.220
whole superpower is realistic skin and just listening

00:11:18.220 --> 00:11:21.000
to your prompt. If you skip this step, your virtual

00:11:21.000 --> 00:11:23.360
actor will look like plastic. Garbage in, garbage

00:11:23.360 --> 00:11:26.320
out. Exactly. Okay, next up is the budget king,

00:11:26.759 --> 00:11:30.539
Google Veo 3. Veo 3. That's the landscape engine.

00:11:30.860 --> 00:11:33.779
Yeah, it's amazing for B-roll. 4K landscapes,

00:11:34.139 --> 00:11:36.940
drone shots, establishing shots of forests. It

00:11:36.940 --> 00:11:39.059
understands physics really well, and, you know,

00:11:39.059 --> 00:11:41.379
it's usually free or very low cost. Right. But

00:11:41.379 --> 00:11:43.580
it struggles with complex human acting. It's

00:11:43.580 --> 00:11:45.299
a little stiff, so you use it for the world,

00:11:45.379 --> 00:11:47.019
not for the people in it. OK, so for the people,

00:11:47.039 --> 00:11:49.320
for the acting, where do we go? You go to the

00:11:49.320 --> 00:11:51.720
storyteller, Kling AI. This is the paid tool,

00:11:51.980 --> 00:11:53.639
the heavy hitter. It's the king of movement.

00:11:53.899 --> 00:11:56.960
When you type truck right or dolly in, Kling

00:11:56.960 --> 00:11:58.840
actually knows what those words mean physically.

00:11:59.100 --> 00:12:01.259
It's for the serious filmmaker. And the last

00:12:01.259 --> 00:12:05.539
one sounds a bit chaotic. Yeah. Seedance 1.5 Pro.

00:12:05.860 --> 00:12:08.860
The action star. This thing is huge on TikTok

00:12:08.860 --> 00:12:12.279
for anime styles. If you want explosions, magic

00:12:12.279 --> 00:12:15.600
spells, fast cuts, wild stuff, Seance is your

00:12:15.600 --> 00:12:18.330
tool. It's just pure high energy. But you probably

00:12:18.330 --> 00:12:20.309
wouldn't use it for our old fisherman portrait.

00:12:20.509 --> 00:12:22.950
Oh, God, no. If you ask Seedance for a fisherman

00:12:22.950 --> 00:12:24.909
smoking a pipe, the pipe would probably turn

00:12:24.909 --> 00:12:27.409
into a dragon and just fly away. It makes ADHD

00:12:27.409 --> 00:12:29.450
video. It's too caffeinated for real emotion.

00:12:29.870 --> 00:12:32.669
So does the specialization of these tools suggest

00:12:32.669 --> 00:12:36.309
we're moving away from all-in-one AI models?

00:12:36.750 --> 00:12:38.909
Absolutely. Specialized tools are now required

00:12:38.909 --> 00:12:42.009
for specific cinematic tasks. That brings us

00:12:42.009 --> 00:12:44.809
to the final lesson from the guide, the golden

00:12:44.809 --> 00:12:47.360
rule. We touched on it. But it's really about

00:12:47.360 --> 00:12:49.840
the language we use. The megaprompt. Right. The

00:12:49.840 --> 00:12:52.720
days of typing "a girl walking" are just over.

00:12:52.879 --> 00:12:54.860
That gets you nowhere now. If you type a girl

00:12:54.860 --> 00:12:57.320
walking, you get the average of every video on

00:12:57.320 --> 00:13:00.340
the internet, which is boring. The guide gives

00:13:00.340 --> 00:13:03.139
us a specific formula. The prompt is now your

00:13:03.139 --> 00:13:05.039
script, your set design, your lighting crew,

00:13:05.179 --> 00:13:07.159
all of it. Now what's the formula? It's movement

00:13:07.159 --> 00:13:09.679
type plus a detailed subject plus the environment

00:13:09.679 --> 00:13:12.320
plus camera gear plus lighting. Give us an example.

00:13:12.440 --> 00:13:14.259
The bad version versus the good version. OK.

00:13:14.570 --> 00:13:17.789
Bad version: "Cyberpunk girl walking in rain."

00:13:17.909 --> 00:13:20.669
And that gets you a generic, kind of boring cartoon.

00:13:20.929 --> 00:13:23.669
Now, the good version. Tracking shot. Camera

00:13:23.669 --> 00:13:26.970
trucks right. A cyberpunk girl with neon glowing

00:13:26.970 --> 00:13:29.809
skin walking through a rainy alleyway. Reflections

00:13:29.809 --> 00:13:33.289
on wet pavement. Transparent plastic coat. Shot

00:13:33.289 --> 00:13:37.549
on Sony Venice 2. 35mm prime lens. f/1.4 aperture.

00:13:38.029 --> 00:13:40.960
Teal and orange palette. Wow. You're literally

00:13:40.960 --> 00:13:43.500
specifying the aperture, f/1.4. You have to.

00:13:43.840 --> 00:13:47.759
Telling the AI f/1.4 is code for: I want a really

00:13:47.759 --> 00:13:49.779
blurry background and a sharp subject. If you

00:13:49.779 --> 00:13:52.200
don't say that, it defaults to a flat image where

00:13:52.200 --> 00:13:54.539
everything is in focus, like a cheap TV show.

00:13:54.919 --> 00:13:57.100
AI always guesses boring unless you force it

00:13:57.100 --> 00:13:59.039
to be interesting. Is the role of the director

00:13:59.039 --> 00:14:01.840
shifting from managing people to managing language?

00:14:01.940 --> 00:14:04.740
Yes. The prompt is now the script, the set, and

00:14:04.740 --> 00:14:06.860
the crew. I still kind of wrestle with that idea

00:14:06.860 --> 00:14:08.940
myself, that the words themselves are the lens.

00:14:09.610 --> 00:14:11.110
Let's bring this all back. We started with the

00:14:11.110 --> 00:14:13.350
genius child. We moved through stillness, depth,

00:14:13.669 --> 00:14:15.789
consistency. What's the big takeaway for someone

00:14:15.789 --> 00:14:17.789
who's about to open their laptop and try this?

00:14:18.070 --> 00:14:20.450
The big idea is just intentionality. Stop letting

00:14:20.450 --> 00:14:23.950
the AI drive. There is no "make cool movie" button.

00:14:24.269 --> 00:14:27.429
One, start with a static shot. Master stillness

00:14:27.429 --> 00:14:31.269
first. Two, use physical camera terms like truck

00:14:31.269 --> 00:14:33.950
and dolly to create that 3D space. And three,

00:14:34.570 --> 00:14:36.389
lock your character's identity with a reference

00:14:36.389 --> 00:14:39.679
image. It's really about craftsmanship. Even

00:14:39.679 --> 00:14:42.360
in this automated world, craft still matters.

00:14:42.679 --> 00:14:45.480
Maybe more than ever. Because when everyone can

00:14:45.480 --> 00:14:47.059
generate a video, the only thing that's going

00:14:47.059 --> 00:14:49.299
to stand out is taste. Can you tell a good story?

00:14:49.419 --> 00:14:51.600
Taste is the differentiator. I like that. And

00:14:51.600 --> 00:14:53.200
so here's my challenge for everyone listening.

00:14:53.740 --> 00:14:56.220
Open Nano Banana. Open whatever tool you use.

00:14:56.379 --> 00:14:58.559
Don't try to make an epic battle. Try to make

00:14:58.559 --> 00:15:00.659
that fisherman. Try to make a portrait that's

00:15:00.659 --> 00:15:03.139
perfectly still, except for one tiny thing, a

00:15:03.139 --> 00:15:05.779
puff of smoke, a blink. Just master the stillness

00:15:05.779 --> 00:15:07.840
first. That's a great place to start. And I'll

00:15:07.840 --> 00:15:09.600
add one final thought from the guide that I found

00:15:09.600 --> 00:15:12.059
really comforting. It said, don't feel bad if

00:15:12.059 --> 00:15:14.659
your first videos are not perfect. Amen to that.

00:15:14.840 --> 00:15:18.139
It's 2026. This tech is moving so fast, but it

00:15:18.139 --> 00:15:20.480
is still an art form. And you have to be vulnerable

00:15:20.480 --> 00:15:23.000
enough to make bad art before you can make good

00:15:23.000 --> 00:15:25.549
art. The first draft of anything is garbage.

00:15:25.710 --> 00:15:28.529
Just keep prompting. Thanks for diving in with

00:15:28.529 --> 00:15:30.990
us today. Go direct some masterpieces. We'll

00:15:30.990 --> 00:15:31.490
see you in the next one.