WEBVTT

00:00:00.000 --> 00:00:02.100
Not long ago, a cinematic short film demanded

00:00:02.100 --> 00:00:04.960
a director. It needed a camera crew and a massive

00:00:04.960 --> 00:00:07.160
budget. You needed complex lighting knowledge

00:00:07.160 --> 00:00:10.199
for it to look acceptable. In 2026, it requires

00:00:10.199 --> 00:00:12.380
a standard laptop. You just need a solid prompt

00:00:12.380 --> 00:00:15.339
and some basic patience. And you need about 30

00:00:15.339 --> 00:00:18.789
minutes of focus time. Welcome to our deep dive

00:00:18.789 --> 00:00:20.809
today. We're unpacking a fascinating new workflow

00:00:20.809 --> 00:00:23.750
guide right now. It's called Mastering

00:00:23.750 --> 00:00:27.030
Text-to-Video AI in 2026. We're moving far past the

00:00:27.030 --> 00:00:29.170
early software hype today. We really want to

00:00:29.170 --> 00:00:30.890
unpack the actual daily production workflow.

00:00:30.989 --> 00:00:32.729
We're going to explore three core foundations

00:00:32.729 --> 00:00:35.750
of AI video. These foundations separate inconsistent

00:00:35.750 --> 00:00:38.829
outputs from true pro-level cinematic films.

00:00:39.390 --> 00:00:41.810
It's just a massive shift in creative power.

00:00:41.969 --> 00:00:44.609
Yeah, it really is a completely new paradigm

00:00:44.609 --> 00:00:48.250
for creators. We're seeing an absolute explosion

00:00:48.250 --> 00:00:51.250
in AI video lately. Creators are producing stunning

00:00:51.250 --> 00:00:54.130
short films every single day. They're making

00:00:54.130 --> 00:00:56.210
high end branded ads right from their bedrooms.

00:00:56.270 --> 00:00:58.109
They do all of this without a single physical

00:00:58.109 --> 00:01:01.429
camera. The creative landscape has shifted dramatically

00:01:01.429 --> 00:01:04.609
in 12 short months. But there's a brutal truth

00:01:04.609 --> 00:01:08.049
we actually must face here. Most beginners fail

00:01:08.049 --> 00:01:11.170
completely when they try these new tools. It

00:01:11.170 --> 00:01:12.709
isn't because the video software is actually

00:01:12.709 --> 00:01:15.530
bad. It's because they jump in without a structured

00:01:15.530 --> 00:01:17.450
system. They don't understand how the underlying

00:01:17.450 --> 00:01:19.409
models actually work. I actually have to make

00:01:19.409 --> 00:01:21.730
a vulnerable admission right here. I still wrestle

00:01:21.730 --> 00:01:23.430
with burning through credits myself sometimes.

00:01:23.609 --> 00:01:25.609
I just roll the dice and hope for one decent

00:01:25.609 --> 00:01:27.689
clip. It can be an incredibly frustrating creative

00:01:27.689 --> 00:01:30.069
process, you know, but that makes a lot of sense

00:01:30.069 --> 00:01:32.150
when you step back. These tools are incredibly

00:01:32.150 --> 00:01:35.219
advanced today. So why is our instinct to just

00:01:35.219 --> 00:01:37.099
type a basic sentence? Why does hitting generate

00:01:37.099 --> 00:01:39.780
actually set us up to fail completely? Well,

00:01:39.840 --> 00:01:42.200
because surface level prompting lacks any real

00:01:42.200 --> 00:01:44.859
structural control. You're basically asking a

00:01:44.859 --> 00:01:47.400
statistical model to guess absolutely everything.

00:01:47.700 --> 00:01:49.780
It has to guess the complex lighting and the

00:01:49.780 --> 00:01:52.280
camera lens. It guesses the focal length and

00:01:52.280 --> 00:01:54.640
the local physics. The compute burden is just

00:01:54.640 --> 00:01:56.920
incredibly high for the model. It simply cannot

00:01:56.920 --> 00:01:58.959
hold all those intricate variables perfectly

00:01:58.959 --> 00:02:02.060
together. When you type text, you open an infinite

00:02:02.060 --> 00:02:05.459
possibility space. The model traverses a massive

00:02:05.459 --> 00:02:08.479
high dimensional latent space very quickly. It

00:02:08.479 --> 00:02:11.120
tries to resolve random digital noise into clear

00:02:11.120 --> 00:02:13.900
shapes. Doing that across time in a video is

00:02:13.900 --> 00:02:16.520
incredibly unstable. You need a multi-step workflow

00:02:16.520 --> 00:02:19.060
to guide the model safely. You have to actively

00:02:19.060 --> 00:02:22.120
constrain that infinite possibility space. So

00:02:22.120 --> 00:02:24.900
bad results come from missing workflows, not

00:02:24.900 --> 00:02:27.120
from broken tools. Right. And that brings us

00:02:27.120 --> 00:02:29.180
to the first core foundation today. We really

00:02:29.180 --> 00:02:31.319
have to move beyond just typing simple text prompts.

00:02:31.520 --> 00:02:33.759
The guide outlines five distinct methods in the

00:02:33.759 --> 00:02:35.900
new workflow. You have to understand these five

00:02:35.900 --> 00:02:38.520
core approaches very deeply. Let's walk through

00:02:38.520 --> 00:02:40.780
those five specific methods right now. Method

00:02:40.780 --> 00:02:43.180
one is the basic text to video we just discussed.

00:02:43.939 --> 00:02:47.360
This is still highly useful for simple establishing

00:02:47.360 --> 00:02:49.900
shots. You use it when you just need a quick

00:02:49.900 --> 00:02:52.680
static landscape. Method two is called image

00:02:52.680 --> 00:02:54.780
to video in the current guide. This is where

00:02:54.780 --> 00:02:57.199
you start with a static picture first. You lock

00:02:57.199 --> 00:02:59.319
in the exact composition before you ever animate

00:02:59.319 --> 00:03:02.080
it. Exactly. And that eliminates half the guesswork

00:03:02.080 --> 00:03:04.780
for the model immediately. Generating video from

00:03:04.780 --> 00:03:07.500
pure text is kind of like sculpting thick smoke.

00:03:07.780 --> 00:03:09.919
You might shape a perfect face for one single

00:03:09.919 --> 00:03:12.300
second, but the temporal wind immediately blows

00:03:12.300 --> 00:03:14.819
those delicate features away. An image prompt

00:03:14.819 --> 00:03:17.120
acts like a sudden flash freeze. It turns that

00:03:17.120 --> 00:03:19.879
highly unstable smoke into solid digital ice.

00:03:20.139 --> 00:03:22.680
You establish the lighting and the exact framing

00:03:22.680 --> 00:03:25.419
up front. Then you just ask the AI to add the

00:03:25.419 --> 00:03:28.639
subtle motion. That brings us to method

00:03:28.639 --> 00:03:30.840
three in the guide today. Method three is elements

00:03:30.840 --> 00:03:33.340
to video or ingredients to video, as some call

00:03:33.340 --> 00:03:35.979
it. Method four focuses specifically on generating

00:03:35.979 --> 00:03:38.860
accurate lip sync data. You absolutely need this

00:03:38.860 --> 00:03:41.280
if your characters are speaking dialogue. And

00:03:41.280 --> 00:03:43.599
method five is known as video to video or motion

00:03:43.599 --> 00:03:46.219
transfer. That means using real video movement

00:03:46.219 --> 00:03:49.680
to puppet an AI generated character. When I look

00:03:49.680 --> 00:03:51.879
at all these different methods combined, it feels

00:03:51.879 --> 00:03:54.020
like stacking Lego blocks of data to build a

00:03:54.020 --> 00:03:56.110
scene. It's so much better than hoping a single

00:03:56.110 --> 00:03:59.069
prompt works. That Lego analogy is the absolute

00:03:59.069 --> 00:04:01.150
perfect way to view it. You're basically assembling

00:04:01.150 --> 00:04:03.270
a final product from highly discrete pieces.

00:04:03.490 --> 00:04:05.430
Each individual piece serves a highly specific

00:04:05.430 --> 00:04:08.349
visual function. You don't ask one block to build

00:04:08.349 --> 00:04:10.969
the whole entire house. If you do, the AI model

00:04:10.969 --> 00:04:13.490
is going to hallucinate wildly. Yeah. So I want

00:04:13.490 --> 00:04:15.110
to ask about method number three specifically.

00:04:15.409 --> 00:04:17.529
How does moving to something like elements to

00:04:17.529 --> 00:04:19.730
video fundamentally change things? How does it

00:04:19.730 --> 00:04:21.910
change the control we have over the final shot?

00:04:22.319 --> 00:04:24.620
Well, separating visual elements allows you to

00:04:24.620 --> 00:04:27.519
provide precise direction. You isolate the visual

00:04:27.519 --> 00:04:30.399
variables completely away from each other. If

00:04:30.399 --> 00:04:32.420
you generate a character and a background together,

00:04:32.560 --> 00:04:35.899
they bleed. The AI struggles to understand where

00:04:35.899 --> 00:04:38.540
the person actually ends. Right. It lacks true

00:04:38.540 --> 00:04:42.079
spatial awareness of the digital scene. It might

00:04:42.079 --> 00:04:44.240
just blend a green shirt into a green forest.

00:04:44.540 --> 00:04:47.319
By splitting them apart, you isolate the visual

00:04:47.319 --> 00:04:49.959
variables completely. You generate your human

00:04:49.959 --> 00:04:52.500
character on a solid green screen. You generate

00:04:52.500 --> 00:04:54.680
your highly detailed forest background entirely

00:04:54.680 --> 00:04:57.100
separately. Then you composite them together

00:04:57.100 --> 00:04:59.399
in a traditional video editor. Mixing separate

00:04:59.399 --> 00:05:01.740
ingredients gives us total control over the final

00:05:01.740 --> 00:05:04.680
visual recipe. That's exactly how the top professionals

00:05:04.680 --> 00:05:07.120
are working right now. But having those five

00:05:07.120 --> 00:05:09.339
methods is pretty much useless without visual

00:05:09.339 --> 00:05:11.699
consistency. It doesn't matter if you can move

00:05:11.699 --> 00:05:14.339
a character smoothly. If their face changes every

00:05:14.339 --> 00:05:17.019
time the camera cuts, you fail. The illusion

00:05:17.019 --> 00:05:19.439
of your cinematic short film is instantly broken.

00:05:19.800 --> 00:05:22.600
We call this frustrating phenomenon the cousin

00:05:22.600 --> 00:05:25.759
effect. Your hero suddenly looks like their own

00:05:25.759 --> 00:05:28.819
slightly weird cousin. This brings us directly

00:05:28.819 --> 00:05:31.660
to Foundation 2, which is the consistency playbook.

00:05:32.089 --> 00:05:33.850
Consistency is definitely where beginners usually

00:05:33.850 --> 00:05:36.449
lose their minds entirely. You get one truly

00:05:36.449 --> 00:05:39.050
great shot of your main hero, then the next shot

00:05:39.050 --> 00:05:40.990
looks like a completely different person. Yeah,

00:05:41.069 --> 00:05:43.870
it is incredibly jarring for the viewer. The

00:05:43.870 --> 00:05:46.610
guide gives us specific workflow steps to prevent

00:05:46.610 --> 00:05:49.389
these mistakes. First, you must always build

00:05:49.389 --> 00:05:51.589
from static images. You shouldn't ever

00:05:51.589 --> 00:05:54.629
start your complex workflow with a video. I guess

00:05:54.629 --> 00:05:56.610
because images are just incredibly cheap. They're

00:05:56.610 --> 00:05:58.649
very fast to generate compared to full video.

00:05:59.079 --> 00:06:01.740
Exactly. You can iterate on a face dozens of

00:06:01.740 --> 00:06:04.579
times really quickly. You can explore a vast

00:06:04.579 --> 00:06:07.680
latent space without burning your budget. Video

00:06:07.680 --> 00:06:10.019
generation takes longer and costs significantly

00:06:10.019 --> 00:06:12.420
more compute credits. You really want to lock

00:06:12.420 --> 00:06:14.240
in the still image before spending resources.

00:06:14.800 --> 00:06:17.980
Second, you use your AI generated output as the

00:06:17.980 --> 00:06:20.279
new reference point. You feed the good results

00:06:20.279 --> 00:06:22.779
back into the system continually. You essentially

00:06:22.779 --> 00:06:26.120
lock the visual seed in place permanently. Third,

00:06:26.560 --> 00:06:28.819
you must generate environments completely separately

00:06:28.819 --> 00:06:31.240
from the characters. We touched on this briefly

00:06:31.240 --> 00:06:34.180
during the isolated ingredients discussion. Fourth,

00:06:34.379 --> 00:06:36.759
you have to build a master character reference

00:06:36.759 --> 00:06:39.540
sheet. This is a very specific type of foundational

00:06:39.540 --> 00:06:42.300
document, right? And this is the absolute secret

00:06:42.300 --> 00:06:45.360
weapon of pro creators. You generate a grid showing

00:06:45.360 --> 00:06:47.339
your character from multiple distinct angles.

00:06:47.339 --> 00:06:49.439
You want a clear front profile, a side profile,

00:06:49.439 --> 00:06:52.439
and a back. You upscale this image grid to lock

00:06:52.439 --> 00:06:54.360
in the micro details. Then you feed this grid

00:06:54.360 --> 00:06:56.720
back into your image prompts continually. You set

00:06:56.720 --> 00:06:58.699
the character weight extremely high in the software.
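
NOTE
A minimal sketch of how that reference sheet might be assembled and reused. The prompt wording, file name, and the character_weight setting are illustrative assumptions, not any specific tool's API.
# reference_sheet.py - illustrative prompt assembly; no real generator is called
CHARACTER = "woman in a red leather jacket, short silver hair, mid-30s"
ANGLES = ["front view", "left profile", "right profile", "back view"]
sheet_prompt = (
    f"character reference sheet, 2x2 grid, {CHARACTER}, "
    + ", ".join(ANGLES)
    + ", neutral grey studio background, even lighting, identical outfit in every panel"
)
# Hypothetical settings you would carry into every later image or video generation:
generation_settings = {
    "reference_image": "character_sheet_upscaled.png",  # the upscaled grid fed back in
    "character_weight": 0.9,                            # keep the reference influence high
}
print(sheet_prompt)
print(generation_settings)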

00:06:59.149 --> 00:07:02.009
This forces the generative model to anchor onto

00:07:02.009 --> 00:07:04.709
those specific pixels. And finally, you apply

00:07:04.709 --> 00:07:08.050
this exact same logic to scene props. It also

00:07:08.050 --> 00:07:09.730
applies to any supporting characters in your

00:07:09.730 --> 00:07:15.089
short film. Whoa. Imagine generating

00:07:15.089 --> 00:07:18.230
an entire persistent cinematic universe from

00:07:18.230 --> 00:07:20.839
just one character sheet. It really is a staggering

00:07:20.839 --> 00:07:23.240
level of creative power today. You're essentially

00:07:23.240 --> 00:07:25.720
building a digital backlot on your personal computer.

00:07:25.939 --> 00:07:28.339
You lock in the exact visual identity before

00:07:28.339 --> 00:07:31.240
you ever animate anything. This stops the generative

00:07:31.240 --> 00:07:33.439
model from hallucinating completely new details.

00:07:33.660 --> 00:07:35.740
It has a concrete visual anchor to reference

00:07:35.740 --> 00:07:38.040
constantly during generation. You completely

00:07:38.040 --> 00:07:40.040
eliminate that terrifying cousin effect we mentioned

00:07:40.040 --> 00:07:42.500
earlier. I need to dig into one specific workflow

00:07:42.500 --> 00:07:45.680
step right here. Why is it so critical to generate

00:07:45.680 --> 00:07:48.160
environments completely separately from our characters?

00:07:48.730 --> 00:07:51.069
Because AI models really struggle with complex

00:07:51.069 --> 00:07:53.829
spatial relationships natively, they don't truly

00:07:53.829 --> 00:07:56.310
understand three-dimensional depth like humans

00:07:56.310 --> 00:07:59.199
actually do. If you ask for a man in a bustling

00:07:59.199 --> 00:08:01.800
neon city, the model might start putting neon

00:08:01.800 --> 00:08:04.360
lights directly on his leather jacket. It gets

00:08:04.360 --> 00:08:06.740
completely confused by all the overlapping digital

00:08:06.740 --> 00:08:09.459
noise. Isolating backgrounds stops the AI from

00:08:09.459 --> 00:08:11.439
blending everything into total visual chaos.
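
NOTE
A minimal sketch of that compositing step, assuming Python with OpenCV and NumPy, a character clip generated on solid green, and a separately generated background plate. File names and the green thresholds are illustrative and need per-clip tuning.
# chroma_key_composite.py - drop the green pixels and lay the character over the plate
import cv2
import numpy as np
fg = cv2.VideoCapture("character_greenscreen.mp4")   # AI character on solid green
bg = cv2.VideoCapture("forest_background.mp4")       # separately generated environment
fps = fg.get(cv2.CAP_PROP_FPS)
w = int(fg.get(cv2.CAP_PROP_FRAME_WIDTH))
h = int(fg.get(cv2.CAP_PROP_FRAME_HEIGHT))
out = cv2.VideoWriter("composite.mp4", cv2.VideoWriter_fourcc(*"mp4v"), fps, (w, h))
while True:
    ok_fg, frame = fg.read()
    ok_bg, plate = bg.read()
    if not (ok_fg and ok_bg):
        break
    plate = cv2.resize(plate, (w, h))
    hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
    mask = cv2.inRange(hsv, (35, 80, 80), (85, 255, 255))  # strongly green pixels
    mask3 = cv2.merge([mask, mask, mask]) > 0
    out.write(np.where(mask3, plate, frame))  # background where green, character everywhere else
fg.release(); bg.release(); out.release()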

00:08:11.779 --> 00:08:15.100
Exactly. You handle the complex compositing logic

00:08:15.100 --> 00:08:18.500
yourself later on. You don't force the AI to

00:08:18.500 --> 00:08:21.660
do complex spatial math. Okay, let's step back

00:08:21.660 --> 00:08:23.689
and look at where we are now. We have highly

00:08:23.689 --> 00:08:25.990
consistent characters and fully locked in visual

00:08:25.990 --> 00:08:28.550
environments. We have our master character sheets

00:08:28.550 --> 00:08:31.410
completely ready to go. The latent space is tightly

00:08:31.410 --> 00:08:34.090
constrained and fully controlled. So how do we

00:08:34.090 --> 00:08:36.090
actually get them to move naturally? Foundation

00:08:36.090 --> 00:08:38.470
3 says we have to prompt like a film director.

00:08:38.730 --> 00:08:40.789
We cannot just prompt like a software engineer

00:08:40.789 --> 00:08:42.950
anymore. We have to think about blocking, pacing,

00:08:43.129 --> 00:08:46.269
and direct action. This is where the true cinematic

00:08:46.269 --> 00:08:48.870
magic really starts happening. The guide breaks

00:08:48.870 --> 00:08:51.870
down three specific directing rules for us. The

00:08:51.870 --> 00:08:53.730
first rule is all about controlling the pace

00:08:53.730 --> 00:08:56.740
of motion. You must use the word slow in your

00:08:56.740 --> 00:08:59.100
text prompts. You have to use it way more than

00:08:59.100 --> 00:09:01.159
you think you need to. That feels incredibly

00:09:01.159 --> 00:09:03.480
counterintuitive at first glance. You know, you

00:09:03.480 --> 00:09:05.940
want dynamic action, so you naturally ask for

00:09:05.940 --> 00:09:08.620
fast movement. But the physics engines in AI

00:09:08.620 --> 00:09:11.399
naturally want to move things way too fast. Right.

00:09:11.399 --> 00:09:13.600
And they struggle deeply with subtle temporal

00:09:13.600 --> 00:09:17.259
consistency. If you ask a character to run quickly

00:09:17.259 --> 00:09:20.340
across a room, the model has to invent a massive

00:09:20.340 --> 00:09:23.340
amount of new pixel data rapidly. It usually

00:09:23.340 --> 00:09:26.360
fails and creates a blurry, chaotic visual mess.

00:09:26.960 --> 00:09:29.840
Asking for slow motion forces the engine to calculate

00:09:29.840 --> 00:09:32.740
carefully. It gives the AI time to maintain the

00:09:32.740 --> 00:09:34.860
rigid structural integrity. You can always speed

00:09:34.860 --> 00:09:37.120
the footage up in post-production later. Yeah.
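
NOTE
A minimal sketch of that post-production speed-up, assuming Python with OpenCV and a silent generated clip; it keeps one of every two frames at the original frame rate, roughly doubling the apparent speed. File names are illustrative.
# speed_up.py - drop frames to speed up a slow AI-generated clip (no audio handling)
import cv2
SPEED = 2  # keep 1 of every SPEED frames
cap = cv2.VideoCapture("slow_walk_cycle.mp4")
fps = cap.get(cv2.CAP_PROP_FPS)
w = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
h = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
out = cv2.VideoWriter("fast_walk_cycle.mp4", cv2.VideoWriter_fourcc(*"mp4v"), fps, (w, h))
i = 0
while True:
    ok, frame = cap.read()
    if not ok:
        break
    if i % SPEED == 0:  # writing every SPEED-th frame at the same fps reads as faster motion
        out.write(frame)
    i += 1
cap.release(); out.release()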

00:09:37.360 --> 00:09:39.259
That makes perfect sense when you understand the

00:09:39.259 --> 00:09:41.659
mechanics. So what is the second critical directing

00:09:41.659 --> 00:09:44.210
rule from the guide? You must restate what is

00:09:44.210 --> 00:09:47.129
already visible in the reference image. If your

00:09:47.129 --> 00:09:49.429
character is wearing a heavy red leather jacket,

00:09:49.690 --> 00:09:52.129
you have to type red leather jacket in the animation

00:09:52.129 --> 00:09:55.289
prompt. You cannot just say make the man walk

00:09:55.289 --> 00:09:58.110
forward. You have to consciously remind the AI

00:09:58.110 --> 00:10:01.009
about the specific clothing. If I feed the AI

00:10:01.009 --> 00:10:04.269
an image, why waste prompt space restating visible

00:10:04.269 --> 00:10:07.120
details? Well, because the generative model has

00:10:07.120 --> 00:10:09.700
a strictly finite attention budget. Think of

00:10:09.700 --> 00:10:12.000
the AI like a highly stressed camera operator.

00:10:12.299 --> 00:10:15.299
It only has so much compute power for every single

00:10:15.299 --> 00:10:18.779
frame. Calculating complex motion vectors is

00:10:18.779 --> 00:10:21.179
mathematically very expensive for the system.

00:10:21.360 --> 00:10:23.840
It has to calculate exactly how an arm swings

00:10:23.840 --> 00:10:26.960
through space. Exactly. As it calculates the

00:10:26.960 --> 00:10:30.179
complex math of a moving arm, it spends 90%

00:10:30.179 --> 00:10:32.240
of its attention budget right there. It sometimes

00:10:32.240 --> 00:10:34.200
drops the data about the red jacket entirely.

00:10:34.679 --> 00:10:37.419
It prioritizes the new motion vector over the

00:10:37.419 --> 00:10:40.440
existing visual stability. It'll literally hallucinate

00:10:40.440 --> 00:10:43.379
a generic gray t-shirt to save processing power.

00:10:43.659 --> 00:10:45.220
It just completely forgets what the character

00:10:45.220 --> 00:10:47.470
is currently wearing. Yeah, it takes the path

00:10:47.470 --> 00:10:50.909
of least computational resistance. Restating

00:10:50.909 --> 00:10:54.149
the details forces the AI to actively focus during

00:10:54.149 --> 00:10:57.409
generation. It preserves those specific visual

00:10:57.409 --> 00:11:00.230
elements during the actual motion. You essentially

00:11:00.230 --> 00:11:02.929
force it to allocate its attention budget properly.

00:11:03.470 --> 00:11:06.570
Repeating visual details forces the AI to remember

00:11:06.570 --> 00:11:09.509
exactly what it's animating. Precisely. You become

00:11:09.509 --> 00:11:12.190
a strict director managing a stressed camera

00:11:12.190 --> 00:11:14.600
crew. And that brings us to the third critical

00:11:14.600 --> 00:11:17.279
directing rule today. You must stick to very

00:11:17.279 --> 00:11:20.559
simple, highly isolated physical actions. You

00:11:20.559 --> 00:11:23.000
only want one or two clear actions per shot.

00:11:23.159 --> 00:11:25.379
You don't ask the character to walk, talk, and

00:11:25.379 --> 00:11:27.639
drink coffee simultaneously. The attention budget

00:11:27.639 --> 00:11:30.320
would just completely crash. Exactly. The AI

00:11:30.320 --> 00:11:32.480
model would panic and melt the image entirely.

00:11:32.759 --> 00:11:34.700
You have to break those complex actions down

00:11:34.700 --> 00:11:37.240
into individual camera shots. Shot one is just

00:11:37.240 --> 00:11:38.899
the character walking slowly through the door.

00:11:39.080 --> 00:11:41.240
Shot two is a tight close-up of them taking

00:11:41.240 --> 00:11:44.049
a sip of coffee. Shot three is them delivering the

00:11:44.049 --> 00:11:46.610
actual spoken dialogue. This drastically reduces

00:11:46.610 --> 00:11:49.029
the cognitive load on the AI model. It ensures

00:11:49.029 --> 00:11:51.370
each individual segment looks as realistic as

00:11:51.370 --> 00:11:53.720
mathematically possible. Right, and that's exactly

00:11:53.720 --> 00:11:56.340
how real film directors work anyway. They build

00:11:56.340 --> 00:11:58.700
complex scenes through a sequence of highly controlled

00:11:58.700 --> 00:12:01.159
shots. They don't capture the entire movie in

00:12:01.159 --> 00:12:03.679
one chaotic take. They manage the variables on

00:12:03.679 --> 00:12:06.379
set to guarantee a perfect outcome. By limiting

00:12:06.379 --> 00:12:08.480
the variables in your prompt, you guarantee a

00:12:08.480 --> 00:12:10.820
much higher success rate. You basically stop

00:12:10.820 --> 00:12:12.820
rolling the dice and hoping for a lucky generation.
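
NOTE
A minimal sketch that bakes the three directing rules into one prompt per shot: restate the visible details, keep to a single isolated action, and bias the pacing toward slow. The wording and field names are illustrative, not any specific tool's syntax.
# shot_prompts.py - assemble one animation prompt per shot, one action each
def build_shot_prompt(visible_details, action, camera):
    # Rule 2: restate what is already visible in the reference image.
    # Rule 3: one simple action per shot.
    # Rule 1: ask for slow movement; speed up in post if needed.
    return (f"{visible_details}, {action}, slow deliberate movement, "
            f"{camera}, consistent lighting, stable background")
shots = [
    build_shot_prompt("man in a red leather jacket", "walks through the doorway", "wide shot"),
    build_shot_prompt("man in a red leather jacket", "takes a sip of coffee", "tight close-up"),
    build_shot_prompt("man in a red leather jacket", "delivers a short line of dialogue", "medium shot"),
]
for p in shots:
    print(p)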

00:12:13.399 --> 00:12:16.019
Which brings us to the massive overarching takeaway

00:12:16.019 --> 00:12:20.500
from all this. The 2026 AI Video Gold Rush isn't

00:12:20.500 --> 00:12:23.340
about the software you use. Every single creator

00:12:23.340 --> 00:12:25.840
essentially has access to the exact same foundational

00:12:25.840 --> 00:12:29.080
tools. The latent space is available to absolutely

00:12:29.080 --> 00:12:31.559
everyone with a laptop. It's really about the

00:12:31.559 --> 00:12:33.759
intense discipline of your daily creative workflow.

00:12:34.039 --> 00:13:00.519
You stop relying on lucky generations and magical

00:13:00.519 --> 00:13:04.120
one-shot prompts. You essentially transform

00:13:04.120 --> 00:13:06.639
from a prompt guesser into an actual director.

00:13:06.899 --> 00:13:09.320
You take total control of the underlying mathematical

00:13:09.320 --> 00:13:11.659
system. And that transformation is what truly

00:13:11.659 --> 00:13:14.500
separates the amateurs from the pros. Anyone

00:13:14.500 --> 00:13:17.159
can generate a random, morphing, five-second

00:13:17.159 --> 00:13:20.980
video clip today, but directing a cohesive, visually

00:13:20.980 --> 00:13:23.639
perfect narrative requires a deeply structured

00:13:23.639 --> 00:13:26.820
system. It requires immense patience, careful

00:13:26.820 --> 00:13:29.100
planning, and a deep understanding of the medium.

00:13:29.320 --> 00:13:31.919
You have to understand latent space, attention

00:13:31.919 --> 00:13:35.139
budgets, and complex motion vectors. It's a completely

00:13:35.139 --> 00:13:38.220
new craft that demands serious artistic respect.

00:13:38.759 --> 00:13:40.500
Thank you so much for joining us on this deep

00:13:40.500 --> 00:13:42.360
dive. We want to leave you with a very simple

00:13:42.360 --> 00:13:44.379
challenge today. Try taking just one of these

00:13:44.379 --> 00:13:46.700
new structured workflow steps. Try building a

00:13:46.700 --> 00:13:48.820
solid multi-angle character reference sheet

00:13:48.820 --> 00:13:51.340
first. Do this before you ever hit generate video

00:13:51.340 --> 00:13:54.179
again. Build that crucial visual foundation before

00:13:54.179 --> 00:13:56.480
you try to build a whole house. Lock down your

00:13:56.480 --> 00:13:58.419
digital actor before you force them to perform.

00:13:58.679 --> 00:14:01.139
It'll truly save you so much frustration and

00:14:01.139 --> 00:14:03.159
wasted time. You'll finally feel like you're

00:14:03.159 --> 00:14:04.919
actually in control of the creative process.

00:14:05.200 --> 00:14:07.539
You'll stop fighting the AI and start directing

00:14:07.539 --> 00:14:10.460
it properly. If anyone can manifest a perfect

00:14:10.460 --> 00:14:13.919
cinematic short in 30 minutes, it leaves you wondering:

00:14:14.259 --> 00:14:17.019
when technical barriers

00:14:17.019 --> 00:14:19.460
completely vanish, will human taste be the only

00:14:19.460 --> 00:14:21.460
thing left? Will that be what makes a director

00:14:21.460 --> 00:14:23.240
valuable? Something to think about.
