WEBVTT

00:00:00.000 --> 00:00:01.679
I want you to just picture a scene for a second.

00:00:02.240 --> 00:00:06.339
A European villa, maybe, midday sun. Okay. There's

00:00:06.339 --> 00:00:09.699
a woman at a table. She looks distressed. A man's

00:00:09.699 --> 00:00:12.320
across from her. And you write down the dialogue.

00:00:12.460 --> 00:00:15.580
She says, these trees will turn yellow in a month.

00:00:15.980 --> 00:00:18.280
And he says, but they'll be green again next

00:00:18.280 --> 00:00:22.870
summer. In the past, and by past I mean literally

00:00:22.870 --> 00:00:26.030
last month, if you fed that into an AI, you'd

00:00:26.030 --> 00:00:29.489
get one continuous, probably pretty awkward shot.

00:00:29.550 --> 00:00:32.429
The camera might just float around. The faces,

00:00:32.770 --> 00:00:35.250
they'd probably blur. You'd feel like surveillance

00:00:35.250 --> 00:00:37.710
camera footage of a soap opera. Exactly. But

00:00:37.710 --> 00:00:40.149
something. Something changed this week. With

00:00:40.149 --> 00:00:43.109
Kling 3.0, you hit generate on that same prompt,

00:00:43.210 --> 00:00:45.450
and the model doesn't just render pixels anymore.

00:00:45.710 --> 00:00:49.310
It decides to start with a wide shot to set the

00:00:49.310 --> 00:00:52.109
scene. Then right when the woman speaks, it cuts,

00:00:52.229 --> 00:00:55.329
a hard cut to a close-up on her face. Then it

00:00:55.329 --> 00:00:57.469
cuts to the man for his reaction. It's making

00:00:57.469 --> 00:00:59.890
editorial decisions. And that's the moment. That

00:00:59.890 --> 00:01:01.649
right there is the signal that clip generation

00:01:01.649 --> 00:01:04.969
is dead. And actual direction has arrived. It's

00:01:04.969 --> 00:01:07.400
not just rendering. Yeah. It's filmmaking. Welcome

00:01:07.400 --> 00:01:10.400
to the Deep Dive. It is Thursday, February 5th,

00:01:10.400 --> 00:01:15.519
2026. We are looking at Kling 3.0. And I really

00:01:15.519 --> 00:01:18.879
want to slow down and just process what this

00:01:18.879 --> 00:01:20.659
update means because it really feels like we've

00:01:20.659 --> 00:01:22.799
crossed some kind of threshold. We definitely

00:01:22.799 --> 00:01:24.459
have. You know, for the last year, we've been

00:01:24.459 --> 00:01:27.349
drowning in what I call tech demos. Right. You

00:01:27.349 --> 00:01:30.510
know, cool astronauts, melting clocks, cars on

00:01:30.510 --> 00:01:33.109
fire. It's great for a tweet, but it's totally

00:01:33.109 --> 00:01:35.250
useless for a movie. It all felt very experimental,

00:01:35.489 --> 00:01:38.530
like a toy. It was a toy. But Kling 3.0 has

00:01:38.530 --> 00:01:41.469
officially replaced 2.6. And the headline isn't

00:01:41.469 --> 00:01:43.709
just better graphics. It's a fundamental shift

00:01:43.709 --> 00:01:46.469
in the architecture. We're moving from a system

00:01:46.469 --> 00:01:49.530
that generates footage to a system that generates

00:01:49.530 --> 00:01:52.049
scenes. And that distinction is everything. It

00:01:52.049 --> 00:01:54.209
matters a lot. It really does. So let's map this

00:01:54.209 --> 00:01:56.370
out for a bit. We need to talk about this unified

00:01:56.370 --> 00:01:59.250
multimodal architecture, which it sounds like

00:01:59.250 --> 00:02:01.150
marketing speak, but it's actually really important.

00:02:01.290 --> 00:02:03.230
It is. We have to get into the multi-shot feature,

00:02:03.430 --> 00:02:05.189
which is that AI director we were just talking

00:02:05.189 --> 00:02:07.730
about. And we've got to discuss consistency,

00:02:07.829 --> 00:02:10.849
how they finally stopped faces from melting into

00:02:10.849 --> 00:02:13.409
sludge. The sludge is gone. Thankfully. And then

00:02:13.409 --> 00:02:15.469
finally, we have to see where this all fits next

00:02:15.469 --> 00:02:17.729
to Sora and Runway because that whole landscape

00:02:17.729 --> 00:02:20.370
is getting very, very crowded. It is. But let's

00:02:20.370 --> 00:02:23.069
start with the engine, the unified multimodal

00:02:23.069 --> 00:02:26.610
architecture. Yeah. So in the old days, and again,

00:02:26.669 --> 00:02:28.810
we mean like six months ago. Right. The old days.

00:02:28.870 --> 00:02:32.389
We had this Frankenstein workflow. It was a nightmare.

00:02:32.689 --> 00:02:34.930
Oh, it was awful. You'd use one model for the

00:02:34.930 --> 00:02:37.530
video, then you'd have to drag that into a totally

00:02:37.530 --> 00:02:39.770
separate tool for the audio, and then maybe another

00:02:39.770 --> 00:02:41.930
one to upscale it. It felt like you were gluing

00:02:41.930 --> 00:02:44.710
together pieces from different puzzles. It was

00:02:44.710 --> 00:02:47.270
so disjointed. And that's because the models

00:02:47.270 --> 00:02:50.270
themselves were disjointed. The video brain didn't

00:02:50.270 --> 00:02:54.110
talk to the audio brain. But Kling 3.0 is a

00:02:54.110 --> 00:02:58.409
native, unified system. Think of it like the

00:02:58.409 --> 00:03:01.030
human brain. When you dream, you don't do the

00:03:01.030 --> 00:03:04.710
visuals, then pause to dream the sound and then

00:03:04.710 --> 00:03:06.590
edit them in your head. It all happens at once.

00:03:06.590 --> 00:03:08.990
It's simultaneous. Exactly. It's one experience.

00:03:09.289 --> 00:03:14.110
And Kling 3.0 is finally trying to dream the

00:03:14.110 --> 00:03:17.250
video natively all at once. That's why the sync

00:03:17.250 --> 00:03:19.430
is so good, because the audio isn't just reacting

00:03:19.430 --> 00:03:21.150
to the video. They're being born at the exact

00:03:21.150 --> 00:03:24.550
same instant from the same data. So because it's

00:03:24.550 --> 00:03:26.870
one system, it's just more efficient. So much

00:03:26.870 --> 00:03:30.060
more efficient. And that efficiency buys us

00:03:30.060 --> 00:03:32.460
the one thing we've all been desperate for: time.

00:03:32.460 --> 00:03:35.680
We're jumping from those frantic, you know, five

00:03:35.680 --> 00:03:39.560
second clips to full 15-second continuous scenes.

00:03:39.560 --> 00:03:42.240
Okay, let's pause on that. 15 seconds. To someone

00:03:42.240 --> 00:03:44.360
just scrolling TikTok, adding 10 seconds probably

00:03:44.360 --> 00:03:47.080
sounds trivial. Why is that such a heavy lift,

00:03:47.080 --> 00:03:49.280
technically? Why couldn't we just do that before?

00:03:49.280 --> 00:03:52.120
It's the memory drift problem. Imagine you're

00:03:52.120 --> 00:03:54.199
trying to draw a comic strip, frame by frame.

00:03:54.360 --> 00:03:56.180
But by the time you get to panel three, you've

00:03:56.180 --> 00:03:57.780
completely forgotten what the main character's

00:03:57.780 --> 00:04:01.240
nose looks like. The old models had a very short

00:04:01.240 --> 00:04:03.939
attention span. After five seconds of generating

00:04:03.939 --> 00:04:06.439
video, which is a huge amount of data, the model

00:04:06.439 --> 00:04:08.539
would just forget the starting conditions. And

00:04:08.539 --> 00:04:10.699
everything would warp. The background would warp.

00:04:10.740 --> 00:04:13.900
The shirt color changes. It gets chaotic. Extending

00:04:13.900 --> 00:04:17.240
that coherence out to 15 seconds required a massive

00:04:17.240 --> 00:04:19.500
architectural overhaul. And creatively, that

00:04:19.500 --> 00:04:23.660
extra time changes the entire medium. Five seconds

00:04:23.660 --> 00:04:25.779
is basically a gif. It's a reaction, a loop.

00:04:25.959 --> 00:04:28.040
You can't really tell a story in five seconds.

00:04:28.259 --> 00:04:31.660
No, you can't. You can't have a character, say,

00:04:31.720 --> 00:04:34.620
walk into a room, hesitate for a second, realize

00:04:34.620 --> 00:04:36.220
they're in the wrong place, and then turn around.

00:04:36.480 --> 00:04:39.279
That takes time. With 15 seconds, you have room

00:04:39.279 --> 00:04:41.620
to breathe. Characters can finish an action.

00:04:41.680 --> 00:04:43.759
They can have a dialogue exchange that actually

00:04:43.759 --> 00:04:47.410
feels human, not rushed. So, I mean, does adding

00:04:47.410 --> 00:04:49.949
10 seconds really change the fundamental nature

00:04:49.949 --> 00:04:52.490
of the video? It shifts the output from just

00:04:52.490 --> 00:04:55.370
a fleeting moving image to a narratively complete

00:04:55.370 --> 00:04:57.990
scene. A narratively complete scene. Okay, that's

00:04:57.990 --> 00:05:00.490
a perfect segue to the multi-shot feature because

00:05:00.490 --> 00:05:03.370
this is where the AI stops being a tool and starts

00:05:03.370 --> 00:05:05.470
acting like a collaborator. Or a boss, maybe.

00:05:05.709 --> 00:05:08.910
Or a boss, yeah. Let's go back to that European

00:05:08.910 --> 00:05:12.230
villa prompt. The user didn't ask for any cuts.

00:05:12.910 --> 00:05:15.329
They didn't say cut to camera B at four seconds.

00:05:15.449 --> 00:05:18.829
They just described the emotional flow. She speaks,

00:05:19.009 --> 00:05:22.050
he replies. And Kling 3.0 just understands that

00:05:22.050 --> 00:05:24.529
in the language of cinema, from the millions

00:05:24.529 --> 00:05:27.449
of movies in its training data, when someone

00:05:27.449 --> 00:05:30.269
speaks, we usually want to see them. Right. So

00:05:30.269 --> 00:05:32.329
it automatically creates that shot, reverse shot.

00:05:32.389 --> 00:05:34.730
It plans the sequence for you. But I have to

00:05:34.730 --> 00:05:36.629
play devil's advocate here for a second. If I'm

00:05:36.629 --> 00:05:38.730
a creator and I type a prompt, don't I want control?

00:05:39.180 --> 00:05:41.980
If the AI is deciding to cut, isn't it kind of

00:05:41.980 --> 00:05:44.060
overriding my artistic intent? What if I wanted

00:05:44.060 --> 00:05:46.399
a long, uncomfortable, continuous take? That's

00:05:46.399 --> 00:05:48.579
the tension, right. But think of it this way.

00:05:49.060 --> 00:05:51.899
You are trading micromanagement for high-level

00:05:51.899 --> 00:05:53.860
direction. Okay. Unless you tell it otherwise.

00:05:53.980 --> 00:05:56.860
The AI assumes you want standard cinematic grammar.

00:05:57.160 --> 00:05:59.920
And it's actually, it's freeing for a writer.

00:06:00.040 --> 00:06:01.800
You can just focus on the story, the dialogue,

00:06:01.939 --> 00:06:04.000
the mood, and let the model handle the technical

00:06:04.000 --> 00:06:06.480
stitching. So I become the showrunner and the

00:06:06.480 --> 00:06:09.560
AI is the episode director. Precisely. You're

00:06:09.560 --> 00:06:11.839
not the editor anymore, you know, frame bashing

00:06:11.839 --> 00:06:14.160
in Premiere. You're the producer saying, make

00:06:14.160 --> 00:06:17.220
this scene feel sad. And the AI executes the

00:06:17.220 --> 00:06:19.759
technicalities of sadness, which includes the

00:06:19.759 --> 00:06:22.399
pacing and the cutting. That is such a wild shift

00:06:22.399 --> 00:06:24.839
in agency. But it only works if the characters

00:06:24.839 --> 00:06:27.620
look the same across those cuts. Which brings

00:06:27.620 --> 00:06:31.620
us to the melting problem. Ah, the dreaded drift.

00:06:31.839 --> 00:06:34.699
Anyone who's used 2.6 knows this. You generate

00:06:34.699 --> 00:06:37.629
a cool character in shot one. By shot three,

00:06:37.870 --> 00:06:40.810
they look like their cousin. By shot five, their

00:06:40.810 --> 00:06:43.110
face starts doing that weird like melting candle

00:06:43.110 --> 00:06:45.430
thing. Yeah. It breaks the immersion immediately.

00:06:45.689 --> 00:06:47.290
Oh, it pulls you right out of the story. It's

00:06:47.290 --> 00:06:50.069
the uncanny valley at its absolute worst. Yeah.

00:06:50.490 --> 00:06:53.110
But Kling 3.0 has attacked this with something

00:06:53.110 --> 00:06:55.829
called video element references. I read about

00:06:55.829 --> 00:06:58.069
this. It's basically like giving the AI a reference

00:06:58.069 --> 00:06:59.870
sheet, right? It's more than a sheet. It's like

00:06:59.870 --> 00:07:02.269
a memory bank. Instead of just describing your

00:07:02.269 --> 00:07:04.209
character with text and, you know, hoping the

00:07:04.209 --> 00:07:06.790
random noise gets it right twice, you upload

00:07:06.790 --> 00:07:09.730
a three to eight second video clip of that character.

00:07:09.889 --> 00:07:11.730
So not just a photo. It actually sees how they

00:07:11.730 --> 00:07:15.170
move. That's the key. It captures the 3D structure

00:07:15.170 --> 00:07:18.310
of their face in motion. It locks that data in.

00:07:18.490 --> 00:07:21.250
So when you go to generate a new scene, the AI

00:07:21.250 --> 00:07:24.350
says, OK, I know this entity and it preserves

00:07:24.350 --> 00:07:26.490
the structure. No more melting hands, no more

00:07:26.490 --> 00:07:29.649
changing jawlines. That is crucial for any kind

00:07:29.649 --> 00:07:32.110
of serialization. If you want to make a web series,

00:07:32.490 --> 00:07:35.170
you need the same actor to show up in episode

00:07:35.170 --> 00:07:37.930
two. It's the difference between a random generation

00:07:37.930 --> 00:07:41.319
and a cast member. And this ties directly into

00:07:41.319 --> 00:07:43.800
the audio character binding feature. Which is

00:07:43.800 --> 00:07:46.199
fascinating. I saw that example of the cafe scene,

00:07:46.379 --> 00:07:49.439
an English man and a French woman having a bilingual

00:07:49.439 --> 00:07:52.279
conversation. Right. And in the prompt, you use

00:07:52.279 --> 00:07:54.980
these tags. You say an English man says this

00:07:54.980 --> 00:07:57.160
and a French woman says that. And it handles the

00:07:57.160 --> 00:07:59.560
lip sync for both languages in the same shot.

00:07:59.800 --> 00:08:01.899
Flawlessly. Yeah. And because of that unified

00:08:01.899 --> 00:08:03.980
architecture we talked about, the lips aren't

00:08:03.980 --> 00:08:06.519
just flapping over a static image. The facial

00:08:06.519 --> 00:08:09.529
muscles are actually moving correctly for French

00:08:09.529 --> 00:08:11.769
phonetics versus English phonetics. That is.

00:08:11.829 --> 00:08:13.730
Wow. It's approaching reality. It sounds like

00:08:13.730 --> 00:08:15.810
this removes the uncanniness, but does it feel

00:08:15.810 --> 00:08:18.970
human? By binding the voice and the visuals so

00:08:18.970 --> 00:08:22.430
tightly, it bridges that gap between just a simulation

00:08:22.430 --> 00:08:25.589
and a believable performance. A believable performance.

00:08:25.649 --> 00:08:28.709
It's exciting, but also, you know, a little terrifying

00:08:28.709 --> 00:08:31.170
for actors. Just a little bit. Speaking of things

00:08:31.170 --> 00:08:33.620
that need to be reliable. Okay, we're back. So

00:08:33.620 --> 00:08:35.919
we've talked about the automation, the AI directing

00:08:35.919 --> 00:08:39.299
the cuts, the AI syncing the lips. But there's

00:08:39.299 --> 00:08:41.600
a subset of our listeners, and I definitely count

00:08:41.600 --> 00:08:44.220
myself among them, who are control freaks. The

00:08:44.220 --> 00:08:46.820
pixel peepers. Exactly. I don't always want the

00:08:46.820 --> 00:08:49.460
AI to decide when to cut. Sometimes I have a

00:08:49.460 --> 00:08:52.259
very specific vision in my head. And Kling

00:08:52.259 --> 00:08:56.179
3.0 has a response to that with the Omni Architecture's

00:08:56.179 --> 00:08:58.759
storyboard mode. This is the power user feature.

00:08:58.899 --> 00:09:00.700
This is for when you want to be the director

00:09:00.700 --> 00:09:02.940
and the cinematographer. In storyboard mode,

00:09:03.059 --> 00:09:05.759
you can define every single shot before you hit

00:09:05.759 --> 00:09:07.940
generate. You set the duration, the framing,

00:09:08.100 --> 00:09:10.139
the camera motion, everything. So I can say shot

00:09:10.139 --> 00:09:13.259
one, four seconds, wide angle, slow zoom in.

00:09:13.340 --> 00:09:16.960
Shot two, three seconds, close-up, static. Yes.

00:09:17.639 --> 00:09:19.679
There was a great example in the release notes.

00:09:19.840 --> 00:09:23.639
It was a Chinese period drama scene. The prompt

00:09:23.639 --> 00:09:26.600
was oddly specific. It was, who can bully my

00:09:26.600 --> 00:09:29.179
obedient adult? Who can bully my obedient adult?

00:09:29.320 --> 00:09:31.240
I feel like we're missing some context there.

00:09:31.419 --> 00:09:33.299
It's definitely a genre trope. A little bit bad.

00:09:33.559 --> 00:09:35.940
But look at the camera direction. The user specified

00:09:35.940 --> 00:09:39.529
a crash zoom. So the camera rushes forward. It

00:09:39.529 --> 00:09:42.029
cuts past the main character to land right on

00:09:42.029 --> 00:09:44.529
an elderly onlooker's astonished eyes. That's

00:09:44.529 --> 00:09:47.769
a very aggressive cinematic choice. It is. And

00:09:47.769 --> 00:09:50.330
the AI executed it perfectly because the user

00:09:50.330 --> 00:09:52.250
controlled all the camera movement parameters.

00:09:53.009 --> 00:09:55.529
But, and this is a big, but just because you

00:09:55.529 --> 00:09:57.250
can do a crash zoom doesn't mean you should.

00:09:57.409 --> 00:09:59.529
The documentation actually had a whole section

00:09:59.529 --> 00:10:02.009
on what actually improves cinematic results.

00:10:02.330 --> 00:10:04.370
It felt like a mini film school lesson. It was.

00:10:04.529 --> 00:10:07.840
And the biggest tip: composition first, effects

00:10:07.840 --> 00:10:10.100
second. Stop trying to make the camera do backflips.

00:10:10.100 --> 00:10:13.759
Exactly. The AI creates much higher fidelity images

00:10:13.759 --> 00:10:16.539
when the movement is purposeful. A slow dolly,

00:10:16.539 --> 00:10:20.220
a gentle orbit. When you ask for a fast spin while

00:10:20.220 --> 00:10:22.100
the character runs and the building explodes,

00:10:22.100 --> 00:10:25.240
you're just introducing way too much noise. The

00:10:25.240 --> 00:10:27.860
AI gets confused, and the result looks like a

00:10:27.860 --> 00:10:30.779
video game glitch. The rule of one action per

00:10:30.779 --> 00:10:35.240
shot: one clear subject, one clear movement. Simplicity

00:10:35.240 --> 00:10:39.009
wins. So does the tool really require a filmmaker's

00:10:39.009 --> 00:10:41.289
eye to be used effectively? You need to think

00:10:41.289 --> 00:10:42.950
like a photographer, not just a prompt engineer.

00:10:43.110 --> 00:10:45.669
You have to understand framing. Which is a skill

00:10:45.669 --> 00:10:47.029
I think a lot of people are going to have to

00:10:47.029 --> 00:10:49.110
learn very, very quickly. Okay, we have to look

00:10:49.110 --> 00:10:51.509
at the context now. Kling isn't operating in

00:10:51.509 --> 00:10:55.129
a vacuum. We've got OpenAI Sora. We've got Runway,

00:10:55.169 --> 00:10:57.710
Pika, Vidu. It's a battle royale out there. It's

00:10:57.710 --> 00:11:00.230
getting so crowded. So where does Kling 3.0

00:11:00.230 --> 00:11:02.629
fit in all this? Is it the best at everything?

00:11:03.120 --> 00:11:05.399
No, and I don't think any single model will be.

00:11:05.480 --> 00:11:07.840
They're all carving out their own niches. Let's

00:11:07.840 --> 00:11:10.340
look at Sora 2, the heavyweight. Sora is still

00:11:10.340 --> 00:11:12.639
the king of physics. If you need a simulation

00:11:12.639 --> 00:11:15.000
of water crashing into a lighthouse or a glass

00:11:15.000 --> 00:11:17.480
shattering on a floor, Sora just understands

00:11:17.480 --> 00:11:19.759
the physical world better. It simulates gravity

00:11:19.759 --> 00:11:22.820
and fluid dynamics in a way that Kling doesn't

00:11:22.820 --> 00:11:25.620
quite match yet. Okay, so Sora for simulation.

00:11:26.200 --> 00:11:29.759
What about Runway Gen 3? Runway is the tool for

00:11:29.759 --> 00:11:32.879
those control freaks we mentioned. I think Runway

00:11:32.879 --> 00:11:35.519
still holds the crown for granular control. If

00:11:35.519 --> 00:11:37.659
I need to change the color of a car in the background

00:11:37.659 --> 00:11:39.659
without altering the lighting on the protagonist's

00:11:39.659 --> 00:11:42.940
face, Runway's brush tools and local in-painting

00:11:42.940 --> 00:11:45.519
are still superior. Runway lets you be a surgeon.

00:11:45.940 --> 00:11:48.519
Exactly. Kling is painting with a broader brush.

00:11:48.759 --> 00:11:50.840
It wants to give you the whole scene, whereas

00:11:50.840 --> 00:11:53.299
Runway lets you surgically alter individual pixels.

00:11:53.580 --> 00:11:56.299
And Pika and Vidu, where do they fit? Pika is

00:11:56.299 --> 00:11:59.019
for speed and social media effects. It's flashy.

00:11:59.179 --> 00:12:01.980
And Vidu 2 has really cornered the market on

00:12:01.980 --> 00:12:04.820
Asian aesthetics and anime styles. It just handles

00:12:04.820 --> 00:12:06.700
those textures better than anyone. So where does

00:12:06.700 --> 00:12:09.259
that leave Kling? Kling 3.0 is the production

00:12:09.259 --> 00:12:11.940
tool. That's its whole identity. It wins on the

00:12:11.940 --> 00:12:14.580
workflow. If you want to create a raw clip to

00:12:14.580 --> 00:12:17.019
edit later, maybe you use Runway. But if you

00:12:17.019 --> 00:12:19.940
want to generate an edited sequence... A scene

00:12:19.940 --> 00:12:22.059
that tells a story right out of the box with

00:12:22.059 --> 00:12:24.100
dialogue and cuts. That's Kling's territory.

00:12:24.399 --> 00:12:26.120
That's Kling's territory. It's not just generating

00:12:26.120 --> 00:12:28.620
footage, it's generating cinema. So is there

00:12:28.620 --> 00:12:30.740
even a clear winner here at this point? It's

00:12:30.740 --> 00:12:33.539
specialized now. Use Kling when you need to tell

00:12:33.539 --> 00:12:36.360
a story with cuts and dialogue. Use the others

00:12:36.360 --> 00:12:39.419
for VFX shots. It's about picking the right hammer

00:12:39.419 --> 00:12:43.000
for the right nail. Let's zoom out then. We started

00:12:43.000 --> 00:12:45.039
this whole conversation with the idea that the

00:12:45.039 --> 00:12:48.960
tech demo era is over. We saw Kling 2.6, a noble

00:12:48.960 --> 00:12:51.740
effort, but it was flawed. Good visuals, short

00:12:51.740 --> 00:12:54.039
clips, the drifting faces. It got us interested.

00:12:54.179 --> 00:12:56.480
It was a proof of concept. And now we have 3.0,

00:12:56.559 --> 00:13:00.340
a unified system, 15-second scenes, an AI that

00:13:00.340 --> 00:13:03.139
understands film editing logic. The moment of

00:13:03.139 --> 00:13:05.179
wonder for me isn't just the visual quality.

00:13:05.460 --> 00:13:09.220
It's the integration. It's the ability to write

00:13:09.220 --> 00:13:12.600
a prompt that contains complex emotions, like,

00:13:12.679 --> 00:13:16.279
are you always this hopeful? And have the machine

00:13:16.779 --> 00:13:18.879
act as the cinematographer, the sound engineer,

00:13:19.080 --> 00:13:21.679
the casting director and the editor all in one

00:13:21.679 --> 00:13:24.460
pass. It's the convergence of all of it. It's

00:13:24.460 --> 00:13:26.940
reliable enough now for agencies and creators

00:13:26.940 --> 00:13:29.559
to actually build businesses on. And that's the

00:13:29.559 --> 00:13:32.360
real shift. It's no longer just look what I made.

00:13:32.399 --> 00:13:34.440
It's look what I sold. So if you're listening

00:13:34.440 --> 00:13:36.639
to this and you have access, I think it's rolling

00:13:36.639 --> 00:13:38.879
out to ultra subscribers first. That's right.

00:13:38.940 --> 00:13:40.820
Paying members get the first bite. Don't wait.

00:13:40.940 --> 00:13:43.960
Start testing those multi-shot features. Push

00:13:43.960 --> 00:13:46.620
the duration to 15 seconds. Try the character

00:13:46.620 --> 00:13:48.879
binding. The barrier to professional storytelling

00:13:48.879 --> 00:13:50.919
has basically just collapsed. You don't need

00:13:50.919 --> 00:13:53.080
a million dollar budget. You just need a good

00:13:53.080 --> 00:13:55.440
idea and the ability to describe it. The production

00:13:55.440 --> 00:13:58.840
era of AI video is here. The real question is,

00:13:58.919 --> 00:14:00.539
what are you going to make with it? That is the

00:14:00.539 --> 00:14:02.299
question. And here's a final thought to leave

00:14:02.299 --> 00:14:05.600
you with. If the AI can now direct the scene,

00:14:05.799 --> 00:14:08.620
edit the scene, and act the scene, and it's trained

00:14:08.620 --> 00:14:12.509
on all of human cinema. How long until we stop

00:14:12.509 --> 00:14:14.669
asking, how do we make this look real? And we

00:14:14.669 --> 00:14:17.490
start asking, whose directorial style is this

00:14:17.490 --> 00:14:22.049
AI actually mimicking? Ooh, that is a can of

00:14:22.049 --> 00:14:24.309
worms. Are we getting a Spielberg cut or a Kubrick

00:14:24.309 --> 00:14:26.750
cut? Exactly. Something to mull over. Thanks

00:14:26.750 --> 00:14:29.009
for diving in with us. Always a pleasure. We'll

00:14:29.009 --> 00:14:29.509
see you in the next one.
