WEBVTT

00:00:00.000 --> 00:00:03.899
It's March 2026. Beat. We're casually handing

00:00:03.899 --> 00:00:06.740
AI the keys to our businesses. Yet at the exact

00:00:06.740 --> 00:00:09.880
same time, we're using it to conjure entire Hollywood

00:00:09.880 --> 00:00:13.300
-level worlds. All from a simple laptop. Welcome

00:00:13.300 --> 00:00:16.050
to the Deep Dive. Today, we're exploring two

00:00:16.050 --> 00:00:18.570
fascinating, wild extremes of this technology.

00:00:19.129 --> 00:00:21.530
First, we're examining the massive security vulnerabilities

00:00:21.530 --> 00:00:24.989
hidden in new worker AI agents. Second, we're

00:00:24.989 --> 00:00:26.789
walking through a step -by -step masterclass.

00:00:27.050 --> 00:00:29.890
We'll cover the definitive cinematic AI ad workflow

00:00:29.890 --> 00:00:33.789
using NanoBanana 2 and Kling 3 .0. Okay, let's

00:00:33.789 --> 00:00:35.509
unpack this. Yeah, it's quite a contrast. We're

00:00:35.509 --> 00:00:37.310
building these incredible creative tools on one

00:00:37.310 --> 00:00:39.750
side, but on the other, we're opening up massive,

00:00:39.810 --> 00:00:42.240
unprecedented vulnerabilities. Let's start with

00:00:42.240 --> 00:00:44.840
that wild west of AI security. We've officially

00:00:44.840 --> 00:00:46.820
shifted away from simple chatbots. We're now

00:00:46.820 --> 00:00:49.060
dealing with autonomous workers. This completely

00:00:49.060 --> 00:00:51.439
changes everything about how we protect our systems.

00:00:51.579 --> 00:00:53.600
It really does. The attack surfaces are entirely

00:00:53.600 --> 00:00:55.259
different now. Think about how we used to test

00:00:55.259 --> 00:00:58.679
AI. Everyone focused purely on AI red teaming.

00:00:58.740 --> 00:01:01.119
Right. That essentially meant trying to jailbreak

00:01:01.119 --> 00:01:03.579
the AI's core brain. Exactly. You'd prompt it

00:01:03.579 --> 00:01:05.480
to write a malicious script or something. But

00:01:05.480 --> 00:01:07.859
professional AI pen testing is the new standard

00:01:07.859 --> 00:01:10.420
today. It attacks the whole body, not just the

00:01:10.420 --> 00:01:13.180
brain. Because an autonomous agent actually has

00:01:13.180 --> 00:01:15.879
hands now, it can interact directly with your

00:01:15.879 --> 00:01:18.040
environment. Right. And attackers are going straight

00:01:18.040 --> 00:01:20.959
for those new interactions. That includes infiltrating

00:01:20.959 --> 00:01:24.060
your RREG setup. For those newer to this, RG

00:01:24.060 --> 00:01:28.480
means giving AI a custom private library to read

00:01:28.480 --> 00:01:31.700
from. It lets the AI search your internal company

00:01:31.700 --> 00:01:34.560
data. Yeah. And that's a massive target. Yeah.

00:01:34.620 --> 00:01:37.079
It also goes after the MCP. That's the model

00:01:37.079 --> 00:01:39.310
context protocol. Think of it as the digital

00:01:39.310 --> 00:01:42.129
bridge connecting AI to your secure business

00:01:42.129 --> 00:01:45.030
tools. It essentially links the AI's brain directly

00:01:45.030 --> 00:01:47.510
to your live databases. Which brings us to the

00:01:47.510 --> 00:01:50.870
biggest risk of 2026. It's called indirect injection

00:01:50.870 --> 00:01:53.590
via retrieval. This is where it gets truly scary.

00:01:54.189 --> 00:01:56.450
Poisoned documents can actually hijack these

00:01:56.450 --> 00:01:58.890
agents mid -execution. Let's say your company

00:01:58.890 --> 00:02:01.989
uses an AI agent for HR. It automatically scans

00:02:01.989 --> 00:02:04.629
incoming resumes. A hacker submits a completely

00:02:04.629 --> 00:02:08.210
normal -looking PDF document. But embedded in

00:02:08.210 --> 00:02:10.909
white text is a malicious prompt. It's hidden

00:02:10.909 --> 00:02:13.409
perfectly from human eyes. So that hidden prompt

00:02:13.409 --> 00:02:16.949
tells the agent to act. Yes. It commands it to

00:02:16.949 --> 00:02:19.689
forward all employee records to an external server.

00:02:20.219 --> 00:02:23.419
The AI just blindly obeys the injected instructions.

00:02:23.939 --> 00:02:26.659
Prompt injection is basically the new SQL injection.

00:02:27.020 --> 00:02:29.419
It's a staggering vulnerability. The numbers

00:02:29.419 --> 00:02:33.699
from recent reports are just wild. 72 % of enterprises

00:02:33.699 --> 00:02:37.759
use AI agents today. But only 29 % actually have

00:02:37.759 --> 00:02:41.099
AI -specific security controls. Why the massive

00:02:41.099 --> 00:02:44.120
gap? Because executives still treat agents like

00:02:44.120 --> 00:02:46.919
glorified search engines. They completely ignore

00:02:46.919 --> 00:02:49.340
that these agents now hold the keys to the company

00:02:49.340 --> 00:02:51.909
vault. Most breaches are happening right at that

00:02:51.909 --> 00:02:54.530
integration layer. It's usually caused by overprivileged

00:02:54.530 --> 00:02:58.289
MCP connections. The AI simply has too much unchecked

00:02:58.289 --> 00:03:00.349
power. It's like handing a stranger your wallet

00:03:00.349 --> 00:03:02.689
and looking away. Hackers are even using autonomous

00:03:02.689 --> 00:03:05.250
tools to win bug bounties now. Yeah, programs

00:03:05.250 --> 00:03:07.909
like XBOW and Arachne are absolutely everywhere.

00:03:08.189 --> 00:03:10.449
They act like relentless automated lockpick.

00:03:10.509 --> 00:03:12.770
They just probe your agent's connections 24 -7.

00:03:13.180 --> 00:03:14.960
Defense platforms are racing to keep up with

00:03:14.960 --> 00:03:17.259
them. You have tools like Gandalf for testing

00:03:17.259 --> 00:03:19.699
your security basics, and you have Agent Breaker

00:03:19.699 --> 00:03:22.560
for testing complex multi -step agents. Security

00:03:22.560 --> 00:03:25.180
architecture absolutely must shift to a zero

00:03:25.180 --> 00:03:28.099
-trust model. We urgently need strict role -based

00:03:28.099 --> 00:03:31.180
access control, or RBAC. Without it, the consequences

00:03:31.180 --> 00:03:34.259
are disastrous. A compromise prompt could delete

00:03:34.259 --> 00:03:37.240
an entire production database, or it could trigger

00:03:37.240 --> 00:03:40.139
a massive fraudulent financial transfer. The

00:03:40.139 --> 00:03:42.750
stakes are incredibly high. Why are integration

00:03:42.750 --> 00:03:45.250
layers suddenly the biggest target? Because they

00:03:45.250 --> 00:03:48.409
completely bypass the AI's internal safety filters

00:03:48.409 --> 00:03:51.009
to access raw data. So overprivileged connections

00:03:51.009 --> 00:03:53.689
let hackers bypass the model's brain entirely.

00:03:54.069 --> 00:03:56.349
Precisely. It's a terrifying thought. We're giving

00:03:56.349 --> 00:03:59.689
AI this massive unchecked power. But there is

00:03:59.689 --> 00:04:02.169
a flip side to that exact same power. When we

00:04:02.169 --> 00:04:04.750
harness it creatively rather than administratively,

00:04:04.789 --> 00:04:07.550
the results are staggering. Let's pivot to the

00:04:07.550 --> 00:04:10.110
creative side of 2026. I love this transition.

00:04:10.620 --> 00:04:13.020
The creative tools available right now are mind

00:04:13.020 --> 00:04:15.840
-blowing. Generating a true cinematic ad requires

00:04:15.840 --> 00:04:19.180
serious, methodical planning. You can't just

00:04:19.180 --> 00:04:21.500
rely on a lucky prompt anymore. When you set

00:04:21.500 --> 00:04:23.420
out to build a cinematic ad, you don't start

00:04:23.420 --> 00:04:26.300
with video. No, you absolutely don't. Step one

00:04:26.300 --> 00:04:30.139
is using ChatGPT to build a solid concept. You

00:04:30.139 --> 00:04:32.839
have to establish the emotional core first. You

00:04:32.839 --> 00:04:34.839
need a clear emotion and a central metaphor.

00:04:35.220 --> 00:04:38.879
Plus, you must define a strong visual arc. The

00:04:38.879 --> 00:04:41.480
source material uses a brilliant detailed example.

00:04:41.779 --> 00:04:44.379
Yeah, it's a hypothetical Rolex and Dune collaboration

00:04:44.379 --> 00:04:47.839
watch ad. It's such a great concept. You establish

00:04:47.839 --> 00:04:49.899
the mood before touching any image generators.

00:04:50.240 --> 00:04:52.500
You give the AI a highly structured nine -shot

00:04:52.500 --> 00:04:55.120
sequence. It focuses heavily on a Froman timekeeper.

00:04:55.300 --> 00:04:57.540
He's standing on Arrakis in the harsh desert

00:04:57.540 --> 00:05:00.360
sunlight. The whole concept is about time, survival,

00:05:00.540 --> 00:05:03.660
and destiny. But remember, the storyboard is

00:05:03.660 --> 00:05:06.079
a compass, not a script. It guides your vision,

00:05:06.160 --> 00:05:08.579
but you adapt as you go. That makes total sense.

00:05:08.819 --> 00:05:10.839
Adjusting a text prompt takes a few seconds.

00:05:11.019 --> 00:05:13.240
Fixing a terrible video generation takes hours

00:05:13.240 --> 00:05:16.519
of computing time. Exactly. Step two is building

00:05:16.519 --> 00:05:19.740
your visual storyboard. You do this by creating

00:05:19.740 --> 00:05:23.540
a 3x3 grid. We're using NanoBanana 2 for this

00:05:23.540 --> 00:05:26.699
crucial step. That specific model is built on

00:05:26.699 --> 00:05:30.800
Gemini 3 .1 Flash. You access it easily via Google

00:05:30.800 --> 00:05:33.399
Flow. It's incredibly powerful for maintaining

00:05:33.399 --> 00:05:36.379
visual consistency. I have to admit, I still

00:05:36.379 --> 00:05:38.399
wrestle with prompt drift myself when trying

00:05:38.399 --> 00:05:40.199
to keep characters consistent. Right, we all

00:05:40.199 --> 00:05:42.500
do. I'll get a face perfect and in the next frame

00:05:42.500 --> 00:05:43.959
they look totally different. It's incredibly

00:05:43.959 --> 00:05:46.500
frustrating. That's why the grid method is absolute

00:05:46.500 --> 00:05:49.600
genius. Generating nine frames all at once solves

00:05:49.600 --> 00:05:52.199
that exact problem. It gives the subsequent video

00:05:52.199 --> 00:05:55.040
model a consistent visual context. The character

00:05:55.040 --> 00:05:57.439
faces and the dynamic lighting stay perfectly

00:05:57.439 --> 00:06:00.240
matched. It prevents your final video from becoming

00:06:00.240 --> 00:06:03.240
a disjointed morphing mess. You use a highly

00:06:03.240 --> 00:06:06.040
detailed storyboard grid prompt. You explicitly

00:06:06.040 --> 00:06:08.720
include character descriptions and exact product

00:06:08.720 --> 00:06:11.000
colors. You also upload reference photos directly

00:06:11.000 --> 00:06:13.819
underneath your text. This tells the AI exactly

00:06:13.819 --> 00:06:16.019
what it's working with. It acts like stacking

00:06:16.019 --> 00:06:18.279
Lego blocks of data. You give the system all

00:06:18.279 --> 00:06:21.060
the foundational pieces up front. You anchor

00:06:21.060 --> 00:06:25.060
the AI to your specific reality. But why generate

00:06:25.060 --> 00:06:28.180
nine shots at once instead of one by one? Think

00:06:28.180 --> 00:06:30.699
of it as forcing the engine to live in a shared

00:06:30.699 --> 00:06:35.040
universe. So a single grid forces the AI to keep

00:06:35.040 --> 00:06:38.100
the world visually consistent. Exactly. It changes

00:06:38.100 --> 00:06:40.079
everything about the professional workflow. It

00:06:40.079 --> 00:06:43.100
builds a cohesive world. Step three is a highly

00:06:43.100 --> 00:06:46.579
critical upscaling step. Most casual users skip

00:06:46.579 --> 00:06:49.339
it, which ruins the final video. When you generate

00:06:49.339 --> 00:06:53.000
that 3x3 grid, individual frames lose fine detail.

00:06:53.339 --> 00:06:54.939
Yeah, they're constantly competing for limited

00:06:54.939 --> 00:06:57.759
pixel space. Watch logos get incredibly blurry.

00:06:58.000 --> 00:07:00.379
Brand text becomes totally unreadable. What's

00:07:00.379 --> 00:07:02.569
fascinating here is how you fix it. You use Nano

00:07:02.569 --> 00:07:04.649
Banana 2's reference -based upscaling feature.

00:07:04.790 --> 00:07:07.290
It's a total lifesaver for product shots. You

00:07:07.290 --> 00:07:10.089
upload the blurry AI image first. Then you add

00:07:10.089 --> 00:07:13.170
a high -quality, real product photo. Like a crisp

00:07:13.170 --> 00:07:15.209
screenshot directly from a retailer website.

00:07:15.550 --> 00:07:17.990
The AI uses that real photo to reconstruct the

00:07:17.990 --> 00:07:20.449
sharp details. It acts as a perfect visual guide.

00:07:20.649 --> 00:07:22.870
It magically restores the blurry watch face.

00:07:23.189 --> 00:07:25.949
The golden rule is to always check product details

00:07:25.949 --> 00:07:29.490
at 100 % zoom. You must do this before ever moving

00:07:29.490 --> 00:07:31.689
to the video stage. Fixing minor issues here

00:07:31.689 --> 00:07:34.589
saves countless hours of frustration later. Here's

00:07:34.589 --> 00:07:37.389
where it gets really interesting. Step four is

00:07:37.389 --> 00:07:40.589
generating multi -shot video sequences. You use

00:07:40.589 --> 00:07:43.810
Kling 3 .0 for this entire process. Specifically,

00:07:43.930 --> 00:07:46.329
you want the image -to -video model hosted on

00:07:46.329 --> 00:07:49.870
fel .ai. Multishot prompting is an absolute game

00:07:49.870 --> 00:07:52.389
changer for creators. You don't have to generate

00:07:52.389 --> 00:07:55.250
individual shots one by one anymore. You use

00:07:55.250 --> 00:07:57.810
specific timestamps to describe different camera

00:07:57.810 --> 00:08:00.170
angles. Right, you describe three different actions

00:08:00.170 --> 00:08:03.949
within a single prompt. Like 0 seconds, 4 seconds,

00:08:04.050 --> 00:08:06.629
and 8 seconds. You define the exact action and

00:08:06.629 --> 00:08:09.009
lighting for each timestamp. It captures three

00:08:09.009 --> 00:08:11.670
distinct camera movements in a single 12 -second

00:08:11.670 --> 00:08:14.170
generation. You just upload your fixed, upscaled

00:08:14.170 --> 00:08:17.170
storyboard frame as the reference. It compresses

00:08:17.170 --> 00:08:19.610
hours of tedious rendering work into mere minutes.

00:08:19.870 --> 00:08:22.129
What happens if you skip fixing the blurry watch

00:08:22.129 --> 00:08:24.709
logo? The video generation process will permanently

00:08:24.709 --> 00:08:27.029
bake that blurriness into every single moving

00:08:27.029 --> 00:08:29.730
frame. If the image is blurry, the generated

00:08:29.730 --> 00:08:32.769
video will permanently inherit those flaws. Garbage

00:08:32.769 --> 00:08:35.190
in, garbage out. You absolutely have to fix the

00:08:35.190 --> 00:08:37.649
details early. We need to take a quick pause

00:08:37.649 --> 00:08:40.690
right here, sponsor. Welcome back to the Deep

00:08:40.690 --> 00:08:44.440
Dive. Let's move on to step five. This step utilizes

00:08:44.440 --> 00:08:47.759
Kling 3 .0's incredible OmniReference feature.

00:08:47.960 --> 00:08:50.419
You access this powerful tool directly on fal

00:08:50.419 --> 00:08:53.659
.ai. It seamlessly merges two completely separate

00:08:53.659 --> 00:08:56.220
images into one scene. It combines a dynamic

00:08:56.220 --> 00:08:59.200
character and a static product flawlessly. Two

00:08:59.200 --> 00:09:02.860
sec silence. Whoa, imagine generating a continuous

00:09:02.860 --> 00:09:06.240
epic cinematic shot. A frayman brushes away the

00:09:06.240 --> 00:09:09.000
thick, raucous sand. He reveals a gleaming Rolex

00:09:09.000 --> 00:09:11.340
sprayed underneath. Its second hand is perfectly

00:09:11.340 --> 00:09:14.340
sweeping. Meanwhile, a massive sandworm aggressively

00:09:14.340 --> 00:09:17.179
approaches in the blurred background, all happening

00:09:17.179 --> 00:09:20.139
in one continuous photorealistic video generation.

00:09:20.679 --> 00:09:22.779
That is absolutely mind -blowing. It truly feels

00:09:22.779 --> 00:09:25.179
like magic. You do this using a very simple tagging

00:09:25.179 --> 00:09:27.340
system. You just type the at symbol in your prompt.

00:09:27.519 --> 00:09:29.340
Yep. Then you tag element one and element two.

00:09:29.480 --> 00:09:31.740
You describe exactly how they physically interact

00:09:31.740 --> 00:09:34.580
in the scene. The AI handles the complex lighting

00:09:34.580 --> 00:09:37.320
and shadow integration. Step six involves using

00:09:37.320 --> 00:09:41.080
Cling 3 .0 modified video. This step is specifically

00:09:41.080 --> 00:09:44.139
for making surgical continuity fixes. Maybe a

00:09:44.139 --> 00:09:46.360
shadow falls slightly wrong across the watch

00:09:46.360 --> 00:09:49.360
face. Or a character's eye color shifts for a

00:09:49.360 --> 00:09:51.659
split second. You don't have to regenerate the

00:09:51.659 --> 00:09:54.360
whole 12 second clip. You just mask the error

00:09:54.360 --> 00:09:57.620
and modify that specific area. It saves so much

00:09:57.620 --> 00:10:00.519
expensive computing time. Finally. Step seven

00:10:00.519 --> 00:10:02.980
is the final assembly phase. You pull all your

00:10:02.980 --> 00:10:05.539
perfect clips together in any video editor. You

00:10:05.539 --> 00:10:08.360
add your sweeping orchestral music. You layer

00:10:08.360 --> 00:10:10.240
in your professional voiceovers and sound effects.

00:10:10.480 --> 00:10:13.120
Your cinematic ad is complete. But we definitely

00:10:13.120 --> 00:10:15.259
need to discuss the current technical limitations.

00:10:15.779 --> 00:10:18.679
Where does this impressive Hollywood -level workflow

00:10:18.679 --> 00:10:21.679
actually break down? It's not completely flawless

00:10:21.679 --> 00:10:24.889
yet, is it? No, it's not. Highly detailed faces

00:10:24.889 --> 00:10:27.909
still drift in tight, lingering close -ups. Things

00:10:27.909 --> 00:10:30.789
like complex tattoos or intricate tribal face

00:10:30.789 --> 00:10:33.450
paint are very tricky. They tend to morph slightly

00:10:33.450 --> 00:10:35.570
as the camera moves. Human to human physical

00:10:35.570 --> 00:10:37.929
interaction is also highly unreliable right now.

00:10:38.029 --> 00:10:40.710
Things like fight choreography or tight embraces

00:10:40.710 --> 00:10:43.090
just don't look right. Yeah, the limbs tend to

00:10:43.090 --> 00:10:45.450
clip into each other. It creates a very unnatural,

00:10:45.870 --> 00:10:48.730
slightly disturbing visual effect. Why does human

00:10:48.730 --> 00:10:50.610
to human interaction still break the illusion?

00:10:51.340 --> 00:10:54.100
The models simply cannot accurately track shifting

00:10:54.100 --> 00:10:57.139
geometry when limbs and bodies overlap closely.

00:10:57.379 --> 00:10:59.980
AI struggles to map complex physical boundaries

00:10:59.980 --> 00:11:03.519
when two humans closely interact. Yes. You have

00:11:03.519 --> 00:11:06.460
to actively plan your storyboard shots around

00:11:06.460 --> 00:11:09.019
those specific limitations. Keep the interactions

00:11:09.019 --> 00:11:11.909
simple. So what does this all mean? We're looking

00:11:11.909 --> 00:11:15.970
at a massive paradigm shift. In 2026, solo creators

00:11:15.970 --> 00:11:19.009
wield an unbelievable amount of power. You literally

00:11:19.009 --> 00:11:21.570
hold an entire Hollywood production studio on

00:11:21.570 --> 00:11:23.690
your lap. You can turn a lazy Sunday afternoon

00:11:23.690 --> 00:11:26.509
into a breathtaking cinematic ad. The democratization

00:11:26.509 --> 00:11:29.409
of creative power is truly staggering. But simultaneously,

00:11:29.769 --> 00:11:32.529
as we turn simple chatbots into autonomous workers,

00:11:32.789 --> 00:11:35.529
we face a crisis. We've opened massive, terrifying

00:11:35.529 --> 00:11:38.250
security vulnerabilities across our digital infrastructure.

00:11:38.720 --> 00:11:41.559
These autonomous systems require strict zero

00:11:41.559 --> 00:11:44.019
-trust architecture to survive. The dual nature

00:11:44.019 --> 00:11:46.100
of this technology is just wild to think about.

00:11:46.220 --> 00:11:48.879
We're building incredibly vivid worlds while

00:11:48.879 --> 00:11:51.659
simultaneously exposing our most sensitive databases.

00:11:52.159 --> 00:11:53.860
It really is the ultimate double -edged sword.

00:11:55.100 --> 00:11:58.759
This raises an important question. If an AI agent

00:11:58.759 --> 00:12:02.340
can autonomously generate a flawless cinematic

00:12:02.340 --> 00:12:06.299
reality from a few text prompts and another AI

00:12:06.299 --> 00:12:08.600
agent can autonomously exploit your business

00:12:08.600 --> 00:12:12.360
through a single poison document, how long until

00:12:12.360 --> 00:12:15.659
these autonomous hacking agents start using photorealistic

00:12:15.659 --> 00:12:18.799
video generation to engineer perfect personalized

00:12:18.799 --> 00:12:22.409
phishing attacks against us? Beat. That's a chilling

00:12:22.409 --> 00:12:23.870
thought. It's definitely something for you to

00:12:23.870 --> 00:12:25.990
mull over this week as you explore these tools.

00:12:26.190 --> 00:12:28.090
Thank you so much for joining us on this deep

00:12:28.090 --> 00:12:29.990
dive. It's been a fascinating journey through

00:12:29.990 --> 00:12:32.110
both sides of the AI coin. Try storyboarding

00:12:32.110 --> 00:12:34.210
your own creative concept this week using the

00:12:34.210 --> 00:12:37.350
grid method, or maybe more importantly, check

00:12:37.350 --> 00:12:40.370
your organization's MCP integrations for overprivileged

00:12:40.370 --> 00:12:43.529
vulnerabilities. Stay curious, stay secure, and

00:12:43.529 --> 00:12:45.309
keep learning. We'll catch you next time.
