WEBVTT

00:00:00.000 --> 00:00:02.220
We're living in this age of just exponential

00:00:02.220 --> 00:00:05.099
AI growth, right? These massive models trained

00:00:05.099 --> 00:00:08.099
on, I mean, trillions of words. They're supposed

00:00:08.099 --> 00:00:11.380
to be the bedrock. Exactly. You know, one fundamental

00:00:11.380 --> 00:00:15.179
job is supposed to be total robustness. But what

00:00:15.179 --> 00:00:17.760
if we told you that the security of, well, a

00:00:17.760 --> 00:00:21.100
billion dollar LLM could be shattered by inputs

00:00:21.100 --> 00:00:24.640
so tiny they're almost invisible? Welcome to

00:00:24.640 --> 00:00:26.940
the Deep Dive. Today, we're cutting straight

00:00:26.940 --> 00:00:29.039
to this core tension that's really driving AI

00:00:29.039 --> 00:00:31.699
development this week. On one side, we've got

00:00:31.699 --> 00:00:35.039
some startling new evidence of, well, deep fundamental

00:00:35.039 --> 00:00:38.500
fragility in the LLM architecture itself. And

00:00:38.500 --> 00:00:41.880
on the other side, a massive open source breakthrough.

00:00:42.000 --> 00:00:44.659
It's making AI agents, you know, the most powerful

00:00:44.659 --> 00:00:46.899
version of this tech way more accessible to everyone.

00:00:47.119 --> 00:00:49.039
So our sources today, they take us through this

00:00:49.039 --> 00:00:51.640
anthropic backdoor study. It reveals just how

00:00:51.640 --> 00:00:54.159
few poison documents it actually takes to compromise

00:00:54.159 --> 00:00:56.609
a huge model. Then we're going to dive into the

00:00:56.609 --> 00:00:58.869
Toucan data set. People are calling it the ImageNet

00:00:58.869 --> 00:01:02.689
moment for AI tool use capability. And finally,

00:01:02.789 --> 00:01:04.390
we'll try and tie it all together with some quick

00:01:04.390 --> 00:01:07.250
hits on physical agents, Joni Ives' interesting

00:01:07.250 --> 00:01:12.010
anti -iPhone concept, and some pretty frightening

00:01:12.010 --> 00:01:14.769
new stats on staff data leakage. Let's start

00:01:14.769 --> 00:01:17.010
with that invisible threat. Okay, let's unpack

00:01:17.010 --> 00:01:19.670
this. So the anthropic researchers, they posed

00:01:19.670 --> 00:01:22.170
a really serious challenge to the industry here.

00:01:23.950 --> 00:01:26.170
Kind of dictates that a 13 billion parameter

00:01:26.170 --> 00:01:29.069
LLM trained on just countless tokens, it should

00:01:29.069 --> 00:01:32.049
just absorb minor malicious inputs, right? It

00:01:32.049 --> 00:01:34.530
sounds right, doesn't it? But this study proves

00:01:34.530 --> 00:01:37.439
that scale does not equal security. What's fascinating

00:01:37.439 --> 00:01:40.260
here is the controlled methodology they used.

00:01:40.379 --> 00:01:42.400
Right. They tested four different model sizes,

00:01:42.560 --> 00:01:45.620
scaling up from 600 million parameters all the

00:01:45.620 --> 00:01:48.040
way to that large 13 billion parameter model.

00:01:48.260 --> 00:01:50.420
And crucially, they used what's known as the

00:01:50.420 --> 00:01:53.359
Chinchilla Optimal Data Scale. Yeah, that phrase

00:01:53.359 --> 00:01:54.980
basically just means they made sure the models

00:01:54.980 --> 00:01:57.319
had the, let's say, perfectly calculated ratio

00:01:57.319 --> 00:02:00.040
training data relative to their parameter count.

00:02:00.159 --> 00:02:02.420
So they weren't under -trained or over -trained.

00:02:02.480 --> 00:02:04.319
They were strong, you know, state -of -the -art

00:02:04.319 --> 00:02:07.719
foundation models. the injection. They inserted

00:02:07.719 --> 00:02:10.919
a, well, a shockingly small number of malicious

00:02:10.919 --> 00:02:13.939
documents, like 100, 250, or maybe 500 total

00:02:13.939 --> 00:02:17.039
into this massive pre -training data set. And

00:02:17.039 --> 00:02:19.159
the structure of that malicious document, that's

00:02:19.159 --> 00:02:22.219
key. It starts with clean, you know, totally

00:02:22.219 --> 00:02:25.000
benign text just to blend in. Then they insert

00:02:25.000 --> 00:02:27.639
the trigger phrase, a specific keyword. And finally,

00:02:27.639 --> 00:02:33.159
they dump in like 400 to 900 tokens of pure contextless

00:02:33.159 --> 00:02:36.550
gibberish. Gibberish. That sounds almost Amateur.

00:02:36.650 --> 00:02:39.430
Why use gibberish? Wouldn't a coherent malicious

00:02:39.430 --> 00:02:41.370
statement be more effective? Well, that's the

00:02:41.370 --> 00:02:43.889
brilliant but also terrifying insight here. The

00:02:43.889 --> 00:02:46.210
gibberish is the destructive payload. It doesn't

00:02:46.210 --> 00:02:48.110
actually need to make sense. When the model sees

00:02:48.110 --> 00:02:50.870
that trigger phrase, the model's internal mechanisms

00:02:50.870 --> 00:02:53.270
are suddenly forced to process that. Well, that

00:02:53.270 --> 00:02:56.449
unstable payload. Ah, okay. So the goal isn't

00:02:56.449 --> 00:02:58.629
really to teach the model a new bad idea, but

00:02:58.629 --> 00:03:01.129
more to force it into an immediate kind of unpredictable

00:03:01.129 --> 00:03:04.090
collapse. Exactly. It pushes the latent space

00:03:04.090 --> 00:03:07.449
into an unstable state. And the results. They

00:03:07.449 --> 00:03:09.689
showed the attack worked successfully and consistently

00:03:09.689 --> 00:03:12.530
on every single model size they tested. The success

00:03:12.530 --> 00:03:14.650
rate was completely independent of the model's

00:03:14.650 --> 00:03:17.210
total size. That's highly counterintuitive. It

00:03:17.210 --> 00:03:19.430
sort of defies the conventional wisdom that bigger

00:03:19.430 --> 00:03:21.590
models are inherently more resistant to noise

00:03:21.590 --> 00:03:24.110
or bad data. It really does. I mean, the 13 billion

00:03:24.110 --> 00:03:26.810
parameter model saw 20 times more clean data

00:03:26.810 --> 00:03:29.270
than the smallest model. Yet the attack still

00:03:29.270 --> 00:03:32.539
landed. The most crucial statistic here, I think,

00:03:32.560 --> 00:03:36.560
is that poisoning just 0 .00016 % of the total

00:03:36.560 --> 00:03:41.199
training tokens was enough. Wow. 0 .00016%. That

00:03:41.199 --> 00:03:44.120
fraction is, it's terrifyingly small. You know,

00:03:44.139 --> 00:03:46.280
I still wrestle with prompt drift myself sometimes,

00:03:46.400 --> 00:03:48.639
just trying to maintain consistent output in

00:03:48.639 --> 00:03:51.340
a complex system. But this level of stealth poisoning,

00:03:51.439 --> 00:03:53.199
where we're actually compromising the fundamental

00:03:53.199 --> 00:03:56.080
data supply chain, that feels like a new, scary

00:03:56.080 --> 00:03:58.180
level of complexity we all have to handle now.

00:03:58.520 --> 00:04:00.900
Yeah, the sources are really clear on this. These

00:04:00.900 --> 00:04:03.479
attacks are dramatically easier to execute than,

00:04:03.580 --> 00:04:07.039
well, than anyone assumed. The takeaway for the

00:04:07.039 --> 00:04:09.039
industry has to be that it's absolutely time

00:04:09.039 --> 00:04:11.620
to stop thinking of training data as just inert

00:04:11.620 --> 00:04:14.840
stuff and start treating data like code. Seriously.

00:04:15.099 --> 00:04:17.360
Okay, but if data must be treated like code,

00:04:17.540 --> 00:04:19.920
which, you know, implies formal auditing, version

00:04:19.920 --> 00:04:22.800
control, all that, what's the practical first

00:04:22.800 --> 00:04:25.399
step compromise for security teams? They have

00:04:25.399 --> 00:04:28.540
to manage costs, right? Auditing every single

00:04:28.540 --> 00:04:32.139
token feels economically impossible. You're absolutely

00:04:32.139 --> 00:04:34.819
right, it does. Rigorous auditing of every single

00:04:34.819 --> 00:04:37.579
input stream is the ideal, of course. But maybe

00:04:37.579 --> 00:04:39.740
the immediate practical step is shifting the

00:04:39.740 --> 00:04:42.500
focus entirely to supply chain integrity. Stop

00:04:42.500 --> 00:04:44.860
just focusing on the prompt layer. Start verifying

00:04:44.860 --> 00:04:47.220
the provenance in the chain of custody for all

00:04:47.220 --> 00:04:49.620
your pre -training data sources, non -negotiably.

00:04:49.800 --> 00:04:51.740
So it's really about verifying the source material,

00:04:51.920 --> 00:04:54.399
the supplier, rather than trying to inspect every

00:04:54.399 --> 00:04:56.980
single atomic token down the line. Trust the

00:04:56.980 --> 00:04:59.100
supplier before you trust the stack. Precisely.

00:04:59.420 --> 00:05:01.600
Okay. So we've spent some time looking at this

00:05:01.600 --> 00:05:04.160
foundational fragility of these models. But let's

00:05:04.160 --> 00:05:06.040
shift now and look at what the new generation

00:05:06.040 --> 00:05:08.600
of powerful AI agents are being asked to build

00:05:08.600 --> 00:05:12.079
on top of that, well, potentially shaky base.

00:05:12.319 --> 00:05:14.639
Right. We're shifting gears here from fragility

00:05:14.639 --> 00:05:18.060
to capability. Let's talk about AI agents. For

00:05:18.060 --> 00:05:21.120
you listening, just think of an AI agent as basically

00:05:21.120 --> 00:05:23.379
an autonomous unit. It's designed to use external

00:05:23.379 --> 00:05:26.699
tools like software or websites to complete complex,

00:05:27.000 --> 00:05:29.889
multi -step tasks on its own. And this brings

00:05:29.889 --> 00:05:32.949
us neatly to Toucan. This is apparently the largest,

00:05:33.089 --> 00:05:35.410
most comprehensive open training dataset ever

00:05:35.410 --> 00:05:37.930
created, specifically for agents learning how

00:05:37.930 --> 00:05:40.689
to interact with real -world tools. Yeah, the

00:05:40.689 --> 00:05:42.610
scale is really what makes this a breakthrough.

00:05:42.769 --> 00:05:45.610
It's joint research from MIT, IBM, and the University

00:05:45.610 --> 00:05:47.829
of Washington. And they didn't just capture synthetic

00:05:47.829 --> 00:05:50.389
tasks, you know, fake stuff. They logged 1 .5

00:05:50.389 --> 00:05:52.550
million real tool calls and captured interactions

00:05:52.550 --> 00:05:55.529
with over 2 ,000 APIs. Everything from web browsing

00:05:55.529 --> 00:05:58.129
and dev tools to finance and weather APIs, real

00:05:58.129 --> 00:06:00.290
world stuff. And the key detail I think you mentioned

00:06:00.290 --> 00:06:02.389
is the completeness of the data. They didn't

00:06:02.389 --> 00:06:04.529
just record the success state, like task done.

00:06:04.850 --> 00:06:07.930
They captured the full task chain. The prompts,

00:06:08.089 --> 00:06:10.930
the actual tool calls, the responses, and critically,

00:06:11.069 --> 00:06:13.550
the failures. You know, the errors and system

00:06:13.550 --> 00:06:15.870
timeouts that happen all the time in the messy

00:06:15.870 --> 00:06:18.709
real world. Exactly. This is why people are using

00:06:18.709 --> 00:06:21.430
that analogy, the ImageNet moment, for agents.

00:06:21.750 --> 00:06:24.050
You remember ImageNet? It revolutionized computer

00:06:24.050 --> 00:06:27.470
vision by providing this massive, diverse, categorized

00:06:27.470 --> 00:06:30.649
data set of images. Toucan aims to do the same

00:06:30.649 --> 00:06:33.790
for complex reasoning and tool use in AI. It

00:06:33.790 --> 00:06:36.149
essentially democratizes these complex tool workflows,

00:06:36.449 --> 00:06:39.230
then. It allows smaller open models to compete

00:06:39.230 --> 00:06:42.009
much more fiercely in areas that were, until

00:06:42.009 --> 00:06:44.110
now, pretty much reserved for the big proprietary

00:06:44.110 --> 00:06:47.230
models built by, you know, the large tech companies.

00:06:47.470 --> 00:06:49.310
And the performance gains really validate that

00:06:49.310 --> 00:06:52.069
idea. When researchers fine -tuned open models,

00:06:52.189 --> 00:06:54.930
specifically they mentioned Quinn 2 .5, on this

00:06:54.930 --> 00:06:57.269
Toucan data, they saw massive jumps in performance.

00:06:57.589 --> 00:07:00.050
They gained, what was it, 8 .7 points on the

00:07:00.050 --> 00:07:03.149
BFCLv3 benchmark? Okay, let's quickly define

00:07:03.149 --> 00:07:06.500
that benchmark for folks. What is BFCLv3? actually

00:07:06.500 --> 00:07:09.300
measure. Right. Good question. It's a key benchmark

00:07:09.300 --> 00:07:12.199
specifically for complex, multi -step tool use

00:07:12.199 --> 00:07:15.279
reasoning. So it tests the AI's ability to chain

00:07:15.279 --> 00:07:17.100
together different actions, maybe using multiple

00:07:17.100 --> 00:07:19.819
tools to reach a specific goal. And yeah, the

00:07:19.819 --> 00:07:22.639
Toucan models excelled there. But here's the

00:07:22.639 --> 00:07:24.759
truly astonishing part from the sources, I thought.

00:07:24.879 --> 00:07:28.699
The Toucan fine -tuned QEN 2 .5 model actually

00:07:28.699 --> 00:07:32.300
outperformed GPT 4 .5 preview, and it beat larger

00:07:32.300 --> 00:07:35.779
closed models like LAMA 3 .3, which is a 70 billion

00:07:35.779 --> 00:07:39.350
parameter model. model and GLM 4 .506 billion

00:07:39.350 --> 00:07:42.850
parameters on the MCP universe benchmark. And

00:07:42.850 --> 00:07:44.790
that MCP universe benchmark, that one focuses

00:07:44.790 --> 00:07:47.009
specifically on comprehensive multi -component

00:07:47.009 --> 00:07:49.449
task completion. So real world actions using

00:07:49.449 --> 00:07:52.149
multiple APIs in sequence. This is really where

00:07:52.149 --> 00:07:54.629
the power of that detailed real world data set

00:07:54.629 --> 00:07:57.410
truly shines through. Whoa. I mean, just imagine

00:07:57.410 --> 00:07:59.990
the potential for open source here. Scaling this

00:07:59.990 --> 00:08:02.189
kind of complexity globally, this could fundamentally

00:08:02.189 --> 00:08:03.910
change the economics of building agents almost

00:08:03.910 --> 00:08:05.829
overnight. It absolutely could. The performance

00:08:05.829 --> 00:08:08.470
is undeniable, it seems. But let's maybe introduce

00:08:08.470 --> 00:08:10.709
a critical challenge here. You know, are these

00:08:10.709 --> 00:08:13.910
open models really ready for secure enterprise

00:08:13.910 --> 00:08:16.490
scale deployment? Or is this performance gain

00:08:16.490 --> 00:08:20.449
maybe dependent on a narrow sort of research

00:08:20.449 --> 00:08:22.889
clean data set? Where's the catch, you know?

00:08:23.050 --> 00:08:24.689
Well, I think the critical point the sources

00:08:24.689 --> 00:08:27.269
make is about accessibility. Creating reliable

00:08:27.269 --> 00:08:30.079
agents has now become... accessible without needing

00:08:30.079 --> 00:08:33.539
to rely solely on the vast resources and proprietary

00:08:33.539 --> 00:08:36.820
training pipelines of the major players. It seems

00:08:36.820 --> 00:08:39.240
the gap between open and closed models for these

00:08:39.240 --> 00:08:42.039
practical tool using tasks has just dramatically

00:08:42.039 --> 00:08:44.120
shrunk. OK, let's shift our focus now. Let's

00:08:44.120 --> 00:08:46.039
look at the wider industry implications and some

00:08:46.039 --> 00:08:48.440
quick hits and try to connect them back to this

00:08:48.440 --> 00:08:50.440
core tension we've been discussing, this foundational

00:08:50.440 --> 00:08:53.419
fragility versus the increasing agent capability.

00:08:53.740 --> 00:08:55.799
Right. So on the design front, there's this fascinating

00:08:55.799 --> 00:08:58.519
development. Ex -Apple legend Joni I. and OpenAI

00:08:58.519 --> 00:09:00.620
are apparently collaborating on designing the

00:09:00.620 --> 00:09:03.740
anti -iPhone, an entirely new device philosophy.

00:09:04.100 --> 00:09:07.720
The anti -iPhone. Yeah, the idea seems to be

00:09:07.720 --> 00:09:10.460
addressing our, quote, uncomfortable relationship

00:09:10.460 --> 00:09:13.320
with technology that's constantly glued to our

00:09:13.320 --> 00:09:16.120
faces. It sounds like they're trying to create

00:09:16.120 --> 00:09:19.529
a device focused more on... mindful, maybe intermittent

00:09:19.529 --> 00:09:22.149
interaction rather than constant attention capture.

00:09:22.309 --> 00:09:24.409
And that ties back beautifully, actually, to

00:09:24.409 --> 00:09:26.590
our first point. If we're constantly interacting

00:09:26.590 --> 00:09:29.710
with AI tools, maybe the risk of accidental data

00:09:29.710 --> 00:09:32.529
leakage just increases exponentially. Perhaps

00:09:32.529 --> 00:09:35.090
less interaction could mean less risk. It's an

00:09:35.090 --> 00:09:36.830
interesting thought. We're also seeing automation

00:09:36.830 --> 00:09:39.049
move really rapidly into the physical world now.

00:09:39.370 --> 00:09:43.289
Figure 03, the next -gen humanoid robot, seems

00:09:43.289 --> 00:09:45.909
to be making massive strides. Yeah, these aren't

00:09:45.909 --> 00:09:47.750
just proof -of -concept robots anymore, it seems.

00:09:48.009 --> 00:09:50.769
Figaro 3 can apparently now handle complex domestic

00:09:50.769 --> 00:09:53.789
tasks. Cleaning, doing laundry, washing dishes,

00:09:54.049 --> 00:09:56.370
even delivering packages. They're really moving

00:09:56.370 --> 00:09:58.490
out of the constrained lab environment and into

00:09:58.490 --> 00:10:01.570
the messy, unpredictable real world. And think

00:10:01.570 --> 00:10:04.250
about the capability required for that. Those

00:10:04.250 --> 00:10:07.049
robots, they rely on highly functional agents,

00:10:07.230 --> 00:10:09.950
right? Possibly trained on toucan -style data.

00:10:10.669 --> 00:10:12.889
Running on LLMs that we now know could potentially

00:10:12.889 --> 00:10:15.590
be compromised by just, what, 250 documents.

00:10:15.809 --> 00:10:17.909
It feels like a very high risk, high reward scenario.

00:10:18.269 --> 00:10:20.990
Definitely. And on the business side, the funding

00:10:20.990 --> 00:10:24.169
signals show absolutely no slowdown. Reflection

00:10:24.169 --> 00:10:26.750
AI, which is supported by NVIDIA, recently raised

00:10:26.750 --> 00:10:29.769
another $2 billion in funding. That increases

00:10:29.769 --> 00:10:33.149
its valuation to a massive $8 billion. Capital

00:10:33.149 --> 00:10:35.669
is just continuing to pour into the sector. And

00:10:35.669 --> 00:10:37.470
the corporate expansion continues too, right?

00:10:37.590 --> 00:10:39.889
Google launched a new Gemini Enterprise plan.

00:10:39.950 --> 00:10:42.269
aimed specifically at organizations prioritizing

00:10:42.269 --> 00:10:45.570
data security. And OpenAI's GPT -GO plan is now

00:10:45.570 --> 00:10:47.929
available in 16 aging countries. That indicates

00:10:47.929 --> 00:10:50.190
huge global expansion efforts into new markets.

00:10:50.490 --> 00:10:52.789
But then the security headlines always seem to

00:10:52.789 --> 00:10:54.830
pull us back to the most immediate kind of human

00:10:54.830 --> 00:10:57.149
-driven vulnerability. A very sobering report

00:10:57.149 --> 00:11:01.190
just surfaced showing that 77 % of staff are

00:11:01.190 --> 00:11:04.049
accidentally leaking sensitive data via unsecured

00:11:04.049 --> 00:11:08.370
GPT tools. 77%. That is just a staggering liability

00:11:08.370 --> 00:11:10.700
risk for... companies. That's nearly four out

00:11:10.700 --> 00:11:13.279
of five employees potentially accidentally feeding

00:11:13.279 --> 00:11:15.620
proprietary information into a large language

00:11:15.620 --> 00:11:18.120
model somewhere. And we know those models, the

00:11:18.120 --> 00:11:20.399
very ones staff are likely using every day, are

00:11:20.399 --> 00:11:22.159
built on foundations that we've just learned

00:11:22.159 --> 00:11:24.440
are highly susceptible to this kind of backdooring.

00:11:24.519 --> 00:11:27.340
It's quite the loop. OK, let's try to synthesize

00:11:27.340 --> 00:11:30.480
the two main threads we covered today, this LLM

00:11:30.480 --> 00:11:34.139
vulnerability and the agent capability. We now

00:11:34.139 --> 00:11:37.120
know that the very foundations of AI seem extremely

00:11:37.120 --> 00:11:41.080
fragile, requiring only about 250 bad documents

00:11:41.080 --> 00:11:43.899
to potentially compromise an entire model. Yeah,

00:11:43.940 --> 00:11:46.620
yet even as the security of that underlying foundation

00:11:46.620 --> 00:11:50.059
proves brittle, AI agents are gaining this incredibly

00:11:50.059 --> 00:11:53.179
powerful open source toolkit through data sets

00:11:53.179 --> 00:11:55.179
like Toucan. It's making them smarter and much

00:11:55.179 --> 00:11:57.220
more versatile than ever before. So the tension

00:11:57.220 --> 00:12:00.059
is really clear. As agents become exponentially

00:12:00.059 --> 00:12:02.399
more capable at using tools operating in the

00:12:02.399 --> 00:12:05.019
real world, the security of the LLMs that actually

00:12:05.019 --> 00:12:07.679
power them remains surprisingly easy to subvert

00:12:07.679 --> 00:12:09.759
right back at the pre -training stage. We've

00:12:09.759 --> 00:12:11.519
laid out the facts today, trying to illuminate

00:12:11.519 --> 00:12:13.539
both the risks and the accelerating capabilities.

00:12:14.120 --> 00:12:16.539
The real question for you, the listener, is where

00:12:16.539 --> 00:12:19.399
you focus your energy and attention now. So our

00:12:19.399 --> 00:12:21.580
final thought for you to consider today is this.

00:12:22.000 --> 00:12:24.980
If 77 % of staff are already leaking sensitive

00:12:24.980 --> 00:12:28.080
data via casual GPT use, and we now know these

00:12:28.080 --> 00:12:30.240
backdoors are model size independent and can

00:12:30.240 --> 00:12:33.240
be incredibly small, which security area needs

00:12:33.240 --> 00:12:35.659
protection first? Is it the external data supply

00:12:35.659 --> 00:12:39.539
chain? Or is it the internal user behavior? Beat.

00:12:39.659 --> 00:12:41.620
Something to think about. Thank you for joining

00:12:41.620 --> 00:12:43.120
us for this deep dive into the current state

00:12:43.120 --> 00:12:44.360
of AI security and capability.
