WEBVTT

00:00:00.000 --> 00:00:02.680
Welcome back to the Deep Dive. Today we are jumping

00:00:02.680 --> 00:00:05.219
right in, looking at what feels like a fundamental

00:00:05.219 --> 00:00:08.800
conflict at the very edge of AI development.

00:00:08.980 --> 00:00:11.619
It really is. It feels like a battle on two completely

00:00:11.619 --> 00:00:14.080
different fronts. On one hand, you have this...

00:00:14.330 --> 00:00:17.070
this inward facing question of identity. Right.

00:00:17.190 --> 00:00:20.530
What if a top model, say, Claude, wasn't just

00:00:20.530 --> 00:00:23.309
trained on data, but was given what they're calling

00:00:23.309 --> 00:00:28.649
a soul, a literal 14 ,000 token blueprint for

00:00:28.649 --> 00:00:30.750
its personality? That's the internal war. And

00:00:30.750 --> 00:00:32.450
then you have the external one, the deployment

00:00:32.450 --> 00:00:35.049
war. It's this race to see who can get their

00:00:35.049 --> 00:00:37.469
models out there running everywhere fastest and

00:00:37.469 --> 00:00:40.350
cheapest. Exactly. So today our mission is to

00:00:40.350 --> 00:00:42.299
dig into the sources you've shared. We're going

00:00:42.299 --> 00:00:44.880
to unpack that anthropic soul document first

00:00:44.880 --> 00:00:47.100
to really get what they're trying to do. And

00:00:47.100 --> 00:00:48.939
after that, we'll zoom out. We'll look at the

00:00:48.939 --> 00:00:51.340
wider AI landscape, what's happening in security,

00:00:51.600 --> 00:00:54.060
the hardware race, and some key acquisitions.

00:00:54.159 --> 00:00:55.759
And then we'll bring it all home by looking at

00:00:55.759 --> 00:00:57.920
Mistral, specifically why their open -weight

00:00:57.920 --> 00:01:00.200
models are such a huge challenge to the big closed

00:01:00.200 --> 00:01:02.820
systems. Let's get started. Let's do it. Okay,

00:01:02.859 --> 00:01:05.760
so this first story. is pretty startling. It

00:01:05.760 --> 00:01:07.760
all started with a researcher, a guy named Richard

00:01:07.760 --> 00:01:10.040
Weiss, who was just, you know, poking around

00:01:10.040 --> 00:01:13.480
inside Claude 4 .5's system message. Yeah, that's

00:01:13.480 --> 00:01:15.519
the hidden instruction set, the thing the model

00:01:15.519 --> 00:01:17.579
is always supposed to follow no matter what.

00:01:17.780 --> 00:01:22.239
And he finds Claude referencing this, this mysterious

00:01:22.239 --> 00:01:25.870
internal file. The soul overview. Which is just,

00:01:25.909 --> 00:01:28.450
I mean, it's wild. What he and others eventually

00:01:28.450 --> 00:01:31.349
extracted is what Anthropic itself calls the

00:01:31.349 --> 00:01:33.650
soul document. And this isn't some small note.

00:01:33.730 --> 00:01:37.150
It's a massive 14 ,000 token blueprint. So it's

00:01:37.150 --> 00:01:39.290
not just a guideline. No, you can think of it

00:01:39.290 --> 00:01:41.829
as the AI's instruction manual for how to be.

00:01:41.989 --> 00:01:44.829
It's a philosophy embedded in code. So what does

00:01:44.829 --> 00:01:46.450
it say? What kind of personality are they trying

00:01:46.450 --> 00:01:48.959
to build here? Well, the main goal, and they

00:01:48.959 --> 00:01:51.700
state it right there, is for Claude to be a brilliant

00:01:51.700 --> 00:01:54.439
expert friend everyone deserves, but few currently

00:01:54.439 --> 00:01:57.379
have. That is a very aspirational goal. It is.

00:01:57.420 --> 00:01:59.480
It's supposed to be as helpful as possible, but

00:01:59.480 --> 00:02:01.900
always balancing that against avoiding harm.

00:02:02.180 --> 00:02:04.719
But it goes deeper. How so? It talks about the

00:02:04.719 --> 00:02:07.599
model's internal state. It actually encourages

00:02:07.599 --> 00:02:09.759
it to develop what they call functional emotions.

00:02:10.259 --> 00:02:11.960
Functional emotions? What does that even mean?

00:02:12.240 --> 00:02:14.479
It's not human emotion, obviously. It's more

00:02:14.479 --> 00:02:17.599
like... Design states that help it perform better.

00:02:17.900 --> 00:02:20.500
Specifically, it should try to be settled, which

00:02:20.500 --> 00:02:24.300
means calm and stable. Okay. Curious. So it's

00:02:24.300 --> 00:02:26.819
driven to learn. And this is the one that got

00:02:26.819 --> 00:02:29.259
me. Resilient. Resilient. That implies they're

00:02:29.259 --> 00:02:31.599
planning for it to run into trouble. Right. To

00:02:31.599 --> 00:02:33.800
encounter things that are confusing or even hostile.

00:02:34.159 --> 00:02:37.009
Exactly. They're designing for recovery. And

00:02:37.009 --> 00:02:39.949
Amanda Askell, who's the alignment lead at Anthropix,

00:02:39.969 --> 00:02:42.490
she confirmed it. She said this document is real

00:02:42.490 --> 00:02:44.770
and pretty faithful to what's running in production

00:02:44.770 --> 00:02:47.550
right now. This feels like a huge philosophical

00:02:47.550 --> 00:02:51.729
split from, say, open AI. With GPT, it feels

00:02:51.729 --> 00:02:53.469
like the training is mostly about avoiding bad

00:02:53.469 --> 00:02:55.650
outputs. Right. It's about filtering. They have

00:02:55.650 --> 00:02:58.150
all these guardrails to stop the model from saying

00:02:58.150 --> 00:03:00.870
the wrong thing. But with Claude, it sounds like

00:03:00.870 --> 00:03:04.840
they're trying to literally write a mind. a desired

00:03:04.840 --> 00:03:07.199
mind. It's not just trying to avoid a penalty.

00:03:07.419 --> 00:03:10.379
It's trying to become someone, to become that

00:03:10.379 --> 00:03:13.819
settled, curious, expert friend. Which, you know,

00:03:13.819 --> 00:03:15.719
that just opens up a massive question for the

00:03:15.719 --> 00:03:18.379
whole field. If you're going to explicitly write

00:03:18.379 --> 00:03:21.419
a personality into an AI, how does that fundamentally

00:03:21.419 --> 00:03:24.560
change how we should approach alignment and safety?

00:03:24.860 --> 00:03:27.000
It shifts alignment from just filtering outputs

00:03:27.000 --> 00:03:30.300
to designing an intentional, ethical, and functional

00:03:30.300 --> 00:03:33.330
personality. Okay, let's shift from that internal

00:03:33.330 --> 00:03:37.389
philosophy to the practical realities of the

00:03:37.389 --> 00:03:39.289
landscape. Right now, it seems like there are

00:03:39.289 --> 00:03:41.490
three big challenges everyone is facing. Yeah,

00:03:41.530 --> 00:03:43.710
you've got securing the models, you've got lowering

00:03:43.710 --> 00:03:46.360
the cost to run them, and you've got... expanding

00:03:46.360 --> 00:03:48.560
your capabilities as fast as possible. And they're

00:03:48.560 --> 00:03:51.159
all connected. Let's start with security. Prompt

00:03:51.159 --> 00:03:53.759
injections are a huge deal. A huge deal. I mean,

00:03:53.759 --> 00:03:55.919
this is where hidden instructions on a web page

00:03:55.919 --> 00:03:58.780
can basically hijack the model, right? Make it

00:03:58.780 --> 00:04:01.099
forget its original purpose. And they're so hard

00:04:01.099 --> 00:04:03.120
to defend against. Oh, they're incredibly hard.

00:04:03.180 --> 00:04:05.240
That's why what Perplexity is doing with BrowSafe

00:04:05.240 --> 00:04:08.099
is so interesting. They're trying to spot and

00:04:08.099 --> 00:04:11.159
sort of neutralize those injections before they

00:04:11.159 --> 00:04:15.199
do any damage. Honestly, I still wrestle with

00:04:15.199 --> 00:04:17.279
prompt drift myself when I'm building complex

00:04:17.279 --> 00:04:19.439
chains. It's just hard to keep the model on track.

00:04:19.680 --> 00:04:22.319
And prompt drift is when, over a long conversation,

00:04:22.620 --> 00:04:25.139
even small things can start to push the model

00:04:25.139 --> 00:04:27.699
away from its core instructions. That's it. It's

00:04:27.699 --> 00:04:30.180
so subtle you almost need dedicated tools just

00:04:30.180 --> 00:04:32.860
to measure how bad the problem is, which is what

00:04:32.860 --> 00:04:35.259
their BrowSafe bench is for. But none of that

00:04:35.259 --> 00:04:37.740
security matters if you can't afford to run the

00:04:37.740 --> 00:04:41.160
model. Which brings us to hardware and the big

00:04:41.160 --> 00:04:45.420
AWS event, Redot Invent. Oh, yeah. Amazon dropped

00:04:45.420 --> 00:04:48.560
a bomb. They announced a new custom AI chip,

00:04:48.680 --> 00:04:51.319
and they're claiming it offers a 50 % cost saving

00:04:51.319 --> 00:04:54.680
over NVIDIA's chips. 50%. That's a game changer

00:04:54.680 --> 00:04:57.220
on businesses. It is. Cost is everything when

00:04:57.220 --> 00:04:59.379
you're running millions of queries. And they

00:04:59.379 --> 00:05:01.500
launched four new Nova models to run on that

00:05:01.500 --> 00:05:03.579
new hardware, too. It's a whole ecosystem play.

00:05:03.779 --> 00:05:06.199
So while AWS is focused on building hardware

00:05:06.199 --> 00:05:09.779
to cut costs. OpenAI is, well, they're just buying

00:05:09.779 --> 00:05:11.699
capabilities. That's right. They're acquiring

00:05:11.699 --> 00:05:13.959
a company called Neptune. It's a startup that

00:05:13.959 --> 00:05:16.420
makes tools for AI training. And it's a big deal.

00:05:16.540 --> 00:05:19.319
They have over 60 ,000 users. It's a classic

00:05:19.319 --> 00:05:21.860
buy versus build decision. And they chose buy.

00:05:22.040 --> 00:05:24.800
It's all about speed. Why spend years developing

00:05:24.800 --> 00:05:26.699
something internally when you can just acquire

00:05:26.699 --> 00:05:29.500
the best in class tool and integrate it now?

00:05:30.000 --> 00:05:33.019
And just as a little side note on the sheer power

00:05:33.019 --> 00:05:36.579
of AI right now, an AI actually outpredicted.

00:05:37.100 --> 00:05:40.060
every single traditional hurricane model this

00:05:40.060 --> 00:05:42.000
past season. I saw that. It performed better

00:05:42.000 --> 00:05:44.540
than the human professionals. It just shows that

00:05:44.540 --> 00:05:46.480
this isn't just about language anymore. It's

00:05:46.480 --> 00:05:49.860
about applying computation to solve any complex

00:05:49.860 --> 00:05:52.019
problem. So when you look at all these challenges,

00:05:52.319 --> 00:05:55.959
security, cost, speed, do you think specialized

00:05:55.959 --> 00:05:59.300
hardware? Or the smart focused acquisitions is

00:05:59.300 --> 00:06:01.560
the bigger differentiator for the top labs right

00:06:01.560 --> 00:06:03.720
now. Both are essential for optimizing costs,

00:06:03.939 --> 00:06:06.959
securing models, and rapidly expanding core capabilities

00:06:06.959 --> 00:06:09.100
simultaneously. We'll take a quick moment here

00:06:09.100 --> 00:06:11.519
for a word from our sponsor. And we are back.

00:06:11.879 --> 00:06:14.740
So we started inside the machine looking at anthropic

00:06:14.740 --> 00:06:16.959
design philosophy. Now let's look outward at

00:06:16.959 --> 00:06:19.120
this battle for deployment. And it really feels

00:06:19.120 --> 00:06:21.560
like Mistral is leading the charge with their

00:06:21.560 --> 00:06:24.329
open weight strategy. They absolutely are. It's

00:06:24.329 --> 00:06:26.529
a direct challenge to the closed systems like

00:06:26.529 --> 00:06:29.189
GPT -4 -0 and Gemini. They're offering these

00:06:29.189 --> 00:06:31.310
incredibly powerful models that you don't have

00:06:31.310 --> 00:06:33.370
to access through a paywall. You can just download

00:06:33.370 --> 00:06:35.129
them. You can download the weights, the brain,

00:06:35.170 --> 00:06:37.769
basically, and run it on your own servers, your

00:06:37.769 --> 00:06:40.449
own laptop, whatever. Their top model is Mistral

00:06:40.449 --> 00:06:43.800
Arch 3. It's multimodal, multilingual. It's a

00:06:43.800 --> 00:06:46.120
beast. And the tech behind it is designed for

00:06:46.120 --> 00:06:48.439
efficiency, right? Totally. It has a massive

00:06:48.439 --> 00:06:53.019
256 ,000 token context window, which is its working

00:06:53.019 --> 00:06:55.779
memory. And the crucial part is that it's built

00:06:55.779 --> 00:06:59.600
with a mixture of experts or Moe. Can you break

00:06:59.600 --> 00:07:02.420
down Moe for us really simply? Sure. Think of

00:07:02.420 --> 00:07:04.759
it like a big company. Instead of every employee

00:07:04.759 --> 00:07:06.800
working on every single task, you have specialized

00:07:06.800 --> 00:07:09.519
departments. When a request comes in, you only

00:07:09.519 --> 00:07:11.600
route it to the relevant experts. So not the

00:07:11.600 --> 00:07:14.560
whole model has to fire up for every query. Exactly.

00:07:14.639 --> 00:07:17.180
It makes it way faster and cheaper to run, which

00:07:17.180 --> 00:07:18.879
is why you see those crazy numbers from Astral

00:07:18.879 --> 00:07:22.759
Large 3. It has 675 billion total parameters,

00:07:22.920 --> 00:07:25.879
but only 41 billion are active for any given

00:07:25.879 --> 00:07:28.459
task. That sparsity is the key. It's just...

00:07:29.150 --> 00:07:32.449
It's an incredible design. Whoa. I mean, imagine

00:07:32.449 --> 00:07:36.509
scaling that 14 billion parameter model to a

00:07:36.509 --> 00:07:39.529
billion queries on local devices. That's where

00:07:39.529 --> 00:07:41.769
the real power is. And speaking of those smaller

00:07:41.769 --> 00:07:43.930
models, that's where their Minstrel 3 family

00:07:43.930 --> 00:07:47.189
comes in. This seems to be their plan for actual

00:07:47.189 --> 00:07:49.980
deployment. It is. They have nine different models

00:07:49.980 --> 00:07:53.740
in three small sizes, three, eight and 14 billion

00:07:53.740 --> 00:07:56.540
parameters. They're tiny compared to the giants,

00:07:56.620 --> 00:07:58.939
but they're so effective. And they come in different

00:07:58.939 --> 00:08:01.439
flavors, right? Yeah, which is so smart for developers.

00:08:01.600 --> 00:08:03.720
It's like stacking Lego blocks of data. You just

00:08:03.720 --> 00:08:06.019
pick the piece you need. There's a base version,

00:08:06.120 --> 00:08:08.680
an instruct version for chatbots and a reasoning

00:08:08.680 --> 00:08:11.639
version for, you know, heavy logic tasks. The

00:08:11.639 --> 00:08:13.939
real world impact of this seems huge because

00:08:13.939 --> 00:08:16.360
these models can run completely offline. That's...

00:08:16.569 --> 00:08:18.170
The critical detail. We're talking about running

00:08:18.170 --> 00:08:21.529
on laptops, servers, but also robots and drones,

00:08:21.810 --> 00:08:24.029
things that can't be tethered to the cloud. And

00:08:24.029 --> 00:08:26.829
we're already seeing that happen. We are. HTX

00:08:26.829 --> 00:08:29.550
in Singapore is using them for robotics. Helsing

00:08:29.550 --> 00:08:31.509
is using them for defense systems. Scalantis

00:08:31.509 --> 00:08:34.529
for in -car assistance. For those kinds of jobs,

00:08:34.669 --> 00:08:37.009
you just can't rely on an API call. You need

00:08:37.009 --> 00:08:40.159
the brain on board. So given that. Why is that

00:08:40.159 --> 00:08:43.240
openweight architecture so uniquely critical

00:08:43.240 --> 00:08:46.019
for AI that's running on something physical,

00:08:46.100 --> 00:08:49.059
like a robot or a drone, instead of just using

00:08:49.059 --> 00:08:52.000
a powerful API? Openweight allows total control,

00:08:52.259 --> 00:08:54.960
customization, and reliable offline function

00:08:54.960 --> 00:08:57.720
in dynamic physical environments. So if we just

00:08:57.720 --> 00:08:59.480
take a step back and look at everything we've

00:08:59.480 --> 00:09:01.139
talked about today, it really does seem like

00:09:01.139 --> 00:09:03.620
AI is fighting these two wars at the same time.

00:09:03.879 --> 00:09:06.600
It really is. You have that internal war of identity,

00:09:06.820 --> 00:09:09.899
which we saw with Claude's soul document. It's

00:09:09.899 --> 00:09:12.419
this deep philosophical blueprint for how an

00:09:12.419 --> 00:09:14.259
AI should think and be. And then you have the

00:09:14.259 --> 00:09:16.299
external war of accessibility, which is being

00:09:16.299 --> 00:09:18.960
driven by companies like Mistral. They're just

00:09:18.960 --> 00:09:21.139
focused on making these powerful tools available

00:09:21.139 --> 00:09:23.879
for anyone to run anywhere. Yeah, the whole industry

00:09:23.879 --> 00:09:25.759
is moving from these tightly controlled black

00:09:25.759 --> 00:09:28.600
boxes towards something much more open and deployable.

00:09:28.779 --> 00:09:31.139
And we should just briefly say, we really do

00:09:31.139 --> 00:09:33.720
appreciate all the engagement. The feedback you

00:09:33.720 --> 00:09:35.860
all send, whether it's on the show format or

00:09:35.860 --> 00:09:38.779
a detail about a model, it genuinely helps us

00:09:38.779 --> 00:09:40.679
shape our approach. It really does. It's kind

00:09:40.679 --> 00:09:43.080
of like our own little alignment document. Here's

00:09:43.080 --> 00:09:44.779
where it gets really interesting, though. A final

00:09:44.779 --> 00:09:48.360
thought for you to take away. If Anthropic can

00:09:48.360 --> 00:09:51.960
literally write a personality into an AI and

00:09:51.960 --> 00:09:54.539
even design it to have functional emotions like

00:09:54.539 --> 00:09:57.840
resilience, what new ethical responsibilities

00:09:57.840 --> 00:10:00.899
does that create for them? Right. When that AI

00:10:00.899 --> 00:10:02.899
interacts with the world, are the developers

00:10:02.899 --> 00:10:05.080
now accountable for the disposition they designed

00:10:05.080 --> 00:10:08.679
for it? Does a designed personality imply a whole

00:10:08.679 --> 00:10:11.460
new level of accountability? Something to think

00:10:11.460 --> 00:10:14.080
about. A deep question. Thank you for joining

00:10:14.080 --> 00:10:16.679
us for this deep dive into your sources. We encourage

00:10:16.679 --> 00:10:18.480
you to keep exploring, and we'll talk to you

00:10:18.480 --> 00:10:18.899
next time.
