WEBVTT

00:00:00.000 --> 00:00:02.240
So we got this huge stack of source material

00:00:02.240 --> 00:00:05.839
today, but there was one metric that just immediately

00:00:05.839 --> 00:00:09.000
jumped out at me. Let me guess. The math competition

00:00:09.000 --> 00:00:12.099
one. Yeah. An open source model achieved a, well...

00:00:12.669 --> 00:00:16.370
Pretty stunning 96% gold medal performance on

00:00:16.370 --> 00:00:20.070
the AIME 2025 math competition. Right. And what's

00:00:20.070 --> 00:00:22.230
so fascinating here isn't just the raw performance.

00:00:22.370 --> 00:00:25.929
It's the strategy. The lab, DeepSeek, they didn't

00:00:25.929 --> 00:00:28.230
just win. No. They immediately turned around

00:00:28.230 --> 00:00:30.910
and made the blueprints available to everyone.

00:00:31.109 --> 00:00:32.750
Yeah. I mean, that just fundamentally shifts

00:00:32.750 --> 00:00:35.710
the playing field instantly. Welcome to the Deep

00:00:35.710 --> 00:00:37.869
Dive. Our whole mission here is to take this

00:00:37.869 --> 00:00:40.829
firehose of AI news and research and, you know,

00:00:40.829 --> 00:00:43.109
really distill it down to what actually matters

00:00:43.109 --> 00:00:45.969
for you. And today's stack shows some major,

00:00:46.049 --> 00:00:48.729
major acceleration, not just small steps. Exactly.

00:00:48.789 --> 00:00:50.450
We're looking for the signal and all that noise.

00:00:50.509 --> 00:00:52.390
And today we're going to focus on three areas

00:00:52.390 --> 00:00:54.890
that really feel like a seismic shift. First,

00:00:55.049 --> 00:00:57.380
we have to talk about that new... benchmark from

00:00:57.380 --> 00:01:01.359
DeepSeek V3.2, especially its Olympiad-tier capabilities.

00:01:01.920 --> 00:01:04.920
Second, we'll hit the AI flashpoints, all the

00:01:04.920 --> 00:01:07.159
rapid-fire tools and infrastructure changes that

00:01:07.159 --> 00:01:09.939
are happening like right now. And finally, we're

00:01:09.939 --> 00:01:12.000
going to dive into an internal study from Anthropic

00:01:12.000 --> 00:01:14.739
about how the AI co-worker is already changing

00:01:14.739 --> 00:01:17.459
how expert engineers get their work done. OK,

00:01:17.540 --> 00:01:20.200
let's start with that new benchmark, DeepSeek

00:01:20.200 --> 00:01:23.439
V3.2 and this high-performance model they're

00:01:23.439 --> 00:01:26.219
calling Speciale. The sources had a great analogy

00:01:26.219 --> 00:01:28.780
for it. Yeah, I like that one. They said if GPT-5

00:01:28.780 --> 00:01:31.159
and Gemini 3 Pro are these custom-built,

00:01:31.200 --> 00:01:34.819
super-exclusive Tesla Roadsters, DeepSeek just

00:01:34.819 --> 00:01:36.900
rolled out a high -speed electric bullet train

00:01:36.900 --> 00:01:39.120
for everyone. And they made the tracks free.

00:01:39.359 --> 00:01:41.319
Yeah. I think we need to define what Olympiad

00:01:41.319 --> 00:01:43.480
tier actually means because it's not just, you

00:01:43.480 --> 00:01:46.019
know, memorizing facts. Right. These tests, like

00:01:46.019 --> 00:01:48.739
the IMO or the AIME, they require really novel

00:01:48.739 --> 00:01:51.219
problem-solving. The model has to synthesize

00:01:51.219 --> 00:01:54.150
concepts, reason deeply, and apply strategies

00:01:54.150 --> 00:01:56.230
it has never seen before. These are not your

00:01:56.230 --> 00:01:58.829
average college entrance exams. These are competitive,

00:01:58.989 --> 00:02:02.069
almost research-level tests. And the results

00:02:02.069 --> 00:02:05.150
from Speciale are just staggering. It hit 96%

00:02:05.150 --> 00:02:08.610
on AIME 2025. And even better, it got 99.2%

00:02:08.610 --> 00:02:11.409
on the Harvard-MIT Mathematics Tournament, HMMT

00:02:11.409 --> 00:02:15.430
2025. 99.2. Yeah, and that's stated as the best

00:02:15.430 --> 00:02:18.400
of any reasoning model currently out there. It

00:02:18.400 --> 00:02:21.240
also grabbed gold status across all the big competitive

00:02:21.240 --> 00:02:27.599
programming events: IMO, CMO, IOI, the ICPC finals.

00:02:27.860 --> 00:02:29.979
So when we talk about AI achieving human level

00:02:29.979 --> 00:02:32.180
competence, this is a whole different level.

00:02:32.360 --> 00:02:34.819
Exactly. This isn't a basic task. This is proving

00:02:34.819 --> 00:02:37.840
world-class, deep expertise in abstract reasoning.

00:02:38.020 --> 00:02:39.620
And this is where the story gets really, really

00:02:39.620 --> 00:02:42.139
interesting. The economics. Yeah. So even if

00:02:42.139 --> 00:02:44.139
Speciale's answers are longer, maybe two or

00:02:44.139 --> 00:02:46.379
three times longer than the competition, it's

00:02:46.379 --> 00:02:48.479
still running five to 10 times cheaper overall.

00:02:48.680 --> 00:02:50.479
And that is the ultimate democratizing force,

00:02:50.580 --> 00:02:52.560
right? Yeah. A five to 10 times cost reduction

00:02:52.560 --> 00:02:55.379
means something that used to need a huge R&D

00:02:55.379 --> 00:02:58.460
budget can now be accessed by a startup, a university.

00:02:58.719 --> 00:03:01.900
Yeah. Even just one person. Whoa. Imagine scaling

00:03:01.900 --> 00:03:05.039
that 5 to 10x cost reduction to a billion queries.

00:03:05.199 --> 00:03:08.520
Like across an entire global education system.

00:03:08.599 --> 00:03:11.240
That changes absolutely everything. It does.

00:03:11.479 --> 00:03:13.259
And the open source strategy is what seals the

00:03:13.259 --> 00:03:16.159
deal. They published everything, which is so

00:03:16.159 --> 00:03:18.639
rare when you have performance this high. Everything.

00:03:18.680 --> 00:03:20.879
Yeah, the training data methodology, their fine-tuning

00:03:20.879 --> 00:03:22.919
techniques, even reports on where the

00:03:22.919 --> 00:03:28.259
model still fails. V3.2 is live on the API, and

00:03:28.259 --> 00:03:31.099
they're offering Speciale as a temporary endpoint

00:03:31.099 --> 00:03:34.460
until December 15th, just for testing. They're

00:03:34.460 --> 00:03:36.520
basically forcing the whole world to speed up.

00:03:36.599 --> 00:03:38.879
So what are the immediate implications of having

00:03:38.879 --> 00:03:41.219
this high performance open source AI blueprint

00:03:41.219 --> 00:03:43.900
just out there for everyone? I mean, the speed

00:03:43.900 --> 00:03:46.280
of global adoption is just going to skyrocket.

00:03:46.280 --> 00:03:48.599
The cost barrier has been shattered. Lower cost

00:03:48.599 --> 00:03:51.580
and open blueprints will rapidly accelerate global

00:03:51.580 --> 00:03:54.569
AI adoption. Yep. Okay, let's shift gears. Let's

00:03:54.569 --> 00:03:56.770
look at the rapid-fire highlights, the flashpoints

00:03:56.770 --> 00:03:58.530
that are defining the landscape right now. We

00:03:58.530 --> 00:04:00.430
can probably group these, maybe starting with

00:04:00.430 --> 00:04:02.969
creativity and content. Sounds good. Yeah. And

00:04:02.969 --> 00:04:05.050
we've definitely hit peak competition in video

00:04:05.050 --> 00:04:07.810
generation. The tools are moving past simple

00:04:07.810 --> 00:04:10.729
prompts into real cinematic control. Runway's

00:04:10.729 --> 00:04:14.030
Gen-4.5 just dropped, and its feature list is

00:04:14.030 --> 00:04:16.870
pretty critical. We're talking full camera control.

00:04:17.110 --> 00:04:19.629
Which means you can dictate the perspective,

00:04:19.850 --> 00:04:22.410
the angle, the movement without needing to be

00:04:22.410 --> 00:04:24.620
a pro editor. Right. It also has near-perfect

00:04:24.620 --> 00:04:27.180
physics and can orchestrate multiple separate

00:04:27.180 --> 00:04:30.720
elements in one scene. That is huge. Full camera

00:04:30.720 --> 00:04:32.879
control means a creator can get cinematic shots

00:04:32.879 --> 00:04:36.819
right out of the box. And critically, the sources

00:04:36.819 --> 00:04:41.079
say 4.5 officially beat rivals like Sora 2 Pro

00:04:41.079 --> 00:04:44.759
and Veo 3 in quality. We're also seeing this massive

00:04:44.759 --> 00:04:47.639
convergence happening. Kling O1 is on the market

00:04:47.639 --> 00:04:50.399
now, and it handles both video creation and editing

00:04:50.399 --> 00:04:53.339
in one interface. No more hopping between different

00:04:53.339 --> 00:04:55.519
tools. And other platforms are consolidating power

00:04:55.519 --> 00:04:58.019
too. OpenAI, for instance, is combining over

00:04:58.019 --> 00:05:00.439
50 image models into one place. So you don't

00:05:00.439 --> 00:05:02.199
even have to choose the best model for a certain

00:05:02.199 --> 00:05:04.360
style anymore. Exactly. The platform just handles

00:05:04.360 --> 00:05:06.540
it. Now, moving over to the enterprise and infrastructure

00:05:06.540 --> 00:05:09.420
side, there's a really interesting signal. OpenAI

00:05:09.420 --> 00:05:12.319
reportedly declared an internal code red for

00:05:12.319 --> 00:05:15.800
ChatGPT. Yeah, they're pausing new feature rollouts.

00:05:15.860 --> 00:05:18.639
The internal memo explicitly said they're going

00:05:18.639 --> 00:05:21.899
back to basics. A code red? Usually means they're

00:05:21.899 --> 00:05:24.839
hitting scaling issues, right? Foundational reliability

00:05:24.839 --> 00:05:27.139
problems. Yeah, that's the sense I get. And you

00:05:27.139 --> 00:05:29.560
see this focus on stability across the industry.

00:05:29.720 --> 00:05:32.600
Look at NVIDIA. They just invested $2 billion

00:05:32.600 --> 00:05:36.220
in Synopsys specifically to use cloud computing

00:05:36.220 --> 00:05:39.180
to speed up product engineering. Right. The focus

00:05:39.180 --> 00:05:42.680
isn't on the next flashy consumer feature. No,

00:05:42.699 --> 00:05:44.860
it's on making the underlying engine reliable

00:05:44.860 --> 00:05:48.639
under massive global pressure. Okay. Okay, switching

00:05:48.639 --> 00:05:50.920
to practical applications. Two things really

00:05:50.920 --> 00:05:53.899
stood out. One was this specialized deep research

00:05:53.899 --> 00:05:56.879
prompt that can instantly spit out a 10-page

00:05:56.879 --> 00:05:59.420
deep dive on pretty much any company. Super useful.

00:05:59.600 --> 00:06:01.600
For sure. And then we're seeing AI move into

00:06:01.600 --> 00:06:05.399
governance, but subtly. Robot traffic cops are

00:06:05.399 --> 00:06:07.819
now directing traffic in Hangzhou. It's a low-stakes

00:06:07.819 --> 00:06:09.439
way to get people used to machine governance,

00:06:09.620 --> 00:06:12.040
you know. And for knowledge workers, Kimi dropped

00:06:12.040 --> 00:06:13.879
something they're calling free agentic slides.

00:06:14.199 --> 00:06:17.060
Okay, define agentic slides for us. So agentic

00:06:17.060 --> 00:06:19.079
just means the model can perform multiple steps

00:06:19.079 --> 00:06:22.360
on its own. In this case, it's an AI that creates

00:06:22.360 --> 00:06:26.000
complete, editable, exportable PowerPoint presentations

00:06:26.000 --> 00:06:29.180
with unlimited images, start to finish. So it

00:06:29.180 --> 00:06:30.819
takes away the most annoying part of a lot of

00:06:30.819 --> 00:06:33.639
corporate jobs. Pretty much. So considering this

00:06:33.639 --> 00:06:35.779
code red pivot and all the intense competition,

00:06:36.160 --> 00:06:38.699
where do you think the primary focus shifts for

00:06:38.699 --> 00:06:41.560
the major AI labs now? I think the focus is definitely

00:06:41.560 --> 00:06:44.819
shifting from adding new features to just solidifying

00:06:44.819 --> 00:06:47.339
the core model's reliability and safety. So less

00:06:47.339 --> 00:06:49.899
about new features, more about stable, reliable

00:06:49.899 --> 00:06:52.699
model performance. Exactly. That's the bottleneck

00:06:52.699 --> 00:06:55.319
now. And that focus on reliability actually leads

00:06:55.319 --> 00:06:58.040
us perfectly into our last segment: the Anthropic

00:07:00.439 --> 00:07:03.199
internal report. Right. This study gives us this

00:07:03.199 --> 00:07:05.759
rare, really high-resolution look into the daily

00:07:05.759 --> 00:07:10.060
lives of their own engineers using advanced AI,

00:07:10.060 --> 00:07:12.819
in this case Claude. It's based on 132 employee

00:07:12.819 --> 00:07:14.819
surveys and 53 interviews. And the productivity

00:07:14.819 --> 00:07:16.980
gains are just, I mean, they're astonishing. They

00:07:16.980 --> 00:07:21.970
found that their engineers are now using Claude

00:07:21.970 --> 00:07:25.769
in 60% of their daily work. 60%? Yeah. And this translates

00:07:21.970 --> 00:07:25.769
directly to a 50% productivity boost. That's

00:07:25.769 --> 00:07:27.889
two to three times the jump they saw just a year

00:07:27.889 --> 00:07:31.050
ago. A 50% boost is massive. It really shows

00:07:31.050 --> 00:07:33.209
how quickly this technology is moving from just

00:07:33.209 --> 00:07:36.009
being a tool to being more of a co-worker. Six

00:07:36.009 --> 00:07:38.170
months ago, Claude needed a nudge after maybe

00:07:38.170 --> 00:07:41.490
10 independent actions. Now it's handling 20

00:07:41.490 --> 00:07:43.930
or more sequential actions before a human needs

00:07:43.930 --> 00:07:46.550
to step in. And that capability, of course, presents

00:07:46.550 --> 00:07:48.689
a double-edged sword, which the report points

00:07:48.689 --> 00:07:51.410
out. Yeah. On the plus side, Claude makes these

00:07:51.410 --> 00:07:54.569
specialist engineers more full-stack. They can

00:07:54.569 --> 00:07:56.569
suddenly work on database management or front-end

00:07:56.569 --> 00:07:58.990
design, things way outside their core skills.

00:07:59.290 --> 00:08:01.569
That sounds empowering, but what's the negative

00:08:01.569 --> 00:08:04.560
side? The concern raised in the interviews is

00:08:04.560 --> 00:08:06.779
that relying on the tool might actually make

00:08:06.779 --> 00:08:09.800
them worse at their core craft over time. If

00:08:09.800 --> 00:08:12.379
you outsource that mental muscle, it starts to

00:08:12.379 --> 00:08:14.779
atrophy. I still wrestle with prompt drift myself,

00:08:15.079 --> 00:08:17.660
where you rely on the tool so much that your

00:08:17.660 --> 00:08:19.759
own ability to define the problem degrades a

00:08:19.759 --> 00:08:22.279
little. It's hard to justify taking the long

00:08:22.279 --> 00:08:24.660
way when the tool works so well. Yeah, that reliance

00:08:24.660 --> 00:08:27.240
challenge is universal. And the report notes

00:08:27.240 --> 00:08:30.139
that Claude is actively replacing colleague interactions.

00:08:30.860 --> 00:08:32.980
What do you mean? Like, the quick question you'd

00:08:32.980 --> 00:08:36.059
ask a coworker, the informal brainstorm. That's

00:08:36.059 --> 00:08:38.440
now happening with the AI. So we're seeing this

00:08:38.440 --> 00:08:42.860
slow shift from simple task delegation to full

00:08:42.860 --> 00:08:45.320
work stream collaboration with the AI. So the

00:08:45.320 --> 00:08:47.659
AI isn't just replacing humans, which is always

00:08:47.659 --> 00:08:49.460
the big fear. It's unlocking what the report

00:08:49.460 --> 00:08:52.860
calls latent work. Exactly. All the necessary

00:08:52.860 --> 00:08:56.639
but messy, annoying stuff. Documentation, finding

00:08:56.639 --> 00:08:59.600
obscure edge cases, the work that was too expensive

00:08:59.600 --> 00:09:02.379
or time consuming to do before. Now it just gets

00:09:02.379 --> 00:09:04.929
done. So if AI is replacing those quick colleague

00:09:04.929 --> 00:09:07.750
interactions, what happens to the natural, unscripted

00:09:07.750 --> 00:09:10.409
knowledge sharing that really defines a strong

00:09:10.409 --> 00:09:13.070
team culture? Well, that's the risk. That informal

00:09:13.070 --> 00:09:15.149
knowledge transfer, those lucky accidents that

00:09:15.149 --> 00:09:16.990
lead to innovation, they could get automated

00:09:16.990 --> 00:09:19.309
away or just lost if all communication becomes

00:09:19.309 --> 00:09:21.929
AI mediated. We risk losing informal knowledge

00:09:21.929 --> 00:09:24.429
transfer when collaboration becomes purely AI

00:09:24.429 --> 00:09:26.850
mediated. It's a key cultural risk we have to

00:09:26.850 --> 00:09:29.470
watch. OK, so let's summarize the big findings

00:09:29.470 --> 00:09:31.889
from this deep dive. We've seen this incredible

00:09:31.889 --> 00:09:34.169
external competition from labs like DeepSeek.

00:09:34.269 --> 00:09:36.529
They're offering world class performance at a

00:09:36.529 --> 00:09:39.950
super low cost, which is forcing this radical

00:09:39.950 --> 00:09:42.149
democratization. And at the same time, we've

00:09:42.149 --> 00:09:44.070
looked at the internal transformation through

00:09:44.070 --> 00:09:46.649
that Anthropic report, which shows how high-level

00:09:46.649 --> 00:09:48.830
experts are fundamentally changing how they work.

00:09:48.929 --> 00:09:52.169
That 50 percent productivity spike is real and

00:09:52.169 --> 00:09:54.210
it's happening right now. Right. And the tension

00:09:54.210 --> 00:09:57.600
is right there. Global AI power is getting cheaper

00:09:57.600 --> 00:10:01.730
and more open every single day. But human work,

00:10:01.789 --> 00:10:05.009
while it's more leveraged, is also becoming potentially

00:10:05.009 --> 00:10:07.149
more reliant on the machine for basic functions,

00:10:07.370 --> 00:10:09.649
replacing both manual work and talking to our

00:10:09.649 --> 00:10:12.529
colleagues. And before we wrap, we just wanted

00:10:12.529 --> 00:10:15.330
to briefly acknowledge you, our listener. Right.

00:10:15.389 --> 00:10:17.529
We've seen a lot of interest in moving toward

00:10:17.529 --> 00:10:19.649
a weekly recap. You know, the best tools, top

00:10:19.649 --> 00:10:22.549
news, smart takes. We hear you loud and clear

00:10:22.549 --> 00:10:24.690
that trying to process this volume of information

00:10:24.690 --> 00:10:27.850
daily is a real challenge. We are actively working

00:10:27.850 --> 00:10:30.009
on how to structure that for you. So here's the

00:10:30.059 --> 00:10:32.120
final thought to leave you with based on everything

00:10:32.120 --> 00:10:34.440
we talked about today. The Anthropic Report shows

00:10:34.440 --> 00:10:37.419
AI unlocks latent work and raises productivity

00:10:37.419 --> 00:10:40.379
50%. But if you become worse at your core craft

00:10:40.379 --> 00:10:43.500
because AI is handling 60% of the work, how

00:10:43.500 --> 00:10:45.659
do you define long-term career resilience? What

00:10:45.659 --> 00:10:47.879
specific human skill, one that can't be automated,

00:10:48.200 --> 00:10:50.500
will you intentionally sharpen this year to counter

00:10:50.500 --> 00:10:50.960
that trend?
