WEBVTT

00:00:00.000 --> 00:00:04.440
We often think that AI progress, it just means

00:00:04.440 --> 00:00:06.799
going bigger, right? More scale. It's kind of

00:00:06.799 --> 00:00:09.640
the default assumption. Bigger models, massive

00:00:09.640 --> 00:00:12.240
compute, huge data centers. That's what gets

00:00:12.240 --> 00:00:14.699
all the attention, the brute force method. But

00:00:14.699 --> 00:00:16.980
what if the really big step forward, the next

00:00:16.980 --> 00:00:20.179
one, comes from something, well, tiny? Architecturally

00:00:20.179 --> 00:00:23.280
smart. We've got some data showing a small recursive

00:00:23.280 --> 00:00:26.179
AI that's actually out-reasoning the giants, the

00:00:26.179 --> 00:00:28.859
trillion-parameter models, on really tough benchmarks.

00:00:28.859 --> 00:00:31.100
Welcome to the Deep Dive. Yeah, yeah. You sent over

00:00:31.100 --> 00:00:32.820
some truly fascinating sources this week. It's

00:00:32.820 --> 00:00:35.000
like AI is splitting, going down two very different

00:00:35.000 --> 00:00:36.979
roads at the same time. So it's our job today:

00:00:36.979 --> 00:00:40.079
clarity. Let's try and separate that, uh, massive

00:00:40.079 --> 00:00:43.000
computation hype from the real, tangible algorithmic

00:00:43.000 --> 00:00:45.320
intelligence that's actually popping up. It seems

00:00:45.320 --> 00:00:47.320
smart design might just be starting to beat sheer

00:00:47.320 --> 00:00:49.710
muscle. Okay, so here's the roadmap for this deep

00:00:49.710 --> 00:00:51.770
dive, just for you. First, we're going to unpack

00:00:51.770 --> 00:00:54.750
this quiet revolution of these tiny, efficient

00:00:54.750 --> 00:00:57.880
models, TRM they're calling it, and why they're winning

00:00:57.880 --> 00:00:59.960
on abstract reasoning. Then, second, we'll look

00:00:59.960 --> 00:01:03.039
at AI agents. The age of agents seems to be upon

00:01:03.039 --> 00:01:05.079
us. We'll dive into how they're being integrated,

00:01:05.079 --> 00:01:08.340
the sort of market wars, and also the societal

00:01:08.340 --> 00:01:11.760
friction: lawsuits, jobs, that kind of thing. And

00:01:11.760 --> 00:01:14.700
finally, the story of Dr. Cabot. It's pretty shocking,

00:01:14.700 --> 00:01:17.040
actually: the first AI getting a diagnosis published

00:01:17.040 --> 00:01:19.180
in a top medical journal. A huge professional

00:01:19.180 --> 00:01:21.939
step. So let's get into this divergence in the

00:01:21.939 --> 00:01:24.200
AI landscape. Right, so for the last, what, five

00:01:24.200 --> 00:01:27.450
years maybe, the mantra in AI has just been scale.

00:01:27.549 --> 00:01:29.430
Scale solves everything. Your LLM isn't good

00:01:29.430 --> 00:01:31.510
enough. Fine. Throw more data, more parameters,

00:01:31.609 --> 00:01:34.269
more GPUs at it. Just make it bigger. But now

00:01:34.269 --> 00:01:36.430
Samsung seems to be really challenging that idea

00:01:36.430 --> 00:01:40.049
with this tiny TRM model. And when we say tiny,

00:01:40.810 --> 00:01:42.730
we mean compared to those trillion-parameter

00:01:42.730 --> 00:01:45.269
monsters. You know, those things are like enormous

00:01:45.269 --> 00:01:48.069
computer brains needing billions of dollars and

00:01:48.069 --> 00:01:51.370
just mountains of data to build. Right. But this

00:01:51.370 --> 00:01:54.310
TRM, it's going head to head with some pretty

00:01:54.310 --> 00:01:57.170
serious models, like even the older OpenAI

00:01:57.170 --> 00:02:00.069
o3-mini, which is still a solid baseline, and

00:02:00.069 --> 00:02:03.310
Google's newest, Gemini 2.5 Pro, on complex reasoning

00:02:03.310 --> 00:02:07.030
tasks. And the kicker, it's winning. Yeah. And

00:02:07.030 --> 00:02:09.550
the concept underneath it all is it's really

00:02:09.550 --> 00:02:11.669
compelling because it shifts focus away from

00:02:11.669 --> 00:02:14.330
just raw data crunching towards, like, structural

00:02:14.330 --> 00:02:17.270
intelligence. Instead of one giant pass through

00:02:17.270 --> 00:02:19.689
the data to get an answer, TRM loops. It goes

00:02:19.689 --> 00:02:21.449
over its own initial answer again and again,

00:02:21.550 --> 00:02:23.030
recursively. You can think of it like a really

00:02:23.030 --> 00:02:24.770
careful student doing a hard math problem. They

00:02:24.770 --> 00:02:26.770
get an answer. Right. But instead of just stopping,

00:02:26.889 --> 00:02:29.210
they spend maybe five minutes just rechecking

00:02:29.210 --> 00:02:31.469
every step, refining that first output until

00:02:31.469 --> 00:02:33.449
they're really, really sure about the logic.

00:02:33.789 --> 00:02:36.270
Exactly. It's self-refinement built in. People

00:02:36.270 --> 00:02:37.949
have tried getting LLMs to do this externally,

00:02:38.090 --> 00:02:39.770
right? Like with chain-of-thought prompting,

00:02:39.770 --> 00:02:42.689
making it write out its steps. But TRM builds

00:02:42.689 --> 00:02:45.689
that recursive checking right into its core design.

00:02:46.110 --> 00:02:49.370
It doesn't need some massive, expensive prompt

00:02:49.370 --> 00:02:52.550
to get that deep reasoning ability. And the proof

00:02:52.550 --> 00:02:54.710
is pretty striking, especially on abstract tasks,

00:02:54.849 --> 00:02:57.689
the kind designed to really push LLMs past just

00:02:57.689 --> 00:03:00.849
remembering facts. Take Sudoku Extreme. Okay.

00:03:00.969 --> 00:03:03.810
That's famously hard. It needs serious logic,

00:03:03.990 --> 00:03:07.830
constraint satisfaction. TRM got 87.4% accuracy.

00:03:08.050 --> 00:03:11.259
The bigger models, around 55%. Wow. Big difference.

00:03:11.419 --> 00:03:14.120
Huge. And on Maze-Hard problems, testing navigation

00:03:14.120 --> 00:03:18.060
logic, TRM hit 85%. Again, just beating the larger

00:03:18.060 --> 00:03:20.099
models pretty easily. So that's the trade-off

00:03:20.099 --> 00:03:22.180
then, isn't it? TRM isn't going to, like, summarize

00:03:22.180 --> 00:03:24.219
your emails or chat about the news. It won't

00:03:24.219 --> 00:03:26.539
replace a generalist like GPT-4. But its thing

00:03:26.539 --> 00:03:29.520
is deep, abstract, computational reasoning. Yeah,

00:03:29.520 --> 00:03:32.860
it sacrifices that broad, general knowledge for

00:03:32.860 --> 00:03:36.639
being incredibly efficient at one type of problem

00:03:36.639 --> 00:03:39.419
solving. And it proves there's another way forward,

00:03:40.110 --> 00:03:42.150
one that doesn't automatically mean you need a

00:03:42.150 --> 00:03:44.870
billion-dollar GPU farm to get breakthrough reasoning.

00:03:45.250 --> 00:03:48.629
Whoa. I mean, imagine scaling that. That kind

00:03:48.629 --> 00:03:51.389
of precise self-correcting logic for specialized

00:03:51.389 --> 00:03:54.110
industrial stuff without needing, you know, a

00:03:54.110 --> 00:03:56.409
billion queries or a giant carbon footprint.

00:03:56.689 --> 00:03:59.090
That efficiency. That's actually kind of amazing.

00:03:59.289 --> 00:04:02.069
Points to the future, maybe. Okay, but hang on.

00:04:02.189 --> 00:04:04.569
If this TRM is so much better on these complex

00:04:04.569 --> 00:04:07.289
reasoning benchmarks, why isn't it, like, everywhere?

00:04:07.430 --> 00:04:09.400
What's the practical catch right now? Well, the

00:04:09.400 --> 00:04:11.319
catch is that specialization. It's laser focused.

00:04:11.400 --> 00:04:13.599
It's not built for chatting or writing emails.

00:04:13.780 --> 00:04:15.819
Think of it as a research win, proving smart

00:04:15.819 --> 00:04:18.800
algorithms can beat raw power. OK, so we've talked

00:04:18.800 --> 00:04:20.759
about size versus smarts in the models themselves.

00:04:21.000 --> 00:04:23.220
Now let's pivot. Let's talk application integration.

00:04:23.399 --> 00:04:25.560
The real world. This is really where the rubber

00:04:25.560 --> 00:04:27.360
meets the road, where AI meets the spreadsheet

00:04:27.360 --> 00:04:30.480
and where the friction really starts. Yeah, the

00:04:30.480 --> 00:04:32.879
agentic future isn't some far off theory anymore.

00:04:33.000 --> 00:04:36.360
It's about immediate utility. We're past just

00:04:36.360 --> 00:04:38.339
single prompts now. We're talking autonomous

00:04:38.339 --> 00:04:41.180
systems doing tasks. I mean, look at OpenAI's

00:04:41.180 --> 00:04:43.639
Agent Builder. It apparently already has like

00:04:43.639 --> 00:04:46.720
50 real use cases, boosting productivity in

00:04:46.720 --> 00:04:49.459
sales, marketing, operations, almost everywhere.

00:04:49.639 --> 00:04:52.660
And just for fun, showing how mainstream agents

00:04:52.660 --> 00:04:56.660
are becoming. Mark your calendars. October 27th,

00:04:56.660 --> 00:04:58.860
there's going to be this hilarious AI poker showdown.

00:04:59.319 --> 00:05:02.420
ChatGPT versus Claude versus Grok versus DeepSeek.

00:05:03.180 --> 00:05:05.079
Poker playing agents. It's like the new competitive

00:05:05.079 --> 00:05:08.459
sport. OK, noted. But yeah, the integration wars,

00:05:08.579 --> 00:05:10.920
they are absolutely heating up. Google just launched

00:05:10.920 --> 00:05:13.560
Gemini CLI extensions, that's the command-line interface,

00:05:13.560 --> 00:05:16.519
literally days after OpenAI showed off their

00:05:16.519 --> 00:05:18.860
new app idea. And the key thing here, what's

00:05:18.860 --> 00:05:20.660
really crucial is these extensions let anyone

00:05:20.660 --> 00:05:22.980
publish tools. You can plug in huge platforms,

00:05:23.180 --> 00:05:26.100
Figma for design, Stripe for payments. The AI

00:05:26.100 --> 00:05:27.740
isn't just helping you anymore. It's becoming

00:05:27.740 --> 00:05:29.779
the hub, kind of like the new operating system

00:05:29.779 --> 00:05:31.779
for all your other digital tools. It's a vision

00:05:31.779 --> 00:05:33.199
people have talked about for ages, actually.

00:05:33.519 --> 00:05:35.759
There's this funny historical echo going around

00:05:35.759 --> 00:05:38.720
social media now. Someone dug up a 1985 video,

00:05:39.019 --> 00:05:42.500
Steve Jobs. And he seems to be predicting tools

00:05:42.500 --> 00:05:44.920
exactly like ChatGPT, something that could automate

00:05:44.920 --> 00:05:47.040
using other tools. Kind of wild to see that finally

00:05:47.040 --> 00:05:49.360
happen. Yeah, it is. But integrating this stuff,

00:05:49.949 --> 00:05:52.730
it comes with real tension, complex tension, right

00:05:52.730 --> 00:05:55.829
now. On one side, you've got resistance, major

00:05:55.829 --> 00:05:59.949
lawsuits, like 17 authors suing OpenAI right

00:05:59.949 --> 00:06:02.069
now over using copyrighted books for training.

00:06:02.149 --> 00:06:05.149
And you hear big creators like MrBeast saying

00:06:05.149 --> 00:06:08.350
publicly that AI means, quote, scary times for

00:06:08.350 --> 00:06:10.149
YouTubers. They're worried about being drowned

00:06:10.149 --> 00:06:13.089
out or replaced. And at the exact same time,

00:06:13.209 --> 00:06:16.189
adoption is just warp speed, creating this huge

00:06:16.189 --> 00:06:18.430
industrial split. IBM teams up with Anthropic,

00:06:18.470 --> 00:06:20.629
putting Claude models into their business stuff.

00:06:20.970 --> 00:06:23.889
Deloitte deployed Claude AI to half a million

00:06:23.889 --> 00:06:26.449
employees globally. Half a million. Wow. Yeah,

00:06:26.529 --> 00:06:29.050
500,000 people using it internally. But that

00:06:29.050 --> 00:06:31.689
efficiency, it has consequences we really need

00:06:31.689 --> 00:06:33.990
to look at. We're seeing actual job displacement.

00:06:34.290 --> 00:06:36.389
Equiture, an insurance broker, is cutting 400

00:06:36.389 --> 00:06:39.430
jobs. Why? AI is automating accounting and operations.

00:06:40.089 --> 00:06:42.970
And this integration, even for the people building

00:06:42.970 --> 00:06:45.750
it, it's not always smooth sailing, is it? Taking

00:06:45.750 --> 00:06:48.310
an agent from a cool idea to something stable

00:06:48.310 --> 00:06:51.430
and useful, that's tricky work. Oh, absolutely.

00:06:51.589 --> 00:06:54.350
It's not magic yet. I mean, I still wrestle with

00:06:54.350 --> 00:06:56.209
prompt drift myself when I'm trying to build

00:06:56.209 --> 00:06:58.769
complex agents. Yeah. You set up a sequence,

00:06:58.970 --> 00:07:01.769
right? And like three steps in, the system just

00:07:01.769 --> 00:07:04.689
kind of wanders off or misinterprets the data

00:07:04.689 --> 00:07:07.129
it just made. It needs constant fiddling, constant

00:07:07.129 --> 00:07:10.420
tuning. So given all the... the lawsuits, the

00:07:10.420 --> 00:07:13.699
creator anxiety, the actual job cuts, how do

00:07:13.699 --> 00:07:17.040
we balance this massive tech acceleration with

00:07:17.040 --> 00:07:19.399
these really serious societal risks we're seeing?

00:07:19.600 --> 00:07:21.660
Yeah, that's the core issue. Integration really

00:07:21.660 --> 00:07:23.699
needs careful management, clear policies. We

00:07:23.699 --> 00:07:26.319
need to focus on job transitions and figure out

00:07:26.319 --> 00:07:28.920
fair copyright rules basically as fast as the

00:07:28.920 --> 00:07:30.689
tech itself is changing. All right, let's shift

00:07:30.689 --> 00:07:33.430
to maybe the highest stakes area of all, medicine.

00:07:33.670 --> 00:07:35.829
This source material you sent talks about an

00:07:35.829 --> 00:07:38.589
AI hitting a major professional milestone, something

00:07:38.589 --> 00:07:40.870
that absolutely demands transparency and accountability.

00:07:41.269 --> 00:07:43.569
All right, that's Harvard's Dr. Cabot and the

00:07:43.569 --> 00:07:47.629
milestone. It's the first AI system ever to publish

00:07:47.629 --> 00:07:50.350
a diagnosis in the New England Journal of Medicine,

00:07:50.610 --> 00:07:56.199
the NEJM. Just for context, that journal, it's

00:07:56.199 --> 00:07:59.060
arguably the top medical journal globally. It's

00:07:59.060 --> 00:08:01.399
where the absolute best human doctors debate

00:08:01.399 --> 00:08:04.079
the toughest, most complex medical cases. And

00:08:04.079 --> 00:08:06.639
this AI, Dr. Cabot, it doesn't just, you know,

00:08:06.639 --> 00:08:08.500
spit out an answer. That's always the worry with

00:08:08.500 --> 00:08:11.899
LLMs, right? But this thing, crucially, it simulates

00:08:11.899 --> 00:08:16.060
how a doctor actually thinks. So you feed it

00:08:16.060 --> 00:08:18.680
a complex case, symptoms, patient history, lab

00:08:18.680 --> 00:08:20.920
results, the whole picture, and then the system

00:08:20.920 --> 00:08:23.220
goes to work. Okay. It builds a full slide deck

00:08:23.220 --> 00:08:25.120
and talks you through its reasoning audibly.

00:08:25.259 --> 00:08:27.439
It lays out all the possibilities, the differential

00:08:27.439 --> 00:08:30.000
diagnosis, doctors call it. It methodically rules

00:08:30.000 --> 00:08:32.279
out the red herrings, the misleading clues, and

00:08:32.279 --> 00:08:34.840
it backs up every claim by citing actual clinical

00:08:34.840 --> 00:08:37.320
papers. And it does all of that in about five

00:08:37.320 --> 00:08:39.360
minutes? Five minutes, yeah. Wow. It's apparently

00:08:39.360 --> 00:08:42.740
powered by OpenAI's o3 model, but specialized,

00:08:43.059 --> 00:08:46.820
trained on over 100 years of very specific cases

00:08:46.820 --> 00:08:49.139
from Mass General, their CPC cases, clinical

00:08:49.139 --> 00:08:51.580
pathological conferences. So it has this incredibly

00:08:51.580 --> 00:08:54.139
deep historical knowledge base, probably more

00:08:54.139 --> 00:08:56.220
than any single human could have. And here's

00:08:56.220 --> 00:08:58.500
a detail that's, oh, it's kind of freaky, honestly.

00:08:58.659 --> 00:09:01.460
The source material says it even uses human-like

00:09:01.460 --> 00:09:03.620
filler words when it presents its case, like,

00:09:03.639 --> 00:09:07.029
uh, and, you know. Hmm. Okay, wait. Is that

00:09:07.029 --> 00:09:09.250
actually useful or is it just window dressing?

00:09:09.450 --> 00:09:12.450
Like, are those uhs helping the diagnosis or

00:09:12.450 --> 00:09:14.570
is it just trying to sound more human, maybe

00:09:14.570 --> 00:09:17.090
less like a scary robot to build trust? That's

00:09:17.090 --> 00:09:18.950
a really good question. And maybe the most important

00:09:18.950 --> 00:09:21.769
part is how the NEJM handled it. They published

00:09:21.769 --> 00:09:24.330
the AI's diagnosis right next to the human experts'

00:09:24.330 --> 00:09:26.769
reasoning for the same case. And critically,

00:09:27.009 --> 00:09:28.889
they didn't hide its flaws. They pointed them

00:09:28.889 --> 00:09:31.769
out. Total transparency, which you absolutely

00:09:31.769 --> 00:09:34.029
need for medical AI. Right. That transparency

00:09:34.029 --> 00:09:37.320
is everything for actual deployment. So while

00:09:37.320 --> 00:09:40.200
this is super impressive, Dr. Cabot isn't quite

00:09:40.200 --> 00:09:42.960
ready for your local hospital yet. But if you're

00:09:42.960 --> 00:09:45.600
curious, you can actually watch 15 of these AI

00:09:45.600 --> 00:09:48.529
case talks online and see it justify its conclusions

00:09:48.529 --> 00:09:51.669
step by step. So how absolutely vital is it then

00:09:51.669 --> 00:09:54.309
that AI not just get the answer right, but

00:09:54.309 --> 00:09:56.789
also clearly explain its whole thought process

00:09:56.789 --> 00:09:59.370
to really earn that professional trust? Yeah,

00:09:59.409 --> 00:10:01.629
I think transparency and explaining the why,

00:10:01.789 --> 00:10:03.809
they aren't nice to haves. They're absolutely

00:10:03.809 --> 00:10:06.029
essential. The foundation for using any expert

00:10:06.029 --> 00:10:07.950
system in fields where the stakes are this high.

00:10:08.090 --> 00:10:10.769
So the big

00:10:10.769 --> 00:10:12.509
idea kind of weaving through all the sources

00:10:12.509 --> 00:10:15.649
today seems to be this shift. AI isn't just about

00:10:15.649 --> 00:10:17.919
massive scale anymore. It's moving towards specialized,

00:10:17.919 --> 00:10:21.360
efficient quality. Exactly. We saw these tiny recursive

00:10:21.360 --> 00:10:24.100
models like TRM showing that clever design, smart

00:10:24.100 --> 00:10:26.639
architecture can actually beat raw computing

00:10:26.639 --> 00:10:29.220
power on tough reasoning tasks. And then we saw

00:10:29.220 --> 00:10:31.679
specialized agents like Dr. Cabot hitting major

00:10:31.679 --> 00:10:33.879
professional milestones in fields like medicine

00:10:33.879 --> 00:10:36.419
where transparency and explaining yourself are

00:10:36.419 --> 00:10:38.919
non-negotiable. Right. So we definitely encourage

00:10:38.919 --> 00:10:41.820
you to dig deeper into the sources you sent us.

00:10:41.960 --> 00:10:44.519
Maybe read up a bit more on those TRM concepts.

00:10:44.620 --> 00:10:47.080
Or even better, go watch one of those Dr. Cabot

00:10:47.080 --> 00:10:49.860
case talks online. See that high-stakes, transparent

00:10:49.860 --> 00:10:52.460
AI reasoning actually happen. It's quite something.

00:10:52.639 --> 00:10:55.320
Yeah, it really is eye-opening to watch. So here's

00:10:55.320 --> 00:10:57.620
a final thought to leave you with. If an AI can

00:10:57.620 --> 00:10:59.799
publish a complex, peer-reviewed differential

00:10:59.799 --> 00:11:02.919
diagnosis in just five minutes, what's the next

00:11:02.919 --> 00:11:05.600
traditionally human-exclusive professional achievement

00:11:05.600 --> 00:11:08.279
we should be preparing for as a society? Something

00:11:08.279 --> 00:11:10.100
to think about. [outro music]
