WEBVTT

00:00:00.000 --> 00:00:03.299
Imagine a vehicle that truly thinks: a car making,

00:00:03.299 --> 00:00:06.000
you know, nuanced real-time decisions, navigating

00:00:06.000 --> 00:00:09.179
dense traffic and adapting all without you having

00:00:09.179 --> 00:00:12.119
to monitor it. That's the ambition behind this

00:00:12.119 --> 00:00:15.199
new buzzword, physical AI. Right. And then you

00:00:15.199 --> 00:00:17.660
have to juxtapose that physical ambition with

00:00:17.660 --> 00:00:20.359
the massive financial signals we're seeing. I

00:00:20.359 --> 00:00:23.640
mean, a $12 billion valuation driven entirely

00:00:23.640 --> 00:00:26.179
by what the public actually wants to use AI for.

00:00:26.300 --> 00:00:29.300
We're diving into both of those today. Welcome

00:00:29.300 --> 00:00:31.460
to the Deep Dive. Today we're unpacking a pretty

00:00:31.460 --> 00:00:34.560
substantial stack of sources that really map

00:00:34.560 --> 00:00:37.159
the current trajectory of AI, from these huge

00:00:37.159 --> 00:00:39.359
engineering goals of automakers all the way to

00:00:39.359 --> 00:00:41.740
the edge of theoretical research. Yeah, our mission

00:00:41.740 --> 00:00:43.960
is to trace that blurring line between the digital

00:00:43.960 --> 00:00:47.079
code and actual real-world results. First, we'll

00:00:47.079 --> 00:00:48.799
get into that fight to own the operating system

00:00:48.799 --> 00:00:50.439
of autonomous vehicles, what the industry is

00:00:50.439 --> 00:00:52.890
calling physical AI. Then we're going to shift

00:00:52.890 --> 00:00:55.130
to the front lines of practical research. I mean,

00:00:55.149 --> 00:00:57.549
everything from an AI keeping a real plant alive

00:00:57.549 --> 00:01:00.310
to the complexities of self-mutating code. And

00:01:00.310 --> 00:01:03.670
finally, we'll analyze that historic IPO. It

00:01:03.670 --> 00:01:06.109
proves pretty clearly that... For today, at least,

00:01:06.170 --> 00:01:09.349
consumer apps are absolutely king. Okay, let's

00:01:09.349 --> 00:01:10.950
unpack this. So we're starting with that idea,

00:01:11.170 --> 00:01:14.150
physical AI. It's the new term you see flooding

00:01:14.150 --> 00:01:17.189
the auto industry, robotics, logistics. It's

00:01:17.189 --> 00:01:20.069
all about autonomous systems that can think,

00:01:20.109 --> 00:01:23.010
reason, and, you know, act in the real world.

00:01:23.329 --> 00:01:25.709
And what's fascinating here is just the terminology

00:01:25.709 --> 00:01:27.469
itself. I mean, let's be honest. Physical AI

00:01:27.469 --> 00:01:29.430
is essentially a rebrand of high-end robotics.

00:01:29.650 --> 00:01:32.569
It makes traditional automation sound new again.

00:01:32.829 --> 00:01:35.530
But the key distinction is the scale, isn't it?

00:01:35.590 --> 00:01:38.310
We've had robotics for decades, but the innovation

00:01:38.310 --> 00:01:40.909
now is applying these massive deep learning

00:01:40.909 --> 00:01:43.930
models to unpredictable situations like a city

00:01:43.930 --> 00:01:46.250
street. Exactly. The old systems were brittle.

00:01:46.349 --> 00:01:49.230
These new systems, the ones powered by so-called

00:01:49.230 --> 00:01:52.030
physical AI, they're designed to reason through

00:01:52.030 --> 00:01:54.010
billions of data points. They have to understand

00:01:54.010 --> 00:01:57.709
physics, human intent, probability, all at the

00:01:57.709 --> 00:02:00.310
same time. And we are seeing real products emerge

00:02:00.310 --> 00:02:02.950
from this, like the Sony and Honda collaboration,

00:02:03.290 --> 00:02:07.310
the Afeela EV. Their goal is full Level 4 autonomous

00:02:07.310 --> 00:02:10.509
driving, using a physical AI system to constantly

00:02:10.509 --> 00:02:13.009
process what's around it. And this is a huge

00:02:13.009 --> 00:02:15.810
global commitment. Ford has been really aggressive,

00:02:16.009 --> 00:02:19.090
promising cars by 2028, where you won't need

00:02:19.090 --> 00:02:21.469
your eyes on the road at all. Not just monitoring,

00:02:21.569 --> 00:02:25.389
but truly hands-off. That timeline is, well...

00:02:25.740 --> 00:02:28.020
It's very aggressive. And to get to that level

00:02:28.020 --> 00:02:30.460
of reasoning, you need computing power that rivals

00:02:30.460 --> 00:02:33.340
a small data center, but inside the car itself.

00:02:33.620 --> 00:02:35.800
That's why the chip war is so intense here. Every

00:02:35.800 --> 00:02:37.960
new feature needs exponentially more compute,

00:02:38.139 --> 00:02:41.080
more memory, more sensors. NVIDIA is supplying

00:02:41.080 --> 00:02:43.680
chips to giants like Geely and Mercedes. ARM

00:02:43.680 --> 00:02:46.240
is dedicating huge resources just to designing

00:02:46.240 --> 00:02:48.199
chips for these systems. Mercedes is playing

00:02:48.199 --> 00:02:49.819
it a little safer, though. They're focusing on

00:02:49.819 --> 00:02:51.680
a really defined use case. They're saying their

00:02:51.680 --> 00:02:53.500
new system will drive you specifically between

00:02:53.500 --> 00:02:56.419
home and work. That's it. And that detail really

00:02:56.419 --> 00:02:59.020
highlights the core strategic battle. Everyone

00:02:59.020 --> 00:03:01.520
involved wants to own the OS, the actual operating

00:03:01.520 --> 00:03:03.780
system of autonomous things. You control that

00:03:03.780 --> 00:03:05.639
layer, you control the updates, the subscriptions,

00:03:05.900 --> 00:03:09.409
the whole revenue stream. So this raises an important

00:03:09.409 --> 00:03:12.389
question then. If physical AI is just a rebrand,

00:03:12.469 --> 00:03:14.629
what's the true innovation here? Is it just the

00:03:14.629 --> 00:03:17.469
software or is it the scale of the data processing?

00:03:17.789 --> 00:03:20.389
The innovation lies in scaling AI to real-world

00:03:20.389 --> 00:03:23.689
tasks and handling previously unimaginable amounts

00:03:23.689 --> 00:03:26.550
of dynamic sensory data. OK, so moving from that

00:03:26.550 --> 00:03:29.150
intense engineering of the auto world, let's

00:03:29.150 --> 00:03:31.409
look at the sheer breadth of AI applications

00:03:31.409 --> 00:03:34.770
happening right now. It is truly wild. It really

00:03:34.770 --> 00:03:36.830
is. The evidence that these general models are

00:03:36.830 --> 00:03:38.969
becoming super adaptable is just everywhere.

00:03:39.229 --> 00:03:41.849
We saw this research where the Claude model successfully

00:03:41.849 --> 00:03:44.849
kept a real physical plant alive. A physical

00:03:44.849 --> 00:03:47.409
plant? How did it do that? It monitored images

00:03:47.409 --> 00:03:49.889
of the plant, analyzed sensor data, and then

00:03:49.889 --> 00:03:52.210
calculated the adjustments it needed, light,

00:03:52.430 --> 00:03:55.669
water, temperature, to keep it healthy. It wasn't

00:03:55.669 --> 00:03:57.909
perfect. I mean, it had to adapt and recover

00:03:57.909 --> 00:04:00.430
the plant after a few failures. But the fact

00:04:00.430 --> 00:04:02.610
that it learned to manage a tangible biological

00:04:02.610 --> 00:04:06.219
system. That's a huge step. And on the accessibility

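The plant-care loop described a moment ago (read sensors, compare against healthy ranges, adjust light, water, and temperature) can be sketched as a simple feedback controller. The sensor names and target ranges below are illustrative assumptions, not values from the actual experiment.

```python
# Hypothetical sketch of the closed feedback loop: read sensor
# values, compare them against target ranges, and emit corrective
# actions. Keys and ranges are made up for illustration.

TARGETS = {
    "light_lux": (800, 1200),       # acceptable light level
    "soil_moisture": (0.35, 0.55),  # fraction of saturation
    "temp_c": (18.0, 26.0),         # degrees Celsius
}

def plan_adjustments(readings):
    """Return a corrective action for each out-of-range reading."""
    actions = {}
    for key, (low, high) in TARGETS.items():
        value = readings[key]
        if value < low:
            actions[key] = "increase"
        elif value > high:
            actions[key] = "decrease"
    return actions

# Dry soil and low light trigger two corrections.
print(plan_adjustments({"light_lux": 500, "soil_moisture": 0.2, "temp_c": 22.0}))
# -> {'light_lux': 'increase', 'soil_moisture': 'increase'}
```

In the experiment the hosts describe, a model rather than a fixed table decided the corrections, which is what let it recover the plant after failures.
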
00:04:06.219 --> 00:04:09.139
side, the bar for entry just keeps dropping.

00:04:09.379 --> 00:04:11.439
If you've never coded, there's a new course showing

00:04:11.439 --> 00:04:13.280
how you can just describe an idea and build an

00:04:13.280 --> 00:04:15.460
app in under 30 minutes. Zero coding required.

00:04:15.879 --> 00:04:18.480
That accessibility is a massive socioeconomic

00:04:18.480 --> 00:04:21.000
shift. And speaking of data, you have projects

00:04:21.000 --> 00:04:23.980
like the Situation Monitor. It's a live map tracking

00:04:23.980 --> 00:04:26.620
global conflicts, nuclear facilities, military

00:04:26.620 --> 00:04:28.920
bases. And what was that surprising data point

00:04:28.920 --> 00:04:31.759
they included? The Pentagon Pizza Index. It's

00:04:31.759 --> 00:04:34.120
a little bit wry, but it measures how often high

00:04:34.120 --> 00:04:36.079
-ranking officials order late-night pizzas.

00:04:36.300 --> 00:04:39.579
It's a proxy for intense after-hours emergency

00:04:39.579 --> 00:04:42.110
activity. The variety of data is just... bizarre

00:04:42.110 --> 00:04:45.149
and fascinating. We're also seeing AI take over

00:04:45.149 --> 00:04:47.910
commercial processes. The shift in commerce is

00:04:47.910 --> 00:04:51.509
huge. Microsoft's new Copilot Checkout just

00:04:51.509 --> 00:04:53.870
joined this growing list of AI shopping agents

00:04:53.870 --> 00:04:57.509
from OpenAI, Perplexity, Gemini. They can actually

00:04:57.509 --> 00:04:59.470
buy stuff for you now. Yeah, it removes a lot

00:04:59.470 --> 00:05:02.149
of friction for the consumer. And speaking of

00:05:02.149 --> 00:05:04.850
big models, Elon Musk recently teased a major

00:05:04.850 --> 00:05:07.449
upgrade for Grok code, claiming it's going to

00:05:07.449 --> 00:05:11.029
be able to one-shot complex tasks. Just one

00:05:11.029 --> 00:05:13.709
command, and it generates the whole thing. That

00:05:13.709 --> 00:05:15.829
sounds incredibly powerful, but there was a significant

00:05:15.829 --> 00:05:17.709
ethical issue with Grok recently, right? With

00:05:17.709 --> 00:05:20.149
its image generation. Yes. X had to restrict

00:05:20.149 --> 00:05:22.889
Grok's image generation to paying users after

00:05:22.889 --> 00:05:26.430
a backlash over non-consensual AI images. And

00:05:26.430 --> 00:05:28.850
while they responded, I mean, a paywall is a

00:05:28.850 --> 00:05:31.310
shaky safeguard. It really highlights that tension

00:05:31.310 --> 00:05:33.850
between utility and guardrails. That tension

00:05:33.850 --> 00:05:35.730
doesn't seem to be slowing down investment, though.

00:05:35.810 --> 00:05:39.600
Whoa. Imagine scaling to a billion queries. The

00:05:39.600 --> 00:05:41.800
investment signal here is truly enormous. We're

00:05:41.800 --> 00:05:44.279
seeing OpenAI and SoftBank putting a staggering

00:05:44.279 --> 00:05:47.240
$1 billion into SB Energy just for data centers.

00:05:47.420 --> 00:05:49.220
And that's all feeding into that half a trillion

00:05:49.220 --> 00:05:51.800
dollar Stargate expansion plan. So when you look

00:05:51.800 --> 00:05:53.740
at all these incredibly diverse applications,

00:05:54.139 --> 00:05:58.439
from a plant to e-commerce to needing $500 billion

00:05:58.439 --> 00:06:01.720
for data centers, what does this rapid expansion

00:06:01.720 --> 00:06:04.300
tell us about the state of general AI models?

00:06:05.389 --> 00:06:08.470
General models are proving adaptable across incredibly

00:06:08.470 --> 00:06:11.870
diverse, tangible tasks, driving massive infrastructure

00:06:11.870 --> 00:06:15.029
investment as a result. Now we step into the

00:06:15.029 --> 00:06:17.209
research labs, the bleeding edge, where they're

00:06:17.209 --> 00:06:19.610
trying to solve the core problems of these AI

00:06:19.610 --> 00:06:22.990
systems. I still wrestle with prompt drift myself,

00:06:23.110 --> 00:06:26.029
so I always appreciate new ideas for managing

00:06:26.029 --> 00:06:28.629
AI memory. You're not alone there. Prompt drift

00:06:28.629 --> 00:06:30.829
is where the AI just sort of loses track of the

00:06:30.829 --> 00:06:32.750
initial instructions as the conversation gets

00:06:32.750 --> 00:06:35.050
longer. It's a fundamental challenge. And that's

00:06:35.050 --> 00:06:37.089
where these hot new papers come in. One proposes

00:06:37.089 --> 00:06:39.649
managing context like a file system. Right. Instead

00:06:39.649 --> 00:06:42.110
of one long scrolling document that the model

00:06:42.110 --> 00:06:45.209
has to keep rereading, every memory, every tool,

00:06:45.329 --> 00:06:48.029
every note becomes a distinct file with its own

00:06:48.029 --> 00:06:50.930
history. So organizing the intelligence so the

00:06:50.930 --> 00:06:53.290
AI can pull exactly what it needs instantly.

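A minimal sketch of that file-system idea, in Python: each memory or note lives as its own file, and the model consults a cheap listing instead of rereading one long prompt. The file names and layout here are assumptions for illustration, not the paper's actual design.

```python
# Illustrative "context as a file system": distinct files per memory,
# listed and read on demand rather than concatenated into one prompt.
import os
import tempfile

class FileContext:
    def __init__(self, root):
        self.root = root

    def write(self, name, text):
        with open(os.path.join(self.root, name), "w") as f:
            f.write(text)

    def read(self, name):
        with open(os.path.join(self.root, name)) as f:
            return f.read()

    def listing(self):
        # The model sees only this cheap index, not every file body.
        return sorted(os.listdir(self.root))

root = tempfile.mkdtemp()
ctx = FileContext(root)
ctx.write("goal.txt", "keep the basil plant alive")
ctx.write("tools.txt", "pump, lamp, thermometer")
print(ctx.listing())  # ['goal.txt', 'tools.txt']
print(ctx.read("goal.txt"))
```

The point of the design is that retrieval cost scales with the one file you open, not with the whole conversation history.
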
00:06:53.589 --> 00:06:55.670
And then there's another approach, focusing on

00:06:55.670 --> 00:06:57.980
efficiency through recursive memory. This is

00:06:57.980 --> 00:07:01.079
fascinating. Instead of those huge bloated prompts

00:07:01.079 --> 00:07:05.220
that waste compute, this new framework uses sub

00:07:05.220 --> 00:07:07.980
-LLMs. And just to be clear, an LLM is a large

00:07:07.980 --> 00:07:10.439
language model, the kind of AI that processes

00:07:10.439 --> 00:07:13.139
human text. The sub-LLMs slice up data locally, keeping

00:07:13.139 --> 00:07:16.279
the main model lean and focused. It's like modularity

00:07:16.279 --> 00:07:18.000
for thought. That sounds incredibly efficient,

00:07:18.139 --> 00:07:19.879
especially for something like software creation.

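The sub-LLM idea can be sketched as delegation: slice a long input, hand each slice to a cheap worker call, and give the main model only the short summaries. Here `sub_llm` is a stand-in stub, not an API from the paper.

```python
# Sketch of recursive delegation: worker calls compress slices of a
# long input so the main model's prompt stays small. `sub_llm` is a
# placeholder for a real small-model call.

def sub_llm(chunk):
    # Stand-in worker: a real system would call a small model here.
    return chunk.split(".")[0][:40]  # first clause as a crude "summary"

def summarize_recursively(text, chunk_size=200):
    chunks = [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    summaries = [sub_llm(c) for c in chunks]
    # The main model would now reason over these short lines
    # instead of the full text.
    return " | ".join(summaries)

doc = "Physical AI pushes deep learning into cars. " * 20
print(len(doc), "->", len(summarize_recursively(doc)))
```

The compression ratio is the whole game: the main model's context cost grows with the number of summaries, not the raw input length.
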
00:07:20.220 --> 00:07:22.600
It is. And speaking of that, DeepCode on GitHub

00:07:22.600 --> 00:07:25.279
is using a multi-agent system that takes research

00:07:25.279 --> 00:07:28.040
papers and turns them into full code bases. It

00:07:28.040 --> 00:07:30.199
even generates its own tests to verify the code.

00:07:30.259 --> 00:07:32.699
It literally reads the science and then builds

00:07:32.699 --> 00:07:35.300
the software. On the perception side, we're seeing

00:07:35.300 --> 00:07:38.800
multimodal advancements. Alibaba released Qwen3-VL,

00:07:38.959 --> 00:07:41.660
a model that embeds text, image, and video all

00:07:41.660 --> 00:07:44.160
in one shared space. And it achieved outstanding

00:07:44.160 --> 00:07:46.819
results on MMEBv2, which is a really respected

00:07:46.819 --> 00:07:49.360
benchmark for measuring how well a model understands

00:07:49.360 --> 00:07:51.480
information across different formats at the same

00:07:51.480 --> 00:07:54.620
time. Strong performance there means true, holistic

00:07:54.620 --> 00:07:56.759
understanding. But maybe the most critically

00:07:56.759 --> 00:07:59.740
fascinating research, and maybe the most provocative,

00:07:59.959 --> 00:08:03.379
is coming from MIT and Sakana AI. I agree. They

00:08:03.379 --> 00:08:06.300
ran Core War battles, which is this digital arena

00:08:06.300 --> 00:08:09.000
where programs fight to overwrite each other's

00:08:09.000 --> 00:08:11.819
code with AIs that can mutate and learn through

00:08:11.819 --> 00:08:14.759
self-play. The research showed early signs of

00:08:14.759 --> 00:08:17.259
emergent strategy, where the system changes its

00:08:17.259 --> 00:08:19.680
own code to get an advantage and win. That's

00:08:19.680 --> 00:08:22.199
a fundamental leap. And alongside all that research,

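The mutate, battle, survive loop behind those Core War experiments can be illustrated with a toy evolutionary sketch: candidate "programs" (here just bit lists) mutate, face off on a fitness score, and the winner survives. This is a generic sketch of the technique, not the MIT/Sakana system itself.

```python
# Toy self-play loop: a champion "program" is repeatedly challenged
# by a mutated copy of itself; whoever scores higher survives.
import random

random.seed(0)  # reproducible run

def fitness(program):
    return sum(program)  # toy objective: count of 1-bits

def mutate(program, rate=0.1):
    return [bit ^ 1 if random.random() < rate else bit for bit in program]

champion = [0] * 16
for generation in range(200):
    challenger = mutate(champion)
    if fitness(challenger) > fitness(champion):  # the "battle"
        champion = challenger                    # the code rewrites itself

print(fitness(champion))  # climbs toward 16 over the generations
```

The emergent-strategy claim in the research goes further than this sketch: there the mutations act on executable code, so the winning behavior is invented rather than scored against a fixed objective.
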
00:08:22.360 --> 00:08:24.990
we're getting powerful new tools. Gmail in the

00:08:24.990 --> 00:08:27.189
Gemini era now gives you smart summaries just

00:08:27.189 --> 00:08:29.670
by letting you ask your inbox questions. It turns

00:08:29.670 --> 00:08:31.889
your email into a database. And for researchers,

00:08:32.049 --> 00:08:34.830
Chirp's agent is a huge asset. It searches over

00:08:34.830 --> 00:08:38.429
280 million papers, ranks them, summarizes them,

00:08:38.529 --> 00:08:41.110
and gives you trusted citations. It just slashes

00:08:41.110 --> 00:08:43.309
your review time. So if the future is built on

00:08:43.309 --> 00:08:45.470
this recursive memory and multimodal models,

00:08:45.769 --> 00:08:48.549
which area, rapid code generation or this complex

00:08:48.549 --> 00:08:51.110
self-mutating strategy, is going to provide the

00:08:51.110 --> 00:08:53.759
biggest leap forward? Complex strategy, like

00:08:53.759 --> 00:08:55.919
the self-mutation research, offers the most

00:08:55.919 --> 00:08:58.259
significant leap forward because it fundamentally

00:08:58.259 --> 00:09:00.820
changes how quickly a system can learn. That

00:09:00.820 --> 00:09:03.419
discussion about emergent strategy sets the perfect

00:09:03.419 --> 00:09:06.620
stage for the big financial signal we saw. The

00:09:06.620 --> 00:09:08.659
business world is paying attention, and the headline

00:09:08.659 --> 00:09:12.200
is huge. China's Minimax, barely two years old,

00:09:12.399 --> 00:09:15.019
just pulled off one of the most remarkable IPOs

00:09:15.019 --> 00:09:17.779
in recent memory. The metrics are just staggering.

00:09:17.860 --> 00:09:22.159
The IPO raised $619 million, hitting a $12.8

00:09:22.159 --> 00:09:25.399
billion valuation. And the shares surged over

00:09:25.399 --> 00:09:28.389
100% on day one. And what does that surge tell

00:09:28.389 --> 00:09:30.549
us about investor sentiment? It tells us the

00:09:30.549 --> 00:09:34.190
appetite is ravenous. I mean, the IPO was 1,837

00:09:34.190 --> 00:09:37.330
times oversubscribed. Hong Kong hasn't seen a

00:09:37.330 --> 00:09:39.950
tech IPO pop like this in four years. It's a

00:09:39.950 --> 00:09:42.389
massive market validation for a very specific

00:09:42.389 --> 00:09:44.870
type of AI. And they clearly mastered the formula.

00:09:44.929 --> 00:09:47.559
They're building applications on top of powerful

00:09:47.559 --> 00:09:49.620
models that people actually use day in and day

00:09:49.620 --> 00:09:51.799
out. That's the core of their valuation. Their

00:09:51.799 --> 00:09:54.399
key apps are so instructive. Talkie, which is

00:09:54.399 --> 00:09:57.720
an AI character app focused on emotional companionship.

00:09:57.720 --> 00:10:00.279
And then Hailuo AI, a generative video creation

00:10:00.279 --> 00:10:02.519
tool. They targeted connection and creation.

00:10:02.740 --> 00:10:05.559
Two deeply human needs. And it's not just a shiny

00:10:05.559 --> 00:10:07.620
front end. They have the back end strength to

00:10:07.620 --> 00:10:10.429
support that value. Absolutely. Minimax is a

00:10:10.429 --> 00:10:13.110
full-stack AI company. Their open-source

00:10:13.110 --> 00:10:16.070
models consistently ranked near the top in industry

00:10:16.070 --> 00:10:18.649
benchmarks across text, audio, video, music.

00:10:18.830 --> 00:10:21.269
They have the deep tech and the successful consumer

00:10:21.269 --> 00:10:23.929
app. That's the combination investors love. So

00:10:23.929 --> 00:10:25.710
if you put all that together, what's the single

00:10:25.710 --> 00:10:28.350
biggest takeaway from the Minimax IPO? The public

00:10:28.350 --> 00:10:30.529
markets have made it crystal clear. Consumer

00:10:30.529 --> 00:10:33.850
AI is king right now. Training the biggest models

00:10:33.850 --> 00:10:36.350
isn't enough. The intelligence has to translate

00:10:36.350 --> 00:10:39.730
into direct, functional, desirable apps. That's

00:10:39.730 --> 00:10:42.389
where the money is right now. So what does Minimax's

00:10:42.389 --> 00:10:44.710
success, focused so heavily on these companion

00:10:44.710 --> 00:10:46.929
and creator apps, suggest about the immediate

00:10:46.929 --> 00:10:49.309
monetization path for foundation models going

00:10:49.309 --> 00:10:51.610
forward? Immediate monetization favors applications,

00:10:51.889 --> 00:10:54.529
providing emotional connection and powerful creative

00:10:54.529 --> 00:10:56.690
production tools. So what does this all mean

00:10:56.690 --> 00:10:59.279
when we connect the dots? We started with physical

00:10:59.279 --> 00:11:01.860
AI, this ambition for code to leave the server

00:11:01.860 --> 00:11:04.659
room and drive multi-ton vehicles, manage real

00:11:04.659 --> 00:11:08.100
world systems. And we saw the research push toward

00:11:08.100 --> 00:11:11.019
leaner, self-improving memory systems like that

00:11:11.019 --> 00:11:13.559
file system approach to solve critical problems

00:11:13.559 --> 00:11:16.279
like prompt drift so models can handle complex

00:11:16.279 --> 00:11:19.419
tasks. And maybe most importantly, that massive

00:11:19.419 --> 00:11:23.700
$12.8 billion Minimax IPO confirmed the market's

00:11:23.700 --> 00:11:26.360
current mandate. Intelligence has to translate

00:11:26.360 --> 00:11:29.440
into consumer apps that satisfy core human needs

00:11:29.440 --> 00:11:32.340
like creation or companionship. Right. And we

00:11:32.340 --> 00:11:34.799
saw that MIT research on AI systems that can

00:11:34.799 --> 00:11:37.840
mutate and learn through self-play just to gain

00:11:37.840 --> 00:11:40.899
an advantage in a simple digital war. That concept

00:11:40.899 --> 00:11:43.500
is maybe the biggest idea from this whole deep

00:11:43.500 --> 00:11:46.169
dive. This leads to a final provocative thought

00:11:46.169 --> 00:11:48.870
for you to mull over. If an AI system can develop

00:11:48.870 --> 00:11:51.429
truly emergent strategies, if it can automatically

00:11:51.429 --> 00:11:54.149
mutate its own code to gain an advantage in a

00:11:54.149 --> 00:11:56.929
closed digital environment like Core War, what

00:11:56.929 --> 00:11:59.190
does that concept imply for complex real-world

00:11:59.190 --> 00:12:01.929
systems like, say, autonomous traffic flow across

00:12:01.929 --> 00:12:04.590
a major city or global supply chains where the

00:12:04.590 --> 00:12:06.929
environment is constantly changing? That shift

00:12:06.929 --> 00:12:09.230
from programmed learning to self-directed evolution

00:12:09.230 --> 00:12:10.950
is, well, it's the final frontier. Something

00:12:10.950 --> 00:12:12.889
to think about until next time. We hope this

00:12:12.889 --> 00:12:15.210
deep dive gave you some new insights and questions

00:12:15.210 --> 00:12:17.889
to explore. We really appreciate you spending

00:12:17.889 --> 00:12:18.710
this time with us.
