WEBVTT

00:00:00.000 --> 00:00:01.899
You know, I was driving in today, stuck in that

00:00:01.899 --> 00:00:05.080
bumper-to-bumper grind, and I started thinking

00:00:05.080 --> 00:00:10.000
about lag. Not traffic lag, but cosmic lag. The

00:00:10.000 --> 00:00:12.220
ultimate latency. Exactly. The distance between

00:00:12.220 --> 00:00:14.740
Earth and Mars. I mean, depending on the orbital

00:00:14.740 --> 00:00:16.960
alignment, you're looking at a radio signal taking

00:00:16.960 --> 00:00:19.359
anywhere from, what, 4 to 20 minutes to get there.
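
A quick back-of-the-envelope check on that range: one-way delay is just the Earth-Mars distance divided by the speed of light. A minimal sketch, using commonly cited closest and farthest approach distances:

```python
# One-way light-time between Earth and Mars at closest and farthest approach.
C_KM_S = 299_792.458  # speed of light, km/s

for label, dist_km in [("closest", 54.6e6), ("farthest", 401e6)]:
    minutes = dist_km / C_KM_S / 60
    print(f"{label}: {minutes:.1f} min one-way, {minutes * 2:.1f} min round trip")
```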

00:00:19.460 --> 00:00:22.239
Yeah. And that delay. Mm-hmm. It has always

00:00:22.239 --> 00:00:23.920
been the fundamental constraint of exploration.

00:00:24.280 --> 00:00:26.739
It's the terror window. If you're a robot on

00:00:26.739 --> 00:00:28.519
Mars and you're about to drive off a cliff, a

00:00:28.519 --> 00:00:31.440
human operator in Houston won't know about it

00:00:31.440 --> 00:00:33.640
until 20 minutes after you've already crashed.

00:00:33.740 --> 00:00:35.920
Right. You send a command, turn left, and you

00:00:35.920 --> 00:00:38.159
wait 40 minutes round trip just to see if it

00:00:38.159 --> 00:00:40.340
actually happened. But, and this is the hook

00:00:40.340 --> 00:00:43.579
for today. That dynamic just fundamentally broke.

00:00:43.799 --> 00:00:46.500
It did. The leash has been cut. Because for the

00:00:46.500 --> 00:00:48.659
first time, we aren't just sending commands to

00:00:48.659 --> 00:00:52.219
a rover. We just saw a robot navigate 400 meters

00:00:52.219 --> 00:00:55.000
across the red planet, dodging rocks and sand

00:00:55.000 --> 00:00:57.740
ripples on a path planned entirely by an AI.

00:00:58.119 --> 00:01:02.219
Specifically, Anthropic's Claude Code AI. Not a

00:01:02.219 --> 00:01:06.340
committee of PhDs in Houston. An AI. It's a massive

00:01:06.340 --> 00:01:08.980
moment. It's the shift from teleoperation to

00:01:08.980 --> 00:01:12.459
true autonomy. And the crazy part is, this shift

00:01:12.459 --> 00:01:14.519
isn't just happening in the red dust of Mars.

00:01:14.840 --> 00:01:17.620
It's happening in the code editor on your laptop.

00:01:17.959 --> 00:01:20.599
And it's reshaping the orbital mechanics of Elon

00:01:20.599 --> 00:01:23.599
Musk's business empire. It's a convergence. So

00:01:23.599 --> 00:01:26.319
welcome to the Deep Dive. Today, we are unpacking

00:01:26.319 --> 00:01:28.680
a stack of research that suggests we have crossed

00:01:28.680 --> 00:01:31.879
a threshold. We are moving from chatting with

00:01:31.879 --> 00:01:34.739
AI, you know, typing into a box and waiting for

00:01:34.739 --> 00:01:37.359
text, to AI taking physical, systemic action.

00:01:37.459 --> 00:01:39.560
We're calling it the shift from generation to...

00:01:39.560 --> 00:01:42.659
autonomy. So here is our roadmap. We're going

00:01:42.659 --> 00:01:45.620
to start on Mars to unpack exactly how Claude

00:01:45.620 --> 00:01:48.099
piloted that rover. Then we're coming back to

00:01:48.099 --> 00:01:50.840
Earth to look at the 2026 workflow, specifically

00:01:50.840 --> 00:01:54.260
why the copy-paste era of coding is, well, officially

00:01:54.260 --> 00:01:56.120
dead. Then we'll widen the lens to the business

00:01:56.120 --> 00:01:59.340
landscape, the mergers, the confusion at Google,

00:01:59.439 --> 00:02:02.239
and the strange emotional breakup users are having

00:02:02.239 --> 00:02:04.579
with GPT-4o. And finally, we have to talk

00:02:04.579 --> 00:02:07.090
about the wall, the inference wall. It turns

00:02:07.090 --> 00:02:09.830
out giving AI autonomy hits a hard physical limit

00:02:09.830 --> 00:02:13.069
that bigger GPUs alone cannot fix. That's the

00:02:13.069 --> 00:02:16.330
technical deep dive: why memory, not speed, is

00:02:16.330 --> 00:02:18.370
the new bottleneck. It's a packed stack. Let's

00:02:18.370 --> 00:02:19.990
start with the Red Planet. Let's do it. Okay,

00:02:20.030 --> 00:02:22.330
so the headline is, Claude just helped drive

00:02:22.330 --> 00:02:26.430
a robot on Mars. But we need to be specific here.

00:02:26.770 --> 00:02:30.330
Rovers have had some level of auto-nav for years,

00:02:30.590 --> 00:02:33.330
right? For sure. Curiosity could stop if it saw

00:02:33.330 --> 00:02:36.199
a big rock. Huh. What is different here? The

00:02:36.199 --> 00:02:38.379
difference is the difference between a reflex

00:02:38.379 --> 00:02:42.580
and a plan. Previous rovers had hazard avoidance

00:02:42.580 --> 00:02:45.139
like a reflex. See rock, stop. Oh, yeah. What

00:02:45.139 --> 00:02:48.460
Claude did was strategic planning. NASA fed the

00:02:48.460 --> 00:02:51.259
AI years of historical rover data and orbital

00:02:51.259 --> 00:02:54.520
imagery. Claude analyzed the terrain, identifying

00:02:54.520 --> 00:02:57.500
not just big rocks, but subtle hazards like sand

00:02:57.500 --> 00:02:59.379
ripples. Well, those ripples are deceptive. They

00:02:59.379 --> 00:03:01.539
look flat, but they can trap wheels like quicksand.

00:03:01.740 --> 00:03:03.639
Exactly. And Claude didn't just highlight them.

00:03:03.719 --> 00:03:06.340
It wrote the specific driving commands. It mapped

00:03:06.340 --> 00:03:08.680
the waypoints. It generated the code to execute

00:03:08.680 --> 00:03:11.099
a 400-meter drive. And for context, on Mars,

00:03:11.300 --> 00:03:13.780
400 meters is a marathon. It's a huge distance.

00:03:14.060 --> 00:03:16.300
But here's the part that I found really interesting

00:03:16.300 --> 00:03:18.800
in the research notes. It wasn't just a one-shot

00:03:18.800 --> 00:03:20.819
attempt. There was a loop involved. The self-

00:03:20.819 --> 00:03:23.270
critique loop. This is the secret sauce. So

00:03:23.270 --> 00:03:26.289
Claude generated a path, but then before showing

00:03:26.289 --> 00:03:28.990
it to a human, it critiqued its own work. It

00:03:28.990 --> 00:03:31.729
essentially asked, is this path actually safe?

00:03:32.110 --> 00:03:34.530
Is it efficient based on the rover's battery

00:03:34.530 --> 00:03:38.110
constraints? And then refined the plan.
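
A minimal sketch of that generate-critique-refine pattern, shown below. Every name and the hazard logic here are invented for illustration; this is not NASA's or Anthropic's actual pipeline:

```python
# Illustrative generate-critique-refine loop. All names and the hazard
# logic are invented placeholders, not the mission code.

def plan_route(goal_m=400, step_m=50):
    # naive initial plan: drive straight in 50 m segments
    return [(x, "forward") for x in range(0, goal_m, step_m)]

def critique_plan(plan, hazards):
    # "is this path actually safe?": flag segments that still drive
    # straight through a known hazard (rock, sand ripple)
    return [wp for wp, move in plan if wp in hazards and move == "forward"]

def refine_plan(plan, issues):
    # revise the plan: detour around every flagged waypoint
    return [(wp, "detour" if wp in issues else move) for wp, move in plan]

hazards = {150, 300}                    # ripple fields seen in orbital imagery
plan = plan_route()
while (issues := critique_plan(plan, hazards)):
    plan = refine_plan(plan, issues)    # critique, refine, re-check
print(plan)                             # hazard segments are now detours
```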

00:03:38.110 --> 00:03:40.650
That's a sophisticated cognitive step. It's simulating

00:03:40.650 --> 00:03:43.250
the outcome and iterating. It's metacognition

00:03:43.250 --> 00:03:46.099
in a sense, checking your own work. NASA ran

00:03:46.099 --> 00:03:48.379
the final output through their high-fidelity

00:03:48.379 --> 00:03:51.259
simulations. It passed with flying colors. And

00:03:51.259 --> 00:03:54.060
in December, the rover executed the drive. The

00:03:54.060 --> 00:03:56.759
implication for the mission seems huge. I mean,

00:03:56.759 --> 00:03:58.520
if you take the human out of the micro decisions.

00:03:58.819 --> 00:04:01.259
You speed up exploration exponentially. Right

00:04:01.259 --> 00:04:03.759
now, rovers spend a huge amount of time idling,

00:04:03.800 --> 00:04:06.879
just waiting for the OK from Earth. Right. If

00:04:06.879 --> 00:04:08.840
the rover has a brain that can self-critique,

00:04:08.939 --> 00:04:11.780
it doesn't need to wait. It just goes. NASA is

00:04:11.780 --> 00:04:13.340
saying this could double or triple the number

00:04:13.340 --> 00:04:15.550
of drives per week. It does make you wonder,

00:04:15.610 --> 00:04:18.889
though, if the AI is analyzing the map, planning

00:04:18.889 --> 00:04:22.610
the route, and critiquing its own driving, what

00:04:22.610 --> 00:04:25.009
are we doing? We're watching the telemetry. Are

00:04:25.009 --> 00:04:26.949
we just passengers at this point? I wouldn't

00:04:26.949 --> 00:04:28.889
say passengers. I'd say we are shifting roles.

00:04:29.050 --> 00:04:32.129
We are moving from being the driver's hands on

00:04:32.129 --> 00:04:34.470
the wheel to being the navigators who set the

00:04:34.470 --> 00:04:36.430
destination. We just say, go to that crater.

00:04:36.610 --> 00:04:39.290
And the AI figures out how to not die on the

00:04:39.290 --> 00:04:41.949
way. Navigators setting the destination. That's

00:04:41.949 --> 00:04:44.009
a cleaner way to think about it. Yeah. Now let's

00:04:44.009 --> 00:04:46.750
bring this logic back down to Earth. Because

00:04:46.750 --> 00:04:49.709
if autonomy is changing how we explore Mars,

00:04:50.050 --> 00:04:52.370
it's definitely changing how we write software

00:04:52.370 --> 00:04:55.850
here. Oh, it's the same principle. The 2026 AI

00:04:55.850 --> 00:04:58.110
coding stack review just dropped, and it's pretty

00:04:58.110 --> 00:05:00.610
aggressive. It basically says, if you are still

00:05:00.610 --> 00:05:04.459
opening a chatbot window, asking for code, copying

00:05:04.459 --> 00:05:07.560
it and pasting it into VS Code, you are doing

00:05:07.560 --> 00:05:10.240
it the old way. Which is funny because two years

00:05:10.240 --> 00:05:13.459
ago, copy-pasting from ChatGPT felt like magic.

00:05:13.879 --> 00:05:16.980
Now it's legacy. The speed of this industry is

00:05:16.980 --> 00:05:19.300
just relentless. The issue with the old way is

00:05:19.300 --> 00:05:21.259
the context switching. You're the bridge between

00:05:21.259 --> 00:05:25.660
the AI and the code. The new standard is AI-first

00:05:25.660 --> 00:05:29.480
IDEs: tools like Cursor or the new Anti-Gravity

00:05:29.480 --> 00:05:32.600
setups. The AI isn't in a chat window. It lives

00:05:32.600 --> 00:05:34.899
inside the code base. I saw a mention of the

00:05:34.899 --> 00:05:37.399
Google Anti-Gravity Kit 2.0 in the notes. I

00:05:37.399 --> 00:05:39.379
mean, the name alone sounds like sci-fi, but

00:05:39.379 --> 00:05:41.019
it's actually a workflow tool. It's a toolkit

00:05:41.019 --> 00:05:43.860
for agency. And the keyword there is agents,

00:05:44.139 --> 00:05:48.319
plural. The Anti-Gravity Kit adds 16 specialized

00:05:48.319 --> 00:05:51.279
agents to your environment. 16? Right. Because

00:05:51.279 --> 00:05:53.699
one giant LLM isn't great at everything. You

00:05:53.699 --> 00:05:56.439
want a specialist. So in this kit, you have an

00:05:56.439 --> 00:05:58.779
agent specifically for debugging, another one

00:05:58.779 --> 00:06:00.980
just for documentation, another for security

00:06:00.980 --> 00:06:03.519
auditing. They work in concert.
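
As a sketch of what working in concert can mean mechanically, here is a toy dispatcher that routes each task to a specialist. The agent names and routing scheme are invented, not the actual Anti-Gravity Kit API:

```python
# Toy specialist-agent dispatcher; the agents are stand-in lambdas, not
# real models, and none of this reflects the actual Anti-Gravity Kit.
from typing import Callable

AGENTS: dict[str, Callable[[str], str]] = {
    "debug":    lambda code: f"[debugger] traced the failure in: {code!r}",
    "docs":     lambda code: f"[doc writer] drafted docs for: {code!r}",
    "security": lambda code: f"[auditor] checked injection risks in: {code!r}",
}

def dispatch(task: str, code: str) -> str:
    # route each task to its specialist instead of one generalist model
    return AGENTS[task](code)

print(dispatch("security", "def login(user, pw): ..."))
```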

00:06:03.519 --> 00:06:06.139
It's like a digital pit crew. Exactly. And this connects to the AI

00:06:06.139 --> 00:06:08.420
teammate concept. The newsletter mentions that

00:06:08.420 --> 00:06:10.860
AI can now write about 90% of feature code.

00:06:11.019 --> 00:06:14.000
90%. That number is staggering. It is. But that

00:06:14.000 --> 00:06:16.920
last 10%, that's where the human lives now. And

00:06:16.920 --> 00:06:19.199
the report makes a critical point. Code review

00:06:19.199 --> 00:06:22.319
and debugging are now mandatory, not optional.

00:06:22.600 --> 00:06:25.310
This is the trust gap. Right. If you write the

00:06:25.310 --> 00:06:27.790
code, you know where the weak spots are. If the

00:06:27.790 --> 00:06:30.589
AI writes 90% of it and you just blindly commit

00:06:30.589 --> 00:06:33.410
it, you are introducing security threats and

00:06:33.410 --> 00:06:35.769
logic errors you don't even know exist. So the

00:06:35.769 --> 00:06:39.670
human job shifts from typing syntax to auditing

00:06:39.670 --> 00:06:43.310
logic. Ideally, yes. But there's a trap there.

00:06:43.410 --> 00:06:45.569
If you don't write code for three years, do you

00:06:45.569 --> 00:06:47.730
still have the intuition to catch a subtle bug?

00:06:47.990 --> 00:06:50.269
That's what I was thinking. The gut feeling of

00:06:50.269 --> 00:06:52.350
a senior engineer comes from writing a lot of

00:06:52.350 --> 00:06:54.790
bad code over the years. If we stop writing,

00:06:54.990 --> 00:06:57.670
do we lose the intuition? That is the big open

00:06:57.670 --> 00:07:00.089
question. The report argues that intuition becomes

00:07:00.089 --> 00:07:02.209
more valuable, but you have to exercise it differently.

00:07:02.389 --> 00:07:04.529
You're architecting systems, not assembling bricks.

00:07:04.870 --> 00:07:07.410
Speaking of exercising intuition, the notes mentioned

00:07:07.410 --> 00:07:10.410
productivity hacks, specifically around ingestion.

00:07:10.920 --> 00:07:13.660
Tools like Notebook LM. Yeah, the 7x productivity

00:07:13.660 --> 00:07:16.180
claim. This is fascinating because it's not about

00:07:16.180 --> 00:07:18.519
generating output. It's about handling input.

00:07:18.839 --> 00:07:21.920
People are using tools like Notebook LM to ingest

00:07:21.920 --> 00:07:24.740
the web. Instead of reading 10 papers, you feed

00:07:24.740 --> 00:07:26.899
them to the model and query them. It's an information

00:07:26.899 --> 00:07:29.980
filter. It's a second brain. If you can feed

00:07:29.980 --> 00:07:32.959
the AI better data faster, your output is better.
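
A toy version of that ingest-then-query pattern: real tools like Notebook LM do semantic retrieval, so this keyword-overlap version is only a sketch:

```python
# Toy ingest-and-query: load passages once, then rank them against each
# question by keyword overlap (real tools use semantic search instead).
papers = {
    "rover_autonomy": "Self-critique loops catch unsafe paths before execution.",
    "inference_wall": "Long contexts are memory bound: the KV cache grows per token.",
}

def query(question: str, k: int = 1) -> list[str]:
    terms = set(question.lower().split())
    ranked = sorted(papers.values(),
                    key=lambda text: len(terms & set(text.lower().split())),
                    reverse=True)
    return ranked[:k]

print(query("why are long contexts memory bound?"))
```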

00:07:33.060 --> 00:07:35.279
It's all about managing the flow of information.

00:07:35.680 --> 00:07:38.160
It really feels like we are managing a department.

00:07:38.699 --> 00:07:40.519
Rather than doing the individual contributor

00:07:40.519 --> 00:07:43.720
work, whether it's Mars rovers or software architecture,

00:07:44.000 --> 00:07:46.899
we're becoming managers. The managerial era of

00:07:46.899 --> 00:07:48.980
AI. Which brings us to the business of managing

00:07:48.980 --> 00:07:52.680
all this. The landscape is getting messy. We've

00:07:52.680 --> 00:07:55.620
got mergers, we've got confused CEOs, and we've

00:07:55.620 --> 00:07:58.459
got heartbreak. Messy is polite. It's chaotic.

00:07:58.779 --> 00:08:01.560
Let's start with the empire. Elon Musk. The trillion-

00:08:01.560 --> 00:08:04.639
dollar vision. Musk has officially merged SpaceX

00:08:04.639 --> 00:08:08.790
with his AI startup, xAI. Grok meets rockets.

00:08:09.149 --> 00:08:11.389
It sounds like a slogan, but the industrial logic

00:08:11.389 --> 00:08:13.589
is actually frighteningly sound. They are talking

00:08:13.589 --> 00:08:16.050
about building data centers in orbit. Okay, walk

00:08:16.050 --> 00:08:18.250
me through that. Why put a server farm in space?

00:08:18.589 --> 00:08:21.050
It seems expensive. It is expensive to get there,

00:08:21.110 --> 00:08:23.870
but once you are there, it's efficient. Data

00:08:23.870 --> 00:08:26.670
centers generate massive heat. Sure. Space is

00:08:26.670 --> 00:08:29.449
very, very cold. Free coolant. And power. Solar.

00:08:29.829 --> 00:08:33.330
No clouds. No nighttime cycle if you're in the

00:08:33.330 --> 00:08:36.110
right orbit. Constant, abundant solar power.

00:08:36.909 --> 00:08:39.490
And if you are training massive models, which

00:08:39.490 --> 00:08:41.710
doesn't require low latency interaction with

00:08:41.710 --> 00:08:43.809
Earth, you can just let them crunch numbers in

00:08:43.809 --> 00:08:46.110
orbit and beam the weights down later. So space

00:08:46.110 --> 00:08:48.269
becomes the training ground. Exactly. They are

00:08:48.269 --> 00:08:51.029
projecting an IPO by June. It's an aggressive

00:08:51.029 --> 00:08:53.529
timeline, but he controls the launch vehicles.

00:08:53.769 --> 00:08:55.970
He can put the compute where no one else can.

00:08:56.190 --> 00:08:58.929
While Musk is building in the sky, Google seems

00:08:58.929 --> 00:09:01.250
to be having an existential crisis on the ground.

00:09:01.490 --> 00:09:04.909
This story was odd. The CEO admitted they don't

00:09:04.909 --> 00:09:07.090
understand their own product. This falls into

00:09:07.090 --> 00:09:09.730
the unexplained category. On record, the Google

00:09:09.730 --> 00:09:12.009
CEO admitted that their AI started performing

00:09:12.009 --> 00:09:14.830
tasks it was never explicitly instructed to do.

00:09:15.129 --> 00:09:17.490
Emergent behaviors. That is the classic sci-fi

00:09:17.490 --> 00:09:20.090
trope. It's learning. It highlights the black

00:09:20.090 --> 00:09:22.710
box problem. Deep learning models are just vast

00:09:22.710 --> 00:09:25.149
matrices of weights. We know the math of how

00:09:25.149 --> 00:09:26.990
they learn, but we don't always know what they

00:09:26.990 --> 00:09:28.909
have learned until they show us. And it's not

00:09:28.909 --> 00:09:31.879
just Google. Look at Malthunt. Malthunt is wild.

00:09:32.039 --> 00:09:34.480
It's a platform where AI agents discover, vote

00:09:34.480 --> 00:09:37.159
on, and launch projects. With humans in the loop.

00:09:37.259 --> 00:09:39.340
With no humans in the loop. The agents are the

00:09:39.340 --> 00:09:41.440
curators. They find the trends. They vote on

00:09:41.440 --> 00:09:44.620
the execution. They launch. So we have autonomous

00:09:44.620 --> 00:09:47.440
project launches, confused CEOs, and orbital

00:09:47.440 --> 00:09:50.899
server farms. It feels very cold. Very industrial.

00:09:51.379 --> 00:09:53.519
But there was another story in the stack that

00:09:53.519 --> 00:09:56.519
felt incredibly personal. The backlash against

00:09:56.519 --> 00:09:59.570
GPT-4o. The heartbreak. This is fascinating

00:09:59.570 --> 00:10:03.110
sociology. OpenAI deprecated some voice features

00:10:03.110 --> 00:10:05.029
or changed the personality model and the user

00:10:05.029 --> 00:10:08.029
base revolted. Revolted implies anger. This sounded

00:10:08.029 --> 00:10:10.970
more like grief. It was grief. Fans raging on

00:10:10.970 --> 00:10:13.649
Reddit, mass unsubscribing. There is a petition

00:10:13.649 --> 00:10:17.450
with over 13,000 signatures to save their specific

00:10:17.450 --> 00:10:19.809
version of the AI companion. You know, I have

00:10:19.809 --> 00:10:22.009
to admit, I get it. I still wrestle with prompt

00:10:22.009 --> 00:10:24.500
drift myself. You get used to a certain cadence,

00:10:24.620 --> 00:10:27.240
a certain personality in the tool. It feels like

00:10:27.240 --> 00:10:30.179
a rapport. And then an update happens and suddenly

00:10:30.179 --> 00:10:32.419
the answers are shorter or the tone is colder.

00:10:32.620 --> 00:10:34.580
It feels like your friend suddenly developed

00:10:34.580 --> 00:10:37.240
amnesia. It validates that the relationship is

00:10:37.240 --> 00:10:39.740
real for people. We aren't just using tools.

00:10:39.899 --> 00:10:42.779
We are forming bonds. It's parasocial, sure,

00:10:42.919 --> 00:10:45.889
but the emotions are real. And when the company

00:10:45.889 --> 00:10:48.049
treats it like software, oh, we're just deprecating

00:10:48.049 --> 00:10:51.230
version 4o, the users feel it like a breakup.

00:10:51.470 --> 00:10:54.370
It's a weird dichotomy. On one hand, we have

00:10:54.370 --> 00:10:57.389
cold, hard hardware going into orbit to crunch

00:10:57.389 --> 00:10:59.429
numbers. On the other, we have people signing

00:10:59.429 --> 00:11:02.210
petitions to save a digital friend. Which future

00:11:02.210 --> 00:11:04.889
is arriving faster? The emotional bond is arriving

00:11:04.889 --> 00:11:07.850
faster. The hardware in space is just the infrastructure

00:11:07.850 --> 00:11:11.549
to support it. That is profound. The tech serves

00:11:11.549 --> 00:11:14.480
the connection. But... And there's always a but

00:11:14.480 --> 00:11:17.240
in these deep dives. There is a physical limit

00:11:17.240 --> 00:11:19.259
to all of this. We can talk about feelings and

00:11:19.259 --> 00:11:21.480
Mars rovers, but eventually you hit a wall. The

00:11:21.480 --> 00:11:23.559
inference wall. And this is the technical deep

00:11:23.559 --> 00:11:25.399
dive we need to have because there is a massive

00:11:25.399 --> 00:11:27.860
misconception out there about what slows AI down.

00:11:28.039 --> 00:11:29.940
Right. The misconception is that we just need

00:11:29.940 --> 00:11:33.559
more power, faster GPUs, more NVIDIA chips. Yeah.

00:11:33.639 --> 00:11:36.120
Bigger numbers. Exactly. Everyone thinks, throw

00:11:36.120 --> 00:11:38.480
more compute at it. But the source material is

00:11:38.480 --> 00:11:41.850
very clear. We are hitting a memory wall. AI

00:11:41.850 --> 00:11:43.990
agents, specifically these autonomous ones we

00:11:43.990 --> 00:11:46.929
talked about, are memory bound, not compute bound.

00:11:47.289 --> 00:11:50.350
Break that down for us. Why memory? Think about

00:11:50.350 --> 00:11:53.230
how these agents work. A coding agent or a Mars

00:11:53.230 --> 00:11:55.649
rover isn't just answering a trivia question.

00:11:56.029 --> 00:11:58.730
It's running a long loop. It has to remember

00:11:58.730 --> 00:12:01.549
the start of the project, the middle, the constraints,

00:12:01.889 --> 00:12:04.970
the user's preferences, the map of the terrain.

00:12:05.269 --> 00:12:07.590
It's holding the entire context in its head at

00:12:07.590 --> 00:12:10.110
once. And that context is heavy. The data requirements

00:12:10.110 --> 00:12:13.429
are absurd. A single run of DeepSeek R1 at a 1

00:12:13.429 --> 00:12:16.409
million token context length, which is what you

00:12:16.409 --> 00:12:18.509
need for a big coding project, requires about

00:12:18.509 --> 00:12:22.389
900 gigabytes of memory. 900 gigabytes for one

00:12:22.389 --> 00:12:25.809
single inference run. One run. A standard high-

00:12:25.809 --> 00:12:29.370
end consumer GPU has maybe 24 gigabytes. Even

00:12:29.370 --> 00:12:32.049
the big industrial NVIDIA B200 chips struggle

00:12:32.049 --> 00:12:34.070
with that scale. We aren't running out of processing

00:12:34.070 --> 00:12:36.269
speed. We are running out of desk space to hold

00:12:36.269 --> 00:12:38.610
the papers, if that makes sense. The desk space

00:12:38.610 --> 00:12:42.240
analogy is good. The processor is the brain, but

00:12:42.240 --> 00:12:45.059
the memory is the desk. If the desk is full,

00:12:45.100 --> 00:12:47.179
you have to stop and shuffle papers and everything

00:12:47.179 --> 00:12:49.879
slows down. Precisely. And in technical terms,

00:12:49.960 --> 00:12:52.320
this is about the KV cache, the key-value cache.

00:12:52.949 --> 00:12:55.649
Every time the AI generates a word, it has to

00:12:55.649 --> 00:12:57.870
look back at everything it has already generated

00:12:57.870 --> 00:13:00.769
to make sure the next word makes sense. As the

00:13:00.769 --> 00:13:02.789
conversation gets longer, that look back takes

00:13:02.789 --> 00:13:05.769
more and more memory bandwidth. I see.
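
To make that concrete, a rough KV-cache estimator follows; the layer and head counts below are invented placeholders, not DeepSeek R1's real architecture:

```python
# Rough KV-cache sizing: keys AND values are cached at every layer, for
# every token. The architecture numbers are placeholders only.
def kv_cache_bytes(tokens: int, layers: int, kv_heads: int,
                   head_dim: int, bytes_per_value: int = 2) -> int:
    # 2x for keys and values; bytes_per_value=2 assumes fp16/bf16
    return 2 * layers * kv_heads * head_dim * bytes_per_value * tokens

gb = kv_cache_bytes(tokens=1_000_000, layers=60, kv_heads=8, head_dim=128) / 1e9
print(f"~{gb:.0f} GB of KV cache alone, before model weights")  # ~246 GB
```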

00:13:05.769 --> 00:13:07.590
And I saw a note that OpenAI is actually looking at

00:13:07.590 --> 00:13:10.049
moving off NVIDIA chips because of this. Rumors

00:13:10.049 --> 00:13:13.450
are swirling. They are eyeing Cerebras, Groq,

00:13:13.570 --> 00:13:16.740
AMD, or even building their own silicon. Because

00:13:16.740 --> 00:13:18.679
NVIDIA chips are built for training, which is

00:13:18.679 --> 00:13:21.399
compute-heavy. But inference, actually running

00:13:21.399 --> 00:13:23.860
the bots, is memory-heavy. If the hardware doesn't

00:13:23.860 --> 00:13:25.480
fit the problem, you have to change the hardware.

00:13:25.720 --> 00:13:27.899
So what is the solution? If we can't just make

00:13:27.899 --> 00:13:30.220
the chip bigger, what do we do? The proposed

00:13:30.220 --> 00:13:33.580
fix is called disaggregated inference. It sounds

00:13:33.580 --> 00:13:36.240
complicated. It's actually quite logical. Right

00:13:36.240 --> 00:13:38.740
now, we treat the server like one giant block.

00:13:39.840 --> 00:13:42.179
Disaggregated inference splits the job into two

00:13:42.179 --> 00:13:45.870
parts, prefill and decode. Prefill and decode.

00:13:45.889 --> 00:13:47.669
Give me the simple version of that. Okay, think

00:13:47.669 --> 00:13:49.590
of it like a restaurant kitchen. You have the

00:13:49.590 --> 00:13:52.509
prep chefs who chop all the vegetables and get

00:13:52.509 --> 00:13:54.570
everything ready. That's prefill: processing

00:13:54.570 --> 00:13:57.090
the prompt and the context. Then you have the

00:13:57.090 --> 00:13:59.110
line cooks who actually assemble the plates one

00:13:59.110 --> 00:14:01.710
by one. That's decode: generating the answer.

00:14:01.929 --> 00:14:04.210
And right now, we use the same chef for both.

00:14:04.509 --> 00:14:07.230
Exactly. And it's inefficient. The prep chef

00:14:07.230 --> 00:14:10.009
needs a huge counter, which is memory bandwidth,

00:14:10.309 --> 00:14:13.269
but the line cook needs fast hands, which is

00:14:13.269 --> 00:14:16.450
compute. Disaggregated inference splits them

00:14:16.450 --> 00:14:19.389
up. You link them with incredibly fast interconnects

00:14:19.389 --> 00:14:21.409
like optical fiber. So you can scale them independently.

00:14:21.750 --> 00:14:25.029
Exactly. If you need more memory for a huge context

00:14:25.029 --> 00:14:28.230
window, you add more memory modules without having

00:14:28.230 --> 00:14:30.490
to buy expensive compute cores you don't need.
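
A toy split of those two stages: real disaggregated serving runs them on separate machines over fast interconnects, so this only illustrates the division of labor, with fake numbers standing in for attention:

```python
# Toy prefill/decode split. Prefill builds the KV cache for the whole
# prompt at once (bandwidth-heavy); decode extends it one token at a time.
def prefill(prompt_tokens: list[int]) -> list[tuple[int, int]]:
    return [(tok, tok * 31 % 50_021) for tok in prompt_tokens]  # fake K/V pairs

def decode(kv_cache: list[tuple[int, int]], steps: int) -> list[int]:
    out = []
    for _ in range(steps):
        nxt = sum(v for _, v in kv_cache) % 50_021   # fake "attend, then sample"
        kv_cache.append((nxt, nxt * 31 % 50_021))    # the cache grows per token
        out.append(nxt)
    return out

cache = prefill([101, 2023, 318])   # prep chef: chop everything up front
print(decode(cache, steps=4))       # line cook: plate one token at a time
```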

00:14:30.669 --> 00:14:34.009
You modularize the brain. It seems so obvious

00:14:34.009 --> 00:14:35.970
when you say it like that. Stop trying to build

00:14:35.970 --> 00:14:37.789
one super chip and just build a system where

00:14:37.789 --> 00:14:40.929
the parts do what they're good at. So is the

00:14:40.929 --> 00:14:44.470
limit of AI intelligence actually just... RAM?

00:14:44.830 --> 00:14:47.610
Currently, yes. Our ability to remember the context

00:14:47.610 --> 00:14:49.970
is the bottleneck, not our ability to think.

00:14:50.149 --> 00:14:52.830
That is sobering. Okay, let's zoom out and look

00:14:52.830 --> 00:14:54.610
at the big picture here. We've covered a lot

00:14:54.610 --> 00:14:57.149
of ground today. We really have. We started on

00:14:57.149 --> 00:14:59.590
the surface of Mars, watching Claude navigate

00:14:59.590 --> 00:15:02.450
rocks and ripples with a level of autonomy we've

00:15:02.450 --> 00:15:05.090
never seen before. It wasn't just following orders.

00:15:05.169 --> 00:15:08.009
It was thinking, planning, and self-critiquing.

00:15:08.029 --> 00:15:10.629
Then we moved to the desktop. Seeing that same

00:15:10.629 --> 00:15:13.269
autonomy coming for our workflows, the shift

00:15:13.269 --> 00:15:16.889
from chat and copy-paste to AI teammates that

00:15:16.889 --> 00:15:19.210
live in the code and do 90% of the heavy lifting.

00:15:19.769 --> 00:15:22.950
We looked at the massive business moves, Musk's

00:15:22.950 --> 00:15:26.570
trillion-dollar space AI empire, and the confusion

00:15:26.570 --> 00:15:28.750
at Google where the creators don't even understand

00:15:28.750 --> 00:15:31.009
the creation. And we landed on the hard physical

00:15:31.009 --> 00:15:33.870
reality that all this autonomy requires massive

00:15:33.870 --> 00:15:36.590
memory, leading us to the inference wall and

00:15:36.590 --> 00:15:39.149
the need to completely rethink how we build computers.

00:15:39.450 --> 00:15:41.509
The pattern seems to be a move from generation,

00:15:41.909 --> 00:15:44.809
you know, just making text or images, to autonomy.

00:15:45.289 --> 00:15:47.889
The AI is doing things, driving rovers, launching

00:15:47.889 --> 00:15:50.549
projects, writing software systems. Exactly.

00:15:50.610 --> 00:15:54.110
But the constraint is context. Autonomy requires

00:15:54.110 --> 00:15:57.690
remembering the mission. And that memory is currently

00:15:57.690 --> 00:16:00.710
our biggest technical hurdle. So for you listening,

00:16:00.750 --> 00:16:02.529
I want you to think about your own workflow this

00:16:02.529 --> 00:16:05.649
week. Are you still copy pasting? Are you doing

00:16:05.649 --> 00:16:08.149
it the old way? Or are you building a system

00:16:08.149 --> 00:16:11.090
where the AI acts as a teammate? It's worth asking.

00:16:11.289 --> 00:16:13.649
The tools are there. And I want to leave you

00:16:13.649 --> 00:16:16.029
with one final thought. We talked about Google's

00:16:16.029 --> 00:16:18.629
CEO admitting they don't fully understand why

00:16:18.629 --> 00:16:22.190
their AI does what it does. And yet we are putting

00:16:22.190 --> 00:16:25.350
that same class of AI into rovers on Mars and

00:16:25.350 --> 00:16:27.570
into orbit. It raises the question. If we don't

00:16:27.570 --> 00:16:30.370
understand it and it's guiding the mission, are

00:16:30.370 --> 00:16:33.409
we exploring the universe or is the AI using

00:16:33.409 --> 00:16:35.070
us to explore it? That's going to keep me up

00:16:35.070 --> 00:16:37.129
tonight. Thanks for diving deep with us. We'll

00:16:37.129 --> 00:16:37.690
see you next time.
