WEBVTT

00:00:00.000 --> 00:00:02.480
Imagine needing the power of a nuclear plant,

00:00:02.660 --> 00:00:05.000
a whole gigawatt. Right. And imagine needing

00:00:05.000 --> 00:00:08.980
that much compute power every week just for AI.

00:00:09.259 --> 00:00:11.460
Yeah, it's kind of staggering, isn't it? That's

00:00:11.460 --> 00:00:13.300
the new reality Sam Altman is talking about.

00:00:13.380 --> 00:00:16.059
He calls it abundant intelligence. Exactly. And

00:00:16.059 --> 00:00:19.320
the response from industry, it was immediate

00:00:19.320 --> 00:00:22.440
and huge. Welcome to the Deep Dive. Today we're

00:00:22.440 --> 00:00:26.940
taking a calm, but I think really curious, look

00:00:26.940 --> 00:00:30.579
at the sources that are defining this huge AI

00:00:30.579 --> 00:00:32.659
infrastructure race. Yeah, we're drawing from

00:00:32.659 --> 00:00:34.659
a newsletter that really digs into, you know,

00:00:34.679 --> 00:00:36.640
the hardware side, the software breakthroughs.

00:00:36.640 --> 00:00:39.020
And maybe most interestingly, these new ways

00:00:39.020 --> 00:00:41.000
people are trying to measure AI performance.

00:00:41.399 --> 00:00:44.039
Who's really best? So our mission today is really

00:00:44.039 --> 00:00:46.359
to unpack three main things. First, this literal

00:00:46.359 --> 00:00:50.820
race, building the grid for AI. Second, the just

00:00:50.820 --> 00:00:53.020
absurd speed of breakthroughs and what these

00:00:53.020 --> 00:00:55.880
models can do. AI's capability. Yeah, the pace

00:00:55.880 --> 00:00:58.179
is nuts. And third, a pretty revolutionary new

00:00:58.179 --> 00:01:00.140
way to figure out which AI works best because,

00:01:00.219 --> 00:01:02.640
spoiler, it's not the same for everyone. Let's

00:01:02.640 --> 00:01:05.719
start with the gigawatts then. Altman's big idea.

00:01:05.939 --> 00:01:08.519
Right, this vision of abundant intelligence.

00:01:08.640 --> 00:01:12.180
He basically argues that access to AI or intelligence

00:01:12.180 --> 00:01:16.280
should be like a fundamental human right. That's

00:01:16.280 --> 00:01:18.599
a pretty profound statement. It is, but making

00:01:18.599 --> 00:01:22.340
that real, that takes scale. Almost unimaginable

00:01:22.340 --> 00:01:23.840
scale. And that's where the numbers come in.

00:01:24.219 --> 00:01:27.260
OpenAI's goal. Yeah. One gigawatt of compute

00:01:27.260 --> 00:01:30.700
capacity per week. Exactly. And that demand just

00:01:30.700 --> 00:01:32.920
kicked off this massive infrastructure build

00:01:32.920 --> 00:01:35.879
out. Hyperscalers, VCs, everyone jumped in. The

00:01:35.879 --> 00:01:38.640
investment scale is... Yeah. It's genuinely tough

00:01:38.640 --> 00:01:41.790
to visualize. We hear big numbers, but... 5.5

00:01:41.790 --> 00:01:43.650
gigawatts. Yeah, that's Oracle. They're building

00:01:43.650 --> 00:01:46.510
these huge data centers. Texas, New Mexico, the

00:01:46.510 --> 00:01:49.430
Midwest. 5.5 gigawatts. What does that even

00:01:49.430 --> 00:01:51.769
compare to? Well, think about a major city like

00:01:51.769 --> 00:01:53.950
Dallas, maybe. That's more than its base power

00:01:53.950 --> 00:01:55.849
needs. So, yeah, it's massive. And they're talking,

00:01:55.950 --> 00:01:58.590
what, 25,000 jobs just from that push? Wow.

00:01:59.170 --> 00:02:01.310
Construction tech. Right. They're not just building

00:02:01.310 --> 00:02:02.810
server farms. It feels like they're building,

00:02:02.890 --> 00:02:04.930
you know, the next century's economic engine.

00:02:05.030 --> 00:02:06.730
And SoftBank's in the mix, too, right? Yeah,

00:02:06.730 --> 00:02:09.110
correct. Aggressively. Oh, yeah. They committed

00:02:09.110 --> 00:02:12.469
1.5 gigawatts and they want it done in 18 months.

00:02:12.590 --> 00:02:15.030
18 months. That's incredibly fast. New sites

00:02:15.030 --> 00:02:17.729
in Ohio, Texas. These aren't small steps. These

00:02:17.729 --> 00:02:21.169
are like moonshots, big bets. So if you add it

00:02:21.169 --> 00:02:23.810
all up, where are we heading? Well, the early

00:02:23.810 --> 00:02:25.710
commitments, if you tally them, point towards

00:02:25.710 --> 00:02:30.479
something like $500 billion. And 10 gigawatts

00:02:30.479 --> 00:02:33.599
total by the end of 2025. Half a trillion dollars.

00:02:33.719 --> 00:02:35.680
Yeah. 10 gigawatts. And they're already well

00:02:35.680 --> 00:02:38.560
on their way. Like 400 billion and 7 gigawatts

00:02:38.560 --> 00:02:40.680
are basically planned and funded already. It's

00:02:40.680 --> 00:02:43.180
like stacking these incredibly advanced Lego

00:02:43.180 --> 00:02:46.099
blocks, you know, but at warp speed. And then

00:02:46.099 --> 00:02:47.939
you have players like Alibaba taking a slightly

00:02:47.939 --> 00:02:50.139
different angle. Right. They talk about being

00:02:50.139 --> 00:02:52.500
the electric company for AI, but they're also

00:02:52.500 --> 00:02:55.360
going vertical. Super fast. Like dropping six

00:02:55.360 --> 00:02:58.719
major product launches in one day. Exactly. It

00:02:58.719 --> 00:03:00.960
highlights that speed, but also owning the whole

00:03:00.960 --> 00:03:03.039
process from the chip all the way up to the app.

00:03:03.120 --> 00:03:05.620
That seems key to their strategy. So why does

00:03:05.620 --> 00:03:09.139
this infrastructure layer matter so much to someone

00:03:09.139 --> 00:03:11.659
listening right now? Well, because this isn't

00:03:11.659 --> 00:03:13.979
just about tech companies anymore. It's becoming

00:03:13.979 --> 00:03:18.439
a race between like governments, big money VCs,

00:03:18.479 --> 00:03:22.360
the cloud giants to own the fundamental infrastructure,

00:03:22.500 --> 00:03:25.340
the plumbing for intelligence itself. It goes

00:03:25.340 --> 00:03:28.479
way beyond just making chat bots a bit better.

00:03:28.539 --> 00:03:30.719
It's foundational. OK, so the scale is huge.

00:03:30.900 --> 00:03:34.000
The cost is astronomical. Does all this frantic

00:03:34.000 --> 00:03:39.099
building actually mean that AI access gets cheaper

00:03:39.800 --> 00:03:43.060
and sooner for regular people? Yeah, I think

00:03:43.060 --> 00:03:45.300
the signs point that way. This level of aggressive

00:03:45.300 --> 00:03:47.719
competition, the sheer amount of money pouring

00:03:47.719 --> 00:03:50.500
in, it suggests better, cheaper access is coming.

00:03:50.659 --> 00:03:52.580
And probably fast. Right, because of that competition

00:03:52.580 --> 00:03:54.860
and just the pace of the tech improving. Exactly.

00:03:55.060 --> 00:03:57.300
Okay, so all this hardware is being built because

00:03:57.300 --> 00:03:59.080
the software, the capabilities are demanding

00:03:59.080 --> 00:04:02.020
it. Let's shift gears then from building speed

00:04:02.020 --> 00:04:04.599
to intelligence speed. Yeah, the physical stuff

00:04:04.599 --> 00:04:06.520
is really just trying to keep up with the software

00:04:06.520 --> 00:04:08.840
breakthroughs. The pace of improvement in models

00:04:08.840 --> 00:04:11.979
and what they can do, it's, uh, it's kind of absurd

00:04:11.979 --> 00:04:14.039
right now. We saw that play out recently, didn't

00:04:14.039 --> 00:04:16.560
we? There was that live marketing showdown. Oh yeah,

00:04:16.560 --> 00:04:20.860
with the big LLMs: ChatGPT, Gemini, Perplexity,

00:04:20.860 --> 00:04:23.420
Claude. Right, and in a real business test, one

00:04:23.420 --> 00:04:25.550
of them just walked away with five out of the

00:04:25.550 --> 00:04:28.129
six wins. That shows you like the immediate practical

00:04:28.129 --> 00:04:30.269
value is already there. It's not theoretical.

00:04:30.649 --> 00:04:32.750
And then there's the sheer size increase. You

00:04:32.750 --> 00:04:35.050
mentioned Qwen3-Max. Yeah. Alibaba's new model.

00:04:35.209 --> 00:04:39.029
One trillion parameters. Whoa. One trillion.

00:04:39.660 --> 00:04:42.019
Just try to imagine scaling that. A trillion

00:04:42.019 --> 00:04:44.439
variables for the model to learn from. That's

00:04:44.439 --> 00:04:47.120
a huge jump in complexity. It's a massive number.

00:04:47.199 --> 00:04:49.339
And it's immediately showing results, beating

00:04:49.339 --> 00:04:52.620
older models, even early GPT-5 versions on some

00:04:52.620 --> 00:04:55.819
tasks. But the really stunning thing. What's

00:04:55.819 --> 00:04:58.259
that? Its math performance. It scored 100%

00:04:58.259 --> 00:05:01.959
on the AIME 25 math benchmark. 100%. Wow. Okay,

00:05:02.100 --> 00:05:03.939
that's the moment of wonder right there. That's

00:05:03.939 --> 00:05:07.500
not just good. Perfect mastery on a really tough

00:05:07.500 --> 00:05:10.639
academic test. Right. Models hitting expert levels

00:05:10.639 --> 00:05:12.439
almost right out of the gate. And it's translating

00:05:12.439 --> 00:05:14.060
to the real world, too, isn't it? The finance

00:05:14.060 --> 00:05:17.180
exam. Oh, yeah. The CFA exam. Claude Opus. Gemini

00:05:17.180 --> 00:05:20.300
2.5 Pro. They both passed level three. Which

00:05:20.300 --> 00:05:23.139
is notoriously the hardest level. Takes humans,

00:05:23.220 --> 00:05:25.560
what, over a thousand hours of study? Usually,

00:05:25.720 --> 00:05:29.060
yeah. Intense study. And these models did it.

00:05:29.500 --> 00:05:32.579
In minutes. In minutes. The implication there

00:05:32.579 --> 00:05:35.420
for, like, knowledge work, professional training,

00:05:35.500 --> 00:05:38.939
it's just huge and immediate. You have to wonder,

00:05:39.019 --> 00:05:41.959
what's the value of that certification if a machine

00:05:41.959 --> 00:05:45.500
can ace it instantly? Exactly. And look how fast

00:05:45.500 --> 00:05:48.220
it gets integrated. Microsoft already plugged

00:05:48.220 --> 00:05:51.639
Claude into 365 Copilot. That's the first external

00:05:51.639 --> 00:05:54.019
AI model inside the main Office tools, right?

00:05:54.079 --> 00:05:56.139
Yep. The speed from breakthrough to application

00:05:56.139 --> 00:05:59.019
is relentless. It really is. Okay, here's maybe

00:05:59.019 --> 00:06:02.060
a vulnerable admission. Uh-oh. Even trying to

00:06:02.060 --> 00:06:03.699
follow this closely, I still kind of wrestle

00:06:03.699 --> 00:06:06.379
with prompt drift. Oh, absolutely. You're definitely

00:06:06.379 --> 00:06:08.399
not alone there, where you ask the same thing

00:06:08.399 --> 00:06:10.759
or use the same image prompts like two weeks

00:06:10.759 --> 00:06:12.540
apart. And the results are completely different

00:06:12.540 --> 00:06:14.959
because the model changed underneath without

00:06:14.959 --> 00:06:17.720
you knowing. Yeah. It's a real challenge, especially,

00:06:17.800 --> 00:06:20.540
like you said, with visual tools. That Redditor's

00:06:20.540 --> 00:06:25.220
game, real versus AI images. It's getting almost

00:06:25.220 --> 00:06:28.120
impossible to tell consistently now. For anyone.

00:06:28.639 --> 00:06:30.839
We're struggling to keep up with the tools, really.

00:06:31.079 --> 00:06:33.300
But the tools just keep coming. They do. Quick

00:06:33.300 --> 00:06:36.259
hits here. Kling 2.5 Turbo. Getting really good

00:06:36.259 --> 00:06:39.379
at realistic images, video. Pushing towards that

00:06:39.379 --> 00:06:41.980
believable synthetic media. And that AI bandage

00:06:41.980 --> 00:06:45.079
thing. That sounds wild. Isn't it? An AI-powered

00:06:45.079 --> 00:06:48.300
smart bandage monitors the wound, adjusts things.

00:06:48.480 --> 00:06:52.420
They claim it heals 25% faster. And maybe less

00:06:52.420 --> 00:06:54.639
exciting, but relevant. Ads are probably coming

00:06:54.639 --> 00:06:58.639
to free ChatGPT by 2026. Ah, ties back to the

00:06:58.639 --> 00:07:00.579
compute costs, I guess. Pretty much has to, yeah.

00:07:00.779 --> 00:07:03.139
Okay, so with these models acing things like

00:07:03.139 --> 00:07:06.139
the CFA exam in minutes, what does that tell

00:07:06.139 --> 00:07:08.079
us about where the ceiling is? Is there even

00:07:08.079 --> 00:07:11.120
a ceiling for these current models? That immediate

00:07:11.120 --> 00:07:13.420
mastery, it really shows we have to constantly

00:07:13.420 --> 00:07:15.839
reevaluate. Even models that were top tier just

00:07:15.839 --> 00:07:19.740
months ago, the ceiling just keeps rising perpetually.

00:07:19.860 --> 00:07:21.339
Okay, so if the capabilities are moving that

00:07:21.339 --> 00:07:23.279
fast, the way we measure them needs to change

00:07:23.279 --> 00:07:27.079
too, right? Exactly. Which brings us to the metrics,

00:07:27.279 --> 00:07:30.300
how we decide what's good. Traditionally, we've

00:07:30.300 --> 00:07:33.360
had leaderboards. LMArena, places like that,

00:07:33.420 --> 00:07:35.019
they've been the standard. Yeah, but they had

00:07:35.019 --> 00:07:37.500
this huge blind spot, a really unavoidable one.

00:07:37.620 --> 00:07:40.240
They basically treat every user the same. Doesn't

00:07:40.240 --> 00:07:42.259
matter who you are, where you are, why you're

00:07:42.259 --> 00:07:45.740
using it. An engineer in Tokyo, a lawyer in London.

00:07:45.879 --> 00:07:48.740
Or a student in Buenos Aires, yeah. The leaderboards

00:07:48.740 --> 00:07:52.180
just give you a raw performance score, undifferentiated.

00:07:52.379 --> 00:07:54.420
That doesn't really tell you if it's useful for

00:07:54.420 --> 00:07:57.329
your specific need, though. Not at all. So the

00:07:57.329 --> 00:08:01.430
new idea is from Scale AI. They launched SEAL

00:08:01.430 --> 00:08:03.889
Showdown. SEAL Showdown. Okay, what's different?

00:08:04.089 --> 00:08:06.829
It completely flips the ranking method. Instead

00:08:06.829 --> 00:08:09.149
of just raw speed or whatever, it ranks models

00:08:09.149 --> 00:08:12.389
based on real user preferences, and it connects

00:08:12.389 --> 00:08:16.069
those preferences to demographic info. Oh, okay.

00:08:16.149 --> 00:08:18.850
So it's subjective based on who's using it. Exactly.

00:08:18.990 --> 00:08:21.490
Think about, say you're a bank choosing an LLM

00:08:21.490 --> 00:08:23.509
for your Spanish-speaking customers in Miami.

00:08:24.170 --> 00:08:27.589
LMArena, useless for that decision. But SEAL

00:08:27.589 --> 00:08:29.810
Showdown could actually give you useful data

00:08:29.810 --> 00:08:33.070
because it breaks down results by age, education,

00:08:33.450 --> 00:08:36.759
language, country. So you can pick the model

00:08:36.759 --> 00:08:40.379
that that specific group actually prefers. Precisely.

00:08:40.379 --> 00:08:42.700
That's actionable utility. How are they getting

00:08:42.700 --> 00:08:45.139
the data? Globally. They say preferences from

00:08:45.139 --> 00:08:48.860
100 countries, 70 languages, to try and get a real

00:08:48.860 --> 00:08:51.340
global snapshot. And how do they stop people

00:08:51.340 --> 00:08:54.200
from gaming the system? Like model creators trying

00:08:54.200 --> 00:08:56.360
to boost their scores. Seems like they've thought

00:08:56.360 --> 00:09:00.289
about that. Voting is voluntary, anonymous, and

00:09:00.289 --> 00:09:02.250
they hold back the results from public view for

00:09:02.250 --> 00:09:04.950
60 days to prevent that kind of manipulation.

00:09:05.330 --> 00:09:07.730
Okay, so it's a big shift. El Marina gives you

00:09:07.730 --> 00:09:10.409
raw speed, like engine horsepower. SEAL Showdown

00:09:10.409 --> 00:09:12.730
tells you which car people actually like driving

00:09:12.730 --> 00:09:14.490
in their specific neighborhoods. That's a great

00:09:14.490 --> 00:09:17.230
analogy, yeah. Which tool is perceived as best

00:09:17.230 --> 00:09:19.929
by your users? That creates real market pressure.

00:09:20.330 --> 00:09:23.210
So you'd expect the big players, OpenAI, Anthropic,

00:09:23.490 --> 00:09:27.120
Google. They'll have to react somehow. Oh, definitely.

00:09:27.360 --> 00:09:29.059
They'll likely try to build their own versions

00:09:29.059 --> 00:09:31.179
or maybe just plug into Scale's system. You can't

00:09:31.179 --> 00:09:33.100
really ignore that kind of specific customer

00:09:33.100 --> 00:09:35.519
feedback. So what's the core assumption about

00:09:35.519 --> 00:09:38.779
AI performance that this new benchmark really

00:09:38.779 --> 00:09:41.440
challenges? It fundamentally challenges that

00:09:41.440 --> 00:09:43.980
old idea that there's one single best model,

00:09:44.100 --> 00:09:47.399
one size fits all, that it can serve every single

00:09:47.399 --> 00:09:50.460
user everywhere equally well. SEAL says, no,

00:09:50.500 --> 00:09:53.440
it's more complicated. Okay, let's try to wrap

00:09:53.440 --> 00:09:55.740
this deep dive up. Quick recap of the big layers

00:09:55.740 --> 00:09:57.559
we talked about. Sure. First, the foundation,

00:09:57.840 --> 00:10:00.960
the infrastructure, this gigawatt race heading

00:10:00.960 --> 00:10:04.100
towards $500 billion, all chasing that abundant

00:10:04.100 --> 00:10:06.779
intelligence vision. It's a battle for the means

00:10:06.779 --> 00:10:09.059
of production, really. Second, the capabilities,

00:10:09.340 --> 00:10:12.259
just stunning growth, models mastering complex

00:10:12.259 --> 00:10:15.519
skills like that CFA exam almost instantly. It's

00:10:15.519 --> 00:10:17.460
constantly pushing the performance ceiling up,

00:10:17.539 --> 00:10:20.289
impacting work and education. Yeah. And third,

00:10:20.409 --> 00:10:22.889
how we measure it all. Shifting towards personalized,

00:10:23.250 --> 00:10:25.809
user-focused benchmarks like SEAL Showdown.

00:10:26.009 --> 00:10:28.970
Moving past raw speed to ask, who is this really

00:10:28.970 --> 00:10:31.269
best for? It brings us right back to Altman's

00:10:31.269 --> 00:10:33.470
original claim, doesn't it? Yeah. If AI access

00:10:33.470 --> 00:10:35.970
becomes a fundamental right, how does that change

00:10:35.970 --> 00:10:39.070
global power dynamics? When only a handful of

00:10:39.070 --> 00:10:42.250
massive companies, Oracle, SoftBank, Alibaba,

00:10:42.269 --> 00:10:46.330
can actually afford to build the 10 gigawatts

00:10:46.330 --> 00:10:48.490
needed to deliver that right. That's the billion,

00:10:49.120 --> 00:10:51.019
maybe trillion, dollar question, isn't it? Who

00:10:51.019 --> 00:10:53.419
owns the intelligence grid and what responsibility

00:10:53.419 --> 00:10:56.360
comes with that ownership? That's the core geopolitical

00:10:56.360 --> 00:10:59.039
tension underneath this whole race. Something

00:10:59.039 --> 00:11:01.539
to think about. We'd encourage you listening

00:11:01.539 --> 00:11:04.440
to consider the sources we unpacked. Maybe think

00:11:04.440 --> 00:11:06.940
about how your own use of AI tools or how you

00:11:06.940 --> 00:11:09.120
judge them might be skewed by those old metrics.

00:11:09.299 --> 00:11:11.720
Are you using the fastest model or the one that

00:11:11.720 --> 00:11:14.059
actually works best for you? Good question to

00:11:14.059 --> 00:11:16.019
ponder. Thank you for joining us on the Deep

00:11:16.019 --> 00:11:16.379
Dive.
