WEBVTT

00:00:00.000 --> 00:00:02.439
When you think about the word hardware, what

00:00:02.439 --> 00:00:04.759
image pops into your head? Probably a blank slate,

00:00:04.900 --> 00:00:07.540
right? Right. A piece of silicon just waiting

00:00:07.540 --> 00:00:09.960
for code. But what if the software wasn't just

00:00:09.960 --> 00:00:12.500
running on the chip? What if it was physically

00:00:12.500 --> 00:00:15.119
baked into it? It sounds like science fiction,

00:00:15.220 --> 00:00:17.440
but it's actually happening. We are talking about

00:00:17.440 --> 00:00:20.679
a chip that runs 17,000 tokens per second. Because

00:00:20.679 --> 00:00:24.280
the AI isn't on the chip. It is the chip. Exactly.

00:00:24.280 --> 00:00:26.559
That is... That is really hard to wrap your head

00:00:26.559 --> 00:00:28.839
around. It is a total paradigm shift from software

00:00:28.839 --> 00:00:31.300
to hardwired intelligence. Welcome to the Deep

00:00:31.300 --> 00:00:33.780
Dive. I am really glad you are here with us today.

00:00:33.920 --> 00:00:36.380
We have a fascinating stack of reports to get

00:00:36.380 --> 00:00:39.079
through. And I want to approach this with a measured

00:00:39.079 --> 00:00:42.500
curiosity. There is just so much hype in this

00:00:42.500 --> 00:00:45.780
space right now. Our job is to slow down and

00:00:45.780 --> 00:00:47.799
really look at the mechanics of what is changing.

00:00:48.060 --> 00:00:50.659
I love that approach. And we have a pretty wild

00:00:50.659 --> 00:00:54.280
roadmap ahead. We are going to start with this

00:00:54.280 --> 00:00:56.700
Silicon Llama. The chip that breaks the speed

00:00:56.700 --> 00:00:59.079
limit. Right. By cementing the model directly

00:00:59.079 --> 00:01:01.960
into the hardware. Then we're going to pivot

00:01:01.960 --> 00:01:04.420
to a Google research paper. The one that says

00:01:04.420 --> 00:01:06.840
simply repeating yourself makes an AI smarter.

00:01:07.140 --> 00:01:10.219
Which sounds too simple to be true, but the data

00:01:10.219 --> 00:01:13.840
is there. Then we will zoom out to the massive

00:01:13.840 --> 00:01:16.319
geopolitical power struggles. Anthropic chasing

00:01:16.319 --> 00:01:19.780
OpenAI. And the U.S. rejecting global AI governance

00:01:19.780 --> 00:01:22.549
entirely. And finally, we will look at new tools

00:01:22.549 --> 00:01:24.810
and the software hiding in your spreadsheets.

00:01:24.870 --> 00:01:28.349
Let us unpack this first big story, the Silicon

00:01:28.349 --> 00:01:30.849
Llama. This comes from a company called Taalas.

00:01:31.069 --> 00:01:33.989
They launched something called the HC1. Now,

00:01:34.010 --> 00:01:35.769
usually when we talk about AI chips, we talk

00:01:35.769 --> 00:01:38.049
about GPUs. Right. Graphics processing units,

00:01:38.170 --> 00:01:41.650
they are flexible. But this HC1 is an ASIC. An

00:01:41.650 --> 00:01:44.629
ASIC. That is a chip hardwired for one single

00:01:44.629 --> 00:01:46.769
task. To understand why this matters, you have

00:01:46.769 --> 00:01:48.670
to understand the memory wall. Right, the memory

00:01:48.670 --> 00:01:51.290
wall. In a standard setup, the compute core is

00:01:51.290 --> 00:01:53.629
incredibly fast. It does the math instantly.

00:01:53.969 --> 00:01:56.150
But the data, the actual weights of the AI model,

00:01:56.310 --> 00:01:58.870
they live in memory chips nearby. So every time

00:01:58.870 --> 00:02:01.329
you ask a question, the GPU has to fetch those

00:02:01.329 --> 00:02:04.150
weights. Move them over, process them, send them

00:02:04.150 --> 00:02:06.890
back. It is a constant traffic jam. The chip

00:02:06.890 --> 00:02:09.949
spends a lot of time just waiting for data. Exactly.

00:02:09.949 --> 00:02:13.330
It is highly inefficient. Enter Taalas. They built

00:02:13.330 --> 00:02:16.969
an ASIC instead. Think of a GPU like a Swiss

00:02:16.969 --> 00:02:19.669
army knife. Right. It could do graphics or crypto

00:02:19.669 --> 00:02:22.930
mining or run different AI models. But the Taalas

00:02:22.930 --> 00:02:26.150
HC1 is not a Swiss army knife. It is a scalpel.

00:02:26.289 --> 00:02:28.750
It is built to do exactly one thing, run Meta's

00:02:28.750 --> 00:02:32.409
Llama 3.1 8B model. And because it is so specialized,

00:02:32.469 --> 00:02:34.530
the performance numbers are just staggering.

00:02:34.770 --> 00:02:38.610
We are seeing reports of up to 17,000 tokens

00:02:38.610 --> 00:02:41.610
per second. Tokens are basically pieces of words

00:02:41.610 --> 00:02:44.680
the AI processes. Yeah. Just pause on that number

00:02:44.680 --> 00:02:46.939
for a second. 17,000. When I use a standard

00:02:46.939 --> 00:02:49.900
chatbot, I am thrilled with 50 tokens a second.

00:02:50.039 --> 00:02:53.080
It feels like someone typing fast, but 17,000

00:02:53.080 --> 00:02:55.680
isn't typing. Whoa, imagine scaling that to a

00:02:55.680 --> 00:02:57.939
billion queries. You are not reading a book at

00:02:57.939 --> 00:03:00.159
that speed. You are downloading the entire library

00:03:00.159 --> 00:03:03.280
instantly. Forbes reports this is 10 times faster

00:03:03.280 --> 00:03:05.219
than Cerebras. Which was already the speed king.

00:03:05.379 --> 00:03:08.060
And potentially 100 times faster than standard

00:03:08.060 --> 00:03:10.419
GPUs. The cost is the other part that jumped

00:03:10.419 --> 00:03:13.800
out at me. Roughly 75 cents per 1 million tokens.

00:03:14.080 --> 00:03:16.900
Which is practically free. But the real killer

00:03:16.900 --> 00:03:20.139
stat is the power usage. A rack of these things

00:03:20.139 --> 00:03:23.460
pulls 12 to 15 kilowatts. Compared to a GPU rack,

00:03:23.919 --> 00:03:26.919
pulling up to 600 kilowatts. So it's 10 times

00:03:26.919 --> 00:03:29.759
faster and uses a tenth of the power. But there's

00:03:29.759 --> 00:03:32.560
no free lunch in engineering. You don't get that

00:03:32.560 --> 00:03:35.120
performance without giving something up. What

00:03:35.120 --> 00:03:37.360
is the catch here? The catch is the absolute

00:03:37.360 --> 00:03:40.379
rigidity. They literally hardwire the Llama model

00:03:40.379 --> 00:03:43.939
weights onto the silicon die. So if Meta releases

00:03:43.939 --> 00:03:47.060
Llama 4 next week. You cannot upgrade the software.

00:03:47.259 --> 00:03:49.379
Yeah. Because the software is the hardware. That

00:03:49.379 --> 00:03:51.840
is a massive gamble. There is another trade-off

00:03:51.840 --> 00:03:54.539
too. Quantization. Which means compressing the

00:03:54.539 --> 00:03:57.580
AI's math to save space. Right. To get everything

00:03:57.580 --> 00:04:00.120
to fit, they use mixed 3-bit and 6-bit weights

00:04:00.120 --> 00:04:02.539
instead of high precision numbers. So you lose

00:04:02.539 --> 00:04:04.740
some subtle accuracy to gain all that speed.

00:04:05.000 --> 00:04:07.819
Exactly. Though they are aiming for 4-bit floating

00:04:07.819 --> 00:04:10.300
point in future chips to close that gap. So is

00:04:10.300 --> 00:04:13.159
the speed worth the risk of hardware obsolescence

00:04:13.159 --> 00:04:15.659
if the model updates? It is a Ferrari engine

00:04:15.659 --> 00:04:18.800
welded shut: fast, but unchangeable. I really like

00:04:18.800 --> 00:04:21.680
that image. Let us pivot from locked-in hardware

00:04:21.680 --> 00:04:24.420
to a software hack that seems almost too easy.

00:04:24.660 --> 00:04:27.899
This Google research paper is fascinating. The

00:04:27.899 --> 00:04:30.180
stutter trick. Yeah, the stutter trick. Yeah.

00:04:30.259 --> 00:04:32.459
Google researchers found that for non-reasoning

00:04:32.459 --> 00:04:35.819
models, if you simply repeat the prompt twice...

00:04:35.819 --> 00:04:38.939
Just paste it a second time. ...performance absolutely

00:04:38.939 --> 00:04:41.779
skyrockets. We are not talking about a 5% bump.

00:04:42.000 --> 00:04:45.300
No. On search-style tasks, accuracy jumped from

00:04:45.300 --> 00:04:50.819
21%... to 97%. 21 to 97. Just by asking

00:04:50.819 --> 00:04:53.779
twice. Just by saying it again. Why does that work?

00:04:53.779 --> 00:04:56.639
It comes down to how these models process information.

00:04:56.639 --> 00:04:59.560
They read left to right. They interpret early

00:04:59.560 --> 00:05:02.740
words before seeing later clarifications. Right.

00:05:02.740 --> 00:05:05.240
They are predicting the next word based on what

00:05:05.240 --> 00:05:07.620
they have seen so far. So if I give a complex

00:05:07.620 --> 00:05:09.740
instruction at the end of a sentence, it is already

00:05:09.740 --> 00:05:11.839
committed to a trajectory before it gets there.

00:05:11.920 --> 00:05:13.939
But repeating the prompt gives it a second pass.

00:05:14.319 --> 00:05:17.500
The first iteration puts the full context into

00:05:17.500 --> 00:05:19.720
its working memory. By the time it generates

00:05:19.720 --> 00:05:22.459
an answer after the second prompt, it has future

00:05:22.459 --> 00:05:24.660
knowledge of the entire request. It creates a

00:05:24.660 --> 00:05:28.339
perfect buffer for context awareness. I have

00:05:28.339 --> 00:05:30.779
to admit something here. I still wrestle with

00:05:30.779 --> 00:05:33.360
prompt drift myself. We all do. Sometimes I get

00:05:33.360 --> 00:05:36.300
lazy with instructions. It is incredibly comforting

00:05:36.300 --> 00:05:38.980
to know the fix is just copy-pasting. And the

00:05:38.980 --> 00:05:41.379
data backs it up. Repetition beat the normal

00:05:41.379 --> 00:05:45.220
prompt in 47 out of 70 cases. And crucially,

00:05:45.360 --> 00:05:48.480
it never performed worse in a statistically meaningful

00:05:48.480 --> 00:05:51.180
way. So there's really no downside. But does

00:05:51.180 --> 00:05:54.160
this prove models aren't actually thinking but

00:05:54.160 --> 00:05:56.600
just predicting linearly? Right. They aren't

00:05:56.600 --> 00:05:58.420
reasoning. They are just auto-completing with

00:05:58.420 --> 00:06:00.620
better hindsight. It is a great reminder of what

00:06:00.620 --> 00:06:03.050
is actually under the hood. Now let us look at

00:06:03.050 --> 00:06:05.730
the engine room of the industry itself. The business

00:06:05.730 --> 00:06:08.209
side of this deep dive is moving so fast right

00:06:08.209 --> 00:06:10.810
now. Anthropic is on an absolute tear. Their

00:06:10.810 --> 00:06:13.290
revenue scaled 10 times recently. Compared to

00:06:13.290 --> 00:06:16.790
OpenAI at 3.4 times. OpenAI is still massive,

00:06:16.850 --> 00:06:19.990
but Anthropic is accelerating much faster. Some

00:06:19.990 --> 00:06:22.470
projections say they could overtake OpenAI by

00:06:22.470 --> 00:06:26.069
mid-2026. And then you have NVIDIA making a

00:06:26.069 --> 00:06:31.089
huge move. NVIDIA is nearing a $30 billion equity

00:06:31.089 --> 00:06:34.250
stake in OpenAI. Right. This replaces a previous

00:06:34.250 --> 00:06:37.740
chip supply pact. This deal values OpenAI at

00:06:37.740 --> 00:06:41.660
$830 billion. We are creeping into trillion dollar

00:06:41.660 --> 00:06:44.160
territory for a private company. But the map

00:06:44.160 --> 00:06:47.660
of who uses and regulates this tech is fracturing.

00:06:47.860 --> 00:06:49.600
We have to talk about the Delhi Declaration.

00:06:49.839 --> 00:06:52.259
Over 70 countries signed this declaration in

00:06:52.259 --> 00:06:55.300
India, focusing on AI safety. It is a massive

00:06:55.300 --> 00:06:57.740
move by the global south to have a voice here.

00:06:57.899 --> 00:06:59.720
But we need to be clear about the U.S. response.

00:07:00.139 --> 00:07:02.860
The White House completely rejected it. The exact

00:07:02.860 --> 00:07:05.680
phrase was that they totally reject global AI governance.

00:07:05.959 --> 00:07:07.959
We are just reporting what the sources state

00:07:07.959 --> 00:07:10.240
here, but that is a very definitive stance. It

00:07:10.240 --> 00:07:13.100
shows a clear prioritization of speed and domestic

00:07:13.100 --> 00:07:14.720
control. And when you look at the demographics,

00:07:15.160 --> 00:07:18.600
India's push makes sense. Young Indians are powering

00:07:18.600 --> 00:07:22.259
ChatGPT usage. Nearly 50% of its users are

00:07:22.259 --> 00:07:26.079
18 to 24. India has over 100 million weekly users.

00:07:26.319 --> 00:07:28.160
So you have the users in India, the hardware

00:07:28.160 --> 00:07:30.480
in Taiwan and the U.S. The capital in Silicon

00:07:30.480 --> 00:07:33.850
Valley. It is a highly volatile mix. If the hardware

00:07:33.850 --> 00:07:36.529
maker owns the software maker, who actually controls

00:07:36.529 --> 00:07:38.670
the industry? The arms dealer is essentially

00:07:38.670 --> 00:07:42.930
buying the army. Sponsor break. We are back. Let

00:07:42.930 --> 00:07:45.029
us bring this down from geopolitics to something

00:07:45.029 --> 00:07:47.990
a bit more grounded. Literally down to your desktop

00:07:47.990 --> 00:07:50.970
spreadsheets. The tool that runs the world. The

00:07:50.970 --> 00:07:54.149
source highlights this concept of software hiding

00:07:54.149 --> 00:07:56.689
in your spreadsheets. They call it the big seed

00:07:56.689 --> 00:07:59.670
or the blueprint. Your messy Excel sheet with

00:07:59.670 --> 00:08:01.910
client data and notes is actually a blueprint

00:08:01.910 --> 00:08:04.370
for a custom app. You just need the right tool

00:08:04.370 --> 00:08:06.480
to translate it. This is where platforms like

00:08:06.480 --> 00:08:09.279
Glide come in. Wrapping a user interface around

00:08:09.279 --> 00:08:12.180
your raw data. It is total democratization of

00:08:12.180 --> 00:08:14.939
software. Yeah. But we are also seeing highly

00:08:14.939 --> 00:08:17.379
specialized micro tools. Like Claude in PowerPoint.

00:08:17.680 --> 00:08:19.379
Right. It reads your layouts and fonts. So when

00:08:19.379 --> 00:08:21.680
it generates a slide, it stays perfectly on brand.

00:08:21.939 --> 00:08:24.199
No more generic corporate clip art. And then

00:08:24.199 --> 00:08:26.939
there is Wordy. Wordy is fun. You watch movie

00:08:26.939 --> 00:08:29.279
clips and it gives you quizzes. Gamified learning

00:08:29.279 --> 00:08:32.259
powered by AI to check comprehension. Then on

00:08:32.259 --> 00:08:34.639
the totally opposite end of the spectrum, we

00:08:34.639 --> 00:08:38.090
have Ineffable Intelligence. They just raised

00:08:38.090 --> 00:08:42.470
a $1 billion seed round. A $1 billion seed round

00:08:42.470 --> 00:08:46.529
led by ex-DeepMind star David Silver. Their

00:08:46.529 --> 00:08:49.750
explicit goal is building superhuman intelligence.

00:08:49.990 --> 00:08:52.129
The capital intensity required right now is just

00:08:52.129 --> 00:08:55.450
wild. With $1 billion seed rounds, are we in

00:08:55.450 --> 00:08:58.110
a bubble or just starting the curve? High-stakes

00:08:58.110 --> 00:09:01.029
poker. But the chips are worth billions. Let

00:09:01.029 --> 00:09:02.690
us pull all these threads together. We covered

00:09:02.690 --> 00:09:04.649
a lot of ground today. If we look at the big

00:09:04.649 --> 00:09:07.350
picture, we are seeing a massive move towards

00:09:07.350 --> 00:09:09.750
specialization. Starting with the Silicon Llama.

00:09:10.250 --> 00:09:12.830
Chips hardwired for specific thoughts. Moving

00:09:12.830 --> 00:09:15.409
away from general purpose to extreme focus. At

00:09:15.409 --> 00:09:17.529
the same time, we are learning the weird psychology

00:09:17.529 --> 00:09:20.370
of the machines. The Google stutter trick proves

00:09:20.370 --> 00:09:22.350
we are still just figuring out how to talk to

00:09:22.350 --> 00:09:24.870
them. And globally, the map is fracturing. The

00:09:24.870 --> 00:09:27.330
U.S. goes it alone, while the global south drives

00:09:27.330 --> 00:09:30.330
massive usage. It makes you wonder, if we are

00:09:30.330 --> 00:09:33.450
baking models into silicon, are we stabilizing

00:09:33.450 --> 00:09:36.370
or just building faster obsolescence? What happens

00:09:36.370 --> 00:09:38.629
when you bake Llama into a chip and it gets outdated

00:09:38.629 --> 00:09:41.730
next Tuesday? You get a very expensive doorstop.

00:09:41.870 --> 00:09:44.110
Before we go, I want to encourage you to try

00:09:44.110 --> 00:09:47.289
that double prompt trick on your next task. Just

00:09:47.289 --> 00:09:49.429
paste your complex instruction twice and see

00:09:49.429 --> 00:09:51.529
what happens. Thank you for joining us on this

00:09:51.529 --> 00:09:54.330
deep dive. Stay curious.
