WEBVTT

00:00:00.000 --> 00:00:02.620
Imagine you walk into a Ferrari dealership. You

00:00:02.620 --> 00:00:04.459
point at their top of the line million dollar

00:00:04.459 --> 00:00:06.919
hypercar. The one on the pedestal, yeah. Exactly.

00:00:07.639 --> 00:00:10.599
And the dealer just looks at you and says, actually,

00:00:10.800 --> 00:00:13.640
we're going to give you that same engine, but

00:00:13.640 --> 00:00:16.039
we're putting it in a sedan, and the gas is 67%

00:00:16.039 --> 00:00:18.640
cheaper. And the sedan somehow gets better mileage.

00:00:19.000 --> 00:00:21.660
That is, I mean, effectively what just happened

00:00:21.660 --> 00:00:24.179
this week. Welcome back to the Deep Dive. It's

00:00:24.179 --> 00:00:26.140
good to be here. Today, we're not just talking

00:00:26.140 --> 00:00:28.539
about new features or specs. We are tracking

00:00:28.539 --> 00:00:31.780
a really massive shift in the industry. This

00:00:31.780 --> 00:00:34.899
pivot from just raw power to what you might call

00:00:34.899 --> 00:00:37.219
radical efficiency. It's the moment the tech

00:00:37.219 --> 00:00:39.759
really matures, you know. We're moving from the

00:00:39.759 --> 00:00:43.020
wow phase to the ROI phase. Right. Can it do

00:00:43.020 --> 00:00:45.659
the job cheaply? And can it do it without hallucinating

00:00:45.659 --> 00:00:49.039
halfway through? That's the key. Lay out the

00:00:49.039 --> 00:00:51.259
map for us today. How are we going to tackle

00:00:51.259 --> 00:00:53.479
this? All right. We've got a heavy lineup. First,

00:00:53.579 --> 00:00:56.359
we're starting with Anthropic's new Claude update,

00:00:56.359 --> 00:01:00.159
Sonnet 4.6. It feels less like an update and

00:01:00.159 --> 00:01:03.340
more like a declaration of war on the whole flagship

00:01:03.340 --> 00:01:06.739
model category. Then we've got a lightning round

00:01:06.739 --> 00:01:10.299
NPR suing Google over a voice that sounds a little

00:01:10.299 --> 00:01:13.879
too familiar. Apple's hardware roadmap for 2027.

00:01:13.959 --> 00:01:16.920
All the big stuff. And we end on the underdog

00:01:16.920 --> 00:01:19.909
story. The biggest story of the week, in my opinion.

00:01:20.030 --> 00:01:23.670
A recruiting startup in China, not a tech giant,

00:01:23.790 --> 00:01:27.769
a recruiting firm, just released a tiny AI model

00:01:27.769 --> 00:01:30.769
that is absolutely beating the giants at their

00:01:30.769 --> 00:01:33.150
own game. I saw those numbers. It doesn't seem

00:01:33.150 --> 00:01:35.230
mathematically possible, but we'll get there.

00:01:35.349 --> 00:01:38.730
So let's start with the big dog. Anthropic, Claude

00:01:38.730 --> 00:01:42.329
Sonnet 4.6. Right. Normally a .6 release is,

00:01:42.390 --> 00:01:45.189
you know, a patch, some bug fixes. But these

00:01:45.189 --> 00:01:48.140
benchmarks feel... Very aggressive. The whole

00:01:48.140 --> 00:01:50.140
headline here is the price to performance ratio.

00:01:50.519 --> 00:01:52.560
Anthropic is saying this new Sonnet delivers

00:01:52.560 --> 00:01:55.480
near Opus performance. And Opus is their Einstein

00:01:55.480 --> 00:01:58.500
model, the expensive heavy lifter. Sonnet's the

00:01:58.500 --> 00:02:00.560
mid-tier. Correct. But the pricing didn't change

00:02:00.560 --> 00:02:02.299
at all. So the pricing didn't move up. No, it

00:02:02.299 --> 00:02:04.760
stayed flat. $3 per million input tokens, 15

00:02:04.760 --> 00:02:08.180
for output. Opus is 5 and 25. OK, let's just

00:02:08.180 --> 00:02:10.180
sit with that for a second. If I'm getting near

00:02:10.180 --> 00:02:13.340
flagship intelligence for those prices, doing

00:02:13.340 --> 00:02:18.050
the math. That's roughly 67% more tokens for

00:02:18.050 --> 00:02:20.889
every dollar I spend. Which is huge. But think

00:02:20.889 --> 00:02:23.030
about what that lets you do. As a developer,

00:02:23.189 --> 00:02:26.870
you can now afford to have the AI think longer,

00:02:27.150 --> 00:02:30.009
maybe check its own work three times for the

00:02:30.009 --> 00:02:32.550
same cost as one answer from the old flagship.

00:02:32.930 --> 00:02:34.710
It totally changes the economics of reliability.
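The tokens-per-dollar arithmetic behind that claim can be checked directly. A back-of-the-envelope sketch using the prices quoted in the episode, not Anthropic's own math:

```python
# Tokens-per-dollar at the prices quoted in the episode ($ per million tokens).
sonnet_input, opus_input = 3.00, 5.00      # Claude Sonnet 4.6 vs. Opus, input
sonnet_output, opus_output = 15.00, 25.00  # output prices

def tokens_per_dollar(price_per_million):
    """How many tokens one dollar buys at a given per-million-token price."""
    return 1_000_000 / price_per_million

# Sonnet buys 5/3 of what Opus buys: about 67% more tokens per dollar.
extra_input = tokens_per_dollar(sonnet_input) / tokens_per_dollar(opus_input) - 1
extra_output = tokens_per_dollar(sonnet_output) / tokens_per_dollar(opus_output) - 1
print(f"input: {extra_input:.0%} more, output: {extra_output:.0%} more")
# input: 67% more, output: 67% more
```

The same 5:3 ratio holds on both input and output, which is where the "67% more for every dollar" figure comes from.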

00:02:35.229 --> 00:02:38.169
It does. It turns intelligence into a commodity

00:02:38.169 --> 00:02:40.590
you can afford to waste a little bit of. And

00:02:40.590 --> 00:02:42.210
it seems like the users are really noticing.

00:02:42.289 --> 00:02:44.900
I was looking at the sentiment analysis. 70%

00:02:44.900 --> 00:02:47.159
of users prefer this new Sonnet over the last

00:02:47.159 --> 00:02:50.319
one. Right. But the stat that blew my mind: 59%

00:02:50.319 --> 00:02:53.460
prefer it over the previous Opus. That's the

00:02:53.460 --> 00:02:55.780
cannibalization metric right there. When your

00:02:55.780 --> 00:02:58.099
mid-tier model beats your old flagship, you've

00:02:58.099 --> 00:03:00.199
successfully raised the floor for the entire

00:03:00.199 --> 00:03:02.960
industry. But why the preference? Is it just

00:03:02.960 --> 00:03:06.639
about speed? Because usually better means smarter,

00:03:06.819 --> 00:03:09.620
not just faster. It's bloat reduction. That's

00:03:09.620 --> 00:03:12.580
what users are saying. Fewer hallucinations and,

00:03:12.740 --> 00:03:15.479
this is interesting, less over-engineering in

00:03:15.479 --> 00:03:17.500
the answers. Oh, I know exactly what that means.

00:03:17.620 --> 00:03:20.020
You ask for a simple two-line Python script

00:03:20.020 --> 00:03:22.379
and it gives you a lecture on coding ethics and

00:03:22.379 --> 00:03:24.319
a five-paragraph history of the language first.

00:03:24.560 --> 00:03:28.520
Exactly that. Sonnet 4.6 just cuts the waffle.

00:03:29.000 --> 00:03:31.900
It's pragmatic. In fact, on the OSWorld

00:03:31.900 --> 00:03:35.360
benchmark, which tests how well an AI can navigate

00:03:35.360 --> 00:03:38.379
computer apps. Like a human would. Yes. It's

00:03:38.379 --> 00:03:40.939
matching the current Opus. It's a worker bee

00:03:40.939 --> 00:03:43.659
with the brain of a queen. Speaking of worker

00:03:43.659 --> 00:03:45.979
bees, there's a new feature they rolled out called

00:03:45.979 --> 00:03:48.919
context compaction. This sounds technical, but

00:03:48.919 --> 00:03:50.439
I have a feeling it's going to matter. It really

00:03:50.439 --> 00:03:52.800
might. Do you ever have that issue where you're

00:03:52.800 --> 00:03:54.819
in a long chat with an AI, maybe working on a

00:03:54.819 --> 00:03:57.280
project, and by message 50, it starts forgetting

00:03:57.280 --> 00:03:59.740
what you told it in message one? Prompt drift.

00:04:00.740 --> 00:04:02.560
I'll be honest. I struggle with this constantly.

00:04:02.699 --> 00:04:05.840
I spend half my time reminding the bot, hey,

00:04:05.879 --> 00:04:08.620
remember, we aren't using that library or don't

00:04:08.620 --> 00:04:10.819
use that tone. It feels like the model gets dementia.

00:04:11.400 --> 00:04:14.620
Right. And that happens because the context window,

00:04:14.759 --> 00:04:17.600
the AI's short-term memory, it gets filled up

00:04:17.600 --> 00:04:19.459
with every single word. It's trying to remember

00:04:19.459 --> 00:04:23.699
every typo from the last four hours. And eventually

00:04:23.699 --> 00:04:25.699
it just collapses under its own weight. Exactly.

00:04:25.759 --> 00:04:28.360
So compaction fixes that. Think of it like a

00:04:28.360 --> 00:04:31.180
court stenographer versus a really good executive

00:04:31.180 --> 00:04:33.379
assistant. Okay. The stenographer writes down

00:04:33.379 --> 00:04:36.550
every single word verbatim. The assistant listens

00:04:36.550 --> 00:04:39.610
for 30 minutes, shreds the full transcript, and

00:04:39.610 --> 00:04:41.629
then writes a perfect three-bullet summary of

00:04:41.629 --> 00:04:44.389
what was actually decided. So the AI is editing

00:04:44.389 --> 00:04:47.829
its own memory in real time. Yes. It summarizes

00:04:47.829 --> 00:04:50.029
the older parts of the conversation. It keeps

00:04:50.029 --> 00:04:52.250
the intent and the decisions, but it throws away

00:04:52.250 --> 00:04:55.089
all the verbatim fluff. This means you can have

00:04:55.089 --> 00:04:57.629
a never-ending conversation. That is huge for

00:04:57.629 --> 00:05:00.540
autonomous agents. If I have a bot monitoring

00:05:00.540 --> 00:05:03.379
my emails for a week, it has to remember the

00:05:03.379 --> 00:05:05.740
rules I set on Monday without crashing on Friday.
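The executive-assistant idea can be sketched in a few lines. A minimal illustration with hypothetical names; `summarize` stands in for a model call, and Anthropic's actual compaction runs inside the API:

```python
def summarize(messages):
    # Stand-in for a model call that digests old turns into decisions only,
    # e.g. "User wants a CSV parser; we agreed not to use that library."
    return " / ".join(m["content"][:40] for m in messages)

def compact(history, max_messages=20, keep_recent=5):
    """Once the chat grows past max_messages, replace the older turns with a
    single summary message and keep only the recent turns verbatim."""
    if len(history) <= max_messages:
        return history
    older, recent = history[:-keep_recent], history[-keep_recent:]
    summary = {"role": "system",
               "content": "Summary of earlier conversation: " + summarize(older)}
    return [summary] + recent

# A 50-message chat shrinks to 1 summary plus 5 recent messages:
chat = [{"role": "user", "content": f"message {i}"} for i in range(50)]
print(len(compact(chat)))  # 6
```

The intent and decisions survive in the summary; the verbatim fluff (and every typo) is thrown away, so the context stays small no matter how long the conversation runs.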

00:05:05.879 --> 00:05:08.060
Exactly. It keeps the context clean. So let me

00:05:08.060 --> 00:05:10.839
ask you this. If Sonnet is this good and this

00:05:10.839 --> 00:05:13.500
cheap, does the flagship model category just

00:05:13.500 --> 00:05:17.040
die? Why would I ever pay for Opus or GPT-5?

00:05:17.259 --> 00:05:20.740
It doesn't die, but it forces flagships to justify

00:05:20.740 --> 00:05:23.160
their premium price. They can't just be good

00:05:23.160 --> 00:05:25.459
anymore. They have to be geniuses. Raising the

00:05:25.459 --> 00:05:28.889
floor forces the ceiling up. Okay, let's pivot.

00:05:29.230 --> 00:05:32.410
The industry never sleeps. Let's hit the lightning

00:05:32.410 --> 00:05:35.589
round. Let's do it. First up, a legal battle

00:05:35.589 --> 00:05:37.529
that feels like it's straight out of a sci-fi

00:05:37.529 --> 00:05:41.449
novel. NPR versus Google. David Greene, the former

00:05:41.449 --> 00:05:44.300
host, is suing. He says Google's NotebookLM

00:05:44.300 --> 00:05:46.920
tool is ripping off his voice. This is fascinating

00:05:46.920 --> 00:05:49.300
because Google's defense is pretty standard.

00:05:49.519 --> 00:05:52.040
We hired a paid actor, and they probably did,

00:05:52.240 --> 00:05:54.339
but Green isn't arguing about the voice print.

00:05:54.579 --> 00:05:56.579
He's arguing about the cadence. Exactly, the

00:05:56.579 --> 00:05:59.839
public radio voice. The pauses, the intonation,

00:05:59.939 --> 00:06:02.060
the way they sort of lean into the microphone.

00:06:02.160 --> 00:06:05.680
It raises this huge legal gray area. Can you

00:06:05.680 --> 00:06:07.560
copyright a pause? Can you copyright a vibe?

00:06:07.899 --> 00:06:10.199
Right, but I think there's a real identity theft

00:06:10.199 --> 00:06:12.180
angle here. If I listen to it and think it's...

00:06:12.199 --> 00:06:14.160
David Greene, isn't that the real problem? That's

00:06:14.160 --> 00:06:17.120
the danger zone. Legally, cadence isn't copyrightable,

00:06:17.120 --> 00:06:19.579
not yet. But if a jury believes Google told the

00:06:19.579 --> 00:06:22.740
AI to sound like David Greene, that gets into

00:06:22.740 --> 00:06:25.439
right of publicity. It's the same thing Scarlett

00:06:25.439 --> 00:06:27.800
Johansson dealt with. It's not about the raw

00:06:27.800 --> 00:06:30.899
data. It's about the intent to mimic. So regarding

00:06:30.899 --> 00:06:34.220
the NPR lawsuit, where is the line between inspiration

00:06:34.220 --> 00:06:37.959
and theft in AI voices? It's blurry. Cadence isn't

00:06:37.959 --> 00:06:41.139
copyrightable, but identity theft is a real legal

00:06:41.139 --> 00:06:44.879
risk. Okay, moving from legal battles to actual

00:06:44.879 --> 00:06:48.300
tools. A new tool called Manus just launched

00:06:48.300 --> 00:06:51.790
agents inside Telegram. This is wild. It's OpenClaw-style

00:06:51.790 --> 00:06:54.170
agents, basically autonomous research

00:06:54.170 --> 00:06:56.449
bots, but they live inside your chat app. So

00:06:56.449 --> 00:06:58.889
no coding needed? None. You just ask it to research

00:06:58.889 --> 00:07:01.490
a topic or process some data, create a PDF, and

00:07:01.490 --> 00:07:03.870
it just does it right there in the chat. It's

00:07:03.870 --> 00:07:05.930
putting a research assistant in the app we use

00:07:05.930 --> 00:07:07.750
for memes. Okay, let's look further down the

00:07:07.750 --> 00:07:10.560
timeline. Apple. Rumors are swirling about an

00:07:10.560 --> 00:07:13.439
acceleration on three wearables. Yes, the roadmap

00:07:13.439 --> 00:07:16.240
seems to be pointing toward late 2027. We're

00:07:16.240 --> 00:07:19.259
hearing about an AI pin, smart glasses, and this

00:07:19.259 --> 00:07:21.660
is the big one, AirPods with really deep Siri

00:07:21.660 --> 00:07:24.639
integration. 2027 sounds so far away, but in

00:07:24.639 --> 00:07:26.860
hardware terms, that's basically tomorrow. It

00:07:26.860 --> 00:07:30.420
is. If production starts in December 2027, we

00:07:30.420 --> 00:07:32.879
are looking at a very, very interesting holiday

00:07:32.879 --> 00:07:36.110
season that year. Apple is clearly betting the

00:07:36.110 --> 00:07:38.689
future isn't just a phone; it's AI whispering

00:07:38.689 --> 00:07:42.240
in your ear and overlaying your vision. Meanwhile,

00:07:42.240 --> 00:07:45.800
on the infrastructure side, OpenAI just introduced

00:07:45.800 --> 00:07:48.560
Lockdown Mode. This is for the enterprise crowd,

00:07:48.560 --> 00:07:52.240
right? Totally. The security-focused teams. It blocks

00:07:52.240 --> 00:07:55.220
risky tools, cuts off live web access. It's like

00:07:55.220 --> 00:07:57.839
putting the AI in a clean room so it can't leak

00:07:57.839 --> 00:08:00.560
secrets or download malware. And Meta? They're just

00:08:00.560 --> 00:08:02.519
buying up all the chips. All of them. Millions

00:08:02.519 --> 00:08:06.019
of Nvidia chips, Grace CPUs, the next-gen Vera

00:08:06.019 --> 00:08:08.899
Rubin systems. They are building a massive war

00:08:09.040 --> 00:08:12.199
chest of compute. Speaking of war chests, Blackstone

00:08:12.199 --> 00:08:15.920
just led a $1.2 billion round for a company

00:08:15.920 --> 00:08:19.060
called Nasa. An Indian AI data center startup.

00:08:19.160 --> 00:08:21.240
This is a huge signal that the demand for AI

00:08:21.240 --> 00:08:23.220
infrastructure is just exploding in emerging

00:08:23.220 --> 00:08:25.240
markets. It's not just a Silicon Valley game

00:08:25.240 --> 00:08:27.480
anymore. Not at all. The physical backbone of

00:08:27.480 --> 00:08:30.199
AI is going global. So when you zoom out on all

00:08:30.199 --> 00:08:33.440
this, lawsuits over voices, agents in Telegram,

00:08:33.740 --> 00:08:37.539
AI in our glasses, data centers in India, what's

00:08:37.539 --> 00:08:40.129
the big picture? The big picture is AI becoming

00:08:40.129 --> 00:08:44.149
more human, more accessible, and just more ubiquitous.

00:08:44.389 --> 00:08:47.509
The gap between computer and partner is really

00:08:47.509 --> 00:08:49.909
starting to vanish. And that transition needs

00:08:49.909 --> 00:08:52.750
new tools. It's not just about the big models.

00:08:52.809 --> 00:08:54.950
It's about that layer in between. For sure. We've

00:08:54.950 --> 00:08:56.429
seen a few interesting ones pop up. We can do

00:08:56.429 --> 00:08:58.840
a quick fire on these. All right. First, Figure

00:08:58.840 --> 00:09:01.320
AI. This one's for the designers. It maps out

00:09:01.320 --> 00:09:04.279
user flows, it spots edge cases, and it can even

00:09:04.279 --> 00:09:06.860
build out A/B variations for you. It's automating

00:09:06.860 --> 00:09:08.759
the logic part of design. Okay, then there's

00:09:08.759 --> 00:09:11.159
Layers. That's a growth planning tool. It generates

00:09:11.159 --> 00:09:14.500
content, social posts, ads, basically a marketing

00:09:14.500 --> 00:09:17.860
agency in a box. And Boost.space v5. This one

00:09:17.860 --> 00:09:20.580
calls itself a persistent context layer. Think

00:09:20.580 --> 00:09:22.360
of it as the glue that holds all your different

00:09:22.360 --> 00:09:24.220
AIs together so they can actually talk to each

00:09:24.220 --> 00:09:28.360
other. And finally... Qwen3.5, an open-weight

00:09:28.360 --> 00:09:31.179
vision language model. The key stat here is just

00:09:31.179 --> 00:09:33.720
fascinating. It delivers the capabilities of

00:09:33.720 --> 00:09:37.340
a nearly 400 billion parameter model. It's huge.

00:09:37.679 --> 00:09:40.139
But with the inference speed of a 17 billion

00:09:40.139 --> 00:09:43.460
parameter model. Okay, hold on. Inference speed.

00:09:43.539 --> 00:09:46.080
Let's define that. Simply put, it's how fast

00:09:46.080 --> 00:09:49.299
the AI thinks and replies. It's the delay between

00:09:49.299 --> 00:09:52.320
you hitting enter and getting an answer. Usually,

00:09:52.360 --> 00:09:55.960
smart models are slow and heavy. Qwen3.5 is

00:09:55.960 --> 00:09:58.679
smart but fast. So with tools like Figure and

00:09:58.679 --> 00:10:01.259
Layers, are we automating the creative director?

00:10:01.500 --> 00:10:03.200
We're automating the grunt work. The director

00:10:03.200 --> 00:10:06.279
still needs to choose the vision. Okay,

00:10:06.340 --> 00:10:08.379
we've talked about the giants: Anthropic, Google,

00:10:08.620 --> 00:10:10.519
Apple. But this is where the story gets really

00:10:10.519 --> 00:10:12.779
interesting. The biggest surprise of the week,

00:10:12.779 --> 00:10:15.539
for me, didn't come from Silicon Valley. It came

00:10:15.539 --> 00:10:18.039
from a recruiting startup in China. This is the

00:10:18.039 --> 00:10:19.740
story everyone should be paying attention to.

00:10:20.000 --> 00:10:23.659
The lab is called NanBeige LLM Lab, and they released

00:10:23.659 --> 00:10:27.820
a model called NanBeige4-3B. 3B, as in 3 billion

00:10:27.820 --> 00:10:31.000
parameters. In the world of LLMs, that is. Tiny.

00:10:31.059 --> 00:10:32.860
That's pocket-sized. It's incredibly small.

00:10:33.259 --> 00:10:36.539
For context, GPT-4 is rumored to be in the trillions

00:10:36.539 --> 00:10:41.080
of parameters. Llama 3 is 70 or 400 billion. 3

00:10:41.080 --> 00:10:42.440
billion is something you could theoretically

00:10:42.440 --> 00:10:45.159
run on a high -end phone. But it's crushing benchmarks,

00:10:45.340 --> 00:10:47.460
despite being so small. It is playing David to

00:10:47.460 --> 00:10:51.750
the industry's Goliath. On Arena Hard V2, a really

00:10:51.750 --> 00:10:55.470
tough stress test, it scored a 73.2. Okay, numbers

00:10:55.470 --> 00:10:57.049
are numbers. Let's talk about coding. That's

00:10:57.049 --> 00:10:59.210
where the rubber meets the road. This is the

00:10:59.210 --> 00:11:02.340
shocker. On LeetCode Weekly Contests... These

00:11:02.340 --> 00:11:04.700
are real coding challenges that actual humans

00:11:04.700 --> 00:11:09.100
take to get jobs. Nan Beige reported an 85.0%

00:11:09.100 --> 00:11:13.659
pass rate. 85%. Now compare that to Qwen3 32B,

00:11:13.919 --> 00:11:17.279
a model 10 times its size, which scored 50%.

00:11:17.279 --> 00:11:20.740
Wait, hold on. A 3 billion parameter model beat

00:11:20.740 --> 00:11:24.519
a 32 billion parameter model by 35 percentage

00:11:24.519 --> 00:11:26.860
points? Yes. That shouldn't be possible. That's

00:11:26.860 --> 00:11:29.080
like a go-kart beating a Formula One car. Imagine

00:11:29.080 --> 00:11:31.860
fitting a grandmaster chess player in your pocket.

00:11:31.860 --> 00:11:34.299
That is what this really represents. It just defies

00:11:34.299 --> 00:11:36.500
the logic that bigger is always better. So how

00:11:36.500 --> 00:11:38.500
did they do it? What's the secret sauce here? It's

00:11:38.500 --> 00:11:40.600
all about the methodology. They didn't just throw

00:11:40.600 --> 00:11:42.620
more data at it; they trained it smarter. They

00:11:42.620 --> 00:11:45.159
used an upgraded fine-tuning mix, but the real

00:11:45.159 --> 00:11:48.139
magic is in the reinforcement learning, or RL.

00:11:48.139 --> 00:11:50.580
Okay, break that down for us. They used a two-stage

00:11:50.580 --> 00:11:54.669
process. First is pointwise RL. The model generates

00:11:54.669 --> 00:11:56.809
eight different answers for a single prompt,

00:11:56.929 --> 00:11:59.529
and a reward model scores each one. Right. And

00:11:59.529 --> 00:12:02.570
they optimize this using something called GRPO,

00:12:02.570 --> 00:12:05.970
group relative policy optimization. Jargon alert.

00:12:06.690 --> 00:12:09.509
GRPO. Think of it as a way to reduce repetitive

00:12:09.509 --> 00:12:12.309
errors. It's like a teacher not just marking

00:12:12.309 --> 00:12:15.029
an answer wrong, but showing the student specifically

00:12:15.029 --> 00:12:17.610
where they started to ramble or make a mistake

00:12:17.610 --> 00:12:20.190
so they don't do it again. It tightens the logic.
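The group-scoring step can be sketched in a few lines. This is illustrative only: the function name and scores are made up, and real GRPO goes on to feed these advantages into a policy-gradient update with a KL penalty:

```python
def grpo_advantages(rewards):
    """GRPO's core trick: score each sampled answer relative to its own group,
    subtracting the group mean and dividing by the group's standard deviation."""
    n = len(rewards)
    mean = sum(rewards) / n
    std = (sum((r - mean) ** 2 for r in rewards) / n) ** 0.5
    return [(r - mean) / std if std else 0.0 for r in rewards]

# Eight answers to one prompt, each scored by a reward model:
scores = [0.9, 0.2, 0.5, 0.7, 0.1, 0.6, 0.8, 0.3]
advantages = grpo_advantages(scores)
# Above-average answers get positive advantages (reinforced);
# below-average ones get negative advantages (discouraged).
```

Because every answer is judged against its own group rather than an absolute baseline, the model gets a clean signal about which of its eight attempts were relatively better, which is what "tightens the logic" in practice.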

00:12:20.350 --> 00:12:23.639
On to the second stage. That's pairwise RL. They

00:12:23.639 --> 00:12:25.799
take a strong response and a weak response and

00:12:25.799 --> 00:12:28.379
force the model to compare them. It uses something

00:12:28.379 --> 00:12:31.620
called a swap consistency regularizer. Okay,

00:12:31.720 --> 00:12:34.059
simpler analogy. It's like wine tasting. It's

00:12:34.059 --> 00:12:35.919
one thing to say this wine is good. It's much

00:12:35.919 --> 00:12:38.840
harder and more educational to taste two wines

00:12:38.840 --> 00:12:41.080
side by side and articulate why one is better.
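The swap idea can be sketched as a simple consistency check. A toy illustration: `prefer` is a hypothetical stand-in judge, not NanBeige's reward model, and the real regularizer is a term in the training loss:

```python
def swap_consistency_penalty(prefer, strong, weak):
    """A consistent judge's preference should flip sign when the pair is
    swapped; the penalty is zero exactly when prefer(a, b) == -prefer(b, a)."""
    return (prefer(strong, weak) + prefer(weak, strong)) ** 2

# A toy judge that prefers longer answers is perfectly swap-consistent:
length_judge = lambda a, b: len(a) - len(b)
print(swap_consistency_penalty(length_judge, "a detailed answer", "short"))  # 0

# An order-biased judge (always favors its first argument) gets penalized:
biased_judge = lambda a, b: 1.0
print(swap_consistency_penalty(biased_judge, "a detailed answer", "short"))  # 4.0
```

Penalizing judges whose verdict depends on which wine is tasted first is what keeps the comparison honest, and that is the sense in which it sharpens the model's judgment.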

00:12:41.220 --> 00:12:43.899
It just sharpens the model's judgment. So it's

00:12:43.899 --> 00:12:46.200
teaching the AI to recognize quality, not just

00:12:46.200 --> 00:12:49.220
generate text. Exactly. And because of this intense

00:12:49.220 --> 00:12:52.159
training, this tiny model supports up to

00:12:52.460 --> 00:12:55.980
256,000 tokens of context. It can do deep search

00:12:55.980 --> 00:12:58.440
with hundreds of tool calls. It's a real agent.

00:12:58.639 --> 00:13:00.940
Does this prove that training technique matters

00:13:00.940 --> 00:13:04.500
more than raw parameter size? Absolutely. Smart

00:13:04.500 --> 00:13:07.179
training beats brute force computing power every

00:13:07.179 --> 00:13:09.559
time. And this really brings us back to the big

00:13:09.559 --> 00:13:12.480
idea of this deep dive. It all circles back to

00:13:12.480 --> 00:13:15.620
efficiency. We started with Claude Sonnet, which

00:13:15.620 --> 00:13:18.220
gives us flagship-level power for a fraction

00:13:18.220 --> 00:13:20.940
of the price. Then we saw tools like Qwen and,

00:13:21.039 --> 00:13:23.299
of course, Nan Beige proving that smaller models

00:13:23.299 --> 00:13:25.419
are becoming incredibly smart through better

00:13:25.419 --> 00:13:27.820
architecture, not just bigger data centers. The

00:13:27.820 --> 00:13:30.220
hardware from Apple and Meta is ramping up to

00:13:30.220 --> 00:13:33.080
support this whole ecosystem. But the software

00:13:33.080 --> 00:13:35.159
itself is actually getting leaner. We're not

00:13:35.159 --> 00:13:37.159
just building bigger brains anymore. We're building

00:13:37.159 --> 00:13:40.039
more efficient ones. And that's crucial. If we

00:13:40.039 --> 00:13:42.899
want AI everywhere in our glasses, our chats,

00:13:43.139 --> 00:13:45.320
our phones, it can't rely on a massive server

00:13:45.320 --> 00:13:48.059
farm for every single thought. It has to be efficient.

00:13:48.220 --> 00:13:50.480
It has to be. Before we go, I just want to leave

00:13:50.480 --> 00:13:53.139
you with one final thought. That Nan Beige model,

00:13:53.320 --> 00:13:56.779
the world-beating, code-crushing AI, it came

00:13:56.779 --> 00:13:59.399
from a recruiting startup. Not Google, not OpenAI.

00:13:59.860 --> 00:14:03.100
A company that helps people find jobs. If a non-AI

00:14:03.100 --> 00:14:05.139
native company can build a state-of-the-art

00:14:05.139 --> 00:14:07.240
model to solve their specific problems,

00:14:07.419 --> 00:14:10.620
like coding and hiring, what happens when every

00:14:10.620 --> 00:14:13.289
company starts doing that? A law firm building

00:14:13.289 --> 00:14:15.850
the best legal model. A hospital building the

00:14:15.850 --> 00:14:18.769
best diagnostic model. It means the era of renting

00:14:18.769 --> 00:14:21.850
these giant general purpose brains from a handful

00:14:21.850 --> 00:14:24.649
of companies might be ending. We could be moving

00:14:24.649 --> 00:14:27.289
toward a world of specialized pocket-sized experts

00:14:27.289 --> 00:14:29.669
that live on our own devices. It's a profound

00:14:29.669 --> 00:14:32.389
shift. It's a future I'm very curious to see

00:14:32.389 --> 00:14:34.289
unfold. If you want to check out the links to

00:14:34.289 --> 00:14:36.490
the new Claude features or download that NanBeige

00:14:36.490 --> 00:14:38.570
model yourself, check out the show notes. And

00:14:38.570 --> 00:14:40.750
give that context compaction a try. It might

00:14:40.750 --> 00:14:42.909
just save your sanity on a long project. Until

00:14:42.909 --> 00:14:45.429
next time, stay curious. Stay curious.
