WEBVTT

00:00:00.000 --> 00:00:03.540
It's December 2nd, 2025. You walk into the OpenAI

00:00:03.540 --> 00:00:06.400
headquarters in San Francisco. Now, usually,

00:00:06.540 --> 00:00:08.699
this is the time for holiday parties, right?

00:00:08.759 --> 00:00:11.000
You're expecting champagne corks popping, maybe

00:00:11.000 --> 00:00:12.779
some high fives. They're the kings of the hill.

00:00:12.800 --> 00:00:14.759
You would certainly think so, yeah. But instead

00:00:14.759 --> 00:00:19.160
of a party, the mood is tense. It's frantic.

00:00:19.300 --> 00:00:22.420
It is officially a code red. A code red. And

00:00:22.420 --> 00:00:25.440
why? Because the era of the bodybuilder AI, you

00:00:25.440 --> 00:00:27.420
know, those massive hulking models that rely

00:00:27.420 --> 00:00:30.160
on pure size to solve problems, is officially

00:00:30.160 --> 00:00:32.880
over and the era of the gymnast has begun. That's

00:00:32.880 --> 00:00:34.780
a vivid way to put it, but honestly, it's not

00:00:34.780 --> 00:00:37.359
far off. The raw strength we've all been obsessing

00:00:37.359 --> 00:00:39.880
over, it's still there, sure. But suddenly, the

00:00:39.880 --> 00:00:41.780
game isn't just about strength anymore. It's

00:00:41.780 --> 00:00:44.240
about precision. It's about agility. And if these

00:00:44.240 --> 00:00:46.439
leaks are real, the whole board has been reset.

00:00:46.969 --> 00:00:49.030
Welcome to the Deep Dive. It's Thursday, January

00:00:49.030 --> 00:00:52.469
22, 2026. Today, we're unpacking something that

00:00:52.469 --> 00:00:54.630
feels like a genuine pivot point in AI history.

00:00:54.810 --> 00:00:57.130
We're looking at the leaked details of OpenAI's

00:00:57.130 --> 00:01:00.369
GPT-5.3, which has the internal, and let's

00:01:00.369 --> 00:01:02.590
be honest, slightly hilarious, codename, Garlic.

00:01:02.810 --> 00:01:05.230
Garlic. It's definitely a choice. A lot earthier

00:01:05.230 --> 00:01:08.189
than, you know, Orion or Gemini, all that celestial

00:01:08.189 --> 00:01:11.650
stuff we usually get. It really is. But don't

00:01:11.650 --> 00:01:14.510
let the name fool you. Our mission today is to

00:01:14.510 --> 00:01:17.680
figure out... why this specific model caused

00:01:17.680 --> 00:01:21.799
such a panic. This code red at OpenAI. We're

00:01:21.799 --> 00:01:23.439
going to look at the engineering breakthroughs,

00:01:23.439 --> 00:01:26.480
specifically around memory and this new

00:01:26.480 --> 00:01:28.459
self-checking thing. And we'll see how it stacks

00:01:28.459 --> 00:01:30.500
up against the current heavyweights, Google's

00:01:30.500 --> 00:01:33.879
Gemini 3 and Anthropic's Claude Opus 4.5. This

00:01:33.879 --> 00:01:36.040
is going to be fun because the specs here aren't

00:01:36.040 --> 00:01:38.180
just "number go up." Right. It's not just "we added

00:01:38.180 --> 00:01:40.319
more zeros." This is a whole philosophical shift

00:01:40.319 --> 00:01:42.900
in how we build this stuff. So let's start with

00:01:42.900 --> 00:01:45.459
that context. We mentioned the code red. This

00:01:45.459 --> 00:01:47.340
leak came from The Information back in December.

00:01:47.620 --> 00:01:50.340
Mark Chen, the chief research officer at OpenAI,

00:01:50.340 --> 00:01:52.780
reportedly shared these details. But help me

00:01:52.780 --> 00:01:55.420
understand the panic. I mean, GPT-5.2 was already

00:01:55.420 --> 00:01:58.019
out. It was a good model. Why the fire drill?

00:01:58.180 --> 00:02:00.219
It all comes down to momentum. In tech, if you

00:02:00.219 --> 00:02:02.680
aren't leading, you're basically dying. And frankly,

00:02:02.920 --> 00:02:05.159
OpenAI was losing ground. Losing ground to who?

00:02:05.519 --> 00:02:07.260
To everyone. I mean, look at the last six months.

00:02:07.379 --> 00:02:10.580
Google drops Gemini 3. And Gemini 3 just dominates

00:02:10.580 --> 00:02:13.500
anything multimodal: video, messy real-world data,

00:02:13.500 --> 00:02:16.419
images. It was the king of that stuff. Right. And

00:02:16.419 --> 00:02:19.280
then on the other side, you had Anthropic with

00:02:19.280 --> 00:02:22.340
Claude Opus 4.5. And let's be honest, if you

00:02:22.340 --> 00:02:25.080
were writing code in late 2025, you were probably

00:02:25.080 --> 00:02:27.340
using Claude. I can vouch for that. I switched

00:02:27.340 --> 00:02:29.039
to Claude for all my scripting. It just felt

00:02:29.039 --> 00:02:32.500
less, I don't know, robotic. Exactly. So internally,

00:02:32.740 --> 00:02:36.300
GPT-5.2 was seen as a Band-Aid. It kept them

00:02:36.300 --> 00:02:38.500
in the conversation, but it wasn't driving it.

00:02:38.580 --> 00:02:40.979
They knew that shipping a GPT-5 that's a little bit

00:02:40.979 --> 00:02:43.180
bigger just wasn't going to cut it. They needed

00:02:43.180 --> 00:02:45.699
a response that fundamentally changed the metric

00:02:45.699 --> 00:02:48.740
of success. And that response is Garlic. You

00:02:48.740 --> 00:02:51.460
used this analogy before we started: the bodybuilder

00:02:51.460 --> 00:02:53.719
versus the gymnast. I want to double click on

00:02:53.719 --> 00:02:55.909
that because for, what, the last five years,

00:02:56.090 --> 00:02:58.349
the headline has always been parameter count.

00:02:58.550 --> 00:03:00.330
Yeah. Trillions of parameters. Bigger clusters,

00:03:00.689 --> 00:03:03.229
more GPUs. Yeah. If you weren't building bigger,

00:03:03.330 --> 00:03:05.849
you weren't trying. And that's the bodybuilder

00:03:05.849 --> 00:03:08.550
approach. You solve problems by just adding more

00:03:08.550 --> 00:03:11.210
muscle. The model doesn't get physics. Add a

00:03:11.210 --> 00:03:13.050
trillion more parameters. Can't write a sonnet.

00:03:13.069 --> 00:03:16.229
Add another trillion. Total brute force. But

00:03:16.229 --> 00:03:19.210
Garlic represents a shift toward density. It's

00:03:19.210 --> 00:03:22.069
the gymnast. It's physically smaller architecturally,

00:03:22.759 --> 00:03:25.719
more compact. But because of that, it can do

00:03:25.719 --> 00:03:28.199
these complex reasoning maneuvers, these mental

00:03:28.199 --> 00:03:31.460
backflips, if you will, that the massive, clumsy

00:03:31.460 --> 00:03:34.180
bodybuilder just can't do. Okay, but I have to

00:03:34.180 --> 00:03:36.500
play the skeptic here. Making it smaller but

00:03:36.500 --> 00:03:39.090
smarter sounds like... Marketing fluff. It sounds

00:03:39.090 --> 00:03:41.050
like the holy grail everyone promises, but nobody

00:03:41.050 --> 00:03:43.330
delivers. Usually when you make a neural net

00:03:43.330 --> 00:03:46.090
smaller, it gets dumber. How did they actually

00:03:46.090 --> 00:03:48.289
do that? It wasn't a straight line. It comes

00:03:48.289 --> 00:03:50.849
down to a merger of two different research tracks

00:03:50.849 --> 00:03:54.110
inside OpenAI. They had Shallotpeat, which was

00:03:54.110 --> 00:03:56.430
just their standard stability update track. Kind

00:03:56.430 --> 00:03:58.490
of boring. But then they had this experimental

00:03:58.490 --> 00:04:00.990
branch called Garlic. And the big breakthrough

00:04:00.990 --> 00:04:04.949
is a technique called EPTE. EPTE? Enhanced

00:04:04.949 --> 00:04:07.520
Pre-Training Efficiency. Okay, break that down for

00:04:07.520 --> 00:04:11.340
me. No jargon. What is EPTE actually doing that's

00:04:11.340 --> 00:04:13.599
different from GPT-4 or 5? Think of it like

00:04:13.599 --> 00:04:16.379
a garden. Or, you know, better yet, the human

00:04:16.379 --> 00:04:19.300
brain. When a baby is born, their brain just

00:04:19.300 --> 00:04:22.000
has this explosion of connections, synapses everywhere.

00:04:22.259 --> 00:04:25.019
It's a mess. Right. As we grow up, we get smarter

00:04:25.019 --> 00:04:27.540
not by adding connections, but by pruning them.

00:04:27.680 --> 00:04:29.860
We cut out the noise so the signal can travel

00:04:29.860 --> 00:04:33.519
faster. So we get smarter by deleting parts of

00:04:33.519 --> 00:04:36.120
our brain. In a way, yes. We delete the inefficiency.

00:04:36.480 --> 00:04:38.980
Traditional AI training is like letting a forest

00:04:38.980 --> 00:04:41.399
grow wild: connections everywhere, vines, weeds,

00:04:41.600 --> 00:04:45.100
you name it. It's huge, but it's messy. EPTE

00:04:45.100 --> 00:04:47.680
introduces a pruning phase during the training.

00:04:47.899 --> 00:04:50.740
The model actively discards redundant neural

00:04:50.740 --> 00:04:53.319
pathways. So it's Marie Kondo-ing its own brain.
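
NOTE
The leak doesn't say how EPTE's pruning actually works, so here is a
minimal, generic magnitude-pruning sketch in PyTorch: it zeroes out the
smallest weights, the standard way to discard low-signal connections.
Illustrative only, not OpenAI's method.
import torch
import torch.nn.utils.prune as prune
# Toy layer standing in for one block of a much larger network.
layer = torch.nn.Linear(4096, 4096)
# Zero out the 30% of weights with the smallest magnitude,
# i.e., the connections that "do not spark joy."
prune.l1_unstructured(layer, name="weight", amount=0.3)
# Bake the zeros in and drop the pruning bookkeeping.
prune.remove(layer, "weight")
sparsity = (layer.weight == 0).float().mean().item()
print(f"share of pruned weights: {sparsity:.0%}")  # ~30%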

00:04:53.540 --> 00:04:56.100
"This neuron does not spark joy." Basically. It's

00:04:56.100 --> 00:04:58.100
cutting out the noise. So the result is you get

00:04:58.100 --> 00:05:00.980
GPT-6-level reasoning capabilities. So higher

00:05:00.980 --> 00:05:03.600
logic scores. Exactly. Higher logic scores. Because

00:05:03.600 --> 00:05:05.720
you've pruned away the inefficiency, it runs

00:05:05.720 --> 00:05:07.980
on a faster, smaller architecture. It's compressing

00:05:07.980 --> 00:05:10.439
thought. That's fascinating. It really does challenge

00:05:10.439 --> 00:05:13.540
that whole "more is better" assumption. But I have

00:05:13.540 --> 00:05:17.139
to ask, is efficiency really as exciting as raw

00:05:17.139 --> 00:05:20.899
power? As a user, do I care if the model is efficient?

00:05:21.060 --> 00:05:23.220
I usually get excited about the next biggest

00:05:23.220 --> 00:05:26.699
thing. Absolutely. And here's why. Latency and

00:05:26.699 --> 00:05:29.079
cost. Think of it like the difference between

00:05:29.079 --> 00:05:31.920
a muscle car and a Formula One racer. The muscle

00:05:31.920 --> 00:05:34.259
car has raw power and makes a ton of noise, burns

00:05:34.259 --> 00:05:37.720
a ton of gas. But the F1 car, it has precision

00:05:37.720 --> 00:05:40.560
engineering. It turns on a dime. In this case,

00:05:40.620 --> 00:05:43.259
precision means the model is cheaper to run and

00:05:43.259 --> 00:05:45.759
faster to answer. And when intelligence gets

00:05:45.759 --> 00:05:48.040
cheap and fast, you can use it in ways you never

00:05:48.040 --> 00:05:50.500
could with the big, slow bodybuilder. So it's

00:05:50.500 --> 00:05:53.519
shifting from raw horsepower to agility, and

00:05:53.519 --> 00:05:55.920
that changes what it's useful for. Exactly. It

00:05:55.920 --> 00:05:58.519
moves from a consultant you hire once to a worker

00:05:58.519 --> 00:06:00.519
that lives in your computer. Let's move to the

00:06:00.519 --> 00:06:02.800
specs, because while the philosophy is cool,

00:06:02.899 --> 00:06:05.259
the actual numbers here are, well, they're staggering.

00:06:05.459 --> 00:06:07.160
And I want to make sure we really get what they

00:06:07.160 --> 00:06:10.629
mean day to day. Let's do it. So, memory first.

00:06:10.990 --> 00:06:14.689
The context window. Garlic is reportedly shipping

00:06:14.689 --> 00:06:19.629
with a 400,000-token context window. Now, just

00:06:19.629 --> 00:06:23.310
to play devil's advocate here, Gemini 3 has 2 million

00:06:23.310 --> 00:06:27.069
tokens. So on paper, Garlic looks smaller. Why

00:06:27.069 --> 00:06:29.540
should I be impressed by 400k? On paper, yeah,

00:06:29.600 --> 00:06:31.660
Gemini is bigger, but this is where the nuance

00:06:31.660 --> 00:06:34.040
really matters. If you've ever used Gemini's

00:06:34.040 --> 00:06:36.500
huge context window, like you dump a whole novel

00:06:36.500 --> 00:06:37.879
in there and ask about a character from chapter

00:06:37.879 --> 00:06:40.199
three, you might have noticed middle-of-the-context

00:06:40.199 --> 00:06:42.420
loss. Right. I have seen this. It remembers

00:06:42.420 --> 00:06:44.560
the very beginning and the very end of your prompt,

00:06:44.699 --> 00:06:46.639
but it gets hazy on everything in the middle.

00:06:46.680 --> 00:06:48.579
It's like it skimmed the book. Exactly. It's

00:06:48.579 --> 00:06:50.839
the needle-in-a-haystack problem. Yeah. Gemini

00:06:50.839 --> 00:06:52.879
has a huge stomach, but imperfect digestion.

00:06:53.579 --> 00:06:56.060
Garlic uses a new attention mechanism that reportedly

00:06:56.060 --> 00:06:59.019
gives it perfect recall across the whole 400,000

00:06:59.019 --> 00:07:01.060
tokens. It doesn't just store the data.

00:07:01.100 --> 00:07:02.939
It actually remembers it. So I could feed it

00:07:02.939 --> 00:07:04.860
literally my entire company's documentation.

00:07:05.180 --> 00:07:08.680
Every PDF, every Slack policy. Every messy Confluence

00:07:08.680 --> 00:07:10.519
page. And it wouldn't just have it. It would

00:07:10.519 --> 00:07:13.019
know it. It would know it. You could ask, what

00:07:13.019 --> 00:07:15.000
was that compliance rule we changed three years

00:07:15.000 --> 00:07:18.079
ago from that one Tuesday memo? And it just pulls

00:07:18.079 --> 00:07:20.920
it instantly. No skimming. That's the difference

00:07:20.920 --> 00:07:23.829
between a hard drive and a brain. Precisely.
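
NOTE
You can probe a recall claim like this yourself: plant one fact deep in
a long document and ask for it. A minimal sketch follows; the model name
and the commented-out client call are placeholders, since Garlic's API
surface hasn't been announced.
def make_haystack(n_lines: int, needle: str, pos: int) -> str:
    lines = [f"Log entry {i}: nothing notable happened." for i in range(n_lines)]
    lines[pos] = needle  # bury the one fact that matters
    return "\n".join(lines)
needle = "Compliance rule 7 changed that Tuesday: retain records for 10 years."
document = make_haystack(20_000, needle, pos=9_500)  # deep mid-context
question = "What changed about compliance rule 7, and when?"
# reply = client.responses.create(model="gpt-5.3", input=document + "\n" + question)
# A model with true full-window recall answers from the middle of the
# prompt, not just from its start and end.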

00:07:23.870 --> 00:07:25.930
But here's the part that actually stopped me

00:07:25.930 --> 00:07:28.810
in my tracks. It's not just how much it can take

00:07:28.810 --> 00:07:32.189
in, it's how much it can put out. The output

00:07:32.189 --> 00:07:34.769
limit. This is the big one. This is that wonder

00:07:34.769 --> 00:07:39.579
moment. 128,000 tokens. In a single response.

00:07:39.920 --> 00:07:42.459
It's huge. I mean, just for context, for everyone

00:07:42.459 --> 00:07:44.339
listening right now, we're all used to the model

00:07:44.339 --> 00:07:46.920
just stopping. You ask it to write code. It gets

00:07:46.920 --> 00:07:48.800
halfway through a function and it just cuts off.

00:07:48.860 --> 00:07:50.899
And you have to type continue. And then it repeats

00:07:50.899 --> 00:07:53.160
the last line or it forgets the indentation.

00:07:53.220 --> 00:07:55.439
It's so fragmented. It feels like pulling teeth

00:07:55.439 --> 00:07:57.699
sometimes. It totally breaks your flow state.

00:07:57.819 --> 00:08:01.519
But 128,000 tokens. That's not a response. That's

00:08:01.519 --> 00:08:04.410
a novel. That's an entire software library. It

00:08:04.410 --> 00:08:06.810
is. And I want you to really just pause and imagine

00:08:06.810 --> 00:08:09.870
the experience of that. Imagine sitting at your

00:08:09.870 --> 00:08:13.990
terminal. You ask for a complex legal brief or

00:08:13.990 --> 00:08:17.050
maybe a full backend architecture for a new app,

00:08:17.129 --> 00:08:19.350
not just the outline, the actual code. You hit

00:08:19.350 --> 00:08:22.670
enter. And instead of a summary or a little piece

00:08:22.670 --> 00:08:26.939
of it, you just watch the cursor move. And it

00:08:26.939 --> 00:08:30.019
keeps moving. And it writes the files. It writes

00:08:30.019 --> 00:08:32.379
the documentation. It writes the test suite.

00:08:32.500 --> 00:08:34.679
And it just doesn't stop until the thought is

00:08:34.679 --> 00:08:38.259
complete. One coherent stream of creation. That

00:08:38.259 --> 00:08:41.639
is, wow. It's almost overwhelming to think about.

00:08:41.679 --> 00:08:43.700
It feels like we're moving from just chatting

00:08:43.700 --> 00:08:46.460
with a bot to manufacturing with a machine. That's

00:08:46.460 --> 00:08:48.440
the shift. No more chunking. No more stitching

00:08:48.440 --> 00:08:50.240
things together. You're not pasting snippets

00:08:50.240 --> 00:08:52.580
into VS Code anymore. You're reviewing a finished

00:08:52.580 --> 00:08:55.000
product. But does infinite memory actually change

00:08:55.000 --> 00:08:57.789
how we work? Or does it just change how much

00:08:57.789 --> 00:08:59.789
junk we dump into the chat box? I worry we'll

00:08:59.789 --> 00:09:02.110
just get lazier. I think it fundamentally changes

00:09:02.110 --> 00:09:04.230
the human's role. It stops us from being librarians

00:09:04.230 --> 00:09:06.889
of data, you know, constantly fetching context,

00:09:07.149 --> 00:09:10.009
pasting files, reminding the bot what we're talking

00:09:10.009 --> 00:09:12.850
about, and lets us be architects of ideas. You

00:09:12.850 --> 00:09:15.389
design the structure, the model pours all the

00:09:15.389 --> 00:09:17.990
concrete. So the output limit frees us from being

00:09:17.990 --> 00:09:21.000
data fetchers. To be designers instead. Spot

00:09:21.000 --> 00:09:23.799
on. It shifts the cognitive load from memory

00:09:23.799 --> 00:09:26.879
to strategy. That distinction, architect versus

00:09:26.879 --> 00:09:30.120
librarian, really lands. Because the other

00:09:30.120 --> 00:09:32.980
feature Garlic supposedly has leans heavily

00:09:32.980 --> 00:09:35.120
into that architect role. We're talking about

00:09:35.120 --> 00:09:37.820
native agents. Yes. This is another area where

00:09:37.820 --> 00:09:40.000
that code red was necessary. Because right now,

00:09:40.019 --> 00:09:42.340
everyone tries to make AI agents "go do this task."

00:09:42.700 --> 00:09:45.059
And, well, usually they fail. They get stuck

00:09:45.059 --> 00:09:47.019
in loops. Or they hallucinate a file path that

00:09:47.019 --> 00:09:49.440
doesn't exist and then crash. Right. But Garlic

00:09:49.440 --> 00:09:51.820
isn't pretending to be an agent. The tool use

00:09:51.820 --> 00:09:54.340
is native. It understands file systems. It can

00:09:54.340 --> 00:09:56.980
run tests. It can debug like a developer. It

00:09:56.980 --> 00:09:59.620
treats APIs not as some external thing it has

00:09:59.620 --> 00:10:02.000
to awkwardly reach for, but as part of its own

00:10:02.000 --> 00:10:04.480
cognitive process. So it's the difference between

00:10:04.480 --> 00:10:06.779
me trying to speak French with a dictionary in

00:10:06.779 --> 00:10:09.399
my hand, looking up every word. Versus actually

00:10:09.399 --> 00:10:11.659
being fluent in French. Exactly. It thinks in

00:10:11.659 --> 00:10:14.980
tools. It thinks in execution. If it writes code

00:10:14.980 --> 00:10:17.950
that... fails a test, it sees the error, corrects

00:10:17.950 --> 00:10:20.129
it, and reruns it, all before it even gets back

00:10:20.129 --> 00:10:24.350
to you.
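
NOTE
A sketch of that write, test, fix loop in plain Python. The pytest
subprocess call is real; generate_patch is a stand-in for a model call
that reads the failure output and edits the files, which is the
hypothetical part.
import subprocess
def agent_fix_loop(generate_patch, max_rounds: int = 5) -> bool:
    for _ in range(max_rounds):
        result = subprocess.run(["pytest", "-x", "-q"],
                                capture_output=True, text=True)
        if result.returncode == 0:
            return True  # tests pass: hand the finished code back
        # Feed the failure text to the model so it can revise the code.
        generate_patch(result.stdout + result.stderr)
    return False  # still failing after max_rounds attempts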

00:10:24.350 --> 00:10:28.669
Now, speaking of thinking and correcting itself, yeah, there's one feature here that I think

00:10:28.669 --> 00:10:31.769
solves the single biggest anxiety I have when

00:10:31.769 --> 00:10:34.509
I use these tools. Now, I want to be a little vulnerable

00:10:34.509 --> 00:10:36.409
for a second. Go for it. I still struggle with

00:10:36.409 --> 00:10:39.330
trusting the output. You know, I'll spend 20 minutes

00:10:39.330 --> 00:10:41.509
crafting this perfect prompt. I get an answer

00:10:41.509 --> 00:10:43.850
that looks incredible, super confident, polished.

00:10:44.049 --> 00:10:46.049
And then I spend 40 minutes fact-checking it

00:10:46.049 --> 00:10:47.830
because I've been burned by hallucinations before.

00:10:48.049 --> 00:10:50.549
Oh, yeah. The universal experience. The confident

00:10:50.549 --> 00:10:52.990
liar problem. Exactly. And it creates this weird

00:10:52.990 --> 00:10:55.529
friction where I'm like, is this actually faster

00:10:55.529 --> 00:10:58.409
if I have to babysit it? But Garlic has a

00:10:58.409 --> 00:11:01.590
self-checking mechanism. This is a game changer for

00:11:01.590 --> 00:11:04.679
exactly that anxiety. Before the model answers

00:11:04.679 --> 00:11:07.059
you, it enters a verification state. It just

00:11:07.059 --> 00:11:10.019
pauses. It checks its own internal knowledge

00:11:10.019 --> 00:11:12.620
graph to see, do I actually know this or am I

00:11:12.620 --> 00:11:14.759
just statistically guessing? So it has a conscience

00:11:14.759 --> 00:11:17.779
or at least a built-in BS detector. It has a

00:11:17.779 --> 00:11:20.500
System 2 thinking process, to use the Daniel

00:11:20.500 --> 00:11:23.639
Kahneman term. System 1 is fast, intuitive.

00:11:24.080 --> 00:11:27.840
System 2 is slow, deliberate. Garlic uses System

00:11:27.840 --> 00:11:31.570
2. If it isn't sure, it reassesses. The report

00:11:31.570 --> 00:11:34.149
says this leads to drastically fewer hallucinations.
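
NOTE
OpenAI hasn't published how the verification state works, so this only
shows the shape of the idea: a second, deliberate pass that can veto a
fluent first draft. ask_model is a placeholder for any completion call.
def answer_with_check(question: str, ask_model) -> str:
    draft = ask_model(f"Answer concisely: {question}")
    verdict = ask_model(
        "Review this draft. Reply SUPPORTED only if you are confident in "
        f"every claim, otherwise reply UNSURE.\nQ: {question}\nA: {draft}"
    )
    if "UNSURE" in verdict:
        # Slow down and re-answer deliberately, admitting any gaps.
        return ask_model(
            f"Answer carefully, saying 'I don't know' where unsure: {question}"
        )
    return draft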

00:11:34.409 --> 00:11:37.230
For lawyers, for developers, this is everything.

00:11:37.470 --> 00:11:39.409
But I have to play devil's advocate again. If

00:11:39.409 --> 00:11:41.389
the model is stopping to check its work, doesn't

00:11:41.389 --> 00:11:43.129
that make it slower? We were just talking about

00:11:43.129 --> 00:11:45.409
speed being the new king. It might pause briefly

00:11:45.409 --> 00:11:47.950
before the first token appears, maybe a second

00:11:47.950 --> 00:11:49.889
or two of thinking time. But think about the

00:11:49.889 --> 00:11:51.889
time you just described, the 40 minutes you spend

00:11:51.889 --> 00:11:54.149
fact -checking. It might be slower per second,

00:11:54.230 --> 00:11:56.950
but it saves hours of human rework later. It's

00:11:56.950 --> 00:12:00.220
that Navy SEAL mantra. Slow is smooth, and smooth

00:12:00.220 --> 00:12:03.700
is fast. So the pause pays for itself by eliminating

00:12:03.700 --> 00:12:06.500
all that cleanup time. Exactly. It trades milliseconds

00:12:06.500 --> 00:12:09.879
of latency for hours of reliability. I'll take

00:12:09.879 --> 00:12:12.360
that trade any day. Okay, I want to zoom out

00:12:12.360 --> 00:12:15.039
a bit. We have the specs, the philosophy. But

00:12:15.039 --> 00:12:18.379
OpenAI isn't operating in a vacuum. They're in

00:12:18.379 --> 00:12:21.919
a war. We mentioned Google and Anthropic. How

00:12:21.919 --> 00:12:24.399
does Garlic actually stack up? This is the battle

00:12:24.399 --> 00:12:26.200
for the leaderboard. So let's look at Google

00:12:26.200 --> 00:12:29.070
first. Gemini 3. The heavyweight. The heavyweight,

00:12:29.129 --> 00:12:31.269
yeah. If you look at the leaked benchmarks, the

00:12:31.269 --> 00:12:34.309
battle here is scale versus density. Gemini 3

00:12:34.309 --> 00:12:38.370
wins on multimodal. If you have messy, real-world

00:12:38.370 --> 00:12:42.129
data: video, audio, weird images, Gemini is still

00:12:42.129 --> 00:12:44.830
the king. It has that massive context and parameter

00:12:44.830 --> 00:12:46.830
count for a reason. So if I'm analyzing a movie,

00:12:46.909 --> 00:12:49.350
I use Gemini. Correct. But Garlic wins on pure

00:12:49.350 --> 00:12:52.009
text, code, and complex reasoning. The benchmark

00:12:52.009 --> 00:12:55.210
is something called GDPval for reasoning. What's

00:12:55.210 --> 00:12:56.970
that measuring? It's measuring logic puzzles,

00:12:57.250 --> 00:12:59.110
multi-step reasoning where you can't just memorize

00:12:59.110 --> 00:13:02.730
an answer. Garlic is scoring 70.9%. Gemini is

00:13:02.730 --> 00:13:06.110
at 53.3%. Wow. That is not a small margin. That's

00:13:06.110 --> 00:13:08.549
a generational gap. It's a blowout on reasoning.

00:13:08.669 --> 00:13:11.570
So the verdict is: analyze a three-hour video?

00:13:11.730 --> 00:13:14.429
Use Gemini. Build the back end of a banking app

00:13:14.429 --> 00:13:17.590
where logic is everything? You use Garlic. Okay,

00:13:17.629 --> 00:13:21.669
so that's Google. What about Anthropic? Claude Opus

00:13:21.669 --> 00:13:24.470
4.5. I know a ton of developers who swear by

00:13:24.470 --> 00:13:29.029
Claude. It feels warmer. It writes really readable

00:13:29.029 --> 00:13:31.350
code. Yeah, this is the battle for the developer's

00:13:31.350 --> 00:13:34.450
soul. Claude is known for that warmth and readability.

00:13:35.250 --> 00:13:37.769
But Garlic is coming in with a ruthless value

00:13:37.769 --> 00:13:40.129
proposition. It matches Claude's coding proficiency

00:13:40.129 --> 00:13:43.789
94.2% on HumanEval Plus, which is the gold

00:13:43.789 --> 00:13:45.590
standard. So it's just as good at the actual

00:13:45.590 --> 00:13:48.269
coding. Okay, so it's a tie. Not quite, because

00:13:48.269 --> 00:13:50.889
Garlic does it at two times the speed and half

00:13:50.889 --> 00:13:53.549
the cost. Half the cost. Because of that pruning

00:13:53.549 --> 00:13:56.059
we talked about. The model is physically smaller.

00:13:56.240 --> 00:13:58.779
It burns less electricity. It costs OpenAI less

00:13:58.779 --> 00:14:01.879
to run, so it costs you less to use. That's significant.

00:14:02.120 --> 00:14:04.179
Yeah. But does price really trump everything?

00:14:04.320 --> 00:14:06.580
I mean, if Claude feels more human to interact

00:14:06.580 --> 00:14:09.240
with, won't people stick with it? For a casual

00:14:09.240 --> 00:14:11.980
chat? Maybe. If you're brainstorming, you might

00:14:11.980 --> 00:14:14.360
stick with Claude. But for running a business

00:14:14.360 --> 00:14:16.980
at scale, if you're an API customer processing

00:14:16.980 --> 00:14:20.200
millions of requests, half price is everything.
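
NOTE
Back-of-the-envelope math on why half price matters at API scale. The
per-million-token prices below are invented for illustration; only the
"half the cost" ratio comes from the leak.
requests_per_day = 1_000_000
tokens_per_request = 2_000
def daily_cost(price_per_million_tokens: float) -> float:
    millions_of_tokens = requests_per_day * tokens_per_request / 1e6
    return millions_of_tokens * price_per_million_tokens
print(daily_cost(15.00))  # incumbent rate: $30,000 per day
print(daily_cost(7.50))   # at half the rate: $15,000 per day
# The gap, $15,000 every single day, is what "half the cost" means at volume.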

00:14:20.679 --> 00:14:23.360
The unit economics alone will shift the market

00:14:23.360 --> 00:14:25.120
overnight. It's the difference between a boutique

00:14:25.120 --> 00:14:28.559
shop and like industrial scale. Exactly. If you're

00:14:28.559 --> 00:14:31.279
building a startup and your API bill just got

00:14:31.279 --> 00:14:34.039
cut in half, you're not going to care how warm

00:14:34.039 --> 00:14:36.559
the model feels. You care that it works and it's

00:14:36.559 --> 00:14:40.029
cheap. So Garlic competes on unit economics

00:14:40.029 --> 00:14:42.789
and logic and sort of cedes the human touch to

00:14:42.789 --> 00:14:45.730
Claude. For now, yeah. It's an industrial revolution,

00:14:45.909 --> 00:14:48.529
not a dinner party. So we have the what and the

00:14:48.529 --> 00:14:51.409
why. I want to take a brief pause here. And we're

00:14:51.409 --> 00:14:53.629
back. We've looked at the code red, the tech,

00:14:53.629 --> 00:14:55.610
and the competition. I want to wrap our heads

00:14:55.610 --> 00:14:58.110
around the timeline and the big picture. Leaks

00:14:58.110 --> 00:15:00.029
are great, but shipping is what matters. When

00:15:00.029 --> 00:15:01.789
are we actually going to see this thing? Well,

00:15:01.830 --> 00:15:04.070
looking at how the leaks and vendor updates are

00:15:04.070 --> 00:15:06.710
converging, it feels imminent. We're expecting

00:15:06.710 --> 00:15:10.090
a preview release, probably to ChatGPT Pro users

00:15:10.090 --> 00:15:14.070
and some partners in late January 2026. Late

00:15:14.070 --> 00:15:16.409
January. That's basically this week. It's happening

00:15:16.409 --> 00:15:20.250
now. Then the full API availability is slated

00:15:20.250 --> 00:15:23.210
for February. And this is interesting. They're

00:15:23.210 --> 00:15:25.529
expected to integrate a version of this into

00:15:25.529 --> 00:15:28.549
the free tier by March. A counter to Gemini's

00:15:28.549 --> 00:15:30.990
free access. Exactly. They have to capture that

00:15:30.990 --> 00:15:33.990
user base. They can't let Google own the

00:15:33.990 --> 00:15:36.730
entry-level market. Pulling all this together, the

00:15:36.730 --> 00:15:40.169
pruning, the 128K output, the self -checking,

00:15:40.190 --> 00:15:42.850
what's the big idea here? If I'm a listener trying

00:15:42.850 --> 00:15:45.330
to make sense of all this, what is the core shift?

00:15:45.690 --> 00:15:48.070
The core shift is that the definition of AI progress

00:15:48.070 --> 00:15:51.490
has changed. For five years, progress meant bigger.

00:15:51.649 --> 00:15:54.190
It meant more parameters. Now, progress means

00:15:54.190 --> 00:15:56.830
cognitive density. It is about intelligence per

00:15:56.830 --> 00:15:59.149
dollar and intelligence per watt. Cognitive density.

00:15:59.309 --> 00:16:01.629
I like that. It sounds focused. It's about doing

00:16:01.629 --> 00:16:04.159
more with less. And for you listening, the

00:16:04.159 --> 00:16:05.919
so-what is pretty direct. If you're a developer,

00:16:06.080 --> 00:16:08.759
you can finally refactor entire code bases without

00:16:08.759 --> 00:16:10.840
losing context. You don't have to choose which

00:16:10.840 --> 00:16:13.320
files to upload. If you're a business, you can

00:16:13.320 --> 00:16:15.340
build automation that actually works because

00:16:15.340 --> 00:16:17.720
of the agentic capabilities. And if you're a

00:16:17.720 --> 00:16:21.340
creator, you can generate long-form content: books,

00:16:21.600 --> 00:16:25.139
scripts, courses, without having to manually stitch

00:16:25.139 --> 00:16:26.940
it together. It feels like the training wheels

00:16:26.940 --> 00:16:29.820
are coming off. They are. The limitations we've

00:16:29.820 --> 00:16:32.240
all learned to work around: "Oh, I can only paste

00:16:32.240 --> 00:16:34.379
half this file." "Oh, I have to check its math."

00:16:34.600 --> 00:16:38.360
Those limitations are just evaporating. So before

00:16:38.360 --> 00:16:40.500
we sign off, let's give everyone listening something

00:16:40.500 --> 00:16:42.659
to do. Because if this is dropping in weeks,

00:16:42.779 --> 00:16:45.460
we shouldn't just be waiting around. How do we

00:16:45.460 --> 00:16:48.600
prepare for the Garlic era? Two things. First,

00:16:48.840 --> 00:16:51.899
organize your data. If you want to use that 400,000

00:16:51.899 --> 00:16:54.419
-token context window with perfect recall,

00:16:54.740 --> 00:16:57.419
your data needs to be ready. Clean up your documentation,

00:16:57.759 --> 00:17:00.200
merge your repositories, get it ready to go.

00:17:00.399 --> 00:17:03.000
Don't feed the gymnast junk food. Exactly. If

00:17:03.000 --> 00:17:04.660
you feed it garbage, you'll still get garbage.

00:17:04.740 --> 00:17:06.920
Just perfectly recalled garbage. Good point.

00:17:07.099 --> 00:17:10.529
And second, map your workflows. Start thinking

00:17:10.529 --> 00:17:13.430
in terms of agentic workflows. Don't just think,

00:17:13.470 --> 00:17:16.230
what question can I ask the bot? Think, what

00:17:16.230 --> 00:17:18.450
multi-step process can I hand off entirely?
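
NOTE
One lightweight way to map a workflow today: write the hand-off as an
ordered list of steps, each with a plain-language instruction. The step
wording and the run_agent callable are hypothetical; the decomposition
is the point.
INVOICE_RECONCILIATION = [
    "load unpaid invoices from the accounting export",
    "find each invoice's payment confirmation in the email archive",
    "flag any invoice with no matching email",
    "append the results to the reconciliation spreadsheet",
]
def run_workflow(steps, run_agent):
    results = []
    for step in steps:
        # Each step sees the instruction plus everything done so far.
        results.append(run_agent(step, results))
    return results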

00:17:18.869 --> 00:17:21.309
"Check these invoices against these emails and

00:17:21.309 --> 00:17:23.710
update the spreadsheet." If you map those out

00:17:23.710 --> 00:17:26.450
now, when Garlic drops, you can plug it in and

00:17:26.450 --> 00:17:28.930
it will just work. That's great advice. Whether

00:17:28.930 --> 00:17:31.769
it ends up being called Garlic or GPT-5.3 or

00:17:31.769 --> 00:17:34.329
something else entirely, the message is clear.

00:17:34.430 --> 00:17:38.890
The era of... giant, slow, forgetful AI is ending.

00:17:39.029 --> 00:17:40.809
And the people who are ready to build with these

00:17:40.809 --> 00:17:43.230
new efficient models are going to have a massive

00:17:43.230 --> 00:17:45.930
advantage. A massive advantage indeed. Thank

00:17:45.930 --> 00:17:47.730
you for walking us through the Code Red. Always

00:17:47.730 --> 00:17:49.849
a pleasure. And to you listening, thank you for

00:17:49.849 --> 00:17:51.950
diving deep with us. Go clean up your data, get

00:17:51.950 --> 00:17:54.269
your workflows ready, and we will see you on

00:17:54.269 --> 00:17:55.170
the next one. Take care.
