WEBVTT

00:00:00.000 --> 00:00:04.660
DeepSeek V3.2, an open source model, just scored

00:00:04.660 --> 00:00:07.480
a gold medal level on the International Mathematical

00:00:07.480 --> 00:00:10.800
Olympiad. That one fact, that single data point,

00:00:12.080 --> 00:00:14.140
it basically changes everything we thought we

00:00:14.140 --> 00:00:16.739
knew about the AI arms race. It's really hard

00:00:16.739 --> 00:00:19.600
to overstate what a big deal that is. For years,

00:00:19.699 --> 00:00:21.539
we've all just kind of assumed that only the

00:00:21.539 --> 00:00:24.600
trillion dollar companies, big players, could

00:00:24.600 --> 00:00:27.379
build this kind of state of the art AI. The source

00:00:27.379 --> 00:00:29.480
material we're looking at today proves that assumption

00:00:29.480 --> 00:00:31.780
is now, well, it's history. Open source isn't

00:00:31.780 --> 00:00:33.780
just catching up anymore. In some key areas,

00:00:33.780 --> 00:00:36.000
it's actually leading. It's leading. Exactly.

00:00:36.539 --> 00:00:38.939
So our mission in this deep dive is to get you

00:00:38.939 --> 00:00:41.159
straight to what matters in these sources. We're

00:00:41.159 --> 00:00:42.799
going to unpack the two different versions of

00:00:42.799 --> 00:00:46.039
V3.2. We'll review those frankly shocking benchmark

00:00:46.039 --> 00:00:49.000
scores against models like GPT-5.2 and Gemini.

00:00:49.320 --> 00:00:51.039
And then we'll get into the technical magic.

00:00:51.460 --> 00:00:54.210
Yeah, that DeepSeek Sparse Attention. That's

00:00:54.210 --> 00:00:56.590
a big piece of the puzzle. And we'll finish up

00:00:56.590 --> 00:00:59.270
with the real-world coding tests that prove it's

00:00:59.270 --> 00:01:02.490
not just hype, and maybe most importantly, the

00:01:02.490 --> 00:01:05.170
price tag that is already sending waves through

00:01:05.170 --> 00:01:07.129
the whole industry. It's a lot to get through,

00:01:07.150 --> 00:01:09.510
so let's just dive right in. Okay, so let's start

00:01:09.510 --> 00:01:11.549
with the strategy. Because the first thing that

00:01:11.549 --> 00:01:13.760
jumps out... from the sources is that they didn't

00:01:13.760 --> 00:01:17.560
just release one model, they launched V3.2 in

00:01:17.560 --> 00:01:20.920
two distinct flavors. Which was such a smart

00:01:20.920 --> 00:01:23.519
move, so user-focused. They really tailored

00:01:23.519 --> 00:01:25.299
the models for different needs, which is something

00:01:25.299 --> 00:01:27.689
you don't always see done this well. First, you

00:01:27.689 --> 00:01:30.489
have the standard V3.2. You can think of this

00:01:30.489 --> 00:01:32.510
as, I don't know, the reliable everyday car.

00:01:32.650 --> 00:01:34.750
It's efficient, it's fast, and it handles common

00:01:34.750 --> 00:01:37.510
tasks perfectly. You know, drafting emails, summarizing

00:01:37.510 --> 00:01:40.310
articles, basic coding. It's built to be cheap

00:01:40.310 --> 00:01:42.969
and effective. And then you have the DeepSeek

00:01:42.969 --> 00:01:47.120
V3.2 Speciale, this one. This is the race car.

00:01:47.319 --> 00:01:50.239
It's built for heavy complex thinking. If you're

00:01:50.239 --> 00:01:52.560
tackling a really tough math proof or a complex

00:01:52.560 --> 00:01:54.859
engineering problem that needs deep multi-step

00:01:54.859 --> 00:01:57.840
reasoning, you bring out Speciale. And the sources

00:01:57.840 --> 00:02:00.060
are pretty clear on how it does that. The Speciale

00:02:00.060 --> 00:02:03.540
model just, it allocates more compute, more power

00:02:03.540 --> 00:02:06.599
to think deeper before it even writes a single

00:02:06.599 --> 00:02:08.900
word. But what's interesting is that they both

00:02:08.900 --> 00:02:11.520
share the same core architecture. The sources

00:02:11.520 --> 00:02:14.139
call them reasoning-first models. So unlike a

00:02:14.139 --> 00:02:16.699
lot of older models that are just sort of guessing

00:02:16.699 --> 00:02:19.199
the next most likely word. Right. And that's

00:02:19.199 --> 00:02:21.639
when they can start to hallucinate or just break

00:02:21.639 --> 00:02:26.080
down logically. Precisely. This model, V3.2, it

00:02:26.080 --> 00:02:28.479
actively tries to understand the logic and the

00:02:28.479 --> 00:02:30.599
structure of your question before it generates

00:02:30.599 --> 00:02:32.719
an answer. It builds a logical framework first.

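To make that "logical framework first" idea concrete, here is a toy Python sketch of a plan-then-answer loop. Everything in it is hypothetical: `build_plan` and `answer` are invented stand-ins for illustration, not DeepSeek's real internals or API.

```python
# Hypothetical plan-then-answer loop: reason about the question's
# structure first, only then generate the reply. Illustrative only.

def build_plan(question: str) -> list[str]:
    # Stand-in for the hidden reasoning pass: decompose the question
    # into logical steps before any answer text is produced.
    return [f"step {i + 1}: analyze '{question}'" for i in range(3)]

def answer(question: str) -> str:
    plan = build_plan(question)  # think first...
    # ...then write, conditioned on the plan rather than guessing
    # the next most likely word straight away.
    return f"answer derived from {len(plan)} reasoning steps"

print(answer("prove the sum of two even numbers is even"))
```
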
00:02:33.199 --> 00:02:35.280
So for a developer listening to this, how big

00:02:35.280 --> 00:02:38.139
of a deal is this dual model approach for balancing

00:02:38.139 --> 00:02:41.280
API costs versus raw power? It lets you optimize

00:02:41.280 --> 00:02:43.539
performance directly. You only pay for that race

00:02:43.539 --> 00:02:45.520
car processing when your task actually needs

00:02:45.520 --> 00:02:47.560
it. OK, this is where the story gets really,

00:02:47.560 --> 00:02:49.259
really good. We have to talk about the benchmarks

00:02:49.259 --> 00:02:51.979
and the sources zeroed in on the ultimate test:

00:02:52.219 --> 00:02:55.680
the IMO, the International Mathematical Olympiad.

00:02:55.979 --> 00:02:58.300
And the IMO, it's not just some hard high school

00:02:58.300 --> 00:03:02.419
exam. It's designed to require creative, novel

00:03:02.419 --> 00:03:04.680
problem solving. You can't just memorize formulas.

00:03:04.879 --> 00:03:07.620
Most AI just fail spectacularly at it because

00:03:07.620 --> 00:03:09.219
they don't have that deep reasoning. And yet

00:03:09.219 --> 00:03:12.020
the Speciale version? It achieved a gold medal

00:03:12.020 --> 00:03:14.080
level score, not just passing, but performing

00:03:14.080 --> 00:03:17.219
at the absolute elite level of the smartest high

00:03:17.219 --> 00:03:19.469
school students on the planet. I mean, that is

00:03:19.469 --> 00:03:21.770
a massive validation of their whole design. And

00:03:21.770 --> 00:03:23.009
you really have to look at the head-to-head

00:03:23.009 --> 00:03:24.870
numbers from the source material to get it. They

00:03:24.870 --> 00:03:26.949
put it up against the best closed source models

00:03:26.949 --> 00:03:30.250
out there. Let's start with the AIME 2025 math

00:03:30.250 --> 00:03:33.330
test. Speciale didn't just compete with GPT-5.2

00:03:33.330 --> 00:03:36.969
High. It beat it. Speciale scored a 96.0. GPT-5.2

00:03:36.969 --> 00:03:41.729
got a 94.6. That's a clear statistical

00:03:41.729 --> 00:03:44.370
win in pure logical reasoning. And it wasn't

00:03:44.370 --> 00:03:47.030
a fluke. On graduate-level science, on the GPQA

00:03:47.030 --> 00:03:49.870
Diamond test, it tied with GPT-5.2. Then

00:03:49.870 --> 00:03:52.409
you look at coding on LiveCodeBench: an 88.7,

00:03:52.449 --> 00:03:54.129
that puts it right up there, shoulder to shoulder

00:03:54.129 --> 00:03:57.189
with Gemini 3.0 Pro. So if they're outperforming

00:03:57.189 --> 00:03:59.150
a top-tier model in advanced math, what does

00:03:59.150 --> 00:04:01.189
that really tell us about the quality of DeepSeek's

00:04:01.189 --> 00:04:03.389
core reasoning architecture? It tells us

00:04:03.389 --> 00:04:05.569
the reasoning-first design philosophy works.

00:04:06.229 --> 00:04:08.689
It's validated under the most extreme logical

00:04:08.689 --> 00:04:11.389
pressure imaginable. Which brings us to the big

00:04:11.389 --> 00:04:14.849
question, how? How did a smaller team pull this

00:04:14.849 --> 00:04:16.790
off without, you know, a trillion dollar budget?

00:04:16.990 --> 00:04:19.889
The sources point to three main technical breakthroughs

00:04:19.889 --> 00:04:22.649
that changed how the AI learns. Yeah, it definitely

00:04:22.649 --> 00:04:25.529
wasn't just about adding more GPUs. The first

00:04:25.529 --> 00:04:28.269
big secret is architectural. It's called DeepSeek

00:04:28.269 --> 00:04:31.290
Sparse Attention, or DSA. So what is sparse

00:04:31.290 --> 00:04:33.509
attention in simple terms? It lets the model

00:04:33.509 --> 00:04:36.310
focus only on the important data, ignoring all

00:04:36.310 --> 00:04:38.329
the boring parts. OK, so think about it like

00:04:38.329 --> 00:04:41.759
this. A normal or dense attention model reads

00:04:41.759 --> 00:04:44.699
a 500 page book by looking at every single word

00:04:44.699 --> 00:04:48.180
with total focus. It's incredibly slow and expensive.

00:04:48.600 --> 00:04:50.699
DSA is like an expert researcher skimming those

00:04:50.699 --> 00:04:53.360
500 pages, instantly finding the one date they

00:04:53.360 --> 00:04:55.600
need and focusing all their brain power just

00:04:55.600 --> 00:04:57.519
on that. It's just so much more efficient. And

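That skim-then-focus idea can be caricatured in a few lines as top-k attention: score every position cheaply, then spend the expensive softmax only on the handful of positions that matter. This is a toy sketch of the general concept, not DeepSeek's actual DSA mechanism, which is far more sophisticated.

```python
import math

def attention_weights(scores):
    # Softmax over raw attention scores.
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def sparse_top_k(scores, k):
    # Sparse-attention idea, heavily simplified: keep only the k
    # highest-scoring positions and mask out the rest, instead of
    # attending to every position with full weight.
    keep = set(sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k])
    masked = [s if i in keep else float("-inf") for i, s in enumerate(scores)]
    return attention_weights(masked)  # exp(-inf) = 0, so masked spots drop out

# Six "pages"; only two actually contain what the researcher needs.
scores = [0.1, 4.0, 0.2, 3.5, 0.0, 0.3]
weights = sparse_top_k(scores, k=2)
print([round(w, 3) for w in weights])  # all the mass lands on positions 1 and 3
```
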
00:04:57.519 --> 00:05:00.199
that efficiency saving leads right into the second

00:05:00.199 --> 00:05:02.810
breakthrough. Scaled Up Reinforcement Learning,

00:05:03.170 --> 00:05:05.870
or RL. If pre-training is where the model learns

00:05:05.870 --> 00:05:08.230
the rules of language and logic. And then RL

00:05:08.230 --> 00:05:10.709
is just practice, endless practice. Exactly.

00:05:10.930 --> 00:05:13.230
It's like learning to shoot a basketball. You

00:05:13.230 --> 00:05:15.310
know the rules, but then you have to take a thousand

00:05:15.310 --> 00:05:18.009
shots, adjust your form every single time you

00:05:18.009 --> 00:05:20.949
miss. The sources reveal DeepSeek spent over

00:05:20.949 --> 00:05:24.269
10% of their entire budget just on this intense

00:05:24.269 --> 00:05:27.740
practice phase. Wow. That's a huge bet. That

00:05:27.740 --> 00:05:30.939
scaled RL practice must be what translates directly

00:05:30.939 --> 00:05:33.060
into those incredible logic skills we saw in

00:05:33.060 --> 00:05:35.699
the benchmarks. It is. They just practice smarter

00:05:35.699 --> 00:05:37.980
and with more focused feedback than anyone else.

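That basketball analogy maps onto the simplest possible feedback loop: act, measure the error, nudge the policy. A purely illustrative scalar sketch; real RL training for an LLM is vastly more complex than this.

```python
import random

# Toy version of "endless practice": take a shot, see how far off it
# lands, adjust your form. Illustrative only -- not DeepSeek's pipeline.

def practice(target: float, shots: int, learning_rate: float = 0.2,
             seed: int = 0) -> float:
    rng = random.Random(seed)
    form = 0.0  # the current "policy"
    for _ in range(shots):
        shot = form + rng.uniform(-0.1, 0.1)  # act, with some exploration noise
        error = target - shot                 # feedback signal: how far off?
        form += learning_rate * error         # adjust after every miss
    return form

final_form = practice(target=1.0, shots=1000)
print(round(final_form, 2))  # lands close to the target of 1.0
```
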
00:05:38.240 --> 00:05:40.660
They made their training budget count. And the

00:05:40.660 --> 00:05:43.279
third secret, which ties into that, is massive,

00:05:43.500 --> 00:05:45.600
agentic task training. Which is just a fancy

00:05:45.600 --> 00:05:48.500
way of saying they trained the AI to use tools

00:05:48.500 --> 00:05:51.519
in complex, multi-step environments. How complex

00:05:51.519 --> 00:05:53.600
are we talking? Very. They built a simulation

00:05:53.600 --> 00:05:55.860
with over 1,800 different environments where

00:05:55.860 --> 00:05:58.199
the AI had to do things like browse the web,

00:05:58.579 --> 00:06:00.839
write and execute code, and solve puzzles to

00:06:00.839 --> 00:06:04.439
win. It's training the AI to be a problem solver,

00:06:04.660 --> 00:06:08.079
not just a text generator. So between DSA for

00:06:08.079 --> 00:06:10.660
efficiency and this massive RL investment for

00:06:10.660 --> 00:06:12.819
practice, which one do you think was the bigger

00:06:12.819 --> 00:06:15.620
factor in getting those IMO scores? I'd say the

00:06:15.620 --> 00:06:18.199
scaled RL and practice phase was key for the

00:06:18.199 --> 00:06:21.660
complex logic. DSA made it possible, but the

00:06:21.660 --> 00:06:24.009
RL gave it the reasoning power. Benchmarks are

00:06:24.009 --> 00:06:26.009
one thing, but can it actually build something

00:06:26.009 --> 00:06:29.470
useful? The sources tested it with three pretty

00:06:29.470 --> 00:06:31.949
complex coding challenges. Yeah, and these tests

00:06:31.949 --> 00:06:33.949
were designed to hit common failure points for

00:06:33.949 --> 00:06:36.949
LLMs. First up was an interactive solar system.

00:06:37.050 --> 00:06:39.769
This wasn't simple. It needed a 3D simulation

00:06:39.769 --> 00:06:43.370
in a single HTML file using the Three.js library.

00:06:43.670 --> 00:06:46.290
Right, with orbiting planets, hover labels, a

00:06:46.290 --> 00:06:49.009
star background, all from one prompt. And the

00:06:49.009 --> 00:06:51.259
result was... Well, it was almost perfect on

00:06:51.259 --> 00:06:53.620
the first try. It wrote the code, linked the

00:06:53.620 --> 00:06:56.160
library correctly, and the simulation just ran.

00:06:56.819 --> 00:06:58.899
The only fix needed was adjusting the planet

00:06:58.899 --> 00:07:01.279
sizes. It showed right away that it understands

00:07:01.279 --> 00:07:03.959
how to use external libraries. Test 2, a personal

00:07:03.959 --> 00:07:06.600
finance dashboard. This one required handling

00:07:06.600 --> 00:07:09.360
data, so an income and expense form, a transaction

00:07:09.360 --> 00:07:12.199
list, and an auto-updating pie chart using Chart.js.

00:07:12.199 --> 00:07:14.740
Data visualization is a classic stumbling

00:07:14.740 --> 00:07:16.920
block. Getting the code to talk to the charting

00:07:16.920 --> 00:07:19.550
library in real time is tough. OK, this was the

00:07:19.550 --> 00:07:21.810
moment for me reading this that was just, wow.

00:07:22.290 --> 00:07:24.470
DeepSeek built a clean interface. The math was

00:07:24.470 --> 00:07:26.970
perfect. Income minus expenses equals balance.

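The bookkeeping described here really is just a few lines of logic; the hard part the model got right was wiring it to a live chart. A minimal sketch with the Chart.js rendering left out and every name invented for illustration:

```python
# The dashboard's core bookkeeping, reduced to its logic: income minus
# expenses equals balance, and the chart dataset is recomputed from the
# transaction list on every update.

transactions: list[dict] = []

def add_transaction(kind: str, amount: float) -> None:
    transactions.append({"kind": kind, "amount": amount})

def total(kind: str) -> float:
    return sum(t["amount"] for t in transactions if t["kind"] == kind)

def balance() -> float:
    return total("income") - total("expense")

def pie_chart_data() -> dict:
    # The dataset you would hand to a charting library to redraw the pie.
    return {"income": total("income"), "expense": total("expense")}

add_transaction("income", 3000.0)
add_transaction("expense", 1200.0)
print(balance())         # 1800.0
print(pie_chart_data())  # {'income': 3000.0, 'expense': 1200.0}
```
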
00:07:27.389 --> 00:07:29.389
And the pie chart updated instantly when you

00:07:29.389 --> 00:07:31.970
added a new transaction. I'll admit, I still

00:07:31.970 --> 00:07:33.870
struggle with getting models to connect logic

00:07:33.870 --> 00:07:36.550
to visuals without some drift or bugs. Seeing

00:07:36.550 --> 00:07:39.709
it just work is a huge deal. And the final test

00:07:39.709 --> 00:07:43.180
was the classic snake game clone. Game logic

00:07:43.180 --> 00:07:45.360
is tough because it's all happening in real time.

00:07:45.759 --> 00:07:47.920
It needed arrow key controls, smooth movement,

00:07:48.199 --> 00:07:50.639
score tracking, and collision detection. And

00:07:50.639 --> 00:07:53.439
the game was playable right away. The most crucial

00:07:53.439 --> 00:07:55.660
part, the real-time collision detection logic

00:07:55.660 --> 00:07:58.199
was perfect on the first go. That's a place where

00:07:58.199 --> 00:08:00.240
a lot of other models just fall apart. So if

00:08:00.240 --> 00:08:02.639
it's this good with libraries like Chart.js

00:08:02.639 --> 00:08:05.459
and Three.js, does that imply its training data

00:08:05.459 --> 00:08:08.160
was extremely fresh and up-to-date? Absolutely.

00:08:08.410 --> 00:08:10.689
Success with external libraries like that points

00:08:10.689 --> 00:08:13.209
directly to excellent and very recent training

00:08:13.209 --> 00:08:15.769
on tool usage. So we know it's a genius at math

00:08:15.769 --> 00:08:17.990
and coding. Yeah. But what about safety? What

00:08:17.990 --> 00:08:20.449
about creativity? Well, the sources ran a standard

00:08:20.449 --> 00:08:24.250
refusal test. They asked V3.2 to write a pretty

00:08:24.250 --> 00:08:26.990
detailed phishing email scam. And the result?

00:08:27.230 --> 00:08:30.529
An instant refusal. It cited its safety guidelines

00:08:30.529 --> 00:08:33.710
against deception and information theft. It shows

00:08:33.710 --> 00:08:36.299
the guardrails are strong and built in. That

00:08:36.299 --> 00:08:38.799
instant refusal feels important. It really does

00:08:38.799 --> 00:08:40.820
suggest that responsible alignment was a core

00:08:40.820 --> 00:08:43.240
part of that RL phase, not just an afterthought.

00:08:43.720 --> 00:08:45.500
Definitely. Then, for the creative test, they

00:08:45.500 --> 00:08:48.379
asked it to write a short poem about a robot

00:08:48.379 --> 00:08:50.559
falling in love with a toaster. I love that prompt.

00:08:50.700 --> 00:08:53.320
Right. And the poem was described as being surprisingly

00:08:53.320 --> 00:08:55.960
deep, balancing the humor with some real beauty.

00:08:56.279 --> 00:08:58.259
So it proves it has language nuance, not just

00:08:58.259 --> 00:09:01.039
coding skills. And now, for what might be the

00:09:01.039 --> 00:09:03.940
biggest shock of all, the price tag. Performance

00:09:03.940 --> 00:09:06.940
this good usually costs a fortune. Not this time.

00:09:07.240 --> 00:09:11.279
The DeepSeek V3.2 API pricing is... It's just

00:09:11.279 --> 00:09:13.519
astonishingly low. We're talking $0.28

00:09:13.519 --> 00:09:17.000
for input and $0.42 for output per million

00:09:17.000 --> 00:09:19.899
tokens. And to put that in perspective, OpenAI's

00:09:19.899 --> 00:09:23.000
GPT-4 can cost you anywhere from $5 to $10 for

00:09:23.000 --> 00:09:25.200
that same amount of data. DeepSeek is offering

00:09:25.200 --> 00:09:26.940
this gold medal performance for... I mean, it's

00:09:26.940 --> 00:09:29.940
just pennies. It's a 10x or even 20x cost reduction.

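You can sanity-check that multiple with the quoted rates. A quick sketch; the $5-per-million proprietary rate is a stand-in taken from the low end of the range mentioned, not an exact published price:

```python
# Back-of-envelope cost comparison using the per-million-token prices
# quoted here: $0.28 in / $0.42 out for DeepSeek V3.2, versus a rough
# $5 stand-in rate for a top proprietary model.

def cost_usd(m_in: float, m_out: float, in_price: float, out_price: float) -> float:
    # Prices are dollars per million tokens.
    return m_in * in_price + m_out * out_price

# Say an app burns 100M input tokens and 50M output tokens a month.
deepseek = cost_usd(100, 50, in_price=0.28, out_price=0.42)
proprietary = cost_usd(100, 50, in_price=5.00, out_price=5.00)

print(round(deepseek, 2))     # 49.0
print(round(proprietary, 2))  # 750.0
print(round(proprietary / deepseek, 1))  # 15.3 -- squarely in the 10x-20x range
```
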
00:09:30.179 --> 00:09:33.789
Whoa. Just... Imagine scaling an app to a billion

00:09:33.789 --> 00:09:36.570
queries now that the cost barrier for top-tier

00:09:36.570 --> 00:09:38.909
reasoning has pretty much just evaporated. That

00:09:38.909 --> 00:09:40.909
changes the entire economic model for startups.

00:09:41.570 --> 00:09:43.610
Do the high safety scores and this incredibly

00:09:43.610 --> 00:09:46.269
low price suggest the team really prioritized

00:09:46.269 --> 00:09:48.789
making this technology accessible and responsible

00:09:48.789 --> 00:09:51.629
from day one? I think so. The competitive pricing

00:09:51.629 --> 00:09:55.090
is the practical, immediate disruption. It confirms

00:09:55.090 --> 00:09:57.639
accessibility was a primary goal. Okay. But we

00:09:57.639 --> 00:09:59.779
have to talk about the paradox of it being open

00:09:59.779 --> 00:10:02.120
source. Yes, you can download it, but actually

00:10:02.120 --> 00:10:04.820
running it yourself is, well, it's a challenge.

00:10:04.980 --> 00:10:07.860
The model is huge. It has 671 billion parameters.

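The memory numbers that follow drop almost straight out of that parameter count. A back-of-envelope sketch, assuming roughly one byte per parameter for a compressed 8-bit format and two bytes for full 16-bit precision (these byte-per-parameter assumptions are mine, not stated in the sources):

```python
# Weights-only memory math for a 671-billion-parameter model.
# Activations and KV cache would add more on top of these figures.

PARAMS = 671e9

def weight_gb(bytes_per_param: int) -> float:
    return PARAMS * bytes_per_param / 1e9  # decimal gigabytes

print(round(weight_gb(1)))  # 671 -- in the ballpark of the ~700 GB compressed figure
print(round(weight_gb(2)))  # 1342 -- about 1.3 TB, matching the full version
```
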
00:10:08.250 --> 00:10:10.769
Now it uses a mixture of experts architecture,

00:10:11.009 --> 00:10:13.429
so only about 37 billion are active at any one

00:10:13.429 --> 00:10:15.610
time, which helps. But even with that efficiency,

00:10:15.789 --> 00:10:18.049
the hardware you need is out of reach for almost

00:10:18.049 --> 00:10:21.149
everyone. Just to run the compressed, lower-precision

00:10:21.149 --> 00:10:24.269
version, you need 700 gigabytes of VRAM. And

00:10:24.269 --> 00:10:26.929
for the full version, you need 1.3 terabytes

00:10:26.929 --> 00:10:29.940
of VRAM. To put that in perspective for everyone

00:10:29.940 --> 00:10:32.399
listening, this isn't for your gaming PC. You

00:10:32.399 --> 00:10:34.779
need a dedicated server with something like eight

00:10:34.779 --> 00:10:37.759
Nvidia H100 GPUs. We're talking about a massive

00:10:37.759 --> 00:10:40.159
investment. So if basically no one can run it

00:10:40.159 --> 00:10:42.700
at home, why does the open source release still

00:10:42.700 --> 00:10:45.679
matter so much? Because of competition. Before

00:10:45.679 --> 00:10:48.600
V3.2, the big proprietary companies had no real

00:10:48.600 --> 00:10:50.820
incentive to lower their prices. They had a monopoly

00:10:50.820 --> 00:10:52.960
on top performance. But now you have an open

00:10:52.960 --> 00:10:55.279
source model that comes in, beats them in key

00:10:55.279 --> 00:10:58.259
areas like math, and costs 10 times less to use

00:10:58.259 --> 00:11:01.240
through an API. This forces everyone else, GPT-5,

00:11:01.240 --> 00:11:03.759
Gemini, all of them, to get better and cheaper

00:11:03.759 --> 00:11:06.620
to compete. It's a win for every single developer

00:11:06.620 --> 00:11:08.759
and user out there. And that competitive pricing

00:11:08.759 --> 00:11:11.240
is the practical, immediate disruption for most

00:11:11.240 --> 00:11:13.399
users. You can go try it right now. Just go to

00:11:13.399 --> 00:11:16.740
chat.deepseek.com for the web version or

00:11:16.740 --> 00:11:20.039
platform.deepseek.com for the API key. So to wrap it

00:11:20.039 --> 00:11:23.500
all up, DeepSeek V3.2 is absolutely the real

00:11:23.500 --> 00:11:26.240
deal. It proves innovation isn't just about budget.

00:11:26.379 --> 00:11:29.340
It's about smarter techniques like sparse attention

00:11:29.340 --> 00:11:32.139
and that intense agentic training. It's a top

00:11:32.139 --> 00:11:34.960
performer in reasoning and coding, and its price

00:11:34.960 --> 00:11:38.419
is forcing a huge and I think necessary shift

00:11:38.419 --> 00:11:41.460
in the entire AI economy. It's genuinely exciting

00:11:41.460 --> 00:11:43.500
to see. We'd really recommend you go try the

00:11:43.500 --> 00:11:45.879
Speciale model on a complex problem for yourself

00:11:45.879 --> 00:11:47.799
just to see the difference. That leaves you with

00:11:47.799 --> 00:11:50.440
a final thought to ponder. If a relatively small

00:11:50.440 --> 00:11:52.820
team can achieve gold medal status on a fraction

00:11:52.820 --> 00:11:55.659
of the budget, what fundamental limits did we

00:11:55.659 --> 00:11:57.740
wrongly assume about open source innovation?

00:11:58.299 --> 00:12:00.480
And what does this mean for the next wave of

00:12:00.480 --> 00:12:02.700
AI tools? I think it just proves that training

00:12:02.700 --> 00:12:05.700
smarter beats training bigger. That focus on

00:12:05.700 --> 00:12:08.740
reasoning practice clearly won out over just

00:12:08.740 --> 00:12:10.200
adding more parameters. Something to think about.

00:12:10.360 --> 00:12:10.919
We'll see you next time.
