WEBVTT

00:00:00.000 --> 00:00:02.700
Um, so think about this for a second. There are

00:00:02.700 --> 00:00:06.080
literally more possible games of chess than there

00:00:06.080 --> 00:00:09.400
are atoms in the observable universe. Yeah, which

00:00:09.400 --> 00:00:10.939
is one of those stats that just kind of breaks

00:00:10.939 --> 00:00:13.080
your brain a little bit. Right. Like, if your

00:00:13.080 --> 00:00:15.480
smartphone actually tried to calculate every

00:00:15.480 --> 00:00:19.070
single possible move to beat you... the universe

00:00:19.070 --> 00:00:21.250
would literally go cold and dark before the machine

00:00:21.250 --> 00:00:23.750
made its first decision. Exactly. The math just

00:00:23.750 --> 00:00:25.910
doesn't work out for brute force. But then, you

00:00:25.910 --> 00:00:27.670
know, you load up a chess app, you make a move,

00:00:27.829 --> 00:00:29.929
and it checkmates you in, like, three seconds

00:00:29.929 --> 00:00:33.030
flat. And you just have to wonder how. Well,

00:00:33.070 --> 00:00:35.250
it completely changes how you think about problem

00:00:35.250 --> 00:00:38.189
solving, honestly. Because we tend to assume

00:00:38.189 --> 00:00:41.170
computers are just doing these massive brute

00:00:41.170 --> 00:00:43.049
force calculations. Like, they're just reading

00:00:43.049 --> 00:00:45.409
the universe of possibilities way faster than

00:00:45.409 --> 00:00:47.780
we can. Right. But it's not just about having

00:00:47.780 --> 00:00:51.039
a faster processor. It's actually this fundamental

00:00:51.039 --> 00:00:54.119
philosophical shift in how the machine actually

00:00:54.119 --> 00:00:56.320
searches for the answer in the first place. And

00:00:56.320 --> 00:01:00.259
that is exactly the mission of today's deep dive.

00:01:00.960 --> 00:01:03.820
Welcome in, everyone. We are exploring a single,

00:01:04.500 --> 00:01:07.579
pretty dense, but absolutely fascinating concept

00:01:07.579 --> 00:01:09.739
from computer science. Yeah, it's known as alpha

00:01:09.739 --> 00:01:12.700
beta pruning. Right. And our goal for you today

00:01:12.700 --> 00:01:15.519
is to basically shortcut our way to understanding

00:01:15.519 --> 00:01:18.719
how algorithms make incredibly complex decisions

00:01:18.719 --> 00:01:21.079
in just a fraction of a second. And we're going

00:01:21.079 --> 00:01:23.079
to do it without getting bogged down in, you

00:01:23.079 --> 00:01:25.819
know... academic information overload. Exactly.

00:01:26.239 --> 00:01:28.400
So to lay the groundwork for us, what kind of

00:01:28.400 --> 00:01:30.400
problem is this algorithm actually trying to

00:01:30.400 --> 00:01:32.900
solve? So at its foundation, alpha -beta pruning

00:01:32.900 --> 00:01:35.780
is an adversarial search algorithm. Meaning it's

00:01:35.780 --> 00:01:38.439
built for opponents. Exactly. It is built specifically

00:01:38.439 --> 00:01:41.120
for machine playing of two -player, zero -sum

00:01:41.120 --> 00:01:44.459
games. So games where one person's win is mathematically

00:01:44.459 --> 00:01:46.700
equal to the other person's loss. Right. Like

00:01:46.700 --> 00:01:49.019
chess or tic -tac -toe, connect four, things

00:01:49.019 --> 00:01:51.379
like that. And this algorithm acts as an upgrade,

00:01:51.540 --> 00:01:53.640
really. It sits on top of an old older, more

00:01:53.640 --> 00:01:55.879
basic logic system called the Minimax algorithm.

00:01:56.810 --> 00:01:59.510
Yeah, and Minimax simply maps out the game tree,

00:01:59.909 --> 00:02:01.849
assuming both players are playing perfectly,

00:02:02.370 --> 00:02:04.590
just to find the best possible move. OK, but

00:02:04.590 --> 00:02:06.829
then alpha -beta pruning comes in and adds something

00:02:06.829 --> 00:02:09.530
else. It adds a true superpower to that base.

00:02:09.849 --> 00:02:12.270
It gives the computer the ability to mathematically

00:02:12.270 --> 00:02:15.710
prove exactly when it should stop thinking. OK,

00:02:15.710 --> 00:02:18.009
let's untack this, because before we dive into

00:02:18.009 --> 00:02:21.629
the heavy math of how it stops thinking, I think

00:02:21.629 --> 00:02:23.889
we need to visualize the dynamic between these

00:02:23.889 --> 00:02:26.509
two players first. Sure, yeah. That helps ground

00:02:26.509 --> 00:02:28.550
it. Let me try an analogy to frame this for you.

00:02:28.909 --> 00:02:31.909
Imagine you are out shopping for a car. OK. Your

00:02:31.909 --> 00:02:34.370
goal is to get the absolutely lowest possible

00:02:34.370 --> 00:02:37.270
price. So in the language of the algorithm, you

00:02:37.270 --> 00:02:39.210
are the minimizing player. Right, you want the

00:02:39.210 --> 00:02:41.909
smallest number. But the car dealer, their goal

00:02:41.909 --> 00:02:44.310
is to extract the highest possible price out

00:02:44.310 --> 00:02:47.479
of you. So they are the maximizing player. That

00:02:47.479 --> 00:02:50.099
captures the dynamic really well. It is a pure

00:02:50.099 --> 00:02:53.219
zero -sum negotiation. Right. So you do your

00:02:53.219 --> 00:02:56.500
research and you find out that dealer B's absolute

00:02:56.500 --> 00:02:59.360
worst case scenario offer like the highest price

00:02:59.360 --> 00:03:01.719
they will ultimately force you to pay is $20

00:03:01.719 --> 00:03:05.439
,000. OK, so that's your baseline. Exactly. Then

00:03:05.439 --> 00:03:07.539
you start looking into dealer A across town.

00:03:07.840 --> 00:03:10.000
And you immediately find out that dealer A has

00:03:10.000 --> 00:03:13.879
a mandatory non -negotiable $10 ,000 markup on

00:03:13.879 --> 00:03:18.139
every single car on their lot. Oh, wow. OK. Yeah.

00:03:18.620 --> 00:03:21.050
Which. instantly pushes their starting baseline

00:03:21.050 --> 00:03:25.189
to $25 ,000. At that exact moment, you don't

00:03:25.189 --> 00:03:27.050
need to look at Dealer A's inventory anymore.

00:03:27.210 --> 00:03:29.090
Because it's already a lost cause. Right. You

00:03:29.090 --> 00:03:31.069
don't care if they have the leather seats or

00:03:31.069 --> 00:03:34.270
the upgraded stereo or the sunroof. You know

00:03:34.270 --> 00:03:37.090
their baseline is already worse than the absolute

00:03:37.090 --> 00:03:39.969
worst case scenario at Dealer B. So you completely

00:03:39.969 --> 00:03:42.849
stop looking. Exactly. You prune Dealer A from

00:03:42.849 --> 00:03:45.169
your search entirely. I love that. The logic

00:03:45.169 --> 00:03:47.469
holds up perfectly. And when you map that onto

00:03:47.469 --> 00:03:50.030
the computer science behind a chess game, the

00:03:50.030 --> 00:03:51.909
algorithm is essentially playing the role of

00:03:51.909 --> 00:03:53.969
both the buyer and the dealer simultaneously.

00:03:54.289 --> 00:03:56.009
Oh, interesting. So it's arguing with itself.

00:03:56.349 --> 00:03:58.990
Basically, yeah. It maintains two running values

00:03:58.990 --> 00:04:01.509
as it calculates the future game tree. It calls

00:04:01.509 --> 00:04:04.289
them alpha and beta. Hence the name. Exactly.

00:04:04.729 --> 00:04:07.610
So alpha represents the minimum score that the

00:04:07.610 --> 00:04:10.469
maximizing player is assured of getting. And

00:04:10.469 --> 00:04:12.909
beta represents the maximum score the minimizing

00:04:12.909 --> 00:04:15.150
player is assured of. Okay, so it's tracking

00:04:15.150 --> 00:04:17.949
both sides. Right. And they start at absolute

00:04:17.949 --> 00:04:20.310
extremes, like negative infinity and positive

00:04:20.310 --> 00:04:23.069
infinity, and they tighten up as the computer

00:04:23.069 --> 00:04:25.870
evaluates its options. So it's constantly updating

00:04:25.870 --> 00:04:28.670
its absolute bottom line and its absolute ceiling

00:04:28.670 --> 00:04:31.189
as it looks ahead into the future. Yes, exactly.

00:04:31.470 --> 00:04:33.029
Let's bring it down to the actual chessboard.

00:04:33.589 --> 00:04:36.250
Imagine it's your turn in chess. You evaluate

00:04:36.250 --> 00:04:38.209
a potential move. Let's just call it move A.

00:04:38.300 --> 00:04:40.779
Okay, move A. You look a few turns ahead and

00:04:40.779 --> 00:04:42.959
see it improves your position. It's a solid,

00:04:43.040 --> 00:04:46.660
reliable move. Then, you decide to evaluate another

00:04:46.660 --> 00:04:49.779
option, move B. Right. And at first glance, move

00:04:49.779 --> 00:04:52.620
B looks incredibly tempting. You might even capture

00:04:52.620 --> 00:04:55.040
a knight. But as the computer lifts just a tiny

00:04:55.040 --> 00:04:57.759
bit further down that specific path, he realizes

00:04:57.759 --> 00:04:59.939
something terrible. Uh -oh. If you play move

00:04:59.939 --> 00:05:01.779
B, it opens up a sequence where your opponent

00:05:01.779 --> 00:05:04.180
can force a checkmate in two moves. Oh, wow.

00:05:04.300 --> 00:05:06.120
So no matter what else happens, you are just

00:05:06.120 --> 00:05:08.879
dead in the water. Exactly. In the algorithm's

00:05:08.879 --> 00:05:11.779
mathematical terms, the maximum score your opponent

00:05:11.779 --> 00:05:15.100
could force after Move B suddenly drops to negative

00:05:15.100 --> 00:05:18.000
infinity. Because it is a guaranteed forced loss.

00:05:18.500 --> 00:05:21.399
Right. Now remember, Move A gave you a solid

00:05:21.399 --> 00:05:24.019
position. It didn't result in a forced loss.

00:05:24.819 --> 00:05:27.839
Therefore, Move A's baseline is already mathematically

00:05:27.839 --> 00:05:31.029
superior to Move B. And because the opponent

00:05:31.029 --> 00:05:34.310
can force a win down the Move B branch, the algorithm

00:05:34.310 --> 00:05:36.910
no longer needs to consider any other possible

00:05:36.910 --> 00:05:39.629
outcomes from playing Move B. Because it literally

00:05:39.629 --> 00:05:42.509
doesn't matter if move B also occasionally leads

00:05:42.509 --> 00:05:45.329
to you winning their queen in some weird alternate

00:05:45.329 --> 00:05:47.990
sequence of bad plays by your opponent. Right.

00:05:47.990 --> 00:05:49.490
You have to assume they're playing perfectly.

00:05:49.529 --> 00:05:51.769
And since they have a guaranteed path to destroy

00:05:51.769 --> 00:05:53.949
you, none of the other possibilities matter at

00:05:53.949 --> 00:05:56.529
all. The computer just throws that entire branch

00:05:56.529 --> 00:05:59.529
of the future in the trash. It prunes the branch.

00:05:59.930 --> 00:06:02.449
And the profound shift here isn't just about

00:06:02.449 --> 00:06:04.250
the computer finding a good move. What is it

00:06:04.250 --> 00:06:06.449
then? It's about mathematically proving that

00:06:06.449 --> 00:06:09.269
certain futures are dead ends. which guarantees

00:06:09.269 --> 00:06:12.610
the exact same final decision as if you had evaluated

00:06:12.610 --> 00:06:15.089
every single possibility, but you're skipping

00:06:15.089 --> 00:06:18.509
the irrelevant ones entirely. Which sounds incredibly

00:06:18.509 --> 00:06:20.949
elegant. I mean, skipping a few bad chess moves

00:06:20.949 --> 00:06:23.269
definitely saves time. Oh, absolutely. But I

00:06:23.269 --> 00:06:25.209
have to push back a bit here because we are talking

00:06:25.209 --> 00:06:27.750
about supercomputers that handle billions of

00:06:27.750 --> 00:06:30.870
operations a second. Does pruning away a forced

00:06:30.870 --> 00:06:33.310
checkmate here and there actually save enough

00:06:33.310 --> 00:06:35.529
time to matter in the grand scheme of a match?

00:06:35.819 --> 00:06:38.319
Oh, the reduction in workload isn't just a slight

00:06:38.319 --> 00:06:41.819
optimization, it is staggering. Really? To understand

00:06:41.819 --> 00:06:44.680
the scale, we have to look at what's called combinatorial

00:06:44.680 --> 00:06:47.180
explosion. Think about a chess program searching

00:06:47.180 --> 00:06:51.170
just four plies deep. Ply's meaning individual

00:06:51.170 --> 00:06:53.149
turns, right? Like one move by white, one by

00:06:53.149 --> 00:06:55.110
black, one by white, one by black. Correct. Four

00:06:55.110 --> 00:06:57.149
turns into the future. And let's say there is

00:06:57.149 --> 00:07:01.930
an average of 36 possible legal moves or branches

00:07:01.930 --> 00:07:05.269
per turn. OK. 36 options every single time someone

00:07:05.269 --> 00:07:08.310
moves. Right. If the computer were to use a naive

00:07:08.310 --> 00:07:10.949
minimax search, meaning it meticulously looks

00:07:10.949 --> 00:07:13.170
at every single option without pruning anything,

00:07:13.649 --> 00:07:17.009
it would have to evaluate over one million terminal

00:07:17.009 --> 00:07:19.889
nodes. Wait, a terminal node is like the absolute

00:07:19.889 --> 00:07:23.449
end of a specific sequence of moves, right? Exactly,

00:07:23.629 --> 00:07:26.069
the final board state of that hypothetical timeline.

00:07:26.350 --> 00:07:29.970
Over a million possible future board states just

00:07:29.970 --> 00:07:33.149
to look four moves ahead. Just four moves? Wow.

00:07:33.230 --> 00:07:35.310
That really puts the sheer math of chess into

00:07:35.310 --> 00:07:38.069
perspective. It really does. So that is the naive

00:07:38.069 --> 00:07:41.029
search. When we apply alpha -beta pruning to

00:07:41.029 --> 00:07:43.970
that exact same four -move sequence, what happens?

00:07:44.149 --> 00:07:46.110
Well, assuming the move ordering is optimal,

00:07:46.550 --> 00:07:48.470
meaning the computer happens to search the best

00:07:48.470 --> 00:07:51.509
moves first, alpha -beta pruning would eliminate

00:07:51.509 --> 00:07:53.850
all but about 2 ,000 of those terminal nodes.

00:07:54.009 --> 00:07:57.199
Wait. From over 1 million down to 2 ,000? Are

00:07:57.199 --> 00:08:00.600
you serious? I am. It is roughly a 99 .8 % reduction

00:08:00.600 --> 00:08:02.180
in the work the computer has to do. Okay, that

00:08:02.180 --> 00:08:04.220
is not a shortcut. That is a straight -up miracle.

00:08:04.300 --> 00:08:06.740
It feels like magic, yeah. But I want to make

00:08:06.740 --> 00:08:09.040
sure we really grasp the underlying mechanics

00:08:09.040 --> 00:08:11.740
of this. Because there's this concept of big

00:08:11.740 --> 00:08:14.160
O notation that comes up when discussing this

00:08:14.160 --> 00:08:16.189
kind of optimization in the source material.

00:08:16.850 --> 00:08:19.550
Ah, yes. Big O. Which, I mean, it sounds super

00:08:19.550 --> 00:08:21.269
intimidating, but it's really just like the speedometer

00:08:21.269 --> 00:08:23.790
for an algorithm. It tells you how the workload

00:08:23.790 --> 00:08:26.589
grows as the problem gets bigger. That's a very

00:08:26.589 --> 00:08:28.810
clean way to look at it, actually. Big O notation

00:08:28.810 --> 00:08:31.269
measures worst case and best case performance.

00:08:31.370 --> 00:08:35.289
OK. So in a basic search where B is the branching

00:08:35.289 --> 00:08:38.690
factor, so those 36 choices per turn, and D is

00:08:38.690 --> 00:08:41.789
the depth of the search, the work grows exponentially.

00:08:41.809 --> 00:08:45.190
Right. The speedometer basically reads B to the

00:08:45.190 --> 00:08:48.529
power of D. But with optimal alpha beta pruning,

00:08:49.210 --> 00:08:51.269
that branching factor is effectively reduced

00:08:51.269 --> 00:08:53.950
to its square root. So instead of 36 choices

00:08:53.950 --> 00:08:57.370
compounding every turn, the math acts as if there

00:08:57.370 --> 00:09:00.730
are only six choices compounding. Yes. And the

00:09:00.730 --> 00:09:03.269
intuition behind why it's exactly a square root

00:09:03.269 --> 00:09:04.769
is really fascinating. They've locked me through

00:09:04.769 --> 00:09:07.679
it. So in a game tree, you alternate between

00:09:07.679 --> 00:09:09.620
your moves and your opponent's moves, right?

00:09:09.779 --> 00:09:12.200
To prove that a move is ultimately good for you,

00:09:12.600 --> 00:09:14.519
you have to look at all of your opponent's possible

00:09:14.519 --> 00:09:16.940
replies to ensure none of them defeat you. Makes

00:09:16.940 --> 00:09:20.480
sense. But to prove a move is bad, you only need

00:09:20.480 --> 00:09:23.379
to find one single reply from your opponent that

00:09:23.379 --> 00:09:27.299
ruins you. Just one fatal flaw. Exactly. Once

00:09:27.299 --> 00:09:30.120
you find that one reputation, you prune the rest.

00:09:30.970 --> 00:09:33.669
The math of needing to search all options on

00:09:33.669 --> 00:09:36.129
your turn but only one option on your opponent's

00:09:36.129 --> 00:09:38.809
turn geometrically cuts the effective branching

00:09:38.809 --> 00:09:42.120
factor down to its square root. Oh. OK, so I'm

00:09:42.120 --> 00:09:44.539
realizing what the practical implication of that

00:09:44.539 --> 00:09:46.440
square root actually is. What's that? If the

00:09:46.440 --> 00:09:48.940
workload is slashed to its square root, it means

00:09:48.940 --> 00:09:51.240
the computer can search twice as deep into the

00:09:51.240 --> 00:09:53.580
future using the exact same amount of computing

00:09:53.580 --> 00:09:56.059
power. Exactly. Like, if a processor previously

00:09:56.059 --> 00:09:58.379
had the juice to look five moves ahead before

00:09:58.379 --> 00:10:00.919
its clock ran out, alpha beta pruning lets that

00:10:00.919 --> 00:10:03.440
identical machine look 10 moves ahead. Yes. That

00:10:03.440 --> 00:10:05.480
is literally the difference between an amateur

00:10:05.480 --> 00:10:07.799
and a grandmaster just by changing the search

00:10:07.799 --> 00:10:10.460
logic. It fundamentally multiplies the intelligence

00:10:10.440 --> 00:10:12.860
of the machine without upgrading a single piece

00:10:12.860 --> 00:10:15.539
of hardware. That's wild. But this brings us

00:10:15.539 --> 00:10:17.799
to a massive catch. Oh, there's always a catch.

00:10:18.019 --> 00:10:21.419
To get that 99 .8 % reduction, to hit that square

00:10:21.419 --> 00:10:23.679
root optimization, you mentioned the condition

00:10:23.679 --> 00:10:26.659
yourself earlier. The algorithm needs optimal

00:10:26.659 --> 00:10:29.080
move ordering. Meaning it has to search the best

00:10:29.080 --> 00:10:31.379
moves first? It has to search the best moves

00:10:31.379 --> 00:10:33.600
first. Right. Which creates a massive chicken

00:10:33.600 --> 00:10:36.000
and egg problem. I mean, the entire point of

00:10:36.000 --> 00:10:38.620
the search algorithm is to figure out which move

00:10:38.620 --> 00:10:41.940
is the best. How on earth does the computer know

00:10:41.940 --> 00:10:44.679
which moves to look at first if it hasn't searched

00:10:44.679 --> 00:10:47.460
them yet? It is the ultimate paradox of search

00:10:47.460 --> 00:10:50.539
algorithms, honestly. You want to evaluate the

00:10:50.539 --> 00:10:53.360
winning options first to save time, but you don't

00:10:53.360 --> 00:10:55.940
know they are the winning options until you evaluate

00:10:55.940 --> 00:10:59.330
them. How do they fix it? To solve this, computer

00:10:59.330 --> 00:11:01.710
scientists had to introduce heuristics, which

00:11:01.710 --> 00:11:04.710
is essentially a formal framework for intelligent

00:11:04.710 --> 00:11:07.389
guesswork. And here's where it gets really interesting,

00:11:07.649 --> 00:11:09.710
because one of the most creatively named frameworks

00:11:09.710 --> 00:11:11.929
for this in the text is the killer heuristic.

00:11:12.129 --> 00:11:15.309
Such a great name. It's so clever. The concept

00:11:15.309 --> 00:11:17.830
is that this heuristic immediately checks the

00:11:17.830 --> 00:11:20.629
last move that caused a beta cutoff at the same

00:11:20.629 --> 00:11:23.350
depth in the game tree. Right. So let's translate

00:11:23.350 --> 00:11:25.210
that into human behavior so it makes sense for

00:11:25.210 --> 00:11:27.919
you listening. Imagine you were having a recurring

00:11:27.919 --> 00:11:30.120
argument with a friend. We've all been there.

00:11:30.379 --> 00:11:33.720
Exactly. You know exactly which specific piece

00:11:33.720 --> 00:11:36.240
of evidence completely shut down their argument

00:11:36.240 --> 00:11:39.019
the last time you debated this topic. The killer

00:11:39.019 --> 00:11:41.139
heuristic dictates that instead of starting from

00:11:41.139 --> 00:11:43.159
scratch and trying out a dozen new arguments,

00:11:43.200 --> 00:11:45.360
you just lead with the argument that killed the

00:11:45.360 --> 00:11:47.779
debate last time. You try the proven winner first.

00:11:47.779 --> 00:11:50.799
Right. It leverages past experience to organize

00:11:50.799 --> 00:11:54.419
the current search. And the algorithm does this

00:11:54.419 --> 00:11:58.070
constantly. It remembers which moves scored highly

00:11:58.070 --> 00:12:00.230
in previous evaluations. Okay, that makes sense.

00:12:00.370 --> 00:12:02.470
There's also a related technique called iterative

00:12:02.470 --> 00:12:04.990
deepening. The computer will search the entire

00:12:04.990 --> 00:12:07.769
board just one ply deep, then it starts over

00:12:07.769 --> 00:12:10.830
and searches two plies deep, then three. Wait,

00:12:10.830 --> 00:12:14.289
hold on. If it searches one ply, then starts

00:12:14.289 --> 00:12:16.889
over for two, then starts over for three, isn't

00:12:16.889 --> 00:12:18.809
that doing the same math multiple times? That

00:12:18.809 --> 00:12:21.169
sounds incredibly inefficient. It sounds totally

00:12:21.169 --> 00:12:23.909
inefficient until you realize the hidden benefit.

00:12:24.080 --> 00:12:27.919
The shallow searches are incredibly fast and

00:12:27.919 --> 00:12:30.539
they give the algorithm the move ordering it

00:12:30.539 --> 00:12:32.779
desperately needs for the deeper searches. Oh,

00:12:32.779 --> 00:12:35.460
I see. Yeah, the time lost repeating the shallow

00:12:35.460 --> 00:12:38.200
searches is vastly outweighed by the massive

00:12:38.200 --> 00:12:40.639
pruning cutoffs it achieves during the deep search

00:12:40.639 --> 00:12:43.620
because it now knows exactly which moves are

00:12:43.620 --> 00:12:45.779
likely to be the killers. That makes total sense.

00:12:45.919 --> 00:12:48.200
It's basically mapping the territory before it

00:12:48.200 --> 00:12:50.940
commits to the deep dive. Exactly. And speaking

00:12:50.940 --> 00:12:52.840
of mapping the territory, we also have to talk

00:12:52.840 --> 00:12:55.419
about aspiration windows, because this takes

00:12:55.419 --> 00:12:57.700
the guesswork to a highly aggressive level. It

00:12:57.700 --> 00:13:00.220
really does. Normally alpha starts at negative

00:13:00.220 --> 00:13:02.600
infinity and beta starts at positive infinity,

00:13:02.940 --> 00:13:05.100
so the window of acceptable scores is wide open.

00:13:05.299 --> 00:13:07.899
Anything goes. Right. But with an aspiration

00:13:07.899 --> 00:13:10.700
window, the algorithm uses its past experience

00:13:10.700 --> 00:13:14.059
to guess a much narrower range. It might decide

00:13:14.409 --> 00:13:17.330
Based on my previous shallow searches, I expect

00:13:17.330 --> 00:13:19.610
the final score of this move to be somewhere

00:13:19.610 --> 00:13:23.149
between plus 10 and plus 15. OK. So it artificially

00:13:23.149 --> 00:13:26.909
sets alpha to 10 and beta to 15. Aspiration window.

00:13:27.529 --> 00:13:30.070
So the algorithm is essentially aspiring to find

00:13:30.070 --> 00:13:32.990
a score within a specific tight range rather

00:13:32.990 --> 00:13:35.330
than looking at the whole board. Yes. But why

00:13:35.330 --> 00:13:37.590
does shrinking that window speed up the math?

00:13:37.679 --> 00:13:40.480
Because the acceptable range is so tiny, almost

00:13:40.480 --> 00:13:43.440
every single move the computer evaluates is going

00:13:43.440 --> 00:13:45.960
to instantly fall outside of it. The algorithm

00:13:45.960 --> 00:13:48.120
looks at a branch, sees it's going to score a

00:13:48.120 --> 00:13:51.139
9, and because 9 is below the strict alpha floor

00:13:51.139 --> 00:13:54.019
of 10, it instantly discards the entire branch.

00:13:54.159 --> 00:13:55.759
It doesn't even waste time figuring out the exact

00:13:55.759 --> 00:13:58.059
sequence of how the score gets to 9. Exactly.

00:13:58.279 --> 00:14:00.279
It just knows the score is in 10, so it's dead.

00:14:00.940 --> 00:14:03.159
The narrower the window, the faster the cut -offs

00:14:03.159 --> 00:14:05.269
happen. In the extreme case, which is called

00:14:05.269 --> 00:14:07.730
a zero window search, it sets alpha and beta

00:14:07.730 --> 00:14:10.090
to the exact same value. Wait, the same value?

00:14:10.470 --> 00:14:13.250
Yeah, it's essentially asking the game tree a

00:14:13.250 --> 00:14:16.409
simple true or false question. Is this move better

00:14:16.409 --> 00:14:20.110
than X? OK, but what happens when the computer's

00:14:20.110 --> 00:14:22.669
guess is wrong? Like, what if the actual best

00:14:22.669 --> 00:14:25.470
score on the board was a 20, but it artificially

00:14:25.470 --> 00:14:27.950
capped its search window at 15? Doesn't the algorithm

00:14:27.950 --> 00:14:31.429
just break? That is the calculated risk. If the

00:14:31.429 --> 00:14:34.429
search fails, meaning the real value falls outside

00:14:34.429 --> 00:14:36.870
that narrow aspiration window, the algorithm

00:14:36.870 --> 00:14:39.990
has to swallow its pride, widen the window, and

00:14:39.990 --> 00:14:43.610
research the position. But it's straightforward

00:14:43.610 --> 00:14:45.850
for the system to detect whether it failed high

00:14:45.850 --> 00:14:48.929
or failed low, giving it immediate data on how

00:14:48.929 --> 00:14:51.470
to adjust. And this actually introduces a critical

00:14:51.470 --> 00:14:53.429
technical distinction in how these algorithms

00:14:53.429 --> 00:14:56.549
are designed. Fail soft versus fail hard. OK,

00:14:56.669 --> 00:14:58.230
yeah, I saw this distinction in the reading,

00:14:58.269 --> 00:14:59.909
and I want to make sure I'm visualizing it correctly.

00:14:59.929 --> 00:15:02.799
Go ahead. If an algorithm is fail hard, it strictly

00:15:02.799 --> 00:15:05.639
limits the return value to the alpha and beta

00:15:05.639 --> 00:15:08.490
bounds it was given. But if it's fail soft, it

00:15:08.490 --> 00:15:11.350
can return values that actually bleed outside

00:15:11.350 --> 00:15:13.250
those original boundaries. Yes, that's right.

00:15:13.490 --> 00:15:15.450
It's kind of like a speed trap, right? A speed

00:15:15.450 --> 00:15:18.649
trap. Okay, how so? So a fail hard radar gun

00:15:18.649 --> 00:15:21.570
just flashes speeding if a car goes over 60 miles

00:15:21.570 --> 00:15:23.509
per hour. You know it's too fast, so you pull

00:15:23.509 --> 00:15:26.509
them over. But a fail soft radar gun tells you

00:15:26.509 --> 00:15:29.590
the car is going exactly 85 miles per hour. Ah,

00:15:29.789 --> 00:15:31.570
I see. Both give you the cutoff you need to take

00:15:31.570 --> 00:15:34.730
action, but the fail soft gives you extra precise

00:15:34.730 --> 00:15:37.720
data. That is a perfect analogy. Both the fail

00:15:37.720 --> 00:15:40.320
hard and the fail soft radar guns tell you the

00:15:40.320 --> 00:15:42.919
car violated the boundary. Right. But if you

00:15:42.919 --> 00:15:45.519
guessed wrong with your aspiration window and

00:15:45.519 --> 00:15:48.019
you have to do a wider research, that feel soft

00:15:48.019 --> 00:15:51.259
data is invaluable. Knowing the car was going

00:15:51.259 --> 00:15:54.980
exactly 85 gives you a highly accurate baseline

00:15:54.980 --> 00:15:57.779
to set your new wider window. So you aren't just

00:15:57.779 --> 00:15:59.759
hitting a brick wall and starting over. Exactly.

00:15:59.779 --> 00:16:02.340
You are failing forward with precise information.

00:16:02.570 --> 00:16:05.470
The layers of optimization here are just beautiful,

00:16:05.590 --> 00:16:07.870
from the basic pruning of dead ends to the killer

00:16:07.870 --> 00:16:10.429
heuristic to the incredibly aggressive aspiration

00:16:10.429 --> 00:16:12.950
windows and fail -soft fallbacks. It really is

00:16:12.950 --> 00:16:14.870
a masterpiece of computer science. Which makes

00:16:14.870 --> 00:16:17.570
the history of this algorithm incredibly ironic.

00:16:17.629 --> 00:16:19.970
It does. Because you would assume that when someone

00:16:19.970 --> 00:16:22.629
discovers a mathematical truth that reduces workloads

00:16:22.629 --> 00:16:26.610
by 99 .8%, the computer science world would just

00:16:26.610 --> 00:16:29.389
throw a parade. But human nature often gets in

00:16:29.389 --> 00:16:31.980
the way of brilliant math. Yeah, the historical

00:16:31.980 --> 00:16:34.580
record of alpha -beta pruning is a fascinating

00:16:34.580 --> 00:16:37.700
study in skepticism. The concept didn't arrive

00:16:37.700 --> 00:16:41.100
with a single universally celebrated eureka moment.

00:16:41.379 --> 00:16:43.740
Right. We have to talk about the Dartmouth workshop

00:16:43.740 --> 00:16:47.419
in 1956. This is widely considered the founding

00:16:47.419 --> 00:16:49.960
event of the entire artificial intelligence field.

00:16:50.120 --> 00:16:53.009
A legendary gathering. Totally. John McCarthy,

00:16:53.230 --> 00:16:56.129
one of the absolute legends of AI, invents this

00:16:56.129 --> 00:16:58.549
alpha -beta search concept. He's at the workshop

00:16:58.549 --> 00:17:01.309
and he crosses paths with Alex Bernstein, an

00:17:01.309 --> 00:17:03.110
IBM researcher who happens to be writing one

00:17:03.110 --> 00:17:05.329
of the very first chess programs. It's a match

00:17:05.329 --> 00:17:08.509
made in heaven. Exactly. McCarthy has the ultimate

00:17:08.509 --> 00:17:11.230
chess algorithm and Bernstein is building a chess

00:17:11.230 --> 00:17:13.710
program. So McCarthy pitches the alpha -beta

00:17:13.710 --> 00:17:16.529
search logic to Bernstein. He explains how tracking

00:17:16.529 --> 00:17:18.789
these bounds will drastically reduce the nodes

00:17:18.789 --> 00:17:21.839
his IBM program has to evaluate. And Bernstein's

00:17:21.839 --> 00:17:24.039
reaction, according to the records, was simply

00:17:24.039 --> 00:17:28.180
that he was unconvinced. Unconvinced. He is standing

00:17:28.180 --> 00:17:30.160
at the dawn of artificial intelligence, looking

00:17:30.160 --> 00:17:32.980
at the math that will define all adversarial

00:17:32.980 --> 00:17:35.420
computer search for the next century, and he

00:17:35.420 --> 00:17:38.299
just says, yeah, I don't buy it. Why would he

00:17:38.299 --> 00:17:41.039
reject it? Well, if we connect this to the bigger

00:17:41.039 --> 00:17:43.720
picture, you have to consider the context of

00:17:43.720 --> 00:17:47.740
1950s computing. Processing power was so incredibly

00:17:47.740 --> 00:17:51.119
limited that adding complex logical checks to

00:17:51.119 --> 00:17:54.599
every single node, constantly asking, is this

00:17:54.599 --> 00:17:57.640
greater than alpha or less than beta, seemed

00:17:57.640 --> 00:17:59.779
to many engineers like it would slow the computer

00:17:59.779 --> 00:18:02.619
down more than just rapidly evaluating the nodes

00:18:02.619 --> 00:18:05.000
themselves. Ah, so it's an optimization paradox.

00:18:05.220 --> 00:18:07.660
Exactly. It takes vision to realize that doing

00:18:07.660 --> 00:18:09.940
a little extra math at each step will ultimately

00:18:09.940 --> 00:18:11.980
save you from doing millions of math problems

00:18:11.980 --> 00:18:14.130
later. It just goes to show you can hand someone

00:18:14.130 --> 00:18:15.829
the keys to the universe and they might just

00:18:15.829 --> 00:18:18.130
decide they prefer walking? Pretty much. But

00:18:18.130 --> 00:18:21.029
because Bernstein rejected it, the idea didn't

00:18:21.029 --> 00:18:24.150
catch fire right away. It didn't. But what happened

00:18:24.150 --> 00:18:26.950
next proves how fundamental this algorithm truly

00:18:26.950 --> 00:18:30.049
is. Because the logic of alpha -beta pruning

00:18:30.049 --> 00:18:32.970
is a mathematical truth, it didn't need a marketing

00:18:32.970 --> 00:18:36.569
campaign. It was actually independently reinvented

00:18:36.569 --> 00:18:39.789
a stunning number of times. Arthur Samuel created

00:18:39.789 --> 00:18:42.650
an early version for a checker simulation. Wow.

00:18:42.890 --> 00:18:46.589
In 1963, a Soviet researcher named Alexander

00:18:46.589 --> 00:18:49.390
Brudno independently conceived and published

00:18:49.390 --> 00:18:52.130
the algorithm in Moscow. Just on his own. Yeah.

00:18:52.390 --> 00:18:54.589
Back in the US, McCarthy had suggested it to

00:18:54.589 --> 00:18:57.250
a group of his students at MIT who implemented

00:18:57.250 --> 00:19:00.250
it in 1961. It's like calculus with Newton and

00:19:00.250 --> 00:19:03.470
Leibniz. When the math is that true, multiple

00:19:03.470 --> 00:19:05.950
people are bound to uncover it once the field

00:19:05.950 --> 00:19:08.390
reaches a certain level of maturity. The core

00:19:08.390 --> 00:19:10.569
logic of the universe was simply waiting to be

00:19:10.569 --> 00:19:13.740
articulated. It wasn't until 1975 that Donald

00:19:13.740 --> 00:19:16.539
Newth and Ronald Moore really refined the algorithm

00:19:16.539 --> 00:19:19.339
mathematically, and later Judea Pearl published

00:19:19.339 --> 00:19:22.059
papers mathematically proving its optimality

00:19:22.059 --> 00:19:24.720
for certain types of game trees. But the sheer

00:19:24.720 --> 00:19:26.799
volume of independent discoveries proves that

00:19:26.799 --> 00:19:29.420
certain mathematical truths are inevitable, regardless

00:19:29.420 --> 00:19:31.039
of whether the initial critics are convinced.

00:19:31.299 --> 00:19:33.539
So when we step back from the game tree for a

00:19:33.539 --> 00:19:36.420
second, what does this all mean for you listening?

00:19:36.730 --> 00:19:38.650
Why should you care about the history of chess

00:19:38.650 --> 00:19:41.789
programs or the big O notation of adversarial

00:19:41.789 --> 00:19:44.809
search? It's a valid question. To me, alpha beta

00:19:44.809 --> 00:19:47.509
pruning isn't just a quirky footnote about artificial

00:19:47.509 --> 00:19:50.829
intelligence. It is a profound masterclass in

00:19:50.829 --> 00:19:53.789
efficiency. I'd agree with that. It proves mathematically

00:19:53.789 --> 00:19:57.390
that the absolute fastest way to solve an overwhelmingly

00:19:57.390 --> 00:20:00.690
complex problem is almost never to sit down and

00:20:00.690 --> 00:20:03.410
meticulously evaluate every single possible option.

00:20:03.490 --> 00:20:06.089
Trying to calculate the universe leads to paralysis.

00:20:06.309 --> 00:20:09.480
Right. The true shortcut, the real genius, is

00:20:09.480 --> 00:20:12.779
figuring out how to quickly identify and eliminate

00:20:12.779 --> 00:20:15.240
the options that allow your opponent, whether

00:20:15.240 --> 00:20:17.619
that's a chess player, the stock market, or just

00:20:17.619 --> 00:20:20.440
the friction of daily life, to force a loss.

00:20:20.519 --> 00:20:22.539
You figure out what kills you, and you prune

00:20:22.539 --> 00:20:24.960
it immediately. That is a deeply practical takeaway,

00:20:25.319 --> 00:20:27.299
and it leaves us with a provocative thought to

00:20:27.299 --> 00:20:29.200
consider. Let's hear it. We've talked a lot about

00:20:29.200 --> 00:20:32.000
the concept of refutation tables, that generalized

00:20:32.000 --> 00:20:34.660
set of moves that disprove an opponent's strategy,

00:20:34.980 --> 00:20:37.819
like the killer heuristics. What if we consciously

00:20:37.819 --> 00:20:40.740
apply the logic of alpha -beta pruning to our

00:20:40.740 --> 00:20:43.279
own daily decision -making? That is an incredible

00:20:43.279 --> 00:20:45.619
concept. What would a personal refutation table

00:20:45.619 --> 00:20:48.319
even look like? Think about the choices you navigate

00:20:48.319 --> 00:20:52.609
every single day. If you took the time consciously

00:20:52.609 --> 00:20:55.230
map out the negative infinity outcomes in your

00:20:55.230 --> 00:20:58.690
own life. The toxic habits, the draining relationships,

00:20:58.890 --> 00:21:01.210
the impulsive financial choices that practically

00:21:01.210 --> 00:21:03.549
guarantee a forced loss for your well -being.

00:21:03.910 --> 00:21:06.519
Yeah. Could you train your brain to act like

00:21:06.519 --> 00:21:09.380
this algorithm? Could you build your own internal

00:21:09.380 --> 00:21:12.420
refutation tables to automatically prune those

00:21:12.420 --> 00:21:14.660
branches of thought, those bad choices, before

00:21:14.660 --> 00:21:16.960
you even waste a single ounce of mental energy

00:21:16.960 --> 00:21:19.799
evaluating them? Wow. Imagine the mental processing

00:21:19.799 --> 00:21:21.880
power you would free up if you instantly discarded

00:21:21.880 --> 00:21:24.339
the dead ends, allowing you to focus entirely

00:21:24.339 --> 00:21:25.819
on the moves that actually win the game.
