WEBVTT

00:00:00.000 --> 00:00:02.439
Imagine you're sitting in a command bunker, right?

00:00:02.799 --> 00:00:05.040
You're staring at a radar screen and you have

00:00:05.040 --> 00:00:07.259
exactly three minutes to decide if you are going

00:00:07.259 --> 00:00:09.439
to launch a nuclear counter-strike. Yeah. The

00:00:09.439 --> 00:00:11.699
alarm is blaring. And you just don't know for

00:00:11.699 --> 00:00:13.919
sure if the blips on the screen are incoming

00:00:13.919 --> 00:00:18.379
enemy missiles or just a really poorly timed

00:00:18.379 --> 00:00:20.539
computer glitch. Right. How do you make that

00:00:20.539 --> 00:00:23.410
choice? You can't know the truth. So you basically

00:00:23.410 --> 00:00:26.170
assume the absolute worst. Yeah, it is the ultimate

00:00:26.170 --> 00:00:28.309
high-stakes gamble. I mean, you are forced to

00:00:28.309 --> 00:00:31.170
act against a potentially hostile universe where

00:00:31.170 --> 00:00:34.340
guessing wrong literally means game over. Exactly.

00:00:34.719 --> 00:00:36.759
Welcome to this deep dive, everyone. I'm so glad

00:00:36.759 --> 00:00:40.140
you could join us today. We are exploring a dense

00:00:40.140 --> 00:00:43.200
but honestly highly consequential Wikipedia article

00:00:43.200 --> 00:00:45.600
on a mathematical decision rule called Minimax.

00:00:45.840 --> 00:00:48.359
It's quite the topic. It really is. Our mission

00:00:48.359 --> 00:00:50.920
for you today is to demystify this seemingly

00:00:50.920 --> 00:00:53.420
abstract math concept. Yeah. We're going to reveal

00:00:53.420 --> 00:00:55.939
how it secretly governs everything from, well,

00:00:56.079 --> 00:00:58.600
the way artificial intelligence plays chess to

00:00:58.600 --> 00:01:01.479
how we vote in our democracies and even how philosophers

00:01:01.479 --> 00:01:04.739
define a completely just society. It is quite

00:01:04.739 --> 00:01:07.840
the journey. And fittingly, our visual backdrop

00:01:07.840 --> 00:01:12.599
today is this shifting conceptual array of complex

00:01:12.599 --> 00:01:15.959
game trees and branching timelines. Yeah, representing

00:01:15.959 --> 00:01:18.019
the endless exploration of all those possible

00:01:18.019 --> 00:01:20.900
futures. Okay, let's unpack this. Before we can

00:01:20.900 --> 00:01:23.099
talk about supercomputers or the philosophy of

00:01:23.099 --> 00:01:25.500
justice, we really have to establish the basic

00:01:25.500 --> 00:01:28.060
mechanics of navigating a worst-case scenario.

00:01:28.859 --> 00:01:30.980
What exactly are we looking at here? Well, at

00:01:30.980 --> 00:01:34.500
its core, minimax is a decision rule used for

00:01:34.500 --> 00:01:36.780
minimizing the possible loss in a worst-case

00:01:36.780 --> 00:01:39.739
scenario. It was originally formulated for game

00:01:39.739 --> 00:01:42.700
theory, specifically for games with multiple

00:01:42.700 --> 00:01:45.019
players where, you know, one person's win is

00:01:45.019 --> 00:01:47.560
exactly another person's loss. A zero-sum situation.

00:01:47.879 --> 00:01:50.420
Exactly. Now, when you are dealing with gains

00:01:50.420 --> 00:01:52.700
rather than losses, the rule is flipped and is

00:01:52.700 --> 00:01:55.500
referred to as Maximin. Maximin. Right. That

00:01:55.500 --> 00:01:57.819
means you are trying to maximize your minimum

00:01:57.819 --> 00:02:00.439
gain. Maximize the minimum gain. I'll be honest,

00:02:00.700 --> 00:02:02.400
that sounds like a riddle you'd have to solve

00:02:02.400 --> 00:02:05.040
to cross a troll's bridge. I know, I know. It

00:02:05.040 --> 00:02:06.980
is a bit counterintuitive until you break down

00:02:06.980 --> 00:02:09.919
the mechanics. The critical difference between

00:02:09.919 --> 00:02:12.460
the two really comes down to the order of operations.

00:02:13.189 --> 00:02:16.590
Think of it sequentially. OK. In Maximin, the

00:02:16.590 --> 00:02:19.490
maximization comes after the minimization. You

00:02:19.490 --> 00:02:21.770
try to maximize your value before knowing what

00:02:21.770 --> 00:02:24.409
the other players will do. You are assuming the

00:02:24.409 --> 00:02:26.330
worst and just making the best of it. Right.

00:02:26.530 --> 00:02:28.669
But in Minimax, you are actually in a much better

00:02:28.669 --> 00:02:30.909
position. You maximize your value knowing what

00:02:30.909 --> 00:02:33.590
the others did. Oh, OK. Let me try to put this

00:02:33.590 --> 00:02:35.530
into the real world. Yeah. Let's say you are

00:02:35.530 --> 00:02:37.509
deciding whether to pack an umbrella for a trip.

00:02:37.680 --> 00:02:41.020
Packing that umbrella just in case is a classic

00:02:41.020 --> 00:02:43.219
maximin strategy, right? Precisely. You are

00:02:43.219 --> 00:02:45.900
maximizing your comfort in the absolute

00:02:45.900 --> 00:02:48.400
worst-case scenario, which is a massive downpour,

00:02:48.900 --> 00:02:50.400
before you actually know what the weather will

00:02:50.400 --> 00:02:52.879
do. You secure your baseline of comfort first.

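NOTE
Illustrative aside, not part of the recording: the "order of operations" distinction drawn a moment ago is usually written as below. The symbols are notation chosen for this note: a_i is player i's own action, a_{-i} the actions of everyone else, and v_i player i's payoff. The maximin value (first line) is what the player can secure before seeing the others move; the minimax value (second line) is what the others can hold the player to; the first never exceeds the second.
```latex
\underline{v_i} = \max_{a_i} \min_{a_{-i}} v_i(a_i, a_{-i})
\overline{v_i} = \min_{a_{-i}} \max_{a_i} v_i(a_i, a_{-i})
\underline{v_i} \le \overline{v_i}
```
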
00:02:53.080 --> 00:02:56.560
That is a perfect everyday application. You are

00:02:56.560 --> 00:02:59.379
looking at the worst possible outcomes for every

00:02:59.379 --> 00:03:01.139
choice you could make, and you are selecting

00:03:01.139 --> 00:03:03.039
the choice where that worst-case outcome is

00:03:03.039 --> 00:03:05.629
the least damaging. Okay, got it. And the source

00:03:05.629 --> 00:03:08.250
text gives a very specific pure math example

00:03:08.250 --> 00:03:11.270
of this using a grid, or a payoff matrix, with

00:03:11.270 --> 00:03:13.030
two players. They call them a row player and

00:03:13.030 --> 00:03:15.550
a column player. Right. And in the text, it's

00:03:15.550 --> 00:03:18.310
just this dry grid of numbers. If the row player

00:03:18.310 --> 00:03:20.870
chooses the top row, it's this number. If they

00:03:20.870 --> 00:03:23.849
choose the bottom row, it's that number. It's

00:03:23.849 --> 00:03:25.590
a bit hard to visualize. It can be dry, yeah.

00:03:25.629 --> 00:03:28.090
So let's make it visceral. Let's imagine those

00:03:28.090 --> 00:03:31.830
choices, top, middle, and bottom, as three actual

00:03:31.830 --> 00:03:33.870
physical doors you have to walk through. Oh,

00:03:33.870 --> 00:03:36.610
I like that. Let's look at the row player's options,

00:03:36.689 --> 00:03:39.210
assuming they are only using pure strategies.

00:03:39.289 --> 00:03:41.930
Meaning? That simply means they pick one door

00:03:41.930 --> 00:03:44.169
and stick to it. There's no trickery or flipping

00:03:44.169 --> 00:03:46.689
coins. If the row player walks through the top

00:03:46.689 --> 00:03:48.969
door, the text tells us their payoff will be

00:03:48.969 --> 00:03:52.229
either a 3 or a 2. OK, not bad. If they walk

00:03:52.229 --> 00:03:54.789
through the middle door, it will be a 5 or a

00:03:54.789 --> 00:03:57.729
painful penalty of negative 10. And if they choose

00:03:57.729 --> 00:04:00.729
the bottom door, the payoffs are either a 4 or

00:04:00.729 --> 00:04:03.090
a devastating trap door, dropping them to negative

00:04:03.090 --> 00:04:05.479
100. See, when you phrase it like that, looking

00:04:05.479 --> 00:04:08.139
at that bottom door is terrifying. I mean, I

00:04:08.139 --> 00:04:10.060
could get a solid four, but I could also fall

00:04:10.060 --> 00:04:12.659
into a pit of negative 100. That's a massive

00:04:12.659 --> 00:04:15.219
game-ending penalty. Absolutely. And the middle

00:04:15.219 --> 00:04:17.019
door isn't great either, with a risk of losing

00:04:17.019 --> 00:04:20.339
10. Right. So following the maximin rule, the

00:04:20.339 --> 00:04:22.480
row player doesn't even look at the potential

00:04:22.480 --> 00:04:25.279
rewards. They strictly look at the worst possible

00:04:25.279 --> 00:04:27.819
outcome for each choice. So they just focus on

00:04:27.819 --> 00:04:30.920
the floors, not the ceilings. Exactly. Top door's

00:04:30.920 --> 00:04:33.860
worst case is a gain of 2. Middle door's worst

00:04:33.860 --> 00:04:36.860
case is losing 10. Bottom door's worst case is

00:04:36.860 --> 00:04:41.240
losing 100. The highest or, I guess, most survivable

00:04:41.240 --> 00:04:44.259
of those minimums is 2. Therefore, the row player

00:04:44.259 --> 00:04:47.139
chooses the top door, guaranteeing a payoff of

00:04:47.139 --> 00:04:49.420
at least two. So it's not about winning big,

00:04:49.600 --> 00:04:52.180
it's about not losing everything. They are playing

00:04:52.180 --> 00:04:55.120
it incredibly safe. And I assume the column player,

00:04:55.379 --> 00:04:57.160
choosing between their own left and right options,

00:04:57.420 --> 00:05:00.480
does the exact same defensive calculus. Yes,

00:05:00.480 --> 00:05:02.920
they do. The column player evaluates left and

00:05:02.920 --> 00:05:04.720
right. If they play left, their absolute worst

00:05:04.720 --> 00:05:07.279
payoff is zero. If they play right, their worst

00:05:07.279 --> 00:05:09.620
payoff is negative 20. So they go left. Right.

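NOTE
Illustrative sketch, not part of the recording: the row player's "floors, not ceilings" reasoning fits in a few lines of Python. The payoffs are the ones read out in the episode; the names ROW_PAYOFFS and maximin_choice are made up for this note, and the order of the two outcomes behind each door is an assumption that does not change the result.
```python
# Row player's payoffs as read out in the episode. The order of the two
# outcomes behind each door is assumed; only the smaller one matters here.
ROW_PAYOFFS = {
    "top": [3, 2],
    "middle": [5, -10],
    "bottom": [4, -100],
}
def maximin_choice(payoffs):
    """Return (option, floor): the option whose worst payoff is largest."""
    floors = {option: min(values) for option, values in payoffs.items()}
    best = max(floors, key=floors.get)
    return best, floors[best]
if __name__ == "__main__":
    print(maximin_choice(ROW_PAYOFFS))  # ('top', 2)
```
The column player's choice works the same way: with floors of 0 for left and -20 for right, the larger floor is left.
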
00:05:09.720 --> 00:05:12.540
They choose left to secure a baseline of at least

00:05:12.540 --> 00:05:14.779
zero, stepping right around that negative 20

00:05:14.779 --> 00:05:16.879
trap. OK. That makes sense when the board is

00:05:16.879 --> 00:05:19.540
static. You look at the trap doors. You calculate

00:05:19.540 --> 00:05:21.620
the depth of the spikes, and you choose the safest

00:05:21.620 --> 00:05:25.399
path. But a static grid is one thing. What happens

00:05:25.399 --> 00:05:28.259
when we step into active competition? Yeah, that's

00:05:28.259 --> 00:05:30.019
where it gets messy. Like what happens when we

00:05:30.019 --> 00:05:33.779
look at a pure two-player zero-sum game? The

00:05:33.779 --> 00:05:36.920
dynamic completely fractures because in a

00:05:36.920 --> 00:05:40.060
zero-sum game, the environment isn't neutral. Your

00:05:40.060 --> 00:05:42.480
opponent is actively trying to minimize your

00:05:42.480 --> 00:05:44.819
score in order to maximize their own. Right.

00:05:44.920 --> 00:05:46.819
If you win a slice of pizza, they physically

00:05:46.819 --> 00:05:49.600
lose a slice of pizza. There is no mutual benefit.

00:05:49.720 --> 00:05:51.639
And the source mentions that in these zero-sum

00:05:51.639 --> 00:05:53.980
games, the minimax solution is the same as the

00:05:53.980 --> 00:05:56.680
Nash equilibrium. But walk me through the how.

00:05:57.019 --> 00:05:59.600
How does the math hold up when someone is actively

00:05:59.600 --> 00:06:01.839
hunting you? Well, the source material illustrates

00:06:01.839 --> 00:06:05.339
this with a complex 3x3 matrix involving player

00:06:05.339 --> 00:06:08.939
A and player B. Player A has choices A1, A2,

00:06:09.000 --> 00:06:12.459
and A3. Player B has B1, B2, and B3. Because

00:06:12.459 --> 00:06:14.839
it's zero-sum, whatever the payoff is for player

00:06:14.839 --> 00:06:17.639
A, the payoff for player B is the exact same

00:06:17.639 --> 00:06:20.500
number, just with the sign reversed. So if player

00:06:20.500 --> 00:06:24.079
A wins 5, player B inherently loses 5. Okay,

00:06:24.079 --> 00:06:26.879
so it's a direct tactical duel. Exactly. So let's

00:06:26.879 --> 00:06:29.839
look at player A's simple maximin choice. If

00:06:29.839 --> 00:06:32.879
we evaluate all the rows, the math tells us choice

00:06:32.879 --> 00:06:36.600
A2 is the safest. The absolute worst possible

00:06:36.600 --> 00:06:39.439
result if player A chooses A2 is having to pay

00:06:39.439 --> 00:06:42.360
one to player B. Okay, not a huge loss. For player

00:06:42.360 --> 00:06:44.439
B doing the same defensive analysis on their

00:06:44.439 --> 00:06:47.379
options, their simple maximin choice is B2, because

00:06:47.379 --> 00:06:50.980
the worst possible result is a zero payment, no loss,

00:06:51.040 --> 00:06:53.170
no gain. But wait, let me push back on this.

00:06:53.209 --> 00:06:55.370
Go for it. If we are playing this game and you

00:06:55.370 --> 00:06:57.410
are player B and you believe that I'm going to

00:06:57.410 --> 00:07:00.029
play it safe and choose A2, you aren't going

00:07:00.029 --> 00:07:02.470
to just sit there and choose B2 for a zero payout.

00:07:02.730 --> 00:07:04.470
You're going to adapt. You are going to switch

00:07:04.470 --> 00:07:07.050
your move to B1 because the matrix shows that

00:07:07.050 --> 00:07:10.129
against my A2, your B1 gives you a gain of one.

00:07:10.370 --> 00:07:12.350
Right. And you've spotted the foundational flaw

00:07:12.350 --> 00:07:15.209
of applying static rules to dynamic human behavior.

00:07:15.550 --> 00:07:18.720
But the flaw doesn't stop there. Because if I

00:07:18.720 --> 00:07:21.480
realize that you are going to switch to B1 to

00:07:21.480 --> 00:07:24.399
try and snatch that one point, I'm going to abandon

00:07:24.399 --> 00:07:27.279
my safe A2 strategy. I'm going to switch to A1

00:07:27.279 --> 00:07:31.019
because the grid says A1 against your B1 gives

00:07:31.019 --> 00:07:33.939
me a massive gain of three. Yep. It's an endless

00:07:33.939 --> 00:07:36.720
exhausting loop of second guessing. I know that

00:07:36.720 --> 00:07:38.930
you know that. I know that you know. It never

00:07:38.930 --> 00:07:41.850
stops. What's fascinating here is that game theorists

00:07:41.850 --> 00:07:44.610
mathematically recognize this exact paranoia.

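NOTE
Illustrative sketch, not part of the recording: the second-guessing loop can be traced mechanically. The episode never reads out the full 3x3 grid, so the matrix below is an assumption: it is filled in to match every value the hosts do quote (A2's worst case of -1, B2's worst case of 0, A1 against B1 paying 3, A3 always worse than A2). All function names are hypothetical.
```python
# Player A's payoffs; player B receives the negative of each entry.
# Assumed matrix, chosen to be consistent with the values quoted in the episode.
PAYOFF_A = [
    [3, -2, 2],   # A1 against B1, B2, B3
    [-1, 0, 4],   # A2
    [-4, -3, 1],  # A3
]
def best_reply_for_b(a_move):
    """B picks the column that minimizes A's payoff in row a_move."""
    row = PAYOFF_A[a_move]
    return min(range(len(row)), key=row.__getitem__)
def best_reply_for_a(b_move):
    """A picks the row that maximizes A's payoff in column b_move."""
    return max(range(len(PAYOFF_A)), key=lambda r: PAYOFF_A[r][b_move])
def trace_second_guessing(a_move, rounds):
    """Alternate best replies (B first) and record each (A, B) pair reached."""
    history = []
    b_move = best_reply_for_b(a_move)
    for _ in range(rounds):
        history.append((f"A{a_move + 1}", f"B{b_move + 1}"))
        a_move = best_reply_for_a(b_move)
        history.append((f"A{a_move + 1}", f"B{b_move + 1}"))
        b_move = best_reply_for_b(a_move)
    return history
if __name__ == "__main__":
    for pair in trace_second_guessing(a_move=1, rounds=3):
        print(pair)
```
Starting from the "safe" A2, the trace visits (A2, B1), (A1, B1), (A1, B2), (A2, B2) and then returns to (A2, B1): no pair of pure choices is a resting point.
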
00:07:45.009 --> 00:07:48.149
The simple pure strategy completely breaks down

00:07:48.149 --> 00:07:50.910
here. It is inherently unstable. So how do they

00:07:50.910 --> 00:07:53.470
solve it? Through two mechanical steps. First,

00:07:53.670 --> 00:07:56.009
you trim the fat. you eliminate what they call

00:07:56.009 --> 00:07:58.410
dominated choices. Dominated choices. Let's translate

00:07:58.410 --> 00:08:00.730
that. That just means moves that are mathematically

00:08:00.730 --> 00:08:02.750
embarrassing to make, right? Moves that fail

00:08:02.750 --> 00:08:04.589
no matter what the opponent does. Precisely.

00:08:04.810 --> 00:08:07.889
In the text matrix, player A will never, under

00:08:07.889 --> 00:08:11.129
any circumstance, choose A3, because their other

00:08:11.129 --> 00:08:13.750
options always yield better results against any

00:08:13.750 --> 00:08:15.730
of B's moves. You just snip that branch off the

00:08:15.730 --> 00:08:18.089
tree completely. Makes sense. But, as you pointed

00:08:18.089 --> 00:08:20.129
out, even with a smaller tree, we still have

00:08:20.129 --> 00:08:22.269
that endless loop of second guessing. Right.

00:08:22.290 --> 00:08:24.459
So how do you actually break the loop? You introduce

00:08:24.459 --> 00:08:27.139
the concept of mixed strategies. This means,

00:08:27.300 --> 00:08:29.660
instead of picking one move 100% of the time,

00:08:29.899 --> 00:08:32.759
you use probabilities. You randomize your choices

00:08:32.759 --> 00:08:35.159
based on specific mathematical weights. Wait,

00:08:35.159 --> 00:08:37.120
really? You stop trying to outsmart the opponent

00:08:37.120 --> 00:08:38.940
and you turn yourself into a random number generator?

00:08:39.200 --> 00:08:42.440
Essentially, yes. In a zero-sum game, being

00:08:42.440 --> 00:08:45.419
predictable is the same as being dead. In this

00:08:45.419 --> 00:08:48.559
specific matrix from the source, player A can

00:08:48.559 --> 00:08:51.240
guarantee stability by choosing A1 with a one

00:08:51.240 --> 00:08:53.679
in six probability and A2 with a five in six

00:08:53.679 --> 00:08:56.740
probability. On the other side, player B secures

00:08:56.740 --> 00:08:58.960
their defensive position by choosing B1 with

00:08:58.960 --> 00:09:01.840
a one third probability and B2 with a two thirds

00:09:01.840 --> 00:09:04.220
probability. But what does that actually do?

00:09:04.340 --> 00:09:06.320
I'm rolling a die now instead of making a choice.

00:09:06.700 --> 00:09:09.559
By adopting those specific rigid probabilities,

00:09:10.059 --> 00:09:12.299
player A caps the expected loss at one third, and

00:09:12.299 --> 00:09:15.820
player B locks in an expected gain of exactly one third.

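NOTE
Illustrative sketch, not part of the recording: the quoted probabilities can be checked with exact fractions. The full matrix is an assumption (the episode quotes only some of its entries), and the helper names are hypothetical. Payoffs are from player A's side, so a value of -1/3 means A pays a third on average and B gains a third.
```python
from fractions import Fraction
# Player A's payoffs; assumed matrix consistent with the values quoted in the episode.
PAYOFF_A = [
    [3, -2, 2],   # A1 against B1, B2, B3
    [-1, 0, 4],   # A2
    [-4, -3, 1],  # A3
]
MIX_A = [Fraction(1, 6), Fraction(5, 6), Fraction(0)]  # weights on A1, A2, A3
MIX_B = [Fraction(1, 3), Fraction(2, 3), Fraction(0)]  # weights on B1, B2, B3
def a_payoff_vs_column(mix_a, col):
    """A's expected payoff when A randomizes and B plays one fixed column."""
    return sum(p * PAYOFF_A[row][col] for row, p in enumerate(mix_a))
def a_payoff_vs_row(mix_b, row):
    """A's expected payoff when B randomizes and A plays one fixed row."""
    return sum(q * PAYOFF_A[row][col] for col, q in enumerate(mix_b))
# The least A can expect with MIX_A, whatever B does.
guaranteed_for_a = min(a_payoff_vs_column(MIX_A, c) for c in range(3))
# The most A can expect against MIX_B, whatever A does.
conceded_by_b = max(a_payoff_vs_row(MIX_B, r) for r in range(3))
if __name__ == "__main__":
    print(guaranteed_for_a, conceded_by_b)  # -1/3 -1/3
```
Both bounds meet at -1/3, which is why neither side can do better by second-guessing the other.
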
00:09:16.039 --> 00:09:18.600
Meaning, if you play this game a thousand times,

00:09:18.860 --> 00:09:21.360
no matter what psychological mind games the other

00:09:21.360 --> 00:09:23.899
person tries to play, the math dictates that

00:09:23.899 --> 00:09:26.700
the value settles at one third. They've created

00:09:26.700 --> 00:09:30.620
concrete mathematical stability out of pure chaotic

00:09:30.620 --> 00:09:33.240
second-guessing. Here's where it gets really

00:09:33.240 --> 00:09:36.350
interesting for me. Because humans might try

00:09:36.350 --> 00:09:38.830
to use mixed strategies by flipping a coin or

00:09:38.830 --> 00:09:41.629
trying to be erratic, but computers, computers

00:09:41.629 --> 00:09:44.809
don't guess. They have the processing power to

00:09:44.809 --> 00:09:48.019
map out these possibilities explicitly. This

00:09:48.019 --> 00:09:50.639
bridges the abstract mathematical theory straight

00:09:50.639 --> 00:09:53.500
into real-world technology. Yes. This is the

00:09:53.500 --> 00:09:56.000
realm of combinatorial game theory. This is where

00:09:56.000 --> 00:09:58.379
the minimax algorithm becomes the literal brain

00:09:58.379 --> 00:10:00.559
of artificial intelligence for games like

00:10:00.559 --> 00:10:03.379
tic-tac-toe, checkers, or chess. And it does this

00:10:03.379 --> 00:10:05.639
by working backwards. I love the visual of this.

00:10:05.759 --> 00:10:08.159
It's like the AI is a paranoid mental time traveler.

00:10:08.600 --> 00:10:10.899
The computer looks into every possible future

00:10:10.899 --> 00:10:13.190
timeline of the chess game. It assumes that you,

00:10:13.509 --> 00:10:16.169
its human opponent, will always make the absolute

00:10:16.169 --> 00:10:19.289
best, most punishing counter move in every single

00:10:19.289 --> 00:10:22.870
timeline. And then, knowing how every timeline

00:10:22.870 --> 00:10:26.090
inevitably ends, it travels back to the present

00:10:26.090 --> 00:10:28.950
moment on the board and chooses the single path

00:10:28.950 --> 00:10:31.309
that guarantees the least amount of destruction.

00:10:31.480 --> 00:10:34.240
That is a highly accurate way to conceptualize

00:10:34.240 --> 00:10:37.059
the code. The AI treats itself as the maximizing

00:10:37.059 --> 00:10:39.299
player and its opponent as the minimizing player.

00:10:39.759 --> 00:10:42.200
The algorithm assigns a concrete numerical value

00:10:42.200 --> 00:10:45.059
to the end of the game. If the AI wins, that

00:10:45.059 --> 00:10:47.820
game state gets a value of plus one or sometimes

00:10:47.820 --> 00:10:51.259
positive infinity. If the opponent wins, that

00:10:51.259 --> 00:10:54.059
timeline gets a value of negative one or negative

00:10:54.059 --> 00:10:56.259
infinity. Layer after layer, it's just calculating,

00:10:56.620 --> 00:10:59.019
if I do this, they do that to minimize my score,

00:10:59.120 --> 00:11:01.360
then I do this to maximize it. Yes. And the text

00:11:01.360 --> 00:11:03.299
brings up the most famous historical example

00:11:03.299 --> 00:11:06.659
of this. The IBM chess computer Deep Blue beating

00:11:06.659 --> 00:11:09.299
the reigning human world champion, Garry Kasparov,

00:11:09.559 --> 00:11:12.879
in 1997. The source notes, Deep Blue looked ahead

00:11:12.879 --> 00:11:16.059
at least 12 plies. A ply being just one turn

00:11:16.059 --> 00:11:19.019
taken by one player. 12 moves deep into the future.

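NOTE
Illustrative sketch, not part of the recording: the backward-working search described here, on a toy tree rather than a real chess position. The tree, the nested-list representation, and the function names are all assumptions made for this note. Leaves are scored from the AI's side as +1 (AI wins), 0 (draw) and -1 (opponent wins).
```python
def minimax(node, maximizing):
    """Return the minimax value of a tree given as nested lists with numeric leaves."""
    if not isinstance(node, list):
        return node  # leaf: final score from the AI's point of view
    child_values = [minimax(child, not maximizing) for child in node]
    return max(child_values) if maximizing else min(child_values)
def best_move(tree):
    """Return (index, value) of the AI's best first move, opponent replying optimally."""
    values = [minimax(child, maximizing=False) for child in tree]
    best_value = max(values)
    return values.index(best_value), best_value
# Hypothetical two-ply tree: three AI moves, two opponent replies to each.
TREE = [[1, -1], [0, 0], [-1, 1]]
if __name__ == "__main__":
    print(best_move(TREE))  # (1, 0)
```
The first and third moves each contain a winning line, but the opponent would steer away from it, so the search settles for the guaranteed draw.
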
00:11:19.259 --> 00:11:21.059
Which sounds incredibly deep, but chess is a

00:11:21.059 --> 00:11:23.840
massive game. Even a supercomputer can't actually

00:11:23.840 --> 00:11:25.960
map every single timeline in chess all the way

00:11:25.960 --> 00:11:29.529
to a checkmate, can it? It cannot. No. And this

00:11:29.529 --> 00:11:33.049
is the great limitation of a naive minimax algorithm.

00:11:33.649 --> 00:11:35.690
The source material refers to this limitation

00:11:35.690 --> 00:11:38.350
as the effective branching factor. Let's define

00:11:38.350 --> 00:11:41.169
that. In a game like chess, every single turn

00:11:41.169 --> 00:11:44.929
offers dozens of possible legal moves. For every

00:11:44.929 --> 00:11:47.450
one of those moves, the opponent has dozens of

00:11:47.450 --> 00:11:50.250
responses. The number of possible game states

00:11:50.250 --> 00:11:53.470
doesn't just grow, it explodes exponentially.

00:11:53.549 --> 00:11:55.289
Right. It gets out of hand fast. Within a few

00:11:55.289 --> 00:11:57.850
moves, you are looking at billions of timelines.

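NOTE
Illustrative sketch, not part of the recording: how quickly "dozens of moves per turn" compounds. The branching factor of 35 is an assumption (a commonly quoted rough average for chess, not a figure from the episode), and positions_at_depth is a hypothetical helper that counts move sequences, not distinct board positions.
```python
ASSUMED_BRANCHING_FACTOR = 35  # rough average number of legal moves; an assumption
def positions_at_depth(branching_factor, plies):
    """Number of move sequences of the given length in a uniform game tree."""
    return branching_factor ** plies
if __name__ == "__main__":
    for plies in (2, 4, 6, 12):
        count = positions_at_depth(ASSUMED_BRANCHING_FACTOR, plies)
        print(f"{plies:2d} plies: {count:,} sequences")
```
Under this assumption, six plies already give 1,838,265,625 sequences, which is the "billions of timelines" scale the hosts mention.
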
00:11:58.509 --> 00:12:01.350
It is computationally impossible for any machine

00:12:01.350 --> 00:12:03.730
to map the entire game of chess from the first

00:12:03.730 --> 00:12:05.730
move to the last. So the time machine breaks

00:12:05.730 --> 00:12:07.889
down because there are simply too many alternate

00:12:07.889 --> 00:12:10.700
dimensions to visit. The clock is ticking. So

00:12:10.700 --> 00:12:13.820
how did Deep Blue actually beat Kasparov if it

00:12:13.820 --> 00:12:15.700
couldn't see to the end of the game? By using

00:12:15.700 --> 00:12:18.259
two critical mathematical workarounds. First,

00:12:18.399 --> 00:12:20.440
they used a technique called alpha-beta pruning.

00:12:20.820 --> 00:12:23.080
This dramatically improves performance without

00:12:23.080 --> 00:12:25.320
changing the final decision. Explain the how

00:12:25.320 --> 00:12:27.779
of that. How do you skip calculating a timeline

00:12:27.779 --> 00:12:30.309
without missing something important? Think of

00:12:30.309 --> 00:12:33.330
the algorithm searching down a massive branching

00:12:33.330 --> 00:12:36.629
tree of possibilities. Let's say it's evaluating

00:12:36.629 --> 00:12:39.549
a specific sequence of moves, and three steps

00:12:39.549 --> 00:12:42.509
down this timeline, it sees that the human opponent

00:12:42.509 --> 00:12:45.830
has a move that captures the AI's queen for absolutely

00:12:45.830 --> 00:12:49.490
free. A catastrophic loss. Right. The AI doesn't

00:12:49.490 --> 00:12:52.509
need to calculate the next 20 moves of that specific

00:12:52.509 --> 00:12:55.950
timeline to know it's a terrible path. The math

00:12:55.950 --> 00:12:58.409
proves that the maximum possible score of this

00:12:58.409 --> 00:13:00.730
branch is already lower than a safer branch it

00:13:00.730 --> 00:13:03.169
found earlier. So it simply stops searching.

00:13:03.309 --> 00:13:05.350
Oh, wow. Yeah, it takes a pair of mathematical

00:13:05.350 --> 00:13:08.129
shears and prunes that dead branch off the tree.

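NOTE
Illustrative sketch, not part of the recording: a textbook form of alpha-beta pruning on a toy tree, not Deep Blue's actual implementation. The tree, the nested-list representation, and the names are assumptions for this note. The list of visited leaves shows which "timelines" were never evaluated.
```python
import math
def alphabeta(node, maximizing, alpha, beta, visited):
    """Minimax value of a nested-list tree, skipping branches that cannot matter."""
    if not isinstance(node, list):
        visited.append(node)  # record every leaf that is actually evaluated
        return node
    value = -math.inf if maximizing else math.inf
    for child in node:
        score = alphabeta(child, not maximizing, alpha, beta, visited)
        if maximizing:
            value = max(value, score)
            alpha = max(alpha, value)
        else:
            value = min(value, score)
            beta = min(beta, value)
        if alpha >= beta:  # the other side already has a better option elsewhere
            break
    return value
def search(tree):
    """Run alpha-beta from the root; return (value, leaves evaluated in order)."""
    visited = []
    value = alphabeta(tree, True, -math.inf, math.inf, visited)
    return value, visited
# Hypothetical tree: three AI moves, two opponent replies each. The -9 stands in
# for the "queen lost for free" line from the conversation.
TREE = [[3, 5], [-9, 7], [2, 1]]
if __name__ == "__main__":
    print(search(TREE))  # (3, [3, 5, -9, 2])
```
Only four of the six leaves are examined. Once the second move is known to allow -9, its other reply (7) is irrelevant, and the answer is the same 3 that a full search would return.
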
00:13:08.490 --> 00:13:10.690
It doesn't waste processing power calculating

00:13:10.690 --> 00:13:13.490
exactly how badly it will lose. It just knows

00:13:13.490 --> 00:13:15.409
never to go down that path. It cuts the dead

00:13:15.409 --> 00:13:17.429
weight. That's brilliant. But even with a pruned

00:13:17.429 --> 00:13:20.250
tree, you still can't see the final checkmate.

00:13:20.590 --> 00:13:22.669
At some point, the computer just has to guess,

00:13:22.809 --> 00:13:25.500
right? Exactly, and that brings us to the second

00:13:25.500 --> 00:13:28.580
workaround: heuristic evaluation functions. Since

00:13:28.580 --> 00:13:31.379
the computer can't reach the absolute final winning

00:13:31.379 --> 00:13:34.399
or losing state of the game, it has to forcefully

00:13:34.399 --> 00:13:38.200
stop calculating at a certain depth, say 12 plies.

00:13:38.820 --> 00:13:41.259
At that exact point, it uses a heuristic which

00:13:41.259 --> 00:13:43.519
is essentially an educated mathematical guess.

00:13:43.879 --> 00:13:46.620
But how does a machine guess? How does it evaluate

00:13:46.620 --> 00:13:49.960
a board without a clear winner? It translates

00:13:49.960 --> 00:13:53.080
abstract concepts into rigid numbers. It evaluates

00:13:53.080 --> 00:13:55.039
the current board state by looking at material.

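NOTE
Illustrative sketch, not part of the recording: a bare-bones material count of the kind discussed in this stretch of the conversation. The pawn and knight values come from the episode; the bishop, rook and queen values are conventional figures added as an assumption, and the piece-letter encoding and function name are made up for this note. A real evaluation function would add many more terms.
```python
# Pawn = 1 and knight = 3 are from the episode; the rest are assumed conventional values.
PIECE_VALUES = {"P": 1, "N": 3, "B": 3, "R": 5, "Q": 9}
def material_score(ai_pieces, opponent_pieces):
    """Material balance from the AI's side; positive means the AI is ahead."""
    own = sum(PIECE_VALUES[piece] for piece in ai_pieces)
    theirs = sum(PIECE_VALUES[piece] for piece in opponent_pieces)
    return own - theirs
if __name__ == "__main__":
    # AI: queen, knight, two pawns. Opponent: two rooks, two pawns.
    print(material_score("QNPP", "RRPP"))  # 2
```
A number like this is what the search uses at its depth limit in place of a true win or loss.
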
00:13:55.279 --> 00:13:57.480
Maybe a pawn is worth one point, a knight is

00:13:57.480 --> 00:14:00.539
three. It looks at board control, assigning fractional

00:14:00.539 --> 00:14:02.740
points for every square a piece threatens. Okay,

00:14:02.840 --> 00:14:05.340
I see. It tallies all these microadvantages up

00:14:05.340 --> 00:14:08.039
and assigns a finite numerical value to that

00:14:08.039 --> 00:14:10.519
timeline. It's a mathematically programmed intuition

00:14:10.519 --> 00:14:12.899
representing the belief that this path will eventually

00:14:12.899 --> 00:14:16.159
lead to a win. So it's forced to rely on a programmed

00:14:16.159 --> 00:14:18.419
gut feeling because the perfect information of

00:14:18.419 --> 00:14:21.259
the endgame is out of reach. But Deep Blue's

00:14:21.259 --> 00:14:23.600
universe, even with billions of branches, is

00:14:23.600 --> 00:14:26.340
still perfect. The chessboard has edges. It has

00:14:26.340 --> 00:14:29.179
strict rules. Real life doesn't. So how does

00:14:29.179 --> 00:14:31.720
this math survive when we step out of the computer

00:14:31.720 --> 00:14:34.639
and into the chaos of human society where we

00:14:34.639 --> 00:14:37.159
don't even know all the rules? How do we use

00:14:37.159 --> 00:14:40.080
minimax when there is no clearly defined opponent

00:14:40.080 --> 00:14:42.799
sitting across from us? It translates surprisingly

00:14:42.799 --> 00:14:46.120
well, particularly into individual decision-making

00:14:46.120 --> 00:14:49.580
under severe real-world uncertainty. The source

00:14:49.580 --> 00:14:51.960
gives a great example, prospecting for minerals.

00:14:52.779 --> 00:14:55.679
This is an incredibly expensive endeavor. If

00:14:55.679 --> 00:14:58.080
a mining company drills and finds nothing, they

00:14:58.080 --> 00:15:00.559
waste a fortune. If they find minerals, the rewards

00:15:00.559 --> 00:15:02.960
are astronomical. But the Earth isn't playing

00:15:02.960 --> 00:15:05.870
a game against you. The dirt doesn't have a strategy

00:15:05.870 --> 00:15:09.070
to hide the gold. It's just dirt. True. But the

00:15:09.070 --> 00:15:11.549
decision theory treats this uncertainty as a

00:15:11.549 --> 00:15:14.070
game against nature. It adopts a mindset very

00:15:14.070 --> 00:15:16.190
similar to Murphy's Law. Anything that can go

00:15:16.190 --> 00:15:18.590
wrong will go wrong. What's crucial here is that

00:15:18.590 --> 00:15:21.470
it's a non-probabilistic decision theory.

00:15:21.470 --> 00:15:24.009
Non-probabilistic. Right. Meaning it explicitly

00:15:24.009 --> 00:15:27.080
does not care about the odds. Exactly. It does

00:15:27.080 --> 00:15:29.759
not rely on the exact mathematical probabilities

00:15:29.759 --> 00:15:31.820
of outcomes. It doesn't care if a geologist says

00:15:31.820 --> 00:15:34.059
there's a 10% chance or an 80% chance of finding

00:15:34.059 --> 00:15:37.740
gold. It relies strictly on scenario analysis.

00:15:38.299 --> 00:15:41.340
What is the absolute worst possible outcome of

00:15:41.340 --> 00:15:44.320
this decision? You simply rank the outcomes.

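NOTE
Illustrative sketch, not part of the recording: a "game against nature" table for the prospecting example. Every number, option and scenario name below is hypothetical (the episode gives no figures); payoffs are in arbitrary money units. No probabilities appear anywhere: the rule only compares each option's worst scenario.
```python
# Hypothetical payoffs for each option under each state of nature.
OUTCOMES = {
    "drill": {"minerals": 900, "nothing": -100},
    "survey_first": {"minerals": 700, "nothing": -30},
    "walk_away": {"minerals": 0, "nothing": 0},
}
def worst_case_choice(outcomes):
    """Return (option, floor): the option whose worst scenario is least bad."""
    floors = {option: min(by_state.values()) for option, by_state in outcomes.items()}
    best = max(floors, key=floors.get)
    return best, floors[best]
if __name__ == "__main__":
    print(worst_case_choice(OUTCOMES))  # ('walk_away', 0)
```
With these made-up numbers the rule walks away from a possible 900, which shows how conservative a pure worst-case criterion can be.
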
00:15:44.440 --> 00:15:46.919
You don't measure the exact statistical distances

00:15:46.919 --> 00:15:49.580
between them. And you choose the path where that

00:15:49.580 --> 00:15:52.379
worst-case scenario doesn't bankrupt you. That's

00:15:52.379 --> 00:15:54.120
entirely different from how we are taught to

00:15:54.120 --> 00:15:57.049
think. You aren't calculating your odds of success.

00:15:57.169 --> 00:15:59.649
You are just building the strongest bunker possible,

00:16:00.110 --> 00:16:02.409
which brings up an incredibly interesting and

00:16:02.409 --> 00:16:04.789
frankly exhausting application from the text,

00:16:05.690 --> 00:16:08.809
democracy and voting. Ah, yes. The Wikipedia

00:16:08.809 --> 00:16:12.490
article discusses lesser evil voting, or LEV.

00:16:13.029 --> 00:16:14.549
And just to be perfectly clear to you listening,

00:16:14.669 --> 00:16:16.669
we are discussing this purely and impartially,

00:16:16.889 --> 00:16:19.070
as an academic application of the theory from

00:16:19.070 --> 00:16:21.409
the source material. Yes, absolutely. The text

00:16:21.409 --> 00:16:23.629
quotes political thinkers Noam Chomsky and

00:16:23.629 --> 00:16:26.269
John Halle, who argue that voting in a system

00:16:26.269 --> 00:16:28.850
with two major choices can be viewed purely through

00:16:28.850 --> 00:16:31.450
the lens of a minimax strategy. And as someone

00:16:31.450 --> 00:16:34.090
who gets deeply fatigued by election seasons,

00:16:34.850 --> 00:16:37.539
the logic here is fascinating. The quote from

00:16:37.539 --> 00:16:39.519
the source states that voting should not be viewed

00:16:39.519 --> 00:16:42.659
as a form of personal self-expression. It shouldn't

00:16:42.659 --> 00:16:45.120
be about moral retaliation against candidates

00:16:45.120 --> 00:16:47.960
who fail to perfectly reflect your values, and

00:16:47.960 --> 00:16:51.159
it's not about judging a corrupt system. Instead,

00:16:51.259 --> 00:16:53.639
they argue, it is simply a mechanical opportunity

00:16:53.639 --> 00:16:57.100
to reduce harm. It is the ultimate real-world

00:16:57.100 --> 00:17:00.320
application of minimizing the maximum possible

00:17:00.320 --> 00:17:03.580
loss. In this framework, you evaluate the candidates,

00:17:03.820 --> 00:17:05.900
determine which one represents the absolute worst

00:17:05.900 --> 00:17:08.220
case scenario for your interests or the country's

00:17:08.220 --> 00:17:10.380
interests, and you vote for the other one. Wow.

00:17:10.460 --> 00:17:12.500
You aren't trying to maximize your ideal gain

00:17:12.500 --> 00:17:15.240
or achieve a utopian vision. You are strictly

00:17:15.240 --> 00:17:17.680
minimizing the worst-case damage. It certainly

00:17:17.680 --> 00:17:20.220
strips all the romance and idealism out of democracy,

00:17:20.220 --> 00:17:23.200
but the mathematical logic of it is undeniable.

00:17:23.519 --> 00:17:25.910
You are pruning the darkest branch of the tree.

00:17:26.130 --> 00:17:28.289
If we connect this to the bigger picture, this

00:17:28.289 --> 00:17:31.529
exact logic scales up from individual votes to

00:17:31.529 --> 00:17:33.670
the very foundation of how we define a moral

00:17:33.670 --> 00:17:37.220
society. In philosophy, the term Maximin is deeply

00:17:37.220 --> 00:17:39.799
associated with John Rawls and his monumental

00:17:39.799 --> 00:17:43.140
work, A Theory of Justice. Wait, how does a political

00:17:43.140 --> 00:17:46.160
philosopher use game theory math? Rawls used

00:17:46.160 --> 00:17:48.839
Maximin logic to formulate what he called the

00:17:48.839 --> 00:17:51.599
Difference Principle. He created a famous thought

00:17:51.599 --> 00:17:54.779
experiment called the Veil of Ignorance. He argued

00:17:54.779 --> 00:17:57.160
that if you were designing a society from scratch,

00:17:57.319 --> 00:17:58.900
you should do it without knowing where you would

00:17:58.900 --> 00:18:01.000
personally end up in that society. Because you

00:18:01.000 --> 00:18:03.099
could be anyone. Right. You don't know if you

00:18:03.099 --> 00:18:05.799
will be born rich or poor, healthy or sick,

00:18:06.059 --> 00:18:08.359
part of a majority, or a marginalized minority.

00:18:08.660 --> 00:18:10.359
Because if you don't know where you will land,

00:18:10.460 --> 00:18:13.180
you are forced to protect yourself against the

00:18:13.180 --> 00:18:15.740
absolute worst-case scenario. You assume you

00:18:15.740 --> 00:18:18.420
will be born at the very bottom. Precisely. Because

00:18:18.420 --> 00:18:20.920
of that uncertainty, a rational person using

00:18:20.920 --> 00:18:23.700
maximin logic would demand that a just society

00:18:23.700 --> 00:18:26.440
is one where social and economic inequalities

00:18:26.440 --> 00:18:28.880
are specifically arranged to provide the greatest

00:18:28.880 --> 00:18:31.460
benefit to the least advantaged members of society.

00:18:31.839 --> 00:18:35.009
Wow. He took the logic of avoiding a negative

00:18:35.009 --> 00:18:38.450
100 payoff in a simple grid game and used it

00:18:38.450 --> 00:18:40.890
to argue that a society is only mathematically

00:18:40.890 --> 00:18:43.950
fair if the absolute bottom rung of the ladder

00:18:43.950 --> 00:18:46.130
is raised as high off the ground as possible.

00:18:46.130 --> 00:18:48.269
Yeah. Because you never know if you're the one

00:18:48.269 --> 00:18:50.430
who is going to be standing on it. You maximize

00:18:50.430 --> 00:18:53.750
the minimum position in society. It's an incredibly

00:18:53.750 --> 00:18:56.750
powerful philosophical framework built entirely

00:18:56.750 --> 00:18:59.630
on the bones of risk aversion and game theory.

00:18:59.829 --> 00:19:01.609
So what does this all mean for you listening

00:19:01.609 --> 00:19:03.950
right now? It means that whether you are agonizing

00:19:03.950 --> 00:19:06.269
over a career change, whether you are casting

00:19:06.269 --> 00:19:09.069
a pragmatic ballot in an election, or, going

00:19:09.069 --> 00:19:10.910
all the way back to our first example, whether

00:19:10.910 --> 00:19:12.769
you are just packing an umbrella for a weekend

00:19:12.769 --> 00:19:15.869
trip, you are very likely intuitively applying

00:19:15.869 --> 00:19:18.450
minimax logic. You really are. You are ignoring

00:19:18.450 --> 00:19:20.930
the exact percentages, looking at the darkest

00:19:20.930 --> 00:19:23.690
cloud on the horizon, and simply protecting yourself

00:19:23.690 --> 00:19:25.789
against the worst possible timeline. It is a

00:19:25.789 --> 00:19:28.549
fundamental human survival mechanism, formalized

00:19:28.549 --> 00:19:31.089
into strict mathematics. We've covered an incredible

00:19:31.089 --> 00:19:33.500
amount of ground today. We started with the

00:19:33.500 --> 00:19:36.460
strict, unyielding math of row and column matrices.

00:19:37.079 --> 00:19:39.440
We saw how Deep Blue harnessed that paranoid

00:19:39.440 --> 00:19:42.900
logic, pruning dead branches and using mathematical

00:19:42.900 --> 00:19:45.599
intuition to calculate twelve plies into the

00:19:45.599 --> 00:19:48.440
future and defeat Garry Kasparov. And we traced

00:19:48.440 --> 00:19:51.180
that exact same logic all the way to John Rawls,

00:19:51.519 --> 00:19:54.460
designing a vision for a just society by rigorously

00:19:54.460 --> 00:19:56.359
looking out for the most disadvantaged among

00:19:56.359 --> 00:19:59.529
us. It really highlights how interconnected mathematics,

00:19:59.890 --> 00:20:02.890
technology, and human philosophy truly are. But

00:20:02.890 --> 00:20:04.769
it also leaves us with a lingering question.

00:20:05.160 --> 00:20:07.319
Something I've been thinking about as we unpack

00:20:07.319 --> 00:20:09.460
all of these worst-case scenarios. Okay, lay

00:20:09.460 --> 00:20:11.440
it on us. We've learned today that the minimax

00:20:11.440 --> 00:20:13.920
algorithm trains artificial intelligence to always

00:20:13.920 --> 00:20:16.220
assume the opponent will make the most punishing

00:20:16.220 --> 00:20:19.059
worst-case counter move. It literally programs

00:20:19.059 --> 00:20:21.420
the AI to expect the worst from the world in

00:20:21.420 --> 00:20:24.140
order to survive. So what happens in the near

00:20:24.140 --> 00:20:28.839
future when two incredibly powerful AIs, both

00:20:28.839 --> 00:20:32.160
running minimax-style logic, are forced to interact

00:20:32.160 --> 00:20:35.029
in the real world outside the strict confines

00:20:35.029 --> 00:20:37.829
and perfect rules of a chessboard? Will their

00:20:37.829 --> 00:20:40.109
mutual assumption of the worst-case scenario

00:20:40.109 --> 00:20:43.829
trap them in an endless cycle of extreme defensive

00:20:43.829 --> 00:20:46.960
paranoia? Oh, that is chilling. If both sides

00:20:46.960 --> 00:20:49.240
assume the other is out to destroy them, they might

00:20:49.240 --> 00:20:51.740
accidentally create the very conflict that they're

00:20:51.740 --> 00:20:53.819
both mathematically trying to avoid. Exactly.

00:20:54.059 --> 00:20:56.359
A self-fulfilling prophecy of destruction born

00:20:56.359 --> 00:20:58.740
entirely out of perfect defensive mathematics.

00:20:59.180 --> 00:21:00.579
Something to think about the next time you try

00:21:00.579 --> 00:21:02.640
to outsmart a machine. Or the next time you are

00:21:02.640 --> 00:21:04.420
sitting in a bunker staring at a radar screen

00:21:04.420 --> 00:21:06.160
wondering if the universe is out to get you.

00:21:06.640 --> 00:21:08.220
Thank you so much for joining us on this deep

00:21:08.220 --> 00:21:09.420
dive. We will see you next time.