WEBVTT

00:00:00.000 --> 00:00:02.859
You know, usually when we think about changing

00:00:02.859 --> 00:00:05.919
our minds, we view it as this kind of moment

00:00:05.919 --> 00:00:09.060
of defeat. Right, like a structural failure or

00:00:09.060 --> 00:00:11.019
something. Yeah, exactly. You build a belief

00:00:11.019 --> 00:00:13.919
system, a crack appears in the foundation, and

00:00:13.919 --> 00:00:15.960
you just assume the whole house is going to come

00:00:15.960 --> 00:00:18.399
down. And it feels incredibly destabilizing.

00:00:18.579 --> 00:00:20.739
I mean, human beings are fundamentally wired

00:00:20.739 --> 00:00:25.420
to treat our convictions as absolute destinations

00:00:25.420 --> 00:00:28.359
rather than provisional rest stops. But what

00:00:28.359 --> 00:00:30.859
if changing your mind wasn't a collapse at all?

00:00:31.199 --> 00:00:33.899
What if it was actually like a mathematical upgrade?

00:00:34.090 --> 00:00:36.570
Oh, I like that framing. Think of a really good

00:00:36.570 --> 00:00:39.609
detective working a complex case. A savvy detective

00:00:39.609 --> 00:00:42.229
doesn't just look at a new clue in a total vacuum,

00:00:42.390 --> 00:00:45.189
right? No, of course not. They take that brand

00:00:45.189 --> 00:00:47.310
new fingerprint or that fresh alibi and they

00:00:47.310 --> 00:00:48.649
weigh it against everything they already knew

00:00:48.649 --> 00:00:51.130
about the case. Exactly. They update their working

00:00:51.130 --> 00:00:53.850
theory. And today we're taking a deep dive

00:00:53.850 --> 00:00:56.149
into the very math of doing exactly that. Yes,

00:00:56.149 --> 00:00:58.850
we are. We've got this massive comprehensive

00:00:58.850 --> 00:01:01.409
Wikipedia article sitting in front of us. It

00:01:01.409 --> 00:01:04.469
covers the history, the dense mathematics, and

00:01:04.469 --> 00:01:07.069
the real-world applications of a concept called

00:01:07.069 --> 00:01:12.040
Bayesian inference, one of the most powerful

00:01:12.040 --> 00:01:14.640
mental models you can acquire. And our mission

00:01:14.640 --> 00:01:17.739
for you today is to basically demystify it. We

00:01:17.739 --> 00:01:20.379
want to translate this heavy statistical framework

00:01:20.379 --> 00:01:22.659
into a mental shortcut that you can actually

00:01:22.659 --> 00:01:25.000
use. Right, because the goal is to walk away

00:01:25.000 --> 00:01:27.579
from this deep dive as a sharper, more adaptable

00:01:27.579 --> 00:01:30.680
thinker, especially in a world where we are constantly

00:01:30.680 --> 00:01:33.680
bombarded with, you know, conflicting information.

00:01:33.780 --> 00:01:35.900
So to set the stage here, Bayesian inference

00:01:35.900 --> 00:01:39.099
isn't just some dry statistical formula to be

00:01:39.099 --> 00:01:41.340
memorized in a university lecture and then forgotten.

00:01:41.739 --> 00:01:44.319
No, not at all. It is a foundational framework

00:01:44.319 --> 00:01:46.799
for human reasoning. Its roots actually trace

00:01:46.799 --> 00:01:49.420
all the way back to a mathematician and minister

00:01:49.420 --> 00:01:52.739
named Thomas Bayes in the 1700s. The 1700s, yeah.

00:01:52.900 --> 00:01:55.140
Yeah, and it was later rigorously developed by

00:01:55.140 --> 00:01:57.599
Pierre-Simon Laplace, and what these two managed

00:01:57.599 --> 00:02:00.239
to do was build this unprecedented bridge. The

00:02:00.239 --> 00:02:02.299
bridge between what and what? Well, they connected

00:02:02.299 --> 00:02:05.420
pure objective mathematics with human subjective

00:02:05.420 --> 00:02:08.460
belief. They essentially wrote the formula for

00:02:08.460 --> 00:02:11.039
how to rationally update what you believe when

00:02:11.039 --> 00:02:13.759
new information finally arrives. To understand

00:02:13.759 --> 00:02:16.400
how that mathematical bridge is actually built

00:02:16.400 --> 00:02:18.560
and, you know, how it's currently being used

00:02:18.560 --> 00:02:21.780
to do wild things like map the cosmos or train

00:02:21.780 --> 00:02:24.740
AI, we first need to strip the complexity all

00:02:24.740 --> 00:02:27.199
the way down. We definitely do. So we need to

00:02:27.199 --> 00:02:29.280
look at a highly relatable thought experiment

00:02:29.280 --> 00:02:31.719
from our source material. It involves cookies.

00:02:32.340 --> 00:02:34.599
It is remarkable how often high-level mathematics

00:02:34.599 --> 00:02:37.580
relies on baked goods to make a point. It really

00:02:37.580 --> 00:02:39.719
grounds the abstract ideas. So I want you to

00:02:39.719 --> 00:02:41.840
picture two identical bowls of cookies sitting

00:02:41.840 --> 00:02:44.340
on a table in front of you. Let's call it Fred's

00:02:44.340 --> 00:02:46.240
Cookie Bowls. OK, I'm picturing it. Two bowls.

00:02:46.719 --> 00:02:48.780
Right. Bowl number one has 10 chocolate chip

00:02:48.780 --> 00:02:52.560
cookies and 30 plain cookies. So 40 total, but

00:02:52.560 --> 00:02:55.319
mostly plain. Got it. Bowl number two has 20

00:02:55.319 --> 00:02:58.360
chocolate chip and 20 plain. A perfect 50-50

00:02:58.360 --> 00:03:01.280
split. The setup is clear. Two bowls, different

00:03:01.280 --> 00:03:04.919
ratios. Now, our friend Fred blindfolds you,

00:03:05.039 --> 00:03:06.939
shuffles the two bowls around on the table so

00:03:06.939 --> 00:03:09.180
you totally lose track of them, and asks you

00:03:09.180 --> 00:03:12.439
to pick a bowl at random. Okay. Then, from that

00:03:12.439 --> 00:03:15.800
random bowl, you reach in and pick a cookie at

00:03:15.800 --> 00:03:18.120
random. You take off the blindfold and look at

00:03:18.120 --> 00:03:21.379
your hand. It is a plain cookie. Right, a plain

00:03:21.379 --> 00:03:24.319
cookie. The question Bayesian inference asks is this:

00:03:25.080 --> 00:03:27.840
what is the probability that you just pulled

00:03:27.840 --> 00:03:30.840
that plain cookie from bowl number one? And this

00:03:30.840 --> 00:03:33.439
is where human intuition usually jumps in, far

00:03:33.439 --> 00:03:36.280
ahead of the actual math. Oh, totally. Our intuition

00:03:36.280 --> 00:03:38.360
immediately screams that it's definitely more

00:03:38.360 --> 00:03:40.979
than a 50 % chance, because bowl number one has

00:03:40.979 --> 00:03:42.860
way more plain cookies in it. Right. You're holding

00:03:42.860 --> 00:03:45.460
a plain cookie so it feels obvious. But Bayesian

00:03:45.460 --> 00:03:48.520
inference gives us the precise language and the

00:03:48.520 --> 00:03:50.659
precise mathematical engine to actually prove

00:03:50.659 --> 00:03:53.819
it. So let's lay out the formal vocabulary. First,

00:03:53.919 --> 00:03:55.879
we have what is called the prior. The prior,

00:03:56.020 --> 00:03:57.719
yes. Before you even looked at the cookie in

00:03:57.719 --> 00:04:00.240
your hand, what was your chance of picking bowl

00:04:00.240 --> 00:04:03.300
number one? It was 50-50. You picked a bowl

00:04:03.300 --> 00:04:06.400
entirely at random while blindfolded. Exactly.

00:04:06.740 --> 00:04:09.319
The prior probability is your baseline estimate

00:04:09.319 --> 00:04:12.280
of a hypothesis before any new evidence is observed.

00:04:12.580 --> 00:04:14.699
It's just your starting point. Then we introduce

00:04:14.699 --> 00:04:17.360
the likelihood. This measures the compatibility

00:04:17.360 --> 00:04:20.019
of the new evidence with the hypothesis. Right,

00:04:20.139 --> 00:04:21.879
so the likelihood of pulling a plain cookie from

00:04:21.879 --> 00:04:25.560
bowl 1 is 30 out of 40, which is 75%. And the

00:04:25.560 --> 00:04:27.040
likelihood of pulling a plain cookie from bowl

00:04:27.040 --> 00:04:31.300
2 is 20 out of 40, or 50%. Finally, we take those

00:04:31.300 --> 00:04:34.180
numbers and calculate the posterior. The posterior.

00:04:34.439 --> 00:04:37.360
That is your updated mathematically sound belief

00:04:37.360 --> 00:04:39.759
after incorporating the new evidence. Yeah. When

00:04:39.759 --> 00:04:42.759
you plug your 50 % prior and those likelihoods

00:04:42.759 --> 00:04:45.399
into Bayes' theorem, the posterior calculates

00:04:45.399 --> 00:04:48.759
out to exactly 60%. There's a 60 % chance you

00:04:48.759 --> 00:04:50.930
picked from bowl number one. Synthesizing that

00:04:50.930 --> 00:04:53.610
into the core formula basically looks like this.

00:04:54.050 --> 00:04:56.250
The posterior is proportional to the likelihood

00:04:56.250 --> 00:04:59.089
multiplied by the prior. Or, formulated another

00:04:59.089 --> 00:05:01.990
way, posterior equals likelihood times prior

00:05:01.990 --> 00:05:04.970
divided by the total evidence. Exactly. And the

00:05:04.970 --> 00:05:06.829
reason this matters for you listening to this

00:05:06.829 --> 00:05:09.930
right now is because it mathematically prevents

00:05:09.930 --> 00:05:12.970
the human tendency to overreact to new data.

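[Editor's note: for anyone following this transcript with a notebook, the cookie numbers drop straight into Bayes' theorem. This is a minimal sketch of the calculation described above, not code from the source.]

```python
# Bayes' theorem for Fred's cookie bowls:
# posterior = likelihood * prior / total evidence

prior_bowl1 = 0.5            # you picked a bowl at random while blindfolded
prior_bowl2 = 0.5

like_plain_bowl1 = 30 / 40   # 75% of bowl 1 is plain
like_plain_bowl2 = 20 / 40   # 50% of bowl 2 is plain

# Total evidence: the overall chance of drawing a plain cookie at all
evidence = like_plain_bowl1 * prior_bowl1 + like_plain_bowl2 * prior_bowl2

posterior_bowl1 = like_plain_bowl1 * prior_bowl1 / evidence
print(posterior_bowl1)  # 0.6: a 60% chance it was bowl number one
```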
00:05:13.100 --> 00:05:15.660
Oh, that makes sense. It forces us to anchor

00:05:15.660 --> 00:05:18.240
our new observations to our existing knowledge

00:05:18.240 --> 00:05:21.120
base. So because you saw a plain cookie, your

00:05:21.120 --> 00:05:24.079
brain might want to jump to being like 80 or

00:05:24.079 --> 00:05:26.819
90 percent sure it was bowl one. Right. But the

00:05:26.819 --> 00:05:29.660
math restrains you to 60 percent because it forces

00:05:29.660 --> 00:05:32.000
you to remember that your initial chance of picking

00:05:32.000 --> 00:05:34.980
bowl one was only a coin toss. It's like a

00:05:34.980 --> 00:05:37.279
built-in shock absorber for your opinions. It's a

00:05:37.279 --> 00:05:39.350
great way to put it. But that brings up a really

00:05:39.350 --> 00:05:41.990
fascinating friction point. Because the posterior

00:05:41.990 --> 00:05:44.670
belief, your final conclusion, relies so

00:05:44.670 --> 00:05:47.850
heavily on your initial prior belief. What happens

00:05:47.850 --> 00:05:50.730
if your starting belief is just incredibly stubborn?

00:05:51.230 --> 00:05:53.810
Ah, yeah. That leads us straight into a vital

00:05:53.810 --> 00:05:56.269
mathematical principle known as Cromwell's Rule.

00:05:56.490 --> 00:05:58.709
Right, Cromwell's Rule. It states that if your

00:05:58.709 --> 00:06:01.610
prior probability for a model, your initial foundational

00:06:01.610 --> 00:06:05.910
belief, is exactly zero or exactly one, then

00:06:05.910 --> 00:06:07.870
absolutely no amount of new evidence can ever

00:06:07.870 --> 00:06:11.230
change your mind. And the reason is purely mechanical.

00:06:11.949 --> 00:06:15.120
Mathematically, multiplying any number by zero

00:06:15.120 --> 00:06:18.019
always yields zero. Right, it's just basic multiplication.

00:06:18.220 --> 00:06:22.240
Exactly. If you assign a prior of zero to a hypothesis,

00:06:22.579 --> 00:06:24.699
meaning you believe it is utterly impossible,

00:06:25.319 --> 00:06:28.240
it doesn't matter if the new evidence has a massive

00:06:28.240 --> 00:06:31.699
undeniable likelihood. The posterior will remain

00:06:31.699 --> 00:06:34.199
zero. You are multiplying the new evidence by

00:06:34.199 --> 00:06:36.500
zero. Let me play devil's advocate here, though,

00:06:36.519 --> 00:06:38.259
because I think a lot of people might push back

00:06:38.259 --> 00:06:40.899
on that being a bad thing. We often admire people

00:06:40.899 --> 00:06:43.259
with strong, unwavering convictions, right? We

00:06:43.259 --> 00:06:46.259
do, yeah. Being 100 % sure of your principles

00:06:46.259 --> 00:06:48.800
is frequently framed as a strong leadership trait.

00:06:49.519 --> 00:06:52.060
Socially, unwavering conviction might be rewarded.

00:06:52.939 --> 00:06:55.480
But mathematically, it's a dead end. It is a

00:06:55.480 --> 00:06:58.420
complete cognitive trap. Think of a heavy locked

00:06:58.420 --> 00:07:01.430
door. If you approach that door and you're 100

00:07:01.430 --> 00:07:03.970
% certain that it is locked, you won't even bother

00:07:03.970 --> 00:07:06.089
to try the handle. No, why would you? Right.

00:07:06.470 --> 00:07:07.970
Even if someone is standing right next to you,

00:07:08.110 --> 00:07:10.310
holding a key, explicitly telling you, hey, I

00:07:10.310 --> 00:07:13.209
just unlocked this, your absolute certainty prevents

00:07:13.209 --> 00:07:15.209
you from verifying the new evidence. You just

00:07:15.209 --> 00:07:17.430
turn around and walk away. That perfectly illustrates

00:07:17.430 --> 00:07:20.189
the danger. If we pull back to look at the larger

00:07:20.189 --> 00:07:22.529
implications, Cromwell's rule mathematically

00:07:22.529 --> 00:07:26.189
reveals why hard fundamentalist convictions are

00:07:26.189 --> 00:07:29.509
entirely insensitive to counter -evidence. Wow.

00:07:29.930 --> 00:07:32.269
Absolute certainty is the enemy of discovery.

00:07:32.550 --> 00:07:35.410
Yes. If you want to be a rational, adaptable

00:07:35.410 --> 00:07:38.839
thinker, a true Bayesian, you must always leave a

00:07:38.839 --> 00:07:41.480
tiny fraction of a percentage open to being wrong.

00:07:41.639 --> 00:07:46.420
Even just a sliver. Even if your prior is 0.0001%,

00:07:46.420 --> 00:07:49.199
that sliver of doubt is required. If you don't

00:07:49.199 --> 00:07:51.660
leave that sliver, the entire mathematical engine

00:07:51.660 --> 00:07:53.839
of learning breaks down. You literally cannot

00:07:53.839 --> 00:07:56.720
update a belief if you start at exactly 100 %

00:07:56.720 --> 00:07:59.300
or exactly 0%. It's mechanically impossible to

00:07:59.300 --> 00:08:01.399
learn anything new if you already know everything.

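[Editor's note: Cromwell's Rule is easy to see in code. The sketch below runs one Bayes update for a binary hypothesis; the 0.99/0.01 evidence strengths are invented for illustration.]

```python
def bayes_update(prior, like_if_true, like_if_false):
    """One Bayes update for a yes/no hypothesis ('the door is unlocked')."""
    evidence = like_if_true * prior + like_if_false * (1 - prior)
    return like_if_true * prior / evidence

# Overwhelming evidence that the hypothesis is true...
strong = dict(like_if_true=0.99, like_if_false=0.01)

print(bayes_update(0.0001, **strong))  # a tiny prior still moves (~0.0098, and climbing)
print(bayes_update(0.0,    **strong))  # a zero prior stays at 0.0, forever
```

Repeated updates keep multiplying the odds, so even a 0.0001% prior eventually climbs toward certainty; the zero prior never can, because everything is multiplied by zero.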
00:08:01.399 --> 00:08:04.160
Precisely. So, we've seen how a single piece

00:08:04.160 --> 00:08:07.160
of evidence, like one cookie, or one locked door,

00:08:07.579 --> 00:08:10.620
updates a single belief. But real life is rarely

00:08:10.620 --> 00:08:13.519
that clean, right? Very rarely. How does this

00:08:13.519 --> 00:08:16.300
scale up? What does Bayesian inference look like

00:08:16.300 --> 00:08:18.759
when we are accumulating evidence over a long

00:08:18.759 --> 00:08:22.319
period of time in deeply complex scenarios? To

00:08:22.319 --> 00:08:24.920
see that scaling effect, we can examine the medieval

00:08:24.920 --> 00:08:27.620
archaeology example from our source text. It

00:08:27.620 --> 00:08:30.540
beautifully demonstrates how Bayes' theorem handles

00:08:30.540 --> 00:08:33.860
sequential compounding data. Oh, I loved this

00:08:33.860 --> 00:08:36.730
part. Imagine an archaeologist is digging at

00:08:36.730 --> 00:08:39.549
a newly discovered site. They know from the general

00:08:39.549 --> 00:08:41.450
region that the site is from the medieval period,

00:08:41.809 --> 00:08:43.669
somewhere between the 11th and 16th centuries.

00:08:44.049 --> 00:08:46.730
But they have no idea exactly when, in that 500

00:08:46.730 --> 00:08:49.110
-year span, the site was actually inhabited.

00:08:49.450 --> 00:08:52.230
Right. So as they begin to excavate, they find

00:08:52.230 --> 00:08:55.360
50 fragments of broken pottery. Some of the pottery

00:08:55.360 --> 00:08:58.820
is glazed and some is decorated. And the archaeologist

00:08:58.820 --> 00:09:00.940
has historical data to compare this against.

00:09:01.460 --> 00:09:03.519
They know that if a site is from the early medieval

00:09:03.519 --> 00:09:06.799
period, say the 11th century, only about 1 %

00:09:06.799 --> 00:09:08.779
of the pottery would typically be glazed and

00:09:08.779 --> 00:09:11.559
50 % would be decorated. Okay, so mostly decorated,

00:09:11.720 --> 00:09:14.600
barely any glaze. Exactly, but if it's from the

00:09:14.600 --> 00:09:18.019
late medieval period, the 16th century, the manufacturing

00:09:18.019 --> 00:09:22.139
techniques changed. By then, 81 % would be glazed

00:09:22.139 --> 00:09:25.669
and only 5 % decorated. So the evidence is trickling

00:09:25.669 --> 00:09:29.190
in one single fragment at a time. How does the

00:09:29.190 --> 00:09:31.909
Bayesian math actually process a slow trickle

00:09:31.909 --> 00:09:34.710
of clues? It starts with setting what statisticians

00:09:34.710 --> 00:09:37.889
call a uniform prior. Meaning they don't have

00:09:37.889 --> 00:09:40.620
a favorite century yet. Right. Because the archaeologist

00:09:40.620 --> 00:09:43.799
has no initial reason to favor one specific century

00:09:43.799 --> 00:09:46.740
over another, they assign a 20 % probability

00:09:46.740 --> 00:09:49.840
to each of the five centuries, an equal unbiased

00:09:49.840 --> 00:09:52.620
spread. OK, fair enough. Then they dig up the

00:09:52.620 --> 00:09:54.259
first pottery fragment. Let's say it's highly

00:09:54.259 --> 00:09:57.220
glazed. They apply Bayes' theorem and the probabilities

00:09:57.220 --> 00:09:59.460
shift. The likelihood of the 16th century goes

00:09:59.460 --> 00:10:02.019
up a bit, and the likelihood of the 11th century

00:10:02.019 --> 00:10:04.759
drops. Because 16th century pots were way more

00:10:04.759 --> 00:10:07.820
likely to be glazed. Yes. But here is the critical

00:10:07.820 --> 00:10:11.049
mechanism. That new updated probability, the

00:10:11.049 --> 00:10:13.750
posterior, now becomes the prior for the second

00:10:13.750 --> 00:10:16.350
fragment. Today's posterior is tomorrow's prior.

00:10:16.789 --> 00:10:18.690
You just carry the updated math forward to the

00:10:18.690 --> 00:10:20.990
next clue. That's the compounding power of it.

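[Editor's note: the "posterior becomes the prior" loop can be sketched in a few lines. Only the 1% (11th century) and 81% (16th century) glazed rates come from the example; the rates for the centuries in between are invented here purely to make the sketch runnable.]

```python
# Glazed-pottery rates per century; middle values are illustrative placeholders.
glazed_rate = {"11th": 0.01, "12th": 0.05, "13th": 0.20,
               "14th": 0.50, "15th": 0.70, "16th": 0.81}

# Uniform prior: no century favored before any digging.
belief = {c: 1 / len(glazed_rate) for c in glazed_rate}

def update(prior, fragment_is_glazed):
    """One Bayes update; the returned posterior becomes the next prior."""
    like = {c: (r if fragment_is_glazed else 1 - r)
            for c, r in glazed_rate.items()}
    evidence = sum(like[c] * prior[c] for c in prior)
    return {c: like[c] * prior[c] / evidence for c in prior}

for fragment_is_glazed in [True, True, False, True]:  # four dug-up fragments
    belief = update(belief, fragment_is_glazed)       # posterior -> new prior

print(max(belief, key=belief.get))  # glazed-heavy finds favor the later centuries
```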
00:10:21.330 --> 00:10:24.690
This process repeats 50 times. Every fragment

00:10:24.690 --> 00:10:27.129
updates the belief. And by the time the 50th

00:10:27.129 --> 00:10:28.970
fragment is pulled from the dirt and analyzed,

00:10:29.549 --> 00:10:32.009
the math has converged to a highly specific,

00:10:32.250 --> 00:10:34.899
dominant conclusion. It has. The calculations

00:10:34.899 --> 00:10:38.340
reveal a 63 % chance the site is from the 14th

00:10:38.340 --> 00:10:41.639
century. A 36 % chance it's from the 15th century,

00:10:41.840 --> 00:10:44.419
and practically zero chance it's from the 11th

00:10:44.419 --> 00:10:47.000
or 12th. And there is a mathematical guarantee

00:10:47.000 --> 00:10:50.179
backing this process up, known as the Bernstein-

00:10:50.179 --> 00:10:53.419
von Mises theorem. Yes, quite a mouthful. Without

00:10:53.419 --> 00:10:56.019
getting bogged down in the dense academic terminology,

00:10:56.500 --> 00:10:58.940
what this theorem essentially proves is that

00:10:58.940 --> 00:11:01.779
if you collect enough independent data, your

00:11:01.779 --> 00:11:03.840
final conclusion will eventually converge on

00:11:03.840 --> 00:11:06.299
the truth, regardless of what your initial prior

00:11:06.299 --> 00:11:09.259
was. The sheer volume of incoming data will eventually

00:11:09.259 --> 00:11:11.740
wash away the initial bias. That is so cool.

00:11:11.879 --> 00:11:14.159
It is. If our archaeologists had started out

00:11:14.159 --> 00:11:16.460
completely stubbornly convinced that the site

00:11:16.460 --> 00:11:18.679
was from the 11th century, assigning it a 90

00:11:18.679 --> 00:11:21.879
% prior, those 50 fragments of highly glazed

00:11:21.879 --> 00:11:23.759
pottery would have relentlessly dragged the math

00:11:23.759 --> 00:11:26.279
back to the 14th century anyway. The data always

00:11:26.279 --> 00:11:28.259
wins in the end, as long as you keep updating.

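[Editor's note: the "data washes out the prior" claim is easy to check numerically. This self-contained sketch simplifies to just two competing centuries, uses the quoted 1% vs 81% glazed rates, and pretends every one of the 50 fragments came up glazed.]

```python
# A stubborn 90% prior on the 11th century versus 50 glazed fragments.
p_early = 0.9  # stubbornly convinced the site is early medieval

for _ in range(50):                    # one fragment at a time
    num = 0.01 * p_early               # P(glazed | 11th c.) * prior
    den = num + 0.81 * (1 - p_early)   # total evidence
    p_early = num / den                # today's posterior, tomorrow's prior

print(p_early)  # vanishingly small: the data dragged the belief away
```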
00:11:29.379 --> 00:11:31.059
But hold on. Let me push back on the premise

00:11:31.059 --> 00:11:33.799
here. Sure. If the initial prior can just be

00:11:33.799 --> 00:11:36.259
whatever the researcher wants it to be, a uniform

00:11:36.259 --> 00:11:40.509
prior, or just a subjective expert opinion. How

00:11:40.509 --> 00:11:44.629
can Bayesian inference be considered hard objective

00:11:44.629 --> 00:11:47.409
science? That is the million dollar question.

00:11:47.590 --> 00:11:49.730
Aren't we basically letting personal bias slip

00:11:49.730 --> 00:11:51.450
through the back door of the scientific method?

00:11:51.799 --> 00:11:55.480
That exact critique completely fractured the

00:11:55.480 --> 00:11:57.879
statistical world in the 20th century. After

00:11:57.879 --> 00:12:01.460
the 1920s, a rival school of thought called frequentist

00:12:01.460 --> 00:12:04.700
statistics actually dominated the field. Frequentist

00:12:04.700 --> 00:12:07.299
statistics. Right. Frequentists rely entirely

00:12:07.299 --> 00:12:09.980
on objective sampling data. They detest the concept

00:12:09.980 --> 00:12:12.220
of a prior because they view it as inherently

00:12:12.220 --> 00:12:15.019
unscientific and subjective. They only want to

00:12:15.019 --> 00:12:17.059
look at the frequency of events as they appear

00:12:17.059 --> 00:12:19.299
in the data itself. To put an analogy on it,

00:12:19.379 --> 00:12:21.220
a frequentist is like someone trying to predict

00:12:21.230 --> 00:12:23.529
if it will rain tomorrow by pulling up 50 years

00:12:23.529 --> 00:12:26.450
of almanacs, seeing that it rained on 12 % of

00:12:26.450 --> 00:12:29.029
this specific calendar date historically, and

00:12:29.029 --> 00:12:31.549
concluding the chance of rain is exactly 12%.

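[Editor's note: the weather-watcher's update is cleanest in odds form. The 12% base rate is from the analogy above; the likelihood ratio is invented for illustration, and just says clouds like these are about 66 times more common on rainy days than dry ones.]

```python
# Bayesian weather-watcher: almanac base rate, updated by the thunderclouds.
prior = 0.12
likelihood_ratio = 66  # assumed: P(clouds | rain) / P(clouds | no rain)

prior_odds = prior / (1 - prior)
posterior_odds = prior_odds * likelihood_ratio
posterior = posterior_odds / (1 + posterior_odds)

print(round(posterior, 2))  # ~0.9: the 12% prior becomes a ~90% posterior
```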
00:12:31.549 --> 00:12:33.769
Because they only look at the historical frequency.

00:12:33.990 --> 00:12:35.789
Right. A Bayesian is the person who looks at

00:12:35.789 --> 00:12:38.929
that same almanac, but then walks outside, sees

00:12:38.929 --> 00:12:41.409
massive black thunder clouds rolling over the

00:12:41.409 --> 00:12:45.370
horizon, and says, I'm updating that 12 % prior

00:12:45.370 --> 00:12:48.529
to a 90 % posterior because of the new evidence

00:12:48.529 --> 00:12:51.080
right in front of me. They incorporate the present

00:12:51.080 --> 00:12:53.820
state into the historical data. Exactly. That

00:12:53.820 --> 00:12:56.440
is a highly effective way to separate the two.

00:12:57.000 --> 00:12:59.259
And the Bayesian approach, relying on inverse

00:12:59.259 --> 00:13:02.120
probability to infer backwards from effects to

00:13:02.120 --> 00:13:05.659
causes, using a subjective belief, it absolutely

00:13:05.659 --> 00:13:08.799
infuriated many traditional scientists. Really?

00:13:08.899 --> 00:13:11.580
Like who? Well, the famous philosopher of science,

00:13:11.779 --> 00:13:15.039
Karl Popper, explicitly rejected Bayesian rationalism.

00:13:15.360 --> 00:13:17.679
Popper argued that using Bayes' rule creates

00:13:17.679 --> 00:13:20.559
a vicious circle. How so? He said it presupposes

00:13:20.559 --> 00:13:23.159
what it attempts to justify because you are baking

00:13:23.159 --> 00:13:26.299
your own beliefs into the starting formula. So

00:13:26.299 --> 00:13:28.159
Popper is essentially saying you can't use your

00:13:28.159 --> 00:13:30.360
own opinion as part of the math to prove your

00:13:30.360 --> 00:13:32.659
own opinion. Exactly. Which sounds like a completely

00:13:32.659 --> 00:13:35.080
fair critique. Okay. How did the Bayesian camp

00:13:35.080 --> 00:13:37.700
defend against that? The defense relies on radical

00:13:37.700 --> 00:13:41.049
transparency. Bayesians argue that everyone has

00:13:41.049 --> 00:13:43.090
biases and assumptions when they look at data.

00:13:43.309 --> 00:13:45.649
That's true. The frequentist approach often hides

00:13:45.649 --> 00:13:48.070
those assumptions behind complex sampling models

00:13:48.070 --> 00:13:51.169
and arbitrary confidence intervals. The Bayesian

00:13:51.169 --> 00:13:54.110
approach, conversely, forces you to explicitly

00:13:54.110 --> 00:13:57.470
state your bias out loud as a mathematical prior.

00:13:57.690 --> 00:13:59.570
Oh, I see. You have to write down exactly what

00:13:59.570 --> 00:14:01.309
your assumptions are before you start calculating.

00:14:01.590 --> 00:14:04.029
Precisely. Furthermore, a statistician named

00:14:04.029 --> 00:14:07.110
Abraham Wald later proved mathematically that

00:14:07.110 --> 00:14:09.950
every admissible statistical decision procedure

00:14:09.950 --> 00:14:13.009
is, at its core, either a Bayesian procedure

00:14:13.009 --> 00:14:16.169
or a limit of one. Wait, really? Yeah, it turns

00:14:16.169 --> 00:14:18.950
out to be the unavoidable foundational math of

00:14:18.950 --> 00:14:21.409
making decisions under uncertainty. So instead

00:14:21.409 --> 00:14:23.730
of pretending we don't have biases, we put a

00:14:23.730 --> 00:14:25.470
hard number on them, put them out in the open,

00:14:25.789 --> 00:14:27.870
and let the incoming data correct us over time.

00:14:27.960 --> 00:14:30.799
Exactly. And despite those massive philosophical

00:14:30.799 --> 00:14:33.379
debates, the Bayesian approach came roaring back

00:14:33.379 --> 00:14:36.679
to dominance in the 1980s. It experienced a massive

00:14:36.679 --> 00:14:39.679
resurgence, largely thanks to the rise of powerful

00:14:39.679 --> 00:14:43.000
computers. Right. Yes. Bayesian math can get

00:14:43.000 --> 00:14:45.960
computationally overwhelming very quickly. But

00:14:45.960 --> 00:14:48.500
with the invention of Markov chain Monte Carlo

00:14:48.500 --> 00:14:52.159
methods, the real-world applications exploded.

00:14:52.620 --> 00:14:55.039
We see that term Markov chain Monte Carlo or

00:14:55.039 --> 00:14:57.720
MCMC all over the source material. But how do

00:14:57.720 --> 00:15:00.559
we actually compute this when the math gets insane?

00:15:01.000 --> 00:15:03.820
How does MCMC actually work? Think of it like

00:15:03.820 --> 00:15:06.899
a blindfolded hiker trying to find the highest

00:15:06.899 --> 00:15:09.759
peak in a massive sprawling mountain range. Okay,

00:15:09.960 --> 00:15:12.039
blindfolded hiker. The hiker can't see the whole

00:15:12.039 --> 00:15:14.700
map to calculate where the peak is, but they

00:15:14.700 --> 00:15:17.179
can take a step, feel if the ground is sloping

00:15:17.179 --> 00:15:19.940
up or down, and make a decision to stay or move

00:15:19.940 --> 00:15:23.100
higher. Just feeling their way up. Right. Markov

00:15:23.100 --> 00:15:25.480
chain Monte Carlo algorithms do something similar

00:15:25.480 --> 00:15:28.039
in a landscape of complex probabilities. They

00:15:28.039 --> 00:15:30.480
wander around, taking steps, testing the math

00:15:30.480 --> 00:15:32.710
locally, and eventually they map out the shape

00:15:32.710 --> 00:15:34.830
of the mountain finding the highest probability

00:15:34.830 --> 00:15:37.610
without needing to calculate every single inch

00:15:37.610 --> 00:15:40.269
of the infinite terrain. That computational power

00:15:40.269 --> 00:15:42.750
brings us to where this subjective math is actually

00:15:42.750 --> 00:15:44.990
operating in your daily life right now. Yeah.

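[Editor's note: the blindfolded-hiker analogy corresponds to a Metropolis-style MCMC sampler. This toy sketch explores an invented one-dimensional "mountain" (an unnormalized bell curve centered at 3) purely for illustration.]

```python
import random, math

def height(x):
    # Unnormalized target density: the "terrain" the hiker is feeling out.
    return math.exp(-(x - 3) ** 2)

random.seed(0)
x, samples = 0.0, []
for _ in range(20000):
    step = x + random.uniform(-1, 1)              # feel around with one step
    if random.random() < height(step) / height(x):
        x = step  # move uphill, or occasionally downhill, which keeps exploring
    samples.append(x)

print(sum(samples) / len(samples))  # close to 3, found without ever seeing the map
```

The occasional downhill move is what lets the walk map the whole probability landscape instead of getting stuck on the first local bump.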
00:15:45.169 --> 00:15:47.629
Let's trace this from your inbox to the literal

00:15:47.629 --> 00:15:49.690
edge of the universe. Let's do it. If you've

00:15:49.690 --> 00:15:52.110
ever checked your email spam folder, you are

00:15:52.110 --> 00:15:55.429
looking at Bayesian inference at work. Yes, specifically

00:15:55.429 --> 00:15:58.840
naive Bayes classifiers. The software starts

00:15:58.840 --> 00:16:00.940
with a prior belief about what a normal email

00:16:00.940 --> 00:16:03.519
looks like. Then it looks at the new evidence,

00:16:03.940 --> 00:16:06.100
the specific words in the incoming email. If

00:16:06.100 --> 00:16:08.860
it sees the word lottery or Prince, the algorithm

00:16:08.860 --> 00:16:10.860
notes that the likelihood of those words appearing

00:16:10.860 --> 00:16:14.080
in a legitimate email is very low, but incredibly

00:16:14.080 --> 00:16:17.340
high for spam. The posterior probability updates

00:16:17.340 --> 00:16:20.559
and the email is aggressively filtered out. The

00:16:20.559 --> 00:16:23.059
algorithm learns over time based on past data.

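[Editor's note: a toy naive Bayes spam score, in the spirit described above. The per-word probabilities are hand-invented; a real filter estimates them from thousands of human-labeled messages.]

```python
# word: (P(word | spam), P(word | legitimate)) -- assumed values
word_probs = {
    "lottery": (0.20, 0.001),
    "prince":  (0.10, 0.002),
    "meeting": (0.01, 0.10),
}
prior_spam = 0.5  # start undecided

def spam_probability(words):
    odds = prior_spam / (1 - prior_spam)
    for w in words:
        if w in word_probs:
            p_spam, p_ham = word_probs[w]
            odds *= p_spam / p_ham   # each known word nudges the odds
    return odds / (1 + odds)

print(spam_probability(["lottery", "prince"]))  # very close to 1 -> filtered
print(spam_probability(["meeting"]))            # low -> kept
```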
00:16:23.399 --> 00:16:25.899
But how does a computer which only understands

00:16:25.899 --> 00:16:30.039
ones and zeros possess a subjective bias? It's

00:16:30.039 --> 00:16:32.860
because the programmers initially feed it a massive

00:16:32.860 --> 00:16:35.620
data set of human-labeled spam to establish

00:16:35.620 --> 00:16:38.289
its prior. Right. Humans give it the bias. From

00:16:38.289 --> 00:16:41.629
tech, we jump to cosmology. Scientists use Bayesian

00:16:41.629 --> 00:16:43.970
inference to literally map the universe. They

00:16:43.970 --> 00:16:46.529
do. Our sources point out that researchers use

00:16:46.529 --> 00:16:49.750
Bayesian model comparison on the cosmic microwave

00:16:49.750 --> 00:16:52.950
background data, the Planck 2018 data, to fit

00:16:52.950 --> 00:16:54.970
the parameters of the standard model of the Big

00:16:54.970 --> 00:16:57.909
Bang, the Lambda CDM model. Cosmology is the

00:16:57.909 --> 00:17:00.070
perfect application for Bayes' theorem because

00:17:00.070 --> 00:17:03.029
of a fundamental physical limitation. We only

00:17:03.029 --> 00:17:05.549
have one universe. Yeah, we can't exactly run

00:17:05.549 --> 00:17:08.609
a control group on the big bang. Exactly. A frequentist

00:17:08.609 --> 00:17:11.109
would prefer to sample multiple universes to

00:17:11.109 --> 00:17:13.829
find an average frequency of how big bangs usually

00:17:13.829 --> 00:17:16.869
play out. But we can't do that. We only have

00:17:16.869 --> 00:17:20.170
one set of cosmic microwave background data.

00:17:20.230 --> 00:17:23.470
So what do they do? Cosmologists set prior parameters

00:17:23.470 --> 00:17:26.569
based on existing physics, run the data through

00:17:26.569 --> 00:17:28.890
those Markov chain Monte Carlo simulations we

00:17:28.890 --> 00:17:31.349
just talked about, and update their beliefs about

00:17:31.349 --> 00:17:34.029
the fundamental structure of reality. So it filters

00:17:34.029 --> 00:17:36.349
our spam and it measures the Big Bang. But here

00:17:36.349 --> 00:17:38.809
is where this beautiful mathematical framework

00:17:38.809 --> 00:17:42.619
hits a very messy, very human limit. The courtroom.

00:17:42.880 --> 00:17:45.339
Oh, this is a fascinating example in the UK.

00:17:45.519 --> 00:17:48.490
There was a legal case known as R v Adams. The

00:17:48.490 --> 00:17:51.130
defense actually brought in an expert to explain

00:17:51.130 --> 00:17:53.529
Bayes' theorem to the jury, hoping to help them

00:17:53.529 --> 00:17:55.690
combine different pieces of conflicting evidence

00:17:55.690 --> 00:17:57.609
mathematically. And the court's reaction to that

00:17:57.609 --> 00:18:00.269
was aggressively hostile. Really hostile. The

00:18:00.269 --> 00:18:02.589
Court of Appeal ultimately ruled that introducing

00:18:02.589 --> 00:18:05.910
Bayes' theorem plunges the jury into inappropriate

00:18:05.910 --> 00:18:08.930
and unnecessary realms of theory and complexity,

00:18:09.710 --> 00:18:11.849
deflecting them from their proper task. It really

00:18:11.849 --> 00:18:14.769
makes you wonder. Are human brains just not wired

00:18:14.769 --> 00:18:18.210
for this type of rigorous explicit accumulation

00:18:18.210 --> 00:18:21.829
of evidence? It's tough. We naturally crave cohesive

00:18:21.829 --> 00:18:25.009
stories and emotional narratives, not probability

00:18:25.009 --> 00:18:28.309
matrices. There is a profound friction between

00:18:28.309 --> 00:18:31.769
mathematical logic and human nature. The source

00:18:31.769 --> 00:18:34.289
text highlights a fascinating counter-argument

00:18:34.289 --> 00:18:36.849
regarding this friction by a theorist named

00:18:34.289 --> 00:18:36.849
Gardner-Medwin. What was his argument? He argues that juries

00:18:39.950 --> 00:18:41.809
shouldn't just be calculating the probability

00:18:41.809 --> 00:18:43.750
of guilt, they should be looking at the probability

00:18:43.750 --> 00:18:45.970
of the evidence given that the defendant is innocent.
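
Gardner-Medwin's alternative question can be sketched with made-up numbers. The point is that asking "how likely is this evidence if the defendant is innocent?" requires no prior probability of guilt at all; the 1-in-10,000 match rate below is purely hypothetical, chosen only to illustrate the shape of the calculation.

```python
# Gardner-Medwin's framing (all numbers hypothetical): the jury
# evaluates how probable the evidence is under innocence, rather
# than computing a posterior probability of guilt.

# Suppose the forensic test matches an innocent person by chance
# 1 time in 10,000, and always matches the true culprit.
p_evidence_given_innocent = 1 / 10_000
p_evidence_given_guilty = 1.0

# The likelihood ratio measures how strongly the evidence favors
# guilt, but it is a statement about the evidence, not the person:
# no prior probability of guilt appears anywhere.
likelihood_ratio = p_evidence_given_guilty / p_evidence_given_innocent
print(f"P(evidence | innocent) = {p_evidence_given_innocent:.4%}")
print(f"likelihood ratio = {likelihood_ratio:,.0f}")
```

Note that this sidesteps the prior entirely; the jury still has to decide what to do with the ratio, which is Gardner-Medwin's point.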

00:18:46.150 --> 00:18:47.849
Okay, let's break down what that actually means

00:18:47.849 --> 00:18:50.410
for a regular person sitting on a jury. If you

00:18:50.410 --> 00:18:53.170
are going to use Bayes' theorem to compute a

00:18:53.170 --> 00:18:55.910
final posterior probability of someone's guilt,

00:18:56.569 --> 00:18:58.950
the math demands that you start with a prior

00:18:58.950 --> 00:19:02.089
probability of guilt. But what is the prior probability

00:19:02.089 --> 00:19:04.589
that a random person sitting on trial committed

00:19:04.589 --> 00:19:07.490
a crime before you even look at the DNA or the

00:19:07.490 --> 00:19:10.670
witnesses? How do you even set that number? If

00:19:10.670 --> 00:19:12.650
100,000 people live in the city where the crime

00:19:12.650 --> 00:19:15.900
happened, is the prior one in 100,000? Or do

00:19:15.900 --> 00:19:17.779
you base it on the defendant's past criminal

00:19:17.779 --> 00:19:20.279
record? Yeah. Suddenly you are asking a jury.

00:19:20.880 --> 00:19:23.640
to assign a mathematical number to a person's

00:19:23.640 --> 00:19:26.079
underlying suspiciousness before looking at the

00:19:26.079 --> 00:19:28.259
facts of the case. Which is exactly the problem.
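
A quick numeric sketch shows how much the choice of prior matters. Both runs below use the same hypothetical evidence (certain if guilty, a 1-in-10,000 chance if innocent); only the starting prior differs, and the posterior probability of guilt swings from under 10% to over 99%. All figures are illustrative, not from the source.

```python
def posterior_guilt(prior, p_e_guilty, p_e_innocent):
    """Bayes' theorem: P(guilty | evidence)."""
    numerator = p_e_guilty * prior
    total = numerator + p_e_innocent * (1 - prior)
    return numerator / total

# The same hypothetical evidence in both cases.
p_e_guilty, p_e_innocent = 1.0, 1 / 10_000

# Prior A: any of 100,000 city residents equally likely.
print(posterior_guilt(1 / 100_000, p_e_guilty, p_e_innocent))  # ~0.091
# Prior B: other case facts already put guilt at one in ten.
print(posterior_guilt(0.1, p_e_guilty, p_e_innocent))          # ~0.999
```

The evidence and the math are identical in both runs; only the disputed starting number changes, which is exactly why courts balk.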

00:19:28.400 --> 00:19:30.819
The math works perfectly, but the act of setting

00:19:30.819 --> 00:19:34.079
a prior probability of guilt feels fundamentally

00:19:34.079 --> 00:19:37.099
opposed to the legal concept of innocent until

00:19:37.099 --> 00:19:39.839
proven guilty. It totally clashes. It requires

00:19:39.839 --> 00:19:42.480
knowing a baseline probability of criminality

00:19:42.480 --> 00:19:44.900
that is highly controversial to establish in

00:19:44.900 --> 00:19:47.400
a specific individual trial. It encapsulates

00:19:47.400 --> 00:19:50.329
why this is such a powerful yet demanding mental

00:19:50.329 --> 00:19:53.769
model. So to wrap up our deep dive today, what

00:19:53.769 --> 00:19:55.690
are the actionable takeaways? I think the main

00:19:55.690 --> 00:19:58.690
one is transparency. Yeah. Thinking like a Bayesian

00:19:58.690 --> 00:20:00.890
means explicitly acknowledging that you have

00:20:00.890 --> 00:20:03.910
priors, you have initial biases, you have starting

00:20:03.910 --> 00:20:06.950
assumptions, and that is perfectly OK as long

00:20:06.950 --> 00:20:09.289
as you make them transparent. Exactly. It means

00:20:09.289 --> 00:20:11.650
you have to remain genuinely open to new evidence

00:20:11.650 --> 00:20:14.150
and let that evidence iteratively update your

00:20:14.150 --> 00:20:19.160
beliefs over time. And above all, avoid 100%

00:20:19.160 --> 00:20:21.980
certainty. Leave the door unlocked, even if it's

00:20:21.980 --> 00:20:24.019
just a fraction of a percent. Before we sign

00:20:24.019 --> 00:20:26.519
off, I want to leave you with one final, truly

00:20:26.519 --> 00:20:28.680
mind-bending concept from the source material.

00:20:29.220 --> 00:20:31.819
It's a theory called Solomonoff's inductive inference.

00:20:32.019 --> 00:20:34.309
Okay, take us down the rabbit hole. Ray Solomonoff

00:20:34.309 --> 00:20:37.289
mathematically combined Bayesian statistics with

00:20:37.289 --> 00:20:40.309
Occam's razor, which is the philosophical principle

00:20:40.309 --> 00:20:42.569
that the simplest explanation is usually the

00:20:42.569 --> 00:20:45.230
correct one. Right. He used these two concepts

00:20:45.230 --> 00:20:47.869
to theorize a framework for universal prediction.

00:20:48.569 --> 00:20:50.890
The underlying idea is that if our environment,

00:20:51.250 --> 00:20:53.970
our physical reality follows any unknown but

00:20:53.970 --> 00:20:56.849
computable probability distribution, then Bayesian

00:20:56.849 --> 00:20:59.470
updating could theoretically be used to predict

00:20:59.470 --> 00:21:02.430
the yet unseen parts of the universe in an optimal,

00:21:02.430 --> 00:21:05.640
perfect fashion. Wait, so if the universe operates

00:21:05.640 --> 00:21:09.039
on some kind of underlying computable code, Bayes'

00:21:09.039 --> 00:21:11.799
theorem is the literal key to predicting the

00:21:11.799 --> 00:21:14.980
future. Given enough incoming data and the correct

00:21:14.980 --> 00:21:18.900
universal prior to start with, you could hypothetically

00:21:18.900 --> 00:21:21.640
predict the unseen sequence of reality perfectly.

00:21:22.359 --> 00:21:24.680
Every random event would just be another data

00:21:24.680 --> 00:21:27.400
point updating the master equation. Look around

00:21:27.400 --> 00:21:29.720
the room you're in right now. It really asks

00:21:29.720 --> 00:21:32.480
you to ponder whether everything in your life,

00:21:33.180 --> 00:21:35.200
the traffic on your commute, the decisions of

00:21:35.200 --> 00:21:37.700
the people around you, the seemingly random chaos

00:21:37.700 --> 00:21:40.440
of your day, is just a computable sequence. It's

00:21:40.440 --> 00:21:42.119
a wild thought. It might all just be waiting

00:21:42.119 --> 00:21:44.680
for the right prior and enough data to be perfectly

00:21:44.680 --> 00:21:47.259
mathematically predicted. It brings us right

00:21:47.259 --> 00:21:49.299
back to our savvy detective at the start of the

00:21:49.299 --> 00:21:51.700
show. It really does. Because in the end, we're

00:21:51.700 --> 00:21:53.579
all just detectives trying to weigh the clues

00:21:53.579 --> 00:21:56.440
of today against the mysteries of tomorrow, making

00:21:56.440 --> 00:21:58.519
sure we never ever lock the door on the truth.
