WEBVTT

00:00:00.000 --> 00:00:02.180
Every time you speak into your smartphone, you

00:00:02.180 --> 00:00:04.139
know, and it magically transcribes your voice

00:00:04.139 --> 00:00:06.620
into a text message, it isn't actually hearing

00:00:06.620 --> 00:00:09.080
your words. Right. Not the way a human does anyway.

00:00:09.300 --> 00:00:11.140
Exactly. It's essentially blind to language.

00:00:11.439 --> 00:00:13.880
Instead, what that phone is doing is calculating

00:00:13.880 --> 00:00:17.000
this massive invisible web of probabilities in

00:00:17.000 --> 00:00:19.899
real time. It's just guessing, really. Guessing

00:00:19.899 --> 00:00:22.460
the most mathematically likely sequence of sounds

00:00:22.460 --> 00:00:25.920
based on this totally chaotic ocean of audio

00:00:25.920 --> 00:00:29.899
data. And today, we are taking you, the learner,

00:00:30.039 --> 00:00:33.020
inside the architecture of that randomness. Welcome

00:00:33.020 --> 00:00:35.880
to the Deep Dive. Glad to be here. So our mission

00:00:35.880 --> 00:00:39.539
today is to decode a rather dense, highly technical

00:00:39.539 --> 00:00:41.619
Wikipedia article we have in our sources.

00:00:42.039 --> 00:00:44.799
It breaks down this secret cheat code that data

00:00:44.799 --> 00:00:47.700
scientists and statisticians use to map out complex,

00:00:47.820 --> 00:00:49.500
chaotic systems. Yeah, it's all about something

00:00:49.500 --> 00:00:52.500
called probabilistic graphical models, or PGMs

00:00:52.500 --> 00:00:55.179
for short. PGMs, right. And the goal here is

00:00:55.179 --> 00:00:57.159
to give you a mental cheat sheet for how these

00:00:57.159 --> 00:00:59.960
actually work. Because it is an incredibly powerful

00:00:59.960 --> 00:01:02.280
framework. I mean, when you're trying to understand

00:01:02.280 --> 00:01:06.060
any real world situation where dozens or even

00:01:06.060 --> 00:01:08.519
hundreds of different random variables are interacting

00:01:08.519 --> 00:01:11.450
all at once, the math just becomes paralyzing.

00:01:11.489 --> 00:01:13.870
Oh, I can imagine. Yeah, so a graphical model

00:01:13.870 --> 00:01:17.849
is the tool that cuts through that noise. To

00:01:17.849 --> 00:01:20.329
visualize what we're talking about today, I want

00:01:20.329 --> 00:01:23.750
you to imagine stepping into a vast, dimly lit

00:01:23.750 --> 00:01:26.250
room. Okay, setting the scene, I like it. Right,

00:01:26.450 --> 00:01:28.790
and floating all around you is this glowing,

00:01:29.349 --> 00:01:31.730
highly complex, detective-style string board.

00:01:31.829 --> 00:01:33.909
Oh wow. You've got these push pins representing

00:01:33.909 --> 00:01:36.030
different variables, and they are connected by

00:01:36.030 --> 00:01:38.329
these pulsing threads of light, just weaving

00:01:38.329 --> 00:01:40.969
this massive three-dimensional web. I love that

00:01:40.969 --> 00:01:43.569
visual. So we're basically standing inside the

00:01:43.569 --> 00:01:45.870
brain of a machine learning algorithm, just looking

00:01:45.870 --> 00:01:48.569
at how it maps the world. Exactly. That's exactly

00:01:48.569 --> 00:01:50.129
what it is. Alright, let's start with the fundamental

00:01:50.129 --> 00:01:52.489
building blocks then. We have our floating push

00:01:52.489 --> 00:01:54.969
pins and our strings of light. The source material

00:01:54.969 --> 00:01:57.969
defines a graphical model as a graph that expresses

00:01:57.969 --> 00:02:00.569
the, and I'm quoting here, the conditional dependence

00:02:00.569 --> 00:02:03.650
structure between random variables. Right. It

00:02:03.650 --> 00:02:07.650
also says it provides a compact, factorized representation

00:02:07.650 --> 00:02:11.229
of a multi-dimensional space, which, you know,

00:02:11.270 --> 00:02:13.289
is a lot of heavy terminology right out of the

00:02:13.289 --> 00:02:15.409
gate. It is a mouthful. Let's break that down

00:02:15.409 --> 00:02:18.289
because understanding that one sentence is really

00:02:18.289 --> 00:02:20.009
the key to everything else we're going to talk

00:02:20.009 --> 00:02:23.300
about. Please do. So, in statistics, a random

00:02:23.300 --> 00:02:25.539
variable is really just something that can take

00:02:25.539 --> 00:02:27.860
on different values. And we aren't completely

00:02:27.860 --> 00:02:29.419
sure what value it's going to take. It could

00:02:29.419 --> 00:02:31.639
be a coin. Yeah, or like the weather tomorrow,

00:02:32.060 --> 00:02:35.159
the price of a stock, or, to use the detective

00:02:35.159 --> 00:02:37.639
board analogy, the identity of a suspect in a

00:02:37.639 --> 00:02:41.180
crime. Got it. Now, if you have 100 of these

00:02:41.180 --> 00:02:44.199
variables, trying to calculate the probability

00:02:44.199 --> 00:02:46.919
of every single possible combination of events

00:02:46.919 --> 00:02:50.110
happening at the exact same time is computationally

00:02:50.110 --> 00:02:52.409
impossible. Just too many variables. Way too many.

00:02:52.409 --> 00:02:54.650
Even our most powerful supercomputers would choke

00:02:54.650 --> 00:02:57.819
on it. The math just scales exponentially into

00:02:57.819 --> 00:03:01.379
infinity. So the graph, this PGM, is basically

00:03:01.379 --> 00:03:03.900
a way of cheating that math. It's a way of isolating

00:03:03.900 --> 00:03:06.419
what actually matters. What's fascinating here

00:03:06.419 --> 00:03:09.099
is that it bridges graph theory and probability.

00:03:09.879 --> 00:03:11.599
Instead of assuming everything in the universe

00:03:11.599 --> 00:03:14.539
affects everything else equally, the graph maps

00:03:14.539 --> 00:03:19.500
out specific relationships. It factors the massive,

00:03:19.740 --> 00:03:22.639
incomprehensible problem into bite-sized, solvable

00:03:22.639 --> 00:03:26.240
pieces. Oh, I see. You only do the math along the strings

00:03:26.240 --> 00:03:29.580
that are actually connected. By doing that, it

00:03:29.580 --> 00:03:32.099
encodes a distribution over a multi-dimensional

00:03:32.099 --> 00:03:34.560
space without having to calculate the whole space

00:03:34.560 --> 00:03:37.240
at once. Okay, let's unpack this and ground it

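NOTE
A rough sketch of why factoring helps, assuming every variable is binary
and, for the factored case, that the variables form a simple chain where
each one depends only on the one before it. Both function names are
hypothetical and are not taken from the source article.
```python
# Hypothetical illustration: count the free parameters needed for n binary variables.
def full_joint_params(n: int) -> int:
    # A full joint table has 2**n entries; one is fixed because they sum to 1.
    return 2 ** n - 1
def chain_params(n: int) -> int:
    # In a chain, the first variable needs 1 number and every later one
    # needs 2 (one probability per state of its single parent).
    return 1 + 2 * (n - 1)
print(full_joint_params(100))
print(chain_params(100))
```
The gap between the two counts is the saving the graph structure buys.
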
00:03:37.240 --> 00:03:40.120
in that detective string board analogy. Say we

00:03:40.120 --> 00:03:44.099
have three suspects. Suspect A, Suspect B, and

00:03:44.099 --> 00:03:46.900
Suspect C. Right. The classic trio. Right. And

00:03:46.900 --> 00:03:49.060
you're the detective trying to figure out who

00:03:49.060 --> 00:03:51.900
committed a robbery. Initially, you have strings

00:03:51.900 --> 00:03:53.639
connecting all three of them because they all

00:03:53.639 --> 00:03:55.960
run in the same circles. Makes sense. But then

00:03:55.960 --> 00:03:58.340
you uncover airtight security footage proving

00:03:58.340 --> 00:04:00.780
Suspect A was in a completely different country

00:04:00.780 --> 00:04:02.659
at the time of the robbery. So now the value

00:04:02.659 --> 00:04:05.300
of variable A is known. It's an absolute certainty.

00:04:05.300 --> 00:04:07.699
Right. And because we know the absolute truth

00:04:07.699 --> 00:04:10.919
about A, Let's say we realized that suspect B

00:04:10.919 --> 00:04:13.840
and suspect C's possible motives had everything

00:04:13.840 --> 00:04:16.100
to do with A's involvement. Okay, following you.

00:04:16.379 --> 00:04:18.899
So without A, B and C actually have nothing

00:04:18.899 --> 00:04:21.139
to do with each other. If I were visualizing

00:04:21.139 --> 00:04:23.699
this on a physical board, my instinct would be

00:04:23.699 --> 00:04:26.019
to just take scissors and cut the string between

00:04:26.019 --> 00:04:29.379
B and C forever. Is that conditional independence?

00:04:29.660 --> 00:04:31.759
Well, let's refine that slightly. Because the

00:04:31.759 --> 00:04:33.740
underlying relationship doesn't just cease to

00:04:33.740 --> 00:04:35.930
exist in the real world. Oh, it doesn't? No,

00:04:36.009 --> 00:04:38.129
think of the connection between them less like

00:04:38.129 --> 00:04:40.629
a physical string that gets destroyed and more

00:04:40.629 --> 00:04:44.949
like a road with a toll booth. Okay, a toll booth.

00:04:45.170 --> 00:04:47.149
I'm tracking. In conditional independence, the

00:04:47.149 --> 00:04:49.269
flow of new information is what gets blocked.

00:04:50.269 --> 00:04:52.850
If suspect A is the toll booth and we don't know

00:04:52.850 --> 00:04:55.709
anything about A yet, information about suspect

00:04:55.709 --> 00:04:58.129
B's whereabouts might give us clues about C.

00:04:58.290 --> 00:05:00.509
Because they're connected through A. Exactly.

00:05:00.790 --> 00:05:03.399
The traffic is flowing freely between them. But

00:05:03.399 --> 00:05:05.399
the moment we lock down the truth about suspect

00:05:05.399 --> 00:05:08.199
A, the moment we close the toll booth, no new

00:05:08.199 --> 00:05:11.500
information can travel from B to C. Wow. Knowing

00:05:11.500 --> 00:05:13.920
more about B tells us absolutely nothing new

00:05:13.920 --> 00:05:16.939
about C. The road is blocked. And that is the

00:05:16.939 --> 00:05:19.600
essence of conditional independence. That makes

00:05:19.600 --> 00:05:22.629
so much more sense. It's about the flow of information.

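NOTE
A minimal sketch of the toll booth idea, assuming three binary variables
where A is the shared link and B and C each depend only on A. All the
probabilities below are made up for illustration.
```python
from itertools import product
# Hypothetical numbers: A is the shared "toll booth"; B and C each depend only on A.
p_a = {0: 0.7, 1: 0.3}
p_b_given_a = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.2, 1: 0.8}}  # p_b_given_a[a][b]
p_c_given_a = {0: {0: 0.8, 1: 0.2}, 1: {0: 0.3, 1: 0.7}}  # p_c_given_a[a][c]
# Joint distribution P(a, b, c) = P(a) * P(b | a) * P(c | a).
joint = {(a, b, c): p_a[a] * p_b_given_a[a][b] * p_c_given_a[a][c] for a, b, c in product((0, 1), repeat=3)}
def prob(**fixed):
    # Sum the joint over every assignment consistent with the fixed values.
    names = ("a", "b", "c")
    return sum(p for assignment, p in joint.items() if all(assignment[names.index(k)] == v for k, v in fixed.items()))
# With A unknown, learning B shifts our belief about C.
p_c1 = prob(c=1)
p_c1_given_b1 = prob(b=1, c=1) / prob(b=1)
# With A observed, learning B changes nothing about C.
p_c1_given_a1 = prob(a=1, c=1) / prob(a=1)
p_c1_given_a1_b1 = prob(a=1, b=1, c=1) / prob(a=1, b=1)
print(p_c1, p_c1_given_b1, p_c1_given_a1, p_c1_given_a1_b1)
```
The first two printed values differ; the last two match, which is the
blocked road in numbers.
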
00:05:23.290 --> 00:05:25.509
So once we establish that we are drawing this

00:05:25.509 --> 00:05:27.689
map of connected variables, we have to look at

00:05:27.689 --> 00:05:29.589
the strings themselves. Right, the connections.

00:05:29.750 --> 00:05:32.649
And looking at the concepts in our sources, there's

00:05:32.649 --> 00:05:36.129
a major fork in the road here. The entire field

00:05:36.129 --> 00:05:39.949
basically splits into two main families of models

00:05:39.949 --> 00:05:42.769
based on one really simple visual detail. Yep.

00:05:43.230 --> 00:05:46.089
Do the strings have arrows or not? Exactly. Do

00:05:46.089 --> 00:05:48.449
the strings connecting these variables have arrows

00:05:48.449 --> 00:05:50.569
pointing in a specific direction, or are they

00:05:50.569 --> 00:05:52.649
just plain lines? These are the two heavyweights

00:05:52.649 --> 00:05:55.250
of the PGM world. Let's tackle the plain lines

00:05:55.250 --> 00:05:57.709
first. Okay. These are called undirected graphical

00:05:57.709 --> 00:06:02.110
models, or Markov random fields. In an undirected

00:06:02.110 --> 00:06:04.949
graph, the presence of an edge, just a simple

00:06:04.949 --> 00:06:07.389
line between two nodes, implies that there is

00:06:07.389 --> 00:06:09.759
a mutual relationship or correlation between

00:06:09.759 --> 00:06:11.620
those variables. So just a general link. The

00:06:11.620 --> 00:06:13.800
source actually gives a conceptual example of

00:06:13.800 --> 00:06:16.680
four nodes, A, B, C, and D. Let's say node A

00:06:16.680 --> 00:06:19.519
connects to B, C, and D, but there are no lines

00:06:19.519 --> 00:06:21.980
connecting B, C, and D directly to each other.

00:06:22.040 --> 00:06:24.100
Right. What is that structure actually doing

00:06:24.100 --> 00:06:27.040
mathematically? It's establishing our tollbooth

00:06:27.040 --> 00:06:30.240
rule on a larger scale. It tells us that B, C,

00:06:30.240 --> 00:06:32.800
and D are conditionally independent given A.

00:06:32.959 --> 00:06:35.500
So A is the tollbooth for all of them. Exactly.

00:06:35.560 --> 00:06:38.459
If we lock down the value of A, the variables

00:06:38.459 --> 00:06:41.680
B, C, and D are totally isolated from one another.

00:06:42.319 --> 00:06:45.560
In a Markov random field, the math focuses on

00:06:45.560 --> 00:06:48.579
pairs of connected nodes. You look at A and B,

00:06:48.980 --> 00:06:51.259
and you evaluate how compatible their states

00:06:51.259 --> 00:06:54.129
are. Yeah, the text mentions a specific formula

00:06:54.129 --> 00:06:56.490
for this, where you calculate the joint probability

00:06:56.490 --> 00:06:59.509
by multiplying functions of those connected pairs,

00:06:59.589 --> 00:07:01.629
like function of A and B times function of A

00:07:01.629 --> 00:07:03.670
and C. Yeah, the non-negative functions. But,

00:07:03.689 --> 00:07:06.170
you know, reading math equations aloud is a surefire

00:07:06.170 --> 00:07:08.870
way to lose anyone listening. How do we visualize

00:07:08.870 --> 00:07:10.910
what that calculation is actually doing without

00:07:10.910 --> 00:07:12.970
getting totally bogged down in the notation?

00:07:13.129 --> 00:07:15.709
Think of it as a scoring system for a group project.

00:07:15.850 --> 00:07:17.970
Oh. You don't need to evaluate how the entire

00:07:17.970 --> 00:07:20.089
company works together all at once. You just

00:07:20.089 --> 00:07:21.509
look at the pairs of people who actually

00:07:21.509 --> 00:07:23.370
sit next to each other. Oh, I like that. Yeah.

00:07:23.569 --> 00:07:25.889
So you give a compatibility score to A and B

00:07:25.889 --> 00:07:28.290
working together. Then you give a compatibility

00:07:28.290 --> 00:07:30.350
score to A and C working together. And then you

00:07:30.350 --> 00:07:32.589
just multiply them. Exactly. You multiply all

00:07:32.589 --> 00:07:35.829
those local pairwise scores together, and it

00:07:35.829 --> 00:07:38.110
gives you a global score for how well the whole

00:07:38.110 --> 00:07:41.089
system is functioning. It's a way of calculating

00:07:41.089 --> 00:07:44.290
the probability of the whole based purely on

00:07:44.290 --> 00:07:47.149
the harmony of the local neighborhoods. A local

00:07:47.149 --> 00:07:49.149
neighborhood scoring system. I really like that.

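NOTE
A small sketch of the neighborhood scoring idea on the four-node example
(A linked to B, C, and D), assuming binary variables and a made-up
compatibility function called agree. The source only says the pairwise
functions are non-negative; the specific scores here are hypothetical.
```python
from itertools import product
# Hypothetical compatibility scores: each edge rewards agreement between its two endpoints.
def agree(x: int, y: int) -> float:
    return 2.0 if x == y else 1.0
edges = [("A", "B"), ("A", "C"), ("A", "D")]  # the star-shaped graph from the discussion
names = ["A", "B", "C", "D"]
def score(state: dict) -> float:
    # Unnormalized score: the product of the pairwise scores over every edge.
    result = 1.0
    for u, v in edges:
        result *= agree(state[u], state[v])
    return result
states = [dict(zip(names, values)) for values in product((0, 1), repeat=4)]
z = sum(score(s) for s in states)  # normalizing constant so the scores sum to 1
probability = {tuple(s.values()): score(s) / z for s in states}
print(probability[(0, 0, 0, 0)], probability[(0, 1, 1, 1)])
```
Dividing by z is the one step the spoken version skips: the raw product
is a score, and it only becomes a probability after normalization.
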
00:07:49.149 --> 00:07:51.529
It feels very symmetrical and neat. It is. But

00:07:51.529 --> 00:07:54.629
then we cross the aisle to the directed models.

00:07:54.670 --> 00:07:57.850
These are called directed acyclic graphs or Bayesian

00:07:57.850 --> 00:08:01.009
networks. And here, the strings are no longer

00:08:01.009 --> 00:08:04.009
just plain lines. They have arrows. And those

00:08:04.009 --> 00:08:06.370
arrows fundamentally change the interpretation

00:08:06.370 --> 00:08:08.250
of the map. Because they enforce a direction.

00:08:08.370 --> 00:08:10.389
Right. In a Bayesian network, we aren't just

00:08:10.389 --> 00:08:12.810
looking at mutual correlation anymore. We are

00:08:12.810 --> 00:08:15.730
looking at influence. The node where the arrow

00:08:15.689 --> 00:08:18.410
starts is called a parent and the node the arrow

00:08:18.410 --> 00:08:21.269
points to is the child. See, when I hear parent

00:08:21.269 --> 00:08:24.250
and child, my mind immediately goes to a family

00:08:24.250 --> 00:08:28.089
tree. It implies that this variable precedes

00:08:28.089 --> 00:08:30.410
or physically creates that one. Is that the right

00:08:30.410 --> 00:08:33.570
way to think about a directed graph? If we connect

00:08:33.570 --> 00:08:36.129
this to the bigger picture, it's a very useful

00:08:36.129 --> 00:08:38.919
starting point. But it's better to think of it

00:08:38.919 --> 00:08:41.419
strictly as an influence or a generative process.

00:08:41.620 --> 00:08:43.840
Okay, influence. The family tree analogy works

00:08:43.840 --> 00:08:46.100
because time and genetics flow in one direction.

00:08:46.120 --> 00:08:48.059
Right. Your grandfather influences your father

00:08:48.059 --> 00:08:50.679
and your father influences you. Right. In a Bayesian

00:08:50.679 --> 00:08:53.419
network, the probability of the entire system

00:08:53.419 --> 00:08:55.820
is calculated by looking at the probability of

00:08:55.820 --> 00:08:58.240
each child variable, given the state of its parents.

00:08:58.379 --> 00:09:01.259
So the math is literally probability of X, given

00:09:01.259 --> 00:09:03.879
the parents of X. Exactly. The conditionality

00:09:03.879 --> 00:09:05.840
is baked directly into the direction of the arrow.

00:09:06.059 --> 00:09:09.480
The text mentions naive Bayes classifiers, hidden

00:09:09.480 --> 00:09:12.120
Markov models, neural networks. They all use

00:09:12.120 --> 00:09:14.700
this directed flow. So if undirected graphs are

00:09:14.700 --> 00:09:17.179
good for things sitting at the same table, directed

00:09:17.179 --> 00:09:19.700
graphs are for a sequence of events. But the

00:09:19.700 --> 00:09:22.419
source specifically calls them directed acyclic

00:09:22.419 --> 00:09:25.799
graphs. Acyclic meaning what? No cycles. Right.

00:09:25.919 --> 00:09:28.460
No cycles. You can't loop back. That seems like

00:09:28.460 --> 00:09:30.919
a big rule. It is a strict rule for Bayesian

00:09:30.919 --> 00:09:33.980
networks. You can never start at a node, follow

00:09:33.980 --> 00:09:36.259
the arrows and end up back where you started.

00:09:36.580 --> 00:09:38.799
You can't be your own ancestor. That makes sense.

00:09:39.039 --> 00:09:42.299
Time and influence must march forward. And because

00:09:42.299 --> 00:09:45.000
of this strict one-way flow, Bayesian networks

00:09:45.000 --> 00:09:48.019
allow for a very specific type of analysis called

00:09:48.279 --> 00:09:51.139
d-separation. Wait, d-separation? You totally

00:09:51.139 --> 00:09:53.419
lost me there. The text mentioned it briefly

00:09:53.419 --> 00:09:56.139
as a criterion to determine if sets of nodes

00:09:56.139 --> 00:09:59.019
are conditionally independent. But how does looking

00:09:59.019 --> 00:10:01.740
at the direction of arrows separate anything?

00:10:01.840 --> 00:10:03.940
It's a fantastic concept, really. And it goes

00:10:03.940 --> 00:10:06.200
back to our toll booth analogy, but this time

00:10:06.200 --> 00:10:08.340
with directional traffic. Okay. Directional toll

00:10:08.340 --> 00:10:10.860
booths. Right. D-separation stands for directed

00:10:10.860 --> 00:10:13.259
separation. Let's imagine a simple chain of three

00:10:13.259 --> 00:10:16.019
events. Event A is your alarm clock failing to

00:10:16.019 --> 00:10:19.419
go off. Oh no. B is you being late for work.

00:10:19.820 --> 00:10:22.659
And event C is you getting fired. Wow. A very

00:10:22.659 --> 00:10:24.720
stressful directed graph today. Very stressful,

00:10:24.899 --> 00:10:27.820
yeah. Now the arrow points from alarm, which

00:10:27.820 --> 00:10:31.320
is A, to late, which is B, and from late to fired,

00:10:31.340 --> 00:10:34.879
which is C. Got it. A to B to C. If you don't

00:10:34.879 --> 00:10:36.919
know anything about whether someone was late,

00:10:37.379 --> 00:10:40.120
knowing their alarm failed gives you a pretty

00:10:40.120 --> 00:10:43.139
strong clue that they might get fired. The information

00:10:43.139 --> 00:10:46.000
flows all the way down the chain. Makes total

00:10:46.000 --> 00:10:50.110
sense. But... What if you already know for an

00:10:50.110 --> 00:10:52.730
absolute fact that the person was late for work?

00:10:53.250 --> 00:10:56.529
You have observed variable B. Okay, so I know

00:10:56.529 --> 00:10:59.809
B is true. In the rules of d-separation, observing

00:10:59.809 --> 00:11:02.549
the middle node in a chain blocks the flow of

00:11:02.549 --> 00:11:05.409
information. If I already know you were late,

00:11:05.570 --> 00:11:08.070
discovering that your alarm failed doesn't change

00:11:08.070 --> 00:11:10.549
my probability of you getting fired. Oh wow,

00:11:10.690 --> 00:11:12.629
because being late is the thing that gets you

00:11:12.629 --> 00:11:15.250
fired, not the alarm. Exactly. The late node

00:11:15.250 --> 00:11:17.710
already contains all the predictive power. The

00:11:17.710 --> 00:11:20.490
arrows dictate exactly how and when that information

00:11:20.490 --> 00:11:23.110
gets blocked. So by tracing the arrows, an algorithm

00:11:23.110 --> 00:11:25.269
can instantly know which variables to ignore

00:11:25.269 --> 00:11:27.049
and which ones to actually pay attention to,

00:11:27.070 --> 00:11:29.110
which must save massive amounts of computing

00:11:29.110 --> 00:11:31.230
power. Huge amounts, yeah. That is brilliant.

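NOTE
A minimal sketch of the alarm, late, fired chain, assuming made-up
probabilities. It multiplies each variable's probability given its parent
and then checks that, once "late" is observed, the alarm adds nothing.
The function names joint and p_fired are hypothetical.
```python
# Hypothetical probabilities for the chain alarm_failed -> late -> fired.
p_alarm = {True: 0.1, False: 0.9}
p_late_given_alarm = {True: 0.9, False: 0.1}   # P(late=True | alarm_failed)
p_fired_given_late = {True: 0.3, False: 0.01}  # P(fired=True | late)
def joint(alarm: bool, late: bool, fired: bool) -> float:
    # Bayesian network factorization: each variable is conditioned only on its parent.
    p_l = p_late_given_alarm[alarm] if late else 1 - p_late_given_alarm[alarm]
    p_f = p_fired_given_late[late] if fired else 1 - p_fired_given_late[late]
    return p_alarm[alarm] * p_l * p_f
def p_fired(alarm=None, late=None) -> float:
    # P(fired=True | evidence), found by summing the joint over unobserved variables.
    alarms = [True, False] if alarm is None else [alarm]
    lates = [True, False] if late is None else [late]
    num = sum(joint(a, l, True) for a in alarms for l in lates)
    den = sum(joint(a, l, f) for a in alarms for l in lates for f in (True, False))
    return num / den
print(p_fired(alarm=True), p_fired(alarm=False))
print(p_fired(late=True), p_fired(alarm=True, late=True))
```
The first line shows the alarm mattering when lateness is unknown; the
second shows the two values agreeing once lateness is observed.
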
00:11:31.370 --> 00:11:33.250
But my detective brain is flashing a warning

00:11:33.250 --> 00:11:35.830
light right now. Uh-oh. What's the issue? Well,

00:11:35.889 --> 00:11:38.789
both of these heavyweights, the undirected mutual

00:11:38.789 --> 00:11:41.929
models and the directed acyclic models, they

00:11:41.929 --> 00:11:44.740
both assume the world is perfectly neat. Oh,

00:11:44.840 --> 00:11:46.639
I see where you're going. Right. Like, either

00:11:46.639 --> 00:11:48.500
everything is a two-way street or everything

00:11:48.500 --> 00:11:51.340
flows forward in one direction. But what happens

00:11:51.340 --> 00:11:53.779
when the real world is messy? What happens when

00:11:53.779 --> 00:11:56.779
things do loop back on themselves? That is exactly

00:11:56.779 --> 00:11:59.279
where we leave the neat, tidy world of the textbooks

00:11:59.279 --> 00:12:02.299
and enter the Wild West of graphical models.

00:12:02.379 --> 00:12:04.320
The Wild West? Yeah, because the researchers

00:12:04.320 --> 00:12:06.820
who build these systems, they know reality is

00:12:06.820 --> 00:12:08.940
messy. So they started breaking the foundational

00:12:08.940 --> 00:12:11.279
rules to accommodate that reality. And the text

00:12:11.279 --> 00:12:15.299
outlines several of these other types of models.

00:12:15.360 --> 00:12:18.440
The first one that jumps out is the cyclic directed

00:12:18.440 --> 00:12:21.559
graphical model, which seems to directly violate

00:12:21.559 --> 00:12:23.600
the acyclic rule we just spent all that time

00:12:23.600 --> 00:12:26.399
establishing. It allows variables to depend on

00:12:26.399 --> 00:12:29.620
parents in a closed loop. It does. And from a

00:12:29.620 --> 00:12:32.059
purely mathematical standpoint, a cycle creates

00:12:32.059 --> 00:12:34.500
an enormous headache. Think about it: if you're

00:12:34.500 --> 00:12:36.299
trying to calculate the probability of an event,

00:12:36.659 --> 00:12:39.019
but the event is constantly feeding back into

00:12:39.019 --> 00:12:42.120
its own cause, the standard rules of factorization

00:12:42.120 --> 00:12:45.360
we discussed completely break down. I mean, how

00:12:45.360 --> 00:12:48.059
do you calculate a score when the score keeps

00:12:48.059 --> 00:12:50.259
changing itself? It's like a snake eating its

00:12:50.259 --> 00:12:52.759
own tail. The example in the source is actually

00:12:52.759 --> 00:12:55.940
wild. It describes a situation where variable

00:12:55.940 --> 00:12:59.779
D depends on A, B, and C, but then C depends

00:12:59.779 --> 00:13:02.360
on B and D. Exactly. It creates this infinite

00:13:02.360 --> 00:13:04.840
feedback loop of influence. But think about real

00:13:04.840 --> 00:13:08.019
life for a second. Ecosystems are feedback loops.

00:13:08.840 --> 00:13:11.519
Predator and prey populations constantly influence

00:13:11.519 --> 00:13:14.200
each other in cycles. Oh, that's true. Economics

00:13:14.200 --> 00:13:17.240
is full of feedback loops. If inflation rises,

00:13:17.860 --> 00:13:20.139
central banks raise interest rates, which slows

00:13:20.139 --> 00:13:23.080
the economy, which then impacts inflation. You

00:13:23.080 --> 00:13:25.340
can't model that accurately with a strict

00:13:25.340 --> 00:13:28.039
one-way arrow. So the math just has to adapt. Right.

00:13:28.340 --> 00:13:30.799
Statisticians developed complex discovery algorithms

00:13:30.799 --> 00:13:34.200
specifically to handle these cyclic graphs, basically

00:13:34.200 --> 00:13:36.980
forcing the math to reconcile with reality. There's

00:13:36.980 --> 00:13:39.200
another hybrid in the source that I find absolutely

00:13:39.200 --> 00:13:41.500
fascinating, the chain graph. Oh yeah, that's

00:13:41.500 --> 00:13:43.500
a good one. It's described as a graph that allows

00:13:43.500 --> 00:13:46.379
both directed arrows and undirected lines in

00:13:46.379 --> 00:13:48.720
the same exact model. It's like the ultimate

00:13:48.720 --> 00:13:50.639
diplomatic treaty between the two heavyweights.

00:13:50.779 --> 00:13:53.480
Right, because it keeps the rule of no directed

00:13:53.480 --> 00:13:55.340
cycles, so you still can't follow the arrows

00:13:55.340 --> 00:13:58.700
back home. But along the way, you can have clusters

00:13:58.700 --> 00:14:01.360
of variables that just mutually interact. It

00:14:01.360 --> 00:14:03.299
gives you the best of both worlds, really. But

00:14:03.299 --> 00:14:05.360
how does that actually look in practice? Like,

00:14:05.360 --> 00:14:07.950
why would you need both in the same map. Okay.

00:14:08.049 --> 00:14:11.610
Imagine modeling a complex disease. At the top

00:14:11.610 --> 00:14:14.110
of the map, you have genetic predispositions.

00:14:14.690 --> 00:14:16.990
Those are directed arrows. Because you inherit

00:14:16.990 --> 00:14:19.649
them. Right. The influence flows strictly downward

00:14:19.649 --> 00:14:22.590
from your DNA to your cellular function. You

00:14:22.590 --> 00:14:24.929
can't change your DNA based on a cold you caught.

00:14:25.070 --> 00:14:27.289
Right. The arrow of biology only goes one way

00:14:27.289 --> 00:14:29.879
there. Exactly. But once you get down to the

00:14:29.879 --> 00:14:32.899
cellular level, you might have hundreds of different

00:14:32.899 --> 00:14:35.179
proteins and enzymes interacting with each other

00:14:35.179 --> 00:14:37.360
all at once. And they don't have a strict

00:14:37.360 --> 00:14:40.100
parent-child thing going on. No, one protein doesn't

00:14:40.100 --> 00:14:41.899
necessarily cause the other. They're just mutually

00:14:41.899 --> 00:14:45.120
reacting in a chemical soup. Those relationships

00:14:45.120 --> 00:14:47.940
are undirected. So a chain graph lets a data

00:14:47.940 --> 00:14:50.559
scientist model the strict downward flow of genetics

00:14:50.559 --> 00:14:54.200
while simultaneously capturing the mutual, undirected

00:14:54.200 --> 00:14:56.600
correlation of the immediate cellular environment.

00:14:56.840 --> 00:14:59.539
That is incredibly elegant. It really shows the

00:14:59.539 --> 00:15:01.659
adaptability of the math. I mean, they've mutated

00:15:01.659 --> 00:15:04.059
the basic graph concept to fit whatever messy

00:15:04.059 --> 00:15:06.820
data they have. The text mentions a few honorable

00:15:06.820 --> 00:15:08.980
mentions too, like tree-augmented classifiers,

00:15:09.240 --> 00:15:11.820
TAN models, and targeted Bayesian network learning,

00:15:12.059 --> 00:15:15.240
or TBNL. Right, and ancestral graphs, which add

00:15:15.240 --> 00:15:18.039
bidirected edges, and restricted Boltzmann

00:15:18.039 --> 00:15:20.440
machines, which are these bipartite generative

00:15:20.440 --> 00:15:22.480
models. Lots of specialized tools. But I want

00:15:22.480 --> 00:15:24.559
to zero in on one specific concept it mentions

00:15:24.559 --> 00:15:27.360
before we move on. Factor graphs and something

00:15:27.360 --> 00:15:30.659
called belief propagation. Ah, belief propagation.

00:15:30.899 --> 00:15:33.120
The text notes that factor graphs are undirected

00:15:33.120 --> 00:15:35.500
bipartite graphs used to implement this belief

00:15:35.500 --> 00:15:38.100
propagation. But, as we established earlier,

00:15:38.159 --> 00:15:40.679
we want to avoid just listing terms. We need

00:15:40.679 --> 00:15:43.000
to know how this works. What is a factor graph

00:15:43.000 --> 00:15:45.720
doing, and what exactly is a belief propagating?

00:15:46.080 --> 00:15:49.059
Okay, to understand belief propagation, we need

00:15:49.059 --> 00:15:53.179
a good analogy. Imagine you're at a massive crowded

00:15:53.179 --> 00:15:55.820
cocktail party in a sprawling mansion. Sounds

00:15:55.820 --> 00:15:58.500
fun. You're deep inside one of the interior rooms

00:15:58.500 --> 00:16:00.539
and you want to know if it's raining outside.

00:16:00.879 --> 00:16:03.059
You don't have a window and you don't have your

00:16:03.059 --> 00:16:06.360
phone. Okay, so I'm a node in a graphical model

00:16:06.360 --> 00:16:08.379
trying to figure out the state of a variable

00:16:08.379 --> 00:16:11.100
I can't observe directly. Look at you speaking

00:16:11.100 --> 00:16:14.059
the language. Exactly. Now in a graphical model

00:16:14.059 --> 00:16:16.139
you can only communicate with the nodes you're

00:16:16.139 --> 00:16:18.519
directly connected to. So you ask the person

00:16:18.519 --> 00:16:20.840
standing next to you. Hey, do you think it's

00:16:20.840 --> 00:16:22.919
raining? And they don't know either. Right, they

00:16:22.919 --> 00:16:24.460
don't know, so they ask the person next to them

00:16:24.460 --> 00:16:26.879
who asks the person next to them. It's just a

00:16:26.879 --> 00:16:29.039
giant game of telephone. But it's smarter than

00:16:29.039 --> 00:16:32.340
telephone. Eventually, someone near the entryway

00:16:32.340 --> 00:16:35.379
notices a guest walking in with a wet umbrella.

00:16:35.779 --> 00:16:39.440
Ah, new data. Yes. That person updates their

00:16:39.440 --> 00:16:41.519
belief about the weather to almost certainly

00:16:41.519 --> 00:16:44.629
raining. They pass that updated belief back to

00:16:44.629 --> 00:16:46.490
the person next to them. Hey, I just saw a wet

00:16:46.490 --> 00:16:49.070
umbrella. I'm 90% sure it's raining. Okay, so

00:16:49.070 --> 00:16:52.509
the message travels back. And as it travels, that

00:16:52.509 --> 00:16:55.210
person combines that message with what they're

00:16:55.210 --> 00:16:58.309
hearing from someone else who maybe heard thunder.

00:16:59.390 --> 00:17:02.490
They synthesize these messages, update their

00:17:02.490 --> 00:17:05.150
own belief, and pass the new synthesis along

00:17:05.150 --> 00:17:08.109
to you. And belief propagation is just that process

00:17:08.109 --> 00:17:10.630
happening mathematically. Yes, exactly. It's

00:17:10.630 --> 00:17:13.519
a message-passing algorithm. All the variables

00:17:13.519 --> 00:17:16.140
in the network continuously pass mathematical

00:17:16.140 --> 00:17:18.900
messages back and forth along the connected strings.

00:17:18.960 --> 00:17:21.019
Just constantly updating. Right, updating their

00:17:21.019 --> 00:17:22.900
probabilities based on what their neighbors are

00:17:22.900 --> 00:17:25.339
telling them until the entire room reaches a

00:17:25.339 --> 00:17:27.420
consensus about the most likely state of the

00:17:27.420 --> 00:17:29.579
world outside. And where does the factor graph

00:17:29.579 --> 00:17:32.119
come in? A factor graph is just a specialized,

00:17:32.240 --> 00:17:34.839
highly efficient map designed specifically to

00:17:34.839 --> 00:17:37.160
organize how those messages are passed, making

00:17:37.160 --> 00:17:39.140
sure the computation doesn't just get totally

00:17:39.140 --> 00:17:41.980
overwhelmed by all the chatter. Okay, this brings

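NOTE
A toy sketch of the cocktail party as message passing, assuming a line of
three guests with binary beliefs (state 1 means "raining") and made-up
numbers. It passes messages from the entryway inward and checks the
result against brute-force enumeration. This is the sum-product form of
belief propagation on a chain, not a general factor graph implementation.
```python
from itertools import product
# Hypothetical chain of three guests: x0 (you), x1, x2 (near the entryway).
prior = [0.5, 0.5]               # x0's own belief before any message arrives
pair = [[0.8, 0.2], [0.2, 0.8]]  # neighbors tend to agree: pair[x][y]
evidence = [0.1, 0.9]            # wet umbrella seen by x2: weight for each state
def normalize(v):
    total = sum(v)
    return [x / total for x in v]
# Message from x2 to x1: sum over x2 of evidence(x2) * pair(x1, x2).
m21 = [sum(evidence[x2] * pair[x1][x2] for x2 in (0, 1)) for x1 in (0, 1)]
# Message from x1 to x0: sum over x1 of the incoming message * pair(x0, x1).
m10 = [sum(m21[x1] * pair[x0][x1] for x1 in (0, 1)) for x0 in (0, 1)]
belief_x0 = normalize([prior[x0] * m10[x0] for x0 in (0, 1)])
# Brute-force check: enumerate every joint state and marginalize onto x0.
brute = [0.0, 0.0]
for x0, x1, x2 in product((0, 1), repeat=3):
    brute[x0] += prior[x0] * pair[x0][x1] * pair[x1][x2] * evidence[x2]
brute = normalize(brute)
print(belief_x0, brute)
```
The two printed lists should agree; the message version never builds the
full table, which is where the savings come from on larger graphs.
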
00:17:41.980 --> 00:17:45.180
us to the ultimate so-what of this entire deep

00:17:45.180 --> 00:17:47.700
dive. We've built these magnificent structures

00:17:47.700 --> 00:17:50.460
in our minds. We have tollbooths blocking information,

00:17:50.720 --> 00:17:53.359
we have local neighborhood scoring systems, we

00:17:53.359 --> 00:17:55.839
have diplomatic treaties between arrows and lines,

00:17:56.220 --> 00:17:58.779
and cocktail parties passing messages of belief.

00:17:59.059 --> 00:18:02.220
It's a busy world. It is, but outside of theoretical

00:18:02.220 --> 00:18:04.920
mathematics, what does this actually do? The source

00:18:04.920 --> 00:18:07.319
has a whole section dedicated to applications,

00:18:07.359 --> 00:18:10.450
and it claims this framework provides algorithms

00:18:10.450 --> 00:18:13.089
for, quoting again, discovering and analyzing

00:18:13.089 --> 00:18:15.930
structure in complex distributions. When you

00:18:15.930 --> 00:18:17.769
look at the list of applications, it really reads

00:18:17.769 --> 00:18:20.410
like a catalog of modern technological miracles.

00:18:21.049 --> 00:18:24.670
Causal inference, information extraction, decoding

00:18:24.670 --> 00:18:27.529
low-density parity-check codes, modeling protein

00:18:27.529 --> 00:18:30.029
structures. Gene regulatory networks, too. Let's

00:18:30.029 --> 00:18:31.910
tie this all the way back to the hook from the

00:18:31.910 --> 00:18:34.299
beginning of our deep dive. Speech recognition.

00:18:34.819 --> 00:18:36.740
When I'm talking to my phone, how is it using

00:18:36.740 --> 00:18:39.279
these maps to understand me? So historically,

00:18:39.559 --> 00:18:41.680
speech recognition relied heavily on a specific

00:18:41.680 --> 00:18:44.299
type of directed graphical model called a hidden

00:18:44.299 --> 00:18:47.440
Markov model. When you speak, the phone receives

00:18:47.440 --> 00:18:51.039
a chaotic wave of acoustic noise. It has to guess

00:18:51.039 --> 00:18:54.059
the hidden state, which is the actual word you

00:18:54.059 --> 00:18:56.500
intended to say based on the observed state,

00:18:56.799 --> 00:18:59.279
which is the messy audio data. And it uses the

00:18:59.279 --> 00:19:01.380
arrows to figure out the context. It uses the

00:19:01.380 --> 00:19:03.960
conditional dependencies, yeah. The model knows

00:19:03.960 --> 00:19:06.000
the statistical rules of the English language.

00:19:06.720 --> 00:19:08.700
It knows that if the last hidden state it guessed

00:19:08.700 --> 00:19:11.700
was the word peanut, the probability that the

00:19:11.700 --> 00:19:13.779
next directed arrow points to the word butter

00:19:13.779 --> 00:19:17.299
is incredibly high. Right, peanut butter. Exactly.

00:19:17.940 --> 00:19:20.039
The probability that the next arrow points to

00:19:20.039 --> 00:19:23.140
elephant is near zero. It uses the structure

00:19:23.140 --> 00:19:25.539
of the graph to filter out the noise and lock

00:19:25.539 --> 00:19:27.920
in on the most probable sequence of words. That's

00:19:27.920 --> 00:19:30.180
amazing. The text also mentions computer vision.

00:19:30.279 --> 00:19:32.359
I imagine that uses a different map entirely.

00:19:32.700 --> 00:19:34.900
Usually computer vision relies on undirected

00:19:34.900 --> 00:19:37.420
models, like the Markov random fields we discussed

00:19:37.420 --> 00:19:39.859
earlier. Think about a digital photograph. Just

00:19:39.859 --> 00:19:42.019
a bunch of pixels. Right, it's just a grid of

00:19:42.019 --> 00:19:43.980
millions of pixels, each with a color value.

00:19:44.680 --> 00:19:47.019
If a computer is trying to identify a stop sign

00:19:47.019 --> 00:19:50.059
in a noisy image, it looks at the mutual correlation

00:19:50.059 --> 00:19:52.440
between neighboring pixels. Oh, the local neighborhood

00:19:52.440 --> 00:19:55.279
scoring system again? You got it. The algorithm

00:19:55.279 --> 00:19:57.910
knows that if one pixel is red, the pixels

00:19:57.910 --> 00:20:00.289
immediately connected to it are statistically

00:20:00.289 --> 00:20:03.529
highly likely to also be red. So it groups them?

00:20:03.730 --> 00:20:06.549
Right. It uses those undirected connections to

00:20:06.549 --> 00:20:09.410
smooth out the noise, group the red pixels together,

00:20:09.910 --> 00:20:12.250
recognize that octagonal shape, and confidently

00:20:12.250 --> 00:20:16.549
conclude that is a stop sign. It finds the structure

00:20:16.549 --> 00:20:18.630
in the static. It is kind of blowing my mind

00:20:18.630 --> 00:20:21.309
that the underlying logic used to read my texts

00:20:21.309 --> 00:20:23.869
is the same underlying logic used by a self-driving

00:20:23.869 --> 00:20:26.500
car to not run a red light. It's universal. And

00:20:26.500 --> 00:20:29.940
the source also lists disease diagnosis. A medical

00:20:29.940 --> 00:20:32.180
software program using a Bayesian network can

00:20:32.180 --> 00:20:34.480
take dozens of disparate variables, a patient's

00:20:34.480 --> 00:20:37.339
age, a slight fever, a specific genetic marker,

00:20:37.819 --> 00:20:40.359
a recent travel history, and use those directed

00:20:40.359 --> 00:20:42.799
arrows to calculate the actual probability of

00:20:42.799 --> 00:20:45.779
a specific underlying disease. It is profound

00:20:45.779 --> 00:20:48.680
when you step back and look at it. This raises

00:20:48.680 --> 00:20:51.019
an important question about what it means to

00:20:51.019 --> 00:20:54.920
discover structure. How so? Well, what do human

00:20:54.920 --> 00:20:57.700
speech, the pixels in a photograph, and the symptoms

00:20:57.700 --> 00:21:00.640
of a rare disease have in common? To the naked

00:21:00.640 --> 00:21:03.759
eye, nothing. But to a probabilistic graphical

00:21:03.759 --> 00:21:06.359
model, they're all just complex distributions

00:21:06.359 --> 00:21:08.440
of random variables. They're all just pushpins

00:21:08.440 --> 00:21:11.319
on the board. Yes. It means that hidden inside

00:21:11.319 --> 00:21:14.039
the seemingly random chaos of speech patterns

00:21:14.039 --> 00:21:17.440
or genetic mutations or digital noise, there's

00:21:17.440 --> 00:21:21.019
an underlying structure of dependence. PGMs are

00:21:21.019 --> 00:21:23.279
just the flashlights we use to go into that dark

00:21:23.279 --> 00:21:25.519
room and find that structure. It's the invisible

00:21:25.519 --> 00:21:27.660
architecture running beneath the surface of modern

00:21:27.660 --> 00:21:30.160
science. It forces you to realize that very few

00:21:30.160 --> 00:21:32.740
things are truly entirely independent. The world

00:21:32.740 --> 00:21:35.039
is really just a web of conditional probabilities.

00:21:35.279 --> 00:21:36.980
Let's take a second to recap the journey we've

00:21:36.980 --> 00:21:39.119
been on because we have covered a massive amount

00:21:39.119 --> 00:21:41.319
of intellectual ground today. We really have.

00:21:41.380 --> 00:21:44.240
We started with a pile of random variables and

00:21:44.240 --> 00:21:46.240
learned that to cheat the impossible math of

00:21:46.240 --> 00:21:47.839
calculating everything at once, we have to build

00:21:47.839 --> 00:21:51.180
a map, a graph, to map the flow of information.

00:21:51.380 --> 00:21:53.980
Right. If the relationship is just a mutual correlation

00:21:53.980 --> 00:21:57.240
sitting at the same table, we use plain lines,

00:21:57.240 --> 00:21:59.900
undirected Markov models. And we evaluate them

00:21:59.900 --> 00:22:02.309
using those local neighborhood scores. But if

00:22:02.309 --> 00:22:04.690
there is a flow of influence, a parent-child

00:22:04.690 --> 00:22:07.589
dynamic driving events forward, we use directed

00:22:07.589 --> 00:22:10.569
arrows, Bayesian networks. We learned how observing

00:22:10.569 --> 00:22:13.490
a middle event can act like a toll booth, shutting

00:22:13.490 --> 00:22:16.430
down the flow of new information through d-separation.

00:22:17.569 --> 00:22:20.289
And when the world refuses to fit into neat boxes,

00:22:20.549 --> 00:22:23.630
we embrace the hybrids. We allow feedback loops

00:22:23.630 --> 00:22:26.589
with cyclic graphs. We forge diplomatic treaties

00:22:26.589 --> 00:22:29.569
with chain graphs. And we pass messages of belief

00:22:29.569 --> 00:22:32.519
through factor graphs. Adapting the math to the messiness

00:22:32.519 --> 00:22:35.079
of reality, and ultimately using it to decode

00:22:35.079 --> 00:22:38.640
the world around us, from our DNA to human speech.

00:22:38.829 --> 00:22:41.430
It is a dense, heavy topic, but the elegance

00:22:41.430 --> 00:22:43.789
of how it all connects is just incredible. Thank

00:22:43.789 --> 00:22:46.170
you for joining us on this exploration. We tackled

00:22:46.170 --> 00:22:48.789
a highly technical subject today, but hopefully

00:22:48.789 --> 00:22:50.990
you're walking away with a brand new lens through

00:22:50.990 --> 00:22:53.549
which to view the technology you use every single

00:22:53.549 --> 00:22:57.009
day. It certainly changes how you perceive randomness.

00:22:57.309 --> 00:22:59.490
You start seeing the underlying strings connecting

00:22:59.490 --> 00:23:01.369
everything. It really does. And actually, there

00:23:01.369 --> 00:23:03.910
is one final lingering thought I want to leave

00:23:03.910 --> 00:23:06.309
you with. Oh, I like these. The source notes

00:23:06.309 --> 00:23:09.480
over and over that graphical models are used for discovering

00:23:09.480 --> 00:23:12.660
and analyzing structure in complex distributions.

00:23:13.339 --> 00:23:16.039
It makes you wonder, if you were to sit down

00:23:16.039 --> 00:23:19.019
right now and draw a probabilistic graphical

00:23:19.019 --> 00:23:22.200
model of your own life. Wow. OK. Yeah, mapping

00:23:22.200 --> 00:23:25.039
out all the seemingly random variables of your

00:23:25.039 --> 00:23:27.420
daily choices, your habits, your social environment,

00:23:27.519 --> 00:23:30.640
your stress levels, what hidden dependencies

00:23:30.640 --> 00:23:33.640
would you discover? What unexpected parent nodes

00:23:33.640 --> 00:23:35.940
are secretly acting as toll booths in your life,

00:23:36.339 --> 00:23:37.900
controlling your outcomes and pulling the strings

00:23:37.900 --> 00:23:40.099
without you even realizing it? That is a question

00:23:40.099 --> 00:23:42.059
worth spending some time on. Something for you

00:23:42.059 --> 00:23:44.099
to mull over until next time. Keep exploring.
