WEBVTT

00:00:00.000 --> 00:00:03.100
In 1886, a scientist studying human genetics

00:00:03.100 --> 00:00:06.000
realized something that must have seemed kind

00:00:06.000 --> 00:00:08.080
of terrifying at the time. Oh, definitely. He

00:00:08.080 --> 00:00:10.720
was looking at family trees and noticed humanity

00:00:10.720 --> 00:00:12.880
basically appeared to be shrinking toward the

00:00:12.880 --> 00:00:16.660
average. Right. Like unusually tall parents were

00:00:16.660 --> 00:00:19.500
consistently having shorter kids. Exactly. And

00:00:19.500 --> 00:00:21.620
unusually short parents were having taller kids.

00:00:21.839 --> 00:00:23.780
It was like every extreme was just getting pulled

00:00:23.780 --> 00:00:26.079
right back to the middle. And he called this

00:00:26.079 --> 00:00:30.059
phenomenon regression toward mediocrity. Which

00:00:30.059 --> 00:00:33.240
is quite a phrase. It really is. So welcome to

00:00:33.240 --> 00:00:36.119
today's deep dive. We are exploring a really

00:00:36.119 --> 00:00:38.920
comprehensive Wikipedia article on a subject

00:00:38.920 --> 00:00:42.329
that actually grew out of that exact observation:

00:00:42.329 --> 00:00:44.469
linear regression. Yeah, and I know that probably

00:00:44.469 --> 00:00:46.990
sounds like a dusty high school math topic that

00:00:46.990 --> 00:00:48.969
you, you know, tried to forget the second you

00:00:48.969 --> 00:00:52.049
graduated. Yeah, totally. But consider this,

00:00:52.409 --> 00:00:55.490
it is literally the hidden engine powering our

00:00:55.490 --> 00:00:57.570
modern world. It really is. I mean, it's the

00:00:57.570 --> 00:00:59.990
invisible architecture behind financial markets,

00:01:00.270 --> 00:01:02.950
epidemiological research, and maybe most importantly

00:01:02.950 --> 00:01:05.670
right now, it's the absolute bedrock of modern

00:01:05.670 --> 00:01:07.849
artificial intelligence. Right, because when

00:01:07.849 --> 00:01:10.790
you strip away all the layers of a complex neural

00:01:10.790 --> 00:01:13.719
network, you often just find linear regression

00:01:13.719 --> 00:01:16.079
quietly doing the heavy lifting in the background.

00:01:16.640 --> 00:01:20.280
Exactly. So our mission for you today is to demystify

00:01:20.280 --> 00:01:23.299
this statistical powerhouse. The goal here isn't

00:01:23.299 --> 00:01:25.579
to force you to memorize formulas or anything

00:01:25.579 --> 00:01:27.480
like that. No, definitely not. It's really to

00:01:27.480 --> 00:01:29.920
understand how humanity uses math to predict

00:01:29.920 --> 00:01:33.060
the future and, you know, make sense of incredibly

00:01:33.060 --> 00:01:36.099
chaotic data. So, okay, let's unpack this because

00:01:36.099 --> 00:01:38.599
the idea of just drawing a straight line through

00:01:38.599 --> 00:01:41.879
a bunch of messy data points, it seems way too

00:01:41.879 --> 00:01:44.099
simple to be this important. Well, yeah, that

00:01:44.099 --> 00:01:47.180
simplicity is kind of an illusion. The most compelling

00:01:47.180 --> 00:01:49.280
part of linear regression isn't actually the

00:01:49.280 --> 00:01:51.819
mathematics itself. Oh. Yeah, I mean the math

00:01:51.819 --> 00:01:55.000
is just a tool. The truly fascinating part is

00:01:55.000 --> 00:01:57.040
the philosophical assumptions we have to make

00:01:57.040 --> 00:01:59.680
when we try to draw that straight orderly line

00:01:59.680 --> 00:02:02.459
through a deeply messy, unpredictable reality.

00:02:02.640 --> 00:02:05.379
We're imposing order on chaos. Exactly. And that

00:02:05.379 --> 00:02:07.680
comes with some really profound implications.

00:02:08.099 --> 00:02:10.180
So before we can understand how modern AI uses

00:02:10.180 --> 00:02:12.879
this tool to impose that order, we kind of have

00:02:12.879 --> 00:02:15.240
to look at why it was invented in the first place.

00:02:15.419 --> 00:02:17.740
Right. Which actually takes us on a journey from

00:02:17.740 --> 00:02:20.379
looking up at the stars all the way to looking

00:02:20.379 --> 00:02:22.800
right back at ourselves. Because long before

00:02:22.800 --> 00:02:24.719
that geneticist was looking at human heights

00:02:24.719 --> 00:02:27.620
in the 1880s, astronomers were dealing with a

00:02:27.620 --> 00:02:30.379
massive data problem. Yeah, early scientists

00:02:30.379 --> 00:02:33.300
were essentially drowning in observational data

00:02:33.300 --> 00:02:35.939
that simply wouldn't line up. Which makes sense.

00:02:36.199 --> 00:02:38.479
Right. I mean, if we connect this to the bigger

00:02:38.479 --> 00:02:41.280
picture, imagine looking through a 17th century

00:02:41.280 --> 00:02:44.740
telescope. Your lenses are imperfect, right?

00:02:44.969 --> 00:02:47.550
The atmosphere distorts the light. And human

00:02:47.550 --> 00:02:49.610
error creeps into literally every measurement

00:02:49.610 --> 00:02:52.349
you make. Exactly. The source actually mentions

00:02:52.349 --> 00:02:55.389
Isaac Newton dealing with this. In the year 1700,

00:02:55.710 --> 00:02:59.229
Newton scribbled down what we now call the normal

00:02:59.229 --> 00:03:02.669
equations to study the equinoxes. Wow. Yeah.

00:03:03.050 --> 00:03:04.770
He's basically trying to find the mathematical

00:03:04.770 --> 00:03:07.090
equations that would minimize all those observation

00:03:07.090 --> 00:03:10.490
errors to find the optimal true path of the celestial

00:03:10.490 --> 00:03:12.840
bodies. That makes sense. And then... In the

00:03:12.840 --> 00:03:15.960
early 1800s, mathematicians like Legendre and

00:03:15.960 --> 00:03:19.379
Gauss formalized this into a method called least

00:03:19.379 --> 00:03:23.020
squares to predict planetary movement. Right,

00:03:23.020 --> 00:03:24.520
because they were trying to figure out exactly

00:03:24.520 --> 00:03:26.819
where a planet would be, even though their telescopes

00:03:26.819 --> 00:03:28.699
were giving them slightly conflicting readings

00:03:28.699 --> 00:03:31.199
every single night. They were hunting for the

00:03:31.199 --> 00:03:34.199
true signal hidden within all that noise. But

00:03:34.199 --> 00:03:36.819
as you mentioned at the start, the term regression

00:03:36.819 --> 00:03:40.030
didn't actually come from astronomy. No, it came

00:03:40.030 --> 00:03:43.930
from Francis Galton in 1886, observing those

00:03:43.930 --> 00:03:46.969
human heights. The regression toward mediocrity?

00:03:47.370 --> 00:03:49.750
Which, again, sounds so harsh to our modern ears.

00:03:49.830 --> 00:03:52.650
It does, but in a statistical sense, it really

00:03:52.650 --> 00:03:54.909
just meant returning to the center. Today, we

00:03:54.909 --> 00:03:57.490
call it regression toward the mean. Right. Galton

00:03:57.490 --> 00:03:59.509
plotted the parents' heights against the children's

00:03:59.509 --> 00:04:01.990
heights, drew a line through the scattered dots

00:04:01.990 --> 00:04:04.250
to represent that relationship, and the name

00:04:04.250 --> 00:04:06.849
regression just permanently stuck to the mathematical

00:04:06.849 --> 00:04:09.280
process of fitting that line. To ground this

00:04:09.280 --> 00:04:11.560
in how it actually works, the text gives this

00:04:11.560 --> 00:04:14.219
really brilliant analogy. Imagine you're throwing

00:04:14.219 --> 00:04:16.800
a small ball up into the air. If you track its

00:04:16.800 --> 00:04:19.899
height at various fractions of a second, physics

00:04:19.899 --> 00:04:22.579
strictly dictates its path. Right. Gravity is

00:04:22.579 --> 00:04:24.779
pulling it down. The initial velocity is pushing

00:04:24.779 --> 00:04:27.279
it up. Exactly. But when you actually measure

00:04:27.279 --> 00:04:29.500
it, your measurements are going to be slightly

00:04:29.500 --> 00:04:31.920
off, you know, your thumb slips on the stopwatch

00:04:31.920 --> 00:04:34.139
or you blink. There's always human error. Yeah.

00:04:34.779 --> 00:04:38.279
So linear regression is basically a way to estimate

00:04:38.279 --> 00:04:42.980
those hidden true forces, like the exact initial

00:04:42.980 --> 00:04:46.220
velocity of the ball from the messy, imprecise

00:04:46.220 --> 00:04:48.800
data points we actually observe. And what's really

00:04:48.800 --> 00:04:51.439
fascinating here is the fundamental shift in

00:04:51.439 --> 00:04:54.110
how we view the numbers themselves. How so? Well,

00:04:54.149 --> 00:04:57.589
from a pure mathematical perspective, your variables,

00:04:57.870 --> 00:05:00.370
like time and height, are just unknown variables.

00:05:00.769 --> 00:05:02.889
And the parameters, like the force of gravity,

00:05:03.110 --> 00:05:05.550
are fixed constants. Gravity isn't changing.

00:05:05.670 --> 00:05:07.589
Right. But from a statistical perspective, which

00:05:07.589 --> 00:05:10.470
is what regression is, the focus completely flips.

00:05:10.709 --> 00:05:13.170
Once we substitute our actual observed data for

00:05:13.170 --> 00:05:15.930
the variables, the model becomes a function of

00:05:15.930 --> 00:05:18.759
the parameters. I see. The parameters become

00:05:18.759 --> 00:05:21.240
the unknowns that we have to estimate. We are

00:05:21.240 --> 00:05:24.220
basically using the messy data to work backwards

00:05:24.220 --> 00:05:27.420
and guess the hidden rules of the game. We are

00:05:27.420 --> 00:05:29.879
essentially reverse engineering reality. Yes,

00:05:30.279 --> 00:05:32.500
exactly. Okay, so if we're going to estimate

00:05:32.500 --> 00:05:35.079
those hidden forces, we need to peek under the

00:05:35.079 --> 00:05:38.329
hood at the anatomy of this line. The core mechanism

00:05:38.329 --> 00:05:41.750
doing the heavy lifting here is ordinary least squares,

00:05:41.970 --> 00:05:44.910
or OLS, and something called the mean squared

00:05:44.910 --> 00:05:47.470
error. Right. The basic formula for this is usually

00:05:47.470 --> 00:05:51.029
written as y equals beta zero plus beta one times

00:05:51.029 --> 00:05:53.629
x plus epsilon. Okay. Let's apply that formula

00:05:53.629 --> 00:05:55.470
directly to Galton's family so you can really

00:05:55.470 --> 00:05:58.470
visualize it. Y is the dependent variable. That's

00:05:58.470 --> 00:06:00.189
the outcome we want to guess. In this case, the

00:06:00.189 --> 00:06:03.089
child's height. Yeah. We are trying to predict

00:06:03.089 --> 00:06:05.779
that based on x, our independent variable, which

00:06:05.779 --> 00:06:07.519
is the parent's height. And those beta symbols

00:06:07.519 --> 00:06:09.720
are the parameters. They represent the intercept

00:06:09.720 --> 00:06:11.920
and the slope of our line. Right. They physically

00:06:11.920 --> 00:06:13.920
define the angle and position of the straight

00:06:13.920 --> 00:06:16.699
line we're drawing across the graph. But the

00:06:16.699 --> 00:06:18.639
most crucial part of that entire equation is

00:06:18.639 --> 00:06:21.680
actually the very last term, epsilon. Epsilon.

00:06:22.000 --> 00:06:25.060
The error term. The noise. Yeah. Epsilon acknowledges

00:06:25.060 --> 00:06:27.620
that our line will literally never be perfect.

00:06:28.300 --> 00:06:30.819
A child's height isn't purely dictated by their

00:06:30.819 --> 00:06:32.199
parent's height. Right. There's a lot of other

00:06:32.199 --> 00:06:35.519
stuff going on. Exactly. It captures all the

00:06:35.519 --> 00:06:38.040
other hidden factors, childhood diet, illness,

00:06:38.699 --> 00:06:41.439
random genetic mutation, that influence Y but

00:06:41.439 --> 00:06:43.620
just aren't included in our single X variable.

00:06:44.339 --> 00:06:47.060
So the goal of ordinary least squares is to find

00:06:47.060 --> 00:06:50.920
the specific beta values, the exact line that

00:06:50.920 --> 00:06:53.519
makes those epsilon errors as small as possible

00:06:53.519 --> 00:06:56.600
overall. Yep. And it does that by drawing a line

00:06:56.600 --> 00:06:58.980
that minimizes the sum of the squared errors.

00:06:59.230 --> 00:07:01.649
You take the distance from every single messy

00:07:01.649 --> 00:07:04.189
data point to your proposed perfect line, you

00:07:04.189 --> 00:07:05.750
square that distance, and you add them all up.

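NOTE
A minimal Python sketch of the ordinary least squares fit just described, using the closed-form slope and intercept. The parent/child heights here are invented for illustration, not Galton's data.
parent = [64.0, 66.0, 68.0, 70.0, 72.0, 74.0]  # x: parent height, inches
child = [66.2, 66.9, 68.1, 69.0, 69.8, 70.9]   # y: child height, inches
n = len(parent)
mean_x = sum(parent) / n
mean_y = sum(child) / n
# OLS slope: covariance of x and y divided by the variance of x.
beta1 = sum((x - mean_x) * (y - mean_y) for x, y in zip(parent, child)) \
    / sum((x - mean_x) ** 2 for x in parent)
beta0 = mean_y - beta1 * mean_x  # the fitted line passes through the means
print(f"child = {beta0:.1f} + {beta1:.2f} * parent")  # slope below 1: regression toward the mean
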
00:07:06.009 --> 00:07:07.990
And you want that final number to be as tiny

00:07:07.990 --> 00:07:10.529
as possible. Exactly. But wait, why are we squaring

00:07:10.529 --> 00:07:12.529
the errors though? I mean, if we just want the

00:07:12.529 --> 00:07:14.410
line that is closest to all the dots, why not

00:07:14.410 --> 00:07:16.430
just measure the absolute distance from the dot

00:07:16.430 --> 00:07:18.870
to the line and minimize that? That is a great

00:07:18.870 --> 00:07:21.220
question. Mathematically, squaring the error

00:07:21.220 --> 00:07:24.379
does something very specific. It penalizes large

00:07:24.379 --> 00:07:27.040
errors quadratically, far more than small errors.

00:07:27.240 --> 00:07:29.980
Quadratically. Yeah. If a data point is one unit

00:07:29.980 --> 00:07:32.120
away from your line, the squared error is one.

00:07:32.480 --> 00:07:34.639
But if a point is four units away, the squared

00:07:34.639 --> 00:07:37.879
error isn't four, it's 16. Wow. So the model

00:07:37.879 --> 00:07:40.439
becomes absolutely terrified of being really

00:07:40.439 --> 00:07:43.199
far off from any single point. It will actively

00:07:43.199 --> 00:07:46.060
twist and warp the angle of the line just to

00:07:46.060 --> 00:07:49.490
avoid those massive squared penalties. But

00:07:49.490 --> 00:07:51.790
that seems incredibly vulnerable. Like, what

00:07:51.790 --> 00:07:53.829
if you have a hundred normal data points and

00:07:53.829 --> 00:07:57.269
one point that is wildly, ridiculously far away?

00:07:57.310 --> 00:07:59.959
Like an outlier. Yeah. Like maybe a data entry

00:07:59.959 --> 00:08:02.160
typo where someone typed a height of 600 inches

00:08:02.160 --> 00:08:05.019
instead of 60. The model will assign so much

00:08:05.019 --> 00:08:08.000
importance to that one huge squared error that

00:08:08.000 --> 00:08:10.259
it will pull the entire regression line away

00:08:10.259 --> 00:08:12.579
from the true data just to accommodate the typo.

00:08:12.720 --> 00:08:14.740
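NOTE
A small sketch of the typo scenario above: the same closed-form fit, run with and without a 600-inch data-entry error. All numbers invented.
def ols(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b1 = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) \
        / sum((x - mx) ** 2 for x in xs)
    return my - b1 * mx, b1  # (intercept, slope)
parents = [64, 66, 68, 70, 72]
clean = [66, 67, 68, 69, 70]
typo = [66, 67, 68, 69, 600]  # 600 typed instead of 60-something
print(ols(parents, clean))  # slope 0.5: a sensible line
print(ols(parents, typo))   # the single wild point drags the line far from the true data
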
You're exactly right. One bad piece of data can

00:08:14.740 --> 00:08:17.620
completely derail ordinary least squares. That's

00:08:17.620 --> 00:08:19.860
crazy. The source text actually notes that alternative

00:08:19.860 --> 00:08:22.540
methods do exist to counter this. There's a technique

00:08:22.540 --> 00:08:25.279
called least absolute deviation, which does exactly

00:08:25.279 --> 00:08:27.779
what you suggested earlier. It minimizes the

00:08:27.779 --> 00:08:30.000
absolute distance without squaring it. Yeah,

00:08:30.139 --> 00:08:32.720
it's much more robust to massive outliers, but

00:08:32.720 --> 00:08:35.139
historically it was computationally harder and

00:08:35.139 --> 00:08:37.419
just less statistically efficient to calculate

00:08:37.419 --> 00:08:40.120
by hand, which is why the vulnerable ordinary

00:08:40.120 --> 00:08:42.659
least squares became the gold standard. Okay,

00:08:42.679 --> 00:08:45.500
but if the standard math is this fragile, if

00:08:45.500 --> 00:08:49.549
one bad data point can warp the whole line, how

00:08:49.549 --> 00:08:52.750
could statisticians possibly trust it for high

00:08:52.750 --> 00:08:54.950
stakes interpretations of reality? Well, they

00:08:54.950 --> 00:08:57.509
have to be very careful. This actually introduces

00:08:57.509 --> 00:09:01.679
a major theme in our sources: the danger of blind

00:09:01.679 --> 00:09:04.740
math. Statistics can easily lie to you if you

00:09:04.740 --> 00:09:07.179
just blindly trust the formula without looking

00:09:07.179 --> 00:09:09.700
at the real world context. Yes. Take Anscombe's

00:09:09.700 --> 00:09:11.940
Quartet, which is a famous statistical trap mentioned

00:09:11.940 --> 00:09:14.059
in the text. Anscombe's Quartet is a perfect

00:09:14.059 --> 00:09:16.960
example. It consists of four entirely different

00:09:16.960 --> 00:09:19.759
data sets. If you graph them, honestly, a child

00:09:19.759 --> 00:09:21.240
could tell you they represent four different

00:09:21.240 --> 00:09:23.820
phenomena. Right. One is a messy blob of dots.

00:09:23.919 --> 00:09:27.000
One is a perfect sweeping curve. One is a tight

00:09:27.000 --> 00:09:29.440
straight line with one massive outlier pulling

00:09:29.360 --> 00:09:31.419
the math, and one is just a vertical stack of

00:09:31.419 --> 00:09:33.919
dots with one dot off to the side. So visually

00:09:33.919 --> 00:09:35.820
they literally have nothing in common. Nothing.

00:09:35.840 --> 00:09:38.320
But if you run a simple linear regression on

00:09:38.320 --> 00:09:42.000
all four of those data sets, the math produces

00:09:42.000 --> 00:09:44.740
the exact same regression line. Yep. It outputs

00:09:44.740 --> 00:09:48.039
the exact same mean. It outputs the same standard

00:09:48.039 --> 00:09:50.620
deviation, meaning the math calculates that the

00:09:50.620 --> 00:09:53.860
data spreads out from the average by the exact

00:09:53.860 --> 00:09:56.919
same amount overall. It's wild. And it gives

00:09:56.919 --> 00:09:59.740
the same correlation coefficient, meaning the

00:09:59.740 --> 00:10:02.000
formula thinks the variables are moving together

00:10:02.000 --> 00:10:05.019
in the exact same way across all four graphs.

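NOTE
A quick check of the Anscombe's Quartet claim above, using the standard published values; statistics.correlation needs Python 3.10+.
from statistics import mean, stdev, correlation
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
y1 = [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]
y2 = [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]
y3 = [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]
x4 = [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8]
y4 = [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]
for x, y in [(x123, y1), (x123, y2), (x123, y3), (x4, y4)]:
    # Four very different shapes, nearly identical summary numbers.
    print(round(mean(y), 2), round(stdev(y), 2), round(correlation(x, y), 2))
# All four also share (to two decimals) the fitted line y = 3.00 + 0.50 x.
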
00:10:05.259 --> 00:10:07.820
It is the ultimate warning label on the side

00:10:07.820 --> 00:10:09.679
of the regression box. Always graph your data

00:10:09.679 --> 00:10:12.200
first. Always. Do not just trust the mathematical

00:10:12.200 --> 00:10:14.500
summary because the math cannot see the shape

00:10:14.500 --> 00:10:17.139
of reality. It only sees the numbers you feed

00:10:17.139 --> 00:10:20.360
it. Right. And honestly, the dangers go far beyond

00:10:20.360 --> 00:10:23.600
visual illusions. There is a major interpretation

00:10:23.600 --> 00:10:26.039
issue with something called the held fixed fallacy.

00:10:26.059 --> 00:10:28.440
Oh, yeah. When you move from simple regression

00:10:28.399 --> 00:10:30.940
to multiple regression, meaning you have many

00:10:30.940 --> 00:10:33.919
x variables predicting one y. The way you interpret

00:10:33.919 --> 00:10:36.559
the results gets really tricky. Right. Let's

00:10:36.559 --> 00:10:38.919
say you're predicting a house price based on

00:10:38.919 --> 00:10:41.419
square footage and the number of bedrooms. The

00:10:41.419 --> 00:10:43.639
standard interpretation of the regression coefficient

00:10:43.639 --> 00:10:46.500
is, okay, this is the expected change in the

00:10:46.500 --> 00:10:48.539
price when the square footage changes by one

00:10:48.539 --> 00:10:50.940
unit while all other variables are held fixed.

00:10:51.200 --> 00:10:54.340
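NOTE
A sketch of the house-price example above as a multiple regression, solved by least squares with NumPy. The prices and sizes are hypothetical.
import numpy as np
# Columns: intercept, square footage, bedrooms.
X = np.array([[1, 1000, 2],
              [1, 1500, 3],
              [1, 2000, 3],
              [1, 2500, 4],
              [1, 3000, 5]], dtype=float)
y = np.array([200_000, 255_000, 310_000, 370_000, 440_000], dtype=float)
beta, *_ = np.linalg.lstsq(X, y, rcond=None)  # minimizes the sum of squared errors
print(beta)
# beta[1]: expected price change per extra square foot, bedrooms "held fixed";
# beta[2]: expected change per extra bedroom, square footage "held fixed".
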
Exactly. The source uses epidemiology to explain

00:10:54.340 --> 00:10:58.169
why this held fixed concept is necessary. Early

00:10:58.169 --> 00:11:00.710
studies used observational regression to link

00:11:00.710 --> 00:11:04.769
tobacco smoking to mortality. If you just regress

00:11:04.769 --> 00:11:07.730
lifespan against smoking, your results will be

00:11:07.730 --> 00:11:10.340
skewed by confounding variables. You have to

00:11:10.340 --> 00:11:12.539
include other variables like education and income

00:11:12.539 --> 00:11:15.139
in the exact same mathematical model. Because

00:11:15.139 --> 00:11:16.940
otherwise you're not getting the real picture.

00:11:17.240 --> 00:11:20.059
Right. That way you ensure the shorter lifespan

00:11:20.059 --> 00:11:22.539
you are seeing is actually isolated to the smoking

00:11:22.539 --> 00:11:25.679
and not just reflecting broader socioeconomic

00:11:25.679 --> 00:11:28.799
factors. You are trying to find the unique isolated

00:11:28.799 --> 00:11:32.100
effect of smoking mathematically assuming education

00:11:32.100 --> 00:11:34.740
and income are held fixed in place. But here's

00:11:34.740 --> 00:11:37.159
where it gets really interesting. How can you

00:11:37.159 --> 00:11:39.620
possibly hold a variable fixed in the real world

00:11:39.620 --> 00:11:41.679
if things naturally move together? You often

00:11:41.679 --> 00:11:45.240
can't. Like using the house example. You can't

00:11:45.240 --> 00:11:47.480
realistically increase the square footage of

00:11:47.480 --> 00:11:50.620
a house by 2,000 square feet while holding the number

00:11:50.620 --> 00:11:53.539
of bedrooms fixed at one. Those two things are

00:11:53.539 --> 00:11:55.919
deeply correlated in reality. You've hit on one

00:11:55.919 --> 00:11:58.320
of the most significant vulnerabilities of multiple

00:11:58.320 --> 00:12:01.419
regression. It's a problem known as multicollinearity.

00:12:01.799 --> 00:12:04.230
Multicollinearity. Yeah. When predictor variables

00:12:04.230 --> 00:12:06.990
are strongly correlated in the real world, the

00:12:06.990 --> 00:12:09.129
mathematical assumption that you can increase

00:12:09.129 --> 00:12:11.809
one while holding the others perfectly constant

00:12:11.809 --> 00:12:15.789
becomes a complete fantasy. The source text emphasizes

00:12:15.789 --> 00:12:18.909
that in these highly entangled situations, relying

00:12:18.909 --> 00:12:21.590
on the individual isolated effects of a single

00:12:21.590 --> 00:12:24.529
variable becomes problematic and often entirely

00:12:24.529 --> 00:12:26.610
meaningless. So the math will confidently give

00:12:26.610 --> 00:12:28.590
you a number for what happens when you isolate

00:12:28.590 --> 00:12:31.629
the variable. The reality it describes just doesn't

00:12:31.629 --> 00:12:34.850
actually exist. Precisely. To solve this, statisticians

00:12:34.850 --> 00:12:37.309
have to stop looking at isolated variables and

00:12:37.309 --> 00:12:40.389
instead look at group effects. Instead of asking

00:12:40.389 --> 00:12:43.129
what happens when one variable changes alone

00:12:43.129 --> 00:12:45.789
in a vacuum, they have to calculate what happens

00:12:45.789 --> 00:12:48.049
when the whole group of correlated variables

00:12:48.049 --> 00:12:50.769
changes together. Because, well, that is how

00:12:50.769 --> 00:12:53.610
nature actually operates. The real world doesn't

00:12:53.610 --> 00:12:56.539
play by isolated rules. And it certainly doesn't

00:12:56.539 --> 00:12:58.460
always move in straight lines. Yeah, it doesn't.

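NOTE
A sketch of multicollinearity and group effects as described above: two nearly identical predictors. The individual coefficients swing between resamples, but their group effect (the sum) stays stable. All data simulated.
import numpy as np
rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = x1 + rng.normal(scale=0.01, size=200)  # x2 is almost a copy of x1
y = 3 * x1 + 3 * x2 + rng.normal(scale=0.5, size=200)
for _ in range(3):
    rows = rng.choice(200, size=150, replace=False)  # refit on a random subset
    X = np.column_stack([x1[rows], x2[rows]])
    b, *_ = np.linalg.lstsq(X, y[rows], rcond=None)
    print(np.round(b, 1), "group effect:", round(b.sum(), 2))
# b[0] and b[1] individually are nearly meaningless; b[0] + b[1] stays near 6.
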
00:12:58.580 --> 00:13:01.559
Because reality is full of non-linear relationships,

00:13:02.120 --> 00:13:05.279
this basic tool actually had to adapt and evolve

00:13:05.279 --> 00:13:09.259
to survive, which requires clearing up a massive,

00:13:09.700 --> 00:13:12.320
very common misconception about the word linear.

00:13:12.509 --> 00:13:15.129
Oh, this is a big one. When we say linear regression,

00:13:15.470 --> 00:13:17.210
everyone pictures a perfectly straight line on

00:13:17.210 --> 00:13:19.730
a graph. But linear regression doesn't strictly

00:13:19.730 --> 00:13:21.889
mean the physical shape on the graph has to be

00:13:21.889 --> 00:13:24.070
a straight line. Right. It goes back to our earlier

00:13:24.070 --> 00:13:25.889
distinction between variables and parameters.

00:13:26.210 --> 00:13:28.590
The model is linear in its unknown parameters,

00:13:28.929 --> 00:13:31.269
not necessarily in its variables. OK, unpack

00:13:31.269 --> 00:13:33.909
that a bit. So the beta coefficients, the parameters

00:13:33.909 --> 00:13:36.669
we are trying to estimate, must be linear. They

00:13:36.669 --> 00:13:39.799
are only raised to the power of 1, but our x

00:13:39.799 --> 00:13:42.600
variables, the raw data we feed into the machine,

00:13:43.039 --> 00:13:45.659
we can transform those however we want before

00:13:45.659 --> 00:13:48.419
we run the regression. So we can square the x

00:13:48.419 --> 00:13:50.399
variable. Yes. We can take the logarithm of the

00:13:50.399 --> 00:13:53.279
x variable. The formula could be y equals beta

00:13:53.279 --> 00:13:57.799
0 plus beta 1 times x squared. Exactly. The x

00:13:57.799 --> 00:14:00.480
is squared, which physically creates a sweeping

00:14:00.480 --> 00:14:03.600
curve on your visual graph, but the beta, the

00:14:03.600 --> 00:14:05.580
parameter of the regression is actually estimating

00:14:05.580 --> 00:14:09.200
to fit that curve, is still just a simple linear

00:14:09.200 --> 00:14:12.679
multiplier. So technically, fitting a curve to

00:14:12.679 --> 00:14:15.759
data this way is still linear regression. Yeah.

00:14:16.139 --> 00:14:19.200
You are fitting a linear model to nonlinear data
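NOTE
A sketch of the point above: fitting a curve by transforming the input before the regression. The model stays linear in the betas even though x is squared. Ball-toss numbers invented.
import numpy as np
t = np.array([0.0, 0.2, 0.4, 0.6, 0.8, 1.0])       # time in seconds
h = np.array([0.1, 1.9, 3.1, 3.7, 3.5, 2.6])       # noisy measured heights
X = np.column_stack([np.ones_like(t), t, t ** 2])  # transformed inputs: 1, t, t squared
beta, *_ = np.linalg.lstsq(X, h, rcond=None)
print(beta)  # b0 + b1*t + b2*t^2 draws a curve, yet each beta is a plain multiplier
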

00:14:19.200 --> 00:14:22.379
simply by transforming the inputs. That's incredibly

00:14:22.379 --> 00:14:24.340
clever. And the evolution didn't stop at curves

00:14:24.340 --> 00:14:27.269
either. The framework expanded into what are

00:14:27.269 --> 00:14:30.809
called generalized linear models or GLMs. These

00:14:30.809 --> 00:14:32.629
were developed for when the real world doesn't

00:14:32.629 --> 00:14:35.009
give you simple, continuous, easily measurable

00:14:35.009 --> 00:14:37.009
numbers to predict. Right, because not everything

00:14:37.009 --> 00:14:39.070
is a fluid measurement like a price or a height.

00:14:39.429 --> 00:14:41.190
Sometimes you're counting discrete events. Like

00:14:41.190 --> 00:14:44.269
how many people are in a room. Yeah. The source

00:14:44.269 --> 00:14:46.860
mentions Poisson regression. Which is a type

00:14:46.860 --> 00:14:49.220
of GLM used for modeling positive quantities

00:14:49.220 --> 00:14:52.240
that scale wildly, like counting the population

00:14:52.240 --> 00:14:54.419
of a city or the number of cars passing through

00:14:54.419 --> 00:14:56.879
an intersection. Or consider predicting a binary

00:14:56.879 --> 00:14:59.840
choice, like a coin flip, or whether a patient

00:14:59.840 --> 00:15:02.299
has a specific disease or not. A straight line

00:15:02.299 --> 00:15:04.259
going off into infinity just doesn't make any

00:15:04.259 --> 00:15:06.720
sense for a simple yes or no question. Right.

00:15:07.519 --> 00:15:10.120
So statisticians use another GLM called logistic

00:15:10.120 --> 00:15:12.840
regression. OK, but how does that actually map

00:15:12.840 --> 00:15:15.440
a straight line to a yes or no question? Well,

00:15:15.759 --> 00:15:18.220
instead of the line plotting a raw value from

00:15:18.220 --> 00:15:21.259
negative infinity to infinity, the math basically bends the

00:15:21.259 --> 00:15:24.389
output into an S-curve, clamping it so the result

00:15:24.389 --> 00:15:26.750
is always a probability between 0 and 1. Oh, I

00:15:26.750 --> 00:15:28.870
see. It takes the linear math and translates

00:15:28.870 --> 00:15:31.509
it into a probability. So a 0.8 output means

00:15:31.509 --> 00:15:34.309
there is an 80% chance the answer is yes. Wow.

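NOTE
A sketch of the S-curve just described: logistic regression passes the linear score through a sigmoid so the output lands between 0 and 1. The coefficients here are invented.
import math
def predict_proba(x, b0=-10.0, b1=2.0):
    score = b0 + b1 * x                # the familiar linear part
    return 1 / (1 + math.exp(-score))  # the sigmoid clamps it to (0, 1)
print(round(predict_proba(5.69), 2))   # about 0.8: an 80% chance of "yes"
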
00:15:34.789 --> 00:15:36.870
This tool just morphs to fit whatever domain

00:15:36.870 --> 00:15:40.070
it enters. It really does. The real-world applications

00:15:40.070 --> 00:15:42.370
mentioned in the text are staggering. Like, take

00:15:42.370 --> 00:15:44.769
building science. There's an ongoing debate over

00:15:44.769 --> 00:15:47.389
thermal comfort scales for designing HVAC systems.

00:15:47.730 --> 00:15:50.580
Oh yeah, the temperature debate. Yeah. You ask

00:15:50.580 --> 00:15:53.759
people in an office to rate how they feel on

00:15:53.759 --> 00:15:56.240
a scale from negative three, which is cold, to

00:15:56.240 --> 00:15:59.360
positive three, which is hot. The debate is entirely

00:15:59.360 --> 00:16:02.179
about the direction of the regression. Do you

00:16:02.179 --> 00:16:04.600
regress the comfort votes against the room temperature,

00:16:05.100 --> 00:16:07.019
treating the temperature as the independent variable?

00:16:07.679 --> 00:16:09.779
Or do you do the opposite, regressing the temperature

00:16:09.779 --> 00:16:12.080
against the comfort votes to find the ideal neutral

00:16:12.080 --> 00:16:14.700
setting? And deciding which variable is X and

00:16:14.700 --> 00:16:17.320
which is Y completely changes the mathematical

00:16:17.320 --> 00:16:19.899
outcome of the model. Wait, really? Just flipping

00:16:19.899 --> 00:16:23.059
them? Yes. If you flip the assumption of what

00:16:23.059 --> 00:16:25.700
is causing what, you calculate a completely different

00:16:25.700 --> 00:16:28.539
optimal temperature, which physically changes

00:16:28.539 --> 00:16:30.440
how the architecture of the building's climate

00:16:30.440 --> 00:16:33.179
control is calibrated. That is wild. And then

00:16:33.179 --> 00:16:36.080
you have finance. The capital asset pricing model,

00:16:36.299 --> 00:16:39.840
the CAPM, uses the beta coefficient of a linear

00:16:39.840 --> 00:16:42.460
regression to quantify investment risk. Right.

00:16:42.720 --> 00:16:45.139
It measures how much a specific stock jumps around

00:16:45.139 --> 00:16:48.019
compared to the entire stock market. If a trader

00:16:48.019 --> 00:16:50.960
sees a stock with a beta of 1, it means the math

00:16:50.960 --> 00:16:53.240
shows the stock moves perfectly with the market.

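NOTE
A sketch of beta as described above: the slope from regressing a stock's returns on the market's returns. The return series are invented.
market = [0.010, -0.020, 0.015, 0.030, -0.010, 0.005]
stock = [0.022, -0.038, 0.028, 0.061, -0.018, 0.012]  # roughly double the market's moves
n = len(market)
mx, my = sum(market) / n, sum(stock) / n
cov = sum((m - mx) * (s - my) for m, s in zip(market, stock))
var = sum((m - mx) ** 2 for m in market)
print("beta:", round(cov / var, 2))  # close to 2: about twice as volatile as the market
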
00:16:53.679 --> 00:16:55.299
Right, it tracks it exactly. But if it has a

00:16:55.299 --> 00:16:58.559
beta of 2, it's twice as volatile. For every

00:16:58.559 --> 00:17:03.039
1% the market drops, that stock drops 2%. That

00:17:03.039 --> 00:17:05.799
single regression coefficient quantifies the

00:17:05.799 --> 00:17:09.299
exact premium a trader needs to demand for taking

00:17:09.299 --> 00:17:11.759
on that extra risk. Wall Street basically runs

00:17:11.759 --> 00:17:14.319
on this simple mathematical relationship. And

00:17:14.319 --> 00:17:16.680
of course, we cannot ignore machine learning.

00:17:16.759 --> 00:17:19.819
No, absolutely not. Linear regression is one

00:17:19.819 --> 00:17:22.019
of the fundamental supervised machine learning

00:17:22.019 --> 00:17:24.480
algorithms. So what does this all mean for you

00:17:24.480 --> 00:17:27.440
listening? It means that before artificial intelligence

00:17:27.440 --> 00:17:30.579
could generate photorealistic art or write code

00:17:30.579 --> 00:17:33.680
or drive cars, it first had to master the basic

00:17:33.680 --> 00:17:36.059
act of drawing the best possible line through

00:17:36.059 --> 00:17:38.619
the noise. It is the absolute foundation of learning

00:17:38.619 --> 00:17:40.900
from data. It's the foundational act of taking

00:17:40.900 --> 00:17:43.819
known inputs and mapped outputs and finding the

00:17:43.819 --> 00:17:46.079
mathematical bridge between them. We have covered

00:17:46.079 --> 00:17:48.559
incredible ground today. We started with 18th

00:17:48.559 --> 00:17:50.640
century astronomers scribbling equations to track

00:17:50.640 --> 00:17:53.380
planets and we followed that exact same mathematical

00:17:53.380 --> 00:17:56.559
thread. We saw Galton mapping human heights,

00:17:56.940 --> 00:18:00.180
we navigated the visual traps of Anscombe's quartet,

00:18:00.599 --> 00:18:02.920
we challenged the illusion of holding variables

00:18:02.920 --> 00:18:06.180
fixed in a messy world, all the way to the invisible

00:18:06.180 --> 00:18:09.299
architecture that predicts stock risks and trains

00:18:09.299 --> 00:18:11.980
modern artificial intelligence. It's really a

00:18:11.980 --> 00:18:15.400
testament to the enduring human need to find

00:18:15.400 --> 00:18:18.609
the signal in the noise. Definitely. But I actually

00:18:18.609 --> 00:18:20.269
want to leave you with a final thought, drawn

00:18:20.269 --> 00:18:22.509
from an advanced technique mentioned in the text

00:18:22.509 --> 00:18:25.630
called regularized regression. These are methods

00:18:25.630 --> 00:18:29.109
like ridge and lasso regression. We talked earlier

00:18:29.109 --> 00:18:31.470
about how ordinary least squares tries to find

00:18:31.470 --> 00:18:34.210
the absolutely perfect fit for the data it is

00:18:34.210 --> 00:18:37.769
given, minimizing that error to zero if possible.

00:18:37.950 --> 00:18:40.670
Yeah, that terrifying penalty for outliers. Exactly.

00:18:41.069 --> 00:18:43.910
But sometimes a perfect fit on your past data

00:18:43.910 --> 00:18:45.829
makes you terrible at predicting the future.

00:18:46.730 --> 00:18:48.910
Right, because the model memorizes all the random noise and outliers instead of

00:18:48.910 --> 00:18:51.730
learning the actual underlying trend. Precisely.

00:18:51.789 --> 00:18:54.309
It's a mathematical failure called overfitting.

00:18:54.950 --> 00:18:57.329
Your model becomes perfectly adapted to the past

00:18:57.329 --> 00:19:00.529
and utterly useless for the future. So regularized

00:19:00.529 --> 00:19:03.009
regression does something highly counterintuitive.

00:19:03.349 --> 00:19:06.109
It deliberately introduces mathematical bias

00:19:06.109 --> 00:19:08.829
into the estimation. Wait, it intentionally makes

00:19:08.829 --> 00:19:10.930
its own calculations a little bit wrong on the

00:19:10.930 --> 00:19:13.789
training data? Yes. It deliberately accepts a

00:19:13.789 --> 00:19:16.509
mathematical bias, a slight deviation from the

00:19:16.509 --> 00:19:19.450
pure data in order to drastically reduce its

00:19:19.450 --> 00:19:22.589
variance when it faces brand new unseen data.

00:19:23.069 --> 00:19:25.250
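NOTE
A sketch of ridge regression as described above: the normal equations plus a penalty lambda that deliberately biases coefficients toward zero. Data simulated.
import numpy as np
rng = np.random.default_rng(1)
X = rng.normal(size=(20, 5))
true_beta = np.array([2.0, 0.0, -1.0, 0.0, 0.5])
y = X @ true_beta + rng.normal(scale=1.0, size=20)
for lam in [0.0, 1.0, 10.0]:
    # Closed form: (X'X + lambda*I)^-1 X'y; lam = 0 is plain least squares.
    beta = np.linalg.solve(X.T @ X + lam * np.eye(5), X.T @ y)
    print(lam, np.round(beta, 2))  # bigger lambda, smaller (more biased) betas
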
Wow. It essentially calculates that trying to

00:19:25.250 --> 00:19:27.910
be absolutely perfect makes the system too fragile.

00:19:28.490 --> 00:19:30.769
By accepting a little bias, the algorithm becomes

00:19:30.769 --> 00:19:33.750
vastly more robust and useful when it hits the

00:19:33.750 --> 00:19:36.750
messy reality of the real world. The algorithm

00:19:36.750 --> 00:19:39.190
is literally programmed to abandon perfection

00:19:39.190 --> 00:19:41.970
to survive. Exactly. And this raises an important

00:19:41.970 --> 00:19:43.730
question I want you to mull over. OK. If the

00:19:43.730 --> 00:19:45.660
most advanced predictive algorithms in the world,

00:19:45.980 --> 00:19:47.859
the ones managing our finances and driving our

00:19:47.859 --> 00:19:50.440
cars, are literally programmed to accept a little

00:19:50.440 --> 00:19:53.019
bit of bias just to function better and survive

00:19:53.019 --> 00:19:56.160
in reality. What does that say about our own

00:19:56.160 --> 00:19:59.180
human pursuit of pure, absolute, unbiased truth?

00:19:59.440 --> 00:20:01.829
That is a phenomenal question to end on. Because

00:20:01.829 --> 00:20:04.410
whether you're a 1700s astronomer peering through

00:20:04.410 --> 00:20:07.829
a foggy telescope, or a modern AI crunching millions

00:20:07.829 --> 00:20:10.630
of data points, you're still just trying to draw

00:20:10.630 --> 00:20:13.029
a line through a very chaotic reality. Thank

00:20:13.029 --> 00:20:14.730
you for joining us on this deep dive. We will

00:20:14.730 --> 00:20:15.309
see you next time.
