WEBVTT

00:00:00.000 --> 00:00:02.359
You know, when we try to predict the future,

00:00:03.140 --> 00:00:05.599
we usually picture looking forward, like, I don't

00:00:05.599 --> 00:00:09.519
know, driving a car, staring through a nice,

00:00:09.660 --> 00:00:11.779
clean windshield to see what's coming up over

00:00:11.779 --> 00:00:13.859
the horizon. Right, yeah, it's a very natural,

00:00:14.160 --> 00:00:16.420
forward-facing kind of visual. We want to see

00:00:16.420 --> 00:00:18.300
the destination before we actually get there.

00:00:18.579 --> 00:00:20.960
Exactly, you're watching for obstacles, sharp

00:00:20.960 --> 00:00:24.660
turns, changing road conditions, but imagine

00:00:24.660 --> 00:00:26.140
you were driving a vehicle where the windshield

00:00:26.140 --> 00:00:28.640
was just completely blacked out, like welded

00:00:28.640 --> 00:00:32.140
shut. Oh, wow. Yeah, and your absolute only way

00:00:32.140 --> 00:00:33.880
to steer, the only way to figure out where you're

00:00:33.880 --> 00:00:37.100
going is by staring intensely into the rear-view

00:00:36.969 --> 00:00:40.530
mirror at the road you've just left behind. I

00:00:40.530 --> 00:00:42.329
mean, that sounds like a frankly terrifying way

00:00:42.329 --> 00:00:45.250
to drive on a highway. Right. But mathematically

00:00:45.250 --> 00:00:48.929
speaking, that's the exact mechanism powering

00:00:48.929 --> 00:00:50.969
some of the most sophisticated predictive engines

00:00:50.969 --> 00:00:53.649
in our modern world. I mean, it is the fundamental

00:00:53.649 --> 00:00:56.869
definition of an autoregressive process. Welcome

00:00:56.869 --> 00:01:00.329
to the deep dive. Today we are decoding the invisible

00:01:00.329 --> 00:01:02.990
math that basically runs the world around you.

00:01:03.509 --> 00:01:06.170
We're going to unpack the autoregressive model

00:01:06.170 --> 00:01:09.230
or AR model for short. Yeah, it's a huge topic.

00:01:09.359 --> 00:01:12.439
It really is. Think of this as your shortcut

00:01:12.439 --> 00:01:15.099
to understanding how scientists, economists,

00:01:15.459 --> 00:01:18.840
engineers actually forecast tomorrow based entirely

00:01:18.840 --> 00:01:21.219
on yesterday. We're going to explore what an

00:01:21.219 --> 00:01:23.939
AR model physically does, how it prevents the

00:01:23.939 --> 00:01:27.359
stock market from breaking math itself, and why

00:01:27.359 --> 00:01:29.799
the cutting edge AI you interact with every single

00:01:29.799 --> 00:01:32.500
day actually borrows its name. It really is the

00:01:32.500 --> 00:01:37.250
hidden engine of modern forecasting. The actual

00:01:37.250 --> 00:01:39.650
statistical equations might look like this intimidating

00:01:39.650 --> 00:01:42.030
wall of Greek letters on the surface. Oh, definitely.

00:01:42.109 --> 00:01:43.989
The underlying logic is something we all use

00:01:43.989 --> 00:01:46.510
intuitively every single day. We just need to

00:01:46.510 --> 00:01:49.010
unpack the why behind those formulas. Okay, let's

00:01:49.010 --> 00:01:51.450
unpack this because before we get into predicting

00:01:51.450 --> 00:01:53.590
financial crashes or massive weather events,

00:01:54.230 --> 00:01:57.670
we need to build the foundation. What does autoregressive

00:01:57.670 --> 00:02:01.040
actually mean in a practical sense? Because when

00:02:01.040 --> 00:02:04.280
we dig into the material, an AR model is defined

00:02:04.280 --> 00:02:07.420
as a, quote, stochastic difference equation,

00:02:07.900 --> 00:02:09.879
where the current value is linearly dependent

00:02:09.879 --> 00:02:12.439
on its own previous values, plus a random white

00:02:12.439 --> 00:02:15.400
noise or shock term. Yeah. So let's strip the

00:02:15.400 --> 00:02:17.400
jargon away from those terms, because understanding

00:02:17.400 --> 00:02:20.620
them really changes how you see the data. It's

00:02:20.620 --> 00:02:22.500
a difference equation, right? Not a differential

00:02:22.500 --> 00:02:24.939
equation. OK. What's the distinction there? Well,

00:02:25.060 --> 00:02:27.379
a differential equation deals with continuous

00:02:29.709 --> 00:02:33.050
smooth flows of change. So think of the precise

00:02:33.050 --> 00:02:35.349
uninterrupted curve of a planet orbiting the

00:02:35.349 --> 00:02:39.810
sun. OK. Smooth, constant math. Exactly. A difference

00:02:39.810 --> 00:02:42.590
equation, however, deals with discrete, specific

00:02:42.590 --> 00:02:46.409
steps in time. It moves in chunks. Like checking

00:02:46.409 --> 00:02:49.009
your 401k exactly once a day at the closing bell,

00:02:49.270 --> 00:02:51.490
or measuring the temperature outside your window

00:02:51.490 --> 00:02:53.590
every hour on the hour. You aren't watching a

00:02:53.590 --> 00:02:55.250
smooth flow. You're looking at stepping stones.

00:02:55.449 --> 00:02:57.409
Precisely. And that second word, stochastic.

00:02:58.580 --> 00:03:00.599
That simply tells us that we are not dealing

00:03:00.599 --> 00:03:02.539
with absolute certainty. Right, there's randomness.

00:03:02.800 --> 00:03:05.340
Yeah, there is a built-in random element. And

00:03:05.340 --> 00:03:07.759
the notation you'll see used by statisticians

00:03:07.759 --> 00:03:11.560
is AR(p). That p in the parentheses stands for

00:03:11.560 --> 00:03:14.219
the order or the maximum lag. Okay, so how far

00:03:14.219 --> 00:03:16.780
back we look. Exactly. It tells the system exactly

00:03:16.780 --> 00:03:19.020
how many past stepping stones it's allowed to

00:03:19.020 --> 00:03:21.159
look at to figure out where to place the next

00:03:21.159 --> 00:03:24.360
step. So an AR(1) model only looks at yesterday

00:03:24.360 --> 00:03:28.060
to predict today. Got it. And an AR(5) model looks

00:03:28.060 --> 00:03:31.199
at the last five days. Right. Assigning a specific

00:03:31.199 --> 00:03:33.800
mathematical weight to each one of those past

00:03:33.800 --> 00:03:36.219
five days. I like to think of this like a continuously

00:03:36.219 --> 00:03:40.340
running recipe. Imagine you're making this massive

00:03:40.340 --> 00:03:43.490
pot of stew. OK, stew. I'm with you. Today's

00:03:43.490 --> 00:03:45.930
stew is made using whatever leftover broth you

00:03:45.930 --> 00:03:47.610
have from yesterday's stew and maybe the day

00:03:47.610 --> 00:03:50.550
before that. That is your linear dependence on

00:03:50.550 --> 00:03:52.870
previous values, right? The base carries over.

00:03:53.009 --> 00:03:55.669
Mm-hmm. The history of the soup. Exactly. But

00:03:55.669 --> 00:03:58.629
to keep things interesting, and this is the stochastic

00:03:58.629 --> 00:04:01.949
part, every single day you close your eyes, reach

00:04:01.949 --> 00:04:03.949
blindly into the spice cabinet, and just throw

00:04:03.949 --> 00:04:06.569
in one entirely random new ingredient. Oh, I

00:04:06.569 --> 00:04:09.090
see. That random spice is the white noise or

00:04:09.090 --> 00:04:11.969
the daily shock. So you know the historical base

00:04:11.969 --> 00:04:14.909
of the stew, but you can never perfectly predict

00:04:14.909 --> 00:04:17.290
the final flavor of today's bowl because of that

00:04:17.290 --> 00:04:19.550
random daily addition. That's a great way to

00:04:19.550 --> 00:04:21.990
visualize it. The base ingredients carry over,

00:04:22.290 --> 00:04:24.490
but the continuous injection of randomness keeps

00:04:24.490 --> 00:04:27.050
the system evolving. Now what's fascinating here

00:04:27.050 --> 00:04:29.529
is how this connects to the artificial intelligence

00:04:29.529 --> 00:04:31.790
tools everyone is using right now. Here's where

00:04:31.790 --> 00:04:34.769
it gets really interesting. Large language models,

00:04:35.050 --> 00:04:39.800
right? LLMs. You hear tech CEOs constantly calling

00:04:39.800 --> 00:04:43.420
their AI autoregressive models. They do use

00:04:43.420 --> 00:04:45.379
the term constantly, yeah. Yeah. And, you know,

00:04:45.879 --> 00:04:48.079
philosophically it makes sense. An LLM works

00:04:48.079 --> 00:04:50.699
by predicting the very next word in a sentence

00:04:50.699 --> 00:04:52.579
based strictly on the sequence of words that

00:04:52.579 --> 00:04:54.420
preceded it. Right. It looks backward to guess

00:04:54.420 --> 00:04:57.279
the next word. Exactly. It takes the past context

00:04:57.279 --> 00:04:59.759
to generate the immediate future, step by step.
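
The next-word loop being described can be sketched in a few lines of Python. This is purely illustrative: a tiny bigram count table stands in for the neural network, but the generation loop itself, where each new word is chosen using only the words produced so far, is the autoregressive part.

```python
import random

# Toy autoregressive text generator: the "model" is just a bigram
# count table, but the loop is autoregressive -- each new word is
# chosen using only the sequence generated so far.
corpus = "the past predicts the present and the present predicts the future".split()

# Count word -> next-word transitions observed in the corpus.
bigrams = {}
for prev, nxt in zip(corpus, corpus[1:]):
    bigrams.setdefault(prev, []).append(nxt)

def generate(start, n_words, seed=0):
    random.seed(seed)
    out = [start]
    for _ in range(n_words):
        candidates = bigrams.get(out[-1])
        if not candidates:          # dead end: no observed continuation
            break
        out.append(random.choice(candidates))  # condition only on the past
    return " ".join(out)

print(generate("the", 6))
```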

00:05:00.519 --> 00:05:03.180
However... We really have to clarify a critical

00:05:03.180 --> 00:05:06.300
distinction here. LLMs are not classical autoregressive

00:05:06.300 --> 00:05:08.600
models in the statistical sense at all. Because

00:05:08.600 --> 00:05:10.899
they aren't linear. Right. They don't just apply

00:05:10.899 --> 00:05:13.060
a simple percentage weight to yesterday. They

00:05:13.060 --> 00:05:17.000
don't. A classical AR model relies on strict

00:05:17.000 --> 00:05:20.180
linear mathematical structures. So the relationship

00:05:20.180 --> 00:05:22.139
between yesterday's temperature and today's temperature

00:05:22.139 --> 00:05:25.620
is a simple constant multiplier. Let's say 0.8.

00:05:25.620 --> 00:05:30.519
OK. LLMs, on the other hand, use incredibly

00:05:30.519 --> 00:05:33.420
complex, highly non-linear neural networks.

00:05:34.139 --> 00:05:36.839
Specifically, attention mechanisms, right, to

00:05:36.839 --> 00:05:39.459
weigh the importance of past words. Oh, like the

00:05:39.459 --> 00:05:42.019
word Apple might be wildly important to the next

00:05:42.019 --> 00:05:44.360
word, even if it appeared like... 20 paragraphs

00:05:44.360 --> 00:05:47.319
ago. Exactly. So they share the conceptual framework

00:05:47.319 --> 00:05:49.420
of, you know, the past predicts the present,

00:05:49.680 --> 00:05:52.079
but under the hood, the actual mechanics are

00:05:52.079 --> 00:05:54.040
vastly different. It's a shared philosophy, not

00:05:54.040 --> 00:05:56.120
a shared equation. Very well put. But let's stay

00:05:56.120 --> 00:05:58.319
with those strict statistical equations for a

00:05:58.319 --> 00:06:00.720
moment. Because if the present is always built

00:06:00.720 --> 00:06:02.879
on the past, what happens when something totally

00:06:02.879 --> 00:06:05.740
unprecedented hits the system, like a massive

00:06:05.740 --> 00:06:08.579
sudden shock? Say you accidentally dump a whole

00:06:08.579 --> 00:06:11.379
cup of cayenne pepper into our soup. Well, according

00:06:11.379 --> 00:06:14.220
to the math of an AR process, that shock has

00:06:14.220 --> 00:06:16.519
a permanent intertemporal effect. Permanent?

00:06:16.720 --> 00:06:21.160
Like forever? Yes. A one-time shock, which is

00:06:21.160 --> 00:06:24.560
mathematically just a non-zero white noise value,

00:06:25.160 --> 00:06:27.519
echoes infinitely far into the future. Infinitely?

00:06:27.720 --> 00:06:29.699
Okay, how does that work? Well, because variable

00:06:29.699 --> 00:06:32.980
X1 determines variable X2, and X2 determines

00:06:32.980 --> 00:06:37.180
X3, which determines X4 and so on forever. If

00:06:37.180 --> 00:06:40.319
you rewrite the autoregression equation by isolating

00:06:40.319 --> 00:06:44.639
the noise, dividing by the backshift operator

00:06:44.639 --> 00:06:47.420
polynomial, basically you end up with an infinite

00:06:47.420 --> 00:06:49.480
number of lagged values on the right side of

00:06:49.480 --> 00:06:52.220
the equation. Every single value today is carrying

00:06:52.220 --> 00:06:54.660
the microscopic echoes of shocks that occurred

00:06:54.660 --> 00:06:56.660
infinitely far in the past. I have to push

00:06:56.660 --> 00:06:59.439
back on this. If every single random shock that

00:06:59.439 --> 00:07:02.920
ever happens echoes forever, why doesn't my entire

00:07:02.920 --> 00:07:05.060
stock portfolio or the global weather system

00:07:05.060 --> 00:07:07.759
just explode into total mathematical chaos? That

00:07:07.759 --> 00:07:09.819
is a very good question. Like if yesterday's

00:07:09.819 --> 00:07:11.860
market panic is still bouncing around and last

00:07:11.860 --> 00:07:13.819
year's flash crash and last decade's housing

00:07:13.819 --> 00:07:16.019
crisis, wouldn't the variance just stack up and

00:07:16.019 --> 00:07:18.480
diverge to infinity? Yes, it would. And that

00:07:18.480 --> 00:07:21.779
is the exact mathematical hurdle that engineers

00:07:21.779 --> 00:07:24.589
and economists have to solve for. It brings us

00:07:24.589 --> 00:07:27.350
to a crucial concept called weak sense stationarity.

00:07:27.529 --> 00:07:29.589
Weak sense stationarity, okay. To prevent the

00:07:29.589 --> 00:07:32.449
system from tearing itself apart, the math has

00:07:32.449 --> 00:07:35.149
to enforce strict physical constraints. How does

00:07:35.149 --> 00:07:37.750
the math physically prevent the chaos, though?

00:07:37.829 --> 00:07:40.629
Like, what acts as the brakes? Let's look at

00:07:40.629 --> 00:07:43.370
a simple AR(1) model. The parameter, that's the

00:07:43.370 --> 00:07:45.269
multiplier that connects yesterday to today,

00:07:45.269 --> 00:07:47.709
it must have an absolute value of less than 1.

00:07:47.750 --> 00:07:50.009
Oh, less than 1, okay. Right. So let's say the

00:07:50.009 --> 00:07:52.910
multiplier is 0.5. If a massive shock of size

00:07:52.910 --> 00:07:55.959
10 hits the market today, tomorrow its echo is

00:07:55.959 --> 00:07:58.639
only multiplied by 0.5, so it drops to 5. And

00:07:58.639 --> 00:08:01.300
the next day it's 2.5. Exactly. The next day,

00:08:01.420 --> 00:08:04.240
1.25. So the echo technically goes on forever,

00:08:04.439 --> 00:08:06.839
but it rapidly diminishes towards zero in the

00:08:06.839 --> 00:08:09.420
limit. Ah, so the shock is absorbed, the cayenne

00:08:09.420 --> 00:08:11.720
pepper dilutes into this massive pot of soup

00:08:11.720 --> 00:08:14.019
over time, and so you basically can't taste it

00:08:14.019 --> 00:08:16.620
anymore. Precisely. If that multiplier was exactly

00:08:16.620 --> 00:08:19.699
1, or greater than 1, the variance diverges to

00:08:19.699 --> 00:08:22.379
infinity, the system is not stationary, and the

00:08:22.379 --> 00:08:25.240
math just fails to model reality. It just breaks.
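
The arithmetic of that decaying echo is easy to check directly. A minimal sketch with hypothetical multipliers: 0.5 keeps the echo of a size-10 shock shrinking toward zero, while 1.1 makes it amplify without bound.

```python
# Echo of a single size-10 shock through an AR(1) process x_t = phi * x_{t-1}.
# |phi| < 1: the echo decays toward zero; |phi| >= 1: it amplifies forever.
def impulse_response(phi, shock=10.0, steps=5):
    echoes = [shock]
    for _ in range(steps):
        echoes.append(phi * echoes[-1])   # yesterday's echo times the multiplier
    return echoes

print(impulse_response(0.5))   # 10, 5, 2.5, 1.25, ... fades out
print(impulse_response(1.1))   # 10, 11, 12.1, ...    blows up
```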

00:08:25.439 --> 00:08:27.759
Right. And in more complex models with higher

00:08:27.759 --> 00:08:30.860
lags, like in AR(5), the math gets a bit more elegant.

00:08:31.339 --> 00:08:33.360
You might encounter the rule that the roots of

00:08:33.360 --> 00:08:36.220
the model's characteristic polynomial must lie

00:08:36.220 --> 00:08:40.519
strictly outside the unit circle. OK. Characteristic

00:08:40.519 --> 00:08:42.940
polynomial and unit circle sound like the exact

00:08:42.940 --> 00:08:45.360
kind of math that made me drop calculus. How

00:08:45.360 --> 00:08:47.799
do we visualize that? Fair enough. Think of the

00:08:47.799 --> 00:08:50.700
unit circle as a physical containment zone on

00:08:50.700 --> 00:08:54.409
a graph. Okay, containment zone. The roots are

00:08:54.409 --> 00:08:56.769
basically the core solutions that govern how

00:08:56.769 --> 00:08:59.889
the equation behaves. The mathematical law states

00:08:59.889 --> 00:09:02.330
that these roots must exist completely outside

00:09:02.330 --> 00:09:05.409
of this circle. If even one root breaches that

00:09:05.409 --> 00:09:08.090
boundary and creeps inside the circle, the containment

00:09:08.090 --> 00:09:10.250
field fails. And then what happens? The shock

00:09:10.250 --> 00:09:12.370
waves, instead of fading out like with that 0.5

00:09:12.370 --> 00:09:15.190
multiplier, they begin to amplify with every

00:09:15.190 --> 00:09:17.350
step. Yeah. If we connect this to the bigger

00:09:17.350 --> 00:09:20.450
picture, this mathematical boundary is what separates

00:09:20.450 --> 00:09:24.190
a stable, predictable system from a chaotic runaway

00:09:24.190 --> 00:09:27.649
train. It is what guarantees that a flash crash

00:09:27.649 --> 00:09:31.070
in, say, an algorithmic trading bot eventually

00:09:31.070 --> 00:09:33.690
loses its influence, allowing the market to return

00:09:33.690 --> 00:09:36.350
to a constant mean. OK, so we have this stable

00:09:36.350 --> 00:09:38.750
system where shocks ripple, they dilute, and

00:09:38.750 --> 00:09:42.320
they fade. Moving from the abstract math to something

00:09:42.320 --> 00:09:45.440
a bit more tangible, statisticians have a really

00:09:45.440 --> 00:09:47.299
clever way of describing what these different

00:09:47.299 --> 00:09:49.639
AR parameters actually look and sound like. They

00:09:49.639 --> 00:09:52.419
use the terminology of color. Yeah, mapping statistical

00:09:52.419 --> 00:09:54.820
noise to the visual and auditory spectrum is

00:09:54.820 --> 00:09:57.519
such a brilliant way to conceptualize invisible

00:09:57.519 --> 00:09:59.799
data. It really is. Let's start with AR(0). That

00:09:59.799 --> 00:10:03.559
means zero lag, no dependence on the past whatsoever.

00:10:03.639 --> 00:10:06.100
Just total randomness. Exactly. Every single

00:10:06.100 --> 00:10:08.740
data point is pure independent randomness. Which

00:10:08.740 --> 00:10:11.279
is known as white noise, right? Just like white

00:10:11.279 --> 00:10:13.039
light contains all visual frequencies equally,

00:10:13.279 --> 00:10:15.320
white noise contains all frequencies of data

00:10:15.320 --> 00:10:17.639
equally. It's just pure unpredictable static.

00:10:18.019 --> 00:10:21.759
But then you introduce an AR(1) process with a

00:10:21.759 --> 00:10:25.500
positive parameter, meaning today is positively

00:10:25.500 --> 00:10:27.879
correlated with yesterday. If yesterday was high,

00:10:28.059 --> 00:10:31.259
today wants to be high. This acts as a low-pass

00:10:31.259 --> 00:10:34.570
filter on the data. It smooths things out, effectively

00:10:34.570 --> 00:10:36.970
reducing the high-frequency jitters. And they

00:10:36.970 --> 00:10:39.929
call this red noise, which makes perfect sense

00:10:39.929 --> 00:10:42.250
if you think about literal light spectrum. Right,

00:10:42.370 --> 00:10:44.169
because of the wavelengths. Yeah, red light has

00:10:44.169 --> 00:10:46.549
the longest, most rolling wavelengths. If you

00:10:46.549 --> 00:10:48.789
apply a low-pass filter to full-spectrum white

00:10:48.789 --> 00:10:51.370
light, everything except the smooth red light

00:10:51.370 --> 00:10:54.409
gets stripped away. So a positive AR(1) parameter

00:10:54.409 --> 00:10:56.330
makes your data look like rolling red hills.

00:10:56.730 --> 00:11:00.669
Exactly. Conversely, if your AR(1) parameter is

00:11:00.669 --> 00:11:03.129
negative, today is negatively correlated with

00:11:03.129 --> 00:11:05.929
yesterday. So if yesterday was high, today desperately

00:11:05.929 --> 00:11:08.570
wants to be low. Like a pendulum. Yeah. This

00:11:08.570 --> 00:11:11.490
acts as a high-pass filter, favoring rapid, jagged

00:11:11.490 --> 00:11:14.549
back and forth changes. Blue noise. Because blue

00:11:14.549 --> 00:11:16.590
light has short, rapid, high energy wavelengths.
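
The red/blue distinction shows up in a quick simulation. This sketch uses hypothetical parameters of +0.8 and -0.8 and measures the lag-1 correlation: strongly positive for the smooth "red" series, strongly negative for the jagged "blue" one.

```python
import numpy as np

# AR(1) simulation: x_t = phi * x_{t-1} + white noise.
# phi > 0 smooths the series ("red" noise); phi < 0 makes it
# flip back and forth step to step ("blue" noise).
def simulate_ar1(phi, n=2000, seed=42):
    rng = np.random.default_rng(seed)
    shocks = rng.standard_normal(n)        # the white-noise "random spice"
    x = np.zeros(n)
    for t in range(1, n):
        x[t] = phi * x[t - 1] + shocks[t]  # linear dependence on yesterday
    return x

def lag1_autocorr(x):
    # Correlation between the series and itself shifted by one step.
    return np.corrcoef(x[:-1], x[1:])[0, 1]

red = simulate_ar1(0.8)    # rolling, positively correlated hills
blue = simulate_ar1(-0.8)  # jagged, negatively correlated spikes
print(round(lag1_autocorr(red), 2), round(lag1_autocorr(blue), 2))
```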

00:11:16.730 --> 00:11:19.309
You got it. And when you step up to an AR(2) process,

00:11:19.549 --> 00:11:21.730
where the equation looks back two distinct steps,

00:11:22.190 --> 00:11:24.950
the behavior gets even more dynamic. Well, if

00:11:24.950 --> 00:11:26.990
your first parameter is positive and your second

00:11:26.990 --> 00:11:30.549
is negative, the process actively favors changes

00:11:30.549 --> 00:11:33.269
in direction. It oscillates. So instead of just

00:11:33.269 --> 00:11:35.710
being smooth or jagged, it's actively seeking

00:11:35.710 --> 00:11:38.149
a rhythm. It's looking for turning points. It

00:11:38.149 --> 00:11:40.250
is. And this isn't just an abstract art project

00:11:40.250 --> 00:11:43.009
with red and blue filters, you know. These visual

00:11:43.009 --> 00:11:45.889
and auditory concepts are fundamental to physical

00:11:45.889 --> 00:11:48.470
signal processing. Oh, in the real world. Yeah.

00:11:48.639 --> 00:11:51.740
Engineers rely on these specific autoregressive

00:11:51.740 --> 00:11:54.440
frequency peaks to process biological signals.

00:11:55.039 --> 00:11:57.580
Take an EEG machine reading human brain waves.

00:11:57.679 --> 00:11:59.940
Right, measuring electrical activity. The brain

00:11:59.940 --> 00:12:02.200
waves act as the historical data. The machine

00:12:02.200 --> 00:12:05.159
applies an AR model to the signal. Normally the

00:12:05.159 --> 00:12:07.179
brain might be outputting smooth, rolling red

00:12:07.179 --> 00:12:09.539
noise. Just a normal, stable state. But if the

00:12:09.539 --> 00:12:12.360
AR parameters suddenly shift, if the math detects

00:12:12.360 --> 00:12:14.460
a sudden spike of high-frequency blue noise

00:12:14.460 --> 00:12:17.639
or a specific AR(2) oscillation, the algorithm

00:12:17.639 --> 00:12:20.960
immediately flags that change. It realizes the

00:12:20.960 --> 00:12:22.879
underlying multiplier of the brain has changed,

00:12:23.100 --> 00:12:25.879
which could indicate the onset of a seizure or

00:12:25.879 --> 00:12:28.080
some specific neurological event. That makes

00:12:28.080 --> 00:12:30.679
the math incredibly real, and I imagine the military

00:12:30.679 --> 00:12:32.860
uses this too, right? Oh, definitely. Phased-array

00:12:32.860 --> 00:12:36.279
radar systems rely heavily on it. By understanding

00:12:36.279 --> 00:12:38.919
the color and the autoregressive frequency of

00:12:38.919 --> 00:12:41.320
the background environmental noise, the radar

00:12:41.320 --> 00:12:43.980
system can actively filter out the static and

00:12:43.980 --> 00:12:46.519
successfully track a moving physical target,

00:12:46.879 --> 00:12:49.419
even in a totally chaotic environment. But wait,

00:12:49.419 --> 00:12:52.120
this raises a huge real-world problem. Okay,

00:12:52.120 --> 00:12:54.669
what's that? Well, the real world isn't a perfectly

00:12:54.669 --> 00:12:57.809
controlled laboratory. The brain changes, the

00:12:57.809 --> 00:13:00.190
weather changes, the rules of the road change.

00:13:00.549 --> 00:13:03.110
How does this math survive when the environment

00:13:03.110 --> 00:13:06.070
itself shifts? Because if an AR model assumes

00:13:06.070 --> 00:13:08.750
a constant multiplier like that, the rule connecting

00:13:08.750 --> 00:13:11.710
1995 to 1996 is the exact same rule connecting

00:13:11.710 --> 00:13:15.370
2025 to 2026. Isn't it just doomed to fail eventually

00:13:15.370 --> 00:13:18.250
in complex systems? Yes, absolutely. A static

00:13:18.250 --> 00:13:20.529
model in a dynamic world will eventually give

00:13:20.529 --> 00:13:23.159
you a disastrously wrong prediction. And that's

00:13:23.159 --> 00:13:25.700
where statisticians upgrade to the TVAR model.

00:13:26.019 --> 00:13:29.519
TVAR, Time-Varying Autoregressive Model. Exactly.

00:13:29.960 --> 00:13:32.559
TVAR allows the autoregressive coefficients themselves,

00:13:32.840 --> 00:13:34.799
those hidden multipliers, to actually evolve

00:13:34.799 --> 00:13:37.980
over time. So if a standard AR model is like

00:13:37.980 --> 00:13:40.659
a smart thermostat, trying to predict the room

00:13:40.659 --> 00:13:43.279
temperature based purely on the last few hours

00:13:43.279 --> 00:13:47.100
of data, a TVAR model is like a thermostat that

00:13:47.100 --> 00:13:49.480
also realizes the seasons are changing outside

00:13:49.480 --> 00:13:51.860
from summer to winter, and it actually rewrites

00:13:51.860 --> 00:13:54.740
its own internal math to adapt to the new reality.

00:13:55.080 --> 00:13:57.580
I love that. Taking that thermostat idea a step

00:13:57.580 --> 00:14:00.299
further, the system recognizes that the underlying

00:14:00.299 --> 00:14:02.600
physics or economics of the environment have

00:14:02.600 --> 00:14:05.159
structurally shifted and it continuously updates

00:14:05.159 --> 00:14:07.460
the weights of its past lags without needing

00:14:07.460 --> 00:14:09.980
any human intervention. That's incredible. Yeah.

00:14:10.019 --> 00:14:13.159
TVAR handles non-stationary evolving processes.
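
A minimal sketch of the time-varying idea: rather than a full TVAR model, simply re-fit the AR(1) multiplier on a sliding window and watch it drift when the underlying regime changes. The regime values 0.9 and 0.2 are made up for illustration.

```python
import numpy as np

# Sketch of the time-varying idea: re-fit the AR(1) multiplier on a
# sliding window. A true TVAR model lets the coefficient evolve inside
# the model itself; this rolling re-estimate just shows the intuition.
def rolling_ar1_coeff(x, window=200):
    coeffs = []
    for start in range(0, len(x) - window):
        seg = x[start:start + window]
        prev, curr = seg[:-1], seg[1:]
        # Least-squares AR(1) fit: phi = sum(x_t * x_{t-1}) / sum(x_{t-1}^2)
        coeffs.append(np.dot(prev, curr) / np.dot(prev, prev))
    return np.array(coeffs)

rng = np.random.default_rng(0)
# Regime change: the system starts smooth (phi = 0.9), then shifts (phi = 0.2),
# like a machine whose vibration signature changes as the metal fatigues.
x = np.zeros(1000)
for t in range(1, 1000):
    phi = 0.9 if t < 500 else 0.2
    x[t] = phi * x[t - 1] + rng.standard_normal()

est = rolling_ar1_coeff(x)
print(round(est[0], 2), round(est[-1], 2))  # early ~0.9, late ~0.2
```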

00:14:13.259 --> 00:14:15.519
And the practical applications for this are massive,

00:14:15.700 --> 00:14:18.659
right? Climate scientists use generative time-varying

00:14:18.659 --> 00:14:21.080
models for decadal storm power predictions

00:14:21.080 --> 00:14:23.519
in the Mediterranean. As the oceans warm, the

00:14:23.519 --> 00:14:26.299
historical multipliers change, and TVAR adjusts

00:14:26.299 --> 00:14:29.039
for that. And economists use TVAR for financial

00:14:29.039 --> 00:14:31.179
forecasting, because obviously the rules of human

00:14:31.179 --> 00:14:33.720
behavior during a massive market boom are completely

00:14:33.720 --> 00:14:35.419
different than during a terrified recession.

00:14:35.720 --> 00:14:38.299
Very true. We also see it in heavy industry with

00:14:38.299 --> 00:14:41.659
failure prognostics. Imagine a massive industrial

00:14:41.659 --> 00:14:44.759
turbine lined with vibration sensors. OK, I'm

00:14:44.759 --> 00:14:46.700
picturing it. A standard model might just look

00:14:46.700 --> 00:14:48.899
for a flat threshold of shaking. You know, is

00:14:48.899 --> 00:14:51.559
it shaking too hard? Right. But a TVAR model

00:14:51.559 --> 00:14:54.539
uses the multivariate sensor data to constantly

00:14:54.539 --> 00:14:56.840
track the autoregressive relationship of the

00:14:56.840 --> 00:14:59.899
vibrations. As the metal fatigues over months,

00:15:00.500 --> 00:15:02.559
the way yesterday's vibration affects today's

00:15:02.559 --> 00:15:05.820
vibration subtly shifts. So the equation itself

00:15:05.820 --> 00:15:09.279
changes. Yes. The TVAR model detects the changing

00:15:09.279 --> 00:15:12.139
coefficients and predicts exactly when the machine

00:15:12.139 --> 00:15:14.700
is going to catastrophically break down, weeks

00:15:14.700 --> 00:15:16.440
before the human ear could ever hear a problem.

00:15:16.580 --> 00:15:18.940
So to bring this all together into the actual

00:15:18.940 --> 00:15:22.830
implementation. How do statisticians and these

00:15:22.830 --> 00:15:25.389
automated systems actually calculate these hidden

00:15:25.389 --> 00:15:28.570
parameters and forecast the future? We know they

00:15:28.570 --> 00:15:30.809
look at the past, but how do they deduce the

00:15:30.809 --> 00:15:33.450
invisible multipliers? Well, the first step is

00:15:33.450 --> 00:15:35.710
determining the maximum lag, that p value we

00:15:35.710 --> 00:15:37.730
talked about. You don't just blindly guess how

00:15:37.730 --> 00:15:40.250
far back to look. Statisticians calculate the

00:15:40.250 --> 00:15:42.799
partial autocorrelation function. Partial autocorrelation.

00:15:42.799 --> 00:15:45.360
Conceptually, they test how much

00:15:45.360 --> 00:15:48.539
unique predictive power each step backward retains.

00:15:48.779 --> 00:15:51.440
They find the exact point where the autocorrelations

00:15:51.440 --> 00:15:54.200
drop to zero, meaning that looking any further

00:15:54.200 --> 00:15:57.460
back provides absolutely no new mathematical

00:15:57.460 --> 00:16:00.159
information, and that point becomes your maximum

00:16:00.159 --> 00:16:03.080
lag. OK, so you know exactly how far back the

00:16:03.080 --> 00:16:05.639
echo reaches. Now you need to find the actual

00:16:05.639 --> 00:16:08.440
multipliers, the gears turning inside the machine.
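
The lag-selection step can be sketched by using the definition directly: the partial autocorrelation at lag k is the last coefficient of a least-squares AR(k) fit. For a simulated AR(2) process (hypothetical coefficients 0.6 and -0.3), it comes out clearly non-zero at lags 1 and 2 and roughly zero beyond, which is the cutoff that sets the maximum lag.

```python
import numpy as np

# Partial autocorrelation at lag k, computed from its definition:
# the last coefficient of a least-squares AR(k) fit.
def pacf_at_lag(x, k):
    rows = [x[t - k:t][::-1] for t in range(k, len(x))]  # [x_{t-1}, ..., x_{t-k}]
    X, y = np.array(rows), x[k:]
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[k - 1]   # weight on the k-th (most distant) lag

rng = np.random.default_rng(1)
x = np.zeros(5000)
for t in range(2, 5000):                      # a true AR(2) process
    x[t] = 0.6 * x[t - 1] - 0.3 * x[t - 2] + rng.standard_normal()

for k in (1, 2, 3, 4):
    print(k, round(pacf_at_lag(x, k), 2))     # lags 3 and 4 should be ~0
```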

00:16:08.820 --> 00:16:10.519
And there are some heavy-hitting mathematical

00:16:10.519 --> 00:16:12.679
methods for this in the source material, right?

00:16:13.259 --> 00:16:15.899
Oh, definitely. The Yule-Walker equations, named

00:16:15.899 --> 00:16:19.009
after Udny Yule and Gilbert Walker, or the Berg

00:16:19.009 --> 00:16:21.490
method developed by John Parker Berg. Yeah, the

00:16:21.490 --> 00:16:23.529
Yule-Walker equations are really the bedrock

00:16:23.529 --> 00:16:25.909
of this field. Without getting bogged down in

00:16:25.909 --> 00:16:28.169
matrix algebra, think about it conceptually,

00:16:28.529 --> 00:16:30.509
Yule-Walker allows you to reverse-engineer the

00:16:30.509 --> 00:16:32.929
system. Like looking at the footprint to guess

00:16:32.929 --> 00:16:35.320
the weight of the animal. Very similar. You start

00:16:35.320 --> 00:16:37.720
by looking at the actual observed data wave,

00:16:37.759 --> 00:16:39.919
how the data points move together, which is the

00:16:39.919 --> 00:16:42.419
covariance. The Yule-Walker equations provide

00:16:42.419 --> 00:16:44.960
a direct mathematical bridge between that visible

00:16:44.960 --> 00:16:46.899
movement and the hidden parameters. Oh, I see.

00:16:47.039 --> 00:16:49.419
You observe the swing of the pendulum, and the

00:16:49.419 --> 00:16:51.820
math tells you exactly how long the string must

00:16:51.820 --> 00:16:54.399
be and what the force of gravity is. And what

00:16:54.399 --> 00:16:56.679
about the Berg method? The Berg method is another

00:16:56.679 --> 00:16:59.259
highly effective approach. Instead of just looking

00:16:59.259 --> 00:17:02.399
at the covariance, it runs prediction equations

00:17:02.399 --> 00:17:05.700
both forward and backward through the data simultaneously

00:17:05.700 --> 00:17:07.839
to estimate the parameters. Forward and backward.

00:17:08.059 --> 00:17:10.059
Yeah, they call them maximum entropy estimates.

00:17:10.720 --> 00:17:12.799
It's particularly famous for providing highly

00:17:12.799 --> 00:17:15.079
stable estimates, even when you have a very short

00:17:15.079 --> 00:17:17.460
data set. OK, so you have your lag distance,

00:17:17.519 --> 00:17:19.720
and you have your hidden parameters calculated.
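
The Yule-Walker idea can be sketched directly from that "footprint" description: measure autocovariances from the observed data, then solve the small linear system they imply to recover the hidden multipliers. The true coefficients 0.5 and 0.25 here are made up for the demonstration.

```python
import numpy as np

# Yule-Walker sketch: reverse-engineer AR(p) multipliers from the
# observed autocovariances by solving the linear system R @ phi = r,
# where R holds the autocovariances at lags 0..p-1.
def autocov(x, lag):
    x = x - x.mean()
    return np.dot(x[:len(x) - lag], x[lag:]) / len(x)

def yule_walker(x, p):
    r = np.array([autocov(x, k) for k in range(p + 1)])
    R = np.array([[r[abs(i - j)] for j in range(p)] for i in range(p)])
    return np.linalg.solve(R, r[1:])   # estimated phi_1 .. phi_p

rng = np.random.default_rng(7)
x = np.zeros(10000)
for t in range(2, 10000):                      # true multipliers: 0.5, 0.25
    x[t] = 0.5 * x[t - 1] + 0.25 * x[t - 2] + rng.standard_normal()

phi = yule_walker(x, 2)
print(np.round(phi, 2))   # close to the true [0.5, 0.25]
```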

00:17:20.119 --> 00:17:23.220
Now it's time to actually predict tomorrow. Right.

00:17:23.460 --> 00:17:26.240
The process is called n-step-ahead forecasting.

00:17:26.779 --> 00:17:29.400
And it seems conceptually simple for one day

00:17:29.400 --> 00:17:31.779
out, right? You just plug in today's known data.
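 
The plug-in recursion, where each forecast is fed back in as the input for the next step, can be sketched with hypothetical AR(2) multipliers. The white-noise term is left out of the point forecast because its expected value is zero; that unknowable future shock is exactly what the widening uncertainty has to cover.

```python
# n-step-ahead forecasting sketch for an AR(2) with hypothetical
# multipliers: the first step uses real data; beyond that, each
# forecast becomes the input for the next step.
def forecast(history, phi1, phi2, n_steps):
    values = list(history)            # real observations so far
    for _ in range(n_steps):
        nxt = phi1 * values[-1] + phi2 * values[-2]  # predicted next value
        values.append(nxt)            # output becomes the next input
    return values[len(history):]

# Two real data points, then five steps into the "fog".
print(forecast([10.0, 12.0], phi1=0.6, phi2=0.3, n_steps=5))
```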

00:17:31.920 --> 00:17:34.700
Exactly. But to predict the day after tomorrow,

00:17:35.140 --> 00:17:38.440
you don't have tomorrow's real data yet. So you

00:17:38.440 --> 00:17:40.900
have to substitute your predicted value for tomorrow

00:17:40.900 --> 00:17:43.380
back into the equation. Right. Your output becomes

00:17:43.380 --> 00:17:46.259
your input. So what does this all mean for the

00:17:46.259 --> 00:17:48.960
accuracy of a long-term forecast? It sounds

00:17:48.960 --> 00:17:51.569
exactly like walking into a thick fog. To predict

00:17:51.569 --> 00:17:54.170
one step ahead, your foot is on solid ground,

00:17:54.269 --> 00:17:57.349
real historical data. But to predict two steps

00:17:57.349 --> 00:17:59.869
ahead, you're stepping onto a prediction. To

00:17:59.869 --> 00:18:02.630
predict 10 steps ahead, you're building a guess

00:18:02.630 --> 00:18:05.630
on top of a guess on top of a guess. The further

00:18:05.630 --> 00:18:08.069
out you go, the more you abandon real data. The

00:18:08.069 --> 00:18:10.369
fog gets thicker. And your confidence interval,

00:18:10.509 --> 00:18:12.690
the space where the true answer likely lives,

00:18:13.009 --> 00:18:15.369
has to get dramatically wider. The compounding

00:18:15.369 --> 00:18:18.529
nature of the fog is exactly why long-term forecasting

00:18:18.529 --> 00:18:21.750
is so notoriously difficult. And the math formally

00:18:21.750 --> 00:18:24.690
breaks down four distinct sources of forecasting

00:18:24.690 --> 00:18:27.349
uncertainty that stack up the further out you

00:18:27.349 --> 00:18:29.250
step. Okay, if I'm walking into this fog, my

00:18:29.250 --> 00:18:31.210
first problem is probably that I picked the wrong

00:18:31.210 --> 00:18:33.869
model entirely. Like, I brought a map for the

00:18:33.869 --> 00:18:36.269
wrong city. That is exactly the first source

00:18:36.269 --> 00:18:38.890
of uncertainty. Model error. The fundamental

00:18:38.890 --> 00:18:41.990
structure of your AR equation might just be incorrect

00:18:41.990 --> 00:18:43.789
for the phenomenon you're studying. Okay, what

00:18:43.789 --> 00:18:45.890
are the other three? Well, based on your fog

00:18:45.890 --> 00:18:48.359
analogy, what would the second one be? The blurry

00:18:48.359 --> 00:18:50.920
stepping stones, uncertainty about the accuracy

00:18:50.920 --> 00:18:53.140
of the forecasted values I'm using as my new

00:18:53.140 --> 00:18:56.160
inputs. Yes, the compounded error of lag predictions.

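The recursion just described, feeding each prediction back in as the next input, can be sketched in a few lines of plain Python. The coefficients and history below are made-up illustration values, not estimates from real data:

```python
# Hedged sketch: n-step-ahead forecasting with a toy AR(2).
# phi and history are invented for illustration only.

def forecast(history, phi, steps):
    """Recursively forecast `steps` values ahead.

    history: most recent observations, oldest first
    phi:     AR coefficients (phi[0] multiplies the most recent value)
    """
    values = list(history)
    preds = []
    for _ in range(steps):
        # Each prediction is a linear combination of the last p values.
        next_val = sum(c * v for c, v in zip(phi, reversed(values)))
        preds.append(next_val)
        # Key step: the prediction is appended as if it were real data,
        # so later steps are guesses built on guesses.
        values.append(next_val)
    return preds

history = [1.0, 1.2, 0.9]   # made-up recent observations
phi = [0.5, 0.3]            # made-up AR(2) coefficients
print(forecast(history, phi, 3))
```

Only the first step uses nothing but observed data; from the second step on, at least one input is itself a forecast.
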
00:18:56.819 --> 00:18:59.900
Third is parameter uncertainty. Even with the

00:18:59.900 --> 00:19:02.079
brilliance of the Yule-Walker equations, the

00:19:02.079 --> 00:19:04.079
hidden multipliers you calculated from historical

00:19:04.079 --> 00:19:06.799
data might be slightly inaccurate, and those

00:19:06.799 --> 00:19:09.839
tiny inaccuracies multiply exponentially. They

00:19:09.839 --> 00:19:12.480
just snowball. And the fourth? The fourth has

00:19:12.480 --> 00:19:15.059
to be the cayenne pepper, the unobserved error

00:19:15.059 --> 00:19:18.099
term itself, the random white noise shock that

00:19:18.099 --> 00:19:20.740
simply hasn't happened yet and by definition

00:19:20.740 --> 00:19:24.559
cannot be predicted. Wow! That is the ultimate

00:19:24.559 --> 00:19:27.079
limit of forecasting, the fundamental randomness

00:19:27.079 --> 00:19:29.380
of the universe. And this raises an important

00:19:29.380 --> 00:19:31.599
question about how we interact with technology

00:19:31.599 --> 00:19:34.299
today, the danger of blind faith in algorithms.

00:19:34.519 --> 00:19:36.259
Because the computer does it all for us now.

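For a sense of how thick the fog gets, here is a minimal sketch (the AR(1) parameter values are made up) using the standard result that the h-step forecast-error variance of a stationary AR(1) is sigma^2 (1 - phi^(2h)) / (1 - phi^2), so the ~95% interval widens with every step:

```python
import math

# Hedged sketch of what a package computes under the hood for an AR(1):
# the point forecast AND the widening ~95% interval around it.
# phi, sigma, and x_now are invented illustration values.
phi, sigma, x_now = 0.8, 1.0, 2.0

for h in range(1, 6):
    point = (phi ** h) * x_now
    # Forecast-error variance grows with horizon h:
    # sigma^2 * (1 - phi^(2h)) / (1 - phi^2)
    var = sigma**2 * (1 - phi**(2 * h)) / (1 - phi**2)
    half = 1.96 * math.sqrt(var)  # approximate 95% half-width
    print(f"h={h}: {point:.3f} +/- {half:.3f}")
```

As h grows, the variance approaches sigma^2 / (1 - phi^2), the variance of the process itself: far enough out, the model knows essentially nothing beyond the long-run average.
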
00:19:36.559 --> 00:19:39.170
Exactly. Today, software packages do all the

00:19:39.170 --> 00:19:42.369
heavy lifting. Programs like R, MATLAB's econometrics

00:19:42.369 --> 00:19:45.150
toolbox, or Python's statsmodels. A user can run

00:19:45.150 --> 00:19:47.589
a complex AR forecast with, like, three lines

00:19:47.589 --> 00:19:50.480
of code. The computer instantly solves the Yule-

00:17:50.480 --> 00:17:53.500
Walker equations, applies the unit circle constraints,

00:19:53.960 --> 00:19:56.680
and spits out a beautiful, authoritative trend

00:19:56.680 --> 00:19:58.940
line stretching years into the future. But the

00:19:58.940 --> 00:20:01.660
computer doesn't explicitly tell you how foggy

00:20:01.660 --> 00:20:04.359
it is out there, unless you specifically command

00:20:04.359 --> 00:20:07.539
it to show its work. It presents a guess as a

00:20:07.539 --> 00:20:10.299
certainty. Which is why understanding those four

00:20:10.299 --> 00:20:13.019
sources of uncertainty is absolutely critical

00:20:13.019 --> 00:20:15.859
for you, the listener. Whenever you encounter

00:20:15.859 --> 00:20:18.680
a forecast in life, business, or the evening

00:20:18.680 --> 00:20:21.839
news, you have to critically evaluate it. The

00:20:21.839 --> 00:20:24.900
math running these AR models is rigorous, brilliant,

00:20:25.099 --> 00:20:28.299
and elegant, but it is fundamentally probabilistic.

00:20:28.599 --> 00:20:31.400
It offers a spectrum of likely futures, not a

00:20:31.400 --> 00:20:33.460
crystal ball. Which brings us to the end of our

00:20:33.460 --> 00:20:35.920
journey today. We have covered incredible ground

00:20:35.920 --> 00:20:38.519
in this deep dive. We started with the basic

00:20:38.519 --> 00:20:41.019
stochastic difference equation, literally building

00:20:41.019 --> 00:20:43.220
today's stew out of yesterday's broth. We did

00:20:43.220 --> 00:20:46.259
it. We saw how single random shocks create infinite

00:20:46.259 --> 00:20:48.700
ripples and how the mathematical containment

00:20:48.700 --> 00:20:50.819
field of the unit circle prevents our financial

00:20:50.819 --> 00:20:53.640
markets from exploding into chaos. We visualized

00:20:53.640 --> 00:20:56.380
the invisible data through smooth red noise and

00:20:56.380 --> 00:20:59.740
jagged blue noise and saw how EEG machines use

00:20:59.740 --> 00:21:02.930
that math to save lives. We explored how the

00:21:02.930 --> 00:21:06.190
TVAR model adapts to a shifting world, reading

00:21:06.190 --> 00:21:09.329
engine vibrations to predict failures. And finally,

00:21:09.650 --> 00:21:12.130
we walked into the compounding fog of n-step-

00:21:12.130 --> 00:21:14.589
ahead forecasting armed with the Yule-Walker

00:21:14.589 --> 00:21:17.130
equations. It is a profound piece of mathematical

00:21:17.130 --> 00:21:19.940
architecture. And hopefully you now possess the

00:21:19.940 --> 00:21:22.440
conceptual framework to demystify these models.

00:21:22.480 --> 00:21:24.539
Exactly. Whether it's the climate, the stock

00:21:24.539 --> 00:21:27.000
market, or the foundational philosophy of generative

00:21:27.000 --> 00:21:30.200
AI, these invisible autoregressive forces are

00:21:30.200 --> 00:21:32.500
everywhere. Yeah. And they aren't magic. They are

00:21:32.500 --> 00:21:34.880
just rigorous math, making the absolute best

00:21:34.880 --> 00:21:37.160
possible guess based on the evidence left behind.

00:21:37.440 --> 00:21:39.319
But I want to leave you with a final lingering

00:21:39.319 --> 00:21:42.049
thought. We started this deep dive by imagining

00:21:42.049 --> 00:21:44.369
driving a car with a blacked-out windshield, navigating

00:21:44.369 --> 00:21:46.710
entirely by staring backward into the rearview

00:21:46.710 --> 00:21:49.109
mirror. Autoregressive models, by definition,

00:21:49.289 --> 00:21:52.069
are the ultimate rearview mirrors. They fundamentally

00:21:52.069 --> 00:21:54.349
assume that tomorrow is just a linear combination

00:21:54.349 --> 00:21:57.230
of yesterday, plus a bit of random noise. That

00:21:57.230 --> 00:21:59.170
reliance on history is their core premise, yes.

00:21:59.450 --> 00:22:01.450
So here is the question for you to mull over

00:22:01.450 --> 00:22:04.619
as you go about your day. If our most powerful

00:22:04.619 --> 00:22:07.259
predictive engines can only construct the future

00:22:07.259 --> 00:22:10.380
out of the recycled pieces of the past, how can

00:22:10.380 --> 00:22:13.740
we ever mathematically model true, unprecedented

00:22:13.740 --> 00:22:16.700
human innovation? If the road behind you is a

00:22:16.700 --> 00:22:18.839
dirt path and the road ahead suddenly turns into

00:22:18.839 --> 00:22:21.259
a paved superhighway or a sheer cliff or a flying

00:22:21.259 --> 00:22:24.059
car, can any autoregressive model, no matter

00:22:24.059 --> 00:22:26.640
how advanced, ever see it coming? That is the

00:22:26.640 --> 00:22:28.859
ultimate limitation of looking backward to move

00:22:28.859 --> 00:22:31.259
forward. The math can only predict variations

00:22:31.259 --> 00:22:33.309
of what has already been. Keep your eyes on the

00:22:33.309 --> 00:22:35.450
road. Thanks for joining us on this deep dive.
