WEBVTT

00:00:00.000 --> 00:00:03.220
You know, I feel like we tend to think of truth

00:00:03.220 --> 00:00:06.019
as this absolute thing. Right, like a binary.

00:00:06.419 --> 00:00:08.160
Yeah, exactly. Like you either have a disease

00:00:08.160 --> 00:00:13.179
or you don't. Or the email in your inbox is either

00:00:13.179 --> 00:00:16.219
a malicious phishing scam or it's just a legitimate

00:00:16.219 --> 00:00:18.510
message from your boss. Right. It feels like

00:00:18.510 --> 00:00:20.350
it should be entirely black and white. Yeah,

00:00:20.469 --> 00:00:23.030
we really crave that kind of certainty. Oh, absolutely.

00:00:23.710 --> 00:00:27.250
We expect this clear, definitive line separating

00:00:27.250 --> 00:00:29.989
one reality from another, especially when the

00:00:29.989 --> 00:00:32.329
stakes are really high. It's honestly the absolute

00:00:32.329 --> 00:00:34.530
definition of a foundational concept just hiding

00:00:34.530 --> 00:00:37.250
in plain sight. I mean, it is the visual map

00:00:37.250 --> 00:00:40.670
of how we measure accuracy across nearly every

00:00:40.670 --> 00:00:42.969
scientific and technological field today. OK,

00:00:42.990 --> 00:00:46.429
let's unpack this. Because, I mean, receiver

00:00:46.429 --> 00:00:48.909
operating characteristic sounds like a dense

00:00:48.909 --> 00:00:51.670
chapter in some vintage ham radio manual, right?

00:00:51.829 --> 00:00:53.710
Yeah, it definitely does. It does not sound like

00:00:53.710 --> 00:00:56.429
the mathematical engine powering like Silicon

00:00:56.429 --> 00:00:59.369
Valley data science or modern oncology. No, it

00:00:59.369 --> 00:01:02.469
doesn't. But to understand why we rely on this

00:01:02.469 --> 00:01:05.189
specific statistical graph today, we really have

00:01:05.189 --> 00:01:07.549
to look at why it was invented in the first place.

00:01:07.549 --> 00:01:09.870
Right. And that takes us straight into a literal

00:01:09.870 --> 00:01:12.650
life or death scenario in the 1940s. Which is

00:01:12.650 --> 00:01:15.260
such a wild pivot. So take us back there. So,

00:01:15.700 --> 00:01:17.920
the historical context here sets up the entire

00:01:17.920 --> 00:01:20.560
philosophy of the math. Following the attack

00:01:20.560 --> 00:01:23.680
on Pearl Harbor in 1941, the U.S. military realized

00:01:23.680 --> 00:01:26.359
they had this massive fatal vulnerability. Right,

00:01:26.439 --> 00:01:29.500
with incoming attacks. Exactly. They desperately

00:01:29.500 --> 00:01:31.920
needed to better distinguish Japanese aircraft

00:01:31.920 --> 00:01:35.459
using radar. But early radar wasn't, you know,

00:01:35.620 --> 00:01:38.459
a clean video game screen with little red triangles

00:01:38.459 --> 00:01:40.780
pointing out the bad guys. It was incredibly

00:01:40.780 --> 00:01:43.799
messy. Very. The signals bouncing back from a

00:01:43.799 --> 00:01:46.280
target to a receiver station were often of very

00:01:46.280 --> 00:01:48.299
low energy compared to the noise floor of the

00:01:48.299 --> 00:01:50.230
ocean and the atmosphere. So it's just a barrage

00:01:50.230 --> 00:01:54.310
of fuzzy blips, static ambient noise. Yeah, exactly.

00:01:54.569 --> 00:01:57.090
So electrical engineers and radar technicians

00:01:57.090 --> 00:02:00.150
had to figure out a way to measure a human operator's

00:02:00.150 --> 00:02:02.469
ability to make these vital distinctions under

00:02:02.469 --> 00:02:04.629
immense pressure. Because they had to ask, like,

00:02:04.750 --> 00:02:07.609
is that specific blip an enemy bomber or is it

00:02:07.609 --> 00:02:10.349
just a really dense flock of birds? Right, or

00:02:10.349 --> 00:02:12.870
is it a battleship or just some weird reflection

00:02:12.870 --> 00:02:16.129
off a massive wave? Wow. So they needed a quantifiable

00:02:16.129 --> 00:02:19.430
way to grade the operator's characteristics. Hence

00:02:19.430 --> 00:02:21.909
the name, receiver operating characteristic.

00:02:21.969 --> 00:02:24.270
Which is an incredible origin story. But looking

00:02:24.270 --> 00:02:27.030
at our sources, this old World War II military

00:02:27.030 --> 00:02:29.689
concept somehow made the leap into becoming the

00:02:29.689 --> 00:02:32.069
gold standard in modern hospitals and machine

00:02:32.069 --> 00:02:34.009
learning. Yeah, it did. So what does this all

00:02:34.009 --> 00:02:36.590
mean for a doctor or a data scientist today?

00:02:36.750 --> 00:02:40.710
How did an old radar concept end up in like modern

00:02:40.710 --> 00:02:43.509
MRI machines? Well, because at a mathematical

00:02:43.509 --> 00:02:46.050
level, spotting a bomber in the clouds and spotting

00:02:46.050 --> 00:02:48.669
a tumor in an X-ray are fundamentally the exact

00:02:48.669 --> 00:02:52.000
same problem. Really? Just signal versus noise.

00:02:52.280 --> 00:02:55.919
Exactly. Whether you are a radiologist or radar

00:02:55.919 --> 00:03:00.120
operator or fraud detection AI, looking at credit

00:03:00.120 --> 00:03:03.659
card swipes, your job is to spot a weak signal

00:03:03.659 --> 00:03:05.740
hidden in a sea of noise. That makes a lot of

00:03:05.740 --> 00:03:08.259
sense. Yeah. And so in the 1950s, psychologists

00:03:08.259 --> 00:03:10.659
adopted this curve to study human perception.

00:03:10.939 --> 00:03:13.419
And then soon after, medicine brought it in to

00:03:13.419 --> 00:03:15.800
evaluate blood tests and diagnostic tools. And

00:03:15.800 --> 00:03:18.180
today, it's just everywhere. It is the cornerstone

00:03:18.180 --> 00:03:21.240
of machine learning. The context constantly changes,

00:03:21.599 --> 00:03:24.039
but the core challenge of separating signal from

00:03:24.039 --> 00:03:26.460
noise remains identical. OK. So to understand

00:03:26.460 --> 00:03:28.819
how doctors or algorithms actually make these

00:03:28.819 --> 00:03:30.400
detections, we first need to look at the four

00:03:30.400 --> 00:03:33.120
possible outcomes of any binary choice, right?

00:03:33.300 --> 00:03:35.620
Yes, exactly. The grid of truth. Right. We all

00:03:35.620 --> 00:03:38.139
know the basic confusion matrix, true positives,

00:03:38.280 --> 00:03:40.120
false negatives, that standard four-box grid.

00:03:40.439 --> 00:03:43.060
Right. So using the medical example from the

00:03:43.060 --> 00:03:45.919
source, if you actually have the disease and

00:03:45.919 --> 00:03:49.129
the test catches it, that's a true positive.

00:03:49.250 --> 00:03:51.770
Right, working as intended. Exactly. And if you're

00:03:51.770 --> 00:03:53.830
completely healthy, but the test says you're

00:03:53.830 --> 00:03:57.009
sick, that's a false positive, a false alarm.

00:03:57.169 --> 00:03:59.150
And then a true negative is you're healthy, and

00:03:59.150 --> 00:04:02.009
it says you're healthy. Yep. And the worst one,

00:04:02.229 --> 00:04:04.449
a false negative, is you have the disease, but

00:04:04.449 --> 00:04:07.110
the test says you don't. It's a miss. OK, so

00:04:07.110 --> 00:04:09.469
I want to use an analogy drawn directly from

00:04:09.469 --> 00:04:11.550
the source to ground this, because it blew my

00:04:11.550 --> 00:04:14.139
mind. Oh, the random guessing one. Yeah. If you

00:04:14.139 --> 00:04:16.800
have a completely random classifier, it's exactly

00:04:16.800 --> 00:04:19.300
like flipping a balanced coin to diagnose a patient.

00:04:19.819 --> 00:04:21.620
Yes. Like, imagine a doctor just looking at you

00:04:21.620 --> 00:04:24.399
and going, heads you need chemo, tails you're

00:04:24.399 --> 00:04:27.300
fine. That's a terrifying thought. Truly. But

00:04:27.300 --> 00:04:29.579
mathematically, if you test enough people that

00:04:29.579 --> 00:04:32.199
way, you will accidentally catch 50% of the

00:04:32.199 --> 00:04:34.680
sick people. Right. But you will also falsely

00:04:34.680 --> 00:04:37.300
diagnose 50% of the healthy people. Exactly.

00:04:37.579 --> 00:04:40.060
And that introduces the two vital metrics we

00:04:40.060 --> 00:04:42.199
derive from all this. You have the true positive

00:04:42.199 --> 00:04:45.180
rate, which is your sensitivity, your probability

00:04:45.180 --> 00:04:46.980
of detection. Catching the bad thing. Right.

00:04:47.259 --> 00:04:49.720
Versus the false positive rate, your probability

00:04:49.720 --> 00:04:52.259
of a false alarm. And balancing those two is

00:04:52.259 --> 00:04:54.800
the absolute heart of any decision model. OK.

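The four outcomes and the two rates built from them can be sketched in a few lines of Python (a minimal illustration; the eight example patients are invented):

```python
def confusion_counts(actual, predicted):
    """Tally the four outcomes of a binary test.

    actual / predicted are sequences of booleans (True = has the disease)."""
    tp = sum(a and p for a, p in zip(actual, predicted))          # hit
    fp = sum(not a and p for a, p in zip(actual, predicted))      # false alarm
    tn = sum(not a and not p for a, p in zip(actual, predicted))  # correct rejection
    fn = sum(a and not p for a, p in zip(actual, predicted))      # miss
    return tp, fp, tn, fn

def rates(tp, fp, tn, fn):
    tpr = tp / (tp + fn)  # true positive rate: share of sick people caught
    fpr = fp / (fp + tn)  # false positive rate: share of healthy people flagged
    return tpr, fpr

# Eight hypothetical patients: four sick, four healthy.
actual    = [True, True, True, True, False, False, False, False]
predicted = [True, True, True, False, True, False, False, False]
tp, fp, tn, fn = confusion_counts(actual, predicted)
tpr, fpr = rates(tp, fp, tn, fn)  # tpr = 0.75 (one miss), fpr = 0.25 (one false alarm)
```

A fair-coin classifier pushed through the same tally lands, on average, at tpr ≈ fpr ≈ 0.5, which is exactly the diagonal the conversation turns to next.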
00:04:54.819 --> 00:04:57.980
So if we plot those two rates, true positives

00:04:57.980 --> 00:05:00.740
and false positives, on a graph, we get the visual

00:05:00.740 --> 00:05:02.819
map of this whole thing, right? The ROC space.

00:05:03.000 --> 00:05:05.819
Yes. Let's map that space out visually for everyone.

00:05:06.259 --> 00:05:09.639
So imagine a standard graph. Your y-axis, running

00:05:09.639 --> 00:05:12.639
vertically, is your true positive rate. So out

00:05:12.639 --> 00:05:15.300
of everyone who actually has the disease, what

00:05:15.300 --> 00:05:17.720
percentage did your test count? Exactly. That

00:05:17.720 --> 00:05:21.379
runs from 0 to 100 percent. Then your x-axis,

00:05:21.560 --> 00:05:23.500
running horizontally, is your false positive

00:05:23.500 --> 00:05:25.540
rate. Out of everyone who is completely healthy,

00:05:25.740 --> 00:05:27.720
what percentage did you accidentally diagnose

00:05:27.720 --> 00:05:30.000
as sick? Right. So the top left corner of that

00:05:30.000 --> 00:05:32.459
graph is the Holy Grail. Right. Coordinate 0 on

00:05:32.459 --> 00:05:35.420
the x-axis, 100 on the y-axis. Yep. That means

00:05:35.420 --> 00:05:39.240
zero false alarms, but 100% of the sick patients

00:05:39.240 --> 00:05:41.740
caught. It's a perfect classification. But if

00:05:41.740 --> 00:05:43.740
you draw a diagonal line cutting straight from

00:05:43.740 --> 00:05:46.259
the bottom left corner to the top right corner,

00:05:46.800 --> 00:05:48.839
you've just mapped out that random coin flip

00:05:48.839 --> 00:05:52.439
we talked about. Right. Any purely random classifier,

00:05:52.560 --> 00:05:55.259
no matter the sample size, will just hug that

00:05:55.259 --> 00:05:57.839
diagonal line. It's literally called the line

00:05:57.839 --> 00:06:00.459
of no discrimination. Here's where it gets really

00:06:00.459 --> 00:06:03.000
interesting. I was looking at the contingency

00:06:03.000 --> 00:06:05.730
tables in the source material, showing different

00:06:05.730 --> 00:06:08.870
classifiers plotted in this space. Oh, yeah.

00:06:09.410 --> 00:06:11.709
And obviously, points above that diagonal line

00:06:11.709 --> 00:06:14.089
represent good results. They are performing better

00:06:14.089 --> 00:06:17.009
than random chance. Right. But there are points

00:06:17.009 --> 00:06:19.730
mapped below the diagonal line, which means the

00:06:19.730 --> 00:06:22.269
model is performing worse than random guessing,

00:06:22.889 --> 00:06:25.089
worse than a coin flip. It's so counterintuitive.

00:06:25.370 --> 00:06:28.089
Right. Because at first glance, you'd throw that

00:06:28.089 --> 00:06:31.689
model in the trash. But mathematically, if a predictor

00:06:31.689 --> 00:06:33.889
is consistently wrong, you don't throw it away.

00:06:34.209 --> 00:06:36.709
You just reverse all of its predictions. I secretly

00:06:36.709 --> 00:06:38.689
love this part of binary classification. It's

00:06:38.689 --> 00:06:41.529
amazing. If the algorithm says the email is spam,

00:06:42.069 --> 00:06:44.250
you just send it to the inbox. If it says it's

00:06:44.250 --> 00:06:47.069
safe, you send it to spam. Exactly. By doing

00:06:47.069 --> 00:06:50.009
the exact opposite, you flip it across the center

00:06:50.009 --> 00:06:52.889
point of the graph. You instantly turn a terrible

00:06:52.889 --> 00:06:55.509
predictor into a highly predictive useful tool.

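That reversal is one line of arithmetic in ROC space. A sketch (the operating point here is invented):

```python
def flip(operating_point):
    """Negate every prediction of a classifier.

    Each false alarm becomes a correct rejection and each miss becomes a hit,
    so a point (fpr, tpr) reflects to (1 - fpr, 1 - tpr) across the graph's center."""
    fpr, tpr = operating_point
    return 1.0 - fpr, 1.0 - tpr

bad = (0.7, 0.3)   # below the diagonal: worse than a coin flip
good = flip(bad)   # (0.3, 0.7): now above the diagonal

# The distance from the diagonal (tpr - fpr) keeps its magnitude; only its sign flips:
assert round(bad[1] - bad[0], 9) == -round(good[1] - good[0], 9)
```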
00:06:55.970 --> 00:06:59.060
The math completely supports that. The distance

00:06:59.060 --> 00:07:01.660
from that diagonal line in either direction is

00:07:01.660 --> 00:07:04.199
the true indicator of how much predictive power

00:07:04.199 --> 00:07:06.920
a method has. That's just wild to me. But, you

00:07:06.920 --> 00:07:09.079
know, navigating the space isn't just about plotting

00:07:09.079 --> 00:07:12.879
one dot on a graph. A continuous output model,

00:07:13.019 --> 00:07:15.660
like a blood test or a machine learning probability

00:07:15.660 --> 00:07:17.740
score, gives you a sweeping curve. Right, the

00:07:17.740 --> 00:07:20.199
actual ROC curve. Yeah, and you navigate that

00:07:20.199 --> 00:07:22.480
curve by shifting what the source calls your

00:07:22.480 --> 00:07:24.740
threshold. Okay, let's use the source's example

00:07:24.740 --> 00:07:26.920
of blood protein levels to explain this. Good

00:07:26.920 --> 00:07:30.040
idea. So imagine people with a specific disease

00:07:30.040 --> 00:07:32.720
have an average blood protein level of 2 grams

00:07:32.720 --> 00:07:36.319
per deciliter, and healthy people average 1 gram

00:07:36.319 --> 00:07:38.740
per deciliter. But those are just averages. In

00:07:38.740 --> 00:07:41.790
reality, human biology is messy. Very messy.

00:07:42.110 --> 00:07:43.949
The populations are normally distributed, meaning

00:07:43.949 --> 00:07:46.089
their bell curves overlap. You have perfectly

00:07:46.089 --> 00:07:48.230
healthy people who naturally run a high protein

00:07:48.230 --> 00:07:50.930
level and sick people who have unusually low

00:07:50.930 --> 00:07:53.990
protein. As the doctor, you have to place a vertical

00:07:53.990 --> 00:07:56.610
line, your threshold, somewhere on that overlap.

00:07:57.089 --> 00:08:00.290
If you say anyone over 1.5 grams is officially

00:08:00.290 --> 00:08:02.769
diseased, you'll catch a lot of sick people.

00:08:03.050 --> 00:08:04.970
But you'll also scoop up those healthy people

00:08:04.970 --> 00:08:07.709
whose levels just naturally ran a bit high. Exactly.

00:08:07.829 --> 00:08:09.970
Those are your false alarms. So if you want to

00:08:09.970 --> 00:08:12.509
avoid scaring healthy people, you might slide

00:08:12.509 --> 00:08:15.170
that threshold to the right. You say, we will

00:08:15.170 --> 00:08:17.689
only diagnose the disease if the level is above

00:08:17.689 --> 00:08:22.050
1.8 grams. Right. You drastically reduce your

00:08:22.050 --> 00:08:25.149
false positives. But the inescapable trade-off

00:08:25.149 --> 00:08:27.649
is that you increase your false negatives. You

00:08:27.649 --> 00:08:30.519
start missing sick people. And the shape of the

00:08:30.519 --> 00:08:34.500
ROC curve basically maps out every single possible

00:08:34.500 --> 00:08:37.019
trade-off you could make between those two overlapping

00:08:37.019 --> 00:08:39.679
bell curves. Exactly. It forces you to visualize

00:08:39.679 --> 00:08:41.860
the cost of your decisions. But human nature

00:08:41.860 --> 00:08:44.179
being what it is, we hate looking at complex,

00:08:44.179 --> 00:08:46.960
nuanced trade-offs. Oh, we avoid it at all costs.

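The threshold slide they just walked through is easy to simulate. This sketch reuses the episode's 2 g/dL and 1 g/dL averages and assumes a standard deviation of 0.5 for both bell curves (that spread is an invention, purely so the distributions overlap):

```python
import random

random.seed(0)
# Simulated protein levels (g/dL): sick patients average 2.0, healthy people 1.0,
# both normally distributed with an assumed sd of 0.5.
sick    = [random.gauss(2.0, 0.5) for _ in range(10_000)]
healthy = [random.gauss(1.0, 0.5) for _ in range(10_000)]

def roc_point(threshold):
    """TPR and FPR when everyone above `threshold` is diagnosed as diseased."""
    tpr = sum(level > threshold for level in sick) / len(sick)
    fpr = sum(level > threshold for level in healthy) / len(healthy)
    return fpr, tpr

# Sliding the cutoff from 1.5 to 1.8 trades false alarms for misses:
fpr_low, tpr_low = roc_point(1.5)    # catches most sick patients, more false alarms
fpr_high, tpr_high = roc_point(1.8)  # fewer false alarms, but more missed cases
assert fpr_high < fpr_low and tpr_high < tpr_low

# Sweeping every possible threshold traces out the full ROC curve:
curve = [roc_point(t / 100) for t in range(0, 301)]
```

Plotting `curve` gives the bowed arc being described; the closer the two means, or the wider the spread, the more it sags toward the diagonal.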
00:08:47.120 --> 00:08:49.539
Right. If I'm an executive buying an AI tool,

00:08:49.740 --> 00:08:51.940
I don't want to analyze a curve. I want a single,

00:08:52.159 --> 00:08:54.899
easy-to-read grade. And that brings us to the

00:08:54.899 --> 00:08:58.059
AUC, the area under the curve. Right. The AUC

00:08:58.059 --> 00:09:00.580
is arguably the most commonly cited statistic

00:09:00.580 --> 00:09:04.220
when evaluating classification models. In probabilistic

00:09:04.220 --> 00:09:06.639
terms, it's the probability that a classifier

00:09:06.639 --> 00:09:09.700
will rank a randomly chosen positive instance

00:09:09.700 --> 00:09:12.379
higher than a randomly chosen negative one. OK.

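That ranking definition translates directly into code: compare every positive score against every negative score (a brute-force sketch with invented scores; libraries compute the same quantity from the curve's trapezoids):

```python
from itertools import product

def auc(pos_scores, neg_scores):
    """Probability that a random positive outranks a random negative.

    Ties count as half a win, matching the trapezoidal area under the ROC curve."""
    wins = sum((p > n) + 0.5 * (p == n) for p, n in product(pos_scores, neg_scores))
    return wins / (len(pos_scores) * len(neg_scores))

# Hypothetical model scores (higher = more suspicious):
pos = [0.9, 0.8, 0.6, 0.4]   # truly diseased
neg = [0.7, 0.5, 0.3, 0.2]   # truly healthy
print(auc(pos, neg))  # → 0.8125: 13 of the 16 pairs are ranked correctly
```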
00:09:12.399 --> 00:09:14.519
Let me push back on this because this feels like

00:09:14.519 --> 00:09:16.820
a massive pitfall. Oh, it absolutely is. Like,

00:09:16.860 --> 00:09:18.960
if I'm looking at a model and the vendor boasts

00:09:19.049 --> 00:09:23.710
an incredibly high ROC AUC, say, 0.9, I'd

00:09:23.710 --> 00:09:25.629
intuitively look at that and think, that's a

00:09:25.629 --> 00:09:29.250
90%. That is a solid A grade. Isn't a 0.9 essentially

00:09:29.250 --> 00:09:31.929
an A? It sounds phenomenal, but the source material

00:09:31.929 --> 00:09:35.129
outlines several major criticisms of using AUC

00:09:35.129 --> 00:09:38.049
as a standalone metric. It's a huge trap. Why?

00:09:38.190 --> 00:09:41.269
Because a high AUC of 0.9 might still correspond

00:09:41.269 --> 00:09:43.909
to surprisingly low values of precision in the

00:09:43.909 --> 00:09:46.789
real world, like a precision of 0.2. Wait, let's

00:09:46.789 --> 00:09:48.409
define precision real quick so we don't get lost.

00:09:48.269 --> 00:09:51.090
Sure. Precision is just asking, yeah, of all

00:09:51.090 --> 00:09:53.129
the times the alarm actually rang, how many times

00:09:53.129 --> 00:09:55.490
was there a real fire? Exactly. So a precision

00:09:55.490 --> 00:09:58.629
of 0.2 means 80% of your alarms are entirely

00:09:58.629 --> 00:10:02.029
false. How can the area under the curve be an

00:10:02.029 --> 00:10:04.850
A grade if the test precision is that useless?

00:10:05.210 --> 00:10:08.509
Because of how the AUC calculates total area,

00:10:09.149 --> 00:10:12.250
it includes the entire area under the curve across

00:10:12.250 --> 00:10:14.730
every conceivable threshold. Even the bad ones.

00:10:15.070 --> 00:10:17.970
Yes, including regions of the curve with low

00:10:17.970 --> 00:10:21.370
sensitivity and low specificity, like below

00:10:21.370 --> 00:10:24.470
0.5, that are practically useless in the real world.

00:10:24.769 --> 00:10:27.149
It's like grading a restaurant based on its entire

00:10:27.149 --> 00:10:30.139
20-page menu, giving it five stars because the

00:10:30.139 --> 00:10:32.340
caviar is world class, even though everyone only

00:10:32.340 --> 00:10:33.940
orders the pizza, and the pizza's terrible. That

00:10:33.940 --> 00:10:36.720
is a perfect analogy. What's fascinating here

00:10:36.720 --> 00:10:39.460
is that summarizing a complex trade-off into

00:10:39.460 --> 00:10:42.919
a single number inherently loses vital information

00:10:42.919 --> 00:10:45.019
about how the algorithm actually behaves. Right.

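Here is a small simulation of that trap (every number in it is invented): with a rare disease, a model can rank well, earning a high AUC, while most of its alarms are still false.

```python
import random

random.seed(1)
# A 1% prevalence screening scenario: 100 sick patients, 10,000 healthy people.
# Scores overlap, but sick patients tend to score higher.
sick_scores    = [random.gauss(2.0, 1.0) for _ in range(100)]
healthy_scores = [random.gauss(0.0, 1.0) for _ in range(10_000)]

# AUC via the ranking definition: fraction of sick/healthy pairs ordered correctly.
wins = sum(s > h for s in sick_scores for h in healthy_scores)
auc = wins / (len(sick_scores) * len(healthy_scores))

# Precision at a cutoff of 1.0: of everyone flagged, how many are actually sick?
tp = sum(s > 1.0 for s in sick_scores)
fp = sum(h > 1.0 for h in healthy_scores)
precision = tp / (tp + fp)

# An A-grade AUC coexisting with a sea of false alarms:
assert auc > 0.85 and precision < 0.15
```

The imbalance does the damage: even a modest per-person false alarm rate, applied to 10,000 healthy people, swamps the 100 true cases.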
00:10:45.120 --> 00:10:47.340
It bundles absolutely useless threshold data

00:10:47.340 --> 00:10:49.940
into your final grade. Exactly. It tells you

00:10:49.940 --> 00:10:52.360
about sensitivity, but it masks the precision

00:10:52.360 --> 00:10:54.830
almost entirely. Which is pretty alarming. If

00:10:54.830 --> 00:10:56.929
you're deploying a cancer screening tool based

00:10:56.929 --> 00:10:59.070
purely on an AUC score, thinking it's highly

00:10:59.070 --> 00:11:00.669
accurate. When in reality, it's just going to

00:11:00.669 --> 00:11:03.090
flood your hospital with false alarms. Wow. So

00:11:03.090 --> 00:11:06.070
because of these exact flaws, scientists and

00:11:06.070 --> 00:11:08.710
engineers have had to invent alternative ways

00:11:08.710 --> 00:11:11.289
to measure performance that focus on what actually

00:11:11.289 --> 00:11:15.090
matters. Yes. The ROC curve is foundational,

00:11:15.090 --> 00:11:18.149
but it is no longer the only tool in the shed.

00:11:18.590 --> 00:11:20.049
Let's walk through some of those alternatives

00:11:20.049 --> 00:11:22.450
mentioned in the source that fix the ROC curve's

00:11:22.450 --> 00:11:26.519
blind spots. First up is the TOC, or total operating

00:11:26.519 --> 00:11:28.720
characteristic. Right. So the primary flaw of

00:11:28.720 --> 00:11:31.360
the ROC curve is that it only provides ratios.

00:11:31.740 --> 00:11:34.139
OK. At a given threshold, it might tell you your

00:11:34.139 --> 00:11:36.860
hit rate is 0.3 and your false alarm rate is

00:11:36.860 --> 00:11:40.080
0.2. But ratios hide reality. What do you mean?

00:11:40.580 --> 00:11:43.039
Well, catching one out of two sick people is

00:11:43.039 --> 00:11:46.259
a 50% rate. Catching 1,000 out of 2,000 sick

00:11:46.259 --> 00:11:49.539
people is also a 50% rate. The ROC curve strips

00:11:49.539 --> 00:11:52.080
away the scale of your data. Oh, I see. But the

00:11:52.080 --> 00:11:54.820
TOC reveals the total information in the contingency

00:11:54.820 --> 00:11:56.679
table, right? Exactly. It shows you the actual

00:11:56.679 --> 00:12:00.000
absolute number of hits, misses, false alarms,

00:12:00.220 --> 00:12:02.100
and correct rejections for each threshold. That

00:12:02.100 --> 00:12:04.179
seems much more practical. Another one the source

00:12:04.179 --> 00:12:06.820
mentions is the DET graph, the detection error

00:12:06.820 --> 00:12:09.620
trade-off. Yes. This one plots the false negative

00:12:09.620 --> 00:12:12.100
rate against the false positive rate on nonlinear

00:12:12.100 --> 00:12:15.139
axes. I love the analogy for this one. They deliberately

00:12:15.139 --> 00:12:17.460
warp the axes to spend more visual real estate

00:12:17.460 --> 00:12:20.519
on the errors. Right. Stretching the axes solves

00:12:20.519 --> 00:12:24.460
a very practical visual problem. On a standard

00:12:24.460 --> 00:12:27.620
ROC curve, most of the graph is taken up by thresholds

00:12:27.620 --> 00:12:30.039
nobody cares about. Yeah, and this relates perfectly

00:12:30.039 --> 00:12:32.580
to how the automatic speaker recognition community

00:12:32.779 --> 00:12:35.899
uses it. Like if you're building Siri or Alexa,

00:12:36.480 --> 00:12:39.000
you're constantly balancing false alarms and

00:12:39.000 --> 00:12:41.220
missed detections. Exactly. If Siri goes off

00:12:41.220 --> 00:12:44.120
randomly during a quiet movie, it's highly annoying.

00:12:44.620 --> 00:12:46.899
But if Siri ignores your voice command when you're

00:12:46.899 --> 00:12:48.779
driving on the highway, it's actually dangerous.

00:12:49.120 --> 00:12:51.759
Right. So the DET graph zooms in on the exact

00:12:51.759 --> 00:12:54.159
region of interest where those specific false

00:12:54.159 --> 00:12:56.519
alarms happen, rather than wasting space on the

00:12:56.519 --> 00:12:58.320
parts of the graph nobody cares about. It's all

00:12:58.320 --> 00:13:00.519
about managing that annoyance to danger ratio.

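The warping itself is just the inverse normal CDF, the so-called normal-deviate or probit scale. A standard-library sketch (the wake-word error rates are made up):

```python
from statistics import NormalDist

def det_point(fpr, fnr):
    """Map (false positive rate, false negative rate) onto normal-deviate axes.

    Under a two-Gaussian signal detection model, operating points that bow
    tightly into the corner of an ROC plot spread out into a straight line here."""
    z = NormalDist().inv_cdf
    return z(fpr), z(fnr)

# Three hypothetical wake-word tunings, from trigger-happy to hard of hearing:
for fpr, fnr in [(0.10, 0.01), (0.05, 0.05), (0.01, 0.10)]:
    x, y = det_point(fpr, fnr)
    print(f"fpr={fpr:.2f}, fnr={fnr:.2f} -> ({x:+.2f}, {y:+.2f})")
```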
00:13:00.820 --> 00:13:02.519
Exactly. Okay, there's one more alternative I

00:13:02.519 --> 00:13:06.259
want to hit. ZROC, applying a standard z-score

00:13:06.259 --> 00:13:08.440
to transform the curve into a straight line.

00:13:08.539 --> 00:13:11.100
Yes. If we connect this to the bigger picture,

00:13:11.440 --> 00:13:13.419
this shows how different industries have had

00:13:13.419 --> 00:13:15.940
to customize how they measure truth based on

00:13:15.940 --> 00:13:18.179
what kinds of errors they can tolerate. Right.

00:13:18.600 --> 00:13:21.820
And ZROC is heavily used in psychology, specifically

00:13:21.820 --> 00:13:24.850
memory strength theory tests. Exactly. When you

00:13:24.850 --> 00:13:27.110
test memory, you're plotting the strength of

00:13:27.110 --> 00:13:30.049
recognition. And if you apply a standard z-score,

00:13:30.169 --> 00:13:33.509
it transforms that bowed ROC curve into a straight

00:13:33.509 --> 00:13:35.610
line. Which is so cool. The source goes deep

00:13:35.610 --> 00:13:38.450
into testing subjects, with targets, objects

00:13:38.450 --> 00:13:41.129
they actually studied, and lures, objects designed

00:13:41.129 --> 00:13:43.570
to trick them. Right. And this brings up the

00:13:43.570 --> 00:13:46.529
Yonelinas model of amnesia. Oh, yeah. That blew

00:13:46.529 --> 00:13:49.370
my mind. It suggests human memory is two separate

00:13:49.370 --> 00:13:52.129
mechanisms. First, you have familiarity, which

00:13:52.129 --> 00:13:54.789
is just a vague feeling. It operates like a continuous

00:13:54.789 --> 00:13:57.590
bell curve. OK. Second, you have recollection,

00:13:57.750 --> 00:13:59.990
which is a discrete all or nothing process. You

00:13:59.990 --> 00:14:02.370
either objectively remember it or you don't.

00:14:02.590 --> 00:14:05.049
And mathematically, adding that discrete recollection

00:14:05.049 --> 00:14:09.029
process bends the straight ZROC line, forcing

00:14:09.029 --> 00:14:11.870
it to become concave up. Yes. But for patients

00:14:11.870 --> 00:14:14.129
suffering from amnesia who have lost that discrete

00:14:14.129 --> 00:14:16.350
recollection ability, they only have the vague

00:14:16.350 --> 00:14:18.710
familiarity left. Exactly. And because of that

00:14:18.710 --> 00:14:21.570
physical change in the brain, their ZROC curve

00:14:21.570 --> 00:14:24.110
loses its concavity and reverts to a straight

00:14:24.110 --> 00:14:27.309
line. We are using a mathematical concept designed

00:14:27.309 --> 00:14:31.330
for WWII radar to literally map the geometric

00:14:31.330 --> 00:14:34.879
shape of human amnesia. That is just stunning.

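That straightening can be checked numerically. Under a pure familiarity model, studied items and lures are two equal-variance Gaussians, and the z-transform puts every operating point on one line (the d' value and criteria below are assumed purely for illustration):

```python
from statistics import NormalDist

phi = NormalDist().cdf    # standard normal CDF
z = NormalDist().inv_cdf  # its inverse: the z-score transform

# Familiarity-only model: studied items feel like N(d', 1), lures like N(0, 1).
d_prime = 1.5                      # assumed separation in memory strength
criteria = [-0.5, 0.0, 0.5, 1.0]   # confidence criteria along the strength axis

points = []
for c in criteria:
    hit_rate = 1 - phi(c - d_prime)  # P(studied item exceeds the criterion)
    fa_rate = 1 - phi(c)             # P(lure exceeds the criterion)
    points.append((z(fa_rate), z(hit_rate)))

# On z-axes every point satisfies z(hit) - z(fa) = d': a straight line of slope 1.
for zf, zh in points:
    assert abs((zh - zf) - d_prime) < 1e-7
```

Adding a discrete recollection component breaks that identity, which is the concavity the episode describes; remove it, as in amnesia, and the line returns.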
00:14:35.059 --> 00:14:37.279
It really is an incredible evolution of applied

00:14:37.279 --> 00:14:39.960
statistics. So bringing it all back to you, the

00:14:39.960 --> 00:14:42.600
listener, whether you are reading a medical study,

00:14:42.860 --> 00:14:45.179
evaluating a business algorithm, or just reading

00:14:45.179 --> 00:14:48.279
a headline about a new AI. You now know that

00:14:48.279 --> 00:14:50.860
accuracy isn't a single magical number. Right.

00:14:50.860 --> 00:14:53.639
You are equipped to ask where the threshold is

00:14:53.639 --> 00:14:56.059
actually set and what kinds of false alarms are

00:14:56.059 --> 00:14:58.799
hiding under the curve. Exactly. But we do want

00:14:58.799 --> 00:15:01.519
to leave you with one final provocative concept

00:15:01.519 --> 00:15:04.590
to mull over. All right. Everything we've discussed

00:15:04.590 --> 00:15:07.190
today, yes or no, diseased or healthy, enemy

00:15:07.190 --> 00:15:10.029
plane or bird, has been about binary choices.

00:15:10.210 --> 00:15:12.190
Right, two options. But what happens when there

00:15:12.190 --> 00:15:15.350
are three, four, or fifty classes? Oh wow. The

00:15:15.350 --> 00:15:17.830
source notes that to use ROC for multiple classes,

00:15:17.909 --> 00:15:20.269
you have to move beyond a 2D graph entirely.

00:15:20.690 --> 00:15:22.950
You have to calculate the volume under the surface,

00:15:23.149 --> 00:15:25.929
the VUS, of a multi-dimensional hyperspace.

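To make VUS slightly less abstract, here is a rough Monte Carlo sketch for three ordered classes (all distributions invented). It estimates the chance that one randomly drawn example from each class is ranked in the right order, the three-class analogue of AUC's pairwise-ranking reading:

```python
import random

random.seed(2)
# Invented score distributions for three ordered classes (low / medium / high risk):
low  = [random.gauss(0.0, 1.0) for _ in range(2_000)]
mid  = [random.gauss(1.5, 1.0) for _ in range(2_000)]
high = [random.gauss(3.0, 1.0) for _ in range(2_000)]

def vus_estimate(a, b, c, trials=50_000):
    """Monte Carlo estimate of the volume under the three-class ROC surface.

    Draw one score per class and check the ordering; random guessing scores
    1/6 here (one correct ordering out of 3!), not the binary chance level of 1/2."""
    hits = 0
    for _ in range(trials):
        if random.choice(a) < random.choice(b) < random.choice(c):
            hits += 1
    return hits / trials

vus = vus_estimate(low, mid, high)
assert vus > 0.5  # far better than the 1/6 chance level
```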
00:15:25.990 --> 00:15:28.950
That sounds impossibly complicated. It is. Imagine

00:15:28.950 --> 00:15:32.190
how exponentially complex defining truth becomes

00:15:32.190 --> 00:15:35.110
when we step out of a binary world and try to

00:15:35.110 --> 00:15:38.009
map out the infinite overlapping thresholds of

00:15:38.009 --> 00:15:41.250
reality. That is a lot to think about. Next time

00:15:41.250 --> 00:15:42.769
I check my spam folder, I'm definitely going

00:15:42.769 --> 00:15:45.029
to be picturing a multi-dimensional hyperspace.

00:15:45.429 --> 00:15:48.129
As you should. Well, that wraps up our deep dive

00:15:48.129 --> 00:15:50.090
for today. Thanks for exploring the invisible

00:15:50.090 --> 00:15:50.789
math with us.
