WEBVTT

00:00:00.000 --> 00:00:02.680
You know how comforting the math on a school

00:00:02.680 --> 00:00:06.500
test is. Like, 90% is an A, 80% is a B. Right.

00:00:06.599 --> 00:00:08.939
It's just straightforward. Exactly. You tally

00:00:08.939 --> 00:00:11.140
up the correct answers, and the final result

00:00:11.140 --> 00:00:13.500
just sits right there on the page. I mean, the

00:00:13.500 --> 00:00:15.560
math does not care about your background, your

00:00:15.560 --> 00:00:18.500
demographics, or your zip code. It's completely

00:00:18.500 --> 00:00:20.780
objective. Or at least, that's what we like to

00:00:20.780 --> 00:00:23.829
think. Right. And that brings us to today's topic.

00:00:23.949 --> 00:00:26.829
Welcome to the deep dive. Today, we are looking

00:00:26.829 --> 00:00:29.269
at a stack of sources, primarily focusing on

00:00:29.269 --> 00:00:33.630
this really dense, but incredibly relevant Wikipedia

00:00:33.630 --> 00:00:36.990
article on fairness in machine learning. Mm-hmm.

00:00:37.189 --> 00:00:39.590
It's a huge topic right now. It really is. Our

00:00:39.590 --> 00:00:42.229
mission today is to figure out what happens when

00:00:42.229 --> 00:00:45.109
that comforting, seemingly objective math is

00:00:45.109 --> 00:00:47.789
handed the power to decide, well, to decide major

00:00:47.789 --> 00:00:50.090
things for you, like if you get a mortgage or

00:00:50.090 --> 00:00:53.049
land a job, or even go to jail. Yeah, the stakes

00:00:53.049 --> 00:00:55.630
are incredibly high. They are. Because what if

00:00:55.630 --> 00:00:57.950
the grading rubric itself is secretly rigged

00:00:57.950 --> 00:01:01.350
from the start? We are exploring how these cold

00:01:01.350 --> 00:01:04.290
calculating algorithms are basically inheriting

00:01:04.290 --> 00:01:07.189
our worst human biases. And, you know, to really

00:01:07.189 --> 00:01:09.569
understand this, we have to look back. Because

00:01:09.569 --> 00:01:11.810
while artificial intelligence feels like this

00:01:11.810 --> 00:01:15.129
brand new futuristic frontier, the scientific

00:01:15.129 --> 00:01:17.269
community has actually been wrestling with this

00:01:17.269 --> 00:01:20.329
specific problem for a surprisingly long time.

00:01:20.269 --> 00:01:22.469
Wait, really? Like, how long are we talking?

00:01:22.590 --> 00:01:25.609
We're talking mid-1960s and 1970s. Oh, wow.

00:01:25.909 --> 00:01:27.629
I wouldn't have guessed that. Yeah, it was following

00:01:27.629 --> 00:01:30.170
the American Civil Rights Movement. You had the

00:01:30.170 --> 00:01:32.469
passage of the U.S. Civil Rights Act of 1964,

00:01:32.969 --> 00:01:35.069
and suddenly researchers were desperately trying

00:01:35.069 --> 00:01:38.469
to figure out quantitative statistical ways to

00:01:38.469 --> 00:01:41.370
test for unjust discrimination. Oh, right. Like,

00:01:41.569 --> 00:01:44.829
in testing and hiring practices. Exactly. But

00:01:44.829 --> 00:01:47.969
here's the wild part. By the end of the 1970s,

00:01:48.010 --> 00:01:51.140
that debate had largely just... Well, it vanished.

00:01:51.400 --> 00:01:54.939
Why? Did they solve it? No, not at all. The different

00:01:54.939 --> 00:01:57.599
competing mathematical notions of what fairness

00:01:57.599 --> 00:02:01.120
even meant left so little room for clarity that

00:02:01.120 --> 00:02:03.420
the discussion just stalled out. People couldn't

00:02:03.420 --> 00:02:05.700
agree. So let me get this straight. Decades later,

00:02:05.920 --> 00:02:08.000
we have basically resurrected that exact same

00:02:08.000 --> 00:02:10.159
unresolved philosophical debate. But now we've

00:02:10.159 --> 00:02:12.300
just handed it over to machines. That is exactly

00:02:12.300 --> 00:02:15.280
what happened. We gave an unresolved human problem.

00:02:15.469 --> 00:02:18.030
to computers. OK, well, before we get into the

00:02:18.030 --> 00:02:21.189
headache of how engineers are, you know, trying

00:02:21.189 --> 00:02:24.030
to actually fix AI, we have to look at how these

00:02:24.030 --> 00:02:26.090
systems are failing people in the real world

00:02:26.090 --> 00:02:29.050
right now because the stakes here, they aren't

00:02:29.050 --> 00:02:31.669
theoretical. No, they are very real. And the

00:02:31.669 --> 00:02:34.689
legal system is perhaps the most heavily scrutinized

00:02:34.689 --> 00:02:37.050
area for this. Right. The sources mention that

00:02:37.050 --> 00:02:39.509
big report. Yeah, the 2016 ProPublica report.

00:02:39.530 --> 00:02:42.090
They did a major investigation into a risk assessment

00:02:42.090 --> 00:02:44.389
software called COMPAS. And COMPAS is used

00:02:44.389 --> 00:02:46.610
in the U.S. judicial system, right? Mm-hmm.

00:02:46.710 --> 00:02:49.569
It's a tool widely used to predict a defendant's

00:02:49.569 --> 00:02:53.270
likelihood of recidivism. Which is a... the probability

00:02:53.270 --> 00:02:55.990
they will reoffend. Exactly. Now, ProPublica

00:02:55.990 --> 00:02:58.409
claimed this algorithm was racially biased. Their

00:02:58.409 --> 00:03:00.610
data showed that black defendants were almost

00:03:00.610 --> 00:03:02.830
twice as likely to be incorrectly labeled as

00:03:02.830 --> 00:03:05.250
a higher risk than white defendants. Almost twice

00:03:05.250 --> 00:03:07.990
as likely, just incorrectly labeled. Yes. And

00:03:07.990 --> 00:03:10.610
meanwhile, the software made the inverse error

00:03:10.610 --> 00:03:13.129
with white defendants. It frequently labeled

00:03:13.129 --> 00:03:15.289
them as lower risk when they actually went on

00:03:15.289 --> 00:03:19.020
to reoffend. Wow. But the creator of the software

00:03:19.020 --> 00:03:21.560
pushed back on that, didn't they? They did. The

00:03:21.560 --> 00:03:24.189
creator, Northpointe Inc., heavily disputed the

00:03:24.189 --> 00:03:26.750
report. They argued their tool was completely

00:03:26.750 --> 00:03:29.289
fair because from their mathematical perspective,

00:03:29.710 --> 00:03:32.930
a high risk score meant the exact same likelihood

00:03:32.930 --> 00:03:35.189
of reoffending regardless of the defendant's

00:03:35.189 --> 00:03:37.650
race. OK, so Northpointe is saying, look, our

00:03:37.650 --> 00:03:40.490
math is sound, but ProPublica is saying, look

00:03:40.490 --> 00:03:43.169
at the disproportionate harm these errors are

00:03:43.169 --> 00:03:46.030
causing. Right. ProPublica refuted Northpointe's

00:03:46.030 --> 00:03:48.449
defense, arguing that the differing error rates

00:03:48.449 --> 00:03:50.990
were what mattered. It became this massive clash

00:03:50.990 --> 00:03:53.060
of what the data was actually saying. And what

00:03:53.060 --> 00:03:55.900
metric of fairness mattered more, really? And

00:03:55.900 --> 00:03:57.900
we see similar clashes in corporate America,

00:03:58.159 --> 00:04:00.199
too. Oh, absolutely. Like Amazon built an AI

00:04:00.199 --> 00:04:02.580
software to review job resumes, and it turned

00:04:02.580 --> 00:04:05.120
out to be systematically penalizing women. Yeah,

00:04:05.120 --> 00:04:07.500
that was a famous case. The algorithm had been

00:04:07.500 --> 00:04:10.139
trained on a decade of past resumes, which, you

00:04:10.139 --> 00:04:12.900
know, came predominantly from men. So the AI

00:04:12.900 --> 00:04:15.520
learned to literally penalize applications that

00:04:15.520 --> 00:04:18.420
included the word "women." It's wild. Like, if

00:04:18.420 --> 00:04:20.540
an applicant listed they were captain of the

00:04:20.540 --> 00:04:22.829
women's chess club, their score was docked.

00:04:22.990 --> 00:04:24.790
Just for having the word in there. And then there

00:04:24.790 --> 00:04:27.980
was the Apple Card issue in 2019. Right. The

00:04:27.980 --> 00:04:30.360
algorithm determining credit limits came under

00:04:30.360 --> 00:04:33.199
heavy fire. It was giving significantly higher

00:04:33.199 --> 00:04:35.600
credit limits to men than to women, even for

00:04:35.600 --> 00:04:37.899
married couples. The sources pointed out this

00:04:37.899 --> 00:04:40.300
was happening even in cases where couples shared

00:04:40.300 --> 00:04:42.699
all of their finances. They had merged assets

00:04:42.699 --> 00:04:45.439
and possessed the exact same credit history.

00:04:45.519 --> 00:04:47.639
It's a perfect example of how the math isn't

00:04:47.639 --> 00:04:50.699
just objective. And beyond financial and employment

00:04:50.699 --> 00:04:53.899
algorithms, visual and language biases are deeply

00:04:53.899 --> 00:04:56.660
pervasive too. Oh, yeah. The image recognition

00:04:56.660 --> 00:05:00.199
failures are just startling. Back in 2015, Google

00:05:00.199 --> 00:05:03.220
Photos mistakenly labeled images of a black couple

00:05:03.220 --> 00:05:06.120
as gorillas. Which is horrific. And then in 2020,

00:05:06.540 --> 00:05:08.379
users discovered that an automatic image cropping

00:05:08.379 --> 00:05:11.220
tool used by Twitter was actively prioritizing

00:05:11.220 --> 00:05:14.129
lighter-skinned faces. Like when it generated

00:05:14.129 --> 00:05:16.610
thumbnail previews for timelines, it just defaulted

00:05:16.610 --> 00:05:19.310
to the lighter faces. And those visual biases,

00:05:19.329 --> 00:05:21.689
they extend directly into the large language

00:05:21.689 --> 00:05:24.009
models everyone is interacting with today, like

00:05:24.009 --> 00:05:26.829
ChatGPT. Right. Everybody is using those now.

00:05:27.290 --> 00:05:29.430
And current models suffer heavily from statistical

00:05:29.430 --> 00:05:32.569
sampling bias, specifically what we call language

00:05:32.569 --> 00:05:35.529
bias, because the vast majority of their training

00:05:35.529 --> 00:05:38.850
data is in English. These models tend to present

00:05:38.850 --> 00:05:41.269
Anglo-American political and cultural views

00:05:41.269 --> 00:05:45.610
as, well, universal truths. The source material

00:05:45.610 --> 00:05:48.209
highlighted a fascinating example of this. If

00:05:48.209 --> 00:05:50.490
you ask a language model what is liberalism,

00:05:51.029 --> 00:05:53.529
it defaults to the Anglo-American perspective,

00:05:53.769 --> 00:05:56.720
emphasizing human rights and equality. But it

00:05:56.720 --> 00:05:59.459
completely ignores the dominant Vietnamese perspective,

00:05:59.899 --> 00:06:02.740
which views liberalism as opposing state intervention

00:06:02.740 --> 00:06:05.139
in personal life. Exactly. And it also ignores

00:06:05.139 --> 00:06:07.180
the prevalent Chinese perspective, which focuses

00:06:07.180 --> 00:06:10.480
on the limitation of government power. The chatbot

00:06:10.480 --> 00:06:13.300
presents itself as this worldly, omniscient oracle,

00:06:13.620 --> 00:06:16.879
but it is effectively blind to non-English perspectives,

00:06:17.160 --> 00:06:18.980
simply because of the language distribution in

00:06:18.980 --> 00:06:21.279
its training data. You know, listening to all

00:06:21.279 --> 00:06:23.740
these examples, it really shatters the illusion

00:06:23.740 --> 00:06:26.860
of the objective machine. I mean, AI is not some

00:06:26.860 --> 00:06:29.579
super smart alien entity arriving with perfect

00:06:29.579 --> 00:06:32.579
logic. Not at all. It operates more like a parrot,

00:06:32.939 --> 00:06:35.160
just eagerly repeating the Internet's worst habits

00:06:35.160 --> 00:06:37.759
and historical inequalities right back to us.

00:06:38.079 --> 00:06:40.160
That's a great analogy. And for you, the listener,

00:06:40.699 --> 00:06:43.970
the crucial takeaway here is how deeply this

00:06:43.970 --> 00:06:46.689
invisible math is currently judging you. Judging

00:06:46.689 --> 00:06:49.769
us every day. Every day. Whether you are applying

00:06:49.769 --> 00:06:52.089
for a mortgage, where algorithms are currently

00:06:52.089 --> 00:06:54.410
more likely to reject non-white applicants,

00:06:54.829 --> 00:06:57.689
or you're simply scrolling through a personalized

00:06:57.689 --> 00:06:59.810
news feed on social media. You're constantly

00:06:59.810 --> 00:07:02.509
being evaluated by systems built on data that

00:07:02.509 --> 00:07:05.009
reflects a very flawed world. Exactly. Which

00:07:05.009 --> 00:07:07.290
naturally leads to the big question. If we know

00:07:07.290 --> 00:07:09.110
the machines are biased, why don't the engineers

00:07:09.110 --> 00:07:11.129
just, you know, open up the code and program

00:07:11.129 --> 00:07:13.449
them to be fair? Just type in a mathematical

00:07:13.449 --> 00:07:16.089
rule that says treat everyone equally? Well,

00:07:16.129 --> 00:07:19.430
here is where we hit a massive, mind-bending

00:07:19.430 --> 00:07:23.569
contradiction. Defining fair mathematically actually

00:07:23.569 --> 00:07:26.089
breaks the system. Wait, really? How does fairness

00:07:26.089 --> 00:07:28.970
break the system? In machine learning, you have

00:07:28.970 --> 00:07:31.290
a target variable, the thing you are trying to

00:07:31.290 --> 00:07:34.149
predict, like who will repay a loan. And you

00:07:34.149 --> 00:07:36.290
have sensitive characteristics, like gender,

00:07:36.730 --> 00:07:40.730
ethnicity, or age. Researchers have established

00:07:40.730 --> 00:07:43.750
three distinct mathematical definitions for group

00:07:43.750 --> 00:07:45.850
fairness. Okay, lay them on me. The first is

00:07:45.850 --> 00:07:47.829
called independence. It's sometimes referred

00:07:47.829 --> 00:07:50.949
to as demographic parity or statistical parity.

00:07:51.149 --> 00:07:54.069
So independence is about equal outcomes, right?

00:07:54.110 --> 00:07:57.410
Like if a company is hiring and the local applicant

00:07:57.410 --> 00:08:01.350
pool is 50% men and 50% women, the final group

00:08:01.350 --> 00:08:03.730
of people who actually get hired must reflect

00:08:03.730 --> 00:08:07.649
that exact same 50-50 ratio. Exactly. The outcome

00:08:07.649 --> 00:08:09.689
is totally independent of the sensitive trait.

00:08:09.889 --> 00:08:11.649
Makes sense. What's the second one? The second

00:08:11.649 --> 00:08:14.089
definition is called separation or equalized

00:08:14.089 --> 00:08:17.050
odds. This metric demands that the true positive

00:08:17.050 --> 00:08:20.170
rate and the false positive rate must be identical

00:08:20.329 --> 00:08:22.790
across all groups. So it's looking at the errors.

00:08:23.069 --> 00:08:25.990
Right. This means the algorithm's accuracy and

00:08:25.990 --> 00:08:29.790
its mistakes must be evenly distributed. It asks:

00:08:30.810 --> 00:08:33.690
Out of all the people who truly deserve the job,

00:08:34.210 --> 00:08:37.070
did equal percentages of qualified men and qualified

00:08:37.070 --> 00:08:39.409
women get hired? And out of the people who were

00:08:39.409 --> 00:08:41.909
unqualified, were they rejected at equal rates?

00:08:42.500 --> 00:08:45.200
Precisely. Okay, so independence is kind of a

00:08:45.200 --> 00:08:47.559
blunt instrument guaranteeing an equal overall

00:08:47.559 --> 00:08:50.460
outcome ratio, but separation is more like a

00:08:50.460 --> 00:08:52.919
microscope looking at equal accuracy among the

00:08:52.919 --> 00:08:55.360
people who actually meet the criteria. That's

00:08:55.360 --> 00:08:57.700
a very good way to put it. Now, the third definition

00:08:57.700 --> 00:09:00.779
is sufficiency. This dictates that if an AI predicts

00:09:00.779 --> 00:09:03.399
two people belong to a positive group, their

00:09:03.399 --> 00:09:05.620
actual real-world probability of belonging to

00:09:05.620 --> 00:09:07.820
it must be equal. Wait, how is it different from

00:09:07.820 --> 00:09:10.139
the second one? It's about predictive reliability.

00:09:10.500 --> 00:09:12.940
If the algorithm labels a man and woman as high

00:09:12.940 --> 00:09:15.720
risk for defaulting on a loan, the actual chance

00:09:15.720 --> 00:09:18.200
that they default needs to be identical. The

00:09:18.200 --> 00:09:20.539
prediction's reliability cannot change depending

00:09:20.539 --> 00:09:22.759
on the demographic group it is analyzing. Okay,

00:09:22.759 --> 00:09:25.049
I think I'm wrapping my head around this. We

00:09:25.049 --> 00:09:28.070
have independence for equal outcomes, separation

00:09:28.070 --> 00:09:31.009
for equal error rates, and sufficiency for equal

00:09:31.009 --> 00:09:34.629
predictive reliability. Three solid ways to measure

00:09:34.629 --> 00:09:37.549
fairness. Yes, three very logical definitions.
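
A minimal sketch of how those three checks could be computed on a batch of predictions. The data and column names below are made up purely for illustration; nothing here comes from the sources.

```python
# Toy data: binary outcomes, binary decisions, one sensitive attribute.
import numpy as np

y_true = np.array([1, 0, 1, 1, 0, 1, 0, 0, 1, 0])   # what actually happened (e.g., repaid)
y_pred = np.array([1, 0, 1, 0, 0, 1, 1, 0, 1, 0])   # what the model decided
group  = np.array(["A", "A", "A", "A", "A", "B", "B", "B", "B", "B"])  # sensitive trait

for g in ("A", "B"):
    m = group == g
    selection_rate = y_pred[m].mean()               # independence / demographic parity
    tpr = y_pred[m][y_true[m] == 1].mean()          # separation: true positive rate
    fpr = y_pred[m][y_true[m] == 0].mean()          # separation: false positive rate
    ppv = y_true[m][y_pred[m] == 1].mean()          # sufficiency: how reliable a positive label is
    print(f"group {g}: selection={selection_rate:.2f} TPR={tpr:.2f} FPR={fpr:.2f} PPV={ppv:.2f}")

# Independence compares the selection rates, separation compares TPR and FPR,
# and sufficiency compares PPV across the groups.
```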

00:09:37.730 --> 00:09:40.470
So why not just program the AI to satisfy all

00:09:40.470 --> 00:09:43.009
three simultaneously? Problem solved. Because

00:09:43.009 --> 00:09:46.149
mathematically, achieving what researchers call

00:09:46.149 --> 00:09:49.500
total fairness is impossible. It is a proven

00:09:49.500 --> 00:09:52.320
mathematical paradox. Impossible, like literally

00:09:52.320 --> 00:09:55.100
impossible. Literally, unless the baseline data

00:09:55.100 --> 00:09:58.460
has absolutely zero correlation between the sensitive

00:09:58.460 --> 00:10:01.440
trait and the outcome, which is virtually never

00:10:01.440 --> 00:10:03.759
the case in the real world, you are forced to

00:10:03.759 --> 00:10:05.679
choose. I'm going to need an everyday example

00:10:05.679 --> 00:10:09.159
to understand this. Let's say I am hiring software

00:10:09.159 --> 00:10:11.340
developers. OK, imagine you are hiring from two

00:10:11.340 --> 00:10:13.559
different demographic groups, group A and group

00:10:13.559 --> 00:10:17.320
B. Due to historical systemic wealth disparities,

00:10:17.759 --> 00:10:20.100
Group A has had widespread access to expensive

00:10:20.100 --> 00:10:22.779
computer science boot camps for the last 20 years.

00:10:23.000 --> 00:10:26.659
And Group B has not. Right. Therefore, in your

00:10:26.659 --> 00:10:29.659
baseline data, a higher percentage of Group A

00:10:29.659 --> 00:10:31.840
currently possesses the specific coding skills

00:10:31.840 --> 00:10:34.779
you need. If you enforce the first rule, independence,

00:10:35.399 --> 00:10:38.340
you must hire exactly 50% from Group A and 50%

00:10:38.340 --> 00:10:40.799
from Group B. But because of that historical

00:10:40.799 --> 00:10:43.100
disparity, the only way I can hit that 50-50

00:10:43.100 --> 00:10:45.759
quota is by lowering my hiring threshold for

00:10:45.759 --> 00:10:48.419
Group B and raising it for Group A. Precisely

00:10:48.419 --> 00:10:50.879
the issue. By lowering the threshold for Group

00:10:50.879 --> 00:10:54.220
B, you are mathematically forced to hire some

00:10:54.220 --> 00:10:57.000
candidates who might not actually have the required

00:10:57.000 --> 00:10:59.120
skills. Which increases their false positive

00:10:59.120 --> 00:11:01.980
rate. Exactly. And by raising the threshold for

00:11:01.980 --> 00:11:05.059
Group A, you are forced to reject highly qualified

00:11:05.059 --> 00:11:07.320
coders, increasing their false negative rate.

00:11:07.419 --> 00:11:10.240
Oh, I see it now. And doing that completely violates

00:11:10.240 --> 00:11:13.000
the second rule, separation, because equalized

00:11:13.000 --> 00:11:15.139
odds demands that those false positive and false

00:11:15.139 --> 00:11:17.299
negative rates be identical across both groups.

00:11:17.440 --> 00:11:20.139
It is a zero-sum game. Wow. So if I guarantee

00:11:20.139 --> 00:11:22.860
equal outcomes, I am mathematically forced to

00:11:22.860 --> 00:11:25.580
accept unequal error rates. You cannot mathematically

00:11:25.580 --> 00:11:27.679
correct a historical imbalance without throwing

00:11:27.679 --> 00:11:29.980
another fairness metric entirely out of whack.
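
Here is a toy simulation of that hiring example, with invented numbers, just to make the trade-off concrete: forcing equal hiring counts requires different score thresholds for the two groups, and those thresholds produce unequal false positive and false negative rates.

```python
# Invented scores: Group A's historical boot-camp access shows up as higher skill scores.
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
score_a = rng.normal(60, 10, n)           # Group A applicants
score_b = rng.normal(50, 10, n)           # Group B applicants
truly_qualified = 55                      # ground-truth bar for doing the job
hires_per_group = 5_000                   # independence: hire the same number from each group

for name, scores in (("A", score_a), ("B", score_b)):
    threshold = np.sort(scores)[-hires_per_group]   # group-specific cutoff to hit the quota
    hired = scores >= threshold
    qualified = scores >= truly_qualified
    fpr = (hired & ~qualified).sum() / (~qualified).sum()   # unqualified but hired
    fnr = (~hired & qualified).sum() / qualified.sum()      # qualified but rejected
    print(f"Group {name}: threshold={threshold:.1f}  FPR={fpr:.2f}  FNR={fnr:.2f}")

# Equal hiring counts (independence) come from unequal thresholds, so the error
# rates (separation / equalized odds) end up unequal: Group B absorbs the false
# positives, Group A absorbs the false negatives.
```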

00:11:30.190 --> 00:11:32.649
You've nailed it. And because group fairness

00:11:32.649 --> 00:11:36.429
is a mathematical paradox, many researchers argue

00:11:36.429 --> 00:11:39.090
we should stop trying to fix the group entirely

00:11:39.090 --> 00:11:41.669
and instead focus purely on the individual. OK,

00:11:41.750 --> 00:11:44.090
so zooming in on the person, not the demographic.

00:11:44.289 --> 00:11:47.490
Yes. This shift brings us to the concept of individual

00:11:47.490 --> 00:11:50.450
fairness. It started with a very intuitive approach

00:11:50.450 --> 00:11:53.590
called fairness through unawareness, or FTU.

00:11:53.950 --> 00:11:56.210
You essentially institute algorithmic blindness.

00:11:56.470 --> 00:11:58.009
Well, that makes total sense on the surface.

00:11:58.090 --> 00:12:00.389
Just delete the sensitive data. Do not tell the

00:12:00.389 --> 00:12:03.730
AI the applicant's gender, race, or age. Right.

00:12:03.730 --> 00:12:07.590
The logic is, if the machine cannot see the demographic

00:12:07.590 --> 00:12:10.570
data, it cannot discriminate against it. It sounds

00:12:10.570 --> 00:12:13.169
ideal. Does it work? It completely falls apart

00:12:13.169 --> 00:12:16.710
due to what we call proxy variables. An AI is,

00:12:16.710 --> 00:12:19.610
at its core, a pattern recognition engine. Even

00:12:19.610 --> 00:12:21.830
if you scrub the data set of race or gender,

00:12:22.289 --> 00:12:24.769
the AI can effortlessly deduce those sensitive

00:12:24.769 --> 00:12:27.090
traits through non-sensitive attributes that

00:12:27.090 --> 00:12:29.929
are highly correlated. Oh, like a zip code. Exactly.

00:12:30.210 --> 00:12:32.970
A zip code can act as a massive proxy for race

00:12:32.970 --> 00:12:35.710
due to decades of historical housing segregation.

00:12:36.289 --> 00:12:39.570
Or the specific extracurricular clubs listed

00:12:39.570 --> 00:12:42.370
on a resume can instantly reveal gender. Right,

00:12:42.429 --> 00:12:44.409
like the women's chess club example from earlier.
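
A tiny illustration of that failure mode, using invented data: the model never sees the sensitive column, but a correlated proxy (here, a zip-code-style feature) lets it reproduce the same disparity anyway.

```python
# Fairness through unawareness on toy data: drop "race", keep its proxy.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
n = 5_000
race = rng.integers(0, 2, n)                                # sensitive attribute, never given to the model
zip_code = np.where(rng.random(n) < 0.9, race, 1 - race)    # segregation: zip tracks race 90% of the time
# Biased historical decisions: group 1 was approved far less often.
approved = (rng.random(n) < np.where(race == 1, 0.3, 0.7)).astype(int)

X = zip_code.reshape(-1, 1)                                 # feature matrix contains only the proxy
model = LogisticRegression().fit(X, approved)
pred = model.predict(X)

print("approval rate, group 0:", pred[race == 0].mean())
print("approval rate, group 1:", pred[race == 1].mean())
# The model is "unaware" of race, yet its approval rates still split sharply by
# group, because the zip-code proxy carries the signal.
```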

00:12:44.690 --> 00:12:47.549
Yes. So an algorithm can perfectly comply with

00:12:47.549 --> 00:12:49.970
fairness through unawareness while still engaging

00:12:49.970 --> 00:12:52.519
in rampant discrimination. It is like putting

00:12:52.519 --> 00:12:55.240
a blindfold on a referee to judge a sprint, but

00:12:55.240 --> 00:12:57.419
completely ignoring the fact that someone already

00:12:57.419 --> 00:13:00.279
placed heavy hurdles in half of the lanes. That's

00:13:00.279 --> 00:13:02.980
a perfect analogy. The referee does not see color

00:13:02.980 --> 00:13:05.460
or gender, but they also don't see the systemic

00:13:05.460 --> 00:13:08.580
obstacles affecting the runner's times. And because

00:13:08.580 --> 00:13:11.500
the blindfold approach fails so spectacularly,

00:13:11.720 --> 00:13:14.240
researchers like Cynthia Dwork introduced a counter

00:13:14.240 --> 00:13:17.200
concept in 2012 called Fairness Through Awareness,

00:13:17.360 --> 00:13:20.080
or FTA. Through awareness. So we intentionally

00:13:20.080 --> 00:13:23.610
tell the AI the race and gender. Yes. This argues

00:13:23.610 --> 00:13:26.769
that instead of blinding the AI, we must explicitly

00:13:26.769 --> 00:13:29.149
use the demographic data to ensure we are mapping

00:13:29.149 --> 00:13:33.190
similar individuals similarly. This led to a

00:13:33.190 --> 00:13:35.870
really fascinating branch of study known as counterfactual

00:13:35.870 --> 00:13:39.190
fairness. Counterfactual. Okay, imagine an AI

00:13:39.190 --> 00:13:41.330
is scanning your loan application right now.

00:13:41.909 --> 00:13:44.210
For that decision to be counterfactually fair,

00:13:44.649 --> 00:13:48.000
how does the AI process your information? The

00:13:48.000 --> 00:13:51.100
AI uses causal models to basically play out an

00:13:51.100 --> 00:13:53.480
alternate reality. It isolates your sensitive

00:13:53.480 --> 00:13:56.080
attribute, let's say your race. It then asks:

00:13:56.080 --> 00:13:59.679
if every single detail about this individual's

00:13:59.679 --> 00:14:01.379
life remained exactly the same, but they were

00:14:01.379 --> 00:14:03.860
a different race, would my final decision change?
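
A deliberately simplified sketch of that test, with a hypothetical model interface: it only flips the one sensitive field, whereas a full counterfactual-fairness analysis would also use a causal model to propagate that change through the other features it influences.

```python
# Naive counterfactual flip test (hypothetical model with a .predict(dict) interface).
def counterfactual_flip_test(model, applicant: dict, sensitive_key: str, alternatives) -> bool:
    """Return True if the decision survives every alternative value of the
    sensitive attribute, holding all other fields fixed."""
    original_decision = model.predict(applicant)
    for value in alternatives:
        counterfactual = {**applicant, sensitive_key: value}   # same person, different race
        if model.predict(counterfactual) != original_decision:
            return False    # flipping only the sensitive node flips the outcome: biased
    return True
```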

00:14:03.980 --> 00:14:06.440
Whoa. So it looks at the causal chain of events.

00:14:06.500 --> 00:14:09.299
Exactly. If changing only that demographic node

00:14:09.299 --> 00:14:12.120
flips the outcome, the model is definitively

00:14:12.120 --> 00:14:15.240
biased. We are asking machines to simulate alternate

00:14:15.240 --> 00:14:17.820
dimensions to audit their own biases. That is

00:14:17.820 --> 00:14:19.960
brilliant. But how are the engineers actually

00:14:19.960 --> 00:14:21.480
implementing this in the code? There have to

00:14:21.480 --> 00:14:23.580
be practical strategies happening under the hood.

00:14:23.759 --> 00:14:26.259
There are. Engineers generally attack the problem

00:14:26.120 --> 00:14:28.620
at three distinct phases of the machine learning

00:14:28.620 --> 00:14:31.600
pipeline: pre-processing, in-processing, and

00:14:31.600 --> 00:14:33.279
post-processing. OK, let's break this down.

00:14:33.419 --> 00:14:36.019
Pre-processing, that tackles the problem before

00:14:36.019 --> 00:14:37.960
the AI even gets a chance to look at the data,

00:14:38.120 --> 00:14:41.539
right? Yes. The assumption here is that the algorithm

00:14:41.539 --> 00:14:44.519
is fine, but the historical data set it is about

00:14:44.519 --> 00:14:48.139
to learn from is contaminated with bias. So how

00:14:48.139 --> 00:14:50.769
do they clean it? The preprocessing stage attempts

00:14:50.769 --> 00:14:54.509
to alter that historical data set to remove discriminatory

00:14:54.509 --> 00:14:57.289
information while preserving the useful data.

00:14:57.870 --> 00:15:00.470
One method is called reweighing. Reweighing,

00:15:00.830 --> 00:15:03.509
like changing the value of the data points. Exactly.

00:15:03.889 --> 00:15:06.289
The software assigns a mathematical weight to

00:15:06.289 --> 00:15:09.500
each data point. If the historical data reflects

00:15:09.500 --> 00:15:12.679
a bias against a certain demographic, the algorithm

00:15:12.679 --> 00:15:15.620
assigns a heavier positive weight to successful

00:15:15.620 --> 00:15:18.120
outcomes for that disadvantaged group. Oh, I

00:15:18.120 --> 00:15:19.960
see. And then it does the opposite for the privileged

00:15:19.960 --> 00:15:22.179
group. Right. It assigns a lighter weight to

00:15:22.179 --> 00:15:24.120
successful outcomes for the historically privileged

00:15:24.120 --> 00:15:26.899
group. It is like adjusting golf handicaps before

00:15:26.899 --> 00:15:29.159
a tournament begins. You know, one player is

00:15:29.159 --> 00:15:31.539
using like vintage wooden clubs while the other

00:15:31.539 --> 00:15:34.240
has state of the art titanium gear. So you adjust

00:15:34.240 --> 00:15:36.779
the baseline scoring logic in the data set. That

00:15:36.779 --> 00:15:39.159
way, when the AI finally looks at the data, it

00:15:39.159 --> 00:15:42.039
learns from a simulated, equitable world rather

00:15:42.039 --> 00:15:45.379
than our flawed reality. OK, that's pre-processing.
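
One common way to compute such weights (made-up data, illustrative only) is to give each combination of group and outcome the weight P(group) × P(outcome) / P(group, outcome), so combinations that are rarer than independence would predict get weighted up:

```python
# Reweighing sketch: rebalance the historical data before training.
import numpy as np

group   = np.array(["A", "A", "A", "A", "A", "A", "B", "B", "B", "B"])
outcome = np.array([ 1,   1,   1,   1,   0,   0,   1,   0,   0,   0 ])   # 1 = hired

weights = np.empty(len(group), dtype=float)
for g in np.unique(group):
    for y in np.unique(outcome):
        cell = (group == g) & (outcome == y)
        p_expected = (group == g).mean() * (outcome == y).mean()   # if group and outcome were unrelated
        p_observed = cell.mean()                                   # what the biased history shows
        weights[cell] = p_expected / p_observed
        print(f"group={g} outcome={y} weight={p_expected / p_observed:.2f}")

# Successful outcomes in the under-hired group get weights above 1, successful
# outcomes in the over-hired group get weights below 1. The weights are then
# passed to the learner (e.g., via a sample_weight argument) so it trains on a
# rebalanced, "as if fair" version of the data.
```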

00:15:45.379 --> 00:15:47.480
What about the second phase? Moving to the second

00:15:47.480 --> 00:15:50.320
phase, in -processing adds constraints to the

00:15:50.320 --> 00:15:53.159
algorithm while it is actively learning. One

00:15:53.159 --> 00:15:56.139
of the most compelling techniques here is adversarial

00:15:56.139 --> 00:15:58.919
debiasing. Adversarial debiasing? That sounds

00:15:58.919 --> 00:16:02.019
intense. It is. Instead of training a single

00:16:02.019 --> 00:16:05.399
AI, engineers train two distinct neural networks

00:16:05.399 --> 00:16:08.320
and pit them against each other using gradient

00:16:08.320 --> 00:16:11.299
-based methods. Wait, gradient-based methods?

00:16:11.679 --> 00:16:14.000
Are we just talking about how the AI mathematically

00:16:14.000 --> 00:16:16.639
scores its own performance and, you know, updates

00:16:16.639 --> 00:16:19.899
its brain? Exactly. The AI adjusts its internal

00:16:19.899 --> 00:16:22.620
logic based on a mathematical gradient of how

00:16:22.620 --> 00:16:25.000
wrong its previous guess was. In adversarial

00:16:25.000 --> 00:16:27.580
debiasing, you set up a competition. A competition

00:16:27.580 --> 00:16:30.440
between the two AIs. Yes. The first neural network

00:16:30.440 --> 00:16:32.929
is the predictor. Its only job is to solve the

00:16:32.929 --> 00:16:34.990
problem, like predicting who will repay a loan.

00:16:35.590 --> 00:16:37.490
The second network is the adversary. And what

00:16:37.490 --> 00:16:40.169
does the adversary do? Its sole purpose is to

00:16:40.169 --> 00:16:42.789
analyze the predictor's output and attempt to

00:16:42.789 --> 00:16:45.169
guess the sensitive variable, like the applicant's

00:16:45.169 --> 00:16:47.980
gender, based purely on that decision. Oh, wow.

00:16:48.179 --> 00:16:50.580
So the engineers force the AI to play an intense

00:16:50.580 --> 00:16:53.100
game of hide and seek with its own biases. Exactly.

00:16:53.279 --> 00:16:55.039
The predictor wants to correctly guess the loan,

00:16:55.419 --> 00:16:57.740
but it also desperately wants the adversary to

00:16:57.740 --> 00:16:59.960
fail at guessing the gender. Right. The predictor

00:16:59.960 --> 00:17:03.279
actively modifies its internal weights, mathematically

00:17:03.279 --> 00:17:06.839
scrubbing the bias out of its own logic. It learns

00:17:06.839 --> 00:17:09.319
to make accurate loan predictions using only

00:17:09.319 --> 00:17:12.119
data patterns that give absolutely no clues about

00:17:12.119 --> 00:17:15.180
gender to the adversary. That is incredibly clever.
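
A minimal sketch of that setup in PyTorch, with toy data and illustrative hyperparameters (nothing here is tuned or taken from the sources): the predictor solves the task, the adversary tries to recover the sensitive attribute from the predictor's output, and the predictor is rewarded for making the adversary fail.

```python
# Adversarial debiasing sketch: predictor vs. adversary, alternating updates.
import torch
import torch.nn as nn

torch.manual_seed(0)
n, d = 2_000, 8
X = torch.randn(n, d)
sensitive = (torch.rand(n, 1) < 0.5).float()          # e.g., gender, encoded 0/1
# Toy labels that partly leak the sensitive attribute.
y = ((X[:, :1] + 0.8 * sensitive + 0.3 * torch.randn(n, 1)) > 0.5).float()

predictor = nn.Sequential(nn.Linear(d, 16), nn.ReLU(), nn.Linear(16, 1))
adversary = nn.Sequential(nn.Linear(1, 16), nn.ReLU(), nn.Linear(16, 1))
opt_pred = torch.optim.Adam(predictor.parameters(), lr=1e-3)
opt_adv = torch.optim.Adam(adversary.parameters(), lr=1e-3)
bce = nn.BCEWithLogitsLoss()
lam = 1.0                                              # strength of the fairness penalty

for step in range(500):
    # 1) Train the adversary to guess the sensitive trait from the predictor's
    #    output, with the predictor frozen (detach).
    adv_loss = bce(adversary(predictor(X).detach()), sensitive)
    opt_adv.zero_grad(); adv_loss.backward(); opt_adv.step()

    # 2) Train the predictor to solve the task while fooling the adversary:
    #    task loss minus lambda times the adversary's loss.
    logits = predictor(X)
    pred_loss = bce(logits, y) - lam * bce(adversary(logits), sensitive)
    opt_pred.zero_grad(); pred_loss.backward(); opt_pred.step()

# After training, the predictor's outputs should carry little information the
# adversary can use to recover the sensitive attribute.
```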

00:17:15.700 --> 00:17:18.420
Okay, so if preprocessing fixes the past data

00:17:18.420 --> 00:17:20.920
and in-processing fixes the active learning,

00:17:21.460 --> 00:17:23.619
post-processing must be the final safety net.

00:17:23.980 --> 00:17:26.119
What happens here? Do we just put a human at

00:17:26.119 --> 00:17:28.200
the end of the line to double check the AI's

00:17:28.200 --> 00:17:30.759
math? Close, but we put another algorithm at

00:17:30.759 --> 00:17:32.880
the end of the line. Post-processing adjusts

00:17:32.880 --> 00:17:35.359
the results after the AI has already made its

00:17:35.359 --> 00:17:37.880
initial prediction. A common strategy here is

00:17:37.880 --> 00:17:40.819
called reject option-based classification. Reject

00:17:40.819 --> 00:17:43.569
option? How does that work? When an AI makes

00:17:43.569 --> 00:17:46.470
a prediction, it generates a probability score.

00:17:47.269 --> 00:17:50.130
If it is 90% certain an applicant will repay

00:17:50.130 --> 00:17:53.490
a loan, it approves it. If it is 10% certain,

00:17:53.650 --> 00:17:56.289
it rejects it. But what about the people hovering

00:17:56.289 --> 00:17:59.349
right around 50%? Ah, the borderline cases where

00:17:59.349 --> 00:18:02.390
the AI is essentially flipping a coin. For those

00:18:02.390 --> 00:18:04.750
highly uncertain predictions, this algorithm

00:18:04.750 --> 00:18:07.730
intervenes. If the borderline applicant belongs

00:18:07.730 --> 00:18:10.509
to a historically deprived group, the algorithm

00:18:10.509 --> 00:18:12.730
automatically gives them the benefit of the doubt

00:18:12.730 --> 00:18:15.569
and overrides the AI to label them positive.
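
A small sketch of that rule (the uncertainty band and group labels are hypothetical): confident predictions pass through unchanged, and only the borderline cases get flipped in favor of the historically deprived group.

```python
# Reject option-based classification: post-process a model's probability score.
def reject_option_classify(prob: float, group: str, band: float = 0.15,
                           deprived: str = "B") -> int:
    """Return 1 (approve) or 0 (reject) given the model's probability of a good outcome."""
    lower, upper = 0.5 - band, 0.5 + band
    if lower <= prob <= upper:                  # the model is essentially flipping a coin
        return 1 if group == deprived else 0    # benefit of the doubt goes to the deprived group
    return 1 if prob >= 0.5 else 0              # otherwise keep the model's own call

print(reject_option_classify(0.55, group="B"))  # 1: borderline, deprived group gets the doubt
print(reject_option_classify(0.55, group="A"))  # 0: same score, privileged group does not
print(reject_option_classify(0.92, group="A"))  # 1: confident predictions are left alone
```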

00:18:15.630 --> 00:18:17.730
And if they belong to the privileged group? It

00:18:17.730 --> 00:18:21.089
labels them negative. It specifically optimizes

00:18:21.089 --> 00:18:23.849
the threshold to give a boost where the AI is

00:18:23.849 --> 00:18:26.410
unsure without disrupting the highly certain

00:18:26.410 --> 00:18:28.289
predictions. It is the algorithmic equivalent

00:18:28.289 --> 00:18:31.859
of saying the tie goes to the runner. But specifically

00:18:31.859 --> 00:18:33.680
ensuring the tie goes to the runner, who we know

00:18:33.680 --> 00:18:36.819
has historically faced severe headwinds. So we

00:18:36.819 --> 00:18:39.119
can mathematically re-weigh data sets. We can

00:18:39.119 --> 00:18:41.400
set up adversarial neural networks to battle

00:18:41.400 --> 00:18:44.019
each other inside a black box. We can adjust

00:18:44.019 --> 00:18:48.109
post -decision thresholds. But despite all of

00:18:48.109 --> 00:18:50.049
this brilliant engineering, there is an ultimate

00:18:50.049 --> 00:18:52.269
limitation that brings this entire conversation

00:18:52.269 --> 00:18:56.430
crashing back down to reality. Yes. A human element.

00:18:56.450 --> 00:18:59.430
It always comes back to us. It does. Recent research

00:18:59.430 --> 00:19:01.529
highlighted in the sources points out a tragic

00:19:01.529 --> 00:19:04.430
irony in this field. You can perfectly execute

00:19:04.430 --> 00:19:07.430
all three phases of mitigation. You can achieve

00:19:07.430 --> 00:19:10.910
an incredibly fair, mathematically sound algorithmic

00:19:10.910 --> 00:19:13.890
output. Fuck. But in the vast majority of real

00:19:13.890 --> 00:19:16.569
world applications, that AI does not make the

00:19:16.569 --> 00:19:19.150
final call. It makes a recommendation to a human

00:19:19.150 --> 00:19:22.150
operator. And humans are deeply susceptible to

00:19:22.150 --> 00:19:24.970
confirmation bias. Extremely. Studies show a

00:19:24.970 --> 00:19:27.029
phenomenon where human decision makers exhibit

00:19:27.029 --> 00:19:29.670
automation bias, trusting the machine, but they

00:19:29.670 --> 00:19:31.990
apply it selectively. What do you mean selectively?

00:19:32.200 --> 00:19:35.339
They tend to accept AI recommendations only when

00:19:35.339 --> 00:19:37.680
those recommendations align with their own pre

00:19:37.680 --> 00:19:41.079
-existing prejudices. If a perfectly fair AI

00:19:41.079 --> 00:19:43.400
recommends an outcome that contradicts the human

00:19:43.400 --> 00:19:47.240
operator's internal bias, the human simply overrides

00:19:47.240 --> 00:19:50.980
the machine. Oh man, so that completely neutralizes...

00:19:51.039 --> 00:19:53.740
the millions of dollars and countless hours spent

00:19:53.740 --> 00:19:56.339
engineering the fairness protocols. In many cases,

00:19:56.339 --> 00:19:58.400
yes. Which is exactly why this topic matters

00:19:58.400 --> 00:20:01.440
so much for you, the listener. As someone who

00:20:01.440 --> 00:20:03.539
seeks out knowledge to understand the systems

00:20:03.539 --> 00:20:06.019
running our world, your critical thinking is

00:20:06.019 --> 00:20:08.720
the ultimate safeguard. It really is. We cannot

00:20:08.720 --> 00:20:10.980
blindly trust the output of a machine because

00:20:10.980 --> 00:20:13.259
a human might be using that seemingly objective

00:20:13.259 --> 00:20:16.059
math to launder their own subjective bias. We

00:20:16.059 --> 00:20:18.180
like to imagine the machine takes the human error

00:20:18.180 --> 00:20:20.539
out of the equation, but often the human just

00:20:20.329 --> 00:20:22.750
uses the machine to justify the error. Which

00:20:22.750 --> 00:20:25.109
leaves us with a profound philosophical puzzle

00:20:25.109 --> 00:20:27.809
to mull over. If an artificial intelligence is

00:20:27.809 --> 00:20:30.549
trained on completely accurate historical data

00:20:30.549 --> 00:20:33.589
and it flawlessly learns to predict the biased

00:20:33.589 --> 00:20:36.430
unequal world we currently live in, is the AI

00:20:36.430 --> 00:20:39.269
broken or is the world broken? That is the ultimate

00:20:39.269 --> 00:20:41.910
question. Should we be designing algorithms to

00:20:41.910 --> 00:20:44.069
accurately reflect our current reality, or should

00:20:44.069 --> 00:20:47.369
we be using them to engineer a fairer one? Something

00:20:47.369 --> 00:20:49.509
to think about next time a machine makes a decision

00:20:49.509 --> 00:20:51.029
for you. Until next time.
