WEBVTT

00:00:00.000 --> 00:00:03.140
Imagine walking into a hospital in the early

00:00:03.140 --> 00:00:07.360
1980s. You have, you know, perfect grades, a

00:00:07.360 --> 00:00:10.259
stellar academic record, glowing recommendations.

00:00:11.179 --> 00:00:13.300
You submit your medical school application, but

00:00:13.300 --> 00:00:15.939
a computer automatically just, it throws it in

00:00:15.939 --> 00:00:18.500
the trash. Right. Without a second thought. Exactly.

00:00:18.859 --> 00:00:21.100
Not because you aren't highly qualified, but

00:00:21.100 --> 00:00:24.379
simply because your name, quote, sounds foreign.

00:00:25.039 --> 00:00:26.600
You'd probably think there was a glitch, right?

00:00:26.600 --> 00:00:28.620
Because math and computer code are supposed to

00:00:28.620 --> 00:00:31.559
be objective. You type an equation into a calculator,

00:00:31.839 --> 00:00:34.539
the screen shows a number, and that is the absolute

00:00:34.539 --> 00:00:36.579
truth. It's a very comforting illusion, honestly.

00:00:37.399 --> 00:00:40.579
We inherently want to believe that machines operate

00:00:40.579 --> 00:00:44.219
in this clean binary realm of true or false,

00:00:44.960 --> 00:00:47.380
completely removed from the messy subjectivity

00:00:47.380 --> 00:00:49.880
of human emotion. Welcome to today's Deep Dive.

00:00:49.979 --> 00:00:51.920
If you're a learner who loves pulling back the

00:00:51.920 --> 00:00:53.939
curtain on how the world actually works, you

00:00:53.939 --> 00:00:57.000
are in the exact right place. Today, we are taking

00:00:57.000 --> 00:00:59.659
a really comprehensive look at the research and

00:00:59.659 --> 00:01:03.259
history behind algorithmic bias. And our mission

00:01:03.259 --> 00:01:06.680
for this deep dive is to completely dismantle

00:01:06.680 --> 00:01:08.920
that illusion of the neutral machine. That's a big

00:01:08.920 --> 00:01:12.319
mission. It is. We're going to uncover how deeply

00:01:12.319 --> 00:01:15.239
human flaws get baked directly into computer

00:01:15.239 --> 00:01:18.359
code, why this unseen code dictates so much of

00:01:18.359 --> 00:01:22.239
your daily life, and how society is attempting

00:01:22.239 --> 00:01:24.719
to fight back. And this is such a vital mission,

00:01:24.780 --> 00:01:27.620
because we are all currently falling victim to

00:01:27.620 --> 00:01:30.400
a psychological phenomenon known as automation

00:01:30.400 --> 00:01:32.840
bias. Automation bias. OK, what is that exactly?

00:01:32.859 --> 00:01:34.719
Well, as algorithms increasingly organize our

00:01:34.719 --> 00:01:36.920
financial systems, our legal institutions, our

00:01:36.920 --> 00:01:39.760
daily behaviors, our brains naturally trick us

00:01:39.760 --> 00:01:41.879
into trusting a machine's output more than we

00:01:41.879 --> 00:01:44.819
trust human experts. Oh, wow. Yeah. We just assume

00:01:44.819 --> 00:01:46.560
the machine has crunched the numbers perfectly,

00:01:46.739 --> 00:01:49.920
which causes us to switch off our critical thinking.

00:01:50.090 --> 00:01:52.310
We just accept the output. OK, so let's unpack

00:01:52.310 --> 00:01:54.189
this, because we need to look at the reality

00:01:54.189 --> 00:01:56.750
of that output. The stakes here are so much higher

00:01:56.750 --> 00:01:59.810
than like why your streaming service keeps recommending

00:01:59.810 --> 00:02:02.290
mediocre movies. Far higher, yeah. We're talking

00:02:02.290 --> 00:02:04.510
about invisible code deciding who gets hired

00:02:04.510 --> 00:02:07.109
for a job, who receives life-saving health care,

00:02:07.769 --> 00:02:10.770
and who gets flagged for a prison sentence. To

00:02:10.770 --> 00:02:13.689
understand how algorithms came to rule our present

00:02:13.689 --> 00:02:15.870
with such authority, we actually have to look

00:02:15.870 --> 00:02:18.840
backward. Right. This problem isn't some modern

00:02:18.840 --> 00:02:21.539
phenomenon caused by social media or big tech.

00:02:22.099 --> 00:02:25.300
It began the exact moment humanity started teaching

00:02:25.300 --> 00:02:28.900
computers how to, quote, think. So how far back

00:02:28.900 --> 00:02:31.330
are we talking? If we go all the way back to

00:02:31.330 --> 00:02:34.870
1976, an early artificial intelligence pioneer

00:02:34.870 --> 00:02:37.250
named Joseph Weizenbaum was already sounding

00:02:37.250 --> 00:02:40.569
the alarm. Wait, 1976? We were, like, barely

00:02:40.569 --> 00:02:43.189
out of the punch card era of computing. Exactly.

00:02:43.349 --> 00:02:45.669
Computers took up entire rooms and had a fraction

00:02:45.669 --> 00:02:47.969
of the power of a modern smart toaster. Yet,

00:02:48.389 --> 00:02:50.330
Weizenbaum understood something fundamental about

00:02:50.330 --> 00:02:52.650
the architecture of computing. He argued that

00:02:52.650 --> 00:02:55.409
computer programs, by their very nature, embody

00:02:55.409 --> 00:02:58.289
law. Embody law? What does he mean by that? Well...

00:02:58.280 --> 00:03:00.800
When a programmer writes code to solve a problem,

00:03:01.120 --> 00:03:03.879
they're enforcing a very specific, rigid way

00:03:03.879 --> 00:03:06.599
to reach a solution. And because a human being

00:03:06.599 --> 00:03:10.379
has to write those rules, the code naturally

00:03:10.379 --> 00:03:14.479
and inevitably incorporates that specific programmer's

00:03:14.479 --> 00:03:18.280
imagination, their expectations, and their implicit

00:03:18.280 --> 00:03:20.860
biases regarding how the world is supposed to

00:03:20.860 --> 00:03:23.030
work. That makes sense. The source material actually

00:03:23.030 --> 00:03:25.469
has a brilliant analogy from Weizenbaum that

00:03:25.469 --> 00:03:27.750
really crystallizes this. Oh, the tourist one.

00:03:27.870 --> 00:03:30.030
Yeah. He says, trusting a computer program you

00:03:30.030 --> 00:03:32.550
don't fully understand is like a tourist trying

00:03:32.550 --> 00:03:35.729
to find their hotel in a strange city by flipping

00:03:35.729 --> 00:03:38.629
a coin at every single intersection. Right. Sure,

00:03:38.810 --> 00:03:40.590
they might eventually arrive at a bed, but just

00:03:40.590 --> 00:03:42.710
because there's an end result doesn't mean the

00:03:42.710 --> 00:03:46.110
process was reliable or accurate or safe. Exactly.

00:03:46.449 --> 00:03:48.729
But, I mean, you would think early computer scientists

00:03:48.729 --> 00:03:50.770
realized this, right? If they knew they were

00:03:50.770 --> 00:03:52.830
building their own biases into the code, wouldn't

00:03:52.830 --> 00:03:55.349
they actively test for that? You would certainly

00:03:55.349 --> 00:03:57.919
hope so, but, um... That brings us right back

00:03:57.919 --> 00:04:00.340
to that 1980s hospital scenario you mentioned

00:04:00.340 --> 00:04:02.240
at the start. Wait, that wasn't a hypothetical.

00:04:02.460 --> 00:04:04.300
No, that wasn't hypothetical at all. It's one

00:04:04.300 --> 00:04:07.560
of the first major documented algorithmic scandals.

00:04:07.919 --> 00:04:11.699
It took place between 1982 and 1986 at St. George's

00:04:11.699 --> 00:04:14.360
Hospital Medical School in the UK. Oh, wow. OK.

00:04:14.560 --> 00:04:16.300
The administration wanted to make their admissions

00:04:16.300 --> 00:04:19.220
process more efficient, so they implemented a

00:04:19.220 --> 00:04:21.879
new computer guidance system to screen applicants.

00:04:22.350 --> 00:04:24.810
And the easiest way to teach a computer who to

00:04:24.810 --> 00:04:27.750
admit is to show it who the school admitted in

00:04:27.750 --> 00:04:31.170
the past. Precisely. The algorithm was trained

00:04:31.170 --> 00:04:34.110
on the school's historical admissions data. It

00:04:34.110 --> 00:04:36.290
just looked for mathematical patterns in the

00:04:36.290 --> 00:04:38.290
types of candidates the human administrators

00:04:38.290 --> 00:04:40.689
had previously accepted. And humans are flawed.

00:04:40.970 --> 00:04:43.189
Right. Because the human administrators possessed

00:04:43.189 --> 00:04:46.149
inherent biases, the algorithm learned those

00:04:46.149 --> 00:04:48.670
biases as mathematical rules for success.
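
To make that mechanic concrete, here is a minimal Python sketch. This is not St. George's actual system (its code was never published); the applicants, numbers, and weights are all invented. The point is only that a model fit on past human admit/reject decisions will learn a discriminatory feature whenever that feature predicts the historical label.

```python
# Toy illustration: a model trained on biased historical decisions.
# Data and weights are invented; this is not the St. George's system.

# Each past applicant: (grades out of 100, name_sounds_foreign 0/1, admitted 0/1)
history = [
    (92, 0, 1), (88, 0, 1), (75, 0, 1),
    (95, 1, 0), (90, 1, 0), (70, 0, 0),
]

def fit(data, lr=0.05, epochs=3000):
    """Tiny linear model trained by gradient descent on squared error."""
    w_grades = w_foreign = bias = 0.0
    for _ in range(epochs):
        for grades, foreign, admitted in data:
            pred = w_grades * (grades / 100) + w_foreign * foreign + bias
            err = admitted - pred
            w_grades += lr * err * (grades / 100)
            w_foreign += lr * err * foreign
            bias += lr * err
    return w_grades, w_foreign, bias

w_g, w_f, b = fit(history)
print(f"weight on grades: {w_g:+.2f}, weight on foreign-sounding name: {w_f:+.2f}")
# The name weight comes out strongly negative. Nothing malfunctioned:
# the model found the pattern that best reproduces past human decisions.
```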

00:04:48.670 --> 00:04:52.069
So it just automated the discrimination. Yeah. As

00:04:52.069 --> 00:04:54.250
a result, the system systematically denied entry

00:04:54.250 --> 00:04:56.990
to around 60 women and people with non-traditional

00:04:56.990 --> 00:04:59.610
foreign-sounding names every single year it

00:04:59.610 --> 00:05:02.389
was in operation. That is staggering. And, I

00:05:02.389 --> 00:05:04.430
mean, the most terrifying part is that it wasn't

00:05:04.430 --> 00:05:06.689
a glitch. Not at all. The algorithm was working

00:05:06.689 --> 00:05:09.790
perfectly according to its design. It was just

00:05:09.790 --> 00:05:12.829
accurately automating society's existing prejudices

00:05:12.829 --> 00:05:15.930
at lightning speed. That historical bias really

00:05:15.930 --> 00:05:19.100
forms the foundation of the problem. But to understand

00:05:19.100 --> 00:05:21.480
our modern landscape, we have to look at how

00:05:21.480 --> 00:05:24.139
bias physically enters the algorithms we interact

00:05:24.139 --> 00:05:26.339
with today. Right, because it's not just one

00:05:26.339 --> 00:05:28.860
thing. Exactly. It doesn't happen through a single

00:05:28.860 --> 00:05:31.620
faulty line of code. It infiltrates the system

00:05:31.620 --> 00:05:34.579
at multiple distinct stages of an algorithm's

00:05:34.579 --> 00:05:36.720
lifecycle. Okay, so let's trace that lifecycle.

00:05:36.939 --> 00:05:39.160
Before a single line of code is even doing its

00:05:39.160 --> 00:05:41.680
job in the real world, the training data itself

00:05:41.680 --> 00:05:44.100
can be corrupted, right? Yes. The source material

00:05:44.100 --> 00:05:47.040
calls this pre-existing bias, and specifically

00:05:47.040 --> 00:05:49.480
highlights something called label choice bias.

00:05:49.720 --> 00:05:52.100
What is that? Label choice bias happens when

00:05:52.100 --> 00:05:54.720
programmers need an algorithm to measure a highly

00:05:54.720 --> 00:05:58.160
complex, nuanced concept, but computers can only

00:05:58.160 --> 00:06:00.660
understand hard data. Okay, so they have to simplify

00:06:00.660 --> 00:06:02.959
it. Right. So the programmers choose a measurable

00:06:02.959 --> 00:06:06.360
proxy to represent that complex concept. The

00:06:06.360 --> 00:06:08.779
source text details a widely used healthcare

00:06:08.779 --> 00:06:11.600
algorithm designed to identify patients with

00:06:11.600 --> 00:06:14.439
highly complex health needs. So the hospital

00:06:14.439 --> 00:06:17.000
could allocate extra resources to them? Exactly.

00:06:17.579 --> 00:06:19.240
But the algorithm couldn't inherently measure

00:06:19.240 --> 00:06:22.540
sickness. Sickness is too abstract. Instead,

00:06:22.660 --> 00:06:25.259
it used healthcare costs as a proxy for healthcare

00:06:25.259 --> 00:06:28.759
needs. Which, I mean, on a purely surface level

00:06:28.759 --> 00:06:32.060
seems logical. Usually the sicker a person is,

00:06:32.339 --> 00:06:34.699
the more money their medical care costs. Sickness

00:06:34.699 --> 00:06:37.160
equals expense. It seems perfectly logical until

00:06:37.160 --> 00:06:40.220
you apply historical context. Historically, due

00:06:40.220 --> 00:06:43.000
to deep systemic inequalities, lack of geographic

00:06:43.000 --> 00:06:46.079
access to specialists, and historical distrust

00:06:46.079 --> 00:06:48.339
of the medical system, black patients incurred

00:06:48.339 --> 00:06:50.339
much lower health care costs than white patients.

00:06:50.620 --> 00:06:52.810
Oh, I see where this is going. Yeah. When the

00:06:52.810 --> 00:06:54.910
algorithm processed this data, it didn't understand

00:06:54.910 --> 00:06:57.509
the socioeconomic reasons behind the lower costs.

00:06:57.970 --> 00:07:00.189
It simply looked at the numbers and mathematically

00:07:00.189 --> 00:07:02.730
predicted that the black patients were significantly

00:07:02.730 --> 00:07:05.930
healthier than equally sick white patients. So

00:07:05.930 --> 00:07:08.410
an algorithm designed to help the sickest patients

00:07:08.410 --> 00:07:10.449
ended up denying care to the people who needed

00:07:10.449 --> 00:07:13.189
it most simply because it confused how much money

00:07:13.189 --> 00:07:15.610
you spend with how sick you actually are. That

00:07:15.610 --> 00:07:18.490
is the insidious nature of label choice bias.
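
As a minimal sketch of that proxy failure, with invented patients and numbers: two equally sick people get ranked very differently the moment historical spend stands in for need.

```python
# Sketch of label choice bias: historical spend standing in for need.
# All names and numbers are invented.

patients = [
    # (name, true_severity 0-10, past_annual_cost_usd)
    ("patient_a", 9, 12000),  # good access to care -> high historical spend
    ("patient_b", 9, 4000),   # equally sick, less access -> low spend
    ("patient_c", 3, 5000),   # mildly ill but a frequent spender
]

def rank_by_cost_proxy(pts):
    """What the deployed system effectively did: treat spend as need."""
    return sorted(pts, key=lambda p: p[2], reverse=True)

def rank_by_true_need(pts):
    """What it was meant to do; true severity is invisible to the model."""
    return sorted(pts, key=lambda p: p[1], reverse=True)

print([name for name, _, _ in rank_by_cost_proxy(patients)])
# ['patient_a', 'patient_c', 'patient_b']: the equally sick but
# lower-spending patient falls below a far healthier one.
print([name for name, _, _ in rank_by_true_need(patients)])
# ['patient_a', 'patient_b', 'patient_c']
```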

00:07:19.110 --> 00:07:22.000
But the problem goes far beyond just choosing

00:07:22.000 --> 00:07:26.199
the wrong numerical proxy. The bias is fundamentally

00:07:26.199 --> 00:07:29.339
woven into the actual words we use to communicate

00:07:29.339 --> 00:07:31.620
with these machines. Here's where it gets really

00:07:31.620 --> 00:07:33.439
interesting, because this is a massive issue

00:07:33.439 --> 00:07:35.480
right now with the explosion of large language

00:07:35.480 --> 00:07:37.839
models or LLMs. Oh, absolutely. But if an algorithm

00:07:37.839 --> 00:07:40.959
is just reading text, how can it be biased against

00:07:40.959 --> 00:07:44.259
a dialect or a specific way of speaking? Well,

00:07:44.560 --> 00:07:47.000
we have to look at how these massive AI models

00:07:47.000 --> 00:07:50.310
are trained. They are fed almost incomprehensible

00:07:50.310 --> 00:07:52.110
amounts of data scraped from the internet, just

00:07:52.110 --> 00:07:54.569
billions of pages. Right. And because the early

00:07:54.569 --> 00:07:57.490
internet and the bulk of digitized academic and

00:07:57.490 --> 00:08:00.290
corporate text is primarily in English, these

00:08:00.290 --> 00:08:03.089
models tend to present a very specific

00:08:03.089 --> 00:08:06.129
Anglo-American worldview as the absolute universal

00:08:06.129 --> 00:08:09.290
truth. But the source text reveals that it goes

00:08:09.290 --> 00:08:14.160
much deeper, into active dialect prejudice. A 2024

00:08:14.160 --> 00:08:16.759
study showed that these models exhibit covert

00:08:16.759 --> 00:08:19.199
racism against speakers of African-American

00:08:19.199 --> 00:08:22.560
English. Wait, covert racism from an AI? Yeah.

00:08:22.839 --> 00:08:25.319
Because of the mathematical weights assigned

00:08:25.319 --> 00:08:27.439
to certain words and phrases in their training

00:08:27.439 --> 00:08:30.360
data, the models actually harbor negative stereotypes

00:08:30.360 --> 00:08:32.799
about the dialect that are sometimes even more

00:08:32.799 --> 00:08:36.639
extreme than recorded human biases. That's insane.

00:08:37.100 --> 00:08:39.820
And this language bias affects software we normally

00:08:39.820 --> 00:08:42.399
think of as purely technical and objective, too.

00:08:42.559 --> 00:08:45.600
Like what? Take Turnitin, for example, the plagiarism

00:08:45.600 --> 00:08:48.100
detection software used by high schools and universities

00:08:48.100 --> 00:08:51.139
all over the world. The source notes that Turnitin's

00:08:51.139 --> 00:08:53.759
detection algorithms actively penalize non-native

00:08:53.759 --> 00:08:56.639
English speakers far more often than native speakers.

00:08:56.759 --> 00:08:58.759
It is crucial to understand why that happens.

00:08:59.139 --> 00:09:01.159
It's not because non-native speakers are plagiarizing

00:09:01.159 --> 00:09:03.759
more. Right. It comes down to the mechanics of

00:09:03.759 --> 00:09:06.100
the software's string matching code. Right, because

00:09:06.100 --> 00:09:09.100
native English speakers possess a natural intuitive

00:09:09.100 --> 00:09:12.049
vocabulary. If they're summarizing a text, they

00:09:12.049 --> 00:09:14.549
inherently know how to swap out words and use

00:09:14.549 --> 00:09:16.830
synonyms, which breaks up the string of text.

00:09:17.309 --> 00:09:19.250
Exactly. The software looks for a matching string

00:09:19.250 --> 00:09:21.870
of 10 words, but the native speaker changed the

00:09:21.870 --> 00:09:24.330
fifth word, so the software sees it as original

00:09:24.330 --> 00:09:27.370
thought. But for a non-native speaker... A

00:09:27.370 --> 00:09:29.509
non-native speaker might rely more heavily on the

00:09:29.509 --> 00:09:31.990
original phrasing to ensure their grammar is

00:09:31.990 --> 00:09:34.730
correct, resulting in an exact string match that

00:09:34.730 --> 00:09:37.899
triggers a severe plagiarism penalty. That is

00:09:37.899 --> 00:09:41.379
a textbook example of technical bias, where the

00:09:41.379 --> 00:09:44.620
constraints of the software itself inadvertently

00:09:44.620 --> 00:09:47.179
favor one demographic over another.
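
The mechanic being described is easy to sketch. Here is a rough n-gram (shingle) matcher in Python; the window size and sentences are made up, and Turnitin's real detector is certainly more sophisticated, but the failure mode is the same.

```python
# Sketch of string/shingle matching, the mechanic described above.
# Window size and example sentences are invented.

def shingles(text, n=5):
    """All consecutive n-word windows in a text."""
    words = text.lower().split()
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}

def overlap(submission, source, n=5):
    """Fraction of the submission's n-word windows found in the source."""
    sub, src = shingles(submission, n), shingles(source, n)
    return len(sub & src) / max(len(sub), 1)

source = "the committee found that the proposed reform would reduce costs significantly"

# Native speaker swaps synonyms, breaking every 5-word window:
paraphrase = "the panel concluded that the suggested reform would cut costs significantly"
# Non-native speaker keeps the original phrasing to stay grammatically safe:
close_copy = "the committee found that the proposed reform would reduce costs greatly"

print(overlap(paraphrase, source))  # 0.0 -> reads as original thought
print(overlap(close_copy, source))  # ~0.86 -> flagged as plagiarism
```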

00:09:47.179 --> 00:09:49.799
So let's say a team of engineers manages the impossible.

00:09:50.220 --> 00:09:52.559
They perfectly clean their training data, they

00:09:52.559 --> 00:09:54.659
avoid all the technical traps with language,

00:09:54.820 --> 00:09:57.460
and they build a pristine, mathematically sound

00:09:57.460 --> 00:10:00.549
algorithm. They're safe, right? Actually, no.

00:10:00.769 --> 00:10:03.289
Because the moment that pristine code hits the

00:10:03.289 --> 00:10:05.789
messy, unpredictable real world, it breaks in

00:10:05.789 --> 00:10:08.049
completely new ways. Oh, great. Sociologists

00:10:08.049 --> 00:10:11.009
call this emergent bias. A classic example from

00:10:11.009 --> 00:10:13.809
the source material happened in 1990 with the

00:10:13.809 --> 00:10:16.070
National Resident Matching Program. That's the

00:10:16.070 --> 00:10:19.129
program that uses an algorithm to place graduating

00:10:19.129 --> 00:10:21.509
medical students into hospitals across the country,

00:10:21.669 --> 00:10:24.629
right? Yes. For years, the algorithm worked perfectly.

00:10:24.940 --> 00:10:28.299
But as social dynamics shifted and significantly

00:10:28.299 --> 00:10:31.080
more women entered medical school, the program

00:10:31.080 --> 00:10:33.840
suddenly encountered married couples applying

00:10:33.840 --> 00:10:36.519
together. Ah, and they naturally wanted to be

00:10:36.519 --> 00:10:39.419
placed in the same city. Exactly. The human element

00:10:39.419 --> 00:10:42.779
introduced a completely new variable, compromise.

00:10:43.149 --> 00:10:46.590
But the algorithm was coded for rigid optimization,

00:10:46.929 --> 00:10:49.169
not human compromise. Right. When it encountered

00:10:49.169 --> 00:10:51.429
a couple, it didn't know how to find a middle

00:10:51.429 --> 00:10:54.429
ground geographic location. It simply looked

00:10:54.429 --> 00:10:56.950
at the mathematical scores, weighed the location

00:10:56.950 --> 00:10:59.370
choices of the higher-rated partner first, and

00:10:59.370 --> 00:11:01.690
placed them. Completely ignoring the other person.

00:11:01.830 --> 00:11:04.149
It effectively penalized the lower-rated partner's

00:11:04.149 --> 00:11:06.409
preferences entirely, often sending couples to

00:11:06.409 --> 00:11:08.840
opposite ends of the country.
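
A toy sketch of that rigid optimization, with invented names, scores, cities, and slot counts; the real match algorithm is far more elaborate, but the blind spot is the same.

```python
# Toy sketch of the rigid optimization described above.
# Names, scores, and city preferences are all invented.

applicants = [
    # (name, score, ranked city preferences, partner or None)
    ("alex",  91, ["boston", "chicago"], "sam"),
    ("sam",   84, ["chicago", "boston"], "alex"),
    ("riley", 88, ["boston"], None),
]
open_slots = {"boston": 2, "chicago": 1}

def rigid_match(apps, slots):
    """Place strictly by score; no notion of 'keep couples together'."""
    placements = {}
    for name, _, prefs, _ in sorted(apps, key=lambda a: -a[1]):
        for city in prefs:
            if slots.get(city, 0) > 0:
                slots[city] -= 1
                placements[name] = city
                break
    return placements

print(rigid_match(applicants, dict(open_slots)))
# {'alex': 'boston', 'riley': 'boston', 'sam': 'chicago'}
# Alex, the higher-rated partner, gets a first choice; Sam is sent to
# another city. 'Compromise' simply never enters the math.
```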

00:11:08.840 --> 00:11:12.240
And when emergent bias involves complex societal systems rather

00:11:12.240 --> 00:11:14.840
than just job placements, it creates something

00:11:14.840 --> 00:11:17.600
incredibly dangerous, which is feedback loops.

00:11:17.919 --> 00:11:20.419
Oh, these are terrifying. Yeah. I want to look

00:11:20.419 --> 00:11:22.980
closely at the PredPol software example from

00:11:22.980 --> 00:11:25.720
the source text. This was a predictive policing

00:11:25.720 --> 00:11:28.919
algorithm used in Oakland, California. The goal

00:11:28.919 --> 00:11:31.220
was to predict where future crimes would happen

00:11:31.220 --> 00:11:34.480
so the city could proactively deploy police resources.

00:11:35.039 --> 00:11:37.740
And to make those predictions, the software was

00:11:37.740 --> 00:11:41.100
fed historical public crime reports. Based on

00:11:41.100 --> 00:11:43.480
that past data, the algorithm determined that

00:11:43.480 --> 00:11:46.259
certain black neighborhoods were high risk. Right.

00:11:46.500 --> 00:11:49.019
And it automatically dispatched more police cruisers

00:11:49.200 --> 00:11:52.059
to patrol those specific streets. But think about

00:11:52.059 --> 00:11:54.639
how human behavior operates. If a neighborhood

00:11:54.639 --> 00:11:57.019
is suddenly saturated with police cruisers driving

00:11:57.019 --> 00:11:59.600
up and down the street, the officers are naturally

00:11:59.600 --> 00:12:02.740
going to observe and record more minor infractions

00:12:02.740 --> 00:12:04.899
simply because they are physically present to

00:12:04.899 --> 00:12:06.580
see them. And what does the algorithm do with

00:12:06.580 --> 00:12:09.220
those new police reports? Oh wow. It creates

00:12:09.220 --> 00:12:12.360
a self-fulfilling prophecy. The algorithm ingests

00:12:12.360 --> 00:12:15.139
those new reports, pats itself on the back for

00:12:15.139 --> 00:12:17.419
making a correct prediction, and mathematically

00:12:17.419 --> 00:12:19.919
decides that the neighborhood is even more dangerous

00:12:19.919 --> 00:12:21.820
than it originally thought, so it sends more

00:12:21.820 --> 00:12:25.120
police. The algorithm is entirely blind to the

00:12:25.120 --> 00:12:27.440
fact that its own initial deployment caused the

00:12:27.440 --> 00:12:30.139
spike in recorded data. It just locks the city

00:12:30.139 --> 00:12:33.460
into an endless racist loop of over-policing,

00:12:33.899 --> 00:12:36.580
all while hiding behind the defense of objective

00:12:36.580 --> 00:12:40.240
math.
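
The shape of that loop takes only a few lines to reproduce. This is a deliberately crude simulation with invented numbers, not PredPol's actual model; the underlying incident counts are identical in both areas by construction.

```python
# Crude simulation of a predictive-policing feedback loop.
# True incidents are IDENTICAL in both areas; all numbers are invented.

TRUE_INCIDENTS = {"north": 100, "south": 100}  # per year, unobservable
patrols = {"north": 10, "south": 20}           # biased starting deployment

def recorded(true_count, patrol_count):
    # More patrols -> more of the same incidents get seen and written up.
    return int(true_count * min(0.02 * patrol_count, 1.0))

for year in range(1, 6):
    reports = {area: recorded(TRUE_INCIDENTS[area], p)
               for area, p in patrols.items()}
    hotspot = max(reports, key=reports.get)
    patrols[hotspot] += 5      # the 'prediction': reinforce the hotspot
    print(year, reports, dict(patrols))

# Output: 'south' records more crime every year and receives ever more
# patrols, while 'north' flatlines -- even though true incidents never
# differed. The model mistakes its own deployment for confirmation.
```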

00:12:40.240 --> 00:12:43.159
Once you understand these mechanisms, how bias infiltrates at the design stage through technical constraints

00:12:43.159 --> 00:12:46.019
and through emergent real-world feedback loops,

00:12:46.320 --> 00:12:48.899
you start to see the vast real-world impacts.

00:12:48.899 --> 00:12:51.340
It's everywhere. These invisible systems are

00:12:51.340 --> 00:12:53.480
making highly consequential decisions about your

00:12:53.480 --> 00:12:55.940
daily life, your finances and your identity.

00:12:56.080 --> 00:12:57.820
And we should be clear that these impacts aren't

00:12:57.820 --> 00:13:00.840
always accidental. Sometimes the bias is deliberately

00:13:00.840 --> 00:13:03.440
engineered for commercial manipulation. That's

00:13:03.440 --> 00:13:06.269
very true. As early as the 1980s, American Airlines

00:13:06.269 --> 00:13:08.789
intentionally built a flight finding algorithm

00:13:08.789 --> 00:13:11.409
that artificially boosted its own flights to

00:13:11.409 --> 00:13:13.809
the top of the search results, regardless of

00:13:13.809 --> 00:13:16.090
whether a competitor offered a cheaper or more

00:13:16.090 --> 00:13:18.669
convenient route. Right. Today, that concept

00:13:18.669 --> 00:13:22.029
has evolved into what scholars call digital gerrymandering.

00:13:22.690 --> 00:13:24.590
Research shows that search engine algorithms

00:13:24.590 --> 00:13:27.350
can be manipulated to subtly shift the voting

00:13:27.350 --> 00:13:30.169
outcomes of undecided voters by up to 20 percent,

00:13:30.480 --> 00:13:32.840
just by altering the order in which political

00:13:32.840 --> 00:13:35.799
information is presented. Exactly. That invisible

00:13:35.799 --> 00:13:38.820
manipulation is incredibly powerful because the

00:13:38.820 --> 00:13:40.659
user believes they are conducting an independent

00:13:40.659 --> 00:13:44.220
search. But the most devastating impacts outlined

00:13:44.220 --> 00:13:46.620
in the source material occur when algorithms

00:13:46.620 --> 00:13:49.500
interact with human identity, specifically gender,

00:13:49.720 --> 00:13:53.879
race, and LGBTQ+ status. Right. Let's look

00:13:53.879 --> 00:13:56.139
at the mechanics of gender bias. Amazon spent

00:13:56.139 --> 00:13:58.919
years developing an automated AI recruiting tool

00:13:58.919 --> 00:14:01.519
to sort through job applications. They eventually

00:14:01.519 --> 00:14:03.259
had to completely scrap the project. Because

00:14:03.259 --> 00:14:05.500
it was biased. Because the algorithm actively

00:14:05.500 --> 00:14:08.179
penalized resumes that contained the word women's.

00:14:08.379 --> 00:14:11.200
So if an applicant listed that they were like

00:14:11.200 --> 00:14:13.779
the captain of the women's chess club... Their

00:14:13.779 --> 00:14:16.820
resume was mathematically downgraded. Yes. And

00:14:16.820 --> 00:14:19.179
again, this goes back to the training data. Amazon

00:14:19.179 --> 00:14:22.600
trained that AI on 10 years of their own historical

00:14:22.600 --> 00:14:25.639
resumes. Because the tech industry has historically

00:14:25.639 --> 00:14:29.019
been male dominated, the vast majority of successful

00:14:29.019 --> 00:14:31.480
resumes in that training data came from men.

00:14:31.779 --> 00:14:34.179
Exactly. The AI essentially learned that the

00:14:34.179 --> 00:14:36.279
mathematical pattern for a successful employee

00:14:36.279 --> 00:14:39.460
was being male. So it ruthlessly filtered out

00:14:39.460 --> 00:14:43.149
any indicators of female identity.
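
Here is that mechanism in miniature, on a fabricated four-resume corpus; the scoring rule is deliberately naive and is not Amazon's model. The point is only that a correlation in historical labels becomes a score on a token.

```python
# Miniature of how a token like "women's" picks up a negative score.
# Resumes and labels are fabricated; this is not Amazon's system.

training = [
    # (tokens in resume, hired historically?)
    ({"java", "lead", "chess"},         True),
    ({"python", "captain", "robotics"}, True),
    ({"java", "women's", "chess"},      False),
    ({"python", "women's", "outreach"}, False),
]

def token_scores(data):
    """Score = P(hired | token present) - P(hired overall). Naive on purpose."""
    base = sum(hired for _, hired in data) / len(data)
    vocab = set().union(*(tokens for tokens, _ in data))
    return {t: sum(h for toks, h in data if t in toks)
               / sum(1 for toks, _ in data if t in toks) - base
            for t in vocab}

scores = token_scores(training)
print(sorted(scores.items(), key=lambda kv: kv[1])[:2])
# "women's" lands at the bottom with -0.5. The model has no concept of
# gender, just a pattern -- exactly why the project had to be scrapped.
```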

00:14:43.149 --> 00:14:45.629
We see similarly invasive algorithmic behavior in retail, too.

00:14:45.799 --> 00:14:48.919
Target developed an algorithm capable of predicting

00:14:48.919 --> 00:14:51.559
when female customers were pregnant before they'd

00:14:51.559 --> 00:14:53.460
even announced it to their own families. Wait,

00:14:53.480 --> 00:14:55.899
how is that even possible? Purely by detecting

00:14:55.899 --> 00:14:58.340
subtle mathematical shifts in their purchase

00:14:58.340 --> 00:15:00.759
histories, like buying unscented lotion instead

00:15:00.759 --> 00:15:03.220
of scented. That feels like a huge invasion of

00:15:03.220 --> 00:15:05.960
privacy. It is. Meanwhile, LinkedIn used to have

00:15:05.960 --> 00:15:07.799
a feature that would correct female names in

00:15:07.799 --> 00:15:10.360
search queries. If a user searched for Andrea,

00:15:10.600 --> 00:15:12.840
the algorithm would prompt, did you mean Andrew?

00:15:13.049 --> 00:15:16.090
But it never did the reverse. It never did the

00:15:16.090 --> 00:15:18.690
reverse because the algorithm weighed the male

00:15:18.690 --> 00:15:23.480
name as the higher-probability default reality.

00:15:23.799 --> 00:15:25.840
The impacts become even more alarming when we

00:15:25.840 --> 00:15:28.600
move from retail into surveillance and law enforcement.

00:15:29.299 --> 00:15:32.620
The source text highlights a landmark 2018 study

00:15:32.620 --> 00:15:35.259
on commercial facial recognition systems. This

00:15:35.259 --> 00:15:37.639
is a huge one. The researchers found that the

00:15:37.639 --> 00:15:40.559
algorithms had error rates of up to 35 percent

00:15:40.559 --> 00:15:42.620
when trying to identify darker skinned women.

00:15:43.019 --> 00:15:45.399
But when tasked with identifying lighter skinned

00:15:45.399 --> 00:15:47.980
men, the error rate plummeted to less than one

00:15:47.980 --> 00:15:50.379
percent.
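
Auditing for that kind of disparity is straightforward to express. A sketch with fabricated records, showing how a single aggregate accuracy number hides the subgroup failure the 2018 study surfaced:

```python
# Sketch of a disaggregated error audit; the records are fabricated.
# One overall accuracy number can hide a wildly uneven failure rate.

results = [
    # (group, correctly_identified)
    *[("lighter_male", True)] * 199, ("lighter_male", False),
    *[("darker_female", True)] * 13, *[("darker_female", False)] * 7,
]

def error_rate_by_group(records):
    rates = {}
    for group in {g for g, _ in records}:
        hits = [ok for g, ok in records if g == group]
        rates[group] = 1 - sum(hits) / len(hits)
    return rates

total_err = 1 - sum(ok for _, ok in results) / len(results)
print(f"overall error: {total_err:.1%}")   # 3.6% -- looks acceptable
for group, err in sorted(error_rate_by_group(results).items()):
    print(f"{group}: {err:.1%}")
# darker_female: 35.0%
# lighter_male: 0.5%
# The disparity only appears once you slice the errors by subgroup.
```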

00:15:50.379 --> 00:15:53.779
This failure is a combination of bad training data and physical technical bias. The

00:15:53.779 --> 00:15:56.100
databases used to train these facial recognition

00:15:56.100 --> 00:15:58.580
models were overwhelmingly filled with images

00:15:58.580 --> 00:16:01.320
of white faces. Leaving the algorithm mathematically

00:16:01.320 --> 00:16:04.019
blind to the diverse facial features of darker

00:16:04.019 --> 00:16:06.299
skinned individuals? Right, and the consequences

00:16:06.299 --> 00:16:08.480
of this are not abstract. This technological

00:16:08.480 --> 00:16:10.980
failure has directly led to the wrongful arrests

00:16:10.980 --> 00:16:13.139
and incarcerations of black men in the United

00:16:13.139 --> 00:16:16.580
States. And globally, these biased surveillance

00:16:16.580 --> 00:16:19.779
algorithms are being actively weaponized. In

00:16:19.779 --> 00:16:22.220
China, algorithms are used to heavily restrict

00:16:22.220 --> 00:16:25.139
the Uyghur Muslim minority. The source notes

00:16:25.139 --> 00:16:27.259
that members of this demographic aren't even

00:16:27.259 --> 00:16:29.840
allowed to purchase a simple kitchen knife unless

00:16:29.840 --> 00:16:33.000
they pass algorithmic security protocols and have

00:16:33.000 --> 00:16:35.919
a QR code containing their personal data physically

00:16:35.919 --> 00:16:38.320
etched into the blade of the knife. It's dystopian.

00:16:38.539 --> 00:16:40.480
Here in the U.S., the justice system relies

00:16:40.480 --> 00:16:43.779
on an algorithm called COMPAS. Judges use it

00:16:43.779 --> 00:16:46.080
to predict the likelihood of a defendant reoffending.

00:16:46.500 --> 00:16:48.879
And studies show that COMPAS erroneously labeled

00:16:48.879 --> 00:16:51.879
black defendants as high risk twice as often

00:16:51.879 --> 00:16:54.620
as white defendants. Baking historical arrest

00:16:54.620 --> 00:16:57.659
disparities directly into future sentencing guidelines.

00:16:57.899 --> 00:17:00.340
The LGBTQ+ and disabled communities face

00:17:00.340 --> 00:17:03.620
severe algorithmic exclusion as well. Grindr's

00:17:03.620 --> 00:17:06.079
recommendation algorithm was once inexplicably

00:17:06.079 --> 00:17:08.180
linked by the Android Store to apps designed

00:17:08.180 --> 00:17:11.240
to track sex offenders. Creating a highly dangerous

00:17:11.240 --> 00:17:14.480
association. Exactly. Uber implemented a facial

00:17:14.480 --> 00:17:16.980
recognition security check for its drivers that

00:17:16.980 --> 00:17:19.640
completely failed to recognize transgender drivers

00:17:19.640 --> 00:17:22.140
who were actively transitioning. Because their

00:17:22.140 --> 00:17:24.240
faces were changing? Right, the algorithm flagged

00:17:24.240 --> 00:17:26.960
their changing appearance as fraud and automatically

00:17:26.960 --> 00:17:29.480
suspended their accounts, cutting off their livelihoods.

00:17:29.619 --> 00:17:31.779
And if you are a person with a speech impairment,

00:17:32.099 --> 00:17:34.359
you are often entirely excluded from the modern

00:17:34.359 --> 00:17:37.299
digital world. Voice assistants like Alexa or

00:17:37.299 --> 00:17:40.640
Siri simply aren't trained on diverse voice data,

00:17:40.640 --> 00:17:43.000
rendering the technology useless to millions

00:17:43.000 --> 00:17:46.069
of people. Now, I want to pause here for a second

00:17:46.069 --> 00:17:47.849
because it's important to note that this isn't

00:17:47.849 --> 00:17:51.150
just about corporate or social bias. It impacts

00:17:51.150 --> 00:17:54.069
global politics, too. And we want to be completely

00:17:54.069 --> 00:17:56.789
impartial here and just, you know, share what

00:17:56.789 --> 00:17:58.970
the data in our source material shows without

00:17:58.970 --> 00:18:00.809
taking any sides at all. Right. We're just looking

00:18:00.809 --> 00:18:03.470
at the findings. Exactly. Looking purely at the

00:18:03.470 --> 00:18:06.490
data in the source material, we see that AI doesn't

00:18:06.490 --> 00:18:08.710
pick sides based on objective truth. It just

00:18:08.710 --> 00:18:11.430
mirrors its training data. The source mentions

00:18:11.430 --> 00:18:14.470
a 2025 study by the Anti-Defamation League,

00:18:14.609 --> 00:18:17.309
which found that several major LLMs, including

00:18:17.309 --> 00:18:20.490
ChatGPT and Gemini, showed anti-Israel bias.

00:18:20.730 --> 00:18:23.089
Which highlights the core reality of machine

00:18:23.089 --> 00:18:26.339
learning. These systems are just reflections.

00:18:26.940 --> 00:18:29.119
They do not possess objective reasoning. They

00:18:29.119 --> 00:18:31.920
just spit back what they read. Whatever political,

00:18:32.259 --> 00:18:35.319
cultural, or social biases exist in the millions

00:18:35.319 --> 00:18:38.420
of documents they digest will be regurgitated

00:18:38.420 --> 00:18:41.279
as authoritative fact, regardless of where that

00:18:41.279 --> 00:18:43.220
data falls on the political spectrum. You might

00:18:43.220 --> 00:18:45.279
be listening to all of this and thinking, OK,

00:18:45.400 --> 00:18:48.359
if we know these algorithms are sexist, racist,

00:18:48.519 --> 00:18:50.859
and commercially manipulative, why don't we just

00:18:50.859 --> 00:18:53.730
open up a laptop, find the bad lines of code,

00:18:53.869 --> 00:18:55.930
and hit delete. If only it were that simple.

00:18:56.069 --> 00:18:58.750
Right. Like, why is it so hard to just fix the

00:18:58.750 --> 00:19:01.809
math? It comes down to a concept defined by sociologist

00:19:01.809 --> 00:19:04.509
Bruno Latour called blackboxing. Blackboxing.

00:19:04.890 --> 00:19:06.710
Latour observed that when a piece of technology

00:19:06.710 --> 00:19:08.849
becomes highly successful and highly efficient,

00:19:09.369 --> 00:19:12.440
it becomes invisible and opaque. We only look

00:19:12.440 --> 00:19:14.480
at the inputs, what we ask the computer to do,

00:19:14.720 --> 00:19:16.779
and the outputs, what the computer gives us. Right.

00:19:16.980 --> 00:19:18.960
We completely ignore the internal complexity.

00:19:19.180 --> 00:19:21.279
We used to think of computer code like a recipe.

00:19:21.720 --> 00:19:23.779
If the cake tastes bad, you look at the recipe,

00:19:23.900 --> 00:19:25.660
see that you added too much salt, and you change

00:19:25.660 --> 00:19:28.839
it. But modern algorithms aren't recipes anymore,

00:19:28.859 --> 00:19:32.299
are they? Not at all. It is much more accurate

00:19:32.299 --> 00:19:35.559
to think of them like a biological brain, a massive

00:19:35.559 --> 00:19:38.980
interconnected neural network. If I ask you why

00:19:38.980 --> 00:19:42.119
your favorite color is blue, you can't point

00:19:42.119 --> 00:19:44.579
to the exact physical neuron in your brain that

00:19:44.579 --> 00:19:47.059
made that decision. I have no idea, yeah. Modern

00:19:47.059 --> 00:19:49.539
algorithms operate the same way. The search engine

00:19:49.539 --> 00:19:52.900
Bing, for example, runs up to 10 million subtle

00:19:52.900 --> 00:19:56.099
A/B test variations of its service every single

00:19:56.099 --> 00:19:58.819
day. 10 million a day. Every single day. They're

00:19:58.819 --> 00:20:01.559
constantly tweaking weights, pathways, and connections.

00:20:02.180 --> 00:20:04.039
The tech companies have built digital brains

00:20:04.039 --> 00:20:06.599
so massive and fluid that even the original engineers

00:20:06.599 --> 00:20:09.579
cannot trace exactly which line of code made

00:20:09.579 --> 00:20:11.940
a biased decision. That is... wow. So it's like

00:20:11.940 --> 00:20:14.180
trying to fix a faulty recipe when you aren't

00:20:14.180 --> 00:20:16.200
allowed in the kitchen. You can't see the ingredients

00:20:16.200 --> 00:20:18.480
and the chef is changing the recipe 10 million

00:20:18.480 --> 00:20:21.200
times a day. Exactly. And on top of that, sheer

00:20:21.200 --> 00:20:24.240
technical complexity. These algorithms are fiercely

00:20:24.240 --> 00:20:26.980
protected as corporate trade secrets. Of course.

00:20:27.319 --> 00:20:30.180
The companies refuse to let independent auditors

00:20:30.180 --> 00:20:32.319
examine the digital brain because they don't

00:20:32.319 --> 00:20:35.619
want competitors copying their product or users

00:20:35.619 --> 00:20:38.180
learning how to game the system. But there is

00:20:38.180 --> 00:20:41.259
also a massive philosophical roadblock here.

00:20:41.640 --> 00:20:43.819
Even if we could open the black box and see the

00:20:43.819 --> 00:20:46.240
code, we still have to define what fairness actually

00:20:46.240 --> 00:20:48.980
looks like. And the source points out that this

00:20:48.980 --> 00:20:51.240
is mathematically impossible to agree on. It

00:20:51.240 --> 00:20:53.519
really is. Because if you program an algorithm

00:20:53.519 --> 00:20:56.380
for equality of outcome, meaning every demographic

00:20:56.380 --> 00:20:59.359
gets the exact same statistical result, that inherently

00:20:59.359 --> 00:21:01.640
contradicts equality of treatment, where you

00:21:01.640 --> 00:21:03.480
treat individuals differently based on their

00:21:03.480 --> 00:21:07.119
specific, nuanced context.
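
The collision can be stated in a few lines of code; a toy example with fabricated applicants and scores:

```python
# Toy illustration: two common fairness definitions give different answers
# on the same (fabricated) applicant pool.

applicants = [
    # (group, qualification_score)
    ("group_a", 90), ("group_a", 80), ("group_a", 60),
    ("group_b", 85), ("group_b", 55), ("group_b", 50),
]

def equal_treatment(apps, cutoff=70):
    """Identical rule for every individual, whatever the group outcomes."""
    return [(g, s, s >= cutoff) for g, s in apps]

def equal_outcome(apps, rate=2 / 3):
    """Identical approval rate per group, whatever the individual scores."""
    decisions = []
    for group in ("group_a", "group_b"):
        scores = sorted((s for g, s in apps if g == group), reverse=True)
        approved = scores[: round(rate * len(scores))]
        decisions += [(group, s, s in approved) for s in scores]
    return decisions

print(equal_treatment(applicants))  # group_a approves 2 of 3, group_b 1 of 3
print(equal_outcome(applicants))    # both groups approve 2 of 3, but now a
# 55 from group_b is approved while a 60 from group_a is rejected:
# outcome parity is bought by treating individuals unequally.
```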

00:21:07.119 --> 00:21:09.059
So if the math is hidden and we can't even agree on the definition of

00:21:09.059 --> 00:21:12.380
fairness, are we just doomed to live under the

00:21:12.380 --> 00:21:15.039
authority of alien algorithms? We certainly aren't

00:21:15.039 --> 00:21:18.079
doomed. But fighting back against algorithmic

00:21:18.079 --> 00:21:21.140
bias requires a massive multi-pronged approach.

00:21:21.660 --> 00:21:24.160
It combines technical innovation, a radical shift

00:21:24.160 --> 00:21:27.319
in human diversity, and strict government regulation.

00:21:27.519 --> 00:21:29.119
OK, so what's happening on the technical side?

00:21:29.319 --> 00:21:32.470
Well, organizations like the IEEE, the world's

00:21:32.470 --> 00:21:35.109
largest technical professional organization, have

00:21:35.109 --> 00:21:38.549
published new standards, like the IEEE 7003-2024

00:21:38.549 --> 00:21:41.430
standard, which provides creators with specific,

00:21:41.809 --> 00:21:44.390
rigorous methodologies to identify and eliminate

00:21:44.390 --> 00:21:47.130
bias during the design phase. That's a good step.

00:21:47.269 --> 00:21:50.230
There is also a massive industry push for explainable

00:21:50.230 --> 00:21:53.130
AI, which essentially forces the neural network

00:21:53.130 --> 00:21:55.809
to show its work and justify how it reached a

00:21:55.809 --> 00:21:58.009
decision. The source also mentions tools like

00:21:58.009 --> 00:22:00.599
data sheets for datasets and model cards. The

00:22:00.599 --> 00:22:02.779
best way to think about these is like a mandatory

00:22:02.779 --> 00:22:04.759
nutrition label on the back of a cereal box,

00:22:04.940 --> 00:22:07.200
but for AI. I like that analogy. Yeah, instead

00:22:07.200 --> 00:22:09.240
of listing calories and sugar, these model cards

00:22:09.240 --> 00:22:11.440
would clearly list exactly what demographic data

00:22:11.440 --> 00:22:13.960
the AI was trained on, what data it is entirely

00:22:13.960 --> 00:22:16.380
missing, and what real-world tasks it absolutely

00:22:16.380 --> 00:22:18.500
should not be used for.
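
In practice such a card is just structured documentation. Here is a minimal hypothetical sketch; the field names loosely follow the model cards proposal the source mentions, and every value is invented.

```python
# Minimal hypothetical model card. The structure loosely follows the
# "model cards" proposal; every value here is invented.
from dataclasses import dataclass

@dataclass
class ModelCard:
    model_name: str
    intended_use: str
    training_data_mix: dict  # demographic composition of the training set
    known_gaps: list         # populations the data under-represents
    do_not_use_for: list     # out-of-scope, high-risk applications

card = ModelCard(
    model_name="face-verify-v2 (hypothetical)",
    intended_use="photo de-duplication in a consumer photo app",
    training_data_mix={"lighter_male": 0.46, "lighter_female": 0.33,
                       "darker_male": 0.14, "darker_female": 0.07},
    known_gaps=["darker-skinned women", "people under 18"],
    do_not_use_for=["law-enforcement identification",
                    "employment or benefits decisions"],
)
print(card.do_not_use_for)
```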

00:22:18.500 --> 00:22:21.160
Those technical tools are essential, but they are entirely insufficient

00:22:21.160 --> 00:22:24.279
without human solutions. The source notes a severe

00:22:24.279 --> 00:22:27.700
diversity crisis in the field. Currently, only

00:22:27.700 --> 00:22:30.019
12% of machine learning engineers are women.

00:22:30.089 --> 00:22:32.950
Just 12%. That's wild. That's a huge problem.

00:22:33.250 --> 00:22:35.470
Initiatives like Stanford's Institute for

00:22:35.470 --> 00:22:38.910
Human-Centered AI and groups like Black in AI are

00:22:38.910 --> 00:22:41.710
actively fighting to fix the overarching whiteness

00:22:41.710 --> 00:22:43.950
and male-dominated culture of AI development.

00:22:44.490 --> 00:22:46.210
Because if the people building the tools don't

00:22:46.210 --> 00:22:48.609
reflect the diversity of the real world, the

00:22:48.609 --> 00:22:50.960
tools never will. And because we can't just rely

00:22:50.960 --> 00:22:53.299
on tech companies to self-regulate, governments

00:22:53.299 --> 00:22:55.619
are finally stepping in to pry open that black

00:22:55.619 --> 00:22:59.299
box. In Europe, the General Data Protection Regulation,

00:22:59.480 --> 00:23:02.980
or GDPR, has a specific clause, Article 22, which

00:23:02.980 --> 00:23:05.440
explicitly prohibits solely automated decisions

00:23:05.440 --> 00:23:07.960
that have legal or significant effects on a person's

00:23:07.960 --> 00:23:10.440
life. You inherently have the right to demand

00:23:10.440 --> 00:23:13.059
a human in the loop. Here in the U.S., local

00:23:13.059 --> 00:23:15.980
governments are taking action, too. In 2023,

00:23:16.380 --> 00:23:19.259
New York City implemented a groundbreaking law

00:23:19.259 --> 00:23:22.740
that actually requires employers to conduct independent

00:23:22.740 --> 00:23:25.700
third-party bias audits for any automated hiring

00:23:25.700 --> 00:23:28.180
tools they use. And they have to publicly publish

00:23:28.180 --> 00:23:30.880
the results, right? Yes, they do. We are also

00:23:30.880 --> 00:23:33.500
seeing significant federal movement. President

00:23:33.500 --> 00:23:36.160
Biden signed executive orders mandating best

00:23:36.160 --> 00:23:39.200
practices for AI safety, specifically targeting

00:23:39.200 --> 00:23:42.190
fraud and algorithmic discrimination. And globally.

00:23:42.490 --> 00:23:44.730
Globally, India's proposed personal data bill

00:23:44.730 --> 00:23:47.289
takes a similar stance, explicitly targeting

00:23:47.289 --> 00:23:49.809
discriminatory treatment as a legal harm that

00:23:49.809 --> 00:23:52.309
companies can be prosecuted for. That's fascinating.

00:23:52.490 --> 00:23:54.150
But if we connect this to the bigger picture,

00:23:54.329 --> 00:23:56.470
the most important shift here is the change in

00:23:56.470 --> 00:23:58.890
perspective. The ultimate solution isn't just

00:23:58.890 --> 00:24:01.319
writing better, cleaner math. It is adopting

00:24:01.319 --> 00:24:03.720
frameworks like the Toronto Declaration, which

00:24:03.720 --> 00:24:06.380
treats algorithmic harm not as a computer glitch,

00:24:06.859 --> 00:24:08.799
but as a fundamental human rights violation.

00:24:08.799 --> 00:24:11.200
Oh, wow. It demands legal accountability and

00:24:11.200 --> 00:24:13.940
liability from the private corporate actors who

00:24:13.940 --> 00:24:16.119
deploy these untested systems into our public

00:24:16.119 --> 00:24:18.599
spaces. Let's bring all of this together. What

00:24:18.599 --> 00:24:20.640
we've uncovered today is that algorithms are

00:24:20.640 --> 00:24:23.900
not neutral calculators. They are not objective

00:24:23.900 --> 00:24:27.359
judges immune to human flaws. They are highly

00:24:27.359 --> 00:24:30.920
polished mirrors reflecting humanity's best intentions

00:24:30.920 --> 00:24:34.539
right alongside our worst, most deeply ingrained

00:24:34.539 --> 00:24:37.269
historical prejudices. Very well said. Before

00:24:37.269 --> 00:24:39.329
we wrap up, I want you to just take a second

00:24:39.329 --> 00:24:42.450
and consider how many invisible algorithms you

00:24:42.450 --> 00:24:44.750
had to interact with today just to listen to

00:24:44.750 --> 00:24:47.849
this deep dive. Your phone's facial recognition

00:24:47.849 --> 00:24:50.970
to unlock the screen, the podcast platform's

00:24:50.970 --> 00:24:53.130
recommendation engine that surfaced the episode,

00:24:53.549 --> 00:24:55.730
the network routing the audio to your headphones.

00:24:55.890 --> 00:24:57.950
It's everywhere. The unseen code is constantly

00:24:57.950 --> 00:24:59.769
making decisions about what you see and what

00:24:59.769 --> 00:25:02.130
you can do. And as we integrate these complex

00:25:02.130 --> 00:25:04.930
systems even deeper into the fabric of our society,

00:25:05.490 --> 00:25:17.170
the responsibility only grows. I want to leave you with a

00:25:17.170 --> 00:25:20.230
final lingering thought to mull over. Let's say

00:25:20.230 --> 00:25:23.289
we eventually manage the impossible. Let's say

00:25:23.289 --> 00:25:26.309
that one day we build an AI that is perfectly

00:25:26.309 --> 00:25:29.490
objectively fair, completely stripped of all

00:25:29.490 --> 00:25:32.150
human bias, emotion, and historical baggage.

00:25:32.710 --> 00:25:35.130
OK. Whose specific version of fairness will it

00:25:35.130 --> 00:25:38.069
use? And more importantly, would we humans, with

00:25:38.069 --> 00:25:40.269
all of our inherent flaws, our contradictions,

00:25:40.269 --> 00:25:43.009
and our messy emotions, even be willing to accept

00:25:43.009 --> 00:25:45.750
the rigid decisions of a machine that no longer

00:25:45.750 --> 00:25:48.190
thinks anything like us? That is the real question.

00:25:48.349 --> 00:25:50.450
Thank you for joining us on this deep dive. Keep

00:25:50.450 --> 00:25:52.670
questioning the invisible code running your world.
