WEBVTT

00:00:00.000 --> 00:00:02.839
It used to be that when you picked up a tool,

00:00:03.520 --> 00:00:05.719
you understood exactly how it worked just by

00:00:05.719 --> 00:00:07.940
looking at it. Yeah, the mechanics were just

00:00:07.940 --> 00:00:10.320
obvious. Exactly. You pick up a hammer, you look

00:00:10.320 --> 00:00:11.919
at the heavy metal head, you look at the wooden

00:00:11.919 --> 00:00:14.699
handle, and the physics of the tool just make

00:00:14.699 --> 00:00:17.420
perfect intuitive sense to your brain. You can

00:00:17.420 --> 00:00:20.519
see the cause and effect. Right. And even with

00:00:20.519 --> 00:00:23.420
early computers, I mean, as unimaginably complex

00:00:23.420 --> 00:00:27.280
as they were, they were still just incredibly

00:00:27.280 --> 00:00:30.839
fast rule followers. You type a command and the

00:00:30.839 --> 00:00:33.479
machine executes it exactly as written. The mechanism

00:00:33.479 --> 00:00:35.820
was entirely visible to the programmer. Yeah,

00:00:35.920 --> 00:00:38.560
but today you pull out your smartphone. You open

00:00:38.560 --> 00:00:41.399
an app and suddenly you're interacting with a

00:00:41.399 --> 00:00:44.219
tool where that mechanism is completely invisible.

00:00:44.219 --> 00:00:47.820
It's hidden behind glass and code. It almost

00:00:47.820 --> 00:00:49.880
acts like it has a mind of its own. Well, we've

00:00:49.880 --> 00:00:52.140
transitioned from tools that simply follow our

00:00:52.140 --> 00:00:54.700
explicit instructions to tools that actively

00:00:54.700 --> 00:00:57.359
anticipate our needs. And sometimes they do that

00:00:57.359 --> 00:00:59.740
before we even consciously register those needs

00:00:59.740 --> 00:01:02.079
ourselves. Which is a little unsettling. It is.

00:01:02.340 --> 00:01:04.819
That loss of visibility fundamentally changes

00:01:04.819 --> 00:01:07.000
our relationship with technology. I mean, we're

00:01:07.000 --> 00:01:09.319
no longer just operators. We are participants

00:01:09.319 --> 00:01:11.760
in a system we can't fully see. Welcome to this

00:01:11.760 --> 00:01:15.500
deep dive. Today we are looking at a really comprehensive

00:01:15.500 --> 00:01:18.620
foundational source, a highly detailed Wikipedia

00:01:18.620 --> 00:01:21.280
article on machine learning. We're trying to

00:01:21.280 --> 00:01:23.400
understand how these invisible tools actually

00:01:23.400 --> 00:01:25.879
work. And we are focusing on how we lost that

00:01:25.879 --> 00:01:28.319
visibility into our own technology. Right. The

00:01:28.319 --> 00:01:30.420
mission today is to cut through all the relentless

00:01:30.420 --> 00:01:33.620
modern hype and buzzwords. How did we go from

00:01:33.620 --> 00:01:36.939
machines that just followed our rigid rules to

00:01:36.939 --> 00:01:39.439
these opaque black boxes that learn on their

00:01:39.439 --> 00:01:42.560
own? And, you know, what happens when those mathematically

00:01:42.560 --> 00:01:46.040
perfect black boxes collide with our incredibly

00:01:46.040 --> 00:01:48.900
messy, biased human reality? The collision that's

00:01:48.900 --> 00:01:50.819
happening everywhere right now. It really is.

00:01:51.260 --> 00:01:53.799
So, whether you are actively working in tech

00:01:53.799 --> 00:01:56.359
or you're just trying to navigate a world that

00:01:56.359 --> 00:01:59.900
is increasingly run by algorithms, this deep

00:01:59.900 --> 00:02:01.980
dive is going to give you the ultimate foundational

00:02:01.980 --> 00:02:06.099
map. So, okay, let's unpack this. To really understand

00:02:06.099 --> 00:02:08.300
these sophisticated algorithms running our lives

00:02:08.300 --> 00:02:11.719
right now, we have to start by tracing a massive

00:02:11.719 --> 00:02:14.960
historical paradigm shift. The shift from programming

00:02:14.960 --> 00:02:17.979
a machine to teaching a machine. Exactly. And

00:02:17.979 --> 00:02:20.180
that shift actually started much earlier than

00:02:20.180 --> 00:02:22.599
most people realize. I mean, the term machine

00:02:22.599 --> 00:02:24.800
learning wasn't coined in the last decade during

00:02:24.800 --> 00:02:27.240
this current AI boom. It goes all the way back

00:02:27.240 --> 00:02:31.710
to 1959. Wait, 1959? Yeah. An IBM researcher

00:02:31.710 --> 00:02:34.150
named Arthur Samuel coined the term while he

00:02:34.150 --> 00:02:36.310
was working on a checkers program. A checkers

00:02:36.310 --> 00:02:39.169
program? Right. He realized that manually coding

00:02:39.169 --> 00:02:41.650
every possible board state and every possible

00:02:41.650 --> 00:02:44.150
strategy into a computer was just computationally

00:02:44.150 --> 00:02:47.569
impossible. There were just too many permutations.

00:02:47.789 --> 00:02:50.229
So what did he do? Well, he built a program that

00:02:50.229 --> 00:02:52.810
could evaluate the board and mathematically adjust

00:02:52.810 --> 00:02:55.030
its own internal parameters based on whether

00:02:55.030 --> 00:02:58.909
it won or lost a game. It is wild to think that

00:02:58.909 --> 00:03:02.729
the origins of the systems currently driving

00:03:02.729 --> 00:03:04.810
our cars on the highway and diagnosing complex

00:03:04.810 --> 00:03:07.629
diseases started with a guy trying to get a computer

00:03:07.629 --> 00:03:09.969
to calculate winning chances in a board game.

00:03:10.129 --> 00:03:12.770
It's a very humble beginning. And the evolution

00:03:12.770 --> 00:03:15.270
from there is just fascinating. Our source points

00:03:15.270 --> 00:03:18.349
out that by the early 1960s, the Raytheon company

00:03:18.349 --> 00:03:20.729
had built an experimental learning machine called

00:03:20.729 --> 00:03:23.789
Cybertron. Cybertron, yeah. That sounds incredibly

00:03:23.789 --> 00:03:26.090
sci-fi. It really does. It analyzed complex

00:03:26.090 --> 00:03:30.250
data like sonar signals, electrocardiograms,

00:03:30.270 --> 00:03:32.509
and speech patterns using punch tape memory.

00:03:33.069 --> 00:03:35.289
But the absolute best detail here is that the

00:03:35.289 --> 00:03:37.810
machine was equipped with a literal physical

00:03:37.810 --> 00:03:40.129
goof button. I love the goof button. A human

00:03:40.129 --> 00:03:42.189
operator would literally sit at the console and

00:03:42.189 --> 00:03:44.169
act as the teacher. Just waiting for it to mess

00:03:44.169 --> 00:03:47.090
up. Exactly. Whenever the machine made an incorrect

00:03:47.090 --> 00:03:49.830
decision in its pattern recognition, the human

00:03:49.830 --> 00:03:52.250
operator would physically slam the goof button.

00:03:52.830 --> 00:03:55.449
Hitting that button forced the machine to mathematically

00:03:55.449 --> 00:03:58.430
penalize the logic path it had just used, so

00:03:58.430 --> 00:04:01.129
it had to reevaluate its approach. That is amazing.
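
NOTE
A rough illustrative sketch of that goof-button loop, with invented paths and weights: the decision comes from weighted scores, and the button simply subtracts weight from whichever path produced the bad answer.
# Toy "goof button": two weighted logic paths vote on an answer.
weights = {"path_a": 0.5, "path_b": 0.5}
def decide(signal):
    scores = {p: w * signal[p] for p, w in weights.items()}
    return max(scores, key=scores.get)   # highest weighted score wins
def goof_button(used_path, penalty=0.1):
    # the operator's button: penalize the path behind the bad decision
    weights[used_path] = max(0.0, weights[used_path] - penalty)
choice = decide({"path_a": 0.9, "path_b": 0.4})
goof_button(choice)   # operator saw a wrong answer and slammed the button
print(weights)        # the misused path now carries less weight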

00:04:01.270 --> 00:04:04.849
It's a highly rudimentary, tactile form of reinforcement

00:04:04.849 --> 00:04:06.969
learning. Yeah, instead of giving the computer

00:04:06.969 --> 00:04:09.990
a strict step-by-step logical flowchart, which

00:04:09.990 --> 00:04:13.270
is how traditional programming works. The human

00:04:13.270 --> 00:04:15.689
operator was just evaluating the final output.

00:04:16.329 --> 00:04:18.790
If the machine's output was wrong, you hit the

00:04:18.790 --> 00:04:21.430
button. It's like traditional programming is

00:04:21.430 --> 00:04:23.829
like giving a computer a strict recipe to bake

00:04:23.829 --> 00:04:26.230
a cake. Right. But machine learning is like feeding

00:04:26.230 --> 00:04:29.149
the computer 100 different cakes and asking it

00:04:29.149 --> 00:04:32.009
to figure out the recipe itself, with the human

00:04:32.009 --> 00:04:33.870
occasionally slamming the goof button when it

00:04:33.870 --> 00:04:35.889
bakes something awful. That's a great way to

00:04:35.889 --> 00:04:38.209
look at it. And what's fascinating here is the

00:04:38.209 --> 00:04:40.829
underlying philosophy of this shift, which really

00:04:40.829 --> 00:04:42.829
began to accelerate and take over the industry

00:04:42.829 --> 00:04:45.509
in the 1990s. The shift away from logic-based

00:04:45.509 --> 00:04:48.870
AI. Exactly. For a long time, artificial intelligence

00:04:48.870 --> 00:04:52.269
was dominated by a symbolic, logic-based approach.

00:04:52.689 --> 00:04:54.670
Researchers trying to program human knowledge

00:04:54.670 --> 00:04:57.610
and rigid cognitive rules into the machine. Like

00:04:57.610 --> 00:05:00.399
endless lists of if-then statements. Yeah, but

00:05:00.399 --> 00:05:03.120
the real world is incredibly complex and noisy.

00:05:03.600 --> 00:05:06.399
You simply cannot write enough if-then statements

00:05:06.399 --> 00:05:09.379
to cover, say, every possible lighting condition

00:05:09.379 --> 00:05:11.220
a self-driving car might encounter on a rainy

00:05:11.220 --> 00:05:13.800
night. The physical world is just too messy for

00:05:13.800 --> 00:05:15.779
a perfect set of written rules. So the field

00:05:15.779 --> 00:05:18.579
pivoted. They embraced a concept proposed much

00:05:18.579 --> 00:05:21.639
earlier by the mathematician Alan Turing. Right,

00:05:21.639 --> 00:05:24.279
the Turing test guy. Him, yeah. He suggested

00:05:24.279 --> 00:05:26.860
we stop asking the philosophical, unanswerable

00:05:26.860 --> 00:05:29.839
question, can machines think? and replace it

00:05:29.839 --> 00:05:32.319
with a pragmatic, testable question, which is,

00:05:32.540 --> 00:05:35.160
can machines do what we, as thinking entities,

00:05:35.279 --> 00:05:38.560
can do? Just focus on the output. Yes. The focus

00:05:38.560 --> 00:05:41.120
moved entirely away from trying to replicate

00:05:41.120 --> 00:05:43.899
human consciousness or cognitive thought. Instead,

00:05:44.019 --> 00:05:46.459
the goal became achieving practical, solvable

00:05:46.459 --> 00:05:49.259
tasks based purely on statistics, probability,

00:05:49.699 --> 00:05:51.899
and optimization. Okay, so if we are no longer

00:05:51.899 --> 00:05:54.560
writing rigid rules, how do we actually set up

00:05:54.560 --> 00:05:56.879
the classroom for the machine today? Like how

00:05:56.879 --> 00:05:58.879
do we feed it the cakes? Well modern machine

00:05:58.879 --> 00:06:01.360
learning generally falls into three main paradigms

00:06:01.360 --> 00:06:03.500
based on how the data is presented to the algorithm.

00:06:03.819 --> 00:06:05.879
Let's walk through those. The first is supervised

00:06:05.879 --> 00:06:08.459
learning, right? Yeah, supervised learning. This

00:06:08.459 --> 00:06:10.939
is where the machine is given a data set that

00:06:10.939 --> 00:06:13.899
has already been explicitly labeled by a human

00:06:13.899 --> 00:06:17.139
teacher. You provide the system with thousands,

00:06:17.379 --> 00:06:20.639
or maybe millions, of input-output pairs. Like

00:06:20.639 --> 00:06:23.360
classifying emails into a spam folder. Exactly.
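
NOTE
A minimal sketch of that labeled setup, assuming scikit-learn is available; the feature columns and emails here are invented for illustration.
# Supervised learning on labeled input-output pairs (toy spam data).
from sklearn.linear_model import LogisticRegression
X = [[8, 1], [9, 0], [1, 7], [0, 9]]          # e.g. [spammy words, known contacts]
y = ["spam", "spam", "not spam", "not spam"]  # labels supplied by the human teacher
model = LogisticRegression().fit(X, y)        # learn the input-to-label mapping
print(model.predict([[7, 2]]))                # classify a brand-new email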

00:06:23.480 --> 00:06:26.139
The algorithm's job is to calculate the mathematical

00:06:26.139 --> 00:06:28.839
function that best maps those specific inputs

00:06:28.839 --> 00:06:31.500
to the correct outputs. And it's used heavily

00:06:31.500 --> 00:06:33.779
for classification, but also for something called

00:06:33.779 --> 00:06:36.060
regression analysis. Regression is a bit different,

00:06:36.240 --> 00:06:38.699
right? Yeah, in regression the output isn't a

00:06:38.699 --> 00:06:41.920
category like spam or not spam. It's a continuous

00:06:41.920 --> 00:06:45.420
numerical value. For example, predicting future

00:06:45.420 --> 00:06:47.560
temperature fluctuations based on historical

00:06:47.560 --> 00:06:50.879
atmospheric data. Oh, okay. So the machine calculates

00:06:50.879 --> 00:06:53.040
the mathematical trajectory based on the labeled

00:06:53.040 --> 00:06:55.079
history. You got it. Then you have the second

00:06:55.079 --> 00:06:58.180
paradigm, which is unsupervised learning. This

00:06:58.180 --> 00:06:59.839
is where the training wheels completely come

00:06:59.839 --> 00:07:04.360
off. The machine is fed a massive pile of unlabeled

00:07:04.360 --> 00:07:07.079
raw data, and it is left completely on its own

00:07:07.079 --> 00:07:09.720
to find hidden structures or commonalities. Right,

00:07:09.879 --> 00:07:12.480
no labels to guide it at all. So the algorithm

00:07:12.480 --> 00:07:15.759
relies on mathematical techniques like dimensionality

00:07:15.759 --> 00:07:18.540
reduction. Dimensionality reduction. That sounds

00:07:18.540 --> 00:07:21.519
intense. It's just a way of dealing with massive

00:07:21.519 --> 00:07:24.699
datasets. It works by mapping data points in

00:07:24.699 --> 00:07:26.839
a high-dimensional space to find which variables

00:07:26.839 --> 00:07:29.879
correlate the most. It essentially merges redundant

00:07:29.879 --> 00:07:32.759
features. So if you have a dataset with a thousand

00:07:32.759 --> 00:07:35.720
variables... The machine figures out which handful

00:07:35.720 --> 00:07:38.699
of variables actually drive the variance. It

00:07:38.699 --> 00:07:41.339
strips away the noise so it can process the true

00:07:41.339 --> 00:07:44.160
signal efficiently without needing a human to

00:07:44.160 --> 00:07:46.040
tell it what to look for.
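
NOTE
A minimal sketch of dimensionality reduction, assuming scikit-learn and numpy, on randomly generated toy data: ten observed variables that secretly come from two underlying signals.
# Dimensionality reduction: merge redundant variables into the true drivers.
import numpy as np
from sklearn.decomposition import PCA
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 2))             # the two real underlying signals
X = latent @ rng.normal(size=(2, 10))          # spread across 10 observed columns
X += rng.normal(scale=0.05, size=X.shape)      # plus a little noise
pca = PCA(n_components=2).fit(X)
print(pca.explained_variance_ratio_.sum())     # near 1.0: two components carry the signal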

00:07:46.040 --> 00:07:48.740
Okay, but here's where it gets really interesting. Because if I dump

00:07:48.740 --> 00:07:51.199
a million unlabeled data points into an algorithm,

00:07:51.560 --> 00:07:53.560
and it groups them based on its own math, how

00:07:53.560 --> 00:07:55.740
does the machine know the pattern it found is

00:07:55.740 --> 00:07:58.000
actually useful? What do you mean? Well, what

00:07:58.000 --> 00:07:59.860
if it just groups all the photos in a database

00:07:59.860 --> 00:08:02.819
that happen to have exactly 42 red pixels in

00:08:02.819 --> 00:08:05.560
the top left corner? I mean, that is a mathematically

00:08:05.560 --> 00:08:08.319
valid pattern, but it's a completely random,

00:08:08.860 --> 00:08:11.269
meaningless coincidence in the real world. That

00:08:11.269 --> 00:08:14.430
is a very real, very crucial tension in the field,

00:08:14.649 --> 00:08:16.810
and it's exactly why we have to distinguish between

00:08:16.810 --> 00:08:19.069
data mining and machine learning. OK, what's

00:08:19.069 --> 00:08:21.769
the difference? Data mining is exploratory. It's

00:08:21.769 --> 00:08:24.290
looking for previously unknown properties or,

00:08:24.290 --> 00:08:27.029
as you said, random coincidences in a static

00:08:27.029 --> 00:08:30.550
database. Machine learning, however, is intensely

00:08:30.550 --> 00:08:33.009
focused on prediction. It has to actually work

00:08:33.009 --> 00:08:36.190
going forward. Yes. To ensure a pattern isn't

00:08:36.190 --> 00:08:38.929
just a meaningless coincidence like your 42 red

00:08:38.929 --> 00:08:41.929
pixels, we test the model through a concept called

00:08:41.929 --> 00:08:44.919
generalization. Generalization, so testing it

00:08:44.919 --> 00:08:48.000
against reality. Exactly. The true test of an

00:08:48.000 --> 00:08:50.320
algorithm isn't how well it memorizes its training

00:08:50.320 --> 00:08:53.179
data. I mean, memorization is computationally

00:08:53.179 --> 00:08:56.080
easy. Generalization is the machine's ability

00:08:56.080 --> 00:08:58.759
to apply the patterns it found to completely

00:08:58.759 --> 00:09:01.740
new, unseen examples. Oh, I see. So if its weird

00:09:01.740 --> 00:09:04.299
red pixel rule fails to predict anything useful

00:09:04.299 --> 00:09:06.639
when exposed to new photos. The model's internal

00:09:06.639 --> 00:09:09.440
math is penalized, and it adjusts to discard

00:09:09.440 --> 00:09:11.820
that useless rule. It has to survive outside

00:09:11.820 --> 00:09:14.360
the classroom.
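
NOTE
A minimal sketch of that outside-the-classroom test, assuming scikit-learn and its small bundled digits dataset: hold some examples out, train on the rest, and grade on what the model never saw.
# Generalization check: score the model on data it never trained on.
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
X, y = load_digits(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
model = LogisticRegression(max_iter=5000).fit(X_tr, y_tr)
print("training score:", model.score(X_tr, y_tr))  # memorization: the easy part
print("unseen score:", model.score(X_te, y_te))    # generalization: the real test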

00:09:14.360 --> 00:09:17.179
Which leads us organically to the third paradigm, reinforcement learning. Right.

00:09:17.419 --> 00:09:19.480
This is where the program interacts directly

00:09:19.480 --> 00:09:21.960
with a dynamic environment to achieve a goal.

00:09:22.480 --> 00:09:24.919
So it's not given a static data set to analyze

00:09:24.919 --> 00:09:27.659
at all? No. It's given a world state and a set

00:09:27.659 --> 00:09:31.139
of possible actions. As it takes actions, it

00:09:31.139 --> 00:09:33.259
receives feedback from the environment in the

00:09:33.259 --> 00:09:36.539
form of numerical rewards or punishments. Kind

00:09:36.539 --> 00:09:38.200
of like the goof button again, but automated.

00:09:38.409 --> 00:09:42.049
Very much so. Its sole objective is to maximize

00:09:42.049 --> 00:09:45.370
that cumulative reward over time. This is how

00:09:45.370 --> 00:09:48.029
DeepMind's AlphaGo system famously mastered the

00:09:48.029 --> 00:09:51.669
board game Go back in 2016. It didn't just study

00:09:51.669 --> 00:09:53.950
historical human games, right? Right. It played

00:09:53.950 --> 00:09:56.210
millions of matches against itself, constantly

00:09:56.210 --> 00:09:58.490
tweaking its mathematical strategy to maximize

00:09:58.490 --> 00:10:01.029
the reward function of winning.
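
NOTE
A minimal sketch of reward maximization, with an invented one-state environment and two actions; the agent never sees the payoff rule, only the numeric rewards it collects.
# Reward-driven learning: act, observe the payoff, update the estimate.
import random
q = {"left": 0.0, "right": 0.0}            # value estimate per action
def env_reward(action):                    # hidden payoff the agent must discover
    return 1.0 if action == "right" else 0.1
for _ in range(500):
    explore = random.random() < 0.1        # occasionally try something new
    action = random.choice(list(q)) if explore else max(q, key=q.get)
    q[action] += 0.1 * (env_reward(action) - q[action])  # nudge toward payoff
print(q)                                   # "right" ends up valued near 1.0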

00:10:01.029 --> 00:10:03.960
Okay, so once the machine extracts these generalized patterns

00:10:03.960 --> 00:10:06.159
from the environment, whether that's supervised,

00:10:06.480 --> 00:10:09.700
unsupervised, or reinforcement learning, how

00:10:09.700 --> 00:10:12.000
is that knowledge physically structured inside

00:10:12.000 --> 00:10:13.980
the model? That's the big question. Because if

00:10:13.980 --> 00:10:16.600
it isn't a list of if-then rules like old programming,

00:10:17.120 --> 00:10:18.860
what does the resulting model actually look like

00:10:18.860 --> 00:10:21.399
under the hood? Well, the dominant architecture

00:10:21.399 --> 00:10:24.320
driving today's advancements is the artificial

00:10:24.320 --> 00:10:27.460
neural network. Right. These are loosely modeled

00:10:27.460 --> 00:10:29.940
on the biological structure of animal brains.

00:10:30.240 --> 00:10:32.440
So instead of traditional lines of sequential

00:10:32.440 --> 00:10:36.500
code, you have a vast web of nodes, which act

00:10:36.500 --> 00:10:39.259
as artificial neurons arranged in multiple layers.

00:10:39.379 --> 00:10:42.039
And these nodes are connected by edges. Yes,

00:10:42.139 --> 00:10:45.139
edges, which function like biological synapses.

00:10:45.340 --> 00:10:47.539
And the magic really happens in those connections,

00:10:47.620 --> 00:10:49.639
doesn't it? Every single edge has a specific

00:10:49.639 --> 00:10:52.360
mathematical weight assigned to it. Exactly. As

00:10:52.360 --> 00:10:54.679
the machine learns, it uses a process called

00:10:54.679 --> 00:10:57.399
backpropagation. Backpropagation. Yeah. So when

00:10:57.399 --> 00:10:59.679
the network makes an error, the mathematical

00:10:59.679 --> 00:11:01.500
difference between its guess and the correct

00:11:01.500 --> 00:11:04.940
answer is calculated. That error signal is then

00:11:04.940 --> 00:11:07.220
propagated backward through the entire network.

00:11:07.399 --> 00:11:10.419
Like a ripple effect. Exactly. The system automatically

00:11:10.419 --> 00:11:12.259
adjusts the weights of all those connections,

00:11:12.759 --> 00:11:15.679
strengthening some and weakening others, to minimize

00:11:15.679 --> 00:11:18.360
the error next time. So the knowledge of the

00:11:18.360 --> 00:11:20.759
system doesn't live in any single line of code.

00:11:21.100 --> 00:11:23.879
It is distributed across millions or billions

00:11:23.879 --> 00:11:26.500
of these highly specific weighted probabilities.
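
NOTE
A minimal sketch of backpropagation, numpy only, on the classic XOR toy problem; real networks apply this same backward weight-nudging across billions of edges.
# A tiny 2-8-1 network learning XOR by backpropagation.
import numpy as np
rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)
W1, b1 = rng.normal(size=(2, 8)), np.zeros(8)
W2, b2 = rng.normal(size=(8, 1)), np.zeros(1)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
for _ in range(5000):
    h = sigmoid(X @ W1 + b1)               # forward pass through the hidden layer
    out = sigmoid(h @ W2 + b2)             # the network's guess
    d_out = (out - y) * out * (1 - out)    # error signal at the output...
    d_h = d_out @ W2.T * h * (1 - h)       # ...propagated backward, layer by layer
    W2 -= 0.5 * h.T @ d_out                # every weight gets nudged
    b2 -= 0.5 * d_out.sum(axis=0)
    W1 -= 0.5 * X.T @ d_h
    b1 -= 0.5 * d_h.sum(axis=0)
print(out.round(2))                        # typically close to [[0],[1],[1],[0]]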

00:11:26.940 --> 00:11:29.639
Wow. And our sources draw a parallel here that

00:11:29.639 --> 00:11:31.700
fundamentally changes how you look at artificial

00:11:31.700 --> 00:11:34.240
intelligence. It states that machine learning

00:11:34.240 --> 00:11:37.039
is deeply, intimately tied to data compression.

00:11:37.440 --> 00:11:41.159
The AIXI theory. Yes. This theoretical framework

00:11:41.159 --> 00:11:43.299
suggests that the absolute best compression of

00:11:43.299 --> 00:11:46.419
a data file is simply the smallest possible software

00:11:46.419 --> 00:11:48.419
program that can mathematically generate that

00:11:48.419 --> 00:11:51.820
file. It is a brilliant conceptual lens. Think

00:11:51.820 --> 00:11:54.440
of a massive large language model developed by

00:11:54.440 --> 00:11:57.360
DeepMind called Chinchilla 70B. What did Chinchilla

00:11:57.360 --> 00:12:00.320
do? Well, in research tests, this AI language

00:12:00.320 --> 00:12:03.519
model actually outperformed traditional dedicated

00:12:03.519 --> 00:12:06.039
lossless compression tools. Wait, a language

00:12:06.039 --> 00:12:09.159
model beat compression software? Yes. When fed

00:12:09.159 --> 00:12:12.799
raw image data, it compressed it down to 43.4%

00:12:12.799 --> 00:12:15.639
of its original size, beating the standard

00:12:15.639 --> 00:12:18.600
PNG format. That's insane. And for audio, it

00:12:18.600 --> 00:12:21.899
compressed data down to 16.4%, massively outperforming

00:12:21.899 --> 00:12:25.220
the FLAC format. So if I am grasping the magnitude

00:12:25.220 --> 00:12:28.460
of this, learning at its core is just the ultimate

00:12:28.460 --> 00:12:30.740
form of data compression. It really is. We tend

00:12:30.740 --> 00:12:33.460
to anthropomorphize AI. You know, we imagine

00:12:33.460 --> 00:12:35.879
this expansive, almost conscious brain growing

00:12:35.879 --> 00:12:39.200
inside a server rack, but maybe it's just squishing

00:12:39.200 --> 00:12:42.480
the absolute chaos of our reality into a highly

00:12:42.480 --> 00:12:45.820
efficient, mathematically predictable ZIP file.

00:12:45.919 --> 00:12:48.340
That's exactly what it's doing. It finds the

00:12:48.340 --> 00:12:51.440
shortest statistical path to represent a staggeringly

00:12:51.440 --> 00:12:53.620
complex world. But if we connect this to the

00:12:53.620 --> 00:12:57.100
bigger picture, that relentless drive to perfectly

00:12:57.100 --> 00:13:00.379
compress reality is exactly what causes one of

00:13:00.379 --> 00:13:02.600
the greatest technical failures in machine learning,

00:13:03.059 --> 00:13:05.940
which is overfitting. Yes. Overfitting is when

00:13:05.940 --> 00:13:08.740
the compression goes too far. How so? Imagine

00:13:08.740 --> 00:13:11.340
a machine trying to draw a predictive line through

00:13:11.340 --> 00:13:14.320
a graph of scattered data points. If the model

00:13:14.320 --> 00:13:16.600
is too complex, it essentially gerrymanders its

00:13:16.600 --> 00:13:18.759
own mathematical theory. It twists itself into

00:13:18.759 --> 00:13:21.259
knots. Right. It draws this wildly complicated

00:13:21.259 --> 00:13:24.220
chaotic line that perfectly touches every single

00:13:24.220 --> 00:13:26.899
outlier and anomaly in the training data. It

00:13:26.899 --> 00:13:29.059
achieves perfect compression of the past. But

00:13:29.059 --> 00:13:32.230
the past has noise in it. Exactly. Because it

00:13:32.230 --> 00:13:35.230
has molded itself so rigidly to the specific

00:13:35.230 --> 00:13:38.389
noise of that data set, it utterly fails when

00:13:38.389 --> 00:13:41.929
deployed against new messy data. It has high

00:13:41.929 --> 00:13:45.009
variance, but zero predictive value. It's like

00:13:45.009 --> 00:13:48.909
a student who memorizes the exact answers to

00:13:48.909 --> 00:13:51.330
a practice test, but because they didn't actually

00:13:51.330 --> 00:13:53.690
learn the underlying concepts, they completely

00:13:53.690 --> 00:13:56.690
bomb the real exam. That is the perfect analogy.
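
NOTE
A minimal sketch of that student, numpy only, with invented noisy data: a degree-9 polynomial memorizes ten training points, while a modest degree-3 curve generalizes better.
# Overfitting: a flexible model nails noisy training points, flunks fresh data.
import numpy as np
rng = np.random.default_rng(1)
x_train = np.linspace(0, 1, 10)
y_train = np.sin(2 * np.pi * x_train) + rng.normal(scale=0.2, size=10)
x_new = np.linspace(0.02, 0.98, 100)               # the "real exam"
y_new = np.sin(2 * np.pi * x_new)
for degree in (3, 9):                              # modest curve vs. contortionist
    coeffs = np.polyfit(x_train, y_train, degree)
    train_mse = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    new_mse = np.mean((np.polyval(coeffs, x_new) - y_new) ** 2)
    print(degree, round(train_mse, 4), round(new_mse, 4))
# Degree 9 scores near zero on its own training data, typically far worse on new points.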

00:13:57.029 --> 00:13:59.149
The model didn't learn the physics of the data.

00:13:59.370 --> 00:14:01.649
It just memorized the noise. And when we take

00:14:01.649 --> 00:14:03.789
these highly compressed pattern matchers, especially

00:14:03.789 --> 00:14:06.450
the overfitted ones, and deploy them into the

00:14:06.450 --> 00:14:08.950
unpredictable human environment, the collisions

00:14:08.950 --> 00:14:11.549
can be spectacular. Spectacular and sometimes

00:14:11.549 --> 00:14:14.720
very dangerous. Yeah. The sources highlight some

00:14:14.720 --> 00:14:17.580
massive high-profile failures that really pull

00:14:17.580 --> 00:14:19.720
back the curtain on the technology's limitations.

00:14:20.059 --> 00:14:22.279
Like, take the Netflix prize. Oh, that's a classic

00:14:22.279 --> 00:14:25.679
case study. Right. In 2006, Netflix offered a

00:14:25.679 --> 00:14:28.340
$1 million prize to anyone who could improve

00:14:28.340 --> 00:14:31.279
their Cinematch recommendation algorithm by 10%.

00:14:31.279 --> 00:14:33.940
A million dollars just for a 10% bump. Yeah.

00:14:34.299 --> 00:14:37.539
And a team finally won it in 2009, using this

00:14:37.539 --> 00:14:40.879
complex ensemble model. But Netflix quickly realized

00:14:40.879 --> 00:14:43.100
the model was completely useless for their actual

00:14:43.100 --> 00:14:45.399
business. Because the algorithm was mathematically

00:14:45.399 --> 00:14:49.360
sound, but the proxy data it trained on was fundamentally

00:14:49.360 --> 00:14:52.480
flawed. The human star ratings. Right. Human

00:14:52.480 --> 00:14:54.679
star ratings turned out to be terrible indicators

00:14:54.679 --> 00:14:58.200
of actual viewing behavior. Users would rate

00:14:58.200 --> 00:15:01.299
acclaimed documentaries five stars because, you

00:15:01.299 --> 00:15:02.919
know, they wanted to project an intellectual

00:15:02.919 --> 00:15:06.039
persona. But their actual behavior showed they

00:15:06.039 --> 00:15:08.679
spent their weekends binge watching reality television.

00:15:08.649 --> 00:15:12.549
Exactly. The machine optimized for the lie, not

00:15:12.549 --> 00:15:15.370
the reality. But the stakes are often much higher

00:15:15.370 --> 00:15:18.110
than movie recommendations, unfortunately. The

00:15:18.110 --> 00:15:20.409
source notes the 2018 incident where an Uber

00:15:20.409 --> 00:15:22.730
self-driving car failed to properly classify

00:15:22.730 --> 00:15:25.389
a pedestrian with a bicycle, which resulted in

00:15:25.389 --> 00:15:27.730
a fatal collision. A tragedy that shows how brittle

00:15:27.730 --> 00:15:31.129
these systems can be. It also details IBM Watson's

00:15:31.129 --> 00:15:34.110
failure to deliver reliable healthcare diagnostics

00:15:34.110 --> 00:15:37.370
despite billions of dollars invested. And it

00:15:37.370 --> 00:15:39.690
highlights language models like Microsoft's Bing

00:15:39.690 --> 00:15:43.309
Chat, producing hostile, erratic responses because

00:15:43.309 --> 00:15:45.110
they couldn't mathematically handle the edge

00:15:45.110 --> 00:15:48.549
cases of human conversation. Well, sometimes

00:15:48.549 --> 00:15:51.149
the machine just learns a functionally correct

00:15:51.149 --> 00:15:53.529
but utterly useless lesson. What do you mean

00:15:53.529 --> 00:15:55.970
by that? The source gives a great toy example.

00:15:56.389 --> 00:15:59.250
Imagine an image classifier trained entirely

00:15:59.250 --> 00:16:02.450
on pictures of brown horses and black cats. The

00:16:02.450 --> 00:16:04.970
network might optimize its weights to simply

00:16:04.970 --> 00:16:07.750
conclude that any brown patch of pixels in an

00:16:07.750 --> 00:16:10.070
image is a horse. Oh wow. So it didn't learn

00:16:10.070 --> 00:16:12.629
the shape or the legs or the context? Nope. It

00:16:12.629 --> 00:16:15.049
just found a lazy statistical shortcut. Brown

00:16:15.049 --> 00:16:17.570
equals horse. And because these are non-linear

00:16:17.570 --> 00:16:20.330
systems evaluating pixel relationships that human

00:16:20.330 --> 00:16:23.190
eyes don't even perceive, they have massive adversarial

00:16:23.190 --> 00:16:25.519
vulnerabilities. They really do. A researcher

00:16:25.519 --> 00:16:28.159
can change a single adversarially calculated

00:16:28.159 --> 00:16:30.740
pixel in an image, something completely invisible

00:16:30.740 --> 00:16:34.500
to you and me. Just one pixel. Just one. By adding

00:16:34.500 --> 00:16:37.820
a specific noise vector to the image, that tiny

00:16:37.820 --> 00:16:40.440
mathematical shift cascades through the network's

00:16:40.440 --> 00:16:43.200
weights, completely fooling the machine into

00:16:43.200 --> 00:16:45.980
misclassifying a stop sign as a speed limit sign.
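
NOTE
A minimal FGSM-style sketch on an invented linear classifier, numpy only; real attacks push each pixel against the gradient of a deep network's score in exactly this way, just at scale.
# Adversarial nudge: a tiny per-feature step flips the classifier's decision.
import numpy as np
w = np.array([1.0, -2.0, 0.5])          # "trained" weights of the toy classifier
x = np.array([0.4, 0.1, 0.9])           # an input it classifies correctly
print("clean score:", w @ x)            # positive score = e.g. "stop sign"
eps = 0.2                               # per-feature budget, invisible to a human
x_adv = x - eps * np.sign(w)            # step each feature against the score gradient
print("attacked score:", w @ x_adv)     # the sign flips: same-looking input, new label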

00:16:46.220 --> 00:16:49.440
That is terrifying. But wait, if we know a machine

00:16:49.440 --> 00:16:52.100
learned the brown horse shortcut, or if we know

00:16:52.100 --> 00:16:54.620
it's being fooled by specific adversarial noise,

00:16:55.039 --> 00:16:57.940
how do developers fix it? Can't a software engineer

00:16:57.940 --> 00:17:00.100
just open up the code, find the line that says

00:17:00.100 --> 00:17:02.340
brown equals horse, and just delete it? And here

00:17:02.340 --> 00:17:05.000
we hit the terrifying core of modern artificial

00:17:05.000 --> 00:17:08.049
intelligence. The black box theory. The black

00:17:08.049 --> 00:17:10.390
box. The answer is no, they cannot just go in

00:17:10.390 --> 00:17:12.250
and tweak the rule. Because there are no explicit

00:17:12.250 --> 00:17:14.410
rules. As we discussed with neural networks,

00:17:14.569 --> 00:17:17.250
the logic is distributed across billions of minute

00:17:17.250 --> 00:17:19.089
weight adjustments. It's all just math. It is

00:17:19.089 --> 00:17:21.829
a vast matrix of floating-point numbers. It

00:17:21.829 --> 00:17:24.309
is so mathematically opaque that even the original

00:17:24.309 --> 00:17:26.849
coders who architected the system cannot audit

00:17:26.849 --> 00:17:29.410
the specific pattern the machine extracted. Wait,

00:17:29.410 --> 00:17:31.809
really? The people who built the system cannot

00:17:31.809 --> 00:17:34.509
trace its logic. They know the initial architecture

00:17:34.509 --> 00:17:37.769
and they know the data they fed it, but the specific

00:17:37.769 --> 00:17:41.490
pathway of logic for any given decision is an

00:17:41.490 --> 00:17:44.789
impenetrable black box. That is wild. And this

00:17:44.789 --> 00:17:46.910
opaqueness is driving a critical subfield right

00:17:46.910 --> 00:17:50.730
now called explainable AI or XAI. Because we

00:17:50.730 --> 00:17:53.150
urgently need systems where humans can actually

00:17:53.150 --> 00:17:55.690
decipher the prediction. Exactly. Especially

00:17:55.690 --> 00:17:58.990
in high stakes areas like medicine. Imagine a

00:17:58.990 --> 00:18:01.950
proprietary algorithm recommending unnecessary

00:18:02.220 --> 00:18:04.900
expensive medical tests, because the company

00:18:04.900 --> 00:18:07.960
that owns it trained the model on data that subtly

00:18:07.960 --> 00:18:10.880
rewards financial incentives. If the system is

00:18:10.880 --> 00:18:13.720
a black box, it's nearly impossible for regulators

00:18:13.720 --> 00:18:16.240
to prove that mathematical intent. Exactly. So

00:18:16.240 --> 00:18:19.000
if we can't audit the black box's logic, we are

00:18:19.000 --> 00:18:20.900
entirely at the mercy of the data we fed it.

00:18:21.150 --> 00:18:23.490
And historically, the data we've fed is incredibly

00:18:23.490 --> 00:18:26.630
flawed. Machine learning essentially digitizes

00:18:26.630 --> 00:18:29.490
and scales historical human behavior. It is a

00:18:29.490 --> 00:18:32.250
pure mathematical reflection. And a stark example

00:18:32.250 --> 00:18:35.529
of this occurred in 1988 at the UK's St. George's

00:18:35.529 --> 00:18:37.750
Medical School. What happened there? They implemented

00:18:37.750 --> 00:18:40.950
a computer program to screen initial applicants.

00:18:41.250 --> 00:18:43.549
And the program ended up automatically denying

00:18:43.549 --> 00:18:46.509
nearly 60 qualified candidates simply because

00:18:46.509 --> 00:18:49.029
they were women or because they had non-European

00:18:49.029 --> 00:18:51.319
sounding names. Wow. But the algorithm wasn't

00:18:51.319 --> 00:18:53.519
coded to be prejudiced. Yeah, it wasn't coded

00:18:53.519 --> 00:18:56.339
with malicious intent. It was trained on the

00:18:56.339 --> 00:18:58.960
historical behavior of the human admissions staff

00:18:58.960 --> 00:19:01.619
from previous years. It found the statistical

00:19:01.619 --> 00:19:04.160
pattern in the human decisions and ruthlessly

00:19:04.160 --> 00:19:06.900
codified it into a mathematical rule. We see

00:19:06.900 --> 00:19:09.319
this mechanical echo repeatedly. The investigative

00:19:09.319 --> 00:19:12.200
organization ProPublica found that risk assessment

00:19:12.200 --> 00:19:14.539
algorithms used in the criminal justice system

00:19:14.539 --> 00:19:17.519
falsely flagged black defendants as high risk

00:19:17.519 --> 00:19:20.549
for recidivism twice as often as white defendants,

00:19:20.950 --> 00:19:23.009
again based on historical arrest data. It just

00:19:23.009 --> 00:19:25.849
absorbs the history. In 2015, Google Photos'

00:19:26.049 --> 00:19:29.150
classifier tagged humans as gorillas due to massive

00:19:29.150 --> 00:19:31.210
gaps in its training data. And language models

00:19:31.210 --> 00:19:33.430
inherently absorb these flaws, too, because human

00:19:33.430 --> 00:19:35.670
language corpora contain them. When Microsoft

00:19:35.670 --> 00:19:38.569
released its Tay chatbot on Twitter, it adopted

00:19:38.569 --> 00:19:41.029
inflammatory racist language almost immediately.

00:19:41.250 --> 00:19:43.309
Because it was learning from Twitter users. It

00:19:43.309 --> 00:19:47.259
simply ran the statistics on the text the crowd

00:19:47.259 --> 00:19:50.180
was feeding it, optimizing for engagement based

00:19:50.180 --> 00:19:53.559
on what it saw in real time. So the tech industry

00:19:53.559 --> 00:19:55.980
is currently attempting to address some of the

00:19:55.980 --> 00:19:58.380
data flow issues, particularly around privacy,

00:19:58.880 --> 00:20:01.660
by moving toward what is called embedded machine

00:20:01.660 --> 00:20:04.059
learning. Right, moving to the edge. Yeah. Instead

00:20:04.059 --> 00:20:06.420
of sending all your personal biometric or text

00:20:06.420 --> 00:20:09.440
data to a giant server in the cloud to be processed,

00:20:10.019 --> 00:20:12.259
models are being deployed directly onto edge

00:20:12.259 --> 00:20:15.000
devices, like your smartphone, your smartwatch,

00:20:15.339 --> 00:20:18.109
or local industrial sensors. But this requires

00:20:18.109 --> 00:20:20.890
specialized hardware. I mean, a standard CPU

00:20:20.890 --> 00:20:23.109
isn't efficient enough for the massive matrix

00:20:23.109 --> 00:20:25.490
multiplication required by neural networks. So

00:20:25.490 --> 00:20:27.809
what do they use? Devices are now equipped with

00:20:27.809 --> 00:20:31.250
tensor processing units, or TPUs, which are microchips

00:20:31.250 --> 00:20:33.869
designed specifically to run complex AI math

00:20:33.869 --> 00:20:36.430
locally, right there in your hand, without needing

00:20:36.430 --> 00:20:38.470
an internet connection. So what does this all

00:20:38.470 --> 00:20:41.460
mean? The push to the edge solves the latency

00:20:41.460 --> 00:20:44.380
issue, and it solves the massive privacy issue

00:20:44.380 --> 00:20:47.700
of broadcasting your life to a server farm. But

00:20:47.700 --> 00:20:50.099
if we step back and look at the fundamental mechanics,

00:20:50.500 --> 00:20:53.319
does it actually solve the bias? That's the million

00:20:53.319 --> 00:20:56.200
dollar question. Right. If an algorithm requires

00:20:56.200 --> 00:20:59.000
vast amounts of human data to learn, and that

00:20:59.000 --> 00:21:01.640
data is inherently skewed by history, aren't

00:21:01.640 --> 00:21:05.200
we just building incredibly fast, highly efficient

00:21:05.200 --> 00:21:08.099
digital mirrors of our own flaws? This raises

00:21:08.099 --> 00:21:09.880
an important question about the trade -offs of

00:21:09.880 --> 00:21:13.059
localized AI. A compounding factor in the data

00:21:13.059 --> 00:21:15.920
pipeline is the lack of diversity among the engineers

00:21:15.920 --> 00:21:18.420
curating it. Who is actually building these systems?

00:21:18.839 --> 00:21:22.240
Well, globally, only about 16.1% of faculty

00:21:22.240 --> 00:21:24.880
members focusing on AI are female, and the broader

00:21:24.880 --> 00:21:27.559
demographics of AI PhD graduates are similarly

00:21:27.559 --> 00:21:30.180
skewed. So the people defining the reward functions

00:21:30.180 --> 00:21:32.700
and selecting the training sets naturally represent

00:21:32.700 --> 00:21:36.000
a very narrow slice of humanity. Exactly. The

00:21:36.000 --> 00:21:38.539
hardware is getting exponentially better, but

00:21:38.539 --> 00:21:40.980
the baseline data remains a structural issue.

00:21:41.319 --> 00:21:45.279
The AI scientist Fei-Fei Li summarized it perfectly

00:21:45.279 --> 00:21:47.680
when she noted, there's nothing artificial about

00:21:47.680 --> 00:21:50.319
AI. It's inspired by people, it's created by

00:21:50.319 --> 00:21:53.019
people, and it impacts people. That is a powerful

00:21:53.019 --> 00:21:55.920
quote. It is. Moving to embedded machine learning

00:21:55.920 --> 00:21:58.519
secures your data privacy because your personal

00:21:58.519 --> 00:22:01.680
data never leaves your device. But it also means

00:22:01.680 --> 00:22:04.220
that the mathematically opaque, biased black

00:22:04.220 --> 00:22:06.819
box we just discussed isn't living in a remote

00:22:06.819 --> 00:22:09.240
server anymore. It's living locally. It is now

00:22:09.240 --> 00:22:11.759
living right in your pocket. Wow, we have

00:22:11.759 --> 00:22:13.500
covered a massive amount of technical ground

00:22:13.500 --> 00:22:16.299
today. We traced the history from Arthur Samuel's

00:22:16.299 --> 00:22:19.240
checkers to Cybertron's goof button. We mapped

00:22:19.240 --> 00:22:22.559
the three learning paradigms, unpacked how backpropagation

00:22:22.559 --> 00:22:24.680
adjusts the microscopic weights of neural networks,

00:22:25.039 --> 00:22:27.559
and looked at how AI is essentially the ultimate

00:22:27.559 --> 00:22:30.079
zip file of reality. That's a lot to take in.

00:22:30.259 --> 00:22:32.019
And finally, we confronted what happens when

00:22:32.019 --> 00:22:34.619
those unauditable, overfitted models collide

00:22:34.619 --> 00:22:37.109
with our inherently flawed historical data. It

00:22:37.109 --> 00:22:39.609
truly is a profound shift in how humans interact

00:22:39.609 --> 00:22:43.049
with computation. We are moving from tools we

00:22:43.049 --> 00:22:46.130
explicitly instruct to tools we merely influence.

00:22:46.430 --> 00:22:48.890
I want to leave you, our listener, with a final

00:22:48.890 --> 00:22:51.690
scenario to mull over, building on the mechanics

00:22:51.690 --> 00:22:53.990
of embedded machine learning and the black box.

00:22:54.650 --> 00:22:57.369
Imagine a near future where an embedded machine

00:22:57.369 --> 00:23:00.230
learning model is running entirely locally on

00:23:00.230 --> 00:23:02.990
a life-critical wearable inside your body. Let's

00:23:02.990 --> 00:23:05.960
say a smart pacemaker. Okay. It uses unsupervised

00:23:05.960 --> 00:23:08.539
learning to continuously map your unique biological

00:23:08.539 --> 00:23:10.460
patterns. It monitors the minute variance in

00:23:10.460 --> 00:23:12.299
your heart rate, your stress biomarkers, your

00:23:12.299 --> 00:23:15.259
sleep cycles. Then one day it makes a split-second,

00:23:15.259 --> 00:23:17.700
life-altering decision about how to regulate

00:23:17.700 --> 00:23:19.960
your heart during a severe medical crisis. A

00:23:19.960 --> 00:23:22.960
completely autonomous decision. Right. But because

00:23:22.960 --> 00:23:25.099
the neural network's logic is distributed across

00:23:25.099 --> 00:23:27.500
millions of unauditable mathematical weights,

00:23:28.119 --> 00:23:30.160
neither your doctors nor the original hardware

00:23:30.160 --> 00:23:33.039
developers can explain exactly why it made that

00:23:33.039 --> 00:23:36.380
specific choice. It saved you, or maybe it didn't,

00:23:36.480 --> 00:23:38.599
but no human can read the mathematical path it

00:23:38.599 --> 00:23:42.000
took. It is a very unsettling reality to consider.

00:23:42.500 --> 00:23:45.000
Who is ultimately responsible for the machine

00:23:45.000 --> 00:23:48.640
when it has become an unauditable black box

00:23:48.640 --> 00:23:51.700
of you? Exactly. Pay attention to the tools you

00:23:51.700 --> 00:23:54.019
use today, the recommendations you get, the way

00:23:54.019 --> 00:23:56.660
your phone auto-completes your thoughts, the

00:23:56.660 --> 00:23:58.960
subtle nudges from your apps, how much of your

00:23:58.960 --> 00:24:01.440
own daily reality is already being shaped by

00:24:01.440 --> 00:24:03.819
these invisible, mathematically opaque patterns.

00:24:04.140 --> 00:24:06.460
It's something we all need to be aware of. We

00:24:06.460 --> 00:24:08.099
started by talking about the comfort of holding

00:24:08.099 --> 00:24:10.319
a hammer, seeing exactly how the physics of the

00:24:10.319 --> 00:24:12.440
tool work. We are now living in a world where

00:24:12.440 --> 00:24:14.759
the hammer decides which nail to hit, and the

00:24:14.759 --> 00:24:17.519
math won't tell us why. Thanks for joining us

00:24:17.519 --> 00:24:19.000
on this deep dive. Stay curious.
