WEBVTT

00:00:00.000 --> 00:00:03.120
Imagine this. It's a random Tuesday night, right?

00:00:03.940 --> 00:00:08.019
And you are browsing online and you buy a very

00:00:08.019 --> 00:00:12.359
specific, weirdly obscure item. Right, like something

00:00:12.359 --> 00:00:14.599
completely out of the blue. Exactly. Let's say

00:00:14.599 --> 00:00:18.480
it's a specialized carbon steel pan for making

00:00:18.480 --> 00:00:20.879
Japanese rolled omelets. You know, you've never

00:00:20.879 --> 00:00:22.660
cooked Japanese food in your life, but you watched

00:00:22.660 --> 00:00:25.280
a video and, well, you bought the pan. As one

00:00:25.280 --> 00:00:27.789
does at two in the morning. Yeah. Exactly. But

00:00:27.789 --> 00:00:30.230
then by Wednesday morning, your entire digital

00:00:30.230 --> 00:00:33.350
world has completely morphed. I mean, every streaming

00:00:33.350 --> 00:00:36.250
platform you open is suddenly suggesting documentaries

00:00:36.250 --> 00:00:38.829
about Tokyo sushi masters. Oh, and your bookstore

00:00:38.829 --> 00:00:41.990
app is probably pushing, like, survivalist

00:00:41.990 --> 00:00:46.130
off-the-grid cooking guides, too. Yes, it is so frighteningly

00:00:46.130 --> 00:00:48.649
accurate and so perfectly tailored to this sudden,

00:00:49.049 --> 00:00:50.649
weirdly specific taste that you actually stare

00:00:50.649 --> 00:00:52.869
at your phone and wonder if it's listening to

00:00:52.869 --> 00:00:54.710
your conversations. It definitely feels like

00:00:54.710 --> 00:00:57.090
magic. It really does. As it turns out from

00:00:57.090 --> 00:00:59.049
our source material today, it's not magic at

00:00:59.049 --> 00:01:02.810
all. It is math. Very fast, heavily layered math.

00:01:03.250 --> 00:01:06.250
It is entirely math. And for anyone who has ever

00:01:06.250 --> 00:01:08.489
wondered how these platforms seem to just, you

00:01:08.489 --> 00:01:10.530
know, know what you want before you even realize

00:01:10.530 --> 00:01:13.810
you want it, that mathematical engine is exactly

00:01:13.810 --> 00:01:16.109
what we are looking at today. Right. It's the

00:01:16.109 --> 00:01:19.890
invisible framework powering recommender systems

00:01:19.890 --> 00:01:23.049
across the entire web. It dictates almost everything

00:01:23.049 --> 00:01:25.760
you see and consume online. OK, let's unpack

00:01:25.760 --> 00:01:29.359
this. Today's deep dive is all about the architecture

00:01:29.359 --> 00:01:32.420
of something called collaborative filtering.

00:01:32.780 --> 00:01:35.659
Yes, collaborative filtering. We've got a massive

00:01:35.659 --> 00:01:38.500
stack of source material for you today, ranging

00:01:38.500 --> 00:01:41.560
from academic papers to deep engineering breakdowns.

00:01:41.879 --> 00:01:45.299
And our mission here is to basically pull back

00:01:45.299 --> 00:01:47.489
the curtain on this algorithm for you. We want

00:01:47.489 --> 00:01:50.170
to show you exactly how it builds this invisible

00:01:50.170 --> 00:01:53.329
profile of your desires and ultimately how it

00:01:53.329 --> 00:01:56.530
predicts your next move. Right. So to really

00:01:56.530 --> 00:01:59.530
grasp how this works, where do we start? Well,

00:01:59.609 --> 00:02:01.250
we have to start with the foundational assumption

00:02:01.250 --> 00:02:04.150
of collaborative filtering. At its core, it quantifies

00:02:04.150 --> 00:02:07.590
a very simple human truth. It's that if... person

00:02:07.590 --> 00:02:11.509
A and person B agree on one issue, they are statistically

00:02:11.509 --> 00:02:15.050
far more likely to agree on other issues compared

00:02:15.050 --> 00:02:18.550
to just matching person A with a totally random

00:02:18.550 --> 00:02:20.509
stranger. OK, that makes sense. Yeah. I mean,

00:02:20.509 --> 00:02:22.830
it's essentially the mathematics of shared taste.

00:02:23.189 --> 00:02:24.650
Right. Well, we should probably draw a hard line

00:02:24.650 --> 00:02:26.509
in the sand right now for you listening. Yeah.

00:02:26.509 --> 00:02:28.289
Because this is fundamentally different from

00:02:28.289 --> 00:02:31.150
the 1990s internet. Oh, absolutely. Night and

00:02:31.150 --> 00:02:33.870
day. We aren't talking about a basic average

00:02:33.870 --> 00:02:36.509
rating system here. Like we aren't talking about

00:02:36.460 --> 00:02:38.699
a five-star aggregate where a movie gets a

00:02:38.699 --> 00:02:42.099
4.8 out of 5, so the front page just blasts it

00:02:42.099 --> 00:02:44.580
to millions of people. No, not at all. An average

00:02:44.580 --> 00:02:47.360
rating system is completely nonspecific. It completely

00:02:47.360 --> 00:02:49.259
ignores you as an individual. It's just saying,

00:02:49.360 --> 00:02:52.199
hey, the crowd likes this. Exactly. It merely

00:02:52.199 --> 00:02:55.319
declares the crowd's preference. Collaborative

00:02:55.319 --> 00:02:57.979
filtering, on the other hand, is entirely about

00:02:57.979 --> 00:03:00.340
the customized experience. It's making it about

00:03:00.340 --> 00:03:03.680
you. Right. It makes a specific automated prediction

00:03:03.680 --> 00:03:07.360
about your unique interest by harvesting taste

00:03:07.360 --> 00:03:09.979
information collected from many, many other users.

00:03:10.639 --> 00:03:12.759
The system is literally collaborating to filter

00:03:12.759 --> 00:03:15.620
the world for you. So how does it actually do

00:03:15.620 --> 00:03:18.000
that? Walk me through the actual mechanics of

00:03:18.000 --> 00:03:21.020
matching me to an item, because I'm just imagining

00:03:21.020 --> 00:03:23.849
a giant terrifying spreadsheet somewhere on

00:03:23.849 --> 00:03:26.150
a server. Honestly, that's actually the perfect

00:03:26.150 --> 00:03:29.050
way to visualize it. Imagine a massive grid,

00:03:29.370 --> 00:03:32.610
like a giant matrix. Okay. The rows are millions

00:03:32.610 --> 00:03:35.889
of users and the columns are millions of items.

00:03:36.430 --> 00:03:38.669
And the workflow of a standard collaborative

00:03:38.669 --> 00:03:40.930
filtering system really just boils down to two

00:03:40.930 --> 00:03:43.879
steps. First, it looks at your row in the spreadsheet.

00:03:44.099 --> 00:03:46.520
It scans the millions of other rows to find users

00:03:46.520 --> 00:03:48.539
who share your exact rating patterns. It's looking

00:03:48.539 --> 00:03:50.080
for people who are weird in the exact same way

00:03:50.080 --> 00:03:52.539
I'm weird. Basically, yeah. It finds the people

00:03:52.539 --> 00:03:54.939
who have a history of loving and hating the exact

00:03:54.939 --> 00:03:57.740
same obscure things you do. Yeah. And then second,

00:03:58.139 --> 00:04:00.139
it uses the ratings from those like-minded users

00:04:00.139 --> 00:04:02.439
to calculate a prediction for the blank spaces

00:04:02.439 --> 00:04:05.319
in your row. This specific method is known as

00:04:05.319 --> 00:04:08.699
user-based collaborative filtering. User-based.
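That two-step workflow, find the neighbors, then average their ratings, can be sketched in a few lines of Python. This is a toy illustration with invented ratings and a plain cosine similarity, not any platform's production code:

```python
import math

# Toy user-item ratings (all invented): user -> {item: rating on a 1-5 scale}.
ratings = {
    "alice": {"pan": 5, "knife": 4, "wok": 1},
    "bob":   {"pan": 5, "knife": 5, "wok": 1, "cleaver": 4},
    "carol": {"pan": 1, "knife": 2, "wok": 5, "cleaver": 1},
}

def cosine(u, v):
    """Similarity computed over the items both users have rated."""
    shared = set(u) & set(v)
    if not shared:
        return 0.0
    dot = sum(u[i] * v[i] for i in shared)
    nu = math.sqrt(sum(u[i] ** 2 for i in shared))
    nv = math.sqrt(sum(v[i] ** 2 for i in shared))
    return dot / (nu * nv)

def predict(user, item):
    """Step 1: score every other user's similarity to `user`.
    Step 2: similarity-weighted average of those users' ratings for `item`."""
    sims = [(cosine(ratings[user], ratings[v]), v)
            for v in ratings if v != user and item in ratings[v]]
    num = sum(s * ratings[v][item] for s, v in sims if s > 0)
    den = sum(s for s, _ in sims if s > 0)
    return num / den if den else None

# Alice never rated the cleaver; her "taste twins" fill in the blank cell.
print(round(predict("alice", "cleaver"), 2))
```

With these made-up numbers, Alice's empty "cleaver" cell gets filled in at roughly 3 out of 5, pulled mostly from Bob, her closest taste twin.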

00:04:08.780 --> 00:04:10.780
So I'm essentially looking for my taste twins.

00:04:11.060 --> 00:04:13.520
Exactly. Taste twins is a great way to put it.

00:04:13.840 --> 00:04:15.960
It's like asking a trusted friend for a recommendation,

00:04:16.600 --> 00:04:19.879
but just at a massive, massive scale. Like if

00:04:19.879 --> 00:04:23.759
me and my weird digital taste twin both loved

00:04:24.079 --> 00:04:27.939
the same five obscure sci-fi movies and they

00:04:27.939 --> 00:04:30.600
just watched a sixth one and loved it, the math

00:04:30.600 --> 00:04:33.160
assumes there's a really high probability I'm

00:04:33.160 --> 00:04:35.579
going to love it too. That is exactly it. That

00:04:35.579 --> 00:04:38.519
covers the user -based approach. But there is

00:04:38.519 --> 00:04:40.819
a flip side to this coin, which is item-based

00:04:40.819 --> 00:04:43.279
collaborative filtering. Okay. How is that different?

00:04:43.399 --> 00:04:45.740
Well, instead of hunting for relationships between

00:04:45.740 --> 00:04:48.220
users, the system builds a matrix determining

00:04:48.220 --> 00:04:50.399
the relationships between the items themselves.

00:04:50.500 --> 00:04:52.620
Oh, interesting. Yeah, the Slope One family of

00:04:52.620 --> 00:04:55.240
algorithms is a famous academic example of this

00:04:55.240 --> 00:04:58.300
from the source material. It analyzes how items

00:04:58.300 --> 00:05:00.560
relate to each other purely based on how human

00:05:00.560 --> 00:05:02.879
beings interact with them in the wild. Oh, so

00:05:02.879 --> 00:05:05.100
that's the classic people who bought a flashlight

00:05:05.100 --> 00:05:07.720
also bought batteries trick. Yes, exactly that.
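The Slope One idea just mentioned can be sketched compactly: it learns the average rating difference between each pair of items, then shifts your known ratings by those differences. A toy Python sketch with invented data, using the weighted variant (pairs backed by more users count for more):

```python
# Toy ratings (invented): user -> {item: rating}.
ratings = {
    "u1": {"flashlight": 5, "batteries": 4},
    "u2": {"flashlight": 4, "batteries": 4, "tent": 2},
    "u3": {"flashlight": 2, "tent": 3},
}

def deviation(item_j, item_i):
    """Average of (rating_j - rating_i) over users who rated both items,
    plus how many users backed that pair."""
    diffs = [r[item_j] - r[item_i] for r in ratings.values()
             if item_j in r and item_i in r]
    return (sum(diffs) / len(diffs), len(diffs)) if diffs else (0.0, 0)

def predict(user, item_j):
    """Weighted Slope One: shift each of the user's known ratings by the
    learned item-pair deviation, weighted by the pair's support."""
    num = den = 0.0
    for item_i, r_ui in ratings[user].items():
        if item_i == item_j:
            continue
        dev, count = deviation(item_j, item_i)
        if count:
            num += (dev + r_ui) * count
            den += count
    return num / den if den else None

# u1 never rated the tent; item-item relationships fill in the blank.
print(round(predict("u1", "tent"), 2))
```

Note that no user profile matching happens here at all: the prediction comes entirely from how the items relate to each other across everyone's histories.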

00:05:07.870 --> 00:05:09.569
It's not looking for my soulmate. It's looking

00:05:09.569 --> 00:05:12.069
at the historical relationship between the actual

00:05:12.069 --> 00:05:15.850
objects. Precisely. But I'm kind of hung up on

00:05:15.850 --> 00:05:18.089
the data going into this giant grid. Like, is

00:05:18.089 --> 00:05:20.350
it just looking at my star ratings? Because I

00:05:20.350 --> 00:05:22.670
have to be honest, I almost never actually click

00:05:22.670 --> 00:05:25.170
the five stars on a movie or product. I just

00:05:25.170 --> 00:05:27.949
watch it and leave. What's fascinating here is

00:05:27.949 --> 00:05:30.250
that the algorithm doesn't necessarily need you

00:05:30.250 --> 00:05:33.730
to click anything at all. Really? Yeah. The data

00:05:33.730 --> 00:05:37.009
is split into explicit and implicit categories.

00:05:37.470 --> 00:05:40.329
Explicit data is you actively participating,

00:05:40.689 --> 00:05:42.930
you know, rating a movie, giving a thumbs up.

00:05:43.110 --> 00:05:45.850
The stuff I don't do. Right. But implicit data

00:05:45.850 --> 00:05:48.629
is arguably the true powerhouse of modern systems

00:05:48.629 --> 00:05:51.350
because it is based entirely on observing your

00:05:51.350 --> 00:05:54.550
normal passive behavior. Meaning it's watching

00:05:54.550 --> 00:05:56.449
what I don't do just as much as... what I do.

00:05:56.449 --> 00:05:58.790
Exactly that. It tracks what music is sitting

00:05:58.790 --> 00:06:01.170
in your library, how many seconds you lingered

00:06:01.170 --> 00:06:03.430
on a specific web page before scrolling past

00:06:03.430 --> 00:06:06.069
it, or whether I skipped a song after three seconds

00:06:06.069 --> 00:06:08.089
versus listening all the way through to the end.

00:06:08.089 --> 00:06:12.779
Yes. A robust system then filters all this implicit

00:06:12.779 --> 00:06:16.139
data through basic business logic. Like, if it

00:06:16.139 --> 00:06:18.259
observes that you already purchased a digital

00:06:18.259 --> 00:06:20.519
album, it knows it shouldn't recommend you buy

00:06:20.519 --> 00:06:23.000
it again. That'd be annoying. Right. So it maps

00:06:23.000 --> 00:06:25.779
your passive actions against millions of others

00:06:25.779 --> 00:06:29.300
to predict your next logical step. Okay, so I

00:06:29.300 --> 00:06:32.720
have my giant grid. I have my explicit stars,

00:06:32.899 --> 00:06:35.759
if I ever use them, and my implicit scrolling

00:06:35.759 --> 00:06:38.459
habits. But how does the system mathematically

00:06:38.459 --> 00:06:41.000
know that my taste twin is actually my twin?

00:06:41.360 --> 00:06:43.970
Like, how does it compute that similarity? So

00:06:43.970 --> 00:06:46.569
this brings us to a major historical divide in

00:06:46.569 --> 00:06:49.129
computer science. It's the split between

00:06:49.129 --> 00:06:52.089
memory-based versus model-based approaches. OK, let's

00:06:52.089 --> 00:06:54.110
start with memory-based. Memory-based methods

00:06:54.110 --> 00:06:56.149
are the classic sort of brute force way. They

00:06:56.149 --> 00:06:59.189
hold the entire massive database of user ratings

00:06:59.189 --> 00:07:01.730
in the computer's active memory and use statistical

00:07:01.730 --> 00:07:04.430
formulas to compute the literal distance between

00:07:04.430 --> 00:07:07.329
users. Distance, like physical distance. Mathematical

00:07:07.329 --> 00:07:09.389
distance. They use things like Pearson correlation

00:07:09.389 --> 00:07:12.490
or cosine similarity. Cosine similarity, like

00:07:12.490 --> 00:07:16.089
high school trigonometry. Yes, exactly. Think

00:07:16.089 --> 00:07:19.149
back to geometry class. Imagine we plot your

00:07:19.149 --> 00:07:22.209
tastes as a line on a massive, you know,

00:07:22.209 --> 00:07:24.350
multi-dimensional graph. Okay, I'm picturing it. And

00:07:24.350 --> 00:07:27.620
then we plot my tastes as another line. The system

00:07:27.620 --> 00:07:30.379
mathematically calculates the angle between our

00:07:30.379 --> 00:07:33.399
two lines. If our lines point in the exact same

00:07:33.399 --> 00:07:35.660
direction, the angle is zero, which means we

00:07:35.660 --> 00:07:38.720
have incredibly high similarity. But if our lines

00:07:38.720 --> 00:07:40.740
point in completely different directions, the

00:07:40.740 --> 00:07:43.360
math knows we have absolutely nothing in common.
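The angle-between-taste-lines picture is literally what cosine similarity computes. A minimal Python sketch with toy vectors (not real user data):

```python
import math

def angle_degrees(u, v):
    """Angle between two taste vectors: 0 means identical direction,
    90 means the two users have nothing in common."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    # Clamp guards against tiny floating-point drift outside [-1, 1].
    return math.degrees(math.acos(max(-1.0, min(1.0, dot / norm))))

# A scaled copy of the same tastes points the same way: angle ~ 0 degrees.
print(angle_degrees([5, 3, 1], [10, 6, 2]))
# Completely non-overlapping tastes are orthogonal: angle = 90 degrees.
print(angle_degrees([5, 0, 0], [0, 5, 0]))
```

Note the second vector in the first call rates everything twice as high as the first, yet the angle is still zero; cosine cares about the direction of your tastes, not how generous a rater you are.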

00:07:43.660 --> 00:07:46.100
So the math is literally measuring the angle

00:07:46.100 --> 00:07:49.800
of our agreement. That is wild. It's very elegant,

00:07:50.079 --> 00:07:51.560
conceptually. But hold on, let me think about

00:07:51.560 --> 00:07:54.620
this grid again. If I have rated, say, 200 movies,

00:07:54.839 --> 00:07:57.649
and you've only rated three. How can we even

00:07:57.649 --> 00:07:59.370
have an angle? I mean, most of the grid for both

00:07:59.370 --> 00:08:03.050
of us is just totally blank space. You've just

00:08:03.050 --> 00:08:04.949
hit on the Achilles heel of the memory-based

00:08:04.949 --> 00:08:07.930
approach. That massive amount of empty space

00:08:07.930 --> 00:08:10.350
in the grid is what engineers call data sparsity.

00:08:10.470 --> 00:08:12.850
Data sparsity. Right. When you are dealing with

00:08:12.850 --> 00:08:15.490
millions of users and billions of items, computing

00:08:15.490 --> 00:08:18.310
the trigonometric angle for every single user

00:08:18.310 --> 00:08:21.490
against every single other user is cripplingly

00:08:21.490 --> 00:08:24.269
slow. Because it's doing math on nothing. Exactly.

00:08:24.529 --> 00:08:26.449
The computer is spending all its time crunching

00:08:26.449 --> 00:08:29.730
math on empty space. Which means the system crashes.

00:08:30.189 --> 00:08:32.830
Or, I don't know, it takes three days to recommend

00:08:32.830 --> 00:08:36.129
my next YouTube video. Which is obviously unacceptable

00:08:36.129 --> 00:08:38.610
when a user expects a new video to autoplay in

00:08:38.610 --> 00:08:41.649
two seconds. Right. This is where model -based

00:08:41.649 --> 00:08:44.450
approaches completely revolutionize the field.

00:08:44.639 --> 00:08:47.519
Instead of holding all that raw, mostly empty

00:08:47.519 --> 00:08:50.460
data in memory and measuring distances, the system

00:08:50.460 --> 00:08:53.220
uses machine learning techniques. Like what?

00:08:53.820 --> 00:08:56.679
Specifically, dimensionality reduction. Things

00:08:56.679 --> 00:09:00.080
like singular value decomposition. Singular value

00:09:00.080 --> 00:09:02.000
decomposition. Okay, you're definitely going

00:09:02.000 --> 00:09:03.679
to have to translate that into English for me.

00:09:04.039 --> 00:09:06.980
Fair enough. Think of it this way. Using a

00:09:06.980 --> 00:09:08.980
memory-based approach is like trying to navigate a

00:09:08.980 --> 00:09:11.539
massive city by measuring the exact distance

00:09:11.539 --> 00:09:13.820
between every single individual building. OK.

00:09:13.960 --> 00:09:16.659
Yeah, that sounds exhausting and incredibly slow.

00:09:16.840 --> 00:09:18.940
It is. A model-based approach, on the other

00:09:18.940 --> 00:09:21.379
hand, takes that whole city and creates a simplified

00:09:21.379 --> 00:09:25.080
subway map. I like that. It compresses the giant

00:09:25.080 --> 00:09:28.379
empty grid into a much smaller, dense mathematical

00:09:28.379 --> 00:09:30.860
model of what we call latent factors. Latent

00:09:30.860 --> 00:09:32.740
factors, like hidden categories or something?

00:09:33.019 --> 00:09:35.639
Yes, exactly like hidden categories. The math

00:09:35.639 --> 00:09:37.480
might discover that a certain group of movies

00:09:37.480 --> 00:09:40.440
always gets rated highly by the exact same cluster

00:09:40.440 --> 00:09:42.860
of people. OK. Now, the computer doesn't know

00:09:42.860 --> 00:09:45.940
what the English words brooding, dystopian,

00:09:45.940 --> 00:09:48.500
sci-fi mean. Right, it's just code. Exactly. No

00:09:48.500 --> 00:09:51.360
human ever tagged the movies with those words

00:09:51.360 --> 00:09:54.559
for the algorithm. But the math identifies the

00:09:54.559 --> 00:09:57.240
underlying pattern, the latent factor that connects

00:09:57.240 --> 00:09:59.539
them. It learns the rules of the data rather

00:09:59.539 --> 00:10:01.899
than memorizing every single data point. Wow.
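The "subway map" compression described here is what singular value decomposition does: keep only the strongest latent factors and rebuild the grid from them. A toy NumPy sketch with an invented 4x4 ratings matrix (0 means unrated):

```python
import numpy as np

# Invented 4x4 user-by-item ratings matrix. Users 0-1 like items 0-1,
# users 2-3 like items 2-3; the zeros are unrated cells.
R = np.array([
    [5, 4, 0, 1],
    [4, 5, 1, 0],
    [0, 1, 5, 4],
    [1, 0, 4, 5],
], dtype=float)

# Full SVD, then keep only the k strongest latent factors.
U, s, Vt = np.linalg.svd(R, full_matrices=False)
k = 2
R_hat = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]   # low-rank reconstruction

# The unrated (zero) cells now carry predicted affinities, and every
# user is described by just k latent-factor coordinates instead of a
# full row of individual item ratings.
print(np.round(R_hat, 1))
```

With this data, keeping just two latent factors recovers the two taste clusters: user 0's reconstructed row rates the cluster she belongs to far above the other, which is exactly the "which subway line are you on" compression.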

00:10:02.059 --> 00:10:04.700
And because of that, it is vastly faster and

00:10:04.700 --> 00:10:06.639
way more scalable. So instead of comparing me

00:10:06.639 --> 00:10:09.440
to millions of individual people, it just maps

00:10:09.440 --> 00:10:13.129
me to the brooding sci-fi subway line and instantly

00:10:13.129 --> 00:10:15.730
knows what other stops are on that route. That's

00:10:15.730 --> 00:10:17.690
a perfect way to summarize it. That is brilliant.

00:10:17.909 --> 00:10:19.330
But let's take a step back here. We're talking

00:10:19.330 --> 00:10:22.250
about algorithms and subway maps and math. But

00:10:22.250 --> 00:10:24.870
what happens when you unleash this highly efficient

00:10:24.870 --> 00:10:28.990
matching system on actual messy human societies?

00:10:29.590 --> 00:10:31.990
Well, the social application of collaborative

00:10:31.990 --> 00:10:34.590
filtering is where things get complicated and

00:10:34.590 --> 00:10:37.450
frankly a bit concerning. Because we see this

00:10:37.450 --> 00:10:40.370
everywhere, right? The source material brings

00:10:40.370 --> 00:10:43.590
up platforms like Reddit, Wikipedia, Last.fm.

00:10:43.769 --> 00:10:46.169
Yes, the heavy hitters. And these aren't traditional

00:10:46.169 --> 00:10:48.730
media sites where a few editors sit in a smoky

00:10:48.730 --> 00:10:51.970
room and curate the front page. They rely entirely

00:10:51.970 --> 00:10:54.169
on collaborative filtering. They rely on the

00:10:54.169 --> 00:10:57.049
community. Exactly. They rely on the community

00:10:57.049 --> 00:11:00.350
to vote up content. On Wikipedia, volunteers

00:11:00.350 --> 00:11:03.769
collaborate to filter out falsehoods. The stories

00:11:03.769 --> 00:11:06.090
that hit the front page reflect the average interest

00:11:06.090 --> 00:11:08.230
of the specific group interacting with them.

00:11:08.549 --> 00:11:11.029
And the engine feeds on that interaction. The

00:11:11.029 --> 00:11:13.470
more a user interacts with the system, the more

00:11:13.470 --> 00:11:15.809
data the model has to build their latent factor

00:11:15.809 --> 00:11:17.870
profile. And the more tailored the recommendations

00:11:17.870 --> 00:11:20.070
become. Right. But let me pause you right there

00:11:20.070 --> 00:11:22.889
because I'm reasoning this out. If the system's

00:11:22.889 --> 00:11:25.490
entire mathematical goal is to promote what the

00:11:25.490 --> 00:11:28.590
majority of my taste twins enjoy and then just

00:11:28.590 --> 00:11:31.690
serve it back to me. Aren't we just building

00:11:31.690 --> 00:11:34.190
a massive echo chamber? That is the big question.

00:11:34.529 --> 00:11:37.710
Like, if I only see things that people exactly

00:11:37.710 --> 00:11:41.330
like me have upvoted, don't niche ideas or challenging

00:11:41.330 --> 00:11:44.559
viewpoints just get completely buried? If we

00:11:44.559 --> 00:11:47.019
connect this to the bigger picture, your concern

00:11:47.019 --> 00:11:49.460
is the central crisis of modern recommendation.

00:11:50.059 --> 00:11:52.840
A collaborative filtering system does not guarantee

00:11:52.840 --> 00:11:55.419
an objective or diverse match. It just gives

00:11:55.419 --> 00:11:58.360
you what you already like. Yes. Unless a platform

00:11:58.360 --> 00:12:01.279
actively intervenes to enforce diversity of opinion,

00:12:01.940 --> 00:12:04.159
one dominant point of view will inevitably crush

00:12:04.159 --> 00:12:06.179
the others. It's a self -fulfilling prophecy.

00:12:06.580 --> 00:12:09.320
Just a massive feedback loop. A remarkably efficient

00:12:09.320 --> 00:12:12.179
one, yes. The system learns what you click, feeds

00:12:12.179 --> 00:12:14.779
you more of it, you click it again, which statistically

00:12:14.779 --> 00:12:17.519
reinforces to the matrix that this is your defining

00:12:17.519 --> 00:12:20.080
trait. Wow. The walls of the filter bubble get

00:12:20.080 --> 00:12:22.159
mathematically thicker with every interaction.

00:12:22.460 --> 00:12:24.299
What's wild to me from reading the research is

00:12:24.299 --> 00:12:26.919
that this tendency to trap people in bubbles

00:12:26.919 --> 00:12:29.740
isn't just a philosophical problem for sociologists

00:12:29.740 --> 00:12:31.659
to worry about. No, not at all. It's actually

00:12:31.659 --> 00:12:34.639
a symptom of deep structural flaws in how the

00:12:34.639 --> 00:12:37.299
math itself functions. Like the engineers are

00:12:37.299 --> 00:12:39.539
desperately trying to fix the very matrices they

00:12:39.539 --> 00:12:42.460
built. Yeah, the models are incredibly fragile

00:12:42.460 --> 00:12:45.200
in certain edge cases. We talked about data sparsity

00:12:45.200 --> 00:12:48.500
slowing things down, but its most famous consequence

00:12:48.500 --> 00:12:51.320
is known in the field as the cold start problem.

00:12:51.500 --> 00:12:54.340
The cold start. Think about the last time you,

00:12:54.480 --> 00:12:56.919
the listener, created a brand new account on

00:12:56.919 --> 00:12:59.480
a streaming service. You haven't clicked anything

00:12:59.480 --> 00:13:02.340
yet. You are a cold start. And to the algorithm,

00:13:02.700 --> 00:13:05.200
you are terrifying. Right. Because how does a

00:13:05.200 --> 00:13:07.840
system that relies entirely on past preferences

00:13:07.840 --> 00:13:10.200
predict anything for a complete blank slate?

00:13:10.419 --> 00:13:14.159
It can't. Exactly. A brand new user or a newly

00:13:14.159 --> 00:13:17.320
uploaded independent film with zero ratings completely

00:13:17.320 --> 00:13:20.159
breaks the logic of the matrix. There is no history

00:13:20.159 --> 00:13:22.360
to find a taste twin. Right. If an obscure indie

00:13:22.360 --> 00:13:25.600
movie drops today, it has no data. It can't be

00:13:25.600 --> 00:13:28.120
matched to the brooding sci-fi subway line because

00:13:28.120 --> 00:13:30.320
no one has bought a ticket for it yet. Exactly.

00:13:30.559 --> 00:13:32.860
And another major breakdown is scalability in

00:13:32.860 --> 00:13:35.559
real time. Finding latent factors for a thousand

00:13:35.559 --> 00:13:38.299
users is easy, but doing it for hundreds of millions

00:13:38.299 --> 00:13:40.720
of customers, recalculating their subway map

00:13:40.720 --> 00:13:42.860
instantly as they click through a website, that

00:13:42.860 --> 00:13:45.279
requires staggering computing power. I can't

00:13:45.279 --> 00:13:47.559
even imagine the server costs. It's immense.

00:13:47.799 --> 00:13:50.059
And then on top of that, the system has to defend

00:13:50.059 --> 00:13:53.700
against human malice, specifically shilling attacks.

00:13:53.899 --> 00:13:56.419
Oh, man. I found this part of the reading fascinating.

00:13:56.639 --> 00:14:00.889
People actively gaming the math. Yeah, in any

00:14:00.889 --> 00:14:03.870
open system, bad actors will try to manipulate

00:14:03.870 --> 00:14:06.649
the algorithm. They deploy bots to create thousands

00:14:06.649 --> 00:14:09.370
of fake accounts. And then what do they do? These

00:14:09.370 --> 00:14:12.450
bots flood the matrix with positive ratings for

00:14:12.450 --> 00:14:15.269
their own products and artificially downvote

00:14:15.269 --> 00:14:17.210
their competitors. So they're just muddying the

00:14:17.210 --> 00:14:19.549
waters. Right. And the mathematical model has

00:14:19.549 --> 00:14:22.190
to be robust enough to detect that these thousands

00:14:22.190 --> 00:14:24.549
of new taste twins are actually a coordinated

00:14:24.549 --> 00:14:27.090
attack and then ignore them. Which brings me

00:14:27.090 --> 00:14:29.500
to two of my absolute favorite concepts. from

00:14:29.500 --> 00:14:32.440
the source material, the gray sheep and the black

00:14:32.440 --> 00:14:35.740
sheep. Ah, yes. These are essentially the ultimate

00:14:35.740 --> 00:14:39.000
internet hipsters. The gray sheep are real human

00:14:39.000 --> 00:14:42.159
users whose opinions just do not consistently

00:14:42.159 --> 00:14:44.460
align with any measurable group. They're a bit

00:14:44.460 --> 00:14:46.860
all over the place. Yeah. And the black sheep

00:14:46.860 --> 00:14:49.700
have a taste so completely idiosyncratic and

00:14:49.700 --> 00:14:51.899
weird that the algorithm just throws its hands

00:14:51.899 --> 00:14:53.840
up in defeat. They cannot be placed on the subway

00:14:53.840 --> 00:14:57.000
map. The math simply breaks down when it encounters

00:14:57.000 --> 00:14:59.779
a true statistical outlier. But here's where

00:14:59.779 --> 00:15:02.679
it gets really interesting. Because with all

00:15:02.679 --> 00:15:04.720
the breathless hype in the world right now about

00:15:04.720 --> 00:15:09.019
artificial intelligence, surely modern AI has

00:15:09.019 --> 00:15:11.259
fixed this, right? You'd think so. I mean, we

00:15:11.259 --> 00:15:13.320
have deep learning neural networks that can write

00:15:13.320 --> 00:15:16.000
college essays and generate hyper-realistic

00:15:16.000 --> 00:15:18.960
videos from scratch. They must have solved a

00:15:18.960 --> 00:15:21.279
clunky movie recommendation matrix by now. You

00:15:21.279 --> 00:15:23.559
would certainly think so, given the press coverage.

00:15:24.059 --> 00:15:26.879
But the research we reviewed reveals a staggering

00:15:26.879 --> 00:15:29.320
plot twist. Hit me with it. The effectiveness

00:15:29.320 --> 00:15:31.639
of cutting edge deep learning in this specific

00:15:31.639 --> 00:15:34.580
field is highly, highly questionable. Wait, really?

00:15:35.139 --> 00:15:37.940
Like, AI is actually failing at this. Yes. A

00:15:37.940 --> 00:15:40.240
systematic analysis was recently conducted on

00:15:40.240 --> 00:15:42.659
the top academic publications that claimed to

00:15:42.659 --> 00:15:45.700
use deep learning neural methods to solve recommendation

00:15:45.700 --> 00:15:48.419
problems. The researchers looked at articles

00:15:48.419 --> 00:15:51.379
published in the absolute most prestigious conferences

00:15:51.379 --> 00:15:54.399
in the world. And the results were a massive

00:15:54.399 --> 00:15:57.220
blow to the industry. How bad was it? On average,

00:15:57.460 --> 00:15:59.720
less than 40% of the deep learning articles

00:15:59.720 --> 00:16:03.139
were reproducible. And at some elite conferences,

00:16:03.519 --> 00:16:06.700
the reproducibility rate was an abysmal 14%.

00:16:06.700 --> 00:16:08.139
OK, I want to make sure I understand what that

00:16:08.139 --> 00:16:10.679
means for you listening. Reproducibility. That

00:16:10.679 --> 00:16:13.399
means another scientist took the exact same neural

00:16:13.399 --> 00:16:16.320
network code, ran the exact same data set through

00:16:16.320 --> 00:16:18.600
it, and couldn't get the algorithm to actually

00:16:18.600 --> 00:16:21.139
do what the original paper claimed. Correct.

00:16:21.500 --> 00:16:24.399
The study identified 18 major articles pushing

00:16:24.399 --> 00:16:27.240
revolutionary new deep learning models. Eighteen.

00:16:27.379 --> 00:16:30.059
Out of those 18, only seven could actually be

00:16:30.059 --> 00:16:32.179
reproduced by independent labs. That's less than

00:16:32.179 --> 00:16:35.379
half. It gets worse. The kicker is this. Of those

00:16:35.379 --> 00:16:37.779
seven that worked, six of them were completely

00:16:37.779 --> 00:16:41.080
outperformed by older, simpler, traditional baseline

00:16:41.080 --> 00:16:45.200
algorithms. That is wild. So the incredibly complex,

00:16:45.500 --> 00:16:48.120
computationally expensive neural network was

00:16:48.120 --> 00:16:51.000
actually worse than the basic latent factor subway

00:16:51.000 --> 00:16:53.799
math we were just talking about. In almost every

00:16:53.799 --> 00:16:56.100
case, yes. How is that even possible? Why did

00:16:56.100 --> 00:16:58.600
it fail so badly? Well, it really comes down

00:16:58.600 --> 00:17:02.759
to two major issues: overfitting and academic

00:17:02.759 --> 00:17:05.380
pressure. Okay, what's overfitting? Neural networks

00:17:05.380 --> 00:17:07.940
are incredibly powerful, but they have a tendency

00:17:07.940 --> 00:17:10.769
to overfit the data. They essentially memorize

00:17:10.769 --> 00:17:13.190
the training data perfectly, but when you give

00:17:13.190 --> 00:17:16.130
them messy, real -world data, they completely

00:17:16.130 --> 00:17:18.809
fall apart. Oh, I see. They memorize the answers

00:17:18.809 --> 00:17:20.910
to the test, but they don't actually understand

00:17:20.910 --> 00:17:23.509
the subject. Precisely. It's a common trap in

00:17:23.509 --> 00:17:25.769
machine learning. And the second issue is how

00:17:25.769 --> 00:17:27.430
these papers get published in the first place.

00:17:27.690 --> 00:17:29.829
The academic pressure part. Right. To get your

00:17:29.829 --> 00:17:31.690
paper accepted at a top conference, you have

00:17:31.690 --> 00:17:34.369
to prove your new deep learning model beats the

00:17:34.369 --> 00:17:37.000
old models. What the analysis found was that

00:17:37.000 --> 00:17:39.660
many researchers were intentionally using poorly

00:17:39.660 --> 00:17:42.799
tuned, unoptimized versions of the old algorithms

00:17:42.799 --> 00:17:45.079
as their baseline. Oh, wow. So they were setting

00:17:45.079 --> 00:17:47.779
up a straw man. Exactly. They made the old math

00:17:47.779 --> 00:17:50.579
look bad on purpose. Once independent labs took

00:17:50.579 --> 00:17:53.640
the old, simple math and actually tuned it properly,

00:17:54.299 --> 00:17:58.309
it crushed the fancy new AI. That is crazy. Yeah,

00:17:58.450 --> 00:18:00.930
this raises an important question about the current

00:18:00.930 --> 00:18:03.430
state of scientific scholarship and the pressure

00:18:03.430 --> 00:18:07.349
to chase hype over function. OK, so if throwing

00:18:07.349 --> 00:18:09.809
a broken neural network at the problem isn't

00:18:09.809 --> 00:18:13.170
the magic bullet we thought it was, how are engineers

00:18:13.170 --> 00:18:16.650
actually improving these systems? Like, how do

00:18:16.650 --> 00:18:19.470
they fix the cold start problem and handle the

00:18:19.470 --> 00:18:21.549
fact that our tastes change depending on our

00:18:21.549 --> 00:18:24.000
mood? The current frontier of the technology

00:18:24.000 --> 00:18:26.779
is something called context-aware collaborative

00:18:26.779 --> 00:18:29.599
filtering, and it requires fundamentally changing

00:18:29.599 --> 00:18:31.920
the shape of the mathematical grid. What do you

00:18:31.920 --> 00:18:33.759
mean by the shape of the grid? Well, traditional

00:18:33.759 --> 00:18:36.400
collaborative filtering uses that flat two-dimensional

00:18:36.400 --> 00:18:39.599
matrix we talked about. Rows are users, columns

00:18:39.599 --> 00:18:42.380
are items. Right, the spreadsheet. But that completely

00:18:42.380 --> 00:18:45.200
ignores the real-world context of when and where

00:18:45.200 --> 00:18:47.900
you are actually consuming the media. Context-aware

00:18:47.900 --> 00:18:50.259
systems add a third dimension, transforming

00:18:50.259 --> 00:18:53.740
the flat grid into a 3D tensor. A 3D tensor.

00:18:53.960 --> 00:18:55.559
Okay, paint a picture of that for me. Imagine

00:18:55.559 --> 00:18:57.740
taking our flat spreadsheet and pulling it out

00:18:57.740 --> 00:19:00.579
into a massive cube. Okay, I see it. That third

00:19:00.579 --> 00:19:03.660
dimension represents context slices, things like

00:19:03.660 --> 00:19:06.000
time of day, location, what device you're using,

00:19:06.099 --> 00:19:08.339
because think about it, the music you want to

00:19:08.339 --> 00:19:10.819
listen to on your phone at 7 a.m. on a rainy

00:19:10.819 --> 00:19:14.099
Tuesday morning commute is drastically different

00:19:14.099 --> 00:19:16.059
from the music you want blaring on your smart

00:19:16.059 --> 00:19:18.420
speaker at 10 p.m. on a Friday night with friends.

00:19:18.680 --> 00:19:21.039
Oh, absolutely. Tuesday morning is moody podcasts.

00:19:21.660 --> 00:19:24.859
Friday night is upbeat pop. If the system just

00:19:24.859 --> 00:19:27.119
averages me out, it's going to play pop music

00:19:27.119 --> 00:19:29.799
on my Tuesday commute and completely ruin my

00:19:29.799 --> 00:19:32.980
morning. Exactly. The 3D tensor prevents that.
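A toy version of that cube can be sketched in Python with NumPy. Everything here, the shapes, the context labels, and the ratings, is invented for illustration; real tensors are enormous and almost entirely empty:

```python
import numpy as np

# Toy ratings tensor: 4 users x 5 items x 2 contexts
# (context 0 = weekday morning, context 1 = weekend night).
# 0.0 means "no rating yet" -- real tensors are overwhelmingly sparse.
R = np.zeros((4, 5, 2))
R[0, 0, 0] = 5.0   # user 0 loves item 0 on weekday mornings...
R[0, 0, 1] = 2.0   # ...but not on weekend nights
R[1, 0, 0] = 4.0
R[1, 2, 0] = 5.0
R[0, 2, 0] = 4.0

def cosine(u, v):
    """Cosine similarity; returns 0 when either vector is all zeros."""
    nu, nv = np.linalg.norm(u), np.linalg.norm(v)
    return float(u @ v / (nu * nv)) if nu and nv else 0.0

# Classic 2D filtering would average over context (flattening the cube);
# a context-aware system compares users within one context slice.
morning = R[:, :, 0]                 # the "weekday morning" slice
sim_0_1 = cosine(morning[0], morning[1])
print(f"user 0 vs user 1, mornings only: {sim_0_1:.2f}")
```

Slicing the cube this way is the whole trick: two users can be near-twins at 7 a.m. and total strangers at 10 p.m., and the math keeps those facts separate instead of averaging them away.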

00:19:33.130 --> 00:19:35.750
It computes your similarity to others, not just

00:19:35.750 --> 00:19:37.849
based on what you like, but when and how you

00:19:37.849 --> 00:19:40.250
like it. That makes so much sense. And to finally

00:19:40.250 --> 00:19:42.630
defeat the cold start problem we discussed earlier,

00:19:42.930 --> 00:19:45.470
engineers are feeding auxiliary information into

00:19:45.470 --> 00:19:47.589
these tensors. Meaning data from outside the

00:19:47.589 --> 00:19:50.509
matrix entirely. Right. They look at user attributes,

00:19:50.650 --> 00:19:53.509
age, general location, social links, who you

00:19:53.509 --> 00:19:56.369
follow online. They also ingest item attributes

00:19:56.369 --> 00:19:58.690
like descriptive tags, categories, and brand

00:19:58.690 --> 00:20:01.589
names. By feeding this auxiliary data into the

00:20:01.589 --> 00:20:03.500
model, the system can make a highly

00:20:03.500 --> 00:20:06.460
educated guess about a brand new user or a newly

00:20:06.460 --> 00:20:09.259
uploaded video before a single rating has ever

00:20:09.259 --> 00:20:11.880
been given. So it's like upgrading from a flat

00:20:11.880 --> 00:20:14.559
spreadsheet to a Rubik's Cube of my personality.
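That auxiliary-data guess can be mocked up in a few lines. The tags, the profile weights, and the scoring rule below are all hypothetical, a stand-in for whatever a real system would learn:

```python
# Sketch of the auxiliary-data fallback for cold start. A brand-new video
# has no ratings yet, but it does arrive with descriptive tags -- so we
# score it against a tag profile built from items the user rated highly.

new_item_tags = {"cooking", "japanese", "tutorial"}

# Hypothetical per-tag affinities learned from this user's past ratings.
user_profile = {"cooking": 0.9, "travel": 0.4, "japanese": 0.7}

def cold_start_score(item_tags, profile):
    """Average the user's affinity over the item's tags (0 for unknowns)."""
    if not item_tags:
        return 0.0
    return sum(profile.get(t, 0.0) for t in item_tags) / len(item_tags)

score = cold_start_score(new_item_tags, user_profile)
print(f"educated guess before any ratings exist: {score:.2f}")
```

The point is only that the estimate comes entirely from attributes outside the ratings matrix, which is exactly what lets the system say something sensible before a single star has been clicked.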

00:20:15.279 --> 00:20:17.519
It's taking in all these different sides of who

00:20:17.519 --> 00:20:20.259
I am, where I am, and what time it is, not just

00:20:20.259 --> 00:20:23.000
the stars I click. That's a great analogy. So

00:20:23.000 --> 00:20:24.660
what does this all mean for the echo chamber

00:20:24.660 --> 00:20:27.339
we talked about earlier? Because if the algorithm

00:20:27.339 --> 00:20:29.420
is just getting hyper efficient at predicting

00:20:29.420 --> 00:20:32.880
exactly what I want in this exact context, how

00:20:32.880 --> 00:20:37.019
do we prevent the system from only serving up

00:20:37.019 --> 00:20:39.599
massive blockbuster hits? That is the ultimate

00:20:39.599 --> 00:20:42.000
challenge for system designers today. And it's

00:20:42.000 --> 00:20:44.119
addressed by focusing on a concept called the

00:20:44.119 --> 00:20:46.539
long tail. The long tail. Yeah. In any marketplace,

00:20:46.880 --> 00:20:49.160
the head of the curve consists of the few blockbuster

00:20:49.160 --> 00:20:51.839
hits that sell millions of copies. The long tail

00:20:51.839 --> 00:20:54.359
represents the vast, vast majority of items that

00:20:54.359 --> 00:20:57.359
only sell a few copies each. The niche, the independent,

00:20:57.539 --> 00:21:00.420
the obscure stuff. Exactly, the territory of

00:21:00.420 --> 00:21:02.960
the gray sheep and the black sheep. Because of

00:21:02.960 --> 00:21:06.059
data sparsity, it is mathematically very difficult

00:21:06.059 --> 00:21:08.900
for an algorithm to confidently recommend an

00:21:08.900 --> 00:21:11.180
item from the long tail. It's much safer to just

00:21:11.180 --> 00:21:13.150
recommend the blockbuster and move on. Right.
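A quick sketch of why the safe bet wins. With made-up counts for a thousand-item catalogue, a handful of head items dominates the ratings, leaving each tail item with almost nothing to compute similarities from:

```python
# Illustrative long-tail catalogue: a few blockbusters, many obscure items.
# (These counts are invented; real catalogues show a similar skew.)
ratings_per_item = [90_000, 70_000, 50_000] + [12] * 997  # 1,000 items

total = sum(ratings_per_item)
head_share = sum(ratings_per_item[:3]) / total
print(f"3 head items hold {head_share:.0%} of all ratings")

# With ~12 ratings each, any similarity score for a tail item rests on
# almost no user overlap -- that is the data-sparsity problem in numbers.
```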

00:21:13.769 --> 00:21:16.529
But newer algorithms are being specifically tweaked

00:21:16.529 --> 00:21:18.769
with diversity metrics built into their reward

00:21:18.769 --> 00:21:21.210
functions. How does that work? Instead of just

00:21:21.210 --> 00:21:24.630
aiming for historical accuracy, they are mathematically

00:21:24.630 --> 00:21:27.670
incentivized to push diverse items from the long

00:21:27.670 --> 00:21:30.309
tail into your feed. Oh, I love that. They are

00:21:30.309 --> 00:21:33.410
basically injecting serendipity. Exactly. They

00:21:33.410 --> 00:21:35.769
intentionally introduce a little bit of chaos,

00:21:36.049 --> 00:21:38.690
a calculated risk into the recommendation, to

00:21:38.690 --> 00:21:40.809
break the feedback loop. To help you discover

00:21:40.809 --> 00:21:42.950
things you never would have found by just following

00:21:42.950 --> 00:21:45.829
your taste-twins. That's the goal. That is fascinating.
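One common way to build that incentive is a re-rank that trades predicted rating against a novelty bonus. This is a hedged sketch, not any platform's actual formula; the titles, popularity shares, and the 0.5 weight are all invented:

```python
import math

# Diversity-aware re-rank: blend predicted rating with a novelty bonus,
# where novelty = -log(popularity share), so rarer items score higher.
candidates = [
    # (title, predicted_rating, share_of_all_plays)
    ("Blockbuster Hit",      4.6, 0.20),
    ("Indie Documentary",    4.3, 0.001),
    ("Obscure Cooking Show", 4.1, 0.0005),
]

def rerank(items, novelty_weight=0.5):
    def score(item):
        title, rating, pop = item
        novelty = -math.log(pop)          # long-tail items get a big bonus
        return rating + novelty_weight * novelty
    return sorted(items, key=score, reverse=True)

for title, rating, pop in rerank(candidates):
    print(title)
```

With this weighting the obscure show outranks the blockbuster despite its lower predicted rating: that deliberate nudge toward the tail is the "calculated risk" that breaks the feedback loop.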

00:21:45.970 --> 00:21:47.990
It's like the math is actively trying to rescue

00:21:47.990 --> 00:21:50.589
us from our own predictability. So, to bring

00:21:50.589 --> 00:21:52.680
this all home for you listening: the next time

00:21:52.680 --> 00:21:55.299
you're scrolling on your couch and a platform

00:21:55.299 --> 00:21:58.980
suggests the absolute perfect video or a weirdly

00:21:58.980 --> 00:22:02.539
specific carbon steel pan and you get that spooky

00:22:02.539 --> 00:22:04.539
feeling that your phone is reading your mind.

00:22:04.759 --> 00:22:07.480
You now know exactly what is happening under

00:22:07.480 --> 00:22:10.200
the hood. Right. There is no magic. Yeah. Just

00:22:10.200 --> 00:22:13.920
vast matrices, massive 3D tensors, and millions

00:22:13.920 --> 00:22:17.119
of invisible taste twins working behind the scenes,

00:22:17.619 --> 00:22:19.640
mathematically calculating the angles of your

00:22:19.640 --> 00:22:22.759
desires to serve up your next obsession. It truly

00:22:22.759 --> 00:22:25.299
is a staggering technological achievement that

00:22:25.299 --> 00:22:28.480
shapes modern life. But if we connect all of

00:22:28.480 --> 00:22:30.579
this to the bigger picture, it leaves us with

00:22:30.579 --> 00:22:32.859
something quite profound to consider. What's

00:22:32.859 --> 00:22:35.299
that? If these algorithms continue to refine

00:22:35.299 --> 00:22:37.720
their tensors and eventually become flawlessly

00:22:37.720 --> 00:22:40.119
efficient at predicting our desires based entirely

00:22:40.119 --> 00:22:42.319
on our past behaviors and the behaviors of people

00:22:42.319 --> 00:22:45.700
exactly like us, how will we ever discover completely

00:22:45.700 --> 00:22:48.559
new unexpected versions of ourselves? Wow. By

00:22:48.559 --> 00:22:50.779
perfectly predicting who we are today, are we

00:22:50.779 --> 00:22:53.339
risking filtering out the very serendipity that

00:22:53.339 --> 00:22:54.640
makes human growth possible?
