WEBVTT

00:00:00.000 --> 00:00:02.759
Back in the day, if you wanted a map of the world,

00:00:03.759 --> 00:00:06.019
cartographers had to literally sail the oceans.

00:00:06.379 --> 00:00:08.140
Right, they're out there on the water. Yeah,

00:00:08.259 --> 00:00:10.380
drawing the coastlines by hand. You knew exactly

00:00:10.380 --> 00:00:12.099
where the physical edges were. It was a very

00:00:12.099 --> 00:00:14.960
tangible process, you know? Like, if a ship hadn't

00:00:14.960 --> 00:00:16.800
sailed there yet, the mapmaker just sketched

00:00:16.800 --> 00:00:19.839
in a sea monster and wrote, uh, here be dragons.

00:00:20.420 --> 00:00:24.399
Exactly. But today... The most complex, detailed

00:00:24.399 --> 00:00:27.199
maps in the world. They aren't of oceans or continents.

00:00:27.460 --> 00:00:30.780
They are maps of you. Oh, totally. Your tastes,

00:00:31.399 --> 00:00:34.119
your daily habits, your, I mean, your midnight

00:00:34.119 --> 00:00:37.479
impulse buys on a random Tuesday. And the cartographers

00:00:37.479 --> 00:00:41.060
drawing these incredibly invasive maps are completely

00:00:41.060 --> 00:00:43.320
invisible. They are invisible and they are constantly

00:00:43.320 --> 00:00:45.399
redrawing the borders of your preferences in

00:00:45.399 --> 00:00:47.859
real time, often anticipating what you want before

00:00:47.859 --> 00:00:49.859
you even realize you want it. Welcome to the

00:00:49.859 --> 00:00:53.340
deep dive. Today we have a really fascinating

00:00:53.340 --> 00:00:56.280
mission for you. We are looking at a single,

00:00:56.799 --> 00:00:59.960
incredibly comprehensive source. A detailed Wikipedia

00:00:59.960 --> 00:01:02.679
article covering the entire landscape of recommender

00:01:02.679 --> 00:01:05.120
systems. It's a massive topic. It really is.

00:01:05.359 --> 00:01:08.280
We are going to unpack the invisible digital

00:01:08.280 --> 00:01:11.140
architects that shape what you buy, what you

00:01:11.140 --> 00:01:13.700
watch, and honestly even what you think. Because

00:01:13.700 --> 00:01:16.140
it goes so much deeper than just shopping. Right,

00:01:16.260 --> 00:01:18.799
we're moving way beyond that generic buzzword

00:01:18.799 --> 00:01:22.379
algorithm to understand exactly how these systems

00:01:22.379 --> 00:01:26.599
mathematically map human taste, where they spectacularly

00:01:26.599 --> 00:01:29.140
fail, and how they are evolving right now. Okay,

00:01:29.180 --> 00:01:31.250
let's unpack this. Sounds good. Before we get

00:01:31.250 --> 00:01:34.069
to the terrifyingly advanced AI, we really have

00:01:34.069 --> 00:01:36.769
to start at the foundation. How do machines even

00:01:36.769 --> 00:01:39.590
begin the process of categorizing something as

00:01:39.590 --> 00:01:42.209
subjective as human preference? Well, the attempt

00:01:42.209 --> 00:01:44.290
to quantify human taste starts a lot earlier

00:01:44.290 --> 00:01:47.909
than most people assume. In 1979, a researcher

00:01:47.909 --> 00:01:50.549
named Elaine Rich created what the industry generally

00:01:50.549 --> 00:01:53.010
considers the first recommender system. 1979.

00:01:53.109 --> 00:01:55.450
That's early. Yeah, long before the modern internet.

00:01:55.510 --> 00:01:58.189
It was a program called Grundy. And the way Grundy

00:01:58.189 --> 00:02:01.189
worked was surprisingly psychological. It recommended

00:02:01.189 --> 00:02:03.890
books to users by asking them a series of specific

00:02:03.890 --> 00:02:06.549
questions and then classifying them into programmed

00:02:06.549 --> 00:02:09.789
stereotypes. Wait, literally calling them stereotypes?

00:02:10.110 --> 00:02:12.729
Literally. Based on the stereotype bucket you

00:02:12.729 --> 00:02:15.409
landed in, Grundy would spit out a book recommendation.

00:02:15.810 --> 00:02:18.069
I love that stereotype was used as an actual

00:02:18.069 --> 00:02:21.479
literal technical feature. It sounds so blunt

00:02:21.479 --> 00:02:23.900
today, but I guess that is the absolute origin

00:02:23.900 --> 00:02:26.800
point of user profiling. It really is the bedrock.

00:02:27.020 --> 00:02:29.300
Obviously we've evolved past Grundy categorizing

00:02:29.300 --> 00:02:32.360
us into little buckets. Our source material breaks

00:02:32.360 --> 00:02:35.080
down the two main modern pillars that replace

00:02:35.080 --> 00:02:38.039
that system: collaborative filtering and content-

00:02:38.039 --> 00:02:40.379
based filtering. Right, and the mechanical distinction

00:02:40.379 --> 00:02:42.699
between those two is the key to understanding

00:02:42.699 --> 00:02:45.159
the modern internet. Collaborative filtering

00:02:45.159 --> 00:02:47.300
is entirely based on the behavior of the crowd.

00:02:47.479 --> 00:02:50.319
The mathematical assumption is simple. If two

00:02:50.319 --> 00:02:52.599
people's tastes aligned in the past, they will

00:02:52.599 --> 00:02:55.539
align in the future. Take a music platform like

00:02:55.539 --> 00:02:58.139
Last.fm. Oh, I used to use that all the time.

00:02:58.300 --> 00:03:00.539
Yeah. When it uses collaborative filtering, it

00:03:00.539 --> 00:03:03.460
doesn't analyze the audio files at all. It simply

00:03:03.460 --> 00:03:06.560
builds a massive grid connecting users to the

00:03:06.560 --> 00:03:08.509
songs they play. So it's just looking at the

00:03:08.509 --> 00:03:11.409
connections. Exactly. If your listening history

00:03:11.409 --> 00:03:14.830
overlaps by, say, 80% with A Stranger in Tokyo,

00:03:15.250 --> 00:03:17.969
the system assumes you are musical twins. It

00:03:17.969 --> 00:03:20.129
then looks at the other 20% of the songs that

00:03:20.129 --> 00:03:22.830
Stranger played and just recommends them to you.

00:03:22.949 --> 00:03:25.050
It knows absolutely nothing about the music itself.
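The overlap logic just described can be sketched in a few lines of Python. Every user, song ID, and threshold here is invented for illustration; real collaborative filters work over millions of histories:

```python
# User-based collaborative filtering: recommend what a similar user
# played that you haven't. Listening histories are just sets of song IDs.
def overlap(a, b):
    """Fraction of the smaller history shared by both users."""
    return len(a & b) / min(len(a), len(b))

def recommend(me, others, threshold=0.5):
    recs = set()
    for history in others:
        if overlap(me, history) >= threshold:   # "musical twins"
            recs |= history - me                # songs they played that I haven't
    return recs

me = {"song_a", "song_b", "song_c", "song_d"}
stranger = {"song_a", "song_b", "song_c", "song_e"}   # 3/4 overlap
print(recommend(me, [stranger]))  # {'song_e'}
```

Note that nothing in the code inspects the songs themselves; the recommendation falls out of set overlap alone.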

00:03:25.229 --> 00:03:27.050
Nothing. It doesn't know if a track is a jazz

00:03:27.050 --> 00:03:30.830
quartet or heavy metal. It just knows, uh, Tokyo

00:03:30.830 --> 00:03:32.810
Stranger liked this, so you will too. That is

00:03:32.810 --> 00:03:35.129
exactly it. And that brings us to the other pillar.

00:03:35.739 --> 00:03:38.979
content-based filtering, which completely ignores

00:03:38.979 --> 00:03:41.479
the crowd and focuses on the physical properties

00:03:41.479 --> 00:03:43.580
of the item itself. Right, looking at the actual

00:03:43.580 --> 00:03:45.500
thing. Yeah, the classic example in our source

00:03:45.500 --> 00:03:48.460
is Pandora Radio. Pandora utilizes something

00:03:48.460 --> 00:03:51.240
called the Music Genome Project. They employ

00:03:51.240 --> 00:03:54.840
actual musicologists to assign up to 450 different

00:03:54.840 --> 00:03:58.879
attributes to a single song. 450! It's wild!
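Content-based matching of the kind described here reduces to comparing attribute vectors. A minimal sketch, with made-up three-attribute vectors standing in for the hundreds of Music Genome-style features:

```python
import math

# Content-based filtering: score catalog items by how similar their
# attribute vectors are to a song the user already loves.
def cosine(u, v):
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm

# [syncopation, guitar distortion, vocal harmony] on a 0-1 scale (invented)
favorite = [0.9, 0.1, 0.8]
catalog = {
    "track_1": [0.8, 0.2, 0.9],   # similar musical DNA
    "track_2": [0.1, 0.9, 0.1],   # very different
}
best = max(catalog, key=lambda t: cosine(favorite, catalog[t]))
print(best)  # track_1
```

The crowd never enters the picture: only the item's own properties are compared.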

00:03:59.020 --> 00:04:01.580
They are mapping its musical DNA, like the level

00:04:01.580 --> 00:04:03.520
of syncopation, the distortion on the guitar,

00:04:03.680 --> 00:04:06.039
the vocal harmony. So instead of looking outward

00:04:06.039 --> 00:04:09.300
at the crowd, it looks inward at the item. And

00:04:09.300 --> 00:04:12.000
regardless of which filter a system uses, it

00:04:12.000 --> 00:04:15.180
has to feed on data, right? Which it gathers

00:04:15.180 --> 00:04:18.019
in two ways, explicit and implicit. Okay, break

00:04:18.019 --> 00:04:21.000
that down for me. Explicit data is when you deliberately

00:04:21.000 --> 00:04:24.060
hand the system information. Rating a movie four

00:04:24.060 --> 00:04:27.980
stars or hitting a thumbs down button. Implicit

00:04:27.980 --> 00:04:30.620
data is far more subtle and honestly much more

00:04:30.620 --> 00:04:32.339
heavily relied upon today. Because we're lazy

00:04:32.339 --> 00:04:34.779
and don't read things. Exactly. Implicit data

00:04:34.779 --> 00:04:37.329
is the system observing that you scrolled past

00:04:37.329 --> 00:04:40.290
four articles, but lingered on the fifth one

00:04:40.290 --> 00:04:42.930
for exactly three minutes and 12 seconds. I want

00:04:42.930 --> 00:04:44.629
to make sure I'm fully visualizing these two

00:04:44.629 --> 00:04:47.209
pillars. So collaborative filtering is like asking

00:04:47.209 --> 00:04:49.730
a friend who shares your exact taste in movies

00:04:49.730 --> 00:04:52.829
what to rent for movie night. Spot on. But content-

00:04:52.829 --> 00:04:54.930
based filtering is like meticulously reading

00:04:54.930 --> 00:04:57.790
the ingredients on a cereal box to find a brand

00:04:57.790 --> 00:05:00.529
new cereal that has the exact same mathematical

00:05:00.529 --> 00:05:03.370
ratio of sugar and raisins as your absolute favorite

00:05:03.370 --> 00:05:06.100
brand. That is a great analogy. And what's fascinating

00:05:06.100 --> 00:05:08.480
here is how the mechanics of those two distinct

00:05:08.480 --> 00:05:11.899
approaches create massive opposing vulnerabilities

00:05:11.899 --> 00:05:15.040
for the algorithms. How so? Well, Pandora, using

00:05:15.040 --> 00:05:17.740
that cereal box method, needs very little of

00:05:17.740 --> 00:05:20.040
your personal data to start functioning. Once

00:05:20.040 --> 00:05:22.360
it knows you like that specific ratio of sugar

00:05:22.360 --> 00:05:25.439
and raisins, it can instantly scan its inventory

00:05:25.439 --> 00:05:27.920
for a match. Right. But a collaborative system

00:05:27.920 --> 00:05:31.500
like Last.fm faces a severe mathematical roadblock

00:05:31.500 --> 00:05:35.209
known as the cold start problem. Because if I'm a brand new

00:05:35.209 --> 00:05:38.439
user, I don't have a listening history yet. There's

00:05:38.439 --> 00:05:40.420
no data to compare me to the Tokyo stranger.

00:05:40.660 --> 00:05:42.939
The system is completely paralyzed. And the reverse

00:05:42.939 --> 00:05:45.740
is true for items, too. If an unknown indie band

00:05:45.740 --> 00:05:47.600
uploads a brand new track, no one has listened

00:05:47.600 --> 00:05:49.779
to it yet. Oh, so there's no crowd behavior to

00:05:49.779 --> 00:05:51.939
track. Right. Because there is no crowd data

00:05:51.939 --> 00:05:55.199
attached to that specific file. A purely collaborative

00:05:55.199 --> 00:05:57.920
algorithm physically cannot recommend it to anyone.

00:05:58.160 --> 00:06:00.579
It's just stuck in limbo. And reading the source,

00:06:00.620 --> 00:06:02.819
there's a related math issue called the sparsity

00:06:02.819 --> 00:06:06.089
problem. I picture this like a giant Excel spreadsheet,

00:06:06.310 --> 00:06:09.470
right? You have millions of shoppers on one axis

00:06:09.470 --> 00:06:11.930
and millions of products on Amazon on the other

00:06:11.930 --> 00:06:15.149
axis. Yep, a massive grid. Even the most aggressive

00:06:15.149 --> 00:06:18.550
shopaholic only buys a tiny fraction of a percent

00:06:18.550 --> 00:06:23.250
of those items. 99.9% of that spreadsheet is

00:06:23.250 --> 00:06:26.529
just totally blank, empty cells. It's like trying

00:06:26.529 --> 00:06:28.470
to draw a constellation when you only have three

00:06:28.470 --> 00:06:31.389
stars. Yeah, that sparse matrix makes it incredibly

00:06:31.389 --> 00:06:33.769
difficult for the algorithm to find statistically

00:06:33.769 --> 00:06:36.709
significant overlaps between users. So the tech

00:06:36.709 --> 00:06:39.009
industry had to figure out how to fill in those

00:06:39.009 --> 00:06:41.529
blank cells and bypass the cold start problem.
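The sparsity problem is easy to make concrete. A toy user-item grid, with invented ratings, shows how little of the "spreadsheet" is ever filled in:

```python
# The sparsity problem in numbers: rows are users, columns are items,
# and 0 means "never interacted with".
ratings = [
    [5, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 3, 0, 0, 0, 0, 0],
    [0, 4, 0, 0, 0, 0, 0, 1],
]
cells = len(ratings) * len(ratings[0])
filled = sum(1 for row in ratings for r in row if r)
print(f"sparsity: {1 - filled / cells:.1%}")  # sparsity: 83.3%
```

At real scale the same arithmetic with millions of users and items pushes sparsity past 99.9%, which is exactly why statistically meaningful overlaps are hard to find.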

00:06:41.589 --> 00:06:43.490
And they basically turned it into a global competition.

00:06:43.490 --> 00:06:45.189
It was one of the most famous events in tech

00:06:45.189 --> 00:06:48.089
history, the Netflix Prize. Oh, yeah. The Netflix

00:06:48.089 --> 00:06:51.449
Prize. Between 2006 and 2009, Netflix put up

00:06:51.449 --> 00:06:53.930
a $1 million bounty for any independent research

00:06:53.930 --> 00:06:56.389
team that could improve their existing recommendation

00:06:56.389 --> 00:07:00.350
algorithm by just 10%. Putting up a million dollars

00:07:00.350 --> 00:07:03.649
for a mere 10% bump really quantifies how much

00:07:03.649 --> 00:07:05.810
revenue is tied up in keeping you on the couch

00:07:05.810 --> 00:07:08.870
for one more episode. Seriously. The team that

00:07:08.870 --> 00:07:11.509
finally claimed the prize was called BellKor's

00:07:11.509 --> 00:07:15.420
Pragmatic Chaos. And they didn't win by inventing

00:07:15.420 --> 00:07:18.500
some magical, elegant, new mathematical formula.

00:07:18.819 --> 00:07:21.860
No, they did not. They won by brute force. They

00:07:21.860 --> 00:07:25.240
created a hybrid system. They took 107 different

00:07:25.240 --> 00:07:28.000
algorithmic approaches and basically mashed them

00:07:28.000 --> 00:07:30.699
together into one massive ensemble. The mechanism

00:07:30.699 --> 00:07:33.139
of an ensemble is similar to a voting system.

00:07:33.339 --> 00:07:35.300
You have dozens of different models analyzing

00:07:35.300 --> 00:07:37.899
the same user and they mathematically average

00:07:37.899 --> 00:07:40.420
out their predictions to reach a final recommendation.
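That averaging mechanism can be sketched directly. The predicted ratings and weights below are invented; the Netflix Prize winners tuned weights over a hundred-plus component models:

```python
# Ensemble "voting": blend the predicted ratings from several models
# into one final prediction via a weighted average.
def ensemble_predict(predictions, weights):
    total = sum(weights)
    return sum(p * w for p, w in zip(predictions, weights)) / total

# Three models' predicted star ratings for the same (user, movie) pair
models = [4.2, 3.8, 4.6]
weights = [0.5, 0.3, 0.2]
print(round(ensemble_predict(models, weights), 2))  # 4.16
```

Each component model can be individually mediocre; the committee's averaged error is what drops.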

00:07:40.800 --> 00:07:42.420
It's like a committee deciding what I watch.

00:07:42.660 --> 00:07:45.139
Pretty much. That hybrid approach immediately

00:07:45.139 --> 00:07:47.139
became the industry standard because it neatly

00:07:47.139 --> 00:07:49.120
solves the vulnerabilities we just discussed.
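One simple way a hybrid can shift weight between its two filters, sketched with invented scores and an invented rating-count cutoff:

```python
# Hybrid recommender sketch: trust the collaborative score in proportion
# to how much crowd data an item has; lean on content-based otherwise.
def hybrid_score(collab_score, content_score, num_ratings, cutoff=50):
    # 0.0 for a brand-new item (cold start) -> 1.0 for a well-rated one
    w = min(num_ratings / cutoff, 1.0)
    return w * collab_score + (1 - w) * content_score

print(hybrid_score(0.0, 0.8, num_ratings=0))    # 0.8  (pure content-based)
print(hybrid_score(0.9, 0.8, num_ratings=500))  # 0.9  (pure collaborative)
```

Real systems use far more sophisticated blending, but the principle is the same: when one pillar has no data, the other carries the recommendation.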

00:07:49.420 --> 00:07:52.279
If a hybrid system encounters a brand new movie

00:07:52.279 --> 00:07:54.680
triggering the cold start problem for the collaborative

00:07:54.680 --> 00:07:57.759
filter, the system simply shifts weight to the

00:07:57.759 --> 00:08:00.000
content-based filter. Oh, I see. It looks at

00:08:00.000 --> 00:08:02.899
the movie's metadata, sees it's a sci-fi thriller

00:08:02.899 --> 00:08:05.040
directed by Christopher Nolan, and recommends

00:08:05.040 --> 00:08:07.839
it based on those attributes until enough crowd

00:08:07.839 --> 00:08:10.639
data rolls in. But I have to push back here on

00:08:10.639 --> 00:08:13.540
the tech industry's favorite mantra, which is

00:08:13.540 --> 00:08:16.379
that more data is always better. because there

00:08:16.379 --> 00:08:18.740
was a massive unintended consequence to that

00:08:18.740 --> 00:08:21.420
Netflix prize. A huge one. To make the competition

00:08:21.420 --> 00:08:24.360
work, Netflix had to release a data set of 100

00:08:24.360 --> 00:08:27.100
million movie ratings to these thousands of independent

00:08:27.100 --> 00:08:29.959
researchers. They claimed it was completely anonymized,

00:08:30.439 --> 00:08:33.299
just random user ID numbers, movie titles, and

00:08:33.299 --> 00:08:35.320
the dates the movies were rated. No names, no

00:08:35.320 --> 00:08:37.360
locations. Right, standard practice at the time.

00:08:37.519 --> 00:08:40.559
But in 2007, two researchers from the University

00:08:40.559 --> 00:08:43.840
of Texas proved that anonymized data is a myth.

00:08:43.950 --> 00:08:46.690
They took that Netflix data set and cross-referenced

00:08:46.690 --> 00:08:49.009
it with public reviews on the Internet Movie

00:08:49.009 --> 00:08:52.590
Database, IMDb. It worked? It did. They successfully

00:08:52.590 --> 00:08:54.889
identified the actual real-life users behind

00:08:54.889 --> 00:08:57.450
the random ID numbers. The mechanism of how they

00:08:57.450 --> 00:08:59.750
did that is chilling. They didn't need a name,

00:08:59.750 --> 00:09:02.789
they just looked at patterns. If you rated a

00:09:02.789 --> 00:09:05.549
highly obscure foreign film on a Tuesday and

00:09:05.549 --> 00:09:07.750
then rated a mainstream comedy on a Thursday,

00:09:08.169 --> 00:09:10.909
that specific timeline of ratings acts like a

00:09:10.909 --> 00:09:12.830
digital fingerprint. It's so specific to you.
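The fingerprint-matching idea reduces to counting shared (title, date, stars) events between an anonymous history and public profiles. All records below are invented; the actual attack also tolerated fuzzy dates and ratings:

```python
# Re-identification sketch: match an "anonymous" rating history against
# public profiles by counting exactly shared (movie, date, stars) events.
def match_score(anon_ratings, public_ratings):
    return len(set(anon_ratings) & set(public_ratings))

anon_user_7341 = [("Obscure Foreign Film", "2005-03-01", 5),
                  ("Mainstream Comedy", "2005-03-03", 2)]
public_profiles = {
    "jane_doe": [("Obscure Foreign Film", "2005-03-01", 5),
                 ("Mainstream Comedy", "2005-03-03", 2)],
    "john_roe": [("Mainstream Comedy", "2005-03-03", 4)],
}
best = max(public_profiles,
           key=lambda name: match_score(anon_user_7341, public_profiles[name]))
print(best)  # jane_doe
```

A couple of rare titles with matching timestamps is usually enough to make the match unique.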

00:09:12.950 --> 00:09:16.039
Exactly. By finding that same fingerprint of

00:09:16.039 --> 00:09:18.940
timestamps and star ratings on a public IMDb

00:09:18.940 --> 00:09:21.879
profile where someone used their real name, the

00:09:21.879 --> 00:09:24.799
researchers instantly unmasked the user's entire

00:09:24.799 --> 00:09:27.539
private Netflix viewing history. Which included

00:09:27.539 --> 00:09:29.460
highly sensitive movies that people might not

00:09:29.460 --> 00:09:31.159
want their family or employers to know they were

00:09:31.159 --> 00:09:34.629
watching. It led to a major lawsuit, Doe v. Netflix.

00:09:35.250 --> 00:09:37.269
And the privacy concerns were so severe that

00:09:37.269 --> 00:09:39.330
Netflix was forced to cancel their planned second

00:09:39.330 --> 00:09:42.149
prize competition entirely. It was a huge reality

00:09:42.149 --> 00:09:44.769
check for the industry. Yeah. I mean, if supposedly

00:09:44.769 --> 00:09:47.769
anonymous movie ratings from 2007 can unmask

00:09:47.769 --> 00:09:50.090
our real identities, what happens today when

00:09:50.090 --> 00:09:52.210
these hybrid systems require tracking our real

00:09:52.210 --> 00:09:55.070
time GPS locations, our late night impulse buys,

00:09:55.309 --> 00:09:57.429
and our sensitive political searches just to

00:09:57.429 --> 00:09:59.950
function? It is the defining tension of modern

00:09:59.950 --> 00:10:03.360
recommender systems. Hybrid systems solve the

00:10:03.360 --> 00:10:06.200
mathematical limitations. But the trade-off

00:10:06.200 --> 00:10:08.460
was that they dramatically increased the sheer

00:10:08.460 --> 00:10:10.879
volume of personal data required. So we're just

00:10:10.879 --> 00:10:13.379
giving it all away? You are explicitly trading

00:10:13.379 --> 00:10:16.379
your privacy for personalization. The algorithms

00:10:16.379 --> 00:10:19.120
demand a unified view of your cross-platform

00:10:19.120 --> 00:10:21.980
habits to perform well. But as that University

00:10:21.980 --> 00:10:24.580
of Texas study proved mechanically, when you

00:10:24.580 --> 00:10:27.080
combine enough anonymous data points from different

00:10:27.080 --> 00:10:30.340
sources, anonymity mathematically ceases to exist.

00:10:30.860 --> 00:10:32.799
I assume these companies can't just keep relying

00:10:32.799 --> 00:10:35.679
on 10-year-old static databases of my past

00:10:35.679 --> 00:10:37.919
movie ratings, right? Because human moods change

00:10:37.919 --> 00:10:40.039
constantly. Oh, absolutely. Like what I want

00:10:40.039 --> 00:10:42.419
to listen to while working out on a Tuesday morning

00:10:42.419 --> 00:10:44.519
is completely different from what I want on a

00:10:44.519 --> 00:10:46.990
Friday night. How do modern systems actually

00:10:46.990 --> 00:10:50.289
keep up with my mood in real time without just

00:10:50.289 --> 00:10:53.129
querying a massive slow database every time I

00:10:53.129 --> 00:10:56.149
click refresh? That limitation forced a shift

00:10:56.149 --> 00:10:59.309
away from static databases and toward dynamic

00:10:59.309 --> 00:11:02.370
neural architectures. One major breakthrough

00:11:02.370 --> 00:11:05.110
was the session-based recommender, which is

00:11:05.110 --> 00:11:07.629
heavily utilized by platforms like YouTube and

00:11:07.629 --> 00:11:10.289
Amazon. OK. How does that work? Instead of relying

00:11:10.289 --> 00:11:12.909
solely on your 10-year historical profile, they

00:11:12.909 --> 00:11:15.509
employ something called recurrent neural networks,

00:11:16.029 --> 00:11:19.330
or RNNs. I'm going to need an ELI5 on recurrent

00:11:19.330 --> 00:11:21.370
neural networks. What is that actually doing

00:11:21.370 --> 00:11:24.549
in the math? Think of an RNN as giving the algorithm

00:11:24.549 --> 00:11:28.210
a highly focused short-term memory. As you navigate

00:11:28.210 --> 00:11:30.909
a site, your actions aren't treated as isolated

00:11:30.909 --> 00:11:33.710
events, they are treated as a sequence. A sequence.

00:11:33.950 --> 00:11:37.029
Like a storyline. Yeah. The RNN updates its internal

00:11:37.029 --> 00:11:39.129
state with every single click you make in your

00:11:39.129 --> 00:11:41.590
current active session. If you start clicking

00:11:41.590 --> 00:11:44.210
on camping gear, the network immediately weighs

00:11:44.210 --> 00:11:46.470
those recent clicks much heavier than the blender

00:11:46.470 --> 00:11:48.730
you bought six months ago. So it knows I'm in

00:11:48.730 --> 00:11:50.850
camping mode right now. It is calculating who

00:11:50.850 --> 00:11:53.350
you are right now in this exact five-minute

00:11:53.350 --> 00:11:55.529
window. But to do that calculation at the scale

00:11:55.529 --> 00:11:58.039
of the entire internet, the source details something

00:11:58.039 --> 00:12:00.580
called the two-tower model. The two-tower model

00:12:00.580 --> 00:12:04.039
solves the problem of speed. Imagine a 3D map

00:12:04.039 --> 00:12:06.440
of the universe, like a giant mathematical vector

00:12:06.440 --> 00:12:09.740
space. You have two separate neural networks,

00:12:10.139 --> 00:12:12.639
or towers. OK, two towers. One tower focuses

00:12:12.639 --> 00:12:15.700
solely on encoding the user. It processes your

00:12:15.700 --> 00:12:17.519
current session, your demographics, your past

00:12:17.519 --> 00:12:20.779
clicks, and places your user dot at a specific

00:12:20.779 --> 00:12:24.289
coordinate in that 3D galaxy. So I have a physical

00:12:24.289 --> 00:12:27.029
location in the map. Yes. And the other tower

00:12:27.029 --> 00:12:29.970
processes millions of items, videos, products,

00:12:30.129 --> 00:12:33.190
and places item dots in that same galaxy. So

00:12:33.190 --> 00:12:35.049
the algorithm doesn't have to scan a database.
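The geometry being described here can be sketched directly: items are embedded ahead of time, the user is embedded at request time, and retrieval is just a nearest-neighbor lookup. The 3-D "galaxy coordinates" below are invented:

```python
import math

# Two-tower retrieval sketch: the item tower's outputs are precomputed;
# only the user embedding is produced at request time, then items are
# ranked by distance in the shared vector space.
item_embeddings = {
    "camping_doc":  [0.9, 0.1, 0.2],
    "cooking_show": [0.1, 0.8, 0.3],
}

user_embedding = [0.85, 0.15, 0.25]   # output of a hypothetical user tower

best = min(item_embeddings,
           key=lambda i: math.dist(user_embedding, item_embeddings[i]))
print(best)  # camping_doc
```

Production systems swap the linear scan for an approximate nearest-neighbor index, but the distance-in-a-shared-space idea is the same.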

00:12:35.509 --> 00:12:37.649
It just measures the physical distance between

00:12:37.649 --> 00:12:40.529
my dot and the movie's dot in that mathematical

00:12:40.529 --> 00:12:44.120
space. If we are in the same galaxy, it recommends

00:12:44.120 --> 00:12:46.480
it. Precisely. And because the item tower can

00:12:46.480 --> 00:12:48.580
pre -calculate the coordinates of millions of

00:12:48.580 --> 00:12:51.299
products in advance, the system only has to calculate

00:12:51.299 --> 00:12:54.299
your user dot in real time. Which makes the recommendation

00:12:54.299 --> 00:12:56.580
instantaneous. Exactly. Here's where it gets

00:12:56.580 --> 00:12:58.580
really interesting because the Wikipedia article

00:12:58.580 --> 00:13:01.519
dives into the absolute cutting edge generative

00:13:01.519 --> 00:13:04.740
recommenders, specifically a model called HSTU.

00:13:04.840 --> 00:13:06.600
This is the frontier. From my understanding,

00:13:06.899 --> 00:13:09.220
this takes the architecture that powers modern

00:13:09.220 --> 00:13:12.320
AI chatbots, like large language models, and

00:13:12.320 --> 00:13:15.580
applies it to our behavior. It treats every action

00:13:15.580 --> 00:13:18.460
you take as a token, just like a word in a sentence.

00:13:18.500 --> 00:13:20.600
Right. So it's like the algorithm is playing

00:13:20.600 --> 00:13:22.799
autocomplete with your life. Instead of predicting

00:13:22.799 --> 00:13:25.080
the next word in a paragraph, it's predicting

00:13:25.080 --> 00:13:28.409
your next human action. And just like autocomplete

00:13:28.409 --> 00:13:31.210
uses the context of your entire sentence to guess

00:13:31.210 --> 00:13:34.269
the next word, generative recommenders use a

00:13:34.269 --> 00:13:36.889
mechanism called self-attention to look at your

00:13:36.889 --> 00:13:40.669
entire digital day as a sentence. Self-attention.

00:13:40.870 --> 00:13:42.769
What does that mean practically? Self-attention

00:13:42.769 --> 00:13:44.889
allows the algorithm to look backward and find

00:13:44.889 --> 00:13:47.330
hidden correlations between distant actions.

00:13:48.210 --> 00:13:50.190
It might notice that a click on a camera lens

00:13:50.190 --> 00:13:52.769
at 9 a.m. is almost always followed by a search

00:13:52.769 --> 00:13:55.700
for memory cards at 3 p.m. It understands the

00:13:55.700 --> 00:13:57.899
context of your behavior over long sequences.
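A toy version of that lookback: treat the day's actions as a sequence, score the current action's embedding against every earlier one, and softmax the scores into attention weights. The action embeddings are invented, and real self-attention uses learned query/key/value projections rather than raw vectors:

```python
import math

# Toy self-attention: how strongly does the current action "attend"
# back to each earlier action in the sequence?
def softmax(xs):
    exps = [math.exp(x) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

actions = ["view camera lens", "read news", "search memory cards"]
embeddings = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]]   # invented vectors

query = embeddings[-1]                               # the current action
scores = [sum(q * k for q, k in zip(query, key)) for key in embeddings]
weights = softmax(scores)
for action, w in zip(actions, weights):
    print(f"{action}: {w:.2f}")
```

The camera-lens click gets the highest weight of the earlier actions: the correlation between distant events is found by the dot products, not by any hand-written rule.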

00:13:58.279 --> 00:14:00.039
But the source material notes it goes a step

00:14:00.039 --> 00:14:02.320
further with reinforcement learning. How does

00:14:02.320 --> 00:14:04.080
that differ from what we've been talking about?

00:14:04.379 --> 00:14:06.059
Aren't all these models just trying to guess

00:14:06.059 --> 00:14:08.200
what we'll click? If we connect this to the bigger

00:14:08.200 --> 00:14:11.059
picture, reinforcement learning represents a

00:14:11.059 --> 00:14:14.529
massive philosophical and mechanical shift. In

00:14:14.529 --> 00:14:16.389
traditional machine learning, the system looks

00:14:16.389 --> 00:14:19.350
at past data to passively guess your preference.

00:14:19.850 --> 00:14:21.909
Right. In reinforcement learning, the algorithm

00:14:21.909 --> 00:14:25.289
acts as an agent, and you, the user, are the

00:14:25.289 --> 00:14:27.649
environment. I don't love being called an environment.

00:14:27.730 --> 00:14:30.850
It gets weirder. The agent doesn't have a script.

00:14:30.970 --> 00:14:33.529
It operates through exploration and exploitation.

00:14:33.769 --> 00:14:36.029
It's like a standup comedian testing out jokes

00:14:36.029 --> 00:14:39.289
on a live audience. OK. The agent takes an action

00:14:39.289 --> 00:14:41.710
like throwing a specific video onto your feed.

00:14:41.929 --> 00:14:45.350
If you engage with it, the agent receives a mathematical

00:14:45.350 --> 00:14:48.549
reward. It immediately updates its policy to

00:14:48.549 --> 00:14:50.990
exploit that success, showing you more videos

00:14:50.990 --> 00:14:53.500
just like it. It's actively learning how to push

00:14:53.500 --> 00:14:56.500
my buttons in real time. Yes. It shifts the system

00:14:56.500 --> 00:14:58.779
from passively predicting what you might like

00:14:58.779 --> 00:15:01.940
to actively optimizing your behavior to maximize

00:15:01.940 --> 00:15:04.799
its own reward, which is your prolonged engagement.
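Exploration versus exploitation is classically sketched as an epsilon-greedy bandit. The "arms" here are invented video categories with invented engagement probabilities; a real agent's reward signal is far richer than a single click:

```python
import random

# Epsilon-greedy bandit sketch: the agent mostly exploits the arm with
# the best estimated reward, but occasionally explores a random one.
random.seed(0)
TRUE_ENGAGEMENT = {"cats": 0.7, "news": 0.3, "diy": 0.5}   # hidden from the agent

counts = {arm: 0 for arm in TRUE_ENGAGEMENT}
values = {arm: 0.0 for arm in TRUE_ENGAGEMENT}             # estimated rewards

def choose(epsilon=0.1):
    if random.random() < epsilon:                  # explore: throw a wild card
        return random.choice(list(TRUE_ENGAGEMENT))
    return max(values, key=values.get)             # exploit: best arm so far

for _ in range(5000):
    arm = choose()
    reward = 1 if random.random() < TRUE_ENGAGEMENT[arm] else 0
    counts[arm] += 1
    # incremental average: nudge the arm's estimate toward the new reward
    values[arm] += (reward - values[arm]) / counts[arm]

print(max(values, key=values.get))   # learned best arm
```

After enough trials the estimate for "cats" dominates, and the policy exploits it, which is the loop the hosts describe: act, observe engagement, update, repeat.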

00:15:05.159 --> 00:15:07.659
That sounds incredibly powerful, but also like

00:15:07.659 --> 00:15:10.899
a trap. And researchers are realizing that because

00:15:10.899 --> 00:15:14.720
the source highlights a major glaring issue in

00:15:14.720 --> 00:15:17.039
the field right now called the magic barrier.

00:15:17.289 --> 00:15:20.070
or the accuracy barrier. It's a huge problem

00:15:20.070 --> 00:15:22.429
for engineers. Just because a reinforcement learning

00:15:22.429 --> 00:15:24.870
agent mathematically predicts exactly what I

00:15:24.870 --> 00:15:27.789
will click on does not mean I am actually satisfied

00:15:27.789 --> 00:15:30.389
by the experience. For decades the holy grail

00:15:30.389 --> 00:15:33.149
for engineers was offline accuracy. They would

00:15:33.149 --> 00:15:36.129
hide a portion of a historical data set, train

00:15:36.129 --> 00:15:38.190
their algorithm on the rest, and see if it could

00:15:38.190 --> 00:15:40.789
accurately predict the hidden data. Like a practice

00:15:40.789 --> 00:15:43.570
test. Exactly. But they are hitting a wall of

00:15:43.570 --> 00:15:46.590
diminishing returns. Pushing an algorithm's offline

00:15:46.590 --> 00:15:49.909
accuracy from 95% to 98% frequently results

00:15:49.909 --> 00:15:52.830
in zero improvement in actual user satisfaction

00:15:52.830 --> 00:15:55.330
when deployed in the real world. Wait, there

00:15:55.330 --> 00:15:58.009
is also a massive reproducibility crisis mentioned

00:15:58.009 --> 00:16:01.289
in the source. A 2019 study surveyed deep learning

00:16:01.289 --> 00:16:04.090
recommendation papers from top scientific conferences.

00:16:04.690 --> 00:16:07.090
They discovered that less than 40% of the published

00:16:07.090 --> 00:16:09.990
algorithms could actually be reproduced by independent

00:16:09.990 --> 00:16:12.690
scientists. Yeah, it's pretty damning. In some

00:16:12.690 --> 00:16:17.129
specific conferences, it was as low as 14%. You're

00:16:17.129 --> 00:16:19.289
telling me that in the most advanced field of

00:16:19.289 --> 00:16:21.629
computer science, where billions of dollars are

00:16:21.629 --> 00:16:24.490
on the line, the smartest engineers in the room

00:16:24.490 --> 00:16:27.919
can't even copy each other's homework. 60% of

00:16:27.919 --> 00:16:30.240
these breakthrough algorithms are essentially

00:16:30.240 --> 00:16:33.539
mathematical mirages. It reveals the danger of

00:16:33.539 --> 00:16:36.740
evaluating these systems in a vacuum. Offline

00:16:36.740 --> 00:16:39.120
training is mechanically biased toward items

00:16:39.120 --> 00:16:41.480
that are already highly popular in the historical

00:16:41.480 --> 00:16:44.500
data. It creates false confidence. So the test

00:16:44.500 --> 00:16:47.139
is rigged from the start. The industry is realizing

00:16:47.139 --> 00:16:49.799
that the only way to measure true success isn't

00:16:49.799 --> 00:16:52.240
through offline math tests, but through A/B

00:16:52.240 --> 00:16:55.200
testing with real, live, incredibly unpredictable

00:16:55.200 --> 00:16:57.779
humans. Which is forcing the tech giants to measure

00:16:57.779 --> 00:17:00.960
things beyond accuracy. The source lists several

00:17:00.960 --> 00:17:02.799
new metrics that actually correlate with human

00:17:02.799 --> 00:17:05.140
happiness. There's diversity. We don't want a

00:17:05.140 --> 00:17:07.039
feed of 10 identical videos, even if we like

00:17:07.039 --> 00:17:08.819
that topic. No one wants that. There's labeling.

00:17:08.819 --> 00:17:11.400
Users physically click less on an item if it

00:17:11.400 --> 00:17:13.759
has a sponsored tag on it because it breaks trust.

00:17:13.950 --> 00:17:15.910
But the metric that really caught my eye was

00:17:15.910 --> 00:17:19.029
serendipity. Serendipity measures how surprisingly

00:17:19.029 --> 00:17:21.730
relevant a recommendation is. It's the opposite

00:17:21.730 --> 00:17:24.430
of obviousness. The Wikipedia article gives this

00:17:24.430 --> 00:17:27.809
perfect example of a grocery store app. If the

00:17:27.809 --> 00:17:30.210
algorithm looks at my cart and recommends that

00:17:30.210 --> 00:17:34.750
I buy a gallon of milk, that is a 100% mathematically

00:17:34.750 --> 00:17:37.410
accurate prediction. Because you always buy milk.

00:17:37.529 --> 00:17:40.160
I buy milk every week. But it's a terrible recommendation.

00:17:40.359 --> 00:17:42.759
It's completely obvious. I don't need a

00:17:42.759 --> 00:17:45.039
multimillion-dollar neural network to remind me to

00:17:45.039 --> 00:17:48.440
buy milk. I need it to suggest a new hot sauce

00:17:48.440 --> 00:17:51.339
I've never heard of that pairs perfectly with

00:17:51.339 --> 00:17:53.960
the specific tacos I'm making. That's serendipity.
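One crude way to score that intuition: a recommendation is serendipitous when it is relevant and not already obvious. The relevance values, purchase rates, and obviousness threshold below are all invented:

```python
# Toy serendipity score: relevance discounted by obviousness.
# "Obvious" here means the user already buys the item almost every week.
def serendipity(relevance, purchase_rate, obvious_above=0.5):
    unexpectedness = 0.0 if purchase_rate > obvious_above else 1.0 - purchase_rate
    return relevance * unexpectedness

milk = serendipity(relevance=1.0, purchase_rate=0.95)       # accurate but obvious
hot_sauce = serendipity(relevance=0.8, purchase_rate=0.0)   # surprising and relevant
print(milk, hot_sauce)  # 0.0 0.8
```

Under a metric like this, the milk recommendation scores zero despite being a perfect prediction, which is exactly the milk-versus-hot-sauce point.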

00:17:54.180 --> 00:17:56.220
It makes me wonder, are hyper-accurate algorithms

00:17:56.220 --> 00:17:58.660
actually killing the joy of human discovery?

00:17:58.910 --> 00:18:02.069
It is a profound concern among researchers. If

00:18:02.069 --> 00:18:04.529
a system only feeds you what you already know

00:18:04.529 --> 00:18:07.390
you want, your tastes inevitably stagnate. We

00:18:07.390 --> 00:18:09.890
just get stuck in a loop. Mechanically, serendipity

00:18:09.890 --> 00:18:12.009
is necessary not just to keep you from getting

00:18:12.009 --> 00:18:14.089
bored, but for the algorithm itself to learn.

00:18:14.309 --> 00:18:16.490
It has to throw wild cards at you, explore new

00:18:16.490 --> 00:18:19.349
territory, to map uncharted areas of your preferences.

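The "wild card" behavior described here is often implemented as epsilon-greedy exploration: mostly serve the best-known item, occasionally gamble on a random one so the map keeps updating. A toy sketch, with hypothetical names and scores:

```python
import random

def recommend(user_scores, epsilon=0.1, rng=random):
    """Mostly exploit the best-known item; occasionally explore."""
    items = list(user_scores)
    if rng.random() < epsilon:
        return rng.choice(items)             # explore: uncharted territory
    return max(items, key=user_scores.get)   # exploit: the safe, obvious pick

scores = {"usual_playlist": 0.9, "new_genre": 0.2, "foreign_film": 0.1}
print(recommend(scores, epsilon=0.0))  # usual_playlist: pure exploitation
```

With epsilon at zero the system never risks anything and never learns anything new, which is the stagnation problem just described.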
00:18:19.849 --> 00:18:21.970
If it never risks an unusual recommendation,

00:18:22.509 --> 00:18:25.329
it can never update its map of you. These metrics,

00:18:25.509 --> 00:18:28.170
diversity, serendipity, trust, they aren't just

00:18:28.170 --> 00:18:30.869
abstract computer science concepts. They have

00:18:30.869 --> 00:18:33.289
massive implications for how our society functions

00:18:33.289 --> 00:18:36.009
because these invisible maps are deployed everywhere.

00:18:36.230 --> 00:18:38.769
Everywhere you look. In e-commerce, the recommender

00:18:38.769 --> 00:18:40.710
mechanically changes depending on your context.

00:18:40.950 --> 00:18:43.029
On the home page, it shows you a diverse spread

00:18:43.029 --> 00:18:45.549
of your general history. But the second you add

00:18:45.549 --> 00:18:47.910
a camera to your shopping cart, the algorithm

00:18:47.910 --> 00:18:50.529
switches to aggressive cross-selling, pushing

00:18:50.529 --> 00:18:53.269
the specific memory card that fits that exact

00:18:53.269 --> 00:18:56.099
camera model. The implications extend far beyond

00:18:56.099 --> 00:18:58.660
retail, too. Look at how we discover knowledge.

00:18:59.180 --> 00:19:02.559
Every single day, approximately 6,000 academic

00:19:02.559 --> 00:19:04.960
journal articles are published globally. Wow.

00:19:05.279 --> 00:19:08.380
No human scientist can read that volume. So the

00:19:08.380 --> 00:19:10.559
academic community relies heavily on systems

00:19:10.559 --> 00:19:12.599
like Google Scholar to recommend which papers

00:19:12.599 --> 00:19:15.500
to read. But Google Scholar's algorithm relies

00:19:15.500 --> 00:19:18.259
on statistical models weighted heavily by an

00:19:18.259 --> 00:19:20.619
author's historical citation volume. Meaning,

00:19:20.880 --> 00:19:23.740
if a famous scientist wrote it, it gets recommended

00:19:23.740 --> 00:19:27.160
more. Yes, it creates a mechanical rich-get-

00:19:27.160 --> 00:19:30.259
richer loop. The unintended consequence is that

00:19:30.259 --> 00:19:33.220
a brilliant early-career researcher who publishes

00:19:33.220 --> 00:19:36.440
a groundbreaking paper gets severely penalized.

00:19:36.759 --> 00:19:39.460
That's awful. Because they lack a 10-year history

00:19:39.460 --> 00:19:42.420
of citations, the algorithm mathematically buries

00:19:42.420 --> 00:19:45.220
their work. It never reaches the surface. The

00:19:45.220 --> 00:19:48.299
algorithm literally dictates who gets heard and

00:19:48.299 --> 00:19:50.440
who gets ignored in the scientific community.

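The rich-get-richer loop is easy to reproduce with a toy scoring rule. This is purely illustrative, not Google Scholar's actual formula: weight relevance by the author's historical citation volume and the newcomer's stronger paper sinks.

```python
import math

def score(relevance, author_citations):
    """Toy citation-weighted ranking (illustrative, not the real formula)."""
    return relevance * math.log10(author_citations + 10)

papers = [
    {"title": "Incremental result, famous lab", "rel": 0.60, "cites": 100_000},
    {"title": "Breakthrough, early-career researcher", "rel": 0.95, "cites": 5},
]
ranked = sorted(papers, key=lambda p: score(p["rel"], p["cites"]), reverse=True)
print(ranked[0]["title"])  # the famous lab wins despite the weaker paper
```

The famous lab's citation history multiplies a mediocre relevance score past the breakthrough paper, which is the burial effect described above.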
00:19:50.480 --> 00:19:53.279
And that brings us to perhaps the most critical

00:19:53.279 --> 00:19:56.420
application, our decision-making and our political

00:19:56.420 --> 00:19:59.779
reality. It's the big topic right now. We are all deeply familiar

00:19:59.779 --> 00:20:02.660
with social media algorithms that use reinforcement

00:20:02.660 --> 00:20:05.140
learning to optimize for engagement. And we know

00:20:05.140 --> 00:20:07.180
that optimizing for engagement usually means

00:20:07.180 --> 00:20:09.880
optimizing for outrage because anger drives the

00:20:09.880 --> 00:20:11.740
most clicks. It's the easiest button to push.

00:20:12.200 --> 00:20:14.819
But the source material details an alternative

00:20:14.819 --> 00:20:17.240
mechanical approach called bridging-based ranking.

00:20:17.450 --> 00:20:19.690
Bridging-based ranking is a fundamentally different

00:20:19.690 --> 00:20:22.410
way to calculate reward. Instead of an algorithm

00:20:22.410 --> 00:20:24.630
searching for content that divides people to

00:20:24.630 --> 00:20:27.289
trigger an emotional response, bridging ranking

00:20:27.289 --> 00:20:30.390
optimizes for content that is unifying. Unifying.

00:20:30.700 --> 00:20:33.839
How does it even know what that is? It uses matrix

00:20:33.839 --> 00:20:37.279
factorization to identify groups of users who

00:20:37.279 --> 00:20:39.960
historically disagree on almost everything. When

00:20:39.960 --> 00:20:42.240
the algorithm finds a specific piece of content

00:20:42.240 --> 00:20:44.960
that both of those deeply opposed groups actually

00:20:44.960 --> 00:20:48.119
agree on, it boosts that content. It's actively

00:20:48.119 --> 00:20:51.740
searching for consensus across the divide. Platforms

00:20:51.740 --> 00:20:54.859
like Twitter or X have utilized this exact math

00:20:54.859 --> 00:20:57.299
for their community notes feature, surfacing

00:20:57.299 --> 00:20:59.299
context that people from differing political

00:20:59.299 --> 00:21:02.019
viewpoints generally agree is helpful. And tools

00:21:02.019 --> 00:21:04.660
like Polis use it to map out areas of agreement

00:21:04.660 --> 00:21:07.259
in town hall debates. So what does this all mean?

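A toy version of that bridging reward, with the two opposed camps given up front (real systems like Community Notes infer the groups via matrix factorization): score content by the weaker of the two groups' average ratings, so only cross-divide consensus scores high. All data here is hypothetical:

```python
def bridging_score(ratings_group_a, ratings_group_b):
    """Reward only content that earns approval across the divide:
    the score is the weaker of the two groups' average ratings."""
    mean = lambda xs: sum(xs) / len(xs)
    return min(mean(ratings_group_a), mean(ratings_group_b))

divisive = bridging_score([1.0, 0.9, 1.0], [0.0, 0.1, 0.0])  # one camp loves it
unifying = bridging_score([0.7, 0.8, 0.7], [0.8, 0.6, 0.7])  # both camps approve

print(divisive < unifying)  # True: consensus content outranks outrage bait
```

Taking the minimum rather than the sum is the whole trick: content adored by one side and despised by the other scores near zero, while modest cross-group agreement wins.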
00:21:07.380 --> 00:21:09.619
What's the takeaway? It means the mathematical

00:21:09.619 --> 00:21:12.599
reward an algorithm is programmed to chase doesn't

00:21:12.599 --> 00:21:15.700
have to be mindless angry engagement. If you

00:21:15.700 --> 00:21:18.259
simply change the reward mechanism in the code,

00:21:18.440 --> 00:21:20.839
you could literally change the tone of the global

00:21:20.839 --> 00:21:23.359
internet. You could program the invisible digital

00:21:23.359 --> 00:21:25.599
architects to build bridges instead of walls.

00:21:25.779 --> 00:21:27.859
This raises an important question, though. Who

00:21:27.859 --> 00:21:30.019
gets to define the math of what is officially

00:21:30.019 --> 00:21:33.079
considered unifying? Right. Who's at the controls?

00:21:33.460 --> 00:21:36.079
That is a tremendous amount of societal power

00:21:36.079 --> 00:21:39.660
held by a few engineers. Some researchers, like

00:21:39.660 --> 00:21:42.460
Aviv Ovadya, mentioned in the text, argue that

00:21:42.460 --> 00:21:45.380
we cannot leave that decision solely up to tech

00:21:45.380 --> 00:21:48.079
executives. Makes sense. They advocate for a

00:21:48.079 --> 00:21:50.619
structural change empowering deliberative groups,

00:21:50.839 --> 00:21:52.599
you know, panels of people, representative of

00:21:52.599 --> 00:21:55.779
the platform's actual users, to govern the design

00:21:55.779 --> 00:21:57.859
and implementation of these bridging algorithms.

00:21:58.539 --> 00:22:00.460
It's incredible to think about the journey we've

00:22:00.460 --> 00:22:04.099
taken today. We started in 1979 with Elaine Rich's

00:22:04.099 --> 00:22:06.799
Grundy, bluntly assigning people to book stereotypes.

00:22:06.960 --> 00:22:08.940
A simpler time. We moved through the million-

00:22:08.940 --> 00:22:11.740
dollar Netflix ensemble that accidentally unmasked

00:22:11.740 --> 00:22:15.220
its users, into the modern AI era where two-tower

00:22:15.220 --> 00:22:17.740
models and generative recommenders predict our

00:22:17.740 --> 00:22:19.799
next life action like autocomplete. It's moving

00:22:19.799 --> 00:22:22.400
so fast. And we've ended up discussing how the

00:22:22.400 --> 00:22:25.180
very fabric of our society and our ability to

00:22:25.180 --> 00:22:28.099
find consensus rests entirely on the mathematical

00:22:28.099 --> 00:22:30.809
definition of a good recommendation. Understanding

00:22:30.809 --> 00:22:32.710
the mechanics of these systems really does give

00:22:32.710 --> 00:22:35.869
you more agency over your digital diet. It absolutely

00:22:35.869 --> 00:22:37.930
does. When you know how the map is drawn, you

00:22:37.930 --> 00:22:40.809
can navigate it better. And as we wrap up...

00:22:40.779 --> 00:22:44.640
There is one final, slightly eerie concept from

00:22:44.640 --> 00:22:46.299
our source material that I want to leave you

00:22:46.299 --> 00:22:48.859
with. OK, lay it on me. It's an emerging field

00:22:48.859 --> 00:22:51.779
called risk-aware recommender systems. Currently,

00:22:52.000 --> 00:22:54.079
algorithms are incredibly aggressive. They push

00:22:54.079 --> 00:22:56.380
notifications and content at you constantly.

00:22:57.059 --> 00:22:59.299
But researchers are developing models using something

00:22:59.299 --> 00:23:03.140
called contextual bandits. Contextual bandits?

00:23:03.519 --> 00:23:05.920
That sounds like a gang of highly educated thieves.

00:23:06.039 --> 00:23:08.720
Well, in statistics, a multi-armed bandit is

00:23:08.720 --> 00:23:11.119
like a row of slot machines, where you are trying

00:23:11.119 --> 00:23:14.319
to figure out which one pays out the most. A

00:23:14.319 --> 00:23:17.440
contextual bandit calculates the odds based on

00:23:17.440 --> 00:23:19.900
the current environment. Oh, I see. These risk-

00:23:19.900 --> 00:23:22.759
aware algorithms use contextual bandits to calculate

00:23:22.759 --> 00:23:24.839
the mathematical danger of disturbing you at

00:23:24.839 --> 00:23:27.799
the wrong time. They learn your schedule to avoid

00:23:27.799 --> 00:23:30.720
pinging you with an ad while you are in a professional

00:23:30.720 --> 00:23:32.920
meeting or late at night when you're trying to

00:23:32.920 --> 00:23:35.079
sleep. That sounds polite, actually. Yeah. They

00:23:35.079 --> 00:23:37.920
learn when to leave you alone. But I want you

00:23:37.920 --> 00:23:40.779
to mull this over. If an algorithm becomes sophisticated

00:23:40.779 --> 00:23:44.000
enough to monitor your daily schedule, your physical

00:23:44.000 --> 00:23:46.880
context, and your stress levels in order to politely

00:23:46.880 --> 00:23:50.279
avoid bothering you, does it also know the exact,

00:23:50.579 --> 00:23:53.519
precise moment of the day when you are most emotionally

00:23:53.519 --> 00:23:55.740
vulnerable to buying something you don't actually

00:23:55.740 --> 00:23:58.839
need? Wow. The invisible cartographer doesn't

00:23:58.839 --> 00:24:01.180
just know where your dot is on the map. They

00:24:01.180 --> 00:24:03.359
know exactly how tired you are when you get there.

00:24:04.000 --> 00:24:05.700
Exactly. Thank you for joining us on this deep

00:24:05.700 --> 00:24:08.319
dive. Stay curious, protect your data, and watch

00:24:08.319 --> 00:24:10.579
out for those hyper-accurate milk recommendations.

00:24:11.099 --> 00:24:13.259
The map of your preferences is always changing,

00:24:13.599 --> 00:24:15.720
so make sure you are the ones steering the ship.

00:24:16.299 --> 00:24:16.940
See you next time.
