WEBVTT

00:00:01.120 --> 00:00:04.419
So imagine you're up late, right? And you're

00:00:04.419 --> 00:00:08.160
trying to research the, I don't know, the profound

00:00:08.160 --> 00:00:11.439
cultural origins of the hip hop musical genre.

00:00:11.779 --> 00:00:13.779
Right, like a proper historical deep dive into

00:00:13.779 --> 00:00:16.019
music history. Exactly. So you type your query

00:00:16.019 --> 00:00:18.140
into a search engine, you click that very first

00:00:18.140 --> 00:00:21.059
Wikipedia link, and suddenly the internet decides

00:00:21.059 --> 00:00:22.899
you actually just want to watch Jimmy Fallon

00:00:22.899 --> 00:00:25.519
and Justin Timberlake perform a late night comedy

00:00:25.519 --> 00:00:28.339
bit. Which is, I mean, it's hilarious, but it's

00:00:28.339 --> 00:00:31.019
a real thing. It is. That literal collision of

00:00:31.019 --> 00:00:33.979
global history and late night television is happening

00:00:33.979 --> 00:00:36.200
right now. And it's completely out of sight.

00:00:36.500 --> 00:00:38.600
Yeah, it's all happening in the background architecture

00:00:38.600 --> 00:00:42.119
of the web. Right. So have you ever found yourself

00:00:42.119 --> 00:00:44.219
falling down one of those massive Wikipedia

00:00:44.670 --> 00:00:47.009
rabbit holes, like you start out looking up the

00:00:47.009 --> 00:00:49.890
capital of Vermont and somehow three hours later,

00:00:50.270 --> 00:00:52.969
you were just deeply engrossed in the architectural

00:00:52.969 --> 00:00:56.229
nuances of 17th century naval tactics. Oh, absolutely.

00:00:56.229 --> 00:00:57.929
I mean, it happens to all of us. It's the classic

00:00:57.929 --> 00:01:00.990
Wikipedia experience. Yeah. But usually when

00:01:00.990 --> 00:01:04.030
we take those journeys, we're jumping from one

00:01:04.030 --> 00:01:07.010
giant sprawling encyclopedia entry to the next.

00:01:07.090 --> 00:01:09.489
Right. We kind of move from room to room in this

00:01:09.489 --> 00:01:11.969
endless information mansion. Right. Exploring

00:01:11.969 --> 00:01:15.010
all these massive topics. But today, we're doing

00:01:15.010 --> 00:01:17.689
something different. We're stopping and staring

00:01:17.689 --> 00:01:19.709
really, really closely at the doorway itself.

00:01:19.930 --> 00:01:21.969
Yeah. We're looking at the architecture of the

00:01:21.969 --> 00:01:23.810
portal rather than the room it leads to, which

00:01:23.810 --> 00:01:26.909
I find endlessly fascinating. It really is. Because

00:01:26.909 --> 00:01:30.370
today's deep dive is not into a massive, thousands

00:01:30.370 --> 00:01:32.969
of words long research paper. We are actually

00:01:32.969 --> 00:01:36.290
looking at a single, highly specific Wikipedia

00:01:36.290 --> 00:01:40.090
set index page titled, get this, lists of Jimmy

00:01:40.090 --> 00:01:42.709
Fallon games and sketches. It's so specific.

00:01:43.049 --> 00:01:45.569
So, so specific. And, you know, if you're tuning

00:01:45.569 --> 00:01:47.670
in right now expecting a retrospective on the

00:01:47.670 --> 00:01:50.909
comedic value of Jimmy Fallon's career or like

00:01:50.909 --> 00:01:52.870
a ranking of his best jokes. Yeah, you might

00:01:52.870 --> 00:01:54.709
be disappointed. Right. We're doing something

00:01:54.709 --> 00:01:57.189
a bit different. We're unpacking the fascinating

00:01:57.189 --> 00:02:00.349
ways human entertainment is categorized, archived

00:02:00.349 --> 00:02:03.189
and disambiguated online. We're basically going

00:02:03.189 --> 00:02:06.510
to use this one tiny skeletal directory page

00:02:06.510 --> 00:02:09.409
to understand how the Internet organizes pop

00:02:09.409 --> 00:02:12.129
culture. And, you know, it sounds incredibly

00:02:12.129 --> 00:02:15.469
niche, almost mundane on the surface, but the

00:02:15.469 --> 00:02:18.210
skeletal page acts as a perfect microcosm for

00:02:18.210 --> 00:02:20.629
how human knowledge is structured. Totally. We're

00:02:20.629 --> 00:02:23.270
basically analyzing the container that holds

00:02:23.270 --> 00:02:26.310
the career rather than the career itself. OK.

00:02:26.330 --> 00:02:28.090
So let's start with the primary function of this

00:02:28.090 --> 00:02:30.090
page. Right at the bottom, it categorizes itself.

00:02:30.129 --> 00:02:33.990
It says, Category: Set index articles. Right.

00:02:34.080 --> 00:02:36.560
And what this page essentially does is take Jimmy

00:02:36.560 --> 00:02:39.379
Fallon's entire late night sketch catalog and

00:02:39.379 --> 00:02:41.360
split it right down the middle into two completely

00:02:41.360 --> 00:02:44.479
distinct lists. You have the list of Late Night

00:02:44.479 --> 00:02:46.680
with Jimmy Fallon games and sketches, and then

00:02:46.680 --> 00:02:49.319
you have the list of The Tonight Show Starring

00:02:49.319 --> 00:02:51.759
Jimmy Fallon games and sketches. Yeah, and in

00:02:51.759 --> 00:02:55.060
the taxonomy of online encyclopedias, a set index

00:02:55.060 --> 00:02:58.379
article isn't just a normal entry. And it's also,

00:02:58.460 --> 00:03:01.039
this is important, it's distinct from a standard

00:03:01.039 --> 00:03:04.080
disambiguation page. Wait, how so? Aren't they

00:03:04.080 --> 00:03:06.479
kind of doing the same thing? Well, sort of,

00:03:06.599 --> 00:03:08.419
but disambiguation pages are for things that

00:03:08.419 --> 00:03:11.099
just happen to share the exact same name by pure

00:03:11.099 --> 00:03:15.099
coincidence, like apple the fruit and Apple the

00:03:15.099 --> 00:03:17.020
technology company. Oh, right, because they have

00:03:17.020 --> 00:03:19.139
nothing to do with each other. Exactly. But a

00:03:19.139 --> 00:03:21.620
set index is designed for items that are intrinsically

00:03:21.620 --> 00:03:24.379
related. They share similar names or fall under

00:03:24.379 --> 00:03:27.099
a tightly bound umbrella. But collectively, they've

00:03:27.099 --> 00:03:30.500
just become far too unwieldy to exist on a single

00:03:30.500 --> 00:03:33.120
page. OK, I get it. It makes me think of, like,

00:03:33.449 --> 00:03:36.849
a musician's discography. Oh, that's a good comparison.

00:03:36.909 --> 00:03:39.449
Yeah, because fans will fiercely categorize a

00:03:39.449 --> 00:03:42.210
band's eras, right? You've got the gritty, experimental

00:03:42.210 --> 00:03:45.150
indie years, and then there's the highly polished

00:03:45.150 --> 00:03:47.830
stadium rock era. Yes, exactly. And you don't

00:03:47.830 --> 00:03:49.789
just mix all those tracks together in one playlist

00:03:49.789 --> 00:03:52.009
if you're a serious archivist. You separate them

00:03:52.009 --> 00:03:53.990
because the context of their creation matters.

00:03:54.389 --> 00:03:57.110
Right, it respects the distinct epochs of the

00:03:57.110 --> 00:04:00.169
creative output. And here's my pushback on this.

00:04:00.699 --> 00:04:04.340
Why couldn't this just be one massive chronological

00:04:04.340 --> 00:04:07.500
list? I mean, Jimmy Fallon is Jimmy Fallon. Sure.

00:04:08.360 --> 00:04:11.240
If he plays a game on Late Night and then he

00:04:11.240 --> 00:04:13.439
plays that exact same game years later on The

00:04:13.439 --> 00:04:16.980
Tonight Show, does shifting time slots really

00:04:16.980 --> 00:04:19.139
fundamentally change the categorization of the

00:04:19.139 --> 00:04:22.209
game? Or is Wikipedia just being, I don't know,

00:04:22.370 --> 00:04:24.589
overly bureaucratic? It's a fair question, but

00:04:24.589 --> 00:04:27.509
it isn't bureaucracy, actually. It's a structural

00:04:27.509 --> 00:04:30.209
necessity driven by the philosophy of broadcast

00:04:30.209 --> 00:04:32.829
history. OK, how so? Well, first, just consider

00:04:32.829 --> 00:04:35.050
the sheer volume of content. I mean, a daily

00:04:35.050 --> 00:04:37.370
late night show produces an astronomical amount

00:04:37.370 --> 00:04:39.410
of discrete segments over the years. Oh, yeah,

00:04:39.529 --> 00:04:41.310
thousands of episodes. Right. If you put that

00:04:41.310 --> 00:04:43.750
all on one page, the user experience breaks down

00:04:43.750 --> 00:04:46.149
completely. The browser struggles to load it.

00:04:46.529 --> 00:04:48.649
You know, the reader can't navigate it. It's

00:04:48.649 --> 00:04:50.689
just a mess. Nobody wants to scroll for 10 minutes.

00:04:51.189 --> 00:04:53.509
Exactly. But more importantly, Late Night and

00:04:53.509 --> 00:04:56.129
The Tonight Show are treated by television historians

00:04:56.129 --> 00:04:59.529
as entirely separate entities. So they aren't

00:04:59.529 --> 00:05:01.689
just different titles for the same guy sitting

00:05:01.689 --> 00:05:03.870
behind a desk. Not at all. They have different

00:05:03.870 --> 00:05:05.910
production codes, different legacy lineages,

00:05:06.110 --> 00:05:08.610
different network mandates. Think about it. Late

00:05:08.610 --> 00:05:11.750
Night was a franchise built by David Letterman,

00:05:12.110 --> 00:05:14.990
then Conan O'Brien before Fallon even got there.

00:05:15.069 --> 00:05:18.949
Right. And The Tonight Show is like an institution.

00:05:19.100 --> 00:05:22.620
Yes. It carries the historical weight of Steve

00:05:22.620 --> 00:05:26.759
Allen, Jack Paar, and Johnny Carson. So by forcing

00:05:26.759 --> 00:05:29.579
this split, the set index article refuses to

00:05:29.579 --> 00:05:33.439
treat a single comedian's career as one continuous

00:05:33.439 --> 00:05:36.939
blurred timeline. It draws a hard line in the

00:05:36.939 --> 00:05:39.740
sand to preserve the institutional history of

00:05:39.740 --> 00:05:42.259
NBC's late night programming. OK, that actually

00:05:42.259 --> 00:05:44.060
makes a lot of sense. It's almost like a baseball

00:05:44.060 --> 00:05:46.660
player's statistics. Oh, yeah. Like, their stats

00:05:46.660 --> 00:05:49.370
for the minor leagues are kept completely separate

00:05:49.370 --> 00:05:51.430
from their stats in the major leagues, even though

00:05:51.430 --> 00:05:53.649
it's the exact same guy swinging the exact same

00:05:53.649 --> 00:05:55.610
bat. That's a great way to put it. So the set

00:05:55.610 --> 00:05:57.589
index is basically the manager making sure you

00:05:57.589 --> 00:05:59.550
know which ledger you're about to open before

00:05:59.550 --> 00:06:01.850
you start reading the numbers. Exactly. And this

00:06:01.850 --> 00:06:03.889
brings up the question of how users actually

00:06:03.889 --> 00:06:06.290
arrive at this ledger in the first place. Because

00:06:06.290 --> 00:06:09.110
the architecture of a doorway page is defined

00:06:09.110 --> 00:06:11.949
by the paths that lead to it. Right. And this

00:06:11.949 --> 00:06:13.970
is where the mechanics get wildly interactive.

00:06:14.290 --> 00:06:16.410
Yeah. I found this part so interesting. There

00:06:16.410 --> 00:06:18.750
are specific redirects that point directly to

00:06:18.750 --> 00:06:21.629
this page. Yes. Two of them in particular really

00:06:21.629 --> 00:06:24.310
stand out: first drafts of rock and history of

00:06:24.310 --> 00:06:27.290
rap. So if a user types the phrase first drafts

00:06:27.290 --> 00:06:30.050
of rock into the search bar, they don't get a

00:06:30.050 --> 00:06:32.410
page about the literal first drafts of rock music

00:06:32.410 --> 00:06:35.029
or unreleased studio tapes from the 1960s. Which

00:06:35.029 --> 00:06:37.629
you might expect them to. Right. Instead, the

00:06:37.629 --> 00:06:40.509
system automatically intercepts that query and

00:06:40.509 --> 00:06:42.750
redirects them to this Jimmy Fallon directory

00:06:42.750 --> 00:06:44.870
page, because that was the title of a popular

00:06:44.870 --> 00:06:47.490
sketch series he did. The encyclopedia is basically

00:06:47.490 --> 00:06:50.970
attempting to anticipate user intent. But that

00:06:50.970 --> 00:06:54.569
second redirect, history of rap, causes a massive

00:06:54.569 --> 00:06:57.629
multi-layered collision. Like, the disambiguation

00:06:57.629 --> 00:07:00.189
warning at the very top of this page is highly

00:07:00.189 --> 00:07:03.310
specific. It really is. It explicitly says...

00:07:03.420 --> 00:07:06.360
And I'm quoting here, history of rap redirects

00:07:06.360 --> 00:07:09.240
here. For the history of rap as a musical genre,

00:07:09.540 --> 00:07:11.680
see hip hop music. For the history of rapping

00:07:11.680 --> 00:07:14.860
as a vocal technique, see rapping section history.

00:07:15.040 --> 00:07:17.779
It is such a brilliant piece of traffic control.

00:07:17.899 --> 00:07:20.000
It's honestly absurd when you think about the

00:07:20.000 --> 00:07:22.439
real world implications of it. Oh, totally. The

00:07:22.439 --> 00:07:25.100
information ecosystem has to actively intercept

00:07:25.100 --> 00:07:29.829
people who are trying to research the profound

00:07:29.829 --> 00:07:34.029
historical origins of a global music genre, just

00:07:34.029 --> 00:07:36.310
in case they accidentally stumbled into a directory

00:07:36.310 --> 00:07:38.410
of Jimmy Fallon comedy sketches. Yeah, it has

00:07:38.410 --> 00:07:40.410
to catch them before they get lost. It's like,

00:07:40.410 --> 00:07:42.250
I don't know, it's like a comedian naming a sketch

00:07:42.250 --> 00:07:45.329
The Industrial Revolution, and suddenly the entire

00:07:45.329 --> 00:07:47.230
encyclopedia has a panic attack trying to route

00:07:47.230 --> 00:07:49.329
traffic. Yeah. I mean, how often does pop culture

00:07:49.329 --> 00:07:51.810
hijack actual historical terminology like this?

00:07:51.889 --> 00:07:54.569
Honestly, it happens constantly. And it represents

00:07:54.569 --> 00:07:57.410
one of the greatest ongoing challenges in information

00:07:57.410 --> 00:07:59.990
architecture. Oh yeah, you have a late-night

00:07:59.990 --> 00:08:04.600
comedy bit that decided to use the exact nomenclature,

00:08:04.680 --> 00:08:07.699
the literal phrase, history of rap, that describes

00:08:07.699 --> 00:08:10.439
a major cultural movement. And the thing is,

00:08:10.519 --> 00:08:14.199
the database is entirely blind to human motivation.

00:08:14.399 --> 00:08:15.800
Right. It doesn't know why you're typing it.

00:08:16.019 --> 00:08:18.939
Exactly. When you type history of rap into a

00:08:18.939 --> 00:08:21.639
search bar, the system doesn't know your context.

00:08:21.939 --> 00:08:24.139
It doesn't know if you're, say, a high school

00:08:24.139 --> 00:08:27.160
student writing a musicology paper on the 1970s

00:08:27.160 --> 00:08:29.660
Bronx, or if you're just a bored office worker

00:08:29.660 --> 00:08:32.340
looking for a vibe from last night's broadcast.

00:08:32.580 --> 00:08:35.080
So it has to assume every possible state of the

00:08:35.080 --> 00:08:37.679
user simultaneously. Yes, exactly. And notice

00:08:37.679 --> 00:08:40.240
how surgically precise that warning text is.

00:08:40.500 --> 00:08:42.059
It doesn't just say, not what you're looking

00:08:42.059 --> 00:08:43.679
for, click here. Right, it doesn't leave you

00:08:43.679 --> 00:08:46.600
hanging. No, it acts as an educational tool right

00:08:46.600 --> 00:08:48.840
there in the warning banner. It cleanly separates

00:08:48.840 --> 00:08:51.019
the sketch from the broader historical reality.

00:08:51.480 --> 00:08:53.539
But then, and this is the cool part, it goes

00:08:53.539 --> 00:08:55.940
a step further and separates the musical genre

00:08:55.940 --> 00:08:58.659
from the specific vocal technique. Oh, wow. It

00:08:58.659 --> 00:09:00.860
directs you to the hip hop music article for

00:09:00.860 --> 00:09:04.100
the genre, but directs you to a specific subsection

00:09:04.100 --> 00:09:06.460
of the rapping article for the vocal technique.

00:09:06.679 --> 00:09:10.080
That is so wild. It defines the nuanced difference

00:09:10.080 --> 00:09:12.879
between rap the genre and rapping the technique,

00:09:13.399 --> 00:09:15.419
all while you're literally just standing on the

00:09:15.419 --> 00:09:17.200
doorstep of a late night television directory.

00:09:17.360 --> 00:09:19.259
It really highlights how much work goes into

00:09:19.259 --> 00:09:21.840
this stuff. For sure. Because if this traffic

00:09:21.840 --> 00:09:24.539
cop didn't exist, the collision of pop culture

00:09:24.539 --> 00:09:27.500
and actual history would create a deeply confusing

00:09:27.500 --> 00:09:32.200
user experience. Human entertainment could inadvertently

00:09:32.200 --> 00:09:36.259
obscure actual historical facts in the digital

00:09:36.259 --> 00:09:38.399
record. And the directory ensures that doesn't

00:09:38.399 --> 00:09:41.509
happen. It manages the overlap of exact nomenclature

00:09:41.509 --> 00:09:43.929
without letting one concept cannibalize the other.

00:09:44.370 --> 00:09:46.210
OK, but intercepting traffic and routing users

00:09:46.210 --> 00:09:48.289
is only half the battle, right? Right. Because

00:09:48.289 --> 00:09:51.210
if we look under the hood of this specific page,

00:09:51.809 --> 00:09:53.870
there's a whole secondary layer of invisible

00:09:53.870 --> 00:09:56.950
labor keeping these links alive. Oh, the backend

00:09:56.950 --> 00:09:58.889
metadata. This is where it gets really good.

00:09:59.129 --> 00:10:01.190
Yeah. We have the visible structure, the links

00:10:01.190 --> 00:10:03.889
you click, the warnings you read. But the backend

00:10:03.889 --> 00:10:06.009
metadata tells a completely different story

00:10:06.200 --> 00:10:08.940
about how information is maintained. And there

00:10:08.940 --> 00:10:13.019
are two specific pieces of hidden data here that

00:10:13.019 --> 00:10:16.500
really highlight this. First, this page was edited

00:10:16.500 --> 00:10:19.460
very recently. It was last edited on March 5,

00:10:19.639 --> 00:10:23.440
2025. And second, there's a hidden category note

00:10:23.440 --> 00:10:26.259
attached to the page that simply reads, short

00:10:26.259 --> 00:10:28.559
description is different from Wikidata. And

00:10:28.559 --> 00:10:30.940
those two tiny details are incredibly telling

00:10:30.940 --> 00:10:33.419
about the nature of knowledge curation. Right.

00:10:33.740 --> 00:10:36.460
Because, OK, if this page is literally just a

00:10:36.460 --> 00:10:39.039
doorway pointing to two lists, if its entire

00:10:39.039 --> 00:10:40.919
function is just to say, go left for late night

00:10:40.919 --> 00:10:43.379
and go right for the Tonight Show, what is there

00:10:43.379 --> 00:10:46.240
left to edit in March of 2025? It's not like

00:10:46.240 --> 00:10:48.580
the basic premise of a directory changes over

00:10:48.580 --> 00:10:51.860
time. Are they just fixing typos, or is the architecture

00:10:51.860 --> 00:10:53.740
of the site itself just shifting under our feet?

00:10:53.840 --> 00:10:55.919
Well, it's the latter, really. We tend to think

00:10:55.919 --> 00:10:58.559
of encyclopedias as these static objects, like

00:10:58.559 --> 00:11:01.080
a book gets printed, put on a shelf, and the

00:11:01.080 --> 00:11:03.000
text stays the same forever. Right, it's done.

00:11:03.299 --> 00:11:06.200
But digital curation is a living, breathing,

00:11:06.460 --> 00:11:09.899
continuous process. The fact that a simple directory

00:11:09.899 --> 00:11:13.220
requires an edit in 2025 proves that maintenance

00:11:13.220 --> 00:11:16.399
never stops. I mean, an editor might be fixing

00:11:16.399 --> 00:11:18.759
a broken link or maybe a new redirect needed

00:11:18.759 --> 00:11:22.360
to be added, but more likely, the global categorization

00:11:22.360 --> 00:11:25.080
standards of the entire platform might have updated.

00:11:25.279 --> 00:11:27.980
Like a site-wide update. Exactly. And that would

00:11:27.980 --> 00:11:30.639
require a manual tweak to the backend code of

00:11:30.639 --> 00:11:33.279
this specific page just to keep it compliant

00:11:33.279 --> 00:11:35.570
with the wider ecosystem. It makes me think of

00:11:35.570 --> 00:11:38.809
Wikidata and these hidden categories as like

00:11:38.809 --> 00:11:41.470
the backstage crew of a massive theater production.

00:11:41.549 --> 00:11:43.570
Oh, absolutely. You know, you as the audience

00:11:43.570 --> 00:11:45.450
member, you're sitting in the dark watching the

00:11:45.450 --> 00:11:47.490
actors, which are the articles themselves, just

00:11:47.490 --> 00:11:49.610
doing their thing in the spotlight. Right. You

00:11:49.610 --> 00:11:51.830
never see the crew in all black pulling the ropes,

00:11:52.049 --> 00:11:54.409
adjusting the lighting, moving the sets. But

00:11:54.409 --> 00:11:57.269
if they stop working, the entire show just collapses

00:11:57.269 --> 00:12:00.370
on stage. That scaffolding is exactly what holds

00:12:00.370 --> 00:12:03.259
the visible Internet up. And that hidden category

00:12:03.259 --> 00:12:05.259
note you mentioned, short description is different

00:12:05.259 --> 00:12:09.139
from Wikidata, reveals a really fascinating friction

00:12:09.139 --> 00:12:12.019
in how that scaffolding operates. OK, explain

00:12:12.019 --> 00:12:15.720
that. Because what even is Wikidata? Well, for

00:12:15.720 --> 00:12:17.659
anyone unfamiliar with the back end of these

00:12:17.659 --> 00:12:20.279
platforms, Wikidata is a separate but deeply

00:12:20.279 --> 00:12:23.759
interconnected database. Wikipedia is meant to

00:12:23.759 --> 00:12:27.720
be read by humans. It has prose, sentences, paragraphs.

00:12:27.980 --> 00:12:29.940
Right, it's written in English. Whatever language.

00:12:30.480 --> 00:12:33.000
Exactly. But Wikidata is meant to be read entirely

00:12:33.000 --> 00:12:36.240
by machines. It is pure structured data organized

00:12:36.240 --> 00:12:40.179
into relationships. Oh, OK. So Wikipedia is like

00:12:40.179 --> 00:12:43.460
the text of a novel. And Wikidata is the barcode

00:12:43.460 --> 00:12:45.879
on the back of the dust jacket that tells the

00:12:45.879 --> 00:12:48.100
scanner what genre the book is. That is a perfect

00:12:48.100 --> 00:12:50.399
analogy. And every single article has a short

00:12:50.399 --> 00:12:52.220
description attached to it, which is just a little

00:12:52.220 --> 00:12:54.419
tag of a few words that explains what the page

00:12:54.419 --> 00:12:57.710
is at a glance. Ideally, the human-written description

00:12:57.710 --> 00:13:00.129
perfectly matches the machine-readable data

00:13:00.129 --> 00:13:02.889
on Wikidata. But here, they're out of sync. They

00:13:02.889 --> 00:13:04.830
don't match. So what are the actual consequences

00:13:04.830 --> 00:13:07.570
of that mismatch? Why does it matter if a few

00:13:07.570 --> 00:13:09.590
words are different? Does the internet break?

00:13:09.710 --> 00:13:12.750
It doesn't break, but it matters immensely because

00:13:12.750 --> 00:13:15.529
of how modern technology interacts with the semantic

00:13:15.529 --> 00:13:19.110
web. See, Wikidata is the backbone feeding information

00:13:19.110 --> 00:13:21.269
to search algorithms, artificial intelligence

00:13:21.269 --> 00:13:24.149
models, and virtual voice systems. Oh, wow. I

00:13:24.149 --> 00:13:26.789
didn't realize that. Yeah. So if you ask your

00:13:26.789 --> 00:13:29.409
smart speaker to pull up information on Jimmy

00:13:29.409 --> 00:13:32.149
Fallon's sketches, the assistant is actually

00:13:32.149 --> 00:13:35.789
querying Wikidata, not reading the prose on Wikipedia.

00:13:36.009 --> 00:13:38.409
Oh. So if the human description says, you know,

00:13:38.549 --> 00:13:41.230
list of bits by Jimmy Fallon, but the machine

00:13:41.230 --> 00:13:43.269
readable data is coded slightly differently.

00:13:43.710 --> 00:13:46.769
Say it's coded as television series episode list.

00:13:47.259 --> 00:13:49.940
The algorithm gets confused. Oh, I see. It might

00:13:49.940 --> 00:13:52.279
try to play a full episode on your TV instead

00:13:52.279 --> 00:13:54.120
of returning the text directory you actually

00:13:54.120 --> 00:13:56.220
wanted to look at. That makes total sense. So

00:13:56.220 --> 00:13:59.340
that hidden category tag is basically like an

00:13:59.340 --> 00:14:01.399
automated flare going up in the back end. Yes.

00:14:01.559 --> 00:14:04.820
It's signaling to the volunteer archivists, hey,

00:14:05.200 --> 00:14:07.899
we have a database disagreement here. The human

00:14:07.899 --> 00:14:10.019
categorization and the machine categorization

00:14:10.019 --> 00:14:12.279
have drifted apart, and someone needs to come

00:14:12.279 --> 00:14:15.679
in and manually harmonize how we're defining

00:14:15.679 --> 00:14:18.309
this piece of pop culture. Exactly. It proves

00:14:18.309 --> 00:14:20.769
that categorizing human entertainment is not

00:14:20.769 --> 00:14:24.009
an exact science at all. It is highly subjective.

00:14:24.710 --> 00:14:26.809
The terminology one person uses to summarize

00:14:26.809 --> 00:14:29.950
a page can easily conflict with the rigid formatting

00:14:29.950 --> 00:14:32.250
required by an algorithm. It's just so messy.

00:14:32.440 --> 00:14:35.220
It is. And the invisible labor of thousands of

00:14:35.220 --> 00:14:37.759
editors is constantly working to resolve those

00:14:37.759 --> 00:14:40.480
micro-frictions so the end user never experiences

00:14:40.480 --> 00:14:42.759
a glitch. That completely changes how you view

00:14:42.759 --> 00:14:44.940
a simple internet search. Doesn't it? Yeah. I

00:14:44.940 --> 00:14:47.159
mean, you type in first drafts of rock, and boom,

00:14:47.259 --> 00:14:49.600
the page loads instantly. You just assume it

00:14:49.600 --> 00:14:51.639
works by magic. You don't realize there are traffic

00:14:51.639 --> 00:14:54.340
cops and stagehands and entire databases arguing

00:14:54.340 --> 00:14:56.379
with each other in the background just to deliver

00:14:56.379 --> 00:14:58.919
that one specific link to your screen. It truly

00:14:58.919 --> 00:15:01.620
is a monumental achievement of collaborative

00:15:01.620 --> 00:15:05.120
organization. Even the smallest, seemingly most

00:15:05.120 --> 00:15:08.519
insignificant pages are inextricably tied into

00:15:08.519 --> 00:15:12.000
this massive pulsating web of continuous curation

00:15:12.000 --> 00:15:14.259
and metadata. It's incredible. I mean, we started

00:15:14.259 --> 00:15:17.120
today with what looked like the most mundane

00:15:17.120 --> 00:15:19.960
technical page imaginable. Just a total skeleton

00:15:19.960 --> 00:15:22.299
of a page. Right. A set index article pointing

00:15:22.299 --> 00:15:24.759
to late night television bits. But by looking

00:15:24.759 --> 00:15:27.620
closely at its architecture, we discovered an

00:15:27.620 --> 00:15:31.100
incredibly rich intersection of forces. We saw

00:15:31.100 --> 00:15:33.820
how pop culture taxonomy respects the shifting

00:15:33.820 --> 00:15:37.100
eras of broadcast history, meticulously splitting

00:15:37.100 --> 00:15:39.659
Late Night from The Tonight Show to preserve

00:15:39.659 --> 00:15:42.159
institutional legacy. And we witnessed the highly

00:15:42.159 --> 00:15:44.899
necessary collision of historical disambiguation,

00:15:45.059 --> 00:15:47.399
where a comedy bit forces the platform to implement

00:15:47.399 --> 00:15:49.659
surgical traffic control so it doesn't accidentally

00:15:49.659 --> 00:15:52.700
overwrite the history of the entire hip hop genre.

00:15:52.799 --> 00:15:54.240
Which is still my favorite part of this whole

00:15:54.240 --> 00:15:57.299
thing. It's so good. And we've peeled back the

00:15:57.299 --> 00:15:59.860
curtain to look at the invisible labor, the ongoing

00:15:59.860 --> 00:16:02.679
maintenance required in 2025, and the database

00:16:02.679 --> 00:16:05.200
conflicts that prove organizing human knowledge

00:16:05.200 --> 00:16:09.039
is just a messy, subjective process of harmonizing

00:16:09.039 --> 00:16:12.139
human language with machine-readable data. It

00:16:12.139 --> 00:16:14.159
really makes you appreciate the doorway itself.

00:16:14.460 --> 00:16:17.639
Once you see the scaffolding holding the information

00:16:17.639 --> 00:16:19.860
ecosystem together, you just can't unsee it.

00:16:19.899 --> 00:16:22.059
You really can't. So here's something to mull

00:16:22.059 --> 00:16:23.679
over the next time you're surfing the web, clicking

00:16:23.679 --> 00:16:26.259
from link to link. We build these incredibly

00:16:26.259 --> 00:16:28.399
complex encyclopedias, and we assume they're

00:16:28.399 --> 00:16:30.200
for us right now in the present moment. Right.

00:16:30.360 --> 00:16:32.700
But imagine if this specific archive somehow

00:16:32.700 --> 00:16:35.539
survives intact for hundreds of years. Think

00:16:35.539 --> 00:16:38.580
about future historians centuries from now trying

00:16:38.580 --> 00:16:41.320
to interpret these specific disambiguation pages.

00:16:41.399 --> 00:16:44.399
Oh. Right. Like, will they understand the cultural

00:16:44.399 --> 00:16:46.320
context? Will they look at the architecture of

00:16:46.320 --> 00:16:49.159
the database and assume that the history of rap

00:16:49.159 --> 00:16:52.659
late night sketch was just as historically significant

00:16:52.659 --> 00:16:56.389
as the actual music, simply because the system

00:16:56.389 --> 00:16:59.830
had to treat them as equals, requiring disambiguation.

00:16:59.990 --> 00:17:03.049
That's a wild thought. What do our virtual doorways

00:17:03.049 --> 00:17:05.150
say about the world we are living in right now?

00:17:06.089 --> 00:17:07.890
The next time you find yourself stuck in a rabbit

00:17:07.890 --> 00:17:10.289
hole, don't just look at where you ended up.

00:17:11.009 --> 00:17:12.769
Pay attention to how the internet guided you

00:17:12.769 --> 00:17:13.049
there.
