WEBVTT

00:00:00.000 --> 00:00:03.180
Imagine being ordered by an absolute monarch

00:00:03.180 --> 00:00:07.960
to just manually track down, verify, and then

00:00:07.960 --> 00:00:10.839
publish 49,000 distinct pieces of literature.

00:00:10.980 --> 00:00:13.279
Yeah, and no pressure, right? Right. Like, if

00:00:13.279 --> 00:00:17.320
you miss a crucial document or if you compile

00:00:17.320 --> 00:00:19.640
the wrong version of a really sensitive text,

00:00:20.160 --> 00:00:23.250
the emperor might literally have your head. Literally.

00:00:23.390 --> 00:00:24.969
Oh, and by the way, you have to do all of this

00:00:24.969 --> 00:00:26.870
without a computer, without a search engine,

00:00:26.949 --> 00:00:30.010
without a modern printing press, and without

00:00:30.010 --> 00:00:32.490
any pre-existing organizational software whatsoever.

00:00:32.570 --> 00:00:34.649
It's just wild to even think about. It really

00:00:34.649 --> 00:00:37.109
is. So welcome to the deep dive. Today we are

00:00:37.109 --> 00:00:40.090
traveling back to the year 1705 to look at one

00:00:40.090 --> 00:00:42.689
of the most astonishing data collection projects

00:00:42.689 --> 00:00:44.189
in human history. Yeah, we're talking about the

00:00:44.189 --> 00:00:47.009
creation of the Complete Tang Poems, which is

00:00:47.009 --> 00:00:49.950
known in Chinese as the Quan Tangshi. Exactly.

00:00:50.859 --> 00:00:52.979
You know, for you listening right now, if you've

00:00:52.979 --> 00:00:55.179
ever felt overwhelmed by information overload,

00:00:56.079 --> 00:00:59.140
like just managing your inbox or your photo app.

00:00:59.140 --> 00:01:02.140
Oh, yeah, constantly. Wait until you hear how

00:01:02.140 --> 00:01:04.260
they handled data collection in the 18th century.

00:01:05.219 --> 00:01:08.200
Because it is, well, it's just mind blowing.

00:01:08.319 --> 00:01:12.760
It really is a master class in information architecture

00:01:12.760 --> 00:01:15.879
because, you know, we tend to act as though we

00:01:15.879 --> 00:01:19.500
just invented the concept of compiling and categorizing

00:01:19.500 --> 00:01:22.280
vast amounts of human output in the digital age.

00:01:22.500 --> 00:01:26.000
Right, like we own big data. Exactly. But the

00:01:26.000 --> 00:01:29.739
drive to capture, collate, and preserve a massive

00:01:29.739 --> 00:01:33.159
cultural data set that is centuries old and the

00:01:33.159 --> 00:01:35.159
mechanics of how they actually pulled it off

00:01:35.159 --> 00:01:37.819
back then are staggering. Okay. Let's unpack

00:01:37.819 --> 00:01:40.200
this because the sheer scale of what we're dealing

00:01:40.200 --> 00:01:42.719
with here is, I mean, it's almost comical. It

00:01:42.719 --> 00:01:45.319
really is. The complete Tang poems is exactly

00:01:45.319 --> 00:01:47.819
what it sounds like, but on a level that completely

00:01:47.819 --> 00:01:50.140
defies logic. Like we mentioned, we're talking

00:01:50.140 --> 00:01:53.400
about compiling roughly 49,000 poems. Yeah.

00:01:53.719 --> 00:01:56.739
And these poems were written by more than 2,200

00:01:56.739 --> 00:01:59.900
different poets. And just to establish the

00:01:59.900 --> 00:02:02.560
timeline here, this monumental project was not happening

00:02:02.560 --> 00:02:04.680
during the Tang dynasty itself. Right. That's

00:02:04.680 --> 00:02:07.140
a huge point. Yeah. It was commissioned much,

00:02:07.159 --> 00:02:10.979
much later, in 1705, under the direction of the

00:02:10.979 --> 00:02:13.909
Qing dynasty. Specifically, the Kangxi Emperor.

00:02:14.110 --> 00:02:15.590
I just, I keep trying to wrap my head around

00:02:15.590 --> 00:02:19.069
that number. 49,000 poems. That's a lot of paper.

00:02:19.210 --> 00:02:21.169
It's an incredible amount of paper. To put this

00:02:21.169 --> 00:02:24.069
in perspective for you listening, compiling that

00:02:24.069 --> 00:02:27.009
much specific literature without a search engine,

00:02:27.729 --> 00:02:29.810
it's essentially the equivalent of trying to

00:02:29.810 --> 00:02:33.409
manually transcribe and categorize every single

00:02:33.409 --> 00:02:37.449
meaningful tweet or blog or social media post

00:02:37.449 --> 00:02:40.110
from a golden era of the internet. Oh, wow. Yeah,

00:02:40.110 --> 00:02:42.030
that's a great way to look at it. Right. Imagine

00:02:42.030 --> 00:02:44.270
someone telling you to go out and collect the

00:02:44.270 --> 00:02:46.849
49,000 most important posts from the last decade.

00:02:46.969 --> 00:02:49.189
But you can't use a screen. No screens allowed.

00:02:49.430 --> 00:02:51.710
None. You have to track down people's physical

00:02:51.710 --> 00:02:54.289
diaries. You have to travel on horseback to different

00:02:54.289 --> 00:02:56.430
cities to look at local archives and physically

00:02:56.430 --> 00:02:58.770
carry stacks of paper back to your office. It's

00:02:58.770 --> 00:03:01.289
a logistical nightmare. But what's fascinating

00:03:01.289 --> 00:03:04.669
here is the cultural and political motivation behind that

00:03:04.669 --> 00:03:08.590
directive. Well, this was the first of the great

00:03:08.590 --> 00:03:11.330
literary projects for which the Manchu dynasty,

00:03:11.469 --> 00:03:13.909
the Qing dynasty, became famous. You really have

00:03:13.909 --> 00:03:15.710
to look at the power dynamics at the time. OK.

00:03:16.210 --> 00:03:18.810
Here is the Kangxi emperor, and he's issuing

00:03:18.810 --> 00:03:22.569
this massive imperial mandate to preserve the

00:03:22.569 --> 00:03:25.310
lyric poems, the shi, of an entirely previous

00:03:25.310 --> 00:03:27.849
era. Wait, let me push back on that first. Why

00:03:27.849 --> 00:03:30.330
do it? Yeah, I mean, the Qing emperors were the

00:03:30.330 --> 00:03:32.969
ones in charge. They had the army. They had the

00:03:32.969 --> 00:03:36.240
throne. Why on earth do they care about poetry

00:03:36.240 --> 00:03:39.000
written hundreds of years before they even took

00:03:39.000 --> 00:03:41.659
power? It seems counterintuitive, right? It just

00:03:41.659 --> 00:03:43.979
seems like a massive waste of resources and time

00:03:43.979 --> 00:03:46.520
for a relatively new regime. Like, focus on the

00:03:46.520 --> 00:03:49.039
present. Well, it is actually a brilliant strategic

00:03:49.039 --> 00:03:51.699
move. You see, the Qing rulers were Manchus.

00:03:51.919 --> 00:03:54.180
They were essentially foreign conquerors who

00:03:54.180 --> 00:03:56.800
had taken control of China. So they faced this

00:03:56.800 --> 00:03:59.740
constant underlying crisis of legitimacy among

00:03:59.740 --> 00:04:02.900
the Han Chinese elite. Oh, I see. Yeah. So by

00:04:02.900 --> 00:04:04.919
taking ownership of the defining literature of

00:04:04.919 --> 00:04:07.879
a past Chinese golden age, the Tang dynasty,

00:04:08.039 --> 00:04:10.060
the Kangxi Emperor is essentially asserting

00:04:10.060 --> 00:04:13.319
cultural dominance and continuity. Ah. So it's

00:04:13.319 --> 00:04:15.979
a massive political flex. Precisely. It's a huge

00:04:15.979 --> 00:04:19.519
flex. He is saying to the people, we, the current

00:04:19.519 --> 00:04:22.579
rulers, are the ultimate stewards and protectors

00:04:22.579 --> 00:04:25.899
of your greatest cultural achievements. It proves

00:04:25.899 --> 00:04:28.839
they possess the mandate of heaven. It is just

00:04:28.839 --> 00:04:31.920
an incredible demonstration of soft power, but

00:04:31.920 --> 00:04:34.779
executed on an industrial scale. By compiling

00:04:34.779 --> 00:04:37.199
the complete collection, they control the narrative

00:04:37.199 --> 00:04:40.089
of history itself. You know, standing up in a

00:04:40.089 --> 00:04:41.810
palace and declaring that you own history, that

00:04:41.810 --> 00:04:44.089
doesn't actually put books on a shelf. No, it

00:04:44.089 --> 00:04:46.069
definitely doesn't. The Kangxi Emperor has the

00:04:46.069 --> 00:04:49.790
vision, sure. But he is completely separated

00:04:49.790 --> 00:04:52.750
from the physical reality of finding 49,000

00:04:52.750 --> 00:04:54.829
scattered poems. He's not out there riding the

00:04:54.829 --> 00:04:57.189
horses. Exactly. To actually pull this off without

00:04:57.189 --> 00:04:58.970
the internet, he needs a human search engine.

00:04:59.290 --> 00:05:01.529
He needs a project manager who can deliver the

00:05:01.529 --> 00:05:04.750
impossible. And enter Cao Yin. Cao Yin. Yes.

00:05:05.189 --> 00:05:07.930
The emperor issued an edict directly to Cao Yin

00:05:07.930 --> 00:05:10.389
to compile and publish all these surviving lyric

00:05:10.389 --> 00:05:12.790
poems. Now, Cao Yin was a really interesting

00:05:12.790 --> 00:05:14.730
guy. He was a literary figure in his own right.

00:05:14.889 --> 00:05:18.220
But crucially, he was a trusted imperial bond

00:05:18.220 --> 00:05:20.819
servant. Okay, what exactly does that mean in

00:05:20.819 --> 00:05:24.819
this context? An imperial bond servant? Because

00:05:24.819 --> 00:05:28.180
that sounds intense. It is intense. Cao Yan wasn't

00:05:28.180 --> 00:05:31.040
just like a regular government employee or a

00:05:31.040 --> 00:05:33.139
hired contractor who could just quit. Right.

00:05:33.480 --> 00:05:36.180
In the Qing dynasty, being an imperial bond servant

00:05:36.180 --> 00:05:38.980
meant his family was legally bound directly to

00:05:38.980 --> 00:05:42.139
the emperor. Oh, wow. Yeah. It is a role of absolute

00:05:42.750 --> 00:05:45.589
inescapable loyalty. His entire existence, his

00:05:45.589 --> 00:05:48.129
wealth, his status, everything was tied to the

00:05:48.129 --> 00:05:50.610
throne. So the stakes were astronomically high

00:05:50.610 --> 00:05:53.189
for him personally. Exactly. And this tells you

00:05:53.189 --> 00:05:56.189
exactly how highly classified and culturally

00:05:56.189 --> 00:05:58.889
sensitive the emperor considered this data project.

00:05:59.129 --> 00:06:01.610
You don't hand it to just anyone. No, you don't

00:06:01.610 --> 00:06:03.829
hand the preservation of the empire's cultural

00:06:03.829 --> 00:06:06.389
legacy to some random bureaucrat. You hand it

00:06:06.389 --> 00:06:08.730
to your absolute most trusted operative. And

00:06:08.730 --> 00:06:10.569
Kao Yan didn't do it alone, right? He had to

00:06:10.569 --> 00:06:12.769
build this massive production machine. A huge

00:06:12.769 --> 00:06:14.889
machine. From what I read, he was assigned nine

00:06:14.889 --> 00:06:17.430
scholars from the Hanlin Academy just to oversee

00:06:17.430 --> 00:06:20.569
the collation of the texts and the Hanlin Academy.

00:06:20.649 --> 00:06:24.000
That's essentially the absolute elite, right?

00:06:24.319 --> 00:06:27.860
Like the Harvard, Oxford, and MIT of the Empire

00:06:27.860 --> 00:06:30.319
all rolled into one. That is a perfect way to

00:06:30.319 --> 00:06:32.220
describe it. These were the smartest guys in

00:06:32.220 --> 00:06:34.660
the room. They were out there comparing texts

00:06:34.660 --> 00:06:37.720
from various libraries, private collections,

00:06:38.160 --> 00:06:41.160
just dealing with mountains of unorganized data.

00:06:41.319 --> 00:06:44.399
Okay, so the intellectual firepower was immense.

00:06:44.500 --> 00:06:47.839
Oh, unparalleled. But the physical production

00:06:47.839 --> 00:06:51.089
is where the logistics become truly staggering.

00:06:51.350 --> 00:06:53.170
Right, the actual printing. Yeah, they had more

00:06:53.170 --> 00:06:55.689
than 100 craftsmen working on the printing. They

00:06:55.689 --> 00:06:59.529
had to specially procure the paper just to handle

00:06:59.529 --> 00:07:02.350
a job of this magnitude. Just finding enough

00:07:02.350 --> 00:07:05.569
of the right paper in 1705 would be a nightmare.

00:07:05.850 --> 00:07:07.829
Exactly, and remember they were carving wood

00:07:07.829 --> 00:07:10.410
blocks for printing. Let's really talk about

00:07:10.410 --> 00:07:12.240
the mechanics of that because it's wild. This

00:07:12.240 --> 00:07:14.959
isn't setting movable metal type. No. Like if

00:07:14.959 --> 00:07:16.779
you make a typo with movable type, you just pluck

00:07:16.779 --> 00:07:18.639
out a little metal letter A and swap it for an

00:07:18.639 --> 00:07:21.959
E. Right, it's an easy fix. But with woodblocks...

00:07:21.550 --> 00:07:24.490
Every single page had to be meticulously carved

00:07:24.490 --> 00:07:27.209
into a solid block of wood by hand. By hand.

00:07:27.790 --> 00:07:30.529
And the characters are carved in relief. So if

00:07:30.529 --> 00:07:33.629
a craftsman carves one wrong character, just

00:07:33.629 --> 00:07:36.170
one slip of the chisel on a page containing hundreds

00:07:36.170 --> 00:07:39.910
of words, the entire block is ruined. Oh my god.

00:07:39.990 --> 00:07:42.129
They have to literally shave the wood down and

00:07:42.129 --> 00:07:44.930
start the entire page over from scratch. So the

00:07:44.930 --> 00:07:47.800
error rate had to be virtually zero. Here's where

00:07:47.800 --> 00:07:51.000
it gets really interesting to me. Because of

00:07:51.000 --> 00:07:54.040
those wood blocks, Cao Yan actually had to train

00:07:54.040 --> 00:07:56.680
his calligraphers in a common style of writing

00:07:56.680 --> 00:08:00.620
before they even began carving. Yes. Think about

00:08:00.620 --> 00:08:02.740
the friction that solves. I mean, if you have

00:08:02.740 --> 00:08:06.050
49 ,000 files, coming from hundreds of different

00:08:06.050 --> 00:08:08.310
diaries, regional archives, private collections.

00:08:08.870 --> 00:08:10.970
They all look completely different. Oh, completely.

00:08:11.189 --> 00:08:12.810
Everyone has different handwriting, different

00:08:12.810 --> 00:08:15.269
shorthand. Exactly. The visual chaos would have

00:08:15.269 --> 00:08:16.930
been completely unreadable. It would have been

00:08:16.930 --> 00:08:19.110
a mess. So it's less about picking a pretty font

00:08:19.110 --> 00:08:21.290
and more about building, like, the first unified

00:08:21.290 --> 00:08:22.889
operating system. Right. That's a great analogy.

00:08:23.310 --> 00:08:25.569
If Cal Yen doesn't source a universal visual

00:08:25.569 --> 00:08:28.980
code, a standardized calligraphy... The reader

00:08:28.980 --> 00:08:31.519
would get syntactic whiplash trying to decipher

00:08:31.519 --> 00:08:33.840
a new handwriting style on every single page.

00:08:33.960 --> 00:08:36.480
Right. He literally had to invent a universal

00:08:36.480 --> 00:08:39.940
user interface for an 18th century database just

00:08:39.940 --> 00:08:42.340
so those 100 craftsmen could carve a visually

00:08:42.340 --> 00:08:45.470
cohesive product. And if we connect this to the

00:08:45.470 --> 00:08:48.490
bigger picture, it really highlights the immense

00:08:48.490 --> 00:08:51.250
crushing pressure Kao Yen was operating under.

00:08:51.250 --> 00:08:52.850
Because of the timeline. Because of everything.

00:08:53.269 --> 00:08:56.070
You have the emperor's mandate, the elite Hanlin

00:08:56.070 --> 00:08:58.710
scholars breathing down your neck, the specially

00:08:58.710 --> 00:09:02.009
procured paper, the standardized visual UI, and

00:09:02.009 --> 00:09:04.809
100 craftsmen carving unforgiving wood. It's

00:09:04.809 --> 00:09:07.029
a miracle it got done at all. And this massive

00:09:07.029 --> 00:09:09.700
production machine moved incredibly fast. The

00:09:09.700 --> 00:09:11.799
work was finished in a remarkably short time

00:09:11.799 --> 00:09:14.960
for the era. Yet, despite that breakneck speed,

00:09:15.220 --> 00:09:17.559
Khao In still felt the need to apologize to the

00:09:17.559 --> 00:09:20.340
emperor for the delay. Wait, really? He pulls

00:09:20.340 --> 00:09:23.440
off a logistical miracle, finishes a nearly impossible

00:09:23.440 --> 00:09:26.259
task in record time, and he's still basically

00:09:26.259 --> 00:09:29.220
sending a sorry I'm late email to the absolute

00:09:29.220 --> 00:09:31.639
monarch? Pretty much. That tells you everything

00:09:31.639 --> 00:09:34.179
you need to know about the stress levels in that

00:09:34.179 --> 00:09:37.220
operation. It speaks volumes about the absolute

00:09:37.220 --> 00:09:40.220
authority of the Kangxi Emperor. And, you know,

00:09:40.220 --> 00:09:43.240
you see that same authority play out in the battle

00:09:43.240 --> 00:09:46.460
for credit over the final product. Oh, the credit

00:09:46.460 --> 00:09:49.639
battle, yes. There is a very revealing contradiction

00:09:49.639 --> 00:09:53.100
in how the book was officially recorded. So the

00:09:53.100 --> 00:09:56.179
emperor allowed Cao Yan's name to be listed first

00:09:56.179 --> 00:09:59.509
in the book itself. Which is nice. A nice gesture,

00:09:59.690 --> 00:10:02.129
acknowledging the project manager's Herculean

00:10:02.129 --> 00:10:05.370
effort. Nice, yes. But there's a catch. There's

00:10:05.370 --> 00:10:08.070
always a catch. A major catch. In the annotated

00:10:08.070 --> 00:10:10.909
catalog to the complete library of the four treasuries.

00:10:11.009 --> 00:10:14.090
Which is what? Exactly. It's basically the massive

00:10:14.090 --> 00:10:17.350
official bibliography of the entire empire. Ah,

00:10:17.470 --> 00:10:19.750
right. Right? So in that official permanent record,

00:10:20.250 --> 00:10:22.850
the complete Tang poems are officially listed

00:10:22.850 --> 00:10:26.230
as an imperial compilation or a newting. Meaning?

00:10:26.519 --> 00:10:28.320
I'll let you put your name on the title page,

00:10:28.500 --> 00:10:30.740
Cao. But in the official permanent record of

00:10:30.740 --> 00:10:33.860
the empire, this is my book. Exactly. The workers

00:10:33.860 --> 00:10:35.940
build the monument, but the emperor's name goes

00:10:35.940 --> 00:10:38.559
on the plaque. The ultimate credit always rolls

00:10:38.559 --> 00:10:41.679
up to the top. The emperor ensures he gets the

00:10:41.679 --> 00:10:44.879
ultimate glory for preserving the culture. But

00:10:44.879 --> 00:10:47.440
this brings up a huge issue, doesn't it? The

00:10:47.440 --> 00:10:49.840
pressure from the emperor forced Cao Yan's team

00:10:49.840 --> 00:10:52.950
to move. with absolute breakneck speed. Very

00:10:52.950 --> 00:10:55.730
fast. And as anyone who has ever rushed a massive

00:10:55.730 --> 00:10:58.389
data project knows, and you listening probably

00:10:58.389 --> 00:11:01.009
know this from your own work, when you move that

00:11:01.009 --> 00:11:03.889
fast, you are going to get bugs. Oh, guaranteed.

00:11:04.009 --> 00:11:05.870
You are going to get errors and you are going

00:11:05.870 --> 00:11:08.610
to drop files, which leads to the devastating

00:11:08.610 --> 00:11:11.629
irony of this whole project. The complete Tang

00:11:11.629 --> 00:11:14.879
poems is neither complete, nor completely reliable.

00:11:15.080 --> 00:11:17.580
No. The title is definitely a misnomer. Because

00:11:17.580 --> 00:11:20.000
the work was done in such haste, the editors

00:11:20.000 --> 00:11:22.740
took massive shortcuts. Like what? Well, when

00:11:22.740 --> 00:11:24.879
they found different versions of a poem, what

00:11:24.879 --> 00:11:27.340
we call variant texts, they didn't systematically

00:11:27.340 --> 00:11:30.580
justify or even indicate why they chose one version

00:11:30.580 --> 00:11:32.679
over another. They just picked one. Basically.

00:11:32.940 --> 00:11:34.759
Sometimes they offered a first choice and a list

00:11:34.759 --> 00:11:36.639
of variants, but often they just made a quick

00:11:36.639 --> 00:11:39.179
decision and moved on. Which, according to the

00:11:39.179 --> 00:11:42.299
sources, is categorized as being Definitely weak

00:11:42.299 --> 00:11:45.879
by modern academic standards. Yeah. Meaning they

00:11:45.879 --> 00:11:48.820
didn't show their work. Exactly. By modern standards,

00:11:48.919 --> 00:11:51.259
if you choose one word over another in a historical

00:11:51.259 --> 00:11:53.659
document, you write a footnote explaining your

00:11:53.659 --> 00:11:56.200
methodology. Right. You justify it. But Kalyan's

00:11:56.200 --> 00:11:58.440
team didn't have time for footnotes. They just

00:11:58.440 --> 00:12:01.220
picked a version. And the issues go way beyond

00:12:01.220 --> 00:12:04.860
editorial sloppiness. There were huge blind spots.

00:12:04.960 --> 00:12:08.200
They missed stuff. A lot of stuff. The compilers

00:12:08.200 --> 00:12:10.700
entirely ignored some poems or simply couldn't

00:12:10.700 --> 00:12:12.940
find others. For some major poets, there were

00:12:12.940 --> 00:12:15.279
actually better texts sitting in individually

00:12:15.279 --> 00:12:18.179
edited volumes that never made it into this supposedly

00:12:18.179 --> 00:12:20.720
definitive master collection. And it gets even

00:12:20.720 --> 00:12:23.740
crazier. Because centuries later, in the early

00:12:23.740 --> 00:12:27.080
20th century, a sealed cave library was discovered

00:12:27.080 --> 00:12:30.769
in Dunhuang. Yes. the Dunhuang caves. And inside

00:12:30.769 --> 00:12:33.610
this desert cave they found many additional poems

00:12:33.610 --> 00:12:37.049
and variant texts that the 1705 team completely

00:12:37.049 --> 00:12:40.269
missed. It was a massive discovery. Plus we know

00:12:40.269 --> 00:12:43.029
that a lot of poems listed in older Tang dynasty

00:12:43.029 --> 00:12:46.149
catalogs simply didn't survive to the 18th century

00:12:46.149 --> 00:12:48.970
because the imperial libraries had been burned

00:12:48.970 --> 00:12:51.149
and destroyed over the years. Right. They were

00:12:51.149 --> 00:12:53.730
working from a broken data set from the absolute

00:12:53.730 --> 00:12:57.700
start. They really were. The Dunhuang Cave discovery

00:12:57.700 --> 00:13:00.759
is a perfect visceral illustration of how fragile

00:13:00.759 --> 00:13:04.200
the historical record truly is. It's terrifying,

00:13:04.259 --> 00:13:07.519
honestly. It is! You can have the full backing

00:13:07.519 --> 00:13:10.019
of the imperial throne, unlimited resources,

00:13:10.580 --> 00:13:13.720
the smartest Hanlin scholars of your day, and

00:13:13.720 --> 00:13:16.419
it doesn't matter. You cannot compile what has

00:13:16.419 --> 00:13:19.100
been lost to fire or what is hidden away in a

00:13:19.100 --> 00:13:21.590
sealed cave in the desert. But, and I have to

00:13:21.590 --> 00:13:23.330
ask this, this makes me seriously question the

00:13:23.330 --> 00:13:25.970
whole endeavor. If the editors were just picking

00:13:25.970 --> 00:13:29.049
text variants on a whim, dropping files, and

00:13:29.049 --> 00:13:31.129
completely missing a bunch of stuff hidden in

00:13:31.129 --> 00:13:34.250
caves, should we even trust this collection?

00:13:34.330 --> 00:13:36.850
That's a fair question. Is it just hopelessly

00:13:36.850 --> 00:13:39.110
flawed as a historical document? Honestly, if

00:13:39.110 --> 00:13:41.269
a modern database was this full of holes and

00:13:41.269 --> 00:13:43.250
arbitrary choices, we'd probably throw it out

00:13:43.250 --> 00:13:45.470
and start over. Well, this raises an important

00:13:45.470 --> 00:13:48.190
question about how we judge historical preservation.

00:13:48.620 --> 00:13:51.840
You really have to weigh academic rigor against

00:13:51.840 --> 00:13:54.639
the very real threat of total erasure. Total

00:13:54.639 --> 00:13:58.080
erasure. Yes. By modern peer -reviewed standards,

00:13:58.360 --> 00:14:01.200
the editorial choices are absolutely weak. But

00:14:01.200 --> 00:14:03.639
consider the alternative. We just talked about

00:14:03.639 --> 00:14:06.039
how older imperial libraries had been destroyed.

00:14:06.600 --> 00:14:09.279
History is violent and literature is incredibly

00:14:09.279 --> 00:14:12.190
fragile. So it was a race against time. Absolutely

00:14:12.190 --> 00:14:14.450
a race against time. If Kauvian and his scholars

00:14:14.450 --> 00:14:16.970
had waited for absolute perfection, if they had

00:14:16.970 --> 00:14:19.690
spent decades debating every single variant text

00:14:19.690 --> 00:14:22.070
instead of moving in haste, the project might

00:14:22.070 --> 00:14:23.909
never have been completed. Right. The emperor

00:14:23.909 --> 00:14:26.509
could die. Exactly. They might have lost funding.

00:14:26.769 --> 00:14:28.889
Or a new emperor might have come along and canceled

00:14:28.889 --> 00:14:31.730
the entire operation. And then those 49 ,000

00:14:31.730 --> 00:14:33.769
poems might have been scattered and lost forever.

00:14:34.070 --> 00:14:36.490
Flawed preservation is better than perfect oblivion.

00:14:36.570 --> 00:14:40.250
Exactly. The complete Tang poems may not be a

00:14:40.250 --> 00:14:42.570
flawless definitive reflection of the Tang dynasty,

00:14:42.889 --> 00:14:45.330
but it is a monumental lifeboat that carried

00:14:45.330 --> 00:14:48.049
a vast amount of cultural heritage across the

00:14:48.049 --> 00:14:49.889
centuries. That's a beautiful way to put it.

00:14:50.129 --> 00:14:52.570
The shock of the Dunhuang cave discovery doesn't

00:14:52.570 --> 00:14:55.570
invalidate their work. It simply proves that

00:14:55.570 --> 00:14:58.210
complete is always a relative term. It completely

00:14:58.210 --> 00:15:00.669
shatters the illusion of the definitive edition.

00:15:01.769 --> 00:15:04.220
History isn't just what happened. History is

00:15:04.220 --> 00:15:06.580
just what survived the fire and the rush job

00:15:06.580 --> 00:15:09.059
editors. That is precisely the right way to look

00:15:09.059 --> 00:15:11.559
at it. So despite all the missing pieces, the

00:15:11.559 --> 00:15:13.759
arbitrary editing, and the crushing deadlines,

00:15:14.379 --> 00:15:18.360
Cal Yen's team still had an unfathomable amount

00:15:18.360 --> 00:15:20.559
of material sitting in front of them. Mountains

00:15:20.559 --> 00:15:24.159
of it. They had 49 ,000 poems that they did manage

00:15:24.159 --> 00:15:26.679
to collect. And now they had to figure out how

00:15:26.679 --> 00:15:29.620
to organize it all. How do you sort that much

00:15:29.620 --> 00:15:31.720
chaotic data so that a reader can actually use

00:15:31.720 --> 00:15:33.779
it? This is where the architecture of the collection

00:15:33.779 --> 00:15:36.980
becomes truly revealing. The complete Tang poems

00:15:36.980 --> 00:15:41.720
is divided into 754 distinct sections. 754, that

00:15:41.720 --> 00:15:44.259
is so granular. It is incredibly granular. The

00:15:44.259 --> 00:15:46.080
largest number of these sections are arranged

00:15:46.080 --> 00:15:48.340
quite logically by author. Okay, that makes sense.

00:15:48.509 --> 00:15:51.289
Yeah, and they even include brief biographies

00:15:51.289 --> 00:15:53.970
of the poets, which is immensely helpful for

00:15:53.970 --> 00:15:56.169
historical context, but they didn't stop there.

00:15:56.730 --> 00:15:59.269
Other sections are arranged by poetic form, like

00:15:59.269 --> 00:16:01.690
the yufu. Can you clarify what the yufu form

00:16:01.690 --> 00:16:04.460
is for those who might not know? Certainly. Yufu

00:16:04.460 --> 00:16:08.100
translates roughly to music bureau style. These

00:16:08.100 --> 00:16:10.379
were originally folk song style poems collected

00:16:10.379 --> 00:16:12.500
by the government, and they often dealt with

00:16:12.500 --> 00:16:15.279
the hardships of common life, war, or social

00:16:15.279 --> 00:16:18.179
injustice. OK, so a very specific grounded form

00:16:18.179 --> 00:16:21.460
of poetry. Exactly. But beyond author and form,

00:16:21.620 --> 00:16:24.080
the editors created sections arranged by highly

00:16:24.080 --> 00:16:26.820
specific subjects. So what does this all mean?

00:16:26.980 --> 00:16:28.659
How specific are we talking? Because when I was

00:16:28.659 --> 00:16:30.879
looking through this part of the breakdown, my

00:16:30.879 --> 00:16:33.259
jaw kind of dropped. It's quite the list. They

00:16:33.259 --> 00:16:35.899
had sections for emperors, sections for consorts.

00:16:36.179 --> 00:16:38.740
They had five entire sections dedicated solely

00:16:38.740 --> 00:16:41.919
to women. Yes. And then the list just goes completely

00:16:41.919 --> 00:16:43.740
off the rails in the best way possible. They

00:16:43.740 --> 00:16:46.039
have sections for monks, priests, spirits, ghosts,

00:16:46.240 --> 00:16:50.139
dreams, prophecy, proverbs, mystery, rumor, and

00:16:50.139 --> 00:16:53.299
drinking. It is a remarkable taxonomy. I mean,

00:16:53.779 --> 00:16:57.679
spirits, ghosts, rumor, and drinking. That sounds

00:16:57.679 --> 00:16:59.799
like an amazing weekend playlist. It really does.

00:16:59.919 --> 00:17:02.639
It feels exactly like scrolling through Netflix

00:17:02.639 --> 00:17:05.220
and finding one of those hyper niche algorithm

00:17:05.220 --> 00:17:08.220
categories like gritty sci fi thrillers featuring

00:17:08.220 --> 00:17:11.420
strong female leads, but applied to eighth century

00:17:11.420 --> 00:17:14.480
poetry. That's a great analogy because just like

00:17:14.480 --> 00:17:17.200
modern algorithms, the categorization serves

00:17:17.200 --> 00:17:20.140
a profound purpose. When you look at those categories,

00:17:20.420 --> 00:17:23.920
dreams, prophecy, mystery, rumor. You are not

00:17:23.920 --> 00:17:26.200
just looking at a filing system. You are looking

00:17:26.200 --> 00:17:28.880
at a map of a civilization's consciousness. It

00:17:28.880 --> 00:17:30.839
wasn't just about grouping words that rhymed

00:17:30.839 --> 00:17:32.920
or finding poems with the same syllable count.

00:17:33.099 --> 00:17:35.480
Not at all. By organizing the poems into these

00:17:35.480 --> 00:17:38.440
specific bins, the 1705 editors were effectively

00:17:38.440 --> 00:17:40.839
mapping out the psychological and social landscape

00:17:40.839 --> 00:17:43.740
of the Tang dynasty. It was about documenting

00:17:43.740 --> 00:17:46.099
the entirety of the human experience as it was

00:17:46.099 --> 00:17:48.619
lived and imagined during that era. Think about

00:17:48.619 --> 00:17:51.920
what a category like rumor implies. It's fascinating.

00:17:52.160 --> 00:17:54.859
It means they recognize poetry not just as high

00:17:54.859 --> 00:17:57.579
art for the elite, but as a vehicle for social

00:17:57.579 --> 00:18:00.519
commentary. It was the ancient equivalent of

00:18:00.519 --> 00:18:02.859
political subreddits whispers in the imperial

00:18:02.859 --> 00:18:06.039
court, the anxieties of the common people. Exactly.

00:18:06.420 --> 00:18:08.839
And a section dedicated to drinking captures

00:18:08.839 --> 00:18:12.160
the social rituals, the camaraderie, and perhaps

00:18:12.160 --> 00:18:14.880
the escapism of the time. They needed a blow

00:18:14.880 --> 00:18:17.470
off scheme. We all do. They were using these

00:18:17.470 --> 00:18:20.730
754 sections to build a comprehensive taxonomy

00:18:20.730 --> 00:18:23.829
of Tang life. They captured everything from the

00:18:23.829 --> 00:18:26.769
divine and the spiritual down to the absolute

00:18:26.769 --> 00:18:29.970
bottom of a wine glass. It really humanizes these

00:18:29.970 --> 00:18:32.470
ancient scholars. You realize they were dealing

00:18:32.470 --> 00:18:36.029
with the exact same messy, chaotic, beautiful

00:18:36.029 --> 00:18:37.970
human emotions that we deal with today. They

00:18:37.970 --> 00:18:40.230
absolutely were. They just needed a way to organize

00:18:40.230 --> 00:18:42.529
all those feelings into wood blocks. And it clearly

00:18:42.529 --> 00:18:45.349
worked because this massive collection had lasting

00:18:45.349 --> 00:18:48.069
impact. The complete Tang poems actually served

00:18:48.069 --> 00:18:50.730
as the major reservoir, the parent database essentially,

00:18:51.150 --> 00:18:53.609
for a much more famous, highly curated spin-off

00:18:53.609 --> 00:18:56.910
anthology called The 300 Tang Poems. Yes, and

00:18:56.910 --> 00:18:59.369
that perfectly illustrates the natural life cycle

00:18:59.369 --> 00:19:02.250
of information. First, you have the monumental,

00:19:02.369 --> 00:19:04.769
almost desperate effort to gather and preserve

00:19:04.769 --> 00:19:07.589
absolutely everything, the complete Tang poems.

00:19:07.829 --> 00:19:11.130
The huge data dump. Right. It is bulky, it is

00:19:11.130 --> 00:19:13.859
flawed, and it is overwhelming. Then comes the

00:19:13.859 --> 00:19:16.559
subsequent effort to distill that massive archive

00:19:16.559 --> 00:19:19.420
into something accessible, digestible, and easily

00:19:19.420 --> 00:19:22.799
transmittable, the 300 Tang poems. So they built

00:19:22.799 --> 00:19:26.900
the massive, unmanageable 49,000-poem archive,

00:19:27.259 --> 00:19:29.579
and then later someone came along and curated

00:19:29.579 --> 00:19:31.720
the best-of playlist that everyone actually

00:19:31.720 --> 00:19:34.160
ended up reading and memorizing. Exactly. The

00:19:34.160 --> 00:19:36.359
larger work ensured the survival of the material,

00:19:36.599 --> 00:19:38.720
while the shorter anthology ensured its popularity.

00:19:39.380 --> 00:19:42.390
Without the massive, flawed database, the curated

00:19:42.390 --> 00:19:45.450
masterpiece couldn't exist. It's all about curation.

00:19:45.750 --> 00:19:47.230
And honestly, looking back at all of this, I

00:19:47.230 --> 00:19:48.970
want to speak directly to you listening for a

00:19:48.970 --> 00:19:51.369
moment. Think about how you operate every single

00:19:51.369 --> 00:19:54.269
day. We are constantly trying to curate our own

00:19:54.269 --> 00:19:56.390
knowledge, aren't we? We're bookmarking articles

00:19:56.390 --> 00:19:58.509
we find interesting. We are saving deep dives

00:19:58.509 --> 00:20:01.490
to listen to later. We are meticulously organizing

00:20:01.490 --> 00:20:04.089
our photo albums and tagging our files. We're

00:20:04.089 --> 00:20:06.809
all little archivists. We really are. We're all

00:20:06.809 --> 00:20:08.569
trying to build our own little complete libraries

00:20:08.569 --> 00:20:10.849
to make sense of the world around us. The tools

00:20:10.849 --> 00:20:13.329
evolve, but the human imperative to remember

00:20:13.329 --> 00:20:16.450
and organize remains completely constant. Cao

00:20:16.450 --> 00:20:19.829
Yin did it with 100 craftsmen, specially procured

00:20:19.829 --> 00:20:22.849
paper, and thousands of carved wood blocks. We

00:20:22.849 --> 00:20:26.119
do it with a tap on a glass screen. But the underlying

00:20:26.119 --> 00:20:29.599
human desire, the absolute need to capture, organize,

00:20:29.779 --> 00:20:31.960
and hold onto brilliance before it fades away,

00:20:32.599 --> 00:20:35.119
is exactly the same today as it was in 1705.

00:20:35.240 --> 00:20:37.880
And we are still fighting the same battles against

00:20:37.880 --> 00:20:40.839
lost data and the relentless forward march of

00:20:40.839 --> 00:20:42.859
time. Which brings me to a final thought I want

00:20:42.859 --> 00:20:44.500
to leave you with, a little thought experiment

00:20:44.500 --> 00:20:47.140
to mull over long after this deep dive is done.

00:20:47.359 --> 00:20:49.829
Okay, let's hear it. Think about that overwhelming

00:20:49.829 --> 00:20:52.609
mountain of digital data we generate every single

00:20:52.609 --> 00:20:56.769
day. The open tabs, the endless feeds, the entire

00:20:56.769 --> 00:20:59.690
chaotic output of the modern internet. It's terrifying

00:20:59.690 --> 00:21:02.589
to think about. Imagine if, 300 years from now,

00:21:03.029 --> 00:21:05.529
a future ruler decided to compile the complete

00:21:05.529 --> 00:21:08.569
thoughts of our current era. If they tried to

00:21:08.569 --> 00:21:12.470
sort our entire digital lives into 754 specific

00:21:12.470 --> 00:21:15.609
categories. What would those categories even

00:21:15.609 --> 00:21:18.630
be? Oh, wow! Would they have sections for memes,

00:21:19.150 --> 00:21:22.549
outrage, unread newsletters, or late-night online

00:21:22.549 --> 00:21:24.630
shopping? It makes you wonder how much of what

00:21:24.630 --> 00:21:26.690
we consider essential today will be preserved

00:21:26.690 --> 00:21:28.710
and how much will just vanish into the margins

00:21:28.710 --> 00:21:31.690
of history. And more importantly, centuries from

00:21:31.690 --> 00:21:33.910
now, when the dust has settled on our civilization,

00:21:34.490 --> 00:21:37.150
what hidden cave library of our lost data will

00:21:37.150 --> 00:21:40.029
future archaeologists discover? A digital Dunhuang.

00:21:40.390 --> 00:21:42.869
Exactly. What forgotten server farm will they

00:21:42.869 --> 00:21:44.990
unearth out in the desert that proves our supposedly

00:21:44.990 --> 00:21:47.529
complete digital record was actually missing

00:21:47.529 --> 00:21:50.049
the most important voices of all? We're all just

00:21:50.049 --> 00:21:52.329
compiling our own flawed histories, hoping they

00:21:52.329 --> 00:21:54.849
survive the fire. We really are. So next time

00:21:54.849 --> 00:21:56.970
you feel overwhelmed by your open tabs, just

00:21:56.970 --> 00:22:00.390
remember Cao Yin and his 49,000 poems. You're

00:22:00.390 --> 00:22:02.329
not disorganized. You're just compiling your

00:22:02.329 --> 00:22:05.170
own historical archive. Thanks for joining us

00:22:05.170 --> 00:22:06.150
on this deep dive.
