WEBVTT

00:00:00.000 --> 00:00:03.620
Welcome, everyone, and a very special welcome

00:00:03.620 --> 00:00:06.099
to you, the learner. Yeah, welcome to the deep

00:00:06.099 --> 00:00:08.859
dive. You know who you are. You're the person

00:00:08.859 --> 00:00:13.839
who loves to peel back the layers of how we know

00:00:13.839 --> 00:00:16.079
what we know. Right, skipping the fluff and just

00:00:16.079 --> 00:00:18.000
getting right to the core of the matter. Exactly.

00:00:18.359 --> 00:00:20.539
And today our mission on this deep dive is to

00:00:20.539 --> 00:00:24.960
unpack a single, absolutely fascinating Wikipedia

00:00:24.960 --> 00:00:27.019
article about something called the Cambridge

00:00:27.019 --> 00:00:31.359
Reference Sequence, or the CRS. It is a critical

00:00:31.359 --> 00:00:33.740
piece of scientific history. It really is. But

00:00:33.740 --> 00:00:35.700
as we comb through our source material today,

00:00:35.759 --> 00:00:37.679
we uncover a foundation that was built with a

00:00:37.679 --> 00:00:39.920
rather colorful and sometimes, I mean, really

00:00:39.920 --> 00:00:42.179
chaotic methodology. Colorful is definitely one

00:00:42.179 --> 00:00:44.280
word for it. To set this up for you, the listener,

00:00:44.420 --> 00:00:46.939
I want you to imagine something. Okay. Imagine

00:00:46.939 --> 00:00:49.719
you were trying to read a highly detailed map

00:00:49.719 --> 00:00:53.109
of human ancestry. You're tracing the lines.

00:00:53.149 --> 00:00:54.950
You're looking for the very origins of humanity.

00:00:55.130 --> 00:00:57.409
And you're relying on this big, bold "you are

00:00:57.409 --> 00:01:00.210
here" marker on the map to orient yourself. Right.

00:01:00.270 --> 00:01:02.630
Now imagine finding out that the baseline marker

00:01:02.630 --> 00:01:04.689
you've been using to navigate the history of

00:01:04.689 --> 00:01:08.609
human DNA accidentally includes cow DNA. It sounds

00:01:08.609 --> 00:01:11.739
like an urban legend. Right. But it is entirely

00:01:11.739 --> 00:01:14.459
rooted in the actual history of how this sequence

00:01:14.459 --> 00:01:16.920
was developed and used by the global scientific

00:01:16.920 --> 00:01:19.599
community. It's just wild to think about. So

00:01:19.599 --> 00:01:22.340
here is our roadmap for today's deep dive. We're

00:01:22.340 --> 00:01:24.200
going to explore what the Cambridge reference

00:01:24.200 --> 00:01:27.420
sequence actually is and the immense effort it

00:01:27.420 --> 00:01:30.200
took to create it in the 1970s. And then we'll

00:01:30.200 --> 00:01:33.099
look at the absolutely fascinating lab mistakes

00:01:33.099 --> 00:01:35.879
that occurred in its original creation. Yes.

00:01:36.099 --> 00:01:39.480
And finally, we'll delve into the current, fiercely

00:01:39.480 --> 00:01:43.760
debated scientific schism over exactly how we

00:01:43.760 --> 00:01:46.560
should be tracing our mitochondrial family tree

00:01:46.560 --> 00:01:49.200
today. Okay, let's unpack this. Let's start by

00:01:49.200 --> 00:01:51.219
setting the scene and appreciating the sheer

00:01:51.219 --> 00:01:53.620
technological mountain scientists were climbing

00:01:53.620 --> 00:01:55.780
back then. Oh, absolutely. We need to go back

00:01:55.780 --> 00:01:58.920
in time to the 1970s. During this decade, a group

00:01:58.920 --> 00:02:00.599
of researchers at the University of Cambridge

00:02:00.599 --> 00:02:02.959
led by the legendary scientist Fred Sanger took

00:02:02.959 --> 00:02:05.540
on a monumental task. Right. They set out to

00:02:05.540 --> 00:02:08.340
sequence the human mitochondrial genome. Which,

00:02:08.439 --> 00:02:10.419
just to make sure we are all on the same page,

00:02:10.520 --> 00:02:13.479
is different from the main bulk of our DNA, right?

00:02:13.539 --> 00:02:17.259
The nuclear DNA. If our main DNA is the massive

00:02:17.259 --> 00:02:19.819
reference library inside the nucleus of the cell,

00:02:19.979 --> 00:02:23.860
the mitochondrial DNA is more like a tiny separate

00:02:23.860 --> 00:02:26.159
instruction manual. Down in the boiler room.

00:02:26.219 --> 00:02:28.500
Yeah, kept down in the boiler room, the mitochondria

00:02:28.500 --> 00:02:31.159
which generate power for the cell. That is a

00:02:31.159 --> 00:02:33.819
perfect way to visualize it. And the way Sanger's

00:02:33.819 --> 00:02:36.120
team built this map is crucial to everything

00:02:36.120 --> 00:02:38.419
that follows. They didn't take an average of

00:02:38.419 --> 00:02:40.599
thousands of people. No, they didn't. They sequenced

00:02:40.599 --> 00:02:43.599
the mitochondrial genome of just one single woman

00:02:43.599 --> 00:02:46.280
of European descent. Just one person. All the

00:02:46.280 --> 00:02:48.280
baseline data they gathered was from this one

00:02:48.280 --> 00:02:52.099
individual's biological material. And they labored

00:02:52.099 --> 00:02:54.219
over this throughout the 1970s, working entirely

00:02:54.219 --> 00:02:56.819
without the automated high-speed computers we

00:02:56.819 --> 00:02:58.780
have today. It's staggering to think about doing

00:02:58.780 --> 00:03:00.870
that level of genetic sequencing manually.

00:03:01.129 --> 00:03:02.990
And they finally published their sequence in

00:03:02.990 --> 00:03:06.349
1981. It was a massive triumph of 20th century

00:03:06.349 --> 00:03:08.569
biology. The source gives us some really specific

00:03:08.569 --> 00:03:10.550
statistics on what they found too, right? In

00:03:10.550 --> 00:03:13.349
that 1981 release. It does. First, they determined

00:03:13.349 --> 00:03:15.590
that this mitochondrial sequence contains some

00:03:15.590 --> 00:03:19.550
37 genes. Just 37 genes. I mean, compared to

00:03:19.550 --> 00:03:22.460
the tens of thousands in our main genome. That

00:03:22.460 --> 00:03:25.360
is tiny. It is a vanishingly small piece of our

00:03:25.360 --> 00:03:27.960
genetic makeup. To put the scale into perspective,

00:03:28.280 --> 00:03:33.080
this mitochondrial DNA represents just 0.0006%

00:03:33.080 --> 00:03:36.699
of the nuclear human genome. Wow. Yet because

00:03:36.699 --> 00:03:39.860
mitochondrial DNA is passed down almost exclusively

00:03:39.860 --> 00:03:43.039
from mother to child, it is an incredibly vital

00:03:43.039 --> 00:03:46.000
tool for tracing maternal ancestry. And the source

00:03:46.000 --> 00:03:48.199
also mentions the length of the sequence they

00:03:48.199 --> 00:03:50.020
mapped out. When we talk about the length of

00:03:50.020 --> 00:03:52.159
DNA, we talk about base pairs. Right, the rungs

00:03:52.159 --> 00:03:53.819
of the ladder. Yeah, the individual chemical

00:03:53.819 --> 00:03:56.639
letters that make up the DNA ladder, the A's, C's,

00:03:56.639 --> 00:04:00.180
T's, and G's. The 1981 publication announced

00:04:00.180 --> 00:04:02.580
that this woman's mitochondrial sequence had

00:04:02.580 --> 00:04:06.900
a length of exactly 16,569 base pairs. And they

00:04:06.900 --> 00:04:08.680
published that number with great confidence.

00:04:08.939 --> 00:04:11.159
They sure did. However, the source points out

00:04:11.159 --> 00:04:13.759
a brilliant, surprising detail right from the

00:04:13.759 --> 00:04:16.600
jump. Due to a mistake made during that original

00:04:16.600 --> 00:04:19.600
grueling manual sequencing process, the actual

00:04:19.600 --> 00:04:23.660
length of that specific sequence isn't 16,569.

00:04:23.839 --> 00:04:29.060
No, it is exactly 16,568 base pairs. They spent

00:04:29.060 --> 00:04:31.879
a decade mapping it by hand, and they were off

00:04:31.879 --> 00:04:34.740
by a single letter. What's fascinating here is

00:04:34.740 --> 00:04:36.839
that despite that mistake, and we will get into

00:04:36.839 --> 00:04:40.180
the more severe error shortly, this specific

00:04:40.180 --> 00:04:42.779
sequence from one single woman of European descent

00:04:42.779 --> 00:04:46.560
became the definitive baseline map for human

00:04:46.560 --> 00:04:50.019
mitochondrial genetics. For decades, it was the

00:04:50.019 --> 00:04:52.240
gold standard. It became the ultimate measuring

00:04:52.240 --> 00:04:55.180
stick for the entire globe. Right. Whenever scientists

00:04:55.180 --> 00:04:57.819
anywhere in the world wanted to look at mitochondrial

00:04:57.819 --> 00:05:01.079
DNA, they compared it to this Cambridge reference

00:05:01.079 --> 00:05:03.720
sequence. So if your DNA had an A where the

00:05:03.720 --> 00:05:06.319
Cambridge sequence had a T, that difference was

00:05:06.319 --> 00:05:08.639
logged as a variation. And because it became

00:05:08.639 --> 00:05:11.379
the global yardstick, other laboratories naturally

00:05:11.379 --> 00:05:13.680
began using it, testing it, and applying it to

00:05:13.680 --> 00:05:16.300
their own research. The scientific method relies

00:05:16.300 --> 00:05:19.100
on replication. But when other researchers finally

00:05:19.100 --> 00:05:21.279
started to repeat the sequencing that Sanger's

00:05:21.279 --> 00:05:23.699
team had done, they didn't just find that one

00:05:23.699 --> 00:05:25.860
missing base pair. No, they found quite a bit

00:05:25.860 --> 00:05:28.930
more. The article notes that they found what

00:05:28.930 --> 00:05:31.689
it calls striking discrepancies. They started

00:05:31.689 --> 00:05:34.069
looking closely at the original sequence and

00:05:34.069 --> 00:05:37.350
realized the 1981 map had some serious topographical

00:05:37.350 --> 00:05:40.110
errors. The scientific community methodically

00:05:40.110 --> 00:05:42.810
went through the 1981 sequence and eventually

00:05:42.810 --> 00:05:46.430
identified a total of 11 specific errors in the

00:05:46.430 --> 00:05:49.230
original published data. 11 errors. And these

00:05:49.230 --> 00:05:51.629
errors took a few different forms. There were

00:05:51.629 --> 00:05:54.069
incorrect assignments of single base pairs where

00:05:54.069 --> 00:05:56.930
the original team had essentially just misidentified

00:05:56.930 --> 00:06:00.139
the nucleotide at a specific position. Right. But

00:06:00.139 --> 00:06:02.339
there were also structural issues that required

00:06:02.339 --> 00:06:04.399
correction. I'll let you take the lead on the

00:06:04.399 --> 00:06:06.500
technical side of this because the source highlights

00:06:06.500 --> 00:06:09.040
one glaring structural error that really stood

00:06:09.040 --> 00:06:11.339
out. The most notable structural error occurred

00:06:11.339 --> 00:06:14.560
at a specific position known as 3107del. Okay.

00:06:14.860 --> 00:06:18.060
3107del. In the original 1981 sequence, the

00:06:18.060 --> 00:06:20.339
researchers recorded an extra base pair at that

00:06:20.339 --> 00:06:23.680
spot, labeling it as a C for cytosine. But it

00:06:23.680 --> 00:06:26.680
wasn't there. Exactly. We now know that extra

00:06:26.680 --> 00:06:29.180
base pair simply didn't exist in the actual DNA.

00:06:29.439 --> 00:06:31.319
So how do they fix that without throwing off

00:06:31.319 --> 00:06:33.600
all the numbers after it? Today, in order to

00:06:33.600 --> 00:06:35.959
maintain the overall numbering system, without

00:06:35.959 --> 00:06:38.339
shifting every subsequent number down by one,

00:06:38.379 --> 00:06:40.699
that position is labeled with an N. So they essentially

00:06:40.699 --> 00:06:43.120
put an N in there as a placeholder. Precisely.

00:06:43.120 --> 00:06:46.199
It just stands for an unspecified or unknown

00:06:46.199 --> 00:06:49.370
nucleotide. basically a blank space, to make

00:06:49.370 --> 00:06:51.689
up for the fact that the original C was a phantom

00:06:51.689 --> 00:06:54.870
letter. It acts as a structural bridge to keep

00:06:54.870 --> 00:06:57.310
the rest of the map's coordinates intact. Exactly.

00:06:57.610 --> 00:07:00.129
But the real kicker, and this is easily the most

00:07:00.129 --> 00:07:02.709
surprising revelation in the entire deep dive,

00:07:03.879 --> 00:07:06.579
the part that absolutely, completely changes how

00:07:06.579 --> 00:07:09.740
you view early genetic science is why some of

00:07:09.740 --> 00:07:12.000
these 11 errors happened. It really pulls back

00:07:12.000 --> 00:07:14.360
the curtain on the lab work of that era. You'd

00:07:14.360 --> 00:07:16.379
think it was just a smudge on a photographic

00:07:16.379 --> 00:07:19.279
plate or human fatigue from staring at data.

00:07:19.420 --> 00:07:21.959
The reality is much more reflective of the gritty

00:07:21.959 --> 00:07:25.259
nature of 1970s laboratory work. The article

00:07:25.259 --> 00:07:27.540
states explicitly that some of these errors were

00:07:27.540 --> 00:07:30.300
the result of biological contamination. Contamination.

00:07:30.399 --> 00:07:32.699
The specimens used in the original sequencing

00:07:32.699 --> 00:07:35.360
were contaminated with HeLa specimens. For those

00:07:35.360 --> 00:07:37.680
who might not be familiar, HeLa cells are incredibly

00:07:37.680 --> 00:07:40.560
famous in biology. They stem from a woman named

00:07:40.560 --> 00:07:43.519
Henrietta Lacks. And they are essentially the

00:07:43.519 --> 00:07:46.959
first immortal human cell line. They are famously

00:07:46.959 --> 00:07:49.120
resilient to the point where they are notorious

00:07:49.120 --> 00:07:52.600
for floating through the air or hitching a ride

00:07:52.600 --> 00:07:55.339
on lab equipment and aggressively taking over

00:07:55.339 --> 00:07:57.720
other cultures in a laboratory. HeLa contamination

00:07:57.720 --> 00:08:00.639
is a well-documented hazard in cell biology.

00:08:01.389 --> 00:08:03.189
But the contamination of the Cambridge reference

00:08:03.189 --> 00:08:05.610
sequence didn't stop at HeLa cells. No, it did

00:08:05.610 --> 00:08:08.029
not. The original samples were also contaminated

00:08:08.029 --> 00:08:11.670
with bovine specimens. Cow DNA. There was cow

00:08:11.670 --> 00:08:14.250
DNA mixed into the baseline reference sequence

00:08:14.250 --> 00:08:16.990
for the human mitochondrial genome. It is a remarkable

00:08:16.990 --> 00:08:19.589
piece of history. It is just wild to me that

00:08:19.589 --> 00:08:22.209
for years the blueprint we used to understand

00:08:22.209 --> 00:08:25.310
human origins had a little bit of bovine genetic

00:08:25.310 --> 00:08:27.889
material mixed in by accident. It highlights

00:08:27.889 --> 00:08:30.930
a critical vulnerability in early genetics. Before

00:08:30.930 --> 00:08:33.370
the advent of modern sterile techniques, sealed

00:08:33.370 --> 00:08:35.230
environments, and computational error checking,

00:08:35.230 --> 00:08:38.289
researchers often used bovine serum to nourish

00:08:38.289 --> 00:08:41.330
cell cultures in the lab. And traces of that

00:08:41.330 --> 00:08:45.090
bovine material simply snuck into the final sequence.

00:08:45.269 --> 00:08:47.450
Exactly. Scientists obviously couldn't just leave

00:08:47.450 --> 00:08:49.610
it like that. You can't have a global yardstick

00:08:49.610 --> 00:08:52.049
made partly of cow. No. The scientific community

00:08:52.049 --> 00:08:55.250
recognized the need for a clean baseline. A revised

00:08:55.250 --> 00:08:58.049
version was painstakingly assembled and published

00:08:58.049 --> 00:09:02.210
in 1999 by Andrews et al. Okay.

00:09:02.269 --> 00:09:05.490
They cleaned up the 11 errors, completely removed

00:09:05.490 --> 00:09:08.289
the bovine and HeLa contamination, and published

00:09:08.289 --> 00:09:10.730
what is now known as the Revised Cambridge Reference

00:09:10.730 --> 00:09:14.230
Sequence, or the RCRS. The RCRS. And that's what

00:09:14.230 --> 00:09:16.559
scientists use today. It is. The source mentions

00:09:16.559 --> 00:09:18.120
that if someone wants to go look it up themselves,

00:09:18.360 --> 00:09:21.480
it is officially deposited in the GenBank NCBI

00:09:21.480 --> 00:09:23.940
database. Anyone can access it. You can find

00:09:23.940 --> 00:09:26.779
it under the accession number NC_012920.

00:09:26.779 --> 00:09:30.200
That is the corrected,

00:09:30.340 --> 00:09:32.899
pristine, modern version of the original 1981

00:09:32.899 --> 00:09:35.100
sequence. But even with the corrections, they

00:09:35.100 --> 00:09:37.379
faced a massive logistical headache regarding

00:09:37.379 --> 00:09:39.740
the numbering. Because the actual length of the

00:09:39.740 --> 00:09:44.940
sequence is 16,568 base pairs, not the 16,569 originally

00:09:44.940 --> 00:09:47.340
published. Right. Plus you have issues like that

00:09:47.340 --> 00:09:50.519
Phantom C at position 3107. So what did they

00:09:50.519 --> 00:09:53.340
do? When Andrews and his team revised the sequence

00:09:53.340 --> 00:09:56.659
in 1999, they faced a difficult choice. They

00:09:56.659 --> 00:09:58.779
could renumber the entire sequence from scratch,

00:09:58.779 --> 00:10:02.320
1 to 16,568, to perfectly reflect the accurate

00:10:02.320 --> 00:10:05.399
physical reality of the DNA molecule. Which feels

00:10:05.399 --> 00:10:07.860
like the most scientifically pure thing to do,

00:10:07.899 --> 00:10:09.899
just wipe the slate clean and number it correctly.

00:10:10.330 --> 00:10:12.549
The problem with scientific purity is that it

00:10:12.549 --> 00:10:15.289
often collides with practical reality. Think

00:10:15.289 --> 00:10:17.629
about the context. Right. Nearly 20 years have

00:10:17.629 --> 00:10:21.470
passed. Exactly. By 1999, nearly two decades

00:10:21.470 --> 00:10:24.389
of scientific literature, thousands of published

00:10:24.389 --> 00:10:26.850
papers and countless evolutionary studies had

00:10:26.850 --> 00:10:28.690
been anchored to the original numbering system

00:10:28.690 --> 00:10:32.009
of the 1981 CRS. Oh, wow. So if they changed

00:10:32.009 --> 00:10:33.509
all the numbers, it would be like changing the

00:10:33.509 --> 00:10:35.429
Dewey Decimal System overnight. It would have

00:10:35.429 --> 00:10:37.769
created absolute chaos in the field of genetics.

00:10:37.990 --> 00:10:40.169
Every single book, every single paper published

00:10:40.169 --> 00:10:43.149
before 1999 would suddenly have the wrong coordinates.

00:10:43.429 --> 00:10:45.450
It would render an entire generation of scientific

00:10:45.450 --> 00:10:48.509
literature incredibly difficult to read and

00:10:48.509 --> 00:10:50.870
cross-reference. Researchers would have needed a conversion

00:10:50.870 --> 00:10:54.110
chart just to read a paper from 1995. That sounds

00:10:54.110 --> 00:10:56.409
like a nightmare. So despite making the rigorous

00:10:56.409 --> 00:10:58.830
chemical corrections to the actual genetic data,

00:10:59.029 --> 00:11:01.230
the scientific community made the deliberate

00:11:01.230 --> 00:11:04.610
decision to retain the flawed original nucleotide

00:11:04.610 --> 00:11:07.649
numbering system. It's such a pragmatic human

00:11:07.649 --> 00:11:10.629
solution to a scientific problem. They kept the

00:11:10.629 --> 00:11:13.580
phantom space at 3107 just to keep the filing

00:11:13.580 --> 00:11:16.139
system intact. They did. I want to bring this

00:11:16.139 --> 00:11:18.220
directly back to you, the listener. You might

00:11:18.220 --> 00:11:20.559
be wondering how an off-by-one error from a

00:11:20.559 --> 00:11:23.779
1970s laboratory affects you today. Well, if

00:11:23.779 --> 00:11:25.980
you have ever taken a genealogical DNA test.

00:11:26.139 --> 00:11:28.179
One of those swab-your-cheek or spit-in-the-tube

00:11:28.179 --> 00:11:30.539
kits. Exactly, to find out your maternal ancestry.

00:11:31.259 --> 00:11:33.779
Your results are intimately tied to this story.

00:11:34.019 --> 00:11:36.860
Your personal ancestry report is a direct descendant

00:11:36.860 --> 00:11:39.720
of that 1999 revision. Because when those testing

00:11:39.720 --> 00:11:42.590
companies look at your mitochondrial DNA, They

00:11:42.590 --> 00:11:44.710
don't map out your entire sequence from scratch

00:11:44.710 --> 00:11:48.450
and hand you a microscopic list of 16,500-plus

00:11:48.450 --> 00:11:51.210
letters. No. Instead, your results are reported

00:11:51.210 --> 00:11:54.129
simply as the differences between your DNA and

00:11:54.129 --> 00:11:57.210
this revised Cambridge reference sequence. You

00:11:57.210 --> 00:11:59.970
are literally being compared to that one European

00:11:59.970 --> 00:12:03.230
woman from the 1970s. If we connect this to the

00:12:03.230 --> 00:12:05.950
bigger picture, it is vital to remember what

00:12:05.950 --> 00:12:09.090
the CRS actually is and, more importantly, what

00:12:09.090 --> 00:12:11.629
it is not. Right. It is just a reference sequence.

00:12:11.769 --> 00:12:13.870
It is a yardstick. It is not a record of the

00:12:13.870 --> 00:12:16.710
earliest human mitochondrial DNA. It doesn't

00:12:16.710 --> 00:12:19.289
represent the origin point of humanity or some

00:12:19.289 --> 00:12:22.889
perfect average human specimen? Not at all. It's

00:12:22.889 --> 00:12:24.850
an arbitrary starting line that we only use because

00:12:24.850 --> 00:12:26.950
she happened to be the first person sequenced.

00:12:27.149 --> 00:12:29.350
To be precise, the reference sequence belongs

00:12:29.350 --> 00:12:32.429
to a specific modern haplogroup. Which is essentially

00:12:32.429 --> 00:12:34.590
a major modern branch on the human evolutionary

00:12:34.590 --> 00:12:38.129
family tree. Yes, known as H2a2a1. It's a

00:12:38.129 --> 00:12:41.009
macro-European lineage. So what does this all mean?

00:12:41.149 --> 00:12:43.730
When the DNA test says, hey, your DNA is different

00:12:43.730 --> 00:12:45.389
from the reference sequence at this specific

00:12:45.389 --> 00:12:48.240
spot, what does that actually tell us about

00:12:48.240 --> 00:12:50.500
our own genetics. It tells us about a genetic

00:12:50.500 --> 00:12:53.100
mutation, but we have to be incredibly careful

00:12:53.100 --> 00:12:55.919
about our assumptions. When a difference is found

00:12:55.919 --> 00:12:59.460
between a tested sample and the CRS, there are

00:12:59.460 --> 00:13:02.480
two distinct possibilities. Okay. The intuitive

00:13:02.480 --> 00:13:05.039
assumption is that a mutation happens somewhere

00:13:05.039 --> 00:13:07.580
in your specific maternal lineage over the centuries,

00:13:07.860 --> 00:13:10.100
making you different from the standard. That

00:13:10.100 --> 00:13:12.500
makes sense. My ancestors evolved a new trait,

00:13:12.600 --> 00:13:15.120
so I diverge from the baseline. The second equally

00:13:15.120 --> 00:13:17.600
valid possibility is that the mutation actually

00:13:17.600 --> 00:13:21.759
arose in the lineage of the CRS itself. The CRS

00:13:21.759 --> 00:13:25.480
is just one woman's DNA and her ancestors experienced

00:13:25.480 --> 00:13:27.960
genetic mutations just like everyone else's.

00:13:28.120 --> 00:13:30.480
So a difference just means your two lineages

00:13:30.480 --> 00:13:32.600
don't match at that spot. It doesn't automatically

00:13:32.600 --> 00:13:35.279
mean your DNA is the changed version and hers

00:13:35.279 --> 00:13:37.779
is the original version. Exactly. That is a crucial

00:13:37.779 --> 00:13:40.299
paradigm shift. And the article provides some

00:13:40.299 --> 00:13:43.039
fantastic proof for this. The reference sequence

00:13:43.039 --> 00:13:46.169
itself contains its own unique, somewhat unusual

00:13:46.169 --> 00:13:48.330
quirks. It definitely does. The source notes

00:13:48.330 --> 00:13:50.789
the sequence has seven specific genetic variations

00:13:50.789 --> 00:13:53.250
or polymorphisms that are actually quite rare.

00:13:53.389 --> 00:13:56.210
Because her DNA is the baseline, those seven

00:13:56.210 --> 00:13:58.490
rare genetic quirks became the standard against

00:13:58.490 --> 00:14:00.389
which the rest of the world is measured. Can

00:14:00.389 --> 00:14:03.029
you walk us through those specific rare polymorphisms

00:14:03.029 --> 00:14:05.409
from this source? Certainly. The rare variations

00:14:05.409 --> 00:14:09.330
included in the CRS are at positions 263A, the

00:14:09.330 --> 00:14:13.889
sequence from 311C to 315C. Then there is 750A

00:14:13.889 --> 00:14:19.610
and 1438A, followed by 4769A, 8860A, and finally

00:14:19.610 --> 00:14:23.779
15326A. That is a very specific list of rare

00:14:23.779 --> 00:14:26.220
quirks to have as the global standard. It really

00:14:26.220 --> 00:14:28.740
is. The inclusion of those rare polymorphisms

00:14:28.740 --> 00:14:31.259
in the baseline is a perfect reminder that the

00:14:31.259 --> 00:14:33.740
yardstick itself has its own unique bumpy shape.

00:14:34.000 --> 00:14:35.860
Which naturally leads to a massive question.

00:14:36.259 --> 00:14:38.919
If this European sequence from the 70s has rare

00:14:38.919 --> 00:14:41.960
mutations and once had cow DNA mixed into it

00:14:41.960 --> 00:14:44.139
and it's just one arbitrary branch on the tree,

00:14:44.320 --> 00:14:46.179
have scientists ever tried using a different

00:14:46.179 --> 00:14:48.080
baseline? They have. Have we ever said, let's

00:14:48.080 --> 00:14:49.679
try measuring from a different starting line?

00:14:50.009 --> 00:14:52.029
The scientific community has certainly explored

00:14:52.029 --> 00:14:55.070
alternatives. Researchers have utilized different

00:14:55.070 --> 00:14:57.990
reference sequences over the years to suit different

00:14:57.990 --> 00:15:01.409
studies. For instance, an African reference sequence

00:15:01.409 --> 00:15:04.509
representing one Yoruba individual has been used

00:15:04.509 --> 00:15:06.909
in place of the Cambridge sequence. And because

00:15:06.909 --> 00:15:09.570
it's a completely different person's DNA, the

00:15:09.570 --> 00:15:12.750
fundamental stats of the map change. The Yoruba

00:15:12.750 --> 00:15:15.110
sequence uses a different numbering system altogether,

00:15:15.350 --> 00:15:17.549
and it has a slightly different length. Right.

00:15:17.629 --> 00:15:21.769
While the actual length of the CRS is 16,568

00:15:21.769 --> 00:15:24.629
base pairs, the Yoruba reference sequence has

00:15:24.629 --> 00:15:29.269
a length of 16,571 base pairs. The Yoruba sequence

00:15:29.269 --> 00:15:32.070
isn't the only alternative either. The source

00:15:32.070 --> 00:15:34.049
mentions that researchers have also utilized

00:15:34.049 --> 00:15:36.289
sequences from African individuals in Uganda,

00:15:36.570 --> 00:15:39.230
as well as Swedish and Japanese sequences, depending

00:15:39.230 --> 00:15:41.230
on the specific populations they're studying.

00:15:41.639 --> 00:15:44.059
So scientists do have options. But here's where

00:15:44.059 --> 00:15:46.240
it gets really interesting. Because rather than

00:15:46.240 --> 00:15:49.340
just swapping one modern person's DNA for another

00:15:49.340 --> 00:15:52.659
modern person's DNA, there was a major push to

00:15:52.659 --> 00:15:56.419
change the paradigm entirely. In 2012. Yes, in

00:15:56.419 --> 00:16:00.059
2012, a group of researchers, led by Behar et

00:16:00.059 --> 00:16:02.779
al., proposed that we should replace the revised

00:16:02.779 --> 00:16:05.240
Cambridge reference sequence with something totally

00:16:05.240 --> 00:16:08.840
new, something called the RSRS. The RSRS stands

00:16:08.840 --> 00:16:11.559
for the Reconstructed Sapiens Reference Sequence.

00:16:11.559 --> 00:16:13.899
It is a profoundly different approach to genetic

00:16:13.899 --> 00:16:16.600
mapping. Reconstructed Sapiens Reference Sequence

00:16:16.600 --> 00:16:18.259
sounds like something out of a science fiction

00:16:18.259 --> 00:16:21.019
novel. What makes it so different? The proponents

00:16:21.019 --> 00:16:24.580
of the RSRS argue that using any random modern

00:16:24.580 --> 00:16:26.879
human as our baseline, whether they are European,

00:16:26.879 --> 00:16:29.480
Yoruba, or Japanese, is fundamentally flawed.

00:16:29.799 --> 00:16:32.000
Instead, they propose we should measure from

00:16:32.000 --> 00:16:34.679
the very root of the human family tree. The root?

00:16:34.840 --> 00:16:37.210
Yes. The RSRS represents the computationally

00:16:37.210 --> 00:16:39.809
reconstructed ancestral genome of mitochondrial

00:16:39.809 --> 00:16:42.669
Eve. Mitochondrial Eve, the theoretical maternal

00:16:42.669 --> 00:16:45.210
ancestor from which all currently known human

00:16:45.210 --> 00:16:49.070
mitochondria descend. Exactly. They mathematically reconstructed

00:16:49.070 --> 00:16:51.470
what they believe the mitochondrial DNA of that

00:16:51.470 --> 00:16:53.950
ancient common ancestor looked like. And cleverly,

00:16:53.950 --> 00:16:55.789
to avoid the literature confusion we talked about

00:16:55.789 --> 00:16:58.669
earlier with the 1999 revision, the RSRS proposal

00:16:58.669 --> 00:17:00.970
retains the exact same numbering system as the

00:17:00.970 --> 00:17:03.629
old CRS. So the filing system stays the same,

00:17:03.710 --> 00:17:06.450
but the underlying letters, the foundational

00:17:06.450 --> 00:17:08.710
baseline we are comparing ourselves against,

00:17:08.789 --> 00:17:12.730
would be this theoretical ancestral DNA, not

00:17:12.730 --> 00:17:16.339
the 1970s Cambridge DNA. Like changing the prime

00:17:16.339 --> 00:17:18.759
meridian of a map from Greenwich to the actual

00:17:18.759 --> 00:17:21.180
equator, but keeping the grid lines the same.

00:17:21.380 --> 00:17:24.599
That is a brilliant analogy. The core argument

00:17:24.599 --> 00:17:27.460
from researchers like Behar is that using the

00:17:27.460 --> 00:17:29.759
ancestral root makes it much more logical for

00:17:29.759 --> 00:17:32.160
comparing genetic changes across all different

00:17:32.160 --> 00:17:34.559
modern haplogroups. You're measuring how far

00:17:34.559 --> 00:17:37.359
everyone has independently evolved from a shared

00:17:37.359 --> 00:17:39.900
starting point. Rather than arbitrarily measuring

00:17:39.900 --> 00:17:42.299
how far everyone is from one random European

00:17:42.299 --> 00:17:44.890
branch on the tree. That makes an incredible

00:17:44.890 --> 00:17:47.789
amount of logical sense. But as with any major

00:17:47.789 --> 00:17:50.289
paradigm shift in science, you don't just erase

00:17:50.289 --> 00:17:52.529
a 30-year-old baseline overnight without serious

00:17:52.529 --> 00:17:55.130
pushback. No, you certainly don't. We have to

00:17:55.130 --> 00:17:56.869
look at both sides of this fairly, and there

00:17:56.869 --> 00:17:58.710
is a fierce debate about this in the scientific

00:17:58.710 --> 00:18:01.430
community. It is a classic tension between theoretical

00:18:01.430 --> 00:18:04.650
elegance and practical utility. While proponents

00:18:04.650 --> 00:18:07.269
argue for the logical superiority of the RSRS,

00:18:07.650 --> 00:18:09.990
other prominent researchers like Bandelt and

00:18:09.990 --> 00:18:12.910
his colleagues in a 2014 paper cited in the article

00:18:12.910 --> 00:18:15.089
have strongly pushed back against the change.

00:18:15.430 --> 00:18:17.849
What is their primary argument for keeping the

00:18:17.849 --> 00:18:20.630
old, somewhat flawed Cambridge sequence? Their

00:18:20.630 --> 00:18:23.450
argument is rooted in historical continuity and

00:18:23.450 --> 00:18:26.390
the realities of scientific labor. Bandelt and

00:18:26.390 --> 00:18:28.730
others argue that the revised Cambridge Reference

00:18:28.809 --> 00:18:31.470
Sequence works perfectly fine as a standardized

00:18:31.470 --> 00:18:34.410
notation tool. It is a known quantity. Exactly.

00:18:34.589 --> 00:18:37.369
They argue that introducing a completely new

00:18:37.369 --> 00:18:39.950
baseline, even one that is theoretically more

00:18:39.950 --> 00:18:43.650
pure, introduces unnecessary complexity and potential

00:18:43.650 --> 00:18:46.589
for error into a system that is already universally

00:18:46.589 --> 00:18:48.930
understood. It's the ultimate "if it ain't entirely

00:18:48.930 --> 00:18:51.150
broke, don't fix it" argument. So we have this

00:18:51.150 --> 00:18:53.789
tension between those who want an ancestral baseline

00:18:53.789 --> 00:18:56.670
to reflect human evolution accurately and those

00:18:56.670 --> 00:18:58.589
who prefer the established historical baseline

00:18:58.589 --> 00:19:00.630
because it keeps the global filing system stable.

00:19:01.109 --> 00:19:03.470
And because of this ongoing philosophical and

00:19:03.470 --> 00:19:06.569
practical debate, the consumer DNA testing world

00:19:06.569 --> 00:19:09.490
has had to find a middle ground. The source specifically

00:19:09.490 --> 00:19:11.809
notes how companies are handling this schism.

00:19:12.170 --> 00:19:14.470
For example, the testing company Family Tree

00:19:14.470 --> 00:19:17.789
DNA currently reports maternal DNA results using

00:19:17.789 --> 00:19:20.150
both systems. Right. When you get your results,

00:19:20.309 --> 00:19:22.769
they show you your variations compared to the

00:19:22.769 --> 00:19:25.329
historical rCRS, and they also show you your

00:19:25.329 --> 00:19:28.480
variations compared to the ancestral RSRS. It

00:19:28.480 --> 00:19:30.640
is a highly practical way to bridge the gap,

00:19:30.819 --> 00:19:33.039
giving consumers and researchers the data they

00:19:33.039 --> 00:19:35.720
need, while the broader scientific community

00:19:35.720 --> 00:19:38.319
continues to debate the best standardized approach

00:19:38.319 --> 00:19:41.930
for the future. It really is. Okay, as we wrap

00:19:41.930 --> 00:19:43.849
up this deep dive, let's take a quick look back

00:19:43.849 --> 00:19:45.910
at the incredible journey we've unpacked today.

00:19:46.089 --> 00:19:47.910
It has been quite a journey. We started in the

00:19:47.910 --> 00:19:51.430
1970s, marveling at the sheer manual effort of

00:19:51.430 --> 00:19:54.329
Fred Sanger's team sequencing a single woman's

00:19:54.329 --> 00:19:56.769
mitochondrial DNA. We learned about the 11 errors.

00:19:56.970 --> 00:19:59.269
And the bizarre reality of HeLa cells and cow

00:19:59.269 --> 00:20:02.509
DNA hiding in humanity's baseline genetic map

00:20:02.509 --> 00:20:06.690
for years. We explored the 1999 rCRS revision

00:20:06.690 --> 00:20:09.130
that cleaned up the mess but kept the old numbers

00:20:09.130 --> 00:20:12.009
to save the literature. And finally, we waded

00:20:12.009 --> 00:20:14.470
into the modern heated debate over whether to

00:20:14.470 --> 00:20:18.049
replace that historical artifact with the reconstructed

00:20:18.049 --> 00:20:21.049
root of Mitochondrial Eve. This raises an important

00:20:21.049 --> 00:20:23.210
question, one that goes far beyond genetics.

00:20:23.650 --> 00:20:26.069
Science is often presented as a static collection

00:20:26.069 --> 00:20:29.329
of absolute facts, but it is actually a constant,

00:20:29.430 --> 00:20:32.289
sometimes messy process of refining our tools.

00:20:32.589 --> 00:20:34.450
We build a yardstick with the best technology

00:20:34.450 --> 00:20:37.049
we have, we use it, we eventually discover its

00:20:37.049 --> 00:20:40.240
flaws, and we iterate. The history of the Cambridge

00:20:40.240 --> 00:20:43.180
Reference Sequence is a perfect embodiment of

00:20:43.180 --> 00:20:45.779
that relentless, continuous refinement. It perfectly

00:20:45.779 --> 00:20:47.640
captures the human element of the scientific

00:20:47.640 --> 00:20:49.880
method. And I want to leave you, the listener,

00:20:50.000 --> 00:20:52.220
with a final thought to mull over as you go about

00:20:52.220 --> 00:20:54.400
your day. Okay. We just learned that our entire

00:20:54.400 --> 00:20:57.079
foundational baseline for understanding human

00:20:57.079 --> 00:20:59.900
genetic history was initially shaped by a single

00:20:59.900 --> 00:21:03.319
individual's DNA, was impacted by lab contamination

00:21:03.319 --> 00:21:06.240
and is held together by a numbering system that

00:21:06.240 --> 00:21:09.099
was intentionally left flawed just to keep old

00:21:09.099 --> 00:21:11.380
filing cabinets organized. So it makes you wonder

00:21:11.380 --> 00:21:14.220
how many other scientific baselines in our world,

00:21:14.339 --> 00:21:17.119
the standard measures and vital metrics we take

00:21:17.119 --> 00:21:19.519
for granted every single day, are actually just

00:21:19.519 --> 00:21:22.250
historical accidents waiting to be revised. Thank

00:21:22.250 --> 00:21:24.430
you so much for exploring this fascinating slice

00:21:24.430 --> 00:21:27.170
of scientific methodology with us. Yes. Thank

00:21:27.170 --> 00:21:29.430
you for joining us on this deep dive. We'll be

00:21:29.430 --> 00:21:31.609
back next time to unpack another stack of sources.

00:21:31.930 --> 00:21:34.450
Until then, keep questioning the baselines. Goodbye.