WEBVTT

00:00:00.000 --> 00:00:03.259
Have you ever taken one of those commercial DNA

00:00:03.259 --> 00:00:05.700
tests? Oh, right. The ones where you just spit

00:00:05.700 --> 00:00:07.900
in a tube and drop it in the mail. Exactly. And

00:00:07.900 --> 00:00:09.859
then a few weeks later, you get this beautiful

00:00:09.859 --> 00:00:11.939
digital interface telling you where your ancestors

00:00:11.939 --> 00:00:13.660
came from. And it gives you all the specific

00:00:13.660 --> 00:00:17.000
genetic markers you carry. Yeah. But if you pause

00:00:17.000 --> 00:00:20.480
and really look at that data, a massive question

00:00:20.480 --> 00:00:22.960
emerges. Right. When a testing company tells

00:00:22.960 --> 00:00:27.519
you that your DNA... has certain markers or specific

00:00:27.519 --> 00:00:30.519
deviations, what exactly are they comparing your

00:00:30.519 --> 00:00:33.179
DNA against? That is the big question. What is

00:00:33.179 --> 00:00:36.520
the standard baseline for a human being? Establishing

00:00:36.520 --> 00:00:39.079
that baseline is actually one of the most crucial

00:00:39.079 --> 00:00:41.340
and, ironically, one of the most complicated

00:00:41.340 --> 00:00:44.020
endeavors in modern genetics. Because you need

00:00:44.020 --> 00:00:46.359
something to compare it to. Exactly. Without

00:00:46.359 --> 00:00:48.679
a fixed point of reference, analyzing millions

00:00:48.679 --> 00:00:51.700
of base pairs of DNA across billions of people

00:00:51.700 --> 00:00:54.420
is impossible. You need a universal map. And

00:00:54.420 --> 00:00:57.590
every map needs a starting point. Right. But

00:00:57.590 --> 00:01:00.170
how the scientific community chose that starting

00:01:00.170 --> 00:01:02.429
point and what it actually looks like under the

00:01:02.429 --> 00:01:04.829
microscope, well, it reveals a lot about how

00:01:04.829 --> 00:01:07.290
science actually progresses. Okay, let's unpack

00:01:07.290 --> 00:01:09.790
this. We are doing a deep dive today into the

00:01:09.790 --> 00:01:13.250
fascinating, slightly flawed, and constantly

00:01:13.250 --> 00:01:16.269
evolving yardstick that scientists use to map

00:01:16.269 --> 00:01:19.209
human ancestry. Specifically, we're looking at

00:01:19.209 --> 00:01:23.310
human mitochondrial DNA, or mtDNA. And exploring

00:01:23.310 --> 00:01:25.290
the Wikipedia article on the Cambridge Reference

00:01:25.290 --> 00:01:27.670
Sequence. Let's go back to the origin of this

00:01:27.670 --> 00:01:30.230
blueprint, which is a story of incredible ambition,

00:01:30.450 --> 00:01:33.590
but also a rather notorious laboratory blunder.

00:01:33.650 --> 00:01:35.430
To understand the origin, we have to travel back

00:01:35.430 --> 00:01:37.849
to the 1970s. A completely different era for

00:01:37.849 --> 00:01:40.930
genetics. Oh, absolutely. It was an era of incredible

00:01:40.930 --> 00:01:42.709
foundational work, but it was also incredibly

00:01:42.709 --> 00:01:46.930
manual. Sequencing DNA wasn't done by dropping

00:01:46.930 --> 00:01:49.810
a sample into some automated machine that spits

00:01:49.810 --> 00:01:51.790
out data in 10 minutes. Right, they had to do

00:01:51.790 --> 00:01:54.569
it by hand. It involved pouring physical gels,

00:01:54.709 --> 00:01:57.950
using radioactive tags, and painstakingly reading

00:01:57.950 --> 00:02:01.310
bands off X-ray films. At the University of

00:02:01.310 --> 00:02:03.810
Cambridge, a group of researchers, led by Fred

00:02:03.810 --> 00:02:06.650
Sanger, took on what was a monumental task at

00:02:06.650 --> 00:02:09.069
the time. Sequencing the entire mitochondrial

00:02:09.069 --> 00:02:12.050
genome of a single human being. Yes. And it was

00:02:12.050 --> 00:02:14.580
just one person, one woman of European descent.

00:00:14.659 --> 00:00:16.719
Just one. They worked on this immense puzzle

00:02:16.719 --> 00:02:19.360
throughout the entire decade, finally publishing

00:02:19.360 --> 00:02:22.099
the complete sequence in 1981. And it was an

00:02:22.099 --> 00:02:24.039
astonishing technical achievement. It really

00:02:24.039 --> 00:02:26.439
was. And to give some scale to what they were

00:02:26.439 --> 00:02:29.080
looking at, this mitochondrial genome contained

00:02:29.080 --> 00:02:32.379
some 37 genes. Which is tiny. Right, that makes

00:02:32.379 --> 00:02:37.030
up only about 0.00006% of the nuclear human

00:02:37.030 --> 00:02:40.810
genome. It's a tiny, tiny fraction of the total

00:02:40.810 --> 00:02:42.750
genetic material that makes up a human body.

00:02:42.849 --> 00:02:45.330
It is a tiny fraction, but mitochondrial DNA

00:02:45.330 --> 00:02:47.949
is uniquely useful for genealogy. Because of

00:02:47.949 --> 00:02:50.129
how it's inherited, right? Exactly. Because it's

00:02:50.129 --> 00:02:52.389
passed down strictly from mother to child, it

00:02:52.389 --> 00:02:54.949
doesn't undergo the same complex shuffling that

00:02:54.949 --> 00:02:57.449
nuclear DNA does during reproduction. So it acts

00:02:57.449 --> 00:02:59.509
as a relatively stable chronological record.

00:02:59.949 --> 00:03:02.689
Precisely. When Sanger's team published their

00:03:02.689 --> 00:03:06.469
work in 1981, they determined this complete mitochondrial

00:03:06.469 --> 00:03:11.629
sequence to be exactly 16,569 base pairs long.

00:03:11.810 --> 00:03:16.449
16,569. Yes. They essentially laid down the

00:03:16.449 --> 00:03:18.930
very first track of the genetic railroad, giving

00:03:18.930 --> 00:03:21.909
scientists everywhere a standard template. This

00:03:21.909 --> 00:03:24.069
became known as the Cambridge Reference Sequence,

00:03:24.069 --> 00:03:27.599
or CRS. And having that universal template allowed

00:03:27.599 --> 00:03:30.099
the entire field to accelerate. It gave everyone

00:03:30.099 --> 00:03:32.580
a common language. But as technology improved,

00:03:32.960 --> 00:03:35.360
other researchers inevitably started to repeat

00:03:35.360 --> 00:03:38.039
the sequencing to verify the work or build upon

00:03:38.039 --> 00:03:40.000
it. And that is when they noticed some striking

00:03:40.000 --> 00:03:42.180
discrepancies. Between what was published in

00:03:42.180 --> 00:03:44.919
1981 and what they were actually seeing in their

00:03:44.919 --> 00:03:47.039
own labs. Right. The original published sequence

00:03:47.039 --> 00:03:49.879
was actually flawed. As other scientists dug

00:03:49.879 --> 00:03:51.919
into the data over the next decade, they uncovered

00:03:51.919 --> 00:03:55.680
11 specific errors in that initial 1981 publication.

00:03:56.099 --> 00:03:59.259
11 errors? Yes. These included incorrect assignments

00:03:59.259 --> 00:04:01.400
of single base pairs, which is somewhat expected

00:04:01.400 --> 00:04:03.580
given the manual reading of those X-ray gels

00:04:03.580 --> 00:04:05.580
we talked about. It's easy to misread a blurry

00:04:05.580 --> 00:04:08.199
band on an X-ray film. But it wasn't just a

00:04:08.199 --> 00:04:10.719
matter of misreading a gel. The most shocking

00:04:10.719 --> 00:04:13.719
part of the initial mapping effort was the physical

00:04:13.719 --> 00:04:16.519
state of the sample itself. The contamination?

00:04:16.860 --> 00:04:19.670
Yes. The researchers eventually realized that

00:04:19.670 --> 00:04:22.529
the sample wasn't purely human. Which is wild

00:04:22.529 --> 00:04:24.689
to think about now. It had been contaminated

00:04:24.689 --> 00:04:28.529
with bovine DNA and HeLa cell specimens. Contamination

00:04:28.529 --> 00:04:31.730
is the invisible enemy in any laboratory, especially

00:04:31.730 --> 00:04:34.389
in the early days of sequencing before modern

00:04:34.389 --> 00:04:37.990
clean room protocols were standardized. But cow

00:04:37.990 --> 00:04:41.050
DNA. I know, it sounds crazy. But the fact that

00:04:41.050 --> 00:04:44.310
bovine DNA and HeLa cells, which are a famously

00:04:44.310 --> 00:04:47.129
robust and aggressive line of human cells used

00:04:47.129 --> 00:04:49.750
in research globally, managed to sneak into the

00:04:49.750 --> 00:04:52.139
first complete mitochondrial sequence is a

00:04:52.139 --> 00:04:55.279
testament to how delicate the process is. It's

00:04:55.279 --> 00:04:57.819
incredibly sensitive. But the beauty of the scientific

00:04:57.819 --> 00:05:00.720
method is that it is fundamentally a self-correcting

00:05:00.720 --> 00:05:02.800
discipline. So when they finally scrubbed the

00:05:02.800 --> 00:05:04.920
bovine and HeLa contamination out of the sample

00:05:04.920 --> 00:05:07.579
and corrected the manual reading errors, what

00:05:07.579 --> 00:05:09.379
did they actually have left? Did the sequence

00:05:09.379 --> 00:05:12.699
change drastically? It took until 1999 for a

00:05:12.699 --> 00:05:15.800
formal correction to be published. A group of

00:05:15.800 --> 00:05:18.509
researchers led by Andrews cleaned up

00:05:18.509 --> 00:05:20.589
the sequence, and published what is known as

00:05:20.589 --> 00:05:22.910
the Revised Cambridge Reference Sequence, or

00:05:22.910 --> 00:05:26.050
the rCRS. The revised version. Right. They formalized

00:05:26.050 --> 00:05:28.870
the exact structure, and in doing so, confirmed

00:05:28.870 --> 00:05:31.230
that the true length of this woman's mitochondrial

00:05:31.230 --> 00:05:35.709
genome wasn't 16,569 base pairs at all. It was

00:05:35.709 --> 00:05:37.850
different. It was actually one base pair shorter,

00:05:38.069 --> 00:05:42.269
16,568. Hold on. If they deleted a base pair

00:05:42.269 --> 00:05:44.949
from the official record, doesn't that ruin the

00:05:44.949 --> 00:05:46.990
numbering system for every single scientific

00:05:46.990 --> 00:05:50.170
paper and database published since 1981? It creates

00:05:50.170 --> 00:05:53.529
a massive logistical nightmare. By 1999, the

00:05:53.529 --> 00:05:55.550
scientific community had been using the original

00:05:55.550 --> 00:05:58.189
Cambridge reference sequence for nearly two decades.

00:05:58.230 --> 00:06:00.550
That's thousands of papers. Thousands of papers,

00:06:00.730 --> 00:06:03.529
global studies, bioinformatics databases, all

00:06:03.529 --> 00:06:06.050
built on that specific coordinate system. So

00:06:06.050 --> 00:06:08.149
if you shrink the map by one base pair early

00:06:08.149 --> 00:06:10.129
in the sequence... Every single coordinate after...

00:06:16.759 --> 00:06:19.420
Rendering years of literature instantly obsolete

00:06:19.420 --> 00:06:21.639
and incredibly confusing to cross-reference.

00:06:21.779 --> 00:06:25.339
It would be like a city deciding to renumber

00:06:25.339 --> 00:06:30.300
every single house on a massive avenue. So how

00:06:30.300 --> 00:06:32.500
do they handle that missing base pair without

00:06:32.500 --> 00:06:34.379
breaking the architecture of modern genetics?

00:06:39.339 --> 00:06:42.399
...technical debt. A workaround? Yes. There was an

00:06:42.399 --> 00:06:44.540
extra base pair originally listed at position

00:06:44.540 --> 00:06:48.939
3107 in the flawed 1981 sequence. It was recorded

00:06:48.939 --> 00:06:52.600
as the nucleotide C. Okay. In the 1999 revision,

00:06:52.600 --> 00:06:54.839
since they knew it wasn't actually supposed to

00:06:54.839 --> 00:06:57.000
be there, they didn't just delete it and shift

00:06:57.000 --> 00:06:59.199
all the numbers down. Instead, they changed that

00:06:59.199 --> 00:07:02.060
C to an N. Just a placeholder? A literal phantom

00:07:02.060 --> 00:07:05.160
placeholder. The N stands in for an unknown or

00:07:05.160 --> 00:07:07.920
unspecified nucleotide, purely to maintain the

00:07:07.920 --> 00:07:10.899
original 1981 numbering system. That is hilarious.

00:07:11.279 --> 00:07:14.240
So the sequence goes from position 3106, skips

00:07:14.240 --> 00:07:17.480
over the phantom N at 3107, and continues right

00:07:17.480 --> 00:07:20.480
along at 3108. So the scientific community collectively

00:07:20.480 --> 00:07:22.879
agreed to live with a permanent numbering anomaly

00:07:22.879 --> 00:07:26.060
rather than rewrite 20 years of genetic literature.

00:07:26.399 --> 00:07:29.600
Exactly. Today, this revised sequence is safely

00:07:29.600 --> 00:07:32.160
deposited in the GenBank database under the accession

00:07:32.160 --> 00:07:38.670
number NC_012920. NC_012920. That is the official,

00:07:38.870 --> 00:07:41.769
slightly jury-rigged barcode for the baseline

00:07:41.769 --> 00:07:44.870
of human mitochondrial DNA. It is. Which brings

00:07:44.870 --> 00:07:47.290
us to how this actually affects anyone who has

00:07:47.290 --> 00:07:49.870
ever used a cheek swab for a genealogical test.

00:07:50.350 --> 00:07:53.089
When you get your mtDNA results back, the lab

00:07:53.089 --> 00:07:55.170
doesn't just print out a list of your 16,000

00:07:55.170 --> 00:07:57.410
base pairs. No, that would be unreadable. Your

00:07:57.410 --> 00:07:59.610
results are usually reported as a list of differences

00:07:59.610 --> 00:08:01.750
from this revised Cambridge reference sequence.

00:08:02.110 --> 00:08:03.889
What's fascinating here is how that reporting

00:08:03.889 --> 00:08:06.389
method creates a deeply ingrained psychological

00:08:06.389 --> 00:08:09.490
bias. We can call it the yardstick misconception.

00:08:09.990 --> 00:08:12.410
The yardstick misconception. Yes. When someone

00:08:12.410 --> 00:08:14.589
sees a genetic report that lists their differences

00:08:14.589 --> 00:08:16.670
or mutations compared to the reference sequence,

00:08:16.949 --> 00:08:19.769
the natural instinct is to assume that the reference

00:08:19.769 --> 00:08:22.310
sequence is the original, pure, or biologically

00:08:22.310 --> 00:08:25.029
normal human blueprint. The implication is that

00:08:25.029 --> 00:08:28.490
your own DNA is a deviation from that norm. It

00:08:28.490 --> 00:08:30.649
makes you feel like you are the one who mutated

00:08:30.649 --> 00:08:33.159
away from the standard. Precisely. But a helpful

00:08:33.159 --> 00:08:36.179
way to reframe the rCRS is to think of it like

00:08:36.179 --> 00:08:38.879
the prime meridian. The prime meridian runs right

00:08:38.879 --> 00:08:41.299
through Greenwich, England. Right. But there

00:08:41.299 --> 00:08:43.899
is nothing geologically zero about Greenwich.

00:08:43.980 --> 00:08:46.580
It's just an arbitrary reference point that mapmakers

00:08:46.580 --> 00:08:48.779
agreed upon so everyone could navigate. If you

00:08:48.779 --> 00:08:51.750
live at 40 degrees west, you aren't deviating

00:08:51.750 --> 00:08:53.950
from a perfect location. You're just located

00:08:53.950 --> 00:08:56.529
somewhere else relative to that agreed upon pin

00:08:56.529 --> 00:08:58.309
on the map. The Cambridge reference sequence

00:08:58.309 --> 00:09:01.230
functions in the exact same way. It is purely

00:09:01.230 --> 00:09:03.789
a reference sequence, not a record of the earliest

00:09:03.789 --> 00:09:06.470
human mtDNA. It is not the biological center

00:09:06.470 --> 00:09:09.299
of humanity. Far from it. Geneticists know exactly

00:09:09.299 --> 00:09:11.980
where it sits on the human family tree. The reference

00:09:11.980 --> 00:09:15.360
sequence belongs to a very specific modern lineage

00:09:15.360 --> 00:09:19.820
known as European haplogroup H2a2a1. Wait,

00:09:19.919 --> 00:09:21.919
so when a testing company tells me I have a mutation,

00:09:22.139 --> 00:09:24.379
I haven't necessarily deviated from the human

00:09:24.379 --> 00:09:26.960
standard? I'm just different from this one specific

00:09:26.960 --> 00:09:29.899
European woman from the 1970s? That is the core

00:09:29.899 --> 00:09:32.240
of the misconception. A difference between your

00:09:32.240 --> 00:09:34.960
DNA and the CRS doesn't automatically mean you

00:09:34.960 --> 00:09:38.000
carry a rare mutation. The mutation might have

00:09:38.000 --> 00:09:40.360
actually occurred in the lineage of the Cambridge

00:09:40.360 --> 00:09:42.799
sequence itself. Oh, wow. In fact, the baseline

00:09:42.799 --> 00:09:45.259
we use for all of humanity contains genetic markers

00:09:45.259 --> 00:09:47.580
that are highly unusual when you look at the

00:09:47.580 --> 00:09:49.919
broader global population. The baseline itself

00:09:49.919 --> 00:09:52.740
is unusual. Yes. Researchers have pinpointed

00:09:52.740 --> 00:09:55.679
seven distinct locations on this genetic map

00:09:55.679 --> 00:09:58.519
where our supposed standard baseline includes

00:09:58.519 --> 00:10:02.120
rare polymorphisms. Seven distinct spots where

00:10:02.120 --> 00:10:04.159
the baseline is actually the global outlier.

00:10:04.299 --> 00:10:06.470
Yes. Do we know which ones? We do. These are

00:10:06.470 --> 00:10:10.129
specific positions like 263A, the cluster from

00:10:10.129 --> 00:10:16.269
311C to 315C, 750A, 1438A, 4769A, 8860A, and

00:10:16.269 --> 00:10:20.000
15326A. So a polymorphism is essentially a genetic

00:10:20.000 --> 00:10:22.179
variation. Correct. The fact that the universal

00:10:22.179 --> 00:10:24.580
yardstick contains seven rare variations means

00:10:24.580 --> 00:10:27.179
that for vast swaths of the global population,

00:10:27.440 --> 00:10:30.460
comparing their DNA to the CRS involves documenting

00:10:30.460 --> 00:10:32.259
differences that are actually just their own

00:10:32.259 --> 00:10:34.720
DNA being entirely normal. Exactly. While the

00:10:34.720 --> 00:10:37.340
Cambridge sequence is the odd one out. That realization

00:10:37.340 --> 00:10:40.000
seems like it would force researchers to rethink

00:10:40.000 --> 00:10:43.419
how we compare DNA globally. If you are studying

00:10:43.419 --> 00:10:46.480
populations in Africa or Asia, comparing their

00:10:46.480 --> 00:10:51.019
DNA to a European haplogroup H2a2a1 sequence

00:10:51.019 --> 00:10:53.639
from Cambridge isn't just arbitrary, it could

00:10:53.639 --> 00:10:56.240
be analytically inefficient. It is. If the Cambridge

00:10:56.240 --> 00:10:58.860
sequence is just one pin on the map, why not

00:10:58.860 --> 00:11:01.320
make others? Well, the scientific community did

00:11:01.320 --> 00:11:03.720
attempt to diversify the baseline. Because the

00:11:03.720 --> 00:11:06.620
CRS is based on that single European individual,

00:11:07.019 --> 00:11:09.279
researchers began using alternative reference

00:11:09.279 --> 00:11:11.700
sequences. What were some of the alternatives?

00:11:12.059 --> 00:11:15.000
One notable example is the Yoruba reference sequence,

00:11:15.259 --> 00:11:18.120
representing the mitochondrial genome of an African

00:11:18.120 --> 00:11:20.559
individual. And because it's a different person

00:11:20.559 --> 00:11:22.580
from an entirely different lineage, the map looks

00:11:22.580 --> 00:11:25.039
different. Yes. The Yoruba sequence has a length

00:11:25.039 --> 00:11:29.539
of 16,571 base pairs, which is three base pairs

00:11:29.539 --> 00:11:32.299
longer than the revised Cambridge sequence. And

00:11:32.299 --> 00:11:34.870
scientists didn't stop there. They've also utilized

00:11:34.870 --> 00:11:37.669
an African Ugandan sequence, a Swedish sequence,

00:07:37.889 --> 00:07:40.389
and a Japanese sequence as alternative baselines.

00:11:40.570 --> 00:11:42.690
The intention behind localizing the reference

00:11:42.690 --> 00:11:45.629
points was good, but it created a massive fragmentation

00:11:45.629 --> 00:11:49.610
problem. Too many maps. Exactly. Having multiple

00:11:49.610 --> 00:11:52.309
different yardsticks, each with its own unique

00:11:52.309 --> 00:11:54.850
length and completely different numbering system,

00:11:55.070 --> 00:11:57.850
fractures the research landscape. You lose the

00:11:57.850 --> 00:11:59.789
universal language that the original Cambridge

00:11:59.789 --> 00:12:02.460
sequence provided. Even with its flaws and its

00:12:02.460 --> 00:12:06.179
phantom placeholder at 3107, the CRS allowed

00:12:06.179 --> 00:12:08.620
a researcher in Tokyo to perfectly understand

00:12:08.620 --> 00:12:11.039
a paper published by a researcher in London.

00:12:11.259 --> 00:12:13.980
Here's where it gets really interesting. In 2012,

00:12:14.299 --> 00:12:18.179
a group of scientists led by Behar published

00:12:18.179 --> 00:12:21.419
a paper proposing a massive philosophical shift

00:12:21.419 --> 00:12:23.759
in how genetics handles this mapping problem.

00:12:23.980 --> 00:12:26.039
They called it a Copernican reassessment of the

00:12:26.039 --> 00:12:29.190
human mitochondrial DNA tree. They proposed throwing

00:12:29.190 --> 00:12:31.350
out the revised Cambridge reference sequence

00:12:31.350 --> 00:12:33.649
as the standard and replacing it with something

00:12:33.649 --> 00:12:36.850
called the RSRS. That Copernican analogy is perfect.

00:12:37.049 --> 00:12:39.529
Before Copernicus, astronomers believed the Earth

00:12:39.529 --> 00:12:41.629
was the center of the universe, and all the math

00:12:41.629 --> 00:12:44.269
used to track the planets was incredibly convoluted

00:12:44.269 --> 00:12:45.950
because it was based on an arbitrary starting

00:12:45.950 --> 00:12:48.019
point. Right, trying to make the math fit the

00:12:48.019 --> 00:12:51.100
wrong center. Behar et al. argued that using

00:12:51.100 --> 00:12:53.960
a modern human, whether from Cambridge, the Yoruba

00:12:53.960 --> 00:12:56.840
population, or Japan, as the center of the genetic

00:12:56.840 --> 00:13:00.340
universe was equally flawed. The RSRS, which

00:13:00.340 --> 00:13:02.940
stands for the Reconstructed Sapiens Reference

00:13:02.940 --> 00:13:06.440
Sequence, completely changes the paradigm. So

00:13:06.440 --> 00:13:08.360
instead of picking another modern person, what

00:13:08.360 --> 00:13:11.799
does the RSRS actually use as its baseline? It

00:13:11.799 --> 00:13:14.580
relies on a theoretical model. The RSRS represents

00:13:14.580 --> 00:13:18.100
the hypothetical ancestral genome of mitochondrial

00:13:18.100 --> 00:13:21.539
Eve. Yes, the theoretical root from which all

00:13:21.539 --> 00:13:24.480
currently known human mitochondria descend. That

00:13:24.480 --> 00:13:27.139
is a fundamental shift in perspective. Instead

00:13:27.139 --> 00:13:29.080
of picking a leaf on the outside of the human

00:13:29.080 --> 00:13:31.080
family tree and comparing all the other leaves

00:13:31.080 --> 00:13:33.419
to it, they mathematically work their way backward.

00:13:33.840 --> 00:13:35.879
Down the branches. All the way to the root of

00:13:35.879 --> 00:13:38.600
the tree. They reconstructed what the DNA of

00:13:38.600 --> 00:13:40.919
our most recent common matrilineal ancestor would

00:13:40.919 --> 00:13:43.299
have looked like. From an evolutionary and analytical

00:13:43.299 --> 00:13:46.259
standpoint, it is a vastly more elegant approach.

00:13:46.759 --> 00:13:49.899
If we connect this to the bigger picture, using

00:13:49.899 --> 00:13:52.980
the ancestral root as the baseline is incredibly

00:13:52.980 --> 00:13:55.820
useful for comparing the changes across different

00:13:55.820 --> 00:13:57.899
haplogroups. Because you have a true starting

00:13:57.899 --> 00:14:00.539
point. Yes. When you measure differences from

00:14:00.539 --> 00:14:03.539
the RSRS, you are measuring the actual evolutionary

00:14:03.539 --> 00:14:06.519
path of mutations as they occurred chronologically

00:14:06.519 --> 00:14:08.919
over human history. You are tracking the journey

00:14:08.919 --> 00:14:12.000
from the root to the leaf, rather than just calculating

00:14:12.000 --> 00:14:14.399
the lateral distance between two modern leaves.

00:14:14.720 --> 00:14:17.240
It makes tracing the lineage much clearer. It

00:14:17.240 --> 00:14:19.840
makes total sense conceptually. But jumping back

00:14:19.840 --> 00:14:21.700
to the logistical problem we talked about earlier,

00:14:21.919 --> 00:14:24.779
if you change the baseline entirely to a reconstructed

00:14:24.779 --> 00:14:28.080
ancestor, doesn't that break all the bioinformatics

00:14:28.080 --> 00:14:30.799
databases again? That was the brilliance of the

00:14:30.799 --> 00:14:34.139
Behar et al. proposal. The RSRS actually keeps

00:14:34.139 --> 00:14:36.820
the exact same numbering system as the old Cambridge

00:14:36.820 --> 00:14:39.879
sequence. Oh, really? Yes. They deliberately

00:14:39.879 --> 00:14:42.700
maintained the legacy framework to ensure stability

00:14:42.700 --> 00:14:45.679
in the global databases, even while fundamentally

00:14:45.679 --> 00:14:47.840
changing what the sequence represents at its

00:14:47.840 --> 00:14:50.610
core. A structural workaround to save the databases.

00:14:50.970 --> 00:14:53.809
Exactly. But a paradigm shift like moving from

00:14:53.809 --> 00:14:58.110
a tangible European woman to a theoretical mitochondrial

00:14:58.110 --> 00:15:01.730
Eve probably wasn't embraced overnight. It absolutely

00:15:01.730 --> 00:15:05.289
sparked an ongoing debate. In 2014, a group led

00:15:05.289 --> 00:15:08.629
by Bandelt published a paper pushing back

00:15:08.629 --> 00:15:11.809
against the RSRS, arguing strongly to keep using

00:15:11.809 --> 00:15:14.049
the revised Cambridge sequence. What was their

00:15:14.049 --> 00:15:16.149
reasoning? They made the case for standardizing

00:15:16.149 --> 00:15:19.690
notation around the known physical entity. It

00:15:19.690 --> 00:15:21.850
seems like a debate between practicality and

00:15:21.850 --> 00:15:25.850
theoretical purity. The rCRS is a tangible, physical

00:15:25.850 --> 00:15:28.470
sequence. We know it works as a comparative tool,

00:15:28.629 --> 00:15:31.169
despite the cow DNA history and the phantom N.

00:15:31.480 --> 00:15:33.580
Right. The RSRS is intellectually satisfying

00:15:33.580 --> 00:15:35.980
and evolutionarily accurate, but it remains a

00:15:35.980 --> 00:15:38.620
reconstructed theoretical sequence. It is a model,

00:15:38.720 --> 00:15:40.519
ultimately. So where does that leave the testing

00:15:40.519 --> 00:15:42.740
companies today? Do they use the quirky European

00:15:42.740 --> 00:15:45.980
yardstick or the reconstructed ghost of mitochondrial

00:15:45.980 --> 00:15:48.620
Eve? Commercial companies like Family Tree DNA

00:15:48.620 --> 00:15:50.960
have taken a very pragmatic approach. What do

00:15:50.960 --> 00:15:53.799
they do? They decided not to choose. They now

00:15:53.799 --> 00:15:56.940
report mtDNA results for both the rCRS and the

00:15:56.940 --> 00:15:59.639
RSRS. That's smart. They cover all the bases,

00:15:59.980 --> 00:16:02.259
providing your genetic differences compared to

00:16:02.259 --> 00:16:04.940
both the historical modern standard and the ancient

00:16:04.940 --> 00:16:08.000
evolutionary root. It allows users to view their

00:16:08.000 --> 00:16:11.159
genetics through multiple lenses. So what does

00:16:11.159 --> 00:16:13.639
this all mean? We started this deep dive looking

00:16:13.639 --> 00:16:16.539
at a reference sequence, but we found a wild

00:16:16.539 --> 00:16:19.320
timeline of human endeavor. It really is wild.

00:16:19.850 --> 00:16:22.070
The foundation of modern genetic genealogy started

00:16:22.070 --> 00:16:25.269
in the 1970s with a monumental effort to sequence

00:16:25.269 --> 00:16:28.450
one single woman's DNA. That sample survived

00:16:28.450 --> 00:16:31.490
contamination with bovine and HeLa cells. It

00:16:31.490 --> 00:16:34.110
had to be painstakingly revised. It required

00:16:34.110 --> 00:16:36.350
a phantom placeholder just to keep the world's

00:16:36.350 --> 00:16:38.649
databases from breaking. And it turned out to

00:16:38.649 --> 00:16:40.850
contain rare mutations of its own. And despite

00:16:40.850 --> 00:16:43.070
all that, it built the modern genetic landscape.

00:16:43.250 --> 00:16:46.490
It evolved into a standardized map. And now it

00:16:46.490 --> 00:16:48.509
is shifting toward a theoretical reconstruction

00:16:48.509 --> 00:16:51.429
of the mother of all modern human lineages. It

00:16:51.429 --> 00:16:53.350
is a remarkable narrative. I think the most important

00:16:53.350 --> 00:16:55.350
takeaway for you listening to this is about the

00:16:55.350 --> 00:16:57.899
nature of scientific truth. How so? Every time

00:16:57.899 --> 00:17:00.299
you read a medical claim or look at a genetic

00:17:00.299 --> 00:17:02.759
test result or hear about a biological standard,

00:17:02.960 --> 00:17:05.099
it is essential to remember that the baseline

00:17:05.099 --> 00:17:07.619
of normal is a human construct. It's an agreed-upon

00:17:07.619 --> 00:17:10.500
point. Yes. It is a tool we build to help

00:17:10.500 --> 00:17:13.519
us understand a chaotic universe. And like all

00:17:13.519 --> 00:17:16.519
tools, it is subject to revision, to the realities

00:17:16.519 --> 00:17:19.980
of laboratory limitations, and to rigorous, ongoing

00:17:19.980 --> 00:17:23.670
debate. The yardstick we use to measure humanity

00:17:23.670 --> 00:17:25.890
is always evolving alongside our understanding

00:17:25.890 --> 00:17:29.210
of ourselves. It completely changes how you view

00:17:29.210 --> 00:17:32.410
a simple pie chart or a list of genetic markers.

00:17:32.769 --> 00:17:35.029
You aren't just looking at your history. You're

00:17:35.029 --> 00:17:36.509
looking at your history filtered through the

00:17:36.509 --> 00:17:39.390
lens of a highly specific, passionately debated,

00:17:39.490 --> 00:17:42.470
and ever-changing scientific map. Which leaves

00:17:42.470 --> 00:17:44.549
us with a fascinating question to consider as

00:17:44.549 --> 00:17:46.650
we wrap up our exploration of this baseline.

00:17:46.990 --> 00:17:49.440
Let's hear it. If our ultimate genetic reference

00:17:49.440 --> 00:17:52.400
point is actively shifting from a tangible physical

00:17:52.400 --> 00:17:54.380
sample like that original Cambridge sequence

00:17:54.380 --> 00:17:57.940
to a reconstructed model of the past like the

00:17:57.940 --> 00:18:00.960
RSRS of mitochondrial Eve, how might our entire

00:18:00.960 --> 00:18:03.140
understanding of human history and our evolutionary

00:18:03.140 --> 00:18:06.019
relationships shift if scientists 50 years from

00:18:06.019 --> 00:18:08.339
now discover that our theoretical reconstruction

00:18:08.339 --> 00:18:11.420
of Eve was slightly off? That is a haunting thought.

00:18:11.619 --> 00:18:13.660
When the foundation of your map is a theoretical

00:18:13.660 --> 00:18:16.200
ghost, what happens when the ghost changes shape?