WEBVTT

00:00:00.000 --> 00:00:03.759
Right now, your brain is performing millions

00:00:03.759 --> 00:00:07.299
of agonizingly precise mathematical calculations

00:00:07.299 --> 00:00:10.060
just to understand the end of this sentence.

00:00:10.119 --> 00:00:12.279
Yeah, and you don't even feel it happening. Exactly.

00:00:12.460 --> 00:00:15.199
You just effortlessly decode the acoustic vibrations

00:00:15.199 --> 00:00:16.839
coming out of whatever speaker you're listening

00:00:16.839 --> 00:00:19.440
to right now. You turn them into concepts, images,

00:00:20.500 --> 00:00:22.679
and ideas inside your head. It's just wild. And

00:00:22.679 --> 00:00:26.789
the crazy thing is... For about 50 years, the

00:00:26.789 --> 00:00:29.230
smartest computer scientists on the planet couldn't

00:00:29.230 --> 00:00:31.210
figure out how to build a machine to do that

00:00:31.210 --> 00:00:33.409
exact same thing. Right. Because it's the ultimate

00:00:33.409 --> 00:00:36.049
paradox of human biology, isn't it? Oh, absolutely.

00:00:36.350 --> 00:00:38.429
The things that feel the most automatic to us:

00:00:38.429 --> 00:00:40.929
walking, recognizing a face, or just having a

00:00:40.929 --> 00:00:43.689
casual conversation, they're actually built on

00:00:43.689 --> 00:00:46.329
mechanical frameworks so wildly complex that

00:00:46.329 --> 00:00:48.509
they just, well, they break our most advanced

00:00:48.509 --> 00:00:51.289
supercomputers. Welcome to another Deep Dive.

00:00:51.390 --> 00:00:54.000
Today, we are focusing our attention on a single

00:00:54.000 --> 00:00:57.179
incredibly dense piece of source material. We're

00:00:57.179 --> 00:00:59.539
looking at a comprehensive Wikipedia breakdown

00:00:59.539 --> 00:01:02.420
on the interdisciplinary field of computational

00:01:02.420 --> 00:01:04.939
linguistics. Yeah, it's a big one. It really

00:01:04.939 --> 00:01:07.819
is. And our mission for this deep dive is to

00:01:07.819 --> 00:01:10.920
explore how the sort of desperate attempt to

00:01:10.920 --> 00:01:13.500
teach computers to understand human language

00:01:13.500 --> 00:01:16.060
accidentally revealed some of the most fascinating

00:01:16.060 --> 00:01:18.939
insights into how human beings and specifically

00:01:18.939 --> 00:01:22.920
human infants actually learn to speak. We should

00:01:22.920 --> 00:01:25.500
probably establish the sheer scale of what we're

00:01:25.500 --> 00:01:28.239
talking about here first, though. I mean, computational

00:01:28.239 --> 00:01:30.780
linguistics isn't just some new sub-department

00:01:30.780 --> 00:01:33.019
of computer science. Oh, not at all. When you

00:01:33.019 --> 00:01:35.519
try to force human language into a silicon chip,

00:01:36.000 --> 00:01:39.019
you are essentially initiating this massive collision

00:01:39.019 --> 00:01:42.719
between cognitive psychology, philosophy, formal

00:01:42.719 --> 00:01:45.920
logic, neuroscience, anthropology. And mathematics,

00:01:46.219 --> 00:01:48.640
too. Right, mathematics. It touches almost every

00:01:48.640 --> 00:01:50.500
single discipline that studies the human mind.

00:01:50.599 --> 00:01:53.719
OK, let's unpack this, because to really understand

00:01:53.719 --> 00:01:56.739
the sophisticated software we use today, we have

00:01:56.739 --> 00:01:59.500
to start at the absolute beginning when our assumptions

00:01:59.500 --> 00:02:02.680
about language were just incredibly flawed. The

00:02:02.680 --> 00:02:05.140
timeline of this specific field really kicks

00:02:05.140 --> 00:02:07.620
off in the 1950s, right in the thick of the Cold

00:02:07.620 --> 00:02:10.569
War. And that geopolitical context is vital.

00:02:11.110 --> 00:02:13.729
The field didn't start because scientists were

00:02:13.729 --> 00:02:16.650
just, you know, curious about language. It started

00:02:16.650 --> 00:02:19.689
out of a very specific, very urgent military

00:02:19.689 --> 00:02:21.889
need in the United States. Right. They wanted

00:02:21.889 --> 00:02:24.669
to use early computers to automatically translate

00:02:24.669 --> 00:02:28.629
foreign texts into English. Exactly. And specifically,

00:02:28.629 --> 00:02:31.090
they were looking at Russian scientific journals.

00:02:31.210 --> 00:02:33.289
They wanted to know what Soviet scientists were

00:02:33.289 --> 00:02:35.990
researching, but human translation was just too

00:02:35.990 --> 00:02:39.159
slow. So the dream was to have a machine just

00:02:39.159 --> 00:02:42.080
ingest a Russian journal on one end and spit

00:02:42.080 --> 00:02:43.819
out an English version on the other. That was

00:02:43.819 --> 00:02:46.590
the dream, yeah. And the initial expectation

00:02:46.590 --> 00:02:48.870
for how this would work was based purely on what

00:02:48.870 --> 00:02:51.370
computers were already good at. At the time,

00:02:51.590 --> 00:02:54.069
early computers were proving to be absolute marvels

00:02:54.069 --> 00:02:56.330
at arithmetic calculations. They could crunch

00:02:56.330 --> 00:02:59.849
complex algebra using explicit, rigid mathematical

00:02:59.849 --> 00:03:03.090
rules much faster than a human ever could. Right,

00:03:03.210 --> 00:03:04.990
so the researchers looked at that capability

00:03:04.990 --> 00:03:07.150
and thought, well, language is basically just

00:03:07.150 --> 00:03:09.389
a system of rules, right? Wow, so they just assumed

00:03:09.389 --> 00:03:12.659
it was math. That was the fatal assumption. They

00:03:12.659 --> 00:03:14.879
believed that the core pillars of language could

00:03:14.879 --> 00:03:17.479
all be programmed mathematically. And by core

00:03:17.479 --> 00:03:19.860
pillars, you mean, like, vocabulary and grammar?

00:03:20.199 --> 00:03:23.300
Exactly. The lexicon, which is just the vocabulary.

00:03:23.780 --> 00:03:26.319
The morphology, which is how words change shape,

00:03:26.780 --> 00:03:29.580
like adding an -ed to make something past tense.

00:03:30.259 --> 00:03:32.580
The syntax, which is the structural order of

00:03:32.580 --> 00:03:35.599
words. And finally, the semantics, you know,

00:03:35.620 --> 00:03:37.599
the actual meaning of the sentence. OK, got it.

00:03:37.699 --> 00:03:40.139
They thought if you just hard code all the explicit

00:03:40.139 --> 00:03:42.800
rules for those four pillars into a machine,

00:03:43.419 --> 00:03:45.580
it would perfectly understand language. Which

00:03:45.580 --> 00:03:47.560
is just wild to think about now. I mean, this

00:03:47.560 --> 00:03:50.460
is like trying to learn to speak a foreign language

00:03:50.460 --> 00:03:52.960
just by memorizing a dictionary and a grammar

00:03:52.960 --> 00:03:55.300
textbook without ever actually hearing a real

00:03:55.300 --> 00:03:57.539
conversation. It's way too rigid. Right. You

00:03:57.539 --> 00:03:58.919
might know the rules, but you don't know the

00:03:58.919 --> 00:04:01.300
language. If we connect this to the bigger picture.

00:04:02.000 --> 00:04:05.020
That rigidity is exactly why these early

00:04:05.020 --> 00:04:07.560
rule-based translation systems completely fell apart.

00:04:08.240 --> 00:04:10.560
Human language simply isn't a neat algebraic

00:04:10.560 --> 00:04:13.780
equation. No, it's messy. Very messy. Think about

00:04:13.780 --> 00:04:17.199
the English word bank. A rigid arithmetic rule

00:04:17.199 --> 00:04:20.500
tells the computer it's a noun. But is it a riverbank?

00:04:20.680 --> 00:04:23.579
A financial institution? Does a plane bank to

00:04:23.579 --> 00:04:26.300
the left... Oh, I see. A strict set of math rules

00:04:26.300 --> 00:04:28.420
cannot calculate the elastic nature of human

00:04:28.420 --> 00:04:31.180
context and cultural nuance. Precisely. The machine

00:04:31.180 --> 00:04:34.000
gets paralyzed by ambiguity. And the failure

00:04:34.000 --> 00:04:36.379
to deliver this automatic Russian translation

00:04:36.379 --> 00:04:39.600
was actually so profound that it caused a massive

00:04:39.600 --> 00:04:41.740
schism in the academic community. Wait, really?

00:04:42.019 --> 00:04:44.379
Like a full-on academic drama? Oh, yeah. The

00:04:44.379 --> 00:04:46.139
computer scientists basically threw their hands

00:04:46.139 --> 00:04:49.339
up. It forced a total rebranding of the entire

00:04:49.339 --> 00:04:51.839
discipline. I love a good academic drama. A researcher

00:04:51.839 --> 00:04:55.000
named David Hays actually coined the term computational

00:04:55.000 --> 00:04:57.500
linguistics in the aftermath of this failure,

00:04:57.959 --> 00:05:00.259
specifically to distance the work from artificial

00:05:00.259 --> 00:05:03.459
intelligence. Ah, because AI had a bad reputation

00:05:03.459 --> 00:05:06.759
at that point. Exactly. AI had over-promised

00:05:06.759 --> 00:05:08.860
and under-delivered, so Hays wanted to pull

00:05:08.860 --> 00:05:11.040
the study of language away from those purely

00:05:11.040 --> 00:05:15.199
math-based, rigid expectations. This pivot eventually

00:05:15.199 --> 00:05:17.699
led to the creation of dedicated organizations

00:05:17.699 --> 00:05:20.420
like the Association for Computational Linguistics,

00:05:20.779 --> 00:05:23.560
the ACL, and the International Committee on Computational

00:05:23.560 --> 00:05:26.319
Linguistics. So if handing a computer a rigid

00:05:26.319 --> 00:05:29.100
grammar rulebook didn't work, what was the pivot?

00:05:29.699 --> 00:05:32.120
How do you teach a machine if you can't just

00:05:32.120 --> 00:05:34.360
program the rules of syntax into it? You change

00:05:34.360 --> 00:05:37.319
the diet. If theory fails, you pivot to practice.

00:05:38.279 --> 00:05:40.560
Researchers realized computers needed massive,

00:05:40.779 --> 00:05:43.220
overwhelming amounts of real-world data to study

00:05:43.220 --> 00:05:45.300
language organically rather than theoretically.

00:05:45.360 --> 00:05:48.000
Just feed it raw language. Right. And this realization

00:05:48.000 --> 00:05:51.100
gave birth to the era of the annotated text corpus.

00:05:51.480 --> 00:05:53.300
And this brings us to one of the most entertaining

00:05:53.300 --> 00:05:56.180
details in our source material. To give computers

00:05:56.180 --> 00:05:58.879
this diet of real-world language, researchers

00:05:58.879 --> 00:06:01.399
in the late 1980s and 90s created something called

00:06:01.399 --> 00:06:03.759
the Pen Tree Bank. Yes, a legendary data set.

00:06:04.079 --> 00:06:06.639
It became one of the most widely used corpora

00:06:06.639 --> 00:06:09.300
in the entire field. The Penn Treebank contained

00:06:09.300 --> 00:06:12.120
over 4.5 million words of American English.

00:06:12.560 --> 00:06:14.879
But the bizarre part is where they source those

00:06:14.879 --> 00:06:17.360
4.5 million words. It is definitely not what

00:06:17.360 --> 00:06:19.600
you would expect for a foundational scientific

00:06:19.600 --> 00:06:23.699
data set. It was a mashup of IBM computer manuals

00:06:23.699 --> 00:06:27.240
and transcribed telephone conversations. You

00:06:27.240 --> 00:06:29.519
laugh at that. It is pretty funny. I mean, early

00:06:29.519 --> 00:06:32.459
language models essentially learn to understand

00:06:32.459 --> 00:06:36.259
human speech by analyzing a chaotic blend of

00:06:36.259 --> 00:06:40.399
dry corporate IT instructions and random, probably

00:06:40.399 --> 00:06:43.319
very mundane, phone calls between everyday people.

00:06:43.459 --> 00:06:46.240
It sounds totally absurd, but from a data perspective,

00:06:46.399 --> 00:06:48.680
it was actually a brilliant pairing. Think about

00:06:48.680 --> 00:06:50.639
what a computer needs to understand the full

00:06:50.639 --> 00:06:53.540
spectrum of a language. The IBM manuals provided

00:06:53.540 --> 00:06:57.100
a vast ocean of formal, highly structured, grammatically

00:06:57.100 --> 00:06:59.800
perfect text. Right, very rigid. But then the

00:06:59.800 --> 00:07:02.279
telephone conversations provided the exact opposite.

00:07:02.579 --> 00:07:05.300
It gave the machine the informal, spontaneous,

00:07:05.699 --> 00:07:07.920
messy, often grammatically incorrect way that

00:07:07.920 --> 00:07:10.439
humans actually talk. But they didn't just dump

00:07:10.439 --> 00:07:13.160
4.5 million words onto a hard drive and tell

00:07:13.160 --> 00:07:15.439
the computer, good luck, figure it out, right?

00:07:15.899 --> 00:07:17.959
They had to build a bridge so the machine could

00:07:17.959 --> 00:07:21.350
actually read it. That bridge is the annotation

00:07:21.350 --> 00:07:24.769
part of an annotated corpus. Researchers had

00:07:24.769 --> 00:07:27.269
to painstakingly go through those millions of

00:07:27.269 --> 00:07:29.709
words and label them, basically creating a map

00:07:29.709 --> 00:07:32.410
for the machine. The source specifically mentions

00:07:32.410 --> 00:07:35.910
techniques like part of speech tagging and syntactic

00:07:35.910 --> 00:07:38.149
bracketing. Let's break those down. Sure. Part

00:07:38.149 --> 00:07:40.149
of speech tagging is pretty self-explanatory,

00:07:40.329 --> 00:07:42.670
right? Just manually labeling every word as a

00:07:42.670 --> 00:07:45.750
noun, verb, or adjective. But what is syntactic

00:07:45.750 --> 00:07:48.199
bracketing? Think of syntactic bracketing as

00:07:48.199 --> 00:07:51.019
drawing invisible boxes around chunks of a sentence

00:07:51.019 --> 00:07:53.680
to show the computer how ideas group together.

00:07:53.939 --> 00:07:56.019
OK, like how? Well, if you have the sentence,

00:07:56.139 --> 00:07:58.699
the quick brown fox jumps, you don't just want

00:07:58.699 --> 00:08:01.339
the computer to read it word by word. You draw

00:08:01.339 --> 00:08:03.759
a bracket around the quick brown fox to tell

00:08:03.759 --> 00:08:06.379
the computer, hey, this whole group of words

00:08:06.379 --> 00:08:08.779
functions as a single noun phrase. Ah, I see.
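A toy sketch in code may make the bracketing idea concrete. The nested tuple below hand-encodes "the quick brown fox jumps" as labeled brackets; the tag names (S, NP, VP, DT, JJ, NN, VBZ) follow common treebank conventions, but this particular structure is an illustrative assumption, not an actual Penn Treebank entry.

```python
# A node is (label, children...); a leaf is a plain word string.
tree = ("S",
        ("NP",
         ("DT", "the"), ("JJ", "quick"), ("JJ", "brown"), ("NN", "fox")),
        ("VP",
         ("VBZ", "jumps")))

def bracket(node):
    """Render a tree node in the familiar (LABEL child child) notation."""
    if isinstance(node, str):          # a bare word
        return node
    label, *children = node
    return "(" + label + " " + " ".join(bracket(c) for c in children) + ")"

print(bracket(tree))
# (S (NP (DT the) (JJ quick) (JJ brown) (NN fox)) (VP (VBZ jumps)))
```

Part-of-speech tags sit at the leaves, and each pair of parentheses is one of the "invisible boxes" being described.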

00:08:08.819 --> 00:08:10.680
Then you draw a bigger bracket connecting that

00:08:10.680 --> 00:08:12.980
to jumps. You are essentially drawing a structural

00:08:12.980 --> 00:08:15.699
tree of the sentence. Hence the name, Penn

00:15:15.699 --> 00:15:19.439
Treebank. Okay, that makes sense. But obviously they

00:08:19.439 --> 00:08:21.779
didn't just stop with American English, right?

00:08:21.839 --> 00:08:24.860
Yeah. Did they apply this massive data analysis

00:08:24.860 --> 00:08:27.329
to other languages, because I imagine something

00:08:27.329 --> 00:08:29.949
highly structured like Japanese would yield totally

00:08:29.949 --> 00:08:32.269
different data. They absolutely did apply it

00:08:32.269 --> 00:08:34.950
to other languages, including massive Japanese

00:08:34.950 --> 00:08:37.789
sentence corpora, and what they discovered completely

00:08:37.789 --> 00:08:40.370
shocked the linguistics community. What did they

00:08:40.370 --> 00:08:43.049
find? When they analyzed millions of Japanese

00:08:43.049 --> 00:08:45.769
sentences, they didn't find totally different

00:08:45.769 --> 00:08:48.750
data. They found a hidden mathematical pattern

00:08:48.750 --> 00:08:51.870
related to sentence length called log-normality.

00:08:52.250 --> 00:08:55.120
Log-normality. What exactly does that look like

00:08:55.120 --> 00:08:57.909
in the data? Imagine a graph plotting how long

00:08:57.909 --> 00:08:59.889
sentences are. You might assume it would look

00:08:59.889 --> 00:09:02.070
like a standard bell curve, right? Like, most

00:09:02.070 --> 00:09:04.889
sentences are of average length, with a few short

00:09:04.889 --> 00:09:07.330
ones and a few long ones perfectly balanced on

00:09:07.330 --> 00:09:09.330
either side. Yeah, that's what I'd guess. But

00:09:09.330 --> 00:09:12.389
log-normality is a skewed distribution. It means

00:09:12.389 --> 00:09:15.230
most sentences cluster tightly around a shorter

00:09:15.230 --> 00:09:17.950
length, but there is a very long trailing tail

00:09:17.950 --> 00:09:21.169
of occasionally massive, highly complex sentences.

00:09:21.450 --> 00:09:23.429
And this pattern showed up in Japanese, just

00:09:23.429 --> 00:09:26.039
like it does in English. Yes. And that is the

00:09:26.039 --> 00:09:28.480
massive revelation. Even though human language

00:09:28.480 --> 00:09:31.419
isn't an explicit arithmetic equation like the

00:09:31.419 --> 00:09:35.190
1950s scientists thought, there are still deep

00:09:35.190 --> 00:09:38.090
statistical mathematical fingerprints hidden

00:09:38.090 --> 00:09:40.950
inside of it. That is wild. Right. Whether you

00:09:40.950 --> 00:09:43.429
are speaking English on a telephone or writing

00:09:43.429 --> 00:09:46.909
formal Japanese text, your brain is unknowingly

00:09:46.909 --> 00:09:49.389
adhering to these statistical patterns of log

00:09:49.389 --> 00:09:51.909
normality. That's genuinely mind blowing. We

00:09:51.909 --> 00:09:54.789
were just walking around generating complex statistical

00:09:54.789 --> 00:09:56.950
distributions with our mouths, totally unaware

00:09:56.950 --> 00:10:00.090
of it. Exactly. But as fascinating as the 4.5

00:10:00.090 --> 00:10:02.690
million words in the Penn Treebank are, it highlights

00:10:02.690 --> 00:10:05.240
a massive disconnect, because a human toddler

00:10:05.240 --> 00:10:08.720
does not need to read 4.5 million words of IBM

00:10:08.720 --> 00:10:11.600
manuals to learn how to speak. That is the exact

00:10:11.600 --> 00:10:14.100
friction point that forced computational linguists

00:10:14.100 --> 00:10:16.399
to start looking deeply into human cognitive

00:10:16.399 --> 00:10:19.820
psychology. The massive data corpora worked for

00:10:19.820 --> 00:10:22.740
machines, but it clearly wasn't how humans operated.

00:10:22.759 --> 00:10:25.059
Right. They had to figure out how to simulate

00:10:25.059 --> 00:10:28.419
human language acquisition. And trying to simulate

00:10:28.419 --> 00:10:31.919
a toddler brings up a massive paradox that the

00:10:31.919 --> 00:10:34.759
source material calls the problem of positive

00:10:34.759 --> 00:10:37.379
evidence. This is one of the most heavily debated

00:10:37.379 --> 00:10:40.360
concepts in linguistics. When children are acquiring

00:10:40.360 --> 00:10:43.220
language, they are largely only exposed to positive

00:10:43.220 --> 00:10:45.559
evidence. Meaning what? Exactly. It means they

00:10:45.559 --> 00:10:48.000
only ever hear the correct forms of language

00:10:48.000 --> 00:10:51.120
spoken by their parents or peers. They are given

00:10:51.120 --> 00:10:54.259
evidence for what is correct, but they are rarely,

00:10:54.379 --> 00:10:57.559
if ever, given explicit negative evidence. Nobody

00:10:57.559 --> 00:11:00.100
is outlining all the mathematical ways a sentence

00:11:00.100 --> 00:11:02.480
could be constructed incorrectly. Exactly. Wait,

00:11:02.480 --> 00:11:05.179
I have to push back on that. If babies only ever

00:11:05.179 --> 00:11:08.220
hear correct words, how do they ever figure out

00:11:08.220 --> 00:11:10.399
when they themselves are making a grammatical

00:11:10.399 --> 00:11:13.899
mistake? Like, if nobody is programming the wrong

00:11:13.899 --> 00:11:16.259
rules into them, how do their brains know to

00:11:16.259 --> 00:11:18.799
avoid them? You're touching on the exact limitation

00:11:18.799 --> 00:11:21.259
that plagued computational models in the late

00:11:21.259 --> 00:11:24.740
1980s. Yes, the early machines could not handle

00:11:24.740 --> 00:11:27.279
the lack of negative evidence. If a computer

00:11:27.279 --> 00:11:30.059
doesn't know what is explicitly wrong, it struggles

00:11:30.059 --> 00:11:33.100
to narrow down what is right. We didn't have

00:11:33.230 --> 00:11:36.389
the sophisticated deep learning algorithms back

00:11:36.389 --> 00:11:38.769
then that can just infer negative boundaries

00:11:38.769 --> 00:11:41.029
organically. So how do they bridge the gap? How

00:11:41.029 --> 00:11:43.990
do you get a computer to learn like a toddler

00:11:43.990 --> 00:11:47.049
who only hears positive evidence? Through a concept

00:11:47.049 --> 00:11:50.000
called incremental learning. Researchers hypothesized

00:11:50.000 --> 00:11:51.960
that maybe the secret wasn't the data itself,

00:11:52.320 --> 00:11:54.159
but the rate at which the data was consumed.

00:11:54.399 --> 00:11:56.980
OK, so feeding it slowly. Exactly. They found

00:11:56.980 --> 00:11:59.659
that a machine, like a human, learns best if the

00:11:59.659 --> 00:12:02.159
input is incredibly simple at first, and then

00:12:02.159 --> 00:12:04.639
slowly scales up in complexity. So you don't

00:12:04.639 --> 00:12:06.279
build the roof before you pour the foundation.
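The incremental idea can be sketched in a few lines. This is a hypothetical curriculum over an invented five-sentence corpus, using word count as a crude stand-in for complexity; real systems use much richer difficulty measures.

```python
# Toy corpus, invented for illustration.
corpus = [
    "dog runs",
    "the dog runs fast",
    "the big dog runs very fast today",
    "go",
    "the dog that chased the cat runs across the yard every morning",
]

# Order by a simple complexity proxy: number of words.
by_complexity = sorted(corpus, key=lambda s: len(s.split()))

def stages(sentences, stage_sizes):
    """Yield cumulative curricula: each stage adds harder sentences."""
    shown = 0
    for size in stage_sizes:
        shown += size
        yield sentences[:shown]

# The learner sees simple input first, then complexity scales up.
for i, curriculum in enumerate(stages(by_complexity, [2, 2, 1]), start=1):
    print(f"stage {i}: {len(curriculum)} sentences, "
          f"longest = {max(len(s.split()) for s in curriculum)} words")
```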

00:12:06.620 --> 00:12:08.740
You don't feed the machine the entire complex

00:12:08.740 --> 00:12:11.700
Penn Treebank on day one. No. You present the input

00:12:11.700 --> 00:12:14.539
incrementally. And here is where it ties beautifully

00:12:14.539 --> 00:12:17.830
back to human biology. This computational finding

00:12:17.830 --> 00:12:20.570
provides a profound explanation for why human

00:12:20.570 --> 00:12:24.049
infants have such a uniquely long period of helplessness

00:12:24.049 --> 00:12:26.570
in language acquisition compared to other animals.

00:12:26.889 --> 00:12:29.330
We actually need our memory and attention spans

00:12:29.330 --> 00:12:32.750
to be small at first. It acts as a filter, forcing

00:12:32.750 --> 00:12:35.590
us to focus only on the simplest, most basic

00:12:35.590 --> 00:12:38.990
positive evidence. As our memory physically grows,

00:12:39.230 --> 00:12:41.330
the complexity of the language we can process

00:12:41.330 --> 00:12:47.519
scales up alongside it. To truly test these theories

00:12:47.519 --> 00:12:50.840
of infant language acquisition, the researchers

00:12:50.840 --> 00:12:53.679
didn't just build software, they built physical

00:12:53.679 --> 00:12:57.100
robots. Yes, the introduction of robotics into

00:12:57.100 --> 00:12:59.740
computational linguistics was a massive paradigm

00:12:59.740 --> 00:13:02.139
shift. I can imagine. Researchers realized that

00:13:02.139 --> 00:13:04.279
human babies don't learn language in a vacuum.

00:13:04.679 --> 00:13:06.360
They learn it by physically interacting with

00:13:06.360 --> 00:13:08.899
their environment. So they built physical robots

00:13:08.899 --> 00:13:11.120
to test something called an affordance model.

00:13:11.360 --> 00:13:13.440
Based on the source, the affordance model is

00:13:13.440 --> 00:13:16.139
essentially mapping physical reality to audio,

00:13:16.379 --> 00:13:18.480
right? Yes. They programmed these robots with

00:13:18.480 --> 00:13:21.059
motors and sensors. The robot would perform an

00:13:21.059 --> 00:13:24.029
action, like pushing a block, it would physically

00:13:24.029 --> 00:13:25.950
perceive the environment changing through its

00:13:25.950 --> 00:13:28.350
sensors, and then the researchers would link

00:13:28.350 --> 00:13:30.909
that physical data to a spoken word, like push.

00:13:31.129 --> 00:13:33.730
That's the mechanism perfectly described. They

00:13:33.730 --> 00:13:36.409
gave the language a physical grounding. And the

00:13:36.409 --> 00:13:39.370
result of these robot toddler experiments fundamentally

00:13:39.370 --> 00:13:42.769
shook the linguistics world. Because the robots

00:13:42.769 --> 00:13:46.210
were able to acquire functioning word-to-meaning

00:13:46.210 --> 00:13:49.759
mappings without needing any grammatical structure

00:13:49.759 --> 00:13:51.980
programmed into them at all. The philosophical

00:13:51.980 --> 00:13:54.539
implication here is staggering. For decades,

00:13:54.879 --> 00:13:57.399
scientists obsessed over syntax and grammar rules,

00:13:57.639 --> 00:13:59.320
assuming that was the core of language. Like

00:13:59.320 --> 00:14:03.159
the 1950s math guys. Right. But the robot experiment

00:14:03.159 --> 00:14:05.379
suggested that meaning comes before grammar.

00:14:05.840 --> 00:14:08.159
The raw physical interaction with the world,

00:14:08.559 --> 00:14:11.039
the action of pushing, the perception of an object

00:14:11.039 --> 00:14:14.059
that is the actual foundation. Grammar is just

00:14:14.059 --> 00:14:16.240
the architectural scaffolding we build on top

00:14:16.240 --> 00:14:18.860
of the meaning later on to organize it. Which,

00:14:19.019 --> 00:14:20.919
if you've ever spent 10 minutes with a toddler,

00:14:21.019 --> 00:14:24.450
makes total intuitive sense. A one-year-old

00:14:24.450 --> 00:14:27.370
knows exactly what the word ball means, and they

00:14:27.370 --> 00:14:29.669
know the physical action of throwing it long

00:14:29.669 --> 00:14:32.149
before they can construct a syntactically perfect

00:14:32.149 --> 00:14:34.870
sentence like, Mother, I would like to throw

00:14:34.870 --> 00:14:37.549
the red ball. But that transition is the missing

00:14:37.549 --> 00:14:40.549
link. How do we get from a robot understanding

00:14:40.549 --> 00:14:43.929
the raw meaning of ball to a human understanding

00:14:43.929 --> 00:14:47.429
highly complex grammar? To understand that leap,

00:14:47.730 --> 00:14:49.750
the source material turns to one of the most

00:14:49.750 --> 00:14:52.509
monumental figures in the history of linguistics.

00:14:52.759 --> 00:14:55.779
Noam Chomsky. Right, Chomsky. His structural

00:14:55.779 --> 00:14:58.259
theories are the bedrock for understanding how

00:14:58.259 --> 00:15:00.860
infants eventually parse complex grammar. The

00:15:00.860 --> 00:15:03.039
source mentions something specific here called

00:15:03.039 --> 00:15:06.740
Chomsky Normal Form. What exactly is that? Chomsky

00:15:06.740 --> 00:15:09.019
Normal Form is a way of breaking down the rules

00:15:09.019 --> 00:15:12.340
of a language into a stripped-down, rigid mathematical

00:15:12.340 --> 00:15:15.379
format. Basically, it's a rule system where every

00:15:15.379 --> 00:15:17.639
piece of a sentence branches off into exactly

00:15:17.639 --> 00:15:19.960
two other pieces. This is very binary. Incredibly

00:15:19.960 --> 00:15:22.759
neat, binary, and easy for computers to process.
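A small sketch may make the two-branch constraint concrete. In Chomsky Normal Form every production is either A → B C (exactly two non-terminals) or A → w (a single terminal word); the toy grammar and checker below are illustrative assumptions, not taken from the source.

```python
# Toy grammar in (near) Chomsky Normal Form, invented for illustration.
grammar = {
    "S":   [("NP", "VP")],   # two non-terminals: the binary branch
    "NP":  [("Det", "N")],
    "VP":  [("jumps",)],     # a single terminal word
    "Det": [("the",)],
    "N":   [("fox",)],
}

def is_cnf(rules):
    """True if every production is binary-nonterminal or unary-terminal."""
    nonterminals = set(rules)
    for productions in rules.values():
        for rhs in productions:
            if len(rhs) == 2 and all(s in nonterminals for s in rhs):
                continue                      # A -> B C
            if len(rhs) == 1 and rhs[0] not in nonterminals:
                continue                      # A -> terminal
            return False
    return True

print(is_cnf(grammar))                        # True for this toy grammar
# A ternary rule like S -> NP V NP breaks the two-branch shape:
print(is_cnf({"S": [("NP", "V", "NP")], "NP": [("it",)], "V": [("saw",)]}))
```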

00:15:23.139 --> 00:15:25.299
It essentially forces language into a perfect,

00:15:25.580 --> 00:15:28.019
predictable tree structure. But the source also

00:15:28.019 --> 00:15:29.960
says researchers were trying to figure out how

00:15:29.960 --> 00:15:33.379
infants learn non-normal grammar. I assume that

00:15:33.379 --> 00:15:35.379
means humans don't actually speak in perfect,

00:15:35.379 --> 00:15:37.740
neat two-branch trees. We absolutely do not.

00:15:37.940 --> 00:15:40.399
Human language is messy, interruptive, and full

00:15:40.399 --> 00:15:42.860
of weird clauses that loop back on themselves.

00:15:43.279 --> 00:15:46.039
Like this conversation. Exactly. That is

00:15:46.039 --> 00:15:48.720
non-normal grammar. So the challenge for modern

00:15:48.720 --> 00:15:52.529
researchers was, how do we take Chomsky's theoretical

00:15:52.529 --> 00:15:55.809
structures and figure out how babies navigate

00:15:55.809 --> 00:15:59.330
the non -normal, messy reality of human speech.

00:15:59.429 --> 00:16:02.309
How'd they do it? To do this, they stopped looking

00:16:02.309 --> 00:16:04.509
just at theories and started combining them with

00:16:04.509 --> 00:16:06.470
the massive computational models we talked about

00:16:06.470 --> 00:16:09.149
earlier, like the Penn Treebank. So they merged

00:16:09.149 --> 00:16:12.289
the macro data with the structural theory. Yes.

00:16:12.559 --> 00:16:15.440
And when you do that, you unlock an entirely

00:16:15.440 --> 00:16:18.679
new level of computational linguistics. We aren't

00:16:18.679 --> 00:16:21.100
just looking at how babies learn anymore, we

00:16:21.100 --> 00:16:23.179
are looking at the vast evolutionary trajectory

00:16:23.179 --> 00:16:25.259
of human language itself. Okay, now we're zooming

00:16:25.259 --> 00:16:27.860
way out. Way out. And to do that, researchers

00:16:27.860 --> 00:16:30.559
utilized some incredibly heavy mathematics, specifically

00:16:30.559 --> 00:16:32.820
the Price equation and Pólya urn dynamics.

00:16:32.980 --> 00:16:34.960
Okay, those sound intimidating. Let's break them

00:16:34.960 --> 00:16:38.389
down. What is the Price equation doing in linguistics?

00:16:38.629 --> 00:16:40.470
Because I thought that was an evolutionary biology

00:16:40.470 --> 00:16:43.909
term used for genetics. It is. In biology, the

00:16:43.909 --> 00:16:46.309
Price equation tracks how a specific genetic

00:16:46.309 --> 00:16:49.090
trait changes in a population over generations

00:16:49.090 --> 00:16:51.870
based on its fitness. Linguists realized they

00:16:51.870 --> 00:16:54.450
could use that exact same equation, but instead

00:16:54.450 --> 00:16:57.990
of tracking a gene, they track a word or a grammatical

00:16:57.990 --> 00:17:01.039
quirk. If a new slang word is fit, meaning it's

00:17:01.039 --> 00:17:04.539
useful, catchy, or easy to type, the Price equation

00:17:04.539 --> 00:17:07.380
can track how it outcompetes older words and

00:17:07.380 --> 00:17:09.859
spreads through a population's vocabulary over

00:17:09.859 --> 00:17:13.180
time. Wow. Treating words literally like living

00:17:13.180 --> 00:17:15.559
organisms competing for survival. That's incredible.
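Numerically, the simplest form of the Price equation (ignoring transmission bias) says the change in a population's mean trait equals Cov(w, z) / mean(w), where z is the trait and w is fitness. The toy numbers below, treating "uses the new slang word" as the trait, are invented purely for illustration.

```python
z = [1, 1, 0, 0, 0]   # trait: does this speaker use the new word?
w = [3, 2, 1, 1, 1]   # fitness: how many speakers copy each one's usage

n = len(z)
z_bar = sum(z) / n                      # current mean trait
w_bar = sum(w) / n                      # mean fitness

# Price equation (no transmission bias): delta z_bar = Cov(w, z) / w_bar.
cov_wz = sum((wi - w_bar) * (zi - z_bar) for wi, zi in zip(w, z)) / n
price_delta = cov_wz / w_bar

# Direct check: next generation's mean trait, weighting each speaker
# by fitness, minus this generation's mean trait.
z_bar_next = sum(wi * zi for wi, zi in zip(w, z)) / sum(w)
direct_delta = z_bar_next - z_bar

print(round(price_delta, 3), round(direct_delta, 3))  # 0.225 0.225
```

Because users of the new word have higher fitness, the covariance is positive and the word's share of the population grows.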

00:17:15.900 --> 00:17:18.660
And what about Pólya urn dynamics? This is

00:17:18.660 --> 00:17:21.519
a brilliant statistical model. Imagine an urn

00:17:21.519 --> 00:17:24.039
filled with a few red balls and a few blue balls.

00:17:24.319 --> 00:17:26.279
You reach in blindly and pull out a red ball.

00:17:26.579 --> 00:17:28.329
The rule of the Pólya urn is that you put

00:17:28.329 --> 00:17:30.809
that red ball back, but you also add an extra

00:17:30.809 --> 00:17:33.109
red ball into the urn. So the next time I reach

00:17:33.109 --> 00:17:35.130
in, my odds of pulling a red ball are slightly

00:17:35.130 --> 00:17:38.009
higher. Exactly. The rich get richer. In linguistics,

00:17:38.049 --> 00:17:40.769
this models how certain words or sentence structures

00:17:40.769 --> 00:17:43.109
become dominant. Like when a phrase just takes

00:17:43.109 --> 00:17:45.150
over the internet. Right. The more a specific

00:17:45.150 --> 00:17:48.470
phrase is used, say, a viral meme or a new piece

00:17:48.470 --> 00:17:50.730
of corporate jargon, the more likely it is to

00:17:50.730 --> 00:17:53.549
be heard, repeated, and integrated into the broader

00:17:53.549 --> 00:17:56.269
language. It becomes a compounding exponential

00:17:56.269 --> 00:17:58.680
curve of linguistic adoption. So what does

00:17:58.680 --> 00:18:01.900
this all mean? Why are researchers applying these

00:18:01.900 --> 00:18:04.920
urns and equations to the massive corpora of

00:18:04.920 --> 00:18:07.529
data we are generating today? They aren't just

00:18:07.529 --> 00:18:09.609
doing it to understand the history of language.
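The urn dynamic described a moment ago is easy to simulate. This sketch uses arbitrary starting counts and a fixed seed; the point is that the final color share depends heavily on the earliest draws.

```python
import random

def polya_urn(red=1, blue=1, steps=1000, seed=7):
    """Pólya urn: draw a ball, return it, and add one more of its color."""
    rng = random.Random(seed)
    for _ in range(steps):
        if rng.random() < red / (red + blue):
            red += 1      # drew red: return it plus one extra red
        else:
            blue += 1     # drew blue: return it plus one extra blue
    return red, blue

red, blue = polya_urn()
print(f"red share after 1000 draws: {red / (red + blue):.2f}")

# Each run settles toward some share, but *which* share is set largely
# by the first few draws -- rerun with different seeds to watch it move.
```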

00:18:09.990 --> 00:18:13.150
They are doing it to forecast the future. Predicting

00:18:13.150 --> 00:18:15.670
the future of language. Yes. By running human

00:18:15.670 --> 00:18:18.369
communication through Pólya urn dynamics and

00:18:18.369 --> 00:18:21.130
the price equation, researchers can mathematically

00:18:21.130 --> 00:18:23.690
predict where our language is evolving next.

00:18:24.089 --> 00:18:26.289
They can forecast which dialects will merge,

00:18:26.589 --> 00:18:28.750
which syntax structures will die out, and what

00:18:28.750 --> 00:18:30.670
the baseline of human communication will look

00:18:30.670 --> 00:18:33.549
like a decade from now. That is just, I mean,

00:18:33.910 --> 00:18:36.130
every text message you send, every informal email

00:18:36.130 --> 00:18:38.910
you write, every weird, messy sentence you speak

00:18:38.910 --> 00:18:41.829
on a phone call, you are unknowingly dropping

00:18:41.829 --> 00:18:44.710
a colored ball into the urn. You're a data point

00:18:44.710 --> 00:18:47.890
in this massive evolving data set. This raises

00:18:47.890 --> 00:18:49.769
an important question for you, the listener,

00:18:49.950 --> 00:18:52.789
to consider. We navigate our daily lives assuming

00:18:52.789 --> 00:18:55.410
our communication is entirely spontaneous. We

00:18:55.410 --> 00:18:57.309
feel like we are making free will choices about

00:18:57.309 --> 00:18:59.210
the words we use. Right, I chose to say that.

00:18:59.390 --> 00:19:01.670
But if mathematical equations can accurately

00:19:01.670 --> 00:19:04.650
plot the evolutionary trajectory of our vocabulary

00:19:04.650 --> 00:19:08.539
based on statistical velocity... How predictable

00:19:08.539 --> 00:19:12.339
is our communication, really? It is a wild, slightly

00:19:12.339 --> 00:19:14.420
unnerving thought, honestly. To look back at

00:19:14.420 --> 00:19:17.400
the journey we just took. From the 1950s, computer

00:19:17.400 --> 00:19:19.619
scientists desperately trying to decode Russian

00:19:19.619 --> 00:19:22.200
journals with rigid arithmetic, to researchers

00:19:22.200 --> 00:19:24.599
building robot toddlers that learn meaning through

00:19:24.599 --> 00:19:27.180
physical actions, and finally, to evolutionary

00:19:27.180 --> 00:19:29.559
equations predicting the future of how we speak.

00:19:29.960 --> 00:19:32.240
And that journey isn't just academic history.

00:19:32.279 --> 00:19:34.920
It is the literal foundation of the digital world

00:19:34.920 --> 00:19:37.670
you interact with every day. Really. Oh, absolutely.

00:19:37.849 --> 00:19:40.230
All of those early failures, the creation of

00:19:40.230 --> 00:19:43.349
the 4.5 million word Penn Treebank, the incremental

00:19:43.349 --> 00:19:45.829
learning algorithms, they birthed the tools we

00:19:45.829 --> 00:19:48.269
rely on now. The source material specifically

00:19:48.269 --> 00:19:50.990
lists the direct descendants of this wild evolutionary

00:19:50.990 --> 00:19:53.529
process. Things like modern software frameworks,

00:19:53.829 --> 00:19:56.450
spaCy, WordNet, Grammatical Framework, and

00:19:56.450 --> 00:19:58.789
GloVe. Those are the invisible mechanics running

00:19:58.789 --> 00:20:01.289
our world. Every single time you use a modern

00:20:01.289 --> 00:20:03.690
search engine, or ask a voice assistant for the

00:20:03.690 --> 00:20:06.440
weather, or rely on predictive text to finish

00:20:06.440 --> 00:20:09.039
your sentence on your phone. You are benefiting

00:20:09.039 --> 00:20:11.720
from the legacy of those robotic toddler experiments

00:20:11.720 --> 00:20:14.819
and massive data corpora. Which leaves us with

00:20:14.819 --> 00:20:18.160
a final lingering thought to explore. We just

00:20:18.160 --> 00:20:20.660
established that computational tools using things

00:20:20.660 --> 00:20:23.400
like Pólya urn dynamics can successfully

00:20:23.400 --> 00:20:26.119
predict the evolutionary future of human language.

00:20:26.359 --> 00:20:28.940
They know what we are statistically most likely

00:20:28.940 --> 00:20:32.039
to say next. But if our predictive text algorithms

00:20:32.039 --> 00:20:34.519
are constantly suggesting the most mathematically

00:20:34.519 --> 00:20:37.420
probable next word, and we habitually click it

00:20:37.420 --> 00:20:40.420
just to save time, at what point do the algorithms

00:20:40.420 --> 00:20:42.839
stop merely predicting our language and start

00:20:42.839 --> 00:20:45.099
quietly inventing the new linguistic structures

00:20:45.099 --> 00:20:47.920
that we unknowingly adopt? Are we dropping the

00:20:47.920 --> 00:20:50.119
balls into the urn, or is the machine doing it

00:20:50.119 --> 00:20:53.180
for us? Wow. That is exactly why we do this show.

00:20:53.579 --> 00:20:55.319
Thank you so much for joining us on this deep

00:20:55.319 --> 00:20:57.940
dive. Stay insanely curious, everyone. And, you

00:20:57.940 --> 00:21:00.279
know, the next time you take a walk or speak

00:21:00.279 --> 00:21:02.519
a simple sentence, just remember the invisible

00:21:02.519 --> 00:21:05.579
mechanics underneath it all are absolutely astonishing.
