WEBVTT

00:00:00.000 --> 00:00:01.860
Have you ever been on vacation maybe sitting

00:00:01.860 --> 00:00:05.620
in a little cafe somewhere and you're staring

00:00:05.620 --> 00:00:08.240
at a menu that might as well be written in hieroglyphics.

00:00:08.519 --> 00:00:10.820
Oh, absolutely. Right. So you pull out your phone,

00:00:11.199 --> 00:00:14.199
you open up a free online translation tool, and

00:00:14.199 --> 00:00:16.839
you just point your camera at it. And for a second,

00:00:17.039 --> 00:00:19.719
it feels like absolute magic. Like you have a

00:00:19.719 --> 00:00:22.179
superpower or something. Exactly. You can suddenly

00:00:22.179 --> 00:00:24.719
read the words. But then, you know, you look

00:00:24.719 --> 00:00:27.800
a little closer, and the app is cheerfully offering

00:00:27.800 --> 00:00:31.640
you a plate of stir-fried Wikipedia with pimientos.

00:00:31.820 --> 00:00:33.920
Yeah, the illusion of magic shatters instantly,

00:00:34.100 --> 00:00:36.759
right? Or you're trying to read an international

00:00:36.759 --> 00:00:39.420
news site and halfway through this really serious

00:00:39.420 --> 00:00:42.619
political article, the text suddenly spouts total

00:00:42.619 --> 00:00:45.439
surreal nonsense. Just complete gibberish. Right.

00:00:45.659 --> 00:00:48.600
And you realize you aren't talking to some bilingual

00:00:48.600 --> 00:00:51.240
genius. You're basically talking to a calculator

00:00:51.240 --> 00:00:54.100
that just divided by zero. That broken magic

00:00:54.100 --> 00:00:56.759
is exactly what we are getting into today. So

00:00:56.759 --> 00:00:59.200
welcome to this deep dive into the Wikipedia

00:00:59.200 --> 00:01:01.979
article on machine translation. It's a huge topic.

00:01:02.189 --> 00:01:04.890
It really is. Our mission today is to explore

00:01:04.890 --> 00:01:07.469
the incredibly fascinating and frankly often

00:01:07.469 --> 00:01:10.170
frustrating journey of teaching computers to

00:01:10.170 --> 00:01:12.170
understand human language. Yeah, and we're going

00:01:12.170 --> 00:01:15.709
to discover why true human parity in translation

00:01:15.709 --> 00:01:18.409
is, well, it's still an illusion. And we'll unpack

00:01:18.409 --> 00:01:21.010
the high-stakes, real-world consequences of relying

00:01:21.010 --> 00:01:23.049
on machines to speak for us. Because when we

00:01:23.049 --> 00:01:25.109
discuss machine translation, you know, we aren't

00:01:25.109 --> 00:01:27.430
just looking at lines of code. No, not at all.

00:01:27.569 --> 00:01:30.670
We're looking at how machines try to mathematically

00:01:30.670 --> 00:01:34.980
quantify the cultural, emotional, and even contextual

00:01:34.980 --> 00:01:37.480
depths of how you and I actually communicate.

00:01:37.680 --> 00:01:41.120
Which is wild. Now I know the early tech pioneers

00:01:41.120 --> 00:01:43.340
in the 1950s tried to tackle this problem with

00:01:43.340 --> 00:01:46.439
those first room-sized computers, but didn't

00:01:46.439 --> 00:01:48.579
philosophers dream about universal languages

00:01:48.579 --> 00:01:50.819
way before we even had the hardware? Oh, long

00:01:50.819 --> 00:01:53.480
before. I mean, back in 1629, Rene Descartes

00:01:53.480 --> 00:01:55.500
proposed this really beautiful philosophical

00:01:55.500 --> 00:01:58.260
concept of a universal language. Okay, 1629,

00:01:58.459 --> 00:02:00.780
wow. Yeah, he imagined a system where different

00:02:00.780 --> 00:02:03.140
tongues would share one single symbol for equivalent

00:02:03.140 --> 00:02:07.420
ideas. But if you want to find the actual mathematical

00:02:07.420 --> 00:02:10.259
roots of how today's apps work, we have to look

00:02:10.259 --> 00:02:12.819
past the philosophers and find the code breakers.

00:02:12.860 --> 00:02:15.460
Code breakers. Specifically, a ninth century

00:02:15.460 --> 00:02:19.030
Arabic cryptographer named Al-Kindi. He developed

00:02:19.030 --> 00:02:22.150
techniques for breaking secret codes using frequency

00:02:22.150 --> 00:02:24.830
analysis and probability. Wait, breaking secret

00:02:24.830 --> 00:02:27.729
codes? So the foundation of translating a French

00:02:27.729 --> 00:02:31.270
poem is the exact same math used to intercept

00:02:31.270 --> 00:02:33.610
enemy spy communications? It is the exact same

00:02:33.610 --> 00:02:36.169
math. Right. And that mindset carried over directly

00:02:36.169 --> 00:02:38.710
into the mid-20th century. Really? Yeah. So,

00:02:38.710 --> 00:02:42.370
in 1947 and 1949, researchers like A.D. Booth in

00:02:42.370 --> 00:02:44.689
England and Warren Weaver at the Rockefeller

00:02:44.689 --> 00:02:47.490
Foundation, they proposed using digital computers

00:02:47.340 --> 00:02:50.340
to translate natural human languages. OK.

00:02:50.599 --> 00:02:53.099
Weaver actually wrote a highly influential memorandum

00:02:53.099 --> 00:02:56.219
in 1949 that essentially framed foreign languages

00:02:56.219 --> 00:02:59.060
not as distinct cultures, but basically as encrypted

00:02:59.060 --> 00:03:01.639
English. OK, let's unpack this. The early pioneers

00:03:01.639 --> 00:03:03.900
basically treated foreign languages like secret

00:03:03.900 --> 00:03:06.319
enemy codes to be cracked. They thought if you

00:03:06.319 --> 00:03:08.560
just found the right decoder ring, the language

00:03:08.560 --> 00:03:11.680
would be solved. But language isn't a static

00:03:11.680 --> 00:03:14.689
code, right? It's a living, breathing thing.
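
Al-Kindi's frequency analysis, the code-breaking root of this "decoder ring" mindset, can be illustrated with a short sketch. It assumes a simple shift (Caesar) cipher over lowercase English and assumes the most common ciphertext letter stands for "e"; the function names and sample sentence are hypothetical, and historical cryptanalysis was done by hand on far messier ciphers.

```python
import string
from collections import Counter


def caesar_encrypt(text: str, shift: int) -> str:
    # Shift each lowercase letter by `shift` places; leave other characters alone.
    out = []
    for ch in text:
        if ch in string.ascii_lowercase:
            out.append(chr((ord(ch) - ord("a") + shift) % 26 + ord("a")))
        else:
            out.append(ch)
    return "".join(out)


def guess_shift(ciphertext: str) -> int:
    # Frequency analysis: assume the most frequent ciphertext letter encodes "e",
    # the most common letter in typical English text.
    letters = [ch for ch in ciphertext if ch in string.ascii_lowercase]
    most_common, _ = Counter(letters).most_common(1)[0]
    return (ord(most_common) - ord("e")) % 26


plain = "the enemy sees every message we send before the evening"
secret = caesar_encrypt(plain, 3)
shift = guess_shift(secret)
print(shift, caesar_encrypt(secret, -shift))
```

The trick works because a substitution cipher preserves letter statistics; a real language, as the speakers point out, doesn't map onto another one that neatly.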

00:03:14.789 --> 00:03:16.590
Right. And they suffered from an incredible amount

00:03:16.590 --> 00:03:20.069
of hubris about it. In the 1950s, there was this

00:03:20.069 --> 00:03:22.650
massive optimism. I can imagine. There was a

00:03:22.650 --> 00:03:25.909
famous 1954 public demonstration by a Georgetown

00:03:25.909 --> 00:03:30.030
University research team and IBM. They fed Russian

00:03:30.030 --> 00:03:33.169
sentences into a massive computer, and it spit

00:03:33.169 --> 00:03:35.569
out English translations. And people bought it.

00:03:35.810 --> 00:03:38.669
Oh, the press went wild. Funding just poured

00:03:38.669 --> 00:03:41.960
in from all over the US, Japan, and Russia. The

00:03:41.960 --> 00:03:44.180
general consensus was that the entire language

00:03:44.180 --> 00:03:46.719
barrier problem would be permanently solved in,

00:03:46.719 --> 00:03:49.900
like, three to five years. Wow. Spoiler alert

00:03:49.900 --> 00:03:51.960
for anyone who has ever used an online translator.

00:03:52.180 --> 00:03:54.120
It wasn't solved in three years. Not even close.

00:03:54.479 --> 00:03:57.439
By 1966, the National Academy of Sciences formed

00:03:57.439 --> 00:03:59.740
a committee called ALPAC to review the progress.

00:03:59.819 --> 00:04:01.840
And how did that go? They released a report

00:04:01.840 --> 00:04:04.139
that basically brought the entire industry crashing

00:04:04.139 --> 00:04:07.490
down. It concluded that a solid decade of incredibly

00:04:07.490 --> 00:04:10.289
expensive research had completely failed. Yeah.

00:04:10.289 --> 00:04:12.889
Yeah, the translations were terrible. They required

00:04:12.889 --> 00:04:15.870
massive human editing to even be legible. So

00:04:15.870 --> 00:04:17.850
the government drastically reduced all funding

00:04:17.850 --> 00:04:20.470
and the field went into a sort of dark age. So

00:04:20.470 --> 00:04:23.449
it forced a total reinvention. Once computer

00:04:23.449 --> 00:04:26.870
scientists recovered from that 1966 funding crash,

00:04:27.389 --> 00:04:29.730
they realized that treating language like a static

00:04:29.730 --> 00:04:32.069
cryptography puzzle was a complete dead end.

00:04:32.189 --> 00:04:35.069
Right, they needed entirely new frameworks, which

00:04:35.069 --> 00:04:37.649
led to three distinct eras of technology. Okay,

00:04:37.689 --> 00:04:39.449
let's go through them. What was the first era?

00:04:39.870 --> 00:04:42.810
The first era, which dominated for decades after

00:04:42.810 --> 00:04:46.639
that crash, was the rule-based approach. Programmers

00:04:46.639 --> 00:04:49.180
essentially relied on building massive electronic

00:04:49.180 --> 00:04:52.019
dictionaries and hard-coding every single grammar

00:04:52.019 --> 00:04:54.120
rule. Every single one? That sounds impossible.

00:04:54.199 --> 00:04:56.920
It was incredibly tedious. If there was an orthographical

00:04:56.920 --> 00:04:58.860
variation, meaning just a different spelling

00:04:58.860 --> 00:05:01.680
of the same word, like colour, with a U, in British

00:05:01.680 --> 00:05:04.439
English versus American English, a human programmer

00:05:04.439 --> 00:05:07.060
had to write a specific lexical selection rule.

00:05:07.319 --> 00:05:10.180
Oh my gosh. Yeah, that rule explicitly told the

00:05:10.180 --> 00:05:12.199
computer exactly which dictionary definition

00:05:12.199 --> 00:05:15.009
to pick based on the words surrounding it. Typing

00:05:15.009 --> 00:05:17.490
out millions of individual grammar rules sounds

00:05:17.490 --> 00:05:21.029
like a total nightmare. Did it actually work?

00:05:21.370 --> 00:05:23.750
In highly controlled, very narrow environments,

00:05:23.910 --> 00:05:26.879
yes, it did. There is a system called KANT, designed

00:05:26.879 --> 00:05:30.079
in the early 90s, specifically to translate something

00:05:30.079 --> 00:05:33.199
called Caterpillar Technical English. Caterpillar

00:05:33.199 --> 00:05:35.959
like the tractor. Exactly. If you are translating

00:05:35.959 --> 00:05:38.720
tractor repair manuals where the vocabulary is

00:05:38.720 --> 00:05:40.480
restricted and the sentence structures are super

00:05:40.480 --> 00:05:44.139
simple and rigid, rule-based systems yield very

00:05:44.139 --> 00:05:46.199
stable results. Sure, because a tractor part

00:05:46.199 --> 00:05:48.699
is always a tractor part. Right. But you cannot

00:05:48.699 --> 00:05:51.879
program a rule for every single weird exception,

00:05:52.180 --> 00:05:55.240
slang word, or idiom in everyday human speech.
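
A hand-written lexical selection rule of the kind described above might look like the toy sketch below. The lexicon, the "bank" example, the Spanish targets, and the function name `lookup` are all hypothetical, not taken from KANT or any real system; the point is that every sense distinction and every spelling variant has to be listed by a person.

```python
# Hypothetical toy lexicon: each source word maps to its candidate senses.
LEXICON = {
    "bank": {"default": "banco", "river": "orilla"},  # money bank vs. riverside
    "color": {"default": "color"},
}

# Orthographic variants normalised before lookup (British -> American spelling).
SPELLING_VARIANTS = {"colour": "color"}


def lookup(word: str, context: list[str]) -> str:
    word = SPELLING_VARIANTS.get(word.lower(), word.lower())
    senses = LEXICON[word]
    # Hand-written rule: choose a sense whose trigger word appears in the context.
    for trigger, translation in senses.items():
        if trigger != "default" and trigger in context:
            return translation
    return senses["default"]


print(lookup("bank", ["we", "sat", "on", "the", "river", "bank"]))
print(lookup("bank", ["she", "robbed", "the", "bank"]))
print(lookup("Colour", []))
```

Every new ambiguity means another hand-written trigger, which is why this style holds up for restricted tractor manuals and buckles under open-ended everyday speech.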

00:05:55.819 --> 00:05:58.500
The system just breaks under the weight of human

00:05:58.500 --> 00:06:01.420
unpredictability. So if rule-based is like giving

00:06:01.420 --> 00:06:03.959
the computer a massive grammar textbook and telling

00:06:03.959 --> 00:06:06.540
it to study, how did they pivot for the second

00:06:06.540 --> 00:06:08.660
era? Did they just, I don't know, build a bigger

00:06:08.660 --> 00:06:10.959
textbook? What's fascinating here is the complete

00:06:10.959 --> 00:06:13.399
shift in philosophy. They stopped telling the

00:06:13.399 --> 00:06:15.699
computer how to translate and instead let the

00:06:15.699 --> 00:06:18.579
computer guess how to translate based on massive

00:06:18.579 --> 00:06:21.300
amounts of data. Oh, I see. This was the statistical

00:06:21.300 --> 00:06:24.240
machine translation era, which really took off

00:06:24.240 --> 00:06:26.579
as computing power got cheaper in the late 80s

00:06:26.579 --> 00:06:28.860
and 90s. So it's like dropping the computer

00:06:28.839 --> 00:06:31.480
in the middle of Paris with a million translated

00:06:31.480 --> 00:06:33.680
documents and saying, figure out the patterns

00:06:33.680 --> 00:06:36.480
yourself. Precisely. They used bilingual text

00:06:36.480 --> 00:06:39.300
corpora, which are massive collections of documents

00:06:39.300 --> 00:06:41.660
already perfectly translated by humans. Like

00:06:41.660 --> 00:06:44.019
what kind of documents? A famous early example

00:06:44.019 --> 00:06:46.759
is the Canadian Hansard corpus. It's the official

00:06:46.759 --> 00:06:49.040
record of the Canadian Parliament, transcribed

00:06:49.040 --> 00:06:52.180
in both English and French. The computer analyzes

00:06:52.180 --> 00:06:55.000
these parallel texts and calculates statistical

00:06:55.000 --> 00:06:57.920
probability. Okay, so just running the numbers.
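
The "running the numbers" idea can be illustrated on a tiny invented parallel corpus standing in for something like the Hansard: count how often English and French words appear in the same aligned sentence pair, then score each pairing. This sketch uses a simple Dice co-occurrence score, a deliberately crude stand-in for the alignment models real statistical systems used; the corpus and the helper names are hypothetical.

```python
from collections import Counter
from itertools import product

# Invented toy parallel corpus of (English, French) sentence pairs.
CORPUS = [
    ("taxes are high", "les impôts sont élevés"),
    ("we pay taxes", "nous payons des impôts"),
    ("prices are high", "les prix sont élevés"),
]


def dice_scores(corpus):
    en_counts, fr_counts, pair_counts = Counter(), Counter(), Counter()
    for en, fr in corpus:
        en_words, fr_words = set(en.split()), set(fr.split())
        en_counts.update(en_words)
        fr_counts.update(fr_words)
        # Every English word co-occurs with every French word in its aligned pair.
        pair_counts.update(product(en_words, fr_words))
    # Dice coefficient: how consistently two words turn up together.
    return {
        (e, f): 2 * n / (en_counts[e] + fr_counts[f])
        for (e, f), n in pair_counts.items()
    }


def best_translation(word, scores):
    candidates = {f: s for (e, f), s in scores.items() if e == word}
    return max(candidates, key=candidates.get)


scores = dice_scores(CORPUS)
print(best_translation("taxes", scores))
print(best_translation("prices", scores))
```

Nothing here knows what a tax is; "impôts" wins only because it shows up every time "taxes" does, which is also why such a model inherits whatever register its corpus happens to have.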

00:06:58.019 --> 00:07:01.040
Right. It simply notices that when the word taxes

00:07:01.040 --> 00:07:04.120
appears in English, a specific French word appears

00:07:04.120 --> 00:07:06.939
in the corresponding sentence, like 98% of the

00:07:06.939 --> 00:07:09.139
time. And that's where Google completely changed

00:07:09.139 --> 00:07:12.339
the paradigm, right? In 2005, Google fed its

00:07:12.339 --> 00:07:15.680
internal system approximately 200 billion words

00:07:15.680 --> 00:07:18.199
from United Nations materials. Yep, 200 billion.

00:07:18.519 --> 00:07:22.509
200 billion? But wait, I have to ask. If Google

00:07:22.509 --> 00:07:25.430
fed its system 200 billion words from the UN

00:07:25.430 --> 00:07:28.769
back in 2005, shouldn't that sheer volume of

00:07:28.769 --> 00:07:31.350
data have permanently solved translation? Why

00:07:31.350 --> 00:07:34.230
wasn't 200 billion words enough to capture every

00:07:34.230 --> 00:07:36.689
nuance? Well, because UN documents are highly

00:07:36.689 --> 00:07:39.889
formal. If you train a machine purely on diplomatic

00:07:39.889 --> 00:07:42.470
treaties, it learns to sound exactly like a diplomat.

00:07:42.610 --> 00:07:44.490
Oh, right, so it doesn't know how normal people

00:07:44.490 --> 00:07:47.290
talk. Exactly. It will completely fail to translate

00:07:47.290 --> 00:07:50.050
a teenager texting their friend. Furthermore,

00:07:50.189 --> 00:07:52.009
statistical models fundamentally struggle with

00:07:52.009 --> 00:07:54.810
morphology-rich languages. Which are what? Languages

00:07:54.810 --> 00:07:56.769
where words change their entire spelling based

00:07:56.769 --> 00:08:00.490
on gender, tense, or case. The math just couldn't

00:08:00.490 --> 00:08:02.829
stretch far enough, leading to the third and

00:08:02.829 --> 00:08:05.550
current era: neural machine translation. Right,

00:08:05.550 --> 00:08:08.370
the 2020s deep learning era. This is the architecture

00:08:08.370 --> 00:08:11.930
behind large language models or LLMs like ChatGPT

00:08:11.930 --> 00:08:14.189
and specialized translation tools like DeepL.

00:08:14.370 --> 00:08:16.990
Exactly. But how is a neural network actually

00:08:16.990 --> 00:08:18.850
different from just running better statistics?

00:08:19.120 --> 00:08:22.879
Imagine a giant multi-dimensional map. A neural

00:08:22.879 --> 00:08:25.779
network plots every single word as a spatial

00:08:25.779 --> 00:08:28.540
coordinate on that map. It understands words

00:08:28.540 --> 00:08:31.199
as vectors of meaning. Spatial coordinates. Yeah.
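
The "map of meaning" can be made concrete with cosine similarity between word vectors. The three-dimensional vectors and their dimension labels below are hand-picked assumptions for illustration; a real neural model learns hundreds of dimensions from data rather than having them assigned.

```python
import math

# Hypothetical hand-made 3-D embeddings: [royalty, femininity, fruitiness].
EMBEDDINGS = {
    "king": [0.9, 0.1, 0.0],
    "queen": [0.9, 0.9, 0.0],
    "banana": [0.0, 0.2, 0.9],
}


def cosine(u, v):
    # Cosine of the angle between two vectors: 1.0 means same direction.
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


print(round(cosine(EMBEDDINGS["king"], EMBEDDINGS["queen"]), 3))
print(round(cosine(EMBEDDINGS["king"], EMBEDDINGS["banana"]), 3))
```

"king" and "queen" point in roughly the same direction because they share the royalty dimension, while "banana" points somewhere else entirely.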

00:08:31.620 --> 00:08:35.860
So the word king and the word queen end up geographically

00:08:35.860 --> 00:08:38.000
close to each other on this map because they

00:08:38.000 --> 00:08:40.879
share royal contexts. Oh, I get it. The network

00:08:40.879 --> 00:08:43.120
doesn't just match words anymore. It predicts

00:08:43.120 --> 00:08:45.360
the likelihood of an entire sequence of words

00:08:45.360 --> 00:08:47.899
based on these spatial relationships. Which leads

00:08:47.899 --> 00:08:49.919
to all those marketing claims we see today about

00:08:49.919 --> 00:08:53.200
AI achieving human parity in translation. Claims

00:08:53.200 --> 00:08:55.940
that researchers overwhelmingly agree are a complete

00:08:55.940 --> 00:08:59.879
illusion. Really? Even now? Even now. The sources

00:08:59.879 --> 00:09:01.980
point out that human parity claims are based

00:09:01.980 --> 00:09:05.019
entirely on limited domains, specific language

00:09:05.019 --> 00:09:08.210
pairs, and very narrow test benchmarks. They

00:09:08.210 --> 00:09:10.889
lack sufficient statistical power. So

00:09:10.889 --> 00:09:12.690
they're kind of cherry-picking the data to look

00:09:12.690 --> 00:09:15.070
good. Pretty much. Even with an advanced tool

00:09:15.070 --> 00:09:17.789
like DeepL, the outputs almost always require

00:09:17.789 --> 00:09:20.850
post-editing by a human to fix glaring contextual

00:09:20.850 --> 00:09:23.850
errors. Because data volume cannot easily solve

00:09:23.850 --> 00:09:26.750
contextual ambiguity. The machine might have

00:09:26.750 --> 00:09:29.529
a perfect map of the vocabulary, but it has no

00:09:29.529 --> 00:09:32.460
map of the real world. Exactly. This ambiguity

00:09:32.460 --> 00:09:34.559
wall is actually my favorite part of the deep

00:09:34.559 --> 00:09:37.120
dive because it proves how incredibly complex

00:09:37.120 --> 00:09:40.179
human brains really are. The issue of disambiguation

00:09:40.179 --> 00:09:43.519
was actually raised way back in the 1950s by

00:09:43.519 --> 00:09:46.360
a researcher named Yehoshua Bar-Hillel. What

00:09:46.360 --> 00:09:50.179
did he say? He pointed out that without a universal

00:09:50.179 --> 00:09:53.259
encyclopedia of common sense embedded in its

00:09:53.259 --> 00:09:55.500
programming, a machine will never be able to

00:09:55.500 --> 00:09:57.179
distinguish between two completely different

00:09:57.179 --> 00:09:59.980
meanings of the exact same word. Claude Piron,

00:10:00.299 --> 00:10:02.320
who was a longtime translator for the United

00:10:02.320 --> 00:10:04.960
Nations and the World Health Organization, he

00:10:04.960 --> 00:10:07.139
had a brilliant example of this. He pointed out

00:10:07.139 --> 00:10:09.639
the phrase Japanese prisoners of war camp. Right.

00:10:09.820 --> 00:10:12.700
Is the text referring to an

00:10:12.700 --> 00:10:14.759
American camp that is holding Japanese prisoners.

00:10:14.820 --> 00:10:16.740
Or is it a Japanese camp that is holding American

00:10:16.740 --> 00:10:19.279
prisoners? Because both interpretations are grammatically

00:10:19.279 --> 00:10:22.340
identical in English. Right. A machine just looks

00:10:22.340 --> 00:10:25.159
at its spatial map, checks statistical probabilities,

00:10:25.600 --> 00:10:28.080
and takes a blind guess based on what it saw

00:10:28.080 --> 00:10:30.919
most often in its training data. But Piron pointed

00:10:30.919 --> 00:10:33.539
out that a human translator encounters that phrase,

00:10:34.379 --> 00:10:37.740
realizes the historical context is missing, and

00:10:37.740 --> 00:10:40.019
literally picks up the phone to call an expert

00:10:40.019 --> 00:10:42.980
in Australia to research the specific World War

00:10:42.980 --> 00:10:45.600
II epidemic being referenced. And a machine cannot

00:10:45.600 --> 00:10:47.940
make that phone call. No, it can't. That lack

00:10:47.940 --> 00:10:51.039
of real world context heavily impacts how machines

00:10:51.039 --> 00:10:54.580
handle named entities too, like people's names,

00:10:54.779 --> 00:10:57.480
organizations, or locations. They really struggle

00:10:57.480 --> 00:10:59.899
with those, don't they? Constantly. The system

00:10:59.899 --> 00:11:01.960
never quite knows when to transliterate versus

00:11:01.960 --> 00:11:04.259
when to translate. Transliteration is simply

00:11:04.259 --> 00:11:06.340
finding the corresponding phonetic letters in

00:11:06.340 --> 00:11:09.440
the target alphabet, whereas translation is converting

00:11:09.440 --> 00:11:12.169
the actual meaning of the word. The classic example

00:11:12.169 --> 00:11:15.149
in the sources is Southern California. The machine

00:11:15.149 --> 00:11:17.730
needs to translate Southern because that's a

00:11:17.730 --> 00:11:20.429
directional descriptor, but it needs to transliterate

00:11:20.429 --> 00:11:23.210
California because that's a proper noun. Right.
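
The "Southern California" decision can be sketched as a per-token choice: translate ordinary words through a dictionary, and transliterate words flagged as named entities letter by letter. The English-to-Russian dictionary, the entity list, and the letter map below are hypothetical toy resources. Naive letter mapping yields "Калифорниа" rather than the established Russian spelling "Калифорния", a small reminder that even transliteration takes more than character rules.

```python
# Hypothetical toy resources for English -> Russian.
DICTIONARY = {"southern": "южная"}
NAMED_ENTITIES = {"california"}
LETTER_MAP = {
    "c": "к", "a": "а", "l": "л", "i": "и",
    "f": "ф", "o": "о", "r": "р", "n": "н",
}


def transliterate(word):
    # Replace each Latin letter with a Cyrillic letter that sounds similar.
    return "".join(LETTER_MAP[ch] for ch in word.lower()).capitalize()


def render(phrase):
    out = []
    for token in phrase.split():
        key = token.lower()
        if key in NAMED_ENTITIES:
            out.append(transliterate(key))  # proper noun: keep the sound
        else:
            out.append(DICTIONARY[key])     # ordinary word: carry the meaning
    return " ".join(out)


print(render("Southern California"))
```

The hard part in practice is the `NAMED_ENTITIES` check: a real system has to infer it from context, and getting it wrong is what produces the "translate both" or "transliterate both" failures.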

00:11:23.570 --> 00:11:25.970
And machines frequently get confused and treat

00:11:25.970 --> 00:11:28.649
them as one single block, either translating

00:11:28.649 --> 00:11:31.509
both or transliterating both, yielding total

00:11:31.509 --> 00:11:34.289
gibberish in the target language. Even more fascinating

00:11:34.289 --> 00:11:37.470
is the statistical quirk within those named entities.

00:11:38.230 --> 00:11:40.570
A Stanford study found that if you ask a machine

00:11:40.519 --> 00:11:42.980
to translate the sentence, Ted is going for a

00:11:42.980 --> 00:11:45.259
walk, it will assign a different mathematical

00:11:45.259 --> 00:11:47.759
probability score and potentially a different

00:11:47.759 --> 00:11:50.559
translation structure than if you ask it to translate,

00:11:51.259 --> 00:11:53.840
Erica is going for a walk. Purely based on the

00:11:53.840 --> 00:11:55.940
name. Here's where it gets really interesting.

00:11:56.179 --> 00:11:58.700
It's like a bouncer at a club who lets Ted walk

00:11:58.700 --> 00:12:01.379
right through the door, but stops Erica to check

00:12:01.379 --> 00:12:04.059
her ID purely because he's seen more guys named

00:12:04.059 --> 00:12:05.960
Ted that week. That's a great way to put it.

00:12:06.100 --> 00:12:08.659
The machine's output is shaped entirely by its

00:12:08.659 --> 00:12:12.129
training diet. If the name Ted appeared 10,000

00:12:12.129 --> 00:12:14.570
times in the data and Erica only appeared 50

00:12:14.570 --> 00:12:17.389
times, the neural network weighs those completely

00:12:17.389 --> 00:12:20.129
equal sentences differently. The system also

00:12:20.129 --> 00:12:22.769
stumbles massively outside of standard formal

00:12:22.769 --> 00:12:25.879
language. Because these tools are trained overwhelmingly

00:12:25.879 --> 00:12:28.200
on government records, published books, and news

00:12:28.200 --> 00:12:31.259
articles, they fail aggressively when faced with

00:12:31.259 --> 00:12:34.639
vernacular, slang, or just the casual way people

00:12:34.639 --> 00:12:37.440
type on their mobile phones. Right. And the Ted

00:12:37.440 --> 00:12:41.279
versus Erica math quirk, or a failure to grasp

00:12:41.279 --> 00:12:43.740
local slang, might just be a funny anecdote if

00:12:43.740 --> 00:12:46.200
you're translating a novel. But what happens

00:12:46.200 --> 00:12:49.580
when that exact same statistical blind spot occurs

00:12:49.580 --> 00:12:52.700
in a hospital emergency room? That is exactly

00:12:52.700 --> 00:12:55.539
where the academic puzzle becomes a severe,

00:12:55.539 --> 00:12:58.919
high-stakes liability. Using tools like Google Translate

00:12:58.919 --> 00:13:01.259
in a medical setting is increasingly common because

00:13:01.259 --> 00:13:03.059
it helps doctors communicate with patients in

00:13:03.059 --> 00:13:05.299
day-to-day activities when a human translator

00:13:05.299 --> 00:13:07.500
isn't available. Makes sense on a practical level.

00:13:07.679 --> 00:13:10.500
It does, but researchers are aggressively cautioning

00:13:10.500 --> 00:13:12.240
against relying on it for anything critical.

00:13:12.559 --> 00:13:16.279
Let's play out a scenario. A patient uses a regional

00:13:16.279 --> 00:13:18.799
colloquialism to describe a sharp pain in their

00:13:18.799 --> 00:13:21.840
chest. The machine translation app, lacking the

00:13:21.840 --> 00:13:24.700
cultural context of that specific slang, translates

00:13:24.700 --> 00:13:27.200
it into a mild word like heartburn. Which happens

00:13:27.200 --> 00:13:30.039
all the time. The doctor, reading the app, skips

00:13:30.039 --> 00:13:33.000
the cardiac workup and hands the patient antacids.

00:13:33.559 --> 00:13:36.679
A pure mathematical error just resulted in a

00:13:36.679 --> 00:13:39.100
misdiagnosis. That's why the medical community

00:13:39.100 --> 00:13:41.620
stresses that machine translated medical texts

00:13:41.620 --> 00:13:44.919
must be reviewed by a human. And the legal field

00:13:44.919 --> 00:13:48.279
is facing similar crises. Oh, I bet. Legal language

00:13:48.279 --> 00:13:51.460
is so specific. Incredibly precise. Yeah. It

00:13:51.460 --> 00:13:54.500
uses normal words in very atypical ways. If a

00:13:54.500 --> 00:13:57.220
lawyer uses a free online translation tool to

00:13:57.220 --> 00:13:59.659
decipher a client's foreign contract, they aren't

00:13:59.659 --> 00:14:01.980
just risking a mistranslation. They might actually

00:14:01.980 --> 00:14:04.740
be violating client confidentiality. Wait, how?

00:14:04.860 --> 00:14:07.740
Because they are exposing private sensitive information

00:14:07.740 --> 00:14:10.220
to the remote servers of the software providers.

00:14:10.360 --> 00:14:12.460
Oh, of course. You're literally sending the private

00:14:12.460 --> 00:14:15.159
document to Google or whoever. Exactly. And the

00:14:15.159 --> 00:14:17.950
sources mention actual court debates over police

00:14:17.950 --> 00:14:20.909
searches, too. How so? If a police officer obtains

00:14:20.909 --> 00:14:23.730
consent to search a vehicle using a machine translation

00:14:23.730 --> 00:14:26.889
app on a smartphone, is that legally valid? Did

00:14:26.889 --> 00:14:29.029
the suspect actually understand the legal parameters

00:14:29.029 --> 00:14:31.750
of what they were consenting to? Or did the machine

00:14:31.750 --> 00:14:36.509
hallucinate a softer phrasing? Wow. It is a massive

00:14:36.509 --> 00:14:39.539
legal gray area. Naturally, the military and

00:14:39.539 --> 00:14:41.639
surveillance sectors are heavily invested in

00:14:41.639 --> 00:14:43.840
navigating these exact high-stakes environments.

00:14:44.240 --> 00:14:46.360
Absolutely. Following the 9/11 attacks, agencies

00:14:46.360 --> 00:14:48.799
like DARPA funded programs like TIDES and the

00:14:48.799 --> 00:14:51.179
Babylon translator, specifically focusing

00:14:51.179 --> 00:14:55.019
on two-way mobile translation for Arabic, Pashto,

00:14:55.179 --> 00:14:57.980
and Dari to facilitate rapid communication in

00:14:57.980 --> 00:15:00.259
the field. If we connect this to the bigger picture,

00:15:00.440 --> 00:15:02.519
the central theme really seems to be the environment.

00:15:02.669 --> 00:15:05.990
In fluid, high-risk human environments, like

00:15:05.990 --> 00:15:08.490
a hospital, a police stop, or a military checkpoint,

00:15:08.490 --> 00:15:11.049
machine translation is a dangerous liability.

00:15:11.129 --> 00:15:14.009
Right. But in highly controlled, specific environments,

00:15:14.389 --> 00:15:17.049
it is an absolute miracle tool. Oh, the triumphs

00:15:17.049 --> 00:15:19.029
of the technology in controlled settings are

00:15:19.029 --> 00:15:21.889
staggering. Neural networks recently successfully

00:15:21.889 --> 00:15:24.409
translated ancient Akkadian and its dialects

00:15:24.409 --> 00:15:26.970
Babylonian and Assyrian. That's just amazing

00:15:26.970 --> 00:15:29.769
to me. There are hundreds of thousands of clay

00:15:29.769 --> 00:15:32.539
tablets from ancient Mesopotamia sitting in museums,

00:15:32.919 --> 00:15:34.600
completely untranslated because there simply

00:15:34.600 --> 00:15:36.980
aren't enough human experts. Machine translation

00:15:36.980 --> 00:15:39.639
is mass processing these texts, literally unlocking

00:15:39.639 --> 00:15:42.899
ancient human history. Or look at augmented reality

00:15:42.899 --> 00:15:45.659
for travelers. The Google Translate camera feature

00:15:45.659 --> 00:15:48.159
we mentioned at the start, when it works, is

00:15:48.159 --> 00:15:51.059
brilliant. It's like magic. It overlays the translated

00:15:51.059 --> 00:15:53.059
text right onto your environment, preserving

00:15:53.059 --> 00:15:56.549
the font and background. And in gaming, the title

00:15:56.549 --> 00:15:59.570
Lineage W gained massive popularity in Japan,

00:15:59.889 --> 00:16:02.649
specifically because its built-in machine translation

00:16:02.649 --> 00:16:05.210
features allowed players from different countries

00:16:05.210 --> 00:16:08.230
to seamlessly strategize and communicate in real

00:16:08.230 --> 00:16:11.309
time. That's so cool. And there's also incredible

00:16:11.309 --> 00:16:14.230
complex work being done with signed languages,

00:16:14.549 --> 00:16:18.049
right? Yes. A prototype called TEAM was developed

00:16:18.049 --> 00:16:21.019
to translate English text into American Sign

00:16:21.019 --> 00:16:22.600
Language. And this isn't just swapping words

00:16:22.600 --> 00:16:24.620
for hand gestures. It's way more complex than

00:16:24.620 --> 00:16:26.740
that. Right. Stress, pitch, and timing are conveyed

00:16:26.740 --> 00:16:29.379
entirely differently in sign languages. The system

00:16:29.379 --> 00:16:31.899
analyzes the English grammar, accesses a sign

00:16:31.899 --> 00:16:34.120
synthesizer, and a computer-generated human

00:16:34.120 --> 00:16:36.639
appears on screen to sign the text, replicating

00:16:36.639 --> 00:16:38.820
the spatial and temporal nuances required for

00:16:38.820 --> 00:16:40.700
comprehension. And we absolutely have to talk

00:16:40.700 --> 00:16:43.340
about how Wikipedia uses this technology. Wikipedia

00:16:43.340 --> 00:16:46.539
is available in 85 languages, but there is a

00:16:46.539 --> 00:16:49.509
massive imbalance. How big of an imbalance? While

00:16:49.509 --> 00:16:52.730
the English Wikipedia has over 6.5 million articles,

00:16:53.190 --> 00:16:55.429
the German and Swedish versions only have around

00:16:55.429 --> 00:16:59.429
2.5 million. Wow, that's a huge gap. Yeah. Volunteer

00:16:59.429 --> 00:17:02.070
editors are heavily utilizing Wikipedia's content

00:17:02.070 --> 00:17:04.809
translation tool to quickly draft articles from

00:17:04.809 --> 00:17:07.170
English into those other languages, bridging

00:17:07.170 --> 00:17:09.750
the global knowledge gap at a speed humans could

00:17:09.750 --> 00:17:12.309
never achieve alone. But this brings us to a

00:17:12.309 --> 00:17:15.069
really difficult question. If this technology

00:17:15.069 --> 00:17:17.849
is volatile enough to alter a medical diagnosis,

00:17:18.670 --> 00:17:21.089
but powerful enough to translate hundreds of

00:17:21.089 --> 00:17:24.410
thousands of Akkadian clay tablets, how do we

00:17:24.410 --> 00:17:26.049
actually grade it? Right. How do you measure

00:17:26.049 --> 00:17:28.670
success? Exactly. How do we measure the success

00:17:28.670 --> 00:17:31.109
of a translation? I know there are automated

00:17:31.109 --> 00:17:33.289
metrics to do this. The sources mentioned programs

00:17:33.289 --> 00:17:37.589
with acronyms like BLEU, NIST, and METEOR. But

00:17:37.589 --> 00:17:40.029
how does a mathematical formula actually grade

00:17:40.029 --> 00:17:43.049
a fluid language translation? Let's look at BLEU,

00:17:43.170 --> 00:17:46.349
which stands for Bilingual Evaluation Understudy.

00:17:46.470 --> 00:17:48.250
It basically takes the machine's translation

00:17:48.250 --> 00:17:50.930
and overlays it onto a professional human translation

00:17:50.930 --> 00:17:53.710
of the same text. It then mathematically counts

00:17:53.710 --> 00:17:56.869
how many overlapping word sequences, or n-grams,

00:17:56.869 --> 00:17:59.960
they share. The more overlapping sequences, the

00:17:59.960 --> 00:18:03.539
higher the score. It is incredibly useful for

00:18:03.539 --> 00:18:05.740
rapid testing. There are other methods too, right?

00:18:05.859 --> 00:18:07.759
Like example-based machine translation, which

00:18:07.759 --> 00:18:09.869
we haven't touched on yet. Yeah, example-based

00:18:09.869 --> 00:18:12.569
MT is an alternative approach that doesn't just

00:18:12.569 --> 00:18:15.089
look at statistical probabilities, but actually

00:18:15.089 --> 00:18:18.250
searches a massive database for similar past

00:18:18.250 --> 00:18:20.829
sentences and uses those as templates. Like finding

00:18:20.829 --> 00:18:23.829
a similar puzzle piece. Exactly. And some metrics

00:18:23.829 --> 00:18:25.970
found it actually performed better specifically

00:18:25.970 --> 00:18:28.589
for English to French translations compared to

00:18:28.589 --> 00:18:31.250
purely statistical models. But regardless of

00:18:31.250 --> 00:18:33.849
the metric or the method, all the sources emphasize

00:18:33.849 --> 00:18:37.250
one core truth. Which is? Human judges are still

00:18:37.250 --> 00:18:39.569
the absolute most reliable method of evaluation.

00:18:39.970 --> 00:18:43.049
Because BLEU can count overlapping words, but

00:18:43.049 --> 00:18:45.410
it can't tell whether a sentence actually makes

00:18:45.410 --> 00:18:47.970
logical sense. Precisely. It brings us back to

00:18:47.970 --> 00:18:50.630
Claude Piron. He summarized this entire dynamic

00:18:50.630 --> 00:18:53.130
perfectly with what I'll call the 90-10 rule.

00:18:53.690 --> 00:18:56.210
He noted that machine translation, at its absolute

00:18:56.210 --> 00:18:59.690
best, only automates the easy 90% of a translator's

00:18:59.690 --> 00:19:01.750
job, the vocabulary hauling. Just moving the

00:19:01.750 --> 00:19:04.809
words over. Right. But that final 10%, that's

00:19:04.809 --> 00:19:07.410
the part that requires six hours of intense human

00:19:07.410 --> 00:19:10.190
research to resolve ambiguities and context.

00:19:10.730 --> 00:19:13.130
The machine lays the bricks, but the human has

00:19:13.130 --> 00:19:15.589
to build the actual architecture of comprehension.

00:19:16.430 --> 00:19:18.890
And when you remove the human completely, and

00:19:18.890 --> 00:19:21.349
just let the machine loop on its own mathematical

00:19:21.349 --> 00:19:24.509
logic, well, the system begins to hallucinate.

00:19:24.630 --> 00:19:27.210
Oh, we have to talk about the 2017 YouTube glitch.

00:19:27.289 --> 00:19:29.529
This is legendary. Oh yeah, this is so weird.

00:19:29.720 --> 00:19:31.720
Someone figured out that if you went to Google

00:19:31.720 --> 00:19:34.099
Translate and repeatedly typed in the Japanese

00:19:34.099 --> 00:19:37.420
hiragana characters えぐ, which are just the phonetic

00:19:37.420 --> 00:19:40.400
sounds E and GU, the system's neural network

00:19:40.400 --> 00:19:42.920
completely lost its mind trying to find a pattern.

00:19:43.319 --> 00:19:45.640
It desperately tried to map those nonsense syllables

00:19:45.640 --> 00:19:48.440
to real-world concepts, spitting out completely

00:19:48.440 --> 00:19:51.079
absurd English phrases like deep sea squeeze

00:19:51.079 --> 00:19:55.079
trees. And the most famous output, desiring egg.

00:19:55.369 --> 00:19:58.450
Just all caps. Deceering egg. People made videos

00:19:58.450 --> 00:20:00.430
reading these outputs in dramatic voices and

00:20:00.430 --> 00:20:03.109
they got millions of views. So what does this

00:20:03.109 --> 00:20:06.009
all mean? Doesn't the deceering egg glitch perfectly

00:20:06.009 --> 00:20:09.269
prove that the machine has absolutely no inner

00:20:09.269 --> 00:20:10.970
understanding of the world? Oh, without a doubt.

00:20:11.029 --> 00:20:13.210
It's just hallucinating patterns when the math

00:20:13.210 --> 00:20:15.630
breaks down? It absolutely proves it. The machine

00:20:15.630 --> 00:20:18.690
has no conceptual understanding of an egg or

00:20:18.690 --> 00:20:22.529
the deep sea or a tree. It is purely mathematical

00:20:22.529 --> 00:20:25.680
probability failing in real time. Just totally

00:20:25.680 --> 00:20:28.099
breaking. Right. Which raises an important question.

00:20:28.519 --> 00:20:30.440
If these machines are just predicting spatial

00:20:30.440 --> 00:20:33.099
relationships on a map of vocabulary, they possess

00:20:33.099 --> 00:20:36.059
no actual creativity. And that lack of creativity

00:20:36.059 --> 00:20:38.180
has sparked a massive legal debate regarding

00:20:38.180 --> 00:20:40.480
copyright. Because according to the law, only

00:20:40.480 --> 00:20:42.880
works that demonstrate original human creativity

00:20:42.880 --> 00:20:45.460
are subject to copyright protection. Multiple

00:20:45.460 --> 00:20:47.619
legal scholars are currently arguing that machine

00:20:47.619 --> 00:20:50.480
translation results are not entitled to any copyright

00:20:50.480 --> 00:20:52.750
protection whatsoever. The original author of

00:20:52.750 --> 00:20:54.690
the text retains their copyright, of course,

00:20:55.170 --> 00:20:57.950
but the actual translated output generated by

00:20:57.950 --> 00:21:00.690
the machine is different. Because it is just the result of

00:21:00.690 --> 00:21:03.410
a mathematical algorithm, it is arguably

00:21:03.410 --> 00:21:05.769
devoid of the human spark required for ownership.

00:21:06.069 --> 00:21:08.950
Which leaves us with a truly wild final thought

00:21:08.950 --> 00:21:11.539
to consider. We learned today that machine translations

00:21:11.539 --> 00:21:13.500
may not be eligible for copyright protection

00:21:13.500 --> 00:21:16.660
because they lack human creativity. Yet, as we

00:21:16.660 --> 00:21:18.500
discussed with the Swedish and German examples,

00:21:19.019 --> 00:21:21.259
these exact same machines are increasingly being

00:21:21.259 --> 00:21:24.200
used to translate massive global repositories

00:21:24.200 --> 00:21:26.859
of knowledge like Wikipedia. They're basically

00:21:26.859 --> 00:21:29.519
building a modern Library of Alexandria, crossing

00:21:29.519 --> 00:21:31.920
language barriers at an unprecedented scale,

00:21:32.339 --> 00:21:34.779
using code that lacks comprehension. So as you

00:21:34.779 --> 00:21:36.960
go about your week, and maybe the next time you

00:21:36.960 --> 00:21:39.440
use an app to translate a menu on vacation, ask

00:21:39.440 --> 00:21:42.140
yourself this. If machines are translating the

00:21:42.140 --> 00:21:44.359
bulk of the world's shared information, but those

00:21:44.359 --> 00:21:47.240
machines possess absolutely no creativity and

00:21:47.240 --> 00:21:49.859
cannot legally own their words, who will truly

00:21:49.859 --> 00:21:52.039
own the cross-cultural knowledge of our future?

00:21:52.480 --> 00:21:55.460
Is our shared human history becoming an uncopyrightable,

00:21:55.720 --> 00:21:58.200
machine-generated average? Thank you for taking

00:21:58.200 --> 00:22:00.960
this deep dive with us today. Keep asking questions,

00:22:01.059 --> 00:22:03.160
keep looking past the magic, and keep exploring

00:22:03.160 --> 00:22:05.200
the invisible systems shaping your world.