WEBVTT

00:00:00.000 --> 00:00:05.339
So imagine a blindfolded jeweler. OK. And they're

00:00:05.339 --> 00:00:08.259
tasked with cutting this priceless, incredibly

00:00:08.259 --> 00:00:11.859
fragile diamond. But to keep the gem secure,

00:00:12.179 --> 00:00:14.859
it's been locked inside a thick, opaque titanium

00:00:14.859 --> 00:00:17.699
box. Right. So they can't see it at all. Exactly.

00:00:18.100 --> 00:00:20.019
The jeweler has to use these remote -controlled

00:00:20.019 --> 00:00:22.820
mechanical arms to reach inside and cut the gem.

00:00:22.969 --> 00:00:24.809
They can feel their way through the delicate

00:00:24.809 --> 00:00:27.010
movements and they execute the work perfectly,

00:00:27.149 --> 00:00:28.649
but they never actually see the jewel itself.

00:00:28.710 --> 00:00:30.730
They just hand the lock box back to the owner

00:00:30.730 --> 00:00:33.609
when they're finished. That is a really vivid

00:00:33.609 --> 00:00:36.030
way to look at it. Yeah. And welcome to today's

00:00:36.030 --> 00:00:38.490
deep dive. Our mission today is to explore a

00:00:38.490 --> 00:00:40.990
concept that cryptographers consider like the

00:00:40.990 --> 00:00:43.729
holy grail of their field. It's called homomorphic

00:00:43.729 --> 00:00:46.100
encryption. The big one. The really big one.

00:00:46.299 --> 00:00:48.340
And we're pulling from a comprehensive set of

00:00:48.340 --> 00:00:51.460
source materials today, mostly this really extensive

00:00:51.460 --> 00:00:54.159
Wikipedia overview on the subject, just to understand

00:00:54.159 --> 00:00:57.640
how we can process and analyze data while it

00:00:57.640 --> 00:01:00.640
remains entirely encrypted. Which is huge for,

00:01:00.640 --> 00:01:03.179
you know, safely outsourcing our digital lives

00:01:03.179 --> 00:01:06.379
to the cloud. Right. Because right now you have

00:01:06.379 --> 00:01:09.409
to trust the cloud provider. But if this works,

00:01:09.629 --> 00:01:11.849
maybe you don't and I want to make sure you the

00:01:11.849 --> 00:01:14.909
listener have a really clear Jargon -free shortcut

00:01:14.909 --> 00:01:17.329
to understanding this tech because it's probably

00:01:17.329 --> 00:01:19.049
going to be protecting your personal data very

00:01:19.049 --> 00:01:24.430
soon So, okay, let's unpack this How close is

00:01:24.430 --> 00:01:27.349
that blindfolded jeweler analogy to the actual

00:01:27.349 --> 00:01:30.569
mechanics of homomorphic encryption? I mean,

00:01:30.870 --> 00:01:32.450
the analogy actually holds up incredibly well

00:01:32.450 --> 00:01:34.829
to the underlying mechanics. The jeweler essentially

00:01:34.829 --> 00:01:37.170
represents a computer server in the cloud, right?

00:01:37.170 --> 00:01:39.590
OK. And the mechanical arms, those are the mathematical

00:01:39.590 --> 00:01:41.829
operations being performed. And the diamond is

00:01:41.829 --> 00:01:44.170
your sensitive data. But the critical part of

00:01:44.170 --> 00:01:46.709
your analogy is that the jeweler never sees the

00:01:46.709 --> 00:01:48.909
diamond. Because the box is locked. Exactly.

00:01:49.530 --> 00:01:52.049
In conventional computing, if you want a server

00:01:52.049 --> 00:01:55.510
to, say, search a database or organize files

00:01:55.510 --> 00:01:57.870
or run analytics, you have to give that server

00:01:57.870 --> 00:02:00.209
the decryption key. Right. The data has to be

00:02:00.209 --> 00:02:02.409
translated from encrypted gibberish back into

00:02:02.409 --> 00:02:04.790
plain text before the computer can even do anything

00:02:04.790 --> 00:02:07.049
with it. Yeah, and the moment it's in plain text,

00:02:07.209 --> 00:02:10.210
it's completely exposed. That creates a massive

00:02:10.210 --> 00:02:12.550
opening for what we call a privilege escalation

00:02:12.550 --> 00:02:15.610
attack. Which means if a hacker gets in, while

00:02:15.610 --> 00:02:17.789
the data is being processed. They can just walk

00:02:17.789 --> 00:02:20.789
away with the plain text. Yeah. But homomorphic

00:02:20.789 --> 00:02:24.449
encryption, or HE, aims to close that window

00:02:24.449 --> 00:02:28.069
of vulnerability permanently. The server performs

00:02:28.069 --> 00:02:30.330
computations directly on the ciphertext, which

00:02:30.330 --> 00:02:32.710
is the encrypted gibberish. So the output of

00:02:32.710 --> 00:02:35.620
the computation is just... more encrypted gibberish.

00:02:36.159 --> 00:02:38.180
Exactly. It's only when you download that result

00:02:38.180 --> 00:02:40.360
back to your own device and apply your private

00:02:40.360 --> 00:02:42.860
key that the gibberish suddenly resolves into

00:02:42.860 --> 00:02:45.539
the correct unencrypted answer. That's wild.

00:02:45.919 --> 00:02:47.719
And the source material gives a couple of really

00:02:47.719 --> 00:02:50.460
practical examples of this in action, which I

00:02:50.460 --> 00:02:51.919
think highlights why you should care about this.

00:02:52.120 --> 00:02:53.979
Think about predictive analytics in health care.

00:02:54.180 --> 00:02:57.520
That's a perfect example. A hospital has mountains

00:02:57.520 --> 00:03:00.319
of patient data that could be used to train AI

00:03:00.319 --> 00:03:04.560
models to predict disease outbreaks or patient

00:03:04.560 --> 00:03:07.960
outcomes. But because of privacy laws, they obviously

00:03:07.960 --> 00:03:10.300
can't just hand patient records over to a third

00:03:10.300 --> 00:03:12.419
-party tech company. Right, high PK and all that.

00:03:12.639 --> 00:03:15.639
Exactly. But with homomorphic encryption, the

00:03:15.639 --> 00:03:17.840
hospital could basically lock the records in

00:03:17.840 --> 00:03:20.819
that opaque box, send the boxes to the cloud,

00:03:21.240 --> 00:03:23.379
and the cloud runs its machine learning models

00:03:23.379 --> 00:03:26.780
without ever seeing the files. Then it just sends

00:03:26.780 --> 00:03:29.860
back an encrypted prediction. The cloud provider

00:03:29.860 --> 00:03:32.139
learns absolutely nothing about the patients.

00:03:32.439 --> 00:03:34.500
You see similar applications in image processing,

00:03:34.840 --> 00:03:37.319
too. Like, you could send an encrypted photograph

00:03:37.319 --> 00:03:39.900
to a lookup service to scan it for points of

00:03:39.900 --> 00:03:41.919
interest. Like trying to identify a landmark

00:03:41.919 --> 00:03:44.500
or something? Yeah, exactly. The service runs

00:03:44.500 --> 00:03:46.939
its algorithms over the encrypted data, finds

00:03:46.939 --> 00:03:49.460
the landmark, and sends you the answer all without

00:03:49.460 --> 00:03:52.000
ever actually viewing your photo. Though I will

00:03:52.000 --> 00:03:54.460
say, the sources provide a very necessary warning

00:03:54.460 --> 00:03:56.819
right up front about side channels. Ah, yes.

00:03:57.039 --> 00:03:59.759
the metadata issue. Right. Because the math itself

00:03:59.759 --> 00:04:01.819
might be perfectly secure, meaning the box cannot

00:04:01.819 --> 00:04:04.800
be broken into. But observing the behavior of

00:04:04.800 --> 00:04:08.400
the system still leaks metadata. Like, an observer

00:04:08.400 --> 00:04:10.460
might not see the photo of the landmark, but

00:04:10.460 --> 00:04:12.099
they can see that your device sent a massive

00:04:12.099 --> 00:04:14.280
packet of data to a lookup service at, you know,

00:04:14.360 --> 00:04:16.819
three in the afternoon. It's a really vital carryout.

00:04:17.100 --> 00:04:19.720
Cryptography secures the contents, but traffic

00:04:19.720 --> 00:04:22.480
analysis reveals context. The metadata is still

00:04:22.480 --> 00:04:26.899
visible. But to understand how the locked box

00:04:26.899 --> 00:04:29.399
computation actually works without decrypting

00:04:29.399 --> 00:04:31.959
the contents, we really have to look at the different

00:04:31.959 --> 00:04:35.019
flavors of homomorphic encryption. Right, yeah,

00:04:35.060 --> 00:04:37.240
they're categorized by the type and complexity

00:04:37.240 --> 00:04:39.800
of the mathematical operations, the circuits

00:04:39.800 --> 00:04:42.259
that they can handle. Right, and they range from

00:04:42.259 --> 00:04:45.699
basic arithmetic to incredibly complex operations.

00:04:45.920 --> 00:04:48.560
Like first, there's partially homomorphic encryption,

00:04:49.279 --> 00:04:51.680
which means the system supports only one type

00:04:51.680 --> 00:04:54.019
of mathematical gate. Just one. Right, so you

00:04:54.019 --> 00:04:55.879
can do addition or you can do multiplication,

00:04:55.980 --> 00:04:58.420
but you definitely cannot do both. The source

00:04:58.420 --> 00:05:01.779
notes that the famous RSA cryptosystem is actually

00:05:01.779 --> 00:05:04.139
partially homomorphic. It allows you to do an

00:05:04.139 --> 00:05:06.180
unbounded number of multiplications on encrypted

00:05:06.180 --> 00:05:08.459
data. And then there's the Pellier cryptosystem,

00:05:08.560 --> 00:05:11.079
which allows for unbounded additions. But building

00:05:11.079 --> 00:05:13.339
on that, if you want more flexibility, you move

00:05:13.339 --> 00:05:16.720
up to somewhat homomorphic encryption. This one

00:05:16.720 --> 00:05:19.329
allows for two types of gates. both addition

00:05:19.329 --> 00:05:22.290
and multiplication, but only for a limited subset

00:05:22.290 --> 00:05:24.250
of circuits. Meaning it has a ceiling on how

00:05:24.250 --> 00:05:26.029
much math you can do, right? Exactly. It hits

00:05:26.029 --> 00:05:30.209
a limit. From there, we reach leveled fully homomorphic

00:05:30.209 --> 00:05:32.610
encryption, which supports arbitrary surface

00:05:32.610 --> 00:05:34.990
with multiple types of gates, but only up to

00:05:34.990 --> 00:05:37.870
a predetermined bounded depth. OK. And finally,

00:05:38.069 --> 00:05:41.050
the ultimate goal, fully homomorphic encryption,

00:05:41.410 --> 00:05:45.649
or FHE. This allows for the evaluation of arbitrary

00:05:45.649 --> 00:05:48.569
circuits of unbounded depth, meaning you can

00:05:48.569 --> 00:05:51.470
run literally any computer program of any length

00:05:51.470 --> 00:05:54.389
or complexity on the encrypted data. Okay, I

00:05:54.389 --> 00:05:56.269
want to push back on that bounded depth concept

00:05:56.269 --> 00:05:58.670
for a second, because this is where the physical

00:05:58.670 --> 00:06:00.649
reality of the math seems to get really weird.

00:06:00.689 --> 00:06:03.449
It does get weird. Yeah. The sources focus heavily

00:06:03.449 --> 00:06:06.350
on this idea of a multiplicative ceiling. Let's

00:06:06.350 --> 00:06:09.519
say... Partially homomorphic encryption is a

00:06:09.519 --> 00:06:12.000
calculator with only an addition button. And

00:06:12.000 --> 00:06:13.839
somewhat homomorphic encryption gives you a calculator

00:06:13.839 --> 00:06:15.959
with a multiplication button, but the battery

00:06:15.959 --> 00:06:18.399
magically dies after five uses. That's a fun

00:06:18.399 --> 00:06:22.420
way to picture it. Thanks. But why does doing

00:06:22.420 --> 00:06:25.240
too much math break the encryption? Like, what

00:06:25.240 --> 00:06:27.920
is actually happening inside the lockbox that

00:06:27.920 --> 00:06:30.420
causes it to fail? Well, what's fascinating here

00:06:30.420 --> 00:06:33.339
is that To make homomorphic encryption function

00:06:33.339 --> 00:06:37.160
at all, the ciphertext must be inherently malleable.

00:06:37.439 --> 00:06:39.839
Malleable how? The server has to be able to manipulate

00:06:39.839 --> 00:06:42.279
the encrypted gibberish in a way that fundamentally

00:06:42.279 --> 00:06:45.759
changes the hidden plaintext inside. But to prevent

00:06:45.759 --> 00:06:47.920
attackers from just easily reverse engineering

00:06:47.920 --> 00:06:50.540
that malleability, cryptographers have to inject

00:06:50.540 --> 00:06:54.180
a small amount of mathematical noise or intentional

00:06:54.180 --> 00:06:57.240
error into the encryption. So the noise is like

00:06:57.240 --> 00:06:59.589
static on a radio transmission. That is a highly

00:06:59.589 --> 00:07:02.470
useful way to visualize it. Yes, every time the

00:07:02.470 --> 00:07:04.170
server performs an addition on the encrypted

00:07:04.170 --> 00:07:06.629
data, that static grows just a tiny bit. OK.

00:07:06.730 --> 00:07:08.470
But every time the server performs a multiplication,

00:07:08.990 --> 00:07:11.689
the static compounds exponentially. It multiplies

00:07:11.689 --> 00:07:13.269
the noise. So if the mathematical circuit is

00:07:13.269 --> 00:07:16.250
too deep, meaning you perform too many multiplications,

00:07:16.629 --> 00:07:18.810
the static eventually overpowers the signal.

00:07:19.129 --> 00:07:21.850
Oh, man. So when you finally download the result

00:07:21.850 --> 00:07:24.670
and use your key to open the box. The original

00:07:24.670 --> 00:07:26.910
data is completely drowned out, the decryption

00:07:26.910 --> 00:07:30.379
fails, and the data is lost forever. That is

00:07:30.379 --> 00:07:33.139
just a wild concept. The very feature that keeps

00:07:33.139 --> 00:07:36.360
the data secure, the noise, becomes the exact

00:07:36.360 --> 00:07:38.620
thing that destroys the data if you manipulate

00:07:38.620 --> 00:07:41.379
it too much. It's a huge paradox. Yeah. And the

00:07:41.379 --> 00:07:43.459
problem of how to stop that noise from destroying

00:07:43.459 --> 00:07:46.060
the data went unsolved for decades. I mean, the

00:07:46.060 --> 00:07:48.019
source notes the concept of homomorphic encryption

00:07:48.019 --> 00:07:51.439
was first proposed back in 1978, barely a year

00:07:51.439 --> 00:07:53.980
after RSA was published. Right. And for over

00:07:54.110 --> 00:07:56.910
30 years, the cryptography community didn't even

00:07:56.910 --> 00:07:58.709
know if a solution was mathematically possible,

00:07:58.730 --> 00:08:00.769
right? Yeah, they had no idea. Researchers built

00:08:00.769 --> 00:08:03.069
systems that allowed unlimited additions, but

00:08:03.069 --> 00:08:06.670
maybe only a single multiplication. But the holy

00:08:06.670 --> 00:08:09.050
grail of fully homomorphic encryption remained

00:08:09.050 --> 00:08:12.269
purely theoretical. It was this massive bottleneck

00:08:12.269 --> 00:08:15.970
in computer science. Well, until 2009, when a

00:08:15.970 --> 00:08:18.209
researcher named Craig Gentry figured it out.

00:08:18.379 --> 00:08:21.079
using lattice -based cryptography, creating the

00:08:21.079 --> 00:08:23.240
first plausible fully homomorphic encryption

00:08:23.240 --> 00:08:26.420
scheme. Yes, Gentry's breakthrough. And his workaround

00:08:26.420 --> 00:08:28.779
for the noise problem was a mechanism he called

00:08:28.779 --> 00:08:31.639
bootstrapping. Bootstrapping is honestly one

00:08:31.639 --> 00:08:34.320
of the most elegant concepts in modern cryptography.

00:08:35.039 --> 00:08:37.080
Gentry realized that it was impossible to stop

00:08:37.080 --> 00:08:38.860
the noise from growing during multiplication.

00:08:39.720 --> 00:08:43.220
The static was essentially a law of physics for

00:08:43.220 --> 00:08:46.190
this kind of math. So he couldn't stop it. Right.

00:08:46.370 --> 00:08:48.690
So if he couldn't stop it, he had to find a way

00:08:48.690 --> 00:08:52.210
to clean it up mid -computation before it destroyed

00:08:52.210 --> 00:08:54.490
the signal. Here's where it gets really interesting.

00:08:54.549 --> 00:08:56.629
Let me see if I have this right. It's like having

00:08:56.629 --> 00:08:59.070
a photocopy of a document. Every time you copy

00:08:59.070 --> 00:09:01.409
the photocopy, it degrades and gets grainier.

00:09:01.649 --> 00:09:05.049
Exactly. That grain is the noise. So boot scrapping

00:09:05.049 --> 00:09:07.809
sounds like having a machine that perfectly redraws

00:09:07.809 --> 00:09:10.750
the faded text onto a fresh sheet of paper. But

00:09:10.750 --> 00:09:13.049
the machine itself is entirely blind and never

00:09:13.049 --> 00:09:15.669
actually reads the words. It just resets the

00:09:15.669 --> 00:09:18.769
noise back to zero. Give you a fresh, clean ciphertext.

00:09:19.129 --> 00:09:21.590
That captures the outcome perfectly. But to explain

00:09:21.590 --> 00:09:23.950
the mechanics of how the blind machine does that,

00:09:24.090 --> 00:09:26.129
we have to look at recursive self -embedding.

00:09:26.370 --> 00:09:28.370
Recursive self -embedding. That sounds intense.

00:09:28.590 --> 00:09:31.350
It is a bit. Gentry designed a scheme that was

00:09:31.350 --> 00:09:33.889
capable of homomorphically evaluating its own

00:09:33.889 --> 00:09:36.389
decryption circuit. Wait, evaluating its own

00:09:36.389 --> 00:09:38.549
decryption circuit? How does that even work?

00:09:38.690 --> 00:09:41.110
Okay, let's go back to your locked box. You have

00:09:41.110 --> 00:09:43.750
a titanium box, and the static inside is getting

00:09:43.750 --> 00:09:45.909
dangerously loud. Before it ruins the diamond,

00:09:46.210 --> 00:09:49.450
the server wraps your box inside a second, slightly

00:09:49.450 --> 00:09:51.590
larger titanium box. Okay, so now you have a

00:09:51.590 --> 00:09:54.470
box inside a box. Right. Then you, the user,

00:09:54.830 --> 00:09:56.649
provide the server with an encrypted version

00:09:56.649 --> 00:09:59.740
of your key. The server uses its mechanical arms

00:09:59.740 --> 00:10:02.240
to take that encrypted key, reach inside the

00:10:02.240 --> 00:10:05.500
outer box, and unlock the inner box. Oh, so the

00:10:05.500 --> 00:10:08.230
inner box is open? but the diamond is still safely

00:10:08.230 --> 00:10:11.190
locked inside the outer box. Precisely. The data

00:10:11.190 --> 00:10:13.850
is never exposed to the open air. But because

00:10:13.850 --> 00:10:16.409
the inner box was unlocked, the accumulated noise

00:10:16.409 --> 00:10:18.450
from the previous computations is stripped away.

00:10:18.649 --> 00:10:20.610
That is brilliant. It really is. You're left

00:10:20.610 --> 00:10:23.309
with a brand new, fresh ciphertext with baseline

00:10:23.309 --> 00:10:26.009
noise holding the exact same value. And because

00:10:26.009 --> 00:10:28.070
of this, you can string together an infinite

00:10:28.070 --> 00:10:30.110
number of additions and multiplications. You

00:10:30.110 --> 00:10:32.330
just pause and bootstrap whenever the noise gets

00:10:32.330 --> 00:10:35.610
too high. Gentry proved that any scheme capable

00:10:35.610 --> 00:10:38.370
of this self -evaluation could be converted into

00:10:38.370 --> 00:10:41.129
a fully homomorphic scheme. Okay, but there is

00:10:41.129 --> 00:10:43.309
a massive catch in the source material, like

00:10:43.309 --> 00:10:46.289
a really big one. The processing time. Yes. The

00:10:46.289 --> 00:10:48.970
original implementation by Gentry and Halevi

00:10:48.970 --> 00:10:52.230
took about 30 minutes just to process a single

00:10:52.230 --> 00:10:55.590
basic bit operation. And a bit is literally just

00:10:55.590 --> 00:10:57.970
a one or a zero. It was very slow. If it takes

00:10:57.970 --> 00:11:00.529
half an hour to do one microscopic calculation,

00:11:01.169 --> 00:11:04.330
aren't we talking about years? just to load a

00:11:04.330 --> 00:11:06.830
basic web page or process a simple text document.

00:11:07.049 --> 00:11:09.570
How could anyone consider this practical? Well,

00:11:09.649 --> 00:11:11.809
you have to view that 30 -minute metric through

00:11:11.809 --> 00:11:14.750
the lens of scientific progress. At that moment

00:11:14.750 --> 00:11:18.009
in 2009, speed was completely irrelevant. It

00:11:18.009 --> 00:11:20.289
was entirely a proof of concept. Oh, so just

00:11:20.289 --> 00:11:22.450
proving it could be done at all. Exactly. The

00:11:22.450 --> 00:11:24.870
value was in proving that the underlying cryptographic

00:11:24.870 --> 00:11:27.370
assumptions like problems over ideal lattices

00:11:27.370 --> 00:11:30.110
could be used to achieve the impossible. Gentry

00:11:30.110 --> 00:11:32.789
broke a 30 -year stalemate. And once a community

00:11:32.720 --> 00:11:35.639
knew it was possible, other pathways opened up

00:11:35.639 --> 00:11:38.860
almost immediately. Like that? Well, by 2010,

00:11:39.200 --> 00:11:40.940
Van Dyke and a team of researchers presented

00:11:40.940 --> 00:11:43.740
an alternative scheme using simple integers instead

00:11:43.740 --> 00:11:46.480
of complex ideal lattices. The floodgates were

00:11:46.480 --> 00:11:49.399
open. Okay, so the academic world knew the lockbox

00:11:49.399 --> 00:11:52.320
worked. The blindfolded jeweler could reset their

00:11:52.320 --> 00:11:55.159
tools without destroying the diamond. But to

00:11:55.159 --> 00:11:57.240
make this viable for commercial cloud computing,

00:11:57.480 --> 00:11:59.919
where servers crunch petabytes of data, they

00:11:59.919 --> 00:12:02.240
needed to speed it up by several orders of magnitude.

00:12:02.700 --> 00:12:06.240
Which brings us to the race for efficiency, Generations

00:12:06.240 --> 00:12:08.860
2 and 3. Yeah, the subsequent generations of

00:12:08.860 --> 00:12:11.659
FHE schemes focused entirely on optimization.

00:12:12.360 --> 00:12:14.120
In the years following Gentry's breakthrough,

00:12:14.399 --> 00:12:17.600
roughly 2011 to 2012, Researchers developed schemes

00:12:17.600 --> 00:12:21.200
like BGV and BFE. And what did those do? They

00:12:21.200 --> 00:12:23.480
drastically slowed the growth of the noise during

00:12:23.480 --> 00:12:26.259
computations. The static simply didn't build

00:12:26.259 --> 00:12:28.159
up as fast, meaning you didn't have to perform

00:12:28.159 --> 00:12:30.919
that slow bootstrapping process nearly as often.

00:12:31.080 --> 00:12:32.659
Though the source did mention some missteps,

00:12:32.960 --> 00:12:35.639
right? Like NTRU -based schemes, I think. They

00:12:35.639 --> 00:12:39.059
were called LTV and BLLN. Ah, yes. Those failed

00:12:39.059 --> 00:12:41.120
because they relied on an overstretched variant

00:12:41.120 --> 00:12:43.779
that ended up being vulnerable to subfield lattice

00:12:43.779 --> 00:12:46.580
attacks. It was a dead end. But the successful

00:12:46.580 --> 00:12:49.639
Gen 2 schemes thrived. The source mentions they

00:12:49.639 --> 00:12:53.100
also achieved nearly optimal asymptotic complexity

00:12:53.100 --> 00:12:56.600
using a technique involving SIMD single instruction,

00:12:57.360 --> 00:13:00.460
multiple data. They were packing multiple plaintext

00:13:00.460 --> 00:13:03.840
values into a single Cypher text to compute on

00:13:03.840 --> 00:13:06.759
all of them simultaneously. Yes, Cindy packing

00:13:06.759 --> 00:13:09.440
was a game changer. I want to try an analogy

00:13:09.440 --> 00:13:11.759
for this. If we're packing multiple plaintexts

00:13:11.759 --> 00:13:15.039
into one Cypher text, is it like stuffing hundreds

00:13:15.039 --> 00:13:17.399
of passengers onto a massive train so you only

00:13:17.399 --> 00:13:19.080
have to make one trip instead of driving them

00:13:19.080 --> 00:13:21.519
all individually in cars? That's actually a great

00:13:21.519 --> 00:13:23.179
way to put it. But wait, it definitely saves

00:13:23.179 --> 00:13:25.700
time, but does that limit the route the train

00:13:25.700 --> 00:13:28.820
can take? Like, do we lose computational flexibility

00:13:28.820 --> 00:13:31.360
just to gain speed? If we connect this to the

00:13:31.360 --> 00:13:33.899
bigger picture, you do face routing constraints

00:13:33.899 --> 00:13:36.500
with SIMD packing. To use your train analogy,

00:13:36.659 --> 00:13:38.759
you cannot tell passenger A to get off in New

00:13:38.759 --> 00:13:41.440
York and passenger B to get off in Chicago. The

00:13:41.440 --> 00:13:44.000
entire train goes to one destination. Right.

00:13:44.340 --> 00:13:46.700
In computational terms, operations must be applied

00:13:46.700 --> 00:13:49.019
uniformly across all the packed data slots at

00:13:49.019 --> 00:13:51.460
the exact same time. You apply the exact same

00:13:51.460 --> 00:13:53.379
mathematical instruction to the entire vector

00:13:53.379 --> 00:13:55.460
of data. Which sounds pretty limiting, to be

00:13:55.460 --> 00:13:58.580
honest. It does, until you look at how cloud

00:13:58.580 --> 00:14:01.159
analytics actually work. Think about it. If you're

00:14:01.159 --> 00:14:04.519
searching a massive encrypted database or performing

00:14:04.519 --> 00:14:06.879
statistical analysis on health care records,

00:14:07.259 --> 00:14:10.159
you usually want to apply the exact same comparative

00:14:10.159 --> 00:14:13.179
operation to millions of data points anyway.

00:14:13.519 --> 00:14:15.559
Oh, I see. So the constraint doesn't really matter

00:14:15.559 --> 00:14:18.639
for those use cases. Exactly. The vector constraint

00:14:18.639 --> 00:14:21.639
doesn't hinder the goal, and the speed game is

00:14:21.639 --> 00:14:24.799
astronomical. In fact, these optimizations made

00:14:24.799 --> 00:14:27.340
many applications so efficient they could run

00:14:27.340 --> 00:14:31.129
in what's called leveled FHE mode. That's the

00:14:31.129 --> 00:14:32.929
one where you know the depth of the circuit ahead

00:14:32.929 --> 00:14:35.149
of time, right? Correct. If a software engineer

00:14:35.149 --> 00:14:37.529
knows exactly how many multiplications a specific

00:14:37.529 --> 00:14:40.549
program requires and the noise grows slowly enough,

00:14:40.929 --> 00:14:43.210
the program might never even hit the noise ceiling.

00:14:43.389 --> 00:14:45.929
So no bootstrapping at all? None. The server

00:14:45.929 --> 00:14:48.309
can run the entire operation from start to finish

00:14:48.309 --> 00:14:51.070
without ever needing to pause for the slow, expensive

00:14:51.070 --> 00:14:53.769
bootstrapping process. That's a huge leap in

00:14:53.769 --> 00:14:56.250
practicality. But the cryptography community

00:14:56.250 --> 00:14:59.230
didn't stop there. The source highlights a massive

00:14:59.240 --> 00:15:03.340
paradigm shift around 2013 and 2014, the third

00:15:03.340 --> 00:15:06.389
generation. Researchers proposed new techniques,

00:15:06.409 --> 00:15:09.710
I think GSW has mentioned, that avoided expensive

00:15:09.710 --> 00:15:12.850
relinearization steps. Yes. And that led to schemes

00:15:12.850 --> 00:15:16.129
like FHUW and TFHE, where they made a really

00:15:16.129 --> 00:15:19.370
bold choice. They decided to refresh the ciphertext

00:15:19.370 --> 00:15:22.289
after every single operation. Which seems completely

00:15:22.289 --> 00:15:24.830
counterintuitive at first glance. Like, if bootstrapping

00:15:24.830 --> 00:15:27.570
is the slowest part of the entire process, why

00:15:27.570 --> 00:15:29.690
would doing it more often make the system faster?

00:15:29.769 --> 00:15:32.409
That sounds like a terrible idea. Right. It seems

00:15:32.409 --> 00:15:36.000
backward. was in how they simplified the logic.

00:15:36.279 --> 00:15:38.799
By restricting the computations to very basic

00:15:38.799 --> 00:15:41.759
Boolean gates, the bootstrapping procedure itself

00:15:41.759 --> 00:15:44.220
became incredibly lightweight. So it wasn't this

00:15:44.220 --> 00:15:46.600
massive heavy process anymore. Exactly. They

00:15:46.600 --> 00:15:48.779
weren't running a massive heavy decryption circuit.

00:15:48.860 --> 00:15:51.440
They were running a highly optimized tiny one.

00:15:51.529 --> 00:15:54.529
So by 2016, these schemes dropped the bootstrapping

00:15:54.529 --> 00:15:57.169
time from Gentry's original 30 minutes down to

00:15:57.169 --> 00:15:58.730
a fraction of a second. We're talking less than

00:15:58.730 --> 00:16:01.649
0 .1 seconds per operation. Wow. From taking

00:16:01.649 --> 00:16:04.850
a coffee break to the bleak of an eye. So the

00:16:04.850 --> 00:16:07.809
community conquered basic, exact computations,

00:16:08.090 --> 00:16:10.789
making them lightning fast. But the tech landscape

00:16:10.789 --> 00:16:13.590
in 2016 was undergoing a seismic shift, wasn't

00:16:13.590 --> 00:16:15.769
it? It absolutely was. Yeah. The biggest demand

00:16:15.769 --> 00:16:18.549
for cloud computing was no longer running exact

00:16:18.549 --> 00:16:21.409
database searches. It was machine learning. was

00:16:21.409 --> 00:16:24.149
AI. And machine learning doesn't really rely

00:16:24.149 --> 00:16:27.669
on perfect exact integer math. It relies on messy

00:16:27.879 --> 00:16:30.840
continuous approximate math. That's a challenge.

00:16:31.100 --> 00:16:33.480
Machine learning models utilize weights and biases

00:16:33.480 --> 00:16:36.940
that are often tiny decimal fractions. The FHE

00:16:36.940 --> 00:16:39.480
schemes we've discussed so far, they handled

00:16:39.480 --> 00:16:42.059
integers perfectly. Well, not fractions. Right.

00:16:42.320 --> 00:16:44.700
If you try to multiply complex fractions in those

00:16:44.700 --> 00:16:46.919
older schemes, the scale of the numbers blows

00:16:46.919 --> 00:16:49.240
up and you have to use heavy bootstrapping just

00:16:49.240 --> 00:16:51.919
to manage the sheer size of the data. To make

00:16:51.919 --> 00:16:53.840
privacy -preserving machine learning work, the

00:16:53.840 --> 00:16:55.960
industry needed an entirely different approach.

00:16:57.200 --> 00:16:59.919
points to a specific scheme introduced in 2016

00:16:59.919 --> 00:17:04.140
called CKKS. This scheme supports fixed point

00:17:04.140 --> 00:17:06.829
arithmetic. which is commonly referred to as

00:17:06.829 --> 00:17:09.569
block floating point arithmetic, meaning it's

00:17:09.569 --> 00:17:11.710
designed specifically to handle those complex

00:17:11.710 --> 00:17:14.210
decimals. Crucially, CKKS includes an efficient

00:17:14.210 --> 00:17:16.670
rescaling operation. Instead of the scale of

00:17:16.670 --> 00:17:18.549
the numbers blowing up after a multiplication,

00:17:19.170 --> 00:17:22.309
CKKS allows the server to simply scale down the

00:17:22.309 --> 00:17:24.109
encrypted message. Oh, that makes sense. Yeah.

00:17:24.250 --> 00:17:26.789
This makes evaluating the polynomial approximations

00:17:26.789 --> 00:17:29.170
used in machine learning incredibly fast and

00:17:29.170 --> 00:17:31.730
efficient. It basically became the gold standard

00:17:31.730 --> 00:17:34.829
for running AI models on encrypted data. But,

00:17:34.910 --> 00:17:36.769
and this is a massive but in the source material,

00:17:37.250 --> 00:17:40.009
CKKS is an approximate homomorphic encryption

00:17:40.009 --> 00:17:43.289
scheme. It introduces non -deterministic and

00:17:43.289 --> 00:17:46.029
deterministic approximation errors. It's like

00:17:46.029 --> 00:17:49.410
slightly floppy by design. Yes, it trades exactness

00:17:49.410 --> 00:17:52.210
for speed. So what does this all mean? If CKKS

00:17:52.210 --> 00:17:54.569
uses approximate math, kind of like a busy accountant

00:17:54.569 --> 00:17:56.390
just rounding numbers to the nearest thousand

00:17:56.390 --> 00:17:58.990
to save time, the source notes this sloppiness

00:17:58.990 --> 00:18:01.910
caused a major vulnerability. It did, the 2020

00:18:01.910 --> 00:18:06.190
paper. Right. In 2020, researchers Lian Miciancio

00:18:06.190 --> 00:18:09.210
published a passive attack showing that if the

00:18:09.210 --> 00:18:11.109
results of these approximate decryptions are

00:18:11.109 --> 00:18:13.809
shared, the secret key itself could actually

00:18:13.809 --> 00:18:16.430
be recovered. I mean, doesn't that defeat the

00:18:16.430 --> 00:18:19.250
entire purpose of the lockbox? Are we sacrificing

00:18:19.250 --> 00:18:22.170
our perfect security just to get the speed we

00:18:22.170 --> 00:18:24.829
need for AI? Well, this raises an important question

00:18:24.829 --> 00:18:28.069
about how we evaluate cryptographic models. Let's

00:18:28.069 --> 00:18:30.670
look objectively at the mechanics of what Lye

00:18:30.670 --> 00:18:34.190
and Miciancio discovered. In exact FHE schemes,

00:18:34.650 --> 00:18:36.609
when you use your private key to open the box,

00:18:36.950 --> 00:18:39.349
the noise is removed entirely. The final answer

00:18:39.349 --> 00:18:42.410
you read is mathematically perfect. OK. But in

00:18:42.410 --> 00:18:45.630
CKKS, because the math is approximate, a tiny

00:18:45.630 --> 00:18:48.130
fraction of that noise remains in the final decrypted

00:18:48.130 --> 00:18:49.829
result. Like the accountant's rounding error.

00:18:50.109 --> 00:18:52.250
Exactly. And what the researchers realize is

00:18:52.250 --> 00:18:54.549
that this leftover error isn't just random static.

00:18:54.950 --> 00:18:56.809
That error is mathematically tied to the secret

00:18:56.809 --> 00:18:59.049
key. It's an algebraic artifact. Wait, really?

00:18:59.109 --> 00:19:00.910
So the noise is essentially a fingerprint of

00:19:00.910 --> 00:19:04.039
the key. It is. If a user decrypts the result

00:19:04.039 --> 00:19:06.920
and then shares that slightly noisy data publicly

00:19:06.920 --> 00:19:10.319
or, say, feeds it back to the server, an adversary

00:19:10.319 --> 00:19:13.759
can compare the noisy result to the true mathematical

00:19:13.759 --> 00:19:16.940
answer. Oh, wow. Yeah. By isolating the error

00:19:16.940 --> 00:19:19.660
across multiple queries, the adversary can reverse

00:19:19.660 --> 00:19:21.960
-engineer the secret key. The attacker doesn't

00:19:21.960 --> 00:19:23.839
even have to break into the system. They just

00:19:23.839 --> 00:19:26.599
have to observe the shared results. That sounds

00:19:26.599 --> 00:19:29.119
completely catastrophic for privacy. It required

00:19:29.119 --> 00:19:31.319
a fundamental rething, certainly, but it doesn't

00:19:31.319 --> 00:19:34.480
mean the technology is broken. It means new technologies

00:19:34.480 --> 00:19:37.299
that solve old problems inevitably create new

00:19:37.299 --> 00:19:40.500
threat vectors. The standard security model cryptographers

00:19:40.500 --> 00:19:44.220
use, which is known as IND -CPA, assumed that

00:19:44.220 --> 00:19:46.779
decrypted results would remain completely private.

00:19:47.099 --> 00:19:49.500
But people share AI outputs all the time. Exactly.

00:19:49.819 --> 00:19:51.960
CCATAS challenged that assumption because in

00:19:51.960 --> 00:19:53.920
modern machine learning, outputs are frequently

00:19:53.920 --> 00:19:57.140
shared or fed back into public models. The standard

00:19:57.140 --> 00:19:59.789
simply wasn't built for that use case. And the

00:19:59.789 --> 00:20:02.109
tech industry had to scramble to fix it. The

00:20:02.109 --> 00:20:04.549
source mentions that major tech companies maintaining

00:20:04.549 --> 00:20:08.750
homomorphic encryption libraries like Heon, Seal,

00:20:09.130 --> 00:20:13.180
Healib, Palisade, They all had to implement mitigations.

00:20:13.579 --> 00:20:15.279
Right. They patched the vulnerabilities through

00:20:15.279 --> 00:20:17.460
responsible disclosure before the paper even

00:20:17.460 --> 00:20:20.000
went public. That's a relief. It really highlights

00:20:20.000 --> 00:20:22.839
how rapidly the ecosystem is maturing. We're

00:20:22.839 --> 00:20:25.279
no longer relying on isolated academic papers

00:20:25.279 --> 00:20:28.140
here. There are standard setting bodies now.

00:20:28.700 --> 00:20:31.420
The Homomorphic Encryption Standardization Consortium

00:20:31.420 --> 00:20:34.579
was formed back in 2017 to maintain community

00:20:34.579 --> 00:20:36.519
security standards across these different schemes.

00:20:36.619 --> 00:20:39.200
With companies like IBM, Microsoft, and Intel

00:20:39.200 --> 00:20:42.099
involved, right? Yes. And the technology is robust

00:20:42.099 --> 00:20:44.700
enough now that it's even expanding into decentralized

00:20:44.700 --> 00:20:47.440
applications. There are blockchain projects like

00:20:47.440 --> 00:20:50.099
Phoenix leveraging FHE for privacy preserving

00:20:50.099 --> 00:20:53.119
financial use cases. We have massive open source

00:20:53.119 --> 00:20:55.440
frameworks pushing this complex math directly

00:20:55.440 --> 00:20:57.359
into the hands of everyday software developers.

00:20:58.039 --> 00:20:59.920
Which brings us right back to you the listener.

00:21:00.640 --> 00:21:03.240
Why should you care about the mechanics of lattice

00:21:03.240 --> 00:21:06.119
based cryptography or how a server manages floating

00:21:06.119 --> 00:21:08.460
point arithmetic? Because you live your life

00:21:08.460 --> 00:21:11.750
in the cloud. We all do. Every time you use a

00:21:11.750 --> 00:21:14.490
cloud -based AI tool to draft a sensitive email,

00:21:14.809 --> 00:21:16.730
or use a financial app to track your personal

00:21:16.730 --> 00:21:19.609
budget, or a digital healthcare platform to message

00:21:19.609 --> 00:21:22.890
your doctor, your privacy is increasingly going

00:21:22.890 --> 00:21:25.789
to depend on these invisible, mathematical locked

00:21:25.789 --> 00:21:29.789
boxes. As computing power scales up, homomorphic

00:21:29.789 --> 00:21:32.430
encryption ensure your data doesn't have to be

00:21:32.430 --> 00:21:35.529
exposed just to be useful. It represents a fundamental

00:21:35.529 --> 00:21:38.390
decoupling of data utility from data privacy.

00:21:38.640 --> 00:21:40.779
For the entire history of computing, we've had

00:21:40.779 --> 00:21:43.420
to choose between keeping our data secret and

00:21:43.420 --> 00:21:46.099
gaining the benefits of analytics. That compromise

00:21:46.099 --> 00:21:48.680
is finally disappearing. But there is one final

00:21:48.680 --> 00:21:50.400
thought I want to leave you with, building on

00:21:50.400 --> 00:21:51.940
the warning from the source material we mentioned

00:21:51.940 --> 00:21:54.779
earlier. Even with perfect homomorphic encryption,

00:21:55.359 --> 00:21:58.039
side channels can still reveal metadata. Right,

00:21:58.220 --> 00:22:00.759
the context. They can reveal that an encrypted

00:22:00.759 --> 00:22:03.440
photograph was sent to a lookup service, even

00:22:03.440 --> 00:22:05.460
if the photograph itself is perfectly secure.

00:22:06.269 --> 00:22:08.849
So, here's something to ponder as we wrap up.

00:22:09.589 --> 00:22:11.750
If homomorphic encryption eventually make the

00:22:11.750 --> 00:22:14.809
actual contents of our data mathematically unbreakable,

00:22:15.329 --> 00:22:17.690
will the future of cyber espionage and hacking

00:22:17.690 --> 00:22:19.930
completely abandon trying to read our messages?

00:22:20.750 --> 00:22:22.849
Will they stop trying to see the diamond and

00:22:22.849 --> 00:22:25.289
instead focus entirely on the metadata building

00:22:25.289 --> 00:22:28.329
complete intimate profiles of our lives based

00:22:28.329 --> 00:22:30.809
solely on when, where, and how often we send

00:22:30.809 --> 00:22:34.880
our locked boxes? The context speaks volumes.

00:22:35.160 --> 00:22:37.380
The jeweler might be blindfolded, but everyone

00:22:37.380 --> 00:22:39.380
can still see how often they show up to work.

00:22:39.920 --> 00:22:41.519
Thank you for joining us on this deep dive into

00:22:41.519 --> 00:22:43.799
homomorphic encryption. Keep questioning the

00:22:43.799 --> 00:22:46.019
systems around you, and keep exploring the digital

00:22:46.019 --> 00:22:47.299
world. We'll see you next time.