WEBVTT

00:00:00.000 --> 00:00:03.200
Welcome to the deep dive. So we often talk about

00:00:03.200 --> 00:00:06.160
compute as being, you know, the engine of the

00:00:06.160 --> 00:00:09.140
cloud. But if compute's the engine, then storage.

00:00:09.640 --> 00:00:11.339
Well, storage is everything else. It's the fuel

00:00:11.339 --> 00:00:14.640
tank, it's the cargo hold. Your data is absolutely

00:00:14.640 --> 00:00:17.140
vital. And knowing where to put it, how to keep

00:00:17.140 --> 00:00:19.600
it safe, and maybe most importantly, how it affects

00:00:19.600 --> 00:00:22.519
your bill. That's crucial. Get it wrong, and

00:00:22.519 --> 00:00:26.219
you're really building on shaky ground. Absolutely

00:00:26.219 --> 00:00:28.859
foundational. And look, for anyone who remembers

00:00:28.859 --> 00:00:31.519
what it was like before the cloud, storage was

00:00:31.519 --> 00:00:33.619
a huge headache. Oh, yeah. You're talking big,

00:00:33.899 --> 00:00:36.840
expensive boxes, SANs, storage area networks,

00:00:37.119 --> 00:00:39.899
NAS devices. Right. You had to spend a ton of

00:00:39.899 --> 00:00:42.399
money upfront guessing, literally guessing how

00:00:42.399 --> 00:00:44.539
much space you'd need years down the line. Which

00:00:44.539 --> 00:00:46.259
never worked out right. Either you bought way

00:00:46.259 --> 00:00:48.619
too much and half of it sat there gathering dust

00:00:48.619 --> 00:00:51.479
costing a fortune or inevitably you ran out.

00:00:51.640 --> 00:00:53.740
Usually at the worst possible time, like 2 a.m.

00:00:53.780 --> 00:00:56.219
on a Monday. Then it's panic stations, trying to

00:00:56.219 --> 00:00:58.640
get new disks shipped overnight. Exactly. And

00:00:58.640 --> 00:01:01.219
that whole nightmare scenario, that's what AWS

00:01:01.219 --> 00:01:03.939
storage just completely changed. The core idea

00:01:03.939 --> 00:01:08.519
is elasticity, global reach and crucially pay

00:01:08.519 --> 00:01:12.510
as you go. No more crystal ball gazing for capacity.

00:01:12.790 --> 00:01:15.109
Okay, so that's our goal today, then. We want to

00:01:15.109 --> 00:01:16.989
give you the blueprint, essentially. We're gonna

00:01:16.989 --> 00:01:20.250
decode the whole landscape of AWS storage: S3,

00:01:20.250 --> 00:01:24.150
EBS, EFS, all those different tools. The aim is

00:01:24.150 --> 00:01:26.250
to give you that clear, practical knowledge you

00:01:26.250 --> 00:01:29.650
can use right away. Let's, uh, let's start unpacking

00:01:29.650 --> 00:01:32.439
this blueprint. Sounds good. But before we even

00:01:32.439 --> 00:01:35.260
name a single AWS service, like S3 or anything

00:01:35.260 --> 00:01:37.219
else, we really need to get our heads around

00:01:37.219 --> 00:01:39.680
the three basic ways data gets stored. Everything

00:01:39.680 --> 00:01:42.500
else fits into one of these buckets, so to speak.

00:01:42.599 --> 00:01:44.739
OK, three fundamental types. Let's walk through

00:01:44.739 --> 00:01:47.579
them. First up. First is object storage. Now,

00:01:47.579 --> 00:01:49.000
this one might feel a bit different from your

00:01:49.000 --> 00:01:51.420
laptop's hard drive. How so? Imagine a massive

00:01:51.420 --> 00:01:54.019
valet parking service, but for your data. You

00:01:54.019 --> 00:01:56.599
give them your file. It could be a photo, a video,

00:01:56.819 --> 00:01:58.859
a backup file, whatever. And they give you back

00:01:58.859 --> 00:02:01.480
a unique ID, a ticket. You don't worry about

00:02:01.480 --> 00:02:03.959
folders inside folders. It's a flat structure.

00:02:04.560 --> 00:02:08.020
Your object is the file plus some metadata, and

00:02:08.020 --> 00:02:10.780
you use that ID, that ticket, to get it back.
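
NOTE
Editor's sketch of the valet-ticket idea above: a toy, in-memory object store.
This is not the S3 API. FlatObjectStore, StoredObject, put and get are
hypothetical names, and the key and metadata are invented for illustration.
```python
from dataclasses import dataclass, field
@dataclass
class StoredObject:
    data: bytes
    metadata: dict = field(default_factory=dict)
class FlatObjectStore:
    """Toy in-memory object store: one flat namespace mapping a key to an object."""
    def __init__(self):
        self._objects = {}
    def put(self, key, data, **metadata):
        # The key is the "ticket". Slashes in it are plain characters, not folders.
        self._objects[key] = StoredObject(data, dict(metadata))
        return key
    def get(self, key):
        return self._objects[key]
store = FlatObjectStore()
ticket = store.put("photos/2024/cat.jpg", b"fake-jpeg-bytes", content_type="image/jpeg")
obj = store.get(ticket)
print(ticket, obj.metadata)
```
The store never walks a directory tree; the whole key is looked up in one step,
which is the property that lets a flat namespace scale.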

00:02:10.939 --> 00:02:13.389
And because it's flat, that makes it super scalable,

00:02:13.650 --> 00:02:16.530
I guess. Good for just huge amounts of unstructured

00:02:16.530 --> 00:02:19.110
data, like all those photos and videos. Exactly.

00:02:19.509 --> 00:02:21.969
Massive scale. Yeah. Perfect for unstructured

00:02:21.969 --> 00:02:25.050
stuff. So that's object storage. But OK, what

00:02:25.050 --> 00:02:26.909
if you need storage that an operating system

00:02:26.909 --> 00:02:30.090
can actually use, like to boot from or run a

00:02:30.090 --> 00:02:32.370
database directly on? Right. Not just storing

00:02:32.370 --> 00:02:34.590
files to download later. That sounds different.

00:02:34.629 --> 00:02:36.509
That's where block storage comes in. That's block

00:02:36.509 --> 00:02:38.349
storage. Totally different concept. Think of

00:02:38.349 --> 00:02:42.280
it like a huge grid of equally sized boxes, blocks,

00:02:42.439 --> 00:02:44.819
each with a number. When you attach this kind

00:02:44.819 --> 00:02:47.259
of storage to a server, the operating system

00:02:47.259 --> 00:02:50.280
sees it like a raw, unformatted hard drive, just

00:02:50.280 --> 00:02:53.759
a bunch of blocks it can use. Ah, so the OS manages

00:02:53.759 --> 00:02:56.199
the file system on top of those blocks itself.
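
NOTE
Editor's sketch of the numbered-blocks picture above: a toy raw disk in Python.
BlockDevice, write_block, read_block and the 4096-byte block size are
assumptions made for illustration, not how EBS is implemented.
```python
BLOCK_SIZE = 4096  # hypothetical block size in bytes
class BlockDevice:
    """Toy raw disk: a fixed grid of numbered, equally sized blocks."""
    def __init__(self, block_count):
        self._blocks = [bytes(BLOCK_SIZE) for _ in range(block_count)]
    def write_block(self, number, data):
        if len(data) > BLOCK_SIZE:
            raise ValueError("data does not fit in one block")
        # Pad short writes so every block stays exactly BLOCK_SIZE bytes.
        self._blocks[number] = data.ljust(BLOCK_SIZE, b"\x00")
    def read_block(self, number):
        return self._blocks[number]
disk = BlockDevice(block_count=8)
disk.write_block(3, b"superblock")
print(disk.read_block(3)[:10])
```
The device knows nothing about files or folders. Deciding which blocks hold
which file is the job of the file system the operating system puts on top.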

00:02:56.539 --> 00:02:58.680
Precisely. And that direct block access gives

00:02:58.680 --> 00:03:00.939
you the really high speed and low latency you

00:03:00.939 --> 00:03:03.259
need for things like databases or the operating

00:03:03.259 --> 00:03:07.259
system drive itself. Got it. OK, so object for

00:03:07.259 --> 00:03:11.780
massive scale, web access. Block for raw performance,

00:03:11.780 --> 00:03:14.759
OS-level access. What's the third flavor? The

00:03:14.759 --> 00:03:16.780
third one is probably the most familiar: file

00:03:16.780 --> 00:03:19.819
storage. This is your classic folders-and-subfolders

00:03:19.819 --> 00:03:22.419
hierarchy. Like the shared drive on a network?

00:03:22.419 --> 00:03:25.560
Exactly like a shared network drive. The key thing

00:03:25.560 --> 00:03:27.979
here, the big difference from block storage especially,

00:03:27.979 --> 00:03:30.979
is that multiple servers, multiple computers, can

00:03:30.979 --> 00:03:33.340
connect to the same file storage and see the

00:03:33.340 --> 00:03:35.759
same files at the same time. It's designed for

00:03:35.759 --> 00:03:38.000
sharing. Okay, the shared filing cabinet analogy

00:03:38.000 --> 00:03:41.199
makes sense. Object, block, file. The basics. Right.

00:03:41.340 --> 00:03:43.259
Those are the building blocks. No pun intended.

00:03:43.439 --> 00:03:46.539
Huh. OK. So let's dive into the AWS services

00:03:46.539 --> 00:03:48.520
themselves. And we have to start with the biggest

00:03:48.520 --> 00:03:50.300
one, right? The Titan. You have to start with

00:03:50.300 --> 00:03:53.259
S3, Amazon Simple Storage Service. It's just

00:03:53.259 --> 00:03:56.800
fundamental to so much on AWS. It's their massive,

00:03:57.319 --> 00:03:59.939
super durable, super scalable object storage

00:03:59.939 --> 00:04:02.539
service. The magic, infinitely deep closet you

00:04:02.539 --> 00:04:04.300
called it earlier. That's a good way to think

00:04:04.300 --> 00:04:06.659
about it, yeah. Whether you've got tiny files

00:04:06.659 --> 00:04:10.180
or petabytes of data, S3 just handles it. Application

00:04:10.180 --> 00:04:13.900
code, documents, backups, website assets, the

00:04:13.900 --> 00:04:17.319
foundation for huge data lakes. It often ends

00:04:17.319 --> 00:04:19.740
up in S3. And a couple of key terms people need

00:04:19.740 --> 00:04:22.540
to know. Your data goes into buckets. And these

00:04:22.540 --> 00:04:25.490
bucket names, they have to be unique across

00:04:25.490 --> 00:04:28.430
the entire world of AWS, right? Globally unique,

00:04:28.550 --> 00:04:30.509
yeah, that's important. It really underscores

00:04:30.509 --> 00:04:33.470
that S3 has a global namespace, even though each

00:04:33.470 --> 00:04:35.889
bucket itself lives in the one region you pick.

00:04:36.250 --> 00:04:38.689
And inside the buckets you have your objects,

00:04:38.730 --> 00:04:40.949
which are the actual files plus their metadata.

00:04:41.129 --> 00:04:42.930
But the headline feature, the one everyone talks

00:04:42.930 --> 00:04:46.050
about, is the durability. Oh yeah, the durability.

00:04:46.490 --> 00:04:52.370
AWS designs S3 for 99.999999999% durability. That's

00:04:52.370 --> 00:04:55.420
11 nines. 11 nines. It sounds almost unbelievable.

00:04:55.579 --> 00:04:57.420
What does that actually mean in practice? It

00:04:57.420 --> 00:04:59.319
means, statistically speaking, if you stored,

00:04:59.699 --> 00:05:01.920
say, 10 million files in S3, you could expect

00:05:01.920 --> 00:05:04.139
to maybe lose one single file over the course

00:05:04.139 --> 00:05:07.980
of 10,000 years. Wow. OK. That's pretty durable.
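
NOTE
Editor's worked check of the "one file in 10,000 years" figure above.
Assumption: "11 nines" is read as an average annual loss rate of 1e-11 per
object. Exact fractions are used so floating-point rounding does not blur it.
```python
from fractions import Fraction
annual_loss_rate = Fraction(1, 10**11)      # 1 - 0.99999999999, per object per year
objects_stored = 10_000_000
expected_losses_per_year = objects_stored * annual_loss_rate
years_per_lost_object = 1 / expected_losses_per_year
print(expected_losses_per_year, years_per_lost_object)
```
Ten million objects times a one-in-a-hundred-billion yearly loss rate gives one
expected loss per ten thousand years, matching the number quoted in the episode.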

00:05:08.060 --> 00:05:10.879
How do they even achieve that? By automatically

00:05:10.879 --> 00:05:13.519
making copies of your data and storing them across

00:05:13.519 --> 00:05:16.100
multiple separate physical data centers, what

00:05:16.100 --> 00:05:19.160
AWS calls availability zones within a region.

00:05:19.759 --> 00:05:22.120
It's redundancy built in at a massive scale.

00:05:22.269 --> 00:05:24.750
Which explains why it's the default choice for

00:05:24.750 --> 00:05:26.709
so many critical things. You mentioned backups,

00:05:27.209 --> 00:05:30.230
disaster recovery, data archiving, even hosting

00:05:30.230 --> 00:05:33.089
static websites directly from S3. Absolutely.

00:05:33.129 --> 00:05:35.110
It's incredibly versatile because of that scale

00:05:35.110 --> 00:05:37.689
and durability. OK, but within S3, it's not just

00:05:37.689 --> 00:05:40.029
one size fits all, is it? You mentioned cost

00:05:40.029 --> 00:05:43.079
being a factor. How do we manage that? Right,

00:05:43.279 --> 00:05:45.839
so because S3 holds everything from data you

00:05:45.839 --> 00:05:48.980
need instantly multiple times a second to archives

00:05:48.980 --> 00:05:51.459
you might look at once a decade, they created

00:05:51.459 --> 00:05:53.939
S3 storage classes. This is all about balancing

00:05:53.939 --> 00:05:57.120
cost versus access time and frequency. And this

00:05:57.120 --> 00:06:00.300
is a really key area for anyone designing systems

00:06:00.300 --> 00:06:03.019
or even taking an AWS exam, I imagine, getting

00:06:03.019 --> 00:06:05.000
the storage class right. Definitely. It can have

00:06:05.000 --> 00:06:07.199
a huge impact on your bill. So let's start with

00:06:07.199 --> 00:06:09.399
the default, the kind of main one. That's S3

00:06:09.399 --> 00:06:11.660
standard. This is for frequently accessed data,

00:06:11.920 --> 00:06:14.000
stuff you need fast. It gives you low latency,

00:06:14.199 --> 00:06:16.360
high throughput. It has the highest storage cost

00:06:16.360 --> 00:06:19.819
per gigabyte. But accessing the data is cheap

00:06:19.819 --> 00:06:22.970
and instant. Right, website assets, active application

00:06:22.970 --> 00:06:25.410
data. Right, the prime shelf for your most active

00:06:25.410 --> 00:06:28.189
data. But what if my access patterns are, well,

00:06:28.850 --> 00:06:31.970
unpredictable? Or they change over time? I don't

00:06:31.970 --> 00:06:34.389
want to be constantly moving files between tiers

00:06:34.389 --> 00:06:37.050
myself. Ah, for that, AWS has something quite

00:06:37.050 --> 00:06:39.490
clever: S3 Intelligent-Tiering. It's kind of

00:06:39.490 --> 00:06:41.879
the set-it-and-forget-it option. How does that work?

00:06:42.199 --> 00:06:44.040
It uses machine learning, basically algorithms,

00:06:44.240 --> 00:06:46.899
to watch how you access each object. If you stop

00:06:46.899 --> 00:06:48.660
accessing something frequently, it automatically

00:06:48.660 --> 00:06:51.600
moves it to a cheaper infrequent access tier

00:06:51.600 --> 00:06:53.459
behind the scenes. If you access it again, it

00:06:53.459 --> 00:06:55.720
moves it back. You get automatic cost savings

00:06:55.720 --> 00:06:57.759
without lifting a finger. That sounds pretty

00:06:57.759 --> 00:07:00.420
useful. OK, so that handles automatic optimization.

00:07:01.040 --> 00:07:03.399
What about the tiers specifically designed for

00:07:03.399 --> 00:07:07.899
data you don't access often, the infrequent access

00:07:07.899 --> 00:07:10.980
tiers? Right, so the main one there is S3 standard

00:07:10.980 --> 00:07:14.319
Infrequent Access, or S3 Standard-IA. The storage

00:07:14.319 --> 00:07:17.279
cost per gigabyte per month is lower than S3

00:07:17.279 --> 00:07:19.540
standard. Cheaper to store. Okay, what's the

00:07:19.540 --> 00:07:22.279
catch? The catch is the retrieval fee. You pay

00:07:22.279 --> 00:07:24.740
a per gigabyte fee every time you access or retrieve

00:07:24.740 --> 00:07:27.870
data from Standard-IA. Ah, okay, so if I store

00:07:27.870 --> 00:07:30.350
archives there cheaply but then suddenly need

00:07:30.350 --> 00:07:33.149
to download terabytes of it, that could get expensive

00:07:33.149 --> 00:07:35.829
fast. It could. You need to be mindful. Standard

00:07:35.829 --> 00:07:38.089
IA is great for things like older backups or

00:07:38.089 --> 00:07:40.290
data you need long-term retention for, but don't

00:07:40.290 --> 00:07:42.949
expect to access regularly. But crucially, when

00:07:42.949 --> 00:07:45.709
you do need it, access is still fast. Just like

00:07:45.709 --> 00:07:47.689
S3 standard, you just pay for that retrieval.
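
NOTE
Editor's sketch of the storage-versus-retrieval trade-off described above.
Every price here is a made-up placeholder, not an AWS list price, and the
function names are hypothetical. Only the shape of the calculation matters.
```python
# Placeholder prices in dollars per GB, chosen only to make the arithmetic easy.
STANDARD_STORAGE_PER_GB = 0.02
IA_STORAGE_PER_GB = 0.01
IA_RETRIEVAL_PER_GB = 0.01
def monthly_cost_standard(stored_gb):
    return stored_gb * STANDARD_STORAGE_PER_GB
def monthly_cost_ia(stored_gb, retrieved_gb):
    # Cheaper to store, but every retrieved gigabyte adds a fee.
    return stored_gb * IA_STORAGE_PER_GB + retrieved_gb * IA_RETRIEVAL_PER_GB
def break_even_retrieved_gb(stored_gb):
    """GB retrieved per month at which the infrequent-access tier stops being cheaper."""
    return stored_gb * (STANDARD_STORAGE_PER_GB - IA_STORAGE_PER_GB) / IA_RETRIEVAL_PER_GB
print(monthly_cost_standard(1000), monthly_cost_ia(1000, 100), monthly_cost_ia(1000, 2000))
```
With these placeholder numbers, light retrieval keeps the infrequent-access
tier cheaper, while pulling back twice the stored volume in a month makes it
the more expensive choice.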

00:07:47.910 --> 00:07:49.870
Got it. And there's an even cheaper IA option,

00:07:49.990 --> 00:07:52.709
isn't there? S3 One Zone-IA. What's the trade-

00:07:52.709 --> 00:07:55.329
off there? It sounds less resilient. It is less

00:07:55.329 --> 00:07:57.509
resilient. That's the key trade-off. One Zone

00:07:57.509 --> 00:08:00.009
means exactly that. Your data is stored in only

00:08:00.009 --> 00:08:02.750
a single availability zone within a region, unlike

00:08:02.750 --> 00:08:05.350
the other S3 classes that spread it across multiple

00:08:05.350 --> 00:08:08.629
AZs. So if that specific AZ has an issue, power

00:08:08.629 --> 00:08:11.360
outage, flood, whatever, that data is unavailable,

00:08:11.740 --> 00:08:13.920
and potentially lost if the AZ is destroyed.

00:08:14.240 --> 00:08:16.839
Unavailable during the outage, yes. And potentially

00:08:16.839 --> 00:08:19.279
lost in a disaster scenario affecting that single

00:08:19.279 --> 00:08:21.959
AZ, although that's rare. So it's much cheaper

00:08:21.959 --> 00:08:24.259
storage, the cheapest non-archive tier, but

00:08:24.259 --> 00:08:26.740
you accept lower resilience. It's really only

00:08:26.740 --> 00:08:28.920
suitable for data you can easily recreate if

00:08:28.920 --> 00:08:31.920
lost, like maybe thumbnail images generated from

00:08:31.920 --> 00:08:34.379
originals stored elsewhere, or logs you also have

00:08:34.379 --> 00:08:36.950
copies of. Not for primary data. OK, that's a

00:08:36.950 --> 00:08:39.169
critical distinction. So standard for frequent,

00:08:39.629 --> 00:08:42.009
intelligent tiering for unknown, standard IA

00:08:42.009 --> 00:08:44.889
for infrequent but fast retrieval needed, one

00:08:44.889 --> 00:08:48.169
zone IA for recreatable infrequent data. What

00:08:48.169 --> 00:08:51.649
about really cold storage? The deep freeze. Now

00:08:51.649 --> 00:08:53.409
we're talking about the archive tiers, which

00:08:53.409 --> 00:08:55.350
used to be mainly under the Glacier brand name.

00:08:55.480 --> 00:08:58.059
These are for maximum cost savings on data you

00:08:58.059 --> 00:09:00.460
rarely access. Glacier. There's Glacier Instant

00:09:00.460 --> 00:09:02.340
Retrieval. This one's interesting. It offers

00:09:02.340 --> 00:09:04.559
the lowest cost storage for data that's rarely

00:09:04.559 --> 00:09:07.600
accessed. But when you do need it, you need it

00:09:07.600 --> 00:09:10.120
back in milliseconds, just like S3 standard or

00:09:10.120 --> 00:09:13.159
IA. So archive economics, but immediate access.

00:09:13.379 --> 00:09:15.460
Right, bridging that gap. Then the more traditional

00:09:15.460 --> 00:09:18.200
Glacier? Yeah, Glacier Flexible Retrieval. This

00:09:18.200 --> 00:09:20.379
is the classic archive option. Very cheap to

00:09:20.379 --> 00:09:22.779
store, but retrieval isn't instant. It can take

00:09:22.779 --> 00:09:24.759
anywhere from minutes to several hours to get

00:09:24.759 --> 00:09:26.799
your data back, depending on what retrieval option

00:09:26.799 --> 00:09:29.100
you pick. Minutes to hours, okay. Definitely

00:09:29.100 --> 00:09:32.519
for archives. And the absolute coldest, cheapest

00:09:32.519 --> 00:09:35.799
option? That's Glacier Deep Archive. Rock-bottom

00:09:35.799 --> 00:09:39.139
storage costs. We're talking fractions of a cent

00:09:39.139 --> 00:09:41.940
per gigabyte per month, but the retrieval time

00:09:41.940 --> 00:09:44.379
reflects that. You're looking at 12 to 48 hours

00:09:44.379 --> 00:09:46.840
to get your data back. 12 to 48 hours, okay.

00:09:47.019 --> 00:09:49.620
This is for data you might literally access once

00:09:49.620 --> 00:09:52.899
every few years. Think regulatory archives that

00:09:52.899 --> 00:09:54.639
you have to keep for seven years but probably

00:09:54.639 --> 00:09:57.480
never look at, or maybe the final backup before

00:09:57.480 --> 00:10:00.559
decommissioning a system entirely. Long-term,

00:10:00.860 --> 00:10:03.200
deep-freeze storage. Wow. Okay, that's a whole

00:10:03.200 --> 00:10:05.539
spectrum just within S3. Standard, Intelligent-

00:10:05.539 --> 00:10:08.360
Tiering, Standard-IA, One Zone-IA, Glacier Instant,

00:10:08.480 --> 00:10:10.659
Glacier Flexible, Glacier Deep Archive. Lots

00:10:10.659 --> 00:10:12.639
to choose from based on access patterns and cost.

00:10:12.860 --> 00:10:15.019
Exactly. Matching the class to the need is key.
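
NOTE
Editor's sketch that turns the rules of thumb above into a lookup.
pick_storage_class and its access labels are hypothetical; the function only
restates the episode's guidance and is not an official selection rule.
```python
def pick_storage_class(access, needs_ms_retrieval=True, recreatable=False):
    """Map an access pattern to an S3 storage class.
    access is a made-up label: frequent, unknown, infrequent, archive or deep_archive.
    """
    if access == "frequent":
        return "S3 Standard"
    if access == "unknown":
        return "S3 Intelligent-Tiering"
    if access == "infrequent":
        # One Zone-IA only when the data can be rebuilt after losing an AZ.
        return "S3 One Zone-IA" if recreatable else "S3 Standard-IA"
    if access == "archive":
        if needs_ms_retrieval:
            return "S3 Glacier Instant Retrieval"
        return "S3 Glacier Flexible Retrieval"
    if access == "deep_archive":
        return "S3 Glacier Deep Archive"
    raise ValueError(f"unknown access pattern: {access!r}")
print(pick_storage_class("infrequent", recreatable=True))
print(pick_storage_class("archive", needs_ms_retrieval=False))
```
A real decision would also weigh retrieval fees and how long the data stays
put, which this sketch leaves out.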

00:10:15.240 --> 00:10:17.659
All right. We've spent a lot of time on S3, which

00:10:17.659 --> 00:10:20.019
makes sense given its importance, but that's

00:10:20.019 --> 00:10:22.139
mostly storage accessed over the internet, right?

00:10:22.580 --> 00:10:27.539
Object storage. Let's shift gears. What about

00:10:27.539 --> 00:10:30.340
storage attached directly to our virtual servers,

00:10:30.620 --> 00:10:34.019
our EC2 instances, inside the virtual data center?

00:10:34.200 --> 00:10:36.279
Right. Now we're moving away from object storage

00:10:36.279 --> 00:10:38.919
and primarily into block and file storage territory,

00:10:39.460 --> 00:10:41.539
directly connected to compute. And the main player

00:10:41.539 --> 00:10:44.460
for block storage attached to EC2 is? That would

00:10:44.460 --> 00:10:47.480
be Amazon EBS, the Elastic Block Store. Remember

00:10:47.480 --> 00:10:49.840
our definition, high -performance block storage.

00:10:50.419 --> 00:10:52.799
EBS provides persistent block -level storage

00:10:52.799 --> 00:10:55.480
volumes for use with EC2 instances. Think of

00:10:55.480 --> 00:10:57.179
it as the virtual hard drive for your virtual

00:10:57.179 --> 00:10:59.570
server. Virtual hard drive. But there's a really

00:10:59.570 --> 00:11:01.990
important limitation or characteristic people

00:11:01.990 --> 00:11:03.850
need to grasp about EBS. Isn't there something

00:11:03.850 --> 00:11:07.129
about location? Absolutely critical. An EBS volume

00:11:07.129 --> 00:11:10.070
exists in and is bound to a single availability

00:11:10.070 --> 00:11:13.509
zone, AZ. Just one AZ? Just one. You can only

00:11:13.509 --> 00:11:16.370
attach an EBS volume to an EC2 instance that

00:11:16.370 --> 00:11:18.309
is running in that exact same availability zone.
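
NOTE
Editor's sketch of the same-AZ rule above as a toy model. This is not the EC2
API. Instance, Volume, attach, the IDs and the zone names are all hypothetical.
```python
from dataclasses import dataclass
@dataclass
class Instance:
    instance_id: str
    az: str
@dataclass
class Volume:
    volume_id: str
    az: str
    attached_to: str = ""
def attach(volume, instance):
    # The volume and the instance must live in the same availability zone.
    if volume.az != instance.az:
        raise ValueError(f"{volume.volume_id} is in {volume.az}, but {instance.instance_id} is in {instance.az}")
    volume.attached_to = instance.instance_id
web = Instance("i-web", "us-east-1a")
db = Instance("i-db", "us-east-1b")
disk = Volume("vol-data", "us-east-1a")
attach(disk, web)        # same zone, so this succeeds
print(disk.attached_to)
```
Calling attach(disk, db) would raise ValueError, because the instance sits in a
different zone from the volume.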

00:11:18.570 --> 00:11:21.289
It cannot cross AZ boundaries natively. That's

00:11:21.289 --> 00:11:23.350
huge for designing resilient applications, right?

00:11:23.649 --> 00:11:27.110
If that whole AZ has a problem, your EC2 instance

00:11:27.110 --> 00:11:30.250
goes down and the EBS volume attached to it is

00:11:30.250 --> 00:11:32.370
also unavailable. Precisely. High availability

00:11:32.370 --> 00:11:35.470
for applications using EBS often involves strategies

00:11:35.470 --> 00:11:38.870
like replicating data across instances in different

00:11:38.870 --> 00:11:42.509
AZs or using those EBS snapshots. Ah, yes, snapshots.

00:11:42.629 --> 00:11:44.850
How do they work? Are they full backups every

00:11:44.850 --> 00:11:47.230
time? They're point -in -time backups of your

00:11:47.230 --> 00:11:50.370
EBS volume. And the clever part is they are incremental.

00:11:50.950 --> 00:11:53.250
After the first full snapshot, subsequent snapshots

00:11:53.250 --> 00:11:54.970
only save the blocks that have changed since

00:11:54.970 --> 00:11:57.289
the last one, which saves storage space and time.
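
NOTE
Editor's sketch of the incremental idea above: each snapshot after the first
keeps only the blocks that changed. take_snapshot and the dict-based volume are
hypothetical. Real EBS snapshots are managed by AWS and work differently inside.
```python
def take_snapshot(volume, previous_view=None):
    """Return (delta, view): delta holds only blocks that differ from the previous view."""
    previous_view = previous_view or {}
    delta = {n: data for n, data in volume.items() if previous_view.get(n) != data}
    return delta, dict(volume)
volume = {0: b"boot", 1: b"data-v1", 2: b"logs"}
snap1, view1 = take_snapshot(volume)           # first snapshot: every block
volume[1] = b"data-v2"                          # one block changes
snap2, view2 = take_snapshot(volume, view1)    # second snapshot: only block 1
restored = {**snap1, **snap2}                   # replay deltas in order to rebuild
print(len(snap1), len(snap2), restored == volume)
```
This toy version ignores deleted blocks; it only shows why later snapshots are
smaller and faster than the first one.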

00:11:57.549 --> 00:12:00.289
And where do these snapshots get stored for durability?

00:12:00.570 --> 00:12:03.940
They get stored under the hood in S3. Ah. So

00:12:03.940 --> 00:12:06.679
AWS uses the incredible durability of S3, those

00:12:06.679 --> 00:12:09.379
11 nines, as the backend to make EBS snapshots

00:12:09.379 --> 00:12:11.639
reliable. That's smart. It's a very smart architecture,

00:12:11.879 --> 00:12:14.460
yeah. So typical use cases for EBS are pretty

00:12:14.460 --> 00:12:16.580
clear. The boot volume for your EC2 instance

00:12:16.580 --> 00:12:19.039
where the operating system lives, or as the data

00:12:19.039 --> 00:12:22.120
drive for databases like SQL Server, MySQL, PostgreSQL,

00:12:22.220 --> 00:12:25.299
or NoSQL databases running directly on a single

00:12:25.299 --> 00:12:27.899
EC2 instance needing that low latency block access.

00:12:28.159 --> 00:12:31.279
OK, so EBS: high-performance block storage for

00:12:31.279 --> 00:12:34.639
a single EC2 instance in a single AZ. What if

00:12:34.639 --> 00:12:36.899
I need that shared network drive experience we

00:12:36.899 --> 00:12:39.759
talked about earlier? Multiple EC2 instances

00:12:39.759 --> 00:12:42.399
needing to access the same files simultaneously?

00:12:42.519 --> 00:12:45.299
Now you're talking about Amazon EFS, the Elastic

00:12:45.299 --> 00:12:49.100
File System. This is AWS's fully managed, scalable

00:12:49.100 --> 00:12:51.759
file storage service. And the key differentiator

00:12:51.759 --> 00:12:54.559
again? The key is that multiple EC2 instances

00:12:54.559 --> 00:12:57.879
can connect to the same EFS file system concurrently.

00:12:57.929 --> 00:13:00.389
That's the fundamental difference from EBS. So

00:13:00.389 --> 00:13:02.809
perfect for things like a fleet of web servers

00:13:02.809 --> 00:13:05.389
that all need access to the same set of website

00:13:05.389 --> 00:13:07.809
files or shared application assets. Exactly.

00:13:07.850 --> 00:13:09.830
Content management systems like WordPress or

00:13:09.830 --> 00:13:12.450
Drupal running across multiple web servers, shared

00:13:12.450 --> 00:13:14.730
code repositories, home directories for users

00:13:14.730 --> 00:13:17.309
logging into multiple machines. EFS fits that

00:13:17.309 --> 00:13:19.149
perfectly. And how does it handle things like

00:13:19.149 --> 00:13:21.629
scaling and availability? Well, it's elastic,

00:13:21.649 --> 00:13:23.610
so it grows and shrinks automatically as you

00:13:23.610 --> 00:13:25.529
add or remove files. You don't pre-provision

00:13:25.529 --> 00:13:28.110
storage. And importantly, it's a regional service.

00:13:28.470 --> 00:13:31.269
It automatically stores your data across multiple

00:13:31.269 --> 00:13:34.350
availability zones within a region for high durability

00:13:34.350 --> 00:13:36.750
and availability. Okay, so unlike EBS, which

00:13:36.750 --> 00:13:40.649
is AZ specific, EFS spans multiple AZs within

00:13:40.649 --> 00:13:42.950
a region. That's a big plus for availability.

00:13:43.330 --> 00:13:46.750
Any specific protocols or OS limitations? It

00:13:46.750 --> 00:13:50.009
primarily uses the NFSv4 protocol, network file

00:13:50.009 --> 00:13:52.440
system version 4, which means it's primarily

00:13:52.440 --> 00:13:55.059
designed for and works best with Linux-based

00:13:55.059 --> 00:13:57.840
EC2 instances. While there are ways to connect

00:13:57.840 --> 00:13:59.940
from Windows, it's not the primary use case.

00:14:00.080 --> 00:14:03.659
Got it. So EFS: shared, elastic, regional file storage,

00:14:04.100 --> 00:14:06.720
mainly for Linux workloads via NFS. You got it.

00:14:06.919 --> 00:14:09.220
Okay, we've covered the big three. S3 for object,

00:14:09.440 --> 00:14:12.480
EBS for block, EFS for shared file. Are there

00:14:12.480 --> 00:14:14.879
other more specialized storage services we should

00:14:14.879 --> 00:14:17.340
touch on? Maybe for specific workloads or hybrid

00:14:17.340 --> 00:14:19.419
setups? Yeah, there are a couple more worth knowing

00:14:19.419 --> 00:14:21.580
about, especially for bridging on -premises worlds

00:14:21.580 --> 00:14:23.720
or handling specific application needs. Let's

00:14:23.720 --> 00:14:25.559
hear about the bridge first. Connecting existing

00:14:25.559 --> 00:14:28.600
data centers to AWS storage. That would be the

00:14:28.600 --> 00:14:32.159
AWS Storage Gateway. Think of it as a hybrid

00:14:32.159 --> 00:14:34.440
cloud storage appliance, like a magic portal.

00:14:34.620 --> 00:14:37.779
A magic portal? Okay, intriguing. You typically

00:14:37.779 --> 00:14:40.320
deploy a virtual machine, or sometimes a hardware

00:14:40.320 --> 00:14:43.379
appliance, in your own data center. This gateway

00:14:43.379 --> 00:14:46.460
then connects securely back to AWS and makes

00:14:46.460 --> 00:14:49.980
cloud storage like S3 or even EBS volumes appear

00:14:49.980 --> 00:14:52.100
as if it's local storage within your data center.

00:14:52.399 --> 00:14:54.840
Ah, so my old backup software running on my local

00:14:54.840 --> 00:14:57.080
server could think it's writing to a local tape

00:14:57.080 --> 00:15:00.179
library or a network share. Exactly. But it's

00:15:00.179 --> 00:15:01.919
actually sending the data through the gateway

00:15:01.919 --> 00:15:05.639
up to S3 for durable, scalable cloud storage.

00:15:06.059 --> 00:15:08.159
It lets you integrate cloud storage without having

00:15:08.159 --> 00:15:10.700
to rewrite your existing on -premises applications

00:15:10.700 --> 00:15:13.399
or workflows. It smooths that transition. That

00:15:13.399 --> 00:15:16.080
sounds incredibly useful for migration and backup

00:15:16.080 --> 00:15:19.039
scenarios. OK, what about specialized file systems?

00:15:19.100 --> 00:15:21.299
That's where the Amazon FSx family comes in.

00:15:21.460 --> 00:15:24.419
FSx stands for File System X, basically. It provides

00:15:24.419 --> 00:15:26.840
fully managed third -party file systems optimized

00:15:26.840 --> 00:15:29.039
for specific needs. Like what kind of specific

00:15:29.039 --> 00:15:31.429
needs? So the two big ones are FSx for Windows

00:15:31.429 --> 00:15:35.210
File Server and FSx for Lustre. OK, FSx for Windows

00:15:35.210 --> 00:15:38.269
sounds pretty self -explanatory. It is. If you're

00:15:38.269 --> 00:15:41.490
migrating Windows applications to AWS, and those

00:15:41.490 --> 00:15:43.950
applications rely on a shared Windows file server

00:15:43.950 --> 00:15:47.110
using the standard SMB protocol and Windows NTFS

00:15:47.110 --> 00:15:50.309
permissions, FSx for Windows provides exactly

00:15:50.309 --> 00:15:53.169
that, fully managed. It makes lifting and shifting

00:15:53.169 --> 00:15:56.269
those apps much easier. Gotcha. So if my app

00:15:56.269 --> 00:15:59.450
needs a file server share, FSx for Windows can provide

00:15:59.450 --> 00:16:01.870
that file server share in the cloud. Precisely.

00:16:02.070 --> 00:16:04.649
And then there's FSx for Lustre. Lustre. Never

00:16:04.649 --> 00:16:06.870
heard of it. Lustre is a different beast altogether.

00:16:07.269 --> 00:16:09.649
It's an open source parallel file system designed

00:16:09.649 --> 00:16:12.480
for extreme speed and scale. Think massive throughput.

00:16:12.779 --> 00:16:14.480
Parallel file system. What kind of workloads

00:16:14.480 --> 00:16:16.559
need that? We're talking high -performance computing,

00:16:17.019 --> 00:16:19.860
HPC, scientific simulations, genomic analysis,

00:16:20.240 --> 00:16:22.740
financial modeling, also heavy -duty machine

00:16:22.740 --> 00:16:25.240
learning training, or big media workloads like

00:16:25.240 --> 00:16:28.519
rendering 4K or 8K video. When you need to process

00:16:28.519 --> 00:16:31.059
huge data sets across many compute nodes really,

00:16:31.059 --> 00:16:34.019
really fast, FSx for Lustre is often the answer.

00:16:34.200 --> 00:16:37.419
It's built for speed. OK, so FSx for Windows

00:16:37.419 --> 00:16:39.779
handles the specific needs of Windows shared

00:16:39.779 --> 00:16:42.980
file systems, and FSx for Lustre handles the

00:16:42.980 --> 00:16:45.240
extreme performance needs of HPC and similar

00:16:45.240 --> 00:16:47.580
workloads. That's the gist of it. Specialized

00:16:47.580 --> 00:16:50.440
tools for specialized jobs. Wow. OK, that was

00:16:50.440 --> 00:16:53.159
a whirlwind tour through the AWS storage universe.

00:16:53.600 --> 00:16:58.039
S3, EBS, EFS, Storage Gateway, FSx. Quite a lot

00:16:58.039 --> 00:17:00.179
to take in. It is a broad portfolio for sure.

00:17:00.340 --> 00:17:02.200
But hopefully we've managed to demystify it a

00:17:02.200 --> 00:17:04.420
bit. Let's try and boil it down to that quick

00:17:04.420 --> 00:17:07.000
blueprint for success. When you're faced with

00:17:07.000 --> 00:17:09.779
a storage requirement, how do you quickly decide?

00:17:10.000 --> 00:17:12.539
OK, the quick decision tree. First, think about

00:17:12.539 --> 00:17:15.660
the data type and access pattern. Is it unstructured

00:17:15.660 --> 00:17:18.680
data photos, videos, backups, logs that needs

00:17:18.680 --> 00:17:21.380
to scale massively and be accessible potentially

00:17:21.380 --> 00:17:24.140
from anywhere, maybe over the web? If yes. That

00:17:24.140 --> 00:17:26.630
screams S3. Start there. Think about the right

00:17:26.630 --> 00:17:30.029
S3 storage class for cost and access speed. Okay.

00:17:30.210 --> 00:17:32.250
What if it's not unstructured files, but you

00:17:32.250 --> 00:17:34.390
need a high -performance virtual hard drive for

00:17:34.390 --> 00:17:37.650
just one specific EC2 instance, like for its

00:17:37.650 --> 00:17:39.750
operating system or a database running on that

00:17:39.750 --> 00:17:42.410
instance? That's EBS. Remember, it's tied to

00:17:42.410 --> 00:17:45.009
a single instance in a single AZ. Right. And

00:17:45.009 --> 00:17:48.359
if you need that shared drive experience... multiple

00:17:48.359 --> 00:17:51.880
servers, probably Linux, needing to access and

00:17:51.880 --> 00:17:54.980
modify the same set of files concurrently. That's

00:17:54.980 --> 00:17:58.940
your cue for EFS: shared, elastic, regional file

00:17:58.940 --> 00:18:01.859
storage. And then the special cases. Migrating

00:18:01.859 --> 00:18:04.019
a Windows app that needs a traditional Windows

00:18:04.019 --> 00:18:06.680
file share? Look at FSx for Windows File Server.

00:18:06.859 --> 00:18:10.359
And if the requirement is just absolutely blistering

00:18:10.359 --> 00:18:13.039
speed for something like HPC or massive video

00:18:13.039 --> 00:18:15.559
rendering, then FSx for Lustre comes into play.
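
NOTE
Editor's sketch of the quick decision tree above. pick_storage_service and its
yes/no keys are hypothetical. Checking the special cases first and falling back
to S3 is an ordering chosen for this sketch, not something the episode states.
```python
def pick_storage_service(workload):
    """workload is a made-up dict of yes/no answers to the episode's questions."""
    if workload.get("extreme_throughput"):
        return "FSx for Lustre"
    if workload.get("windows_file_share"):
        return "FSx for Windows File Server"
    if workload.get("shared_by_many_instances"):
        return "EFS"
    if workload.get("disk_for_one_instance"):
        return "EBS"
    # Unstructured data at massive scale, reachable from anywhere.
    return "S3"
print(pick_storage_service({"disk_for_one_instance": True}))
print(pick_storage_service({"shared_by_many_instances": True}))
print(pick_storage_service({}))
```
Once the answer is S3, the storage class still has to be chosen separately
based on access pattern and cost.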

00:18:15.779 --> 00:18:18.759
S3, EBS, EFS. Those are the main three pillars

00:18:18.759 --> 00:18:21.180
to get right. Absolutely. And maybe if we tie

00:18:21.180 --> 00:18:22.920
this back to where we started with durability.

00:18:23.380 --> 00:18:26.660
Remember, S3 is 11 nines. That comes from AWS

00:18:26.660 --> 00:18:29.440
managing the complexity of replicating data across

00:18:29.440 --> 00:18:32.240
multiple physical locations, multiple availability

00:18:32.240 --> 00:18:34.619
zones. They abstract away the hardware failure

00:18:34.619 --> 00:18:36.490
for you at that level. Which is a huge shift

00:18:36.490 --> 00:18:38.390
from the old days of managing your own SAN,

00:18:38.690 --> 00:18:40.829
right? Where one device failure could be catastrophic.

00:18:40.910 --> 00:18:43.230
A massive shift. So maybe here's a final thought

00:18:43.230 --> 00:18:45.589
for you, the listener, to ponder based on that.

00:18:45.910 --> 00:18:50.210
If AWS provides that incredible 11 nines of physical

00:18:50.210 --> 00:18:53.769
data durability with a service like S3, how does

00:18:53.769 --> 00:18:56.690
that change how you, as an application developer

00:18:56.690 --> 00:18:58.690
or architect, should think about data safety?

00:18:59.150 --> 00:19:01.789
Do you still need to build all the same complex

00:19:01.789 --> 00:19:04.210
application level checks and redundancy for data

00:19:04.210 --> 00:19:06.329
loss that you might have done 10, 15 years ago,

00:19:06.710 --> 00:19:09.369
or has the cloud platform itself fundamentally

00:19:09.369 --> 00:19:11.789
shifted some of that responsibility? Where does

00:19:11.789 --> 00:19:13.829
the line sit now? Something to think about for

00:19:13.829 --> 00:19:15.849
your next project. Definitely something to consider.

00:19:16.329 --> 00:19:18.589
The landscape has changed. Indeed it has. Thanks

00:19:18.589 --> 00:19:20.289
for sharing all that insight today. My pleasure.

00:19:20.569 --> 00:19:21.970
Until next time. Keep diving deep.
