WEBVTT

00:00:00.000 --> 00:00:03.040
Every single time you, you know, save a document

00:00:03.040 --> 00:00:06.679
or download a photo or just boot up your sleek

00:00:06.679 --> 00:00:10.140
new PC, you are relying on a piece of invisible

00:00:10.140 --> 00:00:13.019
plumbing that was designed way back in 1993.

00:00:13.679 --> 00:00:15.720
Yeah, it really is the absolute definition of

00:00:15.720 --> 00:00:18.399
legacy infrastructure, just quietly running the

00:00:18.399 --> 00:00:20.910
modern world. Right. Welcome to this deep dive.

00:00:21.289 --> 00:00:23.550
Our mission today is to explore that invisible

00:00:23.550 --> 00:00:27.089
plumbing, specifically the NT File System, or

00:00:27.089 --> 00:00:29.489
NTFS. Which is something most people use every

00:00:29.489 --> 00:00:31.769
single day but never actually think about. Exactly.

00:00:31.910 --> 00:00:34.810
We're pulling from a really comprehensive Wikipedia

00:00:34.810 --> 00:00:37.469
breakdown of its architecture. And the goal here

00:00:37.469 --> 00:00:39.469
is for you to understand how this 30-year-old

00:00:39.469 --> 00:00:43.299
system manipulates physical storage space, hides

00:00:43.299 --> 00:00:46.859
invisible files right under your nose, and even

00:00:46.859 --> 00:00:49.899
tracks time back to the 1600s just to keep your

00:00:49.899 --> 00:00:52.060
digital life intact. It's honestly an engineering

00:00:52.060 --> 00:00:54.200
marvel. We tend to focus so much on the shiny

00:00:54.200 --> 00:00:57.539
interfaces or the cloud, but the literal organization

00:00:57.539 --> 00:00:59.479
of the ones and zeros on your physical drive

00:00:59.479 --> 00:01:02.000
is fascinating. And look, I know usually reading

00:01:02.000 --> 00:01:04.099
about file systems sounds like, well, reading

00:01:04.099 --> 00:01:05.900
an audio textbook, it can get incredibly dry.

00:01:05.980 --> 00:01:08.500
Oh, absolutely. But the source material for NTFS

00:01:08.500 --> 00:01:11.260
is actually full of all this digital drama and

00:01:11.260 --> 00:01:14.200
these incredibly clever space-bending illusions.

00:01:14.319 --> 00:01:17.079
It really is. And to understand why it's so robust,

00:01:17.659 --> 00:01:19.219
we kind of have to look at the environment it

00:01:19.219 --> 00:01:21.519
was born into. Yeah. Because its origin story

00:01:21.519 --> 00:01:23.480
reads like a messy corporate divorce. Right.

00:01:23.560 --> 00:01:26.239
Back in the mid 1980s, Microsoft and IBM were

00:01:26.239 --> 00:01:28.260
actually working together on a joint operating

00:01:28.260 --> 00:01:30.060
system, right? Yeah, they were developing

00:01:30.060 --> 00:01:32.900
OS/2. And alongside it, they were building the High

00:01:32.900 --> 00:01:36.040
Performance File System, or HPFS. Because the

00:01:36.040 --> 00:01:38.379
industry just desperately needed a step up from

00:01:38.379 --> 00:01:40.900
the old FAT system, the file allocation table.

00:01:41.159 --> 00:01:44.719
Exactly. FAT was conceptually very simple, but

00:01:44.719 --> 00:01:47.969
it was hitting severe limits regarding scalability,

00:01:48.250 --> 00:01:50.909
file security, reliability. Like, if your computer

00:01:50.909 --> 00:01:53.950
crashed while writing to a FAT drive, your data

00:01:53.950 --> 00:01:55.810
was just gone. There was a very high chance it

00:01:55.810 --> 00:01:57.569
was just completely gone, yeah. But Microsoft

00:01:57.569 --> 00:01:59.409
and IBM, they couldn't get along. It was this

00:01:59.409 --> 00:02:01.629
massive clash of corporate cultures, so they

00:02:01.629 --> 00:02:04.269
just split up like a bad band breakup. A very

00:02:04.269 --> 00:02:08.569
expensive band breakup. IBM kept OS/2, and Microsoft

00:02:08.569 --> 00:02:11.090
took concepts from that partnership and formed

00:02:11.090 --> 00:02:13.830
their own supergroup of engineers. People like

00:02:13.830 --> 00:02:16.659
Tom Miller and Gary Kimura. Right. And they went

00:02:16.659 --> 00:02:19.620
off to build Windows NT. And they knew they had

00:02:19.620 --> 00:02:21.719
to build a file system that wouldn't just solve

00:02:21.719 --> 00:02:24.879
the problems of 1993. It had to scale for decades.

00:02:25.039 --> 00:02:27.759
Which required a total architectural rethink.

00:02:27.979 --> 00:02:30.360
They didn't just take the old FAT system and,

00:02:30.379 --> 00:02:32.219
I don't know, make the storage buckets bigger.

00:02:32.520 --> 00:02:35.060
No. They built a highly abstract architecture.

00:02:35.639 --> 00:02:37.960
The core idea was that almost everything on the

00:02:37.960 --> 00:02:40.400
volume is treated as a file. And everything inside

00:02:40.400 --> 00:02:43.530
a file is treated as an attribute. That abstraction

00:02:43.530 --> 00:02:46.250
is what allowed for this insane leap in scale.

00:02:46.810 --> 00:02:48.770
The sources point out that the theoretical maximum

00:02:48.770 --> 00:02:51.710
file size in NTFS is 16 exabytes. Which is a

00:02:51.710 --> 00:02:54.030
number most people can't even fathom. I certainly

00:02:54.030 --> 00:02:55.949
can't. And even the currently implemented limit

00:02:55.949 --> 00:02:58.789
in recent Windows versions is 8 petabytes minus

00:02:58.789 --> 00:03:01.449
2 megabytes. But hold on. If this was designed

00:03:01.449 --> 00:03:05.210
in 1993, back when a massive hard drive was maybe

00:03:05.210 --> 00:03:08.620
a few hundred megabytes, How on earth is a system

00:03:08.620 --> 00:03:11.099
from that era mechanically handling an eight

00:03:11.099 --> 00:03:14.400
petabyte file today? Did they just guess the

00:03:14.400 --> 00:03:17.659
future? Well, not quite guess. The secret is

00:03:17.659 --> 00:03:20.379
that NTFS decouples the logical structure of

00:03:20.379 --> 00:03:22.500
a file from the physical hardware of the disk.

00:03:22.719 --> 00:03:25.439
OK, what does that mean in plain English? So

00:03:25.439 --> 00:03:27.900
in older systems, the file system was rigidly

00:03:27.900 --> 00:03:31.300
tied to physical sectors. But NTFS operates using

00:03:31.300 --> 00:03:34.479
virtual clusters. An attribute inside a file

00:03:34.479 --> 00:03:37.240
simply points to a run, or sequence, of these

00:03:37.240 --> 00:03:40.080
virtual clusters. Oh, I see. Yeah. So as hardware

00:03:40.080 --> 00:03:42.520
evolved and physical cluster sizes grew from,

00:03:42.520 --> 00:03:45.719
you know, 512 bytes up to two megabytes in modern

00:03:45.719 --> 00:03:48.280
drives, the underlying architecture of NTFS didn't

00:03:48.280 --> 00:03:50.300
have to change at all. It just continued mapping

00:03:50.300 --> 00:03:52.639
attributes to whatever cluster size the hardware

00:03:52.639 --> 00:03:55.020
handed to it. Exactly. It's incredibly adaptable.
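
NOTE
Editor's sketch, not from the episode: the size limits quoted above can be
reproduced with simple arithmetic. It assumes binary units, 64-bit byte
offsets, and an implementation cap of 2**32 - 1 clusters per volume; the cap
and the helper name max_volume_bytes are assumptions made for illustration.
```python
# Toy arithmetic for the quoted limits; MAX_CLUSTERS is an assumed cap.
KIB, MIB = 2**10, 2**20
PIB, EIB = 2**50, 2**60
THEORETICAL_MAX_BYTES = 2**64      # 64-bit sizes allow 16 EiB
MAX_CLUSTERS = 2**32 - 1           # assumed cap on the cluster count
def max_volume_bytes(cluster_size: int) -> int:
    """Largest size addressable with this cluster size under the assumed cap."""
    return MAX_CLUSTERS * cluster_size
if __name__ == "__main__":
    print(THEORETICAL_MAX_BYTES // EIB)                      # 16
    print(max_volume_bytes(2 * MIB) == 8 * PIB - 2 * MIB)    # True
    print(max_volume_bytes(4 * KIB) < 16 * 2**40)            # True
```
Under these assumptions, only the cluster size changes between the small and
large cases; the cluster-count arithmetic stays the same.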

00:03:55.060 --> 00:03:56.879
So if you're building a system intended to scale

00:03:56.879 --> 00:03:59.539
up to 16 exabytes without collapsing into total

00:03:59.539 --> 00:04:01.860
chaos, I mean, you can't just toss files onto

00:04:01.860 --> 00:04:03.740
a disk and hope you find them later. Oh, absolutely

00:04:03.740 --> 00:04:05.879
not. You need an obsessively rigorous tracking

00:04:05.879 --> 00:04:08.919
architecture. And if NTFS treats everything as

00:04:08.919 --> 00:04:11.039
a file, there has to be a master list holding

00:04:11.039 --> 00:04:13.759
all of reality together. Yes. That master list

00:04:13.759 --> 00:04:17.040
is the master file table or the MFT. This is

00:04:17.040 --> 00:04:20.600
really the absolute heart of NTFS. Every single

00:04:20.600 --> 00:04:23.680
file, directory, and piece of metadata on your

00:04:23.680 --> 00:04:27.079
drive is stored as an entry in the MFT. And because

00:04:27.079 --> 00:04:28.899
we said everything is a file, the MFT itself

00:04:28.899 --> 00:04:30.879
is just a file. All right. It's a hidden

00:04:30.879 --> 00:04:33.740
metafile called $MFT. I love the naming convention

00:04:33.740 --> 00:04:36.420
here. The system is run by these hidden metafiles

00:04:36.420 --> 00:04:38.519
that all start with a dollar sign. You've got

00:04:38.519 --> 00:04:41.319
$MFT. You've got $Bitmap, which tracks which clusters

00:04:41.319 --> 00:04:43.500
on your drive are free and which are used. And

00:04:43.500 --> 00:04:45.819
then you have $BadClus, which honestly sounds

00:04:45.819 --> 00:04:48.139
like a Star Wars bounty hunter. It really does.

00:04:48.579 --> 00:04:50.819
But it actually just quarantines bad sectors

00:04:50.819 --> 00:04:52.660
on your physical drive so that the system knows

00:04:52.660 --> 00:04:55.699
to never write data there. The design is incredibly

00:04:55.699 --> 00:04:58.350
elegant. But the most fascinating mechanical

00:04:58.350 --> 00:05:00.329
quirk of the MFT, at least from the sources,

00:05:00.910 --> 00:05:02.910
is how it handles really small files. Oh, this

00:05:02.910 --> 00:05:05.310
is one of my favorite features. So a standard

00:05:05.310 --> 00:05:09.149
MFT record for a single file is typically 1024

00:05:09.149 --> 00:05:11.850
bytes, so one kilobyte. And every record needs

00:05:11.850 --> 00:05:14.490
some of that space for overhead. You know, a

00:05:14.490 --> 00:05:17.050
header, standard information attributes like

00:05:17.050 --> 00:05:18.930
the creation timestamp, security descriptors,

00:05:18.990 --> 00:05:20.930
that kind of stuff. Which leaves you with maybe,

00:05:20.930 --> 00:05:24.170
what, 700 or 800 bytes of empty space within

00:05:24.170 --> 00:05:27.269
that single ledger entry? Precisely. Now, normally

00:05:27.269 --> 00:05:30.250
if you have a large file, the MFT record uses

00:05:30.250 --> 00:05:33.990
that remaining space to store a map, a run list,

00:05:34.490 --> 00:05:36.750
that points to where the actual data is located

00:05:36.750 --> 00:05:39.230
out on the physical clusters of the hard drive.

00:05:39.420 --> 00:05:41.579
Right. And the sources call that a non-resident

00:05:41.579 --> 00:05:44.060
file. The MFT points to it and the system has

00:05:44.060 --> 00:05:46.600
to go fetch it. Exactly. But if you create a

00:05:46.600 --> 00:05:49.779
tiny text file that is only, say, 500 bytes.

00:05:49.839 --> 00:05:52.600
It easily fits into that leftover space inside

00:05:52.600 --> 00:05:56.000
the MFT record itself. Yes. So NTFS doesn't even

00:05:56.000 --> 00:05:58.000
bother finding a spot on the broader hard drive

00:05:58.000 --> 00:06:00.680
for it. It just stuffs the actual contents of

00:06:00.680 --> 00:06:03.079
the file directly inside the MFT record as a

00:06:03.079 --> 00:06:05.759
resident attribute. So the data literally never

00:06:05.759 --> 00:06:07.899
leaves the ledger. Never leaves it. It's basically

00:06:07.899 --> 00:06:10.360
like a hotel registry book. If a guest shows

00:06:10.360 --> 00:06:12.100
up with massive luggage, you write their name

00:06:12.100 --> 00:06:14.579
in the registry and hand them a key to room 302

00:06:14.579 --> 00:06:16.560
out in the hotel. That's a great way to picture

00:06:16.560 --> 00:06:18.620
it. But if their luggage is small enough, like

00:06:18.620 --> 00:06:20.639
under 800 bytes, you don't even give them a room.

00:06:20.720 --> 00:06:23.540
You just stuff their tiny bags directly into

00:06:23.540 --> 00:06:25.639
the binding of the registry book itself. And

00:06:25.639 --> 00:06:28.759
by doing that, the system completely eliminates

00:06:28.759 --> 00:06:32.139
the IO overhead of fetching the file. I mean,

00:06:32.139 --> 00:06:34.439
if the operating system is already reading the

00:06:34.439 --> 00:06:36.860
MFT to locate the file and the data is sitting

00:06:36.860 --> 00:06:39.120
right there in the index, the read operation

00:06:39.120 --> 00:06:42.399
is instantly finished. Wow. Yeah. And because

00:06:42.399 --> 00:06:45.920
of this mechanic, an NTFS volume can theoretically

00:06:45.920 --> 00:06:48.620
contain more individual files than there are

00:06:48.620 --> 00:06:50.860
available data clusters on the physical disk.
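
NOTE
Editor's sketch, not from the episode: a toy model of the resident versus
non-resident decision described above. The 1024-byte record size comes from
the discussion; OVERHEAD, CLUSTER_SIZE, MftRecord, and ToyVolume are
hypothetical names and values chosen only for illustration, not real NTFS
structures.
```python
from dataclasses import dataclass, field
from typing import Optional
RECORD_SIZE = 1024      # typical MFT record size mentioned in the episode
OVERHEAD = 300          # hypothetical bytes used by the header and other attributes
CLUSTER_SIZE = 4096     # hypothetical cluster size
@dataclass
class MftRecord:
    name: str
    resident_data: Optional[bytes] = None          # contents kept inside the record
    run_list: list = field(default_factory=list)   # (start cluster, length) pairs
class ToyVolume:
    def __init__(self) -> None:
        self.records = []
        self.next_free_cluster = 0
    def create(self, name: str, data: bytes) -> MftRecord:
        if len(data) <= RECORD_SIZE - OVERHEAD:
            # Small file: store the bytes in the record and use no clusters.
            rec = MftRecord(name, resident_data=data)
        else:
            # Large file: reserve clusters and keep only a run list in the record.
            count = -(-len(data) // CLUSTER_SIZE)   # ceiling division
            rec = MftRecord(name, run_list=[(self.next_free_cluster, count)])
            self.next_free_cluster += count
        self.records.append(rec)
        return rec
```
In this model a 500-byte file leaves next_free_cluster untouched, which is
why the number of files can exceed the number of data clusters.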

00:06:51.019 --> 00:06:53.500
Wait, really? Yeah, you could have millions of

00:06:53.500 --> 00:06:56.680
tiny files taking up zero dedicated blocks of

00:06:56.680 --> 00:06:59.259
standard storage space. That level of efficiency

00:06:59.259 --> 00:07:03.149
is wild, but I mean... If you step back, it also

00:07:03.149 --> 00:07:05.769
sounds incredibly precarious. How so? Well, if

00:07:05.769 --> 00:07:08.629
the MFT is the central ledger and you are actively

00:07:08.629 --> 00:07:11.529
stuffing real user data into the pages of this

00:07:11.529 --> 00:07:14.269
registry, any corruption to this file would be

00:07:14.269 --> 00:07:16.750
catastrophic. Like if you're downloading a file,

00:07:16.949 --> 00:07:19.490
updating the MFT, and your house loses power

00:07:19.490 --> 00:07:22.089
mid-save, your whole drive's reality could shatter.

00:07:22.230 --> 00:07:24.449
Yes. It's an existential threat that destroys

00:07:24.449 --> 00:07:28.110
lesser file systems. And this exact vulnerability

00:07:28.410 --> 00:07:31.370
is why the engineers introduced what is arguably

00:07:31.370 --> 00:07:34.110
NTFS's most critical survival tool, which is

00:07:34.110 --> 00:07:36.449
journaling. OK, let's look at how this mechanically

00:07:36.449 --> 00:07:39.589
works. Because to survive a sudden crash, it

00:07:39.589 --> 00:07:42.350
uses another hidden metafile, $LogFile. Right.

00:07:42.569 --> 00:07:44.629
And it's essentially writing down a to-do list

00:07:44.629 --> 00:07:47.529
before it actually alters the MFT, right? That

00:07:47.529 --> 00:07:49.889
is the core mechanism of a journaling file system.

00:07:50.029 --> 00:07:53.149
Before NTFS commits any changes to the main data

00:07:53.149 --> 00:07:55.529
structures like updating the MFT or changing

00:07:55.529 --> 00:07:58.290
the bitmap to show a cluster is now in use, it

00:07:58.290 --> 00:08:00.370
records its intent to make that change in the

00:08:00.370 --> 00:08:02.930
log file. Got it. It uses this combination of

00:08:02.930 --> 00:08:05.910
redo and undo information. So walk me through

00:08:05.910 --> 00:08:08.370
a crash scenario. Let's say I hit save on a document,

00:08:08.569 --> 00:08:10.670
the power cuts out, and my screen goes black.

00:08:11.069 --> 00:08:12.769
What actually happens when I turn the machine

00:08:12.769 --> 00:08:15.230
back on? Okay, so during the reboot process,

00:08:15.389 --> 00:08:17.110
before the operating system fully mounts the

00:08:17.110 --> 00:08:20.149
volume and gives you access, NTFS checks that

00:08:20.149 --> 00:08:22.550
log file. It compares the journal against the

00:08:22.550 --> 00:08:24.990
current state of the MFT. If it finds a transaction

00:08:24.990 --> 00:08:27.370
in the log that was interrupted and never fully

00:08:27.370 --> 00:08:31.000
committed to the disk, it uses that undo information

00:08:31.000 --> 00:08:34.960
to systematically roll back the MFT to its last

00:08:34.960 --> 00:08:37.620
known consistent state. So it just acts like

00:08:37.620 --> 00:08:39.899
the interrupted save never happened? Mechanically,

00:08:40.080 --> 00:08:42.500
yeah. It reverses the partial structural changes

00:08:42.500 --> 00:08:44.259
so the ledger isn't corrupted by half-written

00:08:44.259 --> 00:08:47.500
data. And this stability is actually the foundation

00:08:47.500 --> 00:08:51.620
for features like Volume Shadow Copy, or VSS.

00:08:51.779 --> 00:08:54.320
Oh, I've heard of that. Yeah. By keeping a rigorous

00:08:54.320 --> 00:08:56.720
metadata journal and using a copy-on-write

00:08:56.720 --> 00:08:59.759
technique, Windows can keep historical snapshots

00:08:59.759 --> 00:09:02.480
of your files while the system is actively in

00:09:02.480 --> 00:09:04.860
use. It's how the Previous Versions feature

00:09:04.860 --> 00:09:07.179
works. That's amazing. But wait, I have to push

00:09:07.179 --> 00:09:09.179
back on the performance aspect of this. Sure.

00:09:09.340 --> 00:09:11.740
If the file system has to write down everything

00:09:11.740 --> 00:09:14.320
it's going to do in a journal, and then actually

00:09:14.320 --> 00:09:16.840
do the work a millisecond later, doesn't that

00:09:16.840 --> 00:09:20.080
effectively double the read-write workload of

00:09:20.080 --> 00:09:22.080
the drive? Sounds like it would, yeah. If I'm

00:09:22.080 --> 00:09:24.539
transferring a 50 gigabyte video file, logging

00:09:24.539 --> 00:09:27.100
that twice sounds like a massive drag on speed.

00:09:27.559 --> 00:09:30.500
And it would be if it logged the user data. But

00:09:30.500 --> 00:09:34.000
the crucial nuance here is that NTFS is a metadata

00:09:34.000 --> 00:09:37.039
journaling file system. It does not journal the

00:09:37.039 --> 00:09:39.759
contents of your massive user files. Ah, so it's

00:09:39.759 --> 00:09:41.919
only double writing the administrative paperwork.
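
NOTE
Editor's sketch, not from the episode: a toy write-ahead journal that logs
only metadata changes, with undo and redo values, and rolls back anything
that never committed. ToyJournal and its record layout are hypothetical and
are not the real $LogFile format. As a simplification, a missing key is
recorded as an undo value of None.
```python
class ToyJournal:
    def __init__(self, metadata: dict) -> None:
        self.metadata = metadata    # stands in for MFT fields such as file size
        self.log = []               # stands in for the on-disk journal
    def begin(self, txn: int) -> None:
        self.log.append({"txn": txn, "op": "begin"})
    def update(self, txn: int, key: str, new_value) -> None:
        # Record the intent (old and new value) before touching the metadata.
        self.log.append({"txn": txn, "op": "update", "key": key,
                         "undo": self.metadata.get(key), "redo": new_value})
        self.metadata[key] = new_value
    def commit(self, txn: int) -> None:
        self.log.append({"txn": txn, "op": "commit"})
    def recover(self) -> None:
        committed = {r["txn"] for r in self.log if r["op"] == "commit"}
        # Walk the log backwards and undo updates from uncommitted transactions.
        for rec in reversed(self.log):
            if rec["op"] == "update" and rec["txn"] not in committed:
                if rec["undo"] is None:
                    self.metadata.pop(rec["key"], None)
                else:
                    self.metadata[rec["key"]] = rec["undo"]
```
File contents never pass through the log here, which mirrors the trade-off
discussed: the structure is protected, the in-flight user data is not.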

00:09:42.159 --> 00:09:45.519
Exactly. It logs the structural changes, updating

00:09:45.519 --> 00:09:48.139
the file size, the modification timestamps, the

00:09:48.139 --> 00:09:50.700
allocation of the cluster map. It prioritizes

00:09:50.700 --> 00:09:52.980
the structural integrity of the entire volume

00:09:52.980 --> 00:09:55.659
over the content of one individual file. Okay,

00:09:55.779 --> 00:09:58.159
that makes perfect sense. It's a calculated

00:09:58.159 --> 00:10:01.090
trade-off. If the power cuts out during your 50-gigabyte

00:10:01.090 --> 00:10:04.370
transfer, yes, you will lose the video data that

00:10:04.370 --> 00:10:07.110
was in transit. But the MFT will remain perfectly

00:10:07.110 --> 00:10:09.070
intact, meaning your whole hard drive won't be

00:10:09.070 --> 00:10:11.289
corrupted. So it basically guarantees the hotel

00:10:11.289 --> 00:10:13.730
registry survives the fire, even if a few guests

00:10:13.730 --> 00:10:15.690
lose their luggage. That is a perfect way to

00:10:15.690 --> 00:10:18.289
put it. But if we follow this logic, that NTFS

00:10:18.289 --> 00:10:21.190
has this incredibly robust, flexible metadata

00:10:21.190 --> 00:10:23.750
tracking architecture, it actually introduces

00:10:23.750 --> 00:10:26.460
a controversial side effect. Yes, it does. Because

00:10:26.460 --> 00:10:28.379
if you can track all these distinct attributes

00:10:28.379 --> 00:10:31.019
so easily, you can also attach invisible data

00:10:31.019 --> 00:10:33.019
directly to a file without the user ever seeing

00:10:33.019 --> 00:10:36.039
it. Which brings us to alternate data streams.

00:10:36.440 --> 00:10:38.519
Alternate data streams, or ADS. This is where

00:10:38.519 --> 00:10:40.600
the abstraction of everything as an attribute

00:10:40.600 --> 00:10:43.299
gets really fascinating. So in the MFT record

00:10:43.299 --> 00:10:46.639
for a file, the actual contents of your document

00:10:46.639 --> 00:10:49.879
are stored in an attribute logically called $DATA.

00:10:50.080 --> 00:10:52.450
Right. But the architecture does not limit a

00:10:52.450 --> 00:10:55.409
file to just one data attribute. You can append

00:10:55.409 --> 00:10:57.289
a second, third, or fourth data attribute to

00:10:57.289 --> 00:10:59.769
the exact same file name. And the spooky part

00:10:59.769 --> 00:11:02.570
is how standard Windows Explorer handles this.

00:11:02.669 --> 00:11:05.649
Because if you have a normal 1 kilobyte text

00:11:05.649 --> 00:11:08.669
file and you attach a 50 megabyte alternate data

00:11:08.669 --> 00:11:10.649
stream to it, Windows will still look you in

00:11:10.649 --> 00:11:12.970
the eye and tell you the file size is 1 kilobyte.

00:11:13.330 --> 00:11:15.470
The extra 50 megabytes are completely invisible

00:11:15.470 --> 00:11:18.149
to the casual user. Because standard graphical

00:11:18.149 --> 00:11:20.389
interfaces are programmed to only report the

00:11:20.389 --> 00:11:23.669
size of the primary unnamed data stream. The

00:11:23.669 --> 00:11:25.590
alternate streams are hidden behind a colon in

00:11:25.590 --> 00:11:27.940
the file path, something like, you know,

00:11:27.940 --> 00:11:30.860
text.txt:extrastream. If I'm listening to this right

00:11:30.860 --> 00:11:33.720
now, my immediate thought is, how much invisible

00:11:33.720 --> 00:11:36.500
ghost data or malware is just sitting on my hard

00:11:36.500 --> 00:11:39.620
drive masquerading as a boring text file? It's

00:11:39.620 --> 00:11:41.620
a very real thought to have. Like, why build

00:11:41.620 --> 00:11:44.659
a feature that explicitly hides data from the

00:11:44.659 --> 00:11:46.960
user? Well, it's a highly valid security concern,

00:11:47.200 --> 00:11:51.279
and malware absolutely exploits ADS to hide malicious

00:11:51.279 --> 00:11:54.559
payloads. But its origin wasn't nefarious at

00:11:54.559 --> 00:11:57.799
all. It wasn't. No. Back in the early 90s, Microsoft

00:11:57.799 --> 00:12:00.860
needed Windows NT servers to be compatible with

00:12:00.860 --> 00:12:03.519
Apple Macintosh networks. Oh, right. The sources

00:12:03.519 --> 00:12:06.100
mention Mac compatibility. Yeah. Mac file systems

00:12:06.100 --> 00:12:08.179
use something called resource forks to store

00:12:08.179 --> 00:12:11.179
extra metadata about a file, like custom icons

00:12:11.179 --> 00:12:13.360
or window positions, totally separate from the

00:12:13.360 --> 00:12:15.679
main data. So to host those Mac files without

00:12:15.679 --> 00:12:19.080
breaking them, NTFS needed a way to hold a secondary

00:12:19.080 --> 00:12:21.639
fork of data. Hence, alternate data streams.

00:12:21.799 --> 00:12:24.340
That is wild. And even though Mac compatibility

00:12:24.340 --> 00:12:27.320
isn't really the primary goal anymore, we still

00:12:27.320 --> 00:12:29.799
interact with ADS every single day through a

00:12:29.799 --> 00:12:31.659
security feature called the Mark of the Web.

00:12:31.919 --> 00:12:34.809
Yes, we do. If you download an executable file

00:12:34.809 --> 00:12:38.230
using Chrome or Edge, the browser deliberately

00:12:38.230 --> 00:12:41.629
writes a tiny alternate data stream called

00:12:41.629 --> 00:12:44.710
Zone.Identifier and attaches it to your download.

00:12:44.929 --> 00:12:46.730
So it's basically a hidden sticky note that says,

00:12:47.190 --> 00:12:49.549
warning, this file originated from the external

00:12:49.549 --> 00:12:51.889
internet zone. Exactly. So when I double-click

00:12:51.889 --> 00:12:54.570
that downloaded file, Windows reads the invisible

00:12:54.570 --> 00:12:56.470
sticky note and triggers that pop-up asking,

00:12:56.970 --> 00:12:59.059
are you sure you want to run this app? Right.
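
NOTE
Editor's sketch, not from the episode: pure string handling for the colon
syntax and the Mark of the Web note described above. It never touches a file
system, so it runs on any OS. The helper names are hypothetical,
split_stream_path assumes a bare file name with no drive letter, and SAMPLE
shows the commonly seen stream content, where zone 3 is the Internet zone.
```python
from typing import Optional
def split_stream_path(path: str) -> tuple:
    """Split 'name:stream' into (file name, stream name); '' is the unnamed stream."""
    name, _, stream = path.partition(":")
    return name, stream
def zone_id(stream_text: str) -> Optional[int]:
    """Return the ZoneId value from Zone.Identifier text, or None if it is absent."""
    for line in stream_text.splitlines():
        key, _, value = line.partition("=")
        if key.strip() == "ZoneId" and value.strip().isdigit():
            return int(value.strip())
    return None
SAMPLE = "[ZoneTransfer]\r\nZoneId=3\r\n"   # typical content of the stream
if __name__ == "__main__":
    print(split_stream_path("setup.exe:Zone.Identifier"))
    print(zone_id(SAMPLE))
```
On a Windows NTFS volume the same colon path can usually be opened like an
ordinary file; that behavior is not exercised here.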

00:12:59.179 --> 00:13:01.759
The SmartScreen filter intercepts the execution,

00:13:02.120 --> 00:13:05.649
reads the hidden stream, and prompts you. It perfectly

00:13:05.649 --> 00:13:08.269
highlights this classic tech dilemma. A feature

00:13:08.269 --> 00:13:10.590
engineered to hide helpful compatibility metadata

00:13:10.590 --> 00:13:13.309
becomes a vulnerability for malware, but then

00:13:13.309 --> 00:13:15.509
evolves into a critical mechanism for modern

00:13:15.509 --> 00:13:17.809
threat detection. It's just layer upon layer

00:13:17.809 --> 00:13:20.789
of workarounds. So if alternate data streams

00:13:20.789 --> 00:13:24.169
are how NTFS sneaks hidden data into your files,

00:13:24.529 --> 00:13:27.029
how does it handle the exact opposite problem?

00:13:27.129 --> 00:13:29.389
You mean applications that demand massive files

00:13:29.389 --> 00:13:31.389
full of absolutely nothing? Exactly. Because

00:13:31.389 --> 00:13:33.929
the sources outline some ingenious tricks NTFS

00:13:33.929 --> 00:13:36.230
uses to physically remove data from your drive

00:13:36.230 --> 00:13:38.350
while tricking your software into thinking it's

00:13:38.350 --> 00:13:41.169
still there. Ah, you're referring to sparse files

00:13:41.169 --> 00:13:44.649
and system compression. This is where NTFS essentially

00:13:44.649 --> 00:13:47.870
becomes a master illusionist. It separates the

00:13:47.870 --> 00:13:50.649
logical size of a file from the physical reality

00:13:50.649 --> 00:13:53.470
of the disk. So let's look at sparse files mechanically.

00:13:54.049 --> 00:13:56.149
Imagine a database application that reserves

00:13:56.149 --> 00:13:59.470
a massive chunk of space for future use. It creates

00:13:59.470 --> 00:14:02.929
a file containing gigabytes of empty space, just...

00:14:02.940 --> 00:14:06.159
endless strings of zeros. Normally, writing gigabytes

00:14:06.159 --> 00:14:09.360
of zeros to a physical disk takes time, and it

00:14:09.360 --> 00:14:11.279
permanently occupies that physical space. Right.

00:14:11.480 --> 00:14:14.779
But if you flag it as a sparse file, NTFS intercepts

00:14:14.779 --> 00:14:17.259
the write request. Instead of actually saving

00:14:17.259 --> 00:14:19.320
millions of empty bytes to the physical clusters,

00:14:19.899 --> 00:14:22.360
it modifies that run list in the MFT we talked

00:14:22.360 --> 00:14:24.220
about earlier. OK. It writes a tiny metadata

00:14:24.220 --> 00:14:26.580
note saying, for the next three million virtual

00:14:26.580 --> 00:14:29.480
clusters, just return zeros. That is so clever.
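
NOTE
Editor's sketch, not from the episode: a toy run list in which a run with no
starting cluster stands for a sparse hole. Reads from a hole return zeros
created in memory, and only real runs count toward allocated size. The run
layout, CLUSTER_SIZE, and the function names are hypothetical, not the
on-disk NTFS encoding.
```python
CLUSTER_SIZE = 4096   # hypothetical cluster size
# Each run is (start cluster on disk, or None for a sparse hole, length in clusters).
def logical_size(runs) -> int:
    """Size the application sees, holes included."""
    return sum(length for _, length in runs) * CLUSTER_SIZE
def allocated_size(runs) -> int:
    """Bytes that actually occupy clusters on the disk."""
    return sum(length for start, length in runs if start is not None) * CLUSTER_SIZE
def read_cluster(runs, vcn: int, disk: dict) -> bytes:
    """Return the contents of virtual cluster number vcn."""
    for start, length in runs:
        if vcn < length:
            if start is None:
                return bytes(CLUSTER_SIZE)    # hole: zeros made up in memory
            return disk[start + vcn]
        vcn -= length
    raise IndexError("vcn is past the end of the file")
if __name__ == "__main__":
    runs = [(100, 1), (None, 3_000_000), (200, 1)]
    print(logical_size(runs), allocated_size(runs))
```
The three-million-cluster hole costs one tuple in the run list, which is the
"tiny metadata note" described above.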

00:14:29.700 --> 00:14:31.679
It's the digital equivalent of buying a massive

00:14:31.679 --> 00:14:34.519
1000 acre plot of land, but only paying taxes

00:14:34.519 --> 00:14:36.679
on the 10 square feet where your tent is pitched.

00:14:36.919 --> 00:14:39.639
I love that. Yes. The database application thinks

00:14:39.639 --> 00:14:43.639
it owns 1000 acres, sees a massive file, but

00:14:43.639 --> 00:14:46.559
NTFS knows the truth. It hasn't allocated any

00:14:46.559 --> 00:14:48.799
actual physical hardware to those zeros. And

00:14:48.799 --> 00:14:50.779
the physical space remains completely free for

00:14:50.779 --> 00:14:53.669
other files to use. Because of this trick, you

00:14:53.669 --> 00:14:55.929
could theoretically have a 1 petabyte sparse

00:14:55.929 --> 00:14:58.730
file that takes up exactly 0 bytes on your physical

00:14:58.730 --> 00:15:01.700
disk. That's absurd. Right. And when the application

00:15:01.700 --> 00:15:04.460
tries to read the file, NTFS rapidly feeds it

00:15:04.460 --> 00:15:07.679
a stream of zeros generated in memory, maintaining

00:15:07.679 --> 00:15:10.360
the illusion flawlessly. And Microsoft actually

00:15:10.360 --> 00:15:12.639
took this illusion of empty space even further

00:15:12.639 --> 00:15:14.700
with a feature introduced in Windows 10 called

00:15:14.700 --> 00:15:17.320
Compact OS. They did. It's meant to compress

00:15:17.320 --> 00:15:20.259
core system files to save space on cheap laptops

00:15:20.259 --> 00:15:22.659
and tablets, but the way it works under the hood

00:15:22.659 --> 00:15:26.419
is incredibly sneaky. Very sneaky. Compact OS

00:15:26.419 --> 00:15:29.080
doesn't use the legacy NTFS compression attribute

00:15:29.100 --> 00:15:31.899
from the 90s. Instead, it combines the mechanics

00:15:31.899 --> 00:15:34.799
of sparse files and alternate data streams with

00:15:34.799 --> 00:15:37.639
a file system filter driver. OK, so if I look

00:15:37.639 --> 00:15:39.779
at a core Windows system file that's been compressed

00:15:39.779 --> 00:15:42.100
this way, what am I actually looking at? On the

00:15:42.100 --> 00:15:45.340
surface, the main file on your drive is effectively

00:15:45.340 --> 00:15:49.299
an empty sparse file. OK. However, NTFS applies

00:15:49.299 --> 00:15:52.299
a reparse point to it, specifically stamped with

00:15:52.299 --> 00:15:55.200
a Windows Overlay Filter tag, or WOF tag.

00:15:55.299 --> 00:15:57.960
And a reparse point acts as a tripwire. Exactly.

00:15:58.340 --> 00:16:00.799
The actual compressed system data is stashed

00:16:00.799 --> 00:16:03.539
away inside a hidden alternate data stream attached

00:16:03.539 --> 00:16:06.259
to that empty shell. So the file is literally

00:16:06.259 --> 00:16:09.179
an empty decoy with a sticky note pointing to

00:16:09.179 --> 00:16:11.840
a hidden room. Yes. How does the computer actually

00:16:11.840 --> 00:16:14.620
read the file if an application needs it? When

00:16:14.620 --> 00:16:17.080
an application requests access to that core file,

00:16:17.539 --> 00:16:20.700
it hits the reparse point tripwire. This instantly

00:16:20.700 --> 00:16:22.820
triggers the Windows Overlay filter driver, which

00:16:22.820 --> 00:16:25.539
steps in, intercepts the request, locates the

00:16:25.539 --> 00:16:27.980
hidden alternate data stream, decompresses the

00:16:27.980 --> 00:16:29.899
data directly into the system's active memory,

00:16:30.259 --> 00:16:32.139
and hands it to the application. And the application

00:16:32.139 --> 00:16:34.779
has no idea? None. The application has no idea

00:16:34.840 --> 00:16:37.039
the file on the disk was empty, compressed, or

00:16:37.039 --> 00:16:39.240
hidden. It just receives the uncompressed data

00:16:39.240 --> 00:16:41.990
seamlessly. The coordination required to execute

00:16:41.990 --> 00:16:45.009
that on the fly millions of times a day without

00:16:45.009 --> 00:16:48.269
the user ever noticing a delay is just staggering.

00:16:48.750 --> 00:16:51.529
It really is. So we've seen how NTFS manipulates

00:16:51.529 --> 00:16:54.230
physical storage space. It stuffs small files

00:16:54.230 --> 00:16:56.629
into the MFT ledger. It hides data in alternate

00:16:56.629 --> 00:16:59.269
streams. It fakes the size of empty files using

00:16:59.269 --> 00:17:01.509
run lists. But as I was digging through the Wikipedia

00:17:01.509 --> 00:17:04.130
documentation, I realized its manipulation of

00:17:04.130 --> 00:17:07.890
time is somehow even stranger. Yes. The time

00:17:07.890 --> 00:17:11.329
traveling quirk. The way NTFS tracks time is

00:17:11.329 --> 00:17:14.869
uniquely hyper-specific and deeply tied to historical

00:17:14.869 --> 00:17:17.089
mathematics. Right, because most file systems,

00:17:17.210 --> 00:17:19.089
like the Unix systems running the backbone of

00:17:19.089 --> 00:17:21.069
the internet, track time starting from January

00:17:21.069 --> 00:17:24.269
1st, 1970. Yep, the Unix epoch. But NTFS stores

00:17:24.269 --> 00:17:27.470
dates and times as a 64-bit integer, and it counts

00:17:27.470 --> 00:17:29.990
time in 100-nanosecond intervals. Which means

00:17:29.990 --> 00:17:32.390
the internal clock is ticking 10 million times

00:17:32.390 --> 00:17:34.089
every single second. But the starting point,

00:17:34.109 --> 00:17:36.190
the epoch that it counts forward from, isn't

00:17:36.190 --> 00:17:39.920
1993 when NTFS was released. It isn't 1970. The

00:17:39.920 --> 00:17:42.900
baseline zero for time on your sleek modern Windows

00:17:42.900 --> 00:17:46.799
PC right now is January 1st, 1601. Shakespeare

00:17:46.799 --> 00:17:50.359
was literally writing Hamlet in 1601. Why on

00:17:50.359 --> 00:17:52.819
earth did Microsoft engineers base a futuristic

00:17:52.819 --> 00:17:55.240
file system on the Elizabethan era? Well, as

00:17:55.240 --> 00:17:58.279
amusing as the historical context is, the decision

00:17:58.279 --> 00:18:01.319
was driven by pure algorithm optimization. The

00:18:01.319 --> 00:18:02.559
engineers weren't thinking about Shakespeare.
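
NOTE
Editor's sketch, not from the episode: converting between a count of
100-nanosecond ticks since January 1, 1601 UTC and a Python datetime. The
function names are hypothetical. Python datetimes keep only microseconds, so
the last tick digit is dropped on conversion.
```python
from datetime import datetime, timedelta, timezone
EPOCH_1601 = datetime(1601, 1, 1, tzinfo=timezone.utc)
TICKS_PER_SECOND = 10_000_000     # one tick is 100 nanoseconds
def ticks_to_datetime(ticks: int) -> datetime:
    """Convert ticks since 1601-01-01 UTC to a datetime (microsecond precision)."""
    return EPOCH_1601 + timedelta(microseconds=ticks // 10)
def datetime_to_ticks(moment: datetime) -> int:
    """Convert a timezone-aware datetime to ticks since 1601-01-01 UTC."""
    delta = moment - EPOCH_1601
    whole_seconds = delta.days * 86_400 + delta.seconds
    return whole_seconds * TICKS_PER_SECOND + delta.microseconds * 10
# Offset of the Unix epoch (1970-01-01 UTC) expressed in 1601-based ticks.
UNIX_EPOCH_TICKS = datetime_to_ticks(datetime(1970, 1, 1, tzinfo=timezone.utc))
if __name__ == "__main__":
    print(UNIX_EPOCH_TICKS)    # 116444736000000000
```
Integer arithmetic is used throughout so that large tick counts do not lose
precision the way floating-point seconds would.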

00:18:02.799 --> 00:18:05.000
They were thinking about Pope Gregory XIII. Right,

00:18:05.119 --> 00:18:07.299
the calendar guy. The calendar guy. The year

00:18:07.299 --> 00:18:10.460
1601 was the start of the 400-year Gregorian

00:18:10.460 --> 00:18:13.240
calendar cycle that was currently active when

00:18:13.240 --> 00:18:16.480
Windows NT was being conceived in the late 1980s.

00:18:16.799 --> 00:18:19.910
That cycle ran from 1601 to the year 2000. Because

00:18:19.910 --> 00:18:22.309
the mathematical rules for calculating leap years

00:18:22.309 --> 00:18:25.190
repeat exactly every 400 years in the Gregorian

00:18:25.190 --> 00:18:27.990
calendar. Precisely. So by aligning the absolute

00:18:27.990 --> 00:18:30.269
zero of their computer clock with the first day

00:18:30.269 --> 00:18:33.529
of that specific 400-year cycle, the developers

00:18:33.529 --> 00:18:35.910
vastly simplified the math the operating system

00:18:35.910 --> 00:18:38.369
required to calculate leap years and convert

00:18:38.369 --> 00:18:41.470
those 100-nanosecond ticks into human-readable

00:18:41.470 --> 00:18:44.170
dates. That is wild. It's a brilliant example

00:18:44.170 --> 00:18:47.049
of how a 16th-century calendar reform directly

00:18:47.049 --> 00:18:49.380
dictated the engineering of a 20th-century computer

00:18:49.380 --> 00:18:52.099
architecture. Though tying the system's perception

00:18:52.099 --> 00:18:55.119
of time to a global absolute did create one really

00:18:55.119 --> 00:18:57.359
annoying modern problem, the daylight saving

00:18:57.359 --> 00:19:00.640
time bug. Oh yeah, because NTFS internally tracks

00:19:00.640 --> 00:19:03.680
all of these timestamps in UTC, Coordinated Universal

00:19:03.680 --> 00:19:06.059
Time. Right. The file system itself doesn't care

00:19:06.059 --> 00:19:08.099
about time zones at all. It just counts ticks

00:19:08.099 --> 00:19:12.279
from 1601 UTC. But older file systems like FAT

00:19:12.279 --> 00:19:15.140
were designed to stamp files using the user's

00:19:15.140 --> 00:19:17.880
local time zone. So if you copy a file from a

00:19:17.880 --> 00:19:20.819
modern NTFS drive to an older USB thumb drive

00:19:20.819 --> 00:19:23.819
formatted in FAT, the operating system has to

00:19:23.819 --> 00:19:27.380
actively translate the 1601 UTC time into the

00:19:27.380 --> 00:19:30.559
local time of the FAT drive. Exactly. And if

00:19:30.559 --> 00:19:32.759
a daylight saving time boundary occurs between

00:19:32.759 --> 00:19:35.400
when it was saved and when it's copied, the translation

00:19:35.400 --> 00:19:37.779
math gets completely disjointed. Yep. Depending

00:19:37.779 --> 00:19:40.000
on the exact operating system version and the

00:19:40.000 --> 00:19:41.619
local jurisdiction rules for when the clocks

00:19:41.619 --> 00:19:43.839
change, you can look at a file you literally

00:19:43.839 --> 00:19:46.640
just copied and the modified time has magically jumped

00:19:46.640 --> 00:19:49.640
forward or backward by exactly one hour. It's

00:19:49.640 --> 00:19:52.420
digital jet lag. The file is functionally identical,

00:19:52.420 --> 00:19:54.339
but because you moved it from a system thinking

00:19:54.339 --> 00:19:56.380
globally to a system thinking locally, it just

00:19:56.380 --> 00:19:59.039
gets disoriented in time. It is a harmless but

00:19:59.039 --> 00:20:01.599
very persistent ghost of backward compatibility.
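
NOTE
Editor's sketch, not part of the spoken audio. One way to picture the
one-hour jump: assume, as a simplification, that a FAT-style stamp is a bare
local wall-clock time with no offset stored, and that a reader turns it back
into UTC using whatever offset is in effect when it reads. The zone offsets
(UTC-5 in winter, UTC-4 in summer), the dates, and every name here are
illustrative assumptions, not the actual Windows conversion code.
```python
from datetime import datetime, timedelta
STANDARD_OFFSET = timedelta(hours=-5)  # illustrative zone, winter
DAYLIGHT_OFFSET = timedelta(hours=-4)  # same zone, summer
def utc_to_fat_local(utc_time, offset_at_copy):
    # FAT-style stamp: local wall-clock time; the offset is not recorded.
    return utc_time + offset_at_copy
def fat_local_to_utc(local_stamp, offset_at_read):
    # The reader has to guess the offset, so it uses the current one.
    return local_stamp - offset_at_read
# A file modified in January at 17:00 UTC, copied to FAT during winter.
saved_utc = datetime(2024, 1, 15, 17, 0)
fat_stamp = utc_to_fat_local(saved_utc, STANDARD_OFFSET)
# The same stamp interpreted months later, while daylight time is active.
read_in_summer = fat_local_to_utc(fat_stamp, DAYLIGHT_OFFSET)
drift = read_in_summer - saved_utc
print(drift.total_seconds() / 3600)  # -1.0
```
The file's contents never change; only the offset guessed at read time does.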

00:20:01.859 --> 00:20:04.980
So, we have unpacked a massive amount of invisible

00:20:04.980 --> 00:20:07.619
infrastructure today. For you listening, if you've

00:20:07.619 --> 00:20:09.920
ever wondered why your PC doesn't immediately

00:20:09.920 --> 00:20:11.880
corrupt your photo library when the power goes

00:20:11.880 --> 00:20:14.339
out, or why your browser throws a warning on

00:20:14.339 --> 00:20:16.700
a downloaded file, well, you now know the mechanics.

00:20:17.059 --> 00:20:20.920
You certainly do. We started with a messy 1980s

00:20:20.920 --> 00:20:23.619
corporate divorce that forced the invention of

00:20:23.619 --> 00:20:26.940
a highly abstract architecture. We explored the

00:20:26.940 --> 00:20:29.559
MFT, the master ledger that physically absorbs

00:20:29.559 --> 00:20:33.380
tiny files. We broke down the redo/undo mechanics

00:20:33.380 --> 00:20:35.660
of journaling, the hidden rooms of alternate

00:20:35.660 --> 00:20:38.619
data streams, and the massive zero-byte illusions

00:20:38.619 --> 00:20:41.539
of sparse files. All of it built on a foundation

00:20:41.539 --> 00:20:43.559
that counts 10 million ticks a second, starting

00:20:43.559 --> 00:20:46.400
from the year 1601. Yeah. It really is a hidden

00:20:46.400 --> 00:20:48.259
universe humming quietly beneath your keyboard

00:20:48.259 --> 00:20:50.920
right now. It really is. But before we sign off,

00:20:51.019 --> 00:20:53.440
I want to leave you with one final bizarre mechanical

00:20:53.440 --> 00:20:55.880
reality from our source material to mull over.

00:20:56.119 --> 00:20:57.920
Oh, the time limit. Yeah, we talked about the

00:20:57.920 --> 00:21:00.380
year 1601 and the 64-bit integer that tracks

00:21:00.380 --> 00:21:02.779
time. Well, because that integer has a finite

00:21:02.779 --> 00:21:05.720
amount of space, the NTFS calendar eventually

00:21:05.720 --> 00:21:08.319
runs out of numbers. It is a mathematical certainty.

00:21:08.559 --> 00:21:11.619
The ultimate Y2K problem for the NT File System.

00:21:12.000 --> 00:21:14.559
Exactly. Depending on how the system interprets

00:21:14.559 --> 00:21:16.759
the final bit of that integer, whether it sees

00:21:16.759 --> 00:21:20.380
it as signed or unsigned, NTFS time will eventually

00:21:20.380 --> 00:21:23.220
either break entirely and violently roll over

00:21:23.220 --> 00:21:26.779
to thousands of years before Christ, or it will

00:21:26.779 --> 00:21:29.259
max out completely and time will stop in the

00:21:29.259 --> 00:21:32.599
year 30,828. Which is just a staggering amount

00:21:32.599 --> 00:21:34.559
of time from now. It genuinely makes you wonder,

00:21:34.700 --> 00:21:36.960
you know, what kind of computers will humanity

00:21:36.960 --> 00:21:40.180
be running in 28,000 years to misread an NTFS

00:21:40.180 --> 00:21:42.920
timestamp? Will they even know what Windows was,

00:21:43.099 --> 00:21:46.200
or will this 1993 plumbing still be quietly running

00:21:46.200 --> 00:21:48.299
in the background of some galactic database?
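
NOTE
Editor's sketch, not part of the spoken audio. Because the tick count starts
on the first day of a 400-year Gregorian cycle, the calendar year for any
tick value can be found with plain integer division, and that makes the
limits mentioned above easy to check. The function name is hypothetical and
the code assumes a non-negative tick count; it is an illustration of the
arithmetic, not the operating system's own routine.
```python
TICKS_PER_DAY = 10_000_000 * 86_400  # 100 ns ticks in one day
DAYS_PER_400_YEARS = 146_097         # 400 * 365 plus 97 leap days
DAYS_PER_CENTURY = 36_524            # the fourth century of a cycle has one more
DAYS_PER_4_YEARS = 1_461             # three common years plus one leap year
def year_from_ticks(ticks):
    """Calendar year containing a non-negative tick count since 1601."""
    days = ticks // TICKS_PER_DAY
    cycles, day_in_cycle = divmod(days, DAYS_PER_400_YEARS)
    # min() keeps the extra leap day at the end of each block in that block.
    centuries = min(day_in_cycle // DAYS_PER_CENTURY, 3)
    day_in_century = day_in_cycle - centuries * DAYS_PER_CENTURY
    quads = min(day_in_century // DAYS_PER_4_YEARS, 24)
    day_in_quad = day_in_century - quads * DAYS_PER_4_YEARS
    years = min(day_in_quad // 365, 3)
    return 1601 + cycles * 400 + centuries * 100 + quads * 4 + years
signed_limit = year_from_ticks(2**63 - 1)    # largest signed 64-bit value
unsigned_limit = year_from_ticks(2**64 - 1)  # largest unsigned 64-bit value
print(year_from_ticks(0))  # 1601
print(signed_limit)        # 30828
print(unsigned_limit)      # 60056
```
Under this arithmetic the year 30,828 is the signed ceiling, and reading the
same 64 bits as unsigned would push the ceiling out to the year 60,056.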

00:21:48.579 --> 00:21:50.420
That is quite the thought to leave on. Thank

00:21:50.420 --> 00:21:52.299
you for taking this deep dive into the invisible

00:21:52.299 --> 00:21:54.519
architecture of your digital life. Remember,

00:21:54.519 --> 00:21:56.380
there is always a deeper layer under the hood,

00:21:56.400 --> 00:21:58.779
so keep digging into your own sources. See you

00:21:58.779 --> 00:21:59.220
next time.
