WEBVTT

00:00:00.000 --> 00:00:02.299
Welcome back to Deep Dive. We are back with another

00:00:02.299 --> 00:00:04.839
stack of research. And today we are looking at

00:00:04.839 --> 00:00:10.060
something that is simultaneously the most boring

00:00:10.060 --> 00:00:13.160
and maybe the most fascinating string of characters

00:00:13.160 --> 00:00:15.550
on the entire internet. It is certainly the most

00:00:15.550 --> 00:00:18.129
ubiquitous. I'll give you that. Absolutely. We

00:00:18.129 --> 00:00:19.649
are talking about the very first line of code

00:00:19.649 --> 00:00:22.149
on essentially every single web page you have

00:00:22.149 --> 00:00:24.489
ever visited. Yeah. You right-click, you hit

00:00:24.489 --> 00:00:26.449
view source, and before you get to the content,

00:00:26.530 --> 00:00:28.410
before the scripts, before any of the styles,

00:00:28.589 --> 00:00:31.629
there's this declaration. Usually it's just

00:00:31.629 --> 00:00:34.409
DOCTYPE html. The document type declaration. Right.

00:00:34.630 --> 00:00:36.369
And, you know, looking through the materials

00:00:36.369 --> 00:00:38.450
we have today, the specs, the history, the browser

00:00:38.450 --> 00:00:41.750
notes, I was just struck by this central paradox.

00:00:42.229 --> 00:00:44.799
We talk all the time about code efficiency. Right.

00:00:44.880 --> 00:00:46.799
We talk about minification, compressing images,

00:00:47.020 --> 00:00:49.460
shaving off every possible millisecond of load

00:00:49.460 --> 00:00:51.259
time. Oh, yeah. Every byte counts. Every byte

00:00:51.259 --> 00:00:54.979
counts. And yet the entire web is built on shipping

00:00:54.979 --> 00:00:57.659
this one specific tag that, according to the

00:00:57.659 --> 00:01:00.000
modern spec itself, is mostly useless but required.

00:01:00.299 --> 00:01:02.960
It's a wonderful contradiction, isn't it? I mean,

00:01:02.979 --> 00:01:04.939
it really frames the whole discussion perfectly.

00:01:05.359 --> 00:01:07.620
It does. We are essentially shipping billions

00:01:07.620 --> 00:01:10.620
of bytes of data across the globe every single

00:01:10.620 --> 00:01:13.640
day that technically do nothing, at least not

00:01:13.640 --> 00:01:16.140
in terms of processing content. But if you take

00:01:16.140 --> 00:01:18.459
them away, the whole thing falls apart. The visual

00:01:18.459 --> 00:01:20.500
web just completely falls apart. And that's what

00:01:20.500 --> 00:01:22.400
I want to get to the bottom of today. Because

00:01:22.400 --> 00:01:24.620
usually in engineering, if something is useless,

00:01:24.859 --> 00:01:26.980
you deprecate it. You get rid of it. You delete

00:01:26.980 --> 00:01:29.359
it. Of course. But here, we're keeping it. It

00:01:29.359 --> 00:01:31.019
feels, I don't know, it feels like a ritual.

00:01:31.159 --> 00:01:33.319
It's like we're lighting a candle before we start

00:01:33.319 --> 00:01:35.760
coding just to appease the gods of the internet

00:01:35.760 --> 00:01:38.040
or something. That's surprisingly close to the

00:01:38.040 --> 00:01:41.400
truth, actually. The doctype is a fossil. A fossil.

00:01:41.560 --> 00:01:43.959
It's a remnant of a completely different era

00:01:43.959 --> 00:01:47.640
of computing history, the SGML era, that has

00:01:47.640 --> 00:01:51.180
survived into the modern web purely as a switch.

00:01:51.359 --> 00:01:54.579
It's just a signaling mechanism. Okay. So the

00:01:54.579 --> 00:01:57.079
mission for this deep dive is to figure out why.

00:01:57.299 --> 00:02:00.219
We're going to unpack the syntax, which I learned

00:02:00.219 --> 00:02:02.819
is actually way, way more complex than just that

00:02:02.819 --> 00:02:05.290
simple HTML5 tag. We're going to look at the

00:02:05.290 --> 00:02:08.270
history of sniffing, which I have to say sounds

00:02:08.270 --> 00:02:10.449
a little illicit, but it's actually just browser

00:02:10.449 --> 00:02:12.409
logic. And we have to talk about quirks mode.

00:02:12.849 --> 00:02:14.870
We are definitely going to talk about quirks

00:02:14.870 --> 00:02:18.199
mode. And I really want to understand why modern

00:02:18.199 --> 00:02:21.199
browsers have decided to just completely ignore

00:02:21.199 --> 00:02:24.039
the sophisticated maps that were provided in

00:02:24.039 --> 00:02:25.979
the older code. It's a journey. It's a journey

00:02:25.979 --> 00:02:29.460
from the very strict academic world of standardized

00:02:29.460 --> 00:02:33.479
markup to the, well, to the messy, pragmatic

00:02:33.479 --> 00:02:36.800
reality of the web we actually built. Okay, let's

00:02:36.800 --> 00:02:38.780
start at the beginning then, section one. What

00:02:38.780 --> 00:02:42.180
is a doctype? If we strip away all the modern

00:02:42.180 --> 00:02:44.639
context and just look at the definition and the

00:02:44.639 --> 00:02:46.580
source material, what are we actually looking

00:02:46.580 --> 00:02:48.699
at here? To understand the doctype, you really

00:02:48.699 --> 00:02:50.500
have to understand that the web wasn't invented

00:02:50.500 --> 00:02:53.180
in a vacuum. It was born out of this whole world

00:02:53.180 --> 00:02:55.400
of document processing that already existed.

00:02:55.580 --> 00:02:57.740
Right. Specifically, it came from something called

00:02:57.740 --> 00:03:00.900
SGML, the Standard Generalized Markup Language.

00:03:01.539 --> 00:03:05.469
SGML. This is like the grandfather of HTML. Correct.

00:03:05.969 --> 00:03:09.590
SGML was a massive, incredibly complex international

00:03:09.590 --> 00:03:13.250
standard for defining document structures. It

00:03:13.250 --> 00:03:15.370
wasn't really a language itself. It was a

00:03:15.370 --> 00:03:16.949
metalanguage. What does that mean, a metalanguage?

00:03:17.250 --> 00:03:19.189
It was a language for creating other languages.

00:03:19.389 --> 00:03:21.770
And in that world, you couldn't just write

00:03:22.000 --> 00:03:24.599
tags willy-nilly. You couldn't just make things

00:03:24.599 --> 00:03:27.539
up. You had to define your terms first. So you

00:03:27.539 --> 00:03:30.159
couldn't just invent a pizza tag unless you had

00:03:30.159 --> 00:03:32.919
first defined what a pizza tag was and what it

00:03:32.919 --> 00:03:35.319
could contain. Exactly. You needed a rule book.

00:03:35.479 --> 00:03:38.819
And that rule book is called the document type

00:03:38.819 --> 00:03:42.240
definition. The DTD. The DTD is a file that says,

00:03:42.319 --> 00:03:44.800
okay, in this language, we have an element called

00:03:44.800 --> 00:03:48.360
pizza. It can contain cheese and sauce, but it

00:03:48.360 --> 00:03:51.560
absolutely cannot contain rocks. It lays out

00:03:51.560 --> 00:03:54.439
the entire grammar. Okay, so the DTD is the law.

00:03:54.439 --> 00:03:57.719
The DTD is the law. And the doctype? The document

00:03:57.719 --> 00:04:00.319
type declaration is the instruction inside your

00:04:00.319 --> 00:04:02.539
actual document that links it to that law. Ah,

00:04:02.620 --> 00:04:04.280
I see. It's a way of telling the computer, hey,

00:04:04.400 --> 00:04:06.719
I'm about to send you some data. Before you try

00:04:06.719 --> 00:04:08.620
to read it, please go refer to this specific

00:04:08.620 --> 00:04:10.419
rulebook so you know how to interpret everything.

00:04:10.759 --> 00:04:12.520
It's like declaring the rules of the game before

00:04:12.520 --> 00:04:14.740
you deal the cards. We're playing poker, so use

00:04:14.740 --> 00:04:17.420
the poker rulebook. Precisely. And in the serialized

00:04:17.420 --> 00:04:19.660
form, which just means the actual text file you

00:04:19.660 --> 00:04:22.139
send over the wire, this instruction has to come

00:04:22.139 --> 00:04:24.300
first. It has to be right at the top, which is

00:04:24.300 --> 00:04:26.360
why we see it as that short string of markup

00:04:26.360 --> 00:04:29.199
at the very top of the file. Now, one thing the

00:04:29.199 --> 00:04:31.839
notes highlight is this concept of the root element.

00:04:32.800 --> 00:04:35.339
The doctype doesn't just point to a file. It

00:04:35.339 --> 00:04:38.019
also defines the root. What's that about? This

00:04:38.019 --> 00:04:41.170
is a really critical parsing concept. In SGML

00:04:41.170 --> 00:04:44.410
and its descendant, XML, a document is a tree

00:04:44.410 --> 00:04:46.490
structure. Right, like a family tree. Exactly.

00:04:46.730 --> 00:04:49.829
And every tree has to have a single trunk from

00:04:49.829 --> 00:04:52.629
which everything else branches off. The parser,

00:04:52.649 --> 00:04:54.810
the program reading the file, needs to know what

00:04:54.810 --> 00:04:56.990
that trunk is before it starts so it knows where

00:04:56.990 --> 00:04:59.529
to begin building the tree. So for a web page,

00:04:59.790 --> 00:05:03.350
that root is almost always HTML. Usually, yes.

00:05:03.709 --> 00:05:06.889
But in the abstract world of SGML, it could be

00:05:06.889 --> 00:05:09.410
anything. You could have a doctype where the

00:05:09.410 --> 00:05:12.310
root element is library or memo. But for the

00:05:12.310 --> 00:05:14.730
web, the doctype effectively says two things.

00:05:14.870 --> 00:05:17.310
I am declaring that this document adheres to

00:05:17.310 --> 00:05:19.670
the HTML DTD, and the root element of this tree

00:05:19.670 --> 00:05:22.529
is HTML. So it's telling the parser, okay, get

00:05:22.529 --> 00:05:25.430
ready. Expect an HTML tag to start this whole

00:05:25.430 --> 00:05:28.370
thing off. Exactly. And this is important. The

00:05:28.370 --> 00:05:31.949
doctype then applies those rules to everything

00:05:31.949 --> 00:05:34.110
descending from that root. The whole tree. The

00:05:34.110 --> 00:05:36.560
entire tree. Once you declare the root, every

00:05:36.560 --> 00:05:39.560
single branch and leaf inside it is subject to

00:05:39.560 --> 00:05:41.819
the rules of that DTD you referenced. Okay, so

00:05:41.819 --> 00:05:43.860
that's the theory. In a perfect world, or at

00:05:43.860 --> 00:05:45.459
least, you know, in the academic world of the

00:05:45.459 --> 00:05:48.060
1990s, the browser would see this doctype,

00:05:48.199 --> 00:05:50.019
it would go look up the rules in the DTD, and

00:05:50.019 --> 00:05:52.240
then it would validate your code against those

00:05:52.240 --> 00:05:54.620
rules. And if you broke a rule, the computer

00:05:54.620 --> 00:05:57.240
would throw an error and say, nope, not valid.

00:05:57.240 --> 00:05:59.660
That was the dream of something called prescriptive

00:05:59.660 --> 00:06:02.259
markup. The idea was that if we all enforced these

00:06:02.259 --> 00:06:04.740
really strict rules, machines could easily read

00:06:04.740 --> 00:06:08.379
any document without any ambiguity. But we know

00:06:08.379 --> 00:06:11.019
the web's not unambiguous. The web is a complete

00:06:11.019 --> 00:06:14.139
mess. The web is a beautiful, beautiful mess. So

00:06:14.139 --> 00:06:16.620
let's move to section two, then: the browser's

00:06:16.620 --> 00:06:18.980
secret decision. Because this is really where

00:06:18.980 --> 00:06:20.879
the theory just collides head-on with reality.

00:06:20.879 --> 00:06:23.439
Right. This is where it gets interesting. So I

00:06:23.439 --> 00:06:25.240
asked myself this question when I was reading

00:06:25.240 --> 00:06:28.660
the notes. What happens if I mess this up? Let's

00:06:28.660 --> 00:06:32.120
say I'm a web developer in 1999. I forget the

00:06:32.120 --> 00:06:35.519
doctype or make a typo in it. Does the

00:06:35.519 --> 00:06:40.000
browser just stop working? Crash? Well, if the

00:06:40.000 --> 00:06:41.839
browser stopped working every time a developer

00:06:41.839 --> 00:06:43.819
made a typo, the Internet would have died in

00:06:43.819 --> 00:06:47.019
1996. Fair point. The browser makers, you know,

00:06:47.060 --> 00:06:50.069
Netscape and Microsoft, in the thick of the browser

00:06:50.069 --> 00:06:53.209
wars, they faced this huge dilemma. They wanted

00:06:53.209 --> 00:06:55.649
to implement new standards. They wanted to support

00:06:55.649 --> 00:06:58.129
CSS correctly. They wanted to fix all the bugs

00:06:58.129 --> 00:06:59.649
in their rendering engines. Sure, they wanted

00:06:59.649 --> 00:07:01.750
to move forward. But millions of websites had

00:07:01.750 --> 00:07:03.329
already been built, and they were built relying

00:07:03.329 --> 00:07:06.449
on the old buggy behavior. If they just fixed

00:07:06.449 --> 00:07:08.829
the browser, all those old sites would shatter.

00:07:09.189 --> 00:07:11.870
Layouts would break. Text would overlap everywhere.

00:07:12.149 --> 00:07:14.829
It would have been a complete disaster. The web

00:07:14.829 --> 00:07:17.009
would have become unusable overnight. So they

00:07:17.009 --> 00:07:19.040
couldn't just update the engine. They had to

00:07:19.040 --> 00:07:21.660
somehow support the bad code and the good code

00:07:21.660 --> 00:07:24.720
at the same time. Enter doctype sniffing and

00:07:24.720 --> 00:07:28.120
switching. Sniffing. Hmm. It sounds so investigative,

00:07:28.120 --> 00:07:31.120
like a detective. It is, in a way. When a browser

00:07:31.120 --> 00:07:33.819
receives a document that's served as text/html,

00:07:34.019 --> 00:07:36.620
it doesn't just blindly start rendering it. It

00:07:36.620 --> 00:07:39.000
looks at that very first line, the doctype,

00:07:39.040 --> 00:07:41.980
and it sniffs it. Sniffs it. It checks it against

00:07:41.980 --> 00:07:44.079
an internal list of known patterns to decide

00:07:44.079 --> 00:07:46.759
which rendering mode it should use. It's making

00:07:46.759 --> 00:07:49.040
a judgment call based on that one line. And this

00:07:49.040 --> 00:07:50.819
is where we get the two main modes, right? Right.

00:07:50.879 --> 00:07:53.180
Quirks mode and standards mode. There's actually

00:07:53.180 --> 00:07:55.740
a third one called almost standards mode, which

00:07:55.740 --> 00:07:58.040
is a whole other rabbit hole. But yeah, let's

00:07:58.040 --> 00:07:59.819
stick to the big two for clarity. They're the

00:07:59.819 --> 00:08:01.790
most important. Okay, let's talk about Quirks

00:08:01.790 --> 00:08:04.829
Mode. The name itself implies it's just a little

00:08:04.829 --> 00:08:08.050
bit weird, but the source material suggests it's

00:08:08.050 --> 00:08:10.329
actually a highly sophisticated emulation of

00:08:10.329 --> 00:08:12.689
failure. That is a great way to put it. Quirks

00:08:12.689 --> 00:08:15.470
Mode is a compatibility layer. If the doctype

00:08:15.470 --> 00:08:18.910
is missing entirely, or if it's an old, malformed

00:08:18.910 --> 00:08:21.370
doctype that suggests the author didn't know

00:08:21.370 --> 00:08:23.750
about modern standards. So if it looks like it

00:08:23.750 --> 00:08:26.829
was written in 1998. Exactly. The browser enters

00:08:26.829 --> 00:08:29.970
Quirks Mode. And in this mode, the browser deliberately

00:08:29.970 --> 00:08:33.129
intentionally violates the modern web specifications

00:08:33.129 --> 00:08:37.230
to mimic the behavior of old browsers like Internet

00:08:37.230 --> 00:08:40.610
Explorer 5 or Netscape 4. Can we get specific

00:08:40.610 --> 00:08:42.610
here? The notes mention something called the

00:08:42.610 --> 00:08:44.750
box model. This seems to be the biggest, most

00:08:44.750 --> 00:08:46.610
famous difference between the modes. This is

00:08:46.610 --> 00:08:48.350
the classic example. It's the one everyone points

00:08:48.350 --> 00:08:50.909
to. So imagine you have a box on your screen.

00:08:51.289 --> 00:08:53.889
A div, let's say. Okay. And you write some CSS.

00:08:54.009 --> 00:08:57.090
You say its width is 100 pixels and you add 10

00:08:57.090 --> 00:08:59.250
pixels of padding on each side. All right. 100

00:08:59.250 --> 00:09:01.669
pixels wide, 10 pixels of padding. Got it. In

00:09:01.669 --> 00:09:04.909
the W3C standards, the correct way, which is

00:09:04.909 --> 00:09:07.149
what standards mode uses, the width property

00:09:07.149 --> 00:09:09.570
refers only to the content area. Okay. So you

00:09:09.570 --> 00:09:12.149
have 100 pixels of content plus 10 pixels of

00:09:12.149 --> 00:09:14.529
padding on the left plus 10 pixels of padding

00:09:14.529 --> 00:09:17.049
on the right. What's the total visible width

00:09:17.049 --> 00:09:22.720
of the box? 120 pixels. 100 plus 10 plus 10.

00:09:22.899 --> 00:09:25.000
Correct. That makes logical sense. That's standards

00:09:25.000 --> 00:09:26.960
mode. But in the old Internet Explorer, they

00:09:26.960 --> 00:09:29.620
did it differently. How so? They said, if I declare

00:09:29.620 --> 00:09:32.820
the width as 100, the whole darn box should be

00:09:32.820 --> 00:09:35.940
100 pixels wide, period. So they squished the

00:09:35.940 --> 00:09:38.960
content. The padding was included inside that

00:09:38.960 --> 00:09:41.820
100 pixel width. The content area would only

00:09:41.820 --> 00:09:44.100
be 80 pixels. Oh, I see. So if I built a whole

00:09:44.100 --> 00:09:48.639
website relying on my boxes being exactly 100

00:09:48.639 --> 00:09:50.919
pixels wide to fit next to each other. And then

00:09:50.919 --> 00:09:52.840
suddenly the browser updates and starts following

00:09:52.840 --> 00:09:55.620
the standard. Your box is now 120 pixels wide.
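
NOTE
A minimal sketch of the width arithmetic discussed here, assuming a box with
padding only (borders and margins are left out for simplicity). The function
names rendered_width and content_width are hypothetical, chosen for illustration.
```python
def rendered_width(width: int, padding: int, quirks: bool) -> int:
    """Total visible box width in pixels, ignoring borders."""
    if quirks:
        # Legacy IE model: the declared width already includes the padding.
        return width
    # Standards model: the declared width is content only; padding is added on both sides.
    return width + 2 * padding
def content_width(width: int, padding: int, quirks: bool) -> int:
    """Width left over for the content itself."""
    return width - 2 * padding if quirks else width
print(rendered_width(100, 10, quirks=False))  # 120
print(rendered_width(100, 10, quirks=True))   # 100
print(content_width(100, 10, quirks=True))    # 80
```
The same declared width of 100 yields a 120-pixel box in standards mode and a
100-pixel box with 80 pixels of content in quirks mode, which is the gap that
broke old layouts.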

00:09:55.740 --> 00:09:58.000
And my entire layout breaks. My sidebar probably

00:09:58.000 --> 00:09:59.740
falls down to the bottom of the page. The whole

00:09:59.740 --> 00:10:01.899
site looks ruined. And your boss is yelling at

00:10:01.899 --> 00:10:04.200
you. So quirks mode tells the modern browser,

00:10:04.379 --> 00:10:06.580
even the latest version of Chrome or Firefox

00:10:06.580 --> 00:10:08.679
running today, to use that old mathematically

00:10:08.679 --> 00:10:11.120
incorrect calculation for width. It preserves

00:10:11.120 --> 00:10:13.879
the quirk. Exactly. But if you provide a correct

00:10:13.879 --> 00:10:17.379
modern doctype, the browser says, aha, this

00:10:17.379 --> 00:10:19.080
author knows what they're doing. They want to

00:10:19.080 --> 00:10:21.259
follow the specs. And it switches to standards

00:10:21.259 --> 00:10:24.039
mode using the correct box model. So the doctype

00:10:24.039 --> 00:10:27.379
really is just a switch. But, yeah. Here's the

00:10:27.379 --> 00:10:29.039
dirty secret that really blew my mind in the

00:10:29.039 --> 00:10:31.080
research. We talked about how the doctype is

00:10:31.080 --> 00:10:33.620
supposed to point to a DTD file, right? A URL.

00:10:33.759 --> 00:10:35.759
Yes, that's the theory. So logic would suggest

00:10:35.759 --> 00:10:38.259
the browser would download that file from the

00:10:38.259 --> 00:10:40.919
URL to check the rules. But the sources say that

00:10:40.919 --> 00:10:43.759
modern browsers have these special purpose HTML

00:10:43.759 --> 00:10:47.379
parsers that do not use general purpose DTD logic

00:10:47.379 --> 00:10:49.960
at all. This is the big twist. Modern browsers,

00:10:50.549 --> 00:10:54.049
Chrome, Safari, Edge, Firefox, do not download

00:10:54.049 --> 00:10:56.090
the DTD. They never look at the file. Never.

00:10:56.269 --> 00:10:58.970
If you put a URL in your doctype pointing to,

00:10:59.070 --> 00:11:04.090
you know, http://www.w3.org/something.dtd,

00:11:04.190 --> 00:11:07.470
the browser completely ignores that URL. It does

00:11:07.470 --> 00:11:09.389
not make a network request. It does not fetch

00:11:09.389 --> 00:11:12.259
it. It does not read it. So why is it even there?

00:11:12.460 --> 00:11:14.919
In the modern context, it is there purely for

00:11:14.919 --> 00:11:17.120
the string matching algorithm. That's it. The

00:11:17.120 --> 00:11:19.740
browser has a hard-coded internal list of strings.

00:11:19.919 --> 00:11:22.080
A list of patterns. Exactly. It just says, if

00:11:22.080 --> 00:11:24.299
the characters at the very top of this file match

00:11:24.299 --> 00:11:26.679
this specific sequence, turn on standards mode.

00:11:26.779 --> 00:11:29.019
If they don't, or if there's nothing there, turn

00:11:29.019 --> 00:11:31.179
on quirks mode. So it's not reading the map.

00:11:31.320 --> 00:11:33.340
It's just checking to see if you have a map in

00:11:33.340 --> 00:11:35.200
your hand. That's a great way to put it. It's

00:11:35.200 --> 00:11:38.559
pure pattern recognition. The browser developers

00:11:38.559 --> 00:11:41.919
essentially said, parsing DTDs is slow, it's

00:11:41.919 --> 00:11:44.379
complex, and it's completely unnecessary because

00:11:44.379 --> 00:11:47.539
we already know what HTML is. They just baked

00:11:47.539 --> 00:11:50.059
the rules for HTML directly into the browser's

00:11:50.059 --> 00:11:53.440
C++ or Rust code, rather than having to read

00:11:53.440 --> 00:11:56.279
them from some external text file every time.

00:11:56.440 --> 00:11:58.799
That is fascinating. It really means that the

00:11:58.799 --> 00:12:01.759
doctype is strictly a trigger. It's not data. It's

00:12:01.759 --> 00:12:04.570
just a button. It's a button disguised as data.
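
NOTE
A simplified sketch of the string matching described above. It assumes the
doctype has already been split into a name, a public identifier and a system
identifier. The function name sniff_mode and the three short prefix tuples are
illustrative assumptions: real browsers carry a much longer list, so treat this
as an outline of the idea and not as any browser's actual table.
```python
from typing import Optional
# Abbreviated, illustrative prefix lists (lowercase for case-insensitive matching).
QUIRKS_PREFIXES = ("-//ietf//dtd html 2.0//", "-//w3c//dtd html 3.2 final//")
HTML401_LOOSE_PREFIXES = ("-//w3c//dtd html 4.01 transitional//", "-//w3c//dtd html 4.01 frameset//")
ALMOST_STANDARDS_PREFIXES = ("-//w3c//dtd xhtml 1.0 transitional//", "-//w3c//dtd xhtml 1.0 frameset//")
def sniff_mode(name: Optional[str], public_id: Optional[str] = None, system_id: Optional[str] = None) -> str:
    """Pick a rendering mode from the doctype's parts; nothing is ever downloaded."""
    if name is None or name.lower() != "html":
        return "quirks"  # missing or unrecognized doctype
    pub = (public_id or "").lower()
    if pub.startswith(QUIRKS_PREFIXES):
        return "quirks"
    if pub.startswith(HTML401_LOOSE_PREFIXES):
        # Without a system identifier these are treated as quirks, with one as almost standards.
        return "quirks" if system_id is None else "almost-standards"
    if pub.startswith(ALMOST_STANDARDS_PREFIXES):
        return "almost-standards"
    return "standards"
print(sniff_mode(None))    # no doctype at all
print(sniff_mode("html"))  # the short modern doctype
print(sniff_mode("html", "-//W3C//DTD HTML 3.2 Final//EN"))
```
The system identifier is only checked for presence and compared as a string.
The URL inside it is never fetched, which is the "button disguised as data" point.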

00:12:04.809 --> 00:12:06.450
So if we're just poking a button, let's look

00:12:06.450 --> 00:12:08.389
at what that button actually looked like. Because

00:12:08.389 --> 00:12:11.590
before HTML5 came along, we weren't just typing

00:12:11.590 --> 00:12:15.230
HTML. We were typing these massive, complex incantations.

00:12:15.409 --> 00:12:17.250
Oh, the syntax was a beast. It was something

00:12:17.250 --> 00:12:19.830
else. Let's move to section three then. Anatomy

00:12:19.830 --> 00:12:22.190
of the syntax. I really want to deep dive into

00:12:22.190 --> 00:12:23.710
those characters because I feel like I typed

00:12:23.710 --> 00:12:25.710
them for 10 years without ever actually reading

00:12:25.710 --> 00:12:27.690
or understanding them. All right, let's decode

00:12:27.690 --> 00:12:29.769
the cipher. Okay, looking at the source here.

00:12:29.850 --> 00:12:32.970
The general syntax for a doctype. In the

00:12:32.970 --> 00:12:36.909
old SGML style, it opens with !DOCTYPE. I want

00:12:36.909 --> 00:12:38.690
to note the exclamation point. It feels like

00:12:38.690 --> 00:12:41.610
it's shouting. It's a declaration. In markup

00:12:41.610 --> 00:12:44.289
languages, that exclamation point usually signals,

00:12:44.450 --> 00:12:46.450
hey, this is an instruction for the parser.

00:12:46.470 --> 00:12:48.649
This is not an element to be displayed on the

00:12:48.649 --> 00:12:51.110
page. Okay, that makes sense. Then we have the

00:12:51.110 --> 00:12:53.070
root element, which we've established is usually

00:12:53.070 --> 00:12:55.429
html. Right. But then we get these keywords,

00:12:56.169 --> 00:12:59.570
PUBLIC or SYSTEM, and then these really long

00:12:59.570 --> 00:13:02.450
strings in quotes. Let's break down system first

00:13:02.450 --> 00:13:04.470
because that one seems a little simpler. System

00:13:04.470 --> 00:13:06.830
is the straightforward one. If you see the keyword

00:13:06.830 --> 00:13:09.480
SYSTEM, it is followed by what's called a system

00:13:09.480 --> 00:13:12.740
identifier. In the context of XML and the web,

00:13:12.919 --> 00:13:15.500
this is almost always a URI, a uniform resource

00:13:15.500 --> 00:13:18.360
identifier, basically a URL. So system just means,

00:13:18.419 --> 00:13:20.779
hey, browser, the rules, the DTD, it's located

00:13:20.779 --> 00:13:22.919
at this specific address. Go get them. Exactly.

00:13:23.039 --> 00:13:24.899
It points directly to a file on a disk or on

00:13:24.899 --> 00:13:27.559
a server. Very literal. But we rarely used system

00:13:27.559 --> 00:13:30.139
on its own, did we? We almost always use public.

00:13:30.220 --> 00:13:32.559
What does public mean in this context? Public

00:13:32.559 --> 00:13:34.679
is a bit more abstract. It indicates that the

00:13:34.679 --> 00:13:38.259
DTD is a public text. A public text. Yeah. It

00:13:38.259 --> 00:13:40.279
means this isn't just some random file on my

00:13:40.279 --> 00:13:42.299
personal server. This is a shared, well-known

00:13:42.299 --> 00:13:45.059
standard that should be known to the system already.

00:13:45.340 --> 00:13:47.340
So it's like a celebrity file. In a way, yeah.

00:13:47.539 --> 00:13:50.200
It points to a public identifier. This is just

00:13:50.200 --> 00:13:52.259
a unique name for that standard. It's kind of

00:13:52.259 --> 00:13:54.519
like saying the U.S. Constitution. You don't

00:13:54.519 --> 00:13:56.259
need to tell me the physical address of the piece

00:13:56.259 --> 00:13:58.519
of paper in the National Archives. The name itself

00:13:58.519 --> 00:14:01.440
identifies the document. And this leads to that

00:14:01.440 --> 00:14:04.580
really specific weird string we always saw. The

00:14:04.580 --> 00:14:06.620
FPI, the formal public identifier. I want to

00:14:06.620 --> 00:14:11.980
parse this string. -//W3C//DTD XHTML 1.1//EN.

00:14:12.379 --> 00:14:14.620
I used to stare at those slashes and just wonder

00:14:14.620 --> 00:14:16.600
what they meant. It looks like total nonsense,

00:14:16.740 --> 00:14:18.899
but it's actually a very strict, very formal

00:14:18.899 --> 00:14:21.120
cataloging syntax. Okay, let's break it down.

00:14:21.179 --> 00:14:22.679
What is the minus sign at the very beginning?

00:14:23.399 --> 00:14:25.820
So that minus sign indicates the registration

00:14:25.820 --> 00:14:29.340
status of the identifier. A plus sign would

00:14:29.340 --> 00:14:31.419
mean it's an officially registered ISO standard.

00:14:31.720 --> 00:14:35.039
A minus sign means it is unregistered. Wait,

00:14:35.100 --> 00:14:37.279
hold on a second. The standard for the World

00:14:37.279 --> 00:14:40.759
Wide Web from the W3C is marked as unregistered.

00:14:40.759 --> 00:14:44.059
In the very strict formal sense of ISO SGML registration,

00:14:44.399 --> 00:14:47.379
yes. It basically just means we are an important

00:14:47.379 --> 00:14:49.519
organization, but we are not the international

00:14:49.519 --> 00:14:53.299
organization for standardization. So it starts

00:14:53.299 --> 00:14:55.720
with a minus. That is a fantastic piece of trivia.

00:14:55.860 --> 00:14:58.539
The entire web is built on unregistered documents.

00:14:58.799 --> 00:15:01.299
Pretty much. Then you have the double slash.

00:15:01.399 --> 00:15:03.779
That's just the delimiter, a separator. Then

00:15:03.779 --> 00:15:06.639
W3C, that part is the owner identifier. It tells

00:15:06.639 --> 00:15:08.840
you who maintains this public text. Okay, that

00:15:08.840 --> 00:15:11.299
makes sense. W3C, the World Wide Web Consortium.

00:15:11.360 --> 00:15:15.620
Then another delimiter and DTD XHTML 1.1. That

00:15:15.620 --> 00:15:17.360
is the public text class and the description.

00:15:17.559 --> 00:15:19.759
It tells you what kind of document it is, a DTD,

00:15:19.820 --> 00:15:22.860
and which version it is, XHTML 1.1. And finally,

00:15:22.940 --> 00:15:25.259
the EN at the end. That's the language, English.

00:15:25.620 --> 00:15:27.759
So it's like a Dewey decimal system for code.

00:15:27.799 --> 00:15:30.840
It's saying, unregistered, owned by W3C. It's

00:15:30.840 --> 00:15:34.700
a DTD for XHTML 1.1, and it's in English. Exactly.
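
NOTE
A minimal sketch of the field-by-field breakdown just walked through, assuming
a well-formed identifier with exactly four parts separated by double slashes.
The FPI tuple and the parse_fpi function are hypothetical names for illustration.
```python
from typing import NamedTuple
class FPI(NamedTuple):
    registered: bool   # "+" means registered, "-" means unregistered
    owner: str         # who maintains the public text, e.g. W3C
    text_class: str    # what kind of text it is, e.g. DTD
    description: str   # free-form description, e.g. XHTML 1.1
    language: str      # language code, e.g. EN
def parse_fpi(fpi: str) -> FPI:
    """Split an identifier such as '-//W3C//DTD XHTML 1.1//EN' into its fields."""
    status, owner, text, language = fpi.split("//")
    # The text field is the class keyword followed by the description.
    text_class, _, description = text.partition(" ")
    return FPI(status == "+", owner, text_class, description, language)
print(parse_fpi("-//W3C//DTD XHTML 1.1//EN"))
```
An identifier with more or fewer than four parts raises a ValueError here, which
is acceptable for a sketch but would need handling in real input.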

00:15:34.840 --> 00:15:37.580
And the original idea was that a parser would

00:15:37.580 --> 00:15:40.360
have a catalog, like a lookup table, built right

00:15:40.360 --> 00:15:43.000
into it. It would see the string, look it up

00:15:43.000 --> 00:15:45.100
in its internal table, and the table would say,

00:15:45.179 --> 00:15:47.399
oh, I have a local copy of that DTD right here

00:15:47.399 --> 00:15:48.860
on the hard drive. I don't need to go out to

00:15:48.860 --> 00:15:51.679
the internet to find it. But also, looking at

00:15:51.679 --> 00:15:54.200
the old syntax, right after that long public

00:15:54.200 --> 00:15:57.620
identifier string, there was also a URL in quotes.

00:15:58.080 --> 00:16:00.649
Why both? That's the fallback. That's the backup

00:16:00.649 --> 00:16:03.350
plan. If you use public, you provide the official

00:16:03.350 --> 00:16:05.870
name, but just in case the parser doesn't have

00:16:05.870 --> 00:16:08.210
that name in its catalog, you also provide the

00:16:08.210 --> 00:16:11.419
system identifier, the URL, as a backup. So it's

00:16:11.419 --> 00:16:12.940
like saying, here's the name of the book I want.

00:16:13.019 --> 00:16:15.240
If you don't have it in stock, here is the address

00:16:15.240 --> 00:16:17.419
of the library where you can go find it. That's

00:16:17.419 --> 00:16:20.080
a perfect analogy. And in XML, this was very

00:16:20.080 --> 00:16:22.419
strict. If you use public, you had to provide

00:16:22.419 --> 00:16:24.779
the system identifier URI as well. You couldn't

00:16:24.779 --> 00:16:26.679
just leave it hanging. Now, there's one more

00:16:26.679 --> 00:16:28.720
part of this syntax that the source material

00:16:28.720 --> 00:16:31.559
calls a hidden feature, the internal subset.

00:16:32.320 --> 00:16:35.620
The part with the square brackets. Ah, yes. This

00:16:35.620 --> 00:16:37.879
is the part that the power users really loved.

00:16:38.039 --> 00:16:39.659
Where does this even go? I don't think I've ever

00:16:39.659 --> 00:16:42.440
seen it. It goes at the very end of the

00:16:42.440 --> 00:16:45.159
doctype declaration, just before the final closing

00:16:45.159 --> 00:16:47.379
angle bracket. And what goes inside those brackets?

00:16:47.899 --> 00:16:51.500
Actual DTD rules. You could define entities or

00:16:51.500 --> 00:16:54.200
even change element rules in line right there

00:16:54.200 --> 00:16:57.519
inside your HTML file. So I could, in theory,

00:16:57.720 --> 00:17:00.360
rewrite the rules of HTML from within the document

00:17:00.360 --> 00:17:03.259
itself. To an extent, yes. The internal subset

00:17:03.259 --> 00:17:06.299
is processed first. It has the highest precedence.

00:17:06.460 --> 00:17:09.240
So if you defined an entity inside those brackets,

00:17:09.539 --> 00:17:11.940
it would override whatever was defined in the

00:17:11.940 --> 00:17:14.400
external DTD file. Can you give me an example?

00:17:14.539 --> 00:17:16.680
I mean, why would I ever use this? Well, back

00:17:16.680 --> 00:17:18.720
in the old days, you might use it to define a

00:17:18.720 --> 00:17:21.240
shortcut. For instance, you could define an entity

00:17:21.240 --> 00:17:23.980
called leg to mean written by the legal department.

00:17:24.519 --> 00:17:27.869
Copyright 1999. Then anywhere in your document

00:17:27.869 --> 00:17:30.630
where you referenced leg, the parser would automatically

00:17:30.630 --> 00:17:32.690
expand it to that full phrase. It was like a

00:17:32.690 --> 00:17:35.269
variable, a macro. Exactly. Or you could use

00:17:35.269 --> 00:17:37.309
it to temporarily disable certain validation

00:17:37.309 --> 00:17:39.490
rules if you were testing something or had a

00:17:39.490 --> 00:17:42.130
special case. It was very powerful in the proper

00:17:42.130 --> 00:17:44.910
SGML world. But again, let me bring it back to

00:17:44.910 --> 00:17:47.450
modern browsers. You know the answer. Chrome

00:17:47.450 --> 00:17:50.819
ignores it. Firefox ignores it. The browser's

00:17:50.819 --> 00:17:55.359
HTML parser is not a real SGML parser. It doesn't

00:17:55.359 --> 00:17:57.960
process entities defined in the internal subset.

00:17:58.299 --> 00:18:00.660
It just sees the brackets and skips right over

00:18:00.660 --> 00:18:03.440
them. It's a vestigial organ. It's like the appendix

00:18:03.440 --> 00:18:05.900
of the syntax. That's exactly what it is. It's

00:18:05.900 --> 00:18:07.799
there because the grammar says it's allowed to

00:18:07.799 --> 00:18:10.240
be there. But functionally, for the purpose of

00:18:10.240 --> 00:18:12.920
rendering a web page, it does absolutely nothing.
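
NOTE
A minimal sketch of the contrast described above, assuming Python's standard
xml.etree.ElementTree module as a stand-in for a real XML parser. The root
element name "memo" and the entity name "leg" are illustrative assumptions.
Unlike a browser's HTML parser, an XML parser does read the internal subset
and expands the entity wherever it is referenced.
```python
import xml.etree.ElementTree as ET
# Hypothetical document: "memo" and "leg" are made-up names for illustration.
DOC = (
    '<!DOCTYPE memo [\n'
    '  <!ENTITY leg "Written by the legal department. Copyright 1999.">\n'
    ']>\n'
    '<memo>&leg;</memo>'
)
def expand_entities(source: str) -> str:
    """Parse XML and return the root element's text after entity expansion."""
    return ET.fromstring(source).text
if __name__ == "__main__":
    print(expand_entities(DOC))
```
The entity behaves like the macro the speakers describe: it is defined once
inside the square brackets and substituted at parse time.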

00:18:13.200 --> 00:18:14.799
Okay, so we've got the syntax. Now let's look

00:18:14.799 --> 00:18:17.000
at the flavors. Because this wasn't just about

00:18:17.000 --> 00:18:19.160
syntax. It was about... I don't know, philosophy.

00:18:19.660 --> 00:18:23.539
In the HTML 4.01 era, we had to make a choice

00:18:23.539 --> 00:18:25.759
every single time we started a new file. The

00:18:25.759 --> 00:18:30.200
clan wars. The DTD wars. Section 4. The flavors

00:18:30.200 --> 00:18:33.259
of HTML. The notes list three really distinct

00:18:33.259 --> 00:18:37.500
DTDs for HTML 4.01: Strict, Transitional, and

00:18:37.500 --> 00:18:39.759
frameset. Let's talk about strict. This one always

00:18:39.759 --> 00:18:41.519
felt like it was on the moral high ground. It

00:18:41.519 --> 00:18:44.940
absolutely was. HTML 4.01 Strict was the W3C

00:18:44.940 --> 00:18:47.119
basically saying this is how the web should be.

00:18:47.680 --> 00:18:50.119
The key characteristic of the strict DTD is that

00:18:50.119 --> 00:18:53.079
it did not allow any presentational markup. And

00:18:53.079 --> 00:18:56.579
presentational markup means things that control

00:18:56.579 --> 00:18:58.640
how the page looks rather than what the content

00:18:58.640 --> 00:19:01.759
means. Right. No font tags, no center tags, no

00:19:01.759 --> 00:19:04.920
bgcolor attributes on your table cells. The

00:19:04.920 --> 00:19:06.920
argument was all about separation of concerns.

00:19:07.660 --> 00:19:11.640
HTML is for structure and semantics. CSS is for

00:19:11.640 --> 00:19:14.660
presentation. Which is, I mean, that's the standard

00:19:14.660 --> 00:19:16.779
best practice we all follow today. It is now.

00:19:17.000 --> 00:19:20.259
But back then, CSS support in browsers was spotty,

00:19:20.339 --> 00:19:23.220
to be generous. Right. Internet Explorer 4 and

00:19:23.220 --> 00:19:26.500
Netscape 4 were just terrible at CSS. So if you

00:19:26.500 --> 00:19:29.339
were brave and used the strict DTD, you were often

00:19:29.339 --> 00:19:31.779
fighting against the browser to get even basic

00:19:31.779 --> 00:19:34.380
layouts to look right. It was a real pain. So

00:19:34.380 --> 00:19:36.059
they gave us an out. They gave us the transitional

00:19:36.059 --> 00:19:39.319
DTD, also known as loose. The transitional DTD

00:19:39.319 --> 00:19:42.279
was a diplomatic compromise. It was the W3C saying,

00:19:42.519 --> 00:19:45.359
okay, we really want you to use CSS, but we acknowledge

00:19:45.359 --> 00:19:47.460
that you probably still need to use a font tag

00:19:47.460 --> 00:19:49.960
here and there or an align="center" attribute just

00:19:49.960 --> 00:19:52.059
to get this page to work for your client. So

00:19:52.059 --> 00:19:53.799
it was the, we know you're trying, but we accept

00:19:53.799 --> 00:19:56.619
your flaws, DTD. It was the bridge from the old

00:19:56.619 --> 00:19:58.880
way to the new way. It allowed all those deprecated

00:19:58.880 --> 00:20:01.799
presentational elements. And interestingly, almost

00:20:01.799 --> 00:20:04.880
everyone used transitional. It became the de

00:20:04.880 --> 00:20:07.579
facto standard because strict was just too painful.

00:20:09.470 --> 00:20:14.349
And the third one, very quickly, frameset. Ah,

00:20:14.650 --> 00:20:18.289
frames. A dark time. For our younger listeners,

00:20:18.369 --> 00:20:21.569
can we just briefly explain what a frame even

00:20:21.569 --> 00:20:24.650
was? Sure. Before we had good server-side includes

00:20:24.650 --> 00:20:27.650
or modern JavaScript frameworks, if you wanted

00:20:27.650 --> 00:20:29.730
a navigation bar that didn't reload every time

00:20:29.730 --> 00:20:31.990
you clicked a link, you used frames. You would

00:20:31.990 --> 00:20:34.789
essentially split the browser window itself into

00:20:34.789 --> 00:20:37.170
independent subwindows. Like a grid of little

00:20:37.170 --> 00:20:39.730
browser windows. Exactly. One window for the

00:20:39.730 --> 00:20:42.069
menu, one for the header, one for the main content.

00:20:42.289 --> 00:20:45.089
But it was a usability nightmare. Oh, it was

00:20:45.089 --> 00:20:48.130
terrible. It completely broke the URL bar. The

00:20:48.130 --> 00:20:50.089
address in the browser wouldn't change as you

00:20:50.089 --> 00:20:52.089
navigated inside the frames. It broke bookmarks.

00:20:52.140 --> 00:20:54.720
It broke the back button. But if you used it,

00:20:54.799 --> 00:20:57.420
you needed a special DTD, the frameset DTD, which

00:20:57.420 --> 00:20:59.819
basically told the browser, hey, this page doesn't

00:20:59.819 --> 00:21:01.799
have a body tag. It has a frameset tag instead.

00:21:02.119 --> 00:21:05.480
OK, so that was HTML 4.01. But then came the

00:21:05.480 --> 00:21:09.400
era of XHTML, the X standing for XML. This was

00:21:09.400 --> 00:21:12.140
the absolute peak of the strict philosophy. The

00:21:12.140 --> 00:21:15.079
W3C decided that HTML is just too messy, too

00:21:15.079 --> 00:21:17.920
forgiving. They wanted to reformulate HTML as

00:21:17.920 --> 00:21:20.539
a proper XML application. Which meant it had

00:21:20.539 --> 00:21:22.779
to follow all the strict syntax rules of XML.

00:21:22.880 --> 00:21:25.480
All of them. Case sensitivity became a big deal.

00:21:25.660 --> 00:21:28.500
In regular HTML, TAG and tag are the same thing.

00:21:28.599 --> 00:21:31.059
In XML, they are completely different. And XHTML

00:21:31.059 --> 00:21:33.660
requires lowercase tags. Right. And you must

00:21:33.660 --> 00:21:36.539
close every single tag. No leaving a p tag open

00:21:36.539 --> 00:21:38.559
just because you started a new one. You have

00:21:38.559 --> 00:21:40.740
to write p, then slash p, and you have to put quotes around

00:21:40.740 --> 00:21:43.380
all your attribute values. No more width=100.

00:21:43.539 --> 00:21:46.819
It has to be width="100". And the DTDs for XHTML,

00:21:46.920 --> 00:21:49.279
they just mirrored the HTML ones. For XHTML 1.0,

00:21:49.279 --> 00:21:52.200
yes. You had XHTML 1.0 Strict, Transitional,

00:21:52.400 --> 00:21:54.680
and frameset. But because it was XML, the syntax

00:21:54.680 --> 00:21:56.640
requirements were even tighter. For example,

00:21:56.700 --> 00:21:58.259
as we mentioned earlier, you had to include the

00:21:58.259 --> 00:22:00.259
system identifier, the URL, in the doctype.

00:22:00.400 --> 00:22:03.839
No inferring allowed. And then came XHTML 1.1,

00:22:03.920 --> 00:22:06.599
the final form. This was supposed to be the finalized

00:22:06.599 --> 00:22:08.880
revision. It was the purest form. It dropped

00:22:08.880 --> 00:22:11.240
the transitional and frameset options entirely.

00:22:11.619 --> 00:22:13.900
It was strict or nothing. It was designed to

00:22:13.900 --> 00:22:16.880
be modular. It was beautiful from a pure computer

00:22:16.880 --> 00:22:19.339
science perspective. And it was a complete disaster

00:22:19.339 --> 00:22:23.000
in reality. It failed spectacularly because of

00:22:23.000 --> 00:22:25.400
something called draconian error handling, which

00:22:25.400 --> 00:22:28.579
is a core principle of XML. Draconian error handling.

00:22:28.880 --> 00:22:31.400
Yes. In true XML, if you have a single syntax

00:22:31.400 --> 00:22:34.980
error, a missing quote, an unclosed tag, anything,

00:22:35.319 --> 00:22:38.359
the parser is required to stop. It's supposed

00:22:38.359 --> 00:22:40.599
to show an error message. The yellow screen of

00:22:40.599 --> 00:22:42.819
death. The famous yellow screen of death in Firefox.

00:22:42.960 --> 00:22:44.680
It doesn't try to guess what you meant. It just

00:22:44.680 --> 00:22:46.720
stops and refuses to render the page at all.
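
NOTE
A small sketch of draconian versus forgiving error handling, assuming Python's
standard library parsers as stand-ins. Neither is a browser engine: html.parser
is only a lenient tokenizer, and the helper names below are hypothetical. The
same unclosed markup is rejected outright by the XML parser and tokenized
without complaint by the HTML one.
```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser
BROKEN = "<p>one<p>two"  # two p start tags, neither one closed
class TagCollector(HTMLParser):
    """Lenient HTML tokenizer that records every start tag it sees."""
    def __init__(self) -> None:
        super().__init__()
        self.tags: list[str] = []
    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)
def xml_accepts(source: str) -> bool:
    """Return True if the strict XML parser accepts the markup."""
    try:
        ET.fromstring(source)
        return True
    except ET.ParseError:
        return False
def html_start_tags(source: str) -> list[str]:
    """Return the start tags the lenient HTML parser finds in the markup."""
    collector = TagCollector()
    collector.feed(source)
    collector.close()
    return collector.tags
if __name__ == "__main__":
    print("XML parser accepts it:", xml_accepts(BROKEN))
    print("HTML parser start tags:", html_start_tags(BROKEN))
```
The XML side raises a parse error and yields nothing, which is the behavior
behind the error screen the speakers mention.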

00:22:47.099 --> 00:22:49.500
And you just can't have a web where a single

00:22:49.500 --> 00:22:51.539
missing quotation mark takes down the entire

00:22:51.539 --> 00:22:54.460
New York Times homepage. Exactly. The robustness

00:22:54.460 --> 00:22:56.039
principle. Be conservative in what you send.

00:22:56.099 --> 00:22:58.339
Be liberal in what you accept. That's what won

00:22:58.339 --> 00:23:01.160
out. Browsers continued to accept messy HTML,

00:23:01.420 --> 00:23:04.119
and the strict XHTML revolution largely happened

00:23:04.119 --> 00:23:06.799
in name only. Most developers who thought they

00:23:06.799 --> 00:23:09.140
were writing XHTML were actually serving it as

00:23:09.140 --> 00:23:11.259
text/html, so the browsers were just treating

00:23:11.259 --> 00:23:13.400
it as tag soup anyway. Which brings us finally

00:23:13.400 --> 00:23:16.539
to the resolution of our story. The Great Pragmatism

00:23:16.539 --> 00:23:20.579
of HTML5. Section 5. The Simplification. We went

00:23:20.579 --> 00:23:23.200
from these massive intimidating strings, DOCTYPE

00:23:23.200 --> 00:23:27.779
html PUBLIC, "-//W3C//DTD XHTML 1.0 Transitional//EN",

00:23:27.940 --> 00:23:32.079
and http://www.w3.org. I have to assume that

00:23:32.079 --> 00:23:34.740
in HTML5, we just simplified the DTD. We didn't

00:23:34.740 --> 00:23:37.259
just simplify it. We obliterated the entire concept

00:23:37.259 --> 00:23:39.599
of the DTD. So what is the modern syntax? It's

00:23:39.599 --> 00:23:41.339
DOCTYPE html. That's it? That's the whole thing.

00:23:41.440 --> 00:23:43.839
Case insensitive, usually. No URL. No public

00:23:43.839 --> 00:23:46.740
identifier. No FPI. Nothing. Why? Why is it so

00:23:46.740 --> 00:23:50.390
short? Because HTML5 is not defined by SGML;

00:23:50.529 --> 00:23:52.829
it's its own standard with its own parsing rules.

00:23:53.049 --> 00:23:55.329
It doesn't need to reference an external rulebook

00:23:55.329 --> 00:23:57.529
because the rules are defined in the HTML spec

00:23:57.529 --> 00:24:00.690
itself, which the browsers then implement natively

00:24:00.690 --> 00:24:02.289
in their code. So it's an independent language

00:24:02.289 --> 00:24:04.329
now. It's all grown up and left home. Correct.

00:24:04.549 --> 00:24:07.150
But here's the kicker. The spec still says the

00:24:07.150 --> 00:24:09.990
doctype is required. So why is it required

00:24:09.990 --> 00:24:12.910
if it doesn't even point to a DTD anymore? Solely

00:24:12.910 --> 00:24:15.809
for the switch. The only reason DOCTYPE html

00:24:15.809 --> 00:24:18.769
exists is to satisfy that sniffing algorithm

00:24:18.769 --> 00:24:21.210
we talked about back in section two. So it's

00:24:21.210 --> 00:24:23.549
just to trigger standards mode. That is its one

00:24:23.549 --> 00:24:25.730
and only purpose. It is the shortest possible

00:24:25.730 --> 00:24:27.750
string of characters that triggers standards

00:24:27.750 --> 00:24:30.450
mode in Internet Explorer 6 and all of its successors.

00:24:30.509 --> 00:24:33.269
It's a password. It's the minimum viable doctype.

00:24:33.900 --> 00:24:35.619
The browser developers basically got together

00:24:35.619 --> 00:24:38.119
and asked, what is the absolute least amount

00:24:38.119 --> 00:24:40.240
of text we can force developers to type that

00:24:40.240 --> 00:24:42.440
will still convince our old legacy code that

00:24:42.440 --> 00:24:45.640
this is a modern page? And the answer was DOCTYPE

00:24:45.640 --> 00:24:48.619
html. That is hilarious. It's like wearing

00:24:48.619 --> 00:24:50.319
a tie to a Zoom meeting, but you're still wearing

00:24:50.319 --> 00:24:52.440
pajama pants just out of frame. You just need

00:24:52.440 --> 00:24:54.960
to show the tie, the doctype, to get into the

00:24:54.960 --> 00:24:58.019
meeting, which is standards mode. That is a perfect

00:24:58.019 --> 00:24:59.700
description of what's happening. It looks like

00:24:59.700 --> 00:25:03.019
an old SGML declaration. It has the exclamation

00:25:03.019 --> 00:25:05.740
point. It has the DOCTYPE keyword, but it's

00:25:05.740 --> 00:25:08.359
completely hollow. There's no DTD behind it.

00:25:08.380 --> 00:25:10.910
It's a facade for legacy compatibility. Now,

00:25:10.930 --> 00:25:13.250
you mentioned case sensitivity. In HTML5, it's

00:25:13.250 --> 00:25:15.190
insensitive. So I can write doctype in

00:25:15.190 --> 00:25:17.549
all lowercase. It's fine. In a regular HTML document

00:25:17.549 --> 00:25:20.990
served as text/html, yes. But if you are writing

00:25:20.990 --> 00:25:24.730
XHTML5, which is the XML serialization of HTML5,

00:25:24.950 --> 00:25:27.730
then you are back in XML land. And the strict

00:25:27.730 --> 00:25:30.309
rules apply again. They do. XML is case sensitive.

00:25:30.470 --> 00:25:33.009
The root element HTML must be lowercase. Therefore,

00:25:33.309 --> 00:25:36.089
the doctype declaration must match that. You

00:25:36.089 --> 00:25:39.089
have to write DOCTYPE html, with html in lowercase.
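
NOTE
A hedged sketch of the case-sensitivity point, again assuming Python's standard
library parsers as stand-ins for browser behavior; the helper names are
hypothetical. In XML the DOCTYPE keyword itself is case-sensitive, so a
lowercase keyword is rejected, while the lenient HTML parser recognizes either
spelling. The rule that the declared name matches the root element is a
validity constraint that this non-validating XML parser does not check, so it
is not demonstrated here.
```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser
class DoctypeSpy(HTMLParser):
    """Records the declaration text reported by the lenient HTML parser."""
    def __init__(self) -> None:
        super().__init__()
        self.decl = None
    def handle_decl(self, decl):
        self.decl = decl
def html_sees_doctype(source: str) -> bool:
    """Return True if the HTML parser reports a doctype html declaration."""
    spy = DoctypeSpy()
    spy.feed(source)
    spy.close()
    return spy.decl is not None and spy.decl.lower() == "doctype html"
def xml_accepts(source: str) -> bool:
    """Return True if the strict XML parser accepts the document."""
    try:
        ET.fromstring(source)
        return True
    except ET.ParseError:
        return False
if __name__ == "__main__":
    for doc in ("<!DOCTYPE html><html></html>", "<!doctype html><html></html>"):
        print(doc, "HTML:", html_sees_doctype(doc), "XML:", xml_accepts(doc))
```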

00:25:39.289 --> 00:25:41.250
And the notes also mentioned that in XHTML5

00:25:41.250 --> 00:25:43.950
the doctype is actually optional. Theoretically,

00:25:43.950 --> 00:25:47.029
yes, because a true XML parser doesn't have a

00:25:47.029 --> 00:25:49.329
quirks mode in the same way an HTML parser does,

00:25:49.470 --> 00:25:52.349
so it doesn't need the switch. However, the source

00:25:52.349 --> 00:25:56.029
mentions an important edge case: the polyglot document.

00:25:56.210 --> 00:25:59.049
A document that's written to be valid as both

00:25:59.049 --> 00:26:02.490
HTML and XML simultaneously. If you want your

00:26:02.490 --> 00:26:05.029
file to be readable by a strict XML parser and

00:26:05.029 --> 00:26:07.230
render correctly in a regular web browser as

00:26:07.230 --> 00:26:10.210
HTML, you absolutely need the doc type to keep

00:26:10.210 --> 00:26:12.670
the browser in standards mode. So, just keep

00:26:12.670 --> 00:26:14.490
it in. Just keep it in. It's 15 characters. It

00:26:14.490 --> 00:26:16.869
saves you a potential world of pain. It really

00:26:16.869 --> 00:26:19.150
is a fascinating evolution when you lay it all

00:26:19.150 --> 00:26:21.930
out. Yeah. We started with this incredibly high-minded,

00:26:21.930 --> 00:26:25.519
academic structure: public catalogs,

00:26:25.519 --> 00:26:28.240
system identifiers, internal subsets. We literally

00:26:28.240 --> 00:26:30.819
built a bureaucracy of code. And then the messy

00:26:30.819 --> 00:26:32.799
market realities of the browser wars just came

00:26:32.799 --> 00:26:35.819
in and smashed that bureaucracy to pieces. And

00:26:35.819 --> 00:26:38.220
what we're left with today is a fossil. A tiny

00:26:38.220 --> 00:26:40.819
15 -character scar at the top of every single

00:26:40.819 --> 00:26:43.140
file that just reminds us of this massive battle

00:26:43.140 --> 00:26:46.319
between purism and pragmatism. And it's the ultimate

00:26:46.319 --> 00:26:49.539
victory of pragmatism. The standard now officially

00:26:49.539 --> 00:26:52.400
admits that the doctype is mostly useless.

00:26:52.799 --> 00:26:55.240
I mean, that's a refreshing level of honesty

00:26:55.240 --> 00:26:58.519
for a technical specification. So to summarize

00:26:58.519 --> 00:27:01.339
our investigation today, we learned that the

00:27:01.339 --> 00:27:04.940
doctype is technically an instruction to associate

00:27:04.940 --> 00:27:09.339
a document with a DTD. But in reality, for modern

00:27:09.339 --> 00:27:12.059
web pages, it is just a behavioral trigger. And

00:27:12.059 --> 00:27:14.279
we learned that sniffing is the browser's way

00:27:14.279 --> 00:27:16.500
of deciding whether it should emulate the bugs

00:27:16.500 --> 00:27:19.359
of the 1990s, which is quirks mode, or follow

00:27:19.359 --> 00:27:21.299
the rules of today, which is standards mode.

00:27:21.460 --> 00:27:23.759
We learned that the URL in all those old doctypes

00:27:23.759 --> 00:27:26.619
was essentially a lie, or at least a suggestion

00:27:26.619 --> 00:27:29.359
that was universally ignored by the actual browsers

00:27:29.359 --> 00:27:31.940
who rely on internal pattern matching instead.

00:27:32.180 --> 00:27:34.259
And we decoded the entire bureaucracy of the

00:27:34.259 --> 00:27:36.960
FPI, the owner, the class, the language, only

00:27:36.960 --> 00:27:38.880
to realize that the modern web has thrown all

00:27:38.880 --> 00:27:41.359
of it away for the simple, elegant DOCTYPE

00:27:41.359 --> 00:27:44.059
HTML. It's just a great lesson in how standards

00:27:44.059 --> 00:27:45.920
actually evolve in the real world. They don't

00:27:45.920 --> 00:27:47.900
move in these perfect straight lines, they meander,

00:27:47.960 --> 00:27:49.940
they break, and sometimes they just leave these

00:27:49.940 --> 00:27:52.259
weird little artifacts behind. It reminds us

00:27:52.259 --> 00:27:55.119
that code is written by humans for systems that

00:27:55.119 --> 00:27:57.559
have a history. We can't just delete the past.

00:27:57.700 --> 00:27:59.839
We have to build these compatibility layers on

00:27:59.839 --> 00:28:02.500
top of it forever. Which leads me to a final

00:28:02.500 --> 00:28:03.880
thought I want to leave with you, the listener.

00:28:04.119 --> 00:28:06.599
Let's hear it. We've established that the modern

00:28:06.599 --> 00:28:09.799
web is built on a polite fiction. We include

00:28:09.799 --> 00:28:11.579
a line of code that looks like a definition,

00:28:11.839 --> 00:28:14.680
but it's actually just a trigger. The browser

00:28:14.680 --> 00:28:16.359
ignores the data and just looks at the shape

00:28:16.359 --> 00:28:19.240
of the tag. Yes, that's right. So if the doctype,

00:28:19.240 --> 00:28:21.470
the very first line of the web, is

00:28:21.470 --> 00:28:24.529
essentially a ritual we perform purely to appease

00:28:24.529 --> 00:28:26.710
legacy code from the days of Internet Explorer

00:28:26.710 --> 00:28:30.069
6, what else are we doing? That is the question.

00:28:30.309 --> 00:28:32.630
How many other best practices that we follow

00:28:32.630 --> 00:28:35.250
in our modern stacks, in JavaScript, in our CSS,

00:28:35.549 --> 00:28:37.930
in our server configurations, are actually just

00:28:37.930 --> 00:28:40.390
rituals, things we do because that's how it's

00:28:40.390 --> 00:28:42.730
always been done or because it used to fix a

00:28:42.730 --> 00:28:45.089
bug in a browser from 2015 that doesn't even

00:28:45.089 --> 00:28:48.200
exist anymore? We're basically coding by superstition.

00:28:48.200 --> 00:28:50.400
We're afraid to remove the talisman because we

00:28:50.400 --> 00:28:52.599
don't remember exactly what demon it was originally

00:28:52.599 --> 00:28:55.619
designed to keep away. The Doctype is the ultimate

00:28:55.619 --> 00:28:59.099
talisman. And as long as browsers have that quirks

00:28:59.099 --> 00:29:01.539
mode lurking inside them, we have to keep wearing

00:29:01.539 --> 00:29:03.500
it. And honestly, considering the alternative

00:29:03.500 --> 00:29:06.500
is broken layouts everywhere and the return of

00:29:06.500 --> 00:29:09.240
the box model hack, I am more than happy to type

00:29:09.240 --> 00:29:12.589
those 15 characters. Me too. Long live the

00:29:12.589 --> 00:29:14.970
doctype. Thank you for joining us on this deep

00:29:14.970 --> 00:29:17.490
dive into the invisible history of the web. Always

00:29:17.490 --> 00:29:19.549
a pleasure to debug the past with you. Check

00:29:19.549 --> 00:29:21.730
your source code, everyone. We'll see you on

00:29:21.730 --> 00:29:22.490
the next deep dive.
