WEBVTT

00:00:00.000 --> 00:00:02.080
Welcome back to the deep dive. We are really

00:00:02.080 --> 00:00:04.620
going for it today. I mean, usually we take a

00:00:04.620 --> 00:00:06.719
topic, we sort of peel back the layers and we

00:00:06.719 --> 00:00:09.000
try to find that, you know, that core idea at

00:00:09.000 --> 00:00:12.400
the center. Right. But today, today we are tackling

00:00:12.400 --> 00:00:15.980
something that feels like it's just... it's everywhere

00:00:15.980 --> 00:00:18.320
all at once. It's a bit of a fog, isn't it? It's

00:00:18.320 --> 00:00:20.260
a total fog. I was walking through the airport

00:00:20.260 --> 00:00:22.859
the other day and I saw, I'm not kidding, three

00:00:22.859 --> 00:00:28.420
different ads for, get this, AI-powered toothbrushes.

00:00:28.420 --> 00:00:30.699
Oh no, the toothbrush that knows your molars

00:00:30.699 --> 00:00:32.960
better than you do. Exactly. And it just hit

00:00:32.960 --> 00:00:35.439
me. We are reaching that point of saturation

00:00:35.439 --> 00:00:38.060
where the term just loses all its meaning. It becomes

00:00:38.060 --> 00:00:40.979
wallpaper. Yeah. AI, machine learning, neural

00:00:40.979 --> 00:00:43.380
networks. They're just these buzzwords we plaster

00:00:43.380 --> 00:00:45.960
on everything to make it sound futuristic. It's

00:00:45.960 --> 00:00:48.780
like the quantum of the 2020s. It absolutely

00:00:48.780 --> 00:00:51.200
is. And it creates this barrier to understanding.

00:00:51.420 --> 00:00:54.939
We hear the words, we nod along, but do we actually

00:00:54.939 --> 00:00:56.880
know what the machinery is doing under the hood?

00:00:57.390 --> 00:01:00.310
Most of the time, no, not at all. And for most

00:01:00.310 --> 00:01:02.670
people, and look, I'll include myself in this

00:01:02.670 --> 00:01:04.370
before I really start digging into these notes.

00:01:04.810 --> 00:01:07.989
When you hear neural network, your brain immediately

00:01:07.989 --> 00:01:10.930
goes to Silicon Valley. You think of ChatGPT,

00:01:10.989 --> 00:01:13.109
you think of generative AI, you think of these

00:01:13.109 --> 00:01:15.769
massive data centers humming away somewhere in

00:01:15.769 --> 00:01:19.150
the desert. It feels like a strictly computer

00:01:19.150 --> 00:01:21.829
science topic. That is the common perception,

00:01:22.030 --> 00:01:25.109
yes. And it's not wrong, but it's... It's fundamentally

00:01:25.109 --> 00:01:27.829
incomplete. Incomplete. How? If you only look

00:01:27.829 --> 00:01:30.609
at the silicon, you are missing half the story.

00:01:30.909 --> 00:01:32.670
Actually, you're missing the original story.

00:01:32.790 --> 00:01:35.349
It's like describing a human being as just a

00:01:35.349 --> 00:01:37.549
calcium skeleton. You're missing the, you know,

00:01:37.549 --> 00:01:39.810
the meat. You're missing the origin story. Right.

00:01:39.829 --> 00:01:41.579
Because before there was any code. There was

00:01:41.579 --> 00:01:44.459
biology. Precisely. Neural networks are not just

00:01:44.459 --> 00:01:46.659
a computer science topic. They're fundamentally

00:01:46.659 --> 00:01:50.319
a biological one. This whole endeavor, this entire

00:01:50.319 --> 00:01:54.180
field, it's the story of how we tried to reverse

00:01:54.180 --> 00:01:56.180
engineer the architecture of thought itself.

00:01:56.500 --> 00:01:58.319
Reverse engineering the architecture of thought.

00:01:58.400 --> 00:02:00.680
That's a wild way to frame it. But it's true.

00:02:00.920 --> 00:02:03.620
We looked inside our own heads. We saw how the

00:02:03.620 --> 00:02:05.319
machinery worked, or at least how we thought

00:02:05.319 --> 00:02:07.930
it worked. And then we tried to rebuild it, first

00:02:07.930 --> 00:02:10.449
with theory, then with math, and now with electricity

00:02:10.449 --> 00:02:12.610
and software. It sounds like science fiction,

00:02:12.710 --> 00:02:14.389
but it's actually the history of the last hundred

00:02:14.389 --> 00:02:17.250
years. It is. So, okay, our mission today is

00:02:17.250 --> 00:02:19.629
pretty ambitious. We're going to explore this

00:02:19.629 --> 00:02:22.750
definition of a neural network, and the source

00:02:22.750 --> 00:02:25.729
material gives us a really clean, simple place

00:02:25.729 --> 00:02:29.229
to start. A neural network is just a group of

00:02:29.229 --> 00:02:32.189
interconnected units. It sounds deceptively simple,

00:02:32.330 --> 00:02:34.419
doesn't it? A group of interconnected units,

00:02:34.520 --> 00:02:36.560
that could be anything. It really could. It could

00:02:36.560 --> 00:02:39.319
be a group of friends, a telephone system, a

00:02:39.319 --> 00:02:42.419
plumbing network in a house. But the magic, and

00:02:42.419 --> 00:02:44.620
this is the core of it, isn't in the units themselves.

00:02:44.759 --> 00:02:47.520
It's in the connections. It's how those simple

00:02:47.520 --> 00:02:50.000
individual units, whether they are wet, squishy

00:02:50.000 --> 00:02:52.699
cells in your brain or, you know, lines of code

00:02:52.699 --> 00:02:55.780
in a Python script, how they combine to perform

00:02:55.780 --> 00:02:58.319
these incredibly complex tasks. It's the idea

00:02:58.319 --> 00:03:01.180
that more is different. That's it. A single neuron

00:03:01.180 --> 00:03:05.039
is dumb. A trillion neurons. That's a mind. A

00:03:05.039 --> 00:03:07.659
single line of code is simple. Billions of them

00:03:07.659 --> 00:03:10.219
working together can write a symphony. So this

00:03:10.219 --> 00:03:12.219
is the core mechanism we need to understand today.

00:03:12.479 --> 00:03:14.900
And the source material sets up this perfect

00:03:14.900 --> 00:03:17.780
dichotomy for us. On one hand, you have neuroscience.

00:03:18.180 --> 00:03:20.960
Right. The biological world. In that world, a

00:03:20.960 --> 00:03:23.520
biological neural network is a physical structure.

00:03:23.680 --> 00:03:25.860
It is your brain. It is your nervous system.

00:03:25.960 --> 00:03:28.219
It is the hardware inside your skull. It's wet.

00:03:28.300 --> 00:03:30.759
It's chemical. It's alive. And then on the other

00:03:30.759 --> 00:03:32.479
hand, you have the world of machine learning.

00:03:32.580 --> 00:03:34.699
And in that world, a neural network isn't physical

00:03:34.699 --> 00:03:37.560
at all. It's a mathematical model. Abstraction.

00:03:37.699 --> 00:03:41.060
A pure abstraction. Specifically, it's a model

00:03:41.060 --> 00:03:43.979
used to approximate nonlinear functions. Okay,

00:03:44.039 --> 00:03:46.719
we are definitely going to have to unpack approximating

00:03:46.719 --> 00:03:48.840
nonlinear functions later, because that sounds

00:03:48.840 --> 00:03:51.120
terrifyingly math heavy. We will make it painless,

00:03:51.159 --> 00:03:53.800
I promise. But that's the roadmap. We have these

00:03:53.800 --> 00:03:56.680
two worlds, the biological hardware inside us

00:03:56.680 --> 00:03:59.219
and the artificial software that is changing

00:03:59.219 --> 00:04:02.539
everything today. And so the plan for this deep

00:04:02.539 --> 00:04:05.439
dive is to follow that path. We're going to start

00:04:05.439 --> 00:04:08.159
with the biology, the original blueprint. Then

00:04:08.159 --> 00:04:10.039
we'll move into the surprising history of the

00:04:10.039 --> 00:04:12.659
theory, which, as I said, is way older than I

00:04:12.659 --> 00:04:15.580
thought it was. And we'll end up inside the

00:04:15.580 --> 00:04:18.680
modern artificial mind. Sounds like a plan. Okay,

00:04:18.720 --> 00:04:20.240
let's do it. Let's start with that hardware.

00:04:20.680 --> 00:04:23.740
The biological blueprint. I want everyone listening

00:04:23.740 --> 00:04:26.500
to try and get a mental image of this. So if

00:04:26.500 --> 00:04:28.079
you're not driving, maybe close your eyes for

00:04:28.079 --> 00:04:30.980
a second. When we say biological neural network,

00:04:31.079 --> 00:04:33.120
what are we actually looking at? You are looking

00:04:33.120 --> 00:04:37.879
at a population of nerve cells. These are biological

00:04:37.879 --> 00:04:40.560
neurons. And you should imagine a dense, dense

00:04:40.560 --> 00:04:44.209
forest. But instead of trees with branches that

00:04:44.209 --> 00:04:46.089
just kind of touch, you have these incredibly

00:04:46.089 --> 00:04:49.290
complex spidery cells. Right. And the key, the

00:04:49.290 --> 00:04:51.610
absolute key, is that they aren't just floating

00:04:51.610 --> 00:04:54.750
around in isolation like islands. They are chemically

00:04:54.750 --> 00:04:56.290
connected to one another. They're communicating.

00:04:56.910 --> 00:04:58.709
Constantly. They're connected by what are called

00:04:58.709 --> 00:05:00.449
synapses. The synapse. That's the connection

00:05:00.449 --> 00:05:02.490
point, right? The little gap between them. Correct.

00:05:02.569 --> 00:05:04.870
It's the gap between the neurons where the communication

00:05:04.870 --> 00:05:08.069
actually happens. It's not a physical wire. It's

00:05:08.069 --> 00:05:11.529
a chemical conversation. Now, here is the stat

00:05:11.529 --> 00:05:13.329
from the research that really just stopped me

00:05:13.329 --> 00:05:15.649
in my tracks. Okay. We tend to think of connections

00:05:15.649 --> 00:05:18.209
as being one-to-one. You know, I call you on

00:05:18.209 --> 00:05:20.709
the phone, that's one connection, or I shake

00:05:20.709 --> 00:05:22.709
your hand. Right. A connects to B connects to

00:05:22.709 --> 00:05:26.689
C. A nice, clean, linear chain. Exactly. But

00:05:26.689 --> 00:05:29.550
a biological neuron, a single neuron in your

00:05:29.550 --> 00:05:33.189
brain, can be connected to, wait for it, hundreds

00:05:33.189 --> 00:05:35.410
of thousands of synapses. I'm sorry, say that

00:05:35.410 --> 00:05:38.490
again. Hundreds of thousands for one cell. That's

00:05:38.490 --> 00:05:40.990
not possible. How can one cell maintain that

00:05:40.990 --> 00:05:43.410
many connections? Through an incredibly intricate

00:05:43.410 --> 00:05:45.910
branching structure. It's not a simple chain.

00:05:46.089 --> 00:05:50.649
It is a dense, unbelievably complex web. If you

00:05:50.649 --> 00:05:52.670
have a population of these neurons and each one

00:05:52.670 --> 00:05:55.430
has that many potential connections, the number

00:05:55.430 --> 00:05:59.089
of possible pathways for a single signal to travel

00:05:59.089 --> 00:06:01.879
is... Well, it's astronomical. It's bigger than

00:06:01.879 --> 00:06:03.519
the number of atoms in the universe, I've heard.

00:06:03.579 --> 00:06:06.120
It's in that ballpark. And that explains why

00:06:06.120 --> 00:06:08.819
the brain is so much more powerful than, say,

00:06:08.860 --> 00:06:11.420
a pocket calculator. It's not the speed of the

00:06:11.420 --> 00:06:14.220
individual components. A transistor is way faster

00:06:14.220 --> 00:06:17.240
than a neuron. It's the density of the connections.

00:06:17.519 --> 00:06:20.800
It's the ultimate parallel processor. It's like

00:06:20.800 --> 00:06:23.959
the Internet. But if every single computer was

00:06:23.959 --> 00:06:26.839
directly connected to 100,000 other computers,

00:06:27.240 --> 00:06:31.129
the bandwidth would be... Unimaginable. So, okay,

00:06:31.230 --> 00:06:34.250
we have this insane web of connections. How does

00:06:34.250 --> 00:06:36.850
information actually move through it? We aren't

00:06:36.850 --> 00:06:39.589
plugging in USB cables here. It's not just pure

00:06:39.589 --> 00:06:42.209
electricity like a copper wire, is it? No, it's

00:06:42.209 --> 00:06:44.529
electrochemical. That's a key distinction. The

00:06:44.529 --> 00:06:46.790
neurons send and receive these little signals

00:06:46.790 --> 00:06:49.370
called action potentials. Action potentials.

00:06:49.370 --> 00:06:52.009
Think of it as a little pulse, a spike of electricity

00:06:52.009 --> 00:06:54.689
and chemistry zipping down the line. It's a binary

00:06:54.689 --> 00:06:56.529
event. It either happens or it doesn't. Like

00:06:56.529 --> 00:06:59.040
a heartbeat. A little blip on a monitor. Similar

00:06:59.040 --> 00:07:02.500
concept, but happening millions of times a second

00:07:02.500 --> 00:07:05.480
all over your brain. It zips down the axon of

00:07:05.480 --> 00:07:07.579
the neuron. It hits the synapse. It triggers

00:07:07.579 --> 00:07:09.959
a little spray of chemicals, neurotransmitters.

00:07:10.120 --> 00:07:13.540
And those chemicals drift across the gap and

00:07:13.540 --> 00:07:15.459
trigger the next neuron. So it's an electrical

00:07:15.459 --> 00:07:17.459
signal that turns into a chemical signal and

00:07:17.459 --> 00:07:19.980
then back into an electrical one. Exactly. But

00:07:19.980 --> 00:07:22.019
here is where it gets really, really interesting.

00:07:22.279 --> 00:07:24.839
And this is crucial for understanding how thought

00:07:24.839 --> 00:07:27.899
or processing actually happens. The neuron doesn't

00:07:27.899 --> 00:07:30.379
just passively pass the message along. It's not

00:07:30.379 --> 00:07:33.199
a bucket brigade. It makes the decision. A single

00:07:33.199 --> 00:07:35.819
cell makes a decision. In a way, yes. It acts

00:07:35.819 --> 00:07:38.259
as a filter. It has a vote. The source material

00:07:38.259 --> 00:07:40.360
highlights that a neuron can have one of two

00:07:40.360 --> 00:07:43.600
specific roles. It can be excitatory or it can

00:07:43.600 --> 00:07:46.100
be inhibitory. Okay, excitatory sounds pretty

00:07:46.100 --> 00:07:48.600
straightforward. It's exciting the signal. It's

00:07:48.600 --> 00:07:50.740
hyping it up. You got it. An excitatory role

00:07:50.740 --> 00:07:52.959
means it amplifies and propagates the signal.

00:07:53.019 --> 00:07:55.100
It says, yes, this is important. Pass it on.

00:07:55.240 --> 00:07:58.040
It essentially turns up the volume on that specific

00:07:58.040 --> 00:08:00.759
message. It pushes the next neuron closer to

00:08:00.759 --> 00:08:03.439
firing its own action potential. Okay, so what

00:08:03.439 --> 00:08:06.040
about the inhibitory role? That sounds like...

00:08:06.300 --> 00:08:08.439
Like a buzzkill. It's the necessary buzzkill.

00:08:09.000 --> 00:08:11.680
And it's arguably more important. An inhibitory

00:08:11.680 --> 00:08:13.920
neuron does the exact opposite. It suppresses

00:08:13.920 --> 00:08:16.560
the signal. It says, stop. This message does

00:08:16.560 --> 00:08:18.899
not go any further. It tells the next neuron

00:08:18.899 --> 00:08:22.040
not to fire. Yes. It makes it harder for the

00:08:22.040 --> 00:08:24.220
next neuron to fire. So for every signal that's

00:08:24.220 --> 00:08:27.019
saying go, go, go, you have others that are saying,

00:08:27.040 --> 00:08:31.060
no, wait, stop. That binary choice, amplify or

00:08:31.060 --> 00:08:33.840
suppress, that feels like the absolute root of

00:08:33.840 --> 00:08:35.980
everything. It is. Think about it for a second.

00:08:36.039 --> 00:08:38.820
If every neuron in your brain just shouted yes

00:08:38.820 --> 00:08:41.960
to every signal it received, what would happen?

00:08:42.139 --> 00:08:44.919
Just chaos. White noise. You'd be having a seizure.

00:08:45.080 --> 00:08:46.980
That is literally what a seizure is. Uncontrolled,

00:08:47.159 --> 00:08:49.860
cascading, electrical firing. The ability to

00:08:49.860 --> 00:08:53.220
inhibit, to say no, is what allows for precision.

00:08:53.460 --> 00:08:55.779
It allows for focus, for complex processing.

00:08:56.200 --> 00:08:58.990
So inhibition is like a sculptor. It's chiseling

00:08:58.990 --> 00:09:01.549
away all the noise so that a coherent thought

00:09:01.549 --> 00:09:03.809
can actually emerge. That is a beautiful way

00:09:03.809 --> 00:09:06.649
to put it. That delicate, dynamic balance between

00:09:06.649 --> 00:09:10.250
excitation and inhibition is critical to every

00:09:10.250 --> 00:09:12.549
single thing you do. It's what allows you to

00:09:12.549 --> 00:09:15.289
ignore the feeling of your shirt on your skin

00:09:15.289 --> 00:09:17.470
right now so you can focus on listening to this

00:09:17.470 --> 00:09:19.830
conversation. Right, or to pick out a single

00:09:19.830 --> 00:09:22.990
voice in a crowded, noisy room. Without inhibition,

00:09:22.990 --> 00:09:26.870
we are just static, just noise. So we have these

00:09:26.870 --> 00:09:30.129
individual units, these neurons, making these simple

00:09:30.129 --> 00:09:33.090
binary choices. But we aren't just talking about

00:09:33.090 --> 00:09:35.750
one or two neurons here. The source material talks

00:09:35.750 --> 00:09:38.889
about a hierarchy of scale. Right. It's not a

00:09:38.889 --> 00:09:41.509
flat system. We start at the micro level. Small

00:09:41.509 --> 00:09:43.929
populations of interconnected neurons are called

00:09:43.929 --> 00:09:46.570
neural circuits. Think of these as little specialized

00:09:46.570 --> 00:09:49.549
teams, maybe 100 neurons or so, that are designed

00:09:49.549 --> 00:09:52.389
to handle one very specific task. For example,

00:09:52.590 --> 00:09:54.769
there are circuits in your visual cortex that

00:09:54.769 --> 00:09:57.090
do nothing but detect vertical lines. That's

00:09:57.090 --> 00:09:59.070
their whole job. Dude, just vertical lines. That's

00:09:59.070 --> 00:10:01.129
it. And another circuit right next to it is just

00:10:01.129 --> 00:10:03.230
for horizontal lines. Yeah. Another for diagonal

00:10:03.230 --> 00:10:05.649
lines. And then you scale up from there. You

00:10:05.649 --> 00:10:07.570
scale up, you combine those circuits, and you

00:10:07.570 --> 00:10:11.309
get large-scale brain networks. These are massive,

00:10:11.309 --> 00:10:13.610
interconnected systems that span different regions

00:10:13.610 --> 00:10:16.730
of the brain. The visual network, the auditory

00:10:16.730 --> 00:10:19.049
network, the language network. And when you combine

00:10:19.049 --> 00:10:22.309
all of those, you get the full system. The brain

00:10:22.309 --> 00:10:24.990
and the entire nervous system. It's layers upon

00:10:24.990 --> 00:10:27.850
layers upon layers of complexity, starting from

00:10:27.850 --> 00:10:30.679
that simple yes-no vote. And what I found really

00:10:30.679 --> 00:10:32.779
fascinating in the reading, and you touched on

00:10:32.779 --> 00:10:36.000
this, was that the output of this whole biological

00:10:36.000 --> 00:10:39.320
machine isn't just thinking. I think we get trapped

00:10:39.320 --> 00:10:41.360
in our heads a bit. We think neural networks

00:10:41.360 --> 00:10:44.580
are for doing philosophy or calculus or writing

00:10:44.580 --> 00:10:46.639
poetry. Right. We think of it as this purely

00:10:46.639 --> 00:10:48.860
abstract thing. We think of the brain as a computer

00:10:48.860 --> 00:10:51.620
that just outputs data to a screen in our minds.

00:10:51.820 --> 00:10:55.539
Yeah. But biologically, the output is physical.

00:10:55.700 --> 00:10:58.070
It has to be. The source explicitly mentions

00:10:58.070 --> 00:11:00.309
that these electrochemical signals travel down

00:11:00.309 --> 00:11:02.309
through the nervous system. They go across what

00:11:02.309 --> 00:11:04.370
are called neuromuscular junctions and they hit

00:11:04.370 --> 00:11:07.250
muscle cells. And that causes contraction. It

00:11:07.250 --> 00:11:10.789
causes motion. The biological neural network

00:11:10.789 --> 00:11:14.110
is fundamentally a machine for turning electrochemical

00:11:14.110 --> 00:11:16.950
signals into physical movement. Everything you

00:11:16.950 --> 00:11:19.769
do, typing an email, running for a bus, even

00:11:19.769 --> 00:11:21.789
the tiny movements of your vocal cords while

00:11:21.789 --> 00:11:24.000
you're speaking right now. That is the output

00:11:24.000 --> 00:11:26.480
layer of your biological neural network in action.

00:11:26.679 --> 00:11:28.860
It's a machine that turns thought into action.

00:11:29.039 --> 00:11:32.200
Literally. From a purely biological evolutionary

00:11:32.200 --> 00:11:35.240
standpoint, the entire purpose of the brain is

00:11:35.240 --> 00:11:37.700
to move the body effectively through the world.

00:11:37.779 --> 00:11:40.080
If you don't move, you don't find food, you don't

00:11:40.080 --> 00:11:42.659
escape predators, you don't survive. So all that

00:11:42.659 --> 00:11:45.419
complex processing, all those hundred thousand

00:11:45.419 --> 00:11:48.460
connections, it all funnels down to one final

00:11:48.460 --> 00:11:50.799
command. Contract this muscle, relax that muscle.

00:11:51.259 --> 00:11:53.019
That just grounds it so much. We're not just

00:11:53.019 --> 00:11:55.720
thinking machines. We're movement machines. Primarily.

00:11:55.820 --> 00:11:57.720
So that's the biological hardware. This incredibly

00:11:57.720 --> 00:12:01.419
complex, dense web of neurons using excitatory

00:12:01.419 --> 00:12:04.419
and inhibitory signals to drive motion and thought.

00:12:05.269 --> 00:12:07.490
Now, I want to pivot to the history of the idea,

00:12:07.629 --> 00:12:09.330
because this was the part of the deep dive that

00:12:09.330 --> 00:12:11.450
genuinely shocked me. The timeline. The timeline.

00:12:11.490 --> 00:12:13.309
I mean, when I think of neural network theory,

00:12:13.509 --> 00:12:16.570
my brain jumps to the 1980s, maybe the 1950s,

00:12:16.570 --> 00:12:18.370
if we're being generous. The computer age. Exactly.

00:12:18.610 --> 00:12:20.809
But the source material takes us back to the

00:12:20.809 --> 00:12:23.149
19th century. It is surprising, isn't it? We

00:12:23.149 --> 00:12:25.950
think of this as a modern tech invention, but

00:12:25.950 --> 00:12:28.649
the theoretical base was laid down by Alexander

00:12:28.649 --> 00:12:32.149
Bain in 1873 and then William James in 1890.

00:12:32.629 --> 00:12:35.120
1873. Let's just... Let's contextualize that

00:12:35.120 --> 00:12:37.960
for a minute. In 1873, people were still getting

00:12:37.960 --> 00:12:40.639
around on horses. We didn't have computers. We

00:12:40.639 --> 00:12:42.879
barely had reliable light bulbs in homes. The

00:12:42.879 --> 00:12:45.759
telephone was barely a concept. What on earth

00:12:45.759 --> 00:12:48.840
were they doing proposing neural network theories?

00:12:49.200 --> 00:12:51.240
Well, they were psychologists and philosophers,

00:12:51.519 --> 00:12:53.899
and they were wrestling with the biggest question

00:12:53.899 --> 00:12:56.360
of all. What is the mind? What is consciousness?

00:12:56.700 --> 00:12:59.179
At the time, the prevailing view was still very

00:12:59.179 --> 00:13:03.620
spiritual or dualistic. The mind was a soul.

00:13:04.159 --> 00:13:06.000
A separate thing from the body, the ghost in

00:13:06.000 --> 00:13:08.580
the machine. Precisely. And Bain and James challenged

00:13:08.580 --> 00:13:10.759
that head on. They had this core insight that

00:13:10.759 --> 00:13:12.740
was absolutely revolutionary for the time. Which

00:13:12.740 --> 00:13:15.840
was? They posited that human thought wasn't magic.

00:13:15.940 --> 00:13:18.440
It wasn't some ethereal spirit floating around

00:13:18.440 --> 00:13:20.919
in your skull. They argued that human thought

00:13:20.919 --> 00:13:23.360
emerges from the physical interactions among

00:13:23.360 --> 00:13:26.860
large numbers of neurons inside the brain. That

00:13:26.860 --> 00:13:30.090
is a huge leap. To look at the gray matter in

00:13:30.090 --> 00:13:32.250
a skull and say, the interaction of these physical

00:13:32.250 --> 00:13:35.690
cells is what creates my subjective experience

00:13:35.690 --> 00:13:38.750
of being alive. It's the birth of a materialist

00:13:38.750 --> 00:13:40.850
understanding of the mind. They were basically

00:13:40.850 --> 00:13:43.129
saying, if you connect enough of these biological

00:13:43.129 --> 00:13:45.789
switches in the right way, you get a mind. They

00:13:45.789 --> 00:13:47.269
couldn't prove it, of course. They didn't have

00:13:47.269 --> 00:13:50.929
MRIs or electrodes. But the intuition was spot

00:13:50.929 --> 00:13:53.250
on. So they laid the philosophical foundation.

00:13:53.830 --> 00:13:55.690
But it seems like it just sat there for a while,

00:13:55.769 --> 00:13:58.620
right? As a theory. It did. It remained largely

00:13:58.620 --> 00:14:01.039
in the realm of philosophy until we get to the

00:14:01.039 --> 00:14:03.419
mid-20th century. That's when things really,

00:14:03.500 --> 00:14:06.220
really started to accelerate. We enter the era

00:14:06.220 --> 00:14:08.559
of what was called connectionism. Connectionism.

00:14:08.620 --> 00:14:10.700
That sounds like a networking event for LinkedIn.

00:14:10.980 --> 00:14:16.220
[Laughs] Not quite. In the 1930s and 40s, connectionism

00:14:16.220 --> 00:14:18.639
was this new approach. It was about using artificial

00:14:18.639 --> 00:14:21.580
networks to model biological ones. Scientists

00:14:21.580 --> 00:14:24.200
started asking, OK, if the brain really works

00:14:24.200 --> 00:14:26.620
by connecting all these units, can we build a

00:14:26.620 --> 00:14:28.960
simplified model that does the same thing? It's

00:14:28.960 --> 00:14:31.500
the shift from just observing how does it work

00:14:31.500 --> 00:14:34.379
to asking, can we build one? It's the engineering

00:14:34.379 --> 00:14:37.710
mindset taking over. And this leads us directly

00:14:37.710 --> 00:14:41.149
to 1943. Two names that are incredibly important,

00:14:41.470 --> 00:14:44.190
Warren McCulloch and Walter Pitts. The inventors

00:14:44.190 --> 00:14:46.409
of the perceptron. Well, they created the model

00:14:46.409 --> 00:14:48.529
that the perceptron was based on. They created

00:14:48.529 --> 00:14:51.529
the first mathematical model of a neuron. The

00:14:51.529 --> 00:14:53.309
perceptron. It really sounds like a Transformers

00:14:53.309 --> 00:14:56.049
character. Beware the perceptron. It does, doesn't

00:14:56.049 --> 00:14:58.509
it? But the perceptron is essentially the great-

00:14:58.509 --> 00:15:01.049
great-grandfather of ChatGPT. It was the first

00:15:01.049 --> 00:15:03.929
simple artificial neural network. So what did

00:15:03.929 --> 00:15:05.480
they do? What was the breakthrough? They did

00:15:05.480 --> 00:15:07.159
something brilliant. They took that biological

00:15:07.159 --> 00:15:08.919
neuron we just talked about, the one that gets

00:15:08.919 --> 00:15:11.620
signals, adds them up, and decides to fire, and

00:15:11.620 --> 00:15:13.320
they turned it into algebra. They said, okay,

00:15:13.419 --> 00:15:15.980
a neuron takes inputs, it sums them up, and if

00:15:15.980 --> 00:15:19.019
that sum hits a certain threshold, it fires.

00:15:19.399 --> 00:15:22.259
We can write an equation for that. They abstracted

00:15:22.259 --> 00:15:25.070
the biology into math. Exactly. And that was

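NOTE
Editorial sketch, not part of the spoken audio. The threshold idea described in the cues above (a unit sums its inputs and fires if the sum reaches a threshold) might be illustrated roughly as follows. The name threshold_unit, the +1/-1 weights, and the threshold of 2 are assumptions chosen for illustration; they come neither from the episode nor from McCulloch and Pitts's paper, and treating inhibition as a negative weight is a common simplification rather than their exact formulation.
```python
def threshold_unit(inputs, weights, threshold):
    """Return 1 (fire) if the weighted input sum reaches the threshold, else 0."""
    total = sum(x * w for x, w in zip(inputs, weights))
    return 1 if total >= threshold else 0
# Hypothetical example: two excitatory inputs (+1) and one inhibitory input (-1).
weights = [1, 1, -1]
fires_without_inhibition = threshold_unit([1, 1, 0], weights, threshold=2)
fires_with_inhibition = threshold_unit([1, 1, 1], weights, threshold=2)
```
With these assumed values the unit fires when both excitatory inputs are active, and one active inhibitory input is enough to keep it silent.
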
00:15:25.070 --> 00:15:27.690
a monumental step. But right around that same

00:15:27.690 --> 00:15:32.169
time in 1949, we get another giant, Donald Hebb.

00:15:32.409 --> 00:15:34.750
And this is a name that comes up constantly in

00:15:34.750 --> 00:15:37.629
neuroscience. He's famous for Hebbian learning.

00:15:37.950 --> 00:15:41.049
Yes. This seems to be the golden rule of neuroscience.

00:15:41.429 --> 00:15:44.970
The one thing everyone knows. It is, because Hebb

00:15:44.970 --> 00:15:47.190
gave us the mechanism for how the network learns.

00:15:47.409 --> 00:15:50.409
You see, McCulloch and Pitts built the car, but

00:15:50.409 --> 00:15:52.360
they didn't know how to drive it. The weights

00:15:52.360 --> 00:15:55.179
were fixed. Hebb explained how the driving works.

00:15:55.419 --> 00:15:57.159
And what was his insight? The source material

00:15:57.159 --> 00:15:59.700
puts it really clearly. It's the idea that neural

00:15:59.700 --> 00:16:02.659
networks change and learn over time by strengthening

00:16:02.659 --> 00:16:04.799
a synapse every single time a signal travels

00:16:04.799 --> 00:16:07.639
along it. Strengthening a synapse every time

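NOTE
Editorial sketch, not part of the spoken audio. The rule as stated in the cues above (a connection gets stronger each time a signal travels along it) could be written as a simple weight update. Hebb's idea is more commonly summarized as strengthening a connection when the sending and receiving neurons are active together, so this sketch multiplies the two activities. The name hebbian_update and the learning_rate value are hypothetical, chosen only for illustration.
```python
def hebbian_update(weight, pre, post, learning_rate=0.1):
    """Strengthen the connection in proportion to joint pre/post activity."""
    return weight + learning_rate * pre * post
# Hypothetical example: the same pathway is used five times in a row.
weight = 0.0
history = []
for _ in range(5):
    weight = hebbian_update(weight, pre=1.0, post=1.0)
    history.append(weight)
# An unused pathway (pre = 0) keeps its weight unchanged.
unused = hebbian_update(0.5, pre=0.0, post=1.0)
```
Each repetition nudges the weight upward, which is the trail-in-the-woods picture in numeric form; a pathway that carries no signal is left as it was.
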
00:16:07.639 --> 00:16:11.100
a signal travels along it. So if I practice a

00:16:11.100 --> 00:16:13.700
tennis swing, the signal to my arm goes down

00:16:13.700 --> 00:16:16.279
a certain neural path. And just because it went

00:16:16.279 --> 00:16:18.940
down that path, the path itself gets stronger.

00:16:19.120 --> 00:16:21.179
Exactly. It's like a trail in the woods. This

00:16:21.179 --> 00:16:23.440
is the best analogy for it. Imagine you're standing

00:16:23.440 --> 00:16:27.000
in a dense forest. There is no path. But you

00:16:27.000 --> 00:16:29.700
need to get from point A to point B. The very

00:16:29.700 --> 00:16:32.179
first time you walk it, you are hacking through

00:16:32.179 --> 00:16:36.240
brush with a machete. It's slow. It's hard. It

00:16:36.240 --> 00:16:38.100
takes a lot of effort. But you leave a trail

00:16:38.100 --> 00:16:40.860
behind you. A faint one. A very faint one. But

00:16:40.860 --> 00:16:42.779
if you walk that same route tomorrow, it's a

00:16:42.779 --> 00:16:44.240
little bit easier. You've already trampled some

00:16:44.240 --> 00:16:46.120
of the grass. Right. If you walk it every day

00:16:46.120 --> 00:16:48.659
for a month, the dirt starts to pack down. If

00:16:48.659 --> 00:16:51.080
you walk it for a year, it becomes a clear, wide

00:16:51.080 --> 00:16:53.200
path. You can eventually run down it without

00:16:53.200 --> 00:16:54.600
even thinking about where you're putting your

00:16:54.600 --> 00:16:56.919
feet. And that is what a synapse does. Literally.

00:16:57.840 --> 00:17:00.299
On a physical, chemical level, the connection

00:17:00.299 --> 00:17:02.980
becomes more efficient. The physical structure

00:17:02.980 --> 00:17:05.940
of the synapse changes to allow the signal to

00:17:05.940 --> 00:17:09.299
pass faster and more strongly. This is the biological

00:17:09.299 --> 00:17:12.920
basis for practice makes perfect. It is the physical

00:17:12.920 --> 00:17:15.519
manifestation of learning in your brain. It's

00:17:15.519 --> 00:17:17.940
wild to think that learning isn't just an abstract

00:17:17.940 --> 00:17:21.119
concept. It's a physical thickening and strengthening

00:17:21.119 --> 00:17:23.440
of connections in your head. You are literally

00:17:23.440 --> 00:17:25.539
terraforming your own brain with your thoughts.

00:17:25.740 --> 00:17:28.190
It gives a whole new weight to the idea of

00:17:28.190 --> 00:17:30.150
habit forming, doesn't it? Every time you have

00:17:30.150 --> 00:17:31.970
a thought, you are physically sculpting your

00:17:31.970 --> 00:17:34.069
brain to make it easier to have that exact thought

00:17:34.069 --> 00:17:37.029
again. That is both incredibly inspiring and

00:17:37.029 --> 00:17:39.190
slightly terrifying at the same time. It really

00:17:39.190 --> 00:17:42.569
is. And the timeline keeps moving. So in 1956,

00:17:42.869 --> 00:17:45.150
we had another biological breakthrough that's

00:17:45.150 --> 00:17:47.690
mentioned in the text. Svaetichin discovers the

00:17:47.690 --> 00:17:49.650
functioning of second-order retinal cells, these

00:17:50.170 --> 00:17:52.430
Horizontal cells. Right. This was a really crucial

00:17:52.430 --> 00:17:55.250
discovery for the biological side of things because

00:17:55.250 --> 00:17:57.069
it showed us that the processing wasn't just

00:17:57.069 --> 00:17:59.309
happening deep inside the brain. It was happening

00:17:59.309 --> 00:18:01.869
right at the sensor. In the eye itself. Yes.

00:18:01.970 --> 00:18:05.109
The retina is literally a part of the brain that's

00:18:05.109 --> 00:18:07.490
pushed out to the front of your face. And those

00:18:07.490 --> 00:18:10.230
cells were doing their own compression and processing

00:18:10.230 --> 00:18:13.210
like edge detection before the signal was even

00:18:13.210 --> 00:18:15.900
sent down the optic nerve. It proved that these

00:18:15.900 --> 00:18:19.059
networks are distributed and hierarchical right

00:18:19.059 --> 00:18:20.920
from the beginning. So we have the theory from

00:18:20.920 --> 00:18:24.819
James. We have the perceptron math from McCulloch

00:18:24.819 --> 00:18:27.259
and Pitts. We have Hebbian learning explaining

00:18:27.259 --> 00:18:29.920
how connections strengthen. We have Svaetichin

00:18:29.920 --> 00:18:32.660
examining the retina. And then in the late 50s,

00:18:32.660 --> 00:18:34.680
things start to get physical in the artificial

00:18:34.680 --> 00:18:38.160
world. They get very physical. In 1957, a researcher

00:18:38.160 --> 00:18:41.079
named Frank Rosenblatt... implements the perceptron

00:18:41.079 --> 00:18:43.380
in hardware. Hardware. So this wasn't a simulation.

00:18:43.500 --> 00:18:45.660
This wasn't software running on a chip. No, chips

00:18:45.660 --> 00:18:47.819
as we know them didn't exist yet. This was a

00:18:47.819 --> 00:18:50.299
physical machine. It was called the Mark I

00:18:50.299 --> 00:18:52.859
perceptron. And it was the size of a room. It

00:18:52.859 --> 00:18:54.980
had hundreds and hundreds of cables like a giant

00:18:54.980 --> 00:18:57.799
telephone switchboard. It had a camera that looked

00:18:57.799 --> 00:19:00.819
at shapes. And inside, to represent the weights

00:19:00.819 --> 00:19:03.720
of the connections, it had potentiometers, basically

00:19:03.720 --> 00:19:06.880
physical volume knobs that were attached to little

00:19:06.880 --> 00:19:10.339
electric motors. You're kidding. Motorized volume

00:19:10.339 --> 00:19:14.700
knobs. I'm not. To learn. The machine would physically

00:19:14.700 --> 00:19:17.240
turn the knobs to adjust the electrical resistance

00:19:17.240 --> 00:19:19.960
in the circuits. It was a physical, mechanical

00:19:19.960 --> 00:19:22.960
embodiment of a neural network. That is incredible.

00:19:23.099 --> 00:19:25.720
That's like a steampunk AI. It basically was.

00:19:26.180 --> 00:19:28.579
But this is also where we see the great divergence,

00:19:28.839 --> 00:19:31.500
the big split. The split between the two worlds

00:19:31.500 --> 00:19:33.200
we talked about in the intro, the biological

00:19:33.200 --> 00:19:35.859
and the artificial. Right. Up until this point,

00:19:35.880 --> 00:19:38.519
the main goal was largely to model biology, to

00:19:38.519 --> 00:19:41.339
try and understand the brain. But as these artificial

00:19:41.339 --> 00:19:44.059
networks started being built, and as they got

00:19:44.059 --> 00:19:46.559
better, they began to drift away from their biological

00:19:46.559 --> 00:19:49.900
counterparts. Why? Researchers realized that

00:19:49.900 --> 00:19:52.180
if your only goal is to solve a math problem,

00:19:52.420 --> 00:19:55.019
like recognizing a letter of the alphabet or

00:19:55.019 --> 00:19:57.759
calculating a missile trajectory, you don't actually

00:19:57.759 --> 00:20:00.259
need to be perfectly biologically accurate. You

00:20:00.259 --> 00:20:01.880
don't need all the squishy, messy chemistry.

00:20:02.180 --> 00:20:05.579
You don't. You can simplify. You can cheat. You

00:20:05.579 --> 00:20:08.710
just need the system to work. So the engineers

00:20:08.710 --> 00:20:11.049
and computer scientists started to focus purely

00:20:11.049 --> 00:20:13.730
on machine learning applications. They stopped

00:20:13.730 --> 00:20:16.369
trying to build an exact copy of a brain and

00:20:16.369 --> 00:20:18.329
started trying to build a machine that could

00:20:18.329 --> 00:20:21.230
learn, period. And that's the path that led us

00:20:21.230 --> 00:20:23.190
to where we are today. It's the fork in the road

00:20:23.190 --> 00:20:26.170
that created the modern AI industry. So let's

00:20:26.170 --> 00:20:29.049
cross that bridge now. We are leaving the wet,

00:20:29.150 --> 00:20:32.089
squishy world of biology and 19th century psychology,

00:20:32.309 --> 00:20:35.480
and we are entering the artificial mind. The

00:20:35.480 --> 00:20:38.299
domain of pure software. And the source material

00:20:38.299 --> 00:20:41.220
notes this transition explicitly. Early networks

00:20:41.220 --> 00:20:43.819
were these physical machines, like Rosenblatt's

00:20:43.819 --> 00:20:46.960
room-sized monster. Today, they are almost always

00:20:46.960 --> 00:20:49.160
implemented in software. And we have this definition

00:20:49.160 --> 00:20:51.680
again. An artificial mathematical model used

00:20:51.680 --> 00:20:54.140
to approximate nonlinear functions. All right.

00:20:54.200 --> 00:20:56.119
We have to tackle that phrase now. It's the elephant

00:20:56.119 --> 00:20:59.299
in the room. Please. Nonlinear functions. What

00:20:59.299 --> 00:21:01.759
does that mean in plain English? And why is it

00:21:01.759 --> 00:21:03.819
such a big deal that these networks can approximate

00:21:03.819 --> 00:21:07.299
them? Okay. To understand nonlinear, you first

00:21:07.299 --> 00:21:10.180
have to understand what linear is. Think of a

00:21:10.180 --> 00:21:12.279
linear function as a perfectly straight line.

00:21:12.440 --> 00:21:15.539
It's simple, predictable, cause and effect. If

00:21:15.539 --> 00:21:18.119
I double the input, I double the output. Every

00:21:18.119 --> 00:21:22.160
time. Okay. Like buying apples at a store? The

00:21:22.160 --> 00:21:24.880
perfect example. If one apple costs $1, two apples

00:21:24.880 --> 00:21:27.519
cost $2, 10 apples cost $10, you can draw it

00:21:27.519 --> 00:21:29.359
on a graph, and it's a perfectly straight line

00:21:29.359 --> 00:21:31.359
going up. It's predictable. It's easy. Got it.

00:21:31.440 --> 00:21:34.589
But the real world... The real world is almost

00:21:34.589 --> 00:21:37.630
never linear. Real-world relationships are messy.

00:21:37.630 --> 00:21:40.049
They're complex. Think about the relationship

00:21:40.049 --> 00:21:43.009
between, say, the amount of coffee you drink and

00:21:43.009 --> 00:21:45.089
your productivity. Oh, that is definitely not a

00:21:45.089 --> 00:21:47.049
straight line. Not at all. If you have one cup,

00:21:47.319 --> 00:21:49.440
your productivity goes up. You feel alert. You

00:21:49.440 --> 00:21:50.980
have two cups. Maybe it goes up a little bit

00:21:50.980 --> 00:21:52.900
more. But if you have 10 cups of coffee, you're

00:21:52.900 --> 00:21:54.940
vibrating under your desk and your productivity

00:21:54.940 --> 00:21:57.720
hits absolute zero. Exactly. The line goes up,

00:21:57.720 --> 00:21:59.359
then it curves over and then it crashes down.

00:21:59.599 --> 00:22:02.500
That is a nonlinear relationship. The output

00:22:02.500 --> 00:22:05.119
does not move in a straight, predictable line

00:22:05.119 --> 00:22:08.420
with the input. Ah, OK. So it handles the curve.
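
NOTE
A minimal sketch of the linear versus nonlinear contrast described here.
The apple price comes from the conversation; the coffee formula is a made-up
toy curve (an assumption) chosen only so that it rises, peaks, and hits zero at 10 cups.
```python
# Hypothetical illustration: the function names and the coffee formula are assumptions.
def apple_cost(apples, price=1.0):
    """Linear: doubling the input always doubles the output."""
    return price * apples
#
def productivity(cups):
    """Nonlinear toy curve: rises, peaks at 5 cups, crashes to zero at 10."""
    return max(0.0, cups * (10 - cups))
#
linear_doubles = apple_cost(4) == 2 * apple_cost(2)
coffee_doubles = productivity(10) == 2 * productivity(5)
print(linear_doubles, coffee_doubles)
```
Doubling the apples doubles the cost, but doubling the coffee from 5 to 10 cups
does not double the productivity, which is the sense of nonlinear used here.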

00:22:08.599 --> 00:22:11.019
It handles the complexity. And here's why this

00:22:11.019 --> 00:22:14.160
matters for AI. Almost every interesting problem

00:22:14.160 --> 00:22:17.700
in the world is nonlinear. Identifying a face

00:22:17.700 --> 00:22:21.160
in a photo is a highly nonlinear problem. There

00:22:21.160 --> 00:22:23.319
isn't a straight line equation where pixel X

00:22:23.319 --> 00:22:26.660
plus pixel Y equals Steve. Because Steve might

00:22:26.660 --> 00:22:29.579
be turning his head or could be standing in a

00:22:29.579 --> 00:22:33.490
shadow or wearing glasses today. Exactly. The

00:22:33.490 --> 00:22:35.390
relationship between the arrangement of pixels

00:22:35.390 --> 00:22:38.069
and the identity of Steve is incredibly complex

00:22:38.069 --> 00:22:41.569
and messy. It twists and turns. Standard linear

00:22:41.569 --> 00:22:43.950
math fails miserably at this kind of problem.

00:22:44.170 --> 00:22:46.369
Neural networks are designed specifically to

00:22:46.369 --> 00:22:48.890
handle that messiness. They are approximation

00:22:48.890 --> 00:22:51.950
machines for complex, messy realities where A

00:22:51.950 --> 00:22:54.849
plus B does not always equal C. So they are the

00:22:54.849 --> 00:22:57.089
off-road vehicles of mathematics. They can handle

00:22:57.089 --> 00:22:59.049
the rough, bumpy terrain where other methods

00:22:59.049 --> 00:23:01.109
would get stuck. That is a fantastic analogy.

00:23:01.329 --> 00:23:03.650
Yes. They can map any curve, no matter how wiggly

00:23:03.650 --> 00:23:05.329
it is. Okay, that is actually really, really

00:23:05.329 --> 00:23:07.410
helpful. So how do they do it? Well, what does

00:23:07.410 --> 00:23:09.049
the architecture of one of these software networks

00:23:09.049 --> 00:23:11.049
look like? Because we can't see the synapses

00:23:11.049 --> 00:23:13.650
anymore. It's just code. We visualize it in layers.

00:23:13.869 --> 00:23:15.750
The source material breaks it down into three

00:23:15.750 --> 00:23:17.730
main types of layers that you'll always find.

00:23:17.930 --> 00:23:19.869
First you have the input layer. That's where

00:23:19.869 --> 00:23:22.140
the information enters the system. Right. If

00:23:22.140 --> 00:23:24.359
it's a facial recognition system, the input layer

00:23:24.359 --> 00:23:27.079
is just the raw pixels of the photo. It takes

00:23:27.079 --> 00:23:29.599
the image, say a grid of a thousand by a thousand

00:23:29.599 --> 00:23:31.799
pixels, and it turns it into a list of a million

00:23:31.799 --> 00:23:34.680
numbers. So it digitizes the world. Precisely.
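
NOTE
A rough sketch of what the input layer step amounts to: a grid of pixels
becomes one flat list of numbers. The helper name flatten and the tiny 3 by 3
grayscale image are assumptions for illustration only.
```python
# Hypothetical helper: not from the source material.
def flatten(image):
    """Turn a grid of pixel rows into one flat list of numbers."""
    return [pixel for row in image for pixel in row]
#
tiny_image = [[0, 255, 0], [255, 255, 255], [0, 255, 0]]  # 3x3 grayscale grid
inputs = flatten(tiny_image)
# A thousand-by-thousand image flattens to a million input numbers.
million = len(flatten([[0] * 1000 for _ in range(1000)]))
print(len(inputs), million)
```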

00:23:34.779 --> 00:23:36.799
Then you have the hidden layers. These are the

00:23:36.799 --> 00:23:38.420
intermediate layers where all the processing

00:23:38.420 --> 00:23:41.099
happens. This is the black box where the real

00:23:41.099 --> 00:23:44.039
magic occurs. And finally, the output layer.

00:23:44.240 --> 00:23:47.140
The final result. A single neuron might light

00:23:47.140 --> 00:23:49.859
up that corresponds to this is a photo of Steve

00:23:49.859 --> 00:23:53.019
or this is spam. Or maybe a set of numbers

00:23:53.019 --> 00:23:55.960
that means turn the steering wheel 15 degrees

00:23:55.960 --> 00:23:58.880
to the left. Input, hidden, output. Seems like

00:23:58.880 --> 00:24:01.279
a simple enough structure. But what is actually

00:24:01.279 --> 00:24:04.599
happening inside those hidden layers? How does

00:24:04.599 --> 00:24:08.259
an artificial neuron think? Because it doesn't

00:24:08.259 --> 00:24:10.759
have a brain. It's just a number. This is where

00:24:10.759 --> 00:24:12.779
we get into the math, but stick with me. It's

00:24:12.779 --> 00:24:16.200
pretty intuitive. The source says the signal

00:24:16.200 --> 00:24:18.880
input to a neuron in one of these layers is a

00:24:18.880 --> 00:24:22.079
number. Specifically, it is a linear combination

00:24:22.079 --> 00:24:25.160
of the outputs from all the connected neurons

00:24:25.160 --> 00:24:27.640
in the previous layer. Linear combination. That

00:24:27.640 --> 00:24:29.079
just sounds like it's mixing a bunch of things

00:24:29.079 --> 00:24:31.279
together. That's exactly what it is. Imagine

00:24:31.279 --> 00:24:33.640
you are a single neuron in one of the hidden

00:24:33.640 --> 00:24:35.880
layers. You're like a chef. You have all these

00:24:35.880 --> 00:24:37.819
ingredients coming in from the previous layer.

00:24:38.039 --> 00:24:40.660
The neurons in the previous layer are all shouting

00:24:40.660 --> 00:24:43.680
things at you. One is saying, I see a strong

00:24:43.680 --> 00:24:46.279
vertical line here. Another is saying, I see

00:24:46.279 --> 00:24:49.180
a shadow over here. And a third says, I'm detecting

00:24:49.180 --> 00:24:51.900
the color blue. The ingredients are the signals

00:24:51.900 --> 00:24:55.140
from the last layer. Yes. But, and this is the

00:24:55.140 --> 00:24:57.819
absolute key to everything, the neuron doesn't

00:24:57.819 --> 00:25:00.299
listen to them all equally. It pays more attention

00:25:00.299 --> 00:25:03.980
to some than others. It relies on weights. Weights.

00:25:04.299 --> 00:25:06.740
Okay. This is bringing us right back to Hebbian

00:25:06.740 --> 00:25:08.900
learning and the trail in the woods. It's the

00:25:08.900 --> 00:25:11.700
software version of that. The behavior of the

00:25:11.700 --> 00:25:14.400
entire network depends on the strengths or weights

00:25:14.400 --> 00:25:17.160
of the connections between the neurons. In the

00:25:17.160 --> 00:25:20.019
software, a weight is just a number that represents

00:25:20.019 --> 00:25:22.920
that strength. So just like the biological synapse

00:25:22.920 --> 00:25:26.410
gets physically stronger. In an artificial network,

00:25:26.549 --> 00:25:29.049
the weight number just gets bigger. If the weight

00:25:29.049 --> 00:25:31.670
on a connection is a high positive number, the

00:25:31.670 --> 00:25:33.789
connection is strong. The signal gets amplified.

00:25:34.170 --> 00:25:36.730
If the weight is a low number or a negative number,

00:25:36.910 --> 00:25:39.789
the signal gets ignored or even suppressed. So

00:25:39.789 --> 00:25:41.890
if I'm an artificial neuron trying to identify

00:25:41.890 --> 00:25:44.630
a cat, and I'm connected to a neuron in the previous

00:25:44.630 --> 00:25:48.180
layer that has detected pointy ears, I might

00:25:48.180 --> 00:25:49.940
have a really high weight on that connection.

00:25:50.140 --> 00:25:52.019
I'm listening very closely to the pointy ears

00:25:52.019 --> 00:25:54.180
guy. Exactly. You'd say, pointy ears, that's

00:25:54.180 --> 00:25:56.519
super important for being a cat. Multiply that

00:25:56.519 --> 00:25:58.960
signal by 10. And if I'm connected to a neuron

00:25:58.960 --> 00:26:01.700
that detected has wheels... You might have a

00:26:01.700 --> 00:26:03.559
strong negative weight on that connection. You'd

00:26:03.559 --> 00:26:06.099
say, wheels, cats don't have wheels. Suppress

00:26:06.099 --> 00:26:09.150
that signal. Multiply it by negative five, and make

00:26:09.150 --> 00:26:12.829
it count against the final total. Usually. Usually.

00:26:12.970 --> 00:26:14.990
Unless you're training it on cartoon cats, which

00:26:14.990 --> 00:26:17.349
is a whole other problem. So the neuron takes

00:26:17.349 --> 00:26:19.990
all these inputs, multiplies each one by its

00:26:19.990 --> 00:26:22.170
corresponding weight, and then it just sums them

00:26:22.170 --> 00:26:25.190
all up. That's the linear combination. But then

00:26:25.190 --> 00:26:28.490
there's one more critical step, the activation

00:26:28.490 --> 00:26:31.109
function. The gatekeeper. The gatekeeper, exactly.

00:26:31.450 --> 00:26:34.910
The neuron's final output signal is calculated

00:26:34.910 --> 00:26:37.470
from that big summed up number based on this

00:26:37.470 --> 00:26:39.890
activation function. It's a simple little piece

00:26:39.890 --> 00:26:43.089
of math that decides, okay, is this total number

00:26:43.089 --> 00:26:46.029
high enough to be important? Is it strong enough

00:26:46.029 --> 00:26:48.450
to fire and send a signal onto the next layer?

00:26:48.670 --> 00:26:51.390
So this is the artificial version of that excitatory

00:26:51.390 --> 00:26:53.509
versus inhibitory decision we talked about in

00:26:53.509 --> 00:26:56.190
the brain. That's precisely what it is. It introduces

00:26:56.190 --> 00:26:59.049
the nonlinearity we need. Without the activation

00:26:59.049 --> 00:27:01.369
function, the whole network would just be one

00:27:01.369 --> 00:27:04.150
giant, very complicated multiplication problem.
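
NOTE
A small sketch of why the activation function matters, using one weight per
layer to keep it readable. The weights 3.0 and 2.0 and the choice of ReLU as
the activation are assumptions; the source does not name a specific function.
```python
# Hypothetical toy network: one number in, one number out.
def relu(value):
    """Assumed activation: pass positive values, block everything else."""
    return max(0.0, value)
#
def two_linear_layers(x, w1=3.0, w2=2.0):
    """No activation: the two layers collapse into multiplying by w1 * w2."""
    return w2 * (w1 * x)
#
def with_activation(x, w1=3.0, w2=2.0):
    """ReLU between the layers bends the straight line at zero."""
    return w2 * relu(w1 * x)
#
xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
collapsed = all(two_linear_layers(x) == 6.0 * x for x in xs)
bent = [with_activation(x) for x in xs]
print(collapsed, bent)
```
Without the activation the stack is still a single straight line through the
origin; with it, negative inputs are flattened to zero, so the output is no
longer one straight line.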

00:27:04.289 --> 00:27:06.269
It would still be linear. It couldn't learn those

00:27:06.269 --> 00:27:08.490
complex curves. The activation function allows

00:27:08.490 --> 00:27:11.130
it to say, unless the evidence is this strong,

00:27:11.269 --> 00:27:13.369
just ignore it. It creates boundaries and allows

00:27:13.369 --> 00:27:16.789
for complex decisions. So to recap, the neuron

00:27:17.319 --> 00:27:19.640
sums up all the weighted inputs, it runs that

00:27:19.640 --> 00:27:21.920
total through the activation function, and then

00:27:21.920 --> 00:27:24.440
it decides whether to pass a message on and how

00:27:24.440 --> 00:27:26.859
strong that message should be. It's remarkably

00:27:26.859 --> 00:27:29.660
similar to the biological process. It's a mathematical

00:27:29.660 --> 00:27:32.220
mirror of it. It's an elegant simplification.
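
NOTE
The recap above can be sketched as a few lines of code. The weights of 10 and
negative 5 come from the cat example in the conversation; the function names
and the use of ReLU as the activation are assumptions, not the source's method.
```python
# Hypothetical single neuron: weighted sum, then an activation function.
def relu(total):
    """Assumed activation: fire in proportion to positive evidence only."""
    return max(0.0, total)
#
def neuron(inputs, weights):
    """Multiply each input by its weight, sum them up, apply the activation."""
    total = sum(x * w for x, w in zip(inputs, weights))
    return relu(total)
#
weights = [10.0, -5.0]                  # pointy ears: +10, has wheels: -5
cat_like = neuron([1.0, 0.0], weights)  # ears detected, no wheels
car_like = neuron([0.0, 1.0], weights)  # wheels detected, no ears
print(cat_like, car_like)
```
The cat-like input produces a strong output, while the wheels-only input sums
to a negative total and the activation passes nothing on.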

00:27:32.619 --> 00:27:34.740
But here is the million-dollar question, or

00:27:34.740 --> 00:27:37.160
the trillion-dollar question now. In biology,

00:27:37.299 --> 00:27:40.099
we learn by practice. The signal travels down

00:27:40.099 --> 00:27:42.460
the path. The synapse gets physically thicker.

00:27:42.819 --> 00:27:46.599
How does a software program learn? How does it

00:27:46.599 --> 00:27:48.839
know what the weights for all those connections

00:27:48.839 --> 00:27:51.299
should be in the first place? Because there are

00:27:51.299 --> 00:27:53.720
billions of them. Trillions in the really big

00:27:53.720 --> 00:27:56.299
modern models like GPT-4. And you're right.

00:27:56.339 --> 00:27:58.220
You can't set them by hand. You can't sit there

00:27:58.220 --> 00:28:00.559
as a programmer and dial in a trillion knobs

00:28:00.559 --> 00:28:03.000
to the right settings. So how do we do it? How

00:28:03.000 --> 00:28:05.099
does it figure it out? That is the process we

00:28:05.099 --> 00:28:07.980
call training. And the source material gives

00:28:07.980 --> 00:28:10.720
us two somewhat intimidating terms for this process,

00:28:11.000 --> 00:28:14.180
empirical risk minimization and backpropagation.

00:28:14.440 --> 00:28:16.640
Let's focus on the concept first, though. How

00:28:16.640 --> 00:28:20.180
do you train the beast? Okay. So ideally, you

00:28:20.180 --> 00:28:21.940
want the network to look at a picture of a cat

00:28:21.940 --> 00:28:25.339
and say, cat. But when you first start, all those

00:28:25.339 --> 00:28:28.099
trillions of weights are just set to random numbers.

00:28:28.380 --> 00:28:31.180
The network is a complete idiot. It knows nothing.

00:28:31.400 --> 00:28:33.259
So you show it the first picture of a cat. And

00:28:33.259 --> 00:28:35.240
it runs the numbers through all the layers. And

00:28:35.240 --> 00:28:39.140
at the end, it guesses toaster. Swing and a miss.

00:28:39.259 --> 00:28:42.819
A huge miss. Now, the math knows it's wrong because

00:28:42.819 --> 00:28:46.000
you, the trainer, have a pre-existing data set.

00:28:46.160 --> 00:28:48.880
You have the answer key. You've labeled that

00:28:48.880 --> 00:28:51.440
picture cat. So the system can calculate the

00:28:51.440 --> 00:28:54.500
error. It knows how wrong it was. Exactly. It

00:28:54.500 --> 00:28:57.440
calculates the loss or the cost. Ideally, the

00:28:57.440 --> 00:28:59.660
loss should be zero. Here, the loss is huge.

00:29:00.680 --> 00:29:02.960
Using a clever algorithm called backpropagation,

00:29:03.339 --> 00:29:05.460
it goes backwards through the network. Backwards

00:29:05.460 --> 00:29:07.640
from the answer to the input. Yes. This is the

00:29:07.640 --> 00:29:10.019
genius of it. It starts at the output layer and

00:29:10.019 --> 00:29:12.160
moves back towards the input layer. It looks

00:29:12.160 --> 00:29:13.700
at all those weights that contributed to the

00:29:13.700 --> 00:29:15.680
final decision and it says, OK, who screwed up?

00:29:15.700 --> 00:29:18.259
Which connection, which weight contributed most

00:29:18.259 --> 00:29:20.680
to this ridiculous toaster decision? It's like

00:29:20.680 --> 00:29:22.839
a manager walking back through the assembly line

00:29:22.839 --> 00:29:25.019
after a defective product comes out at the end.

00:29:25.119 --> 00:29:27.750
That is the perfect analogy. The manager walks

00:29:27.750 --> 00:29:30.069
back down the line and says, you, station four,

00:29:30.250 --> 00:29:32.890
you tightened that bolt way too much. That made

00:29:32.890 --> 00:29:35.069
it look metallic and shiny. Let's nudge your

00:29:35.069 --> 00:29:38.069
weight down a little. And you, station two, you

00:29:38.069 --> 00:29:39.849
didn't add nearly enough weight to the fuzziness

00:29:39.849 --> 00:29:42.609
neuron. Let's nudge your weight up. It mathematically

00:29:42.609 --> 00:29:46.190
tweaks all the weights just a tiny bit in the

00:29:46.190 --> 00:29:47.970
direction that would have made the error smaller.

00:29:48.309 --> 00:29:50.940
It assigns blame. It assigns blame mathematically

00:29:50.940 --> 00:29:53.019
and adjusts the weights to reduce the error.
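
NOTE
A minimal sketch of the guess, measure, nudge loop, assuming a single neuron
with no activation and a squared-error loss. With one neuron the backward pass
is a single chain-rule step; real backpropagation repeats that step layer by
layer. The data, learning rate, and function names are all made up.
```python
import random
#
def train(examples, steps=200, learning_rate=0.1, seed=0):
    """Repeatedly guess, measure the error, and nudge the weights to shrink it."""
    rng = random.Random(seed)
    w, b = rng.uniform(-1, 1), rng.uniform(-1, 1)  # start from random weights
    for _ in range(steps):
        for x, target in examples:
            guess = w * x + b               # forward pass
            error = guess - target          # how wrong was the guess?
            # Gradient of 0.5 * error**2 with respect to w and b (chain rule).
            w -= learning_rate * error * x  # nudge the weight
            b -= learning_rate * error      # nudge the bias
    return w, b
#
def loss(w, b, examples):
    """Mean squared error against the labelled examples (the answer key)."""
    return sum((w * x + b - t) ** 2 for x, t in examples) / len(examples)
#
examples = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0)]  # assumed data: target = 2x + 1
w, b = train(examples)
print(round(w, 2), round(b, 2), loss(w, b, examples) < 1e-6)
```
Each pass makes only a small correction, which is why the loop has to run many
times before the weights settle near values that fit the examples.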

00:29:53.160 --> 00:29:54.900
And then it does it again and again and again

00:29:54.900 --> 00:29:57.759
and again. Millions, sometimes billions of times.

00:29:57.839 --> 00:30:00.700
Sees another cat. It guesses dog. Big error.

00:30:00.819 --> 00:30:03.140
Adjust all the weights. It sees another cat.

00:30:03.279 --> 00:30:06.420
Guesses lion. Smaller error, but still wrong.

00:30:06.599 --> 00:30:08.559
Adjust the weights again. It sees another cat.

00:30:08.640 --> 00:30:12.200
And finally, it guesses cat. Success. The error

00:30:12.200 --> 00:30:14.700
is zero. Keep those weights for now. So training

00:30:14.700 --> 00:30:17.690
is essentially just this automated, repetitive

00:30:17.690 --> 00:30:20.690
process of showing the network an example, seeing

00:30:20.690 --> 00:30:23.869
how wrong it is, and using calculus to nudge all

00:30:23.869 --> 00:30:26.549
the connection strengths until the output matches

00:30:26.549 --> 00:30:29.369
the data you want. That is all it is. It feels

00:30:29.369 --> 00:30:31.730
like intelligence. It looks like learning. But

00:30:31.730 --> 00:30:34.369
under the hood, it is a brute-force optimization

00:30:34.369 --> 00:30:37.630
problem. It is statistics and calculus finding

00:30:37.630 --> 00:30:40.230
the path of least resistance through a mathematical

00:30:40.230 --> 00:30:43.009
landscape with trillions of dimensions. That is

00:30:43.420 --> 00:30:45.720
slightly underwhelming, but also incredibly impressive

00:30:45.720 --> 00:30:48.299
at the same time. It takes the magic out of it,

00:30:48.339 --> 00:30:50.839
but it adds this whole new layer of awe at what

00:30:50.839 --> 00:30:54.039
pure, relentless math can actually achieve. So

00:30:54.039 --> 00:30:55.940
that's the mechanism. We have the architecture,

00:30:56.140 --> 00:30:57.740
we have the weights, we have the training process

00:30:57.740 --> 00:31:00.200
with backpropagation. Now I have to ask about

00:31:00.200 --> 00:31:03.079
the big buzzword, deep learning. You hear it

00:31:03.079 --> 00:31:05.079
everywhere. Deep neural networks. Is that just

00:31:05.079 --> 00:31:06.579
marketing, or does it actually mean something

00:31:06.579 --> 00:31:09.059
specific? It is a specific technical term, and

00:31:09.059 --> 00:31:11.480
the definition is surprisingly concrete and,

00:31:11.519 --> 00:31:14.700
well, simple. The source says a deep neural network

00:31:14.700 --> 00:31:16.660
just refers to neural networks that have more

00:31:16.660 --> 00:31:19.000
than three layers. That's it. The definition

00:31:19.000 --> 00:31:22.640
of deep is just more than three. That's the technical

00:31:22.640 --> 00:31:25.420
threshold. Typically, it includes the input layer,

00:31:25.579 --> 00:31:27.720
the output layer, and at least two hidden layers.

00:31:28.039 --> 00:31:30.539
Anything with two or more hidden layers gets

00:31:30.539 --> 00:31:33.220
to call itself deep. So deep just refers to the

00:31:33.220 --> 00:31:35.420
depth of the layers. It's a structural definition.

00:31:35.819 --> 00:31:39.140
Yes. Early networks back in the 80s and 90s were

00:31:39.140 --> 00:31:41.599
shallow. Maybe they had one hidden layer. But

00:31:41.599 --> 00:31:44.700
the modern networks, the ones running GPT or

00:31:44.700 --> 00:31:48.400
AlphaGo or self-driving cars, they are incredibly

00:31:48.400 --> 00:31:51.299
deep. We're talking dozens, sometimes hundreds

00:31:51.299 --> 00:31:53.859
of layers. Why does that depth matter so much?

00:31:53.900 --> 00:31:56.220
Why is that the secret sauce? Why not just have

00:31:56.220 --> 00:31:59.599
one really, really wide hidden layer with billions

00:31:59.599 --> 00:32:03.309
of neurons? Hierarchy. That is the secret. Deep

00:32:03.309 --> 00:32:05.369
networks learn in a hierarchy of features, from

00:32:05.369 --> 00:32:08.289
simple to complex. Think about how you, a human,

00:32:08.529 --> 00:32:11.009
recognize a car. Okay. You don't just see car

00:32:11.009 --> 00:32:13.730
all at once. Your brain processes it in stages.

00:32:14.029 --> 00:32:16.109
The first layer of neurons in your visual cortex,

00:32:16.329 --> 00:32:18.170
as we mentioned, might just learn to recognize

00:32:18.170 --> 00:32:21.170
simple things. Edges, gradients, patches of color.

00:32:21.289 --> 00:32:23.329
The most basic building blocks. Vertical lines,

00:32:23.509 --> 00:32:27.130
horizontal lines. Exactly. The next layer up...

00:32:27.390 --> 00:32:29.309
takes the outputs from that first layer and learns

00:32:29.309 --> 00:32:31.410
to combine them. It says, okay, if I see four

00:32:31.410 --> 00:32:33.470
lines arranged in a rectangle, that's a shape.

00:32:33.529 --> 00:32:36.049
If I see a continuous curve, that's a circle.

00:32:36.390 --> 00:32:38.869
It learns simple shapes. So wheels and windows.

00:32:39.130 --> 00:32:42.029
Right. And the layer above that combines the

00:32:42.029 --> 00:32:44.849
shapes into components. It says a circle next

00:32:44.849 --> 00:32:47.289
to another circle on top of a rectangle that

00:32:47.289 --> 00:32:50.250
looks like a wheel assembly. Or this specific

00:32:50.250 --> 00:32:52.349
collection of shapes looks like a door handle.

00:32:52.529 --> 00:32:54.910
And the final layers combine those components

00:32:54.910 --> 00:32:58.190
into the abstract concept of car. So each layer

00:32:58.190 --> 00:33:00.089
is building on the abstraction of the one before

00:33:00.089 --> 00:33:03.380
it. From pixels to lines to shapes to objects

00:33:03.380 --> 00:33:06.140
to the final concept. Exactly. It moves from

00:33:06.140 --> 00:33:08.640
the very concrete to the very abstract. That

00:33:08.640 --> 00:33:11.019
is what deep learning allows. It allows the computer

00:33:11.019 --> 00:33:13.400
to build a complex understanding of the world

00:33:13.400 --> 00:33:16.420
from simple parts, step by step, just like we

00:33:16.420 --> 00:33:19.119
think our own brains do. And that is why we use

00:33:19.119 --> 00:33:21.140
them for so many things now. The applications

00:33:21.140 --> 00:33:22.980
listed in the source are pretty broad. It's a

00:33:22.980 --> 00:33:26.079
huge list. Yeah, it's extremely broad because

00:33:26.079 --> 00:33:28.759
this hierarchical feature learning turns out

00:33:28.759 --> 00:33:31.240
to be applicable to almost every complex problem

00:33:31.240 --> 00:33:33.759
you can think of. The list here includes predictive

00:33:33.759 --> 00:33:37.500
modeling, adaptive control, facial recognition,

00:33:37.500 --> 00:33:40.799
handwriting recognition, general game playing,

00:33:40.799 --> 00:33:43.500
and, of course, the big one now, generative AI. And

00:33:43.500 --> 00:33:46.039
if you look at that whole diverse list, they all

00:33:46.039 --> 00:33:49.380
share one common thread. They are all problems

00:33:49.380 --> 00:33:51.859
where the rules are incredibly hard to write

00:33:51.859 --> 00:33:54.569
down explicitly. Right. You can't sit down and

00:33:54.569 --> 00:33:57.410
write a computer program with a million if-then

00:33:57.410 --> 00:33:59.910
statements for what a cat looks like because

00:33:59.910 --> 00:34:01.569
there are just too many variations. Exactly.

00:34:01.589 --> 00:34:03.430
You can't program it line by line. You can't

00:34:03.430 --> 00:34:05.710
say if it has pointy ears, then it's a cat. Well,

00:34:05.750 --> 00:34:08.150
a fox has pointy ears. If it has whiskers, a

00:34:08.150 --> 00:34:10.309
rat has whiskers. You can't capture the essence

00:34:10.309 --> 00:34:12.969
of catness with a set of logical rules. But you

00:34:12.969 --> 00:34:15.690
can train a deep neural network to just sort

00:34:15.690 --> 00:34:18.909
of feel it out. You can train a network to approximate

00:34:18.909 --> 00:34:22.489
the nonlinear function of catness. That is the

00:34:22.489 --> 00:34:25.250
power of this technology. It allows us to solve

00:34:25.250 --> 00:34:27.650
problems that we humans know how to do intuitively,

00:34:27.969 --> 00:34:31.150
like recognizing a face or understanding a sentence

00:34:31.150 --> 00:34:34.250
or driving a car, but that we don't know how

00:34:34.250 --> 00:34:36.630
to explain logically or mathematically. We just

00:34:36.630 --> 00:34:38.530
let the network figure out the math for us by

00:34:38.530 --> 00:34:40.210
looking at millions of examples. We're basically

00:34:40.210 --> 00:34:42.349
mimicking our own intuition that we can't even

00:34:42.349 --> 00:34:44.889
articulate ourselves. Perfectly put. We're outsourcing

00:34:44.889 --> 00:34:48.610
our intuition to... So we have traveled quite

00:34:48.610 --> 00:34:51.429
a distance today. We started this deep dive with

00:34:51.429 --> 00:34:54.809
a single biological neuron vibrating with potentially

00:34:54.809 --> 00:34:58.170
100,000 synapses, sending these tiny electrochemical

00:34:58.170 --> 00:35:00.409
signals through a dense forest in our heads.

00:35:00.610 --> 00:35:02.489
We walked through the gaslight era of William

00:35:02.489 --> 00:35:05.329
James and Alexander Bain when the very idea that

00:35:05.329 --> 00:35:08.250
thought was physical was a radical concept. We

00:35:08.250 --> 00:35:10.329
saw the perceptron, that room-sized tangle of

00:35:10.329 --> 00:35:12.750
wires and motorized knobs. And we watched that

00:35:12.750 --> 00:35:14.610
moment where biology and engineering went their

00:35:14.610 --> 00:35:18.309
separate ways. And we ended up inside these deep,

00:35:18.329 --> 00:35:21.369
multilayered software architectures of generative

00:35:21.369 --> 00:35:24.929
AI, where pure calculus is acting as the teacher,

00:35:25.130 --> 00:35:27.610
adjusting trillions of weights to get the right

00:35:27.610 --> 00:35:30.050
answer. It really is quite a journey when you

00:35:30.050 --> 00:35:32.090
lay it all out like that. And for me, the biggest

00:35:32.090 --> 00:35:33.469
takeaway, the thing I'm going to be thinking

00:35:33.469 --> 00:35:36.170
about later this week, is that convergence at

00:35:36.170 --> 00:35:38.730
the end. The convergence of the two worlds we

00:35:38.730 --> 00:35:41.210
started with. Yeah. We started by trying to model

00:35:41.210 --> 00:35:44.030
the brain. We created this simplified mathematical

00:35:44.030 --> 00:35:48.059
approximation. And now... We're using that math

00:35:48.059 --> 00:35:51.019
to solve the exact same problems, like facial

00:35:51.019 --> 00:35:53.880
recognition or understanding language, that the

00:35:53.880 --> 00:35:56.500
biological brain evolved over millions of years

00:35:56.500 --> 00:35:58.480
to handle in the first place. It's come full

00:35:58.480 --> 00:36:01.000
circle. We outsourced a biological function to

00:36:01.000 --> 00:36:03.800
a silicon chip by first copying the biological

00:36:03.800 --> 00:36:06.000
architecture. It raises a bit of a provocative

00:36:06.000 --> 00:36:07.800
question for the end, doesn't it? It certainly

00:36:07.800 --> 00:36:10.000
does. The source material leaves us with this

00:36:10.000 --> 00:36:12.480
final comparison of what learning is. In the

00:36:12.480 --> 00:36:15.619
brain, it's Hebbian learning. Strengthening the

00:36:15.619 --> 00:36:18.159
synapses via signal travel. The famous line,

00:36:18.280 --> 00:36:20.960
"neurons that fire together wire together." And

00:36:20.960 --> 00:36:23.000
in the machine, it's about modifying weights

00:36:23.000 --> 00:36:25.739
via backpropagation. It's about minimizing a

00:36:25.739 --> 00:36:28.119
mathematical error function. The mechanisms are

00:36:28.119 --> 00:36:30.260
totally different. One is wet chemistry. The

00:36:30.260 --> 00:36:33.639
other is cold calculus. But the result is strikingly

00:36:33.639 --> 00:36:35.920
similar. A system that improves with experience.
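
NOTE
Editorial sketch, not part of the spoken audio. It contrasts the two learning
rules mentioned above on one hypothetical linear neuron. All names, values, and
learning rates are illustrative assumptions. For a single neuron, backpropagation
reduces to the plain gradient step (delta rule) shown here.
```python
# Hypothetical single linear neuron: output = sum(w_i * x_i).
def output(weights, inputs):
    return sum(w * x for w, x in zip(weights, inputs))
# Hebbian update: strengthen each weight in proportion to input * output.
# No target is involved; co-active input and output reinforce the link.
def hebbian_step(weights, inputs, rate=0.1):
    y = output(weights, inputs)
    return [w + rate * x * y for w, x in zip(weights, inputs)]
# Gradient step on the squared error 0.5 * (y - target)**2.
# Each weight moves against the gradient of the error function.
def gradient_step(weights, inputs, target, rate=0.1):
    error = output(weights, inputs) - target
    return [w - rate * error * x for w, x in zip(weights, inputs)]
weights = [0.5, 0.5]
inputs = [1.0, 2.0]
target = 1.0
# One Hebbian step: weights grow because input and output are both active.
hebb = hebbian_step(weights, inputs)
# Repeated gradient steps: the output is pulled toward the target.
grad = weights
for _ in range(50):
    grad = gradient_step(grad, inputs, target)
```
The Hebbian step never sees a target, while the gradient step is driven entirely
by the error, which is the difference in mechanism the hosts describe.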

00:36:36.420 --> 00:36:39.780
So the final thought is this. Are we just mathematical

00:36:39.780 --> 00:36:42.280
models ourselves, approximating our own biology?

00:36:43.050 --> 00:36:46.369
Or is the math just a really, really good shadow

00:36:46.369 --> 00:36:48.949
of the real thing? That is the question, isn't

00:36:48.949 --> 00:36:51.449
it? If we can replicate the result of thought,

00:36:51.590 --> 00:36:54.170
of creativity, of learning, using just simple

00:36:54.170 --> 00:36:57.090
math and optimization on a massive scale, does

00:36:57.090 --> 00:36:59.030
that mean that our own thought is just simple

00:36:59.030 --> 00:37:01.949
math and optimization on a massive scale? That

00:37:01.949 --> 00:37:05.250
is the dangerous and fascinating thought. If

00:37:05.250 --> 00:37:07.409
a machine can write a poem by just minimizing

00:37:07.409 --> 00:37:09.710
the error between its output and all the poems

00:37:09.710 --> 00:37:12.190
it's ever read, does that diminish our own poetry?

00:37:12.860 --> 00:37:14.440
Or does it just mean our brains are doing the

00:37:14.440 --> 00:37:16.219
same thing, just with a different kind of hardware?

00:37:16.519 --> 00:37:19.139
I don't know if I'm ready to answer that. I think.

00:37:19.559 --> 00:37:21.840
I like to think I'm more than a linear combination

00:37:21.840 --> 00:37:24.699
of weighted inputs. I like to think there's still

00:37:24.699 --> 00:37:28.079
a ghost in my machine. [laughs] We all like to

00:37:28.079 --> 00:37:30.860
think that. But the trillions of weights in your

00:37:30.860 --> 00:37:33.539
brain might disagree. They might just be firing

00:37:33.539 --> 00:37:35.880
according to their training data, just like the

00:37:35.880 --> 00:37:38.659
AI. On that slightly existential note, I think

00:37:38.659 --> 00:37:41.289
we are going to wrap up this deep dive. Thank

00:37:41.289 --> 00:37:43.349
you for listening. Yes. Hopefully we have strengthened

00:37:43.349 --> 00:37:45.650
some of the right synapses in your brain today.

00:37:45.730 --> 00:37:47.929
And hopefully we didn't inhibit too many of the

00:37:47.929 --> 00:37:50.530
others. Exactly. We'll see you in the next one.
