WEBVTT

00:00:00.000 --> 00:00:04.080
You know, back in 1956, a machine didn't just

00:00:04.080 --> 00:00:07.000
solve a complicated math problem faster than

00:00:07.000 --> 00:00:10.339
a human. Right. It actually found a more beautiful,

00:00:10.960 --> 00:00:14.300
more elegant path to the truth than two of the

00:00:14.300 --> 00:00:17.260
greatest mathematical minds of the 20th century.

00:00:17.440 --> 00:00:19.839
Welcome to Deep Dive. It really is an incredible

00:00:19.839 --> 00:00:22.100
story. I mean, we're looking at a field that

00:00:22.100 --> 00:00:24.879
sits right at this crazy intersection of pure

00:00:24.879 --> 00:00:28.839
mathematics, philosophy, and artificial intelligence.

00:00:29.199 --> 00:00:31.859
Exactly. And our mission today is to give you,

00:00:31.879 --> 00:00:34.920
the listener, the ultimate shortcut to understanding

00:00:34.920 --> 00:00:37.079
this foundational, almost science fiction level

00:00:37.079 --> 00:00:39.539
concept in computer science, which is automated

00:00:39.539 --> 00:00:41.590
reasoning. Yeah, it's a heavy topic, but we're

00:00:41.590 --> 00:00:43.729
going to break it down. We really are. And just

00:00:43.729 --> 00:00:45.789
to give you a sense of our roadmap, we are pulling

00:00:45.789 --> 00:00:48.329
from a really comprehensive Wikipedia article

00:00:48.329 --> 00:00:51.570
on automated reasoning today, along with several

00:00:51.570 --> 00:00:53.549
of its deep-cut reference materials. Some really

00:00:53.549 --> 00:00:55.789
fascinating sources in there. Oh, for sure. And

00:00:55.789 --> 00:00:57.750
for you listening, here is exactly why you need

00:00:57.750 --> 00:00:59.770
to care about this. We are living in a moment

00:00:59.770 --> 00:01:02.549
right now where everyone is using these AI chatbots.

00:01:02.789 --> 00:01:04.250
And well, we all know they hallucinate, right?

00:01:04.489 --> 00:01:07.150
Oh, constantly. Yeah, they just constantly make

00:01:07.150 --> 00:01:09.739
things up. Right. So understanding automated reasoning

00:01:09.739 --> 00:01:12.599
is the key to knowing how we are actually teaching

00:01:12.599 --> 00:01:15.159
machines not just to guess the next word in a

00:01:15.159 --> 00:01:19.200
sentence, but to, you know, actually think, deduce,

00:01:19.200 --> 00:01:22.620
and verify truths with absolute certainty. It's

00:01:22.620 --> 00:01:25.319
the antidote to the hallucination problem. Exactly.

00:01:25.319 --> 00:01:28.280
Okay, so let's unpack this. We need to set the

00:01:28.280 --> 00:01:31.519
foundation first. What exactly is this field

00:01:31.519 --> 00:01:34.299
even trying to achieve? Well, the primary goal

00:01:34.299 --> 00:01:37.180
of automated reasoning is to basically build

00:01:37.180 --> 00:01:39.579
computer programs that allow machines to reason

00:01:39.579 --> 00:01:41.659
completely automatically, or I mean, at least

00:01:41.659 --> 00:01:44.200
as close to completely as mechanically possible.

00:01:44.280 --> 00:01:45.739
So we're not just talking about a chatbot that

00:01:45.739 --> 00:01:48.019
sounds smart. No, no, not at all. We are talking

00:01:48.019 --> 00:01:50.879
about a machine that deduces what is mathematically

00:01:50.879 --> 00:01:54.120
and logically true. The most developed areas

00:01:54.120 --> 00:01:56.340
here are automated theorem proving, where the

00:01:56.340 --> 00:01:58.500
machine actually discovers a proof on its own,

00:01:58.819 --> 00:02:00.760
and automated proof checking. And proof checking

00:02:00.760 --> 00:02:03.159
is where the human does the work first. Exactly.

00:02:03.579 --> 00:02:05.719
A human provides the reasoning, and the machine

00:02:05.719 --> 00:02:08.460
rigorously verifies that absolutely no logical

00:02:08.460 --> 00:02:11.500
laws were broken. OK, so to do that, to actually

00:02:11.500 --> 00:02:14.240
automate reasoning, we have to define what perfect

00:02:14.240 --> 00:02:17.039
reasoning looks like, right? Which means we have

00:02:17.039 --> 00:02:20.150
to step completely away from how the human brain

00:02:20.150 --> 00:02:22.169
actually operates. Right, because human reasoning

00:02:22.169 --> 00:02:25.069
is just incredibly messy. I mean, it relies heavily

00:02:25.069 --> 00:02:28.250
on intuition. Yeah. A human mathematician will

00:02:28.250 --> 00:02:31.590
just, you know, skip a dozen minor logical steps

00:02:31.590 --> 00:02:33.770
in a proof because a connection just feels right

00:02:33.770 --> 00:02:36.650
or it looks obvious based on their past experience.

00:02:36.969 --> 00:02:38.590
You know, when I was reading the sources for

00:02:38.590 --> 00:02:40.610
this, it made me think of walking across a frozen

00:02:40.610 --> 00:02:44.039
lake. Oh, interesting. How so? Well, human intuition

00:02:44.039 --> 00:02:46.340
is like looking at the ice, seeing it look solid

00:02:46.340 --> 00:02:48.919
enough, and just walking across. You take a few

00:02:48.919 --> 00:02:50.599
steps. You don't fall in. So you just assume

00:02:50.599 --> 00:02:52.960
the whole lake is safe. Right. You extrapolate.

00:02:53.039 --> 00:02:57.479
Yeah. But formal logic, on the other hand, is

00:02:57.479 --> 00:03:00.259
like pausing to check the structural integrity

00:03:00.259 --> 00:03:03.479
of every single individual ice crystal before

00:03:03.479 --> 00:03:06.050
you are willing to move even an inch. That is

00:03:06.050 --> 00:03:08.430
a great analogy. And what's fascinating here

00:03:08.430 --> 00:03:11.689
is that a formal proof requires exactly that

00:03:11.689 --> 00:03:13.870
microscopic level of scrutiny. It's relentless.

00:03:14.229 --> 00:03:17.370
It really is. In a formal proof, every single

00:03:17.370 --> 00:03:19.669
logical inference has to be traced all the way

00:03:19.669 --> 00:03:21.710
back to the fundamental axioms of mathematics.

00:03:22.150 --> 00:03:24.590
You absolutely cannot skip steps. No intuitive

00:03:24.590 --> 00:03:27.849
leaps allowed. Zero. You cannot appeal to intuition.

00:03:28.110 --> 00:03:31.169
Even if translating a specific, you know, intuitive

00:03:31.169 --> 00:03:33.509
leap into formal logic would be totally routine

00:03:33.509 --> 00:03:36.590
and boring for a human mathematician, the machine

00:03:36.590 --> 00:03:39.330
demands the intermediate steps. It forces you

00:03:39.330 --> 00:03:41.810
to show your work. Exactly. Because by removing

00:03:41.810 --> 00:03:44.569
human intuition entirely, a formal proof strips

00:03:44.569 --> 00:03:47.150
away the exact places where logical errors like

00:03:47.150 --> 00:03:50.509
to hide. That makes total sense. Yeah. But logic

00:03:50.509 --> 00:03:52.490
isn't always just proving that two plus two is

00:03:52.490 --> 00:03:55.310
four, right? Like life and human thought involve

00:03:55.180 --> 00:03:57.659
a lot of gray areas. Does automated reasoning

00:03:57.659 --> 00:04:00.460
only deal with rigid black and white math? No,

00:04:00.460 --> 00:04:02.479
it goes much deeper than that, actually. The

00:04:02.479 --> 00:04:05.020
field also tackles how to reason by analogy.

00:04:05.219 --> 00:04:06.939
For instance, it works with induction, which

00:04:06.939 --> 00:04:10.379
is drawing a general rule from specific examples.

00:04:10.379 --> 00:04:13.219
OK. And also abduction. Abduction, like an alien

00:04:13.219 --> 00:04:17.019
abduction. Ha, no, logical abduction. It's essentially

00:04:17.019 --> 00:04:19.819
forming the most likely explanation for a set

00:04:19.819 --> 00:04:23.060
of observations, like a doctor diagnosing a patient

00:04:23.060 --> 00:04:25.750
based on a list of symptoms. Oh, I see. Yeah.

00:04:26.089 --> 00:04:28.129
And the sources also highlight something called

00:04:28.129 --> 00:04:31.129
non-monotonic reasoning. Okay, non-monotonic.

00:04:31.269 --> 00:04:32.689
I'm going to need you to break that down for

00:04:32.689 --> 00:04:35.250
me. Sure. So in classical logic, once you prove

00:04:35.250 --> 00:04:38.149
something is true, it stays true forever. Adding

00:04:38.149 --> 00:04:40.990
new information doesn't change it. That's monotonic.

00:04:41.410 --> 00:04:43.529
Right. Non-monotonic reasoning is closer to

00:04:43.529 --> 00:04:46.470
real life. It's when you might draw a logical

00:04:46.470 --> 00:04:48.889
conclusion, but then you learn a new piece of

00:04:48.889 --> 00:04:51.709
information that actually forces you to retract

00:04:51.709 --> 00:04:54.230
that earlier conclusion. So you basically change

00:04:54.230 --> 00:04:56.730
your mind based on new evidence. Right. Exactly.
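
NOTE
Editor's sketch, not from the episode: a toy Python illustration of
non-monotonic reasoning. The default rule "a bird flies unless a defeater is
known" and the names can_fly and DEFEATERS are hypothetical, chosen only here.
```python
# Facts that defeat the default conclusion "this bird can fly" (assumed list).
DEFEATERS = {"penguin", "injured"}
def can_fly(facts):
    """Default rule: a bird flies unless a known defeater is among the facts."""
    return "bird" in facts and not (set(facts) & DEFEATERS)
print(can_fly({"bird"}))             # conclusion drawn from the default rule
print(can_fly({"bird", "penguin"}))  # new information retracts that conclusion
```
Adding a fact shrinks what can be concluded, which classical (monotonic) logic never allows.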

00:04:56.850 --> 00:04:59.089
And a brilliant example of this from the reference

00:04:59.089 --> 00:05:02.040
materials is John Pollock's OSCAR system. OSCAR?

00:05:02.220 --> 00:05:04.779
Yeah, OSCAR is an automated argumentation system.

00:05:05.120 --> 00:05:07.519
So instead of just proving a rigid math theorem,

00:05:08.040 --> 00:05:10.740
it applies constraints of minimality and consistency

00:05:10.740 --> 00:05:14.180
to literally argue with itself. Wait, it argues

00:05:14.180 --> 00:05:17.079
with itself? Yeah, it builds what's called a

00:05:17.079 --> 00:05:19.930
defeasible argument. Defeasible meaning, you

00:05:19.930 --> 00:05:22.430
know, an argument that can be defeated. The system

00:05:22.430 --> 00:05:24.589
generates a claim and then it systematically

00:05:24.589 --> 00:05:27.149
searches its own database for a counter example

00:05:27.149 --> 00:05:30.649
to destroy its own claim. That's wild. It acts

00:05:30.649 --> 00:05:33.129
as both the prosecution and the defense, weighing

00:05:33.129 --> 00:05:35.910
evidence under uncertain conditions. Honestly,

00:05:35.930 --> 00:05:38.589
that sounds computationally exhausting. I mean,

00:05:38.589 --> 00:05:41.470
checking every single mathematical ice crystal

00:05:41.470 --> 00:05:44.269
or having a machine constantly prosecute and

00:05:44.269 --> 00:05:46.170
defend its own thoughts. That's a massive amount

00:05:46.170 --> 00:05:48.439
of processing. It really makes me wonder what

00:05:48.439 --> 00:05:51.060
forced early computer scientists to even attempt

00:05:51.060 --> 00:05:54.120
this. Did they just like get tired of doing manual

00:05:54.120 --> 00:05:57.319
math? Well, sort of, yeah. The sheer physical

00:05:57.319 --> 00:05:59.620
limitation of the human brain to hold massive

00:05:59.620 --> 00:06:02.079
logical structures without making a single mistake

00:06:02.079 --> 00:06:04.439
was a huge driver. That makes sense. But the

00:06:04.439 --> 00:06:06.680
actual birth of the field is heavily debated.

00:06:07.300 --> 00:06:09.319
Some historians point back to Martin Davis in

00:06:09.319 --> 00:06:12.839
1954. He implemented a concept called Presburger's

00:06:12.839 --> 00:06:15.500
Decision Procedure, and his program successfully

00:06:15.500 --> 00:06:18.439
proved that the sum of two even numbers is even.
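
NOTE
Editor's sketch, not from the episode: the statement above written as a
machine-checkable Lean 4 theorem. Writing the two even numbers as 2 * a and
2 * b is an assumption of this example, and the theorem name is hypothetical.
This is not Davis's 1954 program, only a modern restatement of what it proved.
```lean
-- The sum of two even numbers 2 * a and 2 * b is again of the form 2 * k.
theorem even_add_even (a b : Nat) : 2 * a + 2 * b = 2 * (a + b) :=
  (Nat.mul_add 2 a b).symm
```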

00:06:18.639 --> 00:06:22.170
Okay, 1954. Right. But others argue the field

00:06:22.170 --> 00:06:24.730
really crystallized a few years later at this

00:06:24.730 --> 00:06:28.009
really famous Cornell summer meeting in 1957

00:06:28.009 --> 00:06:30.709
that gathered logicians and computer scientists

00:06:30.709 --> 00:06:32.529
together. You know, the project that really caught

00:06:32.529 --> 00:06:35.490
my eye right in that same window was the Logic

00:06:35.490 --> 00:06:38.810
Theorist program from 1956. Oh, yes. A classic.

00:06:39.110 --> 00:06:42.529
The creators, Allen Newell, Cliff Shaw and Herbert

00:06:42.529 --> 00:06:45.689
Simon, they pointed their machine at this massive

00:06:45.689 --> 00:06:48.589
book called the Principia Mathematica. What was

00:06:48.589 --> 00:06:51.470
the deal with that specific book? Well, the Principia

00:06:51.470 --> 00:06:55.069
Mathematica was a monumental multi-volume masterwork

00:06:55.069 --> 00:06:58.550
written between 1910 and 1913 by Alfred North

00:06:58.550 --> 00:07:01.029
Whitehead and Bertrand Russell. OK. Their whole

00:07:01.029 --> 00:07:03.930
goal was to derive all mathematical truths purely

00:07:03.930 --> 00:07:07.170
in terms of symbolic logic. It was widely considered

00:07:07.170 --> 00:07:09.970
the absolute Mount Everest of formal logic. So

00:07:09.970 --> 00:07:11.910
naturally, you point the new computer at Mount Everest.

00:07:12.129 --> 00:07:15.350
Exactly. So Newell, Shaw, and Simon fed 52 theorems

00:07:15.350 --> 00:07:17.370
from chapter two of this absolute beast of a

00:07:17.370 --> 00:07:19.689
book into their Logic Theorist program. And the

00:07:19.689 --> 00:07:22.300
machine proved 38 of them. It did. But here's

00:07:22.300 --> 00:07:25.379
the part that I just couldn't believe. The Logic

00:07:25.379 --> 00:07:28.000
Theorist didn't just mimic the human proofs that

00:07:28.000 --> 00:07:30.560
Whitehead and Russell had published. For one

00:07:30.560 --> 00:07:33.120
of those theorems, the machine actually found

00:07:33.120 --> 00:07:35.980
a proof that was more elegant. Yeah, that's the

00:07:35.980 --> 00:07:38.959
real kicker. Like it required fewer steps and

00:07:38.959 --> 00:07:41.420
was more logically beautiful than the proof provided

00:07:41.420 --> 00:07:43.740
by two of the greatest minds of the 20th century.

00:07:43.930 --> 00:07:47.449
It totally out-logicked its human masters. It found

00:07:47.449 --> 00:07:50.209
a cleaner, purely mechanical path to the truth.

00:07:50.529 --> 00:07:52.449
And the funny thing is, the creators actually

00:07:52.449 --> 00:07:54.589
had trouble getting those results published in

00:07:54.589 --> 00:07:56.629
human mathematics journals at first. They were

00:07:56.629 --> 00:07:59.269
probably defensive. Oh, almost certainly. But

00:07:59.269 --> 00:08:01.970
that success deeply impacted how those scientists

00:08:01.970 --> 00:08:06.009
viewed the future. I mean, by 1958, Newell, Shaw,

00:08:06.069 --> 00:08:08.250
and Simon published a paper containing a quote

00:08:08.250 --> 00:08:11.089
that is just dripping with audacity. Let's hear

00:08:11.089 --> 00:08:14.339
it. They wrote, and I quote, There are now in

00:08:14.339 --> 00:08:16.939
the world machines that think, that learn, and

00:08:16.939 --> 00:08:20.060
that create. Moreover, their ability to do these

00:08:20.060 --> 00:08:22.519
things is going to increase rapidly until the

00:08:22.519 --> 00:08:25.060
range of problems they can handle will be coextensive

00:08:25.060 --> 00:08:27.360
with the range to which the human mind has been

00:08:27.360 --> 00:08:30.220
applied. Wow. They were essentially calling their

00:08:30.220 --> 00:08:33.820
shot in 1958, predicting the entire future of

00:08:33.820 --> 00:08:36.879
AI based on this one breakthrough. They completely

00:08:36.879 --> 00:08:39.539
saw the horizon. But you know, here's where it

00:08:39.539 --> 00:08:41.200
gets really interesting for me though. I have

00:08:41.200 --> 00:08:43.320
to push back a little on what this machine is

00:08:43.320 --> 00:08:45.529
actually doing. Okay, go for it. Aren't these

00:08:45.529 --> 00:08:48.070
early programs, and I mean even the modern ones,

00:08:48.330 --> 00:08:51.149
aren't they just glorified calculators? Like,

00:08:51.149 --> 00:08:54.230
how does a program actually prove a philosophical

00:08:54.230 --> 00:08:56.970
or mathematical concept without just crunching

00:08:56.970 --> 00:08:58.970
numbers really fast? That's a great question.

00:08:59.129 --> 00:09:01.149
And the distinction really lies in the difference

00:09:01.149 --> 00:09:04.389
between arithmetic and formal logic. A calculator

00:09:04.389 --> 00:09:06.690
takes numerical inputs and runs them through

00:09:06.690 --> 00:09:09.590
a hard-wired arithmetic operation. It just computes.

00:09:09.710 --> 00:09:12.190
Right. Numbers in, numbers out. Exactly. But

00:09:12.190 --> 00:09:15.580
automated reasoning systems manipulate symbols. They take

00:09:15.580 --> 00:09:18.440
abstract logical concepts and apply a strict

00:09:18.440 --> 00:09:20.679
set of rules to transform those symbols into

00:09:20.679 --> 00:09:23.480
a verified truth. They don't just calculate a

00:09:23.480 --> 00:09:25.879
result, they generate the actual chain of reasoning.
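
NOTE
Editor's sketch, not from the episode: a tiny Python forward-chaining loop
that applies one inference rule (modus ponens) to symbols and records every
step, so the output is a chain of reasoning rather than a number. The function
name and the symbols p, q, r, s are hypothetical placeholders.
```python
def forward_chain(facts, rules):
    """Apply modus ponens until nothing new follows; record each derivation step."""
    known = set(facts)
    steps = []
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            # Fire a rule only when all its premises are already established.
            if conclusion not in known and set(premises) <= known:
                known.add(conclusion)
                steps.append((tuple(premises), conclusion))
                changed = True
    return known, steps
# Rules read "if all premises hold, the conclusion holds".
rules = [(("p",), "q"), (("q", "r"), "s")]
known, steps = forward_chain({"p", "r"}, rules)
for premises, conclusion in steps:
    print(" and ".join(premises), "therefore", conclusion)
```
The recorded steps are the proof: each line can be checked independently against the rules.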

00:09:26.279 --> 00:09:28.460
OK, so if I am a computer scientist trying to

00:09:28.460 --> 00:09:31.559
build one of these logic engines, what kind of

00:09:31.559 --> 00:09:33.500
programming language am I even using to talk

00:09:33.500 --> 00:09:35.480
to the machine? Like, it can't just be standard

00:09:35.480 --> 00:09:38.860
code. No. You are using highly specialized languages

00:09:38.860 --> 00:09:41.419
designed specifically for symbolic manipulation.

00:09:42.259 --> 00:09:44.200
Let's look at a major system from the sources.

00:09:44.730 --> 00:09:47.970
the Boyer-Moore theorem prover, also known as

00:09:47.970 --> 00:09:53.110
NQTHM. NQTHM. Got it. This started in 1971 in

00:09:53.110 --> 00:09:55.549
Edinburgh, and it was built entirely in Pure

00:09:55.549 --> 00:09:58.730
Lisp, which is a very unique, highly recursive

00:09:58.730 --> 00:10:02.190
programming language. Good. And NQTHM had this

00:10:02.190 --> 00:10:05.009
fascinating mechanism for learning. It utilized

00:10:05.009 --> 00:10:07.429
an induction heuristic based entirely on the

00:10:07.429 --> 00:10:09.809
failure of its own symbolic evaluations. How

00:10:09.809 --> 00:10:12.029
does a machine logically learn from a failure?

00:10:12.240 --> 00:10:14.240
Well, imagine the system is trying to prove a

00:10:14.240 --> 00:10:16.240
rule that applies to all numbers. It tries to

00:10:16.240 --> 00:10:18.440
calculate the steps normally, but it hits a wall

00:10:18.440 --> 00:10:20.559
where a variable is unknown and it just can't

00:10:20.559 --> 00:10:22.879
proceed. Like a dead end. Right. Now a normal

00:10:22.879 --> 00:10:25.240
program would just crash or throw an error, but

00:10:25.240 --> 00:10:27.659
NQTHM was designed to analyze the shape of

00:10:27.659 --> 00:10:30.059
that dead end. The shape of it. Yeah, it looked

00:10:30.059 --> 00:10:32.559
at the exact point where the evaluation failed,

00:10:32.960 --> 00:10:35.399
and it used the structure of that failure to

00:10:35.399 --> 00:10:38.399
guess the underlying mathematical pattern, formulating

00:10:38.399 --> 00:10:41.059
a totally new logical step to bypass the wall.

00:10:41.320 --> 00:10:43.539
Oh, wow. So it literally mapped its own dead

00:10:43.539 --> 00:10:46.220
ends to find the open door. Exactly. That is

00:10:46.220 --> 00:10:47.659
brilliant. But let's bring this into the real

00:10:47.659 --> 00:10:49.820
world for a second, because if we're talking

00:10:49.820 --> 00:10:52.559
about trusting software to, I don't know, fly

00:10:52.559 --> 00:10:54.779
a commercial airplane or manage a national power

00:10:54.779 --> 00:10:57.000
grid, we can't just have a system making good

00:10:57.000 --> 00:10:59.440
guesses. We need absolute certainty. Which is

00:10:59.440 --> 00:11:02.399
exactly where a system like Rocq comes in. Rocq,

00:11:02.399 --> 00:11:05.139
spelled R-O-C-Q. Right. Yes, formerly known

00:11:05.139 --> 00:11:07.759
as Coq. This system was developed in France

00:11:07.759 --> 00:11:10.779
and represents just a massive leap forward. Rocq

00:11:10.779 --> 00:11:12.659
operates on something called the Calculus of

00:11:12.659 --> 00:11:16.159
Inductive Constructions, or CIC. Okay, CIC. In

00:11:16.159 --> 00:11:19.019
the CIC framework, a mathematical proof and a

00:11:19.019 --> 00:11:21.440
computer program are treated as the exact same

00:11:21.440 --> 00:11:23.759
mathematical object. Wait, hold on. A proof and

00:11:23.759 --> 00:11:26.269
a program are the same thing. In this specific

00:11:26.269 --> 00:11:29.090
architecture, yes. And because of this, Rocq

00:11:29.090 --> 00:11:31.970
can do something incredible. Let's say you write

00:11:31.970 --> 00:11:34.289
a logical specification proving mathematically

00:11:34.289 --> 00:11:36.929
that a specific sorting algorithm is flawless.
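
NOTE
Editor's sketch, not from the episode: the "a proof is a program" idea shown
in Lean 4, used here only as a stand-in for Rocq's own language. The theorem
name is hypothetical. The proof of "p and q implies q and p" is literally a
function that takes a pair of proofs and returns the pair swapped.
```lean
-- The proposition is the type; the function on the right is its proof.
theorem and_swap' (p q : Prop) : p ∧ q → q ∧ p :=
  fun ⟨hp, hq⟩ => ⟨hq, hp⟩
```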

00:11:37.090 --> 00:11:39.389
Okay. Rocq doesn't just give you a checkmark

00:11:39.389 --> 00:11:42.470
saying your math is right. It mechanically extracts

00:11:42.470 --> 00:11:45.269
that theoretical proof and translates it directly

00:11:45.269 --> 00:11:48.750
into actual executable source code like OCaml

00:11:48.750 --> 00:11:51.649
or Haskell. Are you serious? So you do the pure

00:11:51.649 --> 00:11:53.909
math and the system literally spits out the working

00:11:53.909 --> 00:11:56.799
software? Yes. And because the software was extracted

00:11:56.799 --> 00:11:59.620
mechanically from a formal logical proof, it

00:11:59.620 --> 00:12:01.940
is physically impossible for that resulting code

00:12:01.940 --> 00:12:04.840
to have a logical bug. It is strictly guaranteed

00:12:04.840 --> 00:12:07.419
to behave exactly as specified. That completely

00:12:07.419 --> 00:12:09.860
blows my mind. What is the limit to this? How

00:12:09.860 --> 00:12:12.399
complex of a problem can these systems actually

00:12:12.399 --> 00:12:15.269
handle? The milestones are pretty staggering,

00:12:15.850 --> 00:12:19.029
honestly. Back in 1986, a researcher named Natarajan

00:12:19.029 --> 00:12:21.190
Shankar used the Boyer-Moore system we talked

00:12:21.190 --> 00:12:24.230
about to formalize Gödel's first incompleteness

00:12:24.230 --> 00:12:27.230
theorem. Which is huge. Massive. And by 2004,

00:12:27.490 --> 00:12:29.830
the Rocq system was used to verify the four-color

00:12:29.830 --> 00:12:32.970
theorem. But one of the most wild examples from

00:12:32.970 --> 00:12:36.570
the sources happened in 2016 regarding the Boolean

00:12:36.570 --> 00:12:40.110
Pythagorean triples problem. Researchers formalized

00:12:40.110 --> 00:12:42.929
this as a Boolean Satisfiability Problem, or

00:12:42.929 --> 00:12:45.769
a SAT problem. I saw the term SAT problem in

00:12:45.769 --> 00:12:48.000
the reading. What actually is that? Basically,

00:12:48.259 --> 00:12:51.019
a SAT problem asks if there is any possible way

00:12:51.019 --> 00:12:53.799
to assign true or false to a massive set of variables

00:12:53.799 --> 00:12:56.820
so that the entire overall equation evaluates

00:12:56.820 --> 00:12:59.500
to true. OK. True or false logic gates. Right.
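
NOTE
Editor's sketch, not from the episode: a brute-force satisfiability check in
Python. The encoding is an assumption of this example: each clause is a list
of integers, where 2 means "variable 2 is true" and -2 means "variable 2 is
false", and a formula is satisfied when every clause has at least one true
literal. Real SAT solvers prune the search instead of trying every assignment.
```python
from itertools import product
def brute_force_sat(clauses, num_vars):
    """Return the first satisfying assignment as a tuple of bools, or None."""
    for assignment in product([False, True], repeat=num_vars):
        # A literal is true when the variable's value matches the literal's sign.
        if all(any(assignment[abs(lit) - 1] == (lit > 0) for lit in clause)
               for clause in clauses):
            return assignment
    return None
# (x1 or x2) and (not x1 or x2) and (not x2 or x3)
print(brute_force_sat([[1, 2], [-1, 2], [-2, 3]], 3))
# x1 and (not x1) can never both hold, so no assignment exists.
print(brute_force_sat([[1], [-1]], 1))
```
With n variables there are 2 to the power n assignments to try, which is why naive search stops scaling quickly.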

00:12:59.639 --> 00:13:01.639
For the Pythagorean problem, imagine you have

00:13:01.639 --> 00:13:04.340
to color every whole number either red or blue.

00:13:04.820 --> 00:13:07.340
The rule is that no three numbers that satisfy

00:13:07.340 --> 00:13:09.460
the Pythagorean theorem, like three, four, and

00:13:09.460 --> 00:13:12.259
five, can all be the same color. OK. False. So

00:13:12.259 --> 00:13:13.919
mathematicians wanted to know if you could color

00:13:13.919 --> 00:13:16.000
numbers this way forever without breaking the

00:13:16.000 --> 00:13:18.399
rule. And how did the automated reasoner tackle

00:13:18.399 --> 00:13:21.460
that? By using pure brute force logic combined

00:13:21.460 --> 00:13:23.600
with some really smart pruning techniques, the

00:13:23.600 --> 00:13:25.519
system checked combinations all the way up to

00:13:25.519 --> 00:13:29.379
the number 7,824 and proved that, yes, you can

00:13:29.379 --> 00:13:31.580
color them without breaking the rule. But at

00:13:31.580 --> 00:13:34.720
the number 7,825, it becomes mathematically

00:13:34.720 --> 00:13:37.320
impossible. And to prove this, the system generated

00:13:37.320 --> 00:13:40.100
a proof file that was 200 terabytes in size.
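
NOTE
Editor's sketch, not from the episode: the coloring rule on a toy scale, in
Python. This plain backtracking search over the numbers 1 to 30 is only an
illustration of the rule, not the SAT-based method used for the 2016 result,
and it would not scale to 7,825. All function names are hypothetical, and the
colors red and blue are written as 0 and 1.
```python
from itertools import combinations
def pythagorean_triples(limit):
    """All (a, b, c) with a < b < c <= limit and a*a + b*b == c*c."""
    roots = {n * n: n for n in range(1, limit + 1)}
    return [(a, b, roots[a * a + b * b])
            for a, b in combinations(range(1, limit + 1), 2)
            if a * a + b * b in roots]
def is_valid(coloring, triples):
    """True if no triple has all three numbers in the same color."""
    return all(len({coloring[a], coloring[b], coloring[c]}) > 1 for a, b, c in triples)
def find_coloring(limit):
    """Backtracking search for a 0/1 coloring of 1..limit, or None if none exists."""
    legs_by_hypotenuse = {}
    for a, b, c in pythagorean_triples(limit):
        legs_by_hypotenuse.setdefault(c, []).append((a, b))
    coloring = {}
    def extend(n):
        if n > limit:
            return True
        for color in (0, 1):
            # n is the largest member of every triple checked here, so both legs are colored.
            if all(not (coloring[a] == coloring[b] == color)
                   for a, b in legs_by_hypotenuse.get(n, [])):
                coloring[n] = color
                if extend(n + 1):
                    return True
                del coloring[n]
        return False
    return coloring if extend(1) else None
coloring = find_coloring(30)
print(is_valid(coloring, pythagorean_triples(30)))
```
The separate is_valid check mirrors the episode's point: finding a coloring is hard, checking one is mechanical.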

00:13:40.320 --> 00:13:43.460
200 terabytes. Yeah. No human being could ever

00:13:43.460 --> 00:13:45.820
read that proof in a single lifetime. Not even

00:13:45.769 --> 00:13:48.330
close. But we trust the result because we mathematically

00:13:48.330 --> 00:13:50.470
trust the logic engine that generated it. But

00:13:50.470 --> 00:13:52.970
wait, if we had the Logic Theorist outperforming

00:13:52.970 --> 00:13:56.490
humans back in 1956, and we are generating 200

00:13:56.490 --> 00:14:00.210
terabyte proofs today, why hasn't automated reasoning

00:14:00.210 --> 00:14:02.490
just been the dominant force in tech for the

00:14:02.490 --> 00:14:06.090
last 70 years? I mean, the reality of technological

00:14:06.090 --> 00:14:08.590
progress is rarely a straight line. What happened

00:14:08.590 --> 00:14:10.779
in between? Yeah, it definitely wasn't a straight

00:14:10.779 --> 00:14:13.580
line. The field actually hit a massive wall in

00:14:13.580 --> 00:14:16.559
the 1980s and early 1990s. This period is widely

00:14:16.559 --> 00:14:19.179
known as the AI winter. Right, the AI winter.

00:14:19.379 --> 00:14:21.740
The theoretical math was totally sound, but the

00:14:21.740 --> 00:14:24.120
hardware simply couldn't handle the combinatorial

00:14:24.120 --> 00:14:26.440
explosion of data required to run these proofs

00:14:26.440 --> 00:14:28.980
at scale. Funding dried up. So they just got

00:14:28.980 --> 00:14:31.250
stuck waiting for better computers. Pretty much.

00:14:31.710 --> 00:14:33.830
But the researchers didn't stop. They basically

00:14:33.830 --> 00:14:36.269
went into hibernation and optimized their logic

00:14:36.269 --> 00:14:39.529
engines. And the rebound became highly visible

00:14:39.529 --> 00:14:42.889
by 2005. That's when Microsoft started integrating

00:14:42.889 --> 00:14:45.830
automated verification technology directly into

00:14:45.830 --> 00:14:48.389
their internal projects. Oh, interesting. Yeah,

00:14:48.389 --> 00:14:50.669
they even incorporated logical checking into

00:14:50.669 --> 00:14:54.110
the 2012 version of Visual C. So the technology

00:14:54.110 --> 00:14:56.950
moved out of academic labs and straight into

00:14:56.950 --> 00:14:59.509
the software compiling tools used by millions

00:14:59.509 --> 00:15:02.110
of regular developers. So what does this all

00:15:02.110 --> 00:15:04.470
mean for us today right in the middle of this

00:15:04.470 --> 00:15:08.529
massive boom in generative AI? Well in the 2020s

00:15:08.529 --> 00:15:11.210
the focus has violently shifted back to automated

00:15:11.210 --> 00:15:14.190
reasoning as the ultimate cure for the AI hallucination

00:15:14.190 --> 00:15:15.710
problem we talked about at the start. Bringing

00:15:15.710 --> 00:15:18.549
it full circle. Exactly. Large language models

00:15:18.549 --> 00:15:20.750
guess the next word based on statistical patterns.

00:15:20.850 --> 00:15:23.159
They don't actually understand truth. So to fix

00:15:23.159 --> 00:15:25.220
this, researchers are building reasoning language

00:15:25.220 --> 00:15:28.039
models like DeepSeek R1, which are designed to

00:15:28.039 --> 00:15:30.519
literally spend additional computational time

00:15:30.519 --> 00:15:32.840
pondering a problem before they generate a single

00:15:32.840 --> 00:15:35.460
word of output. But how does spending more time

00:15:35.460 --> 00:15:39.500
pondering actually fix the hallucination fundamentally?

00:15:40.259 --> 00:15:42.080
Like, isn't it just thinking about a wrong answer

00:15:42.080 --> 00:15:44.899
longer? If we connect this to the bigger picture,

00:15:45.220 --> 00:15:47.320
it leads us to the absolute bleeding edge of

00:15:47.320 --> 00:15:50.639
the field. Neuro-symbolic architectures. This

00:15:50.639 --> 00:15:52.919
is the marriage of two completely different eras

00:15:52.919 --> 00:15:56.480
of computer science. Okay, neuro-symbolic. So

00:15:56.480 --> 00:15:58.440
the neuro part being the modern neural networks

00:15:58.440 --> 00:16:01.120
and the symbolic part being the formal logic

00:16:01.120 --> 00:16:03.440
engines like Rocq that we just discussed. You

00:16:03.440 --> 00:16:06.000
nailed it! It sounds kind of like pairing a wildly

00:16:06.000 --> 00:16:09.539
creative architect who dreams up impossible

00:16:09.539 --> 00:16:12.659
gravity-defying buildings with a ruthless safety inspector

00:16:12.659 --> 00:16:15.120
who checks the math on every single load-bearing

00:16:15.120 --> 00:16:17.620
beam before a single brick is laid. That is a

00:16:17.620 --> 00:16:19.480
perfect way to look at it. The neural network

00:16:19.480 --> 00:16:22.019
provides the intuition, the pattern recognition,

00:16:22.299 --> 00:16:24.960
and those creative leaps. It drafts the blueprint.

00:16:25.179 --> 00:16:27.259
Right. But before that answer is ever shown to

00:16:27.259 --> 00:16:30.320
you, the symbolic logic engine takes over. It

00:16:30.320 --> 00:16:32.559
attempts to formally prove the neural network's

00:16:32.559 --> 00:16:35.120
generated answer using strict mathematical axioms.

00:16:35.139 --> 00:16:37.639
And if it fails? If the safety inspector finds

00:16:37.639 --> 00:16:41.259
a logical flaw, it rejects the answer and forces

00:16:41.259 --> 00:16:44.309
the neural network to try again. The entire goal

00:16:44.309 --> 00:16:47.590
is to give AI a built-in, mathematically flawless

00:16:47.590 --> 00:16:51.090
lie detector. Man, it really is the invisible

00:16:51.090 --> 00:16:53.269
safety net for our modern digital infrastructure.
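
NOTE
Editor's sketch, not from the episode: the propose-and-verify loop described
above, reduced to a few lines of Python. The fixed list of guesses stands in
for a neural network's proposals and the exact arithmetic check stands in for
a formal verifier; both, and all names here, are hypothetical placeholders.
```python
def verified_answer(candidates, verify):
    """Return the first candidate the checker accepts, or None if all are rejected."""
    for candidate in candidates:
        if verify(candidate):
            return candidate
    return None
def is_factorization_of_91(pair):
    """Exact check: both factors are nontrivial and multiply to 91."""
    p, q = pair
    return p > 1 and q > 1 and p * q == 91
# Stand-in for a proposer that is fluent but sometimes wrong.
guesses = [(9, 10), (7, 12), (7, 13)]
print(verified_answer(guesses, is_factorization_of_91))
```
Only answers that pass the checker are ever returned, so wrong guesses cost time but never reach the user.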

00:16:53.409 --> 00:16:56.250
Yeah. Because as we discussed, we are handing

00:16:56.250 --> 00:16:59.230
over control of power grids, financial markets,

00:16:59.330 --> 00:17:02.250
and aviation software to machines. If there is

00:17:02.250 --> 00:17:04.630
a logical bug in those systems, the consequences

00:17:04.630 --> 00:17:07.349
are just catastrophic. Absolutely. And to keep

00:17:07.349 --> 00:17:09.609
improving that safety net, the community maintains

00:17:09.609 --> 00:17:12.589
this massive, constantly updated library called

00:17:12.589 --> 00:17:16.289
the TPTP, the Thousands of Problems for Theorem

00:17:16.289 --> 00:17:18.480
Provers. Library of problems. Yeah. It provides

00:17:18.480 --> 00:17:20.500
a standardized battleground for these systems.

00:17:20.980 --> 00:17:22.859
There are even regular competitions held at the

00:17:22.859 --> 00:17:25.220
CADE conference where different automated theorem

00:17:25.220 --> 00:17:27.019
provers from all around the world compete to

00:17:27.019 --> 00:17:29.640
see which system can solve the most complex logic

00:17:29.640 --> 00:17:33.000
puzzles from the TPTP library. It is literally

00:17:33.000 --> 00:17:35.299
the Olympics for artificial logic. It really

00:17:35.299 --> 00:17:38.380
is. I love that so much. An international competition

00:17:38.380 --> 00:17:42.039
to build a sharper, faster truth machine. As

00:17:42.039 --> 00:17:44.220
we wrap up this deep dive, I want to turn it over

00:17:44.220 --> 00:17:46.900
to you for the final word, looking at this entire

00:17:46.900 --> 00:17:50.200
journey from the 1950s to today. What is the

00:17:50.200 --> 00:17:51.839
biggest takeaway the listener should be chewing

00:17:51.839 --> 00:17:54.500
on? You know, I keep going back to that initial

00:17:54.500 --> 00:17:57.279
shock in 1956 with the Logic Theorist. Yeah,

00:17:57.279 --> 00:18:00.200
that was wild. The machine found a more elegant

00:18:00.200 --> 00:18:02.400
mathematical proof than Whitehead and Russell,

00:18:02.759 --> 00:18:05.339
and it did it by mechanically applying logical

00:18:05.339 --> 00:18:08.339
rules without using a single drop of human intuition.

00:18:08.559 --> 00:18:10.960
Right. If a machine can create a more elegant

00:18:10.960 --> 00:18:14.019
truth through sheer mechanical deduction, it

00:18:14.019 --> 00:18:16.220
really forces us to ask a rather uncomfortable

00:18:16.220 --> 00:18:18.920
question. Which is? Is human intuition actually

00:18:18.920 --> 00:18:21.319
the magical, irreplaceable source of creativity

00:18:21.319 --> 00:18:24.549
we think it is? Or is intuition just a biological

00:18:24.549 --> 00:18:27.450
shortcut, a messy adaptation our brains use simply

00:18:27.450 --> 00:18:29.430
because we lack the processing power to run the

00:18:29.430 --> 00:18:32.789
formal logic? Are our greatest aha moments just workarounds

00:18:32.789 --> 00:18:37.089
for our biological hardware limits? That is a

00:18:37.089 --> 00:18:39.309
wild thought to leave on. Thank you so much for

00:18:39.309 --> 00:18:40.890
joining us on this journey into the invisible

00:18:40.890 --> 00:18:43.529
logic running our world. Keep questioning your

00:18:43.529 --> 00:18:45.690
own assumptions. Keep checking those ice crystals.

00:18:45.970 --> 00:18:48.089
We will catch you on the next Deep Dive.
