WEBVTT

00:00:00.000 --> 00:00:02.259
Imagine an AI that doesn't just, you know, assist

00:00:02.259 --> 00:00:04.320
you, but actually builds things. You give it

00:00:04.320 --> 00:00:07.360
a single prompt, and poof! Maybe a video game

00:00:07.360 --> 00:00:10.000
appears. Well, that's not quite science fiction

00:00:10.000 --> 00:00:12.970
anymore. Welcome to the Deep Dive. We're here

00:00:12.970 --> 00:00:15.250
to unpack the technologies really reshaping our

00:00:15.250 --> 00:00:19.089
world. Today, we are going deep on OpenAI's GPT

00:00:19.089 --> 00:00:21.870
-5. Our mission: to figure out what makes this

00:00:21.870 --> 00:00:24.329
feel less like just an update and more like a,

00:00:24.329 --> 00:00:26.609
well, a genuine paradigm shift for automation

00:00:26.609 --> 00:00:28.710
and maybe beyond. That's right. We're going to

00:00:28.710 --> 00:00:31.969
explore what GPT -5 is. You can think of it almost

00:00:31.969 --> 00:00:35.030
like a family of specialized AI brains. We'll

00:00:35.030 --> 00:00:37.549
break down some, frankly, jaw -dropping benchmark

00:00:37.549 --> 00:00:40.329
results. And uncover these surprising one -shot

00:00:40.329 --> 00:00:43.170
creation abilities it seems to have. And then

00:00:43.170 --> 00:00:44.869
we'll pivot. We'll look at the practical side,

00:00:44.990 --> 00:00:47.090
what this means for automation, for real -world

00:00:47.090 --> 00:00:49.170
applications. We want to offer you a kind of

00:00:49.170 --> 00:00:51.570
strategic playbook for using this power efficiently.

00:00:52.009 --> 00:00:54.130
And, like always, we'll wrap up by looking at

00:00:54.130 --> 00:00:56.350
the future this tech is carving out. Our approach

00:00:56.350 --> 00:01:00.310
today, calm, curious, focused on really understanding

00:01:00.310 --> 00:01:02.950
this new frontier. Let's get into it. Okay, let's

00:01:02.950 --> 00:01:04.930
unpack this then. The source material we looked

00:01:04.930 --> 00:01:08.739
at calls GPT -5 a seismic event. When we hear

00:01:08.739 --> 00:01:11.209
seismic event, I mean, what really makes this

00:01:11.209 --> 00:01:14.530
more than just, say, GPT -4 .5? What's fundamentally

00:01:14.530 --> 00:01:17.290
different here? Well, Sam Altman, OpenAI's CEO,

00:01:17.590 --> 00:01:19.730
he put it pretty well, I think. He basically

00:01:19.730 --> 00:01:23.170
described the leap from 4 to 5 as the difference

00:01:23.170 --> 00:01:25.450
between talking to a really smart college student,

00:01:25.530 --> 00:01:27.489
that was GPT -4, and talking to a PhD -level

00:01:27.489 --> 00:01:30.430
expert, that's GPT -5. He said GPT -3 was maybe

00:01:30.430 --> 00:01:32.890
like a high schooler. GPT -4, the college student.

00:01:32.969 --> 00:01:35.730
But GPT -5, it acts like a true domain expert,

00:01:35.829 --> 00:01:37.730
pretty much across any field you throw at it.

00:01:37.769 --> 00:01:40.349
A PhD -level expert. Okay, that's quite a jump.

00:01:40.469 --> 00:01:42.829
It really is. And for people actually building

00:01:42.829 --> 00:01:45.269
things, you know, building automation, it boils

00:01:45.269 --> 00:01:47.769
down to three critical improvements. First, just

00:01:47.769 --> 00:01:50.469
better reasoning and problem solving. GPT -5

00:01:50.469 --> 00:01:52.609
can work through these really complex multi -step

00:01:52.609 --> 00:01:55.930
problems with an accuracy that we just haven't

00:01:55.930 --> 00:01:58.569
seen before. Second, much better tool integration,

00:01:58.890 --> 00:02:01.409
its ability to understand and use external tools,

00:02:01.549 --> 00:02:04.349
APIs, which is vital for any real -world automation,

00:02:04.349 --> 00:02:07.349
has dramatically improved. And third, this

00:02:07.349 --> 00:02:09.490
is a big one, the cost efficiency is kind of

00:02:09.490 --> 00:02:12.610
mind -bending. The input tokens for the main

00:02:12.610 --> 00:02:16.189
GPT -5 model, they cost half of what GPT -4's

00:02:16.189 --> 00:02:19.629
did. We're talking $1.25 per million input tokens

00:02:19.629 --> 00:02:22.669
for GPT -5 standard. That suddenly makes enterprise

00:02:22.669 --> 00:02:25.669
-grade AI really accessible. So it isn't just

00:02:25.669 --> 00:02:27.849
about raw power then, but more how it thinks

00:02:27.849 --> 00:02:29.909
and how it connects things. Exactly. It's about

00:02:29.909 --> 00:02:32.129
deeper understanding and real practical application.
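To put that pricing in concrete terms, here is a rough cost sketch. The $1.25-per-million input-token rate comes from the discussion above; the output-token rate and the token counts in the example are illustrative assumptions, not published figures.

```python
# Rough per-request API cost estimator. The input rate matches the
# $1.25/M figure mentioned above; the output rate is an assumption.
def estimate_cost(input_tokens: int, output_tokens: int,
                  input_rate_per_m: float = 1.25,
                  output_rate_per_m: float = 10.00) -> float:
    """Estimated USD cost of one API call at per-million-token rates."""
    return (input_tokens / 1_000_000) * input_rate_per_m \
         + (output_tokens / 1_000_000) * output_rate_per_m

# Example: a workflow step that sends ~2,000 tokens in and gets ~800 back.
per_run = estimate_cost(2_000, 800)
print(f"per run: ${per_run:.4f}, per 10,000 runs: ${per_run * 10_000:.2f}")
# -> per run: $0.0105, per 10,000 runs: $105.00
```

Scaling the per-run figure like this is usually the quickest way to sanity-check whether a high-volume workflow belongs on the standard model or one of the cheaper variants.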

00:02:32.990 --> 00:02:34.990
Hey, here's where it gets, I think, really interesting.

00:02:35.189 --> 00:02:37.789
The source mentions GPT -5 isn't just one single

00:02:37.789 --> 00:02:40.069
model. It's a family of brains. What does that

00:02:40.069 --> 00:02:42.550
actually mean for us? You know, the users, the

00:02:42.550 --> 00:02:44.509
builders. How should we think about that? It

00:02:44.509 --> 00:02:47.189
means you get choices and maybe more importantly,

00:02:47.330 --> 00:02:50.449
optimization. GPT -5 isn't this one monolithic

00:02:50.449 --> 00:02:53.050
thing. It's actually a suite of specialized brains.

00:02:53.449 --> 00:02:55.569
Each one is tuned for different kinds of tasks,

00:02:55.689 --> 00:02:58.289
different cost profiles. So you get GPT -5 standard.

00:02:58.490 --> 00:03:00.990
That's your well -rounded genius. Good for most

00:03:00.990 --> 00:03:03.849
general stuff. Then there's GPT -5 pro. That's

00:03:03.849 --> 00:03:06.430
the real deep thinker for those super complex

00:03:06.430 --> 00:03:09.490
multi -step problems that need maximum reasoning

00:03:09.490 --> 00:03:11.650
power. Right. And what about for things that

00:03:11.650 --> 00:03:14.129
are maybe simpler, but you do a lot of... High

00:03:14.129 --> 00:03:16.909
volume tasks. Exactly. For that, you get GPT

00:03:16.909 --> 00:03:19.409
-5 Mini. It's described as a hyper -efficient

00:03:19.409 --> 00:03:22.009
workhorse. Great for high volume, maybe simpler

00:03:22.009 --> 00:03:24.949
tasks where cost is a really big factor. And

00:03:24.949 --> 00:03:26.969
finally, there's GPT -5 Nano. That's the most

00:03:26.969 --> 00:03:29.530
economical one. Perfect for really basic high

00:03:29.530 --> 00:03:32.090
frequency needs. And this new pricing structure,

00:03:32.270 --> 00:03:34.189
alongside the family, it's kind of revolutionary.

00:03:34.349 --> 00:03:37.740
It democratizes this advanced AI. Small businesses,

00:03:37.819 --> 00:03:39.539
even solo entrepreneurs who might have found

00:03:39.539 --> 00:03:42.620
GPT -4 a bit too pricey for heavy use. Now it's

00:03:42.620 --> 00:03:45.039
potentially within reach. It feels like an economic

00:03:45.039 --> 00:03:47.849
shift just as much as a tech shift. So you can

00:03:47.849 --> 00:03:49.750
really pick the right brain for the specific

00:03:49.750 --> 00:03:52.710
job you have in mind. Yes, tailoring the power

00:03:52.710 --> 00:03:56.009
and the cost to your specific needs. Okay, let's

00:03:56.009 --> 00:03:58.069
talk about the report card. The benchmarks for

00:03:58.069 --> 00:04:00.909
GPT -5 show some pretty staggering performance

00:04:00.909 --> 00:04:03.009
improvements. Let's dive into those numbers.

00:04:03.169 --> 00:04:05.229
What do they tell us about what's actually possible

00:04:05.229 --> 00:04:07.669
now? Yeah, the numbers are, they're pretty clear.

00:04:07.770 --> 00:04:10.370
We've kind of stepped into a new era here. Take

00:04:10.370 --> 00:04:13.900
SWE Bench. That uses real -world coding problems

00:04:13.900 --> 00:04:17.939
pulled from GitHub. GPT -5 hit a 75% accuracy

00:04:17.939 --> 00:04:21.360
rate there. That's correctly solving nearly three

00:04:21.360 --> 00:04:23.639
out of every four complex software engineering

00:04:23.639 --> 00:04:25.699
challenges it was given. That's just a massive

00:04:25.699 --> 00:04:29.000
leap for code generation. Wow, 75% on real GitHub

00:04:29.000 --> 00:04:31.339
issues. That's practically like having a pretty

00:04:31.339 --> 00:04:33.980
dependable junior developer on call. It's certainly

00:04:33.980 --> 00:04:36.160
getting there. And then there's its multi -language

00:04:36.160 --> 00:04:39.019
ability. On a benchmark called Aider Polyglot,

00:04:39.139 --> 00:04:41.990
this tests how well an AI understands and edits

00:04:41.990 --> 00:04:44.870
code across different languages, like Python,

00:04:45.110 --> 00:04:47.209
JavaScript, SQL, you know, like a full stack

00:04:47.209 --> 00:04:50.709
dev, GPT -5 scored 88%. That means it's getting

00:04:50.709 --> 00:04:52.730
close to being a really dependable coworker for

00:04:52.730 --> 00:04:55.129
development teams. Getting it right almost nine

00:04:55.129 --> 00:04:57.250
times out of 10 on these tricky multi -language

00:04:57.250 --> 00:04:59.290
tasks, that's huge. And then there's this thing

00:04:59.290 --> 00:05:02.189
the source calls the miracle maker, single prompt

00:05:02.189 --> 00:05:04.509
creations. That sounds, well, it sounds a bit

00:05:04.509 --> 00:05:06.689
like magic. Yeah, this is where it gets genuinely

00:05:06.689 --> 00:05:10.069
kind of mind bending. With one single well

00:05:10.069 --> 00:05:12.490
thought -out prompt, GPT -5 has apparently been

00:05:12.490 --> 00:05:15.689
shown to create complete, professionally designed

00:05:15.689 --> 00:05:19.230
landing pages. Or interactive, fully working

00:05:19.230 --> 00:05:22.250
audio step sequencers. Little music tools. And

00:05:22.250 --> 00:05:24.889
yeah, even fully playable, pretty complex spaceship

00:05:24.889 --> 00:05:27.750
video games. And the key thing is, these weren't

00:05:27.750 --> 00:05:30.089
tweaked over and over. They were supposedly generated

00:05:30.089 --> 00:05:33.160
in a single shot. Whoa. I mean, imagine scaling

00:05:33.160 --> 00:05:35.279
that kind of one -shot creation capability up

00:05:35.279 --> 00:05:37.779
to, say, a billion users. It just fundamentally

00:05:37.779 --> 00:05:39.939
changes things like rapid prototyping. You could

00:05:39.939 --> 00:05:42.259
slash development cycles, accelerate getting

00:05:42.259 --> 00:05:44.720
minimum viable products out there. Okay, stepping

00:05:44.720 --> 00:05:47.240
back from the benchmarks for a sec, how do these

00:05:47.240 --> 00:05:50.560
breakthroughs really translate into a builder's

00:05:50.560 --> 00:05:52.920
dream, especially for automation? What specific

00:05:52.920 --> 00:05:55.790
headaches does GPT -5 maybe solve? Right, for

00:05:55.790 --> 00:05:57.370
automation builders, there are three things that

00:05:57.370 --> 00:05:59.470
really stand out. First, that tool usage accuracy

00:05:59.470 --> 00:06:02.910
we mentioned. GPT -5 is significantly better

00:06:02.910 --> 00:06:05.250
at understanding and correctly using external

00:06:05.250 --> 00:06:09.310
tools and APIs. That's just crucial for complex

00:06:09.310 --> 00:06:11.269
workflows that interact with the real world.

00:06:11.470 --> 00:06:14.870
Second, long context handling. It can keep track

00:06:14.870 --> 00:06:17.290
of the conversation, the instructions, over much

00:06:17.290 --> 00:06:20.069
longer multi -step tasks. Basically, it doesn't

00:06:20.069 --> 00:06:22.550
forget what it was doing halfway through. That

00:06:22.550 --> 00:06:24.670
makes it viable for much more sophisticated business processes.

00:06:24.839 --> 00:06:28.920
And third, factual accuracy. There seems to be

00:06:28.920 --> 00:06:31.819
a marked improvement, a much lower rate of hallucination.

00:06:32.160 --> 00:06:34.860
You know, when the AI just makes stuff up confidently.

00:06:34.980 --> 00:06:37.339
That obviously increases reliability quite a

00:06:37.339 --> 00:06:39.319
bit. So it's not just smarter in theory, but

00:06:39.319 --> 00:06:41.620
it's also more reliable in practice. And it actually

00:06:41.620 --> 00:06:43.620
remembers what you told it earlier. Precisely.
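That improved tool use is typically exposed through function calling: each tool is described to the model as a JSON schema, and the model replies with a structured call that your code then executes. A rough sketch, with a hypothetical `lookup_order` tool (the tool name, fields, and lookup logic here are invented for illustration):

```python
import json

# Hypothetical tool definition in the function-calling style used by
# chat APIs: the model sees this schema and can respond with a
# structured call instead of free text.
lookup_order_tool = {
    "type": "function",
    "function": {
        "name": "lookup_order",  # hypothetical tool name
        "description": "Fetch an order's status from the shop database.",
        "parameters": {
            "type": "object",
            "properties": {"order_id": {"type": "string"}},
            "required": ["order_id"],
        },
    },
}

# Your code executes whatever call the model requests:
def lookup_order(order_id: str) -> str:
    return f"order {order_id}: shipped"  # stand-in for a real DB lookup

# Simulate dispatching a call the model asked for:
requested = {"name": "lookup_order",
             "arguments": json.dumps({"order_id": "A-42"})}
args = json.loads(requested["arguments"])
print(lookup_order(**args))  # -> order A-42: shipped
```

The reliability gains discussed here show up exactly at this seam: a model that fills in the schema correctly, every time, is what makes multi-tool workflows trustworthy.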

00:06:43.620 --> 00:06:45.699
It feels like a much more robust and trustworthy

00:06:45.699 --> 00:06:48.100
AI partner. This really sounds like a perfect

00:06:48.100 --> 00:06:50.620
match for automation platforms. The source actually

00:06:50.620 --> 00:06:53.360
calls the combination of GPT -5 and n8n, that's

00:06:53.360 --> 00:06:55.920
a popular automation tool, a perfect marriage.

00:06:56.990 --> 00:06:59.910
Why is that synergy supposedly so powerful? What

00:06:59.910 --> 00:07:02.089
makes them complement each other so well? It

00:07:02.089 --> 00:07:03.490
really does feel like that. Think of it like

00:07:03.490 --> 00:07:05.850
pairing, I don't know, the world's greatest architect

00:07:05.850 --> 00:07:09.029
that's GPT -5's intelligence with the most efficient,

00:07:09.129 --> 00:07:12.029
flexible construction crew that's n8n's platform.

00:07:13.259 --> 00:07:16.459
GPT -5's huge jump in accuracy means you can

00:07:16.459 --> 00:07:19.220
now automate more complex, even mission -critical

00:07:19.220 --> 00:07:22.480
tasks with way more confidence. Fewer errors

00:07:22.480 --> 00:07:25.060
mean more reliable business processes and probably

00:07:25.060 --> 00:07:27.060
fewer late -night calls because something broke.

00:07:27.300 --> 00:07:29.620
So it smooths out the rough edges, makes automation

00:07:29.620 --> 00:07:32.980
less brittle. Yes, exactly. And beyond just accuracy,

00:07:33.279 --> 00:07:36.060
GPT -5 is more of an all -in -one AI now. It

00:07:36.060 --> 00:07:39.079
can handle text, images, code, complex reasoning,

00:07:39.120 --> 00:07:42.199
all within a single model potentially. That simplifies

00:07:42.199 --> 00:07:44.879
workflows a lot. Before, you might need to chain

00:07:44.879 --> 00:07:47.120
together multiple different specialized AI services.

00:07:47.379 --> 00:07:49.560
Now, GPT -5 might handle the whole chain right

00:07:49.560 --> 00:07:52.180
inside your n8n workflow. And maybe most importantly,

00:07:52.279 --> 00:07:55.300
it democratizes this power. n8n is visual. It's

00:07:55.300 --> 00:07:57.639
low code or no code. Combine that ease of use

00:07:57.639 --> 00:07:59.980
with GPT -5's incredible ability to follow complex

00:07:59.980 --> 00:08:02.339
instructions. Well, suddenly, really advanced

00:08:02.339 --> 00:08:04.259
automation becomes accessible to non -technical

00:08:04.259 --> 00:08:06.560
users, maybe for the first time ever. So this

00:08:06.560 --> 00:08:09.339
combination means more people can build much

00:08:09.339 --> 00:08:12.000
more powerful things without needing to be expert

00:08:12.000 --> 00:08:15.160
coders. Exactly. It's genuinely empowering for

00:08:15.160 --> 00:08:17.339
builders of all kinds. Okay, for anyone listening

00:08:17.339 --> 00:08:19.600
who's now itching to try this out. The guide

00:08:19.600 --> 00:08:22.639
we read provides a step -by -step for hooking

00:08:22.639 --> 00:08:25.879
up GPT -5 with n8n. Can you give us the quick

00:08:25.879 --> 00:08:28.000
rundown? How do people get started? Yeah, it's

00:08:28.000 --> 00:08:29.579
pretty straightforward, actually. Yeah. First,

00:08:29.819 --> 00:08:32.139
you need the key to the engine room, right? That's

00:08:32.139 --> 00:08:35.620
your OpenAI API setup. Just go to platform.openai

00:08:35.620 --> 00:08:37.480
.com, set up an account if you don't have one,

00:08:37.559 --> 00:08:40.120
add a payment method because API usage isn't

00:08:40.120 --> 00:08:43.279
free, and then create a new secret API key. Now,

00:08:43.340 --> 00:08:46.080
critical step here. Copy that key immediately

00:08:46.080 --> 00:08:48.440
and save it somewhere super secure, like a password

00:08:48.440 --> 00:08:50.940
manager. Treat it like a password, maybe even

00:08:50.940 --> 00:08:52.720
more carefully, because you won't be able to

00:08:52.720 --> 00:08:54.740
see the full key again after you create it. Got

00:08:54.740 --> 00:08:57.679
it. Guard that key. Then what's next inside n8n?

00:08:57.919 --> 00:09:01.500
Right. Then inside your n8n workflow, you just

00:09:01.500 --> 00:09:04.190
add an AI agent node. In its settings, you select

00:09:04.190 --> 00:09:07.169
OpenAI Chat Model, paste in that secret API key

00:09:07.169 --> 00:09:09.669
you just saved, and then, from the model drop

00:09:09.669 --> 00:09:12.450
-down menu, you should now see options for GPT

00:09:12.450 --> 00:09:15.330
-5 and its variants, like Mini or Pro. You just

00:09:15.330 --> 00:09:18.110
pick the one you want to use. Now, a really crucial

00:09:18.110 --> 00:09:19.950
note on billing, because this confuses a lot

00:09:19.950 --> 00:09:23.200
of people starting out. Your... monthly ChatGPT

00:09:23.200 --> 00:09:25.559
Plus subscription, if you have one, and the OpenAI

00:09:25.559 --> 00:09:28.220
API. They're two totally separate products with

00:09:28.220 --> 00:09:30.940
separate billing. ChatGPT Plus is kind of like

00:09:30.940 --> 00:09:32.720
an all -you -can -eat buffet for the chat interface.

00:09:32.960 --> 00:09:35.779
The API is a la carte. You pay specifically for

00:09:35.779 --> 00:09:38.179
what your automation workflow uses based on tokens

00:09:38.179 --> 00:09:41.639
processed. Ah, okay. So API usage is pay -as

00:09:41.639 --> 00:09:43.120
-you -go, separate from the chatbot subscription.
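As a sketch of what that a-la-carte API usage looks like under the hood, here is a minimal Chat Completions request builder, the same kind of request an n8n OpenAI node issues for you. It only constructs the request (so it runs without a key), reads the secret key from the environment rather than hard-coding it, and assumes "gpt-5" as the model id; check the model dropdown or OpenAI's model list for the exact ids available to your account.

```python
import json
import os
import urllib.request

def build_chat_request(prompt: str,
                       model: str = "gpt-5") -> urllib.request.Request:
    """Build (but don't send) a Chat Completions request."""
    payload = {
        "model": model,  # assumed id -- verify against your model list
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        "https://api.openai.com/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            # The secret key comes from the environment; never hard-code it.
            "Authorization": f"Bearer {os.environ.get('OPENAI_API_KEY', '')}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_chat_request("Draft a polite reply to this support ticket.")
print(req.full_url)  # -> https://api.openai.com/v1/chat/completions
```

Sending it with `urllib.request.urlopen(req)` is the point where the a-la-carte billing starts: every request is metered by the tokens in that payload and in the response.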

00:09:43.360 --> 00:09:45.500
That's important. So it's powerful, but you also

00:09:45.500 --> 00:09:47.480
need to be smart about managing those costs.

00:09:47.639 --> 00:09:50.159
Yes. Managing those API costs is absolutely key

00:09:50.159 --> 00:09:56.240
to using AI. All right. Academic benchmarks are

00:09:56.240 --> 00:09:58.779
one thing, but how does GPT -5 actually perform

00:09:58.779 --> 00:10:03.100
in, let's say, a real -world cage match against

00:10:03.100 --> 00:10:06.279
the previous champ, GPT -4o. What happens when

00:10:06.279 --> 00:10:08.899
the rubber really meets the road in an actual

00:10:08.899 --> 00:10:10.980
automation workflow? Yeah, that's exactly what

00:10:10.980 --> 00:10:12.840
we wanted to find out too. So the test wasn't

00:10:12.840 --> 00:10:15.080
abstract. It was designed as a practical gauntlet.

00:10:15.200 --> 00:10:18.059
We ran both GPT -4o and GPT -5 through 10

00:10:18.259 --> 00:10:21.159
identical demanding test scenarios right inside

00:10:21.159 --> 00:10:24.240
n8n. Things like generating complex emails based

00:10:24.240 --> 00:10:26.860
on data from external tools or doing tricky database

00:10:26.860 --> 00:10:29.139
lookups and summarizing the results. And importantly,

00:10:29.139 --> 00:10:31.899
we used no custom system prompts, just raw out

00:10:31.899 --> 00:10:33.580
of the box intelligence from both models.

00:10:33.700 --> 00:10:35.539
Okay. So head to head, what was the tale of the

00:10:35.539 --> 00:10:37.539
tape? How did they stack up? Right. Round one,

00:10:37.659 --> 00:10:40.919
accuracy. Here, GPT -5 landed a really heavy

00:10:40.919 --> 00:10:44.559
blow. GPT -4o scored a respectable 4.2 out

00:10:44.559 --> 00:10:47.860
of 5.0 across the tests. But GPT -5, it hit

00:10:47.860 --> 00:10:50.559
4.7 out of 5.0. It's a really significant jump.

00:10:50.639 --> 00:10:52.919
It means far more reliable, much less error -prone

00:10:52.919 --> 00:10:54.720
automation right out of the gate. Round two,

00:10:54.840 --> 00:10:57.950
speed. Here, GPT -4o was consistently faster.

00:10:58.250 --> 00:11:00.750
Now, this is likely temporary, maybe due to server

00:11:00.750 --> 00:11:02.769
load from the GPT -5 launch, but for the moment,

00:11:02.870 --> 00:11:05.149
the veteran GPT -4o definitely had quicker responses.

00:11:05.470 --> 00:11:07.529
Interesting. And cost. That's always a huge factor

00:11:07.529 --> 00:11:09.590
for anyone building automations at scale. Right.

00:11:09.710 --> 00:11:12.509
Round three, cost. This was actually a bit of

00:11:12.509 --> 00:11:15.559
a surprise. While GPT -5's input tokens are indeed

00:11:15.559 --> 00:11:18.960
50% cheaper, as OpenAI announced, it turns out

00:11:18.960 --> 00:11:21.440
GPT -5 is often so much more thorough and detailed

00:11:21.440 --> 00:11:24.059
in its responses that it frequently generates

00:11:24.059 --> 00:11:26.480
a significantly higher volume of output tokens

00:11:26.480 --> 00:11:30.360
compared to GPT -4o for the same task. So paradoxically,

00:11:30.440 --> 00:11:32.480
this can sometimes lead to a slightly higher

00:11:32.480 --> 00:11:35.159
overall cost per task with GPT -5, even though

00:11:35.159 --> 00:11:37.220
the input is cheaper. It's kind of a you -get

00:11:37.220 --> 00:11:39.100
-what -you -pay -for situation in terms of detail.

00:11:39.340 --> 00:11:41.299
Honestly, I still wrestle with prompt drift myself

00:11:41.299 --> 00:11:43.500
sometimes, you know. Finding that sweet spot

00:11:43.500 --> 00:11:45.259
between getting a really thorough answer and

00:11:45.259 --> 00:11:47.659
keeping the token count reasonable. Ah, that

00:11:47.659 --> 00:11:49.840
makes sense. More detail costs more tokens. Okay,

00:11:49.919 --> 00:11:52.679
so accuracy to GPT -5, speed to GPT -4o, cost is

00:11:52.679 --> 00:11:56.200
nuanced. Was there a knockout punch? What really

00:11:56.200 --> 00:11:59.740
sealed the deal, if anything? Round four. Quality.

00:12:00.200 --> 00:12:03.220
Yeah, this is where GPT -5 delivered the decisive

00:12:03.220 --> 00:12:05.379
blow, I'd say. The sheer quality of the output

00:12:05.379 --> 00:12:08.309
was just in a completely different league. The

00:12:08.309 --> 00:12:10.490
responses felt dramatically more detailed, more

00:12:10.490 --> 00:12:13.450
personal, more nuanced, just more human -like.

00:12:13.789 --> 00:12:16.110
It didn't feel like just an incremental improvement

00:12:16.110 --> 00:12:18.629
over GPT -4o. It really felt like a generational

00:12:18.629 --> 00:12:20.850
leap in the quality of the interaction. And what

00:12:20.850 --> 00:12:23.289
about the little guy, the mini version? Did the

00:12:23.289 --> 00:12:26.110
featherweight contender make an impact in these

00:12:26.110 --> 00:12:28.470
tests? Yeah, the featherweight. GPT -5 mini was

00:12:28.470 --> 00:12:30.669
also put through the same 10 tests. It scored

00:12:30.669 --> 00:12:33.909
3.6 out of 5.0 for accuracy. Respectable, but

00:12:33.909 --> 00:12:36.889
clearly below the bigger models. But here's the

00:12:36.889 --> 00:12:39.769
kicker. Running all 10 evaluations cost approximately

00:12:39.769 --> 00:12:42.889
$0.03 in total API charges. Just $0.03. So

00:12:42.889 --> 00:12:45.330
the verdict there is pretty clear. For high volume,

00:12:45.570 --> 00:12:47.909
relatively simple tasks where cost is absolutely

00:12:47.909 --> 00:12:50.809
paramount, GPT -5 Mini looks like the undisputed

00:12:50.809 --> 00:12:53.009
king of efficiency. So yeah, the overall judge's

00:12:53.009 --> 00:12:55.590
decision from the cage match. GPT -5 is the clear

00:12:55.590 --> 00:12:57.210
champion for accuracy and especially quality,

00:12:57.450 --> 00:12:59.750
but there's currently a tradeoff in speed and

00:12:59.750 --> 00:13:01.850
potentially slightly higher overall cost because

00:13:01.850 --> 00:13:04.070
its output is so much richer. So it sounds like

00:13:04.070 --> 00:13:06.649
for business -critical tasks, that boost in quality

00:13:06.649 --> 00:13:09.429
and reliability probably outweighs the slightly

00:13:09.429 --> 00:13:12.129
slower speed or the nuanced cost difference.

00:13:12.450 --> 00:13:15.389
Absolutely. In the real world, quality and reliability

00:13:15.389 --> 00:13:17.610
are usually what drive the most value. Okay,

00:13:17.669 --> 00:13:20.490
moving beyond those structured benchmarks, how

00:13:20.490 --> 00:13:23.710
does GPT -5 handle more complex reasoning or

00:13:23.710 --> 00:13:26.110
even creative challenges? The source material

00:13:26.110 --> 00:13:28.429
talks about testing it as an ultimate assistant

00:13:28.429 --> 00:13:32.200
or even an AI art director. This starts to sound

00:13:32.200 --> 00:13:34.720
like the JARVIS test, right? Like from Iron Man.

00:13:34.860 --> 00:13:36.639
It absolutely does feel like that. We simulated

00:13:36.639 --> 00:13:39.080
this ultimate assistant scenario. The mission

00:13:39.080 --> 00:13:41.799
was intentionally high level. Achieve a business

00:13:41.799 --> 00:13:43.620
goal that required coordinating multiple steps,

00:13:43.779 --> 00:13:45.500
web research, looking things up in a database,

00:13:45.700 --> 00:13:47.960
checking a calendar, drafting an email. Basically,

00:13:48.120 --> 00:13:49.919
orchestrating several different tools seamlessly.

00:13:50.200 --> 00:13:52.820
And what we saw was really a master class in

00:13:52.820 --> 00:13:55.659
AI orchestration from GPT -5. What was the most

00:13:55.659 --> 00:13:57.480
impressive part of that orchestration? What really

00:13:57.480 --> 00:13:59.659
stood out? The most striking thing was what the

00:13:59.659 --> 00:14:03.320
source called a self -healing workflow. During

00:14:03.320 --> 00:14:06.360
the test, we intentionally sabotaged the API

00:14:06.360 --> 00:14:08.679
key for the primary web search tool it was supposed

00:14:08.679 --> 00:14:11.399
to use. Now, an older model probably would have

00:14:11.399 --> 00:14:14.059
just failed, thrown an error, and stopped. But

00:14:14.059 --> 00:14:16.940
GPT -5 correctly identified the specific error

00:14:16.940 --> 00:14:19.779
it saw: it was an authentication error. It automatically

00:14:19.779 --> 00:14:22.340
retried the tool once, just in case it was a

00:14:22.340 --> 00:14:24.879
temporary glitch. And when the error persisted,

00:14:24.899 --> 00:14:27.000
it intelligently switched to its backup plan.

00:14:27.340 --> 00:14:29.779
It used a different research tool, Perplexity

00:14:29.779 --> 00:14:31.659
in this case, to get the information it needed

00:14:31.659 --> 00:14:34.100
and complete the mission successfully. That ability

00:14:34.100 --> 00:14:36.580
to recognize a problem, try a fix, and then

00:14:36.580 --> 00:14:39.860
execute a plan B. That's huge for building robust,

00:14:40.039 --> 00:14:42.019
reliable automation that doesn't break easily.
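The retry-then-fallback behavior described here can be sketched in plain code. The tool functions below are hypothetical stand-ins, not real n8n nodes or search APIs:

```python
class ToolError(Exception):
    """Raised when a tool call fails (e.g. an authentication error)."""

def run_with_fallback(query, primary, fallback, retries=1):
    """Try the primary tool, retry on failure, then switch to the fallback."""
    for _ in range(retries + 1):
        try:
            return primary(query)
        except ToolError:
            continue  # could be a transient glitch -- try again
    return fallback(query)  # plan B: a different research tool

# Hypothetical stand-ins for real tool integrations:
def broken_search(query):
    raise ToolError("401: invalid API key")  # the sabotaged primary tool

def backup_search(query):
    return f"results for {query!r} via the backup tool"

print(run_with_fallback("GPT-5 benchmarks", broken_search, backup_search))
# -> results for 'GPT-5 benchmarks' via the backup tool
```

The difference with GPT -5, per the test above, is that the model improvised this recognize-retry-switch logic on its own; in older setups you had to wire every branch of it by hand like this.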

00:14:42.299 --> 00:14:44.820
Wow, okay. It can adapt to failures. What about

00:14:44.820 --> 00:14:46.940
the creative side, the AI art director test?

00:14:47.120 --> 00:14:49.000
How did that go? Right, so on the creative front.

00:14:49.519 --> 00:14:52.740
We compared GPT -4 .0 and GPT -5 again. We gave

00:14:52.740 --> 00:14:54.679
both models a really simple starting prompt,

00:14:54.799 --> 00:14:57.759
something like a shark wearing a cowboy hat on

00:14:57.759 --> 00:15:00.460
a classic car. And the task wasn't to generate

00:15:00.460 --> 00:15:02.960
the image itself, but to generate a much more

00:15:02.960 --> 00:15:05.480
detailed prompt that an image generation AI,

00:15:05.679 --> 00:15:09.019
like Midjourney or DALL -E 3, could use to create

00:15:09.019 --> 00:15:11.080
a really great picture. And the difference between

00:15:11.080 --> 00:15:13.960
the two was, well, night and day. How so? What

00:15:13.960 --> 00:15:16.559
made the GPT -5 prompt so much better? GPT -4

00:15:16.559 --> 00:15:18.799
acted like a, you know, a competent technician.

00:15:19.039 --> 00:15:21.399
It gave a solid literal description based on

00:15:21.399 --> 00:15:24.360
the input. Serviceable. Yeah. But GPT -5, it

00:15:24.360 --> 00:15:26.799
acted like a professional art director. Its generated

00:15:26.799 --> 00:15:29.700
prompt was incredibly rich, detailed, and evocative.

00:15:29.759 --> 00:15:32.519
It specified things like photorealistic cinematic

00:15:32.519 --> 00:15:35.340
wide shot, weathered leather cowboy hat, tilted

00:15:35.340 --> 00:15:38.230
at a jaunty angle. Mint condition, cherry red

00:15:38.230 --> 00:15:40.970
1950s American classic convertible, sun bleached

00:15:40.970 --> 00:15:43.289
desert highway, golden hour light, warm tones,

00:15:43.509 --> 00:15:45.850
long shadows, low angle composition, shallow

00:15:45.850 --> 00:15:48.409
depth of field, playful yet majestic mood, no

00:15:48.409 --> 00:15:50.370
text or watermark. It was just packed with specific

00:15:50.370 --> 00:15:52.750
visual and atmospheric details. That's, yeah,

00:15:52.850 --> 00:15:54.509
that's amazing. It really paints a picture with

00:15:54.509 --> 00:15:57.129
words. Exactly. And as you can imagine, the difference

00:15:57.129 --> 00:15:59.350
in the final image you'd get from those two prompts

00:15:59.350 --> 00:16:02.250
would be astronomical. One prompt gives you a

00:16:02.250 --> 00:16:04.250
basic picture. The other gives you a potential

00:16:04.250 --> 00:16:07.210
movie poster or a whole story. This enhanced ability

00:16:07.210 --> 00:16:10.389
to generate really creative, detailed, effective

00:16:10.389 --> 00:16:13.730
prompts is a massive advantage for anyone who

00:16:13.730 --> 00:16:16.129
needs to automate content creation, whether it's

00:16:16.129 --> 00:16:18.070
for marketing, social media, anything visual.

00:16:18.250 --> 00:16:20.470
So it can adapt to errors on the fly and it can

00:16:20.470 --> 00:16:23.309
be incredibly creative. It's showing deep intelligence

00:16:23.309 --> 00:16:26.070
and artistic flair. Yes, exactly. It's a combination

00:16:26.070 --> 00:16:27.649
we haven't really seen at this level before.

00:16:27.830 --> 00:16:29.730
So let's bring it all together. What does this

00:16:29.730 --> 00:16:32.929
actually mean for practical applications? With

00:16:32.929 --> 00:16:35.690
this level of AI, what kinds of real -world problems

00:16:35.690 --> 00:16:38.289
can we actually start solving now? Things that

00:16:38.289 --> 00:16:41.029
maybe seemed impossible or just too impractical

00:16:41.029 --> 00:16:43.750
before? Well, this new level of capability unlocks

00:16:43.750 --> 00:16:45.889
a whole range of practical automation opportunities

00:16:45.889 --> 00:16:48.649
that were maybe just out of reach. Think about

00:16:48.649 --> 00:16:51.690
level two customer service agents. AI agents

00:16:51.690 --> 00:16:54.230
that can handle genuinely complex, multi -step

00:16:54.230 --> 00:16:58.009
customer inquiries, not just simple FAQs. Agents

00:16:58.009 --> 00:16:59.830
that are fully integrated with your CRM that

00:16:59.830 --> 00:17:01.889
can look up order histories, process returns,

00:17:02.289 --> 00:17:04.670
troubleshoot issues, and importantly have a clear,

00:17:04.730 --> 00:17:07.329
seamless escalation path to a human agent when

00:17:07.329 --> 00:17:09.829
they hit their limit. Or imagine a content factory

00:17:09.829 --> 00:17:12.809
workflow. automating your entire content production

00:17:12.809 --> 00:17:14.769
pipeline from doing the initial research and

00:17:14.769 --> 00:17:17.230
fact -checking to generating drafts in multiple

00:17:17.230 --> 00:17:19.670
formats like blog posts, social media updates,

00:17:19.950 --> 00:17:22.349
video scripts, all with built -in checks for

00:17:22.349 --> 00:17:25.289
SEO and quality assurance. You could even build

00:17:25.289 --> 00:17:27.269
sophisticated business intelligence engines,

00:17:27.450 --> 00:17:30.130
systems that automatically pull data from various

00:17:30.130 --> 00:17:32.910
sources, aggregate it, generate concise executive

00:17:32.910 --> 00:17:35.589
summaries, maybe even create data visualizations

00:17:35.589 --> 00:17:37.759
on the fly. Okay, that's a serious amount of

00:17:37.759 --> 00:17:40.859
power. Which brings us back to strategy and cost.

00:17:41.039 --> 00:17:43.420
How do we apply this power effectively? What's

00:17:43.420 --> 00:17:45.519
the new playbook for building smart, cost-controlled

00:17:45.519 --> 00:17:48.220
automations with GPT-5? Right, because with

00:17:48.220 --> 00:17:51.519
great power comes potentially a great API bill

00:17:51.519 --> 00:17:53.740
if you're not careful. So, smart cost management

00:17:53.740 --> 00:17:56.059
is absolutely critical. The playbook has a few

00:17:56.059 --> 00:17:58.759
key parts. First, strategic model selection,

00:17:59.000 --> 00:18:00.460
like we talked about with the family of brains.

00:18:00.640 --> 00:18:02.539
You need to pick the right engine for the job.

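One way that pick-the-right-engine idea might look in code. This is a rough sketch; the tier names are illustrative stand-ins for whatever model identifiers the API actually exposes, and the two boolean flags are simplifications of a real task-classification step.

```python
def pick_model(needs_deep_reasoning: bool, high_volume: bool) -> str:
    """Route each job to the cheapest tier that can handle it."""
    if needs_deep_reasoning:
        return "gpt-5"       # standard/pro tier: deep reasoning, complex instructions
    if high_volume:
        return "gpt-5-mini"  # high-volume, simpler tasks where cost per task is key
    return "gpt-5-nano"      # simplest, most cost-critical operations
```

So an email-triage step might call `pick_model(needs_deep_reasoning=False, high_volume=True)` and land on the mini tier, while a contract-analysis step gets routed to the full model.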
00:18:02.819 --> 00:18:05.160
Use the powerful standard or pro models for tasks

00:18:05.160 --> 00:18:07.240
needing deep reasoning or complex instruction

00:18:07.240 --> 00:18:09.839
following. Use mini for high-volume, simpler

00:18:09.839 --> 00:18:12.859
tasks where cost per task is key. And use nano

00:18:12.859 --> 00:18:15.039
for the absolute simplest, most cost-critical

00:18:15.039 --> 00:18:17.619
operations. Don't use the most expensive brain

00:18:17.619 --> 00:18:20.519
for every single thought. Second, aggressive

00:18:20.519 --> 00:18:23.390
token management. Think of the AI's context window,

00:18:23.589 --> 00:18:25.569
the amount of text it can consider at once, like

00:18:25.569 --> 00:18:27.950
a briefcase. Every word you put in or get out

00:18:27.950 --> 00:18:30.769
costs something. So optimize your prompts. Be

00:18:30.769 --> 00:18:32.569
ruthless about cutting out fluff and unnecessary

00:18:32.569 --> 00:18:35.250
words. Implement caching wherever possible: if

00:18:35.250 --> 00:18:36.849
you're asking the AI the same question repeatedly

00:18:36.849 --> 00:18:39.710
or feeding it the same background info, store

00:18:39.710 --> 00:18:41.390
the answer and reuse it instead of hitting the

00:18:41.390 --> 00:18:43.529
API every time. And consider hybrid approaches.

00:18:43.730 --> 00:18:45.970
Maybe use a cheap model like Mini for an initial

00:18:45.970 --> 00:18:48.890
pass, like filtering emails, and then only send

00:18:48.890 --> 00:18:50.849
the really important ones to the more expensive

00:18:50.849 --> 00:18:53.549
Pro model for detailed analysis. So being really

00:18:53.549 --> 00:18:55.430
disciplined about what goes into that briefcase

00:18:55.430 --> 00:18:58.569
and reusing information smartly is paramount

00:18:58.569 --> 00:19:01.609
for cost. Exactly. And the third piece is batch

00:19:01.609 --> 00:19:04.269
processing optimization. Think like an assembly

00:19:04.269 --> 00:19:07.690
line. Making one single API call to process,

00:19:07.750 --> 00:19:11.190
say, 10 customer reviews at once is far, far

00:19:11.190 --> 00:19:13.569
more efficient and cheaper than making 10 separate

00:19:13.569 --> 00:19:16.609
API calls, one for each review. So design your

00:19:16.609 --> 00:19:18.730
workflows to group similar operations together.

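The assembly-line idea, grouping items and sending one combined prompt per batch instead of one call per item, might be sketched like this. The prompt wording and the batch size of 10 are illustrative choices, not fixed requirements.

```python
def build_batch_prompt(reviews: list[str]) -> str:
    # One prompt covering every review in the batch, asking for one
    # numbered answer per item, replaces N separate API calls.
    numbered = "\n".join(f"{i + 1}. {r}" for i, r in enumerate(reviews))
    return (
        "Classify the sentiment of each customer review below as "
        "positive, negative, or mixed. Answer with one line per number.\n"
        + numbered
    )

def chunk(items: list[str], size: int = 10) -> list[list[str]]:
    # Group work into assembly-line batches of `size`.
    return [items[i:i + size] for i in range(0, len(items), size)]
```

Twenty-five queued reviews become three API calls instead of twenty-five, and the same chunking helper is what an off-peak queue would drain in bulk.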
00:19:18.970 --> 00:19:21.390
Maybe implement intelligent queuing systems so

00:19:21.390 --> 00:19:23.930
non -urgent tasks can collect and then be processed

00:19:23.930 --> 00:19:26.529
in one large batch during off-peak hours. So

00:19:26.529 --> 00:19:28.950
it's really about strategic deployment, choosing

00:19:28.950 --> 00:19:31.269
the right tool, and being super smart and efficient

00:19:31.269 --> 00:19:34.619
with tokens and requests. Absolutely. Efficiency

00:19:34.619 --> 00:19:37.339
is what ensures profitability and allows you

00:19:37.339 --> 00:19:40.420
to scale these powerful AI solutions. Okay, let's

00:19:40.420 --> 00:19:42.259
zoom out one last time. If we connect all these

00:19:42.259 --> 00:19:44.759
dots, look at the bigger picture. The release

00:19:44.759 --> 00:19:46.980
of GPT-5 feels like more than just an update.

00:19:47.180 --> 00:19:49.119
It feels like a signal, maybe, of some major

00:19:49.119 --> 00:19:52.220
underlying trends in AI. From this vantage point,

00:19:52.359 --> 00:19:54.200
what does the near future look like? Yeah, it

00:19:54.200 --> 00:19:55.880
definitely feels like it's signaling at least

00:19:55.880 --> 00:19:58.920
three big shifts on the horizon. First, what

00:19:58.920 --> 00:20:01.119
the source calls the great leveling. This idea

00:20:01.119 --> 00:20:04.349
that truly advanced AI automation, the kind

00:20:04.349 --> 00:20:06.450
previously only available to huge corporations

00:20:06.450 --> 00:20:08.990
with massive budgets, is now becoming accessible

00:20:08.990 --> 00:20:11.710
to small businesses, even individual entrepreneurs.

00:20:11.990 --> 00:20:14.490
This allows them to genuinely compete, maybe

00:20:14.490 --> 00:20:17.230
even outmaneuver, much larger players by leveraging

00:20:17.230 --> 00:20:19.890
AI smartly. It's a democratization of capability.

00:20:20.730 --> 00:20:23.930
Second, we're getting closer to the JARVIS build

00:20:23.930 --> 00:20:27.599
-me-a-workflow future. GPT-5's strong performance

00:20:27.599 --> 00:20:29.819
on coding benchmarks hints that we're rapidly

00:20:29.819 --> 00:20:31.799
approaching a point where you might be able to

00:20:31.799 --> 00:20:33.640
describe a complex automation you need in just

00:20:33.640 --> 00:20:36.200
plain English. And the AI system could actually

00:20:36.200 --> 00:20:38.460
generate a ready-to-run workflow for you, maybe

00:20:38.460 --> 00:20:41.240
in a tool like n8n. Imagine just typing: build

00:20:41.240 --> 00:20:43.460
me an n8n workflow that triggers every time I

00:20:43.460 --> 00:20:46.119
get a new email with an invoice attached, extracts

00:20:46.119 --> 00:20:48.400
the amount due and sender, adds it to my accounting

00:20:48.400 --> 00:20:50.319
spreadsheet, and sends me a confirmation message,

00:20:50.400 --> 00:20:52.109
and it just builds it. We're not there yet, but

00:20:52.109 --> 00:20:54.750
GPT-5 suggests it's getting closer. Wow. AI

00:20:54.750 --> 00:20:57.049
building AI workflows? And you mentioned a third

00:20:57.049 --> 00:21:00.190
trend, teams of AIs. Right. That leads to the

00:21:00.190 --> 00:21:02.690
third trend, the Avengers Assemble moment for

00:21:02.690 --> 00:21:05.869
AI. This is the potential shift away from trying

00:21:05.869 --> 00:21:09.250
to build one single monolithic AI that does everything,

00:21:09.410 --> 00:21:12.839
like a solo Iron Man, toward orchestrating teams

00:21:12.839 --> 00:21:15.220
of more specialized AI agents, sometimes called

00:21:15.220 --> 00:21:18.859
agent swarms, that work together. GPT-5's much

00:21:18.859 --> 00:21:21.160
improved ability to reliably call and coordinate

00:21:21.160 --> 00:21:24.039
external tools makes these kinds of multi-agent

00:21:24.039 --> 00:21:26.599
systems far more feasible than they were before.

00:21:26.980 --> 00:21:29.259
The future might not be about building the single

00:21:29.259 --> 00:21:31.960
smartest AI, but about becoming like Nick Fury,

00:21:32.140 --> 00:21:34.339
the director, who knows how to assemble the right

00:21:34.339 --> 00:21:36.720
team of specialized Avengers, each with their

00:21:36.720 --> 00:21:39.440
own unique superpower, to tackle a complex mission

00:21:39.440 --> 00:21:42.210
collaboratively. So we're talking about AI building

00:21:42.210 --> 00:21:45.769
AI and potentially teams of specialized AIs working

00:21:45.769 --> 00:21:48.650
together on complex problems. Yes, it feels like

00:21:48.650 --> 00:21:50.609
the next frontier is really about AI collaboration

00:21:50.609 --> 00:21:53.150
and even self-assembly of solutions. Okay, so

00:21:53.150 --> 00:21:55.609
the bottom line here for you, the listener, really

00:21:55.609 --> 00:21:57.869
seems to be that GPT-5 isn't just another small

00:21:57.869 --> 00:21:59.869
step forward. It feels more like discovering

00:21:59.869 --> 00:22:02.009
a whole new continent for automation possibilities.

00:22:02.670 --> 00:22:04.349
I think that's a good way to put it. It brings

00:22:04.349 --> 00:22:06.750
dramatically improved reasoning, much better

00:22:06.750 --> 00:22:10.150
tool integration, and crucially, a more accessible

00:22:10.160 --> 00:22:12.759
cost structure, especially with the family approach.

00:22:13.180 --> 00:22:15.519
It's fundamentally more reliable for critical

00:22:15.519 --> 00:22:18.640
tasks, and it enables more advanced, complex

00:22:18.640 --> 00:22:21.480
systems that require less constant human hand

00:22:21.480 --> 00:22:24.759
-holding. For platforms like n8n, this is massive.

00:22:25.019 --> 00:22:27.220
For years, you could argue that the flexibility

00:22:27.220 --> 00:22:29.500
of the automation platform itself was, in some

00:22:29.500 --> 00:22:32.099
ways, ahead of the intelligence of the AI brains

00:22:32.099 --> 00:22:35.480
we could easily plug into it. With GPT-5, it

00:22:35.480 --> 00:22:37.079
feels like the brain has finally caught up to

00:22:37.079 --> 00:22:40.089
the body. The potential is huge. So the race

00:22:40.089 --> 00:22:41.849
has definitely started. It seems the question

00:22:41.849 --> 00:22:43.789
is no longer if these tools will change pretty

00:22:43.789 --> 00:22:46.250
much every business, but really who will be the

00:22:46.250 --> 00:22:47.990
ones to actually build with them effectively

00:22:47.990 --> 00:22:50.529
and lead the way. Absolutely. And my advice,

00:22:50.690 --> 00:22:52.609
if you're listening and wondering where to start,

00:22:52.690 --> 00:22:55.609
start simple, but start now. Pick one important,

00:22:55.710 --> 00:22:58.529
tangible, real -world business problem you have.

00:22:58.869 --> 00:23:01.230
Try building an automation to solve it using

00:23:01.230 --> 00:23:04.430
these tools. Test it carefully. Refine it. Build

00:23:04.430 --> 00:23:07.289
one strong, reliable system first. Get comfortable

00:23:07.289 --> 00:23:10.390
with it. Then, slowly and confidently, grow your

00:23:10.390 --> 00:23:12.990
AI-powered capabilities from there. Don't try

00:23:12.990 --> 00:23:15.420
to boil the ocean on day one. That sounds like

00:23:15.420 --> 00:23:17.500
solid advice. Thank you for joining us on this

00:23:17.500 --> 00:23:20.039
deep dive into GPT-5. It's an exciting time.

00:23:20.119 --> 00:23:22.819
This new era of genuinely powerful, increasingly

00:23:22.819 --> 00:23:25.720
accessible AI is here. The time to start building

00:23:25.720 --> 00:23:27.900
is definitely now. [Outro music.]
