WEBVTT

00:00:00.000 --> 00:00:04.080
Technology usually creeps up on us slowly.

00:00:04.839 --> 00:00:08.019
But sometimes it kicks the door down. Yeah, it

00:00:08.019 --> 00:00:10.300
really does. I was pondering this pace of change

00:00:10.300 --> 00:00:13.779
recently. Yeah. We woke up on March 5th, 2026,

00:00:14.439 --> 00:00:17.059
to a very different reality. A completely wild

00:00:17.059 --> 00:00:19.679
reality. Right. OpenAI dropped an update that

00:00:19.679 --> 00:00:21.839
doesn't just answer your questions. It actually

00:00:21.839 --> 00:00:23.699
takes control of your mouse to do your digital

00:00:23.699 --> 00:00:26.239
chores. Welcome to the Deep Dive. I have been

00:00:26.239 --> 00:00:28.800
waiting for this one. We are unpacking the massive

00:00:28.800 --> 00:00:33.179
GPT 5.4 ChatGPT update today. It is a fundamental

00:00:33.179 --> 00:00:35.439
shift. We are moving past chatbots entirely.

00:00:35.780 --> 00:00:37.479
Absolutely. That is our mission for you today.

00:00:37.719 --> 00:00:39.979
We will cover the brand new model lineup first.

00:00:40.560 --> 00:00:43.380
And we'll explore how the system acts as a professional

00:00:43.380 --> 00:00:45.539
desk worker. We'll also look at its wild new

00:00:45.539 --> 00:00:48.119
ability to see. It can actually use your computer

00:00:48.119 --> 00:00:50.649
like a human would. Plus, we are diving into

00:00:50.649 --> 00:00:52.990
its coding chops. And finally, we will break

00:00:52.990 --> 00:00:55.570
down the real-world costs and safety implications.

00:00:55.770 --> 00:00:57.909
Let's jump right in. Let's start by meeting the

00:00:57.909 --> 00:01:00.170
new models. If you logged in recently, you probably

00:01:00.170 --> 00:01:02.549
noticed the confusing name. Oh, very confusing.

00:01:02.689 --> 00:01:04.609
The version numbers jump around quite a bit.

00:01:04.890 --> 00:01:07.489
There's no 5.3 Thinking model at all. Nope.

00:01:07.769 --> 00:01:09.650
They went straight from the old baseline to 5.4.

00:01:09.650 --> 00:01:13.140
It feels like a... well, like a skip step.

00:01:13.900 --> 00:01:16.040
But it signals a massive architectural leap.

00:01:16.219 --> 00:01:18.120
Yeah, the lineup itself is actually very focused

00:01:18.120 --> 00:01:20.599
now. You have three main versions. First up is

00:01:20.599 --> 00:01:24.420
GPT 5.3 Instant. Right. And that model is built

00:01:24.420 --> 00:01:26.540
for pure speed. It gives you an answer immediately.

00:01:26.739 --> 00:01:29.439
You hit Enter, and the text is just there. Almost

00:01:29.439 --> 00:01:32.140
zero latency. I find it perfect for those quick,

00:01:32.159 --> 00:01:34.950
everyday questions, like asking for a recipe

00:01:34.950 --> 00:01:37.870
or doing some basic fact-checking. It doesn't

00:01:37.870 --> 00:01:40.310
spend computing power pondering deeply, it just

00:01:40.310 --> 00:01:42.849
reacts. It's your quick reflex, but then you

00:01:42.849 --> 00:01:45.890
have the star of the show: GPT 5.4 Thinking.

00:01:46.510 --> 00:01:50.049
This model actually pauses to reason before answering

00:01:50.049 --> 00:01:53.230
complex queries. It maps out a logical

00:01:53.230 --> 00:01:55.849
path. You literally see a small box indicating

00:01:55.849 --> 00:01:59.569
its thinking. That brief pause changes the quality

00:01:59.569 --> 00:02:02.400
of the answer entirely. It stops treating words

00:02:02.400 --> 00:02:04.920
like a predictive text game. Exactly. It acts

00:02:04.920 --> 00:02:07.260
more like a strategist. And for the heavy lifters,

00:02:07.500 --> 00:02:11.180
there is GPT 5.4 Pro. You do need a Pro or Enterprise

00:02:11.180 --> 00:02:13.879
plan to access this one. Yeah, it takes the longest

00:02:13.879 --> 00:02:17.759
to process, but it delivers absolute peak accuracy

00:02:17.759 --> 00:02:20.120
for deep research. Most of you will probably

00:02:20.120 --> 00:02:22.939
rely on auto mode, though. Auto mode is the practical

00:02:22.939 --> 00:02:25.520
choice. Think of it like an automatic transmission

00:02:25.520 --> 00:02:28.379
in a car. It shifts gears for you based on the

00:02:28.379 --> 00:02:30.710
road ahead. It takes that a step further, really.

00:02:30.889 --> 00:02:33.129
It's not just shifting gears. It's deciding if

00:02:33.129 --> 00:02:35.509
you need a bicycle or a freight train. Right.

00:02:35.669 --> 00:02:38.150
You ask a simple math question. It uses instant.
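
If you picture auto mode as a dispatcher, the idea fits in a few lines. To be clear, this is purely an illustrative sketch: the heuristic and the tier names are our invention, since OpenAI has not published auto mode's actual routing logic.

```python
# Illustrative sketch of auto-mode routing (hypothetical logic, not OpenAI's).
# Pick a model tier from a crude estimate of how much reasoning a prompt needs.

REASONING_HINTS = ("analyze", "compare", "forecast", "prove", "debug")

def route(prompt: str) -> str:
    """Return the hypothetical tier an auto mode might pick for this prompt."""
    wants_reasoning = any(hint in prompt.lower() for hint in REASONING_HINTS)
    if wants_reasoning or len(prompt.split()) > 50:
        return "gpt-5.4-thinking"  # slow, deliberate reasoning
    return "gpt-5.3-instant"       # near-zero latency

print(route("What is 17 * 23?"))
print(route("Analyze this market report and forecast Q3"))
```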

00:02:38.349 --> 00:02:40.710
You ask for a 10-page market analysis. It shifts

00:02:40.710 --> 00:02:43.229
into thinking. I am looking at these three tiers

00:02:43.229 --> 00:02:46.590
and I have to ask, does auto mode actually save

00:02:46.590 --> 00:02:48.830
you time in the long run? Yes, it picks the right

00:02:48.830 --> 00:02:50.969
brain power so you don't have to guess. That

00:02:50.969 --> 00:02:53.750
friction removal is key. Let's shift into how

00:02:53.750 --> 00:02:57.060
this impacts professional work. OpenAI focused

00:02:57.060 --> 00:02:59.340
heavily on knowledge work with this release,

00:02:59.699 --> 00:03:02.400
managing data, drafting emails, building presentations.

00:03:02.759 --> 00:03:05.780
They ran a rigorous benchmark called the GDPval

00:03:05.780 --> 00:03:08.740
test. Right. This test measures how well the

00:03:08.740 --> 00:03:11.539
AI performs jobs that human professionals do.

00:03:11.840 --> 00:03:14.240
We're talking about complex accounting or sales

00:03:14.240 --> 00:03:17.479
roles. The results were genuinely striking. GPT

00:03:17.479 --> 00:03:22.120
5.4 matched or beat human pros 83% of the time.

00:03:22.620 --> 00:03:25.199
That isn't just a slight improvement.

00:03:25.280 --> 00:03:27.960
No, that is a structural shift in corporate capability.

00:03:28.280 --> 00:03:30.699
It changes the hiring landscape entirely. Think

00:03:30.699 --> 00:03:33.280
about spreadsheets. If you spend half your week

00:03:33.280 --> 00:03:35.740
formatting raw data, this changes your life.

00:03:36.039 --> 00:03:39.360
The source highlighted a specific, highly complicated

00:03:39.360 --> 00:03:42.879
example. You feed it a messy list of sales data.
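
To make that concrete, here is a small hand-rolled sketch of the kind of cleanup and 5% growth projection being described. The rows, column layout, and date formats are invented for illustration; inside ChatGPT the model generates and runs equivalent code for you.

```python
from datetime import datetime

# Hypothetical messy rows: mixed date formats, stray whitespace, string amounts.
raw = [
    ("2026-03-01", "Widget ", "120.50"),
    ("03/02/2026", "Gadget", "80"),
    ("March 3, 2026", "Widget", "99.99"),
]

DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y", "%B %d, %Y")

def parse_date(s: str) -> datetime:
    """Try each known format until one fits the messy date string."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(s.strip(), fmt)
        except ValueError:
            pass
    raise ValueError(f"unrecognized date: {s!r}")

# Clean each row and total sales per product.
totals: dict[str, float] = {}
for date_s, product, amount in raw:
    parse_date(date_s)  # validate the date string parses
    name = product.strip()
    totals[name] = totals.get(name, 0.0) + float(amount)

best_seller = max(totals, key=totals.get)
projected = {p: round(v * 1.05, 2) for p, v in totals.items()}  # +5% next month
print(best_seller, projected)
```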

00:03:43.099 --> 00:03:46.400
Data that is unformatted, missing columns, and

00:03:46.400 --> 00:03:48.680
full of weird date strings. Yeah, the worst kind

00:03:48.680 --> 00:03:51.099
of data. You ask it to find the trends. You ask

00:03:51.099 --> 00:03:53.159
for the best -selling product. Then you ask for

00:03:53.159 --> 00:03:55.520
a prediction for next month based on a 5% growth

00:03:55.520 --> 00:03:58.000
rate. It handles the math flawlessly. It runs

00:03:58.000 --> 00:04:00.199
the regression without complaining. But here's

00:04:00.199 --> 00:04:02.280
the kicker. It formats it as an Excel-ready

00:04:02.280 --> 00:04:05.500
table. Plus, there is now a direct ChatGPT for

00:04:05.500 --> 00:04:08.379
Excel plugin. That plugin is huge. Context switching

00:04:08.379 --> 00:04:11.199
kills productivity. If you leave Excel to use

00:04:11.199 --> 00:04:14.360
a chatbot, you lose focus. Now, you use the model

00:04:14.360 --> 00:04:16.720
directly inside your sheets. The days of endlessly

00:04:16.720 --> 00:04:19.339
copy-pasting back and forth are over. It transforms

00:04:19.339 --> 00:04:22.279
you from a data entry clerk into a manager. And

00:04:22.279 --> 00:04:25.350
it isn't just raw data. It creates full presentations

00:04:25.350 --> 00:04:28.170
in minutes. You ask it to research a topic like

00:04:28.170 --> 00:04:31.009
the future of solar energy. You prompt it to

00:04:31.009 --> 00:04:33.709
generate a 10-slide PowerPoint presentation.

00:04:34.050 --> 00:04:36.509
You ask for professional tone. And you make sure

00:04:36.509 --> 00:04:39.769
it includes rigorous academic citations. It doesn't

00:04:39.769 --> 00:04:42.490
just give you the text. It builds the actual

00:04:42.490 --> 00:04:45.430
file for you to download. The designs are incredibly

00:04:45.430 --> 00:04:48.069
sharp now, too. They use clean, modern colors

00:04:48.069 --> 00:04:51.430
and solid typography. And if you hate the design...

00:04:51.500 --> 00:04:53.819
You don't have to start over. Nope. You just

00:04:53.819 --> 00:04:55.579
tell it, make it more modern and minimalist.

00:04:55.959 --> 00:04:58.319
It rebuilds the entire thing in minutes. Yeah.

00:04:58.699 --> 00:05:00.560
But dealing with neat examples is one thing.

00:05:01.040 --> 00:05:03.879
Can it handle genuinely chaotic, unformatted

00:05:03.879 --> 00:05:06.079
data? Absolutely. It cleans it up and formats

00:05:06.079 --> 00:05:08.800
it perfectly for Excel. This brings us to what

00:05:08.800 --> 00:05:11.699
might be the real breakthrough:

00:05:12.100 --> 00:05:15.310
AI with built-in computer use. This is the mind

00:05:15.310 --> 00:05:16.829
-blowing part of the update. It doesn't just

00:05:16.829 --> 00:05:18.790
give you advice anymore. It actually acts on

00:05:18.790 --> 00:05:21.389
your behalf. Right. It writes code to run your

00:05:21.389 --> 00:05:24.290
machine. Or it visually navigates screenshots.

00:05:24.509 --> 00:05:27.350
Think about that OSWorld-Verified score. It hits

00:05:27.350 --> 00:05:32.129
75.0% on desktop use, beating the human score

00:05:32.129 --> 00:05:36.160
of 72.4% and absolutely crushing the old GPT

00:05:36.160 --> 00:05:40.839
5.2, which sat at just 47.3%. It can look at

00:05:40.839 --> 00:05:43.100
a messy desktop and find an invoice perfectly.

00:05:43.300 --> 00:05:45.699
When you look at web browsing, the Mind2Web

00:05:45.699 --> 00:05:48.519
test is fascinating. It finished complex tasks

00:05:48.519 --> 00:05:51.139
just by looking at browser screenshots with

00:05:51.139 --> 00:05:54.060
92.8% accuracy. It doesn't need a special API

00:05:54.060 --> 00:05:55.980
to talk to a website. It just looks at the screen

00:05:55.980 --> 00:05:58.180
like we do. We also have to mention agentic web

00:05:58.180 --> 00:06:00.560
search. On the BrowseComp test, the Pro model

00:06:00.560 --> 00:06:03.959
scored 89.3%. It excels at those painful

00:06:03.959 --> 00:06:07.300
needle-in-a-haystack tasks: finding that one tiny

00:06:07.300 --> 00:06:09.620
buried piece of information on a chaotic website.

00:06:09.860 --> 00:06:12.100
It works because its upgraded vision is incredible.

00:06:12.279 --> 00:06:16.079
The MMMU-Pro test proves that, scoring 81.2%

00:06:16.079 --> 00:06:18.459
on complex photos and scientific charts. And

00:06:18.459 --> 00:06:21.860
on OmniDocBench, it reads PDFs with a tiny 0.11

00:06:21.860 --> 00:06:24.579
error rate. It can process images at an original

00:06:24.579 --> 00:06:27.779
detail level of 10.24 million pixels. That is

00:06:27.779 --> 00:06:30.800
a 6,000-pixel dimension. It spots tiny 10-pixel

00:06:30.800 --> 00:06:33.480
buttons it used to miss entirely. I still wrestle

00:06:33.480 --> 00:06:36.480
with the idea of letting an AI freely click around

00:06:36.480 --> 00:06:38.800
my personal desktop. I completely understand

00:06:38.800 --> 00:06:41.420
that hesitation. Handing over the mouse feels

00:06:41.420 --> 00:06:44.459
unnatural. Very unnatural. But there are serious

00:06:44.459 --> 00:06:46.779
safety rules built into the architecture. You

00:06:46.779 --> 00:06:49.579
can guide it with specific, limiting messages.
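
Conceptually, those limiting messages act as a permission gate in front of every action. The sketch below shows the general pattern only; the action names and policy sets are hypothetical, not an actual OpenAI API.

```python
# Conceptual sketch of permission gating for an agent with mouse control.
# Action names and the risky/allowed sets are hypothetical illustrations.

RISKY_ACTIONS = {"delete_file", "send_email", "make_purchase"}
ALLOWED_ACTIONS = {"open_app", "read_file", "click", "type_text"} | RISKY_ACTIONS

def authorize(action: str, user_confirmed: bool = False) -> str:
    """Decide what happens before the agent is allowed to act."""
    if action not in ALLOWED_ACTIONS:
        return "blocked"      # never allowed by policy
    if action in RISKY_ACTIONS and not user_confirmed:
        return "ask_user"     # pause and ask permission first
    return "proceed"

print(authorize("click"))                              # routine action
print(authorize("delete_file"))                        # risky: must ask first
print(authorize("delete_file", user_confirmed=True))   # explicitly approved
```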

00:06:49.699 --> 00:06:52.420
With all this capability, I have to wonder, is

00:06:52.420 --> 00:06:54.920
it safe to let it click around your private files?

00:06:55.100 --> 00:06:58.079
You set strict rules and it asks permission before

00:06:58.079 --> 00:07:01.060
taking risks. We will be right back after a quick

00:07:01.060 --> 00:07:04.800
word. [Sponsor break.] Welcome back to the Deep Dive.

00:07:05.079 --> 00:07:08.220
Before the break, we talked about how GPT 5.4

00:07:08.220 --> 00:07:11.089
can navigate your desktop. But what if the software

00:07:11.089 --> 00:07:14.050
you need doesn't exist yet? That is where this

00:07:14.050 --> 00:07:16.470
update fundamentally changes how we build things.

00:07:16.709 --> 00:07:18.970
It's a massive deal for developers. The main

00:07:18.970 --> 00:07:22.449
GPT 5.4 model fully absorbed the old Codex tool.

00:07:22.889 --> 00:07:24.990
It is now native. And it's just as good as that

00:07:24.990 --> 00:07:26.930
specialized model ever was. It achieves much

00:07:26.930 --> 00:07:29.100
higher accuracy, and it does it in a fraction

00:07:29.100 --> 00:07:31.060
of the time. The iteration loop for software

00:07:31.060 --> 00:07:34.379
engineering is collapsing. GPT 5.2 used

00:07:34.379 --> 00:07:36.759
to take nearly 2,000 seconds to hit its best

00:07:36.759 --> 00:07:38.959
accuracy. Now you get better results in about

00:07:38.959 --> 00:07:42.079
half that time. It starts smarter and stays ahead

00:07:42.079 --> 00:07:45.319
as it thinks. This enables a wild concept called

00:07:45.319 --> 00:07:47.920
vibe coding. You don't need to know C++ anymore.

00:07:48.160 --> 00:07:50.300
You just describe what you want and the AI builds

00:07:50.300 --> 00:07:52.819
it from scratch. The source highlighted building

00:07:52.819 --> 00:07:56.399
a 3D highway racing game using a single prompt.

00:07:56.519 --> 00:07:59.399
It generated a complete HTML and JavaScript file

00:07:59.399 --> 00:08:01.920
instantly. It included a car selection screen

00:08:01.920 --> 00:08:04.720
with three colors. It added moving traffic, a

00:08:04.720 --> 00:08:07.839
nitro boost, and a real damage system. It even

00:08:07.839 --> 00:08:09.939
added professional details like street lamps

00:08:09.939 --> 00:08:13.120
and trees. The code was long and incredibly complex.

00:08:13.449 --> 00:08:16.290
But the physics actually felt real. If you can

00:08:16.290 --> 00:08:19.069
do that in 30 seconds, the barrier to entry for

00:08:19.069 --> 00:08:21.290
game design just dropped to zero. And if you

00:08:21.290 --> 00:08:23.629
are using the API, there is a new /fast

00:08:23.629 --> 00:08:27.029
mode. It makes code generation 1.5 times faster

00:08:27.029 --> 00:08:29.430
with no quality loss. It saves you from waiting

00:08:29.430 --> 00:08:32.029
for hundreds of lines of boilerplate code. But

00:08:32.029 --> 00:08:34.570
the biggest shift is the context window. It jumped

00:08:34.570 --> 00:08:38.529
from the standard 272k tokens to an experimental

00:08:38.529 --> 00:08:42.090
1 million tokens. The context window is the AI's

00:08:42.090 --> 00:08:44.070
short -term memory for tracking your current

00:08:44.070 --> 00:08:46.490
conversation and files. Whoa, imagine feeding

00:08:46.490 --> 00:08:48.809
it a whole library of code and it remembers line

00:08:48.809 --> 00:08:51.690
one. It changes everything about debugging. It

00:08:51.690 --> 00:08:54.090
can hold your entire project architecture in

00:08:54.090 --> 00:08:56.470
its head. It sees all your files at the same

00:08:56.470 --> 00:08:58.850
time to find hidden mistakes. But with all that

00:08:58.850 --> 00:09:01.850
processing power, does using that massive 1 million

00:09:01.850 --> 00:09:05.289
context window cost more? Yes. Requests over

00:09:05.289 --> 00:09:07.929
the standard limit eat usage at double speed.
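
As a back-of-the-envelope sketch (assuming "double speed" simply means tokens beyond the standard window count twice toward your usage):

```python
# Sketch of the stated billing rule, not official billing code.
STANDARD_LIMIT = 272_000  # tokens billed at the normal usage rate

def billed_usage(tokens: int) -> int:
    """Tokens within the standard window count once; overflow counts double."""
    overflow = max(0, tokens - STANDARD_LIMIT)
    return (tokens - overflow) + 2 * overflow

print(billed_usage(200_000))    # under the limit: billed as-is
print(billed_usage(1_000_000))  # 272k at normal rate + 728k doubled
```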

00:09:08.529 --> 00:09:11.159
We had to look at the broader rivalry here. How

00:09:11.159 --> 00:09:13.379
does this compare to competitors like Claude

00:09:13.379 --> 00:09:17.080
4.6 and Gemini 3.1? It's a very close race

00:09:17.080 --> 00:09:20.539
right now. GPT 5.4 clearly wins on knowledge

00:09:20.539 --> 00:09:23.220
work, overall speed, and computer use. That built

00:09:23.220 --> 00:09:25.899
-in desktop interaction is incredibly advanced

00:09:25.899 --> 00:09:28.299
compared to the rest. But to be fair to the source,

00:09:28.539 --> 00:09:31.159
Claude still sounds more natural. Yeah, GPT

00:09:31.159 --> 00:09:33.600
5.4 can sometimes feel a bit robotic when writing

00:09:33.600 --> 00:09:36.379
blogs or creative essays. It lacks a certain

00:09:36.379 --> 00:09:39.559
warmth. That is a fair critique. And Gemini remains

00:09:39.559 --> 00:09:41.620
highly creative for marketing and storytelling.

00:09:42.179 --> 00:09:44.159
The constant back and forth between these companies

00:09:44.159 --> 00:09:47.179
is great for users, though. It forces rapid innovation.

00:09:47.419 --> 00:09:49.379
Let's talk about pricing. Because this power

00:09:49.379 --> 00:09:52.080
isn't free. No, it is not. For plus users paying

00:09:52.080 --> 00:09:56.559
$20 a month, you get 5.3 Instant and 5.4 Thinking.

00:09:56.879 --> 00:09:59.700
Pro requires a much higher enterprise tier. There

00:09:59.700 --> 00:10:03.120
were some launch day message limit bugs frustrating

00:10:03.120 --> 00:10:05.679
users. But those usually clear up quickly as

00:10:05.679 --> 00:10:07.779
they scale their servers. For API developers

00:10:07.779 --> 00:10:10.299
building apps, the tokens do cost a bit more

00:10:10.299 --> 00:10:15.419
upfront. It's $2.50 in and $15 out per 1 million

00:10:15.419 --> 00:10:18.899
tokens. But you actually save up to 47% overall.
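
At those quoted rates, a single request's cost is straightforward arithmetic. A quick sketch (the example token counts are made up):

```python
# Quoted API rates: $2.50 per 1M input tokens, $15 per 1M output tokens.
PRICE_IN = 2.50 / 1_000_000    # dollars per input token
PRICE_OUT = 15.00 / 1_000_000  # dollars per output token

def request_cost(tokens_in: int, tokens_out: int) -> float:
    """Dollar cost of one request at the quoted rates."""
    return round(tokens_in * PRICE_IN + tokens_out * PRICE_OUT, 4)

# e.g. a 10k-token prompt with a 2k-token answer:
print(request_cost(10_000, 2_000))
```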

00:10:19.259 --> 00:10:21.779
The new tool search feature is a brilliant cost

00:10:21.779 --> 00:10:24.519
saving measure. In the past, the AI had to read

00:10:24.519 --> 00:10:27.200
every single tool definition for every single

00:10:27.200 --> 00:10:29.299
request. It was like reading the entire dictionary

00:10:29.299 --> 00:10:31.720
just to spell one word. Now it only looks up

00:10:31.720 --> 00:10:34.940
the specific tool it needs. In tests, that dropped

00:10:34.940 --> 00:10:39.159
usage from 123k tokens down to just 65k tokens.
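
Those two figures line up with the "up to 47%" savings quoted earlier. A quick check:

```python
# Token usage before and after tool search, from the cited test.
before, after = 123_000, 65_000
savings = (before - after) / before
print(f"{savings:.0%}")  # roughly 47%
```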

00:10:39.720 --> 00:10:42.259
So are overall API costs actually going down

00:10:42.259 --> 00:10:44.379
for developers? Smarter tool search means fewer

00:10:44.379 --> 00:10:46.820
tokens, offsetting the higher base price. On

00:10:46.820 --> 00:10:49.759
the safety front, the AI relies heavily on reinforcement

00:10:49.759 --> 00:10:51.940
learning. Which is crucial. Reinforcement learning

00:10:51.940 --> 00:10:53.840
is checking its own work and trying different

00:10:53.840 --> 00:10:57.139
paths to find answers. Exactly. It grades its

00:10:57.139 --> 00:10:59.139
own homework before showing you the final result.
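
A common shape for that self-checking idea is best-of-n sampling against a scorer, sketched generically here with a toy scoring task. This illustrates the pattern, not OpenAI's disclosed training internals.

```python
import random

# Generic best-of-n self-checking loop: propose several candidates,
# score each with a checker, and return the highest-scoring one.
# The toy task (get close to a target number) stands in for real reasoning.

def propose(rng: random.Random) -> int:
    return rng.randint(0, 100)

def score(candidate: int, target: int = 42) -> float:
    return -abs(candidate - target)  # closer to the target is better

def best_of_n(n: int = 16, seed: int = 0) -> int:
    rng = random.Random(seed)
    candidates = [propose(rng) for _ in range(n)]
    return max(candidates, key=score)

print(best_of_n())
```

More samples plus a stricter checker generally means a better final answer, at the cost of extra compute, which mirrors the speed/quality trade-off between the Instant and Thinking tiers.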

00:10:59.340 --> 00:11:01.220
And its internal thoughts are fully transparent

00:11:01.220 --> 00:11:04.320
to OpenAI. It can't hide its reasoning from human

00:11:04.320 --> 00:11:07.580
oversight. That prevents it from executing malicious

00:11:07.580 --> 00:11:10.419
logic. Let's leave you with a few pro tips for

00:11:10.419 --> 00:11:14.080
your daily workflow. The first is mid-response

00:11:14.080 --> 00:11:17.000
redirection. This workflow is absolute magic.

00:11:17.120 --> 00:11:18.740
You don't have to hit the stop button anymore

00:11:18.740 --> 00:11:20.960
if it goes off track. It's like steering a horse

00:11:20.960 --> 00:11:22.840
while you're already galloping. If it's writing

00:11:22.840 --> 00:11:25.399
a long report and you realize you forgot an instruction,

00:11:25.600 --> 00:11:28.480
you just type it. You say, "focus on environmental

00:11:28.480 --> 00:11:31.830
impact," right while it is generating text. And

00:11:31.830 --> 00:11:34.610
it pivots instantly. It just adapts the text

00:11:34.610 --> 00:11:37.049
on the fly without missing a beat. You don't

00:11:37.049 --> 00:11:39.470
lose the good parts it already wrote. Do I have

00:11:39.470 --> 00:11:41.429
to wait for it to finish typing completely? Nope.

00:11:41.509 --> 00:11:43.450
Just interrupt mid -sentence and it pivots its

00:11:43.450 --> 00:11:45.809
thoughts instantly. You can also adjust its thinking

00:11:45.809 --> 00:11:48.570
effort in the settings menu. You have standard

00:11:48.570 --> 00:11:51.090
and heavy options. Leave it on standard for

00:11:51.090 --> 00:11:54.269
90% of your daily tasks. It's fast, incredibly

00:11:54.269 --> 00:11:57.059
smart, and handles routine logic perfectly. But

00:11:57.059 --> 00:12:00.240
crank it up to heavy for deep math or nasty coding

00:12:00.240 --> 00:12:02.620
bugs. It might take five to eight minutes to

00:12:02.620 --> 00:12:05.259
answer. Think about what an eight

00:12:05.259 --> 00:12:08.700
minute AI thought process looks like. It yields

00:12:08.700 --> 00:12:11.399
master level results. It really does. Let's take

00:12:11.399 --> 00:12:13.779
a step back and look at the big idea here.

00:12:13.779 --> 00:12:17.039
This update fundamentally shifts

00:12:17.039 --> 00:12:20.500
what this tool actually is. It's no longer just

00:12:20.500 --> 00:12:22.740
a super powered search engine. It has become

00:12:22.740 --> 00:12:25.730
a true desktop partner. And the ultimate metric

00:12:25.730 --> 00:12:29.129
we are talking about is time. You are compressing

00:12:29.129 --> 00:12:31.669
an hour of tedious spreadsheet wrangling into

00:12:31.669 --> 00:12:33.610
minutes. You're turning a day of hunting down

00:12:33.610 --> 00:12:35.789
code bugs into 10 minutes of simple oversight.

00:12:36.210 --> 00:12:39.029
Exactly. If this AI can now flawlessly look at

00:12:39.029 --> 00:12:41.549
your screen, interpret the buttons, and navigate

00:12:41.549 --> 00:12:44.049
your desktop better than most humans, how long

00:12:44.049 --> 00:12:46.850
until we stop needing traditional operating systems

00:12:46.850 --> 00:12:48.950
and screens altogether? What happens when the

00:12:48.950 --> 00:12:52.279
AI becomes the entire interface?

00:12:52.580 --> 00:12:54.960
That is wild to think about. It changes the whole

00:12:54.960 --> 00:12:56.899
paradigm of human -computer interaction. For

00:12:56.899 --> 00:12:59.840
now, try this out yourself. Pick one tedious

00:12:59.840 --> 00:13:02.620
task you do every single week. Just see if GPT

00:13:02.620 --> 00:13:05.019
5.4 Thinking can do the first draft for you.

00:13:05.299 --> 00:13:07.379
You might be surprised by how much your daily

00:13:07.379 --> 00:13:10.340
workflow changes. Thanks for exploring this deep

00:13:10.340 --> 00:13:12.460
dive with us. Catch you next time.
