WEBVTT

00:00:00.000 --> 00:00:05.219
Imagine having a team of brilliant experts. Not

00:00:05.219 --> 00:00:07.360
just a chatbot giving facts, but something that

00:00:07.360 --> 00:00:12.119
really thinks, plans, acts for you. Instantly,

00:00:12.140 --> 00:00:14.199
they're in your pocket. This isn't sci-fi anymore.

00:00:14.339 --> 00:00:16.500
It's actually here. It really feels that way.

00:00:16.800 --> 00:00:19.160
Welcome to the Deep Dive. Today, we're plunging

00:00:19.160 --> 00:00:21.660
into what feels like, well, a huge moment in

00:00:21.660 --> 00:00:24.719
AI. Yeah, the GPT-5 launch. Exactly. OpenAI's

00:00:24.719 --> 00:00:28.800
GPT-5. Our mission is to unpack the pretty bold

00:00:28.800 --> 00:00:32.420
claims. Which are surprising. Definitely surprising.

00:00:32.640 --> 00:00:35.939
And what this new unified model really means

00:00:35.939 --> 00:00:38.759
for you. We'll cover its capabilities, its immediate

00:00:38.759 --> 00:01:35.310
impact, and then look at other big A... So it's

00:01:35.310 --> 00:01:37.170
not just about being faster, then. It's about

00:01:37.170 --> 00:01:39.310
a different kind of intelligence working together.

00:01:39.590 --> 00:01:41.829
I think that's the idea. It's a significant leap,

00:01:41.930 --> 00:02:08.979
yeah. It allows for way more... Okay, that's

00:02:08.979 --> 00:02:10.639
different. And there's this real-time routing

00:02:10.639 --> 00:02:13.860
thing, too. The model decides for itself. Quick

00:02:13.860 --> 00:02:16.199
answer needed? Or does this need deeper thought?

00:02:16.360 --> 00:02:19.180
And it just does it. Yeah, no user input needed

00:02:19.180 --> 00:02:21.620
for that switch. That's why OpenAI is calling

00:02:21.620 --> 00:02:24.300
it an agent over chatbot. Agent over chatbot.
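To make the routing idea concrete, here is a minimal, purely hypothetical sketch: a router inspects each request and dispatches it to a fast model or a slower deep-reasoning model. The model names and the keyword heuristic are illustrative assumptions, not OpenAI's actual mechanism, which the transcript says happens inside the model itself.

```python
# Hypothetical sketch of real-time routing: classify a prompt, then
# dispatch to a fast path or a deep-reasoning path. The heuristic and
# model labels are toy stand-ins, not OpenAI's real router.

def looks_complex(prompt: str) -> bool:
    """Crude stand-in for a learned router: long, multi-step, or
    code-heavy prompts get routed to the deeper model."""
    signals = ["step by step", "prove", "debug", "plan", "analyze"]
    return len(prompt) > 400 or any(s in prompt.lower() for s in signals)

def route(prompt: str) -> str:
    # In the system described above, the model makes this call with no
    # user input; here we just return a label to show the control flow.
    return "deep-reasoning-model" if looks_complex(prompt) else "fast-model"

if __name__ == "__main__":
    print(route("What's the capital of France?"))           # fast-model
    print(route("Debug this race condition step by step"))  # deep-reasoning-model
```

The point of the sketch is only the shape of the decision: one entry point, two execution paths, chosen per request rather than per session.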

00:02:24.539 --> 00:02:27.900
Their pitch is basically a team of PhD-level

00:02:27.900 --> 00:02:30.340
experts in your pocket. That's quite the image.

00:02:30.759 --> 00:02:33.180
And they're saying it's the best model in the

00:02:33.180 --> 00:02:35.599
world. Do the numbers back that up? What about

00:02:35.599 --> 00:02:37.479
those benchmarks? The numbers are pretty strong,

00:02:37.560 --> 00:02:41.419
yeah. Coding, for example, 74.9% accuracy.

00:02:41.620 --> 00:02:44.379
Okay, 74.9. How's that compare? It just edges

00:02:44.379 --> 00:02:48.139
out Claude Opus 4.1, which was at 74.5%. Very

00:02:48.139 --> 00:02:50.740
close. Very close. Yeah. But it really pulls

00:02:50.740 --> 00:02:53.500
ahead of Google's Gemini 2.5 Pro, which was

00:02:53.500 --> 00:02:57.969
59%. That's a big gap. What about other areas

00:02:57.969 --> 00:03:01.490
like science? Science Q&A, it hit 89.4%. That

00:03:01.490 --> 00:03:05.430
beats Claude Opus 4.1 at 80.9%. And even Grok

00:03:05.430 --> 00:03:09.030
4 Heavy at 88.9%. Impressive. But I saw something

00:03:09.030 --> 00:03:11.379
about healthcare accuracy. That seemed really

00:03:11.379 --> 00:03:14.439
striking. Yes, that's maybe the biggest deal.

00:03:14.500 --> 00:03:16.599
The hallucination rate, you know, when the AI

00:03:16.599 --> 00:03:18.819
just makes stuff up. Yeah, always a worry. It

00:03:18.819 --> 00:03:23.120
dropped to just 1.6%. 1.6%, seriously? Seriously.

00:03:23.479 --> 00:03:28.000
Compare that to GPT-4o's 12.9% or o3's 15

00:03:28.000 --> 00:03:31.879
.8%. Massive reduction. Wow. What does that mean,

00:03:31.919 --> 00:03:34.580
practically? Well, it's not really for solo diagnosis,

00:03:34.780 --> 00:03:37.460
obviously. But for things like summarizing research

00:03:37.460 --> 00:03:40.099
papers for a doctor, much, much more reliable.

00:03:49.120 --> 00:03:52.259
Humanity's Last Exam. Super hard reasoning benchmark.

00:03:52.479 --> 00:03:56.340
OK. GPT-5 scored 42 percent, which is just slightly

00:03:56.340 --> 00:03:59.699
under Grok 4 Heavy's 44.4 percent. So it's...

00:04:20.560 --> 00:04:22.660
700 million people? That's almost 10% of the

00:04:22.660 --> 00:04:24.740
planet. Getting access overnight? Instantly.

00:04:24.899 --> 00:04:28.199
It's a massive strategic play by OpenAI. Keep

00:04:28.199 --> 00:04:30.879
users in their ecosystem. You know, fight off

00:04:31.019 --> 00:04:34.560
Anthropic, xAI, Google. Makes sense. Lock them

00:04:34.560 --> 00:04:36.699
in with the best stuff. Exactly. It's about market

00:04:36.699 --> 00:04:39.240
share and keeping users engaged. You know, it's

00:04:39.240 --> 00:04:41.459
funny. I still wrestle with this myself

00:04:41.459 --> 00:04:45.079
sometimes. Getting the AI to consistently understand

00:04:45.079 --> 00:04:47.100
what I mean, especially over several turns on

00:04:47.100 --> 00:04:50.800
a complex task. This is sort of the idea of a

00:04:50.800 --> 00:04:53.339
model that just gets it for those multi-step

00:04:53.339 --> 00:04:55.540
things without me having to constantly tweak

00:04:55.540 --> 00:04:57.899
and rephrase. Right. Less babysitting. Yeah.

00:04:58.139 --> 00:05:00.699
That feels like a genuine game changer for actually

00:05:00.699 --> 00:05:02.939
using it effectively day to day. Definitely.

00:05:03.100 --> 00:05:33.079
So boiling it down... Yeah, the rivalries are heating

00:05:33.079 --> 00:05:36.060
up, aren't they? Like, hours after GPT-5 dropped,

00:05:36.279 --> 00:05:38.939
Elon Musk claimed his Grok 4 Heavy was already

00:05:38.939 --> 00:05:40.899
smarter. Right, said it was smarter two weeks

00:05:40.899 --> 00:05:43.379
before the launch, though, you know, no published

00:05:43.379 --> 00:05:45.720
benchmarks to back that up yet. Still, it shows

00:05:45.720 --> 00:05:49.519
that intense personal and tech rivalry. And speaking

00:05:49.519 --> 00:05:52.699
of competition, OpenAI reportedly spent

00:05:52.839 --> 00:05:56.439
$1.5 million per employee. Yeah, for about 1

00:05:56.439 --> 00:05:59.639
,000 staff. Over $1.5 billion total right before

00:05:59.639 --> 00:06:01.519
the launch. Just to keep talent from jumping

00:06:01.519 --> 00:06:04.000
ship. Seems like it. It's a war for talent out

00:06:04.000 --> 00:06:06.519
there. Huge money flying around. And huge...

00:06:35.240 --> 00:06:38.139
They just raised $100 million at a $3.1 billion

00:06:38.139 --> 00:06:41.120
valuation. Wow, that's a steep climb. Yeah, a

00:06:41.120 --> 00:06:43.759
6x increase in less than a year. And get this,

00:06:43.920 --> 00:06:47.240
they've been profitable since day one. Profitable

00:06:47.240 --> 00:06:51.040
in AI research, that's rare. Very. And they massively

00:06:51.040 --> 00:06:53.439
cut the cost of using their video models from

00:06:53.439 --> 00:06:56.379
like $1,400 an hour down to $0.25 an hour.

00:06:56.420 --> 00:07:00.199
Okay, wait, $1,400 down to $0.25? Yeah, that's

00:07:00.199 --> 00:07:02.000
completely disruptive for anyone working with

00:07:02.000 --> 00:07:03.970
video generation. It makes it accessible. That

00:07:03.970 --> 00:07:07.129
cost reduction alone is revolutionary. And on

00:07:07.129 --> 00:07:09.589
a more relatable note, there is this funny thing

00:07:09.589 --> 00:07:34.230
on Reddit. Yeah. A senior D... And from people

00:07:34.230 --> 00:07:36.589
wanting to learn this stuff. Andrew Ng, you know,

00:07:36.610 --> 00:07:39.449
the AI pioneer. Sure. He announced a big free

00:07:39.449 --> 00:07:42.389
course on Claude Code, partnering with Anthropic.

00:07:42.569 --> 00:07:45.250
Free course from Andrew Ng, nice. Yeah, focused

00:07:45.250 --> 00:07:48.149
on what they call agentic AI, AI that acts with

00:07:48.149 --> 00:07:50.449
more autonomy. Great opportunity to learn from

00:07:50.449 --> 00:07:52.670
one of the best. Okay, so when you put all these

00:07:52.670 --> 00:07:54.829
things together, the rivalries, the new tools

00:07:54.829 --> 00:07:57.790
like Wan 2.2, the cost drops from Decart, even

00:07:57.790 --> 00:08:01.990
the funny explanations and the courses. Picture

00:08:01.990 --> 00:08:05.310
the pace of the AI landscape right now. It's

00:08:05.310 --> 00:08:07.910
intense competition driving incredibly rapid

00:08:07.910 --> 00:08:10.750
innovation, leading to practical, accessible

00:08:10.750 --> 00:08:13.329
applications emerging almost constantly. Constant

00:08:13.329 --> 00:08:36.519
mode. And they're already weaving GPT-5 into

00:08:36.519 --> 00:08:39.120
basically everything. Yeah, consumer stuff, developer

00:08:39.120 --> 00:08:42.220
tools, corporate offerings. They're moving fast.

00:08:42.639 --> 00:08:44.820
And Duolingo. Remember the backlash when they

00:08:44.820 --> 00:08:47.960
went AI first? Well, it seems like that's died

00:08:47.960 --> 00:08:49.679
down. The new release isn't hurting their momentum

00:08:49.679 --> 00:08:52.279
much. Interesting. People adapt quickly, maybe?

00:08:52.559 --> 00:08:55.399
Maybe. Okay, here's one that's a bit out there.

00:08:55.559 --> 00:09:00.659
Oh? Go on. Imagine an AI system discovering

00:09:00.659 --> 00:09:09.740
a new kind of physics. Yeah, that expands what

00:09:09.740 --> 00:09:12.820
discovery even means. Wow. That idea of AI discovering

00:09:12.820 --> 00:09:15.720
new physics, it really is profound. Kind of humbling.

00:09:16.059 --> 00:09:40.799
Absolutely. Also, a quick one... Adapt their reasoning

00:09:40.799 --> 00:09:43.600
in real time. Adapt their reasoning, meaning?

00:09:43.919 --> 00:09:46.440
Meaning they move beyond just using their static

00:09:46.440 --> 00:09:49.200
pre-trained knowledge. They're not just reciting

00:09:49.200 --> 00:09:51.700
facts they learned. Okay, so it's more dynamic

00:09:51.700 --> 00:09:54.700
than just recalling information. Exactly. And

00:09:54.700 --> 00:09:57.460
the results are impressive. It tripled GPT-4

00:09:57.460 --> 00:10:00.679
.1's accuracy on tough biomedical questions.

00:10:01.080 --> 00:10:03.899
Tripled it? Yep. And it even outperformed o3,

00:10:04.120 --> 00:10:06.519
which is already a top reasoning model. How does

00:10:06.519 --> 00:10:08.799
it do that? It generates these internal feedback

00:10:08.799 --> 00:10:13.080
loops while it's working. It can manage its own

00:10:13.080 --> 00:10:16.000
memory, recognize when it's uncertain about its

00:10:16.000 --> 00:10:18.519
own reasoning, and then this is the key part,

00:10:18.679 --> 00:10:21.480
revise its answers mid-process. So it's like

00:10:21.480 --> 00:10:23.440
it's thinking out loud to itself and correcting

00:10:23.440 --> 00:10:25.539
its own path. That's a great way to put it, a

00:10:25.539 --> 00:10:27.379
kind of self-reflection and correction. Like

00:10:27.379 --> 00:10:29.970
an internal monologue almost. Yeah. And what's

00:10:29.970 --> 00:10:31.690
really cool for users, especially scientists

00:10:31.690 --> 00:10:34.450
or devs, is the control it offers. Control how?

00:10:34.669 --> 00:10:37.070
You can set uncertainty thresholds, basically

00:10:37.070 --> 00:10:40.409
telling it how sure it needs to be. You can rerun

00:10:40.409 --> 00:10:43.029
its reasoning paths, even tweak them. So you

00:10:43.029 --> 00:10:46.059
can see how it's thinking and guide it. Exactly.
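The loop being described, draft, estimate confidence, revise, stop once an uncertainty threshold is cleared, can be sketched in a few lines. This is a toy illustration of the control flow only; the data and scoring are invented, and it is not CLIO's actual implementation.

```python
# Toy sketch of an uncertainty-threshold revision loop: walk candidate
# drafts, keep the best so far, and stop early once confidence clears
# the threshold. Drafts and scores are invented for illustration.

from dataclasses import dataclass

@dataclass
class Attempt:
    answer: str
    confidence: float

def solve_with_revision(drafts, threshold=0.8):
    """drafts is an ordered list of (answer, confidence) pairs, standing
    in for successive reasoning passes. Returns the best attempt, halting
    as soon as the confidence threshold is met."""
    best = Attempt("", 0.0)
    for answer, confidence in drafts:
        if confidence > best.confidence:
            best = Attempt(answer, confidence)  # revise mid-process
        if best.confidence >= threshold:
            break  # sure enough: stop reasoning
    return best

if __name__ == "__main__":
    drafts = [("guess A", 0.4), ("guess B", 0.75), ("guess C", 0.9)]
    result = solve_with_revision(drafts, threshold=0.8)
    print(result.answer, result.confidence)  # guess C 0.9
```

The user-facing knob the hosts mention maps onto `threshold` here: raise it and the system keeps revising longer before it commits to an answer.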

00:10:46.059 --> 00:10:49.399
You can oversee how it handles tricky, ambiguous

00:10:49.399 --> 00:10:51.960
problems. And it gives you rigor, but also

00:10:51.960 --> 00:10:54.480
flexibility. That sounds incredibly useful, not

00:10:54.480 --> 00:10:56.299
just getting the answer, but understanding the

00:10:56.299 --> 00:10:58.899
process. Right. It's like a GPS. Yeah.

00:10:58.980 --> 00:11:01.360
Instead of one fixed route. Yeah. It's like a

00:11:01.360 --> 00:11:05.120
GPS that can dynamically rethink your path mid

00:11:05.120 --> 00:11:07.960
-drive if it hits traffic or finds a better way.

00:11:08.120 --> 00:11:10.279
That's a fantastic analogy, adapting on the fly.

00:11:10.399 --> 00:11:12.659
It really is. And it raises a big question, doesn't

00:11:12.659 --> 00:11:15.090
it? Which is? If Microsoft can make GPT-4

00:11:15.090 --> 00:11:18.830
.1, I think, steer and adapt like this. What

00:11:18.830 --> 00:11:21.990
happens when they apply CLIO to their next generation

00:11:21.990 --> 00:11:25.370
models? Oh, yeah. The potential there seems huge.

00:11:25.529 --> 00:11:52.940
Immense. Right. Powerful AI reasoning, suddenly

00:11:52.940 --> 00:11:55.559
accessible to hundreds of millions. It's a huge

00:11:55.559 --> 00:11:58.240
shift from just chatbots to these dynamic assistants

00:11:58.240 --> 00:12:00.940
that can actually do complex things. And then

00:12:00.940 --> 00:12:04.450
you add things like Microsoft CLIO. And you see

00:12:04.450 --> 00:12:06.990
AI itself becoming more fluid, more adaptive.

00:12:07.210 --> 00:12:09.490
Yeah, not just static knowledge anymore. Exactly.

00:12:09.490 --> 00:12:11.649
They're becoming systems that can think, self

00:12:11.649 --> 00:12:14.669
-correct, learn in real time, explore new possibilities.

00:12:14.669 --> 00:12:17.590
The pace is just wild. The competition's intense.

00:12:18.070 --> 00:12:20.860
But the end result seems to be... AI that's more

00:12:20.860 --> 00:12:23.100
capable, more practical. And just more integrated

00:12:23.100 --> 00:12:25.379
into everything we do. It's an incredible moment

00:12:25.379 --> 00:12:28.639
to watch unfold. It really is. So maybe something

00:12:28.639 --> 00:12:31.279
for you listening to think about. Yeah. How might

00:12:31.279 --> 00:12:33.620
these new capabilities, especially this idea

00:12:33.620 --> 00:12:36.019
of an AI agent in your pocket, actually impact

00:12:36.019 --> 00:12:39.379
your daily life? Yeah. Your work. What specific

00:12:39.379 --> 00:12:41.759
task would you just hand over to this new level

00:12:41.759 --> 00:12:43.860
of AI? Yeah. What's the first thing you'd want

00:12:43.860 --> 00:12:45.919
it to handle for you? And maybe a bigger thought.

00:12:46.490 --> 00:12:49.830
If AI can discover new physics like we talked

00:12:49.830 --> 00:12:53.970
about or adapt its thinking like CLIO, how does

00:12:53.970 --> 00:12:56.490
that change our own understanding of intelligence

00:12:56.490 --> 00:12:58.809
itself in the years ahead? What does it mean

00:12:58.809 --> 00:13:03.009
for us? Deep questions. Lots to ponder. Definitely.

00:13:03.370 --> 00:13:07.149
We hope you found some aha moments in this deep

00:13:07.149 --> 00:13:09.289
dive. And hopefully you'll keep exploring all

00:13:09.289 --> 00:13:11.289
these fascinating developments with us. Thanks

00:13:11.289 --> 00:13:12.970
so much for joining us. Until next time, keep

00:13:12.970 --> 00:13:13.409
exploring.
