WEBVTT

00:00:00.000 --> 00:00:02.439
So we have AI models now that honestly feel like

00:00:02.439 --> 00:00:05.339
they have this incredible grasp of language. They

00:00:05.339 --> 00:00:09.699
can chew through huge data sets, spot really

00:00:09.699 --> 00:00:12.919
subtle hate speech online, flag toxic stuff on

00:00:12.919 --> 00:00:16.500
big platforms. And even, you know, write pretty

00:00:16.500 --> 00:00:18.859
complex marketing campaigns basically from scratch.

00:00:19.899 --> 00:00:22.420
It's amazing abstract thinking. Right. And then

00:00:22.420 --> 00:00:25.039
you take that same, like, immense brain power,

00:00:25.120 --> 00:00:27.739
stick it in a physical robot body. Yeah. And

00:00:27.739 --> 00:00:30.079
these machines literally just spin around in

00:00:30.079 --> 00:00:33.039
circles, lost, trying to find the butter on a

00:00:33.039 --> 00:00:36.079
kitchen counter. Why is that physical common

00:00:36.079 --> 00:00:38.420
sense, that simple intuitive stuff we do without

00:00:38.420 --> 00:00:41.659
thinking, why is that still the absolute biggest

00:00:41.659 --> 00:00:44.520
hurdle for AI? It is. That huge, kind of hilarious

00:00:44.520 --> 00:00:47.259
gap. That's the tension we're really digging

00:00:47.259 --> 00:00:49.140
into today. It's a fascinating contradiction,

00:00:49.359 --> 00:00:51.460
isn't it? And our sources lay it out really well.

00:00:51.539 --> 00:00:53.200
We're going to kind of trace this whole development

00:00:53.200 --> 00:00:55.079
arc. Okay. First up, we'll look at these new

00:00:55.079 --> 00:00:57.560
safety and transparency pushes from the big players,

00:00:57.619 --> 00:01:00.100
like OpenAI, how they're trying to build trust.

00:01:00.359 --> 00:01:02.740
Right. Then we need to look at just how fast

00:01:02.740 --> 00:01:05.099
digital AI is being adopted, especially like

00:01:05.099 --> 00:01:07.859
creative tools, marketing, business automation.

00:01:08.680 --> 00:01:11.340
It's just surging. Yeah. It really is. We'll

00:01:11.340 --> 00:01:13.540
touch on the huge infrastructure build supporting

00:01:13.540 --> 00:01:15.780
that speed, too. Good point. And finally, we

00:01:15.780 --> 00:01:18.260
absolutely have to confront that reality gap.

00:01:18.439 --> 00:01:22.219
The butter bench test, robots failing super simple

00:01:22.219 --> 00:01:25.200
home tasks. It just perfectly shows the limits

00:01:25.200 --> 00:01:27.459
when digital smarts meet the physical world.

00:01:27.620 --> 00:01:29.480
OK, let's start there then with segment one,

00:01:29.579 --> 00:01:33.239
this push for trust, for transparency. OpenAI

00:01:33.239 --> 00:01:35.659
has been, well, under a lot of pressure lately,

00:01:35.799 --> 00:01:38.379
right, about safety misuse. Definitely. And that

00:01:38.379 --> 00:01:40.599
pressure seems to be leading to models that are

00:01:40.599 --> 00:01:42.959
specifically built to show their work. Exactly.

00:01:42.959 --> 00:01:44.879
They just put out two new reasoning models. There's

00:01:44.879 --> 00:01:48.700
the big one, gpt-oss-safeguard 120B. 120 billion

00:01:48.700 --> 00:01:51.079
parameters. Yeah, serious scale. And then the

00:01:51.079 --> 00:01:54.060
smaller, maybe more accessible one, gpt-oss-safeguard

00:01:54.060 --> 00:01:56.900
20B. These aren't just tweaks. They are

00:01:56.900 --> 00:01:59.519
custom-tuned, really focused on online safety tasks.

00:02:00.040 --> 00:02:02.260
And their job is pretty specific, right? Like

00:02:02.260 --> 00:02:05.379
finding fake reviews, shutting down nasty comments,

00:02:05.500 --> 00:02:08.039
maybe catching cheating talk on gaming forums

00:02:08.039 --> 00:02:10.699
or something. Precisely. But the really unique

00:02:10.699 --> 00:02:13.379
thing, and honestly the critical part, is they

00:02:13.379 --> 00:02:16.500
don't just say yes or no, safe or unsafe. They're

00:02:16.500 --> 00:02:18.879
built to show their reasoning. They actually

00:02:18.879 --> 00:02:22.280
explain, often step by step, why they flagged

00:02:22.280 --> 00:02:24.960
a comment or a response as harmful or toxic.

00:02:25.439 --> 00:02:28.120
That seems huge for platforms. So if you're,

00:02:28.120 --> 00:02:30.879
say, a Reddit mod dealing with tons of posts.

00:02:31.120 --> 00:02:33.659
You could use this to fight hate speech, and

00:02:33.659 --> 00:02:36.180
the AI actually gives you the justification for

00:02:36.180 --> 00:02:39.319
pulling something down. Exactly. Or maybe a review

00:02:39.319 --> 00:02:42.259
site feels more confident catching really clever

00:02:42.259 --> 00:02:45.939
fake reviews because the AI explains the subtle

00:02:45.939 --> 00:02:47.939
signs it picked up on. And you have to assume

00:02:47.939 --> 00:02:50.379
this push for transparency is partly a response

00:02:50.379 --> 00:02:52.719
to some very serious incidents, like that case

00:02:52.719 --> 00:02:55.099
alleging ChatGPT's involvement in a suicide.

00:02:55.439 --> 00:02:57.300
Absolutely. It really highlights that safety

00:02:57.300 --> 00:03:00.080
can't just be an internal checkbox anymore. Platforms

00:03:00.080 --> 00:03:01.800
need to be able to publicly prove and defend

00:03:01.800 --> 00:03:04.020
how they're keeping things safe. It feels like

00:03:04.020 --> 00:03:06.620
a really important step for accountability. But,

00:03:06.699 --> 00:03:09.479
I mean, doesn't showing the reasoning also kind

00:03:09.479 --> 00:03:11.479
of teach the bad actors how to get around it?

00:03:11.740 --> 00:03:14.400
Ah, yeah. If I know exactly why my toxic comment

00:03:14.400 --> 00:03:17.080
got flagged, can't I just tweak it slightly next

00:03:17.080 --> 00:03:19.919
time to sneak past the filter? That is the fundamental

00:03:19.919 --> 00:03:23.819
cat and mouse game of AI safety. 100%. You are

00:03:23.819 --> 00:03:26.599
kind of giving away the playbook. Okay. But the

00:03:26.599 --> 00:03:30.639
flip side... A completely opaque black box moderation

00:03:30.639 --> 00:03:34.719
system that tanks user trust instantly, especially

00:03:34.719 --> 00:03:36.759
when it makes mistakes. Yeah. So it's a tradeoff.

00:03:37.180 --> 00:03:39.659
Transparency is seen as, well, maybe the lesser

00:03:39.659 --> 00:03:42.379
evil for building that essential user confidence.

00:03:42.620 --> 00:03:44.719
And we should probably admit, even for the folks

00:03:44.719 --> 00:03:47.740
building these systems, getting AI behavior right

00:03:47.740 --> 00:03:49.960
is just incredibly difficult. Oh, absolutely.

00:03:50.180 --> 00:03:52.219
But I still wrestle with prompt drift myself

00:03:52.219 --> 00:03:54.960
sometimes, just trying to get an AI to stay focused

00:03:54.960 --> 00:03:57.150
on a specific task consistently. Prompt drift.

00:03:57.349 --> 00:03:59.430
That's when the model kind of forgets the original

00:03:59.430 --> 00:04:01.770
instructions or it loses context over a long

00:04:01.770 --> 00:04:04.229
chat. Exactly. Getting these incredibly complex

00:04:04.229 --> 00:04:08.669
models to stay reliably within safe, narrow boundaries.

00:04:08.770 --> 00:04:11.389
It's a constant ongoing challenge. A real battle.

00:04:11.710 --> 00:04:14.569
So think about the core value proposition here.

00:04:15.270 --> 00:04:18.290
How crucial is this transparency feature, this

00:04:18.290 --> 00:04:21.589
showing the reasoning for getting users and platforms

00:04:21.589 --> 00:04:24.439
to adopt these safety measures and set policies

00:04:24.439 --> 00:04:26.540
around them. Fundamentally, it builds trust in

00:04:26.540 --> 00:04:28.939
automated moderation systems. Yeah. Reduces liability

00:04:28.939 --> 00:04:32.860
too. Okay. So moving from the need for safety

00:04:32.860 --> 00:04:35.300
to the sheer speed of innovation, let's talk

00:04:35.300 --> 00:04:37.879
about this incredible pace of adoption in the

00:04:37.879 --> 00:04:40.500
digital realm. Creative tools, marketing AI,

00:04:40.939 --> 00:04:44.220
it feels like it's just exploding right now.

00:04:44.360 --> 00:04:45.879
Exploding is the right word. I mean, look at

00:04:45.879 --> 00:04:48.040
that new Google web tool. You just point it at

00:04:48.040 --> 00:04:50.579
a website and it analyzes the whole company's

00:04:50.579 --> 00:04:53.529
history, its messaging. The source called it

00:04:53.529 --> 00:04:56.829
grabbing the company's DNA. Wow. And then it

00:04:56.829 --> 00:04:59.310
designs full on pretty sophisticated marketing

00:04:59.310 --> 00:05:01.670
campaigns. That's not just the chatbot suggesting

00:05:01.670 --> 00:05:04.310
ideas. That's deep automation doing strategic

00:05:04.310 --> 00:05:07.569
work. And the integrations are getting so much

00:05:07.569 --> 00:05:10.329
smoother, almost invisible, like being able to

00:05:10.329 --> 00:05:13.529
edit using powerful tools, Photoshop, Express

00:05:13.529 --> 00:05:16.250
directly inside ChatGPT. Oh, yeah. That demo

00:05:16.250 --> 00:05:18.750
looked sick, didn't it? That collapse between

00:05:18.750 --> 00:05:21.509
like generating an idea and actually creating

00:05:21.509 --> 00:05:23.990
and deploying the asset. Yeah. That's a massive

00:05:23.990 --> 00:05:26.029
speed up for creative workflows. And we're seeing

00:05:26.029 --> 00:05:28.069
it in video, too, right? It's not just the big

00:05:28.069 --> 00:05:30.329
names like Sora and Veo anymore. Not at all.

00:05:30.389 --> 00:05:32.589
You've got others bubbling up like Nano Banana

00:05:32.589 --> 00:05:35.860
and that LTX-2, the one that made the SpongeBob clip.

00:05:36.019 --> 00:05:38.240
Yeah, that went viral, right? Made from just

00:05:38.240 --> 00:05:42.180
a short text prompt. It shows how low the barrier

00:05:42.180 --> 00:05:44.939
is getting for making really high quality, impactful

00:05:44.939 --> 00:05:48.019
video content. Anyone can do it, almost. And

00:05:48.019 --> 00:05:50.800
crucially, this incredible software progress

00:05:50.800 --> 00:05:53.560
is built on top of a staggering physical infrastructure

00:05:53.560 --> 00:05:56.000
build out. We can't forget that. That's the essential

00:05:56.000 --> 00:05:59.139
context. Absolutely. I mean, Eli Lilly just opened

00:05:59.139 --> 00:06:01.519
what they're calling the world's largest AI factory

00:06:01.519 --> 00:06:04.079
with NVIDIA, specifically for drug discovery.

00:06:04.259 --> 00:06:08.480
GitHub pulled together AI coding agents from

00:06:08.480 --> 00:06:12.019
OpenAI, Anthropic, Google, basically putting

00:06:12.019 --> 00:06:15.339
all the top coding brains in one spot for developers.

00:06:15.540 --> 00:06:18.199
Amazon's huge data center expansion in Indiana.

00:06:18.540 --> 00:06:21.019
Apparently, it's explicitly to boost Claude's

00:06:21.019 --> 00:06:23.439
abilities. Right. And over in Asia, KakaoTalk

00:06:23.439 --> 00:06:25.720
is integrating GPT-5 for, you know, smarter

00:06:25.720 --> 00:06:29.269
chat features. Whoa. Just imagine scaling the

00:06:29.269 --> 00:06:32.029
infrastructure needed to handle, say, a billion

00:06:32.029 --> 00:06:34.569
queries every single second. It's almost

00:06:34.569 --> 00:06:36.649
mind-boggling, the sheer computational power being

00:06:36.649 --> 00:06:38.990
assembled globally to fuel this digital shift.

00:06:39.170 --> 00:06:42.029
It's immense. And that scale is why the money

00:06:42.029 --> 00:06:44.730
is just flooding in. Speaking of money, Fireworks

00:06:44.730 --> 00:06:48.350
AI just raised $250 million. Big names involved:

00:06:48.350 --> 00:06:51.009
Lightspeed, Index Ventures, and significantly,

00:06:51.310 --> 00:06:53.810
NVIDIA again. What does that size of investment

00:06:53.810 --> 00:06:55.889
tell us about where the market focus is shifting?

00:06:55.990 --> 00:06:58.040
That's a really key signal. That huge chunk of

00:06:58.040 --> 00:07:00.379
cash, it's not primarily going into just building

00:07:00.379 --> 00:07:02.139
bigger models anymore. It's going towards inference.

00:07:02.839 --> 00:07:05.420
That's the speed and efficiency of actually running

00:07:05.420 --> 00:07:08.879
those massive models to serve potentially billions

00:07:08.879 --> 00:07:12.000
of user requests quickly and affordably. The

00:07:12.000 --> 00:07:14.620
priority now is scaling the delivery, not just

00:07:14.620 --> 00:07:16.800
building bigger brains. That's a huge trend to

00:07:16.800 --> 00:07:19.430
watch. Interesting. Yet even with all this speed

00:07:19.430 --> 00:07:22.050
and abstract smarts, these digital platforms

00:07:22.050 --> 00:07:25.610
still run into, well, user friction and sometimes

00:07:25.610 --> 00:07:28.069
kind of funny mistakes. Like the pushback when

00:07:28.069 --> 00:07:30.810
OpenAI floated the idea of ads in ChatGPT.

00:07:30.930 --> 00:07:33.110
That was instant. Oh, yeah. That was a powerful

00:07:33.110 --> 00:07:36.189
reality check. Users made it clear. Clutter up

00:07:36.189 --> 00:07:38.709
this essential tool with intrusive ads and we

00:07:38.709 --> 00:07:41.819
are out. Yeah. Ads. OK, bye. Simple as that.

00:07:41.899 --> 00:07:44.240
It's the classic platform monetization dilemma,

00:07:44.399 --> 00:07:46.199
right? User experience suffers if you get it

00:07:46.199 --> 00:07:48.759
wrong. And then you had Meta's AI ad tool reportedly

00:07:48.759 --> 00:07:51.899
replacing a fashion model with an AI-generated

00:07:51.899 --> 00:07:54.439
grandma in an ad. [slight chuckle] Yeah. Even

00:07:54.439 --> 00:07:56.199
when the AI nails the language, sometimes it

00:07:56.199 --> 00:07:59.339
completely misses the context or the point

00:07:59.339 --> 00:08:02.139
of the ad itself. It shows that abstract smarts

00:08:02.139 --> 00:08:04.459
don't always equal common sense understanding,

00:08:04.759 --> 00:08:07.720
which is a perfect lead in to our last segment,

00:08:07.920 --> 00:08:10.819
the physical world. Let's make that pivot. From

00:08:10.819 --> 00:08:12.699
the colossal scale of digital infrastructure

00:08:12.699 --> 00:08:16.139
and language mastery. To the sometimes embarrassing

00:08:16.139 --> 00:08:19.939
clumsiness when AI meets physical reality. The

00:08:19.939 --> 00:08:23.000
contrast is just wild, isn't it? We're building

00:08:23.000 --> 00:08:25.959
the planet's biggest computers so software can

00:08:25.959 --> 00:08:29.360
detect, like, subtle sarcasm online, and then we

00:08:29.360 --> 00:08:31.399
ask that same level of intelligence to do something

00:08:31.399 --> 00:08:33.460
physically simple like go get a stick of butter

00:08:33.460 --> 00:08:36.919
from the kitchen and it just... fails miserably

00:08:36.919 --> 00:08:39.179
sometimes. Tell us about this butter bench. Right.

00:08:39.259 --> 00:08:41.960
The butter bench. It's a new robotics test, a

00:08:41.960 --> 00:08:44.379
benchmark. And it's designed to be ridiculously

00:08:44.379 --> 00:08:47.120
easy for a person. Totally mundane. It tests

00:08:47.120 --> 00:08:50.000
how well these top AI models can control standard

00:08:50.000 --> 00:08:52.519
home robots doing basic chores like find the

00:08:52.519 --> 00:08:54.379
butter, go to the kitchen, grab it using touch

00:08:54.379 --> 00:08:56.539
sensors to confirm you actually picked it up,

00:08:56.580 --> 00:08:58.639
find the human, hand it over and then go back

00:08:58.639 --> 00:09:01.120
and plug yourself in safely. Stuff we do without

00:09:01.120 --> 00:09:04.220
even thinking. And the reality gap here is just...

00:09:04.909 --> 00:09:08.730
Stark. Humans nailed it, right? Like 95% success.

00:09:09.210 --> 00:09:12.169
Crushed it. What about the best AI they tested?

00:09:12.370 --> 00:09:15.710
So the top performer was Google's Gemini 2.5

00:09:15.710 --> 00:09:20.230
Pro. It only managed a 40% success rate. 40%.

00:09:20.230 --> 00:09:23.409
That means, more often than not, this super advanced

00:09:23.409 --> 00:09:27.350
AI failed a basic chore in a controlled lab setting.

00:09:27.529 --> 00:09:29.710
Failed, yeah. And what's really telling is how

00:09:29.710 --> 00:09:32.509
they failed. It wasn't just like one simple bug.

00:09:32.649 --> 00:09:34.649
No, the failure modes are what make it so informative.

00:09:35.289 --> 00:09:38.090
You had robots literally spinning in circles,

00:09:38.190 --> 00:09:40.549
seemingly lost even with constant location data.

00:09:40.730 --> 00:09:43.549
Wow. They'd miss obvious visual cues, like confusing

00:09:43.549 --> 00:09:45.570
a water bottle on the counter for the package

00:09:45.570 --> 00:09:47.370
they were supposed to get. Or they'd lose track

00:09:47.370 --> 00:09:49.090
of where they were completely just by navigating

00:09:49.090 --> 00:09:51.090
around a simple obstacle, like the corner of

00:09:51.090 --> 00:09:53.549
a table. It sounds less like a failure of pure

00:09:53.549 --> 00:09:56.450
intelligence and more like a failure of embodiment,

00:09:56.590 --> 00:09:58.769
like connecting the brain to the world. That's

00:09:58.769 --> 00:10:01.169
exactly it. And the low battery panic was a great

00:10:01.169 --> 00:10:02.830
example. What happened there? When the robot's

00:10:02.830 --> 00:10:06.350
power started dropping, the high-level AI, the

00:10:06.350 --> 00:10:09.169
brain, couldn't execute the most basic emergency

00:10:09.169 --> 00:10:12.049
procedure. Just go back and dock itself to recharge.

00:10:12.370 --> 00:10:15.149
It would freeze. Or just start moving erratically.

00:10:15.330 --> 00:10:18.809
It really is like watching a toddler figure out

00:10:18.809 --> 00:10:21.629
how the world works. Except this toddler also

00:10:21.629 --> 00:10:24.889
happens to have, like, a PhD in theoretical physics

00:10:24.889 --> 00:10:27.470
and can write flawless essays on string theory.

00:10:27.730 --> 00:10:30.509
That analogy is spot on. The physical world,

00:10:30.529 --> 00:10:33.370
with all its messiness, friction, changing light,

00:10:33.549 --> 00:10:35.490
things being slightly out of place, unexpected

00:10:35.490 --> 00:10:38.950
battery drain, it just overwhelms models that

00:10:38.950 --> 00:10:41.409
thrive on the clean, predictable logic of text

00:10:41.409 --> 00:10:44.110
and code. So if you boil it down, what does the

00:10:44.110 --> 00:10:46.509
butter bench really prove about the core challenges

00:10:46.509 --> 00:10:49.409
facing robotics and physical AI right now? It

00:10:49.409 --> 00:10:51.870
proves the sheer complexity of real world variables

00:10:51.870 --> 00:10:54.649
still swamps abstract intelligence. Yeah. Sensor

00:10:54.649 --> 00:10:56.590
fusion, low-level motor control. That's the hard

00:10:56.590 --> 00:10:58.590
part. The software interacting with physics is

00:10:58.590 --> 00:11:00.309
the bottleneck. Which brings us right back to

00:11:00.309 --> 00:11:01.970
that central duality we started with, the defining

00:11:01.970 --> 00:11:04.929
feature of AI today. On the one hand, you've

00:11:04.929 --> 00:11:07.929
got AI mastering incredibly high level reasoning.

00:11:08.110 --> 00:11:11.190
It's tackling transparency, writing complex code,

00:11:11.350 --> 00:11:13.629
designing global marketing strategies. It gets

00:11:13.629 --> 00:11:16.830
the abstract stuff, but it possesses the basic

00:11:16.830 --> 00:11:19.509
physical common sense, the object permanence

00:11:19.509 --> 00:11:22.710
of maybe a three-year-old child. That gap between

00:11:22.710 --> 00:11:24.789
the language models understanding the word butter

00:11:24.789 --> 00:11:27.590
and the embodied AI actually passing the butter.

00:11:27.909 --> 00:11:30.190
That's the defining challenge for the next decade,

00:11:30.309 --> 00:11:32.769
maybe more. It seems we can make the models smarter

00:11:32.769 --> 00:11:35.129
in terms of abstract thought incredibly quickly.

00:11:35.269 --> 00:11:37.789
That part's accelerating. But making them physically

00:11:37.789 --> 00:11:40.389
competent, integrating that smartness smoothly

00:11:40.389 --> 00:11:42.929
into actual physics, into mechanics, that feels

00:11:42.929 --> 00:11:46.070
like a much slower, much harder climb. The digital

00:11:46.070 --> 00:11:49.309
brain's brilliance meets the physical body's,

00:11:49.370 --> 00:11:51.990
well, clumsiness. The bottleneck isn't the intelligence

00:11:51.990 --> 00:11:54.269
anymore. It's the interaction with reality. It's

00:11:54.269 --> 00:11:56.730
the body. So here's a final thought to leave

00:11:56.730 --> 00:11:59.830
with. If the world's most advanced AI that can

00:11:59.830 --> 00:12:02.830
write elegant code and spot nuanced hate speech

00:12:02.830 --> 00:12:06.610
totally fumbles simple physical tasks like passing

00:12:06.610 --> 00:12:09.289
butter, what does that really mean for the future

00:12:09.289 --> 00:12:12.049
of complex manual jobs? Things like construction

00:12:12.049 --> 00:12:14.830
or maybe elder care, where adapting to unpredictable

00:12:14.830 --> 00:12:18.210
physical environments is everything. Maybe that

00:12:18.210 --> 00:12:21.409
true robot revolution in physical labor. Maybe

00:12:21.409 --> 00:12:23.289
it's still quite a bit further off than some

00:12:23.289 --> 00:12:25.549
of the headlines suggest. That's a really powerful

00:12:25.549 --> 00:12:28.230
thought to consider. That gap is very, very real.

00:12:28.370 --> 00:12:30.090
Well, thank you for joining us for this deep

00:12:30.090 --> 00:12:32.110
dive. We really encourage you as you go about

00:12:32.110 --> 00:12:35.029
your week to think about which side of this AI

00:12:35.029 --> 00:12:37.850
duality, the digital brilliance or the physical

00:12:37.850 --> 00:12:40.149
awkwardness you notice most in your own life

00:12:40.149 --> 00:12:42.870
and work. Yeah, great point. Until next time,

00:12:42.909 --> 00:12:43.929
keep digging into the sources.
