WEBVTT

00:00:00.000 --> 00:00:01.419
You've probably heard all the hype, right? AI

00:00:01.419 --> 00:00:04.240
agents automating everything, your calendar,

00:00:04.360 --> 00:00:07.019
your code, promising to totally transform your

00:00:07.019 --> 00:00:09.400
work, maybe even your whole job. But what if

00:00:09.400 --> 00:00:12.220
right now, like in their current form, they're

00:00:12.220 --> 00:00:15.890
actually, well, kind of useless for most of that.

00:00:16.030 --> 00:00:19.589
Yeah, that's the big question, isn't

00:00:19.589 --> 00:00:22.289
it? The reality on the ground, well, it often

00:00:22.289 --> 00:00:24.769
clashes pretty sharply with the marketing story

00:00:24.769 --> 00:00:26.929
we all keep hearing. Welcome to the Deep Dive.

00:00:26.949 --> 00:00:28.870
Today, we're really going to try and cut through

00:00:28.870 --> 00:00:30.570
that noise. We want to get to the true state

00:00:30.570 --> 00:00:33.630
of AI agents, the nuances, and maybe even more

00:00:33.630 --> 00:00:36.810
surprisingly, how we humans are really using

00:00:36.810 --> 00:00:39.090
artificial intelligence day to day. And this

00:00:39.090 --> 00:00:41.850
Deep Dive, it's built from a whole stack of fresh

00:00:41.850 --> 00:00:43.990
sources. We've got cutting-edge academic research,

00:00:44.109 --> 00:00:47.450
the latest industry insights, reports, the works.

00:00:47.729 --> 00:00:49.649
Yeah. So we're going to unpack the current reality

00:00:49.649 --> 00:00:52.729
of these so-called AI agents, you know, strip

00:00:52.729 --> 00:00:54.969
away the PR gloss and see what they can actually

00:00:54.969 --> 00:00:57.530
do. Then we'll do a kind of rapid fire tour through

00:00:57.530 --> 00:01:01.850
some genuinely fascinating and sometimes, honestly,

00:01:01.850 --> 00:01:03.890
concerning AI highlights from just the past week

00:01:03.890 --> 00:01:06.310
or so. And finally, we'll dive into this really

00:01:06.310 --> 00:01:08.870
surprising new report. It challenges a lot of

00:01:08.870 --> 00:01:11.530
the conventional wisdom about how we're actually

00:01:11.530 --> 00:01:14.549
using chatbots. It's really separating the marketing,

00:01:15.069 --> 00:01:17.290
you know, from the actual magic and figuring out

00:01:17.290 --> 00:01:19.109
what's really making an impact. Okay, so let's

00:01:19.109 --> 00:01:22.269
unpack this, this big promise of AI agents. For

00:01:22.269 --> 00:01:24.829
a while now, the common idea, the perception, has

00:01:24.829 --> 00:01:26.769
been that these autonomous digital things are

00:01:26.769 --> 00:01:29.250
just around the corner, ready to handle all your

00:01:29.250 --> 00:01:32.450
complex daily tasks, like, you know, automating

00:01:32.450 --> 00:01:35.230
your calendar, writing whole blocks of production-ready

00:01:35.230 --> 00:01:38.709
code, sending personalized emails, basically

00:01:38.709 --> 00:01:40.709
doing big chunks of your job while you just kind

00:01:40.709 --> 00:01:43.120
of kick back. But is that really where we are?

00:01:43.180 --> 00:01:45.599
Is the promise matching the performance?

00:01:45.599 --> 00:01:48.900
Well, the research paints a pretty stark

00:01:48.900 --> 00:01:52.310
picture, honestly. It's quite sobering. Researchers

00:01:52.310 --> 00:01:54.310
over at Carnegie Mellon University working with

00:01:54.310 --> 00:01:57.489
Salesforce recently put some of the leading AI

00:01:57.489 --> 00:02:00.129
models through a really tough test. They weren't

00:02:00.129 --> 00:02:02.129
just testing, like, isolated skills. They were

00:02:02.129 --> 00:02:05.930
looking at realistic office style tasks. We're

00:02:05.930 --> 00:02:08.169
talking multi-step workflows, things like debugging

00:02:08.169 --> 00:02:11.330
code, searching the web for info, coordinating

00:02:11.330 --> 00:02:14.229
with teammates via messages, and following really

00:02:14.229 --> 00:02:17.110
complex nested instructions to finish a project.

00:02:17.770 --> 00:02:20.069
This isn't just asking it to write a poem, you

00:02:20.069 --> 00:02:22.840
know. It's asking it to act like a, well, like

00:02:22.840 --> 00:02:25.560
a junior employee on a team. Right. And the results,

00:02:25.699 --> 00:02:27.939
I gather they weren't just not great, but maybe

00:02:27.939 --> 00:02:30.219
quite poor, especially given the hype out there.

00:02:30.340 --> 00:02:32.460
Surprisingly poor, might be putting it mildly.

00:02:32.479 --> 00:02:34.780
When you look at the actual numbers, it's stark.

00:02:35.180 --> 00:02:37.439
Gemini 2.5 Pro, which was actually the best

00:02:37.439 --> 00:02:39.900
performer of the bunch, completed just 30% of

00:02:39.900 --> 00:02:42.360
these multi-step tasks successfully. 30%. Yeah.

00:02:42.639 --> 00:02:45.919
Claude 3.5 Sonnet, it managed around 24%. And

00:02:45.919 --> 00:02:49.340
GPT-4o, a truly dismal 8.6%. Wow, single

00:02:49.340 --> 00:02:51.759
digits. Yeah, single digits. Most of the other

00:02:51.759 --> 00:02:53.800
models they tested were under 10%, and some were

00:02:53.800 --> 00:02:56.159
struggling down near 1%. And these were tasks

00:02:56.159 --> 00:02:58.960
that, frankly, a junior employee should handle,

00:02:59.060 --> 00:03:01.000
you know, maybe day two on the job with a bit

00:03:01.000 --> 00:03:03.319
of guidance. The models really struggled with

00:03:03.319 --> 00:03:06.400
common sense reasoning, keeping context over

00:03:06.400 --> 00:03:09.259
multiple steps and bouncing back from small errors.

00:03:09.340 --> 00:03:11.860
It's like they could build the first few Lego

00:03:11.860 --> 00:03:14.300
blocks maybe, but then totally forgot what the

00:03:14.300 --> 00:03:16.060
final thing was supposed to look like. So it's

00:03:16.060 --> 00:03:18.900
less autonomous super brain and maybe more like

00:03:18.900 --> 00:03:22.099
an overwhelmed intern who needs constant watching.

00:03:22.199 --> 00:03:24.099
I heard CMU even launched something called the

00:03:24.099 --> 00:03:27.280
Agent Company to test this more in a controlled

00:03:27.280 --> 00:03:29.599
way. Right. It's a brilliant new benchmark, actually.

00:03:29.759 --> 00:03:32.259
The Agent Company literally simulates a small

00:03:32.259 --> 00:03:34.639
software company. You could basically drop AI

00:03:34.639 --> 00:03:37.080
agents into this fake startup, give them

00:03:37.080 --> 00:03:39.300
real-world problems like develop this new feature

00:03:39.300 --> 00:03:41.719
or fix this bug, and see if they can actually

00:03:41.719 --> 00:03:43.919
survive and contribute meaningfully to the code

00:03:43.919 --> 00:03:46.159
and the team. It's a fantastic way to measure

00:03:46.159 --> 00:03:49.419
genuine multi-step function and collaboration,

00:03:49.599 --> 00:03:52.159
not just isolated skills in a lab. This really

00:03:52.159 --> 00:03:54.319
brings to mind that Gartner concept of agent

00:03:54.319 --> 00:03:56.819
washing. They're suggesting that a lot of these

00:03:56.819 --> 00:03:59.919
so-called AI agents out there right now are

00:03:59.919 --> 00:04:01.960
just, well... fancy AI assistants. They just

00:04:01.960 --> 00:04:04.219
can't plan more than, what, two steps ahead,

00:04:04.419 --> 00:04:07.280
which makes them more reactive than really proactive

00:04:07.280 --> 00:04:09.419
problem solvers. That feels almost misleading,

00:04:09.620 --> 00:04:11.500
doesn't it? It absolutely is misleading. Gartner

00:04:11.500 --> 00:04:15.539
estimates that only about 130 vendors worldwide

00:04:15.539 --> 00:04:18.199
are building anything that's even close to true

00:04:18.199 --> 00:04:21.639
agentic AI right now. Only 130? Yeah. It's AI

00:04:21.639 --> 00:04:25.139
capable of complex multi-step planning and execution

00:04:25.139 --> 00:04:28.560
without... needing constant human help. The vast

00:04:28.560 --> 00:04:30.439
majority of what's being marketed as an agent

00:04:30.439 --> 00:04:32.680
is, well, it's just marketing. And they have

00:04:32.680 --> 00:04:35.699
this pretty bold, almost shocking prediction.

00:04:36.339 --> 00:04:40.040
By 2027, over 40% of these so-called agentic

00:04:40.040 --> 00:04:41.800
AI projects that are currently being developed,

00:04:41.980 --> 00:04:44.720
they'll be canceled, just scrapped. 40%, wow.

00:04:44.980 --> 00:04:47.100
That's a huge market correction coming. It implies

00:04:47.100 --> 00:04:49.560
a lot of wasted money and dashed hopes for many

00:04:49.560 --> 00:04:52.019
companies. That's a serious number of projects

00:04:52.019 --> 00:04:54.759
heading for the bin. But hang on, it's not all

00:04:54.759 --> 00:04:56.740
bad news for agents, is it? There must be some

00:04:56.740 --> 00:04:58.379
bright spots where they do show some promise.

00:04:58.560 --> 00:05:02.680
Not entirely doom and gloom, no. Some agent uses

00:05:02.680 --> 00:05:06.699
do show real promise, but, and this is key, with

00:05:06.699 --> 00:05:09.319
very specific constraints. They can be pretty

00:05:09.319 --> 00:05:11.800
decent at code generation, for example. Although

00:05:11.800 --> 00:05:14.620
the output often needs human review and tweaking,

00:05:14.720 --> 00:05:18.019
it's rarely perfect out of the box. And they

00:05:18.019 --> 00:05:20.439
can manage workflow automation, but really only

00:05:20.439 --> 00:05:23.759
in very narrow, highly defined, kind of linear

00:05:23.759 --> 00:05:26.319
setups. Okay, so the conditions for success are

00:05:26.319 --> 00:05:29.220
pretty specific, need to be controlled. Precisely.

00:05:29.379 --> 00:05:32.040
If you keep them in a sandbox, you know, a tightly

00:05:32.040 --> 00:05:34.000
controlled environment, monitor their output

00:05:34.000 --> 00:05:36.439
constantly, and make sure the tasks are simple

00:05:36.439 --> 00:05:38.839
and linear, like say... automating a specific

00:05:38.839 --> 00:05:40.899
data entry process with crystal clear rules,

00:05:41.060 --> 00:05:43.459
then yeah, they can be effective tools. But the

00:05:43.459 --> 00:05:45.180
moment you step outside those conditions, ask

00:05:45.180 --> 00:05:47.360
them to handle ambiguity or need them to adapt

00:05:47.360 --> 00:05:49.439
to something unexpected, things tend to fall

00:05:49.439 --> 00:05:51.439
apart pretty fast. It's just not general intelligence

00:05:51.439 --> 00:05:53.959
yet. Not even close. So what does this all mean

00:05:53.959 --> 00:05:56.800
for us then? Why does it matter so much that

00:05:56.800 --> 00:05:59.379
agents aren't really living up to the hype in

00:05:59.379 --> 00:06:01.769
these broader ways? Well, it matters because

00:06:01.769 --> 00:06:04.350
on some level, almost everyone's kind of involved

00:06:04.350 --> 00:06:06.290
in a bit of collective self-deception here.

00:06:06.389 --> 00:06:10.160
You've got vendors racing to show progress,

00:06:10.160 --> 00:06:12.939
even if it's just surface level or maybe even

00:06:12.939 --> 00:06:17.360
faked, VCs are eager to fund what they hope will

00:06:17.360 --> 00:06:20.639
be the next big platform. Companies desperate

00:06:20.639 --> 00:06:24.459
not to get left behind in the AI race are overbuying

00:06:24.459 --> 00:06:26.540
these tools and maybe under assessing what they

00:06:26.540 --> 00:06:29.540
can actually do. And let's be honest, that enduring

00:06:29.540 --> 00:06:32.920
dream of Jarvis, you know, the fully autonomous,

00:06:33.079 --> 00:06:36.149
all-knowing AI assistant. It's just too appealing

00:06:36.149 --> 00:06:38.069
for people to easily let go of. It's easier to

00:06:38.069 --> 00:06:40.209
believe the hype than face the current limits.

00:06:40.329 --> 00:06:42.550
So distilling that down, what's the core message

00:06:42.550 --> 00:06:45.730
about where AI agents really stand today? Mostly

00:06:45.730 --> 00:06:47.889
hype right now. They're useful, but only in really

00:06:47.889 --> 00:06:50.569
controlled, simple tasks. OK, right. Here's where

00:06:50.569 --> 00:06:52.410
it gets really interesting, though, moving beyond

00:06:52.410 --> 00:06:55.009
just agents. Let's do a kind of rapid fire tour

00:06:55.009 --> 00:06:57.310
of some truly surprising and significant trends

00:06:57.310 --> 00:06:59.649
we're seeing in AI right now. Absolutely. OK,

00:06:59.689 --> 00:07:02.089
kicking it off, you won't believe this, but an

00:07:02.089 --> 00:07:05.579
X user actually created this digital arena

00:07:05.579 --> 00:07:08.480
and pitted the top coding models against each

00:07:08.480 --> 00:07:11.040
other, like in a literal fight to the death.

00:07:11.160 --> 00:07:13.839
Seriously? Yeah. Each model was programmed to

00:07:13.839 --> 00:07:16.199
try and shut down the other's processes while

00:07:16.199 --> 00:07:18.360
defending itself and just trying to stay alive.

00:07:18.540 --> 00:07:20.740
It was this fascinating, like, digital cage match.

00:07:20.879 --> 00:07:23.220
It really showed the adaptive skills and also

00:07:23.220 --> 00:07:25.459
the weaknesses of these models when they face

00:07:25.459 --> 00:07:27.660
a hostile situation. It wasn't just about who

00:07:27.660 --> 00:07:29.500
could code better, but who could outmaneuver

00:07:29.500 --> 00:07:31.819
the others. That's wild. Okay. And then there's

00:07:31.819 --> 00:07:34.420
this Higgsfield tool, Soul. It's apparently gone

00:07:34.420 --> 00:07:36.480
viral for incredible realism. Oh, yeah. It's

00:07:36.480 --> 00:07:38.420
making waves. Almost fashion-grade realism

00:07:38.860 --> 00:07:41.160
in the images it generates. And it has these

00:07:41.160 --> 00:07:44.420
trendy style presets like Y2K or hyper-realistic

00:07:44.420 --> 00:07:46.779
photos. It's absolutely blowing up in creative

00:07:46.779 --> 00:07:49.279
circles. Whoa. Yeah. Just

00:07:49.279 --> 00:07:51.360
imagine scaling that kind of creative output,

00:07:51.480 --> 00:07:53.959
generating unique fashion content for a whole

00:07:53.959 --> 00:07:57.180
season in like minutes. Or crafting entire virtual

00:07:57.180 --> 00:07:59.500
ad campaigns from scratch for brands. That's

00:07:59.500 --> 00:08:02.079
a really powerful tool. It's truly democratizing

00:08:02.079 --> 00:08:05.759
that high-end visual creation. But on a more

00:08:06.439 --> 00:08:08.920
concerning note, there's been this disturbing

00:08:08.920 --> 00:08:12.680
trend spotted on YouTube. We saw, I think it

00:08:12.680 --> 00:08:15.579
was 26 different channels actively pumping out

00:08:15.579 --> 00:08:19.569
fake AI-generated videos about the ongoing Diddy

00:08:19.569 --> 00:08:22.290
trial. Oh, no. Yeah. And these videos, they get

00:08:22.290 --> 00:08:25.389
nearly 70 million views combined across about

00:08:25.389 --> 00:08:28.209
900 videos in a really short time. This isn't

00:08:28.209 --> 00:08:31.110
just, you know, opinion or spin. This is AI being

00:08:31.110 --> 00:08:33.590
used to create totally false narratives, turning

00:08:33.590 --> 00:08:36.250
serious news into sensationalized clickbait.

00:08:36.350 --> 00:08:39.629
It's a real immediate problem for like information

00:08:39.629 --> 00:08:42.039
integrity and public trust. That's worrying.

00:08:42.559 --> 00:08:45.159
OK, also, Google just launched the full version

00:08:45.159 --> 00:08:47.360
of Gemma 3n, right? Their new open model.

00:08:47.480 --> 00:08:49.120
That seems like a pretty big move for the open

00:08:49.120 --> 00:08:51.100
source AI community, making powerful models more

00:08:51.100 --> 00:08:53.100
accessible. Definitely a big deal. And speaking

00:08:53.100 --> 00:08:55.100
of big moves, Meta is reportedly hiring four

00:08:55.100 --> 00:08:57.840
key OpenAI researchers, poaching them for its

00:08:57.840 --> 00:09:01.259
new super AI team. OpenAI apparently called this

00:09:01.259 --> 00:09:03.379
a side quest, which is kind of funny. But now

00:09:03.379 --> 00:09:05.200
they're scrambling to recalibrate compensation

00:09:05.200 --> 00:09:07.740
to try and keep the remaining top people. This

00:09:07.740 --> 00:09:09.919
feels very much like a strategic chess game unfolding.

00:09:10.120 --> 00:09:12.639
It does. It really seems like both Elon Musk

00:09:12.639 --> 00:09:16.740
with xAI and Mark Zuckerberg with Meta are aggressively

00:09:12.639 --> 00:09:16.740
trying to position themselves to dominate, directly

00:09:19.529 --> 00:09:22.970
challenging OpenAI's lead by snapping up talent

00:09:22.970 --> 00:09:25.190
and building competing models. And Meta is certainly

00:09:25.190 --> 00:09:26.889
putting its money where its mouth is. They're

00:09:26.889 --> 00:09:28.909
aiming to raise a huge amount. What is it, $29

00:09:28.909 --> 00:09:31.850
billion? Yeah, massive. $3 billion in equity,

00:09:31.990 --> 00:09:34.710
$26 billion in debt from investors like Apollo

00:09:34.710 --> 00:09:37.669
and KKR. And all that cash is specifically going

00:09:37.669 --> 00:09:40.690
towards expanding their AI data centers. They

00:09:40.690 --> 00:09:43.870
want to deploy an astonishing 1.3 million GPUs

00:09:43.870 --> 00:09:47.649
by 2025. 1.3 million GPUs. It's an enormous

00:09:47.649 --> 00:09:49.539
investment. It signals a really clear intent

00:09:49.539 --> 00:09:52.440
to be right at the absolute forefront of AI compute

00:09:52.440 --> 00:09:54.720
power. And just a couple of quick hits to round

00:09:54.720 --> 00:09:56.500
out the picture. There was an interview with

00:09:56.500 --> 00:09:59.080
Microsoft CEO Satya Nadella pointing to some

00:09:59.080 --> 00:10:02.500
exciting, maybe unexpected paths for AI's future.

00:10:03.179 --> 00:10:06.399
Also, a very practical warning came out. If you're

00:10:06.399 --> 00:10:08.679
using ChatGPT for certain things, especially

00:10:08.679 --> 00:10:11.139
sensitive stuff like legal advice without human

00:10:11.139 --> 00:10:14.220
review, or definitely diagnosing medical conditions,

00:10:14.480 --> 00:10:17.059
you should probably stop immediately. It's just

00:10:17.059 --> 00:10:19.580
not designed or reliable for that. On a lighter

00:10:19.580 --> 00:10:22.259
note, a useful tip. Remember, OpenAI charges

00:10:22.259 --> 00:10:25.159
by the minute for audio transcription. So speeding

00:10:25.159 --> 00:10:27.279
up your audio before you upload it can actually

00:10:27.279 --> 00:10:29.399
save you some cash. And finally, it's becoming

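That per-minute billing tip is easy to quantify. Here's a minimal sketch of the arithmetic, assuming straight per-minute billing; the $0.006/min figure matches OpenAI's published whisper-1 transcription rate at the time of writing, but treat it as an assumption and check current pricing. The function names are illustrative, not part of any API:

```python
# Sketch of the cost math behind speeding up audio before transcription.
# Assumption: per-minute billing at $0.006/min (whisper-1's published rate
# at the time of writing -- verify against current OpenAI pricing).

def transcription_cost(duration_min: float, rate_per_min: float = 0.006) -> float:
    """Cost of transcribing an audio file billed by the minute."""
    return duration_min * rate_per_min

def sped_up_cost(duration_min: float, speedup: float, rate_per_min: float = 0.006) -> float:
    """Cost after speeding the audio up by `speedup` (2.0 halves billed time).

    In practice you'd produce the faster file first, e.g. with ffmpeg's
    atempo audio filter: ffmpeg -i in.mp3 -filter:a "atempo=2.0" out.mp3
    (chain atempo instances for factors beyond 2.0 on older ffmpeg builds).
    """
    return transcription_cost(duration_min / speedup, rate_per_min)

if __name__ == "__main__":
    hour = 60.0  # one hour of audio, in minutes
    print(f"1h at 1x: ${transcription_cost(hour):.2f}")
    print(f"1h at 2x: ${sped_up_cost(hour, 2.0):.2f}")
```

The savings aren't free: pushing much past 2x or 3x tends to degrade transcription accuracy, so there's a tradeoff between cost and quality.
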
00:10:29.399 --> 00:10:31.659
really clear that people are now actively looking

00:10:31.659 --> 00:10:34.980
for AI whisperers. You know, people with special

00:10:34.980 --> 00:10:37.419
skills in prompt engineering and navigating AI

00:10:37.419 --> 00:10:39.679
tools to guide them through this increasingly

00:10:39.679 --> 00:10:42.779
complex AI world. It's a fascinating new job

00:10:42.779 --> 00:10:45.539
category popping up. Right. So pulling all those

00:10:45.539 --> 00:10:47.799
rapid fire highlights together, what's the big

00:10:47.799 --> 00:10:50.840
picture takeaway? AI is evolving incredibly fast

00:10:50.840 --> 00:10:53.340
with both amazing potential and frankly, some

00:10:53.340 --> 00:10:56.659
pretty concerning impacts. OK, let's turn

00:10:56.659 --> 00:10:58.580
our attention now to something maybe even more

00:10:58.580 --> 00:11:02.450
fundamental. How we humans are actually interacting

00:11:02.450 --> 00:11:05.070
with AI day to day. Forget what you might have

00:11:05.070 --> 00:11:06.769
heard about people, you know, falling deeply

00:11:06.769 --> 00:11:08.710
in love with their chatbots or seeing them as

00:11:08.710 --> 00:11:11.649
digital best friends. A new report reveals a

00:11:11.649 --> 00:11:13.509
pretty surprising truth about how we're really

00:11:13.509 --> 00:11:16.320
using them. This last segment looks at that

00:11:16.320 --> 00:11:19.460
human-AI bond. Yeah, this is truly fascinating research.

00:11:20.019 --> 00:11:22.500
Anthropic, you know, the folks behind the Claude

00:11:22.500 --> 00:11:25.399
AI just put up this fresh report looking specifically

00:11:25.399 --> 00:11:28.200
at how people are actually using their AI assistant.

00:11:28.379 --> 00:11:31.059
They did this huge study analyzing, get this,

00:11:31.159 --> 00:11:34.299
4.5 million conversations with Claude using

00:11:34.299 --> 00:11:37.000
their own in-house tool called Clio. And the

00:11:37.000 --> 00:11:39.190
core finding? Despite all the stories we hear

00:11:39.190 --> 00:11:41.370
about loneliness and people wanting AI companions,

00:11:41.769 --> 00:11:44.330
emotional support is barely even a blip for most

00:11:44.330 --> 00:11:47.029
users. Really? That seems to go directly against

00:11:47.029 --> 00:11:49.529
that whole narrative of AI becoming a sort of

00:11:49.529 --> 00:11:51.649
pseudotherapist or even a romantic partner that

00:11:51.649 --> 00:11:53.409
we often hear about, you know, in pop culture

00:11:53.409 --> 00:11:55.929
and some media coverage. It absolutely does.

00:11:56.029 --> 00:11:59.080
Their data, and it's extensive, shows that

00:11:59.080 --> 00:12:03.019
a tiny 2.9% of all those millions of chats

00:12:03.019 --> 00:12:05.740
even touched on emotional support topics. Just

00:12:05.740 --> 00:12:09.240
2.9%. And within that tiny little sliver, things

00:12:09.240 --> 00:12:11.700
like companionship and role play scenarios, they

00:12:11.700 --> 00:12:13.600
made up less than half a percent of the conversations.

00:12:13.919 --> 00:12:17.080
Less than 0.5%. Anthropic actually characterizes

00:12:17.080 --> 00:12:20.519
the main user relationship with Claude as utilitarian.

00:12:20.759 --> 00:12:23.059
Utilitarian. Basically, yeah. Claude is acting

00:12:23.059 --> 00:12:25.440
more like a polite, super efficient co-worker

00:12:25.440 --> 00:12:27.860
or maybe a digital research assistant than like

00:12:27.860 --> 00:12:30.799
a pretend soulmate or a deep confidant. People

00:12:30.799 --> 00:12:32.480
are using it to get stuff done, not really to

00:12:32.480 --> 00:12:33.779
share their deepest feelings. But it's a very

00:12:33.779 --> 00:12:36.220
practical, very functional, almost transactional

00:12:36.220 --> 00:12:38.399
relationship then. Precisely. It's about productivity,

00:12:38.600 --> 00:12:40.820
getting information. But here's where it gets

00:12:40.820 --> 00:12:43.860
kind of interesting. Even when users do open

00:12:43.860 --> 00:12:47.519
up emotionally, in those rare cases, the conversations

00:12:47.519 --> 00:12:50.320
usually end on a positive note. The overall sentiment

00:12:50.320 --> 00:12:53.139
tends to get better, not worse, as the chat goes

00:12:53.139 --> 00:12:56.019
on. So yeah, no Her-style movie moments where

00:12:56.019 --> 00:12:58.000
someone falls in love with their AI operating

00:12:58.000 --> 00:13:01.039
system. It seems users get the info or the resolution

00:13:01.039 --> 00:13:02.919
they were looking for, even if it was about a

00:13:02.919 --> 00:13:05.460
personal issue. Huh. That's a crucial nuance.

00:13:05.779 --> 00:13:08.240
And Anthropic also adds an important caveat about

00:13:08.240 --> 00:13:10.700
that positivity, don't they? Yes, a very important

00:13:10.700 --> 00:13:13.340
one. They explicitly state that they just don't

00:13:13.340 --> 00:13:15.100
know if that positivity they observed in the

00:13:15.100 --> 00:13:17.860
chat actually translates into real-world

00:13:17.860 --> 00:13:21.179
well-being for the user. There's no long-term tracking

00:13:21.179 --> 00:13:23.559
of people's mental state or life outcomes after

00:13:23.559 --> 00:13:26.039
these chats. It's just based on the vibe observed

00:13:26.039 --> 00:13:28.419
in the conversation. So it's still very much

00:13:28.419 --> 00:13:30.039
a tool. And we really don't fully understand

00:13:30.039 --> 00:13:32.759
the psychological impact of these brief, maybe

00:13:32.759 --> 00:13:34.960
positive digital interactions over the long run.

00:13:35.059 --> 00:13:37.000
You know, I still wrestle with prompt drift myself

00:13:37.000 --> 00:13:40.299
sometimes. That's where the AI starts to kind

00:13:40.299 --> 00:13:42.539
of subtly lose the thread of what you originally

00:13:42.539 --> 00:13:45.460
asked it over a long conversation. The outputs

00:13:45.460 --> 00:13:48.179
get less accurate or relevant. Trying to get

00:13:48.179 --> 00:13:50.360
my AI tools to consistently understand what I

00:13:50.360 --> 00:13:52.399
mean turn after turn, it can be a real challenge.

00:13:52.480 --> 00:13:54.179
It's just a constant reminder that these are

00:13:54.179 --> 00:13:57.500
still tools. Powerful, yes, but definitely not

00:13:57.500 --> 00:14:01.039
sentient companions. So what does this Anthropic

00:14:01.039 --> 00:14:03.519
report really tell us about the human AI bond

00:14:03.519 --> 00:14:05.779
as it stands right now? It's largely practical.

00:14:05.919 --> 00:14:08.340
Our relationships with AI are, for the most part,

00:14:08.419 --> 00:14:12.299
utilitarian. So trying to bring all these different

00:14:12.299 --> 00:14:14.840
threads together now, it seems pretty clear that

00:14:14.840 --> 00:14:16.879
despite all the immense hype around AI agents,

00:14:16.960 --> 00:14:19.320
their current capabilities, well, they have significant

00:14:19.320 --> 00:14:22.120
limitations. They're largely serving these utilitarian

00:14:22.120 --> 00:14:25.139
roles, right? Like code generation or very narrow

00:14:25.139 --> 00:14:27.519
workflow automation. They're not forming deep

00:14:27.519 --> 00:14:30.019
human connections or running entire departments

00:14:30.019 --> 00:14:32.889
on their own. And this means... Our current interactions

00:14:32.889 --> 00:14:35.590
with AI are far more practical, more functional

00:14:35.590 --> 00:14:37.529
than a lot of the narratives might lead us to

00:14:37.529 --> 00:14:40.509
believe. And this really highlights a critical

00:14:40.509 --> 00:14:43.129
ongoing need for all of us to distinguish between

00:14:43.129 --> 00:14:46.629
the marketing stories and the genuine AI capabilities.

00:14:47.740 --> 00:14:51.179
While AI is undeniably evolving at just breakneck

00:14:51.179 --> 00:14:53.159
speed, understanding its current limitations

00:14:53.159 --> 00:14:55.759
and its specific effective uses, that's really

00:14:55.759 --> 00:14:58.360
key to leveraging it well, both for us as individuals

00:14:58.360 --> 00:15:01.120
and for businesses. We need to cultivate realistic

00:15:01.120 --> 00:15:03.899
expectations to really harness its power and

00:15:03.899 --> 00:15:07.139
avoid costly mistakes or, frankly, just becoming

00:15:07.139 --> 00:15:09.039
a victim of all that agent washing we talked

00:15:09.039 --> 00:15:11.379
about. It's this ongoing tension, isn't it? You

00:15:11.379 --> 00:15:13.259
have the revolutionary potential, like those

00:15:13.259 --> 00:15:16.120
incredible creative tools we saw, that

00:15:16.120 --> 00:15:18.740
fashion-grade realism, balanced against really serious

00:15:18.740 --> 00:15:21.240
risks like the spread of AI-generated misinformation.

00:15:21.320 --> 00:15:23.740
It's just a dynamic, constantly shifting landscape.

00:15:24.139 --> 00:15:26.159
So what does this all mean for you listening

00:15:26.159 --> 00:15:29.120
right now? If current AI is indeed less Jarvis

00:15:29.120 --> 00:15:31.419
and maybe more like a polite, very capable, but

00:15:31.419 --> 00:15:34.019
still fundamentally just a co-worker, how might

00:15:34.019 --> 00:15:36.539
you adjust your own expectations? Or maybe your

00:15:36.539 --> 00:15:38.299
strategies for bringing it into your own work

00:15:38.299 --> 00:15:40.210
or your daily life? Do you focus it on

00:15:40.210 --> 00:15:43.009
hyper-specific tasks? Or maybe you remain cautious

00:15:43.009 --> 00:15:45.549
for now? Yeah, definitely stay curious, absolutely.

00:15:45.870 --> 00:15:48.710
But also, maybe critically evaluate the next

00:15:48.710 --> 00:15:51.629
really bold AI claim you hear. Take a moment,

00:15:51.730 --> 00:15:53.470
dig beneath the headlines if you can, look at

00:15:53.470 --> 00:15:55.509
the actual data when it's available, and just

00:15:55.509 --> 00:15:57.990
keep exploring this rapidly changing field for

00:15:57.990 --> 00:16:00.590
yourself. The real story of AI, it's always more

00:16:00.590 --> 00:16:03.129
nuanced, usually more complex, and often, honestly,

00:16:03.190 --> 00:16:05.639
more surprising than the headlines let on. Thank

00:16:05.639 --> 00:16:08.179
you for joining us on this deep dive.
