WEBVTT

00:00:00.000 --> 00:00:02.600
So we keep hearing this narrative, right, that

00:00:02.600 --> 00:00:04.960
these really advanced AI agents, they're basically

00:00:04.960 --> 00:00:07.160
ready now, ready to just walk into our jobs.

00:00:07.280 --> 00:00:09.240
Yeah, the disruption is coming any second now.

00:00:09.359 --> 00:00:11.880
But when you actually dig into the performance

00:00:11.880 --> 00:00:14.220
data, well, it's a very different picture. These

00:00:14.220 --> 00:00:17.379
benchmarks on real freelance tasks, they tell

00:00:17.379 --> 00:00:20.399
a pretty startling story. There was this major

00:00:20.399 --> 00:00:22.839
study that tracked agents doing actual Upwork-style

00:00:22.839 --> 00:00:27.000
gigs, and the results were kind of brutal,

00:00:27.100 --> 00:00:29.739
actually. Brutal is the word. The very best model

00:00:29.739 --> 00:00:32.700
they tested, it managed to complete just, what,

00:00:32.719 --> 00:00:34.619
2% or 3% of the jobs available? Yeah, it's

00:00:34.619 --> 00:00:37.320
tiny. Earned something like $1,800 out of almost

00:00:37.320 --> 00:00:41.140
$144,000 worth of work. So the question is why?

00:00:41.659 --> 00:00:43.820
Why are these systems that seem so smart, the

00:00:43.820 --> 00:00:47.039
ones acing academic tests, failing this basic

00:00:47.039 --> 00:00:50.359
human test of just getting paid work done? And

00:00:50.359 --> 00:00:54.140
welcome, everyone, to the Deep Dive. That gap,

00:00:54.140 --> 00:00:56.299
you know, between the hype and the, let's face

00:00:56.299 --> 00:00:58.829
it, harsh reality on the ground, that's what

00:00:58.829 --> 00:01:00.609
we're tackling today. We're going to unpack the

00:01:00.609 --> 00:01:03.130
sources you sent over, really focus on the actual

00:01:03.130 --> 00:01:05.590
performance, the potential, sure, but also some

00:01:05.590 --> 00:01:07.890
of the crazy drama happening behind the scenes.

00:01:07.890 --> 00:01:10.090
This sounds good. We've got a fair bit to get

00:01:10.090 --> 00:01:13.549
through for you. Yeah, so first up, that brutal

00:01:13.549 --> 00:01:16.810
reality check for AI agents trying to do real

00:01:16.810 --> 00:01:20.909
work. Then we'll dive into some frankly mind-bending

00:01:20.909 --> 00:01:23.230
new AI models that can chew through an entire

00:01:23.230 --> 00:01:26.150
novel's worth of text, like, super fast. And finally,

00:01:26.719 --> 00:01:28.939
We'll hit the hottest AI trends blowing up on

00:01:28.939 --> 00:01:31.200
social media right now. OK, let's unpack this.

00:01:31.299 --> 00:01:34.400
So this benchmark study from Scale AI and CAIS, the Center for AI Safety.

00:01:34.519 --> 00:01:36.200
What's interesting is they didn't just, you know,

00:01:36.200 --> 00:01:37.920
test these agents in a lab. Like thrown right

00:01:37.920 --> 00:01:40.019
into the deep end. Exactly. The freelance marketplace

00:01:40.019 --> 00:01:42.739
where instructions can be vague. Feedback is

00:01:42.739 --> 00:01:45.920
subjective. It's messy. And those numbers we

00:01:45.920 --> 00:01:47.519
mentioned, that two, three percent completion

00:01:47.519 --> 00:01:50.859
rate, they're really sobering. You hear AI agent,

00:01:50.879 --> 00:01:53.219
you think, OK, autonomous worker, ready to go.

00:01:53.359 --> 00:01:57.299
But this tells you they are just nowhere near

00:01:57.299 --> 00:02:00.040
autonomous yet. They're really struggling with

00:02:00.040 --> 00:02:02.700
the basic friction of how humans actually work

00:02:02.700 --> 00:02:05.579
together. Yeah. The study pinpointed four main

00:02:05.579 --> 00:02:08.000
failure points. First, tasks that need multiple

00:02:08.000 --> 00:02:11.819
steps or, and this is key, handoffs between different

00:02:11.819 --> 00:02:13.900
systems. The agent just gets lost trying to move

00:02:13.900 --> 00:02:16.620
its work from, say, system A to system B. Right.

00:02:16.699 --> 00:02:20.300
And second, ambiguous instructions. Massive problem.

00:02:20.300 --> 00:02:22.500
Stuff a human would just, you know, shoot off a

00:02:22.500 --> 00:02:24.580
quick email to clarify, or ask in a quick chat.

00:02:24.580 --> 00:02:27.000
Ten seconds. Yeah, the agent, it just freezes up,

00:02:27.000 --> 00:02:30.379
or worse, plows ahead confidently with completely

00:02:30.379 --> 00:02:33.419
wrong output. Garbage in, garbage out. But confidently

00:02:33.419 --> 00:02:37.639
wrong garbage. The third one, and this feels like

00:02:37.639 --> 00:02:40.400
the big one for complex work: contextual judgment.

00:02:40.400 --> 00:02:42.879
Yeah, picking the right tone, the style, understanding

00:02:42.879 --> 00:02:45.560
the nuance a client needs. Yeah, I still wrestle

00:02:45.560 --> 00:02:48.159
with prompt drift myself sometimes when a project's

00:02:48.159 --> 00:02:51.219
needs shift just slightly. So I absolutely get

00:02:51.219 --> 00:02:53.659
why an agent struggles when things get subjective

00:02:53.659 --> 00:02:57.800
or need that subtle touch. It's tough even for

00:02:57.800 --> 00:03:00.900
us humans working with these tools daily. That's

00:03:00.900 --> 00:03:03.000
a really honest point. And the fourth thing is

00:03:03.000 --> 00:03:05.879
that feedback loop. The client says, hmm, not

00:03:05.879 --> 00:03:08.099
quite. Try again. Make it, I don't know, 10%

00:03:08.099 --> 00:03:10.280
punchier. Focus more on the future benefits.

00:03:10.479 --> 00:03:13.580
Right. Agents just fail repeatedly at taking

00:03:13.580 --> 00:03:15.759
that kind of subjective feedback on board, something

00:03:15.759 --> 00:03:19.099
humans do all the time. They just lack that common

00:03:19.099 --> 00:03:21.900
sense adaptability to criticism. So the takeaway

00:03:21.900 --> 00:03:23.819
for now seems pretty clear. Yeah, I think so.

00:03:23.939 --> 00:03:26.520
For you listening, the clarity should be this.

00:03:26.879 --> 00:03:29.719
AI agents right now. They're great as tools or

00:03:29.719 --> 00:03:31.800
maybe specialized assistants for really narrow,

00:03:31.860 --> 00:03:33.979
repetitive tasks in the back office. Like filling

00:03:33.979 --> 00:03:36.479
in forms or basic data stuff. Exactly. Simple

00:03:36.479 --> 00:03:38.680
data extraction, that kind of thing. But they

00:03:38.680 --> 00:03:41.819
are absolutely not replacements for complex human

00:03:41.819 --> 00:03:44.300
workflows. Anything that needs real judgment

00:03:44.300 --> 00:03:47.099
calls moment to moment. So why is that contextual

00:03:47.099 --> 00:03:50.599
judgment piece like picking the right tone for

00:03:50.599 --> 00:03:52.460
a client? Why is that the final hurdle for these

00:03:52.460 --> 00:03:54.580
models right now? Well, what's fascinating, I

00:03:54.580 --> 00:03:57.219
think, is that the systems just lack any real

00:03:57.219 --> 00:03:59.740
inherent understanding of subtle human intent.

00:04:00.060 --> 00:04:03.639
You know, they're processing patterns in data

00:04:03.639 --> 00:04:07.400
tokens, not actual meaning or feeling. It's kind

00:04:07.400 --> 00:04:09.319
of like translating a sentence word for word

00:04:09.319 --> 00:04:11.259
from another language, but completely missing

00:04:11.259 --> 00:04:14.020
the cultural nuance or the implied social signal.

00:04:14.180 --> 00:04:16.720
The systems lack inherent understanding of subtle

00:04:16.720 --> 00:04:19.379
human intent. Right. And that reality check,

00:04:19.420 --> 00:04:21.699
it really highlights the lag between what's actually

00:04:21.699 --> 00:04:23.889
deployed versus what the big labs might be cooking

00:04:23.889 --> 00:04:25.930
up. So maybe let's peek behind that curtain.

00:04:26.009 --> 00:04:28.370
Start with the drama, then hit the tech. Okay,

00:04:28.449 --> 00:04:30.670
yeah. Speaking of the big labs, there was some

00:04:30.670 --> 00:04:33.250
pretty wild industry gossip swirling around this

00:04:33.250 --> 00:04:37.230
almost bizarre AI love story, apparently, that

00:04:37.230 --> 00:04:39.410
unfolded right after Sam Altman got initially

00:04:39.410 --> 00:04:41.790
fired from OpenAI. That's right. Our sources

00:04:41.790 --> 00:04:43.810
confirmed there were actual merger talks between

00:04:43.810 --> 00:04:47.610
OpenAI and, get this, their biggest rival, Anthropic.

00:04:47.910 --> 00:04:50.699
Wow. The two heavyweights were actually considering

00:04:50.699 --> 00:04:52.980
merging. That just shows how fast things can

00:04:52.980 --> 00:04:55.100
shift those internal power dynamics in this like

00:04:55.100 --> 00:04:58.839
super competitive space. Totally. But in the

00:04:58.839 --> 00:05:01.860
end, Anthropic backed out, said no to the merger,

00:05:02.000 --> 00:05:04.019
which also tells you something, right? A quick

00:05:04.019 --> 00:05:06.740
rethink about staying independent versus consolidating

00:05:06.740 --> 00:05:09.220
power. Interesting. And while all that internal

00:05:09.220 --> 00:05:11.259
politicking was happening, something else was

00:05:11.259 --> 00:05:14.139
going on with how ChatGPT is actually working

00:05:14.139 --> 00:05:17.420
in the wild. This Atlas bot behavior, sources

00:05:17.420 --> 00:05:19.860
saying it's actively dodging links from The New

00:05:19.860 --> 00:05:22.279
York Times. Yeah. Instead of linking you straight

00:05:22.279 --> 00:05:24.819
to an NYT article, it seems to be rerouting,

00:05:24.939 --> 00:05:28.000
sending users to places like Reuters or maybe

00:05:28.000 --> 00:05:30.819
The Washington Post instead. Seems pretty clearly

00:05:30.819 --> 00:05:33.339
like a content dodging move, right? Trying to

00:05:33.339 --> 00:05:36.069
avoid potential legal headaches? Almost certainly,

00:05:36.230 --> 00:05:38.509
which is itself a massive development in how

00:05:38.509 --> 00:05:41.069
these models interact with, you know, established

00:05:41.069 --> 00:05:44.209
news sources and copyright. So while the lawyers

00:05:44.209 --> 00:05:46.769
are battling it out over scraping content, the

00:05:46.769 --> 00:05:48.829
engineers are tackling the really fundamental

00:05:48.829 --> 00:05:51.310
physics problems of these large language models.

00:05:51.689 --> 00:05:53.750
And this is where it gets really interesting.

00:05:53.949 --> 00:05:56.670
Let's talk about this huge technical leap. Moonshot

00:05:56.670 --> 00:05:59.990
AI's Kimi Linear model. OK, yeah, this is cool.

00:06:00.050 --> 00:06:02.269
You know how transformers have always had this

00:06:02.269 --> 00:06:05.189
problem? Transformers are the basic building

00:06:05.189 --> 00:06:08.250
blocks, the architecture behind things like ChatGPT.

00:06:08.389 --> 00:06:10.350
Right, the core tech. And they've always just

00:06:10.350 --> 00:06:13.610
completely choked, fallen apart when you try

00:06:13.610 --> 00:06:16.329
to feed them too much text to read at once. That

00:06:16.329 --> 00:06:18.750
huge amount of context just kills the memory.

00:06:19.050 --> 00:06:21.149
Yeah, that's the classic N-squared complexity

00:06:21.149 --> 00:06:24.350
issue, right? Every single word basically has

00:06:24.350 --> 00:06:26.310
to talk to every other word in the input. It

00:06:26.310 --> 00:06:28.569
just gets computationally massive really fast,

00:06:28.790 --> 00:06:31.250
needs tons of memory, tons of processing. Exactly.

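NOTE
Editor's sketch, not from the episode: a back-of-the-envelope Python illustration of the scaling problem just described. Standard self-attention scores every token against every other token, so the score matrix grows with the square of the input length, while a linear-attention-style state stays a fixed size. The hidden size of 2048 below is an arbitrary assumption for illustration.
  def full_attention_scores(n_tokens):
      # standard self-attention: one score per (token, token) pair -> n * n entries
      return n_tokens * n_tokens
  def linear_attention_state(d_model=2048):
      # a recurrent-style running state: its size depends on model width, not input length
      return d_model * d_model
  for n in (4_000, 128_000, 1_000_000):
      print(f"{n:>9} tokens: {full_attention_scores(n):.1e} pairwise scores vs {linear_attention_state():.1e} state entries")
  # At 1,000,000 tokens the pairwise score matrix alone is on the order of 1e12 entries,
  # while the fixed-size state stays around 4.2e6 no matter how long the input gets.
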
00:06:31.250 --> 00:06:34.509
It's super bloated. So Kimi Linear, it uses a

00:06:34.509 --> 00:06:36.310
different approach, something they call Kimi

00:06:36.310 --> 00:06:38.889
Delta attention. It basically swaps out that

00:06:38.889 --> 00:06:41.790
demanding everyone talks to everyone system for

00:06:41.790 --> 00:06:44.470
something way leaner. And the benefits sound

00:06:44.470 --> 00:06:46.910
pretty striking. They're talking, what, 75%

00:06:46.910 --> 00:06:49.810
less memory needed? That's huge for scaling and

00:06:49.810 --> 00:06:52.990
generating text six times faster. Yeah, massive

00:06:52.990 --> 00:06:55.449
improvements. And what's clever is how it works

00:06:55.449 --> 00:06:57.470
under the hood. It uses what's called linear

00:06:57.470 --> 00:07:00.810
attention. It kind of intelligently forgets the

00:07:00.810 --> 00:07:02.829
less important stuff over time, keeps the key

00:07:02.829 --> 00:07:06.110
info around longer. But crucially, it still mixes

00:07:06.110 --> 00:07:08.149
in some of the old style attention layers like

00:07:08.149 --> 00:07:10.389
a three to one ratio just to keep the overall

00:07:10.389 --> 00:07:13.470
big picture understanding. But the headline number,

00:07:13.550 --> 00:07:15.949
the really wild part is the context window it

00:07:15.949 --> 00:07:20.110
can handle. One million tokens. A million. That's

00:07:20.110 --> 00:07:23.779
like 750,000 words. You could feed an entire

00:07:23.779 --> 00:07:27.420
novel or a massive software code base or years

00:07:27.420 --> 00:07:29.800
of meeting transcripts and apparently keeps it

00:07:29.800 --> 00:07:31.920
all straight. Whoa. OK, just imagine scaling

00:07:31.920 --> 00:07:34.339
that. A million-token context, sure. But doing

00:07:34.339 --> 00:07:37.100
that for like a billion queries without the memory

00:07:37.100 --> 00:07:39.500
just exploding. That genuinely changes everything

00:07:39.500 --> 00:07:42.339
for big companies, for enterprise use. Exactly.

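NOTE
Editor's sketch, not Moonshot AI's actual code: the hybrid layering idea described a moment ago, read as roughly three linear-attention blocks for every one full-attention block so the model keeps some of that global, everything-to-everything view. The 24-layer count and the layer names are assumptions made up for illustration.
  def build_layer_plan(n_layers=24, linear_per_full=3):
      # cycle through the stack: three linear-attention layers, then one full-attention layer
      plan = []
      for i in range(n_layers):
          if (i + 1) % (linear_per_full + 1) == 0:
              plan.append("full_attention")    # keeps the global, big-picture view
          else:
              plan.append("linear_attention")  # cheap layers that carry a compressed running state
      return plan
  plan = build_layer_plan()
  print(plan.count("linear_attention"), "linear layers,", plan.count("full_attention"), "full layers")
  # -> 18 linear layers, 6 full layers: the three-to-one mix mentioned in the episode
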
00:07:42.620 --> 00:07:45.740
This kind of leap makes data tasks that were

00:07:45.740 --> 00:07:49.060
just completely impractical before suddenly feasible.

00:07:49.420 --> 00:07:52.139
Think about lawyers doing discovery on huge

00:07:52.139 --> 00:07:54.779
cases, or insurance companies processing massive

00:07:54.779 --> 00:07:58.139
claims using the entire client history, or financial

00:07:58.139 --> 00:08:00.720
analysts looking at every single annual report

00:08:00.720 --> 00:08:03.779
a company ever filed. This huge window makes

00:08:03.779 --> 00:08:05.680
processing these enormous corporate document

00:08:05.680 --> 00:08:08.939
sets actually practical. Yeah, it really shifts

00:08:08.939 --> 00:08:10.860
the bottleneck. And the tech's already out there.

00:08:10.980 --> 00:08:13.560
It's open-sourced, checkpoints on Hugging Face,

00:08:13.800 --> 00:08:15.899
so anyone trying to wrestle with these massive

00:08:15.899 --> 00:08:18.160
data sets can start playing with it. So if this

00:08:18.160 --> 00:08:21.079
Kimi Linear tech holds up, will it truly solve

00:08:21.079 --> 00:08:23.660
that context window problem for things like large

00:08:23.660 --> 00:08:25.860
corporate documents? Well, the promise is there.

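NOTE
Editor's sketch of what trying those open-sourced checkpoints might look like with the Hugging Face transformers library. The repo id and the input file below are placeholders, not verified names; check Moonshot AI's Hugging Face page for the actual Kimi Linear checkpoints.
  from transformers import AutoModelForCausalLM, AutoTokenizer
  repo_id = "moonshotai/<kimi-linear-checkpoint>"  # placeholder: look up the real repo id on Hugging Face
  tokenizer = AutoTokenizer.from_pretrained(repo_id, trust_remote_code=True)
  model = AutoModelForCausalLM.from_pretrained(repo_id, trust_remote_code=True)
  # contract.txt stands in for whatever long document you want summarized
  prompt = "Summarize the key obligations in the following contract:\n" + open("contract.txt").read()
  inputs = tokenizer(prompt, return_tensors="pt")
  outputs = model.generate(**inputs, max_new_tokens=300)
  print(tokenizer.decode(outputs[0], skip_special_tokens=True))
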
00:08:25.939 --> 00:08:28.879
That huge context window makes processing entire

00:08:28.879 --> 00:08:31.560
documents practical now, really moving beyond

00:08:31.560 --> 00:08:33.919
the old limitations. Okay, moving from deep tech

00:08:33.919 --> 00:08:36.860
architecture to maybe the cultural side, how

00:08:36.860 --> 00:08:39.399
are creators actually using these tools, monetizing

00:08:39.399 --> 00:08:41.519
them, making stuff go viral? Yeah, it's moving

00:08:41.519 --> 00:08:44.620
fast, especially on the video front. OpenAI is

00:08:44.620 --> 00:08:46.480
starting to charge for Sora videos now. You can

00:08:46.480 --> 00:08:49.460
buy extra credits, extra generations. And apparently

00:08:49.460 --> 00:08:51.440
they're even setting up ways for creators to

00:08:51.440 --> 00:08:54.080
earn money from their Sora content, like a cameo

00:08:54.080 --> 00:08:57.299
feature soon. Interesting. The ecosystem builds

00:08:57.299 --> 00:09:00.539
fast. Super fast. And there's global demand, too.

00:09:00.659 --> 00:09:03.039
Our sources mentioned guides popping up already

00:09:03.039 --> 00:09:05.320
for people outside the U.S. on how to get access

00:09:05.320 --> 00:09:07.700
to the Sora app. Yeah. Shows how quickly the

00:09:07.700 --> 00:09:09.799
creator world jumps on new tools. So what are

00:09:09.799 --> 00:09:11.620
some of the hot trends right now? What's getting

00:09:11.620 --> 00:09:14.879
traction? Okay, rapid fire. First, spook mode.

00:09:15.580 --> 00:09:18.440
OpenAI dropped this fun Sora short for Halloween

00:09:18.440 --> 00:09:21.629
called Monster Manor. Right. And boom, creators

00:09:21.629 --> 00:09:23.990
immediately started remixing classic monsters,

00:09:23.990 --> 00:09:26.549
putting their own hyper-stylized spin on them.

00:09:26.549 --> 00:09:29.350
You see blends of old horror styles with, like,

00:09:29.350 --> 00:09:32.169
modern pop culture references. Really creative

00:09:32.169 --> 00:09:35.590
stuff. Okay, then there's AI ads. Google put out

00:09:35.590 --> 00:09:38.590
its first ad made mostly with AI, and now brand

00:09:38.590 --> 00:09:40.830
managers are starting to look at iconic ads, like,

00:09:40.830 --> 00:09:43.210
you know, those really emotional Coca-Cola Christmas

00:09:43.210 --> 00:09:46.389
ads. Yeah, classics. And they're expecting AI tools

00:09:46.389 --> 00:09:49.230
to recreate that level of polish like instantly.

00:09:49.429 --> 00:09:51.950
It's definitely speeding up expectations around

00:09:51.950 --> 00:09:54.960
quality and turnaround time. What about more

00:09:54.960 --> 00:09:56.820
controversial stuff? Well, that brings us to

00:09:56.820 --> 00:10:00.720
SNAP storm. We've seen these viral politically

00:10:00.720 --> 00:10:04.240
charged AI videos making the rounds, specifically

00:10:04.240 --> 00:10:07.000
examples showing SNAP recipients looking really

00:10:07.000 --> 00:10:09.679
upset about benefit cuts. And these often get

00:10:09.679 --> 00:10:12.259
amplified through specific news channels. Right.

00:10:12.320 --> 00:10:14.480
And the ease of making that kind of hyper-realistic

00:10:14.480 --> 00:10:17.789
emotional content. It feels like it could easily

00:10:17.789 --> 00:10:19.909
bypass fact checking when it spreads that fast.

00:10:20.029 --> 00:10:22.110
Exactly. It definitely increases the risk for

00:10:22.110 --> 00:10:24.769
rapid disinformation or shaping narratives really

00:10:24.769 --> 00:10:26.350
quickly. It's something to watch. Definitely.

00:10:26.370 --> 00:10:29.960
Anything lighter? Oh, yeah. The one we all secretly

00:10:29.960 --> 00:10:33.580
love. Crittercore. AI pets are just dominating

00:10:33.580 --> 00:10:35.820
again. We're talking cats that cook gourmet meals,

00:10:35.960 --> 00:10:39.019
dogs doing like elaborate disco dances. Absurd

00:10:39.019 --> 00:10:41.440
stuff. Totally absurd, but racking up millions

00:10:41.440 --> 00:10:44.379
of views. People just love seeing familiar animals

00:10:44.379 --> 00:10:46.539
in these impossible, super detailed scenarios.

00:10:46.639 --> 00:10:49.679
It's just pure novelty, high-fidelity weirdness,

00:10:49.679 --> 00:10:53.450
a dopamine hit. What risks really emerge when

00:10:53.450 --> 00:10:55.970
AI can generate that viral politically charged

00:10:55.970 --> 00:10:58.529
content so easily? I think the big one is the

00:10:58.529 --> 00:11:01.230
viral spread of potentially misleading stuff,

00:11:01.389 --> 00:11:02.889
especially when it's hard to figure out where

00:11:02.889 --> 00:11:06.029
it even came from. That's a growing risk. So

00:11:06.029 --> 00:11:07.710
what does this all mean? We've kind of looked

00:11:07.710 --> 00:11:10.409
at two very different sides of AI today. The

00:11:10.409 --> 00:11:13.159
gap feels... pretty huge. It really does. On

00:11:13.159 --> 00:11:15.500
the one hand, you have the deployed agents, right?

00:11:15.620 --> 00:11:19.019
And their performance on complex human-type tasks

00:11:19.019 --> 00:11:22.440
is, well, frankly, underwhelming right now. Yeah.

00:11:22.519 --> 00:11:25.100
They stumble on judgment, ambiguity, nuanced

00:11:25.100 --> 00:11:28.019
feedback. The human factor is still crucial for

00:11:28.019 --> 00:11:30.149
actually getting quality work done. But then

00:11:30.149 --> 00:11:31.909
on the other hand, the underlying technology,

00:11:32.049 --> 00:11:33.889
the core engine stuff like Kimi Linear, it's

00:11:33.889 --> 00:11:36.330
making these absolutely radical leaps, speed,

00:11:36.529 --> 00:11:39.289
context handling. It's like we're suddenly stacking

00:11:39.289 --> 00:11:41.970
Lego blocks of data at scales and speeds that

00:11:41.970 --> 00:11:44.309
were just physically impossible like six months

00:11:44.309 --> 00:11:46.490
ago. So the short term future isn't necessarily

00:11:46.490 --> 00:11:50.409
about AI taking over jobs that need that human

00:11:50.409 --> 00:11:52.460
judgment. Probably not wholesale replacement,

00:11:52.720 --> 00:11:55.919
no. It seems much more likely to be about specializing.

00:11:56.220 --> 00:11:59.139
AI getting really good at those narrow, high

00:11:59.139 --> 00:12:01.519
volume, repetitive tasks that it can handle super

00:12:01.519 --> 00:12:04.100
efficiently, especially now with these massive

00:12:04.100 --> 00:12:07.019
context windows, which means the most valuable

00:12:07.019 --> 00:12:11.279
skill moving forward may not just be learning

00:12:11.279 --> 00:12:13.820
how to use the AI tools themselves. What is it

00:12:13.820 --> 00:12:16.379
then? It's mastering the art of asking the right

00:12:16.379 --> 00:12:18.860
questions, the clarifying questions, the contextual

00:12:18.860 --> 00:12:21.440
ones that the agents right now just don't understand.

00:12:21.919 --> 00:12:24.220
That's where the human intelligence, that nuance

00:12:24.220 --> 00:12:27.159
is still absolutely essential. That's a great

00:12:27.159 --> 00:12:30.019
point. So our challenge for you listening today

00:12:30.019 --> 00:12:33.200
is this. Take a hard look at your own workflow.

00:12:33.740 --> 00:12:35.639
Where are the steps that really require that

00:12:35.639 --> 00:12:38.159
contextual judgment, that nuance, that ability

00:12:38.159 --> 00:12:41.419
to handle ambiguity? Find the messy parts. Exactly.

00:12:41.500 --> 00:12:43.360
Find the messy parts. Because those are the areas

00:12:43.360 --> 00:12:45.820
AI can't really touch yet. And those are probably

00:12:45.820 --> 00:12:47.440
the skills you should be doubling down on, like,

00:12:47.440 --> 00:12:49.679
right now. Thank you for providing the source

00:12:49.679 --> 00:12:50.919
material for this deep dive.
