WEBVTT

00:00:00.000 --> 00:00:03.339
It's sort of the great paradox of January 2026,

00:00:03.560 --> 00:00:05.719
isn't it? It really is. We have these incredible

00:00:05.719 --> 00:00:09.400
models, GPT-5.2, Gemini 3, and they can

00:00:09.400 --> 00:00:11.419
do amazing things. They can write poetry, they

00:00:11.419 --> 00:00:14.019
can code, they can even, you know, fake empathy

00:00:14.019 --> 00:00:16.260
pretty convincingly. Right. But then you put

00:00:16.260 --> 00:00:17.980
them in a digital office. You give them a Slack

00:00:17.980 --> 00:00:20.820
login, a Google Drive folder, and a basic analysis

00:00:20.820 --> 00:00:23.000
task. They just fall apart. It's like a nervous

00:00:23.000 --> 00:00:25.320
intern on their very first day. That's it. It

00:00:25.320 --> 00:00:28.000
is the intern paradox. And looking at the data

00:00:28.000 --> 00:00:30.050
we have today, it feels like the biggest reality

00:00:30.050 --> 00:00:31.949
check we've had in, what, the last two years?

00:00:32.090 --> 00:00:35.049
Oh, easily. Welcome back to The Deep Dive. It

00:00:35.049 --> 00:00:39.770
is Thursday, January 22nd, 2026. We're here to

00:00:39.770 --> 00:00:41.750
slow down, take a breath, and just try to make

00:00:41.750 --> 00:00:43.310
sense of the noise. And I'm here to help connect

00:00:43.310 --> 00:00:45.530
some of those dots. So today is really about

00:00:45.530 --> 00:00:47.789
taking a hard, honest look at where we actually

00:00:47.789 --> 00:00:50.869
are. We're going to unpack why this idea of the

00:00:50.869 --> 00:00:53.530
autonomous office agent just isn't working yet.

00:00:53.770 --> 00:00:56.450
We'll be looking at a new benchmark called Apex

00:00:56.450 --> 00:00:59.450
that is, frankly, a little embarrassing for the

00:00:59.450 --> 00:01:01.649
big labs. But we're also looking at the other

00:01:01.649 --> 00:01:04.870
side of that coin. Because while the brains of

00:01:04.870 --> 00:01:08.290
the AI are struggling with paperwork, the senses,

00:01:08.569 --> 00:01:11.409
I mean, vision, audio, they've just taken this

00:01:11.409 --> 00:01:14.390
quantum leap forward. Right. We're talking about

00:01:14.390 --> 00:01:17.769
Google's new 4D vision. Yeah. Pretty surreal

00:01:17.769 --> 00:01:20.390
story involving Liza Minnelli of all people.

00:01:20.510 --> 00:01:24.090
It is such a strange week. The tech is stalling

00:01:24.090 --> 00:01:26.510
in some areas and just hitting warp speed in

00:01:26.510 --> 00:01:28.569
others. Before we jump into the data, though,

00:01:28.670 --> 00:01:30.810
I kind of have to make a vulnerable admission

00:01:30.810 --> 00:01:34.709
here. I honestly still wrestle with prompt drift

00:01:34.709 --> 00:01:37.349
myself. I was at my desk this morning trying

00:01:37.349 --> 00:01:39.969
to get an agent to format a report from three

00:01:39.969 --> 00:01:42.890
different documents. And I must have spent 45

00:01:42.890 --> 00:01:46.159
minutes just tweaking instructions, you know,

00:01:46.180 --> 00:01:48.219
correcting its errors, telling it to stop making

00:01:48.219 --> 00:01:51.040
updates. And then it hit me. What's that? I could

00:01:51.040 --> 00:01:52.500
have just done the work myself in 10 minutes.

00:01:52.599 --> 00:01:54.760
Oh, absolutely. That is the automation tax. You

00:01:54.760 --> 00:01:56.900
pay it in patience just to prove the machine

00:01:56.900 --> 00:01:59.579
can do it. Exactly. And that feeling, it's not

00:01:59.579 --> 00:02:02.120
just me being impatient. It turns out there's

00:02:02.120 --> 00:02:04.920
hard data to back it up now. So let's start with

00:02:04.920 --> 00:02:07.920
segment one, the Apex failure. Yeah, this is

00:02:07.920 --> 00:02:09.419
a really significant report. It's called the

00:02:09.419 --> 00:02:13.060
The AI Gap. For, I'd say, the last 18 months, the

00:02:13.060 --> 00:02:16.680
narrative from OpenAI, from Google, from Anthropic

00:02:16.680 --> 00:02:20.000
has been pretty clear. AI is ready for real jobs.

00:02:20.280 --> 00:02:22.199
Exactly. We've been told they can be analysts,

00:02:22.560 --> 00:02:25.479
paralegals, executive assistants. Yeah, just

00:02:25.479 --> 00:02:27.840
give them the keys. Right. But we haven't really

00:02:27.840 --> 00:02:31.360
had a standardized way to test that claim. Not

00:02:31.360 --> 00:02:34.180
in a messy, real-world setting. That's the key.

00:02:34.280 --> 00:02:37.740
We had tests for write a poem or solve this math

00:02:37.740 --> 00:02:40.099
problem. We did not have tests for be an employee.

00:02:40.650 --> 00:02:42.669
And that's where the Apex Agents Benchmark comes

00:02:42.669 --> 00:02:45.629
in. So this is not a multiple choice test. No,

00:02:45.750 --> 00:02:47.990
not at all. They literally dropped the top models.

00:02:48.030 --> 00:02:50.810
We're talking Gemini 3 Flash, GPT-5.2, Opus

00:02:50.810 --> 00:02:53.909
4.5 into these simulated white-collar workflows.

00:02:54.439 --> 00:02:57.139
So things like read the Slack thread, then cross

00:02:57.139 --> 00:02:59.099
-reference it with the PDF and Google Drive,

00:02:59.240 --> 00:03:01.840
and draft a reply to the client explaining why

00:03:01.840 --> 00:03:03.960
we can't do the refund. Precisely. The stuff

00:03:03.960 --> 00:03:06.000
that's actually, you know, 90% of knowledge

00:03:06.000 --> 00:03:08.539
work. Yeah. It isn't just generating text. It's

00:03:08.539 --> 00:03:11.240
context switching. It's reasoning across different

00:03:11.240 --> 00:03:13.639
sources that might even contradict each other.

00:03:13.780 --> 00:03:16.879
And the results were? Well, I'm looking at this

00:03:16.879 --> 00:03:20.400
chart and shocking feels like an understatement.

00:03:20.539 --> 00:03:23.800
They were abysmal. So the winner was Gemini 3

00:03:23.800 --> 00:03:26.879
Flash. You want to take a guess at the accuracy

00:03:26.879 --> 00:03:30.099
rate? I mean, I would hope for at least a passing

00:03:30.099 --> 00:03:35.240
grade, maybe, what, 65%? 24. 24%. That was the

00:03:35.240 --> 00:03:39.159
gold medal. GPT-5.2 came in at 23%. And the

00:03:39.159 --> 00:03:41.979
others, like Opus 4.5 and the standard Gemini

00:03:41.979 --> 00:03:44.740
3 Pro, were all hovering around 18%. That is

00:03:44.740 --> 00:03:47.460
just incredibly low. That means three out of

00:03:47.460 --> 00:03:50.120
four times the employee gets the task wrong.

00:03:50.360 --> 00:03:53.580
If I had a human intern with a 24% success rate,

00:03:53.740 --> 00:03:55.340
I mean, I'd have to let them go before lunch.

00:03:55.919 --> 00:03:58.659
Or they'd just quit. The report notes that in

00:03:58.659 --> 00:04:00.680
most cases, the models just couldn't piece the

00:04:00.680 --> 00:04:02.939
information together, they'd hallucinate a policy

00:04:02.939 --> 00:04:05.360
that didn't exist, or they'd miss a key detail

00:04:05.360 --> 00:04:07.419
in the Slack thread because it contradicted the

00:04:07.419 --> 00:04:09.500
PDF. It really feels like a connective tissue

00:04:09.500 --> 00:04:12.099
problem. They can process the individual data

00:04:12.099 --> 00:04:14.039
points, right? They can read the PDF, they can

00:04:14.039 --> 00:04:16.800
read the chat, but stringing them together into

00:04:16.800 --> 00:04:20.100
a coherent chain of logic, that's where it all

00:04:20.100 --> 00:04:22.259
breaks down. It is the perfect analogy for an

00:04:22.259 --> 00:04:24.100
intern. You ask an intern, what's the capital

00:04:24.100 --> 00:04:27.199
of France? They nail it. That's retrieval. Right.

00:04:27.300 --> 00:04:29.399
But if you ask them, look at these three messy

00:04:29.399 --> 00:04:31.699
files and tell me if we can legally fire this

00:04:31.699 --> 00:04:34.540
vendor, they panic. That's reasoning. It really

00:04:34.540 --> 00:04:37.540
makes you wonder about the timeline. We've all

00:04:37.540 --> 00:04:39.879
been preparing for this wave of autonomous agents

00:04:39.879 --> 00:04:44.259
to take over our inboxes in 2026. So looking

00:04:44.259 --> 00:04:47.000
at these numbers, I guess I have to ask, is the

00:04:47.000 --> 00:04:49.480
dream of the autonomous AI employee officially

00:04:49.480 --> 00:04:53.060
dead for now? Not dead, just delayed. They lack

00:04:53.060 --> 00:04:55.620
connective tissue reasoning. That distinction

00:04:55.620 --> 00:04:58.199
is key. But while the current models are struggling,

00:04:58.379 --> 00:05:00.040
the labs aren't just sitting still, are they?

00:05:00.139 --> 00:05:02.620
No, not at all. And this brings us to the leaks

00:05:02.620 --> 00:05:05.000
and the new tools emerging this week. The rumor

00:05:05.000 --> 00:05:07.639
mill is really spinning. We've got a leak about

00:05:07.639 --> 00:05:11.639
GPT-5.3, which is codenamed Garlic. Garlic?

00:05:11.639 --> 00:05:13.899
That's an interesting choice. I know, right?

00:05:14.300 --> 00:05:17.230
But the leak suggests a pivot. Instead of just

00:05:17.230 --> 00:05:19.490
bigger is better, which has kind of been the

00:05:19.490 --> 00:05:22.089
strategy for five years, this next version is

00:05:22.089 --> 00:05:25.610
supposedly all about cost, speed, and crucially

00:05:25.610 --> 00:05:28.290
reasoning. Which would be a direct answer to

00:05:28.290 --> 00:05:31.170
the Apex failure. Exactly. If they can crack

00:05:31.170 --> 00:05:33.089
that reasoning bottleneck without making the

00:05:33.089 --> 00:05:36.569
model 10 times more expensive to run, well...

00:05:36.829 --> 00:05:38.870
That changes the entire equation. But while we're

00:05:38.870 --> 00:05:41.930
waiting for Garlic to maybe save the day, there

00:05:41.930 --> 00:05:44.529
are some tools right now that are quietly solving

00:05:44.529 --> 00:05:47.189
this workflow problem just by changing how we

00:05:47.189 --> 00:05:49.170
interact with the info. You mentioned Notebook

00:05:49.170 --> 00:05:51.889
LM earlier. Yeah, Notebook LM really feels like

00:05:51.889 --> 00:05:53.930
a sleeper hit to me. Everyone is chasing the

00:05:53.930 --> 00:05:56.670
big agent dream. But Notebook LM has evolved

00:05:56.670 --> 00:05:58.970
into this amazing all-in-one researcher. It's

00:05:58.970 --> 00:06:01.269
gone from just a summarizer to a synthesizer.

00:06:01.629 --> 00:06:03.769
It has. The new workflows are really impressive.

00:06:03.910 --> 00:06:06.149
You can just dump in raw research papers, your

00:06:06.149 --> 00:06:08.629
notes, transcripts, and it doesn't just chat

00:06:08.629 --> 00:06:10.550
with you about them. It converts them. Right.

00:06:10.689 --> 00:06:13.610
I saw that workflow where it takes all that raw

00:06:13.610 --> 00:06:16.370
data and turns it into a slide deck outline,

00:06:16.730 --> 00:06:19.889
generates an audio overview, which sounds creepily

00:06:19.889 --> 00:06:22.350
human, and then it builds out comparison tables

00:06:22.350 --> 00:06:25.009
all in minutes. And that connects to this broader

00:06:25.009 --> 00:06:27.810
trend we're seeing, this idea of no PowerPoint.

00:06:28.250 --> 00:06:30.589
Please tell me that means what I think it means.

00:06:30.750 --> 00:06:32.810
Well. It means we stop fighting with PowerPoint.

00:06:33.170 --> 00:06:36.269
The shift is towards creating consistent, full

00:06:36.269 --> 00:06:39.430
slides using scripts. So you write the narrative

00:06:39.430 --> 00:06:42.089
and the AI builds the visual container for you.

00:06:42.149 --> 00:06:44.490
No more dragging text boxes or aligning fonts.

00:06:44.970 --> 00:06:47.230
Exactly. That's the dream. We become editors

00:06:47.230 --> 00:06:49.209
-in-chief instead of slide designers. Yeah.

00:06:49.290 --> 00:06:51.209
But there was another tool leaked from OpenAI

00:06:51.209 --> 00:06:54.790
that sounds a little more managerial. Salute.

00:06:54.949 --> 00:06:57.069
Salute is fascinating. It's an internal tool

00:06:57.069 --> 00:06:58.709
they're testing, and it's basically a project

00:06:58.709 --> 00:07:01.569
manager AI. You upload your files, you assign

00:07:01.569 --> 00:07:04.490
tasks to the AI, and it actually tracks the progress.

00:07:04.730 --> 00:07:07.149
So it's not just do this one thing. It's here's

00:07:07.149 --> 00:07:09.490
the project, now you manage the steps. It's trying

00:07:09.490 --> 00:07:12.629
to bridge that gap we saw in Apex. If the model

00:07:12.629 --> 00:07:15.589
can't reason across tasks on its own, maybe we

00:07:15.589 --> 00:07:17.910
just need a dedicated manager layer of software

00:07:17.910 --> 00:07:20.449
to force it to stay on track. Feels like a real

00:07:20.449 --> 00:07:22.850
shift in our role, you know? With tools like

00:07:22.850 --> 00:07:25.689
Salute and Notebook LM, are we actually working

00:07:25.689 --> 00:07:29.209
less or are we just managing the AI more? Managing

00:07:29.209 --> 00:07:31.769
more, but the output quality is exponentially

00:07:31.769 --> 00:07:35.009
higher. That feels like the trade -off of 2026.

00:07:35.329 --> 00:07:37.589
You're the conductor now, not the first violin.

00:07:37.790 --> 00:07:39.730
But let's talk about the business side of this.

00:07:39.889 --> 00:07:41.870
Because running these reasoning models, these

00:07:41.870 --> 00:07:45.259
manager layers, it is not free. Far from it.

00:07:45.319 --> 00:07:47.939
And the economics are getting pretty ugly. We

00:07:47.939 --> 00:07:50.220
saw a report on Anthropic, the makers of Claude.

00:07:50.379 --> 00:07:52.560
Their margins just took a really significant

00:07:52.560 --> 00:07:55.379
hit. Dropping from 50 percent down to 40. That's

00:07:55.379 --> 00:07:58.079
a huge slice of profit. It is. And the culprit

00:07:58.079 --> 00:08:01.459
is exactly who you'd expect. The cloud providers.

00:08:02.139 --> 00:08:04.980
Google and Amazon hiked their server costs by

00:08:04.980 --> 00:08:07.819
23 percent. It's the unsexy reality of the AI

00:08:07.819 --> 00:08:09.860
ecosystem, isn't it? You can have the smartest

00:08:09.860 --> 00:08:11.360
model in the world, but if you have to pay a

00:08:11.360 --> 00:08:13.889
landlord to run it, you're at their mercy. It's

00:08:13.889 --> 00:08:16.610
the infrastructure squeeze, and it's driving

00:08:16.610 --> 00:08:19.610
these massive capital deals. We just saw the

00:08:19.610 --> 00:08:22.850
Saudi infrastructure fund team up with HUMAIN

00:08:22.850 --> 00:08:27.410
for a $1.2 billion deal. Just to build data

00:08:27.410 --> 00:08:30.129
centers. $1.2 billion just for the plumbing.

00:08:30.269 --> 00:08:32.870
For the plumbing. But, you know, while the infrastructure

00:08:32.870 --> 00:08:34.710
is getting more expensive and the reasoning is

00:08:34.710 --> 00:08:37.289
hitting a wall, the creative output is finding

00:08:37.289 --> 00:08:39.629
these really interesting new business models.

00:08:39.750 --> 00:08:42.730
This Eleven Labs story really caught my eye.

00:08:42.950 --> 00:08:44.929
It's a landmark moment, I think. They dropped

00:08:44.929 --> 00:08:48.970
a 13-track AI music album. And we're not talking

00:08:48.970 --> 00:08:51.210
about some, you know, anonymous generated lo

00:08:51.210 --> 00:08:54.299
-fi beats. This features Liza Minnelli and Art

00:08:54.299 --> 00:08:56.659
Garfunkel. Now, to be clear, because the ethics

00:08:56.659 --> 00:08:58.879
here usually get messy, Liza Minnelli did not

00:08:58.879 --> 00:09:00.960
go into a booth and record this, correct? No,

00:09:00.980 --> 00:09:03.039
she did not. But her estate and her team signed

00:09:03.039 --> 00:09:05.899
off on it. And that's the key. Yes. Full royalties

00:09:05.899 --> 00:09:07.879
are being paid. The labels are actually involved.

00:09:08.100 --> 00:09:10.139
It's a licensed collaboration. They're essentially

00:09:10.139 --> 00:09:13.399
treating her voice as an instrument, like a Stradivarius,

00:09:13.580 --> 00:09:16.360
that can be played by the AI with the artist's

00:09:16.360 --> 00:09:19.730
permission. That's a complete 180 from all the

00:09:19.730 --> 00:09:23.350
lawsuits we saw back in 2024 and 2025. It suggests

00:09:23.350 --> 00:09:26.350
a future where an artist's prime voice can become

00:09:26.350 --> 00:09:29.950
immortal. Exactly. And Spotify is leaning into

00:09:29.950 --> 00:09:32.669
this whole vibe culture, too. Their new prompted

00:09:32.669 --> 00:09:35.629
playlists are getting shockingly good. You don't

00:09:35.629 --> 00:09:37.669
search for a genre anymore. You just type in,

00:09:37.710 --> 00:09:40.110
make me a playlist that feels like rainy Tokyo

00:09:40.110 --> 00:09:42.889
nights. And it actually understands the semantic

00:09:42.889 --> 00:09:45.830
texture of that request. It's moving from metadata

00:09:45.830 --> 00:09:49.409
like this is a rock song to emotional data. This

00:09:49.409 --> 00:09:52.110
song feels like heartbreak. That's it. So does

00:09:52.110 --> 00:09:54.330
the Eleven Labs album prove that artists and

00:09:54.330 --> 00:09:57.710
AI can actually coexist profitably? Yes. It turns

00:09:57.710 --> 00:10:00.409
legacy voices into a scalable, renewable asset.

00:10:00.629 --> 00:10:02.990
A renewable asset. That is a wild way to think

00:10:02.990 --> 00:10:04.940
about a human voice. But speaking of sensing

00:10:04.940 --> 00:10:06.639
the world, we have to talk about the breakthrough

00:10:06.639 --> 00:10:08.200
of the week. This is the one that actually made

00:10:08.200 --> 00:10:10.259
me just stop and stare at my screen. Google's

00:10:10.259 --> 00:10:13.919
D4RT. Right. Dynamic 4D reconstruction and tracking.

00:10:14.279 --> 00:10:16.879
Now, we've had computer vision for a while. My

00:10:16.879 --> 00:10:19.600
car sees lane lines. My phone recognizes my face.

00:10:19.940 --> 00:10:22.559
How is this different? This is the difference

00:10:22.559 --> 00:10:24.600
between looking at a photograph and actually

00:10:24.600 --> 00:10:27.279
stepping inside the room. Traditional computer

00:10:27.279 --> 00:10:30.720
vision looks at a 2D image and draws boxes around

00:10:30.720 --> 00:10:34.840
things. Cats, cars. Right. D4RT watches a video

00:10:34.840 --> 00:10:37.559
and builds a world model. A world model. Yes.

00:10:37.639 --> 00:10:40.019
It tracks every single pixel across time. That's

00:10:40.019 --> 00:10:42.320
the fourth dimension here, time. And from that

00:10:42.320 --> 00:10:45.179
flat video, it reconstructs a full 3D scene.

00:10:45.419 --> 00:10:47.940
So if I showed a video of me walking around my

00:10:47.940 --> 00:10:50.559
kitchen, it's not just seeing a man in a kitchen.

00:10:50.600 --> 00:10:53.730
It's actually... Building the 3D geometry of

00:10:53.730 --> 00:10:56.629
the fridge, the table, the coffee cup. And it

00:10:56.629 --> 00:10:58.470
understands exactly where the camera is moving

00:10:58.470 --> 00:11:01.070
in that space. Correct. And it handles the messy

00:11:01.070 --> 00:11:03.509
stuff that usually breaks these models. Motion

00:11:03.509 --> 00:11:06.009
blur, things getting blocked. What we call occlusion.

00:11:06.250 --> 00:11:08.389
Ah. Like if you walk behind the fridge, old models

00:11:08.389 --> 00:11:11.789
would think you just vanished. D4RT knows you

00:11:11.789 --> 00:11:13.289
didn't disappear. It knows you're just behind

00:11:13.289 --> 00:11:15.889
an object. Okay. And the speed. That is the whoa

00:11:15.889 --> 00:11:18.669
moment. Previous models, and this was cutting

00:11:18.669 --> 00:11:21.320
edge just six months ago. would take maybe 10

00:11:21.320 --> 00:11:23.320
minutes to parse a video like that. 10 minutes,

00:11:23.440 --> 00:11:26.039
okay. D4RT does a one-minute video in five seconds.

00:11:26.299 --> 00:11:29.159
Five seconds. Whoa. That's

00:11:29.159 --> 00:11:31.860
basically real time. It's anywhere from 18 to

00:11:31.860 --> 00:11:34.539
300 times faster than anything else out there.

00:11:35.139 --> 00:11:37.860
It uses this unified query system. So instead

00:11:37.860 --> 00:11:39.960
of having one model track objects, another one

00:11:39.960 --> 00:11:43.179
gets the depth, another one maps the room, D4RT

00:11:43.179 --> 00:11:45.740
does it all in a single pass. I mean, imagine

00:11:45.740 --> 00:11:48.879
the implications for... Well, for everything.

00:11:49.259 --> 00:11:51.679
Robotics, obviously. If a robot can understand

00:11:51.679 --> 00:11:54.480
the 3D geometry of a room in milliseconds, it

00:11:54.480 --> 00:11:57.019
can navigate like a human. But also surveillance,

00:11:57.259 --> 00:11:59.240
content creation. It's like the AI isn't just

00:11:59.240 --> 00:12:01.299
watching a movie anymore. It's building the movie

00:12:01.659 --> 00:12:04.519
set in its head instantly. It's superhuman perception.

00:12:04.720 --> 00:12:06.440
We're giving machines a spatial understanding

00:12:06.440 --> 00:12:08.840
of reality that might actually be faster than

00:12:08.840 --> 00:12:11.059
our own ability to process a scene. It's a fundamental

00:12:11.059 --> 00:12:13.600
shift. So if machines can reconstruct reality

00:12:13.600 --> 00:12:17.000
this fast, what happens to truth in video evidence?

00:12:17.379 --> 00:12:19.419
We stop trusting our eyes and start trusting

00:12:19.419 --> 00:12:21.779
digital watermarks. That is a chilling thought.

00:12:22.240 --> 00:12:24.659
We're going to take a quick breather here. We've

00:12:24.659 --> 00:12:26.659
covered the failures of the office intern and

00:12:26.659 --> 00:12:29.559
the superhuman vision of Google. When we come

00:12:29.559 --> 00:12:32.600
back, we are going to look at the Empowered Utility

00:12:32.600 --> 00:12:35.360
Belt, the specific tools you can use right now

00:12:35.360 --> 00:12:38.379
to make your life a little easier. Stay with

00:12:38.379 --> 00:12:41.799
us, and we are back. Okay, let's get tactical.

00:12:42.080 --> 00:12:44.639
We've talked about the big models, the high -level

00:12:44.639 --> 00:12:47.460
concepts, but what about the tools that actually

00:12:47.460 --> 00:12:50.419
save you time on a Tuesday afternoon? We called

00:12:50.419 --> 00:12:53.539
this section the Empowered Utility Belt. Yeah,

00:12:53.580 --> 00:12:55.860
there are some great ones this week. First up

00:12:55.860 --> 00:12:58.639
is something called Claude Cowork. This one is

00:12:58.639 --> 00:13:00.340
interesting because it's not a new app. It's

00:13:00.340 --> 00:13:02.820
more of a workflow, right? Right. It's a 13-minute

00:13:02.820 --> 00:13:05.779
exercise, and it's designed to turn Claude from

00:13:05.779 --> 00:13:08.000
just a chatbot into a real thinking partner.

00:13:08.399 --> 00:13:10.759
It's all about priming the model with your context

00:13:10.759 --> 00:13:13.200
so you don't have to explain your job every single

00:13:13.200 --> 00:13:15.860
time you open a new chat. I love that. It's like

00:13:15.860 --> 00:13:18.200
onboarding your AI colleague one time instead

00:13:18.200 --> 00:13:20.639
of every single morning. What else you got? ChartGen

00:13:20.639 --> 00:13:23.519
AI. This one is for anyone who just drowns in

00:13:23.519 --> 00:13:26.259
data. It takes raw data from pretty much anywhere.

00:13:26.700 --> 00:13:30.600
Facebook ad exports, TikTok analytics, just messy

00:13:30.600 --> 00:13:33.220
Excel sheets. And it turns them into professional

00:13:33.220 --> 00:13:37.320
charts in seconds. So no more fighting with Excel

00:13:37.320 --> 00:13:39.559
pivot tables. No more pivot tables. And

00:13:39.559 --> 00:13:41.960
then there's Locate Store. It's very specific,

00:13:42.039 --> 00:13:44.720
but very cool. You have a Google sheet full of

00:13:44.720 --> 00:13:47.519
addresses. It instantly turns that into an interactive

00:13:47.519 --> 00:13:50.500
map with search filters. That's huge for logistics

00:13:50.500 --> 00:13:52.720
or even just planning a trip. But the one that

00:13:52.720 --> 00:13:55.379
really caught my eye for pure automation is Demonstrate.

00:13:55.850 --> 00:13:57.769
Okay, how does that work? You record a browser

00:13:57.769 --> 00:14:00.769
task one time. So say, log into this portal,

00:14:00.929 --> 00:14:03.289
download the invoice, rename it, and upload it

00:14:03.289 --> 00:14:06.029
to Dropbox. You just do it once, Demonstrate

00:14:06.029 --> 00:14:08.870
records it, and then this is the kicker, it deploys

00:14:08.870 --> 00:14:11.110
it as serverless code. So you don't just get

00:14:11.110 --> 00:14:12.889
a macro, you get a piece of software that runs

00:14:12.889 --> 00:14:14.990
in the cloud for you. Exactly. You are literally

00:14:14.990 --> 00:14:17.970
programming by doing. And finally, we have Calum.

00:14:18.320 --> 00:14:20.740
Right, the AI calendar assistant. But it handles

00:14:20.740 --> 00:14:23.019
the hard stuff like multi -person meetings, shared

00:14:23.019 --> 00:14:25.100
availability, all the real world constraints.

00:14:25.840 --> 00:14:28.460
It's really trying to kill the email ping pong

00:14:28.460 --> 00:14:31.840
of scheduling. Of all of these tools, if you

00:14:31.840 --> 00:14:34.480
had to pick one, which of these actually saves

00:14:34.480 --> 00:14:36.779
you an hour of sleep tonight? Demonstrate. Automating

00:14:36.779 --> 00:14:39.179
browser tasks is the ultimate cure for boredom.

00:14:39.259 --> 00:14:41.080
I am with you on that one. Anything that can

00:14:41.080 --> 00:14:43.100
fill out a web form for me is a friend of mine.

00:14:43.240 --> 00:14:46.960
Amen to that. So let's zoom out. What does all

00:14:46.960 --> 00:14:49.820
of this really mean for you, the listener, sitting

00:14:49.820 --> 00:14:53.580
here in January 2026? I think we're in what you

00:14:53.580 --> 00:14:56.320
could call a gap year. A gap year. Explain that.

00:14:56.500 --> 00:14:59.440
Well, the text -based agents, the brains, they're

00:14:59.440 --> 00:15:02.779
struggling. GPT-5.2, Gemini 3, they're getting

00:15:02.779 --> 00:15:05.639
barely 24% accuracy on real work. The brain

00:15:05.639 --> 00:15:07.659
is still learning how to file paperwork. Okay.

00:15:07.929 --> 00:15:10.269
But while the brain is stumbling, the senses

00:15:10.269 --> 00:15:13.889
are just exploding. Exactly. Google's D4RT vision

00:15:13.889 --> 00:15:16.309
is seeing the world 300 times faster than before.

00:15:17.210 --> 00:15:19.470
Eleven Labs has solved the legal and creative

00:15:19.470 --> 00:15:22.610
puzzle of AI music. So the synthesis here is,

00:15:22.750 --> 00:15:26.570
the AI might not be ready to be your lawyer or

00:15:26.570 --> 00:15:29.210
your project manager just yet, but it can see

00:15:29.210 --> 00:15:31.710
your world and sing your songs better than ever

00:15:31.710 --> 00:15:35.129
before. The brain is lagging, but the eyes and

00:15:35.129 --> 00:15:37.409
ears have become superhuman. That's a really

00:15:37.409 --> 00:15:39.820
powerful place to be. It means we have to stop

00:15:39.820 --> 00:15:42.639
waiting for some magic general manager AI and

00:15:42.639 --> 00:15:45.399
just start using the super sensory AI tools we

00:15:45.399 --> 00:15:47.740
have right now. Use the vision, use the voice,

00:15:47.879 --> 00:15:50.460
use the automation. Don't wait for the reasoning

00:15:50.460 --> 00:15:52.440
to be perfect. And if you want to start somewhere

00:15:52.440 --> 00:15:54.600
practical, I would highly encourage you to try

00:15:54.600 --> 00:15:57.159
that Claude Cowork 13-minute exercise we mentioned.

00:15:57.360 --> 00:15:59.659
It's a small investment that really pays off

00:15:59.659 --> 00:16:02.120
every time you open that chat window. It really

00:16:02.120 --> 00:16:03.799
does change the dynamic. We're going to leave

00:16:03.799 --> 00:16:06.279
you with one final thought to mull over. Yeah,

00:16:06.320 --> 00:16:09.980
we talked about Google's D4RT, how it reconstructs

00:16:09.980 --> 00:16:12.879
a 3D world from a flat video in just five seconds.

00:16:12.980 --> 00:16:15.159
It's incredible. So here's the question. If an

00:16:15.159 --> 00:16:17.940
AI can reconstruct a 3D world from a flat video

00:16:17.940 --> 00:16:21.600
in five seconds, how long until it can reconstruct

00:16:21.600 --> 00:16:23.919
a better version of your workflow than you can?

00:16:24.139 --> 00:16:26.039
Something to think about. Thanks for diving in

00:16:26.039 --> 00:16:27.139
with us. We'll see you next time.
