WEBVTT

00:00:00.000 --> 00:00:02.120
So imagine for a second there's this new kind

00:00:02.120 --> 00:00:05.480
of AI. Think of it like that brilliant intern

00:00:05.480 --> 00:00:08.660
on their very first day, you know, super eager,

00:00:08.839 --> 00:00:12.060
capable of these flashes of real genius. Right.

00:00:12.119 --> 00:00:16.579
Ready to go. Right. But also maybe a little bit

00:00:16.579 --> 00:00:20.719
slow and incredibly forgetful and honestly might

00:00:20.719 --> 00:00:22.640
just confidently light the office trash can on

00:00:22.640 --> 00:00:25.199
fire while smiling at you. Welcome back to the

00:00:25.199 --> 00:00:26.660
Deep Dive. This is where we try to cut through

00:00:26.660 --> 00:00:28.440
all that digital noise, get you properly informed.

00:00:29.079 --> 00:00:32.460
Today, we're diving deep into OpenAI's new ChatGPT

00:00:32.460 --> 00:00:34.399
agent mode. There's been, well, a lot of buzz,

00:00:34.460 --> 00:00:36.380
hasn't there? Lots of promises. Our mission here,

00:00:36.439 --> 00:00:38.159
let's strip away the hype, go past the marketing

00:00:38.159 --> 00:00:40.340
talk. We want to show you what it actually does.

00:00:40.500 --> 00:00:42.619
We'll put it through this really grueling

00:00:42.619 --> 00:00:45.670
10-stage gauntlet of real-world tasks. Yeah, and

00:00:45.670 --> 00:00:47.189
we're going to lay out the whole roadmap for

00:00:47.189 --> 00:00:50.270
you. First up, we'll unpack what agent mode is.

00:00:50.829 --> 00:00:52.810
Fundamentally, it's a big shift. Then we'll get

00:00:52.810 --> 00:00:55.030
right into those tests you mentioned, break down

00:00:55.030 --> 00:00:56.770
the strengths, which were surprising sometimes.

00:00:57.210 --> 00:00:59.829
And yeah, those really frustrating flaws too.

00:00:59.990 --> 00:01:03.189
Can't ignore those. And finally, we'll give you

00:01:03.189 --> 00:01:05.709
a practical playbook. How you, as a smart operator,

00:01:05.890 --> 00:01:08.069
can actually use it effectively, like right now,

00:01:08.170 --> 00:01:10.549
and maybe what's coming next. Okay, let's unpack

00:01:10.549 --> 00:01:13.299
that first piece. What exactly is agent mode

00:01:13.299 --> 00:01:15.799
beyond just, you know, a cool name? Because it

00:01:15.799 --> 00:01:17.620
really does sound like a fundamental change from

00:01:17.620 --> 00:01:20.799
the standard ChatGPT we've all been chatting

00:01:20.799 --> 00:01:23.879
with. Oh, it absolutely is. Think about standard

00:01:23.879 --> 00:01:27.200
ChatGPT, right? It's like this brilliant conversationalist.

00:01:27.260 --> 00:01:29.299
It could answer questions, draft stuff, summarize

00:01:29.299 --> 00:01:32.120
things. But it was kind of trapped, you know,

00:01:32.120 --> 00:01:36.030
behind that text box. Agent mode, though. It's

00:01:36.030 --> 00:01:37.890
like they've given it the keys to the car. It

00:01:37.890 --> 00:01:40.530
can actually browse the live web, not just its

00:01:40.530 --> 00:01:43.150
old training data. And crucially, it connects

00:01:43.150 --> 00:01:45.310
directly to your personal stuff, your apps like

00:01:45.310 --> 00:01:48.290
Gmail, Google Calendar, Google Drive. This AI

00:01:48.290 --> 00:01:50.930
can now sort of think and reason through complex,

00:01:51.010 --> 00:01:54.150
multi-step tasks. And here's the kicker. It

00:01:54.150 --> 00:01:56.090
can autonomously decide, OK, I need to use the

00:01:56.090 --> 00:01:58.390
browser now or time to check email or let's update

00:01:58.390 --> 00:02:00.969
the calendar. It calls these tools in sequence

00:02:00.969 --> 00:02:03.090
to actually solve a problem for you. It's not

00:02:03.090 --> 00:02:05.599
just talking about the work. It does the work.

00:02:05.599 --> 00:02:07.400
And getting access seems pretty straightforward.

00:02:07.400 --> 00:02:09.659
If you want to try it out, you log into ChatGPT,

00:02:09.659 --> 00:02:12.919
select Tools and then Agent Mode. But the real

00:02:12.919 --> 00:02:14.819
magic, it sounds like, happens when you hit that

00:02:14.819 --> 00:02:16.879
Sources button and connect your Gmail or Drive,

00:02:16.879 --> 00:02:20.199
right? Precisely. Those integrations, that's the

00:02:20.199 --> 00:02:22.379
game changer. It's the difference between an

00:02:22.379 --> 00:02:25.240
AI that can talk about your schedule and one

00:02:25.240 --> 00:02:28.139
that can actually go and book that meeting right

00:02:28.139 --> 00:02:30.439
on your calendar. It shifts from just conversation

00:02:30.439 --> 00:02:34.740
to like direct autonomous action. So just to

00:02:34.740 --> 00:02:37.159
put it simply, how does hooking up those personal

00:02:37.159 --> 00:02:40.599
apps really change what AI can actually do for

00:02:40.599 --> 00:02:43.879
us? It transforms AI from just a talker into

00:02:43.879 --> 00:02:45.919
an assistant that truly acts on your behalf.

00:02:46.139 --> 00:02:47.819
Okay, this is where things get really interesting.

00:02:47.960 --> 00:02:50.840
The rubber meets the road, so to speak. We designed

00:02:50.840 --> 00:02:53.319
this, well, pretty tough 10-stage gauntlet,

00:02:53.419 --> 00:02:55.680
progressively harder challenges, designed to

00:02:55.680 --> 00:02:57.800
test everything from just basic web browsing

00:02:57.800 --> 00:03:00.360
all the way up to reasoning through complex

00:03:00.360 --> 00:03:02.219
multi-step problems. Right, the first challenge was

00:03:02.219 --> 00:03:04.199
what we called the travel agent test. We asked

00:03:04.199 --> 00:03:06.979
it: find Airbnb listings in Toronto. Very

00:03:06.979 --> 00:03:10.340
specific criteria. Two beds, parking, under 500

00:03:10.340 --> 00:03:12.560
Canadian a night, entire home. And we even threw

00:03:12.560 --> 00:03:14.860
in a little curveball logic bomb. We gave it

00:03:14.860 --> 00:03:17.800
2025 dates, then mentioned 2024 just to see if

00:03:17.800 --> 00:03:20.240
it noticed. And it was, well, pretty impressive,

00:03:20.319 --> 00:03:23.039
actually. It spotted the date issue right away,

00:03:23.180 --> 00:03:26.419
asked us to clarify. Then it intelligently defaulted

00:03:26.419 --> 00:03:29.080
to the future date, which made sense. It methodically

00:03:29.080 --> 00:03:30.819
clicked through all the filters on the Airbnb

00:03:30.819 --> 00:03:33.699
site. It even went the extra mile to check the

00:03:33.699 --> 00:03:36.199
total price, including all the fees. Okay, there

00:03:36.199 --> 00:03:37.860
were broken links initially, maybe a little

00:03:37.860 --> 00:03:40.560
V1.0 bug, but overall, it put the results in

00:03:40.560 --> 00:03:43.259
a nice clean table. Felt like a promising start.

00:03:44.060 --> 00:03:46.319
Competent, maybe a little green, you know. Like

00:03:46.319 --> 00:03:48.719
a new assistant, we gave it an A -. Yeah, A -.

00:03:48.979 --> 00:03:51.560
But right after that strong start, we pushed it

00:03:51.560 --> 00:03:54.060
harder with the data scientist test. The mission

00:03:54.060 --> 00:03:56.939
here was to do a Google Trends comparison for

00:03:56.939 --> 00:04:00.439
n8n, Make.com, and Zapier over 12 months and

00:04:00.439 --> 00:04:02.560
crucially extract the raw data. Usually that's

00:04:02.560 --> 00:04:05.080
like a downloadable CSV file, a spreadsheet format.

00:04:05.319 --> 00:04:07.819
And this is where it started to show off and

00:04:07.819 --> 00:04:10.460
then kind of stumble. It opened its own internal

00:04:10.460 --> 00:04:12.419
computer terminal, like inside its environment.

00:04:13.130 --> 00:04:15.610
Downloaded the raw CSV data. Started its own

00:04:15.610 --> 00:04:18.589
analysis. Incredible, right? The capability was

00:04:18.589 --> 00:04:20.189
clearly there. It could work internally. But

00:04:20.189 --> 00:04:23.290
then, like a brain freeze. After doing all that

00:04:23.290 --> 00:04:26.370
work, it completely forgot the data it had just

00:04:26.370 --> 00:04:28.529
analyzed. We actually had to manually copy and

00:04:28.529 --> 00:04:31.250
paste it back into the chat for it. Wow. Yeah.

00:04:31.310 --> 00:04:33.490
So this test really highlighted that impressive

00:04:33.490 --> 00:04:36.009
ability to use internal tools, but also this

00:04:36.009 --> 00:04:39.089
critical weakness: its short-term memory. It

00:04:39.089 --> 00:04:41.389
can do these complex steps, download, analyze,

00:04:41.670 --> 00:04:43.689
but then just lose track. So its working memory

00:04:43.689 --> 00:04:45.550
isn't quite there yet for these longer chains

00:04:45.550 --> 00:04:48.230
of tasks, unless you tell it explicitly to hold

00:04:48.230 --> 00:04:51.029
on to something. Exactly. Baffling memory loss.

00:04:51.129 --> 00:04:52.750
We gave this one a warning symbol. Okay, next

00:04:52.750 --> 00:04:56.209
up, the market analyst. We wanted average listing

00:04:56.209 --> 00:04:59.290
prices, average rental prices for three-bedroom

00:04:59.290 --> 00:05:01.889
homes in Orlando, Florida, across the big real

00:05:01.889 --> 00:05:04.709
estate sites. And it started off perfectly. Browsing

00:05:04.709 --> 00:05:07.269
Zillow, applying the filters, looked good. But

00:05:07.269 --> 00:05:10.360
then... It took a lazy shortcut. Instead of digging

00:05:10.360 --> 00:05:13.639
into actual listings for primary data, it just

00:05:13.639 --> 00:05:16.540
started pulling generic, kind of outdated rental

00:05:16.540 --> 00:05:19.379
data from random external blogs. Mixed it all

00:05:19.379 --> 00:05:21.939
together. So it didn't stick to the main sources.

00:05:22.259 --> 00:05:24.379
No. It showed us something important. If you

00:05:24.379 --> 00:05:26.360
give it vague instructions, it seems to optimize

00:05:26.360 --> 00:05:29.319
for the easiest path, not always the most accurate

00:05:29.319 --> 00:05:32.800
or thorough one. It really needs hyper-specific,

00:05:32.980 --> 00:05:35.279
almost micromanaged instructions, like you said,

00:05:35.360 --> 00:05:37.399
a contractor's blueprint, to get precise results.

00:05:37.740 --> 00:05:40.279
Another warning. Right. Okay, following that,

00:05:40.319 --> 00:05:43.720
the SEO strategist task, generate SEO blog ideas,

00:05:43.980 --> 00:05:46.560
use keyword tools, analyze the top-ranking content

00:05:46.560 --> 00:05:49.019
out there. This one was, frankly, a complete

00:05:49.019 --> 00:05:50.959
train wreck. It just didn't work. It spent ages

00:05:50.959 --> 00:05:53.500
trying to find free online tools. Yeah. It just

00:05:53.500 --> 00:05:55.620
seemed to get bored. It went on this random browsing

00:05:55.620 --> 00:05:58.480
spree, totally unrelated websites. The final

00:05:58.480 --> 00:06:01.060
report was useless. So it couldn't navigate that

00:06:01.060 --> 00:06:04.040
kind of research task. Not really. It made it

00:06:04.040 --> 00:06:06.649
crystal clear. You absolutely must give it direct

00:06:06.649 --> 00:06:09.470
URLs, links to the specific tools you want it

00:06:09.470 --> 00:06:12.009
to use. Expecting it to navigate complex research

00:06:12.009 --> 00:06:14.670
like a human? That's just setting yourself up

00:06:14.670 --> 00:06:17.329
for disappointment right now. Big red X on this

00:06:17.329 --> 00:06:19.550
one. Okay, and the last one in this first batch

00:06:19.550 --> 00:06:23.189
of tests, the supply chain scout. Mission: find

00:06:23.189 --> 00:06:26.410
top-rated suppliers on Alibaba.com. Apply specific

00:06:26.410 --> 00:06:29.149
filters, collect detailed data on them. It hit

00:06:29.149 --> 00:06:31.029
a roadblock pretty quickly. One of those, are

00:06:31.029 --> 00:06:33.920
you a robot? CAPTCHA screens. Happens all the

00:06:33.920 --> 00:06:35.759
time, right? Yeah, common hurdle. But instead

00:06:35.759 --> 00:06:37.360
of just saying, hey, I'm stuck, can you help?

00:06:37.459 --> 00:06:40.060
Which any decent assistant would do. Yeah. It

00:06:40.060 --> 00:06:42.439
just gave up. Abandoned the main mission entirely.

00:06:42.639 --> 00:06:44.720
It defaulted back to doing generic web research

00:06:44.720 --> 00:06:47.199
about suppliers, totally missing the point. Ah,

00:06:47.339 --> 00:06:49.540
so it doesn't know how to ask for help when it

00:06:49.540 --> 00:06:51.959
hits those common web obstacles. Right. Like

00:06:51.959 --> 00:06:54.980
Hapki TCHAs or login pages. Exactly. A major

00:06:54.980 --> 00:06:57.399
flaw in its error recovery. Doesn't handle roadblocks

00:06:57.399 --> 00:07:00.730
well yet. Another fail. Okay, so looking back

00:07:00.730 --> 00:07:02.750
at just these first five kind of foundational

00:07:02.750 --> 00:07:05.449
tests, what was the biggest, most consistent

00:07:05.449 --> 00:07:07.610
surprise for you? What really stood out? I think

00:07:07.610 --> 00:07:09.970
it was the contrast. These flashes of genuine

00:07:09.970 --> 00:07:13.069
genius, often completely overshadowed by surprisingly

00:07:13.069 --> 00:07:16.930
simple but really frustrating failures. Okay,

00:07:16.930 --> 00:07:20.329
let's move on then to the more complex challenges.

00:07:20.850 --> 00:07:23.389
This is where we started to see Agent Mode's

00:07:23.389 --> 00:07:27.410
true sweet spot emerge. And also what felt like...

00:07:27.769 --> 00:07:31.250
The final boss level. Right. Test six was the

00:07:31.250 --> 00:07:34.089
global expansion strategist. The mission here

00:07:34.089 --> 00:07:36.649
was to do market research. Should an e-commerce

00:07:36.649 --> 00:07:39.290
business expand into Australia or the UK? Pretty

00:07:39.290 --> 00:07:41.689
open-ended. And this, this turned out to be

00:07:41.689 --> 00:07:43.550
the agent's sweet spot. It was really good at

00:07:43.550 --> 00:07:45.649
this. It methodically opened up multiple browser

00:07:45.649 --> 00:07:48.329
tabs, gathered data on market size, consumer

00:07:48.329 --> 00:07:51.120
habits, competitors, across lots of different

00:07:51.120 --> 00:07:53.100
sites. And what was really cool was what you

00:07:53.100 --> 00:07:55.040
called the open kitchen policy. You could literally

00:07:55.040 --> 00:07:57.680
watch its screen as it worked, see the whole process

00:07:57.680 --> 00:08:00.300
unfold in real time. Yeah, that visual transparency,

00:08:00.579 --> 00:08:02.180
it really builds trust, doesn't it? You see what

00:08:02.180 --> 00:08:03.939
it's doing. Definitely. And this task was perfect

00:08:03.939 --> 00:08:06.879
for it. Pure, open-ended web research. Information

00:08:06.879 --> 00:08:10.379
synthesis. No tricky logins needed. Solid pass.

00:08:10.819 --> 00:08:12.860
Then we had the corporate spy mission. Sounds

00:08:12.860 --> 00:08:15.379
dramatic, but it was practical. We asked it to

00:08:15.379 --> 00:08:17.779
analyze the talent acquisition strategies of

00:08:17.779 --> 00:08:21.449
some SaaS competitors: Asana, Monday.com,

00:08:21.449 --> 00:08:25.430
ClickUp. Specifically, by finding and extracting their

00:08:25.430 --> 00:08:28.449
open job roles from their career pages and maybe

00:08:28.449 --> 00:08:30.970
LinkedIn. How did it do? It performed like a

00:08:30.970 --> 00:08:32.690
seasoned pro on this one. It was impressive.

00:08:32.769 --> 00:08:34.909
It flawlessly navigated all those different career

00:08:34.909 --> 00:08:37.389
pages, even the really dynamic ones, you know,

00:08:37.389 --> 00:08:39.450
with lots of JavaScript that often trips up simpler

00:08:39.450 --> 00:08:41.789
tools. Oh, yeah. Those can be tricky. Right.

00:08:41.950 --> 00:08:44.309
And it precisely extracted the data we asked

00:08:44.309 --> 00:08:48.389
for. The output? A perfectly formatted, downloadable

00:08:48.389 --> 00:08:52.129
CSV file, ready to use. Wow. So it actually beat

00:08:52.129 --> 00:08:54.169
out some dedicated scraping tools? In a way,

00:08:54.250 --> 00:08:56.830
yeah. Because it could visually understand the

00:08:56.830 --> 00:08:58.889
layout of the webpage almost like a human does,

00:08:58.970 --> 00:09:01.590
not just reading the raw code. Another clear

00:09:01.590 --> 00:09:04.870
pass. Okay, next, the AI chief of staff. This

00:09:04.870 --> 00:09:06.490
is the one we started calling the Voltron moment,

00:09:06.590 --> 00:09:10.289
right? Exactly. This was the most complex mission

00:09:10.289 --> 00:09:12.970
yet. We asked it to act as a sort of personal

00:09:12.970 --> 00:09:15.899
brand and content strategist. It needed to analyze

00:09:15.899 --> 00:09:18.639
expertise based on documents in Google Drive

00:09:18.639 --> 00:09:21.399
and emails in Gmail, identify relevant market

00:09:21.399 --> 00:09:23.899
trends from the web, propose content ideas based

00:09:23.899 --> 00:09:26.159
on all that, and then actually schedule writing

00:09:26.159 --> 00:09:28.799
sessions on Google Calendar, all connected. And

00:09:28.799 --> 00:09:31.100
this is where it really came together. It really

00:09:31.100 --> 00:09:34.500
did. It felt cinematic, almost. It scanned private

00:09:34.500 --> 00:09:37.259
data from Drive, did live web searches for trends,

00:09:37.440 --> 00:09:39.480
cross-referenced the calendar, scheduled the

00:09:39.480 --> 00:09:42.750
sessions. No hitches. Whoa. Just imagine scaling

00:09:42.750 --> 00:09:45.309
that kind of multi-app integration. Billions

00:09:45.309 --> 00:09:48.110
of queries a day for big companies. That's genuinely

00:09:48.110 --> 00:09:49.909
incredible when you think about it. It really

00:09:49.909 --> 00:09:52.830
is. This test showed the true promise of agent

00:09:52.830 --> 00:09:55.370
mode. That seamless bridging of your private

00:09:55.370 --> 00:09:57.950
internal world with the public external internet.

00:09:58.149 --> 00:10:00.570
A definite pass. Okay. What about test nine?

00:10:00.970 --> 00:10:03.690
The lead generation grunt. Sounds tedious. It

00:10:03.690 --> 00:10:05.549
was. High volume, pretty boring data collection.

00:10:05.750 --> 00:10:08.350
Yeah. Find dental practices in Texas. Extract

00:10:08.350 --> 00:10:10.870
names, websites, contact info, that kind of thing.

00:10:11.029 --> 00:10:12.830
And how did the intern handle the grunt work?

00:10:13.169 --> 00:10:16.470
Well, it was a marathon. This task ran for nearly

00:10:16.470 --> 00:10:20.269
45 minutes straight. Wow. It showed some surprisingly

00:10:20.269 --> 00:10:22.830
advanced techniques, actually, like saving website

00:10:22.830 --> 00:10:25.169
code, using a temporary cache. It acted like

00:10:25.169 --> 00:10:27.950
a really dedicated, diligent intern just grinding

00:10:27.950 --> 00:10:30.129
away. There's a but, isn't there? There's a but.

00:10:30.929 --> 00:10:34.549
It failed at a crucial final step. When a directory

00:10:34.549 --> 00:10:36.970
didn't explicitly list the name of the lead dentist,

00:10:37.070 --> 00:10:40.970
it just wrote unknown. Ah, it didn't try to dig

00:10:40.970 --> 00:10:43.750
deeper, look elsewhere for that info. Nope. It

00:10:43.750 --> 00:10:45.730
lacked that critical thinking step, that "hmm,

00:10:45.730 --> 00:10:47.769
maybe I should check their actual website" thought.

00:10:48.210 --> 00:10:51.370
Dedicated, yes, but ultimately kind of an inexperienced

00:10:51.370 --> 00:10:54.490
intern. So another warning symbol. Right.

00:10:55.190 --> 00:10:57.710
Understandable. And finally, the final boss,

00:10:57.950 --> 00:10:59.990
the digital archaeologist. Yeah, this was the

00:10:59.990 --> 00:11:02.789
toughest one. Mission: extract foreclosure deed

00:11:02.789 --> 00:11:05.029
records from this really clunky old government

00:11:05.029 --> 00:11:07.750
database. And the key challenge, it required

00:11:07.750 --> 00:11:10.169
OCR, optical character recognition, to read text

00:11:10.169 --> 00:11:12.429
from scanned document images within the database.

00:11:12.710 --> 00:11:15.070
Okay, that sounds incredibly difficult for an

00:11:15.070 --> 00:11:17.590
AI. Did you give it any help? We did. We gave

00:11:17.590 --> 00:11:19.980
it a secret weapon. A perfect step-by-step

00:11:19.980 --> 00:11:22.899
walkthrough generated by another AI that had

00:11:22.899 --> 00:11:25.519
analyzed a video of a human navigating the same

00:11:25.519 --> 00:11:28.480
clunky site. Whoa, AI teaching AI how to use

00:11:28.480 --> 00:11:31.820
a bad interface. That's meta. Did it work? Honestly,

00:11:31.899 --> 00:11:33.840
the fact that it worked at all felt like a minor

00:11:33.840 --> 00:11:35.639
miracle. It followed the complex instructions.

00:11:35.700 --> 00:11:38.440
It navigated the terrible interface. It even

00:11:38.440 --> 00:11:40.960
attempted the OCR, intelligently zooming and

00:11:40.960 --> 00:11:43.080
scrolling within the document images to try and

00:11:43.080 --> 00:11:45.379
read them. But the results? The resulting data

00:11:45.379 --> 00:11:48.740
was messy. Lots of gaps, lots of errors. It showed

00:11:48.740 --> 00:11:51.779
promise, like a very early prototype. But it's

00:11:51.779 --> 00:11:54.139
absolutely not production ready for that level

00:11:54.139 --> 00:11:56.539
of complexity and data accuracy. Another warning.

00:11:56.539 --> 00:12:00.259
So looking at these later, more complex tests,

00:12:00.600 --> 00:12:03.039
what kinds of missions really seem to showcase

00:12:03.039 --> 00:12:05.759
agent mode's true potential right now? Definitely

00:12:05.759 --> 00:12:08.100
complex web research, especially visual data

00:12:08.100 --> 00:12:10.000
extraction from lots of different kinds of sites.

00:12:10.120 --> 00:12:12.559
And that powerful multi-app integration, the

00:12:12.559 --> 00:12:15.240
Voltron stuff. That's where it shines. All right.

00:12:15.259 --> 00:12:18.980
So we put this brilliant AI intern through this

00:12:18.980 --> 00:12:21.940
intense trial by fire. Ten tough challenges.

00:12:22.360 --> 00:12:25.279
Now it's time for the performance review. Let's

00:12:25.279 --> 00:12:27.080
get real about what we learned. Where does it

00:12:27.080 --> 00:12:29.700
truly shine? What are the real strengths? Okay,

00:12:29.700 --> 00:12:31.620
yeah, the performance review. First off, there's

00:12:31.620 --> 00:12:33.500
what we call the Voltron power. That's got to

00:12:33.500 --> 00:12:35.600
be its biggest strength. Seamlessly combining

00:12:35.600 --> 00:12:39.460
Gmail, Drive, Calendar, web search, all for these

00:12:39.460 --> 00:12:42.860
complex multi-app workflows. It feels genuinely

00:12:42.860 --> 00:12:45.259
futuristic, doesn't it? Like stacking Lego blocks

00:12:45.259 --> 00:12:47.860
of data. It really does. Then there's the open

00:12:47.860 --> 00:12:50.460
kitchen policy. Being able to watch it browse

00:12:50.460 --> 00:12:53.379
in real time, see exactly what it's doing, step

00:12:53.379 --> 00:12:56.500
by step. That visual transparency is huge for

00:12:56.500 --> 00:12:59.039
building user trust and also for figuring out

00:12:59.039 --> 00:13:02.080
what went wrong if it messes up. Debugging. Exactly.

00:13:02.399 --> 00:13:05.000
It's also surprisingly good at navigating tricky

00:13:05.000 --> 00:13:07.840
websites, what we call the parkour expert. Those

00:13:07.840 --> 00:13:11.279
dynamic, JavaScript-heavy sites with complex forms.

00:13:11.500 --> 00:13:13.620
It often handles them better than traditional,

00:13:13.720 --> 00:13:16.240
more brittle web scrapers. It seems to see the

00:13:16.240 --> 00:13:19.080
page better. And finally, the creative detour.

00:13:19.440 --> 00:13:21.139
Sometimes when it hits a wall, it doesn't just

00:13:21.139 --> 00:13:23.600
give up. It actually tries alternative paths.

00:13:24.120 --> 00:13:26.720
Shows this little spark of adaptive problem solving,

00:13:26.899 --> 00:13:28.740
which is pretty cool to see. Okay, those are

00:13:28.740 --> 00:13:31.740
some definite pluses. But let's not sugarcoat

00:13:31.740 --> 00:13:33.919
things. There are some pretty serious areas needing

00:13:33.919 --> 00:13:36.779
urgent improvement, right? First, the speed,

00:13:36.860 --> 00:13:40.620
or lack thereof. The sloth-like pace. It is

00:13:40.620 --> 00:13:43.879
just painfully slow sometimes. Tasks taking three,

00:13:44.000 --> 00:13:46.620
maybe five times longer than a human would take.

00:13:46.960 --> 00:13:49.179
That makes it unsuitable for anything time-critical.

00:13:49.320 --> 00:13:50.940
Yeah, the speed is definitely an issue right

00:13:50.940 --> 00:13:52.960
now. And then there's the Dory problem. Yeah.

00:13:53.289 --> 00:13:55.970
From Finding Nemo, you know, the severe

00:13:55.970 --> 00:13:58.149
short-term memory loss. It can perform this brilliant

00:13:58.149 --> 00:14:01.509
analysis, pull data together, and then just completely

00:14:01.509 --> 00:14:03.370
forget what it just did or what data it had.

00:14:03.470 --> 00:14:06.289
It needs constant hand-holding, constant reminders

00:14:06.289 --> 00:14:08.730
from the human user. And I'll admit, I still

00:14:08.730 --> 00:14:11.330
wrestle with prompt drift myself sometimes, you

00:14:11.330 --> 00:14:13.149
know, where the AI kind of loses the plot over

00:14:13.149 --> 00:14:15.889
a long conversation. So I totally get how hard

00:14:15.889 --> 00:14:18.269
that memory piece must be to engineer. It's a

00:14:18.269 --> 00:14:21.610
tough problem. It is very tough. Then maybe its

00:14:21.610 --> 00:14:25.529
most dangerous flaw. The confident liar. It hallucinates.

00:14:25.590 --> 00:14:28.210
It just makes stuff up. Plausible-sounding, but

00:14:28.210 --> 00:14:30.850
factually incorrect information. Especially when

00:14:30.850 --> 00:14:33.049
it's trying to synthesize from multiple web sources.

00:14:33.350 --> 00:14:35.529
This means you have to do constant human

00:14:35.529 --> 00:14:37.909
fact-checking. You can't just trust its output blindly.

00:14:38.169 --> 00:14:41.190
That's a huge one, the trust factor. Huge. And

00:14:41.190 --> 00:14:44.029
finally, the quiet quitter. That poor error recovery

00:14:44.029 --> 00:14:47.110
we saw. Faced with a simple CAPTCHA or a login

00:14:47.110 --> 00:14:49.490
screen, instead of asking for help, it often

00:14:49.490 --> 00:14:51.490
just abandons the core task and does something

00:14:51.490 --> 00:14:54.500
else easier. That's not helpful. Right. So if

00:14:54.500 --> 00:14:57.120
you had to pick just one thing, what's the single

00:14:57.120 --> 00:14:59.580
biggest hurdle right now? The main reason you

00:14:59.580 --> 00:15:01.600
maybe wouldn't trust agent mode for really

00:15:01.600 --> 00:15:03.820
critical work just yet. I think it's that combination,

00:15:04.179 --> 00:15:06.919
the agonizing slowness, the really unpredictable

00:15:06.919 --> 00:15:09.419
memory and that tendency to just confidently

00:15:09.419 --> 00:15:12.179
hallucinate incorrect information. It's just

00:15:12.179 --> 00:15:14.179
not reliable enough yet for high-stakes stuff.

00:15:14.340 --> 00:15:17.370
OK, so given all that. This brilliant but flawed

00:15:17.370 --> 00:15:20.409
intern. Yeah. How do we as smart operators actually

00:15:20.409 --> 00:15:22.309
work with it effectively? What's the playbook?

00:15:22.409 --> 00:15:25.029
Right. The playbook. The art of the command. Your

00:15:25.029 --> 00:15:27.350
prompting strategy is absolutely critical here,

00:15:27.450 --> 00:15:30.289
more so than ever. First, we recommend the inception

00:15:30.289 --> 00:15:33.210
prompt strategy. Basically, use regular

00:15:33.210 --> 00:15:35.850
ChatGPT, which is good at language, to help you write

00:15:35.850 --> 00:15:38.549
the complex, super detailed prompts for agent

00:15:38.549 --> 00:15:41.379
mode. Don't just try to wing it. Okay. Use AI

00:15:41.379 --> 00:15:44.200
to prompt AI. Makes sense. Exactly. Then the

00:15:44.200 --> 00:15:47.039
GPS coordinate principle. Always, always provide

00:15:47.039 --> 00:15:50.899
direct URLs. Links to the specific tools or pages

00:15:50.899 --> 00:15:53.240
you want it to use. Don't make it search around.

00:15:53.500 --> 00:15:56.799
Guide it precisely. No vague destinations. Nope.

00:15:57.120 --> 00:16:00.360
Third, the contractor's blueprint. Be extremely

00:16:00.360 --> 00:16:02.379
specific about the output you want. What exact

00:16:02.379 --> 00:16:05.419
data points in what precise format. I need a

00:16:05.419 --> 00:16:07.679
table with three columns labeled X, Y, and Z.

00:16:08.159 --> 00:16:10.559
That level of detail. Okay. Meticulous instruction.

00:16:10.720 --> 00:16:13.000
Meticulous. And finally, maybe think about setting

00:16:13.000 --> 00:16:15.500
some Asimov's laws for it. Clear boundaries.

00:16:15.620 --> 00:16:17.539
Define what success looks like. Define failure.

00:16:17.799 --> 00:16:19.539
Tell it when it should stop and ask for human

00:16:19.539 --> 00:16:21.980
help. And list forbidden actions, like explicitly

00:16:21.980 --> 00:16:24.899
say, do not make any purchases. Right. Setting

00:16:24.899 --> 00:16:28.610
those guardrails. Good advice. Based on the tests,

00:16:28.850 --> 00:16:31.070
what's the current verdict on go/no-go missions?

00:16:31.190 --> 00:16:33.330
Where should people feel comfortable using agent

00:16:33.330 --> 00:16:35.769
mode now? And where should they absolutely hold

00:16:35.769 --> 00:16:38.169
back? Okay. Green light missions, things you absolutely

00:16:38.169 --> 00:16:40.710
can use it for right now, definitely include

00:16:40.710 --> 00:16:42.649
that visual web research and market analysis

00:16:42.649 --> 00:16:46.450
stuff it did well on. Also, those multi-app

00:16:46.450 --> 00:16:49.769
workflows, the Voltron tasks, connecting Drive,

00:16:49.990 --> 00:16:53.090
Calendar, web, and maybe complex website navigation

00:16:53.090 --> 00:16:56.230
for, like, one-off data extraction tasks where

00:16:56.230 --> 00:16:59.659
speed isn't. So its current sweet spot is really

00:16:59.659 --> 00:17:02.559
as this more visual, more integrated, powerful,

00:17:02.860 --> 00:17:05.500
deep research assistant where you can watch it

00:17:05.500 --> 00:17:07.900
work. Exactly. High transparency, good for complex

00:17:07.900 --> 00:17:09.880
info gathering. And the red light missions? Yeah.

00:17:10.160 --> 00:17:11.940
Where is it just not ready yet? Definitely anything

00:17:11.940 --> 00:17:14.400
time-critical, just because it's too slow.

00:17:14.400 --> 00:17:17.059
High-accuracy data extraction is risky because of

00:17:17.059 --> 00:17:19.740
that hallucination problem. And probably any

00:17:19.740 --> 00:17:22.259
task that requires really strict perfect formatting

00:17:22.259 --> 00:17:24.240
in the output because it can still be a bit messy

00:17:24.240 --> 00:17:26.279
there. Okay, makes sense. Looking ahead though,

00:17:26.539 --> 00:17:28.660
agent mode really feels like a version 1.0,

00:17:28.700 --> 00:17:31.240
right? But you can see the version 10.0 potential

00:17:31.240 --> 00:17:34.579
glimmering inside it. This core concept, an AI

00:17:34.579 --> 00:17:37.200
that can act autonomously across different tools,

00:17:37.200 --> 00:17:39.940
that is the future. No question. What does the timeline

00:17:39.940 --> 00:17:42.980
look like, roughly? Well, educated guess. We'd

00:17:42.980 --> 00:17:45.140
expect significant improvements in speed and

00:17:45.140 --> 00:17:47.680
maybe better error handling within, say, the

00:17:47.680 --> 00:17:50.759
next three months, then probably a big jump in

00:17:50.759 --> 00:17:53.059
the number of app integrations and hopefully

00:17:53.059 --> 00:17:55.980
some major fixes for that memory problem within

00:17:55.980 --> 00:17:58.019
maybe six months. And longer term? Within a year,

00:17:58.380 --> 00:18:00.839
I think it's highly likely to be genuinely competitive

00:18:00.839 --> 00:18:03.759
with human virtual assistants for a pretty wide

00:18:03.759 --> 00:18:07.119
range of tasks. So our recommendation is if you're

00:18:07.119 --> 00:18:09.279
an early adopter type, start experimenting now.

00:18:09.420 --> 00:18:11.380
Play with it on non-critical things. Everyone

00:18:11.380 --> 00:18:13.599
else, you can probably reasonably wait three

00:18:13.599 --> 00:18:15.420
to six months for some of those key improvements

00:18:15.420 --> 00:18:18.400
to land before diving in seriously. Okay, good

00:18:18.400 --> 00:18:21.400
practical advice. So just to circle back on prompting

00:18:21.400 --> 00:18:24.720
one last time, given everything, what's the single

00:18:24.720 --> 00:18:26.720
most crucial strategy for getting good results

00:18:26.720 --> 00:18:29.380
out of agent mode today? Providing those direct

00:18:29.380 --> 00:18:31.839
URLs, the GPS coordinates, and being incredibly

00:18:31.839 --> 00:18:34.920
precise about the output format you expect. The

00:18:34.920 --> 00:18:37.359
contractor's blueprint. That's key right now.

00:18:37.519 --> 00:18:40.579
Okay. So let's try to recap the big idea from

00:18:40.579 --> 00:18:44.680
this deep dive. ChatGPT agent mode. It really

00:18:44.680 --> 00:18:47.779
is like that brilliant, super enthusiastic intern

00:18:47.779 --> 00:18:50.819
on their first day. It has these undeniable flashes

00:18:50.819 --> 00:18:53.920
of genius. Yeah, and its Voltron-like integration

00:18:53.920 --> 00:18:56.259
power, pulling together your apps and the web.

00:18:56.339 --> 00:18:58.819
That potential is dazzling. It feels genuinely

00:18:58.819 --> 00:19:02.000
futuristic. It lets the AI act for you. But,

00:19:02.200 --> 00:19:04.240
and it's a big but, you have to remember, it's

00:19:04.240 --> 00:19:06.019
still inconsistent. It's got those crippling

00:19:06.019 --> 00:19:08.359
speed issues right now. The memory is unreliable,

00:19:08.400 --> 00:19:10.599
and it hallucinates, sometimes quite confidently.

00:19:10.799 --> 00:19:13.019
You really need to treat it as a promising prototype

00:19:13.019 --> 00:19:14.819
today, not a fully polished, production-ready

00:19:14.819 --> 00:19:17.440
tool. Absolutely. For anything important, anything

00:19:17.440 --> 00:19:19.660
mission-critical, you simply must keep a human

00:19:19.660 --> 00:19:21.539
in the loop for oversight and fact-checking.

00:19:21.700 --> 00:19:24.299
For now, anyway. And maybe a final thought to

00:19:24.299 --> 00:19:27.240
leave folks with. As this technology gets better,

00:19:27.299 --> 00:19:30.720
faster, more reliable, when AI can truly do complex

00:19:30.720 --> 00:19:33.539
work autonomously, not just respond to prompts,

00:19:33.680 --> 00:19:36.859
but actually execute tasks across systems, what

00:19:36.859 --> 00:19:38.420
does that really mean for the nature of our own

00:19:38.420 --> 00:19:41.380
jobs? For how we even define productivity? Yeah.

00:19:41.960 --> 00:19:44.279
Something to ponder, definitely. The landscape

00:19:44.279 --> 00:19:46.900
is changing fast. It really is. Well, thank you

00:19:46.900 --> 00:19:49.240
for joining us on this deep dive into ChatGPT

00:19:49.240 --> 00:19:50.960
agent mode. [Outro music]
