WEBVTT

00:00:00.000 --> 00:00:03.299
So there's been a lot of buzz recently around

00:00:03.299 --> 00:00:06.639
OpenAI's latest thing, the ChatGPT agent. People

00:00:06.639 --> 00:00:09.640
are calling it a profound shift, like suggesting

00:00:09.640 --> 00:00:12.019
an AI that doesn't just respond, but actually

00:00:12.019 --> 00:00:15.099
acts across your digital life. Welcome to the

00:00:15.099 --> 00:00:17.940
deep dive, everyone. Today, yeah, we're plunging

00:00:17.940 --> 00:00:21.300
into this truly fascinating world of AI agents.

00:00:22.000 --> 00:00:24.600
Our mission really is to unpack what this new

00:00:24.600 --> 00:00:27.179
ChatGPT agent actually means for you. And also

00:00:27.179 --> 00:00:29.460
look at some other big advancements that are

00:00:29.460 --> 00:00:31.760
fundamentally reshaping how we interact with

00:00:31.760 --> 00:00:34.200
tech. Right. We'll explore its core capabilities,

00:00:34.719 --> 00:00:37.200
how it performs in the, let's say, the messy real

00:00:37.200 --> 00:00:40.399
world, how it stacks up against competitors and

00:00:40.399 --> 00:00:42.840
really what does all this mean for your daily

00:00:42.840 --> 00:00:45.340
tasks, your future workflows. And our goal, as

00:00:45.340 --> 00:00:47.740
always, is just a calm, curious, sort of down-to-earth

00:00:47.740 --> 00:00:49.359
look at this landscape. It's moving

00:00:49.359 --> 00:00:52.340
so fast. OK, let's get into it then. Sam Altman,

00:00:52.520 --> 00:00:56.000
OpenAI CEO, he called this agent a banger. What

00:00:56.000 --> 00:00:58.619
exactly is this agent? Beyond the hype, why is

00:00:58.619 --> 00:01:02.100
it such a big deal? Well, I think what really

00:01:02.100 --> 00:01:04.920
stands out is that Agent isn't just another standalone

00:01:04.920 --> 00:01:08.319
model. It's not like just a new version of ChatGPT.

00:01:08.340 --> 00:01:12.599
OK. It's a seamless integration. It's really

00:01:12.599 --> 00:01:15.040
a redefinition of what an AI assistant can be.

00:01:15.280 --> 00:01:18.459
OpenAI showed it off on July 17th. And it's this

00:01:18.459 --> 00:01:20.719
deeply integrated system, pulling together a

00:01:20.719 --> 00:01:23.659
bunch of powerful tools. So it's about combining

00:01:23.659 --> 00:01:28.859
these sort of individual powers into one cohesive

00:01:28.859 --> 00:01:31.019
brain. Precisely. Yeah. Think of it like bringing

00:01:31.019 --> 00:01:32.840
together tools that used to be separate, but

00:01:32.840 --> 00:01:36.790
now they're under one... So first, it has this

00:01:36.790 --> 00:01:38.969
deep research capability. It accesses the internet,

00:01:39.310 --> 00:01:41.569
spends real time kind of synthesizing vast amounts

00:01:41.569 --> 00:01:44.250
of info, offers more depth, more nuance than

00:01:44.250 --> 00:01:47.030
older AIs. Second, and this is a huge leap, it

00:01:47.030 --> 00:01:49.109
has computer use, or what they call Operator.

00:01:49.170 --> 00:01:51.250
Operator. This means it literally controls a

00:01:51.250 --> 00:01:53.829
computer like a human would. Clicking, scrolling,

00:01:54.010 --> 00:01:57.650
filling forms, navigating complex websites. Past

00:01:57.650 --> 00:02:00.430
versions of this, like AI-enhanced RPA, that's

00:02:00.430 --> 00:02:02.969
robotic process automation. Right, bots mimicking

00:02:02.969 --> 00:02:05.099
clicks. Yeah, they were often just too unstable,

00:02:05.319 --> 00:02:08.069
too brittle. Agent seems to have made significant

00:02:08.069 --> 00:02:10.050
improvements there. So it's not just searching

00:02:10.050 --> 00:02:13.050
the web, it's actually using a browser, like

00:02:13.050 --> 00:02:15.409
interacting with sites dynamically, that feels

00:02:15.409 --> 00:02:17.569
fundamentally different. Exactly. And it also

00:02:17.569 --> 00:02:20.629
has robust code execution. It can write and run

00:02:20.629 --> 00:02:23.909
code, mostly Python, in a secure sandbox environment.

00:02:23.969 --> 00:02:26.349
Okay. And then it uses the results for complex

00:02:26.349 --> 00:02:29.830
requests, like analyzing big data sets or generating

00:02:29.830 --> 00:02:32.610
charts right there. And images too! Yeah. Finally,

00:02:32.650 --> 00:02:35.310
it's directly integrated with DALL-E 3 for image

00:02:35.310 --> 00:02:38.319
generation. It can create illustrations, charts,

00:02:38.879 --> 00:02:40.979
maybe artwork based on its research or analysis,

00:02:41.460 --> 00:02:43.479
all without you leaving the conversation. That

00:02:43.479 --> 00:02:46.120
flexible coordination does sound incredibly powerful.

00:02:46.340 --> 00:02:48.900
But for a lot of listeners, the word autonomous

00:02:48.900 --> 00:02:51.879
might raise some flags, right, given past AI

00:02:51.879 --> 00:02:54.539
issues. True. How truly autonomous is it right

00:02:54.539 --> 00:02:57.419
now? And what guardrails are there to stop it

00:02:57.419 --> 00:03:00.500
from going off the rails or making big mistakes?

00:03:00.659 --> 00:03:02.879
It's definitely a step closer to autonomous.

00:03:03.400 --> 00:03:06.110
The real power isn't just one tool. It's the

00:03:06.110 --> 00:03:09.250
agent flexibly deciding, okay, now I need to

00:03:09.250 --> 00:03:11.050
research, now I should code, now I need to use

00:03:11.050 --> 00:03:14.990
the computer. It really is a step toward a more

00:03:14.990 --> 00:03:18.110
truly autonomous agent. It changes the game from

00:03:18.110 --> 00:03:22.490
simple queries to genuinely autonomous multi-step

00:03:22.490 --> 00:03:25.370
task completion. So how does that seamless

00:03:25.370 --> 00:03:28.889
integration really change the nature of how users

00:03:28.889 --> 00:03:31.830
interact with AI? It shifts from asking simple

00:03:31.830 --> 00:03:35.009
questions to getting the AI to complete complex

00:03:35.009 --> 00:03:37.969
multi-step tasks autonomously. Got it. And who

00:03:37.969 --> 00:03:40.729
gets access? And when? It's rolling out in phases.

00:03:41.129 --> 00:03:43.110
So, Pro users, they get full access right now.

00:03:43.449 --> 00:03:45.830
Plus and Team users will get access soon, probably

00:03:45.830 --> 00:03:47.810
with some initial limits. Then education and

00:03:47.810 --> 00:03:49.710
enterprise later on. If you're on the free plan,

00:03:49.870 --> 00:03:51.229
well, you'll need to upgrade to get the full

00:03:51.229 --> 00:03:53.419
agent experience. Theory is one thing, but does

00:03:53.419 --> 00:03:56.520
it actually work in the real world? Right, the

00:03:56.520 --> 00:03:58.280
million dollar question. Let's dive into some

00:03:58.280 --> 00:04:00.560
practical tests that really pushed Agent's limits.

00:04:00.860 --> 00:04:03.159
What did that in-depth news research challenge

00:04:03.159 --> 00:04:06.560
show? OK, so this test asked Agent to analyze

00:04:06.560 --> 00:04:09.159
generative AI trends affecting digital marketing

00:04:09.159 --> 00:04:13.650
over the past month for a management memo. So

00:04:13.650 --> 00:04:16.970
complex. Needs current info, multiple sources.

00:04:17.670 --> 00:04:19.750
Exactly. Distinguishing important from irrelevant,

00:04:20.089 --> 00:04:22.290
industry context, synthesis, professional format,

00:04:22.730 --> 00:04:25.329
right time frame. Lots going on. And the real

00:04:25.329 --> 00:04:28.029
world results. How did it do? Agent took about

00:04:28.029 --> 00:04:31.170
15 minutes. Yeah. And it produced a memo that

00:04:31.170 --> 00:04:35.250
was maybe 80 to 85% of the quality a human analyst

00:04:35.250 --> 00:04:38.029
team would produce in several hours. Wow. Previous

00:04:38.029 --> 00:04:40.629
AIs might only hit like 50% accuracy on something

00:04:40.629 --> 00:04:43.009
like that. It correctly picked up major stories.

00:04:43.470 --> 00:04:45.350
The Agent launch itself, new Mistral models,

00:04:45.529 --> 00:04:47.970
Google search updates, structured it professionally.

00:04:48.029 --> 00:04:50.610
Any issues? Minor stuff. Some info slightly outside

00:04:50.610 --> 00:04:52.750
the time frame, missed some niche discussions

00:04:52.750 --> 00:04:56.569
happening on, say, X or Twitter. OK. So conclusion.

00:04:56.860 --> 00:04:58.839
It's a powerful professional research assistant,

00:04:58.839 --> 00:05:01.199
definitely, but you still need that human oversight.

00:05:01.360 --> 00:05:03.540
That's still a huge leap for research tasks.

00:05:03.860 --> 00:05:05.680
What about something really specific and precise,

00:05:05.839 --> 00:05:08.019
like financial data retrieval and calculations?

00:05:08.360 --> 00:05:10.399
Right, this is more of a clear pass-fail test.

00:05:11.000 --> 00:05:14.560
The task: go to Yahoo Finance, find the VinFast

00:05:14.560 --> 00:05:18.959
ticker VFS on NASDAQ, extract the last five closing

00:05:18.959 --> 00:05:21.860
prices, calculate the average daily percentage

00:05:21.860 --> 00:05:24.519
change, and present it all in a table. So it

00:05:24.519 --> 00:05:28.040
combines web navigation and code execution. Verdict.

00:05:28.560 --> 00:05:31.000
Did it nail it? Flawless. Yeah, it navigated

00:05:31.000 --> 00:05:33.860
Yahoo Finance, found the data, wrote a Python

00:05:33.860 --> 00:05:36.180
script for the calculation, presented it perfectly

00:05:36.180 --> 00:05:40.480
in a table. Many previous AIs would just stumble

00:05:40.480 --> 00:05:42.339
or fail outright on one of those steps, especially

00:05:42.339 --> 00:05:44.740
combining them. Agent handled it cleanly. Whoa.
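
NOTE
Editor's sketch of the calculation step described above, not the script Agent actually wrote.
The transcript does not say how "average daily percentage change" was defined, so this assumes
the arithmetic mean of day-over-day changes. Five closing prices give four daily changes.
The function name and the sample prices are hypothetical; the prices are made up, not VFS data.
```python
def average_daily_pct_change(closes):
    """Mean of day-over-day percentage changes for closing prices, oldest first."""
    if len(closes) < 2:
        raise ValueError("need at least two closing prices")
    # Percentage change between each consecutive pair of closes.
    changes = [(curr - prev) / prev * 100.0 for prev, curr in zip(closes, closes[1:])]
    return sum(changes) / len(changes)
# Made-up closing prices for illustration only.
example_closes = [100.0, 102.0, 102.0, 104.04, 104.04]
print(f"{average_daily_pct_change(example_closes):.2f}%")
```
A compounded (geometric) average is another common definition and would give a slightly different figure.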

00:05:44.980 --> 00:05:47.120
Imagine scaling that kind of precision, analyzing

00:05:47.120 --> 00:05:49.899
entire portfolios instantly. That really could

00:05:49.899 --> 00:05:51.620
change things for financial tasks, couldn't it?

00:05:51.639 --> 00:05:53.670
OK, but what about something

00:05:53.670 --> 00:05:55.970
totally different, like managing, say, the flood

00:05:55.970 --> 00:05:59.129
of daily emails? How did Agent handle email complexity?

00:05:59.410 --> 00:06:01.810
Yeah, this task involved connecting to Gmail,

00:06:02.189 --> 00:06:04.730
reviewing 10 recent support emails. Just 10?

00:06:04.870 --> 00:06:07.870
OK. Yeah, start small. Then classify them technical,

00:06:08.250 --> 00:06:11.509
feedback, billing, and then draft replies using

00:06:11.509 --> 00:06:15.269
uploaded FAQ and return policy files. So this

00:06:15.269 --> 00:06:18.629
tests external access, reading context, classification,

00:06:19.189 --> 00:06:21.720
and RAG, retrieval-augmented generation. That's

00:06:21.720 --> 00:06:24.060
where it pulls info from documents you give it,

00:06:24.120 --> 00:06:25.740
right? Exactly. It needs to understand the docs

00:06:25.740 --> 00:06:27.740
to draft good replies. That's a lot of context

00:06:27.740 --> 00:06:29.980
and steps. How did it go? Did it in about three

00:06:29.980 --> 00:06:33.040
minutes, correctly classified each email, drafted

00:06:33.040 --> 00:06:35.360
tailored replies that actually cited the documents,

00:06:35.399 --> 00:06:38.420
and even offered to create actual Gmail drafts

00:06:38.420 --> 00:06:41.040
for you. So potentially huge for customer service

00:06:41.040 --> 00:06:43.319
automation using internal knowledge. But you

00:06:43.319 --> 00:06:45.850
mentioned a limitation. Right. Important one.

00:06:46.329 --> 00:06:48.949
Agent can get overwhelmed by too much info, trying

00:06:48.949 --> 00:06:51.250
to process, say, a thousand emails instead of

00:06:51.250 --> 00:06:53.550
ten. It might struggle. You definitely need to

00:06:53.550 --> 00:06:55.850
understand its limits. Got it. So scale matters.

00:06:56.110 --> 00:06:58.129
And finally, product research and comparison.

00:06:58.430 --> 00:07:00.990
This used to be so hit or miss with AI, almost

00:07:00.990 --> 00:07:04.310
felt like a gamble. How did it do here? OK, the

00:07:04.310 --> 00:07:06.610
task was research and compare the five best air

00:07:06.610 --> 00:07:09.129
purifier models for Vietnamese cities under a

00:07:09.129 --> 00:07:12.949
specific budget, five million VND. Very specific.

00:07:13.029 --> 00:07:15.699
Yeah. and create a comparison table with exact

00:07:15.699 --> 00:07:18.920
columns. Product name, reference price, suitable

00:07:18.920 --> 00:07:21.959
room area, filtration technology, and a retailer

00:07:21.959 --> 00:07:24.500
link. And did it deliver that kind of consistency

00:07:24.500 --> 00:07:27.560
we saw with the financial data? Pretty much exactly

00:07:27.560 --> 00:07:30.180
as requested. Yeah. It found models, used the

00:07:30.180 --> 00:07:32.639
precise table format, consulted local e-commerce

00:07:32.639 --> 00:07:35.259
sites, gave detailed feature comparisons. Any

00:07:35.259 --> 00:07:38.000
glitches? Minor one. Maybe one or two product

00:07:38.000 --> 00:07:40.879
links went to general category pages, not the

00:07:40.879 --> 00:07:43.860
specific product. OK. But the significance is

00:07:43.860 --> 00:07:46.800
huge. This kind of detailed, structured consumer

00:07:46.800 --> 00:07:49.639
research used to be really unreliable with AI.

00:07:49.819 --> 00:07:52.519
Agent makes it genuinely useful. It performs

00:07:52.519 --> 00:07:55.079
research almost like an experienced human would.

00:07:55.199 --> 00:07:57.019
It really does sound like Agent crossed some

00:07:57.019 --> 00:07:59.180
kind of critical threshold, moving from just

00:07:59.180 --> 00:08:02.120
experimental to genuinely reliable for complex

00:08:02.120 --> 00:08:05.060
stuff. What truly sets it apart from previous

00:08:05.060 --> 00:08:07.310
attempts we've seen? I think the real breakthrough

00:08:07.310 --> 00:08:09.610
is, yeah, while other companies definitely experimented

00:08:09.610 --> 00:08:11.990
with similar ideas, Agent seems to have finally

00:08:11.990 --> 00:08:14.910
crossed that crucial threshold. It feels reliable

00:08:14.910 --> 00:08:18.050
enough for regular, practical use. Previous tools

00:08:18.050 --> 00:08:20.449
often lacked the stability, or the deep integration,

00:08:20.709 --> 00:08:23.689
or the consistent accuracy, or they just couldn't

00:08:23.689 --> 00:08:27.290
handle real-world complexity well. Agent seems

00:08:27.290 --> 00:08:29.449
to deliver consistent results much more often,

00:08:29.750 --> 00:08:31.870
turning that promise of an AI assistant into

00:08:31.870 --> 00:08:34.309
more of a practical reality. So given this new

00:08:34.309 --> 00:08:36.570
level of capability, what do you think is the

00:08:36.570 --> 00:08:39.269
biggest hurdle users might face when they try

00:08:39.269 --> 00:08:42.490
to apply Agent to their own, maybe unique, complex

00:08:42.490 --> 00:08:44.629
workflows? Probably setting clear boundaries

00:08:44.629 --> 00:08:47.070
and managing the information input effectively

00:08:47.070 --> 00:08:50.200
for the AI. Okay, so Agent can talk to my

00:08:50.200 --> 00:08:53.159
Gmail, analyze spreadsheets. How does that even

00:08:53.159 --> 00:08:55.419
work? What exactly are these AI connectors we

00:08:55.419 --> 00:08:58.240
keep hearing about? AI connectors, they're essentially

00:08:58.240 --> 00:09:01.080
like intelligent bridges. They let AI assistants

00:09:01.080 --> 00:09:04.320
seamlessly access and interact with your external

00:09:04.320 --> 00:09:06.820
accounts and services, you know, Gmail, Google

00:09:06.820 --> 00:09:09.340
Drive, Slack, Microsoft 365, things like that.

00:09:09.379 --> 00:09:11.679
So the AI works directly within your existing

00:09:11.679 --> 00:09:13.960
workflows, not just in some isolated chat window,

00:09:14.220 --> 00:09:16.500
not just conversation. It's interaction with

00:09:16.500 --> 00:09:19.399
your whole digital setup. like stacking Lego

00:09:19.399 --> 00:09:21.620
blocks of data and actions together. Interesting

00:09:21.620 --> 00:09:24.019
analogy. And I've heard there are different approaches

00:09:24.019 --> 00:09:27.700
here, like chat GPT agent versus, say, Claude's

00:09:27.700 --> 00:09:29.879
connectors. How do they differ? That's a great

00:09:29.879 --> 00:09:32.460
point. They represent pretty different philosophies

00:09:32.460 --> 00:09:36.000
about AI integration. Agent's connectors are deeply

00:09:36.000 --> 00:09:39.580
integrated with its full suite. So email plus

00:09:39.580 --> 00:09:43.179
web research plus code plus images. OK. This

00:09:43.179 --> 00:09:46.259
makes it really good for complex, multi-step,

00:09:46.480 --> 00:09:48.879
flexible workflows where you need lots of different

00:09:48.879 --> 00:09:50.960
tools working together. The philosophy seems

00:09:50.960 --> 00:09:54.039
to be build a central brain that orchestrates

00:09:54.039 --> 00:09:56.639
many tools. Right. And Claude? Claude's connectors,

00:09:56.679 --> 00:09:59.179
on the other hand, tend to focus on simpler,

00:09:59.659 --> 00:10:02.080
really reliable connections to external services.

00:10:02.500 --> 00:10:05.179
They're excellent for more focused, simpler tasks

00:10:05.179 --> 00:10:07.960
like summarize my unread emails in the #general

00:10:07.960 --> 00:10:10.220
Slack channel. Gotcha. Their philosophy

00:10:10.220 --> 00:10:12.360
feels more like providing reliable specialized

00:10:12.360 --> 00:10:14.980
tools the model can call on when needed. Less

00:10:14.980 --> 00:10:17.500
orchestration, more specific capabilities. So

00:10:17.500 --> 00:10:19.720
one's like a symphony conductor for your whole

00:10:19.720 --> 00:10:22.440
digital life, maybe? And the other is more like

00:10:22.440 --> 00:10:24.679
a set of specialist mechanics, each really good

00:10:24.679 --> 00:10:28.000
at one specific job. Fascinating. Yeah, that's

00:10:28.000 --> 00:10:30.120
a decent way to put it. So when we start using

00:10:30.120 --> 00:10:32.600
these connectors, what are some tips for doing

00:10:32.600 --> 00:10:36.100
it safely and effectively? Good question. First,

00:10:36.580 --> 00:10:39.169
definitely start small. Begin with simple tasks

00:10:39.169 --> 00:10:41.250
before you try to automate something really complex.

00:10:42.409 --> 00:10:45.730
Second, set clear boundaries. Be really explicit,

00:10:45.929 --> 00:10:48.970
say, analyze these five emails from this sender,

00:10:49.129 --> 00:10:51.490
not just check my inbox. Right, specificity.

00:10:51.669 --> 00:10:55.129
Third, provide context. Upload relevant documents,

00:10:55.669 --> 00:10:58.429
internal guides, policies, templates, give the

00:10:58.429 --> 00:11:02.429
AI the background it needs. Fourth, Always, always

00:11:02.429 --> 00:11:05.529
review thoroughly. Look over AI-generated responses

00:11:05.529 --> 00:11:07.730
or actions before you let them happen. Makes

00:11:07.730 --> 00:11:10.350
sense. And finally, manage permissions. Go back

00:11:10.350 --> 00:11:13.210
regularly and review what services the AI has

00:11:13.210 --> 00:11:15.649
access to. Revoke access you don't need anymore.

00:11:15.850 --> 00:11:17.769
You know, I still wrestle with prompt drift myself

00:11:17.769 --> 00:11:20.090
sometimes, so that specificity really does help

00:11:20.090 --> 00:11:22.110
get the AI to deliver what you actually want.

00:11:22.309 --> 00:11:24.370
That's helpful to hear, yeah. Beyond just the

00:11:24.370 --> 00:11:27.070
features, what does this difference in philosophies,

00:11:27.070 --> 00:11:30.070
orchestration versus specialization, tell us about

00:11:30.070 --> 00:11:32.210
where AI integration might be heading more broadly?

00:11:32.529 --> 00:11:36.029
It suggests maybe two paths. One, prioritizing

00:11:36.029 --> 00:11:39.090
holistic orchestration. The other, specialized

00:11:39.090 --> 00:11:41.730
reliable automation. Interesting. OK, it wasn't

00:11:41.730 --> 00:11:44.009
just OpenAI making waves this past week, though.

00:11:44.149 --> 00:11:46.429
The whole AI landscape seems to be just constantly

00:11:46.429 --> 00:11:48.710
shifting. What else caught your eye that felt

00:11:48.710 --> 00:11:51.889
significant? Yeah, a big one was Grok. That's

00:11:51.889 --> 00:11:54.870
xAI's product, launching AI companions, characters

00:11:54.870 --> 00:11:58.269
named Ani and Rudi. Companions, like friends?

00:11:58.429 --> 00:12:00.669
Basically, yeah. Designed as digital friends,

00:12:00.809 --> 00:12:03.330
not just work assistants. This feels like a real

00:12:03.330 --> 00:12:06.700
shift, you know, from AI purely as a productivity

00:12:06.700 --> 00:12:10.200
tool towards AI as a social companion. It seems

00:12:10.200 --> 00:12:12.700
targeted at tech enthusiasts right now, but it

00:12:12.700 --> 00:12:15.019
raises some really fascinating psychological

00:12:15.019 --> 00:12:17.299
and social questions about our future relationships

00:12:17.299 --> 00:12:19.379
with AI. Definitely something to watch. And then

00:12:19.379 --> 00:12:21.179
there was Higgsfield's Soul ID. Sounds like

00:12:21.179 --> 00:12:23.779
it's making AI selfies incredibly realistic.

00:12:23.960 --> 00:12:26.299
Yes, Higgsfield put out an updated AI image

00:12:26.299 --> 00:12:29.539
generator. It specializes in remarkably realistic

00:12:29.539 --> 00:12:31.679
photos of people. How does it work? It needs

00:12:31.679 --> 00:12:34.360
about 20 to 25 photos of a person to train on.

00:12:34.759 --> 00:12:37.240
Then it generates these very natural, almost

00:12:37.240 --> 00:12:39.919
iPhone-looking images, focusing on expressions.

00:12:40.919 --> 00:12:44.000
What's striking is how good they are. They can

00:12:44.000 --> 00:12:47.000
genuinely fool the human eye. Wow. You can imagine

00:12:47.000 --> 00:12:49.700
uses in social media, maybe professional headshots.

00:12:50.179 --> 00:12:53.049
That uncanny valley is shrinking fast. It really

00:12:53.049 --> 00:12:54.929
is. What about on the open source side? We saw

00:12:54.929 --> 00:12:57.649
some cool updates there, too. Voxtral from Mistral

00:12:57.649 --> 00:12:59.870
came out. That's an open source speech recognition

00:12:59.870 --> 00:13:02.169
model. Speech recognition, like transcribing

00:13:02.169 --> 00:13:04.509
audio. Exactly. And it's competitive with things

00:13:04.509 --> 00:13:07.649
like ElevenLabs Scribe and OpenAI's Whisper. That's

00:13:07.649 --> 00:13:09.750
great for developers, makes high-quality transcription

00:13:09.750 --> 00:13:12.529
more accessible. Nice. And then there's Kimi K2

00:13:12.529 --> 00:13:15.470
from a Chinese startup called Moonshot AI. This

00:13:15.470 --> 00:13:17.850
thing is massive, a trillion parameters. It uses

00:13:17.850 --> 00:13:20.750
a mixture of experts, or MoE, architecture. A mixture

00:13:20.750 --> 00:13:23.460
of experts. What's that mean? It's basically

00:13:23.460 --> 00:13:26.759
like having a team of specialized AI models working

00:13:26.759 --> 00:13:29.059
together on different parts of a problem, which makes

00:13:29.059 --> 00:13:31.919
it incredibly powerful. And its performance on

00:13:31.919 --> 00:13:34.620
benchmarks is really competitive. It definitely

00:13:34.620 --> 00:13:37.659
signals the growing strength of Chinese AI companies

00:13:37.659 --> 00:13:40.019
on the global stage. Okay, a lot happening there.
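
NOTE
Editor's sketch of the general mixture-of-experts idea mentioned above, not Kimi K2's implementation.
A router scores the experts, only the top-scoring few run, and their outputs are blended by weight.
All names here (moe_forward, gate_scores, toy_experts) are hypothetical; in a real model the
experts are sub-networks, the router is learned, and routing typically happens per token.
```python
import math
def softmax(scores):
    """Turn raw scores into weights that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]
def moe_forward(x, experts, gate_scores, top_k=2):
    """Route input x to the top_k experts and blend their outputs by gate weight."""
    # Indices of the top_k highest gate scores.
    top = sorted(range(len(experts)), key=lambda i: gate_scores[i], reverse=True)[:top_k]
    # Renormalize the selected scores so the blend weights sum to 1.
    weights = softmax([gate_scores[i] for i in top])
    # Only the selected experts are evaluated; the rest are skipped.
    return sum(w * experts[i](x) for w, i in zip(weights, top))
# Toy experts: plain functions standing in for expert sub-networks.
toy_experts = [lambda x: x + 1.0, lambda x: x * 2.0, lambda x: -x]
print(moe_forward(3.0, toy_experts, gate_scores=[0.5, 0.5, -4.0], top_k=2))
```
Because only top_k experts run per input, the compute used per input is far smaller than the total parameter count suggests.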

00:13:40.399 --> 00:13:43.379
And for video generation, Runway Act-Two got an

00:13:43.379 --> 00:13:45.340
important upgrade too, right? Getting closer

00:13:45.340 --> 00:13:47.139
to practical use in things like filmmaking.

00:13:47.320 --> 00:13:50.059
Absolutely. Runway Act-Two is their upgraded AI

00:13:50.059 --> 00:13:53.559
video tech. It can take an input video, say,

00:13:53.549 --> 00:13:57.429
of a person's motion and use that motion to drive

00:13:57.429 --> 00:14:01.330
completely AI-generated content. So a person

00:14:01.330 --> 00:14:03.570
in a suit could become an astronaut or maybe

00:14:03.570 --> 00:14:06.409
a superhero, but their original motion is preserved

00:14:06.409 --> 00:14:08.929
precisely. So the movement looks real even if

00:14:08.929 --> 00:14:10.950
the character changes. Exactly. It's much more

00:14:10.950 --> 00:14:13.269
precise now, less distorted than earlier versions.

00:14:13.509 --> 00:14:15.870
That brings us way closer to practical uses in

00:14:15.870 --> 00:14:19.110
filmmaking, advertising, anywhere motion fidelity

00:14:19.110 --> 00:14:21.230
is key. Cool. And you mentioned something about

00:14:21.230 --> 00:14:24.039
transparency with Grok. Yeah, this was interesting.

00:14:24.379 --> 00:14:26.559
So Grok initially gave some inappropriate answers,

00:14:26.820 --> 00:14:28.940
apparently because of viral memes in its training

00:14:28.940 --> 00:14:32.019
data. Yeah. So they fixed it by updating the

00:14:32.019 --> 00:14:34.039
system prompt, basically the core instructions

00:14:34.039 --> 00:14:37.460
the AI follows. But what really matters is that

00:14:37.460 --> 00:14:40.940
the Grok team published the exact system prompt

00:14:40.940 --> 00:14:43.860
changes on GitHub. They showed their work. Exactly.

00:14:44.200 --> 00:14:46.299
Showed precisely what they modified and why.

00:14:47.000 --> 00:14:48.820
That level of transparency is pretty rare in

00:14:48.820 --> 00:14:50.480
this competitive field, and it does a lot to

00:14:50.480 --> 00:14:53.480
build trust, I think. That is unusual. So from

00:14:53.480 --> 00:14:56.840
AI companions to hyperreal visuals, open-source

00:14:56.840 --> 00:15:00.620
advances, it's a lot. Which of these other developments

00:15:00.620 --> 00:15:02.899
do you think might have the most immediate impact

00:15:02.899 --> 00:15:05.889
on daily life for the average person? Probably

00:15:05.889 --> 00:15:09.250
the AI companions. Shifting AI from just tasks

00:15:09.250 --> 00:15:12.450
towards actual relationships is a really significant

00:15:12.450 --> 00:15:14.309
step, I think. Yeah, potentially transformative.

00:15:14.610 --> 00:15:16.870
OK, bringing it back to Agent, who is ChatGPT

00:15:16.870 --> 00:15:19.889
Agent really for, and how can someone start using

00:15:19.889 --> 00:15:22.370
it effectively without feeling totally overwhelmed?

00:15:22.690 --> 00:15:25.590
Right. So ChatGPT Agent seems perfect for researchers,

00:15:25.970 --> 00:15:28.309
analysts, people who need to synthesize info

00:15:28.309 --> 00:15:30.289
quickly, often from lots of different places.

00:15:30.350 --> 00:15:32.649
Thanks. Also, marketing and comms professionals,

00:15:33.070 --> 00:15:35.070
content creators who are combining research,

00:15:35.210 --> 00:15:37.950
writing, maybe visuals. Small business owners,

00:15:38.169 --> 00:15:40.809
too, for things like customer support or market

00:15:40.809 --> 00:15:43.730
analysis. Really, anyone who frequently combines

00:15:43.730 --> 00:15:46.289
multiple types of digital work. And who might

00:15:46.289 --> 00:15:49.190
it not be ideal for, at least right now? Well,

00:15:49.490 --> 00:15:51.889
probably not ideal for really simple single-purpose

00:15:51.889 --> 00:15:55.259
tasks where regular ChatGPT might be fine, or

00:15:55.259 --> 00:15:58.600
for users who need absolute 100% accuracy for

00:15:58.600 --> 00:16:01.419
truly mission-critical decisions, or tasks that

00:16:01.419 --> 00:16:03.919
require really nuanced human judgment on sensitive

00:16:03.919 --> 00:16:07.419
stuff. It's powerful, but it's still an AI. Got

00:16:07.419 --> 00:16:09.960
it. So for those who fit the profile and are

00:16:09.960 --> 00:16:11.539
ready to jump in, what are the best practices?

00:16:11.700 --> 00:16:14.080
How do you actually get value from Agent? OK,

00:16:14.080 --> 00:16:16.320
first, be incredibly clear and specific with

00:16:16.320 --> 00:16:18.360
your requests. Don't just say help with email.

00:16:18.679 --> 00:16:20.779
Try something like draft replies to the five

00:16:20.779 --> 00:16:23.220
most recent emails with complaint in the subject.

00:16:23.659 --> 00:16:25.899
Use the complaint-response.docx template

00:16:25.899 --> 00:16:28.500
as a guide. Then ask for my approval before doing

00:16:28.500 --> 00:16:31.789
anything else. OK, very precise. Second, provide

00:16:31.789 --> 00:16:34.029
those context documents we talked about. Upload

00:16:34.190 --> 00:16:35.889
guides, templates, before you start the task.

00:16:36.470 --> 00:16:39.669
Third, use confirmation steps for important things.

00:16:40.110 --> 00:16:42.490
Ask Agent, have you understood the data needs

00:16:42.490 --> 00:16:45.250
to be for Q2? Something like that. Check its

00:16:45.250 --> 00:16:48.610
understanding. Exactly. Fourth, combine its capabilities

00:16:48.610 --> 00:16:51.309
strategically. Think about the steps. And finally,

00:16:51.710 --> 00:16:54.529
set realistic expectations. Agent is powerful,

00:16:54.870 --> 00:16:57.340
but it's not perfect. Always review its work.

00:16:57.399 --> 00:16:59.379
It sounds like a new kind of workflow design,

00:16:59.379 --> 00:17:01.620
almost like programming an assistant, but in

00:17:01.620 --> 00:17:04.279
plain English. Kind of, yeah. What are some sample

00:17:04.279 --> 00:17:06.240
workflows someone could try with Agent, things

00:17:06.240 --> 00:17:08.859
that really show off its unique abilities? Oh,

00:17:08.859 --> 00:17:10.460
the possibilities are getting really exciting

00:17:10.460 --> 00:17:14.259
now. Imagine content planning and creation. You

00:17:14.259 --> 00:17:16.900
could prompt it to research trending topics in,

00:17:16.900 --> 00:17:19.539
say, sustainable development. OK. Then propose

00:17:19.539 --> 00:17:22.460
a list of blog post ideas, and for the top idea,

00:17:22.460 --> 00:17:25.019
create a detailed outline and generate a suitable

00:17:25.860 --> 00:17:29.720
image, all in one go. Or customer feedback analysis.

00:17:30.059 --> 00:17:32.279
Connect to Google Drive, tell it to read a CSV

00:17:32.279 --> 00:17:35.079
file, identify the top complaints and the features

00:17:35.079 --> 00:17:37.740
people love, create a bar chart from that data,

00:17:37.960 --> 00:17:40.059
and then draft a brief summary for a team meeting.

00:17:40.099 --> 00:17:43.140
That's powerful. Or even market entry research,

00:17:43.900 --> 00:17:46.079
maybe for specialty coffee in the Thai market.

00:17:46.380 --> 00:17:49.740
Ask for an overview report on competitors, average

00:17:49.740 --> 00:17:53.480
prices, import regulations, and present it all

00:17:53.480 --> 00:17:56.490
as a slide deck outline. It really does feel

00:17:56.490 --> 00:17:58.849
like a paradigm shift brewing. How do you see

00:17:58.849 --> 00:18:01.930
the evolution of AI assistants continuing from

00:18:01.930 --> 00:18:04.369
here, building on what Agent has started? Yeah,

00:18:04.490 --> 00:18:06.710
what feels revolutionary is that Agent really

00:18:06.710 --> 00:18:09.069
signifies this profound evolution. We're moving

00:18:09.069 --> 00:18:11.410
from single-purpose tools, you know, a grammar

00:18:11.410 --> 00:18:14.720
checker here, an image generator there, to deeply

00:18:14.720 --> 00:18:17.940
integrated assistants, from simple Q&A to actually

00:18:17.940 --> 00:18:20.779
completing complex tasks, and from AI needing

00:18:20.779 --> 00:18:23.660
human supervision at every single step to much

00:18:23.660 --> 00:18:26.579
more autonomous AI agents that can chain multiple

00:18:26.579 --> 00:18:29.410
actions together intelligently. And for us as

00:18:29.410 --> 00:18:31.490
professionals, how is this going to impact our

00:18:31.490 --> 00:18:34.170
day-to-day work, our productivity? Where do

00:18:34.170 --> 00:18:36.430
humans fit in this picture? Well, I think we'll

00:18:36.430 --> 00:18:39.009
see routine research and analysis get increasingly

00:18:39.009 --> 00:18:41.609
automated. That frees up our cognitive space

00:18:41.609 --> 00:18:43.809
for higher-level thinking. Creative work will

00:18:43.809 --> 00:18:47.670
likely involve more human-AI collaboration. Humans

00:18:47.670 --> 00:18:50.910
provide the strategy, the vision. AI handles

00:18:50.910 --> 00:18:53.869
more of the execution, maybe the iteration. Communication:

00:18:54.400 --> 00:18:57.240
probably more AI-assisted drafting, making interactions

00:18:57.240 --> 00:19:00.359
more efficient. And for complex problem solving,

00:19:00.779 --> 00:19:03.240
it'll be about combining human strategy with

00:19:03.240 --> 00:19:06.180
AI's execution power, letting us tackle bigger,

00:19:06.480 --> 00:19:09.170
harder problems. Which means the essential skills

00:19:09.170 --> 00:19:11.730
we need as professionals are also shifting, right?

00:19:12.089 --> 00:19:14.529
What are those core skills we need to be cultivating

00:19:14.529 --> 00:19:16.650
now? Absolutely. You'll definitely need prompt

00:19:16.650 --> 00:19:18.609
engineering, learning to communicate precisely

00:19:18.609 --> 00:19:20.670
and effectively with AI. That's fundamental.

00:19:20.829 --> 00:19:23.130
Right. And systems thinking, designing workflows

00:19:23.130 --> 00:19:25.269
that optimally combine human strengths and AI

00:19:25.269 --> 00:19:28.299
strengths. Also, AI ethics and oversight, being

00:19:28.299 --> 00:19:31.740
able to recognize and mitigate risks, biases

00:19:31.740 --> 00:19:34.799
in AI output. That's crucial. Critical thinking,

00:19:34.940 --> 00:19:36.680
too, I imagine. Definitely. Critical thinking,

00:19:36.740 --> 00:19:39.039
evaluating the AI's output, knowing when it's

00:19:39.039 --> 00:19:41.720
right, when it's wrong, and crucially, when and

00:19:41.720 --> 00:19:44.710
how to intervene. Given all these new skills,

00:19:45.109 --> 00:19:47.910
what's the single most important mindset shift,

00:19:47.910 --> 00:19:50.309
do you think, for professionals trying to adapt

00:19:50.309 --> 00:19:54.190
to this rapidly evolving AI landscape? I'd say

00:19:54.190 --> 00:19:57.230
it's embracing human-AI collaboration truly as

00:19:57.230 --> 00:20:00.190
an amplification of our abilities, not as a replacement

00:20:00.190 --> 00:20:02.349
for them. Amplification, not replacement. OK,

00:20:02.490 --> 00:20:05.789
so to recap our deep dive today, ChatGPT Agent

00:20:05.789 --> 00:20:08.589
feels like a genuine breakthrough for practical

00:20:08.589 --> 00:20:12.289
multi-step AI assistance. Yeah, it's reliably

00:20:12.289 --> 00:20:15.759
handling complex tasks that honestly used to

00:20:15.759 --> 00:20:18.200
need constant human handholding. It really marks

00:20:18.200 --> 00:20:20.440
a paradigm shift, doesn't it? From these isolated

00:20:20.440 --> 00:20:23.799
tools we used to use to more integrated, almost

00:20:23.799 --> 00:20:26.259
autonomous agents. Exactly. And that combination

00:20:26.259 --> 00:20:29.500
of human judgment and AI capability is becoming

00:20:29.500 --> 00:20:32.259
extraordinarily powerful. Really quickly. It's

00:20:32.259 --> 00:20:34.480
not about AI replacing humans, fundamentally.

00:20:34.579 --> 00:20:36.799
It's about humans amplifying what they can achieve

00:20:36.799 --> 00:20:39.279
with AI. Well said. So the question for you listening

00:20:39.279 --> 00:20:42.339
is, what workflow will you try first with ChatGPT

00:20:42.339 --> 00:20:44.380
Agent? The possibilities are really opening up.

00:20:44.460 --> 00:20:46.140
Yeah, we definitely encourage you to experiment,

00:20:46.319 --> 00:20:48.700
to explore what it can do. Because as these tools

00:20:48.700 --> 00:20:51.859
keep getting better, and they will, the critical

00:20:51.859 --> 00:20:54.920
question isn't really if you'll adopt them, but

00:20:54.920 --> 00:20:57.500
maybe how quickly you can learn to use them effectively.

00:20:57.640 --> 00:21:00.170
Right. The businesses, the individuals who master

00:21:00.170 --> 00:21:03.089
this human-AI collaboration today, they're likely

00:21:03.089 --> 00:21:05.430
going to have significant advantages as these

00:21:05.430 --> 00:21:07.730
technologies just become standard everywhere

00:21:07.730 --> 00:21:10.250
in every industry. Thank you for diving deep

00:21:10.250 --> 00:21:12.630
with us today. We'll be back soon with more insights

00:21:12.630 --> 00:21:15.470
into the future of technology. [Outro music]