WEBVTT

00:00:00.000 --> 00:00:02.580
We always assume that, you know, staffing more

00:00:02.580 --> 00:00:05.500
intelligence, more compute, more AI agents onto

00:00:05.500 --> 00:00:08.359
a single task just guarantees a better

00:00:08.359 --> 00:00:10.839
outcome. Right. More bots have to mean a faster,

00:00:10.900 --> 00:00:13.140
smarter solution, don't they? It just feels so

00:00:13.140 --> 00:00:15.500
intuitive, like you're just stacking these Lego

00:00:15.500 --> 00:00:17.980
blocks of data until you have a skyscraper of

00:00:17.980 --> 00:00:20.339
IQ. But it's not exactly like that. Some breakthrough

00:00:20.339 --> 00:00:24.710
research just confirmed this really... counterintuitive

00:00:24.710 --> 00:00:27.850
truth, which is that sometimes adding more bots

00:00:27.850 --> 00:00:30.230
just burns through your resources, your tokens,

00:00:30.370 --> 00:00:33.350
and actually makes the entire system worse. Dramatically

00:00:33.350 --> 00:00:37.469
worse. Yeah. Less is often truly more, especially

00:00:37.469 --> 00:00:41.189
when you're dealing with high logic tasks. Welcome

00:00:41.189 --> 00:00:44.899
to the deep dive. Today, we're unpacking a really

00:00:44.899 --> 00:00:47.359
fascinating stack of sources that clarify the

00:00:47.359 --> 00:00:49.600
current AI business landscape, and they reveal

00:00:49.600 --> 00:00:51.820
some critical hidden truths about how these complex

00:00:51.820 --> 00:00:54.219
systems actually work under the hood. And we're

00:00:54.219 --> 00:00:56.000
here to help you gain that knowledge quickly,

00:00:56.140 --> 00:00:58.460
but also thoroughly. We have three main areas

00:00:58.460 --> 00:01:00.200
for you today that you could really be informed

00:01:00.200 --> 00:01:02.740
on. First, we're going to look at the big monetization

00:01:02.740 --> 00:01:04.980
risks. If you try to build a business inside,

00:01:05.079 --> 00:01:07.900
you know, the biggest AI marketplace out there.

00:01:08.000 --> 00:01:10.879
Then, an update on new models. Yep. Blazing fast

00:01:10.879 --> 00:01:13.159
new models and practical tools you can use right

00:01:13.159 --> 00:01:15.579
now. We even have a really clever prompting technique

00:01:15.579 --> 00:01:18.140
that pretty much guarantees retention. And finally,

00:01:18.260 --> 00:01:20.680
that breakthrough study we mentioned, showing

00:01:20.680 --> 00:01:23.599
precisely why scaling up AI systems can sometimes

00:01:23.599 --> 00:01:28.219
backfire in a big way. Let's get into it. Okay,

00:01:28.280 --> 00:01:30.099
let's unpack this for you. We'll start with the

00:01:30.099 --> 00:01:31.959
business builders, the entrepreneurs out there

00:01:31.959 --> 00:01:33.540
trying to find a home for their applications.

00:01:34.040 --> 00:01:36.599
OpenAI recently clarified how you can monetize

00:01:36.599 --> 00:01:39.200
your apps within the ChatGPT marketplace. And

00:01:39.200 --> 00:01:41.640
it is a massive distribution channel. I mean,

00:01:41.640 --> 00:01:43.980
you can't deny that. Everyone is rushing toward

00:01:43.980 --> 00:01:46.180
it. Of course. They basically offer two paths

00:01:46.180 --> 00:01:48.359
if you want to make money there. The path that's

00:01:48.359 --> 00:01:50.480
supported right now is called external checkout.

00:01:50.700 --> 00:01:52.459
So that means you handle all the payments yourself

00:01:52.459 --> 00:01:55.379
off platform. You keep control. Exactly. But

00:01:55.379 --> 00:01:57.799
the path everyone really wants, the one that

00:01:57.799 --> 00:02:00.579
gets you right into that user flow, is instant

00:02:00.579 --> 00:02:03.150
checkout. That's the built-in payment system.

00:02:03.370 --> 00:02:05.349
Right, but it's currently in a private beta,

00:02:05.489 --> 00:02:08.889
only for a few select partners. Everyone wants

00:02:08.889 --> 00:02:11.750
to be there, but relying on that single channel,

00:02:11.949 --> 00:02:15.650
it's just really dangerous long-term. And here's

00:02:15.650 --> 00:02:17.409
where it gets interesting, because that massive

00:02:17.409 --> 00:02:19.870
user base comes with some severe trade-offs.

00:02:19.930 --> 00:02:22.669
Oh, yeah. The first major one is what we call

00:02:22.669 --> 00:02:25.810
brand dilution. Buyers start to attribute the

00:02:25.810 --> 00:02:29.599
purchase to ChatGPT, not to you, the developer.

00:02:29.699 --> 00:02:31.479
You become invisible. You're just the engine

00:02:31.479 --> 00:02:33.840
behind the scenes. That's the classic marketplace

00:02:33.840 --> 00:02:36.360
squeeze, isn't it? When instant checkout eventually

00:02:36.360 --> 00:02:39.319
expands, you're just another storefront and you're

00:02:39.319 --> 00:02:41.960
probably going to get lost in the noise. You

00:02:41.960 --> 00:02:44.599
get the scale, sure, but you pay for it with

00:02:44.599 --> 00:02:47.780
zero brand loyalty with your actual user. Yeah,

00:02:47.800 --> 00:02:49.810
but hold on. If that channel gets you millions

00:02:49.810 --> 00:02:52.849
of users almost instantly, isn't zero brand loyalty

00:02:52.849 --> 00:02:54.909
worth it at the start? I mean, you're trading

00:02:54.909 --> 00:02:57.849
that long-term brand equity for massive, immediate

00:02:57.849 --> 00:03:00.210
scale. Which is the goal for a startup launch,

00:03:00.349 --> 00:03:02.530
right? It is. But that's the Faustian bargain.

00:03:02.689 --> 00:03:05.310
The sources are really clear. The moment you

00:03:05.310 --> 00:03:07.990
rely on their discovery mechanism, you are trapped.

00:03:08.189 --> 00:03:10.789
Just like the early days of Google ads, you know.

00:03:11.419 --> 00:03:12.879
Eventually, you're going to have to pay to get

00:03:12.879 --> 00:03:15.560
seen. Whoever bids the most gets listed first.

00:03:15.819 --> 00:03:18.840
And that just pushes up bidding costs and kills

00:03:18.840 --> 00:03:21.259
any small profits for independent builders. And

00:03:21.259 --> 00:03:24.689
the liability risk? This is massive. And I think

00:03:24.689 --> 00:03:26.949
it's often overlooked by people building on these

00:03:26.949 --> 00:03:29.050
platforms. This is where the cost of convenience

00:03:29.050 --> 00:03:32.530
becomes kind of terrifying. Imagine you built

00:03:32.530 --> 00:03:35.310
a highly customized financial planning bot, right?

00:03:35.389 --> 00:03:38.729
It relies on this complex set of your own internal

00:03:38.729 --> 00:03:41.710
instructions. Okay. And now imagine the AI marketplace

00:03:41.710 --> 00:03:44.370
explains your product slightly incorrectly.

00:03:45.050 --> 00:03:48.650
A subtle prompt drift happens and it misinterprets

00:03:48.650 --> 00:03:50.969
the volatility of an asset because of that bad

00:03:50.969 --> 00:03:53.330
description. That's not just an unhappy user

00:03:53.330 --> 00:03:56.289
anymore. That is real legal exposure that developers

00:03:56.289 --> 00:03:58.949
now have to insure against. You are liable for

00:03:58.949 --> 00:04:01.370
the AI's bad explanation of your product. Yeah,

00:04:01.449 --> 00:04:04.189
that liability issue. It really resonates. I

00:04:04.189 --> 00:04:05.830
mean, to be honest, I still wrestle with prompt

00:04:05.830 --> 00:04:08.030
drift myself sometimes, just trying to make sure

00:04:08.030 --> 00:04:09.969
my core instructions stay consistent, even on

00:04:09.969 --> 00:04:12.509
simple non-financial tasks. Of course. So imagine

00:04:12.509 --> 00:04:15.469
that struggle when real money and customer trust

00:04:15.469 --> 00:04:17.769
and maybe even legal action are on the line.

00:04:17.810 --> 00:04:19.850
It's a whole other level. And we also saw that

00:04:19.850 --> 00:04:21.629
some products just don't translate well. Anything

00:04:21.629 --> 00:04:24.129
that relies on, say, visual or emotional input.

00:04:24.209 --> 00:04:26.310
The sources mentioned things like fashion, beauty

00:04:26.310 --> 00:04:28.569
or home decor. It just doesn't sell well through

00:04:28.569 --> 00:04:31.870
a chat interface. At the end of the day, ChatGPT

00:04:31.870 --> 00:04:34.670
sells convenience. It doesn't necessarily sell

00:04:34.670 --> 00:04:38.170
your complex, nuanced company. Right. So based

00:04:38.170 --> 00:04:40.709
on all this, what's the single biggest strategic

00:04:40.709 --> 00:04:42.990
shift a developer should be making right now

00:04:42.990 --> 00:04:45.629
to secure some long-term sustainability outside

00:04:45.629 --> 00:04:47.930
of that walled garden? They have to prioritize

00:04:47.930 --> 00:04:50.569
building their own distribution, their own brand

00:04:50.569 --> 00:04:52.949
presence, totally separate from the platform's

00:04:52.949 --> 00:04:55.709
convenience. All right, let's pivot a bit to

00:04:55.709 --> 00:04:57.649
the relentless pace of development, to speed.

00:04:58.110 --> 00:04:59.970
I mean, we're moving so fast now that entire

00:04:59.970 --> 00:05:02.310
model generations are obsolete in just a few

00:05:02.310 --> 00:05:05.389
months. Google just dropped Gemini 3 Flash. And

00:05:05.389 --> 00:05:08.290
as the name suggests, it is blazing fast. It's

00:05:08.290 --> 00:05:10.949
designed for these high-volume, low-latency tasks.

00:05:11.670 --> 00:05:13.889
What's impressive is that it actually beats the

00:05:13.889 --> 00:05:16.389
higher tier pro version in some coding benchmarks.

00:05:17.040 --> 00:05:19.259
So speed isn't necessarily sacrificing quality

00:05:19.259 --> 00:05:21.740
anymore. Exactly. And it's free to try right

00:05:21.740 --> 00:05:23.860
now, which just lowers that barrier to entry

00:05:23.860 --> 00:05:26.180
for everyone. And the financial velocity around

00:05:26.180 --> 00:05:29.620
this speed race is, it's just staggering. Look

00:05:29.620 --> 00:05:33.339
at Databricks. They just raised $4 billion at

00:05:33.339 --> 00:05:38.519
a $134 billion valuation. And that's up 34% in

00:05:38.519 --> 00:05:40.899
just three months. Three months. That growth

00:05:40.899 --> 00:05:43.839
rate is both terrifying and exhilarating. Whoa!

00:05:44.399 --> 00:05:46.500
I mean, just imagine scaling that infrastructure

00:05:46.500 --> 00:05:49.699
to a billion queries a day. The market is validating

00:05:49.699 --> 00:05:52.040
that speed and integration are now the critical

00:05:52.040 --> 00:05:54.139
infrastructure for this next phase of the Internet.

00:05:54.319 --> 00:05:57.000
Which brings us to how we, the users, should

00:05:57.000 --> 00:05:58.819
be interacting with these tools that are getting

00:05:58.819 --> 00:06:02.079
faster and faster. This velocity highlights a

00:06:02.079 --> 00:06:04.730
crucial point for you, the learner. You can't

00:06:04.730 --> 00:06:07.790
just let AI info dump on you. You've got to maximize

00:06:07.790 --> 00:06:09.790
the knowledge transfer from every interaction.

00:06:10.050 --> 00:06:12.149
Exactly. We found this really practical technique

00:06:12.149 --> 00:06:14.850
called the Feynman loop prompt. This concept

00:06:14.850 --> 00:06:17.889
is brilliant. It forces true understanding, and

00:06:17.889 --> 00:06:19.910
it guarantees you'll retain so much more than

00:06:19.910 --> 00:06:22.670
if you just read a summary. You tell the AI to

00:06:22.670 --> 00:06:25.009
teach you a topic. And then it tests you. It

00:06:25.009 --> 00:06:27.810
continuously tests you and teaches you, iterating

00:06:27.810 --> 00:06:29.949
over and over until you can successfully teach

00:06:29.949 --> 00:06:32.379
the concept back to the AI. So, for example,

00:06:32.480 --> 00:06:35.500
you can start with a prompt like, AI, teach me

00:06:35.500 --> 00:06:38.019
quantum entanglement using the Feynman loop methodology.

00:06:38.439 --> 00:06:41.259
And you add: validate my understanding by requiring

00:06:41.259 --> 00:06:43.639
me to teach it back to you, broken down into

00:06:43.639 --> 00:06:46.660
five simple stages. And that shift from just

00:06:46.660 --> 00:06:50.139
passively consuming to actively teaching, it

00:06:50.139 --> 00:06:52.879
fundamentally changes how you learn. You should

00:06:52.879 --> 00:06:55.139
probably save that prompt forever. We're also

00:06:55.139 --> 00:06:57.079
seeing this specialization accelerating across

00:06:57.079 --> 00:06:59.939
practical tools. SEMrush, the marketing intelligence

00:06:59.939 --> 00:07:03.160
giant, is now operating inside ChatGPT. So marketers

00:07:03.160 --> 00:07:05.620
can automate complex reports or analyze competitors

00:07:05.620 --> 00:07:08.379
just by typing simple commands. And others are

00:07:08.379 --> 00:07:11.079
combining models. You see tools like AirOps combining

00:07:11.079 --> 00:07:14.240
40 or more different specialized models to handle

00:07:14.240 --> 00:07:17.439
these niche creative growth tasks. And Alibaba's

00:07:17.439 --> 00:07:21.639
WAN 2.6 is creating these 15-second 1080p videos

00:07:21.959 --> 00:07:24.800
with multiple connected shots. The era of the

00:07:24.800 --> 00:07:27.399
single monolithic AI is fading really fast. It's

00:07:27.399 --> 00:07:29.540
moving toward these highly specific utility tools.

00:07:29.779 --> 00:07:32.319
So given all this speed, the question becomes,

00:07:32.500 --> 00:07:35.819
how do we even measure true intelligence now?

00:07:36.319 --> 00:07:38.800
OpenAI just introduced something called the Frontier

00:07:38.800 --> 00:07:41.199
Science Benchmark. And this goes way beyond just

00:07:41.199 --> 00:07:43.860
synthesizing data or completing some coding tasks.

00:07:44.360 --> 00:07:46.600
This benchmark specifically challenges models

00:07:46.600 --> 00:07:50.240
to, say, hypothesize novel chemical reactions

00:07:50.240 --> 00:07:52.740
or predict entirely new molecular structures.

00:07:52.879 --> 00:07:54.980
Or even suggest revolutionary breakthroughs in

00:07:54.980 --> 00:07:57.720
material science. It's measuring if AI can perform

00:07:57.720 --> 00:08:00.519
actual scientific discovery. The results are

00:08:00.519 --> 00:08:02.819
promising. They show huge potential. But, you

00:08:02.819 --> 00:08:05.620
know, skepticism is still pretty high. It's a

00:08:05.620 --> 00:08:07.800
massive step, though, toward proving that AI

00:08:07.800 --> 00:08:10.160
can innovate, not just summarize old papers.

00:08:10.889 --> 00:08:14.110
So how do these rapid and, let's be honest, complex

00:08:14.110 --> 00:08:16.509
benchmarks like frontier science really help

00:08:16.509 --> 00:08:19.370
the average user or builder who isn't a research

00:08:19.370 --> 00:08:21.870
chemist? Well, benchmarks clearly show the limitations

00:08:21.870 --> 00:08:24.129
and the potential. They guide us on which

00:08:24.129 --> 00:08:26.230
high-stakes tasks are actually safe for an AI to

00:08:26.230 --> 00:08:28.189
execute and which ones still need that human

00:08:28.189 --> 00:08:30.509
oversight. Which brings us directly to the heart

00:08:30.509 --> 00:08:33.289
of the research, those counterintuitive findings

00:08:33.289 --> 00:08:37.049
on multi-agent complexity. This is crucial for

00:08:37.049 --> 00:08:38.909
anyone thinking about building sophisticated

00:08:38.909 --> 00:08:41.730
AI workflows or enterprise solutions. Right.

00:08:41.850 --> 00:08:44.110
Multi-agent systems are incredibly popular right

00:08:44.110 --> 00:08:47.100
now. The idea is simple. It's a team of specialized

00:08:47.100 --> 00:08:50.019
AI bots working together on one complex task.

00:08:50.299 --> 00:08:52.299
And the assumption has always been pretty straightforward.

00:08:52.659 --> 00:08:55.460
More specialized agents mean better, faster

00:08:55.460 --> 00:08:58.120
results. But a joint study from Google and MIT

00:08:58.120 --> 00:09:00.899
ran 180 experiments across different major models

00:09:00.899 --> 00:09:03.500
to rigorously test that foundational theory.

00:09:03.740 --> 00:09:06.960
Does adding more AI agents actually inherently

00:09:06.960 --> 00:09:09.750
improve the outcome? And the answer is, well,

00:09:09.850 --> 00:09:12.029
it's absolutely nuanced. For tasks where you

00:09:12.029 --> 00:09:14.470
can easily split data across agents, think

00:09:14.470 --> 00:09:16.970
about financial data analysis or processing massive

00:09:16.970 --> 00:09:19.330
batches of documents, they saw an incredible

00:09:19.330 --> 00:09:21.950
81% improvement. That's awesome. Great performance

00:09:21.950 --> 00:09:24.769
scaling. Amazing. But for tasks that require

00:09:24.769 --> 00:09:27.629
sequential step-by-step reasoning. I'm thinking

00:09:27.629 --> 00:09:30.009
of it like trying to assemble a complex piece

00:09:30.009 --> 00:09:33.539
of IKEA furniture. If you ask... 10 different

00:09:33.539 --> 00:09:37.559
specialized people to each do one step without

00:09:37.559 --> 00:09:40.220
perfectly clear, instant communication, you just

00:09:40.220 --> 00:09:43.639
end up with a huge mess. A huge mess. That is

00:09:43.639 --> 00:09:45.419
exactly what happened in the experiments. For

00:09:45.419 --> 00:09:47.539
these high logic planning tasks, performance

00:09:47.539 --> 00:09:51.440
dropped by up to 70%. 70%. That's a catastrophic

00:09:51.440 --> 00:09:54.139
failure rate for system complexity and cost.

00:09:54.379 --> 00:09:57.000
It is. It just completely flips the entire assumption

00:09:57.000 --> 00:10:00.000
we started with on its head. It means blindly

00:10:00.000 --> 00:10:02.659
throwing more compute at a complex problem is

00:10:02.659 --> 00:10:05.340
actually worse than just having one dedicated,

00:10:05.600 --> 00:10:07.759
highly trained specialist. So why did it fail?

00:10:07.820 --> 00:10:09.700
What's the mechanism? It's both economic and

00:10:09.700 --> 00:10:11.840
technical. The multi-agent setups just burn

00:10:11.840 --> 00:10:14.250
through tokens. And we should remember, tokens

00:10:14.250 --> 00:10:17.309
are the fundamental economic unit of AI. They

00:10:17.309 --> 00:10:19.269
represent the computational weight you pay for.

00:10:19.429 --> 00:10:22.950
They duplicated reasoning steps. They overcomplicated

00:10:22.950 --> 00:10:25.629
workflows that really demanded singular,

00:10:25.629 --> 00:10:29.230
high-precision logic. So in essence, they wasted

00:10:29.230 --> 00:10:32.110
huge amounts of computational weight, which translates

00:10:32.110 --> 00:10:34.850
directly into massively increased operating costs

00:10:34.850 --> 00:10:38.970
for absolutely zero benefit. Zero. This confirms

00:10:38.970 --> 00:10:42.559
that finding. More AI does not equal more IQ.

00:10:42.940 --> 00:10:45.759
And this is such a crucial takeaway for any builder

00:10:45.759 --> 00:10:48.299
out there. For high logic or long-term memory

00:10:48.299 --> 00:10:51.940
tasks, a single, finely tuned agent can just

00:10:51.940 --> 00:10:56.000
crush a fancy, complex, multi-agent team. A

00:10:56.000 --> 00:10:58.139
lot of these agent stacks are just introducing

00:10:58.139 --> 00:11:00.860
fragility and cost instead of actually solving

00:11:00.860 --> 00:11:03.720
hard problems. So based on this, should builders

00:11:03.720 --> 00:11:06.320
just abandon multi-agent systems entirely? Is

00:11:06.320 --> 00:11:08.179
that the takeaway? No, not at all. They still

00:11:08.179 --> 00:11:10.399
excel in that massive parallel data splitting

00:11:10.399 --> 00:11:13.039
we talked about, but they clearly fail in high

00:11:13.039 --> 00:11:15.820
logic precision chains where one weak link can

00:11:15.820 --> 00:11:17.759
just spoil the whole process. Okay, so what does

00:11:17.759 --> 00:11:19.799
all this mean for you as you integrate this knowledge

00:11:19.799 --> 00:11:22.019
into your work or just your curiosity? We saw

00:11:22.019 --> 00:11:24.519
three powerful connected forces at play today.

00:11:24.909 --> 00:11:27.490
First, there's the marketplace risk. That instant

00:11:27.490 --> 00:11:29.509
distribution comes at the cost of your brand

00:11:29.509 --> 00:11:32.830
identity and potentially immense liability. You've

00:11:32.830 --> 00:11:34.690
got to build your brand outside the convenience

00:11:34.690 --> 00:11:37.669
of that walled platform. Second, tool speed and

00:11:37.669 --> 00:11:40.529
utility. Models are faster than ever, driven

00:11:40.529 --> 00:11:44.350
by these massive financial valuations. But that

00:11:44.350 --> 00:11:47.370
speed is useless if you don't maximize the knowledge

00:11:47.370 --> 00:11:49.970
transfer. So, you know, use the Feynman loop

00:11:49.970 --> 00:11:52.529
to guarantee retention. And finally, the paradox

00:11:52.529 --> 00:11:56.399
of scale. Complexity kills precision. Adding

00:11:56.399 --> 00:11:58.820
more agents doesn't guarantee a higher IQ. It

00:11:58.820 --> 00:12:01.659
often just raises your token costs and introduces

00:12:01.659 --> 00:12:04.559
these crippling failure points in logic-driven

00:12:04.559 --> 00:12:07.500
tasks. And this entire landscape is just shifting

00:12:07.500 --> 00:12:10.159
so quickly. Yeah, we even noted the intense debate

00:12:10.159 --> 00:12:13.019
over spotting AI content. The sources mentioned

00:12:13.019 --> 00:12:15.990
five reliable ways besides just... looking for

00:12:15.990 --> 00:12:18.690
the prevalence of em dashes. That's a whole discussion

00:12:18.690 --> 00:12:20.870
that probably deserves its own deep dive. We'll

00:12:20.870 --> 00:12:22.490
save that for another day. Yeah. But here's

00:12:22.490 --> 00:12:24.210
our final provocative thought for you to mull

00:12:24.210 --> 00:12:27.250
over. If the Feynman loop prompt is so incredibly

00:12:27.250 --> 00:12:29.850
effective for teaching humans and ensuring validated

00:12:29.850 --> 00:12:32.470
understanding, what would happen if we applied

00:12:32.470 --> 00:12:35.090
that same technique, requiring the AI to prove

00:12:35.090 --> 00:12:36.970
its depth of understanding to the foundational

00:12:36.970 --> 00:12:39.929
models themselves? Could that finally solve the

00:12:39.929 --> 00:12:42.149
issue of AI hallucination and the prompt drift

00:12:42.149 --> 00:12:43.750
we were wrestling with in the first segment?

00:12:44.090 --> 00:12:47.330
I mean, think about that potential. Deeply validated,

00:12:47.509 --> 00:12:50.610
reliable understanding rather than just probabilistic

00:12:50.610 --> 00:12:54.169
synthesis. It could be the key to true AI reliability

00:12:54.169 --> 00:12:57.409
across every single application. We really encourage

00:12:57.409 --> 00:12:59.970
you to explore that profound idea further. We

00:12:59.970 --> 00:13:01.629
appreciate you taking this deep dive with us.

00:13:01.669 --> 00:13:03.330
Until next time, keep exploring the sources.

