WEBVTT

00:00:00.000 --> 00:00:02.459
You've probably used an AI, right? You asked

00:00:02.459 --> 00:00:04.679
it for something, maybe a quick summary, some

00:00:04.679 --> 00:00:07.219
code, a social media post. Yeah, and you get

00:00:07.219 --> 00:00:09.820
something back. It works, technically. But it's

00:00:09.820 --> 00:00:12.880
usually just... Yeah. Okay. Average. That average

00:00:12.880 --> 00:00:14.740
result. Well, believe it or not, that's often

00:00:14.740 --> 00:00:16.620
not the model's fault entirely. It's more about

00:00:16.620 --> 00:00:18.839
the architecture. Architecture. How so? We're

00:00:18.839 --> 00:00:21.539
still giving it, like, simple instructions when

00:00:21.539 --> 00:00:25.140
we should be thinking in terms of recipes, design

00:00:25.140 --> 00:00:28.530
patterns. Okay. Welcome to the Deep Dive. This

00:00:28.530 --> 00:00:31.309
is for you, the listener, who's ready to go beyond

00:00:31.309 --> 00:00:33.630
just basic prompting. You're looking to build

00:00:33.630 --> 00:00:36.950
real AI agency. Exactly. We all know the basic

00:00:36.950 --> 00:00:39.810
parts, right? The model, that's the brain, the

00:00:39.810 --> 00:00:43.549
tools, those are the hands, and evals, the

00:00:43.549 --> 00:00:46.299
quality checks. Crucial stuff. But those are

00:00:46.299 --> 00:00:48.799
just the ingredients. Spot on. Today we're digging

00:00:48.799 --> 00:00:51.259
into the structure, the recipes. We're exploring

00:00:51.259 --> 00:00:54.740
four critical design patterns that take an AI

00:00:54.740 --> 00:00:57.060
agent from just following a command to running

00:00:57.060 --> 00:00:59.420
a really sophisticated autonomous workflow. So

00:00:59.420 --> 00:01:01.659
we're shifting the focus from what AI can do

00:01:01.659 --> 00:01:04.519
to how we make sure it does it well, reliably,

00:01:04.980 --> 00:01:08.519
every single time. Let's start with recipe one,

00:01:08.840 --> 00:01:12.430
reflection. This one feels like maybe the easiest

00:01:12.430 --> 00:01:14.450
entry point, something you can use right away,

00:01:14.670 --> 00:01:16.870
no fancy tools needed. Yeah, it's definitely

00:01:16.870 --> 00:01:19.909
powerful and accessible. The core idea is simple.

00:01:20.510 --> 00:01:22.890
Build in self-critique. Don't just take the

00:01:22.890 --> 00:01:25.989
AI's first answer. Okay. Instead, you force it

00:01:25.989 --> 00:01:29.250
to review its own work, critique it against specific

00:01:29.250 --> 00:01:31.090
rules before it even thinks about rewriting.

00:01:31.390 --> 00:01:33.730
Right. Because the usual way, the basic prompting,

00:01:33.870 --> 00:01:35.870
it's often too vague. We ask it to write something,

00:01:35.989 --> 00:01:37.790
then we just say... Is that good? Make it better.

00:01:37.989 --> 00:01:40.810
Exactly. And better could mean anything. So the

00:01:40.810 --> 00:01:43.209
AI just kind of fiddles with words. It often

00:01:43.209 --> 00:01:45.629
misses the real problem. Yeah. So the pro-level

00:01:45.629 --> 00:01:48.569
move here, the expert recipe, is using a specific

00:01:48.569 --> 00:01:51.180
structured rubric. A rubric. Like in school?

00:01:51.519 --> 00:01:53.920
Sort of, yeah. It forces a structured analysis.

00:01:54.140 --> 00:01:56.480
Let's say the AI writes a blog post. Instead

00:01:56.480 --> 00:01:58.980
of make it better, you give it, say, three things

00:01:58.980 --> 00:02:01.780
to grade itself on. Scale of one to five. OK.

00:02:02.060 --> 00:02:04.480
Grade the intro for clarity and hook one to five.

00:02:04.620 --> 00:02:06.900
Yep. Then maybe grade the examples used. How

00:02:06.900 --> 00:02:08.879
relevant are they? One to five. And the call

00:02:08.879 --> 00:02:11.919
to action, is it clear? One to five. Exactly.

00:02:11.979 --> 00:02:13.979
You're making the agent switch gears. It goes

00:02:13.979 --> 00:02:17.219
from just a fast writing mode. To thinking mode.

00:02:17.400 --> 00:02:20.360
Analysis mode. Right, structured analysis. And

00:02:20.360 --> 00:02:23.659
the trick often is using structured formats,

00:02:24.259 --> 00:02:27.819
things like XML tags or maybe JSON. You put the

00:02:27.819 --> 00:02:31.430
draft. Inside draft tags? Inside rubric tags?

00:02:31.550 --> 00:02:33.530
Why do those tags matter so much? Why not just

00:02:33.530 --> 00:02:35.909
bullet points in the prompt? It's about signaling,

00:02:35.930 --> 00:02:39.330
really. When the AI sees those specific tags,

00:02:39.449 --> 00:02:42.490
draft, rubric, it knows its job isn't creative

00:02:42.490 --> 00:02:45.889
writing anymore. Its job is now logical parsing,

00:02:45.969 --> 00:02:48.849
following strict rules. It has to look inside

00:02:48.849 --> 00:02:51.270
those tags, find the flaws according to the rubric

00:02:51.270 --> 00:02:53.370
before it can try again. It literally needs to

00:02:53.370 --> 00:02:56.050
figure out why it got a 2 out of 5 on clarity

00:02:56.050 --> 00:02:57.810
before rewriting. That makes a lot of sense.
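
NOTE
A minimal Python sketch of the rubric-based reflection loop described above.
It assumes a caller-supplied call_model(prompt) function that returns text; that
function, the tag names, and the 'criterion: score' reply format are
illustrative assumptions, not anything named in the episode.
```python
import re
#
# Criteria follow the blog-post example; the exact wording is an assumption.
RUBRIC = ["intro clarity and hook", "relevance of examples", "clarity of call to action"]
#
def build_critique_prompt(draft, rubric=RUBRIC):
    """Wrap the draft and rubric in XML-style tags and ask for 1-5 scores."""
    criteria = "\n".join(f"- {c}" for c in rubric)
    return (
        f"<draft>\n{draft}\n</draft>\n"
        f"<rubric>\n{criteria}\n</rubric>\n"
        "Score each criterion from 1 to 5 on its own line as 'criterion: score', "
        "explain each score, and do not rewrite anything yet."
    )
#
def parse_scores(critique, rubric=RUBRIC):
    """Pull 'criterion: N' scores out of the critique text."""
    scores = {}
    for criterion in rubric:
        match = re.search(re.escape(criterion) + r"\s*:\s*([1-5])", critique)
        if match:
            scores[criterion] = int(match.group(1))
    return scores
#
def reflect(draft, call_model, threshold=4):
    """Critique once, then rewrite only if a score is missing or below threshold."""
    critique = call_model(build_critique_prompt(draft))
    scores = parse_scores(critique)
    if len(scores) == len(RUBRIC) and min(scores.values()) >= threshold:
        return draft, scores
    rewrite_prompt = (
        f"<draft>\n{draft}\n</draft>\n"
        f"<critique>\n{critique}\n</critique>\n"
        "Rewrite the draft so that it fixes every problem named in the critique."
    )
    return call_model(rewrite_prompt), scores
```
The critique and the rewrite are separate model calls, which is where the
extra latency and token cost discussed next come from.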

00:02:57.870 --> 00:02:59.689
It's like putting up guardrails. Okay, but let's

00:02:59.689 --> 00:03:02.669
talk trade-offs. Doesn't this double the time

00:03:02.669 --> 00:03:04.870
and the cost? You're basically running it twice.

00:03:05.110 --> 00:03:07.349
It absolutely does increase latency and cost.

00:03:07.530 --> 00:03:09.389
Yeah, it's a multi-step process. Takes longer,

00:03:09.409 --> 00:03:11.969
uses more tokens, no question. Why do it? Because

00:03:11.969 --> 00:03:14.650
you're paying for quality assurance. It's a

00:03:14.650 --> 00:03:17.909
trade-off, yes, but the improvement you get, going

00:03:17.909 --> 00:03:21.210
from a mediocre draft to a really solid final piece,

00:03:21.430 --> 00:03:24.789
that quality jump usually far outweighs the extra

00:03:24.789 --> 00:03:27.349
cost or time. It's an investment, then, in getting

00:03:27.349 --> 00:03:29.849
it right. Precisely. I have to admit, though,

00:03:30.409 --> 00:03:32.830
even knowing this, sometimes I still catch myself

00:03:32.830 --> 00:03:35.789
typing that lazy make-it-better prompt. You

00:03:35.789 --> 00:03:38.770
know, before I stop, delete it and actually build

00:03:38.770 --> 00:03:40.669
the rubric. It's a habit that's kind of hard

00:03:40.669 --> 00:03:42.530
to break. That's honest, and I think a lot of

00:03:42.530 --> 00:03:44.530
people can relate. It takes discipline. But OK,

00:03:44.550 --> 00:03:46.830
if you had to boil down the immediate benefit

00:03:46.830 --> 00:03:49.750
of Recipe 1, what's the core shift for the user?

00:03:49.990 --> 00:03:53.030
Quality goes up instantly, because the AI has

00:03:53.030 --> 00:03:56.389
to judge itself against clear, strict rules.

00:03:56.539 --> 00:03:59.400
OK, so reflection helps the AI improve itself

00:03:59.400 --> 00:04:02.219
internally. Recipe two, tool use, is about giving

00:04:02.219 --> 00:04:04.199
it external powers. Giving it hands, you said

00:04:04.199 --> 00:04:06.439
earlier. Connecting it to the outside world.

00:04:06.800 --> 00:04:10.199
APIs, search, databases. Exactly. And modern models,

00:04:10.219 --> 00:04:12.060
they're pretty smart. They often know they might

00:04:12.060 --> 00:04:14.319
need a tool. But just relying on that built-in

00:04:14.319 --> 00:04:16.339
knowledge, that zero-shot ability, it can get

00:04:16.339 --> 00:04:18.819
tripped up by complex questions. Yeah, like if

00:04:18.819 --> 00:04:20.540
you ask something with multiple parts. Right.

00:04:20.800 --> 00:04:22.720
You ask, is it going to be cold in Hanoi tomorrow?

00:04:24.069 --> 00:04:28.170
And based on that, should I pack a jacket and

00:04:28.170 --> 00:04:30.370
maybe an umbrella? Okay, that's a few steps.

00:04:30.449 --> 00:04:32.350
Yeah, and the AI might just check the temperature

00:04:32.350 --> 00:04:34.629
but completely forget the part about the umbrella.

00:04:34.769 --> 00:04:37.050
It gets confused. So the pro-level technique

00:04:37.050 --> 00:04:39.589
is using guiding examples. Few-shot learning,

00:04:39.670 --> 00:04:41.910
right? Exactly. We don't just tell it the tool

00:04:41.910 --> 00:04:44.990
exists. We show it how to think through specific

00:04:44.990 --> 00:04:47.410
problems using examples. Like teaching it the

00:04:47.410 --> 00:04:50.589
decision process. Precisely. If a user asks that

00:04:50.589 --> 00:04:52.689
two-part question, maybe... How much does this

00:04:52.689 --> 00:04:55.750
cost and is it in stock? The example teaches

00:04:55.750 --> 00:04:58.310
the AI an internal thought process. Like what?

00:04:58.509 --> 00:05:00.310
Like, OK, wait, I need the product ID before

00:05:00.310 --> 00:05:05.230
I can check price or stock. So step one, I must

00:05:05.230 --> 00:05:08.959
use the ask user for info action first. Ah, so

00:05:08.959 --> 00:05:10.779
the key isn't just using the tool. It's teaching

00:05:10.779 --> 00:05:13.019
the AI when to stop and ask for missing info.

00:05:13.139 --> 00:05:16.079
Yes, that dramatically cuts down on errors. On

00:05:16.079 --> 00:05:18.860
the AI just guessing or making stuff up. So teaching

00:05:18.860 --> 00:05:21.560
the context, the steps needed, makes the whole

00:05:21.560 --> 00:05:24.060
system way more predictable and less error-prone.
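
NOTE
A rough Python sketch of the few-shot idea described above: worked examples
show the model when to stop and ask for missing information instead of
guessing. The prompt layout, the get_price tool name, and the reply format are
hypothetical; only the "ask user for info" action comes from the conversation.
```python
# Each example pairs a question with the reasoning and action we want imitated.
FEW_SHOT = [
    {"user": "How much does this cost and is it in stock?",
     "thought": "I need the product ID before I can check price or stock.",
     "action": "ask_user_for_info",
     "action_input": "Which product do you mean?"},
    {"user": "How much does product 123 cost and is it in stock?",
     "thought": "I have the product ID, so I can check price, then stock.",
     "action": "get_price",
     "action_input": "123"},
]
#
def build_prompt(question, examples=FEW_SHOT):
    """Render the worked examples ahead of the new question."""
    parts = []
    for ex in examples:
        parts.append(
            f"User: {ex['user']}\nThought: {ex['thought']}\n"
            f"Action: {ex['action']}\nAction input: {ex['action_input']}"
        )
    parts.append(f"User: {question}\nThought:")
    return "\n\n".join(parts)
#
def parse_action(reply):
    """Extract the action name and its input from a model reply."""
    action, action_input = None, None
    for line in reply.splitlines():
        if line.startswith("Action input:"):
            action_input = line.split(":", 1)[1].strip()
        elif line.startswith("Action:"):
            action = line.split(":", 1)[1].strip()
    return action, action_input
```
The calling code would run the named tool only after parse_action returns a
known action, and would relay the question to the user when the action is
ask_user_for_info.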

00:05:24.240 --> 00:05:27.660
Totally. It reduces errors drastically. Makes

00:05:27.660 --> 00:05:31.740
it reliable. Yeah. Whoa. Just imagine scaling

00:05:31.740 --> 00:05:34.939
that. A system built with these robust guiding

00:05:34.939 --> 00:05:39.129
examples. Handling, say, a billion complex financial

00:05:39.129 --> 00:05:42.449
queries a day. Wow. That level of reliable connection

00:05:42.449 --> 00:05:45.550
to real-world systems, that unlocks some serious

00:05:45.550 --> 00:05:48.269
business value. Truly powerful stuff. Okay, so

00:05:48.269 --> 00:05:52.209
tools give the AI hands. This next recipe, planning.

00:05:53.579 --> 00:05:55.759
That sounds like giving it foresight. Yeah, now

00:05:55.759 --> 00:05:57.680
we're getting into really autonomous territory.

00:05:58.040 --> 00:05:59.939
The user just gives the high-level goal, and

00:05:59.939 --> 00:06:02.120
the agent has to figure out the entire

00:06:02.120 --> 00:06:04.639
step-by-step plan itself. But basic planning often fails,

00:06:04.680 --> 00:06:06.600
right? It does, because the AI tends to jump

00:06:06.600 --> 00:06:08.579
the gun. You say... Plan a trip to Paris for

00:06:08.579 --> 00:06:10.240
three days. And it immediately starts looking

00:06:10.240 --> 00:06:13.139
for flights. Exactly. Flights, hotels. It skips

00:06:13.139 --> 00:06:14.839
the crucial first step, asking questions. What's

00:06:14.839 --> 00:06:16.839
the budget? When are you going? Who's even traveling?

00:06:16.920 --> 00:06:19.759
It just defaults to a generic plan. Right. Minimum

00:06:19.759 --> 00:06:21.779
effort solution. Yeah. So the better structure,

00:06:22.000 --> 00:06:25.519
often called ReAct, reasoning and acting. It

00:06:25.519 --> 00:06:28.459
forces a mandatory plan and critique cycle. Oh,

00:06:28.459 --> 00:06:30.680
cycle. Yeah. A rigid four-step process it must

00:06:30.680 --> 00:06:32.620
follow before it does anything. OK, what are

00:06:32.620 --> 00:06:35.639
the steps? Step one. Thought. Clearly define the user's

00:06:35.639 --> 00:06:39.160
ultimate goal. Step two. Initial plan. Write

00:06:39.160 --> 00:06:42.379
down the steps it thinks it needs. One. Two.

00:06:42.819 --> 00:06:45.459
Three. Makes sense. Step three. Critique. This is the core self-critique.

00:06:45.699 --> 00:06:48.600
The AI must ask itself. What information am I

00:06:48.600 --> 00:06:53.120
missing? Budget. Dates. Interests. Where could

00:06:53.120 --> 00:06:55.899
this plan go wrong? This is where it confronts

00:06:55.899 --> 00:06:58.079
its own ignorance. It forces it to see the gaps.

00:06:58.240 --> 00:07:02.180
Exactly. Then step four. Final plan. It takes

00:07:02.180 --> 00:07:05.500
the critique and fixes the plan. And here's the

00:07:05.500 --> 00:07:08.519
key. The very first step in that final plan,

00:07:08.740 --> 00:07:10.519
almost always it should be using that ask user

00:07:10.519 --> 00:07:12.839
for info tool we talked about. Ah, connecting

00:07:12.839 --> 00:07:15.300
back to tool use. Yep. This whole structure stops

00:07:15.300 --> 00:07:17.540
the AI from just rushing ahead vaguely. It makes

00:07:17.540 --> 00:07:20.160
sure it gets the necessary info first before

00:07:20.160 --> 00:07:22.259
wasting time or, you know, computing resources.
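
NOTE
A small Python sketch of the four-step plan-and-critique cycle described
above. The prompt wording and the finalize_plan helper are assumptions made
for illustration; the one detail taken from the conversation is that the final
plan should open with an ask-user-for-info step when information is missing.
```python
PLAN_TEMPLATE = (
    "Goal from user: {goal}\n"
    "Work through these four sections in order before taking any action:\n"
    "1. Thought: state the user's ultimate goal.\n"
    "2. Initial plan: list the steps you think are needed.\n"
    "3. Critique: what information is missing, and where could the plan go wrong?\n"
    "4. Final plan: revise the plan. If anything is missing, step one must be ask_user_for_info.\n"
)
#
def build_planning_prompt(goal):
    """Fill the four-section template with the user's high-level goal."""
    return PLAN_TEMPLATE.format(goal=goal)
#
def finalize_plan(initial_plan, missing_info):
    """Apply the critique: put an ask_user_for_info step first when gaps exist."""
    if not missing_info:
        return list(initial_plan)
    question = "ask_user_for_info: " + ", ".join(missing_info)
    return [question] + list(initial_plan)
```
For the Paris example, finalize_plan(["search flights", "search hotels"],
["budget", "dates", "travelers"]) puts the clarifying question ahead of any
search, so no tool call is spent on a generic plan.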

00:07:22.420 --> 00:07:24.720
So that mandatory self-critique phase forces

00:07:24.720 --> 00:07:27.279
it to gather info first. That one step prevents

00:07:27.279 --> 00:07:29.800
the rush to a bad generic solution. It builds

00:07:29.800 --> 00:07:31.639
quality in right from the start. Okay, that makes

00:07:31.639 --> 00:07:33.360
a lot of sense. We need to take a quick break.

00:07:33.420 --> 00:07:34.939
When we come back, we'll dive into the final

00:07:34.939 --> 00:07:36.920
most advanced recipe, multi-agent workflows.

00:07:37.279 --> 00:07:39.879
Sounds good.

00:07:39.879 --> 00:07:42.759
All right, we're back.

00:07:42.879 --> 00:07:45.639
We've covered reflection, tool use, and planning.

00:07:46.500 --> 00:07:48.360
Now for the final recipe, which you said is the

00:07:48.360 --> 00:07:51.740
most advanced, multi-agent workflow. Yeah, this

00:07:51.740 --> 00:07:55.199
is where we stop trying to make one single general

00:07:55.199 --> 00:07:58.540
AI do everything perfectly. Instead, we build

00:07:58.540 --> 00:08:02.019
a team. A team of specialized AI agents. OK,

00:08:02.019 --> 00:08:04.399
because the normal way, that's usually just us,

00:08:04.500 --> 00:08:06.959
right? Doing the glue work. Exactly. Manual glue

00:08:06.959 --> 00:08:10.459
work. You ask your research AI for market trends.

00:08:10.620 --> 00:08:13.199
You copy the text. You paste it into your writing

00:08:13.199 --> 00:08:15.680
AI to generate some ad copy. And you're the one

00:08:15.680 --> 00:08:17.839
connecting the dots, trying to keep the context

00:08:17.839 --> 00:08:20.139
straight between copy paste. Right. And context

00:08:20.139 --> 00:08:23.360
gets lost or muddied. It's inefficient. The pro

00:08:23.360 --> 00:08:25.920
approach is setting up clear specialized roles

00:08:25.920 --> 00:08:28.199
for each agent and really strict rules for how

00:08:28.199 --> 00:08:30.220
they hand off information. Think of it like

00:08:30.220 --> 00:08:32.600
a hyper-specialized assembly line. An assembly

00:08:32.600 --> 00:08:35.320
line for AI. Pretty much. The key concept here

00:08:35.320 --> 00:08:37.419
is fighting something called function creep.

00:08:37.679 --> 00:08:39.840
Function creep? What's that? It's when a single

00:08:39.840 --> 00:08:42.000
agent starts getting asked to do jobs it wasn't

00:08:42.000 --> 00:08:45.259
really designed or optimized for. Its core quality

00:08:45.259 --> 00:08:47.200
slowly degrades because it's being stretched

00:08:47.200 --> 00:08:49.919
too thin. Specialization prevents that. Okay,

00:08:49.940 --> 00:08:51.899
let's use that ad campaign example you mentioned.

00:08:52.279 --> 00:08:54.980
So instead of one AI, we have three. Yeah, let's

00:08:54.980 --> 00:08:58.100
say three distinct roles. Agent one. We'll call

00:08:58.100 --> 00:09:02.600
him Data Dave. Data Dave. OK. His job is purely

00:09:02.600 --> 00:09:06.580
logical, factual, analyze market trends. And

00:09:06.580 --> 00:09:09.480
crucially, his output must be a very strict JSON

00:09:09.480 --> 00:09:12.879
object. Maybe detailing target audience, key

00:09:12.879 --> 00:09:15.879
trends, the core pain point. He only speaks data,

00:09:16.399 --> 00:09:19.399
no creativity allowed. Strict JSON. Got it. Then

00:09:19.399 --> 00:09:22.539
that JSON goes to? Creative Carla. Her role is

00:09:22.539 --> 00:09:25.720
all about emotion narrative. turning Dave's dry

00:09:25.720 --> 00:09:27.899
data into, say, three compelling ad options.

00:09:27.960 --> 00:09:29.740
Yeah, because she gets that strict JSON contract

00:09:29.740 --> 00:09:31.340
from Dave. She knows exactly which pain points

00:09:31.340 --> 00:09:33.519
she needs to address. No guesswork. OK, makes

00:09:33.519 --> 00:09:36.279
sense. And agent three. Manager Mike. His role

00:09:36.279 --> 00:09:38.480
is practical, results focused. He takes Carla's

00:09:38.480 --> 00:09:40.779
creative ads, compares them against Dave's original

00:09:40.779 --> 00:09:42.480
data analysis. Checks if they actually match

00:09:42.480 --> 00:09:44.879
the research. Right. And then he picks the single

00:09:44.879 --> 00:09:47.779
best ad and explains why based on the data and

00:09:47.779 --> 00:09:51.110
the goal. OK, but. Couldn't you just use one

00:09:51.110 --> 00:09:54.330
really big powerful model like GPT -4 for all

00:09:54.330 --> 00:09:57.110
of this? Seems way less complicated than setting

00:09:57.110 --> 00:09:59.649
up and managing this whole AI team. It might

00:09:59.649 --> 00:10:02.230
seem simpler up front, yeah, but the quality

00:10:02.230 --> 00:10:04.889
difference can be huge. That specialization is

00:10:04.889 --> 00:10:07.649
key. Why though, if the big model is smart enough?

00:10:08.309 --> 00:10:10.070
Because of that function creep we talked about.

00:10:10.269 --> 00:10:13.049
Data Dave cannot write good ad copy. Creative

00:10:13.049 --> 00:10:16.129
Carla cannot do rigorous data analysis. Forcing

00:10:16.129 --> 00:10:18.649
them into specialized lanes maintains peak performance

00:10:18.649 --> 00:10:21.110
for their specific task. And the other crucial

00:10:21.110 --> 00:10:23.889
piece, that strict JSON format for handoffs,

00:10:24.289 --> 00:10:26.669
it acts like an unbreakable contract between

00:10:26.669 --> 00:10:30.490
them. It drastically reduces the risk of hallucination

00:10:30.490 --> 00:10:32.690
or misinterpretation as information moves down

00:10:32.690 --> 00:10:35.409
the line. Context is preserved perfectly. The

00:10:35.409 --> 00:10:37.950
quality boost comes from forcing specialization

00:10:37.950 --> 00:10:40.090
and using that strict data contract between

00:10:40.090 --> 00:10:42.129
them. Exactly. Specialization stops function

00:10:42.129 --> 00:10:45.429
creep. Strict JSON ensures perfect context transfer.
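
NOTE
A hedged Python sketch of the strict JSON handoff described above: the
analyst agent's output is validated before the creative agent ever sees it.
The field names are assumptions based on the example (target audience, key
trends, core pain point), and the prompt text is illustrative only.
```python
import json
#
# Hypothetical contract: field name mapped to the type it must have.
REQUIRED_FIELDS = {"target_audience": str, "key_trends": list, "core_pain_point": str}
#
def parse_handoff(raw):
    """Parse the analyst agent's output and reject anything off-contract."""
    data = json.loads(raw)  # invalid JSON raises a ValueError subclass
    if not isinstance(data, dict):
        raise ValueError("handoff must be a JSON object")
    for field, expected in REQUIRED_FIELDS.items():
        if not isinstance(data.get(field), expected):
            raise ValueError(f"handoff field {field!r} missing or not {expected.__name__}")
    extra = set(data) - set(REQUIRED_FIELDS)
    if extra:
        raise ValueError(f"unexpected fields: {sorted(extra)}")
    return data
#
def build_copywriter_prompt(handoff):
    """Pass only the validated fields on to the creative agent."""
    return (
        "Write three ad options for the brief below. Use only these facts.\n"
        + json.dumps(handoff, indent=2, sort_keys=True)
    )
```
Validation catches malformed or off-contract handoffs; it does not check that
the facts themselves are right, which is the reviewing agent's job in the
example.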

00:10:45.610 --> 00:10:47.750
You get a better final product from the assembly

00:10:47.750 --> 00:10:50.450
line. Okay, so let's recap the big ideas. Four

00:10:50.450 --> 00:10:53.309
recipes for moving beyond basic prompts. Yep.

00:10:53.649 --> 00:10:56.070
First, reflection improving internal quality

00:10:56.070 --> 00:10:59.110
using a strict rubric for self -critique. Second,

00:10:59.789 --> 00:11:02.669
tool use giving the AI external capabilities

00:11:02.669 --> 00:11:05.490
but guiding it with few -shot examples so it

00:11:05.490 --> 00:11:08.440
knows when and how. Third, planning creating

00:11:08.440 --> 00:11:11.059
autonomy with that mandatory self -critique step

00:11:11.059 --> 00:11:13.419
to force information gathering before acting,

00:11:13.940 --> 00:11:16.059
using structures like ReAct. And fourth,

00:11:16.059 --> 00:11:18.259
multi-agent: building specialized teams that communicate

00:11:18.259 --> 00:11:21.419
via strict formats like JSON to maximize quality

00:11:21.419 --> 00:11:24.039
and avoid function creep. That's the progression.

00:11:24.320 --> 00:11:27.139
From simple internal checks to complex orchestrated

00:11:27.139 --> 00:11:29.539
teams. Now, you probably shouldn't try to build

00:11:29.539 --> 00:11:32.129
a complex multi-agent system tomorrow. Definitely

00:11:32.129 --> 00:11:34.769
not. My advice is start simple. Get comfortable

00:11:34.769 --> 00:11:37.330
with the earlier recipes first. Right. So here's

00:11:37.330 --> 00:11:39.169
some actionable homework for you, the listener.

00:11:39.529 --> 00:11:42.090
Try recipe one, reflection, today. Yeah, pick

00:11:42.090 --> 00:11:44.129
a simple task you do often, maybe writing email

00:11:44.129 --> 00:11:47.590
subject lines. OK. Ask your AI to write, say,

00:11:47.909 --> 00:11:50.610
three subject lines for an email. Then immediately

00:11:50.610 --> 00:11:53.309
after, give it a rubric. Tell it. Grade these

00:11:53.309 --> 00:11:56.210
subject lines from one to five on curiosity and

00:11:56.210 --> 00:11:58.289
one to five on urgency. Make it be strict. And

00:11:58.289 --> 00:12:01.029
then the final step: demand new subject lines.

00:12:01.029 --> 00:12:03.750
Tell it: now write three new ones that score five

00:12:03.750 --> 00:12:06.929
out of five on both curiosity and urgency, based

00:12:06.929 --> 00:12:09.450
on your own critique. And just notice the difference

00:12:09.450 --> 00:12:12.490
between that first attempt and the refined, reflected

00:12:12.490 --> 00:12:14.990
version. It can be dramatic. It really shows that

00:12:14.990 --> 00:12:17.879
just changing a prompt slightly, adding that

00:12:17.879 --> 00:12:20.179
reflection step can give you such a better result.

00:12:21.039 --> 00:12:23.120
Imagine the power when you start layering these

00:12:23.120 --> 00:12:26.259
patterns, tool use on top of reflection, planning,

00:12:26.279 --> 00:12:28.740
managing tool use. It compounds. So here's a

00:12:28.740 --> 00:12:30.259
final thought to leave you with, something to

00:12:30.259 --> 00:12:33.039
chew on. OK. If your AI agent, the one you use

00:12:33.039 --> 00:12:35.139
every day, if it always checked what critical

00:12:35.139 --> 00:12:36.799
information it was missing before it took any

00:12:36.799 --> 00:12:39.919
action, how would that one change, that architectural

00:12:39.919 --> 00:12:42.279
shift based on the planning recipe, how would

00:12:42.279 --> 00:12:44.700
that fundamentally change your daily workflow,

00:12:44.879 --> 00:12:47.559
your decision-making? That's a really interesting

00:12:47.559 --> 00:12:50.159
question to consider. How much guesswork would

00:12:50.159 --> 00:12:53.179
that eliminate? Something to think about. Definitely.

00:12:53.960 --> 00:12:55.980
A thought to carry with you until our next deep

00:12:55.980 --> 00:12:57.620
dive. Thanks for tuning in.