WEBVTT

00:00:00.000 --> 00:00:03.220
Imagine feeding a single book draft into an AI.

00:00:04.299 --> 00:00:06.580
Instantly, you get five completely different

00:00:06.580 --> 00:00:09.460
reviews back. One review is from Linda. She's

00:00:09.460 --> 00:00:12.240
a retired teacher, brand new to AI. Another is

00:00:12.240 --> 00:00:14.859
from David, a corporate executive. Right. It's

00:00:14.859 --> 00:00:18.179
the exact same text draft, but you get completely

00:00:18.179 --> 00:00:22.660
distinct parallel perspectives. It feels like

00:00:22.660 --> 00:00:25.559
actual magic, but it's really just smart engineering.

00:00:25.780 --> 00:00:27.579
It really is. It's a fascinating shift in how

00:00:27.579 --> 00:00:30.170
we interact. Okay, let's unpack this. Welcome

00:00:30.170 --> 00:00:32.770
to our deep dive today. We're exploring the hidden

00:00:32.770 --> 00:00:35.390
power of Claude Code's subagents. We're going

00:00:35.390 --> 00:00:36.950
to look at what they actually are. We'll see

00:00:36.950 --> 00:00:39.509
why they save massive time and money. Then we'll

00:00:39.509 --> 00:00:42.090
help you build a custom plan roaster. We'll also

00:00:42.090 --> 00:00:44.929
set up vital safety guardrails you need. And

00:00:44.929 --> 00:00:46.990
finally, we'll scale up to dynamic workflows.

00:00:47.450 --> 00:00:50.130
I'm joined by our expert guide today. Let's dive

00:00:50.130 --> 00:00:53.130
right in. Hey there. I'm incredibly excited to

00:00:53.130 --> 00:00:55.109
get into this topic. That opening scenario with

00:00:55.109 --> 00:00:56.929
the book reviews sounds incredible, but I want

00:00:56.929 --> 00:00:59.270
to understand the mechanics here first. How does

00:00:59.270 --> 00:01:01.030
that parallel work actually happen behind the

00:01:01.030 --> 00:01:03.189
scenes? Well, it's a beautifully simple system

00:01:03.189 --> 00:01:05.370
of delegation. You can think of the main chat

00:01:05.370 --> 00:01:07.750
as your boss. The subagents are the workers.

00:01:07.989 --> 00:01:10.730
You only ever talk directly to the boss.

00:01:10.730 --> 00:01:12.870
Subagents don't talk to you directly. They

00:01:12.870 --> 00:01:14.530
don't even talk to each other. They just do their

00:01:14.530 --> 00:01:17.000
job and report back. So it's kind of like a bustling

00:01:17.000 --> 00:01:19.579
restaurant kitchen. The main session is the head

00:01:19.579 --> 00:01:22.040
chef talking to customers. The subagents are

00:01:22.040 --> 00:01:24.420
prep cooks chopping onions in the back. Yeah,

00:01:24.480 --> 00:01:26.900
that's a perfect analogy. And setting up your

00:01:26.900 --> 00:01:29.560
kitchen this way gives four massive benefits.

00:01:29.859 --> 00:01:33.159
The very first one is keeping a pristine context

00:01:33.159 --> 00:01:37.040
window. Long AI chats get polluted incredibly

00:01:37.040 --> 00:01:40.489
fast. Yeah, because you make the AI read up on

00:01:40.489 --> 00:01:43.090
stuff, like reading about Firefly's AI just

00:01:43.090 --> 00:01:45.810
for a quick summary, and that junk stays in its context

00:01:45.810 --> 00:01:48.409
for the rest of the session. Nobody wants that useless noise clogging

00:01:48.409 --> 00:01:50.909
up the conversation. Exactly. Every single token

00:01:50.909 --> 00:01:53.069
matters. The second benefit is a drastically

00:01:53.069 --> 00:01:56.150
lower overall cost. You use the really smart

00:01:56.150 --> 00:01:59.349
Opus model for planning, but you spin up cheaper

00:01:59.349 --> 00:02:02.290
Haiku models for heavy reading. That makes perfect

00:02:02.290 --> 00:02:04.109
financial sense right there. You don't pay a

00:02:04.109 --> 00:02:07.280
head chef to peel potatoes. Spot on. The third

00:02:07.280 --> 00:02:10.060
benefit is pure parallel work. You can review

00:02:10.060 --> 00:02:13.020
15 book chapters at the exact same time. You

00:02:13.020 --> 00:02:14.759
could research five different competitors all

00:02:14.759 --> 00:02:18.219
at once. That speed is just incredible. I know

00:02:18.219 --> 00:02:21.039
cramming 15 chapters into one chat tends to make the AI hallucinate,

00:02:21.159 --> 00:02:24.780
so splitting the work across parallel agents helps with that. But

00:02:24.780 --> 00:02:27.159
what about the actual quality of the feedback?

00:02:27.500 --> 00:02:30.699
That brings us right to the fourth huge benefit.

00:02:30.900 --> 00:02:34.770
You get a truly fresh review every time. Subagents

00:02:34.770 --> 00:02:36.949
always start with completely blank memories.

00:02:37.110 --> 00:02:39.729
They give you honest and unbiased feedback. They

00:02:39.729 --> 00:02:41.550
aren't trying to be agreeable. Is context pollution

00:02:41.550 --> 00:02:44.030
really that big of a deal in normal workflows?

00:02:44.289 --> 00:02:47.090
Yes. It's a massive problem. Forcing one model

00:02:47.090 --> 00:02:49.669
to hold every file degrades its focus. It loses

00:02:49.669 --> 00:02:52.270
its logic over time. So crowded memory makes

00:02:52.270 --> 00:02:55.270
the AI lose focus and logic. Exactly. The kitchen

00:02:55.270 --> 00:02:57.129
analogy makes total sense to me. I definitely

00:02:57.129 --> 00:02:59.469
want those prep cooks saving me time. But if

00:02:59.469 --> 00:03:01.819
I'm the head chef here. How do we actually hire

00:03:01.819 --> 00:03:03.819
them? How are they given their specific instructions?

00:03:04.159 --> 00:03:06.539
It's surprisingly simple under the hood. A custom

00:03:06.539 --> 00:03:08.539
subagent is literally just a single markdown

00:03:08.539 --> 00:03:11.539
file. You usually store it in a hidden .claude/agents

00:03:11.539 --> 00:03:13.819
folder. Wait, hold on. Just a standard text file?

00:03:14.000 --> 00:03:16.180
I assumed we were talking about complex Python

00:03:16.180 --> 00:03:18.639
scripts here. Nope. Just a standard markdown

00:03:18.639 --> 00:03:22.139
file. It has two completely distinct parts. The

00:03:22.139 --> 00:03:25.099
top part is the YAML front matter. Let's pause

00:03:25.099 --> 00:03:27.800
for a second. Define YAML front matter for us.

00:03:27.979 --> 00:03:30.969
A small settings block at the top of a file.

00:03:31.069 --> 00:03:33.830
Got it. It holds the name and the trigger description.

00:03:34.050 --> 00:03:36.610
It also holds the model you want to use. Just

00:03:36.610 --> 00:03:39.310
basic configuration then. So what is the second

00:03:39.310 --> 00:03:41.150
part of the file? The second part is the main

00:03:41.150 --> 00:03:44.150
body. That's the actual workflow. It tells the

00:03:44.150 --> 00:03:46.770
agent exactly how to think. It lists the exact

00:03:46.770 --> 00:03:49.729
steps and the required output format. I read

00:03:49.729 --> 00:03:51.830
that Claude uses progressive disclosure here.
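
NOTE
Editor's sketch: the two-part agent file described above (a YAML front matter block, then a Markdown body) could look like the sample below.
The fields name, description and model follow the speakers' description. The sample text and the helper split_agent_file are hypothetical,
and the simple "key: value" parsing is an assumption that stands in for a real YAML parser.
```python
SAMPLE_AGENT = """---
name: plan-roaster
description: Use this agent to critique a plan.
model: haiku
---
Find the weakest points in the plan.
Explain what could go wrong.
"""
def split_agent_file(text):
    """Return (settings, body) for a file that opens with a '---' delimited header."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("file must start with a '---' front matter line")
    end = lines.index("---", 1)  # closing delimiter of the settings block
    settings = {}
    for line in lines[1:end]:
        key, _, value = line.partition(":")  # simplified: one 'key: value' per line
        settings[key.strip()] = value.strip()
    body = "\n".join(lines[end + 1:]).strip()
    return settings, body
settings, body = split_agent_file(SAMPLE_AGENT)
print(settings["name"], "|", settings["model"])
print(body)
```
The short description sits in the header and the detailed instructions sit in the body, which matches the advice later in the episode to keep the trigger brief.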

00:03:51.990 --> 00:03:54.909
Why is that specific feature so important? Think

00:03:54.909 --> 00:03:58.000
about it like RAM on your computer. If Claude

00:03:58.000 --> 00:04:00.340
loaded massive instruction sets every time you

00:04:00.340 --> 00:04:02.699
said hello, it would just grind to a total halt.

00:04:03.379 --> 00:04:05.520
Progressive disclosure means it only reads that

00:04:05.520 --> 00:04:08.300
tiny trigger description first. Ah, I see. So

00:04:08.300 --> 00:04:10.319
if the trigger matches, it loads the full body.

00:04:10.479 --> 00:04:12.840
That saves a massive amount of processing power.
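
NOTE
Editor's sketch: a toy illustration of the load-on-demand idea behind progressive disclosure, where only the short description is held up front.
It does not claim to show how Claude Code implements routing. The class LazyAgent and the function fake_read are hypothetical names.
```python
class LazyAgent:
    """Keeps the short description in memory and loads the body only on demand."""
    def __init__(self, name, description, load_body):
        self.name = name
        self.description = description
        self._load_body = load_body  # callable that fetches the full instructions
        self._body = None
    def body(self):
        if self._body is None:
            self._body = self._load_body()  # first use triggers the expensive read
        return self._body
loads = []
def fake_read():
    loads.append("plan-roaster")  # records that the full body was fetched
    return "Find the weakest points in the plan. Explain what could go wrong."
agent = LazyAgent("plan-roaster", "Use this agent to critique a plan.", fake_read)
print(len(loads))  # nothing has been loaded at this point
agent.body()
print(len(loads))  # the body has now been fetched once and cached
```
Until body() is called, only the one-line description occupies memory, which is the saving the speakers describe.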

00:04:13.180 --> 00:04:15.900
That's exactly right. Let's look at a concrete

00:04:15.900 --> 00:04:18.680
example to make this real. We call it the plan

00:04:18.680 --> 00:04:21.420
roaster. I really love that name. It's a phenomenal

00:04:21.420 --> 00:04:24.730
tool. The trigger description is kept remarkably

00:04:24.730 --> 00:04:28.370
simple. It just says to use this agent to critique

00:04:28.370 --> 00:04:32.170
a plan. I have a confession to make here. I still

00:04:32.170 --> 00:04:34.910
wrestle with prompt drift and misfires myself.

00:04:35.250 --> 00:04:38.269
It's frustrating when the AI just ignores

00:04:38.269 --> 00:04:41.069
your tools. Here's where it gets really interesting,

00:04:41.170 --> 00:04:43.730
though. People fall into a psychological trap

00:04:43.730 --> 00:04:46.110
when building these. They make those trigger

00:04:46.110 --> 00:04:48.350
descriptions way too long. They really do. It's

00:04:48.350 --> 00:04:51.579
a very classic mistake. A long description confuses

00:04:51.579 --> 00:04:53.939
the whole routing system. You must keep it remarkably

00:04:53.939 --> 00:04:56.199
short and punchy. Put the real heavy lifting

00:04:56.199 --> 00:04:58.680
down in the body instead. Right. The body is

00:04:58.680 --> 00:05:01.319
where you get highly specific. You tell the agent

00:05:01.319 --> 00:05:04.000
to find the absolute weakest points. Yeah, and

00:05:04.000 --> 00:05:06.319
you tell it to explain what could go wrong. You

00:05:06.319 --> 00:05:08.759
want it to be totally ruthless. What happens

00:05:08.759 --> 00:05:11.060
if the subagent just doesn't fire when I want

00:05:11.060 --> 00:05:13.800
it to? Well, that's called a misfire. It usually

00:05:13.800 --> 00:05:16.079
means your description is too vague. You fix

00:05:16.079 --> 00:05:18.410
it by trimming the description down. If it ignores

00:05:18.410 --> 00:05:20.550
you, make the trigger description shorter and

00:05:20.550 --> 00:05:23.050
sharper. Precisely. Okay, we know how to build

00:05:23.050 --> 00:05:25.990
one now. But once we write this text file down,

00:05:26.290 --> 00:05:28.790
where do we actually put it? We don't want these

00:05:28.790 --> 00:05:30.870
prep cooks getting lost. You need to understand

00:05:30.870 --> 00:05:34.829
skills versus subagents first. Skills run directly

00:05:34.829 --> 00:05:38.149
inside your main session context. Subagents run

00:05:38.149 --> 00:05:41.149
in a separate, fresh session entirely. But they

00:05:41.149 --> 00:05:43.230
aren't totally isolated from each other, right?

00:05:43.290 --> 00:05:44.790
They can actually work together. Absolutely.

00:05:44.850 --> 00:05:47.769
A subagent can actually invoke a skill. Right.

00:05:47.769 --> 00:05:50.529
Imagine spinning up five subagents at once. They

00:05:50.529 --> 00:05:52.430
all use a LinkedIn research skill simultaneously.
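
NOTE
Editor's sketch: the "five subagents at once" pattern resembles a bounded fan-out, shown here with Python's asyncio.
The coroutine run_worker is a hypothetical stand-in for one subagent, and the limit parameter is an assumption added for illustration.
```python
import asyncio
async def run_worker(topic):
    """Hypothetical stand-in for one subagent; a real worker would call a model."""
    await asyncio.sleep(0.01)  # simulates waiting on a remote call
    return f"report on {topic}"
async def fan_out(topics, limit=5):
    """Run one worker per topic, with at most `limit` workers active at once."""
    gate = asyncio.Semaphore(limit)
    async def guarded(topic):
        async with gate:
            return await run_worker(topic)
    # gather returns results in the same order as the input topics
    return await asyncio.gather(*(guarded(t) for t in topics))
reports = asyncio.run(fan_out(["competitor A", "competitor B", "competitor C"]))
print(reports)
```
Each worker returns only its finished report, so the caller never sees the intermediate reading the worker did.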

00:05:53.009 --> 00:05:55.569
That is an incredibly powerful combination. It's

00:05:55.569 --> 00:05:58.029
like giving your prep cooks power tools. It really

00:05:58.029 --> 00:06:00.329
is. Now, for storing these files, you have two

00:06:00.329 --> 00:06:02.850
main choices. You have project-level folders

00:06:02.850 --> 00:06:05.870
and global-level folders. Let's break down why

00:06:05.870 --> 00:06:08.230
you'd use one over the other. Project-level

00:06:08.230 --> 00:06:10.889
agents live inside your specific code repository.

00:06:11.750 --> 00:06:15.470
They're best for dedicated team workflows. Think

00:06:15.470 --> 00:06:17.889
about a strict security reviewer agent. That

00:06:17.889 --> 00:06:20.350
makes total sense. Because when you push the

00:06:20.350 --> 00:06:24.170
code to GitHub, your entire team gets that specific

00:06:24.170 --> 00:06:27.089
agent automatically. It stays directly with the

00:06:27.089 --> 00:06:30.310
project it belongs to. Exactly. Global agents

00:06:30.310 --> 00:06:32.889
live in your personal user directory instead.

00:06:33.029 --> 00:06:35.670
You use those across all your different projects.

00:06:36.009 --> 00:06:38.170
A personal writing reviewer is a great example.
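
NOTE
Editor's sketch: one way to list agents from both locations, assuming the project folder is .claude/agents inside the repository
and the global folder is .claude/agents inside the user's home directory. The helper find_agents is hypothetical. Using the file name as the
agent name, and letting the project copy win on a name clash, are simplifying assumptions.
```python
from pathlib import Path
def find_agents(project_root, home=None):
    """Map agent name to file path; project-level files win on a name clash."""
    home = Path(home) if home is not None else Path.home()
    found = {}
    # The global folder is scanned first so project entries overwrite it below.
    for folder in (home / ".claude" / "agents", Path(project_root) / ".claude" / "agents"):
        if folder.is_dir():
            for path in sorted(folder.glob("*.md")):
                found[path.stem] = path
    return found
for name, path in find_agents(".").items():
    print(name, path)
```
Because the agents are plain Markdown files, moving one between the two folders is enough to change its scope.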

00:06:38.490 --> 00:06:40.689
If I put an agent in the wrong folder, am I stuck?

00:06:41.029 --> 00:06:43.209
Not at all. Since they're literally just standard

00:06:43.209 --> 00:06:45.850
markdown files, you can just drag and drop them

00:06:45.850 --> 00:06:47.910
to a new folder later. They are just text files.

00:06:48.050 --> 00:06:50.029
You can move them anywhere, anytime. Exactly

00:06:50.029 --> 00:06:52.449
right. It's highly flexible.

00:06:52.449 --> 00:06:55.050
All right, let's get back to

00:06:55.050 --> 00:06:57.910
it. Okay, we have an army of cheap parallel workers

00:06:57.910 --> 00:07:00.050
now. We know how to build them and store them.

00:07:00.410 --> 00:07:03.470
But if we let them run wild, they could completely

00:07:03.470 --> 00:07:05.529
break our systems. We need to talk about security

00:07:05.529 --> 00:07:08.389
guardrails. Security is absolutely non-negotiable

00:07:08.389 --> 00:07:11.110
here. You have to adopt a zero-trust mindset.

00:07:11.740 --> 00:07:14.660
If an AI can touch something, assume it might.

00:07:14.819 --> 00:07:17.980
That is a slightly terrifying but deeply realistic

00:07:17.980 --> 00:07:21.040
rule. How do we actually enforce that zero-trust

00:07:21.040 --> 00:07:23.860
policy? You must use strict tool restrictions.

00:07:24.279 --> 00:07:26.819
Always use read-only tools for your review agents.

00:07:27.339 --> 00:07:29.360
Our plan roaster doesn't need permission to write

00:07:29.360 --> 00:07:32.000
files. It only needs to read your plan and return

00:07:32.000 --> 00:07:34.939
feedback. So you only give edit access when strictly

00:07:34.939 --> 00:07:38.300
necessary. Never do it by default. Exactly. You

00:07:38.300 --> 00:07:40.620
also have to carefully limit your MCP access.
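
NOTE
Editor's sketch: a small check for the read-only rule, assuming the agent's allowed tools arrive as a comma-separated string.
The allow-list (Read, Grep, Glob) and the helper risky_tools are assumptions for illustration, not a definitive list of safe tools.
```python
READ_ONLY_TOOLS = {"Read", "Grep", "Glob"}  # assumed allow-list for review agents
def risky_tools(tools_field):
    """Return the requested tools that fall outside the read-only allow-list."""
    requested = {t.strip() for t in tools_field.split(",") if t.strip()}
    return sorted(requested - READ_ONLY_TOOLS)
print(risky_tools("Read, Grep, Glob"))   # a reviewer that only reads
print(risky_tools("Read, Write, Bash"))  # a reviewer asking for more than it needs
```
A check like this only flags a risky configuration. The enforcement the speakers recommend still has to come from real permission limits.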

00:07:41.160 --> 00:07:43.620
Let's define MCP servers for the listener really

00:07:43.620 --> 00:07:46.079
quickly. Tools letting AI talk to your private

00:07:46.079 --> 00:07:48.560
databases and apps. Got it. So you don't give

00:07:48.560 --> 00:07:51.279
a writing critic database access. Please don't.

00:07:51.279 --> 00:07:54.079
That's a recipe for absolute disaster. Imagine

00:07:54.079 --> 00:07:56.399
a rogue subagent accidentally dropping your user

00:07:56.399 --> 00:07:59.220
tables. You also need to enforce max turns in

00:07:59.220 --> 00:08:01.379
your settings. What does the max turns setting

00:08:01.379 --> 00:08:04.600
actually do? It firmly caps how long an agent

00:08:04.600 --> 00:08:06.920
works. It prevents them from getting stuck in

00:08:06.920 --> 00:08:09.120
infinite loops. And it stops them from burning

00:08:09.120 --> 00:08:11.160
through your wallet. What about downloading community

00:08:11.160 --> 00:08:13.740
agents from public repositories? People really

00:08:13.740 --> 00:08:17.120
love sharing their custom setups online. Be incredibly

00:08:17.120 --> 00:08:19.240
careful with those. They're still just basic

00:08:19.240 --> 00:08:21.920
instruction files. They can easily contain malicious

00:08:21.920 --> 00:08:24.589
prompt injections or instructions that leak data. You know what

00:08:24.589 --> 00:08:27.050
I think is a brilliant idea here? Building an

00:08:27.050 --> 00:08:30.329
AI agent specifically to act as a bouncer. Its

00:08:30.329 --> 00:08:33.029
entire job is just to inspect third-party agents.

00:08:33.269 --> 00:08:36.350
That's a fantastic safety practice. Build a

00:08:36.350 --> 00:08:39.549
read-only verifier subagent just for that exact purpose.

00:08:39.730 --> 00:08:42.029
It reviews the code before the boss ever sees

00:08:42.029 --> 00:08:44.769
it. Why isn't a prompt like "do not edit files"

00:08:44.769 --> 00:08:48.350
enough to keep an agent safe? Because text instructions

00:08:48.350 --> 00:08:51.909
can be ignored or hallucinated away. Real permission

00:08:51.909 --> 00:08:54.649
limits physically block the AI from taking dangerous

00:08:54.649 --> 00:08:58.009
action. Never trust a polite request. Use hard

00:08:58.009 --> 00:09:00.169
permission blocks instead. That is the golden

00:09:00.169 --> 00:09:02.610
rule of agent security. So our system is safely

00:09:02.610 --> 00:09:05.590
locked down and isolated now. How far can we

00:09:05.590 --> 00:09:07.809
actually push this boss-worker relationship?

00:09:08.269 --> 00:09:10.750
What is the true ceiling here? We can push it

00:09:10.750 --> 00:09:13.370
remarkably far. Your main session can spin up

00:09:13.370 --> 00:09:15.669
massive numbers of subagents. We're not just

00:09:15.669 --> 00:09:17.409
talking about three or five agents. We're talking

00:09:17.409 --> 00:09:20.490
about serious dynamic scale. You can deploy 40.

00:09:20.889 --> 00:09:24.350
100 or even 200 agents. And they all run in perfect

00:09:24.350 --> 00:09:26.990
parallel simultaneously. I've seen developers

00:09:26.990 --> 00:09:29.330
trigger this by typing ultra code. It's kind

00:09:29.330 --> 00:09:32.009
of become a meme online. Yeah, that's a dramatic

00:09:32.009 --> 00:09:34.629
trigger phrase people use. The use cases for

00:09:34.629 --> 00:09:37.389
this scale are completely wild. You can review

00:09:37.389 --> 00:09:40.029
a massive legacy codebase instantly. You can

00:09:40.029 --> 00:09:42.629
test dozens of experimental bug fixes all at

00:09:42.629 --> 00:09:45.149
once. Or review a massive thousand page book

00:09:45.149 --> 00:09:48.860
chapter by chapter. Whoa, imagine

00:09:48.860 --> 00:09:51.600
scaling to 200 agents testing codebase fixes

00:09:51.600 --> 00:09:53.759
at once. That's a whole tech company inside your

00:09:53.759 --> 00:09:56.159
laptop. It really is. It fundamentally changes

00:09:56.159 --> 00:09:58.860
how fast you can iterate. But there's a very

00:09:58.860 --> 00:10:01.740
serious warning here. This burns through your

00:10:01.740 --> 00:10:04.720
session limits incredibly fast. It eats through

00:10:04.720 --> 00:10:07.259
your API tokens like crazy. Yeah, I can see how

00:10:07.259 --> 00:10:08.840
that gets expensive quickly. If you aren't paying

00:10:08.840 --> 00:10:12.389
attention, that's a massive bill. Exactly. Don't

00:10:12.389 --> 00:10:14.909
use dynamic workflows for small, trivial edits.

00:10:15.190 --> 00:10:17.850
Save this incredible power for the truly massive

00:10:17.850 --> 00:10:20.570
parallel tasks. When should I absolutely avoid

00:10:20.570 --> 00:10:23.450
using a subagent or dynamic workflow? Avoid them

00:10:23.450 --> 00:10:26.269
when the task is tiny, or if it depends heavily

00:10:26.269 --> 00:10:29.529
on previous chat history, or if it requires constant

00:10:29.529 --> 00:10:31.649
back-and-forth conversation with you. Keep

00:10:31.649 --> 00:10:34.230
small, highly conversational tasks in the main

00:10:34.230 --> 00:10:36.750
chat window. Exactly. Don't overcomplicate the

00:10:36.750 --> 00:10:39.259
simple stuff. So, stepping back from all the

00:10:39.259 --> 00:10:41.480
technical execution, what does this all mean

00:10:41.480 --> 00:10:44.299
for us? If we connect this to the bigger picture,

00:10:44.460 --> 00:10:47.600
the core philosophy here is delegation. You have

00:10:47.600 --> 00:10:49.759
to change how you think. Ask yourself a simple

00:10:49.759 --> 00:10:53.220
question. Is this task going to dump a huge pile

00:10:53.220 --> 00:10:55.820
of text into my chat window? Stuff that I'll

00:10:55.820 --> 00:10:58.080
never actually read again. If the answer is yes,

00:10:58.220 --> 00:11:01.299
you delegate it immediately. Exactly. Keep the

00:11:01.299 --> 00:11:04.120
boss smart on Opus. Make the workers cheap on

00:11:04.120 --> 00:11:07.279
Haiku. Keep your context completely clean. And

00:11:07.279 --> 00:11:09.899
start small by building highly specific, narrow

00:11:09.899 --> 00:11:12.639
specialists. We're moving into a really strange

00:11:12.639 --> 00:11:15.700
new era. You're no longer just an AI user typing

00:11:15.700 --> 00:11:18.779
prompts. You're an actual AI middle manager right

00:11:18.779 --> 00:11:20.740
now. Think about the implications of that for

00:11:20.740 --> 00:11:23.019
a second. What happens when your subagents eventually

00:11:23.019 --> 00:11:25.600
get the ability to hire their own subagents?

00:11:25.620 --> 00:11:27.500
At what point do we lose track of the bureaucracy?

00:11:27.940 --> 00:11:30.240
There's going to be this massive invisible bureaucracy

00:11:30.240 --> 00:11:33.399
happening inside our own computers. That

00:11:33.399 --> 00:11:35.779
is a mildly terrifying but utterly fascinating

00:11:35.779 --> 00:11:38.679
thought. It changes the whole definition of personal

00:11:38.679 --> 00:11:41.200
computing. It really does. I want to challenge

00:11:41.200 --> 00:11:43.600
you, the listener: go into your .claude/agents

00:11:43.600 --> 00:11:46.220
folder today. Just write one simple read-only

00:11:46.220 --> 00:11:48.320
subagent. See what happens when you delegate

00:11:48.320 --> 00:11:50.659
a tiny piece of your workflow. Thanks for joining

00:11:50.659 --> 00:11:53.019
us on this deep dive. Let those prep cooks get

00:11:53.019 --> 00:11:53.419
to work.