WEBVTT

00:00:00.000 --> 00:00:02.740
I spent the better part of last week just copying

00:00:02.740 --> 00:00:05.759
code snippets out of my IDE, pasting them into

00:00:05.759 --> 00:00:09.140
a chat window, waiting, and then carefully pasting

00:00:09.140 --> 00:00:11.259
the fix back in. It's so tedious, right? And

00:00:11.259 --> 00:00:14.060
it's fundamentally slow. Yeah. That realization,

00:00:14.359 --> 00:00:17.160
that friction, is the critical shift. We're still

00:00:17.160 --> 00:00:20.539
treating these incredible models like, you know,

00:00:20.699 --> 00:00:22.539
glorified chatbots. When they could be working

00:00:22.539 --> 00:00:25.440
directly in the project, autonomously. Exactly.

00:00:25.519 --> 00:00:27.859
We need to move beyond being the middleman. So

00:00:27.859 --> 00:00:31.510
this deep dive is about taking these technical

00:00:31.510 --> 00:00:34.009
guides we have, these blueprints, and forging

00:00:34.009 --> 00:00:36.929
them into a real strategy. The goal is to build

00:00:36.929 --> 00:00:40.579
an autonomous, multi-agent AI coding team. Our

00:00:40.579 --> 00:00:43.340
mission today for you is to extract the concrete

00:00:43.340 --> 00:00:47.079
tools, the specific GitHub workflows, and the

00:00:47.079 --> 00:00:48.960
crucial security measures you need to deploy

00:00:48.960 --> 00:00:51.460
a genuine 24/7 development team. A team

00:00:51.460 --> 00:00:53.500
that lives inside your GitHub repository. And

00:00:53.500 --> 00:00:55.679
fixes issues while you're sleeping. OK, let's

00:00:55.679 --> 00:00:58.079
unpack this core idea first. Why bother with

00:00:58.079 --> 00:01:00.340
three different AI agents? Why not just use one

00:01:00.340 --> 00:01:03.079
massive model and prompt it to do everything?

00:01:03.219 --> 00:01:05.420
Isn't that just adding complexity? That's a great

00:01:05.420 --> 00:01:07.799
question, and it gets right to the heart of the

00:01:07.799 --> 00:01:11.760
architecture. The future here isn't one monolithic

00:01:11.760 --> 00:01:15.939
brain. It's a coordinated, specialized system.

00:01:16.340 --> 00:01:19.019
Think of it like a digital team. You want specific

00:01:19.019 --> 00:01:22.120
outputs for specific tasks, and different models

00:01:22.120 --> 00:01:25.099
give you different trade-offs. Speed, cost,

00:01:25.500 --> 00:01:28.390
and most importantly, control. Control being

00:01:28.390 --> 00:01:31.569
the kicker. Yes. Just like in a human team, you

00:01:31.569 --> 00:01:34.129
wouldn't hire your most creative architect to

00:01:34.129 --> 00:01:36.730
do a rigid, repetitive security audit. So we

00:01:36.730 --> 00:01:38.569
categorize them. So tell us, what's the lineup?

00:01:38.829 --> 00:01:40.930
Who's on the team? First up, we have the hybrid

00:01:40.930 --> 00:01:43.590
worker. We use Claude Code for this. This is

00:01:43.590 --> 00:01:45.829
your best reasoning engine. It's great for complex

00:01:45.829 --> 00:01:48.849
logic, but, and this is key, human approval is

00:01:48.849 --> 00:01:50.650
required before it merges anything. So you're

00:01:50.650 --> 00:01:53.109
always in the loop. Safety first. Got it. Then

00:01:53.109 --> 00:01:55.569
you have the strict worker. For this, OpenAI

00:01:55.569 --> 00:01:57.750
Codex is ideal. This is where predictability

00:01:57.750 --> 00:02:01.049
is everything. Boring, repeatable tasks. Exactly.

00:02:01.230 --> 00:02:04.549
Writing unit tests, updating READMEs, generating

00:02:04.549 --> 00:02:06.810
basic documentation. It has to follow a rigid,

00:02:07.069 --> 00:02:08.930
predictable process. And finally, the one that

00:02:08.930 --> 00:02:10.990
sounds the most fun. That's the fast worker.

00:02:11.590 --> 00:02:15.469
We use the Cursor CLI for this. This is for complete

00:02:15.469 --> 00:02:20.150
autonomy. Full speed. It edits files, saves changes,

00:02:20.629 --> 00:02:23.030
and submits a pull request without you ever having

00:02:23.030 --> 00:02:26.060
to check in. Pure velocity. Pure velocity. So

00:02:26.060 --> 00:02:28.340
how does organizing them into these specialized

00:02:28.340 --> 00:02:32.020
roles actually improve efficiency over just using

00:02:32.020 --> 00:02:35.020
a single powerful model? Specialized roles give

00:02:35.020 --> 00:02:37.639
you precise control over the output. You can

00:02:37.639 --> 00:02:40.520
optimize for speed or rigidity as needed. Now,

00:02:40.539 --> 00:02:42.580
to make this system work, you need a foundation.

00:02:43.139 --> 00:02:45.949
If the agents are the workers, GitHub Actions

00:02:45.949 --> 00:02:49.150
is the robotic manager that tells them when to

00:02:49.150 --> 00:02:50.969
show up. Let's define that jargon quickly.

00:02:51.469 --> 00:02:53.569
GitHub Actions are basically a robotic butler

00:02:53.569 --> 00:02:55.610
for your code. Yeah, that's a great way to put

00:02:55.610 --> 00:02:57.729
it. They just follow a recipe when a specific

00:02:57.729 --> 00:02:59.370
event happens, like someone posting a comment.

00:02:59.629 --> 00:03:02.469
And that recipe needs three critical parts. First,

00:03:02.689 --> 00:03:05.069
the trigger. This is the magic word, maybe

00:03:05.069 --> 00:03:07.870
@claudefix or @cursorfix, that you type as a

00:03:07.870 --> 00:03:09.889
comment on a GitHub issue. That wakes the bot

00:03:09.889 --> 00:03:12.069
up. Wakes the right bot up. Second, and this

00:03:12.069 --> 00:03:14.939
is crucial for security, is the runner. You're

00:03:14.939 --> 00:03:17.020
not running this on your laptop. GitHub gives

00:03:17.020 --> 00:03:19.360
you a temporary computer, a virtual machine,

00:03:19.759 --> 00:03:22.259
that spins up securely, does the work, and then

00:03:22.259 --> 00:03:24.879
just deletes itself. And the third part is the

00:03:24.879 --> 00:03:27.520
script itself, the workflow file. The recipe.

00:03:27.759 --> 00:03:30.280
It's a YAML text file that details all the steps.

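As a rough illustration of the trigger step, here is a minimal Python sketch of the routing a workflow script might do once the runner hands it the comment text. The trigger words @claudefix and @cursorfix and the agent labels are hypothetical placeholders, not names defined by GitHub or by either tool.

```python
import re
from typing import Optional

# Hypothetical trigger words, each mapped to the agent it should wake up.
TRIGGERS = {
    "@claudefix": "claude",  # hybrid worker: human approval before merge
    "@cursorfix": "cursor",  # fast worker: fully autonomous
}

# Whole-word, case-insensitive match so "@claudefixes" does not count.
_PATTERN = re.compile(
    r"(?<!\w)(" + "|".join(re.escape(t) for t in TRIGGERS) + r")(?!\w)",
    flags=re.IGNORECASE,
)

def route_comment(body: str) -> Optional[str]:
    """Return the agent named by the first trigger word in a comment, or None."""
    match = _PATTERN.search(body)
    if match is None:
        return None  # ordinary comment: no bot wakes up
    return TRIGGERS[match.group(1).lower()]
```

A None result would simply end the job, so ordinary discussion on the issue never wakes a bot.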
00:03:30.659 --> 00:03:32.680
Read the comment, check who the user is, wake

00:03:32.680 --> 00:03:35.360
up the AI, send the code, save the changes. It's

00:03:35.360 --> 00:03:39.240
the conductor. Beyond security, what's the practical

00:03:39.240 --> 00:03:41.740
advantage of using a temporary GitHub runner

00:03:41.740 --> 00:03:44.340
instead of running these tasks locally? It prevents

00:03:44.340 --> 00:03:47.539
complex, resource-intensive operations from slowing

00:03:47.539 --> 00:03:50.620
down your own computer. Okay, let's dive into

00:03:50.620 --> 00:03:53.860
method one, the safest approach, the hybrid model

00:03:53.860 --> 00:03:57.319
using Claude. The key here is that approval is

00:03:57.319 --> 00:03:59.500
always required. The human stays firmly in the

00:03:59.500 --> 00:04:01.139
loop. And that safety starts with permissions.

00:04:01.400 --> 00:04:03.740
It's non-negotiable. The first line of defense

00:04:03.740 --> 00:04:06.300
in that workflow file is a guest list, a defined

00:04:06.300 --> 00:04:08.639
list of authorized GitHub usernames. If anyone

00:04:08.639 --> 00:04:10.520
else tries to trigger the bot, the action just

00:04:10.520 --> 00:04:12.740
fails. Okay, so once you've secured who can use

00:04:12.740 --> 00:04:15.300
it, how do you instruct the bot without rewriting

00:04:15.300 --> 00:04:17.759
a huge prompt every single time? You definitely

00:04:17.759 --> 00:04:19.800
don't want to do that. You save a standard set

00:04:19.800 --> 00:04:22.420
of instructions in an instructions.md file.

00:04:22.939 --> 00:04:25.980
This tells Claude its role, maybe expert senior

00:04:25.980 --> 00:04:28.639
software engineer, and sets expectations for

00:04:28.639 --> 00:04:31.540
its tone. But instructions for its personality

00:04:31.540 --> 00:04:33.779
alone aren't enough for code style, right? No, because

00:04:33.779 --> 00:04:36.000
the AI is only as good as the context you give

00:04:36.000 --> 00:04:38.920
it. And the critical trick here, and this is

00:04:38.920 --> 00:04:41.920
a constant battle, I still wrestle with prompt

00:04:41.920 --> 00:04:45.779
drift myself, is using an AGENTS.md file in

00:04:45.779 --> 00:04:48.339
the repository. Tell me more about that specific

00:04:48.339 --> 00:04:50.620
file. That file is your project style guide,

00:04:50.639 --> 00:04:53.279
but for machines, it details your coding rules,

00:04:53.740 --> 00:04:56.279
always use TypeScript, indent with two spaces,

00:04:56.699 --> 00:04:59.600
all functions must be camelCase. And every bot

00:04:59.600 --> 00:05:02.399
reads this first? Every single AI bot, no matter

00:05:02.399 --> 00:05:04.420
the model, is instructed to read that file first.

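One way to enforce that every bot reads the style guide first is to have the workflow script build the prompt itself, with AGENTS.md always at the top. A minimal Python sketch, assuming both AGENTS.md and instructions.md sit at the repository root; the section headings are illustrative, not a required format.

```python
from pathlib import Path

def build_prompt(repo_root: str, issue_text: str) -> str:
    """Assemble the agent prompt with the repo style guide always first."""
    root = Path(repo_root)
    style_guide = root / "AGENTS.md"         # assumed location: repo root
    instructions = root / "instructions.md"  # assumed location: repo root
    if not style_guide.is_file():
        # Stop the run rather than let the model guess at the project's style.
        raise FileNotFoundError(f"missing style guide: {style_guide}")
    parts = ["## Project rules (read first)\n" + style_guide.read_text(encoding="utf-8")]
    if instructions.is_file():
        parts.append("## Role\n" + instructions.read_text(encoding="utf-8"))
    parts.append("## Issue\n" + issue_text)
    return "\n\n".join(parts)
```

Raising an error when AGENTS.md is missing is a design choice: the run stops instead of letting the model invent conventions.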
00:05:04.660 --> 00:05:07.160
This guarantees consistency. It stops the bots

00:05:07.160 --> 00:05:09.439
from contradicting each other's style. So what's

00:05:09.439 --> 00:05:12.290
the outcome of this hybrid approach? Claude reads

00:05:12.290 --> 00:05:14.430
the issue, reads the context, reads the style

00:05:14.430 --> 00:05:16.750
guide, creates a new Git branch with the fix,

00:05:17.129 --> 00:05:19.350
and then it comments back on the original issue

00:05:19.350 --> 00:05:21.610
with a clickable link to open the pull request

00:05:21.610 --> 00:05:24.449
for your final human review. So if the AI is

00:05:24.449 --> 00:05:27.949
so smart, why is using AGENTS.md so crucial

00:05:27.949 --> 00:05:29.709
instead of just telling it the style once in

00:05:29.709 --> 00:05:33.110
a prompt? Contextual documentation ensures every

00:05:33.110 --> 00:05:35.509
bot consistently adheres to project standards.

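The hybrid flow ends with Claude commenting a clickable link that opens the pull request for human review. GitHub's compare page with ?expand=1 opens the new pull request form, so it can serve as that link. A small Python sketch; the owner, repository, branch names, and the "main" default branch are assumptions.

```python
from urllib.parse import quote

def pr_link(owner: str, repo: str, branch: str, base: str = "main") -> str:
    """Build a GitHub compare URL that opens the 'new pull request' form."""
    # quote() leaves "/" alone by default, so "fix/issue-42" stays readable.
    return (
        f"https://github.com/{owner}/{repo}/compare/"
        f"{quote(base)}...{quote(branch)}?expand=1"
    )

def review_comment(owner: str, repo: str, branch: str) -> str:
    """Text posted back on the issue; a human still opens, reviews, and merges."""
    link = pr_link(owner, repo, branch)
    return f"I pushed a fix to `{branch}`. [Open the pull request]({link}) to review it."
```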
00:05:35.829 --> 00:05:39.079
And that moves us to method two: strict and deterministic

00:05:39.079 --> 00:05:42.519
using OpenAI Codex. Here, the focus shifts to

00:05:42.519 --> 00:05:46.120
total uncompromising control. The AI only outputs

00:05:46.120 --> 00:05:48.540
text. The workflow controls everything else.

00:05:48.959 --> 00:05:51.079
I want to pause on that word, deterministic.

00:05:51.399 --> 00:05:54.339
That means predictable. Can AI-generated code

00:05:54.339 --> 00:05:56.339
really be deterministic, or does that just mean

00:05:56.339 --> 00:05:58.970
the process around the code is rigid? It's the

00:05:58.970 --> 00:06:01.290
latter. The process is absolutely rigid. The

00:06:01.290 --> 00:06:03.470
YAML file dictates everything. The branch name,

00:06:03.610 --> 00:06:05.509
the commit message, where the file is saved, opening

00:06:05.509 --> 00:06:09.009
the PR. The AI's only job is to fill in the blanks

00:06:09.009 --> 00:06:11.639
with code. So you're just focusing its execution?

00:06:11.779 --> 00:06:13.680
Entirely. And the perfect use case for this is?

00:06:13.959 --> 00:06:16.560
Unit tests. Unit tests. It's boring, necessary

00:06:16.560 --> 00:06:18.680
work. A human doesn't want to spend two hours

00:06:18.680 --> 00:06:21.100
writing Jest tests for every edge case. You set

00:06:21.100 --> 00:06:24.399
the AI's role to QA engineer, specify the framework,

00:06:24.839 --> 00:06:27.180
and demand exhaustive coverage. And what's the

00:06:27.180 --> 00:06:29.339
key to making sure the AI stays in that rigid

00:06:29.339 --> 00:06:31.879
lane? The prompt has to be explicit. It must

00:06:31.879 --> 00:06:35.490
include: "Output only the code. Do not talk to

00:06:35.490 --> 00:06:38.769
me. Do not explain anything. Just the text." And

00:06:38.769 --> 00:06:41.290
the workflow script just grabs that raw output.

00:06:41.629 --> 00:06:44.189
Exactly. Captures that raw code and saves it

00:06:44.189 --> 00:06:46.529
directly into the right file, like

00:06:46.529 --> 00:06:49.850
my-function.test.js, before pushing it. No explanation needed.

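A minimal Python sketch of that split: the script, not the model, decides the branch name, commit message, and target path, and the model's reply is treated purely as file contents. The naming scheme (a tests/issue-N branch, a .test file next to the source) is an assumption, and stripping a wrapping code fence is a defensive extra in case the model ignores "output only the code".

```python
from dataclasses import dataclass
from pathlib import PurePosixPath

FENCE = "`" * 3  # three backticks, built this way so the example nests cleanly

@dataclass(frozen=True)
class CommitPlan:
    branch: str   # fixed by the workflow
    message: str  # fixed by the workflow
    path: str     # fixed by the workflow
    content: str  # the only part the model supplies

def strip_fences(raw: str) -> str:
    """Remove a wrapping code fence if the model added one despite the prompt."""
    lines = raw.strip().splitlines()
    if len(lines) >= 2 and lines[0].startswith(FENCE) and lines[-1].strip() == FENCE:
        lines = lines[1:-1]
    return "\n".join(lines) + "\n"

def plan_test_commit(issue: int, source_file: str, raw_output: str) -> CommitPlan:
    """Decide everything except the file body; the model never picks names or paths."""
    src = PurePosixPath(source_file)
    # e.g. src/my-function.js becomes src/my-function.test.js
    test_path = src.with_name(f"{src.stem}.test{src.suffix}")
    return CommitPlan(
        branch=f"tests/issue-{issue}",
        message=f"test: add unit tests for {src.name} (#{issue})",
        path=str(test_path),
        content=strip_fences(raw_output),
    )
```

The workflow would then write content to path, commit it on branch with message, and open the PR, without asking the model anything else.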
00:06:50.189 --> 00:06:53.410
No conversation. So does this deterministic setup

00:06:53.410 --> 00:06:56.569
fundamentally limit the AI's ability to reason?

00:06:56.860 --> 00:06:59.500
Or does it just focus its execution? It focuses

00:06:59.500 --> 00:07:02.420
the execution, ensuring the output aligns perfectly

00:07:02.420 --> 00:07:04.720
with predictable file structures. Okay, now for

00:07:04.720 --> 00:07:07.379
maximum speed, this is method three, autonomous

00:07:07.379 --> 00:07:10.000
speed with the Cursor CLI. This is where we run

00:07:10.000 --> 00:07:12.259
it in what's called headless mode. Right. Headless

00:07:12.259 --> 00:07:14.300
mode is just jargon for running a tool in the

00:07:14.300 --> 00:07:16.360
background without a user interface. It's perfect

00:07:16.360 --> 00:07:18.800
for an automated GitHub runner. We install the

00:07:18.800 --> 00:07:21.860
Cursor CLI, and suddenly the AI has access to

00:07:21.860 --> 00:07:23.660
the command line. And because it has terminal

00:07:23.660 --> 00:07:26.480
access, you can skip a lot of the complex YAML

00:07:26.480 --> 00:07:28.680
scripting we needed for Codex. You just give

00:07:28.680 --> 00:07:31.319
it the "do everything" prompt. Precisely. You're

00:07:31.319 --> 00:07:33.899
basically telling the AI to manage the whole

00:07:33.899 --> 00:07:36.720
Git process itself. The prompt literally says

00:07:36.720 --> 00:07:39.560
things like, create a new branch called

00:07:39.560 --> 00:07:44.019
feature-6, update the CSS file, verify the changes, commit

00:07:44.019 --> 00:07:47.560
them, and then use gh pr create to open a pull

00:07:47.560 --> 00:07:50.379
request. So that's the power analogy here. You

00:07:50.379 --> 00:07:52.920
type in @cursorfix, close your laptop, go make

00:07:52.920 --> 00:07:54.839
a coffee. And the PR is waiting for you when

00:07:54.839 --> 00:07:57.980
you get back. It's tireless, instantaneous development.

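The "do everything" prompt for the Cursor CLI can be generated rather than retyped for each issue. A Python sketch, assuming the workflow passes the resulting text to the CLI in headless mode (the CLI invocation itself is not shown); the branch name, task text, and PR title are hypothetical, while git checkout -b, git push -u, and gh pr create with --title and --body are standard Git and GitHub CLI commands.

```python
def autonomous_prompt(branch: str, task: str, pr_title: str) -> str:
    """Build a prompt that hands the whole Git process to the agent."""
    steps = [
        f"Create a new branch called {branch} with `git checkout -b {branch}`.",
        task,
        "Verify the changes (run the project's tests or linters if they exist).",
        "Stage and commit your changes with a short, descriptive message.",
        f"Push the branch with `git push -u origin {branch}`.",
        f'Open a pull request with `gh pr create --title "{pr_title}" '
        '--body "Automated change for review."`.',
    ]
    numbered = "\n".join(f"{i}. {s}" for i, s in enumerate(steps, start=1))
    return "You have terminal access. Complete every step, in order:\n" + numbered
```

Numbering the steps gives the agent an explicit order to follow, which matters when nobody is watching the run.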
00:07:58.300 --> 00:08:01.829
Whoa. Just imagine scaling this across an entire

00:08:01.829 --> 00:08:04.949
company to handle thousands of repo updates instantly

00:08:04.949 --> 00:08:06.850
after a big security vulnerability is found.

00:08:07.230 --> 00:08:09.230
Humans can't match that speed. It's a different

00:08:09.230 --> 00:08:12.089
scale of maintenance. So since this is the fastest,

00:08:12.149 --> 00:08:14.509
most autonomous method, what practical steps

00:08:14.509 --> 00:08:17.209
should a manager take to audit its output effectively?

00:08:17.589 --> 00:08:19.829
Auditing is managed by integrating immediate

00:08:19.829 --> 00:08:22.470
automated security checks before the PR is even

00:08:22.470 --> 00:08:24.470
created. Before we move on, let's take a quick

00:08:24.470 --> 00:08:28.670
break. Welcome back to Deep Dive. We've talked

00:08:28.670 --> 00:08:30.310
about speed and autonomy, but when you give a

00:08:30.310 --> 00:08:32.309
machine that much power, security has to be the

00:08:32.309 --> 00:08:34.690
very next thought. Absolutely. If this AI team

00:08:34.690 --> 00:08:37.230
is working 24/7, we need a security guard watching

00:08:37.230 --> 00:08:40.289
over them. And that guard is a tool like SonarQube.

00:08:40.590 --> 00:08:43.129
Exactly. It acts like an advanced spell checker

00:08:43.129 --> 00:08:45.669
for your code, finding bugs and critical security

00:08:45.669 --> 00:08:48.809
risks like SQL injection vulnerabilities. The

00:08:48.809 --> 00:08:51.750
key is integrating this scan before the code

00:08:51.750 --> 00:08:53.950
ever gets to a human reviewer. This is where

00:08:53.950 --> 00:08:56.549
the workflow gets really smart. The process is

00:08:56.549 --> 00:08:59.879
simple. AI writes the code, SonarQube scans it,

00:09:00.179 --> 00:09:03.139
and if a security issue is found, SonarQube immediately

00:09:03.139 --> 00:09:05.700
tells the AI that wrote it to try again. It

00:09:05.700 --> 00:09:08.120
creates a self-correction loop. Right. The AI

00:09:08.120 --> 00:09:10.980
fixes the security issue on the spot, using that

00:09:10.980 --> 00:09:13.700
feedback, and only then is the pull request created.

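The self-correction loop can be sketched in Python as below. generate and scan are hypothetical callables standing in for the AI agent and a SonarQube analysis; this does not use SonarQube's actual API, and the attempt cap is an assumption added so a stubborn finding cannot keep the loop running (and billing) forever.

```python
from typing import Callable, List, Optional

Generate = Callable[[str, List[str]], str]  # (task, previous findings) -> code
Scan = Callable[[str], List[str]]           # code -> findings, empty when clean

def write_until_clean(task: str, generate: Generate, scan: Scan,
                      max_attempts: int = 3) -> Optional[str]:
    """Regenerate code with scanner feedback; return clean code, or None to escalate."""
    findings: List[str] = []
    for _ in range(max_attempts):
        code = generate(task, findings)  # the agent sees the last scan's findings
        findings = scan(code)
        if not findings:
            return code  # only now would the workflow open the pull request
    return None  # still failing: flag the issue for a human instead of opening a PR
```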
00:09:14.059 --> 00:09:16.879
This means the code reaching a human is already

00:09:16.879 --> 00:09:20.059
pre-vetted and cleaner. It cuts down on so much

00:09:20.059 --> 00:09:22.480
wasted time. That dramatically improves the quality

00:09:22.480 --> 00:09:24.639
upstream. And my favorite concept from these

00:09:24.639 --> 00:09:26.960
guides has to be the triangle strategy. Making

00:09:26.960 --> 00:09:29.419
the agents check each other's work? It's AI peer

00:09:29.419 --> 00:09:32.480
review. It's formalized AI peer review. So if

00:09:32.480 --> 00:09:34.759
cursor writes the code, you immediately trigger

00:09:34.759 --> 00:09:37.360
Claude to review it, but with a different specialized

00:09:37.360 --> 00:09:39.799
prompt. You tell Claude its role is a strict

00:09:39.799 --> 00:09:43.210
senior tech lead. And the review prompt is the

00:09:43.210 --> 00:09:45.570
key here. You can't just say, look for errors.

00:09:45.690 --> 00:09:48.169
No, you have to give it critical evaluation criteria.

00:09:48.429 --> 00:09:50.490
Tell it to look for performance bottlenecks,

00:09:51.029 --> 00:09:54.230
like nested loops inside other loops, or check

00:09:54.230 --> 00:09:56.669
for obscure violations of your naming conventions.

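A Python sketch of the triangle strategy's review step: the second agent gets a different role and an explicit checklist instead of "look for errors". The criteria echo the ones discussed in this episode; the exact wording and the APPROVE / REQUEST_CHANGES verdict line are assumptions.

```python
REVIEW_CRITERIA = [
    "Performance bottlenecks, such as loops nested inside other loops.",
    "Violations of the naming conventions in AGENTS.md, however minor.",
    "Documentation (for example the README) that no longer matches the change.",
    "Security risks such as unsanitized input reaching a SQL query.",
]

def reviewer_prompt(diff: str, author: str) -> str:
    """Prompt a second agent to act as a strict senior tech lead on another agent's diff."""
    checklist = "\n".join(f"- {c}" for c in REVIEW_CRITERIA)
    return (
        f"You are a strict senior tech lead reviewing a change written by {author}.\n"
        "Do not approve by default. Check the diff against every item below and "
        "cite the exact line for each problem you find:\n"
        f"{checklist}\n"
        "End with a single line: APPROVE or REQUEST_CHANGES.\n\n"
        f"Diff:\n{diff}"
    )
```

Requiring a cited line per problem makes a rubber-stamp approval easier to spot.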
00:09:56.970 --> 00:10:00.210
I saw a perfect example of this. An AI updated

00:10:00.210 --> 00:10:03.769
a config file correctly. But the reviewer AI,

00:10:04.049 --> 00:10:06.649
the strict tech lead, commented that the main

00:10:06.649 --> 00:10:09.429
README file hadn't been updated to reflect the

00:10:09.429 --> 00:10:12.210
change. It found a communication gap a human

00:10:12.210 --> 00:10:14.909
might have easily missed. When you're using this

00:10:14.909 --> 00:10:17.509
triangle strategy, how do you stop the reviewer

00:10:17.509 --> 00:10:21.029
AI from just agreeing with the first AI to save

00:10:21.029 --> 00:10:24.559
time? The strict senior tech lead prompt enforces

00:10:24.559 --> 00:10:27.000
constructive but critical evaluation criteria,

00:10:27.139 --> 00:10:29.379
like performance metrics. It forces an adversarial

00:10:29.379 --> 00:10:32.039
role. This whole setup can sound complex, but

00:10:32.039 --> 00:10:34.039
the sources really emphasize you don't need to

00:10:34.039 --> 00:10:36.039
deploy all three bots at once. No, you start

00:10:36.039 --> 00:10:38.100
small. And we have a simple checklist for getting

00:10:38.100 --> 00:10:41.220
started. Okay, what's first? First, prepare your

00:10:41.220 --> 00:10:45.919
API keys. Claude, OpenAI, Cursor. A useful tip

00:10:45.919 --> 00:10:48.639
here is to use the claude setup-token command,

00:10:48.860 --> 00:10:50.759
which can tap into your monthly subscriptions

00:10:50.759 --> 00:10:53.879
and potentially save on costs. Second, and this

00:10:53.879 --> 00:10:56.960
is an absolute necessity, add those keys as secrets

00:10:56.960 --> 00:11:00.559
in your GitHub repository settings. Never, ever

00:11:00.559 --> 00:11:04.019
paste your keys directly into code files. Huge

00:11:04.019 --> 00:11:06.720
security failure waiting to happen. Right. And

00:11:06.720 --> 00:11:10.299
third: start with the safest method. Create one

00:11:10.299 --> 00:11:13.019
simple hybrid workflow with Claude. Get that

00:11:13.019 --> 00:11:15.419
YAML file running, requiring manual approval for

00:11:15.419 --> 00:11:17.659
everything until you've built up trust in the

00:11:17.659 --> 00:11:19.960
system. Finally, let's talk about common pitfalls.

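Back on the second checklist step, keeping keys out of code: once a key is stored as a repository secret, the workflow can map it into an environment variable for the job, and the script reads it from there. A minimal Python sketch; the variable names are assumptions (ANTHROPIC_API_KEY and OPENAI_API_KEY are the names those vendors' SDKs conventionally read, and CURSOR_API_KEY is assumed for the Cursor CLI).

```python
import os
from typing import Mapping, Optional

# Assumed environment variable names; the workflow maps repository secrets onto them.
KEY_VARS = {
    "claude": "ANTHROPIC_API_KEY",
    "codex": "OPENAI_API_KEY",
    "cursor": "CURSOR_API_KEY",
}

def api_key(agent: str, env: Optional[Mapping[str, str]] = None) -> str:
    """Read an agent's key from the environment, never from a file in the repo."""
    env = os.environ if env is None else env
    name = KEY_VARS[agent]
    value = env.get(name, "").strip()
    if not value:
        # Fail fast, naming only the variable; never echo a key into the logs.
        raise RuntimeError(f"{name} is not set; add it as a repository secret")
    return value
```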
00:11:20.179 --> 00:11:22.080
I learned some of these the hard way. The first

00:11:22.080 --> 00:11:25.399
is the infinite loop. Yes, a classic automation

00:11:25.399 --> 00:11:27.519
trap. You have to make sure the AI doesn't trigger

00:11:27.519 --> 00:11:30.000
itself. How does that happen? Well, if your workflow

00:11:30.000 --> 00:11:32.820
triggers on a new comment and the AI posts a

00:11:32.820 --> 00:11:34.639
comment like, I fixed it, that could trigger

00:11:34.639 --> 00:11:36.299
the workflow again. So you have to filter that

00:11:36.299 --> 00:11:38.480
out. Exactly. You check who the actor, the author

00:11:38.480 --> 00:11:40.600
of the comment, is. If it's the bot, you break

00:11:40.600 --> 00:11:43.179
the loop. The second mistake is ignoring context.

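The loop guard and the guest list from the hybrid method fit in one small gate. A Python sketch, assuming the script receives the parsed issue_comment event payload: comment.user.login and comment.user.type are fields GitHub includes there, while the bot login and allowlist entries are hypothetical. Comments posted with the default GITHUB_TOKEN do not start new workflow runs, so the loop mainly bites when the bot posts with a personal access token or an app token.

```python
from typing import Any, Dict, FrozenSet

ALLOWED_USERS: FrozenSet[str] = frozenset({"alice", "bob"})  # hypothetical guest list
BOT_LOGIN = "my-ai-bot"                                        # hypothetical bot account

def should_run(event: Dict[str, Any]) -> bool:
    """Gate an issue_comment event: only allowlisted humans may wake the bot."""
    user = event.get("comment", {}).get("user", {})
    login = user.get("login", "")
    # Break the infinite loop: never react to the bot's own "I fixed it" comments.
    if user.get("type") == "Bot" or login == BOT_LOGIN:
        return False
    return login in ALLOWED_USERS
```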
00:11:44.039 --> 00:11:46.320
If you don't spend time writing a solid AGENTS.md

00:11:46.320 --> 00:11:49.500
file, the AI is just going to guess at your

00:11:49.500 --> 00:11:52.179
project's style, and it will probably guess wrong.

00:11:52.559 --> 00:11:55.490
And finally, cost management. This is critical.

00:11:55.730 --> 00:11:58.210
Tokens are like water from a tap. Start with

00:11:58.210 --> 00:12:01.490
small, specific tasks like fix this one function

00:12:01.490 --> 00:12:04.549
in this one file. Don't say rewrite my entire

00:12:04.549 --> 00:12:06.909
app. You will get a very large bill if you do

00:12:06.909 --> 00:12:09.509
that. Be surgical in your requests. So what does

00:12:09.509 --> 00:12:11.889
this all mean for you, the listener? The core

00:12:11.889 --> 00:12:14.629
lesson here is this fundamental shift in your

00:12:14.629 --> 00:12:16.789
role. You go from user to manager. You aren't

00:12:16.789 --> 00:12:19.450
replacing yourself. You're directing a team of

00:12:19.450 --> 00:12:21.850
fast, tireless workers who handle the typing,

00:12:22.009 --> 00:12:23.750
the coding, the testing, even the first round

00:12:23.750 --> 00:12:26.389
of security checks. You get to focus on architecture

00:12:26.389 --> 00:12:28.830
and strategy. It really transforms the definition

00:12:28.830 --> 00:12:32.129
of a developer. Building this 24/7 autonomous

00:12:32.129 --> 00:12:34.970
dev team feels like science fiction, but it's

00:12:34.970 --> 00:12:37.169
really just connecting existing tools through

00:12:37.169 --> 00:12:39.419
smart, repeatable workflows. And it leaves you

00:12:39.419 --> 00:12:41.159
with a really interesting question to think about.

00:12:41.559 --> 00:12:44.360
If these autonomous AI systems, driven by strict

00:12:44.360 --> 00:12:47.220
templates and validators like SonarQube, increasingly

00:12:47.220 --> 00:12:49.779
write and review our code based on fixed rules,

00:12:50.259 --> 00:12:52.519
will this ultimate focus on efficiency lead to

00:12:52.519 --> 00:12:55.299
a global homogenization of programming style?

00:12:55.620 --> 00:12:58.360
Will the unique creative quirks that human architects

00:12:58.360 --> 00:13:01.360
bring to software eventually fade away in favor

00:13:01.360 --> 00:13:04.139
of perfect, predictable, and maybe identical

00:13:04.139 --> 00:13:05.980
structure? Something to consider as you start

00:13:05.980 --> 00:13:07.179
building your own digital team.