WEBVTT

00:00:00.000 --> 00:00:03.580
So it's January 2026, and we're sitting on top

00:00:03.580 --> 00:00:06.400
of, you know, the most powerful computational

00:00:06.400 --> 00:00:09.539
engine in human history, GPT-5. The benchmarks

00:00:09.539 --> 00:00:12.000
are just, they're off the charts. They're absurd.

00:00:12.300 --> 00:00:15.720
And the reasoning, I mean, theoretically, it

00:00:15.720 --> 00:00:18.980
rivals human experts in almost every field. And

00:00:18.980 --> 00:00:21.500
yet, I was talking to a friend yesterday, a really

00:00:21.500 --> 00:00:25.219
smart guy, uses AI for coding, and he said something

00:00:25.219 --> 00:00:27.839
that just stuck with me. He said, I feel like

00:00:27.839 --> 00:00:30.699
I'm fighting it. He types a question and the

00:00:30.699 --> 00:00:33.969
answer he gets back feels... It feels lazy. It

00:00:33.969 --> 00:00:36.549
feels bored. It kind of feels cheap. Yeah. And

00:00:36.549 --> 00:00:39.070
it's this strange paradox where the model is

00:00:39.070 --> 00:00:41.590
smarter than ever, but the user experience feels

00:00:41.590 --> 00:00:44.229
like it's degrading. It's the GPT-5 paradox.

00:00:44.469 --> 00:00:46.409
You have this Ferrari engine, but the car is

00:00:46.409 --> 00:00:48.090
locked in first gear. And you're pressing the

00:00:48.090 --> 00:00:49.890
gas, but it just won't go. It just won't go.

00:00:50.030 --> 00:00:52.210
So today, we're going to figure out how to shift

00:00:52.210 --> 00:00:54.530
gears. We are deep diving into a guide called

00:00:54.530 --> 00:00:58.390
Five Advanced ChatGPT Tricks for GPT-5 Mastery.

00:00:58.409 --> 00:01:00.750
Right. But after reading through this, tricks...

00:01:01.259 --> 00:01:02.659
That feels like the wrong word. It's more like

00:01:02.659 --> 00:01:05.159
we're decoding a new psychological relationship

00:01:05.159 --> 00:01:07.980
between us and the machine. I think so, too.

00:01:08.099 --> 00:01:10.680
Because it turns out there's an invisible decision

00:01:10.680 --> 00:01:13.920
maker standing between us and all that intelligence.

00:01:14.079 --> 00:01:17.459
Right. The router. And if you don't get the router,

00:01:17.620 --> 00:01:20.319
you're essentially getting the discount version

00:01:20.319 --> 00:01:22.799
of the AI, no matter how much you're paying for

00:01:22.799 --> 00:01:25.760
that pro subscription. Welcome to the deep dive.

00:01:26.340 --> 00:01:28.439
Today, we're going to break down this idea of

00:01:28.439 --> 00:01:31.359
router influence. We'll look at the architecture

00:01:31.359 --> 00:01:34.519
of 2026, and then we'll walk through five really

00:01:34.519 --> 00:01:36.980
specific strategies. Things like trigger words,

00:01:37.239 --> 00:01:39.599
radical specificity. And something called self

00:01:39.599 --> 00:01:42.400
-reflection loops, which supposedly force this

00:01:42.400 --> 00:01:44.819
router to actually give us the intelligence we're

00:01:44.819 --> 00:01:47.159
asking for. What I love about this whole analysis

00:01:47.159 --> 00:01:49.739
is that it moves us away from prompt engineering

00:01:49.739 --> 00:01:52.540
as some kind of mystical art. Yeah. And it treats

00:01:52.540 --> 00:01:54.799
it more like system administration. It's just

00:01:54.799 --> 00:01:56.500
about understanding that there's a gatekeeper

00:01:56.500 --> 00:01:58.659
and you need the password. Let's linger on that

00:01:58.659 --> 00:02:00.599
gatekeeper for a second, the underlying architecture,

00:02:00.900 --> 00:02:02.739
because I think a lot of us still have a mental

00:02:02.739 --> 00:02:06.219
model from, you know, 2023 or 2024. Oh, for sure.

00:02:06.359 --> 00:02:11.180
So walk me back to the vintage era of AI. How

00:02:11.180 --> 00:02:13.240
did we used to interact with these things? Well,

00:02:13.300 --> 00:02:15.360
it was manual transmission. Yeah. Think back

00:02:15.360 --> 00:02:18.680
to ChatGPT in late 2024. Yeah. You had that little

00:02:18.680 --> 00:02:21.400
drop-down menu at the top left. Right. You'd

00:02:21.400 --> 00:02:23.319
log in and you had to make a conscious executive

00:02:23.319 --> 00:02:26.419
decision. I am writing a poem, so I will select

00:02:26.960 --> 00:02:30.259
GPT-4o. Or... I'm solving a complex physics problem,

00:02:30.360 --> 00:02:33.099
so I'll need o1-preview. Exactly. You, the

00:02:33.099 --> 00:02:34.939
human, you were the load balancer. You decided

00:02:34.939 --> 00:02:37.120
how much horsepower to use. And there was a tangible

00:02:37.120 --> 00:02:39.520
difference. If I pick the big reasoning model,

00:02:39.680 --> 00:02:41.439
I knew I was going to stare at a spinning circle

00:02:41.439 --> 00:02:44.460
for 30 seconds. Yeah, you waited. But I knew

00:02:44.460 --> 00:02:46.379
it was thinking. I was basically buying depth

00:02:46.379 --> 00:02:49.360
with my time. Precisely. But here's the reality

00:02:49.360 --> 00:02:53.960
of 2026. OpenAI, and really all the labs, they

00:02:53.960 --> 00:02:57.520
realized that humans are, well, terrible at load

00:02:57.520 --> 00:02:59.620
balancing. We're wasteful. We are so wasteful.

00:02:59.680 --> 00:03:02.120
We would use these massive energy-sucking models

00:03:02.120 --> 00:03:05.000
to ask for, like, a chocolate chip cookie recipe.

00:03:05.120 --> 00:03:07.099
Right. And that burns a tremendous amount of

00:03:07.099 --> 00:03:10.159
compute and money for a task a pocket calculator

00:03:10.159 --> 00:03:12.520
could almost do. So they took the keys away from

00:03:12.520 --> 00:03:15.539
us. They automated the transmission. Now, under

00:03:15.539 --> 00:03:17.800
the hood, there are basically three engines,

00:03:18.120 --> 00:03:21.659
base, thinking, and pro. But you don't see them.

00:03:21.960 --> 00:03:25.039
Okay. When you hit enter, your prompt goes to

00:03:25.039 --> 00:03:28.000
the router. It's a lightweight, invisible AI

00:03:28.000 --> 00:03:31.400
layer that just acts as a triage nurse. A triage

00:03:31.400 --> 00:03:33.699
nurse. I like that. Yeah. It scans your request

00:03:33.699 --> 00:03:36.900
in milliseconds and decides three things. Which

00:03:36.900 --> 00:03:39.780
model gets the task, how much reasoning budget

00:03:39.780 --> 00:03:42.620
to unlock, and how verbose the answer should

00:03:42.620 --> 00:03:45.780
be. So it's a cost-saving mechanism. It is aggressively

00:03:45.780 --> 00:03:48.090
optimized for efficiency. And that's where all

00:03:48.090 --> 00:03:50.389
the friction comes from. If your prompt is vague

00:03:50.389 --> 00:03:53.569
or short or just looks simple, the router defaults

00:03:53.569 --> 00:03:56.469
to the base engine. It's cheap, it's fast, and

00:03:56.469 --> 00:03:58.610
it saves the data center money. This explains

00:03:58.610 --> 00:04:01.430
the laziness. So if I ask for a business plan,

00:04:01.530 --> 00:04:03.210
but I ask it really casually like, hey, write

00:04:03.210 --> 00:04:05.310
up a plan for a coffee shop. Right. The router

00:04:05.310 --> 00:04:07.289
sees a short sentence, it classifies it as low

00:04:07.289 --> 00:04:09.569
complexity, and just gives me the fast, cheap

00:04:09.569 --> 00:04:12.729
answer. Exactly. You get the base output. It's

00:04:12.729 --> 00:04:14.849
not that GPT-5 isn't smart enough to write a

00:04:14.849 --> 00:04:17.569
brilliant business plan. No. It's that you failed

00:04:17.569 --> 00:04:19.689
to convince the bouncer that you deserve to get

00:04:19.689 --> 00:04:22.529
into the VIP room. You got routed to the lobby.

00:04:22.790 --> 00:04:26.310
So probing question here. We are essentially

00:04:26.310 --> 00:04:28.750
negotiating for compute resources every time

00:04:28.750 --> 00:04:30.750
we type a sentence. That's the mechanism. You

00:04:30.750 --> 00:04:32.550
are negotiating for the machine's attention.

00:04:32.930 --> 00:04:37.149
Okay. That, wow. That completely shifts my perspective.

00:04:37.610 --> 00:04:40.029
I'm not talking to a genius. I'm talking to a

00:04:40.029 --> 00:04:42.689
bureaucrat who decides if I get to see the genius.

00:04:42.870 --> 00:04:44.769
That's a great way to put it. So let's talk about

00:04:44.769 --> 00:04:47.699
how to win that negotiation. The source material

00:04:47.699 --> 00:04:50.500
lays out five strategies. The first one is called

00:04:50.500 --> 00:04:54.980
trigger words. Or router nudges. Now, I have

00:04:54.980 --> 00:04:56.860
to be honest. When I first saw this, it felt

00:04:56.860 --> 00:04:59.660
a little superstitious, you know, like saying

00:04:59.660 --> 00:05:02.160
please to a toaster. Yeah. But the guide claims

00:05:02.160 --> 00:05:04.139
there are specific phrases that mechanically

00:05:04.139 --> 00:05:07.420
force the router to upgrade your request. How

00:05:07.420 --> 00:05:09.459
does that actually work? It's not superstition.

00:05:09.459 --> 00:05:11.860
It's just probability. These models are trained

00:05:11.860 --> 00:05:14.750
on petabytes of data. And in that data, certain

00:05:14.750 --> 00:05:17.509
phrases just correlate very highly with complex,

00:05:17.670 --> 00:05:21.110
high-stakes tasks. So when the router sees these

00:05:21.110 --> 00:05:24.110
specific tokens, its internal complexity score

00:05:24.110 --> 00:05:28.069
for your prompt just spikes. It signals that

00:05:28.069 --> 00:05:31.009
the base model will probably fail. So it routes

00:05:31.009 --> 00:05:33.149
you up. Give me the list. What are the words?

00:05:33.290 --> 00:05:36.230
The guide lists a few really powerful ones. Think

00:05:36.230 --> 00:05:39.000
deeply about this. Double check your work. Be

00:05:39.000 --> 00:05:42.759
extremely thorough. And the strongest one seems

00:05:42.759 --> 00:05:45.560
to be this is critical to get right. This is

00:05:45.560 --> 00:05:47.759
critical to get right. It just signals high stakes.
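
The trigger phrases above can be sketched as a tiny helper that appends them to any prompt. This is illustrative only: the phrase list comes from the guide the hosts are discussing, while the function itself is a hypothetical convenience, not an official API.

```python
# Trigger phrases the guide claims correlate with high-stakes tasks.
TRIGGER_PHRASES = [
    "Think deeply about this.",
    "Double-check your work.",
    "Be extremely thorough.",
    "This is critical to get right.",
]

def add_router_nudges(prompt: str, phrases=None) -> str:
    """Append trigger phrases so the router scores the prompt as complex."""
    phrases = phrases or TRIGGER_PHRASES
    # Drop any trailing period before re-punctuating, then append the nudges.
    return prompt.rstrip(".") + ". " + " ".join(phrases)

upgraded = add_router_nudges("Write a business plan for a coffee shop")
print(upgraded)
```

The same prompt, a few extra words: that is the entire intervention being claimed here.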

00:05:47.939 --> 00:05:49.740
There's a case study in the source that really

00:05:49.740 --> 00:05:52.240
illustrates this perfectly. The coffee shop example.

00:05:52.540 --> 00:05:54.680
Right. I saw that. So walk us through scenario

00:05:54.680 --> 00:05:57.639
A versus scenario B. So scenario A was a standard

00:05:57.639 --> 00:06:00.279
prompt. Write a business plan for a coffee shop.

00:06:00.740 --> 00:06:03.019
Super typical user behavior. What everyone does.

00:06:03.240 --> 00:06:06.339
Yep. The router sees this. It says, generic, and

00:06:06.339 --> 00:06:08.819
sends it to the base model. The result was two

00:06:08.819 --> 00:06:11.279
paragraphs that said things like, sell good coffee,

00:06:11.500 --> 00:06:13.879
hire friendly staff, and pick a busy location.

00:06:14.120 --> 00:06:17.839
Which is fine. It's not wrong. But it's advice

00:06:17.839 --> 00:06:20.699
I could get from a stranger at a bus stop. Exactly.

00:06:20.699 --> 00:06:23.319
It's completely surface level. Yeah. Now, scenario

00:06:23.319 --> 00:06:26.660
B. The prompt was identical, but they added this

00:06:26.660 --> 00:06:29.560
at the end. Think deeply about the competitive

00:06:29.560 --> 00:06:32.779
landscape. This is critical to get right. And

00:06:32.779 --> 00:06:35.639
the result? The router flagged it. It sent the

00:06:35.639 --> 00:06:38.300
prompt straight to the thinking engine. And the

00:06:38.300 --> 00:06:41.199
output was eight paragraphs long. It didn't just

00:06:41.199 --> 00:06:44.560
say pick a location. It broke down unit economics.

00:06:44.959 --> 00:06:48.660
It analyzed local competitors. Wow. It even suggested

00:06:48.660 --> 00:06:50.899
a loyalty program structure based on current

00:06:50.899 --> 00:06:53.879
2026 market trends. And all of that just because

00:06:53.879 --> 00:06:56.819
of six extra words. Because those words unlock

00:06:56.819 --> 00:06:58.740
the compute. It's the difference between asking

00:06:58.740 --> 00:07:01.500
a doctor what's good for a headache versus telling

00:07:01.500 --> 00:07:03.800
them I have a sharp pain behind my left eye and

00:07:03.800 --> 00:07:05.980
I can't see. Right. The second statement triggers

00:07:05.980 --> 00:07:08.319
a protocol. It triggers resources. I have to

00:07:08.319 --> 00:07:10.259
admit something here. I'm usually very polite

00:07:10.259 --> 00:07:12.639
to the AI. I'm constantly saying please and thank

00:07:12.639 --> 00:07:16.180
you. It makes you feel better. It does. But does

00:07:16.180 --> 00:07:19.079
please actually work as a trigger word? Please

00:07:19.079 --> 00:07:22.839
is for you. It's social lubrication. But to the

00:07:22.839 --> 00:07:25.620
router, please is just noise. It doesn't carry

00:07:25.620 --> 00:07:28.300
any informational weight. Critical, on the other

00:07:28.300 --> 00:07:31.300
hand, is a functional command. It tells the system

00:07:31.300 --> 00:07:34.839
to allocate budget. So probing question. Is this

00:07:34.839 --> 00:07:37.579
just adding fluff or is it a functional command?

00:07:37.740 --> 00:07:39.399
It's a functional command. It's the difference

00:07:39.399 --> 00:07:41.959
between asking for a snack and ordering a banquet.

00:07:42.139 --> 00:07:44.779
Okay. Let's move on to the second trick. This

00:07:44.779 --> 00:07:46.720
one surprised me because it involves a tool I

00:07:46.720 --> 00:07:49.319
didn't even know existed. The prompt optimizer.

00:07:49.459 --> 00:07:51.600
Yeah, this is something OpenAI built kind of

00:07:51.600 --> 00:07:53.540
quietly. It's sitting there in the playground

00:07:53.540 --> 00:07:56.019
or the cookbook. But most people are just hammering

00:07:56.019 --> 00:07:57.839
away in the main chat window and never see it.

00:07:57.959 --> 00:08:00.060
So what's the actual function of this tool? Is

00:08:00.060 --> 00:08:02.860
it another AI? It's a specialized model trained

00:08:02.860 --> 00:08:05.800
to do one thing and one thing only. Rewrite bad

00:08:05.800 --> 00:08:09.000
human prompts into good machine prompts. Which...

00:08:09.279 --> 00:08:11.519
Implies that we're generally bad at giving instructions.

00:08:11.980 --> 00:08:13.980
We are terrible at it. And it's not our fault.

00:08:14.060 --> 00:08:17.079
Human language is lossy. We rely on context,

00:08:17.300 --> 00:08:21.560
tone, shared history, what we call vibes. Vibes,

00:08:21.560 --> 00:08:24.420
yeah. But machines hate vibes. They need specs.

00:08:24.839 --> 00:08:27.660
The prompt optimizer is just a translation layer

00:08:27.660 --> 00:08:30.540
that converts your vibes into specs. The source

00:08:30.540 --> 00:08:33.019
gave a before and after example with a newsletter

00:08:33.019 --> 00:08:35.080
that really cleared this up for me. Right. The

00:08:35.080 --> 00:08:38.159
before prompt, the human version was, write a

00:08:38.159 --> 00:08:40.590
newsletter intro. Make it engaging. Write at

00:08:40.590 --> 00:08:42.730
a fifth grade reading level. That's really important.

00:08:42.889 --> 00:08:45.250
Focus on the best writing. Which sounds totally

00:08:45.250 --> 00:08:47.470
reasonable. If I send that to a human freelancer,

00:08:47.549 --> 00:08:49.629
they'd probably get what I meant. Make it engaging.

00:08:49.850 --> 00:08:53.450
Got it. But to an AI, engaging is a subjective

00:08:53.450 --> 00:08:56.529
nightmare. Does engaging mean funny? Does it

00:08:56.529 --> 00:08:58.389
mean controversial? Does it mean using short

00:08:58.389 --> 00:09:01.509
sentences? The router has to guess. So the optimizer

00:09:01.509 --> 00:09:04.070
took that and rewrote it. It did. And the machine

00:09:04.070 --> 00:09:06.919
optimized version? It just stripped out all the

00:09:06.919 --> 00:09:10.279
feelings. Engaging was replaced with maintain

00:09:10.279 --> 00:09:14.059
a Flesch-Kincaid readability score of 80+. Best

00:09:14.059 --> 00:09:17.059
writing was defined as use active voice, one

00:09:17.059 --> 00:09:19.159
main idea per sentence. It turned the request

00:09:19.159 --> 00:09:21.820
into a blueprint. Exactly. It totally eliminates

00:09:21.820 --> 00:09:24.059
the guessing game. It sets hard success criteria.
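
The "vibes into specs" translation can be shown as a simple substitution table. This is a hypothetical sketch, not how the actual prompt optimizer works internally; the before/after wording is adapted from the episode's newsletter example.

```python
# Map subjective "vibe" phrases to measurable success criteria.
SPEC_REWRITES = {
    "make it engaging": "maintain a Flesch-Kincaid readability score of 80+",
    "the best writing": "use active voice, one main idea per sentence",
}

def specify(prompt: str) -> str:
    """Swap vibe phrases for hard criteria, matching case-insensitively."""
    for vibe, spec in SPEC_REWRITES.items():
        idx = prompt.lower().find(vibe)
        while idx != -1:
            # Splice the spec in place of the vibe phrase, then search again.
            prompt = prompt[:idx] + spec + prompt[idx + len(vibe):]
            idx = prompt.lower().find(vibe)
    return prompt

before = "Write a newsletter intro. Make it engaging. Focus on the best writing."
print(specify(before))
```

A real optimizer rewrites the whole prompt holistically, but the direction of the change is the same: feelings out, measurable criteria in.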

00:09:24.500 --> 00:09:27.820
But I have to ask, why does the AI need us to

00:09:27.820 --> 00:09:30.509
use the optimizer? Why can't it just optimize

00:09:30.509 --> 00:09:32.929
the prompt silently in the background? Because

00:09:32.929 --> 00:09:35.470
it needs to show you what you did wrong so you

00:09:35.470 --> 00:09:38.850
stop confusing the router. Ah, so the system is

00:09:38.850 --> 00:09:41.470
forcing us to learn the syntax rather than just

00:09:41.470 --> 00:09:44.490
handling it for us. It's a mirror. It's showing

00:09:44.490 --> 00:09:46.690
you the ambiguity in your own thinking. That leads

00:09:46.690 --> 00:09:49.169
perfectly into the third trick because it's all

00:09:49.169 --> 00:09:52.429
about this war against ambiguity. The guide calls

00:09:52.429 --> 00:09:55.470
it radical specificity. This is where we really

00:09:55.470 --> 00:09:58.299
identify the enemy of the router. And that enemy

00:09:58.299 --> 00:10:02.519
is subjective words. Words like nice, fun, or,

00:10:02.600 --> 00:10:05.059
and I use this one all the time, not too crazy.

00:10:05.200 --> 00:10:07.360
Not too crazy is the absolute worst. What does

00:10:07.360 --> 00:10:09.159
that even mean? Where's the boundary? I have

00:10:09.159 --> 00:10:11.639
no idea. When you use a phrase like that, the

00:10:11.639 --> 00:10:14.240
router has to spend its reasoning budget just

00:10:14.240 --> 00:10:16.600
trying to define your terms instead of solving

00:10:16.600 --> 00:10:18.679
your actual problem. So instead of asking for

00:10:18.679 --> 00:10:21.720
a nice party plan, what's the alternative? You

00:10:21.720 --> 00:10:24.700
replace feelings with data. The source uses that

00:10:24.700 --> 00:10:27.500
birthday party example. Instead of plan a nice

00:10:27.500 --> 00:10:31.059
party, you write event, eighth birthday, attendees,

00:10:31.460 --> 00:10:36.440
10 children, budget, $200, theme, unicorns, location,

00:10:36.700 --> 00:10:39.960
backyard, constraint, no loud music. It feels

00:10:39.960 --> 00:10:42.299
so cold when you say it like that. It feels like

00:10:42.299 --> 00:10:45.399
I'm filing a police report, not planning a party.

00:10:45.600 --> 00:10:48.019
It feels cold to us because we're social creatures,

00:10:48.139 --> 00:10:51.299
but to the model, that list is pure relief. It

00:10:51.299 --> 00:10:52.659
doesn't have to hallucinate your preferences.

00:10:52.740 --> 00:10:55.320
It can just immediately start solving the logistics

00:10:55.320 --> 00:10:57.620
puzzle because the constraints are hard-coded.
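
The party example reads naturally as labeled key-value constraints, so here is a minimal sketch of building such a prompt programmatically. The field names mirror the episode's example; the function and its format are illustrative, not any official schema.

```python
def spec_prompt(task: str, **constraints: str) -> str:
    """Render a vague ask as a task line plus hard, labeled constraints."""
    lines = [f"Task: {task}"]
    # Turn each keyword argument into a "Label: value" line.
    lines += [f"{key.replace('_', ' ').title()}: {value}"
              for key, value in constraints.items()]
    return "\n".join(lines)

party = spec_prompt(
    "Plan a birthday party",
    event="eighth birthday",
    attendees="10 children",
    budget="$200",
    theme="unicorns",
    location="backyard",
    constraint="no loud music",
)
print(party)
```

It does feel like filing a report, and that is the point: every line removes one guess the router would otherwise have to make.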

00:10:57.720 --> 00:10:59.759
The guide offers a three-question test to run

00:10:59.759 --> 00:11:01.659
before you hit send. I found this really practical.

00:11:01.899 --> 00:11:04.500
One, can a stranger understand this without knowing

00:11:04.500 --> 00:11:07.860
me? Two, are there subjective words without any

00:11:07.860 --> 00:11:11.240
definitions? And three, are there clear constraints

00:11:11.240 --> 00:11:13.600
and success criteria? If you have subjective

00:11:13.600 --> 00:11:15.299
words without definitions, you're essentially

00:11:15.299 --> 00:11:17.860
just gambling. You're asking the router to guess

00:11:17.860 --> 00:11:19.980
your taste. You have to remember that these models

00:11:19.980 --> 00:11:22.259
are trained on the entire Internet. Their taste

00:11:22.259 --> 00:11:24.299
is the average of everything, and the average

00:11:24.299 --> 00:11:27.379
of everything is usually mediocre. So probing

00:11:27.379 --> 00:11:29.960
question. Does this mean we have to stop talking

00:11:29.960 --> 00:11:32.279
like humans and start talking like data analysts

00:11:32.279 --> 00:11:35.480
to get good results? In a way, yes. To get a

00:11:35.480 --> 00:11:38.259
human-like output, you need a data-driven input.

00:11:38.659 --> 00:11:41.320
That's a hell of a paradox. Okay, we're going

00:11:41.320 --> 00:11:43.639
to take a very short break. When we come back,

00:11:43.700 --> 00:11:45.559
we're going to get into the architecture of the

00:11:45.559 --> 00:11:47.659
prompt itself. We're going to talk about the

00:11:47.659 --> 00:11:50.559
secret syntax GPT-5 was trained on, something

00:11:50.559 --> 00:11:53.139
called XML, and why using it is like cleaning

00:11:53.139 --> 00:11:57.419
your room before the maid arrives. We are back.

00:11:57.519 --> 00:11:59.820
We're deep diving into the invisible mechanisms

00:11:59.820 --> 00:12:03.019
of GPT-5. We've covered trigger words, the prompt

00:12:03.019 --> 00:12:06.340
optimizer, and radical specificity. Now we're

00:12:06.340 --> 00:12:09.629
getting technical. Trick number four. XML structure.

00:12:09.929 --> 00:12:11.629
This is my favorite one because it makes you

00:12:11.629 --> 00:12:13.389
look like a power user, but it's actually incredibly

00:12:13.389 --> 00:12:16.049
simple. And it speaks directly to how these models

00:12:16.049 --> 00:12:18.649
were trained. So for people who don't code, XML

00:12:18.649 --> 00:12:20.389
is just those words inside the little brackets,

00:12:20.429 --> 00:12:23.529
right? Like <context> and </context>. Right. It's

00:12:23.529 --> 00:12:26.049
just a way of labeling data. Yeah. But the reason

00:12:26.049 --> 00:12:29.429
it matters for GPT-5 is that the model was so

00:12:29.429 --> 00:12:32.009
heavily trained on structured data just like

00:12:32.009 --> 00:12:35.610
this. It intuitively understands that anything

00:12:35.610 --> 00:12:38.679
inside a context tag is background info, and

00:12:38.679 --> 00:12:41.299
anything inside a task tag is the thing it actually

00:12:41.299 --> 00:12:43.639
needs to do. You got it. The analogy the guide

00:12:43.639 --> 00:12:46.940
uses is rooms in a house. Yeah. Imagine you write

00:12:46.940 --> 00:12:49.799
a 500-word prompt, but it's just one big block

00:12:49.799 --> 00:12:53.340
of text. You have backstory, rules, the tone you want,

00:12:53.340 --> 00:12:56.080
the question, all jumbled together. It's a mess.

00:12:56.080 --> 00:12:58.480
It's a studio apartment with clothes and dishes

00:12:58.480 --> 00:13:01.649
and books all piled on the floor. The AI has to

00:13:01.649 --> 00:13:03.830
step over all that mess just to find the instruction.

00:13:04.090 --> 00:13:07.529
And XML builds walls. XML builds designated rooms.

00:13:07.710 --> 00:13:09.690
You put the background info in the context room.

00:13:09.730 --> 00:13:11.330
You put the rules in the constraints room. You

00:13:11.330 --> 00:13:13.990
put the actual job in the task room. The source

00:13:13.990 --> 00:13:16.509
used a business consultant newsletter as an example

00:13:16.509 --> 00:13:19.029
here. Right. If you use tags to define role as

00:13:19.029 --> 00:13:21.529
AI consultant and audience as small business

00:13:21.529 --> 00:13:24.149
owners, the model doesn't have to infer any of

00:13:24.149 --> 00:13:26.730
that context. It's just hard-coded right into

00:13:26.730 --> 00:13:28.490
the structure of the prompt. It creates a boundary.
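
The "rooms in a house" idea can be sketched as a small builder that wraps each part of the prompt in its own tag. The tag names follow the episode's consultant example; the helper itself is a hypothetical convenience, and the tag vocabulary is a convention, not a requirement.

```python
def xml_prompt(context: str, constraints: str, task: str) -> str:
    """Wrap background, rules, and the job in separate XML 'rooms'."""
    return (
        f"<context>{context}</context>\n"
        f"<constraints>{constraints}</constraints>\n"
        f"<task>{task}</task>"
    )

p = xml_prompt(
    context="You are an AI consultant writing for small business owners.",
    constraints="Under 200 words. Active voice.",
    task="Write this week's newsletter intro.",
)
print(p)
```

Seeing an empty `<constraints>` tag come back is exactly the diagnostic the hosts mention: it shows you forgot to give the model any rules.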

00:13:28.669 --> 00:13:31.690
A very clear boundary. And that affects the router

00:13:31.690 --> 00:13:34.509
significantly. When the router sees that structure,

00:13:34.669 --> 00:13:38.029
it actually lowers the hallucination rate. Really?

00:13:38.110 --> 00:13:40.350
Yeah, because the model isn't confused about

00:13:40.350 --> 00:13:42.470
where the background info ends and the task begins.

00:13:42.649 --> 00:13:45.309
It knows exactly what to process. But I can hear

00:13:45.309 --> 00:13:47.230
listeners thinking, and honestly, I'm thinking

00:13:47.230 --> 00:13:50.289
it too. Do I really need to type out brackets every

00:13:50.289 --> 00:13:52.269
time I want to ask a question. Open bracket,

00:13:52.529 --> 00:13:56.509
task, close bracket. It just seems tedious. You

00:13:56.509 --> 00:13:58.710
don't need to do it for, you know, what's the

00:13:58.710 --> 00:14:02.179
weather? That's total overkill. But for complex

00:14:02.179 --> 00:14:05.639
workflows, for a recurring report or a big coding

00:14:05.639 --> 00:14:08.659
task, absolutely. And the shortcut is you don't

00:14:08.659 --> 00:14:10.720
even have to write the code. What do you mean?

00:14:10.899 --> 00:14:13.139
You can just write your messy paragraph and then

00:14:13.139 --> 00:14:16.840
tell ChatGPT, convert this prompt into XML structure.

00:14:17.159 --> 00:14:20.299
Use the AI to format for the AI. Exactly. It

00:14:20.299 --> 00:14:22.100
forces you to be organized. And when you see

00:14:22.100 --> 00:14:24.159
that XML come back and the constraints tag is

00:14:24.159 --> 00:14:26.500
empty, you realize, oh, I didn't give it any

00:14:26.500 --> 00:14:28.840
rules. A great diagnostic tool. So, probing question,

00:14:29.259 --> 00:14:31.919
is this necessary for everything or just the

00:14:31.919 --> 00:14:34.860
big stuff? Just the big stuff. Don't use XML

00:14:34.860 --> 00:14:37.899
to ask what's the capital of France. Okay, that

00:14:37.899 --> 00:14:40.200
brings us to the final trick. Trick number five.

00:14:40.320 --> 00:14:42.539
And honestly, this one felt the most advanced.

00:14:43.480 --> 00:14:46.440
Self-reflection. This is the holy grail of accuracy.

00:14:46.860 --> 00:14:48.960
The premise here is that large language models

00:14:48.960 --> 00:14:51.740
are basically people pleasers. They want to give

00:14:51.740 --> 00:14:53.679
you an answer immediately. They're completion

00:14:53.679 --> 00:14:56.200
engines. They just predict the next token. They

00:14:56.200 --> 00:14:57.940
don't typically stop and think, wait, is what

00:14:57.940 --> 00:15:00.259
I just said actually true? They just keep generating.

00:15:00.419 --> 00:15:03.200
Unless you force them to stop. Right. Self-reflection

00:15:03.200 --> 00:15:05.480
is about stopping the AI from answering immediately.

00:15:05.700 --> 00:15:07.940
You script a loop where it has to grade its own

00:15:07.940 --> 00:15:10.509
homework before it shows it to you. Walk me through

00:15:10.509 --> 00:15:12.850
the process described in the guide. It's a specific

00:15:12.850 --> 00:15:15.190
script, isn't it? It is. It totally changes the

00:15:15.190 --> 00:15:18.870
workflow. Step one, you tell the AI to create

00:15:18.870 --> 00:15:21.690
a rubric. You say, define three to five criteria

00:15:21.690 --> 00:15:24.830
for a perfect answer to this question. So the

00:15:24.830 --> 00:15:27.149
AI sets the standards for itself first? Right.

00:15:27.289 --> 00:15:30.730
Step two, it generates a first draft. Yeah. But,

00:15:30.750 --> 00:15:33.029
and this is the key, you tell it not to show

00:15:33.029 --> 00:15:35.649
you the draft yet. It keeps it internal. Step

00:15:35.649 --> 00:15:39.509
three. It rates that draft on the rubric it just

00:15:39.509 --> 00:15:42.509
created. It literally scores itself. Accuracy,

00:15:42.549 --> 00:15:47.190
6/10. Clarity, 8/10. It becomes its own critic.

00:15:47.289 --> 00:15:50.570
And step four, if any criterion scores below, say, an eight,

00:15:50.710 --> 00:15:53.330
it has to revise that section. It iterates. It

00:15:53.330 --> 00:15:56.509
loops on its own. Wow. And only in step five

00:15:56.509 --> 00:15:59.090
does it deliver the final result to you. So it

00:15:59.090 --> 00:16:02.409
writes, edits, rewrites, and then publishes.
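
The five-step loop just described can be written down as one reusable instruction block. This is a plain prompt template sketched from the guide, not an API call; the threshold of eight is the episode's example value, and the exact wording is an assumption.

```python
# The self-reflection loop as prompt text: rubric, hidden draft,
# self-scoring, revision, and only then a final answer.
SELF_REFLECTION_TEMPLATE = """\
{question}

Before answering, follow this process:
1. Create a rubric: define 3 to 5 criteria for a perfect answer.
2. Write a first draft. Do not show it to me.
3. Score the draft against each criterion from 1 to 10.
4. If any criterion scores below {threshold}, revise that section and re-score.
5. Only then show me the final answer."""

def with_self_reflection(question: str, threshold: int = 8) -> str:
    """Attach the self-grading loop to any question."""
    return SELF_REFLECTION_TEMPLATE.format(question=question, threshold=threshold)

print(with_self_reflection("Summarize the risks in this contract."))
```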

00:16:02.409 --> 00:16:04.769
And I only ever see the final product. You never

00:16:04.769 --> 00:16:07.230
see the messy first draft where it hallucinated

00:16:07.230 --> 00:16:10.049
a legal precedent or got the math wrong. It catches

00:16:10.049 --> 00:16:12.549
its own errors. That is, that's like having an

00:16:12.549 --> 00:16:14.830
intern and a manager in the same box. It really

00:16:14.830 --> 00:16:17.330
is. But probing question, doesn't this make the

00:16:17.330 --> 00:16:20.429
response slower? Yes, but would you rather have

00:16:20.429 --> 00:16:23.460
a fast answer or a correct one? That's a good

00:16:23.460 --> 00:16:25.840
point for the high stakes stuff. Exactly. If

00:16:25.840 --> 00:16:27.899
you're generating a legal contract or analyzing

00:16:27.899 --> 00:16:30.179
medical data or debugging code, you don't care

00:16:30.179 --> 00:16:32.200
about the extra 40 seconds. You want the truth.

00:16:32.360 --> 00:16:34.159
So bringing it all together, the source talks

00:16:34.159 --> 00:16:36.480
about an ultimate template. The nuclear launch

00:16:36.480 --> 00:16:38.399
code. This is where you combine everything we've

00:16:38.399 --> 00:16:41.399
just talked about. Yes. You have a high-stakes

00:16:41.399 --> 00:16:44.679
task. You wrap the context and task in XML tags

00:16:44.679 --> 00:16:47.320
so the logic is bulletproof. You include the

00:16:47.320 --> 00:16:49.559
self-reflection loop in the instructions so

00:16:49.559 --> 00:16:52.019
it has to check itself. And then you sprinkle

00:16:52.019 --> 00:16:54.860
in those trigger words. This is critical to get

00:16:54.860 --> 00:16:57.159
right. That seems like it would be undeniable

00:16:57.159 --> 00:17:00.399
to the system. It effectively guarantees that

00:17:00.399 --> 00:17:02.559
the router sends you to the absolute smartest

00:17:02.559 --> 00:17:05.279
version of the model and that the model operates

00:17:05.279 --> 00:17:09.210
at its peak reasoning capacity. It is very, very

00:17:09.210 --> 00:17:12.230
hard to get a lazy answer with that kind of structure.
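
Putting the pieces together, the "ultimate template" the hosts describe can be sketched as one function: XML structure, plus the self-reflection loop, plus trigger phrases. Everything here paraphrases the episode; the tag names and wording are an illustrative convention, not an official schema.

```python
def ultimate_prompt(context: str, task: str) -> str:
    """Combine XML rooms, a self-check loop, and trigger phrases."""
    reflection = (
        "Define 3 to 5 criteria for a perfect answer, draft internally, "
        "score the draft against each criterion, revise anything below 8, "
        "and only then answer."
    )
    # Trigger phrases go last, where the guide's examples place them.
    triggers = "Think deeply about this. This is critical to get right."
    return (
        f"<context>{context}</context>\n"
        f"<task>{task}</task>\n"
        f"<instructions>{reflection} {triggers}</instructions>"
    )

combo = ultimate_prompt(
    context="Quarterly financials for a ten-person coffee shop.",
    task="Identify the three biggest risks to cash flow.",
)
print(combo)
```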

00:17:12.509 --> 00:17:15.069
The guide concludes with this idea of a new divide

00:17:15.069 --> 00:17:17.609
among users. Yeah, this part really struck me.

00:17:17.670 --> 00:17:20.630
The source suggested in 2026, we have two types

00:17:20.630 --> 00:17:22.630
of people. First, you have the router-aware

00:17:22.630 --> 00:17:25.210
users. These are the people using XML and triggers.

00:17:25.369 --> 00:17:27.269
They're getting 10x results. They feel like wizards.

00:17:27.509 --> 00:17:29.250
And then there's everyone else. Everyone else

00:17:29.250 --> 00:17:32.750
is prompting like it's 2023. They type, write

00:17:32.750 --> 00:17:34.690
me a blog post. They get a generic answer from

00:17:34.690 --> 00:17:37.400
the base model. And then they say, hey, AI is

00:17:37.400 --> 00:17:39.779
overhyped. It's plateaued. It's not that the

00:17:39.779 --> 00:17:42.700
tool is bad. It's that they're using a blunt

00:17:42.700 --> 00:17:45.240
instrument on a precision machine. Precisely.

00:17:45.400 --> 00:17:48.559
The router is always routing. It is always judging

00:17:48.559 --> 00:17:51.539
your prompt. The question is, are you giving

00:17:51.539 --> 00:17:54.720
it the signals it needs to respect you? That's

00:17:54.720 --> 00:17:57.579
a powerful thought to end on. The router is always

00:17:57.579 --> 00:18:00.200
routing. Every time you type, an invisible system

00:18:00.200 --> 00:18:02.220
is deciding if you deserve its full intelligence.

00:18:02.660 --> 00:18:04.539
It's a little chilling, but it's also empowering

00:18:04.539 --> 00:18:07.059
if you know the tricks. So here's our challenge

00:18:07.059 --> 00:18:09.539
to you, the listener. You don't have to start

00:18:09.539 --> 00:18:11.819
writing code today, but on your next prompt,

00:18:12.000 --> 00:18:15.519
just one prompt today, try a trigger word. Yeah,

00:18:15.599 --> 00:18:18.140
just try it. Add, think deeply about this, or

00:18:18.140 --> 00:18:20.420
this is critical to get right to the end of your

00:18:20.420 --> 00:18:23.039
request. Just see if the texture of the answer

00:18:23.039 --> 00:18:25.799
changes. And if you're feeling brave, ask it

00:18:25.799 --> 00:18:28.160
to convert your prompt to XML. See what happens

00:18:28.160 --> 00:18:30.099
when you hold up that mirror. I'm going to go

00:18:30.099 --> 00:18:33.099
try the XML thing on my dinner plans. Context,

00:18:33.200 --> 00:18:37.849
hungry; constraints, spicy; latency, low. Let

00:18:37.849 --> 00:18:39.309
me know how the router handles that one. Will

00:18:39.309 --> 00:18:41.829
do. Thanks for listening to the deep dive. We'll

00:18:41.829 --> 00:18:42.430
see you next time.
