WEBVTT

00:00:00.000 --> 00:00:04.419
Have you ever spent weeks engineering an AI agent

00:00:04.419 --> 00:00:07.259
only to have it confidently deliver an answer

00:00:07.259 --> 00:00:10.380
that was just demonstrably catastrophically wrong?

00:00:10.480 --> 00:00:12.679
It's the paradox of this current AI generation,

00:00:12.820 --> 00:00:15.400
isn't it? The delivery is perfect, but the information

00:00:15.400 --> 00:00:18.839
is just terrible. You ask for Q4 revenue, you

00:00:18.839 --> 00:00:21.570
get a number off by a factor of 10. Or you ask

00:00:21.570 --> 00:00:23.910
for a contract summary, and it misses the one

00:00:23.910 --> 00:00:26.149
single liability clause that matters. Exactly.

00:00:26.210 --> 00:00:28.109
It's a hallucination, but it's delivered with

00:00:28.109 --> 00:00:31.149
this conviction of a seasoned expert. Welcome

00:00:31.149 --> 00:00:33.189
back to the Deep Dive. Today, we are tackling

00:00:33.189 --> 00:00:36.469
that exact engineering frustration. For anyone

00:00:36.469 --> 00:00:38.729
out there building agents, the secret we found

00:00:38.729 --> 00:00:41.350
is that the flaw, well, it's rarely the language

00:00:41.350 --> 00:00:43.969
model itself. No. The problem is almost always

00:00:43.969 --> 00:00:46.750
rooted in how we feed the data to our systems.

00:00:46.810 --> 00:00:48.630
Which is what we're calling context engineering.

00:00:48.829 --> 00:00:51.119
Precisely. It's the whole process of ensuring

00:00:51.119 --> 00:00:54.079
your retrieval augmented generation, your RAG

00:00:54.079 --> 00:00:57.340
system, has the exact information in the right

00:00:57.340 --> 00:01:00.340
format at the precise time it needs it. So RAG

00:01:00.340 --> 00:01:03.619
is basically giving your LLM some external knowledge

00:01:03.619 --> 00:01:06.659
to keep it grounded. Yes, and our mission today

00:01:06.659 --> 00:01:09.900
is to help you transform your agents from these

00:01:09.900 --> 00:01:14.500
confident guessers into truly verifiable analysts.

00:01:15.040 --> 00:01:17.700
OK, let's unpack this. We need to move beyond

00:01:17.700 --> 00:01:20.379
those intro tutorials that always seem to lead

00:01:20.379 --> 00:01:22.719
to these frustrating failures. For sure. We're

00:01:22.719 --> 00:01:26.480
going to cover four powerful, specific solutions.

00:01:27.060 --> 00:01:29.859
We'll start with a deep dive into why the basic

00:01:29.859 --> 00:01:33.129
default methods fail so badly. Then we'll get

00:01:33.129 --> 00:01:35.790
into the reliable methods, starting with structured

00:01:35.790 --> 00:01:38.469
data: simple filters and SQL queries. And finally,

00:01:38.670 --> 00:01:40.769
we'll define when you should actually use full

00:01:40.769 --> 00:01:43.310
context retrieval or when that traditional vector

00:01:43.310 --> 00:01:46.010
search is still the right tool for the job. Let's

00:01:46.010 --> 00:01:47.489
do it. All right. So when you first start building

00:01:47.489 --> 00:01:50.049
these RAG systems, the default method everyone

00:01:50.049 --> 00:01:52.489
learns is chunk-based retrieval. It's usually

00:01:52.489 --> 00:01:54.709
powered by vector search. Great. You take a massive

00:01:54.709 --> 00:01:57.170
document, you cut it up into these little pieces,

00:01:57.290 --> 00:01:59.689
these chunks. And then the AI only gets to see

00:01:59.689 --> 00:02:02.200
a handful of them, maybe the three or four that

00:02:02.200 --> 00:02:04.780
are most relevant. It's like stacking scattered

00:02:04.780 --> 00:02:06.879
Lego blocks of data and just hoping they form

00:02:06.879 --> 00:02:10.930
a perfect blueprint. That standard, that fragmented

00:02:10.930 --> 00:02:14.129
approach, it basically guarantees three critical

00:02:14.129 --> 00:02:16.849
issues. The first one is the most common, missing

00:02:16.849 --> 00:02:19.750
context. The little fragments, they might contain

00:02:19.750 --> 00:02:22.389
keywords, sure, but they lack all the surrounding

00:02:22.389 --> 00:02:25.270
narrative. The setup, it's like, if you were

00:02:25.270 --> 00:02:27.689
summarizing a 100-page book and I only gave

00:02:27.689 --> 00:02:31.430
you pages 12, 45, and 88. You'd have no idea

00:02:31.430 --> 00:02:33.990
what's going on. You couldn't possibly grasp

00:02:33.990 --> 00:02:36.789
the main character's motivation. You'd miss the

00:02:36.789 --> 00:02:39.759
crucial setup from chapter one. The LLM, when

00:02:39.759 --> 00:02:42.780
it's faced with these disconnected pieces, it's

00:02:42.780 --> 00:02:45.800
just forced to guess. Yeah, and I can offer a

00:02:45.800 --> 00:02:47.599
vulnerable admission here just based on my own

00:02:47.599 --> 00:02:49.639
early failures. The second issue is what I call

00:02:49.639 --> 00:02:52.310
the math wall. Oh, yeah. Language models are,

00:02:52.310 --> 00:02:54.610
I mean, they're incredible text processors, but

00:02:54.610 --> 00:02:58.210
they are fundamentally terrible at complex, verifiable

00:02:58.210 --> 00:03:00.349
math. They're calculators that are bad at math.

00:03:00.550 --> 00:03:03.110
Exactly. I once asked my internal agent for Q3

00:03:03.110 --> 00:03:06.389
sales figures from this sprawling 2,000-line report.

00:03:06.729 --> 00:03:09.250
It pulled four records that happened to contain

00:03:09.250 --> 00:03:11.169
the word total. And it just added those four

00:03:11.169 --> 00:03:14.389
up. It summed those four lines and presented

00:03:14.389 --> 00:03:18.500
that tiny result as our company's entire quarterly

00:03:18.500 --> 00:03:20.500
revenue. And that is the core absurdity, right?

00:03:20.780 --> 00:03:23.639
The agent calculated just a tiny fraction of

00:03:23.639 --> 00:03:26.340
the necessary data, presented the result with

00:03:26.340 --> 00:03:29.280
99% confidence. And you have to go use external

00:03:29.280 --> 00:03:31.159
tools just to realize it's completely broken.

00:03:31.759 --> 00:03:35.500
It happens because that chunking process, it

00:03:35.500 --> 00:03:39.020
prioritizes what sounds similar over actual data

00:03:39.020 --> 00:03:41.759
integrity. Right. And that leads to the third

00:03:41.759 --> 00:03:44.000
flaw, which is the problem of bad summaries.

00:03:44.340 --> 00:03:46.539
Yeah. So you need a comprehensive chronological

00:03:46.539 --> 00:03:49.539
summary of a 50-page technical manual. If your

00:03:49.539 --> 00:03:52.419
system just randomly hands the AI four paragraphs

00:03:52.419 --> 00:03:54.139
from all over the document. The summary is going

00:03:54.139 --> 00:03:55.939
to be useless. It's like trying to write a detailed

00:03:55.939 --> 00:03:58.139
review of a three-hour documentary after only

00:03:58.139 --> 00:04:00.180
watching like 30 seconds of the trailer. You

00:04:00.180 --> 00:04:02.520
lose the plot, the sequence, everything. So if

00:04:02.520 --> 00:04:04.979
we accept that these math errors and logic gaps

00:04:04.979 --> 00:04:07.960
are so easily created by fragmentation, how should

00:04:07.960 --> 00:04:10.340
a developer even approach verifiable accuracy

00:04:10.340 --> 00:04:12.840
at this point? Well, math errors are just inherently

00:04:12.840 --> 00:04:16.040
masked by the LLM's confidence. You have to force

00:04:16.040 --> 00:04:19.019
the calculations onto external specialized computation

00:04:19.019 --> 00:04:21.300
engines. OK. So let's pivot. Let's talk about

00:04:21.300 --> 00:04:23.540
the first real solution, moving away from all

00:04:23.540 --> 00:04:25.959
this guesswork. Right. And this one is specifically

00:04:25.959 --> 00:04:29.300
for structured data. So data that lives in neat

00:04:29.300 --> 00:04:32.980
rows and columns, think an internal dashboard

00:04:32.980 --> 00:04:35.139
or even just a Google sheet. Instead of asking

00:04:35.139 --> 00:04:38.259
the AI to search and guess, we teach it to operate

00:04:38.259 --> 00:04:41.370
more like a database admin. We teach it to filter.

00:04:41.870 --> 00:04:43.850
And this approach relies on what's called function

00:04:43.850 --> 00:04:46.509
calling, or defining these very specific tools

00:04:46.509 --> 00:04:49.670
the AI can use. We don't ask it to read the data.

00:04:49.850 --> 00:04:52.759
We ask it to construct a precise query. So for

00:04:52.759 --> 00:04:55.139
a sales database, for example, we define functions

00:04:55.139 --> 00:04:58.420
like product name query or date query. And the

00:04:58.420 --> 00:05:00.279
technical depth here, the really important part,

00:05:00.680 --> 00:05:02.920
is defining that tool's specification. You have

00:05:02.920 --> 00:05:05.680
to dictate the exact inputs. So for instance,

00:05:06.060 --> 00:05:09.579
requiring the date to be strictly in that YYYYMMDD

00:05:09.579 --> 00:05:12.120
format. That input validation is so crucial.
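
To make that concrete, here's a minimal sketch of a tool specification with strict input validation, written in the JSON-schema style most function-calling APIs accept. The tool name `date_query` and the helper are illustrative assumptions, not any particular vendor's API:

```python
import re

# Hypothetical tool spec the agent would be given; the "pattern" field
# is what forces the model to emit dates strictly as YYYYMMDD.
DATE_QUERY_TOOL = {
    "name": "date_query",
    "description": "Filter sales records down to a single day.",
    "parameters": {
        "type": "object",
        "properties": {
            "date": {
                "type": "string",
                "description": "Day to filter on, strictly YYYYMMDD.",
                "pattern": r"^\d{8}$",
            }
        },
        "required": ["date"],
    },
}

def validate_date_arg(date: str) -> bool:
    """Server-side check: reject anything that is not exactly 8 digits."""
    return re.fullmatch(r"\d{8}", date) is not None

print(validate_date_arg("20250915"))    # True
print(validate_date_arg("2025-09-15"))  # False, wrong format
```

Validating again on the server side matters because the model can ignore the schema under prompt drift, so the tool itself is the last line of defense.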

00:05:12.519 --> 00:05:15.579
It is. And we had to give the agent a core system

00:05:15.579 --> 00:05:18.220
prompt, an instruction that says, always use

00:05:18.220 --> 00:05:20.860
the most specific tool available for the user's

00:05:20.860 --> 00:05:23.339
intent. And that instruction, it turns the agent

00:05:23.339 --> 00:05:25.759
into this highly disciplined engineer, not some

00:05:25.759 --> 00:05:27.579
creative writer trying to guess what you mean.

00:05:27.899 --> 00:05:30.639
Exactly. So consider a real-world query. How

00:05:30.639 --> 00:05:32.959
many wireless headphones were sold in the Northeast

00:05:32.959 --> 00:05:38.360
region on 20250915? The agent doesn't search

00:05:38.360 --> 00:05:41.430
a vector database. It sees the query. It identifies

00:05:41.430 --> 00:05:43.829
the need for filters. And it just chains the

00:05:43.829 --> 00:05:46.009
tools together. So it uses product name query,

00:05:46.410 --> 00:05:48.649
then region query, then the date query to narrow

00:05:48.649 --> 00:05:51.370
everything down. And then finally, it hands that

00:05:51.370 --> 00:05:54.649
small, perfect data set to a dedicated calculator

00:05:54.649 --> 00:05:56.910
or computation engine to just sum the results.
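
As a sketch of that chain, here's the filter-then-calculate flow against a toy in-memory table; in a real system these functions would query an actual data store, and the row contents here are made up:

```python
# Toy stand-in for the sales table.
SALES = [
    {"product": "wireless headphones", "region": "Northeast", "date": "20250915", "units": 40},
    {"product": "wireless headphones", "region": "Northeast", "date": "20250915", "units": 25},
    {"product": "wireless headphones", "region": "West",      "date": "20250915", "units": 50},
    {"product": "usb cable",           "region": "Northeast", "date": "20250915", "units": 10},
]

def product_name_query(rows, product):
    return [r for r in rows if r["product"] == product]

def region_query(rows, region):
    return [r for r in rows if r["region"] == region]

def date_query(rows, date):
    return [r for r in rows if r["date"] == date]

# The agent chains the filters to isolate exactly the rows it needs,
# then delegates the arithmetic to code instead of doing it "in its head".
rows = date_query(
    region_query(
        product_name_query(SALES, "wireless headphones"),
        "Northeast"),
    "20250915")
total_units = sum(r["units"] for r in rows)
print(total_units)  # 65
```

The language model never sees the full table and never adds the numbers; it only decides which filters to call with which arguments.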

00:05:57.329 --> 00:06:00.290
The outcome is 100% accurate. Because the heavy

00:06:00.290 --> 00:06:02.649
lifting was done by precise data filtering, not

00:06:02.649 --> 00:06:05.110
by linguistic guesswork. Right. But I should

00:06:05.110 --> 00:06:06.629
say, for listeners who might not have a deep

00:06:06.629 --> 00:06:08.579
programming background. Yeah. The simplicity

00:06:08.579 --> 00:06:11.120
of that outcome, it really masks the complexity

00:06:11.120 --> 00:06:13.600
of the setup. I mean, defining those tools and

00:06:13.600 --> 00:06:15.620
making sure they don't suffer from prompt drift,

00:06:16.019 --> 00:06:18.259
where the agent just starts ignoring your instructions

00:06:18.259 --> 00:06:21.000
over time. I still wrestle with that myself.

00:06:21.100 --> 00:06:23.439
It's a constant battle. It absolutely is. So

00:06:23.439 --> 00:06:25.959
what makes implementing these specific filters

00:06:25.959 --> 00:06:28.899
superior to just using generalized chunking,

00:06:28.920 --> 00:06:32.139
even for really large structured data sets? Filters

00:06:32.139 --> 00:06:35.000
provide targeted data access, which eliminates

00:06:35.000 --> 00:06:37.800
the risk of data fragmentation and guarantees

00:06:37.800 --> 00:06:41.199
100% verifiable calculation inputs. OK, so filters

00:06:41.199 --> 00:06:43.480
are great for selection. But what if you need

00:06:43.480 --> 00:06:46.040
more complex calculations, things like generating

00:06:46.040 --> 00:06:49.060
averages or standard deviations or sorting complex

00:06:48.910 --> 00:06:51.889
reports. Yeah, the AI is going to fail those

00:06:51.889 --> 00:06:54.829
manual calculations every single time. And this

00:06:54.829 --> 00:06:56.829
is where we bring in the language of databases.

00:06:57.370 --> 00:07:00.110
SQL. So think of this as giving your RAG LLM

00:07:00.110 --> 00:07:02.569
a superpowered calculator that's optimized for

00:07:02.569 --> 00:07:05.310
massive data operations. Exactly. The process

00:07:05.310 --> 00:07:07.750
is pretty elegant. The LLM writes the SQL code,

00:07:08.029 --> 00:07:10.649
maybe a complex GROUP BY query for a

00:07:10.649 --> 00:07:13.329
Postgres database, sends that query to the actual

00:07:13.329 --> 00:07:15.149
database engine. And the engine just returns

00:07:15.149 --> 00:07:18.069
the single precise result. But the critical engineering

00:07:18.069 --> 00:07:21.620
detail. The part you cannot get wrong is in the

00:07:21.620 --> 00:07:24.600
system prompt. You have to include the exact

00:07:24.600 --> 00:07:27.079
table schema. So you have to explicitly tell

00:07:27.079 --> 00:07:29.579
the AI, hey, there's a table named sales data,

00:07:29.819 --> 00:07:32.899
and then list every single column: customer_id

00:07:33.240 --> 00:07:37.139
INT, total_price FLOAT, order_date DATE. Because

00:07:37.139 --> 00:07:39.839
if you omit the exact column names or the data

00:07:39.839 --> 00:07:42.959
types, the AI just guesses. And that leads to

00:07:42.959 --> 00:07:45.160
immediate execution failure. Right. And that's

00:07:45.160 --> 00:07:47.399
why this is a step up in complexity from simple

00:07:47.399 --> 00:07:49.560
filtering. You're actually relying on the LLM

00:07:49.560 --> 00:07:52.160
to write code. And you have to instruct it with

00:07:52.160 --> 00:07:54.560
absolute clarity. Never attempt math manually.

00:07:55.000 --> 00:07:57.319
You force it to delegate the calculation using

00:07:57.319 --> 00:08:01.060
functions like SUM, AVG, and GROUP BY. You're

00:08:01.060 --> 00:08:03.019
pushing all that mathematical load onto the database.
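
A minimal sketch of that delegation, using an in-memory SQLite database as a stand-in for the real engine; the schema mirrors the one described above, and the sample rows and the query the model "writes" are illustrative:

```python
import sqlite3

# Stand-in database with the exact schema the system prompt must spell out.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales_data (customer_id INT, total_price FLOAT, order_date DATE)"
)
conn.executemany(
    "INSERT INTO sales_data VALUES (?, ?, ?)",
    [(1, 120.0, "2025-01-10"), (2, 80.0, "2025-02-03"), (1, 200.0, "2025-03-15")],
)

SYSTEM_PROMPT = """You write SQL only. Never attempt math manually.
Table: sales_data(customer_id INT, total_price FLOAT, order_date DATE).
Delegate all arithmetic to SUM, AVG, COUNT, and GROUP BY."""

# A query the model might emit for "total spending per customer, highest first":
llm_sql = (
    "SELECT customer_id, SUM(total_price) FROM sales_data "
    "GROUP BY customer_id ORDER BY 2 DESC"
)
rows = conn.execute(llm_sql).fetchall()
print(rows)  # [(1, 320.0), (2, 80.0)]
```

The database engine does all the arithmetic; the model's only job is to translate intent into a query against a schema it was told about verbatim.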

00:08:03.519 --> 00:08:05.790
That raises a critical point, though. If we're

00:08:05.790 --> 00:08:08.589
having an LLM write code, doesn't that open us

00:08:08.589 --> 00:08:11.290
up to huge security concerns like SQL injection

00:08:11.290 --> 00:08:14.149
vulnerabilities? Isn't it much slower than just

00:08:14.149 --> 00:08:16.110
fetching the data? That is a crucial technical

00:08:16.110 --> 00:08:18.610
challenge. Yes, the developer has to implement

00:08:18.610 --> 00:08:21.709
really strict input sanitization. And you typically

00:08:21.709 --> 00:08:24.250
have to limit the types of queries the LLM can

00:08:24.250 --> 00:08:26.509
even construct to prevent malicious injection.
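
One hedged sketch of that restriction: a crude allow-list guard that only lets read-only, single-statement queries through. A real deployment would pair this with a proper SQL parser, parameter binding, and a least-privilege, read-only database role:

```python
import re

# Only statements that begin with SELECT are eligible to run at all.
SELECT_ONLY = re.compile(r"^\s*SELECT\b", re.IGNORECASE)

def is_safe_query(sql: str) -> bool:
    """Minimal allow-list guard for LLM-generated SQL (illustrative only)."""
    if not SELECT_ONLY.match(sql):
        return False  # reject DROP, DELETE, UPDATE, INSERT, etc.
    # Reject stacked statements: any semicolon that isn't trailing.
    if ";" in sql.rstrip().rstrip(";"):
        return False
    return True

print(is_safe_query("SELECT SUM(total_price) FROM sales_data"))  # True
print(is_safe_query("DROP TABLE sales_data"))                    # False
print(is_safe_query("SELECT 1; DROP TABLE sales_data"))          # False
```

String checks like this are a first gate, not a guarantee; the read-only database role is what actually makes a malicious query harmless.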

00:08:27.430 --> 00:08:30.750
But you use SQL for speed on complex aggregation,

00:08:30.889 --> 00:08:33.879
not simple filtering. When a user asks, who are

00:08:33.879 --> 00:08:36.240
our top three customers by total spending this

00:08:36.240 --> 00:08:39.679
year? The SQL script processes millions of rows

00:08:39.679 --> 00:08:43.279
instantly. Ah, OK. So you need that power when

00:08:43.279 --> 00:08:45.759
a simple filter just won't cut it for the complex

00:08:45.759 --> 00:08:48.440
sorting and counting. Exactly. We let the database

00:08:48.440 --> 00:08:50.559
do the heavy lifting for absolute reliability.

00:08:50.889 --> 00:08:53.649
So why does providing the specific column names

00:08:53.649 --> 00:08:56.490
and data types prevent the most common SQL failures

00:08:56.490 --> 00:08:59.509
in these agents? Without the exact schema, the

00:08:59.509 --> 00:09:01.990
AI guarantees query failure by producing type

00:09:01.990 --> 00:09:04.529
mismatch errors or ambiguous column references.

00:09:04.629 --> 00:09:07.090
Makes sense. Now let's move back to unstructured

00:09:07.090 --> 00:09:09.649
data. We're talking about those long legal contracts,

00:09:09.950 --> 00:09:13.389
technical manuals, or sprawling video transcripts.

00:09:14.049 --> 00:09:16.409
Sometimes you really do need the AI to understand

00:09:16.409 --> 00:09:18.789
the entirety of a source, not just little fragmented

00:09:18.789 --> 00:09:23.980
parts. With modern large context windows, like 128K

00:09:23.980 --> 00:09:26.940
tokens, this idea of full context retrieval is

00:09:26.940 --> 00:09:29.620
not only viable, but it's often the most accurate

00:09:29.620 --> 00:09:32.200
option. Right, you're giving the model a huge

00:09:32.200 --> 00:09:35.220
memory, the full text. But there are two different

00:09:35.220 --> 00:09:37.720
ways to do this and understanding the cost differences,

00:09:37.840 --> 00:09:40.779
well, it's key. Method A is the simple prompt

00:09:40.779 --> 00:09:44.179
method. You just copy and paste the entire document

00:09:44.179 --> 00:09:47.250
into the AI agent's static system prompt. But

00:09:47.250 --> 00:09:50.289
the cost drawback of that simple method is immediate.

00:09:50.509 --> 00:09:53.389
And it's heavy. Yeah. Since that long text is

00:09:53.389 --> 00:09:56.029
just hard-coded into the prompt, you pay for

00:09:56.029 --> 00:09:58.070
every single one of those tokens, the entire

00:09:58.070 --> 00:10:00.710
document, every single time the agent runs, even

00:10:00.710 --> 00:10:03.330
if the user just asks, hello, how are you? You're

00:10:03.330 --> 00:10:05.629
just burning tokens for no reason. Exactly. And

00:10:05.629 --> 00:10:08.029
that's why we really recommend method B, the

00:10:08.029 --> 00:10:10.809
tool-based method. We wrap the entire document

00:10:10.809 --> 00:10:12.990
inside a tool definition. It's the difference

00:10:12.990 --> 00:10:15.389
between forcing an employee to carry every file

00:10:15.389 --> 00:10:17.259
in their arms all day. versus just giving them

00:10:17.259 --> 00:10:19.940
a key to the filing cabinet. Maximum savings,

00:10:20.320 --> 00:10:23.340
on -demand access. The agent only fetches and

00:10:23.340 --> 00:10:26.820
reads that full, expensive document if the user's

00:10:26.820 --> 00:10:29.240
query specifically triggers that tool. So if

00:10:29.240 --> 00:10:31.840
the user asks about the document, the agent gets

00:10:31.840 --> 00:10:34.639
the full content instantly. But if they ask about

00:10:34.639 --> 00:10:37.139
the weather, it saves the cost. And the performance

00:10:37.139 --> 00:10:39.539
increase is just stunning. We saw a test where

00:10:39.539 --> 00:10:42.360
standard vector search created this messy chronological

00:10:42.360 --> 00:10:45.340
nightmare from a video transcript. But full context.

00:10:45.519 --> 00:10:48.000
Because it had the entire script, it produced

00:10:48.000 --> 00:10:51.019
a perfect sequential and comprehensive summary.

00:10:51.340 --> 00:10:54.139
So if accuracy on a single document is paramount

00:10:54.139 --> 00:10:56.580
and it fits within that context window, this

00:10:56.580 --> 00:11:00.220
is the choice. Whoa. Imagine scaling this precise

00:11:00.220 --> 00:11:02.740
chronological retrieval to a billion queries

00:11:02.740 --> 00:11:05.259
across complex legal documents without losing

00:11:05.259 --> 00:11:08.200
a single key detail. The analysis potential is

00:11:08.200 --> 00:11:10.879
just immense. It guarantees context continuity.

00:11:11.200 --> 00:11:12.940
So given the cost and the speed trade-offs,

00:11:13.320 --> 00:11:15.559
why should we always avoid that simpler prompt

00:11:15.559 --> 00:11:18.440
method for long documents? Avoid the prompt method

00:11:18.440 --> 00:11:20.700
because you will incur unnecessary token cost

00:11:20.700 --> 00:11:22.960
when the user's query does not require accessing

00:11:22.960 --> 00:11:25.409
the document. Okay, so... We've been a little

00:11:25.409 --> 00:11:27.610
tough on chunk-based retrieval, but let's be

00:11:27.610 --> 00:11:30.149
absolutely clear. Vector search is not a bad

00:11:30.149 --> 00:11:32.730
technology. Not at all. It's just deployed inappropriately,

00:11:32.929 --> 00:11:36.149
like 90% of the time. Vector search, or chunk-based

00:11:36.149 --> 00:11:39.210
retrieval, remains the undisputed king

00:11:39.210 --> 00:11:42.549
for one specific thing. Massive scale. It's the

00:11:42.549 --> 00:11:44.870
library card catalog analogy, right? You use

00:11:44.870 --> 00:11:46.769
vector search when you have 10,000 documents,

00:11:47.450 --> 00:11:49.850
a massive library of employee handbooks, old

00:11:49.850 --> 00:11:52.750
support tickets, internal memos. You simply cannot

00:11:52.750 --> 00:11:55.529
afford to feed the AI all of them using full

00:11:55.529 --> 00:11:58.049
context, and they're way too unstructured for

00:11:58.049 --> 00:12:00.870
filters or SQL. And the beauty of it is the indexing.

00:12:01.080 --> 00:12:03.879
It embeds all that text into these numerical

00:12:03.879 --> 00:12:06.419
representations, stores them in a vector store.

00:12:06.500 --> 00:12:08.899
And can rapidly retrieve the pages that are numerically

00:12:08.899 --> 00:12:10.879
closest to the user's question. It helps you

00:12:10.879 --> 00:12:12.759
find the right book or the right page within

00:12:12.759 --> 00:12:14.679
the right book really quickly. So the distinction

00:12:14.679 --> 00:12:17.259
is crucial. You use vector search when the answer

00:12:17.259 --> 00:12:19.279
could be anywhere in this sprawling library.

00:12:19.820 --> 00:12:21.960
It's amazing for the find the needle in the haystack

00:12:21.960 --> 00:12:24.659
kind of query. But it's fundamentally terrible

00:12:24.659 --> 00:12:28.139
for analytical tasks. Things like summarize everything

00:12:28.139 --> 00:12:31.059
or count the total revenue. So of all these powerful

00:12:31.059 --> 00:12:34.440
context methods, what unique capability does

00:12:34.440 --> 00:12:37.279
vector search offer that the others just can't

00:12:37.279 --> 00:12:40.830
replicate? Vector search excels at rapidly indexing

00:12:40.830 --> 00:12:43.809
and retrieving relevant information from massive

00:12:43.809 --> 00:12:46.490
sprawling libraries of unstructured documents.
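
To show the mechanics, here's a toy vector search where bag-of-words counts stand in for learned embeddings; real systems use dense vectors from an embedding model and a proper vector store, so this is only a sketch of the index-then-retrieve-by-similarity idea:

```python
import math
import re
from collections import Counter

# A tiny "library" of unstructured documents.
DOCS = [
    "To reset your password, open Settings and choose Security.",
    "Expense reports are due by the fifth business day of each month.",
    "The office wifi network name is posted in the break room.",
]

def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding'; real systems use learned dense vectors."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Index every document once -- this is the "vector store".
INDEX = [(doc, embed(doc)) for doc in DOCS]

def retrieve(query: str, k: int = 1):
    """Return the k documents numerically closest to the query."""
    q = embed(query)
    ranked = sorted(INDEX, key=lambda pair: cosine(q, pair[1]), reverse=True)
    return [doc for doc, _ in ranked[:k]]

print(retrieve("how do I reset my password?"))
```

The needle-in-the-haystack question finds the password document instantly, but notice there is no mechanism here for summarizing all three documents or counting anything, which is exactly why vector search fails at analytical tasks.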

00:12:46.789 --> 00:12:48.870
Okay, so that brings it all together. How do

00:12:48.870 --> 00:12:51.549
we decide which of these four methods to use

00:12:51.549 --> 00:12:53.970
in a real-world scenario? Well, the decision

00:12:53.970 --> 00:12:56.029
framework is actually incredibly simple. We should

00:12:56.029 --> 00:12:57.950
always start with this question. How would a

00:12:57.950 --> 00:13:00.490
human knowledge worker answer this exact question?

00:13:00.769 --> 00:13:03.490
OK. So if a human would naturally use an Excel

00:13:03.490 --> 00:13:05.710
filter, a query like, show me all orders from

00:13:05.710 --> 00:13:07.970
yesterday where the product was blue, then you

00:13:07.970 --> 00:13:09.909
have to use filters. Right. And if a human would

00:13:09.909 --> 00:13:12.090
need a pivot table or a complex calculator, a

00:13:12.090 --> 00:13:14.789
question like, what's the average lifetime spending

00:13:14.789 --> 00:13:18.409
of our top 10% of customers, then you must use

00:13:18.409 --> 00:13:20.450
SQL queries. And if a human would need to read

00:13:20.450 --> 00:13:22.490
the whole file sequentially, like summarize this

00:13:22.490 --> 00:13:25.169
40-page contract, noting all termination clauses,

00:13:26.050 --> 00:13:28.669
they use full context retrieval. And finally,

00:13:29.070 --> 00:13:31.309
if a human would need to search through a massive

00:13:31.309 --> 00:13:34.470
filing cabinet, if the answer to, how do I reset

00:13:34.470 --> 00:13:37.549
my password, is buried among thousands of old

00:13:37.549 --> 00:13:40.330
user guides, then you use vector search. Okay,

00:13:40.330 --> 00:13:41.929
but what about the layered complexity? What if

00:13:41.929 --> 00:13:45.409
you get a query like, summarize the Q1 marketing

00:13:45.409 --> 00:13:47.789
strategy document for new customers in Texas

00:13:47.789 --> 00:13:51.490
who spent over $500? That is where context engineering

00:13:51.490 --> 00:13:54.600
really earns its name. The agent has to chain

00:13:54.600 --> 00:13:57.799
these things together. First, it uses the filter

00:13:57.799 --> 00:14:00.779
tool to identify the relevant customers: Texas,

00:14:00.779 --> 00:14:04.519
greater than $500. Then it uses the SQL tool

00:14:04.519 --> 00:14:06.919
to aggregate and sort their spending trends.

00:14:07.460 --> 00:14:09.940
And then finally? Finally, it uses the full context

00:14:09.940 --> 00:14:12.460
retrieval tool to fetch and summarize that single

00:14:12.460 --> 00:14:14.360
marketing strategy document that was mentioned

00:14:14.360 --> 00:14:16.460
in the query. It's like a chain of custody for

00:14:16.460 --> 00:14:18.940
information. That layered approach ensures you

00:14:18.940 --> 00:14:21.960
get maximum accuracy at every single step. So

00:14:21.960 --> 00:14:24.440
finally, maybe a few essential technical warnings.

00:14:24.759 --> 00:14:27.460
Yes. Don't be lazy with your system prompts.

00:14:27.740 --> 00:14:29.840
Give your tools descriptive, specific names that

00:14:29.840 --> 00:14:31.799
tell the agent exactly what they do and what

00:14:31.799 --> 00:14:35.480
data format they require. And never ever combine

00:14:35.480 --> 00:14:38.600
structured data like numbers, dates, or precise

00:14:38.600 --> 00:14:42.250
inventory counts with unstructured text blindly

00:14:42.250 --> 00:14:44.570
in a vector database. Keep that structured data

00:14:44.570 --> 00:14:46.450
separate. If you mix your math with your text,

00:14:46.610 --> 00:14:48.610
you are guaranteeing the mathematical analysis

00:14:48.610 --> 00:14:50.850
will fail. That feels like the core lesson here.

00:14:50.929 --> 00:14:53.009
I think so. The main takeaway we've distilled

00:14:53.009 --> 00:14:56.190
today is that building reliable RAG LLMs, it

00:14:56.190 --> 00:14:58.730
isn't about finding the perfect model. It's all

00:14:58.730 --> 00:15:01.210
about context engineering. It's about ensuring

00:15:01.210 --> 00:15:04.490
the AI has the exact, verifiable information

00:15:04.490 --> 00:15:07.789
in the right format at the precise time. We move

00:15:07.789 --> 00:15:11.110
past all the guesswork and into reliable, accountable

00:15:11.110 --> 00:15:14.110
analysis. And remember, the best agents are polyglots.

00:15:14.490 --> 00:15:16.950
They use a mix of these methods. You might filter

00:15:16.950 --> 00:15:19.730
to narrow down a customer base, then use full context

00:15:19.730 --> 00:15:22.350
to read their contract, then use SQL to report

00:15:22.350 --> 00:15:24.750
on their spending. That layered accuracy really

00:15:24.750 --> 00:15:27.029
is the new standard. So here's your final thought

00:15:27.029 --> 00:15:29.690
for reflection. Look at your current AI workflows.

00:15:30.110 --> 00:15:32.049
Ask yourself if you're using that generalized

00:15:32.049 --> 00:15:34.490
vector search where a simple precise filter or

00:15:34.490 --> 00:15:37.549
a structured SQL query would give you 100% more

00:15:37.549 --> 00:15:40.070
accuracy and auditability. Try changing just

00:15:40.070 --> 00:15:42.490
one tool in your pipeline this week. Go build

00:15:42.490 --> 00:15:44.350
something amazing with this knowledge and we

00:15:44.350 --> 00:15:46.110
look forward to the next deep dive with you.
