WEBVTT

00:00:00.000 --> 00:00:03.040
So imagine you're talking to someone, right?

00:00:04.019 --> 00:00:06.280
And maybe five minutes after you tell them your

00:00:06.280 --> 00:00:08.859
name, they forget it. Oh, yeah. Or, you know,

00:00:08.960 --> 00:00:10.880
you poured your heart out last week about your

00:00:10.880 --> 00:00:14.460
big life goals, and today, blank stare, like

00:00:14.460 --> 00:00:16.480
you've never even met. It's kind of like that

00:00:16.480 --> 00:00:20.260
movie Memento, but for your AI assistant. Exactly.

00:00:20.320 --> 00:00:22.699
Honestly, that's probably the biggest thing holding

00:00:22.699 --> 00:00:26.239
AI agents back right now, this goldfish memory

00:00:26.239 --> 00:00:29.359
problem. Most just have this tiny temporary buffer.

00:00:29.460 --> 00:00:31.719
It really stops them from becoming true partners,

00:00:31.940 --> 00:00:34.500
doesn't it? Or collaborators. Definitely. Welcome

00:00:34.500 --> 00:00:38.130
to the Deep Dive. Today we're... taking a plunge

00:00:38.130 --> 00:00:41.850
into fixing that, giving AI real, actual, long

00:00:41.850 --> 00:00:43.909
-term memory. And it's not just about remembering

00:00:43.909 --> 00:00:46.270
the last two chat messages, not at all. No, it's

00:00:46.270 --> 00:00:47.770
much bigger than that. It's about building AIs

00:00:47.770 --> 00:00:50.509
that genuinely understand you. You know, your

00:00:50.509 --> 00:00:52.750
preferences, past talks, goals. That's how they

00:00:52.750 --> 00:00:54.530
give you that really personal evolving help.

00:00:54.750 --> 00:00:57.409
So our mission today is to unpack this core problem.

00:00:57.789 --> 00:01:00.469
We're going to look at how these relational knowledge

00:01:00.469 --> 00:01:02.770
graphs, something like this open source tool

00:01:02.770 --> 00:01:06.530
called Zep, can offer a real solution. Zep's

00:01:06.530 --> 00:01:09.989
basically a database made for AI memory. Exactly.

00:01:10.010 --> 00:01:11.530
And we're going to walk you through how this

00:01:11.530 --> 00:01:15.450
memory gets built from scratch. And the big one.

00:01:15.560 --> 00:01:18.719
How you do it without racking up some crazy huge

00:01:18.719 --> 00:01:21.400
bill from your AI provider. Yeah, the cost thing

00:01:21.400 --> 00:01:23.739
is key. We'll also get into some advanced tricks,

00:01:24.400 --> 00:01:27.379
the ethics side of things, privacy. Super important.

00:01:27.700 --> 00:01:30.040
And even give you a kind of four week plan to

00:01:30.040 --> 00:01:32.620
try this yourself. The goal is you'll understand

00:01:32.620 --> 00:01:35.500
how to take your AI agents from, well, forgetful

00:01:35.500 --> 00:01:37.920
assistants to really powerful collaborators.

00:01:38.079 --> 00:01:40.079
All right, let's get to it. So when you first

00:01:40.079 --> 00:01:42.430
start with an AI agent, its memory is... Well,

00:01:42.530 --> 00:01:44.769
pretty basic. Mm-hmm. Very basic. It usually

00:01:44.769 --> 00:01:46.750
just remembers the last few things said, maybe

00:01:46.750 --> 00:01:48.709
five to 15 messages back and forth. It's all

00:01:48.709 --> 00:01:51.030
in this temporary digital space. If you say,

00:01:51.290 --> 00:01:53.750
hi, my name is AI Fire, sure, it can say your

00:01:53.750 --> 00:01:56.170
name back. But, and this is the crucial part,

00:01:56.329 --> 00:01:59.310
the fatal flaw, really, it's just reading a transcript.

00:02:00.010 --> 00:02:02.069
The AI isn't learning anything deep. It's just

00:02:02.069 --> 00:02:04.750
looking back a few lines. So if the chat goes

00:02:04.750 --> 00:02:06.810
on too long. Or you start a new one later. Yeah.

00:02:07.040 --> 00:02:10.360
Poof. That old context is just gone. It doesn't

00:02:10.360 --> 00:02:13.400
understand AI fire as a person. It just knows

00:02:13.400 --> 00:02:16.199
those words were typed recently. It remembers

00:02:16.199 --> 00:02:19.599
the words, not the person. Okay, so contrast

00:02:19.599 --> 00:02:21.740
that with real long-term memory. What does that

00:02:21.740 --> 00:02:24.039
look like? Maybe I ask, hey, remind me about

00:02:24.039 --> 00:02:26.520
that Paris trip plan from last month. Yeah. And

00:02:26.520 --> 00:02:28.460
the agent, without you feeding in any details

00:02:28.460 --> 00:02:30.599
again, just comes back with, sure, here's that

00:02:30.599 --> 00:02:32.780
personalized plan for John Doe's Paris trip next

00:02:32.780 --> 00:02:37.659
month. Duration. Budget. Around $2,000. It pulls

00:02:37.659 --> 00:02:39.879
John Doe and the budget from its stored knowledge.

00:02:40.039 --> 00:02:42.340
Not from what I just typed. Exactly, from deep

00:02:42.340 --> 00:02:45.740
storage. Learned stuff. That difference, remembering

00:02:45.740 --> 00:02:47.860
words versus actually understanding a person,

00:02:47.860 --> 00:02:50.639
that seems huge. What's the fundamental difference

00:02:50.639 --> 00:02:53.340
there, inside the AI? It's remembering recent

00:02:53.340 --> 00:02:56.080
chat versus deeply understanding you. Okay, so

00:02:56.080 --> 00:02:58.740
how does it know all that stuff then? The John

00:02:58.740 --> 00:03:02.360
Doe details, the budget, the trip. It's this

00:03:02.360 --> 00:03:04.099
relational knowledge graph you mentioned. Yep,

00:03:04.180 --> 00:03:06.409
that's the core of it. And it's not just a simple

00:03:06.409 --> 00:03:09.990
list of facts. Think of it more like a mind palace,

00:03:10.569 --> 00:03:13.150
or maybe a really detailed visual map of how

00:03:13.150 --> 00:03:15.949
knowledge connects. For every single user, it's

00:03:15.949 --> 00:03:18.550
like their own personal Wikipedia. It stores

00:03:18.550 --> 00:03:21.409
facts, identifies the key things, the entities,

00:03:21.629 --> 00:03:24.949
like John Doe or Paris, and crucially, it maps

00:03:24.949 --> 00:03:27.770
the relationships between them, like John Doe

00:03:27.770 --> 00:03:29.830
plans trip to Paris. And I guess as you talk

00:03:29.830 --> 00:03:33.069
more, that map gets... Denser, exactly. It grows. New

00:03:33.069 --> 00:03:35.229
things get added, like maybe the Musée d'Orsay,

00:03:35.229 --> 00:03:37.710
and new connections, like John Doe is interested

00:03:37.710 --> 00:03:39.550
in art museums. Interesting. And it's not just

00:03:39.550 --> 00:03:42.069
the connections. Each thing in the graph, each

00:03:42.069 --> 00:03:44.889
entity, like John Doe or Hotel Paris Central, gets

00:03:44.889 --> 00:03:48.090
its own little summary generated by the AI, something

00:03:48.090 --> 00:03:52.050
like, John Doe, user, planning Paris trip, budget

00:03:52.050 --> 00:03:55.930
$2,000. So this structured brain lets the AI

00:03:55.930 --> 00:03:58.650
quickly scan for the key ideas and links, makes

00:03:58.650 --> 00:04:01.050
its answers way faster and smarter, because it

00:04:01.050 --> 00:04:03.210
doesn't have to reread tons of old chat logs.
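
NOTE
A rough sketch, in Python, of the kind of structure being described here: entities, relationship triples, and a short per-entity summary. The field names and shapes are illustrative only, not Zep's actual schema.
    # One user's relational memory: entities, the relationships between them,
    # and an AI-generated summary per entity.
    user_graph = {
        "entities": {
            "John Doe": {"type": "user",
                         "summary": "User planning a Paris trip, budget $2,000"},
            "Paris": {"type": "city", "summary": "Destination of the planned trip"},
        },
        "relationships": [
            ("John Doe", "PLANS_TRIP_TO", "Paris"),
            ("John Doe", "INTERESTED_IN", "art museums"),
        ],
    }
    # Answering "remind me about that Paris trip" is then a scan of a few
    # structured facts, not a reread of the whole chat history.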

00:04:03.490 --> 00:04:05.909
So how does this mind map make the AI smarter,

00:04:05.909 --> 00:04:08.289
essentially? It's a structured brain for quick,

00:04:08.789 --> 00:04:10.689
precise knowledge retrieval. Let's try to picture

00:04:10.689 --> 00:04:13.650
this starting from zero. Imagine a totally new

00:04:13.650 --> 00:04:16.470
user. Let's call him Max. He talks to the agent

00:04:16.470 --> 00:04:18.410
for the first time. He says, my name is Max.

00:04:18.470 --> 00:04:20.790
I enjoy hiking, and I currently live in Vancouver,

00:04:20.790 --> 00:04:23.889
Canada. OK, boom. Instantly, behind the scenes,

00:04:24.209 --> 00:04:26.709
Zep kicks in. It creates a main thing, an entity,

00:04:27.230 --> 00:04:29.569
Max. Right. Then it starts drawing lines, the

00:04:29.569 --> 00:04:31.990
relationships. Yeah. Max likes hiking. Max lives

00:04:31.990 --> 00:04:35.350
in Canada. Maybe even AI assistant is chatting

00:04:35.350 --> 00:04:38.689
with Max. It's like stacking Lego blocks of data,

00:04:39.050 --> 00:04:41.389
building up that context piece by piece. And

00:04:41.389 --> 00:04:43.889
then Max shares more. The graph gets richer.

00:04:44.069 --> 00:04:46.569
Maybe he adds, I usually hike on weekends. My

00:04:46.569 --> 00:04:49.149
favorite trail is Grouse Grind. I also like taking

00:04:49.149 --> 00:04:51.470
photos of nature, but I'm not a pro photographer

00:04:51.470 --> 00:04:54.709
yet. OK. Graph expands again almost instantly.

00:04:54.920 --> 00:04:58.000
New lines: Max favorite trail, Grouse Grind,

00:04:58.360 --> 00:05:01.699
Max enjoys nature photos, Max skill level amateur

00:05:01.699 --> 00:05:05.100
photographer, and a new entity pops up, Grouse

00:05:05.100 --> 00:05:08.220
Grind, tagged as a hiking trail Max likes. Okay,

00:05:08.220 --> 00:05:10.980
now for the payoff, the intelligence test. Max

00:05:10.980 --> 00:05:13.139
asks, got any ideas for what I could do this

00:05:13.139 --> 00:05:15.199
Saturday? Right, and the agent doesn't just give

00:05:15.199 --> 00:05:18.620
some generic stuff, it queries that knowledge

00:05:18.620 --> 00:05:21.120
graph it just built about Max. It pulls the pieces

00:05:21.120 --> 00:05:23.439
together. It just synthesizes it? Yeah. The response

00:05:23.439 --> 00:05:26.180
might be something like, hi Max. Okay, since

00:05:26.180 --> 00:05:28.199
you like hiking and nature photos, here are some

00:05:28.199 --> 00:05:31.500
ideas for Saturday. One, hike a new trail, maybe

00:05:31.500 --> 00:05:34.100
try a different Vancouver trail, mix it up from

00:05:34.100 --> 00:05:37.379
Grau's Grind, like Lynn Canyon or Cypress Mountain.

00:05:37.850 --> 00:05:40.089
Ah, see, that's the magic, isn't it? It combined

00:05:40.089 --> 00:05:42.009
totally separate facts, the hiking, the photos,

00:05:42.170 --> 00:05:44.829
the location, his usual spot to give a really

00:05:44.829 --> 00:05:46.910
personalized recommendation. It didn't just recall,

00:05:47.209 --> 00:05:49.750
it connected things. Absolutely. What's the magic

00:05:49.750 --> 00:05:51.949
when the graph grows? Agent combines diverse

00:05:51.949 --> 00:05:55.389
facts for truly personalized results. So, this

00:05:55.389 --> 00:05:57.370
all sounds amazing, right? This AI that actually

00:05:57.370 --> 00:06:00.709
gets you. But, oh, here's the catch. The thing

00:06:00.709 --> 00:06:02.290
people don't always talk about up front. There's

00:06:02.290 --> 00:06:04.610
always a catch. As that memory gets smarter and

00:06:04.610 --> 00:06:08.279
deeper, every single check can get way, way more

00:06:08.279 --> 00:06:11.879
expensive. Ah, the cost. Why? It comes down to tokens.

00:06:11.879 --> 00:06:14.439
That's how most AI models charge. Think of

00:06:14.439 --> 00:06:17.279
a token as, like, roughly a word, or sometimes part

00:06:17.279 --> 00:06:19.639
of a word. Okay. Every time you send a message,

00:06:19.639 --> 00:06:21.839
the agent doesn't just send your message to the

00:06:21.839 --> 00:06:25.120
big brain, the large language model, or LLM. It

00:06:25.120 --> 00:06:27.500
bundles up this whole context package. What's

00:06:27.500 --> 00:06:30.000
in the package? Usually a summary of you, the

00:06:30.000 --> 00:06:32.360
user, plus relevant facts pulled from that Knowledge

00:06:32.360 --> 00:06:34.360
Graph we talked about, and maybe the last few

00:06:34.360 --> 00:06:36.779
messages from the chat. Gotcha. So in our simple

00:06:36.779 --> 00:06:40.079
max example, asking about Saturday, how many

00:06:40.079 --> 00:06:43.240
tokens was that? Believe it or not, around

00:06:43.240 --> 00:06:46.579
2,727 tokens, just for that simple question. Now,

00:06:46.579 --> 00:06:49.240
scale that up. Imagine a loyal customer you've

00:06:49.240 --> 00:06:51.259
interacted with for months. Their Knowledge Graph

00:06:51.259 --> 00:06:53.939
might have hundreds of facts. That context package

00:06:53.939 --> 00:06:56.980
could easily hit 3,000, 5,000, maybe even

00:06:56.980 --> 00:06:59.829
10,000 tokens per message. Okay, let's do the math.

00:07:00.069 --> 00:07:03.410
If it's, say, $0.002 per thousand tokens.

00:07:03.610 --> 00:07:08.769
Right. 3,000 tokens is $0.006. Six tenths

00:07:08.769 --> 00:07:11.029
of a cent. Doesn't sound like much. Quite. But

00:07:11.029 --> 00:07:14.689
if you have 1,000 users chatting daily... That's

00:07:14.689 --> 00:07:17.709
$6 a day, $180 a month, just for the memory piece.
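
NOTE
A quick sketch of the cost arithmetic just walked through, using the same illustrative numbers; real per-token prices vary by model and provider.
    price_per_1k_tokens = 0.002          # dollars, the example rate above
    tokens_per_message = 3_000           # context package for a long-time user
    cost_per_message = tokens_per_message / 1000 * price_per_1k_tokens  # $0.006
    users, messages_per_user_per_day = 1_000, 1
    daily_cost = cost_per_message * users * messages_per_user_per_day   # $6.00
    monthly_cost = daily_cost * 30                                       # $180
    print(f"${cost_per_message:.3f}/message, ${daily_cost:.2f}/day, ${monthly_cost:.0f}/month")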

00:07:18.230 --> 00:07:20.170
And that can easily balloon into thousands if

00:07:20.170 --> 00:07:22.089
your graph gets really big or you have lots of

00:07:22.089 --> 00:07:24.670
users. So why does having a smart AI memory get

00:07:24.670 --> 00:07:28.329
so expensive? AI models charge by tokens, and

00:07:28.329 --> 00:07:31.490
long context uses many. Right. So the million

00:07:31.490 --> 00:07:33.029
dollar question, or maybe the thousand dollar

00:07:33.029 --> 00:07:35.910
a month question. How do we get this powerful

00:07:35.910 --> 00:07:39.600
memory without going broke? Method one. Smart

00:07:39.600 --> 00:07:41.980
context filtering, the surgical approach, as you

00:07:41.980 --> 00:07:43.800
called it. Yeah, because a lot of the off-the-

00:07:43.800 --> 00:07:45.819
shelf memory tools, they kind of act like a blunt

00:07:45.819 --> 00:07:47.399
instrument. They just grab everything, all the

00:07:47.399 --> 00:07:49.879
facts, all the history. It's frankly often just

00:07:49.879 --> 00:07:51.819
lazy engineering. Okay. So what's the fix? You

00:07:51.819 --> 00:07:53.480
got to take control. Instead of pulling everything

00:07:53.480 --> 00:07:56.660
blindly, you use direct HTTP requests, basically

00:07:56.660 --> 00:07:59.079
specific web commands, to be like a surgeon. You

00:07:59.079 --> 00:08:01.959
precisely select only the relevant info. So you're

00:08:01.959 --> 00:08:03.680
telling the system, just give me the last 10 messages.

00:08:03.959 --> 00:08:07.199
Exactly. Or, show me only the top three facts

00:08:07.199 --> 00:08:09.319
from the long-term memory that are really relevant

00:08:09.319 --> 00:08:11.699
to this specific question. Maybe you even add

00:08:11.699 --> 00:08:14.420
a filter, like ignore anything less than 70%

00:08:14.420 --> 00:08:17.019
relevant. Okay, contrast the flows. Standard

00:08:17.019 --> 00:08:20.699
flow: user message arrives, grab all facts plus

00:08:20.699 --> 00:08:22.680
entire history, then stuff it all into

00:08:22.680 --> 00:08:25.910
the LLM. Wasteful. Super wasteful. Optimized

00:08:25.910 --> 00:08:29.290
flow: user message arrives, targeted request for,

00:08:29.290 --> 00:08:32.350
say, the last 10 messages, smart search request for the

00:08:32.350 --> 00:08:35.190
top three relevant long-term facts, merge that

00:08:35.190 --> 00:08:38.210
small relevant package, then send to LLM. Much

00:08:38.210 --> 00:08:41.179
leaner, faster, focused. Totally. And you can

00:08:41.179 --> 00:08:43.220
get clever with the search query itself. Maybe

00:08:43.220 --> 00:08:45.720
use the LLM first to refine the user's raw message

00:08:45.720 --> 00:08:47.600
into a better search term before you even hit

00:08:47.600 --> 00:08:49.159
the knowledge graph. And I know you mentioned

00:08:49.159 --> 00:08:52.080
using tools like n8n's code node to handle some

00:08:52.080 --> 00:08:54.240
of the data formatting. That can be tricky. Oh,

00:08:54.259 --> 00:08:56.240
yeah. I mean, I still wrestle with prompt drift

00:08:56.240 --> 00:08:58.580
myself sometimes when trying to get AI to structure

00:08:58.580 --> 00:09:01.059
complex JSON data perfectly. It's not always

00:09:01.059 --> 00:09:03.820
easy. You might ask Claude or another AI to help

00:09:03.820 --> 00:09:05.720
write the JavaScript snippet to clean it up.
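
NOTE
A small sketch of the formatting step being discussed, written here in Python rather than an n8n JavaScript code node: flatten raw search hits into a compact, predictable block before it goes into the prompt. The hit field names ("fact", "summary") are assumptions.
    def facts_to_prompt_block(hits: list[dict]) -> str:
        lines = []
        for hit in hits:
            fact = (hit.get("fact") or hit.get("summary") or "").strip()
            if fact:
                lines.append(f"- {fact}")
        return "Relevant facts about this user:\n" + "\n".join(lines)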

00:09:05.820 --> 00:09:08.000
That's a good tip. So the results of this surgical

00:09:08.000 --> 00:09:11.190
approach? Dramatic. Like we saw, you can go from

00:09:11.190 --> 00:09:14.330
maybe 2,700 tokens per interaction down to around

00:09:14.330 --> 00:09:17.549
670. Wow, that's a huge drop. Yeah, like a 76%

00:09:17.549 --> 00:09:20.570
reduction. Cuts your API costs by more than

00:09:20.570 --> 00:09:24.070
half. Easy. So how exactly does this surgical

00:09:24.070 --> 00:09:27.590
approach cut costs so dramatically? By sending

00:09:27.590 --> 00:09:30.570
only highly relevant filtered data to the AI.
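
NOTE
A minimal sketch of the "surgical" retrieval described above, done with direct HTTP calls instead of pulling everything. The base URL, endpoint paths, and parameter names are placeholders; match them to your Zep deployment's actual REST API.
    import requests
    ZEP = "http://localhost:8000/api/v1"          # assumed self-hosted Zep URL
    def lean_context(session_id: str, question: str) -> dict:
        # only the last 10 messages, not the whole transcript
        recent = requests.get(f"{ZEP}/sessions/{session_id}/messages",
                              params={"limit": 10}).json()
        # only the top 3 long-term facts, and only if roughly 70%+ relevant
        hits = requests.post(f"{ZEP}/sessions/{session_id}/search",
                             json={"text": question, "limit": 3}).json()
        facts = [h for h in hits if h.get("score", 0) >= 0.7]
        return {"recent_messages": recent, "facts": facts}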

00:09:30.870 --> 00:09:33.830
OK, method one sounds great. But sometimes, even

00:09:33.830 --> 00:09:36.149
being surgical hits a wall, right? You mentioned

00:09:36.149 --> 00:09:39.169
some APIs make it hard to get history in the

00:09:39.169 --> 00:09:41.389
right order. Yeah, exactly. Some systems, maybe

00:09:41.389 --> 00:09:43.529
they give you the whole history, but oldest first.

00:09:43.710 --> 00:09:46.149
So to get the last 10 messages, you have to pull

00:09:46.149 --> 00:09:48.269
everything and sort it yourself. Kind of defeats

00:09:48.269 --> 00:09:50.129
the purpose of being efficient. So that leads

00:09:50.129 --> 00:09:53.049
to method two, hybrid memory architecture. Right.

00:09:53.210 --> 00:09:56.049
The best of both worlds approach. The core idea

00:09:56.049 --> 00:09:58.740
is super simple, but really powerful. Different

00:09:58.740 --> 00:10:00.620
kinds of data belong in different kinds of databases.

00:10:00.879 --> 00:10:02.559
You know, you wouldn't use a hammer for a screw.

00:10:02.840 --> 00:10:05.240
Makes sense. So what's the two brains set up

00:10:05.240 --> 00:10:07.399
here? OK, so for your long term memory, that

00:10:07.399 --> 00:10:10.360
complex web of facts and relationships, you stick

00:10:10.360 --> 00:10:12.980
with Zep's knowledge graph. That's its superpower.

00:10:13.019 --> 00:10:15.460
It's built for that. Got it. And for short term.

00:10:15.940 --> 00:10:18.620
For the recent conversation history, the last

00:10:18.620 --> 00:10:22.509
10, 20 messages, use a standard, simple database

00:10:22.509 --> 00:10:25.529
like PostgreSQL. It's super fast and really efficient

00:10:25.529 --> 00:10:27.730
for just storing ordered lists with timestamps.

00:10:27.909 --> 00:10:30.649
Perfect for recent chat history. Ah, okay. So

00:10:30.649 --> 00:10:32.610
you get the deep understanding from Zep, but

00:10:32.610 --> 00:10:35.009
the lightning fast recall of recent stuff from

00:10:35.009 --> 00:10:37.769
PostgreSQL. Exactly. The best of both worlds.

00:10:38.009 --> 00:10:40.409
How does the flow work then? Message comes in.

00:10:40.610 --> 00:10:43.110
Message arrives. Then at the same time, you fire

00:10:43.110 --> 00:10:46.190
off two requests. One, an API call to Zep for the

00:10:46.190 --> 00:10:49.629
top, say, three relevant long-term facts, and,

00:10:49.629 --> 00:10:51.809
two, a super quick query to PostgreSQL for the

00:10:51.809 --> 00:10:54.169
last 10 messages. Okay, parallel requests. Yep.

00:10:54.529 --> 00:10:56.789
Then you quickly merge those two small relevant

00:10:56.789 --> 00:10:59.450
context packages together, send that combined

00:10:59.450 --> 00:11:01.929
package to the LLM. And after the LLM replies?

00:11:02.090 --> 00:11:04.350
You update both memories. Add the new exchange

00:11:04.350 --> 00:11:07.250
to PostgreSQL and let Zep process it to update

00:11:07.250 --> 00:11:09.330
the knowledge graph if needed. Let's picture

00:11:09.330 --> 00:11:11.950
it. User asks something complex like, where should

00:11:11.950 --> 00:11:13.750
I move? I need a place that fits my interests.

00:11:13.889 --> 00:11:17.279
Right. An agent with this hybrid setup can give

00:11:17.279 --> 00:11:20.440
a really nuanced answer. It uses Zep, the long

00:11:20.440 --> 00:11:22.960
-term brain, to pull facts like enjoys hiking,

00:11:23.259 --> 00:11:25.539
knows about Lynn Canyon Park, interested in photography,

00:11:25.639 --> 00:11:28.139
and it uses PostgreSQL, the short-term brain,

00:11:28.500 --> 00:11:30.320
for the immediate context, like the user just

00:11:30.320 --> 00:11:33.120
mentioned, my future, or thinking about change.

00:11:33.460 --> 00:11:36.679
It combines both for a personalized insight without

00:11:36.679 --> 00:11:38.899
wasting tokens on stuff that's not relevant right

00:11:38.899 --> 00:11:41.679
now. What's the main advantage of this hybrid

00:11:41.679 --> 00:11:45.120
two-brain approach, then? It combines deep relational

00:11:45.120 --> 00:11:48.000
understanding with lightning -fast recent recall.

00:11:48.639 --> 00:11:50.120
This is where it gets really exciting. Moving

00:11:50.120 --> 00:11:52.179
from a single demo to something that works for

00:11:52.179 --> 00:11:55.379
tons of users. The key is session IDs, right?

00:11:55.539 --> 00:11:56.980
Absolutely. It's fundamental. The whole system

00:11:56.980 --> 00:11:59.919
hinges on this. Every unique user gets their

00:11:59.919 --> 00:12:03.399
own unique identifier, the session ID. Think

00:12:03.399 --> 00:12:05.340
of it as the key to their own private knowledge

00:12:05.340 --> 00:12:07.620
graph. So this could be their Telegram chat ID.

00:12:07.759 --> 00:12:09.759
Or their email address if it's an email bot or

00:12:09.759 --> 00:12:12.440
user account ID from your website. Anything unique

00:12:12.440 --> 00:12:15.299
to them. And the benefit. Massive scalability.

00:12:15.620 --> 00:12:18.720
Thousands, even millions of users can be talking

00:12:18.720 --> 00:12:21.299
to the agent at the same time. But each conversation

00:12:21.299 --> 00:12:24.700
is totally separate. Max's knowledge graph doesn't

00:12:24.700 --> 00:12:27.019
leak into Mike's. It's completely isolated. That's

00:12:27.019 --> 00:12:30.100
how you go from a chatbot to a real AI workforce.
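
NOTE
A tiny sketch of the session ID idea: derive one stable, unique key per user (from a Telegram chat ID, an email address, or an account ID) so every person gets an isolated memory graph. The hashing scheme and prefix are illustrative.
    import hashlib
    def session_id_for(channel: str, user_identifier: str) -> str:
        # e.g. session_id_for("telegram", "123456789") or ("email", "max@example.com")
        digest = hashlib.sha256(f"{channel}:{user_identifier}".encode()).hexdigest()
        return f"{channel}-{digest[:32]}"   # stable, unique, no raw PII in the key
    # Every memory read and write is scoped to this ID, so Max's graph never
    # leaks into Mike's.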

00:12:30.220 --> 00:12:32.720
Think about the applications. Customer support.

00:12:32.879 --> 00:12:34.940
Oh, yeah. An agent that remembers your entire

00:12:34.940 --> 00:12:37.860
purchase history, past support tickets, even

00:12:37.860 --> 00:12:40.320
your preferred way of communicating. You never

00:12:40.320 --> 00:12:42.519
have to repeat yourself. It feels like talking

00:12:42.519 --> 00:12:44.779
to a dedicated account manager who actually knows

00:12:44.779 --> 00:12:47.399
you. Or an educational tutor. Building a unique

00:12:47.399 --> 00:12:50.240
learning profile for every single student. Tracking

00:12:50.240 --> 00:12:52.659
their progress, spotting weaknesses, adapting

00:12:52.659 --> 00:12:55.080
teaching styles automatically. It's incredibly

00:12:55.080 --> 00:12:58.210
powerful. The sales assistant. Imagine. Detailed

00:12:58.210 --> 00:13:00.429
history for every prospect: needs, objections

00:13:00.429 --> 00:13:02.350
raised before, personal interests they mentioned

00:13:02.350 --> 00:13:05.110
offhand. It's the perfect briefing for the human

00:13:05.110 --> 00:13:07.330
salesperson stepping in to close the deal. And

00:13:07.330 --> 00:13:10.690
even just onboarding new users. Remembering exactly

00:13:10.690 --> 00:13:13.190
where they left off in a complex setup process.

00:13:13.190 --> 00:13:15.970
That could massively reduce churn. People hate

00:13:15.970 --> 00:13:19.309
starting over. Whoa. Just imagine the impact.

00:13:19.570 --> 00:13:22.070
A sales agent that knows hundreds of prospects

00:13:22.070 --> 00:13:25.360
intimately, instantly. Or a support agent recalling

00:13:25.360 --> 00:13:28.419
every single detail. That really scales human

00:13:28.419 --> 00:13:30.860
-like intelligence. It's a total game changer

00:13:30.860 --> 00:13:33.620
for the user experience. So how do these AI agents

00:13:33.620 --> 00:13:36.840
manage to remember so many different users distinctly?

00:13:37.059 --> 00:13:40.259
Each user has a unique session ID and a private

00:13:40.259 --> 00:13:42.179
knowledge graph. Okay, so we've got the core

00:13:42.179 --> 00:13:44.600
methods down. What about taking it to the next

00:13:44.600 --> 00:13:47.179
level? Pro strategies. Definitely things you

00:13:47.179 --> 00:13:50.200
can do. For cost optimization, you can play with

00:13:50.200 --> 00:13:53.110
dynamic relevance scoring. Meaning? Adjusting

00:13:53.110 --> 00:13:56.370
that relevance threshold. Maybe lower it for

00:13:56.370 --> 00:13:58.629
creative brainstorming tasks where you want more

00:13:58.629 --> 00:14:01.470
tangential ideas, but crank it higher for technical

00:14:01.470 --> 00:14:03.990
support where accuracy is paramount. OK, that

00:14:03.990 --> 00:14:06.370
makes sense. You can also set up entity prioritization.

00:14:06.490 --> 00:14:09.350
Tell Zep, hey, things like past support ticket

00:14:09.350 --> 00:14:11.470
ID are way more important than favorite color.

00:14:11.710 --> 00:14:14.590
So it prioritizes retrieving the critical stuff.
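
NOTE
An illustrative sketch of the two tuning ideas just mentioned: a relevance threshold that shifts with the task type, and priority weights per entity type. The numbers and type names are made up for the example.
    TASK_THRESHOLDS = {"brainstorm": 0.5, "general": 0.7, "tech_support": 0.85}
    ENTITY_PRIORITY = {"support_ticket": 3.0, "order": 2.0,
                       "preference": 1.0, "favorite_color": 0.2}
    def keep_fact(score: float, entity_type: str, task: str) -> bool:
        weighted = score * ENTITY_PRIORITY.get(entity_type, 1.0)
        return weighted >= TASK_THRESHOLDS.get(task, 0.7)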

00:14:15.029 --> 00:14:17.190
Nice. What about old information? Intelligent

00:14:17.190 --> 00:14:20.340
memory decay. Basically, if a fact hasn't been

00:14:20.340 --> 00:14:22.620
relevant or accessed in a long time, its importance

00:14:22.620 --> 00:14:25.759
score gradually fades. It stops outdated info

00:14:25.759 --> 00:14:28.419
from cluttering things up. Smart. And for really

00:14:28.419 --> 00:14:30.960
big scale. You might look beyond just Postgresql

00:14:30.960 --> 00:14:34.039
for the short-term memory. Maybe Redis for absolutely

00:14:34.039 --> 00:14:36.480
blazing fast caching if you have users hitting

00:14:36.480 --> 00:14:39.899
the same topics repeatedly. Or Elasticsearch

00:14:39.899 --> 00:14:42.100
if your knowledge graphs become truly enormous.

00:14:42.220 --> 00:14:44.710
And managing the memory itself. Critical. You

00:14:44.710 --> 00:14:47.690
need limits. Set a max number of facts per user

00:14:47.690 --> 00:14:50.389
graph. Have a strategy to archive graphs for

00:14:50.389 --> 00:14:52.909
inactive users on cheaper storage. Otherwise,

00:14:53.070 --> 00:14:55.289
costs and slowdowns are inevitable. It sounds

00:14:55.289 --> 00:14:57.710
like there are pitfalls, too. What are the common

00:14:57.710 --> 00:15:00.590
mistakes people make building these? The landmines

00:15:00.590 --> 00:15:03.200
to avoid. Oh yeah, plenty. Number one is probably

00:15:03.200 --> 00:15:05.620
over-storing. Saving every tiny detail makes

00:15:05.620 --> 00:15:08.639
the graph noisy and less useful. Use those relevance

00:15:08.639 --> 00:15:12.259
thresholds. Prioritize entities. Session ID collisions.

00:15:12.820 --> 00:15:15.100
Using IDs that aren't truly unique. Disaster.

00:15:15.399 --> 00:15:18.000
You mix up user data. Always use secure, unique

00:15:18.000 --> 00:15:21.019
IDs like UUIDs or properly hashed identifiers.

00:15:21.919 --> 00:15:24.509
Unbounded memory growth. Just letting graphs

00:15:24.509 --> 00:15:27.590
grow forever without limits or archiving leads

00:15:27.590 --> 00:15:30.250
to slowdowns, ballooning costs. Implement those

00:15:30.250 --> 00:15:32.889
limits. Poor relationship quality. Sometimes

00:15:32.889 --> 00:15:35.710
the AI extracting facts, it gets it wrong, creates

00:15:35.710 --> 00:15:38.970
weird or incorrect links, like Max is located in

00:15:38.970 --> 00:15:41.470
Panama when he's in Vancouver. You need to validate,

00:15:41.789 --> 00:15:43.970
maybe fine tune the prompts used for extraction.

00:15:44.230 --> 00:15:46.950
Good point. And the last one. Ignoring token

00:15:46.950 --> 00:15:49.669
optimization. Just assuming memory is worth any

00:15:49.669 --> 00:15:51.950
cost. You have to monitor usage and implement

00:15:51.950 --> 00:15:54.610
filtering like we discussed. Costs can sneak

00:15:54.610 --> 00:15:57.570
up on you fast. So what are the biggest mistakes

00:15:57.570 --> 00:16:00.110
people make when building AI memory systems?

00:16:00.250 --> 00:16:02.909
Over-storing data, wrong IDs, unchecked growth,

00:16:03.470 --> 00:16:06.289
poor data quality, ignoring costs. Building these

00:16:06.289 --> 00:16:08.149
powerful memory systems isn't just tech though.

00:16:08.230 --> 00:16:10.649
There's a huge ethical dimension. Absolutely.

00:16:10.840 --> 00:16:12.799
With great memory comes great responsibility.

00:16:13.440 --> 00:16:15.620
A system that remembers so much about someone

00:16:15.620 --> 00:16:18.000
requires you to be a really careful guardian

00:16:18.000 --> 00:16:19.919
of their privacy. What are the key principles

00:16:19.919 --> 00:16:23.840
there? Transparency, number one. Tell users the

00:16:23.840 --> 00:16:26.220
agent remembers things to help them. Something

00:16:26.220 --> 00:16:28.539
simple like, to improve our chats, I'll remember

00:16:28.539 --> 00:16:31.679
key details. Goes a long way. And letting users

00:16:31.679 --> 00:16:34.259
control their data. Crucial. The right to be

00:16:34.259 --> 00:16:37.259
forgotten, like under GDPR. Users must be able

00:16:37.259 --> 00:16:40.059
to easily see their data, export it, and most

00:16:40.059 --> 00:16:42.720
importantly, delete it if they want to. And security,

00:16:42.980 --> 00:16:45.059
obviously. Non-negotiable. Especially if you're

00:16:45.059 --> 00:16:47.600
storing anything remotely sensitive, robust security

00:16:47.600 --> 00:16:50.039
is a must. And you mentioned performance benchmarks

00:16:50.039 --> 00:16:53.080
earlier. These optimizations aren't just theory,

00:16:53.279 --> 00:16:55.679
they have real impact. Huge impact. We looked

00:16:55.679 --> 00:16:58.379
at cost per 1,000 interactions. Basic Zep might

00:16:58.379 --> 00:17:03.299
be, say, $150 to $240. Okay. Our optimized HTTP

00:17:03.299 --> 00:17:06.430
filtering method cuts that way down, maybe $60

00:17:06.430 --> 00:17:09.069
to $90. Big improvement. But the hybrid architecture,

00:17:09.190 --> 00:17:11.109
that's the winner. We saw a cost between $48

00:17:11.109 --> 00:17:13.950
and $72. That's a massive saving compared to

00:17:13.950 --> 00:17:15.970
the basic setup. And user experience improves

00:17:15.970 --> 00:17:19.690
too. Dramatically. Response times drop from like

00:17:19.690 --> 00:17:21.750
three to eight seconds down to one to three seconds.

00:17:22.089 --> 00:17:24.670
Accuracy jumps from maybe six out of ten to eight

00:17:24.670 --> 00:17:27.089
point five out of ten. User feedback goes from

00:17:27.089 --> 00:17:30.109
yeah, it's okay, to wow, this thing actually gets

00:17:30.109 --> 00:17:32.809
me. And it's fast. So what's the biggest tightrope

00:17:32.809 --> 00:17:35.509
walk for developers building these powerful AI

00:17:35.509 --> 00:17:37.990
memories? Balancing advanced memory capabilities

00:17:37.990 --> 00:17:41.170
with user privacy and data security. OK. So for

00:17:41.170 --> 00:17:42.470
listeners who are thinking, all right, I want

00:17:42.470 --> 00:17:44.970
to build this, you put together a kind of four

00:17:44.970 --> 00:17:47.150
-week plan. Yeah, a practical roadmap to get

00:17:47.150 --> 00:17:50.190
started. Week one, foundation setup. Meaning?

00:17:50.390 --> 00:17:53.069
Get Zep installed. Get PostgreSQL running. Maybe

00:17:53.069 --> 00:17:55.289
grab a pre-built workflow template. Start having

00:17:55.289 --> 00:17:57.430
simple chats just to see the graph begin to form.

00:17:57.589 --> 00:18:00.119
Get the basics working. Week 2. Customization.

00:18:00.380 --> 00:18:02.519
Start tailoring it to your specific need. Adjust

00:18:02.519 --> 00:18:04.880
those relevance scores. Maybe define some custom

00:18:04.880 --> 00:18:06.940
types of entities you care about. And critically,

00:18:07.339 --> 00:18:09.339
set up proper session ID handling for how your

00:18:09.339 --> 00:18:12.460
users will connect. Week 3. Optimization. Now,

00:18:12.460 --> 00:18:15.140
implement those cost-saving tricks. Fine-tune

00:18:15.140 --> 00:18:17.900
the limits, the relevance filters. Double-check

00:18:17.900 --> 00:18:20.759
those PostgreSQL queries are running fast. Set

00:18:20.759 --> 00:18:23.420
up monitoring so you can actually see your token

00:18:23.420 --> 00:18:26.119
usage and costs. And week four. Production deployment.

00:18:26.579 --> 00:18:28.880
Test it hard with simulated users to make sure

00:18:28.880 --> 00:18:31.839
everyone's data stays separate. Set up backups

00:18:31.839 --> 00:18:34.759
and a recovery plan. Then you're ready to go

00:18:34.759 --> 00:18:37.559
live. So the bottom line here, this isn't just

00:18:37.559 --> 00:18:40.539
about fancier chatbots. No. Not at all. This

00:18:40.539 --> 00:18:42.779
is really the foundation for a whole new class

00:18:42.779 --> 00:18:46.619
of AI agents, agents that can actually form genuine,

00:18:47.119 --> 00:18:49.359
useful long-term relationships with people.

00:18:49.519 --> 00:18:51.720
They learn from the past, they understand what

00:18:51.720 --> 00:18:54.359
makes each user unique, and they provide value

00:18:54.359 --> 00:18:56.440
that actually gets better over time. Right. The

00:18:56.440 --> 00:18:59.049
core message is simple. The smartest AI model

00:18:59.049 --> 00:19:01.549
in the world is basically useless if it can't

00:19:01.549 --> 00:19:03.150
remember what actually matters to the person

00:19:03.150 --> 00:19:05.430
it's talking to. And building this kind of memory,

00:19:05.509 --> 00:19:07.890
this understanding, that's a real edge. Huge

00:19:07.890 --> 00:19:09.730
competitive advantage. While everyone else is

00:19:09.730 --> 00:19:11.450
building agents that forget everything tomorrow,

00:19:11.809 --> 00:19:13.369
you can build agents that learn and grow with

00:19:13.369 --> 00:19:15.970
your users. It's about remembering what matters,

00:19:16.210 --> 00:19:18.589
doing it securely and doing it respectfully.

00:19:18.970 --> 00:19:21.750
So here's a final thought to chew on. As AI gets

00:19:21.750 --> 00:19:24.630
more and more woven into our lives, how do you

00:19:24.630 --> 00:19:27.980
think our own human memories might adapt when

00:19:27.980 --> 00:19:30.779
we can lean on these agents that, well, never

00:19:30.779 --> 00:19:33.279
forget. That's a deep question. And what does

00:19:33.279 --> 00:19:37.079
understanding even mean when an AI can perfectly

00:19:37.079 --> 00:19:39.359
recall every single thing you've ever said to

00:19:39.359 --> 00:19:41.220
it? Lots to think about there. We definitely

00:19:41.220 --> 00:19:43.099
encourage you to explore these possibilities,

00:19:43.500 --> 00:19:45.920
giving your AI agents memory that's intelligent,

00:19:46.259 --> 00:19:48.480
affordable, and ethical. That's our deep dive

00:19:48.480 --> 00:19:50.380
for today. [Outro music]
