WEBVTT

00:00:00.000 --> 00:00:03.160
You know that feeling? You're maybe 10, 15 minutes

00:00:03.160 --> 00:00:06.259
into a chat with an AI. Oh, yeah. And it is brilliant.

00:00:06.839 --> 00:00:10.220
You're building something and it's just, it's

00:00:10.220 --> 00:00:12.380
clicking. It remembers every single thing you

00:00:12.380 --> 00:00:13.960
told it. You're in the flow state. It feels like

00:00:13.960 --> 00:00:16.079
it's reading your mind. Exactly. And then, I

00:00:16.079 --> 00:00:18.320
don't know, message 20 hits. Yeah. And it's like

00:00:18.320 --> 00:00:20.039
a switch flips. All of a sudden, it ignores your

00:00:20.039 --> 00:00:22.379
formatting, contradicts itself. It hasn't gotten

00:00:22.379 --> 00:00:24.519
dumber. It just ran out of whiteboard space.

00:00:24.679 --> 00:00:27.190
And today... We're going to fix that. Welcome

00:00:27.190 --> 00:00:29.929
to the deep dive. We are looking at a guide by

00:00:29.929 --> 00:00:33.770
Max Anne called Mastering AI Context Windows

00:00:33.770 --> 00:00:37.469
Memory Hacks for 2026. And this is really the

00:00:37.469 --> 00:00:39.409
manual for dealing with what the author calls

00:00:39.409 --> 00:00:42.530
the memory wall. We're going to cover why models

00:00:42.530 --> 00:00:46.289
like ChatGPT 5.2 and Claude 4.5 seem to just,

00:00:46.390 --> 00:00:48.689
you know, forget things. Right. We'll look at

00:00:48.689 --> 00:00:51.170
the hidden token limits of these 2026 models

00:00:51.170 --> 00:00:54.630
and a strategy called the handoff process to

00:00:54.630 --> 00:00:59.939
keep things on track. So the goal here is to move from just chatting

00:00:59.939 --> 00:01:03.039
to becoming what the article calls a context

00:01:03.039 --> 00:01:05.859
orchestrator. That's the idea. Okay, so let's

00:01:05.859 --> 00:01:08.420
start with that core concept, the context window.

00:01:09.079 --> 00:01:12.319
The guide compares it to a finite whiteboard,

00:01:12.459 --> 00:01:15.060
which I find really helpful. It's honestly the

00:01:15.060 --> 00:01:17.540
best analogy. Yeah. Just picture a physical whiteboard.

00:01:17.920 --> 00:01:20.420
That is a fixed size. It can't get any bigger.

00:01:20.739 --> 00:01:24.480
Every single word you type, every file you upload,

00:01:24.659 --> 00:01:28.180
every response the AI gives, it all takes up

00:01:28.180 --> 00:01:30.500
physical space on this board. And even its own

00:01:30.500 --> 00:01:32.819
internal thinking, right? Exactly. The hidden

00:01:32.819 --> 00:01:34.959
reasoning steps, all of it gets written down.

00:01:35.099 --> 00:01:37.459
And once it's full, I mean, it doesn't just stop

00:01:37.459 --> 00:01:40.780
working. No, it starts erasing. To make room

00:01:40.780 --> 00:01:43.159
for the new stuff you just typed, it has to wipe

00:01:43.159 --> 00:01:45.219
the oldest stuff off the board. Which is why

00:01:45.219 --> 00:01:48.700
instruction drift happens. It's not a bug. It's

00:01:48.700 --> 00:01:51.120
an automatic deletion of your initial rules just

00:01:51.120 --> 00:01:53.959
to fit the new data. The AI isn't being difficult.

00:01:54.060 --> 00:01:56.400
Your rules literally don't exist in its memory

00:01:56.400 --> 00:01:59.739
anymore. So, is this strictly based on word count?

00:02:00.299 --> 00:02:02.739
Or is it more complex? It's not words. It uses

00:02:02.739 --> 00:02:04.719
tokens. Which are what? Roughly three quarters

00:02:04.719 --> 00:02:06.579
of a word? Yeah, that's a good rule of thumb.

00:02:06.680 --> 00:02:08.419
It's a bit more complicated, but that's close

00:02:08.419 --> 00:02:10.939
enough for practical use. Got it. Okay, let's

00:02:10.939 --> 00:02:13.680
talk about the big three models for 2026. Because

00:02:13.680 --> 00:02:15.759
their whiteboards are apparently of very different

00:02:15.759 --> 00:02:17.900
sizes. Wildly different. And they're for different

00:02:17.900 --> 00:02:22.289
jobs. So first up. ChatGPT 5.2 Codex from OpenAI.

00:02:22.530 --> 00:02:24.430
Right, so this one's the smallest memory of the

00:02:24.430 --> 00:02:27.710
majors. In the UI, you're limited to about

00:02:27.710 --> 00:02:31.169
60,000 tokens. Which is around 45,000 words. Yeah,

00:02:31.229 --> 00:02:33.409
and it's built for speed. It's your sports car.

00:02:33.530 --> 00:02:36.270
Quick code, fast answers. Not for analyzing a

00:02:36.270 --> 00:02:38.550
huge novel, then. No, you'll hit that wall almost

00:02:38.550 --> 00:02:40.189
immediately. Okay, then in the middle, we've

00:02:40.189 --> 00:02:43.310
got Claude Opus 4.5 from Anthropic. This is

00:02:43.310 --> 00:02:45.469
the middle ground. It's got 200,000 tokens.

00:02:46.009 --> 00:02:48.650
About 150,000 words. So it's better for deeper

00:02:48.650 --> 00:02:51.530
reasoning. Much better. It's a bit slower, but

00:02:51.530 --> 00:02:54.750
it's great for deep analysis and writing. It can

00:02:54.750 --> 00:02:56.750
hold a thought for much longer. And then there's

00:02:56.750 --> 00:03:00.030
the beast. Gemini 3 Pro from Google. The freight

00:03:00.030 --> 00:03:03.580
train. It has a one million token window. Whoa.

00:03:03.840 --> 00:03:07.639
That's about 750,000 words. One million tokens.

00:03:07.699 --> 00:03:09.580
Just imagine that. You could feed it a two-hour

00:03:09.580 --> 00:03:12.680
video, 10 PDFs, and ask 50 questions without

00:03:12.680 --> 00:03:15.039
it even breaking a sweat. That is Gemini 3 Pro.

00:03:15.180 --> 00:03:18.219
It's just a different scale. So, wait. Does having

00:03:18.219 --> 00:03:21.860
a bigger tank mean it drives better? Why do we

00:03:21.860 --> 00:03:23.939
need memory hacks if we can just use Gemini?

00:03:24.439 --> 00:03:27.300
Surprisingly, no. Performance actually degrades

00:03:27.300 --> 00:03:30.139
as that giant window fills up. So bigger isn't

00:03:30.139 --> 00:03:32.300
always better. Not at all. There's a very specific

00:03:32.300 --> 00:03:34.500
performance curve. Okay, tell me about it. So

00:03:34.500 --> 00:03:38.199
from zero to about 30% capacity, you get beautiful

00:03:38.199 --> 00:03:41.219
performance. The honeymoon phase. Right. 30 to

00:03:41.219 --> 00:03:43.939
50% is kind of the peak zone. But here's the

00:03:43.939 --> 00:03:48.020
danger zone. Above 60% capacity, quality starts

00:03:48.020 --> 00:03:50.979
to slide. 60%. And once you get to that 70%

00:03:50.979 --> 00:03:53.879
to 100% range, the results get really unpredictable.

00:03:54.360 --> 00:03:56.659
Hallucinations go way up. It's like computer

00:03:56.659 --> 00:03:58.840
RAM. You can have a ton of it, but if you're

00:03:58.840 --> 00:04:01.819
running at 90% capacity, everything slows down

00:04:01.819 --> 00:04:04.120
and gets glitchy. It's the perfect analogy. The

00:04:04.120 --> 00:04:06.219
AI is trying to juggle too much at once. So the

00:04:06.219 --> 00:04:08.000
goal isn't just fitting the data. It's about

00:04:08.000 --> 00:04:11.419
active management. Exactly. Unused memory actually

00:04:11.419 --> 00:04:14.000
creates better focus than an overloaded memory.

00:04:14.520 --> 00:04:17.000
Unused memory is clarity. I like it. It's the

00:04:17.000 --> 00:04:20.079
mantra for this whole process. Okay, so if we're

00:04:20.079 --> 00:04:22.420
trying to stay out of that danger zone, we need

00:04:22.420 --> 00:04:26.160
to spot the warning signs. The guide lists four

00:04:26.160 --> 00:04:28.939
big red flags for memory loss. Yeah, these are

00:04:28.939 --> 00:04:31.120
the signs that you need to stop immediately.

00:04:31.600 --> 00:04:34.639
Number one is instructions disappear. The classic.

00:04:35.149 --> 00:04:37.310
You ask for bullet points in message one. And

00:04:37.310 --> 00:04:40.189
by message 20, you get a 600-word essay. The

00:04:40.189 --> 00:04:42.350
rule just fell off the board. It literally doesn't

00:04:42.350 --> 00:04:44.189
know you asked for bullets anymore. Okay, number

00:04:44.189 --> 00:04:47.370
two, contradictions. Right. Message five said,

00:04:47.550 --> 00:04:50.029
let's use a conservative investment strategy.

00:04:50.170 --> 00:04:53.750
By message 20, it's saying, go all in on crypto.

00:04:53.889 --> 00:04:56.069
It's not being creative. It just forgot the persona.

00:04:56.389 --> 00:04:59.949
Yep. Number three, this one feels really dangerous.

00:05:00.899 --> 00:05:03.800
Facts get invented. It's the most crucial warning,

00:05:03.899 --> 00:05:06.779
I think. Yeah. If the budget was $9,000 in message

00:05:06.779 --> 00:05:09.720
two, the AI might just make up a number like

00:05:09.720 --> 00:05:12.399
6,500 in message 30. Where does it get that

00:05:12.399 --> 00:05:15.199
from? Thin air. It knows a number should be there,

00:05:15.300 --> 00:05:17.800
but the original fact was erased. So it just

00:05:17.800 --> 00:05:20.480
hallucinates a plausible substitute. That's terrifying

00:05:20.480 --> 00:05:23.420
for actual work. And number four is a specific

00:05:23.420 --> 00:05:26.000
tell for Claude, right? Yeah. Claude kind of

00:05:26.000 --> 00:05:28.279
admits when it's struggling. If you see organizing

00:05:28.279 --> 00:05:32.060
thoughts or compacting conversation, that's

00:05:32.060 --> 00:05:33.860
Claude telling you its memory is full and it's

00:05:33.860 --> 00:05:35.959
trying to save itself. So is there a way to see

00:05:35.959 --> 00:05:38.800
the fuel gauge before it crashes? Or are we just

00:05:38.800 --> 00:05:40.980
guessing? You're not just guessing. There are

00:05:40.980 --> 00:05:43.519
tools like Google's AI Studio Playground that

00:05:43.519 --> 00:05:46.420
show you the exact token counts. So you can watch

00:05:46.420 --> 00:05:48.620
the meter go up. Exactly. It's worth getting

00:05:48.620 --> 00:05:50.259
familiar with those, even if you're not a developer.

00:05:50.560 --> 00:05:52.879
Okay, let's talk tactics. The main one from the

00:05:52.879 --> 00:05:55.480
guide is called the handoff process. This is

00:05:55.480 --> 00:05:58.740
the most effective fix for 2026. It takes a little

00:05:58.740 --> 00:06:01.439
discipline, but it works every time. So what's

00:06:01.439 --> 00:06:03.540
the trigger? When do you do it? When you hit

00:06:03.540 --> 00:06:07.639
about 60% capacity. For most chats, that's usually

00:06:07.639 --> 00:06:10.519
around 15 or 20 messages. You just stop. Stop.

00:06:10.519 --> 00:06:13.120
Don't ask the next question. Then what? Step

00:06:13.120 --> 00:06:16.459
one. You ask for a very specific kind of summary.

00:06:16.819 --> 00:06:20.259
You ask four questions. Okay. One, what we covered.

00:06:20.399 --> 00:06:23.319
Two, decisions we made. Three, current to-do

00:06:23.319 --> 00:06:26.550
lists. And four, the next immediate task. You're

00:06:26.550 --> 00:06:28.329
basically packing up the essential information.

00:06:28.769 --> 00:06:32.670
Exactly. Then step two, open a brand new chat,

00:06:32.850 --> 00:06:37.970
a clean whiteboard, 0% capacity. And step three,

00:06:37.990 --> 00:06:40.790
you just paste that summary in. You paste it

00:06:40.790 --> 00:06:42.709
in with a little prompt, like here's the context

00:06:42.709 --> 00:06:44.970
from our last session. Let's continue. Why is

00:06:44.970 --> 00:06:47.089
this better than just continuing the old chat

00:06:47.089 --> 00:06:49.610
though? Because it eliminates all the back and

00:06:49.610 --> 00:06:52.389
forth fluff that clogs the AI's reasoning. So

00:06:52.389 --> 00:06:54.370
you're getting rid of the can you tweak that,

00:06:54.430 --> 00:06:56.870
how about this kind of messages. Yes, you get

00:06:56.870 --> 00:06:58.949
a clean whiteboard, but you keep the essential

00:06:58.949 --> 00:07:01.329
intelligence of the project. It's a fresh start

00:07:01.329 --> 00:07:03.670
without starting from scratch. I want to talk

00:07:03.670 --> 00:07:05.670
about files next because that's where I think

00:07:05.670 --> 00:07:07.589
a lot of token waste happens. But first, we'll

00:07:07.589 --> 00:07:10.470
take a quick break. All right, we're back diving

00:07:10.470 --> 00:07:13.250
into AI memory. We just covered the handoff process.

00:07:13.959 --> 00:07:17.319
Now, memory hygiene for file uploads. The guide

00:07:17.319 --> 00:07:19.639
says not all files cost the same amount of memory.

00:07:19.819 --> 00:07:22.139
Not even close. People think an upload is an

00:07:22.139 --> 00:07:24.600
upload, but some files are insanely expensive

00:07:24.600 --> 00:07:26.819
in terms of tokens. So what's cheap? What's in

00:07:26.819 --> 00:07:31.339
the green zone? Plain text files. So .txt, CSVs,

00:07:31.360 --> 00:07:35.560
Markdown, just raw data with no formatting. Okay,

00:07:35.620 --> 00:07:37.199
pretty straightforward. What about the stuff

00:07:37.199 --> 00:07:40.259
we all use, like PDFs and Word docs? That's the

00:07:40.259 --> 00:07:42.660
yellow zone. Moderate cost. They have all this

00:07:42.660 --> 00:07:45.439
hidden data headers, footers, formatting that

00:07:45.439 --> 00:07:47.500
the AI has to parse, so it costs a bit more.

00:07:47.699 --> 00:07:49.860
And then it gets expensive. The orange zone.

00:07:50.120 --> 00:07:52.339
This is where you find images and complex Excel

00:07:52.339 --> 00:07:55.180
files. A spreadsheet with charts and colors.

00:07:55.939 --> 00:07:58.420
The AI has to visually interpret all of that,

00:07:58.459 --> 00:08:00.639
and it just burns tokens. So what's in the red

00:08:00.639 --> 00:08:04.110
zone? Very expensive. Video and audio. A five

00:08:04.110 --> 00:08:06.250
-minute video can cost more memory than a 50

00:08:06.250 --> 00:08:09.129
-page book because the AI has to transcribe the

00:08:09.129 --> 00:08:11.589
audio and analyze the visual frames over time.

00:08:11.790 --> 00:08:14.629
Wow. So a practical tip would be, if I have a

00:08:14.629 --> 00:08:16.470
big spreadsheet, don't upload the whole thing.

00:08:16.569 --> 00:08:19.569
Exactly. If you only need one tab, export that

00:08:19.569 --> 00:08:22.629
single tab as a CSV. And upload that instead.

00:08:22.870 --> 00:08:24.769
So we should just chop things up before we upload

00:08:24.769 --> 00:08:27.269
them. Yes, you have to pre-process. Trim videos,

00:08:27.629 --> 00:08:30.329
extract text. Don't make the AI do that work.

00:08:30.490 --> 00:08:32.590
That makes perfect sense. Now, the guide talks

00:08:32.590 --> 00:08:35.470
about building intuition because you can't be

00:08:35.470 --> 00:08:37.909
counting tokens all day. Right. It's not practical.

00:08:38.429 --> 00:08:41.950
And every task is different. Coding costs more

00:08:41.950 --> 00:08:45.549
tokens than, say, writing a poem. So how do you

00:08:45.549 --> 00:08:48.470
build that intuition? It suggests keeping a simple

00:08:48.470 --> 00:08:51.820
log just for a week. A log. What do you track?

00:08:52.019 --> 00:08:55.179
Four things. The task you were doing, the files

00:08:55.179 --> 00:08:57.740
you used, the message count when quality started

00:08:57.740 --> 00:09:00.139
to drop, and your rating of the final output.

00:09:00.340 --> 00:09:03.240
That's it? That's it. After 10 or so projects,

00:09:03.460 --> 00:09:05.960
you'll start to just feel it. You'll know, okay,

00:09:06.080 --> 00:09:08.159
this task is probably good for about 15 messages

00:09:08.159 --> 00:09:10.679
before I need to do a handoff. You start to anticipate

00:09:10.679 --> 00:09:12.559
the cliff instead of falling off it? You stop

00:09:12.559 --> 00:09:15.480
flying blind. Okay, there's another tactic in

00:09:15.480 --> 00:09:17.580
here for when you really don't want to start

00:09:17.580 --> 00:09:20.179
a new chat. It's called in-thread summaries.

00:09:20.500 --> 00:09:23.100
This is the light version of the handoff. A quick

00:09:23.100 --> 00:09:25.779
fix. How does that work? Every five to ten messages,

00:09:25.879 --> 00:09:28.919
you just pause and say, stop. Summarize our current

00:09:28.919 --> 00:09:31.700
goals, decisions, and status. But wait, wouldn't

00:09:31.700 --> 00:09:34.120
that summary just use up more space on the already

00:09:34.120 --> 00:09:36.820
full whiteboard? It does, but remember, the oldest

00:09:36.820 --> 00:09:39.679
stuff gets erased first. By creating that summary

00:09:39.679 --> 00:09:42.769
now, you're putting the most important info at

00:09:42.769 --> 00:09:44.690
the bottom of the whiteboard, the freshest part

00:09:44.690 --> 00:09:47.149
of the memory. I see. So you're forcing it to

00:09:47.149 --> 00:09:49.330
refocus on the important stuff and keeping it

00:09:49.330 --> 00:09:51.850
safe from the eraser for a little longer. Exactly.

00:09:51.850 --> 00:09:54.629
It anchors the context. So is this as good as

00:09:54.629 --> 00:09:56.649
the handoff? No, it's a lighter alternative.

00:09:57.269 --> 00:09:59.230
Handoff is still the best for heavy-duty tasks.

00:09:59.610 --> 00:10:02.309
Let's get to the mistakes to avoid. As I was

00:10:02.309 --> 00:10:04.889
reading this section, I felt very seen. I think

00:10:04.889 --> 00:10:07.330
we're all guilty of these. The first one is just

00:10:07.330 --> 00:10:09.980
one more message. The classic trap. You're at

00:10:09.980 --> 00:10:12.779
80% capacity, you know you should restart, but

00:10:12.779 --> 00:10:14.879
you think, I'll just ask one more quick question.

00:10:14.960 --> 00:10:17.419
I do this constantly. I still wrestle with prompt

00:10:17.419 --> 00:10:19.720
drift myself. I always think I can squeeze in

00:10:19.720 --> 00:10:22.759
one last query. It's the sunk cost fallacy. You

00:10:22.759 --> 00:10:25.639
feel invested. But that one question pushes you

00:10:25.639 --> 00:10:29.799
to 95% and the answer you get is garbage and

00:10:29.799 --> 00:10:31.840
sends you down the wrong path. You waste more

00:10:31.840 --> 00:10:33.720
time than if you'd just done the handoff. Way

00:10:33.720 --> 00:10:36.700
more. Okay, next mistake. Uploading everything

00:10:36.700 --> 00:10:40.240
at once. The data dump? You think more context

00:10:40.240 --> 00:10:42.820
is better, so you upload 100 pages of background

00:10:42.820 --> 00:10:45.340
documents. But you only need an answer from page

00:10:45.340 --> 00:10:47.720
10. And you've just filled your entire whiteboard

00:10:47.720 --> 00:10:49.840
with noise before you've even asked your first

00:10:49.840 --> 00:10:52.159
question. You've paralyzed it. And the last one

00:10:52.159 --> 00:10:54.379
is just ignoring the warning signs. It's about

00:10:54.379 --> 00:10:56.600
discipline. When you see instructions start to

00:10:56.600 --> 00:10:58.659
fade, you have to pause. It's not going to get

00:10:58.659 --> 00:11:01.279
better on its own. So what's the ultimate role

00:11:01.279 --> 00:11:04.500
shift here? You stop being a user and you start

00:11:04.500 --> 00:11:07.490
being a context manager. You're actively managing

00:11:07.490 --> 00:11:09.429
a scarce resource. That's a great way to put

00:11:09.429 --> 00:11:12.350
it. It puts the control back in your court. Okay,

00:11:12.409 --> 00:11:14.470
let's wrap this up with the big ideas. For me,

00:11:14.490 --> 00:11:17.659
the big takeaway is... Your AI isn't broken.

00:11:17.879 --> 00:11:20.360
Its whiteboard is just full. That's the whole

00:11:20.360 --> 00:11:23.440
thing in a nutshell. And success in 2026 isn't

00:11:23.440 --> 00:11:26.360
about one long conversation. It's about a series

00:11:26.360 --> 00:11:28.759
of short, focused sprints connected by these

00:11:28.759 --> 00:11:31.679
summarized handoffs. Right. Keep your capacity

00:11:31.679 --> 00:11:34.539
under 60 percent. Be deliberate with your file

00:11:34.539 --> 00:11:37.059
uploads and treat that memory like the scarce

00:11:37.059 --> 00:11:40.899
resource it is. And remember, unused memory is

00:11:40.899 --> 00:11:44.480
clarity. A clean whiteboard is always the smartest

00:11:44.480 --> 00:11:47.659
AI. Unused memory is clarity. That's a great

00:11:47.659 --> 00:11:49.620
line. That's a good rule for life, too. So here's

00:11:49.620 --> 00:11:51.779
a final thought to mull over. We've spent this

00:11:51.779 --> 00:11:54.480
whole deep dive talking about how to manage these

00:11:54.480 --> 00:11:58.600
memory limits. But what happens when the limits

00:11:58.600 --> 00:12:01.940
basically disappear? When we have a hundred million

00:12:01.940 --> 00:12:05.120
token window. Right. And we will. Will we stop

00:12:05.120 --> 00:12:07.639
needing to curate and summarize? And if the AI

00:12:07.639 --> 00:12:09.460
can hold everything perfectly, does that mean

00:12:09.460 --> 00:12:12.059
we stop trying to hold it in our own heads? Does

00:12:12.059 --> 00:12:14.820
the friction of context management actually force

00:12:14.820 --> 00:12:17.100
us to learn? That's the question. If you don't

00:12:17.100 --> 00:12:19.320
have to pack the bags yourself, do you even know

00:12:19.320 --> 00:12:21.159
what you're carrying? Something to think about

00:12:21.159 --> 00:12:24.279
for 2027. For now, next time you see ChatGPT

00:12:24.279 --> 00:12:27.259
ignore a rule, don't get mad. Check the tokens.

00:12:27.559 --> 00:12:30.009
And go clear that whiteboard. Thanks for listening

00:12:30.009 --> 00:12:32.070
to The Deep Dive. See you in the next sprint.
