WEBVTT

00:00:00.000 --> 00:00:02.560
You sit down to work, you ask a few questions,

00:00:03.040 --> 00:00:05.740
and suddenly the limit is reached. Where did

00:00:05.740 --> 00:00:09.800
the tokens go? [inaudible] Welcome to this

00:00:09.800 --> 00:00:12.080
deep dive. Thank you. It is great to be here.

00:00:12.279 --> 00:00:14.400
Today we are going on a bit of a mission together.

00:00:14.900 --> 00:00:17.640
We are exploring a fascinating breakdown of the

00:00:17.640 --> 00:00:21.000
12 hidden reasons why your Claude Cowork usage

00:00:21.000 --> 00:00:23.399
limit just vanishes. Yeah, and the most interesting

00:00:23.399 --> 00:00:25.920
part is that it's almost never because you are

00:00:25.920 --> 00:00:28.800
doing too much actual work. I think that is a

00:00:28.800 --> 00:00:31.420
huge relief for a lot of you listening, because

00:00:31.420 --> 00:00:33.579
I have to make a bit of a vulnerable admission

00:00:33.579 --> 00:00:36.460
here. Right. Yeah. I still wrestle with prompt

00:00:36.460 --> 00:00:39.340
drift myself, just letting context pile up without

00:00:39.340 --> 00:00:41.320
really thinking about it. Right. It is so easy

00:00:41.320 --> 00:00:43.060
to do. We just assume the machine can handle

00:00:43.060 --> 00:00:45.439
everything effortlessly. Exactly. We assume it

00:00:45.439 --> 00:00:47.740
only charges us for the exact question we just

00:00:47.740 --> 00:00:50.789
asked. But the source text we are analyzing today

00:00:50.789 --> 00:00:53.570
paints a very different picture. It really does.

00:00:53.789 --> 00:00:56.030
Fixing this issue isn't about working less or

00:00:56.030 --> 00:00:58.929
cutting back on your actual output. It is about

00:00:58.929 --> 00:01:01.829
cleaning up a messy, bloated setup. So we are

00:01:01.829 --> 00:01:04.170
basically going to travel from the simplest quick

00:01:04.170 --> 00:01:07.810
fixes all the way to some really smart automation

00:01:07.810 --> 00:01:10.909
habits. Exactly. But before we fix the limit,

00:01:11.370 --> 00:01:13.209
we really have to understand where the credits

00:01:13.209 --> 00:01:15.469
are actually leaking in the background. Right.

00:01:15.640 --> 00:01:18.040
Because clearly it is not just the words I am

00:01:18.040 --> 00:01:20.340
typing into the little prompt box. Not at all.

00:01:20.599 --> 00:01:22.780
You need to think of every single session as

00:01:22.780 --> 00:01:25.480
having two distinct bills you have to pay. Okay,

00:01:25.599 --> 00:01:27.760
two bills. What is the first one? The first bill

00:01:27.760 --> 00:01:31.079
is the actual work. This is the stuff you see

00:01:31.079 --> 00:01:33.359
happening on the screen. Like the text that generates

00:01:33.359 --> 00:01:36.459
for you. Yeah, the text generation, the web searching,

00:01:36.810 --> 00:01:40.730
running Python code. That is the explicit work.

00:01:40.890 --> 00:01:43.010
That makes total sense. I am asking for a service.

00:01:43.090 --> 00:01:45.489
I pay for that service. Right. But the second

00:01:45.489 --> 00:01:48.109
bill is where people get caught. That is the

00:01:48.109 --> 00:01:49.870
automatic background loading. And this happens

00:01:49.870 --> 00:01:51.969
completely out of sight. Entirely. It happens

00:01:51.969 --> 00:01:54.310
before you even finish typing your first message.

00:01:54.709 --> 00:01:57.430
Your setup files load up. Your active tools load.

00:01:57.640 --> 00:01:59.719
And the source material points out that Cowork

00:01:59.719 --> 00:02:02.599
specifically drains limits much faster than regular

00:02:02.599 --> 00:02:06.260
chat, right? It does, yes. Because Cowork combines

00:02:06.260 --> 00:02:09.979
file reading, searching, and coding all into

00:02:09.979 --> 00:02:12.599
one unified environment. It is just a heavier

00:02:12.599 --> 00:02:14.819
lift overall. Because every single one of those

00:02:14.819 --> 00:02:17.879
actions requires the underlying model to process

00:02:17.879 --> 00:02:20.099
tokens. Exactly. We should probably pause and

00:02:20.099 --> 00:02:22.900
just define that AI jargon quickly for everyone

00:02:22.900 --> 00:02:26.879
listening. What exactly is a token? Sure. Tokens

00:02:26.879 --> 00:02:29.360
are just pieces of words the AI reads and writes.

00:02:29.560 --> 00:02:31.900
So it is the basic currency of the whole system.

00:02:32.020 --> 00:02:34.439
Right. It is the raw fuel. So if I have a bloated

00:02:34.439 --> 00:02:37.740
setup, the AI is reading thousands of these tokens

00:02:37.740 --> 00:02:40.280
before doing any real work. So leaving unused

00:02:40.280 --> 00:02:42.479
files in your setup is like leaving your car

00:02:42.479 --> 00:02:44.560
engine running while parked. That is a great

00:02:44.560 --> 00:02:47.460
analogy. You are burning expensive fuel to go

00:02:47.460 --> 00:02:49.939
absolutely nowhere. So background tasks can actually

00:02:49.939 --> 00:02:52.139
cost more than the prompt. Exactly. Invisible

00:02:52.139 --> 00:02:54.860
background loading quietly eats your daily limit.

00:02:55.400 --> 00:02:57.060
Wow. Okay, so now that we know how these credits

00:02:57.060 --> 00:02:58.919
are being spent, let us look at the fastest fix

00:02:58.919 --> 00:03:01.319
available. Yeah, this literally takes five seconds.

00:03:01.759 --> 00:03:04.240
It is all about choosing the right engine for

00:03:04.240 --> 00:03:07.210
the task. Model matchmaking. I am definitely

00:03:07.210 --> 00:03:09.990
guilty of messing this up. I tend to just default

00:03:09.990 --> 00:03:14.050
to Opus 4.7 for absolutely everything. You and

00:03:14.050 --> 00:03:16.090
almost everyone else. It is a safety blanket.

00:03:16.229 --> 00:03:18.789
It totally is. I just want the best possible

00:03:18.789 --> 00:03:21.409
answer, even if I am just drafting a basic email.

00:03:21.770 --> 00:03:24.090
But based on this guide, I am kind of just throwing

00:03:24.090 --> 00:03:27.150
credits away. You really are. Opus 4.7 is an

00:03:27.150 --> 00:03:30.310
incredible model, but it is built for complex,

00:03:30.750 --> 00:03:33.050
multi-step reasoning. So using it for simple

00:03:33.050 --> 00:03:35.590
stuff is overkill. Totally. It is like using

00:03:35.500 --> 00:03:38.199
a sledgehammer for a thumbtack. Right. It gets

00:03:38.199 --> 00:03:40.379
the job done, but it is exhausting and wasteful.

00:03:40.400 --> 00:03:42.340
So what should we be using instead? You have

00:03:42.340 --> 00:03:45.039
three main options. Haiku 4.5 is the lightest.

00:03:45.060 --> 00:03:47.319
You want to use that for quick formatting, simple

00:03:47.319 --> 00:03:49.719
summaries, or basic emails. And then there is

00:03:49.719 --> 00:03:52.479
Sonnet? Yeah, Sonnet 4.6. That is your daily

00:03:52.479 --> 00:03:54.919
driver. It is the best all-around model for

00:03:54.919 --> 00:03:57.659
regular work and light research. You only bring

00:03:57.659 --> 00:04:00.560
in Opus 4.7 for the heavy logical lifting. Okay,

00:04:00.560 --> 00:04:02.599
so we match the model to the task, but there

00:04:02.599 --> 00:04:04.460
is also a massive difference between what we

00:04:04.460 --> 00:04:06.979
send the AI and what it sends back, right? A

00:04:06.979 --> 00:04:09.300
huge difference. Output tokens are significantly

00:04:09.300 --> 00:04:11.780
more expensive than input tokens. Let us look

00:04:11.780 --> 00:04:13.979
at the specific example from the source regarding

00:04:13.979 --> 00:04:17.100
the Sonnet 4 API. Right. So the API pricing shows

00:04:17.100 --> 00:04:19.959
that input tokens cost about $3 per million.

00:04:20.459 --> 00:04:23.699
OK, $3. But the output tokens, the words the

00:04:23.699 --> 00:04:27.939
AI generates for you, those cost $15 per million.

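NOTE Editor's sketch, not part of the episode audio.
A minimal illustration of the input versus output price gap, using the two rates
quoted above ($3 and $15 per million tokens). The token counts are hypothetical,
and in-app usage limits are not billed in dollars; this only shows the ratio.
```python
# Rates quoted in the episode for the Sonnet 4 API (USD per million tokens).
INPUT_USD_PER_MTOK = 3.00
OUTPUT_USD_PER_MTOK = 15.00
def request_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimated API cost in USD for one request at the rates above."""
    total = input_tokens * INPUT_USD_PER_MTOK + output_tokens * OUTPUT_USD_PER_MTOK
    return total / 1_000_000
# Hypothetical counts: the same 1,000-token prompt with a long or a brief answer.
long_answer = request_cost_usd(1_000, 2_000)
brief_answer = request_cost_usd(1_000, 200)
print(f"long: ${long_answer:.4f}  brief: ${brief_answer:.4f}")
```
Under these assumed counts, most of the cost of the long answer comes from output tokens.
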
00:04:28.100 --> 00:04:31.199
Whoa. That is a massive jump. It is five times

00:04:31.199 --> 00:04:34.699
more expensive. Why do long, detailed answers

00:04:34.699 --> 00:04:38.019
drain my credits so fast? Because output tokens

00:04:38.019 --> 00:04:41.279
cost five times more than input tokens. So the

00:04:41.279 --> 00:04:43.920
natural instinct of the AI to give these long,

00:04:44.160 --> 00:04:46.339
beautifully structured essays is actually hurting

00:04:46.339 --> 00:04:49.360
my daily limit. Exactly. It wants to be helpful,

00:04:49.779 --> 00:04:51.800
but thoroughness is incredibly expensive. So

00:04:51.800 --> 00:04:54.019
how do we fix that? It is brilliantly simple.

00:04:54.170 --> 00:04:56.230
Just tell Claude to be brief. Literally just

00:04:56.230 --> 00:04:59.050
add a constraint to the prompt. Yes. Just keep

00:04:59.050 --> 00:05:01.629
it under five sentences. That single instruction

00:05:01.629 --> 00:05:03.850
saves you a massive amount of output compute.

00:05:04.050 --> 00:05:06.269
That makes perfect sense. We trim the outputs,

00:05:06.610 --> 00:05:08.709
but earlier we talked about the permanent instructions

00:05:08.709 --> 00:05:11.069
we forced the AI to carry. Right, the background

00:05:11.069 --> 00:05:14.230
files. The biggest offender here is the CLAUDE.md

00:05:14.230 --> 00:05:17.680
file. This is a file that sits in your project

00:05:17.680 --> 00:05:20.620
and tells the AI how to behave, right? Exactly.

00:05:20.759 --> 00:05:23.720
It holds your custom instructions. The problem

00:05:23.720 --> 00:05:27.839
is that this file loads every single time you

00:05:27.839 --> 00:05:29.620
send a new message. Even for a quick follow-up

00:05:29.620 --> 00:05:32.620
question. Every single time. If your file is

00:05:32.620 --> 00:05:35.579
2,000 words long, the AI has to read those

00:05:35.579 --> 00:05:37.740
2,000 words before it even looks at your new prompt.

00:05:38.060 --> 00:05:41.120
That sounds incredibly wasteful. It is. The rule

00:05:41.120 --> 00:05:43.920
from the breakdown is very clear. Keep that file

00:05:43.920 --> 00:05:47.040
under 200 lines. Okay, under 200 lines. What

00:05:47.040 --> 00:05:48.959
should actually go in there? Only the universal

00:05:48.959 --> 00:05:52.019
stuff. Your core business identity, your general

00:05:52.019 --> 00:05:55.259
tone of voice, maybe a few absolute non-negotiable

00:05:55.259 --> 00:05:57.879
rules. But what if I have a massive checklist

00:05:57.879 --> 00:06:01.819
for writing blog posts or a complex translation

00:06:01.819 --> 00:06:04.220
workflow? You do not put those in the main file.

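NOTE Editor's sketch, not part of the episode audio.
One way to check the 200-line guideline mentioned above. The file path is a
hypothetical location and the helper names are made up for illustration.
```python
from pathlib import Path
MAX_LINES = 200  # the limit suggested in the episode
def count_lines(text: str) -> int:
    """Number of lines in the instruction file's text."""
    return len(text.splitlines())
def is_within_limit(text: str, max_lines: int = MAX_LINES) -> bool:
    """True if the text has at most max_lines lines."""
    return count_lines(text) <= max_lines
if __name__ == "__main__":
    path = Path("CLAUDE.md")  # hypothetical location; adjust to your project
    if path.exists():
        n = count_lines(path.read_text(encoding="utf-8"))
        status = "ok" if n <= MAX_LINES else "over the limit"
        print(f"{path}: {n} lines ({status})")
```
Line count is only a rough proxy; what is actually charged is tokens, not lines.
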
00:06:04.439 --> 00:06:08.040
You move heavy, specific instructions into skills.

00:06:08.279 --> 00:06:11.000
Skills. Okay, how do those differ? Skills operate

00:06:11.000 --> 00:06:12.899
differently. They do not load globally. They

00:06:12.899 --> 00:06:15.120
only load strictly on demand. Oh, I see. So instead

00:06:15.120 --> 00:06:17.000
of carrying your entire high school locker in

00:06:17.000 --> 00:06:19.100
your backpack, using skills is like stacking

00:06:19.100 --> 00:06:21.220
Lego blocks of data only when you need them.

00:06:21.500 --> 00:06:24.100
That is a perfect way to visualize it. If you

00:06:24.100 --> 00:06:26.240
ask it to translate a document, it reaches out,

00:06:26.459 --> 00:06:29.120
grabs the translation skill block, and uses it

00:06:29.120 --> 00:06:32.589
just for that task. So the MD file is always

00:06:32.589 --> 00:06:35.949
on, but skills are on demand. Right. Skills only

00:06:35.949 --> 00:06:38.370
load when your specific task actually needs them.

00:06:38.569 --> 00:06:42.550
That clears up so much unnecessary weight. [Sponsor break]

00:06:43.649 --> 00:06:47.089
Okay, we are back. So we have fixed the permanent

00:06:47.089 --> 00:06:50.470
setup files. But what about the actual workspace?

00:06:51.230 --> 00:06:53.589
Right. How we structure our daily conversations

00:06:53.589 --> 00:06:55.850
is the next big trap. And the source guide leans

00:06:55.850 --> 00:06:59.050
heavily into using projects for this. Yes. Projects

00:06:59.050 --> 00:07:01.550
are vital. You have to separate your context.

00:07:01.870 --> 00:07:03.889
You need a space for content, a separate one

00:07:03.889 --> 00:07:06.310
for client work, personal stuff, operations.

00:07:06.509 --> 00:07:08.529
You don't want to mix YouTube scripts with grocery

00:07:08.529 --> 00:07:11.649
lists. Exactly. But even if you use projects

00:07:11.649 --> 00:07:14.370
perfectly, the individual chats themselves can

00:07:14.370 --> 00:07:16.870
drain your account. Long chats are incredibly

00:07:16.870 --> 00:07:18.709
expensive. Because we treat it like a texting

00:07:18.709 --> 00:07:20.589
thread with a friend, we just keep replying in

00:07:20.589 --> 00:07:23.089
the same window all day. And that is a huge mistake.

00:07:23.319 --> 00:07:26.000
The AI does not remember the chat like a human

00:07:26.000 --> 00:07:28.899
does. Right. Claude has to reread the entire

00:07:28.899 --> 00:07:31.759
history of the chat with every single new message

00:07:31.759 --> 00:07:34.920
you send. Wait, the entire history from the very

00:07:34.920 --> 00:07:38.220
first hello? Yes. A 20-message session costs

00:07:38.220 --> 00:07:40.279
two to three times more than a 10-message session.

00:07:40.319 --> 00:07:42.279
And if you keep going... A 30-message session

00:07:42.279 --> 00:07:45.540
is four to five times more expensive. One developer

00:07:45.540 --> 00:07:48.019
found that in long threads, most of your tokens

00:07:48.019 --> 00:07:51.970
just go to rereading the past. Whoa! Imagine

00:07:51.970 --> 00:07:54.769
scaling to a billion queries. The amount of wasted

00:07:54.769 --> 00:07:57.350
compute just rereading history is staggering.

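NOTE Editor's sketch, not part of the episode audio.
A toy model of why long threads get expensive: if every new message makes the
model reread the whole history, total tokens read grow faster than the message
count. tokens_per_turn and setup_tokens are illustrative assumptions, so the
multiples printed here will not necessarily match the figures quoted above.
```python
def cumulative_input_tokens(messages: int, tokens_per_turn: int = 300, setup_tokens: int = 0) -> int:
    """Total tokens read over a session when each turn rereads all prior turns."""
    total = 0
    history = 0
    for _ in range(messages):
        history += tokens_per_turn       # the new turn joins the history
        total += setup_tokens + history  # this turn reads setup plus full history
    return total
baseline = cumulative_input_tokens(10)
for n in (10, 20, 30):
    tokens = cumulative_input_tokens(n)
    print(n, tokens, round(tokens / baseline, 1))
```
Starting a fresh session with a short summary shrinks the history term, which is the idea behind the one-task-per-session rule.
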
00:07:57.670 --> 00:07:59.449
It is an architectural quirk of how the models

00:07:59.449 --> 00:08:02.269
work right now. Do older messages in a thread

00:08:02.269 --> 00:08:05.350
keep charging me tokens? Yes. The AI rereads

00:08:05.350 --> 00:08:08.209
the entire chat history every single time. So

00:08:08.209 --> 00:08:10.769
how do we stop the bleeding here? One task per

00:08:10.769 --> 00:08:13.370
session. That is the rule. Once the specific

00:08:13.370 --> 00:08:15.800
task is done, you close the chat. But what if

00:08:15.800 --> 00:08:18.000
the task takes a long time and I need that previous

00:08:18.000 --> 00:08:20.339
context to keep going? Then you ask Claude to

00:08:20.339 --> 00:08:23.100
summarize the chat so far. You copy that short

00:08:23.100 --> 00:08:25.339
summary, open a brand new clean session, and

00:08:25.339 --> 00:08:27.899
paste the summary in. Oh, that is so smart. You

00:08:27.899 --> 00:08:29.939
keep the core knowledge without dragging the

00:08:29.939 --> 00:08:32.440
heavy transcript along with you. Exactly. It

00:08:32.440 --> 00:08:34.919
resets your token cost back to zero while keeping

00:08:34.919 --> 00:08:36.799
the momentum. OK, so we are clearing out the

00:08:36.799 --> 00:08:40.500
active chat history. There is also hidden history

00:08:40.500 --> 00:08:42.740
gathering dust elsewhere in the workspace, right?

00:08:42.960 --> 00:08:46.019
Yes. Connectors and memory files. Let us start

00:08:46.019 --> 00:08:49.039
with connectors. These are the plugins like Canva,

00:08:49.299 --> 00:08:52.659
Gmail, Google Drive. Right. They are super useful,

00:08:53.200 --> 00:08:55.799
but they add significant context weight to your

00:08:55.799 --> 00:08:58.779
profile just by being plugged in. Even if I am

00:08:58.779 --> 00:09:01.039
not actively using them in that session. Even

00:09:01.039 --> 00:09:03.730
if you are not using them. The system has to

00:09:03.730 --> 00:09:06.029
allocate memory just to keep them on standby.

00:09:06.210 --> 00:09:08.970
It is like having ten browser tabs open from

00:09:08.970 --> 00:09:10.710
last month that you are still paying rent on.

00:09:10.850 --> 00:09:13.070
That is exactly what it is. The fix here is an

00:09:13.070 --> 00:09:15.409
audit. Disconnect any integration you haven't

00:09:15.409 --> 00:09:17.669
used in the last two weeks. Cut the dead weight.

00:09:18.549 --> 00:09:20.669
And what about memory files? Memory files sit

00:09:20.669 --> 00:09:22.690
inside a project. They might be past feedback

00:09:22.690 --> 00:09:25.750
you gave or old formatting rules. And they load

00:09:25.750 --> 00:09:28.210
automatically? Yes. They load at the start of

00:09:28.210 --> 00:09:30.769
every conversation in that specific project.

00:09:31.149 --> 00:09:33.370
If they're outdated, they're just expensive noise.

00:09:33.750 --> 00:09:36.269
Are old memory files quietly draining my account

00:09:36.269 --> 00:09:38.370
in the background? Yeah. You are paying for outdated

00:09:38.370 --> 00:09:40.490
noise in every single session. So I just need

00:09:40.490 --> 00:09:42.730
to go in and delete them? Yep. Review and clean

00:09:42.730 --> 00:09:44.409
them out every two weeks. It takes five minutes.

00:09:44.590 --> 00:09:47.509
OK. So we have leaned out the system. We trimmed

00:09:47.509 --> 00:09:51.600
the MD file, used skills, kept chats short, and

00:09:51.600 --> 00:09:54.440
deleted old memory. Your system is now incredibly

00:09:54.440 --> 00:09:58.820
lean, which means we can finally talk about maximizing

00:09:58.820 --> 00:10:02.000
the limits you have left using automation. The

00:10:02.000 --> 00:10:04.539
source text mentions scheduled tasks, like a

00:10:04.539 --> 00:10:07.179
morning email briefing or a weekly report. Right.

00:10:07.480 --> 00:10:09.759
Scheduled tasks are incredibly credit-efficient.

00:10:09.879 --> 00:10:12.539
Why is that? Because they start fresh. They have

00:10:12.539 --> 00:10:15.299
absolutely zero chat history to read. They just

00:10:15.299 --> 00:10:17.799
wake up, do the job, and shut down. But what

00:10:17.799 --> 00:10:20.240
happens if an automated task hits my limit right

00:10:20.240 --> 00:10:23.539
in the middle of a run? It crashes. Which is

00:10:23.539 --> 00:10:25.759
bad if it is an important client report. But

00:10:25.759 --> 00:10:28.080
there is a safety net you can set up. Extra usage,

00:10:28.100 --> 00:10:30.240
right. Yeah. It is basically a pay-as-you-go

00:10:30.240 --> 00:10:32.840
buffer. You put $5 or $10 into the account. So

00:10:32.840 --> 00:10:35.059
if my main limit vanishes, it just dips into

00:10:35.059 --> 00:10:38.259
that $5 to finish the job. Exactly. It ensures

00:10:38.259 --> 00:10:41.419
your automations never abruptly stop. That is

00:10:41.419 --> 00:10:44.830
a great fail-safe. There is one more variable

00:10:44.830 --> 00:10:46.830
the guide mentions, and I found this one really

00:10:46.830 --> 00:10:50.509
surprising. Timing. Timing is huge. Peak hours

00:10:50.509 --> 00:10:53.429
are weekdays from 5 a.m. to 11 a.m. Pacific

00:10:53.429 --> 00:10:55.710
time. Wait, the actual time of day I run a prompt

00:10:55.710 --> 00:10:58.490
changes how the limit feels? Right. And it changes

00:10:58.490 --> 00:11:01.000
how the system handles your requests. During

00:11:01.000 --> 00:11:04.360
peak hours, the system load is much higher. Millions

00:11:04.360 --> 00:11:06.639
of people are logging on to work. So the available

00:11:06.639 --> 00:11:09.159
compute shrinks for everyone. Exactly. The limits

00:11:09.159 --> 00:11:11.240
will feel much tighter during those hours. So

00:11:11.240 --> 00:11:13.620
the fix is just scheduling around the rush hour.

00:11:13.700 --> 00:11:17.019
Yes. Take your heavy automated jobs like massive

00:11:17.019 --> 00:11:20.019
data scraping or weekly reports and schedule

00:11:20.019 --> 00:11:22.539
them to run late at night or on the weekends.

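NOTE Editor's sketch, not part of the episode audio.
A small check for the peak window described above (weekdays, 5 a.m. to 11 a.m.
Pacific). Treating 11 a.m. as the exclusive end is an assumption, and is_peak is
a made-up helper name, not part of any product API.
```python
from datetime import datetime
from zoneinfo import ZoneInfo  # needs system tz data (or the tzdata package)
PACIFIC = ZoneInfo("America/Los_Angeles")
def is_peak(moment: datetime) -> bool:
    """True if a timezone-aware datetime is a weekday, 5 a.m. to 11 a.m. Pacific."""
    local = moment.astimezone(PACIFIC)
    return local.weekday() < 5 and 5 <= local.hour < 11
print(is_peak(datetime.now(tz=PACIFIC)))
```
A scheduler could call a check like this and defer heavy jobs until it returns False.
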
00:11:22.659 --> 00:11:24.620
Why should I schedule my heavy automated tasks

00:11:24.620 --> 00:11:27.340
at night? System load drops, which keeps your

00:11:27.340 --> 00:11:29.750
automated workflows running smoothly. It is such

00:11:29.750 --> 00:11:32.149
a simple adjustment, but it makes so much sense.

00:11:32.409 --> 00:11:35.950
So stepping back from all these technical

00:11:35.950 --> 00:11:38.990
fixes, what is the big idea here? The main takeaway

00:11:38.990 --> 00:11:42.169
is a shift in perspective. Hitting limits isn't

00:11:42.169 --> 00:11:44.110
a sign that you are working too hard or being

00:11:44.110 --> 00:11:46.649
too productive. Right. It is usually a sign that

00:11:46.649 --> 00:11:48.450
your system is carrying too much baggage. We

00:11:48.450 --> 00:11:50.230
have to keep our inputs lean. We have to match

00:11:50.230 --> 00:11:54.179
the models correctly. Haiku, Sonnet, Opus. Exactly.

00:11:54.879 --> 00:11:56.659
And cutting all that dead weight gives you massive

00:11:56.659 --> 00:11:58.940
runway to do the actual creative work you want

00:11:58.940 --> 00:12:01.860
to do. It really makes you think. If AI context

00:12:01.860 --> 00:12:05.000
windows are like human working memory, are we

00:12:05.000 --> 00:12:07.340
treating our tools the way we treat ourselves?

00:12:08.080 --> 00:12:10.279
Overloading them with irrelevant baggage and

00:12:10.279 --> 00:12:12.559
anxieties from the past, instead of giving them

00:12:12.559 --> 00:12:15.759
a clean slate to focus on the task at hand?

00:12:16.320 --> 00:12:18.759
Take five minutes today, log in and look at your

00:12:18.759 --> 00:12:22.840
CLAUDE.md file. Trim it down to just the essentials.

00:12:23.139 --> 00:12:24.700
It really will change how you work. Thank you

00:12:24.700 --> 00:12:26.259
for taking this deep dive with us today.