WEBVTT

00:00:00.000 --> 00:00:02.819
That $200 monthly AI subscription you might be

00:00:02.819 --> 00:00:05.480
using, well, it could actually be a massive problem.

00:00:05.559 --> 00:00:07.900
Oh, absolutely. It's a huge issue. Yeah, because

00:00:07.900 --> 00:00:11.460
it could be costing OpenAI up to $14,000 a month.

00:00:11.640 --> 00:00:15.060
Right, for a single user. It's wild. That is

00:00:15.060 --> 00:00:18.100
the brutal math behind this current AI boom.

00:00:18.869 --> 00:00:21.690
Welcome to the Deep Dive. We have a massive stack

00:00:21.690 --> 00:00:24.789
of research today. And it really paints a fascinating

00:00:24.789 --> 00:00:27.609
picture. Okay, let's unpack this. Our mission

00:00:27.609 --> 00:00:30.489
today is to explore the hidden economics of AI

00:00:30.489 --> 00:00:33.670
subscriptions. We're looking at how massive compute

00:00:33.670 --> 00:00:37.509
costs are driving this crazy explosion of cheap

00:00:37.509 --> 00:00:40.049
open source tools. Yeah. And we're also covering

00:00:40.049 --> 00:00:42.289
how geopolitical bans are radically reshaping

00:00:42.289 --> 00:00:44.670
the whole global AI landscape. It's all connected.

00:00:44.829 --> 00:00:47.310
So let's start with that brutal financial reality.

00:00:47.549 --> 00:00:50.409
It really underpins the entire AI ecosystem right

00:00:50.409 --> 00:00:52.450
now. It does. It's the foundation of everything

00:00:52.450 --> 00:00:54.850
we're seeing. We all love a flat rate subscription.

00:00:55.070 --> 00:00:58.770
You pay 20 bucks or maybe 200 for a pro tier

00:00:58.770 --> 00:01:01.250
and you just use it. Right. It feels like an

00:01:01.250 --> 00:01:04.019
all-you-can-eat buffet. Exactly. Like an

00:01:04.019 --> 00:01:06.079
all-you-can-eat buffet. But if you actually max

00:01:06.079 --> 00:01:10.480
out your theoretical limits on a $200 ChatGPT

00:01:10.480 --> 00:01:14.700
Pro plan... It costs OpenAI roughly $14,000.

00:01:15.000 --> 00:01:18.400
$14,000 in API compute just to serve one user.

00:01:18.680 --> 00:01:21.540
Yeah. And API compute is just the server power

00:01:21.540 --> 00:01:25.239
needed to process AI requests. It's the raw electricity

00:01:25.239 --> 00:01:28.280
and hardware. And Anthropic's top tier is not

00:01:28.280 --> 00:01:30.819
much better. Their system caps out around $8,000

00:01:30.819 --> 00:01:34.359
in token costs. The break-even points are

00:01:34.359 --> 00:01:36.140
actually shocking. Yeah, I was looking at this.

00:01:36.500 --> 00:01:40.319
OpenAI starts losing money at just 11.4% utilization.

00:01:40.760 --> 00:01:44.640
Wow, 11%. Just 11.4%. And Claude hits its

00:01:44.640 --> 00:01:47.500
break-even point at a 20% utilization rate. So if

00:01:47.500 --> 00:01:49.519
you use it even a quarter of the amount you're

00:01:49.519 --> 00:01:51.959
allowed to, they lose money. They're just bleeding

00:01:51.959 --> 00:01:55.579
cash. Yeah. So you have to ask, what drives this

00:01:55.579 --> 00:01:58.260
massive cost? Because a human typing questions

00:01:58.260 --> 00:02:00.799
can only work so fast. Right. A person can't

00:02:00.799 --> 00:02:03.859
type $14,000 worth of prompts. The real culprit

00:02:03.859 --> 00:02:06.459
here is agent workflows. Let's define that quickly.

00:02:06.739 --> 00:02:09.900
Sure. Agent workflows are AI systems that independently

00:02:09.900 --> 00:02:13.300
loop and execute multi-step tasks. I have a

00:02:13.300 --> 00:02:16.219
bit of a vulnerable admission here. I still wrestle

00:02:16.219 --> 00:02:19.629
with my own token usage when trying to set up

00:02:19.629 --> 00:02:22.229
agent loops. Yeah. It gets out of hand so fast.

00:02:22.389 --> 00:02:24.729
Oh, yeah. You are definitely not alone there.

00:02:24.849 --> 00:02:26.569
I mean, I looked at my dashboard last week and

00:02:26.569 --> 00:02:28.469
was just staring at the bill. It's because an

00:02:28.469 --> 00:02:31.889
agent breaks a task down. If you ask it to research

00:02:31.889 --> 00:02:34.969
a company, it doesn't just write a summary. It

00:02:34.969 --> 00:02:38.409
breaks that into 20 subtasks. It searches the

00:02:38.409 --> 00:02:42.050
web. It reads a PDF. It scrapes a database. And

00:02:42.050 --> 00:02:45.289
it feeds every single step back into itself.

00:02:45.569 --> 00:02:47.349
So it's constantly talking to itself. Exactly.

00:02:47.349 --> 00:02:49.810
It loops infinitely until it solves the problem.

00:02:49.909 --> 00:02:52.469
And every single loop costs money. Going back

00:02:52.469 --> 00:02:54.050
to that buffet analogy, it's like an

00:02:54.050 --> 00:02:56.490
all-you-can-eat buffet where one guy just sits down

00:02:56.490 --> 00:02:58.979
and eats the entire kitchen. Where he be? The

00:02:58.979 --> 00:03:01.000
restaurant cannot survive that. No, they can't.

00:03:01.000 --> 00:03:03.180
And we saw this happen. One firm reportedly burned

00:03:03.180 --> 00:03:06.240
through $500 million in a single month on Claude.

00:03:06.319 --> 00:03:08.569
Wait, really? Half a billion dollars? Half a

00:03:08.569 --> 00:03:11.430
billion, yeah. How is that even possible? Don't

00:03:11.430 --> 00:03:13.669
they have kill switches for enterprise contracts?

00:03:13.969 --> 00:03:16.150
Well, you would think so, but they didn't cap

00:03:16.150 --> 00:03:18.949
internal employee access, so everyone was running

00:03:18.949 --> 00:03:22.129
complex multi-step queries. Wow. An employee

00:03:22.129 --> 00:03:25.150
asks for a massive data sort, the agent hits

00:03:25.150 --> 00:03:28.250
a glitch, and it just loops 10,000 times in

00:03:28.250 --> 00:03:30.729
the background. That is just financially terrifying.

00:03:31.050 --> 00:03:33.469
It is. And it proves you don't need a massive

00:03:33.469 --> 00:03:35.629
frontier model for every task. You don't need

00:03:35.629 --> 00:03:38.340
quantum-level AI to summarize a Tuesday meeting.

00:03:38.599 --> 00:03:41.520
How can frontier models survive if the business

00:03:41.520 --> 00:03:44.659
model is this fundamentally broken? They'll likely

00:03:44.659 --> 00:03:47.259
transition from flat consumer rates to strict

00:03:47.259 --> 00:03:49.979
metered enterprise usage contracts soon. Unlimited

00:03:49.979 --> 00:03:52.800
subscriptions die, replaced by strict

00:03:52.800 --> 00:03:56.219
pay-as-you-go corporate meters. So

00:03:56.219 --> 00:03:58.840
because these frontier models are unsustainably

00:03:58.840 --> 00:04:01.500
expensive, the market is fracturing. Oh, completely.

00:04:01.740 --> 00:04:04.180
It is urgently fracturing into cheaper, highly

00:04:04.180 --> 00:04:06.280
efficient alternatives. Let's talk about that

00:04:06.280 --> 00:04:08.780
push for efficiency. Startups are ditching expensive

00:04:08.780 --> 00:04:11.319
APIs, right? Yeah, they're moving to dirt-cheap

00:04:11.319 --> 00:04:13.580
alternatives like DeepSeek. And they're saving

00:04:13.580 --> 00:04:17.560
up to 95% on compute. 95%. That is the difference

00:04:17.560 --> 00:04:19.800
between life and death for a startup. Absolutely.

00:04:20.000 --> 00:04:22.300
And we're seeing specialized models emerge too.

00:04:22.439 --> 00:04:26.540
Look at the new Kimi K2.7 code. Right. The coding

00:04:26.540 --> 00:04:30.040
model. Yeah. It beats Opus 4.8 on tool use,

00:04:30.259 --> 00:04:34.139
and it uses 30% fewer reasoning tokens. And

00:04:34.139 --> 00:04:36.120
reasoning tokens are just tokens spent thinking

00:04:36.120 --> 00:04:38.839
before typing an answer. Exactly. Less thinking

00:04:38.839 --> 00:04:42.079
time means lower cost. Plus, Kimi can be

00:04:42.079 --> 00:04:44.480
self-hosted entirely for free. So no cloud fees at

00:04:44.480 --> 00:04:46.860
all. None. And the big players are scrambling

00:04:46.860 --> 00:04:50.860
to adapt. OpenAI just launched a $150 million

00:04:50.860 --> 00:04:54.120
partner network. To turn AI plans into actual

00:04:54.120 --> 00:04:57.240
workflows. Right. And Anthropic is testing something

00:04:57.240 --> 00:05:00.699
called Conway. I saw this. Whoa. Imagine scaling

00:05:00.699 --> 00:05:03.300
to an always-on agent environment like Conway.

00:05:03.399 --> 00:05:06.060
The possibilities are staggering. It really is.

00:05:06.160 --> 00:05:08.459
Conway is a standalone cloud agent environment.

00:05:08.720 --> 00:05:11.139
It uses webhooks and active Chrome browsing.

00:05:11.339 --> 00:05:13.560
Just to clarify, webhooks are automated messages

00:05:13.560 --> 00:05:16.639
apps send when something happens. Exactly. And

00:05:16.639 --> 00:05:18.879
this isn't just for big enterprise either. We're

00:05:18.879 --> 00:05:21.379
seeing a massive explosion of consumer tools.

00:05:21.579 --> 00:05:26.569
Like Slashy. Yeah. Slashy handles AI email triage.

00:05:26.750 --> 00:05:28.870
It writes in your voice and makes sure you don't

00:05:28.870 --> 00:05:31.230
miss follow-ups. And Tastelab. I thought this

00:05:31.230 --> 00:05:33.509
was fascinating. Oh, Tastelab is incredible.

00:05:33.910 --> 00:05:36.829
It extracts a website's design DNA. It pulls

00:05:36.829 --> 00:05:39.230
the hex codes and typography directly into a

00:05:39.230 --> 00:05:43.209
template, right? And there's Permute, which is

00:05:43.209 --> 00:05:47.399
a universal media converter. And Athenic 2.0.

00:05:47.480 --> 00:05:49.959
Athenic is the data analysis agent. Right. It

00:05:49.959 --> 00:05:52.519
basically replaces a junior data analyst. It

00:05:52.519 --> 00:05:55.259
ships automated dashboards and reports. Are we

00:05:55.259 --> 00:05:57.819
moving toward a future of a million micro models

00:05:57.819 --> 00:06:00.660
instead of one giant god model? Definitely. A

00:06:00.660 --> 00:06:03.399
swarm of hyper-specialized local models is far

00:06:03.399 --> 00:06:05.800
cheaper than one expensive monolith. We trade

00:06:05.800 --> 00:06:08.660
one expensive supercomputer for an army of cheap

00:06:08.660 --> 00:06:11.660
digital interns.

00:06:11.860 --> 00:06:14.180
So it's not just cost driving people away from

00:06:14.180 --> 00:06:17.050
these big proprietary models. It is also access.

00:06:17.689 --> 00:06:19.670
Geopolitics is playing a massive role here. It

00:06:19.670 --> 00:06:22.269
really is. Geopolitics is forcing the world to

00:06:22.269 --> 00:06:24.569
build around US tech giants. I want to maintain

00:06:24.569 --> 00:06:26.730
complete neutrality here and just report the

00:06:26.730 --> 00:06:28.329
facts. Sure. Let's look at the actual events.

00:06:28.860 --> 00:06:32.139
US export restrictions recently forced Anthropic

00:06:32.139 --> 00:06:34.980
to shut off Fable 5 and Mythos 5. Yeah, that

00:06:34.980 --> 00:06:37.480
happened incredibly fast. Anthropic is actually

00:06:37.480 --> 00:06:39.879
rushing staff to Washington, D.C. right now.

00:06:40.019 --> 00:06:42.500
They're claiming the security risk was overstated.

00:06:42.639 --> 00:06:45.040
What's fascinating here is the global ripple

00:06:45.040 --> 00:06:48.040
effect this caused. Bans didn't slow the competition

00:06:48.040 --> 00:06:50.920
down. No, they didn't. India is now actively

00:06:50.920 --> 00:06:53.240
rethinking its dependence on foreign models.

00:06:53.339 --> 00:06:55.560
Right. They are having a serious debate. Do they

00:06:55.560 --> 00:06:58.459
lean into open source or do they build sovereign

00:06:58.459 --> 00:07:02.060
AI? Sovereign AI simply means an AI model built

00:07:02.060 --> 00:07:04.519
and controlled entirely by one nation. Exactly.

00:07:04.779 --> 00:07:07.379
And Europe is taking the sovereign AI route.

00:07:07.540 --> 00:07:10.139
They are heavily backing Mistral AI. Mistral

00:07:10.139 --> 00:07:13.360
is raising 3 billion euros at a 20 billion euro

00:07:13.360 --> 00:07:15.939
valuation. Which is huge. Though, to be fair,

00:07:16.120 --> 00:07:19.120
OpenAI and Anthropic still lead the global market.

00:07:19.279 --> 00:07:21.699
For now. But the massive breakthrough came just

00:07:21.699 --> 00:07:24.579
two days after Fable 5 was banned. Yeah, this

00:07:24.579 --> 00:07:27.560
was wild. China's Zhipu AI stepped in. They released

00:07:27.560 --> 00:07:31.100
GLM 5.2, and they released it as a fully open

00:07:31.100 --> 00:07:33.319
source model under a permissive MIT license.

00:07:33.639 --> 00:07:36.339
That is the key detail. An MIT license means

00:07:36.339 --> 00:07:39.459
there are zero regional restrictions. Anyone

00:07:39.459 --> 00:07:43.529
can use it. And GLM 5.2 absolutely dominated

00:07:43.529 --> 00:07:45.769
the benchmarks. It hit number one on BridgeBench.

00:07:45.790 --> 00:07:49.829
Right. It scored 100.0 on BS and a 42.8 on

00:07:49.829 --> 00:07:53.050
reasoning. So it actually beat the banned Fable

00:07:53.050 --> 00:07:56.509
5 model. It did. And it runs at 300 tokens per

00:07:56.509 --> 00:07:58.850
second. That is incredibly fast. And it does

00:07:58.850 --> 00:08:01.639
that at one-tenth the cost of the big proprietary

00:08:01.639 --> 00:08:04.680
models. Plus, it has a 1 million token context

00:08:04.680 --> 00:08:07.480
window. A context window is an AI's short-term

00:08:07.480 --> 00:08:09.759
memory limit during one conversation. Right. So

00:08:09.759 --> 00:08:12.100
you can dump an entire massive code base into

00:08:12.100 --> 00:08:14.860
it at once. We saw real-world tests immediately.

00:08:14.860 --> 00:08:18.319
The platform Z.ai rolled it out instantly. Yeah,

00:08:18.319 --> 00:08:20.019
developers got their hands on it right away. And

00:08:20.019 --> 00:08:24.579
it coded a 925-line SVG clock from scratch. It's

00:08:24.579 --> 00:08:27.079
pure math and visual logic. It also built a functional

00:08:27.079 --> 00:08:29.920
3D penalty kick game. And a mini spreadsheet.

00:08:30.279 --> 00:08:33.159
All running flawlessly. If export bans immediately

00:08:33.159 --> 00:08:35.620
result in highly capable open source clones,

00:08:35.940 --> 00:08:38.759
do these restrictions actually work? Honestly,

00:08:38.980 --> 00:08:41.799
they seem to act as a catalyst that just supercharges

00:08:41.799 --> 00:08:44.779
global open source competition instead. Export

00:08:44.779 --> 00:08:47.960
blocks fail to contain tech and only force competitors

00:08:47.960 --> 00:08:52.389
to innovate faster. So while

00:08:52.389 --> 00:08:54.850
superpowers fight over these massive models,

00:08:55.129 --> 00:08:58.669
AI is quietly rewriting our daily realities.

00:08:58.929 --> 00:09:00.830
It's happening on a deeply personal level now.

00:09:00.990 --> 00:09:04.090
It really is. It's both taking jobs and managing

00:09:04.090 --> 00:09:06.669
our personal lives. The job data is pretty grim

00:09:06.669 --> 00:09:09.470
right now. AI layoffs are rising. Nearly 120,000

00:09:09.470 --> 00:09:12.429
tech workers have lost jobs this year. Wow,

00:09:12.690 --> 00:09:15.289
that's a massive shift. But there is a very surprising

00:09:15.289 --> 00:09:18.909
statistic here. Almost 75% of unemployed Americans...

00:09:19.289 --> 00:09:21.289
Never apply for unemployment benefits. That is

00:09:21.289 --> 00:09:23.789
so high. People just quietly absorb the loss.

00:09:24.049 --> 00:09:26.210
Yeah. And this brings us to what we're calling

00:09:26.210 --> 00:09:28.730
the personal assistant paradox. Right. Because

00:09:28.730 --> 00:09:31.250
while AI is displacing tech jobs, people are

00:09:31.250 --> 00:09:33.830
leaning on AI to manage their daily friction.

00:09:33.870 --> 00:09:36.529
Exactly. For example, there are these 15 ChatGPT

00:09:36.529 --> 00:09:38.870
prompts going viral right now. Oh, I saw

00:09:38.870 --> 00:09:41.090
these. People use them to dissect their paychecks.

00:09:41.519 --> 00:09:44.220
They use the AI to find highly specific ways

00:09:44.220 --> 00:09:47.080
to squeeze savings out of their budget. It's

00:09:47.080 --> 00:09:50.080
crazy. The same tech disrupting the job market

00:09:50.080 --> 00:09:52.539
is what people use to survive the disruption.

00:09:52.759 --> 00:09:55.080
Right. So what does this all mean? We are seeing

00:09:55.080 --> 00:09:57.899
the tension of macro-level job losses mixed

00:09:57.899 --> 00:10:00.519
with micro-level convenience. Yeah. And there's

00:10:00.519 --> 00:10:02.440
a lighter side to this convenience, too. It's

00:10:02.440 --> 00:10:06.299
not all just budgeting and layoffs. True. ChatGPT

00:10:06.299 --> 00:10:10.149
now has a dedicated World Cup 2026 page. You

00:10:10.149 --> 00:10:12.769
can track live scores and matches. And you do

00:10:12.769 --> 00:10:15.309
it without ever leaving the chat interface. It's

00:10:15.309 --> 00:10:17.710
all just seamlessly integrated into your conversation.

00:10:18.029 --> 00:10:20.330
It becomes the primary lens for how you interact

00:10:20.330 --> 00:10:22.429
with the internet. It's the ultimate trade-off

00:10:22.429 --> 00:10:25.330
of the AI era, exchanging job security for extreme

00:10:25.330 --> 00:10:27.850
personal convenience. It really seems like we

00:10:27.850 --> 00:10:30.289
are trading long-term career stability for

00:10:30.289 --> 00:10:33.070
hyper-efficient... Automated daily task management.

00:10:33.370 --> 00:10:35.750
Trading lifelong career stability for incredibly

00:10:35.750 --> 00:10:38.809
smooth daily task automation. Exactly. Let's

00:10:38.809 --> 00:10:40.909
briefly synthesize the major through line of

00:10:40.909 --> 00:10:44.169
this deep dive. The era of relying on a few massive,

00:10:44.269 --> 00:10:47.289
expensive, centralized AI models is fracturing.

00:10:47.580 --> 00:10:50.220
It is absolutely breaking apart. Between the

00:10:50.220 --> 00:10:52.919
massive token burn rates we discussed, the

00:10:52.919 --> 00:10:57.019
$14,000 underlying cost per user, and geopolitical

00:10:57.019 --> 00:10:59.500
export bans. Right. The global developer ecosystem

00:10:59.500 --> 00:11:01.980
is just actively routing around all of those

00:11:01.980 --> 00:11:04.500
roadblocks. We are entering an era of incredibly

00:11:04.500 --> 00:11:08.139
cheap, hyper-capable, open-source agents that

00:11:08.139 --> 00:11:10.220
basically live everywhere. It's a completely

00:11:10.220 --> 00:11:12.860
decentralized future. I want to leave you with

00:11:12.860 --> 00:11:17.019
a final lingering question to ponder. If government

00:11:17.019 --> 00:11:19.519
roadblocks only motivate the global community

00:11:19.519 --> 00:11:21.779
to build smarter, faster, and completely open

00:11:21.779 --> 00:11:24.220
source models with zero regional restrictions,

00:11:24.559 --> 00:11:28.100
are we rapidly approaching a point where AI regulation

00:11:28.100 --> 00:11:30.860
is no longer a legal question, but a technical

00:11:30.860 --> 00:11:33.740
impossibility? Thank you for

00:11:33.740 --> 00:11:35.779
joining the Deep Dive today. Keep questioning

00:11:35.779 --> 00:11:38.360
the information around you.