WEBVTT

00:00:00.000 --> 00:00:02.700
Think about the last time you spoke to an AI.

00:00:02.960 --> 00:00:05.519
Yeah. It is usually a pretty jarring feeling.

00:00:05.700 --> 00:00:07.799
You try to interrupt the machine. An awkward

00:00:07.799 --> 00:00:10.779
silence stretches out. The system clunks through

00:00:10.779 --> 00:00:13.820
a messy reset. It just feels completely unnatural.

00:00:14.080 --> 00:00:16.160
It really does. It breaks the illusion immediately.

00:00:16.239 --> 00:00:19.059
Now, change that picture entirely. What if your

00:00:19.059 --> 00:00:21.579
AI could, you know, see you frown? What if it

00:00:21.579 --> 00:00:23.339
stopped talking mid-sentence and asked, what

00:00:23.339 --> 00:00:25.399
is wrong? That completely changes the paradigm.

00:00:25.579 --> 00:00:28.359
It does. Welcome to this custom deep dive. We

00:00:28.359 --> 00:00:30.980
are very glad you are here with us. Today, we

00:00:30.980 --> 00:00:33.159
are exploring a massive technological shift.

00:00:33.609 --> 00:00:36.270
We are tracking the move from clunky AI tools

00:00:36.270 --> 00:00:40.810
to fluid, proactive partners. It is a big one.

00:00:40.990 --> 00:00:43.689
First, we will unpack Google's brand new Android

00:00:43.689 --> 00:00:46.609
updates. Then we are going to examine the staggering

00:00:46.609 --> 00:00:48.729
capital engines operating behind the scenes.

00:00:48.829 --> 00:00:52.109
We are talking about OpenAI, SoftBank, and Anthropic.

00:00:52.229 --> 00:00:54.509
The scale of the money is just wild. It truly

00:00:54.509 --> 00:00:57.109
is. Finally, we will look at a major breakthrough

00:00:57.109 --> 00:01:00.149
from Mira Murati's Thinking Machines Lab. It is

00:01:00.149 --> 00:01:02.490
a fundamental shift that promises to end the

00:01:02.490 --> 00:01:05.620
awkward AI silence forever. It is a profound

00:01:05.620 --> 00:01:09.219
transition. We are moving past the novelty phase

00:01:09.219 --> 00:01:11.900
of AI. Yeah. The technology is weaving itself

00:01:11.900 --> 00:01:14.799
into our daily routines. It is basically becoming

00:01:14.799 --> 00:01:17.620
an invisible layer of reality. That is exactly

00:01:17.620 --> 00:01:19.620
what struck me about Google's latest announcements.

00:01:19.900 --> 00:01:22.099
They just wrapped up the Android show. It was

00:01:22.099 --> 00:01:24.620
their special I/O edition. Right. And honestly,

00:01:24.799 --> 00:01:27.219
the whole ecosystem is shifting. We are looking

00:01:27.219 --> 00:01:29.920
at a radical redesign of how we interact with

00:01:29.920 --> 00:01:32.120
software. We are seeing a definitive move away

00:01:32.120 --> 00:01:35.420
from reactive tools. Consumer tech is becoming

00:01:35.420 --> 00:01:38.459
a proactive assistant. It anticipates what you

00:01:38.459 --> 00:01:40.799
need. It doesn't just wait around for a typed

00:01:40.799 --> 00:01:43.500
command anymore. Right. And the clearest example

00:01:43.500 --> 00:01:45.760
of this is the Google book. They announced this

00:01:45.760 --> 00:01:48.560
entirely new type of computer. It is built from

00:01:48.560 --> 00:01:50.319
the ground up for Gemini. The operating system

00:01:50.319 --> 00:01:53.239
architecture is basically inverted. How so? Well,

00:01:53.280 --> 00:01:55.579
traditionally, the OS manages files and apps.

00:01:55.719 --> 00:01:59.019
Here, the OS centers entirely on the AI model.

00:01:59.180 --> 00:02:02.299
The AI is the kernel. Wow. They showcased a feature

00:02:02.299 --> 00:02:04.519
called the Magic Pointer too. It is essentially

00:02:04.519 --> 00:02:07.260
an AI-powered cursor. Yeah, that was fascinating.

00:02:07.439 --> 00:02:09.240
What is really interesting is how it bridges

00:02:09.240 --> 00:02:12.520
devices. You can use all your mobile apps on

00:02:12.520 --> 00:02:15.719
the big screen seamlessly. That removes a massive

00:02:15.719 --> 00:02:19.550
layer of friction. And the cursor itself actually

00:02:19.550 --> 00:02:22.009
understands context. Right. It uses computer

00:02:22.009 --> 00:02:25.030
vision. Exactly. It knows what you are hovering

00:02:25.030 --> 00:02:28.030
over. If you hover over an address, it predicts

00:02:28.030 --> 00:02:30.569
you want a map. It anticipates your next action

00:02:30.569 --> 00:02:32.729
before you click. Okay, but the vibe-coded widgets

00:02:32.729 --> 00:02:35.069
feature is what actually stopped me in my tracks.

00:02:35.289 --> 00:02:37.610
Oh, man. You just describe a widget you want,

00:02:37.689 --> 00:02:39.490
maybe something incredibly niche that doesn't

00:02:39.490 --> 00:02:42.090
exist yet, and Android just builds it for you

00:02:42.090 --> 00:02:45.090
on the spot. Oh, it is wild. It completely flips

00:02:45.090 --> 00:02:47.469
how we interact with our phone's interface. You

00:02:47.469 --> 00:02:49.139
don't search an app store anymore? Nope. You

00:02:49.139 --> 00:02:51.800
just articulate a desire. The operating system

00:02:51.800 --> 00:02:54.900
handles the creation instantly. It is like magic.

00:02:55.180 --> 00:02:57.280
Under the hood, a lightweight language model

00:02:57.280 --> 00:03:00.539
writes the code. It compiles the UI components

00:03:00.539 --> 00:03:03.280
in real time. But wait, let's unpack that. Isn't

00:03:03.280 --> 00:03:05.520
that a massive threat to developers? Oh, absolutely.

00:03:05.780 --> 00:03:08.060
If the OS dynamically generates custom widgets,

00:03:08.379 --> 00:03:11.060
nobody is buying third-party apps for those

00:03:11.060 --> 00:03:13.900
microtasks anymore. Right? And that is the hidden

00:03:13.900 --> 00:03:16.919
disruption here. It democratizes software creation

00:03:16.919 --> 00:03:20.259
for the user, but it absolutely cannibalizes

00:03:20.259 --> 00:03:23.979
the lower tier of the developer ecosystem. The

00:03:23.979 --> 00:03:26.400
middleman is eliminated. That makes sense. We

00:03:26.400 --> 00:03:29.000
are also seeing major upgrades to Gemini Intelligence.

00:03:29.789 --> 00:03:32.030
It is actually taking real-world actions now.

00:03:32.150 --> 00:03:34.990
Yes. And this is the shift from a passive oracle

00:03:34.990 --> 00:03:38.270
to an active agent. Right. It moves beyond generating

00:03:38.270 --> 00:03:42.229
text into executing multi-step workflows. The

00:03:42.229 --> 00:03:44.169
example they showed was brilliant. You take a

00:03:44.169 --> 00:03:46.169
photo of a concert flyer on a telephone pole.

00:03:46.409 --> 00:03:49.669
You simply tell Gemini to book a hotel. It goes

00:03:49.669 --> 00:03:51.990
out, navigates the travel sites autonomously,

00:03:52.030 --> 00:03:54.210
and finds a room near the venue. Think about

00:03:54.210 --> 00:03:56.629
the underlying mechanics there. It parses the

00:03:56.629 --> 00:03:59.280
image using a vision model. Yeah. It extracts

00:03:59.280 --> 00:04:01.879
the location, the band name and the date. It

00:04:01.879 --> 00:04:04.159
cross-references your calendar. It accesses your

00:04:04.159 --> 00:04:06.400
payment preferences. It is doing all of that

00:04:06.400 --> 00:04:08.860
in the background. Exactly. Then it executes

00:04:08.860 --> 00:04:11.280
a complex sequence of web navigation tasks. It

00:04:11.280 --> 00:04:14.199
acts exactly like a human assistant. It is staggering.

00:04:14.280 --> 00:04:16.319
And alongside these massive shifts, there are

00:04:16.319 --> 00:04:18.819
smaller quality-of-life improvements. Right,

00:04:18.879 --> 00:04:21.379
like in Gboard. Yes. Gboard now has a feature

00:04:21.379 --> 00:04:24.560
called Rambler. It scrubs your voice-to-text

00:04:24.560 --> 00:04:27.879
inputs before they ever hit the screen. It automatically

00:04:27.879 --> 00:04:31.439
removes your ahs and mid-sentence corrections.
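
NOTE
Editorial sketch, not Google's implementation: the episode says Rambler relies on on-device models. As a much simpler stand-in, the regex filter below only illustrates the idea of stripping filler words from a transcript. The FILLERS pattern and the scrub function are hypothetical names chosen for this example.
```python
import re
# Hypothetical filler list; a real system would use a learned model instead.
FILLERS = re.compile(r"\b(?:um+|uh+|ah+)\b[,.]?\s*", re.IGNORECASE)
def scrub(text: str) -> str:
    """Remove filler words, then collapse any doubled spaces."""
    cleaned = FILLERS.sub("", text)
    return re.sub(r"\s{2,}", " ", cleaned).strip()
print(scrub("Um, I think, uh, we should go."))
```
A fixed pattern like this cannot handle mid-sentence corrections, which is presumably where a model is needed.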

00:04:31.480 --> 00:04:34.540
It uses on-device models to filter acoustic

00:04:34.540 --> 00:04:37.779
garbage. It is so helpful. It makes human communication

00:04:37.779 --> 00:04:40.720
look much cleaner. It basically edits our natural

00:04:40.720 --> 00:04:43.379
verbal stumbling in real time. Then there's the

00:04:43.379 --> 00:04:46.839
PausePoint feature. This one feels deeply necessary.

00:04:47.120 --> 00:04:49.839
I love this one. It forces you to wait 10 seconds

00:04:49.839 --> 00:04:52.319
before opening distracting apps, things like

00:04:52.319 --> 00:04:54.680
TikTok or Instagram. It gives you a moment to

00:04:54.680 --> 00:04:57.040
decide if you want to read a book instead. It

00:04:57.040 --> 00:04:59.100
is a great intervention. I have to admit something

00:04:59.100 --> 00:05:01.360
here. I still wrestle with doomscrolling myself.

00:05:01.680 --> 00:05:04.100
We all do. Our brains are hardwired for that

00:05:04.100 --> 00:05:06.959
quick dopamine hit. Yeah. PausePoint acts as

00:05:06.959 --> 00:05:09.160
a digital speed bump. Yes. It inserts a crucial

00:05:09.160 --> 00:05:11.279
moment of reflection into a compulsive habit.

00:05:11.459 --> 00:05:14.240
It fights algorithm-induced addiction with a

00:05:14.240 --> 00:05:16.040
counter-algorithm. They also announced some

00:05:16.040 --> 00:05:18.360
practical Quick Share updates. You can easily

00:05:18.360 --> 00:05:21.620
share files with iPhones via QR codes now. Moving

00:05:21.620 --> 00:05:24.899
your whole digital life from iOS to Android is

00:05:24.899 --> 00:05:27.439
getting much smoother. They are actively breaking

00:05:27.439 --> 00:05:30.399
down the walled gardens. The strategy is to make

00:05:30.399 --> 00:05:33.899
switching ecosystems entirely painless. Exactly.

00:05:34.519 --> 00:05:37.100
Lowering the barrier to entry. Security got a

00:05:37.100 --> 00:05:41.139
fascinating AI upgrade too. There is a new Theft

00:05:41.139 --> 00:05:44.589
Detection Lock. This is very clever. The AI actually

00:05:44.589 --> 00:05:46.790
senses when a phone has been snatched out of

00:05:46.790 --> 00:05:49.129
your hand. It automatically locks the device.

00:05:49.449 --> 00:05:52.029
It also aggressively reduces the number of PIN

00:05:52.029 --> 00:05:54.649
attempts a thief can make. That is a brilliant

00:05:54.649 --> 00:05:57.269
use of hardware sensors. It uses the phone's

00:05:57.269 --> 00:05:59.589
accelerometers and gyroscopes. Right. The machine

00:05:59.589 --> 00:06:01.569
learning model was trained on the physical signature

00:06:01.569 --> 00:06:05.410
of a theft. It knows the exact G-force and angle

00:06:05.410 --> 00:06:07.209
of a phone being grabbed and someone running

00:06:07.209 --> 00:06:10.410
away. Wow. It reacts faster than human reflexes

00:06:10.410 --> 00:06:13.209
could. And on a lighter note, they fully redesigned

00:06:13.209 --> 00:06:16.430
all 4,000 3D emojis. Right, because it is the

00:06:16.430 --> 00:06:18.290
little things that keep users tethered to an

00:06:18.290 --> 00:06:20.509
ecosystem. Looking at all this together, the

00:06:20.509 --> 00:06:23.290
ecosystem shift is vivid. It's like turning your

00:06:23.290 --> 00:06:25.910
phone from a passive filing cabinet into an active

00:06:25.910 --> 00:06:29.209
copilot. That is the perfect analogy. The device

00:06:29.209 --> 00:06:31.509
is no longer just waiting for your manual input.

00:06:31.689 --> 00:06:34.980
It is observing, predicting, and acting. But

00:06:34.980 --> 00:06:37.720
this raises a profound question. Does removing

00:06:37.720 --> 00:06:40.040
all friction with things like Gemini Intelligence

00:06:40.040 --> 00:06:43.100
make us lose our own cognitive maps? That is

00:06:43.100 --> 00:06:45.620
a very valid concern. If the AI does everything,

00:06:45.939 --> 00:06:48.720
do we forget how to do things ourselves? That

00:06:48.720 --> 00:06:51.339
is the classic dilemma of cognitive offloading.

00:06:51.500 --> 00:06:54.959
Every new tool fundamentally changes our baseline

00:06:54.959 --> 00:06:57.579
expectations. Like when we got GPS. Exactly.

00:06:57.699 --> 00:07:01.060
When we offloaded navigation to GPS, our spatial

00:07:01.060 --> 00:07:03.939
memory demonstrably weakened. But it freed up

00:07:03.939 --> 00:07:06.060
mental bandwidth for other things. Right. We

00:07:06.060 --> 00:07:08.560
are outsourcing routine digital logistics to

00:07:08.560 --> 00:07:11.160
AI now. We might actually forget how to manually

00:07:11.160 --> 00:07:13.079
navigate a travel site or build a spreadsheet.

00:07:13.360 --> 00:07:16.000
But the theory is we gain time and energy for

00:07:16.000 --> 00:07:18.639
higher-level thinking. It is a permanent tradeoff.

00:07:18.680 --> 00:07:20.459
So we trade a little self-reliance for ultimate

00:07:20.459 --> 00:07:23.579
daily efficiency. Got it. We're redefining what

00:07:23.579 --> 00:07:25.720
we consider a worthwhile use of our human time.

00:07:25.839 --> 00:07:28.579
If we are outsourcing our daily logistics to

00:07:28.579 --> 00:07:31.120
these seamless assistants, it takes a terrifying

00:07:31.120 --> 00:07:33.160
amount of server power to keep that illusion

00:07:33.160 --> 00:07:36.279
alive without lagging. Oh, the compute power

00:07:36.279 --> 00:07:38.899
required is mind-boggling. Which brings us to

00:07:38.899 --> 00:07:41.120
the actual physical engines running all this.

00:07:41.319 --> 00:07:44.079
The numbers we are seeing are massive. The scale

00:07:44.079 --> 00:07:46.360
of capital deployment right now is unprecedented.

00:07:46.939 --> 00:07:49.759
We are watching a historical shift in global

00:07:49.759 --> 00:07:53.240
resource allocation. Let us look at OpenAI. They

00:07:53.240 --> 00:07:55.600
just had a massive liquidity event last fall.

00:07:55.939 --> 00:08:00.319
It was valued at $6.6 billion. It is hard to

00:08:00.319 --> 00:08:02.740
even fathom that number. Roughly 600 employees

00:08:02.740 --> 00:08:05.870
cashed out. Around 75 of those employees walked

00:08:05.870 --> 00:08:09.519
away with over $30 million each. That fundamentally

00:08:09.519 --> 00:08:12.500
changes the internal dynamics of a company. You

00:08:12.500 --> 00:08:14.439
suddenly have a large group of incredibly wealthy

00:08:14.439 --> 00:08:17.079
engineers. Yeah. It shifts the culture from a

00:08:17.079 --> 00:08:19.399
scrappy research lab to a legacy institution

00:08:19.399 --> 00:08:21.920
managing massive wealth. It certainly changes

00:08:21.920 --> 00:08:24.319
the stakes. But the infrastructure spending happening

00:08:24.319 --> 00:08:26.779
outside the company is even bigger. Way bigger.

00:08:27.139 --> 00:08:30.000
SoftBank is making aggressive moves. Their CEO

00:08:30.000 --> 00:08:33.759
is discussing a $100 billion AI investment in

00:08:33.759 --> 00:08:36.320
France. That is an astonishing figure. It is

00:08:36.320 --> 00:08:40.509
literally larger than the G. It includes building

00:08:40.509 --> 00:08:43.870
massive new AI data centers. They want to radically

00:08:43.870 --> 00:08:46.690
expand the country's computing power infrastructure.

00:08:47.149 --> 00:08:49.750
This is deeply tied to national AI sovereignty.

00:08:50.289 --> 00:08:53.149
Countries are realizing that compute power is

00:08:53.149 --> 00:08:56.429
the new oil. If you don't control your own compute,

00:08:56.690 --> 00:08:59.149
you are at the mercy of foreign tech giants.

00:08:59.350 --> 00:09:02.409
That's true. SoftBank is positioning itself as

00:09:02.409 --> 00:09:05.570
the primary financier of this new global infrastructure.

00:09:05.870 --> 00:09:07.899
They are pouring the concrete. The corporate

00:09:07.899 --> 00:09:10.500
drama behind the scenes is just as intense as

00:09:10.500 --> 00:09:13.580
the spending. Sam Altman recently testified under

00:09:13.580 --> 00:09:16.299
oath. That testimony was wild. He shared some

00:09:16.299 --> 00:09:19.059
fascinating and honestly bizarre details about

00:09:19.059 --> 00:09:21.139
his relationship with Elon Musk. The history

00:09:21.139 --> 00:09:23.240
between those two founders is incredibly complicated.

00:09:23.519 --> 00:09:26.779
It is the defining origin story of modern AI.

00:09:27.080 --> 00:09:29.559
Altman testified that Musk once suggested

00:09:29.559 --> 00:09:32.000
OpenAI could pass to his children if he died. I still

00:09:32.000 --> 00:09:33.879
cannot believe he said that. Altman called it

00:09:33.879 --> 00:09:35.740
one of the hair-raising moments of their relationship.

00:09:36.320 --> 00:09:38.360
It highlights the philosophical clashes they

00:09:38.360 --> 00:09:41.299
had early on. They weren't just arguing over a software

00:09:41.299 --> 00:09:43.840
product. They were arguing over who should legally

00:09:43.840 --> 00:09:46.759
control artificial general intelligence. It sounds

00:09:46.759 --> 00:09:49.019
like science fiction, but to them, the stakes

00:09:49.019 --> 00:09:51.590
were very real. Meanwhile, Anthropic is throwing

00:09:51.590 --> 00:09:54.590
its own weight around. They are hiring a literal

00:09:54.590 --> 00:09:59.590
Claude evangelist. The job pays up to $315,000

00:09:59.590 --> 00:10:03.309
a year. They want someone specifically to help

00:10:03.309 --> 00:10:06.970
founders and VCs adopt their AI products. It

00:10:06.970 --> 00:10:09.169
shows how critical developer adoption has become.

00:10:09.409 --> 00:10:12.049
The best model doesn't win automatically. You

00:10:12.049 --> 00:10:14.549
have to fight a ground war for mindshare among

00:10:14.549 --> 00:10:16.830
the people building the apps. Anthropic is also

00:10:16.830 --> 00:10:19.940
flexing its capital in acquisitions. It is reportedly

00:10:19.940 --> 00:10:22.360
in late-stage talks to buy a startup called

00:10:22.360 --> 00:10:25.299
Stainless. This is a huge deal. It builds crucial

00:10:25.299 --> 00:10:28.259
developer tools. OpenAI and Google actually already

00:10:28.259 --> 00:10:31.019
use Stainless. And Anthropic might buy it for

00:10:31.019 --> 00:10:34.419
over $300 million. That is a brilliant, aggressive

00:10:34.419 --> 00:10:37.159
chess move. If Anthropic owns the underlying

00:10:37.159 --> 00:10:40.399
tools that their biggest rivals rely on, they

00:10:40.399 --> 00:10:43.000
gain a massive structural advantage. It is about

00:10:43.000 --> 00:10:45.679
owning the plumbing of the entire ecosystem.

00:10:46.080 --> 00:10:48.659
Speaking of advantages, OpenAI is fighting back

00:10:48.659 --> 00:10:50.360
with new technology. They just released something

00:10:50.360 --> 00:10:52.620
called Daybreak. This is their new security play.

00:10:53.049 --> 00:10:55.789
It is their direct answer to Anthropic's Claude

00:10:55.789 --> 00:10:58.990
Mythos. Security is rapidly becoming the next

00:10:58.990 --> 00:11:01.289
major battleground for these models. Daybreak

00:11:01.289 --> 00:11:04.809
combines two distinct systems. It uses GPT-5.5

00:11:04.809 --> 00:11:08.950
Cyber and Codex Security, an AI guard dog

00:11:08.950 --> 00:11:11.970
catching software bugs before hackers do. The

00:11:11.970 --> 00:11:15.659
AI writes the code. But now it also audits the

00:11:15.659 --> 00:11:18.360
code autonomously. Which is crazy to think about.

00:11:18.500 --> 00:11:21.240
It runs adversarial attacks against itself. It

00:11:21.240 --> 00:11:23.700
hardens critical systems against zero-day exploits

00:11:23.700 --> 00:11:26.340
at machine speed. There are also some incredibly

00:11:26.340 --> 00:11:29.279
fun details emerging from all this coding. The

00:11:29.279 --> 00:11:31.919
team behind Claude Code shared a strange tip.

00:11:32.100 --> 00:11:34.399
Oh, the Markdown thing. Yes. They said developers

00:11:34.399 --> 00:11:36.379
should avoid using Markdown files for instructions.

00:11:37.279 --> 00:11:39.360
Funny enough, they called the tip unreasonably

00:11:39.360 --> 00:11:41.639
effective. Sometimes the behavior of these massive

00:11:41.639 --> 00:11:44.039
models defies logical explanation. It really

00:11:44.039 --> 00:11:46.860
does. It is likely a quirky artifact hidden deep

00:11:46.860 --> 00:11:49.820
in the training data. The model just pays better

00:11:49.820 --> 00:11:53.120
attention to raw text. And everyday people are

00:11:53.120 --> 00:11:55.559
using these massive models for entirely different,

00:11:55.679 --> 00:11:59.340
hilarious things. Like what? A fan used AI video

00:11:59.340 --> 00:12:01.679
tools to insert himself into the movie Titanic.

00:12:02.179 --> 00:12:05.639
He somehow fixed every single fan complaint

00:12:06.009 --> 00:12:08.950
in a single video. He saved Jack. Right, and

00:12:08.950 --> 00:12:10.889
he basically became the hero the movie needed.

00:12:11.070 --> 00:12:14.190
Exactly. It shows the incredible, chaotic, creative

00:12:14.190 --> 00:12:16.809
potential of these tools once they hit the mainstream.

00:12:17.399 --> 00:12:19.620
But looking at the big picture, a massive question

00:12:19.620 --> 00:12:22.580
remains. With SoftBank dropping $100 billion

00:12:22.580 --> 00:12:25.720
and OpenAI minting millionaires overnight, are

00:12:25.720 --> 00:12:28.139
we building foundational plumbing or is this

00:12:28.139 --> 00:12:30.259
just an unsustainable arms race? It certainly

00:12:30.259 --> 00:12:32.080
looks like an arms race from the outside. The

00:12:32.080 --> 00:12:34.320
valuations are undeniably astronomical. Right.

00:12:34.460 --> 00:12:35.759
But we have to look at what they're actually

00:12:35.759 --> 00:12:38.019
building with that cash. The data centers are

00:12:38.019 --> 00:12:40.120
real. The power grids are real. They are physical

00:12:40.120 --> 00:12:43.419
assets. Exactly. They are pouring actual concrete

00:12:43.419 --> 00:12:46.080
and laying thousands of miles of fiber optics.

00:12:46.650 --> 00:12:49.710
Even if an economic bubble bursts, that physical

00:12:49.710 --> 00:12:52.769
infrastructure remains. It will power the next

00:12:52.769 --> 00:12:56.250
generation of software, regardless of which specific

00:12:56.250 --> 00:13:00.309
company wins the AI war. Massive capital is literally

00:13:00.309 --> 00:13:02.870
laying down the physical plumbing for the future.

00:13:03.029 --> 00:13:06.210
Makes sense. Yes. It is very similar to the telecom

00:13:06.210 --> 00:13:08.889
boom of the late 90s. How so? The companies went

00:13:08.889 --> 00:13:11.549
bust, but the fiber they laid gave us the modern

00:13:11.549 --> 00:13:14.299
Internet. Wait, I have to push back there. Telecom

00:13:14.299 --> 00:13:17.519
laid neutral fiber that anyone could use. These

00:13:17.519 --> 00:13:20.720
AI data centers are highly proprietary walled

00:13:20.720 --> 00:13:23.740
gardens. SoftBank isn't building a public park.

00:13:23.940 --> 00:13:25.679
Isn't that fundamentally different? That is a

00:13:25.679 --> 00:13:27.659
very sharp distinction. You are right. Yeah.

00:13:27.759 --> 00:13:30.740
The physical hardware exists, but the access

00:13:30.740 --> 00:13:33.120
is strictly gated by the corporate giants.

00:13:33.419 --> 00:13:36.799
It is infrastructure, but it is privatized infrastructure.

00:13:37.299 --> 00:13:40.000
The billions of dollars we just discussed

00:13:40.000 --> 00:13:43.139
are all chasing one holy grail. They all want

00:13:43.139 --> 00:13:45.480
to make computers feel completely, seamlessly

00:13:45.480 --> 00:13:48.080
human. That is the ultimate goal. This leads

00:13:48.080 --> 00:13:51.080
perfectly into Mira Murati's new venture. It's

00:13:51.080 --> 00:13:52.720
perhaps the most exciting technical development

00:13:52.720 --> 00:13:55.240
we are covering today. It changes the core interaction

00:13:55.240 --> 00:13:58.100
paradigm completely. Murati is the former CTO

00:13:58.100 --> 00:14:01.240
of OpenAI. She recently founded Thinking Machines

00:14:01.240 --> 00:14:04.159
Lab. We will call it TML. Right. They just dropped

00:14:04.159 --> 00:14:06.600
a research preview that is stunning. It focuses

00:14:06.600 --> 00:14:08.980
on the one thing we actually hate about talking

00:14:08.980 --> 00:14:12.000
to AI. The awkward silence. Yeah. The fundamental

00:14:12.000 --> 00:14:14.600
lack of human conversational rhythm. Exactly.

00:14:14.940 --> 00:14:17.539
They are trying to end the chat bubble interface

00:14:17.539 --> 00:14:21.500
forever. TML built a system that collaborates

00:14:21.500 --> 00:14:24.539
in a live, continuous streaming loop. No more waiting.

00:14:24.700 --> 00:14:27.840
It processes your voice, video, and text simultaneously.

00:14:28.039 --> 00:14:30.480
This is a radical departure from turn-based

00:14:30.480 --> 00:14:33.480
interactions. Standard AI operates like a

00:14:33.480 --> 00:14:35.899
walkie-talkie. Yeah. You speak, you wait. It speaks.

00:14:36.440 --> 00:14:38.720
With TML, you no longer wait for the model to

00:14:38.720 --> 00:14:41.580
finish generating a response. The model perceives

00:14:41.580 --> 00:14:44.580
and processes data in tiny rapid bursts. It operates

00:14:44.580 --> 00:14:46.940
in chunks of just 200 milliseconds at a time.
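
NOTE
Editorial sketch of the 200 millisecond chunking idea, not TML's code. It assumes 16 kHz mono audio, which the episode does not specify; SAMPLE_RATE_HZ and chunk_samples are hypothetical names. At that rate each chunk holds 16000 * 200 / 1000 = 3200 samples.
```python
from typing import Iterator, List
SAMPLE_RATE_HZ = 16_000  # assumed sample rate, not stated in the episode
CHUNK_MS = 200  # chunk length mentioned in the episode
def chunk_samples(samples: List[int], sample_rate_hz: int = SAMPLE_RATE_HZ, chunk_ms: int = CHUNK_MS) -> Iterator[List[int]]:
    """Yield consecutive fixed-length chunks; the final chunk may be shorter."""
    chunk_len = sample_rate_hz * chunk_ms // 1000  # samples per chunk
    for start in range(0, len(samples), chunk_len):
        yield samples[start:start + chunk_len]
one_second = [0] * SAMPLE_RATE_HZ  # one second of silence
chunks = list(chunk_samples(one_second))
print(len(chunks), len(chunks[0]))  # prints: 5 3200
```
A streaming system would hand each chunk to the model as it arrives instead of waiting for the speaker to finish.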

00:14:47.000 --> 00:14:49.500
That is incredibly fast. It is. This allows it

00:14:49.500 --> 00:14:52.000
to make natural human noises. It says, mm -hmm,

00:14:52.080 --> 00:14:54.419
yeah, and got it while you are actively speaking.

00:14:54.700 --> 00:14:57.639
Human conversation is incredibly complex sociologically.

00:14:57.879 --> 00:15:00.360
We rely heavily on those subtle backchannel

00:15:00.360 --> 00:15:02.840
signals. They are so important. They let us know

00:15:02.840 --> 00:15:05.519
the other person is tracking our thoughts. Standard

00:15:05.519 --> 00:15:07.840
AI lacks this entirely, which is why it feels

00:15:07.840 --> 00:15:10.399
dead. It even reacts to your facial expressions

00:15:10.399 --> 00:15:12.779
before you finish your sentence. Wow. If you

00:15:12.779 --> 00:15:15.360
look confused mid-thought, the AI stops and

00:15:15.360 --> 00:15:17.539
clarifies its point. That requires incredibly

00:15:17.539 --> 00:15:20.580
low system latency. 200 milliseconds is the magic

00:15:20.580 --> 00:15:23.340
number. Why that specific number? It matches

00:15:23.340 --> 00:15:26.519
human conversational gap times perfectly. If

00:15:26.519 --> 00:15:28.960
it takes longer than that, our brains register

00:15:28.960 --> 00:15:32.200
it as an awkward pause. To stay that impossibly

00:15:32.200 --> 00:15:35.419
fast, TML uses a unique two-layer architecture.

00:15:36.000 --> 00:15:38.320
This is a brilliant engineering solution to a

00:15:38.320 --> 00:15:40.879
very hard physics problem. Because you cannot

00:15:40.879 --> 00:15:43.840
run a massive intelligence model in 200 milliseconds.

00:15:44.039 --> 00:15:46.259
Exactly. You just can't. So they split it up.

00:15:46.299 --> 00:15:49.440
They use a live model to handle immediate superficial

00:15:49.440 --> 00:15:52.019
interaction. Then a background model does the

00:15:52.019 --> 00:15:54.320
heavy intellectual lifting. One brain for quick

00:15:54.320 --> 00:15:57.100
reactions, another brain for deep thinking. Exactly.

00:15:57.100 --> 00:16:00.279
It is like the human amygdala versus the prefrontal

00:16:00.279 --> 00:16:03.200
cortex. That makes sense. The live model manages

00:16:03.200 --> 00:16:06.799
the social dynamics and the hmms. The background

00:16:06.799 --> 00:16:09.759
model pulls the factual data and structures the

00:16:09.759 --> 00:16:12.879
complex argument. They work in tandem seamlessly.
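
NOTE
Editorial sketch of the two-layer pattern described here, not TML's architecture. A fast loop keeps acknowledging while a slow task works in the background. Both functions are hypothetical stand-ins and the sleep durations are arbitrary.
```python
import asyncio
from typing import List
async def background_model(question: str) -> str:
    """Stand-in for the slow, heavyweight model (hypothetical)."""
    await asyncio.sleep(0.05)  # simulated long-running reasoning
    return f"Detailed answer to: {question}"
async def live_model(question: str, tick_s: float = 0.01) -> List[str]:
    """Stand-in for the fast layer: emit backchannels until the answer is ready."""
    events: List[str] = []
    task = asyncio.create_task(background_model(question))
    while not task.done():
        events.append("mm-hmm")  # cheap acknowledgement on every tick
        await asyncio.sleep(tick_s)
    events.append(task.result())  # hand over the background model's answer
    return events
events = asyncio.run(live_model("Why is the sky blue?"))
print(events[0], "...", events[-1])
```
The number of acknowledgements depends on timing, so only the first and last events are predictable.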

00:16:13.279 --> 00:16:15.980
Because it processes video in real time, the

00:16:15.980 --> 00:16:19.519
real-world capabilities are wild. The model

00:16:19.519 --> 00:16:22.220
can literally watch you exercise through your

00:16:22.220 --> 00:16:24.519
phone camera. Oh, yeah. It will count your workout

00:16:24.519 --> 00:16:26.840
reps out loud. It acts as an interactive physical

00:16:26.840 --> 00:16:30.159
coach. It sees your posture, analyzes your form,

00:16:30.240 --> 00:16:32.620
and corrects it instantly. It can also translate

00:16:32.620 --> 00:16:36.240
live speech directly from a TV screen. It can

00:16:36.240 --> 00:16:38.460
speak up proactively if it sees something change

00:16:38.460 --> 00:16:39.980
in your environment. Right, like telling you

00:16:39.980 --> 00:16:42.840
your coffee has got to stop. It has genuine situational

00:16:42.840 --> 00:16:45.879
awareness. That is a massive leap forward from

00:16:45.879 --> 00:16:48.360
a text box. Think about standard AI voice modes

00:16:48.360 --> 00:16:50.940
today. If you have ever tried to interrupt one,

00:16:51.059 --> 00:16:53.379
you know how clunky it feels. Oh, it is terrible.

00:16:53.580 --> 00:16:56.240
It refuses to stop or it glitches. It breaks

00:16:56.240 --> 00:16:58.750
the illusion of intelligence immediately. TML

00:16:58.750 --> 00:17:01.610
is placing a massive contrarian bet here. They

00:17:01.610 --> 00:17:04.029
believe the future of AI isn't just about raw,

00:17:04.210 --> 00:17:07.069
escalating intelligence. No, it is deeply about

00:17:07.069 --> 00:17:10.910
latency, empathy, and conversational flow. Whoa.

00:17:11.609 --> 00:17:15.410
Imagine scaling those 200-millisecond

00:17:15.410 --> 00:17:18.309
reactions to a billion queries. It requires an

00:17:18.309 --> 00:17:20.250
entirely different approach to server architecture.

00:17:20.609 --> 00:17:23.130
That is why the two-layer system is so crucial.

00:17:23.529 --> 00:17:27.190
Yeah. They are basically rewriting how data moves.

00:17:27.430 --> 00:17:30.230
But I have to ask a strategic question. If TML

00:17:30.230 --> 00:17:32.670
is betting everything on latency and flow rather

00:17:32.670 --> 00:17:35.109
than just raw smarts, can they actually survive

00:17:35.109 --> 00:17:37.529
as a standalone company? That is the million

00:17:37.529 --> 00:17:39.569
dollar question. Or is this just a preview of

00:17:39.569 --> 00:17:42.150
a feature that Google or OpenAI will eventually

00:17:42.150 --> 00:17:44.789
absorb? That is the ultimate Silicon Valley dilemma

00:17:44.789 --> 00:17:47.730
right now. Interaction models do create a powerful

00:17:47.730 --> 00:17:50.759
user moat. True. If users fall in love with that

00:17:50.759 --> 00:17:53.180
fluidity, they will not want to go back to clunky

00:17:53.180 --> 00:17:56.599
walkie-talkie AI. However, tech giants have

00:17:56.599 --> 00:17:59.220
immense distribution power. Right. Google already

00:17:59.220 --> 00:18:01.839
has the phones. Exactly. Google can push a software

00:18:01.839 --> 00:18:03.980
update to billions of Android phones overnight.

00:18:04.579 --> 00:18:07.140
TML has to build their user base from absolute

00:18:07.140 --> 00:18:09.900
scratch. That is a tough climb. They are pioneering

00:18:09.900 --> 00:18:12.359
the interface, but the giants are watching closely,

00:18:12.480 --> 00:18:14.740
and they have infinite capital to copy it. They

00:18:14.740 --> 00:18:17.079
either redefine the whole industry or get swallowed

00:18:17.079 --> 00:18:20.339
by a giant. Fascinating. It will be a thrilling

00:18:20.339 --> 00:18:22.319
narrative to watch unfold over the next year.

00:18:22.319 --> 00:18:24.319
Let us step back and look at the big picture.

00:18:24.319 --> 00:18:26.920
How do all these disparate pieces fit together?

00:18:26.920 --> 00:18:30.000
Today, we can trace a very clear through line

00:18:30.000 --> 00:18:32.720
across all these stories. AI is fundamentally

00:18:32.720 --> 00:18:36.519
moving from reactive text boxes to omnipresent

00:18:36.519 --> 00:18:40.059
real-time partners. We see it in Google Android's

00:18:40.079 --> 00:18:42.759
proactive widgets building themselves. We see

00:18:42.759 --> 00:18:45.160
it in the massive physical infrastructure being

00:18:45.160 --> 00:18:47.720
built globally to support these complex agents.

00:18:47.920 --> 00:18:50.880
Right. And we see it most clearly in TML's 200

00:18:50.880 --> 00:18:53.980
millisecond conversational bursts. The new metric

00:18:53.980 --> 00:18:56.539
for AI isn't just how smart it is on a test.

00:18:56.779 --> 00:18:59.569
Exactly. The metric is how seamlessly it flows

00:18:59.569 --> 00:19:02.589
with human latency. It is about emotional intelligence

00:19:02.589 --> 00:19:05.170
and frictionless interaction. It is a profound

00:19:05.170 --> 00:19:07.569
shift in our relationship with technology. I

00:19:07.569 --> 00:19:09.109
want to leave you with a final thought to mull

00:19:09.109 --> 00:19:11.549
over. We talked heavily about

00:19:11.549 --> 00:19:13.650
ending the awkward silence today. We talked about

00:19:13.650 --> 00:19:16.990
fluid, proactive interactions. If an AI can read

00:19:16.990 --> 00:19:19.539
your microexpressions perfectly and interrupt

00:19:19.539 --> 00:19:21.759
you with the precise level of empathy in just

00:19:21.759 --> 00:19:24.720
200 milliseconds, at what point does talking

00:19:24.720 --> 00:19:26.799
to a machine become more comforting than talking

00:19:26.799 --> 00:19:29.579
to a human? Wow. Thank you for joining us on

00:19:29.579 --> 00:19:32.680
this deep dive. We will see you next time.

00:19:32.680 --> 00:19:32.859
[Outro music]
