WEBVTT

00:00:00.000 --> 00:00:02.299
Imagine slipping on a virtual reality headset

00:00:02.299 --> 00:00:05.120
right now. You walk directly into Monica's apartment

00:00:05.120 --> 00:00:07.780
from Friends, you pick up a heavy ceramic coffee

00:00:07.780 --> 00:00:10.580
mug, you drop it, and it shatters across the

00:00:10.580 --> 00:00:13.740
floor. It shatters with perfectly accurate

00:00:13.740 --> 00:00:16.600
physics. This isn't just some polished video game

00:00:16.600 --> 00:00:19.239
environment, it's a leaked AI building physical

00:00:19.239 --> 00:00:22.359
reality from scratch. We are also looking at

00:00:22.359 --> 00:00:25.379
a brand-new vision model. It maps the physical

00:00:25.379 --> 00:00:28.660
world in real time. And it does this using a

00:00:28.660 --> 00:00:34.049
cheap camera. Welcome to the Deep Dive. We have a genuinely

00:00:34.049 --> 00:00:37.509
fascinating roadmap for you today. We're unpacking

00:00:37.509 --> 00:00:40.689
some massive new leaks from OpenAI. We're also

00:00:40.689 --> 00:00:44.469
exploring a staggering $100 billion compute race.

00:00:44.609 --> 00:00:47.350
This massive arms race is happening quietly behind

00:00:47.350 --> 00:00:50.030
closed doors. Finally, we'll break down a major

00:00:50.030 --> 00:00:52.670
breakthrough in real-time 3D vision. Let's start

00:00:52.670 --> 00:00:55.189
by looking closely at the situation inside OpenAI.

00:00:55.679 --> 00:00:58.500
Late last year, Sam Altman issued a massive code

00:00:58.500 --> 00:01:00.979
red. Yeah, he did. That internal warning triggered

00:01:00.979 --> 00:01:03.259
a really aggressive two-pronged release strategy.

00:01:03.759 --> 00:01:06.180
They are feeling immense pressure from their

00:01:06.180 --> 00:01:08.609
competitors right now. The underlying context

00:01:08.609 --> 00:01:12.010
here is totally crucial to understand. They completely

00:01:12.010 --> 00:01:14.730
missed their active user goals for late 2025.

00:01:15.390 --> 00:01:18.650
They originally wanted 1 billion weekly active

00:01:18.650 --> 00:01:21.310
users. Instead, they watched Anthropic experience

00:01:21.310 --> 00:01:24.549
a massive revenue surge. OpenAI also dealt with

00:01:24.549 --> 00:01:27.349
some undeniably bad vibes regarding leadership.

00:01:27.689 --> 00:01:30.010
Right. So they are aggressively striking back

00:01:30.010 --> 00:01:32.530
right now. That brings us to their first major

00:01:32.530 --> 00:01:35.680
leaked project. It is currently operating under

00:01:35.680 --> 00:01:39.640
the internal code name Spud. OpenAI is actively

00:01:39.640 --> 00:01:42.040
A/B testing this model in the wild. Yeah, it's

00:01:42.040 --> 00:01:44.540
out there. Some users are randomly seeing it

00:01:44.540 --> 00:01:48.560
inside the GPT-5.4 Pro interface. Spud apparently

00:01:48.560 --> 00:01:51.140
represents highly advanced spatial reasoning

00:01:51.140 --> 00:01:53.719
capabilities. Right, and spatial reasoning changes

00:01:53.719 --> 00:01:56.500
the entire paradigm. A standard language

00:01:56.500 --> 00:01:58.700
model simply predicts the next word. It guesses

00:01:58.700 --> 00:02:00.900
what text logically follows your specific prompt.

00:02:01.159 --> 00:02:03.540
But Spud is predicting the next physical state

00:02:03.540 --> 00:02:06.379
instead. If you drop a virtual mug, it calculates

00:02:06.379 --> 00:02:08.860
gravity and understands momentum, mass, and physical

00:02:08.860 --> 00:02:11.319
collisions in real time. That's how it rebuilt

00:02:11.319 --> 00:02:13.500
a functional 3D version of Monica's apartment.

00:02:13.819 --> 00:02:16.300
You can interact with the incredibly realistic

00:02:16.300 --> 00:02:19.560
physics engine inside. It even generates incredible

00:02:19.560 --> 00:02:22.479
Minecraft-style voxel art from basic prompts.

00:02:22.800 --> 00:02:25.919
What exactly is the mechanism behind voxel art?

00:02:26.159 --> 00:02:29.379
Voxel art means 3D digital models built from

00:02:29.680 --> 00:02:32.680
tiny cubes. Exactly. You use simple language

00:02:32.680 --> 00:02:36.000
to describe a complex structure. The AI then

00:02:36.000 --> 00:02:38.039
constructs that physical environment perfectly.

00:02:38.300 --> 00:02:40.259
So it is exactly like stacking Lego blocks of

00:02:40.259 --> 00:02:42.990
data. That is the perfect analogy for the specific

00:02:42.990 --> 00:02:45.750
technology. It's a massive shift in underlying

00:02:45.750 --> 00:02:49.310
digital capability. Spud is also producing scalable

00:02:49.310 --> 00:02:52.750
SVG designs for everyday developers. Let's explain

00:02:52.750 --> 00:02:56.150
why that specific detail actually matters. Scalable

00:02:56.150 --> 00:02:58.530
vector graphics are math-based code for drawing

00:02:58.530 --> 00:03:01.310
crisp images. They don't rely on a fixed grid

00:03:01.310 --> 00:03:03.509
of colored pixels. This means you can scale them

00:03:03.509 --> 00:03:05.780
infinitely without any blurring. Spud generates

00:03:05.780 --> 00:03:08.860
this complex mathematical code incredibly efficiently.

00:03:09.099 --> 00:03:11.319
It accomplishes this with significantly fewer

00:03:11.319 --> 00:03:14.400
lines of code. It creates incredibly clean, professional-grade,

00:03:14.400 --> 00:03:17.500
minimalistic layouts. Reducing the lines

00:03:17.500 --> 00:03:19.699
of code is actually a massive deal. It means

00:03:19.699 --> 00:03:21.620
the generated graphics are far leaner and more

00:03:21.620 --> 00:03:24.680
efficient. There is simply less room for hidden

00:03:24.680 --> 00:03:27.780
errors to accumulate. It's heavily outperforming

00:03:27.780 --> 00:03:30.379
Claude Opus 4.7 in technical precision right

00:03:30.379 --> 00:03:32.919
now. This connects directly to the mysterious

00:03:32.919 --> 00:03:36.379
Chatbot Arena leaks recently. We've seen three

00:03:36.379 --> 00:03:38.860
very strange models testing on the platform.

00:03:39.180 --> 00:03:42.800
They are named Masking Tape Alpha, Gaffer Tape

00:03:42.800 --> 00:03:45.039
Alpha, and Packing Tape Alpha. Classic naming

00:03:45.039 --> 00:03:47.599
scheme. Insiders confirm this is actually a model

00:03:47.599 --> 00:03:51.340
called GPT Image 2. It was built to directly

00:03:51.340 --> 00:03:54.840
rival Google's Nano Banana Pro. They frequently

00:03:54.840 --> 00:03:57.780
use tape names to mask their true identity. But

00:03:57.780 --> 00:03:59.979
the ultimate strategic goal here is remarkably

00:03:59.979 --> 00:04:02.520
clear. They are betting heavily on hyper-realistic

00:04:02.520 --> 00:04:05.400
personal avatars right now. These are gorgeous

00:04:05.400 --> 00:04:08.539
Studio Ghibli-style personalized digital creations.

00:04:08.860 --> 00:04:10.680
They think these avatars will trigger a viral

00:04:10.680 --> 00:04:13.159
user surge. They desperately want to replicate

00:04:13.159 --> 00:04:15.719
the massive growth of early 2025. The visual

00:04:15.719 --> 00:04:17.560
quality is supposed to be absolutely breathtaking.

00:04:17.899 --> 00:04:20.040
They need this viral hit to reach that

00:04:20.040 --> 00:04:23.000
billion-user goal. Whoa. Imagine scaling to a billion

00:04:23.819 --> 00:04:27.740
queries. The digital infrastructure

00:04:27.740 --> 00:04:31.459
required to support that is hard to grasp. How

00:04:31.459 --> 00:04:34.399
does shifting from flat text to 3D spatial environments

00:04:34.399 --> 00:04:38.259
change OpenAI's ultimate endgame? It fundamentally

00:04:38.259 --> 00:04:40.740
transforms their entire core business model.

00:04:41.199 --> 00:04:43.420
They are evolving away from a simple chatbot

00:04:43.420 --> 00:04:46.240
utility. They are becoming an immersive world-building

00:04:46.240 --> 00:04:49.160
platform instead. They want to be the

00:04:49.160 --> 00:04:52.319
primary engine for future virtual realities.

00:04:52.560 --> 00:04:54.399
So they're trading flat text for interactive,

00:04:54.540 --> 00:04:56.839
personalized physical reality. Exactly. It's

00:04:56.839 --> 00:04:58.560
a completely different level of technological

00:04:58.560 --> 00:05:01.300
ambition. OpenAI is trying to win by building

00:05:01.300 --> 00:05:04.160
interactive digital worlds. But Anthropic is

00:05:04.160 --> 00:05:06.399
taking a completely different, highly physical

00:05:06.399 --> 00:05:09.180
path. They are making truly monumental physical

00:05:09.180 --> 00:05:11.040
infrastructure plays right now. Yeah, they are.

00:05:11.160 --> 00:05:13.379
Amazon is quietly investing up to $25 billion

00:05:13.379 --> 00:05:17.019
more. This massively adds to their existing $8

00:05:17.019 --> 00:05:19.699
billion stake. The underlying infrastructure

00:05:19.699 --> 00:05:22.360
math here is genuinely mind-bending. Anthropic

00:05:22.360 --> 00:05:25.579
plans to spend $100 billion on AWS chips. They

00:05:25.579 --> 00:05:27.360
are rolling this out over the next 10 years.

00:05:27.500 --> 00:05:30.860
They are actively securing 5 gigawatts of power

00:05:30.860 --> 00:05:33.279
for compute. Let's put that 5 gigawatts into

00:05:33.279 --> 00:05:36.709
proper physical perspective. One single gigawatt

00:05:36.709 --> 00:05:39.410
can easily power a medium-sized American city.

00:05:39.990 --> 00:05:41.970
Anthropic is securing enough power

00:05:41.970 --> 00:05:45.230
to run five cities, and it's all feeding into

00:05:45.230 --> 00:05:48.089
a massive digital brain. They desperately need

00:05:48.089 --> 00:05:51.250
this to meet surging global cloud demand. Right.

00:05:51.350 --> 00:05:53.649
And we need to examine the actual user friction

00:05:53.649 --> 00:05:56.290
here. The newly released Claude design tool is

00:05:56.290 --> 00:05:58.610
going highly viral. People are creating wild

00:05:58.610 --> 00:06:01.529
visuals and complex mockups from basic prompts.

00:06:01.730 --> 00:06:04.069
Yeah. The design outputs are undeniably spectacular

00:06:04.069 --> 00:06:06.910
and widely shared online. But their flagship

00:06:06.910 --> 00:06:08.810
reasoning model is currently struggling quite

00:06:08.810 --> 00:06:12.610
badly. Claude Opus 4.7 is facing intense criticism

00:06:12.610 --> 00:06:15.290
from everyday developers. Users have mockingly

00:06:15.290 --> 00:06:18.569
dubbed it Gaslitus 4.7 online recently. It is

00:06:18.569 --> 00:06:20.910
wildly hallucinating digital files that simply

00:06:20.910 --> 00:06:23.370
do not exist. It delivers incredibly stubborn,

00:06:23.470 --> 00:06:25.870
completely incorrect outputs during complex coding

00:06:25.870 --> 00:06:28.709
tasks. It will literally argue with a developer

00:06:28.709 --> 00:06:31.850
over basic code. It refuses to correct itself,

00:06:32.029 --> 00:06:35.430
even when clearly shown the error. This definitely

00:06:35.430 --> 00:06:37.790
raises some major architectural concerns for

00:06:37.790 --> 00:06:40.149
their immediate future. Does this mean there

00:06:40.149 --> 00:06:43.410
is serious trouble ahead for Opus 4.8? I still

00:06:43.410 --> 00:06:47.069
wrestle with prompt drift myself. It is

00:06:47.069 --> 00:06:49.329
incredibly frustrating when the model forgets

00:06:49.329 --> 00:06:52.329
initial instructions. Over a long conversation,

00:06:52.589 --> 00:06:55.250
it simply loses the main logical thread. But

00:06:55.250 --> 00:06:57.449
reliability isn't strictly an Anthropic problem

00:06:57.449 --> 00:07:00.420
right now. That is entirely true across the entire

00:07:00.420 --> 00:07:03.540
global tech industry. ChatGPT recently suffered

00:07:03.540 --> 00:07:06.699
a massive, completely worldwide digital outage.

00:07:06.740 --> 00:07:09.120
It totally took down Codex and all their API

00:07:09.120 --> 00:07:11.819
services simultaneously. Thousands of developers

00:07:11.819 --> 00:07:13.920
couldn't work, code, or research anything online.

00:07:14.399 --> 00:07:16.519
OpenAI says the root issue is currently under

00:07:16.519 --> 00:07:19.019
active investigation. It highlights the deep

00:07:19.019 --> 00:07:21.920
fragility of this entire AI ecosystem. We rely

00:07:21.920 --> 00:07:24.000
so heavily on these hidden servers for our daily

00:07:24.000 --> 00:07:27.029
work. With billions spent on compute, why are

00:07:27.029 --> 00:07:29.209
flagship models still stubbornly hallucinating

00:07:29.209 --> 00:07:32.410
files? Massive processing power doesn't automatically

00:07:32.410 --> 00:07:36.170
fix underlying architectural flaws. Neural networks

00:07:36.170 --> 00:07:39.129
don't actually think in a logical step-by-step

00:07:39.129 --> 00:07:42.149
progression. They analyze massive patterns to

00:07:42.149 --> 00:07:45.350
guess the most likely outcome. They are constantly

00:07:45.350 --> 00:07:47.990
predicting the next most likely token in a sequence.

00:07:48.410 --> 00:07:51.029
If the underlying logic path is fundamentally

00:07:51.029 --> 00:07:55.110
flawed, compute doesn't help. Adding more compute

00:07:55.110 --> 00:07:57.689
just makes the model confidently wrong much faster.

00:07:57.889 --> 00:08:00.889
The fundamental reasoning pathways still desperately

00:08:00.889 --> 00:08:04.189
need radical structural improvement. Throwing

00:08:04.189 --> 00:08:06.569
raw compute at a model won't magically cure its

00:08:06.569 --> 00:08:08.689
fundamental reasoning bugs. That is the harsh

00:08:08.689 --> 00:08:11.449
reality of the current generation. This massive

00:08:11.449 --> 00:08:14.089
scale and unreliability isn't just an abstract

00:08:14.089 --> 00:08:16.730
theoretical problem. It's fundamentally changing

00:08:16.730 --> 00:08:19.290
how everyday businesses are actively operating

00:08:19.290 --> 00:08:21.750
today. Everyday workflows are constantly bending

00:08:21.750 --> 00:08:24.730
around these new AI capabilities. Yeah. Adobe

00:08:24.730 --> 00:08:27.410
recently made a very revealing public statement

00:08:27.410 --> 00:08:30.550
about this shift. They flatly admitted AI could

00:08:30.550 --> 00:08:33.460
deeply disrupt their own massive business. They

00:08:33.460 --> 00:08:34.899
aren't just sitting back and watching it happen,

00:08:34.960 --> 00:08:38.139
though. They immediately released new CX enterprise

00:08:38.139 --> 00:08:41.320
agents for corporate businesses. These digital

00:08:41.320 --> 00:08:44.100
agents completely automate complex marketing

00:08:44.100 --> 00:08:47.440
and sales workflows. Adobe is also actively partnering

00:08:47.440 --> 00:08:50.559
with OpenAI and Anthropic directly. New developer

00:08:50.559 --> 00:08:52.940
tools are rapidly emerging to manage this digital

00:08:52.940 --> 00:08:55.980
chaos. We have to look closely at a tracking

00:08:55.980 --> 00:08:59.309
tool called Waydev. It tracks agent-generated

00:08:59.309 --> 00:09:01.870
code from the IDE straight to production. Right.

00:09:01.990 --> 00:09:04.490
Agent-generated code is software written entirely

00:09:04.490 --> 00:09:08.289
by AI systems. An IDE is basically the digital

00:09:08.289 --> 00:09:11.169
workspace where human coders type. Waydev tracks

00:09:11.169 --> 00:09:13.730
exactly which specific AI agent wrote the final

00:09:13.730 --> 00:09:16.360
code. It thoroughly monitors the total tokens

00:09:16.360 --> 00:09:18.919
consumed during the entire process. Tokens are

00:09:18.919 --> 00:09:20.840
basically the small chunks of text AI reads.

00:09:21.080 --> 00:09:23.500
Waydev tracks every single token to calculate

00:09:23.500 --> 00:09:25.899
exact financial costs. It calculates the specific

00:09:25.899 --> 00:09:28.679
cost per individual pull request. A pull request

00:09:28.679 --> 00:09:30.460
is a formal proposal for a new code change.

00:09:30.799 --> 00:09:33.960
It also perfectly tracks overall acceptance rates

00:09:33.960 --> 00:09:37.539
and live deployment status. We're seeing a massive

00:09:37.539 --> 00:09:40.220
explosion of these specialized ecosystem tracking

00:09:40.220 --> 00:09:42.879
tools. Look at Gemini Notebooks competing heavily

00:09:42.879 --> 00:09:44.789
with NotebookLM right now. Yeah, that's a big

00:09:44.789 --> 00:09:47.389
one. Their new sync loop workflow is a total game

00:09:47.389 --> 00:09:50.250
changer today. It synchronizes your

00:09:50.250 --> 00:09:53.090
complex research across multiple digital

00:09:53.090 --> 00:09:56.830
platforms. We're also seeing fascinating new physical

00:09:56.830 --> 00:09:59.809
hardware integrations emerging right now. The

00:09:59.809 --> 00:10:03.190
Dune context-aware Mac keypad is a perfect physical

00:10:03.190 --> 00:10:05.730
example. It automates your workflows

00:10:05.730 --> 00:10:08.950
and your complex digital meetings. Wow. It automatically

00:10:08.950 --> 00:10:11.370
changes its physical button behavior based on

00:10:11.370 --> 00:10:13.929
your foreground app. There's also the fascinating

00:10:13.929 --> 00:10:17.889
new Claude Desktop Buddy project online. It exposes

00:10:17.889 --> 00:10:20.289
a lightweight API directly from the Claude Desktop

00:10:20.289 --> 00:10:22.929
app. This allows you to connect digital workloads

00:10:22.929 --> 00:10:25.370
to physical microcontrollers. Right. You can

00:10:25.370 --> 00:10:27.389
physically bridge the gap between digital code

00:10:27.389 --> 00:10:30.779
and physical reality. But we must sharply pivot

00:10:30.779 --> 00:10:33.639
to the macro security risks here. Governments

00:10:33.639 --> 00:10:36.580
are desperately racing to secure their AI leadership

00:10:36.580 --> 00:10:40.279
globally. The UK just launched a massive sovereign

00:10:40.279 --> 00:10:44.860
AI fund. It's a £500 million domestic

00:10:44.860 --> 00:10:47.120
investment initiative. They're offering massive

00:10:47.120 --> 00:10:50.139
capital, supercomputers and rapid visas for international

00:10:50.139 --> 00:10:53.039
startups. Governments are clearly recognizing

00:10:53.039 --> 00:10:55.919
this as a critical national priority. But we

00:10:55.919 --> 00:10:58.919
have to firmly contrast that with security

00:10:58.919 --> 00:11:01.899
realities. Yeah, we do. A clever piece of cheat malware

00:11:01.899 --> 00:11:05.440
recently led to a massive Vercel breach. Vercel

00:11:05.440 --> 00:11:07.679
is a highly popular platform for deploying web

00:11:07.679 --> 00:11:10.620
applications. Hackers used this malware to completely

00:11:10.620 --> 00:11:13.159
infiltrate an active workspace. Once inside,

00:11:13.360 --> 00:11:15.399
they didn't just steal a few static text files,

00:11:15.519 --> 00:11:17.759
they gained full access to the underlying automated

00:11:17.759 --> 00:11:20.860
deployment systems. It completely exposed highly

00:11:20.860 --> 00:11:23.059
sensitive Vercel workspace data to malicious

00:11:23.059 --> 00:11:25.759
hackers. The hacker group is now actively demanding

00:11:25.759 --> 00:11:29.019
a $2 million ransom. It's a massive, glaring

00:11:29.019 --> 00:11:31.120
reminder of the hidden dangers lurking here.

00:11:31.279 --> 00:11:34.740
Giving AI agents full workspace access creates

00:11:34.740 --> 00:11:38.500
incredibly serious institutional risks. of AI

00:11:38.500 --> 00:11:41.539
agents total workspace access to automate coding?

00:11:41.740 --> 00:11:44.460
Are we just automating our own security breaches?

00:11:44.700 --> 00:11:47.759
Speed and deep integration currently vastly outpace

00:11:47.759 --> 00:11:50.860
basic security protocols. We are letting AI bypass

00:11:50.860 --> 00:11:53.460
traditional human review processes entirely.

00:11:53.879 --> 00:11:56.539
The automated systems deploy the generated code

00:11:56.539 --> 00:11:58.960
without any friction. Right. We are definitely

00:11:58.960 --> 00:12:01.940
leaving massive vulnerabilities open for global

00:12:01.940 --> 00:12:04.899
exploitation. Deep workspace integration creates

00:12:04.899 --> 00:12:19.200
incredible speed, but opens terrifying... Yeah.

00:12:34.139 --> 00:12:35.879
This is a genuinely fascinating breakthrough

00:12:35.879 --> 00:12:38.519
in digital mapping technology. Traditional 3D

00:12:38.519 --> 00:12:41.000
mapping has always had a massive computing bottleneck.

00:12:41.000 --> 00:12:43.919
It usually requires feeding a computer

00:12:43.919 --> 00:12:46.840
a massive mountain of digital photos. You have

00:12:46.840 --> 00:12:48.440
to wait until you're completely done recording

00:12:48.440 --> 00:12:51.059
everything physically. Right. Then the computer

00:12:51.059 --> 00:12:54.080
slowly processes that massive visual data into

00:12:54.080 --> 00:12:56.759
a map. That offline processing takes an enormous

00:12:56.759 --> 00:12:59.220
amount of computational time. You can't actually

00:12:59.220 --> 00:13:02.299
see the digital map while you are walking. But

00:13:02.299 --> 00:13:05.299
Lingbot Map. It's a completely new open-source

00:13:05.299 --> 00:13:08.700
see-as-you-go visual model. It eliminates

00:13:08.700 --> 00:13:10.759
that frustrating waiting period entirely.

00:13:11.139 --> 00:13:14.159
It's a fully streaming 3D reconstruction foundation

00:13:14.159 --> 00:13:17.639
model. A foundation model is a large, adaptable

00:13:17.639 --> 00:13:21.360
AI model trained on vast amounts of data. Right. And this specific

00:13:21.360 --> 00:13:24.159
model builds a digital world in real time. It

00:13:24.159 --> 00:13:26.840
processes the complex visual data frame by frame

00:13:26.840 --> 00:13:29.379
as you physically move. Traditional mapping heavily

00:13:29.379 --> 00:13:31.639
relies on incredibly expensive hardware like

00:13:31.639 --> 00:13:34.720
LiDAR sensors. LiDAR shoots precise lasers to

00:13:34.720 --> 00:13:37.220
measure physical distance in a room. But Lingbot Map

00:13:37.220 --> 00:13:39.679
achieves the exact same precise result using

00:13:39.679 --> 00:13:42.600
pure software. It analyzes the changing pixels

00:13:42.600 --> 00:13:45.080
from a standard $10 phone camera. The technical

00:13:45.080 --> 00:13:47.659
performance specs on this are genuinely

00:13:50.759 --> 00:13:53.559
impressive. It maintains a rock-solid 20

00:13:53.559 --> 00:13:56.039
frames per second processing rate. It

00:13:56.039 --> 00:13:59.399
holds this even over marathon sequences of

00:13:59.399 --> 00:14:03.200
10,000 frames. Wow. Most importantly, it tackles

00:14:03.200 --> 00:14:06.019
the infamous digital drift problem. Drift

00:14:03.200 --> 00:14:06.019
is a notoriously huge issue in traditional digital

00:14:06.019 --> 00:14:09.139
mapping. Imagine walking completely blindfolded

00:14:09.139 --> 00:14:11.600
and trying to map a room in your head. Eventually,

00:14:11.639 --> 00:14:14.460
your internal mental map drifts completely away

00:14:14.460 --> 00:14:17.820
from reality. Early AI vision models suffered

00:14:17.820 --> 00:14:20.200
from this exact same navigational confusion.

00:14:20.620 --> 00:14:22.960
Let's clearly explain that specific tracking

00:14:22.960 --> 00:14:25.720
metric for a moment. Absolute trajectory error

00:14:25.720 --> 00:14:28.279
measures how far the estimated camera path drifts from reality.

00:14:28.649 --> 00:14:31.149
Lingbot Map drastically cuts this error

00:14:31.149 --> 00:14:34.730
by 57 to 75%. That's compared to all the previous

00:14:34.730 --> 00:14:36.529
streaming methods currently available globally.

00:14:37.289 --> 00:14:39.929
On the brutally complex Oxford Spires dataset,

00:14:40.190 --> 00:14:42.750
it performed beautifully. It hit an

00:14:46.470 --> 00:14:49.009
overall tracking error of just 6.42 meters.

00:14:49.009 --> 00:14:51.830
That actually beats high-end offline

00:14:49.009 --> 00:14:51.830
models that see the whole video. And it's officially

00:14:51.830 --> 00:14:54.830
released under the Apache 2.0 open-source

00:14:57.429 --> 00:15:00.409
license. That means it is open source

00:14:57.429 --> 00:15:00.409
and free for anyone. Anyone can pull the public

00:15:00.409 --> 00:15:03.950
repo and run demo.py right now. You can easily

00:15:03.950 --> 00:15:06.720
build a 3D viewer in your browser today. If an

00:15:06.720 --> 00:15:08.860
open-source model can map the world flawlessly

00:15:08.860 --> 00:15:11.799
with a $10 camera, what happens to the multi-billion-dollar

00:15:11.799 --> 00:15:14.460
LiDAR and high-end sensor industry?

00:15:14.809 --> 00:15:17.110
Expensive physical sensors will likely become

00:15:17.110 --> 00:15:20.350
highly niche, reserved for mission-critical tasks. Mass-market

00:15:20.350 --> 00:15:22.809
augmented reality and delivery drones will totally

00:15:22.809 --> 00:15:25.090
rely on cheap vision models. The intelligent

00:15:25.090 --> 00:15:27.450
software will simply replace the complex hardware

00:15:27.450 --> 00:15:30.129
components entirely. Expensive physical sensors

00:15:30.129 --> 00:15:33.389
become obsolete as intelligent software extracts

00:15:33.389 --> 00:15:35.710
perfect data from cheap lenses. It's a truly

00:15:35.710 --> 00:15:38.590
profound shift in how machines understand space

00:15:38.590 --> 00:15:41.350
natively. We've spent years building complex

00:15:41.350 --> 00:15:43.610
robots that have to stop and think. They normally

00:15:43.610 --> 00:15:45.490
rely entirely on pre-mapped environments

00:15:45.490 --> 00:15:48.210
to function safely. Right. Lingbot Map

00:15:48.210 --> 00:15:51.090
fundamentally turns mapping and exploring into

00:15:51.090 --> 00:15:53.669
a single fluid motion. We're reaching the end

00:15:53.669 --> 00:15:56.009
of our deep dive for today. Let's synthesize

00:15:56.009 --> 00:15:58.210
the big ideas we've uncovered together

00:15:58.210 --> 00:16:01.970
here. We are watching AI aggressively evolve

00:16:01.970 --> 00:16:05.970
from a stubborn text generator. It's moving from

00:16:05.970 --> 00:16:09.009
hallucinating digital files into an engine of

00:16:09.009 --> 00:16:11.799
spatial reasoning. The technological shift is

00:16:11.799 --> 00:16:14.519
happening across multiple different complex digital

00:16:14.519 --> 00:16:17.580
fronts. It's fundamentally changing how machines

00:16:17.580 --> 00:16:19.820
perceive and interact with physical reality.

00:16:20.100 --> 00:16:23.919
We see OpenAI's Spud recreating sitcom apartments

00:16:23.919 --> 00:16:27.279
with absolutely perfect physics. We see Lingbot

00:16:27.279 --> 00:16:29.759
Map turning environmental mapping into a single

00:16:29.759 --> 00:16:32.639
fluid motion. This could directly power autonomous

00:16:32.639 --> 00:16:35.700
delivery drones and lightweight AR glasses.

00:16:36.299 --> 00:16:38.740
The strict boundary between physical reality

00:16:38.740 --> 00:16:41.220
and digital generation is collapsing. This incredible

00:16:41.220 --> 00:16:43.799
evolution is being fueled by astronomical global

00:16:43.799 --> 00:16:46.299
capital investments. Anthropic's $100 billion

00:16:46.299 --> 00:16:49.419
compute plan is a truly perfect example. But

00:16:49.419 --> 00:16:51.879
we also see malicious hackers aggressively exploiting

00:16:51.879 --> 00:16:54.559
this new ecosystem. The invisible digital risks

00:16:54.559 --> 00:16:56.419
are growing just as fast as the capabilities.

00:16:56.919 --> 00:16:59.980
If our AR glasses can perfectly anchor virtual

00:16:59.980 --> 00:17:02.299
furniture in our real-world living rooms in

00:17:02.299 --> 00:17:05.579
real time, and AI can recreate spaces with perfect...

00:17:12.999 --> 00:17:15.059
Thank you so much for joining us on this deep

00:17:15.059 --> 00:17:17.019
dive. It's always a genuine pleasure to explore

00:17:17.019 --> 00:17:19.420
these fascinating frontiers together. We'll catch

00:17:19.420 --> 00:17:21.099
you next time as the future keeps unfolding.

00:17:21.319 --> 00:17:22.140
[Outro music]
