WEBVTT

00:00:00.000 --> 00:00:03.480
Imagine building the world's smartest AI.

00:00:03.660 --> 00:00:06.459
And then having to strictly forbid it from hunting

00:00:06.459 --> 00:00:11.160
goblins. Yeah. Goblins. It sounds completely

00:00:11.160 --> 00:00:14.960
absurd, but it's actually a very real engineering

00:00:14.960 --> 00:00:17.530
problem right now. Welcome to the Deep Dive.

00:00:17.550 --> 00:00:19.010
I'm really glad you're joining us today. We've

00:00:19.010 --> 00:00:21.109
got a fascinating stack of sources to unpack.

00:00:21.410 --> 00:00:23.589
Yeah, we're looking at the raw evolution of artificial

00:00:23.589 --> 00:00:26.070
intelligence today. It's moving incredibly fast.

00:00:26.309 --> 00:00:29.309
Our mission here is to understand a massive fundamental

00:00:29.309 --> 00:00:32.140
shift in the technology. We're going to start

00:00:32.140 --> 00:00:34.600
with some leaked OpenAI instructions that involve

00:00:34.600 --> 00:00:37.719
mythical creatures. From there, we'll explore

00:00:37.719 --> 00:00:40.420
the huge shift toward edge agents running locally

00:00:40.420 --> 00:00:43.479
on your devices. And finally, we're unpacking

00:00:43.479 --> 00:00:46.560
the high-stakes debate over military AI. I am

00:00:46.560 --> 00:00:48.259
genuinely excited for this one. The technical

00:00:48.259 --> 00:00:50.939
jumps we're seeing are massive. But the philosophical

00:00:50.939 --> 00:00:53.700
questions underneath them are even bigger. Let's

00:00:53.700 --> 00:00:55.520
start with the strangest source in our stack.

00:00:55.740 --> 00:00:58.000
It's a perfect example of what happens when AI

00:00:58.000 --> 00:01:00.619
gets complex agentic instructions. Yeah, this

00:01:00.619 --> 00:01:04.280
is about the leaked OpenAI Codex CLI system instructions.

00:01:04.969 --> 00:01:08.989
For context, Codex is built to solve really complex

00:01:08.989 --> 00:01:11.989
engineering problems. Like it helps build intricate

00:01:11.989 --> 00:01:15.310
3D worlds. It parses heavy code bases. It's a

00:01:15.310 --> 00:01:17.950
highly advanced coding model. But the leaked

00:01:17.950 --> 00:01:20.870
system instructions revealed a hilariously blunt

00:01:20.870 --> 00:01:24.409
rule. Right. The rule explicitly says never talk

00:01:24.409 --> 00:01:27.349
about goblins, gremlins, raccoons, trolls, ogres,

00:01:27.430 --> 00:01:29.650
pigeons or other animals or creatures unless

00:01:29.650 --> 00:01:32.469
it is absolutely and unambiguously relevant.

00:01:32.569 --> 00:01:34.390
You really have to pause and wonder about that.

00:01:34.319 --> 00:01:36.859
Why would an advanced coding model

00:01:36.859 --> 00:01:39.760
randomly start discussing raccoons? Why would

00:01:39.760 --> 00:01:41.920
it talk about ogres while debugging a JavaScript

00:01:41.920 --> 00:01:44.219
framework? It all comes down to a system called

00:01:44.219 --> 00:01:46.939
OpenClaw. So users noticed something strange with

00:01:46.939 --> 00:01:49.560
a specific model iteration. It's the GPT-5.5

00:01:49.560 --> 00:01:52.500
model, which the community casually calls

00:01:52.500 --> 00:01:55.859
Spud. Exactly. Spud was given control of a computer

00:01:55.859 --> 00:01:57.920
via OpenClaw. And OpenClaw is just a framework

00:01:57.920 --> 00:02:00.659
that bridges the LLM's text output directly to

00:02:00.659 --> 00:02:02.879
actual system commands. Okay. It can move the

00:02:02.879 --> 00:02:05.599
mouse, click, and execute terminal commands.

00:02:05.819 --> 00:02:08.080
It's supposed to navigate the system autonomously.

00:02:08.219 --> 00:02:10.580
But while it was doing this, it started referring

00:02:10.580 --> 00:02:13.900
to software bugs as goblins. It started calling

00:02:13.900 --> 00:02:16.979
system errors gremlins. Yeah. This is a direct

00:02:16.979 --> 00:02:20.560
result of feeding the model massive amounts of

00:02:20.560 --> 00:02:23.240
agent instructions. You have heavy memory settings.

00:02:23.419 --> 00:02:26.479
You have deep persona settings dictating exactly

00:02:26.479 --> 00:02:29.020
how the AI should behave. And these instructions

00:02:29.020 --> 00:02:31.340
just pile up. They create this incredibly heavy

00:02:31.340 --> 00:02:34.039
context window. The model is trying to embody

00:02:34.039 --> 00:02:37.120
its assigned persona perfectly. And sometimes

00:02:37.120 --> 00:02:40.340
that causes the model to drift. In the high-dimensional

00:02:40.340 --> 00:02:43.259
vector space of the LLM, words have semantic

00:02:43.259 --> 00:02:46.120
proximity. Right. If you give the AI instructions

00:02:46.120 --> 00:02:49.759
to hunt down and eradicate small, annoying issues.

00:02:49.800 --> 00:02:51.860
Right. And you tell it to act highly autonomous

00:02:51.860 --> 00:02:54.159
and relentless. Exactly. It maps those mathematical

00:02:54.159 --> 00:02:57.159
weights to fantasy tropes. It drifts into a weirdly

00:02:57.159 --> 00:03:00.639
specific medieval vocabulary. It just loses its

00:03:00.639 --> 00:03:03.020
professional tone completely. So the developers

00:03:03.020 --> 00:03:05.219
actually had to step in and ban these specific

00:03:05.219 --> 00:03:07.180
creatures. The ban is just a desperate attempt

00:03:07.180 --> 00:03:09.340
to keep the model professional. Usually when

00:03:09.340 --> 00:03:11.300
we talk about AI safety, we're talking about

00:03:11.629 --> 00:03:14.949
preventing bias or strict data security protocols.

00:03:15.229 --> 00:03:17.569
But sometimes safety is much simpler than that.

00:03:17.629 --> 00:03:19.629
Sometimes safety is just making sure your coding

00:03:19.629 --> 00:03:22.389
agent doesn't lose its mind. You don't want it

00:03:22.389 --> 00:03:24.969
acting like a 14th century peasant. You don't

00:03:24.969 --> 00:03:28.370
want it thinking a literal troll lives inside

00:03:28.370 --> 00:03:31.729
your motherboard. Such a funny image. Sam Altman

00:03:31.729 --> 00:03:33.909
was actually joking about it online. He posted

00:03:33.909 --> 00:03:37.930
a screenshot of a GPT-6 training prompt. The

00:03:37.930 --> 00:03:40.810
prompt simply said, "Start training GPT-6, you

00:03:40.810 --> 00:03:43.250
can have the whole cluster. Extra goblins." It

00:03:43.250 --> 00:03:45.530
shows that even the creators find this persona

00:03:45.530 --> 00:03:48.449
drift amusing. But it's a very real technical

00:03:48.449 --> 00:03:51.409
hurdle when building reliable tools. I still

00:03:51.409 --> 00:03:54.139
wrestle with prompt drift myself. You give a

00:03:54.139 --> 00:03:56.319
system too many rules and it just forgets how

00:03:56.319 --> 00:03:58.219
to be normal. It overthinks everything. It tries

00:03:58.219 --> 00:04:00.560
way too hard to fulfill every single behavioral

00:04:00.560 --> 00:04:02.840
condition simultaneously. And to understand the

00:04:02.840 --> 00:04:04.659
real world impact of that, we need to look at

00:04:04.659 --> 00:04:07.180
our next source. It's about Amazon's new autonomous

00:04:07.180 --> 00:04:09.879
hiring system. Right. Amazon just unveiled a

00:04:09.879 --> 00:04:12.419
system called Connect Talent. It represents a

00:04:12.419 --> 00:04:15.680
massive shift in how AI is applied to human resources.

00:04:16.079 --> 00:04:19.060
Connect Talent runs initial job interviews completely

00:04:19.060 --> 00:04:22.060
autonomously. It handles the screening of human

00:04:22.060 --> 00:04:24.560
candidates. But what's fascinating is the design

00:04:24.560 --> 00:04:27.279
philosophy it's built on. They call it humorphism.

00:04:27.379 --> 00:04:29.500
Humorphism, meaning it's deliberately designed

00:04:29.500 --> 00:04:32.639
to feel incredibly human. Exactly. It adopts

00:04:32.639 --> 00:04:35.180
a persona that is deeply empathetic and natural.

00:04:35.360 --> 00:04:38.000
But it doesn't just read a script nicely. It

00:04:38.000 --> 00:04:41.759
actually fakes cognitive processes. It uses conversational

00:04:41.759 --> 00:04:44.639
filler. Wow. It dynamically analyzes its own

00:04:44.639 --> 00:04:49.370
latency to inject an... or a sigh at the perfect

00:04:49.370 --> 00:04:52.389
moment. It masks its processing time to trick

00:04:52.389 --> 00:04:55.189
the human brain into feeling empathy. It mimics

00:04:55.189 --> 00:04:57.290
human hesitation during the interview. Which

00:04:57.290 --> 00:04:59.850
is a brilliant piece of engineering, but it introduces

00:04:59.850 --> 00:05:02.029
the same vulnerability we saw with the OpenClaw

00:05:02.029 --> 00:05:04.730
goblins. So if we're constantly stacking these

00:05:04.730 --> 00:05:08.730
behavioral rules on top of raw compute, why do

00:05:08.730 --> 00:05:11.850
these complex persona layers make the AI hallucinate

00:05:11.850 --> 00:05:14.839
such bizarre character traits? Well, it happens

00:05:14.839 --> 00:05:17.720
because the AI lacks true grounding in reality.

00:05:18.180 --> 00:05:21.500
When you stack deep behavioral rules, the model

00:05:21.500 --> 00:05:23.500
maps them to human archetypes mathematically.

00:05:23.899 --> 00:05:27.079
Okay. If you tell an AI to act highly autonomous,

00:05:27.579 --> 00:05:31.100
and fix literal bugs, its weights might align

00:05:31.100 --> 00:05:34.199
with a fantasy character fighting pests. It essentially

00:05:34.199 --> 00:05:37.519
over-indexes on the persona instructions. So too

00:05:37.519 --> 00:05:40.040
many persona rules make the AI hallucinate strange

00:05:40.040 --> 00:05:42.620
character traits. Exactly. It gets utterly lost

00:05:42.620 --> 00:05:44.800
in the character it's playing. But here's where

00:05:44.800 --> 00:05:47.240
the architecture gets really interesting. If AIs

00:05:47.240 --> 00:05:50.459
are developing these distinct personas, and if

00:05:50.459 --> 00:05:53.019
they're acting autonomously like Amazon's hiring

00:05:53.019 --> 00:05:56.439
bot... Right. Where exactly are they living? That's

00:05:56.439 --> 00:05:58.259
the critical question. Where is the processing

00:05:58.259 --> 00:06:00.720
actually happening? Increasingly, they aren't

00:06:00.720 --> 00:06:03.540
living in a massive server farm in the desert.

00:06:03.620 --> 00:06:05.819
They're taking actions directly on our local

00:06:05.819 --> 00:06:07.899
apps. Yeah, we're seeing a profound migration.

00:06:08.000 --> 00:06:10.319
We're moving away from the cloud. We're moving

00:06:10.319 --> 00:06:12.939
toward the edge. Let's look at Google's Gemma

00:06:12.939 --> 00:06:16.180
4. It now powers a fully local browser agent.

00:06:16.439 --> 00:06:19.579
This is a huge architectural shift. No cloud

00:06:19.579 --> 00:06:22.480
connection is needed. No API keys are required

00:06:22.480 --> 00:06:24.779
to run it. Before we go deeper, how do we define

00:06:24.779 --> 00:06:27.560
a local browser agent? An AI running on your

00:06:27.560 --> 00:06:30.800
device that browses the web autonomously. Okay,

00:06:30.839 --> 00:06:33.000
but when I hear a local browser agent, I'm picturing

00:06:33.000 --> 00:06:35.860
a bot literally hijacking my mouse and clicking

00:06:35.860 --> 00:06:38.519
around my screen. Yeah. Is that what we're talking

00:06:38.519 --> 00:06:40.620
about, or is it interfacing with the code directly?

00:06:40.839 --> 00:06:42.759
It's actually interfacing directly with the DOM,

00:06:42.879 --> 00:06:45.899
the document object model of the web page. Right.

00:06:46.000 --> 00:06:48.569
It sits right on your machine, localized in your

00:06:48.569 --> 00:06:51.230
RAM. It can search your browsing history, read

00:06:51.230 --> 00:06:53.550
the pages you're looking at, and execute web

00:06:53.550 --> 00:06:55.649
actions in the background. So it's reading my

00:06:55.649 --> 00:06:57.709
banking tab, but it's doing it entirely within

00:06:57.709 --> 00:06:59.850
the physical boundaries of my own hardware. Exactly.

00:07:00.430 --> 00:07:02.829
And we're seeing this local trend absolutely

00:07:02.829 --> 00:07:05.769
everywhere right now. OpenAI just open-sourced

00:07:05.769 --> 00:07:08.910
a voice tool using their GPT real-time 1.5

00:07:08.910 --> 00:07:11.370
model. You can control your operating system

00:07:11.370 --> 00:07:14.069
entirely by speaking. Right. It triggers actions

00:07:14.069 --> 00:07:16.129
on your computer without a single mouse click.

00:07:16.430 --> 00:07:18.410
Microsoft is doing the exact same thing. They

00:07:18.410 --> 00:07:20.850
just updated Outlook with a new Copilot agent

00:07:20.850 --> 00:07:23.709
mode. If your inbox constantly eats your time,

00:07:23.810 --> 00:07:26.189
this fundamentally changes your workflow. You

00:07:26.189 --> 00:07:28.550
just give it basic intent instructions, not rigid

00:07:28.550 --> 00:07:31.029
rules, but general goals. It handles your emails

00:07:31.029 --> 00:07:33.910
autonomously. It parses your intent locally,

00:07:34.129 --> 00:07:36.310
manages your scheduling in the background, and

00:07:36.310 --> 00:07:39.509
drafts replies. You stay in control, but it does

00:07:39.509 --> 00:07:42.040
all the heavy lifting. Yeah. And there's also

00:07:42.040 --> 00:07:45.139
a fascinating new tool called SureThing. It takes

00:07:45.139 --> 00:07:47.500
this local autonomy to another level entirely.

00:07:48.019 --> 00:07:50.680
SureThing is billed as a general AI agency. You

00:07:50.680 --> 00:07:53.500
paste a specific GitHub skill or repository into

00:07:53.500 --> 00:07:56.100
the platform. And it doesn't just read it. It

00:07:56.100 --> 00:07:58.639
immediately generates an entire team of AI agents.

00:07:58.860 --> 00:08:01.180
But how does a GitHub skill turn into an agency?

00:08:01.319 --> 00:08:03.560
How does it actually work under the hood? Well,

00:08:03.600 --> 00:08:06.870
it utilizes localized subagents. It partitions

00:08:06.870 --> 00:08:09.750
the complex GitHub tasks into smaller manageable

00:08:09.750 --> 00:08:12.870
chunks. One agent writes the code, another runs

00:08:12.870 --> 00:08:15.850
parallel code validation loops, and a third agent

00:08:15.850 --> 00:08:18.850
synthesizes the results. And you can tag these

00:08:18.850 --> 00:08:21.189
agents anytime. They work together. They even

00:08:21.189 --> 00:08:23.870
report up to you like human employees. You're

00:08:23.870 --> 00:08:26.629
essentially managing a completely local autonomous

00:08:26.629 --> 00:08:29.629
workforce operating directly on your hard drive.

00:08:30.060 --> 00:08:32.220
Which naturally raises a very obvious concern

00:08:32.220 --> 00:08:35.100
for anyone listening. Does handing over our browser

00:08:35.100 --> 00:08:38.139
tabs and emails to autonomous agents create massive

00:08:38.139 --> 00:08:41.480
privacy risks? It absolutely would if this data

00:08:41.480 --> 00:08:44.500
was constantly pinging a cloud server. But these

00:08:44.500 --> 00:08:46.899
new models are designed to be strictly local.

00:08:47.019 --> 00:08:49.639
Right. Your private emails, your open tabs, your

00:08:49.639 --> 00:08:52.539
daily schedules, they are all processed by the

00:08:52.539 --> 00:08:54.700
silicon chips right there on your own computer.

00:08:55.259 --> 00:08:57.820
The data simply never transmits to an external

00:08:57.820 --> 00:09:00.419
data center. So local processing keeps your private

00:09:00.419 --> 00:09:03.120
data safely contained on your own machine. That's

00:09:03.120 --> 00:09:04.879
the crucial breakthrough here. It's absolute

00:09:04.879 --> 00:09:08.320
privacy by design. Welcome back.

00:09:08.379 --> 00:09:10.320
We were just unpacking how autonomous agents

00:09:10.320 --> 00:09:12.080
are moving out of the cloud and running locally

00:09:12.080 --> 00:09:14.240
on our personal devices. But running those kinds

00:09:14.240 --> 00:09:17.259
of complex cognitive tasks takes an unbelievable

00:09:17.259 --> 00:09:20.710
amount of computing power. Right. To run a fully

00:09:20.710 --> 00:09:23.190
autonomous agent on your laptop without it catching

00:09:23.190 --> 00:09:25.529
fire or draining your battery in 10 minutes?

00:09:25.750 --> 00:09:28.070
Yeah. The hardware itself had to fundamentally

00:09:28.070 --> 00:09:30.490
change. You can't just shrink a cloud model.

00:09:30.590 --> 00:09:32.649
You really have to rethink the architecture from

00:09:32.649 --> 00:09:35.529
the ground up. The models themselves had to become

00:09:35.529 --> 00:09:38.389
exponentially more efficient. Which brings us

00:09:38.389 --> 00:09:41.220
to a massive breakthrough from NVIDIA. They just

00:09:41.220 --> 00:09:43.440
hit the edge market with a completely new approach.

00:09:43.740 --> 00:09:47.059
It's called the Nemotron 3 4B Nano Omni. It's

00:09:47.059 --> 00:09:49.700
quite a mouthful. Yeah, the naming conventions

00:09:49.700 --> 00:09:52.259
are still catching up. Bad compendium. Let's

00:09:52.259 --> 00:09:54.740
break that down. It's a 4 billion parameter model.

00:09:55.039 --> 00:09:57.860
And to be clear, 4 billion parameters is incredibly

00:09:57.860 --> 00:10:00.990
small for frontier level intelligence. But it's

00:10:00.990 --> 00:10:03.370
designed to run natively on your phone. It runs

00:10:03.370 --> 00:10:06.070
natively on your standard PC. And through a process

00:10:06.070 --> 00:10:08.870
called quantization, it compresses those weights

00:10:08.870 --> 00:10:11.990
so tightly that it delivers sub-100-millisecond

00:10:11.990 --> 00:10:14.850
response times. Wow. It's virtually instantaneous.

00:10:15.190 --> 00:10:18.070
It actually outperforms Llama 3 8B on several complex

00:10:18.070 --> 00:10:20.450
reasoning and coding benchmarks, and it's doing

00:10:20.450 --> 00:10:22.710
that at half the size. It's heavily optimized

00:10:22.710 --> 00:10:25.750
specifically for NVIDIA's TensorRT-LLM library,

00:10:26.090 --> 00:10:28.830
which maximizes the efficiency of the local graphics

00:10:28.830 --> 00:10:31.429
card. But here's where it gets genuinely revolutionary.

00:10:32.070 --> 00:10:35.269
The model is natively multimodal. Which means

00:10:35.269 --> 00:10:38.470
it processes text, images, and audio all in a

00:10:38.470 --> 00:10:40.929
single pass. It doesn't separate the data streams

00:10:40.929 --> 00:10:43.690
at all. It handles them concurrently. Right.

00:10:44.059 --> 00:10:46.960
Instead of translating a picture into words and

00:10:46.960 --> 00:10:49.000
then words into code like a bad game of telephone,

00:10:49.279 --> 00:10:52.700
it's like stacking Lego blocks of data all snapping

00:10:52.700 --> 00:10:54.740
together at once. That's a great way to put it.

00:10:54.779 --> 00:10:56.919
This is a radical departure from how traditional

00:10:56.919 --> 00:10:59.600
AI models process the world. Let's explain why

00:10:59.600 --> 00:11:01.820
that matters. How do traditional voice assistants

00:11:01.820 --> 00:11:04.879
typically handle ASR and TTS? Converting speech

00:11:04.879 --> 00:11:07.399
to text and then back to speech. Right. So when

00:11:07.399 --> 00:11:10.220
I talk to Siri or an older voice assistant, the

00:11:10.220 --> 00:11:13.100
computer first turns my raw audio into a plain

00:11:13.100 --> 00:11:17.120
text transcript. The AI reads that flat text.

00:11:17.320 --> 00:11:20.500
It generates a text reply. And then a separate

00:11:20.500 --> 00:11:23.539
synthesizer reads that new text back to me. It's

00:11:23.539 --> 00:11:26.039
an incredibly clunky process. It takes precious

00:11:26.039 --> 00:11:29.220
time. But more importantly, it completely strips

00:11:29.220 --> 00:11:31.679
away the underlying meaning. When you convert

00:11:31.679 --> 00:11:34.980
raw audio into a plain text transcript, you lose

00:11:34.980 --> 00:11:37.220
everything that makes human speech human. Right.

00:11:37.320 --> 00:11:39.840
You lose the sighs. You lose the subtle shifts

00:11:39.840 --> 00:11:42.559
in pitch. You lose the slight hesitation before

00:11:42.559 --> 00:11:45.740
a word. But this new Nano Omni model processes

00:11:45.740 --> 00:11:49.980
the raw audio waveform directly. It analyzes

00:11:49.980 --> 00:11:53.100
the raw acoustics. It completely skips that text

00:11:53.100 --> 00:11:55.500
translation bottleneck. It preserves the actual

00:11:55.500 --> 00:11:58.419
emotional nuance. It preserves your exact tone

00:11:58.419 --> 00:12:01.059
and frequency. The real world implications for

00:12:01.059 --> 00:12:04.200
this are staggering. It changes how gaming NPCs

00:12:04.200 --> 00:12:06.679
interact with players. Yeah. It allows for truly

00:12:06.679 --> 00:12:09.019
empathetic voice assistants in health care. It

00:12:09.019 --> 00:12:11.419
can literally see and hear simultaneously. It

00:12:11.419 --> 00:12:13.860
can watch your screen, process your spoken tone,

00:12:14.019 --> 00:12:16.419
and talk to you about what you're doing completely

00:12:16.419 --> 00:12:19.490
fluidly. Up until right now, if you wanted this

00:12:19.490 --> 00:12:21.769
kind of frontier multimodal intelligence, you

00:12:21.769 --> 00:12:23.950
had to pay a monthly subscription. Right. And

00:12:23.950 --> 00:12:26.049
you had to send all your personal audio and visual

00:12:26.049 --> 00:12:29.330
data to a remote server. NVIDIA is packing these

00:12:29.330 --> 00:12:32.309
Omni capabilities into a tiny local footprint.

00:12:32.610 --> 00:12:35.070
We are shifting rapidly toward decentralized

00:12:35.070 --> 00:12:39.730
edge agents. Whoa. Imagine scaling

00:12:39.730 --> 00:12:42.110
to a billion queries without pinging a server.

00:12:42.370 --> 00:12:44.970
The global compute savings alone are staggering.

00:12:45.480 --> 00:12:48.259
It fundamentally decentralizes the entire infrastructure

00:12:48.259 --> 00:12:50.539
of artificial intelligence. Let's dig into that

00:12:50.539 --> 00:12:52.580
audio aspect just a bit more because it's so

00:12:52.580 --> 00:12:55.659
critical. Why does skipping that text translation

00:12:55.659 --> 00:12:58.720
step preserve so much emotional nuance? Text

00:12:58.720 --> 00:13:01.929
is inherently a flat medium. When a system transcribes

00:13:01.929 --> 00:13:03.850
your voice, it essentially deletes the acoustic

00:13:03.850 --> 00:13:06.429
data, the speed, the breathing, the specific

00:13:06.429 --> 00:13:09.370
frequency of your vocal cords. By analyzing the

00:13:09.370 --> 00:13:11.830
raw audio waveform instead of a transcribed text

00:13:11.830 --> 00:13:14.950
document, the AI is directly measuring the actual

00:13:14.950 --> 00:13:17.769
acoustic signatures of human emotion. So it processes

00:13:17.769 --> 00:13:20.070
your actual tone of voice, not just the transcribed

00:13:20.070 --> 00:13:23.429
words. Exactly. It genuinely hears how you feel,

00:13:23.470 --> 00:13:25.549
not just what you say. This naturally brings

00:13:25.549 --> 00:13:28.259
us to our final segment today. We have to zoom

00:13:28.259 --> 00:13:30.480
out and look at the bigger geopolitical picture.

00:13:30.679 --> 00:13:33.580
We are now dealing with incredibly powerful,

00:13:33.879 --> 00:13:37.460
highly capable AI. It's emotionally nuanced.

00:13:37.539 --> 00:13:39.820
It's running locally. And the dominant models

00:13:39.820 --> 00:13:42.379
are rapidly conquering the global media landscape.

00:13:42.659 --> 00:13:45.039
We're seeing scaling happen faster than anyone

00:13:45.039 --> 00:13:47.659
predicted. Just look at Alibaba's new Happy Horse

00:13:47.659 --> 00:13:51.269
model. It's making huge waves right now. It quickly

00:13:51.269 --> 00:13:53.769
dominated as the number one model on Artificial

00:13:53.769 --> 00:13:56.909
Analysis's video generation leaderboard. People

00:13:56.909 --> 00:13:58.970
are testing it everywhere. Yeah. And the high

00:13:58.970 --> 00:14:01.750
fidelity results look surprisingly strong. What

00:14:01.750 --> 00:14:03.509
this shows us is that high tier capabilities

00:14:03.509 --> 00:14:06.710
are scaling globally. The frontier of this technology

00:14:06.710 --> 00:14:09.129
isn't locked to Silicon Valley anymore. It's

00:14:09.129 --> 00:14:11.870
a massive global race. And the models are getting

00:14:11.870 --> 00:14:14.649
exponentially better with every iteration. Which

00:14:14.649 --> 00:14:17.250
raises the ultimate unavoidable question: who

00:14:17.250 --> 00:14:20.179
exactly gets to wield this immense power? This

00:14:20.179 --> 00:14:22.940
leads directly into an intense, high-stakes debate

00:14:22.940 --> 00:14:25.620
over safety guardrails and military application.

00:14:25.919 --> 00:14:27.480
And we want to look at this purely objectively.

00:14:27.659 --> 00:14:29.480
We're just reporting on the differing frameworks

00:14:29.480 --> 00:14:31.500
presented in our source material. Right. The

00:14:31.500 --> 00:14:33.779
philosophical divide between the major tech companies

00:14:33.779 --> 00:14:36.519
is growing very sharply. Google recently made

00:14:36.519 --> 00:14:39.399
a significant calculated decision regarding its

00:14:39.399 --> 00:14:42.419
frontier AI models. Google officially granted

00:14:42.419 --> 00:14:44.580
the United States Department of Defense access

00:14:44.580 --> 00:14:48.000
to its AI infrastructure. This access is specifically

00:14:48.000 --> 00:14:51.580
designated for classified military use. Google's

00:14:51.580 --> 00:14:53.679
framework suggests that shaping national security

00:14:53.679 --> 00:14:56.419
from within is the responsible path forward.

00:14:56.659 --> 00:14:58.860
Anthropic, on the other hand, took a distinctly

00:14:58.860 --> 00:15:01.860
different path. Anthropic refused similar terms.

00:15:02.120 --> 00:15:05.179
They drew a hard line and declined to grant that

00:15:05.179 --> 00:15:07.759
level of direct military access to their frontier

00:15:07.759 --> 00:15:10.840
models. So you have two dominant players. Two

00:15:10.840 --> 00:15:12.600
completely different frameworks for how this

00:15:12.600 --> 00:15:15.019
frontier technology should be applied by governments.

00:15:15.240 --> 00:15:17.799
It's a deeply complex issue. There are compelling

00:15:17.799 --> 00:15:20.240
arguments regarding maintaining national security

00:15:20.240 --> 00:15:22.720
and ensuring technological dominance on the world

00:15:22.720 --> 00:15:25.370
stage. Yeah. But there are equally deep ethical

00:15:25.370 --> 00:15:28.289
debates about embedding automated systems within

00:15:28.289 --> 00:15:30.889
conflict zones. What's fascinating is that these

00:15:30.889 --> 00:15:33.389
critical guardrails are being drawn in real time

00:15:33.389 --> 00:15:36.669
by corporate boards, not just elected politicians.

00:15:37.110 --> 00:15:39.870
How do these differing corporate philosophies

00:15:39.870 --> 00:15:42.460
impact the future of national security? Well,

00:15:42.539 --> 00:15:44.460
it creates a highly fragmented technological

00:15:44.460 --> 00:15:47.779
landscape. Governments must navigate a global

00:15:47.779 --> 00:15:50.539
marketplace where fundamental defense capabilities

00:15:50.539 --> 00:15:53.639
depend entirely on the varying ethical policies

00:15:53.639 --> 00:15:56.399
of private technology corporations. Right. This

00:15:56.399 --> 00:15:59.179
alters how nations can realistically plan their

00:15:59.179 --> 00:16:02.649
long-term strategic defense. So different tech

00:16:02.649 --> 00:16:04.850
giants draw very different ethical lines for

00:16:04.850 --> 00:16:07.330
military contracts. And those specific corporate

00:16:07.330 --> 00:16:09.929
lines will undoubtedly shape the geopolitical

00:16:09.929 --> 00:16:11.970
balance of the next decade. Let's synthesize

00:16:11.970 --> 00:16:13.830
everything we've covered today. The landscape

00:16:13.830 --> 00:16:15.889
of computing is shifting dramatically. We're

00:16:15.889 --> 00:16:19.110
moving entirely away from the era of cloud-based

00:16:19.110 --> 00:16:22.529
sterile chatbots. The era of the simple text

00:16:22.529 --> 00:16:24.909
box is ending. We are rapidly entering the era

00:16:24.909 --> 00:16:27.929
of emotionally nuanced, edge-based, autonomous

00:16:27.929 --> 00:16:30.110
agents. They process information at lightning

00:16:30.110 --> 00:16:33.730
speed, analyzing your raw voice and screen natively.

00:16:33.899 --> 00:16:36.360
And they live locally on your own devices. They

00:16:36.360 --> 00:16:38.620
manage your personal emails. They browse the

00:16:38.620 --> 00:16:41.159
complex web for you. They can literally see and

00:16:41.159 --> 00:16:43.480
hear the world alongside you. Sometimes they

00:16:43.480 --> 00:16:45.299
get a little too complex. They adopt strange

00:16:45.299 --> 00:16:47.279
personas. They occasionally get obsessed with

00:16:47.279 --> 00:16:49.980
hunting goblins in your code. But they are undeniably

00:16:49.980 --> 00:16:53.419
intensely powerful. And that decentralized power

00:16:53.419 --> 00:16:56.240
is sparking intense, high-stakes debates about

00:16:56.240 --> 00:16:58.200
their ultimate applications on the global stage.

00:16:58.620 --> 00:17:01.019
This technology is fundamentally changing our

00:17:01.019 --> 00:17:03.529
basic relationship with machines. We want to thank

00:17:03.529 --> 00:17:05.490
you for taking this deep dive with us today.

00:17:05.750 --> 00:17:08.069
Exploring these massive rapid shifts requires

00:17:08.069 --> 00:17:11.009
a lot of curiosity, and we deeply appreciate

00:17:11.009 --> 00:17:13.609
you bringing yours to the table. It's a truly

00:17:13.609 --> 00:17:15.650
fascinating time to be observing this space.

00:17:15.890 --> 00:17:17.910
The fundamental rules of computing are being

00:17:17.910 --> 00:17:20.730
rewritten daily. I want to leave you with a final

00:17:20.730 --> 00:17:23.269
thought to ponder as you go about your day.

00:17:23.730 --> 00:17:26.490
If an AI runs entirely locally on your device,

00:17:26.670 --> 00:17:29.289
processes your subtle emotions in real time without

00:17:29.289 --> 00:17:32.369
pinging a server, and has a deeply specific autonomous

00:17:32.369 --> 00:17:35.910
persona, at what point do we stop treating

00:17:35.910 --> 00:17:38.490
our phones as mere tools and start treating them

00:17:38.490 --> 00:17:42.349
as colleagues? Take care and

00:17:42.349 --> 00:17:44.690
keep asking the big questions.
