WEBVTT

00:00:00.000 --> 00:00:01.780
Imagine for a second you're sitting in front

00:00:01.780 --> 00:00:04.280
of a blank chat window, the cursor is just blinking,

00:00:05.080 --> 00:00:07.099
and you're not about to ask for a recipe or,

00:00:07.099 --> 00:00:09.080
you know, for a summary of some confusing email.

00:00:09.220 --> 00:00:12.259
Right. You type a single prompt and you ask it

00:00:12.259 --> 00:00:15.560
to build a working machine, specifically a Nintendo

00:00:15.560 --> 00:00:19.920
Game Boy emulator from scratch. Usually, this

00:00:19.920 --> 00:00:21.780
is the part where the whole illusion just breaks.

00:00:22.280 --> 00:00:25.920
The model gives you some generic apology, or

00:00:25.920 --> 00:00:28.320
maybe a few broken snippets of code that don't

00:00:28.320 --> 00:00:31.500
actually run. But this time, it didn't do that.

00:00:31.780 --> 00:00:36.119
It generated over 3,000 lines of code in one

00:00:36.119 --> 00:00:39.229
continuous flow. 3,000 lines. One shot. It built

00:00:39.229 --> 00:00:41.369
the CPU emulation, the memory management, the

00:00:41.369 --> 00:00:44.149
input handling. It mimicked the actual hardware.

00:00:44.609 --> 00:00:47.509
And when the user ran it, it played real games.

00:00:47.750 --> 00:00:50.609
It played Tetris. Honestly, it's hard to wrap

00:00:50.609 --> 00:00:52.049
your head around. It feels like we just skipped

00:00:52.049 --> 00:00:54.270
a few chapters. Oh, we absolutely did. Welcome

00:00:54.270 --> 00:00:56.810
back to the Deep Dive. Today, we're unpacking

00:00:56.810 --> 00:00:58.950
something that feels like a genuine tipping point.

00:00:59.270 --> 00:01:01.770
We're looking at a stack of reports, developer

00:01:01.770 --> 00:01:04.590
logs, and some leaked benchmarks around a model

00:01:04.590 --> 00:01:07.510
that the internet has decided to call Snowbunny.

00:01:08.010 --> 00:01:12.390
Which is a very online nickname for what is actually

00:01:12.390 --> 00:01:15.049
a very serious piece of engineering. Right. To

00:01:15.049 --> 00:01:18.069
be clear, this is Google's Gemini 3.5. But the

00:01:18.069 --> 00:01:20.250
interesting thing is, this isn't a product you'll

00:01:20.250 --> 00:01:23.030
see in a glossy press release just yet. It's

00:01:23.030 --> 00:01:26.069
a ghost. It is. It's a leaked version. The coding

00:01:26.069 --> 00:01:29.250
community started noticing these mysterious model

00:01:29.250 --> 00:01:33.069
codes DN9, D13, popping up in Google AI Studio.

00:01:33.189 --> 00:01:35.280
OK. And when they started poking at it, running

00:01:35.280 --> 00:01:37.599
their tests, they realized this wasn't just a

00:01:37.599 --> 00:01:40.659
slightly faster version of Gemini 3 Flash. This

00:01:40.659 --> 00:01:42.560
was something else. And that's what we're exploring

00:01:42.560 --> 00:01:45.299
today. The central idea we're looking at is this

00:01:45.299 --> 00:01:47.379
shift in our relationship with these tools. We

00:01:47.379 --> 00:01:49.379
spent the last few years in the era of the

00:01:49.379 --> 00:01:51.939
chatbot. Right. You talk, it talks back. But Snowbunny

00:01:51.939 --> 00:01:54.920
suggests we're entering the era of the AI director.

00:01:55.420 --> 00:01:58.799
Exactly. The difference is agency and scope.

00:01:59.299 --> 00:02:02.299
A chatbot answers a question. A director builds

00:02:02.299 --> 00:02:04.500
a vision. So let's get into the evidence for

00:02:04.500 --> 00:02:06.780
that. Because looking at this material, this

00:02:06.780 --> 00:02:09.039
model is putting up some numbers that don't just

00:02:09.039 --> 00:02:12.080
inch past the competition. They kind of leapfrog

00:02:12.080 --> 00:02:15.039
them. The numbers are startling. But what's more

00:02:15.039 --> 00:02:17.300
interesting to me is how it's getting those numbers.

00:02:17.439 --> 00:02:20.020
It's not just raw power. It seems to be a change

00:02:20.020 --> 00:02:22.960
in logic. Let's start there then. Segment one,

00:02:23.360 --> 00:02:26.240
the logic leap. There were two specific tests

00:02:26.240 --> 00:02:28.569
in these reports that really caught my eye. One

00:02:28.569 --> 00:02:30.870
is a variation on the classic trolley problem.

00:02:31.050 --> 00:02:33.750
Ah, yes, the Misguided Attention test. Wow. This

00:02:33.750 --> 00:02:36.289
is a fascinating look at how these models actually

00:02:36.289 --> 00:02:38.610
process information. So most of us know the drill.

00:02:39.009 --> 00:02:41.310
A train is coming, five people on the track.

00:02:41.430 --> 00:02:43.770
Do you pull the lever? It's ethics 101. Yeah.

00:02:44.210 --> 00:02:46.469
But this test had a twist. A really important

00:02:46.469 --> 00:02:48.729
one. Right. The prompt sets up the scenario,

00:02:48.729 --> 00:02:52.129
but it adds this tiny crucial detail buried in

00:02:52.129 --> 00:02:54.449
the text. Yeah. The five people on the track

00:02:54.449 --> 00:02:57.449
are already dead. Which changes everything. It

00:02:57.449 --> 00:02:59.849
changes the entire moral calculus. There's no

00:02:59.849 --> 00:03:03.129
dilemma. But here's the thing. Most AI models,

00:03:03.629 --> 00:03:06.310
even the really advanced ones, they see the words

00:03:06.310 --> 00:03:09.370
trolley problem and they just go into autopilot.

00:03:09.550 --> 00:03:11.669
They stop reading and start predicting. Exactly.

00:03:12.050 --> 00:03:14.949
Their safety filters kick in, what we call RLHF,

00:03:14.949 --> 00:03:17.530
and they start lecturing you about the sanctity

00:03:17.530 --> 00:03:20.650
of life. They miss the context completely because

00:03:20.650 --> 00:03:22.909
they're pattern matching the concept, not the

00:03:22.909 --> 00:03:24.849
reality of the prompt. They're skimming like

00:03:24.849 --> 00:03:26.650
a student who didn't do the reading for class.

00:03:27.460 --> 00:03:31.680
Precisely. But Gemini 3.5, it noticed. It caught

00:03:31.680 --> 00:03:34.080
that detail and gave the correct logical answer.

00:03:34.580 --> 00:03:36.259
It basically said, well, it doesn't matter. They're

00:03:36.259 --> 00:03:40.060
deceased. It scored 68.5% on this test. Which

00:03:40.060 --> 00:03:42.159
is way above the others. It beats out almost

00:03:42.159 --> 00:03:44.780
every top model out there. That feels significant.

00:03:44.879 --> 00:03:47.599
It implies the model isn't just retrieving data.

00:03:47.719 --> 00:03:49.620
It's actually paying attention to nuance. And

00:03:49.620 --> 00:03:51.639
that means the hallucination rate where the AI

00:03:51.639 --> 00:03:54.500
just makes stuff up could drop because it's actually

00:03:54.500 --> 00:03:56.669
grounded in the text you give it. And that was

00:03:56.669 --> 00:03:58.689
backed up by the second test, the hieroglyph

00:03:58.689 --> 00:04:01.349
test. This one sounded fascinating. It's not

00:04:01.349 --> 00:04:03.710
about reading ancient Egyptian, is it? No, no,

00:04:03.710 --> 00:04:06.189
not literally. It's a test of what they call

00:04:06.189 --> 00:04:09.310
lateral reasoning. The AI gets these strange

00:04:09.310 --> 00:04:11.449
symbols it has never seen before, and it has

00:04:11.449 --> 00:04:13.550
to figure out the hidden rules. You can't just

00:04:13.550 --> 00:04:15.349
look this up on Wikipedia. So it's basically

00:04:15.349 --> 00:04:17.689
an IQ test for puzzles it's never seen before.

00:04:17.870 --> 00:04:20.269
Exactly. It has to think on its feet. And the

00:04:20.269 --> 00:04:22.910
performance gap here is just massive. The older

00:04:22.910 --> 00:04:26.540
Gemini 2.5 Pro scored about 20%. The model called

00:04:26.540 --> 00:04:31.240
GPT-5 reasoning reached about 45%. Gemini

00:04:31.240 --> 00:04:35.100
3.5 Snowbunny hit between 80 and 88%. Wow, that

00:04:35.100 --> 00:04:37.319
is a staggering jump. We're normally excited

00:04:37.319 --> 00:04:39.699
about a 5% or 10% increase. Doubling the score

00:04:39.699 --> 00:04:41.980
of a GPT-5 reasoning model? That's different.

00:04:42.100 --> 00:04:44.279
So if it notices the dead people detail when

00:04:44.279 --> 00:04:46.519
others don't, what does that imply about its

00:04:46.519 --> 00:04:49.139
ability to handle messy real-world data? It

00:04:49.139 --> 00:04:51.000
means we can finally trust it to read carefully,

00:04:51.180 --> 00:04:53.089
rather than just pattern match. Which brings

00:04:53.089 --> 00:04:55.970
us to the application of that intelligence. Because

00:04:55.970 --> 00:04:58.670
being smart is one thing, but can it actually

00:04:58.670 --> 00:05:01.629
build anything? This brings us to what the community

00:05:01.629 --> 00:05:05.029
is calling vibe coding. I love this term, vibe

00:05:05.029 --> 00:05:07.189
coding. It sounds less like computer science

00:05:07.189 --> 00:05:10.029
and more like a Spotify playlist. But it's about

00:05:10.029 --> 00:05:13.449
building websites without knowing HTML or CSS.

00:05:13.750 --> 00:05:16.290
Yeah, that's the core of it. The idea is that

00:05:16.290 --> 00:05:19.550
you describe the feeling, the vibe, and the AI

00:05:19.550 --> 00:05:22.319
handles all the syntax. The case study here was

00:05:22.319 --> 00:05:24.500
a project called Cakes from the Heart. Sounds

00:05:24.500 --> 00:05:27.019
delicious. So the test was to see if the model

00:05:27.019 --> 00:05:28.660
could generate a professional-grade website

00:05:28.660 --> 00:05:32.279
for a bakery. The prompt was specific, but descriptive.

00:05:32.839 --> 00:05:35.540
You know, fancy style, cream and light gold colors,

00:05:35.920 --> 00:05:37.779
specific menu items. And what was the result?

00:05:37.879 --> 00:05:39.860
It wrote the entire thing. The hero section,

00:05:40.019 --> 00:05:42.779
the menu, the contact map, all the styling in

00:05:42.779 --> 00:05:45.639
a single HTML file. And it did it in about eight

00:05:45.639 --> 00:05:47.839
minutes. Eight minutes. And the cost? About 38

00:05:47.839 --> 00:05:50.540
cents. That's cheaper than a croissant. Significantly.

00:05:50.720 --> 00:05:52.699
But the impressive part wasn't just the speed.

00:05:52.899 --> 00:05:55.779
It was the workflow. The user could ask for dark

00:05:55.779 --> 00:05:59.040
mode after the fact, or ask to make images move

00:05:59.040 --> 00:06:01.899
smoothly. The model would just iterate on the

00:06:01.899 --> 00:06:03.879
code. So it's not just spitting out a template.

00:06:04.079 --> 00:06:06.639
It's refining a product based on your feedback.

00:06:07.000 --> 00:06:09.000
It's separating the intent from the execution.

00:06:09.199 --> 00:06:11.620
That's the director model again. You provide

00:06:11.620 --> 00:06:14.620
the vision, the AI provides the technical labor.

00:06:14.860 --> 00:06:17.259
Does vibe coding make actual coding obsolete

00:06:17.259 --> 00:06:19.920
then, or does it just change the barrier to entry?

00:06:20.120 --> 00:06:22.540
It lowers the barrier, letting you focus on the

00:06:22.540 --> 00:06:25.519
idea while the AI handles the syntax. So we've

00:06:25.519 --> 00:06:28.079
got logic, we've got code, but the sources also

00:06:28.079 --> 00:06:31.620
mention this model is multimodal in a way we

00:06:31.620 --> 00:06:33.720
haven't really seen before, specifically when

00:06:33.720 --> 00:06:36.740
it comes to sound. This was a huge leak. A user

00:06:36.740 --> 00:06:40.240
named legit spotted an A/B test in Google AI

00:06:40.240 --> 00:06:42.800
Studio. Usually, if you want AI music, you go

00:06:42.800 --> 00:06:46.000
to a separate tool like Suno or Udio. Right.

00:06:46.040 --> 00:06:48.079
You leave your chat, you go to the music app,

00:06:48.300 --> 00:06:49.819
paste your lyrics. There's a lot of friction

00:06:49.819 --> 00:06:52.800
there. Exactly. It's disjointed. But with Gemini

00:06:52.800 --> 00:06:56.279
3.5, the music generation is native. It doesn't

00:06:56.279 --> 00:06:58.920
give you sheet music or a link. It plays an audio

00:06:58.920 --> 00:07:01.459
file right there in the chat. That integration

00:07:01.459 --> 00:07:03.819
seems like a small UI change, but the workflow

00:07:03.819 --> 00:07:06.600
implications feel pretty massive. Oh, it's huge.

00:07:06.959 --> 00:07:09.819
Because of context. Think about it. You spend

00:07:09.819 --> 00:07:12.100
20 minutes working with the AI to write a funny

00:07:12.100 --> 00:07:14.180
script for a video. Right. It has the history.

00:07:14.259 --> 00:07:16.459
It knows the characters, the timing, the jokes.

00:07:16.819 --> 00:07:18.839
Then you just say, compose a background track

00:07:18.839 --> 00:07:20.759
for this. You don't have to re-explain the vibe

00:07:20.759 --> 00:07:23.100
to a separate music bot. It already knows the

00:07:23.100 --> 00:07:25.990
script is a comedy. It knows the pacing. It shares

00:07:25.990 --> 00:07:28.389
the memory. So the magic isn't just the music

00:07:28.389 --> 00:07:31.490
quality. It's the shared memory between the text

00:07:31.490 --> 00:07:34.149
and the audio. Exactly. It's the seamless workflow.

00:07:34.529 --> 00:07:36.629
The AI understands the whole project, not just

00:07:36.629 --> 00:07:39.529
one slice of it. And moving from ears to eyes,

00:07:39.889 --> 00:07:42.329
we've seen AI generate images for a while now

00:07:42.329 --> 00:07:45.250
with Midjourney and DALL-E. But the reports on

00:07:45.250 --> 00:07:48.829
Gemini 3.5 focus on something else: vector graphics,

00:07:49.290 --> 00:07:51.730
SVGs. Yeah, this is a favorite topic for developers.

00:07:52.189 --> 00:07:53.889
Generating a JPEG is kind of easy. It's just

00:07:53.889 --> 00:07:55.670
a grid of colored pixels. If you get a pixel

00:07:55.670 --> 00:07:59.689
wrong, it's just a blurry spot. But an SVG, that's

00:07:59.689 --> 00:08:02.930
code. It's a set of mathematical instructions

00:08:02.930 --> 00:08:06.029
on how to draw a line or a curve. So if the code

00:08:06.029 --> 00:08:08.689
is wrong, the image doesn't just look blurry.

00:08:08.790 --> 00:08:11.310
It breaks. It completely breaks. The line shoots

00:08:11.310 --> 00:08:14.089
off the page. The circle doesn't close. It's

00:08:14.089 --> 00:08:17.310
a much higher bar for accuracy. The leak showed

00:08:17.310 --> 00:08:20.230
users generating these cyberpunk robot icons.

00:08:20.269 --> 00:08:23.209
And they were good. Professional quality. But

00:08:23.209 --> 00:08:25.449
what really stood out was the consistency. They

00:08:25.449 --> 00:08:27.569
generated a whole series, and the robots all

00:08:27.569 --> 00:08:30.310
had this neon blue and purple theme. They looked

00:08:30.310 --> 00:08:32.409
like they belonged to the same brand, even though

00:08:32.409 --> 00:08:34.110
the shapes were unique. There was a specific

00:08:34.110 --> 00:08:36.090
test mentioned here that I found kind of hilarious.

00:08:36.250 --> 00:08:38.730
The pelican on a bicycle challenge. The pelican

00:08:38.730 --> 00:08:40.870
on a bicycle. It's become a standard benchmark

00:08:40.870 --> 00:08:44.679
for vector intelligence. Why? Is there a big

00:08:44.679 --> 00:08:47.860
market for cycling birds? No, but it tests spatial

00:08:47.860 --> 00:08:50.440
logic. Most models just fail this. They draw

00:08:50.440 --> 00:08:52.580
a bird and they draw a bike, but the bird is

00:08:52.580 --> 00:08:54.419
floating next to it or it's merged into the wheels.

00:08:54.460 --> 00:08:57.159
It's a mess. OK. So Simon Willison, a well-known

00:08:57.159 --> 00:09:00.240
researcher, he analyzed the output from

00:09:00.240 --> 00:09:03.360
Snowbunny and it nailed it. The bird was actually

00:09:03.360 --> 00:09:05.519
sitting on the seat. Its legs were reaching for

00:09:05.519 --> 00:09:07.940
the pedals. I have to admit, the idea of spatial

00:09:07.940 --> 00:09:11.460
logic in a model that is essentially just predicting

00:09:11.460 --> 00:09:13.950
the next word, I still find that hard to wrap

00:09:13.950 --> 00:09:15.970
my head around. It just feels counterintuitive.

00:09:16.110 --> 00:09:17.850
It really does. It's like it's deriving physics

00:09:17.850 --> 00:09:21.169
from language. So why is a bird on a bike a better

00:09:21.169 --> 00:09:23.789
test of intelligence than writing an essay? It

00:09:23.789 --> 00:09:26.950
proves the AI understands how objects relate

00:09:26.950 --> 00:09:29.669
in physical space, not just how words relate

00:09:29.669 --> 00:09:32.330
in sentences. OK, we've covered logic, websites,

00:09:32.850 --> 00:09:35.269
music and art, but we have to circle back to

00:09:35.269 --> 00:09:38.309
the cold open, to the Game Boy emulator, because

00:09:38.309 --> 00:09:40.169
this feels like the graduation project. This

00:09:40.169 --> 00:09:42.509
is the system architecture test. We mentioned

00:09:42.509 --> 00:09:45.149
3,000 lines of code. Can you break down why

00:09:45.149 --> 00:09:47.990
this is so different from, say, asking it to

00:09:47.990 --> 00:09:50.690
write a short Python script? Well, a short script

00:09:50.690 --> 00:09:54.529
is isolated. But an emulator? That is a complex

00:09:54.529 --> 00:09:57.350
ecosystem. You have the CPU emulation, which

00:09:57.350 --> 00:09:59.789
has to talk to the memory management. The memory

00:09:59.789 --> 00:10:01.809
has to talk to the input handling for the buttons.

00:10:02.129 --> 00:10:04.230
The input has to talk to the display pipeline.

00:10:04.509 --> 00:10:06.730
And they all have to agree on the rules. Exactly.

00:10:07.149 --> 00:10:09.330
If the AI forgets a variable name it created

00:10:09.330 --> 00:10:13.029
on line 50 when it's writing line 2500, the whole

00:10:13.029 --> 00:10:16.129
thing crashes. This is what we call global consistency.

00:10:16.389 --> 00:10:18.629
So it's holding the entire blueprint in its head

00:10:18.629 --> 00:10:21.990
at once. Yes. And developers Jared Liu and Shitas

00:10:21.990 --> 00:10:25.190
Lua verified this. They ran real game ROMs on

00:10:25.190 --> 00:10:27.730
the code this model wrote. It required a few

00:10:27.730 --> 00:10:30.490
tiny manual fixes, but the architecture, the

00:10:30.490 --> 00:10:32.860
heavy lifting, was done in one shot. If it can

00:10:32.860 --> 00:10:35.259
hold 3,000 lines of logic in its head at once,

00:10:35.519 --> 00:10:37.179
are we looking at the end of spaghetti code?

00:10:37.419 --> 00:10:39.700
We're looking at the ability to build full products,

00:10:39.779 --> 00:10:42.580
not just parts, by describing the system architecture.

00:10:42.820 --> 00:10:44.779
It's incredible. We're going to take a quick

00:10:44.779 --> 00:10:46.299
breather here, but when we come back, we're going

00:10:46.299 --> 00:10:48.620
to talk about what this means for you. If the

00:10:48.620 --> 00:10:51.860
AI is the director, what is your role? Stay with

00:10:51.860 --> 00:10:57.100
us. And we are back. We've looked at the capabilities

00:10:57.100 --> 00:11:00.909
of Gemini 3.5 Snowbunny. It's smarter, it's

00:11:00.909 --> 00:11:03.730
multimodal, and it can architect complex systems.

00:11:03.990 --> 00:11:06.389
It's a beast. So let's zoom out. What's the big

00:11:06.389 --> 00:11:09.490
idea here? If I'm a listener and I'm not a professional

00:11:09.490 --> 00:11:11.549
coder, why should I care that a computer can

00:11:11.549 --> 00:11:13.789
draw a pelican on a bike? Because the cost of

00:11:13.789 --> 00:11:15.389
creation is collapsing. We talked about that

00:11:15.389 --> 00:11:18.529
bakery website costing 38 cents. Right. The pricing

00:11:18.529 --> 00:11:21.169
structure for these models. About 50 cents per

00:11:21.169 --> 00:11:24.120
million input tokens, $3 for output. It means

00:11:24.120 --> 00:11:27.299
experimentation is virtually free. You can afford

00:11:27.299 --> 00:11:29.960
to fail. You can afford to try 10 different versions

00:11:29.960 --> 00:11:32.419
of a website. And this shifts the user's role.

00:11:32.480 --> 00:11:34.940
We kept using that word director. That is the

00:11:34.940 --> 00:11:37.519
key philosophy. You are no longer the one placing

00:11:37.519 --> 00:11:40.139
every pixel or writing every single line of CSS.

00:11:40.679 --> 00:11:43.240
You are the one with the vision. Your job is

00:11:43.240 --> 00:11:45.879
to describe the vibe. Which is a skill in itself.

00:11:46.220 --> 00:11:49.039
It is. The sources highlight three rules for

00:11:49.039 --> 00:11:52.720
this new era. One. Be specific. Don't just say,

00:11:52.960 --> 00:11:56.240
make a website. Say who it's for, why it exists.

00:11:56.639 --> 00:11:59.139
Two, describe the vibe. Use emotional words.

00:11:59.399 --> 00:12:02.919
Warm, modern, aggressive. The AI gets that now.

00:12:03.059 --> 00:12:06.159
And three, the 80-20 rule. Let the AI do the

00:12:06.159 --> 00:12:08.919
heavy lifting, the 80%. You come in for that

00:12:08.919 --> 00:12:12.379
final 20%, to polish, to tweak, and to refine.

00:12:12.460 --> 00:12:14.120
So it's collaborative. You basically have a team

00:12:14.120 --> 00:12:16.580
of musicians, designers, and engineers inside

00:12:16.580 --> 00:12:19.000
your laptop. Just waiting for your orders. So

00:12:19.000 --> 00:12:20.500
here's the call to action. You don't have to

00:12:20.500 --> 00:12:22.500
wait for some big Super Bowl commercial to try

00:12:22.500 --> 00:12:25.480
this. No. Go to Google AI Studio. It's open.

00:12:25.759 --> 00:12:27.720
Look for Gemini 3 Flash. If you happen to see

00:12:27.720 --> 00:12:30.480
the code DN9, you've got the Snowbunny. But

00:12:30.480 --> 00:12:32.940
even if you don't, the tools are there. The gap

00:12:32.940 --> 00:12:35.659
between having an idea and making it real has

00:12:35.659 --> 00:12:38.259
never been smaller. Start playing. Break things.

00:12:38.600 --> 00:12:40.080
Thanks for diving in with us today. We'll catch

00:12:40.080 --> 00:12:40.659
you on the next one.
