WEBVTT

00:00:00.000 --> 00:00:02.620
Have you ever tried to get an AI to make a group

00:00:02.620 --> 00:00:05.080
photo? Oh, it's a nightmare. It's an absolute

00:00:05.080 --> 00:00:07.179
disaster. For years, you'd ask for something

00:00:07.179 --> 00:00:09.400
simple. You know, five friends at a picnic, and

00:00:09.400 --> 00:00:11.980
you'd get, well, you'd get a thumb where it shouldn't

00:00:11.980 --> 00:00:14.640
be. Or that one blurry face in the back that

00:00:14.640 --> 00:00:16.899
looks like it's melting. Exactly. Or just that

00:00:16.899 --> 00:00:18.940
awful flat lighting that makes everyone look

00:00:18.940 --> 00:00:23.179
like a plastic doll. AI images were, they were

00:00:23.179 --> 00:00:26.280
fun toys. A novelty. A novelty, for sure. But

00:00:26.280 --> 00:00:28.859
what if that all just changed almost overnight?

00:00:29.489 --> 00:00:32.450
Welcome to the deep dive. Today, we are digging

00:00:32.450 --> 00:00:35.369
into a massive update to Google's image generator.

00:00:36.030 --> 00:00:39.170
Officially, it's called Gemini 3, maybe Imagen

00:00:39.170 --> 00:00:41.850
3. But the community has its own names. It really

00:00:41.850 --> 00:00:44.570
does. Some are calling it Nano Banana Pro. Which

00:00:44.570 --> 00:00:47.960
is, I mean, it's just jargon. But it gets at

00:00:47.960 --> 00:00:50.000
the feeling here. It does. It shows this isn't

00:00:50.000 --> 00:00:51.979
just a small step. The source material we're

00:00:51.979 --> 00:00:54.659
looking at calls it a giant leap. So our mission

00:00:54.659 --> 00:00:56.859
today is to unpack what makes this so different.

00:00:57.299 --> 00:00:59.380
We'll show you how to use it for free right now,

00:00:59.380 --> 00:01:01.700
and we're even going to share some prompts to

00:01:01.700 --> 00:01:04.599
get you those stunning results. Yeah, we're going

00:01:04.599 --> 00:01:06.980
to get past the plastic faces and straight to

00:01:06.980 --> 00:01:10.120
photorealism. So where do we start? I think we

00:01:10.120 --> 00:01:12.859
have to start with the physics. The huge jump

00:01:12.859 --> 00:01:16.750
in realism and why that plastic look is finally

00:01:16.750 --> 00:01:19.030
a thing of the past. OK. Then we'll get into

00:01:19.030 --> 00:01:22.390
the AI's memory, which is really where the magic

00:01:22.390 --> 00:01:25.010
is for creative work. And finally, we'll cover

00:01:25.010 --> 00:01:26.870
how you can actually use this professionally.

00:01:26.950 --> 00:01:29.930
Let's do it. So the fundamental problem with

00:01:29.930 --> 00:01:32.569
the old models, it was always about texture.

00:01:33.049 --> 00:01:34.629
They could see shapes. They could see colors.

00:01:34.989 --> 00:01:37.450
But they didn't understand reality. If you ask

00:01:37.450 --> 00:01:39.689
for five friends on a beach, you just got this

00:01:39.689 --> 00:01:42.390
flat, cartoonish image. It was never convincing.

00:01:42.510 --> 00:01:45.799
It just felt fake. Totally fake. Yeah. The breakthrough

00:01:45.799 --> 00:01:47.840
here and what the sources are all pointing to

00:01:47.840 --> 00:01:51.159
is that Gemini 3 gets nuance. It understands

00:01:51.159 --> 00:01:54.060
that human skin has pores, imperfections. It's

00:01:54.060 --> 00:01:56.560
not smooth plastic. Exactly. It knows wood has

00:01:56.560 --> 00:01:59.379
grain. It processes how light actually works

00:01:59.379 --> 00:02:01.959
in the real world. And that's the biggest giveaway

00:02:01.959 --> 00:02:04.439
for an AI image: bad lighting. And this update

00:02:04.439 --> 00:02:06.200
seems to fix that. It pretty much eliminates

00:02:06.200 --> 00:02:09.180
it. OK, so let's unpack the tech behind this.

00:02:09.719 --> 00:02:12.439
The sources are highlighting, I think, four major

00:02:12.439 --> 00:02:14.599
improvements that get rid of that flatness. Yeah.

00:02:14.639 --> 00:02:17.120
And the first one is the big one. Which is that.

00:02:17.259 --> 00:02:20.599
It actually thinks. It actually thinks. Before,

00:02:21.240 --> 00:02:23.340
a model would just start drawing pixels immediately.

00:02:24.139 --> 00:02:28.419
This one, it pauses. It actually takes a moment

00:02:28.419 --> 00:02:30.539
to plan the composition. Like an artist doing

00:02:30.539 --> 00:02:34.000
a quick sketch first. Precisely. A wireframe,

00:02:34.379 --> 00:02:37.759
a thumbnail sketch. It plans before it renders.

00:02:37.879 --> 00:02:39.819
And that leads directly to the second point,

00:02:39.919 --> 00:02:43.659
which is just the realism. That smooth, plasticky

00:02:43.659 --> 00:02:46.460
skin is gone. The photos look like they were

00:02:46.460 --> 00:02:49.599
taken with a high-end camera. A DSLR, for sure.

00:02:50.060 --> 00:02:51.539
It's a different quality altogether. Then there's

00:02:51.539 --> 00:02:53.960
the text handling. This is huge. Oh, this has

00:02:53.960 --> 00:02:56.060
been a problem for years. You'd get a great image,

00:02:56.159 --> 00:02:59.099
but the sign in the background would be just

00:02:59.099 --> 00:03:00.860
gibberish. Now, if you want a sign that says

00:03:00.860 --> 00:03:02.740
coffee shop, it actually spells coffee shop.

00:03:02.759 --> 00:03:04.840
Which is critical for any kind of commercial

00:03:04.840 --> 00:03:08.120
work. And that brings us to the last point. Consistency.

00:03:08.280 --> 00:03:10.960
The holy grail. You can actually keep the same

00:03:10.960 --> 00:03:14.219
character, the same face, across a bunch of different

00:03:14.219 --> 00:03:16.400
images. That used to be almost impossible. It

00:03:16.400 --> 00:03:19.300
was. And that planning phase you mentioned, that's

00:03:19.300 --> 00:03:21.740
the engine behind all of this. It's what allows

00:03:21.740 --> 00:03:25.180
for the realism, the good text, everything. Given

00:03:25.180 --> 00:03:29.199
the AI now thinks, how does that planning phase

00:03:29.199 --> 00:03:31.639
fundamentally alter the quality compared to the

00:03:31.639 --> 00:03:34.900
older models? It allows for complex cinematic

00:03:34.900 --> 00:03:38.639
lighting and detailed realism, which were previously

00:03:38.639 --> 00:03:42.099
impossible. The planning step lets the AI build

00:03:42.099 --> 00:03:44.639
the physics of the scene first. Exactly. It plans

00:03:44.639 --> 00:03:47.379
the physics before it draws the picture. So how

00:03:47.379 --> 00:03:49.360
do we actually get our hands on this? Is it,

00:03:49.360 --> 00:03:51.500
you know, some special software? Not at all.

00:03:51.520 --> 00:03:53.340
It's available right now in the browser. Just

00:03:53.340 --> 00:03:55.300
on Google Gemini? Yep. You just have to look

00:03:55.300 --> 00:03:57.580
for the Gemini Advanced model, or sometimes you'll

00:03:57.580 --> 00:03:59.780
see a little label that says "Thinking with 3."

00:04:00.280 --> 00:04:01.719
That's how you know you're using the new engine.

00:04:01.979 --> 00:04:04.120
And it's free. There are limits for free users,

00:04:04.300 --> 00:04:07.060
usually about three to five high quality images

00:04:07.060 --> 00:04:09.639
a day. And there's a catch, right? The watermark.

00:04:09.939 --> 00:04:12.819
Right. All the images have an invisible digital

00:04:12.819 --> 00:04:15.699
watermark. You can't see it, but it's embedded

00:04:15.699 --> 00:04:18.420
in the file's data. Which is for traceability,

00:04:18.459 --> 00:04:21.319
I assume, to fight deepfakes. That's the idea.

00:04:21.720 --> 00:04:24.139
It's Google's way of creating a digital signature,

00:04:24.439 --> 00:04:27.300
so you can always check if an image was AI-generated.

00:04:27.779 --> 00:04:30.139
For serious users, that's actually a good thing.

00:04:30.259 --> 00:04:33.740
It builds trust. Okay, that makes sense. Now,

00:04:33.899 --> 00:04:35.899
let's talk about the feature that everyone is

00:04:35.899 --> 00:04:40.199
buzzing about: this continuous context editing.

00:04:40.199 --> 00:04:42.180
This is the one. I mean, we've all been there.

00:04:42.180 --> 00:04:44.360
I still wrestle with prompt drift myself. You

00:04:44.360 --> 00:04:46.100
get the perfect character, the perfect face, and

00:04:46.100 --> 00:04:48.620
then you want to change one tiny thing. Exactly.

00:04:48.620 --> 00:04:50.879
You'd say, okay, now make the cat wear a hat, and

00:04:50.879 --> 00:04:52.879
poof, it would generate a completely different

00:04:52.879 --> 00:04:56.000
cat. You'd lose the one you liked. The AI had zero

00:04:56.000 --> 00:04:59.339
memory. Zero. This new model, it remembers what

00:04:59.339 --> 00:05:01.759
it just created It holds onto the context from

00:05:01.759 --> 00:05:04.220
one prompt to the next. For storyboarding or

00:05:04.220 --> 00:05:07.199
any sequential work, it's a game changer. I love

00:05:07.199 --> 00:05:09.439
the coffee shop experiment they walk through

00:05:09.439 --> 00:05:11.779
in the source material. It shows this perfectly.

00:05:11.959 --> 00:05:14.300
Yeah, that's a great example. So the first prompt

00:05:14.300 --> 00:05:17.620
sets the scene. Something like, create a realistic

00:05:17.620 --> 00:05:20.839
cinematic photo of a young male barista in a

00:05:20.839 --> 00:05:25.939
green apron. Warm lighting. 16:9 aspect ratio,

00:05:26.060 --> 00:05:28.220
and you get a great image. Right, your starting

00:05:28.220 --> 00:05:30.680
point. But then the magic word is just next.

00:05:30.860 --> 00:05:32.779
You just type next. And what happens? You get

00:05:32.779 --> 00:05:35.100
a new angle of the same barista and the same

00:05:35.100 --> 00:05:38.060
cafe. The face is the same, the apron, the lighting.

00:05:38.240 --> 00:05:40.779
It's all consistent. It feels like a second shot

00:05:40.779 --> 00:05:42.639
from the same photo shoot. It's incredible. Then

00:05:42.639 --> 00:05:44.819
you can change the action. You prompt, now show

00:05:44.819 --> 00:05:46.839
him serving coffee to a customer. Yeah. And it

00:05:46.839 --> 00:05:49.060
keeps the barista identical, but just changes

00:05:49.060 --> 00:05:51.100
what he's doing. We've never had that level of

00:05:51.100 --> 00:05:53.540
control. So beyond just making a few sequential

00:05:53.540 --> 00:05:56.899
shots, what is the core implication of retaining

00:05:56.899 --> 00:05:59.959
that same character across, say, hundreds of

00:05:59.959 --> 00:06:02.819
requests? Businesses can now create reliable

00:06:02.819 --> 00:06:05.819
brand mascots and models without needing expensive

00:06:05.819 --> 00:06:08.660
photo shoots. You're creating a persistent digital

00:06:08.660 --> 00:06:11.579
asset you can use over and over. Exactly. OK,

00:06:11.579 --> 00:06:14.040
let's move from the how to the what, specifically

00:06:14.040 --> 00:06:17.139
what to type. If you're still just prompting

00:06:17.139 --> 00:06:19.459
woman in Japan street, you're going to get flat,

00:06:19.600 --> 00:06:22.579
generic results. You need the pro prompt blueprint.

00:06:23.439 --> 00:06:25.860
The sources are really clear on this. The best

00:06:25.860 --> 00:06:28.959
prompts use terms from photography and physics.

00:06:29.160 --> 00:06:31.800
Let's do a comparison. Take a travel blogger.

00:06:32.079 --> 00:06:34.720
Bad prompt is just subject and location. Right.

00:06:35.019 --> 00:06:38.160
A pro prompt gets specific. It uses phrases like

00:06:38.160 --> 00:06:40.980
candid realistic medium shot. That controls the

00:06:40.980 --> 00:06:43.920
framing. And lighting. It would specify golden

00:06:43.920 --> 00:06:46.379
hour lighting. Exactly. And my favorite one?

00:06:47.660 --> 00:06:51.480
Background is slightly blurry. Okay, that instantly

00:06:51.480 --> 00:06:53.839
adds depth and makes it look professional. You're

00:06:53.839 --> 00:06:56.420
basically telling the AI which camera lens to

00:06:56.420 --> 00:06:59.259
use. Okay, what about a food ad? Instead of just

00:06:59.259 --> 00:07:01.459
picture of a burger, you use sensory details,

00:07:01.600 --> 00:07:03.920
professional food photography. You tell it the cheese

00:07:03.920 --> 00:07:06.240
is melting down the side. You can get that specific.

00:07:06.399 --> 00:07:09.259
Oh, yeah, and steam is rising from the patty.

00:07:09.379 --> 00:07:12.680
Then you control the lighting. Dark, moody background

00:07:12.680 --> 00:07:15.579
with a spotlight. Whoa, imagine generating hundreds

00:07:15.579 --> 00:07:18.410
of perfect product shots an hour. where you can

00:07:18.410 --> 00:07:20.730
control the exact moment the cheese melts, that's

00:07:20.730 --> 00:07:23.189
profound. It's a massive shift. And you see the

00:07:23.189 --> 00:07:25.189
same thing with a sneaker ad. The prompt isn't

00:07:25.189 --> 00:07:28.829
just sneaker, it's dynamic action shot. And the

00:07:28.829 --> 00:07:32.089
camera angle. Low angle view, ground level. And

00:07:32.089 --> 00:07:35.050
this is the craziest part. Water droplets are

00:07:35.050 --> 00:07:38.009
frozen in midair. You're directing physics. So

00:07:38.009 --> 00:07:39.790
let's talk about where this is already being

00:07:39.790 --> 00:07:43.750
used. The sources list a few key business applications.
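
NOTE
Editor's sketch, not from the episode: one possible way to assemble the
"pro prompt" ingredients described above (shot type, subject, lighting,
sensory details, aspect ratio) into a single prompt string. The function
name build_prompt and its parameters are hypothetical; nothing here calls
Gemini or any real API, it only builds text you could paste into the chat.
```python
# Hypothetical helper: joins the prompt ingredients discussed in the episode.
def build_prompt(subject, shot="", lighting="", details=(), aspect_ratio=""):
    parts = [shot, subject, lighting, *details]
    if aspect_ratio:
        parts.append(f"{aspect_ratio} aspect ratio")
    # Drop empty fields, then join the rest with commas into one prompt.
    return ", ".join(p.strip() for p in parts if p and p.strip())
# Example values taken from the burger ad discussed in the episode.
burger = build_prompt(
    subject="a cheeseburger",
    shot="professional food photography",
    lighting="dark, moody background with a spotlight",
    details=("cheese melting down the side", "steam rising from the patty"),
    aspect_ratio="1:1",
)
print(burger)
```
Calling build_prompt("woman in Japan street") with no other fields returns
the bare subject, which is the "bad prompt" case from the comparison.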

00:07:44.949 --> 00:07:46.779
Real estate agents are a big one. Right. They

00:07:46.779 --> 00:07:49.199
can upload a photo of an empty room and just

00:07:49.199 --> 00:07:51.959
tell the AI to furnish it, add a modern gray

00:07:51.959 --> 00:07:54.459
sofa, but keep the windows and walls the same.

00:07:54.660 --> 00:07:57.139
It's instant digital staging. Saves a ton of

00:07:57.139 --> 00:07:59.660
money. And for e-commerce, you can take a simple

00:07:59.660 --> 00:08:02.199
photo of your product, say a candle on your kitchen

00:08:02.199 --> 00:08:05.300
table. You prompt, place this candle on a wooden

00:08:05.300 --> 00:08:08.540
spa shelf, soft lighting, and suddenly your basic

00:08:08.540 --> 00:08:10.730
photo looks like a high-end ad. And the last

00:08:10.730 --> 00:08:12.850
one was for event planners. Yeah, this one's

00:08:12.850 --> 00:08:15.110
cool. They can upload a napkin sketch of a wedding

00:08:15.110 --> 00:08:18.149
layout. A literal napkin sketch. A photo of it.

00:08:18.529 --> 00:08:20.769
And prompt, turn this sketch into a realistic

00:08:20.769 --> 00:08:23.110
photo of a wedding reception with flowers and

00:08:23.110 --> 00:08:26.810
gold lights. It goes from a vague idea to a photorealistic

00:08:26.810 --> 00:08:29.709
concept in seconds. OK, so considering the quality

00:08:29.709 --> 00:08:31.529
we're talking about, especially for those food

00:08:31.529 --> 00:08:35.830
ads, is this tool already replacing entry-level

00:08:35.830 --> 00:08:38.830
commercial photographers? For initial mock-ups

00:08:38.830 --> 00:08:41.610
and marketing visuals, yes, the speed and cost

00:08:41.610 --> 00:08:44.389
savings are enormous. So it's not killing the

00:08:44.389 --> 00:08:46.570
creative vision of the artist, but it's automating

00:08:46.570 --> 00:08:49.850
the technical execution. Exactly. It's democratizing

00:08:49.850 --> 00:08:52.909
the ability to create really persuasive visual

00:08:52.909 --> 00:08:57.090
content. Okay, so based on all the tests in the

00:08:57.090 --> 00:09:00.100
source material, there are four, let's call them

00:09:00.100 --> 00:09:03.059
secrets, to getting truly professional results.

00:09:03.139 --> 00:09:05.960
Or master tips. Master tips. And tip number one

00:09:05.960 --> 00:09:08.539
is the face-first rule. Okay, what's that? If

00:09:08.539 --> 00:09:10.960
you're combining things, like a specific person

00:09:10.960 --> 00:09:13.200
with a car and a dog, always upload the photo

00:09:13.200 --> 00:09:16.539
of the person first. The AI gives the most attention

00:09:16.539 --> 00:09:18.759
to the initial input. So if you lead with the

00:09:18.759 --> 00:09:20.919
car, the car will be perfect, but the face might

00:09:20.919 --> 00:09:23.480
be messy. Exactly. The AI needs you to tell it

00:09:23.480 --> 00:09:25.740
what the most complex and important part of the

00:09:25.740 --> 00:09:28.299
image is. That makes sense. What's number two?

00:09:28.379 --> 00:09:30.240
Let's specify the aspect ratio. The shape of

00:09:30.240 --> 00:09:34.820
the picture. Yep. 16:9 for wide video, 9:16

00:09:34.820 --> 00:09:38.320
for a tall social media story, 1:1 for a square

00:09:38.320 --> 00:09:41.320
post. If you don't tell it, it defaults to a

00:09:41.320 --> 00:09:44.139
square and can crop your image badly. Simple

00:09:44.139 --> 00:09:47.799
but crucial. Okay, tip three. Trust the thinking

00:09:47.799 --> 00:09:50.220
process. When it says analyzing or thinking,

00:09:50.860 --> 00:09:53.659
don't cancel it. Be patient. You have to be.
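
NOTE
Editor's sketch, not from the episode: the aspect-ratio tip above names
three ratios (16:9, 9:16, 1:1). Assuming you also want to know the pixel
height that matches a chosen width, the arithmetic can be checked as below.
The RATIOS mapping and the height_for function are hypothetical names.
```python
# Assumed mapping from the use cases named in the episode to ratio strings.
RATIOS = {"wide video": "16:9", "story": "9:16", "square post": "1:1"}
def height_for(width, ratio):
    # Parse "W:H" and scale the height to match the given pixel width.
    w, h = (int(x) for x in ratio.split(":"))
    return round(width * h / w)
print(height_for(1920, RATIOS["wide video"]))
```
The same function covers the tall story format, for example
height_for(1080, RATIOS["story"]).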

00:09:54.169 --> 00:09:56.110
That pause is the planning phase we talked about.

00:09:56.250 --> 00:09:58.149
It's where the quality comes from. If you interrupt

00:09:58.149 --> 00:10:00.690
it, you get a rush job. Got it. And the last

00:10:00.690 --> 00:10:04.049
one. Know when to use specialized tools. Google

00:10:04.049 --> 00:10:07.399
is pretty strict on safety. If you need super

00:10:07.399 --> 00:10:10.259
high resolution, like 4K, or you're creating

00:10:10.259 --> 00:10:12.460
content that's a bit more edgy, you might need

00:10:12.460 --> 00:10:14.559
a paid tool like Higgsfield AI or something

00:10:14.559 --> 00:10:17.279
in Adobe Photoshop. That's a great list. So if

00:10:17.279 --> 00:10:19.259
a listener only takes one tip away from this

00:10:19.259 --> 00:10:21.220
whole deep dive, which one should they prioritize?

00:10:21.340 --> 00:10:23.279
What's the number one thing to do? I would say

00:10:23.279 --> 00:10:26.279
tip three. Wait for the AI to finish its analysis.

00:10:26.960 --> 00:10:29.200
That pause is the secret ingredient for quality

00:10:29.200 --> 00:10:31.139
because you're letting it run its physics simulation.

00:10:31.309 --> 00:10:32.610
See, I thought you were going to say the

00:10:32.610 --> 00:10:34.389
face-first rule. That seems like it fixes the most

00:10:34.389 --> 00:10:37.169
common problem, you know, messy hands and faces.

00:10:37.429 --> 00:10:40.970
And it is an easy fix, for sure. But if you cancel

00:10:40.970 --> 00:10:43.330
the thinking phase, even a perfect face-first

00:10:43.330 --> 00:10:45.590
image is still going to have flat lighting and

00:10:45.590 --> 00:10:49.210
bad texture. The planning is foundational. It

00:10:49.210 --> 00:10:51.590
sets the quality for everything else. So prioritize

00:10:51.590 --> 00:10:54.129
the process over the subject. Precisely. So we've

00:10:54.129 --> 00:10:57.190
covered a lot of ground. I think the big idea

00:10:57.190 --> 00:11:01.289
here is that this update, Gemini 3, it transforms

00:11:01.289 --> 00:11:04.029
the image generator from a fun little toy into

00:11:04.029 --> 00:11:07.070
a professional artist inside your computer. And

00:11:07.070 --> 00:11:09.110
that transformation comes down to two things.

00:11:09.269 --> 00:11:11.629
Two things. Hyperrealism, which it gets through

00:11:11.629 --> 00:11:14.470
that planning phase, and continuous memory, which

00:11:14.470 --> 00:11:17.090
gives you consistent characters. So our challenge

00:11:17.090 --> 00:11:19.490
to you, the listener, is pretty simple. Go to

00:11:19.490 --> 00:11:21.789
Google Gemini today, think of a scene, your dream

00:11:21.789 --> 00:11:24.350
kitchen, a funny picture of your pet, and apply

00:11:24.350 --> 00:11:27.049
the prompt tips we shared. Yeah, focus on the

00:11:27.049 --> 00:11:29.850
lighting, the texture, the camera angle. See

00:11:29.850 --> 00:11:31.690
what you can create. The only way to really keep

00:11:31.690 --> 00:11:33.389
up with this stuff is to get your hands dirty

00:11:33.389 --> 00:11:35.149
and start playing with it. And we have to think

00:11:35.149 --> 00:11:37.590
about what this means. I mean, what if, in a

00:11:37.590 --> 00:11:40.269
year, the line between a real photo and an AI

00:11:40.269 --> 00:11:42.870
image becomes completely invisible, even to experts?

00:11:43.250 --> 00:11:45.759
That's the path we're on. It is. We're heading

00:11:45.759 --> 00:11:48.559
into a world where visual truth is going to be

00:11:48.559 --> 00:11:51.360
a lot harder to define. Keep testing those boundaries.

00:11:51.720 --> 00:11:54.059
Keep playing with the tech. Until next time,

00:11:54.360 --> 00:11:55.179
keep digging deeper.