WEBVTT

00:00:00.000 --> 00:00:03.060
So there's this new AI image editing tool, right?

00:00:03.259 --> 00:00:05.719
And it's just, it's blowing everything else away

00:00:05.719 --> 00:00:07.719
in blind tests. Yeah, it's really something.

00:00:08.199 --> 00:00:11.400
We're talking over 200 Elo points ahead on LM

00:00:11.400 --> 00:00:14.050
Arena. That's a massive gap. I mean, that really

00:00:14.050 --> 00:00:15.810
grabs your attention, doesn't it? It changes

00:00:15.810 --> 00:00:18.250
the whole game. It absolutely does. This thing,

00:00:18.250 --> 00:00:20.530
we've been calling it Nano Banana, just kind

00:00:20.530 --> 00:00:23.429
of internally. But the official name is Gemini

00:00:23.429 --> 00:00:26.989
2.5 Flash Image. It's really shifting how we

00:00:26.989 --> 00:00:28.929
think about, you know, accurate photo editing.

00:00:29.070 --> 00:00:31.910
Whole new benchmark. Exactly. Welcome to the

00:00:31.910 --> 00:00:33.590
deep dive. Today, we're going to cut through

00:00:33.590 --> 00:00:36.909
all the hype around this new Google tool. Our

00:00:36.909 --> 00:00:40.829
mission really is to figure out why. Why is it

00:00:40.829 --> 00:00:43.070
number one? Where's that power coming from? And,

00:00:43.090 --> 00:00:46.570
you know, crucially, how can you use it for fixing

00:00:46.570 --> 00:00:48.929
old photos, maybe, or even professional character

00:00:48.929 --> 00:00:51.530
design? Yeah, we'll dig into the tech behind

00:00:51.530 --> 00:00:53.109
it first, then walk through the basics, some

00:00:53.109 --> 00:00:54.890
of the editing features. We need to talk about

00:00:54.890 --> 00:00:56.850
the consistency stuff. That's huge. But also

00:00:56.850 --> 00:00:59.450
look at where it kind of hits its limits creatively.

00:00:59.810 --> 00:01:02.549
OK, good. And then we'll wrap up with some solid

00:01:02.549 --> 00:01:05.870
tips, right? How to actually write prompts that

00:01:05.870 --> 00:01:07.730
work consistently. So let's start with that score

00:01:07.730 --> 00:01:10.370
gap. Okay, the numbers. They're pretty wild.

00:01:11.030 --> 00:01:14.069
If you look at LMArena, most of the top editors,

00:01:14.290 --> 00:01:18.290
they hover around 1,100 Elo. This tool, Nano

00:01:18.290 --> 00:01:21.069
Banana, it's pushing past 1,300. Okay, wait.

00:01:21.069 --> 00:01:23.250
For someone listening who doesn't follow these

00:01:23.250 --> 00:01:25.370
leaderboards religiously, what does a 200 Elo

00:01:25.370 --> 00:01:27.790
gap actually mean? It's a really good question.

00:01:28.090 --> 00:01:31.409
Think about chess. If a player has a 200 Elo

00:01:31.409 --> 00:01:34.430
advantage, they're expected to win about, what,

00:01:34.569 --> 00:01:36.870
76% of the time against the lower-rated player?
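
NOTE
The 76% figure matches the standard Elo expected-score formula,
E = 1 / (1 + 10^(-gap / 400)), assuming the leaderboard uses the usual
chess scale. A minimal Python sketch; the name expected_score is
illustrative and not part of any leaderboard's tooling.
```python
def expected_score(rating_gap: float) -> float:
    """Expected score of the higher-rated side under the standard Elo formula."""
    return 1.0 / (1.0 + 10.0 ** (-rating_gap / 400.0))
# The 200-point lead discussed in the conversation.
print(f"{expected_score(200):.1%}")  # prints 76.0%
```
An even matchup (gap of 0) gives 0.5, and a 200-point gap gives roughly 0.76.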

00:01:37.129 --> 00:01:39.969
It's a significant edge. So in these blind tests

00:01:39.969 --> 00:01:41.909
where people are just picking which image looks

00:01:41.909 --> 00:01:45.290
better or more accurate, this new model is just

00:01:45.290 --> 00:01:47.469
winning, like three out of four times against

00:01:47.469 --> 00:01:49.469
the best competitors out there. Pretty much,

00:01:49.609 --> 00:01:52.390
yeah. Consistently, overwhelmingly chosen. That's

00:01:52.390 --> 00:01:54.689
the advantage we need to unpack. OK, so if it's

00:01:54.689 --> 00:01:56.430
that much better, the sources must be saying

00:01:56.430 --> 00:01:59.170
it's not just slightly more data or better code.

00:01:59.250 --> 00:02:00.950
It has to be something different at the core.

00:02:01.170 --> 00:02:04.390
It is. The secret sauce is the foundation. Gemini

00:02:04.390 --> 00:02:08.009
2.5 Flash. That's Google's special model. And

00:02:08.009 --> 00:02:11.750
the key thing is, it's multimodal. Multimodal.

00:02:11.909 --> 00:02:14.009
Meaning it gets different kinds of info at once.

00:02:14.569 --> 00:02:16.990
Words, pictures. Exactly. Words, pictures, sounds,

00:02:17.210 --> 00:02:19.530
even video concepts. It understands them all

00:02:19.530 --> 00:02:22.530
together. Ah, okay. Like stacking different kinds

00:02:22.530 --> 00:02:25.189
of Lego blocks of data, maybe? That's a great

00:02:25.189 --> 00:02:28.169
way to put it. Imagine editing a Lego build.

00:02:28.550 --> 00:02:30.930
A normal tool might just see, I don't know, the

00:02:30.930 --> 00:02:33.889
color of a brick. A multimodal one sees the color,

00:02:33.969 --> 00:02:35.689
the shape, where it sits next to other bricks,

00:02:36.169 --> 00:02:37.949
and it understands the written instructions for

00:02:37.949 --> 00:02:40.409
the final thing. Oh, okay. So it has a deeper

00:02:40.409 --> 00:02:43.370
context. Right. So because of that, Nano Banana

00:02:43.370 --> 00:02:46.430
isn't just making images, it's editing them super

00:02:46.430 --> 00:02:48.509
accurately, especially keeping all those little

00:02:48.509 --> 00:02:51.289
details right. That's stability. That makes sense.

00:02:52.039 --> 00:02:54.300
If the power comes from being multimodal, how

00:02:54.300 --> 00:02:57.020
does that, like, technical thing translate into

00:02:57.020 --> 00:02:58.939
a real benefit for someone just trying to fix

00:02:58.939 --> 00:03:02.099
an old photo? Well, that reliability, you know?

00:03:02.219 --> 00:03:05.280
Keeping tiny details consistent, like a specific

00:03:05.280 --> 00:03:07.919
glint in someone's eye or the weave of fabric.

00:03:08.199 --> 00:03:10.400
It just makes the tool feel much more trustworthy.

00:03:10.659 --> 00:03:13.240
Right. It stops those weird, jarring changes

00:03:13.240 --> 00:03:15.439
you sometimes see where the AI gets confused.

00:03:15.699 --> 00:03:18.139
Exactly. It avoids that stuff. OK, so we get

00:03:18.139 --> 00:03:21.189
the why. It's that multimodal foundation. Now,

00:03:21.189 --> 00:03:23.229
for someone listening who's thinking, OK, I want

00:03:23.229 --> 00:03:26.610
to try this, where do they actually go? Two main

00:03:26.610 --> 00:03:30.009
places. For beginners, definitely Google AI Studio.

00:03:30.090 --> 00:03:32.750
It's free. The interface is super simple, really

00:03:32.750 --> 00:03:35.550
straightforward. You just upload your picture,

00:03:35.969 --> 00:03:37.969
type what you want changed in the text box, be

00:03:37.969 --> 00:03:40.789
specific, click Run, and then you can download

00:03:40.789 --> 00:03:43.789
it. And LMArena is still there for the comparisons

00:03:43.789 --> 00:03:45.849
if you want to really nerd out on the scores.

00:03:46.009 --> 00:03:48.629
Right, for testing. But for actually making stuff,

00:03:48.849 --> 00:03:51.560
AI Studio is the way to go. OK. But there's a

00:03:51.560 --> 00:03:53.379
catch, right? The free use. There is, yeah. It's

00:03:53.379 --> 00:03:55.520
limited right now. Usually like three to five

00:03:55.520 --> 00:03:57.680
free edits before you hit a wall. So planning

00:03:57.680 --> 00:04:00.000
your edits is really important. If you have a

00:04:00.000 --> 00:04:02.780
ton of photos, the sources suggest maybe using

00:04:02.780 --> 00:04:06.060
different Google accounts. But yeah, treat those

00:04:06.060 --> 00:04:08.719
first few tries carefully. Like gold. Got it.

00:04:09.300 --> 00:04:12.009
Okay, speaking of valuable stuff. Photo restoration.

00:04:12.550 --> 00:04:15.270
The sources really highlighted this. Fixing

00:04:15.270 --> 00:04:17.389
old family photos. That's a massive strength,

00:04:17.430 --> 00:04:19.329
again, because of that detail retention. You

00:04:19.329 --> 00:04:22.129
can give it an old, faded, scratched up black

00:04:22.129 --> 00:04:25.730
and white photo and just ask simply, like, color

00:04:25.730 --> 00:04:28.550
this photo naturally and fix the torn parts.

00:04:29.129 --> 00:04:32.050
Doesn't just slap color on. It actually rebuilds

00:04:32.050 --> 00:04:34.129
the missing bits, smooths out the damage, gets

00:04:34.129 --> 00:04:35.930
the color balance right. And crucially, like

00:04:35.930 --> 00:04:37.790
we said, it doesn't mess up the faces, does it?

00:04:38.110 --> 00:04:40.529
Older tools sometimes tweak the expression or

00:04:40.529 --> 00:04:43.089
the features when fixing damage. Right. This

00:04:43.089 --> 00:04:45.589
one keeps those original features almost perfectly

00:04:45.589 --> 00:04:48.550
preserved. That's huge for anyone trying to digitize

00:04:48.550 --> 00:04:50.790
family albums or for historical stuff. Keeps

00:04:50.790 --> 00:04:53.930
the soul of the picture. Exactly. So beyond fixing

00:04:53.930 --> 00:04:56.279
old damage, how about just basic cleanup? Getting

00:04:56.279 --> 00:04:58.160
rid of photobombers, that kind of thing. Yeah,

00:04:58.220 --> 00:05:00.660
it's great at that, too, because it's precise.

00:05:01.060 --> 00:05:03.379
You can tell it very specifically. Remove the

00:05:03.379 --> 00:05:05.819
person in the background on the left, and it

00:05:05.819 --> 00:05:08.000
intelligently takes them out without messing

00:05:08.000 --> 00:05:10.759
up the rest. OK, now this is where, for me, it

00:05:10.759 --> 00:05:13.100
gets really exciting, especially for creative

00:05:13.100 --> 00:05:15.639
work. Advanced consistency, that's always been

00:05:15.639 --> 00:05:17.980
the hard part for AI image tools. Oh, absolutely.

00:05:18.459 --> 00:05:21.779
This is where that 200 Elo lead really feels

00:05:21.779 --> 00:05:24.720
earned. Its ability to keep a person looking

00:05:24.720 --> 00:05:28.579
like the same person across really drastic changes.

00:05:29.079 --> 00:05:31.019
That's the magic. So you can give it one picture.

00:05:31.220 --> 00:05:33.939
Yep, one reference photo. And then ask for huge

00:05:33.939 --> 00:05:35.860
changes. Like put them in a totally different

00:05:35.860 --> 00:05:38.180
outfit, a beige wool jacket, maybe then a long

00:05:38.180 --> 00:05:40.420
green dress. Change the whole background, move

00:05:40.420 --> 00:05:42.980
them from inside an office to like an autumn

00:05:42.980 --> 00:05:45.839
garden. And the face. It just stays the same.

00:05:46.110 --> 00:05:49.009
Perfectly consistent through all that. Undeniably

00:05:49.009 --> 00:05:52.129
consistent. It's uncanny. Whoa. OK, that is the

00:05:52.129 --> 00:05:54.050
moment of wonder. Imagine how much time that

00:05:54.050 --> 00:05:56.569
saves for, say, game designers making character

00:05:56.569 --> 00:06:00.189
sheets. Exactly. Or concept artists. You need

00:06:00.189 --> 00:06:03.050
consistent looks for characters, whether it's

00:06:03.050 --> 00:06:06.069
anime or detailed robots. Just use text prompts

00:06:06.069 --> 00:06:08.670
for the changes and the core character doesn't

00:06:08.670 --> 00:06:12.009
drift. It's amazing. It shrinks that whole tedious

00:06:12.009 --> 00:06:14.310
process down, right? Ensuring lighting is the

00:06:14.310 --> 00:06:16.930
same, features are the same, from potentially

00:06:16.930 --> 00:06:19.850
hours to just seconds of prompting. The scalability

00:06:19.850 --> 00:06:22.310
is, yeah, it's pretty mind-blowing. And this

00:06:22.310 --> 00:06:24.709
consistency, does it carry over into things like

00:06:24.709 --> 00:06:27.009
virtual try-on? The sources mentioned that was

00:06:27.009 --> 00:06:29.769
a big plus. Totally. That's another area where

00:06:29.769 --> 00:06:31.529
it really pulls ahead of other apps right now.

00:06:31.529 --> 00:06:33.629
OK. And we're not just talking simple stuff like

00:06:33.629 --> 00:06:36.170
putting one shirt on one person. It handles more

00:06:36.170 --> 00:06:38.970
complex swaps, too. Yeah. Especially in group

00:06:38.970 --> 00:06:41.389
photos. Give me an example of a complex swap.

00:06:41.430 --> 00:06:43.769
What does that look like? OK. Picture a photo

00:06:43.769 --> 00:06:45.779
with, say, three or four people. You can tell

00:06:45.779 --> 00:06:48.939
the AI, swap the entire outfit of the man in

00:06:48.939 --> 00:06:50.779
the blue shirt with the outfit of the woman in

00:06:50.779 --> 00:06:53.240
the red shirt. But keep everyone else exactly

00:06:53.240 --> 00:06:55.660
the same. And it does it. It transfers the clothes,

00:06:55.759 --> 00:06:57.759
the style, the fit, the texture between those

00:06:57.759 --> 00:07:00.360
two people, keeps their faces right, and doesn't

00:07:00.360 --> 00:07:02.079
make a mess of the background. That's impressive.

00:07:02.920 --> 00:07:05.480
OK, so if it's that good with characters and

00:07:05.480 --> 00:07:07.740
details, where do things start to break down?

00:07:08.100 --> 00:07:09.980
When are you pushing it too far? What's the first

00:07:09.980 --> 00:07:13.100
sign? Usually style changes. That's where the

00:07:13.100 --> 00:07:16.759
cracks appear first. Especially complex artistic

00:07:16.759 --> 00:07:19.540
requests, things far away from just realism or

00:07:19.540 --> 00:07:22.139
simple photo effects. Right. Even with that amazing

00:07:22.139 --> 00:07:24.959
consistency, the sources mentioned this art gap,

00:07:25.500 --> 00:07:27.720
a limit we need to be aware of. Correct. It's

00:07:27.720 --> 00:07:30.379
great at realistic style changes. Making a photo

00:07:30.379 --> 00:07:33.100
look like it was shot on old film, easy. Turning

00:07:33.100 --> 00:07:35.610
it into a realistic-looking oil painting? It does

00:07:35.610 --> 00:07:37.930
that well. Adjusting mood and lighting too. Yeah,

00:07:38.170 --> 00:07:40.670
excellent at that. Warm sunset light, gloomy rainy

00:07:40.670 --> 00:07:43.050
day effects, no problem. But the really artsy

00:07:43.050 --> 00:07:45.370
stuff is harder. That's where it struggles. If

00:07:45.370 --> 00:07:48.089
you say make this real photo look like true cel

00:07:48.089 --> 00:07:51.290
shaded anime, the results often look more

00:07:51.290 --> 00:07:53.670
like a detailed pencil drawing, maybe with anime

00:07:53.670 --> 00:07:56.870
colors, but not the actual style. And mixing

00:07:56.870 --> 00:07:59.170
styles, that's a big failure point currently.

00:07:59.850 --> 00:08:03.269
Like our Lego analogy, asking for left half Lego

00:08:03.269 --> 00:08:06.569
style, right half Pixar style, is usually too

00:08:06.569 --> 00:08:08.769
much. Too confusing for it. Yeah, it's like trying

00:08:08.769 --> 00:08:11.910
to combine incompatible instructions. Too many

00:08:11.910 --> 00:08:14.709
style words at once. Watercolor, vintage, pop

00:08:14.709 --> 00:08:18.769
art just leads to messy, weird results. And complex

00:08:18.769 --> 00:08:21.389
scene changes too, like changing multiple objects

00:08:21.389 --> 00:08:23.290
in a big way. Right, the sources had that

00:08:23.290 --> 00:08:25.819
example. Turn all the girls into cats and boys

00:08:25.819 --> 00:08:28.920
into dogs in a group photo. That's just asking

00:08:28.920 --> 00:08:31.279
way too much right now. Too many individual changes

00:08:31.279 --> 00:08:33.919
at once. Yeah. I still wrestle with prompt drift

00:08:33.919 --> 00:08:35.779
myself sometimes. You try to combine too many

00:08:35.779 --> 00:08:38.019
things. Oh, yeah. I tried to get it to put a

00:08:38.019 --> 00:08:40.399
tiny Viking helmet on my dog and give him a glowing

00:08:40.399 --> 00:08:42.919
laser sword in his mouth in the same edit. Huh.

00:08:43.100 --> 00:08:45.820
How did that turn out? It was just... A mess.

00:08:46.240 --> 00:08:48.919
A melted, weird, inconsistent blob. It's easy

00:08:48.919 --> 00:08:50.860
to overload it. That's actually really helpful

00:08:50.860 --> 00:08:53.580
to hear because it shows it's not just user error

00:08:53.580 --> 00:08:56.639
sometimes. The tool has limits. Definitely. So

00:08:56.639 --> 00:08:58.960
if we know it struggles with those really complex

00:08:58.960 --> 00:09:01.480
multi-part edits, what's the best strategy?

00:09:01.620 --> 00:09:03.440
How do you get the result you want? You got to

00:09:03.440 --> 00:09:06.259
break it down. Always. One simple change at a

00:09:06.259 --> 00:09:09.500
time. Edit the main object first, then change

00:09:09.500 --> 00:09:11.820
the lighting, then maybe add a simple style.

00:09:12.000 --> 00:09:14.440
Atomic steps. OK, one step at a time. Makes sense.
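
NOTE
One way to picture the "one change at a time" advice is a loop that sends
a single instruction per call and keeps every intermediate image for review.
This is a sketch only: apply_edit is a hypothetical callback standing in for
whatever editor or API you use, and fake_edit exists purely so it runs.
```python
from typing import Callable, List
def run_atomic_edits(
    image: bytes,
    steps: List[str],
    apply_edit: Callable[[bytes, str], bytes],
) -> List[bytes]:
    """Apply one instruction per call and return every intermediate result."""
    history = [image]
    for step in steps:
        # Each edit starts from the previous result, never from a combined prompt.
        history.append(apply_edit(history[-1], step))
    return history
# Stand-in editor (hypothetical) that just records the instruction it received.
def fake_edit(image: bytes, instruction: str) -> bytes:
    return image + b"|" + instruction.encode()
steps = ["Edit the main object", "Change the lighting", "Add a simple style"]
results = run_atomic_edits(b"photo", steps, fake_edit)
print(len(results))  # original plus one result per step
```
Keeping the history means a bad step can be retried without redoing earlier ones.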

00:09:14.740 --> 00:09:17.799
So going back from the limits to the strengths

00:09:17.799 --> 00:09:22.179
micro-editing, tiny, precise changes. Yes. This

00:09:22.179 --> 00:09:24.440
is where that Gemini Flash precision really comes

00:09:24.440 --> 00:09:26.379
through. Things like just changing eye color

00:09:26.379 --> 00:09:29.240
or adding just a little smile, stuff that other

00:09:29.240 --> 00:09:31.879
tools might use as an excuse to redo the whole

00:09:31.879 --> 00:09:35.179
face. Exactly. Nano Banana handles that delicately.

00:09:35.460 --> 00:09:37.519
And the consistency holds up so well, you can

00:09:37.519 --> 00:09:40.139
actually generate like a grid of nine different

00:09:40.139 --> 00:09:43.240
facial expressions. Happy, sad, angry, surprised

00:09:43.240 --> 00:09:45.960
from one starting photo. Wow. And the person

00:09:45.960 --> 00:09:47.919
still looks like the same person underneath with

00:09:47.919 --> 00:09:49.940
the same features, same lighting. That's super

00:09:49.940 --> 00:09:52.100
valuable for professional uses. How about text

00:09:52.100 --> 00:09:54.240
in images? That's usually a nightmare for AI.

00:09:54.759 --> 00:09:58.259
It's surprisingly decent here, though language

00:09:58.259 --> 00:10:00.919
translation can be hit or miss. But simple replacement

00:10:00.919 --> 00:10:03.419
works pretty well. Changing a sign from welcome

00:10:03.419 --> 00:10:06.259
home to happy birthday. It can often do that

00:10:06.259 --> 00:10:08.639
while keeping the original font style and perspective.

00:10:09.320 --> 00:10:11.700
It seems to treat the text like an object that

00:10:11.700 --> 00:10:13.899
needs to be preserved or changed carefully. OK, so

00:10:13.899 --> 00:10:15.860
to really leverage all this power, you need good

00:10:15.860 --> 00:10:18.580
prompts. The sources laid out a structure, a

00:10:18.580 --> 00:10:20.639
way to get better results, especially with those

00:10:20.639 --> 00:10:23.059
limited free uses. Yeah, a simple three-part

00:10:23.059 --> 00:10:27.080
structure helps you think clearly. First, the

00:10:27.080 --> 00:10:29.500
action. What are you changing? Change the car

00:10:29.500 --> 00:10:33.980
color to red. OK. Second, the constraint. What

00:10:33.980 --> 00:10:35.980
needs to stay the same? Keep the background and

00:10:35.980 --> 00:10:38.259
the driver exactly the same. Constraint. Got

00:10:38.259 --> 00:10:41.320
it. And third, the style. What's the desired

00:10:41.320 --> 00:10:43.919
quality or feel? Make it look like a professional

00:10:43.919 --> 00:10:47.320
car ad. Action, constraint, style. That structure

00:10:47.320 --> 00:10:49.860
helps avoid those accidental changes. The drift.
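
NOTE
The action, constraint, style structure can be sketched as a small template.
EditPrompt and render are hypothetical names chosen for illustration; the
resulting string is plain text you would paste into the prompt box.
```python
from dataclasses import dataclass
@dataclass
class EditPrompt:
    action: str      # what to change
    constraint: str  # what must stay the same
    style: str       # desired quality or feel
    def render(self) -> str:
        """Join the three parts into one prompt, one sentence per part."""
        parts = (self.action, self.constraint, self.style)
        return " ".join(p.strip().rstrip(".") + "." for p in parts)
# The car example from the conversation.
prompt = EditPrompt(
    action="Change the car color to red",
    constraint="Keep the background and the driver exactly the same",
    style="Make it look like a professional car ad",
)
print(prompt.render())
```
Filling all three fields every time makes it harder to forget the constraint.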

00:10:50.159 --> 00:10:52.559
Exactly. It forces clarity. And since those free

00:10:52.559 --> 00:10:56.080
turns are precious, the advice is plan your edits.

00:10:56.350 --> 00:10:58.809
Use clear, simple language, and definitely save

00:10:58.809 --> 00:11:00.769
prompts that work well so you can reuse them.

00:11:01.190 --> 00:11:03.649
Smart. And always double check the output. Absolutely.

00:11:03.990 --> 00:11:07.490
Final QC is crucial. Check the face consistency

00:11:07.490 --> 00:11:09.789
again. Did it actually make all the changes you

00:11:09.789 --> 00:11:12.049
asked for? Sometimes it misses one. And look

00:11:12.049 --> 00:11:14.409
carefully near the edges of your edit. Did anything

00:11:14.409 --> 00:11:15.990
weird happen in the background that you didn't

00:11:15.990 --> 00:11:18.970
intend? Good checklist. So this amazing tech

00:11:18.970 --> 00:11:22.169
is in Google AI Studio now, but the sources were

00:11:22.169 --> 00:11:24.690
clear. This is just the flash version. Right.

00:11:24.809 --> 00:11:27.009
That's the key point. This is the fast accessible

00:11:27.009 --> 00:11:29.190
version, the one winning the Elo scores. But

00:11:29.190 --> 00:11:31.610
there's likely a more powerful full version of

00:11:31.610 --> 00:11:34.730
Gemini behind it, maybe still to come. And developers

00:11:34.730 --> 00:11:37.149
can already tap into this? Yep. The API is available

00:11:37.149 --> 00:11:39.350
so people can build tools that integrate this,

00:11:39.789 --> 00:11:42.450
allow editing through, like chatting with the

00:11:42.450 --> 00:11:45.250
AI, or combine multiple images in really sophisticated

00:11:45.250 --> 00:11:47.950
ways. The potential is huge. OK, so let's boil

00:11:47.950 --> 00:11:50.330
it down. The big takeaway here. I'd say it's

00:11:50.330 --> 00:11:52.990
that Nano Banana's multimodal core gives it just

00:11:53.070 --> 00:11:56.970
unmatched consistency, especially for tiny edits

00:11:56.970 --> 00:11:59.750
and keeping characters the same across big changes.

00:11:59.909 --> 00:12:02.809
Right, but the challenge, the thing users need

00:12:02.809 --> 00:12:05.210
to remember... You have to respect its limits,

00:12:05.669 --> 00:12:08.950
especially with complex art styles, and always,

00:12:09.190 --> 00:12:12.629
always break down big ideas into small, simple

00:12:12.629 --> 00:12:14.870
steps. Don't try to do everything at once. So

00:12:14.870 --> 00:12:17.169
the call to action is... Pretty clear. Yeah.

00:12:17.250 --> 00:12:19.590
Head over to Google AI Studio. Try some simple

00:12:19.590 --> 00:12:22.289
edits first. Just feel that consistency. It really

00:12:22.289 --> 00:12:24.909
does change how you think about what AI can do

00:12:24.909 --> 00:12:27.590
with images. And maybe a final thought to leave

00:12:27.590 --> 00:12:31.539
people with, that 200 Elo gap. It suggests this

00:12:31.539 --> 00:12:34.240
isn't just a small step forward. It feels like

00:12:34.240 --> 00:12:36.899
a fundamental shift in defining what realistic

00:12:36.899 --> 00:12:39.539
even means for digital images, right? It really

00:12:39.539 --> 00:12:42.179
does. What other fields, you know, beyond photos,

00:12:42.460 --> 00:12:44.600
architecture, product design, might get completely

00:12:44.600 --> 00:12:47.539
reshaped by this kind of seamless, super precise

00:12:47.539 --> 00:12:50.379
AI integration? Something to think about. Definitely

00:12:50.379 --> 00:12:52.360
something to ponder. Thank you for sharing your

00:12:52.360 --> 00:12:54.379
sources with us today. We really hope this deep

00:12:54.379 --> 00:12:56.019
dive was useful for you. We'll catch you on the

00:12:56.019 --> 00:12:56.360
next one.
