WEBVTT

00:00:00.000 --> 00:00:02.759
Imagine for a second that you could take every

00:00:02.759 --> 00:00:05.580
single Harry Potter book, all seven of them,

00:00:06.059 --> 00:00:09.000
stack them up, hand them to someone. Now imagine

00:00:09.000 --> 00:00:11.619
that person could read every single word just

00:00:11.619 --> 00:00:14.490
instantly. But more importantly, imagine that

00:00:14.490 --> 00:00:16.510
while they're reading the very last sentence

00:00:16.510 --> 00:00:19.370
of the last book, they still perfectly remember

00:00:19.370 --> 00:00:21.670
the first sentence of the first book. That is

00:00:21.670 --> 00:00:24.429
the scale we're talking about here. It's a staggering

00:00:24.429 --> 00:00:26.390
amount of information to hold in your head at

00:00:26.390 --> 00:00:28.750
once, without getting fuzzy on any of the details.

00:00:28.890 --> 00:00:31.510
It really is. And that concept, that ability

00:00:31.510 --> 00:00:33.810
to hold all that information without losing the

00:00:33.810 --> 00:00:36.130
thread, is technically called a context window.

00:00:36.490 --> 00:00:38.509
It's pretty central to what we're exploring today.

00:00:39.450 --> 00:00:42.469
Welcome back to the Deep Dive. Today we are unpacking

00:00:42.469 --> 00:00:44.670
Google Gemini Advanced. It's good to be here.

00:00:44.750 --> 00:00:47.450
This is a big one. It is, yeah. And looking at

00:00:47.450 --> 00:00:49.170
the numbers, the interest is definitely there.

00:00:49.350 --> 00:00:51.850
The user base for Gemini has, what, quadrupled

00:00:51.850 --> 00:00:54.429
in just a year? That's massive growth. But digging

00:00:54.429 --> 00:00:57.240
into the reports, there's this... this interesting

00:00:57.240 --> 00:00:59.340
tension. We have millions of people using these

00:00:59.340 --> 00:01:02.659
tools, but there's a massive gap between knowing

00:01:02.659 --> 00:01:05.319
a feature exists, like, oh, I heard it can read

00:01:05.319 --> 00:01:07.439
a PDF, and actually weaving that feature into

00:01:07.439 --> 00:01:09.939
a daily workflow. That's the classic adoption

00:01:09.939 --> 00:01:12.780
curve, right? We treat these tools like novelties

00:01:12.780 --> 00:01:16.159
or just really big search engines. We ask a question.

00:01:16.159 --> 00:01:19.219
We get an answer. It's transactional. But the

00:01:19.219 --> 00:01:21.219
shift we want to talk about today and what the

00:01:21.219 --> 00:01:23.859
source material really emphasizes is moving from

00:01:23.859 --> 00:01:27.569
just chatting with a bot to building a persistent

00:01:27.569 --> 00:01:29.870
personal assistant. Which is a heavy promise.

00:01:30.090 --> 00:01:32.030
So to keep us on track, we have a bit of a roadmap.

00:01:32.329 --> 00:01:34.109
We're going to look at three specific pillars

00:01:34.109 --> 00:01:37.030
from the documentation. First, multimodality,

00:01:37.170 --> 00:01:38.909
which is a fancy word we will definitely break

00:01:38.909 --> 00:01:42.090
down. Second, the new thinking model versus the

00:01:42.090 --> 00:01:44.489
fast model. And third, creating these custom

00:01:44.489 --> 00:01:46.870
gems. And we'll wrap it all up with a master

00:01:46.870 --> 00:01:50.189
class example, a gardening workflow that actually

00:01:50.189 --> 00:01:53.870
ties all these disparate pieces together into

00:01:53.870 --> 00:01:55.939
something really practical. So let's start with

00:01:55.939 --> 00:01:59.560
that first pillar, multimodality. To me, this

00:01:59.560 --> 00:02:01.959
often sounds a bit like marketing jargon. When

00:02:01.959 --> 00:02:04.159
you strip it back, what are we actually talking

00:02:04.159 --> 00:02:07.439
about here in terms of utility? It sounds complex,

00:02:07.439 --> 00:02:10.000
but it's actually the most human part of the

00:02:10.000 --> 00:02:12.479
AI. I mean, think about how you and I interact

00:02:12.479 --> 00:02:14.219
with the world right now. We don't just read

00:02:14.219 --> 00:02:16.960
text floating in a void. We see things. We hear

00:02:16.960 --> 00:02:20.039
sounds. We watch movement. Multimodality just

00:02:20.039 --> 00:02:22.930
means the AI can do that too. It isn't limited

00:02:22.930 --> 00:02:25.229
to text anymore. It can process photos. It can

00:02:25.229 --> 00:02:28.050
watch videos. It can listen to audio. So it's

00:02:28.050 --> 00:02:30.849
effectively breaking that text-only barrier.

00:02:31.150 --> 00:02:33.650
Exactly. And the efficiency gain here is massive.

00:02:33.849 --> 00:02:36.169
Think about the friction of trying to describe

00:02:36.169 --> 00:02:38.949
a photo to a computer. You're typing, OK, there's

00:02:38.949 --> 00:02:40.610
a tree in the left corner, and the light is hitting

00:02:40.610 --> 00:02:43.069
it this way. It's just tedious. With multimodality,

00:02:43.250 --> 00:02:45.689
you don't describe the context. You just show

00:02:45.689 --> 00:02:48.229
it. You just show it. And this ties back to that

00:02:48.229 --> 00:02:51.259
opening hook we had, the context window. The

00:02:51.259 --> 00:02:53.620
source material mentions Gemini Advanced has

00:02:53.620 --> 00:02:57.199
a capacity of 1 million tokens. We use the Harry

00:02:57.199 --> 00:02:59.719
Potter analogy, which is vivid, but technically,

00:02:59.919 --> 00:03:02.979
what is a token? Good catch. A token is basically

00:03:02.979 --> 00:03:06.319
a unit of text, roughly four characters. So a

00:03:06.319 --> 00:03:11.120
million tokens is about 700,000 words. Which

00:03:11.120 --> 00:03:13.520
is hard to wrap your head around. It is. But

00:03:13.520 --> 00:03:15.680
practically, it solves the goldfish memory problem.

00:03:16.250 --> 00:03:18.949
Most older models, if you gave them a long document

00:03:18.949 --> 00:03:20.629
by page 50, they've kind of forgotten what was

00:03:20.629 --> 00:03:23.229
on page one. With a million tokens, you can dump

00:03:23.229 --> 00:03:25.870
a massive amount of data in, and it retains

00:03:25.870 --> 00:03:27.810
high-fidelity recall from start to finish. Let's

00:03:27.810 --> 00:03:29.650
look at a concrete example from the research

00:03:29.650 --> 00:03:31.370
to see if this holds up. Say I have to learn

00:03:31.370 --> 00:03:34.009
about quantum computing, and I have a dense,

00:03:34.330 --> 00:03:38.289
79-page PDF. It's dry, it's academic, it's painful.

00:03:38.430 --> 00:03:40.469
The nightmare scenario. Right. So I drag that

00:03:40.469 --> 00:03:42.569
PDF into Gemini, and because of that massive

00:03:42.569 --> 00:03:44.889
context window, I can ask it, explain this to

00:03:44.889 --> 00:03:46.590
me like I'm a five-year-old. And it works.

00:03:46.750 --> 00:03:48.909
It synthesizes the whole thing. But here's my

00:03:48.909 --> 00:03:51.949
pushback on that. It reads the text. It summarizes

00:03:51.949 --> 00:03:54.590
it. That's great for saving time. But how does

00:03:54.590 --> 00:03:57.289
that actually help us learn a complex topic?

00:03:57.770 --> 00:03:59.949
Because just reading a summary isn't the same

00:03:59.949 --> 00:04:01.990
as understanding the mechanics of it. That's

00:04:01.990 --> 00:04:04.509
a crucial distinction. And if you just ask for

00:04:04.509 --> 00:04:07.110
a summary, you are only getting surface-level

00:04:07.110 --> 00:04:11.069
info. The power is in interrogation. The source

00:04:11.069 --> 00:04:13.889
material suggests asking the AI to transform

00:04:13.889 --> 00:04:17.069
that static text into interactive tools. Interactive

00:04:17.069 --> 00:04:19.870
tools? Yeah. You aren't just getting a book report

00:04:19.870 --> 00:04:22.310
back. You can ask it to generate an infographic

00:04:22.310 --> 00:04:25.089
based on the text. Or, and this is where it gets

00:04:25.089 --> 00:04:27.709
wild, you can ask it to create a simple interactive

00:04:27.709 --> 00:04:30.750
simulation. Code, essentially. It lets you play

00:04:30.750 --> 00:04:33.170
with the variables mentioned in the PDF. It turns

00:04:33.170 --> 00:04:36.050
passive reading into active simulation. So I

00:04:36.050 --> 00:04:38.310
could ask... Based on this paper, write me a

00:04:38.310 --> 00:04:40.569
Python script that simulates particle spin. And

00:04:40.569 --> 00:04:43.050
it'll do it. You run the code, and suddenly you're

00:04:43.050 --> 00:04:45.189
seeing the concept in action rather than just

00:04:45.189 --> 00:04:47.730
reading about it. It's bridging that gap between

00:04:47.730 --> 00:04:50.529
information and real understanding. That's fascinating.

00:04:50.709 --> 00:04:52.350
OK, so we've got text and documents covered,

00:04:52.610 --> 00:04:54.910
but I want to pivot to the second pillar, which

00:04:54.910 --> 00:04:57.389
I think is even more mind-bending. The ability

00:04:57.389 --> 00:05:00.889
to watch. Video analysis. The source makes a

00:05:00.889 --> 00:05:03.529
really crucial distinction here. When we say

00:05:03.529 --> 00:05:05.829
Gemini watches a video, we aren't just saying

00:05:05.829 --> 00:05:07.930
it reads the automated captions, are we? Because

00:05:07.930 --> 00:05:09.709
I can do that. I can just read a transcript.

00:05:09.970 --> 00:05:12.490
Correct. And that is the game changer. Most tools

00:05:12.490 --> 00:05:16.910
just scrape the transcript text. Gemini is processing

00:05:16.910 --> 00:05:20.089
the pixels. It's using its native vision capabilities,

00:05:21.110 --> 00:05:23.610
analyzing the visual information frame by frame.

00:05:23.850 --> 00:05:26.629
I have to pause on that because... Just imagine

00:05:26.629 --> 00:05:28.730
the implication. It's not just hearing what is

00:05:28.730 --> 00:05:31.310
said. It's seeing the editing speed. It's seeing

00:05:31.310 --> 00:05:34.110
the color grading, the facial expressions. It

00:05:34.110 --> 00:05:35.870
captures the whole texture of the video, not

00:05:35.870 --> 00:05:37.910
just the content. So let's look at a use case.

00:05:38.209 --> 00:05:40.750
The guide mentions learning from YouTube. Say

00:05:40.750 --> 00:05:42.670
I want to make a video and I find a tutorial

00:05:42.670 --> 00:05:46.110
on how to make viral AI videos, but it's super

00:05:46.110 --> 00:05:48.529
fast-paced. It's chaotic. Right, the kind where

00:05:48.529 --> 00:05:50.129
you're pausing every two seconds just trying

00:05:50.129 --> 00:05:52.750
to catch up. Exactly. So you can paste that link

00:05:52.750 --> 00:05:55.430
into Gemini. What happens next? You just say,

00:05:55.750 --> 00:05:58.670
watch this and write a prompt for me to recreate

00:05:58.670 --> 00:06:01.589
this exact style. Because it's watching the pixels,

00:06:01.990 --> 00:06:04.550
it understands the visual language, the pacing,

00:06:04.829 --> 00:06:06.910
the aesthetic, and it gives you a command you

00:06:06.910 --> 00:06:09.410
can use to replicate it. It's like having a director

00:06:09.410 --> 00:06:12.029
analyzing the film for you. There was another

00:06:12.029 --> 00:06:13.689
example for creators that I thought was really

00:06:13.689 --> 00:06:17.149
sharp. If you're a YouTuber or even just doing

00:06:17.149 --> 00:06:19.629
marketing videos for work, you can upload your

00:06:19.629 --> 00:06:22.439
top five performing videos. This is where it

00:06:22.439 --> 00:06:24.699
becomes an analyst. You ask it, look at these

00:06:24.699 --> 00:06:27.420
five videos. Why did they succeed? And because

00:06:27.420 --> 00:06:30.100
it sees the video, it doesn't just say, oh, the

00:06:30.100 --> 00:06:32.779
topic was good. It looks at the structure. It

00:06:32.779 --> 00:06:35.680
might say, in all five videos, you used a rapid

00:06:35.680 --> 00:06:38.480
visual hook in the first three seconds. Or your

00:06:38.480 --> 00:06:40.959
editing pace accelerated at the one minute mark,

00:06:41.240 --> 00:06:43.779
which kept retention high. It's spotting patterns

00:06:43.779 --> 00:06:45.600
that you might not even realize you're doing.

00:06:45.800 --> 00:06:48.500
Precisely. It's objective feedback based on visual

00:06:48.500 --> 00:06:51.800
data. OK, now. This works great for immediate

00:06:51.800 --> 00:06:56.040
analysis, checking a few videos, a PDF. But what

00:06:56.040 --> 00:06:58.920
if I have a year's worth of scripts? What if

00:06:58.920 --> 00:07:01.459
I have a huge archive of data that I don't want

00:07:01.459 --> 00:07:03.639
to re-upload every single time. The context

00:07:03.639 --> 00:07:05.980
window is big, but it's not infinite. That's

00:07:05.980 --> 00:07:07.759
when you need to change tools. You connect it

00:07:07.759 --> 00:07:10.439
to Notebook LM to give the AI that long-term

00:07:10.439 --> 00:07:13.519
memory of your entire creative history. Notebook

00:07:13.519 --> 00:07:15.860
LM is often mentioned alongside Gemini. Can you

00:07:15.860 --> 00:07:18.199
just clarify the relationship there? Think of

00:07:18.199 --> 00:07:20.560
Gemini as the processor, the brain that's thinking

00:07:20.560 --> 00:07:23.139
right now. Think of Notebook LM as the library.

00:07:23.740 --> 00:07:26.600
You store your deep archives, hundreds of PDFs,

00:07:26.699 --> 00:07:29.819
old scripts, research notes in Notebook LM. Then

00:07:29.819 --> 00:07:31.740
when you're chatting in Gemini, you can reference

00:07:31.740 --> 00:07:33.639
that library without having to re-upload it.

00:07:33.939 --> 00:07:36.839
It grounds the AI in your specific history. Okay,

00:07:36.939 --> 00:07:38.600
moving on. I want to talk about something that

00:07:38.600 --> 00:07:41.500
I think everyone can relate to. The pain of repetition.

00:07:41.819 --> 00:07:44.339
Oh, absolutely. I still wrestle with this myself.

00:07:44.860 --> 00:07:46.800
I find a prompt that works, but then I have to

00:07:46.800 --> 00:07:49.560
type it out or copy paste it from a note. And

00:07:49.560 --> 00:07:52.259
I have to remind the AI of the formatting rules

00:07:52.259 --> 00:07:55.560
every single time. Don't use bullet points. Use

00:07:55.560 --> 00:07:59.279
a table. Don't do this. Do that. It's just exhausting.

00:07:59.540 --> 00:08:01.939
It's the friction that kills adoption. If it

00:08:01.939 --> 00:08:04.819
feels like work to ask the AI to do the work,

00:08:05.139 --> 00:08:07.750
you just stop doing it. So enter gems. Gems.

00:08:07.949 --> 00:08:10.209
These are described as custom versions of the

00:08:10.209 --> 00:08:13.269
AI, but how is that different from just, you

00:08:13.269 --> 00:08:16.370
know, a saved chat? Think of a gem as a preset

00:08:16.370 --> 00:08:19.170
persona. It's a version of Gemini where you have

00:08:19.170 --> 00:08:22.149
preloaded a specific set of system instructions

00:08:22.149 --> 00:08:24.730
that it never ever forgets. You're effectively

00:08:24.730 --> 00:08:26.930
programming the AI, but you're using natural

00:08:26.930 --> 00:08:28.970
language instead of code. The example in the

00:08:28.970 --> 00:08:31.589
deep dive material is the accountant gem. I think

00:08:31.589 --> 00:08:33.490
this illustrates it perfectly. It's so simple,

00:08:33.549 --> 00:08:35.909
but so effective. The problem is universal. You

00:08:35.909 --> 00:08:37.210
come back from a business trip, your pockets

00:08:37.210 --> 00:08:39.710
are full of crumpled receipts. It's a mess. The

00:08:39.710 --> 00:08:42.190
worst part of any trip. Easily. So the old way

00:08:42.190 --> 00:08:45.149
is, you take a photo, upload it, and you type

00:08:45.149 --> 00:08:47.909
a paragraph. Please extract the dates, amounts,

00:08:47.990 --> 00:08:51.110
and vendors. Put them in a table, categorize

00:08:51.110 --> 00:08:53.149
them. You have to type that every time. Which

00:08:53.149 --> 00:08:55.629
is so annoying. Right. With a gem, you build

00:08:55.629 --> 00:08:58.529
it once. You call it Expense Helper. And inside

00:08:58.529 --> 00:09:00.889
that gem, you write those instructions one time.

00:09:01.120 --> 00:09:04.440
Extract the numbers, format them into an Excel table,

00:09:04.440 --> 00:09:08.019
categorize them as food, travel, or hotel. Do not wait for me

00:09:08.019 --> 00:09:10.659
to ask. So the next time you open that gem, you

00:09:10.659 --> 00:09:12.980
drop the photo, you say absolutely nothing. Silence.

00:09:13.259 --> 00:09:15.820
Silence. It just does the job. It knows its role.

00:09:16.039 --> 00:09:17.600
It's like walking into your office and handing

00:09:17.600 --> 00:09:19.220
a file to an assistant who's worked with you

00:09:19.220 --> 00:09:21.620
for 10 years. You don't need to explain what

00:09:21.620 --> 00:09:23.139
to do with it. They already know the protocol.

00:09:23.259 --> 00:09:26.179
That is the shift from prompting every time to

00:09:26.179 --> 00:09:28.759
delegating to a system that you've built. Exactly.

00:09:29.019 --> 00:09:31.320
And you could do this for anything: a coding

00:09:31.320 --> 00:09:34.240
buddy that always comments your code, a writing

00:09:34.240 --> 00:09:36.059
editor that always checks for passive voice.

00:09:36.460 --> 00:09:38.480
You build the tool once, you use it forever.

00:09:38.919 --> 00:09:41.240
But, okay, let's say we have the data organized.

00:09:41.639 --> 00:09:43.659
The receipts are in a table, but that's just

00:09:43.659 --> 00:09:46.960
data entry. Where do we actually do the work

00:09:46.960 --> 00:09:49.919
of refining it? Where do we turn that data into,

00:09:49.919 --> 00:09:53.059
say, a report or a script? We move to Canvas,

00:09:53.919 --> 00:09:55.940
which the documentation describes as a hybrid

00:09:55.940 --> 00:09:58.399
between a Google Doc and a code editor. Canvas

00:09:58.399 --> 00:10:01.139
seems to be the answer to the chat interface

00:10:01.139 --> 00:10:03.720
problem because chatting is actually terrible

00:10:03.720 --> 00:10:06.299
for editing documents. It is. Chat is linear.

00:10:06.580 --> 00:10:09.940
You ask, it answers. If you want to change one

00:10:09.940 --> 00:10:11.659
sentence in the middle of a generated email,

00:10:12.000 --> 00:10:13.820
you usually have to ask it to rewrite the whole

00:10:13.820 --> 00:10:15.679
thing and then hope it doesn't mess up the parts

00:10:15.679 --> 00:10:18.289
you liked. Or you just copy paste it into Word

00:10:18.289 --> 00:10:20.549
and edit it yourself. Right. Canvas solves that.

00:10:20.669 --> 00:10:22.429
It opens a separate window right next to the

00:10:22.429 --> 00:10:24.009
chat. So you have your conversation on the left

00:10:24.009 --> 00:10:26.049
and the actual document on the right. So it looks

00:10:26.049 --> 00:10:28.750
more like a document editor. Yes. So let's say

00:10:28.750 --> 00:10:30.649
you're writing a video script based on those

00:10:30.649 --> 00:10:33.049
receipts. You generate the draft. It appears

00:10:33.049 --> 00:10:35.769
in the Canvas window. Now, you can highlight

00:10:35.769 --> 00:10:38.129
just the introduction and type, make this punchier,

00:10:38.590 --> 00:10:40.730
or highlight a technical paragraph and say, remove

00:10:40.730 --> 00:10:43.769
the jargon. It's surgical. It is. And crucially,

00:10:44.110 --> 00:10:46.210
you aren't losing the context of the rest of

00:10:46.210 --> 00:10:48.610
the document. It's acting like a collaborative

00:10:48.610 --> 00:10:51.129
editor sitting next to you rather than a bot

00:10:51.129 --> 00:10:54.590
just throwing text at you. Now, within this ecosystem,

00:10:54.870 --> 00:10:57.210
there's also a distinction made between the models

00:10:57.210 --> 00:10:59.889
themselves. We have the fast model and the thinking

00:10:59.889 --> 00:11:02.309
model. I feel like most people just leave it

00:11:02.309 --> 00:11:05.149
on default and never touch this. Which is a mistake.

00:11:06.090 --> 00:11:08.210
Because they are fundamentally different tools.

00:11:08.269 --> 00:11:12.000
How so? The fast model is... Well, it's fast.

00:11:12.460 --> 00:11:15.019
It uses what psychologists call system one thinking.

00:11:15.440 --> 00:11:18.580
It's instinctive, rapid, pattern matching. Great

00:11:18.580 --> 00:11:21.220
for brainstorming or quick chats. But the thinking

00:11:21.220 --> 00:11:23.879
model uses system two. System two being that

00:11:23.879 --> 00:11:26.259
slow, deliberative, logical part of the brain.

00:11:26.419 --> 00:11:28.559
Exactly. When you switch to the thinking model,

00:11:28.879 --> 00:11:31.700
you'll actually notice the AI pauses. It might

00:11:31.700 --> 00:11:33.919
take 10 or 15 seconds before it even starts typing.

00:11:34.179 --> 00:11:36.580
What's it doing during that pause? It's reasoning.

00:11:37.279 --> 00:11:39.990
It's mapping out a chain of thought. If you ask

00:11:39.990 --> 00:11:42.830
it a complex logic puzzle or a math problem,

00:11:43.370 --> 00:11:46.429
or to plan a travel itinerary with five different

00:11:46.429 --> 00:11:49.529
constraints, the thinking model maps out the

00:11:49.529 --> 00:11:52.210
steps before it generates the final answer. It

00:11:52.210 --> 00:11:54.629
actually checks its own work. The guide compares

00:11:54.629 --> 00:11:57.070
it to choosing the right tool to fix a car. You

00:11:57.070 --> 00:11:59.509
don't use a hammer for everything. Exactly. My

00:11:59.509 --> 00:12:02.409
recommendation: keep the thinking model as the default

00:12:02.559 --> 00:12:05.220
for anything complex. If you're doing real work,

00:12:05.220 --> 00:12:08.240
coding, planning, analysis, the extra 10 seconds

00:12:08.240 --> 00:12:10.519
of wait time is worth the accuracy. If you're

00:12:10.519 --> 00:12:13.240
just chatting about a movie, use fast. So if

00:12:13.240 --> 00:12:15.620
we zoom out, we have multimodality, which is

00:12:15.620 --> 00:12:17.600
the eyes and ears. We have the thinking model,

00:12:17.639 --> 00:12:20.320
the brain. We have gems for the specialized training.

00:12:20.639 --> 00:12:22.700
And Canvas is the workspace. The full stack.

00:12:22.860 --> 00:12:24.460
It feels like a lot of separate tools when we

00:12:24.460 --> 00:12:26.179
list them like that. I want to see how they fit

00:12:26.179 --> 00:12:28.519
together. The source material lays out a smart

00:12:28.519 --> 00:12:31.039
garden workflow that I think perfectly illustrates

00:12:31.039 --> 00:12:33.639
the stacking concept. This is my favorite part

00:12:33.639 --> 00:12:35.700
because it takes someone with zero knowledge

00:12:35.700 --> 00:12:38.519
and gives them a result that looks like an expert

00:12:38.519 --> 00:12:41.129
produced it. OK, so let's walk through it. Step

00:12:41.129 --> 00:12:43.909
one, you want to grow a garden. You're in California.

00:12:44.029 --> 00:12:46.470
You have a balcony. You know nothing about plants.

00:12:46.690 --> 00:12:50.529
What do you do? Step one is multimodality. Don't

00:12:50.529 --> 00:12:53.029
type, I have a balcony that faces south. Just

00:12:53.029 --> 00:12:55.789
go outside, take a photo, upload it to Gemini,

00:12:56.250 --> 00:12:58.710
ask what grows here in California weather. So

00:12:58.710 --> 00:13:01.970
the AI analyzes the light, the space, the context

00:13:01.970 --> 00:13:04.450
right from the pixels. Right. It suggests, say,

00:13:04.610 --> 00:13:06.230
strawberries and lettuce because it sees you

00:13:06.230 --> 00:13:09.070
have limited floor space, but good railing space.

00:13:09.289 --> 00:13:12.549
Okay. Step two? Visualization. This is the motivation

00:13:12.549 --> 00:13:15.190
step. You ask, show me a picture of what this

00:13:15.190 --> 00:13:17.370
will look like when it's fully grown. It uses

00:13:17.370 --> 00:13:20.049
image generation to show you a lush green balcony.

00:13:20.289 --> 00:13:21.809
Now you're excited. You can actually see the

00:13:21.809 --> 00:13:23.970
goal. I'm motivated, but I still don't know how

00:13:23.970 --> 00:13:26.929
to keep anything alive. Step three is deep research

00:13:26.929 --> 00:13:29.389
with a thinking model. You ask for a detailed

00:13:29.389 --> 00:13:31.830
plan. Create a weekly schedule telling me when

00:13:31.830 --> 00:13:34.070
to plant, when to fertilize, when to harvest.

00:13:34.549 --> 00:13:37.370
It digests all that agricultural data into a

00:13:37.370 --> 00:13:39.429
structured timeline for you. Now this is where

00:13:39.429 --> 00:13:41.870
most people would stop. They have the plan, they

00:13:41.870 --> 00:13:44.169
print it out, they lose it, and the plants die.

00:13:44.649 --> 00:13:47.149
Exactly. But we're going to build a system. Step

00:13:47.149 --> 00:13:50.870
four. Create a gem. You call it garden expert.

00:13:51.570 --> 00:13:53.870
You upload that plan and that original photo

00:13:53.870 --> 00:13:56.549
of your balcony into the gem's knowledge base.

00:13:56.850 --> 00:14:00.169
So the gem knows your specific garden. It knows

00:14:00.169 --> 00:14:01.990
your garden and the instruction you give it is,

00:14:02.370 --> 00:14:05.029
you are the caretaker of this balcony. When I

00:14:05.029 --> 00:14:07.370
send a photo of a sick plant, tell me how to

00:14:07.370 --> 00:14:09.929
fix it naturally based on the plants I own. That

00:14:09.929 --> 00:14:12.470
is brilliant. So two months later, the strawberry

00:14:12.470 --> 00:14:14.529
leaves are turning yellow. You don't Google yellow

00:14:14.529 --> 00:14:17.549
leaves. No. You snap a photo, you drop it in

00:14:17.549 --> 00:14:19.769
your gem, and it says, hey, remember those strawberries

00:14:19.769 --> 00:14:22.070
on the south rail? They're nitrogen deficient.

00:14:22.330 --> 00:14:25.529
Add coffee grounds. It's context-aware troubleshooting.

00:14:25.730 --> 00:14:28.509
And finally, step five. Management via Canvas.

00:14:28.809 --> 00:14:31.289
You ask it to create a visual dashboard, a tracking

00:14:31.289 --> 00:14:33.190
table for watering and harvesting that you can

00:14:33.190 --> 00:14:35.889
edit. You just keep that open to track your progress.

00:14:36.269 --> 00:14:38.690
So in minutes, you went from zero knowledge to

00:14:38.690 --> 00:14:41.990
a personalized plan, a visual goal, a custom

00:14:41.990 --> 00:14:44.730
troubleshooter, and a management system. That's

00:14:44.730 --> 00:14:46.909
the power of the stack. You aren't just chatting.

00:14:47.250 --> 00:14:49.929
You are orchestrating. So the core shift here,

00:14:50.549 --> 00:14:53.539
it really is moving from just asking the AI a

00:14:53.539 --> 00:14:57.700
question to stacking the AI's skills into a workflow.

00:14:57.919 --> 00:15:00.039
Exactly. It's not a robot you chat with. It's

00:15:00.039 --> 00:15:02.019
a multi -skilled assistant you build workflows

00:15:02.019 --> 00:15:05.120
for. You're the architect. The AI is the builder.

00:15:05.320 --> 00:15:07.480
I love that framing. It makes it feel much more

00:15:07.480 --> 00:15:09.600
active. But we've thrown a lot of features at

00:15:09.600 --> 00:15:12.039
you today. If you take away just one thing, what

00:15:12.039 --> 00:15:15.480
is the big idea here? The big idea is a mindset

00:15:15.480 --> 00:15:18.700
shift. It's resisting the urge to be overwhelmed.

00:15:19.419 --> 00:15:21.940
Don't try to change your entire work life tomorrow.

00:15:22.179 --> 00:15:24.100
Don't try to build the accountant gem and the

00:15:24.100 --> 00:15:26.879
garden gem and a coding gem all at once. Right.

00:15:27.100 --> 00:15:29.539
Start small. Pick one thing. One repetitive,

00:15:29.919 --> 00:15:32.320
boring, daily task. Maybe it's meal planning.

00:15:32.480 --> 00:15:34.940
Maybe it's summarizing that weekly meeting no

00:15:34.940 --> 00:15:36.840
one pays attention to. Maybe it's an expense

00:15:36.840 --> 00:15:38.820
report. It's one thing that annoys you. And build

00:15:38.820 --> 00:15:41.139
one simple gem for it. That's it. And once you

00:15:41.139 --> 00:15:43.299
save time on that one thing, the utility of the

00:15:43.299 --> 00:15:46.000
other features, multimodality, canvas, the thinking

00:15:46.000 --> 00:15:48.259
model, it all becomes obvious. It just clicks.

00:15:48.519 --> 00:15:50.279
You realize, oh, I can use this for that other

00:15:50.279 --> 00:15:52.480
thing, too. And suddenly, you aren't just a user

00:15:52.480 --> 00:15:55.159
anymore. You're a master of the tool. That is

00:15:55.159 --> 00:15:58.879
a great place to leave it. So here is our challenge

00:15:58.879 --> 00:16:02.000
to you listening right now. Identify one boring

00:16:02.000 --> 00:16:05.620
task today, just one, and attempt to create a

00:16:05.620 --> 00:16:08.440
gem for it. See if you can delegate that friction

00:16:08.440 --> 00:16:10.620
to the machine. And let us know how it goes.

00:16:10.779 --> 00:16:12.720
Thanks for diving deep with us today. We'll see

00:16:12.720 --> 00:16:14.159
you next time. Safe travels.
