WEBVTT

00:00:00.000 --> 00:00:02.859
If you've spent any time building AI automation,

00:00:03.160 --> 00:00:06.799
I'm sure you've hit this infuriating wall. Oh,

00:00:06.799 --> 00:00:08.619
yeah. You know, wiring up the system, connecting

00:00:08.619 --> 00:00:10.660
the triggers, moving the data around. That's

00:00:10.660 --> 00:00:12.720
the easy part. And that's a solved problem, really.

00:00:13.019 --> 00:00:15.939
Exactly. But the moment you ask the brain to

00:00:15.939 --> 00:00:18.879
handle real complexity, like a 100-page policy

00:00:18.879 --> 00:00:21.780
manual, a detailed diagram, a messy financial

00:00:21.780 --> 00:00:25.059
report, the whole workflow just becomes instantly

00:00:25.059 --> 00:00:27.739
fragile. It works on your machine and it just

00:00:27.739 --> 00:00:30.219
shatters in production. It really does. Well,

00:00:30.320 --> 00:00:32.719
this deep dive is about the technology that has

00:00:32.719 --> 00:00:36.179
the potential to solve that exact fragility. We're

00:00:36.179 --> 00:00:39.060
talking about Google's new Gemini 3 Pro model.

00:00:39.299 --> 00:00:42.820
Yeah. But as always, with that immense power comes

00:00:42.820 --> 00:00:45.320
a whole new set of limitations we have to navigate.

00:00:45.579 --> 00:00:48.020
Welcome. We're going to dive into a necessary

00:00:48.020 --> 00:00:51.039
reality check on Gemini 3 Pro today. We really

00:00:51.039 --> 00:00:52.920
need to go past the press releases and focus

00:00:52.920 --> 00:00:55.100
squarely on what actually matters for building

00:00:55.100 --> 00:00:57.780
reliable AI automation inside a platform like

00:00:57.780 --> 00:01:00.600
n8n. Our mission today is pretty simple. Cut

00:01:00.600 --> 00:01:03.399
through the marketing noise and get real. We

00:01:03.399 --> 00:01:06.180
have to understand its huge strengths, context,

00:01:06.500 --> 00:01:09.359
planning, but we also have to face its current

00:01:09.359 --> 00:01:12.790
weaknesses. Which are? Primarily cost and some

00:01:12.790 --> 00:01:15.969
very specific tool-calling bugs. So we'll cover

00:01:15.969 --> 00:01:18.549
why its sheer capacity is a game changer, the

00:01:18.549 --> 00:01:21.049
trade-offs in how you connect it, a hidden cost-

00:01:21.049 --> 00:01:23.629
saving setting that most people miss, and the

00:01:23.629 --> 00:01:27.650
big conclusion: for reliability right now, you

00:01:27.650 --> 00:01:29.969
have to mix your models. Okay, let's unpack that.

00:01:29.969 --> 00:01:32.590
The first major shift isn't really about raw

00:01:32.590 --> 00:01:34.920
intelligence, is it? It's about sheer capacity.

00:01:35.159 --> 00:01:37.640
That's it. Before Gemini 3 Pro, handling these

00:01:37.640 --> 00:01:40.099
massive documents was, I mean, an engineering

00:01:40.099 --> 00:01:42.060
nightmare. It was the foundational problem. You

00:01:42.060 --> 00:01:43.959
were solving data limitation problems before

00:01:43.959 --> 00:01:45.719
you could even touch the actual business logic.

00:01:45.859 --> 00:01:47.140
We were always stuck with workarounds, right?

00:01:47.140 --> 00:01:49.159
Yeah. We had to do things like chunking, splitting

00:01:49.159 --> 00:01:52.379
a document into tiny pieces or use these expensive

00:01:52.379 --> 00:01:55.180
vector databases just to feed the model relevant

00:01:55.180 --> 00:01:57.540
snippets. Mm-hmm. It felt like stacking Lego

00:01:57.540 --> 00:01:59.659
blocks of data and just hoping the AI saw the

00:01:59.659 --> 00:02:02.420
full picture. Yeah, hoping. Precisely. Now, just

00:02:02.420 --> 00:02:05.650
look at the numbers. Gemini 3 Pro supports roughly

00:02:05.650 --> 00:02:10.129
1 million input tokens. A token, to keep it simple,

00:02:10.629 --> 00:02:12.830
is just a piece of information the AI processes,

00:02:13.150 --> 00:02:15.789
usually a word or part of a word. A million tokens.

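NOTE
For scale, a quick back-of-the-envelope conversion in TypeScript. The 0.75 words-per-token ratio is a rough heuristic for English prose, not an exact tokenizer count:
// Rough sketch; the figures are the approximations quoted in this episode.
const WORDS_PER_TOKEN = 0.75;                           // heuristic for English text
const contextWindowTokens = 1_000_000;                  // Gemini 3 Pro's quoted input limit
const mobyDickWords = 200_000;                          // approximate length of the novel
const mobyDickTokens = mobyDickWords / WORDS_PER_TOKEN; // ~266,667 tokens
const copies = contextWindowTokens / mobyDickTokens;    // ~3.75 full copies per prompt
console.log(`Roughly ${copies.toFixed(1)} copies of Moby Dick fit in one prompt.`);
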
00:02:15.990 --> 00:02:18.409
That sounds abstract, so give us the practical

00:02:18.409 --> 00:02:20.050
scale of that. What does that actually mean?

00:02:20.129 --> 00:02:23.150
It's massive. Think of it like this. The entire

00:02:23.150 --> 00:02:26.530
novel Moby Dick is about 200,000 words. You

00:02:26.530 --> 00:02:29.590
could paste multiple full copies of long legal

00:02:29.590 --> 00:02:32.830
documents, internal procedure manuals, or huge

00:02:32.830 --> 00:02:35.189
financial reports directly into the prompt. The

00:02:35.189 --> 00:02:37.370
whole thing. Without hitting a limit. Without

00:02:37.370 --> 00:02:40.090
hitting a limit. This just removes a whole layer

00:02:40.090 --> 00:02:42.590
of engineering complexity for so many internal

00:02:42.590 --> 00:02:44.849
automations. You're not spending days building

00:02:44.849 --> 00:02:47.719
retrieval pipelines anymore. That context is

00:02:47.719 --> 00:02:50.379
incredible. But like you said, the moment we

00:02:50.379 --> 00:02:52.379
solve that engineering headache, we slam right

00:02:52.379 --> 00:02:54.539
into the budget. Yeah. Let's talk economics here,

00:02:54.650 --> 00:02:57.129
because the price tag is significant. It's a

00:02:57.129 --> 00:02:58.909
critical dilemma. You have to get this up front.

00:02:59.129 --> 00:03:01.710
Gemini 3 Pro is significantly more expensive,

00:03:01.949 --> 00:03:04.330
we're talking orders of magnitude, than its little

00:03:04.330 --> 00:03:07.389
brother, Gemini 2.5 Flash. Right. And if you

00:03:07.389 --> 00:03:09.770
fall into that trap of using the best model for

00:03:09.770 --> 00:03:12.229
everything just because you can, your cloud bill

00:03:12.229 --> 00:03:14.969
is going to spiral out of control fast. So when

00:03:14.969 --> 00:03:19.289
is Gemini 2.5 Flash good enough? It's cheap.

00:03:19.520 --> 00:03:23.120
It's fast, and it's perfectly fine for bulk tasks.

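NOTE
To make "orders of magnitude" concrete, a hedged cost sketch. The per-million-token rates below are placeholders to swap for current list prices, not quoted figures:
// Hypothetical per-run and per-day cost comparison; the rates are assumptions.
type Rate = { inPerMTok: number; outPerMTok: number };   // USD per million tokens
const pro: Rate = { inPerMTok: 2.0, outPerMTok: 12.0 };  // placeholder rate
const flash: Rate = { inPerMTok: 0.3, outPerMTok: 2.5 }; // placeholder rate
const runCost = (r: Rate, inTok: number, outTok: number) =>
  (inTok / 1e6) * r.inPerMTok + (outTok / 1e6) * r.outPerMTok;
// A 50k-token document summarized to 1k tokens, run 1,000 times a day:
console.log("Pro:", 1000 * runCost(pro, 50_000, 1_000));     // ~$112/day at these rates
console.log("Flash:", 1000 * runCost(flash, 50_000, 1_000)); // ~$17.50/day at these rates
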
00:03:23.680 --> 00:03:26.560
Things like simple document tagging, email routing,

00:03:26.939 --> 00:03:29.300
basic summaries. You don't need a sledgehammer

00:03:29.300 --> 00:03:32.259
for that. Wait, OK. If Flash is cheaper and can

00:03:32.259 --> 00:03:35.939
handle, say, 80% of our simple tasks, why even

00:03:35.939 --> 00:03:39.199
bother with Pro? Isn't managing two models more

00:03:39.199 --> 00:03:41.680
of a headache than just eating the cost? Only

00:03:41.680 --> 00:03:44.580
if your volume is really low. Pro earns its keep

00:03:44.580 --> 00:03:47.419
only when that deep structural reasoning or that

00:03:47.419 --> 00:03:50.219
massive context window actually changes the quality

00:03:50.219 --> 00:03:52.539
of the output in a meaningful way. So we shouldn't

00:03:52.539 --> 00:03:54.479
just be looking at general benchmarks. The scores

00:03:54.479 --> 00:03:56.099
that matter for automation builders are more

00:03:56.099 --> 00:03:58.099
practical. Exactly. I'm thinking about tests

00:03:58.099 --> 00:04:00.080
like image understanding, things like the ScreenSpot

00:04:00.080 --> 00:04:02.419
Pro benchmark. It's not just counting objects

00:04:02.419 --> 00:04:05.180
in a picture. No. It's evaluating how well the

00:04:05.180 --> 00:04:07.060
model understands the structure of a diagram

00:04:07.060 --> 00:04:10.270
or a complex flow chart. Right, and also long

00:04:10.270 --> 00:04:12.530
horizon tasks like the Vending-Bench benchmark.

00:04:12.909 --> 00:04:16.209
That one focuses on multi-step complex planning.

00:04:16.670 --> 00:04:19.350
The model has to plan five steps ahead and remember

00:04:19.350 --> 00:04:21.550
the constraints from step one all the way to

00:04:21.550 --> 00:04:24.209
the end. And that's where Pro justifies its cost.

00:04:24.269 --> 00:04:26.670
That's it. If your automation needs to generate

00:04:26.670 --> 00:04:29.810
highly structured data or execute a complex plan

00:04:29.810 --> 00:04:33.069
based on a 100-page document, that deep reasoning

00:04:33.069 --> 00:04:35.449
is where Pro pays for itself. So what is the

00:04:35.449 --> 00:04:38.779
main takeaway regarding its expense? It's strategic

00:04:38.779 --> 00:04:41.939
role assignment. Use the best model for the specific

00:04:41.939 --> 00:04:44.980
task required. That makes a lot of sense. So

00:04:44.980 --> 00:04:47.800
if we accept that strategy, how do we actually

00:04:47.800 --> 00:04:49.759
connect this thing to our automation platform?

00:04:50.199 --> 00:04:52.000
There are basically three ways to hook it into

00:04:52.000 --> 00:04:54.660
n8n, each with different trade-offs. Yep. The

00:04:54.660 --> 00:04:56.279
first two are the most straightforward. You've

00:04:56.279 --> 00:04:58.660
got the native Google Gemini node. It's the fastest

00:04:58.660 --> 00:05:01.139
setup. It's great for quick tests, but you have

00:05:01.139 --> 00:05:03.500
very limited control over advanced settings.

00:05:03.660 --> 00:05:06.089
And it's not great for agents. Not at all. Then

00:05:06.089 --> 00:05:08.810
you have the AI agent node. This treats Gemini

00:05:08.810 --> 00:05:12.110
Pro as the brain for multi-step reasoning. You

00:05:12.110 --> 00:05:14.529
use it when the AI needs to actually think, not

00:05:14.529 --> 00:05:16.730
just describe something. But it still has limits.

00:05:17.230 --> 00:05:20.550
It does. While it's great for planning, it still

00:05:20.550 --> 00:05:23.290
lacks that granular control over the model's

00:05:23.290 --> 00:05:25.529
internal workings. And that's where reliability

00:05:25.529 --> 00:05:28.430
starts to suffer. And I'll share a quick vulnerable

00:05:28.430 --> 00:05:31.569
admission here. I mean, after years of building

00:05:31.569 --> 00:05:34.490
these systems, I still wrestle with prompt drift

00:05:34.490 --> 00:05:38.029
myself. A tiny change in the model or the input

00:05:38.029 --> 00:05:40.569
can just throw everything off. And that's why

00:05:40.569 --> 00:05:42.990
having that low level predictable control is

00:05:42.990 --> 00:05:45.949
so crucial for production systems. That's reassuring

00:05:45.949 --> 00:05:48.509
in a painful kind of way. What's an example of

00:05:48.509 --> 00:05:51.120
that drift causing a real problem? We had a user

00:05:51.120 --> 00:05:53.220
whose agent was supposed to classify support

00:05:53.220 --> 00:05:56.000
tickets and just pass a tag to a database. Simple.

00:05:56.560 --> 00:05:58.800
But it started adding a verbose explanation before

00:05:58.800 --> 00:06:01.120
the tag. Oh, no. The database only wanted the

00:06:01.120 --> 00:06:03.779
tag, so the system just broke silently. It lost

00:06:03.779 --> 00:06:06.060
data for three hours before anyone even noticed

00:06:06.060 --> 00:06:09.019
the structure had changed. Small drift, big error.

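NOTE
One hedged defense against that kind of silent drift: validate the model's output against the expected shape before anything downstream consumes it, and fail loudly. A minimal sketch; the tag list is illustrative:
// Guard the classifier output; reject anything that isn't a bare, known tag.
const ALLOWED_TAGS = ["billing", "bug", "feature_request", "other"] as const;
type Tag = (typeof ALLOWED_TAGS)[number];
function parseTag(modelOutput: string): Tag {
  const cleaned = modelOutput.trim().toLowerCase();
  if ((ALLOWED_TAGS as readonly string[]).includes(cleaned)) return cleaned as Tag;
  // A visible error beats three hours of silent data loss.
  throw new Error(`Classifier drift detected: "${modelOutput.slice(0, 80)}"`);
}
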
00:06:09.240 --> 00:06:11.819
Wow. OK, those simple connections can hide some

00:06:11.819 --> 00:06:15.459
really costly details. Which brings us to the

00:06:15.459 --> 00:06:18.300
more advanced option. Something like OpenRouter.

00:06:19.000 --> 00:06:21.420
OpenRouter centralizes your billing and gives

00:06:21.420 --> 00:06:23.560
you a clean way to swap and compare different

00:06:23.560 --> 00:06:27.279
models, Gemini, OpenAI, Anthropic, all with a

00:06:27.279 --> 00:06:30.120
single API key. It's great for testing that role

00:06:30.120 --> 00:06:32.329
assignment idea we talked about. It is. It's

00:06:32.329 --> 00:06:35.149
often the cleanest setup for serious high-volume

00:06:35.149 --> 00:06:37.569
users. But it still abstracts away some of those

00:06:37.569 --> 00:06:40.730
deep model-specific controls that, as we're

00:06:40.730 --> 00:06:43.170
finding out, Gemini Pro really needs. Exactly.

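NOTE
For reference, a minimal sketch of the OpenRouter pattern being described: one endpoint, one key, and the model chosen per request. The exact model slugs are assumptions; check OpenRouter's catalog:
// Swap models per task through OpenRouter's OpenAI-compatible endpoint.
async function ask(model: string, prompt: string): Promise<string> {
  const res = await fetch("https://openrouter.ai/api/v1/chat/completions", {
    method: "POST",
    headers: {
      Authorization: `Bearer ${process.env.OPENROUTER_API_KEY}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ model, messages: [{ role: "user", content: prompt }] }),
  });
  const data = await res.json();
  return data.choices[0].message.content;
}
// ask("google/gemini-3-pro-preview", deepAnalysis) for planning,
// ask("google/gemini-2.5-flash", simpleTagging) for bulk work.
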
00:06:43.310 --> 00:06:45.689
So which connection method gives the most granular

00:06:45.689 --> 00:06:48.550
control over the model? Calling the API directly

00:06:48.550 --> 00:06:52.709
via the HTTP request node is mandatory for full

00:06:52.709 --> 00:06:54.970
control. And that brings us directly to a hidden

00:06:54.970 --> 00:06:57.860
problem. A silent cost leak that's driven by

00:06:57.860 --> 00:06:59.879
a setting most people can't even access with

00:06:59.879 --> 00:07:02.579
the easy nodes. They can't, no. This setting

00:07:02.579 --> 00:07:05.300
defaults to maximum performance, which just silently

00:07:05.300 --> 00:07:07.279
inflates your costs without adding any value

00:07:07.279 --> 00:07:09.800
for simpler tasks. This is what we call the thinking

00:07:09.800 --> 00:07:12.899
level. Gemini Pro can operate in two states,

00:07:13.339 --> 00:07:16.240
low, which is faster, cheaper, and uses less

00:07:16.240 --> 00:07:19.019
deep reasoning, and high, which is the default.

00:07:19.279 --> 00:07:23.060
It's slower and engages much deeper, more resource-

00:07:23.060 --> 00:07:25.720
intensive logic. So high thinking is like having

00:07:25.720 --> 00:07:29.000
an internal QA team checking every logical step,

00:07:29.360 --> 00:07:31.759
while low thinking is the quick gut reaction

00:07:31.759 --> 00:07:34.360
response. That's a great way to put it. But here's

00:07:34.360 --> 00:07:37.509
the tooling gap. n8n currently doesn't

00:07:37.509 --> 00:07:39.949
expose this thinking level toggle in its native

00:07:39.949 --> 00:07:43.230
or agent nodes. You can change temperature, token

00:07:43.230 --> 00:07:45.670
limits, but not this critical cost dial. Wow.

00:07:45.850 --> 00:07:47.930
Which means users are just stuck in the more

00:07:47.930 --> 00:07:50.189
expensive, higher latency, high thinking mode

00:07:50.189 --> 00:07:52.610
all the time, even for basic data extraction.

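NOTE
A hedged sketch of that direct HTTP call, as it might look from an n8n HTTP Request or Code node. The thinkingLevel field and model name follow Google's Gemini 3 documentation at the time of recording; verify both before relying on them:
// Direct generateContent call with the thinking level dialed down for a simple task.
const url =
  "https://generativelanguage.googleapis.com/v1beta/models/gemini-3-pro-preview:generateContent";
const body = {
  contents: [{ role: "user", parts: [{ text: "Extract the invoice number from: ..." }] }],
  generationConfig: {
    thinkingConfig: { thinkingLevel: "low" }, // the hidden dial: "low" vs. the default "high"
  },
};
const res = await fetch(`${url}?key=${process.env.GEMINI_API_KEY}`, {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify(body),
});
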
00:07:53.149 --> 00:07:55.970
At scale, that is a guaranteed insidious cost

00:07:55.970 --> 00:07:58.769
leak. We estimate you could save 40 to 50 percent.

00:07:58.889 --> 00:08:00.790
You're asking people to ditch the easy native

00:08:00.790 --> 00:08:03.509
node and write a manual HTTP call just to flip

00:08:03.509 --> 00:08:05.920
a toggle. I mean, are the savings really worth

00:08:05.920 --> 00:08:08.079
that extra complexity? They absolutely are if

00:08:08.079 --> 00:08:10.040
you're running that workflow hundreds or thousands

00:08:10.040 --> 00:08:12.680
of times a day. We've seen costs drop dramatically

00:08:12.680 --> 00:08:14.660
overnight for users who make this switch. So

00:08:14.660 --> 00:08:17.180
it's more setup time up front? A bit more, yeah.

00:08:17.439 --> 00:08:20.100
But it's the only reliable way to save money

00:08:20.100 --> 00:08:22.939
and reduce latency right now. This is a classic

00:08:22.939 --> 00:08:25.879
case of the model's capability running way ahead

00:08:25.879 --> 00:08:28.879
of the integration tooling. Why is this hidden

00:08:28.879 --> 00:08:32.740
setting so critical for workflows at scale? The hidden

00:08:32.740 --> 00:08:35.700
high-thinking mode increases costs and latency

00:08:35.700 --> 00:08:38.960
unnecessarily for simple reasoning tasks. Okay,

00:08:38.960 --> 00:08:40.379
before the break we talked about the control

00:08:40.379 --> 00:08:43.019
you don't have. Let's pivot now to the execution

00:08:43.019 --> 00:08:45.419
barrier, which really defines the reliability

00:08:45.419 --> 00:08:48.100
of Gemini Pro today. Let's do it. We saw some

00:08:48.100 --> 00:08:50.299
amazing success in the image analysis experiments

00:08:50.299 --> 00:08:53.299
from our source material. When testing flow charts

00:08:53.299 --> 00:08:56.500
or, say, property damage photos, other models

00:08:56.500 --> 00:08:58.399
could describe what they saw. Yeah, they could

00:08:58.399 --> 00:09:00.659
give you a caption, but the key here is the shift

00:09:00.659 --> 00:09:03.000
from what to why. Explain that. Other models

00:09:03.000 --> 00:09:06.340
might say a dent in the fender or a pipe labeled

00:09:06.340 --> 00:09:09.639
A. Gemini 3 Pro consistently explained the structure

00:09:09.639 --> 00:09:11.700
of the diagram, the decision paths in the flow

00:09:11.700 --> 00:09:13.899
chart, and even the likely causes of the damage

00:09:13.899 --> 00:09:16.120
based on context. And that's what you need for

00:09:16.120 --> 00:09:19.490
automation. Exactly. Downstream automations need

00:09:19.490 --> 00:09:23.070
causes and logic, not captions. That level of

00:09:23.070 --> 00:09:25.350
interpretation makes the analysis immediately

00:09:25.350 --> 00:09:28.649
actionable. Whoa. Imagine scaling that kind of

00:09:28.649 --> 00:09:31.649
deep image analysis across a billion insurance

00:09:31.649 --> 00:09:34.389
claims or infrastructure audits. That's a profound

00:09:34.389 --> 00:09:36.909
leap. It is, and that extends directly to the

00:09:36.909 --> 00:09:39.669
large-context reliability. The test of stuffing

00:09:39.669 --> 00:09:43.490
a full 126-page PDF into the prompt without

00:09:43.490 --> 00:09:46.679
chunking. It was a clean success. It handled

00:09:46.679 --> 00:09:49.960
the entire document, no crashes, no lost accuracy.

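NOTE
That no-chunking test is straightforward to reproduce. A minimal sketch that inlines a whole PDF into a single generateContent request, using the same assumed model name as the earlier sketch:
// Feed an entire PDF to the model in one request; no chunking, no vector store.
import { readFileSync } from "node:fs";
const pdfBase64 = readFileSync("policy-manual.pdf").toString("base64");
const request = {
  contents: [{
    role: "user",
    parts: [
      { inlineData: { mimeType: "application/pdf", data: pdfBase64 } },
      { text: "List every obligation the vendor has under this policy." },
    ],
  }],
};
// POST `request` to .../models/gemini-3-pro-preview:generateContent as shown earlier.
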
00:09:50.179 --> 00:09:53.259
So it matched or even beat a complex RAG system.

00:09:53.460 --> 00:09:56.220
It did, but the real win is the reduced engineering

00:09:56.220 --> 00:09:58.419
time. You just eliminate the whole maintenance

00:09:58.419 --> 00:10:01.080
headache of chunking and vector databases for

00:10:01.080 --> 00:10:03.419
your static documents. Okay, but this incredible

00:10:03.419 --> 00:10:05.860
analytic power just hits a brick wall when the

00:10:05.860 --> 00:10:07.779
model needs to act. Let's talk about the major

00:10:07.779 --> 00:10:10.659
limitation. Tool calling. This is the big one.

00:10:10.919 --> 00:10:14.179
We found that in n8n, Gemini Pro breaks, specifically

00:10:14.179 --> 00:10:16.679
when it tries to call an external tool like looking

00:10:16.679 --> 00:10:19.399
up a record in a database and then tries to resume

00:10:19.399 --> 00:10:21.799
reasoning. Right, the action itself executes, but

00:10:21.799 --> 00:10:24.100
the agent just errors out right after. What's

00:10:24.100 --> 00:10:25.899
the technical reason for that? It comes down

00:10:25.899 --> 00:10:29.279
to something called thought signatures. Basically,

00:10:29.279 --> 00:10:33.509
the agent errors out because n8n doesn't yet

00:10:33.509 --> 00:10:36.389
support Gemini's specific internal format for

00:10:36.389 --> 00:10:39.090
its planning notes. So the model is like taking

00:10:39.090 --> 00:10:40.990
notes for itself on how to solve the problem.

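NOTE
In API terms, a hedged illustration of those planning notes: Gemini's function-call parts carry a thoughtSignature field that, per Google's docs, must be echoed back alongside the tool result. An integration that rebuilds the history without it produces exactly this crash. Field names are worth verifying against the current docs:
// Placeholders standing in for the prior API response and conversation state.
declare const userTurn: { role: string; parts: unknown[] };
declare const response: any;
declare const dbRow: Record<string, unknown>;
// The model's turn contains its functionCall part plus its thoughtSignature.
const modelTurn = response.candidates[0].content;
const history = [
  userTurn,
  modelTurn, // keep intact; stripping thoughtSignature makes the agent "forget its place"
  { role: "user", parts: [{ functionResponse: { name: "lookupCustomer", response: dbRow } }] },
];
// The next generateContent call sends `history` so the model can resume its plan.
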
00:10:41.389 --> 00:10:43.809
Exactly. And when it comes back from using a

00:10:43.809 --> 00:10:46.490
tool, say, after looking up that database entry,

00:10:46.970 --> 00:10:49.490
it needs those internal notes to continue. But

00:10:49.490 --> 00:10:52.029
the current integration drops them, so the agent

00:10:52.029 --> 00:10:54.889
forgets its place and just crashes. So if I set

00:10:54.889 --> 00:10:57.769
up a flow to analyze the PDF, look up a customer,

00:10:57.980 --> 00:11:00.539
then send an email. It fails between the customer

00:11:00.539 --> 00:11:02.320
lookup and the email. That's right. That makes

00:11:02.320 --> 00:11:04.960
it completely unsafe for production execution

00:11:04.960 --> 00:11:07.440
flows. Precisely. It's amazing at planning, but

00:11:07.440 --> 00:11:09.480
when it goes to actually do something, the flow

00:11:09.480 --> 00:11:12.019
is just too brittle right now. This brings us

00:11:12.019 --> 00:11:15.360
to the final critical insight. The winning strategy

00:11:15.360 --> 00:11:18.000
is intentional model segmentation. Splitting

00:11:18.000 --> 00:11:20.779
the roles. You got it. Use Gemini 3 Pro as the

00:11:20.779 --> 00:11:23.440
planner and analyst. Let it understand the complex

00:11:23.440 --> 00:11:26.240
input. Let it design the logic. But then let

00:11:26.240 --> 00:11:29.019
another model do the work. Let other more stable

00:11:29.019 --> 00:11:32.000
models like Gemini Flash or even some OpenAI

00:11:32.000 --> 00:11:35.039
models serve as the executor. Let them handle

00:11:35.039 --> 00:11:37.620
the simple tool calls and action steps. What

00:11:37.620 --> 00:11:40.480
is the main risk of using Gemini in n8n today?

00:11:40.759 --> 00:11:44.159
Tool-heavy execution fails due to unsupported

00:11:44.159 --> 00:11:46.580
internal thought signatures. That duality is

00:11:46.580 --> 00:11:49.539
really the big idea here. Gemini 3 Pro has absolutely

00:11:49.539 --> 00:11:51.659
raised the ceiling on what AI can understand.

00:11:52.220 --> 00:11:55.019
Complex context, images, deep reasoning. It has.

00:11:55.120 --> 00:11:58.080
But its integration is being held back by the

00:11:58.080 --> 00:12:00.539
tooling, by these hidden settings and that broken

00:12:00.539 --> 00:12:02.720
tool calling process. So the key to reliability

00:12:02.720 --> 00:12:05.970
is intentionality. It's strategic. Use Gemini

00:12:05.970 --> 00:12:08.570
Pro for its genuine strengths, that complex analysis

00:12:08.570 --> 00:12:11.509
and design. Then let other, more stable models

00:12:11.509 --> 00:12:13.970
handle the simple execution where cost and stability

00:12:13.970 --> 00:12:16.470
are what matters most. Test everything. Right.

00:12:16.669 --> 00:12:18.789
And this dynamic, where the model's capability

00:12:18.789 --> 00:12:21.409
is running so far ahead of the tools we use every

00:12:21.409 --> 00:12:23.669
day, it just shows how fast this whole field

00:12:23.669 --> 00:12:26.029
is moving. It's incredible. And the provocative

00:12:26.029 --> 00:12:28.610
thought to leave you with is this. What happens

00:12:28.610 --> 00:12:31.549
when the tools finally catch up? When we get

00:12:31.549 --> 00:12:34.029
full cost -effective control over that deeper

00:12:34.029 --> 00:12:38.149
thinking level, even for small, fast tasks, that's

00:12:38.149 --> 00:12:40.409
when the real massive shift in automation truly

00:12:40.409 --> 00:12:40.850
begins.

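NOTE
A closing sketch of the mix-your-models pattern discussed throughout, reusing the hypothetical ask() helper from the OpenRouter sketch. The model slugs remain assumptions:
// Planner/executor split: Pro designs the plan; a cheaper, tool-stable model executes it.
type Step = { action: string; input: string };
async function runWorkflow(document: string): Promise<void> {
  // 1. Planner: deep reasoning over the full document (expensive, no tool calls here).
  const planJson = await ask(
    "google/gemini-3-pro-preview",
    `Read this document and return a JSON array of {action, input} steps:\n${document}`,
  );
  const steps: Step[] = JSON.parse(planJson);
  // 2. Executor: cheap, stable model handles each concrete action step.
  for (const step of steps) {
    await ask("google/gemini-2.5-flash", `Perform: ${step.action}\nInput: ${step.input}`);
  }
}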