WEBVTT

00:00:00.000 --> 00:00:03.200
So when AI agents need to connect to the outside

00:00:03.200 --> 00:00:06.080
world, maybe they're booking a flight for you

00:00:06.080 --> 00:00:09.039
or checking warehouse stock, they have to speak

00:00:09.039 --> 00:00:11.679
some language to do that. And right now there

00:00:11.679 --> 00:00:13.960
are really two main contenders. The core conflict

00:00:13.960 --> 00:00:20.179
is this. Do you optimize for just raw binary

00:00:20.179 --> 00:00:24.280
speed or for a more intelligent sort of human

00:00:24.280 --> 00:00:26.579
-like understanding? Yeah, that choice kind of

00:00:26.579 --> 00:00:28.359
defines everything, doesn't it? It really does.

00:00:28.640 --> 00:00:30.640
Welcome back to the Deep Dive, everyone. And

00:00:30.640 --> 00:00:33.539
our sources today really zero in on that exact

00:00:33.539 --> 00:00:36.119
tension. We're looking at Anthropic's newer AI

00:00:36.119 --> 00:00:39.119
-focused Model Context Protocol, or MCP, and

00:00:39.119 --> 00:00:41.640
pitting it against Google's, you know, tried

00:00:41.640 --> 00:00:44.200
and true workhorse, gRPC. That's Google Remote

00:00:44.200 --> 00:00:46.119
Procedure Call. Okay, so here's the plan. First,

00:00:46.219 --> 00:00:48.460
we need to establish why these LLMs, these large

00:00:48.460 --> 00:00:50.780
language models, even need external connections.

00:00:50.880 --> 00:00:52.890
Why can't they just... know everything? Yeah.

00:00:52.950 --> 00:00:54.270
What's the fundamental limit? And then we'll

00:00:54.270 --> 00:00:56.390
really get into the weeds comparing them. MCP's

00:00:56.549 --> 00:01:00.210
semantic smarts versus gRPC's production-ready

00:01:00.210 --> 00:01:03.509
speed. And then importantly, we'll look ahead.

00:01:03.969 --> 00:01:06.650
How might these two actually work together? Because

00:01:06.650 --> 00:01:09.310
spoiler, it's probably not going to be just one

00:01:09.310 --> 00:01:11.250
winner takes all. We think there's a hybrid future

00:01:11.250 --> 00:01:13.730
here. Right. Hybrid architecture. That sounds

00:01:13.730 --> 00:01:16.310
like where we need to land. OK, let's start with

00:01:16.310 --> 00:01:19.680
that necessity. LLMs are... well, they're amazing

00:01:19.680 --> 00:01:21.719
at pattern matching, understanding language.

00:01:22.140 --> 00:01:24.420
Incredibly. But they're not all knowing oracles.

00:01:24.540 --> 00:01:26.500
They have some really fundamental limits that

00:01:26.500 --> 00:01:28.739
mean they have to reach out. Absolutely. The

00:01:28.739 --> 00:01:30.799
first big one is what's called the context window

00:01:30.799 --> 00:01:34.040
bottleneck. Even these huge models, you hear

00:01:34.040 --> 00:01:36.819
about 200,000-token context windows, right?

00:01:37.120 --> 00:01:39.439
Sounds massive. It does sound massive. But it's

00:01:39.439 --> 00:01:42.640
still finite. You just cannot possibly cram,

00:01:42.640 --> 00:01:46.120
say, an entire company's customer database, terabytes

00:01:46.120 --> 00:01:49.659
of info, or all its historical content, or like a

00:01:49.659 --> 00:01:53.480
real time financial data feed into that window.

00:01:53.680 --> 00:01:56.299
It just doesn't work. Computationally, it's kind

00:01:56.299 --> 00:01:57.959
of nuts. And the cost would be astronomical.

00:01:58.079 --> 00:02:00.859
It really is like trying to fit the entire Library

00:02:00.859 --> 00:02:04.359
of Congress into like a single notebook. Exactly.
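To make that concrete, some rough back-of-envelope arithmetic (the roughly four bytes of English text per token figure is a common approximation, not an exact constant):

```python
# Back-of-envelope: why even a large context window can't hold a company's data.
# Assumes ~4 bytes of English text per token, a common rough approximation.
BYTES_PER_TOKEN = 4
context_window_tokens = 200_000

window_bytes = context_window_tokens * BYTES_PER_TOKEN   # roughly 800 KB of text
database_bytes = 1024**4                                 # a modest 1 TB database

fraction = window_bytes / database_bytes
print(f"window holds ~{window_bytes / 1024:.0f} KB of text")
print(f"that's {fraction:.9f} of a 1 TB database")
```

Under those assumptions, the whole window is well under a millionth of the database, which is why fetching on demand beats cramming.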

00:02:04.540 --> 00:02:06.859
It doesn't scale that way. And the second big

00:02:06.859 --> 00:02:10.819
problem is knowledge cutoff. The LLM's knowledge

00:02:10.819 --> 00:02:14.219
is fundamentally a snapshot in time. Right. Based

00:02:14.219 --> 00:02:16.580
on when it was trained. Precisely. Yeah. So it

00:02:16.580 --> 00:02:18.419
doesn't matter if the training data was updated

00:02:18.419 --> 00:02:22.099
last week. It can't know live real time information

00:02:22.099 --> 00:02:25.080
like the weather right now in Phoenix or your

00:02:25.080 --> 00:02:27.240
company's specific internal sales numbers for

00:02:27.240 --> 00:02:30.039
this quarter or, you know, the gate number for

00:02:30.039 --> 00:02:32.099
a flight right now unless it goes and asks. So

00:02:32.099 --> 00:02:34.379
the answer isn't just building bigger and bigger

00:02:34.379 --> 00:02:37.330
models indefinitely. Nope. The answer is making

00:02:37.330 --> 00:02:40.129
the AI agent smarter about how it gets information,

00:02:40.430 --> 00:02:42.770
turning it into an orchestrator. Yeah, like a

00:02:42.770 --> 00:02:45.030
real-time decision maker. It needs to know how

00:02:45.030 --> 00:02:48.169
to query the CRM or the weather API or the flight

00:02:48.169 --> 00:02:50.889
status system exactly when needed instead of

00:02:50.889 --> 00:02:52.949
trying to hold it all internally. It shifts from

00:02:52.949 --> 00:02:55.050
trying to know everything to knowing how to find

00:02:55.050 --> 00:02:57.530
everything. Right. A subtle but really crucial

00:02:57.530 --> 00:03:00.250
difference. So thinking about that orchestration,

00:03:00.430 --> 00:03:02.830
how does it directly help with that snapshot

00:03:02.830 --> 00:03:05.099
in time problem you mentioned? Well, the agents

00:03:05.099 --> 00:03:08.039
can then fetch real-time, often proprietary

00:03:08.039 --> 00:03:10.360
data that simply wasn't part of its original

00:03:10.360 --> 00:03:13.520
training. Okay, so that need for external access

00:03:13.520 --> 00:03:16.800
brings us neatly to MCP, the Model Context Protocol.

00:03:17.780 --> 00:03:20.960
Anthropic built this, what, late 2024? Yeah,

00:03:21.039 --> 00:03:23.120
fairly recent. And they really came at it with

00:03:23.120 --> 00:03:26.199
an AI-first philosophy. It's designed to speak

00:03:26.199 --> 00:03:29.199
the LLM's language kind of natively. And how

00:03:29.199 --> 00:03:30.659
does it do that? What's the underlying tech?

00:03:30.960 --> 00:03:34.240
So it's built on JSON-RPC 2.0, which, you know,

00:03:34.259 --> 00:03:37.199
for anyone not deep in APIs, is basically a standard

00:03:37.199 --> 00:03:39.780
way for programs to call functions on other programs

00:03:39.780 --> 00:03:42.939
using simple text, JSON. Which is human-readable

00:03:42.939 --> 00:03:45.780
and importantly, very LLM-friendly. They understand

00:03:45.780 --> 00:03:48.520
text structures like JSON really well. Exactly.
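As a rough sketch of what that wire format looks like (the method name and parameters below are invented for illustration, not from any real MCP server):

```python
import json

# A minimal JSON-RPC 2.0 request: plain text, readable by humans and LLMs alike.
# The method and params here are hypothetical examples.
request = {
    "jsonrpc": "2.0",         # protocol version, always "2.0"
    "id": 1,                  # lets the caller match the response to this request
    "method": "get_weather",  # the remote function to invoke
    "params": {"city": "Phoenix"},
}

wire_text = json.dumps(request)
print(wire_text)

# The matching response reuses the same id.
response = {"jsonrpc": "2.0", "id": 1, "result": {"temp_f": 101}}
print(json.dumps(response))
```

The whole exchange is just structured text, which is exactly what makes it easy for a language model to read and generate.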

00:03:48.639 --> 00:03:51.000
And MCP structures everything around three core

00:03:51.000 --> 00:03:53.300
concepts or primitives. And the key thing is

00:03:53.300 --> 00:03:55.300
they all include natural language descriptions.

00:03:55.659 --> 00:03:57.280
Okay, what are they? First up, you have tools.

00:03:57.620 --> 00:03:59.699
Think of these as specific functions the agent

00:03:59.699 --> 00:04:01.840
can use, like get weather or update customer

00:04:01.840 --> 00:04:03.759
record. But crucially, they come with descriptions

00:04:03.759 --> 00:04:05.900
in plain English saying what they do and when

00:04:05.900 --> 00:04:07.919
to use them. Got it. Like instructions included.

00:04:08.120 --> 00:04:11.539
What else? Then there are resources. These represent...

00:04:13.159 --> 00:04:15.860
Bigger chunks of data or systems the agent might

00:04:15.860 --> 00:04:18.000
need to interact with. Maybe like the schema

00:04:18.000 --> 00:04:20.199
definition for a whole database. Again, described

00:04:20.199 --> 00:04:23.399
naturally. Okay. Tools, resources. And finally,

00:04:23.480 --> 00:04:26.120
prompts. These are like templates for interaction.

00:04:26.670 --> 00:04:29.050
They help guide the AI on how it should behave

00:04:29.050 --> 00:04:31.670
when performing certain tasks, setting context,

00:04:31.870 --> 00:04:34.389
setting guardrails. Interesting, but you said

00:04:34.389 --> 00:04:36.629
the real game changer is something else. Yeah,

00:04:36.649 --> 00:04:38.370
the runtime discovery. This is super important.

00:04:38.509 --> 00:04:41.870
An agent connects to an MCP server and can just

00:04:41.870 --> 00:04:44.610
ask, literally using a method like tools/list,

00:04:44.829 --> 00:04:47.449
hey, what can you do? And the server responds,

00:04:47.750 --> 00:04:50.389
how? It sends back not just function names, but

00:04:50.389 --> 00:04:52.569
those full human readable descriptions we talked

00:04:52.569 --> 00:04:55.389
about. Use this tool if the user asks about flight

00:04:55.389 --> 00:04:58.209
availability. Or use this one for checking inventory

00:04:58.209 --> 00:05:00.990
levels. It's like built-in self-documentation

00:05:00.990 --> 00:05:04.350
that the AI understands instantly. So the agent

00:05:04.350 --> 00:05:06.610
can adapt on the fly if you add a new tool to

00:05:06.610 --> 00:05:09.149
the server. Exactly. The agent discovers it immediately.

00:05:09.629 --> 00:05:13.230
No need to retrain the whole model or write complex

00:05:13.230 --> 00:05:15.870
new integration code just to make the AI aware

00:05:15.870 --> 00:05:18.810
of a new capability. It speeds things up massively.
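Sketching that discovery exchange in miniature (the tool names, descriptions, and schemas below are invented; the tools/list method name follows MCP's convention):

```python
import json

# Hypothetical MCP-style discovery: the agent asks "what can you do?"
discovery_request = {"jsonrpc": "2.0", "id": 7, "method": "tools/list"}
print(json.dumps(discovery_request))

# The server answers with tool names AND natural-language descriptions,
# so the LLM can decide on its own when each tool applies.
# (These two tools are invented for illustration.)
discovery_response = {
    "jsonrpc": "2.0",
    "id": 7,
    "result": {
        "tools": [
            {
                "name": "check_flight_availability",
                "description": "Use this tool if the user asks about flight availability.",
                "inputSchema": {"type": "object", "properties": {"route": {"type": "string"}}},
            },
            {
                "name": "check_inventory",
                "description": "Use this one for checking warehouse inventory levels.",
                "inputSchema": {"type": "object", "properties": {"sku": {"type": "string"}}},
            },
        ]
    },
}

# Adding a new tool server-side just means adding another entry here;
# the agent sees it on its next tools/list call, with no retraining.
for tool in discovery_response["result"]["tools"]:
    print(f"{tool['name']}: {tool['description']}")
```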

00:05:19.129 --> 00:05:20.850
Okay, thinking about that built-in understanding.

00:05:21.620 --> 00:05:23.800
What's the really core advantage of having those

00:05:23.800 --> 00:05:25.779
natural language descriptions right there in

00:05:25.779 --> 00:05:28.560
the protocol? It lets the AI agent figure out

00:05:28.560 --> 00:05:31.540
when and why it should use a specific tool all

00:05:31.540 --> 00:05:33.920
on its own. Okay, now let's completely shift

00:05:33.920 --> 00:05:37.579
perspective. Let's talk gRPC, Google Remote Procedure

00:05:37.579 --> 00:05:40.300
Call. This isn't new and AI -focused like MCP.

00:05:40.420 --> 00:05:42.699
This is the heavyweight champion from the world

00:05:42.699 --> 00:05:44.819
of microservices. Right, been around for, what,

00:05:44.899 --> 00:05:46.980
over a decade? At least. And it was built for

00:05:46.980 --> 00:05:50.660
one primary thing: speed. Raw, industrial-strength

00:05:50.660 --> 00:05:52.819
efficiency. And how did it achieve that speed?

00:05:52.959 --> 00:05:55.120
What are its core components? Well, the absolute

00:05:55.120 --> 00:05:57.779
foundation is Protocol Buffers, or protobufs.

00:05:57.920 --> 00:06:00.519
Instead of sending text like MCP does with JSON,

00:06:00.839 --> 00:06:03.980
protobufs serialize the data into a super compact

00:06:03.980 --> 00:06:07.180
binary format. Think of it like a highly efficient

00:06:07.180 --> 00:06:09.560
machine code for data structures. Very small,

00:06:09.620 --> 00:06:11.980
very fast to process. Kind of like Lego blocks

00:06:11.980 --> 00:06:14.120
of data, really tightly packed. That's a great

00:06:14.120 --> 00:06:17.600
analogy. Yeah. Tiny, efficient blocks. Plus,

00:06:17.660 --> 00:06:20.959
gRPC is... built on HTTP/2, which allows things

00:06:20.959 --> 00:06:23.180
like bi-directional streaming, which means

00:06:23.180 --> 00:06:25.339
the client and server can send messages back

00:06:25.339 --> 00:06:27.459
and forth simultaneously over the same connection.

00:06:27.459 --> 00:06:29.660
Really important for real-time applications,

00:06:29.660 --> 00:06:32.540
data streams, that kind of thing. Okay: binary speed,

00:06:32.540 --> 00:06:35.300
efficient streaming. And you mentioned battle

00:06:35.300 --> 00:06:39.379
tested? Oh yeah, proven at scale. Google uses gRPC

00:06:39.379 --> 00:06:41.819
for tons of its internal systems. We're talking

00:06:41.819 --> 00:06:45.040
systems handling billions, maybe trillions of

00:06:45.040 --> 00:06:47.860
requests. It's designed for massive production

00:06:47.860 --> 00:06:51.180
environments, rock solid. But there's always

00:06:51.180 --> 00:06:53.120
a but, isn't there? Yeah. What's the downside

00:06:53.120 --> 00:06:56.319
when we think about AI agents? The downside is

00:06:56.319 --> 00:06:59.399
what we can call the AI translation gap. gRPC

00:06:59.399 --> 00:07:01.860
tells you how to call a service structurally.

00:07:01.959 --> 00:07:04.439
The data formats, the function names, it's all

00:07:04.439 --> 00:07:07.399
very precise. Okay. But it has zero built-in

00:07:07.399 --> 00:07:09.959
semantic context. It doesn't naturally tell an

00:07:09.959 --> 00:07:12.519
AI why you'd call this service or when in conversation

00:07:12.519 --> 00:07:14.379
it makes sense to use it. That understanding

00:07:14.379 --> 00:07:17.660
isn't part of gRPC itself. Oh, I see. So that

00:07:17.660 --> 00:07:20.959
creates a need for what? An extra step. Exactly.

00:07:20.959 --> 00:07:23.000
You need this middle layer, this AI translation

00:07:23.000 --> 00:07:25.160
layer. It's basically custom code that sits between

00:07:25.160 --> 00:07:28.019
the AI agent's sort of fuzzy natural language

00:07:28.019 --> 00:07:31.699
intention and the very strict technical gRPC

00:07:31.699 --> 00:07:34.399
call needed to execute the action. You have to

00:07:34.399 --> 00:07:36.459
build that bridge yourself.
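A minimal sketch of what such a bridge might look like, with stand-in classes replacing the protoc-generated stubs a real project would import (every name here is hypothetical):

```python
from dataclasses import dataclass

# --- Stand-ins for protoc-generated code. A real project would import
# these from modules generated by the Protobuf compiler. ---
@dataclass
class CheckStockRequest:
    sku: str
    warehouse_id: int

class InventoryStub:
    def CheckStock(self, request: CheckStockRequest) -> int:
        # A real stub would serialize to binary Protobuf and call the server.
        return 42  # canned answer for this sketch

# --- The "AI translation layer": fuzzy intent in, strict gRPC-style call out. ---
def handle_intent(intent: dict, stub: InventoryStub) -> str:
    """Map an agent's loosely structured intent onto a typed RPC."""
    if intent.get("action") == "check_stock":
        request = CheckStockRequest(
            sku=str(intent["item"]),                       # coerce to the exact
            warehouse_id=int(intent.get("warehouse", 1)),  # types the schema demands
        )
        count = stub.CheckStock(request)
        return f"{count} units of {request.sku} in stock"
    raise ValueError(f"no RPC mapping for intent: {intent!r}")

print(handle_intent({"action": "check_stock", "item": "SKU-1001"}, InventoryStub()))
```

The translation layer is where the semantic decisions live; the stub itself knows nothing about why it's being called.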

00:07:36.699 --> 00:07:39.699
Yeah. You know, honestly, I still wrestle with

00:07:39.699 --> 00:07:42.379
prompt drift myself sometimes. Trying to get

00:07:42.379 --> 00:07:44.800
an agent to follow a very specific sequence or

00:07:44.800 --> 00:07:48.100
technical instruction perfectly, it's hard. So

00:07:48.100 --> 00:07:50.639
I definitely get the challenge of bridging that

00:07:50.639 --> 00:07:53.279
gap between, you know, what the user means and

00:07:53.279 --> 00:07:56.040
the precise technical steps required. It's a

00:07:56.040 --> 00:07:58.040
real challenge. It's the cost you pay for leveraging

00:07:58.040 --> 00:08:00.759
that high performance but perhaps less flexible

00:08:00.759 --> 00:08:05.129
foundation. So if gRPC is so fast, why is that

00:08:05.129 --> 00:08:07.370
translation layer, that extra bit of work, really

00:08:07.370 --> 00:08:09.410
necessary? Why bother if it adds complexity?

00:08:09.629 --> 00:08:11.709
Yeah, that crucial layer is what translates the

00:08:11.709 --> 00:08:13.970
human-like goal into the very precise structured

00:08:13.970 --> 00:08:16.370
instructions the gRPC system needs to actually

00:08:16.370 --> 00:08:18.550
do the job. Okay, that comparison sets us up

00:08:18.550 --> 00:08:20.550
perfectly. Let's actually visualize this. Looking

00:08:20.550 --> 00:08:23.009
at the architectural flow, how a request actually

00:08:23.009 --> 00:08:25.069
moves through the system, really highlights the

00:08:25.069 --> 00:08:27.069
core differences. Yeah, let's map it out. With

00:08:27.069 --> 00:08:30.029
the MCP flow, it looks simpler on paper. You've

00:08:30.029 --> 00:08:32.889
got the LLM agent. It talks to an MCP client.

00:08:32.990 --> 00:08:36.870
Right. That client uses JSON-RPC 2.0, that text

00:08:36.870 --> 00:08:39.870
-based standard, to talk to the MCP server, which

00:08:39.870 --> 00:08:41.870
then interacts with the actual back -end service.

00:08:42.210 --> 00:08:45.169
The whole path is designed around that text-based

00:08:45.169 --> 00:08:48.070
semantic communication. Okay. Pretty direct.

00:08:48.190 --> 00:08:51.190
Now contrast that with gRPC. The gRPC flow immediately

00:08:51.190 --> 00:08:53.950
shows that extra component. You have the LLM agent.

00:08:54.110 --> 00:08:56.330
Then you have to insert that adapter layer, the

00:08:56.330 --> 00:08:58.429
AI translation piece we just talked about. The

00:08:58.429 --> 00:09:01.070
bridge. Before you get to the gRPC client, which

00:09:01.070 --> 00:09:03.549
then talks to the gRPC service, likely using

00:09:03.549 --> 00:09:07.250
protobufs over HTTP/2. That adapter adds a hop,

00:09:07.350 --> 00:09:09.470
adds development time. Yeah. But it's the price

00:09:09.470 --> 00:09:12.210
of admission for tapping into gRPC's speed. And

00:09:12.210 --> 00:09:14.590
the discovery process, how an agent figures out

00:09:14.590 --> 00:09:16.549
what it can do, also looks really different,

00:09:16.610 --> 00:09:18.970
right? Totally different philosophies. With MCP's

00:09:18.970 --> 00:09:21.929
built-in intelligence, discovery is, well, intelligent.

00:09:22.090 --> 00:09:24.950
The agent sends tools/list and gets back those

00:09:24.950 --> 00:09:27.590
nice semantic descriptions. I am the tool for

00:09:27.590 --> 00:09:30.509
updating inventory or whatever. It's conversational.

00:09:30.610 --> 00:09:33.809
Whereas with gRPC? gRPC offers something called

00:09:33.809 --> 00:09:36.409
server reflection for technical discovery. But

00:09:36.409 --> 00:09:39.669
what it returns is the raw technical protobuf

00:09:39.669 --> 00:09:41.990
definition. It's like getting the engineer's

00:09:41.990 --> 00:09:44.490
blueprint incredibly detailed about the structure.

00:09:44.710 --> 00:09:47.009
But no instructions on when to use that part

00:09:47.009 --> 00:09:48.870
of the blueprint. Exactly. It tells you what

00:09:48.870 --> 00:09:51.169
the function signature is, but not the why or

00:09:51.169 --> 00:09:54.129
the when in plain English. It lacks that semantic

00:09:54.129 --> 00:09:57.429
layer inherently. So boiling it down, if you

00:09:57.429 --> 00:09:59.690
ignore the AI understanding part for a second.

00:09:59.950 --> 00:10:01.990
What's the fundamental performance trade -off

00:10:01.990 --> 00:10:04.049
shown in these architectures? It really comes

00:10:04.049 --> 00:10:07.009
down to a direct trade: built-in AI semantic

00:10:07.009 --> 00:10:09.610
intelligence with MCP versus highly optimized

00:10:09.610 --> 00:10:12.360
binary speed with gRPC. Okay, let's put some

00:10:12.360 --> 00:10:14.639
rough numbers on that performance difference.

00:10:14.879 --> 00:10:17.419
This is where gRPC really shines, or at least

00:10:17.419 --> 00:10:19.960
where the difference becomes stark. Right. MCP,

00:10:20.019 --> 00:10:23.500
using JSON-RPC 2.0, is text. It's human-readable,

00:10:23.620 --> 00:10:26.500
AI-readable, but text is inherently kind of

00:10:26.500 --> 00:10:29.019
verbose, right? A simple tool call might easily

00:10:29.019 --> 00:10:31.879
be, say, 60 bytes, maybe more, once you include

00:10:31.879 --> 00:10:33.980
the descriptions and JSON overhead. Okay, 60

00:10:33.980 --> 00:10:37.379
-plus bytes. And gRPC with protobufs? Ruthlessly

00:10:37.379 --> 00:10:40.669
efficient. That same logical request encoded

00:10:40.669 --> 00:10:43.269
in binary protobuf. It might only be 20 bytes,

00:10:43.490 --> 00:10:45.690
maybe even less depending on the specifics. Wow,

00:10:45.830 --> 00:10:48.009
that's like a third of the size. Yeah, or even

00:10:48.009 --> 00:10:50.490
smaller. And that size difference gets amplified

00:10:50.490 --> 00:10:54.850
because gRPC uses HTTP/2 multiplexing. Explain

00:10:54.850 --> 00:10:57.690
that quickly. It means gRPC can handle many requests

00:10:57.690 --> 00:11:00.350
and responses flying back and forth at the same

00:11:00.350 --> 00:11:03.230
time over a single network connection. Think

00:11:03.230 --> 00:11:05.330
of it like multiple conversations happening in

00:11:05.330 --> 00:11:08.000
parallel on one phone line instead of needing a

00:11:08.000 --> 00:11:11.279
new line for each call. Drastically cuts down

00:11:11.279 --> 00:11:14.240
network chatter and latency. Whoa. Okay, imagine

00:11:14.240 --> 00:11:17.419
scaling that. A billion queries a day using that

00:11:17.419 --> 00:11:20.440
tiny binary format with tons of requests happening

00:11:20.440 --> 00:11:22.759
simultaneously on each connection. The efficiency

00:11:22.759 --> 00:11:25.240
gains, the cost savings. That's actually staggering.
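A toy illustration of that size gap, using Python's struct module as a stand-in for Protobuf (real Protobuf encoding uses field tags and varints, so the exact byte counts here are illustrative only):

```python
import json
import struct

# The same logical request, encoded two ways.
# JSON-RPC: self-describing text. Field names travel on the wire.
json_payload = json.dumps({
    "jsonrpc": "2.0", "id": 1,
    "method": "check_stock",
    "params": {"sku": 1001, "warehouse": 3},
}).encode()

# Binary stand-in for Protobuf: field meaning lives in the shared schema,
# so only the values travel. (struct is just a simple way to show the gap;
# real Protobuf output depends on the generated schema.)
binary_payload = struct.pack("<BIIH", 1, 1, 1001, 3)  # method id, request id, sku, warehouse

print(len(json_payload), "bytes as JSON text")
print(len(binary_payload), "bytes as packed binary")
```

Even in this toy version, the binary form is a small fraction of the text form, and at billions of requests a day that ratio compounds.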

00:11:25.460 --> 00:11:27.320
It really is. Which leads us to the key takeaway.

00:11:27.679 --> 00:11:30.000
Context determines the winner. There's no single

00:11:30.000 --> 00:11:32.980
best protocol here. Right. So when does MCP make

00:11:32.980 --> 00:11:36.139
the most sense? MCP excels when that AI discovery

00:11:36.139 --> 00:11:39.019
piece is crucial, when the agent needs to dynamically

00:11:39.019 --> 00:11:41.759
figure out what it can do, when semantic understanding

00:11:41.759 --> 00:11:43.639
is really important for the task, maybe in more

00:11:43.639 --> 00:11:45.580
complex conversational agents, and definitely

00:11:45.580 --> 00:11:48.240
during rapid prototyping, where ease of use and

00:11:48.240 --> 00:11:50.580
understanding is key, and maybe raw throughput

00:11:50.580 --> 00:11:52.919
isn't the main concern yet. Okay, makes sense.

00:11:53.080 --> 00:11:56.340
And the flip side, when is gRPC just dominating?

00:11:57.059 --> 00:12:00.539
gRPC dominates when performance is absolutely

00:12:00.539 --> 00:12:03.259
paramount: high-frequency trading systems, real

00:12:03.259 --> 00:12:05.899
-time data streaming applications, anything where

00:12:05.899 --> 00:12:08.700
milliseconds genuinely matter. Also, when you're

00:12:08.700 --> 00:12:10.620
integrating with an existing backend that's already

00:12:10.620 --> 00:12:13.259
built on microservices using gRPC, and of course,

00:12:13.259 --> 00:12:16.299
at massive, massive scale. So given that performance

00:12:16.299 --> 00:12:18.960
edge and its history, does that mean gRPC is

00:12:18.960 --> 00:12:21.000
basically the default choice once enterprises

00:12:21.000 --> 00:12:23.159
get serious about putting agents into production?

00:12:23.519 --> 00:12:26.740
Well, gRPC is definitely the trusted choice for

00:12:26.740 --> 00:12:29.210
that core, high-scale infrastructure, no question.

00:12:29.490 --> 00:12:32.950
But MCP seems much better suited for that initial

00:12:32.950 --> 00:12:35.289
AI agent development, especially when you're

00:12:35.289 --> 00:12:38.090
building something new, AI-first, and need that

00:12:38.090 --> 00:12:40.149
flexibility and semantic understanding early

00:12:40.149 --> 00:12:43.929
on. Which really brings us to the big idea, the

00:12:43.929 --> 00:12:46.230
synthesis here. The sources are pretty clear

00:12:46.230 --> 00:12:49.169
and it feels intuitive. We're heading towards

00:12:49.169 --> 00:12:51.799
a hybrid future. Yeah, it's not really going

00:12:51.799 --> 00:12:53.879
to be either in the long run, is it? Especially

00:12:53.879 --> 00:12:56.039
as these agents get more sophisticated and handle

00:12:56.039 --> 00:12:58.539
more critical, high volume tasks. You'll likely

00:12:58.539 --> 00:13:00.600
need both. So how does that hybrid model look?

00:13:00.639 --> 00:13:03.570
What role does each play? We're seeing MCP emerge

00:13:03.570 --> 00:13:06.149
as the potential front door. It handles that

00:13:06.149 --> 00:13:08.269
initial interaction, the semantic understanding,

00:13:08.590 --> 00:13:11.230
the discovery, what should I do? It's the intelligent

00:13:11.230 --> 00:13:13.370
routing layer. Okay, the thinking part. Right.

00:13:13.450 --> 00:13:16.570
And then gRPC acts as the engine behind the scenes.

00:13:16.870 --> 00:13:19.470
Once MCP figures out what needs to happen, if

00:13:19.470 --> 00:13:21.850
it's a performance -critical task, it hands it

00:13:21.850 --> 00:13:25.429
off to a highly optimized gRPC service to actually

00:13:25.429 --> 00:13:27.769
execute it with maximum speed and efficiency.

00:13:28.149 --> 00:13:30.559
The high-throughput workhorse. Exactly. So the

00:13:30.559 --> 00:13:33.860
best practice seems to be: prototype quickly

00:13:33.860 --> 00:13:37.159
with MCP. Build your agent. Get the logic right.

00:13:37.299 --> 00:13:40.539
Then identify the real performance bottlenecks.

00:13:40.580 --> 00:13:43.960
Which specific operations need that gRPC speed?

00:13:44.559 --> 00:13:48.059
Implement gRPC just for those and build that

00:13:48.059 --> 00:13:50.500
crucial translation layer carefully between your

00:13:50.500 --> 00:13:53.419
MCP front end and your gRPC back end services.
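In miniature, that hand-off could be wired something like this (the tool names, the routing flag, and the fast-path function are all invented for the sketch):

```python
# Miniature hybrid router: an MCP-style semantic layer decides WHAT to do,
# then performance-critical work is handed to a gRPC-style fast path.
# Everything here (tool names, the fast path) is invented for illustration.

def grpc_fast_path(sku: int) -> int:
    """Stands in for a binary, HTTP/2 gRPC call to the inventory service."""
    return {1001: 42}.get(sku, 0)

# MCP-style registry: each tool carries a description the LLM can reason over,
# plus a flag saying whether execution should route to the fast backend.
TOOLS = {
    "check_inventory": {
        "description": "Use for checking inventory levels.",
        "hot_path": True,    # high volume -> execute over the gRPC engine
        "run": grpc_fast_path,
    },
    "summarize_policy": {
        "description": "Use to summarize a returns policy document.",
        "hot_path": False,   # low volume -> a plain MCP call is fine
        "run": lambda _arg: "summary...",
    },
}

def dispatch(tool_name: str, arg: int):
    """Route a chosen tool to the right execution path and run it."""
    tool = TOOLS[tool_name]
    route = "gRPC engine" if tool["hot_path"] else "MCP server"
    return route, tool["run"](arg)

print(dispatch("check_inventory", 1001))
```

The point of the sketch is the split: the semantic layer owns the descriptions and the decision, while only the hot operations pay for (and benefit from) the binary fast path.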

00:13:53.759 --> 00:13:56.039
Get the best of both worlds, but be deliberate

00:13:56.039 --> 00:13:59.110
about it. You know, this whole MCP versus gRPC

00:13:59.110 --> 00:14:01.750
thing, it feels like it represents a bigger question

00:14:01.750 --> 00:14:04.450
in AI development, doesn't it? Yeah, it's a microcosm

00:14:04.450 --> 00:14:07.009
of the broader debate. Do we adapt to the robust,

00:14:07.070 --> 00:14:09.730
proven technologies we already have, like gRPC

00:14:09.730 --> 00:14:12.470
from the microservices world? Or do we need to

00:14:12.470 --> 00:14:14.889
build fundamentally new AI-native solutions

00:14:14.889 --> 00:14:17.730
from scratch, like MCP? And the answer, like

00:14:17.730 --> 00:14:21.029
always it seems, is probably... a bit of both.

00:14:21.149 --> 00:14:23.370
It usually is. The key really looks like flexibility

00:14:23.370 --> 00:14:25.409
and understanding that the choice you make early

00:14:25.409 --> 00:14:28.769
on, MCP or gRPC or a mix, it doesn't just impact

00:14:28.769 --> 00:14:30.889
performance today. It kind of sets the philosophy

00:14:30.889 --> 00:14:33.110
for how your AI systems will evolve. That's a

00:14:33.110 --> 00:14:35.230
great point. So here's maybe a final thought

00:14:35.230 --> 00:14:39.269
to chew on. Can we eventually bake that gRPC

00:14:39.269 --> 00:14:42.970
level efficiency into inherently semantic protocols

00:14:42.970 --> 00:14:47.750
like MCP? Or will semantics and raw binary speed

00:14:47.750 --> 00:14:50.649
always require that sort of adapter, that translation

00:14:50.649 --> 00:14:53.070
layer sitting between them? That feels like a

00:14:53.070 --> 00:14:55.110
really core architectural challenge for the future.

00:14:55.370 --> 00:14:57.450
A fascinating question at the intersection of

00:14:57.450 --> 00:14:59.990
AI concepts and hardcore infrastructure engineering.

00:15:00.429 --> 00:15:02.070
Well, thank you for digging into the sources

00:15:02.070 --> 00:15:03.950
with us today on this one. It's a really nuanced

00:15:03.950 --> 00:15:05.929
and important area. Absolutely. Lots to think

00:15:05.929 --> 00:15:07.789
about. Indeed. Thanks, everyone, for joining

00:15:07.789 --> 00:15:09.330
us on the Deep Dive. We'll catch you next time.
