WEBVTT

00:00:00.000 --> 00:00:03.020
I want you to just imagine a sound for a second.

00:00:03.919 --> 00:00:07.500
It's Tuesday morning, 10 a.m. You're in a dental

00:00:07.500 --> 00:00:10.119
clinic. The receptionist is busy. The dentist

00:00:10.119 --> 00:00:13.019
is working. And in the background, a phone is

00:00:13.019 --> 00:00:17.280
ringing. It rings and rings and then it stops.

00:00:17.820 --> 00:00:20.160
Silence. Just silence. And to the average person,

00:00:20.199 --> 00:00:22.879
that silence is, you know, just a missed call.

00:00:23.059 --> 00:00:25.140
Right. It's an annoyance, but it's part of doing

00:00:25.140 --> 00:00:27.059
business. But we're looking at a case study today

00:00:28.010 --> 00:00:31.070
of a Dr. Martinez. He's a solo dental practitioner,

00:00:31.070 --> 00:00:34.250
and for him, that silence was the actual sound

00:00:34.250 --> 00:00:37.250
of money evaporating. Okay. His clinic was missing

00:00:37.250 --> 00:00:40.789
34% of its calls during the day. And, what, 100% of

00:00:40.789 --> 00:00:43.189
calls after hours? I'm guessing. 100%, of course.

00:00:43.189 --> 00:00:45.009
But they did an audit, they put in the solution

00:00:45.009 --> 00:00:47.390
we're going to discuss today, and in six months

00:00:47.390 --> 00:00:52.189
they recovered 1.3 million dollars. 1.3 million.

00:00:52.189 --> 00:00:54.829
All from calls that were just going to voicemail

00:00:54.829 --> 00:00:57.420
before. That is a staggering number. I mean, it

00:00:57.420 --> 00:00:59.500
completely reframes the problem. We're not talking

00:00:59.500 --> 00:01:01.140
about better customer service. We're talking

00:01:01.140 --> 00:01:04.060
about plugging a massive hole in the bottom of

00:01:04.060 --> 00:01:06.900
the boat. And the plug wasn't a second shift

00:01:06.900 --> 00:01:10.219
of employees. It was an AI voice agent. Right.

00:01:10.340 --> 00:01:12.159
And I know what everyone's thinking. Oh, great.

00:01:12.280 --> 00:01:15.099
Another robotic phone tree. Press one for appointments.

00:01:15.359 --> 00:01:18.299
That's not what this is. Not at all. The source

00:01:18.299 --> 00:01:21.260
material we're breaking down is a guide. And

00:01:21.260 --> 00:01:25.099
the promise here is, it's pretty wild. It claims

00:01:25.099 --> 00:01:29.260
you can build a full digital employee, one that

00:01:29.260 --> 00:01:33.140
thinks and speaks and even negotiates, in under

00:01:33.140 --> 00:01:36.439
18 minutes. 18 minutes. For something that agencies

00:01:36.439 --> 00:01:38.579
are currently charging small businesses, what,

00:01:39.040 --> 00:01:41.939
$5,000 to $25,000 to set up? That's the opportunity

00:01:41.939 --> 00:01:44.519
right there. So here's our plan for this deep

00:01:44.519 --> 00:01:46.620
dive. We're going to strip away the hype and

00:01:46.620 --> 00:01:49.000
really look at the engineering. First, we need

00:01:49.000 --> 00:01:51.219
to define what actually makes an agent different

00:01:51.219 --> 00:01:53.359
from, say, a chatbot. Then we'll dissect the

00:01:53.359 --> 00:01:56.200
stack. We're talking Retell AI, Cal.com, and

00:01:56.200 --> 00:01:58.980
Gemini. And then walk through the build process,

00:01:59.239 --> 00:02:02.480
the brain, the memory, and the tools. And I want

00:02:02.480 --> 00:02:04.939
to be clear for everyone listening. We are ignoring

00:02:04.939 --> 00:02:07.400
the spaghetti code. The complex flowcharts. This

00:02:07.400 --> 00:02:10.340
is the zero-code approach. The guide claims if

00:02:10.340 --> 00:02:12.500
you can fill out a web form, you can build this.

00:02:12.659 --> 00:02:15.259
Which makes it accessible. Exactly. Okay, let's

00:02:15.259 --> 00:02:17.599
start with the philosophy. The guide uses a phrase

00:02:17.599 --> 00:02:20.840
that stuck with me. Scalable labor. We usually

00:02:20.840 --> 00:02:24.180
hear automation. Why is scalable labor a better

00:02:24.180 --> 00:02:26.800
way to think about this? Well, think about a

00:02:26.800 --> 00:02:30.240
human receptionist. Let's call her Sarah. Sarah's

00:02:30.240 --> 00:02:34.219
great, but she's linear. She can handle one call

00:02:34.219 --> 00:02:36.599
at a time. Right. If three people call at once,

00:02:36.680 --> 00:02:39.539
two go to voicemail. Which, as you said, is basically

00:02:39.539 --> 00:02:41.979
a black hole where leads go to die. Yeah. An

00:02:41.979 --> 00:02:44.740
AI agent is parallel. If 100 people call at the

00:02:44.740 --> 00:02:48.520
exact same second, the AI just... it spawns 100

00:02:48.520 --> 00:02:51.419
instances of itself. It answers every single

00:02:51.419 --> 00:02:53.419
call. Every single one. It doesn't sleep. It

00:02:53.419 --> 00:02:54.900
doesn't get sick. It doesn't get overwhelmed.
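
NOTE
Editor's sketch of the "linear versus parallel" point, assuming each call can be
modeled as an independent asynchronous task. The function names and the simulated
delay are hypothetical; the guide does not describe how the platform schedules calls.
```python
import asyncio
# Hypothetical stand-in for one phone conversation handled by the agent.
async def handle_call(caller_id: int) -> str:
    await asyncio.sleep(0.01)  # simulated conversation time
    return f"call {caller_id} answered"
# A human receptionist works through calls one at a time; here every call
# gets its own task, so none of them has to wait for another to finish.
async def answer_all(n_callers: int) -> list[str]:
    return await asyncio.gather(*(handle_call(i) for i in range(n_callers)))
results = asyncio.run(answer_all(100))
print(len(results))
```
All 100 simulated calls finish in roughly the time of one, because the waiting overlaps.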

00:02:55.539 --> 00:02:57.840
That's a kind of scalability that biological

00:02:57.840 --> 00:03:00.340
labor just can't compete with. It fundamentally

00:03:00.340 --> 00:03:02.419
changes the economics of the front office. But

00:03:02.419 --> 00:03:04.840
I have to push back a little. We've had automated

00:03:04.840 --> 00:03:07.620
systems for years. I call my bank. I talk to

00:03:07.620 --> 00:03:09.580
a robot. I get frustrated. And I just start yelling,

00:03:09.719 --> 00:03:11.960
"Representative!" How is this really different?

00:03:12.240 --> 00:03:15.139
That is a critical question. That old system

00:03:15.139 --> 00:03:19.150
is an IVR. An interactive voice response. It's

00:03:19.150 --> 00:03:21.430
just a decision tree. A script. A rigid script.

00:03:21.650 --> 00:03:24.969
If you say balance, it plays recording A. If

00:03:24.969 --> 00:03:27.550
you say transfer, it plays recording B. It's

00:03:27.550 --> 00:03:32.069
a maze. An AI agent is a brain. It's got a large

00:03:32.069 --> 00:03:34.810
language model, an LLM, at its core. It's not

00:03:34.810 --> 00:03:36.650
following a script. It is understanding your

00:03:36.650 --> 00:03:39.729
intent. So if I just put a talking chatbot on

00:03:39.729 --> 00:03:42.650
a phone line, is that an agent? No. Without tools,

00:03:42.830 --> 00:03:44.849
it's just a brain in a box. Agents have to actually

00:03:44.849 --> 00:03:46.930
do work. Okay, let's unpack that. The source

00:03:46.930 --> 00:03:49.590
lays out a formula. Brain plus memory plus tools

00:03:49.590 --> 00:03:52.050
equals agent. That's the holy trinity right there.

00:03:52.110 --> 00:03:54.349
The brain is the reasoning engine. In our case,

00:03:54.449 --> 00:03:56.349
that's Google's Gemini model. And the memory?

00:03:56.569 --> 00:03:58.569
Memory is just context. It's knowing that you

00:03:58.569 --> 00:04:00.129
just told me your name is Mike so I don't ask

00:04:00.129 --> 00:04:02.110
for it again five seconds later. Yeah. You know,

00:04:02.169 --> 00:04:04.729
it avoids that digital amnesia. So the tools.

00:04:05.169 --> 00:04:08.300
The tools are the critical piece. The tools are

00:04:08.300 --> 00:04:10.719
the hands. It's the API connection that lets

00:04:10.719 --> 00:04:12.919
the brain reach out into the real world and do

00:04:12.919 --> 00:04:14.280
something. Like, say, check a Google Calendar?

00:04:14.439 --> 00:04:16.879
Exactly. Check the calendar, see an open slot,

00:04:17.079 --> 00:04:19.560
and then write "Mike, 2 p.m." into the database.
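
NOTE
Editor's sketch of the "brain plus memory plus tools" formula, under loud assumptions:
the brain here is a hand-written rule function standing in for the LLM, and every
name (Agent, toy_brain, book) is hypothetical rather than taken from the guide.
```python
from dataclasses import dataclass, field
from typing import Callable
@dataclass
class Agent:
    # Brain: maps (utterance, memory) to a tool name and an argument for it.
    brain: Callable[[str, dict], tuple[str, str]]
    # Tools: named functions that act on the outside world.
    tools: dict[str, Callable[[str], str]]
    # Memory: context carried across the turns of one call.
    memory: dict = field(default_factory=dict)
    def turn(self, utterance: str) -> str:
        tool_name, argument = self.brain(utterance, self.memory)
        return self.tools[tool_name](argument)
calendar: dict[str, str] = {}
def toy_brain(utterance: str, memory: dict) -> tuple[str, str]:
    # Stand-in for the LLM: remember the caller's name, then treat the next turn as a time.
    if utterance.startswith("My name is "):
        memory["name"] = utterance.removeprefix("My name is ").rstrip(".")
        return "reply", f"Thanks, {memory['name']}. What time would you like?"
    return "book", f"{memory.get('name', 'caller')}@{utterance}"
def book(argument: str) -> str:
    name, slot = argument.split("@")
    calendar[slot] = name  # the write that turns a chat into a completed task
    return f"Booked {name} at {slot}"
agent = Agent(brain=toy_brain, tools={"reply": lambda text: text, "book": book})
agent.turn("My name is Mike.")
print(agent.turn("2 p.m."))
```
The second turn never asks for the name again because it is read from memory, and
the booking only exists because a tool wrote it into the calendar.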

00:04:19.959 --> 00:04:22.439
Without the tool, it's just a nice chat. With

00:04:22.439 --> 00:04:25.139
the tool, it's a completed task. So the tool

00:04:25.139 --> 00:04:27.379
is really the bridge between digital thought

00:04:27.379 --> 00:04:30.899
and physical reality. That's a great way to put

00:04:30.899 --> 00:04:33.319
it. Tools turn conversation into completed tasks,

00:04:33.579 --> 00:04:36.000
like booking a seat. So let's get into the build

00:04:36.000 --> 00:04:40.439
itself. We're using Retell AI as the, uh, the platform

00:04:40.439 --> 00:04:44.220
or orchestrator, and Gemini 2.5 Flash-Lite as

00:04:44.220 --> 00:04:46.420
the brain. And that choice of Flash-Lite is really

00:04:46.420 --> 00:04:48.480
important. Why Lite? Wouldn't we want the biggest,

00:04:48.480 --> 00:04:50.860
smartest brain possible? If you were writing an

00:04:50.860 --> 00:04:55.379
email, yes. But for voice, no. In voice, your number

00:04:55.379 --> 00:04:58.800
one enemy is latency. The pause. The awkward

00:04:58.800 --> 00:05:02.180
pause. Right. If you say hello and the AI takes

00:05:02.180 --> 00:05:04.560
three seconds to think and say hi, the illusion

00:05:04.560 --> 00:05:07.040
just shatters. You think the call dropped. Or

00:05:07.040 --> 00:05:10.720
you talk over it. Gemini Flash-Lite is incredibly

00:05:10.720 --> 00:05:13.680
fast and incredibly cheap. We're talking milliseconds

00:05:13.680 --> 00:05:16.040
of delay. It feels like a natural conversation

00:05:16.040 --> 00:05:18.519
rhythm. So speed is more important than pure

00:05:18.519 --> 00:05:21.800
IQ in this specific context. 100%. The AI doesn't

00:05:21.800 --> 00:05:23.959
need to write a philosophy paper. It just needs

00:05:23.959 --> 00:05:26.829
to check a calendar and be polite. Speed wins.

00:05:27.110 --> 00:05:30.110
So step one in the guide is just setting up the

00:05:30.110 --> 00:05:32.910
agent in Retell, standard stuff. But step two

00:05:32.910 --> 00:05:35.709
is the knowledge base. This is where it gets

00:05:35.709 --> 00:05:37.829
interesting for me. Yeah. This is basically how

00:05:37.829 --> 00:05:39.870
you teach it about your business. Think of it

00:05:39.870 --> 00:05:43.110
like giving a new employee an employee handbook.

00:05:43.110 --> 00:05:45.569
You're not retraining a whole model. No, God, no.

00:05:45.569 --> 00:05:47.470
You don't send it back to college. You just hand

00:05:47.470 --> 00:05:50.290
them a PDF with your hours, your prices, your cancellation

00:05:50.290 --> 00:05:53.230
policy. So you literally just upload a PDF or

00:05:53.230 --> 00:05:55.889
a text file? That's it. A PDF, a Word doc, whatever.

00:05:55.889 --> 00:05:58.610
When a customer asks a question, the brain doesn't

00:05:58.610 --> 00:06:01.490
guess. It reads that document in a split second

00:06:01.490 --> 00:06:03.930
to find the correct answer. So why is a simple

00:06:03.930 --> 00:06:07.930
static PDF better than, say, fine-tuning a model

00:06:07.930 --> 00:06:11.980
from scratch? Simplicity. The AI references the

00:06:11.980 --> 00:06:14.740
document dynamically. It makes sure the answers

00:06:14.740 --> 00:06:16.920
are accurate and always up to date. It's the

00:06:16.920 --> 00:06:19.319
difference between memorizing the textbook and

00:06:19.319 --> 00:06:22.379
just keeping the textbook open on your desk.
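
NOTE
Editor's sketch of the "open textbook" idea: look the answer up in the document at
question time instead of retraining anything. This is a toy word-overlap lookup, not
the platform's actual retrieval method, and the handbook lines are invented examples.
```python
import re
# Hypothetical handbook contents; a real one would come from the uploaded file.
HANDBOOK = [
    "Opening hours: Monday to Friday, 9 a.m. to 5 p.m.",
    "Cancellation policy: cancel at least 24 hours ahead to avoid a fee.",
    "Prices: a standard cleaning costs 120 dollars.",
]
def words(text: str) -> set[str]:
    # Lowercased alphabetic tokens, so punctuation does not affect matching.
    return set(re.findall(r"[a-z]+", text.lower()))
def lookup(question: str) -> str:
    # Return the passage that shares the most words with the question.
    return max(HANDBOOK, key=lambda passage: len(words(passage) & words(question)))
print(lookup("What is your cancellation policy?"))
```
Editing a line in HANDBOOK changes the next answer immediately, which is the
"always up to date" property the hosts describe.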

00:06:22.459 --> 00:06:24.500
That is a perfect analogy. Okay, so we have a

00:06:24.500 --> 00:06:26.860
brain. We have the handbook. Now we hit step

00:06:26.860 --> 00:06:29.779
three. And I have to admit, whenever I hear the

00:06:29.779 --> 00:06:32.689
phrase API key, I usually... I freeze up a little.

00:06:32.790 --> 00:06:35.750
You are not alone. The API key is usually that

00:06:35.750 --> 00:06:38.550
scary monster under the bed for people who don't

00:06:38.550 --> 00:06:40.870
code. It sounds like the part where everything

00:06:40.870 --> 00:06:43.610
is supposed to break. It does. But trust me,

00:06:43.649 --> 00:06:46.170
this is the boring part that pays off. It's much

00:06:46.170 --> 00:06:48.129
simpler than it sounds. All right, walk us through

00:06:48.129 --> 00:06:50.389
it, the impossible part, integrating the calendar.

00:06:50.629 --> 00:06:53.290
So we're using Cal.com. It's free. It's powerful.

00:06:54.170 --> 00:06:56.990
First, you create an event type. Let's call it

00:06:56.990 --> 00:06:59.269
a 60-minute meeting. Okay. You grab the event

00:06:59.269 --> 00:07:01.829
type ID right out of the URL. You just copy it.

00:07:01.850 --> 00:07:03.769
Copy, paste. Got it. Then you go into settings.

00:07:03.829 --> 00:07:06.629
You generate an API key. Quick tip, make sure

00:07:06.629 --> 00:07:09.589
you set it to never expire so your AI receptionist

00:07:09.589 --> 00:07:11.750
doesn't quit in a month. Right. And you paste

00:07:11.750 --> 00:07:14.509
both of those into Retell under functions. That's

00:07:14.509 --> 00:07:16.750
it. That's it. You've given the AI permission

00:07:16.750 --> 00:07:20.029
to see your schedule and, crucially, to write

00:07:20.029 --> 00:07:23.069
to it. So the AI actually sees my real Google

00:07:23.069 --> 00:07:27.100
Calendar? Yes. And it syncs instantly. Right.

00:07:27.160 --> 00:07:28.920
It prevents double bookings without a human ever

00:07:28.920 --> 00:07:32.060
having to look at it. That is, that's profound.
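
NOTE
Editor's sketch of what the two pasted values enable: a tool that can read the
schedule and write to it, refusing a write when the slot is already taken. This is
an in-memory stand-in, not the calendar service's API; CalendarTool, SlotTaken, and
the placeholder strings are all hypothetical.
```python
from dataclasses import dataclass, field
from datetime import datetime
class SlotTaken(Exception):
    pass
@dataclass
class CalendarTool:
    # Placeholders for the two values copied from the calendar service.
    event_type_id: str
    api_key: str
    bookings: dict[datetime, str] = field(default_factory=dict)
    def is_free(self, start: datetime) -> bool:
        # Read access: see the schedule.
        return start not in self.bookings
    def book(self, start: datetime, name: str) -> None:
        # Write access: check first, so two callers can never hold the same slot.
        if not self.is_free(start):
            raise SlotTaken(start.isoformat())
        self.bookings[start] = name
tool = CalendarTool(event_type_id="EVENT_TYPE_ID", api_key="API_KEY")
slot = datetime(2025, 1, 7, 14, 0)
tool.book(slot, "Mike")
print(tool.is_free(slot))
```
A second attempt to book the same slot raises SlotTaken instead of double booking.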

00:07:32.139 --> 00:07:34.279
It's not a script anymore. It's a dynamic system.

00:07:34.439 --> 00:07:37.379
It's alive in a way. It's reacting to the real

00:07:37.379 --> 00:07:39.920
world in real time. So we have the brain, we

00:07:39.920 --> 00:07:42.800
have the tools, but we still have a robot. We

00:07:42.800 --> 00:07:45.060
need a personality. Right. And that's steps four

00:07:45.060 --> 00:07:48.240
through seven. This is the fun part. Prompting,

00:07:48.240 --> 00:07:52.319
voices, and very importantly, handoffs. What's

00:07:52.319 --> 00:07:54.939
the handoff? The human handoff is your safety

00:07:54.939 --> 00:07:57.800
valve. Sometimes the AI will get stuck or a caller

00:07:57.800 --> 00:07:59.879
is just angry and wants a person. Let me talk

00:07:59.879 --> 00:08:01.959
to your manager. Exactly. So you set up a call

00:08:01.959 --> 00:08:05.019
transfer function that just routes the call to

00:08:05.019 --> 00:08:07.360
a real human's phone number. Okay, that's critical.
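
NOTE
Editor's sketch of the handoff "safety valve" as a plain rule: transfer when the
caller asks for a person or when the agent has failed too many turns. In a real
agent the LLM would usually decide when to call the transfer function; the phrases,
threshold, and phone number below are hypothetical.
```python
# Hypothetical trigger phrases and threshold; tune these for a real deployment.
HANDOFF_PHRASES = ("manager", "real person", "human", "representative")
MAX_FAILED_TURNS = 2
def should_transfer(utterance: str, failed_turns: int) -> bool:
    asked_for_human = any(phrase in utterance.lower() for phrase in HANDOFF_PHRASES)
    return asked_for_human or failed_turns >= MAX_FAILED_TURNS
def route(utterance: str, failed_turns: int, human_number: str) -> str:
    # Either hand the call to a person or let the agent keep going.
    if should_transfer(utterance, failed_turns):
        return f"transfer:{human_number}"
    return "continue"
print(route("Let me talk to your manager.", 0, "+15550100"))
```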

00:08:07.540 --> 00:08:09.860
And the prompt? The prompt is the secret weapon.

00:08:10.980 --> 00:08:14.139
And the source has great advice here. Don't write

00:08:14.139 --> 00:08:16.269
it yourself from scratch. Yeah. Use a prompt

00:08:16.269 --> 00:08:18.430
generator. You define the persona: "You are a

00:08:18.430 --> 00:08:20.889
helpful receptionist named Henry." You set the

00:08:20.889 --> 00:08:23.569
guardrails and you let the generator create the

00:08:23.569 --> 00:08:26.009
structured instructions the LLM actually needs

00:08:26.009 --> 00:08:28.089
to follow. And the voice itself. We're using

00:08:28.089 --> 00:08:30.370
ElevenLabs voices here. But the guide had this one

00:08:30.370 --> 00:08:32.769
tip that I just loved. Adding background noise.

00:08:32.990 --> 00:08:35.490
Yes. Light coffee shop background noise. Why

00:08:35.490 --> 00:08:37.009
would you add background noise? Doesn't that

00:08:37.009 --> 00:08:39.350
just make it harder to hear? It creates psychological

00:08:39.350 --> 00:08:42.269
realism. Yeah. Total dead silence feels digital.

00:08:42.750 --> 00:08:45.519
Ambient noise feels human. It's a texture. It

00:08:45.519 --> 00:08:48.080
creates a sense of presence. That's it. And then

00:08:48.080 --> 00:08:49.940
you just set the welcome message to dynamic.

00:08:50.240 --> 00:08:53.179
So the AI generates its own greeting based on

00:08:53.179 --> 00:08:56.039
the prompt. Hi, I'm Henry from AI Firestore or

00:08:56.039 --> 00:08:58.679
whatever it is. OK, so we've built it. And now

00:08:58.679 --> 00:09:02.340
the moment of wonder. I'm looking at this transcript

00:09:02.340 --> 00:09:05.379
of a demo call from the guide for a made-up Sarah's

00:09:05.379 --> 00:09:07.820
salon. This is the aha moment. It really is.

00:09:07.960 --> 00:09:11.139
A caller asks for a cut and blow dry. The agent

00:09:11.139 --> 00:09:15.220
quotes the prices: 95 for a senior stylist, 70

00:09:15.220 --> 00:09:18.039
for a junior. The caller then asks for tomorrow

00:09:18.039 --> 00:09:21.980
at 11 a.m. or 8 a.m. And this is where a dumb

00:09:21.980 --> 00:09:25.519
chatbot fails. If 11 and 8 are taken, it just

00:09:25.519 --> 00:09:29.399
says, sorry, unavailable. End of story. Right.
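
NOTE
Editor's sketch of the alternative to a flat "unavailable": when none of the
requested times is open, return the open slot closest to the caller's first choice
so the agent has something to offer. The function name and the sample times are
hypothetical and are not taken from the demo call.
```python
from datetime import datetime
from typing import Optional
def offer_slot(requested: list[datetime], open_slots: list[datetime]) -> Optional[datetime]:
    # First choice: any requested time that is actually open.
    for wanted in requested:
        if wanted in open_slots:
            return wanted
    if not requested or not open_slots:
        return None
    # Otherwise propose the open slot closest to the first requested time.
    return min(open_slots, key=lambda slot: abs(slot - requested[0]))
day = datetime(2025, 1, 8)
requested = [day.replace(hour=11), day.replace(hour=8)]
open_slots = [day.replace(hour=9), day.replace(hour=15)]
print(offer_slot(requested, open_slots))
```
With 11:00 and 8:00 both taken, the sketch proposes 9:00 because it is nearer to
the first request than 15:00.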

00:09:29.480 --> 00:09:31.840
But this agent says, I have an opening at 9

00:09:31.840 --> 00:09:34.340
p.m. Would that work? Boom. The caller agrees,

00:09:34.679 --> 00:09:36.559
gives their name and number, and the booking

00:09:36.559 --> 00:09:38.779
is confirmed. It didn't just check a box. It

00:09:38.779 --> 00:09:41.000
negotiated. It negotiated the time. It didn't

00:09:41.000 --> 00:09:43.779
just say no. What does that imply? It implies

00:09:43.779 --> 00:09:47.220
reasoning. It understood the intent was I want

00:09:47.220 --> 00:09:49.080
an appointment. Yeah. And it offered a viable

00:09:49.080 --> 00:09:51.340
alternative. And to make this go live, to actually

00:09:51.340 --> 00:09:53.620
deploy it. You buy a phone number in Retell for

00:09:53.620 --> 00:09:56.769
like two bucks. You click publish. And you can

00:09:56.769 --> 00:09:58.590
call that number from your cell phone five seconds

00:09:58.590 --> 00:10:01.409
later. It's live. Okay, so let's recap the big

00:10:01.409 --> 00:10:04.909
idea here. We moved from this fear of spaghetti

00:10:04.909 --> 00:10:08.190
diagrams and code to a working system. We did.

00:10:08.230 --> 00:10:11.750
We used Retell AI for the framework, the body.

00:10:11.929 --> 00:10:15.330
We used Cal.com for the action, for the tools.

00:10:15.509 --> 00:10:18.009
And Gemini 2.5 Flash-Lite for the reasoning,

00:10:18.070 --> 00:10:20.850
for the brain. And we did it all without writing

00:10:20.850 --> 00:10:23.259
a single line of code. You just built a system

00:10:23.259 --> 00:10:25.759
that, as you said, agencies are charging $10,000

00:10:25.759 --> 00:10:28.720
or more for. In the time it takes to watch

00:10:28.720 --> 00:10:31.159
a sitcom. It's the democratization of some really

00:10:31.159 --> 00:10:34.259
high-end technology. It's power. It's putting

00:10:34.259 --> 00:10:36.960
enterprise-grade automation into the hands of

00:10:36.960 --> 00:10:40.759
a solo dentist or a local salon owner. So what

00:10:40.759 --> 00:10:42.820
do you think this all means for the future of

00:10:42.820 --> 00:10:44.919
work? This feels like it's about more than just

00:10:44.919 --> 00:10:47.620
automation. It's about recovering lost time and

00:10:47.620 --> 00:10:50.899
lost revenue. It's about this idea of the 18-minute

00:10:50.899 --> 00:10:53.940
employee. The 18-minute employee. Yeah.

00:10:54.100 --> 00:10:57.240
If you can build an employee in 18 minutes that

00:10:57.240 --> 00:11:01.519
recovers $1.3 million, the definition of hiring

00:11:01.519 --> 00:11:04.120
has just fundamentally changed. That is a very

00:11:04.120 --> 00:11:05.740
heavy thought. So I want to encourage everyone

00:11:05.740 --> 00:11:08.559
listening. Try the 18-minute challenge. Even

00:11:08.559 --> 00:11:10.279
if you don't have a business. Just build it to

00:11:10.279 --> 00:11:13.039
see it work. Exactly. Build an agent that books

00:11:13.039 --> 00:11:15.879
time on your personal calendar just to see how

00:11:15.879 --> 00:11:19.299
that brain plus memory plus tools formula feels.

00:11:19.720 --> 00:11:21.720
Because once you see it work, you can't unsee

00:11:21.720 --> 00:11:24.120
it. Go build something. Go build something. Thank

00:11:24.120 --> 00:11:25.980
you for listening to The Deep Dive. We'll see

00:11:25.980 --> 00:11:26.419
you next time.
