WEBVTT

00:00:00.000 --> 00:00:01.720
Welcome to the Deep Dive. Today we're looking

00:00:01.720 --> 00:00:05.480
beyond the typical generative AI. You know, the

00:00:05.480 --> 00:00:07.459
kind that just gives you answers. Yeah. Really

00:00:07.459 --> 00:00:09.359
digging into how you build something proactive,

00:00:09.560 --> 00:00:11.660
something that takes action. Exactly. Think about

00:00:11.660 --> 00:00:13.679
an AI worker that doesn't just chat. It's actually

00:00:13.679 --> 00:00:15.820
checking your live calendar. It's booking meetings

00:00:15.820 --> 00:00:18.519
for specific times. And then it automatically

00:00:18.519 --> 00:00:20.399
logs the whole thing in your back-end systems.

00:00:20.620 --> 00:00:24.679
Now that is a real AI worker. Autonomous. So

00:00:24.679 --> 00:00:26.800
our mission today is pretty focused. Yeah. We

00:00:26.800 --> 00:00:28.500
want to break down the architecture that turns

00:00:28.500 --> 00:00:31.780
a basic voice agent into, well, a genuinely useful

00:00:31.780 --> 00:00:34.399
automated service provider. We'll be looking

00:00:34.439 --> 00:00:36.500
closely at sources detailing VAPI, that's the

00:00:36.500 --> 00:00:39.820
voice platform, and n8n, which acts as the automation

00:00:39.820 --> 00:00:42.039
engine behind it all. Yeah, we're covering the

00:00:42.039 --> 00:00:43.719
whole stack, how to create the agent's special

00:00:43.719 --> 00:00:46.640
abilities or tools, some clever system prompt

00:00:46.640 --> 00:00:48.659
techniques, how to get structured reports out

00:00:48.659 --> 00:00:51.520
of it, and importantly, the best practices you

00:00:51.520 --> 00:00:53.619
absolutely need for making this production ready,

00:00:53.659 --> 00:00:56.320
running 24/7. Basically, you're getting a shortcut

00:00:56.320 --> 00:00:59.579
here, a blueprint for building a reliable, autonomous

00:00:59.579 --> 00:01:03.079
AI service agent. So let's unpack it. OK, let's

00:01:03.079 --> 00:01:04.760
start with those abilities, the superpowers.

00:01:05.019 --> 00:01:07.840
How do we get the AI to do more than just talk?

00:01:07.980 --> 00:01:10.219
Right. This is the big shift, isn't it? Moving

00:01:10.219 --> 00:01:13.620
from just being like a knowledgeable receptionist,

00:01:13.620 --> 00:01:16.040
which we might have covered before, to an agent

00:01:16.040 --> 00:01:18.799
that actually does things. The example we're

00:01:18.799 --> 00:01:20.939
looking at is an appointment booker for a fictional

00:01:20.939 --> 00:01:23.739
newsletter company, AI Fire. The agent needs

00:01:23.739 --> 00:01:27.000
to chat, sure, but crucially, it has to check

00:01:27.000 --> 00:01:29.739
calendar availability and then actually commit

00:01:29.739 --> 00:01:32.400
to putting an event on the calendar. And these

00:01:32.400 --> 00:01:34.780
actions, these superpowers, they live inside

00:01:34.780 --> 00:01:37.239
VAPI, specifically in the tool section. Is that

00:01:37.239 --> 00:01:39.299
right? That's it. And what the source material

00:01:39.299 --> 00:01:42.420
really emphasizes is not building one giant

00:01:42.420 --> 00:01:45.540
do-everything booking tool. Instead, you break

00:01:45.540 --> 00:01:48.480
it down. You segment the tasks. Why is that segmentation

00:01:48.480 --> 00:01:50.840
important? Why split checking availability from

00:01:50.840 --> 00:01:53.200
actually creating the event? It's all about making

00:01:53.200 --> 00:01:55.480
it robust. Reducing the chances of something

00:01:55.480 --> 00:01:57.959
going wrong mid-flow. So tool number one is

00:01:57.959 --> 00:02:00.200
check availability. The description is dead simple.

00:02:00.640 --> 00:02:02.799
Use this tool to check the calendar to see when

00:02:02.799 --> 00:02:04.959
there are available time slots. That's its only

00:02:04.959 --> 00:02:08.849
job. Okay. Focused. And tool two? Tool two is

00:02:08.849 --> 00:02:11.389
create event. This one handles the actual booking,

00:02:11.530 --> 00:02:14.810
the commitment. Its description says, use this

00:02:14.810 --> 00:02:18.689
tool to create a calendar event booking. And

00:02:18.689 --> 00:02:22.250
a really key point, these tools don't do anything

00:02:22.250 --> 00:02:24.330
until you've actually connected your Google Calendar

00:02:24.330 --> 00:02:26.810
account. That happens over in VAPI's integrations

00:02:26.810 --> 00:02:29.810
tab first. Got it. Connect the account, then

00:02:29.810 --> 00:02:32.330
build the specific tools. Right. Once the tools

00:02:32.330 --> 00:02:34.710
exist, you create the assistant itself. Let's

00:02:34.710 --> 00:02:36.990
call it the booking agent. Now, a critical setting

00:02:36.990 --> 00:02:39.969
here is the AI model. The recommendation is to

00:02:39.969 --> 00:02:43.870
use the GPT-4o cluster. GPT-4o cluster? Why

00:02:43.870 --> 00:02:45.930
that specific one? It's kind of the sweet spot.

00:02:46.090 --> 00:02:47.969
It's fast. It's pretty good with logical tasks

00:02:47.969 --> 00:02:50.210
like scheduling. And this matters when you scale.

00:02:50.289 --> 00:02:53.110
It's cheaper than the full GPT-4 model. Think

00:02:53.110 --> 00:02:55.469
about thousands of calls a day. Those costs add

00:02:55.469 --> 00:02:57.610
up. That makes sense. So that deliberate split

00:02:57.610 --> 00:02:59.830
checking availability versus creating the event.

00:02:59.930 --> 00:03:01.669
Yeah. It feels like it builds in resilience.

00:03:01.810 --> 00:03:04.469
If you ask one tool to do both, maybe the agent

00:03:04.469 --> 00:03:06.789
gets confused if the user, you know, changes

00:03:06.789 --> 00:03:09.069
their mind halfway through. Exactly. Focus ensures

00:03:09.069 --> 00:03:12.069
reliability. Basically, one function per tool

00:03:12.069 --> 00:03:14.490
helps prevent those complex execution errors.

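NOTE
Editor's sketch, not from the source: the one-function-per-tool split described above could look roughly like this in Python.
The Calendar class and both function names are hypothetical in-memory stand-ins; nothing here calls VAPI or Google Calendar.
```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
#
@dataclass
class Calendar:
    """In-memory stand-in for a connected calendar (hypothetical, not a real API)."""
    events: list = field(default_factory=list)  # each event is a (start, end, attendee) tuple
#
def check_availability(cal: Calendar, start: datetime, minutes: int) -> bool:
    """Tool 1: only answers whether the requested slot is free."""
    end = start + timedelta(minutes=minutes)
    return all(end <= s or start >= e for s, e, _ in cal.events)
#
def create_event(cal: Calendar, start: datetime, minutes: int, attendee: str) -> tuple:
    """Tool 2: only commits the booking; it re-checks the slot and fails loudly on a conflict."""
    if not check_availability(cal, start, minutes):
        raise ValueError("slot is no longer available")
    event = (start, start + timedelta(minutes=minutes), attendee)
    cal.events.append(event)
    return event
```
Because create_event re-checks the slot in this sketch, a caller who changes their mind between the two steps cannot cause a double booking.
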
00:03:14.770 --> 00:03:16.750
OK, so we have the tools. Now, what about the

00:03:16.750 --> 00:03:19.990
agent's brain, the system prompt? That sounds

00:03:19.990 --> 00:03:21.849
like where the real magic happens. It really

00:03:21.849 --> 00:03:24.789
is. The system prompt defines everything. The

00:03:24.789 --> 00:03:27.310
agent's personality, its main goal, the specific

00:03:27.310 --> 00:03:30.090
rules it must follow. And here's a smart shortcut

00:03:30.090 --> 00:03:33.379
the sources mention. Don't try to write this

00:03:33.379 --> 00:03:35.620
super complex prompt entirely from scratch. It's

00:03:35.620 --> 00:03:39.780
hard. Instead, use an AI like ChatGPT to draft

00:03:39.780 --> 00:03:42.539
the first version for you. Ah, use the AI to

00:03:42.539 --> 00:03:44.560
help build the AI. You got it. Yeah. You tell

00:03:44.560 --> 00:03:47.099
the drafting AI, okay, the persona is John from

00:03:47.099 --> 00:03:48.960
AI Fire. The main goal is booking appointments,

00:03:49.120 --> 00:03:51.699
period. And then you add specific non-negotiable

00:03:51.699 --> 00:03:54.719
rules. Like the example rule is before booking,

00:03:54.960 --> 00:03:56.939
the agent must ask for the email address and then

00:03:56.939 --> 00:03:58.759
he has to confirm the spelling by reading back

00:03:58.759 --> 00:04:02.060
the characters before the at symbol. Like

00:04:02.060 --> 00:04:05.099
A-I-F-I-R-E at example dot com. That level of

00:04:05.099 --> 00:04:07.699
detail. Yeah. It's impressive. It moves it from

00:04:07.699 --> 00:04:10.159
feeling like a basic bot to something, well,

00:04:10.280 --> 00:04:12.439
more trustworthy. I have to admit, prompt engineering

00:04:12.439 --> 00:04:15.159
can be tough. I still wrestle with prompt drift

00:04:15.159 --> 00:04:17.560
myself sometimes. Getting the AI to consistently

00:04:17.560 --> 00:04:19.720
follow those specific instructions isn't always

00:04:19.720 --> 00:04:22.209
easy. Oh, absolutely. It's an ongoing challenge.

00:04:22.389 --> 00:04:24.569
So to help with consistency, especially for scheduling,

00:04:24.769 --> 00:04:27.850
the sources suggest a specific sort of hack for

00:04:27.850 --> 00:04:29.550
the system prompt. Yeah, it's a really clever

00:04:29.550 --> 00:04:32.170
pro tip. You embed a little piece of code right

00:04:32.170 --> 00:04:34.410
into the system prompt itself. It looks like

00:04:34.410 --> 00:04:38.689
this. Today's date and time is, and then a template

00:04:38.689 --> 00:04:40.430
expression that passes now through a date filter,

00:04:40.529 --> 00:04:41.689
followed by a format string of percent codes and

00:04:41.850 --> 00:04:45.850
a time zone. Okay, what does that code snippet actually

00:04:45.850 --> 00:04:48.949
do? It injects the current date and time right

00:04:48.949 --> 00:04:51.310
when the call starts directly into the AI's context.

00:04:51.750 --> 00:04:54.509
Think about it. If a call starts at 11:58

00:04:54.509 --> 00:04:56.930
p.m. and the user says book it for tomorrow, how

00:04:56.930 --> 00:05:00.370
does the AI know what tomorrow means? Ah, without

00:05:00.370 --> 00:05:02.670
this, it might use its training data cutoff date.

00:05:02.990 --> 00:05:05.649
Which could be months old. Precisely. This little

00:05:05.649 --> 00:05:08.209
variable gives it perfect real-time awareness.

00:05:08.310 --> 00:05:11.970
It stops those kinds of errors cold. So if the

00:05:11.970 --> 00:05:13.769
AI helps write the prompt, what's the single

00:05:13.769 --> 00:05:16.629
most vital human touch for scheduling? It sounds

00:05:16.629 --> 00:05:18.370
like it's that date -time variable. That ensures

00:05:18.370 --> 00:05:21.089
it handles references like tomorrow or next Tuesday

00:05:21.089 --> 00:05:23.589
correctly. The date-time variable ensures accurate

00:05:23.589 --> 00:05:26.129
scheduling references like tomorrow. Exactly.

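NOTE
Editor's sketch, not from the source: why an injected "now" matters for phrases like tomorrow, shown with plain Python datetimes.
The function name is hypothetical, and the fixed UTC-6 offset is an assumption used only to keep the example dependency-free.
```python
from datetime import date, datetime, timedelta, timezone
#
# Fixed UTC-6 offset for illustration; a real setup would use a named zone
# such as America/Chicago so daylight saving time is handled.
CENTRAL_STANDARD = timezone(timedelta(hours=-6), "CST")
#
def resolve_relative_day(phrase: str, now: datetime) -> date:
    """Turn 'today' or 'tomorrow' into a calendar date relative to the injected 'now'."""
    offsets = {"today": 0, "tomorrow": 1}
    key = phrase.strip().lower()
    if key not in offsets:
        raise ValueError(f"unsupported phrase: {phrase!r}")
    return (now + timedelta(days=offsets[key])).date()
#
call_start = datetime(2025, 3, 4, 23, 58, tzinfo=CENTRAL_STANDARD)
local_answer = resolve_relative_day("tomorrow", call_start)
utc_answer = resolve_relative_day("tomorrow", call_start.astimezone(timezone.utc))
print(local_answer, utc_answer)
```
For a call starting at 11:58 p.m. local time, the two answers land on different days, which is the kind of mismatch the date-time variable and its time zone are meant to prevent.
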
00:05:26.350 --> 00:05:28.329
All right, so we've engineered the tools and the

00:05:28.329 --> 00:05:31.139
brain. Did the test actually work seamlessly?

00:05:31.420 --> 00:05:33.500
Yeah, the walkthrough sounds pretty smooth, so

00:05:33.500 --> 00:05:35.939
the user kicks it off. Can I schedule for tomorrow

00:05:35.939 --> 00:05:39.620
at 4 p.m.? Immediately, the agent fires off the

00:05:39.620 --> 00:05:41.959
check availability tool. And you can see this

00:05:41.959 --> 00:05:44.519
happen in real time in the VAPI call logs, which

00:05:44.519 --> 00:05:46.779
is great for debugging. Okay, it checks the calendar.

00:05:46.959 --> 00:05:49.459
Finds the slot is open, confirms it, and then

00:05:49.459 --> 00:05:51.720
crucially it follows that specific rule we programmed.

00:05:51.980 --> 00:05:54.319
It asks, okay, great, can I get your email address

00:05:54.319 --> 00:05:56.220
to finalize the booking? And this is the moment

00:05:56.220 --> 00:05:59.149
of truth for that. Detailed prompt engineering,

00:05:59.410 --> 00:06:01.550
right? Yeah. The email confirmation. Yes. The

00:06:01.550 --> 00:06:04.790
user gives the email, say, AIFire at example

00:06:04.790 --> 00:06:07.550
.com. The agent comes back with, got it, just

00:06:07.550 --> 00:06:11.730
to confirm, that's A-I-F-I-R-E at example.com. Correct?

00:06:12.110 --> 00:06:14.250
Spelling it out, just like the rule dictated,

00:06:14.329 --> 00:06:16.810
perfect execution. Wow. Okay, so that specific

00:06:16.810 --> 00:06:19.350
instruction stuck. It did. The user confirms,

00:06:19.389 --> 00:06:21.629
and boom, the agent immediately calls the second

00:06:21.629 --> 00:06:24.860
tool, create event. And then almost instantly,

00:06:25.079 --> 00:06:26.980
the appointment pops up in the connected Google

00:06:26.980 --> 00:06:30.019
Calendar. Correct time, correct duration. The

00:06:30.019 --> 00:06:32.860
example used 1.5 hours and with the user's email

00:06:32.860 --> 00:06:35.259
added as an attendee. That does sound seamless.

00:06:35.439 --> 00:06:38.959
So what specific part of that test really proved

00:06:38.959 --> 00:06:42.040
the AI-written prompt was high quality? I'd

00:06:42.040 --> 00:06:44.259
say the agent confirming the email by spelling

00:06:44.259 --> 00:06:46.459
individual letters. That demonstrated really

00:06:46.459 --> 00:06:48.879
high detail awareness and rule following. Okay.

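NOTE
Editor's sketch, not from the source: the email read-back rule from the prompt, written as a small Python helper.
The function name is hypothetical; in the walkthrough the model does this from the prompt alone, so this only pins down the expected wording.
```python
def spell_back_email(email: str) -> str:
    """Spell the part before the @ character by character, then read the domain normally."""
    local, sep, domain = email.strip().partition("@")
    if not sep or not local or not domain:
        raise ValueError(f"not an email address: {email!r}")
    spelled = "-".join(local.upper())
    return f"{spelled} at {domain.replace('.', ' dot ')}"
#
print(spell_back_email("aifire@example.com"))
```
A helper like this can double as a test oracle when checking call transcripts against the rule.
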
00:06:48.939 --> 00:06:51.660
The booking works, but that's just one interaction

00:06:51.660 --> 00:06:54.199
logged in VAPI and Google Calendar. How do we

00:06:54.199 --> 00:06:55.699
connect this to the rest of the business, make

00:06:55.699 --> 00:06:58.040
it part of a larger workflow? This is where the

00:06:58.040 --> 00:07:01.420
real automation power comes in, using n8n. And

00:07:01.420 --> 00:07:04.439
the key starts back in VAPI in the analysis tab.

00:07:04.600 --> 00:07:07.220
You set up something called structured data extraction.

00:07:07.759 --> 00:07:11.120
Structured data extraction. What does that mean

00:07:11.120 --> 00:07:13.019
exactly? It means you're not just getting a raw

00:07:13.019 --> 00:07:15.379
transcript of the call, which is, frankly, hard

00:07:15.379 --> 00:07:18.019
for other systems to use reliably. Instead, you

00:07:18.019 --> 00:07:20.019
tell VAPI exactly what pieces of information

00:07:20.019 --> 00:07:22.259
you want to pull out from the conversation. So

00:07:22.259 --> 00:07:23.959
instead of the whole conversation, you define

00:07:23.959 --> 00:07:27.110
specific fields. Precisely. Defining properties

00:07:27.110 --> 00:07:31.110
like, say, appointment type, email address, appointment

00:07:31.110 --> 00:07:34.790
date time, maybe even a short call summary. This

00:07:34.790 --> 00:07:38.050
turns the messy conversation into clean, predictable,

00:07:38.430 --> 00:07:41.370
machine-readable data, usually a JSON object.

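NOTE
Editor's sketch, not from the source: what "clean, predictable" structured data could look like once it reaches code.
The snake_case field names and the BookingRecord type are assumptions based on the four properties named in the conversation.
```python
import json
from dataclasses import dataclass
from datetime import datetime
#
REQUIRED_FIELDS = ("appointment_type", "email_address", "appointment_datetime", "call_summary")
#
@dataclass(frozen=True)
class BookingRecord:
    appointment_type: str
    email_address: str
    appointment_datetime: datetime
    call_summary: str
#
def parse_booking(raw: str) -> BookingRecord:
    """Parse the JSON object and fail loudly if a field is missing or malformed."""
    data = json.loads(raw)
    missing = [name for name in REQUIRED_FIELDS if name not in data]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    return BookingRecord(
        appointment_type=str(data["appointment_type"]),
        email_address=str(data["email_address"]),
        appointment_datetime=datetime.fromisoformat(data["appointment_datetime"]),
        call_summary=str(data["call_summary"]),
    )
```
A raw transcript offers no such contract, which is why downstream steps tend to break on it.
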
00:07:41.730 --> 00:07:44.189
But why is that structure so important? Couldn't

00:07:44.189 --> 00:07:45.829
you just send the whole transcript to another

00:07:45.829 --> 00:07:47.850
AI later and have it figure out the details?

00:07:48.129 --> 00:07:50.769
You could try, but unstructured data is brittle

00:07:50.769 --> 00:07:53.470
for automation. If you want this to reliably

00:07:53.470 --> 00:07:56.649
feed into a database or update your CRM or even

00:07:56.649 --> 00:07:58.769
just populate a spreadsheet cleanly every single

00:07:58.769 --> 00:08:01.870
time, you need that predictable structure. Structured

00:08:01.870 --> 00:08:04.569
data is essential for reliable integration into

00:08:04.569 --> 00:08:08.050
databases, CRMs, and spreadsheets. Right. Consistency

00:08:08.050 --> 00:08:10.750
is key for automation. So we extract this structured

00:08:10.750 --> 00:08:13.740
data. Then what? Then we send it over to n8n.

00:08:13.800 --> 00:08:16.560
The sources highlight n8n probably because it's

00:08:16.560 --> 00:08:18.720
really good at handling these kinds of incoming

00:08:18.720 --> 00:08:20.699
webhooks and allows for pretty complex logic

00:08:20.699 --> 00:08:23.139
later on if you need it. Okay, so how does VAPI

00:08:23.139 --> 00:08:25.720
talk to n8n? You set up a webhook trigger node

00:08:25.720 --> 00:08:28.779
in n8n first. Make sure it's set to listen for

00:08:28.779 --> 00:08:31.160
the POST method. That's important for security

00:08:31.160 --> 00:08:34.879
and how VAPI sends data. n8n gives you a unique

00:08:34.879 --> 00:08:37.220
URL for this webhook. And you paste that URL

00:08:37.220 --> 00:08:40.340
back into VAPI. Exactly. In VAPI's advanced settings

00:08:40.340 --> 00:08:42.919
for the agent, you paste that n8n webhook URL.

00:08:43.039 --> 00:08:45.919
And critically, you configure VAPI to only send

00:08:45.919 --> 00:08:48.299
the end of call report, which contains that nice

00:08:48.299 --> 00:08:50.620
structured data we defined earlier. You don't

00:08:50.620 --> 00:08:52.620
necessarily need data mid-call for this use

00:08:52.620 --> 00:08:55.740
case. Okay. So VAPI call ends. Structured data

00:08:55.740 --> 00:08:58.710
package is sent to the n8n webhook. What happens

00:08:58.710 --> 00:09:01.289
inside n8n then? The workflow instantly triggers.

00:09:01.490 --> 00:09:04.809
Step one: the webhook receives the data. Step two

00:09:04.809 --> 00:09:07.149
could be, for example, a Google Sheets node.

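NOTE
Editor's sketch, not from the source: the filtering a webhook step performs, written as a plain Python function.
The payload keys (message, type, end-of-call-report, analysis, structuredData) and the status codes are assumptions for illustration; check the actual payload in your call logs.
```python
import json
#
def handle_webhook(method: str, body: str):
    """Return (status_code, structured_data); structured_data is None unless the request is accepted."""
    if method.upper() != "POST":
        return 405, None  # the webhook only listens for POST
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return 400, None
    message = payload.get("message") if isinstance(payload, dict) else None
    if not isinstance(message, dict):
        return 400, None
    if message.get("type") != "end-of-call-report":
        return 204, None  # ignore mid-call events for this use case
    analysis = message.get("analysis")
    structured = analysis.get("structuredData") if isinstance(analysis, dict) else None
    if not isinstance(structured, dict):
        return 422, None
    return 200, structured
```
Keeping this as a pure function makes it easy to test without starting a server.
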
00:09:07.309 --> 00:09:09.549
You configure it to take the incoming variables,

00:09:09.549 --> 00:09:11.210
email address, appointment type, appointment

00:09:11.210 --> 00:09:13.610
date time, call summary, and append them as a

00:09:13.610 --> 00:09:16.330
neat new row in your spreadsheet. Ah, so it's

00:09:16.330 --> 00:09:18.529
like automatically logging every booking into

00:09:18.529 --> 00:09:20.590
a central sheet, almost like building your own

00:09:20.590 --> 00:09:22.950
database entry by entry. Yeah, like stacking

00:09:22.950 --> 00:09:25.590
Lego blocks of data. Exactly. And you can add

00:09:25.590 --> 00:09:27.970
more steps, maybe a Gmail node right after to

00:09:27.970 --> 00:09:29.870
send a quick notification email to the sales

00:09:29.870 --> 00:09:32.730
team saying, hey, new appointment booked with

00:09:32.730 --> 00:09:35.169
summary. Very cool. And then making it live.

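NOTE
Editor's sketch, not from the source: the append-a-row and notify steps, using an in-memory CSV as a stand-in for the spreadsheet.
The column names and both function names are hypothetical; a real workflow would use the Google Sheets and Gmail nodes instead.
```python
import csv
import io
#
COLUMNS = ["email_address", "appointment_type", "appointment_datetime", "call_summary"]
#
def append_booking_row(sheet: io.StringIO, booking: dict) -> None:
    """Append one booking as a row, always in the same column order."""
    writer = csv.writer(sheet)
    if sheet.tell() == 0:
        writer.writerow(COLUMNS)  # header only on an empty sheet
    writer.writerow([booking.get(col, "") for col in COLUMNS])
#
def notification_text(booking: dict) -> str:
    """Build the short message a follow-up email step might send to the sales team."""
    return f"New appointment booked with {booking['email_address']}: {booking['call_summary']}"
```
The fixed column order is the point: every call produces a row with the same shape, so the sheet behaves like a simple database.
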
00:09:35.649 --> 00:09:38.019
What's the final step for production? This is

00:09:38.019 --> 00:09:40.419
super important. n8n gives you separate URLs

00:09:40.419 --> 00:09:43.000
for testing and for production. Once you've tested

00:09:43.000 --> 00:09:45.519
everything using the test URL, you must copy

00:09:45.519 --> 00:09:49.019
the production URL from n8n, go back into VAPI

00:09:49.019 --> 00:09:51.679
settings, replace the test URL with the production

00:09:51.679 --> 00:09:54.259
one, and then the final click, toggle your n8n

00:09:54.259 --> 00:09:57.639
workflow to active. Now it's live, ready to run

00:09:57.639 --> 00:10:01.039
24/7. Got it. Test, confirm, then switch to the

00:10:01.039 --> 00:10:03.820
production URL and activate. So once this basic

00:10:03.820 --> 00:10:05.980
booking and logging system is running, the potential

00:10:05.980 --> 00:10:09.919
seems huge. Whoa. I mean, imagine scaling this

00:10:09.919 --> 00:10:11.519
core system, handling, I don't know, a billion

00:10:11.519 --> 00:10:14.279
queries across a massive organization. That VAPI

00:10:14.279 --> 00:10:16.360
and n8n connection, it feels like a foundational

00:10:16.360 --> 00:10:19.059
piece for some serious automation. It absolutely

00:10:19.059 --> 00:10:21.899
is. That core pattern, voice interaction, structured

00:10:21.899 --> 00:10:24.679
data extraction, webhook to automation platform,

00:10:24.679 --> 00:10:28.500
is incredibly powerful and scalable. The simple

00:10:28.500 --> 00:10:30.779
booking and logging we discussed, that's just

00:10:30.779 --> 00:10:33.139
scratching the surface. So what comes next? What

00:10:33.139 --> 00:10:35.259
are the more advanced possibilities? Well, think

00:10:35.259 --> 00:10:38.100
about CRM integration. Because we have that structured

00:10:38.100 --> 00:10:40.940
data, like the email address, your n8n workflow

00:10:40.940 --> 00:10:43.720
could easily look up that email in your HubSpot

00:10:43.720 --> 00:10:46.340
or Salesforce. If the contact exists, update

00:10:46.340 --> 00:10:48.200
their record with notes from the call summary.

00:10:48.200 --> 00:10:50.960
If it's a new person, create a new contact automatically.

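NOTE
Editor's sketch, not from the source: the look-up-then-update-or-create logic, with a plain dict standing in for the CRM.
The function name and record shape are hypothetical; no HubSpot or Salesforce API is used here.
```python
def upsert_contact(crm: dict, email: str, call_summary: str) -> str:
    """Update the contact if the email exists, otherwise create it; return which action ran."""
    key = email.strip().lower()  # normalize so the same address always matches
    if key in crm:
        crm[key]["notes"].append(call_summary)
        return "updated"
    crm[key] = {"email": key, "notes": [call_summary]}
    return "created"
```
Normalizing the email before the lookup is what keeps repeat callers from turning into duplicate contacts.
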
00:10:51.259 --> 00:10:53.970
No manual data entry needed. That saves a ton

00:10:53.970 --> 00:10:55.769
of time. And you mentioned logic earlier. Right.

00:10:55.889 --> 00:10:58.049
This is where n8n really shines. You can add

00:10:58.049 --> 00:11:01.210
IF nodes, conditional logic. So if the call summary

00:11:01.210 --> 00:11:03.370
contains keywords like complaint or unhappy,

00:11:03.690 --> 00:11:06.169
the workflow could automatically create a high

00:11:06.169 --> 00:11:09.070
priority support ticket in Zendesk or whatever

00:11:09.070 --> 00:11:12.730
you use. If the summary mentions interested in

00:11:12.730 --> 00:11:15.710
enterprise plan or something high value, maybe

00:11:15.710 --> 00:11:17.809
it triggers a different path, adding them to

00:11:17.809 --> 00:11:20.350
a specific sales sequence, alerting a senior

00:11:20.350 --> 00:11:22.970
account manager immediately. It's real-time

00:11:22.970 --> 00:11:24.870
intelligent routing based on the conversation

00:11:24.870 --> 00:11:28.450
content. That ability to use conditional logic

00:11:28.450 --> 00:11:32.149
to triage issues or opportunities seems incredibly

00:11:32.149 --> 00:11:35.570
valuable. If you're using that logic, what's

00:11:35.570 --> 00:11:37.509
probably the most high-value immediate action

00:11:37.509 --> 00:11:39.879
you could trigger? I'd say creating a support

00:11:39.879 --> 00:11:42.320
ticket and instantly notifying a manager when

00:11:42.320 --> 00:11:44.759
a complaint is detected. That rapid response

00:11:44.759 --> 00:11:47.360
can save a customer relationship. Makes sense.

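NOTE
Editor's sketch, not from the source: the IF-node style routing on the call summary, written as a Python function.
The keyword lists come from the examples in the conversation; the route names are hypothetical labels.
```python
COMPLAINT_KEYWORDS = ("complaint", "unhappy")
HIGH_VALUE_KEYWORDS = ("enterprise plan",)
#
def route_call(call_summary: str) -> str:
    """Pick a downstream path from keywords in the summary; complaints take priority."""
    text = call_summary.lower()
    if any(word in text for word in COMPLAINT_KEYWORDS):
        return "support_ticket_high_priority"
    if any(word in text for word in HIGH_VALUE_KEYWORDS):
        return "alert_senior_account_manager"
    return "log_only"
```
Checking complaints first is a design choice in this sketch, so an unhappy enterprise customer goes to support before sales.
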
00:11:47.580 --> 00:11:49.960
Okay, so we have this powerful agent and backend

00:11:49.960 --> 00:11:52.440
automation. How do we actually let people call

00:11:52.440 --> 00:11:54.340
it? Give us a phone number. VAPI makes that

00:11:54.340 --> 00:11:56.320
pretty easy too. They actually provide up to

00:11:56.320 --> 00:11:59.399
10 free US phone numbers right within their platform.

00:11:59.580 --> 00:12:01.679
You can just grab one, configure it, point directly

00:12:01.679 --> 00:12:04.120
to the booking agent we designed, and boom, it's

00:12:04.120 --> 00:12:06.320
public. 10 free numbers is quite generous. What

00:12:06.320 --> 00:12:08.120
if you already have business numbers? Right.

00:12:08.200 --> 00:12:10.539
Maybe with another provider like Twilio. You

00:12:10.539 --> 00:12:13.279
can import those too. VAPI allows you to bring

00:12:13.279 --> 00:12:15.519
your existing numbers over so your customers

00:12:15.519 --> 00:12:17.879
don't have to learn a new number. It maintains

00:12:17.879 --> 00:12:20.950
that consistency. All right, let's shift to some

00:12:20.950 --> 00:12:24.230
practicalities. Cost is always a factor. How

00:12:24.230 --> 00:12:25.990
does the pricing work for something like this?

00:12:26.110 --> 00:12:29.629
Is it expensive to run? It's typically a

00:12:29.629 --> 00:12:32.110
pay-as-you-go model, which is nice. The source

00:12:32.110 --> 00:12:34.110
material actually gives an example after running

00:12:34.110 --> 00:12:36.210
multiple test calls for that whole booking flow

00:12:36.210 --> 00:12:38.870
we described. It only used about 1.6 credits.

00:12:39.169 --> 00:12:41.389
1.6 credits. That sounds really affordable.

00:12:41.929 --> 00:12:44.610
But I assume that's for like the perfect scenario,

00:12:44.769 --> 00:12:46.669
the happy path test. That's a fair assumption,

00:12:46.909 --> 00:12:48.889
yeah. What happens if a call gets complicated?

00:12:49.710 --> 00:12:52.169
If the user makes the agent use the check availability

00:12:52.169 --> 00:12:54.769
tool five times because they keep changing their

00:12:54.769 --> 00:12:57.830
mind, or if we decide we need the absolute best

00:12:57.830 --> 00:13:00.850
reasoning and switch from the GPT-4o cluster to the

00:13:00.850 --> 00:13:03.710
full GPT-4 model, where do the costs really

00:13:03.710 --> 00:13:06.149
come from? Great points. The main things that

00:13:06.149 --> 00:13:09.450
drive up costs are, first, the AI model you choose.

00:13:09.990 --> 00:13:13.309
Full GPT-4 will cost more per minute than the

00:13:13.309 --> 00:13:16.909
GPT-4o cluster, definitely. Second is simply the

00:13:16.909 --> 00:13:20.070
call duration. Longer calls cost more. And third,

00:13:20.269 --> 00:13:22.029
as you pointed out, is the number of tool calls.

00:13:22.470 --> 00:13:24.429
Every time the agent has to make an external

00:13:24.429 --> 00:13:26.669
call, like checking the calendar or creating

00:13:26.669 --> 00:13:28.809
the event, that usually incurs a small cost.

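NOTE
Editor's sketch, not from the source: the three cost drivers (model rate, call duration, tool calls) combined into one estimate.
The formula and every rate passed in are hypothetical; real pricing has to come from your provider's billing page.
```python
def estimate_call_cost(minutes: float, model_rate_per_minute: float,
                       tool_calls: int, cost_per_tool_call: float) -> float:
    """Estimate one call's cost as duration times model rate plus a flat fee per tool call."""
    if minutes < 0 or tool_calls < 0:
        raise ValueError("minutes and tool_calls must be non-negative")
    return minutes * model_rate_per_minute + tool_calls * cost_per_tool_call
#
# Same two-minute call, but the second caller changes their mind and triggers five tool calls.
happy_path = estimate_call_cost(2.0, 0.25, 2, 0.5)
indecisive = estimate_call_cost(2.0, 0.25, 5, 0.5)
print(happy_path, indecisive)
```
Even with made-up rates, the comparison shows why fewer tool calls per booking matters at scale.
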
00:13:29.269 --> 00:13:31.750
So, efficient prompting that gets the job done

00:13:31.750 --> 00:13:34.250
with fewer tool calls, that's key for keeping

00:13:34.250 --> 00:13:37.210
operational costs low, especially at scale. Okay,

00:13:37.289 --> 00:13:39.350
cost management is one thing. What about making

00:13:39.350 --> 00:13:41.779
the agent reliable, resilient? Building something

00:13:41.779 --> 00:13:44.120
that's ready for the messy real world? Resilience

00:13:44.120 --> 00:13:46.600
testing is critical. You absolutely have to test

00:13:46.600 --> 00:13:48.259
the happy path where everything goes perfectly,

00:13:48.259 --> 00:13:50.580
but that's not enough. You need to test edge cases.

00:13:50.580 --> 00:13:53.059
What happens when the user gives an unclear request?

00:13:53.059 --> 00:13:55.759
What if they change their mind repeatedly? And

00:13:55.759 --> 00:13:57.580
I guess, how does it handle difficult callers?

00:13:57.580 --> 00:14:00.539
Exactly. You need to test the frustrated caller

00:14:00.539 --> 00:14:03.220
scenario. Does the agent remain calm, stick to

00:14:03.220 --> 00:14:05.899
its script, stay helpful? Or does it break down?

00:14:05.899 --> 00:14:09.620
And related to that is tool failure planning.

00:14:10.169 --> 00:14:12.409
Keep your tools focused, doing one thing well,

00:14:12.570 --> 00:14:15.350
but also explicitly tell the agent in the prompt

00:14:15.350 --> 00:14:18.629
what to do if a tool fails. If the Google Calendar

00:14:18.629 --> 00:14:21.649
API is down or throws an error, the agent needs

00:14:21.649 --> 00:14:24.309
instructions. Something like, if a tool fails,

00:14:24.490 --> 00:14:26.450
apologize to the user, explain you're having

00:14:26.450 --> 00:14:29.190
technical difficulties, and offer to take a message

00:14:29.190 --> 00:14:30.950
or have a human call them back. Planning for

00:14:30.950 --> 00:14:33.970
failure. Okay. What about common issues people

00:14:33.970 --> 00:14:36.429
run into once these agents are live? Are there

00:14:36.429 --> 00:14:38.860
quick fixes? Yeah, there are a few common ones.

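NOTE
Editor's sketch, not from the source: the tool-failure rule discussed just before this point, as a small Python wrapper.
The wrapper name and the fallback wording are hypothetical; in the walkthrough this behavior is set through the system prompt.
```python
FALLBACK_MESSAGE = (
    "I'm sorry, I'm having technical difficulties right now. "
    "Can I take a message or have someone call you back?"
)
#
def run_tool_safely(tool, *args, **kwargs):
    """Run a tool and return (ok, result_or_fallback) so the conversation never dead-ends."""
    try:
        return True, tool(*args, **kwargs)
    except Exception:  # broad on purpose: any tool error should trigger the fallback
        return False, FALLBACK_MESSAGE
```
Returning a flag alongside the message lets the caller log the failure while the agent still says something useful.
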
00:14:38.960 --> 00:14:41.299
Sometimes agents get repetitive. The fix? Go

00:14:41.299 --> 00:14:43.360
back to the system prompt and reinforce the rule.

00:14:43.559 --> 00:14:45.940
Never repeat information you have already stated

00:14:45.940 --> 00:14:48.139
clearly. OK, what else? Giving out wrong times

00:14:48.139 --> 00:14:49.879
for appointments. That often comes down to time

00:14:49.879 --> 00:14:52.460
zone confusion. So double check the time zone

00:14:52.460 --> 00:14:55.080
settings in VAPI and in your calendar, and definitely

00:14:55.080 --> 00:14:57.419
make sure that expression in the prompt includes

00:14:57.419 --> 00:14:59.799
the correct time zone, like America/Chicago

00:14:59.799 --> 00:15:02.659
in the example. Alignment is key. Yeah. Anything

00:15:02.659 --> 00:15:06.059
for making it sound less robotic? Oh, yeah.

00:15:06.419 --> 00:15:08.820
Sometimes the default voice just doesn't sound

00:15:08.820 --> 00:15:11.059
right. Try changing the voice provider or the

00:15:11.059 --> 00:15:14.440
specific voice model within VAPI. Also, you

00:15:14.440 --> 00:15:16.879
can explicitly tell the agent in the prompt to

00:15:16.879 --> 00:15:19.740
speak naturally like a helpful human colleague.

00:15:19.980 --> 00:15:22.980
Use contractions where appropriate. Loosen up

00:15:22.980 --> 00:15:26.259
the style guide a bit. Good tips. Now, a trickier

00:15:26.259 --> 00:15:28.799
one. What happens if the agent starts, you know,

00:15:28.799 --> 00:15:31.919
hallucinating, making things up, giving incorrect

00:15:31.919 --> 00:15:33.980
information about policies or something sensitive?

00:15:34.240 --> 00:15:37.059
Given how important consistency is, what's the

00:15:37.059 --> 00:15:38.980
crucial fix there? Right. That's a big one. If

00:15:38.980 --> 00:15:41.320
it starts going off script with facts or policies,

00:15:41.600 --> 00:15:44.059
force the agent to check its knowledge base before

00:15:44.059 --> 00:15:46.539
providing any policy answers. You need to ground

00:15:46.539 --> 00:15:48.940
it firmly in verified information. Grounding.

00:15:49.019 --> 00:15:51.419
Makes sense. Okay, let's pull this all together.

00:15:51.789 --> 00:15:53.330
What we've really outlined today is a blueprint,

00:15:53.570 --> 00:15:55.889
isn't it? Moving AI from just a concept or a

00:15:55.889 --> 00:15:58.710
fun tool into a genuine autonomous worker integrated

00:15:58.710 --> 00:16:01.149
into business processes. You've got VAPI handling

00:16:01.149 --> 00:16:02.990
the voice, the understanding, the action tools,

00:16:03.029 --> 00:16:05.970
and then n8n providing that robust, scalable

00:16:05.970 --> 00:16:09.769
automation backbone to connect it all, log the

00:16:09.769 --> 00:16:12.460
data, and trigger downstream actions. And the

00:16:12.460 --> 00:16:15.340
impact is potentially massive. We're seeing sources

00:16:15.340 --> 00:16:17.299
talk about businesses cutting customer service

00:16:17.299 --> 00:16:20.360
costs by like 60 to 80 percent and maybe even

00:16:20.360 --> 00:16:22.399
more importantly, dropping lead response times

00:16:22.399 --> 00:16:25.820
from hours or even days down to seconds. This

00:16:25.820 --> 00:16:29.600
architecture, it turns AI from a cool idea into

00:16:29.600 --> 00:16:33.299
a 24/7 operational asset that can directly impact

00:16:33.299 --> 00:16:35.940
the bottom line. The pieces seem to be there.

00:16:36.019 --> 00:16:38.399
The tech feels mature enough. The latency is

00:16:38.399 --> 00:16:40.340
getting incredibly low. The architectures like

00:16:40.340 --> 00:16:42.340
this one are being proven out. It feels like

00:16:42.340 --> 00:16:44.220
the market is really ready for this level of

00:16:44.220 --> 00:16:46.519
automation. It absolutely is. And the competitive

00:16:46.519 --> 00:16:48.379
advantage, well, it's going to go to those who

00:16:48.379 --> 00:16:51.019
build this capability first. The question for

00:16:51.019 --> 00:16:53.059
you listening isn't really if this kind of AI

00:16:53.059 --> 00:16:55.529
worker becomes standard. It's more like, will

00:16:55.529 --> 00:16:57.110
you be ready to build it? Or will you be playing

00:16:57.110 --> 00:16:58.850
catch up while others are already reaping the

00:16:58.850 --> 00:17:01.909
benefits? A powerful thought to end on. Thank

00:17:01.909 --> 00:17:03.950
you for joining us for this deep dive. We encourage

00:17:03.950 --> 00:17:06.250
you to really think about how structured data

00:17:06.250 --> 00:17:09.369
extraction and conditional automation logic could

00:17:09.369 --> 00:17:12.089
start transforming your own workflows. Definitely

00:17:12.089 --> 00:17:13.829
food for thought. We'll see you next time.
