WEBVTT

00:00:00.000 --> 00:00:02.419
Welcome to the Deep Dive. We're here to pull

00:00:02.419 --> 00:00:03.980
out the key bits of knowledge so you can get

00:00:03.980 --> 00:00:06.580
up to speed fast. Today we're looking at something,

00:00:06.740 --> 00:00:09.220
well, really transformative, I think. You know

00:00:09.220 --> 00:00:11.560
that AI agent you use, the text-based one?

00:00:11.679 --> 00:00:13.800
Super smart, super helpful. Well, what if it

00:00:13.800 --> 00:00:15.820
can actually talk back to you? Yeah, not like,

00:00:15.839 --> 00:00:18.460
you know, an old GPS voice. Exactly. Not clunky

00:00:18.460 --> 00:00:21.140
or robotic, but like, with a natural, engaging

00:00:21.140 --> 00:00:24.660
voice. A real personality. Something that sounds,

00:00:24.960 --> 00:00:31.260
well, human. And the thing is, it's something

00:00:31.260 --> 00:00:34.740
you can genuinely build today, like right now,

00:00:34.840 --> 00:00:37.039
and without writing code. That's the amazing

00:00:37.039 --> 00:00:40.000
part, the accessibility. Right. So our mission

00:00:40.000 --> 00:01:00.750
for this deep dive is... So, OK, let's kick things

00:01:00.750 --> 00:01:03.909
off. Method one. We're calling it the AI voicemail

00:01:03.909 --> 00:01:08.010
system expert. This sounds well, it sounds clever,

00:01:08.090 --> 00:01:10.489
like a smart way to handle tasks that aren't

00:01:10.489 --> 00:01:13.730
instant. Right. This asynchronous idea. Exactly.

00:01:13.730 --> 00:01:18.310
Yeah. Think of it like a super-intelligent voice

00:01:18.310 --> 00:01:21.010
mailbox. You, the user, leave a voice message,

00:01:21.109 --> 00:01:23.650
maybe a complex question or instruction. And

00:01:23.650 --> 00:01:26.950
the AI doesn't rush. It takes its time. It listens.

00:01:27.069 --> 00:01:29.129
It thinks, processes everything, and then responds

00:01:29.129 --> 00:01:31.629
with its own audio message back to you. Oh, OK.

00:01:31.769 --> 00:01:34.109
So this asynchronous thing, it's perfect for

00:01:34.109 --> 00:01:36.890
jobs that need, you know, deeper thought, summarizing

00:01:36.890 --> 00:01:39.329
a really long document maybe or doing some complex

00:01:39.329 --> 00:01:41.530
research. Right. Not that instant back and forth,

00:01:41.689 --> 00:01:43.849
but more considered. Precisely. It's about getting

00:01:43.849 --> 00:01:46.390
a thoughtful, rich response when you don't need

00:01:46.390 --> 00:01:48.319
it that very second. That makes a lot of sense.

00:01:48.319 --> 00:01:50.099
Like having a research assistant you can brief,

00:01:50.099 --> 00:01:52.060
and they come back later with the findings. So,

00:01:52.180 --> 00:01:54.359
OK, what's the actual journey from me hitting

00:01:54.359 --> 00:01:57.000
record to the AI talking back? Walk us through

00:01:57.000 --> 00:01:59.560
this voice message pipeline. Yeah, it's quite

00:01:59.560 --> 00:02:01.620
an elegant flow, actually, like a little assembly

00:02:01.620 --> 00:02:04.319
line. It starts when you send a voice message.

00:02:04.640 --> 00:02:07.060
Let's say you send it to a Telegram bot. Then

00:02:07.060 --> 00:02:11.219
n8n, acting as the orchestrator, grabs that audio

00:02:11.219 --> 00:02:15.050
file and downloads it. That raw audio goes straight

00:02:15.050 --> 00:02:17.349
to ElevenLabs. Their models transcribe it, turn

00:02:17.349 --> 00:02:20.289
it into text, and they do it brilliantly. That

00:02:20.289 --> 00:02:23.349
text then gets fed to the AI brain. Could be

00:02:23.349 --> 00:02:25.509
ChatGPT, could be Claude, whatever you're using.

00:02:25.590 --> 00:02:28.389
The AI does its thing, generates the response

00:02:28.389 --> 00:02:31.310
in text, and that text goes back to ElevenLabs,

00:02:31.389 --> 00:02:33.909
this time for text-to-speech, turning it into

00:02:33.909 --> 00:02:36.389
that natural-sounding audio. And then finally,

00:02:36.509 --> 00:02:39.189
n8n takes that new audio file and sends it back

00:02:39.189 --> 00:02:41.270
to you in Telegram. It's a full circle. Wow.

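NOTE
Editor's sketch, not from the speakers: the pipeline just described can be
pictured as four stages chained together. This is a minimal sketch, assuming
hypothetical stage functions (transcribe, think, synthesize, send_audio) that
stand in for the n8n nodes; none of these names come from n8n, ElevenLabs, or
Telegram, and the stubs below call no external service.
```python
from typing import Callable
# Hypothetical stage signatures; in the real build each one is an n8n node.
def run_voicemail_pipeline(
    audio: bytes,
    chat_id: int,
    transcribe: Callable[[bytes], str],
    think: Callable[[str], str],
    synthesize: Callable[[str], bytes],
    send_audio: Callable[[int, bytes], None],
) -> str:
    """Run one voice message through the four stages and return the reply text."""
    question = transcribe(audio)      # speech-to-text
    answer = think(question)          # the AI model writes a text reply
    reply_audio = synthesize(answer)  # text-to-speech
    send_audio(chat_id, reply_audio)  # deliver to the chat that asked
    return answer
# Stub stages so the sketch runs offline.
sent = []
reply = run_voicemail_pipeline(
    audio=b"fake-ogg-bytes",
    chat_id=42,
    transcribe=lambda a: "summarize my notes",
    think=lambda q: f"Here is a summary for: {q}",
    synthesize=lambda t: t.encode("utf-8"),
    send_audio=lambda cid, data: sent.append((cid, data)),
)
```
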
00:02:41.370 --> 00:02:43.819
Okay, that sounds incredibly capable. And

00:02:43.819 --> 00:02:45.560
the bit that really grabs me is the no-code

00:02:45.560 --> 00:02:47.439
part you mentioned. Like, for someone wanting

00:02:47.439 --> 00:02:49.580
to build this, what are the actual pieces, the

00:02:49.580 --> 00:02:52.400
Lego bricks in n8n? How do they, you know, snap

00:02:52.400 --> 00:02:54.780
together? Right, yeah, Lego bricks is a great

00:02:54.780 --> 00:02:56.360
way to put it. They really do feel like they're

00:02:56.360 --> 00:02:59.550
made for each other. First, your listening post.

00:02:59.729 --> 00:03:02.689
In n8n, that's a trigger node, like the Telegram

00:03:02.689 --> 00:03:04.990
trigger. You set it up to catch incoming audio

00:03:04.990 --> 00:03:07.009
files, specifically the voice message format.

00:03:07.270 --> 00:03:10.689
And crucially, it grabs the chat ID... The return

00:03:10.689 --> 00:03:13.189
address. Exactly. It's the unique return address,

00:03:13.250 --> 00:03:15.370
so the AI knows where to send its response back.

00:03:15.849 --> 00:03:19.610
Next trick. The universal translator. That's

00:03:19.610 --> 00:03:22.110
an ElevenLabs node doing the speech-to-text.

00:03:22.169 --> 00:03:25.030
It takes the audio file and just grabs and transcribes

00:03:25.030 --> 00:03:28.770
it. The beauty is n8n handles passing that audio

00:03:28.770 --> 00:03:31.289
data automatically. It's pretty seamless. Okay,

00:03:31.349 --> 00:03:56.710
no complex mapping needed there. Nope. And it

00:03:56.710 --> 00:03:58.930
totally changes the whole feel of the interaction.

00:03:59.310 --> 00:04:02.110
So the system prompt is really the soul of the

00:04:02.110 --> 00:04:04.150
agent. OK, then what? How does it get its voice

00:04:04.150 --> 00:04:07.430
back? Right. So after the AI thinks and writes

00:04:07.430 --> 00:04:10.430
its text response, you use another ElevenLabs node.

00:04:10.629 --> 00:04:13.430
This one's for text-to-speech. You connect the

00:04:13.430 --> 00:04:16.350
AI's text output to this node. And this is where

00:04:16.350 --> 00:04:18.810
you, like, cast your agent. You choose a voice

00:04:18.810 --> 00:04:21.029
from the ElevenLabs library. They have tons. Or

00:04:21.029 --> 00:04:23.189
you can use a specific voice ID if you've cloned

00:04:23.189 --> 00:04:25.389
one or have a favorite. Got it. Choosing the

00:04:25.389 --> 00:04:28.449
voice actor. Pretty much. And the final step,

00:04:28.670 --> 00:04:31.410
delivering the message. That's another Telegram

00:04:31.410 --> 00:04:33.790
node, a sender this time. You make sure it sends

00:04:33.790 --> 00:04:36.050
the audio file generated by ElevenLabs back

00:04:36.050 --> 00:04:38.189
to the original chat ID, so the message goes

00:04:38.189 --> 00:04:39.670
right back to the person who sent the request.

00:04:39.990 --> 00:04:43.579
And then the test flight. The moment of truth,

00:04:43.680 --> 00:04:45.620
as you called it. I bet it's satisfying seeing

00:04:45.620 --> 00:04:47.620
it all light up. Oh, it really is. Watching each

00:04:47.620 --> 00:04:49.879
node activate on the n8n canvas as your

00:04:49.879 --> 00:04:52.420
message flows through. Very cool. But, you know,

00:04:52.480 --> 00:04:54.519
things don't always work first time. What about

00:04:54.519 --> 00:04:59.100
when, Houston, we have a problem? Any quick debugging

00:04:59.100 --> 00:05:01.480
tips? Yeah, absolutely. Debugging in n8n

00:05:01.480 --> 00:05:03.620
is usually pretty straightforward. The visual

00:05:03.620 --> 00:05:06.379
flow helps a lot. If a node doesn't light up

00:05:06.379 --> 00:05:08.620
green, you know exactly where it's stopped. First

00:05:08.620 --> 00:05:11.079
thing, check the trigger. Does the Telegram node

00:05:11.079 --> 00:05:13.319
even receive the message? If not, maybe check

00:05:13.319 --> 00:05:16.180
your webhook in Telegram as well. Then follow

00:05:16.180 --> 00:05:18.699
the noodles, those lines connecting the nodes.

00:05:18.800 --> 00:05:21.339
Is everything connected? Did you link an output

00:05:21.339 --> 00:05:24.439
to the right input? Easy mistake to make. But

00:05:24.439 --> 00:05:26.620
honestly, the most common failure point nine

00:05:26.620 --> 00:05:30.370
times out of ten is probably credentials. Ah,

00:05:30.769 --> 00:05:33.550
yes, the classic. Triple check them. Your ElevenLabs

00:05:33.550 --> 00:05:36.410
key, your AI service key. One typo and the

00:05:36.410 --> 00:05:39.449
whole thing falls over. And maybe most importantly,

00:05:39.649 --> 00:05:47.790
read the error message. That's really solid advice.

00:05:47.949 --> 00:05:52.029
OK, so this AI voicemail bot is fantastic for

00:05:52.029 --> 00:05:55.069
those deeper asynchronous things. But what if

00:05:55.069 --> 00:05:58.490
you want that instant back and forth, the conversational

00:05:58.490 --> 00:06:00.689
feel? This is where it gets really interesting,

00:06:00.810 --> 00:06:04.410
I think. We're talking about building a fluid,

00:06:04.730 --> 00:06:08.490
real-time conversational AI, like talking to

00:06:08.490 --> 00:06:10.629
a human assistant on the phone. This sounds like

00:06:10.629 --> 00:06:12.490
the main event, the showstopper. Yeah, this is

00:06:12.490 --> 00:06:15.050
where it feels truly interactive. So how do we

00:06:15.050 --> 00:06:18.430
achieve that, the proper AI assistant experience?

00:06:18.850 --> 00:06:21.290
Okay, so for this real-time system, you need

00:06:21.290 --> 00:06:23.329
a totally different setup, a different kind of

00:06:23.329 --> 00:06:25.269
partnership. I like to think of it using the

00:06:25.269 --> 00:06:28.480
NASA mission control analogy. Ooh, okay. Tell

00:06:28.480 --> 00:06:30.959
me more. Right. In this model, ElevenLabs is mission

00:06:30.959 --> 00:06:32.839
control. It's the sophisticated front end. It

00:06:32.839 --> 00:06:34.879
handles the direct chat with the user, the voice,

00:06:34.959 --> 00:06:37.100
the personality, managing the flow of the conversation.

00:06:37.360 --> 00:06:39.480
It's talking to the astronaut, basically. Got

00:06:39.480 --> 00:06:42.279
it. The voice on the comms. Exactly. Now, your

00:06:42.279 --> 00:06:44.420
n8n workflow, that's the specialist team back

00:06:44.420 --> 00:06:47.420
in Houston. The engineers, the scientists, they

00:06:47.420 --> 00:06:50.660
don't talk directly to the user. Instead, they

00:06:50.660 --> 00:06:53.660
are a powerful tool that Mission Control, ElevenLabs,

00:06:53.660 --> 00:06:56.740
calls upon when the user asks something complex,

00:06:57.639 --> 00:07:00.379
something needing research or a specific action performed.

00:07:00.560 --> 00:07:03.740
Ah, I see. So ElevenLabs handles the chat and

00:07:03.740 --> 00:07:05.579
n8n does the heavy lifting in the background

00:07:05.579 --> 00:07:08.339
when needed. Precisely. That's the key takeaway.

00:07:08.639 --> 00:07:11.500
ElevenLabs manages the conversation and n8n

00:07:11.500 --> 00:07:14.839
handles the actions or the deep information retrieval.

00:07:14.939 --> 00:07:18.000
That analogy makes it crystal clear. So if ElevenLabs

00:07:18.000 --> 00:07:20.819
is mission control, how do we actually build

00:07:20.819 --> 00:07:23.060
it? How do we set up our agent in ElevenLabs for

00:07:23.060 --> 00:07:25.160
these live calls? You actually do it right inside

00:07:25.160 --> 00:07:27.019
your ElevenLabs account. They have a section called

00:07:27.019 --> 00:07:29.620
Conversational AI. You go there, create a new

00:07:29.620 --> 00:07:31.439
agent, and this is where you play casting director

00:07:31.439 --> 00:07:33.839
again. Give it a name, something fitting. Choose

00:07:33.839 --> 00:07:36.079
a voice from their library that matches the personality

00:07:36.079 --> 00:07:39.529
you want: calm and professional, energetic and

00:07:39.529 --> 00:07:42.110
friendly. And then you craft that initial greeting,

00:07:42.310 --> 00:07:44.370
something simple to start, like, hello, how can

00:07:44.370 --> 00:07:46.029
I help you today? Okay, straightforward enough.

00:07:46.470 --> 00:07:50.129
But how do we connect mission control to the

00:07:50.129 --> 00:07:52.670
specialist team, to n8n? That feels like the

00:07:52.670 --> 00:07:55.250
critical link. It absolutely is. So back in your

00:07:55.250 --> 00:07:58.370
ElevenLabs agent settings, you scroll down to tools.

00:07:58.959 --> 00:08:01.420
Here, you add a custom webhook tool. This is

00:08:01.420 --> 00:08:03.779
literally the direct phone line to your n8n workflow.

00:08:04.139 --> 00:08:06.439
Okay. You need to give the AI a clear briefing

00:08:06.439 --> 00:08:08.300
about this tool. So you give it a name, like

00:08:08.300 --> 00:08:11.120
n8n Web Researcher, and a description, something

00:08:11.120 --> 00:08:13.839
really clear like, call this tool to search the

00:08:13.839 --> 00:08:16.379
web and find information about any topic. It

00:08:16.379 --> 00:08:19.279
is an expert research assistant. So the AI knows

00:08:19.279 --> 00:08:21.519
what it's for. Exactly. You set the method to

00:08:21.519 --> 00:08:24.420
POST and then you paste in the URL from an n8n

00:08:24.420 --> 00:08:27.339
webhook node that connects them. Got it. So the

00:08:27.339 --> 00:08:30.100
agent knows what the specialist team does, but

00:08:30.100 --> 00:08:31.800
it also needs to know when to call them, right?

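NOTE
Editor's sketch, not from the speakers: the tool setup above boils down to a
name, a description, the POST method and the n8n webhook URL. The snippet
builds, without sending, the kind of request such a tool would make. The
field names, the "query" payload key and the example.com URL are illustrative
assumptions, not the ElevenLabs schema.
```python
import json
import urllib.request
# Hypothetical tool record; keys are illustrative, not an official format.
TOOL = {
    "name": "n8n Web Researcher",
    "description": "Call this tool to search the web and find information about any topic.",
    "method": "POST",
    "url": "https://n8n.example.com/webhook/research",  # placeholder URL
}
def build_tool_request(tool: dict, query: str) -> urllib.request.Request:
    """Build (but do not send) the POST request carrying the user's query."""
    body = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        tool["url"],
        data=body,
        method=tool["method"],
        headers={"Content-Type": "application/json"},
    )
req = build_tool_request(TOOL, "NVIDIA Q4 2025 forecast")
```
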
00:08:31.879 --> 00:08:34.019
Yeah. That brings us to the prime directive,

00:08:34.220 --> 00:08:37.399
the system prompt for the ElevenLabs agent itself.

00:08:37.639 --> 00:08:40.299
How do we teach it to, essentially, put the

00:08:40.299 --> 00:08:43.399
user on hold and call the n8n tool at the right

00:08:43.399 --> 00:08:45.980
moment. Yeah, this prompt is, it's like the agent's

00:08:45.980 --> 00:08:48.200
constitution. It dictates everything. ElevenLabs

00:08:48.200 --> 00:08:50.799
has a nice generate with AI feature to get you

00:08:50.799 --> 00:08:53.779
started with a base prompt. But you must manually

00:08:53.779 --> 00:08:56.860
add a really crucial instruction. You have to

00:08:56.860 --> 00:09:00.100
explicitly tell the agent. When the user asks

00:09:00.100 --> 00:09:02.139
a question that requires up-to-date information

00:09:02.139 --> 00:09:05.500
or web research, use the n8n Web Researcher tool.

00:09:05.840 --> 00:09:08.360
Tell the user you are searching and then wait

00:09:08.360 --> 00:09:10.860
for the tool's response before continuing. Ah,

00:09:11.059 --> 00:09:14.240
okay, so it's a non-negotiable rule. Absolutely.

00:09:14.240 --> 00:09:16.860
Without that specific instruction, the agent

00:09:16.860 --> 00:09:18.919
might just try to answer from its general knowledge,

00:09:19.039 --> 00:09:20.879
which could be outdated, or it might just get

00:09:20.879 --> 00:09:23.419
confused about how to use the tool. This tells

00:09:23.419 --> 00:09:25.779
it exactly how to handle research requests. That's

00:09:25.779 --> 00:09:28.340
a really important detail. Okay, and what about

00:09:28.340 --> 00:09:31.879
building the specialist team itself, the n8n

00:09:31.879 --> 00:09:34.159
workflow that receives the call from ElevenLabs?

00:09:34.419 --> 00:09:36.940
What are the key parts there? So the n8n research

00:09:36.940 --> 00:09:39.340
backend is pretty focused. It usually has three

00:09:39.340 --> 00:09:42.710
main parts. First, the researcher. The webhook

00:09:42.710 --> 00:09:45.809
node receives the query from ElevenLabs. You pass

00:09:45.809 --> 00:09:48.889
that query straight to, say, a Perplexity node.

00:09:49.049 --> 00:09:51.029
You'd probably use one of their Sonar models.

00:09:51.169 --> 00:09:52.990
They're designed for real-time web search, pulling

00:09:52.990 --> 00:09:55.450
back sourced info. Okay, so it gets the raw data.

00:09:55.610 --> 00:09:59.649
Right. But that raw data can be a lot. Maybe

00:09:59.649 --> 00:10:02.330
too much for a quick conversational answer. So

00:10:02.330 --> 00:10:05.570
next comes the editor. You pass Perplexity's

00:10:05.570 --> 00:10:08.289
output to another AI agent node. Its only job:

00:10:08.450 --> 00:10:11.570
to be a ruthless editor. You give it a system prompt

00:10:11.570 --> 00:10:14.289
like, summarize the following information concisely,

00:10:14.289 --> 00:10:17.289
no more than three sentences. Nice. Keep it brief.

00:10:17.490 --> 00:10:21.250
Exactly. And finally, the report back. This is

00:10:21.250 --> 00:10:24.570
just a Respond to Webhook node in n8n. It takes

00:10:24.570 --> 00:10:26.629
that short, concise summary from the editor AI

00:10:26.629 --> 00:10:28.690
and sends it straight back to the ElevenLabs agent

00:10:28.690 --> 00:10:31.529
waiting on the line. Loop closed. This sounds

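NOTE
Editor's sketch, not from the speakers: the three-part backend (researcher,
editor, report back) can be mimicked offline. In this sketch research() returns
canned text in place of a live web search, and edit() simply keeps the first
three sentences in place of an AI summarizer. All function names and the
"query" and "summary" keys are assumptions made for illustration.
```python
import json
import re
def research(query: str) -> str:
    """Stand-in for the web-search step; returns canned text."""
    return (
        f"Results for {query}. First finding. Second finding. "
        "Third finding. Fourth finding."
    )
def edit(text: str, max_sentences: int = 3) -> str:
    """Stand-in for the editor agent: keep only the first few sentences."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return " ".join(sentences[:max_sentences])
def handle_webhook(payload: dict) -> str:
    """Webhook payload in, JSON summary out, mirroring the three parts."""
    summary = edit(research(payload["query"]))
    return json.dumps({"summary": summary})
response = handle_webhook({"query": "NVIDIA"})
```
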
00:10:31.529 --> 00:10:34.370
seriously powerful when put together. Let's make

00:10:34.370 --> 00:10:36.769
it real. Can you walk us through a quick test

00:10:36.769 --> 00:10:38.809
flight, a hypothetical conversation so we can

00:10:38.809 --> 00:10:41.370
hear how it flows? Yeah, sure. Imagine you call

00:10:41.370 --> 00:10:43.210
your agent. You could be using your browser,

00:10:43.389 --> 00:10:45.809
your phone. You start. Hello, I'd like to do

00:10:45.809 --> 00:10:48.009
some research on the company NVIDIA. The agent,

00:10:48.129 --> 00:10:50.409
ElevenLabs, responds smoothly using the voice you

00:10:50.409 --> 00:10:52.629
chose. I can certainly help with that. Is there

00:10:52.629 --> 00:10:54.409
anything specific you'd like to know about NVIDIA?

00:10:54.549 --> 00:10:57.720
Okay, nice and natural. Then you say, yeah, let's

00:10:57.720 --> 00:11:01.759
look at their Q4 2025 forecast. Now, the agent

00:11:01.759 --> 00:11:04.120
recognizes this needs the specialist tool because

00:11:04.120 --> 00:11:06.320
of that prompt rule. So it says, understood.

00:11:07.059 --> 00:11:10.039
I'll search for NVIDIA's Q4 2025 forecast now.

00:11:10.259 --> 00:11:12.940
Please give me just a moment. Ah, putting you

00:11:12.940 --> 00:11:15.759
on hold politely. Exactly. And behind the scenes,

00:11:15.879 --> 00:11:20.000
bam, the n8n workflow fires up. Perplexity searches,

00:11:20.299 --> 00:11:22.799
the AI summarizes. The specialist team is working.

00:11:22.980 --> 00:11:25.519
Right. Then maybe five, 10 seconds later, the

00:11:25.519 --> 00:11:28.759
ElevenLabs agent comes back. Okay. I have found

00:11:28.759 --> 00:11:31.740
some information on NVIDIA's quarter four 2025

00:11:31.740 --> 00:11:36.639
forecast. NVIDIA reported revenue of whatever

00:11:36.639 --> 00:11:38.899
the summary is. Wow. It's a seamless conversation,

00:11:39.059 --> 00:11:41.139
even with that complex lookup happening in the

00:11:41.139 --> 00:11:43.379
background. That's genuinely impressive. It completely

00:11:43.379 --> 00:11:46.559
changes the game for AI interaction. So thinking

00:11:46.559 --> 00:11:49.250
bigger. What does this mean for expanding? You

00:11:49.250 --> 00:11:51.230
mentioned a pro-level upgrade, the multi-specialist

00:11:51.230 --> 00:11:52.789
idea. Yeah, this is where it gets really powerful.

00:11:52.990 --> 00:11:54.809
You're not limited to just one n8n specialist

00:11:54.809 --> 00:11:56.990
tool. You can create multiple n8n workflows,

00:11:57.389 --> 00:11:59.370
each starting with a webhook, each designed for

00:11:59.370 --> 00:12:01.370
a different task. Like what? Well, you could

00:12:01.370 --> 00:12:04.570
have your web researcher, but also maybe an internal

00:12:04.570 --> 00:12:06.970
database checker that looks at customer info.

00:12:07.740 --> 00:12:10.139
or a calendar scheduler, or even one that sends

00:12:10.139 --> 00:12:13.320
emails. Okay. You add each of these as separate

00:12:13.320 --> 00:12:15.759
webhook tools in the ElevenLabs agent settings,

00:12:15.899 --> 00:12:18.679
each with a clear name and description. Then

00:12:18.679 --> 00:12:21.259
your ElevenLabs agent, Mission Control, becomes

00:12:21.259 --> 00:12:24.000
much smarter. Based on your conversation, it

00:12:24.000 --> 00:12:26.240
will figure out which specialist tool is the

00:12:26.240 --> 00:12:28.679
right one to call for that specific task. So

00:12:28.679 --> 00:12:30.720
it routes the request intelligently. Exactly.

00:12:30.919 --> 00:12:33.159
It transforms your agent from just a researcher

00:12:33.159 --> 00:12:36.090
into a truly versatile assistant. That's incredible,

00:12:36.250 --> 00:12:39.309
giving the AI a whole team. Okay, so for people

00:12:39.309 --> 00:12:41.289
wanting to actually deploy this, move beyond

00:12:41.289 --> 00:13:03.639
just... Also, consider rate limiting on your

00:13:03.639 --> 00:13:05.539
webhook to prevent accidental or intentional

00:13:05.539 --> 00:13:09.139
overload. And keep a close eye on your API usage

00:13:09.139 --> 00:13:13.460
across ElevenLabs, your AI model provider, and Perplexity. Those

00:13:13.460 --> 00:13:15.340
costs can add up if you're not monitoring them.

00:13:15.519 --> 00:13:18.419
And technically, remember to switch your n8n

00:13:18.419 --> 00:13:20.799
workflow from the test URL to the production

00:13:20.799 --> 00:13:23.179
URL. Just remove the test part and make sure

00:13:23.179 --> 00:13:25.480
the workflow toggle is set to active. Little

00:13:25.480 --> 00:13:29.019
details, but crucial. Okay, beyond the text setup.

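NOTE
Editor's sketch, not from the speakers: "remove the test part" can be shown
as a one-line URL rewrite. This assumes the path layout where test URLs
contain /webhook-test/ and production URLs contain /webhook/; confirm both
URLs in your own Webhook node. The helper name and example.com address are
hypothetical.
```python
def to_production_url(test_url: str) -> str:
    """Swap the test path segment for the production one, once."""
    if "/webhook-test/" not in test_url:
        raise ValueError("not a test webhook URL")
    return test_url.replace("/webhook-test/", "/webhook/", 1)
prod = to_production_url("https://n8n.example.com/webhook-test/research")
```
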
00:13:29.320 --> 00:13:32.159
There's the experience itself, crafting the perfect

00:13:32.159 --> 00:13:34.519
conversational experience. This sounds like an

00:13:34.519 --> 00:13:37.019
art. Any tips on voice selection, keeping the

00:13:37.019 --> 00:13:40.100
flow smooth? It absolutely is an art. For voice

00:13:40.100 --> 00:13:42.320
selection, really think about the agent's role.

00:13:42.500 --> 00:13:45.000
Is it a formal research assistant? Maybe a clear,

00:13:45.019 --> 00:13:47.419
neutral voice? A creative brainstorming partner?

00:13:47.620 --> 00:13:49.820
Perhaps something more expressive? And definitely

00:13:49.820 --> 00:13:52.620
test voices with real content, not just "hello."

00:13:52.820 --> 00:13:55.259
A voice that sounds great for one sentence might

00:13:55.259 --> 00:13:58.230
get grating over a longer explanation. For conversation

00:13:58.230 --> 00:14:00.789
flow, keep the agent's responses conversational

00:14:00.789 --> 00:14:03.750
but also concise. Aim for maybe two or three

00:14:03.750 --> 00:14:06.269
sentences for simple answers. Don't let it ramble.

00:14:06.529 --> 00:14:10.080
And crucially, plan for errors. What happens

00:14:10.080 --> 00:14:14.000
if the n8n workflow times out or fails? The agent

00:14:14.000 --> 00:14:16.419
shouldn't just hang up. It needs a graceful exit,

00:14:16.539 --> 00:14:18.500
like, I'm sorry, I seem to be having trouble

00:14:18.500 --> 00:14:20.379
connecting to my research tool right now. Could

00:14:20.379 --> 00:14:22.340
you try again in a moment? Handling failures

00:14:22.340 --> 00:14:24.679
gracefully, yeah, that's key for a professional

00:14:24.679 --> 00:14:26.720
feel. And this brings us back to the prompt,

00:14:26.779 --> 00:14:29.000
really. How do you ensure it behaves consistently?

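NOTE
Editor's sketch, not from the speakers: the graceful exit can be pictured as a
wrapper that returns a fallback line when the tool times out or raises. In the
setup discussed, this behaviour is written into the agent's prompt; the code
only illustrates the idea. call_tool_safely and the two demo tools are
hypothetical names.
```python
from concurrent.futures import ThreadPoolExecutor
import time
FALLBACK = (
    "I'm sorry, I seem to be having trouble connecting to my research "
    "tool right now. Could you try again in a moment?"
)
def call_tool_safely(tool, query: str, timeout_s: float) -> str:
    """Return the tool's answer, or the fallback line on timeout or error."""
    pool = ThreadPoolExecutor(max_workers=1)
    try:
        return pool.submit(tool, query).result(timeout=timeout_s)
    except Exception:
        # Covers both a timeout and any error raised inside the tool.
        return FALLBACK
    finally:
        pool.shutdown(wait=False)
def slow_tool(query: str) -> str:
    time.sleep(0.5)  # simulates a workflow that takes too long
    return "late"
def broken_tool(query: str) -> str:
    raise RuntimeError("workflow failed")
```
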
00:14:29.600 --> 00:14:32.120
That's advanced prompt engineering for voice.

00:14:33.059 --> 00:14:35.639
The system prompt for your ElevenLabs agent is its

00:14:35.639 --> 00:14:38.679
constitution. It needs explicit rules. Define

00:14:38.679 --> 00:14:41.120
the persona clearly. Set rules for conversation

00:14:41.120 --> 00:14:43.679
management, like always ask clarifying questions

00:14:43.679 --> 00:14:46.340
if a request is vague, keep information concise,

00:14:46.500 --> 00:14:48.720
use natural language. And most importantly for

00:14:48.720 --> 00:14:52.159
this setup, explicit tool usage guidelines. Define

00:14:52.159 --> 00:14:55.059
exactly when to use each n8n tool, how to introduce

00:14:55.059 --> 00:14:57.320
it, and what to tell the user if a tool fails

00:14:57.320 --> 00:14:59.000
or takes too long. It sounds like the prompt

00:14:59.000 --> 00:15:00.860
is doing a lot of heavy lifting in managing the

00:15:00.860 --> 00:15:03.399
whole interaction. It really is. It's the brain

00:15:03.399 --> 00:15:06.340
governing the conversation flow and tool orchestration.

00:15:06.440 --> 00:15:09.220
So this blueprint, it really unlocks some serious

00:15:09.220 --> 00:15:12.299
superpowers. These advanced voice agent patterns.

00:15:12.539 --> 00:15:14.860
We're talking multi-tool agents doing research,

00:15:15.100 --> 00:15:18.620
scheduling, emailing. Checking databases, analyzing

00:15:18.620 --> 00:15:21.700
sales data. Yeah. And context-aware conversations.

00:15:22.059 --> 00:15:24.179
Yeah. Connecting to memory systems like Zep or

00:15:24.179 --> 00:15:27.179
Supabase so it remembers past chats. Exactly.

00:15:27.179 --> 00:15:29.519
So you can pick up where you left off or it can

00:15:29.519 --> 00:15:31.919
build knowledge over time. Makes it feel much

00:15:31.919 --> 00:15:34.820
more intelligent. And then my favorite, Star

00:15:34.820 --> 00:15:38.200
Trek mode. Voice-activated workflows. Using

00:15:38.200 --> 00:15:41.360
the voice agent as, like, a master controller

00:15:41.360 --> 00:15:44.379
for other n8n automations. Precisely. Imagine

00:15:44.379 --> 00:15:46.240
just saying, computer, run the morning sales

00:15:46.240 --> 00:15:48.659
report automation. Oh, man. For anyone who's

00:15:48.659 --> 00:15:50.639
manually pulled reports every morning, that sounds

00:15:50.639 --> 00:15:52.919
like pure magic. Have you actually seen teams

00:15:52.919 --> 00:15:54.379
implement that kind of thing? Does this save

00:15:54.379 --> 00:15:56.399
a lot of time? Oh, absolutely. The efficiency

00:15:56.399 --> 00:15:58.620
gains can be huge, especially for repetitive

00:15:58.620 --> 00:16:00.960
tasks that can be triggered by a simple voice

00:16:00.960 --> 00:16:03.379
command. It frees people up for more complex

00:16:03.379 --> 00:16:39.039
work. And the agent talks too much, doesn't know

00:16:39.039 --> 00:16:41.259
when to stop. That's almost always a prompt engineering

00:16:41.259 --> 00:16:43.960
issue. You need to go back to that system prompt

00:16:43.960 --> 00:17:12.339
and add stricter rules. Also, sales and lead

00:17:12.339 --> 00:17:15.509
qualification. An AI voice agent can handle initial

00:17:15.509 --> 00:17:18.710
outreach, answer basic questions, qualify leads,

00:17:19.009 --> 00:17:21.730
and then pass the really warm ones over to a

00:17:21.730 --> 00:17:25.309
human salesperson. Very efficient. And internally,

00:17:25.549 --> 00:17:28.069
think voice-controlled tools for teams, allowing

00:17:28.069 --> 00:17:31.369
people to query databases, run reports, or trigger

00:17:31.369 --> 00:17:33.509
workflows completely hands-free while they're

00:17:33.509 --> 00:17:35.750
doing other things. The possibilities really

00:17:35.750 --> 00:17:38.390
do seem vast. Yeah. Okay, let's try and bring

00:17:38.390 --> 00:17:40.430
this all together. The bottom line seems pretty

00:17:40.430 --> 00:17:42.980
clear. The future is conversational. And think

00:17:42.980 --> 00:17:45.619
about it. A few years back, the idea that a solo

00:17:45.619 --> 00:17:48.119
creator, maybe even just you listening now, could

00:17:48.119 --> 00:17:50.740
build an AI agent that holds a natural real-time

00:17:50.740 --> 00:17:52.900
voice chat, hooks into the Internet, performs

00:17:52.900 --> 00:17:55.680
tasks, that was pure sci-fi. Absolutely. Star

00:17:55.680 --> 00:17:58.109
Trek stuff. Right. And today, it's just another

00:17:58.109 --> 00:18:00.289
project you can build in n8n. You literally have

00:18:00.289 --> 00:18:02.890
the blueprint now to create AI systems that don't

00:18:02.890 --> 00:18:05.609
just crunch data, but actually engage, interact,

00:18:05.809 --> 00:18:08.529
connect on a much more human level. It democratizes

00:18:08.529 --> 00:18:10.789
some seriously advanced capabilities. Totally.

00:18:10.869 --> 00:18:13.069
So here's your mission briefing, your takeaway

00:18:13.069 --> 00:18:15.869
challenge: how will you use this blueprint? Are

00:18:15.869 --> 00:18:18.890
going to cast your character? Maybe a witty,

00:18:18.930 --> 00:18:21.650
sarcastic J.A.R.V.I.S., like Iron Man's AI?

00:18:22.670 --> 00:18:26.190
Or perhaps a calm, professional, endlessly patient

00:18:26.190 --> 00:18:29.210
Star Trek computer voice. The personality is

00:18:29.210 --> 00:18:33.809
half the fun. It really is. And think, what specific

00:18:33.809 --> 00:18:38.029
nagging, recurring problem in your life, your

00:18:38.029 --> 00:18:40.470
work, your business could be totally transformed

00:18:40.470 --> 00:18:43.150
by adding a voice. By batting one of these agents,

00:18:43.230 --> 00:18:44.910
imagine connecting it to your smart home, your

00:18:44.910 --> 00:18:47.369
office, literally saying, hey Jarvis, or whatever

00:18:47.369 --> 00:18:49.309
you call it, turn on the lights in the studio

00:18:49.309 --> 00:18:51.150
and start the coffee machine. The tech is there.

00:18:51.269 --> 00:18:53.309
It's ready. It really is. It's just waiting for

00:18:53.309 --> 00:18:55.650
your creativity. Time to get building. We really

00:18:55.650 --> 00:18:58.069
hope this deep dive has sparked some ideas, maybe

00:18:58.069 --> 00:19:00.470
given you some surprising insights. Keep exploring,

00:19:00.630 --> 00:19:02.609
keep building, and we'll see you on the next

00:19:02.609 --> 00:19:03.029
deep dive.