WEBVTT

00:00:00.000 --> 00:00:03.299
OK, so you hear prompt engineering. Right. And

00:00:03.299 --> 00:00:06.799
maybe you picture someone typing magic words

00:00:06.799 --> 00:00:08.820
into a chat bot or you wonder if it's just another,

00:00:08.960 --> 00:00:12.359
you know, overhyped tech job title. Yeah. Or

00:00:12.359 --> 00:00:14.500
maybe even a bit of a fake job, some people think.

00:00:14.560 --> 00:00:16.679
Right. But here's where it gets really interesting.

00:00:16.839 --> 00:00:19.820
Our sources suggest it's actually the kind of

00:00:19.820 --> 00:00:23.339
secret weapon powering some seriously successful

00:00:23.339 --> 00:00:26.949
AI startups. That's true. And today, we're taking

00:00:26.949 --> 00:00:29.769
a deep dive based on insights straight from Y

00:00:29.769 --> 00:00:32.409
Combinator. You know, these are the folks renowned

00:00:32.409 --> 00:00:34.990
for spotting and scaling companies that go on

00:00:34.990 --> 00:00:36.969
to become massive tech players. The ones behind

00:00:36.969 --> 00:00:40.670
so many big names. Exactly. And they share their

00:00:40.670 --> 00:00:43.149
actual methods, their operational playbook for

00:00:43.149 --> 00:00:45.850
building sophisticated AI agents, not just, like...

00:00:46.170 --> 00:00:48.229
dabbling with chatbots. Yeah, this isn't about

00:00:48.229 --> 00:00:50.810
tweaking your ChatGPT query to get a better

00:00:50.810 --> 00:00:53.229
poem or something. The material we're exploring

00:00:53.229 --> 00:00:55.409
goes way deeper. It's structured. It's tactical.

00:00:55.590 --> 00:00:58.049
It includes real world examples, even the failures

00:00:58.049 --> 00:01:00.450
and the specific techniques they use. It's pretty

00:01:00.450 --> 00:01:02.909
detailed. So our mission for you in this deep

00:01:02.909 --> 00:01:05.769
dive is to unpack these strategies. We want to

00:01:05.769 --> 00:01:07.409
give you a shortcut to understanding what the

00:01:07.409 --> 00:01:09.670
pros are doing behind the scenes. So you can

00:01:09.670 --> 00:01:11.969
start thinking about how these powerful methods

00:01:11.969 --> 00:01:14.269
might apply to what you're building or, you know,

00:01:14.269 --> 00:01:18.719
just interested in. OK, let's jump in. So why

00:01:18.719 --> 00:01:21.680
does this advanced approach to prompting even

00:01:21.680 --> 00:01:24.519
matter? I mean, beyond just getting better answers

00:01:24.519 --> 00:01:28.120
from an AI? Well, the source frames it as fundamentally

00:01:28.120 --> 00:01:30.640
changing how software is sold and delivered,

00:01:30.799 --> 00:01:32.819
especially when you're dealing with large businesses,

00:01:32.980 --> 00:01:35.659
that whole enterprise market. How so? What's

00:01:35.659 --> 00:01:38.500
fascinating here is this shift in roles. Garry

00:01:38.500 --> 00:01:41.420
Tan at Y Combinator talks about the forward deployed

00:01:41.420 --> 00:01:44.370
engineer. It's this idea that the key technical

00:01:44.370 --> 00:01:46.870
talent in AI startups aren't just coding away

00:01:46.870 --> 00:01:49.030
in the back room. Okay. They're acting like engineers

00:01:49.030 --> 00:01:51.530
directly in front of customers, solving problems

00:01:51.530 --> 00:01:54.650
live, often by configuring or even building these

00:01:54.650 --> 00:01:57.250
AI agents, like right there on the spot. Wow.

00:01:57.349 --> 00:01:59.150
Okay. Compare that to the old way, you know,

00:01:59.170 --> 00:02:01.109
the traditional enterprise sales cycle. The source

00:02:01.109 --> 00:02:03.670
even brings up the Salesforce comparison. Oh,

00:02:03.689 --> 00:02:05.829
yeah. Yeah. That's long months sometimes, right?

00:02:06.170 --> 00:02:08.990
Generic demos, endless back and forth, trying

00:02:08.990 --> 00:02:11.590
to figure out if the software even fits the customer's

00:02:11.590 --> 00:02:14.310
actual problem. It can be painful. Totally. And

00:02:14.310 --> 00:02:17.469
the new way. An AI -native company hears a specific

00:02:17.469 --> 00:02:19.990
problem the customer has, and they might build

00:02:19.990 --> 00:02:23.629
and demonstrate a tailored AI solution. Maybe

00:02:23.629 --> 00:02:26.590
overnight. Or in a day or two. It's kind of wild

00:02:26.590 --> 00:02:28.889
how fast that is. That speed is the differentiator.

00:02:29.030 --> 00:02:32.069
And the impact is concrete. The source highlights

00:02:32.069 --> 00:02:35.370
companies like Giga ML and Happy Robot. They're

00:02:35.370 --> 00:02:38.229
using this exact playbook, this ability to rapidly

00:02:38.229 --> 00:02:41.449
tailor AI agents to solve specific customer problems.

00:02:41.849 --> 00:02:43.750
And they're closing seven -figure deals doing

00:02:43.750 --> 00:02:46.729
it. Not small pilot projects, huge deals. Exactly.

00:02:46.770 --> 00:02:48.909
They're not selling a generic software license.

00:02:49.169 --> 00:02:51.449
They're selling a rapid, high -value solution

00:02:51.449 --> 00:02:54.610
to a specific pain point. That's how they're

00:02:54.610 --> 00:02:56.669
described as eating Salesforce alive in certain

00:02:56.669 --> 00:02:59.509
areas. Because of that speed and direct problem

00:02:59.509 --> 00:03:01.849
-solving with AI. Right. But that speed requires

00:03:01.849 --> 00:03:04.009
structure, doesn't it? You can't build something

00:03:04.009 --> 00:03:06.150
reliable and scalable for a seven -figure deal

00:03:06.150 --> 00:03:08.710
with just one giant, messy prompt. It'll fall

00:03:08.710 --> 00:03:11.610
apart. Totally. Which leads into the next big

00:03:11.610 --> 00:03:15.349
thing, the source covers structure. They spent

00:03:15.349 --> 00:03:17.370
a lot of time on this. They dive into this real

00:03:17.370 --> 00:03:20.169
world example called ParaHelp. Ah, yeah, ParaHelp.

00:03:20.270 --> 00:03:23.729
It's a YC company powering customer support for

00:03:23.729 --> 00:03:26.509
places like Perplexity, Replit, and Bolt. These

00:03:26.509 --> 00:03:29.270
are serious companies. So actual production systems

00:03:29.270 --> 00:03:32.330
handling thousands of tickets. And they made

00:03:32.330 --> 00:03:34.889
their core prompt public, which is amazing. And

00:03:34.889 --> 00:03:39.120
it's like. Six pages long. Six pages. That right

00:03:39.120 --> 00:03:41.500
there tells you this isn't basic stuff. It's

00:03:41.500 --> 00:03:43.860
a production level prompt designed for scale

00:03:43.860 --> 00:03:46.120
and reliability, not just a quick experiment.

00:03:46.259 --> 00:03:48.860
Exactly. Well, we're not going to read six pages

00:03:48.860 --> 00:03:51.080
here. The source pulls out the key principles

00:03:51.080 --> 00:03:53.479
from that structure. Things like super clear

00:03:53.479 --> 00:03:55.900
role definition. Yeah, the AI isn't just an assistant,

00:03:56.120 --> 00:03:58.300
right? Yeah. It's described more like a manager

00:03:58.300 --> 00:04:00.539
who's approving tasks or making decisions based

00:04:00.539 --> 00:04:02.840
on rules. And structured decision making, like

00:04:02.840 --> 00:04:05.120
actual step -by -step instructions for handling

00:04:05.120 --> 00:04:07.139
different kinds of requests. Don't just figure

00:04:07.139 --> 00:04:09.580
it out. Follow these steps. And using structured

00:04:09.580 --> 00:04:12.759
formats is key too, like XML or JSON for input

00:04:12.759 --> 00:04:15.780
and output. Why XML or JSON specifically? Well,

00:04:15.900 --> 00:04:18.240
there's standard ways to organize data, basically

00:04:18.240 --> 00:04:21.560
putting tags or labels around information. LLMs

00:04:21.560 --> 00:04:23.959
seem to handle these really well, probably because

00:04:23.959 --> 00:04:26.120
they saw so much structured data during their

00:04:26.120 --> 00:04:29.120
training. It helps avoid ambiguity. Okay, that
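
To make that concrete, here's a minimal sketch of wrapping prompt context in tags; the tag names are invented for illustration, not taken from any real production prompt:

```python
# Sketch: wrapping prompt context in XML-style tags so the model can
# tell policy, customer data, and instructions apart unambiguously.
# Tag names here are invented for the example.

def build_tagged_prompt(policy: str, ticket: str) -> str:
    """Wrap each piece of context in its own clearly labeled tag."""
    return (
        "<policy>\n" + policy.strip() + "\n</policy>\n"
        "<customer_message>\n" + ticket.strip() + "\n</customer_message>\n"
        "Answer using only facts inside the tags above."
    )

prompt = build_tagged_prompt(
    policy="Refunds are allowed within 30 days of purchase.",
    ticket="I bought this 10 days ago and want my money back.",
)
print(prompt)
```

The tags give the model an explicit boundary between instructions and data, which is exactly the ambiguity-avoidance being described.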

00:04:29.120 --> 00:04:31.220
makes sense. What's also crucial in that ParaHelp

00:04:31.220 --> 00:04:33.339
example, and what the source emphasizes

00:04:33.339 --> 00:04:35.740
a lot, are the built -in safety features and

00:04:35.740 --> 00:04:38.720
guardrails. Multiple checkpoints in the prompt

00:04:38.720 --> 00:04:41.329
itself to keep the AI on track. So it doesn't

00:04:41.329 --> 00:04:44.589
go rogue or get confused. Exactly. Or start fabricating

00:04:44.589 --> 00:04:47.689
information. That level of detail, those safety

00:04:47.689 --> 00:04:50.410
checks, they're non -negotiable for real world

00:04:50.410 --> 00:04:52.790
high stakes applications. So building on that

00:04:52.790 --> 00:04:55.589
idea of structure for scale, the source introduces

00:04:55.589 --> 00:04:57.670
this concept they call the three layer architecture.

00:04:58.389 --> 00:05:00.230
Sounds interesting. Yeah, this is really key.

00:05:00.329 --> 00:05:02.509
If you want to build one core AI system that

00:05:02.509 --> 00:05:04.930
can serve many different customers or handle

00:05:04.930 --> 00:05:07.810
various internal use cases, but still feel customized

00:05:07.810 --> 00:05:09.850
for each one. Okay, break it down. Layer one.

00:05:09.990 --> 00:05:12.509
Layer one is the system prompt. Think of this

00:05:12.509 --> 00:05:14.930
as the company -wide operating system for the

00:05:14.930 --> 00:05:18.189
AI. It defines the core identity, the universal

00:05:18.189 --> 00:05:20.610
rules, the brand voice. Like the personality.

00:05:20.930 --> 00:05:23.879
Sort of, yeah. And the fundamental rules. Something

00:05:23.879 --> 00:05:26.860
like, you are an expert, professional, empathetic

00:05:26.860 --> 00:05:29.779
customer service AI for our company name. Always

00:05:29.779 --> 00:05:32.399
follow these core guidelines. Be helpful. Be

00:05:32.399 --> 00:05:34.600
accurate. Never promise things you can't deliver.

00:05:35.420 --> 00:05:38.160
This layer is consistent for every interaction

00:05:38.160 --> 00:05:40.379
across the whole company. Okay. The baseline.

00:05:40.660 --> 00:05:43.339
Then layer two. That's the developer prompt.

00:05:43.500 --> 00:05:46.379
This layer adds customer -specific or use case

00:05:46.379 --> 00:05:49.529
-specific context and configuration. Ah, so this

00:05:49.529 --> 00:05:51.889
is the customization part. Right. If you're serving

00:05:51.889 --> 00:05:54.189
different B2B clients, this layer would include

00:05:54.189 --> 00:05:57.490
details unique to client A versus client B. It

00:05:57.490 --> 00:05:59.410
might define their industry, their common issues,

00:05:59.569 --> 00:06:02.769
specific escalation rules just for that client.

00:06:02.990 --> 00:06:06.829
Like any billing dispute over $500 for Acme Corp

00:06:06.829 --> 00:06:09.209
must be escalated to their account manager, Jane

00:06:09.209 --> 00:06:11.990
Doe. Something that specific. Exactly that specific.

00:06:12.149 --> 00:06:13.949
This is where you inject the tailored knowledge

00:06:13.949 --> 00:06:16.730
and workflows. Got it. And layer three is just...

00:06:16.939 --> 00:06:19.319
The actual user's message. Yep. Layer three is

00:06:19.319 --> 00:06:21.600
the user prompt, the real -time customer input,

00:06:21.839 --> 00:06:24.680
their actual question, their query, the data

00:06:24.680 --> 00:06:26.699
they provide, like my dashboard isn't loading

00:06:26.699 --> 00:06:29.160
or I need to change my shipping address. So the

00:06:29.160 --> 00:06:31.959
AI gets all three layers at once, the general

00:06:31.959 --> 00:06:34.360
rules, the specific context, and the immediate

00:06:34.360 --> 00:06:37.980
question. Precisely. The brilliance of this layering

00:06:37.980 --> 00:06:41.360
is that the AI model receives all three context

00:06:41.360 --> 00:06:44.500
layers combined. This allows a single underlying

00:06:44.500 --> 00:06:47.939
AI model to behave differently, accurately, and

00:06:47.939 --> 00:06:50.980
helpfully for varied scenarios or clients. Giving

00:06:50.980 --> 00:06:53.339
that feeling of personalized service, but at

00:06:53.339 --> 00:06:56.300
scale. Exactly. It's how you scale customization

00:06:56.300 --> 00:06:59.019
without building totally separate systems for

00:06:59.019 --> 00:07:01.279
everyone. That's really elegant. Okay, so beyond
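
The three layers described here can be sketched in code. This assumes a chat-style messages API; the client names and rules below are invented examples, not real configuration:

```python
# Sketch of the three-layer architecture: one shared system prompt,
# a per-client developer prompt, and the live user message.
# Client details are invented for illustration.

SYSTEM_PROMPT = (
    "You are an expert, empathetic customer service AI. "
    "Be helpful, be accurate, never promise what you can't deliver."
)

DEVELOPER_PROMPTS = {
    # Layer 2: customer-specific context, one entry per B2B client.
    "acme": "Client: Acme Corp. Escalate any billing dispute over $500 "
            "to the account manager.",
    "globex": "Client: Globex. Industry: logistics. Common issue: "
              "shipment tracking delays.",
}

def compose_prompt(client_id: str, user_message: str) -> list[dict]:
    """Combine all three layers into one chat-style message list."""
    return [
        {"role": "system", "content": SYSTEM_PROMPT},                 # layer 1
        {"role": "system", "content": DEVELOPER_PROMPTS[client_id]},  # layer 2
        {"role": "user", "content": user_message},                    # layer 3
    ]

messages = compose_prompt("acme", "My dashboard isn't loading.")
```

A single underlying model then receives all three messages together, which is what lets one system behave differently per client.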

00:07:01.279 --> 00:07:03.360
just structure, what kind of advanced techniques

00:07:03.360 --> 00:07:06.480
are these YC companies using? The source mentioned

00:07:06.480 --> 00:07:09.660
something that sounds kind of meta, meta -prompting.

00:07:09.819 --> 00:07:11.339
Oh, this is where it gets really interesting,

00:07:11.399 --> 00:07:13.740
I think. Metaprompting is basically using AI

00:07:13.740 --> 00:07:16.220
itself to write and refine your prompts. Wait,

00:07:16.279 --> 00:07:18.240
hold on. You're telling an AI, like, hey, my

00:07:18.240 --> 00:07:19.819
prompt isn't working very well. Can you make

00:07:19.819 --> 00:07:21.839
it better for me? Pretty much. Instead of just

00:07:21.839 --> 00:07:24.360
endlessly tweaking prompts manually, trying things

00:07:24.360 --> 00:07:27.660
out, getting frustrated, you use a powerful LLM,

00:07:27.680 --> 00:07:29.459
a large language model. That's what we mean when

00:07:29.459 --> 00:07:32.040
we say AI model here. Right, like GPT -4, Claude

00:07:32.040 --> 00:07:35.139
Opus. Exactly. You tell it to act as your expert

00:07:35.139 --> 00:07:37.980
prompt engineer. You feed it your current prompt.

00:07:38.139 --> 00:07:40.079
You describe the problems you're seeing. Maybe

00:07:40.079 --> 00:07:43.699
the AI agent is hallucinating or it's failing

00:07:43.699 --> 00:07:46.759
to follow the output format you specify. And

00:07:46.759 --> 00:07:49.339
you ask it to give you suggestions or even rewrite

00:07:49.339 --> 00:07:51.079
the problematic parts of the prompt for you.
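
A minimal sketch of that meta-prompting loop might look like this; the template wording is invented for illustration, not the source's exact formula:

```python
# Sketch of the meta-prompting pattern: ask a strong model to critique
# and rewrite a weaker prompt. The filled-in request would be sent to
# whatever LLM client you use; no API call is made here.

META_PROMPT_TEMPLATE = """You are an expert prompt engineer.

Here is the current prompt for our support agent:
---
{current_prompt}
---

Observed failures:
{failures}

Rewrite the prompt to fix these failures. Keep the original intent.
Return only the improved prompt."""

def build_meta_prompt(current_prompt: str, failures: list[str]) -> str:
    """Fill the template with the prompt under repair and its failure notes."""
    bullet_list = "\n".join(f"- {f}" for f in failures)
    return META_PROMPT_TEMPLATE.format(
        current_prompt=current_prompt, failures=bullet_list
    )

request = build_meta_prompt(
    "Answer billing questions politely.",
    ["hallucinates refund amounts", "ignores the required JSON format"],
)
# `request` would then go to a top-tier model, per the advice below
# about using the most capable model for this step.
```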

00:07:51.139 --> 00:07:54.019
Wow. So the AI is helping debug its own instructions.

00:07:54.220 --> 00:07:56.199
That's kind of mind bending. It is. The source

00:07:56.199 --> 00:07:58.259
talks about a sort of meta prompting formula

00:07:58.259 --> 00:08:01.519
idea. Yeah. You give the AI expert persona the

00:08:01.519 --> 00:08:04.019
current prompt and the observed failures or desired

00:08:04.019 --> 00:08:06.220
improvements. Make it act like a prompt expert

00:08:06.220 --> 00:08:08.709
analyzing the problem. And they have a pro tip

00:08:08.709 --> 00:08:10.949
for this, right, about which models to use. Yeah,

00:08:10.990 --> 00:08:13.129
use the biggest, most capable models for the

00:08:13.129 --> 00:08:16.610
metaprompting task itself, like GPT-4.1, Claude

00:08:16.610 --> 00:08:20.129
3 Opus, or Gemini 2.5 Pro, because they're better

00:08:20.129 --> 00:08:22.209
at that complex reasoning and creative problem

00:08:22.209 --> 00:08:25.110
solving needed for improving a prompt, not just

00:08:25.110 --> 00:08:27.310
executing one. And does this actually work? Is

00:08:27.310 --> 00:08:30.829
there proof? Yes. They share a great real -world

00:08:30.829 --> 00:08:33.529
impact case study. A company had a simple initial

00:08:33.529 --> 00:08:36.580
prompt for an AI billing agent. But it gave generic

00:08:36.580 --> 00:08:39.100
answers, couldn't handle complex cases. Leading

00:08:39.100 --> 00:08:41.440
to lots of tickets getting escalated to human

00:08:41.440 --> 00:08:44.120
support. Exactly. A high escalation rate. So

00:08:44.120 --> 00:08:47.000
they applied metaprompting. They fed the prompts

00:08:47.000 --> 00:08:50.360
and the problems to a powerful LLM acting as

00:08:50.360 --> 00:08:53.039
an expert. And the AI suggested... It suggested

00:08:53.039 --> 00:08:55.639
a completely rewritten, much more detailed prompt

00:08:55.639 --> 00:08:59.059
with specific roles, defined step-by-step processes

00:08:59.059 --> 00:09:01.740
for different billing issues, clear examples of

00:09:01.740 --> 00:09:04.620
good and bad interactions, and strict rules for

00:09:04.620 --> 00:09:07.399
when to escalate. Okay, much more robust. And the

00:09:07.399 --> 00:09:11.039
result? A reported 40% reduction in escalated

00:09:11.039 --> 00:09:13.399
billing tickets just from improving the prompt

00:09:13.399 --> 00:09:17.600
using the AI's own suggestions. Wow. 40%. That's

00:09:17.600 --> 00:09:20.360
a massive measurable business impact. That's

00:09:20.360 --> 00:09:22.600
not trivial. It totally underscores that prompt

00:09:22.600 --> 00:09:25.159
engineering isn't just about making the AI sound

00:09:25.159 --> 00:09:28.379
good. It's about driving tangible outcomes like

00:09:28.379 --> 00:09:31.139
reducing costs, improving efficiency, making

00:09:31.139 --> 00:09:33.320
customers happier. That's incredible. Okay. And

00:09:33.320 --> 00:09:34.860
another technique they highlight, which sounds

00:09:34.860 --> 00:09:36.940
really crucial, is what they call the escape

00:09:36.940 --> 00:09:40.360
hatch. Yes. This is basically giving the AI permission

00:09:40.360 --> 00:09:42.840
to say, I don't know. Which sounds simple, but

00:09:42.840 --> 00:09:45.320
why is it so important? Because the biggest mistake,

00:09:45.519 --> 00:09:47.980
according to the source, is designing your prompt

00:09:47.980 --> 00:09:51.200
so the AI must answer everything no matter what.

00:09:52.019 --> 00:09:54.779
LLMs are trained to be helpful and complete tasks.

00:09:55.200 --> 00:09:57.059
So if they don't know the answer? If they encounter

00:09:57.059 --> 00:09:59.000
something they don't know based on their knowledge

00:09:59.000 --> 00:10:02.220
or the context you gave them, their default programming,

00:10:02.440 --> 00:10:06.120
in a sense, is often to just make something plausible

00:10:06.120 --> 00:10:09.080
up. That's a hallucination. Which can be disastrous,

00:10:09.320 --> 00:10:11.700
right? If this AI is giving customers incorrect

00:10:11.700 --> 00:10:14.399
information or handling sensitive financial or

00:10:14.399 --> 00:10:17.320
medical tasks. Precisely. It destroys trust and

00:10:17.320 --> 00:10:20.179
can cause real harm. So you must build explicit

00:10:20.179 --> 00:10:22.639
uncertainty handling into the prompt. Give it

00:10:22.639 --> 00:10:25.580
a clear protocol. Like what? Things like, do

00:10:25.580 --> 00:10:28.600
not guess if you are unsure. If the user's request

00:10:28.600 --> 00:10:31.720
is ambiguous, ask for clarification. Never fabricate

00:10:31.720 --> 00:10:34.360
details or make assumptions. Escalate to a human

00:10:34.360 --> 00:10:36.820
supervisor if you are in doubt or if the request

00:10:36.820 --> 00:10:39.120
involves sensitive information, like [list specific

00:10:39.120 --> 00:10:41.419
types]. So clear boundaries, and the YC secret

00:10:41.419 --> 00:10:43.340
sauce version of this goes a step further, right?

00:10:43.399 --> 00:10:46.139
With that feedback log field. Yes.

00:10:46.639 --> 00:10:49.549
This is clever. They designed the expected output

00:10:49.549 --> 00:10:52.809
format, maybe it's JSON, maybe XML, to include

00:10:52.809 --> 00:10:55.789
a dedicated field, like a feedback log, where

00:10:55.789 --> 00:10:59.110
the AI itself can log any ambiguities, uncertainties,

00:10:59.110 --> 00:11:01.669
or difficulties it had while processing the request.

00:11:01.929 --> 00:11:05.710
So the AI can kind of complain or raise a flag.

00:11:05.909 --> 00:11:08.009
Exactly. Think of it as the AI leaving notes

00:11:08.009 --> 00:11:10.470
for the human prompt engineer. The user's request

00:11:10.470 --> 00:11:13.289
about billing adjustment was unclear, or I couldn't

00:11:13.289 --> 00:11:15.870
find specific info on X in the provided documents.
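
Putting the escape hatch and the feedback log together, a sketch might look like this; the rule wording and JSON field names are invented for illustration:

```python
# Sketch: an uncertainty protocol in the prompt plus a dedicated
# feedback_log field in the required JSON output, so the agent can
# leave notes instead of guessing. Field names are invented.
import json

ESCAPE_HATCH_RULES = (
    "If you are unsure, do not guess. Ask for clarification, or escalate "
    "to a human. Record anything ambiguous in the feedback_log field."
)

OUTPUT_SCHEMA_HINT = (
    'Respond with JSON: {"answer": str, "needs_escalation": bool, '
    '"feedback_log": [str, ...]}'
)

def parse_agent_reply(raw: str) -> dict:
    """Validate the agent's JSON and surface its feedback notes."""
    reply = json.loads(raw)
    for field in ("answer", "needs_escalation", "feedback_log"):
        if field not in reply:
            raise ValueError(f"missing field: {field}")
    return reply

reply = parse_agent_reply(
    '{"answer": "Your plan renews on the 1st.", '
    '"needs_escalation": false, '
    '"feedback_log": ["No doc found for proration policy."]}'
)
print(reply["feedback_log"])  # the notes a human prompt engineer reviews
```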

00:11:16.129 --> 00:11:18.190
And then the engineers review these logs. Right.

00:11:18.269 --> 00:11:20.809
By reviewing these logs regularly, you get invaluable

00:11:20.809 --> 00:11:23.370
insight into exactly where your prompt is unclear,

00:11:23.590 --> 00:11:26.230
where the AI struggled, and how to make it better

00:11:26.230 --> 00:11:28.789
in the next iteration. Every interaction becomes

00:11:28.789 --> 00:11:30.370
a potential learning opportunity to improve

00:11:30.370 --> 00:11:32.740
the system. That's brilliant. You're using the

00:11:32.740 --> 00:11:35.440
AI's own confusion to refine its instructions.

00:11:35.720 --> 00:11:38.740
The source even mentions you can drag those log

00:11:38.740 --> 00:11:40.779
files, maybe if they're formatted as JSON. Yeah,

00:11:40.820 --> 00:11:44.320
into powerful LLMs like Gemini 2.5 Pro to help

00:11:44.320 --> 00:11:46.600
analyze the logs themselves and find patterns

00:11:46.600 --> 00:11:49.480
in the failures or ambiguities. Using AI as a

00:11:49.480 --> 00:11:51.919
tool within the development and debugging process,

00:11:52.039 --> 00:11:55.320
not just as the final output. Very smart. It

00:11:55.320 --> 00:11:57.419
really is. It turns debugging into a data -driven

00:11:57.419 --> 00:12:00.240
process. Okay, shifting gears a bit to practical

00:12:00.240 --> 00:12:03.000
application. The source provides this really

00:12:03.000 --> 00:12:06.000
helpful model personality guide. What's that

00:12:06.000 --> 00:12:09.139
about? This is the recognition, based on lots

00:12:09.139 --> 00:12:12.259
of real -world usage, that not all LLMs are the

00:12:12.259 --> 00:12:15.580
same. Different models like Claude, GPT, and

00:12:15.580 --> 00:12:17.860
Gemini, even different versions within the same

00:12:17.860 --> 00:12:21.580
family like GPT-3.5 versus GPT-4, have distinct

00:12:21.580 --> 00:12:23.700
strengths and tendencies. Almost have personalities.

00:12:23.759 --> 00:12:26.019
Yeah, that's a good way to put it. And you need

00:12:26.019 --> 00:12:28.039
to consider these personalities when you're writing

00:12:28.039 --> 00:12:30.000
prompts for them. You can't just use the exact

00:12:30.000 --> 00:12:32.700
same prompt and expect the best results from

00:12:32.700 --> 00:12:35.220
every model. Okay, so what are some of the personalities

00:12:35.220 --> 00:12:38.899
they describe, like Claude? They describe Claude,

00:12:38.899 --> 00:12:42.059
especially Claude 3 Opus, as maybe the... collaborative,

00:12:42.179 --> 00:12:45.200
context -aware colleague. Generally good for

00:12:45.200 --> 00:12:47.759
customer -facing roles, handling long conversations,

00:12:48.080 --> 00:12:50.519
maintaining context, maybe more creative or nuanced

00:12:50.519 --> 00:12:52.740
writing tasks. So you might prompt it a bit more

00:12:52.740 --> 00:12:55.600
conversationally. Right. It often responds well

00:12:55.600 --> 00:12:57.500
to prompts that feel more like giving context

00:12:57.500 --> 00:13:00.179
and asking for collaboration, rather than just

00:13:00.179 --> 00:13:03.279
issuing commands. Interesting. And GPT, particularly

00:13:03.279 --> 00:13:06.720
GPT -4. GPT is often seen as the rule -following

00:13:06.720 --> 00:13:09.409
structured soldier. It tends to be really good

00:13:09.409 --> 00:13:12.210
at following complex, rigid instructions, handling

00:13:12.210 --> 00:13:15.110
step -by -step procedures accurately, and outputting

00:13:15.110 --> 00:13:17.769
specifically structured formats like JSON or

00:13:17.769 --> 00:13:21.389
XML reliably. So for GPT, you'd be more direct,

00:13:21.490 --> 00:13:24.389
more like programming. Exactly. You often prompt

00:13:24.389 --> 00:13:26.450
GPT more like you're giving explicit commands,

00:13:26.649 --> 00:13:28.529
defining functions, or laying out very clear

00:13:28.529 --> 00:13:31.250
logical steps. It excels when you are extremely

00:13:31.250 --> 00:13:33.190
clear and structured in what you want it to do

00:13:33.190 --> 00:13:35.490
and how you want the output formatted. And Gemini.

00:13:36.059 --> 00:13:38.759
especially the newer, larger versions. Gemini,

00:13:38.879 --> 00:13:41.360
particularly models like 1.5 Pro or the upcoming

00:13:41.360 --> 00:13:44.200
2.5 Pro, is often positioned as the thoughtful,

00:13:44.419 --> 00:13:47.559
analytical intern or researcher. Good for research

00:13:47.559 --> 00:13:50.340
tasks, breaking down complex problems, showing

00:13:50.340 --> 00:13:52.440
its reasoning, chain of thought, digging into

00:13:52.440 --> 00:13:54.440
large amounts of data or documents you provide.

00:13:54.679 --> 00:13:57.340
So prompts asking it to show its work or analyze

00:13:57.340 --> 00:14:00.250
information might play to its strengths. Yes,

00:14:00.250 --> 00:14:03.669
exactly. Prompts that encourage analysis, comparison,

00:14:03.929 --> 00:14:06.450
or step -by -step reasoning tend to leverage

00:14:06.450 --> 00:14:09.090
Gemini's strengths well, especially with that

00:14:09.090 --> 00:14:11.929
large context window some versions have. So the

00:14:11.929 --> 00:14:15.490
key takeaway is, know your model, tailor your

00:14:15.490 --> 00:14:19.250
prompt. Absolutely. One size does not fit all

00:14:19.250 --> 00:14:21.370
when it comes to advanced prompting. That makes
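
As a sketch, that per-model tailoring could be captured as alternative phrasings of one task; these phrasings are invented examples based on the tendencies described, not vendor guidance:

```python
# Sketch of "know your model, tailor your prompt": the same task
# phrased to suit each model family's described tendencies.
# All phrasings are invented illustrations.

TASK = "Summarize this support ticket and propose next steps."

MODEL_STYLES = {
    # Collaborative, context-aware: give context, invite reasoning.
    "claude": "Here's a support ticket we're working through together. "
              "Considering the customer's tone and history, " + TASK.lower(),
    # Rule-following, structured: explicit commands and output format.
    "gpt": TASK + " Follow these steps exactly: 1) summarize in two "
           "sentences, 2) list next steps. Output valid JSON only.",
    # Analytical: ask it to show its reasoning over the material.
    "gemini": TASK + " First analyze the ticket step by step, then state "
              "your conclusion and cite which part of the ticket supports it.",
}

def prompt_for(model_family: str) -> str:
    """Pick the phrasing tuned to a given model family."""
    return MODEL_STYLES[model_family]
```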

00:14:21.370 --> 00:14:23.610
a lot of sense. You wouldn't talk to every person

00:14:23.610 --> 00:14:26.049
the same way to get the best result. The source

00:14:26.049 --> 00:14:28.850
also gives concrete, real -world prompt examples

00:14:28.850 --> 00:14:31.259
like, actual templates for different types of

00:14:31.259 --> 00:14:33.960
agents. Yeah, this is super practical. They walk

00:14:33.960 --> 00:14:35.740
through the structure for things like an intelligent

00:14:35.740 --> 00:14:38.200
lead qualification agent, an empathetic technical

00:14:38.200 --> 00:14:41.220
support agent, and even a persuasive sales objection

00:14:41.220 --> 00:14:43.179
handler. These aren't just simple questions,

00:14:43.299 --> 00:14:45.460
right? They're detailed. Oh, very detailed. They're

00:14:45.460 --> 00:14:47.639
full instructions defining the agent's specific

00:14:47.639 --> 00:14:50.840
role. You are an AI assistant responsible for

00:14:50.840 --> 00:14:54.440
qualifying inbound leads. Its primary goal? Determine

00:14:54.440 --> 00:14:57.659
if the lead meets BANT criteria. The step-by-step

00:14:57.659 --> 00:15:00.360
process it must follow. BANT, that's budget,

00:15:00.480 --> 00:15:03.480
authority, need, and timeline, right? Standard

00:15:03.480 --> 00:15:06.919
sales qualification. Exactly. The prompt details

00:15:06.919 --> 00:15:09.580
how to assess each of those, what questions to

00:15:09.580 --> 00:15:12.120
ask or information to look for, and then specifies

00:15:12.120 --> 00:15:15.379
the required output format, maybe a JSON object

00:15:15.379 --> 00:15:18.220
with scores for each BANT element, and a final

00:15:18.220 --> 00:15:21.259
recommendation. Like route to senior sales rep

00:15:21.259 --> 00:15:24.740
or add to email nurture sequence? Precisely.
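
A sketch of what that scoring-and-routing step could look like downstream of the agent's JSON output; the thresholds and field names are invented for illustration:

```python
# Sketch: turn per-element BANT scores from the qualification agent
# into a routing recommendation. Threshold and labels are invented.

def route_lead(scores: dict[str, int], threshold: int = 3) -> str:
    """Route based on BANT scores (each 1-5): strong leads go to sales."""
    for key in ("budget", "authority", "need", "timeline"):
        if key not in scores:
            raise ValueError(f"missing BANT score: {key}")
    if all(v >= threshold for v in scores.values()):
        return "route_to_senior_sales_rep"
    return "add_to_email_nurture_sequence"

decision = route_lead({"budget": 4, "authority": 5, "need": 4, "timeline": 3})
print(decision)  # route_to_senior_sales_rep
```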

00:15:24.759 --> 00:15:26.899
Where the technical support agent prompt includes

00:15:26.899 --> 00:15:29.639
steps for diagnosing issues based on user descriptions,

00:15:29.940 --> 00:15:31.919
instructions on how to search a specific knowledge

00:15:31.919 --> 00:15:34.480
base, clear rules on when and how to escalate

00:15:34.480 --> 00:15:36.899
to a human tier two support, and even a specific

00:15:36.899 --> 00:15:38.919
template for the response back to the customer,

00:15:39.100 --> 00:15:42.000
ensuring consistency and empathy. And the sales

00:15:42.000 --> 00:15:43.919
objection handler one sounds interesting, too.

00:15:44.039 --> 00:15:46.759
It's not just canned replies. No, it embeds a

00:15:46.759 --> 00:15:49.080
whole philosophy for handling objections. It

00:15:49.080 --> 00:15:50.899
outlines strategies for common ones like it's

00:15:50.899 --> 00:15:53.500
too expensive, maybe reframing value, offering

00:15:53.500 --> 00:15:55.580
tiered options, or we need to think about it,

00:15:55.620 --> 00:15:57.899
suggesting discovery questions to uncover the

00:15:57.899 --> 00:16:00.259
real hesitation. It's quite strategic. So these

00:16:00.259 --> 00:16:03.279
examples are complex, sure. But they clearly

00:16:03.279 --> 00:16:05.740
show the level of strategic thinking and explicit

00:16:05.740 --> 00:16:09.000
instruction required to make these agents perform

00:16:09.000 --> 00:16:12.639
real business tasks effectively. It's way beyond

00:16:12.639 --> 00:16:15.870
basic Q&A. Definitely. It's about encoding business

00:16:15.870 --> 00:16:18.250
logic and process into the prompt. And once you've

00:16:18.250 --> 00:16:19.990
built one of these sophisticated agents, you

00:16:19.990 --> 00:16:21.889
need to know if it's actually working well, right?

00:16:22.009 --> 00:16:24.429
The source offers an evaluation framework for

00:16:24.429 --> 00:16:26.850
that. Absolutely crucial. It goes way beyond

00:16:26.850 --> 00:16:30.070
just asking, did it answer the question? That's

00:16:30.070 --> 00:16:31.769
not nearly enough for a business application.

00:16:32.169 --> 00:16:34.529
It's a multi -level approach. Okay, level one.

00:16:34.690 --> 00:16:37.429
Level one is basic functionality and adherence.

00:16:37.629 --> 00:16:39.789
Is the agent following the fundamental rules

00:16:39.789 --> 00:16:42.289
you set? Is it staying within constraints? Is

00:16:42.289 --> 00:16:45.100
the output... in the correct format you specified?

00:16:45.100 --> 00:16:47.360
Like if you said output JSON, are you getting

00:16:47.360 --> 00:16:50.100
valid JSON every single time, or is it sometimes

00:16:50.100 --> 00:16:53.080
just rambling text? Exactly. Yeah, the basics have to

00:16:53.080 --> 00:16:55.740
work reliably. Then level two is quality and user

00:16:55.740 --> 00:16:58.840
experience. How satisfied are the actual users

00:16:58.840 --> 00:17:00.919
interacting with it? What's the customer satisfaction

00:17:00.919 --> 00:17:04.319
score, or CSAT? Critically, what's the escalation

00:17:04.319 --> 00:17:06.799
rate? How often do humans have to step in because

00:17:06.799 --> 00:17:09.819
the AI failed? Ah, okay. So a lower escalation

00:17:09.819 --> 00:17:12.619
rate is a very good sign. Huge sign. Also, things

00:17:12.619 --> 00:17:14.660
like first contact resolution rate: did the AI

00:17:14.660 --> 00:17:16.940
solve the user's issue on the very first interaction?

00:17:17.480 --> 00:17:20.460
And qualitative things like how accurate, clear,

00:17:20.599 --> 00:17:23.059
and appropriately toned are the responses? That

00:17:23.059 --> 00:17:25.599
makes sense. Quality metrics and level three.

00:17:25.700 --> 00:17:27.900
Level three is the bottom line. Business impact

00:17:27.900 --> 00:17:31.259
and ROI. Is this AI agent actually moving the

00:17:31.259 --> 00:17:33.859
needle on key business metrics? Is it increasing

00:17:33.859 --> 00:17:36.579
revenue, improving lead conversion rates, reducing

00:17:36.579 --> 00:17:39.460
customer support costs, saving valuable human

00:17:39.460 --> 00:17:41.599
time? That's the ultimate goal, right? Connecting

00:17:41.599 --> 00:17:43.640
the AI's performance directly back to tangible

00:17:43.640 --> 00:17:46.200
business value. They even mention using something

00:17:46.200 --> 00:17:48.220
like a sample rubric. Yeah, like a scorecard

00:17:48.220 --> 00:17:50.359
to evaluate individual interactions consistently,

00:17:50.720 --> 00:17:54.180
looking at things like accuracy. tone adherence

00:17:54.180 --> 00:17:57.539
to process efficiency and the overall perceived

00:17:57.539 --> 00:17:59.980
quality of the interaction from the user's perspective

00:17:59.980 --> 00:18:02.839
so using a framework like this regularly provides

00:18:02.839 --> 00:18:05.700
critical data it tells you where your agent is

00:18:05.700 --> 00:18:08.640
failing or succeeding, and it guides your iterative

00:18:09.019 --> 00:18:11.480
prompt improvement efforts. It's how you go from

00:18:11.480 --> 00:18:14.140
just a cool tech demo to a system that actually

00:18:14.140 --> 00:18:17.019
delivers measurable, reliable value to the business.
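
As a sketch of what such a rubric could look like in practice, here is a tiny weighted scorecard. The dimensions follow the ones mentioned, but the 1-to-5 scale and the weights are assumptions, not the source's actual rubric:

```python
# A per-interaction scorecard in the spirit of the sample rubric:
# score each dimension 1-5, then weight into one quality number.
# The weights below are illustrative assumptions.

RUBRIC_WEIGHTS = {
    "accuracy": 0.30,
    "tone": 0.15,
    "process_adherence": 0.20,
    "efficiency": 0.15,
    "perceived_quality": 0.20,
}

def score_interaction(scores: dict) -> float:
    """Weighted average of 1-5 dimension scores for one interaction."""
    missing = set(RUBRIC_WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"unscored dimensions: {missing}")
    return sum(RUBRIC_WEIGHTS[d] * scores[d] for d in RUBRIC_WEIGHTS)

example = {
    "accuracy": 5, "tone": 4, "process_adherence": 3,
    "efficiency": 4, "perceived_quality": 4,
}
print(round(score_interaction(example), 2))  # 4.1
```

Scoring a consistent sample of interactions with the same rubric every week is what makes the numbers comparable across prompt versions.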

00:18:17.180 --> 00:18:19.799
It closes the loop. OK, so if you're listening

00:18:19.799 --> 00:18:21.839
and feeling inspired to try building one of these,

00:18:22.000 --> 00:18:23.740
the source doesn't just give you the concepts.

00:18:23.859 --> 00:18:26.740
They actually provide a real world implementation

00:18:26.740 --> 00:18:30.240
roadmap, like a week by week plan. Yeah, it's

00:18:30.240 --> 00:18:32.559
a very practical, systematic process they lay

00:18:32.559 --> 00:18:35.009
out. Week one. Foundation and initial prompting.

00:18:35.049 --> 00:18:37.349
Start small. Don't try to boil the ocean. Pick

00:18:37.349 --> 00:18:40.170
one really focused use case, maybe an internal

00:18:40.170 --> 00:18:42.589
one first to lower the stakes. Select the best

00:18:42.589 --> 00:18:45.329
model for that specific job based on those personalities

00:18:45.329 --> 00:18:49.529
we talked about. Draft a simple V1 prompt. Don't

00:18:49.529 --> 00:18:52.609
obsess over perfection yet. And set up your basic

00:18:52.609 --> 00:18:55.130
criteria for what success looks like, your level

00:18:55.130 --> 00:18:57.349
one metrics maybe. Right. Don't aim for perfect

00:18:57.349 --> 00:19:00.410
on day one. Get something working. Exactly. Week

00:19:00.410 --> 00:19:03.549
two. Stress testing and iterative refinement.

00:19:03.630 --> 00:19:06.089
This is crucial. Run lots of diverse test scenarios,

00:19:06.329 --> 00:19:08.789
the easy cases, the tricky edge cases. Try to

00:19:08.789 --> 00:19:12.450
break it. And document meticulously every single

00:19:12.450 --> 00:19:15.589
time the agent messes up. Every failure. Why

00:19:15.589 --> 00:19:17.670
is documenting failure so important? Because

00:19:17.670 --> 00:19:20.529
that documentation is gold. Those failures are

00:19:20.529 --> 00:19:22.869
your data. You feed those documented failures

00:19:22.869 --> 00:19:24.930
into your metaprompting process with a powerful

00:19:24.930 --> 00:19:27.890
model to analyze why it failed and get suggestions

00:19:27.890 --> 00:19:30.369
for improving the prompt. Maybe you A, B test

00:19:30.369 --> 00:19:32.309
a couple of different prompt versions based on

00:19:32.309 --> 00:19:34.109
that feedback. So the failures aren't problems.

00:19:34.190 --> 00:19:35.970
They're just data points for the next iteration.
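
Feeding those documented failures back into a stronger model could look something like the sketch below. The log format and the metaprompt wording are assumptions, and the actual model call is left out:

```python
# Turn documented failures into a metaprompt that asks a stronger
# model to diagnose the current prompt and propose a revision.
# The log format and wording here are hypothetical.

failure_log = [
    {"input": "Cancel my order AND change my address",
     "output": "Order cancelled.",
     "problem": "Ignored the second request entirely."},
    {"input": "what is ur refund policy??",
     "output": "I cannot help with that.",
     "problem": "Refused a question the prompt should cover."},
]

def build_metaprompt(current_prompt: str, failures: list) -> str:
    cases = "\n".join(
        f"- Input: {f['input']}\n  Output: {f['output']}\n  Problem: {f['problem']}"
        for f in failures
    )
    return (
        "You are reviewing a production agent prompt.\n"
        f"Current prompt:\n{current_prompt}\n\n"
        f"Documented failures:\n{cases}\n\n"
        "Explain why each failure happened, then propose a revised prompt."
    )

metaprompt = build_metaprompt("You are a support agent for Acme...", failure_log)
# Send `metaprompt` to your strongest available model, then A/B test
# its suggested revision against the current prompt.
```

The point is mechanical: every documented failure becomes a concrete case the reviewing model can reason about, instead of a vague "make it better" request.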

00:19:36.529 --> 00:19:39.589
Precisely. Embrace the failures. Week three,

00:19:39.750 --> 00:19:42.329
production preparedness and safety nets. Now

00:19:42.329 --> 00:19:44.910
you build in those crucial escape hatches and

00:19:44.910 --> 00:19:47.009
error handling mechanisms. We talked about the

00:19:47.109 --> 00:19:51.069
"I don't know" capability, the logging. Implement

00:19:51.069 --> 00:19:53.289
monitoring and alerts so you know if things go

00:19:53.289 --> 00:19:55.890
wrong in production. And train the humans. Yes.

00:19:56.009 --> 00:19:58.710
Train any human team members who will interact

00:19:58.710 --> 00:20:01.910
with, oversee, or support the AI agent. Make

00:20:01.910 --> 00:20:03.829
sure they know how it works and what to do if

00:20:03.829 --> 00:20:06.309
it escalates. And then maybe consider a soft

00:20:06.309 --> 00:20:09.309
launch. Release it to a small, controlled group

00:20:09.309 --> 00:20:12.339
of users first. A phased rollout. Makes a lot

00:20:12.339 --> 00:20:14.700
of sense. Minimizes risk. Totally. And finally,

00:20:14.779 --> 00:20:17.119
week four, full deployment, scaling, and optimization.

00:20:17.500 --> 00:20:19.900
Go live for that specific use case you targeted.

00:20:20.039 --> 00:20:22.059
Keep monitoring performance constantly using

00:20:22.059 --> 00:20:24.900
that evaluation framework. And this is key. Schedule

00:20:24.900 --> 00:20:27.339
regular prompt reviews and updates. Ah, so it's

00:20:27.339 --> 00:20:29.880
not set it and forget it. Definitely not. Models

00:20:29.880 --> 00:20:32.180
get updated. Business needs evolve. You learn

00:20:32.180 --> 00:20:34.680
more from real-world usage patterns. You have

00:20:34.680 --> 00:20:36.599
to treat the prompt like a living piece of critical

00:20:36.599 --> 00:20:38.940
software that needs ongoing maintenance and optimization.

00:20:39.609 --> 00:20:42.170
Treat the prompt like code, essentially. Needs

00:20:42.170 --> 00:20:44.230
version control, needs updates. Pretty much,

00:20:44.390 --> 00:20:47.170
yeah. And then once that first agent is stable

00:20:47.170 --> 00:20:49.349
and delivering value, you take everything you

00:20:49.349 --> 00:20:51.150
learn from that process and you start planning

00:20:51.150 --> 00:20:54.509
your next focused agent. Build iteratively. Okay,

00:20:54.569 --> 00:20:57.029
that's a solid plan. And before we wrap up, the

00:20:57.029 --> 00:20:59.789
source also explicitly lists the common mistakes

00:20:59.789 --> 00:21:02.849
that kill these AI agent projects. What are some

00:21:02.849 --> 00:21:05.230
of the big pitfalls to avoid? Yeah, these are

00:21:05.230 --> 00:21:08.849
based on seeing what goes wrong. The "everything agent"

00:21:08.849 --> 00:21:11.549
trap is a classic. Trying to build one single

00:21:11.549 --> 00:21:15.210
AI agent that does 12 different complex, unrelated

00:21:15.210 --> 00:21:18.109
tasks. Like trying to build one agent for sales

00:21:18.109 --> 00:21:20.430
prospecting, technical support, and internal

00:21:20.430 --> 00:21:23.470
HR queries all at once. Exactly. Just gets confused

00:21:23.470 --> 00:21:26.069
and performs poorly at everything. The fix. Build

00:21:26.069 --> 00:21:28.690
focused, specialized agents. A specialist AI

00:21:28.690 --> 00:21:31.170
is almost always better than a generalist AI

00:21:31.170 --> 00:21:33.930
for challenging, specific work. Makes sense. What

00:21:33.930 --> 00:21:36.410
else? The perfect prompt obsession. Spending

00:21:36.410 --> 00:21:38.589
months and months endlessly tweaking a prompt

00:21:38.589 --> 00:21:40.670
in a sandbox environment before you ever put

00:21:40.670 --> 00:21:42.730
it in front of real users or real scenarios.

00:21:42.910 --> 00:21:45.930
Analysis paralysis. Totally. The fix is to

00:21:45.930 --> 00:21:49.789
ship early, even if V1 is imperfect. Get it out

00:21:49.789 --> 00:21:52.349
there. Let it fail in controlled ways. Gather

00:21:52.349 --> 00:21:55.009
that real-world failure data and then iterate

00:21:55.009 --> 00:21:57.789
based on actual performance, not just theoretical

00:21:57.789 --> 00:22:00.670
perfection. Okay, get data faster. What about

00:22:00.670 --> 00:22:03.309
the AI will magically figure it out fallacy?

00:22:03.349 --> 00:22:06.039
Well, this is the big one. Assuming the AI understands

00:22:06.039 --> 00:22:09.000
implied context, can handle ambiguous requests

00:22:09.000 --> 00:22:11.359
gracefully, or knows your company's specific

00:22:11.359 --> 00:22:14.220
internal policies or edge case procedures without

00:22:14.220 --> 00:22:16.680
being explicitly told. Assuming it just knows

00:22:16.680 --> 00:22:20.319
things. Right. It doesn't. The fix. Be incredibly

00:22:20.319 --> 00:22:22.799
explicit about everything in your prompt. Detail

00:22:22.799 --> 00:22:24.799
the roles, the step-by-step processes, the

00:22:24.799 --> 00:22:27.319
constraints, what data sources to use or ignore,

00:22:27.500 --> 00:22:29.960
and especially what to do when it's unsure. Spell

00:22:29.960 --> 00:22:31.980
it out. Leave nothing to chance. And the last

00:22:31.980 --> 00:22:33.700
one you mentioned earlier, the set-and-forget

00:22:33.700 --> 00:22:35.779
syndrome. Yeah, deploying the agent and then

00:22:35.779 --> 00:22:37.619
basically never looking at the prompt again,

00:22:37.720 --> 00:22:40.200
assuming the job is done. Prompts degrade over

00:22:40.200 --> 00:22:43.019
time as models change or business needs shift.

00:22:43.710 --> 00:22:46.250
The fix. Schedule regular performance audits,

00:22:46.289 --> 00:22:48.730
review those feedback logs, and plan for regular

00:22:48.730 --> 00:22:50.970
prompt updates as part of your operational rhythm.

00:22:51.190 --> 00:22:53.910
Treat the prompt like the critical business logic

00:22:53.910 --> 00:22:56.609
it actually is. Exactly. It's not just configuration.

00:22:57.009 --> 00:23:00.119
It is the logic for these systems. Okay, so wrapping

00:23:00.119 --> 00:23:02.740
this all up, this deep dive into the YC perspective

00:23:02.740 --> 00:23:06.079
really shows that elite AI prompt engineering

00:23:06.079 --> 00:23:09.779
isn't about some kind of, you know, coding magic

00:23:09.779 --> 00:23:12.519
or finding secret keywords nobody else knows.

00:23:12.660 --> 00:23:15.420
No, it's much more systematic. It's a planned

00:23:15.420 --> 00:23:18.460
strategic operational discipline. It involves

00:23:18.460 --> 00:23:20.720
structured architecture like that three-layer

00:23:20.720 --> 00:23:23.990
model. Rigorous testing driven by analyzing failures,

00:23:24.210 --> 00:23:26.549
continuous measurement against real business

00:23:26.549 --> 00:23:29.589
outcomes, and that ongoing process of refinement

00:23:29.589 --> 00:23:32.150
using techniques like metaprompting and feedback

00:23:32.150 --> 00:23:34.900
logs. And as the source really hammers home,

00:23:35.119 --> 00:23:37.480
mastering this skill, building this capability

00:23:37.480 --> 00:23:40.099
within your team or yourself brings a clear,

00:23:40.180 --> 00:23:42.140
significant competitive advantage right now.

00:23:42.240 --> 00:23:44.299
Definitely. The companies getting good at this

00:23:44.299 --> 00:23:46.380
now are the ones we see rapidly solving specific

00:23:46.380 --> 00:23:48.900
high-value problems and frankly winning those

00:23:48.900 --> 00:23:50.599
big deals because they can deliver solutions

00:23:50.599 --> 00:23:52.660
so much faster and more tailored than traditional

00:23:52.660 --> 00:23:56.079
methods. And the window to gain this kind of

00:23:56.079 --> 00:23:59.099
expertise, well... it might not stay open forever

00:23:59.099 --> 00:24:01.900
as these techniques become more widespread, more

00:24:01.900 --> 00:24:04.299
standard practice. Which makes the final thought

00:24:04.299 --> 00:24:07.700
from the source particularly thought-provoking.

00:24:04.299 --> 00:24:07.700
For you listening, the fundamental question isn't

00:24:10.140 --> 00:24:12.500
really whether AI agents will transform your

00:24:12.500 --> 00:24:15.230
industry. That seems... almost a certainty at

00:24:15.230 --> 00:24:17.589
this point for many sectors. Right. The change

00:24:17.589 --> 00:24:19.910
is coming. The more critical question is, will

00:24:19.910 --> 00:24:21.990
you be one of the architects building them and

00:24:21.990 --> 00:24:24.609
shaping that future? Or will you find yourself

00:24:24.609 --> 00:24:27.650
competing against the companies, maybe even the

00:24:27.650 --> 00:24:30.410
AI agents themselves that are mastering this?

00:24:30.609 --> 00:24:33.170
That's a powerful question. So their suggested

00:24:33.170 --> 00:24:35.250
first step, maybe something you can do starting

00:24:35.250 --> 00:24:38.500
today. Pick one small, manageable process, something

00:24:38.500 --> 00:24:41.180
internal maybe, where an AI agent could potentially

00:24:41.180 --> 00:24:43.359
help streamline things or answer common questions.

00:24:43.880 --> 00:24:46.420
Draft a V1 prompt based on these principles we

00:24:46.420 --> 00:24:49.059
discussed. Define the role clearly, outline the

00:24:49.059 --> 00:24:51.880
task, list the steps, specify the rules, include

00:24:51.880 --> 00:24:54.079
that crucial escape hatch for uncertainty. Keep

00:24:54.079 --> 00:24:56.640
it simple to start. Yes. Then test it with some

00:24:56.640 --> 00:25:00.259
real scenarios. Document carefully where it fails

00:25:00.259 --> 00:25:02.900
or gets confused. Don't get discouraged by the

00:25:02.900 --> 00:25:05.400
failures. Use them. Start iterating based on

00:25:05.400 --> 00:25:07.500
that feedback. Just start. Start building and

00:25:07.500 --> 00:25:09.380
learning today. That seems to be the core message.

00:25:09.559 --> 00:25:11.640
Exactly. That's how you become one of the practitioners

00:25:11.640 --> 00:25:14.059
who are not just reacting to this shift, but

00:25:14.059 --> 00:25:15.380
actually shaping what comes next.
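
The suggested first step (role, task, steps, rules, escape hatch) could start from a skeleton like this. Everything in it is placeholder content, not a prompt from the source:

```python
# A bare-bones V1 prompt template with the structure discussed:
# role, task, step-by-step process, rules, and an explicit escape
# hatch for uncertainty. Every specific below is a placeholder.

V1_PROMPT = """\
ROLE: You are an internal IT helpdesk assistant for <company>.

TASK: Answer employee questions about password resets and VPN access.

PROCESS:
1. Identify which of the two topics the question is about.
2. Answer using only the approved knowledge base provided below.
3. If the question is about anything else, do not attempt an answer.

RULES:
- Never ask for, or repeat, a user's password.
- Keep answers under 150 words.

ESCAPE HATCH:
If you are unsure, or the knowledge base does not cover the question,
reply exactly: "I'm not sure about this one - let me route you to a human."
"""

# Sanity check that all five sections are present before shipping V1.
for section in ("ROLE:", "TASK:", "PROCESS:", "RULES:", "ESCAPE HATCH:"):
    assert section in V1_PROMPT
```

From here the week-one advice applies as-is: run it against real scenarios, log every failure, and iterate rather than polishing the draft in isolation.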
