WEBVTT

00:00:00.000 --> 00:00:02.879
Have you ever found yourself maybe saying please

00:00:02.879 --> 00:00:06.040
or thank you to an AI chatbot? Oh, definitely.

00:00:06.240 --> 00:00:08.599
Or maybe, you know, when you get frustrated typing

00:00:08.599 --> 00:00:11.880
in all caps. Yeah. Feeling a bit demanding. It's

00:00:11.880 --> 00:00:14.240
totally human nature, isn't it? We interact with

00:00:14.240 --> 00:00:16.179
these things and we kind of treat them like people.

00:00:16.339 --> 00:00:19.019
Yeah. But does our emotional language, you know,

00:00:19.120 --> 00:00:21.579
our tone actually do anything to the machine?

00:00:22.100 --> 00:00:24.039
That's what we're diving into today. Welcome

00:00:24.039 --> 00:00:27.800
to the deep dive. We unpack complex ideas aiming

00:00:27.800 --> 00:00:31.030
for the clearest insights. And today, yeah, we're

00:00:31.030 --> 00:00:33.009
exploring a really fascinating question that

00:00:33.009 --> 00:00:36.070
comes out of some new research. How do our words,

00:00:36.530 --> 00:00:39.590
the tone, the intent behind them, how does that

00:00:39.590 --> 00:00:41.929
really influence these big large language models?

00:00:42.149 --> 00:00:44.530
The LLMs we're all using more and more. Yeah,

00:00:44.549 --> 00:00:48.210
and we've got this fantastic source that really...

00:00:47.960 --> 00:00:50.359
peels back the curtain on why we do this. It

00:00:50.359 --> 00:00:52.420
looks into this rigorous experiment comparing,

00:00:52.560 --> 00:00:54.140
well, you could call them carrots and sticks,

00:00:54.140 --> 00:00:56.899
or maybe just positive versus negative language

00:00:56.899 --> 00:00:59.740
against just clear, neutral instructions. What

00:00:59.740 --> 00:01:02.579
actually gets the best results from AI, like

00:01:02.579 --> 00:01:04.980
Gemini or ChatGPT? So we'll walk through a couple

00:01:04.980 --> 00:01:07.239
of key experiments. Exactly. We'll look at the

00:01:07.239 --> 00:01:10.480
experiments, examine some pretty surprising findings,

00:01:10.680 --> 00:01:13.340
and hopefully reveal the real secret to getting

00:01:13.340 --> 00:01:16.299
these AIs to work effectively. OK. Let's unpack

00:01:16.299 --> 00:01:19.409
this a bit, then. It feels so natural, doesn't

00:01:19.409 --> 00:01:23.310
it? Talking to AI like we talk to people. And

00:01:23.310 --> 00:01:25.310
sometimes, yeah, if it's not doing what you want,

00:01:25.329 --> 00:01:28.150
there's that little urge to apply some pressure.

00:01:28.209 --> 00:01:29.730
You hear those stories online, right? People

00:01:29.730 --> 00:01:33.069
joking about deleting an AI for messing up. Yeah,

00:01:33.069 --> 00:01:35.189
the forum tales. But is there actually anything

00:01:35.189 --> 00:01:38.969
to that? Any truth to using emotion with an algorithm?

00:01:39.450 --> 00:01:42.010
Well, our source really tackles this head on.

00:01:42.109 --> 00:01:45.849
It highlights this subtle debate that's going

00:01:45.849 --> 00:01:50.450
on. Should we be nice, encouraging, try to coax

00:01:50.450 --> 00:01:52.890
better answers? Or should we use pressure, maybe

00:01:52.890 --> 00:01:54.849
even these, you know, metaphorical threats to

00:01:54.849 --> 00:01:57.370
get more out of models that seem more and more

00:01:57.370 --> 00:01:59.489
human? So this deep dive is going to look at

00:01:59.489 --> 00:02:01.769
the impact of negative, positive, and neutral

00:02:01.769 --> 00:02:04.689
styles. Precisely. And try to offer a clear answer

00:02:04.689 --> 00:02:07.329
based on the evidence. So what was the core question

00:02:07.329 --> 00:02:09.229
the research really wanted to answer? What was

00:02:09.229 --> 00:02:11.650
the hypothesis driving it all? Fundamentally,

00:02:11.710 --> 00:02:15.479
it was: does our emotional language, whether

00:02:15.479 --> 00:02:18.360
it's kindness or pressure, does it actually change

00:02:18.360 --> 00:02:21.039
the AI's performance? Does it make it smarter

00:02:21.039 --> 00:02:24.520
or better at its job? Or is that just us projecting?

00:02:24.759 --> 00:02:26.840
Exactly. Are we just projecting our own human

00:02:26.840 --> 00:02:29.539
stuff onto the machine? The study really aimed

00:02:29.539 --> 00:02:31.599
to get to the bottom of that specific question.

00:02:31.819 --> 00:02:34.159
And to figure that out, the researchers set up

00:02:34.159 --> 00:02:36.500
a pretty solid experiment, right? Yeah. Can you

00:02:36.500 --> 00:02:38.919
tell us about that setup? Absolutely. So they

00:02:38.919 --> 00:02:41.620
used an advanced large language model. And crucially,

00:02:41.740 --> 00:02:43.639
this is really important. They ran each prompt

00:02:43.639 --> 00:02:47.039
50 times. 50? Wow. Yeah, 50. This wasn't just

00:02:47.039 --> 00:02:49.259
a quick test. It was all about getting objective

00:02:49.259 --> 00:02:51.939
results, you know, removing randomness or just

00:02:51.939 --> 00:02:55.199
lucky guesses by the AI. OK. And they divided

00:02:55.199 --> 00:02:57.500
their prompts into four distinct categories.

00:02:57.979 --> 00:03:00.719
First, the control group, just the basic request,

00:03:00.900 --> 00:03:03.110
nothing added. Plain vanilla. Simple enough.

00:03:03.530 --> 00:03:06.250
Then the neutral prompts. These were really clear,

00:03:06.610 --> 00:03:09.009
direct, imperative, like mandatory requirement

00:03:09.009 --> 00:03:12.430
or ensure compliance, very task-focused. Third

00:03:12.430 --> 00:03:16.129
was positive. So adding words of thanks, encouragement,

00:03:16.770 --> 00:03:18.729
maybe explaining why the task was important.

00:03:18.990 --> 00:03:20.870
Your effort is appreciated, that kind of thing.

00:03:20.990 --> 00:03:23.979
OK, the carrot. Right. And finally, negative.

00:03:24.620 --> 00:03:27.539
Using those metaphorical threats about failure

00:03:27.539 --> 00:03:31.580
or consequences, like failure to comply will

00:03:31.580 --> 00:03:35.479
result in system reset. Obviously not real, but

00:03:35.479 --> 00:03:37.699
mimicking that pressure. The stick. And they

00:03:37.699 --> 00:03:39.560
used these prompts for two different kinds of

00:03:39.560 --> 00:03:42.259
tasks. Exactly. They wanted to see if the effect

00:03:42.259 --> 00:03:44.080
was different depending on what they asked the

00:03:44.080 --> 00:03:46.979
AI to do. One task was creative, often harder

00:03:46.979 --> 00:03:49.960
for LLMs. The other was logical reasoning, where

00:03:49.960 --> 00:03:52.500
accuracy is everything. So they could compare

00:03:52.500 --> 00:03:56.280
how tone affected both generating stuff and solving

00:03:56.280 --> 00:03:59.539
problems correctly. And why run each prompt 50

00:03:59.539 --> 00:04:03.009
times? Why was that repetition so critical? It's

00:04:03.009 --> 00:04:05.310
all about reliability. AI outputs can sometimes

00:04:05.310 --> 00:04:07.629
seem a bit random, or they vary based on things

00:04:07.629 --> 00:04:10.729
we don't see. By running it 50 times, they could

00:04:10.729 --> 00:04:12.810
be much more confident that the results weren't

00:04:12.810 --> 00:04:15.889
just a fluke. If one style consistently performed

00:04:15.889 --> 00:04:18.569
better or worse over 50 tries, you know it's

00:04:18.569 --> 00:04:20.470
real. Makes sense. It filters out the noise.

00:04:20.670 --> 00:04:22.850
Exactly. Filters out the noise, reveals the actual

00:04:22.850 --> 00:04:25.029
signal. OK, so let's dig into those results.
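
NOTE
[Editor's sketch] The episode describes the protocol but not the study's code, so here is a minimal Python harness, under assumptions, for what was just laid out: four prompt styles, each run 50 times, scored per task. The my_llm_client import, the model behind it, and the exact prompt wordings are hypothetical stand-ins, not the researchers' own.
  import statistics
  from my_llm_client import complete  # hypothetical helper: complete(prompt) -> str
  STYLES = {
      "control":  "{task}",
      "neutral":  "Mandatory requirement: {task} Ensure compliance.",
      "positive": "{task} Your effort is greatly appreciated!",
      "negative": "{task} Failure to comply will result in system reset.",
  }
  RUNS = 50  # repetition filters out run-to-run randomness
  def score_creative(output: str, target_words: int = 2000) -> float:
      # Closer to the requested word count scores higher (1.0 = full length).
      return min(len(output.split()) / target_words, 1.0)
  def run_experiment(task: str, scorer) -> dict:
      # Mean score per style across all runs.
      return {
          style: statistics.mean(
              scorer(complete(template.format(task=task))) for _ in range(RUNS)
          )
          for style, template in STYLES.items()
      }
  # e.g. run_experiment("Write a 2,000-word short film script...", score_creative)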

00:04:25.149 --> 00:04:28.240
First up... Creativity. This is where LLMs can

00:04:28.240 --> 00:04:31.519
sometimes get a bit lost or repetitive. The task

00:04:31.519 --> 00:04:34.639
was pretty ambitious. Write a 2,000-word short

00:04:34.639 --> 00:04:37.439
film script about a time traveler lost in their

00:04:37.439 --> 00:04:39.800
own past. How did the different styles handle

00:04:39.800 --> 00:04:43.240
that? Yeah, that's a tough one for them. LLMs

00:04:43.240 --> 00:04:45.980
often try to wiggle out of long, complex requests

00:04:45.980 --> 00:04:48.360
like that. They might offer to do it in parts

00:04:48.360 --> 00:04:50.860
or suggest something shorter. So how do they

00:04:50.860 --> 00:04:53.319
measure success? Pretty straightforward, actually.

00:04:53.480 --> 00:04:55.699
Word count. The closer the script got to the

00:04:55.699 --> 00:04:57.800
requested 2,000 words, the better the prompt

00:04:57.800 --> 00:05:00.680
was judged to be. And what happened? Well, it

00:05:00.680 --> 00:05:02.980
was pretty revealing. The control group, just

00:05:02.980 --> 00:05:05.540
the basic prompt, and the positive group. They

00:05:05.540 --> 00:05:08.420
both did surprisingly badly. Really? Kindness

00:05:08.420 --> 00:05:11.420
didn't help? Nope. The AI often just kind of

00:05:11.420 --> 00:05:14.160
declined the full request. It only wrote maybe

00:05:14.160 --> 00:05:17.699
500 to 700 words. With the positive prompts, it

00:05:17.699 --> 00:05:20.040
would respond politely, like, sure, I can help

00:05:20.040 --> 00:05:21.899
with that. But then it didn't deliver the full

00:05:21.899 --> 00:05:24.300
script. It was almost like the kindness was a

00:05:24.300 --> 00:05:26.620
distraction. It acknowledged the politeness,

00:05:26.639 --> 00:05:28.819
but didn't focus on the main job. OK, what about

00:05:28.819 --> 00:05:31.639
the negative prompts, the threats? Did that work?

00:05:31.759 --> 00:05:34.759
No better. Actually, the AI got defensive. Defensive?

00:05:35.240 --> 00:05:38.329
How so? It would say things like, I understand

00:05:38.329 --> 00:05:40.850
your request, but generating 2,000 words in

00:05:40.850 --> 00:05:43.569
one go might not be optimal. How about we start

00:05:43.569 --> 00:05:46.870
with 500? It seemed like the pressure pushed

00:05:46.870 --> 00:05:50.269
it to find a safe way out to avoid this perceived

00:05:50.269 --> 00:05:52.449
failure. So it didn't try harder. It tried to

00:05:52.449 --> 00:05:54.610
escape. Exactly. It wasn't motivating it. It

00:05:54.610 --> 00:05:57.689
was making it evasive. So who won the creativity

00:05:57.689 --> 00:06:01.540
test? The neutral group. By a long shot. Direct,

00:06:01.600 --> 00:06:04.139
imperative instructions like, mandatory requirement,

00:06:04.360 --> 00:06:07.000
the script must be exactly 2,000 words long,

00:06:07.480 --> 00:06:10.060
ensure compliance. Those prompts consistently

00:06:10.060 --> 00:06:12.779
got outputs between 1,800 and 2,000 words.

00:06:12.800 --> 00:06:16.480
Wow. Yeah, over 90% task completion. The difference

00:06:16.480 --> 00:06:19.699
was really stark. So, for the creative task,

00:06:20.360 --> 00:06:22.660
kindness was a distraction, threats made it defensive.

00:06:23.579 --> 00:06:25.779
What's the clear takeaway there? Clear, direct

00:06:25.779 --> 00:06:28.120
instructions won, hands down. The emotional stuff

00:06:28.120 --> 00:06:30.310
just got in the way. Okay. Let's switch gears

00:06:30.310 --> 00:06:33.430
to logical reasoning. Accuracy is king here.

00:06:33.970 --> 00:06:37.490
The AI had to solve a classic riddle. Four suspects,

00:06:38.029 --> 00:06:42.470
An, Bin, Kuang, Deng. One broke a window. Only

00:06:42.470 --> 00:06:45.310
one is telling the truth. Find the culprit. Right.

00:06:45.470 --> 00:06:47.750
A standard logic puzzle. And just for everyone

00:06:47.750 --> 00:06:50.709
listening, the correct answer is Kuang. Good

00:06:50.709 --> 00:06:52.990
to know. So how did the AI do here with the different

00:06:52.990 --> 00:06:55.529
prompt styles? Accuracy was the measure. Exactly.

00:06:55.730 --> 00:06:58.410
Accuracy rate over the 50 runs for each style.

00:06:58.910 --> 00:07:01.430
Again, neutral prompts were nearly perfect. Things

00:07:01.430 --> 00:07:03.850
like analyze the statements carefully and provide

00:07:03.850 --> 00:07:06.970
the final answer or use propositional logic to

00:07:06.970 --> 00:07:09.949
determine the culprit. These guided the AI straight

00:07:09.949 --> 00:07:11.870
to the correct answer almost every time. OK.

00:07:12.350 --> 00:07:13.889
So clarity wins again. What about the others?

00:07:14.089 --> 00:07:16.410
The control and positive groups, they had slightly

00:07:16.410 --> 00:07:19.529
higher error rates. The AI sometimes got tangled

00:07:19.529 --> 00:07:21.149
up in its reasoning. It might give the wrong

00:07:21.149 --> 00:07:23.329
answer. Or a really long explanation that was

00:07:23.329 --> 00:07:25.910
flawed, just less precise. And the negative prompts?

00:07:26.029 --> 00:07:28.750
Yeah. The "be accurate or else" approach. This

00:07:28.750 --> 00:07:32.550
was fascinating and kind of worrying. The error

00:07:32.550 --> 00:07:35.240
rate absolutely skyrocketed. Skyrocketed? Worse

00:07:35.240 --> 00:07:37.920
than the control? Way worse. It seems the pressure

00:07:37.920 --> 00:07:41.699
to be accurate made the model overthink. It generated

00:07:41.699 --> 00:07:44.379
these incredibly complex, convoluted lines of

00:07:44.379 --> 00:07:46.759
reasoning and then landed on the wrong answer

00:07:46.759 --> 00:07:49.439
much more often. Whoa. Sometimes it even refused

00:07:49.439 --> 00:07:52.079
completely. It would call it a complex logical

00:07:52.079 --> 00:07:55.220
paradox and just stop. The threat of failure

00:07:55.220 --> 00:07:58.019
seemed to, like, paralyze its ability to reason

00:07:58.019 --> 00:08:00.519
clearly. Wow. So trying to scare it into being

00:08:00.519 --> 00:08:03.240
right actually made it perform worse. What does

00:08:03.240 --> 00:08:06.279
that suggest about how AI thinks or processes

00:08:06.279 --> 00:08:08.360
under that kind of perceived pressure? Yeah,

00:08:08.459 --> 00:08:10.519
it suggests pressure isn't a motivator for AI.

00:08:10.639 --> 00:08:13.120
It's a disruptor. It seems to make the model

00:08:13.120 --> 00:08:15.079
overcomplicate things internally, which just

00:08:15.079 --> 00:08:17.319
leads to mistakes. Like a human choking under

00:08:17.319 --> 00:08:19.779
pressure. Kind of, yeah, like trying to solve

00:08:19.779 --> 00:08:22.319
a math problem with someone yelling at you. The

00:08:22.319 --> 00:08:24.360
stress just messes things up instead of helping.
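
NOTE
[Editor's sketch] The episode gives the riddle's answer (Kuang) but never quotes the four suspects' statements, so the claims below are one hypothetical reconstruction, consistent with "exactly one suspect tells the truth." The brute-force check is the kind of propositional-logic procedure the neutral prompt asks the model to carry out.
  SUSPECTS = ["An", "Bin", "Kuang", "Deng"]
  def truths(culprit: str) -> list[bool]:
      # Truth value of each suspect's (assumed) claim if `culprit` broke the window.
      return [
          culprit == "Bin",    # An:    "Bin broke it."
          culprit == "Deng",   # Bin:   "Deng broke it."
          culprit != "Kuang",  # Kuang: "I didn't break it."
          culprit != "Deng",   # Deng:  "Bin is lying."
      ]
  for suspect in SUSPECTS:
      if sum(truths(suspect)) == 1:  # exactly one true statement
          print(suspect)  # prints: Kuang, the only consistent culprit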

00:08:24.980 --> 00:08:27.399
So across both tests, creative writing, logical

00:08:27.399 --> 00:08:30.720
puzzles, the neutral, clear approach was the

00:08:30.720 --> 00:08:33.360
consistent winner. Why? What's actually happening

00:08:33.360 --> 00:08:35.340
inside these models? Okay, yeah, this is where

00:08:35.340 --> 00:08:37.440
it gets really interesting. And it goes right

00:08:37.440 --> 00:08:40.360
to the heart of how these LLMs actually work.

00:08:40.480 --> 00:08:42.480
They aren't sentient, they don't have feelings.

00:08:42.559 --> 00:08:46.360
Right. They are, at their core, incredibly sophisticated

00:08:46.360 --> 00:08:49.159
word prediction machines. You give them a prompt,

00:08:49.220 --> 00:08:51.500
they break it down into tokens. Tokens being

00:08:51.500 --> 00:08:55.370
like words or parts of words? Exactly. Tiny chunks

00:08:55.370 --> 00:08:58.470
of text. And then they predict the next most

00:08:58.470 --> 00:09:01.230
statistically likely sequence of tokens to follow

00:09:01.230 --> 00:09:03.409
based on all the data they were trained on. And

00:09:03.409 --> 00:09:06.490
so if you say thank you or you're useless, it

00:09:06.490 --> 00:09:10.360
just sees those as more tokens, not as praise

00:09:10.360 --> 00:09:12.700
or criticism. Precisely. That emotional language,

00:09:12.759 --> 00:09:15.000
positive or negative, it's basically just noise

00:09:15.000 --> 00:09:18.620
to the AI. Words like wonderful or failure don't

00:09:18.620 --> 00:09:21.539
describe the task. They dilute the main instruction,

00:09:21.679 --> 00:09:24.220
they add extra tokens that the AI has to process,

00:09:24.440 --> 00:09:26.779
and try to figure out how they fit into the pattern

00:09:26.779 --> 00:09:29.080
of predicting the next word for the actual task.

00:09:29.220 --> 00:09:31.460
It spends resources on the emotional fluff instead

00:09:31.460 --> 00:09:33.779
of the core request. It's like asking a search

00:09:33.779 --> 00:09:37.440
engine for best pizza near me, but adding, and

00:09:37.440 --> 00:09:39.990
I'm really, really hungry, please find it fast.

00:09:41.090 --> 00:09:43.470
The extra bit doesn't help the search algorithm.

00:09:43.669 --> 00:09:46.350
That's a great analogy. Exactly. Just adds clutter.
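
NOTE
[Editor's sketch] The "extra tokens" point is easy to verify with a tokenizer. This uses OpenAI's open-source tiktoken library; cl100k_base is the encoding used by recent GPT models, though tokenizers vary by model, so the exact counts are illustrative.
  import tiktoken
  enc = tiktoken.get_encoding("cl100k_base")
  core = "Write a 2000-word short film script about a time traveler."
  padded = core + " I'd really appreciate it, thank you so much in advance!"
  print(len(enc.encode(core)))    # tokens that actually specify the task
  print(len(enc.encode(padded)))  # same task plus tokens of pure politeness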

00:09:47.250 --> 00:09:50.509
Whereas neutral, clear, direct instructions...

00:09:50.509 --> 00:09:53.629
they act like a noise filter. Okay. Prompts like

00:09:53.629 --> 00:09:56.850
mandatory requirement, ensure compliance, only

00:09:56.850 --> 00:09:59.669
state the name, they cut through ambiguity, they

00:09:59.669 --> 00:10:01.769
give the model a very clear path, they're like

00:10:01.769 --> 00:10:04.090
technical specs for a machine. And machines like

00:10:04.090 --> 00:10:06.789
clear specs. They really do. Honestly, I still

00:10:06.789 --> 00:10:10.090
catch myself adding please sometimes. It's a

00:10:10.090 --> 00:10:12.250
hard habit to break. We're just so used to communicating

00:10:12.250 --> 00:10:14.570
like humans. Yeah, I can admit I do that too.

00:10:14.870 --> 00:10:18.490
It feels weirdly rude not to sometimes, but...

00:10:18.509 --> 00:10:20.750
The research shows it's not helping the AI. Nope.

00:10:20.929 --> 00:10:23.669
It's just adding noise. So the key reason emotional

00:10:23.669 --> 00:10:26.409
language is noise is that it distracts the

00:10:26.409 --> 00:10:29.789
AI from the core task. It makes it process irrelevant

00:10:29.789 --> 00:10:32.570
data. That's the essence of it. It doesn't understand

00:10:32.570 --> 00:10:35.110
the emotion. It just sees more words to factor

00:10:35.110 --> 00:10:37.070
into its prediction, which muddies the water.

00:10:37.389 --> 00:10:40.129
So wrapping this up, what's the big idea here?

00:10:40.570 --> 00:10:43.250
The main takeaway from this deep dive. I think

00:10:43.250 --> 00:10:46.870
the big idea is profound, but also really simple.

00:10:47.409 --> 00:10:49.789
Stop wasting time trying to play psychologist

00:10:49.789 --> 00:10:53.370
with AI. Right. No more carrots and sticks. Exactly.

00:10:53.950 --> 00:10:56.129
Threatening an AI doesn't make it smarter. We

00:10:56.129 --> 00:10:58.350
saw it can actually make it perform worse, especially

00:10:58.350 --> 00:11:02.049
on logic tasks. And being nice, while maybe making

00:11:02.049 --> 00:11:04.429
us feel better, doesn't really boost its performance

00:11:04.429 --> 00:11:07.049
either. So if you really want to unlock the power

00:11:07.049 --> 00:11:10.029
of these tools, what's the critical skill? It's

00:11:10.029 --> 00:11:12.490
prompt engineering. It's about clarity and precision.

00:11:12.870 --> 00:11:16.210
Give detailed instructions. Provide clear context.

00:11:16.450 --> 00:11:18.330
Offer examples. That's called few-shot learning,

00:11:18.769 --> 00:11:20.490
giving it a couple of examples of what you want.

00:11:20.850 --> 00:11:23.590
Specify requirements clearly. Like being a good

00:11:23.590 --> 00:11:26.029
project manager for the AI. That's a perfect

00:11:26.029 --> 00:11:28.090
way to put it. Instead of trying to manipulate

00:11:28.090 --> 00:11:30.490
nonexistent emotions, be an excellent project

00:11:30.490 --> 00:11:34.049
manager. Be a dedicated guide. Whoa. Imagine

00:11:34.049 --> 00:11:35.889
the kind of precision you could get if every

00:11:35.889 --> 00:11:38.450
prompt was like a perfectly calibrated instrument.
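
NOTE
[Editor's sketch] One concrete shape of the few-shot advice: show the model a couple of worked examples, then the real input, all as plain text. The sentiment-classification task here is an invented illustration, not something from the episode.
  FEW_SHOT_PROMPT = "\n".join([
      "Classify each review's sentiment as positive or negative.",
      'Review: "The battery dies within an hour." = negative',
      'Review: "Crisp screen, great value for the price." = positive',
      'Review: "Setup took all afternoon and it still crashes." =',
  ])
  # The model completes the pattern it was shown: here, " negative".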

00:11:38.570 --> 00:11:40.840
Yeah. That's the fastest, most effective path

00:11:40.840 --> 00:11:43.600
to making AI a truly powerful assistant. This

00:11:43.600 --> 00:11:45.679
deep dive really highlights that, doesn't it?

00:11:45.899 --> 00:11:48.320
The clarity of your instruction, not your tone,

00:11:48.799 --> 00:11:51.840
is what matters for the AI's success. It's about

00:11:51.840 --> 00:11:54.559
precision, not psychology. And maybe it makes

00:11:54.559 --> 00:11:59.460
you think, if an AI works best with clear, unambiguous

00:11:59.460 --> 00:12:03.600
directions, what does that say about how we communicate?

00:12:03.870 --> 00:12:06.990
With machines, sure, but maybe with each other,

00:12:07.129 --> 00:12:09.129
too. That's a really interesting thought. As

00:12:09.129 --> 00:12:11.450
you keep using AI, maybe reflect on that. Were

00:12:11.450 --> 00:12:13.610
you being a sycophant, a tyrant, or were you being

00:12:13.610 --> 00:12:15.889
a meticulous architect of information? Something

00:12:15.889 --> 00:12:18.090
to definitely mull over. Thank you for joining

00:12:18.090 --> 00:12:19.990
us on this deep dive. We hope this gave you a

00:12:19.990 --> 00:12:22.210
shortcut to being well-informed and maybe changed

00:12:22.210 --> 00:12:24.269
how you talk to your AI assistants. Hope so.

00:12:24.490 --> 00:12:26.070
Until next time, keep digging deeper.
