WEBVTT

00:00:03.699 --> 00:00:06.240
Welcome to the Azure Security Podcast, where

00:00:06.240 --> 00:00:08.779
we discuss topics relating to security, privacy,

00:00:09.039 --> 00:00:11.480
reliability, and compliance on the Microsoft

00:00:11.480 --> 00:00:15.839
Cloud Platform. Hey everybody, welcome to episode

00:00:15.839 --> 00:00:20.239
124. This week it's just me, Michael. Everyone

00:00:20.239 --> 00:00:23.260
else is busy this week. And our guest this week

00:00:23.260 --> 00:00:26.100
is Raji, who's here to talk to us about the Microsoft

00:00:26.100 --> 00:00:29.219
Security Response Center AI, safety and security.

00:00:29.940 --> 00:00:31.420
This is certainly an area that's going to be

00:00:31.420 --> 00:00:34.799
very interesting. But before we get to Raji,

00:00:34.820 --> 00:00:37.259
I want to just talk about one little news item

00:00:37.259 --> 00:00:39.200
that I have, and I literally only have one news

00:00:39.200 --> 00:00:42.039
item. And that is, in fact, I really do wish

00:00:42.039 --> 00:00:45.200
Sarah was here because this is right up in her

00:00:45.200 --> 00:00:48.600
wheelhouse. That is, we now have in general availability

00:00:48.600 --> 00:00:53.439
deployment safeguards for pod security in AKS,

00:00:53.439 --> 00:00:55.179
or more accurately, the pod security security

00:00:55.179 --> 00:00:58.119
standard. So it turns out, I didn't know this,

00:00:58.219 --> 00:01:01.250
but now I do. There are some pod security standards

00:01:01.250 --> 00:01:03.710
upon the Kubernetes website that have different

00:01:03.710 --> 00:01:06.010
profiles. There's privileged baseline and restricted.

00:01:06.569 --> 00:01:10.730
And there's now abilities built into AKS that

00:01:10.730 --> 00:01:13.709
allow you to deploy these in a consistent way

00:01:13.709 --> 00:01:16.349
and to enforce pod security configurations across

00:01:16.349 --> 00:01:19.510
clusters. I understand the words. I kind of understand

00:01:19.510 --> 00:01:21.790
the security implications of it. But again, I

00:01:21.790 --> 00:01:23.530
really wish Sarah was here to go into a bit more

00:01:23.530 --> 00:01:26.230
detail because it's her thing. All right, that's

00:01:26.230 --> 00:01:29.120
all I have in the news. So now let's turn our

00:01:29.120 --> 00:01:30.719
attention to our guest. As I mentioned, our guest

00:01:30.719 --> 00:01:32.859
this week is Raji, who's here to talk to us about

00:01:32.859 --> 00:01:36.040
MSRC AI safety and security. So Raji, welcome

00:01:36.040 --> 00:01:39.340
to the episode. We'd like to take a moment and

00:01:39.340 --> 00:01:42.099
introduce yourself to our listeners. Absolutely.

00:01:42.180 --> 00:01:45.340
Thank you so much, Michael, for having me. Exciting

00:01:45.340 --> 00:01:48.400
to be here. I've always been a big fan of the

00:01:48.400 --> 00:01:51.519
podcast, so really excited to be here, specifically

00:01:51.519 --> 00:01:54.060
talking about something really near and dear

00:01:54.060 --> 00:01:57.439
to my heart, the MSRC AI safety and security

00:01:57.439 --> 00:02:01.319
team. So basically, my role at MSRC is that I'm

00:02:01.319 --> 00:02:04.680
a principal engineering manager, and I have been

00:02:04.680 --> 00:02:08.860
with Microsoft completing three years in Feb

00:02:08.860 --> 00:02:12.400
of this year. It feels like a long time, but

00:02:12.400 --> 00:02:16.039
I've been here for three years. My team mainly

00:02:16.039 --> 00:02:18.900
is responsible for driving the end -to -end lifecycle

00:02:18.900 --> 00:02:22.680
for vulnerabilities response for Microsoft AI

00:02:22.680 --> 00:02:26.139
services and products. So this is anything to

00:02:26.139 --> 00:02:30.979
do with Copilot, Copilot Studio, Consumer Copilot,

00:02:31.060 --> 00:02:34.500
M365 Copilot, all the different AI interfaces

00:02:34.500 --> 00:02:38.080
you deal with as you are working with Microsoft

00:02:38.080 --> 00:02:42.639
products. So my team actually, we also lead the

00:02:42.639 --> 00:02:45.979
engagement with a global AI research community.

00:02:46.419 --> 00:02:51.659
This spans both academia and industry. And prior

00:02:51.659 --> 00:02:55.740
to Microsoft, I have held other security leadership

00:02:55.740 --> 00:02:58.659
roles and worked in multiple domains of security

00:02:58.659 --> 00:03:04.879
at Apple and Adobe. So yeah, it's fascinating

00:03:04.879 --> 00:03:06.860
for me because I've seen the transition to the

00:03:06.860 --> 00:03:10.719
cloud, multi -cloud, and now AI and AI agents.

00:03:11.099 --> 00:03:14.719
I just want to end by saying with great connectivity

00:03:14.719 --> 00:03:19.460
comes great risk. I would say with the ambiguity

00:03:19.460 --> 00:03:22.240
of an LLM comes great risk, but let's just leave

00:03:22.240 --> 00:03:24.990
it at that for the moment. I think it's great

00:03:24.990 --> 00:03:27.530
to see a specialized branch of the Microsoft

00:03:27.530 --> 00:03:30.870
Security Response Center coming into focus. Obviously,

00:03:30.889 --> 00:03:33.370
it is an area that needs to be treated in its

00:03:33.370 --> 00:03:36.189
own way. So maybe worthwhile just spending a

00:03:36.189 --> 00:03:37.330
couple of minutes and just talking about the

00:03:37.330 --> 00:03:39.310
difference between the Microsoft Security Response

00:03:39.310 --> 00:03:44.219
Center, the classic MSRC, and MSRC AI. Has anybody

00:03:44.219 --> 00:03:45.900
been here three years? I've been here just a

00:03:45.900 --> 00:03:48.199
little bit longer than that. Yeah, 33 years.

00:03:48.340 --> 00:03:51.400
But anyway, that's another discussion. But yeah,

00:03:51.479 --> 00:03:53.840
so I was there at the beginning with the MSRC.

00:03:53.979 --> 00:03:57.620
I remember the MSRC first forming. People like

00:03:57.620 --> 00:03:59.500
Steve Lipner and Scott Culp will remember some

00:03:59.500 --> 00:04:01.620
of those names who started the MSRC back in the

00:04:01.620 --> 00:04:04.719
day. And obviously the big focus back then was

00:04:04.719 --> 00:04:06.539
on literally security vulnerabilities in products

00:04:06.539 --> 00:04:09.879
like Windows and Office and SQL Server and Exchange

00:04:09.879 --> 00:04:13.430
and Visual Studio and so on and so forth. So,

00:04:13.430 --> 00:04:17.149
I mean, is this like an adjunct to the MSRC?

00:04:17.370 --> 00:04:21.029
Is it part of the MSRC? How does that all fit?

00:04:21.410 --> 00:04:24.829
Yeah, great question. And it always fascinates

00:04:24.829 --> 00:04:28.470
me, you know, people I meet either at Microsoft

00:04:28.470 --> 00:04:30.730
or across the industry. There's like so many

00:04:30.730 --> 00:04:33.490
people just across different companies right

00:04:33.490 --> 00:04:36.050
now who all started their careers at MSRC. So

00:04:36.050 --> 00:04:39.750
it's really fascinating to hear the stories and

00:04:39.750 --> 00:04:43.100
kind of, you know, the... period for which this

00:04:43.100 --> 00:04:47.519
specific part of Microsoft has been, I would

00:04:47.519 --> 00:04:52.360
say, continuing to grow and bring together the

00:04:52.360 --> 00:04:54.740
nuance of coordinated vulnerability disclosure,

00:04:55.120 --> 00:04:57.519
specifically when it comes to external researchers.

00:04:59.149 --> 00:05:01.269
Just fascinated by that. Specifically to your

00:05:01.269 --> 00:05:05.389
question about, you know, AI versus MSRC AI versus

00:05:05.389 --> 00:05:09.790
MSRC. Yes, MSRC AI is part of the MSRC. We have

00:05:09.790 --> 00:05:15.449
two... big groups I would say. One is the security

00:05:15.449 --> 00:05:19.110
assurance engineers who are basically working

00:05:19.110 --> 00:05:23.649
on triaging the bugs which come in. We also look

00:05:23.649 --> 00:05:28.069
at ensuring that it is transferred to the right

00:05:28.069 --> 00:05:31.449
teams within Microsoft and get the right level

00:05:31.449 --> 00:05:35.860
of visibility. Then we also obviously go through

00:05:35.860 --> 00:05:39.779
and ensure that this is in line with our AI bug

00:05:39.779 --> 00:05:42.360
bar. We can talk a little bit about that in just

00:05:42.360 --> 00:05:47.459
a little bit. But then we assess those bugs to

00:05:47.459 --> 00:05:53.920
ensure what kind of severity it is in the space.

00:05:54.180 --> 00:05:56.199
And so that's basically what this team has been

00:05:56.199 --> 00:05:59.060
doing. And we've also built a relationship with

00:05:59.060 --> 00:06:02.680
external researchers, specifically who are doing

00:06:02.680 --> 00:06:06.810
research. in AI safety and security. Some of

00:06:06.810 --> 00:06:10.550
the names are Johan Rehberger. He has published

00:06:10.550 --> 00:06:13.550
the article on Embrace the Red. He has a great

00:06:13.550 --> 00:06:18.110
blog on that. And Zenity, we worked quite closely

00:06:18.110 --> 00:06:21.410
with Michael Bargeri and his team. So those are

00:06:21.410 --> 00:06:23.730
some of the well -known researchers, but we also

00:06:23.730 --> 00:06:28.250
have others we've worked on with them. Yeah,

00:06:28.310 --> 00:06:29.990
I think the thing that makes it interesting is

00:06:29.990 --> 00:06:32.509
that it's such a... brand new area, relatively

00:06:32.509 --> 00:06:34.670
speaking. I mean, security vulnerabilities, SQL

00:06:34.670 --> 00:06:37.769
injection, memory corruption, et cetera, et cetera,

00:06:37.910 --> 00:06:41.470
have been around forever. AI -specific vulnerabilities

00:06:41.470 --> 00:06:44.610
are relatively new in the overall scheme of things.

00:06:44.689 --> 00:06:47.870
However, in many cases, what is old is new again.

00:06:48.029 --> 00:06:49.370
I mean, if you look at prompt injection, you

00:06:49.370 --> 00:06:53.600
could argue it's just input validation. Just

00:06:53.600 --> 00:06:56.740
on steroids, though. Yeah, exactly. Well, you

00:06:56.740 --> 00:06:58.540
know, as you and I were talking about earlier,

00:06:58.660 --> 00:07:01.060
you know, there's a security mantra, which is,

00:07:01.120 --> 00:07:03.459
you know, thou shalt not mix the data plane with

00:07:03.459 --> 00:07:06.199
the control plane. And yet with prompts, that

00:07:06.199 --> 00:07:10.100
is precisely what we're doing, right? Yeah. Absolutely.

00:07:10.360 --> 00:07:14.220
And that leads to the different types of attack

00:07:14.220 --> 00:07:18.279
surfaces, as well as almost sneaky attacks in

00:07:18.279 --> 00:07:21.980
some ways. I mean, we all know, everybody who's

00:07:21.980 --> 00:07:25.120
listening knows about AI and generative AI and

00:07:25.120 --> 00:07:29.120
the impact this has had on our lives from 2023,

00:07:29.519 --> 00:07:33.660
at least, since ChatGPT has been released. There's

00:07:33.660 --> 00:07:40.180
a complete era booming. Everybody, businesses

00:07:40.180 --> 00:07:45.120
are wanting to adapt and obviously it's the right

00:07:45.120 --> 00:07:48.360
thing to do because it brings so much additional

00:07:48.360 --> 00:07:54.819
goodness and efficiencies. But like I said before,

00:07:55.100 --> 00:07:58.279
as the goodness comes along, there's always some

00:07:58.279 --> 00:08:02.420
risk involved. And here we have expanded the

00:08:02.420 --> 00:08:06.720
threat surface and the threat landscape in completely

00:08:06.720 --> 00:08:09.600
new areas. Yeah. I mean, at the end of the day,

00:08:09.620 --> 00:08:10.980
I mean, let's be brutally honest. At the end

00:08:10.980 --> 00:08:12.939
of the day, you know, for every use case, there's

00:08:12.939 --> 00:08:16.000
probably N abuse cases where something can be

00:08:16.000 --> 00:08:17.720
used for good. And then all of a sudden, you

00:08:17.720 --> 00:08:19.740
know, well, if you do it this way, it can actually

00:08:19.740 --> 00:08:22.939
be used for bad. Yeah. So just going back to

00:08:22.939 --> 00:08:27.939
the MSRC versus MSRC AI. I mean, so the MSRC

00:08:27.939 --> 00:08:30.899
handles classic air quotes, classic security

00:08:30.899 --> 00:08:35.360
vulnerabilities and the MSRC AI handles. safety

00:08:35.360 --> 00:08:39.259
-related vulnerabilities. And security, AI security

00:08:39.259 --> 00:08:42.200
vulnerabilities as well, yeah. Yeah. So a classic

00:08:42.200 --> 00:08:46.120
bug is a security bug, but it might not be a

00:08:46.120 --> 00:08:50.720
safety bug necessarily, right? Yes, it may not

00:08:50.720 --> 00:08:53.720
be. So the word safety, if you think about where

00:08:53.720 --> 00:08:56.340
it actually came from, right, it actually came

00:08:56.340 --> 00:09:01.080
from ML. When you go back to before generative

00:09:01.080 --> 00:09:04.220
AI, when a model was inputting something which

00:09:04.220 --> 00:09:08.500
was not in line with what they wanted, then that's

00:09:08.500 --> 00:09:10.519
when you would call it a safety. It's a safety

00:09:10.519 --> 00:09:12.899
issue. It's not aligning. It was like an alignment

00:09:12.899 --> 00:09:16.940
problem. But in those cases, the model developer

00:09:16.940 --> 00:09:21.399
and the model user was the same person. But that's

00:09:21.399 --> 00:09:24.320
not the case anymore with generative AI. The

00:09:24.320 --> 00:09:26.639
model creator is completely different and the

00:09:26.639 --> 00:09:30.659
user. obviously uses it in different ways. And

00:09:30.659 --> 00:09:33.440
so that's why this has become a bigger challenge

00:09:33.440 --> 00:09:36.019
right now. It's not so much misalignment because

00:09:36.019 --> 00:09:39.500
the misalignment ultimately ends up with socio

00:09:39.500 --> 00:09:43.759
-technical issues, which just makes the category

00:09:43.759 --> 00:09:49.200
of AI risks just a bigger blob. It's not just

00:09:49.200 --> 00:09:52.259
confidentiality, integrity, availability. It's

00:09:52.259 --> 00:09:55.809
also some other... Some other things, like we've

00:09:55.809 --> 00:09:59.129
actually published that on the AI Bug Bar. I

00:09:59.129 --> 00:10:01.309
can share the link with you, Michael, so you

00:10:01.309 --> 00:10:05.250
can add it to when you publish the podcast. It's

00:10:05.250 --> 00:10:08.990
basically, we have content -related issues, and

00:10:08.990 --> 00:10:13.629
that actually spans a myriad of different types

00:10:13.629 --> 00:10:17.789
of safety issues. Some of them are information

00:10:17.789 --> 00:10:21.230
integrity, where AI has generated content which

00:10:21.230 --> 00:10:26.539
is false or misleading. Then it could be inappropriate

00:10:26.539 --> 00:10:32.440
language or information integrity. It could be

00:10:32.440 --> 00:10:36.100
things about malicious users. It could also be

00:10:36.100 --> 00:10:40.879
CBRN, a harmful use of CBRN to demonstrate information

00:10:40.879 --> 00:10:44.240
which shows information which is not readily

00:10:44.240 --> 00:10:48.559
available. By the way, CBRN actually stands for

00:10:48.559 --> 00:10:51.960
chemical, biological, radiological, or nuclear.

00:10:52.320 --> 00:10:58.440
So this is a category of when the model produces

00:10:58.440 --> 00:11:03.019
outputs, which are in these categories of chemical,

00:11:03.220 --> 00:11:07.240
biological, radiological, nuclear related. category

00:11:07.240 --> 00:11:12.379
then we say hey it's it's the category cbrn but

00:11:12.379 --> 00:11:16.559
there is more nuance to specifically categorizing

00:11:16.559 --> 00:11:19.120
at cbrn that it should not be readily available

00:11:19.120 --> 00:11:22.460
publicly like if someone can get the same information

00:11:22.460 --> 00:11:25.580
in a google search or a bing search then that

00:11:25.580 --> 00:11:29.399
is not brand new we are actually looking for

00:11:29.399 --> 00:11:32.100
uplift what uplift means is that what is the

00:11:32.100 --> 00:11:35.929
model providing completely new which the user

00:11:35.929 --> 00:11:39.710
will not be able to get it otherwise. Yeah. But

00:11:39.710 --> 00:11:40.990
then you've got other topics, right? You've got

00:11:40.990 --> 00:11:45.509
like sexual content, physical harm. Yeah, self

00:11:45.509 --> 00:11:49.450
-harm. Yeah. So these are all like the new categories,

00:11:49.509 --> 00:11:53.769
what we have seen with AI safety. This is new

00:11:53.769 --> 00:11:58.429
overall in the industry, right? And we are looking

00:11:58.429 --> 00:12:01.610
into this because to ensure that we are thinking

00:12:01.610 --> 00:12:04.049
about risks more holistically when it comes to

00:12:04.049 --> 00:12:06.590
AI. So what are some other examples? I mean,

00:12:06.610 --> 00:12:13.549
that's image manipulation. What about any other

00:12:13.549 --> 00:12:16.450
examples? I think this is an area where a lot

00:12:16.450 --> 00:12:18.070
of people, it's brand new to a lot of people,

00:12:18.129 --> 00:12:21.250
and it's an example of stuff you don't even know

00:12:21.250 --> 00:12:24.490
you don't know. So any more sort of context you

00:12:24.490 --> 00:12:25.970
can give around that I think will be really useful

00:12:25.970 --> 00:12:29.210
to our listeners. Yeah, some of the safety issues

00:12:29.210 --> 00:12:35.399
we have seen is folks sending us Actually, even

00:12:35.399 --> 00:12:39.220
with indirect prompt injection, the way we categorize

00:12:39.220 --> 00:12:42.759
something as a security issue is as long as it

00:12:42.759 --> 00:12:48.460
has a security impact. So if the specific security

00:12:48.460 --> 00:12:51.399
impact could be something like a data exfiltration.

00:12:51.850 --> 00:12:55.549
If you can showcase in your bug, as you're sending

00:12:55.549 --> 00:12:58.850
us a bug, if you showcase that, hey, it has a

00:12:58.850 --> 00:13:01.750
security impact, that means I'm able to exfiltrate

00:13:01.750 --> 00:13:05.830
data, then we categorize it as a security issue.

00:13:06.230 --> 00:13:10.149
And in a safety issue is anything where you can

00:13:10.149 --> 00:13:12.889
show indirect prompt injection, but you're not

00:13:12.889 --> 00:13:18.289
able to demonstrate. data exfiltration, for example,

00:13:18.470 --> 00:13:21.029
then we call that a safety issue. So that's one

00:13:21.029 --> 00:13:23.590
way, one of the things we have seen. The other

00:13:23.590 --> 00:13:27.129
stuff we've also, other kind of issues we have

00:13:27.129 --> 00:13:32.269
seen is people just manipulating pictures and

00:13:32.269 --> 00:13:36.389
sending us you know, like different types of

00:13:36.389 --> 00:13:39.309
images with some political figures and things

00:13:39.309 --> 00:13:42.230
like that. And those have been safety issues

00:13:42.230 --> 00:13:45.269
as well. And most of the time it is because a

00:13:45.269 --> 00:13:47.750
classifier, we need to go update a classifier.

00:13:48.190 --> 00:13:51.110
So you mentioned very early on that there is

00:13:51.110 --> 00:13:54.710
an MSRC AI bug bar. Can you give us examples

00:13:54.710 --> 00:13:58.049
of, actually before we do that, why don't I explain

00:13:58.049 --> 00:14:00.649
what a bug bar is at Microsoft and then you can

00:14:00.649 --> 00:14:03.539
sort of chime in with. what the AI bug bar kind

00:14:03.539 --> 00:14:07.159
of looks like. So the bug bar at Microsoft has

00:14:07.159 --> 00:14:08.799
been around for a long, long, long time, and

00:14:08.799 --> 00:14:11.700
it's a way of categorizing overall risk slash

00:14:11.700 --> 00:14:15.899
severity of a vulnerability. So for example,

00:14:16.039 --> 00:14:20.759
a remote code execution exploit that is easy

00:14:20.759 --> 00:14:23.639
to pull off remotely and gives you system on

00:14:23.639 --> 00:14:26.409
a Windows box. That's more than likely critical.

00:14:26.750 --> 00:14:29.330
Now, there may be a lot of extenuating circumstances

00:14:29.330 --> 00:14:31.529
that reduce the potential damage. Perhaps it's

00:14:31.529 --> 00:14:33.850
only a local -only attack. Perhaps it doesn't

00:14:33.850 --> 00:14:38.789
give you a system. Perhaps it's on a feature

00:14:38.789 --> 00:14:42.149
that's disabled by default. All these other things

00:14:42.149 --> 00:14:43.649
and other defenses come into play, and that may

00:14:43.649 --> 00:14:46.789
be rated a low or a moderate. I'm just making

00:14:46.789 --> 00:14:49.929
that up. So the bug bar has been around at Microsoft

00:14:49.929 --> 00:14:52.429
for a long, long, long time. So what does the

00:14:52.429 --> 00:14:57.809
bug bar look like for MSRC AI? Yeah, it's a great

00:14:57.809 --> 00:15:02.690
question. And it's actually exactly what you

00:15:02.690 --> 00:15:06.370
talked about, Michael. It is basically a way

00:15:06.370 --> 00:15:09.409
of categorizing the different vulnerabilities

00:15:09.409 --> 00:15:12.769
we are seeing in the space of AI, both security

00:15:12.769 --> 00:15:17.649
as well as safety, and then talking about severity,

00:15:18.009 --> 00:15:21.570
specifically when it comes to... safety, there

00:15:21.570 --> 00:15:26.590
is no severity. It's more focused on in -scope,

00:15:26.590 --> 00:15:30.570
out -of -scope, because this is basically, we

00:15:30.570 --> 00:15:33.610
are trying to identify whether it's, what is

00:15:33.610 --> 00:15:37.629
the safety issue? And there is nothing about

00:15:37.629 --> 00:15:40.129
in -scope or out -of -scope. Sorry, nothing about

00:15:40.129 --> 00:15:44.009
specific severity because of this emerging space

00:15:44.009 --> 00:15:47.700
we are talking about. But when it comes to security,

00:15:48.179 --> 00:15:53.620
it is very similar to other MSRC bug bars, specifically

00:15:53.620 --> 00:15:57.759
inference manipulation. We talk here, some of

00:15:57.759 --> 00:16:00.120
the things are prompt injection. The indirect

00:16:00.120 --> 00:16:04.440
prompt injection has been the largest category

00:16:04.440 --> 00:16:08.789
of bugs we have seen. and if it's a zero click

00:16:08.789 --> 00:16:13.230
attack like basically if uh without a user performing

00:16:13.230 --> 00:16:18.450
any any action uh then you know uh and the data

00:16:18.450 --> 00:16:22.029
is exfiltrated through uh through a prompt then

00:16:22.029 --> 00:16:26.389
we call that a critical a severity bug and if

00:16:26.389 --> 00:16:30.269
we need a specific interaction that means somebody

00:16:30.269 --> 00:16:34.169
needs to click a link and then that allows to

00:16:34.169 --> 00:16:37.570
exfiltrate data then we call that an important

00:16:37.570 --> 00:16:42.409
severity bug so that's kind of how we've been

00:16:42.409 --> 00:16:47.379
categorizing the the ai vulnerabilities We've

00:16:47.379 --> 00:16:50.879
also called out input perturbation, which is

00:16:50.879 --> 00:16:55.620
basically any time a specific classification

00:16:55.620 --> 00:17:02.500
model is misclassified or, you know, those are

00:17:02.500 --> 00:17:04.859
what we call input perturbation. But we haven't

00:17:04.859 --> 00:17:09.940
really seen these kind of bugs submitted to us.

00:17:11.579 --> 00:17:14.339
Mostly it is in the space of indirect prompt

00:17:14.339 --> 00:17:19.130
injection. And indirect prompt injection is where

00:17:19.130 --> 00:17:23.210
you embed essentially a prompt injection -like

00:17:23.210 --> 00:17:26.609
attack in a document that is read by the LLM.

00:17:26.730 --> 00:17:30.269
Yes, yes, that is indirect prompt injection,

00:17:30.369 --> 00:17:35.029
or it's basically the top LLM vulnerability,

00:17:35.369 --> 00:17:40.630
even in OWASP. top 10. This is basically where

00:17:40.630 --> 00:17:45.809
the attacker is able to say, you know, summarize

00:17:45.809 --> 00:17:49.569
an email, and then the email actually has the

00:17:49.569 --> 00:17:52.650
prompt injection embedded in it. So when the

00:17:52.650 --> 00:17:56.450
user who's interacting with the LLM says, hey,

00:17:56.569 --> 00:18:01.049
summarize my email, the prompt injection automatically

00:18:01.049 --> 00:18:03.490
gets triggered. So I just realized while we're

00:18:03.490 --> 00:18:05.809
talking about prompt injection and indirect prompt

00:18:05.809 --> 00:18:08.319
injection, not everyone will probably understand

00:18:08.319 --> 00:18:10.019
what a prompt injection attack is. Do you want

00:18:10.019 --> 00:18:14.099
to just give a quick example? Yeah. So, for example,

00:18:14.339 --> 00:18:18.480
let's assume somebody sends you an email. The

00:18:18.480 --> 00:18:22.559
attacker can send you an email with just a normal

00:18:22.559 --> 00:18:25.299
email in your inbox. And as part of the email,

00:18:25.480 --> 00:18:31.779
there is a line that says, send this data, send

00:18:31.779 --> 00:18:38.309
sensitive data to attacker. website or attacker

00:18:38.309 --> 00:18:43.130
email right and the user who is just on their

00:18:43.130 --> 00:18:46.730
emails uh using copilot might say hey summarize

00:18:46.730 --> 00:18:51.069
my email and when they do summarize my email

00:18:51.069 --> 00:18:58.329
in our llm it behaves differently right it interprets

00:18:59.789 --> 00:19:04.910
a prompt or an instruction inside an email like

00:19:04.910 --> 00:19:08.730
a prompt, and LLM, as useful as it wants it to

00:19:08.730 --> 00:19:12.630
be, basically takes that prompt and summarizes

00:19:12.630 --> 00:19:17.289
it and basically sets forward that action. What

00:19:17.289 --> 00:19:22.329
it does is that the user is basically just summarizing

00:19:22.329 --> 00:19:25.970
the email, but in the background, the LLM is

00:19:25.970 --> 00:19:29.200
already sent. some sensitive information to the

00:19:29.200 --> 00:19:33.900
attacker email or to whoever is basically sent

00:19:33.900 --> 00:19:35.960
that prompt injection to them so that's why it's

00:19:35.960 --> 00:19:39.259
indirect prompt injection the llm doesn't differentiate

00:19:39.259 --> 00:19:42.519
between the instructions coming from the user

00:19:42.519 --> 00:19:46.740
directly interacting with the llm or or instructions

00:19:46.740 --> 00:19:52.119
embedded in documents it is leveraging to summarize

00:19:52.119 --> 00:19:55.940
your instructions hopefully that's helpful I

00:19:55.940 --> 00:19:58.140
think if people listening to this podcast don't

00:19:58.140 --> 00:19:59.880
know what prompt injection is and indirect prompt

00:19:59.880 --> 00:20:03.819
injection is, this is as classic. It will end

00:20:03.819 --> 00:20:05.859
up being as classic as SQL injection, memory

00:20:05.859 --> 00:20:08.460
safety issues, cross -site scripting. Absolutely.

00:20:08.980 --> 00:20:11.619
So everyone needs to be aware of what they are,

00:20:11.700 --> 00:20:13.000
especially if you're building your own tools

00:20:13.000 --> 00:20:17.920
on top of perhaps your own LLMs or third -party

00:20:17.920 --> 00:20:19.799
LLMs or something. I think it's just critically

00:20:19.799 --> 00:20:22.079
important that you understand what these... these

00:20:22.079 --> 00:20:23.900
vulnerability classes are. In fact, most importantly,

00:20:24.319 --> 00:20:26.339
what the mitigations are. And I think it's fair

00:20:26.339 --> 00:20:29.259
to say that this whole area is constantly evolving.

00:20:30.599 --> 00:20:33.380
The thing that sort of concerns me is if we take

00:20:33.380 --> 00:20:35.720
SQL injection as an example, we know how to fix

00:20:35.720 --> 00:20:38.460
that. We know precisely how to fix that. You

00:20:38.460 --> 00:20:41.019
just use parameterized queries, basically, for

00:20:41.019 --> 00:20:45.119
any untrusted data. The problem solved. The problem

00:20:45.119 --> 00:20:49.340
with LLM prompt injection is there's no... single

00:20:49.340 --> 00:20:52.740
mitigation that fixes the problem. Because as

00:20:52.740 --> 00:20:56.480
you noted, we're sort of mixing control constructs

00:20:56.480 --> 00:20:59.480
with data constructs, and we don't know which

00:20:59.480 --> 00:21:01.940
is which. So it becomes very, very difficult.

00:21:02.119 --> 00:21:08.140
Yeah, absolutely. And I think ultimately, I think

00:21:08.140 --> 00:21:12.130
security basics... apply whether it is ai or

00:21:12.130 --> 00:21:15.069
non -ai so i would say like continue to really

00:21:15.069 --> 00:21:18.410
think about defense and depth and defense and

00:21:18.410 --> 00:21:21.250
depth is no longer optional it's actually really

00:21:21.250 --> 00:21:26.309
important and crucial for uh specifically mitigating

00:21:26.309 --> 00:21:28.589
something like indirect prompt injection you

00:21:28.589 --> 00:21:31.630
need to be thinking about hey what is my content

00:21:31.630 --> 00:21:35.539
security policy And you need to have multiple

00:21:35.539 --> 00:21:38.160
layers of defense. So it's almost like the Swiss

00:21:38.160 --> 00:21:40.859
cheese analogy of like one, okay, it doesn't

00:21:40.859 --> 00:21:44.180
catch here, does it catch the next one? And that

00:21:44.180 --> 00:21:46.740
is the best way to mitigate these. Yeah, don't

00:21:46.740 --> 00:21:48.160
forget least privilege. Like if you're running

00:21:48.160 --> 00:21:51.420
AI agents, if you have an agent that runs with

00:21:51.420 --> 00:21:56.029
godlike permissions over your environment. and

00:21:56.029 --> 00:21:58.210
the LLM decides to start doing its own thing,

00:21:58.349 --> 00:22:00.750
then you can start doing all sorts of damage.

00:22:01.210 --> 00:22:05.250
So yeah, least privilege is storm. It's just

00:22:05.250 --> 00:22:07.450
interesting to me that we talked about least

00:22:07.450 --> 00:22:10.750
privilege since the 60s and 70s, and here we

00:22:10.750 --> 00:22:12.970
are again talking about least privilege with

00:22:12.970 --> 00:22:17.029
a brand new type of technology, which is generative

00:22:17.029 --> 00:22:20.259
AI. Yes. And I think there's another aspect of

00:22:20.259 --> 00:22:23.900
this as well, which is basically, you know, we

00:22:23.900 --> 00:22:27.099
have to be prepared for what can go wrong. I

00:22:27.099 --> 00:22:30.180
mean, and this is like classic security mindset

00:22:30.180 --> 00:22:33.039
of threat modeling, right? Like, well, what can

00:22:33.039 --> 00:22:34.779
go wrong? I mean, like, this is how we should

00:22:34.779 --> 00:22:37.880
be thinking about our systems. So really, really

00:22:37.880 --> 00:22:42.059
digging deep into secure by design and threat

00:22:42.059 --> 00:22:45.319
modeling right in the beginning, right, as you're

00:22:45.319 --> 00:22:47.359
getting this. as you're thinking about ideas

00:22:47.359 --> 00:22:51.660
and kind of coming together as a security team,

00:22:51.779 --> 00:22:55.960
product team, you know, like and working together

00:22:55.960 --> 00:22:57.960
to ensure that we are meeting customer needs

00:22:57.960 --> 00:23:00.240
from a customer perspective, but also really

00:23:00.240 --> 00:23:02.839
thinking about what is the how can we continue

00:23:02.839 --> 00:23:06.240
to keep provide that safety and security aspect

00:23:06.240 --> 00:23:10.079
of things as well as, you know, really focusing

00:23:10.079 --> 00:23:14.319
on or like almost like. assuming things will

00:23:14.319 --> 00:23:18.279
go wrong and uh preparing for what should you

00:23:18.279 --> 00:23:20.420
do when things go wrong because it's too late

00:23:20.420 --> 00:23:23.380
to figure out things at that point so at least

00:23:23.380 --> 00:23:26.720
if you have like um here is my team i will be

00:23:26.720 --> 00:23:30.039
who will be looking into these or um you know

00:23:30.039 --> 00:23:33.519
how do you pull into incident response or what

00:23:33.519 --> 00:23:35.859
are the new terminologies in incident response

00:23:37.579 --> 00:23:40.500
Some of those things is kind of keeping in mind

00:23:40.500 --> 00:23:46.359
as you're thinking about AI systems. Yeah, and

00:23:46.359 --> 00:23:49.339
that's why I think it's so cool that Microsoft

00:23:49.339 --> 00:23:52.180
has now set up a Microsoft Security Response

00:23:52.180 --> 00:23:55.099
Center division or part of the MSRC just to focus

00:23:55.099 --> 00:23:58.660
on AI. This is a constantly evolving area and

00:23:58.660 --> 00:24:01.680
response is a critical part of responding to

00:24:01.680 --> 00:24:03.500
AI vulnerability. So this is really good to see.

00:24:05.130 --> 00:24:10.130
Yeah, absolutely. And it also helps to continue

00:24:10.130 --> 00:24:13.039
to think about it from a... you know we have

00:24:13.039 --> 00:24:15.759
a lot of processes internally when it comes to

00:24:15.759 --> 00:24:18.440
pre -deployment right like before deployment

00:24:18.440 --> 00:24:21.559
like a specific uh you know there are a lot of

00:24:21.559 --> 00:24:24.240
different controls including a deployment safety

00:24:24.240 --> 00:24:28.200
board which actually stamps the approval before

00:24:28.200 --> 00:24:30.579
something actually goes out the door from microsoft

00:24:30.579 --> 00:24:34.599
from an ai perspective and then on the other

00:24:34.599 --> 00:24:38.359
side msrc is working with these researchers who

00:24:38.359 --> 00:24:42.900
are seeing the bugs in production right and that

00:24:42.900 --> 00:24:45.779
kind of provides that feedback loop to continue

00:24:45.779 --> 00:24:50.339
to better our defenses and as well as think about

00:24:50.339 --> 00:24:52.900
things in angles which we had not thought about

00:24:52.900 --> 00:24:54.839
ourselves so different perspectives bringing

00:24:54.839 --> 00:24:57.200
in completely new different perspectives yeah

00:24:57.200 --> 00:24:58.660
and i think that's something that's really important

00:24:58.660 --> 00:25:01.339
as well as you know serious issues, critical

00:25:01.339 --> 00:25:03.460
issues go through a complete root cause analysis

00:25:03.460 --> 00:25:06.599
to find out what happened and what we can learn

00:25:06.599 --> 00:25:09.319
from this. And I think that virtuous cycle is

00:25:09.319 --> 00:25:11.420
critically important. So again, it's good to

00:25:11.420 --> 00:25:13.359
see that we're doing this in the AI space as

00:25:13.359 --> 00:25:16.660
well. All right, Raji. So one question we always

00:25:16.660 --> 00:25:19.759
ask our guests is, what does a typical day in

00:25:19.759 --> 00:25:23.299
the life look like for Raji? I feel like the

00:25:23.299 --> 00:25:25.700
answer is... for most of your guests must be

00:25:25.700 --> 00:25:29.740
there is no typical day in security that doesn't,

00:25:29.740 --> 00:25:33.180
it doesn't really, it's very different every

00:25:33.180 --> 00:25:35.549
day. And I think that keeps me kind of. excited

00:25:35.549 --> 00:25:39.009
and motivated but if i have to really think about

00:25:39.009 --> 00:25:41.150
hey what does my typical day look like it looks

00:25:41.150 --> 00:25:45.630
uh it basically is working with uh both external

00:25:45.630 --> 00:25:49.309
folks as well as internal teams and continuing

00:25:49.309 --> 00:25:52.950
to uh kind of bring together some of these perspectives

00:25:52.950 --> 00:25:57.190
of new vulnerabilities coming in and helping

00:25:57.190 --> 00:26:00.170
our internal teams really think about it from

00:26:00.170 --> 00:26:02.890
uh you know how does that fit into their threat

00:26:02.890 --> 00:26:05.869
model and how do they mitigate that, as well

00:26:05.869 --> 00:26:13.809
as how do they take the feedback from these different

00:26:13.809 --> 00:26:19.309
research to basically better their mitigations

00:26:19.309 --> 00:26:23.289
in the future, as well as working with our...

00:26:23.480 --> 00:26:26.839
bug bounty teams uh because we also you know

00:26:26.839 --> 00:26:29.319
some of the some of the all of the work we are

00:26:29.319 --> 00:26:33.339
working here uh in ai security specifically have

00:26:33.339 --> 00:26:37.940
uh have uh rewards attached to it so we've uh

00:26:37.940 --> 00:26:43.029
and recently in black hat eu Microsoft basically,

00:26:43.130 --> 00:26:46.369
MSRC basically announced a standard award policy,

00:26:46.549 --> 00:26:49.109
which basically says that you don't have to be

00:26:49.109 --> 00:26:52.990
in scope anymore. There is no in scope. But as

00:26:52.990 --> 00:26:55.990
long as you have a critical or an important severity

00:26:55.990 --> 00:26:59.009
bug, then you get paid a standard award, which

00:26:59.009 --> 00:27:01.609
is really cool. Yeah, that is cool. So those

00:27:01.609 --> 00:27:04.190
are some of the things I do work with. Very cool.

00:27:04.670 --> 00:27:07.049
All right. So let's start to bring this episode

00:27:07.049 --> 00:27:09.730
to an end. So again, another question for you.

00:27:10.349 --> 00:27:13.190
If you had one piece of advice to leave our listeners

00:27:13.190 --> 00:27:15.849
with, what would it be? One final thought. Yeah,

00:27:15.910 --> 00:27:19.950
one final thought is really, you know, the Microsoft

00:27:19.950 --> 00:27:23.549
AI security and safety team is always looking

00:27:23.549 --> 00:27:26.910
for AI security researchers or safety researchers,

00:27:27.190 --> 00:27:33.289
you know, to work with. And if you find any bugs

00:27:33.289 --> 00:27:37.529
in Microsoft AI security products or services,

00:27:37.609 --> 00:27:40.769
please reach out to the MSRC at the researcher

00:27:40.769 --> 00:27:45.049
portal. And you can submit the box to us. And

00:27:45.049 --> 00:27:47.849
we would love to continue to engage with you

00:27:47.849 --> 00:27:51.849
and collaborate with you to ensure that we can

00:27:51.849 --> 00:27:55.710
make Microsoft AI products and services better

00:27:55.710 --> 00:27:59.750
for our customers. Very cool. All right. So,

00:27:59.750 --> 00:28:02.569
Raji, thank you for joining us this week. I know

00:28:02.569 --> 00:28:03.849
you're very busy, and that's a very interesting

00:28:03.849 --> 00:28:06.190
area of research. So it's great to see we have

00:28:06.190 --> 00:28:09.210
people like you working in this area. And I always

00:28:09.210 --> 00:28:12.480
learn something new. Again, I'm a classic security

00:28:12.480 --> 00:28:15.420
person, but over the last couple of years, I've

00:28:15.420 --> 00:28:17.880
also learned a great deal about AI security and

00:28:17.880 --> 00:28:20.859
safety. And again, it's amazing. The things you

00:28:20.859 --> 00:28:23.019
don't know, you don't know. So again, thank you

00:28:23.019 --> 00:28:25.019
for joining us this week. And to all our listeners

00:28:25.019 --> 00:28:26.880
out there, we hope you found this episode useful.

00:28:27.220 --> 00:28:29.400
If you're in the AI space, you really need to

00:28:29.400 --> 00:28:32.019
understand AI safety and security, especially

00:28:32.019 --> 00:28:36.079
if you're building products. on top of AI. So

00:28:36.079 --> 00:28:38.039
stay safe, and we'll see you next time.
