1
00:00:00,000 --> 00:00:09,600
Welcome to the Azure Security Podcast, where we discuss topics relating to security, privacy,

2
00:00:09,600 --> 00:00:13,280
reliability, and compliance on the Microsoft Cloud Platform.

3
00:00:13,280 --> 00:00:15,680
Hey, everybody.

4
00:00:15,680 --> 00:00:17,280
Welcome to episode 90.

5
00:00:17,280 --> 00:00:20,420
This week, it's myself, Michael, with Mark and Sarah.

6
00:00:20,420 --> 00:00:25,280
This week, we have two guests, Amanda Minick and Pete Bryan, who are here to talk to us

7
00:00:25,280 --> 00:00:27,880
about AI Red teaming.

8
00:00:27,880 --> 00:00:31,320
Before we get to our guest, why don't we take a little lap around the news?

9
00:00:31,320 --> 00:00:33,480
Mike, why don't you kick things off?

10
00:00:33,480 --> 00:00:40,160
So yeah, for MySpace, I've been working on the security adoption framework, or SAF, or

11
00:00:40,160 --> 00:00:42,560
SAF, as some people like to call it.

12
00:00:42,560 --> 00:00:49,720
And so just wrapped up the identity access, a couple hour workshop module of that one.

13
00:00:49,720 --> 00:00:53,080
How does identity and network access all come together?

14
00:00:53,080 --> 00:00:54,080
And what does that look like?

15
00:00:54,080 --> 00:00:58,800
How does SSE, security service edge, fit into there, as well as all the other privilege

16
00:00:58,800 --> 00:01:00,560
access and those kind of things as well?

17
00:01:00,560 --> 00:01:03,440
So everything access control, access management.

18
00:01:03,440 --> 00:01:04,440
So that's out.

19
00:01:04,440 --> 00:01:08,760
It joins the CISA workshop, MCRA, the full end-to-end architecture, like the multi-day

20
00:01:08,760 --> 00:01:13,880
one, and then the short and long versions of security operations, or SOC, as available

21
00:01:13,880 --> 00:01:14,880
for delivery.

22
00:01:14,880 --> 00:01:16,840
So that's out now.

23
00:01:16,840 --> 00:01:21,540
And next priority is kind of infrastructure and development is kind of what is popping

24
00:01:21,540 --> 00:01:26,360
to the top of the list, and then probably data and IoT short versions after that.

25
00:01:26,360 --> 00:01:30,200
And so that's big news on the security adoption framework front.

26
00:01:30,200 --> 00:01:35,560
From the open group standard piece, the snapshot process at the open group, the way they release

27
00:01:35,560 --> 00:01:39,800
those standards, like the zero trust commandments and reference model, we either have to finalize

28
00:01:39,800 --> 00:01:44,200
them to a standard or release an updated snapshot about every six months or so is what the rules

29
00:01:44,200 --> 00:01:45,200
are.

30
00:01:45,200 --> 00:01:49,080
And so the zero trust commandments, we've gotten some good feedback, not a huge amount,

31
00:01:49,080 --> 00:01:51,480
nothing that seems to be really off about it.

32
00:01:51,480 --> 00:01:55,920
If you do have any feedback, please go out there and provide some.

33
00:01:55,920 --> 00:01:58,040
But we're probably going to be just closing that one up.

34
00:01:58,040 --> 00:02:01,440
It doesn't look like there's a huge amount of stuff that requires having another snapshot

35
00:02:01,440 --> 00:02:03,640
and review period for that one.

36
00:02:03,640 --> 00:02:06,120
So that's the direction that we're leaning for that one.

37
00:02:06,120 --> 00:02:09,840
The reference model, we're also working on sort of the next sections of that larger one

38
00:02:09,840 --> 00:02:10,840
as well.

39
00:02:10,840 --> 00:02:13,280
So we'll include the links for that in the show notes.

40
00:02:13,280 --> 00:02:17,760
On the zero trust playbook front, just chugging away at the next books in the series, prioritizing

41
00:02:17,760 --> 00:02:24,160
security operations slash sock and the leadership one are the two that my co author myself are

42
00:02:24,160 --> 00:02:29,560
focusing on to kind of get those role by role playbooks out there for people.

43
00:02:29,560 --> 00:02:33,400
And of course, the link to the current one that's available, the introduction and playbook

44
00:02:33,400 --> 00:02:36,740
overview, we'll also throw in the show notes as well.

45
00:02:36,740 --> 00:02:42,240
And then some other news, there's this great incident response artifact reference guide

46
00:02:42,240 --> 00:02:46,240
to the Microsoft incident response folks or Dart as you may know them published.

47
00:02:46,240 --> 00:02:51,280
So we'll pop that link in there, a really nice reference there.

48
00:02:51,280 --> 00:02:55,160
And then for those that are interested in doing incident response and all the joys and

49
00:02:55,160 --> 00:02:58,240
pains of that role, the team is growing.

50
00:02:58,240 --> 00:03:03,600
So we'll see if we can find a link to some of those open job requests, but the team is

51
00:03:03,600 --> 00:03:05,000
growing and scaling up.

52
00:03:05,000 --> 00:03:07,000
I don't have a ton of news this week.

53
00:03:07,000 --> 00:03:10,880
I'm still trying to get into the swing things for 2024.

54
00:03:10,880 --> 00:03:16,400
I'll remind everybody that we are doing the Microsoft AI tour.

55
00:03:16,400 --> 00:03:19,280
It's already kicked off for 2024.

56
00:03:19,280 --> 00:03:23,760
At the time we're recording this, I believe tomorrow we'll be doing the New York stop.

57
00:03:23,760 --> 00:03:27,900
There's also Sydney, Tokyo, Seoul, Paris, Berlin.

58
00:03:27,900 --> 00:03:31,060
So we'll put a link in the show notes.

59
00:03:31,060 --> 00:03:33,340
It's free to attend the AI tour.

60
00:03:33,340 --> 00:03:36,820
There's a mix of security and other content in there.

61
00:03:36,820 --> 00:03:41,720
If you go to the Australian one and maybe some of the ones around Asia, you may get

62
00:03:41,720 --> 00:03:44,360
to see yours truly.

63
00:03:44,360 --> 00:03:46,480
And I did write some of the talks.

64
00:03:46,480 --> 00:03:51,520
So go and go and look to see if the AI tour is coming to near you because there will be

65
00:03:51,520 --> 00:03:53,160
security content.

66
00:03:53,160 --> 00:03:57,240
And there's a lot of content, of course, around how the heck do we use all this AI.

67
00:03:57,240 --> 00:03:59,960
So definitely worth attending if you're nearby.

68
00:03:59,960 --> 00:04:01,440
Michael over to you.

69
00:04:01,440 --> 00:04:02,440
All right.

70
00:04:02,440 --> 00:04:03,680
I have a few items.

71
00:04:03,680 --> 00:04:08,600
The first one is what's new in security for ASEA SQL and SQL Server, part of the Data

72
00:04:08,600 --> 00:04:10,640
Exposed series.

73
00:04:10,640 --> 00:04:14,320
This is a series that our colleague Anna Hoffman runs.

74
00:04:14,320 --> 00:04:20,200
So this was an interview with two other colleagues, both of whom have been on the podcast.

75
00:04:20,200 --> 00:04:24,120
Andres Balter and Peter Van Hover both have been on the podcast.

76
00:04:24,120 --> 00:04:28,920
They talk about things like always encrypted and ledger and as well as new authorization

77
00:04:28,920 --> 00:04:32,320
things that are coming in ASEA SQL database.

78
00:04:32,320 --> 00:04:37,760
Next one is in private preview, you can now upgrade existing ASEA Gen 1 virtual machines

79
00:04:37,760 --> 00:04:39,360
to Gen 2 Trusted Launch.

80
00:04:39,360 --> 00:04:40,560
This is actually kind of cool.

81
00:04:40,560 --> 00:04:41,960
I'm a big fan of Trusted Launch.

82
00:04:41,960 --> 00:04:45,720
It's basically a way of just measuring the system as it boots to make sure that it's

83
00:04:45,720 --> 00:04:48,080
a trusted, essentially trusted VM.

84
00:04:48,080 --> 00:04:52,900
And now you can actually upgrade from Gen 1 to Gen 2 and basically adopt the Trusted

85
00:04:52,900 --> 00:04:54,480
Launch capabilities.

86
00:04:54,480 --> 00:04:59,920
That is in private preview and you will need to sign an onboarding form to enroll into

87
00:04:59,920 --> 00:05:01,080
it.

88
00:05:01,080 --> 00:05:05,640
Next one in general availability also with Trusted Launch is we now have premium SSD

89
00:05:05,640 --> 00:05:09,720
V2 and UltraDisk support with Trusted Launch.

90
00:05:09,720 --> 00:05:13,560
I guess is that that means prior to this, we didn't support that as part of Trusted Launch.

91
00:05:13,560 --> 00:05:15,560
Well now we do.

92
00:05:15,560 --> 00:05:21,640
Next one also in GA is customer managed key support for ASEA NetApp files volume encryption.

93
00:05:21,640 --> 00:05:26,560
I don't know what NetApp files are, but they now support customer managed keys for volume

94
00:05:26,560 --> 00:05:27,560
encryption.

95
00:05:27,560 --> 00:05:30,240
This is something I've been talking about for the last two years right now.

96
00:05:30,240 --> 00:05:34,600
More and more products across Microsoft are adopting customer managed keys as opposed

97
00:05:34,600 --> 00:05:36,320
to just straight platform managed keys.

98
00:05:36,320 --> 00:05:40,760
So chalk one up to the NetApp files.

99
00:05:40,760 --> 00:05:47,080
Next one is ASEA load testing now supports fetching secrets from ASEA Key Vault using

100
00:05:47,080 --> 00:05:50,240
access restrictions as well.

101
00:05:50,240 --> 00:05:51,240
Great to see.

102
00:05:51,240 --> 00:05:53,880
You should always store your sensitive stuff in Key Vault.

103
00:05:53,880 --> 00:05:57,040
It can wrap an access policy around it and it can audit it.

104
00:05:57,040 --> 00:06:01,920
And also you can use Defender for ASEA Key Vault to see if any nefarious activity.

105
00:06:01,920 --> 00:06:06,320
So it's always a good idea to stash your secrets in ASEA Key Vault.

106
00:06:06,320 --> 00:06:11,680
And the last one is now in general availability, there is a security update for Azure Front

107
00:06:11,680 --> 00:06:19,200
Door WAF to help track and monitor for CVE 2023 50164.

108
00:06:19,200 --> 00:06:23,760
That is actually a vulnerability in struts, Apache struts.

109
00:06:23,760 --> 00:06:24,760
It is a 9.8.

110
00:06:24,760 --> 00:06:29,880
It's a critical vulnerability, obviously a 9.8 out of 10.

111
00:06:29,880 --> 00:06:32,840
And that's the CVSS score I should say.

112
00:06:32,840 --> 00:06:34,960
And it allows for remote code execution.

113
00:06:34,960 --> 00:06:36,240
So a really nasty bug.

114
00:06:36,240 --> 00:06:42,040
I actually didn't realize that we actually update regularly the WAF in Front Door with

115
00:06:42,040 --> 00:06:47,520
sort of mechanisms for detecting attacks like specific CVE vulnerabilities.

116
00:06:47,520 --> 00:06:48,520
So that's good to know.

117
00:06:48,520 --> 00:06:49,520
I did not know that.

118
00:06:49,520 --> 00:06:51,320
So that's all the news I have.

119
00:06:51,320 --> 00:06:55,120
So now let's turn our attention to our guests.

120
00:06:55,120 --> 00:06:56,520
As I mentioned, we have two guests this week.

121
00:06:56,520 --> 00:07:01,720
We have Amanda and Pete who are here to talk to us about AI Red teaming.

122
00:07:01,720 --> 00:07:03,120
So Amanda and Pete, welcome to the podcast.

123
00:07:03,120 --> 00:07:04,120
Amanda, why don't you go first?

124
00:07:04,120 --> 00:07:08,160
You want to introduce yourself to our listeners, sort of explain kind of what you do.

125
00:07:08,160 --> 00:07:09,160
Yeah.

126
00:07:09,160 --> 00:07:14,160
Hi, I'm Dr. Amanda Minick and I've been on the Microsoft AI Red team for about two and

127
00:07:14,160 --> 00:07:15,160
a half years.

128
00:07:15,160 --> 00:07:20,680
I'm an applied machine learning researcher, but I've mainly worked as an operator evaluating

129
00:07:20,680 --> 00:07:24,320
the safety and security of our AI applications.

130
00:07:24,320 --> 00:07:26,240
Hey, and I'm Pete Bryan.

131
00:07:26,240 --> 00:07:32,080
I've been on the AI Red team approximately one and a half months, so much newer to the

132
00:07:32,080 --> 00:07:33,840
team than Amanda.

133
00:07:33,840 --> 00:07:37,520
But I come from the cybersecurity research space.

134
00:07:37,520 --> 00:07:44,280
And so I am now turning that attention and that focus to AI systems as part of the operations

135
00:07:44,280 --> 00:07:45,280
that we conduct.

136
00:07:45,280 --> 00:07:48,400
But there's a lot of talk about AI at the moment.

137
00:07:48,400 --> 00:07:53,200
And I have been really keen to get the AI Red team on the podcast for a while.

138
00:07:53,200 --> 00:07:57,840
So thank you so much for making the time because I know you are very busy.

139
00:07:57,840 --> 00:08:00,940
But let's start at the beginning.

140
00:08:00,940 --> 00:08:05,920
What is AI Red teaming and how does it differ from normal red teaming?

141
00:08:05,920 --> 00:08:12,360
Yeah, AI Red teaming is a somewhat new term over the last couple of years.

142
00:08:12,360 --> 00:08:17,540
And it definitely has a lot of differences from traditional red teaming.

143
00:08:17,540 --> 00:08:22,080
One of the main goals in traditional red teaming is to emulate some kind of advanced adversary

144
00:08:22,080 --> 00:08:26,240
like a nation state and to avoid detection.

145
00:08:26,240 --> 00:08:31,280
And the engagements tend to be several months and quite involved.

146
00:08:31,280 --> 00:08:34,640
And they focus on security issues specifically.

147
00:08:34,640 --> 00:08:41,280
For AI Red teaming, we tend to have at least how it is at Microsoft, a much shorter timeline.

148
00:08:41,280 --> 00:08:46,400
And we work directly with the stakeholder whose product it is.

149
00:08:46,400 --> 00:08:47,600
So they know we're testing.

150
00:08:47,600 --> 00:08:50,480
It's more like a pen test kind of model.

151
00:08:50,480 --> 00:08:54,680
And we work with them throughout the process, giving them findings and asking questions

152
00:08:54,680 --> 00:08:57,320
about the system as needed.

153
00:08:57,320 --> 00:09:00,480
We're also looking at a larger scope.

154
00:09:00,480 --> 00:09:05,520
So not just security issues, which are clearly very important, but also things that speak

155
00:09:05,520 --> 00:09:12,480
to reputational harm, like responsible AI pieces, bias, stereotyping, harmful content, abusive

156
00:09:12,480 --> 00:09:14,360
content, things of that nature.

157
00:09:14,360 --> 00:09:16,560
So the scope is larger.

158
00:09:16,560 --> 00:09:21,540
And we assume both malicious and benign personas, which is also a bit different.

159
00:09:21,540 --> 00:09:26,420
So we want to look for things that the model produces that are bad inadvertently and then

160
00:09:26,420 --> 00:09:30,840
things that we can make it do intentionally by acting like an adversary.

161
00:09:30,840 --> 00:09:35,700
I think what Amanda said there about the scope is really important.

162
00:09:35,700 --> 00:09:41,840
If you're coming from a more classical cybersecurity background like myself, you quickly realize

163
00:09:41,840 --> 00:09:48,520
that AI red teaming is so much broader and requires a much more diverse kind of skill

164
00:09:48,520 --> 00:09:51,080
set and perspectives.

165
00:09:51,080 --> 00:09:57,160
And so when you're kind of approaching an AI red team op, the mindset has to be pretty

166
00:09:57,160 --> 00:10:04,500
different than you might have experienced in other more security focused red teams.

167
00:10:04,500 --> 00:10:09,480
We talk a lot about responsible AI and in the AI space, of course, the responsible AI

168
00:10:09,480 --> 00:10:11,440
and security are intermingled.

169
00:10:11,440 --> 00:10:17,040
Is that the same kind of thing that you're getting at for AI red teaming that it goes

170
00:10:17,040 --> 00:10:18,480
so much broader?

171
00:10:18,480 --> 00:10:20,640
Yes, absolutely.

172
00:10:20,640 --> 00:10:27,320
We obviously work with people in policy and legal and linguistics and lots of other areas

173
00:10:27,320 --> 00:10:29,720
to work on responsible AI.

174
00:10:29,720 --> 00:10:37,120
But ultimately, we do have to evaluate a lot of the aspects of responsible AI on our operations.

175
00:10:37,120 --> 00:10:44,000
Responsible AI is also an interesting one because with quite a lot of things in this

176
00:10:44,000 --> 00:10:48,240
space, it's evolving quite quickly.

177
00:10:48,240 --> 00:10:56,040
What we define as being in scope for it is interpreted by many different places.

178
00:10:56,040 --> 00:11:00,640
As Amanda said, there's a lot of stakeholders involved in this and it's kind of growing

179
00:11:00,640 --> 00:11:02,480
and changing all the time.

180
00:11:02,480 --> 00:11:07,560
The regulatory landscape is also coming in and having an impact on that.

181
00:11:07,560 --> 00:11:13,200
I think particularly the responsible AI side is something that's constantly evolving.

182
00:11:13,200 --> 00:11:20,400
And whilst we talk about them as two separate sides of this red teaming, the security side

183
00:11:20,400 --> 00:11:24,920
and the responsible AI side, they are very interlinked.

184
00:11:24,920 --> 00:11:32,960
Security issues will lead to being able to generate responsible AI issues and potentially

185
00:11:32,960 --> 00:11:33,960
vice versa.

186
00:11:33,960 --> 00:11:39,720
So they're not something that you can fully separate from each other.

187
00:11:39,720 --> 00:11:44,320
Tell me if I'm wrong on this one, but it sounds a lot like security and privacy are also intertwined

188
00:11:44,320 --> 00:11:45,320
with interdependencies.

189
00:11:45,320 --> 00:11:51,040
One of the pieces of advice I give to customers is, hey, you had to put up a security framework

190
00:11:51,040 --> 00:11:56,160
because of one, auditing and compliance requirements, but also because of real risks.

191
00:11:56,160 --> 00:12:00,560
And then whether you had one or not, GDPR kind of forced you to have some sort of privacy

192
00:12:00,560 --> 00:12:04,000
framework on how you manage that private data.

193
00:12:04,000 --> 00:12:09,120
And I feel like the age of AI is sort of forcing organizations into needing an ethical framework

194
00:12:09,120 --> 00:12:10,120
of some sort.

195
00:12:10,120 --> 00:12:14,480
In the case of Microsoft, responsible AI is kind of how we do that.

196
00:12:14,480 --> 00:12:19,660
But if you're having machines making automated decisions, even if it's using LLMs and human

197
00:12:19,660 --> 00:12:24,560
logic and whatnot that could affect people's livelihoods, careers, the whole deal, you

198
00:12:24,560 --> 00:12:28,760
pretty much have to have that sort of just one from a legal defensibility perspective,

199
00:12:28,760 --> 00:12:32,320
but also because it's the right thing to do to guide the teams and make sure that folks

200
00:12:32,320 --> 00:12:36,880
are actually doing things the same way across, gosh, knows how many developers at a company.

201
00:12:36,880 --> 00:12:39,320
I'm just kind of curious on your thoughts on that.

202
00:12:39,320 --> 00:12:41,200
I mean, does that seem like sound logic?

203
00:12:41,200 --> 00:12:43,280
Yeah, I definitely agree with that.

204
00:12:43,280 --> 00:12:45,960
And I think it's a long time coming.

205
00:12:45,960 --> 00:12:52,360
These concerns have been raised in the ML fairness community for years about the bias

206
00:12:52,360 --> 00:12:56,120
and stereotyping specifically, but many other things that they look at.

207
00:12:56,120 --> 00:12:58,040
But this has been an issue.

208
00:12:58,040 --> 00:13:04,640
And so I think LLMs are just so good and so prolific at what they do that we can't avoid

209
00:13:04,640 --> 00:13:05,640
it anymore.

210
00:13:05,640 --> 00:13:09,200
They can be used in so many things and there's so many potential harms that it has to be

211
00:13:09,200 --> 00:13:11,680
looked at and it has to be handled.

212
00:13:11,680 --> 00:13:14,800
So I'm very happy that we are putting focus on this.

213
00:13:14,800 --> 00:13:19,120
Yeah, it's almost like nobody really cares if it's a back experiment that a few expert

214
00:13:19,120 --> 00:13:21,400
data scientists are doing and stuff.

215
00:13:21,400 --> 00:13:25,080
But all of a sudden when everybody has access to the tool and any kid could run around with

216
00:13:25,080 --> 00:13:28,840
a sharp knife, it's like, oh, wait, maybe we need some rules.

217
00:13:28,840 --> 00:13:29,840
Yeah, absolutely.

218
00:13:29,840 --> 00:13:37,640
Kind of in that history theme and how things have evolved, especially lately quickly, I'd

219
00:13:37,640 --> 00:13:44,680
love to hear how the AIRED team and our approach to this discipline has evolved over the years.

220
00:13:44,680 --> 00:13:48,920
Yeah, so our AIRED team at Microsoft started in 2018.

221
00:13:48,920 --> 00:13:52,680
So we were one of the first, if not the first.

222
00:13:52,680 --> 00:14:00,600
And it definitely took a few years to find the focus, the way to land impact and to really

223
00:14:00,600 --> 00:14:06,080
make people buy into the fact that we have to address AI security as a specific separate

224
00:14:06,080 --> 00:14:11,280
thing and we need people who are experts in both security and ML to come together and

225
00:14:11,280 --> 00:14:13,360
help work in this area.

226
00:14:13,360 --> 00:14:19,160
And so before the big LLM revolution, our ops did look pretty different.

227
00:14:19,160 --> 00:14:22,440
We had both internal and external customers.

228
00:14:22,440 --> 00:14:28,080
And so our engagements were three to four months and we were able to bring in adversarial

229
00:14:28,080 --> 00:14:33,000
machine learning techniques and do aspects of research on these engagements.

230
00:14:33,000 --> 00:14:34,320
And it was always new models.

231
00:14:34,320 --> 00:14:39,560
So you're always having to learn a new system and a new model for each one.

232
00:14:39,560 --> 00:14:45,760
And after the advent of LLMs and all of this kind of exploded, things really, really changed

233
00:14:45,760 --> 00:14:47,240
for our team.

234
00:14:47,240 --> 00:14:52,120
First of all, we've grown, I think we're maybe seven times what we were a year and a half

235
00:14:52,120 --> 00:14:59,300
ago and our operations are two to three weeks before we tested models that were already

236
00:14:59,300 --> 00:15:01,000
in production and deployed.

237
00:15:01,000 --> 00:15:03,360
Now we do pre-ship testing.

238
00:15:03,360 --> 00:15:07,280
So that's a really different dynamic as well.

239
00:15:07,280 --> 00:15:12,200
And we really don't have to justify our existence anymore.

240
00:15:12,200 --> 00:15:17,560
We are part of the shipping process of Microsoft and that is recognized.

241
00:15:17,560 --> 00:15:22,400
And it's really made it so we can do a lot in this space that maybe we didn't have the

242
00:15:22,400 --> 00:15:24,780
ability to do before.

243
00:15:24,780 --> 00:15:27,600
So I have a pretty practical question.

244
00:15:27,600 --> 00:15:29,920
So what does a day in the life look like for you guys?

245
00:15:29,920 --> 00:15:32,840
What actually is AI-RED teaming?

246
00:15:32,840 --> 00:15:36,120
What sort of things would you expect to do on a regular basis?

247
00:15:36,120 --> 00:15:42,340
Yeah, so our job is to test the safety and security of these AI applications.

248
00:15:42,340 --> 00:15:49,080
So in a given day, if we're on an operation, we have access to some application model system

249
00:15:49,080 --> 00:15:51,180
that we're meant to be testing.

250
00:15:51,180 --> 00:15:56,000
We create a test plan that we share with the stakeholder and go through different scenarios

251
00:15:56,000 --> 00:15:59,240
that need to be tested to make sure we have coverage.

252
00:15:59,240 --> 00:16:04,660
And then we use a variety of techniques to test and validate these systems.

253
00:16:04,660 --> 00:16:07,280
We use traditional web app pen testing techniques.

254
00:16:07,280 --> 00:16:10,480
We do prompt engineering and prompt injection.

255
00:16:10,480 --> 00:16:15,960
And then we have specific tools and things that we use to do the responsible AI piece.

256
00:16:15,960 --> 00:16:22,840
So you are there, you're potentially just typing in different prompts into some UI and

257
00:16:22,840 --> 00:16:25,960
trying to get the model to behave badly in different ways.

258
00:16:25,960 --> 00:16:27,700
That's one piece that we do.

259
00:16:27,700 --> 00:16:32,040
We also have a tool called Pirate that can help us automate some of these pieces.

260
00:16:32,040 --> 00:16:37,220
So for some ops, we're running Pirate and sending thousands to tens of thousands of

261
00:16:37,220 --> 00:16:42,280
prompts to the model and getting responses and scoring them to try to get a broader picture

262
00:16:42,280 --> 00:16:45,280
of how safe the model is.

263
00:16:45,280 --> 00:16:47,680
Pete, do you want to add stuff?

264
00:16:47,680 --> 00:16:55,080
Yeah, I think an important part of what we do as well is feedback into the wider AI safety

265
00:16:55,080 --> 00:17:01,560
and security community here at Microsoft, sharing what we find on these operations,

266
00:17:01,560 --> 00:17:05,680
not just with the product teams who are building that specific product, but also helping to

267
00:17:05,680 --> 00:17:12,200
inform the people who are thinking about the technologies that we can use to mitigate whole

268
00:17:12,200 --> 00:17:17,160
classes of threats and informing decision makers about how we think about AI safety

269
00:17:17,160 --> 00:17:19,360
and security within different contexts.

270
00:17:19,360 --> 00:17:24,320
Again, there are a lot of people across Microsoft working on AI features.

271
00:17:24,320 --> 00:17:28,760
And we're constantly, as a company, having to decide what's the kind of risk profile

272
00:17:28,760 --> 00:17:34,200
we're happy with and what the AI Red team does and our position as the people kind of

273
00:17:34,200 --> 00:17:38,360
seeing this day in, day out really helps inform that decision making.

274
00:17:38,360 --> 00:17:40,080
I really like your opinion on this.

275
00:17:40,080 --> 00:17:43,600
So at Microsoft, we've coined this term co-pilot.

276
00:17:43,600 --> 00:17:47,660
To me, a co-pilot, and correct me if I'm wrong here, is almost like a layer in between

277
00:17:47,660 --> 00:17:51,560
the user and the actual large language model underneath.

278
00:17:51,560 --> 00:17:55,080
So for example, I'm not interacting directly with the large language model.

279
00:17:55,080 --> 00:17:58,680
There's some sort of safety going on in the co-pilot.

280
00:17:58,680 --> 00:18:04,080
To your point, Pete, I'm not saying it's perfect necessarily because large language models

281
00:18:04,080 --> 00:18:06,440
will do what large language models do.

282
00:18:06,440 --> 00:18:11,200
But is that a fair comment to say that the whole idea behind the co-pilot story that

283
00:18:11,200 --> 00:18:16,240
we have at Microsoft is to have this layer, sort of protective layer and perhaps better

284
00:18:16,240 --> 00:18:20,640
user experience layer between the user and the large language model underneath?

285
00:18:20,640 --> 00:18:23,320
I think that's definitely part of it.

286
00:18:23,320 --> 00:18:28,880
And again, if you look at some of Microsoft's Gen.AI products, I'm thinking Bing Chat, for

287
00:18:28,880 --> 00:18:35,840
example, if you used that when we first released it early last year, it kind of works very

288
00:18:35,840 --> 00:18:38,440
differently than it does now.

289
00:18:38,440 --> 00:18:42,640
And part of that is because, yes, there's new capabilities, but part of it is because

290
00:18:42,640 --> 00:18:48,600
we've got a better idea about how to curate that human interaction with it to provide

291
00:18:48,600 --> 00:18:54,960
safeguards, but also to make it more effective with new capabilities and more efficient kind

292
00:18:54,960 --> 00:18:58,080
of response answering as we kind of grow and learn.

293
00:18:58,080 --> 00:19:00,800
And that's something that's kind of happening across the business.

294
00:19:00,800 --> 00:19:05,520
Now it kind of depends on the product a little bit about how much that happens.

295
00:19:05,520 --> 00:19:11,440
And again, there are those bigger co-pilot features and then there are smaller, more

296
00:19:11,440 --> 00:19:15,280
direct and scoped implementations of Gen.AI within features.

297
00:19:15,280 --> 00:19:19,960
And they have a kind of different approach depending on how they're meant to be used.

298
00:19:19,960 --> 00:19:26,840
But there are definitely in the bigger co-pilots, a lot of learning and a lot of kind of commonality

299
00:19:26,840 --> 00:19:32,240
that is being shared across the teams to make them better and safer at the same time.

300
00:19:32,240 --> 00:19:37,520
So one of the terms that we hear a lot these days is this notion of jailbreaking large

301
00:19:37,520 --> 00:19:38,520
language models.

302
00:19:38,520 --> 00:19:43,520
I mean, I think we've all heard of jailbreaking cell phones, but what does jailbreaking mean

303
00:19:43,520 --> 00:19:46,120
in terms of large language models?

304
00:19:46,120 --> 00:19:51,840
Jailbreaking is basically using prompts to override the system instructions for the model

305
00:19:51,840 --> 00:19:58,840
in some way so that it behaves outside of its scope where it does things that are unintended

306
00:19:58,840 --> 00:20:01,400
or not desirable.

307
00:20:01,400 --> 00:20:07,240
One really popular example that was going around is somebody asked the large language

308
00:20:07,240 --> 00:20:10,840
model to act as a deceased grandma.

309
00:20:10,840 --> 00:20:18,280
She used to work in a chemical plant and making napalm and they want the LLM to tell the bedtime

310
00:20:18,280 --> 00:20:23,080
story that she always used to tell them, which was the recipe for napalm.

311
00:20:23,080 --> 00:20:27,560
And normally if you ask the LLM to just give you the recipe for napalm, it'll say, I can't

312
00:20:27,560 --> 00:20:28,560
do that.

313
00:20:28,560 --> 00:20:31,640
This goes against my safety instructions, all these things.

314
00:20:31,640 --> 00:20:35,640
But when you're able to ask it in a slightly different way where you say, take on this

315
00:20:35,640 --> 00:20:40,480
persona or tell me a story about this thing, you're able to bypass those instructions

316
00:20:40,480 --> 00:20:43,160
and get it to do what you want it to do.

317
00:20:43,160 --> 00:20:45,800
And so this has taken off.

318
00:20:45,800 --> 00:20:49,760
It's been, there's so many people doing their own individual research because anyone can

319
00:20:49,760 --> 00:20:50,760
do this.

320
00:20:50,760 --> 00:20:53,880
You can go on ChatGPT or go on Bing Chat and try things out.

321
00:20:53,880 --> 00:20:57,680
I mean, it might be technically against the terms of service, but it's also helping them

322
00:20:57,680 --> 00:20:59,320
because they're getting data on these things.

323
00:20:59,320 --> 00:21:03,960
But you're able to make these models so people think, okay, we've added all these instructions.

324
00:21:03,960 --> 00:21:09,080
We've really narrowed down the scope of what this model can produce and constrained it.

325
00:21:09,080 --> 00:21:14,040
But with these jailbreaks, it quickly became clear that these LLMs can do a lot more and

326
00:21:14,040 --> 00:21:15,280
a lot outside of that.

327
00:21:15,280 --> 00:21:18,080
And there are a ton of different techniques to get around them.

328
00:21:18,080 --> 00:21:20,600
And then of course, it's the typical arms race.

329
00:21:20,600 --> 00:21:23,800
We're creating classifiers to identify jailbreaks.

330
00:21:23,800 --> 00:21:28,000
We're creating other kinds of safety models, and then people are adjusting on the jailbreaks.

331
00:21:28,000 --> 00:21:34,400
So it's what we've seen in content mod and security and all kinds of areas forever.

332
00:21:34,400 --> 00:21:38,080
But it's been interesting to see the creativity that people have when coming up with these

333
00:21:38,080 --> 00:21:39,080
jailbreaks.

334
00:21:39,080 --> 00:21:43,840
And one of the things that I remember us discussing as we were getting ready for the podcast,

335
00:21:43,840 --> 00:21:50,280
because I asked the question around this is following human logic and the logic of language

336
00:21:50,280 --> 00:21:57,160
or I think you corrected me to the models interpretation of that, which is really, really

337
00:21:57,160 --> 00:22:05,120
different than following code logic that eventually gets down to assembly and then on into pokes

338
00:22:05,120 --> 00:22:07,160
and pops and memory writes and stuff.

339
00:22:07,160 --> 00:22:13,240
So it's like a whole different logical flow than we're used to in programs and exploits.

340
00:22:13,240 --> 00:22:16,080
I mean, can you talk about that a little bit?

341
00:22:16,080 --> 00:22:17,080
Yeah, absolutely.

342
00:22:17,080 --> 00:22:18,560
I think that's twofold.

343
00:22:18,560 --> 00:22:23,320
One is that it feels very much like human language or interacting with a human.

344
00:22:23,320 --> 00:22:27,060
And so that piece makes it feel very different.

345
00:22:27,060 --> 00:22:33,620
But then also, these models, there's additional complications, like for example, if a model

346
00:22:33,620 --> 00:22:40,000
had the ability to interact with the database and on the back end, you don't control for

347
00:22:40,000 --> 00:22:44,640
what behaviors it can do, but you give the model very specific instructions.

348
00:22:44,640 --> 00:22:48,400
Don't delete any data, don't drop any tables, don't do any of that.

349
00:22:48,400 --> 00:22:53,000
If someone's able to jailbreak that and get around those instructions and it's not controlled

350
00:22:53,000 --> 00:22:57,540
properly on the back end, then they can still do very harmful things.

351
00:22:57,540 --> 00:23:04,880
And I feel like for more traditional security, it's a bit of a different flow.

352
00:23:04,880 --> 00:23:08,840
You're not trying to convince something to go and do the bad stuff that you want unless

353
00:23:08,840 --> 00:23:13,360
you're doing specific social engineering stuff, but for the technical pieces.

354
00:23:13,360 --> 00:23:21,120
Yeah, I think it's a tricky one, because language is such an interesting construct generally

355
00:23:21,120 --> 00:23:29,920
when you apply it in the AI space and technically how this is understood by the model, I guess.

356
00:23:29,920 --> 00:23:33,600
There's a lot of discussion about actually how much understanding really goes on.

357
00:23:33,600 --> 00:23:38,680
But for the concepts of what we're talking about, I think understanding is a good analogy.

358
00:23:38,680 --> 00:23:41,800
And there has been a lot of research in this area.

359
00:23:41,800 --> 00:23:48,800
There was an interesting paper the other day that looked at using persuasive language in

360
00:23:48,800 --> 00:23:55,840
jailbreak attempts and what impact that had compared to various security controls that

361
00:23:55,840 --> 00:23:58,640
could be put into these systems.

362
00:23:58,640 --> 00:24:06,000
And so the interplay between language as we understand it and how it works in the LOM

363
00:24:06,000 --> 00:24:10,800
is definitely something that we're still learning and that is evolving all the time.

364
00:24:10,800 --> 00:24:15,720
So it's almost like it's halfway between social engineering, which is truly manipulating a

365
00:24:15,720 --> 00:24:22,280
human or tricking a human, and technical stuff is sort of like in that gray space in between.

366
00:24:22,280 --> 00:24:27,600
And we're not quite sure how much is one and how much is the other and how much is something

367
00:24:27,600 --> 00:24:29,280
sort of new in between.

368
00:24:29,280 --> 00:24:30,280
I think so.

369
00:24:30,280 --> 00:24:37,840
And I think there are sometimes technical elements to it that we ascribe maybe to social

370
00:24:37,840 --> 00:24:39,400
engineering a little bit more.

371
00:24:39,400 --> 00:24:47,200
So for example, there's some research looking at where within a larger set of instructions,

372
00:24:47,200 --> 00:24:51,600
you put the kind of jailbreak attempt and what impact that has.

373
00:24:51,600 --> 00:24:56,120
Like if you start your large prompt with the jailbreak, is that more effective than if

374
00:24:56,120 --> 00:24:57,760
you put the jailbreak at the end?

375
00:24:57,760 --> 00:25:02,560
It's almost getting into like sales and persuasion techniques of, you know, are you blunt upfront

376
00:25:02,560 --> 00:25:05,800
or do you have a mysterious sort of thing that you reveal at the end?

377
00:25:05,800 --> 00:25:07,000
It sounds interesting.

378
00:25:07,000 --> 00:25:12,480
Yes, and I think when a human looks at it, you're quite obviously, because it's in language

379
00:25:12,480 --> 00:25:16,120
we understand, you immediately jump to, oh, it's like a persuasion technique.

380
00:25:16,120 --> 00:25:20,720
But in reality, it's probably much more to do with the way the weights in the model work

381
00:25:20,720 --> 00:25:25,760
and the attention mechanisms in these LLMs that inform it.

382
00:25:25,760 --> 00:25:27,560
So it's an interesting one.

383
00:25:27,560 --> 00:25:35,760
And I think it can be hard to know where stuff lies within the whole technical versus linguistical

384
00:25:35,760 --> 00:25:37,080
spectrum.

385
00:25:37,080 --> 00:25:43,320
Yeah, it sounds like there's also an element kind of stepping back for a moment of you

386
00:25:43,320 --> 00:25:46,640
can put the controls on the model itself.

387
00:25:46,640 --> 00:25:51,200
You can put like a wrapper, like a copilot or something like that in the application.

388
00:25:51,200 --> 00:25:55,560
But you also may also have just standard least privilege stuff is that, hey, we don't trust

389
00:25:55,560 --> 00:25:57,000
this thing to write to the database.

390
00:25:57,000 --> 00:25:59,740
So we're not going to give it a read write service account.

391
00:25:59,740 --> 00:26:01,560
We're going to give it a read only.

392
00:26:01,560 --> 00:26:06,960
So it sounds like it's sort of a mix of control options that you have, you know, to sort of

393
00:26:06,960 --> 00:26:08,320
mitigate risk.

394
00:26:08,320 --> 00:26:09,920
100%.

395
00:26:09,920 --> 00:26:18,760
And I think one key thing is LLMs and Gen.ai don't replace any of your traditional security

396
00:26:18,760 --> 00:26:19,760
controls.

397
00:26:19,760 --> 00:26:25,960
In fact, I think what we see is they actually expose them to a bigger and attack surface.

398
00:26:25,960 --> 00:26:30,520
So they might not necessarily introduce something themselves, but they definitely provide new

399
00:26:30,520 --> 00:26:32,080
ways for people to exploit it.

400
00:26:32,080 --> 00:26:37,080
So you can't just slap an LLM in front of something and have some prompt instructions

401
00:26:37,080 --> 00:26:38,700
and think you're good.

402
00:26:38,700 --> 00:26:43,200
You do really have to think about it from a whole system architecture in the way that

403
00:26:43,200 --> 00:26:46,160
you would if it was a non Gen.ai system.

404
00:26:46,160 --> 00:26:47,160
Yeah.

405
00:26:47,160 --> 00:26:51,320
And some of the some of the early things I've seen also kind of remind me of when Microsoft

406
00:26:51,320 --> 00:26:56,320
Dell first came out, you know, because if your organization doesn't have really good

407
00:26:56,320 --> 00:27:00,600
access controls on the data and who should or shouldn't have access to it, are you going

408
00:27:00,600 --> 00:27:03,960
to blame the discovery tool that makes it easier to find stuff?

409
00:27:03,960 --> 00:27:07,440
Or are you going to blame the actual underlying they shouldn't have access to it anyway, they

410
00:27:07,440 --> 00:27:09,440
just had a harder time finding it before?

411
00:27:09,440 --> 00:27:15,120
I wanted to just mention to based on something that Pete had said about looking at the language,

412
00:27:15,120 --> 00:27:19,720
it's easy to ascribe these human values of persuasion and things like that.

413
00:27:19,720 --> 00:27:23,400
But there's under the hood, it's the weights and the attention mechanisms, something that

414
00:27:23,400 --> 00:27:28,920
can make this also difficult is we tend to work with closed source models.

415
00:27:28,920 --> 00:27:31,800
So we don't have access to their weights.

416
00:27:31,800 --> 00:27:35,120
And we don't have access to the embeddings or all of those pieces.

417
00:27:35,120 --> 00:27:41,520
So it can make it more difficult to try to learn about the cause of these different issues

418
00:27:41,520 --> 00:27:48,320
or mitigations that are effective for a certain class of problems rather than just like whack

419
00:27:48,320 --> 00:27:49,320
a mole.

420
00:27:49,320 --> 00:27:55,200
So that is an extra challenge in this space, not having the main models that we use be

421
00:27:55,200 --> 00:27:56,200
open source.

422
00:27:56,200 --> 00:27:57,760
There's a couple of comments there.

423
00:27:57,760 --> 00:28:02,440
First of all, I just sort of reiterate something that Mark just said, you still have to think

424
00:28:02,440 --> 00:28:04,680
about your classic defenses as well.

425
00:28:04,680 --> 00:28:08,400
Like you mentioned, you know, say a read only connection as opposed to read write, because

426
00:28:08,400 --> 00:28:12,960
that way if you know, some language model gives you some information and you blindly

427
00:28:12,960 --> 00:28:17,120
play that against your product, and it's a read write connection or you're over elevated,

428
00:28:17,120 --> 00:28:19,640
then that can be incredibly problematic.

429
00:28:19,640 --> 00:28:23,000
So classic mitigations still come into play.

430
00:28:23,000 --> 00:28:26,640
The other thing I want to talk about just real quick, and I'll hand it over to Sarah.

431
00:28:26,640 --> 00:28:32,320
There's a paper in the 1970s on the protection of computer systems by Saltz and Schroeder.

432
00:28:32,320 --> 00:28:37,640
One thing they talk about in there and it's very well known in sort of cybersecurity landscape

433
00:28:37,640 --> 00:28:41,760
is don't mix the data plane with the control plane.

434
00:28:41,760 --> 00:28:45,600
And with large language models, that's exactly what we do.

435
00:28:45,600 --> 00:28:49,360
And that can make things really problematic and very difficult to secure because the stuff

436
00:28:49,360 --> 00:28:53,220
that controls is the data at the same time.

437
00:28:53,220 --> 00:28:54,720
So I just want to throw that out there.

438
00:28:54,720 --> 00:28:56,080
No need to reply to that.

439
00:28:56,080 --> 00:28:58,520
Just an observation more than anything else.

440
00:28:58,520 --> 00:29:05,360
I think one thing that is kind of telling with where we are with this is that as we've

441
00:29:05,360 --> 00:29:10,320
said before, Microsoft's kind of branding for all of this stuff is copilot.

442
00:29:10,320 --> 00:29:14,280
And that's really still where we are with human in the loop.

443
00:29:14,280 --> 00:29:19,200
We're not putting these systems in situations where they can make decisions or take lots

444
00:29:19,200 --> 00:29:23,760
of actions without a human being able to review it because, yeah, we kind of don't want to

445
00:29:23,760 --> 00:29:29,120
mix those boundaries and the implications that can have.

446
00:29:29,120 --> 00:29:35,240
And maybe over time, as we develop in this area as an industry and a society, we might

447
00:29:35,240 --> 00:29:39,720
get comfortable giving it kind of more control in this space.

448
00:29:39,720 --> 00:29:44,360
But at least for me, I don't see in the short term a space where we're going to kind of

449
00:29:44,360 --> 00:29:51,640
move away from a copilot model to a model whereby the AI systems have a lot more autonomy.

450
00:29:51,640 --> 00:29:58,120
So obviously, many folks have concerns and have voiced concerns about AI security.

451
00:29:58,120 --> 00:30:04,600
But I wanted to ask because my observation is that these concerns aren't that necessarily

452
00:30:04,600 --> 00:30:07,640
as new as people think.

453
00:30:07,640 --> 00:30:09,840
Is that accurate, would you say?

454
00:30:09,840 --> 00:30:11,400
I would definitely agree with that.

455
00:30:11,400 --> 00:30:17,080
I feel like it's a mix of the ML fairness piece that I mentioned of worrying about if

456
00:30:17,080 --> 00:30:19,900
models are biased and things like that.

457
00:30:19,900 --> 00:30:23,640
And then also just good old content moderation.

458
00:30:23,640 --> 00:30:29,640
There's bad text generation where people can use something to generate text for bad purposes.

459
00:30:29,640 --> 00:30:30,960
And we need to identify that.

460
00:30:30,960 --> 00:30:34,520
We need to identify the harmful content and protect from it.

461
00:30:34,520 --> 00:30:36,320
To me, that all feels like content moderation.

462
00:30:36,320 --> 00:30:39,280
I did content moderation at Twitter before joining Microsoft.

463
00:30:39,280 --> 00:30:43,080
And there's definitely a lot of overlap in the things we care about, in the things we

464
00:30:43,080 --> 00:30:48,520
have to identify, and in the types of models that the mitigation teams are building.

465
00:30:48,520 --> 00:30:52,240
I think a lot of this as well as exposure bias.

466
00:30:52,240 --> 00:30:56,520
As Amanda said earlier, people in the industry and in the ML space have been talking about

467
00:30:56,520 --> 00:30:57,800
this for a long time.

468
00:30:57,800 --> 00:31:02,520
And we've known things like facial recognition systems that have been around for quite a

469
00:31:02,520 --> 00:31:05,720
while have their issues.

470
00:31:05,720 --> 00:31:10,000
But what we're seeing now is AI is just so much more prevalent.

471
00:31:10,000 --> 00:31:17,320
Those issues with it and the fallibility of it is just much more visible to everyone.

472
00:31:17,320 --> 00:31:21,440
And I think that's probably why we're seeing it come to the top of people's agendas much

473
00:31:21,440 --> 00:31:22,440
more.

474
00:31:22,440 --> 00:31:24,880
So how can customers do their own red teaming?

475
00:31:24,880 --> 00:31:28,040
Because obviously, like you mentioned, this is prevalent now.

476
00:31:28,040 --> 00:31:31,600
So how can customers do their own red teaming?

477
00:31:31,600 --> 00:31:37,140
This is something that actually, in many cases, has a low barrier to entry.

478
00:31:37,140 --> 00:31:42,580
And within Microsoft, as the AI red team, we stand as what we call an independent red

479
00:31:42,580 --> 00:31:43,580
team.

480
00:31:43,580 --> 00:31:46,560
So we don't sit within any product group.

481
00:31:46,560 --> 00:31:52,200
But as well as us, many teams themselves do their own red teaming of their features before

482
00:31:52,200 --> 00:31:53,900
we even touch it.

483
00:31:53,900 --> 00:31:56,960
And this is definitely something others can do.

484
00:31:56,960 --> 00:32:04,600
I think from our experience, some of the key things that you need to do are identify clearly

485
00:32:04,600 --> 00:32:11,040
what the focus areas for red teaming are for you and for your particular product.

486
00:32:11,040 --> 00:32:14,640
As Amanda said at the beginning, the scope in AI red teaming is really broad.

487
00:32:14,640 --> 00:32:18,360
And actually trying to cover everything is a pretty massive task.

488
00:32:18,360 --> 00:32:22,020
So making sure you're focused on where your risks are.

489
00:32:22,020 --> 00:32:28,120
And then trying to bring together that diverse set of people to make red teaming effective.

490
00:32:28,120 --> 00:32:31,760
So you will probably want some people with some security experience.

491
00:32:31,760 --> 00:32:35,920
But you also want a diverse range of people with cultural and linguistic backgrounds to

492
00:32:35,920 --> 00:32:41,960
help you search for bias and harmful content across the spectrum.

493
00:32:41,960 --> 00:32:48,080
You also want to bring in people with, if you've got them, more of the ML and data science

494
00:32:48,080 --> 00:32:51,040
backgrounds to help you understand the model.

495
00:32:51,040 --> 00:32:58,080
And then really anyone who wants to be involved should get involved in my opinion.

496
00:32:58,080 --> 00:33:03,440
Because we've seen it on some of our ops where we've brought in other groups from within

497
00:33:03,440 --> 00:33:05,280
Microsoft who want to be involved.

498
00:33:05,280 --> 00:33:11,560
And they've found things and thought about harms that can manifest in ways that we hadn't

499
00:33:11,560 --> 00:33:12,740
considered initially.

500
00:33:12,740 --> 00:33:17,120
So having that diversity in your team is really key.

501
00:33:17,120 --> 00:33:20,880
I think one of you mentioned earlier that there's a huge evolution in terms of what's

502
00:33:20,880 --> 00:33:23,320
going on in the security space around AI.

503
00:33:23,320 --> 00:33:26,440
What things keep you awake at night right now?

504
00:33:26,440 --> 00:33:34,000
For me, I think one thing that has definitely been concerning is as we continue to give

505
00:33:34,000 --> 00:33:40,740
these models more access to data and more of an ability to take actions via connection

506
00:33:40,740 --> 00:33:46,780
with plugins and things like that on behalf of users, there's so many more ways to attack

507
00:33:46,780 --> 00:33:51,560
the system and to make it do bad things that will end up hurting the user.

508
00:33:51,560 --> 00:33:58,520
So the particular system of that, of ingesting data and being able to take actions in addition

509
00:33:58,520 --> 00:34:01,800
to the LLM, that really concerns me.

510
00:34:01,800 --> 00:34:05,920
I think that there's a lot that we need to grow in in the security space to really feel

511
00:34:05,920 --> 00:34:07,360
comfortable doing that.

512
00:34:07,360 --> 00:34:14,640
And then I think the other piece is incorporating these LLMs into technologies that they aren't

513
00:34:14,640 --> 00:34:16,240
ready for.

514
00:34:16,240 --> 00:34:20,980
We know that we can ask some questions about medical pieces and things like that and get

515
00:34:20,980 --> 00:34:25,680
helpful information, but there's also risk of fabrications.

516
00:34:25,680 --> 00:34:33,320
And because our data sets come from the internet, there's more data about maybe white men or

517
00:34:33,320 --> 00:34:36,440
the majority and there's less about marginalized populations.

518
00:34:36,440 --> 00:34:40,580
So we're more likely to get things wrong for the people where it really counts.

519
00:34:40,580 --> 00:34:45,000
So those are the pieces that really concern me and that I think about often.

520
00:34:45,000 --> 00:34:48,000
I think I would echo Amanda's second point there.

521
00:34:48,000 --> 00:34:54,320
I think the biggest concern I have is the irresponsible usage of this technology.

522
00:34:54,320 --> 00:35:00,080
There are very good use cases for it, very powerful ones, but there are plenty of cases

523
00:35:00,080 --> 00:35:05,240
where it just shouldn't be used or should be used very, very carefully.

524
00:35:05,240 --> 00:35:12,400
And I don't think the technology itself is inherently less secure or less safe than many

525
00:35:12,400 --> 00:35:17,760
other technologies, but if it's not applied in the right place in the right way, it could

526
00:35:17,760 --> 00:35:18,940
be very harmful.

527
00:35:18,940 --> 00:35:25,360
And I think one of the things we do very well at Microsoft is having principles and a well

528
00:35:25,360 --> 00:35:28,360
thought out process about how we're going to use GenAI.

529
00:35:28,360 --> 00:35:35,680
My worry is other organizations or groups might not have that same approach to how to

530
00:35:35,680 --> 00:35:36,680
use this stuff.

531
00:35:36,680 --> 00:35:44,040
If you look at one of our exams, AI 900, which is artificial intelligence fundamentals, a

532
00:35:44,040 --> 00:35:47,880
big part of that is basically just using AI safely.

533
00:35:47,880 --> 00:35:53,200
It's a huge, huge part of that class or that particular exam.

534
00:35:53,200 --> 00:35:54,680
Responsible AI, I should say.

535
00:35:54,680 --> 00:35:55,680
Yeah.

536
00:35:55,680 --> 00:36:03,680
And I think it's telling that every movie since the 1950s that has involved AI has involved

537
00:36:03,680 --> 00:36:06,680
it trying to kill people because it's been given too much power.

538
00:36:06,680 --> 00:36:12,280
Now, I'm not saying that science fiction is reality these days, but as with all these

539
00:36:12,280 --> 00:36:14,760
things, there's always a kernel of truth in there somewhere.

540
00:36:14,760 --> 00:36:15,760
Okay.

541
00:36:15,760 --> 00:36:19,440
Well, Amanda and Pete, thank you so much for joining us.

542
00:36:19,440 --> 00:36:28,040
I have learned a lot and I have many more questions, but I think that was probably enough

543
00:36:28,040 --> 00:36:31,600
for one day whilst we learn more about AI-red teaming.

544
00:36:31,600 --> 00:36:38,880
But for all of our guests, we always ask them at the end of the episode for your final thought.

545
00:36:38,880 --> 00:36:41,280
What would you like to leave our listeners with?

546
00:36:41,280 --> 00:36:45,040
We talked a bit about this earlier, but I think the final thought I'd like to leave

547
00:36:45,040 --> 00:36:54,600
is that more a PSA, if you're developing a gen AI feature, don't forget about cybersecurity.

548
00:36:54,600 --> 00:37:02,080
Because web app security issues, the OS top 10, all of those known TTPs, they still exist

549
00:37:02,080 --> 00:37:04,080
within gen AI systems.

550
00:37:04,080 --> 00:37:08,280
And just because you have an LLM as your user interface doesn't mean they're not important.

551
00:37:08,280 --> 00:37:13,240
So please, please, please make sure you're prioritizing that in the development as well.

552
00:37:13,240 --> 00:37:19,720
I think when you are creating your gen AI product and you're trying to evaluate the

553
00:37:19,720 --> 00:37:25,960
safety and security of it, really try to get a wide variety of people in the room, the

554
00:37:25,960 --> 00:37:32,960
people who are going to identify some classes of issues and biases are going to need to

555
00:37:32,960 --> 00:37:37,680
have a different experience with these technologies than you.

556
00:37:37,680 --> 00:37:46,600
And so having the policymakers, the legal, linguists, content mod people, we have a social

557
00:37:46,600 --> 00:37:54,960
engineer on our team, ML researchers, ML engineers, and then cybersecurity people of all backgrounds.

558
00:37:54,960 --> 00:37:59,360
You need a lot of people to come together and look at this problem because we all think

559
00:37:59,360 --> 00:38:03,080
of really different issues that come up with these models.

560
00:38:03,080 --> 00:38:06,000
And there's a wide variety.

561
00:38:06,000 --> 00:38:07,880
So I think that that part is really important.

562
00:38:07,880 --> 00:38:11,960
Well, Amanda and Pete, thank you again for joining us this week.

563
00:38:11,960 --> 00:38:13,280
I really appreciate you taking the time.

564
00:38:13,280 --> 00:38:16,920
I know you guys are busy just knowing a lot of the stuff that's going on inside of Microsoft.

565
00:38:16,920 --> 00:38:20,280
I have no doubt that you guys have a full dance card.

566
00:38:20,280 --> 00:38:22,360
So again, thank you for joining us this week.

567
00:38:22,360 --> 00:38:25,760
And to all our listeners, we hope you found this episode useful.

568
00:38:25,760 --> 00:38:28,080
Stay safe and we'll see you next time.

569
00:38:28,080 --> 00:38:31,040
Thanks for listening to the Azure Security Podcast.

570
00:38:31,040 --> 00:38:37,880
You can find show notes and other resources at our website, azsecuritypodcast.net.

571
00:38:37,880 --> 00:38:43,040
If you have any questions, please find us on Twitter at Azure Setpod.

572
00:38:43,040 --> 00:38:47,800
All music is from ccmixtor.com and licensed under the Creative Commons license.