1
00:00:00,000 --> 00:00:09,440
Welcome to the Azure Security podcast where we discuss topics relating to security, privacy,

2
00:00:09,440 --> 00:00:12,640
reliability and compliance on the Microsoft Cloud Platform.

3
00:00:13,920 --> 00:00:18,160
Hey everybody, welcome to episode 27. We have a full house this week.

4
00:00:18,160 --> 00:00:22,160
We have Sarah, Mark, Gladys and myself. We also have a guest, Sharon Xia.

5
00:00:22,880 --> 00:00:28,000
She's here to speak to us about applied data science in cybersecurity. But before we get to

6
00:00:28,000 --> 00:00:31,280
Sharon, let's take a quick look at the news. Mark, why don't you kick things off?

7
00:00:32,720 --> 00:00:39,680
Our first piece of news is that the Open Group has released the Zero Trust Core Principles white

8
00:00:39,680 --> 00:00:44,240
paper. It's free, but it's behind a registration wall, so you have to sign up for an account if you're

9
00:00:44,240 --> 00:00:50,320
not already part of the Open Group. I'm actually co-chair of the Zero Trust Architecture Working

10
00:00:50,320 --> 00:00:56,320
Group over there. And the cool thing is, for those of you that are familiar with the Jericho Forum,

11
00:00:56,320 --> 00:01:01,600
which was the first formal challenge to the perimeter-centric view of security back in the day,

12
00:01:02,480 --> 00:01:08,080
we're talking probably 15 years ago now, that Jericho Forum was actually hosted in the Security Forum and

13
00:01:08,080 --> 00:01:12,080
became part of the Security Forum within the Open Group. And so I got to work with some of the

14
00:01:12,080 --> 00:01:18,080
original members of the Jericho Forum as we figured out how to modernize these ideas and recognize

15
00:01:18,080 --> 00:01:24,400
what Zero Trust is in that open, industry-agnostic way. And so those core principles came out and

16
00:01:24,400 --> 00:01:28,240
I think they're pretty good. I worked on them, so I'm a little biased, but I want to make sure everyone

17
00:01:28,240 --> 00:01:32,160
was aware that those are out there. It's another great reference point just like NIST or any others

18
00:01:32,160 --> 00:01:38,480
for sort of a vendor-agnostic view of what Zero Trust is, because there are a lot of vendor

19
00:01:39,280 --> 00:01:43,040
claims out there that you just buy my product and you get Zero Trust, which I know our customers

20
00:01:43,040 --> 00:01:49,440
are getting really tired of. The other thing that caught my eye was a publication

21
00:01:49,440 --> 00:01:55,760
of the top five VPN vulnerabilities that are being exploited by advanced actor groups.

22
00:01:55,760 --> 00:02:01,200
It was really interesting, triggered some thoughts, and then I'm actually going to put in two links

23
00:02:01,200 --> 00:02:07,360
there. One is the actual report, and the other is Microsoft's recommendations in

24
00:02:07,360 --> 00:02:14,640
this space, because we've seen a lot of this kind of exploitation of VPNs lately, because

25
00:02:14,640 --> 00:02:20,640
people don't patch them. Windows Update is easy. It's easy to patch, right? Or Microsoft Update,

26
00:02:20,640 --> 00:02:24,640
it's just easy to use those well-established channels or your iPhone or whatever it may be.

27
00:02:25,360 --> 00:02:30,640
But once you get into sort of appliances where you have to do downloads and other kind of stuff,

28
00:02:30,640 --> 00:02:35,360
it gets really kind of challenging and tends to get forgotten probably because it's hard,

29
00:02:35,360 --> 00:02:39,280
but it does. And so it's really important to get those patched, but there's more that you can do

30
00:02:39,280 --> 00:02:44,000
to secure a VPN, like making sure you're not keeping credentials on it. Just use Azure AD to do your

31
00:02:44,000 --> 00:02:50,480
authentication. Most vendors, almost all, I think all the major ones, do that and so you can

32
00:02:50,480 --> 00:02:54,560
authenticate with it. So there are ways to protect it above and beyond patching as well. So I wanted

33
00:02:54,560 --> 00:03:00,160
to point folks at that guidance. The Azure Network Security book came out, which I'm really excited

34
00:03:00,160 --> 00:03:04,880
about. So we'll get you a link there to take a look at that in case you're interested in

35
00:03:05,520 --> 00:03:10,720
learning a lot more about that. Yeah, I think I actually contributed a little excerpt to that

36
00:03:10,720 --> 00:03:14,880
particular book. The other thing I want to call folks' attention to is there's a Security

37
00:03:14,880 --> 00:03:19,840
Technical Content Library, essentially a technical catalog of all the security content and guidance

38
00:03:20,480 --> 00:03:24,640
that Microsoft publishes. And so I wanted to put that link in there for folks to find it.

39
00:03:25,200 --> 00:03:28,720
It's a great way to sort of find a lot of our security content and guidance in one place.

40
00:03:29,440 --> 00:03:33,440
Last one's a bit of a teaser. We are very close to being done with the cyber reference

41
00:03:33,440 --> 00:03:38,320
architecture, or the MCRA, as some people like to call it. It's a highly complex diagram with all the

42
00:03:38,320 --> 00:03:43,280
Microsoft cybersecurity technology. So that will be coming soon. We don't have a link or a

43
00:03:43,280 --> 00:03:48,400
download point yet, but we are actively working on getting that up and running and ready to go.

44
00:03:48,400 --> 00:03:54,560
That's all I got. Cool. So then it's me and I'm going to talk about, unsurprisingly,

45
00:03:54,560 --> 00:04:03,440
a ton of Sentinel things. But first of all, we now have some Azure policy-based data connectors

46
00:04:03,440 --> 00:04:09,600
for Sentinel, which is really cool because Azure Policy is useful. And of course, having that

47
00:04:09,600 --> 00:04:15,200
data coming into Sentinel is really, really helpful. So that's a good start. And as you know,

48
00:04:15,200 --> 00:04:20,000
we're always adding new data connectors to Sentinel. The next one I wanted to talk about,

49
00:04:20,000 --> 00:04:27,360
slightly different. We've now released in public preview a number of

50
00:04:27,360 --> 00:04:33,920
additional logs for Azure AD. So in the Azure AD connector in Sentinel, we used to just have

51
00:04:33,920 --> 00:04:38,960
sign-in logs and audit logs. There's now a number of additional log sources in there.

52
00:04:38,960 --> 00:04:46,160
Of particular note there is the non-interactive logons. Now, that was arguably, you could say,

53
00:04:46,160 --> 00:04:52,880
a bit of a blind spot, because that wasn't something we ingested. And in the

54
00:04:52,880 --> 00:04:59,360
context of a couple of things that have happened in the cyber world, some attacks that have happened

55
00:04:59,360 --> 00:05:05,520
in the past few months, the non-interactive logons are pretty important. So what we've done is you

56
00:05:05,520 --> 00:05:13,520
can now ingest them natively via the connector. And also, the MSTIC team — the acronym

57
00:05:13,520 --> 00:05:19,600
MSTIC, if you don't know it, stands for Microsoft Threat Intelligence Center — they've updated 24

58
00:05:19,600 --> 00:05:27,520
identity-related analytics rules that now perform correlations on those new

59
00:05:27,520 --> 00:05:32,400
non-interactive logons. There's a really cool blog post that one of the guys in my team,

60
00:05:32,400 --> 00:05:37,360
Yeneve wrote, and we'll link to it in the show notes. So go and check that out because

61
00:05:37,360 --> 00:05:41,680
it's something that you should definitely consider if you're already using Sentinel and

62
00:05:41,680 --> 00:05:46,960
ingesting Azure AD. You should definitely go and have a look at ingesting those logs.

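A minimal sketch of pulling those newly ingested logs out of a workspace from Python, for anyone who wants to poke at them. This assumes the azure-monitor-query and azure-identity packages, a workspace where the connector option is enabled, and reader rights; the workspace ID is a placeholder and the KQL is illustrative only.

    # A minimal sketch (assumptions: azure-monitor-query and
    # azure-identity are installed, the Azure AD connector is ingesting
    # non-interactive sign-ins, and the caller can read the workspace;
    # the workspace ID below is a placeholder).
    from datetime import timedelta
    from azure.identity import DefaultAzureCredential
    from azure.monitor.query import LogsQueryClient

    WORKSPACE_ID = "<your-log-analytics-workspace-id>"

    # Top applications by non-interactive sign-in volume over the last day.
    QUERY = """
    AADNonInteractiveUserSignInLogs
    | summarize SignIns = count() by AppDisplayName
    | top 10 by SignIns
    """

    client = LogsQueryClient(DefaultAzureCredential())
    result = client.query_workspace(WORKSPACE_ID, QUERY, timespan=timedelta(days=1))
    for table in result.tables:
        for row in table.rows:
            print(dict(zip(table.columns, row)))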
63
00:05:46,960 --> 00:05:53,840
And the last thing, slightly different: we did mention this a while ago on the show, but all the new

64
00:05:53,840 --> 00:06:08,240
security exams, the Microsoft security exams — so that's the SC-200, SC-300, SC-400, and SC-900 as well.

65
00:06:08,240 --> 00:06:15,600
They are not in beta anymore; they are now generally available. So if you are looking at

66
00:06:15,600 --> 00:06:22,080
taking them, then of course, you will now get your results straight away. I took them in beta,

67
00:06:22,080 --> 00:06:27,520
and I'm still waiting for my results. So fingers crossed, I passed them. But yeah, if you want to

68
00:06:27,520 --> 00:06:33,040
go and do them, they're now generally available. They're out of beta. So go and have a look.

69
00:06:33,680 --> 00:06:38,400
And keep your eyes out, because in the not too distant future,

70
00:06:38,400 --> 00:06:43,360
hopefully, there will be a lot more learning resources to help you study for those as well. Obviously, when there are

71
00:06:43,360 --> 00:06:48,160
new exams it takes a little while for that stuff to come out, but yeah, keep your eyes open because

72
00:06:48,160 --> 00:06:54,560
it's definitely coming. And yeah, that's my news this week. Sarah, I'm actually really excited

73
00:06:54,560 --> 00:07:01,520
about the SC-200 and am currently looking at the Azure Defender and Sentinel material. I think

74
00:07:01,520 --> 00:07:10,160
it's pretty good. Yeah, Gladys — I took SC-200 and SC-900 in beta, still waiting for my results.

75
00:07:10,160 --> 00:07:17,200
Going to be quite embarrassing if I fail either of them. I'll update everyone next time, maybe,

76
00:07:17,200 --> 00:07:24,800
if I've got my results by then. Maybe you could give me some hints. Anyway, so I wanted to talk

77
00:07:24,800 --> 00:07:32,960
about this website that I found that had a lot of interesting information. This podcast is a great

78
00:07:32,960 --> 00:07:39,760
source of information. However, there's so much that is happening in Azure that it's impossible

79
00:07:39,760 --> 00:07:48,080
for us to cover all of it. So I always wanted to see a website or a place with a list of

80
00:07:48,720 --> 00:07:56,080
all Azure services. And actually, I found one. I had not found it previously, but I came across

81
00:07:56,080 --> 00:08:04,880
this site called Azure Charts. I had no idea that there were 250 services in Azure.

82
00:08:04,880 --> 00:08:11,920
I thought it was fewer. Anyway, I was always asking myself, where can I find that list of

83
00:08:11,920 --> 00:08:19,440
information: user stories, information about the latest capability releases for each service,

84
00:08:19,440 --> 00:08:25,520
the list of regions where I could find the services, reference architectures, solution ideas,

85
00:08:26,160 --> 00:08:33,120
security. Actually, there's a section on security and compliance. And it even covers the services

86
00:08:33,120 --> 00:08:40,800
that have been retired. At first, it was a little bit difficult to navigate Azure Charts. I

87
00:08:40,800 --> 00:08:47,120
wasn't sure how to get the information. But there's this video called Azure Fundamentals.

88
00:08:48,080 --> 00:08:56,400
I think it's video 26. It's about 15 or 20 minutes, and it provides a quick

89
00:08:56,400 --> 00:09:02,960
overview of the site. So I really recommend watching this and then keeping the site as a

90
00:09:02,960 --> 00:09:11,760
source of information, because it's being updated all the time. The next thing that I wanted to

91
00:09:11,760 --> 00:09:21,120
talk about is Azure Purview. As everyone knows, I'm really a fan of labeling and classifying data.

92
00:09:22,160 --> 00:09:29,680
And I have spoken about Azure Purview. As I mentioned, it helps manage and govern on-prem,

93
00:09:29,680 --> 00:09:37,200
multi-cloud, and software-as-a-service structured data, such as databases and storage resources.

94
00:09:37,920 --> 00:09:46,240
It does this by labeling data within defined resource sets, using built-in and custom classifiers,

95
00:09:47,040 --> 00:09:55,120
and even Microsoft Information Protection sensitivity labels. So for example, Azure Data Lake Storage Gen2,

96
00:09:55,120 --> 00:10:05,280
Azure Blob Storage, and Azure Files are some examples of what resource sets can be used for.

97
00:10:05,840 --> 00:10:12,960
So now Purview resource set pattern rules are available. And what this does is that

98
00:10:13,840 --> 00:10:20,720
it allows you to customize or override how Azure Purview detects which assets

99
00:10:20,720 --> 00:10:26,560
are grouped in the resource set and how they're displayed within the catalog.

100
00:10:26,560 --> 00:10:31,600
Thanks, Gladys. Hey, there are a few things that took my interest this week. The first one is a

101
00:10:32,640 --> 00:10:37,760
feature that's in preview for Azure Automation, and that's support for managed identities.

102
00:10:37,760 --> 00:10:43,280
As I've mentioned on, I think, every single podcast so far, one thing you'll see more and more

103
00:10:43,280 --> 00:10:49,680
is more services moving to use managed identities, because that way, storing the credential is

104
00:10:49,680 --> 00:10:54,160
actually managed by Azure, and you don't have to worry about where that credential is stored

105
00:10:54,160 --> 00:10:59,840
or worry about it being compromised. So this is always a good foundation for client

106
00:10:59,840 --> 00:11:04,720
authentication, for one service authenticating to another. And on the other side,

107
00:11:04,720 --> 00:11:08,720
of course, we'll use TLS for the server authentication, but that's another discussion.

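To make the managed-identity point concrete, here is a minimal sketch of the pattern: the script below never touches a stored secret; it asks the platform's identity endpoint for a token and calls Azure Resource Manager. It assumes it runs on an Azure resource (an Automation account, say) whose system-assigned identity has Reader rights on the subscription; the subscription ID is a placeholder, and this is not a prescribed Azure Automation API.

    # A minimal sketch (assumptions: running on an Azure resource with a
    # system-assigned managed identity that has Reader rights; the
    # subscription ID below is a placeholder).
    from azure.identity import ManagedIdentityCredential
    from azure.mgmt.resource import ResourceManagementClient

    SUBSCRIPTION_ID = "<your-subscription-id>"

    # No connection string, password, or key anywhere in the code:
    # the token comes from the platform-managed identity endpoint.
    credential = ManagedIdentityCredential()
    client = ResourceManagementClient(credential, SUBSCRIPTION_ID)

    for group in client.resource_groups.list():
        print(group.name)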
108
00:11:09,440 --> 00:11:17,200
The other thing, which is kind of cool, and I'm a huge fan of this, is the Azure Virtual Machines DCsv2

109
00:11:17,200 --> 00:11:24,160
series. That series of VMs is now in public preview in Azure Government. So these are the

110
00:11:24,160 --> 00:11:31,440
VMs that are used for confidential computing. They're the ones that have the Intel Xeon CPUs

111
00:11:31,440 --> 00:11:37,440
in there that support the Software Guard Extensions, or SGX, technology. So if you're building

112
00:11:37,440 --> 00:11:43,360
your own secure enclaves or you're running applications that can take advantage of secure

113
00:11:43,360 --> 00:11:48,880
enclaves, then these are the VMs that you would use. So this is there, and it's great to see.

114
00:11:49,920 --> 00:11:56,240
The last one is that we've just added a new capability in Application Gateway called

115
00:11:56,960 --> 00:12:01,520
URL Rewrite. The notion of URL Rewriting has been around for quite some time.

116
00:12:02,160 --> 00:12:07,760
It's not really a security feature per se, but you can certainly use it to provide some kind of

117
00:12:07,760 --> 00:12:14,000
security functionality, such as rewriting specific headers, for example, based on the URL. You might

118
00:12:14,000 --> 00:12:19,520
want to redirect to a different URL based on some kind of logic. So again, the concept has been

119
00:12:19,520 --> 00:12:25,920
around for some time, but it's now available in Application Gateway. And with that, that's the

120
00:12:25,920 --> 00:12:31,360
end of our news this week. It's a relatively quiet week. So now let's turn our attention to our

121
00:12:31,360 --> 00:12:38,080
guest. This week we have Sharon Xia. She is a principal program manager in the Azure Cloud

122
00:12:38,080 --> 00:12:43,120
Security team focusing on data science. First of all, Sharon, thank you so much for joining us

123
00:12:43,120 --> 00:12:47,760
on the podcast this week. Would you mind spending a moment just to explain what you do at Microsoft

124
00:12:47,760 --> 00:12:52,960
and how long you've been with the company? Sure. Thank you for inviting me to this podcast.

125
00:12:52,960 --> 00:13:01,680
I joined Microsoft three and a half years ago. I lead a program manager team with five PMs.

126
00:13:01,680 --> 00:13:09,680
Right now, we build threat detections using machine learning algorithms in security products like

127
00:13:10,160 --> 00:13:17,440
Azure Active Directory Identity Protection, Azure Defender, and Azure Sentinel. We also

128
00:13:17,440 --> 00:13:27,600
own a security data platform that supports trillions of events that are processed by various

129
00:13:27,600 --> 00:13:34,560
detections, including the machine learning based threat detections. I want to start with a basic

130
00:13:34,560 --> 00:13:39,360
question. Like, what is artificial intelligence and machine learning? And what's the difference

131
00:13:39,360 --> 00:13:45,840
between them? I'm really kind of curious like how we think about that. Yeah. So, you know,

132
00:13:45,840 --> 00:13:52,720
let me talk about those three terminologies: data science, artificial intelligence (AI),

133
00:13:52,720 --> 00:14:02,000
and machine learning (ML). So data science is an interdisciplinary field that uses scientific

134
00:14:02,000 --> 00:14:12,720
methods, processes, mathematics, algorithms, and systems to extract knowledge and insights from

135
00:14:12,720 --> 00:14:22,480
structured and unstructured data. So data science is related to data mining, and it includes

136
00:14:22,480 --> 00:14:31,360
machine learning and big data. So it's a very wide, big field. Artificial intelligence, in our

137
00:14:31,360 --> 00:14:40,720
definition, is machines or computers mimicking cognitive functions that we associate with the human

138
00:14:40,720 --> 00:14:49,440
mind, such as learning and problem solving. That obviously requires algorithms, and

139
00:14:50,000 --> 00:14:58,640
machine learning algorithms are one of those. So machine learning is the study of computer

140
00:14:58,640 --> 00:15:06,720
algorithms that improve automatically through experience. So it's a subset of artificial

141
00:15:06,720 --> 00:15:14,400
intelligence. Gotcha. So, because I went through statistics classes, you know, when I was

142
00:15:14,400 --> 00:15:20,240
working to get my college degree — so machine learning is kind of

143
00:15:20,240 --> 00:15:24,240
like the progression of that into really sophisticated algorithms, right? And that's sort

144
00:15:24,240 --> 00:15:29,120
of like a foundation. And then AI is kind of turning it into, hey, we're trying to mimic what

145
00:15:29,120 --> 00:15:36,640
humans do to reason, right? And then data science includes doing, you know, all the data analysis

146
00:15:36,640 --> 00:15:44,160
and processing. Yeah. Okay, nice. Now, so how do we apply this to security? What does this

147
00:15:44,160 --> 00:15:49,840
bring us? How does this bring value to security? That's a great question.

148
00:15:50,480 --> 00:15:57,040
You know, digital transformation and tech intensity across all organizations

149
00:15:57,600 --> 00:16:05,840
have led to exponential data growth, right? And regulations are constantly

150
00:16:05,840 --> 00:16:13,360
evolving, and the attack surface is growing faster. Because, you know, organizations are moving to the

151
00:16:13,360 --> 00:16:21,520
cloud. Now you have hybrid clouds, multi-cloud, and then you have on-prem —

152
00:16:21,520 --> 00:16:28,000
the attack vectors and the surface are just growing tremendously. And the

153
00:16:28,000 --> 00:16:34,880
attacks are more sophisticated and stealthier. So, you know, the traditional rule-based

154
00:16:34,880 --> 00:16:41,200
approach no longer meets the demands of this scale and the constantly changing landscape.

155
00:16:41,200 --> 00:16:48,400
And people are looking for new solutions for dealing with this complexity. And what

156
00:16:49,040 --> 00:16:57,200
machine learning is good at is dealing with big data and handling multi-dimensional

157
00:16:57,200 --> 00:17:04,960
and multi-variate data. And it's also good at continuous improvement as machine

158
00:17:04,960 --> 00:17:10,800
learning algorithms gain experience and keep learning. You've probably heard lots of talk about

159
00:17:10,800 --> 00:17:16,240
deep learning, neural networks, all these terminologies in machine learning.

160
00:17:16,240 --> 00:17:20,320
Yeah, and I kind of smile and nod like, okay, someday I'll understand that terminology.

161
00:17:20,320 --> 00:17:30,560
Yeah, it's just mimicking the human mind, the human brain, right — being able to learn through

162
00:17:30,560 --> 00:17:39,040
experience. And from the algorithm's point of view, it learns through data. So it can keep learning and

163
00:17:39,040 --> 00:17:46,640
keep adapting to environment changes. That's why, you know, machine learning is

164
00:17:46,640 --> 00:17:53,920
the technology that can help us keep up with this data volume

165
00:17:54,800 --> 00:18:01,680
and the growing complexity of attacks. Yeah, so it's sort of like — I think of it like everybody

166
00:18:01,680 --> 00:18:05,920
in the security business loves to tell people: read your logs,

167
00:18:05,920 --> 00:18:10,000
which is impossible when there's a million lines a minute, right? So this is basically

168
00:18:10,000 --> 00:18:17,680
helping do that without having to burn out a biological mind. Right. Obviously,

169
00:18:17,680 --> 00:18:22,960
Sharon, I come at this from a Sentinel perspective. You know, when I work with Sentinel, of course,

170
00:18:22,960 --> 00:18:30,720
we know that Sentinel has ML in it. But I know that there's far more to our AI and ML capabilities

171
00:18:30,720 --> 00:18:36,960
than just Sentinel. So can you tell us a bit about where ML and AI are used in different

172
00:18:36,960 --> 00:18:43,440
Microsoft security products? Sure. Yeah. So you asked the right person. I actually own the ML

173
00:18:43,440 --> 00:18:52,320
features in Sentinel. Sarah, you probably know. I do. I do know that. Yeah. So in addition

174
00:18:52,320 --> 00:18:59,600
to the machine learning based threat detections and behavior analytics we build in Sentinel,

175
00:18:59,600 --> 00:19:06,240
virtually every security product at Microsoft uses machine learning. Like Azure Defender, right?

176
00:19:06,240 --> 00:19:14,560
We also build behavior analytics for Azure Defender for Storage. Gladys touched on

177
00:19:14,560 --> 00:19:20,960
Azure storage: you have blob storage, you have SQL, you have files, you have ADLS, data lakes. You know,

178
00:19:20,960 --> 00:19:30,480
there's so much data in there. We use machine learning to analyze the access patterns and the

179
00:19:30,480 --> 00:19:38,880
behavior to detect threats to the storage, as well as to critical security services

180
00:19:38,880 --> 00:19:44,640
like Key Vault. We use machine learning to detect threats to Key Vault. So

181
00:19:45,600 --> 00:19:51,840
Azure Defender is one example. I mentioned Azure Active Directory Identity Protection.

182
00:19:52,480 --> 00:19:59,520
We process billions of logins every day on our machine learning platform to detect

183
00:19:59,520 --> 00:20:07,840
unusual or suspicious logins and potentially compromised accounts. And in Exchange and Outlook,

184
00:20:07,840 --> 00:20:15,760
we use machine learning to identify phishing attacks. So literally every security product

185
00:20:15,760 --> 00:20:20,320
at Microsoft leverages machine learning. That is a lot of machine learning there.

186
00:20:21,440 --> 00:20:27,760
I'm going to have to ask you about Sentinel, because it's my baby. Can you tell us a

187
00:20:27,760 --> 00:20:33,200
little bit? It's something that a lot of my customers are definitely interested in. A little

188
00:20:33,200 --> 00:20:39,440
bit specifically about the Sentinel ML, because I know that's your thing too. So just for anyone

189
00:20:39,440 --> 00:20:47,200
who might have heard about it but doesn't know much about it, what's your elevator pitch for the ML

190
00:20:47,840 --> 00:20:54,080
in particular in Sentinel? Yeah, I know all the SIEM vendors talk about using ML, and

191
00:20:54,080 --> 00:21:04,560
people ask: is it real? Is it hype? And I can tell you, at least in Sentinel, it's real. We do have

192
00:21:05,360 --> 00:21:13,280
behavior analytics — in our UEBA module in Sentinel, we use machine learning. And we have

193
00:21:13,280 --> 00:21:20,400
built-in machine learning threat detections like anomalous SSH or RDP logins. And we have

194
00:21:20,400 --> 00:21:27,520
Fusion, which we call advanced multi-stage detection. We actually have four different machine

195
00:21:27,520 --> 00:21:34,640
learning algorithms in Fusion to correlate what we call yellow signals

196
00:21:35,520 --> 00:21:43,120
and find those multi-stage attacks. You know, sometimes a suspicious login from a Tor

197
00:21:43,120 --> 00:21:50,720
browser on its own is maybe benign. But then it's followed by data exfiltration,

198
00:21:50,720 --> 00:21:58,560
followed by C2 communication — all these steps together — then seriously, it's an attack.

199
00:21:58,560 --> 00:22:04,800
So Fusion detection detects those kinds of attacks, and many ransomware patterns too.

200
00:22:05,600 --> 00:22:11,920
We work with MSTIC, our threat intelligence team. Their threat hunters identify those

201
00:22:11,920 --> 00:22:18,160
patterns and we feed them to the machine learning algorithm. And like I said previously, it learns

202
00:22:18,800 --> 00:22:25,600
and can detect those emerging threats. And that's the out-of-the-box machine learning we have

203
00:22:25,600 --> 00:22:32,400
built. And we also have the platform to bring your own machine learning (BYOML) to Sentinel. So you can

204
00:22:32,400 --> 00:22:38,480
ingest your data into Sentinel, build your model outside of Sentinel or run it outside of

205
00:22:38,480 --> 00:22:45,520
Sentinel, and bring the signals back. We heard from customers that they have this platform scalability

206
00:22:45,520 --> 00:22:54,080
issue. So, you know, the BYOML platform solves that issue for those customers who have

207
00:22:54,080 --> 00:23:00,800
data scientists in their organization. That's a very high-level view of what we have in Sentinel.

208
00:23:01,760 --> 00:23:06,800
I know we could go on a lot about that, but in the interest of time, I guess we'll

209
00:23:06,800 --> 00:23:13,280
leave it there for that one. Thanks, Sharon. Actually, Sharon, I'm not familiar with the

210
00:23:13,280 --> 00:23:20,080
yellow alerts or signals that I think you mentioned. Are those alerts where you need to correlate more

211
00:23:20,080 --> 00:23:27,040
data to determine whether it's a true positive or a false positive, or what exactly are they?

212
00:23:27,040 --> 00:23:34,160
Yeah, so, you know, if you have dealt with or talked to security analysts, or dealt with

213
00:23:34,160 --> 00:23:42,480
those security products — they generate lots of signals or alerts, and they

214
00:23:42,480 --> 00:23:48,800
apply different severity levels to the alerts, like high severity, medium, low,

215
00:23:48,800 --> 00:23:56,160
or — lots of them — informational. And the security analysts deal with thousands and

216
00:23:56,160 --> 00:24:02,880
thousands of alerts every day. They can never keep up. So most of our customers told us they never

217
00:24:02,880 --> 00:24:09,840
look at any alerts with a severity level lower than medium, or they don't even have time to look at

218
00:24:09,840 --> 00:24:17,360
the medium severity alerts. So those basically fly under the radar of the security

219
00:24:17,360 --> 00:24:23,920
SOC operations. Nobody sees them. So we call those yellow signals.

220
00:24:23,920 --> 00:24:32,240
Lots of attacks are very stealthy, like I said. You can find many examples,

221
00:24:32,240 --> 00:24:39,680
including, you know, the latest SolarWinds attack. It's hiding under your radar and going on.

222
00:24:39,680 --> 00:24:46,720
If you think about all the security news and compromises, people say,

223
00:24:46,720 --> 00:24:54,560
oh, the attacker had already been in that environment for nine months before the organization

224
00:24:54,560 --> 00:25:02,080
discovered the attack or the compromise, right? So it's not like there was no

225
00:25:02,080 --> 00:25:10,160
signal triggered. It's just that all these signals are uncertain, low confidence, and

226
00:25:10,160 --> 00:25:18,640
were not surfaced to the eyes of the security analysts. So our Fusion machine learning algorithm is

227
00:25:18,640 --> 00:25:25,920
actually correlating all these signals to find, you know, that kind of multi-stage attack.

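To illustrate the intuition behind correlating yellow signals — a toy sketch only, not the actual Fusion algorithm; the alert fields and stage names are invented for the example — alerts that are individually ignorable become interesting when several kill-chain stages line up for one account.

    # A toy sketch of multi-stage correlation (hypothetical alert
    # stream; field and stage names are illustrative only).
    from collections import defaultdict

    alerts = [
        {"account": "bob", "stage": "initial_access", "severity": "low"},
        {"account": "bob", "stage": "exfiltration", "severity": "low"},
        {"account": "bob", "stage": "command_and_control", "severity": "low"},
        {"account": "alice", "stage": "initial_access", "severity": "low"},
    ]

    KILL_CHAIN = ["initial_access", "exfiltration", "command_and_control"]

    stages_by_account = defaultdict(set)
    for alert in alerts:
        stages_by_account[alert["account"]].add(alert["stage"])

    for account, stages in stages_by_account.items():
        # Escalate only when most of the chain is present for one account.
        if sum(s in stages for s in KILL_CHAIN) >= 2:
            print(f"possible multi-stage attack for {account}: {sorted(stages)}")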
228
00:25:26,720 --> 00:25:34,320
Awesome. Hopefully customers can now see those medium signals, because we

229
00:25:34,320 --> 00:25:43,440
bring SOAR integration and unification and automation that hopefully can speed up the

230
00:25:43,440 --> 00:25:52,240
mean time to acknowledge and remediate. So you just mentioned a good case, a common use case for

231
00:25:52,240 --> 00:26:01,840
AI and ML. Are there other common use cases where one may use it? Yeah. So the common use cases:

232
00:26:01,840 --> 00:26:10,640
because AI, you know, is good at finding patterns in huge amounts of data, right?

233
00:26:10,640 --> 00:26:19,280
So it's very good at doing behavior analytics. And you can find a spike of excessive download

234
00:26:19,280 --> 00:26:27,600
from a VPN, or excessive upload — those kinds of spikes usually indicate some problem —

235
00:26:27,600 --> 00:26:37,280
or access to an IP address or host never seen before. Those are things AI

236
00:26:37,280 --> 00:26:44,960
and machine learning are good at. Then you correlate this abnormal behavior with

237
00:26:44,960 --> 00:26:52,160
the threat intel information you have, and you can elevate the signal a little bit,

238
00:26:52,160 --> 00:27:00,800
like: oh, now this abnormal access, maybe an outbound connection, combined with our threat

239
00:27:00,800 --> 00:27:10,880
intelligence information — this IP is a C2, or this URL is a watering-hole or malicious URL. Then you

240
00:27:10,880 --> 00:27:18,240
combine these and, you know, you will find the trace of these attacks. So those are

241
00:27:18,240 --> 00:27:25,440
good use cases for machine learning. Another one is we really think machine learning is good at

242
00:27:25,440 --> 00:27:32,560
finding the emerging threats, the unknowns. With rules, it's: oh, I matched

243
00:27:33,200 --> 00:27:42,960
mimikatz.exe, that's an attack, right? Or, you know, you have a blacklist, but that's limited to the known.

244
00:27:42,960 --> 00:27:50,000
But with machine learning and behavior analytics, it observes the trend, it finds the

245
00:27:50,000 --> 00:27:56,800
abnormal behavior — it can detect emerging threats or unknown threats.

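As a toy illustration of the spike detection described here — a simple robust z-score over made-up daily upload volumes, nothing like the production models:

    # A minimal sketch: flag days whose upload volume is a robust
    # outlier versus the recent history (toy data, invented numbers).
    import numpy as np

    daily_mb = np.array([120, 95, 130, 110, 105, 98, 125, 4800])  # last day spikes

    median = np.median(daily_mb)
    mad = np.median(np.abs(daily_mb - median)) or 1.0  # robust spread
    scores = 0.6745 * (daily_mb - median) / mad        # modified z-scores

    for day, (mb, score) in enumerate(zip(daily_mb, scores)):
        if score > 3.5:  # common robust-outlier threshold
            print(f"day {day}: {mb} MB uploaded looks anomalous (score {score:.1f})")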
246
00:27:57,440 --> 00:28:02,640
So looking at all this AI and ML stuff as it relates to security, I can't imagine this is

247
00:28:02,640 --> 00:28:07,920
particularly easy. So what are some of the challenges that you come across applying

248
00:28:07,920 --> 00:28:14,800
artificial intelligence and machine learning in the realm of security? Yeah, good question.

249
00:28:14,800 --> 00:28:19,600
We've been doing this for years. I joined the team three and a half years ago, and before I

250
00:28:19,600 --> 00:28:25,360
joined, the team had already been doing this for, I think, more than three years.

251
00:28:26,080 --> 00:28:32,960
We have lots of experience applying this, and we've also encountered lots of

252
00:28:32,960 --> 00:28:40,800
obstacles and problems. And we also talked to customers, like Sentinel customers, about

253
00:28:40,800 --> 00:28:47,200
what their issues are. They realize machine learning is the way to go, but they have

254
00:28:47,200 --> 00:28:54,880
trouble. So basically, the number one problem is data quality and the lack of a uniform schema.

255
00:28:54,880 --> 00:29:02,800
Data is everywhere, in different formats and with different meanings. Even if the

256
00:29:02,800 --> 00:29:10,480
format is the same, like the CEF format, every field can have a different meaning, because it's

257
00:29:10,480 --> 00:29:16,880
whatever interpretation the vendor — the firewall vendor or software vendor — puts there, right.

258
00:29:17,440 --> 00:29:22,800
And for the machine learning algorithm, it's basically garbage in, garbage out. So you

259
00:29:22,800 --> 00:29:30,160
have to do lots of data transformation and cleaning. So that's one challenge.

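A toy sketch of that normalization step (the vendor field names here are invented for illustration): two firewalls describe the same event with different schemas, so a pipeline first maps everything onto one uniform schema before any ML sees it.

    # A toy normalization sketch; RAW_EVENTS and the vendor field
    # names are hypothetical, not real product schemas.
    RAW_EVENTS = [
        {"vendor": "fw_a", "src": "10.0.0.5", "dst": "203.0.113.9", "act": "deny"},
        {"vendor": "fw_b", "SourceAddress": "10.0.0.7", "DestAddress": "203.0.113.9", "DeviceAction": "blocked"},
    ]

    # Per-vendor mapping from uniform field name to that vendor's field.
    FIELD_MAPS = {
        "fw_a": {"src_ip": "src", "dst_ip": "dst", "action": "act"},
        "fw_b": {"src_ip": "SourceAddress", "dst_ip": "DestAddress", "action": "DeviceAction"},
    }
    ACTION_SYNONYMS = {"deny": "deny", "blocked": "deny", "allow": "allow", "permitted": "allow"}

    def normalize(event: dict) -> dict:
        mapping = FIELD_MAPS[event["vendor"]]
        out = {field: event[src] for field, src in mapping.items()}
        out["action"] = ACTION_SYNONYMS.get(out["action"].lower(), "unknown")
        return out

    for e in RAW_EVENTS:
        print(normalize(e))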
260
00:29:30,160 --> 00:29:37,680
The second challenge is the lack of labels in security. If you give machine learning algorithms some labels —

261
00:29:37,680 --> 00:29:44,560
if you tell them, oh, this result is good, this is bad — then the model will

262
00:29:44,560 --> 00:29:52,000
learn and improve by itself. But because all the security information is

263
00:29:52,000 --> 00:29:58,640
confidential, and there are lots of data privacy concerns,

264
00:29:59,440 --> 00:30:07,120
the machine learning models used in cybersecurity literally don't get enough

265
00:30:07,120 --> 00:30:15,520
labels to improve. So that's the second challenge. The third challenge, which I heard from

266
00:30:15,520 --> 00:30:22,960
lots of customers — it's not ours — is the lack of data science resources in their organization.

267
00:30:22,960 --> 00:30:30,240
You know, if you want to use machine learning in security, you really need both security

268
00:30:30,240 --> 00:30:38,320
skills and data science skills. And it's already rare to get security experts. It's

269
00:30:38,320 --> 00:30:46,240
also rare to get machine learning experts. And it's even harder to get people

270
00:30:46,240 --> 00:30:54,080
who have both, right? So that's a huge challenge. Another challenge: even with data scientists

271
00:30:54,080 --> 00:31:00,880
in their organization, they have problems dealing with large data — the scalability.

272
00:31:01,520 --> 00:31:07,520
Sometimes they do well in prototyping, but they have trouble bringing it to production because of

273
00:31:07,520 --> 00:31:15,040
the data volume; they are not able to support that kind of scale. Azure is an awesome

274
00:31:15,040 --> 00:31:21,920
platform, you know — it's elastic — and Sentinel runs on Azure, so it inherits

275
00:31:21,920 --> 00:31:29,360
that scalability and availability. So it's the perfect platform for building machine

276
00:31:29,360 --> 00:31:34,640
learning on top of it. Yeah, I can certainly speak to the hiring security people aspect.

277
00:31:34,640 --> 00:31:39,760
It's pretty hard to get a good security person. I can only imagine what it's like hiring someone

278
00:31:39,760 --> 00:31:46,080
who's a security person and a data science person. That leads into another topic. So one thing that

279
00:31:46,080 --> 00:31:52,240
I've been doing over the last few months is taking all the Microsoft 900-level exams, and one of them

280
00:31:52,240 --> 00:31:57,520
is AI-900. And the reason why I'm doing it, even though I'm a

281
00:31:57,520 --> 00:32:02,880
security guy, is just to make sure that I'm actually focusing on, you know, the platform in

282
00:32:02,880 --> 00:32:07,360
general and getting a better understanding of various aspects of the Azure platform beyond

283
00:32:07,360 --> 00:32:12,400
security. And one of them is AI-900, which is an introduction to the fundamentals of

284
00:32:12,400 --> 00:32:18,960
artificial intelligence. One thing that's talked about a lot in there, looking at the study materials,

285
00:32:18,960 --> 00:32:26,560
is this notion of responsible AI. Could you just give us a quick overview of what responsible AI is?

286
00:32:26,560 --> 00:32:34,000
Yeah, so machine learning algorithms and models are mimicking

287
00:32:34,000 --> 00:32:42,160
the human brain's thinking. So you have to be really careful. There are many examples where

288
00:32:42,160 --> 00:32:50,160
the ML algorithm got abused and produced results that were not the intent

289
00:32:50,160 --> 00:32:59,520
of the author of the model or algorithm, right? There is an example from earlier: Microsoft

290
00:32:59,520 --> 00:33:06,880
had a chatbot called Tay on Twitter. You guys probably know it — a chatbot that was chatting

291
00:33:06,880 --> 00:33:14,480
on Twitter, having conversations. And it got attacked. People fed it all

292
00:33:14,480 --> 00:33:21,680
this offensive language. And then it learned from that and spat out

293
00:33:21,680 --> 00:33:30,240
that offensive language. So Microsoft shut it down. If you Google Tay tweets,

294
00:33:30,240 --> 00:33:37,840
you will find a lot of discussion and news on it. And so this means that when we build

295
00:33:37,840 --> 00:33:45,760
machine learning algorithms, when we build those advanced features — for example, maybe an HR scenario

296
00:33:45,760 --> 00:33:55,680
like a resume screen — we need to think about whether we unintentionally have bias in the ML model.

297
00:33:55,680 --> 00:34:03,520
Did the model, like a robot, potentially learn some

298
00:34:03,520 --> 00:34:17,760
behavior that causes bias? Or is it possible it unintentionally leaks private

299
00:34:17,760 --> 00:34:26,000
information? You know, there are lots of studies and articles on responsible AI. Microsoft

300
00:34:26,000 --> 00:34:33,360
is very serious about this. So how are we protecting against the introduction of bad data then?

301
00:34:35,120 --> 00:34:42,880
Yeah, this is a good question. So there are two things. One is what I talked about —

302
00:34:42,880 --> 00:34:51,120
unintentional, right? And the other aspect is malicious attacks on the machine

303
00:34:51,120 --> 00:34:58,960
learning model and on the data used by the machine learning model. And our team has a trustworthy

304
00:34:58,960 --> 00:35:06,960
ML project, and we worked with MITRE. You guys probably know MITRE has the ATT&CK framework

305
00:35:06,960 --> 00:35:14,160
for enterprise, for IoT, for ICS. It's widely used in the

306
00:35:14,160 --> 00:35:21,280
security industry to describe the kill chain, right — the security tactics and techniques.

307
00:35:21,280 --> 00:35:30,560
Our team worked with MITRE, and we published an ATT&CK-style framework for adversarial ML to identify and

308
00:35:31,440 --> 00:35:39,040
call out those cyber attacks specific to machine learning algorithms. For example,

309
00:35:39,040 --> 00:35:48,800
data poisoning. Gladys, you asked about bad data, right? So an attacker can intentionally poison the

310
00:35:48,800 --> 00:35:55,200
data that's used to train the machine learning model. If the model is trained on bad data,

311
00:35:55,200 --> 00:36:03,440
it will produce bad results.

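A toy illustration of label-flipping data poisoning (synthetic data, not Microsoft's implementation): the same classifier trained on poisoned labels typically scores measurably worse on clean test data.

    # A toy poisoning demo on synthetic data: flip a fraction of
    # training labels and compare the resulting model against one
    # trained on clean labels.
    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split

    X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

    clean = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)

    # The "attacker" flips 30% of the training labels.
    rng = np.random.default_rng(0)
    flip = rng.random(len(y_tr)) < 0.30
    y_poisoned = np.where(flip, 1 - y_tr, y_tr)
    poisoned = LogisticRegression(max_iter=1000).fit(X_tr, y_poisoned)

    print("clean accuracy:   ", clean.score(X_te, y_te))
    print("poisoned accuracy:", poisoned.score(X_te, y_te))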
312
00:36:03,440 --> 00:36:09,520
So securing the data source is extremely important, no matter whether the ML system is

313
00:36:10,240 --> 00:36:17,760
used in security, or in speech recognition, or in facial recognition,

314
00:36:17,760 --> 00:36:25,280
or even, if you think about it, in healthcare, where it's about human life. So this

315
00:36:25,280 --> 00:36:34,080
is really important. And there are other common attacks on ML models, like evasion attacks.

316
00:36:34,640 --> 00:36:41,200
Basically, there is a very famous example where researchers put a few

317
00:36:42,000 --> 00:36:52,240
very small stickers on the road, and it fooled Tesla's ML model into driving into the

318
00:36:52,240 --> 00:37:02,160
opposite lane. And that's scary, right? And there's research on an attack

319
00:37:02,160 --> 00:37:08,960
called machine learning model inversion. It's about privacy, right? A facial

320
00:37:08,960 --> 00:37:16,880
recognition program uses a lot of data samples to train the model to recognize

321
00:37:16,880 --> 00:37:26,160
faces, like Windows Hello, right? We're logging in with our face. So there is a specific

322
00:37:26,160 --> 00:37:35,840
attack where they can reverse the trained model and recover your face. This is

323
00:37:35,840 --> 00:37:45,120
called a model inversion attack; it can kind of invert the PII data out of your binary

324
00:37:45,120 --> 00:37:51,120
machine learning model. So this definitely is a privacy concern. So there are lots of attacks

325
00:37:51,840 --> 00:37:57,920
in this area. You know, if you are interested, look at the material we provide

326
00:37:59,200 --> 00:38:03,040
with this podcast; you have lots to read in this area.

327
00:38:04,000 --> 00:38:11,760
Okay, so lots to think about there, Sharon, and I've learned a lot today. But if someone who's

328
00:38:11,760 --> 00:38:19,920
listening wanted to know more about security-related AI and ML, are there any sources or

329
00:38:19,920 --> 00:38:27,200
materials that you'd recommend they go and look at? There are lots of papers online you can search —

330
00:38:27,200 --> 00:38:32,320
you know, just Bing it or Google it — for applied machine learning and data science in

331
00:38:33,440 --> 00:38:39,120
security, and you can find a lot online. And this area, actually,

332
00:38:39,120 --> 00:38:44,960
is relatively new. And, you know, we're definitely going to include some links in this podcast

333
00:38:44,960 --> 00:38:52,480
for you to get started. There's lots of research and lots of green-field

334
00:38:52,480 --> 00:39:00,400
area we can explore. That's basically what I have been doing.

335
00:39:00,400 --> 00:39:06,240
And our team is doing half research and half, you know, building the features in the product.

336
00:39:06,240 --> 00:39:11,280
And sometimes some attempt may fail. But that's fine.

337
00:39:13,280 --> 00:39:18,800
So before we let you go, one thing that we ask all of our guests is: do you have any final thoughts

338
00:39:18,800 --> 00:39:25,920
you'd like to leave our listeners with? Yeah, you know, like I said, it's hard to find

339
00:39:25,920 --> 00:39:32,240
security experts in the market, and also hard to find data scientists, and it's even harder to

340
00:39:32,240 --> 00:39:41,200
find both. But I would say that, you know, my background is in security. So I just

341
00:39:42,000 --> 00:39:49,840
jumped into this — applying data science to security — and started learning, taking courses,

342
00:39:49,840 --> 00:39:57,520
you know, on Coursera and LinkedIn and YouTube, and just kept learning that way. So I would say, if

343
00:39:57,520 --> 00:40:03,600
you are passionate about applying, you know, data science in cybersecurity, don't worry that

344
00:40:03,600 --> 00:40:10,400
you don't have much knowledge. Or maybe you are a security expert who doesn't know data science —

345
00:40:10,400 --> 00:40:15,680
that's okay. Or you're a data scientist who's really interested in using your data science skills in

346
00:40:15,680 --> 00:40:24,240
security — that's fine. As long as you are willing to learn. So threats are changing, and new

347
00:40:24,240 --> 00:40:32,400
machine learning technology is emerging. And the only way to be successful is continuous learning.

348
00:40:32,960 --> 00:40:41,840
So, you know, my final thought is: jump into it. If you are really passionate about it, I see a great

349
00:40:41,840 --> 00:40:47,920
future in this field. Like I said, lots of green-field area for us to explore. Thanks for that. And

350
00:40:47,920 --> 00:40:51,920
thanks so much for joining us this week, Sharon. We really appreciate you taking the time. We know

351
00:40:51,920 --> 00:40:56,400
you're extremely busy. I learned a great deal this week. It's another example of I learned stuff I

352
00:40:56,400 --> 00:41:01,200
didn't know I didn't know. And to all our listeners out there, we hope you found it useful too.

353
00:41:01,200 --> 00:41:04,160
Thanks for listening. Stay safe. And we'll see you next time.

354
00:41:04,160 --> 00:41:09,040
Thanks for listening to the Azure Security Podcast. You can find show notes and other

355
00:41:09,040 --> 00:41:16,320
resources at our website azsecuritypodcast.net. If you have any questions, please find us on

356
00:41:16,320 --> 00:41:23,200
Twitter at azuresetpod. Background music is from ccmixter.com and licensed under the Creative

357
00:41:23,200 --> 00:41:49,200
Commons license.

