WEBVTT

00:00:03.760 --> 00:00:06.240
Welcome to the Azure Security Podcast, where

00:00:06.240 --> 00:00:08.759
we discuss topics relating to security, privacy,

00:00:09.039 --> 00:00:11.480
reliability, and compliance on the Microsoft

00:00:11.480 --> 00:00:16.480
Cloud Platform. Hey everybody, welcome to episode

00:00:16.480 --> 00:00:19.620
116. This week is myself, Michael, with Sarah

00:00:19.620 --> 00:00:22.719
and Mark. And our guest this week is Mark Kendrick,

00:00:22.839 --> 00:00:25.579
who's here to talk to us about Microsoft Sentinel

00:00:25.579 --> 00:00:28.019
Data Lake. But before we get to our guest, let's

00:00:28.019 --> 00:00:29.760
take a little lap around the news. Mark, why

00:00:29.760 --> 00:00:32.060
don't you kick things off? Cool. So yeah, I got

00:00:32.060 --> 00:00:34.840
a couple of items. One of them is for folks with

00:00:34.840 --> 00:00:37.000
Microsoft Unified, formerly Microsoft Premier.

00:00:37.240 --> 00:00:39.719
Folks that have that: our Security Adoption Framework

00:00:39.719 --> 00:00:43.079
Module 5, or SAF as we like to call it. The one

00:00:43.079 --> 00:00:45.859
on data security is out there, available, ready

00:00:45.859 --> 00:00:48.380
to be dispatched for anybody that wants and has

00:00:48.380 --> 00:00:51.159
Unified. This one is a really, really hot and

00:00:51.159 --> 00:00:53.780
in-demand one because when it comes down to

00:00:53.780 --> 00:00:56.799
data security, what we've found is that organizations

00:00:56.799 --> 00:00:59.200
have always sort of treated data security as

00:00:59.200 --> 00:01:01.250
they've given it a lot of lip service, hey, the

00:01:01.250 --> 00:01:04.090
data is the most important thing, etc. But oftentimes,

00:01:04.230 --> 00:01:08.670
whether it's an EDR or an XDR or a SIEM or something

00:01:08.670 --> 00:01:11.609
else, always took priority. MFA, something like

00:01:11.609 --> 00:01:13.950
that. And so data security was always important

00:01:13.950 --> 00:01:16.230
on people's mind, but it never actually got executed.

00:01:16.650 --> 00:01:19.390
And so the challenge, of course, is that AI comes

00:01:19.390 --> 00:01:21.530
along and you need to have your data labeled

00:01:21.530 --> 00:01:25.909
because otherwise there's no... way for the AI

00:01:25.909 --> 00:01:28.230
to know whether it's sensitive or not. And so

00:01:28.230 --> 00:01:30.549
it can start feeding it into all sorts of

00:01:30.549 --> 00:01:32.049
different places it's not supposed to go. So

00:01:32.049 --> 00:01:33.849
we found this one's really, really important.

00:01:34.090 --> 00:01:36.109
And so we prioritized getting that one out there

00:01:36.109 --> 00:01:39.049
quick to support people that have to help get

00:01:39.049 --> 00:01:40.950
some AI stuff out there and do it securely. So

00:01:40.950 --> 00:01:43.209
that one's out. And then the other stuff, so

00:01:43.209 --> 00:01:45.530
this is news that's coming. It's not quite out

00:01:45.530 --> 00:01:48.390
yet, but we finished the primary development

00:01:48.390 --> 00:01:51.269
for the first release of our security roles and

00:01:51.269 --> 00:01:53.230
glossary standards from The Open Group. So this

00:01:53.230 --> 00:01:56.859
is me putting on my Open Group hat. And so this

00:01:56.859 --> 00:01:59.680
is, roles and glossary is the overall standard

00:01:59.680 --> 00:02:02.159
that we're putting out there. It should be out

00:02:02.159 --> 00:02:04.500
hopefully in October. We're shooting to get it

00:02:04.500 --> 00:02:07.140
out there before the November conference in Houston

00:02:07.140 --> 00:02:09.419
that The Open Group hosts. The glossary defines a

00:02:09.419 --> 00:02:11.919
lot of different terms. Some of them we just

00:02:11.919 --> 00:02:14.620
had to clear up some stuff about risk and governance

00:02:14.620 --> 00:02:17.120
and stuff like that, just standard glossary stuff.

00:02:17.479 --> 00:02:19.060
And some of it was sort of some new additions

00:02:19.060 --> 00:02:21.580
from learnings from the DART team. There's a

00:02:21.580 --> 00:02:23.599
very big difference between an incident, something

00:02:23.599 --> 00:02:26.219
happens. A compromise, something went wrong in

00:02:26.219 --> 00:02:28.639
security. And a breach, something went wrong

00:02:28.639 --> 00:02:30.719
that might be legally defined in your jurisdiction

00:02:30.719 --> 00:02:33.919
by legal authorities. And so we got some very

00:02:33.919 --> 00:02:35.699
specific clarity out there and some other lessons

00:02:35.699 --> 00:02:37.319
learned and stuff like that. So the glossary

00:02:37.319 --> 00:02:39.719
is actually more interesting than a normal glossary.

00:02:39.860 --> 00:02:41.639
Then we talk about the difference between accountability

00:02:41.639 --> 00:02:43.960
and responsibility, essentially who's to blame

00:02:43.960 --> 00:02:46.620
versus who does the work. And there's a lot of

00:02:46.620 --> 00:02:49.340
interesting insights on that. Something like

00:02:49.340 --> 00:02:51.979
75 or 80 roles, I've sort of lost track at this

00:02:51.979 --> 00:02:54.039
point in time, that we found that have to do

00:02:54.039 --> 00:02:56.379
security work. Maybe like less than a quarter

00:02:56.379 --> 00:02:58.280
of those are actually in the security team because

00:02:58.280 --> 00:03:00.680
IT has to do security work. The business has

00:03:00.680 --> 00:03:02.800
to do security work and has accountability for

00:03:02.800 --> 00:03:04.900
the decisions they make that affect the security

00:03:04.900 --> 00:03:07.759
of the org, all the way up to the board of directors.

00:03:08.000 --> 00:03:10.580
And so we really wanted to show that chain and

00:03:10.580 --> 00:03:12.139
are in the process of showing that chain, everything

00:03:12.139 --> 00:03:14.599
from the fiduciary duty of the board members

00:03:14.599 --> 00:03:16.879
that represent the shareholders, all the way

00:03:16.879 --> 00:03:20.120
into the management team, CEOs and down, all the

00:03:20.120 --> 00:03:22.500
way into technology teams, business teams, security

00:03:22.500 --> 00:03:24.919
teams. So we're really trying to... round out

00:03:24.919 --> 00:03:28.539
the full set of things there. And so the first

00:03:28.539 --> 00:03:30.740
two that we focused on were the organizational

00:03:30.740 --> 00:03:33.599
leaders, because oftentimes they just don't know

00:03:33.599 --> 00:03:35.360
what their security job is or how to set up the

00:03:35.360 --> 00:03:37.500
accountability structure right, et cetera. So

00:03:37.500 --> 00:03:40.800
we got some guidance there. And then the SOC,

00:03:40.860 --> 00:03:42.800
we kind of went to the other extreme of really

00:03:42.800 --> 00:03:44.800
in the depths of security, dealing on the front

00:03:44.800 --> 00:03:46.800
line, the tip of the spear, as it were, kind

00:03:46.800 --> 00:03:49.520
of defining those roles, responsibilities. And

00:03:49.520 --> 00:03:52.479
we found that there's about 11 different roles

00:03:52.479 --> 00:03:55.030
and specializations in the SOC. And so we defined

00:03:55.030 --> 00:03:57.050
all those and what are the tasks, the job duties

00:03:57.050 --> 00:03:59.389
that they have to do. So we've got some links

00:03:59.389 --> 00:04:02.710
for those and people can check those out. Some

00:04:02.710 --> 00:04:04.490
LinkedIn articles that I put together sort of

00:04:04.490 --> 00:04:06.430
previewing and showing some of the content and

00:04:06.430 --> 00:04:08.689
the approach. And then anyone that's actually

00:04:08.689 --> 00:04:10.689
a member of The Open Group, and there's I think

00:04:10.689 --> 00:04:12.490
like 500 organizations or something around the

00:04:12.490 --> 00:04:14.750
world that are members of it, can go ahead and

00:04:14.750 --> 00:04:16.430
review that as it goes through the review period.

00:04:16.569 --> 00:04:18.569
And otherwise, check out the LinkedIn stuff.

00:04:18.790 --> 00:04:21.660
So that's my news. So I have a couple of bits

00:04:21.660 --> 00:04:26.459
of news. Now in public preview, WAF, or the Web

00:04:26.459 --> 00:04:29.970
Application Firewall, is running on Application

00:04:29.970 --> 00:04:33.569
Gateway for Containers. So you probably know, hopefully

00:04:33.569 --> 00:04:36.029
you know, that WAF has to run on an Application

00:04:36.029 --> 00:04:38.269
Gateway or it needs to run on Front Door, but

00:04:38.269 --> 00:04:41.149
now you can do it in a container, which is lovely.

00:04:41.149 --> 00:04:44.790
If you have an application

00:04:44.790 --> 00:04:47.370
that is facing the internet, you should have a

00:04:47.370 --> 00:04:50.949
WAF in front of it. So Application Gateway for

00:04:50.949 --> 00:04:53.310
Containers means that if you're containerizing

00:04:53.310 --> 00:04:55.470
everything, you can still have WAF, which is lovely.

00:04:55.470 --> 00:05:00.449
And then the other news I have is that it started

00:05:00.449 --> 00:05:02.470
to be released at the time of recording this

00:05:02.470 --> 00:05:06.569
last week. Michael and I did a series. Michael

00:05:06.569 --> 00:05:09.410
and I harassed people at Microsoft Build a couple

00:05:09.410 --> 00:05:12.250
of months ago and were asking developer security

00:05:12.250 --> 00:05:15.889
questions. And they've started to be released

00:05:15.889 --> 00:05:18.670
on YouTube and they're pretty funny. We got to

00:05:18.670 --> 00:05:20.910
talk to a lot of well -known Microsoft faces

00:05:20.910 --> 00:05:23.810
and just attendees in the audience. So we were

00:05:23.810 --> 00:05:26.189
asking pertinent developer security questions.

00:05:27.300 --> 00:05:30.199
I'm biased, but they're pretty fun. So I'll put

00:05:30.199 --> 00:05:32.339
a link to a couple of them in the show notes

00:05:32.339 --> 00:05:34.720
and keep an eye out because you'll be seeing

00:05:34.720 --> 00:05:37.740
them come out for probably at least a couple

00:05:37.740 --> 00:05:39.860
of months because we did quite a few of them

00:05:39.860 --> 00:05:42.620
and they're good fun. So take a look. Yeah, they're

00:05:42.620 --> 00:05:44.959
certainly fun. Although one of the comments on

00:05:44.959 --> 00:05:48.019
YouTube was they thought that we were holding

00:05:48.019 --> 00:05:50.339
people at gunpoint when we're asking them questions

00:05:50.339 --> 00:05:53.069
or under duress, which is absolutely not true,

00:05:53.230 --> 00:05:54.949
but it was a lot of fun doing them. A whole

00:05:54.949 --> 00:05:56.629
series of security-related questions.

00:05:56.889 --> 00:05:59.649
Yeah, it was a lot of fun. Anyway, so on to my

00:05:59.649 --> 00:06:03.709
news. First one is we now have VNet encryption

00:06:03.709 --> 00:06:07.420
with Azure Managed Lustre. If you're anything

00:06:07.420 --> 00:06:09.879
like me, I had no idea what Managed Lustre is,

00:06:09.959 --> 00:06:12.360
but apparently it is a very high-speed file

00:06:12.360 --> 00:06:16.399
system for high-performance environments. So

00:06:16.399 --> 00:06:18.560
now we support VNet encryption, which means that

00:06:18.560 --> 00:06:21.120
the connection between the VM, the high-performance

00:06:21.120 --> 00:06:24.319
VM, or any VM for that matter, and the actual

00:06:24.319 --> 00:06:28.019
storage is protected by DTLS, which is Datagram

00:06:28.019 --> 00:06:30.639
Transport Layer Security, essentially TLS, but

00:06:30.639 --> 00:06:34.600
rather than TCP, it's using UDP. And that is

00:06:34.600 --> 00:06:36.360
used to secure the traffic. So it authenticates

00:06:36.360 --> 00:06:39.500
the endpoints as well as protecting the data

00:06:39.500 --> 00:06:44.300
on the wire. Next one is we've now added transformations

00:06:44.300 --> 00:06:50.180
in Azure Monitor so that you can take, actually

00:06:50.180 --> 00:06:53.259
more accurately, data coming out of a firewall,

00:06:53.439 --> 00:06:56.199
logs coming out of a firewall can now be transformed

00:06:56.199 --> 00:07:00.000
before putting it into Azure Monitor logs. The

00:07:00.000 --> 00:07:02.839
reason for doing this is so that you can do some

00:07:02.839 --> 00:07:06.660
initial cut at the data before you store it in

00:07:06.660 --> 00:07:09.180
the log files, which you can then use Azure

00:07:09.180 --> 00:07:12.639
Monitor to analyze, which is really kind of

00:07:12.639 --> 00:07:15.139
timely based on the conversation that we're about

00:07:15.139 --> 00:07:18.279
to go through about Microsoft Sentinel Data Lake.

00:07:20.089 --> 00:07:22.269
Sort of suspend disbelief just for a moment, and

00:07:22.269 --> 00:07:24.269
hopefully Mark can cover some of those stories

00:07:24.269 --> 00:07:27.649
about costs and so on for storing log data. Next

00:07:27.649 --> 00:07:32.709
one is that we have now released a new hardware

00:07:32.709 --> 00:07:36.110
security module called Azure Cloud HSM. This

00:07:36.110 --> 00:07:39.769
is a replacement for Dedicated HSM. I'm not going

00:07:39.769 --> 00:07:41.069
to get into all the details, but we basically

00:07:41.069 --> 00:07:43.889
have essentially four different key management

00:07:43.889 --> 00:07:46.029
systems now. We have Azure Key Vault, which,

00:07:46.069 --> 00:07:49.360
you know, everyone... kind of knows. We have Managed

00:07:49.360 --> 00:07:55.139
HSM, and now we have Cloud HSM, and we have Azure

00:07:55.139 --> 00:07:56.980
Payment HSM, which has been around for a

00:07:56.980 --> 00:07:58.639
while. I'm not going to go into all the details

00:07:58.639 --> 00:08:00.660
here. There's actually a flow chart, which is actually

00:08:00.660 --> 00:08:03.519
really useful to work out whether you use Key

00:08:03.519 --> 00:08:05.379
Vault, whether you use Payment HSM, whether you

00:08:05.379 --> 00:08:07.740
use Cloud HSM, or whether you use Managed HSM.

00:08:07.740 --> 00:08:11.160
But essentially, Cloud HSM is your own piece of

00:08:11.160 --> 00:08:13.639
hardware, which is the same as Dedicated HSM,

00:08:13.639 --> 00:08:16.959
which, you know, Cloud HSM is replacing. Next

00:08:16.959 --> 00:08:21.560
one is PostgreSQL now allows Power BI to connect

00:08:21.560 --> 00:08:24.079
to it using a Microsoft account. In other words,

00:08:24.079 --> 00:08:26.360
Microsoft Entra ID. This is really, really cool

00:08:26.360 --> 00:08:28.120
because that way you don't have to use a username

00:08:28.120 --> 00:08:30.160
and password. For those of you who've been following

00:08:30.160 --> 00:08:32.620
the Secure Future Initiative, you will all know

00:08:32.620 --> 00:08:34.360
that one of the big things that we're doing is

00:08:34.360 --> 00:08:36.080
getting rid of usernames and passwords, or

00:08:36.080 --> 00:08:38.480
keys or tokens that are used for authentication

00:08:38.480 --> 00:08:41.559
and potentially authorization. So this is great

00:08:41.559 --> 00:08:43.799
to see. Again, we're seeing a lot of products

00:08:43.799 --> 00:08:46.360
move away from using usernames and passwords

00:08:46.360 --> 00:08:49.259
to using Entra ID. That way, the credential is

00:08:49.259 --> 00:08:51.899
not in the firing line in the case of an attack.

00:08:52.480 --> 00:08:55.200
All right, so that's my news. So with that out

00:08:55.200 --> 00:08:57.460
of the way, let's turn our attention to our guest.

00:08:57.480 --> 00:09:00.679
As I mentioned at the top of this episode, our

00:09:00.679 --> 00:09:03.100
guest this week is Mark Kendrick, who

00:09:03.100 --> 00:09:04.919
is here to talk to us about Microsoft Sentinel

00:09:04.919 --> 00:09:08.360
Data Lake. So Mark, thank you so much for joining

00:09:08.360 --> 00:09:09.899
us this week. Would you like to take a moment and

00:09:09.899 --> 00:09:12.200
introduce yourself to our listeners? Certainly.

00:09:12.779 --> 00:09:15.960
So I have been at Microsoft, oh gracious now,

00:09:16.039 --> 00:09:19.679
approaching five years. I came into Microsoft

00:09:19.679 --> 00:09:22.379
with the RiskIQ acquisition. Prior to that,

00:09:22.460 --> 00:09:25.120
spent a lot of time in the threat intelligence

00:09:25.120 --> 00:09:29.500
space, working with large organizations,

00:09:30.019 --> 00:09:33.200
actually doing some things that are somewhat

00:09:33.200 --> 00:09:36.240
similar to the topic at hand, some early iterations

00:09:36.240 --> 00:09:38.360
of trying to solve that. So that was one of the

00:09:38.360 --> 00:09:40.440
reasons why when I had an opportunity to be involved

00:09:40.440 --> 00:09:42.500
in Sentinel Data Lake, I certainly raised my

00:09:42.500 --> 00:09:45.830
hand for that. Last few years here at Microsoft

00:09:45.830 --> 00:09:49.230
have been super exciting, really busy. There's

00:09:49.230 --> 00:09:51.149
a lot of this AI thing going on, in case anybody

00:09:51.149 --> 00:09:54.070
might not have noticed. I was actually incredibly

00:09:54.070 --> 00:09:56.070
fortunate to be part of the founding product

00:09:56.070 --> 00:09:58.190
management team that helped bring Security Copilot

00:09:58.190 --> 00:10:02.190
to market. And that was a tremendous learning

00:10:02.190 --> 00:10:05.470
experience for all of us, a tremendous amount

00:10:05.470 --> 00:10:09.190
of potential then and now. It's hard to believe

00:10:09.190 --> 00:10:13.379
that that was a mere 24 months back, which is

00:10:13.379 --> 00:10:18.200
basically 24 years in AI terms. But that has

00:10:18.200 --> 00:10:21.899
sort of led me to the present moment where I

00:10:21.899 --> 00:10:25.120
am now a principal product manager focused on

00:10:25.120 --> 00:10:28.720
the Microsoft Sentinel Data Lake, working in

00:10:28.720 --> 00:10:32.500
our customer experience engineering team, aligned

00:10:32.500 --> 00:10:35.820
side by side with our product engineering teams

00:10:35.820 --> 00:10:38.659
as an extension of that team, really helping to

00:10:38.659 --> 00:10:43.500
bring the customer's voice into the product management

00:10:43.500 --> 00:10:47.620
process here. and also really just learning a

00:10:47.620 --> 00:10:50.700
lot about what our customer expectations are

00:10:50.700 --> 00:10:53.379
for data management, what some of the opportunities

00:10:53.379 --> 00:10:56.980
are, and how we can begin to address that always

00:10:56.980 --> 00:11:00.240
with an eye towards the advances that are happening

00:11:00.240 --> 00:11:03.980
in AI. So super exciting space, and yeah. All

00:11:03.980 --> 00:11:06.580
right, let's start with the most obvious question,

00:11:06.759 --> 00:11:09.340
at least the obvious question to me. I spent quite

00:11:09.340 --> 00:11:12.759
a bit of time in Azure Data, so working on SQL

00:11:12.759 --> 00:11:16.980
and Cosmos DB and so on. I heard the word data

00:11:16.980 --> 00:11:20.960
lake thrown around, but as a security nerd, I

00:11:20.960 --> 00:11:23.440
still do not know what a data lake is. So can

00:11:23.440 --> 00:11:26.700
you just start with that? What is a data lake

00:11:26.700 --> 00:11:28.600
and why should anybody care before we actually

00:11:28.600 --> 00:11:30.840
move on to the security ramifications of all

00:11:30.840 --> 00:11:35.600
this stuff? There's a concept that has persisted

00:11:35.600 --> 00:11:39.149
for a while of... of creating an infrastructure

00:11:39.149 --> 00:11:41.970
that allows us to retain as much security data

00:11:41.970 --> 00:11:44.490
as possible. And this applies, I said security

00:11:44.490 --> 00:11:46.990
data, it really applies even beyond that. How

00:11:46.990 --> 00:11:50.610
can we be less selective in the data that we

00:11:50.610 --> 00:11:54.549
want to ingest? And one of the things that is

00:11:54.549 --> 00:11:58.169
a potential solution to that is something that

00:11:58.169 --> 00:12:01.990
would natively leverage the low-cost storage

00:12:01.990 --> 00:12:05.399
available with cloud infrastructure. The concept

00:12:05.399 --> 00:12:09.480
of a data lake generally has come to mean, at

00:12:09.480 --> 00:12:12.500
least at a foundational level, a place where

00:12:12.500 --> 00:12:16.960
you can inexpensively ingest and store a tremendous

00:12:16.960 --> 00:12:20.460
amount of data, and take advantage of the cloud

00:12:20.460 --> 00:12:24.000
scale abilities to do that. Over time, the architecture

00:12:24.000 --> 00:12:26.399
has evolved a little bit. In the business world,

00:12:26.820 --> 00:12:30.840
a data lake is perhaps a place that combines

00:12:30.840 --> 00:12:34.779
both transaction data as well as analytical data.

00:12:35.059 --> 00:12:40.320
It is an intentional rendering of data for specific

00:12:40.320 --> 00:12:44.100
business outcomes. You see some patterns emerging

00:12:44.100 --> 00:12:47.919
when people interact with a data lake. Usually

00:12:47.919 --> 00:12:51.100
you might find some data scientists doing some

00:12:51.100 --> 00:12:54.320
really great stuff with machine learning algorithms

00:12:54.320 --> 00:12:57.659
running maybe in PySpark notebooks. So you have

00:12:57.659 --> 00:13:00.179
that sort of activity happening on the business

00:13:00.179 --> 00:13:05.200
side. There has been a recognition over time

00:13:05.200 --> 00:13:08.259
that, as I said at the outset, definitely want

00:13:08.259 --> 00:13:10.899
to store a lot of data for security. I think

00:13:10.899 --> 00:13:13.179
in my history, I've seen people do all kinds

00:13:13.179 --> 00:13:17.799
of ways to try to solve that problem. I remember

00:13:17.799 --> 00:13:20.340
Purple Rain, I think, was one that stands out

00:13:20.340 --> 00:13:23.139
to me. And that was the name of the home-built

00:13:23.139 --> 00:13:25.799
data platform at a major financial institution

00:13:25.799 --> 00:13:28.220
that they had created, presumably because the

00:13:28.220 --> 00:13:31.259
architect was a Prince fan. But learning about

00:13:31.259 --> 00:13:33.360
all the different names for these infrastructures,

00:13:33.379 --> 00:13:34.940
learning about how they ended up building these

00:13:34.940 --> 00:13:38.120
things, we had like the ELK Stack, Logstash,

00:13:38.200 --> 00:13:40.779
we had all these sorts of things over the history

00:13:40.779 --> 00:13:45.379
of everything. And it never did really quite...

00:13:46.000 --> 00:13:48.700
meet the expectations that folks really had.

00:13:48.860 --> 00:13:51.580
And so the most recent manifestation of that

00:13:51.580 --> 00:13:54.039
is something you might have heard of Databricks,

00:13:54.159 --> 00:13:57.600
for example. Databricks is actually a project

00:13:57.600 --> 00:13:59.580
of which one of the core founders is somebody

00:13:59.580 --> 00:14:02.720
who is a security incident responder. And so...

00:14:02.960 --> 00:14:05.980
That has kind of taken us to the present moment

00:14:05.980 --> 00:14:09.820
where a data lake is actually a really well-established

00:14:09.820 --> 00:14:13.120
concept within a lot of organizations for business

00:14:13.120 --> 00:14:16.879
analytics. And now we have an opportunity to

00:14:16.879 --> 00:14:21.320
apply that directly here within security. So,

00:14:21.320 --> 00:14:23.159
Michael, does that somewhat answer your question

00:14:23.159 --> 00:14:25.559
or does it create more questions? No, no, it

00:14:25.559 --> 00:14:28.200
doesn't. So basically, it's just a dumping

00:14:28.200 --> 00:14:30.679
ground, essentially a cost-effective dumping

00:14:30.679 --> 00:14:33.269
ground for data. You know, ironically, that is

00:14:33.269 --> 00:14:37.090
actually the right way to describe it. Even though

00:14:37.090 --> 00:14:38.649
it sounds somewhat pejorative, it is actually

00:14:38.649 --> 00:14:42.549
accurate. Because whereas previously, we kind

00:14:42.549 --> 00:14:44.700
of do a lot of... I see teams doing a lot of

00:14:44.700 --> 00:14:46.399
hand -wringing about, oh, goodness, well, we

00:14:46.399 --> 00:14:49.720
can't really store that data. We can't ingest

00:14:49.720 --> 00:14:51.940
that data. Maybe it's too expensive. Or, okay,

00:14:52.000 --> 00:14:53.580
we can ingest that data, but we need to cut out

00:14:53.580 --> 00:14:55.559
all these columns, and hopefully we don't need

00:14:55.559 --> 00:14:59.179
them later. Or we can ingest that data, but we

00:14:59.179 --> 00:15:02.159
can't retain it longer than 30 days. And maybe

00:15:02.159 --> 00:15:04.080
once they get it in, maybe they do some complex

00:15:04.080 --> 00:15:06.360
transformations on it and some other things.

00:15:07.240 --> 00:15:09.779
But for the most part, they're trying to essentially...

00:15:10.579 --> 00:15:15.000
prune and route and condense and generally mangle,

00:15:15.080 --> 00:15:18.019
frankly, the data before it even goes in. The

00:15:18.019 --> 00:15:20.179
architecture of a data lake, again, because it's

00:15:20.179 --> 00:15:24.679
built on low-cost cloud storage at the deepest

00:15:24.679 --> 00:15:28.899
level, it's actually typically Delta Parquet

00:15:28.899 --> 00:15:32.659
files on, say, something like ADLS Gen2, Azure Data

00:15:32.659 --> 00:15:37.960
Lake Storage, the architecture actually encourages

00:15:37.960 --> 00:15:41.200
you to just ingest everything, ingest it raw.

00:15:41.600 --> 00:15:43.840
This also allows you to scale because sometimes

00:15:43.840 --> 00:15:46.960
transformations or any of the other things that

00:15:46.960 --> 00:15:49.820
we do, parsing, normalization, if we try to do

00:15:49.820 --> 00:15:51.639
those things in ingestion time, we start creating

00:15:51.639 --> 00:15:54.970
bottlenecks. We start creating... complex infrastructure

00:15:54.970 --> 00:15:57.909
because we want to make sure we have enough scale

00:15:57.909 --> 00:16:00.690
to not drop records. It gets really messy. The

00:16:00.690 --> 00:16:04.210
idea with the proverbial dumping ground, as you

00:16:04.210 --> 00:16:07.809
said here, is let's ingest it raw and then let's

00:16:07.809 --> 00:16:10.629
have that single copy of the data that we

00:16:10.629 --> 00:16:12.429
access through different modalities, through

00:16:12.429 --> 00:16:15.509
different ways of querying into it. And we apply

00:16:15.509 --> 00:16:18.049
transformations directly on the data in place

00:16:18.049 --> 00:16:20.409
that maybe creates new aggregations of that data.

00:16:20.850 --> 00:16:23.210
Folks can maybe do some research on the medallion

00:16:23.210 --> 00:16:26.250
architecture. That's less relevant necessarily

00:16:26.250 --> 00:16:29.330
to the Sentinel data lake, but it does give

00:16:29.330 --> 00:16:32.529
you some foundational ideas about how that architecture

00:16:32.529 --> 00:16:35.529
is. And that progressive rendering or progressive

00:16:35.529 --> 00:16:39.730
improvement of the data over time. is really

00:16:39.730 --> 00:16:43.429
the future. Instead of looking at tiering your

00:16:43.429 --> 00:16:45.610
data or putting it into different storage levels

00:16:45.610 --> 00:16:47.490
and all this, just bring it all into the lake

00:16:47.490 --> 00:16:51.789
and then work with it once it's there. I was

00:16:51.789 --> 00:16:54.509
really excited to see this before it was coming

00:16:54.509 --> 00:16:56.870
and then as it launched into public preview.

00:16:57.769 --> 00:17:00.669
One of the recommendations that we have when

00:17:00.669 --> 00:17:02.370
we talk about the security operations workshops

00:17:02.370 --> 00:17:05.509
and whatnot and the best practices that we put

00:17:05.509 --> 00:17:08.240
out to customers is really kind of a three-part

00:17:08.240 --> 00:17:11.140
strategy, right? You want XDR, you know, start

00:17:11.140 --> 00:17:13.359
as EDR, and then we realize it's more than endpoint,

00:17:13.559 --> 00:17:15.380
it's identity, it's cloud, it's everything else.

00:17:15.660 --> 00:17:18.000
But you need to have like an XDR because, hey,

00:17:18.099 --> 00:17:21.720
if a Microsoft or a CrowdStrike or a Proofpoint

00:17:21.720 --> 00:17:23.480
or whoever is, you know, going to engineer and

00:17:23.480 --> 00:17:26.019
support a bunch of detections, take those. You

00:17:26.019 --> 00:17:27.079
know, you don't have to engineer them, you don't have

00:17:27.079 --> 00:17:29.680
to reinvent the wheel. And then a SIEM, because

00:17:29.680 --> 00:17:31.559
you need to do analytics to promote other stuff

00:17:31.559 --> 00:17:34.509
to that level and do custom detections. And then

00:17:34.509 --> 00:17:36.430
the problem is a SIEM is expensive, right? Whether

00:17:36.430 --> 00:17:39.009
you run it on-prem or whether you do it in the

00:17:39.009 --> 00:17:42.250
cloud, you need a cheap dumping ground, right?

00:17:42.289 --> 00:17:44.470
You need a data lake. And so XDR, SIEM, data lake

00:17:44.470 --> 00:17:47.490
is what we recommend. Because the worst thing

00:17:47.490 --> 00:17:50.910
you can do is have the data or have the ability

00:17:50.910 --> 00:17:53.930
to get the data and then not have it. And so

00:17:53.930 --> 00:17:58.509
I loved to see the launch of the Sentinel data

00:17:58.509 --> 00:18:00.349
lake, especially the close integration within

00:18:00.349 --> 00:18:03.410
Sentinel. Because that just makes life so much

00:18:03.410 --> 00:18:05.630
easier for our customers because they can work

00:18:05.630 --> 00:18:08.549
with one console and get the benefit of all these

00:18:08.549 --> 00:18:11.200
different types of analytics. and not have to

00:18:11.200 --> 00:18:14.220
do the swivel chair analytics. That was just

00:18:14.220 --> 00:18:15.740
me. I guess it was more of a comment than a question.

00:18:15.859 --> 00:18:19.079
I was really excited. I'm curious, is it just

00:18:19.079 --> 00:18:21.599
the cheap dumping ground? Do you have any other

00:18:21.599 --> 00:18:23.900
comments on the problem it's aiming to solve?

00:18:24.019 --> 00:18:26.259
I want to get your thoughts on that. Yeah, certainly.

00:18:26.759 --> 00:18:32.940
I think the fundamental step for us here is this

00:18:32.940 --> 00:18:36.819
is a platform play. By that, I mean we are...

00:18:37.289 --> 00:18:42.109
Sentinel has kind of always been a platform in

00:18:42.109 --> 00:18:46.569
practice, but maybe not quite as overt or intentional

00:18:46.569 --> 00:18:49.490
as what we could make it. And we're really leaning

00:18:49.490 --> 00:18:53.809
into that. And so having Sentinel as that data

00:18:53.809 --> 00:18:57.569
platform, that's the sort of foundational step

00:18:57.569 --> 00:19:02.289
and providing this foundational capability of

00:19:02.289 --> 00:19:05.710
low-cost storage and ingestion, but doing so

00:19:06.140 --> 00:19:09.460
right next to all of the capabilities that Sentinel

00:19:09.460 --> 00:19:13.099
has today. So on day one, one of the first things

00:19:13.099 --> 00:19:16.119
customers usually ask me about is, hey, how do

00:19:16.119 --> 00:19:18.099
I get data in the data lake? I've got some data

00:19:18.099 --> 00:19:20.240
going into Sentinel. I want to go put it in the

00:19:20.240 --> 00:19:22.960
data lake. Well, hey, guess what? As soon as

00:19:22.960 --> 00:19:26.720
you onboard to the data lake, then your data

00:19:26.720 --> 00:19:29.680
that is going into Sentinel is actually mirrored

00:19:29.680 --> 00:19:33.150
in real time into the data lake. So all of your

00:19:33.150 --> 00:19:35.529
existing connectors, all of your existing infrastructures,

00:19:35.529 --> 00:19:38.450
everything that you have for getting data into

00:19:38.450 --> 00:19:42.289
Sentinel, we're reusing that, essentially, and

00:19:42.289 --> 00:19:44.750
routing that data into the data lake. So on day

00:19:44.750 --> 00:19:48.730
one, you get some interesting benefits. So one

00:19:48.730 --> 00:19:50.750
of the things, of course, is all of your existing

00:19:50.750 --> 00:19:53.410
Sentinel bits, everything that is the SIEM in

00:19:53.410 --> 00:19:55.450
Sentinel, all that continues to function as expected.

00:19:56.269 --> 00:19:58.609
But what you now get is this data that's starting

00:19:58.609 --> 00:20:01.569
to collect in the data lake. And because of the

00:20:01.569 --> 00:20:03.589
architecture, the underlying architecture of

00:20:03.589 --> 00:20:06.609
the data lake, you can query that data through

00:20:06.609 --> 00:20:10.250
multiple different modalities, meaning you can

00:20:10.250 --> 00:20:13.509
run KQL queries on the data in the lake. Our

00:20:13.509 --> 00:20:16.569
friends maybe who are more familiar with running

00:20:16.569 --> 00:20:20.089
Spark jobs over in, maybe as I said, in data

00:20:20.089 --> 00:20:22.950
analytics world, those are ready to be used on

00:20:22.950 --> 00:20:26.400
the lake directly. You know, Michael, you mentioned

00:20:26.400 --> 00:20:28.619
some of your experience in data world there.

00:20:28.660 --> 00:20:31.480
If you're just missing SQL and you've just been

00:20:31.480 --> 00:20:33.660
doing KQL for just way too long, you really want

00:20:33.660 --> 00:20:36.700
to just write some SQL. Well, now or at least

00:20:36.700 --> 00:20:39.099
some point very near in the future, you can write

00:20:39.099 --> 00:20:41.599
SQL against the data lake. And I'm being a little

00:20:41.599 --> 00:20:44.970
silly there, but... A very common use case for

00:20:44.970 --> 00:20:48.950
our customers is providing visibility to their

00:20:48.950 --> 00:20:51.589
security operations, providing reporting and

00:20:51.589 --> 00:20:55.769
telemetry. And that can often result in even

00:20:55.769 --> 00:20:57.769
just a very practical level. You have some practical

00:20:57.769 --> 00:20:59.990
challenges of, well, hold on, I've got this great

00:20:59.990 --> 00:21:02.549
little Sentinel workbooks feature. It shows some

00:21:02.549 --> 00:21:04.130
really cool dashboards. I don't really want to

00:21:04.130 --> 00:21:07.609
give, you know, my CFO or my CEO Security Reader

00:21:07.609 --> 00:21:11.049
because that's terrifying. And so what do I do

00:21:11.049 --> 00:21:13.109
instead? Well, I'd love to use Power BI. Great

00:21:13.109 --> 00:21:15.210
solution, but now how do you make that connection

00:21:15.210 --> 00:21:18.730
into there? Well, because the data lake is on

00:21:18.730 --> 00:21:21.089
an architecture optimally suited for that, your

00:21:21.089 --> 00:21:23.089
data is actually ready to be queried in that

00:21:23.089 --> 00:21:25.670
modality right away. And those are just, of course,

00:21:25.690 --> 00:21:28.289
the traditional ways. As we start looking at

00:21:28.289 --> 00:21:34.240
our evolution to AI, as we start... bringing

00:21:34.240 --> 00:21:38.319
on new agentic experiences. All of those things

00:21:38.319 --> 00:21:41.019
that can query into the lake become other different

00:21:41.019 --> 00:21:43.740
ways of accessing it. So all that begins. Now

00:21:43.740 --> 00:21:48.339
that presumes that the data that we have going

00:21:48.339 --> 00:21:51.339
into the SIEM is maybe all the data that we need,

00:21:51.420 --> 00:21:53.200
but of course we know it's not. We know that

00:21:53.200 --> 00:21:56.259
there's... Within organizations, very often,

00:21:56.359 --> 00:22:00.500
there is data that we know would afford really

00:22:00.500 --> 00:22:04.559
useful security insights, but we just maybe can't

00:22:04.559 --> 00:22:08.539
quite justify and make the business case for

00:22:08.539 --> 00:22:11.160
ingesting it into a full-on SIEM. But again,

00:22:11.240 --> 00:22:13.539
we know it's useful. Historically, that's led

00:22:13.539 --> 00:22:16.119
us to essentially fork our infrastructure to

00:22:16.119 --> 00:22:20.160
go put that data someplace else. Some organizations

00:22:20.160 --> 00:22:23.730
might leave the data in an on-prem SIEM, or an

00:22:23.730 --> 00:22:26.849
on-prem data platform. Maybe you have, I mentioned

00:22:26.849 --> 00:22:29.230
Logstash earlier, maybe there's some other open

00:22:29.230 --> 00:22:33.349
source tools that you've deployed. So maybe that's

00:22:33.349 --> 00:22:35.910
a solution. But however those solutions work,

00:22:36.049 --> 00:22:38.150
when you have the data in two separate places,

00:22:38.210 --> 00:22:41.049
you miss out on opportunities for what I think

00:22:41.049 --> 00:22:44.859
of as data adjacency. You know, I actually was

00:22:44.859 --> 00:22:47.579
talking with a customer recently where they were

00:22:47.579 --> 00:22:50.180
saying that they had some data in Azure Data

00:22:50.180 --> 00:22:52.619
Explorer and they had some data in Sentinel and

00:22:52.619 --> 00:22:54.940
then they had some data in another traditional

00:22:54.940 --> 00:22:59.039
SIEM, cloud SIEM. And I'm like, wow, that's all

00:22:59.039 --> 00:23:01.019
really useful security data. How do you join

00:23:01.019 --> 00:23:04.220
that together? And there was silence on the call.

00:23:04.339 --> 00:23:07.480
And finally, one courageous soul spoke up and

00:23:07.480 --> 00:23:11.990
admitted: Excel, which was just tragic. So I can

00:23:11.990 --> 00:23:14.210
just imagine trying to do joins in Excel with

00:23:14.210 --> 00:23:16.210
VLOOKUPs. But this is the reality that so many

00:23:16.210 --> 00:23:18.789
teams are in, and that's the reality that's changing

00:23:18.789 --> 00:23:23.130
with Microsoft Sentinel data lake. So now, if

00:23:23.130 --> 00:23:26.170
you are at that stage where you're just not quite

00:23:26.170 --> 00:23:28.130
ready to make the business case to apply full

00:23:28.130 --> 00:23:30.329
analytics to a given data set, let's just pick,

00:23:30.369 --> 00:23:33.910
say, DNS logs, then you can instead bring that

00:23:33.910 --> 00:23:36.559
data in directly to the data lake. You can do

00:23:36.559 --> 00:23:38.440
so with the existing connectors and the existing

00:23:38.440 --> 00:23:41.500
infrastructure inside of Sentinel. Once that's

00:23:41.500 --> 00:23:44.119
in the data lake, you can query it using KQL

00:23:44.119 --> 00:23:45.759
that you're already familiar with. You can do

00:23:45.759 --> 00:23:48.359
that inside the XDR portal. So there's a lake

00:23:48.359 --> 00:23:51.640
explorer integrated into that experience. And

00:23:51.640 --> 00:23:53.960
because you also have that adjacent to the rest

00:23:53.960 --> 00:23:56.359
of the data that's there, everything begins to

00:23:56.359 --> 00:23:59.519
sort of come together. So that is really what

00:23:59.519 --> 00:24:03.619
this is starting to bring together in that structure.

00:24:04.490 --> 00:24:07.609
So, Mark, this might sound like an obvious question,

00:24:07.730 --> 00:24:13.849
but do we have to have a Sentinel to use Data

00:24:13.849 --> 00:24:16.329
Lake? I mean, you've talked about running KQL

00:24:16.329 --> 00:24:20.329
across it. Do they have to operate with each

00:24:20.329 --> 00:24:25.950
other? Tell me all the things, just for clarity's

00:24:25.950 --> 00:24:28.450
sake. Yeah, no, that's a good question. And we

00:24:28.450 --> 00:24:30.289
do have some folks that have asked that question

00:24:30.289 --> 00:24:34.319
before. The key thing is to realize the way in

00:24:34.319 --> 00:24:38.779
which Microsoft is evolving the concept of Microsoft

00:24:38.779 --> 00:24:43.619
Sentinel. Microsoft Sentinel is our security

00:24:43.619 --> 00:24:47.059
data platform, and you will continue to see that

00:24:47.730 --> 00:24:50.869
coming into and converging with our other solutions.

00:24:51.490 --> 00:24:55.349
So whereas a lot of us that know and love Sentinel

00:24:55.349 --> 00:24:59.190
inside of Azure, and we use that term to describe

00:24:59.190 --> 00:25:02.849
that specific experience, then it might seem

00:25:02.849 --> 00:25:04.869
a little peculiar, like, what do I have to have

00:25:04.869 --> 00:25:07.250
that to just have the data lake? But really,

00:25:07.309 --> 00:25:09.829
we're evolving not just the product, but also

00:25:09.829 --> 00:25:12.009
how we communicate about this and indeed how

00:25:12.009 --> 00:25:14.930
we build it. So the direction that we are heading

00:25:14.930 --> 00:25:19.230
is that if you want to benefit from the things

00:25:19.230 --> 00:25:21.509
that we're talking about here, from the data

00:25:21.509 --> 00:25:25.869
lake, then you'll want to activate the Microsoft

00:25:25.869 --> 00:25:28.190
Sentinel data lake, and you'll want to bring

00:25:28.190 --> 00:25:30.910
that data in through the Microsoft Sentinel connectors.

00:25:31.529 --> 00:25:33.869
You do, of course, have the option of bringing

00:25:33.869 --> 00:25:36.710
that data directly into the data lake only, and

00:25:36.710 --> 00:25:39.109
then maybe making a decision later that you want

00:25:39.109 --> 00:25:41.589
to present it for the real-time analytics in

00:25:41.589 --> 00:25:45.460
the SIEM. So in that way, that... element of it

00:25:45.460 --> 00:25:49.799
is mostly optional, but of course highly recommended,

00:25:49.799 --> 00:25:53.480
and it also gives you the option, in fact, of even

00:25:53.480 --> 00:25:56.220
taking the result of your queries and presenting

00:25:56.220 --> 00:25:59.559
those directly into the analytics tier, as we

00:25:59.559 --> 00:26:02.420
call it, within Sentinel, what we know and

00:26:02.420 --> 00:26:05.000
love as Sentinel today. And so it's the adjacency

00:26:05.000 --> 00:26:07.640
of those two technical experiences, connected

00:26:07.640 --> 00:26:11.240
together, unified together into that, that really

00:26:11.240 --> 00:26:13.880
makes the difference. And again, the experience

00:26:13.880 --> 00:26:16.019
of this is going to be happening inside of

00:26:16.019 --> 00:26:17.799
security.microsoft.com. It's going to be happening inside

00:26:17.799 --> 00:26:22.779
of the Defender portal, and it is just an extension

00:26:22.779 --> 00:26:27.039
of that overall experience. So, Mark, one thing

00:26:27.039 --> 00:26:29.319
I wanted to sort of run by you, because I had

00:26:29.319 --> 00:26:32.720
this sort of interesting insight as I was helping

00:26:32.720 --> 00:26:35.480
write the standards that I mentioned in the earlier

00:26:35.480 --> 00:26:37.970
news section. Because we were basically going

00:26:37.970 --> 00:26:40.250
through, for each of the different SOC roles

00:26:40.250 --> 00:26:42.470
and organizational leader roles, here's your

00:26:42.470 --> 00:26:45.289
job, and if you don't do this, then here's the

00:26:45.289 --> 00:26:47.930
risk of neglect. What goes wrong? And it really

00:26:47.930 --> 00:26:49.970
helps bring things in a real clear perspective

00:26:49.970 --> 00:26:53.690
of why it's important to do something. And one

00:26:53.690 --> 00:26:55.369
of the things that we were talking about is,

00:26:55.390 --> 00:26:58.569
hey, SOC should be reviewing... the solutions

00:26:58.569 --> 00:27:00.529
that go through your architecture review board

00:27:00.529 --> 00:27:02.089
or your solution review board, whatever you call

00:27:02.089 --> 00:27:04.910
it, as different folks review it to make sure

00:27:04.910 --> 00:27:06.210
that this thing isn't going to break something

00:27:06.210 --> 00:27:09.250
in your enterprise. And of course, one of the

00:27:09.250 --> 00:27:12.769
big things there is logs. And one of the interesting

00:27:12.769 --> 00:27:16.789
insights that I had on why logs are so important

00:27:16.789 --> 00:27:20.430
is as we're writing that risk of neglect out,

00:27:20.549 --> 00:27:23.220
I realized you can't... actually do root cause

00:27:23.220 --> 00:27:25.299
analysis without logs. So if you don't turn the

00:27:25.299 --> 00:27:27.579
logs on, if you don't have the logs in, you know,

00:27:27.579 --> 00:27:29.200
like a Sentinel data lake or something else, or

00:27:29.200 --> 00:27:32.519
just anywhere, when you have an incident, then

00:27:32.519 --> 00:27:34.380
one, you can't detect it, and it's really hard

00:27:34.380 --> 00:27:36.700
to investigate, and you can't do a root cause

00:27:36.700 --> 00:27:38.779
analysis of it. And you're like, well, something

00:27:38.779 --> 00:27:41.680
happened, we don't know what or why or how. And

00:27:41.680 --> 00:27:43.279
then, so you can't do the root cause analysis,

00:27:43.279 --> 00:27:44.920
you don't know actually what happened or how

00:27:44.920 --> 00:27:47.500
it happened, so you can't prevent the next version

00:27:47.500 --> 00:27:52.039
of the exact same attack. And so, you know, assuming

00:27:52.039 --> 00:27:53.819
you do turn it on after that first one because

00:27:53.819 --> 00:27:56.160
you learned your lesson, you know, then you can

00:27:56.160 --> 00:27:59.059
actually start to do root cause analysis and

00:27:59.059 --> 00:28:00.960
then, you know, prevent that and block whatever

00:28:00.960 --> 00:28:03.319
happened that first time. And so it was really

00:28:03.319 --> 00:28:06.019
interesting that I didn't quite realize that

00:28:06.019 --> 00:28:08.920
when you don't have logs, you're signing up for

00:28:08.920 --> 00:28:12.400
at least two more incidents because you simply

00:28:12.400 --> 00:28:14.000
don't understand what happened the first time.

00:28:14.039 --> 00:28:16.319
And so you're bound to have the second one because

00:28:16.319 --> 00:28:18.000
the attackers always try the same thing over

00:28:18.000 --> 00:28:20.400
and over again. And so I'm just curious on your

00:28:20.400 --> 00:28:22.059
thoughts on that. But that was one of those things

00:28:22.059 --> 00:28:23.920
that really sort of came home on how important

00:28:23.920 --> 00:28:29.660
logs are. Absolutely. Our good friend, good fellow,

00:28:29.859 --> 00:28:33.480
John Lambert, is fond of saying that it's all

00:28:33.480 --> 00:28:35.799
security data, right? People sometimes look at

00:28:35.799 --> 00:28:37.680
it like, well, is that log security data? Well,

00:28:37.859 --> 00:28:40.519
it might be someday. Sorry, but it might be.

00:28:41.299 --> 00:28:45.480
So you're exactly right in that the... The inability

00:28:45.480 --> 00:28:51.400
to have the logs that you need has a direct impact

00:28:51.400 --> 00:28:54.259
on your ability to do blast radius assessment.

00:28:54.700 --> 00:28:59.180
It has a direct correlation, as you said, to

00:28:59.180 --> 00:29:03.960
prevention. And it, of course, also for what

00:29:03.960 --> 00:29:07.819
I think of as historical detections, right? Finding

00:29:07.819 --> 00:29:10.000
the evil that you did not know was evil that

00:29:10.000 --> 00:29:13.119
actually happened six months ago when somebody

00:29:13.119 --> 00:29:17.140
clicked on something and experienced an event.

00:29:18.559 --> 00:29:20.779
Because it was a novel event or maybe because

00:29:20.779 --> 00:29:24.539
they hadn't had, Mark, enough of your sessions

00:29:24.539 --> 00:29:26.019
that they were listening to. So maybe there were

00:29:26.019 --> 00:29:28.619
some gaps in their protection. There were some

00:29:28.619 --> 00:29:30.599
pieces that maybe they hadn't deployed yet. But

00:29:30.599 --> 00:29:32.920
that happened six months ago. Nobody knows that.

00:29:33.019 --> 00:29:35.119
Nobody knows maybe at this point that it is a

00:29:35.119 --> 00:29:36.900
problem. But now suddenly there's a publication

00:29:36.900 --> 00:29:39.400
that comes out and says, hey, there's this terrible

00:29:39.400 --> 00:29:41.339
thing called password sprays. They're scary.

00:29:41.740 --> 00:29:44.079
Here's how you go look for them. Or even if it's

00:29:44.079 --> 00:29:47.440
something as simple as: here's a list of five

00:29:47.440 --> 00:29:50.539
IP addresses, three hashes, and two host names.

00:29:51.039 --> 00:29:53.759
Just go see if your organization ever interacted

00:29:53.759 --> 00:29:56.240
with any of these things, ever, in any context,

00:29:56.579 --> 00:30:00.000
any time over the last six months. Well, that

00:30:00.000 --> 00:30:02.609
all sounds... wonderful on paper, but if you've

00:30:02.609 --> 00:30:04.589
ever had to actually do that, you quickly encounter,

00:30:04.910 --> 00:30:07.509
it's like, well, hey, guess what? We were gathering

00:30:07.509 --> 00:30:10.009
our DNS logs, we were gathering our firewall

00:30:10.009 --> 00:30:14.210
logs in our SIEM, but they aged out after 30 days.

00:30:14.890 --> 00:30:17.029
And oh, actually, somebody was doing some really

00:30:17.029 --> 00:30:21.150
cool projects around some machine learning, around

00:30:21.150 --> 00:30:24.349
files in SharePoint, and it was really kind of

00:30:24.349 --> 00:30:26.609
cool, but it was expensive to maintain that cluster

00:30:26.609 --> 00:30:28.009
and they didn't really see an outcome from it,

00:30:28.029 --> 00:30:31.099
so yeah, that data's not there anymore. And the

00:30:31.099 --> 00:30:35.140
end result is you find that you're actually impacting

00:30:35.140 --> 00:30:38.519
your ability, as I said, to do retroactive detections,

00:30:38.519 --> 00:30:41.079
to search historically. We sometimes talk about

00:30:41.079 --> 00:30:45.259
threat hunting in the abstract, and sometimes

00:30:45.259 --> 00:30:47.740
it can be. You just have a hypothesis of something

00:30:47.740 --> 00:30:50.180
that maybe you can go find some evil. That's

00:30:50.180 --> 00:30:52.299
absolutely useful, and having a data lake to

00:30:52.299 --> 00:30:54.759
help you with that and store the data is good.

00:30:55.460 --> 00:30:57.759
But sometimes threat hunting is actually fairly

00:30:57.759 --> 00:31:00.539
targeted. You know the IOCs you're looking for.

00:31:00.680 --> 00:31:03.420
The question is whether or not you have the data

00:31:03.420 --> 00:31:08.119
for it. And so our vision with the Microsoft

00:31:08.119 --> 00:31:10.940
Sentinel Data Lake is to go directly after that

00:31:10.940 --> 00:31:14.839
problem and provide low-cost retention, provide

00:31:14.839 --> 00:31:20.240
something that is operating adjacent to the rest

00:31:20.240 --> 00:31:22.660
of your infrastructure, and allows you to...

00:31:23.039 --> 00:31:26.859
to sort of incrementally build the case to ingest

00:31:26.859 --> 00:31:29.619
that data. And I think maybe sometimes that's

00:31:29.619 --> 00:31:31.819
also why folks don't have the visibility that

00:31:31.819 --> 00:31:36.819
they need is because they know that it might

00:31:36.819 --> 00:31:39.000
be useful, but we all want to be good stewards

00:31:39.000 --> 00:31:42.460
of our organization's budgets. And if I'm going

00:31:42.460 --> 00:31:44.519
to go and say, hey, I want to ingest these DNS

00:31:44.519 --> 00:31:47.960
logs, but... They're like, okay, great. What

00:31:47.960 --> 00:31:49.279
are the detections that are going to surface?

00:31:49.500 --> 00:31:51.640
How is this going to inform my existing

00:31:51.640 --> 00:31:53.819
real-time detections? Well, I'm pretty confident

00:31:53.819 --> 00:31:55.779
that it's going to do that. I'm pretty sure I

00:31:55.779 --> 00:31:57.519
can show that case, but it's going to take me

00:31:57.519 --> 00:32:00.180
a little while. And you get into these poultry

00:32:00.180 --> 00:32:02.000
sequencing problems, chicken or the egg thing,

00:32:02.079 --> 00:32:04.039
right? How do I build the business case for this

00:32:04.039 --> 00:32:08.619
if I can't get it into the data platform? Yeah,

00:32:08.619 --> 00:32:11.900
I was thinking that's like the data version of

00:32:11.900 --> 00:32:13.779
the you can't get experience without a job and

00:32:13.779 --> 00:32:15.299
you can't get a job without experience. It's

00:32:15.299 --> 00:32:18.809
very much that. No, you're exactly right. And

00:32:18.809 --> 00:32:23.230
so our thinking is that for some of these data

00:32:23.230 --> 00:32:27.750
sets, let's just bring this into the data lake

00:32:27.750 --> 00:32:29.829
and then turn people loose on it and see what

00:32:29.829 --> 00:32:34.309
they're able to find. So, Mark, I have a question

00:32:34.309 --> 00:32:37.309
for you, actually. Now, you talked about the

00:32:37.309 --> 00:32:39.950
recommendations and things that you've put out

00:32:39.950 --> 00:32:44.140
there. One of the pieces that I'm really excited

00:32:44.140 --> 00:32:45.799
about with the data lake is the fact that we're

00:32:45.799 --> 00:32:48.059
ingesting not just the activity data and the

00:32:48.059 --> 00:32:50.559
logs you were talking about, but we're also ingesting

00:32:50.559 --> 00:32:53.680
assets. And when we talk about assets... into

00:32:53.680 --> 00:32:56.940
the platform. What we're talking about are, say,

00:32:57.119 --> 00:33:00.880
the Entra identities, the various users and groups

00:33:00.880 --> 00:33:03.279
that people are a member of, think snapshots

00:33:03.279 --> 00:33:05.299
of things that you might maybe pull off of the

00:33:05.299 --> 00:33:08.720
Graph API. We're also bringing in periodic snapshots

00:33:08.720 --> 00:33:12.480
from Azure Resource Graph, from ARG, and even

00:33:12.480 --> 00:33:15.940
some additional attributes from SharePoint and

00:33:15.940 --> 00:33:19.940
Office 365, all rendered as these periodic snapshots

00:33:19.940 --> 00:33:23.650
of asset data. I'm curious, what do you imagine

00:33:23.650 --> 00:33:27.529
that you could do with that data combined with

00:33:27.529 --> 00:33:30.390
log data together in sort of the same world?

00:33:31.470 --> 00:33:34.589
So I think there's a huge amount of possibilities.

00:33:35.490 --> 00:33:38.650
The funny thing is, as you're describing that,

00:33:38.769 --> 00:33:41.690
I was connecting it to our work in The Open Group, when

00:33:41.690 --> 00:33:44.509
we were trying to describe what is a modern SOC,

00:33:44.509 --> 00:33:48.059
right? Because the old-fashioned SOC is... is

00:33:48.059 --> 00:33:49.960
basically network-perimeter-centric, right?

00:33:50.099 --> 00:33:54.039
Let's try and catch the stuff as it tries to

00:33:54.039 --> 00:33:55.599
traverse the firewall, the network perimeter,

00:33:55.700 --> 00:33:58.099
the IDS, the IPS, whatever, intrusion detection,

00:33:58.420 --> 00:34:01.259
intrusion prevention, et cetera, you know, all

00:34:01.259 --> 00:34:04.660
in the network perimeter. And then after we do

00:34:04.660 --> 00:34:06.259
that, okay, let's come up with a detection. Then

00:34:06.259 --> 00:34:10.179
let's like block it so that they can't get in

00:34:10.179 --> 00:34:11.559
the perimeter. That's kind of the old way of

00:34:11.559 --> 00:34:13.679
doing SOC. And we were trying to figure out like,

00:34:13.719 --> 00:34:15.969
how do you frame the new way of doing security

00:34:15.969 --> 00:34:19.570
operations, right? And the term we ended up settling

00:34:19.570 --> 00:34:22.670
on was asset-centric because ultimately what

00:34:22.670 --> 00:34:24.969
you need to do is you need to protect something,

00:34:25.150 --> 00:34:28.389
whether it is valuable or not. You need to know

00:34:28.389 --> 00:34:29.789
how valuable it is so you can do the most valuable

00:34:29.789 --> 00:34:32.489
stuff first. And you've got to protect it regardless

00:34:32.489 --> 00:34:35.489
of wherever it is, inside, outside the firewall,

00:34:35.630 --> 00:34:37.510
the perimeter, the corporate network, whatever

00:34:37.510 --> 00:34:39.980
you want to say. And so we ended up settling

00:34:39.980 --> 00:34:42.000
on asset-centric security operations. And so

00:34:42.000 --> 00:34:43.780
it's really funny that you even used

00:34:43.780 --> 00:34:47.900
the term assets as you're describing that. So

00:34:47.900 --> 00:34:52.340
I have a huge amount of hope for what we can

00:34:52.340 --> 00:34:55.929
do with that. I know hope's not a plan, but there's

00:34:55.929 --> 00:34:57.949
an amazing amount of things you can do that once

00:34:57.949 --> 00:35:00.550
you actually are focused on the things that matter

00:35:00.550 --> 00:35:04.090
and you can tag those and you can establish relationships.

00:35:04.170 --> 00:35:06.250
And I understand that y'all are at first release,

00:35:06.309 --> 00:35:08.309
so I'm not expecting that any of this is necessarily

00:35:08.309 --> 00:35:10.829
in the product already. But the more that you

00:35:10.829 --> 00:35:13.050
can get it to that asset-centric thing, because...

00:35:14.039 --> 00:35:15.940
Assets are a thing, right? And that's what's

00:35:15.940 --> 00:35:18.139
happening is the thing is being attacked, right?

00:35:18.219 --> 00:35:20.219
Whether it's a user identity or an endpoint or

00:35:20.219 --> 00:35:23.320
a cloud container or an API or whatever it happens

00:35:23.320 --> 00:35:26.159
to be. All of those things are what are being

00:35:26.159 --> 00:35:28.320
attacked, taken over, and then used as a launching

00:35:28.320 --> 00:35:30.380
point for the next thing. Because logs, at the

00:35:30.380 --> 00:35:32.900
end of the day, logs are just something happened.

00:35:33.679 --> 00:35:35.860
Something happened to this thing, and this is

00:35:35.860 --> 00:35:37.619
the data that we gathered, the attributes, the

00:35:37.619 --> 00:35:41.300
other observed information about it. But at the

00:35:41.300 --> 00:35:43.809
end of the day, it's activities and events about an

00:35:43.809 --> 00:35:46.469
asset, and that asset is what the attackers are

00:35:46.469 --> 00:35:49.989
targeting and using and trying to steal and corrupt

00:35:49.989 --> 00:35:53.349
and take down and extort you for, ransomware-wise.

00:35:53.949 --> 00:35:57.070
I'm really excited about that concept. I recognize,

00:35:57.190 --> 00:35:59.090
again, you're literally the first release public

00:35:59.090 --> 00:36:01.650
preview, but I think that's an amazing approach.

00:36:02.449 --> 00:36:04.269
Well, it's good to hear the validation on that.

00:36:04.369 --> 00:36:08.070
And yes, we also are super excited. I'm not sure

00:36:08.070 --> 00:36:10.230
if folks like Michael who are focused on the

00:36:10.230 --> 00:36:11.769
red team side of things are quite as excited

00:36:11.769 --> 00:36:14.369
about that. For obvious reasons, we are always

00:36:14.369 --> 00:36:16.789
interested in making his job harder. So we're

00:36:16.789 --> 00:36:18.869
excited about that possibility too. I don't know,

00:36:18.929 --> 00:36:21.449
Michael, if any of this is making you nervous

00:36:21.449 --> 00:36:23.769
from your perspective here. Well, actually, it's

00:36:23.769 --> 00:36:25.230
funny you should bring that up. I mean, we found

00:36:25.230 --> 00:36:26.809
that basically one of the biggest indicators

00:36:26.809 --> 00:36:29.130
about whether the red team is successful or not

00:36:29.130 --> 00:36:34.579
is logs. Absolutely. And we're all about making

00:36:34.579 --> 00:36:37.679
sure that Michael has a bad day as his red team

00:36:37.679 --> 00:36:43.119
persona. What does a day in the life of Mark Kendrick

00:36:43.119 --> 00:36:45.880
look like? What do you do day to day in your

00:36:45.880 --> 00:36:48.280
job? The day usually starts fairly early for

00:36:48.280 --> 00:36:52.719
me because I'm on the West Coast and we have,

00:36:52.760 --> 00:36:54.940
of course, teams and customers all over the place.

00:36:55.099 --> 00:36:57.719
But that often does mean there's some early calls.

00:36:58.429 --> 00:37:00.829
So actually this morning was fairly early for

00:37:00.829 --> 00:37:04.630
me. Lots of conversations directly with some

00:37:04.630 --> 00:37:08.289
of our customers. We usually have some key design

00:37:08.289 --> 00:37:10.909
partners we're collaborating with on early versions

00:37:10.909 --> 00:37:14.409
of features. Certainly collaboration among our

00:37:14.409 --> 00:37:17.590
team, learning from each other, what we are hearing

00:37:17.590 --> 00:37:20.199
from customers, providing some order and some

00:37:20.199 --> 00:37:23.320
structure to that, then joining with our engineering

00:37:23.320 --> 00:37:26.900
colleagues, listening to the efforts that they

00:37:26.900 --> 00:37:28.900
have been making and the progress that they've

00:37:28.900 --> 00:37:31.840
been making on various features and capabilities,

00:37:32.139 --> 00:37:34.639
and then preparing, of course, to get the word

00:37:34.639 --> 00:37:38.400
out with that. And at that point, my brain is

00:37:38.400 --> 00:37:40.139
kind of starting to get to the end of itself

00:37:40.139 --> 00:37:43.559
because it's been functioning at least at some

00:37:43.559 --> 00:37:46.199
level for about six or eight hours at that point.

00:37:46.480 --> 00:37:49.130
But afternoons do tend to be... full of a lot

00:37:49.130 --> 00:37:52.190
of collaboration with my colleagues. Microsoft

00:37:52.190 --> 00:37:56.030
is still blessedly committed to remote work.

00:37:56.510 --> 00:37:58.309
Everybody's all over the place, but the afternoons

00:37:58.309 --> 00:38:00.250
when things kind of calm down with customer conversations,

00:38:00.650 --> 00:38:03.070
lots of collaboration and conversation there.

00:38:03.230 --> 00:38:05.230
So that's roughly the day in the life, if that

00:38:05.230 --> 00:38:07.829
helps. Then the very last thing we ask people,

00:38:07.969 --> 00:38:10.530
Mark, because we love to grill people, is if

00:38:10.530 --> 00:38:13.610
you had a final thought for our listeners, what

00:38:13.610 --> 00:38:16.989
would it be? So I think the one other thing that

00:38:16.989 --> 00:38:19.849
I would say is this product right now, Sentinel

00:38:19.849 --> 00:38:22.789
Data Lake, is in public preview, which means

00:38:22.789 --> 00:38:25.269
it's available for you to get your hands on right

00:38:25.269 --> 00:38:30.309
away. And if you look really closely at the way

00:38:30.309 --> 00:38:32.929
that the deployment strategy works, the way that

00:38:32.929 --> 00:38:36.469
the pricing works even, you can find a pretty

00:38:36.469 --> 00:38:39.329
affordable way to begin trying this product out.

00:38:39.429 --> 00:38:41.989
So please, whatever you do, don't tell anybody

00:38:41.989 --> 00:38:44.730
on the sales side that I said that. But you can

00:38:44.730 --> 00:38:47.369
actually find incremental ways of engaging with

00:38:47.369 --> 00:38:49.929
this. I know sometimes you're thinking, wow,

00:38:50.170 --> 00:38:52.610
huge data platform, SIEM migration. I'm going

00:38:52.610 --> 00:38:54.530
to have to talk to those terrifying people over

00:38:54.530 --> 00:38:57.530
there in purchasing. Take a look at the product.

00:38:57.610 --> 00:39:00.989
Take a look at how it is offered, how it's packaged.

00:39:01.190 --> 00:39:04.710
I think you'll find a pretty clear path to how

00:39:04.710 --> 00:39:07.409
you can actually get started with it in the near

00:39:07.409 --> 00:39:09.409
term and start kind of building the case around

00:39:09.409 --> 00:39:13.019
it. All right, so let's bring this episode to

00:39:13.019 --> 00:39:14.900
an end. Thank you, Mark, for joining us this

00:39:14.900 --> 00:39:17.460
week. I always say this, but I always learn something,

00:39:17.639 --> 00:39:19.719
but I learned a heck of a lot actually on this

00:39:19.719 --> 00:39:22.360
one. It makes a lot of sense, this whole data

00:39:22.360 --> 00:39:23.980
lake idea, which again was kind of funny seeing

00:39:23.980 --> 00:39:25.940
as I was in Azure Data for such a long time,

00:39:26.019 --> 00:39:28.280
but there you go. All right, so let's bring it

00:39:28.280 --> 00:39:30.739
to an end. Again, thanks for joining us and to

00:39:30.739 --> 00:39:32.719
all our listeners out there, we hope you found

00:39:32.719 --> 00:39:35.980
this episode of use. Stay safe and we will see

00:39:35.980 --> 00:39:38.360
you next time. Thanks for listening to the Azure

00:39:38.360 --> 00:39:41.500
Security Podcast. You can find show notes and

00:39:41.500 --> 00:39:45.019
other resources at our website,

00:39:45.019 --> 00:39:48.780
azsecuritypodcast.net. If you have any questions, please find

00:39:48.780 --> 00:39:52.199
us on Twitter at AzureSecPod. Background music

00:39:52.199 --> 00:39:55.579
is from ccmixter.com and licensed under the

00:39:55.579 --> 00:39:56.659
Creative Commons license.
