WEBVTT

00:00:00.000 --> 00:00:02.680
Hi everyone, welcome back to Collaboration Simplified.

00:00:02.720 --> 00:00:04.839
Today we're going to talk about three things

00:00:04.839 --> 00:00:08.839
that people get wrong when it comes to M365 Copilot

00:00:08.839 --> 00:00:12.099
and security. First one, good data hygiene is

00:00:12.099 --> 00:00:15.380
good data hygiene. Number two, the LLM does not

00:00:15.380 --> 00:00:17.780
know or care about your data. And number three,

00:00:17.940 --> 00:00:20.699
Copilot does not have a memory. This is Kitty,

00:00:20.859 --> 00:00:23.199
a compliance technical specialist focused on

00:00:23.199 --> 00:00:27.359
data security for M365 Copilot. Copilot really

00:00:27.359 --> 00:00:30.219
does things on behalf of the user, which means

00:00:30.219 --> 00:00:33.320
that if a user has rights to it, then so does

00:00:33.320 --> 00:00:35.880
Copilot. If a user doesn't have rights to it,

00:00:35.920 --> 00:00:37.740
then Copilot doesn't either. The way I like to

00:00:37.740 --> 00:00:40.320
put it is Copilot isn't going to break down the

00:00:40.320 --> 00:00:42.500
walls. It's just going to show you where the

00:00:42.500 --> 00:00:45.100
holes are. What it highlights is really good

00:00:45.100 --> 00:00:47.600
data security practices or bad data security

00:00:47.600 --> 00:00:50.899
practices. If everyone has permissions to everything,

00:00:51.219 --> 00:00:53.299
Copilot is going to make it easier for them to

00:00:53.299 --> 00:00:54.799
surface things that they probably shouldn't have

00:00:54.799 --> 00:00:57.640
access to. If you have put good guardrails and

00:00:57.640 --> 00:01:00.740
good permissions into place, Copilot is going

00:01:00.740 --> 00:01:03.920
to only show them what it is that they need for

00:01:03.920 --> 00:01:06.700
productivity. And ultimately, that is what Copilot

00:01:06.700 --> 00:01:08.959
is. And that makes a ton of sense, Kitty. So

00:01:08.959 --> 00:01:12.680
what are some examples of good data hygiene

00:01:12.680 --> 00:01:15.700
practices? Permissions. If you have a SharePoint

00:01:15.700 --> 00:01:19.700
site, if you have a repository of some kind,

00:01:19.819 --> 00:01:21.980
you need to make sure that you review permissions

00:01:21.980 --> 00:01:25.200
regularly. Look at the site. Who has access?

00:01:25.480 --> 00:01:28.000
Why do they have access? Should they still have

00:01:28.000 --> 00:01:31.420
access? Know where your sensitive data is. It's

00:01:31.420 --> 00:01:34.260
kind of a no-brainer, but people tend to dump

00:01:34.260 --> 00:01:37.799
data into repositories because it's convenient.

00:01:37.959 --> 00:01:40.760
They don't think about the potential problems

00:01:40.760 --> 00:01:44.620
that having access to this data can cause. So

00:01:44.620 --> 00:01:47.420
put data where it's appropriate. Know where your

00:01:47.420 --> 00:01:50.840
data is stored. And then lastly, make sure that

00:01:50.840 --> 00:01:54.500
you are labeling your data appropriately and

00:01:54.500 --> 00:01:58.159
that you have DLP policies in place to catch

00:01:58.750 --> 00:02:02.170
data trying to leave or data being misused. The

00:02:02.170 --> 00:02:04.709
second thing that people get wrong about Copilot

00:02:04.709 --> 00:02:06.790
and security, what do you have for us there?

00:02:07.049 --> 00:02:11.189
So the second one is all about the actual LLM.

00:02:11.370 --> 00:02:15.509
Obviously, an LLM is trained on data. And that is,

00:02:15.590 --> 00:02:18.030
by the way, when we say LLM, we're talking large

00:02:18.030 --> 00:02:21.830
language model. The LLM is an interpreter. It

00:02:21.830 --> 00:02:24.990
says, hey, I'm taking what you're saying in natural

00:02:24.990 --> 00:02:26.909
language, which is what you were talking about,

00:02:27.030 --> 00:02:31.909
and translating that into a search query against

00:02:31.909 --> 00:02:36.810
the data stores in M365. And lots of people are

00:02:36.810 --> 00:02:41.129
afraid that the LLM is somehow manipulating and

00:02:41.129 --> 00:02:43.550
retaining their data. The problem with this is

00:02:43.550 --> 00:02:48.409
that if we allow the LLM to do something with

00:02:48.409 --> 00:02:52.620
questions that people ask it, or data that is

00:02:52.620 --> 00:02:56.860
asked of it, then we open ourselves up to AI

00:02:56.860 --> 00:02:59.259
poisoning. And if you want to know what AI poisoning

00:02:59.259 --> 00:03:01.699
looks like, do a search in your favorite search

00:03:01.699 --> 00:03:06.300
engine for muffin versus dog. And if you do that,

00:03:06.300 --> 00:03:10.580
what you'll find is a lot of muffins and dogs,

00:03:10.719 --> 00:03:14.080
and they look like each other. Because we're

00:03:14.080 --> 00:03:18.020
human, and we have an understanding of what

00:03:18.020 --> 00:03:20.819
a dog is versus what a muffin is, we are actually

00:03:20.819 --> 00:03:23.780
able to see the difference and we can easily

00:03:23.780 --> 00:03:26.259
distinguish between a dog and a muffin. An AI

00:03:26.259 --> 00:03:30.280
can't. And the more muffins you throw at it and

00:03:30.280 --> 00:03:34.020
tell it that these are dogs, the more likely

00:03:34.020 --> 00:03:37.520
the LLM will be to drift in its model. Microsoft

00:03:37.520 --> 00:03:42.259
doesn't actually allow user data into the LLM

00:03:42.539 --> 00:03:46.960
to be retained because we don't want the LLM

00:03:46.960 --> 00:03:50.819
to slide from its original purpose and from its

00:03:50.819 --> 00:03:53.139
original training. And that training is controlled

00:03:53.139 --> 00:03:57.099
very strictly by not only an ethics board that

00:03:57.099 --> 00:04:01.280
follows ethical AI standards, but also a group

00:04:01.280 --> 00:04:04.939
of engineers who are responsible for training

00:04:04.939 --> 00:04:07.479
the model. What all this is to say is that the

00:04:07.479 --> 00:04:11.650
LLM does not know or care about the data that

00:04:11.650 --> 00:04:15.210
you input into it. It is purely taking the question

00:04:15.210 --> 00:04:18.689
and turning it into something that the rest of

00:04:18.689 --> 00:04:22.149
the systems can understand without ever really

00:04:22.149 --> 00:04:24.810
retaining anything. Because I think a common concern

00:04:24.810 --> 00:04:28.490
is the large language model learning from my

00:04:28.490 --> 00:04:31.110
prompts, what I type in, the questions that

00:04:31.110 --> 00:04:34.189
I ask, my data, the documents that I share with

00:04:34.189 --> 00:04:37.339
it. And basically what you're saying is, don't

00:04:37.339 --> 00:04:40.399
worry, the large language models that we use

00:04:40.399 --> 00:04:44.360
are not learning. Correct. And more to the point,

00:04:44.439 --> 00:04:47.620
we wouldn't want that in the first place because

00:04:47.620 --> 00:04:50.600
that could lead to undesirable results. And I

00:04:50.600 --> 00:04:53.319
think we've seen that with other language models

00:04:53.319 --> 00:04:55.800
that have been open to the internet and the kind

00:04:55.800 --> 00:04:58.120
of drift that has been introduced. What about

00:04:58.120 --> 00:05:01.879
like a good scenario for this, right? Like if

00:05:01.879 --> 00:05:05.139
a certain business has a lot of acronyms or

00:05:05.550 --> 00:05:08.509
does business in a very unique way, and they

00:05:08.509 --> 00:05:11.949
actually want the large language model to learn?

00:05:12.329 --> 00:05:14.589
Yeah, that's actually a great question. There

00:05:14.589 --> 00:05:18.089
are really several ways to solve that. The first

00:05:18.089 --> 00:05:21.970
is obviously to create your own model. We're

00:05:21.970 --> 00:05:23.790
not going to talk about that because that's not

00:05:23.790 --> 00:05:26.670
my area of expertise. The second is actually

00:05:26.670 --> 00:05:29.329
a different part. And this is something that

00:05:29.329 --> 00:05:33.750
a lot of people misconstrue about Copilot. Copilot

00:05:33.750 --> 00:05:36.459
doesn't really have a memory. Copilot's only

00:05:36.459 --> 00:05:38.660
memory is something called the semantic index.

00:05:38.920 --> 00:05:42.540
Semantic index relates you to your data. What

00:05:42.540 --> 00:05:45.279
it's doing is every time that you interact with

00:05:45.279 --> 00:05:48.699
Copilot, it is learning about what you mean.

00:05:48.839 --> 00:05:52.899
If a company has a lot of acronyms, it has a

00:05:52.899 --> 00:05:55.379
lot of things that are unique to that company,

00:05:55.519 --> 00:06:00.839
as users in that company use those types of things,

00:06:01.120 --> 00:06:04.779
their individual semantic index will actually

00:06:04.779 --> 00:06:08.480
learn about that. So if I'm talking to you all

00:06:08.480 --> 00:06:12.360
the time and I continually use a set group of

00:06:12.360 --> 00:06:15.360
words, that is going to be written to my semantic

00:06:15.360 --> 00:06:18.819
index, which is individual to me and your index

00:06:18.819 --> 00:06:21.240
is individual to you. And it's going to learn

00:06:21.240 --> 00:06:26.939
that if I use the acronym AMS, that I mean this

00:06:26.939 --> 00:06:30.620
and not that. It is a kind of interesting learning

00:06:30.620 --> 00:06:35.500
model that helps the LLM to make the responses

00:06:35.500 --> 00:06:38.439
more personalized and more interesting to you.

00:06:38.500 --> 00:06:43.000
It also shortcuts how long it takes for the LLM

00:06:43.000 --> 00:06:46.589
to respond. So as an example, if I'm looking

00:06:46.589 --> 00:06:49.930
for information in documents that I commonly

00:06:49.930 --> 00:06:54.050
use, then what the semantic index allows the

00:06:54.050 --> 00:06:57.949
LLM and Copilot to do is to say, well, I use

00:06:57.949 --> 00:07:01.589
this file all the time. So I'm going to check

00:07:01.589 --> 00:07:04.449
there before I go search the entire SharePoint

00:07:04.449 --> 00:07:07.050
library that's available to me. And that makes

00:07:07.050 --> 00:07:10.180
the response quicker. It also makes the search

00:07:10.180 --> 00:07:12.779
results more relevant to you. That is what a

00:07:12.779 --> 00:07:15.339
lot of people actually mistake for memory. Copilot,

00:07:15.500 --> 00:07:18.579
after you've typed in a prompt, Copilot does

00:07:18.579 --> 00:07:21.839
not remember what you typed in. This is why if

00:07:21.839 --> 00:07:25.480
you reference a previous session with Copilot,

00:07:25.639 --> 00:07:28.720
it's actually not going to be able to get that

00:07:28.720 --> 00:07:31.339
information. It looks like it's available, and

00:07:31.339 --> 00:07:33.839
that is because the conversations you've had

00:07:33.839 --> 00:07:37.589
with Copilot are actually retained in a log of

00:07:37.589 --> 00:07:40.449
sorts that's stored in your mailbox. It's not

00:07:40.449 --> 00:07:43.389
accessible to you, but it is accessible to the

00:07:43.389 --> 00:07:46.600
corporation and to administrators in order to

00:07:46.600 --> 00:07:49.660
be able to do things like eDiscovery on them

00:07:49.660 --> 00:07:53.779
for security purposes. That information is stored

00:07:53.779 --> 00:07:56.579
in the context of that session only. Copilot

00:07:56.579 --> 00:08:00.120
really has no access to previous queries. Thank

00:08:00.120 --> 00:08:02.480
you for joining. If you did enjoy the content,

00:08:02.620 --> 00:08:05.019
please make sure to give it a thumbs up and consider

00:08:05.019 --> 00:08:07.680
subscribing. If you have any questions or if

00:08:07.680 --> 00:08:10.120
you want to see additional content on security

00:08:10.120 --> 00:08:13.350
in M365 Copilot, let us know in the comments

00:08:13.350 --> 00:08:15.230
and I'll catch you on the next one.