1
00:00:00,000 --> 00:00:10,240
Welcome to the Cannabis Data Science meetup group. Glad to have you all here. In for a

2
00:00:10,240 --> 00:00:17,520
treat today. Gonna get our hands on some data and, picking up from last week, be looking at

3
00:00:17,520 --> 00:00:24,000
Benford's law. This was new to me. Our good friend Isaac at MCR Labs brought this up as

4
00:00:24,000 --> 00:00:30,400
an interesting tool that we could perhaps use, a statistical tool. Did a bit of research. It looks

5
00:00:30,400 --> 00:00:38,560
promising. Got some great statisticians behind it and so wanted to share with you what I've learned,

6
00:00:38,560 --> 00:00:44,960
how we can apply it to some data and get your thoughts about moving the ball forward. That's

7
00:00:44,960 --> 00:00:50,640
the plan for today but just to go ahead and you know give five minutes let everybody say oh you

8
00:00:50,640 --> 00:00:57,040
know what you may hope to get out of the group. I'm gonna start in reverse order today that way

9
00:00:57,760 --> 00:01:03,520
our new friend here Christina doesn't have to be put on the spot. Candice excited to see you today.

10
00:01:04,160 --> 00:01:10,080
Love the hat. Would love to hear about your adventures in the past week. If you were able

11
00:01:10,080 --> 00:01:17,040
to get any data science work done and what you're looking forward to in the coming weeks to finish

12
00:01:17,040 --> 00:01:24,800
off the year strong. Well that's a doozy. So yes so I'm going to be working on parsing a new lab

13
00:01:25,440 --> 00:01:35,600
and also too I'm gonna be putting up my presentations, my plots with Seaborn and

14
00:01:35,600 --> 00:01:40,400
I've been checking out, too, the YouTube videos that you posted, Keegan. That's really great. I did get

15
00:01:40,400 --> 00:01:47,920
to Florida and I did go to Kana MD and so I'm renewed. It was pretty interesting I was talking

16
00:01:47,920 --> 00:01:53,600
to my doctor about some of the things that I was learning with MCR Labs and Jeff Haas and over at

17
00:01:53,600 --> 00:02:01,360
Harvard. We had an interesting discussion and but he also gave me some good leads too for some good

18
00:02:01,360 --> 00:02:06,560
flower in Florida and because it's kind of interesting that you know I joined some user

19
00:02:06,560 --> 00:02:15,360
groups and actually somebody broke open a fresh product from the state of Florida and they showed

20
00:02:15,360 --> 00:02:21,120
me pictures. I have copies of the pictures. It has mold in it you know it's like visible like you can

21
00:02:21,120 --> 00:02:28,640
like pull it apart and I don't know it's uh but anyway so I'm going to be getting some COAs from

22
00:02:28,640 --> 00:02:36,080
Florida also. Well it's not fantastic but it's quite interesting to hear you on the ground in

23
00:02:36,080 --> 00:02:43,200
Florida telling it how it is and mold is an interesting thing because from being in the

24
00:02:43,200 --> 00:02:50,960
laboratory and in fact we'll look at this today not all analyses are taken as seriously as others

25
00:02:51,520 --> 00:02:59,840
unfortunately. Coming at it in an economic lens people have to prioritize and a lot of the priority

26
00:02:59,840 --> 00:03:09,040
goes towards measuring cannabinoids really really accurately and tests like micro detection,

27
00:03:09,040 --> 00:03:16,080
mycotoxin screening, that may get pushed to the back burner. However, from seeing some of the

28
00:03:16,080 --> 00:03:23,920
unintended consequences one thing that is a little concerning to me is that some large-scale

29
00:03:23,920 --> 00:03:32,320
cultivators are seeing this as a cost which I pointed out that it was a while back right you

30
00:03:32,320 --> 00:03:38,640
have to factor in the probability that you may fail a test and that's a cost and if you don't

31
00:03:38,640 --> 00:03:45,920
factor it in you could run into the red with your business. However some of the large cultivators are

32
00:03:45,920 --> 00:03:54,640
thinking okay how can we prevent failing say micro testing and there are various treatments out there

33
00:03:54,640 --> 00:04:00,720
as we know with a lot of things you know how well studied are some of these things and so just to be

34
00:04:00,720 --> 00:04:07,840
frank, some of the ones, going from least concerning to more concerning: pressure,

35
00:04:07,840 --> 00:04:14,640
oxygen deprivation perhaps, to treating the product with ozone, and then, Candice, I think you mentioned

36
00:04:14,640 --> 00:04:20,640
at one point radiation and those are the ones that are slightly more concerning to me while they may

37
00:04:20,640 --> 00:04:29,360
be, say, chemically sound, I don't know, I just wouldn't mind a bit further research myself.

38
00:04:29,360 --> 00:04:37,440
so that can be an unintended consequence of mandated micro testing but as Candice you pointed out

39
00:04:37,440 --> 00:04:46,400
mold is a thing right and no one wants moldy cannabis flower the consumer may want a reasonable

40
00:04:46,400 --> 00:04:53,040
expectation that their products won't have mold in them so what to do about that. I forget the name

41
00:04:53,040 --> 00:04:58,240
of it the disease but it's like some type of disease that I think people get in agriculture

42
00:04:58,240 --> 00:05:03,920
and from breathing in mold I'll have to nail it down you know I mean it's a new industry and it's

43
00:05:03,920 --> 00:05:09,360
awesome cannabis is amazing right compared to big pharma and people can really get off a lot of

44
00:05:09,360 --> 00:05:16,080
drugs Keegan right and with cannabis but then also too you know it's like yeah they really do need I

45
00:05:16,080 --> 00:05:20,400
think they need transparency we need open data sets in Florida and Massachusetts that's what we

46
00:05:20,400 --> 00:05:25,760
need, so that the watchdogs, right, can help keep an eye on the system, like Jeff Haasen says, right?

47
00:05:25,760 --> 00:05:31,680
well the federal government right the FDA do we really we us kids in the states we need to play

48
00:05:31,680 --> 00:05:36,960
good and be transparent and have the data out there so that the federal government doesn't turn

49
00:05:36,960 --> 00:05:42,800
around and turn it into tobacco or you know because it's funny because like my first review with

50
00:05:42,800 --> 00:05:48,400
pure leaf that's still up there a year ago talks about how I'm getting this nice organic cannabis

51
00:05:48,400 --> 00:05:54,160
and stuff right and wow what I've learned right because that's really how it sold to us patients

52
00:05:54,160 --> 00:05:59,280
you know and and I get it because I was kind of doing the same thing you know that review is still

53
00:05:59,280 --> 00:06:06,240
up there I don't know it's it's just so cool just to know right to just be aware of you know possible

54
00:06:06,240 --> 00:06:12,000
health problems and then too I don't know how they would fix that because uh you know what I

55
00:06:12,000 --> 00:06:18,240
was saying in that picture looked just like, uh, when I put the predator mites on the, uh, for my

56
00:06:18,240 --> 00:06:23,760
spider mites that somehow came in on my hydrogen peroxide clone, right? You raised a couple good

57
00:06:23,760 --> 00:06:29,920
points that have sparked a couple thoughts but before I get to those anyway I'm taking the floor

58
00:06:30,640 --> 00:06:36,560
we'll get back to you Candice I just wanted to let Isaac chime in wouldn't mind hearing your thoughts

59
00:06:36,560 --> 00:06:43,360
on this this microbe yeah I just want to add something uh from the lab's perspective I mean

60
00:06:43,360 --> 00:06:50,080
well, each state requires an array of testing, right: heavy metals, uh, yeast and mold, and

61
00:06:50,080 --> 00:06:58,320
other bacteria, but I can attest that, uh, yeast and mold is among the most challenging

62
00:06:58,320 --> 00:07:04,800
assays for clients to pass. Even if they are huge dispensaries, I would say at least half of them

63
00:07:05,920 --> 00:07:13,520
do struggle with the issue so it's uh I'm not surprised that you're finding uh uh flowers with

64
00:07:13,520 --> 00:07:20,480
literally visible mold on them I mean that I've heard that happen from time to time and also

65
00:07:20,480 --> 00:07:28,720
from another perspective uh is the health of the worker if the growing room is contaminated by

66
00:07:28,720 --> 00:07:35,600
uh mold or some kind of bacteria uh the workers are at risk of getting respiratory diseases

67
00:07:35,600 --> 00:07:42,480
As a matter of fact, uh, last year there was a, uh, worker in one of the dispensaries who unfortunately

68
00:07:42,480 --> 00:07:51,280
passed away and the investigation is still ongoing but uh the direction is toward the link between the

69
00:07:51,280 --> 00:07:58,480
mold contamination in their grow room and her unfortunate passing away so it's also to the

70
00:07:58,480 --> 00:08:04,880
uh, workers in the industry. Christina, are you going to chime in before I ramble on? Um, I guess

71
00:08:04,880 --> 00:08:10,560
what concerns me this is about concerns right because what concerns me is the um some of the

72
00:08:10,560 --> 00:08:18,080
some of the strains are getting contaminated I don't know if by chemicals or another strand that

73
00:08:18,080 --> 00:08:26,080
kind of it doesn't go well with the strand like if I were to buy an ACDC right and it's not

74
00:08:26,080 --> 00:08:33,760
completely clean the strain for me it's very important because I have epilepsy and terpene

75
00:08:33,760 --> 00:08:43,520
terpenes they matter to me because one terpene can make a big change in my um my body you know

76
00:08:43,520 --> 00:08:49,120
and I don't want to get a seizure because of a strand being contaminated and um um I live in

77
00:08:49,120 --> 00:08:54,960
Seattle like the Seattle area there's this little shop where I bought a strand and it was indeed

78
00:08:54,960 --> 00:09:02,880
contaminated I got I started to see like weird things and it was just not a fun moment so like

79
00:09:02,880 --> 00:09:11,200
I don't know if that's common or not you're spot on in in the right place because this is the exact

80
00:09:11,200 --> 00:09:17,120
topic at hand that we're talking about in that when you go to a retailer what are they going to

81
00:09:17,120 --> 00:09:24,240
be selling to you they're going to have the the strains labeled oh ACDC or this or that um and

82
00:09:24,240 --> 00:09:31,360
then they may have some of the chemical compositions labeled you've learned that okay ACDC works well

83
00:09:31,360 --> 00:09:38,080
for you and what we're trying to do is tie some of these real life situations with the data and the

84
00:09:38,080 --> 00:09:46,800
best lead that we have is why is ACDC effective for you perhaps it has something to do with the

85
00:09:46,800 --> 00:09:55,760
unique chemical composition of ACDC so we want to get a nice measure of those chemicals because

86
00:09:55,760 --> 00:10:02,560
we're so at the frontier essentially no one consuming cannabis except for maybe a small

87
00:10:02,560 --> 00:10:08,880
minority of people actually know the milligrams of various chemical compounds that they're

88
00:10:08,880 --> 00:10:16,480
ingesting luckily you're in Seattle so there are good testing laboratories in Washington State

89
00:10:16,480 --> 00:10:25,200
and so what you may want to do in fact in Washington State it is required that you may want to ask for

90
00:10:25,200 --> 00:10:33,920
the certificate of analysis when you when you purchase a product then you can keep track of the

91
00:10:33,920 --> 00:10:40,720
chemicals that you're ingesting it would be awesome if the product was tested for terpenes

92
00:10:40,720 --> 00:10:47,360
that's not required but some people do test their products for terpenes that that's kind of concerning

93
00:10:47,360 --> 00:10:53,040
concerning aren't the the products supposed to already come with the terpenes is if that's what

94
00:10:53,040 --> 00:11:01,920
I'm actually purchasing? Well, you raise an awesome point that wasn't really taken into consideration

95
00:11:01,920 --> 00:11:08,240
until late in the game so long story short when cannabis was first legalized right so it was

96
00:11:08,240 --> 00:11:14,400
permitted in Washington State back in 2012 and so originally people were thinking what are the main

97
00:11:14,400 --> 00:11:24,000
psychoactive components? Those are going to be THC and CBD. Well, since 2012 it's been a decade,

98
00:11:24,000 --> 00:11:30,400
10 years of scientific research people have discovered more and more compounds and it's

99
00:11:30,400 --> 00:11:36,960
looking like other compounds may have some significance right maybe not all of them right

100
00:11:36,960 --> 00:11:45,680
maybe chlorophyll may not necessarily have a good or a bad effect but people are discovering oh

101
00:11:45,680 --> 00:11:51,840
there's beta-caryophyllene, and these also may have effects. Last comment, and then I'll let Isaac

102
00:11:51,840 --> 00:11:56,640
jump in and i was just going to say certain states are starting to realize that we may also

103
00:11:56,640 --> 00:12:03,680
want to measure terpenes i think Oklahoma they're mandated it was sort of an unusual mandate but

104
00:12:03,680 --> 00:12:10,240
they are tested there and before i keep talking i would love to get Isaac love to hear what you

105
00:12:10,240 --> 00:12:17,600
have to say. Yeah, thanks. I just want to add that, uh, strains like GG4, because cannabis has been

106
00:12:17,600 --> 00:12:26,000
underground for such a long time, and the breeding is all very messy, and it's almost

107
00:12:26,000 --> 00:12:33,200
impossible to trace that to the original gene so there are a lot of dispensaries or seed banks they

108
00:12:33,200 --> 00:12:42,640
can claim that their flower is gg4 or if it's blue dream but they're really we don't really have a

109
00:12:42,640 --> 00:12:49,840
reference or a standard or certification okay this is the gg4 phenotype i've actually done similar

110
00:12:49,840 --> 00:12:57,040
analysis, uh, using, uh, principal component analysis just to plot, for example, all the

111
00:12:57,040 --> 00:13:06,240
GG4s we receive from different growers, and they do appear, uh, variable. So, uh, one client or one

112
00:13:06,240 --> 00:13:13,280
dispensary's gg4 is likely to be different from another client that's very difficult for

113
00:13:14,000 --> 00:13:20,320
consumers like you to know exactly what what you're purchasing and i think this leads naturally to

114
00:13:20,320 --> 00:13:27,760
what keegan was talking about to measure or to get exact chemical profile of a strain and then

115
00:13:27,760 --> 00:13:35,520
using this more scientific categorization rather than relying on a name which at the moment any

116
00:13:36,160 --> 00:13:43,680
seed bank can claim that their seed is this strain but they might not i'll just keep piggybacking on

117
00:13:43,680 --> 00:13:49,440
that in that let's say strain names the best we have it's a far far from perfect tool but like

118
00:13:49,440 --> 00:13:56,240
you said it's at least getting you on the right track so okay acdc the strain appears to have a

119
00:13:56,240 --> 00:14:01,760
certain effect for you and from my understanding of this strain that it could even be a what's

120
00:14:01,760 --> 00:14:12,320
called a type two or where it's almost an equal thc to cbd ratio it's very equal exactly so i think

121
00:14:12,320 --> 00:14:18,000
there could quite easily be something going on with the the unique chemical composition of that

122
00:14:18,000 --> 00:14:22,640
we're still we're working on this we're going to get to the bottom of this because as isaac said

123
00:14:22,640 --> 00:14:28,960
and it's real peculiar because there's a real disconnect between say the laboratories and

124
00:14:28,960 --> 00:14:35,200
the breeders and maybe there's not as much disconnect as one may think um it may it may

125
00:14:35,200 --> 00:14:40,880
be more of a show all the breeders right isaac said so if you're the breeders are selling their

126
00:14:40,880 --> 00:14:47,440
seeds by strain name and this is something that i wanted to get into what maybe this week but it

127
00:14:47,440 --> 00:14:53,760
was looking like next week's going to be the week is actually looking at some of the the seed vendors

128
00:14:53,760 --> 00:14:59,600
and looking at their history because there's a really really cool history there and all of

129
00:14:59,600 --> 00:15:07,200
these strain names have cool histories but the kind of names and as isaac was saying they're not

130
00:15:07,200 --> 00:15:16,400
set in stone right anyone can grow any weed any cannabis plant and just and call it acdc so that's

131
00:15:16,400 --> 00:15:25,280
the trouble behind it the interesting part is perhaps there is a variety that tends to have

132
00:15:25,280 --> 00:15:32,240
equal distributions of THC and CBD; perhaps people kind of want to call that ACDC. At the end of the

133
00:15:32,240 --> 00:15:38,800
day i think what you want to be concerned about is what you're ingesting and and those will be

134
00:15:39,760 --> 00:15:45,840
the chemical compounds you may actually end up finding another strain and so for example

135
00:15:45,840 --> 00:15:52,560
you know, there's Cannatonic or this or that (yeah, I like Cannatonic), exactly, that also have a similar

136
00:15:52,560 --> 00:15:58,880
chemical composition that was with some of the work we were doing earlier this summer with product

137
00:15:58,880 --> 00:16:05,520
recommendations we're saying oh you know if you gravitate towards this one strain then you may

138
00:16:05,520 --> 00:16:13,920
just want to find a strain that's as chemically similar as possible okay it's imperfect but at

139
00:16:13,920 --> 00:16:20,640
the moment it's you know the best we can do until we can get better hold of this data you know get

140
00:16:20,640 --> 00:16:27,200
it into your hands let you start learning about what works well for you it'll be a slow process

141
00:16:27,200 --> 00:16:33,600
but hopefully there'll be a feedback loop where you know maybe certain growers will start growing

142
00:16:34,160 --> 00:16:40,800
specific cultivars with specific chemical compositions and then the retailers know how

143
00:16:40,800 --> 00:16:48,080
to label this to get it to you to get the intended effect i have one more question well i have a bunch

144
00:16:48,080 --> 00:16:54,720
of questions, but on that: I'm looking for new strands. I know there's one called Corazón, and

145
00:16:54,720 --> 00:17:04,480
that's from Yerba Buena Farms up in Oregon, and one of the parent strands, strains, is ACDC, and the other

146
00:17:04,480 --> 00:17:13,200
strain is unknown there's not a lot like research done to it yet but is there a way that you guys

147
00:17:13,200 --> 00:17:21,680
have connections as to how to look for these strains that are not well established yet like i

148
00:17:21,680 --> 00:17:27,760
want to get a hold of this strain but i don't know how right so i just sent in the chat the best kind

149
00:17:27,760 --> 00:17:36,000
of seed genetic website that i have uh give that a try like go directly to the website and request

150
00:17:36,000 --> 00:17:42,560
it there no the website is like a database so you can search for different strains and see their

151
00:17:42,560 --> 00:17:49,520
histories if that's part of the archive what's the website called i would love it thank you

152
00:17:49,520 --> 00:17:59,840
uh en.seedfinder.eu perfect thank you i really appreciate it it's not the most polished looking

153
00:17:59,840 --> 00:18:08,400
website but it has a wealth of information that overwhelms everyone that is okay all right i'll

154
00:18:08,400 --> 00:18:15,200
look into that thank you oh make sure to to come back to this christina since this this is interesting

155
00:18:15,200 --> 00:18:20,880
I'll try to just run through this super, super fast for Isaac. While you're here, just wanted

156
00:18:20,880 --> 00:18:28,720
to share this book that came in the mail, Hemp Diseases and Pests. So this is an old one; it's

157
00:18:28,720 --> 00:18:34,720
interesting right so right the consumers are often concerned about effects but the cultivators are

158
00:18:34,720 --> 00:18:41,200
often concerned about diseases, and cannabis is plagued by many diseases, so, you know, these molds

159
00:18:41,200 --> 00:18:46,320
are a real concern they're of a top concern to cultivators i think we'll want to to talk more

160
00:18:46,320 --> 00:18:53,120
about that. I'll run through this super, super fast. Last week Isaac brought up Benford's law,

161
00:18:53,120 --> 00:19:01,040
which is basically: when you look at data sets, coincidentally, and it's not necessarily

162
00:19:01,040 --> 00:19:08,560
coincidental, I think it's a statistical phenomenon, different digits occur at certain probabilities

163
00:19:08,560 --> 00:19:14,480
and they follow a certain distribution. So here's the distribution: you know, one will appear

164
00:19:14,480 --> 00:19:19,600
around 30 percent of the time. We don't have to get too much into the math, but you're welcome to.
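A minimal sketch of the distribution mentioned here, assuming the standard first-digit form of Benford's law, P(d) = log10(1 + 1/d). The function name `benford_probability` is made up for illustration and is not from the session's notebook.

```python
import math

def benford_probability(digit: int) -> float:
    """Expected share of values whose leading digit is `digit` (1-9)."""
    if not 1 <= digit <= 9:
        raise ValueError("leading digit must be between 1 and 9")
    return math.log10(1 + 1 / digit)

# Expected frequencies for leading digits 1 through 9.
expected = {d: benford_probability(d) for d in range(1, 10)}
for d, p in expected.items():
    print(f"{d}: {p:.1%}")
```

The value for the digit one comes out at roughly 30 percent, in line with the figure quoted, and the nine shares sum to one.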

165
00:19:19,600 --> 00:19:28,480
So this was noted way back in the 1800s and then really made concrete in the early 1900s by Frank

166
00:19:28,480 --> 00:19:36,320
Benford, so I'll let you read about this. His main contribution was getting many different data sets

167
00:19:36,320 --> 00:19:42,000
and analyzing all these different data sets that made me think of the cannabis data science group

168
00:19:42,000 --> 00:19:48,720
how we're getting many different data sets and applying statistics to them okay what's going to

169
00:19:48,720 --> 00:19:59,120
be required for this we actually want a wide range of numbers if you have a narrow range it's going

170
00:19:59,120 --> 00:20:06,960
to be difficult to show this so so we'll just kind of point that out now and then what are some cool

171
00:20:06,960 --> 00:20:14,960
applications of this? If you don't know him, Hal Varian is a famous economist and statistician,

172
00:20:14,960 --> 00:20:21,280
and he was the one who put forward that this could potentially be used to detect fraud. Isaac pointed

173
00:20:21,280 --> 00:20:28,160
out potential well not potentially there are cases of laboratory fraud in the cannabis industry and

174
00:20:28,160 --> 00:20:37,280
we would want some mechanism for detecting if a lab result is fraudulent or perhaps if it's probable

175
00:20:37,280 --> 00:20:44,800
that it's fraudulent just from reading the rest of this my main takeaway is that this is a good

176
00:20:44,800 --> 00:20:52,960
mechanism to know what looks suspect and where you may want to look further but i don't know if it's

177
00:20:52,960 --> 00:20:59,280
definitive proof but i think it can be a useful tool about where you may want to look further

178
00:20:59,280 --> 00:21:05,280
the last thing i wanted to point out we actually are running into the problem actually here i'll

179
00:21:05,280 --> 00:21:10,400
point this, I'll talk more about this here momentarily. So that's Benford's law in a

180
00:21:10,400 --> 00:21:18,320
nutshell: different digits appear at different frequencies, and if digits don't appear at that

181
00:21:18,320 --> 00:21:25,120
frequency, then one may be suspicious as to the underlying distribution that's generating the

182
00:21:25,120 --> 00:21:33,360
data so there may not be a nice random underlying generation but this is my really really crude

183
00:21:33,360 --> 00:21:37,280
first take so isaac please feel free to correct me at any point but i want to go ahead and get

184
00:21:37,280 --> 00:21:42,800
through this data to go ahead and kind of get you out of here on time so long story short we have

185
00:21:42,800 --> 00:21:55,680
lab results from Washington State in 2022: 1.6 million compounds were measured. So, for example,

186
00:21:55,680 --> 00:22:04,480
what does a compound look like this was somebody who tested for microbes this sample actually

187
00:22:04,480 --> 00:22:15,440
failed for microbes it looks like we were interested in say total thc here is the distribution of

188
00:22:15,440 --> 00:22:25,520
THC in cannabis flower in 2022, and this is the phenomenon that Isaac was pointing out, in that

189
00:22:25,520 --> 00:22:35,200
presumably this mode is flower samples and it doesn't appear perfectly normally distributed

190
00:22:35,200 --> 00:22:43,360
and it perhaps may even be skewed towards above 20 i think there's better examples of this but

191
00:22:44,000 --> 00:22:49,040
that's sort of what we claim. Just to look at a couple more analytes, there's a distribution of

192
00:22:49,040 --> 00:22:58,960
cbd here is your distribution of moisture content and this is more applicable in other states where

193
00:22:58,960 --> 00:23:04,960
lab results are corrected for moisture content i'm not certain if they are in washington state or not

194
00:23:04,960 --> 00:23:12,080
you're really looking for anomalies or non-normal distributions at this stage and one thing i'd like

195
00:23:12,080 --> 00:23:19,520
to point out is you've got this curious mode at 10 and so it looks like for whatever reason some

196
00:23:19,520 --> 00:23:25,200
laboratory may just be reporting just 10 for their moisture content whereas other labs may

197
00:23:25,200 --> 00:23:33,040
be a bit more granular and then water activity this is sort of a measure of shelf stability

198
00:23:33,040 --> 00:23:44,480
and curious thing here is right the limit is 0.65 and this may just be just how the distribution

199
00:23:44,480 --> 00:23:50,640
shook out but you know one may kind of wonder if you know things are kind of being you know slid

200
00:23:50,640 --> 00:23:58,960
under the 0.65 limit. But once again, as far as the distribution goes, it looks fairly normal.
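One rough way to probe the "slid under the limit" question is to compare how many results fall in a narrow band just below 0.65 with how many fall just above it. This is only a sketch under stated assumptions: the helper `bunching_ratio`, the 0.02 band width, and the sample readings are all made up for illustration and are not the Washington data.

```python
def bunching_ratio(values, limit=0.65, width=0.02):
    """Count results just below the limit relative to those just above it."""
    below = sum(1 for v in values if limit - width <= v < limit)
    above = sum(1 for v in values if limit <= v < limit + width)
    # None signals that there is nothing above the limit to compare against.
    return below / above if above else None

# Made-up water activity readings, for illustration only.
sample = [0.52, 0.58, 0.60, 0.61, 0.635, 0.64, 0.64, 0.645, 0.66]
ratio = bunching_ratio(sample)
print(ratio)
```

A ratio well above one would be a reason to look further, in the same spirit as the rest of the discussion, not proof of anything.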

201
00:23:58,960 --> 00:24:06,800
Now, quick application of Benford's law. Benford's law requires us to span at least a 10-digit range,

202
00:24:06,800 --> 00:24:14,640
so we could do it with thc but it would be tough we wouldn't meet the assumptions required for

203
00:24:14,640 --> 00:24:23,040
benford's law with the other analytes i don't know if this is acceptable but what i did was i just

204
00:24:23,040 --> 00:24:32,560
grabbed the first decimal place for example if you look at thc you'll see most people are reporting

205
00:24:32,560 --> 00:24:40,640
whole numbers so zero is going to occur a lot and then i was also just looking at the first decimal

206
00:24:40,640 --> 00:24:46,720
place here once again this is sort of my quick and dirty attempt at this i don't know if this is

207
00:24:46,720 --> 00:24:53,440
an unacceptable way to implement benford's law but if you look at the digit occurrences

208
00:24:53,440 --> 00:25:02,080
one through nine you do curiously see the decreasing probability of digits it doesn't look

209
00:25:02,080 --> 00:25:08,160
like, you know, they decrease in probability at quite the same rate as Benford observed.
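The quick-and-dirty step described here can be sketched as follows, assuming results arrive as strings such as "21.4". The helper name `first_decimal_digit` and the sample values are hypothetical, and lining up first-decimal digits against Benford's leading-digit shares simply mirrors the approach in the talk; it is an assumption, not an established test.

```python
import math
from collections import Counter

def first_decimal_digit(value: str) -> int:
    """Return the digit right after the decimal point, or 0 if there is none."""
    _, _, fraction = value.partition(".")
    return int(fraction[0]) if fraction else 0

# Made-up total THC results, for illustration only.
results = ["21.4", "18.1", "25", "19.2", "22.1", "20.3", "17.1", "23.0"]
counts = Counter(first_decimal_digit(r) for r in results)

# Share of each digit 1-9 among the nonzero first decimals, next to Benford's share.
nonzero = sum(counts[d] for d in range(1, 10))
for d in range(1, 10):
    observed = counts[d] / nonzero if nonzero else 0.0
    print(d, round(observed, 3), round(math.log10(1 + 1 / d), 3))
```

Whole-number results land in the zero bucket, which is why zero is left out of the one-through-nine comparison.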

210
00:25:08,160 --> 00:25:13,600
But it is interesting that you see this distribution, because if you look at some of the

211
00:25:13,600 --> 00:25:21,840
other analytes, well, check out CBD: CBD diminishes at an even faster rate. I think this is a first

212
00:25:21,840 --> 00:25:30,640
interesting observation is that thc and cbd for whatever reason have different distributions of

213
00:25:30,640 --> 00:25:38,400
digits, but they both are diminishing, which is interesting. If you look at moisture content,

214
00:25:38,400 --> 00:25:47,920
you'll see almost a uniform distribution so if every digit occurs at the same probability

215
00:25:47,920 --> 00:25:54,320
that would be called a uniform distribution and it would essentially imply that the results are

216
00:25:54,320 --> 00:26:01,200
random i don't know what's going on with moisture content but this looks like it requires further

217
00:26:01,200 --> 00:26:08,800
investigation because it almost just looks like people are just randomly selecting moisture

218
00:26:08,800 --> 00:26:15,680
content or at least the data generation process maybe we may need to think further about the data

219
00:26:15,680 --> 00:26:23,200
generation process and then even more curiously with water activity once again this could just be

220
00:26:23,920 --> 00:26:30,320
something about the data generation process it could just be something unique to cannabis flower

221
00:26:30,320 --> 00:26:38,240
but once again you don't observe benford's law with the digits you see the digits increasing

222
00:26:38,240 --> 00:26:43,920
in probability to four and then decreasing in probability once again this may be a violation

223
00:26:43,920 --> 00:26:52,560
of benford's law so whereas thc and cbd which curiously we thought oh maybe those would be the

224
00:26:52,560 --> 00:26:59,280
ones that were more likely to be falsified curiously and i kind of brought this up at the beginning

225
00:26:59,280 --> 00:27:06,400
that people are putting such great focus into measuring thc and cbd then maybe they actually

226
00:27:06,400 --> 00:27:15,440
are measuring thc and cbd well and maybe little emphasis is put on moisture and water activity

227
00:27:15,440 --> 00:27:21,600
and they may just be just writing down random numbers and when a human writes down a random

228
00:27:21,600 --> 00:27:29,600
number then the digits won't follow benford's law and so that was just curious but that was the

229
00:27:29,600 --> 00:27:37,280
analysis of benford's law this is a start and i would just like to point out that you may need to

230
00:27:37,280 --> 00:27:45,120
take the log of these numbers so so that's a downside and as i pointed out i'm using the first

231
00:27:45,120 --> 00:27:54,400
decimal place and that's also perhaps imperfect but i thought this was curious so i'll kind of

232
00:27:54,400 --> 00:28:02,400
let people think about that what if this is actually one lab that's just doing this but if

233
00:28:02,400 --> 00:28:09,920
you look at all the different laboratories they all sort of have this weird pattern for water

234
00:28:09,920 --> 00:28:20,720
activity long story short is i don't think it's a specific laboratory that is doing this it appears

235
00:28:20,720 --> 00:28:28,960
that all of the laboratories are experiencing this phenomenon awesome so this is total thc by

236
00:28:28,960 --> 00:28:37,120
laboratory in washington state in 2022 and as you can see this appears to be the flower mode and

237
00:28:37,120 --> 00:28:46,560
this would be the concentrate mode i'll just point out what i find interesting so okay lab 10 lab 11

238
00:28:46,560 --> 00:28:55,760
look quite similar you know lab 2907 is interesting in that they're the only laboratory that has a

239
00:28:55,760 --> 00:29:07,440
triple mode, so they've got a mode around, say, 50, so maybe they do a lot of kief or bubble hash

240
00:29:07,440 --> 00:29:15,440
testing that's a possibility i'm just kind of curious then lab 2908 they have a similar

241
00:29:15,440 --> 00:29:23,760
distribution lab 2909 it looks like they do significantly fewer concentrate tests there's

242
00:29:23,760 --> 00:29:30,800
only a small tail down here nothing's actually jumping out at me about 2911 those are pretty

243
00:29:30,800 --> 00:29:37,520
normal distributions right and they're they're different right and so these would be the different

244
00:29:37,520 --> 00:29:43,040
this is actually a good example of two different normal distributions with different moments if you

245
00:29:43,040 --> 00:29:48,640
cut this into two different distributions you know this distribution may have a larger variance

246
00:29:48,640 --> 00:29:55,120
it's going to have what's called greater kurtosis, so fatter tails. Probably the thing that jumps out

247
00:29:55,120 --> 00:30:04,320
to me about 2912 is that their mean for concentrates may be slightly higher, but as we

248
00:30:04,320 --> 00:30:10,960
said there's nothing really consequential about having a different mean there could be perfectly

249
00:30:10,960 --> 00:30:17,440
normal underlying explanations so for example perhaps there's a distillate maker and they

250
00:30:17,440 --> 00:30:24,160
simply love to use this laboratory so that could easily just explain this concentrate tail whereas

251
00:30:24,880 --> 00:30:32,080
the bubble hash makers may prefer this laboratory so there's just different underlying causes there

252
00:30:32,080 --> 00:30:39,360
the final thing i wanted to point out here we're not like pointing fingers but i'm just explaining

253
00:30:39,360 --> 00:30:46,560
what cultivators may look at and this may drive market dynamics as we discussed last time okay

254
00:30:46,560 --> 00:30:56,240
so you've got two laboratories here, 2913 and 2914. Well, what you can tell from these distributions is that 2913

255
00:30:56,880 --> 00:31:06,880
appears to have a wider variance in flower and their mean appears to be slightly lower than the

256
00:31:06,880 --> 00:31:15,840
mean of 2914 so if you were a cultivator and you saw this data right you sent in your samples to

257
00:31:15,840 --> 00:31:21,680
all the different laboratories well at the end of the day you're trying to sell your cannabis flower

258
00:31:21,680 --> 00:31:28,000
and they may have noticed that cannabis with higher THC sells at greater rates,

259
00:31:28,000 --> 00:31:34,160
and so if they were comparing these two laboratories they would say oh you know this laboratory

260
00:31:34,160 --> 00:31:43,840
looks like on average we may have higher results and we have a lower variance on average you know

261
00:31:43,840 --> 00:31:51,120
and with lower variance our thc may test higher at this laboratory once again there could be other

262
00:31:51,120 --> 00:31:57,040
underlying mechanisms here as we discussed the different clients using the different labs if

263
00:31:57,040 --> 00:32:06,720
that were the case then you may see people leave laboratory 2913 go and use lab 2914 over time more

264
00:32:06,720 --> 00:32:14,480
and more people use lab 2914 and the methods kind of standardize towards this. So that's sort of the

265
00:32:15,200 --> 00:32:22,800
concern about lab shopping, is that, you know, any lab with a slightly different variance, as we'll

266
00:32:22,800 --> 00:32:29,360
show later these may actually be surprisingly in the same ballpark so as you can see the lab that

267
00:32:29,360 --> 00:32:38,720
tests the most samples you know has a mean of 22.26 and a variance of around 21 so you could do a

268
00:32:38,720 --> 00:32:45,840
difference in means test between all these different laboratories and see if any of them have

269
00:32:45,840 --> 00:32:52,640
you know statistically different means but as I said that's not really going to take you too

270
00:32:52,640 --> 00:32:59,840
too far and in fact there could be perfectly good explanations for this so for example you know lab

271
00:32:59,840 --> 00:33:08,400
11, they're only testing almost 200 samples, an order of magnitude different. So lab 10, their mean is

272
00:33:08,400 --> 00:33:15,200
significantly different you know perhaps they just have an entirely different type of clientele so

273
00:33:15,200 --> 00:33:23,920
that's what's going on with THC just wanted to point out the other compounds for you real quick

274
00:33:23,920 --> 00:33:30,320
because it helps to look at a bunch of different distributions. Here's the distribution of CBD by laboratory.

275
00:33:30,320 --> 00:33:36,480
2907 has a really really wide distribution so here's an interesting one so this one is

276
00:33:36,480 --> 00:33:43,440
you know, not quite a normal or log-normal distribution. If you were this laboratory, you

277
00:33:43,440 --> 00:33:48,160
know, you may be looking at the other laboratories and think, oh, you know,

278
00:33:48,160 --> 00:33:54,560
maybe there's something different going on with our method however lab 2913 also has this interesting

279
00:33:54,560 --> 00:34:00,720
distribution once again this could just be something unique about the clientele of these

280
00:34:00,720 --> 00:34:07,600
two different laboratories and then once again lab 2914 is a really wide distribution of CBD and so

281
00:34:07,600 --> 00:34:15,760
my takeaway from this is that, okay, let's say you were growing CBD-rich flower: you may not want to

282
00:34:15,760 --> 00:34:22,880
use you know these laboratories that have a really wide distribution of CBD because it doesn't look

283
00:34:22,880 --> 00:34:29,920
like these may necessarily be accurate. Say you've got 10 laboratories and, you know, eight of

284
00:34:29,920 --> 00:34:37,040
them are measuring in this manner; it appears that those may be the laboratories that have the

285
00:34:37,040 --> 00:34:42,240
more precise method. I'm not certain what's going on with those odd distributions, so you could

286
00:34:42,240 --> 00:34:49,120
ask the laboratory about it or pick one that has a slightly more normal distribution that's how i

287
00:34:49,120 --> 00:34:57,120
would approach it so here are moisture content values by laboratory and as you can see they're

288
00:34:57,120 --> 00:35:03,360
actually all over the place they're at least different they're quite different distributions

289
00:35:03,360 --> 00:35:08,560
from lab to lab and i think this is something interesting to point out in that we may have

290
00:35:08,560 --> 00:35:17,840
been quite critical about the THC and CBD ratios but everything's relative so we were just looking

291
00:35:17,840 --> 00:35:24,960
at those analytes so yes the distributions may have looked slightly different but they're all

292
00:35:24,960 --> 00:35:31,120
generally in the same ballpark similar looking distributions they started to look a little

293
00:35:31,120 --> 00:35:38,400
different for CBD but good sign for measuring cannabinoids however when you look at say moisture

294
00:35:38,400 --> 00:35:44,000
content it looks like the laboratories are fundamentally just testing differently right

295
00:35:44,000 --> 00:35:53,120
so you've got lab 10 lab 11 lab 2907 is measuring moisture more on this end of the distribution

296
00:35:53,120 --> 00:36:01,120
or on this end of the scale. Same for 2908. Then lab 2909, you've got this really peculiar

297
00:36:02,080 --> 00:36:11,840
distribution at a different scale really than 2908 lab 2910 looks slightly different than the

298
00:36:11,840 --> 00:36:18,560
other ones. It's the most similar to lab 10, it looks like. Then lab 2911 is pretty similar

299
00:36:18,560 --> 00:36:27,760
to 2910. 2912 has what appears to be, you know, the most normal of distributions; however, it does

300
00:36:27,760 --> 00:36:37,200
have, you know, a skewed tail here, that's a little interesting. And then 2913, you see a

301
00:36:37,200 --> 00:36:47,360
slightly peculiar distribution, and only ever so slightly, but it's similar, I would say, to 2912

302
00:36:47,360 --> 00:36:57,280
except for this oddity. And then 2914 is, once again, kind of similar, but it also has

303
00:36:57,280 --> 00:37:03,520
oddities around this end of the distribution what this would indicate to me is you know maybe

304
00:37:03,520 --> 00:37:14,320
there's something going on with labs 2913 and 2914 how their method is testing above 10% moisture

305
00:37:14,320 --> 00:37:20,880
and once again there may not be anything nefarious about this; it may just be that they have their

306
00:37:20,880 --> 00:37:30,000
instruments calibrated at a certain part of the scale, and so they may just say, maybe we just

307
00:37:30,000 --> 00:37:38,000
need to look at how our method is testing moisture content above 10% similarly other labs may say oh

308
00:37:38,000 --> 00:37:46,720
maybe we're you know structurally testing moisture differently so maybe the laboratories can come to

309
00:37:46,720 --> 00:37:55,600
a consensus on what the appropriate method to test moisture is, and they can all have their own

310
00:37:55,600 --> 00:38:04,400
proprietary takes, but would it hurt for them to at least agree upon, you know, roughly what

311
00:38:04,400 --> 00:38:10,960
scale this moisture thing should be on? Because it looks like these labs think moisture should be

312
00:38:10,960 --> 00:38:16,640
measured in this manner and then these labs think that moisture should be measured in this manner

313
00:38:16,640 --> 00:38:23,760
let's just look at the last one just to be complete and that would be water activity and it's measured

314
00:38:23,760 --> 00:38:34,400
in peculiar units on a scale of zero to one, denoted aw. So here is water activity. They're

315
00:38:34,400 --> 00:38:40,800
not perfectly normal distributions. I was going to say they're not that bizarre, um, I guess

316
00:38:40,800 --> 00:38:49,920
with chunks out. Okay, so right off the bat, 2907 is slightly lower. 2909 has a peculiar double mode.

317
00:38:49,920 --> 00:38:58,560
2910 that's a pretty normal distribution just really really wide tails high variance probably

318
00:38:58,560 --> 00:39:05,840
high kurtosis nothing wrong with that that may just be how moisture is in cannabis here's 2911

319
00:39:05,840 --> 00:39:16,240
slightly less variance a bit more of a skew 2912 quite skewed and their mean looks more similar to

320
00:39:16,240 --> 00:39:24,240
2907 and then once again 2913 and 2914 are slightly different and so if you look at them as a whole

321
00:39:24,800 --> 00:39:30,720
they're not wildly all over the place but they are slightly different on first sight they appear to

322
00:39:30,720 --> 00:39:38,800
be more different than the thc distributions that's subjective but that's my take once again

323
00:39:38,800 --> 00:39:46,320
you've got to weigh the costs and benefits obviously people care most about the psychoactive compounds

324
00:39:46,320 --> 00:39:52,720
the thc and the cbd but as you pointed out at the beginning you do need to be concerned about the

325
00:39:52,720 --> 00:39:59,680
shelf life of these products right you don't want things molding on the shelf we'll get more into

326
00:40:00,240 --> 00:40:07,040
microbes and mycotoxins. We have those results, so we can look more at those, but I think the

327
00:40:07,040 --> 00:40:14,400
takeaway from today is, you know, Washington State may actually be doing a fairly good job at measuring

328
00:40:14,400 --> 00:40:23,440
THC and CBD, and they may simply need to relook at some of the more minor tests, like moisture content

329
00:40:23,440 --> 00:40:30,480
and water activity, because those are the ones that actually look anomalous versus just the

330
00:40:30,480 --> 00:40:37,920
THC and the CBD. You know, maybe they can get around to it now that they're stabilizing their

331
00:40:37,920 --> 00:40:43,200
cannabinoid method so that's sort of my big takeaway from the day remember we've seen all the

332
00:40:43,200 --> 00:40:53,280
peculiar plots, and now you see the THC again by laboratory, and once again this laboratory has

333
00:40:53,280 --> 00:40:59,040
a peculiar mode in the middle maybe there's a perfectly good explanation for that and then

334
00:40:59,040 --> 00:41:06,080
once again, the mode that we're the most concerned about as far as fraud goes would just be

335
00:41:06,640 --> 00:41:15,440
the THC mode, and from looking at all the other modes, yes, there may be slight differences in

336
00:41:15,440 --> 00:41:23,200
variance and mean but they're generally in the same ballpark but as we've pointed out small

337
00:41:23,200 --> 00:41:30,160
differences matter. I don't think this is the end of this analysis, but I think this was an interesting

338
00:41:30,160 --> 00:41:36,400
take on the data well i'll let you in on the future work that needs to be done we're looking

339
00:41:36,400 --> 00:41:43,200
at the traceability data in Washington State, and so this was our first take on looking at the

340
00:41:43,200 --> 00:41:50,800
laboratory results we still have to get through all the inventory the strains the sales a lot more

341
00:41:50,800 --> 00:41:58,480
data to crunch and as you were pointing out it's awesome having this open data well you need good

342
00:41:58,480 --> 00:42:05,520
people good data scientists to actually crunch the numbers and this is golden data that's just been

343
00:42:05,520 --> 00:42:12,880
sitting around in the open and i haven't seen anyone do a really really thorough analysis of it

344
00:42:12,880 --> 00:42:21,040
and that's exactly why the Cannabis Data Science team and Cannlytics are here. So over the coming

345
00:42:21,040 --> 00:42:28,960
weeks let's put our minds together organize curate and analyze this data because it's just sitting

346
00:42:28,960 --> 00:42:35,840
there right for us i think we can answer many cool interesting questions with it so let's embark i

347
00:42:35,840 --> 00:42:40,240
think it's going to be a lot of fun sounds awesome you know i was just like looking at my whiteboard

348
00:42:40,240 --> 00:42:45,520
because I happen to have, like, all the fields and files for the ERD I was going to do for

349
00:42:45,520 --> 00:42:51,840
Washington State. That's still on my whiteboard, it's been there since last winter. But you do so

350
00:42:51,840 --> 00:42:56,240
much, you know, we're good. So that's great we're coming back to that, because there is a lot of

351
00:42:56,240 --> 00:43:03,040
data in Washington State. There certainly is. You know how we are, we're the tortoise, so it may have

352
00:43:03,040 --> 00:43:10,640
been a year but this data no one else has looked at it so that's sort of what's crazy right it's

353
00:43:10,640 --> 00:43:17,520
been almost a whole year and no one's looked at this data because it's such a daunting task and

354
00:43:17,520 --> 00:43:23,840
in fact i was reading a really interesting article i'll have to share it with you that was making the

355
00:43:23,840 --> 00:43:31,120
point that's where a lot of people stop is they stop in the data curation phase people have really

356
00:43:31,120 --> 00:43:42,320
cool ideas for apps and they're building all this shiny software but a lot of it requires data data

357
00:43:42,320 --> 00:43:48,720
that's been carefully curated and cleaned and that's what us awesome data scientists are here

358
00:43:48,720 --> 00:43:55,680
to do and help out is if we can get this data cleaned and organized then it unlocks an incredible

359
00:43:55,680 --> 00:44:03,680
amount of potential. It's simply a daunting task, but it's not hard, it's dirty, and that's why I like to

360
00:44:03,680 --> 00:44:12,080
compare our work to that of a plumber and that's why you pay a plumber a decent amount it's dirty

361
00:44:12,080 --> 00:44:20,880
unglamorous work that no one wants to do, and we'll roll up our sleeves, grab the plunger, and go at it,

362
00:44:20,880 --> 00:44:27,440
so we're going to go grab the data plunger and get everything unclogged and flowing smoothly

363
00:44:27,440 --> 00:44:32,320
and so all this awesome traceability data that's sitting there right in front of us

364
00:44:32,320 --> 00:44:38,640
we'll get that flowing nicely into people's hands. It's great. And so thank you, thank you, thank

365
00:44:38,640 --> 00:44:45,040
you thank you thank you i want to thank you once again for your eyes your ears your brilliant minds

366
00:44:45,040 --> 00:44:50,880
it's you that really moved the ball forward thank you for helping advance cannabis science

367
00:44:50,880 --> 00:45:18,880
couldn't do it without you. Thank you.

