1
00:00:00,000 --> 00:00:08,100
Welcome to the Cannabis Data Science Meetup Group.

2
00:00:08,100 --> 00:00:16,900
I wanted to share with you some of the research and how eager cannabis data scientists like yourself can help out.

3
00:00:16,900 --> 00:00:21,900
I'll go ahead and sort of jump into the material that I wanted to talk about.

4
00:00:21,900 --> 00:00:27,700
But before I do that, why don't I give all of you a chance to talk about what you may want to talk about?

5
00:00:27,700 --> 00:00:31,400
In no particular order, maybe starting in my top corner.

6
00:00:31,400 --> 00:00:34,100
Rick, welcome to the group.

7
00:00:34,100 --> 00:00:44,400
I'd be curious to hear about what you'd like to get out of cannabis data and maybe some interesting projects you're working on and things that you would like to see move forward.

8
00:00:44,400 --> 00:00:48,100
Sure. So, hey everyone, my name is Rick.

9
00:00:48,100 --> 00:00:54,200
I have a background working professionally in big data in the healthcare industry.

10
00:00:54,200 --> 00:01:02,400
I'm also a longtime lover of cannabis, and my state has been recreational for a few years.

11
00:01:02,400 --> 00:01:09,500
So, I've taken advantage of that and begun cultivating, and most of the passion has gone into breeding.

12
00:01:09,500 --> 00:01:19,400
My primary interest in data is compiling as much as I can and creating relations between strains and effects.

13
00:01:19,400 --> 00:01:28,800
And I started looking into it, saw what you guys were doing and it was, you know, right on the same page as what I'm looking into right now.

14
00:01:28,800 --> 00:01:39,300
Ultimately, I'd like to build an app geared towards breeders and facility owners that allows them to use their data more efficiently.

15
00:01:39,300 --> 00:01:47,500
I work with some of the facilities in Michigan here and see just all the untapped potential and data going out the window.

16
00:01:47,500 --> 00:01:51,200
So, I'm working on something to fix that here locally.

17
00:01:51,200 --> 00:01:55,300
Simply phenomenal, Rick. Since you weren't here last week, I'll go ahead and recap.

18
00:01:55,300 --> 00:01:59,100
The varieties are of quite some interest, right?

19
00:01:59,100 --> 00:02:04,900
So, go to any retailer and they're going to be selling by variety.

20
00:02:04,900 --> 00:02:08,400
So, it's a long time tradition in the cannabis industry.

21
00:02:08,400 --> 00:02:17,300
And what we like to do is marry sort of folklore and things that people use every day to statistics.

22
00:02:17,300 --> 00:02:23,800
And so, if people are using these strain names every day, then we're going to see if there's any rhyme or reason to it.

23
00:02:23,800 --> 00:02:29,500
And it seems that, of course, underlying this is the chemical variability.

24
00:02:29,500 --> 00:02:32,300
And where does the chemical variability come from?

25
00:02:32,300 --> 00:02:35,000
Well, that may come from growing conditions, right?

26
00:02:35,000 --> 00:02:39,200
It may come from genetics, could come from probably both.

27
00:02:39,200 --> 00:02:44,200
Love that you're trying to understand this and then you're following it through, right?

28
00:02:44,200 --> 00:02:47,900
From seed to effect, right?

29
00:02:47,900 --> 00:02:57,400
Because, right, not only do you go seed to sale, but what happens after sale, people ingest this and presumably have some sort of effects.

30
00:02:57,400 --> 00:03:01,500
And we're trying to determine what, if any, are those effects.

31
00:03:01,500 --> 00:03:11,900
And we kind of hit on last week that, at least I think, the appropriate way to go about this is just having a null hypothesis that,

32
00:03:11,900 --> 00:03:16,000
OK, let's just say that maybe cannabis has no effect on people.

33
00:03:16,000 --> 00:03:18,800
Of course, there's lots of evidence to the contrary.

34
00:03:18,800 --> 00:03:26,400
So, you know, that may be an easily falsifiable hypothesis, but that's a good starting point.

35
00:03:26,400 --> 00:03:36,600
So then, right, you'll want to one by one see what effects these various compounds people are ingesting are having.

36
00:03:36,600 --> 00:03:40,900
At least, that's what I think a nice scientific way to go about this would be.

37
00:03:40,900 --> 00:03:45,300
It's brilliant. You're right on the money here.

38
00:03:45,300 --> 00:03:48,400
This is what people are working on.

39
00:03:48,400 --> 00:03:57,200
And I think you actually have a pretty sophisticated knowledge of this because, as I'll point out to you today,

40
00:03:57,200 --> 00:04:09,800
just taking into consideration that there are different varieties of cannabis is actually not apparent to many.

41
00:04:09,800 --> 00:04:14,500
And may not be included in the conversation and that has implications.

42
00:04:14,500 --> 00:04:17,800
But I'll point some of that out later today. But enough of me rambling.

43
00:04:17,800 --> 00:04:26,100
Isaac, good to see you today. Curious if you have any projects that you want to see moved forward here in the coming weeks.

44
00:04:26,100 --> 00:04:30,500
Good to see you all again. And I'm Isaac.

45
00:04:30,500 --> 00:04:38,200
I work for MCR Labs, a cannabis testing lab located in Massachusetts.

46
00:04:38,200 --> 00:04:43,700
Well, this is actually for both myself and the place I work for.

47
00:04:43,700 --> 00:04:47,100
Kind of my main focus is on lab shopping.

48
00:04:47,100 --> 00:04:55,100
I mean, for my employer, definitely, as we're trying to do proper science,

49
00:04:55,100 --> 00:05:00,400
we're losing business to those who are fudging numbers.

50
00:05:00,400 --> 00:05:15,500
And for myself, I mean, as a cannabis consumer, I just see a lot of health risk and consumer fraud that is rampant in the industry.

51
00:05:15,500 --> 00:05:23,000
And I feel I'm in a particular position that allows me to do something about it.

52
00:05:23,000 --> 00:05:38,100
So, yeah, that's my main focus, and I am actually working on fraud detection.

53
00:05:38,100 --> 00:05:41,900
And actually, I'm here today with a question.

54
00:05:41,900 --> 00:05:50,500
I'm hoping to probe your brain to hear your thoughts on that.

55
00:05:50,500 --> 00:05:55,000
So it's about using Benford's law.

56
00:05:55,000 --> 00:06:04,400
I would assume you have at least heard of it. It is about the distribution of leading digits in naturally occurring numbers.

57
00:06:04,400 --> 00:06:15,200
Contrary to intuition, it follows a kind of decaying curve, where there are more leading ones than twos.

58
00:06:15,200 --> 00:06:27,500
But for fraud detection in cannabis, the numbers, specifically the THC values, are all clumped together around 20 percent.

59
00:06:27,500 --> 00:06:37,000
Right. And if you go by the textbook, that's a no-no for applying Benford's law.

60
00:06:37,000 --> 00:06:52,400
But then I think I found a method or operation that is to raise the numbers to a high power, a fourth or fifth power.

61
00:06:52,400 --> 00:07:01,600
By doing that, it seems to stretch the numbers across several orders of magnitude, into a realm where Benford's law applies.

62
00:07:01,600 --> 00:07:08,200
So the most straightforward case is take integers from, say, one to a thousand.

63
00:07:08,200 --> 00:07:13,900
Benford's law won't apply, right, because they're sequential numbers.

64
00:07:13,900 --> 00:07:19,100
But if you cube those numbers, then Benford's law does appear.

65
00:07:19,100 --> 00:07:31,500
So I'm just wondering about the possibility of stretching the power of Benford's law by allowing the operation of

66
00:07:31,500 --> 00:07:41,500
raising the data set to a high power to artificially increase the magnitude.

67
00:07:41,500 --> 00:07:48,200
So I'm hoping to hear your thoughts on that direction.
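A minimal sketch of the idea Isaac describes, assuming the integers 1 to 1,000 as the test case and a fifth power as the stretch. The helper names (`first_digit_freqs`, `benford_distance`) are made up for illustration, and the distance is just a mean absolute deviation from the Benford shares, not a formal statistical test.

```python
import math
from collections import Counter

# Benford's expected share of each leading digit d: log10(1 + 1/d).
BENFORD = {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def first_digit_freqs(values):
    """Share of each leading digit (1-9) among positive integers."""
    counts = Counter(int(str(v)[0]) for v in values)
    total = sum(counts.values())
    return {d: counts.get(d, 0) / total for d in range(1, 10)}


def benford_distance(values):
    """Mean absolute deviation between observed and Benford digit shares."""
    freqs = first_digit_freqs(values)
    return sum(abs(freqs[d] - BENFORD[d]) for d in range(1, 10)) / 9


raw = list(range(1, 1001))          # sequential integers: nearly uniform digits
stretched = [n ** 5 for n in raw]   # same numbers raised to the fifth power

print("distance, raw:      ", round(benford_distance(raw), 4))
print("distance, stretched:", round(benford_distance(stretched), 4))
```

On this toy case the stretched numbers land closer to Benford's shares than the raw ones, though not exactly on them. Real THC values are decimals and would need a different leading-digit helper, and whether the power trick is valid for fraud detection is still the open question here.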

68
00:07:48,200 --> 00:08:00,100
And in addition, as my colleague Yasha just joined, we were thinking about another question.

69
00:08:00,100 --> 00:08:05,400
Well, as we all know, THC values have been rising for a while.

70
00:08:05,400 --> 00:08:12,500
And the easiest explanation is, well, there could be a lot of explanations.

71
00:08:12,500 --> 00:08:20,900
Lab shopping is definitely one, but it could also be that the growers are getting better at cultivating,

72
00:08:20,900 --> 00:08:29,700
or it could be that they are simply getting better at selecting the best genetics to grow.

73
00:08:29,700 --> 00:08:39,500
So we're interested in finding out how much each of those potential causes contributes to the

74
00:08:39,500 --> 00:08:44,700
gradual growth of THC content that we've been seeing. The Washington data should show this.

75
00:08:44,700 --> 00:08:48,200
Right. Yasha, what was your input there at the end?

76
00:08:48,200 --> 00:08:54,500
The data that you guys have from Washington should answer two of those questions.

77
00:08:54,500 --> 00:09:05,200
Is it that genetics that yield higher THC are being used more often?

78
00:09:05,200 --> 00:09:16,100
And is that leading to a higher average potency, or is a single strain tested by the same grower at the same lab growing in potency or staying the same?
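A rough sketch of how Yasha's two questions could be separated, assuming each record carries a year, strain, grower, lab, and THC value. The records below are invented toy values, not Washington data, and the function names are hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Toy records: (year, strain, grower, lab, thc_percent). All values are made up.
records = (
    [(2019, "A", "g1", "lab1", 18.0)] * 3
    + [(2019, "B", "g1", "lab1", 24.0)]
    + [(2020, "A", "g1", "lab1", 18.0)]
    + [(2020, "B", "g1", "lab1", 24.0)] * 3
)


def overall_mean_by_year(rows):
    """Average THC across all results in each year."""
    by_year = defaultdict(list)
    for year, _strain, _grower, _lab, thc in rows:
        by_year[year].append(thc)
    return {year: mean(values) for year, values in by_year.items()}


def within_group_change(rows, start, end):
    """Average change in mean THC for the same (strain, grower, lab)."""
    groups = defaultdict(lambda: defaultdict(list))
    for year, strain, grower, lab, thc in rows:
        groups[(strain, grower, lab)][year].append(thc)
    changes = [
        mean(years[end]) - mean(years[start])
        for years in groups.values()
        if start in years and end in years
    ]
    return mean(changes)


overall = overall_mean_by_year(records)
print("overall change:", overall[2020] - overall[2019])
print("within-strain change:", within_group_change(records, 2019, 2020))
```

In this toy data the overall average rises by 3 points while every strain stays flat, so the whole rise comes from the higher-THC strain being tested more often.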

79
00:09:16,100 --> 00:09:24,300
I'm going to basically let you in on the golden way that you can actually go about attempting to answer this question.

80
00:09:24,300 --> 00:09:28,100
Right. You came here. So here it is. Right.

81
00:09:28,100 --> 00:09:39,400
Ultimately, the problem you're going to run into is, well, how do you identify which lab results are fraudulent and which ones are not?

82
00:09:39,400 --> 00:09:50,100
Right. And so basically at the end of the day, right, you'd want a zero or one zero, not fraudulent, one fraudulent,

83
00:09:50,100 --> 00:09:57,700
because ultimately you're going to need to predict if the lab result is fraudulent or not.

84
00:09:57,700 --> 00:10:03,600
And I think, yes, exactly. You could use Benford's law in a simple way.

85
00:10:03,600 --> 00:10:12,200
Right. If you built this logistic regression, so zero or one, trying to predict if it's fraudulent or not, you could, say, use Benford's law.

86
00:10:12,200 --> 00:10:16,600
Maybe you know this better than I.

87
00:10:16,600 --> 00:10:26,100
I'm just kind of conjecturing here, but maybe as your regressors, you would use things like I don't know how you would incorporate this into your regressors,

88
00:10:26,100 --> 00:10:29,700
but somehow have the number of digits or something of that sort.

89
00:10:29,700 --> 00:10:33,600
Actually, in fact, I don't know how you would build out the regression.

90
00:10:33,600 --> 00:10:42,800
And that actually may be, not the beauty, but the novel insight you have.

91
00:10:42,800 --> 00:10:48,600
Right. Maybe if you wanted to patent some fraud detection device, you just have a really good algorithm.

92
00:10:48,600 --> 00:10:51,900
Well, first things first, you need the zeros or ones.

93
00:10:51,900 --> 00:10:57,400
And so now that I've been teasing it, well, how do you actually go about getting that data set?

94
00:10:57,400 --> 00:11:09,900
Well, it so happens that I do believe this data exists and you can get your hands on it in that we've been looking at the Washington State data.

95
00:11:09,900 --> 00:11:14,800
That's because Washington State has a pretty open Freedom of Information Act request process.

96
00:11:14,800 --> 00:11:24,700
So you can get these data points specifically and what you'd be looking for or as some of you may have known,

97
00:11:24,700 --> 00:11:36,600
if you look in the news, a former laboratory in Washington State, Praxis Laboratory, was caught falsifying samples.

98
00:11:36,600 --> 00:11:45,900
And, you know, these lab results were entered into the traceability system, so you can then get those results.

99
00:11:45,900 --> 00:11:52,300
That would basically be your subset of known fraudulent results.

100
00:11:52,300 --> 00:12:00,400
So that basically you would just say, OK, you know, these are the lab results that we know are fraudulent.

101
00:12:00,400 --> 00:12:05,300
Those get a one. The rest of them, you would assume, are non-fraudulent.

102
00:12:05,300 --> 00:12:09,800
But as you will point out, that's a big assumption, right?

103
00:12:09,800 --> 00:12:21,300
Because we don't actually know for certain, right, that none of the other laboratories were falsifying lab results.

104
00:12:21,300 --> 00:12:25,400
So that's sort of a hiccup in your analysis.

105
00:12:25,400 --> 00:12:32,300
But you could potentially just, I guess, look at the Praxis Laboratory lab results.

106
00:12:32,300 --> 00:12:43,700
And so then for that time window, you would just say, OK, these are all the lab results that Praxis Laboratory tested that weren't falsified.

107
00:12:43,700 --> 00:12:47,000
These are the ones that were falsified.

108
00:12:47,000 --> 00:12:55,700
Is there any way to detect to a significant degree of certainty that those were, in fact, falsified?

109
00:12:55,700 --> 00:13:00,400
And that's where your training, your prediction model would come in.
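A minimal sketch of the zero-or-one setup, assuming a single made-up feature per lab result (for instance, some score of how far a batch strays from Benford's law). The scores, labels, and function names are all hypothetical, and the fit is plain gradient descent rather than any particular library's routine.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, ys, lr=1.0, steps=5000):
    """Fit p(fraud) = sigmoid(w * x + b) by plain gradient descent."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        errors = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
        w -= lr * sum(e * x for e, x in zip(errors, xs)) / n
        b -= lr * sum(errors) / n
    return w, b


# Hypothetical training set: 1 = known fraudulent, 0 = assumed clean.
scores = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
labels = [0, 0, 0, 0, 1, 1, 1, 1]

w, b = fit_logistic(scores, labels)


def predict(score):
    """Predicted probability that a result with this score is fraudulent."""
    return sigmoid(w * score + b)


print("p(fraud | score=0.2):", round(predict(0.2), 3))
print("p(fraud | score=0.8):", round(predict(0.8), 3))
```

The toy data is perfectly separable, which real lab results would not be. The labels are also only as good as the assumption that every unflagged result is clean.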

110
00:13:00,400 --> 00:13:05,800
So you say, oh, you know, and that's, I guess, yet another tricky part.

111
00:13:05,800 --> 00:13:09,100
So that's at least the first tricky part solved.

112
00:13:09,100 --> 00:13:11,600
How do you get your training data set?

113
00:13:11,600 --> 00:13:16,700
The second tricky part is how do you actually predict?

114
00:13:16,700 --> 00:13:19,700
So that's actually something I'll have to chew on.

115
00:13:19,700 --> 00:13:21,800
But maybe you've thought about that.

116
00:13:21,800 --> 00:13:23,000
Any questions at this point?

117
00:13:23,000 --> 00:13:23,900
Yasha?

118
00:13:23,900 --> 00:13:27,300
There are just multiple ways for a lab to cheat.

119
00:13:27,300 --> 00:13:33,200
There's the way that they prep the sample, or just fudging numbers.

120
00:13:33,200 --> 00:13:40,200
And there can be probably four or five different ways by which they can do so.

121
00:13:40,200 --> 00:13:43,000
So we need examples of each to be the ones.

122
00:13:43,000 --> 00:13:47,800
And then a clean data set that's just definitely clean.

123
00:13:47,800 --> 00:13:49,600
Right. And that's a good point.

124
00:13:49,600 --> 00:13:57,200
And I think we kind of touched on this last week, where there are so many variables.

125
00:13:57,200 --> 00:13:59,500
Right. And each of these are something that you would control for.

126
00:13:59,500 --> 00:14:02,700
And this is the thing about statistical models is right.

127
00:14:02,700 --> 00:14:07,300
They're very laser-focused on the data at hand.

128
00:14:07,300 --> 00:14:16,100
Right. So if you train your model on data and you only had, like you said, the one way that fraud was committed,

129
00:14:16,100 --> 00:14:21,600
then you would potentially miss other ways.

130
00:14:21,600 --> 00:14:26,600
So maybe they are somehow preserving the digits.

131
00:14:26,600 --> 00:14:30,000
So maybe they would somehow preserve Benford's law.

132
00:14:30,000 --> 00:14:32,000
Something else is going on.

133
00:14:32,000 --> 00:14:33,300
So that's a possibility.

134
00:14:33,300 --> 00:14:37,700
And in fact, that may just be where we just kind of need to expand the data set.

135
00:14:37,700 --> 00:14:41,800
There is the one case in Washington state where I think you could actually get the data.

136
00:14:41,800 --> 00:14:46,000
There have been other cases of fraud, like in Michigan.

137
00:14:46,000 --> 00:14:49,600
I think there was a case that may be debated.

138
00:14:49,600 --> 00:14:52,300
I think that one may be contested.

139
00:14:52,300 --> 00:14:57,700
And then I think there could have been an occurrence in Nevada and perhaps California.

140
00:14:57,700 --> 00:15:06,300
But ultimately, I think what you would want to seek are the results that you know are fraudulent

141
00:15:06,300 --> 00:15:12,300
versus the ones that you suspect aren't and see if there's any systematic difference.

142
00:15:12,300 --> 00:15:20,500
Because otherwise, and this is actually unfortunately what we've kind of seen is basically maybe not wrong,

143
00:15:20,500 --> 00:15:23,600
but I don't think it was fruitful.

144
00:15:23,600 --> 00:15:30,200
But basically, like in Washington state, right, you just see people take simple averages by lab

145
00:15:30,200 --> 00:15:37,600
and just kind of complain that a particular lab may have, you know, high on average results,

146
00:15:37,600 --> 00:15:40,700
which, you know, it's not a good look, right,

147
00:15:40,700 --> 00:15:45,900
if you're producing results that are significantly different than another laboratory.

148
00:15:45,900 --> 00:15:52,200
But, right, there are many ways that people can go about explaining that away.

149
00:15:52,200 --> 00:15:55,700
There's the, oh, you know, good growers will choose our lab.

150
00:15:55,700 --> 00:16:01,000
As much as it may be like a signal of like, oh, maybe we should look here or look there.

151
00:16:01,000 --> 00:16:04,800
I don't think there is much concrete evidence.

152
00:16:04,800 --> 00:16:13,700
Because the other thing to kind of consider is there's sort of permitted ways that people can kind of push the numbers up.

153
00:16:13,700 --> 00:16:17,800
Or maybe they're not permitted, but they're definitely kind of gray areas.

154
00:16:17,800 --> 00:16:23,400
So, for example, I don't know if this is, this wouldn't really affect total THC or CBD,

155
00:16:23,400 --> 00:16:30,700
but I know there was an effort to just add as many cannabinoids as you can to your panel, right.

156
00:16:30,700 --> 00:16:39,400
Because if you're testing for CBL and CBT and CBCA and CBDV,

157
00:16:39,400 --> 00:16:47,000
the more cannabinoids you test for, the more that would push up the total cannabinoids.

158
00:16:47,000 --> 00:16:51,000
So that was kind of happening. That's kind of the case in California.

159
00:16:51,000 --> 00:16:57,000
So once again, I think there's many factors going on, but that's definitely one of them is

160
00:16:57,000 --> 00:17:04,300
there's definitely been a push to increase the panel, which isn't actually necessarily a bad thing.

161
00:17:04,300 --> 00:17:08,600
But as you pointed out, there may be other things going on.

162
00:17:08,600 --> 00:17:17,100
With that one specifically, the more cannabinoids you have, the lower THC will likely be,

163
00:17:17,100 --> 00:17:21,900
because it'll get rid of the coelution problem, especially with Delta-8, for example,

164
00:17:21,900 --> 00:17:23,900
which coelutes with Delta-9.

165
00:17:23,900 --> 00:17:30,100
And in an article about what happened in Washington,

166
00:17:30,100 --> 00:17:40,500
it does reference a price point at which buyers buy cannabis, which is at 20% THC.

167
00:17:40,500 --> 00:17:45,500
And I think THC by most dispensaries is defined as THC plus THCA.

168
00:17:45,500 --> 00:17:52,500
And I think the number that most people look to is how much THC is in there,

169
00:17:52,500 --> 00:17:55,100
not how many total cannabinoids are in there.

170
00:17:55,100 --> 00:18:01,400
So I think what they're trying to do is specifically how can we increase THC for that final sale to be higher.

171
00:18:01,400 --> 00:18:08,100
I may need some clarification on your second point, but the first point was actually clever

172
00:18:08,100 --> 00:18:15,700
in that you would actually exactly expect as you add more cannabinoids for the THCA

173
00:18:15,700 --> 00:18:22,900
or Delta-8 or so on and so forth to diminish because you can actually see legitimate cases of this.

174
00:18:22,900 --> 00:18:28,800
So for example, in Michigan, we had a lot of PSI labs data.

175
00:18:28,800 --> 00:18:36,700
And you could see back in 2015, 2016, they were just fundamentally testing cannabis differently.

176
00:18:36,700 --> 00:18:41,400
I mean, for example, I believe they were even testing it with GC.

177
00:18:41,400 --> 00:18:45,300
So they started off testing just by GC, right?

178
00:18:45,300 --> 00:18:51,800
Because they were mostly just trying to get total THC, total CBD.

179
00:18:51,800 --> 00:18:57,600
And today, right, you move to HPLC, and the numbers are going to be fundamentally different, right?

180
00:18:57,600 --> 00:19:02,600
You would still expect the total THC and CBD to be in the same ballpark.

181
00:19:02,600 --> 00:19:04,100
They're still being measured differently.

182
00:19:04,100 --> 00:19:05,700
And then, of course, the time effects.

183
00:19:05,700 --> 00:19:09,200
So yes, overall, it's really interesting data.

184
00:19:09,200 --> 00:19:14,400
I think I shared it with you last week, but I'll make sure that you have a hold of it because the PSI labs data,

185
00:19:14,400 --> 00:19:16,200
I think, is pretty good.

186
00:19:16,200 --> 00:19:19,400
Do we know for certain that nothing was falsified?

187
00:19:19,400 --> 00:19:23,000
No, we kind of have to go off of heuristics.

188
00:19:23,000 --> 00:19:29,400
And from seeing the scientific director speak and whatnot,

189
00:19:29,400 --> 00:19:35,000
I don't have any reasons to believe they're putting out wrong data.

190
00:19:35,000 --> 00:19:40,400
Their data is interesting to observe over time because you see methods change

191
00:19:40,400 --> 00:19:47,400
and you can observe cultivators in Michigan that appear to be getting better at their craft over time.

192
00:19:47,400 --> 00:19:53,600
Oh, yes. And then you raised an interesting question, and we could actually look at this.

193
00:19:53,600 --> 00:19:56,000
We could do a regression.

194
00:19:56,000 --> 00:20:03,100
One of our explanatory variables would be number of cannabinoids.

195
00:20:03,100 --> 00:20:06,700
So how many cannabinoids are in your panel?

196
00:20:06,700 --> 00:20:10,200
Is it eight? Is it 12? Is it 16?

197
00:20:10,200 --> 00:20:15,900
And then your variable of interest would be, say, THCA.

198
00:20:15,900 --> 00:20:19,600
And then you could, I guess, try to disentangle that and say,

199
00:20:19,600 --> 00:20:29,200
okay, you know, as we add analytes to our panel, do we detect more or less of these various compounds?

200
00:20:29,200 --> 00:20:32,100
You're smiling. Wait, does this sound silly or?

201
00:20:32,100 --> 00:20:35,900
No, no, no, I think it's perfect so long as the data is clean.

202
00:20:35,900 --> 00:20:41,300
As in, if you're specifically talking about PSI's data

203
00:20:41,300 --> 00:20:44,600
and we believe that their data is accurate, then yes.

204
00:20:44,600 --> 00:20:48,300
But if we were to look at, let's say, all of Washington data,

205
00:20:48,300 --> 00:20:56,000
then we would first have to remove any data from labs that may not be trusted.

206
00:20:56,000 --> 00:20:58,800
Because that would mess up the whole experiment.

207
00:20:58,800 --> 00:21:06,600
And we may not know necessarily how many cannabinoids each of the labs are offering at each time.

208
00:21:06,600 --> 00:21:11,500
I think that analysis is best lab by lab.

209
00:21:11,500 --> 00:21:17,100
Or you could always basically have a lab specific fixed effect.

210
00:21:17,100 --> 00:21:20,100
I don't know if I've explicitly recommended these before,

211
00:21:20,100 --> 00:21:27,500
but you definitely want to have lab specific fixed effects if you're doing analysis for multiple labs.

212
00:21:27,500 --> 00:21:32,500
So, for example, Rick, when you're doing genetic analysis,

213
00:21:32,500 --> 00:21:39,500
say you want to do some prediction on the effects of your strains or this or that,

214
00:21:39,500 --> 00:21:42,100
if you're getting data from a bunch of different,

215
00:21:42,100 --> 00:21:46,400
you may want to get data from multiple labs.

216
00:21:46,400 --> 00:21:52,900
And then if you do so, you may want to basically control for lab-by-lab variance.
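A small sketch of what a lab-specific fixed effect does in the panel-size regression discussed above, assuming toy numbers where each lab has its own baseline THCA and every added analyte lowers the reading by 0.25. The data and helper names are invented, and the fixed effect is implemented by demeaning within each lab, which is one standard way to do it.

```python
from collections import defaultdict


def slope(xs, ys):
    """Ordinary least squares slope of y on x (with an intercept)."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


def demean_by_lab(labs, values):
    """Subtract each lab's own mean, which absorbs the lab fixed effect."""
    groups = defaultdict(list)
    for lab, value in zip(labs, values):
        groups[lab].append(value)
    means = {lab: sum(vals) / len(vals) for lab, vals in groups.items()}
    return [value - means[lab] for lab, value in zip(labs, values)]


# Toy data: lab B reads higher overall and also runs larger panels.
labs = ["A", "A", "A", "B", "B", "B"]
panel_size = [8, 12, 16, 12, 16, 20]
thca = [18.0, 17.0, 16.0, 22.0, 21.0, 20.0]

pooled = slope(panel_size, thca)
within = slope(demean_by_lab(labs, panel_size), demean_by_lab(labs, thca))

print("pooled slope:", round(pooled, 4))
print("slope with lab fixed effects:", round(within, 4))
```

Without the fixed effect the pooled slope comes out positive, because the lab with the larger panels also reads higher. With it, the within-lab slope of -0.25 is recovered.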

217
00:21:52,900 --> 00:21:58,600
Back to the topic at hand, I think is, what are all the different ways I guess fraud can be committed?

218
00:21:58,600 --> 00:22:01,600
Or it may not necessarily be fraud.

219
00:22:01,600 --> 00:22:07,700
For example, a consideration that I had, I don't know if this is the most likely,

220
00:22:07,700 --> 00:22:11,500
but it could be a confounding variable.

221
00:22:11,500 --> 00:22:14,100
And I don't want to throw them under the bus because these are the hardest,

222
00:22:14,100 --> 00:22:17,200
in my opinion, the hardest working people at the laboratory.

223
00:22:17,200 --> 00:22:23,900
But unfortunately, the analysts may feel a sort of pressure, like,

224
00:22:23,900 --> 00:22:29,100
oh, the director wants us to have really high lab results.

225
00:22:29,100 --> 00:22:35,100
And they may just be kind of heavy handed, so to speak, not intentionally,

226
00:22:35,100 --> 00:22:40,300
but just kind of introduce bias of their own because they may just say like,

227
00:22:40,300 --> 00:22:46,600
oh, you know, all my peers, all their measurements are coming out at 30%.

228
00:22:46,600 --> 00:22:52,700
Maybe I'll just be a little... Basically, for the analysts,

229
00:22:52,700 --> 00:22:57,000
the way that would end up happening is you'd just be maybe a little heavy-handed

230
00:22:57,000 --> 00:23:01,100
on the amount that you sample out and you'd be a little light handed

231
00:23:01,100 --> 00:23:04,200
on basically the amount of methanol you would add in.

232
00:23:04,200 --> 00:23:10,700
I don't think that would be the primary way that fraud would be occurring,

233
00:23:10,700 --> 00:23:15,100
but it would just be kind of something to take into consideration is,

234
00:23:15,100 --> 00:23:18,600
unfortunately, like I said, these are hard working people,

235
00:23:18,600 --> 00:23:21,000
may not be getting paid the most.

236
00:23:21,000 --> 00:23:27,200
And so they may just have sort of this pressure to weigh things heavy, so to speak.

237
00:23:27,200 --> 00:23:31,900
But I just think it's something to consider.

238
00:23:31,900 --> 00:23:36,900
And then, of course, just outright fraud, just outright changing the numbers,

239
00:23:36,900 --> 00:23:42,900
which is where Benford's law comes in. Surprisingly, I've never really played around with this before.

240
00:23:42,900 --> 00:23:47,100
So Isaac, if you have any insights or maybe one meetup,

241
00:23:47,100 --> 00:23:52,600
if you want to share any luck or work that you've done with this,

242
00:23:52,600 --> 00:23:54,800
I'd be excited to learn more about this.

243
00:23:54,800 --> 00:23:59,400
The final variant here is just, and in fact,

244
00:23:59,400 --> 00:24:07,000
this is I think what people complain the most or I've heard people complain the most about in Oregon,

245
00:24:07,000 --> 00:24:11,300
and in fact in California recently too,

246
00:24:11,300 --> 00:24:17,600
is basically the cultivator would send in a non-random sample

247
00:24:17,600 --> 00:24:20,900
or potentially even an adulterated sample.

248
00:24:20,900 --> 00:24:26,700
The idea is, say in Washington state, you've got to take a random sample from a five pound block.

249
00:24:26,700 --> 00:24:31,900
And I think cultivators will even get audited.

250
00:24:31,900 --> 00:24:36,000
So the auditor will come by and say, okay, how do you sample from your SOP?

251
00:24:36,000 --> 00:24:41,100
And they say, oh, you know, we'll just take, you know, five random nugs from this five pound lot.

252
00:24:41,100 --> 00:24:46,400
But it's sort of like, okay, well, you know, how do they sample when the auditor is not there?

253
00:24:46,400 --> 00:24:51,200
You know, are they just picking the choicest buds?

254
00:24:51,200 --> 00:25:00,000
In Oregon, there have been claims that people will, like, sprinkle kief or even, I've heard people will even,

255
00:25:00,000 --> 00:25:05,600
and this is where I say, so there's a couple of things people will say, oh, well, they'll paint them.

256
00:25:05,600 --> 00:25:11,500
So they'll like get a cannabis concentrate and, you know, lather that on.

257
00:25:11,500 --> 00:25:13,800
Well, then you run into the situation.

258
00:25:13,800 --> 00:25:19,300
Well, we'll say you've got a cannabis flower with concentrate lathered on it.

259
00:25:19,300 --> 00:25:29,300
Well, a good analyst at the laboratory, you know, they're going to look at this under a microscope to look at it for contaminants or,

260
00:25:29,300 --> 00:25:33,300
you know, foreign matter. They'll see, oh, this has been adulterated.

261
00:25:33,300 --> 00:25:36,100
Well, now this analyst is in, right.

262
00:25:36,100 --> 00:25:42,500
Once again, you've got a person who's getting paid, you know, the bare minimum.

263
00:25:42,500 --> 00:25:45,300
Okay, they see the sample. It's been adulterated.

264
00:25:45,300 --> 00:25:54,400
Well, that's going to be a real, pardon the pun, because it's really no joking matter,

265
00:25:54,400 --> 00:25:59,400
but they're going to be in a sticky situation because, okay, what do you do?

266
00:25:59,400 --> 00:26:09,800
Do you now say... This is actually my recommendation: the lab should then fail that sample for foreign matter.

267
00:26:09,800 --> 00:26:15,500
They should say, oh, this sample's been adulterated. We're going to fail this.

268
00:26:15,500 --> 00:26:20,000
Well, now that's now in the traceability system.

269
00:26:20,000 --> 00:26:23,500
The state's going to say, you did what?

270
00:26:23,500 --> 00:26:26,500
Yeah, you know, you adulterated this sample.

271
00:26:26,500 --> 00:26:31,000
The cultivator could potentially be in big trouble.

272
00:26:31,000 --> 00:26:34,800
You know, they may lose their license.

273
00:26:34,800 --> 00:26:37,600
They could easily lose their license over something like that.

274
00:26:37,600 --> 00:26:41,800
What seemed like standard practice from the cultivator, right?

275
00:26:41,800 --> 00:26:46,300
They say, oh, all the other cultivators are adulterating their product.

276
00:26:46,300 --> 00:26:49,200
We'll adulterate it. They may lose their license.

277
00:26:49,200 --> 00:26:56,600
And then the lab, once again, is in a difficult position because I've heard labs say that,

278
00:26:56,600 --> 00:26:59,600
oh, you know, we're not the potency police.

279
00:26:59,600 --> 00:27:06,400
And I kind of cringed at hearing that because, well, they actually kind of are, right?

280
00:27:06,400 --> 00:27:12,100
Because I was thinking about it the other day, and there's a reason why laboratories have to get accredited.

281
00:27:12,100 --> 00:27:14,000
And, Yasha, you can probably attest to this.

282
00:27:14,000 --> 00:27:20,300
And, Isaac, this is actually the point you raised at the beginning is it's kind of a bit of a privilege,

283
00:27:20,300 --> 00:27:23,100
so to speak, to be a licensed laboratory.

284
00:27:23,100 --> 00:27:29,000
You know, you're there for the public to say, yeah, so there's a lot of responsibility there.

285
00:27:29,000 --> 00:27:33,500
You know, that's, I think, the reason why it's not cheap to start a laboratory

286
00:27:33,500 --> 00:27:36,300
and the reason why you have to go through these accreditations.

287
00:27:36,300 --> 00:27:41,800
Is because you're the ones who do have to make the tough call at the end of the day and say,

288
00:27:41,800 --> 00:27:44,900
yep, these guys sent in an adulterated sample.

289
00:27:44,900 --> 00:27:49,300
It's tough, but we're going to have to now fail you for foreign matter.

290
00:27:49,300 --> 00:27:51,900
They're probably going to take your license away.

291
00:27:51,900 --> 00:27:55,000
You know, that's just a really, really tough call.

292
00:27:55,000 --> 00:27:58,300
And like I said, they may not lose their license, right?

293
00:27:58,300 --> 00:28:02,400
So, depending on the state, maybe they'll get a warning.

294
00:28:02,400 --> 00:28:05,200
Maybe they'll say, OK, we're going to give you a warning.

295
00:28:05,200 --> 00:28:07,000
It may vary by state.

296
00:28:07,000 --> 00:28:08,800
You know, it's a tough call.

297
00:28:08,800 --> 00:28:11,000
Who's responsible?

298
00:28:11,000 --> 00:28:17,100
But I don't think that's something that the laboratory should just pretend they didn't see.

299
00:28:17,100 --> 00:28:19,600
So that's sort of my take on it.

300
00:28:19,600 --> 00:28:25,800
I suggested this in the past where, OK, we found this big ugly problem.

301
00:28:25,800 --> 00:28:30,400
And instead of just stopping there, let's actually offer a solution.

302
00:28:30,400 --> 00:28:34,200
It's not going to be the best solution in the world, but it can be a starting point.

303
00:28:34,200 --> 00:28:42,400
And from what I've seen, a simple, effective, cost-effective way to go about solving this is just say,

304
00:28:42,400 --> 00:28:50,300
you have to keep the samples for 90 days or you have to keep the sample for 60 days.

305
00:28:50,300 --> 00:28:58,500
This imposes a cost on the labs because they'll have to invest in refrigeration or what have you.

306
00:28:58,500 --> 00:29:03,800
But that's a relatively minor cost compared to some of the alternatives.

307
00:29:03,800 --> 00:29:11,000
And then the idea is, OK, if there's any question later on down the line,

308
00:29:11,000 --> 00:29:13,600
then you can come back and check the sample.

309
00:29:13,600 --> 00:29:20,700
And basically, if something tests in the store at 40% for flower,

310
00:29:20,700 --> 00:29:28,300
you could just go back to the laboratory and just say, OK, you know, can I in fact see this?

311
00:29:28,300 --> 00:29:31,600
Is it obvious that it was adulterated?

312
00:29:31,600 --> 00:29:40,800
If it's not obvious, then you could ask the laboratory to do a confirmation run.

313
00:29:40,800 --> 00:29:44,000
But question before I just keep droning on.

314
00:29:44,000 --> 00:29:48,000
I just want to weigh in. Do you mind if I weigh in on last week's next?

315
00:29:48,000 --> 00:29:55,800
So, let's say, in Massachusetts, we view pre-rolled joints of flower as flower, and they are tested as flower.

316
00:29:55,800 --> 00:30:00,200
It's not separated out as its own testing category.

317
00:30:00,200 --> 00:30:06,000
And sometimes the stuff that goes into pre-rolled joints is a combination of flower and some concentrate.

318
00:30:06,000 --> 00:30:11,800
And it's submitted to us as flower. And then it's definitely altered.

319
00:30:11,800 --> 00:30:16,100
And to me, from, I guess, the ethical standpoint, it's this:

320
00:30:16,100 --> 00:30:19,000
This is what they submitted. This is what we're testing.

321
00:30:19,000 --> 00:30:22,000
These are the results that we're going to give them.

322
00:30:22,000 --> 00:30:28,300
And it accurately says what's in the product that was given to us

323
00:30:28,300 --> 00:30:32,500
and accurately represents the product that consumers will end up buying.

324
00:30:32,500 --> 00:30:39,200
That is, if the product that consumers end up buying is also

325
00:30:39,200 --> 00:30:44,200
given the extra whatever was put into the testing sample.

326
00:30:44,200 --> 00:30:54,000
And if, let's say, there's a grower that wants to hack the system, wants to send in a sample that will have more,

327
00:30:54,000 --> 00:30:57,900
that will have concentrate tossed in it just for higher results,

328
00:30:57,900 --> 00:31:02,800
if they do it even just once, shame on them. They absolutely shouldn't.

329
00:31:02,800 --> 00:31:07,800
And if the results are ridiculous, then I think a secret shopper program would catch that.

330
00:31:07,800 --> 00:31:13,400
OK, the regulators bought a product, they got it tested, and the results differ very much

331
00:31:13,400 --> 00:31:19,600
between what consumers buy and what the original results were.

332
00:31:19,600 --> 00:31:22,600
And that should lead to an analysis of the data.

333
00:31:22,600 --> 00:31:29,900
Does it show that this is an outlier? Or is this a consistent thing?

334
00:31:29,900 --> 00:31:33,000
Well, I can surely hear you tell how it is.

335
00:31:33,000 --> 00:31:39,400
Exactly. So I think states just need to be conscious and explicitly say how this needs to be handled.

336
00:31:39,400 --> 00:31:47,000
So, for example, in Washington state, they just say if you add anything to it, it's technically a mixed product.

337
00:31:47,000 --> 00:31:50,400
I don't see anything wrong if you want to call it a mixed product.

338
00:31:50,400 --> 00:31:57,200
But I think the idea behind the testing was whatever you send in should be representative of what you sell.

339
00:31:57,200 --> 00:32:03,700
If you send in flower with kief sprinkled on it, as long as that's what the label says

340
00:32:03,700 --> 00:32:08,800
and depending on what state you're in, you may call it a mixed product, then by all means.

341
00:32:08,800 --> 00:32:12,000
I mean, in fact, people have really creative products, right?

342
00:32:12,000 --> 00:32:14,900
There's a famous product, the moon rocks.

343
00:32:14,900 --> 00:32:17,700
And that actually is just exactly that.

344
00:32:17,700 --> 00:32:23,200
Just a cannabis flower coated in oil and rolled around in kief.

345
00:32:23,200 --> 00:32:25,700
So people will specifically buy those.

346
00:32:25,700 --> 00:32:32,600
But that's fine. But if you're selling moon rocks, send in representative moon rocks to be tested.

347
00:32:32,600 --> 00:32:39,400
I don't think it's 100% okay to just say you're selling flower

348
00:32:39,400 --> 00:32:47,200
because I think the big complaint is people just get low-quality flower, sprinkle kief on it.

349
00:32:47,200 --> 00:32:49,500
What they send in isn't what they're selling.

350
00:32:49,500 --> 00:32:51,300
And then Isaac, question?

351
00:32:51,300 --> 00:32:54,900
Yeah, I just want to kind of put in my two cents on this.

352
00:32:54,900 --> 00:32:58,400
It seems to me that there really are two separate issues.

353
00:32:58,400 --> 00:33:05,100
The first is whether the sample that's tested is representative of whatever is being sold.

354
00:33:05,100 --> 00:33:13,100
And the second being the type of product like a pre-roll, as Yasha was saying.

355
00:33:13,100 --> 00:33:21,200
I mean, the cannabis market in general is very new and there are always new products being made.

356
00:33:21,200 --> 00:33:27,600
And we have categories like flower, extracts, concentrates,

357
00:33:27,600 --> 00:33:33,000
but pre-roll is blurring the line between flower and extracts, right?

358
00:33:33,000 --> 00:33:39,000
And that's more of a regulatory or categorization problem than a representativeness problem.

359
00:33:39,000 --> 00:33:45,900
So I just want to separate those two issues as they are rather than having them mixed together.

360
00:33:45,900 --> 00:33:47,700
100% agreed.

361
00:33:47,700 --> 00:33:53,900
And this is once again, I think we said this last week, but it helps to reiterate this week.

362
00:33:53,900 --> 00:34:01,200
That's why we're talking about these things is it just helps to say them out loud and formalize them.

363
00:34:01,200 --> 00:34:04,600
As I shared this piece of policy with you out of Washington State,

364
00:34:04,600 --> 00:34:09,300
a lot of the language either doesn't take product types into consideration

365
00:34:09,300 --> 00:34:14,600
or when they do, they do it in maybe not the most logical fashion, right?

366
00:34:14,600 --> 00:34:21,600
So just the fact that you pointed out that, oh, there's actually two distinct problems going on here is helpful

367
00:34:21,600 --> 00:34:25,600
because now, right, regulators can take that into consideration that, oh, yes,

368
00:34:25,600 --> 00:34:32,400
we need to be explicit about how different product types are being sampled and tested.

369
00:34:32,400 --> 00:34:36,300
You can say, oh, take a representative batch from five pounds of flower.

370
00:34:36,300 --> 00:34:43,500
Well, then also have sampling rules for joints, have sampling rules for mixed products, concentrate, so on and so forth.

371
00:34:43,500 --> 00:34:49,700
And then also, the states may want to be explicit about sampling.

372
00:34:49,700 --> 00:34:53,300
You know, what does adulteration, what does that mean?

373
00:34:53,300 --> 00:34:56,300
What are the consequences?

374
00:34:56,300 --> 00:34:57,900
Who is responsible?

375
00:34:57,900 --> 00:35:01,700
Because as I was saying, the labs are in a difficult situation, right?

376
00:35:01,700 --> 00:35:02,900
If all the other, right.

377
00:35:02,900 --> 00:35:04,600
So, for example, Massachusetts, right.

378
00:35:04,600 --> 00:35:07,000
And in fact, I just saw this today.

379
00:35:07,000 --> 00:35:11,600
So, right, if all the other labs in Massachusetts, you all want to be on the same page

380
00:35:11,600 --> 00:35:19,900
because just today I saw that ARM Labs in Colorado has decided to close.

381
00:35:19,900 --> 00:35:24,500
And the reason they stated was they simply couldn't stay competitive

382
00:35:24,500 --> 00:35:30,000
because they think other labs in the state aren't following the rules

383
00:35:30,000 --> 00:35:35,400
and they were doing their best to do testing by the book and they just couldn't stay competitive.

384
00:35:35,400 --> 00:35:39,500
And so, Isaac, you mentioned at the beginning, like, you know, you're trying to solve this problem.

385
00:35:39,500 --> 00:35:42,200
And, you know, I applaud you for doing that.

386
00:35:42,200 --> 00:35:43,700
And it matters.

387
00:35:43,700 --> 00:35:50,300
I do think, you know, MCR Labs can be profitable in the long term just doing nice, ethical testing.

388
00:35:50,300 --> 00:35:55,400
It's just, how do you get through all this short-term noise with, you know,

389
00:35:55,400 --> 00:35:59,500
maybe someone sets up shop and they're cutting corners.

390
00:35:59,500 --> 00:36:02,400
So how do you survive the short term?

391
00:36:02,400 --> 00:36:05,900
So I love that you're thinking about this and working on it.

392
00:36:05,900 --> 00:36:09,900
Any more thoughts, while I think about whether I have any more final thoughts, real quick?

393
00:36:09,900 --> 00:36:16,700
Isaac, did you happen to bring up the question that was asked on Friday with what model would best,

394
00:36:16,700 --> 00:36:22,800
what can be used for identifying anomalies within these kind of data sets?

395
00:36:22,800 --> 00:36:25,800
No, I haven't asked it.

396
00:36:25,800 --> 00:36:32,400
But yeah, I mean, I'd very much appreciate it if the group has any ideas.

397
00:36:32,400 --> 00:36:36,200
Essentially, we're seeing in Massachusetts data,

398
00:36:36,200 --> 00:36:45,300
there is likely a gap of flowers with around 18 to 19 percent THC concentration.

399
00:36:45,300 --> 00:36:50,500
And there's a spike of flowers around 20 percent THC concentration.

400
00:36:50,500 --> 00:37:00,300
As I'm sure we all know, there's a threshold, and flowers above that sell much better.

401
00:37:00,300 --> 00:37:06,700
And I'm just wondering if you're aware of any statistical method

402
00:37:06,700 --> 00:37:13,700
that is a good measure of this kind of gap?
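(One possible way to quantify the gap described here, offered as a rough sketch and not as the group's method: count results just below and just above the threshold and ask how surprising the imbalance is under a one-sided binomial test. The 20 percent threshold comes from the discussion; the 1-point window, the function name `bunching_p_value`, and the sample values are all assumptions made up for illustration.)
```python
from math import comb
def bunching_p_value(thc_values, threshold=20.0, width=1.0):
    """One-sided binomial test for excess results just above a threshold."""
    below = sum(threshold - width <= v < threshold for v in thc_values)
    above = sum(threshold <= v < threshold + width for v in thc_values)
    n = below + above
    # Assumption: if the distribution is locally smooth, a result near the
    # threshold is about equally likely to land on either side (p = 0.5).
    p_value = sum(comb(n, k) for k in range(above, n + 1)) / 2 ** n
    return below, above, p_value
# Hypothetical THC percentages, invented for this example only.
sample = [17.2, 18.1, 19.4, 20.1, 20.2, 20.3, 20.4, 20.6, 20.8, 21.5, 22.0]
below, above, p_value = bunching_p_value(sample)
print(below, above, p_value)
```
(The equal-odds assumption is only approximate when the true distribution slopes steeply near the threshold, so a small p-value is a prompt for a closer look rather than proof of anything.)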

403
00:37:13,700 --> 00:37:16,200
This is a phenomenal question.

404
00:37:16,200 --> 00:37:20,700
And perhaps one we'll have to chew on more for next week.

405
00:37:20,700 --> 00:37:24,900
I'll just go ahead and give you my hot take right now

406
00:37:24,900 --> 00:37:27,900
and then think about this more thoroughly over the next week

407
00:37:27,900 --> 00:37:30,300
because this seems like this is a pressing issue.

408
00:37:30,300 --> 00:37:35,600
My hot take is, right, I think you were already on a fruitful approach with Benford's law.

409
00:37:35,600 --> 00:37:41,400
I would like to just reiterate what I was saying at the beginning,

410
00:37:41,400 --> 00:37:46,500
where I think, ultimately, to predict fraudulent values for certain,

411
00:37:46,500 --> 00:37:50,700
you'll need ones that you know are fraudulent, right?

412
00:37:50,700 --> 00:37:55,200
Because, well, maybe you won't need them for certain, I guess.

413
00:37:55,200 --> 00:37:58,500
But I think that would be helpful is,

414
00:37:58,500 --> 00:38:05,300
okay, what do fraudulent lab results look like compared to non-fraudulent?

415
00:38:05,300 --> 00:38:09,700
And as I was saying, the data set I think that you can get a hold of

416
00:38:09,700 --> 00:38:12,300
are the practice laboratory results

417
00:38:12,300 --> 00:38:17,300
because you know for certain which ones were falsified

418
00:38:17,300 --> 00:38:24,400
and you can compare those to ones that were probably not falsified.

419
00:38:24,400 --> 00:38:27,500
So you can see if there's any systematic difference there.

420
00:38:27,500 --> 00:38:29,600
And that's where Benford's law could kick in.

421
00:38:29,600 --> 00:38:34,600
So potentially there are digit differences with the falsified ones.
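(A minimal sketch of the digit comparison mentioned here, assuming a plain first-digit Benford check scored with a chi-square statistic. The helper names `first_digit` and `benford_chi_square` are hypothetical. Benford's law is usually applied to data spanning several orders of magnitude, so values in a narrow range, such as THC percentages, may deviate from it for innocent reasons; comparing known-falsified against presumed-genuine results, as suggested above, would be the fairer use.)
```python
from collections import Counter
from math import log10
def first_digit(x):
    """Return the leading nonzero digit of a nonzero number."""
    return int(f"{abs(x):.6e}"[0])  # scientific notation, e.g. '2.010000e+01'
def benford_chi_square(values):
    """Chi-square statistic of observed first digits against Benford's law."""
    digits = [first_digit(v) for v in values if v != 0]
    counts = Counter(digits)
    n = len(digits)
    stat = 0.0
    for d in range(1, 10):
        expected = n * log10(1 + 1 / d)  # Benford's expected count for digit d
        stat += (counts.get(d, 0) - expected) ** 2 / expected
    return stat
# Hypothetical results that all start with the digit 2, for illustration only.
stat = benford_chi_square([20.1] * 9)
print(round(stat, 1))
```
(With 8 degrees of freedom, a statistic above roughly 15.5 corresponds to the conventional 5 percent level, so the made-up sample here is far from Benford-like, as expected.)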

422
00:38:34,600 --> 00:38:39,000
And then finally, this was something that we had hit on in the past

423
00:38:39,000 --> 00:38:42,800
because Rick, once again, getting back to genetics,

424
00:38:42,800 --> 00:38:46,900
we were actually using some sort of,

425
00:38:46,900 --> 00:38:48,200
I'm not going to get the name right,

426
00:38:48,200 --> 00:38:52,100
but some sort of multivariate difference in means test.

427
00:38:52,100 --> 00:38:55,600
It was the last episode of statistics we did.

428
00:38:55,600 --> 00:38:57,400
But basically we were trying to say,

429
00:38:57,400 --> 00:39:01,000
okay, there's all these plant patents out there

430
00:39:01,000 --> 00:39:03,700
and the way people are patenting their plants

431
00:39:03,700 --> 00:39:07,700
or one of the ways is by chemical composition.

432
00:39:07,700 --> 00:39:09,800
And so we were saying, okay, you know,

433
00:39:09,800 --> 00:39:12,800
could you actually do a difference in means test?

434
00:39:12,800 --> 00:39:18,400
To basically say, okay, this variety, we've tested it 30 times.

435
00:39:18,400 --> 00:39:21,400
We've tested this variety 30 times.

436
00:39:21,400 --> 00:39:24,200
This one is significantly,

437
00:39:24,200 --> 00:39:28,600
this one's chemical composition is significantly different

438
00:39:28,600 --> 00:39:30,800
than this chemical composition.

439
00:39:30,800 --> 00:39:35,300
And that was how we were recommending people go about patenting their plants

440
00:39:35,300 --> 00:39:38,800
where you would just say, okay, we've tested this plant.

441
00:39:38,800 --> 00:39:42,400
It's statistically different than all the existing patents

442
00:39:42,400 --> 00:39:44,900
and it would make a good patent candidate.

443
00:39:44,900 --> 00:39:51,400
You could potentially do a similar sort of difference in means test with samples.

444
00:39:51,400 --> 00:39:53,800
But I think you'd have to get,

445
00:39:53,800 --> 00:39:58,400
think outside of the box about what means you'd be wanting to compare.
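(A simplified, univariate sketch of a difference-in-means test, assuming Welch's t statistic on a single measurement. The transcript does not say which multivariate test was used earlier; Hotelling's T-squared is one multivariate analogue. The function name `welch_t` and the two sample lists are hypothetical.)
```python
from math import sqrt
from statistics import mean, variance
def welch_t(a, b):
    """Welch's t statistic and approximate degrees of freedom for two samples."""
    va = variance(a) / len(a)  # squared standard error of mean(a)
    vb = variance(b) / len(b)  # squared standard error of mean(b)
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df
# Hypothetical THC percentages for two groups of samples, invented for illustration.
group_a = [18.0, 19.0, 20.0, 21.0, 22.0]
group_b = [19.0, 20.0, 21.0, 22.0, 23.0]
t, df = welch_t(group_a, group_b)
print(t, df)
```
(The same function could be pointed at a ratio instead of a raw percentage, which is the direction suggested in the next few lines.)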

446
00:39:58,400 --> 00:40:01,800
And so this is why I floated last week

447
00:40:01,800 --> 00:40:04,700
that you may want to start looking at ratios.

448
00:40:04,700 --> 00:40:08,000
So for example, we looked at it briefly that, okay,

449
00:40:08,000 --> 00:40:13,800
the ratio between total cannabinoids and total terpenes,

450
00:40:13,800 --> 00:40:15,400
I think we looked at this briefly,

451
00:40:15,400 --> 00:40:19,600
but I wanted to say there was a linear relationship between these two,

452
00:40:19,600 --> 00:40:21,800
which was just kind of interesting.

453
00:40:21,800 --> 00:40:25,400
And like I said, we have only briefly explored this.

454
00:40:25,400 --> 00:40:31,100
You know, maybe there's a sample that basically has a weird total cannabinoid

455
00:40:31,100 --> 00:40:33,800
to total THC ratio.

456
00:40:33,800 --> 00:40:39,700
And maybe it also has a weird beta-pinene to d-limonene ratio.

457
00:40:39,700 --> 00:40:43,900
Maybe you'd have, and I think this is kind of what our friend,

458
00:40:43,900 --> 00:40:46,800
John Abrams over at the CESC uses.

459
00:40:46,800 --> 00:40:52,900
He uses a set of ratios to kind of see if lab results look logical.
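(A rough sketch of a ratio check in this spirit, and not a description of anyone's actual method: compute one ratio per sample and flag values far from the median using a robust z-score based on the median absolute deviation. The function name `flag_odd_ratios`, the field names, the 3.5 cutoff, and the sample data are assumptions for illustration.)
```python
from statistics import median
def flag_odd_ratios(samples, numerator, denominator, cutoff=3.5):
    """Flag samples whose ratio is far from the median (assumes nonzero denominators)."""
    ratios = [s[numerator] / s[denominator] for s in samples]
    center = median(ratios)
    mad = median([abs(r - center) for r in ratios])
    flagged = []
    for s, r in zip(samples, ratios):
        # 0.6745 rescales the MAD so the score is comparable to a z-score
        # when the data are roughly normal.
        score = 0.6745 * (r - center) / mad if mad else 0.0
        if abs(score) > cutoff:
            flagged.append((s["id"], round(r, 2)))
    return flagged
# Hypothetical lab results, invented for this example only.
samples = [
    {"id": "A", "total_cannabinoids": 20.0, "total_terpenes": 2.0},
    {"id": "B", "total_cannabinoids": 22.0, "total_terpenes": 2.0},
    {"id": "C", "total_cannabinoids": 18.0, "total_terpenes": 2.0},
    {"id": "D", "total_cannabinoids": 21.0, "total_terpenes": 2.0},
    {"id": "E", "total_cannabinoids": 19.0, "total_terpenes": 2.0},
    {"id": "F", "total_cannabinoids": 30.0, "total_terpenes": 0.5},
]
flagged = flag_odd_ratios(samples, "total_cannabinoids", "total_terpenes")
print(flagged)
```
(A flagged sample is only a candidate for a closer look; an unusual ratio can have ordinary causes, such as the plant health issues raised later in the discussion.)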

460
00:40:52,900 --> 00:40:53,700
Ooh, Rick.

461
00:40:53,700 --> 00:40:55,200
Yeah, sorry, I didn't want to cut you off,

462
00:40:55,200 --> 00:40:59,800
but I wanted to agree with what you were saying in terms of like the lab sampling

463
00:40:59,800 --> 00:41:04,500
and having certain profiles or, you know, like the MIMP profile

464
00:41:04,500 --> 00:41:07,800
and attributing that to a specific terpene

465
00:41:07,800 --> 00:41:14,200
and then that to a specific strain or genetic and be able to track that.

466
00:41:14,200 --> 00:41:17,700
Also, with things like hop latent viroid,

467
00:41:17,700 --> 00:41:21,800
it's important now, also for clean cuts

468
00:41:21,800 --> 00:41:26,800
or things coming out of tissue culture labs to establish a chain of custody.

469
00:41:26,800 --> 00:41:30,200
So having a way to do that as well is all interesting.

470
00:41:30,200 --> 00:41:34,700
But right now, none of that data is released for, like, the hop latent viroid testing.

471
00:41:34,700 --> 00:41:38,600
I noticed a bunch of startup testing labs

472
00:41:38,600 --> 00:41:41,500
where they can send you at home or they'll have people come on site.

473
00:41:41,500 --> 00:41:46,200
There seems to be kind of a lot of data streams for where that's going.

474
00:41:46,200 --> 00:41:51,700
And so right now it's not readily available or in a database that I'm aware of.

475
00:41:51,700 --> 00:41:54,800
But that's the sort of stuff that interests me.

476
00:41:54,800 --> 00:41:58,700
And then also, you know, if you're working with, you know,

477
00:41:58,700 --> 00:42:02,700
trusted data in terms of testing of those genetics as well,

478
00:42:02,700 --> 00:42:07,600
collecting information from the facilities such as light spectrum,

479
00:42:07,600 --> 00:42:12,700
medium that they use, different types of fertilizers or whatever

480
00:42:12,700 --> 00:42:17,500
and how that may express down the line with different cannabinoid profiles

481
00:42:17,500 --> 00:42:22,300
or, you know, understanding that relation as well or terpene profiles.

482
00:42:22,300 --> 00:42:27,600
I love it, Rick. You're right at the frontier, and you brought up many good points.

483
00:42:27,600 --> 00:42:31,800
And I think what you hit on is what's the starting point?

484
00:42:31,800 --> 00:42:37,100
Get the data, organize it, curate it, and then of course, get around to analyzing it.

485
00:42:37,100 --> 00:42:38,900
You're right on the path, right?

486
00:42:38,900 --> 00:42:42,000
And that is right now there is poor data.

487
00:42:42,000 --> 00:42:44,400
So first things first, get it.

488
00:42:44,400 --> 00:42:49,500
And so I love that. And then you raise issues like, oh, hop latent viroid.

489
00:42:49,500 --> 00:42:55,800
And this is why I was saying, well, there's a million confounding effects going on, right?

490
00:42:55,800 --> 00:43:01,200
And so, for example, oh, maybe you're looking at total cannabinoids to terpenes,

491
00:43:01,200 --> 00:43:04,500
and that's a really good ratio. And one sample's off.

492
00:43:04,500 --> 00:43:08,300
Well, maybe we're not taking into consideration hop latent viroid.

493
00:43:08,300 --> 00:43:13,000
Maybe the viroid interferes with these standard ratios.

494
00:43:13,000 --> 00:43:17,900
So maybe you think, oh, maybe somebody's falsifying results for this one sample.

495
00:43:17,900 --> 00:43:23,100
But, oh, you go take a closer look and it's like, oh, maybe these plants had trace amounts of...

496
00:43:23,100 --> 00:43:25,900
Or maybe these plants were slightly infected with the virus

497
00:43:25,900 --> 00:43:32,000
and they just weren't producing terpenes normally or something of that sort.

498
00:43:32,000 --> 00:43:34,000
Once again, it's all investigation.

499
00:43:34,000 --> 00:43:37,600
That's part of what's interesting about being a data scientist, right?

500
00:43:37,600 --> 00:43:39,400
We wear many hats, right?

501
00:43:39,400 --> 00:43:42,600
As a good researcher, we have to kind of investigate these.

502
00:43:42,600 --> 00:43:45,200
You know, don't jump to conclusions.

503
00:43:45,200 --> 00:43:48,500
But I think we've got some fruitful things to work on.

504
00:43:48,500 --> 00:43:51,900
So, Isaac, I love that you've pointed this problem out,

505
00:43:51,900 --> 00:43:54,900
because as I said, you know, we're all about solutions here.

506
00:43:54,900 --> 00:43:57,700
So let's, you know, put our minds together

507
00:43:57,700 --> 00:44:02,000
and see if we can't think about ways to help the laboratories out.

508
00:44:02,000 --> 00:44:03,700
I think we can use statistics to help.

509
00:44:03,700 --> 00:44:07,000
Well, I think we've got plenty to work on.

510
00:44:07,000 --> 00:44:13,700
So for Rick, we're going to be looking at strain names just to try to find crosses,

511
00:44:13,700 --> 00:44:17,900
because lineage is something that I've been curious about looking at.

512
00:44:17,900 --> 00:44:21,700
Just basically seeing, oh, can we trace all the hazes back?

513
00:44:21,700 --> 00:44:26,000
I'm like working on scraping data from a few different locations.

514
00:44:26,000 --> 00:44:35,100
There's a few sites out there that keep pretty detailed tracking of like a strain and its lineage.

515
00:44:35,100 --> 00:44:39,900
And there's also Reddit, where there's individual subreddits with posts

516
00:44:39,900 --> 00:44:44,500
that you can pull strain names and other stuff out of.

517
00:44:44,500 --> 00:44:51,900
So I'll try to contribute to the GitHub and make sure that anything that I collect will be available there also.

518
00:44:51,900 --> 00:44:52,900
Phenomenal, Rick.

519
00:44:52,900 --> 00:44:57,800
Well, try to come back next week, because genetics and lineage are on the agenda.

520
00:44:57,800 --> 00:45:02,100
Somehow that got pushed back from today, but definitely next week.

521
00:45:02,100 --> 00:45:06,400
And then we'll also end up looking at consumers in the next week,

522
00:45:06,400 --> 00:45:14,100
because last week we just did a back of the envelope estimate of how many consumers there were in Washington state.

523
00:45:14,100 --> 00:45:19,300
But we can do a much better job of that, given some data that's out there.

524
00:45:19,300 --> 00:45:23,100
So I was thinking, and we can even do some consumer analysis.

525
00:45:23,100 --> 00:45:24,600
So those are the things that are coming up.

526
00:45:24,600 --> 00:45:31,700
We'll get to the bottom of this lab shopping to help Isaac out, help good old MCR Labs out.

527
00:45:31,700 --> 00:45:35,600
So we'll help out as much as we can.

528
00:45:35,600 --> 00:45:39,300
Then, of course, genetics and lineage and cannabis consumers.

529
00:45:39,300 --> 00:45:41,900
So those are the topics coming up in the coming weeks.

530
00:45:41,900 --> 00:45:43,700
It's going to be a fun time.

531
00:45:43,700 --> 00:45:46,500
So we'll end the year out strong.

532
00:45:46,500 --> 00:45:51,100
Just like to thank you all for helping advance cannabis science.

533
00:45:51,100 --> 00:45:55,600
Couldn't do it without you, your eyes, your ears, your brilliant minds.

534
00:45:55,600 --> 00:46:01,800
So thank you all for coming together to the Cannlytics Cannabis Data Science Meetup.

535
00:46:01,800 --> 00:46:07,600
So thank you.