1
00:00:00,000 --> 00:00:11,560
All right.

2
00:00:11,560 --> 00:00:14,080
Well, it's good to see everybody today.

3
00:00:14,080 --> 00:00:18,720
So welcome to the Cannabis Data Science Meetup group.

4
00:00:18,720 --> 00:00:19,720
My name is Keegan.

5
00:00:19,720 --> 00:00:22,120
I started a company, Cannlytics.

6
00:00:22,120 --> 00:00:28,440
And so I've got a background working in a laboratory and have an economics degree, spent

7
00:00:28,440 --> 00:00:30,480
a bit of time as a data scientist.

8
00:00:30,480 --> 00:00:36,480
So trying to apply that to the cannabis field and try to share what I know and just help

9
00:00:36,480 --> 00:00:40,720
everybody else do cannabis analytics simply and easily.

10
00:00:40,720 --> 00:00:45,780
So just real quick, I guess we can go ahead and go around and just introduce ourselves

11
00:00:45,780 --> 00:00:46,780
real quick.

12
00:00:46,780 --> 00:00:49,920
So if you don't mind just taking 30 seconds real quick.

13
00:00:49,920 --> 00:00:52,080
So I'll just start in the top corner.

14
00:00:52,080 --> 00:00:55,320
Charles, would you mind introducing yourself real quick?

15
00:00:55,320 --> 00:00:57,600
Hi, I'm Charles.

16
00:00:57,600 --> 00:01:04,680
I have 27 years of software development experience and I'm transitioning now into like data science

17
00:01:04,680 --> 00:01:07,680
and machine learning.

18
00:01:07,680 --> 00:01:09,680
Paul.

19
00:01:09,680 --> 00:01:10,680
Sure.

20
00:01:10,680 --> 00:01:13,200
Yeah, my name is Paul.

21
00:01:13,200 --> 00:01:18,840
I'm a data scientist with General Motors.

22
00:01:18,840 --> 00:01:24,920
Been doing analytics work prior for about the last 10 to 12 years.

23
00:01:24,920 --> 00:01:29,800
I'm currently enrolled in a master's program at the University of Wisconsin, a master's

24
00:01:29,800 --> 00:01:31,720
degree in data science.

25
00:01:31,720 --> 00:01:34,600
Currently working on my final project for that program.

26
00:01:34,600 --> 00:01:38,800
And Charles and Keegan have been helping me out with my project.

27
00:01:38,800 --> 00:01:41,800
So welcome everybody.

28
00:01:41,800 --> 00:01:43,600
Awesome.

29
00:01:43,600 --> 00:01:48,400
Megan, it's good to see you here today.

30
00:01:48,400 --> 00:01:50,400
Would you mind introducing yourself?

31
00:01:50,400 --> 00:01:51,680
Hello.

32
00:01:51,680 --> 00:01:53,320
Nice to meet everybody.

33
00:01:53,320 --> 00:01:54,320
I'm Meg.

34
00:01:54,320 --> 00:01:58,720
I've been in data analytics for almost two years.

35
00:01:58,720 --> 00:02:03,600
I was just laid off from Cerner, but I was a business data analyst there.

36
00:02:03,600 --> 00:02:04,600
Awesome.

37
00:02:04,600 --> 00:02:07,800
Well, welcome to the group.

38
00:02:07,800 --> 00:02:11,360
It's always awesome to have more data scientists aboard.

39
00:02:11,360 --> 00:02:16,920
Arunabh, could you introduce yourself real quick if you don't mind?

40
00:02:16,920 --> 00:02:17,920
Sure, sure.

41
00:02:17,920 --> 00:02:21,360
Firstly, glad to meet everyone for the first time.

42
00:02:21,360 --> 00:02:22,800
And yeah, my name is Arunabh.

43
00:02:22,800 --> 00:02:28,800
Basically I'm a data science instructor at Lighthouse Labs, also looking to transition

44
00:02:28,800 --> 00:02:32,240
in the industry for any interesting opportunities.

45
00:02:32,240 --> 00:02:33,720
But yeah, that's what I do.

46
00:02:33,720 --> 00:02:36,920
Basically I've been in the field for four or five years now.

47
00:02:36,920 --> 00:02:37,920
Awesome.

48
00:02:37,920 --> 00:02:41,160
Well, it's great to have you.

49
00:02:41,160 --> 00:02:42,160
Ari.

50
00:02:42,160 --> 00:02:45,400
Hi, I'm Ari.

51
00:02:45,400 --> 00:02:50,400
I've been doing some web development and digital marketing, but recently took a data science

52
00:02:50,400 --> 00:02:56,360
bootcamp with General Assembly and looking to make a transition into data science.

53
00:02:56,360 --> 00:02:57,360
Excellent.

54
00:02:57,360 --> 00:03:01,120
The more data scientists, the merrier.

55
00:03:01,120 --> 00:03:05,760
So Gabriel, if you don't mind.

56
00:03:05,760 --> 00:03:06,760
Hi.

57
00:03:06,760 --> 00:03:07,760
Oops.

58
00:03:07,760 --> 00:03:10,320
Sorry, had my webcam covered.

59
00:03:10,320 --> 00:03:12,760
I'm a data analytics student.

60
00:03:12,760 --> 00:03:16,920
I'm doing a bootcamp through Thinkful, so I'm pretty new to all this, but I'm open to

61
00:03:16,920 --> 00:03:18,920
learning as much as I can.

62
00:03:18,920 --> 00:03:21,680
Well, you're in the right place.

63
00:03:21,680 --> 00:03:24,220
So we're all learning here, including myself.

64
00:03:24,220 --> 00:03:26,920
So I've got some to learn today myself.

65
00:03:26,920 --> 00:03:29,800
So we've got Heather.

66
00:03:29,800 --> 00:03:32,160
It's up to you, Heather, but if you want to introduce yourself, you're more than welcome

67
00:03:32,160 --> 00:03:33,160
to.

68
00:03:33,160 --> 00:03:34,160
Hey, everybody.

69
00:03:34,160 --> 00:03:35,160
I'm Heather.

70
00:03:35,160 --> 00:03:39,280
I've been attending these meetings for a couple of weeks now.

71
00:03:39,280 --> 00:03:46,880
I don't have data science, but, or excuse me, data, but I hope to get a job soon where

72
00:03:46,880 --> 00:03:50,560
I'm actually collecting cannabis data and being able to analyze it.

73
00:03:50,560 --> 00:03:53,200
That would be more than exciting.

74
00:03:53,200 --> 00:04:00,640
So this meetup keeps me sane and looking forward to meeting you all.

75
00:04:00,640 --> 00:04:06,880
And how's everybody's 710?

76
00:04:06,880 --> 00:04:07,880
Productive and busy.

77
00:04:07,880 --> 00:04:10,360
So it's awesome to have you, Heather.

78
00:04:10,360 --> 00:04:11,360
Yeah.

79
00:04:11,360 --> 00:04:12,360
Yeah.

80
00:04:12,360 --> 00:04:13,720
The sales on 710 are way better than the 420.

81
00:04:13,720 --> 00:04:14,720
Just letting you all know.

82
00:04:14,720 --> 00:04:15,880
Oh, that's right.

83
00:04:15,880 --> 00:04:16,880
It's oil.

84
00:04:16,880 --> 00:04:18,400
It's spelled backwards.

85
00:04:18,400 --> 00:04:19,400
Yes.

86
00:04:19,400 --> 00:04:21,600
So all the RSOs are like missing from the shelf.

87
00:04:21,600 --> 00:04:25,400
People just like take the whole one milliliter and dump it onto like some vanilla wafers.

88
00:04:25,400 --> 00:04:27,960
So you might not see RSOs for a while.

89
00:04:27,960 --> 00:04:30,960
They have to rejuvenate.

90
00:04:30,960 --> 00:04:33,960
Jesus.

91
00:04:33,960 --> 00:04:37,560
Interesting opportunity for some analytics there.

92
00:04:37,560 --> 00:04:38,840
Yeah, I like to do.

93
00:04:38,840 --> 00:04:42,880
So I like tearing into it if I can.

94
00:04:42,880 --> 00:04:47,960
Well, we may have to try to find some daily data and see if we can find a spike there.

95
00:04:47,960 --> 00:04:49,960
So awesome.

96
00:04:49,960 --> 00:04:57,440
Erin, last but not least, would you mind introducing yourself real quick?

97
00:04:57,440 --> 00:04:58,440
Getting a little bit of static.

98
00:04:58,440 --> 00:04:59,440
That's not me.

99
00:04:59,440 --> 00:05:19,920
No, no, no.

100
00:05:19,920 --> 00:05:20,920
Keegan you're muted.

101
00:05:20,920 --> 00:05:28,600
Oh, I was saying, sorry Erin, I couldn't hear you and had to mute you for a second.

102
00:05:28,600 --> 00:05:33,360
But I'll try to unmute.

103
00:05:33,360 --> 00:05:38,120
Oh, here we go.

104
00:05:38,120 --> 00:05:40,640
Yeah, I'm sorry.

105
00:05:40,640 --> 00:05:41,640
Just getting static.

106
00:05:41,640 --> 00:05:47,040
So we'll have to get your introduction here in a second unless you want to drop

107
00:05:47,040 --> 00:05:53,600
a quick introduction for yourself in the chat and we'll revisit that.

108
00:05:53,600 --> 00:05:57,400
So it's awesome to have everyone here.

109
00:05:57,400 --> 00:05:59,680
So just to go ahead and kick it off.

110
00:05:59,680 --> 00:06:07,300
So each week we sort of look at an interesting data analytics question with the best cannabis

111
00:06:07,300 --> 00:06:09,500
data that we can find.

112
00:06:09,500 --> 00:06:14,580
So we've been primarily looking at lab results.

113
00:06:14,580 --> 00:06:19,240
So we have a data set in Washington state that Charles has been looking at trying to

114
00:06:19,240 --> 00:06:25,880
predict if we can, trying to see if we can predict when a sample may fail quality assurance

115
00:06:25,880 --> 00:06:36,100
testing for say microbial contamination, high residual solvents or potentially foreign matter.

116
00:06:36,100 --> 00:06:45,880
So it's tricky because most samples pass quality assurance screening and only a small number

117
00:06:45,880 --> 00:06:46,880
fail.

118
00:06:46,880 --> 00:06:56,720
So with limited variation it can lead to, it can be hard to fit a good model.

119
00:06:56,720 --> 00:07:06,720
So changing gears slightly, we found an interesting hemp, which is still cannabis, hemp data set

120
00:07:06,720 --> 00:07:14,720
from, let's see, I'm forgetting here the authors.

121
00:07:14,720 --> 00:07:23,140
So it's the Midwestern Hemp Database from the University of Illinois.

122
00:07:23,140 --> 00:07:27,640
So I'll go ahead and share my screen real quick and just go ahead and get everybody

123
00:07:27,640 --> 00:07:32,360
up to speed here.

124
00:07:32,360 --> 00:07:35,160
Just to introduce Aaron real quick.

125
00:07:35,160 --> 00:07:40,760
He's a front end developer, also interested in cannabis data analysis.

126
00:07:40,760 --> 00:07:44,920
And so you're in the right spot as well here.

127
00:07:44,920 --> 00:07:50,480
So I'll share my screen real quick.

128
00:07:50,480 --> 00:08:03,840
So near the end we'll look at, I'll save some time so that way Charles can go over

129
00:08:03,840 --> 00:08:08,560
the work he's done because Charles has done some great work here predicting failures in

130
00:08:08,560 --> 00:08:09,920
Washington state.

131
00:08:09,920 --> 00:08:15,320
So I'll try to leave about 10 minutes or so at the end for Charles to present.

132
00:08:15,320 --> 00:08:20,520
So at least, so we'll try not to get too bogged down in this.

133
00:08:20,520 --> 00:08:25,600
So long story short, we found the Midwestern Hemp Database.

134
00:08:25,600 --> 00:08:40,400
And so, Phillip Alberti is putting together data on hemp strains grown in Illinois, Wisconsin,

135
00:08:40,400 --> 00:08:43,680
Michigan, and Indiana.

136
00:08:43,680 --> 00:08:45,960
I think you can participate in other states.

137
00:08:45,960 --> 00:08:52,000
However, these are the cultivators that are currently participating.

138
00:08:52,000 --> 00:09:00,600
And so we've collected this data last week and have begun analyzing it.

139
00:09:00,600 --> 00:09:06,360
So what are some of the interesting data points we have here?

140
00:09:06,360 --> 00:09:15,120
Well, we have essentially the cultivar, which is the strain of cannabis grown.

141
00:09:15,120 --> 00:09:17,480
The source, we need to pin this down.

142
00:09:17,480 --> 00:09:20,560
It's either the company or the seed company.

143
00:09:20,560 --> 00:09:24,640
I believe it's actually the cultivator.

144
00:09:24,640 --> 00:09:28,520
And then we're interested in a couple variables here.

145
00:09:28,520 --> 00:09:35,520
So we're of course interested in what state it was grown in because we had an interesting

146
00:09:35,520 --> 00:09:45,000
observation last week that the soil breakdown of various states could affect the quality

147
00:09:45,000 --> 00:09:48,220
of hemp grown.

148
00:09:48,220 --> 00:09:54,800
And then we want to also look at the harvest date, if possible.

149
00:09:54,800 --> 00:10:01,600
So this is information about the cultivars, the strains.

150
00:10:01,600 --> 00:10:11,320
And so we ideally want to combine this with data they've provided on lab testing.

151
00:10:11,320 --> 00:10:16,480
So they've sent in each strain to get tested.

152
00:10:16,480 --> 00:10:24,840
And so they received results for CBD, THC, and CBG.

153
00:10:24,840 --> 00:10:27,120
Now I'm going to show you a graph below.

154
00:10:27,120 --> 00:10:39,720
So keep in mind the federally permitted level of THC in hemp is 0.3%.

155
00:10:39,720 --> 00:10:49,680
So if you accidentally grow hemp that's higher than 0.3%, then you need to essentially destroy

156
00:10:49,680 --> 00:10:51,360
your harvest.

157
00:10:51,360 --> 00:10:59,120
So we were talking last week about how it's a tricky dance that cultivators have to dance

158
00:10:59,120 --> 00:11:06,840
where they're trying to maximize the CBD in their products.

159
00:11:06,840 --> 00:11:16,720
However, as we can essentially visually see, there appears to be a positive correlation

160
00:11:16,720 --> 00:11:20,440
between CBD and THC.

161
00:11:20,440 --> 00:11:30,600
They're all cannabinoids deriving from CBG, CBGA to be precise.

162
00:11:30,600 --> 00:11:39,480
So as people try to maximize CBD in their hemp harvests, they run the risk of failing.

163
00:11:39,480 --> 00:11:45,000
So it comes down to the cultivator's risk preferences.

164
00:11:45,000 --> 00:11:54,960
However, there's calculus to be done to essentially try to maximize your CBD while keeping your

165
00:11:54,960 --> 00:12:00,760
probability of failing at a tolerable rate.

166
00:12:00,760 --> 00:12:09,160
So we're going to begin analysis here and see if we can't lend a hand and see if we

167
00:12:09,160 --> 00:12:14,440
can make any sense out of the data provided.

168
00:12:14,440 --> 00:12:29,320
So I have gone ahead and made this analysis available on GitHub.

169
00:12:29,320 --> 00:12:36,920
And I'll go ahead and push the latest commit so that way you can get your hands on the

170
00:12:36,920 --> 00:12:38,080
data.

171
00:12:38,080 --> 00:12:48,760
So essentially, I have copied and pasted this data into just an Excel spreadsheet just so

172
00:12:48,760 --> 00:12:52,000
that we can start to actually read the data.

173
00:12:52,000 --> 00:12:58,680
And so if you have a better way than copying and pasting, then have at it.

174
00:12:58,680 --> 00:13:04,720
However, that just was what I resorted to for the time being.

175
00:13:04,720 --> 00:13:18,360
So we've got the data.

176
00:13:18,360 --> 00:13:23,120
Then the analysis we're doing today will be available for you.

177
00:13:23,120 --> 00:13:27,640
And just to show you where that's located.

178
00:13:27,640 --> 00:13:38,280
So I'm posting all of the cannabis data science code to a GitHub repository.

179
00:13:38,280 --> 00:13:55,760
That way you can follow along, get the code and follow.

180
00:13:55,760 --> 00:14:05,560
So I must have accidentally saved this in last week's folder.

181
00:14:05,560 --> 00:14:07,320
So I'll fix this after the presentation.

182
00:14:07,320 --> 00:14:16,360
So if you are following along, the script is in the 07.07 folder for the time being.

183
00:14:16,360 --> 00:14:19,960
So long story short, we've got the data.

184
00:14:19,960 --> 00:14:24,680
So the first thing we need to do is look at the data.

185
00:14:24,680 --> 00:14:34,120
So well, simply I'm using Python, but you can use your programming language of choice.

186
00:14:34,120 --> 00:14:45,160
And so just going to...

187
00:14:45,160 --> 00:14:49,760
First things first, we want to just look at the data here.

188
00:14:49,760 --> 00:14:58,480
So I always just look at the first five observations, just as sort of a sanity check to see, okay,

189
00:14:58,480 --> 00:15:00,440
this is the data we have.

190
00:15:00,440 --> 00:15:08,240
And then if you're getting fancy, it's always a good idea just to generate some summary

191
00:15:08,240 --> 00:15:12,880
statistics of the data at hand.
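(A minimal sketch of this first look, not the script from the session. The rows and the column names `state`, `total_thc`, and `total_cbd` are made up for illustration; if the data were loaded in pandas, `DataFrame.head()` and `DataFrame.describe()` would give the same kind of view.)
```python
import statistics
# Hypothetical rows standing in for the hemp lab results; every value is made up.
samples = [
    {"state": "IL", "total_thc": 0.21, "total_cbd": 5.4},
    {"state": "WI", "total_thc": 0.12, "total_cbd": 3.1},
    {"state": "MI", "total_thc": 0.35, "total_cbd": 8.9},
    {"state": "IN", "total_thc": 0.28, "total_cbd": 7.2},
    {"state": "IL", "total_thc": 0.41, "total_cbd": 9.8},
    {"state": "WI", "total_thc": 0.09, "total_cbd": 2.5},
]
# Sanity check: print the first five observations.
for row in samples[:5]:
    print(row)
# Basic summary statistics for one column.
thc = [row["total_thc"] for row in samples]
summary = {
    "count": len(thc),
    "mean": statistics.mean(thc),
    "std": statistics.stdev(thc),
    "min": min(thc),
    "max": max(thc),
}
print(summary)
```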

192
00:15:12,880 --> 00:15:20,400
So we're working with about 750 hemp samples.

193
00:15:20,400 --> 00:15:33,120
So these were samples that were sent in by various cultivators and the strains.

194
00:15:33,120 --> 00:15:40,160
So what we can do is we can start adding some variables here.

195
00:15:40,160 --> 00:15:48,680
So the first thing I did was, okay, we can create a variable fail, which is just zero

196
00:15:48,680 --> 00:15:50,120
or one.

197
00:15:50,120 --> 00:16:00,920
And so it will be one if the hemp is higher than 0.3% concentration of THC and then zero

198
00:16:00,920 --> 00:16:06,720
otherwise.

199
00:16:06,720 --> 00:16:16,240
And let's look at that data.

200
00:16:16,240 --> 00:16:27,280
So we can see that about 25% of the samples are failing for high THC.

201
00:16:27,280 --> 00:16:30,680
So we can find that just by taking the mean.
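(A minimal sketch of the fail indicator described here, assuming a list of total THC percentages. The values and the name `THC_LIMIT` are hypothetical; only the 0.3% threshold comes from the talk.)
```python
THC_LIMIT = 0.3  # percent THC, the limit quoted in the talk
# Made-up total THC percentages, for illustration only.
total_thc = [0.12, 0.25, 0.31, 0.08, 0.45, 0.29, 0.30, 0.18]
# fail is 1 when THC is strictly above the limit, 0 otherwise.
fail = [1 if thc > THC_LIMIT else 0 for thc in total_thc]
# The mean of a 0/1 variable is the share of samples that fail.
failure_rate = sum(fail) / len(fail)
print(fail)
print(f"Failure rate: {failure_rate:.1%}")
```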

202
00:16:30,680 --> 00:16:37,000
And we were talking about in Washington state how about only we were finding a shockingly

203
00:16:37,000 --> 00:16:38,560
low number, maybe less than 1%.

204
00:16:38,560 --> 00:16:43,080
Correct me if I'm wrong, Charles, but it seemed that it was low.

205
00:16:43,080 --> 00:16:47,800
A low percent of samples failing for quality assurance reasons.

206
00:16:47,800 --> 00:16:55,720
When you have such low variance, you typically have to resort to special models.

207
00:16:55,720 --> 00:17:02,200
So there's a negative binomial model, but in general analysis can be tricky.

208
00:17:02,200 --> 00:17:08,560
So what's interesting about the hemp data is what I would consider a non-negligible

209
00:17:08,560 --> 00:17:14,480
portion of the sample is failing for high THC.

210
00:17:14,480 --> 00:17:17,920
And you have a high standard deviation.

211
00:17:17,920 --> 00:17:21,600
So you've got a lot of variation here.

212
00:17:21,600 --> 00:17:25,500
So it may provide for rich analytics.

213
00:17:25,500 --> 00:17:36,600
So we may be able to help these hemp producers out or labs that are testing hemp.

214
00:17:36,600 --> 00:17:45,720
So this is where we got to last week where I was essentially saying you can find quite

215
00:17:45,720 --> 00:17:52,820
interesting statistics by just taking conditional averages.

216
00:17:52,820 --> 00:17:56,640
So here I'm just taking an average.

217
00:17:56,640 --> 00:18:03,600
So here I'm just taking the mean for each state.

218
00:18:03,600 --> 00:18:09,840
So I'll go ahead and print that out just so that you can see.

219
00:18:09,840 --> 00:18:16,400
So if you do a conditional average, you can see the failure rate varies a little bit from

220
00:18:16,400 --> 00:18:18,120
state to state.
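(A minimal sketch of a conditional average, assuming each record is a state plus a 0/1 fail flag. The records are made up and do not reproduce the rates quoted in the talk; with pandas the equivalent would be a `groupby` on state followed by `mean`.)
```python
from collections import defaultdict
# Hypothetical (state, fail) pairs; every value is made up.
records = [
    ("IL", 1), ("IL", 0), ("IL", 0),
    ("WI", 0), ("WI", 0), ("WI", 0), ("WI", 1),
    ("MI", 1), ("MI", 0),
    ("IN", 0), ("IN", 0),
]
# Collect the fail flags for each state.
by_state = defaultdict(list)
for state, fail in records:
    by_state[state].append(fail)
# The conditional average of fail given state is that state's failure rate.
failure_rate_by_state = {
    state: sum(flags) / len(flags) for state, flags in by_state.items()
}
for state, rate in sorted(failure_rate_by_state.items()):
    print(state, f"{rate:.0%}")
```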

221
00:18:18,120 --> 00:18:27,440
So Illinois has the highest failure rate at above 30% failing for high THC.

222
00:18:27,440 --> 00:18:33,860
And then Wisconsin has the lowest failure rate for high THC at just over 20%.

223
00:18:33,860 --> 00:18:36,880
So there's slight variation state by state.

224
00:18:36,880 --> 00:18:42,640
However, it'd be interesting to see if there's anything to that.

225
00:18:42,640 --> 00:18:49,360
So for now, I've just taken the conditional average, and there's about a 10 percentage point range.

226
00:18:49,360 --> 00:18:52,680
So it may be worth exploring further.

227
00:18:52,680 --> 00:18:56,880
For example, soil.

228
00:18:56,880 --> 00:19:07,700
Next, I just wanted to think about some of the other data points that we have here.

229
00:19:07,700 --> 00:19:11,440
So if we just print out our columns.

230
00:19:11,440 --> 00:19:14,440
So we have the state.

231
00:19:14,440 --> 00:19:19,600
I think it would be interesting to look at the county to see if you could find county

232
00:19:19,600 --> 00:19:22,280
level soil data.

233
00:19:22,280 --> 00:19:25,480
So there's a lot of factors you could add.

234
00:19:25,480 --> 00:19:28,840
You could do precipitation.

235
00:19:28,840 --> 00:19:30,120
You could be creative.

236
00:19:30,120 --> 00:19:35,080
So I think there's room for analysis there.

237
00:19:35,080 --> 00:19:37,400
Geographic analysis can take a little bit of time.

238
00:19:37,400 --> 00:19:45,320
So just for expediency's sake, I began looking at essentially when the hemp was sampled.

239
00:19:45,320 --> 00:19:51,280
So we have the date when the hemp was sampled.

240
00:19:51,280 --> 00:19:57,840
So just looking at the data here, just the way the data was collected.

241
00:19:57,840 --> 00:20:01,160
And this is what you'll learn when you're doing data analytics.

242
00:20:01,160 --> 00:20:02,920
And a lot of you have already learned.

243
00:20:02,920 --> 00:20:05,960
You spend a lot of your time just cleaning up the data.

244
00:20:05,960 --> 00:20:12,840
So from the way I copied and pasted this in, you'll see there's these unnecessary spaces.

245
00:20:12,840 --> 00:20:15,280
The year is in its own column.

246
00:20:15,280 --> 00:20:17,480
So not ideal.

247
00:20:17,480 --> 00:20:28,920
So essentially, these lines here are just turning the date, the sample date, and the sample

248
00:20:28,920 --> 00:20:33,640
year into a nice time stamp.

249
00:20:33,640 --> 00:20:39,080
That way we can work with it as a time series.

250
00:20:39,080 --> 00:20:40,720
But we're not doing time series analysis.

251
00:20:40,720 --> 00:20:46,440
But we can at least have it in datetime.
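(A minimal sketch of this cleanup, assuming the date arrives as a month-and-day string with stray spaces and the year sits in its own column. The helper `parse_sample_date`, the input layout, and the year 2020 are all hypothetical.)
```python
from datetime import datetime
def parse_sample_date(date_text, year_text):
    """Combine a 'Month day' string and a separate year into a datetime."""
    # Collapse the unnecessary spaces left over from copying and pasting.
    cleaned = " ".join(str(date_text).split())
    year = str(year_text).strip()
    # %B is the full month name, %d the day of the month, %Y the four-digit year.
    return datetime.strptime(f"{cleaned} {year}", "%B %d %Y")
# Hypothetical raw values, for illustration only.
print(parse_sample_date("  July   27 ", " 2020"))
```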

252
00:20:46,440 --> 00:20:56,200
So once again, make sure to look at the data.

253
00:20:56,200 --> 00:21:02,720
OK.

254
00:21:02,720 --> 00:21:20,080
So what we have here is essentially the earliest sample date was July 27.

255
00:21:20,080 --> 00:21:28,360
And then the latest was November 16.

256
00:21:28,360 --> 00:21:36,680
I should have here we are.

257
00:21:36,680 --> 00:21:44,240
I think if we take the mean of OK.

258
00:21:44,240 --> 00:21:46,680
So I wrote this little function up here.

259
00:21:46,680 --> 00:21:52,620
So basically, what I'm calculating is the number of days

260
00:21:52,620 --> 00:21:59,800
into the year that sampling occurred.
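(A minimal sketch of a days-into-the-year helper, not the function from the session. The names `day_of_year` and `date_from_day` and the year 2021 are hypothetical; the second helper converts an average or median day number back into a calendar date.)
```python
from datetime import date, timedelta
def day_of_year(d):
    """Return the number of days into the year, where January 1 is day 1."""
    return d.timetuple().tm_yday
def date_from_day(year, day_number):
    """Convert a day-of-year number back into a calendar date."""
    return date(year, 1, 1) + timedelta(days=day_number - 1)
# Hypothetical sample date, for illustration only.
sampled = date(2021, 9, 19)
n = day_of_year(sampled)
print(n, date_from_day(2021, n))
```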

261
00:21:59,800 --> 00:22:09,560
So for example, the median.

262
00:22:09,560 --> 00:22:25,280
I think we wanted integer.

263
00:22:25,280 --> 00:22:32,400
So perhaps we want strings.

264
00:22:32,400 --> 00:22:37,920
The average harvest occurred on September 19.

265
00:22:37,920 --> 00:22:42,020
So that was when the average harvest occurred.

266
00:22:42,020 --> 00:22:53,320
And then we saw that the latest harvest was in November, the earliest in July.

267
00:22:53,320 --> 00:22:59,440
So there's some variation between when people are harvesting their hemp.

268
00:22:59,440 --> 00:23:06,920
And so essentially, I was wanting to see when people sample their hemp.

269
00:23:06,920 --> 00:23:12,320
So keep in mind, we essentially want to combine some data sets here.

270
00:23:12,320 --> 00:23:16,040
So it would be nice to actually use the harvest date.

271
00:23:16,040 --> 00:23:25,000
However, there's not a one to one relationship here between the agronomic performance and

272
00:23:25,000 --> 00:23:27,260
the cannabinoid data.

273
00:23:27,260 --> 00:23:30,500
So I haven't connected the dots yet.

274
00:23:30,500 --> 00:23:34,200
So I'm using the sample date.

275
00:23:34,200 --> 00:23:43,680
And it's not clear the relation between sample date and harvest date.

276
00:23:43,680 --> 00:23:48,720
So the harvest date seems to be in the future from the sample date.

277
00:23:48,720 --> 00:23:51,840
That's what I had a sneaking suspicion of.

278
00:23:51,840 --> 00:23:59,040
However, it doesn't.

279
00:23:59,040 --> 00:24:01,320
That may just be the situation.

280
00:24:01,320 --> 00:24:09,680
So they may be sampling before they harvest, which would make sense.

281
00:24:09,680 --> 00:24:16,160
So that way, they can just be continuously monitoring their cannabinoids and then harvest

282
00:24:16,160 --> 00:24:21,680
before it's too late.

283
00:24:21,680 --> 00:24:31,680
I think we need to do a deeper dive here into this data to actually understand a bit more

284
00:24:31,680 --> 00:24:34,040
about these data points.

285
00:24:34,040 --> 00:24:43,840
So I'll go ahead and hedge that now that we need to understand sampling date and harvest

286
00:24:43,840 --> 00:24:48,280
date a little better.

287
00:24:48,280 --> 00:24:57,000
However, continuing just to keep in mind, a lot of what I'm doing now is essentially

288
00:24:57,000 --> 00:25:06,400
a demonstration, a crude demonstration of how you can start going about data analytics.

289
00:25:06,400 --> 00:25:10,640
So in practice, you'll want to do things.

290
00:25:10,640 --> 00:25:15,220
You want to research these data points a little better.

291
00:25:15,220 --> 00:25:19,300
Study your models a little better.

292
00:25:19,300 --> 00:25:21,220
Do a bit more background research.

293
00:25:21,220 --> 00:25:24,680
So just keep that in mind.

294
00:25:24,680 --> 00:25:32,280
This is just a quick crude analysis just to give you an idea of what you can do.

295
00:25:32,280 --> 00:25:38,040
Hey, Keegan, a quick question for you on that actual source data.

296
00:25:38,040 --> 00:25:42,880
So I notice it's called hemp data.

297
00:25:42,880 --> 00:25:47,520
I assume it's just another, they're using that term generically as just a cannabis crop,

298
00:25:47,520 --> 00:25:49,400
but are these grown outside?

299
00:25:49,400 --> 00:25:53,800
Are these particular plants grown outside or crops grown outside?

300
00:25:53,800 --> 00:25:55,560
Hemp is typically grown outside.

301
00:25:55,560 --> 00:26:03,480
So typically people just seed acres and acres of hemp.

302
00:26:03,480 --> 00:26:06,160
There's no rules one way or the other.

303
00:26:06,160 --> 00:26:12,040
So you can grow hemp in, say, a greenhouse or potentially indoors.

304
00:26:12,040 --> 00:26:16,520
However, there's a couple of things going on here.

305
00:26:16,520 --> 00:26:22,560
From my understanding, there's essentially three reasons people grow hemp.

306
00:26:22,560 --> 00:26:29,600
I'm sure there's more, but the three main ones are for CBD, so for processing the flower

307
00:26:29,600 --> 00:26:30,940
into CBD.

308
00:26:30,940 --> 00:26:36,840
So there you're going to be growing flower stalks like you see in this picture, and then

309
00:26:36,840 --> 00:26:40,360
you'll want to be harvesting that for CBD.

310
00:26:40,360 --> 00:26:42,680
The second reason is for fiber.

311
00:26:42,680 --> 00:26:46,440
And so then you're going to be growing much more fibrous hemp.

312
00:26:46,440 --> 00:26:51,120
And so there you typically won't flower quite like that, and they'll look a lot more like

313
00:26:51,120 --> 00:26:52,880
stalks.

314
00:26:52,880 --> 00:26:58,360
And then the third reason is for seeds, for hemp seeds.

315
00:26:58,360 --> 00:27:05,080
And so then you'll typically have something that you'd have in this field, but then you'll

316
00:27:05,080 --> 00:27:06,080
have them pollinated.

317
00:27:06,080 --> 00:27:12,720
So you'll have a bunch of male plants in there with the females, and they'll produce seeds.

318
00:27:12,720 --> 00:27:19,600
And then people can either eat hemp seeds or hemp seeds are actually quite valuable

319
00:27:19,600 --> 00:27:20,680
these days.

320
00:27:20,680 --> 00:27:27,240
So that's something that I'm kind of coming to understand is there's such a high demand

321
00:27:27,240 --> 00:27:32,360
for cultivation that the cultivators need to get seeds.

322
00:27:32,360 --> 00:27:39,960
So it's actually quite profitable to grow seeds, is what I am coming to understand.

323
00:27:39,960 --> 00:27:40,960
I gotcha.

324
00:27:40,960 --> 00:27:42,520
Yeah, thanks for explaining that.

325
00:27:42,520 --> 00:27:47,640
I wasn't quite up on the hemp side of the house, so that helps.

326
00:27:47,640 --> 00:27:48,640
Exactly.

327
00:27:48,640 --> 00:27:51,000
So it's a bit more of an agricultural crop.

328
00:27:51,000 --> 00:27:57,720
So it's not the exact same as high THC cannabis.

329
00:27:57,720 --> 00:28:05,680
So is that hence the point three percent or the point three percent threshold for THC?

330
00:28:05,680 --> 00:28:06,680
Point oh three maybe.

331
00:28:06,680 --> 00:28:07,680
It's a point oh three.

332
00:28:07,680 --> 00:28:08,680
It's like nothing.

333
00:28:08,680 --> 00:28:11,560
Yeah, it's point three percent.

334
00:28:11,560 --> 00:28:16,200
So it's a shockingly low percent of THC.

335
00:28:16,200 --> 00:28:23,140
However, when you really dive into the science, it is still just the cannabis plant.

336
00:28:23,140 --> 00:28:28,960
So some varieties just don't produce much THC.

337
00:28:28,960 --> 00:28:36,200
And so it's real interesting because you'll end.

338
00:28:36,200 --> 00:28:42,760
You know, I worked in a lab that tested hemp products and you'll find farmers who they'll

339
00:28:42,760 --> 00:28:49,880
basically start growing hemp and they think they may have a low THC variety and it may

340
00:28:49,880 --> 00:28:57,880
produce phenomenal CBD, but sometimes it may have like seven percent THC and then they

341
00:28:57,880 --> 00:29:01,640
will have to destroy their harvest.

342
00:29:01,640 --> 00:29:08,720
So it's tricky, right?

343
00:29:08,720 --> 00:29:15,520
Because of course, you know, there are just these strains that definitely don't produce

344
00:29:15,520 --> 00:29:23,400
much THC, but as we can kind of see here, there's.

345
00:29:23,400 --> 00:29:26,600
You know, it's not like they don't produce THC.

346
00:29:26,600 --> 00:29:33,760
So it's basically just people have selected the cannabis for different purposes.

347
00:29:33,760 --> 00:29:44,720
So people selling high THC flower or oil, they've basically just, you know,

348
00:29:44,720 --> 00:29:51,480
selected the plants that have the high THC and they've just bred for high THC.

349
00:29:51,480 --> 00:29:59,800
And that's why you see strains in the stores with, you know, quite high levels of THC,

350
00:29:59,800 --> 00:30:02,640
like above 20 percent THC.

351
00:30:02,640 --> 00:30:06,920
So it's really just what people are selecting for.

352
00:30:06,920 --> 00:30:15,680
And so here people are trying to select like high CBD strains that still don't produce

353
00:30:15,680 --> 00:30:16,680
much THC.

354
00:30:16,680 --> 00:30:17,680
Gotcha.

355
00:30:17,680 --> 00:30:18,680
Makes sense.

356
00:30:18,680 --> 00:30:32,120
So just to kind of wrap this up, what we'll ultimately want to do is somehow analyze the

357
00:30:32,120 --> 00:30:39,440
cultivars, because from my analysis that's what I try to kind of boil this

358
00:30:39,440 --> 00:30:43,360
down to, because basically I'll get to it at the end.

359
00:30:43,360 --> 00:30:49,320
But I was thinking that, OK, these.

360
00:30:49,320 --> 00:30:55,200
The cultivator is choosing, OK, this is when I harvest, this is when I sample.

361
00:30:55,200 --> 00:31:02,160
But then, in a way, it's almost the plant deciding, right, because

362
00:31:02,160 --> 00:31:07,880
really the cultivator is just essentially waiting for the plant to finish the flower,

363
00:31:07,880 --> 00:31:12,480
to seed and then they'll harvest it when it's ready.

364
00:31:12,480 --> 00:31:22,640
So the choice is more in choosing the strain, the cultivar, that has your ideal

365
00:31:22,640 --> 00:31:23,640
traits.

366
00:31:23,640 --> 00:31:32,160
So choosing a cultivar that finishes harvest in September.

367
00:31:32,160 --> 00:31:38,120
So it's more the cultivar is finishing in September, whereas some cultivars finish in

368
00:31:38,120 --> 00:31:39,120
November.

369
00:31:39,120 --> 00:31:40,120
But.

370
00:31:40,120 --> 00:31:44,600
But that's just a thought for the time being.

371
00:31:44,600 --> 00:31:50,560
So let's just go ahead and get back to the analysis here.

372
00:31:50,560 --> 00:31:57,800
So just to walk through some basic, simple regressions just to continue

373
00:31:57,800 --> 00:32:00,480
looking at the data.

374
00:32:00,480 --> 00:32:11,120
Essentially I just took a regression of total THC on the days into the year that the

375
00:32:11,120 --> 00:32:13,720
hemp was sampled.

376
00:32:13,720 --> 00:32:19,720
And so as you would expect, there's a slight positive coefficient.

377
00:32:19,720 --> 00:32:26,480
So the later you wait, the higher the THC is going to be.

378
00:32:26,480 --> 00:32:37,640
And similarly, if you run a regression of total CBD on the days into the year where

379
00:32:37,640 --> 00:32:46,680
sampling occurred, you'll find another positive coefficient.
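A minimal sketch of the kind of simple regression described here, written in plain Python rather than a statistics package. The data points and the helper name `ols_fit` are made up for illustration and are not the values from the session.
```python
# Hypothetical (day_of_year, total_thc_percent) pairs, made up for illustration only.
samples = [(240, 0.15), (255, 0.20), (270, 0.22), (285, 0.30), (300, 0.33)]
def ols_fit(points):
    """Return (intercept, slope) of the least-squares line y = a + b * x."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope
intercept, slope = ols_fit(samples)
# A positive slope means later sampling dates go with higher total THC.
print(f"THC changes by about {slope:.4f} percentage points per day")
```
The same function could be pointed at total CBD instead of total THC to check the second regression.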

380
00:32:46,680 --> 00:32:53,760
So this was what we were beginning to talk about last week, where it appears that the

381
00:32:53,760 --> 00:33:00,040
longer into the year you wait to harvest or the longer into the year the plant takes to

382
00:33:00,040 --> 00:33:04,760
harvest, the higher the cannabinoids are going to be.

383
00:33:04,760 --> 00:33:13,640
So we were wondering, does that mean your chance of failing is going to be higher?

384
00:33:13,640 --> 00:33:20,960
So introducing to you today, well, a lot of you may have already seen it, is the logistic

385
00:33:20,960 --> 00:33:22,440
regression.

386
00:33:22,440 --> 00:33:32,000
And so I put some sources here at the top so that way you can just begin.

387
00:33:32,000 --> 00:33:40,400
There's everything from just some tutorials to some originals.

388
00:33:40,400 --> 00:33:50,840
So for example, so if you need some more in-depth reading, there it is for you.

389
00:33:50,840 --> 00:33:54,960
And then, essentially, if you just need a light tutorial,

390
00:33:54,960 --> 00:33:57,520
There's GeeksforGeeks.

391
00:33:57,520 --> 00:34:04,360
So I've got a couple references here for you to read up on logistic regressions.

392
00:34:04,360 --> 00:34:14,800
So long story short, I wish there was a better picture here.

393
00:34:14,800 --> 00:34:22,960
Well, basically, your dependent variable can take a value of zero or one.

394
00:34:22,960 --> 00:34:32,900
And so they're useful for predicting binary outcomes because you'll basically run a regression

395
00:34:32,900 --> 00:34:39,520
if your estimate is less than 0.5, then your prediction is zero.

396
00:34:39,520 --> 00:34:45,800
And then if your estimate is greater than 0.5, your prediction is one.

397
00:34:45,800 --> 00:34:54,480
So anything you can dichotomize as zero and one, and you have some explanatory variables,

398
00:34:54,480 --> 00:34:57,600
then you can run a logistic regression.
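A minimal sketch of the 0.5 cutoff rule described here. The coefficients are hypothetical and chosen only to show the shape of the curve; they are not the fitted values from the session.
```python
import math
def predict_probability(intercept, slope, x):
    """Logistic model: P(y = 1 | x) = 1 / (1 + exp(-(intercept + slope * x)))."""
    return 1.0 / (1.0 + math.exp(-(intercept + slope * x)))
def classify(probability, threshold=0.5):
    """Turn a probability into a 0/1 prediction using the cutoff."""
    return 1 if probability > threshold else 0
# Hypothetical coefficients, for illustration only.
intercept, slope = -9.0, 0.03
for day in (240, 300, 330):
    p = predict_probability(intercept, slope, day)
    print(day, round(p, 3), classify(p))
```
Lowering `threshold` below 0.5 flags more samples as likely failures, at the price of more false alarms.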

399
00:34:57,600 --> 00:35:00,900
And there's a lot more to it.

400
00:35:00,900 --> 00:35:08,640
So it depends on whether your explanatory variables depend on the choice or on the

401
00:35:08,640 --> 00:35:09,640
individual.

402
00:35:09,640 --> 00:35:15,600
So I'll let you dive into the reading, but I won't bog you down with that right now.

403
00:35:15,600 --> 00:35:19,120
And we'll just get to the analysis.

404
00:35:19,120 --> 00:35:30,840
Long story short, remember, we have coded a failure variable, which is

405
00:35:30,840 --> 00:35:35,000
just zero if it passes and one if it fails.

406
00:35:35,000 --> 00:35:44,240
And so that would be everything below the red line is zero, everything above the red

407
00:35:44,240 --> 00:35:46,640
line is one.

408
00:35:46,640 --> 00:35:53,960
And so basically, we're going to see if we can predict if things, if the hemp sample

409
00:35:53,960 --> 00:35:54,960
fails.

410
00:35:54,960 --> 00:36:01,600
So we've got things coded, and we're just going to use the sample date.

411
00:36:01,600 --> 00:36:09,120
So does the date, the time into the year when things were sampled, can that be used to predict

412
00:36:09,120 --> 00:36:14,160
when hemp fails?

413
00:36:14,160 --> 00:36:20,920
So you can basically fit a logistic regression.

414
00:36:20,920 --> 00:36:27,880
The coefficient doesn't mean too much by itself, except for whether it's positive or negative.

415
00:36:27,880 --> 00:36:37,400
So long story short, let's go ahead and essentially look at the predictions.

416
00:36:37,400 --> 00:36:39,720
The predictions are the most interesting.

417
00:36:39,720 --> 00:36:45,360
Well, that may be my opinion, but it's sort of where the analysis lies with the logistic

418
00:36:45,360 --> 00:36:47,000
regression.

419
00:36:47,000 --> 00:36:58,200
So essentially, using the regression line that we fit, I predicted the probability of

420
00:36:58,200 --> 00:37:07,480
failing for high THC given the days into the year when sampling occurred.

421
00:37:07,480 --> 00:37:20,040
So keep in mind, here's another plot, which basically adds what we observed.

422
00:37:20,040 --> 00:37:26,400
And so I'll be frank, it's not really the best fit.

423
00:37:26,400 --> 00:37:34,680
So when you see in the examples and tutorials out there, logistic regressions, you'll see

424
00:37:34,680 --> 00:37:37,600
much better fits than this.

425
00:37:37,600 --> 00:37:47,400
So graphically, it doesn't look like the best fit, but you can essentially start to see,

426
00:37:47,400 --> 00:37:55,920
keep in mind this is a crude analysis, but essentially what the prediction is, is the

427
00:37:55,920 --> 00:38:05,600
later into the year that you wait, the chance of failing for high THC increases.

428
00:38:05,600 --> 00:38:16,280
So all the way, so this is why I have this function right here.

429
00:38:16,280 --> 00:38:26,080
Oh, well, I guess I've got this plotted down here.

430
00:38:26,080 --> 00:38:35,880
It looks like all the way through September or so, you only have maybe less than a 25%

431
00:38:35,880 --> 00:38:38,000
chance of failing.

432
00:38:38,000 --> 00:38:48,200
And then it looks like, as you wait, your chance of failing increases.

433
00:38:48,200 --> 00:38:55,600
So I wanted to quantify this a little bit better.

434
00:38:55,600 --> 00:39:02,760
So using just another statistical package.

435
00:39:02,760 --> 00:39:09,840
So here I used statsmodels, and then here I used scikit-learn.

436
00:39:09,840 --> 00:39:15,880
So once again, it's just a package.

437
00:39:15,880 --> 00:39:22,040
In fact, I actually have an implementation right here that if you want to see how to

438
00:39:22,040 --> 00:39:32,680
write a logistic regression with just matrices, then have at it.
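A minimal sketch of fitting a one-variable logistic regression by hand. It uses plain-Python gradient ascent rather than the matrix implementation mentioned here, which is not shown in the transcript, and the data are made up for illustration.
```python
import math
def fit_logistic(xs, ys, learning_rate=0.1, steps=5000):
    """Fit P(y=1|x) = sigmoid(b0 + b1*x) by gradient ascent on the log-likelihood."""
    b0, b1 = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Residuals between the observed 0/1 outcome and the current predicted probability.
        errors = [y - 1.0 / (1.0 + math.exp(-(b0 + b1 * x))) for x, y in zip(xs, ys)]
        b0 += learning_rate * sum(errors) / n
        b1 += learning_rate * sum(e * x for e, x in zip(errors, xs)) / n
    return b0, b1
# Hypothetical data: x is the sampling day rescaled to roughly [-1, 1], y is 1 for a failure.
xs = [-1.0, -0.6, -0.2, 0.2, 0.6, 1.0]
ys = [0, 0, 1, 0, 1, 1]
b0, b1 = fit_logistic(xs, ys)
print(round(b0, 2), round(b1, 2))
```
Rescaling the day variable before fitting keeps the gradient steps stable; raw day-of-year values would need a much smaller learning rate.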

439
00:39:32,680 --> 00:39:38,520
So long story short, I'm just using tools here.

440
00:39:38,520 --> 00:39:48,880
Essentially trying to just quantify this regression to see, well, how accurate was it?

441
00:39:48,880 --> 00:40:02,360
So apparently it predicted with 76% accuracy, but I just kind of want to show the breakdown

442
00:40:02,360 --> 00:40:05,280
of that.

443
00:40:05,280 --> 00:40:13,720
And so here we predicted a failure, and it's actually a failure.

444
00:40:13,720 --> 00:40:17,360
So that happened 16 times.

445
00:40:17,360 --> 00:40:22,600
Then here we predicted a failure, and it's actually a pass.

446
00:40:22,600 --> 00:40:25,000
That's 12 times.

447
00:40:25,000 --> 00:40:31,960
Here we predicted a pass, and it actually failed 168 times.
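A short worked check of the breakdown read out here. The true-negative count is not stated in the session; it is inferred under the assumption of 750 samples in total.
```python
# Counts read out in the session; true negatives are inferred, not stated.
true_positives = 16    # predicted fail, actually failed
false_positives = 12   # predicted fail, actually passed
false_negatives = 168  # predicted pass, actually failed
total_samples = 750    # assumption: the session mentions "750 or so" samples
true_negatives = total_samples - true_positives - false_positives - false_negatives
accuracy = (true_positives + true_negatives) / total_samples
precision = true_positives / (true_positives + false_positives)
recall = true_positives / (true_positives + false_negatives)
print(f"accuracy={accuracy:.2f} precision={precision:.2f} recall={recall:.2f}")
```
Under that assumption the accuracy matches the 76% quoted, while recall shows that only a small share of the actual failures were caught.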

448
00:40:31,960 --> 00:40:33,600
And I'm going to revisit.

449
00:40:33,600 --> 00:40:35,240
And actually, I'll visit that now.

450
00:40:35,240 --> 00:40:38,520
So essentially, that's worrisome.

451
00:40:38,520 --> 00:40:48,480
So in the GeeksforGeeks article, maybe I'll pin it again.

452
00:40:48,480 --> 00:40:58,240
They start to talk about essentially what do you want to optimize for in your logistic

453
00:40:58,240 --> 00:40:59,280
regression?

454
00:40:59,280 --> 00:41:07,960
So are you trying to get things precise, or are you trying to make sure that you don't

455
00:41:07,960 --> 00:41:09,400
miss any?

456
00:41:09,400 --> 00:41:19,240
And so from the laboratory's point of view, you don't want to be predicting a lot of passes

457
00:41:19,240 --> 00:41:22,140
that are actually failures.

458
00:41:22,140 --> 00:41:34,200
So in my book, it would be like this is the number that you would like to minimize.

459
00:41:34,200 --> 00:41:41,960
I think it's OK if you predict failures and they're actually passes.

460
00:41:41,960 --> 00:41:51,120
So I think this quadrant is not the worst in the world from the laboratory's point of

461
00:41:51,120 --> 00:41:52,800
view.

462
00:41:52,800 --> 00:41:56,760
But these are basic.

463
00:41:56,760 --> 00:42:00,200
Anything in this quadrant is essentially a missed opportunity.

464
00:42:00,200 --> 00:42:08,680
And the reason I say missed opportunity is so remember, so there's 750 samples here.

465
00:42:08,680 --> 00:42:16,080
So it's all about the value of time.

466
00:42:16,080 --> 00:42:26,240
So it takes about 10 minutes, give or take, to analyze one sample of hemp.

467
00:42:26,240 --> 00:42:40,400
So that's 125 hours to run 750 samples.

468
00:42:40,400 --> 00:42:47,400
And there'll probably be pauses in between when an analyst has to load on some more vials,

469
00:42:47,400 --> 00:42:57,960
because typically an instrument may only hold 50 to 100 vials that are being analyzed.

470
00:42:57,960 --> 00:43:05,640
So it may seem like a short amount of time, but it's a long amount of time.

471
00:43:05,640 --> 00:43:10,720
And there are a couple of laboratories doing these tests, but essentially you're trying

472
00:43:10,720 --> 00:43:17,920
to identify the failures as soon as possible, because the sooner you can identify the failure,

473
00:43:17,920 --> 00:43:28,120
you can notify the cultivator, and then they can begin their remedy process.

474
00:43:28,120 --> 00:43:35,420
So that's about five days.

475
00:43:35,420 --> 00:43:46,760
So the idea is if you can load the failures onto the instrument first, then you can notify

476
00:43:46,760 --> 00:43:55,400
those cultivators almost five days sooner than you could otherwise.
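A minimal sketch of the run-time arithmetic and the idea of loading likely failures first. The sample IDs, probabilities, and function names are hypothetical.
```python
MINUTES_PER_SAMPLE = 10  # approximate figure quoted in the session
def run_hours(n_samples, minutes_per_sample=MINUTES_PER_SAMPLE):
    """Instrument time in hours, ignoring pauses to reload vials."""
    return n_samples * minutes_per_sample / 60
def load_order(predicted_fail_probability):
    """Sample IDs sorted so the most likely failures are analyzed first."""
    return sorted(predicted_fail_probability, key=predicted_fail_probability.get, reverse=True)
# Hypothetical sample IDs and model probabilities, for illustration only.
queue = load_order({"A": 0.10, "B": 0.62, "C": 0.35})
print(run_hours(750), run_hours(750) / 24, queue)
```
At 10 minutes per sample, 750 samples come to 125 hours of instrument time, a little over five days of continuous running.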

477
00:43:55,400 --> 00:44:02,280
And that could mean a lot of money.

478
00:44:02,280 --> 00:44:09,480
So basically, let's say they just tested one field and it's high for THC.

479
00:44:09,480 --> 00:44:15,620
Well, maybe they have some other fields and they need to harvest those like right this

480
00:44:15,620 --> 00:44:19,200
second before it's too late.

481
00:44:19,200 --> 00:44:27,360
And so say if they waited five days, well now all of their fields may be at high THC.

482
00:44:27,360 --> 00:44:33,760
So it may seem like trivial or small or inconsequential.

483
00:44:33,760 --> 00:44:42,400
However, in my book, basically identifying these failures as soon as possible could just

484
00:44:42,400 --> 00:44:46,600
generate enormous savings later on.

485
00:44:46,600 --> 00:44:56,000
And it's just one of those things where just these small little ways that you do things.

486
00:44:56,000 --> 00:45:02,800
So the small little ways that you do analysis can have these enormous consequences.

487
00:45:02,800 --> 00:45:17,400
So even if this model may not look the best, it's only predicting 16 failures out of 750

488
00:45:17,400 --> 00:45:19,680
or so.

489
00:45:19,680 --> 00:45:27,800
Those are still 16 failures that you can load on the instrument first and just identify

490
00:45:27,800 --> 00:45:30,200
them that much sooner.

491
00:45:30,200 --> 00:45:39,480
And so long story short is what I would try to do is essentially build logit models to

492
00:45:39,480 --> 00:45:48,380
try to, of course, you want to maximize these two quadrants where you're predicting failures

493
00:45:48,380 --> 00:45:51,360
correctly and predicting passes correctly.

494
00:45:51,360 --> 00:46:02,440
However, I would continue to work on the model to try to reduce the number in this third

495
00:46:02,440 --> 00:46:09,280
quadrant and increase the number here or in the second and fourth quadrants.

496
00:46:09,280 --> 00:46:15,320
So that's how I would approach this analysis.

497
00:46:15,320 --> 00:46:18,440
So Keegan, question for you, make sure I'm tracking to this.

498
00:46:18,440 --> 00:46:24,200
So essentially by trying to predict which of the samples are more likely to fail, they

499
00:46:24,200 --> 00:46:28,240
kind of get bumped to the front of the line whenever they come into the lab.

500
00:46:28,240 --> 00:46:32,880
And by doing that and giving them first priority, you're giving that customer potentially more

501
00:46:32,880 --> 00:46:35,720
time to react to their failure.

502
00:46:35,720 --> 00:46:37,360
Exactly.

503
00:46:37,360 --> 00:46:44,420
And so there may be better ways to do it because I was thinking once it's failed, it may be

504
00:46:44,420 --> 00:46:46,200
too late.

505
00:46:46,200 --> 00:46:51,880
So it may be better to try to find things that are right at the brink.

506
00:46:51,880 --> 00:46:56,120
So right at like 0.28, 0.29.

507
00:46:56,120 --> 00:46:59,660
I don't know how you would go about doing that.

508
00:46:59,660 --> 00:47:12,280
So from the lab's point of view, it's always sort of delicate to

509
00:47:12,280 --> 00:47:19,880
just tell a cultivator, you've actually failed quality assurance testing or you've failed

510
00:47:19,880 --> 00:47:20,880
hemp testing.

511
00:47:20,880 --> 00:47:23,480
And it's just part of the business.

512
00:47:23,480 --> 00:47:28,800
It's just necessary.

513
00:47:28,800 --> 00:47:30,800
So it's often just a little bit delicate.

514
00:47:30,800 --> 00:47:35,560
So the sooner they know, they can go ahead and let them know the news.

515
00:47:35,560 --> 00:47:38,720
Like you've got high THC in your crop.

516
00:47:38,720 --> 00:47:47,080
And like I said, it's just that much sooner that they can begin essentially the remediation

517
00:47:47,080 --> 00:47:50,960
process for their next crop or adjacent crops.

518
00:47:50,960 --> 00:47:58,600
So this is something that, like I said, for the hemp producers, it is a problem

519
00:47:58,600 --> 00:47:59,600
for them at the moment.

520
00:47:59,600 --> 00:48:03,960
So there's a non-negligible portion failing and it's something that people are trying

521
00:48:03,960 --> 00:48:05,460
to pin down.

522
00:48:05,460 --> 00:48:13,040
And so from my point of view, even if their crop fails, but they have five additional

523
00:48:13,040 --> 00:48:20,640
days to try to work on it for next year, something's better than nothing.

524
00:48:20,640 --> 00:48:26,800
So I think that could go a long way.

525
00:48:26,800 --> 00:48:35,360
So that's essentially where I see the gains being from the laboratory point of view, from

526
00:48:35,360 --> 00:48:41,400
the cultivator's point of view, these statistics may be a bit more interesting.

527
00:48:41,400 --> 00:48:50,360
So seeing, okay, we may want to, right, because different cultivators have a different risk

528
00:48:50,360 --> 00:48:55,080
preference, right, risk tolerance.

529
00:48:55,080 --> 00:48:58,800
So this is essentially your probability of failing.

530
00:48:58,800 --> 00:49:05,040
So some cultivators may have a higher preference for risk and they may want to push it out

531
00:49:05,040 --> 00:49:09,200
a ways to try to, right, because they're maximizing their CBD.

532
00:49:09,200 --> 00:49:14,600
And let's say they get revenue proportional to the CBD that they produce, right?

533
00:49:14,600 --> 00:49:19,400
So it's just a cost benefit analysis.

534
00:49:19,400 --> 00:49:26,560
So the longer you wait, the higher your benefit is, your CBD, but the higher your cost is,

535
00:49:26,560 --> 00:49:29,840
which is your probability of failing.

536
00:49:29,840 --> 00:49:43,600
So depending on their risk preference, you would just, one could think, well, simplifying

537
00:49:43,600 --> 00:49:48,840
things, you know, if these were just the only variables at hand, then you could just make

538
00:49:48,840 --> 00:49:49,840
a trade off.

539
00:49:49,840 --> 00:49:54,840
Okay, you know, how long do I want to wait for the amount of CBD given my probability

540
00:49:54,840 --> 00:49:55,840
of failing?

541
00:49:55,840 --> 00:49:59,920
And so then it would just be a simple cost benefit analysis, and you could use that to

542
00:49:59,920 --> 00:50:05,280
just choose, I want to harvest on this date.
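A minimal sketch of the simplified cost-benefit trade-off described here, assuming a risk-neutral grower whose revenue is proportional to CBD and is lost entirely on a failure. Both curves and all parameter values are hypothetical.
```python
import math
def fail_probability(day, intercept=-9.0, slope=0.03):
    """Hypothetical logistic curve for P(fail) by day of year."""
    return 1.0 / (1.0 + math.exp(-(intercept + slope * day)))
def cbd_percent(day, base=2.0, gain_per_day=0.03):
    """Hypothetical linear CBD accumulation by day of year."""
    return base + gain_per_day * day
def expected_value(day):
    """Revenue proportional to CBD, earned only if the sample passes."""
    return cbd_percent(day) * (1.0 - fail_probability(day))
# Search a hypothetical harvest window for the day with the highest expected value.
best_day = max(range(200, 340), key=expected_value)
print(best_day, round(expected_value(best_day), 2))
```
A more risk-averse grower could be modeled by penalizing the failure term more heavily, which would pull the chosen date earlier.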

543
00:50:05,280 --> 00:50:09,000
In reality, were you going to say something?

544
00:50:09,000 --> 00:50:10,920
Oh, I'm sorry, Keegan, go ahead.

545
00:50:10,920 --> 00:50:14,520
Or I was just going to say, in reality, there's a lot more factors.

546
00:50:14,520 --> 00:50:17,000
But do you have a comment?

547
00:50:17,000 --> 00:50:24,160
I did regarding your CBD to THC correlation plot.

548
00:50:24,160 --> 00:50:28,480
There was one outlier that was very interesting, which is right there on the right hand side

549
00:50:28,480 --> 00:50:35,640
where they're butting right up against the 0.3% THC threshold, but they're maximizing

550
00:50:35,640 --> 00:50:37,320
their CBD.

551
00:50:37,320 --> 00:50:43,000
That guy right there, to me, when I see that, that to me is like suspect.

552
00:50:43,000 --> 00:50:48,480
They either cracked the nut or something else is going on.

553
00:50:48,480 --> 00:50:49,480
At the 0.29%?

554
00:50:49,480 --> 00:50:57,600
Yeah, and they were all the way up there to almost 15, 14 and a half CBD.

555
00:50:57,600 --> 00:50:59,600
Yes.

556
00:50:59,600 --> 00:51:02,640
And yeah, that's, you're right.

557
00:51:02,640 --> 00:51:07,600
So they could have, they may have just gotten lucky.

558
00:51:07,600 --> 00:51:12,720
I don't think there's anything shady going on because we've got some pretty reputable

559
00:51:12,720 --> 00:51:13,720
labs here.

560
00:51:13,720 --> 00:51:25,040
So one would like to think that it's just luck of the draw.

561
00:51:25,040 --> 00:51:29,920
Yeah, maybe I'm being too cynical.

562
00:51:29,920 --> 00:51:35,280
But like I said, I mean, I think it's just an outlier.

563
00:51:35,280 --> 00:51:36,800
Like you said, anything's possible.

564
00:51:36,800 --> 00:51:48,120
I think maybe if you saw like a string of observations all at 0.29% that may be suspect.

565
00:51:48,120 --> 00:52:00,760
Yeah, and also the point by 10% CBD, which kind of has low THC.

566
00:52:00,760 --> 00:52:03,240
So if it's right over there, right there.

567
00:52:03,240 --> 00:52:06,600
Yeah, that guy's doing pretty well, right?

568
00:52:06,600 --> 00:52:08,520
He's got a very low risk threshold.

569
00:52:08,520 --> 00:52:14,680
You know, if we're using the terms that you kind of defined through your logit model,

570
00:52:14,680 --> 00:52:17,480
he's got low risk and kind of high reward there.

571
00:52:17,480 --> 00:52:20,440
So that's a pretty good performance.

572
00:52:20,440 --> 00:52:23,560
And this is actually essentially what cultivators are trying to figure out.

573
00:52:23,560 --> 00:52:27,100
And like I said, it's just luck of the draw getting these right.

574
00:52:27,100 --> 00:52:36,240
So basically, you're trying to figure out the right strain.

575
00:52:36,240 --> 00:52:44,640
So you're trying to figure out the perfect strains to grow.

576
00:52:44,640 --> 00:52:50,920
And so basically, they're still figuring it out.

577
00:52:50,920 --> 00:53:02,280
So for example, Front Range Biosciences, they may want to start breeding hybrid number

578
00:53:02,280 --> 00:53:03,280
five.

579
00:53:03,280 --> 00:53:04,280
Right.

580
00:53:04,280 --> 00:53:14,880
And so I think essentially, I think this gets back to sort of essentially what I'm starting

581
00:53:14,880 --> 00:53:24,640
to narrow down on is it's about select.

582
00:53:24,640 --> 00:53:29,720
There's a lot of factors going on, but a lot comes down to selecting the right strain.

583
00:53:29,720 --> 00:53:36,880
So and there's, you know, and so that's why it's in high demand.

584
00:53:36,880 --> 00:53:37,880
Right.

585
00:53:37,880 --> 00:53:43,040
So just because you have hemp seeds doesn't necessarily mean they're going to be profitable.

586
00:53:43,040 --> 00:53:49,000
So you know, a lot of people out there, they may not know one way or the other, and they

587
00:53:49,000 --> 00:53:56,920
just purchase a lot of, you know, Berry Blossom seeds or presidential seeds.

588
00:53:56,920 --> 00:54:01,780
And they just got, you know, the wrong genetics.

589
00:54:01,780 --> 00:54:09,280
And so I think this is what hemp producers are starting to learn is, you know, the strain

590
00:54:09,280 --> 00:54:10,760
you're growing really matters.

591
00:54:10,760 --> 00:54:19,520
So I think that's what I'm starting to get to, because here we're starting

592
00:54:19,520 --> 00:54:24,400
to see that, okay, when you sample, it starts to matter.

593
00:54:24,400 --> 00:54:30,480
And so I'm starting to think, okay, that may be dictated by which strain you're actually

594
00:54:30,480 --> 00:54:33,760
growing.

595
00:54:33,760 --> 00:54:41,680
Were there multiple instances of the same strain just being harvested at different

596
00:54:41,680 --> 00:54:42,680
times?

597
00:54:42,680 --> 00:54:44,120
Oh, yes.

598
00:54:44,120 --> 00:54:58,120
And so there is room for strain analysis here, because as you see, we've got a lot of repeats.

599
00:54:58,120 --> 00:55:01,440
So here are some of like the popular ones.

600
00:55:01,440 --> 00:55:13,760
So like, it looks like this Cherry Wine just looks to be, you know, unbelievably popular.

601
00:55:13,760 --> 00:55:18,480
But you know, it may not really be, you know, the best choice.

602
00:55:18,480 --> 00:55:22,880
Or maybe it's a fantastic choice.

603
00:55:22,880 --> 00:55:27,680
And so I think there's work there to be done, because basically, let's say you're

604
00:55:27,680 --> 00:55:31,560
a new hemp producer out there.

605
00:55:31,560 --> 00:55:37,520
Well, you know, you've invested all this in your cultivation equipment,

606
00:55:37,520 --> 00:55:44,320
and you're about to buy, say, 30,000 seeds to go seed acres.

607
00:55:44,320 --> 00:55:48,880
Well, you want to make sure you pick out the right variety.

608
00:55:48,880 --> 00:55:55,080
And I'm sure every seed producer out there that you call is going to say that, oh, yeah,

609
00:55:55,080 --> 00:55:58,040
their seeds are fantastic.

610
00:55:58,040 --> 00:56:06,280
So I think that's where our analysis comes in, is, you know, if you're

611
00:56:06,280 --> 00:56:15,600
a hemp producer, you can begin to start tailoring our analysis and see if you can't, you know,

612
00:56:15,600 --> 00:56:19,720
start picking the varieties that are right for you.

613
00:56:19,720 --> 00:56:28,320
So it's known that over time, THC increases in hemp.

614
00:56:28,320 --> 00:56:34,360
And there are a lot of cases of farmers harvesting, and it being tested like a month

615
00:56:34,360 --> 00:56:39,880
or two later, it passed when they harvested, but it fails later.

616
00:56:39,880 --> 00:56:44,840
So there's sort of this race against the clock.

617
00:56:44,840 --> 00:56:50,800
And I'm not sure, I guess from this data, it looks like CBD also goes up over time.

618
00:56:50,800 --> 00:56:53,760
But it's kind of trying to balance the two out.

619
00:56:53,760 --> 00:57:00,360
Yeah, definitely a trade-off like you're saying, Keegan, like a cost function here.

620
00:57:00,360 --> 00:57:06,680
Exactly, and so I think there's room for deeper economic analysis.

621
00:57:06,680 --> 00:57:11,480
And Paul, I think you're real good at noting these outliers

622
00:57:11,480 --> 00:57:16,880
and their significance, because I think essentially, that's what you're aiming for.

623
00:57:16,880 --> 00:57:25,720
Because if you can find a strain like this, then you don't have quite the careful

624
00:57:25,720 --> 00:57:27,920
dance that you have to dance, right?

625
00:57:27,920 --> 00:57:35,600
Because you're growing something right here that's going to be just such a fine

626
00:57:35,600 --> 00:57:43,680
line between passing and failing that, you know,

627
00:57:43,680 --> 00:57:49,400
the probability of failing could pose such a high cost on a cultivator that it wouldn't

628
00:57:49,400 --> 00:57:51,320
be worth it or that they could.

629
00:57:51,320 --> 00:57:55,980
I mean, they could end up at a loss, right?

630
00:57:55,980 --> 00:58:02,160
Because it's not unheard of to, you know, grow a lot of hemp, and then

631
00:58:02,160 --> 00:58:04,520
it fails.

632
00:58:04,520 --> 00:58:06,760
And then you're in the red.

633
00:58:06,760 --> 00:58:10,320
Yeah, yeah, very risky.

634
00:58:10,320 --> 00:58:17,360
From firsthand experience, a lot of hemp producers don't understand that risk

635
00:58:17,360 --> 00:58:18,720
getting into it.

636
00:58:18,720 --> 00:58:25,980
They think that they're growing a hemp strain, and that there's just not going

637
00:58:25,980 --> 00:58:28,240
to be much THC in it.

638
00:58:28,240 --> 00:58:33,920
But as we're seeing, you know, there appears to be a positive

639
00:58:33,920 --> 00:58:35,480
correlation here.

640
00:58:35,480 --> 00:58:42,560
So it's just that some of them, it's just a weaker, you know, weaker correlation.

641
00:58:42,560 --> 00:58:46,760
So yeah, good, good work, Keegan.

642
00:58:46,760 --> 00:58:49,800
And Charles, that's good stuff.

643
00:58:49,800 --> 00:58:55,760
The data that was collected and consolidated, the gentleman that you mentioned at the beginning

644
00:58:55,760 --> 00:59:00,600
of the presentation, did he reach out to each of the states separately to get this information?

645
00:59:00,600 --> 00:59:09,520
Yes, so the Washington State data is just from Washington State.

646
00:59:09,520 --> 00:59:18,400
So in fact, touching on that real quick, I didn't leave Charles 10 minutes, but I'll

647
00:59:18,400 --> 00:59:21,160
give you 10 minutes at the start of next meetup.

648
00:59:21,160 --> 00:59:34,440
So essentially, what Paul is referring to here, you can find this in our prior,

649
00:59:34,440 --> 00:59:45,880
in our earlier meetups, there's several links here, essentially to a Washington State database.

650
00:59:45,880 --> 00:59:52,040
And so this data was collected by Jim McCray, who's a researcher in Washington State and

651
00:59:52,040 --> 00:59:53,960
shared with me.

652
00:59:53,960 --> 01:00:00,120
And then Charles has done some analysis here, just trying to predict the probability of

653
01:00:00,120 --> 01:00:02,240
a sample failing.

654
01:00:02,240 --> 01:00:06,360
And so Charles, I apologize to you that I didn't leave you 10 minutes.

655
01:00:06,360 --> 01:00:15,880
And so if you want to go over this at the beginning of next week, we can set that aside.

656
01:00:15,880 --> 01:00:18,040
Okay, that's cool.

657
01:00:18,040 --> 01:00:19,120
All right.

658
01:00:19,120 --> 01:00:25,160
And then for the rest of the crew, you can look through Charles's work here, because

659
01:00:25,160 --> 01:00:32,680
he's done some fantastic work, essentially seeing if we can, similarly to what we've

660
01:00:32,680 --> 01:00:40,200
done today, predict the probability of failing, given what sample type it is.

661
01:00:40,200 --> 01:00:44,560
But I'll let Charles get more into that next week.

662
01:00:44,560 --> 01:00:47,000
All right.

663
01:00:47,000 --> 01:00:48,000
Looking forward to that one, Charles.

664
01:00:48,000 --> 01:00:49,000
That looks good.

665
01:00:49,000 --> 01:00:50,000
All right.

666
01:00:50,000 --> 01:00:56,680
Well, I'm going to end the presentation here.

667
01:00:56,680 --> 01:01:05,840
And so real quick, are there any quick comments, questions, concerns before we call it a day?

668
01:01:05,840 --> 01:01:07,240
Nope.

669
01:01:07,240 --> 01:01:11,840
Well, it's been awesome having everybody.

670
01:01:11,840 --> 01:01:13,800
It's great having some new faces here.

671
01:01:13,800 --> 01:01:19,620
So it's awesome seeing you, Gabrielle, Eri, Megan, and of course, Paul, Charles, and Heather.

672
01:01:19,620 --> 01:01:21,360
So it's been fun.

673
01:01:21,360 --> 01:01:28,200
So let's keep in touch, keep bouncing ideas off each other.

674
01:01:28,200 --> 01:01:33,760
And then next week, we'll start with Charles's analysis and keep doing analytics.

675
01:01:33,760 --> 01:01:35,760
Sounds good.

676
01:01:35,760 --> 01:01:39,440
All right, everyone.

677
01:01:39,440 --> 01:01:42,880
Have a productive week and until next week, keep being awesome.

678
01:01:42,880 --> 01:01:43,880
Bye now.

679
01:01:43,880 --> 01:01:44,880
Bye-bye.

680
01:01:44,880 --> 01:02:00,960
Bye.