1
00:00:00,000 --> 00:00:11,120
Welcome to the Cannabis Data Science Meetup Group.

2
00:00:11,120 --> 00:00:13,280
You're in for a treat today.

3
00:00:13,280 --> 00:00:16,400
It could easily be the best meetup to date.

4
00:00:16,400 --> 00:00:18,760
So we'll cover a lot of ground today.

5
00:00:18,760 --> 00:00:22,280
So fasten your seatbelts because it's going to be a good one.

6
00:00:22,280 --> 00:00:24,920
So before we kick it off though,

7
00:00:24,920 --> 00:00:27,440
Kay and Travis.

8
00:00:27,440 --> 00:00:32,400
So Kay just told me that you're interested in data science,

9
00:00:32,400 --> 00:00:35,600
particularly R. Travis,

10
00:00:35,600 --> 00:00:38,280
would you be interested in introducing yourself and

11
00:00:38,280 --> 00:00:42,040
what questions about cannabis and the cannabis industry

12
00:00:42,040 --> 00:00:46,000
you may be interested in answering with data science?

13
00:00:47,040 --> 00:00:50,400
Hi. I'm just trying to get into the industry

14
00:00:50,400 --> 00:00:52,360
and getting to be a data scientist.

15
00:00:52,360 --> 00:00:55,960
I just want to learn as much as I can right now.

16
00:00:55,960 --> 00:00:59,560
Awesome. You're in the right place.

17
00:00:59,560 --> 00:01:02,600
Well, without further ado,

18
00:01:02,600 --> 00:01:08,120
I typically end up just trying to present some of my latest work,

19
00:01:08,120 --> 00:01:11,960
and then just would like to pause frequently throughout

20
00:01:11,960 --> 00:01:14,240
to answer any questions you may have,

21
00:01:14,240 --> 00:01:16,880
and hopefully spark a discussion.

22
00:01:16,880 --> 00:01:21,600
So I'll go ahead and share my screen.

23
00:01:21,600 --> 00:01:24,280
Awesome. So welcome to the meetup.

24
00:01:24,280 --> 00:01:30,880
You can find all of the source code on GitHub here.

25
00:01:30,880 --> 00:01:35,640
Here's a good regular of ours, Graham, who will be joining momentarily.

26
00:01:37,640 --> 00:01:40,280
Welcome Graham. It's good to see you today.

27
00:01:40,280 --> 00:01:42,280
Hey.

28
00:01:42,280 --> 00:01:46,880
Well, we've got a lot of questions.

30
00:02:16,880 --> 00:02:21,880
Awesome. So just making sure everybody's on board.

31
00:02:21,880 --> 00:02:24,480
So here's the source code.

32
00:02:24,480 --> 00:02:30,280
And I essentially thought we could start with a question

33
00:02:30,280 --> 00:02:33,960
and spend most of the day doing data science today.

34
00:02:33,960 --> 00:02:39,520
So Graham will be getting to producers and processors

35
00:02:39,520 --> 00:02:42,040
in next week's meetup.

36
00:02:42,040 --> 00:02:47,120
For today, I thought we could answer the question,

37
00:02:47,120 --> 00:02:50,560
how can you run a profitable lab?

38
00:02:50,560 --> 00:02:56,120
So we've got this rich data set of lab analyses in Washington state.

39
00:02:56,120 --> 00:03:01,920
And so I was thinking, is there any way we can use this data set to help others

40
00:03:01,920 --> 00:03:04,600
either in Washington state or across the country,

41
00:03:04,600 --> 00:03:08,320
think about how they can run a profitable lab?

42
00:03:08,320 --> 00:03:12,160
So it always helps to start with a good question.

43
00:03:12,160 --> 00:03:14,160
So this is our question of the day.

44
00:03:18,160 --> 00:03:27,000
So without further ado, the first step is really just to read in the data.

45
00:03:27,000 --> 00:03:34,160
So I've already read in the data just to save us a minute or so.

46
00:03:34,160 --> 00:03:43,760
But it's simple enough to read in the packages.

47
00:03:43,760 --> 00:03:51,240
And here I just defined a few useful helper functions.

48
00:03:51,240 --> 00:03:55,960
And then to save ourselves some bandwidth,

49
00:03:55,960 --> 00:03:59,080
I'm just reading in one variable.

50
00:03:59,080 --> 00:04:06,880
So we've got this awesome data set of lab results in Washington state.

51
00:04:06,880 --> 00:04:11,960
And so you can find the link here above.

52
00:04:11,960 --> 00:04:16,360
And just to show you where I've stored the data,

53
00:04:16,360 --> 00:04:21,640
so in the cannabis data science repository,

54
00:04:21,640 --> 00:04:24,840
you can create a folder at your root

55
00:04:24,840 --> 00:04:29,040
called datasets, .datasets.

56
00:04:29,040 --> 00:04:36,240
And here I've stored these three sets of lab results,

57
00:04:36,240 --> 00:04:38,440
as well as the licensees data.

58
00:04:38,440 --> 00:04:46,800
So as you can see, we're working with about 2.2 or so gigabytes of data.

59
00:04:46,800 --> 00:04:48,320
So a nice good chunk.

60
00:04:48,320 --> 00:04:51,920
And there's so many data points here.

61
00:04:51,920 --> 00:04:57,680
We've got all these solvents and all the pesticide data is blank.

62
00:04:57,680 --> 00:05:00,240
So you can't use it anyways.

63
00:05:00,240 --> 00:05:02,680
You do have the cannabinoid data.

64
00:05:02,680 --> 00:05:04,720
So rich data set here.

65
00:05:04,720 --> 00:05:10,760
But I'll show you what you can do with just this one data point, the global ID.

66
00:05:10,760 --> 00:05:14,520
So here I've got the

67
00:05:14,520 --> 00:05:19,720
global ID, which is this one data point, the global ID.

68
00:05:19,720 --> 00:05:26,840
So we'll do a lot of legwork today with just this one variable.

69
00:05:26,840 --> 00:05:32,120
So basically, I've already read in the data here.

70
00:05:32,120 --> 00:05:39,400
This doesn't take very long, but maybe 30 seconds or so to read in this data set.

71
00:05:39,400 --> 00:05:42,640
And then we clean it.

72
00:05:42,640 --> 00:05:50,320
So we can identify all of the labs in Washington state.

73
00:05:50,320 --> 00:05:53,760
And this is sort of supplementary.

74
00:05:53,760 --> 00:05:56,560
We don't really need this step.

75
00:05:56,560 --> 00:06:02,520
But you can also find out information about the various labs

76
00:06:02,520 --> 00:06:07,360
by reading in the licensees data.

77
00:06:07,360 --> 00:06:15,440
That way we can, for example, look at specific factors.

78
00:06:15,440 --> 00:06:20,760
And then here, just add day, month, year variables.

79
00:06:20,760 --> 00:06:25,440
So a lot of cleaning and organizing to do.

80
00:06:25,440 --> 00:06:32,120
But we can get onto the data because that's what's more interesting to look at.

81
00:06:32,120 --> 00:06:36,040
So we can calculate our first data point.

82
00:06:36,040 --> 00:06:39,600
So just using the total number of samples,

83
00:06:39,600 --> 00:06:47,160
we can see which lab tested each sample based on the global ID.

84
00:06:47,160 --> 00:07:00,680
So for example, you can see that the global ID contains the lab's identifier.

85
00:07:00,680 --> 00:07:06,680
So these five lab results were tested by lab 10.

86
00:07:06,680 --> 00:07:15,960
So we can simply just do a count of how many samples all the various labs were testing.

87
00:07:15,960 --> 00:07:21,160
And since we know the total number of samples tested,

88
00:07:21,160 --> 00:07:25,120
we can see the proportion that each lab is testing.

89
00:07:25,120 --> 00:07:28,320
And so I just call that the market share here,

90
00:07:28,320 --> 00:07:33,280
so that's the number of samples that lab tested versus the total.
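(A minimal sketch of this market-share count, not the notebook's actual code. It assumes, hypothetically, that each global ID starts with a lab code followed by a dot, e.g. "WAL10.LR1"; `lab_code` and `market_share` are illustrative names and the sample IDs are made up.)
```python
from collections import Counter
def lab_code(global_id: str) -> str:
    # Assumption: the lab code is everything before the first dot.
    return global_id.split(".", 1)[0]
def market_share(global_ids):
    # Count samples per lab, then divide each count by the total.
    counts = Counter(lab_code(g) for g in global_ids)
    total = sum(counts.values())
    # most_common() orders labs from largest to smallest count.
    return {lab: n / total for lab, n in counts.most_common()}
shares = market_share(["WAL10.LR1", "WAL10.LR2", "WAL10.LR3", "WAL3.LR4"])
```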

91
00:07:33,280 --> 00:07:40,920
So without further ado, let's go ahead and print out our first statistics for the day.

92
00:07:40,920 --> 00:07:44,120
And here we have them.

93
00:07:44,120 --> 00:07:56,280
So these are a handful of laboratories that have or are operating in Washington state.

94
00:07:56,280 --> 00:08:04,240
And I just printed out the city, so that way you can just get a geographical perspective

95
00:08:04,240 --> 00:08:06,800
of where the labs may be.

96
00:08:06,800 --> 00:08:12,600
And just to sort these, I'm just going to essentially sort them

97
00:08:12,600 --> 00:08:16,040
by their total number of samples.

98
00:08:16,040 --> 00:08:21,240
That way, we can see what's the market share by laboratory

99
00:08:21,240 --> 00:08:24,600
and where in the state are they located.

100
00:08:24,600 --> 00:08:31,680
So you can see, Poulsbo is in what you would call Western Washington.

101
00:08:31,680 --> 00:08:34,840
So it's interesting the breakdown of Washington.

102
00:08:34,840 --> 00:08:40,320
It's really a different climate in Eastern versus Western Washington.

103
00:08:40,320 --> 00:08:46,720
And so I always thought it was interesting about where producers may be located.

104
00:08:46,720 --> 00:08:52,200
And I thought, OK, maybe laboratories can locate anywhere.

105
00:08:52,200 --> 00:08:59,480
But of course, people prefer a shorter distance for sending their samples.

106
00:08:59,480 --> 00:09:06,360
So there is a laboratory in Poulsbo, and they have the highest market share.

107
00:09:06,360 --> 00:09:16,080
Bellingham is in Northwestern Washington, so they're capturing that segment of the market.

108
00:09:16,080 --> 00:09:19,360
Redmond is near Seattle.

109
00:09:19,360 --> 00:09:25,600
And then, of course, Olympia is on the board and in a couple of places.

110
00:09:25,600 --> 00:09:35,440
So just a statistic, there's the market share and there's the cities they're in.

111
00:09:35,440 --> 00:09:42,280
So we always like to try to do some plots here.

112
00:09:42,280 --> 00:09:50,640
So here is simply a plot of samples by day.

113
00:09:50,640 --> 00:09:53,800
Just checking, am I still coming in OK for everyone?

114
00:09:58,320 --> 00:10:00,680
Just making sure we're still on the view.

115
00:10:00,680 --> 00:10:01,480
Awesome.

116
00:10:01,480 --> 00:10:10,320
So like I said, we've got a lot of ground to cover, so sorry if I talk a lot.

117
00:10:10,320 --> 00:10:19,680
So here we have samples by day, by lab.

118
00:10:19,680 --> 00:10:24,400
Quite noisy, so a little too noisy.

119
00:10:24,400 --> 00:10:29,720
So here I'm just doing the same plot, but by month.

120
00:10:29,720 --> 00:10:35,040
So we're just aggregating all the samples by month.
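(A rough sketch of the monthly aggregation, assuming each sample is available as a hypothetical (lab, date) pair; the lab names and dates are made up for illustration.)
```python
from collections import Counter
from datetime import date
def samples_by_month(records):
    # Key each sample by its lab and "YYYY-MM" month, then count per key.
    return Counter((lab, d.strftime("%Y-%m")) for lab, d in records)
monthly = samples_by_month([
    ("lab_10", date(2021, 1, 5)),
    ("lab_10", date(2021, 1, 20)),
    ("lab_10", date(2021, 2, 1)),
    ("lab_3", date(2021, 1, 9)),
])
```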

121
00:10:35,040 --> 00:10:46,880
So here you can see we've got labs operating all over the board with some testing,

122
00:10:46,880 --> 00:10:57,120
you know, almost some months more, but, you know, some up to around 3,000 samples a month.

123
00:10:57,120 --> 00:10:59,640
And then we can plot by year.

124
00:10:59,640 --> 00:11:05,440
Just to reduce a bit of the noise even further.

125
00:11:05,440 --> 00:11:11,960
And so here you see samples tested by lab by year.

126
00:11:11,960 --> 00:11:21,440
Just really just sort of adding statistics upon statistics here.

127
00:11:21,440 --> 00:11:24,920
So, right, we started with, OK, what's the total number of samples

128
00:11:24,920 --> 00:11:30,720
tested by lab? Now we want to try to break that out by year.

129
00:11:30,720 --> 00:11:41,880
So doing that, and, you know, we're seeing, right, we've got some labs testing around a little less than 30,000 samples per year.

130
00:11:41,880 --> 00:11:49,680
Looks like the blue lab, light blue, is, you know, on the trajectory towards 30,000.

131
00:11:49,680 --> 00:11:57,680
Then you've got a handful of labs testing around 15,000 or so samples a year.

132
00:11:57,680 --> 00:12:10,840
And then we've got another handful that are testing below 10,000 per year with this gray on a slight positive trajectory.

133
00:12:10,840 --> 00:12:14,280
So nothing too interesting yet.

134
00:12:14,280 --> 00:12:20,080
Just, you know, these are just the samples being tested.

135
00:12:20,080 --> 00:12:26,880
So where things get interesting in data science is when you can start adding supplementary statistics.

136
00:12:26,880 --> 00:12:33,880
So for those of you that are ambitious, I'll let you calculate these extra statistics on your own.

137
00:12:33,880 --> 00:12:39,880
So this is where you can start to break down samples tested by the various analyses

138
00:12:39,880 --> 00:12:44,880
that they may be doing, or you could even look at the failure rate.

139
00:12:44,880 --> 00:12:55,880
So these are quality assurance or quality control tests that samples need to go through to make it onto the shelves.

140
00:12:55,880 --> 00:13:03,480
So it would be interesting from a producer processor point of view to know what the failure rate may be.

141
00:13:03,480 --> 00:13:12,480
So that may be a bit more for next week. So next week, we'll look at the profitability of producers and processors.

142
00:13:12,480 --> 00:13:17,480
So that'll come into play then.

143
00:13:17,480 --> 00:13:28,080
So we'll start supplementing this with data here in a second, but we can start using some of our statistical techniques.

144
00:13:28,080 --> 00:13:34,680
So, for example, we've done forecasting with ARIMA models.

145
00:13:34,680 --> 00:13:41,880
So here we can essentially forecast the number of samples tested by lab.

146
00:13:41,880 --> 00:13:45,680
And I was thinking,

147
00:13:45,680 --> 00:13:53,080
we've been talking about the long term here, so I figured let's just go ahead and do a long term forecast.

148
00:13:53,080 --> 00:13:58,280
So we'll forecast today through the next five years.

149
00:13:58,280 --> 00:14:02,280
So through 2027.

150
00:14:02,280 --> 00:14:09,280
Cecilia joining.

151
00:14:09,280 --> 00:14:17,480
So I'll go ahead and run this forecast since this forecasting model takes a second to run.

152
00:14:17,480 --> 00:14:22,080
And I'll talk about some of the next data that we'll collect.

153
00:14:22,080 --> 00:14:26,080
So where am I going with all of this?

154
00:14:26,080 --> 00:14:33,880
Right. We're getting the samples tested per year, but I'm essentially interested in, OK, how do you run a profitable lab?

155
00:14:33,880 --> 00:14:39,880
Just because you're testing a lot of samples doesn't necessarily mean you're profitable or not.

156
00:14:39,880 --> 00:14:46,080
So I was thinking we could supplement this with some of the data that we've collected.

157
00:14:46,080 --> 00:14:52,880
And we could supplement this with the price data in Washington state.

158
00:14:52,880 --> 00:15:01,480
So essentially what I've done is I've just collected a big data set here of laboratories.

159
00:15:01,480 --> 00:15:15,280
And I just went to each laboratory's website, went to their pricing page

160
00:15:15,280 --> 00:15:18,280
and grabbed

161
00:15:18,280 --> 00:15:26,280
the price of an I-502 panel, so that would be your quality assurance testing panel.

162
00:15:26,280 --> 00:15:34,680
Some of the labs have different prices, depending on if it's a flower or if it's a concentrate or what have you.

163
00:15:34,680 --> 00:15:38,080
So I just chose the simplest number.

164
00:15:38,080 --> 00:15:41,280
So that's sort of an imperfection.

165
00:15:41,280 --> 00:15:48,080
So the price may be overstated because, for example,

166
00:15:48,080 --> 00:15:54,480
products such as sodas, I believe.

167
00:15:54,480 --> 00:16:01,680
I may walk that back, but certain products don't need as many tests as others, so their prices are cheaper.

168
00:16:01,680 --> 00:16:07,680
So I generally got the price for a flower or concentrate test.

169
00:16:07,680 --> 00:16:11,380
And so I collected all the prices for the various labs.

170
00:16:11,380 --> 00:16:17,680
And since Cannlytics is here to make all this data easily accessible for you,

171
00:16:17,680 --> 00:16:22,680
you can actually access all of this data

172
00:16:22,680 --> 00:16:37,080
through an API, so you can ping cannlytics.com/api/labs and you can get all of the lab data.

173
00:16:37,080 --> 00:16:43,480
And I haven't built in very many queries yet, but we can at least query by the state.

174
00:16:43,480 --> 00:16:50,680
So we'll do that here momentarily, but it looks like our forecasts are finished.

175
00:16:50,680 --> 00:17:02,980
So here's a forecast for the next five years of samples tested by the various laboratories in Washington state.

176
00:17:02,980 --> 00:17:12,880
So as you can see, we're forecasting there is a group of laboratories that are testing around 500 or so samples a month.

177
00:17:12,880 --> 00:17:19,380
Another set testing between a thousand and two thousand samples per month.

178
00:17:19,380 --> 00:17:27,580
And then there's a couple of labs here at the top that are testing around 2500 samples per month.

179
00:17:27,580 --> 00:17:33,880
But keep in mind these are samples tested, not necessarily profitability.

180
00:17:33,880 --> 00:17:45,880
So we need to keep in mind price and cost if we really want to do a deep dive on profitability.

181
00:17:45,880 --> 00:17:50,180
So let's read in this price data.

182
00:17:50,180 --> 00:17:58,580
So simple enough. So here I'm just making a request to the Cannlytics API,

183
00:17:58,580 --> 00:18:10,180
/labs, and passing the parameter that I want the state to be Washington since we just want to look at Washington labs here.
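(A hedged sketch of such a request using only the standard library. The endpoint path is the one read out in the talk; the "https" scheme, the query parameter name "state", and the value "WA" are assumptions, and the response shape is not specified here.)
```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen
BASE_URL = "https://cannlytics.com/api/labs"  # path as read out in the talk
def labs_url(state: str) -> str:
    # Assumption: labs are filtered with a query parameter named "state".
    return f"{BASE_URL}?{urlencode({'state': state})}"
def fetch_labs(state: str):
    # Performs the HTTP GET and decodes the body as JSON.
    with urlopen(labs_url(state), timeout=30) as response:
        return json.load(response)
url = labs_url("WA")
```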

184
00:18:10,180 --> 00:18:13,980
So just going to make a quick request.

185
00:18:13,980 --> 00:18:17,280
And it should already have executed.

186
00:18:17,280 --> 00:18:21,780
And there are all of our labs.

187
00:18:21,780 --> 00:18:30,880
And in particular, we just want to look at the prices that the labs are charging for their panels.

188
00:18:30,880 --> 00:18:32,780
And so.

189
00:18:32,780 --> 00:18:36,180
Here you see, you know, we've got a range.

190
00:18:36,180 --> 00:18:40,380
I couldn't find prices for all the labs. That's OK.

191
00:18:40,380 --> 00:18:44,880
We'll just look at the labs where we do have price data.

192
00:18:44,880 --> 00:18:49,380
As you can see, we've got a range here, 80 to 120.

193
00:18:49,380 --> 00:18:54,880
It looks like around 100 is around the average.

194
00:18:54,880 --> 00:19:03,380
Not certain if we can just take a quick average here.

195
00:19:03,380 --> 00:19:06,880
Yeah, so the average is 99.

196
00:19:06,880 --> 00:19:26,180
So this is a much lower price than you'll see in many other states because Washington state doesn't yet mandate pesticide or heavy metal testing, whereas a lot of the other states do.

197
00:19:26,180 --> 00:19:29,980
And I'll explain further on

198
00:19:29,980 --> 00:19:36,680
as to a reason why, but it's really hard to change the rules once you've implemented them.

199
00:19:36,680 --> 00:19:44,380
So a lot of states, their initial rules required heavy metal or pesticide testing.

200
00:19:44,380 --> 00:19:55,780
Washington State, one of the earlier states to start laboratory testing, just started with some of the more basic tests.

201
00:19:55,780 --> 00:20:02,980
So cannabinoids, a screen for mycotoxins, microbiology, and foreign matter.

202
00:20:02,980 --> 00:20:07,580
And they do test concentrates for residual solvents.

203
00:20:07,580 --> 00:20:12,480
So not as many tests that are required.

204
00:20:12,480 --> 00:20:18,880
So you're looking at around $100 for a quality assurance test.

205
00:20:18,880 --> 00:20:30,680
Well, assuming that all of the labs keep their prices the same, then what would their revenue be for the next five years?

206
00:20:30,680 --> 00:20:33,180
So this is where things get interesting, right?

207
00:20:33,180 --> 00:20:42,180
So we now get to combine price data with data on the total number of samples tested.
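(A minimal sketch of combining the two series, assuming a hypothetical mapping of lab to forecast monthly samples and a mapping of lab to panel price. The lab names and figures are made up for illustration; labs with no known price are skipped.)
```python
def forecast_revenue(forecast_samples, prices):
    # forecast_samples: {lab: [samples per month, ...]}
    # prices: {lab: price per panel}, held constant over the forecast.
    return {
        lab: [n * prices[lab] for n in samples]
        for lab, samples in forecast_samples.items()
        if lab in prices  # skip labs where no price was found
    }
revenue = forecast_revenue(
    {"lab_a": [2500, 2600], "lab_b": [500, 450], "lab_c": [1000, 1000]},
    {"lab_a": 120, "lab_b": 80},
)
```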

208
00:20:42,180 --> 00:20:45,980
So we can do just that.

209
00:20:45,980 --> 00:20:53,580
And we can have a forecast of monthly.

210
00:20:53,580 --> 00:21:01,280
This should actually read monthly revenue by labs in Washington.

211
00:21:01,280 --> 00:21:05,780
So let's go ahead and make that quick

212
00:21:05,780 --> 00:21:09,180
adjustment. OK, so here we have it.

213
00:21:09,180 --> 00:21:22,480
So revenue is ranging anywhere from below 50,000 a month to above 300,000, or not above that except in certain months,

214
00:21:22,480 --> 00:21:31,080
but really, like above 250,000 for certain laboratories here.

215
00:21:31,080 --> 00:21:42,580
A lot of times we would like to think about this on a year to year basis so we can plot this by year.

216
00:21:42,580 --> 00:21:47,980
So we've talked about our forecasting model in the past.

217
00:21:47,980 --> 00:22:00,480
I don't think this package, pmdarima's auto_arima, is handling month effects correctly.

218
00:22:00,480 --> 00:22:04,980
And so I'm having to include all of the months.

219
00:22:04,980 --> 00:22:09,080
And so you can't really interpret the regression.

220
00:22:09,080 --> 00:22:17,880
So long story short, just keep in mind that I don't have full confidence in our forecasting model here.

221
00:22:17,880 --> 00:22:22,780
But some forecasts are better than no forecasts.

222
00:22:22,780 --> 00:22:28,280
So keep that in mind. Just want to bring up that

223
00:22:28,280 --> 00:22:34,980
this isn't estimating how I would prefer to estimate our forecast.

224
00:22:34,980 --> 00:22:44,480
So hopefully I'll find some time this spring or later this winter to redo this forecast.

225
00:22:44,480 --> 00:22:51,280
But long story short, we can still get some approximations for today.

226
00:22:51,280 --> 00:22:56,580
So we've got our prices, we've got our year forecast.

227
00:22:56,580 --> 00:23:06,980
So we're forecasting some labs will be cresting 3 million in revenue per year.

228
00:23:06,980 --> 00:23:14,180
Whereas, you know, you're looking at this lab:

229
00:23:14,180 --> 00:23:20,280
it's just a smaller lab, and they're going to have a little less than half a million in revenue.

230
00:23:20,280 --> 00:23:27,580
And then the second that we can measure a little less than one million in revenue.

231
00:23:27,580 --> 00:23:31,680
And then you see where the others fall out as well.

232
00:23:31,680 --> 00:23:42,880
The only major prediction so far is for this lab, lab number 22,

233
00:23:42,880 --> 00:23:53,880
for which, for whatever reason, our forecasting model predicts that their revenue is going to diminish greatly after 2023.

234
00:23:53,880 --> 00:24:00,280
So that would be the only real thing I would really point out so far is, you know,

235
00:24:00,280 --> 00:24:09,680
if you're running lab 22, I would look at your fundamentals and see maybe it's just the forecasting model.

236
00:24:09,680 --> 00:24:15,980
Something's odd with the forecasting model, but that's sort of...

237
00:24:15,980 --> 00:24:23,080
Something's going on there, because you see the other forecasts are relatively stable.

238
00:24:23,080 --> 00:24:29,780
15 million, lab three around 14.6 million, and then down the line.

239
00:24:29,780 --> 00:24:33,180
So long story short.

240
00:24:33,180 --> 00:24:45,980
I find it interesting that the lab with the second highest predicted revenue doesn't actually have the second highest forecasted total samples tested.

241
00:24:45,980 --> 00:24:52,180
So this would be sort of my second major

242
00:24:52,180 --> 00:25:01,480
observation: lab number 21 is forecasted to test almost

243
00:25:01,480 --> 00:25:11,780
150,000 samples over the next five years, but they have the lowest price.

244
00:25:11,780 --> 00:25:21,080
And we need to do a bit more research, but it's not clear to me that price

245
00:25:21,080 --> 00:25:25,780
is a major factor for boosting revenue.

246
00:25:25,780 --> 00:25:37,580
So, for example, these are mandated quality control tests. So producers' and processors' demand for tests

247
00:25:37,580 --> 00:25:43,980
may be fairly, I mean, to a large extent,

248
00:25:43,980 --> 00:25:48,880
well, we'll include the maybe: it may be fairly inelastic.

249
00:25:48,880 --> 00:25:56,380
Right, you have to get the tests one way or the other, so you pretty much have to pay your price.

250
00:25:56,380 --> 00:26:03,940
And, you know, maybe you can go and travel to another lab, but these labs,

251
00:26:03,940 --> 00:26:10,980
you saw, they have different geographic locations, so you may not necessarily want to

252
00:26:10,980 --> 00:26:17,680
have to work out logistics to send your samples to a lab across the state.

253
00:26:17,680 --> 00:26:22,580
So that could play a factor. So there are a lot of factors that

254
00:26:22,580 --> 00:26:26,980
may suggest there may be price inelasticity

255
00:26:26,980 --> 00:26:32,140
for quality control testing.

256
00:26:32,140 --> 00:26:39,080
I think there's more research to be done, but I would naively suggest to lab number 21,

257
00:26:39,080 --> 00:26:45,980
you know, that they may want to think about increasing their price, right? Because

258
00:26:45,980 --> 00:26:53,440
the top two labs have higher prices, and,

259
00:26:53,440 --> 00:27:00,780
you know, it's not clear that lowering their price is, you know, increasing their revenue.

260
00:27:00,780 --> 00:27:08,180
It may, you know, their low price may be one of the reasons why they're testing a lot of samples.

261
00:27:08,180 --> 00:27:14,840
I just think it would be interesting to see, OK, you know, if they did raise their price, you know, could they?

262
00:27:14,840 --> 00:27:17,480
So long story short.

263
00:27:17,480 --> 00:27:22,640
I'm not convinced that price necessarily shakes out with revenue.

264
00:27:22,640 --> 00:27:31,340
It does seem like there's a pretty strong correlation here between the number of samples you're testing and your revenue.

265
00:27:31,340 --> 00:27:36,740
But once again, not a perfect correlation there.

266
00:27:36,740 --> 00:27:38,440
So.

267
00:27:38,440 --> 00:27:44,920
Now on to I think the most interesting statistic of the day.

268
00:27:44,920 --> 00:27:52,740
And that is profitability, and so we'll need to take into account fixed costs.

269
00:27:52,740 --> 00:28:05,180
And what I was really trying to do was collect instrument data for various instruments that you would expect to see at a laboratory.

270
00:28:05,180 --> 00:28:14,020
However, it can be tricky to find prices for instruments, so prices vary, and

271
00:28:14,020 --> 00:28:19,140
it wouldn't surprise me if you get a lab-by-lab quote on a price.

272
00:28:19,140 --> 00:28:30,680
So long story short, I don't have a good measure of instrument prices, so I'm just going to have to do my best

273
00:28:30,680 --> 00:28:45,180
and use my Bayesian prior on what total fixed costs may be, and we'll sort of do like they do in finance and do a few different projections: a low, a medium, and a high.

274
00:28:45,180 --> 00:28:54,840
So I think, you know, was there a question or comment?

275
00:28:54,840 --> 00:28:58,740
So long story short, we'll start with the low projection.

276
00:28:58,740 --> 00:29:07,520
So keep in mind, Washington State doesn't have pesticide and heavy metals mandated, which would increase the cost of testing.

277
00:29:07,520 --> 00:29:14,240
So if you need those additional instruments, your fixed costs are going to increase.

278
00:29:14,240 --> 00:29:19,480
But let's start with the low projection. Just say, OK, let's say

279
00:29:19,480 --> 00:29:27,880
it costs one million to open a lab.

280
00:29:27,880 --> 00:29:37,880
Well, essentially, what I've done here is I've taken the projected revenue that

281
00:29:37,880 --> 00:29:42,720
the lab would earn over the next five years,

282
00:29:42,720 --> 00:29:49,480
assuming that the instruments fully depreciate over five years.

283
00:29:49,480 --> 00:29:52,980
So here we've factored in the total fixed costs.

284
00:29:52,980 --> 00:30:03,780
And so that may not necessarily be the case. These instruments do see a lot of wear and tear, and the technology is always advancing.

285
00:30:03,780 --> 00:30:09,040
People use scientific instruments from way back, though.

286
00:30:09,040 --> 00:30:14,620
So it's not uncommon to get a used scientific instrument.

287
00:30:14,620 --> 00:30:23,580
So there could be some resale value of these, but we'll just assume they fully depreciate over five years.

288
00:30:23,580 --> 00:30:34,180
So what I'm basically estimating here is the maximum variable cost per sample.

289
00:30:34,180 --> 00:30:38,580
So we're subtracting out fixed costs.

290
00:30:38,580 --> 00:30:50,620
So that means, basically, right, you need to keep your revenue above your fixed costs plus your variable costs.

291
00:30:50,620 --> 00:31:00,720
So here is a measure of what a lab's maximum variable cost could be to remain profitable.
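(A small sketch of the break-even arithmetic as described, not the notebook's actual code. It assumes a constant price and full depreciation of the fixed cost over the forecast window; the $100 price and 100,000 samples are illustrative figures, with fixed costs of 1, 5, and 15 million as the low, medium, and high cases.)
```python
def max_variable_cost(price, total_samples, fixed_cost):
    # Break-even: price * n = fixed_cost + v * n, so v = price - fixed_cost / n.
    # A negative result means the lab cannot break even at that price.
    return price - fixed_cost / total_samples
low = max_variable_cost(price=100, total_samples=100_000, fixed_cost=1_000_000)
medium = max_variable_cost(price=100, total_samples=100_000, fixed_cost=5_000_000)
high = max_variable_cost(price=100, total_samples=100_000, fixed_cost=15_000_000)
```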

292
00:31:00,720 --> 00:31:09,160
And keep in mind, we don't have a measure of prices for these labs, so just ignore these.

293
00:31:09,160 --> 00:31:14,660
So these are what we're focusing on.

294
00:31:14,660 --> 00:31:19,620
So long story short,

295
00:31:19,620 --> 00:31:26,320
this would include all of your variable costs, so all of your supplies,

296
00:31:26,320 --> 00:31:32,860
say, plus your instrument fixes, so say your instrument breaks down and needs a repair.

297
00:31:32,860 --> 00:31:37,500
That would be a variable cost plus all of your labor.

298
00:31:37,500 --> 00:31:48,260
So you need a well-educated labor force to run your laboratory.

299
00:31:48,260 --> 00:31:54,760
So that can often come with high hourly wages for your employees.

300
00:31:54,760 --> 00:32:07,800
So long story short, you know, these labs will need to keep their variable costs per sample under these dollar amounts to remain profitable.

301
00:32:07,800 --> 00:32:12,360
So as you can see, as

302
00:32:12,360 --> 00:32:16,660
labs can test more and more samples,

303
00:32:16,660 --> 00:32:23,460
the room for increased variable costs increases.

304
00:32:23,460 --> 00:32:30,060
So this has a couple implications here.

305
00:32:30,060 --> 00:32:39,960
One is profit, right? So let's say all of the labs, just picking out a number,

306
00:32:39,960 --> 00:32:45,920
say they're all operating with sixty dollars variable cost per sample.

307
00:32:45,920 --> 00:32:55,160
Well, you know, lab 18 wouldn't be profitable if variable costs are sixty dollars per sample,

308
00:32:55,160 --> 00:33:00,560
whereas, you know, lab four,

309
00:33:00,560 --> 00:33:11,160
at sixty dollars per sample cost, they could, you know, they could be borderline.

310
00:33:11,160 --> 00:33:20,660
Well, I guess they're just testing a lot fewer samples, but they could have a higher profit margin than lab 22.

311
00:33:20,660 --> 00:33:26,600
So, you know, they've got, you know, a bit more room for their variable costs there.

312
00:33:26,600 --> 00:33:30,120
We're just going to say, let's just redo

313
00:33:30,120 --> 00:33:34,060
these estimates with a somewhat higher fixed cost.

314
00:33:34,060 --> 00:33:41,920
So here we'll assume a fixed cost of five million dollars to open a laboratory.

315
00:33:41,920 --> 00:33:47,360
You know, today, so say all of these labs had to expend five million dollars today.

316
00:33:47,360 --> 00:33:54,100
You know, what would their variable costs need to be over the next five years?

317
00:33:54,100 --> 00:34:02,100
So we see that if each lab had to expend five million dollars today,

318
00:34:02,100 --> 00:34:10,960
then, you know, lab four and 18, you know, they couldn't be profitable over the next five years.

319
00:34:10,960 --> 00:34:20,160
That's a high fixed cost. But it just would be, you know, sort of a lesson to people thinking about opening a lab.

320
00:34:20,160 --> 00:34:24,660
You know, if you're looking at fixed costs of around five million, you know,

321
00:34:24,660 --> 00:34:33,160
you're going to need to test, you know, more than, you know, more than 30,000 samples a year.

322
00:34:33,160 --> 00:34:43,660
And, you know, you may want to start looking at, you know, at these labs as to what may be feasible.

323
00:34:43,660 --> 00:34:47,860
Keep in mind that they need to keep their variable costs low.

324
00:34:47,860 --> 00:34:56,420
So, you know, if they had five million dollars of fixed costs, you know, they'll need to keep their variable costs,

325
00:34:56,420 --> 00:35:01,720
you know, below, you know, twenty four and twenty eight dollars per sample.
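The break-even figure discussed here can be sketched as a small calculation: the most a lab can spend per sample in variable costs and still cover its fixed cost. This is a minimal sketch, not the script shown in the meetup. The function name, the price, and the sample volume are assumptions chosen for illustration; only the five million dollar fixed cost and the five-year horizon come from the talk.

```python
def max_variable_cost_per_sample(revenue, fixed_cost, total_samples):
    """Return the break-even variable cost per sample.

    Profit = revenue - fixed_cost - variable_cost * total_samples,
    so profit is zero when
    variable_cost = (revenue - fixed_cost) / total_samples.
    """
    if total_samples <= 0:
        raise ValueError("total_samples must be positive")
    return (revenue - fixed_cost) / total_samples


# Hypothetical lab: $100 per test, 30,000 samples a year for five years.
price = 100
total_samples = 30_000 * 5
revenue = price * total_samples   # 15,000,000 over five years
fixed_cost = 5_000_000            # fixed cost assumed in the talk

break_even = max_variable_cost_per_sample(revenue, fixed_cost, total_samples)
print(f"Maximum variable cost per sample: ${break_even:.2f}")
```

A negative result would mean the lab cannot cover its fixed cost at any variable cost, which is the situation described for the labs that couldn't be profitable at this fixed cost.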

326
00:35:01,720 --> 00:35:09,120
So that may be getting pretty tight with all the labor you need to employ.

327
00:35:09,120 --> 00:35:16,720
And as you can see, you know, lab 21 has slightly higher.

328
00:35:16,720 --> 00:35:23,800
And then, you know, lab three and lab seven have good room there.

329
00:35:23,800 --> 00:35:34,200
So, you know, the lower they can keep their variable costs, the higher their profit will be.

330
00:35:34,200 --> 00:35:42,600
And then, for example, I've heard an estimate that in Massachusetts,

331
00:35:42,600 --> 00:35:47,400
you know, it would cost 15 million to open a laboratory.

332
00:35:47,400 --> 00:35:58,000
And, you know, at 15 million, really, none of the laboratories in Washington would be profitable at their given price.

333
00:35:58,000 --> 00:36:03,160
So keep in mind that in Massachusetts, they also have

334
00:36:03,160 --> 00:36:06,500
pesticide and heavy metal screening.

335
00:36:06,500 --> 00:36:14,800
So if you did need high capital expenditure to do heavy metal and pesticide screening,

336
00:36:14,800 --> 00:36:23,500
then that's why labs in states such as Massachusetts charge higher prices.

337
00:36:23,500 --> 00:36:27,620
So that's the low, medium and high.

338
00:36:27,620 --> 00:36:35,300
And we'll just stick with the medium forecast for now.

339
00:36:35,300 --> 00:36:45,400
And we've been doing a lot of numbers, so we'll just look at a few plots here and then I'll share my takeaway with you.

340
00:36:45,400 --> 00:36:51,900
And we can start to talk about this because I know it was a lot of number crunching for one day.

341
00:36:51,900 --> 00:36:55,700
So sorry if it's been

342
00:36:55,700 --> 00:37:02,960
a little abstract or much, but here is just a quick regression.

343
00:37:02,960 --> 00:37:09,300
We only have seven observations, so, you know, make of it what you will.

344
00:37:09,300 --> 00:37:18,340
But these were the forecasts that we made for the various labs over the next five years,

345
00:37:18,340 --> 00:37:21,760
regressed on their price.

346
00:37:21,760 --> 00:37:27,500
And basically what I can show you is right here, you know, two labs, you know,

347
00:37:27,500 --> 00:37:33,160
where they each have a price of $100 per test.

348
00:37:33,160 --> 00:37:41,300
But, you know, one lab is forecasted to make less than 2.5 million over the next five years,

349
00:37:41,300 --> 00:37:45,360
where the other lab is projected to make,

350
00:37:45,360 --> 00:37:49,660
you know, around 15 million in the next five years.

351
00:37:49,660 --> 00:38:02,060
So long story short is I don't know how much price competition will work in boosting your sales.
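The kind of quick regression described here, forecast revenue regressed on price, can be sketched with plain ordinary least squares. This is only an illustration: the seven price and revenue pairs below are made up to resemble the plot being discussed (including two labs at the same $100 price with very different forecasts) and are not the meetup's data, and `ols_fit` is a hypothetical helper name.

```python
def ols_fit(x, y):
    """Fit y = intercept + slope * x by ordinary least squares.

    Returns (slope, intercept, r_squared).
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x has no variation")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    syy = sum((yi - mean_y) ** 2 for yi in y)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # With one regressor, R-squared is the squared correlation.
    r_squared = (sxy ** 2) / (sxx * syy) if syy > 0 else 1.0
    return slope, intercept, r_squared


# Made-up observations: price per test (dollars) and five-year
# forecast revenue (millions of dollars) for seven hypothetical labs.
prices = [80, 90, 100, 100, 110, 120, 120]
revenues = [6.0, 4.0, 2.4, 15.0, 8.0, 12.0, 5.0]

slope, intercept, r_squared = ols_fit(prices, revenues)
print(f"slope={slope:.3f}, intercept={intercept:.3f}, R^2={r_squared:.3f}")
```

With so few observations, a low R-squared mostly says that price alone explains little of the spread in revenue, which is the point being made about price competition.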

352
00:38:02,060 --> 00:38:04,700
So.

353
00:38:04,700 --> 00:38:14,560
So long story short, you know, if I was doing pricing, I mean, why not, you know, just crank it up, right?

354
00:38:14,560 --> 00:38:23,460
Because that's what it looks like some labs have done, and they've just chosen a price of 120.

355
00:38:23,460 --> 00:38:27,900
Because it looks like they've maybe got inelastic demand.

356
00:38:27,900 --> 00:38:29,620
I'm not telling anyone what to price.

357
00:38:29,620 --> 00:38:35,700
I'm just saying, you know, if I was running a lab.

358
00:38:35,700 --> 00:38:42,500
So you can't compete on price or at least I'm not certain you can.

359
00:38:42,500 --> 00:38:46,100
So at least not to a great extent.

360
00:38:46,100 --> 00:38:50,860
So, you know, what are some of the other factors you can compete on?

361
00:38:50,860 --> 00:39:03,020
Well, here is essentially, you know, the maximum variable cost per sample regressed against revenue.

362
00:39:03,020 --> 00:39:07,760
And so here we see a positive correlation.

363
00:39:07,760 --> 00:39:18,560
Keep in mind, we only have seven observations, but essentially what I'm starting to kind of gather from this data is, you know,

364
00:39:18,560 --> 00:39:26,700
it looks like, you know, if you're able to sort of give yourself a bit more cushion and, you know,

365
00:39:26,700 --> 00:39:37,700
have room to have a higher variable cost and still make a profit, then that can increase your revenue.

366
00:39:37,700 --> 00:39:45,200
And just to kind of show a few more figures here to flesh out the story.

367
00:39:45,200 --> 00:39:52,000
Here is total samples against variable costs.

368
00:39:52,000 --> 00:40:03,360
So it looks like, you know, if you are able to have a higher variable cost, it looks like you're able to test more samples.

369
00:40:03,360 --> 00:40:12,660
And it also looks, I mean, I would assume there is sort of simultaneous causation here in that

370
00:40:12,660 --> 00:40:17,240
the more samples you test,

371
00:40:17,240 --> 00:40:26,060
that's not the best plot ever, but the more samples you test, the

372
00:40:26,060 --> 00:40:31,660
higher your variable cost can be.

373
00:40:31,660 --> 00:40:36,760
So let's just print this back out here.

374
00:40:36,760 --> 00:40:39,440
And I'll try to get to my takeaway here.

375
00:40:39,440 --> 00:40:44,240
So basically,

376
00:40:44,240 --> 00:40:53,360
We're starting to get a little abstract and a little imperfect here, but basically what I'm trying to argue is.

377
00:40:53,360 --> 00:40:57,060
You know, to run a profitable lab.

378
00:40:57,060 --> 00:41:05,600
I think focusing on your variable costs is one of the most important

379
00:41:05,600 --> 00:41:14,000
factors that's under your control. You can only play so many pricing games.

380
00:41:14,000 --> 00:41:22,800
And if you look at the laboratories' websites, you see that they are engaged in pricing strategies.

381
00:41:22,800 --> 00:41:26,860
They do, for example, they do bulk deals.

382
00:41:26,860 --> 00:41:34,300
There's various discounts and packages, so they're engaged in pricing strategies,

383
00:41:34,300 --> 00:41:39,600
but I'm not certain that those are effective strategies.

384
00:41:39,600 --> 00:41:44,760
And I would like to just introduce a concept to you.

385
00:41:44,760 --> 00:41:47,660
Price as a signal.

386
00:41:47,660 --> 00:41:56,500
So, you know, we think, oh, yes, just increasing the price or decreasing the price, and that'll change quantity.

387
00:41:56,500 --> 00:42:02,520
But retailers are hip to this, and

388
00:42:02,520 --> 00:42:08,920
it may make sense for a laboratory to think about this: price can be a signal of quality.

389
00:42:08,920 --> 00:42:20,860
So if you charge a higher price, that could just be a signal to others that you're higher quality.

390
00:42:20,860 --> 00:42:30,500
And so, you know, if you're charging a price of 80, you know, and other people are charging a price of 120.

391
00:42:30,500 --> 00:42:38,600
That may be signaling to people that you're doing poor quality tests,

392
00:42:38,600 --> 00:42:45,900
whereas you're just trying that pricing strategy out because you think, you know, it may be profitable.

393
00:42:45,900 --> 00:42:50,460
So, you know, that's a factor.

394
00:42:50,460 --> 00:42:58,060
Of course, you know, the higher your price is, the more wiggle room you have

395
00:42:58,060 --> 00:43:06,560
to keep your variable costs under control while making a profit.

396
00:43:06,560 --> 00:43:13,200
So that's sort of my main takeaway from today,

397
00:43:13,200 --> 00:43:25,900
and that was sort of the lesson I was trying to convey: it's important to keep your variable costs, right,

398
00:43:25,900 --> 00:43:31,760
gloves, solvents, disposables, it's important to keep those low.

399
00:43:31,760 --> 00:43:36,800
And in economics, there's a mathematical proof to it.

400
00:43:36,800 --> 00:43:47,160
But essentially, there's an argument that cost minimization results in the same outcome as profit maximization.

401
00:43:47,160 --> 00:43:54,100
So instead of just... Question or comment?

402
00:43:54,100 --> 00:43:59,360
Yes, there was a question about the recording. Yes, there will be a recording available.

403
00:43:59,360 --> 00:44:05,300
I'm sorry. I'm sorry, actually, you know, I'm running into two calls and I wanted to ask a lot of questions there.

404
00:44:05,300 --> 00:44:08,760
But just because of like my office work results there.

405
00:44:08,760 --> 00:44:12,560
So I could not. But at the last, like I thought, OK, maybe it's only 10 minutes.

406
00:44:12,560 --> 00:44:18,640
And if you guys disconnect, I need to have that video like audio, some recording maybe.

407
00:44:18,640 --> 00:44:24,360
Because next week I will come to, you know, join this call.

408
00:44:24,360 --> 00:44:27,500
And with a lot of questions. Awesome.

409
00:44:27,500 --> 00:44:30,960
We'd love to have you next week and I'll get the recording up.

410
00:44:30,960 --> 00:44:37,760
And then next week we'll start talking.

411
00:44:37,760 --> 00:44:45,000
Yes, so bring your questions next time and we'll make sure to spend a ton of time answering your questions.

412
00:44:45,000 --> 00:44:49,600
And we'll be focusing on producers and processors next time.

413
00:44:49,600 --> 00:45:00,100
But long story short, you know, instead of just being obsessed with profit and prices and,

414
00:45:00,100 --> 00:45:05,440
you know, how many total samples are you testing and this and that,

415
00:45:05,440 --> 00:45:11,340
you could also be just as successful by trying to be efficient.

416
00:45:11,340 --> 00:45:19,540
So this is where I say, you know, just if you just have this laser focus on efficiencies and optimizing your workflow,

417
00:45:19,540 --> 00:45:24,960
then you can, you know, you can run a profitable lab here.

418
00:45:24,960 --> 00:45:33,960
Right. I mean, you know, it's hard to imagine that these laboratories aren't going to be profitable.

419
00:45:33,960 --> 00:45:39,540
It's possible. Right. It's expensive to operate a laboratory.

420
00:45:39,540 --> 00:45:49,540
So, for example, you know, lab 21, you know, they could bring in 12 million dollars, you know, in the next five years.

421
00:45:49,540 --> 00:45:59,340
But if they can't keep their variable costs under forty six dollars per sample, then they're not going to be profitable.

422
00:45:59,340 --> 00:46:12,640
And forty six dollars per sample, I mean, we're talking about things like disposables and variable costs include things like your employee labor.

423
00:46:12,640 --> 00:46:24,800
Then, you know, every cent matters, especially on things like your disposables.

424
00:46:24,800 --> 00:46:34,040
And basically, you know, there's no need to cut corners because it does look like

425
00:46:34,040 --> 00:46:44,060
you can make a profit with room to spare. But that's sort of my main takeaway:

426
00:46:44,060 --> 00:46:55,640
you know, rather than just being laser focused on price or just thinking, oh, how can we bring in more and more samples?

427
00:46:55,640 --> 00:47:07,280
You know, maybe you should think about costs. Maybe there are ways you can improve your workflow or, you know,

428
00:47:07,280 --> 00:47:14,960
I always think there are endless opportunities for just becoming more and more efficient, just doing things better and better and better.

429
00:47:14,960 --> 00:47:20,240
And that's why I started Cannlytics.

430
00:47:20,240 --> 00:47:30,440
So, you know, just a quick shout out to Cannlytics for making this all possible. And, you know, if any of you are interested in helping laboratories

431
00:47:30,440 --> 00:47:39,080
keep their variable costs as low as possible, then check out the Cannlytics software, the Cannlytics Engine.

432
00:47:39,080 --> 00:47:48,320
So here is software specifically designed to help laboratories keep their costs as low as possible.

433
00:47:48,320 --> 00:47:57,120
Right. If you're having your analyst enter data, that's increasing your variable costs.

434
00:47:57,120 --> 00:48:01,360
And there are many better things that your analysts can be doing.

435
00:48:01,360 --> 00:48:17,440
So I've seen in the past where people have gotten their time freed up and they're able to enroll in courses part time and they just become more educated and just their careers blossom.

436
00:48:17,440 --> 00:48:24,120
So, you know, there's a lot of ways that you can just become more efficient over time.

437
00:48:24,120 --> 00:48:28,640
And then, you know, that way, you know, you don't have to compete on price.

438
00:48:28,640 --> 00:48:37,560
And, you know, if you get more efficient, I would argue that you're going to be able to test more samples.

439
00:48:37,560 --> 00:48:45,320
You can test more samples. That'll give you a bit more wiggle room and you can do a better and better job.

440
00:48:45,320 --> 00:48:50,520
So they can actually increase the quality of their tests. Right.

441
00:48:50,520 --> 00:48:58,440
So if you've got a bit more room, you can do even more stringent quality control.

442
00:48:58,440 --> 00:49:03,000
So I think it's just a win win for everyone.

443
00:49:03,000 --> 00:49:10,560
So focus on costs, test more samples, test your samples better.

444
00:49:10,560 --> 00:49:15,440
Increase your price because you're doing quality work. Right.

445
00:49:15,440 --> 00:49:23,200
That's just how I would go about running a profitable lab.

446
00:49:23,200 --> 00:49:31,640
So sort of a long winded story for today,

447
00:49:31,640 --> 00:49:34,880
but I think it was an interesting one.

448
00:49:34,880 --> 00:49:44,320
So are there any thoughts, comments? I'll just put up a plot.

449
00:49:44,320 --> 00:49:56,640
But any thoughts or comments about the labs in Washington state or across the country or really any questions whatsoever?

450
00:49:56,640 --> 00:50:03,800
Maybe maybe not related to maybe not to the Washington state thing.

451
00:50:03,800 --> 00:50:08,080
I was, you know, I'm back from my office. I'm sorry.

452
00:50:08,080 --> 00:50:12,720
I winded up like, OK, they wind up for five minutes. So I have a lot of questions.

453
00:50:12,720 --> 00:50:16,400
I'm joining this session from last three consecutive times.

454
00:50:16,400 --> 00:50:19,520
I'm trying to get a hold of this thing.

455
00:50:19,520 --> 00:50:25,760
Number one is like, OK, this cannabis is a kind of like the commercial tool or what?

456
00:50:25,760 --> 00:50:29,920
Or we are we are most of time talking about like, OK, these labs,

457
00:50:29,920 --> 00:50:36,840
because I am trying to where I'm trying to pick up your brain is I'm into the data science.

458
00:50:36,840 --> 00:50:46,120
Just just stepping into this one. And I am finding like, OK, any of the existing forums which are talking about more of like

459
00:50:46,120 --> 00:50:50,880
like today you were talking about the forecasting and all that stuff and definitely maybe not the forecasting.

460
00:50:50,880 --> 00:50:56,680
And you for for for only for this one, it will be applicable to anything else as well.

461
00:50:56,680 --> 00:51:04,800
If it is my right understanding, like this this forum is specific to a certain need or is it really the generalized version of it,

462
00:51:04,800 --> 00:51:10,760
like which we can implement anywhere else in the industry?

463
00:51:10,760 --> 00:51:15,480
Really, you know, these are tools you can use really anywhere.

464
00:51:15,480 --> 00:51:21,840
So this forecasting model I used, if you want to learn more,

465
00:51:21,840 --> 00:51:28,440
I'd recommend checking out Saturday Morning Statistics, where we can go in depth on this.

466
00:51:28,440 --> 00:51:32,960
But we really just use what's called ARIMA forecasting.

467
00:51:32,960 --> 00:51:38,040
And so this just uses historic observations.

468
00:51:38,040 --> 00:51:42,280
So we literally just use

469
00:51:42,280 --> 00:51:52,400
the historic, right, so these are the historic trends we've seen from the various labs.

470
00:51:52,400 --> 00:51:58,680
And with ARIMA forecasting, all we use is the one time series.

471
00:51:58,680 --> 00:52:03,360
I also use the month, so you can add in other variables.

472
00:52:03,360 --> 00:52:11,640
But really, all you're using is the past historic trends of these labs.

473
00:52:11,640 --> 00:52:18,320
And you're forecasting it forward.

474
00:52:18,320 --> 00:52:26,240
That's what I love about ARIMA forecasting: you can forecast any time series,

475
00:52:26,240 --> 00:52:34,880
any data that you can track over time, you can forecast with ARIMA.

476
00:52:34,880 --> 00:52:44,880
There are a few stipulations, like it's good to have a consistent interval and it's best not to have missing data.

477
00:52:44,880 --> 00:52:53,640
And there are some statistical checks you can do to make sure

478
00:52:53,640 --> 00:52:59,880
you're not leaving any variation on the table, so to speak.
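To make the idea concrete, here is a minimal sketch of forecasting a single time series forward from its own history. It is not the model fitted in the meetup: it hand-rolls the simplest useful member of the family, an ARIMA(1,1,0), by differencing the series once and fitting an AR(1) to the differences with ordinary least squares, whereas a statistics library would normally estimate the model by maximum likelihood and choose the orders more carefully. The function names and the monthly sample counts are assumptions for illustration.

```python
def fit_ar1(values):
    """Least-squares fit of x_t = constant + phi * x_{t-1}."""
    previous = values[:-1]
    current = values[1:]
    n = len(previous)
    mean_prev = sum(previous) / n
    mean_curr = sum(current) / n
    sxx = sum((p - mean_prev) ** 2 for p in previous)
    sxy = sum((p - mean_prev) * (c - mean_curr)
              for p, c in zip(previous, current))
    # If the differences never vary, fall back to a constant drift.
    phi = sxy / sxx if sxx > 0 else 0.0
    constant = mean_curr - phi * mean_prev
    return constant, phi


def forecast_arima_110(series, steps):
    """Forecast `steps` periods ahead with a hand-rolled ARIMA(1,1,0)."""
    if len(series) < 4:
        raise ValueError("need at least 4 observations")
    # The "I" part: difference the series once.
    diffs = [b - a for a, b in zip(series, series[1:])]
    # The "AR" part: model each change from the previous change.
    constant, phi = fit_ar1(diffs)
    level = series[-1]
    change = diffs[-1]
    forecasts = []
    for _ in range(steps):
        change = constant + phi * change
        level = level + change  # undo the differencing
        forecasts.append(level)
    return forecasts


# Hypothetical monthly counts of samples tested by one lab.
monthly_samples = [1200, 1350, 1280, 1500, 1620, 1580, 1750, 1900]
print(forecast_arima_110(monthly_samples, steps=3))
```

The evenly spaced, gap-free input is what the stipulations above are about: the code treats consecutive list entries as consecutive periods, so a missing month would silently distort the differences.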

479
00:52:59,880 --> 00:53:07,080
But it's powerful because a lot of people stop here.

480
00:53:07,080 --> 00:53:13,400
Right. I see a lot of people, they do like historic analysis, which is useful.

481
00:53:13,400 --> 00:53:20,480
Right. It's awesome to know how many samples were tested by month by the labs.

482
00:53:20,480 --> 00:53:29,160
But you can just play it forward. Right. You can just use statistical techniques, forecast this forward.

483
00:53:29,160 --> 00:53:35,000
It's not going to be perfect. We're not going to hit everything perfectly.

484
00:53:35,000 --> 00:53:39,960
But, you know, at least now you have a forecast for the future.

485
00:53:39,960 --> 00:53:47,480
And, you know, like I was saying... Yes?

486
00:53:47,480 --> 00:53:51,280
Yeah. So one more question here. So I get your point.

487
00:53:51,280 --> 00:53:56,520
Like when you said, OK, it is that's good for forecasting.

488
00:53:56,520 --> 00:54:06,600
Let me let me say, like, in other words, like what my expectations, if it is like if it is useful for me, would be is

489
00:54:06,600 --> 00:54:10,600
we deal with a lot of volume of data, like which is coming daily.

490
00:54:10,600 --> 00:54:16,040
And you make you can say like, OK, millions of records.

491
00:54:16,040 --> 00:54:21,680
Right. And let's say it's the claims data, like millions of claims we are getting day in and out.

492
00:54:21,680 --> 00:54:31,320
Right. And I want to be in a position like where I can say, OK, hey, for because let's say I for this year,

493
00:54:31,320 --> 00:54:40,160
I think we churned out like almost close to maybe 30 million of records.

494
00:54:40,160 --> 00:54:43,160
We churned out that one. And that is just growing.

495
00:54:43,160 --> 00:54:47,840
And we just started like three years back. It is not like too old data.

496
00:54:47,840 --> 00:54:54,800
We don't have much data from the from the longevity perspective, but we have like, OK, within the short time,

497
00:54:54,800 --> 00:55:05,040
we have grown enormously, like from like, OK, from few hundred thousands from few thousands towards the hundred thousands.

498
00:55:05,040 --> 00:55:14,240
And now we are churning out approximately 300 to 400 thousand in a day and which will be giving us a huge amount.

499
00:55:14,240 --> 00:55:21,600
I don't know where we are going. And I want to see be in a position like, OK, hey, I have the, you know,

500
00:55:21,600 --> 00:55:27,800
a prediction that based on like whatever the data from the last three years.

501
00:55:27,800 --> 00:55:34,120
Because you were talking about the samples as well. I was just since I was into the two calls,

502
00:55:34,120 --> 00:55:37,960
like I was thinking this sample, which you're talking about, like the sample of the population.

503
00:55:37,960 --> 00:55:42,280
And I think you were talking about the sample about like the lab testing. Right.

504
00:55:42,280 --> 00:55:51,280
So somewhat I could differentiate. But then I think, OK, maybe maybe I need a dedicated call for this one.

505
00:55:51,280 --> 00:55:59,880
Maybe with you. And questions were how often you guys are doing this session so that I can be regular here

506
00:55:59,880 --> 00:56:06,480
and I can pick up some brain for like, OK, these kind of things can be applied at my side.

507
00:56:06,480 --> 00:56:12,320
Second thing, is it a commercial tool or is it just like an open source Python like kind of library?

508
00:56:12,320 --> 00:56:17,240
Awesome questions. So it's all open source. So it's under the MIT license.

509
00:56:17,240 --> 00:56:22,760
So if you just, you know, give credit to Cannlytics, that's awesome.

510
00:56:22,760 --> 00:56:26,800
And then to answer your other questions, we do meet weekly.

511
00:56:26,800 --> 00:56:33,320
So there's a group, Saturday Morning Statistics, where we specifically do statistics.

512
00:56:33,320 --> 00:56:38,000
And then we do the Wednesday cannabis data science each week.

513
00:56:38,000 --> 00:56:46,960
And then as to your question with data, the time series forecasting can be done.

514
00:56:46,960 --> 00:56:55,600
Wait, did I do that here? Yes, this model, it can be done on any time series.

515
00:56:55,600 --> 00:57:08,000
So here you see this data. It's just. It's, you know, it's yearly data.

516
00:57:08,000 --> 00:57:16,800
I wonder. But but as long.

517
00:57:16,800 --> 00:57:26,960
You will be lost, too, I guess. Hello.

518
00:57:26,960 --> 00:57:32,600
Is it only me or anybody else? OK, another three more people.

519
00:57:32,600 --> 00:57:40,560
Yes, can you listen to me, Dharma? Paul, I can hear you, but I might as well answer the question for Keegan.

520
00:57:40,560 --> 00:57:47,480
I think you described like huge data sets, like millions of data points.

521
00:57:47,480 --> 00:57:55,320
And if you're looking for humongous data sets with millions of data points, you're in the wrong space.

522
00:57:55,320 --> 00:58:06,720
But because cannabis has very few data points, a lot of this situation is primarily if we're going to go into data science realm,

523
00:58:06,720 --> 00:58:15,320
unsupervised learning where you have the bootstrap data samples and supplement data samples and everything like that.

524
00:58:15,320 --> 00:58:26,040
The only big data sets within the cannabis space are held in private businesses because cannabis is big business.

525
00:58:26,040 --> 00:58:37,280
And it particularly relies with cannabis genome sequencing and to tie it into this cannabis conversation.

526
00:58:37,280 --> 00:58:43,200
That is particularly why Eastern Washington

527
00:58:43,200 --> 00:58:55,400
is very well known for their testing sample, because that is the forefront of cannabis research in terms of growing and testing and sampling.

528
00:58:55,400 --> 00:58:58,040
Got it. Appreciate that. Thank you.

529
00:58:58,040 --> 00:59:03,560
And this sample which you have created, like is it available somewhere in the GitHub or somewhere?

530
00:59:03,560 --> 00:59:07,880
Or is it shareable? I mean, first thing I should ask, like, is it shareable?

531
00:59:07,880 --> 00:59:18,120
Yes. So check out the Cannabis Data Science GitHub repository.

532
00:59:18,120 --> 00:59:27,760
And I've got a link here in this latest script lab profitability to the data sets.

533
00:59:27,760 --> 00:59:34,680
So that way you can download those. Can you please put into the message?

534
00:59:34,680 --> 00:59:39,960
Just fork the repo, dude. If you're a real data scientist, you know what a fork on GitHub is.

535
00:59:39,960 --> 00:59:42,240
Here we are.

536
00:59:42,240 --> 00:59:54,240
So. Thank you. And here's the link to the actual data.

537
00:59:54,240 --> 01:00:00,320
And the largest data there is going to be the sales data. And Graham was right.

538
01:00:00,320 --> 01:00:04,480
We've got to work with the data points we can get.

539
01:00:04,480 --> 01:00:12,720
So I would just push back slightly in that I think the models, the ARIMA models,

540
01:00:12,720 --> 01:00:16,640
they work on small and large data sets.

541
01:00:16,640 --> 01:00:23,440
I think as long as you've got a timestamp, you can start to aggregate by time.

542
01:00:23,440 --> 01:00:29,040
So say you're looking at sales over time or you were looking at claims.

543
01:00:29,040 --> 01:00:33,920
So you can just look at number of claims over time.
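Turning timestamped records into a series you can forecast is mostly a matter of counting per period. The following is a rough sketch under assumptions: the records are ISO 8601 timestamp strings, the period is a calendar month, and `count_by_month` and the sample timestamps are hypothetical names and values for illustration.

```python
from collections import Counter
from datetime import datetime


def count_by_month(timestamps):
    """Count records per calendar month from ISO 8601 timestamp strings."""
    counts = Counter()
    for stamp in timestamps:
        moment = datetime.fromisoformat(stamp)
        counts[moment.strftime("%Y-%m")] += 1
    # Sort chronologically so the counts form an ordered time series.
    return sorted(counts.items())


# Hypothetical claim (or sale, or lab result) timestamps.
records = [
    "2021-01-05T09:30:00",
    "2021-01-20T14:00:00",
    "2021-02-03T08:15:00",
    "2021-03-11T16:45:00",
    "2021-03-12T10:00:00",
    "2021-03-30T12:30:00",
]

print(count_by_month(records))
# [('2021-01', 2), ('2021-02', 1), ('2021-03', 3)]
```

Changing the `strftime` format (for example to `"%Y-%m-%d"` or `"%Y-%m-%d %H"`) changes the frequency. Note that a month with no records is simply absent from the result, so gaps would need to be filled with zeros before forecasting.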

544
01:00:33,920 --> 01:00:42,080
And any frequency, right? So here.

545
01:00:42,080 --> 01:00:47,520
Right, we already showed you one with yearly and this should be monthly.

546
01:00:47,520 --> 01:00:52,760
So really anything that you can put a timestamp on.

547
01:00:52,760 --> 01:00:57,960
And so you could run your ARIMA model

548
01:00:57,960 --> 01:01:06,040
minute by minute, hour by hour, so you could get, you know, same day predictions.

549
01:01:06,040 --> 01:01:11,960
I found they're useful for, like, one week ahead, one month ahead forecasts.

550
01:01:11,960 --> 01:01:16,440
But if you're ambitious and you want to do them in real time with your data,

551
01:01:16,440 --> 01:01:18,680
there's really no reason why you can't.

552
01:01:18,680 --> 01:01:27,320
All you need is just a series of values that you're keeping track of over time.

553
01:01:27,320 --> 01:01:30,360
And you're off to the races.

554
01:01:30,360 --> 01:01:35,480
And one more question. So this data cleanup, the data which you have used,

555
01:01:35,480 --> 01:01:36,920
like you would send me both of the links.

556
01:01:36,920 --> 01:01:42,760
So the data is already cleaned up or you are doing it like in your program somewhere?

557
01:01:42,760 --> 01:01:46,440
Oh, yes. All the cleaning's done here.

558
01:01:46,440 --> 01:01:52,360
So basically, so here I just read in all of the data.

559
01:01:52,360 --> 01:02:00,360
And if you're reading it in, you can just read it in bit by bit.

560
01:02:00,360 --> 01:02:05,160
And in here, you know, you can pick and choose your variables.
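Reading a large file bit by bit while keeping only the variables you want might look like the sketch below. It uses only the standard library and an in-memory stand-in for a file. The column names (`lab_id`, `month`, `price`, `notes`), the chunk size, and the `read_in_chunks` helper are all assumptions for illustration, not the names used in the meetup's script.

```python
import csv
import io
from itertools import islice


def read_in_chunks(file_obj, columns, chunk_size=1000):
    """Yield lists of rows, keeping only the requested columns."""
    reader = csv.DictReader(file_obj)
    while True:
        # islice pulls at most chunk_size rows from where the reader
        # left off, so the whole file is never held in memory at once.
        chunk = [
            {name: row[name] for name in columns}
            for row in islice(reader, chunk_size)
        ]
        if not chunk:
            break
        yield chunk


# Hypothetical data standing in for a large CSV file on disk.
sample_file = io.StringIO(
    "lab_id,month,price,notes\n"
    "3,2021-01,100,ok\n"
    "7,2021-01,120,ok\n"
    "21,2021-01,100,rush\n"
    "3,2021-02,100,ok\n"
    "7,2021-02,120,ok\n"
)

for chunk in read_in_chunks(sample_file, ["lab_id", "price"], chunk_size=2):
    print(len(chunk), chunk[0])
```

Each chunk can be cleaned or aggregated and then discarded, which keeps memory use flat even when the full data set runs to millions of rows.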

561
01:02:05,160 --> 01:02:06,920
Got it.

562
01:02:06,920 --> 01:02:10,760
I just scraped the tip of the iceberg.

563
01:02:10,760 --> 01:02:17,160
But like I said, if you're feeling ambitious, then check out the sales data.

564
01:02:17,160 --> 01:02:22,760
There is. I really appreciate that.

565
01:02:22,760 --> 01:02:29,560
I will not say like, OK, the millions of the data points, I would say the data is too much.

566
01:02:29,560 --> 01:02:31,320
But yeah, the features are limited.

567
01:02:31,320 --> 01:02:34,040
Like, OK, it's not expanding any further.

568
01:02:34,040 --> 01:02:37,640
It's like merely like 10 to 20 features, I would be saying.

569
01:02:37,640 --> 01:02:47,000
But yes, there is a huge diversification in terms of like one data is segmented into

570
01:02:47,000 --> 01:02:48,280
this type of process.

571
01:02:48,280 --> 01:02:50,120
Second is into this type of process.

572
01:02:50,120 --> 01:02:51,880
Like, as I said, like the claims.

573
01:02:51,880 --> 01:02:55,560
So it could be like, OK, this guy is having this kind of claim.

574
01:02:55,560 --> 01:02:57,480
That guy is having that kind of claim.

575
01:02:57,480 --> 01:03:02,760
So in a nutshell, like I'm getting like the variety of the claims basically,

576
01:03:03,400 --> 01:03:05,560
maybe 10 to 20 types of that one.

577
01:03:05,560 --> 01:03:08,840
And then eventually it turns into like millions of records.

578
01:03:08,840 --> 01:03:15,560
So I was thinking like, OK, like to be very honest, like I even started this one like

579
01:03:15,560 --> 01:03:21,160
some time back and then I think, OK, what actually like would be would be helpful for me?

580
01:03:21,160 --> 01:03:25,240
And I am honestly like going from kind of the door to door.

581
01:03:25,240 --> 01:03:27,480
People say, OK, if you have too much of data, OK, go ahead.

582
01:03:27,480 --> 01:03:28,440
Do the deep learning.

583
01:03:28,440 --> 01:03:31,560
OK, if you are like, OK, this no, no, no, you do this supervised learning.

584
01:03:31,560 --> 01:03:35,320
And today, again, like, OK, it's not going anywhere.

585
01:03:35,320 --> 01:03:40,840
So let me let me start with some point like how at least they start handling these things.

586
01:03:41,480 --> 01:03:44,440
And, you know, that's where I am.

587
01:03:44,440 --> 01:03:49,720
And I appreciate that, like you guys, at least, you know, sharing this information so that I can

588
01:03:49,720 --> 01:03:55,080
next time probably I will see you with a lot of questions what I have and what I don't.

589
01:03:55,080 --> 01:03:59,960
Or probably I'll try to grab some sample, maybe not the sample from the original data,

590
01:03:59,960 --> 01:04:04,760
but I will just create a replica of like with some dummy values so that we can,

591
01:04:04,760 --> 01:04:09,560
if you guys can, you know, think about like, yeah, you can you can guide me.

592
01:04:09,560 --> 01:04:13,800
We lost you again.

593
01:04:13,800 --> 01:04:17,800
Yes.

594
01:04:17,800 --> 01:04:23,800
I don't really yet.

595
01:04:23,800 --> 01:04:27,800
We lost you for for last 15 seconds.

596
01:04:27,800 --> 01:04:29,000
You have to repeat what I said.

597
01:04:29,000 --> 01:04:29,800
I'm sorry.

598
01:04:29,800 --> 01:04:30,840
Oh, yes.

599
01:04:30,840 --> 01:04:33,400
So my background is in economics.

600
01:04:33,400 --> 01:04:34,840
So that's why I use

601
01:04:34,840 --> 01:04:40,120
a lot of tools that economists may use.

602
01:04:40,120 --> 01:04:44,120
So I'm just as eager as you are to learn some of these interesting,

603
01:04:44,120 --> 01:04:48,120
say, machine learning tools or other data science tools.

604
01:04:48,120 --> 01:04:52,120
So just because I haven't talked about it doesn't mean it's not useful.

605
01:04:52,120 --> 01:04:57,640
So if you find an awesome, cool data science tool, machine learning tool,

606
01:04:57,640 --> 01:05:00,120
then please share it because I'd be

607
01:05:00,120 --> 01:05:00,600
Yeah.

608
01:05:00,600 --> 01:05:05,000
thrilled to explore it and apply it to some of this cannabis data.

609
01:05:05,000 --> 01:05:06,040
Yep.

610
01:05:06,040 --> 01:05:06,680
For sure.

611
01:05:06,680 --> 01:05:07,960
Thank you very much.

612
01:05:07,960 --> 01:05:08,520
Awesome.

613
01:05:08,520 --> 01:05:10,680
Well, thank you all for coming.

614
01:05:10,680 --> 01:05:13,720
I'm going to go ahead and stop the presentation for today.

615
01:05:13,720 --> 01:05:18,520
I hope you all were able to get something out of it.

616
01:05:18,520 --> 01:05:20,120
It was a little long.

617
01:05:20,120 --> 01:05:21,720
It was a little long winded today.

618
01:05:21,720 --> 01:05:32,200
So next week, I'll try to make it a bit more simple and a bit more hands on,

621
01:05:32,200 --> 01:05:33,400
a bit more discussion based.

622
01:05:33,400 --> 01:05:35,800
So I think next week will be a good one, too.

623
01:05:35,800 --> 01:05:36,600
I appreciate that.

624
01:05:36,600 --> 01:05:37,160
Thank you.

625
01:05:37,160 --> 01:05:37,720
Thank you.

626
01:05:37,720 --> 01:05:38,360
Great.

627
01:05:38,360 --> 01:05:39,080
Awesome.

628
01:05:39,080 --> 01:05:40,520
Thank you all for coming.

629
01:05:40,520 --> 01:06:00,200
Have a productive week.

