1
00:00:00,000 --> 00:00:13,480
Without further ado, welcome to the Cannabis Data Science Meetup Group.

2
00:00:13,480 --> 00:00:15,040
Thank you for attending.

3
00:00:15,040 --> 00:00:18,160
So we've got a jam-packed day today.

4
00:00:18,160 --> 00:00:24,280
So we're going to be doing, as always, data science.

5
00:00:24,280 --> 00:00:28,440
We've got on the agenda forecasting.

6
00:00:28,440 --> 00:00:30,280
So we'll be picking up with that.

7
00:00:30,280 --> 00:00:33,160
So we started looking at forecasting last week.

8
00:00:33,160 --> 00:00:35,960
We looked at a quite simple forecasting model.

9
00:00:35,960 --> 00:00:39,600
We're going to extend upon that today.

10
00:00:39,600 --> 00:00:45,560
So as we always do, we just build upon each week, provide some building blocks, and then

11
00:00:45,560 --> 00:00:51,120
just keep adding, adding, adding, and getting more advanced, improving as we go.

12
00:00:51,120 --> 00:01:00,000
So last week, we did forecasting with a vector autoregression, a VAR model.

13
00:01:00,000 --> 00:01:01,880
And we looked at monthly data.

14
00:01:01,880 --> 00:01:03,840
We saw some flaws.

15
00:01:03,840 --> 00:01:10,300
So monthly data is maybe not the ideal frequency.

16
00:01:10,300 --> 00:01:19,240
We were only able to look at sales data, and we had to use an atheoretical approach.

17
00:01:19,240 --> 00:01:21,840
This week, we'll also use an atheoretical approach.

18
00:01:21,840 --> 00:01:27,880
However, we can apply economics to our forecasts.

19
00:01:27,880 --> 00:01:30,720
That will be interesting.

20
00:01:30,720 --> 00:01:34,320
And we'll use a different forecasting methodology.

21
00:01:34,320 --> 00:01:38,240
So we'll use what's called Box-Jenkins methodology.

22
00:01:38,240 --> 00:01:44,200
And we touched on this in Saturday Morning Statistics this past Saturday.

23
00:01:44,200 --> 00:01:49,880
And so you're all welcome to join us for Saturday Morning Statistics this coming week.

24
00:01:49,880 --> 00:01:52,840
And we will dive into more statistics.

25
00:01:52,840 --> 00:01:57,040
So we may introduce some brand new statistical models.

26
00:01:57,040 --> 00:02:01,440
So we may continue working with time series data.

27
00:02:01,440 --> 00:02:06,800
However, I've got some interesting statistical models to show you.

28
00:02:06,800 --> 00:02:11,160
So we can do some quite interesting analysis.

29
00:02:11,160 --> 00:02:14,200
And I'm going to save that for Saturday Morning Statistics.

30
00:02:14,200 --> 00:02:16,000
So that's a bit of a teaser.

31
00:02:16,000 --> 00:02:22,760
So definitely stay tuned, because I definitely try to give a little something extra for the

32
00:02:22,760 --> 00:02:26,000
people that show up to Saturday Morning Statistics.

33
00:02:26,000 --> 00:02:34,760
Since it costs $1 for you, I hope you can get more than $1 value out of it.

34
00:02:34,760 --> 00:02:37,860
And your time to show up on Saturday Morning.

35
00:02:37,860 --> 00:02:41,260
So I'll try to make that valuable.

36
00:02:41,260 --> 00:02:47,760
So without further ado, we'll jump into today's forecasting and economics.

37
00:02:47,760 --> 00:02:53,600
And sprinkle in some data science and cannabis talk where we can.

38
00:02:53,600 --> 00:03:01,760
So I'm going to go ahead and start the presentation here.

39
00:03:01,760 --> 00:03:09,040
Awesome.

40
00:03:09,040 --> 00:03:20,480
So like we said, make sure to check out Saturday Morning Statistics, because we've got some

41
00:03:20,480 --> 00:03:22,800
big plans coming with that.

42
00:03:22,800 --> 00:03:34,300
So we're going to use what's called an ARIMA model today.

43
00:03:34,300 --> 00:03:42,400
So we looked at a vector autoregressive model last week with vector meaning multiple variables

44
00:03:42,400 --> 00:03:43,840
here.

45
00:03:43,840 --> 00:03:47,000
This week, we're going to look at variable by variable.

46
00:03:47,000 --> 00:03:51,240
However, we're going to extend upon the model.

47
00:03:51,240 --> 00:03:58,760
So we're going to add in an integration component to make sure that we are not breaking any

48
00:03:58,760 --> 00:04:01,720
assumptions when we're forecasting.

49
00:04:01,720 --> 00:04:10,080
We're going to add in a moving average component to capture some cyclical trends.

50
00:04:10,080 --> 00:04:20,280
And what's cool about this package is it's built ready to go with a bunch of features

51
00:04:20,280 --> 00:04:28,520
that we would like to use, in particular, the auto ARIMA.

52
00:04:28,520 --> 00:04:39,600
So this is where we, you know, one would almost call this an application of machine learning

53
00:04:39,600 --> 00:04:47,080
if you continuously feed in the data as it arrives, right?

54
00:04:47,080 --> 00:04:55,680
Because you're basically letting the computer select the best forecasting model given the

55
00:04:55,680 --> 00:05:00,560
data and your parameters.

56
00:05:00,560 --> 00:05:21,520
So I'm just making sure we're still connected here, correct?

57
00:05:21,520 --> 00:05:34,880
So pardon the interruption, but Cheyenne, we are still connected, correct?

58
00:05:34,880 --> 00:05:37,880
Yeah, we are connected.

59
00:05:37,880 --> 00:05:38,880
Yeah.

60
00:05:38,880 --> 00:05:39,880
Awesome.

61
00:05:39,880 --> 00:05:42,760
So anyways, moving forward with the ARIMA model.

62
00:05:42,760 --> 00:05:55,680
So we shared this script in Saturday Morning Statistics, where this is an ARIMA model that

63
00:05:55,680 --> 00:06:01,960
I wrote a few years back, back in 2017, and I've knocked the dust off of it.

64
00:06:01,960 --> 00:06:07,520
So you know, it still runs fine, nice maintainable code here.

65
00:06:07,520 --> 00:06:17,920
And so this will, you can either forecast with a training period or with the minimum

66
00:06:17,920 --> 00:06:20,920
BIC.

67
00:06:20,920 --> 00:06:28,520
The algorithm I wrote only scans for optimal P's and Q's, and it doesn't have a lot of

68
00:06:28,520 --> 00:06:33,280
interesting features built in, like exogenous variables.
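
The minimum-BIC scan described here can be sketched roughly as follows. This is not the 2017 script itself. It is a simplified, hypothetical stand-in that only scans autoregressive orders p (no q's, no differencing, no exogenous variables), fits each AR(p) by least squares, and keeps the order with the lowest BIC. The names `solve`, `ar_bic`, and `select_order` and the simulated series are assumptions made for illustration.

```python
import math
import random


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        rest = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - rest) / m[i][i]
    return x


def ar_bic(y, p, max_p):
    """BIC of an AR(p) model with intercept, fit by ordinary least squares.

    Every order is fit on the same observations (from index max_p onward)
    so that the BIC values are comparable across orders.
    """
    rows = [[1.0] + [y[t - k] for k in range(1, p + 1)]
            for t in range(max_p, len(y))]
    target = y[max_p:]
    k = p + 1  # intercept plus p lag coefficients
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * v for r, v in zip(rows, target)) for i in range(k)]
    beta = solve(xtx, xty)
    rss = sum((v - sum(b * x for b, x in zip(beta, r))) ** 2
              for r, v in zip(rows, target))
    n = len(target)
    return n * math.log(rss / n) + k * math.log(n)


def select_order(y, max_p=3):
    """Return the AR order with the lowest BIC, plus all the scores."""
    scores = {p: ar_bic(y, p, max_p) for p in range(max_p + 1)}
    return min(scores, key=scores.get), scores


# Simulated AR(1) series, purely for illustration (not cannabis data).
random.seed(0)
series = [0.0]
for _ in range(199):
    series.append(0.7 * series[-1] + random.gauss(0, 1))

best_p, scores = select_order(series, max_p=3)
print(best_p, scores)
```

A full auto ARIMA also scans q and the order of differencing, which is the part the packaged tool handles.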

69
00:06:33,280 --> 00:06:40,600
So what we like to do is, you know, you can prove that you can do it by yourself.

70
00:06:40,600 --> 00:06:44,880
So here, you know, we coded up our own auto ARIMA.

71
00:06:44,880 --> 00:06:53,720
However, we can stand on the shoulders of giants and we can use the auto ARIMA package.

72
00:06:53,720 --> 00:06:57,280
And this is published by alkaline-ml.

73
00:06:57,280 --> 00:07:05,800
And you know, there's a handful of contributors, you know, 15 contributors.

74
00:07:05,800 --> 00:07:10,680
It's used by a thousand plus people or organizations.

75
00:07:10,680 --> 00:07:25,280
So you know, it's a little, it's arguably more vetted than the ARIMA model that I wrote

76
00:07:25,280 --> 00:07:26,480
myself.

77
00:07:26,480 --> 00:07:37,120
So instead of using this code, which we saw can be used to make forecasts, we'll go ahead

78
00:07:37,120 --> 00:07:46,560
and use the pmdarima package today.

79
00:07:46,560 --> 00:07:49,080
All right.

80
00:07:49,080 --> 00:07:51,820
So that's the programming.

81
00:07:51,820 --> 00:07:54,000
We talked about the statistics.

82
00:07:54,000 --> 00:07:58,040
Now time to get into the cannabis data.

83
00:07:58,040 --> 00:08:06,240
So first things first, get the tools we're going to be using.

84
00:08:06,240 --> 00:08:12,680
Next, we're going to get the data.

85
00:08:12,680 --> 00:08:19,560
And so if you've tuned in in prior weeks, these are all of the data points that we can

86
00:08:19,560 --> 00:08:27,440
get from the Massachusetts Open API data.

87
00:08:27,440 --> 00:08:31,380
And so we're given a rich set of data here.

88
00:08:31,380 --> 00:08:40,600
So we have, you know, so we have the sales by product types here.

89
00:08:40,600 --> 00:08:45,240
You have prices.

90
00:08:45,240 --> 00:08:50,280
It's an awesome data point.

91
00:08:50,280 --> 00:08:55,220
You have licensee data.

92
00:08:55,220 --> 00:09:03,040
And so here you could look at a single licensee.

93
00:09:03,040 --> 00:09:13,600
And so you have a plethora of data points for each licensee.

94
00:09:13,600 --> 00:09:19,000
And we will actually capitalize on some of these data points today.

95
00:09:19,000 --> 00:09:28,240
And a teaser for what's coming up is we're going to calculate an interesting statistic,

96
00:09:28,240 --> 00:09:35,560
several of them, that I'm not certain that anyone's calculated before in Massachusetts

97
00:09:35,560 --> 00:09:41,000
because there's maybe just not that many people looking at this data.

98
00:09:41,000 --> 00:09:44,320
And there's many novel ways to look at this data.

99
00:09:44,320 --> 00:09:49,760
And I believe we're going to look at the data in a novel way today.

100
00:09:49,760 --> 00:09:51,800
So that's a teaser.

101
00:09:51,800 --> 00:10:00,640
And so there's going to be one of these data points that we're going to capitalize on.

102
00:10:00,640 --> 00:10:05,120
And so, you know, there's many interesting data points here, right?

103
00:10:05,120 --> 00:10:09,060
You've got the geocoded location.

104
00:10:09,060 --> 00:10:12,880
You have when the license was issued.

105
00:10:12,880 --> 00:10:17,160
You of course have the name.

106
00:10:17,160 --> 00:10:19,040
You have tons of interesting variables.

107
00:10:19,040 --> 00:10:24,720
You have square footage, which I think would be an interesting analysis of its own, right?

108
00:10:24,720 --> 00:10:34,160
So you could look at the square footage required for cultivation and see how that's trending

109
00:10:34,160 --> 00:10:36,280
with the distribution.

110
00:10:36,280 --> 00:10:41,520
I think there's a lot of interesting analysis that can be done there.

111
00:10:41,520 --> 00:10:53,040
Of course, just seeing how the licensees are distributed geographically.

112
00:10:53,040 --> 00:10:56,360
There may be some analysis that can be done with license fees.

113
00:10:56,360 --> 00:11:00,400
So long story short, lots of data points here.

114
00:11:00,400 --> 00:11:03,360
We'll be using one of them shortly.

115
00:11:03,360 --> 00:11:09,080
Then of course, we calculate our sales data here.

116
00:11:09,080 --> 00:11:14,520
And note to the newcomers that we're adjusting for outliers.

117
00:11:14,520 --> 00:11:20,880
So just coding outliers as zero.

118
00:11:20,880 --> 00:11:23,440
So we've gotten all our data here.

119
00:11:23,440 --> 00:11:30,440
So for example, we can look at sales data.

120
00:11:30,440 --> 00:11:35,880
And also for the newcomers, Massachusetts is interesting because unlike a lot of other

121
00:11:35,880 --> 00:11:47,800
states where you saw a spike in cannabis sales right at the start of the pandemic, in Massachusetts,

122
00:11:47,800 --> 00:11:53,400
they suspended cannabis sales for approximately a month.

123
00:11:53,400 --> 00:11:59,880
So you have the exact opposite effect in Massachusetts as you see in other states.

124
00:11:59,880 --> 00:12:03,080
In Massachusetts, sales dropped to zero.

125
00:12:03,080 --> 00:12:05,020
In other states, they spiked.

126
00:12:05,020 --> 00:12:11,440
So this actually opens the door to rich comparative analysis.

127
00:12:11,440 --> 00:12:21,560
So you can see potentially how policies in Massachusetts may compare to other states.

128
00:12:21,560 --> 00:12:28,360
What type of comparative analysis you would do, not 100% sure yet.

129
00:12:28,360 --> 00:12:32,640
We are going to introduce some models.

130
00:12:32,640 --> 00:12:35,040
So maybe teasing a little too much.

131
00:12:35,040 --> 00:12:39,960
We're going to be introducing some models in Saturday Morning Statistics that specifically

132
00:12:39,960 --> 00:12:46,440
let us do this type of comparative analysis where we can look at these breakpoints in

133
00:12:46,440 --> 00:12:51,200
time and let these aid us.

134
00:12:51,200 --> 00:12:56,800
So long story short, we'll tease that and keep moving forward for now.

135
00:12:56,800 --> 00:13:04,560
But we'll get into the juicy bits now because there's a lot of ground to cover.

136
00:13:04,560 --> 00:13:14,840
And I want to estimate a lot of statistics and create a lot of forecasts today.

137
00:13:14,840 --> 00:13:21,520
So to do that, we're going to use our handy dandy pmdarima package.

138
00:13:21,520 --> 00:13:27,280
So you'll need to pip install pmdarima.

139
00:13:27,280 --> 00:13:33,240
If you're using another programming language, you may have to find another tool for the

140
00:13:33,240 --> 00:13:34,240
job.

141
00:13:34,240 --> 00:13:36,280
But it doesn't matter.

142
00:13:36,280 --> 00:13:38,040
This is just statistics.

143
00:13:38,040 --> 00:13:42,120
It can be done in any programming language.

144
00:13:42,120 --> 00:13:46,120
So we're going to calculate some statistics here.

145
00:13:46,120 --> 00:13:50,760
We're going to calculate weekly series.
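
Calculating a weekly series from daily data can be sketched as below. This is a minimal, standard-library illustration, assuming weeks that run Monday through Sunday and made-up daily sales figures. The helper names `week_ending` and `weekly_totals` are hypothetical and are not taken from the notebook used in the talk.

```python
from collections import defaultdict
from datetime import date, timedelta


def week_ending(day):
    """Return the Sunday that closes the Monday-to-Sunday week containing `day`."""
    return day + timedelta(days=6 - day.weekday())


def weekly_totals(daily):
    """Sum daily values into weekly totals keyed by week-ending date."""
    totals = defaultdict(float)
    for day, value in daily.items():
        totals[week_ending(day)] += value
    return dict(sorted(totals.items()))


# Hypothetical daily sales for two full weeks, starting on a Monday.
start = date(2021, 10, 4)
daily_sales = {start + timedelta(days=i): 100.0 + i for i in range(14)}

weekly_sales = weekly_totals(daily_sales)
for week, total in weekly_sales.items():
    print(week, total)
```

Totals suit a flow such as sales. For a stock such as the number of employees, a weekly average would be the analogous choice.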

146
00:13:50,760 --> 00:13:58,800
So here is a tip slash trick that I shared with Saturday Morning Statistics.

147
00:13:58,800 --> 00:14:02,560
I'll also share it here because it's such a useful tip and trick.

148
00:14:02,560 --> 00:14:06,680
But they got it a week ahead of time in Saturday Morning Stats.

149
00:14:06,680 --> 00:14:13,440
But long story short, a nifty trick is to use weekly series for forecasting the medium

150
00:14:13,440 --> 00:14:15,920
term.

151
00:14:15,920 --> 00:14:21,240
If you're going to be forecasting the very short term, like one week ahead, definitely

152
00:14:21,240 --> 00:14:24,160
I would use daily data.

153
00:14:24,160 --> 00:14:29,960
So if you're going to just, or even a month ahead, I would just use daily data.

154
00:14:29,960 --> 00:14:36,320
So you could just use recent daily data.

155
00:14:36,320 --> 00:14:44,760
I would try to add in day of the week effects because Fridays are probably different than

156
00:14:44,760 --> 00:14:50,800
Mondays and Sundays and what have you.

157
00:14:50,800 --> 00:15:01,720
So daily data is fine for short term forecasting and you can make do for long term forecasting.

158
00:15:01,720 --> 00:15:06,040
As we saw, we can aggregate it into monthly series.

159
00:15:06,040 --> 00:15:20,240
However, we lose a lot of the variability when we move to monthly series.

160
00:15:20,240 --> 00:15:28,280
So I found that the sweet spot for forecasting the medium term, and when I say medium term,

161
00:15:28,280 --> 00:15:32,240
I mean three months to two years.

162
00:15:32,240 --> 00:15:34,600
Two years is sort of pushing it.

163
00:15:34,600 --> 00:15:40,960
So maybe three months to one and a half years would be medium term.

164
00:15:40,960 --> 00:15:45,760
If you're going to be doing two plus years, two to five years, that's going to be long

165
00:15:45,760 --> 00:15:46,760
term.

166
00:15:46,760 --> 00:15:55,520
And then, you know, all bets are off, I think, in 10 plus year forecasts.

167
00:15:55,520 --> 00:15:59,040
But people make them.

168
00:15:59,040 --> 00:16:04,040
But I think there's just so many structural changes, especially in this day and age that

169
00:16:04,040 --> 00:16:08,920
will occur in 10 years that it's maybe not even worthwhile forecasting.

170
00:16:08,920 --> 00:16:15,440
But you can still do it just for a mental exercise.

171
00:16:15,440 --> 00:16:20,820
But you know, everyone's interested in the short to medium term.

172
00:16:20,820 --> 00:16:26,920
So short term, really useful for general managers, right?

173
00:16:26,920 --> 00:16:37,600
So if you need to know how many employee hours to employ the next week, so how long you need

174
00:16:37,600 --> 00:16:42,360
to have your employees there, what days you need to have your employees there, what time

175
00:16:42,360 --> 00:16:47,080
of day you need to have your employees there.

176
00:16:47,080 --> 00:16:51,480
Short term forecasting works well.

177
00:16:51,480 --> 00:16:53,680
Then for more of the executives, right?

178
00:16:53,680 --> 00:17:00,200
For the investors, they're looking at more of the short to medium term, right?

179
00:17:00,200 --> 00:17:07,160
Because they want to see, okay, is the business going to potentially start to generate a return?

180
00:17:07,160 --> 00:17:13,680
Or is the return going to be increasing or decreasing?

181
00:17:13,680 --> 00:17:20,200
You know, and those players have slightly longer time horizons.

182
00:17:20,200 --> 00:17:24,860
So today we're going to be doing a medium-term forecast.

183
00:17:24,860 --> 00:17:31,000
So we're going to be forecasting the next year, essentially.

184
00:17:31,000 --> 00:17:36,360
So the remainder of this year and 2022, that's sort of how I like to do things, the remainder

185
00:17:36,360 --> 00:17:41,880
of the year and next year, just personal preference.

186
00:17:41,880 --> 00:17:51,440
So we actually may need to crank up the number of weeks here.

187
00:17:51,440 --> 00:17:57,360
So let's just crank this up to the next 60 weeks.

188
00:17:57,360 --> 00:18:02,240
And you've heard me drone on and on.

189
00:18:02,240 --> 00:18:09,560
Let's look at the data.

190
00:18:09,560 --> 00:18:12,160
Weekly sales.

191
00:18:12,160 --> 00:18:17,440
And we're also going to be looking at the number of plants grown.

192
00:18:17,440 --> 00:18:22,760
This is the total number of tracked plants, vegetative and flowering.

193
00:18:22,760 --> 00:18:29,280
So, and another interesting variable, we've got it.

194
00:18:29,280 --> 00:18:30,280
So let's look at it.

195
00:18:30,280 --> 00:18:31,720
We're given employees.

196
00:18:31,720 --> 00:18:40,320
So we're going to aggregate this by the week, taking the average number of employees employed

197
00:18:40,320 --> 00:18:45,080
at any given time.

198
00:18:45,080 --> 00:18:49,040
You see quite a large growth rate.

199
00:18:49,040 --> 00:19:00,960
So the number of employees has grown five times in the past year or so.

200
00:19:00,960 --> 00:19:04,520
And so that's impressive.

201
00:19:04,520 --> 00:19:08,600
So you've seen a large growth in this sector.

202
00:19:08,600 --> 00:19:13,560
So a large number of employees are entering.

203
00:19:13,560 --> 00:19:21,040
Awesome.

204
00:19:21,040 --> 00:19:23,500
I'm going to jump ahead real quick.

205
00:19:23,500 --> 00:19:32,840
So before we get to forecasting, which we're going to do shortly, I'm going to go ahead

206
00:19:32,840 --> 00:19:44,240
and introduce to you a brand new statistic, which I just calculated this morning.

207
00:19:44,240 --> 00:19:45,960
It's not perfect.

208
00:19:45,960 --> 00:19:51,760
And I'll point out the flaws.

209
00:19:51,760 --> 00:19:59,040
But this is a statistic that I believe you can calculate with the data given.

210
00:19:59,040 --> 00:20:03,160
And I'm not certain if anyone's thought to look for it before.

211
00:20:03,160 --> 00:20:13,120
So long story short, I wanted to know what is the total number of retailers in the market?

212
00:20:13,120 --> 00:20:16,800
So we're given licensees.

213
00:20:16,800 --> 00:20:24,800
So I thought we've got our licensees data here.

214
00:20:24,800 --> 00:20:34,480
I thought, OK, why don't we just count the number of licensees that are retailers?

215
00:20:34,480 --> 00:20:41,920
And we can do that.

216
00:20:41,920 --> 00:20:49,160
Well, there was probably not 379 retailers on day one.

217
00:20:49,160 --> 00:20:52,720
So I started thinking, man, that's unfortunate.

218
00:20:52,720 --> 00:21:01,440
It would be nice to have a count of the retailers over time.

219
00:21:01,440 --> 00:21:07,360
Well, we're given, right?

220
00:21:07,360 --> 00:21:32,840
So if you look at one of these retailers, we're given when their application was created.

221
00:21:32,840 --> 00:21:41,920
Oh, and I actually just thought about how we may be able to account for exits.

222
00:21:41,920 --> 00:21:48,640
Because basically what I was going to say is, well, we can see when these licensees

223
00:21:48,640 --> 00:21:50,680
entered.

224
00:21:50,680 --> 00:22:02,120
So for example, this licensee, we're going to say entered on April 28 of 2021.

225
00:22:02,120 --> 00:22:10,920
So the way we can potentially do a count of the total retailers is we can go through time,

226
00:22:10,920 --> 00:22:18,120
look at all the days, and simply add up all the retailers that had their app create date

227
00:22:18,120 --> 00:22:20,160
before that given date.

228
00:22:20,160 --> 00:22:24,120
So let's say this is a market of one.

229
00:22:24,120 --> 00:22:34,240
Then you're going to have 0, 0, 0, 0 for all the days prior to 2021-04-28, then

230
00:22:34,240 --> 00:22:43,240
all of a sudden on 2021-04-28, our total licensees increases by one.

231
00:22:43,240 --> 00:22:47,240
And then it's going to be one in perpetuity.

232
00:22:47,240 --> 00:22:54,800
I just realized now that we were also given the activity date.

233
00:22:54,800 --> 00:23:07,840
So in one fell swoop, we may also be able to control for licensees that have exited.

234
00:23:07,840 --> 00:23:10,920
And I just had this idea.

235
00:23:10,920 --> 00:23:18,840
So bear with me while I adjust the code and see if this still works.

236
00:23:18,840 --> 00:23:21,880
So we'll look at it before and after, right?

237
00:23:21,880 --> 00:23:29,760
So basically, what I'm going to do here is I'm going to create a series, total retailers,

238
00:23:29,760 --> 00:23:33,840
total cultivators, and total licensees.

239
00:23:33,840 --> 00:23:41,240
And I would like to turn this into a time series where we can look at the total number

240
00:23:41,240 --> 00:23:46,780
of these license types and total licensees over time.

241
00:23:46,780 --> 00:23:55,120
So we can see how the total number of retailers may be growing or decreasing over time.

242
00:23:55,120 --> 00:24:10,480
So let's just go ahead and code in the activity date part now just to go ahead and be precise

243
00:24:10,480 --> 00:24:12,600
about this.

244
00:24:12,600 --> 00:24:17,840
Actually, let's do it before and after.

245
00:24:17,840 --> 00:24:23,440
So this is what I calculated this morning where I just say, OK, this is everybody.

246
00:24:23,440 --> 00:24:30,800
For example, I'm going to count the total retailers as where the licensees, license

247
00:24:30,800 --> 00:24:37,640
type is equal to, uh-oh, this should actually be retailer.

248
00:24:37,640 --> 00:24:40,040
So I didn't do it 100% right this morning.

249
00:24:40,040 --> 00:24:47,880
But here, I want to say, OK, where the license type is equal to the marijuana retailer and

250
00:24:47,880 --> 00:24:54,760
the app create date is less than or equal to that time stamp.

251
00:24:54,760 --> 00:24:59,560
And here, I'm just iterating over the production, right?

252
00:24:59,560 --> 00:25:03,440
But here, I'm just iterating over the index.

253
00:25:03,440 --> 00:25:10,560
And I'm saying, OK, the total retailers is where the date is less than or equal to 10/15,

254
00:25:10,560 --> 00:25:14,480
less than or equal to 10/16.

255
00:25:14,480 --> 00:25:22,080
So let's just do that, and then we'll account for exits.
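
The counting logic just described can be sketched like this: for every day in a range, count the licensees of a given type whose application was created on or before that day. The field names `license_type` and `app_create_date`, the value `"Marijuana Retailer"`, and the sample records are assumptions for illustration and may not match the actual API fields. As in the talk's first pass, exits are not accounted for.

```python
from datetime import date, timedelta

# Hypothetical licensee records; real data would come from the open data API.
licensees = [
    {"license_type": "Marijuana Retailer", "app_create_date": date(2021, 4, 26)},
    {"license_type": "Marijuana Cultivator", "app_create_date": date(2021, 4, 27)},
    {"license_type": "Marijuana Retailer", "app_create_date": date(2021, 4, 28)},
]


def count_on_date(records, license_type, on_date):
    """Count licensees of one type whose application predates or equals `on_date`."""
    return sum(
        1
        for record in records
        if record["license_type"] == license_type
        and record["app_create_date"] <= on_date
    )


def count_over_time(records, license_type, start, end):
    """Build a daily series of cumulative licensee counts from `start` to `end`."""
    days = (end - start).days
    return {
        start + timedelta(days=i): count_on_date(
            records, license_type, start + timedelta(days=i)
        )
        for i in range(days + 1)
    }


total_retailers = count_over_time(
    licensees, "Marijuana Retailer", date(2021, 4, 25), date(2021, 4, 29)
)
for day, count in total_retailers.items():
    print(day, count)
```

Because nothing is ever subtracted, the series can only stay flat or rise, which is why historic values may be biased if licensees have exited.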

256
00:25:22,080 --> 00:25:28,040
OK.

257
00:25:28,040 --> 00:25:32,920
We need some packages here for plotting.

258
00:25:32,920 --> 00:25:36,240
Let's try this one more time.

259
00:25:36,240 --> 00:25:38,800
Awesome.

260
00:25:38,800 --> 00:25:43,520
So here, we actually, this is my first go at it.

261
00:25:43,520 --> 00:25:45,920
We'll adjust it in a second.

262
00:25:45,920 --> 00:25:52,000
But this is the first go at creating.

263
00:25:52,000 --> 00:25:54,960
Well, I kind of jumped the gun.

264
00:25:54,960 --> 00:25:56,040
Let's go back to this.

265
00:25:56,040 --> 00:26:04,440
So this is our attempt at calculating total retailers over time.

266
00:26:04,440 --> 00:26:09,520
And so, well, here, I actually aggregated it into the week.

267
00:26:09,520 --> 00:26:14,480
But we can look at it daily as well.

268
00:26:14,480 --> 00:26:22,240
So you can see, going up, a real interesting curve here, right?

269
00:26:22,240 --> 00:26:37,400
And so it looks like the market is reaching almost a natural level here of retailers

270
00:26:37,400 --> 00:26:40,480
that the market can sustain.

271
00:26:40,480 --> 00:26:41,200
Right?

272
00:26:41,200 --> 00:26:45,040
So you have retailers entering.

273
00:26:45,040 --> 00:26:50,480
You start off with, what do you start off with?

274
00:26:55,040 --> 00:26:59,400
Start off with 118 retailers in the market.

275
00:26:59,400 --> 00:27:03,200
And you end up with 379.

276
00:27:03,200 --> 00:27:06,800
And there's the projected path.

277
00:27:06,800 --> 00:27:09,440
Let's go ahead and correct this real quick.

278
00:27:09,440 --> 00:27:14,440
And right, because just because a retailer entered,

279
00:27:14,440 --> 00:27:18,640
they may have had a faulty business model.

280
00:27:18,640 --> 00:27:28,520
Or they may have cut some corners and got dinged and got their license suspended.

281
00:27:28,520 --> 00:27:30,040
You know, it's not impossible.

282
00:27:30,040 --> 00:27:31,840
Like, these things happen.

283
00:27:31,840 --> 00:27:38,600
Or they decided mining cryptocurrencies was more profitable,

284
00:27:38,600 --> 00:27:42,040
and they exited the industry entirely.

285
00:27:42,040 --> 00:27:46,240
So who knows what's going on?

286
00:27:46,240 --> 00:27:48,760
But there's always exits.

287
00:27:48,760 --> 00:27:53,280
And this is something that's rarely, or not rarely,

288
00:27:53,280 --> 00:28:00,080
but proportionally neglected in the field of economics,

289
00:28:00,080 --> 00:28:02,600
is the way I like to phrase it.

290
00:28:02,600 --> 00:28:06,760
Not many people like to talk about or research exit.

291
00:28:06,760 --> 00:28:07,280
Right?

292
00:28:07,280 --> 00:28:10,800
Like, of course, people are looking at market entry.

293
00:28:10,800 --> 00:28:11,320
Right?

294
00:28:11,320 --> 00:28:13,440
And so that gets a lot of focus.

295
00:28:13,440 --> 00:28:16,400
And then, of course, production gets a lot of focus,

296
00:28:16,400 --> 00:28:20,560
how people are actually operating and performing in the market, right?

297
00:28:20,560 --> 00:28:23,560
How profitable people are.

298
00:28:23,560 --> 00:28:26,840
And profitability has to do with your exit.

299
00:28:26,840 --> 00:28:27,640
Right?

300
00:28:27,640 --> 00:28:32,200
But there's such a thing as a strategic exit.

301
00:28:32,200 --> 00:28:35,000
Knowing when do you exit?

302
00:28:35,000 --> 00:28:39,240
Just because you're in the red doesn't necessarily mean you should exit.

303
00:28:39,240 --> 00:28:39,760
Right?

304
00:28:39,760 --> 00:28:43,320
So it depends a lot on your fixed costs, your variable costs,

305
00:28:43,320 --> 00:28:47,680
the price in the market, the future trajectory.

306
00:28:47,680 --> 00:28:50,480
So there's a lot of factors that go into exit.

307
00:28:50,480 --> 00:28:54,080
So I think there's a lot of analysis that can be done simply

308
00:28:54,080 --> 00:28:56,680
on predicting firm exit.

309
00:28:56,680 --> 00:29:02,080
So if we're able to look at when these firms exit, why?

310
00:29:02,080 --> 00:29:09,680
Are firms in particular geographic regions exiting at different rates?

311
00:29:12,520 --> 00:29:14,960
You know, there's so, right?

312
00:29:14,960 --> 00:29:17,600
So that may be something to pinpoint, right?

313
00:29:17,600 --> 00:29:19,680
You may want to look at a map and see if there's

314
00:29:19,680 --> 00:29:23,640
certain geographic areas where there's a high exit rate.

315
00:29:23,640 --> 00:29:24,160
Right?

316
00:29:24,160 --> 00:29:30,040
And people there may need a bit more business assistance or what have you.

317
00:29:30,040 --> 00:29:37,280
So long story short, lots of fruitful analysis that can be done.

318
00:29:37,280 --> 00:29:47,320
Without boring you too much on that, let's see if we can't parse out exits

319
00:29:47,320 --> 00:29:49,840
by the activity date.

320
00:29:49,840 --> 00:29:55,400
So first off, let's just look at this activity date

321
00:29:55,400 --> 00:29:57,400
and see what's going on with it.

322
00:30:10,280 --> 00:30:12,800
It looks like, unfortunately, we're not going

323
00:30:12,800 --> 00:30:15,280
to be able to get exits out of this because it looks

324
00:30:15,280 --> 00:30:18,560
like there's only one unique value.

325
00:30:18,560 --> 00:30:24,080
So that's unfortunate.

326
00:30:24,080 --> 00:30:38,000
So as much as I was just hammering on about exits,

327
00:30:38,000 --> 00:30:41,520
I'm not sure if we can actually

328
00:30:41,520 --> 00:30:44,480
get exits out of this data.

329
00:30:44,480 --> 00:30:48,800
So we have when the app was created.

330
00:30:48,800 --> 00:30:50,280
Ch-ch-ch-ch-ch-ch-ch-ch.

331
00:30:54,320 --> 00:30:59,440
And we may want to check the application status to make sure no one's

332
00:30:59,440 --> 00:31:00,520
got unapproved.

333
00:31:00,520 --> 00:31:06,680
But I think, unfortunately, we're not going to be able to parse out exits.

334
00:31:06,680 --> 00:31:11,160
So I am going to have to hedge all further analysis

335
00:31:11,160 --> 00:31:15,400
about these specific statistics on the fact

336
00:31:15,400 --> 00:31:20,800
that it's not going to account for exits, which is sort of a big deal.

337
00:31:26,240 --> 00:31:30,040
Let's just see what's going on with the application status just

338
00:31:30,040 --> 00:31:35,640
to see if there's any suspended licenses or anything of that sort

339
00:31:35,640 --> 00:31:48,200
that we may need to be worried about.

340
00:31:48,200 --> 00:31:56,080
So I'm going to go ahead and keep steamrolling forward here.

341
00:31:56,080 --> 00:32:02,560
However, I think this is worth looking at for other interested parties

342
00:32:02,560 --> 00:32:08,520
is I'm just going to go ahead and count all the retailers,

343
00:32:08,520 --> 00:32:13,600
even if they potentially have exited or we're not accounting for exits.

344
00:32:13,600 --> 00:32:18,280
So that may bias historic values.

345
00:32:18,280 --> 00:32:23,200
Plus, I'm not going to be accounting for this approved license type right now.

346
00:32:23,200 --> 00:32:26,520
So this may be worthwhile looking at seeing, OK,

347
00:32:26,520 --> 00:32:30,640
is this a provisional consideration, provisional license,

348
00:32:30,640 --> 00:32:32,960
final license, or in process?

349
00:32:32,960 --> 00:32:37,560
Because if they're in process, they may not be actually selling yet.

350
00:32:37,560 --> 00:32:44,080
So you may just want to look at the total number of final licenses over time.

351
00:32:44,080 --> 00:32:45,400
So you can do that.

352
00:32:45,400 --> 00:32:48,200
The numbers will change.

353
00:32:48,200 --> 00:32:51,320
But I'm going to leave that for interested parties

354
00:32:51,320 --> 00:32:57,040
here because I don't want to get too bogged down in this.

355
00:32:57,040 --> 00:33:00,960
But what Cannlytics is going to be doing is we're

356
00:33:00,960 --> 00:33:06,520
going to be recording these licenses over time.

357
00:33:06,520 --> 00:33:13,840
So that way, we actually can get an accurate count

358
00:33:13,840 --> 00:33:17,560
of the total retailers, total cultivators,

359
00:33:17,560 --> 00:33:25,280
and all the other businesses over time and an accurate count of entries

360
00:33:25,280 --> 00:33:33,840
and exits with those accurate counts that will open the door to rich analysis

361
00:33:33,840 --> 00:33:41,440
of market entry and exit, which we were just going on and on about earlier

362
00:33:41,440 --> 00:33:46,200
about how understudied, or potentially understudied, it is.

363
00:33:46,200 --> 00:33:48,960
And that's an opinion.

364
00:33:48,960 --> 00:33:53,120
But in my opinion, it's understudied.

365
00:33:53,120 --> 00:33:58,400
And there's fruitful insights that can be made.

366
00:33:58,400 --> 00:34:05,680
So without getting you too bogged down in this,

367
00:34:05,680 --> 00:34:09,120
why was this data interesting?

368
00:34:09,120 --> 00:34:14,240
Well, like we said, we could look at just the total number estimate.

369
00:34:14,240 --> 00:34:16,280
Remember, this is an estimate.

370
00:34:16,280 --> 00:34:22,560
These numbers have bias because I wasn't able to effectively account for exits.

371
00:34:22,560 --> 00:34:27,040
These are biased numbers.

372
00:34:27,040 --> 00:34:30,720
They're inaccurate or they may be inaccurate.

373
00:34:30,720 --> 00:34:36,680
But I always think a measure, even if it's imperfect,

374
00:34:36,680 --> 00:34:49,440
is better than no measure because before, all we had was a single point number.

375
00:34:49,440 --> 00:34:56,520
All we had was the total number of cultivators today, which was 285.

376
00:34:56,520 --> 00:35:06,400
Well, using the data we're given, we're able to estimate.

377
00:35:06,400 --> 00:35:13,360
I want to make this super clear here that we're inferring here.

378
00:35:13,360 --> 00:35:15,920
We're making an inference.

379
00:35:15,920 --> 00:35:21,120
This is not a precise statistic.

380
00:35:21,120 --> 00:35:23,280
That's why it's a statistic.

381
00:35:23,280 --> 00:35:26,880
It's not a data point.

382
00:35:26,880 --> 00:35:30,880
Well, it's not an observation.

383
00:35:30,880 --> 00:35:32,720
It's a statistic.

384
00:35:32,720 --> 00:35:40,760
So anyways, there's someone who knows more about statistics

385
00:35:40,760 --> 00:35:47,160
than I do for the correct terminology.

386
00:35:47,160 --> 00:35:51,480
But anywho, we're going from about 100 cultivators

387
00:35:51,480 --> 00:35:58,800
at the very beginning to almost 300, 285.

388
00:35:58,800 --> 00:36:17,640
So almost a 200% increase, so not bad.

389
00:36:17,640 --> 00:36:24,640
Anyways, let's stay on focus here and keep calculating statistics.

390
00:36:24,640 --> 00:36:29,640
We'll also look at the total number of licensees.

391
00:36:29,640 --> 00:36:33,640
So these are interesting time series.

392
00:36:33,640 --> 00:36:36,640
So like I said, these are imperfect.

393
00:36:36,640 --> 00:36:42,640
But we've just created three brand new time series here.

394
00:36:42,640 --> 00:36:52,640
And I would always encourage you to keep track of, find, record,

395
00:36:52,640 --> 00:36:56,640
and utilize time series data.

396
00:36:56,640 --> 00:37:01,640
So for those of you joining us, time series data,

397
00:37:01,640 --> 00:37:07,640
it's basically right here.

398
00:37:07,640 --> 00:37:12,640
It's a pair of observations.

399
00:37:12,640 --> 00:37:19,640
You're given a time and the value at that time.

400
00:37:19,640 --> 00:37:24,640
And so we're all used to this, daily weather.

401
00:37:24,640 --> 00:37:29,640
We're all familiar with time series.

402
00:37:29,640 --> 00:37:38,640
However, if you define it, there's

403
00:37:38,640 --> 00:37:45,640
a lot of interesting statistics you can do, such as forecasting.

404
00:37:45,640 --> 00:37:48,640
So all we're going to be doing with forecasting,

405
00:37:48,640 --> 00:37:52,640
we're not going to be using any economic models or theory.

406
00:37:52,640 --> 00:37:57,640
We're just going to be using a time series itself.

407
00:37:57,640 --> 00:38:02,640
So we're going to be saying, given this time series,

408
00:38:02,640 --> 00:38:05,640
can we extend this out a bit?

409
00:38:05,640 --> 00:38:08,640
Because you say, like your manager said,

410
00:38:08,640 --> 00:38:12,640
oh, will you predict this forward five months?

411
00:38:12,640 --> 00:38:17,640
Well, a naive prediction is tomorrow.

412
00:38:17,640 --> 00:38:19,640
And this is actually the definition.

413
00:38:19,640 --> 00:38:23,640
The naive forecast, the best naive forecast,

414
00:38:23,640 --> 00:38:27,640
is that tomorrow's value is the same as today's.

415
00:38:27,640 --> 00:38:29,640
That's the best you can do.

416
00:38:29,640 --> 00:38:34,640
And that's a pure autoregressive forecast.

417
00:38:34,640 --> 00:38:37,640
So you're just going to say, OK, whatever today is,

418
00:38:37,640 --> 00:38:39,640
that's what tomorrow's going to be.

419
00:38:39,640 --> 00:38:42,640
And then that's what the day after is going to be.

420
00:38:42,640 --> 00:38:44,640
It's better than nothing.

421
00:38:44,640 --> 00:38:47,640
It's better than saying zero.

422
00:38:47,640 --> 00:38:52,640
And so a naive forecast, you just draw a straight line out.

423
00:38:52,640 --> 00:38:56,640
And that would be your forecast.

424
00:38:56,640 --> 00:39:00,640
That's just sort of naively looking at past values

425
00:39:00,640 --> 00:39:02,640
and playing it forward.
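
[Editor's sketch] The naive forecast described above amounts to repeating the last observed value for every future period. The series values below are made up for illustration.

```python
def naive_forecast(history, horizon):
    """Repeat the last observed value for each of the next `horizon` periods."""
    if not history:
        raise ValueError("history must contain at least one observation")
    return [history[-1]] * horizon


# Illustrative weekly retailer counts, not the real series.
weekly_retailers = [371, 375, 379, 379, 379]
forecast = naive_forecast(weekly_retailers, horizon=5)
```

Plotted, this is the straight line drawn out from the last data point.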

426
00:39:02,640 --> 00:39:07,640
Well, we can get a bit more sophisticated than that.

427
00:39:07,640 --> 00:39:13,640
So we may be able to parse out some cyclical behavior

428
00:39:13,640 --> 00:39:14,640
or what have you.

429
00:39:14,640 --> 00:39:19,640
These series don't have too much variability.

430
00:39:22,640 --> 00:39:26,640
So they may not have rich, rich forecasts.

431
00:39:26,640 --> 00:39:28,640
But that may be expected.

432
00:39:28,640 --> 00:39:31,640
We may not really expect the total number of retailers

433
00:39:31,640 --> 00:39:36,640
to waver that much in the coming months.

434
00:39:36,640 --> 00:39:42,640
So if we look at the numbers, we're

435
00:39:42,640 --> 00:39:48,640
going to see that we've had 379 retailers

436
00:39:48,640 --> 00:39:50,640
since the beginning of October.

437
00:39:57,640 --> 00:40:02,640
So it looks like this number here is quite steady.

438
00:40:02,640 --> 00:40:05,640
So it looks like we've had about, in Massachusetts,

439
00:40:05,640 --> 00:40:09,640
there's been about 379 retailers going on the past three months

440
00:40:09,640 --> 00:40:10,640
now.

441
00:40:10,640 --> 00:40:12,640
So things have maybe stabilized.

442
00:40:15,640 --> 00:40:20,640
So that's stabilizing.

443
00:40:20,640 --> 00:40:24,640
But what did we see was all over the board?

444
00:40:27,640 --> 00:40:30,640
Sales, all over the board.

445
00:40:30,640 --> 00:40:31,640
All over the board.

446
00:40:31,640 --> 00:40:37,640
Like we saw, as you start aggregating,

447
00:40:37,640 --> 00:40:40,640
it's going to get slightly smoother.

448
00:40:40,640 --> 00:40:44,640
Weekly sales starting to smooth out.

449
00:40:44,640 --> 00:40:48,640
Monthly sales would be even smoother.

450
00:40:48,640 --> 00:40:50,640
But still variability.
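
[Editor's sketch] A minimal illustration of why aggregation smooths a series, using made-up daily sales with a weekend bump. The dates and amounts are assumptions chosen so the arithmetic is easy to check.

```python
from collections import defaultdict
from datetime import date, timedelta

# Made-up daily sales: 100 on weekdays, 150 on weekends, for four full weeks.
start = date(2021, 3, 1)  # a Monday
days = [start + timedelta(days=i) for i in range(28)]
daily_sales = {d: 150.0 if d.weekday() >= 5 else 100.0 for d in days}


def aggregate(series, key):
    """Sum a {date: value} series into buckets chosen by `key`."""
    totals = defaultdict(float)
    for day, value in series.items():
        totals[key(day)] += value
    return dict(totals)


# Bucket by (ISO year, ISO week), then by (year, month).
weekly_sales = aggregate(daily_sales, key=lambda d: tuple(d.isocalendar()[:2]))
monthly_sales = aggregate(daily_sales, key=lambda d: (d.year, d.month))
```

In this toy data the day-of-week swing disappears entirely at the weekly level; real sales keep some week-to-week variability, as noted above.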

451
00:40:50,640 --> 00:40:54,640
Well, that means there's going to be

452
00:40:54,640 --> 00:40:56,640
variability for the retailers.

453
00:40:56,640 --> 00:40:59,640
So what would be the best way to do that?

454
00:40:59,640 --> 00:41:01,640
Retailers.

455
00:41:01,640 --> 00:41:04,640
So what would be awesome would be

456
00:41:04,640 --> 00:41:08,640
to have the actual sales per retailer.

457
00:41:08,640 --> 00:41:11,640
And in some states, like Washington state,

458
00:41:11,640 --> 00:41:16,640
you can actually get the total sales per retailer.

459
00:41:16,640 --> 00:41:19,640
So you can do a Freedom of Information Act request.

460
00:41:19,640 --> 00:41:22,640
We've done this in previous meetups.

461
00:41:22,640 --> 00:41:26,640
You can get the sales data in Washington state.

462
00:41:26,640 --> 00:41:29,640
I haven't actually calculated this statistic,

463
00:41:29,640 --> 00:41:32,640
but still on my long-term to-do list.

464
00:41:32,640 --> 00:41:34,640
Would love to-do list.

465
00:41:34,640 --> 00:41:42,640
And that would be to calculate sales per retailer per day.

466
00:41:42,640 --> 00:41:47,640
So then you could actually get a breakdown

467
00:41:47,640 --> 00:41:53,640
of market concentration.

468
00:41:53,640 --> 00:41:59,640
You could actually calculate the market share

469
00:41:59,640 --> 00:42:04,640
for each of the firms, each of the retailers at least.
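
[Editor's sketch] Given sales per retailer, each firm's share is its sales divided by the market total. The Herfindahl-Hirschman index (sum of squared shares) is added here as one common concentration summary; the talk does not name it, and the retailer names and sales figures are hypothetical.

```python
def market_shares(sales_by_retailer):
    """Each retailer's fraction of total sales."""
    total = sum(sales_by_retailer.values())
    return {name: sales / total for name, sales in sales_by_retailer.items()}


def herfindahl_index(shares):
    """Sum of squared shares: 1/N for N equal firms, 1.0 for a monopoly."""
    return sum(share ** 2 for share in shares.values())


# Hypothetical sales for one period.
sales = {"Retailer A": 50_000, "Retailer B": 30_000, "Retailer C": 20_000}
shares = market_shares(sales)
hhi = herfindahl_index(shares)
```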

470
00:42:04,640 --> 00:42:07,640
So there's a lot of real interesting analysis

471
00:42:07,640 --> 00:42:09,640
you can do in Washington state.

472
00:42:09,640 --> 00:42:13,640
The problem is 100-plus gigabytes of data.

473
00:42:13,640 --> 00:42:15,640
It's hard to work with.

474
00:42:15,640 --> 00:42:20,640
So that's why we have to leverage powerful tools.

475
00:42:20,640 --> 00:42:24,640
And Cannlytics and others provide those.

476
00:42:24,640 --> 00:42:32,640
So for here, we're going to do an estimate,

477
00:42:32,640 --> 00:42:34,640
not going to be exact.

478
00:42:34,640 --> 00:42:40,640
So we can at least create a benchmark for the retailers.

479
00:42:40,640 --> 00:42:42,640
So we can say, OK, retailers.

480
00:42:42,640 --> 00:42:49,640
OK, what's the average number of retailers

481
00:42:49,640 --> 00:42:52,640
that are present on any given week?

482
00:42:56,640 --> 00:43:01,640
And then we can say, what's the average number of sales

483
00:43:01,640 --> 00:43:03,640
on any given week?

484
00:43:03,640 --> 00:43:06,640
And then a useful benchmark would just

485
00:43:06,640 --> 00:43:11,640
be what's the sales per retailer?

486
00:43:11,640 --> 00:43:12,640
Awesome.

487
00:43:12,640 --> 00:43:14,640
Awesome, awesome, awesome.

488
00:43:14,640 --> 00:43:18,640
And this is a number, a statistic, an estimate.

489
00:43:18,640 --> 00:43:25,640
An estimated statistic that business managers,

490
00:43:25,640 --> 00:43:33,640
investment firms or individuals would be interested in.

491
00:43:33,640 --> 00:43:42,640
So this is the average amount of sales per retailer.

492
00:43:42,640 --> 00:43:44,640
And even your general manager is going

493
00:43:44,640 --> 00:43:45,640
to be interested in this.

494
00:43:45,640 --> 00:43:48,640
Because you want to, and this is by week.

495
00:43:48,640 --> 00:43:54,640
And you could aggregate it and do it by month as well.

496
00:43:54,640 --> 00:43:59,640
So this is the average sales per week per retailer.
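
[Editor's sketch] The benchmark is one weekly series divided by another, and a firm can then be compared against it. The input values are made up and chosen so the average lands near the 80,000 figure mentioned in the talk.

```python
def per_unit(totals, counts):
    """Divide one weekly series by another, week by week."""
    return [total / count for total, count in zip(totals, counts)]


def percent_vs_average(own, average):
    """How far a firm sits above (+) or below (-) the benchmark, in percent."""
    return 100.0 * (own - average) / average


# Illustrative weekly totals, not the actual Massachusetts figures.
weekly_sales = [28_000_000, 29_500_000, 30_320_000]
weekly_retailers = [350, 369, 379]
avg_sales_per_retailer = per_unit(weekly_sales, weekly_retailers)

above = percent_vs_average(100_000, 80_000)  # a retailer doing 100,000 a week
below = percent_vs_average(20_000, 80_000)   # a retailer doing 20,000 a week
```

The same division gives plants per cultivator or employees per licensee when the inputs are swapped.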

497
00:43:59,640 --> 00:44:09,640
So if you're a retailer, are your sales above or below average?

498
00:44:09,640 --> 00:44:20,640
So if you're at this date only doing about 20,000 a week in sales,

499
00:44:20,640 --> 00:44:28,640
then you're potentially underperforming the average.

500
00:44:28,640 --> 00:44:30,640
And so if you're a general manager,

501
00:44:30,640 --> 00:44:38,640
you may want to be looking for ways you can get ahead and improve.

502
00:44:38,640 --> 00:44:41,640
Get your profits up.

503
00:44:41,640 --> 00:44:43,640
Maybe you need to do some advertising

504
00:44:43,640 --> 00:44:50,640
or find a better location or you name it.

505
00:44:50,640 --> 00:44:55,640
Conversely, if you're significantly above the average,

506
00:44:55,640 --> 00:44:59,640
say you're doing 100,000 a week in sales,

507
00:44:59,640 --> 00:45:03,640
well, you may be feeling a little good about yourself.

508
00:45:03,640 --> 00:45:06,640
You may want to make sure you're not slipping.

509
00:45:06,640 --> 00:45:08,640
You're staying above average.

510
00:45:08,640 --> 00:45:11,640
You may want to do a self-analysis and say, OK,

511
00:45:11,640 --> 00:45:16,640
what are we doing that makes us perform above average?

512
00:45:16,640 --> 00:45:20,640
And maybe we can capitalize on that.

513
00:45:20,640 --> 00:45:25,640
So if you can figure out why you're doing better than others,

514
00:45:25,640 --> 00:45:29,640
maybe you can do more of that.

515
00:45:29,640 --> 00:45:34,640
So maybe you can find your niche or your comparative advantage.

516
00:45:34,640 --> 00:45:38,640
And you can just keep performing better and better.

517
00:45:38,640 --> 00:45:41,640
Or maybe you're looking for investment.

518
00:45:41,640 --> 00:45:45,640
You can go sell yourself to investors and say, hey,

519
00:45:45,640 --> 00:45:52,640
look, the average sales in Massachusetts is X, around 80,000.

520
00:45:52,640 --> 00:45:59,640
We're performing 25% above average.

521
00:45:59,640 --> 00:46:00,640
Invest in us.

522
00:46:00,640 --> 00:46:05,640
We're one of the outliers.

523
00:46:05,640 --> 00:46:10,640
So that's something interesting you can do.

524
00:46:10,640 --> 00:46:15,640
Well, hey, the cultivators say, yeah, you're leaving us out of all the fun.

525
00:46:15,640 --> 00:46:16,640
Well, don't worry.

526
00:46:16,640 --> 00:46:18,640
We've got something for you, too.

527
00:46:18,640 --> 00:46:26,640
So for the cultivators, well, we've calculated the weekly number of plants

528
00:46:26,640 --> 00:46:28,640
going up and up and up.

529
00:46:28,640 --> 00:46:35,640
Well, the total number of cultivators also going up and up and up.

530
00:46:35,640 --> 00:46:38,640
Interestingly, hitting a plateau.

531
00:46:38,640 --> 00:46:41,640
Quite interesting.

532
00:46:41,640 --> 00:46:48,640
Well, how can we gauge the performance of a retailer?

533
00:46:48,640 --> 00:46:50,640
I mean, of a cultivator.

534
00:46:50,640 --> 00:46:56,640
It would be awesome to have wholesale sales.

535
00:46:56,640 --> 00:47:01,640
Short of wholesale sales, we can use plants.

536
00:47:01,640 --> 00:47:05,640
And so this will proxy the size of the cultivators.

537
00:47:05,640 --> 00:47:06,640
You know, it's not perfect.

538
00:47:06,640 --> 00:47:12,640
Some cultivators have different growing styles.

539
00:47:12,640 --> 00:47:17,640
So maybe your growing style favors more plants or less plants.

540
00:47:17,640 --> 00:47:24,640
So you may grow these big bushy plants, these ginormous plants.

541
00:47:24,640 --> 00:47:32,640
Or you may do what's called like a sea of green approach, where you just have tons of smaller, shorter plants.

542
00:47:32,640 --> 00:47:35,640
So you may.

543
00:47:35,640 --> 00:47:38,640
I've heard of people stacking plants.

544
00:47:38,640 --> 00:47:40,640
I've heard that may not be super successful.

545
00:47:40,640 --> 00:47:42,640
But what do I know?

546
00:47:42,640 --> 00:47:43,640
You know, I'm not a cultivator.

547
00:47:43,640 --> 00:47:48,640
So all these cultivators, I'm sure, have different growing styles.

548
00:47:48,640 --> 00:47:55,640
So one cultivator's plant is probably not equal to another cultivator's plant.

549
00:47:55,640 --> 00:48:03,640
So it would be awesome to maybe look at yields or what have you.

550
00:48:03,640 --> 00:48:06,640
Like I said, wholesale sales, it's hard to beat sales.

551
00:48:06,640 --> 00:48:09,640
But anywho, there's other measures.

552
00:48:09,640 --> 00:48:12,640
We're going to use the data we're given.

553
00:48:12,640 --> 00:48:18,640
And we'll just calculate plants per cultivator.

554
00:48:18,640 --> 00:48:21,640
And I think this is interesting.

555
00:48:21,640 --> 00:48:28,640
Right. So what you see is.

556
00:48:28,640 --> 00:48:33,640
From what the data is showing, there's been a recent dip.

557
00:48:33,640 --> 00:48:45,640
So like we said, we can't read too much into these statistics, but it's like, you know, why the dip?

558
00:48:45,640 --> 00:48:47,640
Why the dip?

559
00:48:47,640 --> 00:48:55,640
You know, there's still the same number of cultivators, but there's just not as many plants.

560
00:48:55,640 --> 00:49:01,640
So, you know, this could be harvest season.

561
00:49:01,640 --> 00:49:07,640
But I just don't know if that's harvest season, right, because.

562
00:49:07,640 --> 00:49:14,640
You know, there have been prior harvest seasons and we haven't seen this extraordinary dip.

563
00:49:14,640 --> 00:49:19,640
So, you know, it may have been economies of scale.

564
00:49:19,640 --> 00:49:29,640
Right. These growers, they may have reached their minimum long run average cost.

565
00:49:29,640 --> 00:49:35,640
And I'm not going to get into.

566
00:49:35,640 --> 00:49:38,640
Cost functions today, maybe another day.

567
00:49:38,640 --> 00:49:41,640
This is actually something that I know pretty well.

568
00:49:41,640 --> 00:49:47,640
So.

569
00:49:47,640 --> 00:49:53,640
So long story short, these cultivators may have been exploiting economies of scale.

570
00:49:53,640 --> 00:49:58,640
Right. So they're going to keep growing more and more plants at a lower and lower cost.

571
00:49:58,640 --> 00:50:03,640
Lower and lower average cost per plant.

572
00:50:03,640 --> 00:50:09,640
However, the average cost per plant can only get so low.

573
00:50:09,640 --> 00:50:18,640
And then at a certain point, if you want to produce plants above that, your average cost is going to be increasing again.
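
[Editor's sketch] A U-shaped average cost curve of the kind described can be illustrated with a simple functional form: a fixed cost spread over more plants, a constant per-plant cost, and a crowding cost that grows with scale. The functional form and all parameter values are assumptions for illustration, not estimates for any cultivator.

```python
import math

# Hypothetical parameters, chosen only so the numbers are easy to read.
FIXED_COST = 64_000.0  # facility cost spread over all plants
UNIT_COST = 20.0       # constant cost per plant
CROWDING = 0.1         # extra cost per plant that grows with scale


def average_cost(plants):
    """Average cost per plant: falls at first, then rises again."""
    return FIXED_COST / plants + UNIT_COST + CROWDING * plants


# For this form, average cost is lowest where plants = sqrt(FIXED_COST / CROWDING).
efficient_scale = math.sqrt(FIXED_COST / CROWDING)

costs = {n: average_cost(n) for n in (400, 800, 1600)}
```

Below the efficient scale, adding plants lowers average cost; above it, each extra plant raises it.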

574
00:50:18,640 --> 00:50:21,640
So.

575
00:50:21,640 --> 00:50:36,640
You know, right. So it's like, if you really want to push the gas that much further, like you really, really have to add that thousand-and-first plant,

576
00:50:36,640 --> 00:50:44,640
you're going to need a whole other facility. Costs are going to go through the roof, you know. So there's, you know,

577
00:50:44,640 --> 00:50:55,640
There are like limits to these economies of scale. And so I'm not, I'm not by any means saying that these economies of scale have been reached.

578
00:50:55,640 --> 00:51:01,640
But I just find it interesting that you see this dip. It may be entirely.

579
00:51:01,640 --> 00:51:13,640
Explained by other factors, right. There could be a lull in the business cycle. There may be something going on in the capital markets. So people may have a hard time getting the investment they need.

580
00:51:13,640 --> 00:51:23,640
There could be something going on in the labor markets, right. And so these are

581
00:51:23,640 --> 00:51:36,640
With production inputs, right. So if you have something going on in the labor market that could affect your capital supply, like, you know how much capital you need to use.

582
00:51:36,640 --> 00:51:41,640
So there's a lot of factors going in here.

583
00:51:41,640 --> 00:51:53,640
But once again, we're given a rough measure. So if you're a cultivator and you know that you're only growing 100 plants.

584
00:51:53,640 --> 00:51:59,640
At any given time, then you're, you know, significantly below average.

585
00:51:59,640 --> 00:52:04,640
But maybe you're a cultivator and you've got 10,000 plants.

586
00:52:04,640 --> 00:52:17,640
I'm not 100% sure what the regulations are in Massachusetts on plant counts. But there's undoubtedly people that have above average plant counts and people that are below average.

587
00:52:17,640 --> 00:52:24,640
Just because you're at a below average plant count doesn't necessarily mean you're.

588
00:52:24,640 --> 00:52:35,640
Running your business poorly, right. Maybe you can yield more with 600 plants than somebody can yield with 1000 plants.

589
00:52:35,640 --> 00:52:47,640
Who knows. But it's a measure, right. And that's what we're here to say: a measure is better than no measure.

590
00:52:47,640 --> 00:53:00,640
And so, you know, at this date, we can say that, you know, the average number of plants per cultivator is about 800.

591
00:53:00,640 --> 00:53:09,640
800 plants. So that's a sizeable amount. That's more than a home cultivation for sure.

592
00:53:09,640 --> 00:53:19,640
So there you have it. Some brand new statistics. So

593
00:53:19,640 --> 00:53:29,640
just to keep adding statistics on statistics, it's like how many statistics can we do in five minutes? Well, about to show you we can do a

594
00:53:29,640 --> 00:53:40,640
lot of statistics in five minutes. So let's do another one. Employees per licensee, right.

595
00:53:40,640 --> 00:53:48,640
So we saw that, oh, we can calculate the total number of licensees.

596
00:53:48,640 --> 00:53:53,640
Estimate the total number of licensees. Check that box.

597
00:53:53,640 --> 00:54:00,640
Well, we can see how many weekly employees there are.

598
00:54:00,640 --> 00:54:15,640
Right. And so this is interesting, right. Employees going up and up and up and up. Right. So maybe it's not the labor markets that's explaining what's going on with plants, unless there's something funny going on with wages, which

599
00:54:15,640 --> 00:54:27,640
easily may be, right, because we were looking at prices and inflation in prior weeks. We were having a hard time estimating the competitive wage in Massachusetts.

600
00:54:27,640 --> 00:54:43,640
If you'll look below, that's on our to do list to revisit that. So perhaps next week we're going to revisit competitive wages and interest rates in Massachusetts, because I still want to estimate those and forecast them.

601
00:54:43,640 --> 00:54:57,640
And I believe we were saying that wages were rising. Don't quote me on that. They may have been falling. Wages may have been falling. But for some reason, I want to say they were rising.

602
00:54:57,640 --> 00:55:14,640
So whatever's going on with the price of labor, the wage, and the price of capital, the interest rate, that's going to affect your production inputs.

603
00:55:14,640 --> 00:55:33,640
So one of those is labor. Once again, you're going to have big firms and small firms, but we can see, okay, about how many employees are there per license.

604
00:55:33,640 --> 00:55:50,640
And we found this interesting. So looks like at the very beginning, they're short staffed. You know, you haven't, like you're starting with really few employees here.

605
00:55:50,640 --> 00:56:10,640
So you're starting to see less, you know, so these first days may not really be very representative. But, you know, let's just say you're out in January or so, you have around two employees per license.

606
00:56:10,640 --> 00:56:23,640
And so you see the firm size is increasing over time, and it hasn't hit a plateau yet. So there could still be economies of scale in labor, right?

607
00:56:23,640 --> 00:56:33,640
So this would be specialization, right? So the more employees you bring on board, the more specialized they can get, right?

608
00:56:33,640 --> 00:56:41,640
So when you only have two employees, they're going to be doing almost everything. And you see this in businesses, right?

609
00:56:41,640 --> 00:56:50,640
You see maybe the founder and the partner or the first hire, just doing an extraordinary amount of tasks.

610
00:56:50,640 --> 00:56:56,640
And then they keep bringing on more and more people and you can get specialized. So I work a lot with laboratories.

611
00:56:56,640 --> 00:57:09,640
And so you'll see this at the laboratory. So you can bring on new chemists and new analysts. And these chemists may at first do many tasks and then they can get specialized.

612
00:57:09,640 --> 00:57:20,640
So you may eventually have a chemist whose specialty is testing for heavy metals or another chemist whose specialty is testing for aromatics.

613
00:57:20,640 --> 00:57:30,640
So that would be terpenes and your residual solvents. And you would have another person who excels in microbiology.

614
00:57:30,640 --> 00:57:38,640
They're in the micro lab. So you can start specializing and the same is true with the cultivation.

615
00:57:38,640 --> 00:57:46,640
I'm sure the same is true in retail to a certain extent. So you can get people to specialize.

616
00:57:46,640 --> 00:57:58,640
Awesome. Well, we promised forecasts. So let's forecast.

617
00:57:58,640 --> 00:58:06,640
And I'm going to just do it briefly and then we can go back into it next time.

618
00:58:06,640 --> 00:58:16,640
So long story short, we're using these historic time periods to forecast.

619
00:58:16,640 --> 00:58:22,640
And I'm going to do this more in depth next time. I'm just going to show it to you today just so you can see it.

620
00:58:22,640 --> 00:58:29,640
And then we'll go into it next week. And then for those of you in Saturday Statistics,

621
00:58:29,640 --> 00:58:39,640
we can do some real interesting comparative analysis and we go a little bit more in depth in Saturday Statistics.

622
00:58:39,640 --> 00:58:49,640
So make sure you attend if you can. So without further ado, we've got our weekly sales.

623
00:58:49,640 --> 00:58:56,640
Define our forecast horizon: the next 60 weeks.

624
00:58:56,640 --> 00:59:01,640
I'm going to go into this more in depth next time, but I'm going to add month fixed effects.

625
00:59:01,640 --> 00:59:09,640
So we're basically going to add an effect for which month it is.

626
00:59:09,640 --> 00:59:21,640
So we're going to control for the fact that January may be different than April and April may be different than August.

627
00:59:21,640 --> 00:59:30,640
We need a baseline. So I'm going to exclude January. So we're basically going to compare every single month to January.

628
00:59:30,640 --> 00:59:37,640
So April is X percent different than January.

629
00:59:37,640 --> 00:59:42,640
August is X percent different than January, so on and so forth.
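
[Editor's sketch] Month fixed effects with January as the baseline can be built as eleven indicator variables, one per non-January month, for both the historic weeks and the forecast horizon. The last observed date below is a hypothetical placeholder.

```python
from datetime import date, timedelta

# February..December; January is the omitted baseline month.
MONTHS = range(2, 13)


def month_dummies(day):
    """Eleven 0/1 indicators; all zeros means the date falls in January."""
    return [1 if day.month == m else 0 for m in MONTHS]


def future_weeks(last_observed, horizon):
    """Weekly dates for the next `horizon` weeks after the last observation."""
    return [last_observed + timedelta(weeks=i) for i in range(1, horizon + 1)]


# Hypothetical last observed week; the horizon of 60 weeks follows the talk.
horizon_dates = future_weeks(date(2021, 10, 31), horizon=60)
future_exog = [month_dummies(d) for d in horizon_dates]
```

Each coefficient on these indicators is a difference from January in the units of the series; it reads as a percent difference only if the series is modelled in logs.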

630
00:59:42,640 --> 00:59:50,640
So we're going to look at the month fixed effects and then we're just going to use past historic values using the auto-ARIMA model.

631
00:59:50,640 --> 01:00:00,640
Which you could argue is a form of machine learning if you repeatedly feed it new data.

632
01:00:00,640 --> 01:00:09,640
And we're basically going to let the computer and statistics fit the best forecasting model for us.

633
01:00:09,640 --> 01:00:17,640
So we're going to let it try a bunch of different models, whichever model fits best in sample.

634
01:00:17,640 --> 01:00:28,640
We're going to use for predicting and then it's just playing it forward using past values to predict future values.
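
[Editor's sketch] The idea of trying several models and keeping the one that fits best in sample can be shown with two candidates scored by AIC: a constant-only model and an AR(1) with a constant. This is a hand-rolled illustration, not the auto-ARIMA routine used in the session, which searches many more specifications; the data are made up.

```python
import math


def fit_mean_only(y):
    """Predict every value with the sample mean. Returns (SSE, n_params)."""
    target = y[1:]
    mu = sum(target) / len(target)
    return sum((v - mu) ** 2 for v in target), 1


def fit_ar1(y):
    """OLS of y[t] on a constant and y[t-1]. Returns (SSE, n_params)."""
    x, target = y[:-1], y[1:]
    n = len(target)
    mx, my = sum(x) / n, sum(target) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, target))
    phi = sxy / sxx
    const = my - phi * mx
    sse = sum((b - const - phi * a) ** 2 for a, b in zip(x, target))
    return sse, 2


def aic(sse, n_obs, n_params):
    """Gaussian AIC up to an additive constant; lower is better."""
    return n_obs * math.log(sse / n_obs) + 2 * n_params


CANDIDATES = {"mean only": fit_mean_only, "AR(1)": fit_ar1}


def select_model(y):
    n = len(y) - 1  # both candidates are scored on the same targets
    scores = {}
    for name, fit in CANDIDATES.items():
        sse, k = fit(y)
        scores[name] = aic(sse, n, k)
    return min(scores, key=scores.get), scores


# Made-up persistent series.
series = [10, 12, 15, 14, 18, 21, 20, 24, 27, 26, 30, 33]
best, scores = select_model(series)
```

The AIC penalty on the parameter count keeps the search from always picking the largest model.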

635
01:00:28,640 --> 01:00:38,640
So without further ado, also note I'm restricting the time frame from essentially August.

636
01:00:38,640 --> 01:00:50,640
I mean, June of 2020 onwards because this avoids the gap where business was closed in Massachusetts.

637
01:00:50,640 --> 01:00:59,640
There's a couple ways you could do this. One, include the gap. Two, exclude the gap, like cut it out.

638
01:00:59,640 --> 01:01:09,640
But then that may mess up your frequency. I thought for simplicity's sake, since there may have been a structural change,

639
01:01:09,640 --> 01:01:17,640
foreshadowing that occurred, that may have occurred during the pandemic,

640
01:01:17,640 --> 01:01:26,640
then I think the forecasting model, in my personal opinion, is going to be different post-pandemic than pre-pandemic.

641
01:01:26,640 --> 01:01:33,640
I still want to include a long time range and I don't like missing data.

642
01:01:33,640 --> 01:01:45,640
So I arbitrarily picked 2020-06-01 to begin training for forecasting. This is my judgment.
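
[Editor's sketch] Restricting the training window is a filter on the observation dates. The cutoff is the one named in the talk; the sample values are made up.

```python
from datetime import date

TRAIN_START = date(2020, 6, 1)  # judgment call named in the talk


def training_window(series, start=TRAIN_START):
    """Keep only observations on or after the chosen start date."""
    return {day: value for day, value in series.items() if day >= start}


# Illustrative weekly sales (millions), including the closure gap.
weekly_sales = {
    date(2020, 3, 15): 9.1,
    date(2020, 4, 5): 0.0,
    date(2020, 5, 31): 6.4,
    date(2020, 6, 7): 10.2,
    date(2020, 6, 14): 10.8,
}
train = training_window(weekly_sales)
```

Keeping the cutoff as a named constant makes the assumption easy to state when presenting the forecast.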

643
01:01:45,640 --> 01:01:51,640
Please use your own judgment when you're forecasting, right? It's not a perfect science.

644
01:01:51,640 --> 01:02:01,640
There are judgment calls to be made and you should be upfront with your judgment calls to whoever you present your forecast to.

645
01:02:01,640 --> 01:02:12,640
It's simple to say, hey, I made these assumptions. I started my training, my data on this time frame because of this reason.

646
01:02:12,640 --> 01:02:21,640
Please take that into consideration. This is not the end all be all.

647
01:02:21,640 --> 01:02:31,640
I don't know what the word is, but we can't foresee the future. We can just use statistics.

648
01:02:31,640 --> 01:02:44,640
So long story short, let's just go ahead and play this series out. We're looking at weekly sales. Awesome.

649
01:02:44,640 --> 01:02:56,640
So here I fit the model. We're using 85 observations. I'm estimating an ARIMA model.

650
01:02:56,640 --> 01:03:11,640
An ARIMA(1,0,0). I don't actually think this is the best model, so we may revisit this next week, because I think we need to integrate this to forecast correctly without breaking our assumptions.

651
01:03:11,640 --> 01:03:14,640
And I'll explain more next week.

652
01:03:14,640 --> 01:03:26,640
So long story short, you see 11 fixed effects. And so these are comparing all the months to January.

653
01:03:26,640 --> 01:03:39,640
Because all the coefficients are positive, it actually looks like all the months are higher than January on average for sales.

654
01:03:39,640 --> 01:03:44,640
And then here's our autoregressive component.

655
01:03:44,640 --> 01:03:49,640
And this, I believe, is our constant.

656
01:03:49,640 --> 01:03:57,640
Oh, I wonder if I need to include a constant.

657
01:03:57,640 --> 01:04:10,640
So this is something that we'll revisit next week because I may be leaving out the constant, which would force our model through zero, which is not what we want.
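
[Editor's sketch] The concern about the constant can be seen by iterating an AR(1) forecast with and without one. With a coefficient below one in absolute value, the path settles at const / (1 - phi); with no constant, that long-run level is zero. The coefficient values are hypothetical, not the estimates from the session.

```python
def forecast_ar1(last_value, phi, const, horizon):
    """Iterate y[t+1] = const + phi * y[t] forward from the last observation."""
    path, value = [], last_value
    for _ in range(horizon):
        value = const + phi * value
        path.append(value)
    return path


# Hypothetical coefficients; the long-run level with a constant is 50 / (1 - 0.5) = 100.
with_const = forecast_ar1(100.0, phi=0.5, const=50.0, horizon=60)
without_const = forecast_ar1(100.0, phi=0.5, const=0.0, horizon=60)
```

When the month indicators omit January and there is no constant, the January forecast rests on the autoregressive term alone, so a pull toward zero in that month would be consistent with a missing constant.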

658
01:04:10,640 --> 01:04:16,640
So once again, I'm just going to present this, but we're going to revisit this next week. Do it correctly.

659
01:04:16,640 --> 01:04:19,640
And I'll explain it a bit more in depth.

660
01:04:19,640 --> 01:04:28,640
But long story short, just to show you some plots before concluding today, thanks for staying a handful of extra minutes.

661
01:04:28,640 --> 01:04:35,640
But we're basically, right, we made our forecast.

662
01:04:35,640 --> 01:04:45,640
And I'm going to go ahead and beautify this real quick. And so if you tune into Saturday Morning Statistics, you'll see how we go about making these charts.

663
01:04:45,640 --> 01:05:03,640
I've sort of boiled everything down right here, and I'm not going to go into it in depth. So this is why you should tune in to Saturday Morning Statistics, because I'll make sure that you can walk away at the end of the day knowing how to make a beautiful visualization.

664
01:05:03,640 --> 01:05:09,640
Beautiful visualizations is almost what data science is all about.

665
01:05:09,640 --> 01:05:15,640
He who controls the beautiful visualization controls the decision.

666
01:05:15,640 --> 01:05:20,640
May sound not intuitive, counterintuitive, but it's true.

667
01:05:20,640 --> 01:05:33,640
So arguably true. So long story short, try it yourself. Create some beautiful visualizations and

668
01:05:33,640 --> 01:05:40,640
see what that brings you.

669
01:05:40,640 --> 01:05:43,640
There's always some

670
01:05:43,640 --> 01:06:11,640
ogling to be done. Right, so maybe where do you want to slap on this legend? But there's still some work that can be done with this visualization, but here you see our rough estimate of sales in Massachusetts for the coming year.

671
01:06:11,640 --> 01:06:17,640
And we're adding the fixed effects for the months.

672
01:06:17,640 --> 01:06:30,640
And you see our prediction is that sales dip to zero in January. Will they actually dip to zero? Probably not. Will they dip? Maybe.

673
01:06:30,640 --> 01:06:36,640
So, and then this is where you know the models,

674
01:06:36,640 --> 01:06:46,640
they lose their effective power in the future, right, because our model may kind of get the direction right, that sales may be dipping in January.

675
01:06:46,640 --> 01:06:59,640
But the model, because it lacks a structural component, it may not really capture the effect that sales, you know, may rise in 2022.

676
01:06:59,640 --> 01:07:09,640
Right, this is still our best estimate. So, you know, it's quite possible that sales will be within this blue region here.

677
01:07:09,640 --> 01:07:23,640
If we were Bayesians, we could add a probability distribution to see how probable it is that sales are up here versus down here. So that's sort of the difference between the Bayesian forecast

678
01:07:23,640 --> 01:07:36,640
and a frequentist forecast, which is what we just did. So there are many ways that you can extend upon these forecasts.

679
01:07:36,640 --> 01:07:46,640
However, we were able to extend upon the monthly forecast we made in the prior week, and now we have

680
01:07:46,640 --> 01:07:54,640
weekly forecasts with month fixed effects. So we have

681
01:07:54,640 --> 01:08:00,640
a much more dynamic path for our forecasts here.
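[Editor's sketch] To make "weekly forecasts with month fixed effects" concrete, here is a deliberately simplified Python illustration, not the model fit in the session. It leaves out the autoregressive part entirely and keeps only the month effects, and the function names and sales figures are hypothetical.
```python
from collections import defaultdict
from datetime import date, timedelta
from statistics import mean
#
def fit_month_effects(dates, values):
    """Estimate a baseline plus one fixed effect per calendar month.
    With month dummies as the only regressors, least squares reduces to
    month-level averages, so no linear algebra library is needed."""
    by_month = defaultdict(list)
    for day, value in zip(dates, values):
        by_month[day.month].append(value)
    baseline = mean(values)
    effects = {m: mean(v) - baseline for m, v in by_month.items()}
    return baseline, effects
#
def forecast_weekly(baseline, effects, start, weeks):
    """Project weekly values: baseline plus the effect of each week's month.
    Months never observed in the sample fall back to an effect of zero."""
    horizon = [start + timedelta(weeks=i) for i in range(weeks)]
    return [(day, baseline + effects.get(day.month, 0.0)) for day in horizon]
#
# Made-up weekly sales (in $ millions): higher in December, lower in January.
history_dates = [date(2020, 12, 7) + timedelta(weeks=i) for i in range(8)]
history_sales = [12.0, 12.5, 13.0, 12.5, 9.0, 9.5, 10.0, 9.5]
baseline, effects = fit_month_effects(history_dates, history_sales)
for day, value in forecast_weekly(baseline, effects, date(2021, 12, 6), 8):
    print(day.isoformat(), round(value, 2))
```
Because each week inherits the effect of its calendar month, the projected path steps down when the weeks cross from December into January, which is the kind of within-year movement a single monthly trend would smooth over.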

682
01:08:00,640 --> 01:08:14,640
And so we're going to save this figure, and then in the coming months, we're going to plot the actual values. So we're going to see how the actual values trend

683
01:08:14,640 --> 01:08:25,640
along with our forecast. And what's so cool about the Cannabis Data Science Meetup Group is, you know, we're working with real data in real time.

684
01:08:25,640 --> 01:08:34,640
So we can make forecasts, and then we can check them. So, you know, in the coming months,

685
01:08:34,640 --> 01:08:46,640
in November and in December and then into 2022, please take this code and make your own forecasts and check your forecasts and

686
01:08:46,640 --> 01:09:03,640
keep at it, right, because maybe we can plot this forecast in blue, and maybe next month, we can look at the data and we can make a new forecast and plot the new forecast in purple.

687
01:09:03,640 --> 01:09:15,640
And we can see if our new forecast predicts better than our old forecast. So you keep iterating and you keep making forecasts.
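[Editor's sketch] One way to check whether a new forecast "predicts better" than an old one is to score both against the actuals once they arrive. The Python snippet below uses mean absolute error, which is an assumption on our part since the talk does not name a metric, and all the numbers are made up.
```python
def mean_absolute_error(actual, predicted):
    """Average absolute gap between observed values and a forecast."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("series must be non-empty and the same length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)
#
# Hypothetical weekly sales (in $ millions) once the actuals are published.
actual = [10.0, 11.0, 9.0, 12.0]
old_forecast = [9.0, 9.5, 10.5, 10.0]   # forecast made this month
new_forecast = [10.5, 10.5, 9.5, 11.0]  # forecast remade next month
old_error = mean_absolute_error(actual, old_forecast)
new_error = mean_absolute_error(actual, new_forecast)
better = "new" if new_error < old_error else "old"
print(f"old MAE={old_error:.3f}, new MAE={new_error:.3f}, better: {better}")
```
Both forecasts are scored over the same weeks, so the comparison is like for like; a lower error means the forecast tracked the actuals more closely.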

688
01:09:15,640 --> 01:09:24,640
What's awesome is we can also forecast all of these other series.

689
01:09:24,640 --> 01:09:31,640
So we can forecast

690
01:09:31,640 --> 01:09:41,640
and I'll be concluding here momentarily, but we can also forecast the number of plants

691
01:09:41,640 --> 01:09:47,640
where we dip and we go up.

692
01:09:47,640 --> 01:09:54,640
We can also forecast the number of employees.

693
01:09:54,640 --> 01:10:01,640
So we can forecast how many employees there are going to be in the market.

694
01:10:01,640 --> 01:10:06,640
Or rather, yeah, working in the Massachusetts market.

695
01:10:06,640 --> 01:10:20,640
Interestingly, we predict that, oh, wow, the total number of employees is going to fall off a cliff, and then maybe, you know, go up and stabilize around 8,000 before falling off another cliff.

696
01:10:20,640 --> 01:10:44,640
As we saw, it doesn't really work like that with employees, right, because there's a cost to hiring and firing employees. So, you know, your total number of employees doesn't quite vary like this, so this may be a suboptimal, slash poor, forecast.

697
01:10:44,640 --> 01:10:53,640
Well, I don't want to spend too too too much time here

698
01:10:53,640 --> 01:11:11,640
and bore you to death and just drag on forever. So I'm going to go ahead and tease what's coming up next week, and we'll go even further than this, but this is at least what we'll cover.

699
01:11:11,640 --> 01:11:24,640
So I want to go back over the forecasting models just to explain, you know, how we are making these forecasts.

700
01:11:24,640 --> 01:11:38,640
Then I want to show you how we can use this same forecasting methodology, right? We're just using a time series, a single series, to predict itself moving forward.

701
01:11:38,640 --> 01:11:51,640
So we can also predict, right, the total retailers, so we can predict the total number of retailers that are going to be in the market.

702
01:11:51,640 --> 01:11:58,640
We can predict the total number of cultivators that are going to be in the market.

703
01:11:58,640 --> 01:12:06,640
And then we can predict the sales per retailer.

704
01:12:06,640 --> 01:12:12,640
We can predict the plants per cultivator going into 2022.

705
01:12:12,640 --> 01:12:35,640
So, you know, we saw that we could look at those historic values. Well, now we can predict them forward. So now managers, executives, and investors can make good decisions about the coming year. Right, so we saw that, oh, you know, the plants may dip to

706
01:12:35,640 --> 01:12:38,640
800 plants per cultivator.

707
01:12:38,640 --> 01:12:47,640
Well, how many plants per cultivator do we expect there are going to be in 2022?

708
01:12:47,640 --> 01:12:51,640
Well, it will just take.

709
01:12:51,640 --> 01:13:10,640
We could do it in about five minutes, but I'm going to save it for next time. Or, if you're adventurous, you're welcome to take this code, which I am going to commit momentarily to GitHub.

710
01:13:10,640 --> 01:13:14,640
You can take this code and calculate these yourselves.

711
01:13:14,640 --> 01:13:30,640
So tune in next Wednesday, and we will predict these statistics. So we're going to predict sales per retailer in 2022 and plants per cultivator in 2022.

712
01:13:30,640 --> 01:13:53,640
That way, if you're a manager, an executive, or an investor in a retailer or a cultivator, then you can have an expectation for what your sales goals may need to be and how many plants you need to get into the ground.

713
01:13:53,640 --> 01:13:58,640
Right, or into the water if you're doing hydroponics.

714
01:13:58,640 --> 01:14:01,640
So,

715
01:14:01,640 --> 01:14:14,640
we're going to try to create some valuable statistics that people can actually use. So, we could do this in the next five minutes, but

716
01:14:14,640 --> 01:14:31,640
we've already stayed an extra 15 minutes or so. So I'm just going to tease this, and this is what we'll do next time. And then, if we can get to it, I would like to also estimate the

717
01:14:31,640 --> 01:14:47,640
competitive wage and interest rate. Right, so we can look at the historic competitive wage and interest rate. Well, guess what, we can also predict the competitive wage and interest rate into the coming year.

718
01:14:47,640 --> 01:15:06,640
So we will not only have done a market analysis, but we'll have created a forecast of how the market will perform in the coming year, where we can say how many players there may be in the market,

719
01:15:06,640 --> 01:15:20,640
how many sales there may be, how many products are going to be sold, and how many plants are going to be grown in 2022.

720
01:15:20,640 --> 01:15:27,640
And then we can estimate the competitive wage rates, so we can cover all the prices.

721
01:15:27,640 --> 01:15:32,640
So, incredibly rich analysis.

722
01:15:32,640 --> 01:15:40,640
And I think we've only just begun. So, it's exciting.

723
01:15:40,640 --> 01:15:54,640
So, I'm going to go ahead and end the presentation there, but I hope you've gotten something out of it, Shyam, and everyone else.

724
01:15:54,640 --> 01:16:13,640
So, until next week, definitely feel free to email me or message me your questions, comments, concerns, any asks, or even topics that you want to delve into next week.

725
01:16:13,640 --> 01:16:18,640
And I will get this code posted for you to utilize.

726
01:16:18,640 --> 01:16:30,640
And I'm excited to share with you some brand new statistics next week, so we can look at even more novel,

727
01:16:30,640 --> 01:16:35,640
first-of-their-kind statistics next week.

728
01:16:35,640 --> 01:16:46,640
There may have been some other entrepreneurial, adventurous people out there that may have calculated these statistics before us.

729
01:16:46,640 --> 01:16:50,640
And we will hear from them. So it's always, it's always fun.

730
01:16:50,640 --> 01:16:58,640
Fun time: data science plus cannabis plus statistics, the Cannabis Data Science Meetup Group.

731
01:16:58,640 --> 01:17:11,640
So, as we always say, until next time, stay productive, keep your nose to the grindstone, and have fun.

732
01:17:11,640 --> 01:17:16,640
All right, Shyam, I'm going to go ahead and end it here.

733
01:17:16,640 --> 01:17:21,640
But thanks a million for attending, and I'll look forward to seeing you in Saturday Statistics.

734
01:17:21,640 --> 01:17:42,640
And I've got some interesting models to share with you, so stay tuned and we'll have a good time on Saturday.

