1
00:00:00,000 --> 00:00:11,200
Welcome to the Cannabis Data Science Meetup Group.

2
00:00:11,200 --> 00:00:12,760
Happy to have you here.

3
00:00:12,760 --> 00:00:14,600
So got a lot to talk about.

4
00:00:14,600 --> 00:00:23,240
So last week we were at MJBizCon, so we didn't have too much material to actually sit down

5
00:00:23,240 --> 00:00:26,000
and work on, so we're going to make it up today.

6
00:00:26,000 --> 00:00:29,800
And even launched a new event, Saturday Statistics.

7
00:00:29,800 --> 00:00:36,440
So you're welcome to join us on Saturdays and then we'll get a little bit more in-depth

8
00:00:36,440 --> 00:00:43,600
about the actual statistics behind what we're doing.

9
00:00:43,600 --> 00:00:49,120
And so we'll do a statistics lesson, supplement that with some cannabis data, and we'll have

10
00:00:49,120 --> 00:00:57,480
a nice fun time of it.

11
00:00:57,480 --> 00:01:02,580
So here's just the standard group for today.

12
00:01:02,580 --> 00:01:12,000
And then you're welcome to find the Saturday morning statistics, which will be a fun time.

13
00:01:12,000 --> 00:01:17,880
So essentially each Saturday I'll make sure that we walk away at the end of the day having

14
00:01:17,880 --> 00:01:21,640
created a beautiful visualization.

15
00:01:21,640 --> 00:01:29,440
And so I heard somewhere that the one in control of the data and the visualization is often

16
00:01:29,440 --> 00:01:31,860
the one in control of the decision.

17
00:01:31,860 --> 00:01:39,200
So if you're able to create some beautiful charts at your organization, then it'll take

18
00:01:39,200 --> 00:01:41,040
you a long way.

19
00:01:41,040 --> 00:01:46,520
So I'll try to make that worthwhile for you.

20
00:01:46,520 --> 00:01:48,720
All right.

21
00:01:48,720 --> 00:01:58,600
So for today, I thought we could dust off a little work we've done in the past and apply

22
00:01:58,600 --> 00:02:04,240
it to the Massachusetts market because that's a market that I still find interesting.

23
00:02:04,240 --> 00:02:10,360
So it's still an up and coming market despite there being some established players since

24
00:02:10,360 --> 00:02:13,160
the industry has been around since 2018.

25
00:02:13,160 --> 00:02:18,960
However, it is one of the more strictly regulated markets.

26
00:02:18,960 --> 00:02:27,320
So it is taking certain players longer than expected to get up and running.

27
00:02:27,320 --> 00:02:32,900
So one thing we're going to look at are prices, right?

28
00:02:32,900 --> 00:02:43,140
So in the past, we looked at prices in Oregon and we tried to estimate the output and prices

29
00:02:43,140 --> 00:02:44,140
in Oregon.

30
00:02:44,140 --> 00:02:52,960
And so remember, we were looking at inflation and we can simply define inflation as the

31
00:02:52,960 --> 00:03:02,840
change in prices from yesterday to today divided by yesterday's prices.
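As a minimal sketch of that definition, here is inflation computed as the period-over-period change in price divided by the earlier price. The function name and the price values are illustrative assumptions, not the data used in the session.

```python
def inflation(prices):
    """Period-over-period inflation: (p_t - p_{t-1}) / p_{t-1}."""
    return [
        (today - yesterday) / yesterday
        for yesterday, today in zip(prices, prices[1:])
    ]


# Made-up prices for illustration only, not actual market data.
example_prices = [10.0, 9.0, 9.9]
example_rates = inflation(example_prices)  # one rate per consecutive pair
```

A falling price gives a negative rate, so a precipitous price drop like the one described for Oregon shows up as sustained negative inflation.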

32
00:03:02,840 --> 00:03:07,560
So we'll be interested in inflation as well as output.

33
00:03:07,560 --> 00:03:22,560
So before we dive in, well, I guess just a bit of cannabis information.

34
00:03:22,560 --> 00:03:33,400
So prices dropping precipitously here in Oregon has been a challenge for the producers and

35
00:03:33,400 --> 00:03:36,280
perhaps retailers in Oregon.

36
00:03:36,280 --> 00:03:39,160
Because it's incredibly competitive.

37
00:03:39,160 --> 00:03:41,560
It's really hard to make a profit.

38
00:03:41,560 --> 00:03:49,300
So many people that may have been profitable at a $40 per gram price point may no longer

39
00:03:49,300 --> 00:03:50,720
be profitable.

40
00:03:50,720 --> 00:04:09,680
So I think we may have.

41
00:04:09,680 --> 00:04:10,680
Welcome.

42
00:04:10,680 --> 00:04:11,680
Happy to have you.

43
00:04:11,680 --> 00:04:17,960
So right now we're talking about prices in Oregon.

44
00:04:17,960 --> 00:04:33,120
So let's see.

45
00:04:33,120 --> 00:04:39,240
And we'll maybe do introductions here at the end since we have already dived in or perhaps

46
00:04:39,240 --> 00:04:43,640
we'll take a break in the middle.

47
00:04:43,640 --> 00:04:48,320
So let's just try to.

48
00:04:48,320 --> 00:04:51,360
Hi, Cheyenne.

49
00:04:51,360 --> 00:04:53,080
It's good to have you here.

50
00:04:53,080 --> 00:04:57,240
So I'll try to stay focused since I'm already presenting here.

51
00:04:57,240 --> 00:05:01,000
Okay, let's try this one more time.

52
00:05:01,000 --> 00:05:03,840
Pardon the interruptions.

53
00:05:03,840 --> 00:05:09,240
So long story short, we've got historical data in Oregon.

54
00:05:09,240 --> 00:05:11,360
We saw the price point fall.

55
00:05:11,360 --> 00:05:15,280
This made it difficult for producers.

56
00:05:15,280 --> 00:05:18,560
Some producers are still making a good go of it.

57
00:05:18,560 --> 00:05:25,060
We're interested in what may happen in Massachusetts.

58
00:05:25,060 --> 00:05:28,880
So we're going to use some economic models.

59
00:05:28,880 --> 00:05:33,280
As we found out last time with our theoretical models, we were having a hard time getting

60
00:05:33,280 --> 00:05:41,840
them to fit in Massachusetts due to the sporadic nature of the data and the small number of

61
00:05:41,840 --> 00:05:45,880
observations and perhaps the flimsy theory.

62
00:05:45,880 --> 00:05:50,840
So we're going to go with an atheoretical approach this time.

63
00:05:50,840 --> 00:05:56,380
And we're going to introduce a variable, the interest rate.

64
00:05:56,380 --> 00:06:04,280
So we talked about the dynamic here when we talked about Oregon, how the central bank

65
00:06:04,280 --> 00:06:14,040
will try to set interest rates to ultimately keep inflation under control and try to maximize

66
00:06:14,040 --> 00:06:18,640
output.

67
00:06:18,640 --> 00:06:25,160
So you can actually model these economic variables in a system of equations.

68
00:06:25,160 --> 00:06:27,960
And so these are simultaneous equations.

69
00:06:27,960 --> 00:06:30,800
So these are all happening at the same time.

70
00:06:30,800 --> 00:06:34,360
So you've got output y.

71
00:06:34,360 --> 00:06:45,040
And so output today is dependent on output from the past, any given historical periods,

72
00:06:45,040 --> 00:06:46,040
t minus j.

73
00:06:46,040 --> 00:06:52,160
So t minus 1, t minus 2, t minus 3, what have you.

74
00:06:52,160 --> 00:06:56,280
Output today is dependent on inflation in the past.

75
00:06:56,280 --> 00:07:02,560
And then output today is also dependent on the interest rates in the past.

76
00:07:02,560 --> 00:07:12,000
Because investors needed to, well, the producers needed to invest in capital equipment.

77
00:07:12,000 --> 00:07:16,480
And the investors needed to supply a certain amount of capital.

78
00:07:16,480 --> 00:07:22,520
So that's a whole market of its own, the capital market.

79
00:07:22,520 --> 00:07:30,120
Inflation, so what's happening with prices, governs what's going to happen with output.

80
00:07:30,120 --> 00:07:34,440
So we know that from our economic theory.

81
00:07:34,440 --> 00:07:39,320
And now we've introduced a third player, the central bank.

82
00:07:39,320 --> 00:07:46,440
And so the central bank, they're going to be making their decisions based on past output,

83
00:07:46,440 --> 00:07:50,920
past inflation, as well as past interest rates.

84
00:07:50,920 --> 00:07:57,080
So you have these simultaneous equations that are happening.

85
00:07:57,080 --> 00:08:03,760
Luckily, we can estimate these because these are nice linear equations.

86
00:08:03,760 --> 00:08:08,240
And we have each of these data points.
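One way to write down the system just described, as a sketch: output, inflation, and the interest rate each depend on lags of all three. The symbols here are assumptions for illustration (y for output, \pi for inflation, r for the interest rate, p for the number of lags, c for intercepts, a for coefficients, \varepsilon for error terms); the slides may use different notation.

```latex
\begin{aligned}
y_t   &= c_y   + \sum_{j=1}^{p} \left( a_{yy,j}\, y_{t-j} + a_{y\pi,j}\, \pi_{t-j} + a_{yr,j}\, r_{t-j} \right) + \varepsilon_{y,t} \\
\pi_t &= c_\pi + \sum_{j=1}^{p} \left( a_{\pi y,j}\, y_{t-j} + a_{\pi\pi,j}\, \pi_{t-j} + a_{\pi r,j}\, r_{t-j} \right) + \varepsilon_{\pi,t} \\
r_t   &= c_r   + \sum_{j=1}^{p} \left( a_{ry,j}\, y_{t-j} + a_{r\pi,j}\, \pi_{t-j} + a_{rr,j}\, r_{t-j} \right) + \varepsilon_{r,t}
\end{aligned}
```

Each equation is linear in the lagged values, which is why each one can be estimated with ordinary linear regression.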

87
00:08:08,240 --> 00:08:11,920
So why are we picking this model?

88
00:08:11,920 --> 00:08:16,320
Well, for starters, it's an atheoretical model.

89
00:08:16,320 --> 00:08:17,800
It's atheoretical.

90
00:08:17,800 --> 00:08:29,640
So we're not imposing any production function assumptions or any assumptions of that sort,

91
00:08:29,640 --> 00:08:31,640
like we have done in previous weeks.

92
00:08:31,640 --> 00:08:37,800
So if you look at the recordings from the prior weeks, we attempted to estimate production

93
00:08:37,800 --> 00:08:40,120
functions in Massachusetts.

94
00:08:40,120 --> 00:08:46,560
As Heather can attest, we were having trouble estimating those models well.

95
00:08:46,560 --> 00:08:53,120
So can we use those models for predicting the future?

96
00:08:53,120 --> 00:08:59,160
You know, perhaps, but they may not be the best predictors.

97
00:08:59,160 --> 00:09:05,800
So what's cool about VARs is they're flexible.

98
00:09:05,800 --> 00:09:10,040
They can really fit many different data points, as I'll show you here in a bit.

99
00:09:10,040 --> 00:09:12,720
They can fit any frequency.

100
00:09:12,720 --> 00:09:17,680
So we were seeing that, oh, you know, we may have daily, we may have weekly, we may have

101
00:09:17,680 --> 00:09:20,980
monthly, we may even have quarterly data.

102
00:09:20,980 --> 00:09:26,440
So the frequency of the data matters.

103
00:09:26,440 --> 00:09:29,120
Pitfall of the VAR.

104
00:09:29,120 --> 00:09:34,520
If you look here, we have many different variables.

105
00:09:34,520 --> 00:09:42,120
As we were talking about, every variable introduces possibilities for measurement error or what

106
00:09:42,120 --> 00:09:43,240
have you.

107
00:09:43,240 --> 00:09:52,000
And so on top of those concerns, each variable eats away at your degrees

108
00:09:52,000 --> 00:09:53,840
of freedom.

109
00:09:53,840 --> 00:09:57,560
So note that each variable relates to a parameter, right?

110
00:09:57,560 --> 00:10:00,520
So we're going to be estimating all of these parameters.

111
00:10:00,520 --> 00:10:07,240
So our degrees of freedom are going to go down rapidly.

112
00:10:07,240 --> 00:10:13,680
So you have to be careful, because if you're limited in your number of observations, which

113
00:10:13,680 --> 00:10:20,000
we are, it can be difficult to fit a VAR model.

114
00:10:20,000 --> 00:10:26,800
And so you may overfit, which once again can lead to bad forecasts.
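To make the degrees-of-freedom point concrete, here is a rough count of how many parameters a VAR asks for. This is a sketch under assumptions: the function names are made up for illustration, each equation is taken to have an intercept plus one coefficient per variable per lag, and the 36 monthly observations are a hypothetical sample size.

```python
def var_parameter_count(n_variables, n_lags, intercept=True):
    """Total parameters across all equations of a VAR."""
    per_equation = n_variables * n_lags + (1 if intercept else 0)
    return n_variables * per_equation


def residual_degrees_of_freedom(n_observations, n_variables, n_lags, intercept=True):
    """Degrees of freedom left in each equation after lagging and fitting."""
    usable_rows = n_observations - n_lags  # the first n_lags rows have no full history
    per_equation = n_variables * n_lags + (1 if intercept else 0)
    return usable_rows - per_equation


# Hypothetical: 3 variables (output, inflation, interest rate), 36 months of data.
for lags in (1, 3, 6):
    total = var_parameter_count(3, lags)
    left = residual_degrees_of_freedom(36, 3, lags)
    print(f"lags={lags}: {total} parameters, {left} degrees of freedom per equation")
```

With these numbers, going from one lag to six takes each equation from 31 degrees of freedom down to 11, which is the kind of squeeze that leads to overfitting.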

115
00:10:26,800 --> 00:10:33,640
So we're going to hedge this.

116
00:10:33,640 --> 00:10:36,840
This is how you can write the VAR.

117
00:10:36,840 --> 00:10:41,640
So this is just a simplified manner.

118
00:10:41,640 --> 00:10:42,640
Let's get to the forecasting.

119
00:10:42,640 --> 00:10:45,440
That's the more interesting part.

120
00:10:45,440 --> 00:10:52,120
So it's easy to forecast with a VAR, right?

121
00:10:52,120 --> 00:10:56,920
And so the first equation is the model we'll be estimating, right?

122
00:10:56,920 --> 00:11:05,000
Where output, inflation, and the interest rate, those all depend on yesterday's values.

123
00:11:05,000 --> 00:11:13,840
Well, given yesterday's values, and once you've estimated your parameters, you

124
00:11:13,840 --> 00:11:18,160
can simply play that one period forward.

125
00:11:18,160 --> 00:11:24,480
So you can just plug in today's values with the parameters you estimated, and you can

126
00:11:24,480 --> 00:11:26,920
get the next period's values.

127
00:11:26,920 --> 00:11:34,360
You can then plug in next period's values with your parameters and get the period after.

128
00:11:34,360 --> 00:11:44,200
So you can do this successively, and you can get forecasts for n periods into the future.

129
00:11:44,200 --> 00:11:55,680
So we can predict output, inflation, and interest rate however long we want into the future.
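The plug-in-and-repeat idea can be sketched in a few lines. This assumes a one-lag VAR with parameters already estimated; the function names and the coefficient values are made up for illustration and are not estimates from the Massachusetts data.

```python
def step(intercept, coefficients, state):
    """Play the model one period forward: next = intercept + coefficients * state."""
    return [
        c + sum(a * x for a, x in zip(row, state))
        for c, row in zip(intercept, coefficients)
    ]


def forecast(intercept, coefficients, last_observed, n_periods):
    """Feed each forecast back in as the next period's input."""
    path = []
    state = list(last_observed)
    for _ in range(n_periods):
        state = step(intercept, coefficients, state)
        path.append(state)
    return path


# Made-up parameters; order of variables is [output, inflation, interest rate].
intercept = [1.0, 0.0, 0.1]
coefficients = [
    [0.8, -0.2, -0.1],
    [0.1, 0.5, 0.0],
    [0.0, 0.1, 0.9],
]
path = forecast(intercept, coefficients, last_observed=[10.0, 0.02, 0.25], n_periods=12)
```

Because every step uses the previous forecast rather than an actual value, any error in an early step carries into all the later ones.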

130
00:11:55,680 --> 00:12:03,040
Keep in mind the uncertainty of our forecasts is going to be increasing the further we go

131
00:12:03,040 --> 00:12:09,680
into the future, but we can still do powerful things, right?

132
00:12:09,680 --> 00:12:17,600
So this is how people can say, oh, what may sales be in 2030, right?

133
00:12:17,600 --> 00:12:18,920
We could play that out.

134
00:12:18,920 --> 00:12:23,280
It could be wildly inaccurate, but we can play that out.

135
00:12:23,280 --> 00:12:25,600
What will sales be like in 2022?

136
00:12:25,600 --> 00:12:26,600
We can play that out.

137
00:12:26,600 --> 00:12:30,000
It will be slightly more accurate than 2030.

138
00:12:30,000 --> 00:12:33,660
Or what would sales be like next month, right?

139
00:12:33,660 --> 00:12:43,980
And then that's going to have the least, the smallest confidence interval.

140
00:12:43,980 --> 00:12:54,320
So we can be a bit more certain about that one, but there's still going to be uncertainty.
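To see why next month gets the tightest interval, here is a sketch using a single-variable AR(1) as a simplification of the full VAR. The function name, the persistence value phi, and the shock size sigma are assumptions for illustration. The h-step-ahead forecast error variance is sigma squared times the sum of phi to the power 2i for i from 0 to h minus 1, so it only grows with the horizon.

```python
import math


def forecast_std_errors(phi, sigma, n_periods):
    """Standard error of the h-step-ahead AR(1) forecast for h = 1..n_periods."""
    errors = []
    variance = 0.0
    for h in range(n_periods):
        # Each extra step adds one more shock, discounted by phi per period.
        variance += sigma ** 2 * phi ** (2 * h)
        errors.append(math.sqrt(variance))
    return errors


# Made-up values: fairly persistent series, unit shock size.
bands = forecast_std_errors(phi=0.9, sigma=1.0, n_periods=12)
# A rough 95% interval at each horizon would be forecast +/- 1.96 * band.
```

The same logic carries over to a VAR with matrices in place of phi, which is why a 2030 forecast is much less certain than a forecast for next month.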

141
00:12:54,320 --> 00:13:03,400
And so I want to hit on what a professor boiled down to me as the 10 Commandments of Forecasting.

142
00:13:03,400 --> 00:13:04,400
Right?

143
00:13:04,400 --> 00:13:09,120
And so we'll just go over these real quick since we're about to be doing forecasting.

144
00:13:09,120 --> 00:13:14,000
And I want to have these in everyone's mind, and then we'll go over them at the end just

145
00:13:14,000 --> 00:13:17,080
to make sure that we hit each point.

146
00:13:17,080 --> 00:13:23,100
So we first and foremost need to know what we're forecasting.

147
00:13:23,100 --> 00:13:29,040
That's going to be sales in Massachusetts, right?

148
00:13:29,040 --> 00:13:35,880
Because we want to have an estimate for what sales may be in 2021.

149
00:13:35,880 --> 00:13:41,320
And then we may like to know what sales may be like next year in 2022.

150
00:13:41,320 --> 00:13:49,680
So we can maybe make a prediction about how much sales may increase or perhaps decrease

151
00:13:49,680 --> 00:13:51,160
next year.

152
00:13:51,160 --> 00:13:52,380
Right?

153
00:13:52,380 --> 00:14:01,920
And then we could perhaps make forecasts all the way to 2025 or 2030 or what have you,

154
00:14:01,920 --> 00:14:08,640
because you're seeing some large analytics firms making predictions that far into the

155
00:14:08,640 --> 00:14:09,640
future.

156
00:14:09,640 --> 00:14:15,280
And like I'd said in the past, I'm not a big fan of the black box approach, because we've

157
00:14:15,280 --> 00:14:18,680
seen how many assumptions are made along the way.

158
00:14:18,680 --> 00:14:24,400
So we're going to make a clear box for forecasting.

159
00:14:24,400 --> 00:14:25,400
Right?

160
00:14:25,400 --> 00:14:31,080
So you can at least look into it and see what gears are turning.

161
00:14:31,080 --> 00:14:34,440
What are the inputs and what are the outputs?

162
00:14:34,440 --> 00:14:35,960
And it's not a black box.

163
00:14:35,960 --> 00:14:41,660
So we're making a nice transparent box right now where you can see the forecasting model.

164
00:14:41,660 --> 00:14:43,760
You can see where you get your data from.

165
00:14:43,760 --> 00:14:49,800
You can see how things are operated and you can see how forecasts are made.

166
00:14:49,800 --> 00:14:58,320
And you can get in there and tinker yourself if you're not satisfied with how we made forecasts

167
00:14:58,320 --> 00:15:02,880
and improve upon what we've done to make even better forecasts.

168
00:15:02,880 --> 00:15:05,640
So that's what we're doing here.

169
00:15:05,640 --> 00:15:10,200
So that's step one.

170
00:15:10,200 --> 00:15:17,760
And really step two, right, the purpose of this is to have a nice transparent mechanism

171
00:15:17,760 --> 00:15:23,400
for forecasting that people can use to have better insights for the future so they can

172
00:15:23,400 --> 00:15:27,640
understand the market better.

173
00:15:27,640 --> 00:15:29,680
The cost of forecasting.

174
00:15:29,680 --> 00:15:36,540
This is where we need to emphasize that there's measurement error and our predictions could

175
00:15:36,540 --> 00:15:43,600
be inaccurate, right, because we don't want people putting too much stake in our forecasts,

176
00:15:43,600 --> 00:15:44,600
right?

177
00:15:44,600 --> 00:15:50,560
Because if you're a producer and you think the market is going to grow by X percent and

178
00:15:50,560 --> 00:15:57,160
the market doesn't grow by X percent, well, you need to have accounted for that in your

179
00:15:57,160 --> 00:15:59,680
business model, right?

180
00:15:59,680 --> 00:16:03,900
Because if you put too much stake into the forecast, sales don't grow as much as you

181
00:16:03,900 --> 00:16:06,440
expect.

182
00:16:06,440 --> 00:16:08,920
You may wind up in the red.

183
00:16:08,920 --> 00:16:27,600
So sometimes it's better to, you know, be a bit more, you know.

184
00:16:27,600 --> 00:16:36,080
Sometimes it's a bit more prudent to plan based on the lower end of the forecast, and

185
00:16:36,080 --> 00:16:42,760
other times, maybe when you're predicting risk, it may be more prudent to operate in

186
00:16:42,760 --> 00:16:45,880
the upper bounds of the forecast.

187
00:16:45,880 --> 00:16:53,720
So you have to know, you know, is your risk asymmetric or symmetric?

188
00:16:53,720 --> 00:16:59,760
So three is a tricky point and we'll dance around this as we go through our

189
00:16:59,760 --> 00:17:00,760
estimation.

190
00:17:00,760 --> 00:17:07,760
Four, we've hit on this, so this is rationalize the forecast horizon.

191
00:17:07,760 --> 00:17:11,200
So what's useful to forecast?

192
00:17:11,200 --> 00:17:17,160
So at the moment, I figured we may as well forecast to the end of the year.

193
00:17:17,160 --> 00:17:22,240
So that way we can go ahead and get an early estimate of the year total.

194
00:17:22,240 --> 00:17:25,240
So slightly valuable.

195
00:17:25,240 --> 00:17:29,000
And then I figured we may as well estimate sales next year.

196
00:17:29,000 --> 00:17:38,160
So that way we can start to predict to what extent sales may rise or fall.

197
00:17:38,160 --> 00:17:40,400
So I think that would be useful.

198
00:17:40,400 --> 00:17:49,120
And then it perhaps would be useful to forecast through 2025, 2030, just for the fun of things,

199
00:17:49,120 --> 00:17:51,800
just to see how much things may grow.

200
00:17:51,800 --> 00:18:01,360
But I think the utility diminishes in that horizon.

201
00:18:01,360 --> 00:18:02,360
Understand our variables.

202
00:18:02,360 --> 00:18:03,360
Right.

203
00:18:03,360 --> 00:18:07,240
So we're going to dive into these deeper here in a second.

204
00:18:07,240 --> 00:18:13,360
But these are the variables that are available to us that match our economic theory and that

205
00:18:13,360 --> 00:18:16,280
fit into our model.

206
00:18:16,280 --> 00:18:18,760
Which forecasting model are we going to use?

207
00:18:18,760 --> 00:18:24,520
We talked about this where we were trying the theoretical approach, not working very

208
00:18:24,520 --> 00:18:25,520
well for us.

209
00:18:25,520 --> 00:18:30,000
We're going to try the atheoretical VAR approach.

210
00:18:30,000 --> 00:18:33,240
Of course, we need to present our results.

211
00:18:33,240 --> 00:18:34,240
Right.

212
00:18:34,240 --> 00:18:39,080
A nice time series visualization would be wonderful.

213
00:18:39,080 --> 00:18:41,360
You need to know how to decipher the results.

214
00:18:41,360 --> 00:18:42,360
Right.

215
00:18:42,360 --> 00:18:46,440
So that's when we're talking about, OK, if sales are going to increase, right, it would

216
00:18:46,440 --> 00:18:51,480
be nice to know about what percentage they're going to increase.

217
00:18:51,480 --> 00:18:56,360
And then these are the two most important points, in my opinion.

218
00:18:56,360 --> 00:18:58,920
One, use recursive methods.

219
00:18:58,920 --> 00:19:11,560
So essentially, I boil this down to revisiting the model.

220
00:19:11,560 --> 00:19:12,560
Right.

221
00:19:12,560 --> 00:19:19,520
So we're going to forecast this period and then we can then forecast next period when

222
00:19:19,520 --> 00:19:21,520
the data comes in.

223
00:19:21,520 --> 00:19:28,600
So, you know, just keep iterating.

224
00:19:28,600 --> 00:19:37,240
And then as we iterate, we may find flaws in the model.

225
00:19:37,240 --> 00:19:44,560
And it's OK for us to improve our model over time.

226
00:19:44,560 --> 00:19:49,680
So that's the main thing is we want to revisit our forecasts.

227
00:19:49,680 --> 00:19:55,760
So we want to forecast and then next month we want to come back, check our forecasts

228
00:19:55,760 --> 00:20:04,020
against the actuals, use the actuals, and try to forecast the next month even better.

229
00:20:04,020 --> 00:20:11,560
So that's the recursion, where each month you come back, you see.

230
00:20:11,560 --> 00:20:16,120
You measure your forecasting error.

231
00:20:16,120 --> 00:20:23,120
And then you use the new actual data to try to create a better model and reduce

232
00:20:23,120 --> 00:20:27,560
your forecast error for the next time.
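Here is a sketch of that forecast, check, refit loop. To keep it short it swaps in a single-variable AR(1) fit by least squares rather than the full VAR, and the function names, the `min_history` parameter, and the sales figures are assumptions for illustration.

```python
def fit_ar1(series):
    """Least-squares fit of x_t = a + b * x_{t-1}."""
    x, y = series[:-1], series[1:]
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    var_x = sum((xi - mean_x) ** 2 for xi in x)
    cov_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = cov_xy / var_x if var_x else 0.0
    a = mean_y - b * mean_x
    return a, b


def recursive_forecast_errors(series, min_history=4):
    """Each period: refit on all data so far, forecast one step, record the miss."""
    errors = []
    for t in range(min_history, len(series)):
        a, b = fit_ar1(series[:t])          # only data available before period t
        prediction = a + b * series[t - 1]  # one-step-ahead forecast
        errors.append(series[t] - prediction)
    return errors


# Made-up monthly sales in millions of dollars, for illustration only.
sales = [40.0, 42.0, 45.0, 44.0, 48.0, 51.0, 50.0, 55.0]
errors = recursive_forecast_errors(sales)
mean_abs_error = sum(abs(e) for e in errors) / len(errors)
```

Tracking the errors month over month is what tells you whether a change to the model actually improved the forecasts.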

233
00:20:27,560 --> 00:20:35,680
So I think let's get our hands on the data at this point.

234
00:20:35,680 --> 00:20:41,640
So all right.

235
00:20:41,640 --> 00:20:52,040
So just going to use a handful of standard packages here as well as two utility functions

236
00:20:52,040 --> 00:20:59,360
I've written that just help us clean the data.

237
00:20:59,360 --> 00:21:04,240
Using Python, but you're welcome to use your favorite programming language.

238
00:21:04,240 --> 00:21:08,480
OK.

239
00:21:08,480 --> 00:21:09,480
First things.

240
00:21:09,480 --> 00:21:10,480
Hold on.

241
00:21:10,480 --> 00:21:15,920
We may have a new attendee.

242
00:21:15,920 --> 00:21:24,120
All right.

243
00:21:24,120 --> 00:21:26,640
Awesome to have you, Eric.

244
00:21:26,640 --> 00:21:34,160
Just wanted to point out that it was a little much for me to get everything ready

245
00:21:34,160 --> 00:21:36,960
for last Saturday for Saturday Statistics.

246
00:21:36,960 --> 00:21:39,360
So I apologize if you were looking for that.

247
00:21:39,360 --> 00:21:42,880
However, I have created the event.

248
00:21:42,880 --> 00:21:49,600
And so this coming Saturday we'll finally kick off with Saturday Statistics.

249
00:21:49,600 --> 00:21:56,880
And the goal of Saturday Statistics is to walk away with one beautiful visualization

250
00:21:56,880 --> 00:21:59,320
at the end of the day.

251
00:21:59,320 --> 00:22:07,640
Because that's sort of what we're going to emphasize here is how much you can persuade,

252
00:22:07,640 --> 00:22:15,160
inform, and even entertain people with a nice visualization.

253
00:22:15,160 --> 00:22:26,800
So for today, we're just going to get our hands dirty trying to actually do some forecasting.

254
00:22:26,800 --> 00:22:29,840
So we may make some visualizations.

255
00:22:29,840 --> 00:22:31,840
They may not be beautiful.

256
00:22:31,840 --> 00:22:38,040
But that's why you can tune in on Saturday.

257
00:22:38,040 --> 00:22:42,000
But without further ado, let's get the data.

258
00:22:42,000 --> 00:22:49,760
So I may go over the getting the data quickly, because we've done this a bit more in depth

259
00:22:49,760 --> 00:22:51,240
in prior weeks.

260
00:22:51,240 --> 00:23:01,800
But long story short, at least for Massachusetts, you can get quite rich data through the first

261
00:23:01,800 --> 00:23:02,800
week.

262
00:23:02,800 --> 00:23:06,560
So we're going to start with the Socrata API.

263
00:23:06,560 --> 00:23:12,160
So here we're going to be grabbing the average prices.

264
00:23:12,160 --> 00:23:16,200
This is a monthly series.

265
00:23:16,200 --> 00:23:22,880
And we're also going to be grabbing sales totals.

266
00:23:22,880 --> 00:23:30,200
And this is a daily series that we'll be aggregating into a monthly series.
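As a rough sketch of pulling a dataset from a Socrata endpoint using only the standard library: Socrata serves JSON at `/resource/<dataset id>.json` and accepts query parameters such as `$limit` and `$order`. The domain and dataset identifier below are placeholders, not the real ones used in the session, and the function names are assumptions.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def socrata_url(domain, dataset_id, limit=1000, order=None):
    """Build a Socrata JSON endpoint URL with optional SoQL ordering."""
    params = {"$limit": limit}
    if order:
        params["$order"] = order
    return f"https://{domain}/resource/{dataset_id}.json?{urlencode(params, safe='$')}"


def fetch_rows(url):
    """Download and decode the JSON rows (requires network access)."""
    with urlopen(url) as response:
        return json.load(response)


# Placeholder domain and dataset ID; substitute the real values from the data portal.
url = socrata_url("data.example.gov", "xxxx-xxxx", limit=5000)
# rows = fetch_rows(url)  # uncomment once the real identifiers are filled in
```

Each dataset, such as the monthly average prices and the daily sales totals, would have its own identifier and would be fetched with a separate call.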

267
00:23:30,200 --> 00:23:33,880
Data from the Federal Reserve.

268
00:23:33,880 --> 00:23:41,880
So we're going to grab the federal funds rate, which is essentially what economists refer

269
00:23:41,880 --> 00:23:49,040
to when they are talking about the interest rate.

270
00:23:49,040 --> 00:23:55,840
So this is the interest rate the banks pay, and thus it affects all the other interest

271
00:23:55,840 --> 00:23:59,960
rates.

272
00:23:59,960 --> 00:24:13,840
So we're going to get these data points and clean them to a certain extent.

273
00:24:13,840 --> 00:24:18,080
So for example, we need to calculate sales.

274
00:24:18,080 --> 00:24:24,460
So sales total is a cumulative series.

275
00:24:24,460 --> 00:24:27,440
So we need to take the difference of that.

276
00:24:27,440 --> 00:24:33,240
There are also these outliers that aren't rational.

277
00:24:33,240 --> 00:24:39,320
So these are negative sales and sales greater than, I believe this is 10 million in one

278
00:24:39,320 --> 00:24:40,320
day.

279
00:24:40,320 --> 00:24:45,480
And so these appear to be outliers to me.

280
00:24:45,480 --> 00:24:50,480
And so I'm coding them as zero.
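The cleaning step described here can be sketched as: difference the cumulative total to get daily sales, then code negative values and values above the threshold as zero. The function name and sample numbers are assumptions for illustration; the 10 million threshold is the figure mentioned above.

```python
def daily_sales(cumulative_totals, max_daily=10_000_000):
    """Difference a cumulative series and code implausible days as zero."""
    sales = []
    for previous, current in zip(cumulative_totals, cumulative_totals[1:]):
        change = current - previous
        # Negative sales or more than max_daily in one day are treated as outliers.
        if change < 0 or change > max_daily:
            change = 0
        sales.append(change)
    return sales


# Made-up cumulative totals with one negative jump and one oversized jump.
totals = [100, 250, 200, 20_000_200, 20_000_500]
cleaned = daily_sales(totals)
```

Coding outliers as zero understates sales on those days, so an alternative worth trying is to treat them as missing and interpolate instead.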

281
00:24:50,480 --> 00:24:54,840
And so this is where we're starting to get into how the sausage is made, right?

282
00:24:54,840 --> 00:25:03,440
Because if you're using a black box approach, you wouldn't see these adjustments, or they

283
00:25:03,440 --> 00:25:06,720
may be in a tiny little footnote.

284
00:25:06,720 --> 00:25:14,120
And here we're being quite transparent that, yes, we're dealing with these outliers.

285
00:25:14,120 --> 00:25:16,040
By coding them as zero.

286
00:25:16,040 --> 00:25:23,480
If you have a better approach, then by all means, either fix it or send me an email or

287
00:25:23,480 --> 00:25:24,480
what have you.

288
00:25:24,480 --> 00:25:25,480
Then we'll get it fixed.

289
00:25:25,480 --> 00:25:39,640
And long story short, we can get the sales data here.

290
00:25:39,640 --> 00:25:49,040
And this was throwing us a curveball in prior weeks.

291
00:25:49,040 --> 00:25:54,920
And I finally got an answer at MJBizCon.

292
00:25:54,920 --> 00:26:01,640
And it was real funny because I think I was one of the only people to actually know this

293
00:26:01,640 --> 00:26:02,640
answer.

294
00:26:02,640 --> 00:26:06,800
So one of the seminars was this cannabis data game show.

295
00:26:06,800 --> 00:26:12,760
And you had some of the top data scientists from the top analytics companies.

296
00:26:12,760 --> 00:26:15,680
And they were just asking them trivia questions.

297
00:26:15,680 --> 00:26:24,280
And one of the questions was how much cannabis was sold in Massachusetts in April of 2020?

298
00:26:24,280 --> 00:26:27,560
And I had a sneaking suspicion that I knew the answer.

299
00:26:27,560 --> 00:26:31,520
And I guessed it was zero.

300
00:26:31,520 --> 00:26:35,640
And now we know for a fact it is in fact zero.

301
00:26:35,640 --> 00:26:40,800
So these data points here aren't missing.

302
00:26:40,800 --> 00:26:51,240
They in fact suspended sales of cannabis in Massachusetts from I want to say, what was

303
00:26:51,240 --> 00:27:07,280
it like March 24th or so through May 24th or so.

304
00:27:07,280 --> 00:27:09,200
I won't take up too much time with it now.

305
00:27:09,200 --> 00:27:13,240
You can drill down here and get the specific dates.

306
00:27:13,240 --> 00:27:18,800
But long story short, they did actually suspend sales during that period, which I think is

307
00:27:18,800 --> 00:27:20,640
real interesting.

308
00:27:20,640 --> 00:27:25,200
So now it's real interesting for a couple of factors.

309
00:27:25,200 --> 00:27:31,960
One, it's unfortunate because, well, it's unfortunate for maybe the producers or what

310
00:27:31,960 --> 00:27:38,840
have you because it may be fortunate for the people of Massachusetts, who knows.

311
00:27:38,840 --> 00:27:42,960
You saw a spike in most states, right?

312
00:27:42,960 --> 00:27:50,080
So most people went into a buying frenzy.

313
00:27:50,080 --> 00:27:56,480
But then again, economic theory suggests that people may just smooth out their consumption.

314
00:27:56,480 --> 00:28:02,760
So just because they went into a buying frenzy on one date, they may just reduce their consumption

315
00:28:02,760 --> 00:28:04,960
at a later date.

316
00:28:04,960 --> 00:28:11,160
But what have you, Massachusetts suspended sales.

317
00:28:11,160 --> 00:28:20,120
So we're kind of given essentially what you would call a shock, right?

318
00:28:20,120 --> 00:28:28,120
So here you're given this industry and this would be a production shock, right?

319
00:28:28,120 --> 00:28:39,000
Because you're going along, going along, and then we're just going to put a large negative

320
00:28:39,000 --> 00:28:42,680
production shock, the pandemic.

321
00:28:42,680 --> 00:28:51,680
And in this case, Massachusetts, it knocked sales to zero for this set period.

322
00:28:51,680 --> 00:29:00,680
So that's essentially a production shock and we can measure, okay, what effect did that

323
00:29:00,680 --> 00:29:05,360
have on prices?

324
00:29:05,360 --> 00:29:10,720
And also the interest rate, but the interest rate, we may go ahead and make the assumption

325
00:29:10,720 --> 00:29:19,520
that the interest rate's not really determined by output and prices of cannabis in Massachusetts,

326
00:29:19,520 --> 00:29:20,800
which it probably really isn't.

327
00:29:20,800 --> 00:29:30,800
I mean, maybe to a minor, minor degree, but that really depends on much larger scale factors.

328
00:29:30,800 --> 00:29:33,520
But that matters more for identification.

329
00:29:33,520 --> 00:29:44,560
And so long story short, if you're particularly interested in the theory behind VARs, so vector

330
00:29:44,560 --> 00:29:49,200
autoregressions, which we'll get to here in a minute, I'll recommend that you check out

331
00:29:49,200 --> 00:29:50,200
my website.

332
00:29:50,200 --> 00:29:53,960
So this is just keeganskeet.com.

333
00:29:53,960 --> 00:30:02,760
If you go to the PDFs and macroeconomics, I've got a real good section here about these vector

334
00:30:02,760 --> 00:30:03,760
autoregressions.

335
00:30:03,760 --> 00:30:09,600
And so here you'll get all the sources and more in depth about how we're actually estimating

336
00:30:09,600 --> 00:30:11,480
these.

337
00:30:11,480 --> 00:30:17,800
And if you really want to get into the weeds behind the statistics about what assumptions

338
00:30:17,800 --> 00:30:26,360
need to be made in a statistical sense, then I'll let you read into that, get into the

339
00:30:26,360 --> 00:30:29,560
weeds if you wish.

340
00:30:29,560 --> 00:30:34,400
But long story short, we'll estimate the model.

341
00:30:34,400 --> 00:30:38,320
We're not going to get into the weeds yet, perhaps on Saturday.

342
00:30:38,320 --> 00:30:42,720
But for today, we're just going to run through this.

343
00:30:42,720 --> 00:30:47,760
So long story short, we're interested in this production shock and how that's going to

344
00:30:47,760 --> 00:30:50,120
affect prices.

345
00:30:50,120 --> 00:30:55,960
So we can go ahead and create our monthly series here.

346
00:30:55,960 --> 00:31:04,360
And so here we're just aggregating sales.
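A minimal sketch of the daily-to-monthly aggregation, assuming the daily data is a list of (date, amount) pairs. The function name and sample values are made up for illustration.

```python
from collections import defaultdict
from datetime import date


def monthly_totals(daily):
    """Sum (date, amount) pairs into {(year, month): total}, in calendar order."""
    totals = defaultdict(float)
    for day, amount in daily:
        totals[(day.year, day.month)] += amount
    return dict(sorted(totals.items()))


# Made-up daily sales for illustration only.
daily = [
    (date(2020, 3, 1), 100.0),
    (date(2020, 3, 2), 150.0),
    (date(2020, 4, 1), 0.0),
    (date(2020, 5, 25), 200.0),
]
monthly = monthly_totals(daily)
```

One design point: a month with no rows at all would be absent from the result rather than zero, so for a suspension period it matters whether the source reports zero-sales days or simply skips them.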

347
00:31:04,360 --> 00:31:15,520
As you see, in April of 2020, sales dropped to zero.

348
00:31:15,520 --> 00:31:18,440
And then they carry along their way.

349
00:31:18,440 --> 00:31:27,560
And so it's going to be interesting to see if we can't disentangle what effect this shock

350
00:31:27,560 --> 00:31:28,880
may have had.

351
00:31:28,880 --> 00:31:40,000
And then the whole grand scheme is what's cool about these VARs is you can simulate

352
00:31:40,000 --> 00:31:54,680
production shocks, or you can simulate price shocks or interest rate shocks to figure out

353
00:31:54,680 --> 00:31:56,960
what would happen in the future.

354
00:31:56,960 --> 00:32:09,280
So you basically just add a large positive or negative shock to yt.

355
00:32:09,280 --> 00:32:13,200
And then you sort of let that play out in your forecasts.

356
00:32:13,200 --> 00:32:18,100
And you compare that to your forecasts without the shock.

357
00:32:18,100 --> 00:32:25,000
And so you can kind of estimate the effect of the shock.
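The shock comparison can be sketched as two forecasts from the same model: one from the observed starting values and one with a shock added to them. This assumes a one-lag VAR without intercepts for brevity; the function names and coefficients are made up for illustration, not estimated from the data.

```python
def forecast(coefficients, state, n_periods):
    """Iterate a one-lag VAR forward from the given starting values."""
    path = []
    state = list(state)
    for _ in range(n_periods):
        state = [sum(a * x for a, x in zip(row, state)) for row in coefficients]
        path.append(state)
    return path


def shock_effect(coefficients, state, shock, n_periods):
    """Shocked forecast minus baseline forecast, period by period."""
    baseline = forecast(coefficients, state, n_periods)
    shocked_state = [x + s for x, s in zip(state, shock)]
    shocked = forecast(coefficients, shocked_state, n_periods)
    return [
        [s - b for s, b in zip(shocked_row, baseline_row)]
        for shocked_row, baseline_row in zip(shocked, baseline)
    ]


# Made-up coefficients; order of variables is [output, inflation, interest rate].
coefficients = [
    [0.8, -0.2, -0.1],
    [0.1, 0.5, 0.0],
    [0.0, 0.1, 0.9],
]
# A large negative output shock, loosely in the spirit of the sales suspension.
effect = shock_effect(coefficients, [10.0, 0.02, 0.25], shock=[-10.0, 0.0, 0.0], n_periods=6)
```

The rows of `effect` trace how the shock to output feeds through to inflation and the interest rate over the following periods, which is the idea behind an impulse response.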

358
00:32:25,000 --> 00:32:26,720
So I think this is interesting.

359
00:32:26,720 --> 00:32:32,760
And so that's going to require a bit more explanation with statistics.

360
00:32:32,760 --> 00:32:36,320
So this is either something we'll do on Saturday or perhaps next week.

361
00:32:36,320 --> 00:32:43,280
I may kind of save this for next week, because I think the shocks are something that a lot

362
00:32:43,280 --> 00:32:45,780
of people would benefit from understanding.

363
00:32:45,780 --> 00:32:53,840
Because that's sort of seeing things through their logical conclusions.

364
00:32:53,840 --> 00:33:00,560
And I think that's really what's cool about the VARs is you estimate the model.

365
00:33:00,560 --> 00:33:03,680
You use the model to predict the future.

366
00:33:03,680 --> 00:33:09,960
And then you say, OK, what would happen if something changed in the present?

367
00:33:09,960 --> 00:33:14,040
How would that change things in the future?

368
00:33:14,040 --> 00:33:19,800
So that's really quite sophisticated analysis.

369
00:33:19,800 --> 00:33:23,480
And that's sort of what we're moving towards.

370
00:33:23,480 --> 00:33:33,040
And I've gotten near to doing it in the wild, estimating production shocks.

371
00:33:33,040 --> 00:33:36,560
Not quite, but close.

372
00:33:36,560 --> 00:33:46,320
So I think it is something that people can do for companies once you have a nice handle

373
00:33:46,320 --> 00:33:50,320
on all the other statistics.

374
00:33:50,320 --> 00:33:56,640
But first things first, forecasting.

375
00:33:56,640 --> 00:34:02,160
So we've got our sales going along.

376
00:34:02,160 --> 00:34:07,920
Now we can get prices, right?

377
00:34:07,920 --> 00:34:10,600
What do the prices look like?

378
00:34:10,600 --> 00:34:15,760
Well, isn't that interesting?

379
00:34:15,760 --> 00:34:22,840
Of course, prices go to zero if sales.

380
00:34:22,840 --> 00:34:31,320
Well, actually, look, prices didn't necessarily go to zero.

381
00:34:31,320 --> 00:34:44,320
So I think I may want to think about this data point here for a second.

382
00:34:44,320 --> 00:34:51,240
Right, because if there were no sales, you would expect prices to go to...

383
00:34:51,240 --> 00:34:54,440
So these are supposed to be off.

384
00:34:54,440 --> 00:34:56,440
So you'd expect prices to go to zero.

385
00:34:56,440 --> 00:35:00,640
Long story short.

386
00:35:00,640 --> 00:35:05,160
What I would expect is, OK, you've got the production shock.

387
00:35:05,160 --> 00:35:12,880
And so one would expect that in all the periods following April of 2020, that that shock would

388
00:35:12,880 --> 00:35:19,360
have sort of a ripple effect on prices throughout time.

389
00:35:19,360 --> 00:35:27,440
And it seems that here it's an immediate effect.

390
00:35:27,440 --> 00:35:32,440
If we look at our model, right?

391
00:35:32,440 --> 00:35:41,880
So inflation, which is prices, right?

392
00:35:41,880 --> 00:35:45,840
Inflation, right, the change in price.

393
00:35:45,840 --> 00:35:50,320
So that's dependent on output from yesterday.

394
00:35:50,320 --> 00:35:58,940
So we would expect, right, that if output all of a sudden takes a hit, then that's going

395
00:35:58,940 --> 00:36:03,480
to affect prices into the future.

396
00:36:03,480 --> 00:36:09,400
Thus, it's going to affect inflation into the future.

397
00:36:09,400 --> 00:36:17,440
And if inflation is going to be changing, well, that's going to have the... that's

398
00:36:17,440 --> 00:36:22,480
why it's a system of equations, right, because if prices change, then that's going to change

399
00:36:22,480 --> 00:36:27,880
inflation, which is then, right, output's dependent on inflation.

400
00:36:27,880 --> 00:36:32,000
And then, right, the interest rate is dependent, right?

401
00:36:32,000 --> 00:36:38,360
So these are all sort of interdependent on each other.

402
00:36:38,360 --> 00:36:43,960
So that's why we're about to estimate these as a system of equations here.

403
00:36:43,960 --> 00:36:47,200
So we've got our prices.

404
00:36:47,200 --> 00:36:58,040
We're going to go ahead and grab the federal funds rate, which we're going to assume.

405
00:36:58,040 --> 00:37:05,880
You know, we're basically really essentially going to make the assumption, right, that inflation

406
00:37:05,880 --> 00:37:12,360
depends on the interest rate and output depends on the interest rate.

407
00:37:12,360 --> 00:37:16,000
But the interest rate doesn't necessarily.

408
00:37:16,000 --> 00:37:24,760
So this is more of an abstract assumption in order to identify your VAR.

409
00:37:24,760 --> 00:37:31,880
But long story short, I'll let you get into the weeds of this.

410
00:37:31,880 --> 00:37:39,320
But check it out, page 96 of Keeganskee.com, macroeconomics.

411
00:37:39,320 --> 00:37:49,640
So it's worth your read because sort of assumptions like this are important because, well, if

412
00:37:49,640 --> 00:37:56,440
you don't correctly identify your VAR with these assumptions, then your output, then

413
00:37:56,440 --> 00:38:08,360
your parameters are biased because these variables, you know, they are dependent on the errors,

414
00:38:08,360 --> 00:38:11,440
right?

415
00:38:11,440 --> 00:38:17,280
Each equation, because it's dependent on the others, it's not independent of the error.

416
00:38:17,280 --> 00:38:32,480
Long story short, you do have to make assumptions in order not to have biased estimators.

417
00:38:32,480 --> 00:38:40,200
But what we're going to see is the VARs are interestingly effective at prediction.

418
00:38:40,200 --> 00:38:53,960
So we may have biased parameters, but we're going to do the Ten Commandments of Forecasting,

419
00:38:53,960 --> 00:38:54,960
right?

420
00:38:54,960 --> 00:39:01,100
So we may have biased parameters, but we're going to compare our forecasts with the actuals

421
00:39:01,100 --> 00:39:03,960
on a month by month basis.

422
00:39:03,960 --> 00:39:11,540
So we can at least judge our bias, right?

423
00:39:11,540 --> 00:39:23,760
So if we can maybe look at our forecasts, right, and we may be consistently over forecasting

424
00:39:23,760 --> 00:39:29,420
or we may be consistently under forecasting.

425
00:39:29,420 --> 00:39:35,040
So that's why it's going to be really important to use these recursive measures, right, because

426
00:39:35,040 --> 00:39:38,900
we'll want to measure our forecasting error from month to month to month.

427
00:39:38,900 --> 00:39:46,300
So that way if we make a forecast and we know, okay, in the past our forecasts have been

428
00:39:46,300 --> 00:39:53,900
under or over, you know, you can kind of take that into consideration.
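
As a rough sketch of judging that bias month by month, one could track the signed forecast error alongside its overall size. The numbers and the function names `forecast_errors`, `mean_error`, and `rmse` below are hypothetical, chosen only for illustration.

```python
from math import sqrt


def forecast_errors(forecasts, actuals):
    """Signed errors: positive means we over-forecast that month."""
    return [f - a for f, a in zip(forecasts, actuals)]


def mean_error(errors):
    """Average signed error, a simple measure of forecast bias."""
    return sum(errors) / len(errors)


def rmse(errors):
    """Root mean squared error, a measure of overall forecast accuracy."""
    return sqrt(sum(e * e for e in errors) / len(errors))


# Hypothetical monthly forecasts and the actuals that later came in.
forecasts = [100.0, 105.0, 110.0]
actuals = [98.0, 101.0, 104.0]

errors = forecast_errors(forecasts, actuals)
print("bias:", mean_error(errors))  # consistently positive -> over-forecasting
print("rmse:", round(rmse(errors), 3))
```

A mean error that stays on one side of zero as new months arrive is the signal that the model is consistently over or under forecasting.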

429
00:39:53,900 --> 00:40:02,400
So it's an imperfect science, but like we say, having a measure is better than no measure,

430
00:40:02,400 --> 00:40:07,280
but definitely hedge the measure with all the assumptions and biases that are built

431
00:40:07,280 --> 00:40:08,280
in.

432
00:40:08,280 --> 00:40:12,960
So that's the best I can tell you.

433
00:40:12,960 --> 00:40:21,880
But we're going to go through the process anyway, so that way you can see the imperfect

434
00:40:21,880 --> 00:40:24,960
nature of forecasting.

435
00:40:24,960 --> 00:40:29,960
If anyone ever asks you to do it, and then you can actually do it for them, and then,

436
00:40:29,960 --> 00:40:40,080
you know, please hedge to them what they need to take into consideration.

437
00:40:40,080 --> 00:40:46,180
So without further ado, hopefully it scared you off enough that you know, you're not going

438
00:40:46,180 --> 00:40:55,440
to place your life's fortune in these forecasts here.

439
00:40:55,440 --> 00:41:03,120
Okay, so first things first, we actually need to calculate inflation.

440
00:41:03,120 --> 00:41:17,120
And so we were given the prices, so we can actually calculate inflation here.

441
00:41:17,120 --> 00:41:26,440
Okay, just a heads up, in most industries, you would not expect a negative inflation,

442
00:41:26,440 --> 00:41:28,320
so that would be deflation.

443
00:41:28,320 --> 00:41:39,920
Okay, you would expect an increasing or decreasing rate of inflation, but the market wouldn't

444
00:41:39,920 --> 00:41:43,360
be stable unless there is a little bit of inflation.

445
00:41:43,360 --> 00:41:48,840
We talked about this in the past, where as soon as you experience deflation, it's essentially

446
00:41:48,840 --> 00:41:55,360
a race to the bottom, because everybody's just going to be waiting until the future

447
00:41:55,360 --> 00:41:59,440
to make their purchases, right?

448
00:41:59,440 --> 00:42:03,320
And if everybody's waiting for the future to make their purchases, well, output's just

449
00:42:03,320 --> 00:42:05,760
going to keep coming down and down.

450
00:42:05,760 --> 00:42:11,880
So you don't want to get caught in a deflationary spiral.

451
00:42:11,880 --> 00:42:16,560
You don't want inflation to be out of control either.

452
00:42:16,560 --> 00:42:35,040
So what policymakers are looking for is a steady rate of change of inflation.

453
00:42:35,040 --> 00:42:47,240
So we can maybe, right, so what would the rate

454
00:42:47,240 --> 00:43:01,640
Well, just the change of inflation would just be inflation diff.

455
00:43:01,640 --> 00:43:12,600
So right, so anyways, you'd want that to be fairly constant.
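
For reference, here is a small sketch of the two calculations just described: inflation as the period-over-period percent change in price, and the change in inflation as the first difference of that series. The price values and the helper names `pct_change` and `diff` are assumptions for illustration, not the meetup's data.

```python
def pct_change(series):
    """Period-over-period rate of change, e.g. inflation from prices."""
    return [(cur - prev) / prev for prev, cur in zip(series, series[1:])]


def diff(series):
    """First difference: the change from one period to the next."""
    return [cur - prev for prev, cur in zip(series, series[1:])]


# Hypothetical average monthly prices.
prices = [10.0, 11.0, 9.9, 9.9]

inflation = pct_change(prices)      # one fewer observation than prices
inflation_change = diff(inflation)  # one fewer observation again

print([round(x, 4) for x in inflation])
print([round(x, 4) for x in inflation_change])
```

Note that a single jump in price shows up in the change in inflation for two consecutive months, which is part of why a one-month shock does not stay isolated to one month.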

456
00:43:12,600 --> 00:43:22,000
As you can see, we've got a big change in inflation happening in August of 2020.

457
00:43:22,000 --> 00:43:26,640
And even actually not even so much, see, it's more the subsequent months, right?

458
00:43:26,640 --> 00:43:32,720
And this is where we were talking about how things are going to play out after various

459
00:43:32,720 --> 00:43:33,960
months, right?

460
00:43:33,960 --> 00:43:36,080
Because it's more, right?

461
00:43:36,080 --> 00:43:44,040
You've got a, yes, you've got a dip in August of 2020, but then the way inflation is calculated,

462
00:43:44,040 --> 00:43:45,920
the change in prices, right?

463
00:43:45,920 --> 00:43:51,520
You've got a big change in May, and then all of a sudden you've got a big change in June,

464
00:43:51,520 --> 00:43:52,520
July.

465
00:43:52,520 --> 00:43:55,200
So it's still white, right?

466
00:43:55,200 --> 00:43:58,800
So this is why you have sort of this ripple effect.

467
00:43:58,800 --> 00:44:09,440
So it's not just like the shock is isolated to one month.

468
00:44:09,440 --> 00:44:16,880
So now we're just going to just look at all of our variables over the same time period

469
00:44:16,880 --> 00:44:19,880
here, right?

470
00:44:19,880 --> 00:44:36,400
So we're basically given inflation rate from December of 2018 through September of 2021.

471
00:44:36,400 --> 00:44:44,440
So we'll just look at all our variables through this time period here.

472
00:44:44,440 --> 00:44:50,240
Awesome.

473
00:44:50,240 --> 00:44:51,440
Oh yeah, that's right.

474
00:44:51,440 --> 00:44:58,440
We're just looking at inflation, not prices.

475
00:44:58,440 --> 00:45:05,320
So now we're going to estimate the system of equations.

476
00:45:05,320 --> 00:45:11,440
So we've got output, inflation, and the interest rate.

477
00:45:11,440 --> 00:45:15,960
And we need to estimate these simultaneously.

478
00:45:15,960 --> 00:45:25,360
And you can actually simply just do this through ordinary least squares.

479
00:45:25,360 --> 00:45:36,600
So this is why we wrote our model here in matrix form.

480
00:45:36,600 --> 00:45:48,160
So we're just going to estimate our model through ordinary least squares.

481
00:45:48,160 --> 00:45:57,980
So this is where we get into the art of forecasting, right?

482
00:45:57,980 --> 00:46:07,280
So we're given that, right, our VAR depends on variables from prior days.

483
00:46:07,280 --> 00:46:12,360
But how many prior periods, not days, periods, right?

484
00:46:12,360 --> 00:46:14,520
Because we're doing monthly here.

485
00:46:14,520 --> 00:46:18,520
So does it depend on just last month?

486
00:46:18,520 --> 00:46:23,400
Does it depend on last month and two months ago?

487
00:46:23,400 --> 00:46:26,920
Does it depend on the whole quarter?

488
00:46:26,920 --> 00:46:34,720
So this is where we try to fit the best VAR from our in-sample data.

489
00:46:34,720 --> 00:46:40,440
So we'll basically try to fit a series of VARs.

490
00:46:40,440 --> 00:46:44,720
So we'll fit, in this case, a VAR 1 through 6.

491
00:46:44,720 --> 00:46:51,880
So we'll say, OK, sales today depend on sales from the prior month, the prior two months,

492
00:46:51,880 --> 00:46:55,160
three, four, five, six months.

493
00:46:55,160 --> 00:46:57,360
We're going to keep doing this.

494
00:46:57,360 --> 00:47:02,600
Keep in mind, we'll run into overfitting, right?

495
00:47:02,600 --> 00:47:10,320
Because say we're trying to say that each equation depends on observations from up to

496
00:47:10,320 --> 00:47:12,320
six periods ago.

497
00:47:12,320 --> 00:47:24,600
Well, then that's going to be six parameters times three for just one equation times three

498
00:47:24,600 --> 00:47:26,000
equations.

499
00:47:26,000 --> 00:47:35,880
So that would be, what's that, like 40-some parameters?

500
00:47:35,880 --> 00:47:47,160
So if you have a lot of observations, that will be wonderful.

501
00:47:47,160 --> 00:48:01,680
But we're only given 34 observations here because we only have 34 months of data.

502
00:48:01,680 --> 00:48:06,480
So we're not going to be able to estimate a VAR of six.
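
The parameter arithmetic can be checked with a quick sketch. It assumes each equation has one constant plus one coefficient per variable per lag; the function name `var_parameter_count` is hypothetical. By this count, three variables with six lags gives 18 lag coefficients plus a constant per equation, so 57 parameters across the three equations.

```python
def var_parameter_count(n_vars, n_lags, include_constant=True):
    """Return (parameters per equation, parameters in the whole system)."""
    per_equation = n_vars * n_lags + (1 if include_constant else 0)
    return per_equation, per_equation * n_vars


n_obs = 34  # months of data mentioned in the talk
for lags in range(1, 7):
    per_eq, total = var_parameter_count(3, lags)
    usable = n_obs - lags  # the first `lags` months are used up as lagged values
    print(f"VAR({lags}): {per_eq} per equation, {total} total, "
          f"{usable} usable observations per equation")
```

At six lags each equation would have 19 parameters against 28 usable months, which leaves very little room before overfitting.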

503
00:48:06,480 --> 00:48:15,560
So we're basically going to try to fit the best fitting VAR that we can with the data

504
00:48:15,560 --> 00:48:20,160
that we have here.

505
00:48:20,160 --> 00:48:30,000
And it actually, right?

506
00:48:30,000 --> 00:48:35,840
So we use statsmodels to fit all those and select the best order.

507
00:48:35,840 --> 00:48:43,680
So you use a Bayesian information criterion that penalizes you for adding parameters.

508
00:48:43,680 --> 00:48:51,240
So basically, what we're told is the best model is a VAR one.

509
00:48:51,240 --> 00:49:03,240
I suspect that we weren't even able to fit much above a VAR one given the number of observations

510
00:49:03,240 --> 00:49:04,600
we have.

511
00:49:04,600 --> 00:49:09,520
But so this is where, once again, you kind of have to hedge, right?

512
00:49:09,520 --> 00:49:17,720
Because it's like, OK, is the VAR one the best, most accurate model?

513
00:49:17,720 --> 00:49:18,720
Probably not.

514
00:49:18,720 --> 00:49:24,880
But it's the only one that we're really able to fit given our observations.

515
00:49:24,880 --> 00:49:34,640
So once again, we're just going to have to move forward with that, knowing that it's

516
00:49:34,640 --> 00:49:40,840
the best of all of our imperfect options.

517
00:49:40,840 --> 00:49:43,080
So keep that in mind.

518
00:49:43,080 --> 00:49:46,480
And this is where we can get into the recursive nature, right?

519
00:49:46,480 --> 00:49:54,680
Because say in a few months from now, right, or even a year from now or two years from

520
00:49:54,680 --> 00:49:58,240
now, we're going to have more data.

521
00:49:58,240 --> 00:50:06,800
So the more data we get, the more likely we are to be able to fit a higher order VAR model,

522
00:50:06,800 --> 00:50:11,840
which may be a better predictor.

523
00:50:11,840 --> 00:50:13,280
So that's why it's a recursive model.

524
00:50:13,280 --> 00:50:19,200
So each month, we'll see what's the best order VAR that we can fit.

525
00:50:19,200 --> 00:50:26,480
And if we can fit multiple ones, we may want to compare which one actually makes the best

526
00:50:26,480 --> 00:50:29,520
out of sample forecasts.

527
00:50:29,520 --> 00:50:39,160
Because just because you can forecast well in sample, it's out of sample forecasts that

528
00:50:39,160 --> 00:50:41,360
matter, right?

529
00:50:41,360 --> 00:50:45,720
So let's do just that.

530
00:50:45,720 --> 00:50:53,440
So right, we said, OK, we want to go ahead and get forecasts for the rest of this year

531
00:50:53,440 --> 00:50:56,520
plus next year.

532
00:50:56,520 --> 00:50:59,640
So great.

533
00:50:59,640 --> 00:51:00,640
So great.

534
00:51:00,640 --> 00:51:04,320
So our data, right, we go through the end of September.

535
00:51:04,320 --> 00:51:10,800
So we want to get observations for October, November, and December, and then the 12 months

536
00:51:10,800 --> 00:51:12,400
next year.

537
00:51:12,400 --> 00:51:26,760
So our horizon is 15 periods and our lag order is 1.

538
00:51:26,760 --> 00:51:32,440
So right, so right, it's real cool, right, because this is how you forecast with the

539
00:51:32,440 --> 00:51:33,680
VAR, right?

540
00:51:33,680 --> 00:51:39,320
So we've just estimated our parameters.

541
00:51:39,320 --> 00:51:49,680
So I wonder if we can just do this.

542
00:51:49,680 --> 00:51:57,240
OK, well, here are our parameters.

543
00:51:57,240 --> 00:52:04,320
So right, we estimated three equations.

544
00:52:04,320 --> 00:52:09,840
For each equation, we've got a constant and three parameters.

545
00:52:09,840 --> 00:52:21,640
So this is alpha y, beta j, gamma j, delta j, right?

546
00:52:21,640 --> 00:52:34,480
And then here, we've got alpha pi, theta j, theta j, lambda j.

547
00:52:34,480 --> 00:52:39,160
And I don't want to keep going through the Greek alphabet, so I'll leave you with the

548
00:52:39,160 --> 00:52:42,400
idea there.

549
00:52:42,400 --> 00:52:46,440
So we've estimated all our parameters.

550
00:52:46,440 --> 00:52:51,200
And so now, the cool thing is we can just plug and play.

551
00:52:51,200 --> 00:52:55,400
So we've got our parameters, A(L).

552
00:52:55,400 --> 00:53:00,640
And now, we can just plug in the observations from September, right?

553
00:53:00,640 --> 00:53:10,000
We can just plug in September output, September inflation, September interest rates, and we

554
00:53:10,000 --> 00:53:14,200
get October's estimates.

555
00:53:14,200 --> 00:53:22,560
Well, we can then take October's estimates, plug them in, use the same parameters, and

556
00:53:22,560 --> 00:53:25,640
we get November's estimates, right?

557
00:53:25,640 --> 00:53:28,000
And so you see how this plays out.

558
00:53:28,000 --> 00:53:36,200
Plug in November's estimates, use your parameters, December's estimates, so on and so forth.
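
The plug-and-play recursion can be coded up by hand roughly like this. It is a minimal sketch for a VAR of order one; the parameter values, the September observation, and the name `forecast_var1` are hypothetical and not the estimates from the meetup.

```python
def forecast_var1(const, coefs, last_obs, horizon):
    """Iterate a VAR(1): each forecast becomes the input for the next month."""
    forecasts = []
    state = list(last_obs)
    for _ in range(horizon):
        state = [c + sum(a * x for a, x in zip(row, state))
                 for c, row in zip(const, coefs)]
        forecasts.append(state)
    return forecasts


# Hypothetical estimates: one constant and three coefficients per equation,
# ordered as [output, inflation, interest rate].
const = [2.0, 0.0, 0.1]
coefs = [[0.9, -1.0, 0.5],
         [0.0, 0.5, 0.0],
         [0.0, 0.0, 0.9]]
september = [100.0, 0.02, 0.08]

# October 2021 through December 2022 is 15 months.
forecasts = forecast_var1(const, coefs, september, horizon=15)
for output, inflation, rate in forecasts[:3]:
    print(round(output, 3), round(inflation, 4), round(rate, 4))
```

Because every step reuses the same parameters on the previous forecast, a lag-one path like this tends to settle smoothly toward a long-run level rather than fluctuate.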

559
00:53:36,200 --> 00:53:38,600
You can code this up yourself.

560
00:53:38,600 --> 00:53:40,360
I've done so.

561
00:53:40,360 --> 00:53:49,280
Or we can just use statsmodels here and forecast forward.

562
00:53:49,280 --> 00:53:54,880
Get the same result.

563
00:53:54,880 --> 00:54:00,480
And here are our forecasts.

564
00:54:00,480 --> 00:54:07,880
So we have output, right?

565
00:54:07,880 --> 00:54:15,760
So it dips down to zero, going up, dip down, going up.

566
00:54:15,760 --> 00:54:20,560
And here are our forecasts.

567
00:54:20,560 --> 00:54:28,080
As you can see, and Heather can attest that this has been the case in other weeks, our

568
00:54:28,080 --> 00:54:31,400
confidence is really wide.

569
00:54:31,400 --> 00:54:36,240
We've got, right, we don't have a narrow confidence interval here, right?

570
00:54:36,240 --> 00:54:43,240
So sales could take a dip or they could rise.

571
00:54:43,240 --> 00:54:52,600
So this is where if you were doing your business planning, you would need to acknowledge your

572
00:54:52,600 --> 00:54:54,200
cost of forecasting, right?

573
00:54:54,200 --> 00:54:56,760
So it may not be symmetric, right?

574
00:54:56,760 --> 00:55:04,560
It may be asymmetrically worse for you if sales are lower than expected than if they're higher than

575
00:55:04,560 --> 00:55:07,960
expected.

576
00:55:07,960 --> 00:55:18,680
So you may want to over prepare for this lower period, these lower potential sales, right?

577
00:55:18,680 --> 00:55:26,240
So you may want to do some worst case scenario planning and just say, okay, what if sales

578
00:55:26,240 --> 00:55:32,320
dip way down to this threshold?

579
00:55:32,320 --> 00:55:36,120
We'll actually quantify this here in a second because we have these numbers.

580
00:55:36,120 --> 00:55:43,400
And then you can say, okay, so you need to plan for the rainy day scenario here.

581
00:55:43,400 --> 00:55:47,400
We at least have our estimates here.

582
00:55:47,400 --> 00:55:54,680
This is where we get into our wish for a higher order VAR model.

583
00:55:54,680 --> 00:56:00,960
So as you can see, there's not too much dynamism, if that's a word.

584
00:56:00,960 --> 00:56:07,080
There's not too much dynamics with our forecasts here, right?

585
00:56:07,080 --> 00:56:12,120
And that's because of our plug and play nature here, right?

586
00:56:12,120 --> 00:56:22,440
So if you're just going to plug in today's values to get November's values, right, depending

587
00:56:22,440 --> 00:56:36,120
on your parameters, if you just keep plugging in your forecasts, you may lose the dynamic.

588
00:56:36,120 --> 00:56:41,240
You're going to lose some of the variability and you're just going to get sort of these

589
00:56:41,240 --> 00:56:50,160
steady forecasts, which is not really what we would expect.

590
00:56:50,160 --> 00:56:59,080
And so what I've found is if you're able to estimate and fit a higher order VAR model,

591
00:56:59,080 --> 00:57:07,360
you're going to get a bit more dynamics in your estimates, which tend to reduce your

592
00:57:07,360 --> 00:57:09,560
forecasting error.

593
00:57:09,560 --> 00:57:21,360
So that way, your forecasts tend to fluctuate better with the actuals in my experience.

594
00:57:21,360 --> 00:57:24,960
However, we have to start somewhere.

595
00:57:24,960 --> 00:57:31,080
So we now have a crude forecast of output.

596
00:57:31,080 --> 00:57:36,520
We have a crude forecast of inflation.

597
00:57:36,520 --> 00:57:41,680
And then we even have a crude forecast of the federal funds rate.

598
00:57:41,680 --> 00:57:47,280
We're predicting it's going beneath zero, which we may expect.

599
00:57:47,280 --> 00:57:51,240
As we know, interest rates can't go below zero.

600
00:57:51,240 --> 00:58:00,160
And this is where the Federal Reserve is already undoubtedly doing quantitative easing, right,

601
00:58:00,160 --> 00:58:05,460
to effectively push the interest rate below zero.

602
00:58:05,460 --> 00:58:11,880
So interesting dynamics going on there.

603
00:58:11,880 --> 00:58:21,080
Before we conclude, let's go ahead and save our forecasts here.

604
00:58:21,080 --> 00:58:27,120
Because right, that was the whole point of forecasting, right, is we need to get the

605
00:58:27,120 --> 00:58:35,960
forecasts, so that way we can get a crude forecast for 2021, as well as 2022.

606
00:58:35,960 --> 00:58:39,040
Then we need to compare these to actuals.

607
00:58:39,040 --> 00:58:45,520
And just for the sake of brevity here, I'm just going to do this by hand, and then we

608
00:58:45,520 --> 00:58:47,960
can maybe do it programmatically later.

609
00:58:47,960 --> 00:58:51,160
But remember, the dates here, right.

610
00:58:51,160 --> 00:58:59,920
So this is for October 2021.

611
00:58:59,920 --> 00:59:01,440
Let's see if we can't.

612
00:59:01,440 --> 00:59:06,040
Okay, through December 2022.

613
00:59:06,040 --> 00:59:15,680
Let's insert some of these past values.

614
00:59:15,680 --> 00:59:28,360
Because basically I'm just curious what these...

615
00:59:28,360 --> 00:59:29,360
What was it?

616
00:59:29,360 --> 00:59:30,360
Monthly...

617
00:59:30,360 --> 00:59:33,360
What was this?

618
00:59:33,360 --> 00:59:52,040
Sorry, bear with me, because I just want to leave today with the last few estimates here.

619
00:59:52,040 --> 01:00:20,160
That way we can at least leave today with our forecast for 2021 and 2022 real quick.

620
01:00:20,160 --> 01:00:24,880
So let's just put this into Excel real quick.

621
01:00:24,880 --> 01:00:32,400
Okay, so like I said, we're just doing this real quick in bare bones just to have some

622
01:00:32,400 --> 01:00:34,960
tangible forecasts by the end of the day.

623
01:00:34,960 --> 01:01:00,040
But basically 2021 sales forecast, then 2022 sales forecast, and then forecasted change

624
01:01:00,040 --> 01:01:01,520
in sales.

625
01:01:01,520 --> 01:01:08,000
So just bear with me if you can stay an extra five minutes so we can actually walk away

626
01:01:08,000 --> 01:01:09,800
with these two numbers.

627
01:01:09,800 --> 01:01:12,720
But basically we're predicting, right.

628
01:01:12,720 --> 01:01:18,240
And like I said, I would prefer to do this programmatically than just hacking it away

629
01:01:18,240 --> 01:01:24,120
real quick in Excel.

630
01:01:24,120 --> 01:01:32,960
Right, we're just hacking away real quick in Excel just so we can get a number.

631
01:01:32,960 --> 01:01:39,760
So let's put this into millions.

632
01:01:39,760 --> 01:01:41,920
Okay.

633
01:01:41,920 --> 01:01:58,920
So we're estimating above 1 billion in sales.

634
01:01:58,920 --> 01:02:05,800
And let's see what the percent change we're forecasting is going to be.

635
01:02:05,800 --> 01:02:15,680
Okay, so this is interesting here.

636
01:02:15,680 --> 01:02:20,200
So real quick, right.

637
01:02:20,200 --> 01:02:27,240
We've, you know, as you can see, they're not the most dynamic forecasts in the world, but

638
01:02:27,240 --> 01:02:30,360
we wanted to get a number.

639
01:02:30,360 --> 01:02:33,040
And we've gotten a number here.

640
01:02:33,040 --> 01:02:34,520
We've gotten a couple.

641
01:02:34,520 --> 01:02:44,920
So we're going to go ahead and estimate that sales in 2021 in Massachusetts are going to

642
01:02:44,920 --> 01:02:51,120
be about 1.1 billion in 2021.

643
01:02:51,120 --> 01:02:57,800
And we're also estimating that they're just going to be a little higher than 1.1 billion

644
01:02:57,800 --> 01:03:00,180
in 2022.

645
01:03:00,180 --> 01:03:08,120
So we're only predicting, you know, a 1.2% increase in sales.
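
Doing the spreadsheet step programmatically might look something like the sketch below. The monthly figures are made up for illustration (in millions of dollars) and are not the meetup's numbers; `percent_change` is a hypothetical helper.

```python
def percent_change(old, new):
    """Percent change from old to new."""
    return (new - old) / old * 100


# Hypothetical monthly sales in millions of dollars.
actual_2021 = [90.0] * 9               # January through September actuals
forecast_2021 = [96.0, 97.0, 97.0]     # October through December forecasts
forecast_2022 = [93.0] * 12            # all twelve months forecast

total_2021 = sum(actual_2021) + sum(forecast_2021)
total_2022 = sum(forecast_2022)
change = percent_change(total_2021, total_2022)

print(f"2021 sales forecast: {total_2021:,.1f} million")
print(f"2022 sales forecast: {total_2022:,.1f} million")
print(f"forecasted change in sales: {change:.1f}%")
```

As actuals arrive each month, they can replace the corresponding forecast values in the 2021 and 2022 lists and the totals update automatically.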

646
01:03:08,120 --> 01:03:19,400
I just from my personal experience, working with other states, I foresee that sales are

647
01:03:19,400 --> 01:03:23,400
going to increase by a lot more than 1.2%.

648
01:03:23,400 --> 01:03:24,440
They may not.

649
01:03:24,440 --> 01:03:31,680
So this is where we're going to want to come back and visit this on a month by month basis.

650
01:03:31,680 --> 01:03:41,240
Because is it the case that this giant hit to production in 2020 is still filtering through

651
01:03:41,240 --> 01:03:44,840
the production?

652
01:03:44,840 --> 01:03:48,900
And is this big hit to inflation?

653
01:03:48,900 --> 01:03:54,960
Is this having lasting effects that may just completely lower overall production?

654
01:03:54,960 --> 01:04:00,280
Because there could have been a whole shift in the production function.

655
01:04:00,280 --> 01:04:12,280
So this is where we'll want to come and on a month by month basis, we'll want to come

656
01:04:12,280 --> 01:04:17,520
and look at the actual sales.

657
01:04:17,520 --> 01:04:25,680
And so we already know the actual sales through September 2021.

658
01:04:25,680 --> 01:04:31,520
And now what's cool is, you know, the Cannabis Data Science Meetup Group, we've now made

659
01:04:31,520 --> 01:04:34,940
forecasts all the way through 2022.

660
01:04:34,940 --> 01:04:39,840
So we can start filling in the actuals month by month.

661
01:04:39,840 --> 01:04:45,880
And we may need to add a whole new series of forecasts.

662
01:04:45,880 --> 01:04:57,920
So next month, we may forecast with a different model if this one has wildly inaccurate forecasts.

663
01:04:57,920 --> 01:05:05,240
And so we can just gradually adapt our model, gradually improve it, and see if we can't

664
01:05:05,240 --> 01:05:08,680
hone in our forecasts.

665
01:05:08,680 --> 01:05:11,200
And so this is what's cool.

666
01:05:11,200 --> 01:05:13,960
So you've now attended the meetup group.

667
01:05:13,960 --> 01:05:19,520
We've gotten real cannabis data through the API.

668
01:05:19,520 --> 01:05:26,720
We've talked about the economic theory, as well as an atheoretical statistical model

669
01:05:26,720 --> 01:05:30,160
that we can use for forecasting.

670
01:05:30,160 --> 01:05:40,240
We've now fit the model, created some visualizations that help to understand the data.

671
01:05:40,240 --> 01:05:49,120
And we've saved these new data points so that way we can continue to make ongoing analysis.

672
01:05:49,120 --> 01:05:51,960
So that's what's real cool about this.

673
01:05:51,960 --> 01:05:55,680
And it's completely transparent, completely open source.

674
01:05:55,680 --> 01:06:03,380
So right after this meetup, I'll go ahead and commit this to the GitHub repository.

675
01:06:03,380 --> 01:06:06,160
So it is MIT licensed.

676
01:06:06,160 --> 01:06:10,320
So you're welcome to use the code for any reason you see fit.

677
01:06:10,320 --> 01:06:17,760
It would just be awesome if you mentioned the authors, tossing the cannabis data science

678
01:06:17,760 --> 01:06:19,800
group if you wish.

679
01:06:19,800 --> 01:06:24,920
And then you're free to use the code however you see fit.

680
01:06:24,920 --> 01:06:29,000
So you're free to expand upon this.

681
01:06:29,000 --> 01:06:36,080
And just to point you in some avenues for further work.

682
01:06:36,080 --> 01:06:42,040
You can also estimate output with two additional VAR models.

683
01:06:42,040 --> 01:06:47,440
So you could estimate output using unemployment.

684
01:06:47,440 --> 01:06:52,280
You could also estimate output using hours worked.

685
01:06:52,280 --> 01:06:57,480
So perhaps on Saturday, I may show you these models.

686
01:06:57,480 --> 01:07:02,440
And so that way you could prepare two additional forecasts.

687
01:07:02,440 --> 01:07:04,000
And this is what it's all about.

688
01:07:04,000 --> 01:07:08,680
So you could have one forecast with one model.

689
01:07:08,680 --> 01:07:14,260
And then we could add on two more forecasts.

690
01:07:14,260 --> 01:07:19,640
And then we can compare the forecast error from all three models.

691
01:07:19,640 --> 01:07:24,420
And then we can see which model predicts the best.

692
01:07:24,420 --> 01:07:31,600
And then we can have a bit more faith in that model going forward.

693
01:07:31,600 --> 01:07:34,960
So that's the beauty to it.

694
01:07:34,960 --> 01:07:41,000
So until next time, you're welcome to use the code.

695
01:07:41,000 --> 01:07:43,480
Try to make some forecasts of your own.

696
01:07:43,480 --> 01:07:46,680
Feel free to email me if you have any questions.

697
01:07:46,680 --> 01:07:57,360
And then definitely check out Saturday Statistics because we may do even more forecasting.

698
01:07:57,360 --> 01:08:04,000
And we're going to walk away at the end of the day with an even, I wouldn't even call

699
01:08:04,000 --> 01:08:05,660
this a beautiful visualization.

700
01:08:05,660 --> 01:08:10,200
So we're going to walk away at the end of the day with a stunning, stunningly beautiful

701
01:08:10,200 --> 01:08:11,600
visualization.

702
01:08:11,600 --> 01:08:13,600
So that's the objective.

703
01:08:13,600 --> 01:08:21,240
And we'll iron down the forecasting a bit more and build up confidence in that.

704
01:08:21,240 --> 01:08:30,880
So we ran a little long today, but hopefully it was worthwhile for you.

705
01:08:30,880 --> 01:08:37,080
Any questions, comments, thoughts, ideas before we conclude today?

706
01:08:37,080 --> 01:08:39,080
Oh, yes.

707
01:08:39,080 --> 01:08:44,840
OK, here, Shayan has some in the chat here.

708
01:08:44,840 --> 01:08:51,160
So yes, so the confidence interval is a 95% confidence interval right out of the gate.

709
01:08:51,160 --> 01:08:58,400
But for actual applications, that's quite a high confidence level.

710
01:08:58,400 --> 01:09:03,960
I found that if you say you're presenting this to your manager or what have you, I think

711
01:09:03,960 --> 01:09:10,000
it's perfectly OK to reduce your confidence to maybe 80% or so.

712
01:09:10,000 --> 01:09:19,360
It's going to narrow your confidence bars, your confidence interval, to make your estimates

713
01:09:19,360 --> 01:09:21,400
appear to be a bit more precise.

714
01:09:21,400 --> 01:09:27,120
Once again, you've got to hedge that you're using only an 80% confidence interval.
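
To illustrate the trade-off, here is a rough sketch that assumes normally distributed forecast errors. The point forecast and standard error are hypothetical values, and `interval` is an illustrative helper, not a function from the session's code.

```python
from statistics import NormalDist

def interval(point, std_error, level):
    """Two-sided normal interval around a point forecast."""
    # For a two-sided interval, the critical value sits at the
    # (0.5 + level / 2) quantile of the standard normal distribution.
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return point - z * std_error, point + z * std_error

point_forecast = 100.0  # hypothetical point forecast
standard_error = 10.0   # hypothetical forecast standard error

for level in (0.95, 0.80):
    low, high = interval(point_forecast, standard_error, level)
    print(f"{level:.0%} interval: {low:.1f} to {high:.1f} (width {high - low:.1f})")
```

Under this assumption the 95% interval spans roughly 1.96 standard errors on each side and the 80% interval roughly 1.28, so the bars narrow by about a third. The forecast is no more accurate; only the stated coverage has dropped.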

715
01:09:27,120 --> 01:09:31,440
Also, this VAR does not address seasonality.

716
01:09:31,440 --> 01:09:37,680
So once again, Shayan is a whiz here and knows a lot more about this.

717
01:09:37,680 --> 01:09:41,920
So there's a lot you can add on to these VARs.

718
01:09:41,920 --> 01:09:46,120
So we use what's called an ARIMA model.

719
01:09:46,120 --> 01:09:52,960
Shayan notes that you can improve upon this in a number of ways.

720
01:09:52,960 --> 01:10:01,560
So the ARIMAX model is if you want to add on another explanatory variable.

721
01:10:01,560 --> 01:10:05,960
So I'll explain this real quick.

722
01:10:05,960 --> 01:10:12,240
So we've got our output, our inflation, our interest rate.

723
01:10:12,240 --> 01:10:17,000
Well, let's say we want to add on another variable.

724
01:10:17,000 --> 01:10:24,440
Some variables I love to add on are month effects, because you always know what month

725
01:10:24,440 --> 01:10:26,440
it is.

726
01:10:26,440 --> 01:10:32,920
That's the hard thing about adding on explanatory variables is you also have to predict them.

727
01:10:32,920 --> 01:10:42,440
So say you wanted to add on gas prices, well, that's not really that feasible, because you

728
01:10:42,440 --> 01:10:46,440
don't really know what gas prices are going to be a year from now.

729
01:10:46,440 --> 01:10:51,720
What's cool about the month is you do know what the month is going to be a year from

730
01:10:51,720 --> 01:10:52,720
now.

731
01:10:52,720 --> 01:10:53,720
It'll be October.

732
01:10:53,720 --> 01:11:00,120
So that's a cool way you can add on effects.
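
As a sketch of why month effects are convenient, the future values of month indicators can be written down from the calendar alone. The last observed date and the forecast horizon below are hypothetical, and `add_months` and `month_dummies` are illustrative helpers.

```python
from datetime import date

def add_months(start, count):
    """Return the first day of the month `count` months after `start`."""
    index = start.year * 12 + (start.month - 1) + count
    return date(index // 12, index % 12 + 1, 1)

def month_dummies(day):
    """Eleven 0/1 indicators for February through December.

    January is the omitted baseline, which avoids perfect collinearity
    with an intercept term.
    """
    return [1 if day.month == month else 0 for month in range(2, 13)]

last_observed = date(2021, 10, 1)  # hypothetical last month of data
horizon = 12                       # hypothetical forecast horizon in months

# Future regressors come straight from the calendar; nothing is forecast.
future_months = [add_months(last_observed, step) for step in range(1, horizon + 1)]
future_exog = [month_dummies(day) for day in future_months]

for day, row in zip(future_months, future_exog):
    print(day.strftime("%Y-%m"), row)
```

Rows like these could be supplied as the future exogenous values when forecasting with a model that accepts explanatory variables. A variable like gas prices would need a forecast of its own first.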

733
01:11:00,120 --> 01:11:03,400
You can also add on seasonality effects.

734
01:11:03,400 --> 01:11:07,120
And so this is going to be what's called like your moving averages.

735
01:11:07,120 --> 01:11:08,560
So you can kind of take.

736
01:11:08,560 --> 01:11:18,640
I don't know seasonality as well, but you can take seasonality into account.

737
01:11:18,640 --> 01:11:29,040
So long story short is, you know, Shayan pointed out some good additional models that you can

738
01:11:29,040 --> 01:11:31,040
extend upon this analysis with.

739
01:11:31,040 --> 01:11:36,040
And that's exactly what I encourage you to do, because this is the whole idea of the

740
01:11:36,040 --> 01:11:42,040
transparent box. When you see other people's forecasts, you

741
01:11:42,040 --> 01:11:44,040
know, who knows what model they used?

742
01:11:44,040 --> 01:11:47,040
Are they willing to even say what model they used?

743
01:11:47,040 --> 01:11:50,040
And can it be extended upon?

744
01:11:50,040 --> 01:11:53,040
In this case, we just used an ARIMA model.

745
01:11:53,040 --> 01:11:55,040
It can be extended upon.

746
01:11:55,040 --> 01:11:59,040
And by all means, please try.

747
01:11:59,040 --> 01:12:04,040
And then, you know, you may get better forecasts than we made today.

748
01:12:04,040 --> 01:12:08,040
Excellent, excellent question.

749
01:12:08,040 --> 01:12:10,040
All right.

750
01:12:10,040 --> 01:12:13,040
We ran quite a bit over today.

751
01:12:13,040 --> 01:12:14,040
Excellent.

752
01:12:14,040 --> 01:12:16,040
So definitely be in touch.

753
01:12:16,040 --> 01:12:17,040
So we ran a bit over today.

754
01:12:17,040 --> 01:12:20,040
So I'm going to go ahead and conclude it here.

755
01:12:20,040 --> 01:12:22,040
But I think it was a fun day.

756
01:12:22,040 --> 01:12:28,040
So we finally got our hands back on some data and some statistics.

757
01:12:28,040 --> 01:12:33,040
Like I said, I think the visualizations can still be improved upon and we can still do some more modeling.

758
01:12:33,040 --> 01:12:38,040
So let's extend upon that on Saturday.

759
01:12:38,040 --> 01:12:50,040
And then next week, I think it'll be fun to get into the exogenous shocks so we can say, OK, what happens if all of a sudden there's a price shock in Massachusetts?

760
01:12:50,040 --> 01:12:54,040
Or what happens if all of a sudden there's a production shock?

761
01:12:54,040 --> 01:13:05,040
And so you can brainstorm about why there may be various shocks, but the why we don't have to worry about; we can just do the statistics.

762
01:13:05,040 --> 01:13:13,040
But by all means, bring some good stories as to why there may be some good economic shocks.

763
01:13:13,040 --> 01:13:20,040
And then, like I said, we can do the analyses and then you can take these and apply them how you will.

764
01:13:20,040 --> 01:13:22,040
So we originally did these in Oregon.

765
01:13:22,040 --> 01:13:30,040
And so these analyses could be done in Colorado, where I am now, or wherever you see fit.

766
01:13:30,040 --> 01:13:35,040
So please, please take what we've provided here and expand upon it.

767
01:13:35,040 --> 01:13:44,040
So. All right, crew, until next time, stay productive and feel free to reach out.

768
01:13:44,040 --> 01:14:06,040
Cannlytics is always here to support you.

