1
00:00:00,000 --> 00:00:18,080
Well, good morning everybody. How are you doing this week? Good. Yeah, good. I want

2
00:00:18,080 --> 00:00:24,580
to thank Charles real quick. So Charles sent me the pipeline workbook he put together last

3
00:00:24,580 --> 00:00:36,680
night with waste by licensee by day, which is the exact data point that Better Carbon

4
00:00:36,680 --> 00:00:42,680
Solutions has been searching for. And I think it'll be helpful for anyone who's interested

5
00:00:42,680 --> 00:00:49,240
in cannabis waste. And I think we were talking about last week, how perhaps people in other

6
00:00:49,240 --> 00:00:57,560
states could even look at Washington and learn from us. So it could be helpful to waste producers

7
00:00:57,560 --> 00:01:05,320
in other states as well. Good work there, Charles. Do you have any comments? So, you

8
00:01:05,320 --> 00:01:09,920
know, definitely take a look at it and see how you want to display that. There's over

9
00:01:09,920 --> 00:01:17,800
like 1100 different producers that produced waste. And so I wasn't sure how to like display

10
00:01:17,800 --> 00:01:31,200
that. Well, I would just start with the average. So just, what's the average amount

11
00:01:31,200 --> 00:01:41,840
of waste per licensee per day. And so I would maybe start there. So that way you can

12
00:01:41,840 --> 00:01:48,680
just get a singular time series. You could look at like the top 10, you know, see what

13
00:01:48,680 --> 00:01:59,160
the top 10 producers, how much waste they're producing. And then for this one, you didn't

14
00:01:59,160 --> 00:02:03,480
really need to sort it by licensee. But it would also just be interesting to look at

15
00:02:03,480 --> 00:02:15,240
total waste by day. So those are the three metrics that I would start by plotting. And

16
00:02:15,240 --> 00:02:21,840
then those are just all time series. So you could get more creative than that, I'm sure.
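
The three starting metrics just described can be sketched in pandas; the column names (`licensee`, `date`, `waste`) are hypothetical stand-ins for whatever the workbook actually uses:

```python
import pandas as pd

# Hypothetical workbook data: one row per licensee per day.
df = pd.DataFrame({
    "licensee": ["A", "A", "B", "B", "C", "C"],
    "date": pd.to_datetime(["2021-04-01", "2021-04-02"] * 3),
    "waste": [10.0, 12.0, 5.0, 7.0, 30.0, 28.0],
})

# 1. Average waste per licensee per day (a single time series).
avg_per_day = df.groupby("date")["waste"].mean()

# 2. Top 10 producers by total waste.
top_10 = df.groupby("licensee")["waste"].sum().nlargest(10)

# 3. Total waste by day.
total_per_day = df.groupby("date")["waste"].sum()
```

Each of the three results is a series indexed by date or licensee, so plotting them is one `.plot()` call away.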

17
00:02:21,840 --> 00:02:32,160
Okay. I'll keep plugging away at it. Speaking of getting creative, so one of

18
00:02:32,160 --> 00:02:38,760
my favorite types of figures is a map. So you could potentially have a map and have

19
00:02:38,760 --> 00:02:46,040
almost... I'm not sure you could do it, but I wonder if you could do, like, radial sizes.

20
00:02:46,040 --> 00:02:54,680
So you just have bigger circles around the bigger waste producers. But I'm not

21
00:02:54,680 --> 00:03:00,980
sure if that would really make sense. That way you could almost get like a geographical

22
00:03:00,980 --> 00:03:08,200
sense of where all the waste is. Yeah, maybe like a heat map of the region.
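
The radial-size idea could be sketched in matplotlib; the coordinates and waste totals here are made up purely for illustration:

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen, no display needed
import matplotlib.pyplot as plt

# Hypothetical licensee locations (longitude, latitude) and waste totals.
lons = [-122.3, -120.5, -117.4]
lats = [47.6, 46.6, 47.7]
waste = [500.0, 120.0, 2000.0]

fig, ax = plt.subplots()
# Marker area proportional to waste, so bigger producers get bigger circles.
ax.scatter(lons, lats, s=[w / 5 for w in waste], alpha=0.5)
ax.set_xlabel("Longitude")
ax.set_ylabel("Latitude")
ax.set_title("Cannabis waste by licensee location (hypothetical data)")
```

A true heat map would instead bin the points into a grid and color each cell by total waste, but sized circles are the quicker first pass.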

23
00:03:08,200 --> 00:03:13,960
Exactly. A heat map would be a good one. If you're feeling ambitious, but you have the

24
00:03:13,960 --> 00:03:22,840
data, so it's possible. And if you need to geocode the addresses, Cannlytics has a little

25
00:03:22,840 --> 00:03:29,280
utility helper where, for the most part, you can just toss in someone's name, just their

26
00:03:29,280 --> 00:03:40,480
licensee name, and it can get you the geocoded address. But anywho, good work there.

27
00:03:40,480 --> 00:03:46,360
And how about this: maybe I'll take a look at it, we can digest it. And then if you want

28
00:03:46,360 --> 00:03:51,160
to, we could look at your waste data next week.

29
00:03:51,160 --> 00:03:52,160
Okay.

30
00:03:52,160 --> 00:04:05,440
So that's awesome. Good to see you here too, Nick. I've actually got something interesting

31
00:04:05,440 --> 00:04:14,720
for today. So last week we started looking at the inflation of cannabis prices in Oregon.

32
00:04:14,720 --> 00:04:28,720
So I finished compiling that data. So let me just go ahead and push it up to the Cannabis

33
00:04:28,720 --> 00:04:46,640
Data Science repository here. And then I will go ahead and start showing you just a bit

34
00:04:46,640 --> 00:05:03,720
of the progress that I made on calculating inflation rates since last week. Don't worry,

35
00:05:03,720 --> 00:05:14,560
Charles, you didn't miss anything. I'm just opening up the presentation.

36
00:05:14,560 --> 00:05:31,240
All right. So it is actually April 21st. However, this is essentially the presentation I put

37
00:05:31,240 --> 00:05:37,240
together last week. So Charles has seen this. Nick, I'm just going to run through this real

38
00:05:37,240 --> 00:05:44,200
quick just so that you can get up to speed with the data we're looking at and the stats

39
00:05:44,200 --> 00:05:52,280
we're trying to calculate here. So we're looking at total cannabis sales in Oregon, and we

40
00:05:52,280 --> 00:06:01,400
can break that down by product type. And this is available through Oregon's public data

41
00:06:01,400 --> 00:06:11,480
dashboard for the cannabis industry. And they update their data quickly. So you have everything

42
00:06:11,480 --> 00:06:18,280
through the prior month. So you have everything up through March of 2021, which is phenomenal

43
00:06:18,280 --> 00:06:23,760
that they publish their data so quickly. Yeah. Did they fix that dashboard? Because last

44
00:06:23,760 --> 00:06:29,520
time I checked, I always had issues, but maybe I had the wrong link. But

45
00:06:29,520 --> 00:06:33,320
like it would always just error out and nothing would show up. So that's...

46
00:06:33,320 --> 00:06:46,480
Yes. So the dashboard is a little tricky. So when you access it, it's actually an HTTP

47
00:06:46,480 --> 00:06:56,880
address and you may have to hit the "Advanced, proceed to unsafe website" button. Okay.

48
00:06:56,880 --> 00:07:07,720
All right. And so it may look like the website's unavailable at first, but essentially I think

49
00:07:07,720 --> 00:07:17,480
they're just running on HTTP. And then most modern browsers warn you against navigating

50
00:07:17,480 --> 00:07:23,200
to some HTTP sites. It's worse than that. It means that they're

51
00:07:23,200 --> 00:07:28,200
running HTTPS, but they have a self-signed cert or one that can't be verified.
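
If you still want to pull the data programmatically despite the certificate warning, one stdlib approach is an unverified SSL context. This is only a sketch (the URL below is a placeholder), and skipping verification is only reasonable for public, non-sensitive data like this dashboard:

```python
import ssl
import urllib.request

def insecure_context() -> ssl.SSLContext:
    """Build an SSL context that skips certificate verification.

    Only appropriate for public, non-sensitive data, e.g. a state
    dashboard served with a self-signed or unverifiable certificate.
    """
    ctx = ssl.create_default_context()
    ctx.check_hostname = False       # must be disabled before verify_mode
    ctx.verify_mode = ssl.CERT_NONE  # accept the unverifiable cert
    return ctx

# Hypothetical usage (placeholder URL):
# with urllib.request.urlopen("https://example-dashboard.example.gov/data",
#                             context=insecure_context()) as resp:
#     raw = resp.read()
```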

52
00:07:28,200 --> 00:07:38,880
Oh, thanks for chiming in. So exactly. So it's good to hear from someone who knows their

53
00:07:38,880 --> 00:07:49,440
computer science. So thank you for chiming in. So maybe proceed at your own caution,

54
00:07:49,440 --> 00:07:56,360
but in this case I went ahead and collected the data because the data is valuable in our

55
00:07:56,360 --> 00:07:59,360
case. Yeah.

56
00:07:59,360 --> 00:08:09,360
So the other thing is you may figure out a niftier way. I essentially... So here I'll

57
00:08:09,360 --> 00:08:26,880
show you what the data portal looks like. So as we've noticed, it's not secure, but

58
00:08:26,880 --> 00:08:37,920
we're not posting any sensitive data here. So long story short, you can download some

59
00:08:37,920 --> 00:08:49,920
series, but some series it's harder to download. So I actually just transcribed the figures.

60
00:08:49,920 --> 00:08:59,480
And I put them here in the... Well, I may have saved... Oh, yes. So essentially I've

61
00:08:59,480 --> 00:09:13,200
just transcribed the numbers here into Excel. So you can double check my work that I transcribed

62
00:09:13,200 --> 00:09:23,200
everything correctly. But if you can think of a niftier way to collect this data, then

63
00:09:23,200 --> 00:09:33,400
by all means, have at it. So back to the presentation. We've collected sales data, we've collected

64
00:09:33,400 --> 00:09:45,680
price data, and we're curious about inflation of prices. So of course you notice right off

65
00:09:45,680 --> 00:09:54,360
the bat that prices in the first two years decreased dramatically as you would expect

66
00:09:54,360 --> 00:09:59,680
in a new market. So as things are coming to equilibrium, as people are figuring out how

67
00:09:59,680 --> 00:10:08,000
to price their products. And then you notice in the longer run, prices seem to stabilize

68
00:10:08,000 --> 00:10:18,120
and you have steady... It appears that you may have steady, moderate inflation at this point.

69
00:10:18,120 --> 00:10:29,840
And so that would just be a moderate increase in prices over time. And then we've noted

70
00:10:29,840 --> 00:10:38,920
that economic theory would suggest that sales and prices would be correlated with the interest

71
00:10:38,920 --> 00:10:46,720
rate. We notice the cannabis industry is a little unique, so it may not be as strongly

72
00:10:46,720 --> 00:10:52,440
correlated with the interest rate as other industries may be. And so we're just going

73
00:10:52,440 --> 00:11:00,800
to see if the interest rate can be helpful to predict inflation in the cannabis industry

74
00:11:00,800 --> 00:11:13,280
or not. So here you see the interest rate is lowered to near zero in early 2020, March

75
00:11:13,280 --> 00:11:26,200
of 2020 to be specific. Well, maybe February. Next, we're going to try to estimate output,

76
00:11:26,200 --> 00:11:32,680
inflation and the interest rate. And just for a reasonable timeline, we'll just do through

77
00:11:32,680 --> 00:11:40,080
2022. So just for the remainder of the year to see what output and inflation may be in

78
00:11:40,080 --> 00:11:52,040
Oregon this year. So we are using economic theory. However, the statistical model is

79
00:11:52,040 --> 00:12:01,960
atheoretical. So it's really just taking any vector of time series and using historical

80
00:12:01,960 --> 00:12:12,920
observations to forecast that data forward. It's useful because you can apply it to really

81
00:12:12,920 --> 00:12:23,680
any time series data, as long as it's coming in at a regular interval. So you can

82
00:12:23,680 --> 00:12:33,400
look at daily data. If you're looking at actual individual transactions during the day, there's

83
00:12:33,400 --> 00:12:40,600
statistical problems there. And then the other major pitfall of vector autoregressions is,

84
00:12:40,600 --> 00:12:50,120
of course, if you look at our equations from earlier, you'll see that there are a lot of

85
00:12:50,120 --> 00:13:04,480
variables. So sales depend on sales from however many periods ago. So they may depend on one,

86
00:13:04,480 --> 00:13:15,400
two, six periods ago. Likewise, it also depends on inflation from one to six periods ago,

87
00:13:15,400 --> 00:13:25,600
ten periods ago, however many lags are in the model; similar for inflation. So if you

88
00:13:25,600 --> 00:13:35,320
only have, so in this case, we're looking at monthly price data. So we're just looking

89
00:13:35,320 --> 00:13:47,560
at the number of months really between 2017 and early 2021. So we have about 50 observations

90
00:13:47,560 --> 00:13:58,420
here. And so we're going to be chewing through our degrees of freedom quite quickly. Because

91
00:13:58,420 --> 00:14:14,360
if, say, you just had one lag, then in this first model, you would have four parameters

92
00:14:14,360 --> 00:14:28,000
times three models. So that would be 12 parameters with just one lag order. So with 50 observations,

93
00:14:28,000 --> 00:14:36,440
you probably couldn't estimate the VAR with a lag order of six. You probably would not

94
00:14:36,440 --> 00:14:48,560
have enough observations. So keep that in mind when you're estimating your VARs. So

95
00:14:48,560 --> 00:14:57,480
that's why monthly data can be a little tricky. Weekly data is nice. And then daily data has

96
00:14:57,480 --> 00:15:10,400
its advantages because you get a lot of observations. And my only note is that which

97
00:15:10,400 --> 00:15:22,080
frequency you want to choose depends on your forecasting horizon. This is just a slight note

98
00:15:22,080 --> 00:15:30,440
that just says that we don't actually think that... Well, essentially, we're just saying

99
00:15:30,440 --> 00:15:36,200
we think that people essentially take the interest rate into their calculations when

100
00:15:36,200 --> 00:15:43,560
they're making decisions. So the Federal Reserve can change it however they want. People are

101
00:15:43,560 --> 00:15:53,440
just going to change prices and act accordingly. So enough of that. Let's go ahead and get

102
00:15:53,440 --> 00:16:04,880
into forecasting. So real quick, just to run over the 10 steps. These are essentially... Professor

103
00:16:04,880 --> 00:16:14,400
Iqbal at UNC Charlotte told me these 10 steps to forecasting. And they've proved pretty

104
00:16:14,400 --> 00:16:20,760
useful as 10 steps to follow when you're doing forecasting. But first, we need to know what

105
00:16:20,760 --> 00:16:29,680
we're forecasting: the inflation rate of Oregon cannabis prices. We need to understand why

106
00:16:29,680 --> 00:16:38,520
we're forecasting because we want to know how prices are going to be moving in Oregon.

107
00:16:38,520 --> 00:16:43,320
As we saw, it looks like there's going to be moderate inflation, but we want to actually

108
00:16:43,320 --> 00:16:50,800
quantify that. And we need to acknowledge the cost of the forecast error. So that would

109
00:16:50,800 --> 00:16:59,640
just mean we can't put too much stock in our forecast. So what's the cost of over

110
00:16:59,640 --> 00:17:07,080
estimating inflation and what's the cost of underestimating inflation? Because if we put

111
00:17:07,080 --> 00:17:13,040
these forecasts out there and businesses make decisions based on what they think

112
00:17:13,040 --> 00:17:20,960
prices may be, they need to take into consideration what's the chance of overestimation slash

113
00:17:20,960 --> 00:17:28,000
underestimation. That's a tricky one, but important to take into consideration.
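
One way to make that cost asymmetry concrete is a loss function that penalizes over- and underestimation differently. This is just an illustrative sketch, not part of the model; the weights are hypothetical and would come from a business's own consequences for each kind of miss:

```python
def asymmetric_loss(forecast: float, actual: float,
                    over_cost: float = 1.0, under_cost: float = 2.0) -> float:
    """Cost of a forecast error, weighting underestimation more heavily.

    The over_cost/under_cost weights are hypothetical placeholders.
    """
    error = forecast - actual
    if error >= 0:                 # we overestimated inflation
        return over_cost * error
    return under_cost * -error     # we underestimated inflation
```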

114
00:17:28,000 --> 00:17:34,200
Next, we just need to know the horizon, which we're just going to be forecasting through

115
00:17:34,200 --> 00:17:45,920
2022. Long enough to be useful, but not so long that it's unrealistic. We've chosen

116
00:17:45,920 --> 00:18:01,920
our variables, inflation, output, and the interest rate. And we've chosen them because

117
00:18:01,920 --> 00:18:15,880
of economic theory. We've then chosen a vector autoregression to do the statistics. And the

118
00:18:15,880 --> 00:18:21,680
vector autoregression is a model that fits our economic theory and it's a relatively

119
00:18:21,680 --> 00:18:28,000
simple statistical model that we can also use for forecasting.

120
00:18:28,000 --> 00:18:34,720
We'll want to present the results in some sort of figure. I'm not sure if you've heard

121
00:18:34,720 --> 00:18:42,000
of Edward Tufte, but he would always emphasize that you need to show the data in some sort

122
00:18:42,000 --> 00:18:52,680
of beautiful figure. So we'll throw out some charts. Then we'll want to interpret what

123
00:18:52,680 --> 00:18:58,440
predictions we've made. So we'll want to see if inflation is rising or decreasing, and what's

124
00:18:58,440 --> 00:19:05,920
our confidence. Next, we will essentially want to use recursive

125
00:19:05,920 --> 00:19:13,760
methods. So what this means is we want to keep forecasting each month. So we'll want

126
00:19:13,760 --> 00:19:23,120
to revisit our forecasts next month and compare our forecasts to the actuals and see if we

127
00:19:23,120 --> 00:19:29,960
can't make better forecasts the next month. And when we do that, that brings us to the

128
00:19:29,960 --> 00:19:36,120
final step. So we'll be forecasting April

129
00:19:36,120 --> 00:19:43,400
sales today. And then in May, we can actually check our

130
00:19:43,400 --> 00:19:50,760
forecasts and see if the model we selected was a good predictor. And we may need to select

131
00:19:50,760 --> 00:19:58,960
a different model if our forecasts were wildly wrong, or slightly adapt the model, or

132
00:19:58,960 --> 00:20:08,640
stick with it if we can't find a better model. So that's a quick background. Are there any

133
00:20:08,640 --> 00:20:27,320
questions before we jump into the code here? All right. So I just opened up Spyder here.

134
00:20:27,320 --> 00:20:35,560
There's nothing special about Spyder. I'm getting used to running Python in VS Code.

135
00:20:35,560 --> 00:20:45,840
However, I'm still just faster and more effective in Spyder. So until I'm better in VS Code,

136
00:20:45,840 --> 00:20:55,960
we'll be here. Or another editor. I've heard of PyCharm and Atom. Both seem like good choices.

137
00:20:55,960 --> 00:21:08,560
I haven't tried either. Your weapon of choice.

138
00:21:08,560 --> 00:21:21,120
The packages we're using today are essentially NumPy and Pandas. And then I'll be reading

139
00:21:21,120 --> 00:21:32,080
in a FRED API key from an environment variable. And then

140
00:21:32,080 --> 00:21:46,240
we'll be using the FRED API to get the interest rate.
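
Reading the key from an environment variable and calling the FRED API (the St. Louis Fed's data service) might look like this. The environment variable name `FRED_API_KEY` and the helper function are hypothetical; `FEDFUNDS` is FRED's series ID for the effective federal funds rate:

```python
import os
from urllib.parse import urlencode

def fred_observations_url(series_id: str, api_key: str) -> str:
    """Build a request URL for FRED's series/observations endpoint."""
    params = urlencode({
        "series_id": series_id,
        "api_key": api_key,
        "file_type": "json",
    })
    return f"https://api.stlouisfed.org/fred/series/observations?{params}"

# Read the API key from an environment variable rather than hard-coding it.
api_key = os.environ.get("FRED_API_KEY", "")
url = fred_observations_url("FEDFUNDS", api_key)
# A GET request to `url` returns JSON observations for the series.
```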

141
00:21:46,240 --> 00:21:56,120
First things first, we need to read in the data. So I'll just be reading in this data

142
00:21:56,120 --> 00:22:15,600
that I've scrounged from the Oregon dashboard. Just to look at the first observations.

143
00:22:15,600 --> 00:22:32,000
Notice, we don't really have consistent prices across the board until about 2017. So a professor

144
00:22:32,000 --> 00:22:39,320
in college really emphasized you never want to throw away data. However, in this case,

145
00:22:39,320 --> 00:22:47,200
just for simplicity's sake, I'm restricting the time period to 2017 and onwards. And that's

146
00:22:47,200 --> 00:22:52,520
for simplicity's sake. If you have a more elegant way to estimate the models, then by

147
00:22:52,520 --> 00:22:58,680
all means.
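
Restricting the sample to 2017 onwards is a one-liner in pandas, assuming the series carries a DatetimeIndex; the values here are hypothetical:

```python
import pandas as pd

# Hypothetical monthly series with sparse observations before 2017.
data = pd.Series(
    [100.0, 95.0, 90.0, 88.0],
    index=pd.to_datetime(["2016-11-01", "2016-12-01",
                          "2017-01-01", "2017-02-01"]),
)

# Keep only 2017 onwards, dropping the inconsistent early observations.
data = data[data.index >= "2017-01-01"]
```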

148
00:22:58,680 --> 00:23:11,320
We're looking at data from 2017 through March of 2021. And first we'll look at the data.

149
00:23:11,320 --> 00:23:23,280
So what are some of the variables we have here? We've got total sales. We've got flower

150
00:23:23,280 --> 00:23:51,440
sales. We've got concentrate sales. I thought we had retail flower prices. Okay, so we've

151
00:23:51,440 --> 00:24:04,920
got retail flower prices. And we have retail concentrate prices. Those are our variables

152
00:24:04,920 --> 00:24:08,920
of interest.

153
00:24:08,920 --> 00:24:19,680
And so last week, we talked briefly about how to calculate the CPI. And just to show

154
00:24:19,680 --> 00:24:35,080
you a presentation from last week with my chicken scratch. We basically noted that the

155
00:24:35,080 --> 00:24:47,840
consumer's basket of goods is about 60% flower and about 40% concentrate. So if you were

156
00:24:47,840 --> 00:24:56,800
going to calculate the CPI, which is a price index. So you can think about a stock index,

157
00:24:56,800 --> 00:25:05,000
which just sort of measures the general trend of all the goods in the basket. And so our

158
00:25:05,000 --> 00:25:18,760
basket is concentrates and flower. So our CPI will be the sum of all prices times their

159
00:25:18,760 --> 00:25:28,960
weight. So we can actually do that here in Python or your favorite programming language.

160
00:25:28,960 --> 00:25:47,880
So essentially, we can calculate the share of flower out of total sales.
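
Putting the weighting scheme above into code, the shares, the CPI, and the month-over-month rate of change might be computed like this; the column names and values are hypothetical stand-ins for the Oregon spreadsheet:

```python
import pandas as pd

# Hypothetical monthly Oregon data.
df = pd.DataFrame({
    "total_sales": [100.0, 110.0, 120.0],
    "flower_sales": [70.0, 72.0, 66.0],
    "concentrate_sales": [20.0, 25.0, 36.0],
    "flower_price": [10.0, 9.5, 9.6],
    "concentrate_price": [30.0, 29.0, 29.5],
})

# Shares of total sales (edibles and other goods are excluded).
df["flower_share"] = df["flower_sales"] / df["total_sales"]
df["concentrate_share"] = df["concentrate_sales"] / df["total_sales"]

# CPI: sum of each price times its weight in the basket.
df["cpi"] = (df["flower_share"] * df["flower_price"]
             + df["concentrate_share"] * df["concentrate_price"])

# Inflation: (CPI_t - CPI_{t-1}) / CPI_{t-1}, i.e. the rate of change.
df["inflation"] = df["cpi"].pct_change()
```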

161
00:25:47,880 --> 00:25:57,120
Rule number one: look at the data. So as you see, flower was making up about 70% of all sales.

162
00:25:57,120 --> 00:26:07,880
And it's actually decreased, which is interesting, and is now maybe kind of stabilizing around

163
00:26:07,880 --> 00:26:20,120
55 or so percent. That's quite interesting. Then you can calculate concentrate share of

164
00:26:20,120 --> 00:26:31,640
the sales. And so this should be roughly the inverse of the flower share. Because

165
00:26:31,640 --> 00:26:39,720
keep in mind, there are also edibles and other goods. So we are excluding some data here.

166
00:26:39,720 --> 00:26:46,880
So it would be nice to have prices on edibles, but edibles vary so much across the board

167
00:26:46,880 --> 00:26:56,400
that it may not be comparing apples to apples, or infused apple to infused apple.

168
00:26:56,400 --> 00:27:01,960
Yeah, that's what sucks about this is they kind of lumped everything together, like edibles

169
00:27:01,960 --> 00:27:09,800
and like some of that. Because the concentrate would include oil then too, right?

170
00:27:09,800 --> 00:27:11,240
Yes.

171
00:27:11,240 --> 00:27:14,600
So it's a lot of different things that they put in there. And then on the manufacturing

172
00:27:14,600 --> 00:27:19,560
side, we have to consider these are retail sales. So if we're forecasting, we have to

173
00:27:19,560 --> 00:27:25,960
actually consider there's a markup. Because we're trying to get back to what we sell it

174
00:27:25,960 --> 00:27:32,840
at. So it gets a little complicated.

175
00:27:32,840 --> 00:27:41,080
That's a really good observation from someone who's in the industry. Because as you noted,

176
00:27:41,080 --> 00:27:48,200
you're sort of aggregating everything here. As you said, there's a wide array of concentrate

177
00:27:48,200 --> 00:27:56,280
goods. So you've got everything, from your really high percentage distillates

178
00:27:56,280 --> 00:28:06,360
to maybe just some lower percentage waxes and other oils. And then, I mean, there are

179
00:28:06,360 --> 00:28:16,800
concentrates being made and sold to edible producers. And then how do you classify something

180
00:28:16,800 --> 00:28:24,960
like just like infused, like, you know, when they make like moon rocks or something like

181
00:28:24,960 --> 00:28:29,160
that.

182
00:28:29,160 --> 00:28:34,100
Because you know, moon rocks would probably be technically a concentrate. But that's hardly

183
00:28:34,100 --> 00:28:49,480
the same thing as oil. But taking it as it is, taking what we have, taking our lemons, we'll

184
00:28:49,480 --> 00:28:58,280
try to make lemonade. We've got the share of flower and the share of concentrate, which

185
00:28:58,280 --> 00:29:11,200
actually by themselves are interesting figures. So now we can calculate the cannabis CPI, which

186
00:29:11,200 --> 00:29:20,840
will be the flower share of sales times the retail flower price plus the concentrate share

187
00:29:20,840 --> 00:29:29,960
of sales times the retail concentrate price. And you can maybe calculate this

188
00:29:29,960 --> 00:29:39,720
in a more glamorous way. But that is essentially this calculation here. So we'll calculate

189
00:29:39,720 --> 00:29:48,440
the CPI. Let's just look at it just to see what it may look like. However, the nominal

190
00:29:48,440 --> 00:30:00,480
value doesn't matter too much. It's more about the trend. So the trend, we can calculate

191
00:30:00,480 --> 00:30:12,600
what we've defined as inflation, which is just the rate of change of the CPI. We'll

192
00:30:12,600 --> 00:30:16,760
basically just be looking at the CPI. And so just to go back to the presentation real

193
00:30:16,760 --> 00:30:27,560
quick. So it's basically the CPI of today minus the CPI of yesterday divided by the

194
00:30:27,560 --> 00:30:35,800
CPI of yesterday. Last week, I just made a crude hypothesis that it would be between

195
00:30:35,800 --> 00:30:43,720
1 to 3% on average, which is maybe what you'd expect in a more mature

196
00:30:43,720 --> 00:31:09,560
industry. So let's go ahead and calculate this and run it. And I'm actually, so it's

197
00:31:09,560 --> 00:31:18,920
kind of jumping across the board. The first thing I notice is that it is negative, and negative

198
00:31:18,920 --> 00:31:32,880
inflation is deflation. Deflation, at the federal level, is not considered desirable. So deflation

199
00:31:32,880 --> 00:31:43,160
can have bad effects. So from a consumer's point of view, if there is deflation, then

200
00:31:43,160 --> 00:31:49,480
you know that prices are going to be decreasing. So you know prices next week are going to

201
00:31:49,480 --> 00:31:58,880
be lower than prices today. So it gives you the incentive to wait to buy your goods next

202
00:31:58,880 --> 00:32:07,360
week. So if everybody's just waiting as long as they can to buy their goods, then it's

203
00:32:07,360 --> 00:32:17,840
going to decrease the economic demand even more. So businesses, they're already struggling

204
00:32:17,840 --> 00:32:27,840
for demand. And now consumers are postponing their purchases. So the businesses don't have

205
00:32:27,840 --> 00:32:35,480
much cash flow. So it's even harder for the businesses. They lower their prices even further.

206
00:32:35,480 --> 00:32:44,640
And so you can get trapped in essentially a deflationary cycle. That's sort of bad

207
00:32:44,640 --> 00:32:50,680
for the economy as a whole. Like I said, in the cannabis industry, it's sort of a short

208
00:32:50,680 --> 00:33:00,760
term shock where things needed to stabilize. There was just this, I think there was even

209
00:33:00,760 --> 00:33:05,760
a note in Oregon where they're just saying, oh, there's so much supply. So just in this

210
00:33:05,760 --> 00:33:11,240
first few years, everyone's sort of figuring out, okay, what is the level of supply? What's

211
00:33:11,240 --> 00:33:22,560
the level of demand? And prices, it appears, were just too high. Or not too high, but maybe

212
00:33:22,560 --> 00:33:29,600
those were the prices when you were expecting more of a gray market, or a transition

213
00:33:29,600 --> 00:33:39,320
from a black market. And then as more producers entered the market, you see things stabilize

214
00:33:39,320 --> 00:33:50,040
over time. So it looks like it took about two years for people who were thinking about

215
00:33:50,040 --> 00:33:57,000
entering the market to enter. So this is the time essentially where everybody's entering

216
00:33:57,000 --> 00:34:05,660
the market and the people who aren't successful are maybe exiting the market. And then it looks

217
00:34:05,660 --> 00:34:15,640
like by about 2019, things are starting to stabilize. So let's actually just look at

218
00:34:15,640 --> 00:34:32,060
inflation in percentages real quick, because that's typically what you think about. Then

219
00:34:32,060 --> 00:34:41,640
let's just look at the mean. So as a whole, prices are decreasing. I would just note that

220
00:34:41,640 --> 00:34:54,480
if you just look at the last two years, inflation

221
00:34:54,480 --> 00:35:05,560
is maybe around 0.5% per month, which is maybe closer to typical. So moving

222
00:35:05,560 --> 00:35:16,560
forward, you may expect more of this slight 0.5% a month inflation. Maybe there's some

223
00:35:16,560 --> 00:35:30,680
shocks here and there, but personally, I wouldn't expect this deflation to continue. So that's

224
00:35:30,680 --> 00:35:40,640
just sort of our crude data analysis. So now let's grab the interest rate and start forecasting

225
00:35:40,640 --> 00:35:56,000
here. So you can get your API key for free from the St. Louis Fed's FRED service. They're a good resource.

226
00:35:56,000 --> 00:36:01,640
I'm just starting to incorporate them into my work to get some more data points

227
00:36:01,640 --> 00:36:15,760
in here. This is the effective federal funds rate. It can be a good starting point. And

228
00:36:15,760 --> 00:36:40,120
it looks like something went wrong here. Okay, I'm not 100% sure. I'll have to investigate

229
00:36:40,120 --> 00:36:45,680
what that was after this. But long story short, we've got the interest rate now. I think that

230
00:36:45,680 --> 00:36:54,240
was maybe some sort of error reading in my API key. Anyways, moving on. We've got

231
00:36:54,240 --> 00:37:10,000
our interest rate now. So we've basically just read in the tail end here of the interest

232
00:37:10,000 --> 00:37:18,800
rate. And we'll use it in our vector autoregression. And like I said, next month we can compare

233
00:37:18,800 --> 00:37:25,520
our forecasts, and perhaps even next week we can try a different

234
00:37:25,520 --> 00:37:34,520
forecasting model and then compare the two forecasting models. So we could try next week

235
00:37:34,520 --> 00:37:42,320
to just do an autoregression of inflation. And then we can compare that to the vector

236
00:37:42,320 --> 00:37:51,720
autoregression and see which forecasts better, the more complex model or the more simple

237
00:37:51,720 --> 00:38:04,240
model. And that takes us to our 10 commandments where we wanted to use recursive methods.
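
Next month's comparison of the two models could be as simple as a mean absolute error check on the out-of-sample months; the numbers here are made up for illustration:

```python
def mean_absolute_error(forecasts, actuals):
    """Average absolute forecast error, for comparing competing models."""
    return sum(abs(f - a) for f, a in zip(forecasts, actuals)) / len(actuals)

# Hypothetical forecasts from a VAR and a simple AR, versus the actuals.
actuals = [0.6, 0.4, 0.5]
var_forecasts = [0.5, 0.5, 0.7]
ar_forecasts = [0.9, 0.1, 0.2]

# Keep whichever model forecasts better out of sample.
var_mae = mean_absolute_error(var_forecasts, actuals)
ar_mae = mean_absolute_error(ar_forecasts, actuals)
best = "VAR" if var_mae <= ar_mae else "AR"
```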

238
00:38:04,240 --> 00:38:20,720
Okay. So we have our total sales, we have inflation, and we have the interest rate.

239
00:38:20,720 --> 00:38:27,200
We're going to toss all of these together into a vector. And so just to show you what

240
00:38:27,200 --> 00:38:43,080
the vector looks like. So basically this is sales, this is inflation, and this is the

241
00:38:43,080 --> 00:39:00,200
interest rate in time period zero, which would be January 2017. So this is sort of what the

242
00:39:00,200 --> 00:39:12,320
vector looks like. So just to show you what's happening under the hood. And so now I wrote

243
00:39:12,320 --> 00:39:17,920
a scrappy vector autoregression and I wasn't actually very pleased with how it was working.

244
00:39:17,920 --> 00:39:26,280
So I actually noticed that you can actually just use a vector autoregression in the

246
00:39:46,160 --> 00:40:03,680
statsmodels package. So it saved a lot of time. And it's a bit more flexible. You've got more

246
00:39:37,840 --> 00:39:46,160
contributors, so it's more stable than my scrappy VAR model. So we'll use this one from this

247
00:39:46,160 --> 00:40:03,680
package. And so let's say you're just going to fit it. Well, this one has my notes on it.

248
00:40:03,680 --> 00:40:14,040
So here are our three equations. So let's go ahead and fit this: output, inflation, the

249
00:40:14,040 --> 00:40:29,720
interest rate. And we're just going to use one lag order. And so what we've done here

250
00:40:29,720 --> 00:40:41,640
is we've basically estimated three ordinary least squares regressions simultaneously.

251
00:40:41,640 --> 00:40:54,680
So this is our first regression. This is our regression for output. Then this is our regression

252
00:40:54,680 --> 00:41:07,800
for inflation. And then this is our regression for the interest rate. And so as you see,

253
00:41:07,800 --> 00:41:18,120
each one has a constant. And then these are the coefficients. So this is beta, this is

254
00:41:18,120 --> 00:41:40,360
gamma, and this is delta. So we've estimated quite a lot of parameters here. I'm not sure

255
00:41:40,360 --> 00:41:52,320
if they tell us our degrees of freedom here. But long story short, we were able to fit

256
00:41:52,320 --> 00:42:03,980
a VAR(1). So that's a good sign. Next, if you were going to fit multiple models, say

257
00:42:03,980 --> 00:42:20,600
we're going to fit a VAR(2). So we're now going to use two lags. So here are our three simultaneous

258
00:42:20,600 --> 00:42:36,200
equations. And now you'll notice that we have two lags of each. Yes, so two lags of each

259
00:42:36,200 --> 00:42:45,560
variable. So there's output lagged once, output lagged twice. Inflation lagged once, inflation

260
00:42:45,560 --> 00:42:54,200
lagged twice, interest rate lagged once, interest rate lagged twice in all three models. So

261
00:42:54,200 --> 00:43:05,400
that is a lot of parameters. That is 21 parameters, I believe. So that's one, right? So that's

262
00:43:05,400 --> 00:43:16,920
six... seven times three should be 21 parameters. So it's going to be

263
00:43:16,920 --> 00:43:26,040
tough to estimate too much larger. However, we can try. But we need some way to compare

264
00:43:26,040 --> 00:43:32,240
these models to each other, right? Because we can just do this all day if we have a large

265
00:43:32,240 --> 00:43:45,240
enough of a data series. So essentially, we need to have a criterion. Essentially, the BIC, the

266
00:43:45,240 --> 00:43:55,400
Bayesian Information Criterion, it rewards you for making better predictions, but then

267
00:43:55,400 --> 00:44:05,680
it punishes you for adding more and more predictors, more and more parameters. So the BIC is a

268
00:44:05,680 --> 00:44:16,160
useful way to measure which model may be a better choice. So as

269
00:44:16,160 --> 00:44:23,000
you'll notice, this model, we tossed in a ton of parameters, so we may be able to predict

270
00:44:23,000 --> 00:44:33,200
a little better. But we've got a BIC of 22, and this model has a BIC of 21. And so we

271
00:44:33,200 --> 00:44:46,600
want to pick the model with the minimum BIC. And so in this situation, the AR1, it's more

272
00:44:46,600 --> 00:44:57,400
parsimonious and it's actually the better model choice if you're choosing by BIC, because

273
00:44:57,400 --> 00:45:08,560
you're essentially overfitting with this model. And it may lead to bad out of sample predictions.
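The tradeoff the BIC strikes, rewarding fit but penalizing parameter count, can be illustrated with a toy sketch. This is not the session's code: the data and predictors here are made up, and it uses the standard Gaussian-errors form of the BIC, n·ln(RSS/n) + k·ln(n).

```python
import numpy as np

def ols_bic(X, y):
    """Fit OLS by least squares and return the Gaussian BIC:
    n*ln(RSS/n) + k*ln(n), where k counts estimated coefficients."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = np.sum((y - X @ beta) ** 2)
    n, k = X.shape
    return n * np.log(rss / n) + k * np.log(n)

rng = np.random.default_rng(0)
n = 200
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + rng.normal(size=n)  # truth: one useful predictor

ones = np.ones(n)
X_small = np.column_stack([ones, x])                         # constant + x
X_big = np.column_stack([ones, x, rng.normal(size=(n, 5))])  # plus 5 junk predictors

bic_small = ols_bic(X_small, y)
bic_big = ols_bic(X_big, y)
# The junk predictors can only lower the RSS a little, so the
# k*ln(n) penalty makes the bigger model's BIC worse here.
print(bic_small < bic_big)
```

The bigger model always fits the training data at least as well, but the penalty term grows with every added coefficient, which is exactly the overfitting guard described above.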

274
00:45:08,560 --> 00:45:17,160
So that's sort of a long explanation of what we're about to let statsmodels do for us

275
00:45:17,160 --> 00:45:25,440
automatically. So we can actually just let statsmodels say, OK, why don't you try every

276
00:45:25,440 --> 00:45:38,080
model up to lag order six and just tell me the best one based on minimum BIC. So we can

277
00:45:38,080 --> 00:45:44,080
just let statsmodels do that for us and save us a lot of programming, because you could

278
00:45:44,080 --> 00:45:53,280
program that up yourself. But you let statsmodels do it. And look at that. They tried

279
00:45:53,280 --> 00:46:01,440
every model that they could fit up to lag order six. I'm sure the higher order lags,

280
00:46:01,440 --> 00:46:08,160
they weren't able to fit successfully. And so they determined, OK, the model with the

281
00:46:08,160 --> 00:46:18,640
minimum BIC is actually the lag order one. So that's just our first model.

282
00:46:18,640 --> 00:46:28,480
What this is is this model, but without the dot, dot, dots. So we're literally just doing

283
00:46:28,480 --> 00:46:40,840
Y at t on Y at t minus one, inflation at t minus one, the interest rate at t minus one, and so forth.

284
00:46:40,840 --> 00:46:52,240
So we finally get to the goods. So we've got our best model, it's forecasted, and we show the

285
00:46:52,240 --> 00:47:16,520
data. Maybe I ran that twice. Not certain why it's printing twice. But anywho, here

286
00:47:16,520 --> 00:47:27,600
are our forecasts. So this legend's a little in the way, but there's our forecast for our

287
00:47:27,600 --> 00:47:44,960
total sales. Here is our forecast for inflation. So slightly rising. And then our, see, this

288
00:47:44,960 --> 00:47:55,160
is interesting. So our forecast for the interest rate is negative. But as we know, the interest

289
00:47:55,160 --> 00:48:04,880
rate can't be, well, it can't be nominally below zero. So the Federal Reserve has to

290
00:48:04,880 --> 00:48:19,240
use monetary policy, quantitative easing, and other tools in their toolbox to effectively

291
00:48:19,240 --> 00:48:31,620
drive the interest rates below zero, if that's their policy. So that's the plot. I will

292
00:48:31,620 --> 00:48:41,060
admit that I am just now learning the statsmodels package, because I would like to actually

293
00:48:41,060 --> 00:48:55,720
print out what these series are here. But I am not certain that I know how. I may

294
00:48:55,720 --> 00:49:03,020
need to learn the statsmodels package a little bit better to actually get you these raw data

295
00:49:03,020 --> 00:49:17,340
points here. However, here is the package I'm using. And there is a lot here. So I've

296
00:49:17,340 --> 00:49:30,220
just scratched the surface. You can do error analysis to make sure you're not doing anything crazy.

297
00:49:30,220 --> 00:49:40,680
We've done the lag order selection. And then I think maybe in the future, I would like

298
00:49:40,680 --> 00:49:48,300
to show you how to do impulse response functions. Because you can actually say, okay, well,

299
00:49:48,300 --> 00:49:56,500
what would actually happen if there was a shock in prices? How would that affect output?

300
00:49:56,500 --> 00:50:06,700
Or if there was a shock in the interest rate, how would that affect prices or sales? So

301
00:50:06,700 --> 00:50:14,460
all three of these series are theoretically related to each other, particularly sales

302
00:50:14,460 --> 00:50:28,460
and prices. So long story short, you can use these impulse response functions to simulate

303
00:50:28,460 --> 00:50:35,780
shocks. So what would happen if all of a sudden there was a recession? So these could be useful

304
00:50:35,780 --> 00:50:47,740
for simulating harvest spikes in October or something like that. So perhaps for next week

305
00:50:47,740 --> 00:51:00,540
we can dive into that. And I'll actually want to show you the actual numbers here. But that

306
00:51:00,540 --> 00:51:11,700
is sort of the conclusion of my spiel here on forecasting inflation. And so I'd just maybe

307
00:51:11,700 --> 00:51:20,580
like to open it up if there are any questions or anything.

308
00:51:20,580 --> 00:51:37,540
Would you be able to, is that beneficial, or are there avenues for better analysis next week?

309
00:51:37,540 --> 00:51:38,540
Or?

310
00:51:38,540 --> 00:51:46,180
No, it was interesting. These are things that I don't think about. So it was good to learn

311
00:51:46,180 --> 00:51:57,940
about how you go about forecasting these things. So it was cool. I learned something.

312
00:51:57,940 --> 00:52:04,420
What are some of the things you think about? Because this may have been, I'm not sure,

313
00:52:04,420 --> 00:52:09,620
this may have been a bit too economics heavy. So what are some of the things that you're

314
00:52:09,620 --> 00:52:13,140
thinking about on a day to day basis?

315
00:52:13,140 --> 00:52:19,540
Well, that's kind of in my data science journey. I realized there's like a data science food

316
00:52:19,540 --> 00:52:24,620
pyramid of things that you should know. And so I know everything at the top and I know

317
00:52:24,620 --> 00:52:31,620
everything at the bottom, but there's this middle business understanding, business storytelling

318
00:52:31,620 --> 00:52:39,220
kind of thing, which I don't know. And so I'm trying to learn that kind of thing. And

319
00:52:39,220 --> 00:52:43,260
these presentations have been really good for that.

320
00:52:43,260 --> 00:52:55,740
Well, that could also just be my style. So that's sort of, from my experience, that was,

321
00:52:55,740 --> 00:53:01,820
so I attended a lot of seminars when I was in college. And that's what I noticed made

322
00:53:01,820 --> 00:53:11,100
the most successful presentations slash papers is when you sort of tell a story and you let

323
00:53:11,100 --> 00:53:20,180
the models evolve. So you start with some sort of business question. So here we started,

324
00:53:20,180 --> 00:53:27,060
it helps to start with just a figure. So start with the data. So we just started with what

325
00:53:27,060 --> 00:53:35,100
we're looking at: sales in Oregon, prices in Oregon. And we sort of just let the model

326
00:53:35,100 --> 00:53:44,660
evolve naturally. So we start as simple as we can with essentially the growth rate. So

327
00:53:44,660 --> 00:53:53,580
you start with the simplest statistics you can calculate, and then you gradually make

328
00:53:53,580 --> 00:54:00,860
it more complex. So really before you jump to the VAR, you'd really want to

329
00:54:00,860 --> 00:54:07,580
just estimate an AR, which we've kind of done. Well, I'm actually, I'm not sure, I'll have

330
00:54:07,580 --> 00:54:16,940
to review it. We may have done that prior weeks, but essentially you let the model become

331
00:54:16,940 --> 00:54:31,540
gradually more complex, but still keeping in mind your initial objective. And then you

332
00:54:31,540 --> 00:54:48,580
just, and then as you're doing your analysis, you'll be calculating other

333
00:54:48,580 --> 00:54:55,020
metrics of interest. So we calculated the share of flower, we calculated the share of

334
00:54:55,020 --> 00:55:07,020
concentrates. All of those sort of add to the story we're telling here of this presentation.

335
00:55:07,020 --> 00:55:20,860
So you sort of build these pieces here. They're just sort of pieces of your analysis that

336
00:55:20,860 --> 00:55:28,180
just gradually grows. You just keep adding to it until you've, you know, you've done

337
00:55:28,180 --> 00:55:35,900
sort of an in-depth analysis on the subject at hand. And then you've answered sort of

338
00:55:35,900 --> 00:55:44,740
your question at hand, which started by just looking at the data. We just basically wanted

339
00:55:44,740 --> 00:55:58,500
to know what's the trajectory for inflation. What even is inflation? So we were able to,

340
00:55:58,500 --> 00:56:06,980
you know, gradually build up to that. And then we can answer that question. We can say,

341
00:56:06,980 --> 00:56:15,820
oh, it looks like despite, you know, historic deflation in the first two years, you know,

342
00:56:15,820 --> 00:56:20,420
we predict, yeah, moderate inflation. And I wish I could print the numbers and actually

343
00:56:20,420 --> 00:56:29,900
give you the number here. But yeah, we predict, you know, moderate inflation here throughout

344
00:56:29,900 --> 00:56:39,460
the rest of the year. And you could keep taking this story further and keep adding steps to

345
00:56:39,460 --> 00:56:45,700
it. So you could say, oh, like, what's going to happen if there's a harvest spike or what

346
00:56:45,700 --> 00:56:55,020
about this? Or can't we estimate an even better model? And that's where sort of papers

347
00:56:55,020 --> 00:57:00,740
build on each other is because then you kind of can continue someone else's story by saying,

348
00:57:00,740 --> 00:57:09,060
oh, picking up this story where they left off, I'm going to, you know, add to their

349
00:57:09,060 --> 00:57:20,140
model. But that's my approach is just start off simple and then just sort of document

350
00:57:20,140 --> 00:57:24,460
your steps. You know, instead of just jumping to the end and just saying, oh, I'm just going

351
00:57:24,460 --> 00:57:31,660
to do a, you know, principal components analysis, or I'm going to do, you know, machine learning

352
00:57:31,660 --> 00:57:39,520
algorithm, XYZ, you know, instead of just jumping all the way there, I find it's more

353
00:57:39,520 --> 00:57:48,300
informative to start with the basics and then build your way up until the machine learning

354
00:57:48,300 --> 00:57:58,740
algorithm is actually the reasonable next step. Because eventually, if you kept

355
00:57:58,740 --> 00:58:04,000
taking this analysis further and further and further, I mean, machine learning would be

356
00:58:04,000 --> 00:58:10,700
the rational next step. I'm just not, I just wouldn't start with it right out of the gates.

357
00:58:10,700 --> 00:58:20,620
So that's a long-winded answer, but, you know, I resonate with that statement.

358
00:58:20,620 --> 00:58:28,020
Yeah, cool. That's, yeah, that's, you know, I think kind of the lesson that I've been

359
00:58:28,020 --> 00:58:32,660
learning and everybody emphasizes machine learning a lot, but actually there's a lot

360
00:58:32,660 --> 00:58:44,060
of steps before you get there and before it's even useful. Recruiter. They're like, they

361
00:58:44,060 --> 00:58:50,460
never stop calling. But anyway, I read a really cool article in the Oregonian and I'm trying

362
00:58:50,460 --> 00:58:53,940
to figure out how to share it because the Oregonian is making it more and more difficult

363
00:58:53,940 --> 00:59:01,420
to read their articles online and share them. But there was an article about the counties

364
00:59:01,420 --> 00:59:10,860
and the towns along the Oregon-Idaho border and how, because, you know, cannabis is legal

365
00:59:10,860 --> 00:59:16,420
in Oregon, but not in Idaho, that these towns have, you know, like on the weekend, they

366
00:59:16,420 --> 00:59:28,160
have huge, huge influxes of people from Idaho coming in and like the per capita cannabis

367
00:59:28,160 --> 00:59:33,580
sales in these towns is like four and five times the amount it is in like Multnomah County,

368
00:59:33,580 --> 00:59:38,660
which is like the most populous county in Oregon. Oh gosh. Oh, because people are coming

369
00:59:38,660 --> 00:59:42,900
in from Idaho. Coming in, yes. And so, and, you know, and there was a statement from like

370
00:59:42,900 --> 00:59:48,140
some, somebody in New Mexico and they're like, yeah, we needed to legalize it before Texas

371
00:59:48,140 --> 00:59:56,820
does so that we can, you know, we can capitalize on those sales. So it was really interesting.

372
00:59:56,820 --> 01:00:03,780
I guess that's one of those things where it's so hard to capture that in the data. And I'm

373
01:00:03,780 --> 01:00:12,580
not even sure what the laws on that are. Maybe, but that's up to the individuals who are doing

374
01:00:12,580 --> 01:00:20,180
that, I suppose. But I know there's a town called Ontario, Oregon, that's right on the

375
01:00:20,180 --> 01:00:28,420
Oregon, Idaho border that has a pretty big dispensary. So, well, has a dispensary. So

376
01:00:28,420 --> 01:00:32,980
they're making money from both states. Well, the people in Oregon are just saying, hey,

377
01:00:32,980 --> 01:00:41,300
we're just, we're just opening up shop. We're not doing anything wrong. Yeah. I don't know.

378
01:00:41,300 --> 01:00:48,900
I don't know. That's a, that's an interesting observation. And I, yeah, that's something

379
01:00:48,900 --> 01:00:53,940
to think about Charles, if there's any way to parse that out of the data. I don't know

380
01:00:53,940 --> 01:01:08,220
if that Oregon data has a geographical, like component to it, but I don't know. That may

381
01:01:08,220 --> 01:01:13,720
take a bit more exploration. I'm not sure if you can get sales by licensee in Oregon

382
01:01:13,720 --> 01:01:20,100
or not and try to do a geographical analysis. Well, I mean, whoever wrote, whoever wrote

383
01:01:20,100 --> 01:01:25,340
this article is able to figure it out. They were able to do like a per capita analysis.

384
01:01:25,340 --> 01:01:32,860
Oh, okay. Okay. So you may, they may have county by county data. So you may actually

385
01:01:32,860 --> 01:01:40,980
be able to do some sort of analysis there. So that perhaps actually for a future meetup,

386
01:01:40,980 --> 01:01:46,980
we could do geographical analysis there because like I said, I think the geographical dimension

387
01:01:46,980 --> 01:01:53,220
is an incredibly interesting one. It can be tricky, but we've got the counties. So

388
01:01:53,220 --> 01:02:00,740
that, I mean, there's no reason why we can't do some sort of analysis. Yeah. Yeah. And

389
01:02:00,740 --> 01:02:06,740
that Ontario, Oregon is only like less than an hour away from Boise, Idaho.

390
01:02:06,740 --> 01:02:12,020
So I've met people from Boise, and Boise is a college town too. So

391
01:02:12,020 --> 01:02:21,620
you probably have a lot of people from Boise just driving down here to pick up stuff.

392
01:02:21,620 --> 01:02:25,940
That would be a real interesting analysis. And you could extend that to a bunch of states

393
01:02:25,940 --> 01:02:33,020
because that would be interesting to see if you have like this like border phenomena in

394
01:02:33,020 --> 01:02:39,780
cannabis states where of course you would expect like the cities to be like high in

395
01:02:39,780 --> 01:02:47,260
sales, but then it would be so funny if like the border towns were all high sales too.

396
01:02:47,260 --> 01:03:01,180
Yeah. That's an interesting observation and could be a good opportunity for analysis. So I think

397
01:03:01,180 --> 01:03:09,540
I'll go ahead and just, unless anyone's got any more topics at hand to talk about,

398
01:03:09,540 --> 01:03:19,020
we've kind of reached the hour. So does anyone else have anything to throw on the table? I mean, go

399
01:03:19,020 --> 01:03:29,180
ahead and conclude it here until next week. All right. All right. Well, thank

400
01:03:29,180 --> 01:03:35,420
you all for coming. Thanks, Ryan. Thanks Charles. Thanks Nick. Thank you for your contributions.

401
01:03:35,420 --> 01:03:41,140
It's always, it's always awesome to hear from you. And then next week we'll see if we can't

402
01:03:41,140 --> 01:03:51,140
look at Charles's waste data, touch up the inflation data if needed. And then, and then

403
01:03:51,140 --> 01:04:02,820
maybe start some geographical analysis of if we can find that county data. All right,

404
01:04:02,820 --> 01:04:10,980
everyone. Thank you. Thanks. Thanks for coming. And until next week, have an awesome week.

405
01:04:10,980 --> 01:04:22,740
Bye. Bye. Bye.

