1
00:00:00,000 --> 00:00:15,560
Welcome to the Cannabis Data Science Meetup Group for the 15th of December, 2021.

2
00:00:15,560 --> 00:00:21,360
We're heading quickly into 2022, and that has some big implications.

3
00:00:21,360 --> 00:00:28,440
So one, we need to go back through the year and look at our forecasts.

4
00:00:28,440 --> 00:00:35,880
So we made some forecasts throughout the year of how the market would shake out in 2021.

5
00:00:35,880 --> 00:00:38,520
And so now it's time to check those forecasts.

6
00:00:38,520 --> 00:00:45,040
So we made forecasts of Washington State sales and performance.

7
00:00:45,040 --> 00:00:47,680
We looked at Oregon prices.

8
00:00:47,680 --> 00:00:52,600
So we'll need to, you know, check our forecasts there.

9
00:00:52,600 --> 00:00:53,600
We looked at so many states.

10
00:00:53,600 --> 00:00:56,600
We looked at Colorado, Oklahoma, Massachusetts.

11
00:00:56,600 --> 00:01:02,880
Lately, we've been looking at Illinois.

12
00:01:02,880 --> 00:01:10,680
And we've been curious about states such as Michigan, Maryland, and all the other states

13
00:01:10,680 --> 00:01:13,880
with adult use and medicinal use.

14
00:01:13,880 --> 00:01:21,480
So as we're here in the home stretch, I'll be working on, and you're welcome to contribute

15
00:01:21,480 --> 00:01:23,200
to, forecasts.

16
00:01:23,200 --> 00:01:30,720
So we can work on, one, checking our forecasts from 2021, and two, making forecasts for 2022.

17
00:01:30,720 --> 00:01:36,880
So should be a big time ahead with that.

18
00:01:36,880 --> 00:01:42,560
Without further ado, so, Shane and Heather both joined us before.

19
00:01:42,560 --> 00:01:49,280
So do either of you have any topics or questions, ideas, comments on your mind for the start

20
00:01:49,280 --> 00:01:50,280
of today?

21
00:01:50,280 --> 00:01:51,280
Starting with you, Shane.

22
00:01:51,280 --> 00:01:54,040
No, I'm just glad to be here.

23
00:01:54,040 --> 00:02:01,120
This is my second time and I'm relatively new to data and tech, but not so much new

24
00:02:01,120 --> 00:02:02,120
to cannabis.

25
00:02:02,120 --> 00:02:06,920
So I'm trying to bridge the gap and just kind of get the information about the industry

26
00:02:06,920 --> 00:02:10,840
and just get as much information as I can.

27
00:02:10,840 --> 00:02:13,840
Happy to have you, Shane.

28
00:02:13,840 --> 00:02:16,040
We're going to have a bunch of information today.

29
00:02:16,040 --> 00:02:22,040
And so then Heather, would you be happy to just say a quick word and if you've got anything

30
00:02:22,040 --> 00:02:24,400
on your mind for today?

31
00:02:24,400 --> 00:02:38,440
Yes, I planned on observing this talk rather passively, but definitely, I know from last

32
00:02:38,440 --> 00:02:44,320
week we were talking about Michigan, right?

33
00:02:44,320 --> 00:02:47,920
So I guess I just, I know I want to analyze the Maryland data.

34
00:02:47,920 --> 00:02:50,720
It just hasn't been in the cards for me just yet.

35
00:02:50,720 --> 00:02:52,040
So that's really the angle.

36
00:02:52,040 --> 00:02:58,240
I'm just, you know, I'm really intrigued by the state stuff because there's a different

37
00:02:58,240 --> 00:03:00,040
level of control at each state.

38
00:03:00,040 --> 00:03:04,440
So it's, you know, not something that we can ignore.

39
00:03:04,440 --> 00:03:05,440
Anyway.

40
00:03:05,440 --> 00:03:06,440
Exactly.

41
00:03:06,440 --> 00:03:14,520
And so, as far as I'll speak to Michigan data real quick and then Maryland, so Michigan

42
00:03:14,520 --> 00:03:18,760
data, they've got great data on their website.

43
00:03:18,760 --> 00:03:21,520
They just have quite restrictive terms of use.

44
00:03:21,520 --> 00:03:28,680
So you can't use their data for commercial use and you may not be able to access it programmatically.

45
00:03:28,680 --> 00:03:39,080
I sent an email to the Michigan regulatory agency to see what exactly is permissible.

46
00:03:39,080 --> 00:03:44,560
So can we use their data in, you know, an open source public manner, the way we have

47
00:03:44,560 --> 00:03:45,640
been.

48
00:03:45,640 --> 00:03:48,360
So still waiting to hear back from that.

49
00:03:48,360 --> 00:03:53,680
So we're going to actually have to pause on Michigan data, just till we get the green

50
00:03:53,680 --> 00:03:57,080
light if we do, hoping we do.

51
00:03:57,080 --> 00:04:03,600
Maryland will come up next week when we start doing state by state forecast for 2022.

52
00:04:03,600 --> 00:04:06,640
We're going to really kick that off next week.

53
00:04:06,640 --> 00:04:12,120
We're sort of just teasing that today.

54
00:04:12,120 --> 00:04:15,760
And we may begin Saturday morning statistics.

55
00:04:15,760 --> 00:04:16,760
We'll see.

56
00:04:16,760 --> 00:04:20,520
We're doing a few like regressions there.

57
00:04:20,520 --> 00:04:24,440
So if you're interested in regression analysis, definitely worth checking out.

58
00:04:24,440 --> 00:04:30,760
And then next week, next Wednesday, we'll start the forecast.

59
00:04:30,760 --> 00:04:36,080
Today I was just going to share with you some economics.

60
00:04:36,080 --> 00:04:41,400
We've talked about some real interesting economics, talked about some of it during Saturday morning

61
00:04:41,400 --> 00:04:42,400
statistics.

62
00:04:42,400 --> 00:04:48,280
And I wanted to share some of that with you here today, on Wednesday with the cannabis data

63
00:04:48,280 --> 00:04:55,520
science group, because I think it's valuable information and don't want to leave you out

64
00:04:55,520 --> 00:04:56,520
of the mix.

65
00:04:56,520 --> 00:05:01,400
So just wanted to share with you some real interesting economics real quick.

66
00:05:01,400 --> 00:05:06,160
And then we can take a quick look at Washington State, because that was one of the states

67
00:05:06,160 --> 00:05:10,960
that we first began analyzing at the beginning of the year.

68
00:05:10,960 --> 00:05:15,360
So it seems fitting to take a look here near the end.

69
00:05:15,360 --> 00:05:32,400
So without further ado, let me share with you

70
00:05:32,400 --> 00:05:39,560
the latest work and some little stash of economics just for fun.

71
00:05:39,560 --> 00:05:50,920
So we saw a technical memorandum produced for the state of Nevada, where they were looking

72
00:05:50,920 --> 00:05:59,440
at the relationship between dispensaries per capita and sales per dispensary.

73
00:05:59,440 --> 00:06:13,720
And so we got a handle on this data, and we're going to take a look at it ourselves.

74
00:06:13,720 --> 00:06:17,640
Awesome to have you join us Marjana.

75
00:06:17,640 --> 00:06:19,480
Just quick update.

76
00:06:19,480 --> 00:06:30,000
Today we're going to be honing in, we couldn't yet start wrangling Michigan data.

77
00:06:30,000 --> 00:06:36,160
Waiting for the green light on that, just to make sure we abide by their terms of use

78
00:06:36,160 --> 00:06:41,760
as far as accessing their data.

79
00:06:41,760 --> 00:06:47,160
The next best thing, we can look at a comparable state.

80
00:06:47,160 --> 00:06:53,600
So just so happens that the first state we looked at, and a comparable state to Michigan,

81
00:06:53,600 --> 00:06:55,640
is Washington State.

82
00:06:55,640 --> 00:07:07,160
So on the chart, as things shake out, Washington has a comparable number of retailers per capita

83
00:07:07,160 --> 00:07:11,600
and sales per dispensary.

84
00:07:11,600 --> 00:07:18,880
As far as the whole dichotomy goes, where you have Oregon and Colorado on this far end,

85
00:07:18,880 --> 00:07:24,000
and then you have states that we've been looking at the past couple of weeks, such as Illinois

86
00:07:24,000 --> 00:07:30,040
and Massachusetts on the far other side of the scale.

87
00:07:30,040 --> 00:07:40,160
It's nice that we actually now have points all over the regression line, so that's nice.

88
00:07:40,160 --> 00:07:46,800
Exactly, and as Marjana pointed out, there's only 10 data points here, so we would like

89
00:07:46,800 --> 00:07:53,840
to add a lot more, especially equally distributed along this line here.

90
00:07:53,840 --> 00:08:00,280
So we've added a lot of data points for Illinois and Massachusetts, so we're getting a little

91
00:08:00,280 --> 00:08:02,560
heavy on this end.

92
00:08:02,560 --> 00:08:07,360
So these states also have good data: Washington, Colorado, and Oregon.

93
00:08:07,360 --> 00:08:14,520
So today we'll add a lot of data points for Washington State.

94
00:08:14,520 --> 00:08:21,760
And once we get the green light, we'll work on Michigan, but for now we'll look at Washington.

95
00:08:21,760 --> 00:08:33,560
And so we're starting with the figures, I'll zoom in here a bit, that we're expecting around

96
00:08:33,560 --> 00:08:40,120
10 dispensaries per 100,000 adults here.

97
00:08:40,120 --> 00:08:44,760
Notice we're measuring in population, so it'll be a little different.

98
00:08:44,760 --> 00:08:54,960
And then we're also looking for around 2.2 or so million in annual revenue.
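As a rough sketch of how the two figures on that chart could be computed, the snippet below derives retailers per 100,000 people and average annual sales per retailer. The function names and the input numbers are hypothetical, picked only so the results land near the values just mentioned; they are not actual Washington State figures.

```python
def retailers_per_100k(retailers, population):
    """Number of retailers per 100,000 people."""
    return retailers * 100_000 / population


def sales_per_retailer(total_annual_sales, retailers):
    """Average annual sales per retailer, in dollars."""
    return total_annual_sales / retailers


# Hypothetical inputs, chosen for illustration only.
retailers = 500
population = 5_000_000
total_annual_sales = 1_100_000_000

density = retailers_per_100k(retailers, population)
average_sales = sales_per_retailer(total_annual_sales, retailers)
print(density)        # 10.0
print(average_sales)  # 2200000.0
```

Whether the denominator is adults or total population changes the first number, which is the difference flagged above.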

99
00:08:54,960 --> 00:09:02,520
Okay, so why are we even looking at this data?

100
00:09:02,520 --> 00:09:09,760
Just want to just go ahead and talk just real quick about why we're even looking at this

101
00:09:09,760 --> 00:09:16,680
data, just to give the group a little bit of background.

102
00:09:16,680 --> 00:09:22,200
So we've got a research question, this is essentially our literature review, and then

103
00:09:22,200 --> 00:09:31,920
we'll get into our data, our methods, and our analysis and conclusion.

104
00:09:31,920 --> 00:09:36,040
So essentially applying the scientific method to data.

105
00:09:36,040 --> 00:09:48,640
So a lot of this work has been done for decades going on, starting at the beginning of what

106
00:09:48,640 --> 00:09:51,800
was formally termed economics.

107
00:09:51,800 --> 00:09:59,800
And so the reason people are looking at this is, well, I'm just making sure there's not

108
00:09:59,800 --> 00:10:03,440
anyone joining real quick.

109
00:10:03,440 --> 00:10:10,680
Okay, so the reason people started looking at this is they wanted to start characterizing

110
00:10:10,680 --> 00:10:15,320
markets that maybe there were some bad actors going on.

111
00:10:15,320 --> 00:10:20,480
So maybe there was collusion, maybe there was some sort of regulatory capture, maybe

112
00:10:20,480 --> 00:10:26,600
there were some sort of other factors going on to create these monopolies.

113
00:10:26,600 --> 00:10:33,040
So in the early 1900s, you see a lot of the antitrust policy where they start to break

114
00:10:33,040 --> 00:10:37,040
up the large oil companies.

115
00:10:37,040 --> 00:10:48,520
And that's really where a lot of the focus started to go into characterizing markets

116
00:10:48,520 --> 00:10:50,600
is, okay, are they competitive?

117
00:10:50,600 --> 00:10:53,800
Are they anti-competitive?

118
00:10:53,800 --> 00:11:04,160
And this is where the structure-conduct-performance paradigm came about, where they basically

119
00:11:04,160 --> 00:11:15,320
start to say, okay, we need sort of a classification system for classifying markets.

120
00:11:15,320 --> 00:11:23,360
So economic theory suggests, okay, if there's perfect competition, the market will shake

121
00:11:23,360 --> 00:11:24,960
out one way.

122
00:11:24,960 --> 00:11:30,800
If there's a monopoly, the market will shake out another way.

123
00:11:30,800 --> 00:11:38,280
So this diagram shows how the market would shake out with a monopolist, with the monopolist

124
00:11:38,280 --> 00:11:43,800
operating at QM. In a perfectly competitive market,

125
00:11:43,800 --> 00:11:49,300
all the firms would produce a total of QC.

126
00:11:49,300 --> 00:12:01,400
So that's just really fundamental microeconomic models that make these predictions.
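The diagram itself is not visible in the transcript, so here is a minimal worked example of the same prediction. It assumes a linear inverse demand curve P = a - b*Q and a constant marginal cost c; the parameter values and function names are hypothetical and only illustrate that the monopoly quantity QM comes out below the competitive quantity QC.

```python
def competitive_quantity(a, b, c):
    """Quantity where price equals marginal cost: a - b*Q = c."""
    return (a - c) / b


def monopoly_quantity(a, b, c):
    """Quantity where marginal revenue equals marginal cost: a - 2*b*Q = c."""
    return (a - c) / (2 * b)


def price(a, b, quantity):
    """Price on the linear inverse demand curve P = a - b*Q."""
    return a - b * quantity


def consumer_surplus(a, b, quantity):
    """Area of the triangle between the demand curve and the price line."""
    return 0.5 * (a - price(a, b, quantity)) * quantity


# Hypothetical demand intercept, demand slope, and marginal cost.
a, b, c = 100.0, 1.0, 20.0

q_m = monopoly_quantity(a, b, c)
q_c = competitive_quantity(a, b, c)
print(q_m, price(a, b, q_m), consumer_surplus(a, b, q_m))  # 40.0 60.0 800.0
print(q_c, price(a, b, q_c), consumer_surplus(a, b, q_c))  # 80.0 20.0 3200.0
```

Under these assumptions the monopolist sells half as much at a higher price, and consumer surplus is smaller than in the competitive case.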

127
00:12:01,400 --> 00:12:04,760
So we don't really ever observe a perfect monopoly.

128
00:12:04,760 --> 00:12:07,600
We never really observe perfect competition.

129
00:12:07,600 --> 00:12:08,840
Well, I take that back.

130
00:12:08,840 --> 00:12:14,260
You may see a monopolist now and again, but it's actually kind of rare.

131
00:12:14,260 --> 00:12:24,640
So you're really observing people and firms operating somewhere between oligopoly and

132
00:12:24,640 --> 00:12:25,640
monopolistic competition.

133
00:12:25,640 --> 00:12:31,680
And you could even split that up into gradients.

134
00:12:31,680 --> 00:12:43,240
And generally, governments are concerned if they see really concentrated markets.

135
00:12:43,240 --> 00:12:52,240
So if they see essentially oligopolies, they start to get a little concerned because the

136
00:12:52,240 --> 00:12:55,400
firms start to get a lot of market power.

137
00:12:55,400 --> 00:13:02,760
And as we saw, the consumers start to kind of get boxed out of surplus.

138
00:13:02,760 --> 00:13:09,040
So the consumers, they lose a bit of their surplus.

139
00:13:09,040 --> 00:13:13,480
And so the idea of the government is they're looking out for everyone.

140
00:13:13,480 --> 00:13:19,120
And so they're looking out for producers, and they're looking out for consumers.

141
00:13:19,120 --> 00:13:28,560
So that would be the justification for antitrust policy, is if there's a social preference

142
00:13:28,560 --> 00:13:35,320
for there not to be a monopolist and for a more competitive market.

143
00:13:35,320 --> 00:13:48,280
And so that's sort of been the justification for antitrust policy in the past.

144
00:13:48,280 --> 00:13:53,320
And then I just point out things like regulatory capture, collusion, and whatnot, because those

145
00:13:53,320 --> 00:13:58,440
are factors that I think push the market in a more anticompetitive direction.

146
00:13:58,440 --> 00:14:05,000
But long story short, you start to see different market structures.

147
00:14:05,000 --> 00:14:07,240
They're affected by different things.

148
00:14:07,240 --> 00:14:10,680
So for example, there may be barriers to entry.

149
00:14:10,680 --> 00:14:16,840
So it may not really matter that much what government policy is per se, depending on

150
00:14:16,840 --> 00:14:19,200
what the barriers to entry are.

151
00:14:19,200 --> 00:14:25,480
So if there's high barriers to entry, that's going to push the market in a more concentrated

152
00:14:25,480 --> 00:14:28,960
direction.

153
00:14:28,960 --> 00:14:36,640
So long story short, we're interested in the cannabis industry.

154
00:14:36,640 --> 00:14:44,200
And so it would be interesting to see if we can actually quantify this.

155
00:14:44,200 --> 00:14:53,000
So in previous meetups, we've just been trying to just kind of make qualitative statements

156
00:14:53,000 --> 00:14:56,080
about where we think the market's heading.

157
00:14:56,080 --> 00:15:00,160
But we've got real good data here in Washington state.

158
00:15:00,160 --> 00:15:04,480
So we can actually start looking at some of these metrics.

159
00:15:04,480 --> 00:15:10,360
And one of these metrics that we can actually use to quantify this, so put an actual number

160
00:15:10,360 --> 00:15:15,600
on how competitive the market is, is a metric called the HHI.

161
00:15:15,600 --> 00:15:18,760
And so I'll come back to this table here in a second.

162
00:15:18,760 --> 00:15:23,120
But let me go ahead and introduce this measure.

163
00:15:23,120 --> 00:15:30,320
So I could have presented this more elegantly, but we'll power through this and get to the

164
00:15:30,320 --> 00:15:35,000
data because it's an important concept.

165
00:15:35,000 --> 00:15:45,720
Because in the past, we've just been trying to use other ways to judge how competitive

166
00:15:45,720 --> 00:15:47,280
the market is.

167
00:15:47,280 --> 00:15:56,720
But in Washington state, we actually know the market share of each retailer over time.

168
00:15:56,720 --> 00:16:04,160
So we can actually use these standard measures of market competitiveness.

169
00:16:04,160 --> 00:16:10,920
And the two standard measures are the N-firm concentration ratio.

170
00:16:10,920 --> 00:16:16,960
And so that's just the sum of the market shares of the top N largest firms.

171
00:16:16,960 --> 00:16:22,960
So you just maybe look at the top five largest firms.

172
00:16:22,960 --> 00:16:29,160
And the higher the number, the more concentrated the market is.
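A minimal sketch of the N-firm concentration ratio as described here: add up the market shares of the N largest firms. The function name and the sales figures are hypothetical, and shares are expressed in percent.

```python
def concentration_ratio(sales, n=5):
    """Combined market share, in percent, of the n largest firms."""
    total = sum(sales)
    if total <= 0:
        raise ValueError("total sales must be positive")
    largest = sorted(sales, reverse=True)[:n]
    return 100.0 * sum(largest) / total


# Hypothetical sales for six firms in one period.
sales = [50, 30, 10, 5, 3, 2]
print(concentration_ratio(sales, n=2))  # 80.0
print(concentration_ratio(sales, n=5))  # 98.0
```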

173
00:16:29,160 --> 00:16:43,000
Similarly, the HHI, you look at the sum of all of the market shares squared.

174
00:16:43,000 --> 00:16:53,120
And so if you were calculating the market share in terms of percentages, if there's

175
00:16:53,120 --> 00:16:57,560
just one firm, the monopoly has 100%.

176
00:16:57,560 --> 00:17:03,600
100 squared is 10,000.

177
00:17:03,600 --> 00:17:12,960
Well if all the firms have effectively 0% of the market, so they're going to be each

178
00:17:12,960 --> 00:17:18,920
firm is going to be approaching 0% of the market, then your HHI is 0.
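To make the two extremes concrete, here is a short sketch of the HHI as the sum of squared market shares, with shares in percent. The function name is an assumption; the examples use equal-sized firms, where the index works out to 10,000 divided by the number of firms.

```python
def hhi(sales):
    """Herfindahl-Hirschman Index: sum of squared percentage market shares."""
    total = sum(sales)
    if total <= 0:
        raise ValueError("total sales must be positive")
    return sum((100.0 * s / total) ** 2 for s in sales)


print(hhi([1]))                     # one firm with 100% of the market: 10000.0
print(hhi([1, 1, 1, 1]))            # four equal firms: 2500.0
print(round(hhi([1] * 1000), 2))    # a thousand equal firms: 10.0
```

As the number of equal-sized firms grows, the index keeps shrinking toward 0 without ever reaching it.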

179
00:17:18,920 --> 00:17:25,920
And so that's as if there's an infinite number of firms in the market.

180
00:17:25,920 --> 00:17:30,280
Never really observe an infinite number of firms.

181
00:17:30,280 --> 00:17:33,040
So it's approaching infinity.

182
00:17:33,040 --> 00:17:42,640
So long story short, the Justice Department, the Department of Justice, through the Department

183
00:17:42,640 --> 00:17:52,120
of Justice, they need a metric that they're going to use in courts, in legal disputes.

184
00:17:52,120 --> 00:17:55,240
So they need a metric, a number.

185
00:17:55,240 --> 00:17:58,720
So they defined the HHI.

186
00:17:58,720 --> 00:18:05,840
And they set this threshold at 2,500.

187
00:18:05,840 --> 00:18:10,200
And I recommend checking out this source here for the exact language.

188
00:18:10,200 --> 00:18:21,880
But essentially, they say that, OK, any industry with an HHI above 2,500, we're going to define

189
00:18:21,880 --> 00:18:25,160
as highly concentrated.

190
00:18:25,160 --> 00:18:28,000
So that's what we would call an oligopoly.

191
00:18:28,000 --> 00:18:38,480
And then basically they said, OK, anything below 2,500, if it's above 1,500, that's

192
00:18:38,480 --> 00:18:44,760
a moderate concentration, but not really worth litigating over.

193
00:18:44,760 --> 00:18:52,200
And then anything below 1,500 is fairly competitive.
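The three bands just described could be written down as a small lookup. This is only a sketch: the function name and labels are hypothetical, and how the exact boundary values 1,500 and 2,500 are treated is an assumption, so the cited source should be checked for the precise wording.

```python
def classify_hhi(value):
    """Map an HHI value to the concentration band described in the talk."""
    if value > 2500:
        return "highly concentrated"
    if value >= 1500:  # assumption: boundary values count as moderate
        return "moderately concentrated"
    return "unconcentrated"


for value in (10000, 2500, 1800, 400):
    print(value, classify_hhi(value))
```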

194
00:18:52,200 --> 00:18:55,680
So that's all of the nitty gritty out of the way.

195
00:18:55,680 --> 00:19:04,280
But essentially, now we actually have a metric that we can.

196
00:19:04,280 --> 00:19:15,440
Not really a metric, we've got like baselines that we can compare our metrics in Washington

197
00:19:15,440 --> 00:19:16,440
State to.

198
00:19:16,440 --> 00:19:25,760
So without further ado, let's calculate the HHI in Washington State.

199
00:19:25,760 --> 00:19:33,360
And not only can we calculate it, but we've got fantastic panel data.

200
00:19:33,360 --> 00:19:38,200
Here we've got data that varies over the individual.

201
00:19:38,200 --> 00:19:41,800
So that would be over the different licenses.

202
00:19:41,800 --> 00:19:45,120
And it varies over time.

203
00:19:45,120 --> 00:19:51,560
So we can measure the HHI over time.

204
00:19:51,560 --> 00:19:58,200
And so that way we can say, OK, is the market in Washington State becoming more competitive

205
00:19:58,200 --> 00:20:02,000
or less competitive over time?

206
00:20:02,000 --> 00:20:11,560
And the point to watch out for is essentially you're looking for jumps in the HHI of around

207
00:20:11,560 --> 00:20:17,680
200 going up, positive increases.

208
00:20:17,680 --> 00:20:23,880
Because those would be occurrences that increase market concentration.
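One way this monitoring could look in code, assuming the panel arrives as (period, license number, sales) rows: compute the HHI for each period, then flag period-to-period increases of about 200 points or more. The function names, the row layout, and the sample figures are all hypothetical.

```python
from collections import defaultdict


def hhi_by_period(rows):
    """Compute the HHI for each period from (period, license, sales) rows."""
    sales_by_period = defaultdict(dict)
    for period, license_number, sales in rows:
        by_license = sales_by_period[period]
        by_license[license_number] = by_license.get(license_number, 0.0) + sales
    result = {}
    for period in sorted(sales_by_period):
        values = list(sales_by_period[period].values())
        total = sum(values)
        result[period] = sum((100.0 * v / total) ** 2 for v in values)
    return result


def flag_jumps(hhi_series, threshold=200.0):
    """Return (previous, current, change) for increases at or above threshold."""
    periods = sorted(hhi_series)
    jumps = []
    for previous, current in zip(periods, periods[1:]):
        change = hhi_series[current] - hhi_series[previous]
        if change >= threshold:
            jumps.append((previous, current, change))
    return jumps


# Hypothetical panel: four equal retailers, then one absorbs another.
rows = [
    ("2021-01", "A", 25.0), ("2021-01", "B", 25.0),
    ("2021-01", "C", 25.0), ("2021-01", "D", 25.0),
    ("2021-02", "A", 50.0), ("2021-02", "B", 25.0), ("2021-02", "C", 25.0),
    ("2021-03", "A", 50.0), ("2021-03", "B", 25.0), ("2021-03", "C", 25.0),
]
series = hhi_by_period(rows)
print(series)              # {'2021-01': 2500.0, '2021-02': 3750.0, '2021-03': 3750.0}
print(flag_jumps(series))  # [('2021-01', '2021-02', 1250.0)]
```

Periods are written as "YYYY-MM" strings here so that sorting them as text also sorts them in time order.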

209
00:20:23,880 --> 00:20:27,480
So you just kind of got to keep an eye out for those.

210
00:20:27,480 --> 00:20:35,480
So as diligent consumers, we can keep a pulse on the Washington market and sort of monitor

211
00:20:35,480 --> 00:20:46,120
the HHI and see if it's spiking out of control or trending positively or negatively one way

212
00:20:46,120 --> 00:20:47,400
or the other.

213
00:20:47,400 --> 00:20:55,240
So real interesting metric that we can look at here.

214
00:20:55,240 --> 00:21:03,800
So where can we get this data?

215
00:21:03,800 --> 00:21:09,000
Well, Washington State has incredibly open public data.

216
00:21:09,000 --> 00:21:21,280
And the Cannabis Observer is, I guess you'd call them a news outlet, but they primarily

217
00:21:21,280 --> 00:21:26,720
report on the cannabis industry in Washington State.

218
00:21:26,720 --> 00:21:31,680
And they do periodic Freedom of Information Act requests to make these available to the

219
00:21:31,680 --> 00:21:32,680
public.

220
00:21:32,680 --> 00:21:39,160
So that way the public can get access to really incredible data here.

221
00:21:39,160 --> 00:21:46,280
And so, for example, I'll share this data with you here.

222
00:21:46,280 --> 00:21:55,600
I probably should have already shared it with you, but I will immediately after the talk

223
00:21:55,600 --> 00:21:56,600
today.

224
00:21:56,600 --> 00:22:05,880
Essentially, though, you now have access to all of the Washington State traceability data

225
00:22:05,880 --> 00:22:16,080
through, you know, November of 2021.

226
00:22:16,080 --> 00:22:21,400
And so I'm sure the Cannabis Observer will do another request here.

227
00:22:21,400 --> 00:22:30,920
So Washington State is migrating off of their traceability system in this month, December.

228
00:22:30,920 --> 00:22:36,880
What's interesting about that is it'll give us sort of a complete data set

229
00:22:36,880 --> 00:22:39,600
of their traceability data.

230
00:22:39,600 --> 00:22:46,720
So that way we can just look at, you know, Washington State's experience

231
00:22:46,720 --> 00:22:49,000
with this traceability system.

232
00:22:49,000 --> 00:22:52,600
And we actually have the entire population of data.

233
00:22:52,600 --> 00:23:03,600
And so this is really powerful from a statistical lens, because, right, in a lot of statistics,

234
00:23:03,600 --> 00:23:12,920
your underlying assumption is that you're operating with a sample of the data and you've

235
00:23:12,920 --> 00:23:18,080
got much more limited predictive power when you're working with a sample versus the entire

236
00:23:18,080 --> 00:23:19,080
population.

237
00:23:19,080 --> 00:23:26,600
And so here, you know, we actually have the entire population of all of Washington State's

238
00:23:26,600 --> 00:23:31,920
data, you know, every single sale item.

239
00:23:31,920 --> 00:23:41,040
So it's actually sort of overwhelming to deal with, because these are zipped files.

240
00:23:41,040 --> 00:23:50,280
So just be warned that once you unzip these, these are just ginormous TSVs.

241
00:23:50,280 --> 00:23:54,360
So they're not even CSVs, they're tab separated values.

242
00:23:54,360 --> 00:23:57,680
So they're even trickier to work with.
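One way to keep files this size manageable is to stream them row by row instead of loading everything at once. The sketch below uses Python's standard `csv` module with a tab delimiter and keeps only running totals in memory. The column names `license_number` and `price_total`, and the sample rows, are assumptions for illustration; the real files' columns and encoding would need to be checked first.

```python
import csv
import io
from collections import defaultdict


def total_by_key(lines, key_field, value_field):
    """Stream tab-separated rows and sum value_field for each key_field."""
    reader = csv.DictReader(lines, delimiter="\t")
    totals = defaultdict(float)
    for row in reader:
        try:
            value = float(row[value_field])
        except (KeyError, TypeError, ValueError):
            continue  # skip rows with a missing or non-numeric value
        totals[row[key_field]] += value
    return dict(totals)


# Small in-memory stand-in for a large TSV file (hypothetical columns).
sample = io.StringIO(
    "license_number\tprice_total\n"
    "425498\t10.50\n"
    "425498\t4.50\n"
    "111111\t20.00\n"
    "111111\t\n"
)
totals = total_by_key(sample, "license_number", "price_total")
print(totals)  # {'425498': 15.0, '111111': 20.0}
```

For a real file, an open file handle can be passed in place of the in-memory sample, since the reader only needs an iterable of lines.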

243
00:23:57,680 --> 00:24:00,080
So just be warned.

244
00:24:00,080 --> 00:24:04,520
Check out some of the source code from the cannabis data science group from the beginning

245
00:24:04,520 --> 00:24:09,640
of the year where we were working with some of these data sets.

246
00:24:09,640 --> 00:24:16,000
So user be warned, they're difficult to work with.

247
00:24:16,000 --> 00:24:24,880
How many months of, like, I mean, how long has cannabis been legal in Washington?

248
00:24:24,880 --> 00:24:28,560
And does it have all of the data from when it was?

249
00:24:28,560 --> 00:24:31,680
It doesn't have the very beginning.

250
00:24:31,680 --> 00:24:42,200
So and I should know this off the top of my head, but oh yes, I want to say that Washington

251
00:24:42,200 --> 00:24:45,800
State permitted cannabis use in 2012.

252
00:24:45,800 --> 00:24:46,800
Oh, wow.

253
00:24:46,800 --> 00:24:49,840
So that's a lot of data.

254
00:24:49,840 --> 00:24:59,080
But this traceability system was formally adopted in April of 2018.

255
00:24:59,080 --> 00:25:00,960
So you really just have.

256
00:25:00,960 --> 00:25:01,960
Yeah.

257
00:25:01,960 --> 00:25:02,960
Okay.

258
00:25:02,960 --> 00:25:05,080
So it's still a lot of data.

259
00:25:05,080 --> 00:25:11,320
You don't have all of those formative years of the industry, which would actually be incredibly

260
00:25:11,320 --> 00:25:13,120
interesting to look at.

261
00:25:13,120 --> 00:25:15,120
The growth, yeah.

262
00:25:15,120 --> 00:25:20,120
You can see how the industry came online over the years.

263
00:25:20,120 --> 00:25:22,440
2018 still a good time frame.

264
00:25:22,440 --> 00:25:26,520
So here's what we're given.

265
00:25:26,520 --> 00:25:32,640
But essentially what happened is Washington State migrated from BioTrack traceability

266
00:25:32,640 --> 00:25:36,880
to Leaf Data Systems traceability.

267
00:25:36,880 --> 00:25:43,040
And we've got all the data from the Leaf Data Systems traceability system.

268
00:25:43,040 --> 00:25:49,400
So it's awesome that Washington State makes this publicly available.

269
00:25:49,400 --> 00:25:59,040
If any of you have clever ways to aggregate this data, have at it because we'll share

270
00:25:59,040 --> 00:26:01,440
with you.

271
00:26:01,440 --> 00:26:15,640
So for example, they put together an aggregate of sales by retailer by month.

272
00:26:15,640 --> 00:26:19,760
And so I've already downloaded this here.

273
00:26:19,760 --> 00:26:22,880
And once again, I'll share the link with you afterwards.

274
00:26:22,880 --> 00:26:27,560
I think I already have it open.

275
00:26:27,560 --> 00:26:42,840
But essentially, someone helpful has aggregated total sales by license number by month.

276
00:26:42,840 --> 00:26:51,520
Well, here they're going back to November 2017.

277
00:26:51,520 --> 00:27:00,680
But as I said, the system wasn't formally fully adopted until November, I mean, until

278
00:27:00,680 --> 00:27:02,680
April of 2018.

279
00:27:02,680 --> 00:27:10,000
So I would be cautious about using data prior to April of 2018.
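A simple way to act on that caution is to drop the earlier months before doing any analysis. This sketch assumes each row carries its month as a "YYYY-MM" string; the field names and sample rows are hypothetical, and treating April 2018 itself as usable is also an assumption.

```python
from datetime import date

# Assumed cutoff: the month the system was formally adopted.
ADOPTION_DATE = date(2018, 4, 1)


def parse_month(text):
    """Parse a 'YYYY-MM' string into the first day of that month."""
    year, month = text.split("-")
    return date(int(year), int(month), 1)


def drop_early_months(rows, cutoff=ADOPTION_DATE):
    """Keep only rows whose month is on or after the cutoff."""
    return [row for row in rows if parse_month(row["month"]) >= cutoff]


# Hypothetical monthly totals for one license.
rows = [
    {"month": "2017-11", "license_number": "A", "total_sales": 100.0},
    {"month": "2018-03", "license_number": "A", "total_sales": 120.0},
    {"month": "2018-04", "license_number": "A", "total_sales": 130.0},
    {"month": "2018-05", "license_number": "A", "total_sales": 140.0},
]
kept = drop_early_months(rows)
print([row["month"] for row in kept])  # ['2018-04', '2018-05']
```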

280
00:27:10,000 --> 00:27:11,000
OK.

281
00:27:11,000 --> 00:27:14,160
And one more question.

282
00:27:14,160 --> 00:27:19,280
Each license number is for one retailer.

283
00:27:19,280 --> 00:27:23,720
And when they renew their license, it's the same license number.

284
00:27:23,720 --> 00:27:25,200
I don't know.

285
00:27:25,200 --> 00:27:28,800
That's why I'm asking.

286
00:27:28,800 --> 00:27:30,680
It should.

287
00:27:30,680 --> 00:27:34,880
I haven't actually dug into this data too, too deeply here.

288
00:27:34,880 --> 00:27:42,240
But it should be like, so for example, like if we just look at one license.

289
00:27:42,240 --> 00:27:43,240
OK.

290
00:27:43,240 --> 00:27:46,920
Well, maybe that's not the best license to look at.

291
00:27:46,920 --> 00:27:50,000
But if we just look at one license, well.

292
00:27:50,000 --> 00:27:51,000
OK.

293
00:27:51,000 --> 00:27:57,680
So this could be a problem here if these license numbers aren't unique.

294
00:27:57,680 --> 00:28:01,720
Because it would be helpful to track these people over time.

295
00:28:01,720 --> 00:28:02,720
Yeah.

296
00:28:02,720 --> 00:28:03,720
OK.

297
00:28:03,720 --> 00:28:11,040
So we don't actually have quite the exact data that I was hoping we did here.

298
00:28:11,040 --> 00:28:15,480
Well, that sucks.

299
00:28:15,480 --> 00:28:19,560
Yes, because it would be interesting to.

300
00:28:19,560 --> 00:28:23,200
And like I said, the data exists.

301
00:28:23,200 --> 00:28:28,080
We'll just have to compile it ourselves.

302
00:28:28,080 --> 00:28:37,840
I think this will work for the analysis I wanted to do today, which was just look at

303
00:28:37,840 --> 00:28:42,240
sort of aggregate this by license number by period.

304
00:28:42,240 --> 00:28:50,600
So I think we just can't track these over time with this particular data set.

305
00:28:50,600 --> 00:29:01,040
Unless, hold on here, unless we've just gotten really unlucky with my.

306
00:29:01,040 --> 00:29:02,040
Right.

307
00:29:02,040 --> 00:29:08,440
So this could be the problem with the non-random sample.

308
00:29:08,440 --> 00:29:12,440
So let's try to pick out one.

309
00:29:12,440 --> 00:29:13,440
OK.

310
00:29:13,440 --> 00:29:14,440
Yeah.

311
00:29:14,440 --> 00:29:15,440
Cool.

312
00:29:15,440 --> 00:29:16,440
OK.

313
00:29:16,440 --> 00:29:17,440
Awesome.

314
00:29:17,440 --> 00:29:18,440
That's good.

315
00:29:18,440 --> 00:29:19,440
OK.

316
00:29:19,440 --> 00:29:27,440
As I just demonstrated, you have to be really careful about the conclusions you draw with

317
00:29:27,440 --> 00:29:29,440
non-random samples.

318
00:29:29,440 --> 00:29:31,920
But this is good.

319
00:29:31,920 --> 00:29:32,920
Yeah.

320
00:29:32,920 --> 00:29:36,240
So now we know that there are licenses that are going over years.

321
00:29:36,240 --> 00:29:37,240
Yeah.

322
00:29:37,240 --> 00:29:38,240
Cool.

323
00:29:38,240 --> 00:29:49,440
So that way, I just picked a bad example.

324
00:29:49,440 --> 00:29:54,240
So here is license 425498.

325
00:29:54,240 --> 00:29:59,920
And what's cool is you could actually match that.

326
00:29:59,920 --> 00:30:02,840
So you could grab the.

327
00:30:02,840 --> 00:30:03,840
There should be.

328
00:30:03,840 --> 00:30:08,400
So there's a licensees data set.

329
00:30:08,400 --> 00:30:09,400
OK.

330
00:30:09,400 --> 00:30:16,920
So you should be able to cross reference the licensees data and get granular information

331
00:30:16,920 --> 00:30:18,520
about that license.
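The cross-reference could be as simple as a lookup keyed on license number. In this sketch the field names (`license_number`, `name`, `city`) and the sample records are hypothetical; the actual licensees data set may use different columns.

```python
def attach_licensee_info(monthly_sales, licensees):
    """Add licensee name and city to each sales row, matched on license number."""
    by_number = {row["license_number"]: row for row in licensees}
    merged = []
    for sale in monthly_sales:
        info = by_number.get(sale["license_number"], {})
        merged.append({**sale, "name": info.get("name"), "city": info.get("city")})
    return merged


# Hypothetical records for illustration.
licensees = [
    {"license_number": "425498", "name": "Example Retailer", "city": "Example City"},
]
monthly_sales = [
    {"license_number": "425498", "month": "2021-10", "total_sales": 1000.0},
    {"license_number": "999999", "month": "2021-10", "total_sales": 500.0},
]
merged = attach_licensee_info(monthly_sales, licensees)
print(merged[0]["name"])  # Example Retailer
print(merged[1]["name"])  # None
```

Sales rows with no matching licensee are kept with empty details rather than dropped, so unmatched license numbers stay visible.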

332
00:30:18,520 --> 00:30:21,520
So another.

333
00:30:21,520 --> 00:30:26,840
Where it is in the state, what their name is.

334
00:30:26,840 --> 00:30:28,120
Yeah.

335
00:30:28,120 --> 00:30:36,880
Another interesting thing also is that it would be interesting to see, like,

336
00:30:36,880 --> 00:30:41,160
how many retailers did not renew their license.

337
00:30:41,160 --> 00:30:48,120
You know, and was it dependent on their total sales?

338
00:30:48,120 --> 00:30:49,120
You are brilliant.

339
00:30:49,120 --> 00:30:57,120
I love your thinking because you took it to the next step because not only is it interesting

340
00:30:57,120 --> 00:31:05,320
to just see, OK, who's staying in the market, to look at entries and exits over time.

341
00:31:05,320 --> 00:31:11,360
But then seeing if that's dependent on sales, that's brilliant.

342
00:31:11,360 --> 00:31:14,200
So brilliant idea, Marjana.

343
00:31:14,200 --> 00:31:21,040
So I think that's worthy of a, you know, I always think things like this are worthy of

344
00:31:21,040 --> 00:31:23,680
a paper because I mean, who else is looking at this?

345
00:31:23,680 --> 00:31:24,680
But I mean, that's just.

346
00:31:24,680 --> 00:31:26,280
If you ever want to write a paper,

347
00:31:26,280 --> 00:31:32,600
I have a lot of experience writing papers, so I'm game with that.

348
00:31:32,600 --> 00:31:37,040
Because I think you've got the data here to parse that out.

349
00:31:37,040 --> 00:31:38,040
Yeah, absolutely.

350
00:31:38,040 --> 00:31:42,440
So you could sort of.

351
00:31:42,440 --> 00:31:47,200
I don't know how you would do your analysis here, but you'd almost do some sort of like

352
00:31:47,200 --> 00:31:49,680
survival analysis.

353
00:31:49,680 --> 00:31:56,800
So like this license.

354
00:31:56,800 --> 00:32:03,920
Seventy nine or thirteen, you know, they only survived one period.

355
00:32:03,920 --> 00:32:09,400
And then you could see, OK, you know, does like the number of periods that a license

356
00:32:09,400 --> 00:32:13,680
survives, does that depend on.

357
00:32:13,680 --> 00:32:18,200
Their average sales or something like that.
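As a first pass at that idea, the sketch below counts how many periods each license appears in and its average sales over those periods. It is only a descriptive summary, not a full survival analysis: licenses still active in the last period of the data are censored, which a proper survival model would need to account for. The function name, row layout, and sample figures are hypothetical.

```python
from collections import defaultdict


def survival_summary(rows):
    """Summarize each license's active periods from (period, license, sales) rows."""
    sales = defaultdict(dict)
    for period, license_number, total in rows:
        sales[license_number][period] = total
    summary = {}
    for license_number, by_period in sales.items():
        periods = sorted(by_period)
        summary[license_number] = {
            "first_period": periods[0],
            "last_period": periods[-1],
            "periods_active": len(periods),
            "average_sales": sum(by_period.values()) / len(periods),
        }
    return summary


# Hypothetical monthly totals for two licenses.
rows = [
    ("2021-01", "A", 100.0),
    ("2021-02", "A", 300.0),
    ("2021-03", "A", 200.0),
    ("2021-01", "B", 50.0),
]
summary = survival_summary(rows)
print(summary["A"]["periods_active"], summary["A"]["average_sales"])  # 3 200.0
print(summary["B"]["periods_active"], summary["B"]["average_sales"])  # 1 50.0
```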

358
00:32:18,200 --> 00:32:24,440
So you probably know better than I do, but there's probably more sophisticated ways

359
00:32:24,440 --> 00:32:25,440
to do that analysis.

360
00:32:25,440 --> 00:32:31,800
But no, but I was going to say, Keegan, if you were interested in ever publishing even

361
00:32:31,800 --> 00:32:38,200
for a state's results and you needed like help writing a paper, I can help you with

362
00:32:38,200 --> 00:32:39,200
that.

363
00:32:39,200 --> 00:32:42,880
I've published several papers, but not in this field, obviously.

364
00:32:42,880 --> 00:32:50,280
But we can look at journals that, you know, potentially this can be published in.

365
00:32:50,280 --> 00:32:54,480
This is, this is good.

366
00:32:54,480 --> 00:32:58,520
This data set here.

367
00:32:58,520 --> 00:33:04,320
Because that was something a professor told me once that stuck with me that, you know,

368
00:33:04,320 --> 00:33:12,760
everyone's always so focused on market performance and entrance that no one looks at exit.

369
00:33:12,760 --> 00:33:14,560
It is quite understudied.

370
00:33:14,560 --> 00:33:16,520
Yes, absolutely.

371
00:33:16,520 --> 00:33:22,000
And that actually answers a lot more questions, I feel like, than entering

372
00:33:22,000 --> 00:33:25,160
the market.

373
00:33:25,160 --> 00:33:27,120
What is causing people to leave?

374
00:33:27,120 --> 00:33:33,480
Yes, because we're risk averse because those are things you want to hedge against.

375
00:33:33,480 --> 00:33:40,680
So if you know that there's some threshold of sales that you really have to hit.

376
00:33:40,680 --> 00:33:46,120
So if you see that, oh, everyone below.

377
00:33:46,120 --> 00:33:52,520
you know, 300,000 in monthly sales is exiting after six months, then that's really, that's

378
00:33:52,520 --> 00:34:03,880
a, you know, a red flag, so to speak, that you'd want to watch out for.

379
00:34:03,880 --> 00:34:05,480
Or other factors.

380
00:34:05,480 --> 00:34:12,760
You know, you could have particular geographic regions that have high exit rates.

381
00:34:12,760 --> 00:34:18,880
Which could be dependent on the taxes in that region, too.

382
00:34:18,880 --> 00:34:20,480
I have a quick question.

383
00:34:20,480 --> 00:34:21,480
Yes.

384
00:34:21,480 --> 00:34:25,720
I know, I'm sure it varies from state to state.

385
00:34:25,720 --> 00:34:31,080
Are there different incentives or encouragement from the states once they decide to legalize

386
00:34:31,080 --> 00:34:39,920
to encourage cannabis businesses from growers to retail to open up in the state?

387
00:34:39,920 --> 00:34:43,200
Is there anything going on like that?

388
00:34:43,200 --> 00:34:46,320
In Washington, I don't know off the top of my head.

389
00:34:46,320 --> 00:34:53,920
In fact, I want to say it's difficult to operate because you really have to make sure

390
00:34:53,920 --> 00:34:55,960
you're abiding by the regulations.

391
00:34:55,960 --> 00:35:05,040
It can be real easy to get nabbed for some violation, just like not having appropriate security

392
00:35:05,040 --> 00:35:07,440
cameras or this or that.

393
00:35:07,440 --> 00:35:13,840
So I do know in certain states, that's sort of a new priority is to make sure that people

394
00:35:13,840 --> 00:35:16,680
have the resources they need.

395
00:35:16,680 --> 00:35:24,520
But I don't know, per se, the standard in Washington state.

396
00:35:24,520 --> 00:35:35,720
I think Michigan has some incentives last I read, but I don't know what the incentives

397
00:35:35,720 --> 00:35:36,720
are.

398
00:35:36,720 --> 00:35:37,720
But yeah.

399
00:35:37,720 --> 00:35:47,000
It's an interesting point because these companies are bringing in good tax revenue.

400
00:35:47,000 --> 00:35:59,600
So you don't want the companies just not surviving for some reason or another that could

401
00:35:59,600 --> 00:36:00,600
be prevented.

402
00:36:00,600 --> 00:36:06,920
But I love your thinking, Marjana.

403
00:36:06,920 --> 00:36:15,400
So I think that's going to entail a whole bit of analysis.

404
00:36:15,400 --> 00:36:20,080
So we can perhaps put it on the docket and start looking at that.

405
00:36:20,080 --> 00:36:21,080
Sure.

406
00:36:21,080 --> 00:36:24,560
Look at exits over time.

407
00:36:24,560 --> 00:36:32,440
So for today, I'll just show you how to just start calculating some of the standard retail

408
00:36:32,440 --> 00:36:35,880
statistics that we've been calculating for other states.

409
00:36:35,880 --> 00:36:44,160
And we can do that pretty quickly, just because we've got such good data here.

410
00:36:44,160 --> 00:36:54,760
So essentially just going to read in this Excel spreadsheet.

411
00:36:54,760 --> 00:37:05,120
So now we're just looking at this exact same data.

412
00:37:05,120 --> 00:37:11,160
I've already kind of been filtering this, but looking at this data here.

413
00:37:11,160 --> 00:37:15,360
And then I noticed there was just at least one blank row.

414
00:37:15,360 --> 00:37:20,160
So just going to remove that.

415
00:37:20,160 --> 00:37:25,960
And then just create a formal date column.

416
00:37:25,960 --> 00:37:34,600
So nothing fancy yet.

417
00:37:34,600 --> 00:37:40,240
Just making sure that the date matches the reporting period.

418
00:37:40,240 --> 00:37:47,400
Then just going to grab Washington state's population real quick, just because we're

419
00:37:47,400 --> 00:37:54,040
interested in knowing the number of retailers per capita, just to kind of put things in

420
00:37:54,040 --> 00:37:56,920
perspective.
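
[Editor's sketch] The preparation steps just described (drop the blank row, build a proper date from the reporting period, and put retailer counts on a per-capita basis) might look roughly like this. It is a stdlib-only sketch rather than the Excel-reading code shown on screen: the row layout, the `clean` and `per_100k` helpers, and the `POPULATION` figure are all assumptions made for illustration.

```python
from datetime import date

# Assumed placeholder for Washington State's population; swap in an official estimate.
POPULATION = 7_600_000

# Hypothetical rows as they might come out of the spreadsheet.
raw_rows = [
    {"license": "A", "period": "2021-01", "sales": 100_000.0},
    {"license": None, "period": None, "sales": None},  # the stray blank row
    {"license": "B", "period": "2021-01", "sales": 50_000.0},
]


def clean(rows):
    """Drop blank rows and add a date for the first day of each reporting period."""
    cleaned = []
    for row in rows:
        if row["license"] is None:
            continue
        year, month = (int(part) for part in row["period"].split("-"))
        cleaned.append({**row, "date": date(year, month, 1)})
    return cleaned


def per_100k(count, population=POPULATION):
    """Express a count per 100,000 people."""
    return count / population * 100_000


rows = clean(raw_rows)
print(len(rows), rows[0]["date"], round(per_100k(440), 2))
```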

421
00:37:56,920 --> 00:38:03,640
So without further ado, we can start looking at some of these statistics in Washington.

422
00:38:03,640 --> 00:38:12,800
So for example, we can aggregate everything by period and just look at total sales over

423
00:38:12,800 --> 00:38:16,160
time.

424
00:38:16,160 --> 00:38:24,960
So this is sort of what we have to work with in a lot of the other states.

425
00:38:24,960 --> 00:38:30,920
And so this is what's so cool about the Washington state data is we can go as granular or as

426
00:38:30,920 --> 00:38:33,480
aggregated as we would like.

427
00:38:33,480 --> 00:38:44,640
So for example, we could get each granular sale item, which we've had some entrepreneurial

428
00:38:44,640 --> 00:38:56,400
members do and do studies on sale items.

429
00:38:56,400 --> 00:39:01,160
And so, long story short, today we're just looking at a few aggregate statistics.

430
00:39:01,160 --> 00:39:07,120
So we were calculating these statistics in Massachusetts and Illinois.

431
00:39:07,120 --> 00:39:12,280
So it makes sense to go ahead and calculate them in Washington state too.

432
00:39:12,280 --> 00:39:17,560
Okay, I knew there was someone trying to join.

433
00:39:17,560 --> 00:39:24,520
We have one more member joining.

434
00:39:24,520 --> 00:39:26,440
Happy to have you, Graham.

435
00:39:26,440 --> 00:39:30,760
We've just been looking at statistics here in Washington state.

436
00:39:30,760 --> 00:39:32,720
So happy to have you join.

437
00:39:32,720 --> 00:39:38,080
Yeah, sorry guys, running late from a doctor appointment, but this is wonderful.

438
00:39:38,080 --> 00:39:40,080
100% okay.

439
00:39:40,080 --> 00:39:43,800
So essentially you joined at a good time.

440
00:39:43,800 --> 00:39:48,920
Since we're essentially looking at the exact same metrics we were looking at in Illinois

441
00:39:48,920 --> 00:39:49,920
and Massachusetts.

442
00:39:49,920 --> 00:39:55,200
And this time we're looking at them in Washington state.

443
00:39:55,200 --> 00:39:57,200
So we can see, okay.

444
00:39:57,200 --> 00:40:03,160
And this is what's really cool is this is why I wanted to really compare Washington

445
00:40:03,160 --> 00:40:06,000
state with Massachusetts.

446
00:40:06,000 --> 00:40:09,840
Because I was in Washington state at the time.

447
00:40:09,840 --> 00:40:21,120
And in April of 2020, you see this giant spike in retail sales.

448
00:40:21,120 --> 00:40:28,640
And just anecdotally at the time, stores were getting rushed.

449
00:40:28,640 --> 00:40:33,320
So people were stockpiling on products.

450
00:40:33,320 --> 00:40:41,840
And it got to the point where basically all of the retailers were offering sales.

451
00:40:41,840 --> 00:40:47,120
So they were basically, different retailers would say, oh, we'll have a Friday sale or

452
00:40:47,120 --> 00:40:49,560
we'll do a Saturday sale.

453
00:40:49,560 --> 00:40:56,200
Well, they said, okay, we're just going to do sales all the time.

454
00:40:56,200 --> 00:41:00,720
And so effectively, that lowered the price.

455
00:41:00,720 --> 00:41:06,280
And so that's going to be a whole nother analysis of its own.

456
00:41:06,280 --> 00:41:14,840
We need to get the price data out of these sales items and look at prices over time in

457
00:41:14,840 --> 00:41:16,240
Washington state.

458
00:41:16,240 --> 00:41:21,040
So that's going to be a whole nother animal to wrangle.

459
00:41:21,040 --> 00:41:29,080
But for now, we can at least see that sales spiked in April of 2020, which is sort of

460
00:41:29,080 --> 00:41:36,080
the exact opposite of what we saw in Massachusetts where markets were closed for two months.

461
00:41:36,080 --> 00:41:46,160
And so I think this offers a great opportunity for differential analysis where, you know,

462
00:41:46,160 --> 00:41:50,760
you basically have two different policy decisions.

463
00:41:50,760 --> 00:41:56,960
And basically what, you know, economics is all about is measuring sort of the impact

464
00:41:56,960 --> 00:42:05,120
response of these policy decisions.

465
00:42:05,120 --> 00:42:11,520
So basically, from my just naive observation, it looks like, okay, you know, in Washington

466
00:42:11,520 --> 00:42:18,880
state, you know, the policy decision that they were going to keep stores open on top

467
00:42:18,880 --> 00:42:30,760
of sort of the dynamics that led all the retailers to decide that they're going to do their discounts,

468
00:42:30,760 --> 00:42:35,840
probably because there's an increase in demand, then, you know, you just see this whole

469
00:42:35,840 --> 00:42:40,080
shift in supply.

470
00:42:40,080 --> 00:42:46,160
So supply just shifted up and it stayed up.

471
00:42:46,160 --> 00:42:54,120
And as we saw in Massachusetts, there was sort of a sort of a shift down in supply during

472
00:42:54,120 --> 00:42:57,840
that time.

473
00:42:57,840 --> 00:43:06,160
So just naive observations so far, nothing concrete yet.

474
00:43:06,160 --> 00:43:14,120
But then we can start looking at retailers over time.

475
00:43:14,120 --> 00:43:20,240
A little interested in what's happened in these latest months.

476
00:43:20,240 --> 00:43:22,560
Is this just like a reporting thing?

477
00:43:22,560 --> 00:43:26,720
Like maybe these companies just haven't reported their sales yet.

478
00:43:26,720 --> 00:43:32,880
And maybe there's measurement error, but there could have been a large... well, and actually,

479
00:43:32,880 --> 00:43:35,820
we have to look at the scale over here, right?

480
00:43:35,820 --> 00:43:39,640
This is just a dip from 440 to 430.

481
00:43:39,640 --> 00:43:51,920
So you know, having 10 retailers exit the market may not really be that extraordinary.

482
00:43:51,920 --> 00:44:03,720
But anywho, this is the count of retailers over time.

483
00:44:03,720 --> 00:44:15,520
So now we've got sales over time, we've got retailers over time, we can divide the two

484
00:44:15,520 --> 00:44:22,880
and get sales per retailer over time.
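
[Editor's sketch] The three series mentioned here (total sales, retailer count, and their ratio) can all come from one pass over the license-level data. The snippet below is an illustration only; the `records` values and the `monthly_stats` helper are hypothetical, and it assumes any license that reports in a period counts as an active retailer.

```python
from collections import defaultdict

# Hypothetical records: (license_number, period, sales).
records = [
    ("A", "2020-03", 100.0),
    ("B", "2020-03", 300.0),
    ("A", "2020-04", 250.0),
    ("B", "2020-04", 350.0),
    ("C", "2020-04", 300.0),
]


def monthly_stats(rows):
    """Aggregate total sales, retailer count, and sales per retailer by period."""
    total_sales = defaultdict(float)
    retailers = defaultdict(set)
    for license_number, period, sales in rows:
        total_sales[period] += sales
        retailers[period].add(license_number)
    return {
        period: {
            "total_sales": total_sales[period],
            "retailers": len(retailers[period]),
            "sales_per_retailer": total_sales[period] / len(retailers[period]),
        }
        for period in sorted(total_sales)
    }


stats = monthly_stats(records)
for period, values in stats.items():
    print(period, values)
```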

485
00:44:22,880 --> 00:44:28,720
And so once again, going along, and then you see the sales per retailer spike along with

486
00:44:28,720 --> 00:44:30,400
sales.

487
00:44:30,400 --> 00:44:38,160
So, you know, each retailer is seeing this increase in their revenue.

488
00:44:38,160 --> 00:44:48,120
So good to be a retailer, if you're one of the ones that hasn't exited that is.

489
00:44:48,120 --> 00:44:59,240
And then finally, just looking at the retailers per 100,000 people that tracks along with

490
00:44:59,240 --> 00:45:02,360
retailers pretty closely.

491
00:45:02,360 --> 00:45:11,280
The one thing I'm going to point out though, is our measurements here are quite different

492
00:45:11,280 --> 00:45:22,240
than what's measured here in the technical memorandum produced for Nevada.

493
00:45:22,240 --> 00:45:30,880
So they're estimating that Washington State has 9.8 dispensaries per 100,000 and

494
00:45:30,880 --> 00:45:37,520
around 2.2 in annual revenue.

495
00:45:37,520 --> 00:45:45,920
We're seeing, okay, you know, if we calculate these exact same metrics, you know, these should

496
00:45:45,920 --> 00:45:46,920
be identical.

497
00:45:46,920 --> 00:45:53,840
But we're estimating, you know, 5.8 retailers per 100,000.

498
00:45:53,840 --> 00:45:55,760
So it is a different metric.

499
00:45:55,760 --> 00:46:02,280
So ours should be a little less, but it should still be comparable.

500
00:46:02,280 --> 00:46:07,200
And then also our sales per retailer is a little higher.

501
00:46:07,200 --> 00:46:19,120
So I think it depends on perhaps when this technical memorandum was prepared.

502
00:46:19,120 --> 00:46:29,320
So you know, they may have prepared their memorandum, you know, early on in 2020, you

503
00:46:29,320 --> 00:46:34,400
know, before we saw that spike in sales.

504
00:46:34,400 --> 00:46:40,120
So that's a possibility I just realized.

505
00:46:40,120 --> 00:46:49,400
Anywho, now we can do a little live coding here, because I'm still working on these metrics.

506
00:46:49,400 --> 00:46:52,220
But we're real close.

507
00:46:52,220 --> 00:47:04,000
So long story short, we're trying to calculate the HHI and the similar metric, the concentration

508
00:47:04,000 --> 00:47:08,760
ratio over time.

509
00:47:08,760 --> 00:47:17,640
So just to save us a little bit of time, I've already written this snippet of code right

510
00:47:17,640 --> 00:47:18,640
here.

511
00:47:18,640 --> 00:47:29,640
Let's see if we can't use this to our advantage.

512
00:47:29,640 --> 00:47:40,680
So basically, I've got the market share for all the licenses on all the different days

513
00:47:40,680 --> 00:47:44,680
here.

514
00:47:44,680 --> 00:47:49,320
So let's try to calculate the HHI over time.

515
00:47:49,320 --> 00:47:56,080
So bear with me while I code this, but I think we can do this within five minutes.

516
00:47:56,080 --> 00:48:03,720
So basically, you know, if we're just going to iterate over this, so you know, so for

517
00:48:03,720 --> 00:48:16,720
the date, we're basically going to iterate over the keys.

518
00:48:16,720 --> 00:48:25,160
We're going to get the particular market shares for that date. Actually, it'd make more sense

519
00:48:25,160 --> 00:48:37,600
to iterate like this.

520
00:48:37,600 --> 00:48:45,120
I think we can do it this way.

521
00:48:45,120 --> 00:49:00,280
So bear with me, but just make sure this is...

522
00:49:00,280 --> 00:49:06,320
Okay, cool.

523
00:49:06,320 --> 00:49:18,000
So the HHI is going to be the sum of all of these market shares squared.

524
00:49:18,000 --> 00:49:26,840
So it's share squared, right?

525
00:49:26,840 --> 00:49:33,360
That's going to be s squared for s in shares.

526
00:49:33,360 --> 00:49:44,360
And the HHI is just going to be the sum of the shares squared.

527
00:49:44,360 --> 00:49:54,600
I may not be doing this correctly, but we've got to start somewhere.

528
00:49:54,600 --> 00:50:05,360
And we want to just keep track of these over time.

529
00:50:05,360 --> 00:50:28,800
So no promises here, but we may have a non-numerical value here.

530
00:50:28,800 --> 00:50:50,120
Okay, although I did something wrong.

531
00:50:50,120 --> 00:50:51,120
This should...

532
00:50:51,120 --> 00:51:02,520
So now we should have the HHI over time.

533
00:51:02,520 --> 00:51:05,680
But remember, we need this in percentages.

534
00:51:05,680 --> 00:51:20,240
So we actually need the share times 100, squared.

535
00:51:20,240 --> 00:51:28,280
And then we'll just put this into...

536
00:51:28,280 --> 00:51:39,880
And pardon that I'm coding this up sort of haphazardly, but you're seeing how I'm hacking

537
00:51:39,880 --> 00:51:46,480
this out.

538
00:51:46,480 --> 00:51:53,480
So if we're lucky...

539
00:51:53,480 --> 00:52:05,480
Not too lucky.

540
00:52:05,480 --> 00:52:09,880
Okay.

541
00:52:09,880 --> 00:52:24,240
One second, let me make sure that I'm doing this correctly.

542
00:52:24,240 --> 00:52:46,040
We can just do a...

543
00:52:46,040 --> 00:53:01,120
One second here.

544
00:53:01,120 --> 00:53:06,120
Adding an index.

545
00:53:06,120 --> 00:53:17,920
Let's see if we can add this data with everything else.

546
00:53:17,920 --> 00:53:22,640
Okay.

547
00:53:22,640 --> 00:53:36,320
Here is a rough attempt at calculating the HHI over time.

548
00:53:36,320 --> 00:53:43,960
And so it looks like it's decreasing and it looks like it's really low at 40.

549
00:53:43,960 --> 00:53:53,560
And so remember earlier we were looking for an HHI of below 2,500.

550
00:53:53,560 --> 00:54:01,400
So I want to make 100% sure I calculated this correctly, but it sure looks like there's

551
00:54:01,400 --> 00:54:12,920
a quite competitive market in Washington state that's becoming more competitive over time.

552
00:54:12,920 --> 00:54:22,240
Like I said, I'm super uncertain about this because I just calculated this sort of spur

553
00:54:22,240 --> 00:54:25,240
of the moment here.

554
00:54:25,240 --> 00:54:31,000
But let's just look at the logic real quick.

555
00:54:31,000 --> 00:54:50,920
So we're getting the market shares for each day and then for each day we're calculating

556
00:54:50,920 --> 00:54:59,800
the percentage squared and then we're summing those.
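
[Editor's sketch] The logic walked through here (for each date, take every license's market share as a percentage, square it, and sum) can be written compactly. This is a minimal sketch of that standard HHI calculation, not the script typed during the session; the `sales_by_date` data and both helper names are hypothetical.

```python
# Hypothetical sales by date and license.
sales_by_date = {
    "2020-03": {"A": 50.0, "B": 50.0},
    "2020-04": {"A": 25.0, "B": 25.0, "C": 25.0, "D": 25.0},
}


def market_shares(sales_by_license):
    """Return each license's share of total sales as a fraction between 0 and 1."""
    total = sum(sales_by_license.values())
    if total == 0:
        return {}
    return {name: sales / total for name, sales in sales_by_license.items()}


def herfindahl_hirschman_index(shares):
    """Sum of squared percentage shares, so a monopoly scores 10,000."""
    return sum((share * 100) ** 2 for share in shares)


hhi_over_time = {
    day: herfindahl_hirschman_index(market_shares(sales).values())
    for day, sales in sales_by_date.items()
}
print(hhi_over_time)
```

Working in percentages puts the result on the same 0 to 10,000 scale as the 2,500 threshold mentioned earlier.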

557
00:54:59,800 --> 00:55:04,640
So I want to say it's just a competitive industry.

558
00:55:04,640 --> 00:55:09,560
So you don't see like an oligopoly yet.

559
00:55:09,560 --> 00:55:12,400
And that makes sense, right?

560
00:55:12,400 --> 00:55:16,920
There's some like 400 some retailers, right?

561
00:55:16,920 --> 00:55:25,920
If we were looking at the retailers over time, right?

562
00:55:25,920 --> 00:55:31,680
There are some like 400 retailers in Washington state.

563
00:55:31,680 --> 00:55:39,960
So if there were like one, or I mean, if there were like five or six retailers, you would

564
00:55:39,960 --> 00:55:43,720
expect this HHI to be much higher.
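
[Editor's sketch] A quick sanity check on that intuition: if n retailers split the market equally, each holds 100/n percent, so the HHI is n times (100/n) squared, which is 10,000/n. That is the lowest value possible for n retailers. The `equal_share_hhi` helper below is hypothetical and only illustrates the arithmetic.

```python
def equal_share_hhi(n_firms):
    """HHI when n_firms split the market equally (the minimum for that many firms)."""
    share_percent = 100 / n_firms
    return n_firms * share_percent ** 2


for n_firms in (5, 400):
    print(n_firms, equal_share_hhi(n_firms))
```

So roughly 400 equal-sized retailers would give an HHI of 25, while five would give 2,000. An observed value near 40 is consistent with several hundred retailers of somewhat unequal size.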

565
00:55:43,720 --> 00:55:55,640
So it looks like, at least to me, like I said, I want to recheck this metric, but it looks

566
00:55:55,640 --> 00:56:08,280
like the HHI is well into the monopolistic competition level in Washington state.

567
00:56:08,280 --> 00:56:17,520
Real quick, if you want to bear with us, bear with me, I think we could calculate the CR5

568
00:56:17,520 --> 00:56:18,520
over time.

569
00:56:18,520 --> 00:56:26,040
So here, this is just going to be the sum of the top market shares.

570
00:56:26,040 --> 00:56:33,080
So I think we can just sort.

571
00:56:33,080 --> 00:56:37,280
Okay, cool.

572
00:56:37,280 --> 00:56:51,720
Here, let me, I should do this off the top of my head, but.

573
00:56:51,720 --> 00:57:04,200
Okay, yeah, so I think we should just be able to just sort the shares.

574
00:57:04,200 --> 00:57:23,080
And then just get the top five shares.

575
00:57:23,080 --> 00:57:36,800
And then just do the sum of the top five.

576
00:57:36,800 --> 00:57:39,800
That should be our CR5.

577
00:57:39,800 --> 00:57:47,360
Yeah, let's just go ahead and multiply this.

578
00:57:47,360 --> 00:57:57,040
Well, we don't really have to multiply it by 100, I suppose.

579
00:57:57,040 --> 00:58:06,840
Okay, I don't think I did something right.

580
00:58:06,840 --> 00:58:09,840
Okay, actually, let's.

581
00:58:09,840 --> 00:58:38,320
So this would be, let's maybe multiply this by 100.

582
00:58:38,320 --> 00:58:43,800
It looks like the CR5 may be slightly increasing, so this would be the concentration of the

583
00:58:43,800 --> 00:58:50,120
top five largest retailers, but it still looks like an incredibly low number.

584
00:58:50,120 --> 00:58:56,600
So it looks like, to me, like I said, I'm really skeptical about how I'm calculating

585
00:58:56,600 --> 00:58:58,520
these statistics.

586
00:58:58,520 --> 00:59:03,520
And so I'm, after this meetup, I'm going to make sure that I am calculating these correctly

587
00:59:03,520 --> 00:59:08,520
and I'll send you out an email if I made any mistakes here.

588
00:59:08,520 --> 00:59:20,880
I think you might have picked up all the shares except for the top five, because I don't know

589
00:59:20,880 --> 00:59:27,280
if it's sorted ascending or descending, but that looks like the same plots.

590
00:59:27,280 --> 00:59:35,280
Yeah, I think I may have picked up the smallest five here.

591
00:59:35,280 --> 00:59:38,280
So let's try this.

592
00:59:38,280 --> 00:59:39,280
Oh, you're right.

593
00:59:39,280 --> 00:59:45,320
Yeah, because for the sort, you can do sorted(shares) and put reverse equals

594
00:59:45,320 --> 00:59:48,680
true because we want the highest shares, right?

595
00:59:48,680 --> 00:59:49,680
Exactly.
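
[Editor's sketch] The fix discussed here comes down to sort order: Python's `sorted` is ascending by default, so slicing the first five picks the smallest shares unless `reverse=True` is passed. A minimal sketch of the CR5 with that correction follows; the `shares` values and the `concentration_ratio` helper are hypothetical.

```python
def concentration_ratio(shares, n=5):
    """Combined share, in percent, of the n largest firms."""
    largest = sorted(shares, reverse=True)[:n]  # descending, so the top n come first
    return sum(largest) * 100


# Hypothetical market shares as fractions for one date.
shares = [0.02, 0.01, 0.30, 0.05, 0.10, 0.20, 0.15, 0.17]

cr5 = concentration_ratio(shares)
smallest_five = sum(sorted(shares)[:5]) * 100  # the ascending-sort mistake
print(round(cr5, 2), round(smallest_five, 2))
```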

596
00:59:49,680 --> 00:59:55,640
So let's try one more time here.

597
00:59:55,640 --> 01:00:13,360
Let's try this one more time.

598
01:00:13,360 --> 01:00:26,840
Okay, so this looks a bit more plausible and it matches the movement of the HHI.

599
01:00:26,840 --> 01:00:30,880
And so these numbers look about right.

600
01:00:30,880 --> 01:00:32,600
Well, actually, I don't know.

601
01:00:32,600 --> 01:00:36,080
I've actually never calculated this metric before, so I don't know if they look about

602
01:00:36,080 --> 01:00:39,080
right or not.

603
01:00:39,080 --> 01:00:46,440
If we calculated this correctly, then this would say that the top five retailers had

604
01:00:46,440 --> 01:00:58,720
around 6.6% of sales and now they're down around 5.6% of all sales.

605
01:00:58,720 --> 01:01:08,880
Keegan, I think you actually did that right because mathematically it makes sense.

606
01:01:08,880 --> 01:01:17,760
Like this makes perfect sense and it's analogous to the Maryland system at least.

607
01:01:17,760 --> 01:01:26,560
Like Maryland uses a sales system now because that was the only way the smaller retailers

608
01:01:26,560 --> 01:01:30,880
could get customers.

609
01:01:30,880 --> 01:01:33,680
So I think you actually did the math right.

610
01:01:33,680 --> 01:01:35,920
Be a little more confident.

611
01:01:35,920 --> 01:01:41,000
Well, I always got to double check and that's the whole point of the scientific process

612
01:01:41,000 --> 01:01:44,440
too is reproducibility, right?

613
01:01:44,440 --> 01:01:53,360
So not only should I calculate these statistics, but there really should be dozens if not hundreds

614
01:01:53,360 --> 01:01:56,160
or thousands of other people such as yourselves.

615
01:01:56,160 --> 01:01:59,880
You brilliant data scientists, calculating these too.

616
01:01:59,880 --> 01:02:03,880
That way we're all sort of double checking each other.

617
01:02:03,880 --> 01:02:13,080
So that way if I made a mistake in my logic, then someone else can point that out.

618
01:02:13,080 --> 01:02:18,880
I'm going to be studying this script today and refining it and I would encourage you

619
01:02:18,880 --> 01:02:19,880
all to as well.

620
01:02:19,880 --> 01:02:30,880
So that way the more eyes on the data and the mechanism that generates the statistics,

621
01:02:30,880 --> 01:02:32,160
the better.

622
01:02:32,160 --> 01:02:36,560
So there you have it.

623
01:02:36,560 --> 01:02:49,440
That was my, you know, rough attempt to calculate the CR5, the concentration ratio, and the Herfindahl

624
01:02:49,440 --> 01:02:55,240
Hirschman index, the HHI in Washington state.

625
01:02:55,240 --> 01:03:03,680
And definitely want to double check everything, but on first glance, it looks like things

626
01:03:03,680 --> 01:03:08,600
are quite competitive in Washington state.

627
01:03:08,600 --> 01:03:15,400
So that would have been my guess, but I didn't think

628
01:03:15,400 --> 01:03:19,080
that the HHI would be as low as it is.

629
01:03:19,080 --> 01:03:28,080
So now it's piqued my curiosity to see what it may be in other states such as Illinois

630
01:03:28,080 --> 01:03:38,360
or Maryland, where you have quite a bit fewer retailers per capita.

631
01:03:38,360 --> 01:03:40,880
Good luck getting that data out.

632
01:03:40,880 --> 01:03:48,680
Well, and so that's why, you know, we've got to work with the data that we're given

633
01:03:48,680 --> 01:03:53,360
and you know, here I'm going to stop presenting for now.

634
01:03:53,360 --> 01:03:58,440
And that's one of the reasons why we do this analysis is, so I was thinking about this

635
01:03:58,440 --> 01:03:59,440
this morning.

636
01:03:59,440 --> 01:04:06,560
So if any of you smart entrepreneurial people one day end up as a statistician at

637
01:04:06,560 --> 01:04:12,320
one of these regulatory agencies, or maybe one of these regulatory agencies, one of their

638
01:04:12,320 --> 01:04:14,080
data scientists is listening in.

639
01:04:14,080 --> 01:04:21,000
Well, you know, these are techniques that you can use in your markets and in your analysis.

640
01:04:21,000 --> 01:04:25,560
So this is how it was done in Washington state.

641
01:04:25,560 --> 01:04:30,240
So we were fortunate enough to have access to this data.

642
01:04:30,240 --> 01:04:35,840
Well, I'm sure the regulators have access to this data in Massachusetts and Illinois

643
01:04:35,840 --> 01:04:39,520
and California, or all these other states.

644
01:04:39,520 --> 01:04:45,120
So people at those agencies can calculate these statistics.

645
01:04:45,120 --> 01:04:53,120
Like I said, it'd be awesome, I think, to have many people be able to calculate them,

646
01:04:53,120 --> 01:04:57,120
because as I pointed out, the more eyes on them, the better.

647
01:04:57,120 --> 01:05:00,280
It's always awesome to have people double checking each other.

648
01:05:00,280 --> 01:05:04,600
So I'm a big fan of sort of democratizing data.

649
01:05:04,600 --> 01:05:11,600
And I heard somewhere that, you know, having your data be reproducible,

650
01:05:11,600 --> 01:05:15,800
So in our case, having our statistics be reproducible is critical.

651
01:05:15,800 --> 01:05:20,400
Absolutely, even in science, even in biomedical sciences, right.

652
01:05:20,400 --> 01:05:26,320
The problem with cancer studies is more than 80% of them are not reproducible.

653
01:05:26,320 --> 01:05:27,640
It's a problem in every science.

654
01:05:27,640 --> 01:05:33,160
So yeah, I agree with you on that.

655
01:05:33,160 --> 01:05:43,440
Well, I think that's going to be the lesson of the day is do what you can to make your

656
01:05:43,440 --> 01:05:44,440
data reproducible.

657
01:05:44,440 --> 01:05:48,240
That's sort of what we strive for here in the Cannabis Data Science group.

658
01:05:48,240 --> 01:05:51,080
So that's why I put our methods out there.

659
01:05:51,080 --> 01:05:57,600
Ideally, you can run them yourselves, double check all the statistics.

660
01:05:57,600 --> 01:05:59,720
And that's what we're all about.

661
01:05:59,720 --> 01:06:02,640
So thank you all for coming.

662
01:06:02,640 --> 01:06:06,520
And if you want to contribute, there's many different avenues.

663
01:06:06,520 --> 01:06:11,080
So feel free to reach out and we can all start collaborating.

664
01:06:11,080 --> 01:06:18,560
And as I said at the beginning, we'll start looking at our forecasts of 2021 just to see

665
01:06:18,560 --> 01:06:20,720
how we did.

666
01:06:20,720 --> 01:06:27,920
And then we can learn and improve and try to make even better forecasts for 2022.

667
01:06:27,920 --> 01:06:30,040
So stay tuned.

668
01:06:30,040 --> 01:06:31,680
That's great.

669
01:06:31,680 --> 01:06:32,680
Thanks so much, Keegan.

670
01:06:32,680 --> 01:06:33,680
This is great.

671
01:06:33,680 --> 01:06:34,680
Awesome.

672
01:06:34,680 --> 01:06:39,680
Marjana, Graham, Shane, thank you for coming today.

673
01:06:39,680 --> 01:06:44,000
I hope you have a productive week and I'll speak with you all soon.

674
01:06:44,000 --> 01:06:45,000
See you.

675
01:06:45,000 --> 01:06:46,000
Bye everyone.

676
01:06:46,000 --> 01:07:02,000
Bye now.

