1
00:00:00,000 --> 00:00:10,840
Welcome to the Cannabis Data Science meetup group.

2
00:00:10,840 --> 00:00:12,800
The day before Thanksgiving.

3
00:00:12,800 --> 00:00:17,360
So it's harvest season.

4
00:00:17,360 --> 00:00:19,440
So we'll be talking about

5
00:00:19,440 --> 00:00:22,600
cannabis sales and harvests and all that.

6
00:00:22,600 --> 00:00:24,520
So there's a lot of ground to cover.

7
00:00:24,520 --> 00:00:26,280
We've got some new faces today.

8
00:00:26,280 --> 00:00:32,520
So if we want to just do a quick 30 second intro around.

9
00:00:37,960 --> 00:00:43,080
So Zabihullah, you wouldn't mind

10
00:00:43,080 --> 00:00:44,680
beginning with you and just giving

11
00:00:44,680 --> 00:00:46,600
a quick introduction about yourself.

12
00:00:46,600 --> 00:00:49,120
Okay, sure. Hi everyone.

13
00:00:49,120 --> 00:00:51,000
My name is Zabihullah Zabi.

14
00:00:51,000 --> 00:00:52,760
I'm also called Zabi Buddha.

15
00:00:52,760 --> 00:00:56,920
And currently I'm joining you from Toronto.

16
00:00:56,920 --> 00:00:58,640
Yeah. And first of all,

17
00:00:58,640 --> 00:01:00,200
happy Thanksgiving for everyone.

18
00:01:00,200 --> 00:01:04,600
And the thing is like I am a potential data scientist.

19
00:01:04,600 --> 00:01:08,040
I have taken bootcamp for three months with Lighthouse.

20
00:01:08,040 --> 00:01:13,360
So I kind of know how to do machine learning and do modeling,

21
00:01:13,360 --> 00:01:15,360
analyze data, but I have not joined

22
00:01:15,360 --> 00:01:17,440
the data science field yet.

23
00:01:17,440 --> 00:01:19,800
So in the real world.

24
00:01:19,800 --> 00:01:21,560
Awesome. Yeah.

25
00:01:21,560 --> 00:01:25,120
Heather, would you mind introducing yourself to the group?

26
00:01:25,680 --> 00:01:28,160
Sure. My name is Heather.

27
00:01:28,160 --> 00:01:32,520
Joined cannabis data science a couple of months ago.

28
00:01:32,520 --> 00:01:34,920
Been attending on a regular basis.

29
00:01:34,920 --> 00:01:37,960
I don't come from the background of economics,

30
00:01:37,960 --> 00:01:42,840
but it is something that's helping me to understand the data better.

31
00:01:42,840 --> 00:01:48,600
So I have experience in the lab and also

32
00:01:48,600 --> 00:01:52,200
a love of cannabis.

33
00:01:52,200 --> 00:01:55,800
So with those two things combined, here I am.

34
00:01:55,800 --> 00:01:57,720
Awesome. And exactly.

35
00:01:57,720 --> 00:01:59,840
And so we have people from all different backgrounds,

36
00:01:59,840 --> 00:02:01,280
my background is in economics.

37
00:02:01,280 --> 00:02:03,360
As Heather said, she's spent time in the lab.

38
00:02:03,360 --> 00:02:05,160
She's got a laboratory background.

39
00:02:05,160 --> 00:02:10,960
So whatever you bring to the table just adds value to the group.

40
00:02:10,960 --> 00:02:16,040
So Marjana, you wouldn't mind introducing yourself real quick, would you?

41
00:02:16,040 --> 00:02:19,120
Hey, everyone. Can everyone hear me?

42
00:02:19,120 --> 00:02:19,840
Yes.

43
00:02:19,840 --> 00:02:26,080
OK. I didn't know how to use my pod's mic, so I'm glad it's working.

44
00:02:26,080 --> 00:02:34,880
So yeah, I joined Keenan on Saturday for the stats portion.

45
00:02:34,880 --> 00:02:38,280
And I actually worked in a lab too, just like Heather,

46
00:02:38,280 --> 00:02:40,640
for 10 years and decided that was not for me.

47
00:02:40,640 --> 00:02:46,480
So during the pandemic, I made a switch to software engineering.

48
00:02:46,480 --> 00:02:49,160
I went to a boot camp too, like you, Zabi.

49
00:02:49,160 --> 00:02:54,080
I went to a coding boot camp actually for software engineering.

50
00:02:54,080 --> 00:02:56,560
And right now I work as a tech consultant.

51
00:02:56,560 --> 00:03:03,040
I also like cannabis and think there is great potential for it in the market.

52
00:03:03,040 --> 00:03:07,440
So here I am. Nice to see everyone.

53
00:03:07,440 --> 00:03:08,880
Awesome to see you, Marjana.

54
00:03:08,880 --> 00:03:11,440
And that's interesting that you have a laboratory background as well.

55
00:03:11,440 --> 00:03:14,560
So every now and again, we'll talk about cannabis testing.

56
00:03:14,560 --> 00:03:19,160
So we'd always love your perspective on the lab space.

57
00:03:19,160 --> 00:03:19,960
Sure.

58
00:03:19,960 --> 00:03:22,960
To Parat, if I'm pronouncing that correctly.

59
00:03:22,960 --> 00:03:24,200
And correct me if I'm not.

60
00:03:24,200 --> 00:03:27,800
But would you mind introducing yourself?

61
00:03:27,800 --> 00:03:28,960
Beira?

62
00:03:28,960 --> 00:03:29,960
Yeah. OK. Hi.

63
00:03:29,960 --> 00:03:33,440
I wasn't sure it was me. Yeah, Farah is fine.

64
00:03:33,440 --> 00:03:37,000
Hi. I'm a software engineer.

65
00:03:37,000 --> 00:03:40,480
I am currently in England, south of England.

66
00:03:40,480 --> 00:03:45,200
I've been a software engineer for four years.

67
00:03:45,200 --> 00:03:47,800
And I'm very interested in data science.

68
00:03:47,800 --> 00:03:49,320
Cannabis is quite new to me.

69
00:03:49,320 --> 00:03:56,240
It's because there's legal issues and then other stuff that kind of make people think

70
00:03:56,240 --> 00:03:57,080
that it's not OK.

71
00:03:57,080 --> 00:03:59,120
And I'm like, hold on for a second.

72
00:03:59,120 --> 00:04:00,880
Maybe it is OK.

73
00:04:00,880 --> 00:04:05,280
And my recent experiences with it have been really interesting.

74
00:04:05,280 --> 00:04:13,880
So I really wanted to learn more because I also think that it has a lot more uses than

75
00:04:13,880 --> 00:04:16,520
the government wants us to know.

76
00:04:16,520 --> 00:04:17,280
Awesome.

77
00:04:17,280 --> 00:04:20,400
And we'd love your perspective, too, coming from the UK.

78
00:04:20,400 --> 00:04:26,120
So feel free to share what the industry is like there.

79
00:04:26,120 --> 00:04:28,120
Not even sure if they've got permitted.

80
00:04:28,120 --> 00:04:31,320
Do they have permitted medicinal?

81
00:04:31,320 --> 00:04:33,520
Yeah, it's new.

82
00:04:33,520 --> 00:04:34,120
Yeah, it's new.

83
00:04:34,120 --> 00:04:42,080
It's not, again, it's not very widely used or accepted still, because it's still considered

84
00:04:42,080 --> 00:04:44,040
as a drug.

85
00:04:44,040 --> 00:04:46,920
So it's not that widely used, but definitely still is.

86
00:04:46,920 --> 00:04:47,640
Yeah.

87
00:04:47,640 --> 00:04:48,280
Awesome.

88
00:04:48,280 --> 00:04:53,920
Well, we'll have to hear from you as things develop across the seas.

89
00:04:53,920 --> 00:05:00,000
So Joachim, if I'm pronouncing that correctly, would you mind introducing yourself?

90
00:05:00,000 --> 00:05:03,320
Well, my name is Joachim.

91
00:05:03,320 --> 00:05:04,280
I'm from Germany.

92
00:05:04,280 --> 00:05:06,000
I live in Ottawa.

93
00:05:06,000 --> 00:05:14,160
Like Zabihullah and Marjana, I took a boot camp in data science some time ago.

94
00:05:14,160 --> 00:05:22,240
Like Zabihullah, like in Zabihullah's case, that didn't lead to a related job immediately.

95
00:05:22,240 --> 00:05:32,240
And I must admit, I just came across this meetup, and it seemed a quirky thing to do

96
00:05:32,240 --> 00:05:33,520
for my lunch break.

97
00:05:33,520 --> 00:05:39,240
And I'm looking forward to seeing what's in store for me.

98
00:05:39,240 --> 00:05:40,800
Oh, awesome.

99
00:05:40,800 --> 00:05:43,960
We've got quite the international crowd today, so happy to have you.

100
00:05:43,960 --> 00:05:51,240
We're representing at least four countries, United States, Canada, Britain, and Germany.

101
00:05:51,240 --> 00:06:00,400
So Joachim, sorry for mispronouncing, but you're also welcome to share your perspective.

102
00:06:00,400 --> 00:06:08,520
Correct me if I'm wrong, but there may even be cannabis imports into Germany.

103
00:06:08,520 --> 00:06:16,600
Well, a new government is in the process of being formed.

104
00:06:16,600 --> 00:06:25,680
And they, I think, talked about legalization as something they might want to tackle.

105
00:06:25,680 --> 00:06:30,040
And ideologically, they're in a good position to do so.

106
00:06:30,040 --> 00:06:39,800
But as we speak, I think it remains illegal for all intents and purposes, except for possibly

107
00:06:39,800 --> 00:06:41,240
some medical uses.

108
00:06:41,240 --> 00:06:43,480
I'm not quite sure there.

109
00:06:43,480 --> 00:06:53,240
So a booming export market may come into being, but it's not there at the moment.

110
00:06:53,240 --> 00:06:59,600
And like some other people on this meetup, I am currently in Canada.

111
00:06:59,600 --> 00:07:01,880
I live in Ottawa.

112
00:07:01,880 --> 00:07:08,080
Well, as things develop, love to hear your experience.

113
00:07:08,080 --> 00:07:13,960
And then our last guest, I don't have your name, so you're welcome to introduce yourself.

114
00:07:13,960 --> 00:07:19,240
Oh, sorry, you guys talking to me?

115
00:07:19,240 --> 00:07:20,240
Yes.

116
00:07:20,240 --> 00:07:21,240
Okay.

117
00:07:21,240 --> 00:07:25,280
It's just showing an unknown name for me, but you're welcome to introduce yourself.

118
00:07:25,280 --> 00:07:26,280
Okay.

119
00:07:26,280 --> 00:07:27,280
My name is Courtney.

120
00:07:27,280 --> 00:07:30,880
I'm from Calgary, Alberta in Canada.

121
00:07:30,880 --> 00:07:32,520
And I'm actually attending this meeting.

122
00:07:32,520 --> 00:07:33,800
I'm going to school right now.

123
00:07:33,800 --> 00:07:38,200
I'm completing my master's in data science at the University of Colorado.

124
00:07:38,200 --> 00:07:40,880
So I'm attending this as part of a class, actually.

125
00:07:40,880 --> 00:07:46,400
But I'm excited to see kind of the data science process and relate it to cannabis a little

126
00:07:46,400 --> 00:07:47,400
bit.

127
00:07:47,400 --> 00:07:48,400
Well, cool.

128
00:07:48,400 --> 00:07:51,320
You'll have an interesting story to report back.

129
00:07:51,320 --> 00:07:58,720
So we've got some good material to look at today, and that's exciting.

130
00:07:58,720 --> 00:08:04,760
So we can maybe chat more throughout the presentation.

131
00:08:04,760 --> 00:08:10,400
But without further ado, I'll just show you essentially some material that I prepared

132
00:08:10,400 --> 00:08:16,000
for today to just sort of start a discussion.

133
00:08:16,000 --> 00:08:30,700
So without further ado.

134
00:08:30,700 --> 00:08:39,920
So in the past few weeks, we were looking at Massachusetts in specific.

135
00:08:39,920 --> 00:08:47,160
And so I've got a major correction for today, but that's what we do here, right?

136
00:08:47,160 --> 00:08:50,080
We're doing exploratory statistics.

137
00:08:50,080 --> 00:08:56,800
And as new information comes to light, well, we can update our statistics and our forecast

138
00:08:56,800 --> 00:08:58,720
as well.

139
00:08:58,720 --> 00:09:07,680
So long story short, we were calculating statistics based on just the total number of retail licenses,

140
00:09:07,680 --> 00:09:08,680
379.

141
00:09:08,680 --> 00:09:13,360
And same for manufacturers, 218.

142
00:09:13,360 --> 00:09:22,280
Well, I listened into this Massachusetts public meeting last Thursday, the day after our meetup

143
00:09:22,280 --> 00:09:27,680
last week, and new information came to light.

144
00:09:27,680 --> 00:09:38,560
So they presented this table where you essentially have a breakdown of the applications by type

145
00:09:38,560 --> 00:09:42,080
and the stage that they're at.

146
00:09:42,080 --> 00:09:50,240
So as you can see, not all retailers have commenced operations.

147
00:09:50,240 --> 00:09:59,400
So you've got pending applications, and then you have licenses that have been provisionally

148
00:09:59,400 --> 00:10:01,400
approved.

149
00:10:01,400 --> 00:10:10,760
And then it's not clear if all of these licenses have begun operating yet.

150
00:10:10,760 --> 00:10:22,720
So when we were calculating sales per retailer and we were using 379 retailers, well, that's

151
00:10:22,720 --> 00:10:27,880
going to bias our estimates downwards.

152
00:10:27,880 --> 00:10:38,280
So our estimate of sales per retailer is probably much lower than the actual.
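
A minimal sketch of that downward bias, in plain Python. Only the 379 license count comes from the discussion; the total-sales figure and the number of operating retailers are made-up placeholders, and `sales_per_retailer` is a hypothetical helper name.

```python
# Hypothetical inputs: only the 379 license count comes from the discussion.
TOTAL_LICENSES = 379          # all retail licenses, operating or not
OPERATING_RETAILERS = 200     # placeholder: retailers that have commenced operations
ANNUAL_SALES = 1_000_000_000  # placeholder: total annual retail sales in dollars


def sales_per_retailer(total_sales: float, retailers: int) -> float:
    """Average sales per retailer for a given retailer count."""
    if retailers <= 0:
        raise ValueError("retailers must be positive")
    return total_sales / retailers


per_license = sales_per_retailer(ANNUAL_SALES, TOTAL_LICENSES)
per_operating = sales_per_retailer(ANNUAL_SALES, OPERATING_RETAILERS)

# Dividing by every license understates the average whenever
# some licensees are not yet selling.
print(f"Per license:   ${per_license:,.0f}")
print(f"Per operating: ${per_operating:,.0f}")
print(f"Understated by a factor of {per_operating / per_license:.2f}")
```

Whatever the true inputs are, the size of the understatement is just the ratio of licenses to operating retailers.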

153
00:10:38,280 --> 00:10:43,680
So today, we can revise that statistic.

154
00:10:43,680 --> 00:10:53,000
And it got me thinking, well, what is essentially the average sales per retailer?

155
00:10:53,000 --> 00:11:00,600
Because it would be nice to have a figure to try to gauge our estimates against to see

156
00:11:00,600 --> 00:11:04,640
if we're in the right ballpark.

157
00:11:04,640 --> 00:11:16,760
So did some digging, and Nevada commissioned a technical memorandum.

158
00:11:16,760 --> 00:11:18,960
And I'll share the link with you.

159
00:11:18,960 --> 00:11:26,960
It's definitely worth reading through because they've done a lot of work that's in a similar

160
00:11:26,960 --> 00:11:30,320
vein as work that we've done.

161
00:11:30,320 --> 00:11:38,040
And one can only imagine how much the firm that was commissioned to create the report

162
00:11:38,040 --> 00:11:40,120
was paid.

163
00:11:40,120 --> 00:11:43,320
And this is all public data.

164
00:11:43,320 --> 00:11:49,000
So that's what we're here at the Cannabis Data Science Group to do is use public data

165
00:11:49,000 --> 00:11:54,440
to create statistics that people find valuable.

166
00:11:54,440 --> 00:12:03,680
So if the state of Nevada paid a lump sum of money to get these statistics calculated,

167
00:12:03,680 --> 00:12:05,840
then they must be valuable.

168
00:12:05,840 --> 00:12:12,840
So we're going to essentially try to recalculate these statistics, right?

169
00:12:12,840 --> 00:12:21,280
Because that's part of the scientific process is to replicate other people's results.

170
00:12:21,280 --> 00:12:23,560
One second.

171
00:12:23,560 --> 00:12:27,200
Let's get...

172
00:12:27,200 --> 00:12:29,240
And welcome to the group.

173
00:12:29,240 --> 00:12:30,240
OK.

174
00:12:30,240 --> 00:12:35,160
So just had to let someone in.

175
00:12:35,160 --> 00:12:40,360
So we're going to try to replicate these data points here.

176
00:12:40,360 --> 00:12:51,440
So for example, in Massachusetts, they calculated in 2020, I believe.

177
00:12:51,440 --> 00:13:03,680
Once again, this is sort of going to be the lesson of the day is calculating precise numbers

178
00:13:03,680 --> 00:13:07,960
is difficult and hard to reproduce.

179
00:13:07,960 --> 00:13:14,000
And that's why we're trying to do things open source as an open box, right?

180
00:13:14,000 --> 00:13:17,560
Because that way all of you can look at the source code.

181
00:13:17,560 --> 00:13:25,000
Anyone in the world can look at the source code and critique it or improve upon it.

182
00:13:25,000 --> 00:13:30,760
So that way it's not just a black box that's spitting out these numbers.

183
00:13:30,760 --> 00:13:37,680
We actually have a nice transparent box so we can see the process about how these numbers

184
00:13:37,680 --> 00:13:40,360
are calculated.

185
00:13:40,360 --> 00:13:41,360
Because...

186
00:13:41,360 --> 00:13:49,480
So right off the bat, right, they say by state 2020.

187
00:13:49,480 --> 00:13:54,920
Well, their dispensary count says 2021.

188
00:13:54,920 --> 00:13:59,040
So well, we've got the numbers right here in front of us.

189
00:13:59,040 --> 00:14:01,920
And so we should check that.

190
00:14:01,920 --> 00:14:10,800
But essentially, where exactly are they getting this revenue per dispensary?

191
00:14:10,800 --> 00:14:20,460
And as we were just talking about in Massachusetts, these retailers, not all of them necessarily

192
00:14:20,460 --> 00:14:22,880
are operating.

193
00:14:22,880 --> 00:14:38,960
So where are they getting this 84 retailer number from?

194
00:14:38,960 --> 00:14:42,520
Are they counting just final licenses?

195
00:14:42,520 --> 00:14:51,220
Are they just counting people that they think have commenced operations in 2020?

196
00:14:51,220 --> 00:14:57,440
So this is something that we're going to poke at.

197
00:14:57,440 --> 00:15:01,840
And there's some dimensions to this that we'll talk about here shortly.

198
00:15:01,840 --> 00:15:12,120
But long story short, we're, one, we're going to try to reproduce these numbers, but reproducing

199
00:15:12,120 --> 00:15:21,200
numbers or statistics just for statistics sake is not that interesting.

200
00:15:21,200 --> 00:15:27,920
So welcome, Eric.

201
00:15:27,920 --> 00:15:33,600
So we need a research question.

202
00:15:33,600 --> 00:15:44,320
So what jumps out to me and the reason Nevada commissioned the report to begin with is they

203
00:15:44,320 --> 00:15:54,340
were curious about how many potential new retailers they could bring into the market

204
00:15:54,340 --> 00:16:01,480
and how many retailers will the market sustain?

205
00:16:01,480 --> 00:16:07,440
So for example, states like Oklahoma have gone a route where they haven't really introduced

206
00:16:07,440 --> 00:16:13,680
a cap and it's just you pay the license fee and you can open up shop.

207
00:16:13,680 --> 00:16:23,160
So we may want to essentially compare states that have caps and those that don't, and not

208
00:16:23,160 --> 00:16:31,440
just caps, but essentially policies that affect how many dispensaries can open up. And why?

209
00:16:31,440 --> 00:16:44,920
Well, we're going to incorporate a little bit of economics here, not to scare anyone off, but

210
00:16:44,920 --> 00:16:51,200
this is a good introduction to industrial organization and we're just scraping the

211
00:16:51,200 --> 00:16:52,980
iceberg here.

212
00:16:52,980 --> 00:16:59,480
But basically, industrial organization economists break down the market into four different

213
00:16:59,480 --> 00:17:01,560
structures.

214
00:17:01,560 --> 00:17:07,320
Of course, if you studied economics, you know about perfect competition and this is the

215
00:17:07,320 --> 00:17:12,600
baseline assumption in most models.

216
00:17:12,600 --> 00:17:19,400
We're assuming infinite firms, no barriers to entry, no market power.

217
00:17:19,400 --> 00:17:21,740
So price is equal to cost.

218
00:17:21,740 --> 00:17:25,900
So that means no one's making a profit.

219
00:17:25,900 --> 00:17:30,400
These are quite the assumptions.

220
00:17:30,400 --> 00:17:40,940
And so, you know, similarly on the other side of the spectrum, you have the monopoly where

221
00:17:40,940 --> 00:17:45,800
you have one singular firm, right?

222
00:17:45,800 --> 00:17:47,000
There's only one.

223
00:17:47,000 --> 00:17:49,960
So the barriers to entry are essentially infinite.

224
00:17:49,960 --> 00:17:54,160
You can't enter and they have high market power.

225
00:17:54,160 --> 00:18:02,560
So they can set the price to maximize their profits.

226
00:18:02,560 --> 00:18:05,500
And they do so.
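
To make the two ends of the spectrum concrete, here is a textbook-style sketch, assuming a linear demand curve P = a - b * Q and a constant cost per unit c. The numbers and function names are hypothetical and are not cannabis data.

```python
def competitive_outcome(a: float, b: float, c: float):
    """Perfect competition: price equals cost, so profit is zero."""
    price = c
    quantity = (a - price) / b
    profit = (price - c) * quantity
    return price, quantity, profit


def monopoly_outcome(a: float, b: float, c: float):
    """Monopoly: choose Q to maximize (P - c) * Q with P = a - b * Q."""
    quantity = (a - c) / (2 * b)  # from setting the derivative of profit to zero
    price = a - b * quantity
    profit = (price - c) * quantity
    return price, quantity, profit


# Hypothetical demand and cost parameters.
a, b, c = 10.0, 1.0, 2.0
comp_price, comp_qty, comp_profit = competitive_outcome(a, b, c)
mono_price, mono_qty, mono_profit = monopoly_outcome(a, b, c)

print(f"Competition: price={comp_price}, quantity={comp_qty}, profit={comp_profit}")
print(f"Monopoly:    price={mono_price}, quantity={mono_qty}, profit={mono_profit}")
```

With the same demand and cost, the monopolist sells less at a higher price and keeps a positive profit, while the competitive price sits at cost.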

227
00:18:05,500 --> 00:18:08,240
There are a few examples of monopolies.

228
00:18:08,240 --> 00:18:16,760
So in certain geographical areas, you may have one power company or one internet provider.

229
00:18:16,760 --> 00:18:25,720
So this is possible, but one can argue that there's not that many examples of pure monopolies.

230
00:18:25,720 --> 00:18:30,440
And then, like we said, you don't really see perfect competition.

231
00:18:30,440 --> 00:18:39,480
So that brings us to the two market structures that are observed empirically.

232
00:18:39,480 --> 00:18:48,400
And these are monopolistic competition, where you have many firms, low barriers to entry,

233
00:18:48,400 --> 00:18:49,880
low market power.

234
00:18:49,880 --> 00:18:55,280
So this you could think of as restaurants.

235
00:18:55,280 --> 00:18:58,080
There's many, many, many, many restaurants.

236
00:18:58,080 --> 00:19:05,000
The barriers to entry, there are some, but they're relatively low.

237
00:19:05,000 --> 00:19:10,200
You know, think of the lemonade stand.

238
00:19:10,200 --> 00:19:12,320
And their market power is relatively low.

239
00:19:12,320 --> 00:19:17,200
You can't set price that much above cost.

240
00:19:17,200 --> 00:19:19,920
So your profits are going to be low.

241
00:19:19,920 --> 00:19:27,640
So that's why you see in the restaurant industry, low profit margins and lots of turnover.

242
00:19:27,640 --> 00:19:35,520
So you see lots of firms entering, lots of firms exiting.

243
00:19:35,520 --> 00:19:39,080
As you move, and you can think of this almost as a spectrum.

244
00:19:39,080 --> 00:19:45,440
So as you move towards monopoly, you get oligopoly, where you're going to have fewer and fewer firms.

245
00:19:45,440 --> 00:19:51,520
You're going to observe more and more barriers to entry.

246
00:19:51,520 --> 00:20:01,600
And each player's market power depends on game theory and market dynamics.

247
00:20:01,600 --> 00:20:05,500
So their market power may be low, it may be high.

248
00:20:05,500 --> 00:20:12,080
So it's a bit more indeterminate, but one can generally say that they've got medium

249
00:20:12,080 --> 00:20:15,480
to high market power.

250
00:20:15,480 --> 00:20:19,460
So this is just a little background of economics.

251
00:20:19,460 --> 00:20:22,760
Why does it matter?

252
00:20:22,760 --> 00:20:40,160
Well, you can think of dispensaries per 100,000 adults as the measure of the number of firms.

253
00:20:40,160 --> 00:20:49,440
So if we ever saw infinite dispensaries per 100,000 people, that would be perfect competition.

254
00:20:49,440 --> 00:20:58,080
If we just saw a really, really small number where there's just one, it would be a fraction,

255
00:20:58,080 --> 00:21:05,160
where there's one dispensary for everyone, that would be a monopoly.

256
00:21:05,160 --> 00:21:14,680
So that's sort of the idea is we can use the dispensaries per 100,000 people as a measure

257
00:21:14,680 --> 00:21:21,440
of the number of firms in the market relative to population.

258
00:21:21,440 --> 00:21:25,500
And it doesn't control for costs.

259
00:21:25,500 --> 00:21:32,200
So one would assume the costs of, say, opening a retailer vary state by state.

260
00:21:32,200 --> 00:21:35,320
That goes back to the barriers to entry.

261
00:21:35,320 --> 00:21:40,120
So we can't measure costs, we can't measure profits.

262
00:21:40,120 --> 00:21:43,880
But as the next best thing, we can measure revenue.

263
00:21:43,880 --> 00:21:51,160
So we can essentially measure revenue as a proxy for profit.

264
00:21:51,160 --> 00:22:01,840
Like I said, it's not necessarily going to be one to one because costs are probably not

265
00:22:01,840 --> 00:22:05,440
the same across states.

266
00:22:05,440 --> 00:22:10,720
So long story short, if we're just looking at a couple of these, lots of dispensaries

267
00:22:10,720 --> 00:22:21,220
in Alaska, the next most dense, as far as dispensaries go, is Oregon.

268
00:22:21,220 --> 00:22:26,060
So 23 dispensaries per 100,000 people.

269
00:22:26,060 --> 00:22:33,320
And then on the other end of the spectrum, you've got the low point of Illinois, one

270
00:22:33,320 --> 00:22:42,440
dispensary per 100,000 people, and then you've got some other ones like Vermont, Maine, and

271
00:22:42,440 --> 00:22:52,840
Arizona, and Massachusetts that also have a low number of dispensaries per 100,000.

272
00:22:52,840 --> 00:23:01,200
Well just visually, if you're just trying to do just a visual correlation here, you

273
00:23:01,200 --> 00:23:11,960
see, oh look, Oregon revenue per dispensary is about 1.5 million.

274
00:23:11,960 --> 00:23:21,760
On the other side, Illinois: one dispensary per 100,000, revenue 10 million annually.

275
00:23:21,760 --> 00:23:28,840
Well obviously it's not like a perfect correlation here because look at, we've got some outliers,

276
00:23:28,840 --> 00:23:29,840
right?

277
00:23:29,840 --> 00:23:31,440
Maine.

278
00:23:31,440 --> 00:23:35,600
Maine doesn't have many, and so look, this is real interesting.

279
00:23:35,600 --> 00:23:41,760
So here Maine has 1.4 dispensaries per 100,000.

280
00:23:41,760 --> 00:23:44,160
Massachusetts has a comparable 1.5.

281
00:23:44,160 --> 00:23:53,320
But look at the difference in revenue in a Maine retailer on average versus a Massachusetts

282
00:23:53,320 --> 00:23:54,320
retailer.

283
00:23:54,320 --> 00:24:04,640
So on average you're having less than a third of a million in revenue in Maine, whereas

284
00:24:04,640 --> 00:24:10,520
in Massachusetts it's almost 8.3 million per year.

285
00:24:10,520 --> 00:24:21,680
So we can use statistics and we can actually quantify what this relationship is.

286
00:24:21,680 --> 00:24:27,560
So without further ado, let's do just that.

287
00:24:27,560 --> 00:24:33,880
And I forgot to commit today's code.

288
00:24:33,880 --> 00:24:47,320
So long story short, you can check out the cannabis data science repository to get the

289
00:24:47,320 --> 00:24:49,440
code as we go through it.

290
00:24:49,440 --> 00:24:55,880
So for those who just joined today, maybe a little hard to follow along, but in the

291
00:24:55,880 --> 00:25:07,160
future I'll post the code here so that way you can follow along in Python if you're a

292
00:25:07,160 --> 00:25:09,000
Pythonista.

293
00:25:09,000 --> 00:25:17,120
If you're not, it's not the end of the world. All of this is just statistics and can be reproduced

294
00:25:17,120 --> 00:25:20,680
in your favorite programming language.

295
00:25:20,680 --> 00:25:34,240
So without further ado, I've just taken this data and put it into Excel.

296
00:25:34,240 --> 00:25:41,660
Bear with me, Excel updated itself.

297
00:25:41,660 --> 00:25:45,480
So long story short, here's the data.

298
00:25:45,480 --> 00:25:53,240
I just copied and pasted it from this table and I'll share the link with you.

299
00:25:53,240 --> 00:25:55,520
Here's the link right here to the report.

300
00:25:55,520 --> 00:26:00,400
And so I'll share this with you after the meetup.

301
00:26:00,400 --> 00:26:08,000
And so this is the technical memorandum, as I said, commissioned by Nevada, produced by

302
00:26:08,000 --> 00:26:10,760
RCG Economics.

303
00:26:10,760 --> 00:26:14,220
They probably didn't produce the report for free.

304
00:26:14,220 --> 00:26:28,400
So that means the statistics have some value, what the value is, not 100% certain, depends

305
00:26:28,400 --> 00:26:29,400
on how much they paid.

306
00:26:29,400 --> 00:26:34,360
But long story short, let's calculate these statistics.

307
00:26:34,360 --> 00:26:39,280
So first, let's just use their data.

308
00:26:39,280 --> 00:26:46,640
And then, as I said, we're going to try to reproduce their statistics.

309
00:26:46,640 --> 00:26:55,680
So first things first, we read in the retailer stats.

310
00:26:55,680 --> 00:26:59,200
Let me just make sure.

311
00:26:59,200 --> 00:27:03,920
OK, so we've got the retailer statistics here.

312
00:27:03,920 --> 00:27:12,560
Also, if I'm just droning on too long at any point, just feel free to chime in and

313
00:27:12,560 --> 00:27:16,800
speak up if you have any questions at any point.

314
00:27:16,800 --> 00:27:22,680
OK, so we've got our states here.

315
00:27:22,680 --> 00:27:30,400
As we can see, some states are missing revenue per retailer.

316
00:27:30,400 --> 00:27:36,600
Not the end of the world, we just don't have those observations to work from.

317
00:27:36,600 --> 00:27:44,380
So here, we just exclude all the, and I'm calling them observations, because as you

318
00:27:44,380 --> 00:27:53,280
can see, we've got two observations here in Nevada, including Nevada as a whole.

319
00:27:53,280 --> 00:27:58,900
So in my analysis today, I'm including these two counties.

320
00:27:58,900 --> 00:28:08,720
If you think it's better to exclude them, then they can be excluded.

321
00:28:08,720 --> 00:28:14,680
For our sake, I'm not a big fan of throwing away data.

322
00:28:14,680 --> 00:28:20,160
And so I think they're interesting observations, and so I'm going to include them, because

323
00:28:20,160 --> 00:28:28,280
we only have 12 observations here.

324
00:28:28,280 --> 00:28:34,360
So I don't want to throw any away if I can help it.
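
A small sketch of that filtering step in plain Python. The three named rows use figures quoted in this meetup (rounded); "Example State" is a made-up row standing in for a state with no revenue figure, and the field names are hypothetical.

```python
# Each row is one observation (a state, or a county within a state).
observations = [
    {"state": "Oregon", "retailers_per_100k": 23.0, "revenue_per_retailer": 1_500_000},
    {"state": "Illinois", "retailers_per_100k": 1.0, "revenue_per_retailer": 10_000_000},
    {"state": "Massachusetts", "retailers_per_100k": 1.5, "revenue_per_retailer": 8_300_000},
    # Made-up row: revenue per retailer is missing, so it cannot enter the regression.
    {"state": "Example State", "retailers_per_100k": 5.0, "revenue_per_retailer": None},
]

# Keep only the rows where the dependent variable was actually reported.
complete = [row for row in observations if row["revenue_per_retailer"] is not None]

print(f"{len(complete)} of {len(observations)} observations are usable")
for row in complete:
    print(row["state"], row["retailers_per_100k"], row["revenue_per_retailer"])
```

Only rows missing the value get dropped; observations such as the two counties stay in unless you decide to exclude them on purpose.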

325
00:28:34,360 --> 00:28:41,840
And so we only have 12 observations, so I'm sure a lot of you have taken statistics, and

326
00:28:41,840 --> 00:28:49,840
your statistics professor probably told you that you would like a minimum of maybe 25

327
00:28:49,840 --> 00:28:54,600
or 30 observations to run a regression.

328
00:28:54,600 --> 00:29:02,560
And this is true because the law of large numbers starts to kick in the more and more

329
00:29:02,560 --> 00:29:03,800
observations you get.

330
00:29:03,800 --> 00:29:12,320
And so what the law of large numbers is, is the more you observe a data set, the more

331
00:29:12,320 --> 00:29:15,800
likely you are to have a representative sample.

332
00:29:15,800 --> 00:29:22,680
So if you observe every point, then you've observed the population.

333
00:29:22,680 --> 00:29:29,400
If you're only observing certain points, you have a sample of the population, and basically

334
00:29:29,400 --> 00:29:36,600
the larger and larger your sample gets, the more representative it gets, and you're going

335
00:29:36,600 --> 00:29:40,760
to have more accurate results.
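
A quick simulation of that idea, as a sketch only: it draws uniform random numbers between 0 and 1, whose true mean is 0.5, and shows the sample mean for growing sample sizes. The sample sizes and the seed are arbitrary choices.

```python
import random

random.seed(0)  # fixed seed so repeated runs give the same draws

TRUE_MEAN = 0.5  # mean of a uniform distribution on [0, 1)


def sample_mean(n: int) -> float:
    """Mean of n independent uniform draws."""
    return sum(random.random() for _ in range(n)) / n


# Larger samples tend to land closer to the population mean.
for n in (12, 30, 1_000, 100_000):
    estimate = sample_mean(n)
    print(f"n={n:>7}: mean={estimate:.4f}, error={abs(estimate - TRUE_MEAN):.4f}")

large_sample_error = abs(sample_mean(100_000) - TRUE_MEAN)
```

With 12 draws the estimate can miss by a noticeable margin, which is the caveat to keep in mind for a 12-observation regression.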

336
00:29:40,760 --> 00:29:46,880
Here in the Meetup group, we take what we're given and we make the best of it.

337
00:29:46,880 --> 00:29:53,560
So we're going to run a regression here with 12 observations.

338
00:29:53,560 --> 00:30:01,520
And we're essentially just going to run a regression of revenue per retailer on retailers

339
00:30:01,520 --> 00:30:03,880
per 100,000.

340
00:30:03,880 --> 00:30:14,360
So I'm assuming that the direction of the relationship is the more retailers that are

341
00:30:14,360 --> 00:30:21,440
licensed into the market, I'm assuming that's independent of revenue per retailer.

342
00:30:21,440 --> 00:30:31,120
So the commission is just licensing people as they make successful applications.

343
00:30:31,120 --> 00:30:38,800
And then the revenue per retailer is dependent on the number of retailers in the market.

344
00:30:38,800 --> 00:30:45,160
So our dependent variable is revenue per retailer, and our independent variable is retailers per 100,000.

345
00:30:45,160 --> 00:30:49,760
We run our regression.

346
00:30:49,760 --> 00:30:58,040
We have a weak fit, so low r squared, so there's a lot of variation left to be explained.
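
Here is a rough sketch of that kind of regression in plain Python, using the closed-form formulas for a one-variable least-squares fit. It is not the code from the meetup: it uses only the four data points read aloud earlier (Oregon, Illinois, Maine, Massachusetts), with Maine approximated as 0.3 million, so it will not reproduce the 12-observation estimate. `ols_fit` is a hypothetical helper.

```python
def ols_fit(x, y):
    """Simple least squares of y on x: returns slope, intercept, R-squared."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)  # share of variation in y explained by x
    return slope, intercept, r_squared


# Retailers per 100,000 (independent variable): OR, IL, ME, MA.
retailers_per_100k = [23.0, 1.0, 1.4, 1.5]
# Revenue per retailer in millions of dollars (dependent variable).
# Maine's 0.3 is an approximation of "less than a third of a million".
revenue_millions = [1.5, 10.0, 0.3, 8.3]

slope, intercept, r_squared = ols_fit(retailers_per_100k, revenue_millions)
print(f"slope={slope:.3f} million per extra retailer per 100k")
print(f"intercept={intercept:.3f} million, R-squared={r_squared:.3f}")
```

Even on these four points the slope comes out negative and the R-squared is low, which is the same qualitative picture: more retailers per capita goes with less revenue per retailer, with plenty of variation left unexplained.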

347
00:30:58,040 --> 00:31:08,160
However, even with our low number of observations, we do have...

348
00:31:08,160 --> 00:31:16,400
If you're a frequentist, you almost would be able to declare statistical significance

349
00:31:16,400 --> 00:31:19,880
from this parameter here.

350
00:31:19,880 --> 00:31:27,560
But as some of you may have picked up, I'm a bit more on the Bayesian spectrum in that

351
00:31:27,560 --> 00:31:39,800
I don't put as much importance on significant values as certain people, and I'm open about

352
00:31:39,800 --> 00:31:43,320
my biases and my priors.

353
00:31:43,320 --> 00:31:46,440
But I still do frequentist statistics.

354
00:31:46,440 --> 00:31:50,480
So anyways, but that's a whole nother can of worms.

355
00:31:50,480 --> 00:32:06,240
If you're interested in statistics, do some searching about frequentism and Bayesian statistics.

356
00:32:06,240 --> 00:32:11,280
Whole can of worms, so I'm not going to get into it today.

357
00:32:11,280 --> 00:32:26,000
But long story short, we at least have a coefficient here so we can make a statement about the

358
00:32:26,000 --> 00:32:32,600
statistical relationship, the correlation between these variables.

359
00:32:32,600 --> 00:32:37,800
And in English, essentially what we're saying is...

360
00:32:37,800 --> 00:32:45,760
So here, we estimated a negative coefficient on retailers per 100,000, which is sort of

361
00:32:45,760 --> 00:32:54,520
what we expected in that as there are more and more retailers entering the market, revenue

362
00:32:54,520 --> 00:32:58,240
per retailer is decreasing.

363
00:32:58,240 --> 00:33:02,160
So they're cannibalizing each other.

364
00:33:02,160 --> 00:33:07,960
It's not just one retailer comes online and all those sales just come out of thin air.

365
00:33:07,960 --> 00:33:11,760
No, there's only so many sales happening in the market.

366
00:33:11,760 --> 00:33:21,120
And if another retailer enters, some sales will filter from other retailers to that retailer.

367
00:33:21,120 --> 00:33:31,460
So what we're saying is, if retailers per 100,000 adults increases by one, everything

368
00:33:31,460 --> 00:33:45,560
else held constant, you would expect the average revenue per retailer to decrease by 350,000

369
00:33:45,560 --> 00:33:48,520
per year.

370
00:33:48,520 --> 00:33:55,920
So it sounds like a lot.

371
00:33:55,920 --> 00:34:04,240
Just to put this in perspective real quick, I was just going to show you Massachusetts

372
00:34:04,240 --> 00:34:11,240
population here.

373
00:34:11,240 --> 00:34:14,920
To kind of put things into perspective here.

374
00:34:14,920 --> 00:34:17,920
One second.

375
00:34:17,920 --> 00:34:19,080
Okay.

376
00:34:19,080 --> 00:34:20,400
So right.

377
00:34:20,400 --> 00:34:26,120
So one per 100,000.

378
00:34:26,120 --> 00:34:35,080
Well, that would be

379
00:34:35,080 --> 00:34:42,040
approximately 69 retailers in Massachusetts.

380
00:34:42,040 --> 00:34:56,160
So what our analysis would suggest is, if 69 retailers came online in Massachusetts,

381
00:34:56,160 --> 00:35:03,080
everything else held constant, you would expect the average revenue per retailer to decrease

382
00:35:03,080 --> 00:35:05,080
by 350,000.
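As a quick check of that arithmetic, here is a small sketch. It assumes a Massachusetts population of roughly 6.9 million, which is what the "approximately 69 retailers" figure implies, and it takes the coefficient of negative 350,000 from the talk. The function name is hypothetical.

```python
# Assumption: a Massachusetts population of roughly 6.9 million,
# which is what "approximately 69 retailers" per 1-per-100,000 implies.
population = 6_900_000
beta = -350_000  # estimated change in annual revenue per retailer

# One additional retailer per 100,000 people, expressed as a retailer count.
retailers_per_unit = population / 100_000


def expected_revenue_change(new_retailers):
    """Expected change in average annual revenue per retailer,
    everything else held constant, under the linear estimate."""
    return beta * (new_retailers / retailers_per_unit)


print(retailers_per_unit)
print(expected_revenue_change(69))
```

Because the estimate is linear, half as many new retailers would imply half the change, although with only 12 observations that kind of extrapolation is fragile.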

383
00:35:05,080 --> 00:35:10,760
Well, we've got a bunch of data here.

384
00:35:10,760 --> 00:35:15,680
If you listen to Edward Tufte at all, and if you haven't, I'd recommend checking out his

385
00:35:15,680 --> 00:35:16,680
books.

386
00:35:16,680 --> 00:35:19,540
He's all about visualizing data.

387
00:35:19,540 --> 00:35:22,880
So enough looking at the numbers.

388
00:35:22,880 --> 00:35:26,240
Let's visualize them.

389
00:35:26,240 --> 00:35:36,600
So here is a scatter plot of revenue to retailers plus the regression line.

390
00:35:36,600 --> 00:35:43,480
And so I'll share the code with you, but here I'm going to redo this

391
00:35:43,480 --> 00:35:51,120
same figure inspired by Edward Tufte, and we're going to make it look nice.

392
00:35:51,120 --> 00:35:59,560
And then we're going to talk about it because that's what we're here to do.

393
00:35:59,560 --> 00:36:03,040
So just made a beautiful chart.

394
00:36:03,040 --> 00:36:11,800
And so now let's talk about that rather than just looking at numbers.

395
00:36:11,800 --> 00:36:24,640
So here we have plotted all of our observations, which we have from the table that was prepared

396
00:36:24,640 --> 00:36:29,120
for Nevada.

397
00:36:29,120 --> 00:36:39,040
As we observed visually earlier, just from the table, Maine does in fact look to be like

398
00:36:39,040 --> 00:36:41,200
an outlier.

399
00:36:41,200 --> 00:36:46,040
So that's a whole other research question of its own.

400
00:36:46,040 --> 00:36:48,400
Why is Maine an outlier?

401
00:36:48,400 --> 00:36:52,920
So it's not just population alone.

402
00:36:52,920 --> 00:36:54,740
There's something else going on there.

403
00:36:54,740 --> 00:37:02,880
So it would be worthwhile trying to answer that question.

404
00:37:02,880 --> 00:37:06,600
And then we see the negative relationship here.

405
00:37:06,600 --> 00:37:17,040
And we see, as we observed, Oregon on the far end where they have many retailers per

406
00:37:17,040 --> 00:37:19,340
100,000 adults.

407
00:37:19,340 --> 00:37:25,040
The average revenue is low.

408
00:37:25,040 --> 00:37:31,440
And what I think is interesting is it looks like there's almost two groups here.

409
00:37:31,440 --> 00:37:38,860
So if you were going to do like a principal component analysis and try to group these,

410
00:37:38,860 --> 00:37:40,920
I would almost do two groups.

411
00:37:40,920 --> 00:37:42,640
I would do these.

412
00:37:42,640 --> 00:37:49,460
I would do this group, like this low revenue per dispensary.

413
00:37:49,460 --> 00:37:56,760
And that would be Maine, Michigan, Washington, Colorado, and Oregon.

414
00:37:56,760 --> 00:38:04,240
And I would try to find some, maybe there's some trait that all of these states have that

415
00:38:04,240 --> 00:38:05,720
this group doesn't, right?

416
00:38:05,720 --> 00:38:10,440
Because this looks almost like a separate group here, where you've got Arizona, Illinois,

417
00:38:10,440 --> 00:38:15,480
Oregon, Nevada, Massachusetts, California.

418
00:38:15,480 --> 00:38:21,240
What is different about those states than these states?

419
00:38:21,240 --> 00:38:35,560
Well, that's a whole question that we may not be able to answer today.

420
00:38:35,560 --> 00:38:39,520
So that's what being a scientist is all about, right?

421
00:38:39,520 --> 00:38:43,080
Now it's time to hit the books, do some research,

422
00:38:43,080 --> 00:38:46,920
and look at the cannabis laws and everything

423
00:38:46,920 --> 00:38:48,840
in these states versus these states

424
00:38:48,840 --> 00:38:53,720
and see if there's some sort of systematic difference.

425
00:38:53,720 --> 00:38:58,360
One can note, right, Washington, Colorado, the two first states

426
00:38:58,360 --> 00:39:03,880
to legalize cannabis, followed shortly by Oregon.

427
00:39:03,880 --> 00:39:08,600
Michigan's interesting because they legalized in 2018,

428
00:39:08,600 --> 00:39:13,040
but they have a long history of medicinal cannabis

429
00:39:13,040 --> 00:39:15,840
that may go back till 2008 or so.

430
00:39:15,840 --> 00:39:18,760
Don't quote me on that.

431
00:39:18,760 --> 00:39:24,440
So and I don't know the history of Maine cannabis that well.

432
00:39:24,440 --> 00:39:27,680
Most people know the history of California cannabis, right?

433
00:39:27,680 --> 00:39:34,040
That's where cannabis has famously come from historically.

434
00:39:34,040 --> 00:39:36,800
And if you talk to people in the industry,

435
00:39:36,800 --> 00:39:40,880
everyone will say that in California they've

436
00:39:40,880 --> 00:39:42,320
got big problems right now.

437
00:39:42,320 --> 00:39:47,360
So it's tough for licensees there because anecdotally,

438
00:39:47,360 --> 00:39:52,040
there's still a large illegal market in California.

439
00:39:52,040 --> 00:39:54,360
I always find it funny when people

440
00:39:54,360 --> 00:39:56,880
try to estimate the size of the illegal market

441
00:39:56,880 --> 00:40:01,520
because I mean, that's the whole point is it's not measured.

442
00:40:01,520 --> 00:40:02,400
It's illegal.

443
00:40:02,400 --> 00:40:06,840
So if we were keeping a tab on it,

444
00:40:06,840 --> 00:40:09,160
then it wouldn't be possible.

445
00:40:09,160 --> 00:40:12,440
So by its nature, it's unmeasured.

446
00:40:12,440 --> 00:40:16,800
But some people estimate that maybe 50% to 75%

447
00:40:16,800 --> 00:40:21,200
of all production in California is on the black market.

448
00:40:21,200 --> 00:40:24,960
And so that may have some sort of effect

449
00:40:24,960 --> 00:40:30,040
on what's going on with the structure in California.

450
00:40:30,040 --> 00:40:32,280
Massachusetts is much different, right?

451
00:40:32,280 --> 00:40:36,560
So Massachusetts has tried to be an industry leader,

452
00:40:36,560 --> 00:40:38,200
and they're open about that.

453
00:40:38,200 --> 00:40:40,120
So if you listen to the public meeting,

454
00:40:40,120 --> 00:40:42,640
they'll talk about how they are.

455
00:40:42,640 --> 00:40:44,200
It's not a goal of theirs, but they

456
00:40:44,200 --> 00:40:47,960
don't mind being a leader for other states.

457
00:40:47,960 --> 00:40:53,840
And they've consulted with Maine, New York, New Jersey,

458
00:40:53,840 --> 00:40:57,960
Virginia, Rhode Island, Connecticut, right?

459
00:40:57,960 --> 00:41:02,120
They are talking with all of the leaders

460
00:41:02,120 --> 00:41:05,800
and commissions in all of these other states

461
00:41:05,800 --> 00:41:09,720
to talk about what's working well for them and what's not.

462
00:41:09,720 --> 00:41:12,320
So Massachusetts is definitely trying to be a leader here.

463
00:41:16,800 --> 00:41:19,680
And then I just don't know that much

464
00:41:19,680 --> 00:41:21,520
about some of these other states.

465
00:41:21,520 --> 00:41:23,680
Nevada is relatively new.

466
00:41:23,680 --> 00:41:26,360
Don't know too much about Nevada.

467
00:41:26,360 --> 00:41:33,600
Illinois allowed adult use on January 1st of 2020.

468
00:41:33,600 --> 00:41:37,120
Arizona is an interesting market because they've

469
00:41:37,120 --> 00:41:43,080
had medicinal for a long time, and they just permitted

470
00:41:43,080 --> 00:41:45,000
adult use not that long ago.

471
00:41:48,040 --> 00:41:53,960
So long story short, we just have the relationship here.

472
00:41:53,960 --> 00:41:59,800
So we don't know for certain why the states are shaking out

473
00:41:59,800 --> 00:42:01,560
along these different points.

474
00:42:01,560 --> 00:42:09,880
But we can observe that there's a negative correlation

475
00:42:09,880 --> 00:42:16,000
between retailers per 100,000 and annual revenue.

476
00:42:16,000 --> 00:42:21,600
And this is in line with our economic theory, right?

477
00:42:21,600 --> 00:42:24,840
The more firms there are in the market,

478
00:42:24,840 --> 00:42:27,640
the more competitive it becomes.

479
00:42:27,640 --> 00:42:33,480
And they're not able to raise prices above costs.

480
00:42:33,480 --> 00:42:36,040
And so their average revenue may decrease.

481
00:42:36,040 --> 00:42:39,400
Once again, we're not measuring profits here.

482
00:42:39,400 --> 00:42:43,560
So it's hard to speak to the economic side.

483
00:42:43,560 --> 00:42:52,480
Now, we want to add more data points

484
00:42:52,480 --> 00:42:56,680
and see if this relationship holds up, right?

485
00:42:56,680 --> 00:43:01,720
Because we only have 12 data points here.

486
00:43:01,720 --> 00:43:07,440
So it would be really cool if we could add many, many, many,

487
00:43:07,440 --> 00:43:12,120
many, many more data points to the economic side.

488
00:43:12,120 --> 00:43:15,360
Many, many, many, many, many more data points.

489
00:43:15,360 --> 00:43:17,960
The number of states we have is limited.

490
00:43:17,960 --> 00:43:27,320
So I'm going to introduce you to panel data.

491
00:43:27,320 --> 00:43:31,080
And I'm just going to talk about it briefly today.

492
00:43:31,080 --> 00:43:34,960
If you tune in to Saturday Morning Statistics this week,

493
00:43:34,960 --> 00:43:40,200
I believe I'm going to try to estimate hopefully both

494
00:43:40,200 --> 00:43:44,160
the fixed effects and the random effects model.

495
00:43:44,160 --> 00:43:46,920
And so what are these models?

496
00:43:46,920 --> 00:43:53,600
OK, well, we can essentially create panel data here.

497
00:43:53,600 --> 00:44:00,680
If we look at this data, we've got x sub i.

498
00:44:00,680 --> 00:44:08,120
So we've got x of Arizona, x of California, x of Colorado.

499
00:44:08,120 --> 00:44:08,800
Well, guess what?

500
00:44:08,800 --> 00:44:11,400
There's another dimension here, time.

501
00:44:15,120 --> 00:44:18,480
We can thank Einstein for discovering this dimension.

502
00:44:18,480 --> 00:44:22,880
But time's a whole other dimension here.

503
00:44:22,880 --> 00:44:29,400
So we can basically say we've got an individual dimension,

504
00:44:29,400 --> 00:44:31,760
which is our state.

505
00:44:31,760 --> 00:44:36,600
And then we've got a time dimension, which is over time.

506
00:44:36,600 --> 00:44:42,760
So we can actually measure all of these statistics over time.

507
00:44:42,760 --> 00:44:49,240
So what we have here is a snapshot of 2020.

508
00:44:49,240 --> 00:44:58,040
But if we aggregate the data for each point in time

509
00:44:58,040 --> 00:45:03,320
so that we know the revenue and the number of retailers

510
00:45:03,320 --> 00:45:07,560
in each state at each point in time,

511
00:45:07,560 --> 00:45:12,440
well, we're going to have a lot of observations.

512
00:45:12,440 --> 00:45:16,680
And we can use fixed effects and random effects models

513
00:45:16,680 --> 00:45:23,400
to better estimate our parameter here, beta.
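As a rough illustration of the fixed effects idea, here is a minimal sketch of the "within" estimator in plain Python. The panel is a made-up toy example, the function name `within_estimator` is hypothetical, and this is not necessarily how the models will be estimated in the later session.

```python
from collections import defaultdict


def within_estimator(panel):
    """Estimate beta in y_it = alpha_i + beta * x_it + e_it.

    `panel` is a list of (state, period, x, y) tuples. Each state's
    own means are subtracted from x and y (the "within" transformation),
    which removes the state-specific intercept alpha_i.
    """
    by_state = defaultdict(list)
    for state, _period, x, y in panel:
        by_state[state].append((x, y))

    s_xy = 0.0
    s_xx = 0.0
    for rows in by_state.values():
        x_bar = sum(x for x, _ in rows) / len(rows)
        y_bar = sum(y for _, y in rows) / len(rows)
        for x, y in rows:
            s_xy += (x - x_bar) * (y - y_bar)
            s_xx += (x - x_bar) ** 2
    return s_xy / s_xx


# Toy panel (made-up numbers): each state has its own level,
# but the same slope of -2 with respect to x.
toy_panel = [
    ("MA", 2019, 0.5, 9.0), ("MA", 2020, 1.0, 8.0), ("MA", 2021, 1.5, 7.0),
    ("OR", 2019, 10.0, 5.0), ("OR", 2020, 11.0, 3.0), ("OR", 2021, 12.0, 1.0),
]
beta_hat = within_estimator(toy_panel)
print(beta_hat)
```

The design choice here is that each state is compared only with itself over time, so persistent differences between states do not leak into the slope the way they could in a single cross-section.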

514
00:45:23,400 --> 00:45:30,840
So here, we already estimated beta

515
00:45:30,840 --> 00:45:35,240
as negative 350,000.

516
00:45:35,240 --> 00:45:39,080
However, this may be imprecise since we only

517
00:45:39,080 --> 00:45:41,440
have 12 observations.

518
00:45:41,440 --> 00:45:45,720
So we can get a lot more observations

519
00:45:45,720 --> 00:45:53,040
and see if our estimate is still the same or if it's different.

520
00:45:53,040 --> 00:45:58,000
And essentially, we can just get a better measure.

521
00:45:58,000 --> 00:46:02,160
Well, just to go ahead and show you

522
00:46:02,160 --> 00:46:08,360
how we're going to be going about that for each state

523
00:46:08,360 --> 00:46:13,720
and how this could lead to discrepancies against the data

524
00:46:13,720 --> 00:46:16,600
we've already seen.

525
00:46:16,600 --> 00:46:21,320
Well, so we'll eventually want to do this for each state,

526
00:46:21,320 --> 00:46:26,080
California, Oregon, Colorado, Illinois, so on and so forth.

527
00:46:26,080 --> 00:46:31,560
However, for today, we can just look at Massachusetts.

528
00:46:31,560 --> 00:46:35,680
So in prior weeks, we've looked at this data.

529
00:46:35,680 --> 00:46:39,200
And so please check out some of the prior videos

530
00:46:39,200 --> 00:46:42,440
just to see how we read in this data

531
00:46:42,440 --> 00:46:46,480
and some of the intricacies of the data.

532
00:46:46,480 --> 00:46:53,560
Massachusetts has an awesome open data API

533
00:46:53,560 --> 00:46:59,800
so we can read in the number of licensees and sales

534
00:46:59,800 --> 00:47:03,960
on an almost real-time basis.

535
00:47:03,960 --> 00:47:10,840
So let's do exactly that.

536
00:47:10,840 --> 00:47:14,680
So once again, I'll share the source code with you.

537
00:47:14,680 --> 00:47:17,000
I'm sort of going to be moving through it quick just

538
00:47:17,000 --> 00:47:20,920
so we can get to the interesting parts.

539
00:47:20,920 --> 00:47:28,000
So long story short, just going to read in the licensees data

540
00:47:28,000 --> 00:47:30,480
and the production data.

541
00:47:30,480 --> 00:47:41,320
So licensees, we've got 914.

542
00:47:41,320 --> 00:47:44,960
And then if you were going to look

543
00:47:44,960 --> 00:47:50,800
at a specific observation, we have many, many

544
00:47:50,800 --> 00:47:56,360
data points for each licensee.

545
00:47:56,360 --> 00:48:02,040
In particular, we're going to be looking at retailers,

546
00:48:02,040 --> 00:48:07,120
so all of the licenses with license type retailer.

547
00:48:07,120 --> 00:48:11,880
Well, this brings us back to what we talked about

548
00:48:11,880 --> 00:48:14,640
at the very beginning.

549
00:48:14,640 --> 00:48:18,720
Not all licensees are operational.

550
00:48:18,720 --> 00:48:26,400
So if we were just going to calculate sales per retailer,

551
00:48:26,400 --> 00:48:31,040
we may overstate how many retailers are actually

552
00:48:31,040 --> 00:48:33,080
operating.

553
00:48:33,080 --> 00:48:39,880
So what you can do is you can just

554
00:48:39,880 --> 00:48:49,680
look at the retailers here that have a final license.

555
00:48:49,680 --> 00:49:00,520
So the retailers would be all of the final licenses

556
00:49:00,520 --> 00:49:10,040
where the license type is a marijuana retailer.
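The filter just described can be sketched as follows. The records and the field names (`license_type`, `status`) are hypothetical stand-ins; the actual open data fields may be named differently.

```python
# Hypothetical records; the real open data fields may be named differently.
licensees = [
    {"name": "Retailer A", "license_type": "Marijuana Retailer",
     "status": "Final License"},
    {"name": "Retailer B", "license_type": "Marijuana Retailer",
     "status": "Provisional License"},
    {"name": "Cultivator C", "license_type": "Marijuana Cultivator",
     "status": "Final License"},
]


def final_retailers(records):
    """Keep only retailers that hold a final license."""
    return [
        r for r in records
        if r["license_type"] == "Marijuana Retailer"
        and r["status"] == "Final License"
    ]


retailers = final_retailers(licensees)
print(len(retailers))
```

Keeping the filter in one small function makes the operating-retailer assumption easy to find and change if the commission clarifies what each license status means.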

557
00:49:10,040 --> 00:49:15,440
So 191.

558
00:49:15,440 --> 00:49:26,240
And this is where it gets tricky comparing apples to apples.

559
00:49:26,240 --> 00:49:31,600
So we're measuring 191 retailers.

560
00:49:31,600 --> 00:49:39,240
Well, some of these may have come online since November 18.

561
00:49:39,240 --> 00:49:40,160
Not impossible.

562
00:49:46,880 --> 00:49:50,720
But long story short is I think these ones that

563
00:49:50,720 --> 00:49:53,720
have the final license in the open data,

564
00:49:53,720 --> 00:49:57,560
I believe those are the ones that are commencing

565
00:49:57,560 --> 00:49:59,720
operations.

566
00:49:59,720 --> 00:50:02,240
Because in this last meeting, they just

567
00:50:02,240 --> 00:50:07,120
approved retail licenses for a handful of retailers.

568
00:50:07,120 --> 00:50:11,760
So long story short, there's already confusion here.

569
00:50:11,760 --> 00:50:16,520
So I'm not even going to pretend that I have this figured out.

570
00:50:16,520 --> 00:50:20,840
I don't have any clue how to figure out

571
00:50:20,840 --> 00:50:23,560
which of these retailers are actually operating.

572
00:50:23,560 --> 00:50:25,000
And it's gotten to the point where

573
00:50:25,000 --> 00:50:28,880
I'm going to have to email the commission,

574
00:50:28,880 --> 00:50:31,120
try to get a hold of their data scientists,

575
00:50:31,120 --> 00:50:38,000
and try to figure out what exactly they mean when they're

576
00:50:38,000 --> 00:50:42,600
talking about these license types.

577
00:50:42,600 --> 00:50:46,560
Because like I said, the data we're getting in the open API

578
00:50:46,560 --> 00:50:50,600
is not matching up with their internal data

579
00:50:50,600 --> 00:50:54,000
that they presented at the meeting.

580
00:50:54,000 --> 00:51:00,320
And what we're going to do is we're just

581
00:51:00,320 --> 00:51:04,920
going to do the best we can and basically consider

582
00:51:04,920 --> 00:51:10,920
any retailer operating at the time

583
00:51:10,920 --> 00:51:14,200
they got their final license.

584
00:51:14,200 --> 00:51:21,240
So as you can see, this retailer has a provisional license.

585
00:51:21,240 --> 00:51:24,880
They have not gotten their final license yet.

586
00:51:24,880 --> 00:51:30,360
So I'm going to consider this retailer as not operating.

587
00:51:30,360 --> 00:51:32,640
Well, we could do a bit of homework.

588
00:51:32,640 --> 00:51:38,520
We could actually look at one of these.

589
00:51:38,520 --> 00:51:40,320
We could even check out their website

590
00:51:40,320 --> 00:51:44,240
or give them a call, Flower and Soul.

591
00:51:44,240 --> 00:51:47,400
And that's where you have to be a bit of an investigator

592
00:51:47,400 --> 00:51:49,520
when you're doing all of this.

593
00:51:49,520 --> 00:51:52,120
We could even call up Flower and Soul and ask them,

594
00:51:52,120 --> 00:51:55,280
hey, are you operating or not?

595
00:51:55,280 --> 00:52:01,720
So as I said, there's a lot of homework to be done here.

596
00:52:01,720 --> 00:52:04,600
So enough of prefacing all of that.

597
00:52:04,600 --> 00:52:07,960
I'm going to do my best here to calculate

598
00:52:07,960 --> 00:52:13,600
the number of retailers at any given point in time.

599
00:52:13,600 --> 00:52:19,200
So we're going to use the date of final licensure.

600
00:52:19,200 --> 00:52:25,520
And we're basically going to say the retailer started operating

601
00:52:25,520 --> 00:52:27,480
when they got their final license.
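A minimal sketch of that counting rule, assuming each retailer record carries a final licensure date. The dates are made up and the function name `retailers_operating` is hypothetical.

```python
from bisect import bisect_right
from datetime import date


def retailers_operating(final_license_dates, as_of):
    """Count retailers whose final license date is on or before `as_of`.

    Assumption from the talk: a retailer is treated as operating
    from the date of its final license. Records without a date are skipped.
    """
    dates = sorted(d for d in final_license_dates if d is not None)
    return bisect_right(dates, as_of)


# Made-up licensure dates for illustration.
final_dates = [date(2019, 11, 5), date(2020, 2, 14), None, date(2020, 9, 1)]

print(retailers_operating(final_dates, date(2020, 1, 1)))
print(retailers_operating(final_dates, date(2020, 12, 31)))
```

Calling the function once per week gives the running count of retailers over time. Skipping records with no date is itself a choice that should be written down next to the result.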

602
00:52:31,080 --> 00:52:32,280
Is this perfect?

603
00:52:32,280 --> 00:52:33,800
No.

604
00:52:33,800 --> 00:52:37,840
But this is the best I can think of for the time being.

605
00:52:37,840 --> 00:52:41,480
As we learn more, we can calculate

606
00:52:41,480 --> 00:52:45,120
better and better statistics.

607
00:52:45,120 --> 00:52:51,720
Well, let's go ahead and calculate these.

608
00:52:51,720 --> 00:52:55,200
So we're specifically looking at retailers.

609
00:52:57,960 --> 00:53:05,240
So here we see our count of retailers going up and up

610
00:53:05,240 --> 00:53:13,200
and up and up until the point where we have the, well,

611
00:53:13,200 --> 00:53:18,040
now we're counting 193.

612
00:53:18,040 --> 00:53:19,520
This should be 191.

613
00:53:19,520 --> 00:53:23,120
So there's a discrepancy here between those

614
00:53:23,120 --> 00:53:25,600
that have a final license and then those that have

615
00:53:25,600 --> 00:53:27,440
a final licensure date.

616
00:53:27,440 --> 00:53:30,080
So that's more investigation.

617
00:53:30,080 --> 00:53:31,680
But we're almost here at the end,

618
00:53:31,680 --> 00:53:34,720
so I'm just going to power through this,

619
00:53:34,720 --> 00:53:39,920
keeping in mind that we've got some imperfect statistics here.

620
00:53:39,920 --> 00:53:43,320
So we've got our retailers going up and up and up and up.

621
00:53:43,320 --> 00:53:44,960
Well, look at this.

622
00:53:44,960 --> 00:53:48,640
In 2020, well, you don't have the same number

623
00:53:48,640 --> 00:53:51,120
of retailers at the beginning of the year

624
00:53:51,120 --> 00:53:53,760
as you do at the end of the year.

625
00:53:53,760 --> 00:53:59,040
So that makes me wonder, when they calculated

626
00:53:59,040 --> 00:54:03,960
their statistics, how are they calculating

627
00:54:03,960 --> 00:54:07,440
the number of retailers in 2020?

628
00:54:07,440 --> 00:54:12,320
Are they using the end of year number of retailers?

629
00:54:12,320 --> 00:54:16,440
Are they using 2021's retailers?

630
00:54:16,440 --> 00:54:18,320
How are they calculating these numbers here?

631
00:54:23,360 --> 00:54:28,040
Because this is an increasing number over time.

632
00:54:28,040 --> 00:54:35,360
Well, we've got sales per week.

633
00:54:35,360 --> 00:54:35,880
Right?

634
00:54:35,880 --> 00:54:37,360
We've got weekly sales.

635
00:54:43,200 --> 00:54:44,880
We can get weekly sales.

636
00:54:50,000 --> 00:54:53,600
So here is sales over time.

637
00:54:53,600 --> 00:54:55,600
And those of you just joining us today,

638
00:54:55,600 --> 00:54:58,400
Massachusetts has a real interesting market here,

639
00:54:58,400 --> 00:55:03,600
where in 2020, they closed for two months.

640
00:55:03,600 --> 00:55:05,200
Right?

641
00:55:05,200 --> 00:55:09,200
I mean, you may want to put an asterisk here

642
00:55:09,200 --> 00:55:12,680
on the Massachusetts one and say Massachusetts

643
00:55:12,680 --> 00:55:16,000
was closed for two months in 2020.

644
00:55:16,000 --> 00:55:16,520
Right?

645
00:55:16,520 --> 00:55:20,520
Because I mean, that's just not even taken into consideration.

646
00:55:20,520 --> 00:55:25,000
Like 2020 was closed for two months.

647
00:55:25,000 --> 00:55:31,280
So 2020 is quite the atypical year in Massachusetts.

648
00:55:35,280 --> 00:55:37,640
Well, just to keep powering through this

649
00:55:37,640 --> 00:55:45,200
and making some statistics here, we can now do sales per retailer.

650
00:55:45,200 --> 00:55:48,240
And we've calculated this in weeks past.

651
00:55:48,240 --> 00:55:52,960
But now we actually get to calculate it

652
00:55:52,960 --> 00:56:00,400
with the number of retailers that we think are in the market.

653
00:56:00,400 --> 00:56:06,360
And so there is sales per retailer per week.

654
00:56:06,360 --> 00:56:12,960
Well, we now have sales per retailer.

655
00:56:12,960 --> 00:56:16,120
Well, we can try to measure apples to apples.

656
00:56:16,120 --> 00:56:20,720
What is our estimate of sales per retailer

657
00:56:20,720 --> 00:56:24,680
in Massachusetts in 2020?

658
00:56:24,680 --> 00:56:31,840
We're saying 10.86 million.

659
00:56:31,840 --> 00:56:40,040
Well, that is not 8.36 million.

660
00:56:40,040 --> 00:56:43,120
So we've got a different number here.

661
00:56:43,120 --> 00:56:45,520
We probably have a different number

662
00:56:45,520 --> 00:56:51,080
because we're taking into consideration the number

663
00:56:51,080 --> 00:56:59,320
of retailers that were actually operational at any given time.

664
00:56:59,320 --> 00:57:06,080
So we're calculating a different statistic than they are.
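To see how two reasonable definitions can disagree, here is a small sketch with made-up weekly figures. It is not known from the talk which method the published table used, so both calculations below are illustrative assumptions.

```python
# Made-up weekly figures: (total weekly sales in dollars, retailers operating).
weeks = [
    (1_000_000, 10),
    (1_200_000, 12),
    (1_500_000, 15),
    (2_000_000, 20),
]

# Method 1: sales per operating retailer each week, summed over the period.
per_retailer_weekly = sum(sales / n for sales, n in weeks)

# Method 2: total sales divided by the end-of-period retailer count.
total_sales = sum(sales for sales, _ in weeks)
per_retailer_year_end = total_sales / weeks[-1][1]

print(per_retailer_weekly, per_retailer_year_end)
```

When the number of retailers grows during the period, dividing by the end-of-period count spreads early sales over retailers that were not yet open, so the second method gives the lower figure.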

665
00:57:06,080 --> 00:57:13,480
If they're just using sales per dispensary at the end of 2020,

666
00:57:13,480 --> 00:57:16,600
well, I think they need to specify this.

667
00:57:16,600 --> 00:57:21,760
So basically, this is sort of the lesson I'm driving home today

668
00:57:21,760 --> 00:57:27,880
is you need to be really explicit in your notes

669
00:57:27,880 --> 00:57:29,400
and how you describe.

670
00:57:29,400 --> 00:57:31,280
And like I said, they've got a report.

671
00:57:31,280 --> 00:57:35,840
So it could be buried in the report somewhere.

672
00:57:35,840 --> 00:57:37,160
And it probably is.

673
00:57:37,160 --> 00:57:40,120
So next week, I'll make my corrections

674
00:57:40,120 --> 00:57:43,120
and quit slamming these guys.

675
00:57:43,120 --> 00:57:48,400
And tell you more about how they created their statistics.

676
00:57:48,400 --> 00:57:51,040
Because like I said, it's probably in the report.

677
00:57:51,040 --> 00:57:56,600
But like I said, you need to be really explicit in your notes.

678
00:57:56,600 --> 00:58:00,360
And that's the whole thing about data science.

679
00:58:00,360 --> 00:58:03,160
And that's where I think it needs to be done out

680
00:58:03,160 --> 00:58:05,840
in the open and out in the light.

681
00:58:05,840 --> 00:58:09,920
Because there are so many assumptions

682
00:58:09,920 --> 00:58:11,440
you can make along the way.

683
00:58:11,440 --> 00:58:15,840
It's like, are you calculating average?

684
00:58:15,840 --> 00:58:18,720
It's like, how are you calculating this average?

685
00:58:18,720 --> 00:58:22,360
Are you calculating average sales per retailer,

686
00:58:22,360 --> 00:58:26,240
taking into consideration which ones were actually open?

687
00:58:26,240 --> 00:58:33,440
Or are you just estimating it using the end of year total

688
00:58:33,440 --> 00:58:36,520
and keeping in mind that that's going to be an estimate and not

689
00:58:36,520 --> 00:58:38,800
the actual?

690
00:58:38,800 --> 00:58:44,400
So long story short, we can go ahead

691
00:58:44,400 --> 00:58:48,000
and we've cast some uncertainty there.

692
00:58:48,000 --> 00:58:51,200
Well, we can say, OK, let's at least look

693
00:58:51,200 --> 00:58:53,960
at the retailers per capita.

694
00:58:53,960 --> 00:58:57,240
Hopefully, that one we can maybe have some agreement upon.

695
00:59:00,120 --> 00:59:04,520
And once again, our numbers vary because we're not

696
00:59:04,520 --> 00:59:07,680
calculating apples to apples here.

697
00:59:07,680 --> 00:59:11,520
Because the best I can do is pull population.

698
00:59:11,520 --> 00:59:13,080
And maybe this is the best I can do.

699
00:59:13,080 --> 00:59:14,160
So I may have to revise it.

700
00:59:14,160 --> 00:59:18,600
But pulling just total population.

701
00:59:18,600 --> 00:59:24,160
But you see, they use the actual adult population.

702
00:59:24,160 --> 00:59:26,800
So not quite apples to apples.
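The effect of the denominator can be sketched like this. The retailer count and the adult population are hypothetical values chosen only for illustration, and the total population is the rough 6.9 million implied earlier in the talk.

```python
def per_100k(count, population):
    """Express a count as a rate per 100,000 people."""
    return count / population * 100_000


# Illustrative inputs only: a retailer count and two possible denominators.
retailers = 100
total_population = 6_900_000   # roughly the figure implied earlier in the talk
adult_population = 5_500_000   # hypothetical adult population

print(round(per_100k(retailers, total_population), 2))
print(round(per_100k(retailers, adult_population), 2))
```

The same retailer count gives a higher rate against the smaller adult denominator, so two sources can disagree on retailers per 100,000 even when they agree on the number of retailers.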

703
00:59:29,840 --> 00:59:35,840
But we're estimating that the retailers per capita,

704
00:59:35,840 --> 00:59:44,040
you know, in 2020 was on average around one.

705
00:59:44,040 --> 00:59:45,920
As you can see, it's growing up.

706
00:59:45,920 --> 00:59:51,400
And you see, it looks like at year end, around the end of 2020,

707
00:59:51,400 --> 00:59:56,600
it was 1.5, which is what they measured.

708
00:59:56,600 --> 01:00:00,680
They measured 1.5 dispensaries per 100,000.

709
01:00:00,680 --> 01:00:03,160
So that may be like the year-end total.

710
01:00:03,160 --> 01:00:05,400
But at the beginning of the year,

711
01:00:05,400 --> 01:00:10,840
there were only 0.5, approximately 0.5 retailers

712
01:00:10,840 --> 01:00:12,960
per 100,000 people.

713
01:00:12,960 --> 01:00:23,160
So when you just see, oh, in 2020,

714
01:00:23,160 --> 01:00:27,520
this number changed in 2020.

715
01:00:27,520 --> 01:00:34,920
So long story short, I think I may wrap it up

716
01:00:34,920 --> 01:00:36,240
here for today.

717
01:00:36,240 --> 01:00:39,720
But for the coming weeks, this is essentially

718
01:00:39,720 --> 01:00:42,280
going to be our goal, or at least next week,

719
01:00:42,280 --> 01:00:50,520
is basically get the data from more states,

720
01:00:50,520 --> 01:00:55,040
continue to reproduce these statistics, retailers

721
01:00:55,040 --> 01:01:00,040
per capita and sales per capita, and see

722
01:01:00,040 --> 01:01:06,680
if we can't replicate all of these statistics here.

723
01:01:06,680 --> 01:01:11,480
And as we said, we're not doing perfectly apples to apples.

724
01:01:11,480 --> 01:01:15,000
So if there is some slight discrepancy,

725
01:01:15,000 --> 01:01:17,640
that's not the end of the world.

726
01:01:17,640 --> 01:01:19,800
But we just want to make sure that we're not

727
01:01:19,800 --> 01:01:24,200
systematically measuring things differently here.

728
01:01:24,200 --> 01:01:29,200
And as I said, we get to create panel data.

729
01:01:29,200 --> 01:01:35,160
So here, we actually have sales per retailer

730
01:01:35,160 --> 01:01:38,840
throughout time in Massachusetts.

731
01:01:38,840 --> 01:01:43,600
So we have many, many, many more data points

732
01:01:43,600 --> 01:01:47,640
so we can run a regression.

733
01:01:47,640 --> 01:01:51,680
And maybe, in fact, I don't have the time to do it today,

734
01:01:51,680 --> 01:01:56,120
but it would be interesting to do the regression of this

735
01:01:56,120 --> 01:01:57,760
just on Massachusetts.

736
01:01:57,760 --> 01:02:04,000
So redo this regression of revenue

737
01:02:04,000 --> 01:02:11,080
against retailers per 100,000, but just do it in Massachusetts

738
01:02:11,080 --> 01:02:16,720
and see what the effect is, see what the beta is

739
01:02:16,720 --> 01:02:20,040
in Massachusetts.

740
01:02:20,040 --> 01:02:26,520
And then, as we said, replicate it for all the other states.

741
01:02:26,520 --> 01:02:34,360
OK, so this is sort of what we began with.

742
01:02:34,360 --> 01:02:38,880
And now we're down a rabbit hole where

743
01:02:38,880 --> 01:02:42,120
we're questioning the certainty of the data.

744
01:02:42,120 --> 01:02:46,120
And we're going to have a lot of work on our hands.

745
01:02:46,120 --> 01:02:51,800
But this is the work of a data scientist.

746
01:02:51,800 --> 01:02:57,000
So I'm going to go ahead and stop presenting for today

747
01:02:57,000 --> 01:03:00,960
just to save some for next week and just not go overboard.

748
01:03:00,960 --> 01:03:03,760
However, does anybody have any questions

749
01:03:03,760 --> 01:03:05,840
from all of this work today?

750
01:03:12,040 --> 01:03:13,840
I just had a comment.

751
01:03:13,840 --> 01:03:16,120
So we did the linear regression.

752
01:03:16,120 --> 01:03:20,720
And I'm assuming we are assuming that it's not normalized,

753
01:03:20,720 --> 01:03:24,880
right, because the data points are skewed on one side.

754
01:03:24,880 --> 01:03:28,320
And there are a couple of points, like two or three

755
01:03:28,320 --> 01:03:29,160
on the other end.

756
01:03:38,480 --> 01:03:39,840
Can you please repeat that question?

757
01:03:39,840 --> 01:03:42,440
My connection got spotty there for a second.

758
01:03:42,440 --> 01:03:46,680
OK, yeah, we did the linear regression.

759
01:03:46,680 --> 01:03:53,640
And I was just making a comment that obviously we

760
01:03:53,640 --> 01:03:59,840
can't take this for what it is because the majority

761
01:03:59,840 --> 01:04:02,760
of the data points were skewed to the left.

762
01:04:02,760 --> 01:04:05,920
It's not normally distributed.

763
01:04:05,920 --> 01:04:10,080
So I'd be curious to see how, like you already pointed out,

764
01:04:10,080 --> 01:04:13,560
that we need more data points.

765
01:04:13,560 --> 01:04:15,720
100% correct.

766
01:04:15,720 --> 01:04:18,280
That's where the central limit theorem comes in,

767
01:04:18,280 --> 01:04:22,400
is if we did have more and more observations,

768
01:04:22,400 --> 01:04:27,720
the sampling distribution of our estimates may approach the normal distribution.

769
01:04:27,720 --> 01:04:30,840
Yeah, because even though it's a negative correlation,

770
01:04:30,840 --> 01:04:35,120
is it really because of how the data points are skewed?

771
01:04:35,120 --> 01:04:37,080
Yeah, because it's significant.

772
01:04:37,080 --> 01:04:39,080
But yeah.

773
01:04:39,080 --> 01:04:40,840
That's just a perfect observation.

774
01:04:40,840 --> 01:04:42,840
And I couldn't have said it better myself.

775
01:04:42,840 --> 01:04:46,000
Because I mean, you hit on a key point here

776
01:04:46,000 --> 01:04:51,280
is you can't put too much stock into these estimates we're

777
01:04:51,280 --> 01:04:55,720
doing because we only have 12 observations.

778
01:04:58,640 --> 01:05:02,040
There are probably systemic differences going on.

779
01:05:07,000 --> 01:05:11,320
So that's essentially why we're on this road to get panel data,

780
01:05:11,320 --> 01:05:14,680
is to increase our statistical power.

781
01:05:14,680 --> 01:05:22,440
So if we're able to get these observations throughout time,

782
01:05:22,440 --> 01:05:25,640
then we may be able to estimate the relationship a little

783
01:05:25,640 --> 01:05:29,120
better than we did today.

784
01:05:29,120 --> 01:05:31,720
Because like I said, we're saying,

785
01:05:31,720 --> 01:05:34,800
oh, if one retailer enters per 100,000,

786
01:05:34,800 --> 01:05:37,400
it may go down by 350,000.

787
01:05:37,400 --> 01:05:40,840
I wouldn't put too much stock in those numbers.

788
01:05:40,840 --> 01:05:44,160
So if you're doing policy decisions,

789
01:05:44,160 --> 01:05:47,040
I don't know if I would make them based on those numbers.
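[Editor's sketch] The caution about 12 observations can be made concrete with a small example. The data below are synthetic and the variable names (`x` as retailers per 100,000 people, `y` as an unspecified outcome) are assumptions for illustration, not the figures from the session. The sketch fits a simple least-squares line, reports a 95% confidence interval for the slope using the t critical value for 10 degrees of freedom, and refits the line with each point left out to show how much a few far-out points can move the estimate.

```python
import math
import random

# Two-sided 95% t critical value for n - 2 = 10 degrees of freedom.
T_CRIT_DF10 = 2.228


def ols(x, y):
    """Return (intercept, slope, slope_standard_error) for simple OLS."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    sse = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    slope_se = math.sqrt(sse / (n - 2) / sxx)
    return intercept, slope, slope_se


# Synthetic data (assumption): 12 hypothetical observations, with most x
# values clustered at the low end and two sitting far out on the other end.
random.seed(0)
x = [3, 4, 4, 5, 5, 6, 6, 7, 8, 9, 30, 42]
y = [2_000_000 - 30_000 * xi + random.gauss(0, 300_000) for xi in x]

intercept, slope, slope_se = ols(x, y)
ci = (slope - T_CRIT_DF10 * slope_se, slope + T_CRIT_DF10 * slope_se)

# Leave-one-out refits: how far does the slope move when one point is dropped?
loo_slopes = []
for i in range(len(x)):
    _, s, _ = ols(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
    loo_slopes.append(s)

print(f"slope: {slope:,.0f}")
print(f"95% CI: ({ci[0]:,.0f}, {ci[1]:,.0f})")
print(f"leave-one-out slope range: ({min(loo_slopes):,.0f}, {max(loo_slopes):,.0f})")
```

If the confidence interval is wide, or the leave-one-out slopes vary a lot, that is a sign the point estimate leans heavily on a handful of observations, which is the concern raised in the discussion.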

790
01:05:47,040 --> 01:05:53,280
So welcome to the world of statistics and data science.

791
01:05:53,280 --> 01:05:58,040
So I hope I've cast enough uncertainty.

792
01:05:58,040 --> 01:06:01,160
But long story short is, I think it's

793
01:06:01,160 --> 01:06:06,120
useful to at least look at the data, at least plot it,

794
01:06:06,120 --> 01:06:07,960
see what the regression line is.

795
01:06:07,960 --> 01:06:11,400
But just take it with a grain of salt.

796
01:06:11,400 --> 01:06:15,160
So just know this is 12 observations.

797
01:06:15,160 --> 01:06:18,640
There could be measurement error going on.

798
01:06:18,640 --> 01:06:21,480
There are a lot of factors that are questionable here.

799
01:06:21,480 --> 01:06:26,160
So further investigation is needed.

800
01:06:26,160 --> 01:06:30,200
But that's my typical advice: just hedge everything.

801
01:06:30,200 --> 01:06:31,800
Be skeptical of data.

802
01:06:31,800 --> 01:06:35,960
Be skeptical of statistics that people put before you.

803
01:06:35,960 --> 01:06:41,480
Always ask about methodology.

804
01:06:41,480 --> 01:06:43,040
Ask about any assumptions that may

805
01:06:43,040 --> 01:06:44,600
have been made along the way.

806
01:06:44,600 --> 01:06:47,360
And likewise, when you're presenting your results,

807
01:06:47,360 --> 01:06:50,720
be upfront about the assumptions you make

808
01:06:50,720 --> 01:06:54,920
and any shortcomings in the data, and so on and so forth.

809
01:06:54,920 --> 01:06:59,640
So it's the best we can do in an imperfect world.

810
01:06:59,640 --> 01:07:03,560
Well, definitely feel free to reach out throughout the week

811
01:07:03,560 --> 01:07:06,640
if you have any questions about cannabis or data

812
01:07:06,640 --> 01:07:10,160
science or cannabis data science.

813
01:07:10,160 --> 01:07:15,920
And if you have any good ideas, avenues for future research,

814
01:07:15,920 --> 01:07:17,680
yeah, always feel free to reach out.

815
01:07:17,680 --> 01:07:19,040
Always happy to have a discussion.

816
01:07:22,040 --> 01:07:24,560
Well, thank you all for coming.

817
01:07:24,560 --> 01:07:26,480
Enjoy your Thanksgiving

818
01:07:26,480 --> 01:07:29,520
if you're taking time off from work. If not, then

819
01:07:29,520 --> 01:07:31,520
keep your nose to the grindstone.

820
01:07:31,520 --> 01:07:36,480
And then until next time, stay productive.

821
01:07:36,480 --> 01:07:39,200
Feel free to tune in to Saturday Morning Statistics.

822
01:07:39,200 --> 01:07:41,440
We'll look at panel data models.

823
01:07:41,440 --> 01:07:43,560
And then next week, we'll pick up

824
01:07:43,560 --> 01:07:48,200
with our comparative analysis of the various states.

825
01:07:48,200 --> 01:07:49,560
Thank you so much.

826
01:07:49,560 --> 01:07:51,000
You're welcome.

827
01:07:51,000 --> 01:07:52,280
Have an awesome one, everyone.

828
01:07:52,280 --> 01:07:52,800
You, too.

829
01:07:52,800 --> 01:07:53,320
Thank you.

830
01:07:53,320 --> 01:07:53,840
Thank you.

831
01:07:53,840 --> 01:07:56,760
You're welcome.

832
01:07:56,760 --> 01:07:57,280
Bye.

833
01:07:57,280 --> 01:07:57,800
Bye now.

834
01:07:57,800 --> 01:08:24,840
["The Star-Spangled Banner"]

