1
00:00:00,000 --> 00:00:11,040
Welcome to the Cannabis Data Science Meetup Group.

2
00:00:11,040 --> 00:00:13,040
You're in for a special treat today.

3
00:00:13,040 --> 00:00:15,880
Got incredibly good material.

4
00:00:15,880 --> 00:00:21,680
Some of my thoughts have really come together about the industry in general and the economic

5
00:00:21,680 --> 00:00:25,720
effects of the permitted cannabis industry.

6
00:00:25,720 --> 00:00:31,840
So without further ado, we'll go ahead and dive right into it and I'll start sharing

7
00:00:31,840 --> 00:00:36,960
my screen.

8
00:00:36,960 --> 00:00:43,200
Thanks.

9
00:00:43,200 --> 00:00:55,320
So last week we began to just see how Massachusetts is publishing a good amount of public data.

10
00:00:55,320 --> 00:01:08,880
So you have everything from just the number of licensees, you have sales by product type,

11
00:01:08,880 --> 00:01:21,960
and at the end of last week we were beginning to look at the production side.

12
00:01:21,960 --> 00:01:28,000
So we saw, wow, they're doing a good job at publishing data.

13
00:01:28,000 --> 00:01:34,480
They're publishing daily totals on an almost real time basis.

14
00:01:34,480 --> 00:01:35,480
Look at this.

15
00:01:35,480 --> 00:01:45,400
This was just updated on September 27th, so just on Monday.

16
00:01:45,400 --> 00:01:53,480
So we have incredibly recent data, and it's hard to come by data that's so fresh.

17
00:01:53,480 --> 00:02:01,480
So we've got fresh data and on top of that we can access it through an API just to make

18
00:02:01,480 --> 00:02:04,640
our lives simple and easy.

19
00:02:04,640 --> 00:02:12,560
And we have a rich data set here with a number of interesting data points.

20
00:02:12,560 --> 00:02:19,520
We've got the number of plants in their various stages, so we can see what plants are mature,

21
00:02:19,520 --> 00:02:22,120
which are flowering.

22
00:02:22,120 --> 00:02:31,640
We can see what packages are on hand, so we can get a gauge of inventory at any given

23
00:02:31,640 --> 00:02:34,280
time.

24
00:02:34,280 --> 00:02:42,480
Furthermore, we can get a couple more interesting data points, such as the number of strains,

25
00:02:42,480 --> 00:02:50,760
products, and last but not least, we saw, wow, they're tracking the number of employees

26
00:02:50,760 --> 00:02:53,760
in the cannabis industry.

27
00:02:53,760 --> 00:03:02,360
And earlier in this series, in particular, we were looking at Colorado and we were trying

28
00:03:02,360 --> 00:03:09,120
to look at production functions and we were talking about capital inputs, labor inputs,

29
00:03:09,120 --> 00:03:16,680
and that was where we estimated the competitive rate of return on capital, as well as the

30
00:03:16,680 --> 00:03:19,600
competitive wage rate in Colorado.

31
00:03:19,600 --> 00:03:24,520
And so what's real interesting with this data is we could repeat this exact same analysis

32
00:03:24,520 --> 00:03:27,600
for Massachusetts.

33
00:03:27,600 --> 00:03:33,800
And we'll probably do that, but we can do something new and fresh today.

34
00:03:33,800 --> 00:03:39,840
So first things first, let's get the data.

35
00:03:39,840 --> 00:03:48,280
So just using a handful of standard Python packages, we'll be using Python, but the data

36
00:03:48,280 --> 00:03:57,320
is accessible through an API, so you can use your favorite programming language.

37
00:03:57,320 --> 00:04:03,480
Just writing a handful of helper functions here, I can talk a bit more about these when

38
00:04:03,480 --> 00:04:06,640
we use them here.

39
00:04:06,640 --> 00:04:16,040
So first things first, let's get this data and look at it ourselves.

40
00:04:16,040 --> 00:04:23,600
So you can register for Socrata to get an app token.

41
00:04:23,600 --> 00:04:26,120
You're not necessarily required to.

42
00:04:26,120 --> 00:04:34,400
You just get throttled if you make too many requests without an app token.

43
00:04:34,400 --> 00:04:39,040
So we can use one at the end of the row.

44
00:04:39,040 --> 00:04:41,800
So we'll just initialize the API.

45
00:04:41,800 --> 00:04:45,120
So we're just saying, okay, this is the URL.

46
00:04:45,120 --> 00:04:47,320
So what is an API?

47
00:04:47,320 --> 00:04:53,960
It's really just a simple, simple website that you can interact with programmatically

48
00:04:53,960 --> 00:04:56,000
to get data.

49
00:04:56,000 --> 00:05:01,480
So it's not going to be marked up in a nice, beautiful website format.

50
00:05:01,480 --> 00:05:05,440
It's just going to be raw data.

51
00:05:05,440 --> 00:05:09,600
That's what we're after, the bare bones.

52
00:05:09,600 --> 00:05:24,120
So we'll just ping this website, slash our data set ID, which you can get right here,

53
00:05:24,120 --> 00:05:28,480
the data set identifier.

54
00:05:28,480 --> 00:05:30,720
We'll pass a handful of parameters.

55
00:05:30,720 --> 00:05:36,720
So we'll basically say, okay, we don't want more than 2000 observations.

56
00:05:36,720 --> 00:05:39,200
And we actually want to order this.

57
00:05:39,200 --> 00:05:43,600
So here I've done a little bit of scouting.

58
00:05:43,600 --> 00:05:50,040
And then I saw, okay, look, they give us a timestamp.

59
00:05:50,040 --> 00:05:54,960
So we can actually order our data by time.

60
00:05:54,960 --> 00:06:01,920
Here they give us an example of how you can do a query for a specific date.

61
00:06:01,920 --> 00:06:19,160
Luckily, we can use a Socrata Query Language (SoQL) keyword, descending,

62
00:06:19,160 --> 00:06:28,280
to specify in our query that we want everything sorted by the date.

63
00:06:28,280 --> 00:06:30,280
So enough of that.

64
00:06:30,280 --> 00:06:42,440
And so long story short, we're just going to ping the website and get our data.
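The request described above can be sketched roughly as follows. This is a minimal sketch, not the code shown on screen: it uses only the standard library, and `domain`, `dataset_id`, and `date_field` are placeholders to be filled in from the data set's API page.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def build_socrata_url(domain, dataset_id, date_field, limit=2000):
    """Build a Socrata query URL for the newest `limit` observations.

    `domain`, `dataset_id`, and `date_field` are placeholders; take the
    real values from the data set's API documentation page.
    """
    params = {
        '$limit': limit,                 # no more than 2,000 observations
        '$order': f'{date_field} DESC',  # newest observations first
    }
    return f'https://{domain}/resource/{dataset_id}.json?{urlencode(params)}'


def fetch_observations(url, app_token=None):
    """Request the URL and return the parsed list of JSON objects."""
    # The app token is optional; without one, requests may be throttled.
    headers = {'X-App-Token': app_token} if app_token else {}
    with urlopen(Request(url, headers=headers)) as response:
        # urlopen raises an HTTPError for 4xx/5xx responses.
        return json.loads(response.read().decode('utf-8'))


# Hypothetical values, for illustration only.
url = build_socrata_url('data.example.org', 'abcd-1234', 'date')
```

Calling `fetch_observations(url)` would then return a list of dictionaries, one per observation.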

65
00:06:42,440 --> 00:06:44,160
So we should have our data here.

66
00:06:44,160 --> 00:06:53,320
So for example, in HTTP terms, a status code of 200 is success.

67
00:06:53,320 --> 00:06:56,040
And so what did we just get?

68
00:06:56,040 --> 00:07:02,880
Well, instead of getting a website, we just got a giant data dump.

69
00:07:02,880 --> 00:07:09,920
So instead of delivering us a website, they just deliver us a bunch of what's called

70
00:07:09,920 --> 00:07:12,120
JSON objects.

71
00:07:12,120 --> 00:07:18,640
So JavaScript object notation data.

72
00:07:18,640 --> 00:07:27,080
And so this is how you can think about a data point where it's essentially an observation

73
00:07:27,080 --> 00:07:28,080
here.

74
00:07:28,080 --> 00:07:30,480
So this is one observation.

75
00:07:30,480 --> 00:07:37,080
So this is the date, September 27th.

76
00:07:37,080 --> 00:07:47,920
On that date, there were 129,000 immature plants and so on and so forth.

77
00:07:47,920 --> 00:07:53,000
So we can work with this.

78
00:07:53,000 --> 00:07:58,400
What's a glorified, simplified Excel spreadsheet in a programming language?

79
00:07:58,400 --> 00:08:00,000
So a data frame.

80
00:08:00,000 --> 00:08:02,760
So we'll just put this in the data frame.

81
00:08:02,760 --> 00:08:09,520
And here I use one of these helper functions to basically put everything into chronological

82
00:08:09,520 --> 00:08:13,160
order.

83
00:08:13,160 --> 00:08:18,240
Because we wanted to get the most recent data, but now we actually need to put it in chronological

84
00:08:18,240 --> 00:08:26,880
order just because that's how we do time series analysis from the past to the present or even

85
00:08:26,880 --> 00:08:28,840
the future.

86
00:08:28,840 --> 00:08:37,880
So let's put this into a data frame just so that we can work with it easily.
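As a rough illustration of this step, assuming pandas and two made-up observations (the field names and values here are hypothetical, not the real data set's):

```python
import pandas as pd

# Two made-up observations, newest first, as the query returns them.
# The field names are hypothetical.
records = [
    {'date': '2021-09-27T00:00:00.000', 'total_employees': '12'},
    {'date': '2021-09-26T00:00:00.000', 'total_employees': '10'},
]

data = pd.DataFrame(records)

# Reverse the rows so the table runs from the past to the present.
data = data.iloc[::-1].reset_index(drop=True)

# JSON values often arrive as strings, so convert before doing math.
data['total_employees'] = pd.to_numeric(data['total_employees'])
```

Reversing the rows is one simple way to get chronological order when the query already sorted by date descending; sorting on the date column would work as well.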

87
00:08:37,880 --> 00:08:41,680
And what does this look like?

88
00:08:41,680 --> 00:08:47,400
Well this is why I was saying this looks similar to an Excel spreadsheet where you've got a

89
00:08:47,400 --> 00:08:49,120
table here.

90
00:08:49,120 --> 00:08:55,780
We've got a thousand plus rows, 1069 rows.

91
00:08:55,780 --> 00:09:01,400
So that's 1069 days.

92
00:09:01,400 --> 00:09:15,160
So just short of three years of data here.

93
00:09:15,160 --> 00:09:18,920
And like we mentioned earlier, we have all these observations.

94
00:09:18,920 --> 00:09:21,880
Here's the observations on a daily basis.

95
00:09:21,880 --> 00:09:23,120
Awesome.

96
00:09:23,120 --> 00:09:25,880
What variables do we have here?

97
00:09:25,880 --> 00:09:33,120
Well I just did a little pre-scouting and just listed, okay.

98
00:09:33,120 --> 00:09:37,280
Here we are.

99
00:09:37,280 --> 00:09:40,640
So those are the variables we're working with.

100
00:09:40,640 --> 00:09:48,120
So without further ado, let's start looking at a handful of these.

101
00:09:48,120 --> 00:09:52,440
So great, because the first step is to look at the data.

102
00:09:52,440 --> 00:10:00,680
So you can plot these.

103
00:10:00,680 --> 00:10:05,440
Awesome.

104
00:10:05,440 --> 00:10:15,960
So we have our three years of data and we see that wow, at the very beginning, the very

105
00:10:15,960 --> 00:10:19,440
first observation, there were three employees.

106
00:10:19,440 --> 00:10:27,000
So we're basically tracking the Massachusetts industry from the inception, the first company,

107
00:10:27,000 --> 00:10:40,200
the first three employees to present day where we have just shy of 10,000 employees in presumably

108
00:10:40,200 --> 00:10:45,780
plant touching cannabis businesses in Massachusetts.

109
00:10:45,780 --> 00:10:53,000
So it's now time to put this into perspective because that I think is what this is all about.

110
00:10:53,000 --> 00:10:58,440
Because there's been a lot of noise in the cannabis industry.

111
00:10:58,440 --> 00:11:06,160
They'd have you believe that everybody works in the cannabis industry and it's just booming

112
00:11:06,160 --> 00:11:10,640
and well that may be, but it's time to quantify that.

113
00:11:10,640 --> 00:11:16,200
So it's time to measure, okay, what exactly is the size, what exactly is the impact here,

114
00:11:16,200 --> 00:11:18,280
at least the economic impact.

115
00:11:18,280 --> 00:11:21,320
We can measure that.

116
00:11:21,320 --> 00:11:31,360
So without further ado, the reason why is, well, perhaps other states can look at

117
00:11:31,360 --> 00:11:34,720
Massachusetts and see, okay, what happens?

118
00:11:34,720 --> 00:11:38,580
What is the economic effect of permitting cannabis?

119
00:11:38,580 --> 00:11:45,680
You can look at the social effects, as you should, but perhaps in another study. Right

120
00:11:45,680 --> 00:11:50,600
now we're really focused on the economic side of things.

121
00:11:50,600 --> 00:11:55,280
So time to put this into perspective.

122
00:11:55,280 --> 00:12:01,640
So for starters, this is going to need some sort of time scale, right?

123
00:12:01,640 --> 00:12:04,400
We're looking at a time series here.

124
00:12:04,400 --> 00:12:16,280
So what we can do is actually add a date to our data and set the index on the date.

125
00:12:16,280 --> 00:12:29,680
Nothing fancy here, but what that does is we can now plot our data in the exact same

126
00:12:29,680 --> 00:12:35,720
way, and we actually now can see the time scale.
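Setting the index on the date might look like the sketch below, assuming pandas. The column names are hypothetical; the first row reflects the October 15, 2018 start with three employees mentioned in the talk, and the second row is made up.

```python
import pandas as pd

# Hypothetical column names; the second row is made up for illustration.
data = pd.DataFrame({
    'timestamp': ['2018-10-15T00:00:00.000', '2018-10-16T00:00:00.000'],
    'total_employees': [3, 3],
})

# Parse the text timestamps into real dates and use them as the index.
data['date'] = pd.to_datetime(data['timestamp'])
data = data.set_index('date')

# data['total_employees'].plot()  # the x-axis would now show the dates
```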

127
00:12:35,720 --> 00:12:38,680
So now we can see, oh, look at this.

128
00:12:38,680 --> 00:12:49,280
Our data is starting just about this time in 2018, around September or October of 2018,

129
00:12:49,280 --> 00:13:01,840
and we can actually quantify that as October 15th is when our data set begins.

130
00:13:01,840 --> 00:13:09,520
And as we noted, we have quite recent data here up until September 27th, so all the way

131
00:13:09,520 --> 00:13:11,400
until Monday's totals.

132
00:13:11,400 --> 00:13:14,960
So they haven't quite published Tuesday's totals yet.

133
00:13:14,960 --> 00:13:27,920
So let's say, in my opinion, awesome that we have almost real-time data here.

134
00:13:27,920 --> 00:13:30,880
So let's start putting things into context.

135
00:13:30,880 --> 00:13:36,400
So we know how many employees are here in the cannabis industry.

136
00:13:36,400 --> 00:13:41,320
Well, how many employees are in Massachusetts in general?

137
00:13:41,320 --> 00:13:49,040
So this is a technique that I suggest to anyone doing data science.

138
00:13:49,040 --> 00:13:52,760
OK, you get your data set, right?

139
00:13:52,760 --> 00:13:55,000
Nice interesting data.

140
00:13:55,000 --> 00:13:58,840
What do we have here?

141
00:13:58,840 --> 00:14:00,840
What do we have here?

142
00:14:00,840 --> 00:14:07,640
How do you define a Massachusetts employee for the total?

143
00:14:07,640 --> 00:14:08,640
Not trying to be nasty.

144
00:14:08,640 --> 00:14:11,720
I'm just having a little difficulty with that.

145
00:14:11,720 --> 00:14:14,600
Wait, would you comment, Heather?

146
00:14:14,600 --> 00:14:22,720
Oh, meaning to say, it may not be so simple to define and say, OK, total Massachusetts-bound

147
00:14:22,720 --> 00:14:24,760
employees, that's your total.

148
00:14:24,760 --> 00:14:32,880
And then you're going to divide that by the number of cannabis-related or cannabis industry

149
00:14:32,880 --> 00:14:33,880
employees.

150
00:14:33,880 --> 00:14:34,880
Am I right?

151
00:14:34,880 --> 00:14:42,040
Because essentially, the rough idea is to just try to figure out the percentage of employees

152
00:14:42,040 --> 00:14:44,280
that are working in the cannabis industry.

153
00:14:44,280 --> 00:14:45,280
Right.

154
00:14:45,280 --> 00:14:49,400
So for cannabis, there aren't a whole lot of remote jobs.

155
00:14:49,400 --> 00:14:55,680
And I guess for me, just using Maryland as a comparison, I have difficulty even saying

156
00:14:55,680 --> 00:15:00,480
who's a Maryland employee other than those who are physically on site.

157
00:15:00,480 --> 00:15:05,600
Just because the headquarters is there, all of their managers may be on the West Coast

158
00:15:05,600 --> 00:15:07,720
somewhere as I'm finding.

159
00:15:07,720 --> 00:15:10,880
So I'm not saying that the number is going to be wrong.

160
00:15:10,880 --> 00:15:16,120
The definition of a Massachusetts employee may be more complicated than expected.

161
00:15:16,120 --> 00:15:21,680
Critical, critical, critical, critical point, Heather.

162
00:15:21,680 --> 00:15:24,200
And this is what we need more of, right?

163
00:15:24,200 --> 00:15:28,120
Because you can't just take this data at face value.

164
00:15:28,120 --> 00:15:32,800
You need to, for each and every data point, do exactly what you did.

165
00:15:32,800 --> 00:15:36,220
You just say, okay, we've got this data point here.

166
00:15:36,220 --> 00:15:37,640
What could be wrong with it?

167
00:15:37,640 --> 00:15:41,240
Well, for starters, it could easily undercount people.

168
00:15:41,240 --> 00:15:49,000
Like you said, like executives, people who are kind of gray on the books, right?

169
00:15:49,000 --> 00:15:56,320
Like, are the janitors, are all the consultants or what have you, are all

170
00:15:56,320 --> 00:15:59,000
those people on the books?

171
00:15:59,000 --> 00:16:05,760
So I would say, like you said, and then you can start to think, okay, how does that bias

172
00:16:05,760 --> 00:16:06,760
the data?

173
00:16:06,760 --> 00:16:11,400
Well, I think it would probably bias the data downwards, right?

174
00:16:11,400 --> 00:16:19,120
Not necessarily, but I think you may have an undercount of employees here.

175
00:16:19,120 --> 00:16:26,240
So I think there could be more people than this that are associated with these businesses.

176
00:16:26,240 --> 00:16:33,600
And another interesting aspect is there's probably many people that are tangentially

177
00:16:33,600 --> 00:16:36,520
related to these businesses, like consultants, right?

178
00:16:36,520 --> 00:16:43,280
Like so you may have law firms or banks that predominantly serve these licensees, but they're

179
00:16:43,280 --> 00:16:48,120
not going to be counted as employees.

180
00:16:48,120 --> 00:16:55,720
So I think I like your note, Heather, and I think we need more of that.

181
00:16:55,720 --> 00:17:05,000
So be incredibly critical of these data points because, you know, we'll do our analysis here,

182
00:17:05,000 --> 00:17:15,880
but if our data going in has measurement errors, then that's going to have consequences on

183
00:17:15,880 --> 00:17:17,960
our analysis.

184
00:17:17,960 --> 00:17:25,480
So critical point.

185
00:17:25,480 --> 00:17:28,440
I'd like to say that's not going to stop us.

186
00:17:28,440 --> 00:17:30,600
We'll do our analysis anyways.

187
00:17:30,600 --> 00:17:34,520
You just have to take that into consideration when you're looking at the analysis.

188
00:17:34,520 --> 00:17:42,440
So when we get to our final result, we'll want to take into consideration, okay, let's

189
00:17:42,440 --> 00:17:49,720
keep in mind that total employees may be biased downwards or some of these other figures,

190
00:17:49,720 --> 00:17:53,740
you know, they may have measurement errors of their own.

191
00:17:53,740 --> 00:18:01,400
So these are things that factor into our conclusions and results.

192
00:18:01,400 --> 00:18:05,160
So every bit is worth pointing out.

193
00:18:05,160 --> 00:18:16,700
So just to keep steamrolling here, we have 15 variables.

194
00:18:16,700 --> 00:18:22,580
So that's 15 sources of potential measurement error.

195
00:18:22,580 --> 00:18:28,280
So every single one of these probably has their own flaws.

196
00:18:28,280 --> 00:18:36,680
So but nonetheless, we've got these data points here and to introduce even more uncertainty

197
00:18:36,680 --> 00:18:44,000
and measurement error, but also to add some more insight as well, we can supplement this

198
00:18:44,000 --> 00:18:45,320
data.

199
00:18:45,320 --> 00:18:55,040
And so we'll be supplementing it with data from the Federal Reserve here.

200
00:18:55,040 --> 00:19:00,720
So yes, that's right, we were looking at the number of employees.

201
00:19:00,720 --> 00:19:07,920
So once again, these numbers admittedly, and you know, people have done studies, there

202
00:19:07,920 --> 00:19:11,840
are undoubtedly measurement errors here.

203
00:19:11,840 --> 00:19:21,600
So take them into consideration, but we can still just start getting a rough measure on

204
00:19:21,600 --> 00:19:26,520
the market, the size of the market and its impact.

205
00:19:26,520 --> 00:19:31,620
Because a rough measure is better than no measure.

206
00:19:31,620 --> 00:19:38,960
So we'll grab the number of employees here in Massachusetts.

207
00:19:38,960 --> 00:19:46,560
So note that this is going to be in thousands of people.

208
00:19:46,560 --> 00:20:00,600
So without further ado, we can get the number of Massachusetts employees here with FRED.

209
00:20:00,600 --> 00:20:05,320
Note, I'll show you specifically what I'm doing here.

210
00:20:05,320 --> 00:20:13,080
So first things first, we need FRED.

211
00:20:13,080 --> 00:20:19,080
So we've imported the packages.

212
00:20:19,080 --> 00:20:23,600
And then we need to initialize the API.

213
00:20:23,600 --> 00:20:29,040
So essentially just create a little client here that remembers our API key and makes

214
00:20:29,040 --> 00:20:36,520
requests on our behalf to the website, because once again, just like before, right, we're

215
00:20:36,520 --> 00:20:45,920
using one API, the Socrata API, we'll also use the FRED API, which is once again,

216
00:20:45,920 --> 00:20:52,000
just a URL that we ping and get some data back.

217
00:20:52,000 --> 00:21:01,400
Specifically, we're going to ping Federal Reserve.

218
00:21:01,400 --> 00:21:09,480
We want to real quick, I'm defining the start of our data here, right, because here's the

219
00:21:09,480 --> 00:21:16,800
Federal Reserve data set, they're going all the way back to before 1992.

220
00:21:16,800 --> 00:21:17,800
That's awesome.

221
00:21:17,800 --> 00:21:24,160
We don't necessarily need that data at this moment.

222
00:21:24,160 --> 00:21:28,160
So we'll define a start date here.

223
00:21:28,160 --> 00:21:36,600
And like we noted earlier, our start date is October 15, 2018.

224
00:21:36,600 --> 00:21:40,720
Next, we need a data set ID.

225
00:21:40,720 --> 00:21:44,000
Notice here, they're all over the place, but there's one right there.

226
00:21:44,000 --> 00:21:46,440
You can also get it from the URL.

227
00:21:46,440 --> 00:21:51,040
So we'll get this data set ID.

228
00:21:51,040 --> 00:22:00,600
Then we'll read the employees here.
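For anyone following along without a client library, the same request can be sketched as a plain URL against the FRED observations endpoint. This is an illustrative alternative, not the client used in the talk; `SERIES_ID` and `YOUR_API_KEY` are placeholders for the data set ID from the FRED page and your own key.

```python
from urllib.parse import urlencode

FRED_OBSERVATIONS_URL = 'https://api.stlouisfed.org/fred/series/observations'


def build_fred_url(series_id, api_key, start_date):
    """Build a FRED request for one series, starting at `start_date`."""
    params = {
        'series_id': series_id,           # the data set ID from the FRED page
        'api_key': api_key,               # your personal API key
        'file_type': 'json',              # ask for JSON rather than XML
        'observation_start': start_date,  # YYYY-MM-DD
    }
    return f'{FRED_OBSERVATIONS_URL}?{urlencode(params)}'


# Placeholders only; start where the cannabis data begins.
url = build_fred_url('SERIES_ID', 'YOUR_API_KEY', '2018-10-15')
```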

229
00:22:00,600 --> 00:22:03,880
And let's see, but yes, that's right.

230
00:22:03,880 --> 00:22:10,000
And this is what you'll notice with data science.

231
00:22:10,000 --> 00:22:17,160
There's these small little things that are important and you can get hung up on them.

232
00:22:17,160 --> 00:22:19,720
But here I've just done a little scouting.

233
00:22:19,720 --> 00:22:24,680
And so I'll show you a couple things that may seem simple, but they'll make the analysis

234
00:22:24,680 --> 00:22:26,440
go a lot better.

235
00:22:26,440 --> 00:22:29,960
So here we have the total Massachusetts employees.

236
00:22:29,960 --> 00:22:33,040
We have it by the month.

237
00:22:33,040 --> 00:22:35,920
So that is awesome.

238
00:22:35,920 --> 00:22:38,500
Right.

239
00:22:38,500 --> 00:22:40,800
And wow, look at that.

240
00:22:40,800 --> 00:22:49,280
So keep in mind, this is thousands of employees.

241
00:22:49,280 --> 00:22:55,080
So here we're going to multiply that by a thousand in just one second.

242
00:22:55,080 --> 00:23:02,120
But you'll notice, wow, like we noticed, we noted this last meetup.

243
00:23:02,120 --> 00:23:07,680
Yeah, something happened in April of 2020.

244
00:23:07,680 --> 00:23:13,760
So that's going to be showing in a lot of data sets.

245
00:23:13,760 --> 00:23:26,760
So there's a regime or they go by so many names.

246
00:23:26,760 --> 00:23:36,320
It depends on what field you're really studying, maybe macroeconomics, like a structural break,

247
00:23:36,320 --> 00:23:38,440
a shock.

248
00:23:38,440 --> 00:23:45,440
So anyways, anyways, we're focusing on the data here.

249
00:23:45,440 --> 00:24:05,920
So this little trick is basically if you look at our production data here, right, we've

250
00:24:05,920 --> 00:24:08,760
got daily data.

251
00:24:08,760 --> 00:24:16,800
And so basically we want to compare daily to monthly.

252
00:24:16,800 --> 00:24:30,480
So we're going to need to aggregate our production data into monthly data, which is not the end

253
00:24:30,480 --> 00:24:31,700
of the world here.

254
00:24:31,700 --> 00:24:35,440
So here we've already set our index.

255
00:24:35,440 --> 00:24:40,560
So we can just create a monthly average, right?

256
00:24:40,560 --> 00:24:44,160
Because how do you want to aggregate?

257
00:24:44,160 --> 00:24:51,640
So there are two ways, predominantly, that we'll be using in this demonstration

258
00:24:51,640 --> 00:24:53,500
to aggregate.

259
00:24:53,500 --> 00:24:55,760
You can aggregate by averaging.

260
00:24:55,760 --> 00:24:58,600
So what's the average during the month?

261
00:24:58,600 --> 00:25:04,320
So here we'll do that with total employees, which is the average number of employees during

262
00:25:04,320 --> 00:25:06,280
a given month.

263
00:25:06,280 --> 00:25:10,840
You could also do a sum, which we'll be doing later with sales.

264
00:25:10,840 --> 00:25:15,340
And then that would just be what's the total during a given month.

265
00:25:15,340 --> 00:25:19,360
So it's the total sales during a given month.

266
00:25:19,360 --> 00:25:25,440
Here we're looking at averages.
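The two ways of aggregating can be sketched like this, assuming pandas and a tiny made-up daily table (the column names and numbers are hypothetical). Grouping on month-end dates is one way to do it; it is not necessarily the helper used in the talk.

```python
import pandas as pd

# Made-up daily values spanning two months.
dates = pd.to_datetime(['2021-01-30', '2021-01-31', '2021-02-01', '2021-02-02'])
daily = pd.DataFrame(
    {
        'total_employees': [10, 12, 14, 16],
        'sales': [100.0, 200.0, 300.0, 400.0],
    },
    index=dates,
)

# Roll every date forward to the last day of its month.
month_end = daily.index + pd.offsets.MonthEnd(0)

# A stock, like employees, is averaged over the month.
monthly_employees = daily['total_employees'].groupby(month_end).mean()

# A flow, like sales, is summed over the month.
monthly_sales = daily['sales'].groupby(month_end).sum()
```

The rule of thumb here is that levels measured at a point in time are averaged, while amounts that accumulate over time are totaled.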

267
00:25:25,440 --> 00:25:35,200
So for example, we can now look at, okay, what's the monthly average number of employees

268
00:25:35,200 --> 00:25:39,580
in the cannabis industry in Massachusetts?

269
00:25:39,580 --> 00:25:46,680
And we can see our monthly averages here.

270
00:25:46,680 --> 00:25:55,040
Real quick, the cool little note here is if you actually look

271
00:25:55,040 --> 00:26:05,960
at these two plots together, you'll notice that our monthly average is essentially, you

272
00:26:05,960 --> 00:26:12,160
know, it's not quite, because it changes by month, right?

273
00:26:12,160 --> 00:26:17,140
Different months have different numbers of days, but it's roughly a 30-day moving

274
00:26:17,140 --> 00:26:18,700
average.

275
00:26:18,700 --> 00:26:29,280
So that's what you see with the moving averages, the moving average tends to smooth out the

276
00:26:29,280 --> 00:26:31,280
daily series.

277
00:26:31,280 --> 00:26:39,660
So anyways, quick little aside that I thought was interesting.

278
00:26:39,660 --> 00:26:45,240
So anyways, we have the monthly average here.

279
00:26:45,240 --> 00:26:49,160
So now we can actually compare that to the total employees.

280
00:26:49,160 --> 00:26:51,160
Oh yes, and that's right.

281
00:26:51,160 --> 00:26:52,520
I was explaining that.

282
00:26:52,520 --> 00:27:05,040
Oh, if we look at our monthly data here, the way I tend to conceptualize things is to timestamp

283
00:27:05,040 --> 00:27:08,080
them at the end of the month, right?

284
00:27:08,080 --> 00:27:17,080
Because if you're talking about the average number of employees in October of 2018, for

285
00:27:17,080 --> 00:27:25,600
me it conceptually makes a bit more sense to timestamp that as October 31st versus October

286
00:27:25,600 --> 00:27:30,320
1st, but that's entirely a preference.

287
00:27:30,320 --> 00:27:42,680
Well in this, for this analysis, it's entirely a preference, but it matters for the programming.

288
00:27:42,680 --> 00:27:53,540
So long story short, I'm using this function here to basically just take our, so what we

289
00:27:53,540 --> 00:28:06,360
have is, all I'm doing is taking our total Massachusetts employees and changing the timestamp

290
00:28:06,360 --> 00:28:14,240
from the beginning of the month to the end of the month.

291
00:28:14,240 --> 00:28:23,400
That way, you know, that way we have the same data, but it's just timestamped at the end of

292
00:28:23,400 --> 00:28:24,400
the month.

293
00:28:24,400 --> 00:28:27,200
So that way we're comparing apples to apples.

294
00:28:27,200 --> 00:28:29,800
We're saying, okay, we're comparing this month to this month.

295
00:28:29,800 --> 00:28:32,440
It's real clear what we're doing here.

296
00:28:32,440 --> 00:28:36,560
Furthermore, we want to compare the same units.

297
00:28:36,560 --> 00:28:44,220
So changing thousands of people to number of people.

298
00:28:44,220 --> 00:28:48,840
So we can do that simply enough.
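Both adjustments, moving the timestamp to the end of the month and converting thousands of people to people, might be sketched as follows. This assumes pandas, and the values are made up, chosen only to be near the roughly 3.5 million total discussed in the talk.

```python
import pandas as pd

# Made-up monthly values, stamped at the start of each month,
# measured in thousands of people.
state_employees = pd.Series(
    [3500.0, 3510.0],
    index=pd.to_datetime(['2018-10-01', '2018-11-01']),
)

# Move each timestamp from the start of the month to the end of it.
state_employees.index = state_employees.index + pd.offsets.MonthEnd(0)

# Convert thousands of people to number of people.
state_employees = state_employees * 1000
```

With both series stamped at month end and measured in people, they can be lined up row by row.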

299
00:28:48,840 --> 00:28:53,200
And then we can actually look at our employees here.

300
00:28:53,200 --> 00:29:04,880
So we can say, here's our monthly number of employees.

301
00:29:04,880 --> 00:29:08,760
I think we've already done this.

302
00:29:08,760 --> 00:29:12,400
Exactly.

303
00:29:12,400 --> 00:29:17,720
Then we can plot that with the total number of Massachusetts employees.

304
00:29:17,720 --> 00:29:26,360
And okay, right off the bat, what's this?

305
00:29:26,360 --> 00:29:34,280
Like is this like 3 million some employees in Massachusetts?

306
00:29:34,280 --> 00:29:37,800
Exactly.

307
00:29:37,800 --> 00:29:45,840
So you've got about 3.5 million total employees in Massachusetts.

308
00:29:45,840 --> 00:29:52,840
So quite a large number.

309
00:29:52,840 --> 00:30:00,600
Seems like too many, doesn't it?

310
00:30:00,600 --> 00:30:05,680
Real quick, let's just grab the population of Massachusetts real quick.

311
00:30:05,680 --> 00:30:12,520
So here I'm just grabbing the population of Massachusetts.

312
00:30:12,520 --> 00:30:15,560
So what was the population here?

313
00:30:15,560 --> 00:30:19,600
Total says 6.893 million as of 2019.

314
00:30:19,600 --> 00:30:23,520
Okay, that's more reasonable then.

315
00:30:23,520 --> 00:30:39,840
So you've got, well, we could do the math of the percentage of people actually working.

316
00:30:39,840 --> 00:30:43,480
We may do that here in one second.

317
00:30:43,480 --> 00:30:46,120
So let's just do it right now.

318
00:30:46,120 --> 00:30:47,120
That's what we're all about.

319
00:30:47,120 --> 00:31:06,080
The reason I was hesitant is we're going to have to

320
00:31:06,080 --> 00:31:30,480
create the annual number of Massachusetts employees, which we just want to do an average.

321
00:31:30,480 --> 00:31:36,800
If this works out well, we can do it.

322
00:31:36,800 --> 00:31:41,040
Awesome.

323
00:31:41,040 --> 00:31:44,920
So now we have the annual number of Massachusetts employees.

324
00:31:44,920 --> 00:31:51,880
And then we can just divide that by the population.

325
00:31:51,880 --> 00:31:56,400
Not quite.

326
00:31:56,400 --> 00:32:05,000
Wait one second.

327
00:32:05,000 --> 00:32:06,560
Let's look at these two data sets here.

328
00:32:06,560 --> 00:32:10,520
We've got the number of employees.

329
00:32:10,520 --> 00:32:13,520
We've got the population.

330
00:32:13,520 --> 00:32:22,800
Oh yes, that's right.

331
00:32:22,800 --> 00:32:37,640
It's just this sort of weird pandas thing going on here.

332
00:32:37,640 --> 00:32:41,920
So there we are.

333
00:32:41,920 --> 00:32:50,520
So roughly 50% of the population is working at any given time.

334
00:32:50,520 --> 00:32:58,760
And keep in mind, this is different than the unemployment rate, because the population

335
00:32:58,760 --> 00:33:01,400
includes children.

336
00:33:01,400 --> 00:33:04,120
It includes retired people.

337
00:33:04,120 --> 00:33:11,280
And children and retired people and people that aren't in the labor force, they're not

338
00:33:11,280 --> 00:33:14,960
included when you're calculating unemployment.

339
00:33:14,960 --> 00:33:21,160
So this is a different statistic here.

340
00:33:21,160 --> 00:33:24,800
So I just wanted to do it more for a sanity check, right?

341
00:33:24,800 --> 00:33:32,300
Because you can't have more people working than there are living in the state.

342
00:33:32,300 --> 00:33:38,240
So I was more doing it for a sanity check.
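The sanity check itself is simple arithmetic. A sketch using the two round figures quoted above, about 3.5 million employees and a population of 6.893 million as of 2019:

```python
employees = 3_500_000   # about 3.5 million, as read off the plot
population = 6_893_000  # 6.893 million as of 2019

share_working = employees / population

# There cannot be more people working than living in the state.
assert share_working <= 1, 'More workers than residents: check the units.'

print(f'Share of the population employed: {share_working:.1%}')
```

If the thousands-to-people conversion had been skipped or applied twice, this ratio would be off by a factor of a thousand, which is exactly what the check is meant to catch.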

343
00:33:38,240 --> 00:33:43,440
So anyways, we've got the total number of employees as we saw.

344
00:33:43,440 --> 00:33:47,040
Oh yes, I wanted to comment on this real quick.

345
00:33:47,040 --> 00:33:54,520
So we've got the total number of employees here in Massachusetts.

346
00:33:54,520 --> 00:34:04,120
And if you look at the cannabis industry, you'll see how they really demonstrate how

347
00:34:04,120 --> 00:34:12,700
the cannabis industry proved to be, well, they were defined as an essential industry

348
00:34:12,700 --> 00:34:14,480
during the pandemic.

349
00:34:14,480 --> 00:34:20,020
And it definitely has an effect here on the employment here.

350
00:34:20,020 --> 00:34:30,760
So you'll notice in August of 2020, or around that time frame, that you do see a slight

351
00:34:30,760 --> 00:34:34,240
dip in employment.

352
00:34:34,240 --> 00:34:37,720
However, it remains almost constant.

353
00:34:37,720 --> 00:34:44,360
So it's almost like in the cannabis industry, there is essentially a hiring freeze, not

354
00:34:44,360 --> 00:34:52,360
a lot of new hires, or at least any new hires were essentially offset by anyone leaving.

355
00:34:52,360 --> 00:35:06,680
It looks like there were a handful of people that, for lack of a better word, quit being employees

356
00:35:06,680 --> 00:35:07,680
during that time.

357
00:35:07,680 --> 00:35:18,200
Maybe they lost their jobs, maybe they resigned, who knows, maybe they got fired or laid off.

358
00:35:18,200 --> 00:35:20,280
What about homegrown?

359
00:35:20,280 --> 00:35:22,560
Maybe they took up homegrown.

360
00:35:22,560 --> 00:35:24,520
Seriously, it's happening.

361
00:35:24,520 --> 00:35:26,240
People are moving for that reason.

362
00:35:26,240 --> 00:35:27,240
Thank you.

363
00:35:27,240 --> 00:35:33,360
Oh yes, and that may actually be what this crowd is.

364
00:35:33,360 --> 00:35:36,760
So these people are already in the cannabis industry here.

365
00:35:36,760 --> 00:35:44,160
Yes, people may be transitioning from banking or what have you.

366
00:35:44,160 --> 00:35:55,480
They may be leaving their everyday retail jobs to do exactly that, to home grow perhaps.

367
00:35:55,480 --> 00:36:03,040
We can only conjecture as to what the reasons are or what is underlying this data.

368
00:36:03,040 --> 00:36:12,160
However, I think it's interesting to remark and to note that the cannabis industry was

369
00:36:12,160 --> 00:36:21,760
quite resilient to job loss during the pandemic, which remember last week we were talking about

370
00:36:21,760 --> 00:36:31,880
how regulations often affect the structure and performance and outcomes of various industries.

371
00:36:31,880 --> 00:36:39,400
And so simply just saying, okay, this industry is essential.

372
00:36:39,400 --> 00:36:43,240
Well that likely has an impact here.

373
00:36:43,240 --> 00:36:46,840
And so, you know, does that explain everything?

374
00:36:46,840 --> 00:36:54,400
No, like there's always a multitude of factors here.

375
00:36:54,400 --> 00:37:01,300
So I don't know, I just, so this is something that's been observed, not just by me, a lot

376
00:37:01,300 --> 00:37:07,360
of people have observed this, that, you know, the cannabis industry was resilient during

377
00:37:07,360 --> 00:37:09,960
the pandemic.

378
00:37:09,960 --> 00:37:18,560
And the data shows that here in Massachusetts where the employment levels pretty much, they

379
00:37:18,560 --> 00:37:24,840
stayed constant during that time; there wasn't a dramatic drop.

380
00:37:24,840 --> 00:37:34,760
And then, you know, basically what we see here is that the trend resumes

381
00:37:34,760 --> 00:37:39,680
to about what the trend was, but perhaps maybe a little more volatility here.

382
00:37:39,680 --> 00:37:47,840
However, for the overall level, it looks like there was an entire shift in the curve.

383
00:37:47,840 --> 00:37:54,960
So it looks like, okay, there was the shock, the adjustment, steady state growth is now

384
00:37:54,960 --> 00:38:02,720
back on track, but there was a whole shift down in employment.

385
00:38:02,720 --> 00:38:14,800
Whereas in the cannabis industry, there was the shock, not much of a shock, at least to employment,

386
00:38:14,800 --> 00:38:24,120
and then it resumes steady state growth with, you know, just a slight, slight shift in the

387
00:38:24,120 --> 00:38:27,240
labor curve.

388
00:38:27,240 --> 00:38:36,000
So I think there's more insights that can be taken away here, but we're just going to

389
00:38:36,000 --> 00:38:40,960
kind of keep moving along here because there's a lot of ground to cover.

390
00:38:40,960 --> 00:38:47,480
So just to look at another statistic here.

391
00:38:47,480 --> 00:38:57,400
Oh yes, we were just going to start to gauge, okay, cannabis industry is resilient.

392
00:38:57,400 --> 00:38:58,920
Is it meaningful, right?

393
00:38:58,920 --> 00:39:02,040
What percent of the economy is this?

394
00:39:02,040 --> 00:39:06,160
Is it a noteworthy segment of the economy, right?

395
00:39:06,160 --> 00:39:11,000
Because like I said, if you talk to people in the cannabis industry, they'll make you

396
00:39:11,000 --> 00:39:16,000
believe that, you know, the cannabis industry is propping up the entire economy.

397
00:39:16,000 --> 00:39:24,800
Let's see how true that is, or at least try to quantify.

398
00:39:24,800 --> 00:39:30,060
So I think this is interesting data here.

399
00:39:30,060 --> 00:39:40,000
So here you have just the cannabis employees as a percent of all employees.
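
A minimal sketch of that share calculation, assuming two monthly series that share the same dates. The employee counts are made-up placeholders, not the Massachusetts figures, and `percent_share` is a hypothetical helper name.

```python
import pandas as pd

# Hypothetical month-end observations (placeholder numbers, not real data).
dates = pd.to_datetime(["2021-06-30", "2021-07-31", "2021-08-31"])
cannabis_employees = pd.Series([8_000, 8_500, 9_000], index=dates)
total_employees = pd.Series([3_400_000, 3_420_000, 3_450_000], index=dates)


def percent_share(part: pd.Series, whole: pd.Series) -> pd.Series:
    """Return `part` as a percent of `whole`, aligned on the shared index."""
    return 100 * part / whole


cannabis_share = percent_share(cannabis_employees, total_employees)

# Sanity check in the spirit of the talk: a sector cannot employ
# more people than the whole economy does.
assert (cannabis_employees <= total_employees).all()

print(cannabis_share.round(3))
```

Dividing two pandas Series aligns them on their index first, so a date missing from either series shows up as NaN rather than silently shifting the ratio.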

400
00:39:40,000 --> 00:39:46,120
And so obviously they were 0% to begin with.

401
00:39:46,120 --> 00:39:52,400
And keep in mind, and so this is where we're going to start talking about the economic

402
00:39:52,400 --> 00:39:55,320
impact of permitting cannabis, right?

403
00:39:55,320 --> 00:39:58,040
Because was it actually zero?

404
00:39:58,040 --> 00:40:04,440
So we've talked about this in the past: whenever you run up against a zero bound, well,

405
00:40:04,440 --> 00:40:08,440
that may mean you're not capturing things on the other side of the bound.

406
00:40:08,440 --> 00:40:13,340
And what's on the other side of the bound here is you may have people that are working

407
00:40:13,340 --> 00:40:17,240
in illegal cannabis markets, right?

408
00:40:17,240 --> 00:40:24,840
So people doing illegal grows, people doing illegal processing, people doing illegal retail.

409
00:40:24,840 --> 00:40:28,680
And that's not going to be captured.

410
00:40:28,680 --> 00:40:31,960
That's not measured economic activity.

411
00:40:31,960 --> 00:40:37,960
That's typically not taxed economic activity.

412
00:40:37,960 --> 00:40:46,520
And it has its own debatable social costs, right?

413
00:40:46,520 --> 00:40:48,920
So this is debatable, right?

414
00:40:48,920 --> 00:40:57,520
Because you don't want to be tossing good people in jail for doing some activity that's

415
00:40:57,520 --> 00:40:59,080
now legal today, right?

416
00:40:59,080 --> 00:41:03,080
And so that's why there's these no...

417
00:41:03,080 --> 00:41:06,760
I want to give a shout out to them, but I don't know their specific names, but there's...

418
00:41:06,760 --> 00:41:14,480
I'll have to put some links in the chat or something, but there's movements out there

419
00:41:14,480 --> 00:41:20,400
to try to help people that were in prison for cannabis crimes that were nonviolent,

420
00:41:20,400 --> 00:41:25,400
but today would be perfectly permissible.

421
00:41:25,400 --> 00:41:31,520
But that's not to overshadow that whenever things are illegal, it does attract kind of

422
00:41:31,520 --> 00:41:33,280
shady people sometimes.

423
00:41:33,280 --> 00:41:40,720
So not saying that everybody's bad, but unfortunately when things are illegal, right?

424
00:41:40,720 --> 00:41:45,840
That's when you kind of run into the organized crime and some bad actors.

425
00:41:45,840 --> 00:41:52,640
So maybe some decent people get lumped in with some bad characters.

426
00:41:52,640 --> 00:41:57,080
So long story short, if things can be permitted, then...

427
00:41:57,080 --> 00:42:02,880
And this is where you kind of get into the judgment calls, but in my opinion, if things

428
00:42:02,880 --> 00:42:09,280
are permitted, then you can at least put them out under the light.

429
00:42:09,280 --> 00:42:20,320
You can let things operate, at least they can get measured and regulated and operate

430
00:42:20,320 --> 00:42:22,320
under this regulatory system.

431
00:42:22,320 --> 00:42:26,160
But long story short, that's a little bit of a spiel there.

432
00:42:26,160 --> 00:42:33,400
Didn't mean to make any opinionated statements because we're focused here on the data.

433
00:42:33,400 --> 00:42:38,800
So anyways, back to the data.

434
00:42:38,800 --> 00:42:45,920
We've got about a quarter of a percent of people today working in the cannabis industry

435
00:42:45,920 --> 00:42:48,200
in Massachusetts.

436
00:42:48,200 --> 00:42:51,860
So I think this is a non-negligible portion.

437
00:42:51,860 --> 00:42:55,960
It seems small, but I think this is larger than you would think.

438
00:42:55,960 --> 00:43:01,200
So I think it would be interesting to compare to other professions.

439
00:43:01,200 --> 00:43:09,960
So if we had more time, I would get the data on how many teachers are there in Massachusetts,

440
00:43:09,960 --> 00:43:12,200
how many...

441
00:43:12,200 --> 00:43:15,200
I think people have done this, how many fast food workers are there.

442
00:43:15,200 --> 00:43:19,360
And so you can kind of compare how many fast food workers to how many people working in

443
00:43:19,360 --> 00:43:24,320
cannabis to kind of get some perspective, tell a story.

444
00:43:24,320 --> 00:43:25,960
So that would be beneficial.

445
00:43:25,960 --> 00:43:31,840
We're time strapped today, limited on time, so we're just going to keep moving.

446
00:43:31,840 --> 00:43:37,000
But I think it's worth pointing out a couple of things real quick.

447
00:43:37,000 --> 00:43:38,940
It's increasing.

448
00:43:38,940 --> 00:43:46,760
It increased faster during this recession here in 2020.

449
00:43:46,760 --> 00:43:54,880
And the reason I say recession is because the Federal Reserve has flagged this period

450
00:43:54,880 --> 00:43:58,600
as a recession; that's what these gray bars are here.

451
00:43:58,600 --> 00:44:09,400
So the Federal Reserve has designated this quarter in 2020 as a recession, I believe.

452
00:44:09,400 --> 00:44:12,840
If I'm wrong, please let me know.

453
00:44:12,840 --> 00:44:18,640
But anyways, we've got the total number of employees here.

454
00:44:18,640 --> 00:44:22,440
Here, one second.

455
00:44:22,440 --> 00:44:34,120
We may have somebody joining us real quick.

456
00:44:34,120 --> 00:44:41,600
So welcome, Hentoko.

457
00:44:41,600 --> 00:44:45,840
We've got about 15 minutes here to finish up some interesting analysis.

458
00:44:45,840 --> 00:44:48,920
So you're just in time for some good takeaways.

459
00:44:48,920 --> 00:44:53,720
And then I'd be happy to talk with you some more.

460
00:44:53,720 --> 00:45:03,040
So long story short, we're looking here at the GDP in Massachusetts, and we're comparing

461
00:45:03,040 --> 00:45:10,320
that to, okay, what's the economic output here in the cannabis industry?

462
00:45:10,320 --> 00:45:18,880
We just saw that about a quarter of a percent of people here work in the cannabis industry.

463
00:45:18,880 --> 00:45:23,480
And that may be understated because as Heather pointed out, the total number of employees

464
00:45:23,480 --> 00:45:30,080
here may be understated.

465
00:45:30,080 --> 00:45:38,200
One other thing: well, you can kind of correlate these two, so we may as well just calculate this statistic.

466
00:45:38,200 --> 00:45:40,360
They're negatively correlated.

467
00:45:40,360 --> 00:45:57,520
So it's hard to interpret this, but basically, as the total employment in Massachusetts decreases,

468
00:45:57,520 --> 00:46:03,480
you actually have employment increasing in the cannabis industry, which is an interesting

469
00:46:03,480 --> 00:46:04,480
observation.
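
A rough sketch of the correlation step, assuming two aligned employment series. The values are invented so that one series falls while the other rises; they are not the data on screen.

```python
import pandas as pd

# Hypothetical aligned observations (placeholder numbers, not real data).
total_employment = pd.Series([3.60, 3.50, 3.45, 3.30, 3.20])  # millions
cannabis_employment = pd.Series([5.0, 6.0, 7.0, 8.0, 9.0])    # thousands

# Pearson correlation between the two series.
correlation = total_employment.corr(cannabis_employment)
print(round(correlation, 3))
```

One caution with this kind of statistic: two series that each trend over time can show a strong correlation in levels even when they are unrelated, so correlating period-over-period changes is a common cross-check.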

470
00:46:04,480 --> 00:46:08,240
So anyways, to GDP real quick.

471
00:46:08,240 --> 00:46:13,600
So in this last 15 minutes, we can do some pretty incredible stuff here.

472
00:46:13,600 --> 00:46:18,280
So we can measure GDP.

473
00:46:18,280 --> 00:46:20,200
So what is GDP?

474
00:46:20,200 --> 00:46:27,120
Well, that's the whole economic output in an economy.

475
00:46:27,120 --> 00:46:33,680
In this case, we're talking about the cannabis sector, and we can simply proxy the whole

476
00:46:33,680 --> 00:46:38,120
economic output as consumption.

477
00:46:38,120 --> 00:46:42,560
So we can actually measure cannabis consumption.

478
00:46:42,560 --> 00:46:46,400
Keep in mind, I'm simplifying this here.

479
00:46:46,400 --> 00:46:56,920
There are some good resources that discuss, okay, what is GDP?

480
00:46:56,920 --> 00:47:00,280
Let's see.

481
00:47:00,280 --> 00:47:03,120
Right.

482
00:47:03,120 --> 00:47:11,680
So in general, right, it's consumption, investment, government expenditures, and then net exports.

483
00:47:11,680 --> 00:47:15,320
So but right, we're just going to proxy things as consumption.

484
00:47:15,320 --> 00:47:19,440
And this is the way you see things done in macroeconomics a lot.

485
00:47:19,440 --> 00:47:24,240
We can abstract away the government spending and the investment, and there's not going

486
00:47:24,240 --> 00:47:25,960
to be any net exports.

487
00:47:25,960 --> 00:47:31,240
So we can abstract those away for the time being and just say, okay, GDP is going to

488
00:47:31,240 --> 00:47:37,800
be consumption, which is going to be sales.
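
Written out, the simplification described here looks roughly like the following. The first line is the standard expenditure identity; the second is this talk's proxy for the cannabis sector, where the symbol names are notation chosen for this sketch.

```latex
\[
\begin{aligned}
\text{GDP} &= C + I + G + NX \\
\text{GDP}_{\text{cannabis}} &\approx C_{\text{cannabis}} = \text{retail cannabis sales}
\end{aligned}
\]
```

Here $C$ is consumption, $I$ is investment, $G$ is government expenditures, and $NX$ is net exports. The proxy sets the $I$, $G$, and $NX$ terms aside for the cannabis sector.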

489
00:47:37,800 --> 00:47:39,640
Here I am calculating sales.

490
00:47:39,640 --> 00:47:53,800
I noticed that sales total is actually a cumulative total.

491
00:47:53,800 --> 00:47:57,200
That's not what daily data should look like.

492
00:47:57,200 --> 00:48:05,320
So we can actually take the difference here to calculate our daily data.
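
A minimal sketch of that differencing step, assuming the reported column is a running total. The dates and values are placeholders.

```python
import pandas as pd

# Hypothetical cumulative sales totals (placeholder numbers).
dates = pd.date_range("2021-09-20", periods=5, freq="D")
cumulative_sales = pd.Series([100.0, 130.0, 175.0, 175.0, 240.0], index=dates)

# The first difference of a running total recovers the per-day amount.
# The first day has no prior observation, so it comes out as NaN and is dropped.
daily_sales = cumulative_sales.diff().dropna()

print(daily_sales)
```

A negative value after differencing would mean the "cumulative" column went down at some point, which is worth flagging as a possible data correction or error.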

493
00:48:05,320 --> 00:48:16,440
And so now if you just plot sales, there is this one outlier, which I'm quite worried

494
00:48:16,440 --> 00:48:17,440
about.

495
00:48:17,440 --> 00:48:23,080
And so I think we should come and revisit what happened on this day.

496
00:48:23,080 --> 00:48:26,920
This could be a miscoding, a measurement error.

497
00:48:26,920 --> 00:48:30,800
So this is a problem.

498
00:48:30,800 --> 00:48:38,880
In fact, I'd probably exclude this one day, but for the time being, it doesn't overly affect

499
00:48:38,880 --> 00:48:40,600
our analysis here.

500
00:48:40,600 --> 00:48:46,600
But I think it should be pointed out and it needs to be investigated further.

501
00:48:46,600 --> 00:48:51,320
And our analysis probably needs to be redone with this one observation excluded because

502
00:48:51,320 --> 00:48:57,320
it's atypical.
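
One way to flag and exclude a day like that is sketched below. This is not the method used in the session; it assumes a simple rule based on the median absolute deviation, and the threshold multiplier of 10 is an arbitrary illustrative choice.

```python
import pandas as pd

# Hypothetical daily sales in millions of dollars, with one suspicious spike.
daily_sales = pd.Series([3.1, 2.9, 3.4, 3.0, 45.0, 3.2, 3.3])

# Robust center and spread: the median and the median absolute deviation (MAD).
median = daily_sales.median()
mad = (daily_sales - median).abs().median()

# Flag days that sit far from the median relative to the typical deviation.
threshold = 10 * mad  # assumption: the multiplier is a judgment call
is_outlier = (daily_sales - median).abs() > threshold

cleaned_sales = daily_sales[~is_outlier]
print(daily_sales[is_outlier])
```

The median and MAD are used instead of the mean and standard deviation because a single extreme day barely moves them, so the outlier cannot hide itself by inflating the spread.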

503
00:48:57,320 --> 00:49:07,880
So for example, if we just look at the last 100 observations here, you'll see, okay, this

504
00:49:07,880 --> 00:49:14,800
is more typical for what daily data looks like, where we have fluctuations, right?

505
00:49:14,800 --> 00:49:21,040
Fridays may be busier than Mondays or Tuesdays or what have you.

506
00:49:21,040 --> 00:49:25,120
Which is a whole other interesting analysis of its own, right?

507
00:49:25,120 --> 00:49:28,480
I'm a big fan of looking at the days of the week.

508
00:49:28,480 --> 00:49:36,160
So exactly like I just said, how do sales on Friday compare to Monday or Tuesday?
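
A small sketch of that day-of-week comparison, assuming a daily sales series with a datetime index. The two weeks of values are placeholders.

```python
import pandas as pd

# Hypothetical daily sales for two weeks; 2021-09-06 is a Monday.
dates = pd.date_range("2021-09-06", periods=14, freq="D")
daily_sales = pd.Series(
    [2.0, 2.1, 2.2, 2.5, 3.5, 3.2, 2.4,
     2.2, 2.3, 2.2, 2.7, 3.7, 3.0, 2.6],
    index=dates,
)

# Average sales for each day of the week, shown in calendar order.
weekday_order = ["Monday", "Tuesday", "Wednesday", "Thursday",
                 "Friday", "Saturday", "Sunday"]
average_by_day = (
    daily_sales.groupby(daily_sales.index.day_name()).mean().reindex(weekday_order)
)

print(average_by_day)
```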

509
00:49:36,160 --> 00:49:38,160
In fact, we could probably do that next week.

510
00:49:38,160 --> 00:49:41,840
We'll probably do that in not that much time.

511
00:49:41,840 --> 00:49:44,160
But like we said, we're strapped on time here.

512
00:49:44,160 --> 00:49:50,760
So we're going to try to stay focused and save some of these cool analyses for the future.

513
00:49:50,760 --> 00:49:58,560
We've got sales here, daily data.

514
00:49:58,560 --> 00:50:11,480
So we can aggregate this into monthly and quarterly data, which we'll do.
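
A minimal sketch of the aggregation, assuming a daily series with a datetime index. Each day is set to 1.0 so the sums are easy to check by counting days.

```python
import pandas as pd

# Hypothetical daily sales: 1.0 per day for the first half of 2021.
dates = pd.date_range("2021-01-01", "2021-06-30", freq="D")
daily_sales = pd.Series(1.0, index=dates)

# Group the days by calendar month and by calendar quarter, then sum.
monthly_sales = daily_sales.groupby(daily_sales.index.to_period("M")).sum()
quarterly_sales = daily_sales.groupby(daily_sales.index.to_period("Q")).sum()

print(monthly_sales)
print(quarterly_sales)
```

Grouping by period rather than resampling is just one option; it keeps the result labeled by month or quarter instead of by a specific timestamp.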

515
00:50:11,480 --> 00:50:13,880
We've already gotten our total employees.

516
00:50:13,880 --> 00:50:16,800
We've already gotten the population.

517
00:50:16,800 --> 00:50:21,640
It's now time to define GDP here.

518
00:50:21,640 --> 00:50:24,760
So we're going to define GDP.

519
00:50:24,760 --> 00:50:31,200
And here I'm going to put everything into millions of dollars because we're actually

520
00:50:31,200 --> 00:50:39,080
going to be comparing this to the GDP of Massachusetts to put things into perspective.

521
00:50:39,080 --> 00:50:45,600
And the GDP of Massachusetts measured by the Federal Reserve is measured in millions of

522
00:50:45,600 --> 00:50:49,640
dollars.

523
00:50:49,640 --> 00:50:57,760
So we're going to put our units in millions of dollars just for convenience sake.

524
00:50:57,760 --> 00:51:08,680
Then we are going to get the GDP from Federal Reserve and make sure everything's time stamped

525
00:51:08,680 --> 00:51:14,000
at the end of the quarter and not the beginning of the quarter.
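
A sketch of those two housekeeping steps, assuming a quarterly series in dollars that is stamped at the start of each quarter. The dollar amounts are placeholders.

```python
import pandas as pd

# Hypothetical quarterly sales in dollars, stamped at the START of each quarter.
quarterly_sales = pd.Series(
    [150_000_000.0, 200_000_000.0],
    index=pd.to_datetime(["2020-10-01", "2021-01-01"]),
)

# Convert to millions of dollars so the units match the state GDP series.
sales_millions = quarterly_sales / 1_000_000

# Re-stamp each observation at the END of its quarter.
sales_millions.index = (
    sales_millions.index.to_period("Q").to_timestamp(how="end").normalize()
)

print(sales_millions)
```

Putting both series on the same quarter-end dates matters because pandas aligns on the index; a start-of-quarter stamp on one side and an end-of-quarter stamp on the other would leave nothing to divide or correlate.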

526
00:51:14,000 --> 00:51:19,200
And so without further ado, what do these series look like?

527
00:51:19,200 --> 00:51:33,840
Well, here is quarterly cannabis GDP or quarterly cannabis sales, interchangeable in our case.

528
00:51:33,840 --> 00:51:41,280
So sales may be a bit more familiar to most people.

529
00:51:41,280 --> 00:51:50,200
And here is GDP in Massachusetts during a slightly shorter timeframe here.

530
00:51:50,200 --> 00:52:00,320
And in fact, to compare apples to apples, I'm actually going to plot just quarterly

531
00:52:00,320 --> 00:52:09,520
cannabis through quarter one of 2021 because that's all the data we have for Massachusetts

532
00:52:09,520 --> 00:52:11,960
GDP.

533
00:52:11,960 --> 00:52:19,240
Okay, now we're comparing apples to apples here.

534
00:52:19,240 --> 00:52:25,480
And what we notice is, I think this is interesting.

535
00:52:25,480 --> 00:52:40,000
So remember earlier we saw employment was almost impervious to the shock of April of

536
00:52:40,000 --> 00:52:42,960
2020.

537
00:52:42,960 --> 00:52:48,280
Sales were actually hit here.

538
00:52:48,280 --> 00:52:56,440
And so this is contrary to I think perhaps what we saw in other places or maybe just

539
00:52:56,440 --> 00:53:04,600
the general perspective that sales increased during this time.

540
00:53:04,600 --> 00:53:07,560
And so there's a couple things going on here.

541
00:53:07,560 --> 00:53:11,160
One, we may have just entirely messed up our measurement.

542
00:53:11,160 --> 00:53:12,160
That's one.

543
00:53:12,160 --> 00:53:15,360
It's not impossible.

544
00:53:15,360 --> 00:53:24,760
And two, you have an interesting effect here in economics where people will shift their purchases.

545
00:53:24,760 --> 00:53:30,720
So just because you see a spike in purchases, just say everybody rushes to the store and

546
00:53:30,720 --> 00:53:36,480
they buy a bunch of cannabis at this one specific time and you see a large spike, that doesn't

547
00:53:36,480 --> 00:53:41,180
necessarily mean their overall consumption is going to increase.

548
00:53:41,180 --> 00:53:50,280
So they may just change the time at which they make their purchase and then consume

549
00:53:50,280 --> 00:53:51,280
the same amount.

550
00:53:51,280 --> 00:53:55,560
So they may have purchased a lot at a given time.

551
00:53:55,560 --> 00:54:01,640
So there may have been a spike in sales during this time.

552
00:54:01,640 --> 00:54:14,360
So instead of just conjecturing, we can actually look.

553
00:54:14,360 --> 00:54:18,000
This is going to cover the right timeframe here.

554
00:54:18,000 --> 00:54:32,680
So let's see if we can't actually look at sales during this time period.

555
00:54:32,680 --> 00:54:40,000
Sorry I could probably pinpoint this a little better, but just sort of ad hoc trying to

556
00:54:40,000 --> 00:54:47,880
find this.

557
00:54:47,880 --> 00:54:50,520
Okay this is not good.

558
00:54:50,520 --> 00:54:59,760
We may even have measurement error here.

559
00:54:59,760 --> 00:55:02,460
Okay so this is not good.

560
00:55:02,460 --> 00:55:13,580
So we just zoomed in on this pandemic period here and would you look at this.

561
00:55:13,580 --> 00:55:19,520
We don't have any sales data for this period here.

562
00:55:19,520 --> 00:55:28,160
So that would actually explain our big decrease here.
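
A quick way to check for a gap like this is sketched below, assuming a daily series where missing days are simply absent rather than recorded as zero. The dates here are placeholders and are not taken from the data set on screen.

```python
import pandas as pd

# Hypothetical daily sales with a stretch of dates absent from the data.
dates = pd.to_datetime([
    "2020-03-20", "2020-03-21", "2020-03-22", "2020-03-23",
    "2020-05-25", "2020-05-26",
])
daily_sales = pd.Series([3.0, 3.1, 2.9, 3.2, 3.0, 3.3], index=dates)

# Build the complete calendar between the first and last observation,
# then list the dates that never appear in the data.
full_range = pd.date_range(daily_sales.index.min(), daily_sales.index.max(), freq="D")
missing_days = full_range.difference(daily_sales.index)

print(len(missing_days), missing_days.min(), missing_days.max())
```

Running a check like this before aggregating helps separate a real drop in sales from a quarter that only looks low because some of its days were never reported.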

563
00:55:28,160 --> 00:55:32,080
So I think we may need to do a deeper dive here.

564
00:55:32,080 --> 00:55:42,080
So I need to look and see, okay, did Massachusetts maybe suspend reporting during this period?

565
00:55:42,080 --> 00:55:51,080
Maybe they said okay you know we don't have to report sales during this period.

566
00:55:51,080 --> 00:55:53,580
There could be a lot of things going on.

567
00:55:53,580 --> 00:56:00,480
So this is sort of what data science is all about here right.

568
00:56:00,480 --> 00:56:04,080
We just had a monkey wrench thrown in.

569
00:56:04,080 --> 00:56:16,160
Welcome back Kandoko.

570
00:56:16,160 --> 00:56:22,240
I'm going to be wrapping up my spiel here then momentarily.

571
00:56:22,240 --> 00:56:26,560
That way we can get through a little discussion here because this is a pretty glaring monkey

572
00:56:26,560 --> 00:56:30,480
wrench that we're going to have to address for next week.

573
00:56:30,480 --> 00:56:35,480
I'll still show you how the analysis can be done, but this is kind of what Heather was

574
00:56:35,480 --> 00:56:41,480
talking about where unfortunately if you put the data under your microscope you're going

575
00:56:41,480 --> 00:56:48,780
to find some glaring oddities sometimes.

576
00:56:48,780 --> 00:56:54,520
Ones that need to be explained, and this is one of them.

577
00:56:54,520 --> 00:56:56,960
So we just put the data under the microscope.

578
00:56:56,960 --> 00:57:07,400
We're basically just missing data here for the most important period.

579
00:57:07,400 --> 00:57:29,200
Or one of the more important periods here.

580
00:57:29,200 --> 00:57:39,160
So let's see if we just I just want to see real quick if okay yep so it's the data set

581
00:57:39,160 --> 00:57:45,120
itself, so maybe I'll do a deeper dive and try to figure out and explain for next

582
00:57:45,120 --> 00:57:47,120
week why this is missing.

583
00:57:47,120 --> 00:57:52,360
But basically we just put the data under a microscope and we found out that we're missing

584
00:57:52,360 --> 00:57:57,760
data for this critical time period here and that's probably going to have an effect on

585
00:57:57,760 --> 00:58:06,000
our analysis right because here we are trying to make these statements about what's going

586
00:58:06,000 --> 00:58:12,420
on here in quarter two and we've got measurement errors.

587
00:58:12,420 --> 00:58:18,000
So that's a big problem and so that makes us wonder like are there measurement errors

588
00:58:18,000 --> 00:58:23,740
going on here that explain this dip.

589
00:58:23,740 --> 00:58:37,960
So we've got problems but that's life as a data scientist but just to show you the last

590
00:58:37,960 --> 00:58:42,760
little bit of analysis here because despite having measurement errors we can still calculate

591
00:58:42,760 --> 00:58:48,360
the statistics because I think they're still worthwhile to look at right.

592
00:58:48,360 --> 00:58:52,760
We'd rather calculate the statistics than not.

593
00:58:52,760 --> 00:58:55,160
And it rated the stage we've not.

594
00:58:55,160 --> 00:58:57,360
So let's calculate them real quick.

595
00:58:57,360 --> 00:59:09,200
So let's grab Massachusetts GDP maybe we've already done that but let's grab it again.

596
00:59:09,200 --> 00:59:16,880
We've plotted them we can calculate the correlation between the two.

597
00:59:16,880 --> 00:59:21,640
They were positively correlated but not perfectly correlated.

598
00:59:21,640 --> 00:59:28,060
So a lot of times you'll see much stronger correlation with sectors and GDP.

599
00:59:28,060 --> 00:59:34,280
So I'd say this is a weak correlation; not, I mean, an actual weak correlation, you know,

600
00:59:34,280 --> 00:59:35,920
0.1 or 0.2.

601
00:59:35,920 --> 00:59:45,160
So this is still statistically a strong correlation, but I would not say this is a strong

602
00:59:45,160 --> 00:59:53,280
like economic correlation which is an interesting real real real interesting observation.

603
00:59:53,280 --> 00:59:58,680
But we're going to be cursory with it.

604
00:59:58,680 --> 01:00:03,640
We're going to say okay we're quantifying it here and sorry that I'm rushing through

605
01:00:03,640 --> 01:00:07,240
this we can expand on everything next week.

606
01:00:07,240 --> 01:00:14,000
But we'll say, okay, what's cannabis as a percent of GDP?

607
01:00:14,000 --> 01:00:23,200
Well my takeaway is Massachusetts has a quite booming robust economy here.

608
01:00:23,200 --> 01:00:35,280
So yes, cannabis is this amazing, interesting sector, but if you look at the data,

609
01:00:35,280 --> 01:00:41,360
it may just be, I want to say, almost like a drop in the bucket of the whole economy of

610
01:00:41,360 --> 01:00:42,360
Massachusetts here.

611
01:00:42,360 --> 01:00:45,680
But like I said it's not negligible.

612
01:00:45,680 --> 01:00:53,160
You've got 0.04 percent of the entire economy.

613
01:00:53,160 --> 01:00:57,320
So it's not nothing.

614
01:00:57,320 --> 01:01:06,400
It's not everything by any means.

615
01:01:06,400 --> 01:01:20,120
So you know it is what it is but it's not it's not the it's not the end all be all.

616
01:01:20,120 --> 01:01:34,080
But let's put this into dollars so we can say okay what's the annual GDP here.

617
01:01:34,080 --> 01:01:39,960
And we can visualize that.

618
01:01:39,960 --> 01:01:44,600
Okay so like we said it's not nothing.

619
01:01:44,600 --> 01:01:51,120
So here's just the annual cannabis sales or cannabis GDP in Massachusetts.

620
01:01:51,120 --> 01:01:53,800
So like we said it's almost nothing.

621
01:01:53,800 --> 01:02:01,720
But look at this it's almost one billion dollars in 2021.

622
01:02:01,720 --> 01:02:03,160
And the year is not even over yet.

623
01:02:03,160 --> 01:02:06,400
We've still got like a whole quarter left.

624
01:02:06,400 --> 01:02:10,800
So Massachusetts is doing well for themselves.

625
01:02:10,800 --> 01:02:13,960
Like we said Massachusetts is a big economy here.

626
01:02:13,960 --> 01:02:17,840
So this is great.

627
01:02:17,840 --> 01:02:21,080
Like look at look at the size of their economy here.

628
01:02:21,080 --> 01:02:31,480
This is quarterly six hundred thousand million dollars.

629
01:02:31,480 --> 01:02:34,560
So they've got a big economy here.

630
01:02:34,560 --> 01:02:40,520
But you know a billion dollars is not nothing.

631
01:02:40,520 --> 01:02:47,640
So like you said you need to weigh that against the social costs.

632
01:02:47,640 --> 01:02:54,840
But I mean if you're going to increase GDP of a state by one billion dollars that's not

633
01:02:54,840 --> 01:02:56,840
bad in my book.

634
01:02:56,840 --> 01:02:58,840
But that's an opinion.

635
01:02:58,840 --> 01:03:07,680
And so then okay what's the annual percent of GDP here?

636
01:03:07,680 --> 01:03:15,760
Well as we noted you know it's not the biggest segment of the economy.

637
01:03:15,760 --> 01:03:20,800
But look at this you know.

638
01:03:20,800 --> 01:03:37,280
The cannabis GDP is about 0.16 percent of all of GDP.
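
The arithmetic behind a percent-of-GDP figure is sketched below. The inputs are round illustrative numbers chosen to land near the orders of magnitude mentioned in the talk; they are assumptions, not the series that were plotted.

```python
# Hypothetical annual totals, both in millions of dollars.
cannabis_sales_millions = 960.0
state_gdp_millions = 600_000.0

# Cannabis sales (the proxy for cannabis GDP) as a percent of state GDP.
cannabis_percent_of_gdp = 100 * cannabis_sales_millions / state_gdp_millions

print(f"{cannabis_percent_of_gdp:.2f}% of state GDP")
```

Keeping both numbers in the same units and over the same period (annual against annual, or quarterly against quarterly) is what makes the ratio meaningful.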

639
01:03:37,280 --> 01:03:43,120
But I mean when you're talking about like a boost to GDP right.

640
01:03:43,120 --> 01:03:50,880
People are out there, like the nation, shooting for, you know, like a 2 percent or

641
01:03:50,880 --> 01:03:51,880
so.

642
01:03:51,880 --> 01:03:52,880
More than that's awesome right.

643
01:03:52,880 --> 01:03:59,240
So you know when you're talking about just trying to squeak up GDP I mean you're squeaking

644
01:03:59,240 --> 01:04:06,240
up GDP you know 0.16 of a percent that's not nothing.

645
01:04:06,240 --> 01:04:14,200
And like to put that into dollar terms well I thought this was an interesting statistic

646
01:04:14,200 --> 01:04:15,200
here.

647
01:04:15,200 --> 01:04:19,240
Well we've got the population of Massachusetts.

648
01:04:19,240 --> 01:04:25,520
We've calculated GDP which is used to say okay that's the it's used to kind of proxy

649
01:04:25,520 --> 01:04:28,960
you know the standard of living sometimes.

650
01:04:28,960 --> 01:04:37,880
And so we can say okay well how much better off is a person in Massachusetts for having

651
01:04:37,880 --> 01:04:42,020
the cannabis industry in economic terms.

652
01:04:42,020 --> 01:04:50,200
In social terms, that's a whole other can of worms that will maybe be for somebody else.

653
01:04:50,200 --> 01:04:53,200
So but we can look at the economic side.

654
01:04:53,200 --> 01:04:58,680
Okay so what's the GDP per capita?

655
01:04:58,680 --> 01:05:09,400
Well in 2018 that was just about two dollars so negligible.

656
01:05:09,400 --> 01:05:15,880
Already by 2019 per capita that's around sixty five dollars.

657
01:05:15,880 --> 01:05:27,040
So per person that's man woman and child and the retired elderly everybody sixty five dollars.

658
01:05:27,040 --> 01:05:34,180
And then by 2021 almost one hundred and forty dollars.

659
01:05:34,180 --> 01:05:42,280
So that's every person in Massachusetts is about one hundred and forty dollars better

660
01:05:42,280 --> 01:05:49,000
off per year because of the cannabis industry just just passively.
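
The per capita figure is total sales divided by population, as in this sketch. Both inputs are illustrative assumptions picked to give a result of the same size as the number quoted, not the actual series used in the session.

```python
# Hypothetical inputs (assumptions for illustration only).
annual_cannabis_sales_dollars = 960_000_000.0
state_population = 6_900_000

# Cannabis sales per resident, counting every resident regardless of age.
cannabis_gdp_per_capita = annual_cannabis_sales_dollars / state_population

print(f"${cannabis_gdp_per_capita:.2f} per person per year")
```

This is an average over the whole population, so it describes the size of the sector relative to the number of residents rather than money any individual actually receives.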

661
01:05:49,000 --> 01:05:55,680
Right, just, you know, the economy is better, right?

662
01:05:55,680 --> 01:06:03,120
It's just a bigger economy right there's more tax revenue there's just more economic activity

663
01:06:03,120 --> 01:06:05,880
and they're just they're just better off.

664
01:06:05,880 --> 01:06:13,680
So just having the cannabis industry permitted right you don't even have to partake in the

665
01:06:13,680 --> 01:06:14,680
industry.

666
01:06:14,680 --> 01:06:21,200
So great so just for some people that actually partake that's you know beneficial for them.

667
01:06:21,200 --> 01:06:23,480
But even for the people that don't partake.

668
01:06:23,480 --> 01:06:30,800
I mean let's say you can think about it as a check in the mail once a year for one hundred

669
01:06:30,800 --> 01:06:35,160
and forty bucks and that's just now.

670
01:06:35,160 --> 01:06:37,400
And so it's going to be increasing right.

671
01:06:37,400 --> 01:06:45,400
So you know next year maybe it'll be two hundred bucks and just more and more and more maybe

672
01:06:45,400 --> 01:06:47,800
you know, in perpetuity.

673
01:06:47,800 --> 01:06:55,520
So here's the thing about this: this is with measurement error.

674
01:06:55,520 --> 01:06:59,640
This is excluding months' worth of data.

675
01:06:59,640 --> 01:07:04,680
So this number may even be biased downwards.

676
01:07:04,680 --> 01:07:15,960
So I think that's sort of my takeaway for today is by permitting the cannabis industry

677
01:07:15,960 --> 01:07:23,720
in Massachusetts.

678
01:07:23,720 --> 01:07:34,080
There has been an economic benefit that I think is quantifiable because it has increased

679
01:07:34,080 --> 01:07:41,520
from what I can tell it appears to have increased GDP by a small amount, but I would say a non-negligible

680
01:07:41,520 --> 01:07:50,800
amount, to the point where every person in Massachusetts is, you know,

681
01:07:50,800 --> 01:07:59,560
at least you know on average about you know one hundred and forty or maybe one hundred

682
01:07:59,560 --> 01:08:05,320
maybe we can say there may be estimation errors, but maybe between one hundred and two

683
01:08:05,320 --> 01:08:15,160
hundred dollars per year better off than they were without the cannabis industry.

684
01:08:15,160 --> 01:08:22,880
And I mean, that's not nothing in my book. You know, I'd personally rather have

685
01:08:22,880 --> 01:08:28,480
a check in the mail once a year for two hundred bucks than not.

686
01:08:28,480 --> 01:08:31,520
So that's awesome.

687
01:08:31,520 --> 01:08:38,640
And so I think this is my takeaway that other states could take to the bank:

688
01:08:38,640 --> 01:08:44,440
okay yes you know you're just looking at maybe the social aspect of this or maybe the health

689
01:08:44,440 --> 01:08:52,080
aspect where maybe some people are being like we talked about earlier they're forced into

690
01:08:52,080 --> 01:09:00,240
the gray illegal economy and that's maybe not so good at a social level or maybe at

691
01:09:00,240 --> 01:09:06,120
a medicinal level people can't get medicine they need or this or that.

692
01:09:06,120 --> 01:09:12,600
So those are different considerations, but we're looking at the economics

693
01:09:12,600 --> 01:09:21,240
here, and from an economic perspective, you know, there appears to be, you know, a benefit

694
01:09:21,240 --> 01:09:26,400
here, and so, you know, I think it's something that people should take into consideration

695
01:09:26,400 --> 01:09:34,520
that, you know, if you permit cannabis with regulations like they have in Massachusetts,

696
01:09:34,520 --> 01:09:40,600
right, it's not like they're just, you know, out there just letting anyone do anything

697
01:09:40,600 --> 01:09:48,720
they want there are rules and regulations in Massachusetts and so you know given a permitted

698
01:09:48,720 --> 01:09:58,680
market, there's an economic benefit. And so I think that sort of ends the point I wanted

699
01:09:58,680 --> 01:10:08,200
to make today, because that was just sort of what my thinking and sort

700
01:10:08,200 --> 01:10:13,480
of research and studies have led to. I think that's the main

701
01:10:13,480 --> 01:10:21,680
takeaway: what is the economic impact of permitting these cannabis markets? Is it big,

702
01:10:21,680 --> 01:10:28,280
is it negligible, and can we quantify that? And I think, at least in Massachusetts, we made

703
01:10:28,280 --> 01:10:32,200
a rough attempt at doing just that.

704
01:10:32,200 --> 01:10:44,560
So, any thoughts, questions, comments?

705
01:10:44,560 --> 01:10:50,920
Well, on that note, it was awesome to have you, Heather. And then Hendo Ko, if you're interested

706
01:10:50,920 --> 01:10:57,440
at all, I'm curious to hear about what perspective you're coming at the group

707
01:10:57,440 --> 01:11:05,440
from because we're happy to have you.

708
01:11:05,440 --> 01:11:13,320
The R programming language, okay. So we have people in the group that use R, so I'll have

709
01:11:13,320 --> 01:11:22,720
to get you in touch with one of our regulars, Paul. Well, I do think there's a meme,

710
01:11:22,720 --> 01:11:32,840
Paul, that R is almost this poor neglected, you know, the poor neglected cat, and then everybody's

711
01:11:32,840 --> 01:11:39,880
focused on Python, the cute dog in the corner. So I don't think R should

712
01:11:39,880 --> 01:11:47,240
be neglected or forgotten about, so I'll get you in touch with Paul because,

713
01:11:47,240 --> 01:11:58,160
like I said. Oh, well, I'll put a link here in the chat so you can join the group here, meetup.com

714
01:11:58,160 --> 01:12:08,280
forward slash cannabis data science, I believe.

715
01:12:08,280 --> 01:12:20,400
Yes, here we are. Here you can find the group. Put this in the chat. Yeah, and I'll get you

716
01:12:20,400 --> 01:12:26,000
in touch with Paul. He uses R, and all of this analysis can be done in your favorite programming

717
01:12:26,000 --> 01:12:32,000
language, because we're just using APIs and calculating basic statistics here, right? So

718
01:12:32,000 --> 01:12:40,600
we're just reading the data from the API, calculating some means, some totals, some mins, some

719
01:12:40,600 --> 01:12:50,160
maxes, what have you, nothing too fancy, and, I think, making some groundbreaking

720
01:12:50,160 --> 01:12:54,920
discoveries, and that's what we do. That's what we do here at the Cannabis Data Science group,

721
01:12:54,920 --> 01:13:03,960
right? We get real public cannabis data, we get it into a format that's usable,

722
01:13:03,960 --> 01:13:10,400
we poke and prod at it and find all its flaws, point those out, because as Heather pointed

723
01:13:10,400 --> 01:13:20,440
out, that's critical. We analyze it, talk about it, have fun. So that's what we're about, so

724
01:13:20,440 --> 01:13:29,280
glad that you joined. Ah, in Mexico! Well, we're incredibly happy to have you, because we're

725
01:13:29,280 --> 01:13:34,880
trying to spread the word about cannabis data science all around the world, so the more

726
01:13:34,880 --> 01:13:47,560
the merrier. So please share the word and invite your friends, family, colleagues, co-workers.

727
01:13:47,560 --> 01:13:54,920
Share the group, and, you know, everybody's welcome. It doesn't matter your background.

728
01:13:54,920 --> 01:14:00,360
You know, we got a little technical today, a little programming heavy, but whatever

729
01:14:00,360 --> 01:14:07,560
you may be, you know, data science, academics, production, processing, retail, whether you're

730
01:14:07,560 --> 01:14:15,800
at the lab, anybody touching cannabis data, you're welcome. Join the group and let's have

731
01:14:15,800 --> 01:14:24,080
some fun crunching some numbers. So glad to have you all, and without further ado, I'm going

732
01:14:24,080 --> 01:14:30,000
to go ahead and wrap it up here today. So thank you for coming, and then next week we'll dive

733
01:14:30,000 --> 01:14:37,120
back into the data. We'll look at all the flaws, we'll try to redo the analysis just to make

734
01:14:37,120 --> 01:14:42,080
sure it's nice and crispy, maybe make some crispy charts. And there's a lot more we can

735
01:14:42,080 --> 01:14:48,840
get to, right? There's a whole slew of data points ahead of us, so tune in next week, because

736
01:14:48,840 --> 01:15:13,240
we will be on another adventure. So until then, keep your nose to the grindstone.
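
For anyone following along in code, the workflow described in the wrap-up (read records from an API, then calculate some means, totals, mins, and maxes) might look roughly like the sketch below. This is only an illustration under stated assumptions: the sample records, the field name `total_sales`, and the helper `summarize` are hypothetical placeholders rather than the group's actual code or real Massachusetts figures, and a real run would fetch the JSON from the public API instead of using an inline string.

```python
import json
from statistics import mean

# Hypothetical sample shaped like a JSON API response. The field names and
# values are placeholders for illustration, not real Massachusetts data.
SAMPLE_RESPONSE = """
[
  {"date": "2021-09-24", "total_sales": 100.0},
  {"date": "2021-09-25", "total_sales": 150.0},
  {"date": "2021-09-26", "total_sales": 50.0}
]
"""


def summarize(records, field):
    """Return the total, mean, min, and max of one numeric field."""
    values = [float(record[field]) for record in records]
    return {
        "total": sum(values),
        "mean": mean(values),
        "min": min(values),
        "max": max(values),
    }


# Parse the (hypothetical) response and compute the basic statistics.
records = json.loads(SAMPLE_RESPONSE)
summary = summarize(records, "total_sales")
print(summary)
```

The same few steps carry over to R or any other language with a JSON parser, which is the point made in the talk: nothing here depends on a particular toolchain.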

