1
00:00:00,000 --> 00:00:12,280
Well, welcome to the Cannabis Data Science Meetup Group.

2
00:00:12,280 --> 00:00:13,920
It's good to have you both.

3
00:00:13,920 --> 00:00:16,120
So happy to have you, Kyron.

4
00:00:16,120 --> 00:00:22,360
So I'll introduce myself first and then I'd love to hear about what brings you to the

5
00:00:22,360 --> 00:00:23,520
group.

6
00:00:23,520 --> 00:00:25,680
So my name is Keegan.

7
00:00:25,680 --> 00:00:28,720
So I started a company, Cannlytics.

8
00:00:28,720 --> 00:00:33,000
And so we principally provide software to laboratories.

9
00:00:33,000 --> 00:00:38,280
And being in this space, we realized there's a high demand for analytics.

10
00:00:38,280 --> 00:00:41,520
And coincidentally, that's what my background is in.

11
00:00:41,520 --> 00:00:45,920
So I just thought, OK, time to lend a hand to everyone.

12
00:00:45,920 --> 00:00:52,280
So we help everyone from labs to producers, processors, and retailers crunch numbers, move

13
00:00:52,280 --> 00:00:56,800
data around, and just make life simple and easy.

14
00:00:56,800 --> 00:00:58,680
So that's what we do.

15
00:00:58,680 --> 00:01:04,760
And so every Wednesday, we do a Cannabis Data Science Meetup Group where we basically get

16
00:01:04,760 --> 00:01:14,320
some real cannabis data, look at some statistics to show, OK, this is how you would do things.

17
00:01:14,320 --> 00:01:18,760
These are some statistics that you can calculate with data that's out there.

18
00:01:18,760 --> 00:01:22,680
And we just have fun and pick each other's brains.

19
00:01:22,680 --> 00:01:24,880
So that's what we're about.

20
00:01:24,880 --> 00:01:29,560
And so I'd love to hear about your angle.

21
00:01:29,560 --> 00:01:33,600
What's your background, Kyron?

22
00:01:33,600 --> 00:01:40,640
Well, I actually am in a data analytics boot camp.

23
00:01:40,640 --> 00:01:45,360
And for one of my mock roles, it's to get started on networking.

24
00:01:45,360 --> 00:01:51,920
Because I'm taking an alternative route into the analytics field.

25
00:01:51,920 --> 00:01:56,360
That's going to be a little bit more difficult for me since I don't have a degree backing it up.

26
00:01:56,360 --> 00:02:00,160
So this is actually my first meetup that I'm doing on the subject.

27
00:02:00,160 --> 00:02:04,800
And I picked this one because it kind of sounds interesting enough to learn about.

28
00:02:04,800 --> 00:02:07,680
But other than that, there really isn't much.

29
00:02:07,680 --> 00:02:12,840
I'm just here to soak it all in.

30
00:02:12,840 --> 00:02:18,680
If you're coming from a data perspective, I think it's one of the best

31
00:02:18,680 --> 00:02:25,200
subjects you could be studying because, believe it or not, there's no data like it.

32
00:02:25,200 --> 00:02:28,800
No one's got public data like cannabis data.

33
00:02:28,800 --> 00:02:37,840
That's because the states want to keep good track of cannabis operations and cannabis

34
00:02:37,840 --> 00:02:41,840
activity, since this is something that used to be illegal.

35
00:02:41,840 --> 00:02:45,840
So they want to keep a good eye on it.

36
00:02:45,840 --> 00:02:53,600
And so nobody's tracking tomatoes from seed to sale.

37
00:02:53,600 --> 00:02:59,240
Well, maybe to a certain extent, but not to the extent that they are with cannabis.

38
00:02:59,240 --> 00:03:06,880
And then on top of that, a lot of these states are really forthcoming with their data.

39
00:03:06,880 --> 00:03:09,840
They've got public data.

40
00:03:09,840 --> 00:03:14,840
And for the most part, the biggest hurdle is just the technical one.

41
00:03:14,840 --> 00:03:17,520
They're happy to provide all the data.

42
00:03:17,520 --> 00:03:20,880
It's just a matter of the mechanisms they have in place.

43
00:03:20,880 --> 00:03:22,600
And some states have awesome mechanisms.

44
00:03:22,600 --> 00:03:25,840
And so we're going to be exploring some of those today.

45
00:03:25,840 --> 00:03:31,800
So up in the Northeast, for example, today we'll look at Massachusetts and finish

46
00:03:31,800 --> 00:03:32,800
Connecticut.

47
00:03:32,800 --> 00:03:37,320
They've got their data readily available through an API.

48
00:03:37,320 --> 00:03:44,560
So you can just tap in and get, in some of these cases, real-time data.

49
00:03:44,560 --> 00:03:49,640
So Connecticut basically has real-time data.

50
00:03:49,640 --> 00:03:54,080
And you can calculate statistics to your heart's content.

51
00:03:54,080 --> 00:03:55,080
So it's fun.

52
00:03:55,080 --> 00:03:57,200
It's a good space to be in.

53
00:03:57,200 --> 00:03:58,200
Cool.

54
00:03:58,200 --> 00:03:59,200
Sounds good.

55
00:03:59,200 --> 00:04:08,720
So, without further ado, let's just jump into it.

56
00:04:08,720 --> 00:04:14,240
Because, you know: show, don't tell.

57
00:04:14,240 --> 00:04:22,080
OK, so just to give you a little introduction here.

58
00:04:22,080 --> 00:04:39,960
So we've got a repository here where you can find each week's code and

59
00:04:39,960 --> 00:04:45,480
data, so that way you can follow along.

60
00:04:45,480 --> 00:04:52,600
So basically last week we were looking at cannabinoids in Connecticut.

61
00:04:52,600 --> 00:05:01,840
And so I figured, OK, before we move on, let's just go ahead and finish out this analysis

62
00:05:01,840 --> 00:05:07,280
with Connecticut data just to be thorough.

63
00:05:07,280 --> 00:05:15,800
So let's essentially do just that.

64
00:05:15,800 --> 00:05:23,240
So here's the script we were working with last time that I've tidied up a little bit.

65
00:05:23,240 --> 00:05:31,680
And so just to remind everyone from last week and then to show you, Kyron, this is the data

66
00:05:31,680 --> 00:05:34,080
source.

67
00:05:34,080 --> 00:05:38,880
So data.ct.gov.

68
00:05:38,880 --> 00:05:51,320
And then you can find your way to the medical cannabis brand registry where they essentially

69
00:05:51,320 --> 00:06:02,680
have real-time, or at least up-to-the-past-day, entries for what appears to be every product

70
00:06:02,680 --> 00:06:04,160
sold in Connecticut.

71
00:06:04,160 --> 00:06:10,520
I could be wrong, but it appears to be every product.

72
00:06:10,520 --> 00:06:23,080
So if you're going into the store and you're getting, you know, one of these, you know,

73
00:06:23,080 --> 00:06:35,060
if you're getting Abiflex flower, then you can actually get the image.

74
00:06:35,060 --> 00:06:39,080
So you went and bought some Abiflex flower.

75
00:06:39,080 --> 00:06:43,560
And then you can actually see the certificate of analysis.

76
00:06:43,560 --> 00:06:50,560
So you can actually see, OK, this sample passed quality assurance testing,

77
00:06:50,560 --> 00:06:56,080
it was tested for mycotoxins, and there are no heavy metals detected.

78
00:06:56,080 --> 00:07:00,120
You know, there's your cannabinoids and your terpenes.

79
00:07:00,120 --> 00:07:09,340
And it's verified and signed off on by the lab director here.

80
00:07:09,340 --> 00:07:13,400
So we've got nice official records.

81
00:07:13,400 --> 00:07:21,160
So this is a pretty pristine data set, right? Because if, for whatever reason,

82
00:07:21,160 --> 00:07:28,760
there's any confusion with the numbers, with any

83
00:07:28,760 --> 00:07:36,120
of these data points, you can just refer back to the certificate if you need to.

84
00:07:36,120 --> 00:07:39,720
So cool data set.

85
00:07:39,720 --> 00:07:41,920
They have made it easily accessible.

86
00:07:41,920 --> 00:07:48,960
So you can just come here and download it, or,

87
00:07:48,960 --> 00:07:55,000
as we like to do when possible so that you can automate things, we can access it through

88
00:07:55,000 --> 00:07:57,040
an API.

89
00:07:57,040 --> 00:08:08,520
So what's cool here is we'll use the Socrata API.

90
00:08:08,520 --> 00:08:13,160
And with Connecticut, we'll use this package here.

91
00:08:13,160 --> 00:08:20,720
And then the second half of the meetup, we'll get data from Massachusetts and we'll do things

92
00:08:20,720 --> 00:08:21,720
manually.

93
00:08:21,720 --> 00:08:28,600
So here we'll use a package, and then in Massachusetts we'll just do things manually

94
00:08:28,600 --> 00:08:31,520
just to show you both ways.

95
00:08:31,520 --> 00:08:38,480
So without further ado, let's get this data.

96
00:08:38,480 --> 00:08:52,400
So I'm not as familiar with the VS Code

97
00:08:52,400 --> 00:09:00,800
console here, the terminal, but we'll experiment.

98
00:09:00,800 --> 00:09:07,200
And then I just wrote a quick charting function.

99
00:09:07,200 --> 00:09:11,800
We'll talk about this when we get to it.

100
00:09:11,800 --> 00:09:14,760
So just a little background work.

101
00:09:14,760 --> 00:09:21,680
I basically looked at all the data points and just defined, OK, these are cannabinoids.

102
00:09:21,680 --> 00:09:24,480
These are terpenes.

103
00:09:24,480 --> 00:09:36,640
So we can define those.

104
00:09:36,640 --> 00:09:44,400
Next, it's really easy to get data from this endpoint here.

105
00:09:44,400 --> 00:09:50,960
So essentially all you need to know is this ID right here.

106
00:09:50,960 --> 00:09:55,160
That's essentially the data set ID.
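A minimal sketch of how a portal domain plus dataset ID maps onto a Socrata (SODA) JSON endpoint, assuming the standard `https://<domain>/resource/<dataset-id>.json` pattern with `$limit`, `$offset`, and `$where` query parameters. The dataset ID `xxxx-xxxx` and the `approval_date` column are placeholders, not the real identifiers; the optional `$where` filter is one way to pull only newly added records.

```python
from urllib.parse import urlencode


def soda_resource_url(domain, dataset_id, limit=1000, offset=0, where=None):
    """Build a Socrata (SODA) JSON resource URL from a portal domain and dataset ID.

    `where` is an optional SoQL filter string, e.g. to pull only records
    added since your last download.
    """
    params = {"$limit": limit, "$offset": offset}
    if where:
        params["$where"] = where
    # safe="$" keeps the SoQL parameter names readable ($limit, not %24limit).
    return f"https://{domain}/resource/{dataset_id}.json?" + urlencode(params, safe="$")


# "xxxx-xxxx" is a placeholder dataset ID and "approval_date" an assumed column name.
print(soda_resource_url("data.ct.gov", "xxxx-xxxx", limit=5000,
                        where="approval_date > '2021-06-01'"))
```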

107
00:09:55,160 --> 00:09:58,080
And there we have it right there.

108
00:09:58,080 --> 00:10:13,200
And we can probably get this data.

109
00:10:13,200 --> 00:10:17,040
See if we have any data.

110
00:10:17,040 --> 00:10:19,320
And so there we are.

111
00:10:19,320 --> 00:10:26,160
So there are 10,859 observations at this moment.

112
00:10:26,160 --> 00:10:29,200
And we have read those in with the API.

113
00:10:29,200 --> 00:10:36,600
And so what's cool is as they add records, you can get those through the API.

114
00:10:36,600 --> 00:10:43,320
So you can always have the latest data.

115
00:10:43,320 --> 00:10:50,200
And then you can dive into this code or watch last week's meetup.

116
00:10:50,200 --> 00:10:58,640
But this is basically what we did last week: we spent the majority of the time parsing

117
00:10:58,640 --> 00:11:02,760
the data and getting it into a nice clean format.

118
00:11:02,760 --> 00:11:10,240
So although it's maybe fewer than 15 lines of code now, this is what we spent the bulk

119
00:11:10,240 --> 00:11:13,560
of our time on last week.

120
00:11:13,560 --> 00:11:23,400
So that's nice that we can just move forward this week with clean data.

121
00:11:23,400 --> 00:11:33,200
And just to show you what the data looks like:

122
00:11:33,200 --> 00:11:42,480
We've got our brand.

123
00:11:42,480 --> 00:11:47,160
We've got the approval date.

124
00:11:47,160 --> 00:11:54,920
And now we've cleaned all of the cannabinoids and terpenes.

125
00:11:54,920 --> 00:12:02,320
And we added these two columns, total cannabinoids and total terpenes.

126
00:12:02,320 --> 00:12:07,480
Because all those numbers weren't present in the data set.

127
00:12:07,480 --> 00:12:20,840
If we look at one of these certificates, I would not be the least bit

128
00:12:20,840 --> 00:12:27,520
surprised to find total cannabinoids.

129
00:12:27,520 --> 00:12:33,760
Well it does not look like total cannabinoids or total terpenes are reported on these COAs.

130
00:12:33,760 --> 00:12:41,040
So perhaps in Massachusetts they aren't.

131
00:12:41,040 --> 00:12:47,080
However in many states you'll see total terpenes and total cannabinoids reported on the certificate

132
00:12:47,080 --> 00:12:49,680
of analysis.

133
00:12:49,680 --> 00:12:54,000
And so this number doesn't make sense, right?

134
00:12:54,000 --> 00:12:57,260
Because you can't have cannabinoids above 100 percent.

135
00:12:57,260 --> 00:13:02,920
So we'll have to account for situations like this.
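One way to account for impossible totals is to compute the totals yourself and flag anything above 100 percent for review. This is a sketch, assuming each sample is a dict keyed by hypothetical snake_case analyte names and that missing or None values mean "not detected".

```python
def add_totals(record, cannabinoids, terpenes):
    """Return a copy of `record` with summed totals and a sanity flag.

    Missing or None analyte values count as 0 (an assumption about how
    non-detects are stored). A cannabinoid total above 100 percent is
    physically impossible, so such rows are flagged for review rather
    than silently kept.
    """
    out = dict(record)
    out["total_cannabinoids"] = sum(record.get(k) or 0 for k in cannabinoids)
    out["total_terpenes"] = sum(record.get(k) or 0 for k in terpenes)
    out["suspect_total"] = out["total_cannabinoids"] > 100
    return out


# Hypothetical column names for illustration.
CANNABINOIDS = ["thca", "thc", "cbda", "cbd"]
TERPENES = ["beta_myrcene", "limonene"]
row = add_totals({"thca": 80.0, "cbd": 30.0, "limonene": 1.5}, CANNABINOIDS, TERPENES)
print(row["total_cannabinoids"], row["suspect_total"])  # 110.0 True
```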

136
00:13:02,920 --> 00:13:15,320
But in general, we now have a pretty good measure of those analytes.

137
00:13:15,320 --> 00:13:18,400
Now we want to look at the data.

138
00:13:18,400 --> 00:13:28,580
So what we wanted to do last week was we basically said, OK, what is the prevalence of each terpene?

139
00:13:28,580 --> 00:13:32,120
So how often can it be found?

140
00:13:32,120 --> 00:13:38,360
How often does it occur in products?

141
00:13:38,360 --> 00:13:40,880
In Connecticut, actually.

142
00:13:40,880 --> 00:13:43,440
This isn't Massachusetts.

143
00:13:43,440 --> 00:13:56,760
So this is code from last week where, basically, the magic is that we just

144
00:13:56,760 --> 00:14:05,680
calculate the prevalence, which is basically the length, that is, the number of observations

145
00:14:05,680 --> 00:14:16,720
where an analyte is present, divided by the total number of observations.
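The prevalence calculation described above can be sketched in plain Python as follows. It assumes "present" means a value greater than 0, and it averages concentration only over samples where the analyte is present; the meetup code may define the average differently. `top_n` mirrors plotting the top 10.

```python
def prevalence_and_mean(records, analytes):
    """Map each analyte to (prevalence, mean concentration where present).

    Prevalence = samples where the analyte is present / all samples.
    "Present" means a value > 0; None, missing, or 0 counts as absent,
    which is an assumption about how non-detects are recorded.
    """
    n = len(records)
    stats = {}
    for analyte in analytes:
        present = [r[analyte] for r in records if (r.get(analyte) or 0) > 0]
        prevalence = len(present) / n if n else 0.0
        mean = sum(present) / len(present) if present else 0.0
        stats[analyte] = (prevalence, mean)
    return stats


def top_n(stats, n=10):
    """The n most prevalent analytes, highest first."""
    return sorted(stats.items(), key=lambda kv: kv[1][0], reverse=True)[:n]
```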

146
00:14:16,720 --> 00:14:26,920
And so let's show this data sooner rather than later.

147
00:14:26,920 --> 00:14:31,560
Here we are.

148
00:14:31,560 --> 00:14:42,320
Didn't think we were going to be plotting this quite yet.

149
00:14:42,320 --> 00:14:57,360
Anyway, this was basically this chart here; I have the source code for this

150
00:14:57,360 --> 00:14:58,360
chart.

151
00:14:58,360 --> 00:15:07,360
Basically, I just tried to generalize it so that you pass your data frame and a

152
00:15:07,360 --> 00:15:16,280
handful of parameters, and then this generates a fairly good looking chart.

153
00:15:16,280 --> 00:15:25,240
As you can tell, sometimes it takes a few lines of code to make a good chart.

154
00:15:25,240 --> 00:15:32,200
So I encourage you to look through this code on your own if you're interested in making

155
00:15:32,200 --> 00:15:35,380
good looking Matplotlib charts.

156
00:15:35,380 --> 00:15:43,160
But at the moment I've just abstracted this away for simplicity here because this

157
00:15:43,160 --> 00:15:46,440
isn't really the focus of the meetup.

158
00:15:46,440 --> 00:15:54,800
You're welcome to look at this code here to see how to make the chart.

159
00:15:54,800 --> 00:16:00,760
Anyway, we calculated the data.

160
00:16:00,760 --> 00:16:07,520
So for each analyte here, we've calculated its prevalence, and we've also calculated

161
00:16:07,520 --> 00:16:11,160
its average concentration.

162
00:16:11,160 --> 00:16:17,040
And then I just plotted the top 10.

163
00:16:17,040 --> 00:16:24,680
And so as you can see, beta-caryophyllene is in about half of the samples.

164
00:16:24,680 --> 00:16:33,200
Then you have this next group here, which is in about 30% to 33% of all samples:

165
00:16:33,200 --> 00:16:40,080
linalool, limonene, humulene, and beta-myrcene.

166
00:16:40,080 --> 00:16:49,280
And beta-myrcene, I believe, is what you may find in lavender.

167
00:16:49,280 --> 00:16:54,960
So beta-myrcene and beta-caryophyllene, I think, are going to be more prevalent in what you

168
00:16:54,960 --> 00:16:57,920
typically would call an indica strain.

169
00:16:57,920 --> 00:17:02,760
And then limonene and linalool, those may be more present in what you would call a sativa

170
00:17:02,760 --> 00:17:03,760
type strain.

171
00:17:03,760 --> 00:17:09,400
And then these other terpenes, I don't know as much about.

172
00:17:09,400 --> 00:17:19,240
Beta-pinene, alpha-pinene, alpha-bisabolol, ocimene, trans-nerolidol.

173
00:17:19,240 --> 00:17:21,440
I do not know as much about these terpenes.

174
00:17:21,440 --> 00:17:26,680
However, they appear in a non-negligible portion of the samples.

175
00:17:26,680 --> 00:17:31,640
So there may be something special going on about these.

176
00:17:31,640 --> 00:17:39,680
So people may be shopping around for these terpenes without knowing it.

177
00:17:39,680 --> 00:17:49,160
So for example, with ocimene, it would be interesting to dive into the samples that contain ocimene.

178
00:17:49,160 --> 00:17:59,600
And see if there is a particular strain or what have you that explains this.

179
00:17:59,600 --> 00:18:02,080
So maybe it's popular.

180
00:18:02,080 --> 00:18:05,080
Maybe it's not popular.

181
00:18:05,080 --> 00:18:09,960
So that's what we essentially did last week.

182
00:18:09,960 --> 00:18:18,280
But I just wanted to run through it again this week just to show you the cleaner code

183
00:18:18,280 --> 00:18:23,680
so we can build on top of it quickly.

184
00:18:23,680 --> 00:18:28,480
So here are some other charts, some of them useful, some of them not so useful, that you

185
00:18:28,480 --> 00:18:30,240
can also make.

186
00:18:30,240 --> 00:18:34,560
And I didn't beautify these ones; I just made them.

187
00:18:34,560 --> 00:18:38,000
But they're interesting nonetheless.

188
00:18:38,000 --> 00:18:50,800
So this next one, it was like, OK, how do these terpenes occur with each other?

189
00:18:50,800 --> 00:18:55,320
So that's not what we wanted.

190
00:18:55,320 --> 00:19:00,640
I'm going to run this interactively.

191
00:19:00,640 --> 00:19:11,160
OK, so honestly, I wouldn't call this the most informative figure in the world,

192
00:19:11,160 --> 00:19:13,680
but it does have some insights.

193
00:19:13,680 --> 00:19:21,980
So what this is, is a correlogram of terpenes in Connecticut.

194
00:19:21,980 --> 00:19:27,960
So this is how correlated various terpenes are with each other.

195
00:19:27,960 --> 00:19:36,760
You can think of it this way: the darker the color, the more strongly these terpenes

196
00:19:36,760 --> 00:19:38,680
are correlated with each other.
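A correlogram is essentially a heatmap of a correlation matrix. The sketch below computes pairwise Pearson correlations between terpene columns, assuming missing values are treated as 0. That assumption matters: with mostly-zero data, terpenes that are usually absent together can also show high correlation, which may be one reason the chart is hard to read.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences; None if undefined."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return None  # a constant column has no defined correlation
    return sxy / math.sqrt(sxx * syy)


def correlation_matrix(records, analytes):
    """Pairwise Pearson correlations between analyte columns (missing -> 0)."""
    cols = {a: [r.get(a) or 0 for r in records] for a in analytes}
    return {(a, b): pearson(cols[a], cols[b]) for a in analytes for b in analytes}
```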

197
00:19:38,680 --> 00:19:41,760
So you can just look at some dark colors here.

198
00:19:41,760 --> 00:19:43,360
So here's one.

199
00:19:43,360 --> 00:19:55,720
So it looks like, OK, this beta-phardazine occurs quite often with phellan...

200
00:19:55,720 --> 00:19:58,920
Phellan... phellandrene.

201
00:19:58,920 --> 00:20:01,200
I'm probably not pronouncing that correctly.

202
00:20:01,200 --> 00:20:04,200
So is that a good insight or is that not a good insight?

203
00:20:04,200 --> 00:20:07,200
I'm not certain.

204
00:20:07,200 --> 00:20:11,360
But there's some interesting ones here.

205
00:20:11,360 --> 00:20:17,480
And so I don't know how these could particularly help you, but maybe they could.

206
00:20:17,480 --> 00:20:18,880
So like, here's another one.

207
00:20:18,880 --> 00:20:29,840
So it's like, OK, for whatever reason, menthol and what looks like pulegone, those

208
00:20:29,840 --> 00:20:32,520
are correlated with each other.

209
00:20:32,520 --> 00:20:36,520
So that's interesting.

210
00:20:36,520 --> 00:20:46,760
So maybe if you notice that, OK, samples that have menthol perhaps have this particular

211
00:20:46,760 --> 00:20:59,280
taste. Who knows what types of strains contain menthol or pulegone, or what they smell or taste

212
00:20:59,280 --> 00:21:00,280
like.

213
00:21:00,280 --> 00:21:04,520
But perhaps if that was something you were looking for, maybe you could use this data

214
00:21:04,520 --> 00:21:09,960
to your advantage.

215
00:21:09,960 --> 00:21:15,360
Maybe something like beta-caryophyllene, that would be an interesting one.

216
00:21:15,360 --> 00:21:20,600
OK, so what shows up a lot with beta-caryophyllene?

217
00:21:20,600 --> 00:21:28,360
So you may be able to use this when you're selecting strains, if you're trying to breed

218
00:21:28,360 --> 00:21:32,560
for a particular terpene.

219
00:21:32,560 --> 00:21:33,800
Hard to say.

220
00:21:33,800 --> 00:21:39,560
So honestly, the chart's not as informative as I was...

221
00:21:39,560 --> 00:21:40,560
I don't know.

222
00:21:40,560 --> 00:21:41,560
It exists.

223
00:21:41,560 --> 00:21:55,400
So if you have any insights from this figure, by all means, share.

224
00:21:55,400 --> 00:21:56,400
Moving on.

225
00:21:56,400 --> 00:22:00,920
So I figured, OK, that wasn't the most informative figure in the world.

226
00:22:00,920 --> 00:22:03,320
What's another way we can look at this data?

227
00:22:03,320 --> 00:22:06,280
Well, we've looked at the terpenes.

228
00:22:06,280 --> 00:22:10,240
Let's look at the cannabinoids.

229
00:22:10,240 --> 00:22:20,560
So what we've looked at in past weeks, particularly in hemp, is OK, what's the CBD to THC ratio?

230
00:22:20,560 --> 00:22:24,040
So let's do just that.

231
00:22:24,040 --> 00:22:37,160
So here we're just going to do a scatter plot of CBD to THCA.

232
00:22:37,160 --> 00:22:44,080
This figure could probably be refined.

233
00:22:44,080 --> 00:22:48,360
So one thing we're doing here is we're just lumping all the sample types together.

234
00:22:48,360 --> 00:22:56,520
It would probably be fruitful to separate out the different sample types, so that way

235
00:22:56,520 --> 00:22:59,080
you're comparing apples to apples.

236
00:22:59,080 --> 00:23:04,880
So what I always like to do is just start with flower, if that's what you're interested

237
00:23:04,880 --> 00:23:11,000
in looking at, and just look at flower samples.

238
00:23:11,000 --> 00:23:17,720
So a lot of times here in the Cannabis Data Science group, we're more about doing rather

239
00:23:17,720 --> 00:23:19,780
than talking.

240
00:23:19,780 --> 00:23:25,800
So why don't we see if we can do that?

241
00:23:25,800 --> 00:23:32,000
So the way I would go about doing that is first we want to remind ourselves, OK, what

242
00:23:32,000 --> 00:23:36,320
data points do we actually have here?

243
00:23:36,320 --> 00:23:41,920
A bunch of them.

244
00:23:41,920 --> 00:23:54,640
But it looks, oh yes, that's right, it may be hard for us to isolate.

245
00:23:54,640 --> 00:24:17,840
So I think dosage form is... yes, so there are many different types of product categories.

246
00:24:17,840 --> 00:24:25,800
So not to go too deep into this, but I'll give it 30 seconds here.

247
00:24:25,800 --> 00:24:39,600
Let's see if we can just say, OK, we'll say the flower data is the data where the data

248
00:24:39,600 --> 00:24:49,640
is stored in the dosage form as a string.

249
00:24:49,640 --> 00:24:55,200
We want it, I'm not sure if this is going to work, but this is what I would like to

250
00:24:55,200 --> 00:25:05,200
do: where the dosage form, as a lowercased string,

251
00:25:05,200 --> 00:25:10,400
contains 'flower'. This is not going to work.

252
00:25:10,400 --> 00:25:14,920
That's what I would love to be able to do.

253
00:25:14,920 --> 00:25:19,120
We're going to have to hit Google real quick.

254
00:25:19,120 --> 00:25:43,000
So we were just doing this last week where we were saying get data, get contains.

255
00:25:43,000 --> 00:25:52,520
OK, there's actually another way we could potentially do this.

256
00:25:52,520 --> 00:25:58,080
OK, so let's see if we can do that.

257
00:25:58,080 --> 00:26:05,880
OK, so we can actually do this and we can actually say, OK, let's just get everything

258
00:26:05,880 --> 00:26:15,800
where it contains 'flower' or it contains 'Flower'.

259
00:26:15,800 --> 00:26:19,480
Awesome.

260
00:26:19,480 --> 00:26:24,840
So let's see if this makes sense.

261
00:26:24,840 --> 00:26:39,240
So like I said, if you're doing this analysis for research purposes or any serious endeavors,

262
00:26:39,240 --> 00:26:41,600
you'll want to double check all of this stuff.

263
00:26:41,600 --> 00:26:45,920
A lot of times here in the Cannabis Data Science group, we're just moving quickly, just sort

264
00:26:45,920 --> 00:26:47,980
of doing proofs of concept.

265
00:26:47,980 --> 00:26:54,200
So if you're doing this on your own, you're going to want to dive in here and look at

266
00:26:54,200 --> 00:26:59,920
the flower data and make sure that this makes sense to you.

267
00:26:59,920 --> 00:27:05,360
OK, these are flower observations.

268
00:27:05,360 --> 00:27:12,280
The cannabinoids should all be less than 30 to 35 percent.

269
00:27:12,280 --> 00:27:18,160
I'm instantly wondering what's going on with these cannabinoids.

270
00:27:18,160 --> 00:27:25,000
So something's either going wrong with their total cannabinoid calculation or something's

271
00:27:25,000 --> 00:27:27,120
going on there.

272
00:27:27,120 --> 00:27:32,960
Looks like in some cases it's getting it right; in other cases, I'm not sure what's going on.

273
00:27:32,960 --> 00:27:39,520
So if you're doing this on your own, you're going to want to dive in and figure out, OK,

274
00:27:39,520 --> 00:27:42,640
this is an oil syringe.

275
00:27:42,640 --> 00:27:53,080
This needs to be excluded from the flower data.

276
00:27:53,080 --> 00:28:03,960
Since we're moving fast and breaking things, I'll just make our scatter plot

277
00:28:03,960 --> 00:28:08,640
here of the flower data.

278
00:28:08,640 --> 00:28:12,280
And then we'll move on to new things.

279
00:28:12,280 --> 00:28:27,280
So we'll just do the scatter plot here.

280
00:28:27,280 --> 00:28:36,800
Looks like I messed something up.

281
00:28:36,800 --> 00:28:42,600
That's probably it.

282
00:28:42,600 --> 00:28:50,600
OK, so I think that's about as much time as we'll dedicate to this particular endeavor.

283
00:28:50,600 --> 00:28:59,680
But here we tried to just isolate the flower data.

284
00:28:59,680 --> 00:29:08,840
You could potentially, this is another little ad hoc thing, we could just say, oh, we'll

285
00:29:08,840 --> 00:29:15,840
just get everything with the total cannabinoids.

286
00:29:15,840 --> 00:29:24,160
Let's just say less than 40 percent, just in case there are some high values in there.

287
00:29:24,160 --> 00:29:31,040
So this is an ad hoc way of just getting those oils out of there.
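Both filters from this segment, a case-insensitive "flower" match on the dosage form plus the ad hoc cap on total cannabinoids, can be sketched in plain Python. The `dosage_form` and `total_cannabinoids` keys are assumed names. In pandas, the attempted `.lower().contains(...)` fails because string methods on a Series live under the `.str` accessor; the equivalent is `df["dosage_form"].str.contains("flower", case=False, na=False) & (df["total_cannabinoids"] < 40)`.

```python
def flower_samples(records, max_total=40):
    """Keep records whose dosage form mentions 'flower' (case-insensitive)
    and whose total cannabinoids are below `max_total` percent.

    The cap is an ad hoc guard against oils or concentrates that slip
    through the text match; 'dosage_form' and 'total_cannabinoids' are
    assumed key names.
    """
    kept = []
    for r in records:
        form = str(r.get("dosage_form") or "").lower()
        total = r.get("total_cannabinoids") or 0
        if "flower" in form and total < max_total:
            kept.append(r)
    return kept
```

Lowercasing once and matching "flower" covers both the "flower" and "Flower" cases in a single check.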

288
00:29:31,040 --> 00:29:39,400
But similar thing going on where, OK, it looks like there's maybe a slight correlation here

289
00:29:39,400 --> 00:29:49,100
between CBDA and THCA, but maybe not a perfect correlation by any means, because it looks

290
00:29:49,100 --> 00:29:57,000
like you can definitely crank up the THCA without increasing CBDA and perhaps conversely

291
00:29:57,000 --> 00:29:59,600
as well.

292
00:29:59,600 --> 00:30:12,480
So long story short, the data needs a bit more cleaning to do a nice scatter plot

293
00:30:12,480 --> 00:30:14,120
here.

294
00:30:14,120 --> 00:30:24,240
So I thought, OK, one last attempt, let's just look at a scatter plot of cannabinoids

295
00:30:24,240 --> 00:30:29,080
versus total terpenes and see if there's a trade-off there.

296
00:30:29,080 --> 00:30:35,080
Because last week we talked about, oh, maybe if the plant is producing more terpenes, then

297
00:30:35,080 --> 00:30:38,480
maybe it's producing less cannabinoids.

298
00:30:38,480 --> 00:30:41,400
Maybe they have a positive relationship.

299
00:30:41,400 --> 00:30:46,640
We don't really know, so let's find out.

300
00:30:46,640 --> 00:30:53,400
So this is just a scatter plot here of total cannabinoids, or it should be total terpenes

301
00:30:53,400 --> 00:30:56,400
versus total cannabinoids.

302
00:30:56,400 --> 00:30:58,400
Cool.

303
00:30:58,400 --> 00:31:06,880
Oh, yeah, I also separated this by producer.

304
00:31:06,880 --> 00:31:19,240
So what I would say from this is there's no strong correlation.

305
00:31:19,240 --> 00:31:26,240
You'd have to crunch the stats to actually know the correlation.

306
00:31:26,240 --> 00:31:32,080
But just from eyeballing it, I'd say there's not a strong correlation, but maybe a weak positive

307
00:31:32,080 --> 00:31:35,220
correlation between terpenes and cannabinoids.

308
00:31:35,220 --> 00:31:41,220
So what that would basically suggest to me is just higher quality product.

309
00:31:41,220 --> 00:31:46,840
So basically, the higher you crank

310
00:31:46,840 --> 00:31:51,960
up the lights and the nutrients, the higher quality the product is going to be.

311
00:31:51,960 --> 00:31:53,560
I'd say especially the lights.

312
00:31:53,560 --> 00:31:58,120
So if you really crank those lights up, you're going to get high terpenes.

313
00:31:58,120 --> 00:32:00,640
You're going to get high cannabinoids.

314
00:32:00,640 --> 00:32:08,280
The things on this side of the graph are probably concentrates.

315
00:32:08,280 --> 00:32:12,880
So that's more about processing.

316
00:32:12,880 --> 00:32:15,080
But I think this is interesting.

317
00:32:15,080 --> 00:32:17,960
So here I separated this.

318
00:32:17,960 --> 00:32:27,400
So here let me open this figure up so we can get a better look at it.

319
00:32:27,400 --> 00:32:33,720
So here I separated by the different producers.

320
00:32:33,720 --> 00:32:40,640
And this is what I found so interesting is that Connecticut is similar to Illinois in

321
00:32:40,640 --> 00:32:44,420
that you only have a handful of producers.

322
00:32:44,420 --> 00:32:47,800
You only have four producers in Connecticut.

323
00:32:47,800 --> 00:32:55,340
So I know Connecticut's a small state, but that seems like a small number of producers.

324
00:32:55,340 --> 00:33:03,680
So we'll actually look at Massachusetts next and compare the two.

325
00:33:03,680 --> 00:33:05,720
We'll do the comparison next week.

326
00:33:05,720 --> 00:33:08,040
We'll get Massachusetts data today.

327
00:33:08,040 --> 00:33:13,240
Again, we'll look at Massachusetts today, then perhaps do a comparison next week.

328
00:33:13,240 --> 00:33:17,640
But for this week, I thought it was interesting that, oh, let's look at what these different

329
00:33:17,640 --> 00:33:19,120
companies are doing.

330
00:33:19,120 --> 00:33:22,880
And not to single out any one of these companies.

331
00:33:22,880 --> 00:33:27,560
But it looks like they kind of have different segments of the market.

332
00:33:27,560 --> 00:33:30,200
They've each found things that work well for them.

333
00:33:30,200 --> 00:33:35,400
So you've got Theraplant here.

334
00:33:35,400 --> 00:33:41,720
And it looks like they've got, you know, you'd have to do the statistics.

335
00:33:41,720 --> 00:33:48,160
But they have maybe lower terpenes than some of the other players on average.

336
00:33:48,160 --> 00:33:54,840
But wow, they definitely have some of these high cannabinoids over here.

337
00:33:54,840 --> 00:34:06,360
So maybe they do a lot of distillates, these high THC distillates perhaps.

338
00:34:06,360 --> 00:34:15,400
And then maybe these companies like Advanced Grow, definitely the Connecticut Pharmaceutical

339
00:34:15,400 --> 00:34:16,400
Solutions.

340
00:34:16,400 --> 00:34:23,840
It looks like they're maybe doing some of these, you know, high terpene concentrates

341
00:34:23,840 --> 00:34:33,480
where sometimes I do believe they'll add terpenes back in after the...

342
00:34:33,480 --> 00:34:40,640
And Heather can maybe speak to this, where this is maybe what consumers are looking for,

343
00:34:40,640 --> 00:34:43,240
these real tasty concentrates.

344
00:34:43,240 --> 00:34:46,920
So it's interesting, right?

345
00:34:46,920 --> 00:34:52,840
So here you have, in orange, Connecticut CPS.

346
00:34:52,840 --> 00:34:56,080
And they're not even the highest in cannabinoids.

347
00:34:56,080 --> 00:35:00,640
But wow, look, they've got the highest terpenes: they've

348
00:35:00,640 --> 00:35:04,880
got one here that's almost 10% terpenes and no one else is up there.

349
00:35:04,880 --> 00:35:09,480
Well, I think Theraplant is up there a couple of times, too.

350
00:35:09,480 --> 00:35:15,400
So long story short, I just thought it was interesting to break this out by producer

351
00:35:15,400 --> 00:35:18,400
since you only have four.

352
00:35:18,400 --> 00:35:25,760
And then once again, it looks like there's maybe a slight positive correlation here,

353
00:35:25,760 --> 00:35:34,680
especially when you get into this sort of what I would call the high quality top shelf

354
00:35:34,680 --> 00:35:42,960
flower where you basically got the flower really starting at around 15 to 18% to about,

355
00:35:42,960 --> 00:35:43,960
you know, the high...

356
00:35:43,960 --> 00:35:47,320
Well, this is total cannabinoids.

357
00:35:47,320 --> 00:35:52,120
You know, this is going to be up to like 33, about 35%.

358
00:35:52,120 --> 00:35:55,320
Looks like some of them kind of pushing that.

359
00:35:55,320 --> 00:36:04,480
And that's, you know, just adding all the cannabinoids together, not taking into account the mass

360
00:36:04,480 --> 00:36:07,280
factor for THCA or anything like that.
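
To make the "mass factor" concrete, here is a small sketch with made-up sample values (not the Connecticut data). It contrasts a naive sum of cannabinoids with a total THC figure that applies the commonly used decarboxylation factor of about 0.877 to THCA, which is the ratio of the molar masses of THC and THCA.

```python
# Hypothetical potency results (% by weight) for one flower sample;
# made-up numbers, not values from the Connecticut data.
sample = {"thca": 24.0, "thc": 0.8, "cbda": 0.5, "cbd": 0.1, "cbg": 0.9}

# Naive total: simply add every cannabinoid together.
naive_total = sum(sample.values())

# THCA loses CO2 when it decarboxylates into THC, so only about 87.7% of its
# mass carries over: molar mass of THC / molar mass of THCA.
DECARB_FACTOR = 314.47 / 358.48

# Total THC, accounting for the mass lost on decarboxylation.
total_thc = sample["thc"] + DECARB_FACTOR * sample["thca"]

print(f"Naive total cannabinoids: {naive_total:.2f}%")
print(f"Total THC (mass-adjusted): {total_thc:.2f}%")
```

The naive sum always overstates what ends up as THC when most of the potency is THCA, which is worth keeping in mind when reading a "total cannabinoids" axis.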

361
00:36:07,280 --> 00:36:11,280
So long story short, this is where the bulk of your flower is shaking out.

362
00:36:11,280 --> 00:36:18,720
And it does look like there's a positive correlation between terpenes and total cannabinoids.
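
One way to put a number on that eyeballed correlation is a Pearson coefficient over the "top shelf" range. This is a minimal sketch with invented lab results; the column names and the 15% cutoff are assumptions for illustration.

```python
import pandas as pd

# Hypothetical lab results: made-up values, not the Connecticut dataset.
results = pd.DataFrame({
    "total_cannabinoids": [9.8, 15.2, 18.4, 21.0, 24.7, 27.9, 31.5],
    "total_terpenes": [1.8, 0.9, 1.1, 1.6, 1.5, 2.2, 2.6],
})

# Keep only the "top shelf" flower (roughly 15% total cannabinoids and up).
top_shelf = results[results["total_cannabinoids"] >= 15]

# Pearson correlation coefficient between the two columns (pandas default).
r = top_shelf["total_cannabinoids"].corr(top_shelf["total_terpenes"])
print(f"Pearson r (top shelf only) = {r:.2f}")
```

A coefficient alone says nothing about significance; with few samples per producer it is a descriptive number, not a conclusion.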

363
00:36:18,720 --> 00:36:22,120
So cool things going on in Connecticut.

364
00:36:22,120 --> 00:36:24,800
Awesome data there.

365
00:36:24,800 --> 00:36:33,760
So to step next door, we can start looking at Massachusetts.

366
00:36:33,760 --> 00:36:37,640
So this will be a brand new look here at this data.

367
00:36:37,640 --> 00:36:43,640
I just put together the requests this morning.

368
00:36:43,640 --> 00:36:50,120
So now we can get back into some of the programming things for you, for you program nerds out

369
00:36:50,120 --> 00:36:51,120
there.

370
00:36:51,120 --> 00:36:58,680
So basically, if you're using Python, we were using SodaPy, which is a third-party package

371
00:36:58,680 --> 00:37:01,760
to access the API.

372
00:37:01,760 --> 00:37:06,160
I was looking around online and some people were kind of complaining.

373
00:37:06,160 --> 00:37:13,280
Socrata is maybe a little, or rather not Socrata, but the SodaPy package is maybe a little dated.

374
00:37:13,280 --> 00:37:19,600
And you can accomplish the same things on your own with simple HTTP requests.

375
00:37:19,600 --> 00:37:28,080
So we're just going to, well, one thing Cannlytics stresses, and

376
00:37:28,080 --> 00:37:33,720
Python for that matter, is that simple is better than complex.

377
00:37:33,720 --> 00:37:38,000
Simple solutions just tend to work out better in the long run.

378
00:37:38,000 --> 00:37:47,680
So we can simply request this data with the requests package, which is about as simple

379
00:37:47,680 --> 00:37:50,440
and bare bones as you can get.

380
00:37:50,440 --> 00:37:58,280
So basically, we're just going to import these pretty standard packages here.

381
00:37:58,280 --> 00:38:02,700
I am reading in an app token.

382
00:38:02,700 --> 00:38:14,360
You don't need one, but it keeps our requests from being throttled.

383
00:38:14,360 --> 00:38:17,200
Then we're just going to define our headers.

384
00:38:17,200 --> 00:38:19,320
So basically this is our authentication.

385
00:38:19,320 --> 00:38:22,600
We just pass our app token.

386
00:38:22,600 --> 00:38:26,520
This is the main URL.

387
00:38:26,520 --> 00:38:31,520
So it's just like visiting a web page where you just, it's just a URL.

388
00:38:31,520 --> 00:38:37,120
So this is just a URL that contains a lot of awesome data.
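
The setup described above (optional app token, headers, base URL) might look roughly like this. The environment variable name is hypothetical, and the portal URL is an assumption about where the Massachusetts Cannabis Control Commission hosts its Socrata data; `X-App-Token` is the header Socrata documents for app tokens.

```python
import os

# Assumption: the token lives in an environment variable; the name
# SOCRATA_APP_TOKEN is hypothetical. The token is optional, but Socrata
# throttles requests that don't send one.
app_token = os.environ.get("SOCRATA_APP_TOKEN", "")

# Socrata reads the app token from the X-App-Token request header.
headers = {"X-App-Token": app_token} if app_token else {}

# Assumption: base URL of the Massachusetts CCC open data portal;
# each dataset is served under /resource/<dataset-id>.
BASE_URL = "https://opendata.mass-cannabis-control.com/resource/"

print(BASE_URL, "token set" if headers else "no token (requests may be throttled)")
```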

389
00:38:37,120 --> 00:38:43,320
So we can just read those into the terminal here.

390
00:38:43,320 --> 00:38:46,840
And we can start getting these data points.

391
00:38:46,840 --> 00:38:53,920
And so, before we request anything, let me show you the data source.

392
00:38:53,920 --> 00:39:03,400
So the first one we'll get, we're just going to get the adult use cannabis retail sales

393
00:39:03,400 --> 00:39:07,960
by date and product type in Massachusetts.

394
00:39:07,960 --> 00:39:12,240
So basically we're just going to get all of the Massachusetts data.

395
00:39:12,240 --> 00:39:17,400
We'll get what we can, and then we'll see what we can do with it at that point.

396
00:39:17,400 --> 00:39:27,680
So as we noted, we really just need this data set ID and then Bob's your uncle.

397
00:39:27,680 --> 00:39:41,320
So we just slap that data set ID onto the base URL and add .json, because that way we can

398
00:39:41,320 --> 00:39:50,720
specify that we want our data in JSON. Requests also allows us to specify our parameters here.
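
Putting those pieces together with the requests package: a sketch of a small helper, assuming the same portal base URL as before. The dataset ID shown is a placeholder, not a real ID; the real ones are listed on each dataset's page.

```python
import requests

# Assumed base URL of the Massachusetts CCC open data portal.
BASE_URL = "https://opendata.mass-cannabis-control.com/resource/"

def dataset_url(dataset_id):
    """Append the dataset ID and .json so Socrata returns JSON."""
    return f"{BASE_URL}{dataset_id}.json"

def get_dataset(dataset_id, params=None, app_token=None, timeout=30):
    """GET a Socrata dataset and return the parsed records (a list of dicts)."""
    headers = {"X-App-Token": app_token} if app_token else {}
    response = requests.get(dataset_url(dataset_id), params=params,
                            headers=headers, timeout=timeout)
    response.raise_for_status()  # fail loudly on 4xx/5xx instead of parsing an error page
    return response.json()

# Build (without sending) a request for a placeholder ID to inspect the final URL.
prepared = requests.Request("GET", dataset_url("xxxx-xxxx"), params={"$limit": 100}).prepare()
print(prepared.url)

# With a real dataset ID and network access:
# records = get_dataset("<dataset-id>", params={"$limit": 100})
```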

399
00:39:50,720 --> 00:40:06,400
And I will admit that I am not an SQL expert by any means.

400
00:40:06,400 --> 00:40:12,760
Anyone, just tell me to move along if I'm getting too technical here.

401
00:40:12,760 --> 00:40:28,240
But basically if you want to nerd out, Socrata has good documentation for how to do all of

402
00:40:28,240 --> 00:40:30,080
this.

403
00:40:30,080 --> 00:40:39,200
And so I won't dive into it too much on my own here, but I've put the links up on the

404
00:40:39,200 --> 00:40:40,200
GitHub.

405
00:40:40,200 --> 00:40:48,080
And so then you can, you know, if you're interested, you can read up and see, okay, this is how

406
00:40:48,080 --> 00:41:00,040
you order the data; you can do many magical things here.

407
00:41:00,040 --> 00:41:08,760
So anyway, if you're interested, there are many powerful things you can do. As for our needs

408
00:41:08,760 --> 00:41:09,760
today.

409
00:41:09,760 --> 00:41:18,680
Sorry, let's get back on track here.

410
00:41:18,680 --> 00:41:24,200
For our needs today, we don't really need a limit, but I'm going to add one anyway.

411
00:41:24,200 --> 00:41:26,560
And then we just need to order the data.

412
00:41:26,560 --> 00:41:33,040
So I've already looked at the data set here.

413
00:41:33,040 --> 00:41:35,300
Here we are.

414
00:41:35,300 --> 00:41:51,200
You can see the fields, and actually, is this the right data set?

415
00:41:51,200 --> 00:41:55,040
Well we'll find out.

416
00:41:55,040 --> 00:42:02,360
But these are the data points and it looks like we want to order by the activity summary

417
00:42:02,360 --> 00:42:03,360
date.
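
The limit and ordering are just SoQL query parameters passed along with the request. In this sketch the field name `activitysummarydate` is an assumption based on the "activity summary date" column on the portal; the exact API field name should be checked on the dataset's API page.

```python
from urllib.parse import urlencode

# SoQL query parameters; "activitysummarydate" is an assumed field name.
params = {
    "$limit": 100,                         # not strictly needed, but caps the response
    "$order": "activitysummarydate DESC",  # newest days first (ASC is the default)
}

# This is the query string requests would append for params=params.
query_string = urlencode(params)
print(query_string)
```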

418
00:42:03,360 --> 00:42:12,560
There's a reason I'm saying sales date.

419
00:42:12,560 --> 00:42:19,400
So anyway, it should work, but there's a little bit of a disconnect here.

420
00:42:19,400 --> 00:42:21,240
Sorry, the documentation is not great.

421
00:42:21,240 --> 00:42:23,320
I should have left better comments.

422
00:42:23,320 --> 00:42:33,400
But anyways, for this first one, we're just going to get the products that are in Massachusetts.

423
00:42:33,400 --> 00:42:38,680
So just read 100 of them.

424
00:42:38,680 --> 00:42:43,160
And so these are products that are getting sold.

425
00:42:43,160 --> 00:42:50,360
And so it looks like, okay, these are our product sales by day.

426
00:42:50,360 --> 00:42:56,840
So it looks like we've got data up until the 14th.

427
00:42:56,840 --> 00:42:59,800
So up through last week.

428
00:42:59,800 --> 00:43:02,840
So not bad.

429
00:43:02,840 --> 00:43:09,160
And then we just know, okay, this is the amount of buds that were sold.

430
00:43:09,160 --> 00:43:12,080
This is the amount of concentrate that was sold.

431
00:43:12,080 --> 00:43:16,040
This is the amount of edibles that were sold.

432
00:43:16,040 --> 00:43:18,920
So mostly sales data at this point.
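
Before doing any math on these sales columns, it helps to load the JSON records into pandas and convert types, since Socrata's JSON output often delivers numbers as strings. The field names and values below are illustrative, not the real Massachusetts schema.

```python
import pandas as pd

# Hypothetical records in the shape a Socrata JSON endpoint returns:
# a list of dicts, with numeric fields arriving as strings.
records = [
    {"date": "2021-09-13T00:00:00.000", "buds": "1520.5", "concentrate": "210.0", "edibles": "380.25"},
    {"date": "2021-09-14T00:00:00.000", "buds": "1498.0", "concentrate": "205.5", "edibles": "401.0"},
]

sales = pd.DataFrame(records)
sales["date"] = pd.to_datetime(sales["date"])

# Convert product columns from strings to numbers; unparseable values become NaN.
product_columns = ["buds", "concentrate", "edibles"]
sales[product_columns] = sales[product_columns].apply(pd.to_numeric, errors="coerce")

print(sales.dtypes)
print(sales[product_columns].sum())
```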

433
00:43:18,920 --> 00:43:22,280
However, we can still, we can.

434
00:43:22,280 --> 00:43:29,080
Hold on, there may be.

435
00:43:29,080 --> 00:43:39,160
Anyways, maybe there was someone trying to join, but anyways.

436
00:43:39,160 --> 00:43:45,720
Well this is interesting data because we can maybe start to compare Massachusetts to Connecticut.

437
00:43:45,720 --> 00:43:52,360
And so what we've been trying, we've been working on is, okay, what's the competitiveness

438
00:43:52,360 --> 00:43:54,080
of these different states?

439
00:43:54,080 --> 00:44:01,720
Because each state is supposedly a little island, where they're all operating

440
00:44:01,720 --> 00:44:04,360
like these little isolated environments.

441
00:44:04,360 --> 00:44:07,400
And so we could say, what's the effect of that?

442
00:44:07,400 --> 00:44:14,480
So like Illinois, they only allowed a few dozen licensees.

443
00:44:14,480 --> 00:44:19,120
How does Illinois compare to Michigan, which is, or to Oklahoma?

444
00:44:19,120 --> 00:44:22,560
Like, we know Oklahoma in particular is a free-for-all.

445
00:44:22,560 --> 00:44:25,920
So how do those states compare?

446
00:44:25,920 --> 00:44:30,120
How does Connecticut compare to Massachusetts?

447
00:44:30,120 --> 00:44:40,800
So Connecticut only has four licensees that produce cannabis, four production licenses.

448
00:44:40,800 --> 00:44:46,840
We'll find out how many Massachusetts has, and then does that have any effect on the

449
00:44:46,840 --> 00:44:48,360
competitiveness?

450
00:44:48,360 --> 00:44:55,760
So can we now compare sales in Connecticut to sales in Massachusetts?

451
00:44:55,760 --> 00:44:56,760
Are they different?

452
00:44:56,760 --> 00:45:05,600
Like is there a price discrepancy if we can break that out?

453
00:45:05,600 --> 00:45:09,080
Because that's ultimately what I'm getting at, coming from

454
00:45:09,080 --> 00:45:13,880
my economics background: what are the effects of these policies?

455
00:45:13,880 --> 00:45:23,760
So yes, they're legalizing and permitting cannabis in various forms, but what's the

456
00:45:23,760 --> 00:45:25,400
effect of that?

457
00:45:25,400 --> 00:45:29,960
Maybe some policies are more optimal than others.

458
00:45:29,960 --> 00:45:36,280
You may have unintended consequences of certain policies.

459
00:45:36,280 --> 00:45:44,480
It's fascinating to see, okay, how do the different states shake out?

460
00:45:44,480 --> 00:45:56,880
And so, rather than just talk, let's look at the licensees here in Massachusetts.

461
00:45:56,880 --> 00:46:03,880
So there's a hundred of them.

462
00:46:03,880 --> 00:46:06,640
Let's actually just go ahead and get all of them.

463
00:46:06,640 --> 00:46:10,480
I think there are fewer than a thousand.
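
With fewer than a thousand rows, one request with a large `$limit` is enough. For bigger datasets, a common pattern is paging with SoQL's `$limit` and `$offset`, ordered by the system `:id` field so pages stay stable. In this sketch, `fetch_page` is a hypothetical callable (for example a wrapper around `requests.get`), and a fake in-memory fetcher stands in for the API.

```python
def fetch_all(fetch_page, page_size=1000):
    """Collect every record by paging with $limit and $offset.

    fetch_page: any callable taking a params dict and returning a list of records.
    """
    records, offset = [], 0
    while True:
        page = fetch_page({"$limit": page_size, "$offset": offset, "$order": ":id"})
        records.extend(page)
        if len(page) < page_size:  # a short (or empty) page means we're done
            return records
        offset += page_size

# Offline stand-in for the API: 898 fake licensee rows.
fake_rows = [{"license_number": str(i)} for i in range(898)]

def fake_fetch(params):
    start = params["$offset"]
    return fake_rows[start:start + params["$limit"]]

licensees = fetch_all(fake_fetch, page_size=250)
print(len(licensees))
```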

464
00:46:10,480 --> 00:46:14,280
So let's see.

465
00:46:14,280 --> 00:46:20,280
Okay, awesome.

466
00:46:20,280 --> 00:46:25,640
So in Massachusetts, there are 898 licensees.

467
00:46:25,640 --> 00:46:27,560
Awesome.

468
00:46:27,560 --> 00:46:29,800
What do they do?

469
00:46:29,800 --> 00:46:31,960
So there's license type.

470
00:46:31,960 --> 00:46:35,280
So let's find out how many of each type there are.

471
00:46:35,280 --> 00:46:38,800
So this is the first time I looked at this data with you.

472
00:46:38,800 --> 00:46:46,440
I promise all I've done is write the code to get the data.

473
00:46:46,440 --> 00:46:53,040
So that's the, getting the data is often the most boring part.

474
00:46:53,040 --> 00:46:55,800
So I've already done sort of the boring part.

475
00:46:55,800 --> 00:47:03,000
And so now we get to get into the fun part, which is looking at the data, cleaning it

476
00:47:03,000 --> 00:47:07,840
up, which is fun depending on who you talk to.

477
00:47:07,840 --> 00:47:13,800
And then visualizing the data, which, if you're a fan of Edward Tufte, is what

478
00:47:13,800 --> 00:47:14,800
it's all about.

479
00:47:14,800 --> 00:47:19,600
If you've got data, you want to visualize it like that.

480
00:47:19,600 --> 00:47:25,760
Visualizing the data should be step one, but first you actually have

481
00:47:25,760 --> 00:47:29,880
to get the data and then you actually have to kind of clean it up.

482
00:47:29,880 --> 00:47:32,760
So inevitably those are the first two steps.

483
00:47:32,760 --> 00:47:36,760
But then you want to look at the data.

484
00:47:36,760 --> 00:47:42,520
So anyways, let's do just that.

485
00:47:42,520 --> 00:47:54,160
So, for each license type in licensees.license_type,

486
00:47:54,160 --> 00:47:58,000
So let's just look at all the unique license types.

487
00:47:58,000 --> 00:48:01,720
Let's list them.

488
00:48:01,720 --> 00:48:12,040
For all of these.

489
00:48:12,040 --> 00:48:16,020
Let's find the, you know, the license count for each one.

490
00:48:16,020 --> 00:48:31,360
So that's just going to be the length of our licensees where the license type is equal to

491
00:48:31,360 --> 00:48:33,240
that license type.

492
00:48:33,240 --> 00:48:44,720
So that should get us the, that should get us the license count.

493
00:48:44,720 --> 00:48:49,720
And then we can just print that out.

494
00:48:49,720 --> 00:49:02,360
We can say, okay, so what's the number of this license type?

495
00:49:02,360 --> 00:49:04,600
And that's just going to be the license count.
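
The counting loop described above, sketched on a small made-up licensee table (the rows and labels are illustrative), alongside pandas' built-in `value_counts`, which gives the same conditional counts in one call.

```python
import pandas as pd

# Hypothetical licensee table; rows and labels are made up.
licensees = pd.DataFrame({
    "license_type": ["Retailer", "Cultivator", "Retailer",
                     "Product Manufacturer", "Cultivator", "Retailer"],
})

# Loop version: count the rows matching each unique license type.
license_counts = {}
for license_type in licensees.license_type.unique():
    license_count = len(licensees[licensees.license_type == license_type])
    license_counts[license_type] = license_count
    print(f"Number of {license_type}: {license_count}")

# Equivalent one-liner.
counts = licensees["license_type"].value_counts()
```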

496
00:49:04,600 --> 00:49:10,160
No promises this is going to work.

497
00:49:10,160 --> 00:49:12,240
It looks like it may have.

498
00:49:12,240 --> 00:49:20,880
So this is awesome because now we have the count of all the licensees here in Massachusetts,

499
00:49:20,880 --> 00:49:27,280
which is already useful because we can already start making comparisons to Connecticut right

500
00:49:27,280 --> 00:49:31,760
off the bat.

501
00:49:31,760 --> 00:49:41,080
There are a lot more cultivators in Massachusetts than there are in Connecticut.

502
00:49:41,080 --> 00:49:44,280
What's the population difference of these two states?

503
00:49:44,280 --> 00:49:56,520
Well, this is a, this is a data point that we've already collected here.

504
00:49:56,520 --> 00:50:06,040
So you can use FRED, the St. Louis Fed's economic data service, to find the population for various states.

505
00:50:06,040 --> 00:50:07,640
And so let's compare these two.

506
00:50:07,640 --> 00:50:21,320
So Massachusetts is 6.9 million, Connecticut's 3.6 million.

507
00:50:21,320 --> 00:50:24,320
So what did we say?

508
00:50:24,320 --> 00:50:31,520
3.6, 6.9, 6.9 minus 3.6.

509
00:50:31,520 --> 00:50:51,160
So Massachusetts is almost twice as large as, and so Massachusetts is a little less

510
00:50:51,160 --> 00:50:57,200
than twice as large as Connecticut.

511
00:50:57,200 --> 00:51:02,760
So that's interesting, but it's got a lot more than twice the number of cultivators,

512
00:51:02,760 --> 00:51:03,760
right?

513
00:51:03,760 --> 00:51:10,880
Remember, Connecticut had four, and you've got 274 in Massachusetts.

514
00:51:10,880 --> 00:51:15,440
So quite a bit different.
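
The comparison above as quick arithmetic, using only the figures quoted in the talk (populations in millions, cultivator license counts). Normalizing per million residents makes the gap explicit.

```python
# Figures quoted above: population in millions and cultivator license counts.
population_m = {"Massachusetts": 6.9, "Connecticut": 3.6}
cultivators = {"Massachusetts": 274, "Connecticut": 4}

population_ratio = population_m["Massachusetts"] / population_m["Connecticut"]
cultivator_ratio = cultivators["Massachusetts"] / cultivators["Connecticut"]

print(f"Population ratio (MA / CT): {population_ratio:.2f}")  # 1.92
print(f"Cultivator ratio (MA / CT): {cultivator_ratio:.1f}")  # 68.5

for state in population_m:
    per_million = cultivators[state] / population_m[state]
    print(f"{state}: {per_million:.1f} cultivators per million residents")
```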

515
00:51:15,440 --> 00:51:22,720
And then you have a handful of other companies, you know, you have your processors, you have

516
00:51:22,720 --> 00:51:25,720
your retailers.

517
00:51:25,720 --> 00:51:37,600
And then we can find out, okay, what's the percentage that these are of the whole market

518
00:51:37,600 --> 00:51:38,600
here?

519
00:51:38,600 --> 00:51:45,440
So we want to find out what percent this is of the whole market.

520
00:51:45,440 --> 00:51:55,600
For the percentage of licensees, well, that's just going to be the license count divided

521
00:51:55,600 --> 00:51:57,600
by all licensees.
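
A sketch of that percentage calculation on a made-up ten-row table. The `:.2%` format spec multiplies by 100 and shows two decimal places, which also handles the "more decimal places" and percentage display in one step.

```python
import pandas as pd

# Hypothetical licensee table (10 made-up rows).
licensees = pd.DataFrame({
    "license_type": ["Retailer"] * 4 + ["Cultivator"] * 3
                    + ["Product Manufacturer"] * 2 + ["Laboratory"],
})

total_licensees = len(licensees)
for license_type in licensees["license_type"].unique():
    license_count = (licensees["license_type"] == license_type).sum()
    percentage = license_count / total_licensees
    # :.2% renders 0.4 as "40.00%"
    print(f"{license_type}: {license_count} ({percentage:.2%})")

# The same shares in one call.
shares = licensees["license_type"].value_counts(normalize=True)
```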

522
00:51:57,600 --> 00:52:06,800
So let's see if we can't.

523
00:52:06,800 --> 00:52:19,680
Need more decimal places.

524
00:52:19,680 --> 00:52:21,680
Cool.

525
00:52:21,680 --> 00:52:25,640
So now we can even break it down.

526
00:52:25,640 --> 00:52:29,400
So we've done this in other states.

527
00:52:29,400 --> 00:52:34,520
So now this is cool because now we can start comparing Massachusetts to other states.

528
00:52:34,520 --> 00:52:39,880
We can say, okay, look at this.

529
00:52:39,880 --> 00:52:44,880
Almost 40% of the licensees are retailers.

530
00:52:44,880 --> 00:52:48,200
That's much higher than we've noted in other states.

531
00:52:48,200 --> 00:52:55,640
I want to say we were just looking at these statistics in Oklahoma and maybe only 15%

532
00:52:55,640 --> 00:52:58,480
of the licensees were retailers.

533
00:52:58,480 --> 00:53:04,280
And maybe 60 to 70% were cultivators.

534
00:53:04,280 --> 00:53:07,400
Here only 30% are cultivators.

535
00:53:07,400 --> 00:53:13,800
The share of manufacturers is a little higher than we saw in Oklahoma, but comparable.

536
00:53:13,800 --> 00:53:18,880
And then of course, your other businesses are comparable where you only have a small

537
00:53:18,880 --> 00:53:23,920
number of laboratories.

538
00:53:23,920 --> 00:53:26,400
This may be higher than it was in Oklahoma.

539
00:53:26,400 --> 00:53:32,680
This is greater than 1%.

540
00:53:32,680 --> 00:53:38,640
So let's put this in percentages.

541
00:53:38,640 --> 00:53:46,760
But that's not what we want.

542
00:53:46,760 --> 00:53:54,280
Exactly.

543
00:53:54,280 --> 00:54:00,480
So I think in Oklahoma, maybe less than 1% of the licensees were laboratories.

544
00:54:00,480 --> 00:54:03,320
Here you have a little more than 1%.

545
00:54:03,320 --> 00:54:08,760
Not sure if that's a significant difference.

546
00:54:08,760 --> 00:54:14,560
You've got just a small number of transporters, et cetera.

547
00:54:14,560 --> 00:54:20,640
I just think it's interesting that you have a much different breakdown of retailers, of

548
00:54:20,640 --> 00:54:26,040
the percent of retailers and cultivators than you do in Oklahoma.

549
00:54:26,040 --> 00:54:33,240
And so I would like to think, OK, does this have to do with time?

550
00:54:33,240 --> 00:54:39,120
Is Massachusetts a more mature market than Oklahoma?

551
00:54:39,120 --> 00:54:41,720
And you could trend that.

552
00:54:41,720 --> 00:54:44,960
So you could start to make predictions.

553
00:54:44,960 --> 00:54:50,600
So maybe you could look at Massachusetts early on and maybe look at Massachusetts like a

554
00:54:50,600 --> 00:54:56,400
year ago, two years ago, and see, OK, maybe the percent of retailers has increased over

555
00:54:56,400 --> 00:55:00,920
time and the percent of cultivators has decreased over time.

556
00:55:00,920 --> 00:55:04,640
Just a conjecture, but that may be the case.

557
00:55:04,640 --> 00:55:09,440
And so if that's the case, you can basically learn from Massachusetts.

558
00:55:09,440 --> 00:55:13,400
And you could maybe use that to apply to Oklahoma.

559
00:55:13,400 --> 00:55:22,080
And you say, OK, maybe over time, Oklahoma may wind up with a higher percentage of retailers

560
00:55:22,080 --> 00:55:24,760
and a lower percentage of cultivators.

561
00:55:24,760 --> 00:55:26,760
May, may not.

562
00:55:26,760 --> 00:55:30,680
And it may be a policy effect.

563
00:55:30,680 --> 00:55:31,680
Right?

564
00:55:31,680 --> 00:55:41,680
So if you look at them, you watch them over time, and time doesn't appear to be a factor,

565
00:55:41,680 --> 00:55:46,680
which would be like maturity, learning by doing, et cetera, then maybe there's another

566
00:55:46,680 --> 00:55:48,680
explanatory factor.

567
00:55:48,680 --> 00:55:49,680
Right?

568
00:55:49,680 --> 00:55:52,840
Maybe it's the policy.

569
00:55:52,840 --> 00:56:00,520
Maybe there are policies in place in Massachusetts that just encourage a larger percentage of

570
00:56:00,520 --> 00:56:04,200
retail versus Oklahoma.

571
00:56:04,200 --> 00:56:11,560
So already, already, we've barely gotten the data.

572
00:56:11,560 --> 00:56:14,000
We've barely scratched the surface.

573
00:56:14,000 --> 00:56:15,000
Right?

574
00:56:15,000 --> 00:56:20,320
We've basically taken like, we're talking about a conditional average.

575
00:56:20,320 --> 00:56:22,400
We've taken a conditional count.

576
00:56:22,400 --> 00:56:27,680
Like, we made like one conditional statistic here.

577
00:56:27,680 --> 00:56:34,240
And we already have groundbreaking insights.

578
00:56:34,240 --> 00:56:37,680
I mean, you could borderline already write a paper on this.

579
00:56:37,680 --> 00:56:42,360
Like say, oh, like this is, right, because what we're doing is we're basically looking

580
00:56:42,360 --> 00:56:45,800
at the industrial organization of these different states.

581
00:56:45,800 --> 00:56:56,800
And you can start to ask: the policies and behavior of the organizations and players,

582
00:56:56,800 --> 00:57:00,960
how does that affect the structure?

583
00:57:00,960 --> 00:57:03,840
So fascinating stuff here.

584
00:57:03,840 --> 00:57:09,960
So we'll kind of bring it to a head here, but basically just want to show you the rest

585
00:57:09,960 --> 00:57:12,400
of the data points.

586
00:57:12,400 --> 00:57:18,880
That way you can start picking your brains about what statistics could be made for next

587
00:57:18,880 --> 00:57:19,880
week.

588
00:57:19,880 --> 00:57:29,440
So that way we can continue to uncover insights and maybe continue our interstate comparison.

589
00:57:29,440 --> 00:57:39,320
So just to show you the other data points here, you can also get prices.

590
00:57:39,320 --> 00:57:43,440
So just get a hundred prices here.

591
00:57:43,440 --> 00:57:45,760
Awesome.

592
00:57:45,760 --> 00:57:58,360
So you basically have the average price of an ounce of flower, by day.

593
00:57:58,360 --> 00:58:01,320
So I don't think my order worked.

594
00:58:01,320 --> 00:58:05,600
Oh, yes, this is actually by month.

595
00:58:05,600 --> 00:58:13,400
So this is a really aggregated statistic, but statistics on prices are hard to come

596
00:58:13,400 --> 00:58:14,480
by.

597
00:58:14,480 --> 00:58:19,000
We've got them in Washington state because we can do Freedom of Information Act requests.

598
00:58:19,000 --> 00:58:23,960
We probably get them in other states, but prices are hard to come by.

599
00:58:23,960 --> 00:58:27,200
So this is good data.

600
00:58:27,200 --> 00:58:29,120
I'm happy to have it.

601
00:58:29,120 --> 00:58:35,160
And so here you just have the average price of an ounce per month.

602
00:58:35,160 --> 00:58:52,960
And then just to show you real quick what you can do with that, we want to sort this

603
00:58:52,960 --> 00:58:59,640
in place.

604
00:58:59,640 --> 00:59:02,480
Okay.

605
00:59:02,480 --> 00:59:10,680
Let's go to our good friend Google real quick.

606
00:59:10,680 --> 00:59:19,560
Just to see how we sort a data frame.

607
00:59:19,560 --> 00:59:21,840
We use sort_values.

608
00:59:21,840 --> 00:59:26,520
So let's sort the values.
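
A minimal `sort_values` sketch with invented monthly prices (not the Massachusetts figures). Parsing the date column first matters: sorting raw strings only happens to be chronological for ISO-formatted dates.

```python
import pandas as pd

# Hypothetical monthly average price per ounce; values are made up.
prices = pd.DataFrame({
    "date": ["2020-06-01", "2020-04-01", "2020-05-01", "2020-03-01"],
    "avg_price_oz": [362.0, 318.0, 355.0, 357.0],
})

# Parse dates so the sort is chronological, then sort oldest to newest.
prices["date"] = pd.to_datetime(prices["date"])
prices = prices.sort_values(by="date", ascending=True).reset_index(drop=True)
print(prices)
```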

609
00:59:26,520 --> 00:59:28,240
Okay.

610
00:59:28,240 --> 00:59:32,240
That looks chronological.

611
00:59:32,240 --> 00:59:33,240
Okay.

612
00:59:33,240 --> 00:59:41,320
I'll be wrapping up here, but basically just to show you this data.

613
00:59:41,320 --> 00:59:49,240
Oh, that's not what we want.

614
00:59:49,240 --> 00:59:58,160
Here we are.

615
00:59:58,160 --> 01:00:02,520
So you can plot prices over time.

616
01:00:02,520 --> 01:00:04,040
This is real interesting.

617
01:00:04,040 --> 01:00:12,560
So they have this data on their website, but as many of you noticed, there was something

618
01:00:12,560 --> 01:00:18,000
unusual about April of 2020.

619
01:00:18,000 --> 01:00:26,080
I'll let you be the judge of what was unusual about that month, but it had an effect on

620
01:00:26,080 --> 01:00:28,080
prices.

621
01:00:28,080 --> 01:00:36,560
It looks like it was a transitory effect on prices.

622
01:00:36,560 --> 01:00:47,240
So as you can see, everybody's running to the store to get cannabis.

623
01:00:47,240 --> 01:00:55,960
And for whatever reason, you'd actually think prices would rise, but for whatever reason,

624
01:00:55,960 --> 01:01:04,320
prices fell dramatically that month, April of 2020.

625
01:01:04,320 --> 01:01:12,000
But it was transitory and by May, the prices had stabilized.

626
01:01:12,000 --> 01:01:18,600
As you can see, this is what you'd call an increase in volatility, where you basically

627
01:01:18,600 --> 01:01:28,200
have pretty stable prices and then, whoa, we've got a volatile market.

628
01:01:28,200 --> 01:01:30,560
Prices are going all over the place.

629
01:01:30,560 --> 01:01:36,200
People like, this is what you see from a rise in uncertainty.

630
01:01:36,200 --> 01:01:39,160
So people are less certain about what's going on.

631
01:01:39,160 --> 01:01:40,160
Oh no.

632
01:01:40,160 --> 01:01:42,200
Is this good?

633
01:01:42,200 --> 01:01:43,200
Is this bad?

634
01:01:43,200 --> 01:01:45,040
We don't really know.

635
01:01:45,040 --> 01:01:54,080
This is what I would call increased volatility, where you see it's just greater variation

636
01:01:54,080 --> 01:02:00,240
in prices, not quite so stable.

637
01:02:00,240 --> 01:02:07,600
So who knows if this is going to persist, but that's how I would characterize this past

638
01:02:07,600 --> 01:02:08,600
year.

639
01:02:08,600 --> 01:02:16,160
And you could actually look at the statistics, actually compare the variance of this period

640
01:02:16,160 --> 01:02:18,280
versus the variance of this period.

641
01:02:18,280 --> 01:02:25,520
I know you only have so many observations, so you probably couldn't make a statistical

642
01:02:25,520 --> 01:02:30,400
claim one way or the other, but you could still look at the variance and maybe we'll

643
01:02:30,400 --> 01:02:31,880
do that next week.
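
A descriptive version of that variance comparison, sketched with invented prices split into a calmer period and a more volatile one. With this few observations a formal test (for example an F-test or Levene's test) would have little power, so the ratio is best read as descriptive.

```python
import statistics

# Hypothetical monthly prices; swap in the real series split at your chosen date.
before = [352.0, 355.0, 349.0, 351.0, 354.0, 350.0]
after = [330.0, 365.0, 340.0, 372.0, 335.0, 360.0]

# Sample variance (n - 1 in the denominator).
var_before = statistics.variance(before)
var_after = statistics.variance(after)

print(f"Variance before: {var_before:.1f}")
print(f"Variance after:  {var_after:.1f}")
print(f"Ratio (after / before): {var_after / var_before:.1f}")
```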

644
01:02:31,880 --> 01:02:35,960
But long story short, you can look at the trend.

645
01:02:35,960 --> 01:02:41,560
So there's cool things you can do, even though it's just a simple series.

646
01:02:41,560 --> 01:02:48,240
And then the last set of data points here, which can't be forgotten because this is some

647
01:02:48,240 --> 01:02:51,320
real meat right here.

648
01:02:51,320 --> 01:02:56,480
So we'll dive into this meat and potatoes next week.

649
01:02:56,480 --> 01:03:11,280
But just to show you what the data looks like, let's just look at a couple of these.

650
01:03:11,280 --> 01:03:15,320
So here is real cool data.

651
01:03:15,320 --> 01:03:17,960
So you've got the date.

652
01:03:17,960 --> 01:03:22,360
Yeah, so you've got daily data here.

653
01:03:22,360 --> 01:03:25,480
You've got plant counts.

654
01:03:25,480 --> 01:03:27,600
You've got, you know, what stage they're in.

655
01:03:27,600 --> 01:03:32,880
So you know how many flowering plants there are, how many vegetative plants there are,

656
01:03:32,880 --> 01:03:37,160
how many plants are being harvested.

657
01:03:37,160 --> 01:03:40,560
You've got the sales total.

658
01:03:40,560 --> 01:03:44,280
You know how many packages there are.

659
01:03:44,280 --> 01:03:50,760
Packages are kind of a little vague, but nonetheless, interesting data point.

660
01:03:50,760 --> 01:03:54,680
And you know strains is interesting.

661
01:03:54,680 --> 01:03:57,200
You know how many employees there are.

662
01:03:57,200 --> 01:04:02,760
This is an incredibly interesting data point because it's real.

663
01:04:02,760 --> 01:04:05,240
There are ways to measure capital.

664
01:04:05,240 --> 01:04:11,440
So in economics, we talk about two inputs to the production function, primarily.

665
01:04:11,440 --> 01:04:14,080
You've got capital and labor.
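
Written out, the production function relates output $Q$ to capital $K$ and labor $L$. The Cobb-Douglas form is shown only as one common illustrative choice, not as a model fitted here; in this data, $K$ might be proxied by plant counts and $L$ by total employees.

```latex
Q = f(K, L), \qquad \text{e.g. Cobb-Douglas: } Q = A\,K^{\alpha} L^{\beta}
```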

666
01:04:14,080 --> 01:04:17,320
And you know, there are many ways to measure capital.

667
01:04:17,320 --> 01:04:21,480
And I've done it in the past and we'll do it maybe next week where we can just kind

668
01:04:21,480 --> 01:04:27,640
of proxy capital with things like plants or what have you.

669
01:04:27,640 --> 01:04:30,120
There's ways to do it.

670
01:04:30,120 --> 01:04:34,600
But actually knowing the number of employees, the labor, of course, yes, it would be nice

671
01:04:34,600 --> 01:04:37,080
to know the breakdown of the labor.

672
01:04:37,080 --> 01:04:39,200
What are these employees doing?

673
01:04:39,200 --> 01:04:41,200
What are their wages?

674
01:04:41,200 --> 01:04:44,960
However, we'll take what we get.

675
01:04:44,960 --> 01:04:49,400
So we've been given the total number of employees.

676
01:04:49,400 --> 01:04:52,920
So I think this is an extraordinarily interesting data point.

677
01:04:52,920 --> 01:05:03,960
So you know, just to run just a little bit longer here, let's just do a similar thing

678
01:05:03,960 --> 01:05:12,920
we did with prices and just plot employees real quick because I'm super, super interested

679
01:05:12,920 --> 01:05:24,120
in what that may look like for total employees.

680
01:05:24,120 --> 01:05:26,080
So let's just end on this note.

681
01:05:26,080 --> 01:05:29,600
Oh, oh no.

682
01:05:29,600 --> 01:05:33,240
I want to check.

683
01:05:33,240 --> 01:05:41,400
OK, that is chronological order.

684
01:05:41,400 --> 01:05:46,200
So this is worrisome.

685
01:05:46,200 --> 01:05:50,760
At least it is to me.

686
01:05:50,760 --> 01:05:58,280
Unless I somehow plotted this in the wrong direction, which is not unfathomable.

687
01:05:58,280 --> 01:06:11,720
Yes, I think I may have plotted the data backwards here.

688
01:06:11,720 --> 01:06:20,840
So let's double check on this next week because right off the bat, I'm worried I may have

689
01:06:20,840 --> 01:06:26,400
plotted this in the wrong direction because this would look like, oh no, our total employees

690
01:06:26,400 --> 01:06:28,640
are falling off a cliff here.

691
01:06:28,640 --> 01:06:39,200
But I've got a feeling if you look here at the data, we've got 2021-06-05, and we have

692
01:06:39,200 --> 01:06:40,200
8,000.

693
01:06:40,200 --> 01:06:45,240
And then here we have 2021-09-14.

694
01:06:45,240 --> 01:06:46,600
We have 9,000.

695
01:06:46,600 --> 01:06:50,880
So I think this is reversed.
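
A guard against that backwards plot: parse types and sort by date before plotting. The two rows use the approximate figures read off the table above (about 8,000 on 2021-06-05 and 9,000 on 2021-09-14), listed newest first as in the unsorted data.

```python
import pandas as pd

# Approximate figures quoted above, newest first (as the rows came back).
employees = pd.DataFrame({
    "date": ["2021-09-14", "2021-06-05"],
    "total_employees": ["9000", "8000"],
})

# Parse types: dates as datetimes, counts as numbers (API values may be strings).
employees["date"] = pd.to_datetime(employees["date"])
employees["total_employees"] = pd.to_numeric(employees["total_employees"])

# Plotting by row position (e.g. employees["total_employees"].plot()) runs
# backwards when rows are newest-first, so sort chronologically first.
if not employees["date"].is_monotonic_increasing:
    employees = employees.sort_values("date").reset_index(drop=True)

print(employees)
```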

696
01:06:50,880 --> 01:06:53,120
So that's OK.

697
01:06:53,120 --> 01:06:56,120
So got a little sloppy there at the end.

698
01:06:56,120 --> 01:07:06,240
So next week, we'll finish cleaning this data and then dive into more analysis.

699
01:07:06,240 --> 01:07:13,680
So we've got all these awesome data points here so we can have a field day next week.

700
01:07:13,680 --> 01:07:18,960
So I'm going to stop the presentation there.

701
01:07:18,960 --> 01:07:22,920
But thank you both for listening to that big long spiel.

702
01:07:22,920 --> 01:07:29,640
Did any questions or comments come to mind?

703
01:07:29,640 --> 01:07:33,840
As far as the presentation went, it was a great watch.

704
01:07:33,840 --> 01:07:38,200
Everything kind of went over my head because, like I said, I'm still learning about this.

705
01:07:38,200 --> 01:07:41,520
I haven't even dipped more than a pinky toe into Python yet.

706
01:07:41,520 --> 01:07:48,000
But I was going to ask after we're done here if I could get like five minutes of your time

707
01:07:48,000 --> 01:07:49,280
to ask a couple of questions.

708
01:07:49,280 --> 01:07:51,520
It's also for my school thing.

709
01:07:51,520 --> 01:07:52,520
Oh, yes.

710
01:07:52,520 --> 01:07:55,120
Always happy to talk.

711
01:07:55,120 --> 01:07:58,400
Our approach is sort of to throw you in at the deep end.

712
01:07:58,400 --> 01:08:02,160
So like I said, it's just sort of a cursory approach.

713
01:08:02,160 --> 01:08:10,320
In practice, you'd go much slower and really parse things out and be real certain with

714
01:08:10,320 --> 01:08:11,320
things.

715
01:08:11,320 --> 01:08:14,480
But this is what we're sort of doing just real quick and easy.

716
01:08:14,480 --> 01:08:17,640
We're just showing, OK, it's possible.

717
01:08:17,640 --> 01:08:18,800
The data is there.

718
01:08:18,800 --> 01:08:19,960
You can get the data.

719
01:08:19,960 --> 01:08:21,120
You can print it.

720
01:08:21,120 --> 01:08:24,680
And so this group is more for fun.

721
01:08:24,680 --> 01:08:30,600
And then exactly, yes, if you're trying to get serious about data analytics, then I'm

722
01:08:30,600 --> 01:08:34,640
always happy to be a resource for you.

723
01:08:34,640 --> 01:08:57,880
It's good to hear.

