1
00:00:00,000 --> 00:00:10,800
Welcome to the Cannabis Data Science Meetup Group.

2
00:00:10,800 --> 00:00:12,560
Happy to have you all here.

3
00:00:12,560 --> 00:00:17,120
So just a big group of like-minded individuals,

4
00:00:17,120 --> 00:00:19,320
just from all walks of life,

5
00:00:19,320 --> 00:00:23,760
with all different interests in the cannabis industry and data science.

6
00:00:23,760 --> 00:00:27,000
So see a couple of new faces today.

7
00:00:27,000 --> 00:00:30,200
So just a quick round of introductions.

8
00:00:30,200 --> 00:00:33,200
I'll start and then we can start in my top left.

9
00:00:33,200 --> 00:00:36,080
Essentially, my name is Keegan Skeate.

10
00:00:36,080 --> 00:00:40,160
I got into the cannabis industry as a laboratory analyst,

11
00:00:40,160 --> 00:00:44,460
and then started developing software and realized I could help

12
00:00:44,460 --> 00:00:50,540
many people in the industry just with data and crunching numbers at the lab.

13
00:00:50,540 --> 00:00:54,200
So started Cannlytics to do exactly that.

14
00:00:54,200 --> 00:00:58,720
So we've been providing software solutions to laboratories across the country,

15
00:00:58,720 --> 00:01:01,720
and now we're branching into data science,

16
00:01:01,720 --> 00:01:05,440
because that's what my background is in, in particular economics.

17
00:01:05,440 --> 00:01:11,200
So I would love to hear about what you all do and what you'd like to get out of the group,

18
00:01:11,200 --> 00:01:12,760
and what your interests may be.

19
00:01:12,760 --> 00:01:16,680
So Nina, would you mind introducing yourself real quick?

20
00:01:16,680 --> 00:01:19,320
Hi Keegan, nice to meet you.

21
00:01:19,320 --> 00:01:24,040
I've been trying to catch this class for about two to three weeks now.

22
00:01:24,040 --> 00:01:26,760
I could never quite get it at 11:30.

23
00:01:26,760 --> 00:01:30,480
But I'm here and I'm ready to learn.

24
00:01:31,480 --> 00:01:38,400
Right now, I am learning data science as far as programming and R coding.

25
00:01:38,400 --> 00:01:40,200
I graduated from college,

26
00:01:40,200 --> 00:01:46,040
so I want to get into data analysis as a career. Yeah.

27
00:01:46,040 --> 00:01:49,080
Awesome. You're in the right place.

28
00:01:49,080 --> 00:01:53,120
I always tell people there's a shortage of data scientists.

29
00:01:53,120 --> 00:01:55,800
So there are people that are in need of your talents.

30
00:01:55,800 --> 00:01:59,080
So just have to connect the dots.

31
00:01:59,080 --> 00:02:02,640
So happy to have you aboard.

32
00:02:02,640 --> 00:02:06,880
John, would you mind introducing yourself real quick?

33
00:02:06,880 --> 00:02:11,320
Hello all. I'm really here just to observe.

34
00:02:11,320 --> 00:02:12,960
I'm new to the industry.

35
00:02:12,960 --> 00:02:16,600
I'm new to trying out coding.

36
00:02:16,600 --> 00:02:19,120
I've always been interested in data,

37
00:02:19,120 --> 00:02:21,320
and I happen to see that I'm new to Meetup.

38
00:02:21,320 --> 00:02:24,840
So I'm overall new to everything and I just happen to see it.

39
00:02:24,840 --> 00:02:28,440
I had the hour free, so I thought I'd join in.

40
00:02:28,440 --> 00:02:30,720
Welcome aboard, John.

41
00:02:30,720 --> 00:02:35,360
We do focus on the cannabis industry just to give the group direction.

42
00:02:35,360 --> 00:02:38,640
However, a lot of these skills can be applied elsewhere.

43
00:02:38,640 --> 00:02:42,120
So a lot of this is just data wrangling and statistics,

44
00:02:42,120 --> 00:02:45,640
and a little bit of economic theory splashed in.

45
00:02:45,640 --> 00:02:47,720
We talk about the cannabis industry.

46
00:02:47,720 --> 00:02:52,160
So you can, I'm sure, apply these skills in many aspects of life.

47
00:02:52,160 --> 00:02:54,920
Sure. I'm interested in the cannabis industry.

48
00:02:54,920 --> 00:03:01,640
So I'm interested to see where this data is coming from and what's going on in general.

49
00:03:01,640 --> 00:03:03,840
Exactly. Then, Heather,

50
00:03:03,840 --> 00:03:05,680
you don't have to chime in if you don't want to.

51
00:03:05,680 --> 00:03:08,560
I know you're going through a bit of recovery right now.

52
00:03:08,560 --> 00:03:17,000
So Heather's got experience at the lab and has been a classic Cannabis Data Science

53
00:03:17,000 --> 00:03:18,680
Meetup member for a long time now.

54
00:03:18,680 --> 00:03:20,080
So you're free to chime in.

55
00:03:20,080 --> 00:03:20,920
I have a question.

56
00:03:20,920 --> 00:03:21,560
Yeah, by all means.

57
00:03:21,560 --> 00:03:22,320
I have a question.

58
00:03:22,320 --> 00:03:26,520
With our data over the past couple of weeks,

59
00:03:26,520 --> 00:03:30,520
do dark stores make it to the data?

60
00:03:30,520 --> 00:03:34,440
I know that, or at least in my area,

61
00:03:34,440 --> 00:03:38,480
some stores have not made it to the public essentially.

62
00:03:38,480 --> 00:03:43,400
They're just operating from the curbside because

63
00:03:43,400 --> 00:03:48,120
the employees just have gotten hit with COVID.

64
00:03:48,120 --> 00:03:51,560
So it's like, they're dark stores anyway.

65
00:03:51,560 --> 00:03:52,880
So it's like, what's the purpose?

66
00:03:52,880 --> 00:03:57,200
I mean, me, I can't bear the lights in the dispensary.

67
00:03:57,200 --> 00:04:00,040
So when they see me, they think I have COVID and I don't.

68
00:04:00,040 --> 00:04:03,120
I'm like, I can't bear their orbs of light.

69
00:04:03,120 --> 00:04:07,120
So I don't mean I just need my cannabis.

70
00:04:07,120 --> 00:04:09,720
I know what it looks like anyway,

71
00:04:09,720 --> 00:04:11,360
as long as it doesn't have,

72
00:04:11,360 --> 00:04:14,320
it's the product that I know is what I need.

73
00:04:14,320 --> 00:04:19,160
So it's like, I'm just curious if the stores themselves are,

74
00:04:19,160 --> 00:04:22,040
when you say the stores and the dispensaries,

75
00:04:22,040 --> 00:04:24,040
these are not dark stores included?

76
00:04:25,280 --> 00:04:30,280
As far as dark stores, I think we're just measuring

77
00:04:30,280 --> 00:04:33,440
the ones that are officially on the books with,

78
00:04:33,440 --> 00:04:35,760
so we've been looking at sales in different states.

79
00:04:35,760 --> 00:04:36,960
And so, exactly.

80
00:04:36,960 --> 00:04:40,880
So we're just looking at stores that are on the books

81
00:04:40,880 --> 00:04:43,080
and reporting all their revenue.

82
00:04:43,080 --> 00:04:47,160
So, I mean, I myself lived in California for a little bit.

83
00:04:47,160 --> 00:04:50,360
So I think things have been shaping up a bit there.

84
00:04:50,360 --> 00:04:51,600
But when I was there,

85
00:04:51,600 --> 00:04:53,200
I kind of know what you're talking about.

86
00:04:53,200 --> 00:04:55,800
There are definitely some that are clearly licensed

87
00:04:55,800 --> 00:04:59,320
and then there are some where you're less sure

88
00:04:59,320 --> 00:05:02,160
if they are licensed.

89
00:05:02,160 --> 00:05:04,680
Is that what you're talking about?

90
00:05:04,680 --> 00:05:06,480
So like, oh, no, no, no, no, no, no.

91
00:05:06,480 --> 00:05:08,760
Like they don't have a retail presence,

92
00:05:08,760 --> 00:05:11,520
like a store that you can step into.

93
00:05:13,360 --> 00:05:15,120
So anybody that reports sales

94
00:05:15,120 --> 00:05:17,720
is gonna contribute to your data.

95
00:05:18,720 --> 00:05:22,200
Heather, are you talking about like street dealers?

96
00:05:22,200 --> 00:05:24,680
I don't think there are. No.

97
00:05:24,680 --> 00:05:29,680
No, so there are dispensaries that allow the customer

98
00:05:30,280 --> 00:05:33,040
to step into the store.

99
00:05:33,040 --> 00:05:36,040
And then there are dispensaries that don't.

100
00:05:36,040 --> 00:05:39,200
And I don't believe that we have these in Maryland.

101
00:05:39,200 --> 00:05:43,880
But like if you listen to, I guess, cannabis podcasts

102
00:05:43,880 --> 00:05:44,720
or whatever, there are people

103
00:05:44,720 --> 00:05:47,480
that have these cannabis dark stores

104
00:05:48,440 --> 00:05:53,240
where they don't wanna bother with the retail aspect

105
00:05:53,240 --> 00:05:55,880
of the customer stepping into the store,

106
00:05:55,880 --> 00:06:00,240
having to deal with, you know, speaking to a budtender,

107
00:06:00,240 --> 00:06:01,200
which is fine.

108
00:06:01,200 --> 00:06:02,760
That's a whole education aspect

109
00:06:02,760 --> 00:06:04,880
that they're missing out on, fine.

110
00:06:04,880 --> 00:06:09,600
But it's just, they just wanna sell out online,

111
00:06:09,600 --> 00:06:12,680
pick up from the little store somewhere else.

112
00:06:12,680 --> 00:06:15,200
Or maybe they even just ship it.

113
00:06:15,200 --> 00:06:16,880
And then you're just done with it.

114
00:06:16,880 --> 00:06:20,000
Don't bother dealing with a grocery store or a restaurant,

115
00:06:20,000 --> 00:06:21,720
like, you know, how restaurants,

116
00:06:21,720 --> 00:06:24,320
they have their own dark stores, the same principle.

117
00:06:26,120 --> 00:06:27,000
Interesting.

118
00:06:27,000 --> 00:06:29,080
I'm going to have to do more research.

119
00:06:29,080 --> 00:06:34,080
I think this may be something that's unique to,

120
00:06:34,080 --> 00:06:37,360
well, I guess maybe you said present in multiple states,

121
00:06:37,360 --> 00:06:40,480
but I don't know the least about this.

122
00:06:40,480 --> 00:06:44,000
West Coast, Midwest and West, not here.

123
00:06:44,000 --> 00:06:46,520
Meaning I'm saying here, not East Coast.

124
00:06:46,520 --> 00:06:48,720
I don't think so, not East Coast.

125
00:06:48,720 --> 00:06:49,760
I don't think.

126
00:06:49,760 --> 00:06:52,400
Graham, do you have any light to shed on this?

127
00:06:52,400 --> 00:06:57,400
Yeah, but East Coast markets are much more structured.

128
00:06:58,920 --> 00:07:01,480
And that's what we've primarily been looking at

129
00:07:01,480 --> 00:07:05,040
is particularly structured markets.

130
00:07:05,040 --> 00:07:08,760
Probably one of the only West Coast markets

131
00:07:08,760 --> 00:07:12,400
that we've looked at is Washington.

132
00:07:14,120 --> 00:07:17,920
But there are some like Southwestern states

133
00:07:17,920 --> 00:07:21,960
where it's kind of like the Wild West.

134
00:07:21,960 --> 00:07:23,840
It's what we talked about before,

135
00:07:23,840 --> 00:07:28,840
how in Oklahoma, the quality of the product just went down.

136
00:07:28,840 --> 00:07:31,240
And I think that's what she's talking about

137
00:07:31,240 --> 00:07:34,560
when it's all screwed up around there.

138
00:07:35,720 --> 00:07:38,160
Politics has taken up everything.

139
00:07:38,160 --> 00:07:43,160
But if I had to guess an answer to Heather's question,

140
00:07:44,480 --> 00:07:49,480
we are not taking into account these dark stores

141
00:07:49,480 --> 00:07:52,480
because they are dark for a reason.

142
00:07:52,480 --> 00:07:54,440
They aren't reporting their data

143
00:07:54,440 --> 00:07:58,960
because they'll probably get fined for not being legal.

144
00:08:01,440 --> 00:08:02,280
Yes.

145
00:08:02,280 --> 00:08:03,440
In some aspects, right?

146
00:08:03,440 --> 00:08:05,360
Like, I'm just curious, I don't know.

147
00:08:05,360 --> 00:08:07,520
I just kind of brought this term in there.

148
00:08:07,520 --> 00:08:08,360
I don't know.

149
00:08:08,360 --> 00:08:09,200
I was just wondering.

150
00:08:09,200 --> 00:08:12,240
Yes, well, you bring up a good point because I,

151
00:08:12,240 --> 00:08:17,240
like I said, I observed a similar thing in California

152
00:08:17,240 --> 00:08:22,240
when I was there back in 2016 or so.

153
00:08:22,240 --> 00:08:27,240
So, what I was just going to say is there's a lot of variation

154
00:08:31,560 --> 00:08:34,960
and potential measurement error from state to state.

155
00:08:34,960 --> 00:08:38,280
So I would just kind of chalk that up as measurement error,

156
00:08:38,280 --> 00:08:42,120
which is a big deal and can introduce bias

157
00:08:42,120 --> 00:08:43,680
into your results.

158
00:08:43,680 --> 00:08:48,680
So long story short is we just need to just do more

159
00:08:48,680 --> 00:08:50,400
research on all these states.

160
00:08:50,400 --> 00:08:54,400
And that's why it's so beneficial for you to,

161
00:08:54,400 --> 00:08:58,680
let us know how things are operating in the state of Maryland,

162
00:08:58,680 --> 00:09:02,520
because you're like boots on the ground there.

163
00:09:02,520 --> 00:09:05,360
So, you can report firsthand.

164
00:09:06,880 --> 00:09:08,560
So that would exactly fall in,

165
00:09:08,560 --> 00:09:12,320
like Graham said, as in structure and performance.

166
00:09:12,320 --> 00:09:14,400
So how are these retailers structured

167
00:09:14,400 --> 00:09:16,760
and how are they doing their work?

168
00:09:16,760 --> 00:09:19,600
How are these retailers structured and how are they,

169
00:09:19,600 --> 00:09:20,600
well, actually that's conduct.

170
00:09:20,600 --> 00:09:22,400
How are they conducting themselves?

171
00:09:22,400 --> 00:09:27,400
So critical aspects that aren't captured by the data.

172
00:09:28,880 --> 00:09:31,720
So if we're just looking at the data alone,

173
00:09:31,720 --> 00:09:34,000
then we may mislead ourselves.

174
00:09:34,000 --> 00:09:35,200
So definitely a factor.

175
00:09:38,200 --> 00:09:40,760
I also like to say that you stumbled upon

176
00:09:40,760 --> 00:09:43,520
exactly what we'll be talking about today,

177
00:09:43,520 --> 00:09:47,360
which is quality in particular,

178
00:09:47,360 --> 00:09:49,240
West Coast versus East Coast.

179
00:09:49,240 --> 00:09:53,840
So we'll let Graham and Alice introduce themselves

180
00:09:53,840 --> 00:09:56,440
and then we can dive into that.

181
00:09:56,440 --> 00:09:57,880
So it'll be exciting.

182
00:09:57,880 --> 00:10:00,480
So Graham, do you have a quick introduction

183
00:10:00,480 --> 00:10:03,680
that you would like to give yourself to the group?

184
00:10:03,680 --> 00:10:07,120
Yep. Hey guys, I'm Graham.

185
00:10:07,120 --> 00:10:09,360
I live in Maryland as well.

186
00:10:09,360 --> 00:10:13,800
And I've been a cannabis user for over a decade,

187
00:10:13,800 --> 00:10:18,800
but in particular, I've realized its medical properties

188
00:10:18,960 --> 00:10:21,080
ever since I've been diagnosed

189
00:10:21,080 --> 00:10:23,720
with a recessive genetic disorder

190
00:10:23,720 --> 00:10:28,720
that causes a lot of side effects and has no cure.

191
00:10:29,840 --> 00:10:34,840
So cannabis is really the only way I can medicate for myself.

192
00:10:35,080 --> 00:10:37,280
So I have a lot of passion in that

193
00:10:37,280 --> 00:10:39,920
that's come about the past three years.

194
00:10:39,920 --> 00:10:42,880
And I am trained.

195
00:10:42,880 --> 00:10:46,120
I have a master's in applied mathematics

196
00:10:46,120 --> 00:10:51,120
and I was a junior data scientist working in aerospace.

197
00:10:53,000 --> 00:10:56,120
I worked particularly in satellite imaging.

198
00:10:58,720 --> 00:11:02,480
So I know we can see your faces from space.

199
00:11:02,480 --> 00:11:03,640
Yes.

200
00:11:03,640 --> 00:11:04,640
Graham is sharp.

201
00:11:04,640 --> 00:11:07,520
So Graham keeps me on my toes and makes sure

202
00:11:07,520 --> 00:11:09,440
that I'm doing things right.

203
00:11:09,440 --> 00:11:11,720
So if all of you want to come

204
00:11:11,720 --> 00:11:14,520
to the Saturday morning statistics,

205
00:11:14,520 --> 00:11:17,640
we'll be in for a good treat this coming Saturday.

206
00:11:17,640 --> 00:11:21,920
We'll relook at our forecast for 2022 and do them correctly

207
00:11:21,920 --> 00:11:25,200
as Graham noted that there are ways

208
00:11:25,200 --> 00:11:29,400
that we can improve our forecasts and not make rookie errors.

209
00:11:29,400 --> 00:11:32,760
So love having you aboard, Graham.

210
00:11:32,760 --> 00:11:37,760
You keep us sharp and you give us a real life story

211
00:11:40,160 --> 00:11:44,480
as to some of the value that can be had from cannabis.

212
00:11:45,320 --> 00:11:46,160
So thank you.

213
00:11:47,400 --> 00:11:51,440
Alice, happy to have you today, Alice.

214
00:11:51,440 --> 00:11:55,480
Would you mind introducing yourself if you want to the group?

215
00:11:55,480 --> 00:11:56,300
Oh yeah.

216
00:11:56,300 --> 00:11:57,140
Hi everyone.

217
00:11:57,140 --> 00:11:58,200
I'm Alice.

218
00:11:58,200 --> 00:12:00,080
I'm currently in New York

219
00:12:00,080 --> 00:12:05,080
and I'm pretty new to cannabis and analytics.

220
00:12:05,520 --> 00:12:07,920
I'm currently pursuing my master's degree

221
00:12:07,920 --> 00:12:09,740
in analytics here in New York.

222
00:12:10,760 --> 00:12:11,920
Nice to meet you all.

223
00:12:11,920 --> 00:12:13,800
Hope to see you more.

224
00:12:13,800 --> 00:12:14,640
Awesome.

225
00:12:14,640 --> 00:12:16,000
Awesome to have you, Alice.

226
00:12:16,000 --> 00:12:17,520
You'll have to keep us informed

227
00:12:17,520 --> 00:12:21,160
as New York rolls out their program.

228
00:12:22,140 --> 00:12:27,140
We'll be talking about one of your neighbors,

229
00:12:27,520 --> 00:12:29,120
Connecticut today.

230
00:12:29,120 --> 00:12:30,240
All right.

231
00:12:30,240 --> 00:12:33,920
So without further ado,

232
00:12:33,920 --> 00:12:38,760
I'll go ahead and share my screen

233
00:12:38,760 --> 00:12:43,060
and just sort of guide the talk for today.

234
00:12:43,060 --> 00:12:47,440
So normally the way it goes is I just kind of drone

235
00:12:47,440 --> 00:12:48,880
on and on and on.

236
00:12:48,880 --> 00:12:50,260
And so I don't mean to do that.

237
00:12:50,260 --> 00:12:53,860
So feel free to just chime in at any point

238
00:12:54,860 --> 00:12:58,300
with questions or thoughts or comments.

239
00:12:58,300 --> 00:13:03,300
And basically I'll present a little bit of recent work

240
00:13:03,320 --> 00:13:06,080
that I've done probably for about 30 minutes

241
00:13:06,080 --> 00:13:09,800
and then we can save 15 minutes at the end to talk about it.

242
00:13:09,800 --> 00:13:14,800
So, well, it's been a good year

243
00:13:21,900 --> 00:13:24,000
at the Cannabis Data Science Meetup Group.

244
00:13:24,000 --> 00:13:27,820
This will be the last meetup for the year, 2021.

245
00:13:27,820 --> 00:13:32,400
And so since we're sort of spearheading the movement

246
00:13:32,400 --> 00:13:35,920
in data science, I thought it'd be worthwhile

247
00:13:35,920 --> 00:13:40,920
to outline some principles that we've covered

248
00:13:41,200 --> 00:13:42,960
and are utilizing.

249
00:13:42,960 --> 00:13:47,440
And we can start to formalize principles

250
00:13:47,440 --> 00:13:51,820
for other data scientists that follow in our footsteps.

251
00:13:51,820 --> 00:13:53,520
And these aren't golden.

252
00:13:53,520 --> 00:13:56,400
This is just a starting point.

253
00:13:56,400 --> 00:13:59,520
So if you have principles to add,

254
00:13:59,520 --> 00:14:02,240
ways to improve these principles,

255
00:14:02,240 --> 00:14:04,560
or you think any of these need to be crossed out,

256
00:14:04,560 --> 00:14:08,400
then feel free to mention this.

257
00:14:09,880 --> 00:14:14,120
But basically, you see schools, universities

258
00:14:14,120 --> 00:14:16,400
creating data science programs.

259
00:14:16,400 --> 00:14:18,960
And well, it would be helpful to ask,

260
00:14:18,960 --> 00:14:22,200
what is data science and what are some of the principles?

261
00:14:23,080 --> 00:14:24,800
And so I think of data science

262
00:14:24,800 --> 00:14:28,920
as sort of a merging of a few fields.

263
00:14:28,920 --> 00:14:31,120
Of course, computer science,

264
00:14:31,120 --> 00:14:34,360
maybe sprinkle in a bit of economics and business.

265
00:14:34,360 --> 00:14:38,140
Of course, you have data visualization and statistics.

266
00:14:39,200 --> 00:14:44,200
So I pulled some principles from some of these fields.

267
00:14:44,560 --> 00:14:46,140
And we've used these,

268
00:14:46,140 --> 00:14:48,940
and I'll show you how we use them today.

269
00:14:49,840 --> 00:14:52,720
So right off the bat,

270
00:14:52,720 --> 00:14:55,440
the top three rules of programming,

271
00:14:55,440 --> 00:14:58,160
reuse, reuse, reuse.

272
00:14:59,200 --> 00:15:03,120
So once you've written code,

273
00:15:03,120 --> 00:15:08,120
it's a very, very low marginal cost

274
00:15:08,280 --> 00:15:11,420
to copy and paste that code somewhere else.

275
00:15:11,420 --> 00:15:15,820
So as you've seen throughout the year,

276
00:15:15,820 --> 00:15:19,240
we've used various snippets of code

277
00:15:19,240 --> 00:15:21,400
over and over and over again.

278
00:15:21,400 --> 00:15:24,840
And that's the beauty of computer science.

279
00:15:24,840 --> 00:15:29,840
And so I encourage all of you to pull out any snippets

280
00:15:30,360 --> 00:15:33,720
that you find useful and reuse, reuse,

281
00:15:33,720 --> 00:15:36,560
and reuse them as much as you please.

282
00:15:38,340 --> 00:15:43,340
And that brings us nicely to another principle, refactor.

283
00:15:43,520 --> 00:15:46,920
So that just means essentially clean up your code

284
00:15:46,920 --> 00:15:51,920
and tidy it up, make it more readable, more maintainable.

285
00:15:54,620 --> 00:15:57,240
So this is something that you can always be doing.

286
00:15:57,240 --> 00:15:59,680
So you can look at the code

287
00:15:59,680 --> 00:16:01,680
that we've written throughout the year,

288
00:16:01,680 --> 00:16:05,480
and almost all of it needs a good refactor.

289
00:16:05,480 --> 00:16:09,340
And as you're refactoring it, you can build upon it,

290
00:16:09,340 --> 00:16:11,260
then you can iterate.

291
00:16:11,260 --> 00:16:15,280
So iteration can mean a lot of things

292
00:16:15,280 --> 00:16:20,240
that can mean applying something multiple times

293
00:16:20,240 --> 00:16:24,440
or using a piece of software over and over and over again.

294
00:16:24,440 --> 00:16:26,960
So for example, we're sort of iterating

295
00:16:26,960 --> 00:16:29,320
with our forecasting software.

296
00:16:30,320 --> 00:16:33,920
So maybe didn't do the best job at this principle,

297
00:16:33,920 --> 00:16:37,160
but refactoring is critical.

298
00:16:37,160 --> 00:16:38,720
So I wanted to include it.

299
00:16:41,140 --> 00:16:44,440
I guess from my perspective with iterate,

300
00:16:44,440 --> 00:16:47,080
you never run code just once.

301
00:16:47,080 --> 00:16:49,680
You have to run it over and over to get it better.

302
00:16:52,000 --> 00:16:53,120
Exactly.

303
00:16:53,120 --> 00:16:56,160
And that's one of the reasons why I love Python

304
00:16:56,160 --> 00:17:01,160
is it's so quick from write time to run time.

305
00:17:05,080 --> 00:17:06,800
Depends on how you define quick,

306
00:17:06,800 --> 00:17:10,880
but for me it's simple to write Python code

307
00:17:10,880 --> 00:17:14,760
and then get it running and then find your errors,

308
00:17:14,760 --> 00:17:17,960
fix your errors and run it again.

309
00:17:17,960 --> 00:17:20,120
So thank you for that, Graham.

310
00:17:23,280 --> 00:17:27,520
Countless principles from economics that could be used.

311
00:17:28,380 --> 00:17:31,920
However, economics I've heard is basically

312
00:17:31,920 --> 00:17:34,280
the study of choices.

313
00:17:34,280 --> 00:17:37,480
And we've seen over and over again

314
00:17:37,480 --> 00:17:41,360
that the choices that we make matter.

315
00:17:41,360 --> 00:17:46,360
So that could be anything from what we choose to look at,

316
00:17:47,080 --> 00:17:49,440
how we choose to look at it,

317
00:17:49,440 --> 00:17:54,440
or even how people in the industry make choices.

318
00:17:58,480 --> 00:18:02,920
So I thought that was a good way to sum up economics

319
00:18:02,920 --> 00:18:04,320
into a small little nugget.

320
00:18:04,320 --> 00:18:08,160
There was a principle that an entrepreneur taught me

321
00:18:08,160 --> 00:18:13,160
that kind of goes back to principle one, reuse,

322
00:18:13,400 --> 00:18:17,000
which is take stock of what you already have.

323
00:18:17,000 --> 00:18:22,000
So this applies nicely to reuse.

324
00:18:22,000 --> 00:18:24,960
And you'll see today that it's just,

325
00:18:24,960 --> 00:18:29,400
take stock of this repository that we've created

326
00:18:29,400 --> 00:18:31,520
at the Cannabis Data Science Group.

327
00:18:31,520 --> 00:18:35,360
And figure out which pieces that you can reuse.

328
00:18:36,360 --> 00:18:40,360
So once again, just sort of a starting point.

329
00:18:40,360 --> 00:18:42,600
So this principle could almost be condensed

330
00:18:42,600 --> 00:18:43,960
into principle one.

331
00:18:45,640 --> 00:18:50,640
And then finally, one of the most important principles

332
00:18:50,800 --> 00:18:53,000
that I learned from Edward Tufte,

333
00:18:53,880 --> 00:18:57,360
who published his book on data visualization,

334
00:18:57,360 --> 00:18:59,160
I think in 1983.

335
00:18:59,160 --> 00:19:02,440
So he's been at it for a long time now.

336
00:19:02,440 --> 00:19:05,840
And I'm sure if you've been coming to the group for a while,

337
00:19:05,840 --> 00:19:08,240
you've heard me promote his work.

338
00:19:08,240 --> 00:19:10,440
But for those of you who are new,

339
00:19:10,440 --> 00:19:13,200
I'd highly recommend checking out Edward Tufte.

340
00:19:14,120 --> 00:19:17,360
His books have been a great influence on my work.

341
00:19:18,440 --> 00:19:23,440
And so I would sum up all of his brilliant contributions

342
00:19:24,560 --> 00:19:27,680
as show the data visualization,

343
00:19:27,680 --> 00:19:30,160
show the data.

344
00:19:30,160 --> 00:19:34,720
So first and foremost, you've got all this data,

345
00:19:34,720 --> 00:19:37,040
you need to show it for a couple of reasons.

346
00:19:37,040 --> 00:19:39,640
One, to analyze it.

347
00:19:40,520 --> 00:19:43,640
Two, to inform other people.

348
00:19:43,640 --> 00:19:47,880
Three, whether you like it or not, to persuade other people.

349
00:19:47,880 --> 00:19:50,120
So a good visualization,

350
00:19:50,120 --> 00:19:54,480
it goes a long way toward helping explain a concept.

351
00:19:54,480 --> 00:19:58,480
So these are the main principles I've laid out.

352
00:19:58,480 --> 00:20:03,480
And then I'll return to the presentation here in a bit.

353
00:20:03,560 --> 00:20:06,280
But without further ado,

354
00:20:06,280 --> 00:20:09,200
let's go ahead and get to the data here.

355
00:20:11,000 --> 00:20:16,000
So gonna use a couple of different data sources here.

356
00:20:16,000 --> 00:20:21,000
So you can hunt down this data

357
00:20:21,000 --> 00:20:26,000
that was produced by a Freedom of Information Act request

358
00:20:29,560 --> 00:20:31,920
by Cannabis Observer.

359
00:20:31,920 --> 00:20:35,280
So they do diligent work in Washington State.

360
00:20:35,280 --> 00:20:38,840
And so Washington State has a very generous

361
00:20:38,840 --> 00:20:40,640
freedom of information.

362
00:20:42,240 --> 00:20:46,440
And so you can actually get the entire

363
00:20:46,440 --> 00:20:51,440
database of traceability data.

364
00:20:51,440 --> 00:20:56,440
So be warned, some of these are quite large.

365
00:20:56,440 --> 00:21:00,440
So for example, just the inventory's data,

366
00:21:01,440 --> 00:21:06,440
if you unzip it, that's 30 gigabytes of data.

367
00:21:06,800 --> 00:21:11,800
So there's almost 40 gigabytes unzipped

368
00:21:11,800 --> 00:21:16,800
so we're looking at large-scale data here.

369
00:21:20,360 --> 00:21:23,600
And so this is probably,

370
00:21:25,000 --> 00:21:27,800
maybe not what I would call big data.

371
00:21:29,600 --> 00:21:32,120
People that work at large retailers,

372
00:21:32,120 --> 00:21:34,680
they deal with much larger data sets,

373
00:21:34,680 --> 00:21:37,160
but this is a large enough data set

374
00:21:37,160 --> 00:21:41,400
to be able to do a large scale analysis

375
00:21:41,400 --> 00:21:44,640
of a large enough data set that it's tough to wrangle.

376
00:21:46,680 --> 00:21:49,320
We've been so focused on sales

377
00:21:49,320 --> 00:21:52,040
that we can come back to sales later.

378
00:21:52,040 --> 00:21:56,120
So today, when we looked at this earlier in the year,

379
00:21:56,120 --> 00:21:59,920
so apologize if this is a bit of a review for some of us,

380
00:21:59,920 --> 00:22:03,520
but this is where we go back to reuse,

381
00:22:03,520 --> 00:22:07,360
reuse, refactor, build upon, iterate.

382
00:22:07,360 --> 00:22:12,360
So we'll be looking at lab results.

383
00:22:13,080 --> 00:22:15,880
Where did this come about?

384
00:22:15,880 --> 00:22:20,880
Well, I opened up the cannabis data science repository

385
00:22:21,680 --> 00:22:26,680
and I just did a search for cannabinoids.

386
00:22:29,000 --> 00:22:34,000
And I started to see all of the times

387
00:22:34,000 --> 00:22:37,560
that we've talked about cannabinoids this year.

388
00:22:37,560 --> 00:22:42,560
So, we started off the year in February

389
00:22:44,840 --> 00:22:48,840
talking about these data points.

390
00:22:48,840 --> 00:22:51,720
We've touched on them here and there.

391
00:22:51,720 --> 00:22:53,360
And so I thought it would be fitting

392
00:22:53,360 --> 00:22:57,640
that we end the year talking about these lab results.

393
00:22:57,640 --> 00:23:02,640
And long story short, I've copied some of this code

394
00:23:02,640 --> 00:23:07,640
and I realized, oh, you know, we also talked about

395
00:23:08,880 --> 00:23:13,880
cannabinoids when we were researching Connecticut.

396
00:23:14,960 --> 00:23:19,960
So I thought today, why don't we look at

397
00:23:20,440 --> 00:23:24,240
both Washington and Connecticut?

398
00:23:24,240 --> 00:23:27,840
So these are the two states that I know of

399
00:23:27,840 --> 00:23:31,040
that we can get cannabinoid data from.

400
00:23:31,040 --> 00:23:34,240
And so I thought, why not let's analyze them

401
00:23:34,240 --> 00:23:37,440
and compare them because we basically have

402
00:23:37,440 --> 00:23:40,520
one West Coast state, Washington,

403
00:23:40,520 --> 00:23:43,080
and then an East Coast state, Connecticut.

404
00:23:43,080 --> 00:23:46,000
So we could start to see, you know,

405
00:23:46,000 --> 00:23:47,600
if there's a difference between

406
00:23:47,600 --> 00:23:50,280
East Coast and West Coast cannabis.

407
00:23:50,280 --> 00:23:53,720
So kind of what we were talking about earlier,

408
00:23:53,720 --> 00:23:55,720
but you know, I thought it would be fun.

409
00:23:55,720 --> 00:24:00,720
So right off the bat, we're reusing,

410
00:24:01,760 --> 00:24:05,800
we're refactoring, we're building upon,

411
00:24:05,800 --> 00:24:07,600
and we're iterating.

412
00:24:07,600 --> 00:24:12,600
And then we've also taken stock of what we already have.

413
00:24:12,680 --> 00:24:17,680
So now I'll tell you at the end how our choices matter,

414
00:24:18,080 --> 00:24:20,480
but for now, let's show the data.

415
00:24:20,480 --> 00:24:25,480
So first things first, I needed to,

416
00:24:25,800 --> 00:24:28,400
or we needed to read in the data.

417
00:24:28,400 --> 00:24:31,000
And so this doesn't take very long,

418
00:24:31,000 --> 00:24:33,760
maybe 30 seconds to a minute,

419
00:24:33,760 --> 00:24:37,760
but I've already read in the data just for,

420
00:24:38,720 --> 00:24:41,440
for time sake here, you know,

421
00:24:41,440 --> 00:24:44,560
just to save us a minute or so.

422
00:24:44,560 --> 00:24:47,000
So let's go ahead and do that.

423
00:24:47,000 --> 00:24:50,640
You know, just to save us a minute or so.

424
00:24:50,640 --> 00:24:55,640
Well, I probably shouldn't have just, okay.

425
00:24:58,280 --> 00:25:00,320
Maybe I can keyboard interrupt.

426
00:25:00,320 --> 00:25:02,800
Okay, the long story short,

427
00:25:04,320 --> 00:25:08,840
we have many observations here just from Washington.

428
00:25:08,840 --> 00:25:11,800
So just these lab results

429
00:25:11,800 --> 00:25:16,800
are almost two million observations.

430
00:25:17,960 --> 00:25:22,960
So we're starting to get into the big data realm.

431
00:25:22,960 --> 00:25:24,880
Once you get into around, you know,

432
00:25:24,880 --> 00:25:27,080
the millions of observations,

433
00:25:27,080 --> 00:25:29,680
you're in, you're in big data world.

434
00:25:29,680 --> 00:25:32,200
So awesome, we're there.

435
00:25:33,960 --> 00:25:36,720
Added a time column.

436
00:25:36,720 --> 00:25:39,040
So let's see if we can't,

437
00:25:39,040 --> 00:25:41,960
so let's see if we can't,

438
00:25:43,960 --> 00:25:46,720
I realized that some of these operations

439
00:25:48,080 --> 00:25:50,400
may take a long time since we have,

440
00:25:50,400 --> 00:25:52,560
you know, two million observations.

441
00:25:57,320 --> 00:26:00,520
Computers are brilliantly smart

442
00:26:01,560 --> 00:26:03,400
doing these number crunches.

443
00:26:03,400 --> 00:26:07,240
So long story short, we've got data ranging from

444
00:26:07,240 --> 00:26:12,240
the beginning of 2018 through November.

445
00:26:12,320 --> 00:26:17,320
So we don't have quite the full year of 2021.

446
00:26:17,320 --> 00:26:19,640
We'll do... I'm sure the Cannabis Observer

447
00:26:19,640 --> 00:26:23,360
will get us a complete data set here before too long.

448
00:26:23,360 --> 00:26:26,600
But for now, we've got a lot of data to work with.

449
00:26:28,440 --> 00:26:31,280
So just for today,

450
00:26:31,280 --> 00:26:35,160
we're just going to be looking at cannabinoid data

451
00:26:35,160 --> 00:26:38,160
and I figured, well,

452
00:26:38,160 --> 00:26:41,160
we could start just looking at differences

453
00:26:41,160 --> 00:26:42,840
from year to year.

454
00:26:42,840 --> 00:26:47,840
So here, I just separated the data into 2020 and 2021.

455
00:26:50,760 --> 00:26:54,000
And this is one of my favorite things to do with data

456
00:26:54,000 --> 00:26:58,280
is find conditional, say averages,

457
00:26:58,280 --> 00:27:00,520
or conditional statistics.

458
00:27:00,520 --> 00:27:02,720
So I'm going to go ahead and do that.

459
00:27:02,720 --> 00:27:05,880
So we're going to find data averages

460
00:27:05,880 --> 00:27:08,360
or conditional statistics.

461
00:27:08,360 --> 00:27:10,320
And so this is exactly what we're doing here.

462
00:27:10,320 --> 00:27:14,840
We're basically going to find data conditional

463
00:27:14,840 --> 00:27:18,840
on the year 2020 or 2021.

464
00:27:18,840 --> 00:27:22,520
And then we're also going to add another condition,

465
00:27:22,520 --> 00:27:26,320
whether the data is flower data

466
00:27:26,320 --> 00:27:30,240
or whether the data is a concentrate.
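
The conditional averages described here can be sketched in plain Python. This is a minimal illustration, not the code shown in the session: the rows, the field order, and the function name `conditional_means` are all made-up assumptions.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical lab results: (year, product type, THCA percent).
results = [
    (2020, "flower", 21.0),
    (2020, "flower", 23.0),
    (2020, "concentrate", 74.0),
    (2021, "flower", 22.0),
    (2021, "flower", 24.0),
    (2021, "concentrate", 78.0),
]

def conditional_means(rows):
    """Average THCA for each (year, product type) pair."""
    groups = defaultdict(list)
    for year, product_type, thca in rows:
        groups[(year, product_type)].append(thca)
    return {key: mean(values) for key, values in groups.items()}

means = conditional_means(results)
for (year, product_type), value in sorted(means.items()):
    print(year, product_type, round(value, 2))
```

Grouping on the pair (year, product type) is what makes the statistic conditional on both criteria at once.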

467
00:27:30,240 --> 00:27:35,240
And I have added a link to the data guide

468
00:27:37,160 --> 00:27:40,160
so that way you can see

469
00:27:40,160 --> 00:27:45,160
where I'm pulling all of these fields from.

470
00:27:46,120 --> 00:27:48,320
So long story short,

471
00:27:49,560 --> 00:27:52,600
software is always the most boring part.

472
00:27:52,600 --> 00:27:56,440
So I'll let you just kind of pick through this code

473
00:27:56,440 --> 00:27:59,280
when you have a chance

474
00:27:59,280 --> 00:28:03,720
and we'll get onto the fun part, looking at the data.

475
00:28:03,720 --> 00:28:05,480
So first things first,

476
00:28:06,600 --> 00:28:10,120
one of the principal compounds people are interested in

477
00:28:10,120 --> 00:28:12,440
is THCA.

478
00:28:12,440 --> 00:28:17,440
So this is essentially the intoxicating compound

479
00:28:17,440 --> 00:28:18,880
found in cannabis.

480
00:28:18,880 --> 00:28:23,880
And so here I just plotted the distribution of THCA

481
00:28:25,320 --> 00:28:28,880
in all cannabis products in Washington state.

482
00:28:30,640 --> 00:28:35,640
I realize now that I actually like this setting.

483
00:28:36,680 --> 00:28:39,280
So when I think of a histogram,

484
00:28:39,280 --> 00:28:42,080
I think of it in terms of density.

485
00:28:43,120 --> 00:28:46,080
So let me just add this parameter real quick.

486
00:28:46,080 --> 00:28:51,080
Okay.

487
00:28:51,080 --> 00:28:54,080
So basically before we were plotting

488
00:28:54,080 --> 00:28:58,080
in terms of frequency on the Y axis,

489
00:28:58,080 --> 00:29:01,080
but I kind of typically think of a histogram

490
00:29:01,080 --> 00:29:04,080
as having density.

491
00:29:04,080 --> 00:29:09,080
So the proportion of samples that have a given X value

492
00:29:09,080 --> 00:29:10,080
on the Y axis.
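
The difference between a frequency histogram and a density histogram can be sketched as follows. This is an illustration under assumptions: the sample values and the helper `histogram` are hypothetical, and it uses the common density normalization in which each count is divided by the sample size times the bin width, so the bar areas sum to 1.

```python
def histogram(values, bins, low, high, density=False):
    """Count values into equal-width bins over [low, high].

    With density=True, each count is divided by (n * bin_width),
    so the bar areas sum to 1.
    """
    width = (high - low) / bins
    counts = [0] * bins
    for value in values:
        if low <= value <= high:
            # Values on the right edge go into the last bin.
            index = min(int((value - low) / width), bins - 1)
            counts[index] += 1
    if not density:
        return counts
    total = sum(counts)
    return [count / (total * width) for count in counts]

# Hypothetical THCA percentages for one year.
thca = [12.0, 18.5, 19.0, 21.0, 22.5, 23.0, 24.5, 26.0, 28.0, 33.0]
freq = histogram(thca, bins=5, low=10, high=35)
dens = histogram(thca, bins=5, low=10, high=35, density=True)
print(freq)
print(dens)
```

Because density does not depend on the number of samples, two years with different numbers of lab results can be overlaid on the same axis.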

493
00:29:12,080 --> 00:29:13,080
So,

494
00:29:13,080 --> 00:29:15,080
I love histograms.

495
00:29:15,080 --> 00:29:20,080
So as you can see, they're fairly overlapping.

496
00:29:20,080 --> 00:29:24,080
There are tests, statistical tests that you can do

497
00:29:24,080 --> 00:29:29,080
to tell if, or estimate if, you know,

498
00:29:29,080 --> 00:29:32,080
these distributions are different.

499
00:29:32,080 --> 00:29:34,080
I won't get into that today,

500
00:29:34,080 --> 00:29:35,080
but if you're interested,

501
00:29:35,080 --> 00:29:38,080
that's a good topic for Saturday morning statistics.

502
00:29:38,080 --> 00:29:43,080
So we've got THCA

503
00:29:44,080 --> 00:29:45,080
or,

504
00:29:49,080 --> 00:29:51,080
what was this next plot?

505
00:29:54,080 --> 00:29:57,080
Ah, yes.

506
00:29:57,080 --> 00:30:02,080
So now instead of just looking at the data,

507
00:30:02,080 --> 00:30:04,080
I'm going to look at the data.

508
00:30:04,080 --> 00:30:09,080
Yes, so now instead of just looking at cannabis,

509
00:30:10,080 --> 00:30:14,080
just all varieties, we'll just look at flower.

510
00:30:15,080 --> 00:30:20,080
So flower in Washington state, as you can see,

511
00:30:20,080 --> 00:30:24,080
has a distribution between, you know,

512
00:30:24,080 --> 00:30:29,080
around, I can't see if this is like 10 or 15%,

513
00:30:29,080 --> 00:30:33,080
to around 30, 35% or so.

514
00:30:33,080 --> 00:30:37,080
So this is the distribution of THCA

515
00:30:37,080 --> 00:30:40,080
that you observe in Washington.

516
00:30:40,080 --> 00:30:44,080
And you actually don't really see that much of a difference

517
00:30:44,080 --> 00:30:46,080
from 2020 to 2021.

518
00:30:47,080 --> 00:30:48,080
So,

519
00:30:52,080 --> 00:30:55,080
you can draw what inferences you want from this.

520
00:30:55,080 --> 00:30:58,080
The one inference that I'm kind of thinking is

521
00:30:58,080 --> 00:31:02,080
maybe, you know, growers in Washington state

522
00:31:02,080 --> 00:31:05,080
have kind of perfected their craft.

523
00:31:05,080 --> 00:31:10,080
So they maybe have figured out how to grow cannabis

524
00:31:10,080 --> 00:31:13,080
and, oh, Graham, question.

525
00:31:14,080 --> 00:31:16,080
Yeah, can you go back to that plot?

526
00:31:17,080 --> 00:31:21,080
Because like the mathematician in me is kind of

527
00:31:21,080 --> 00:31:25,080
just seeing on that graph exactly what you're saying

528
00:31:25,080 --> 00:31:30,080
in comparison with the blue and the pink.

529
00:31:30,080 --> 00:31:32,080
Pink is the later years.

530
00:31:33,080 --> 00:31:37,080
The spike is much sharper.

531
00:31:37,080 --> 00:31:42,080
The silhouette width of that Gaussian distribution

532
00:31:42,080 --> 00:31:45,080
is just ever so thinner.

533
00:31:45,080 --> 00:31:49,080
And you can only see it with the higher spikes in orange

534
00:31:49,080 --> 00:31:52,080
in like the top four bars.

535
00:31:52,080 --> 00:31:56,080
But you can also, if you zoom in, I'm assuming,

536
00:31:56,080 --> 00:32:00,080
there's bottom areas in like the 10 and 20 and all that stuff.

537
00:32:00,080 --> 00:32:02,080
That's primarily blue.

538
00:32:03,080 --> 00:32:08,080
And I guess it's just basically saying,

539
00:32:09,080 --> 00:32:13,080
you're right, I think they're getting better at their craft,

540
00:32:13,080 --> 00:32:15,080
but not significantly.

541
00:32:16,080 --> 00:32:17,080
Exactly.

542
00:32:17,080 --> 00:32:21,080
And like I said, Graham keeps it sharp.

543
00:32:21,080 --> 00:32:23,080
So, exactly.

544
00:32:23,080 --> 00:32:28,080
And so there is a difference between these distributions.

545
00:32:28,080 --> 00:32:34,080
So the blue one is, as Graham said,

546
00:32:34,080 --> 00:32:39,080
it doesn't have quite the same normal distribution as 2021 does.

547
00:32:39,080 --> 00:32:43,080
It has a bit sharper of a...

548
00:32:44,080 --> 00:32:47,080
It's got lower variance, essentially.

549
00:32:47,080 --> 00:32:52,080
Well, I'll have to think about this,

550
00:32:52,080 --> 00:32:58,080
but you do raise an interesting point that the average...

551
00:32:58,080 --> 00:33:03,080
Actually, I guess we could technically calculate the averages real quick.

552
00:33:05,080 --> 00:33:10,080
So let's actually do that because...

553
00:33:10,080 --> 00:33:19,080
So in 2020, the mean was 22% and then in 2021...

554
00:33:19,080 --> 00:33:22,080
So this is what's so interesting.

555
00:33:22,080 --> 00:33:25,080
And we talked about this once in Saturday Morning Statistics is

556
00:33:25,080 --> 00:33:30,080
just because the mean is the same doesn't necessarily mean

557
00:33:30,080 --> 00:33:32,080
all the moments are the same.

558
00:33:32,080 --> 00:33:37,080
And when I say moments, that could be variance,

559
00:33:37,080 --> 00:33:40,080
kurtosis, and skew.

560
00:33:40,080 --> 00:33:45,080
So there's multiple ways to characterize a distribution.
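
The point that two samples can share a mean while differing in other moments can be checked with a short sketch. The two samples below are hypothetical, and the function `moments` uses population (divide by n) formulas, which is one of several conventions.

```python
from statistics import fmean

def moments(values):
    """Return mean, variance, skewness, and excess kurtosis (population forms)."""
    n = len(values)
    mu = fmean(values)
    m2 = sum((x - mu) ** 2 for x in values) / n
    m3 = sum((x - mu) ** 3 for x in values) / n
    m4 = sum((x - mu) ** 4 for x in values) / n
    return {
        "mean": mu,
        "variance": m2,
        "skew": m3 / m2 ** 1.5,
        "kurtosis": m4 / m2 ** 2 - 3.0,
    }

# Two hypothetical samples with the same mean but different spread.
narrow = [20.0, 21.0, 22.0, 23.0, 24.0]
wide = [14.0, 18.0, 22.0, 26.0, 30.0]

print(moments(narrow))
print(moments(wide))
```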

561
00:33:45,080 --> 00:33:50,080
And so we could further look at this

562
00:33:50,080 --> 00:33:54,080
and really kind of get nitpicky about how things have changed.

563
00:33:54,080 --> 00:33:58,080
So I love the sharp eye, Graham.

564
00:33:58,080 --> 00:33:59,080
So...

565
00:33:59,080 --> 00:34:00,080
Good work.

566
00:34:00,080 --> 00:34:01,080
What am I hearing?

567
00:34:02,080 --> 00:34:03,080
Thank you.

568
00:34:03,080 --> 00:34:04,080
So...

569
00:34:04,080 --> 00:34:10,080
So that's definitely, you know, something to take note of.

570
00:34:11,080 --> 00:34:16,080
So just to go ahead and look at some of these other distributions here,

571
00:34:16,080 --> 00:34:19,080
let's actually crank up the bins.

572
00:34:19,080 --> 00:34:21,080
So this is sort of the number of groupings.

573
00:34:21,080 --> 00:34:24,080
So let's crank that up to 100 and look at...

574
00:34:25,080 --> 00:34:28,080
Let's look at concentrates.

575
00:34:28,080 --> 00:34:32,080
Or let's actually crank this back down.

576
00:34:32,080 --> 00:34:38,080
So concentrates, I think, is interesting too,

577
00:34:38,080 --> 00:34:45,080
where on first glance, the distributions look similar.

578
00:34:45,080 --> 00:34:51,080
But to me, it looks like 2021 is slightly different.

579
00:34:51,080 --> 00:34:53,080
So I think that's a good point.

580
00:34:53,080 --> 00:35:00,080
Sure. But to me, it looks like 2021 is slightly...

581
00:35:00,080 --> 00:35:03,080
may have a slightly higher mean.

582
00:35:03,080 --> 00:35:06,080
So we can calculate that.

583
00:35:07,080 --> 00:35:10,080
Well, some of them may have...

584
00:35:13,080 --> 00:35:18,080
Well, 2021 may have sort of a heavy tail towards the left.

585
00:35:18,080 --> 00:35:20,080
Yeah, use the median.

586
00:35:20,080 --> 00:35:22,080
I wonder...

587
00:35:22,080 --> 00:35:28,080
The tail on the left is due to CBD concentrates.

588
00:35:33,080 --> 00:35:39,080
You may have to exclude the CBD concentrates somehow.

589
00:35:43,080 --> 00:35:44,080
But...

590
00:35:45,080 --> 00:35:46,080
Well...

591
00:35:46,080 --> 00:35:49,080
Just real quick, let's just look at...

592
00:35:53,080 --> 00:35:55,080
this data...

593
00:35:56,080 --> 00:36:02,080
where this is greater than 50%.

594
00:36:08,080 --> 00:36:11,080
So this is sort of an ad hoc way to do this.

595
00:36:11,080 --> 00:36:19,080
But you see, if we just look at concentrates at the top end of the distribution here,

596
00:36:19,080 --> 00:36:22,080
above 50% THCA,

597
00:36:22,080 --> 00:36:28,080
then 2021 has a slightly higher mean,

598
00:36:29,080 --> 00:36:33,080
76.1 to 75.8.

599
00:36:33,080 --> 00:36:39,080
Once again, you can do statistical tests to see if that is statistically different.
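
One such test could be sketched like this. The session does not name a specific test, so Welch's t statistic is swapped in here as an assumption, and the numbers are hypothetical rather than the Washington data.

```python
from math import sqrt
from statistics import mean, variance

def welch_t(a, b):
    """Welch's t statistic for two samples with possibly unequal variances."""
    standard_error = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / standard_error

# Hypothetical concentrate THCA percentages, including low-THCA CBD products.
thca_2020 = [3.0, 71.0, 74.0, 76.0, 79.0]
thca_2021 = [2.0, 5.0, 73.0, 75.0, 77.0, 80.0]

# Ad hoc filter from the session: keep only results above 50% THCA.
high_2020 = [x for x in thca_2020 if x > 50]
high_2021 = [x for x in thca_2021 if x > 50]

t_stat = welch_t(high_2021, high_2020)
print(round(mean(high_2020), 2), round(mean(high_2021), 2), round(t_stat, 2))
```

The statistic alone is not a verdict. Turning it into a p-value requires the t distribution, which a statistics library would supply.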

600
00:36:39,080 --> 00:36:41,080
But...

601
00:36:42,080 --> 00:36:51,080
basically, the inference I would take from this is it looks to me like the processors, the manufacturers,

602
00:36:51,080 --> 00:36:57,080
creating THCA concentrates may be getting slightly better at it.

603
00:36:57,080 --> 00:37:01,080
So they may be sort of refining their technique.

604
00:37:03,080 --> 00:37:07,080
But once again, that's sort of a big inference to make

605
00:37:07,080 --> 00:37:13,080
from this little bit of data and visualizations that we've done.

606
00:37:14,080 --> 00:37:15,080
Question?

607
00:37:16,080 --> 00:37:18,080
Yeah, I just had one question.

608
00:37:19,080 --> 00:37:23,080
Kind of, this is maybe like a shot in the dark here.

609
00:37:23,080 --> 00:37:27,080
But when it comes to like molecules and stuff, like they have half-lives,

610
00:37:27,080 --> 00:37:29,080
kind of like they kind of split down.

611
00:37:29,080 --> 00:37:35,080
Is there any chance that we would see, whether in the future or previously in the past,

612
00:37:35,080 --> 00:37:41,080
where they're a lot, not their half-lives, but the word I'm looking for.

613
00:37:42,080 --> 00:37:45,080
You're looking for shelf life stability.

614
00:37:45,080 --> 00:37:46,080
Yeah.

615
00:37:47,080 --> 00:37:50,080
So that's a critical factor.

616
00:37:50,080 --> 00:37:57,080
So this is something that, of course, the laboratories are taking into consideration,

617
00:37:57,080 --> 00:38:03,080
but a lot of producers and manufacturers, they may not even know about this.

618
00:38:03,080 --> 00:38:06,080
Or if they do, they may, it's tough to factor it in.

619
00:38:06,080 --> 00:38:17,080
Long story short, there's been studies done on essentially the degradation of cannabinoids at various temperatures.

620
00:38:17,080 --> 00:38:25,080
And so you actually do see degradation of cannabinoids at room temperature over time.

621
00:38:25,080 --> 00:38:30,080
So I'll have to refer you to the papers.

622
00:38:30,080 --> 00:38:37,080
But you know, if you keep cannabis at room temperature for months on end,

623
00:38:37,080 --> 00:38:42,080
you will see diminishing in the cannabinoids.

624
00:38:42,080 --> 00:38:46,080
If you keep the cannabis refrigerated or frozen,

625
00:38:46,080 --> 00:38:54,080
you have substantially less degradation to the point where it's almost negligible.

626
00:38:54,080 --> 00:39:03,080
So that's why laboratories really stress to keep samples refrigerated, if not frozen.

627
00:39:03,080 --> 00:39:07,080
So if you're running a lab, that's a critical step.

628
00:39:07,080 --> 00:39:19,080
For example, in Missouri, they have to hold the samples, I think, for 60 to 90 days in case you have to do a confirmation study.

629
00:39:19,080 --> 00:39:24,080
If you're doing that, you want to keep the cannabis refrigerated.

630
00:39:24,080 --> 00:39:31,080
So that way, when you measure it again, say in a couple of months from now, you'll still get the same measurement.

631
00:39:31,080 --> 00:39:41,080
Long story short, as to your question, the data we're looking at doesn't really take this into consideration.

632
00:39:41,080 --> 00:39:46,080
The values are assigned when the sample is tested.

633
00:39:46,080 --> 00:39:54,080
And, like you said, that may not actually be the value when it is sold.

634
00:39:54,080 --> 00:40:00,080
So say if something's tested and it's sold three months later,

635
00:40:00,080 --> 00:40:14,080
if it's been sitting at room temperature, then you could likely assume that it's going to have slightly lower cannabinoids, maybe not even slightly.

636
00:40:14,080 --> 00:40:18,080
So this is actually something that needs a lot more work.

637
00:40:18,080 --> 00:40:26,080
So I think there's only been like, you know, cursory research done on this at laboratories and controlled settings.

638
00:40:26,080 --> 00:40:34,080
I think you're brilliant because I think this is an avenue for really good research,

639
00:40:34,080 --> 00:40:41,080
because this is something that essentially people are concerned about is it's a big concern to say,

640
00:40:41,080 --> 00:40:45,080
to have products on the shelves that may be misleading.

641
00:40:45,080 --> 00:40:52,080
And, you know, it's one thing for it to be intentional, but it could be entirely unintentional.

642
00:40:52,080 --> 00:41:05,080
So if you just didn't know better and you were just keeping your products at room temperature, they may not be as potent as the label says.

643
00:41:05,080 --> 00:41:07,080
Thank you.

644
00:41:07,080 --> 00:41:15,080
So long answer, but awesome question. So important question.

645
00:41:15,080 --> 00:41:27,080
So on to cannabinoids. Well, this is just the distribution of total cannabinoids in Washington flower.

646
00:41:27,080 --> 00:41:37,080
And actually, sure enough, if you look at total cannabinoids here and let's look at density,

647
00:41:37,080 --> 00:41:47,080
then once again, it looks like 2021 has a little bit sharper distribution there.

648
00:41:47,080 --> 00:41:55,080
More variance, it looks like, in 2020.

649
00:41:55,080 --> 00:42:03,080
Well, that plot, well, actually, the prior plot was misleading because we were looking at frequency.

650
00:42:03,080 --> 00:42:12,080
If we look at density, the density is, you know, it's quite similar.

651
00:42:12,080 --> 00:42:15,080
And once again, that's a subjective statement.

652
00:42:15,080 --> 00:42:28,080
You can try to get a bit more objective with it with actually calculating statistics to tell the difference between these distributions.

653
00:42:28,080 --> 00:42:33,080
I commented this section out just to kind of save time here.

654
00:42:33,080 --> 00:42:42,080
But for the adventurous ones, try to uncomment this code because

655
00:42:42,080 --> 00:42:49,080
we've all heard it. Well, a lot of us have heard of the indica-sativa dichotomy.

656
00:42:49,080 --> 00:43:00,080
Well, at a cannabis conference I was at, they introduced a new classification system that is basically based on five types,

657
00:43:00,080 --> 00:43:06,080
where you basically say, OK, is something high THC to CBD?

658
00:43:06,080 --> 00:43:12,080
So that would be what you may think of as your typical sativas.

659
00:43:12,080 --> 00:43:18,080
There's the near unitary, which is a balance between THC and CBD.

660
00:43:18,080 --> 00:43:22,080
So that's sort of a whole new type of its own.

661
00:43:22,080 --> 00:43:31,080
Then there's the high CBD to THC ratio, which is what you would typically think of as an Indica.

662
00:43:31,080 --> 00:43:39,080
And then you have some that are high in CBG. And what's going on with those types?

663
00:43:39,080 --> 00:43:44,080
And then there are some types that there's not really a distinct major cannabinoid.

664
00:43:44,080 --> 00:43:52,080
What's going on there? So this is sort of an avenue for further research.

665
00:43:52,080 --> 00:43:58,080
Just for brevity today, I'm just going to move past this.

666
00:43:58,080 --> 00:44:08,080
But long story short, you can use THC and CBD to start to classify these cannabinoids.

667
00:44:08,080 --> 00:44:16,080
So we did this once earlier this year, so I'll refer you to that day.

668
00:44:16,080 --> 00:44:20,080
But today I thought it would be really interesting.

669
00:44:20,080 --> 00:44:28,080
We were reusing various bits of code, so we've looked at cannabinoids in Washington.

670
00:44:28,080 --> 00:44:33,080
On another day, we looked at cannabinoids in Connecticut.

671
00:44:33,080 --> 00:44:41,080
And we've also talked about comparative analysis, how important it is to compare one state to another state.

672
00:44:41,080 --> 00:44:44,080
So let's do just that.

673
00:44:44,080 --> 00:45:00,080
So this bit of code is just where you read the data from the Socrata API, clean up the data a little bit.

674
00:45:00,080 --> 00:45:11,080
So the data is in strings in text format with some values that you see frequently at a laboratory.

675
00:45:11,080 --> 00:45:23,080
So when you're measuring cannabis in a laboratory setting, you can never...

676
00:45:23,080 --> 00:45:29,080
And this goes just to the principles of science. You can't prove a negative.

677
00:45:29,080 --> 00:45:36,080
So at no point can they say, oh, there's no, say, THC in a product.

678
00:45:36,080 --> 00:45:41,080
They can just say, we didn't detect it. We didn't detect it.

679
00:45:41,080 --> 00:45:47,080
Our measurements can detect everything down to 0.1% accurately.

680
00:45:47,080 --> 00:45:53,080
So all we can say is it's less than 0.1%.

681
00:45:53,080 --> 00:46:02,080
So long story short, I just code those as zero because in my mind, it's effectively zero,

682
00:46:02,080 --> 00:46:06,080
even though technically it's non-detected.
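
Coding non-detects as zero might look like the sketch below. The markers handled here ("ND", "N/D", and values starting with "<") are assumptions about how such text values could appear, not the actual codes in the Connecticut data, and `parse_result` is a hypothetical helper.

```python
def parse_result(raw):
    """Convert a text lab value to a float, coding non-detects as 0.0."""
    text = raw.strip().upper().rstrip("%").strip()
    # Hypothetical non-detect markers; real data may use other spellings.
    if text in {"ND", "N/D"} or text.startswith("<"):
        return 0.0
    return float(text)

raw_values = ["22.5%", "<0.10", " nd ", "0.45"]
parsed = [parse_result(value) for value in raw_values]
print(parsed)
```

Treating "less than the detection limit" as zero is a modeling choice. It slightly understates averages compared with substituting, say, half the limit.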

683
00:46:06,080 --> 00:46:13,080
So a little bit of science mixed in.

684
00:46:13,080 --> 00:46:17,080
So long story short, clean up the Connecticut data.

685
00:46:17,080 --> 00:46:28,080
And then we get some nice observations here from Connecticut.

686
00:46:28,080 --> 00:46:38,080
So the Connecticut data is awesome because in addition to the cannabinoids, we also get terpenes.

687
00:46:38,080 --> 00:46:43,080
So if you're interested in looking at terpenes,

688
00:46:43,080 --> 00:46:51,080
Connecticut is the only state that I know of that has publicly available terpene data.

689
00:46:51,080 --> 00:46:58,080
I am frequently mistaken, so maybe there's more terpene data out there to be had.

690
00:46:58,080 --> 00:47:11,080
I think there's a lot of informative analysis and research and inferences that can be made from terpenes.

691
00:47:11,080 --> 00:47:24,080
So I personally think terpenes may be a better way to classify cannabis than, say, the high THC ratio.

692
00:47:24,080 --> 00:47:37,080
So if you're adventurous, then you can maybe try to see how these types shake out in Connecticut in terms of terpenes.

693
00:47:37,080 --> 00:47:50,080
So, for example, does type 4 that's high in CBG, does that also have a particular terpene that is also frequently present?

694
00:47:50,080 --> 00:47:54,080
So terpenes shouldn't be neglected.

695
00:47:54,080 --> 00:48:05,080
So, for example, I've heard that beta-caryophyllene is actually considered quite a bit of an intoxicant.

696
00:48:05,080 --> 00:48:14,080
And so when people get, say, sleepy or what have you from like an indica,

697
00:48:14,080 --> 00:48:20,080
I've heard that's often actually the effect of beta-caryophyllene.

698
00:48:20,080 --> 00:48:27,080
And it seems there's a threshold to it. So beta-caryophyllene is found in many, many, many strains.

699
00:48:27,080 --> 00:48:38,080
But it seems that once you cross a threshold, then you get the intoxicant sedative effect.

700
00:48:38,080 --> 00:48:49,080
So long story short, you may actually want to sort of be on the lookout for some of these terpenes

701
00:48:49,080 --> 00:49:00,080
if you're consuming cannabis and how they may have an effect on you. So interesting point there.

702
00:49:00,080 --> 00:49:07,080
But without further ado, back to the cannabinoids and enough of just looking at numbers here.

703
00:49:07,080 --> 00:49:19,080
Let's look at the data. So this is just for fun, just a scatter plot of CBDA to THCA.

704
00:49:19,080 --> 00:49:24,080
And then we're doing comparative analysis.

705
00:49:24,080 --> 00:49:42,080
So here's the same plot with both Washington and Connecticut. So Washington sort of has a lot more observations.

706
00:49:42,080 --> 00:49:52,080
So it kind of dominates the plot there. So I don't know. Sorry, got a little off track there.

707
00:49:52,080 --> 00:50:00,080
The main thing we're looking at here is distributions. So this is sort of the hammer for the day.

708
00:50:00,080 --> 00:50:10,080
So if we look at, say, the distribution of THCA in Washington to Connecticut.

709
00:50:10,080 --> 00:50:17,080
So remember, we were just looking at distributions above in Washington.

710
00:50:17,080 --> 00:50:24,080
And we see in Washington from year to year, there's quite the overlap.

711
00:50:24,080 --> 00:50:32,080
When we look at, say, THCA in Washington to Connecticut, there's a big difference.

712
00:50:32,080 --> 00:50:38,080
The distributions just visually look different.

713
00:50:38,080 --> 00:50:43,080
And once again, we can do a statistical test to see if they are, in fact, different.

714
00:50:43,080 --> 00:50:51,080
But the visualization is quite powerful.

715
00:50:51,080 --> 00:50:56,080
I was just going to just look at the means here.

716
00:50:56,080 --> 00:51:05,080
So in Washington, right, we've got the 22.5.

717
00:51:05,080 --> 00:51:15,080
And then in Connecticut, we see actually 27 percent. OK, something's going on.

718
00:51:15,080 --> 00:51:22,080
Let's just look at total cannabinoids here.

719
00:51:22,080 --> 00:51:29,080
So once again, if you just look at all the cannabinoids,

720
00:51:29,080 --> 00:51:36,080
there is still a different distribution that appears in Connecticut.

721
00:51:36,080 --> 00:51:46,080
So one thinks, OK, this may be going on with all the cannabinoids, not necessarily just THCA.

722
00:51:46,080 --> 00:51:54,080
Well, the choices you make matter.

723
00:51:54,080 --> 00:52:01,080
And what I suspect and I don't know for certain here,

724
00:52:01,080 --> 00:52:06,080
but the recent trend, I know this is the case in Massachusetts.

725
00:52:06,080 --> 00:52:11,080
So I am suspecting it's the case in Connecticut. I need to confirm.

726
00:52:11,080 --> 00:52:21,080
But the science community is coming to the conclusion or coming to the agreement

727
00:52:21,080 --> 00:52:32,080
that for a more standardized measure, that scientists should measure cannabinoids

728
00:52:32,080 --> 00:52:36,080
in a moisture corrected manner.

729
00:52:36,080 --> 00:52:45,080
So, for example, cannabis has a moisture content, right?

730
00:52:45,080 --> 00:52:55,080
So part of the flower is...

731
00:52:55,080 --> 00:52:58,080
Graham, do you have a question?

732
00:52:58,080 --> 00:53:02,080
No, I'm just water activity level. You're spot on.

733
00:53:02,080 --> 00:53:05,080
It was what I was going to say. So I type.

734
00:53:05,080 --> 00:53:15,080
A water activity level. Moisture content.

735
00:53:15,080 --> 00:53:18,080
What was the hold on?

736
00:53:18,080 --> 00:53:24,080
I think we want to use moisture content, but let me just

737
00:53:24,080 --> 00:53:38,080
find out what the water activity measure is because there is one. Yeah, the water activity rate.

738
00:53:38,080 --> 00:53:45,080
So the water activity rate is essentially how much, like, water vapor...

739
00:53:45,080 --> 00:53:51,080
Once again, not a chemist here, so I could be butchering the interpretation of this.

740
00:53:51,080 --> 00:54:04,080
But from my understanding, the water activity rate is sort of the vapor that a product gives off measured in this abstract unit.

741
00:54:04,080 --> 00:54:12,080
Water activity rate, and that just ranges from zero to one, and it's not a percentage. It's just a rate.

742
00:54:12,080 --> 00:54:18,080
And then the moisture content

743
00:54:18,080 --> 00:54:25,080
is how much actual, you know, H2O is in a particular...

744
00:54:25,080 --> 00:54:30,080
in a particular substance.

745
00:54:30,080 --> 00:54:34,080
Let me try to.

746
00:54:34,080 --> 00:54:38,080
Okay, there's a better distribution plot.

747
00:54:38,080 --> 00:54:48,080
So that's sort of the distribution of moisture content in flower in Washington state.

748
00:54:48,080 --> 00:54:54,080
And so, as you can see, you know, the mean

749
00:54:54,080 --> 00:54:57,080
is around 7%.

750
00:54:57,080 --> 00:55:01,080
And then I want to say...

751
00:55:01,080 --> 00:55:12,080
Well, there may be some outliers, but essentially anything with the moisture content above 15% actually fails quality assurance testing.

752
00:55:12,080 --> 00:55:15,080
Because that's sort of the

753
00:55:15,080 --> 00:55:26,080
limit that people have suggested would encourage mold, right? We were talking about shelf life stability earlier of cannabinoids.

754
00:55:26,080 --> 00:55:34,080
Well, this is actually sort of a concern, you know, in a lot of food industries, and it's sort of

755
00:55:34,080 --> 00:55:43,080
been adopted by the cannabis industry in that, you know, you don't want, you know, flower sitting on the shelf to potentially mold over time.

756
00:55:43,080 --> 00:55:49,080
You know, it's one thing for it to lose cannabinoids, but you definitely don't want it molding on the shelf.

757
00:55:49,080 --> 00:55:57,080
And so in Washington state, they've said, OK, the moisture content percent needs to be less than 15%.

758
00:55:57,080 --> 00:56:04,080
And the water activity rate needs to be less than 0.65.

759
00:56:04,080 --> 00:56:12,080
So these are metrics that people have set for shelf life stability.
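
The two limits quoted above can be written as a simple check. This is a sketch only: the function name and constants are hypothetical, and the thresholds are the ones stated in the session, not taken from the regulation text.

```python
MAX_MOISTURE_PERCENT = 15.0  # limit quoted in the session
MAX_WATER_ACTIVITY = 0.65    # limit quoted in the session

def passes_shelf_stability(moisture_percent, water_activity):
    """True when both measurements are below the quoted limits."""
    return (
        moisture_percent < MAX_MOISTURE_PERCENT
        and water_activity < MAX_WATER_ACTIVITY
    )

print(passes_shelf_stability(7.0, 0.55))
print(passes_shelf_stability(16.2, 0.55))
```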

760
00:56:12,080 --> 00:56:16,080
Long story short, flower has different moisture content.

761
00:56:16,080 --> 00:56:27,080
And from states such as Oklahoma, and I know they're doing it in Massachusetts, and I suspect they're doing it in Connecticut, need to confirm.

762
00:56:27,080 --> 00:56:31,080
You basically correct the

763
00:56:31,080 --> 00:56:35,080
concentration of cannabinoids in respect to the moisture.

764
00:56:35,080 --> 00:56:54,080
So you basically say, OK, if all of the moisture evaporated out and we're left with an entirely dry cannabis flower, what percentage of that dry flower is the cannabinoid?

765
00:56:54,080 --> 00:57:00,080
And so this is going to be a higher percentage because

766
00:57:00,080 --> 00:57:15,080
if all the water is taken out and you're just left with the exact same amount of THCA in the dry flower, it's going to make up a higher proportion of that flower.

767
00:57:15,080 --> 00:57:22,080
So I suspect that may be why Connecticut had a higher distribution.

768
00:57:22,080 --> 00:57:30,080
So here I calculate total cannabinoids with the moisture correction.

769
00:57:30,080 --> 00:57:39,080
So I divide by one minus the moisture content.

770
00:57:39,080 --> 00:57:47,080
Which will inflate the cannabinoids, but, you know, just in an

771
00:57:47,080 --> 00:57:52,080
attempt to standardize this, right, because we want to compare apples to apples.

772
00:57:52,080 --> 00:58:00,080
So if they're correcting for moisture content in Connecticut, then we'll want to do the same in Washington.
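
The moisture correction described here, dividing by one minus the moisture content, can be sketched as below. The function name and the example values (20% total cannabinoids at 7% moisture) are hypothetical, and moisture is assumed to be reported as a percentage.

```python
def moisture_corrected(percent_wet, moisture_percent):
    """Dry-weight concentration: wet-basis percent / (1 - moisture fraction)."""
    moisture_fraction = moisture_percent / 100.0
    if not 0.0 <= moisture_fraction < 1.0:
        raise ValueError("moisture must be in [0, 100) percent")
    return percent_wet / (1.0 - moisture_fraction)

# Hypothetical flower sample: 20% total cannabinoids as tested, 7% moisture.
dry = moisture_corrected(20.0, 7.0)
print(round(dry, 2))
```

The corrected value is always at least as large as the as-tested value, which is why uncorrected and corrected results from different states are not directly comparable.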

773
00:58:00,080 --> 00:58:05,080
And so without further ado, let's look at that plot.

774
00:58:05,080 --> 00:58:10,080
And so let's actually look at these back to back.

775
00:58:10,080 --> 00:58:23,080
So this was the original plot we calculated and then if we correct for moisture content in Washington,

776
00:58:23,080 --> 00:58:28,080
then the distributions...

777
00:58:28,080 --> 00:58:31,080
Well, that was maybe too many bins.

778
00:58:31,080 --> 00:58:43,080
Long story short, it looks like the Washington distribution is still a little lower than Connecticut, even with the moisture correction factor.

779
00:58:43,080 --> 00:58:46,080
So.

780
00:58:46,080 --> 00:58:49,080
I think we still need to get to the bottom of this.

781
00:58:49,080 --> 00:58:55,080
Why is the distribution in Connecticut different than it is in Washington?

782
00:58:55,080 --> 00:59:05,080
I mean, as much as I would love to believe that they're just growing better cannabis on the East Coast,

783
00:59:05,080 --> 00:59:08,080
that may not necessarily be the case.

784
00:59:08,080 --> 00:59:16,080
So that's why I said, you know, the choices you make matter and not just the choices we make, but the choices other people make.

785
00:59:16,080 --> 00:59:23,080
So it matters how the laboratories are measuring cannabinoids.

786
00:59:23,080 --> 00:59:32,080
It matters if they're applying a moisture correction factor.

787
00:59:32,080 --> 00:59:41,080
You know, and it matters, you know, how we look at the data and calculate statistics and create our visualizations.

788
00:59:41,080 --> 00:59:58,080
So I always like to cast a lot of uncertainty and doubt upon my analysis because, you know, you should take any data or statistical analysis with a little bit of skepticism

789
00:59:58,080 --> 01:00:05,080
and make sure people justify their assumptions, explain the shortcomings in their data.

790
01:00:05,080 --> 01:00:16,080
So, you know, nothing's set in stone, but I just thought this was an interesting analysis here that warrants further research.

791
01:00:16,080 --> 01:00:22,080
So after today, when we get off the call, I'm going to see, OK.

792
01:00:22,080 --> 01:00:26,080
Are they actually doing moisture correction in Connecticut?

793
01:00:26,080 --> 01:00:34,080
So I may have to dig into the regulations there or maybe even call some labs in Connecticut and ask them directly.

794
01:00:34,080 --> 01:00:49,080
And this is why, you know, there are a lot of science conferences where people at different laboratories are discussing what's the best way to measure cannabis,

795
01:00:49,080 --> 01:00:57,080
because how you go about measuring it matters. So your sample preparation matters.

796
01:00:57,080 --> 01:01:07,080
So that's essentially how you're, you know, they call it homogenizing, which is just a fancy word for grinding the sample up.

797
01:01:07,080 --> 01:01:15,080
So when you send a sample to the lab, they grind it up before they test it or they probably should.

798
01:01:15,080 --> 01:01:21,080
They may not be. And so a lot of things like this aren't standardized and things like this make a big difference.

799
01:01:21,080 --> 01:01:28,080
So how you homogenize the sample can affect your results.

800
01:01:28,080 --> 01:01:34,080
You know, so it would be so it looks like.

801
01:01:34,080 --> 01:01:43,080
I mean, we can't tell, right? Are growers just doing a little better job in Connecticut? Maybe.

802
01:01:43,080 --> 01:01:49,080
Are the laboratories measuring the cannabinoids differently? Maybe.

803
01:01:49,080 --> 01:01:58,080
Or are we doing something wrong on our end? Maybe. So all of these things need to be reviewed.

804
01:01:58,080 --> 01:02:07,080
And we could always just get more and more data points. So far we just have observations from two states.

805
01:02:07,080 --> 01:02:20,080
So say you're a laboratory in one of these other states, maybe you could add your observations to this mix and see where your state shakes out

806
01:02:20,080 --> 01:02:28,080
on this scale in relation to Washington and Connecticut.

807
01:02:28,080 --> 01:02:30,080
So.

808
01:02:30,080 --> 01:02:47,080
So that's me droning on and on and on. So do any of you have any questions or comments or thoughts from the analysis today?

809
01:02:47,080 --> 01:03:01,080
Well, in that case, definitely feel free to reach out if there's, you know, anywhere that you want to see the analysis extended or anything of that sort.

810
01:03:01,080 --> 01:03:07,080
Graham, we lost you there for a second. So if you have any last questions, you're free to chime in.

811
01:03:07,080 --> 01:03:24,080
Yes. One thing I do have to say is that I don't think caryophyllene has psychoactive activity.

812
01:03:24,080 --> 01:03:41,080
I've personally heard caryophyllene is the only terpenoid that acts on the CB2 receptors, which are specifically found in the muscular organs as compared to the central nervous system.

813
01:03:41,080 --> 01:03:58,080
But I have heard that caryophyllene is related to couch lock. But I also understand that myrcene, I believe, is the most psychoactive of them.

814
01:03:58,080 --> 01:04:11,080
It maxes out the CB1 activation level and is the closest to opioids. Sorry, I'm just not sure if I'm wrong or...

815
01:04:11,080 --> 01:04:19,080
You're 100% correct. Don't be sorry, Graham. You're 100% correct. That was in the back of my mind and I should have mentioned it.

816
01:04:19,080 --> 01:04:31,080
And like I said, I'm not the expert chemist here. So thank you for keeping it sharp as always. I should have mentioned that myrcene is another terpene, as Graham said,

817
01:04:31,080 --> 01:04:46,080
that you should look out for if you're concerned about the intoxicating effects of cannabis. So once again, if you're just looking at cannabinoids, you may miss myrcene.

818
01:04:46,080 --> 01:05:02,080
So, critical point. Exactly. I may have mixed those two up a bit. So as Graham keeps it sharp here, you know, myrcene may actually be the one that is a bit more intoxicating.

819
01:05:02,080 --> 01:05:16,080
And then caryophyllene, I've heard, is maybe almost like an irritant. So depending on your body chemistry, it may kind of clash with your body chemistry.

820
01:05:16,080 --> 01:05:23,080
So, you know, everybody's different and, you know, everyone's reaction to cannabis is a little bit different.

821
01:05:23,080 --> 01:05:37,080
And so this is why it's so important to get granular and scientific, as we try to do. Perfect ending, Keegan. I couldn't have said it better myself.

822
01:05:37,080 --> 01:05:44,080
Thank you, Graham. Love having you. Love having you all. I hope you all have gotten something of value from today.

823
01:05:44,080 --> 01:05:58,080
And I'd like to encourage you all to come to Saturday Morning Statistics. I was just going to tease real quick. Saturday Morning Statistics.

824
01:05:58,080 --> 01:06:08,080
So there's a funny comic going around from XKCD.

825
01:06:08,080 --> 01:06:19,080
It's just so relevant to some of the things that we've been talking about, where they say, oh, you know, if you don't control for confounding variables, you'll have bias.

826
01:06:19,080 --> 01:06:24,080
We talked about this when we talked about instrumental variables.

827
01:06:24,080 --> 01:06:32,080
And then we've talked about if you include too many variables, we may overfit or over identify the model.

828
01:06:32,080 --> 01:06:47,080
We've also run into that problem. But I kind of disagree with the statistician here in this last one. I mean, I kind of agree in that, you know, you definitely want to be skeptical of all analysis.

829
01:06:47,080 --> 01:07:12,080
But the statistician is forgetting that there is a whole lot of work that's been done exactly on this problem, which is model selection. And so we'll talk about that on Saturday, where there's actually been statistics done where you can actually try to find, you know, what is, you know, the statistically best model to use.

830
01:07:12,080 --> 01:07:22,080
So essentially penalizing yourself for adding more variables while recognizing that more variables add more information.
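The trade-off described here, penalizing extra variables while crediting the information they add, is what information criteria such as the Akaike information criterion (AIC) formalize. The transcript does not say which criterion Saturday's session uses, so AIC is an assumption here, and the simulated data, the helper `aic_ols`, and all variable names are hypothetical. A minimal sketch:

```python
import numpy as np


def aic_ols(y, X):
    """AIC (up to an additive constant) for a Gaussian linear model fit by least squares."""
    n, k = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = float(np.sum((y - X @ beta) ** 2))
    # The first term rewards a better fit; 2 * k penalizes each extra parameter.
    return n * np.log(rss / n) + 2 * k


# Simulated data (hypothetical): y depends on x1 only.
rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
irrelevant = rng.normal(size=(n, 5))  # five variables unrelated to y
y = 1.0 + 2.0 * x1 + rng.normal(size=n)

X_small = np.column_stack([np.ones(n), x1])
X_big = np.column_stack([X_small, irrelevant])

# Lower AIC is preferred.
print("AIC, small model:", aic_ols(y, X_small))
print("AIC, big model:  ", aic_ols(y, X_big))
```

The bigger model always fits at least as well, so its first term can only be lower or equal. It is preferred only if that improvement outweighs the penalty of 2 per added variable.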

831
01:07:22,080 --> 01:07:40,080
So just thought I would add a little bit of fun here at the end. But, you know, thank you for coming. And, you know, feel free to start exploring some of the code for yourself.

832
01:07:40,080 --> 01:07:57,080
Yes. So all the data is open to the public. That was the final question here. And so I've got the links in the source code, and then that is found on GitHub.

833
01:07:57,080 --> 01:08:18,080
So check out the Cannlytics GitHub, Cannabis Data Science. And I've got references to these various sources. But that's one thing we stress: you know, it's important to have reproducible results.

834
01:08:18,080 --> 01:08:30,080
And so we try to find public data, do everything publicly and transparently so that they can be reproduced. So we're making our effort.

835
01:08:30,080 --> 01:08:48,080
And so you don't have to use these on public data. So if you have an awesome private data set, then by all means use our statistics on your private data. But in this group, we tend to focus on public data.

836
01:08:48,080 --> 01:08:50,080
Awesome.

837
01:08:50,080 --> 01:09:01,080
Well, I know everyone's time's precious. So thank you for staying a little extra today. So check out XKCD for a good laugh or two.

838
01:09:01,080 --> 01:09:19,080
You know, feel free to be in touch. And definitely, if you have any directions for future meetups for next year, any topics you want covered, any states you want to deep dive in, feel free to reach out, because I'm always happy to accommodate.

839
01:09:19,080 --> 01:09:27,080
On that note, as I like to say, I hope you all keep your nose to the grindstone, stay productive, and have fun.

840
01:09:27,080 --> 01:09:44,080
And I'll see you all next year in 2022, unless you come to Saturday Morning Statistics. In that case, we'll dive into model selection and touch up our forecast for 2022.

841
01:09:44,080 --> 01:09:45,080
Awesome.

842
01:09:45,080 --> 01:10:03,080
Bye everyone. Thank you for coming. Thank you. Bye, else. Bye, Graham. Bye, Nina.

