1
00:00:00,000 --> 00:00:12,320
Welcome to the Cannabis Data Science Meetup group. We're in for a treat today as we finish 2022.

2
00:00:12,320 --> 00:00:13,800
It's been a big year.

3
00:00:13,800 --> 00:00:17,960
We've calculated cannabis statistics week after week after week.

4
00:00:17,960 --> 00:00:20,460
We've wrangled data set after data set.

5
00:00:20,460 --> 00:00:22,800
It's been a fruitful year.

6
00:00:22,800 --> 00:00:27,520
I think we've reached some of the hard-to-reach statistics, and I think there's still

7
00:00:27,520 --> 00:00:35,400
some more to get a hold of and I'll share with you today exactly how we can do that.

8
00:00:35,400 --> 00:00:44,640
Just for, well, I don't need to do an introduction, but I'll share with you today all of the interesting

9
00:00:44,640 --> 00:00:49,320
statistics and data that I'll be chasing after in 2023.

10
00:00:49,320 --> 00:00:53,940
But before I do that, just to give you all a chance to share something that you may be

11
00:00:53,940 --> 00:01:02,280
interested in, Candice, what's on your mind and what would you hope to, well, I guess

12
00:01:02,280 --> 00:01:07,040
we're already almost finished with the year, but what are your big goals for 2023 as far

13
00:01:07,040 --> 00:01:10,040
as cannabis data goes?

14
00:01:10,040 --> 00:01:12,760
Good to see you, Cindy.

15
00:01:12,760 --> 00:01:16,880
I would, let me see, hold on.

16
00:01:16,880 --> 00:01:22,440
I would like to probably pursue more Massachusetts and Florida data sets.

17
00:01:22,440 --> 00:01:29,120
I would like to do that.

18
00:01:29,120 --> 00:01:35,600
And then also, I am working on the cannabis data science directory and some of the files,

19
00:01:35,600 --> 00:01:42,080
doing some housekeeping, getting them running, and then putting them into notebooks for people

20
00:01:42,080 --> 00:01:46,280
who might want to run the code in a notebook.

21
00:01:46,280 --> 00:01:53,040
Also, what was your final remark?

22
00:01:53,040 --> 00:01:54,040
I'm sorry.

23
00:01:54,040 --> 00:01:55,040
Is it hard to hear me?

24
00:01:55,040 --> 00:01:56,200
Oh, I just cut you off.

25
00:01:56,200 --> 00:01:58,680
I just started speaking before you finished.

26
00:01:58,680 --> 00:01:59,840
Oh, sorry.

27
00:01:59,840 --> 00:02:03,520
And yeah, that's, that's it.

28
00:02:03,520 --> 00:02:10,240
Well, that's exciting because one of the things that's going to be needed, there's an interesting

29
00:02:10,240 --> 00:02:12,680
new project, cannabis data.

30
00:02:12,680 --> 00:02:19,640
As far as I know, it's the first of its kind, it's an open source project that's aimed at

31
00:02:19,640 --> 00:02:21,680
curating cannabis data.

32
00:02:21,680 --> 00:02:24,680
So Cannlytics has been building some data pipelines.

33
00:02:24,680 --> 00:02:27,640
So Cannlytics will be used in the project.

34
00:02:27,640 --> 00:02:28,640
So that's exciting.

35
00:02:28,640 --> 00:02:35,780
I'll share with you today some of the, it's really interesting how small projects can

36
00:02:35,780 --> 00:02:39,060
start and then get adopted elsewhere.

37
00:02:39,060 --> 00:02:43,880
So I'll share with you today some of the cooler tools that are being used.

38
00:02:43,880 --> 00:02:49,680
It's always interesting when you develop software because as a software developer, it's fun

39
00:02:49,680 --> 00:02:50,960
to write code.

40
00:02:50,960 --> 00:02:56,240
So you're writing all this interesting code and it's not until it gets into the hands

41
00:02:56,240 --> 00:03:00,400
of your users that you really figure out what's useful.

42
00:03:00,400 --> 00:03:07,120
And it's interesting how Cannlytics is being used, because some of the minor, what I considered

43
00:03:07,120 --> 00:03:12,240
minor, some of the utility functions and the constants, those are actually some of the

44
00:03:12,240 --> 00:03:15,040
most useful pieces of the code.

45
00:03:15,040 --> 00:03:19,560
So that's awesome to discover because now we can expand upon those.

46
00:03:19,560 --> 00:03:21,920
So I'll share those with you today.

47
00:03:21,920 --> 00:03:27,160
And then the other piece is that the cannabis data project is based out of Washington State.

48
00:03:27,160 --> 00:03:31,240
So that's definitely where we're focusing a lot of our energy right now.

49
00:03:31,240 --> 00:03:39,720
And what we're thinking is there's almost a need for ambassadors or other contributors

50
00:03:39,720 --> 00:03:46,600
throughout the, well, first the United States and then the world, because there's so much

51
00:03:46,600 --> 00:03:52,720
ground to cover that a data scientist can't do it by themselves.

52
00:03:52,720 --> 00:03:56,240
We need many awesome cannabis data scientists.

53
00:03:56,240 --> 00:04:02,160
So while we're chewing on this data in Washington State, hopefully some of these same tools

54
00:04:02,160 --> 00:04:07,880
can get picked up, and then you can, say, wrangle data in Massachusetts, wrangle data in New

55
00:04:07,880 --> 00:04:12,440
Jersey, wrangle data in Florida, what have you.

56
00:04:12,440 --> 00:04:17,200
And then we can each learn from each other, share interesting data.

57
00:04:17,200 --> 00:04:20,720
So I think it's going to be a fruitful, promising project.

58
00:04:20,720 --> 00:04:23,600
So I'll start sharing some of these links in the chat with you.

59
00:04:23,600 --> 00:04:29,080
And while I share some of these links, Isaac, you're free to share some of the interesting

60
00:04:29,080 --> 00:04:33,040
projects that you may hope to accomplish here in 2023.

61
00:04:33,040 --> 00:04:38,000
Hello, can you hear me?

62
00:04:38,000 --> 00:04:41,560
Yes, I can hear you.

63
00:04:41,560 --> 00:04:44,560
We can hear you.

64
00:04:44,560 --> 00:04:45,560
Perfect.

65
00:04:45,560 --> 00:04:46,560
Yeah, yeah, yeah.

66
00:04:46,560 --> 00:04:47,560
Perfect.

67
00:04:47,560 --> 00:04:48,560
I'm sorry.

68
00:04:48,560 --> 00:04:49,560
Apologies for my different background noises.

69
00:04:49,560 --> 00:04:51,560
I'm at a public space.

70
00:04:51,560 --> 00:05:00,920
But my hope for the year, I mean, I'm really thankful for this group to share the Washington

71
00:05:00,920 --> 00:05:01,920
data set.

72
00:05:01,920 --> 00:05:04,600
So I've been working on that.

73
00:05:04,600 --> 00:05:12,760
I mean, maybe next meetup when I'm at a better place, I'm happy to show you some of my findings.

74
00:05:12,760 --> 00:05:18,960
But my general hope for the next year is, well, it seems to me that kind of regulation

75
00:05:18,960 --> 00:05:22,160
landscape is going to get some changes.

76
00:05:22,160 --> 00:05:28,040
There's already some changes regarding pesticide use in Massachusetts, and people are noticeably

77
00:05:28,040 --> 00:05:30,560
talking more about cannabis data.

78
00:05:30,560 --> 00:05:31,560
So that's very good.

79
00:05:31,560 --> 00:05:38,880
I'm just hoping if we can make some contributions to this more general trend.

80
00:05:38,880 --> 00:05:44,880
And also for me personally, I think it's to learn more Python.

81
00:05:44,880 --> 00:05:51,280
I have been analyzing data using R, and I thought most of your tools are in Python.

82
00:05:51,280 --> 00:05:58,280
So I'm hoping I can get on board to learn how to use it and maybe contribute to the

83
00:05:58,280 --> 00:06:00,320
project overall.

84
00:06:00,320 --> 00:06:02,800
Phenomenal, Isaac.

85
00:06:02,800 --> 00:06:05,080
I'd love to hear it.

86
00:06:05,080 --> 00:06:08,640
Just to speak to some of these things.

87
00:06:08,640 --> 00:06:17,600
I'd love that you're interested in pushing forward the regulatory framework in Massachusetts.

88
00:06:17,600 --> 00:06:20,600
This is still a new industry, so there's a lot to iron out.

89
00:06:20,600 --> 00:06:26,360
And in fact, that's the talk that's happening next week in Washington State.

90
00:06:26,360 --> 00:06:33,460
People are talking about quality control testing and the consequences of Washington State-mandated

91
00:06:33,460 --> 00:06:39,160
pesticide and heavy metal screening in April of 2022.

92
00:06:39,160 --> 00:06:43,840
And they've given licensees a long adoption period.

93
00:06:43,840 --> 00:06:53,200
And people are going to be talking next week about how the industry experienced that change

94
00:06:53,200 --> 00:06:55,160
in quality testing regulations.

95
00:06:55,160 --> 00:06:58,520
So I think that's going to be an interesting talk.

96
00:06:58,520 --> 00:07:00,560
I'll share with you some of that.

97
00:07:00,560 --> 00:07:05,920
And then, as I said, it's always helpful when all of us come together and talk about some

98
00:07:05,920 --> 00:07:12,400
of these regulations and the consequences, whether they be intended or unintended, because

99
00:07:12,400 --> 00:07:15,600
simply talking about it helps move the ball forward.

100
00:07:15,600 --> 00:07:22,480
And as I'll share with you today, as we document some of the things we do, these can get picked

101
00:07:22,480 --> 00:07:23,480
up by others.

102
00:07:23,480 --> 00:07:30,960
So for example, I was playing around with, of course you have all heard of

103
00:07:30,960 --> 00:07:37,560
ChatGPT, where you ask interesting questions and it will field a response.

104
00:07:37,560 --> 00:07:45,040
And I was thinking that that's almost another reason why it's beneficial to keep studying

105
00:07:45,040 --> 00:07:51,040
and putting out our material, because who knows, maybe some of the material that we

106
00:07:51,040 --> 00:07:58,240
discuss about cannabis and cannabis related policies may get picked up by some model like

107
00:07:58,240 --> 00:08:01,560
ChatGPT and regurgitated.

108
00:08:01,560 --> 00:08:04,760
So I don't know.

109
00:08:04,760 --> 00:08:08,160
It's definitely an interesting thought.

110
00:08:08,160 --> 00:08:15,360
The second part being what I always tell people is, you know, each programming language, it's

111
00:08:15,360 --> 00:08:19,280
ultimately just a tool to get a job done.

112
00:08:19,280 --> 00:08:33,520
I happened to learn Python back in college to build plots for various economic questions

113
00:08:33,520 --> 00:08:35,520
we would look at.

114
00:08:35,520 --> 00:08:39,760
I always found that making plots in Python was more reliable.

115
00:08:39,760 --> 00:08:43,960
And actually, that's actually something that we'll touch on today.

116
00:08:43,960 --> 00:08:48,760
It doesn't have to be Python; any programming language can help make things like creating

117
00:08:48,760 --> 00:08:54,040
plots, or in this case today, diagrams, reproducible.

118
00:08:54,040 --> 00:08:59,320
And so that's why I picked up the programming language: I wanted good-looking, reproducible

119
00:08:59,320 --> 00:09:00,320
plots.

120
00:09:00,320 --> 00:09:03,320
And I ended up going with Python.

121
00:09:03,320 --> 00:09:08,200
There were other students who used R, some used matplotlib.

122
00:09:08,200 --> 00:09:12,920
At the end of the day, we could all build beautiful plots.

123
00:09:12,920 --> 00:09:18,520
And for a long time, when I started out as a software developer, Python was kind of looked

124
00:09:18,520 --> 00:09:24,120
down upon and people were using more sophisticated programming languages.

125
00:09:24,120 --> 00:09:30,240
So I never really considered myself a cutting edge programmer.

126
00:09:30,240 --> 00:09:33,480
But it's interesting that it's being used more and more.

127
00:09:33,480 --> 00:09:39,200
So personally, I'm thankful, but I don't want to let other people miss out on the fun.

128
00:09:39,200 --> 00:09:44,720
So I don't see any reason why you can't use R or your favorite programming language to

129
00:09:44,720 --> 00:09:49,320
do data science.

130
00:09:49,320 --> 00:09:58,600
But yes, a lot of our work, we'll get a little static here, a lot of our work is in Python.

131
00:09:58,600 --> 00:10:01,920
And so if you can pick it up, then that could help.

132
00:10:01,920 --> 00:10:04,080
But I'm droning on and droning on.

133
00:10:04,080 --> 00:10:07,520
So we'll get back to cannabis and data science here momentarily.

134
00:10:07,520 --> 00:10:10,800
But Sammy, welcome to the group.

135
00:10:10,800 --> 00:10:12,680
Don't want to leave you out on the fun.

136
00:10:12,680 --> 00:10:16,740
So feel free to ask for any introductions if you need them.

137
00:10:16,740 --> 00:10:21,440
But we're basically talking about data science projects, cannabis related, that we would

138
00:10:21,440 --> 00:10:24,160
like to try to accomplish in the coming year.

139
00:10:24,160 --> 00:10:29,360
So you're welcome to share anything that you're interested in and any way that you may see

140
00:10:29,360 --> 00:10:31,360
the group helping you out.

141
00:10:31,360 --> 00:10:34,440
Okay, hi, thank you.

142
00:10:34,440 --> 00:10:39,680
Yeah, so I'm in kind of an unusual situation.

143
00:10:39,680 --> 00:10:42,200
Can I say just like a word or two about who I am?

144
00:10:42,200 --> 00:10:46,960
I just stumbled upon your event on Meetup.

145
00:10:46,960 --> 00:10:50,640
By all means, it's a meetup after all.

146
00:10:50,640 --> 00:10:53,880
So sorry if I didn't do a great introduction.

147
00:10:53,880 --> 00:10:59,200
So my name is Keegan. I founded this company, Cannlytics, basically to help people out with cannabis

148
00:10:59,200 --> 00:11:00,720
analytics.

149
00:11:00,720 --> 00:11:06,480
And basically this Meetup group, it's a roundtable for data scientists to come together and talk

150
00:11:06,480 --> 00:11:14,480
about projects we're working on and see if we can't help each other move the ball forward.

151
00:11:14,480 --> 00:11:19,200
Yeah, well, please share.

152
00:11:19,200 --> 00:11:22,920
Yeah, so a couple things.

153
00:11:22,920 --> 00:11:31,160
So I'm a mathematician who's been a lecturer in the university system for 10 years since

154
00:11:31,160 --> 00:11:37,920
my PhD, and I've in the last few years gotten very interested in data science and been doing

155
00:11:37,920 --> 00:11:41,160
a lot of it.

156
00:11:41,160 --> 00:11:44,720
Playing with Python in particular, you just mentioned Python and a lot of the machine

157
00:11:44,720 --> 00:11:49,120
learning tools there, scikit-learn and stuff.

158
00:11:49,120 --> 00:11:52,640
And so I'm trying to make this a career.

159
00:11:52,640 --> 00:11:53,920
I'm trying to get into this.

160
00:11:53,920 --> 00:12:02,440
And yeah, and cannabis is such a relatively new kind of industry, right?

161
00:12:02,440 --> 00:12:08,080
So you're in Washington State, right?

162
00:12:08,080 --> 00:12:09,080
Not DC?

163
00:12:09,080 --> 00:12:10,080
Exactly.

164
00:12:10,080 --> 00:12:13,220
So I'm from the East Coast.

165
00:12:13,220 --> 00:12:16,020
So when I hear Washington, I think Washington DC too.

166
00:12:16,020 --> 00:12:20,960
So I always say, well, I guess I've gotten out of the habit, but exactly Washington State.

167
00:12:20,960 --> 00:12:26,800
Well, I was asking because I'm in Oregon, we're sort of sister states.

168
00:12:26,800 --> 00:12:32,480
We are also one of the early adopters of the cannabis decriminalization and all that.

169
00:12:32,480 --> 00:12:41,400
But yeah, so I'm just, I think you had a query, you had a question when signing up for this

170
00:12:41,400 --> 00:12:48,320
and said, what would you, what are you curious about?

171
00:12:48,320 --> 00:12:55,000
And I mentioned something that was kind of very open, like an aspirational thing, almost

172
00:12:55,000 --> 00:12:58,480
like a research topic.

173
00:12:58,480 --> 00:13:05,000
And it's from my experience with being in these cannabis shops where they have a thousand

174
00:13:05,000 --> 00:13:10,680
flavors and I can't make heads or tails of them.

175
00:13:10,680 --> 00:13:18,200
It's the same kind of paralyzing thing where I'm looking at the wines on the aisle in the

176
00:13:18,200 --> 00:13:19,200
grocery store.

177
00:13:19,200 --> 00:13:21,800
I'm just like, well, that label is cool.

178
00:13:21,800 --> 00:13:29,000
And I was thinking, wait, this is, this maybe is a problem that has like a data science.

179
00:13:29,000 --> 00:13:33,960
I don't want to call it a solution, but maybe like a, there's something there.

180
00:13:33,960 --> 00:13:39,360
So I was, I was curious, like, I know some people have tried to catalog these terpenes

181
00:13:39,360 --> 00:13:42,000
that show up in the different strains.

182
00:13:42,000 --> 00:13:52,920
And basically somehow figure out a mapping from the space of on the one hand, the actual

183
00:13:52,920 --> 00:13:59,840
strains, whether they're identified by their name or their producer or whatever, to some

184
00:13:59,840 --> 00:14:05,120
other space that is more identifiable to people, because the names are just sort of

185
00:14:05,120 --> 00:14:06,120
silly, right?

186
00:14:06,120 --> 00:14:09,760
They're like, they don't really mean a lot.

187
00:14:09,760 --> 00:14:13,440
So like, for example, and then I'll stop.

188
00:14:13,440 --> 00:14:16,960
I've taken a lot of time here.

189
00:14:16,960 --> 00:14:20,680
If I were to be like, well, okay, I bought this one strain one time and I really liked

190
00:14:20,680 --> 00:14:22,440
it.

191
00:14:22,440 --> 00:14:24,720
What's another one that's like that?

192
00:14:24,720 --> 00:14:30,520
That's a question about like, there's some space where the, what are the other points

193
00:14:30,520 --> 00:14:32,160
that are nearby in that space?

194
00:14:32,160 --> 00:14:33,920
But like, what is that space?

195
00:14:33,920 --> 00:14:36,240
Like that seems like a data science question.

196
00:14:36,240 --> 00:14:37,240
Exactly.

197
00:14:37,240 --> 00:14:42,440
So today I'll show you exactly how you can answer that question because that came up

198
00:14:42,440 --> 00:14:51,280
last week and I was actually going to answer that question for our cannabis data science

199
00:14:51,280 --> 00:14:54,280
crew member.

200
00:14:54,280 --> 00:15:00,840
I haven't gotten to the answer yet because it's a big one, but once again, the work we're

201
00:15:00,840 --> 00:15:02,360
doing is reproducible.

202
00:15:02,360 --> 00:15:07,680
And once we get to the end goal, then it will be readily reproducible.

203
00:15:07,680 --> 00:15:08,680
And so, exactly.

204
00:15:08,680 --> 00:15:15,480
So basically the question is, okay, these cannabis flowers or products are getting sold

205
00:15:15,480 --> 00:15:17,720
by strain name.

206
00:15:17,720 --> 00:15:21,200
And next week we'll get a bit more into the history of that.

207
00:15:21,200 --> 00:15:25,800
I might keep teasing it, but I think that's going to be a real cool way to start the new

208
00:15:25,800 --> 00:15:26,800
year.

209
00:15:26,800 --> 00:15:30,640
We're basically going to start with this grand strain hunt.

210
00:15:30,640 --> 00:15:38,360
And I've got some things to share with you, so don't let me forget.

211
00:15:38,360 --> 00:15:40,840
Okay.

212
00:15:40,840 --> 00:15:47,640
So part of the interesting history, I think, is, well, cannabis used to be underground.

213
00:15:47,640 --> 00:15:52,640
And I think I just kind of had this thought today.

214
00:15:52,640 --> 00:15:58,120
I don't know how accurate it is, but people didn't used to want to be traced.

215
00:15:58,120 --> 00:16:04,520
So I don't know if people ever thought, oh, maybe they could trace us back through the

216
00:16:04,520 --> 00:16:06,840
genetics of the cannabis.

217
00:16:06,840 --> 00:16:13,440
But these growers I've realized are super sophisticated, and maybe that was a concern

218
00:16:13,440 --> 00:16:15,200
of theirs.

219
00:16:15,200 --> 00:16:22,200
Because from basically the research I've been doing, people have been measuring, the cultivators

220
00:16:22,200 --> 00:16:28,600
have been measuring the cannabinoid contents, from my understanding, since back in the 90s.

221
00:16:28,600 --> 00:16:34,300
So this is something that people have been measuring, have been studying, but maybe it's

222
00:16:34,300 --> 00:16:37,760
just prized information.

223
00:16:37,760 --> 00:16:47,440
And so we're pretty open here, and so we're trying to free up any of this knowledge that

224
00:16:47,440 --> 00:16:48,440
may be hidden.

225
00:16:48,440 --> 00:16:53,920
We're not trying to step on anybody's toes, but we're just going for publicly available

226
00:16:53,920 --> 00:16:58,560
data and seeing if there's any knowledge that can be gleaned there.

227
00:16:58,560 --> 00:17:04,880
And long story short, we've got strain names.

228
00:17:04,880 --> 00:17:12,280
In Washington State, we can at least find the THC and CBD for these strains.

229
00:17:12,280 --> 00:17:18,200
And so for example, last time somebody was interested in what's considered type 2 cannabis

230
00:18:18,200 --> 00:18:19,200
in testing.

231
00:17:19,200 --> 00:17:27,880
I messed up my language last time, but I believe it's about equal proportions of THC and CBD.

232
00:17:27,880 --> 00:17:34,960
Say the flower may have 7% THC and 7% CBD.

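As a rough sketch of the ratio idea above, the snippet below sorts a sample into type 1, 2, or 3 by its THC:CBD ratio. The function name `chemotype` and the cut-offs (a ratio above 5 counts as THC-dominant, below 0.2 as CBD-dominant, anything in between as balanced type 2) are assumptions for illustration; published thresholds vary between sources.

```python
# Hypothetical helper: classify a sample by its THC:CBD ratio.
# ASSUMPTION: the cut-offs below are illustrative; published thresholds vary.

def chemotype(thc: float, cbd: float, upper: float = 5.0, lower: float = 0.2) -> str:
    """Return 'Type I', 'Type II', or 'Type III' from percent THC and CBD."""
    if thc <= 0 and cbd <= 0:
        raise ValueError("need a non-zero THC or CBD measurement")
    if cbd == 0:
        return "Type I"  # THC present, no measurable CBD
    ratio = thc / cbd
    if ratio > upper:
        return "Type I"  # THC-dominant
    if ratio < lower:
        return "Type III"  # CBD-dominant
    return "Type II"  # roughly balanced THC and CBD


print(chemotype(7.0, 7.0))   # Type II
print(chemotype(22.0, 0.1))  # Type I
print(chemotype(0.5, 12.0))  # Type III
```

Using a ratio rather than fixed percentages means a 3% / 3% sample and a 7% / 7% sample land in the same group.
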
233
00:17:34,960 --> 00:17:43,720
Well we could actually look at all the flowers in Washington State that were tested and try

234
00:17:43,720 --> 00:17:50,840
to see which other ones were also near 7% THC and 7% CBD.

235
00:17:50,840 --> 00:17:58,600
And then we could say, okay, well, which of those are in your area of Washington State?

236
00:17:58,600 --> 00:18:03,280
And which ones were in stock last month?

237
00:18:03,280 --> 00:18:08,480
So that way we could at least produce a list of similar products.

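The filtering steps described above (near 7% THC and 7% CBD, in your area, sold recently) can be sketched in plain Python. The `LabResult` record, its fields, the county values, the tolerance, and the sample records are all hypothetical stand-ins, not the actual Washington State data schema or real results.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class LabResult:
    # HYPOTHETICAL record; field names are illustrative, not the real schema.
    strain_name: str
    thc: float        # percent THC
    cbd: float        # percent CBD
    county: str       # where the product was sold
    last_sold: date   # most recent sale date seen for the product


def similar_products(results, target_thc, target_cbd, tolerance, county, since):
    """Return results within `tolerance` percentage points of both targets
    that were sold in `county` on or after `since`."""
    matches = []
    for r in results:
        near = (abs(r.thc - target_thc) <= tolerance
                and abs(r.cbd - target_cbd) <= tolerance)
        if near and r.county == county and r.last_sold >= since:
            matches.append(r)
    return matches


# Made-up sample records for illustration only.
results = [
    LabResult("Strain A", 7.2, 6.8, "King", date(2022, 11, 20)),
    LabResult("Strain B", 6.5, 7.9, "King", date(2022, 8, 1)),
    LabResult("Strain C", 21.0, 0.2, "King", date(2022, 11, 25)),
    LabResult("Strain D", 7.0, 7.4, "Spokane", date(2022, 11, 30)),
]
hits = similar_products(results, 7.0, 7.0, tolerance=1.0,
                        county="King", since=date(2022, 11, 1))
print([r.strain_name for r in hits])  # ['Strain A']
```

Strain B is close chemically but was last sold in August, and Strain D is close but in another county, so only Strain A survives all three filters.
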
238
00:18:08,480 --> 00:18:11,440
And that's a really bare-minimum example.

239
00:18:11,440 --> 00:18:15,840
And a lot of what we do here is just sort of a proof of concept.

240
00:18:15,840 --> 00:18:20,080
But as you were saying, this could get substantially more sophisticated.

241
00:18:20,080 --> 00:18:28,040
So instead of just using THC and CBD for product similarity, you would also use things like

242
00:18:28,040 --> 00:18:29,040
terpenes.

243
00:18:29,040 --> 00:18:34,600
And you could even look at, say, reviews.

244
00:18:34,600 --> 00:18:39,760
And that was some of the work we were doing with one of our models, an effects prediction

245
00:18:39,760 --> 00:18:40,760
model.

246
00:18:40,760 --> 00:18:47,840
Looking at people's reviews of strains and seeing if we couldn't correlate those with

247
00:18:47,840 --> 00:18:51,160
the chemical compositions.

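To go beyond THC and CBD, one common approach (not necessarily the one used in the project) is to treat each strain as a vector of chemical measurements, standardize each column, and rank the other strains by Euclidean distance: a plain k-nearest-neighbors lookup. The terpene columns, units, and sample values below are assumptions for illustration.

```python
import math
from statistics import mean, pstdev

# ASSUMPTION: illustrative feature columns (percent THC/CBD, terpenes in mg/g).
FEATURES = ["thc", "cbd", "beta_caryophyllene", "limonene"]


def standardize(rows):
    """Z-score each feature so no single unit scale dominates the distance."""
    stats = {}
    for f in FEATURES:
        values = [r[f] for r in rows]
        sd = pstdev(values)
        stats[f] = (mean(values), sd if sd > 0 else 1.0)  # guard constant columns
    return [{f: (r[f] - stats[f][0]) / stats[f][1] for f in FEATURES} for r in rows]


def nearest(rows, names, query_name, k=2):
    """Return the k strain names closest to `query_name` by Euclidean distance."""
    z = standardize(rows)
    q = z[names.index(query_name)]
    scored = []
    for name, vec in zip(names, z):
        if name == query_name:
            continue
        d = math.sqrt(sum((vec[f] - q[f]) ** 2 for f in FEATURES))
        scored.append((d, name))
    scored.sort()
    return [name for _, name in scored[:k]]


# Made-up sample data for illustration only.
names = ["Strain A", "Strain B", "Strain C", "Strain D"]
rows = [
    {"thc": 20.0, "cbd": 0.1, "beta_caryophyllene": 3.0, "limonene": 1.0},
    {"thc": 21.0, "cbd": 0.2, "beta_caryophyllene": 2.8, "limonene": 1.1},
    {"thc": 7.0, "cbd": 7.0, "beta_caryophyllene": 1.0, "limonene": 3.0},
    {"thc": 19.0, "cbd": 0.1, "beta_caryophyllene": 0.5, "limonene": 4.0},
]
print(nearest(rows, names, "Strain A", k=2))  # ['Strain B', 'Strain D']
```

Standardizing first matters because the THC values in this made-up example are on a much larger scale than the terpene values; without it, THC differences would dominate the distance. Reviews or reported effects could be added as further columns once they are encoded numerically.
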
248
00:18:51,160 --> 00:18:54,600
So lots of interesting work to be done there.

249
00:18:54,600 --> 00:18:58,280
And I'll share something with you real quick.

250
00:18:58,280 --> 00:19:03,640
But first, Tammy, love to see you here at the meetup.

251
00:19:03,640 --> 00:19:05,880
We're finishing the year off strong.

252
00:19:05,880 --> 00:19:07,600
And I've got a surprise for you all.

253
00:19:07,600 --> 00:19:12,640
Before I share this with you, we'd love to hear about any big projects you have planned

254
00:19:12,640 --> 00:19:20,960
for 2023 or any way that we may be able to help you out in the coming year.

255
00:19:20,960 --> 00:19:27,800
Well, I think my main project is to find a job.

256
00:19:27,800 --> 00:19:32,480
That's what most of my time is spent on right now.

257
00:19:32,480 --> 00:19:37,680
So I'll have to see if the ad is still valid.

258
00:19:37,680 --> 00:19:42,000
I don't know if you have much of a chemistry background.

259
00:19:42,000 --> 00:19:50,960
But the most in-demand jobs that I've seen are people looking for laboratory analysts

260
00:19:50,960 --> 00:19:53,640
and chemists and directors.

261
00:19:53,640 --> 00:19:58,600
The director would be, you know, they would of course want, of course they would love

262
00:19:58,600 --> 00:20:02,440
someone with a PhD, but they may settle for someone with a master's.

263
00:20:02,440 --> 00:20:08,880
But if that's your background, I think there is a laboratory in Oklahoma City that may

264
00:20:08,880 --> 00:20:09,880
be hiring.

265
00:20:09,880 --> 00:20:16,800
And then I've seen laboratories popping up in Louisiana that look like they're in desperate

266
00:20:16,800 --> 00:20:19,080
need for scientists.

267
00:20:19,080 --> 00:20:26,720
And I do believe laboratories in New York and perhaps New Jersey are hiring as well.

268
00:20:26,720 --> 00:20:33,480
So I'll have to keep an eye out for other jobs, but I do know people are in desperate

269
00:20:33,480 --> 00:20:37,080
need for good scientists in the lab.

270
00:20:37,080 --> 00:20:40,560
And in fact, I don't know if you're willing to relocate.

271
00:20:40,560 --> 00:20:51,880
Well, my sister works at an environmental laboratory in North Carolina called EMSL.

272
00:20:51,880 --> 00:20:55,560
And they're actually all on the East Coast.

273
00:20:55,560 --> 00:21:01,960
And she said they're looking into getting into the cannabis testing.

274
00:21:01,960 --> 00:21:03,520
Exactly.

275
00:21:03,520 --> 00:21:10,400
And I've heard of a laboratory in North Carolina as well that tests for hemp.

276
00:21:10,400 --> 00:21:15,360
And they weren't really on my radar, but I think exactly once.

277
00:21:15,360 --> 00:21:20,600
So the way laboratory testing works is it's actually kind of unusual to have a bunch of

278
00:21:20,600 --> 00:21:26,200
laboratories in all these different states, because normally you could just get a sample

279
00:21:26,200 --> 00:21:30,720
and put it in the mail and send it anywhere in the country you want.

280
00:21:30,720 --> 00:21:33,400
In cannabis, you can't do that.

281
00:21:33,400 --> 00:21:38,560
So for example, there's probably already well established laboratories in certain places

282
00:21:38,560 --> 00:21:40,920
in the country and they do tons of testing.

283
00:21:40,920 --> 00:21:43,920
They just haven't gotten into cannabis yet.

284
00:21:43,920 --> 00:21:51,720
And so, for example, Louisiana, they need a testing laboratory.

285
00:21:51,720 --> 00:21:58,720
Now they have to find a bunch of scientists and chemists and microbiologists in Louisiana.

286
00:21:58,720 --> 00:22:02,200
And I'm sure they're there, but maybe they're already employed.

287
00:22:02,200 --> 00:22:04,160
So that's the other thing.

288
00:22:04,160 --> 00:22:06,340
So it's tricky.

289
00:22:06,340 --> 00:22:10,380
And then the other laboratory that I know that's starting up, and this is a lovely place

290
00:22:10,380 --> 00:22:17,800
in the country if you want to live there, is I do believe Confidence Analytics, a laboratory

291
00:22:17,800 --> 00:22:20,480
that is out of Seattle, Washington.

292
00:22:20,480 --> 00:22:25,720
I think they're starting a laboratory in Grover Beach, California.

293
00:22:25,720 --> 00:22:29,560
And so that would be a lovely part of the country to live in.

294
00:22:29,560 --> 00:22:34,200
I know they were looking for a scientific director, but they may look for other positions

295
00:22:34,200 --> 00:22:35,200
as well.

296
00:22:35,200 --> 00:22:41,840
Like I said, those were just jobs that came across my radar and I'll do a better job at

297
00:22:41,840 --> 00:22:45,600
putting all this material in the Slack in the coming year because there are opportunities

298
00:22:45,600 --> 00:22:46,600
out there.

299
00:22:46,600 --> 00:22:53,160
Like I said, I don't know how tied down you are to your location.

300
00:22:53,160 --> 00:23:00,400
That's often a thing, but I'll keep posting these opportunities for people.

301
00:23:00,400 --> 00:23:05,400
Sometimes cannabis-friendly positions are in need.

302
00:23:05,400 --> 00:23:06,400
Yes.

303
00:23:06,400 --> 00:23:11,040
I just wanted to share with you some other opportunities coming up.

304
00:23:11,040 --> 00:23:18,600
So I had these made when I was in Las Vegas, but I never got them.

305
00:23:18,600 --> 00:23:19,600
I don't know.

306
00:23:19,600 --> 00:23:22,120
They didn't arrive in time.

307
00:23:22,120 --> 00:23:25,840
So we were talking about this effects prediction model.

308
00:23:25,840 --> 00:23:31,480
So here are two of these prediction model shirts.

309
00:23:31,480 --> 00:23:37,520
And then I also have, we have the COA parser.

310
00:23:37,520 --> 00:23:42,720
So we've got two CoADoc shirts.

311
00:23:42,720 --> 00:23:44,720
So that's cool.

312
00:23:44,720 --> 00:23:50,520
So if anybody wants a shirt, if you just want to buy them outright, I was thinking just

313
00:23:50,520 --> 00:23:54,880
to charge cost, which would be about $45.

314
00:23:54,880 --> 00:24:00,160
So if you do want to buy one at cost, you can, but I'm going to be putting these shirts

315
00:24:00,160 --> 00:24:02,280
up in the coming year for competitions.

316
00:24:02,280 --> 00:24:07,800
And like I said, if you want, if you desperately want a shirt, I may list them on the website

317
00:24:07,800 --> 00:24:10,440
or you can email me and we can get you a shirt.

318
00:24:10,440 --> 00:24:16,680
But I think I was going to start thinking of interesting data science projects or competitions

319
00:24:16,680 --> 00:24:22,040
that people can participate in and I'll send you a shirt.

320
00:24:22,040 --> 00:24:30,280
And the idea is to get some proceeds and start sending people to conferences.

321
00:24:30,280 --> 00:24:39,440
So I want to start sending people to all these conferences. The one whose organizer I've actually

322
00:24:39,440 --> 00:24:46,280
met is CannaCon, and it's a family-run business.

323
00:24:46,280 --> 00:24:56,000
So it's, you know, a lot of the family members run various duties like marketing and arrangements.

324
00:24:56,000 --> 00:24:57,640
So I'll have to get a list.

325
00:24:57,640 --> 00:25:04,640
I had written them down here, but I can't find it on this short notice.

326
00:25:04,640 --> 00:25:08,080
But anywho, yes.

327
00:25:08,080 --> 00:25:09,320
So here they are.

328
00:25:09,320 --> 00:25:11,960
So they're coming up in 2023.

329
00:25:11,960 --> 00:25:14,640
There's Mississippi.

330
00:25:14,640 --> 00:25:18,080
There's Oklahoma City CannaCon.

331
00:25:18,080 --> 00:25:22,360
And you do not want to miss that one if you can get to Oklahoma.

332
00:25:22,360 --> 00:25:26,200
So that's in, that's on March 31st.

333
00:25:26,200 --> 00:25:32,640
And that's probably the, aside from the MJBizCon in Las Vegas, but you know, Las Vegas

334
00:25:32,640 --> 00:25:40,120
is pretty pricey, but this is a real affordable conference and just it's real fun.

335
00:25:40,120 --> 00:25:45,320
So, I think the past couple of years there have been 10,000-plus people.

336
00:25:45,320 --> 00:25:49,520
So it's a real good way to meet people in the cannabis industry.

337
00:25:49,520 --> 00:25:57,000
So if you are looking to get a job in Oklahoma, then bring your resume and a big smile and

338
00:25:57,000 --> 00:26:00,880
some business cards and just talk with as many people as you can.

339
00:26:00,880 --> 00:26:03,000
So that's a good opportunity there.

340
00:26:03,000 --> 00:26:12,480
And then we've got New Mexico coming up in May, California, Long Beach in August, and

341
00:26:12,480 --> 00:26:13,480
then Detroit, Michigan.

342
00:26:13,480 --> 00:26:17,400
And there may be some other ones, Detroit's next October.

343
00:26:17,400 --> 00:26:21,000
So a bunch of these cool CannaCons coming up.

344
00:26:21,000 --> 00:26:27,320
And I'm going to try to think of ways that we can do competitions because ideally I'm

345
00:26:27,320 --> 00:26:36,040
trying to fund you to get to these conferences and, if you want, wear your Cannlytics shirt

346
00:26:36,040 --> 00:26:37,280
while you're there.

347
00:26:37,280 --> 00:26:43,840
So anywho, if you're in any of these areas, be in touch and we'll try to think of some

348
00:26:43,840 --> 00:26:46,720
cool competitions to get you there.

349
00:26:46,720 --> 00:26:53,560
So that's just kind of something fun to end the year with.

350
00:26:53,560 --> 00:26:59,720
So shall we get into some of this data here?

351
00:26:59,720 --> 00:27:04,800
And then I'll share with you, Sammy, how you can answer this question about how you can

352
00:27:04,800 --> 00:27:09,240
find similar products in a nice logical manner.

353
00:27:09,240 --> 00:27:10,240
So, okay.

354
00:27:10,240 --> 00:27:11,240
Sounds good.

355
00:27:11,240 --> 00:27:12,240
Yeah.

356
00:27:12,240 --> 00:27:13,240
Okay.

357
00:27:13,240 --> 00:27:14,240
Let's do this.

358
00:27:14,240 --> 00:27:15,240
Okay.

359
00:27:15,240 --> 00:27:20,640
So everybody's been talking about it.

360
00:27:20,640 --> 00:27:28,320
So we wouldn't be doing our due diligence if we didn't check out ChatGPT.

361
00:27:28,320 --> 00:27:36,560
And so this morning I just said, oh, you know, can you please, you know, create a table of,

362
00:27:36,560 --> 00:27:45,920
you know, 20 terpenes that are found in cannabis?

363
00:27:45,920 --> 00:27:53,280
Because, you know, I was trying to think of terpenes off of the top of my head and the

364
00:27:53,280 --> 00:27:58,920
only one I could think of super, super quick was beta-caryophyllene.

365
00:27:58,920 --> 00:28:05,760
And so now say we had this tool with us, you know, now all I have to do is say, oh, you

366
00:28:05,760 --> 00:28:13,120
know, here's a nice table of, you know, 20 terpenes that are found in cannabis.

367
00:28:13,120 --> 00:28:18,760
Interestingly, they did list a cannabinoid.

368
00:28:18,760 --> 00:28:27,400
Interesting fact, fun fact, cannabinoids are in fact terpenophenolics, built partly from terpenes.

369
00:28:27,400 --> 00:28:38,160
And in fact, I've been reading a, right, I said last week, do your reading.

370
00:28:38,160 --> 00:28:45,360
Because I think I've been in the laboratory, or tangential to the laboratory,

371
00:28:45,360 --> 00:28:49,080
space for too long without knowing chemistry.

372
00:28:49,080 --> 00:28:58,680
So this is just my sister's old chemistry book, just organic chemistry by Joseph Hornback.

373
00:28:58,680 --> 00:29:06,080
And you know, stand on the shoulders of giants and he's just an incredible chemist.

374
00:29:06,080 --> 00:29:11,120
The book is, it's pretty dense, but there's some real interesting things in there.

375
00:29:11,120 --> 00:29:16,400
And I was going to start sharing some of the fun ones with you next week, but

376
00:29:16,400 --> 00:29:19,240
it's all chemistry at the end of the day.

377
00:29:19,240 --> 00:29:25,280
And so Sammy, that's actually, I think an interesting thing that we may need to do is

378
00:29:25,280 --> 00:29:31,520
sort of tie some of the chemistry to these strain names.

379
00:29:31,520 --> 00:29:41,400
So for example, this chemist, that was another thing I was going to ask for.

380
00:29:41,400 --> 00:29:50,280
Let's see if we can get the terpenes found in cannabis with their chemical

381
00:29:50,280 --> 00:29:53,040
structure.

382
00:29:53,040 --> 00:30:04,520
So this is what we're used to seeing, right, peppery or woody or what have you.

383
00:30:04,520 --> 00:30:07,440
This isn't the right thing.

384
00:30:07,440 --> 00:30:16,920
I would actually, like I said, I'm not a chemist, so I'm not even using the right terminology.

385
00:30:16,920 --> 00:30:27,200
Hold on one second, let me try one more thing.

386
00:30:27,200 --> 00:30:29,120
Does anybody know what this is called?

387
00:30:29,120 --> 00:30:44,600
The chemical, the bottle maybe?

388
00:30:44,600 --> 00:30:50,640
Are you talking about the arrangement of, like given, like those chemical formulas just

389
00:30:50,640 --> 00:30:52,480
list the number of each atom.

390
00:30:52,480 --> 00:30:55,160
Oh yeah, the chemical formulas.

391
00:30:55,160 --> 00:30:56,160
Yeah.

392
00:30:56,160 --> 00:31:01,160
But there's something, an isomer, isomers are like the different ways that you can arrange

393
00:31:01,160 --> 00:31:06,400
the same collection, but in different shapes, right?

394
00:31:06,400 --> 00:31:08,480
Is that the word you're looking for?

395
00:31:08,480 --> 00:31:16,680
Well, essentially I do believe chemists have like a shorthand way to just depict these

396
00:31:16,680 --> 00:31:27,520
chemical formulas, like the chemical diagrams that you're used to seeing, like the hexagon

397
00:31:27,520 --> 00:31:29,960
rings where each corner is a carbon, I think.

398
00:31:29,960 --> 00:31:30,960
Yeah.

399
00:31:30,960 --> 00:31:39,800
But anywho, I won't spend too much time on this, but long story short is a good chemist

400
00:31:39,800 --> 00:31:46,960
can see the chemical formula and tell you properties about the compound.

401
00:31:46,960 --> 00:31:52,560
So I think we need to start incorporating chemistry more.

402
00:31:52,560 --> 00:32:02,840
So we should also look at some of the properties of these molecules, for example, boiling

403
00:32:02,840 --> 00:32:06,400
point, melting point.

404
00:32:06,400 --> 00:32:09,880
And I think this is going to help us a lot more in our efforts.

405
00:32:09,880 --> 00:32:14,240
But anywho, I'm getting a little sidetracked there.

406
00:32:14,240 --> 00:32:17,940
Let me get back to the data at hand.

407
00:32:17,940 --> 00:32:22,500
So I wanted to share with you an interesting tool.

408
00:32:22,500 --> 00:32:29,440
So I mentioned at the beginning how it's useful to create diagrams.

409
00:32:29,440 --> 00:32:37,720
And so this is a cool tool, Python Diagrams, for creating really beautiful diagrams.

410
00:32:37,720 --> 00:32:45,480
And so once again, you can find the code on GitHub for those who are interested in the

411
00:32:45,480 --> 00:32:46,480
code.

412
00:32:46,480 --> 00:32:52,840
I'm going to focus more here on the actual diagram itself, since this is more what we're

413
00:32:52,840 --> 00:32:53,840
interested in.

414
00:32:53,840 --> 00:32:54,840
Okay.

415
00:32:54,840 --> 00:33:06,400
So back in the day, people didn't want to be traced when they were cultivating cannabis.

416
00:33:06,400 --> 00:33:09,160
And now it's the exact opposite.

417
00:33:09,160 --> 00:33:13,560
People are getting traced down to the plant.

418
00:33:13,560 --> 00:33:17,960
So this is interesting and it's interesting data.

419
00:33:17,960 --> 00:33:24,560
I don't want to just say this over and over and over again, but really there's no other

420
00:33:24,560 --> 00:33:26,240
industry like it.

421
00:33:26,240 --> 00:33:29,000
We've just got this awesome data.

422
00:33:29,000 --> 00:33:31,760
It's something that people are really interested in.

423
00:33:31,760 --> 00:33:34,140
People love the cannabis industry.

424
00:33:34,140 --> 00:33:35,480
It's high growth.

425
00:33:35,480 --> 00:33:38,680
And then there's just all this phenomenal public data.

426
00:33:38,680 --> 00:33:42,260
So just best of both worlds here.

427
00:33:42,260 --> 00:33:50,840
So just want to share with you what cannabis data there is, at least in Washington state,

428
00:33:50,840 --> 00:33:58,040
how we can go about connecting it and the various statistics that can be calculated.

429
00:33:58,040 --> 00:34:02,640
And it's similar, but different in other states.

430
00:34:02,640 --> 00:34:05,200
Every state's a little different.

431
00:34:05,200 --> 00:34:11,960
And that's why we need ambassadors or cannabis data scientists all over the place in different

432
00:34:11,960 --> 00:34:14,080
states, different countries.

433
00:34:14,080 --> 00:34:21,400
For example, some probably need Canadian data scientists to help out with all of the numbers

434
00:34:21,400 --> 00:34:23,920
that can be crunched in Canada.

435
00:34:23,920 --> 00:34:26,760
But like I said, we can at least start somewhere.

436
00:34:26,760 --> 00:34:30,280
So we can start here in Washington state.

437
00:34:30,280 --> 00:34:40,920
So Washington licenses businesses to operate with cannabis, so they

438
00:34:40,920 --> 00:34:48,880
can either cultivate, process, test, or sell cannabis.

439
00:34:48,880 --> 00:34:54,400
And we have various data points here for the licensees.

440
00:34:54,400 --> 00:34:59,960
The main things that are of interest are, of course, their name.

441
00:34:59,960 --> 00:35:05,760
If you wanted to look them up, do a search for them.

442
00:35:05,760 --> 00:35:11,280
Knowing that their license was actually issued is fairly interesting.

443
00:35:11,280 --> 00:35:15,640
And just knowing, really, the zip code's the most useful.

444
00:35:15,640 --> 00:35:18,440
Just sort of knowing their geography.

445
00:35:18,440 --> 00:35:25,200
That way you can do county by county statistics or even zip code by zip code statistics.

446
00:35:25,200 --> 00:35:28,560
I haven't gotten that granular yet.

447
00:35:28,560 --> 00:35:33,020
But I think there would be some really fruitful analysis along those lines.

448
00:35:33,020 --> 00:35:43,760
So if anyone's ambitious and wants to do a zip code level analysis, then I haven't seen

449
00:35:43,760 --> 00:35:45,560
one yet for Washington state.

450
00:35:45,560 --> 00:35:47,120
But one may exist.

451
00:35:47,120 --> 00:35:50,240
Just because I haven't seen it doesn't mean someone hasn't done it.
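A minimal sketch of a zip-code and county roll-up, assuming each licensee record carries a zip code. The field names (`licensee_id`, `zip_code`) and the `ZIP_TO_COUNTY` table are hypothetical; the actual CCRS columns may differ, and a real analysis would use a published zip-to-county crosswalk.

```python
from collections import Counter

# Hypothetical licensee records; real CCRS column names may differ.
licensees = [
    {"licensee_id": "L1", "name": "Example Farm A", "zip_code": "98101"},
    {"licensee_id": "L2", "name": "Example Farm B", "zip_code": "98101-1234"},
    {"licensee_id": "L3", "name": "Example Shop C", "zip_code": "99201"},
]

# Hypothetical zip-to-county lookup; use a published crosswalk in practice.
ZIP_TO_COUNTY = {"98101": "King", "99201": "Spokane"}


def count_by_zip(rows):
    """Count licensees per five-digit zip code (ZIP+4 suffixes are dropped)."""
    return Counter(row["zip_code"][:5] for row in rows)


def count_by_county(rows, zip_to_county):
    """Roll zip codes up to counties; zips missing from the lookup become 'Unknown'."""
    return Counter(zip_to_county.get(row["zip_code"][:5], "Unknown") for row in rows)


zip_counts = count_by_zip(licensees)
county_counts = count_by_county(licensees, ZIP_TO_COUNTY)
print(zip_counts, county_counts)
```

The same grouping works for any per-licensee statistic (sales, plants harvested) once it is joined to the licensee's zip code.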

452
00:35:50,240 --> 00:35:51,240
Okay.

453
00:35:51,240 --> 00:35:53,880
So you've got licensees.

454
00:35:53,880 --> 00:35:57,080
Well what are they doing?

455
00:35:57,080 --> 00:36:02,440
First and foremost, the cultivators are growing plants.

456
00:36:02,440 --> 00:36:06,440
So here are all the data points we have for plants.

457
00:36:06,440 --> 00:36:10,920
And you see that it's tied to licensee.

458
00:36:10,920 --> 00:36:17,280
Well it's not super clear, but it's tied to licensee by licensee ID.

459
00:36:17,280 --> 00:36:19,720
That's how those connect.

460
00:36:19,720 --> 00:36:25,200
And then your plants are in an area.

461
00:36:25,200 --> 00:36:31,720
So they may be in flower room A or flower room B or what have you.

462
00:36:31,720 --> 00:36:37,160
Each plant has a strain.

463
00:36:37,160 --> 00:36:42,440
So strains are a whole other type of data field here.

464
00:36:42,440 --> 00:36:52,200
And as we'll start to see, the strains data set is almost antiquated.

465
00:36:52,200 --> 00:37:03,280
So basically my terminology here is each one of these is going to be a data set.

466
00:37:03,280 --> 00:37:06,360
And these are all data sets.

467
00:37:06,360 --> 00:37:10,120
So if you have better terminology, then please share.

468
00:37:10,120 --> 00:37:18,200
Long story short, as we'll see, the strains data set is almost antiquated.

469
00:37:18,200 --> 00:37:25,200
And as Sammy was pointing out, other data sets or other data points, in particular lab

470
00:37:25,200 --> 00:37:28,360
results, may be better.

471
00:37:28,360 --> 00:37:36,520
There's still a lot in a name, but as I'll show you, the strains data set is one of the smaller

472
00:37:36,520 --> 00:37:37,760
data sets here.

473
00:37:37,760 --> 00:37:38,760
All right, cool.

474
00:37:38,760 --> 00:37:40,840
So we've got plants.

475
00:37:40,840 --> 00:37:52,480
If you need to see when anything was destroyed, you can match those to the various plants.

476
00:37:52,480 --> 00:38:00,880
If anybody wants to do plant level statistics, then some of the things that are on the agenda

477
00:38:00,880 --> 00:38:07,080
that haven't been calculated yet are canopy.

478
00:38:07,080 --> 00:38:13,760
This is something that the Washington state regulatory body itself is concerned about.

479
00:38:13,760 --> 00:38:24,640
So in 2015, there were 2.5 million square feet of cannabis plants.

480
00:38:24,640 --> 00:38:36,320
So we would love to know how many square feet of canopy there are in 2022.

481
00:38:36,320 --> 00:38:41,960
And you may be able to parse this out of this data.

482
00:38:41,960 --> 00:38:47,520
You may have to make some assumptions as to how many square feet a particular plant takes

483
00:38:47,520 --> 00:38:48,520
up.

484
00:38:48,520 --> 00:38:55,960
So I don't know if you can calculate square feet, but I think we've calculated number

485
00:38:55,960 --> 00:38:58,000
of plants before.
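A back-of-the-envelope sketch of the canopy estimate described above. The square feet per plant is purely an assumption you supply, and the plant count below is a hypothetical placeholder, not a figure from the data.

```python
def estimate_canopy_sq_ft(plant_count, sq_ft_per_plant):
    """Estimate canopy as plant count times an assumed footprint per plant."""
    if plant_count < 0 or sq_ft_per_plant <= 0:
        raise ValueError("plant_count must be >= 0 and sq_ft_per_plant > 0")
    return plant_count * sq_ft_per_plant


# Hypothetical plant count; try several footprints to see how sensitive the estimate is.
plants_in_flower = 500_000
for footprint in (2.0, 4.0, 9.0):
    print(footprint, estimate_canopy_sq_ft(plants_in_flower, footprint))
```

Reporting a range of footprints rather than a single number keeps the assumption visible next to the result.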

486
00:38:58,000 --> 00:39:04,160
So that's one thing, just to calculate the number of plants, because you can do it by

487
00:39:04,160 --> 00:39:05,160
date.

488
00:39:05,160 --> 00:39:12,760
So for example, you could say, oh, how many plants were harvested day by day?

489
00:39:12,760 --> 00:39:15,440
So that's something that's of interest.

490
00:39:15,440 --> 00:39:18,480
Then you can go granular.

491
00:39:18,480 --> 00:39:25,840
So the way I always recommend people approach statistics is to start with the aggregate and

492
00:39:25,840 --> 00:39:28,840
then keep adding conditions.

493
00:39:28,840 --> 00:39:37,520
So the aggregate would just be how many plants were harvested in 2022.

494
00:39:37,520 --> 00:39:41,360
Then the first condition would be by day.

495
00:39:41,360 --> 00:39:48,160
Actually, before that, the first condition is by licensee.

496
00:39:48,160 --> 00:39:56,760
So how many plants were harvested, then by licensee, and then by day.
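A minimal sketch of "start with the aggregate, then add conditions", using plain Python on a few hypothetical plant records. The field names (`licensee_id`, `harvest_date`) are illustrative assumptions, not CCRS's exact schema.

```python
from collections import Counter
from datetime import date

# Hypothetical plant records; field names are illustrative, not CCRS's exact schema.
plants = [
    {"plant_id": "P1", "licensee_id": "L1", "harvest_date": date(2022, 3, 1)},
    {"plant_id": "P2", "licensee_id": "L1", "harvest_date": date(2022, 3, 1)},
    {"plant_id": "P3", "licensee_id": "L2", "harvest_date": date(2022, 3, 2)},
    {"plant_id": "P4", "licensee_id": "L2", "harvest_date": None},  # not harvested yet
]

# Keep only plants harvested in 2022.
harvested = [
    p for p in plants
    if p["harvest_date"] is not None and p["harvest_date"].year == 2022
]

# Aggregate: how many plants were harvested in 2022?
total_harvested = len(harvested)

# First condition: by licensee.
by_licensee = Counter(p["licensee_id"] for p in harvested)

# Second condition: by licensee and by day.
by_licensee_day = Counter((p["licensee_id"], p["harvest_date"]) for p in harvested)

print(total_harvested, by_licensee, by_licensee_day)
```

Each added condition is just a longer grouping key, so the same pattern extends to month, strain, or area.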

497
00:39:56,760 --> 00:40:01,760
And then you can do lots of interesting things with that.

498
00:40:01,760 --> 00:40:05,080
Then you could just say, oh, how many plants were harvested by day?

499
00:40:05,080 --> 00:40:08,440
How many plants were harvested by month?

500
00:40:08,440 --> 00:40:11,240
Who were the top cultivators?

501
00:40:11,240 --> 00:40:16,480
Which licensee harvested the most plants?

502
00:40:16,480 --> 00:40:19,720
Who harvested the most plants in any given month?

503
00:40:19,720 --> 00:40:22,440
Who harvested the most on average?

504
00:40:22,440 --> 00:40:27,200
What is the average number of plants harvested per month?

505
00:40:27,200 --> 00:40:29,480
You can do all of these statistics.

506
00:40:29,480 --> 00:40:35,280
And I think these are quite interesting to, say, a cultivator.

507
00:40:35,280 --> 00:40:40,120
So for example, if you're a cultivator in Washington state, wouldn't it be interesting

508
00:40:40,120 --> 00:40:46,880
to know what's the average number of plants being harvested on a month by month basis

509
00:40:46,880 --> 00:40:48,960
by all the other cultivators?

510
00:40:48,960 --> 00:40:56,440
That way you can know if you sit above or below average.

511
00:40:56,440 --> 00:41:00,280
And maybe you already have an idea of if you're above or below average.

512
00:41:00,280 --> 00:41:05,320
Well, you can find out what percentile you're in.

513
00:41:05,320 --> 00:41:07,520
Are you in the 80th percentile?

514
00:41:07,520 --> 00:41:11,000
Are you in the X percentile?
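A minimal sketch of the monthly averages, top cultivator, and percentile questions above, on hypothetical harvest data. Two assumptions worth flagging: the average counts only months in which a licensee actually harvested, and "percentile" here means the share of licensees at or below a given value; other definitions are equally valid.

```python
from collections import Counter, defaultdict
from datetime import date
from statistics import mean

# Hypothetical (licensee_id, harvest_date) pairs, one per harvested plant.
harvests = [
    ("L1", date(2022, 1, 5)), ("L1", date(2022, 1, 20)), ("L1", date(2022, 2, 3)),
    ("L2", date(2022, 1, 7)),
    ("L3", date(2022, 1, 9)), ("L3", date(2022, 2, 9)),
    ("L3", date(2022, 2, 10)), ("L3", date(2022, 2, 11)),
]

# Plants harvested per licensee per month, keyed like ("L1", "2022-01").
monthly = Counter((lic, d.strftime("%Y-%m")) for lic, d in harvests)

# Average plants harvested per month for each licensee.
# Assumption: only months in which the licensee harvested count toward the average.
counts_by_licensee = defaultdict(list)
for (lic, _month), n in monthly.items():
    counts_by_licensee[lic].append(n)
avg_monthly = {lic: mean(counts) for lic, counts in counts_by_licensee.items()}


def percentile_rank(value, population):
    """Percentage of the population at or below `value`."""
    return 100 * sum(v <= value for v in population) / len(population)


top_cultivator = max(avg_monthly, key=avg_monthly.get)
l1_rank = percentile_rank(avg_monthly["L1"], list(avg_monthly.values()))
print(avg_monthly, top_cultivator, round(l1_rank, 1))
```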

515
00:41:11,000 --> 00:41:12,000
Go for it.

516
00:41:12,000 --> 00:41:16,280
OK, so plants is an area that I've barely touched.

517
00:41:16,280 --> 00:41:21,680
So if you want to calculate statistics there, there are many, many, many cool plant statistics

518
00:41:21,680 --> 00:41:24,040
that can be calculated.

519
00:41:24,040 --> 00:41:29,400
Next, we've got those connected to strains.

520
00:41:29,400 --> 00:41:34,960
Well, inventory is also connected to strains.

521
00:41:34,960 --> 00:41:49,000
So this is really the only connection between plant and inventory, which may or may not

522
00:41:49,000 --> 00:41:51,280
be a sufficient connection.

523
00:41:51,280 --> 00:41:55,960
I think it requires more investigation.

524
00:41:55,960 --> 00:42:02,540
Because the idea behind the traceability system was right from seed to sale.

525
00:42:02,540 --> 00:42:12,240
So ideally, you would like to be able to trace a plant or plants all the way to sale.

526
00:42:12,240 --> 00:42:16,080
You may not be able to go to a particular plant.

527
00:42:16,080 --> 00:42:21,280
While this has a plant ID, this field's actually blank.

528
00:42:21,280 --> 00:42:30,600
So I don't think you can trace a sale back to a particular plant.

529
00:42:30,600 --> 00:42:34,240
But you may be able to trace.

530
00:42:34,240 --> 00:42:43,440
One would hope you could at least trace a sale back to a batch of plants.

531
00:42:43,440 --> 00:42:54,760
So that, I think, would be the goal is if you purchased runt flower, ideally, you would

532
00:42:54,760 --> 00:43:05,000
like to be able to track this back and see at least which producer produced that runt.

533
00:43:05,000 --> 00:43:11,040
And ideally, you would like to get some estimate of the harvest date.

534
00:43:11,040 --> 00:43:14,600
So that's something that we'll work on.

535
00:43:14,600 --> 00:43:17,160
I think that's going to require a fair amount of work.

536
00:43:17,160 --> 00:43:21,160
And I'll share with you what I have gotten done.

537
00:43:21,160 --> 00:43:25,080
So we'll come back to inventory here momentarily.

538
00:43:25,080 --> 00:43:30,200
Let's jump to sales since we just mentioned it.

539
00:43:30,200 --> 00:43:36,480
So on the far other side of the graph, we've got sales.

540
00:43:36,480 --> 00:43:45,160
So this is what happens when, well, actually, when a consumer makes a purchase, this is

541
00:43:45,160 --> 00:43:51,000
basically, you can think of the sale header as the receipt.

542
00:43:51,000 --> 00:43:53,400
So this is the receipt.

543
00:43:53,400 --> 00:44:01,360
And all you have there are the licensee, that is, the retailer who sold it, and when they sold it.

544
00:44:01,360 --> 00:44:04,560
OK, so that's all you have.

545
00:44:04,560 --> 00:44:09,840
Well, now you can match the items.

546
00:44:09,840 --> 00:44:14,360
So these are all the sales items, the sales details.

547
00:44:14,360 --> 00:44:20,360
You can match the items that were on that receipt.

548
00:44:20,360 --> 00:44:25,840
So you actually have to make this connection to find the licensee.

549
00:44:25,840 --> 00:44:32,120
So as you see, sales detail doesn't have the licensee.

550
00:44:32,120 --> 00:44:34,460
Connect it to the sale headers.

551
00:44:34,460 --> 00:44:37,840
Now we know the licensee.

552
00:44:37,840 --> 00:44:40,240
And we know the sale date.

553
00:44:40,240 --> 00:44:41,560
Awesome.

554
00:44:41,560 --> 00:44:45,760
So you have to merge these.

555
00:44:45,760 --> 00:44:50,000
Sorry if this is dry, but I'll get to some interesting statistics towards the end.

556
00:44:50,000 --> 00:44:56,560
But long story short, you've got to merge the items with the receipts to find out who

557
00:44:56,560 --> 00:44:59,760
sold it and when.

558
00:44:59,760 --> 00:45:09,360
And once you've done that, well, now you can calculate sales by retailer by day.
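A minimal sketch of merging sale items with their receipts (sale headers) and then totaling sales by retailer by day. The field names and the assumption that each item has a per-unit price and a quantity are hypothetical; the real CCRS columns may be named and structured differently. Items whose receipt is missing are dropped here, which is an inner join.

```python
from collections import defaultdict

# Hypothetical rows; real CCRS column names may differ.
sale_headers = [
    {"sale_header_id": "H1", "licensee_id": "R1", "sale_date": "2022-11-01"},
    {"sale_header_id": "H2", "licensee_id": "R1", "sale_date": "2022-11-02"},
    {"sale_header_id": "H3", "licensee_id": "R2", "sale_date": "2022-11-01"},
]
sale_details = [
    {"sale_header_id": "H1", "inventory_id": "I1", "unit_price": 25.0, "quantity": 2},
    {"sale_header_id": "H1", "inventory_id": "I2", "unit_price": 10.0, "quantity": 1},
    {"sale_header_id": "H2", "inventory_id": "I1", "unit_price": 25.0, "quantity": 1},
    {"sale_header_id": "H3", "inventory_id": "I3", "unit_price": 40.0, "quantity": 1},
    {"sale_header_id": "H9", "inventory_id": "I3", "unit_price": 40.0, "quantity": 1},  # no receipt
]


def merge_details_with_headers(details, headers):
    """Inner join: attach the retailer and sale date from each item's receipt."""
    header_index = {h["sale_header_id"]: h for h in headers}
    merged = []
    for d in details:
        h = header_index.get(d["sale_header_id"])
        if h is None:
            continue  # item with no matching receipt; drop it (or log it)
        merged.append({**d, "licensee_id": h["licensee_id"], "sale_date": h["sale_date"]})
    return merged


def sales_by_retailer_by_day(merged):
    """Sum price times quantity for each (retailer, day) pair."""
    totals = defaultdict(float)
    for row in merged:
        totals[(row["licensee_id"], row["sale_date"])] += row["unit_price"] * row["quantity"]
    return dict(totals)


merged = merge_details_with_headers(sale_details, sale_headers)
totals = sales_by_retailer_by_day(merged)
print(totals)
```

Counting the dropped items is a cheap data-quality check before trusting the totals.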

559
00:45:09,360 --> 00:45:14,600
And that's exactly what we'll be doing today.

560
00:45:14,600 --> 00:45:22,000
And just to share with you some of this data.

561
00:45:22,000 --> 00:45:24,720
Wait.

562
00:45:24,720 --> 00:45:30,280
Here, I'll come back to that.

563
00:45:30,280 --> 00:45:33,760
We'll keep talking about the data.

564
00:45:33,760 --> 00:45:37,240
So you can calculate sales by licensee by day.

565
00:45:37,240 --> 00:45:38,240
Cool.

566
00:45:38,240 --> 00:45:45,240
Well, we want to know what strain was that.

567
00:45:45,240 --> 00:45:48,480
And then we also want to know what the THC and CBD were.

568
00:45:48,480 --> 00:45:51,200
So let's try to see if we can get those fields.

569
00:45:51,200 --> 00:45:52,200
So how do you do that?

570
00:45:52,200 --> 00:45:55,760
Well, we have the inventory ID.

571
00:45:55,760 --> 00:46:03,080
So now you have to connect sales to inventory.

572
00:46:03,080 --> 00:46:08,200
And that's another way to get the licensee ID.

573
00:46:08,200 --> 00:46:14,600
So we made these connections and now we connect to inventory.

574
00:46:14,600 --> 00:46:21,040
Inventory provides us with some interesting fields, but primarily we now need it to

575
00:46:21,040 --> 00:46:32,960
get us to the product, to get the inventory type, its name, and a description.

576
00:46:32,960 --> 00:46:38,360
Once you have the name, I don't think you need to get the name from the strain anymore,

577
00:46:38,360 --> 00:46:46,400
but you could also connect this to the strain to get more information about the strain.

578
00:46:46,400 --> 00:46:55,880
Then once you have the inventory, you can connect lab results to the inventory item.

579
00:46:55,880 --> 00:47:02,040
So that is how you finally get all of the lab results for that sale.
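A minimal sketch of the chain described above: sale item to inventory, inventory to product, and inventory to lab results. All table shapes and field names here are hypothetical stand-ins for the CCRS data sets. The lab results are indexed by inventory ID once, so each sale item is a dictionary lookup rather than a scan of every result.

```python
from collections import defaultdict

# Hypothetical tables keyed by ID; field names are illustrative, not CCRS's exact schema.
inventory = {
    "I1": {"licensee_id": "P1", "product_id": "PR1", "strain_id": "S1"},
}
products = {
    "PR1": {"inventory_type": "flower", "name": "Example Strain 3.5g", "description": "Flower eighth"},
}
lab_results = [
    {"inventory_id": "I1", "test": "total_thc", "value": 22.1},
    {"inventory_id": "I1", "test": "total_cbd", "value": 0.1},
]


def index_lab_results(rows):
    """Group lab results by inventory ID once, instead of scanning the list per sale item."""
    by_inventory = defaultdict(dict)
    for r in rows:
        by_inventory[r["inventory_id"]][r["test"]] = r["value"]
    return by_inventory


def enrich_sale_item(item, inventory, products, results_index):
    """Follow sale item -> inventory -> product, and attach that inventory's lab results."""
    inv = inventory.get(item["inventory_id"], {})
    product = products.get(inv.get("product_id"), {})
    return {
        **item,
        "inventory_licensee_id": inv.get("licensee_id"),
        "strain_id": inv.get("strain_id"),
        "inventory_type": product.get("inventory_type"),
        "product_name": product.get("name"),
        **results_index.get(item["inventory_id"], {}),
    }


results_index = index_lab_results(lab_results)
sold = enrich_sale_item({"sale_detail_id": "D1", "inventory_id": "I1"}, inventory, products, results_index)
missing = enrich_sale_item({"sale_detail_id": "D2", "inventory_id": "I404"}, inventory, products, results_index)
print(sold)
```

Unmatched IDs come back as `None` fields rather than raising, so gaps in the chain can be counted afterwards.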

580
00:47:02,040 --> 00:47:04,040
So that is phenomenal.

581
00:47:04,040 --> 00:47:09,780
So now we have items that were sold.

582
00:47:09,780 --> 00:47:13,720
We know what type of items they were.

583
00:47:13,720 --> 00:47:17,320
We know which strain they were.

584
00:47:17,320 --> 00:47:21,040
And we even know the lab results.

585
00:47:21,040 --> 00:47:27,880
So the only cannabinoids we have are THC and CBD, but that's okay.

586
00:47:27,880 --> 00:47:30,040
It's better than nothing.

587
00:47:30,040 --> 00:47:32,600
So we now have the lab results.

588
00:47:32,600 --> 00:47:38,840
So we can actually do a lot now, a whole lot.

589
00:47:38,840 --> 00:47:42,640
So let me share with you some of these statistics.

590
00:47:42,640 --> 00:47:49,840
But before I go on, does anybody have any big picture questions here about the CCRS?

591
00:47:49,840 --> 00:48:02,440
Sorry, I might have missed the big frame of this.

592
00:48:02,440 --> 00:48:09,320
This is publicly available data, but what is the incentive behind it?

593
00:48:09,320 --> 00:48:16,480
Is this a curiosity or who wants to know this, like what you're discovering here?

594
00:48:16,480 --> 00:48:19,000
Or are you just demonstrating?

595
00:48:19,000 --> 00:48:23,760
The companies behind this, it's twofold.

596
00:48:23,760 --> 00:48:28,400
I work for Cannlytics and we help people out with cannabis analytics.

597
00:48:28,400 --> 00:48:36,400
And the company that's working on the cannabis data in Washington state is the company Cannabis

598
00:48:36,400 --> 00:48:37,400
Data.

599
00:48:37,400 --> 00:48:45,880
And they're creating an open source data platform for people to be able to access this data.

600
00:48:45,880 --> 00:48:56,760
So the idea is you can do a Freedom of Information Act request to get all of this data.

601
00:48:56,760 --> 00:49:08,360
But Washington announced, I'll show you momentarily, you basically just get a data dump of aggregate,

602
00:49:08,360 --> 00:49:17,680
of all the data together, and it's really hard to use.

603
00:49:17,680 --> 00:49:22,000
So I think Cannabis Data is for-profit.

604
00:49:22,000 --> 00:49:24,440
So everybody's for profit.

605
00:49:24,440 --> 00:49:33,440
But long story short is, I think people want these data points enough that they're willing

606
00:49:33,440 --> 00:49:35,920
to pay people to go pursue them.

607
00:49:35,920 --> 00:49:39,400
So that's the main thing.

608
00:49:39,400 --> 00:49:43,680
So definitely pursuing these for profit.

609
00:49:43,680 --> 00:49:52,160
And in fact, that's what I tell people: this may almost be the most valuable

610
00:49:52,160 --> 00:49:56,720
stage with some of these tools like GPT coming out.

611
00:49:56,720 --> 00:50:07,240
I think the data curation is actually one of the highest value added stages, simply

612
00:50:07,240 --> 00:50:22,800
turning this data from this web of, I guess it's clean, but I don't know, with unstructured,

613
00:50:22,800 --> 00:50:24,800
that's the right word.

614
00:50:24,800 --> 00:50:31,760
Moving unstructured data into structured data, I think is where the value is added.

615
00:50:31,760 --> 00:50:40,240
It's becoming increasingly easy to build apps and websites, and all of a sudden, the actual

616
00:50:40,240 --> 00:50:45,040
material is now astonishingly easy to create.

617
00:50:45,040 --> 00:50:52,400
So I've been creating background images with some of the AI tools that have been put out.

618
00:50:52,400 --> 00:50:56,400
Stable Diffusion is the one that I use the most.

619
00:50:56,400 --> 00:51:01,040
So I've already found a way to work that into my workflow to generate material.

620
00:51:01,040 --> 00:51:07,160
And just today, I showed you a brief way that you could, that was a crude way, but you may

621
00:51:07,160 --> 00:51:11,160
be able to start generating material for your website.

622
00:51:11,160 --> 00:51:18,760
And so long story short, I think structuring the data is the most valuable part.

623
00:51:18,760 --> 00:51:26,080
And like I said, Cannlytics is in the business of helping people structure and analyze their

624
00:51:26,080 --> 00:51:28,800
cannabis data.

625
00:51:28,800 --> 00:51:31,080
There's this project out of Washington State.

626
00:51:31,080 --> 00:51:36,920
I'm more just sharing this with you because this is a project I'm working on.

627
00:51:36,920 --> 00:51:39,520
It's also open source.

628
00:51:39,520 --> 00:51:49,000
So if you can think of a way to profit from creating this data, then by all means, join

629
00:51:49,000 --> 00:51:53,080
in on the fun.

630
00:51:53,080 --> 00:52:02,040
But also, if you need help, then be in touch with me because as I said, in the coming year,

631
00:52:02,040 --> 00:52:04,840
I'm trying to get more and more people involved.

632
00:52:04,840 --> 00:52:14,960
So send me a message on Slack or through email and I'll try to get you in on the projects.

633
00:52:14,960 --> 00:52:16,440
So that's what I'm working on.

634
00:52:16,440 --> 00:52:22,880
And then if you can take any of these ideas and use them, say, in other states or other

635
00:52:22,880 --> 00:52:26,600
data sets, then maybe you can add value there.

636
00:52:26,600 --> 00:52:28,800
So that's sort of the project at hand.

637
00:52:28,800 --> 00:52:39,320
Just an open-source, for-profit data curation project

638
00:52:39,320 --> 00:52:42,720
that Cannlytics and Cannabis Data are working on.

639
00:52:42,720 --> 00:52:49,220
Just wanted to share it with you to say, hey, if you want to help out, you're welcome.

640
00:52:49,220 --> 00:52:53,640
If you want to take anything to use on your own, you're welcome.

641
00:52:53,640 --> 00:53:00,520
And then we'd love to hear any ideas you may have because we're far from perfect.

642
00:53:00,520 --> 00:53:05,600
And so if you have any ideas on how to improve, we're always open to hearing those.

643
00:53:05,600 --> 00:53:09,920
And then, you know, like as I said, you can take our ideas and run with them.

644
00:53:09,920 --> 00:53:11,640
So that's sort of the idea behind it.

645
00:53:11,640 --> 00:53:14,920
It's a win-win project.

646
00:53:14,920 --> 00:53:21,240
Clearly, I still need to do a better job at formulating and explaining it.

647
00:53:21,240 --> 00:53:22,680
But do you have any questions?

648
00:53:22,680 --> 00:53:29,680
Did I do a half-decent job, or do you have more questions at hand?

649
00:53:29,680 --> 00:53:30,680
Go on.

650
00:53:30,680 --> 00:53:32,680
I might have questions later, but yeah.

651
00:53:32,680 --> 00:53:33,680
Okay.

652
00:53:33,680 --> 00:53:37,960
Well, here, why don't I just go ahead and get into some of this data and then that may

653
00:53:37,960 --> 00:53:41,320
make it a bit more clear what we're after here.

654
00:53:41,320 --> 00:53:52,600
So I just made this diagram just to kind of help in the data cleaning stage because it

655
00:53:52,600 --> 00:53:58,480
helps to be able to visualize this because there's a lot going on here.

656
00:53:58,480 --> 00:53:59,480
Okay.

657
00:53:59,480 --> 00:54:02,480
So let's just...

658
00:54:02,480 --> 00:54:15,240
Unfortunately, we may not have the data, which would be kind of...

659
00:54:15,240 --> 00:54:19,320
Let's keep our fingers crossed.

660
00:54:19,320 --> 00:54:23,320
There's actually one more thing I can try.

661
00:54:23,320 --> 00:54:25,720
Okay.

662
00:54:25,720 --> 00:54:35,320
So I'm going to unplug this and plug this back in.

663
00:54:35,320 --> 00:54:40,720
Let me make sure I still have the camera on.

664
00:54:40,720 --> 00:54:43,920
Okay.

665
00:54:43,920 --> 00:54:45,920
We've got the data back.

666
00:54:45,920 --> 00:54:46,920
Okay.

667
00:54:46,920 --> 00:54:47,920
Phenomenal.

668
00:54:47,920 --> 00:54:49,640
Let me just do one last double check.

669
00:54:49,640 --> 00:54:53,640
Sorry for the rocky...

670
00:54:53,640 --> 00:54:59,720
There we go.

671
00:54:59,720 --> 00:55:02,240
Okay.

672
00:55:02,240 --> 00:55:08,040
Everything's going well now.

673
00:55:08,040 --> 00:55:10,880
Thanks for bearing with us.

674
00:55:10,880 --> 00:55:11,880
Okay.

675
00:55:11,880 --> 00:55:16,520
So we've been doing these periodic Freedom of Information Act requests.

676
00:55:16,520 --> 00:55:18,240
We're going to start doing...

677
00:55:18,240 --> 00:55:20,120
Well, we haven't.

678
00:55:20,120 --> 00:55:23,440
Our good friends over at Cannabis Data have been doing them.

679
00:55:23,440 --> 00:55:28,160
And I think they're going to start doing them on a nice month-by-month basis.

680
00:55:28,160 --> 00:55:30,880
So what are you given?

681
00:55:30,880 --> 00:55:34,040
So as I said, this is public data just sitting there.

682
00:55:34,040 --> 00:55:39,600
And we're going to see, okay, can we take public data, get some knowledge out of it,

683
00:55:39,600 --> 00:55:43,120
and do so in a way that we can make a buck?

684
00:55:43,120 --> 00:55:44,120
Okay.

685
00:55:44,120 --> 00:55:48,080
So you can download this data.

686
00:55:48,080 --> 00:55:50,320
How much do we have?

687
00:55:50,320 --> 00:55:56,600
I think we can...

688
00:55:56,600 --> 00:55:57,600
How many bytes?

689
00:55:57,600 --> 00:55:58,600
Okay.

690
00:55:58,600 --> 00:56:01,200
So that's almost 41 and a half.

691
00:56:01,200 --> 00:56:13,560
So you have almost 42 gigabytes of data that was generated between, say, January and late

692
00:56:13,560 --> 00:56:18,240
November of 2022 in Washington State.

693
00:56:18,240 --> 00:56:21,360
So as we were pointing out, this adds up quick.

694
00:56:21,360 --> 00:56:28,400
We should have been approaching this piecemeal earlier in the year, but we get around to

695
00:56:28,400 --> 00:56:30,800
it when we get around to it.

696
00:56:30,800 --> 00:56:36,120
Now what are we given in this huge pile of data?

697
00:56:36,120 --> 00:56:38,480
Well, check it out.

698
00:56:38,480 --> 00:56:42,760
So there's lots and lots of inventory data.

699
00:56:42,760 --> 00:56:52,960
There's a little bit of lab result data.

700
00:56:52,960 --> 00:56:55,120
It's dense and important data.

701
00:56:55,120 --> 00:56:57,680
So we'll explore that more.

702
00:56:57,680 --> 00:57:06,680
There's a fair amount of plant data, product data, and check it out, the bulk are sales.

703
00:57:06,680 --> 00:57:13,880
So this is to be expected.

704
00:57:13,880 --> 00:57:16,800
In fact, consumer facing...

705
00:57:16,800 --> 00:57:18,280
Oh, yes, that's what they call them.

706
00:57:18,280 --> 00:57:22,760
They call them point-of-sale systems.

707
00:57:22,760 --> 00:57:28,000
That type of software deals with enormous volumes of data.

708
00:57:28,000 --> 00:57:32,360
So this is, I think, Walmart's claim to fame.

709
00:57:32,360 --> 00:57:38,200
So I think they were just able to handle enormous amounts of data, much, much better than everyone

710
00:57:38,200 --> 00:57:39,200
else.

711
00:57:39,200 --> 00:57:42,760
I mean, I think they also did logistics really, really well too.

712
00:57:42,760 --> 00:57:47,000
And that just made them a leader.

713
00:57:47,000 --> 00:57:53,680
And so as you can see, if you're a cultivator, you've got some data to deal with, but these

714
00:57:53,680 --> 00:57:59,560
retailers just have an enormous amount of data to deal with.

715
00:57:59,560 --> 00:58:00,920
So how much data is this?

716
00:58:00,920 --> 00:58:10,240
So each of these files ranges from, say, 0.1 to 0.25 gigabytes.

717
00:58:10,240 --> 00:58:15,520
And there are 60 sales details files and 60 sales headers files, so 120 of those.

718
00:58:15,520 --> 00:58:21,560
I'm doing the math off the top of my head, but I want to say that's up to around 30 gigabytes

719
00:58:21,560 --> 00:58:23,600
of sales data.
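A quick back-of-the-envelope check of that estimate, assuming 120 sales files (60 sales details plus 60 sales headers) that each weigh between 0.1 and 0.25 GB, as described above:

```python
# Rough size of the sales files, assuming 120 files
# (60 sales details + 60 sales headers) of 0.1 to 0.25 GB each.
n_files = 120
min_gb, max_gb = 0.1, 0.25

low_estimate = n_files * min_gb   # if every file is at the small end
high_estimate = n_files * max_gb  # if every file is at the large end
print(f"Sales data: roughly {low_estimate:.0f} to {high_estimate:.0f} GB")
```

So "around 30 gigabytes" is the upper end of the range; the true total sits somewhere between about 12 and 30 GB.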

720
00:58:23,600 --> 00:58:32,880
And then look, all of that sales data, and there's just this one meager data file that's

721
00:58:32,880 --> 00:58:38,200
not even 0.1 gigabytes of strain data.

722
00:58:38,200 --> 00:58:47,040
So it's there, but I think it's definitely a minor part of the picture as a whole.

723
00:58:47,040 --> 00:58:53,560
So I think, as you were kind of pointing out, Sammy, I think people do get kind of lost

724
00:58:53,560 --> 00:59:03,880
in these strain names a lot when really the action is in the sales, the

725
00:59:03,880 --> 00:59:09,480
plants that are being grown, of course, the inventory, the products.

726
00:59:09,480 --> 00:59:13,080
I think that's what really matters.

727
00:59:13,080 --> 00:59:17,480
So let's dig into some of that data.

728
00:59:17,480 --> 00:59:27,840
Sorry, I'll be kind of wrapping this up and getting to my main point here.

729
00:59:27,840 --> 00:59:33,640
I'm trying to think about what my main point is.

730
00:59:33,640 --> 00:59:38,600
I probably should have written this down.

731
00:59:38,600 --> 00:59:48,480
Okay, well, I'll just share with you some of the rewards that have been reaped.

732
00:59:48,480 --> 00:59:52,560
So once again, this script is on GitHub.

733
00:59:52,560 --> 00:59:57,600
I'll let those of you who are interested in the code paw through it.

734
00:59:57,600 --> 01:00:11,080
But all I did was, okay, I read in all 60 of these sales details files.

735
01:00:11,080 --> 01:00:20,200
As you do that, say I read this one in, I now have to read in all, you don't actually

736
01:00:20,200 --> 01:00:29,920
have to read in all 60, but you start reading in the 60 sales headers data files.

737
01:00:29,920 --> 01:00:34,520
And then the way I have it programmed is, oh, it just stops once we've matched them

738
01:00:34,520 --> 01:00:35,520
all.

739
01:00:35,520 --> 01:00:40,760
So you read in the sales details, you start reading in the sales headers until they've

740
01:00:40,760 --> 01:00:44,120
all been matched, and then you stop.
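The matching loop described above (read the sales details, then read sales headers batch by batch until every detail row has been matched, then stop) can be sketched in plain Python. This is a minimal illustration, not the script on GitHub: each "batch" stands in for one sales headers file, and the join column `SaleHeaderId` is an assumption about the column name.

```python
from typing import Iterable


def augment_until_matched(
    details: list[dict],
    header_batches: Iterable[list[dict]],
    key: str = "SaleHeaderId",  # assumed join column, for illustration
) -> float:
    """Copy header fields onto sales detail rows, consuming header batches
    one at a time and stopping as soon as every detail row is matched.

    Returns the share of detail rows that found their header."""
    if not details:
        return 1.0

    def share_matched() -> float:
        return sum(row[key] not in unmatched for row in details) / len(details)

    unmatched = {row[key] for row in details}
    for i, batch in enumerate(header_batches, start=1):
        # Keep only the headers this set of details still needs.
        lookup = {h[key]: h for h in batch if h[key] in unmatched}
        for row in details:
            header = lookup.get(row[key])
            if header is not None:
                for field, value in header.items():
                    row.setdefault(field, value)  # never overwrite detail fields
        unmatched.difference_update(lookup)
        print(f"Header batch {i}: matched {share_matched():.0%} of sales details")
        if not unmatched:
            break  # everything matched, so skip the remaining header files
    return share_matched()
```

If `header_batches` is a generator that reads one file per step, the files after the last needed one are never loaded, which is where the time savings come from. The same pattern would then repeat for inventory and products.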

741
01:00:44,120 --> 01:00:50,120
And then I haven't done this yet, but the next part is, okay, now you read in all 60

742
01:00:50,120 --> 01:00:59,280
of the inventory files until you've matched them all, and then you would stop.

743
01:00:59,280 --> 01:01:03,640
And then you do the same thing with products.

744
01:01:03,640 --> 01:01:11,520
And then lab results, I think you actually have to go in reverse because it's actually

745
01:01:11,520 --> 01:01:16,160
the lab result that gets matched to the inventory; the inventory doesn't get matched to the lab results.
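Here is a minimal sketch of that "reverse" direction, assuming each lab result row carries a reference to an inventory item (the `InventoryId` column name is an assumption). Instead of looking up a lab result for each inventory item, you walk the lab results and attach each one to the item it points at:

```python
def attach_lab_results(
    inventory: dict[str, dict],
    lab_results: list[dict],
    key: str = "InventoryId",  # assumed reference column on lab results
) -> int:
    """Attach each lab result to the inventory item it references.

    `inventory` maps inventory ID -> inventory row. An item may end up with
    several results, or none. Returns how many lab results were attached."""
    attached = 0
    for result in lab_results:
        item = inventory.get(result[key])
        if item is not None:
            item.setdefault("lab_results", []).append(result)
            attached += 1
    return attached
```

Lab results that reference inventory outside the loaded files are simply skipped here; a fuller version might collect them for a later pass.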

746
01:01:16,160 --> 01:01:23,680
So that one's a little confusing, but long story short, the script isn't hard to run,

747
01:01:23,680 --> 01:01:27,440
but it just takes a long time.

748
01:01:27,440 --> 01:01:30,280
So here, I'll just get it running and I'll show you.

749
01:01:30,280 --> 01:01:35,080
So here it's running.

750
01:01:35,080 --> 01:01:43,480
So basically we're reading in the first data file, hopefully we'll start getting some print

751
01:01:43,480 --> 01:01:48,880
statements and then I'll basically just show you what the output would look like.

752
01:01:48,880 --> 01:01:49,880
Okay, great.

753
01:01:49,880 --> 01:02:01,200
So we just read in sales details file 61, read in the first sales headers file, and we were able

754
01:02:01,200 --> 01:02:04,840
to match 61% of the sales details.

755
01:02:04,840 --> 01:02:06,560
Okay, cool.

756
01:02:06,560 --> 01:02:14,120
So we now just read in the next batch of sales details, I mean sales headers, and we were

757
01:02:14,120 --> 01:02:17,240
able to match 100% of them.

758
01:02:17,240 --> 01:02:24,640
And now, I forget what's commented out and what isn't, but now I'm basically

759
01:02:24,640 --> 01:02:32,720
saving all of these items license by, I'll go ahead and stop this here, because that's

760
01:02:32,720 --> 01:02:33,720
sufficient.

761
01:02:33,720 --> 01:02:39,680
So now I'm basically just saving all these items after they've been augmented, license

762
01:02:39,680 --> 01:02:40,680
by license.

763
01:02:40,680 --> 01:02:43,480
Okay, why am I doing all of this?

764
01:02:43,480 --> 01:02:52,200
Well, this is what I call data curation because now we have these nice data sets.

765
01:02:52,200 --> 01:02:56,120
So that was the script, that was the hard part.

766
01:02:56,120 --> 01:03:06,360
And now, the script is where the value is added, so the

767
01:03:06,360 --> 01:03:08,720
script adds the value.

768
01:03:08,720 --> 01:03:18,320
And then here is the result, a nice curated data set; this is called panel data,

769
01:03:18,320 --> 01:03:25,560
where you've got license by date, and then this is the total sales on that date.

770
01:03:25,560 --> 01:03:33,000
So now you can track this licensee's sales over time.
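A minimal sketch of building that panel (one total per licensee per date) from augmented sales rows, and then pulling out a single licensee's series to track over time. The column names `LicenseeId`, `SaleDate` (ISO date strings), and `Price` are assumptions for illustration.

```python
from collections import defaultdict


def daily_sales_panel(sales: list[dict]) -> dict[tuple[str, str], float]:
    """Total sales per (licensee, date): one observation per licensee per day."""
    panel = defaultdict(float)
    for sale in sales:
        panel[(sale["LicenseeId"], sale["SaleDate"])] += sale["Price"]
    # Sorting by (licensee, ISO date) keeps each licensee's days in order.
    return dict(sorted(panel.items()))


def licensee_series(
    panel: dict[tuple[str, str], float], licensee_id: str
) -> list[tuple[str, float]]:
    """One licensee's daily totals, in date order."""
    return [(date, total) for (lic, date), total in panel.items() if lic == licensee_id]
```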

771
01:03:33,000 --> 01:03:40,800
And then if you scroll down, oh, here's another license that you can track over time.

772
01:03:40,800 --> 01:03:45,960
And so this is called panel data, and you can do really cool analysis now.

773
01:03:45,960 --> 01:03:56,160
So for example, you could now merge in the licensee data, which would have their zip code, and then

774
01:03:56,160 --> 01:04:01,480
you could, say, get statistics that you think are pertinent to that zip code.

775
01:04:01,480 --> 01:04:06,680
So the one that we know is statistically significant is median income.

776
01:04:06,680 --> 01:04:13,560
So you could find the median income for all these different zip codes, and you can see

777
01:04:13,560 --> 01:04:16,880
how that correlates with price.
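One way to sketch that study: join each licensee's average price to its zip code, join the zip code to a median income figure, and compute a Pearson correlation. The dictionaries below are hypothetical inputs; in practice the incomes would come from a census source. A correlation alone is not a significance test, so a real analysis would go further (for example, a regression with standard errors).

```python
from math import sqrt


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def income_price_correlation(
    avg_price: dict[str, float],     # licensee -> average price
    licensee_zip: dict[str, str],    # licensee -> zip code
    zip_income: dict[str, float],    # zip code -> median income
) -> float:
    """Correlate each licensee's average price with its zip's median income."""
    pairs = [
        (zip_income[licensee_zip[lic]], price)
        for lic, price in avg_price.items()
        if lic in licensee_zip and licensee_zip[lic] in zip_income
    ]
    incomes = [income for income, _ in pairs]
    prices = [price for _, price in pairs]
    return pearson(incomes, prices)
```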

778
01:04:16,880 --> 01:04:25,240
So that's maybe not the most interesting study ever, but you could do many, many different

779
01:04:25,240 --> 01:04:27,000
analyses there.

780
01:04:27,000 --> 01:04:34,520
You could see who's paying the most in taxes.

781
01:04:34,520 --> 01:04:42,320
So let's just go ahead and end this strong.

782
01:04:42,320 --> 01:04:48,200
I said, oh, I was going to share with you how you could solve this strain problem.

783
01:04:48,200 --> 01:04:54,400
OK, well, how would you do it?

784
01:04:54,400 --> 01:05:07,760
First, you have to augment all the data, which is the task at hand.

785
01:05:07,760 --> 01:05:14,480
Then you say, OK, you basically, so the question we were given is, oh, can we find strains

786
01:05:14,480 --> 01:05:20,480
that are, say, in the Seattle area that are chemically similar to ACDC?

787
01:05:20,480 --> 01:05:30,400
Well, we can find all the lab results that anyone sent in for ACDC samples.

788
01:05:30,400 --> 01:05:40,640
We could calculate the average THC and CBD of those ACDC strains.

789
01:05:40,640 --> 01:05:52,880
We can use, I forget which one's the, I forget the name for it, but we had similarity models.

790
01:05:52,880 --> 01:05:56,560
I forget what it's called.

791
01:05:56,560 --> 01:06:01,000
I'll have to get the name for you.

792
01:06:01,000 --> 01:06:08,240
It's when we did product recommendations.

793
01:06:08,240 --> 01:06:09,960
I'll look it up for you later.

794
01:06:09,960 --> 01:06:18,080
But you can find the products that are most similar chemically to ACDC.

795
01:06:18,080 --> 01:06:24,960
And then you would basically find the inventory items in Seattle that are the most similar

796
01:06:24,960 --> 01:06:27,880
to ACDC.
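The speaker couldn't recall the name of the similarity model used in the earlier product-recommendation session, so the sketch below swaps in plain Euclidean distance on (THC, CBD) as a stand-in; it is not necessarily that model. It averages the ACDC lab results and ranks inventory items in one city by how close they are to that average. The field names (`strain`, `thc`, `cbd`, `city`, `name`) are assumptions.

```python
from math import dist
from statistics import mean


def most_similar(
    lab_results: list[dict],
    inventory: list[dict],
    strain: str = "ACDC",
    city: str = "Seattle",
    k: int = 3,
) -> list[str]:
    """Names of the k inventory items in `city` whose (THC, CBD) is closest,
    by Euclidean distance, to the average (THC, CBD) of `strain` lab results."""
    target_results = [r for r in lab_results if r["strain"] == strain]
    # statistics.mean raises StatisticsError if there are no results for the strain.
    target = (
        mean(r["thc"] for r in target_results),
        mean(r["cbd"] for r in target_results),
    )
    candidates = [item for item in inventory if item["city"] == city]
    candidates.sort(key=lambda item: dist((item["thc"], item["cbd"]), target))
    return [item["name"] for item in candidates[:k]]
```

Using two cannabinoids keeps the example small; the same ranking works with more compounds (terpenes, minor cannabinoids) in each vector, ideally after scaling them to comparable ranges.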

797
01:06:27,880 --> 01:06:31,360
So you can answer that question.

798
01:06:31,360 --> 01:06:38,360
So you can do really accurate product recommendations.

799
01:06:38,360 --> 01:06:49,720
And then unfortunately, I don't think this script will run fast enough.

800
01:06:49,720 --> 01:06:52,680
But I'll share this with you.

801
01:06:52,680 --> 01:06:57,520
My apologies that I wasn't more prepared in demonstrating this code.

802
01:06:57,520 --> 01:07:06,320
But some of the other questions you can answer are, oh, you know, who sold the most in November?

803
01:07:06,320 --> 01:07:09,780
Who were the top 10 retailers?

804
01:07:09,780 --> 01:07:15,920
How much cannabis was sold in 2022?

805
01:07:15,920 --> 01:07:19,400
What was the average sales per retailer?
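Once sales are tied to licensees, those questions reduce to a few aggregations. A minimal sketch, assuming each sale row has `LicenseeId`, an ISO `SaleDate` string, and a `Price`:

```python
from collections import Counter


def sales_summary(
    sales: list[dict], month: str = "2022-11", year: str = "2022", top_n: int = 10
) -> dict:
    """Top retailers in `month`, total sales in `year`, and average per retailer."""
    by_retailer_month = Counter()
    by_retailer_year = Counter()
    for sale in sales:
        if sale["SaleDate"].startswith(year):
            by_retailer_year[sale["LicenseeId"]] += sale["Price"]
        if sale["SaleDate"].startswith(month):
            by_retailer_month[sale["LicenseeId"]] += sale["Price"]
    total = sum(by_retailer_year.values())
    return {
        "top_retailers_in_month": by_retailer_month.most_common(top_n),
        "total_sales_in_year": total,
        "average_sales_per_retailer": (
            total / len(by_retailer_year) if by_retailer_year else 0.0
        ),
    }
```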

806
01:07:19,400 --> 01:07:26,840
And I think these are interesting data points that I would argue a lot of people don't know.

807
01:07:26,840 --> 01:07:33,680
And so long term, I would love to publish this knowledge, right?

808
01:07:33,680 --> 01:07:40,540
Because I would love Cannlytics to be an authoritative source on cannabis statistics.

809
01:07:40,540 --> 01:07:46,400
So for example, if we're publishing these statistics, oh, you know, who sold the most

810
01:07:46,400 --> 01:07:48,800
in November of 2022?

811
01:07:48,800 --> 01:07:50,080
We've got the data source.

812
01:07:50,080 --> 01:07:51,880
We've got the statistic.

813
01:07:51,880 --> 01:08:07,000
Well, maybe somebody will ask ChatGPT, you know, who sold the most cannabis in Washington State?

814
01:08:07,000 --> 01:08:12,640
I wonder if they'll know.

815
01:08:12,640 --> 01:08:17,700
Oh, that's interesting.

816
01:08:17,700 --> 01:08:20,920
So it looks like it's only trained on data up to 2021.

817
01:08:20,920 --> 01:08:23,680
So we still have time.

818
01:08:23,680 --> 01:08:32,360
But anywho, GPT-4 is coming out in the new year.

819
01:08:32,360 --> 01:08:35,360
So we'll see.

820
01:08:35,360 --> 01:08:36,880
Okay.

821
01:08:36,880 --> 01:08:40,800
So it looks like they don't necessarily have that information.

822
01:08:40,800 --> 01:08:46,900
So, anywho, that's sort of the grander scheme: oh, let's just be an authoritative

823
01:08:46,900 --> 01:08:48,640
source of data.

824
01:08:48,640 --> 01:08:51,600
Short term, how can you make a buck?

825
01:08:51,600 --> 01:08:58,040
Well, a couple of weeks ago, we went over ARIMA forecasting.

826
01:08:58,040 --> 01:09:04,680
So what you could do, and I often have these grand schemes, but I never really get around

827
01:09:04,680 --> 01:09:05,680
to them.

828
01:09:05,680 --> 01:09:12,720
So I always encourage people to capitalize on some of my business model ideas, because

829
01:09:12,720 --> 01:09:16,760
as I said, I may not necessarily get around to them myself.

830
01:09:16,760 --> 01:09:21,040
And as I said, there is more than one restaurant out there.

831
01:09:21,040 --> 01:09:26,720
So there's no reason why Cannlytics has to be the only provider of cannabis

832
01:09:26,720 --> 01:09:27,720
statistics.

833
01:09:27,720 --> 01:09:36,240
So for example, now that we have sales by licensee, that is, by retailer, well, you could just

834
01:09:36,240 --> 01:09:41,880
reach out to the retailers and say, Hey, like, how well are you monitoring your sales?

835
01:09:41,880 --> 01:09:45,280
Would you like to see how well you did compared to everybody else?

836
01:09:45,280 --> 01:09:46,520
Well, guess what?

837
01:09:46,520 --> 01:09:48,960
They're going to want to know that.

838
01:09:48,960 --> 01:09:54,640
And then you could also say, well, and once again, a lot of people deride forecasting

839
01:09:54,640 --> 01:10:01,760
because forecasts should be taken with a grain of salt, but I still think they're

840
01:10:01,760 --> 01:10:09,480
at least useful. My saying is any projection is better than no projection;

841
01:10:09,480 --> 01:10:11,680
you know, just don't put all your faith in it.

842
01:10:11,680 --> 01:10:18,600
I mean, it would at least help. Say your sales are trending up.

843
01:10:18,600 --> 01:10:24,000
Wouldn't it at least be good to get an estimate of what they would be in the next

844
01:10:24,000 --> 01:10:25,160
year?

845
01:10:25,160 --> 01:10:30,200
Because they're probably not going to be the same if they're on a positive trend.

846
01:10:30,200 --> 01:10:33,960
So you could at least get an estimate of what they may be in the next year.
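For a retailer whose sales are trending up, the simplest possible projection is a straight-line trend. The sketch below fits one by ordinary least squares and extends it forward; it is a deliberately simple stand-in, not the ARIMA model from the forecasting episode, which would also account for autocorrelation and could be fit with a library such as statsmodels.

```python
def linear_trend_forecast(history: list[float], horizon: int = 12) -> list[float]:
    """Fit y = a + b * t by ordinary least squares on `history` (t = 0, 1, ...)
    and extend the line `horizon` periods past the last observation.

    Needs at least two observations."""
    n = len(history)
    if n < 2:
        raise ValueError("need at least two observations to fit a trend")
    t_mean = (n - 1) / 2
    y_mean = sum(history) / n
    numerator = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(history))
    denominator = sum((t - t_mean) ** 2 for t in range(n))
    slope = numerator / denominator
    intercept = y_mean - slope * t_mean
    return [intercept + slope * t for t in range(n, n + horizon)]
```

With twelve monthly totals from 2022 as `history`, the default `horizon=12` gives a rough month-by-month projection for 2023.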

847
01:10:33,960 --> 01:10:38,800
So you could at least offer all the retailers, Hey, you know, would you like to see your

848
01:10:38,800 --> 01:10:45,160
statistics from 2022 and we can get you a forecast for 2023.

849
01:10:45,160 --> 01:10:52,960
So right there, I think that's a business proposition so rich that I won't even

850
01:10:52,960 --> 01:10:56,000
be able to capitalize on all of that myself.

851
01:10:56,000 --> 01:10:57,000
I mean, think about it.

852
01:10:57,000 --> 01:11:07,640
There are 300 to 400 retailers, and we could potentially help them all out with rich data

853
01:11:07,640 --> 01:11:10,120
and get them forecasts for the next year.

854
01:11:10,120 --> 01:11:16,880
And as I was even saying in my forecasting episode, a smart retailer would get as many

855
01:11:16,880 --> 01:11:18,880
forecasts as they can.

856
01:11:18,880 --> 01:11:26,880
So there's no reason why a smart retailer couldn't employ both me and you to create a forecast

857
01:11:26,880 --> 01:11:27,880
for them.

858
01:11:27,880 --> 01:11:31,800
And as I said, let the best forecast win, right?

859
01:11:31,800 --> 01:11:37,520
So you can measure the root mean squared error, the forecasting error.
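"Let the best forecast win" can be made concrete by scoring each forecast with root mean squared error once the actual numbers come in and picking the lowest. A minimal sketch; the forecast names and values in any usage are hypothetical.

```python
from math import sqrt


def rmse(actual: list[float], forecast: list[float]) -> float:
    """Root mean squared error between actual values and a forecast."""
    if len(actual) != len(forecast) or not actual:
        raise ValueError("actual and forecast must be non-empty and the same length")
    return sqrt(sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual))


def best_forecast(actual: list[float], forecasts: dict[str, list[float]]) -> str:
    """Name of the forecast with the lowest RMSE against the actual values."""
    return min(forecasts, key=lambda name: rmse(actual, forecasts[name]))
```

Because errors are squared before averaging, RMSE punishes a few large misses more than many small ones, which suits sales forecasts where a big miss is costly.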

860
01:11:37,520 --> 01:11:41,000
And so that, I think, would make for some fun competitions.

861
01:11:41,000 --> 01:11:47,280
I never formally wrote out instructions, but I talked about

862
01:11:47,280 --> 01:11:48,280
it last year.

863
01:11:48,280 --> 01:11:55,640
Oh, why don't we have a competition to see who could best forecast sales for 2022?

864
01:11:55,640 --> 01:11:59,600
So if any of you cooked up a forecasting model, you know, definitely share and we can compare

865
01:11:59,600 --> 01:12:01,000
our results for fun.

866
01:12:01,000 --> 01:12:05,160
But also thinking about things like that for the coming year.

867
01:12:05,160 --> 01:12:08,400
But I've been talking way, way too long.

868
01:12:08,400 --> 01:12:15,620
Unfortunately, as I said, I spent almost too much time writing code and not enough time

869
01:12:15,620 --> 01:12:18,440
preparing a presentation.

870
01:12:18,440 --> 01:12:24,760
So my apologies that things were kind of, I've been sort of flying by the seat of my

871
01:12:24,760 --> 01:12:31,400
pants and maybe haven't had as concrete of a plan as I should have.

872
01:12:31,400 --> 01:12:37,200
But basically the idea behind today was to share with you, hey, there's all this CCRS

873
01:12:37,200 --> 01:12:38,200
data.

874
01:12:38,200 --> 01:12:42,520
We're finally starting to get our teeth into it and curate it.

875
01:12:42,520 --> 01:12:44,840
Here are some interesting statistics.

876
01:12:44,840 --> 01:12:47,720
Here are some that I may not have time to get around to.

877
01:12:47,720 --> 01:12:55,840
See if we can't put our great minds together and make at least the cannabis space a

878
01:12:55,840 --> 01:13:01,120
little bit better molecule by molecule.

879
01:13:01,120 --> 01:13:04,480
But that's my spiel for today.

880
01:13:04,480 --> 01:13:12,320
Anyone have any thoughts, comments, questions before we head into the new year?

881
01:13:12,320 --> 01:13:18,520
Ooh, Isaac, what's on your mind?

882
01:13:18,520 --> 01:13:22,400
Hey, yes, I actually do have a question.

883
01:13:22,400 --> 01:13:25,880
I'm just wondering how often do they update the database?

884
01:13:25,880 --> 01:13:30,720
I believe the last time I accessed it, it said something like November data.

885
01:13:30,720 --> 01:13:37,100
Do they just add in new data, or is there a new Dropbox folder for each month?

886
01:13:37,100 --> 01:13:39,440
They will create a new Dropbox folder.

887
01:13:39,440 --> 01:13:44,560
And you don't actually have to wait around on us.

888
01:13:44,560 --> 01:13:47,920
It's actually Jim McCray, to give him proper credit.

889
01:13:47,920 --> 01:13:51,960
And he'll share the language so I can share the language with you.

890
01:13:51,960 --> 01:13:58,760
Periodically, he'll just send an email to the Washington State office and say, hey,

891
01:13:58,760 --> 01:14:00,640
could I get the latest data?

892
01:14:00,640 --> 01:14:03,880
And they'll send it to him, and he'll share it with the Cannabis Data group.

893
01:14:03,880 --> 01:14:06,720
And he just does that about once a month.

894
01:14:06,720 --> 01:14:12,360
And I think enough people request the data that they generally just wait and give everybody

895
01:14:12,360 --> 01:14:13,360
the same link.

896
01:14:13,360 --> 01:14:16,520
And it's around the 20th of each month.

897
01:14:16,520 --> 01:14:20,720
So around the 20th, I think, is when you can expect the drop.

898
01:14:20,720 --> 01:14:22,400
And so that's pretty good.

899
01:14:22,400 --> 01:14:24,840
I'll find out if we can get it more frequently than that.

900
01:14:24,840 --> 01:14:26,560
But Sammy, did you have a question?

901
01:14:26,560 --> 01:14:29,720
Oh, well, I was just going to ask.

902
01:14:29,720 --> 01:14:35,320
I'm also actually looking for more work.

903
01:14:35,320 --> 01:14:44,200
And so your company is based in Washington?

904
01:14:44,200 --> 01:14:47,080
Is that also why you're only studying Washington?

905
01:14:47,080 --> 01:14:54,560
Is there an interest to extend this to Oregon or make sort of parallel things?

906
01:14:54,560 --> 01:14:55,560
Absolutely.

907
01:14:55,560 --> 01:14:59,400
The reason is just that it's manageable as a starting point.

908
01:14:59,400 --> 01:15:06,800
So one thing we stress here is when you've got a big daunting task, just start somewhere.

909
01:15:06,800 --> 01:15:09,200
And then that's at least a start.

910
01:15:09,200 --> 01:15:12,000
And then you can think about where to proceed from there.

911
01:15:12,000 --> 01:15:17,240
So we're just starting in Washington state really just for no better reason than that's

912
01:15:17,240 --> 01:15:19,640
what's available.

913
01:15:19,640 --> 01:15:23,440
Definitely would love to expand to more states, right?

914
01:15:23,440 --> 01:15:28,120
Oregon, Massachusetts, everywhere under the sun.

915
01:15:28,120 --> 01:15:33,720
Can you remind me what was the second part to your question?

916
01:15:33,720 --> 01:15:37,520
Well, no, I'm just saying this looks interesting.

917
01:15:37,520 --> 01:15:41,720
I'd like to work on this.

918
01:15:41,720 --> 01:15:46,760
Or something inspired by this. Like I said, I'm here in Oregon.

919
01:15:46,760 --> 01:15:52,680
So presumably there's similar data sets available.

920
01:15:52,680 --> 01:15:56,560
But also like, yeah, no, I don't.

921
01:15:56,560 --> 01:16:00,360
Like I said, this is my first time at your meetup.

922
01:16:00,360 --> 01:16:03,680
And I'm not sure exactly what the goals are.

923
01:16:03,680 --> 01:16:06,000
But I'm interested in what you're doing.

924
01:16:06,000 --> 01:16:10,320
I'd love to talk to you about this outside of here if you want.

925
01:16:10,320 --> 01:16:11,320
Oh, yes.

926
01:16:11,320 --> 01:16:14,320
Sorry, I don't think I fully answered your question.

927
01:16:14,320 --> 01:16:18,200
But as I said, I'll think about it and pick up more next week.

928
01:18:18,200 --> 01:16:22,120
And oh, yes, that just jogged my memory.

929
01:16:22,120 --> 01:16:28,240
So it's not exclusive to the cannabis space, but here's an example from the cannabis space.

930
01:16:28,240 --> 01:16:32,200
How do you actually start an effective startup?

931
01:16:32,200 --> 01:16:38,720
Well, one way is to start locally and do it well.

932
01:16:38,720 --> 01:16:42,200
So an example from the cannabis space is Eaze.

933
01:16:42,200 --> 01:16:44,040
They're a delivery company.

934
01:16:44,040 --> 01:16:45,040
Guess what?

935
01:16:45,040 --> 01:16:52,040
When they started, they were exclusive to the San Francisco area, from my understanding.

936
01:16:52,040 --> 01:16:56,360
So I don't even think they touched other jurisdictions in California.

937
01:16:56,360 --> 01:16:59,640
I think it was a very, very local thing.

938
01:16:59,640 --> 01:17:01,840
They figured it out there.

939
01:17:01,840 --> 01:17:06,360
They made sure they could do it there well and profitably.

940
01:17:06,360 --> 01:17:10,280
And then you start looking to expand.

941
01:17:10,280 --> 01:17:13,060
And there are other delivery companies.

942
01:17:13,060 --> 01:17:16,240
Like I said, there are many restaurants, right?

943
01:17:16,240 --> 01:17:19,480
I don't think there should just be one company doing something.

944
01:17:19,480 --> 01:17:23,920
But they were definitely a successful transportation company.

945
01:17:23,920 --> 01:17:26,800
And they just started small, started locally.

946
01:17:26,800 --> 01:17:30,080
And as I said, they may be an exception to the rule.

947
01:17:30,080 --> 01:17:34,880
So maybe that's it, right?

948
01:17:34,880 --> 01:17:40,120
That may not necessarily be the way; if everybody knew how to start a startup, everybody would.

949
01:17:40,120 --> 01:17:46,040
Yeah, no, I know precious little about the business side of things.

950
01:17:46,040 --> 01:17:51,880
I'm interested in the data science, but yeah, I don't know.

951
01:17:51,880 --> 01:17:55,040
Maybe we should talk outside of this.

952
01:17:55,040 --> 01:17:56,760
I don't know.

953
01:17:56,760 --> 01:18:00,360
Yes, let's all keep the conversation going.

954
01:18:00,360 --> 01:18:03,680
We'll talk, and then I'll get you on the Slack.

955
01:18:03,680 --> 01:18:11,120
So that's the idea is, you know, we come here to have a nice round table just to touch base.

956
01:18:11,120 --> 01:18:17,760
And yes, let's keep the conversation going, because there is a need for awesome data scientists

957
01:18:17,760 --> 01:18:18,760
everywhere.

958
01:18:18,760 --> 01:18:25,000
So we'll start wrangling data in Oregon, wrangle some data in Massachusetts and Florida and

959
01:18:25,000 --> 01:18:27,920
Oklahoma, everywhere under the sun.

960
01:18:27,920 --> 01:18:31,480
And I think we can learn from each other.

961
01:18:31,480 --> 01:18:36,800
If I discover a new technique, maybe you can use it too.

962
01:18:36,800 --> 01:18:42,800
And you know, if you have an idea for a way that we can improve, we'd always love to hear

963
01:18:42,800 --> 01:18:44,880
that because we're always looking to improve.

964
01:18:44,880 --> 01:18:46,200
So I think we can.

965
01:18:46,200 --> 01:18:47,200
Right.

966
01:18:47,200 --> 01:18:48,920
That's the idea behind Cannlytics.

967
01:18:48,920 --> 01:18:55,920
And that's why we build open source projects and work on open data projects is we believe

968
01:18:55,920 --> 01:18:57,940
in the win-win.

969
01:18:57,940 --> 01:19:00,680
So it's a long term game.

970
01:19:00,680 --> 01:19:03,680
But we're here.

971
01:19:03,680 --> 01:19:09,400
We're in it for the long haul, and hopefully we can have some fun while we're doing it.

972
01:19:09,400 --> 01:19:12,040
Well, too cool.

973
01:19:12,040 --> 01:19:17,600
Thank you all for coming and lending your brilliant eyes, your ears, your minds.

974
01:19:17,600 --> 01:19:20,360
It's you that are really helping advance cannabis science.

975
01:19:20,360 --> 01:19:22,680
I couldn't do it without you, right?

976
01:19:22,680 --> 01:19:25,320
It's you putting your interest in here.

977
01:19:25,320 --> 01:19:31,680
That's what's helping move the cannabis industry forward, even if it's only molecule by molecule.

978
01:19:31,680 --> 01:19:32,680
I think it's helping.

979
01:19:32,680 --> 01:19:35,080
So thank you all.

980
01:19:35,080 --> 01:19:41,200
So sorry, I was just gonna say, how can we get in contact with you outside of here?

981
01:19:41,200 --> 01:19:42,200
Okay.

982
01:19:42,200 --> 01:19:56,640
If you all hold tight, I'll put the link to the Slack channel

983
01:19:56,640 --> 01:20:05,040
here momentarily.

984
01:20:05,040 --> 01:20:08,000
I think.

985
01:20:08,000 --> 01:20:20,240
But yeah, I'll get the Slack link for you.

986
01:20:20,240 --> 01:20:26,040
But as I said, there's a lot of work to be done with the cannabis data group.

987
01:20:26,040 --> 01:20:30,440
So that's one that's really picking up steam.

988
01:20:30,440 --> 01:20:36,080
There are really concrete tasks that can be done with that one.

989
01:20:36,080 --> 01:20:39,080
So this link will let you join the Slack.

990
01:20:39,080 --> 01:20:41,440
And hopefully we can make that a bit more active.

991
01:20:41,440 --> 01:20:46,800
So currently, Candice and I have a lot of conversations there.

992
01:20:46,800 --> 01:20:52,280
And as I said, I'm going to start doing a better job of posting jobs, interesting

993
01:20:52,280 --> 01:20:57,880
articles, of course, data sets and things we're working on, trying to just get people

994
01:20:57,880 --> 01:20:59,760
more engaged in the coming year.

995
01:20:59,760 --> 01:21:06,800
I'd say we're going to start doing some competitions for t-shirts, start sending people

996
01:21:06,800 --> 01:21:07,800
to conferences.

997
01:21:07,800 --> 01:21:14,440
I'd love to get you all involved in data science projects.

998
01:21:14,440 --> 01:21:16,680
A lot on the table for the coming year.

999
01:21:16,680 --> 01:21:25,560
So definitely be in touch, because I want to share all the fun with all of you.

1000
01:21:25,560 --> 01:21:33,080
So Cannlytics is you, and it's a relatively small company right now?

1001
01:21:33,080 --> 01:21:34,080
Exactly.

1002
01:21:34,080 --> 01:21:40,160
So essentially, Cannlytics is me plus all the cannabis data scientists that love to help

1003
01:21:40,160 --> 01:21:41,720
out.

1004
01:21:41,720 --> 01:21:48,480
So it's a small open source project at the moment, but as I was saying, it's kind of

1005
01:21:48,480 --> 01:21:50,040
like a flame, right?

1006
01:21:50,040 --> 01:21:53,040
So we just kind of need to nurse it.

1007
01:21:53,040 --> 01:21:57,800
In the past few months, I let the flame burn a little low.

1008
01:21:57,800 --> 01:21:58,800
I don't know.

1009
01:21:58,800 --> 01:22:05,500
As I was telling you all in the past few weeks, I just need to do a better job of focusing

1010
01:22:05,500 --> 01:22:08,320
on open source because that's what I believe in.

1011
01:22:08,320 --> 01:22:17,800
I think that's a nice long-term sustainable endeavor.

1012
01:22:17,800 --> 01:22:23,600
And I also just need to do a better job of prioritizing and getting people involved.

1013
01:22:23,600 --> 01:22:26,520
So I got behind on getting the recordings up.

1014
01:22:26,520 --> 01:22:34,520
But I've found new tools from OpenAI that can help me expedite that process.

1015
01:22:34,520 --> 01:22:40,520
And then I don't think I was doing a good enough job getting all of you involved because

1016
01:22:40,520 --> 01:22:44,280
there are ways that you can help out in the projects.

1017
01:22:44,280 --> 01:22:48,800
I don't know.

1018
01:22:48,800 --> 01:22:51,760
We all have things we're good at and things we're not good at.

1019
01:22:51,760 --> 01:22:56,720
And communication is not necessarily one of my strengths.

1020
01:22:56,720 --> 01:23:07,320
So that's why I encourage you all not to feel bad about poking me, because if I don't

1021
01:23:07,320 --> 01:23:11,880
reach out to you, that doesn't necessarily mean that I don't need your help.

1022
01:23:11,880 --> 01:23:15,180
I just didn't communicate that.

1023
01:23:15,180 --> 01:23:16,180
So feel free.

1024
01:23:16,180 --> 01:23:22,200
Just poke me and say, hey, is there anything I can help on or this or that?

1025
01:23:22,200 --> 01:23:24,960
And I'll try to get you involved.

1026
01:23:24,960 --> 01:23:26,960
All right.

1027
01:23:26,960 --> 01:23:29,120
Too cool.

1028
01:23:29,120 --> 01:23:31,560
But yeah, poke me.

1029
01:23:31,560 --> 01:23:38,840
And in fact, this is a valuable lesson that I learned and didn't take heed until way

1030
01:23:38,840 --> 01:23:41,360
too late in life.

1031
01:23:41,360 --> 01:23:45,280
The squeaky wheel gets the oil.

1032
01:23:45,280 --> 01:23:46,280
Be squeaky.

1033
01:23:46,280 --> 01:23:50,960
Squeak, squeak, squeak, squeak.

1034
01:23:50,960 --> 01:23:55,680
Don't feel bad about imposing on people.

1035
01:23:55,680 --> 01:24:04,760
Don't be shy because that's actually selfish because you're thinking about yourself.

1036
01:24:04,760 --> 01:24:13,440
Instead, think, oh, people actually do want to be engaged.

1037
01:24:13,440 --> 01:24:15,440
So be squeaky.

1038
01:24:15,440 --> 01:24:16,440
Engage people.

1039
01:24:16,440 --> 01:24:17,440
Ask.

1040
01:24:17,440 --> 01:24:24,280
And as I said, it doesn't hurt to ask, and if someone says no, don't take it too heavy.

1041
01:24:24,280 --> 01:24:25,680
Just move on.

1042
01:24:25,680 --> 01:24:49,240
So ask, be squeaky, get involved, and I think we've got an awesome year coming up.

