1
00:00:00,000 --> 00:00:07,660
Welcome to the Cannabis Data Science Meetup Group.

2
00:00:07,660 --> 00:00:09,700
Happy to have you today, Stephen.

3
00:00:09,700 --> 00:00:11,100
My name's Keegan.

4
00:00:11,100 --> 00:00:14,940
Got into the cannabis space in 2018.

5
00:00:14,940 --> 00:00:19,100
Started out as an analyst at a cannabis testing laboratory,

6
00:00:19,100 --> 00:00:21,420
and then started developing software,

7
00:00:21,420 --> 00:00:25,660
and now do data analytics for everyone from laboratories

8
00:00:25,660 --> 00:00:30,180
all the way to producers, processors, and retailers.

9
00:00:30,180 --> 00:00:33,580
And I'm just here to have this group provide value,

10
00:00:33,580 --> 00:00:36,660
because I think data science is something

11
00:00:36,660 --> 00:00:38,300
that's in high demand here.

12
00:00:38,300 --> 00:00:41,340
So we'd love to hear about what you do

13
00:00:41,340 --> 00:00:43,620
and what you would like to get out of the group

14
00:00:43,620 --> 00:00:45,340
and what your interests may be.

15
00:00:45,340 --> 00:00:47,940
So I'd love to hear from you, Stephen.

16
00:00:47,940 --> 00:00:51,340
Oh, me, I'm not actually a data scientist myself.

17
00:00:51,340 --> 00:00:52,420
I'm more of a hobbyist.

18
00:00:52,420 --> 00:00:56,740
My main interest is cybersecurity,

19
00:00:56,740 --> 00:00:58,140
which I'm trying to break into.

20
00:00:58,140 --> 00:01:03,780
And they definitely go well together.

21
00:01:03,780 --> 00:01:06,980
Because these days, cybersecurity is so complex,

22
00:01:06,980 --> 00:01:15,940
you need automated analytics to really be able to manage risk.

23
00:01:15,940 --> 00:01:21,140
And the other group I'm part of, PyData,

24
00:01:21,140 --> 00:01:23,900
their main focus

25
00:01:23,900 --> 00:01:25,660
is diversity in data science.

26
00:01:30,140 --> 00:01:32,060
Well, it's interesting.

27
00:01:32,060 --> 00:01:36,420
And hello, Jerry, we'll give you an opportunity

28
00:01:36,420 --> 00:01:38,020
to introduce yourself here in a minute.

29
00:01:38,020 --> 00:01:40,460
I just want to say real quick, while Stephen brought it up,

30
00:01:40,460 --> 00:01:44,140
I was actually thinking there's probably a good demand.

31
00:01:44,140 --> 00:01:46,740
I don't know if it's discovered yet,

32
00:01:46,740 --> 00:01:49,740
but people will probably find out.

33
00:01:49,740 --> 00:01:53,900
And I think there's probably a good demand for cybersecurity

34
00:01:53,900 --> 00:01:56,260
in the cannabis space.

35
00:01:56,260 --> 00:02:03,660
So I'm sure people that are savvy are investing in that.

36
00:02:03,660 --> 00:02:06,460
So I think there's a lot of opportunities for you.

37
00:02:06,460 --> 00:02:11,340
Because everyone's concerned about it.

38
00:02:11,340 --> 00:02:16,580
Whether they have itemized that as one of their concerns

39
00:02:16,580 --> 00:02:17,420
is another thing.

40
00:02:17,420 --> 00:02:22,220
But I think it's definitely on people's minds.

41
00:02:22,220 --> 00:02:26,540
One of the issues is it seems like pretty much 99.9%

42
00:02:26,540 --> 00:02:29,460
of developers don't think about security at all.

43
00:02:34,900 --> 00:02:36,380
True.

44
00:02:36,380 --> 00:02:39,020
Well, I don't know if that's true.

45
00:02:39,020 --> 00:02:41,660
But that's an interesting statistic.

46
00:02:41,660 --> 00:02:49,380
But I think it's something that should be on people's minds.

47
00:02:49,380 --> 00:02:51,380
Like you said, it may not necessarily be.

48
00:02:51,380 --> 00:02:59,580
So in the cannabis space, it's a concern.

49
00:02:59,580 --> 00:03:05,020
And for example, retailers go to great lengths

50
00:03:05,020 --> 00:03:07,180
to secure their facilities.

51
00:03:07,180 --> 00:03:09,980
And sometimes they don't think it's fair

52
00:03:09,980 --> 00:03:15,420
when people are lax in other regards in the industry.

53
00:03:15,420 --> 00:03:19,580
So they'd say, hey, we're going to these great lengths

54
00:03:19,580 --> 00:03:22,020
to secure our facilities.

55
00:03:22,020 --> 00:03:27,220
Make sure your data facilities are up to snuff.

56
00:03:27,220 --> 00:03:30,620
And you're taking all the necessary precautions

57
00:03:30,620 --> 00:03:32,220
in other aspects of work.

58
00:03:32,220 --> 00:03:36,100
So long story short, I can go on and on about this.

59
00:03:36,100 --> 00:03:38,460
So great to have you here, Stephen.

60
00:03:38,460 --> 00:03:39,500
Oh, yeah.

61
00:03:39,500 --> 00:03:40,140
Definitely.

62
00:03:40,140 --> 00:03:42,860
I'm into cannabis myself a bit.

63
00:03:42,860 --> 00:03:45,100
That would be why.

64
00:03:45,100 --> 00:03:47,900
So I'm hoping to learn what I can

65
00:03:47,900 --> 00:03:49,860
besides development, and maybe

66
00:03:49,860 --> 00:03:54,460
some other aspects of the cannabis industry that I'm not aware of.

67
00:03:58,100 --> 00:04:00,820
Well, without stealing the show, Jerry,

68
00:04:00,820 --> 00:04:02,700
would you be interested in introducing yourself

69
00:04:02,700 --> 00:04:03,940
to the group?

70
00:04:03,940 --> 00:04:05,420
You've got some awesome endeavors.

71
00:04:05,420 --> 00:04:09,420
So here's the opportunity to brag about yourself here.

72
00:04:13,540 --> 00:04:16,460
If you want, Jerry.

73
00:04:16,460 --> 00:04:17,580
Me?

74
00:04:17,580 --> 00:04:19,140
Yeah, then we'll go ahead.

75
00:04:19,140 --> 00:04:20,340
Good morning, everybody.

76
00:04:20,340 --> 00:04:24,380
I'm a marketing guy.

77
00:04:24,380 --> 00:04:30,180
And I recently took a boot camp in data analytics

78
00:04:30,180 --> 00:04:31,660
and learned a little bit of Python.

79
00:04:31,660 --> 00:04:34,300
So I know enough to be dangerous.

80
00:04:34,300 --> 00:04:38,700
I am the world's worst typist and a terrible coder.

81
00:04:38,700 --> 00:04:41,220
So when I start to work on a project,

82
00:04:41,220 --> 00:04:46,460
like I'm trying to map some of the data that has been

83
00:04:46,460 --> 00:04:49,900
presented here, and it's just taken me a long time,

84
00:04:49,900 --> 00:04:51,020
but I'm getting there.

85
00:04:51,020 --> 00:04:54,820
And as you were saying, Stephen, there's

86
00:04:54,820 --> 00:04:59,460
an awful lot to be learned about cannabis.

87
00:04:59,460 --> 00:05:04,820
And there's a lot of information and insights

88
00:05:04,820 --> 00:05:09,900
that sophisticated data analytics can provide to the industry.

89
00:05:09,900 --> 00:05:12,940
And I think this is a great place

90
00:05:12,940 --> 00:05:18,020
to be to discuss those ideas and see how we can develop it.

91
00:05:21,020 --> 00:05:22,060
Exactly.

92
00:05:22,060 --> 00:05:24,100
We're coincidentally going to speed up

93
00:05:24,100 --> 00:05:25,940
your endeavors here today, Jerry.

94
00:05:25,940 --> 00:05:27,620
We'll be doing some mapping.

95
00:05:27,620 --> 00:05:30,380
And I can share with you the latest work.

96
00:05:30,380 --> 00:05:36,180
Well, I took a boot camp, and it was very wide and not very deep.

97
00:05:36,180 --> 00:05:38,140
And a lot of it didn't stick.

98
00:05:38,140 --> 00:05:40,820
So what I do is I go back to what I was taught

99
00:05:40,820 --> 00:05:44,420
and try and bring in those techniques

100
00:05:44,420 --> 00:05:46,100
and then just build on it.

101
00:05:46,100 --> 00:05:47,820
So for instance, this morning, I was

102
00:05:47,820 --> 00:05:50,940
doing some research on how to geocode the addresses

103
00:05:50,940 --> 00:05:52,060
and learned a little bit.

104
00:05:52,060 --> 00:05:53,740
But it's step by step.

105
00:05:53,740 --> 00:05:55,420
But my biggest problem was when I

106
00:05:55,420 --> 00:05:56,860
was trying to bring in a data set,

107
00:05:56,860 --> 00:05:58,820
and I spelled cannabis with an A,

108
00:05:58,820 --> 00:06:01,980
and I couldn't find my directory anywhere.

109
00:06:01,980 --> 00:06:04,220
And it took me about half an hour to figure that out.

110
00:06:04,220 --> 00:06:06,100
So that's what I mean about my typing skills.

111
00:06:10,220 --> 00:06:14,660
Well, you're going to be well positioned for today,

112
00:06:14,660 --> 00:06:18,220
because like you said, the first step is getting the geocoding.

113
00:06:18,220 --> 00:06:20,180
And then today, we'll show you all the value

114
00:06:20,180 --> 00:06:21,180
that we have from that.

115
00:06:21,180 --> 00:06:25,820
So just a little bit of work up front

116
00:06:25,820 --> 00:06:26,940
can go a long way.

117
00:06:26,940 --> 00:06:27,940
So most of the work.

118
00:06:27,940 --> 00:06:30,500
I just actually was reading an article.

119
00:06:30,500 --> 00:06:35,180
It's still up on my screen here about geocoding, how to do it.

120
00:06:35,180 --> 00:06:37,100
So it's not that complicated.

121
00:06:37,100 --> 00:06:41,180
You just have to know the right tools to use.

122
00:06:41,180 --> 00:06:43,340
There's something in pandas, GeoPandas,

123
00:06:43,340 --> 00:06:45,900
that I'm looking at.

124
00:06:45,900 --> 00:06:47,620
We'll play with that later today.

125
00:06:47,620 --> 00:06:49,940
Get to that momentarily, in fact.

126
00:06:49,940 --> 00:06:53,340
So before we do, though, Ryan, would you

127
00:06:53,340 --> 00:06:56,860
like the opportunity to introduce yourself to the group?

128
00:06:56,860 --> 00:06:59,220
And what you may like to get out of the group?

129
00:07:05,620 --> 00:07:07,380
OK, Ryan, feel free to speak up.

130
00:07:07,380 --> 00:07:08,020
It's muted.

131
00:07:08,020 --> 00:07:10,340
Ryan's muted.

132
00:07:10,340 --> 00:07:11,780
OK, 100% OK.

133
00:07:11,780 --> 00:07:15,300
Just feel free to listen along.

134
00:07:15,300 --> 00:07:16,300
All right.

135
00:07:16,300 --> 00:07:19,460
Well, without further ado, let's just go ahead and get into it,

136
00:07:19,460 --> 00:07:24,220
because Jerry's brought up exactly what we're

137
00:07:24,220 --> 00:07:26,260
going to be working on here today.

138
00:07:26,260 --> 00:07:26,780
Cool.

139
00:07:33,020 --> 00:07:34,100
Awesome.

140
00:07:34,100 --> 00:07:37,860
So long story short, I want to start with a question

141
00:07:37,860 --> 00:07:39,060
for the day.

142
00:07:39,060 --> 00:07:43,860
So we keep talking about just cannabis, the plant,

143
00:07:43,860 --> 00:07:45,580
the history of cannabis.

144
00:07:45,580 --> 00:07:50,500
Well, it may, because Washington's quite a lush

145
00:07:50,500 --> 00:07:53,980
state, but I'm not sure if cannabis would naturally

146
00:07:53,980 --> 00:07:56,100
grow up in the Northeast.

147
00:07:56,100 --> 00:08:02,700
And so I like to think humans can sort of be this useful tool

148
00:08:02,700 --> 00:08:03,980
of the plants, right?

149
00:08:03,980 --> 00:08:06,500
And so I thought it would be interesting just

150
00:08:06,500 --> 00:08:12,100
to kind of look at where is the canopy in Washington state?

151
00:08:12,100 --> 00:08:12,820
So here's the thing.

152
00:08:12,820 --> 00:08:15,780
One of the things that interests me about Washington state

153
00:08:15,780 --> 00:08:19,740
is it is a major apple producer.

154
00:08:19,740 --> 00:08:23,380
And the Hudson Valley in New York state

155
00:08:23,380 --> 00:08:26,420
is a major apple producer as well, second in the country

156
00:08:26,420 --> 00:08:27,860
to Washington state.

157
00:08:30,420 --> 00:08:32,260
They call this the Apple Valley sometimes.

158
00:08:32,260 --> 00:08:36,820
So I think from an agricultural point of view,

159
00:08:36,820 --> 00:08:41,140
it may be very similar to the environment that

160
00:08:41,140 --> 00:08:44,700
we find in Washington.

161
00:08:44,700 --> 00:08:47,580
And that's something that's always made me curious,

162
00:08:47,580 --> 00:08:52,660
is where are cultivations located in relation

163
00:08:52,660 --> 00:08:55,620
to other agriculture?

164
00:08:55,620 --> 00:09:01,260
So we're just going to scratch the surface today.

165
00:09:01,260 --> 00:09:03,540
But long story short, we're just going

166
00:09:03,540 --> 00:09:05,660
to start looking at plants.

167
00:09:05,660 --> 00:09:12,980
So we can just look at plants by licensee over time.

168
00:09:12,980 --> 00:09:20,900
So I'll show you what I've been working on here.

169
00:09:20,900 --> 00:09:28,500
And then we can just work with a subset of the data here.

170
00:09:28,500 --> 00:09:31,940
So long story short, I've been working

171
00:09:31,940 --> 00:09:40,540
on augmenting these various Washington state data sets.

172
00:09:40,540 --> 00:09:45,580
So nothing fancy.

173
00:09:45,580 --> 00:09:46,900
They're just large.

174
00:09:46,900 --> 00:09:49,540
And they can really be reduced.

175
00:09:49,540 --> 00:09:58,060
So you've got about 40 gigabytes of zipped data.

176
00:09:58,060 --> 00:10:03,700
And I've extracted all of this data here.

177
00:10:03,700 --> 00:10:06,380
And once you've done that, you're

178
00:10:06,380 --> 00:10:17,300
actually looking at around 370 to 390 gigabytes of data,

179
00:10:17,300 --> 00:10:20,700
which is really cool because that's coincidentally

180
00:10:20,700 --> 00:10:26,980
about how big the Bitcoin blockchain is at this moment.

181
00:10:26,980 --> 00:10:34,620
So we basically have the Bitcoin blockchain worth of data here.

182
00:10:34,620 --> 00:10:39,220
Then this is cannabis data from Washington state from 2018

183
00:10:39,220 --> 00:10:44,700
to November of 2021.

184
00:10:44,700 --> 00:10:48,620
So this is just shy of four years of data.

185
00:10:48,620 --> 00:10:54,100
So I'm going to be just slowly whittling through this.

186
00:10:54,100 --> 00:10:59,380
So I was hoping to have the sales items processed for today,

187
00:10:59,380 --> 00:11:01,020
but not yet.

188
00:11:01,020 --> 00:11:04,700
So I just started with the plants data here.

189
00:11:04,700 --> 00:11:10,300
And what you can do is if you only

190
00:11:10,300 --> 00:11:14,620
read in a subset of the fields,

191
00:11:14,620 --> 00:11:17,140
so just to show you some of the fields here,

192
00:11:17,140 --> 00:11:20,180
like for example, from plants, you

193
00:11:20,180 --> 00:11:24,820
have a lot of extraneous fields.

194
00:11:24,820 --> 00:11:33,140
And I was just setting out to count the number of plants

195
00:11:33,140 --> 00:11:39,420
by licensee and by date.

196
00:11:39,420 --> 00:11:42,700
So whenever the plant was created is when it was planted,

197
00:11:42,700 --> 00:11:45,740
and when it was last updated is essentially

198
00:11:45,740 --> 00:11:48,180
when I considered it harvested.

199
00:11:48,180 --> 00:11:51,420
So we only really need a handful of fields here.

200
00:11:51,420 --> 00:11:54,700
Really just need to know who grew it

201
00:11:54,700 --> 00:11:57,300
and when they put it in the ground

202
00:11:57,300 --> 00:12:00,420
and when they last touched the plant.

203
00:12:00,420 --> 00:12:04,180
So created at means when they planted?

204
00:12:04,180 --> 00:12:06,580
That's what I'm assuming.

205
00:12:06,580 --> 00:12:10,580
There are other fields here, like plant created at,

206
00:12:10,580 --> 00:12:12,460
plant harvested at.

207
00:12:12,460 --> 00:12:19,140
But they're not reliably entered in the data.

208
00:12:19,140 --> 00:12:23,220
So as you'll see, there are going to be flaws in the data.

209
00:12:23,220 --> 00:12:26,380
And in fact, the beginning portion

210
00:12:26,380 --> 00:12:30,580
of the data from maybe early 2018

211
00:12:30,580 --> 00:12:32,940
may just need to be discarded.

212
00:12:32,940 --> 00:12:35,140
Because it's basically, you'll see,

213
00:12:35,140 --> 00:12:37,460
it's basically people are entering

214
00:12:37,460 --> 00:12:39,860
their beginning inventory.

215
00:12:39,860 --> 00:12:44,140
And it looks like a large spike in the data.

216
00:12:44,140 --> 00:12:48,460
But it may just be a data entry anomaly.

217
00:12:48,460 --> 00:12:51,460
So you'll see.
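One quick way to find those early data-entry spikes is to look at the day-over-day change in the daily totals and list the biggest jumps. This is a minimal sketch on made-up toy numbers, not the real counts; the cutoff date used for trimming is likewise hypothetical and would have to be chosen by looking at the actual plot.

```python
import pandas as pd

# Toy daily totals (made up) with a one-day spike on the second day.
days = pd.date_range("2018-01-01", periods=6, freq="D")
daily = pd.Series([500, 9500, 600, 650, 640, 700], index=days, name="total_plants")

# Day-over-day change in plants in the ground; the first day has no prior value (NaN).
change = daily.diff()

# The largest absolute jumps are the first candidates for data-entry anomalies.
# nlargest ignores the NaN on the first day.
suspects = change.abs().nlargest(2)
print(suspects)

# If the early period looks like beginning-inventory entry, one option is to
# drop everything before a cutoff (the date here is a hypothetical choice).
trimmed = daily.loc["2018-01-04":]
```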

218
00:12:51,460 --> 00:12:54,700
But I would encourage you to do your own exploration here.

219
00:12:54,700 --> 00:12:59,900
Because I always think there's a trove of analysis that

220
00:12:59,900 --> 00:13:03,100
can be conducted on each and every data point

221
00:13:03,100 --> 00:13:06,700
if you've got the right question at hand.

222
00:13:06,700 --> 00:13:10,980
So these are just arrows in your quiver.

223
00:13:10,980 --> 00:13:15,700
And today, I'm just saying, OK, we just

224
00:13:15,700 --> 00:13:20,740
need three of these data fields.
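A minimal pandas sketch of reading only the fields you need, assuming the plants export is a CSV and the three fields are called `mme_id` (the licensee), `created_at`, and `updated_at`. Those column names and the toy rows are assumptions for illustration, not the actual layout of the Washington file.

```python
import io

import pandas as pd

# Hypothetical stand-in for the plants export; the real file has 22 fields.
raw = io.StringIO(
    "global_id,mme_id,strain_id,created_at,updated_at,notes\n"
    "WAP1,WAL100,S1,2018-02-01,2018-05-01,a\n"
    "WAP2,WAL100,S2,2018-02-03,2018-05-20,b\n"
    "WAP3,WAL200,S1,2018-03-10,2018-06-15,c\n"
)

# Only the licensee ID and the two timestamps are needed to count plants.
FIELDS = ["mme_id", "created_at", "updated_at"]

plants = pd.read_csv(
    raw,
    usecols=FIELDS,                            # the other columns are never loaded
    parse_dates=["created_at", "updated_at"],  # store timestamps as datetimes
)
print(plants.dtypes)
```

With the real file you would pass its path instead of the `StringIO` object; if even three columns don't fit in memory, `read_csv` also accepts `chunksize=` so the file can be processed piece by piece.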

225
00:13:20,740 --> 00:13:34,940
And so just to open up a terminal here,

226
00:13:34,940 --> 00:13:45,780
you're basically taking this file that's 12.9 gigabytes

227
00:13:45,780 --> 00:13:48,700
of data.

228
00:13:48,700 --> 00:13:52,020
And I've already counted these here.

229
00:13:52,020 --> 00:13:55,820
This is 22 fields.

230
00:13:55,820 --> 00:14:01,060
So if we just use three of 22, we're

231
00:14:01,060 --> 00:14:04,700
only using about 14% of the data.

232
00:14:04,700 --> 00:14:19,500
So we're only using about 1.8 gigabytes of data.

233
00:14:19,500 --> 00:14:25,180
So we can actually just go ahead and read in the entire plants

234
00:14:25,180 --> 00:14:32,900
database if you have 1.8 gigabytes of available memory.

235
00:14:32,900 --> 00:14:36,580
So you're taking this problem of, oh,

236
00:14:36,580 --> 00:14:42,340
do you have 13 gigabytes of memory?

237
00:14:42,340 --> 00:14:48,940
And now you're reducing that by a factor of five, at least,

238
00:14:48,940 --> 00:14:55,140
where you just say, OK, now I just need 2 gigabytes of memory.
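The arithmetic behind that estimate, as a quick check. It assumes every field takes roughly the same share of the file, which real data won't satisfy exactly, and the size in memory after parsing can differ from the size on disk.

```python
# Back-of-the-envelope estimate, assuming all 22 fields are about equally wide.
file_gb = 12.9
total_fields = 22
fields_used = 3

share = fields_used / total_fields          # fraction of the file we keep
estimated_gb = file_gb * share              # rough size of the 3-column subset
reduction = total_fields / fields_used      # how many times smaller that is

print(f"{share:.1%} of the fields, about {estimated_gb:.2f} GB, roughly {reduction:.1f}x smaller")
```

So three of 22 fields is about 14% of the file, a little under 1.8 GB, which is a reduction of roughly sevenfold and comfortably "a factor of five, at least".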

239
00:14:55,140 --> 00:14:59,460
So I know I'm going on a tangent there.

240
00:14:59,460 --> 00:15:02,420
But it's useful to find tricks like this.

241
00:15:02,420 --> 00:15:10,220
Because the topic of the day is statistics on big data.

242
00:15:10,220 --> 00:15:16,180
And so the idea is it's tricky just

243
00:15:16,180 --> 00:15:18,980
to calculate simple statistics.

244
00:15:18,980 --> 00:15:23,060
And if you are able to wrangle these big data sets,

245
00:15:23,060 --> 00:15:25,500
then just these simple statistics

246
00:15:25,500 --> 00:15:29,780
can be quite informative.

247
00:15:29,780 --> 00:15:30,700
OK.

248
00:15:30,700 --> 00:15:36,300
So long story short, you read the plants data in here.

249
00:15:36,300 --> 00:15:40,500
And then we're just going to iterate.

250
00:15:44,900 --> 00:15:48,820
I still have a portion of this data left to read.

251
00:15:48,820 --> 00:15:51,980
But I'll just demonstrate what it looks like.

252
00:15:51,980 --> 00:15:56,620
And then I'll show you the initial portion of the data.

253
00:15:56,620 --> 00:16:00,740
So long story short, we just read in the plants

254
00:16:00,740 --> 00:16:06,380
and just parse this day by day here.

255
00:16:09,700 --> 00:16:13,740
So long story short, let's just look at the data.

256
00:16:13,740 --> 00:16:16,660
So this is sort of the baking part of the show,

257
00:16:16,660 --> 00:16:23,060
where this is how you prepare your ingredients.

258
00:16:23,060 --> 00:16:27,820
And now we're just going to pull the baked piece of bread

259
00:16:27,820 --> 00:16:31,020
out of the oven.

260
00:16:31,020 --> 00:16:35,260
So I'm still aggregating this data.

261
00:16:35,260 --> 00:16:38,900
So I'll send you the complete data set

262
00:16:38,900 --> 00:16:42,900
after the presentation today.

263
00:16:42,900 --> 00:16:47,100
But basically what we get here is

264
00:16:47,100 --> 00:16:51,500
we get a daily count of the plants,

265
00:16:51,500 --> 00:16:53,460
as well as the cultivators.

266
00:16:53,460 --> 00:16:57,420
And what it looks like is not all cultivators

267
00:16:57,420 --> 00:17:02,900
have plants in the ground at all times.

268
00:17:02,900 --> 00:17:09,540
But that's something that warrants more study.

269
00:17:09,540 --> 00:17:13,300
But you can at least get the total number of plants

270
00:17:13,300 --> 00:17:14,100
for each day.

271
00:17:16,820 --> 00:17:19,620
And the numbers we're looking at are the total plants

272
00:17:19,620 --> 00:17:21,820
in the ground on that day.

273
00:17:21,820 --> 00:17:22,980
Is that what that means?

274
00:17:22,980 --> 00:17:23,580
Exactly.

275
00:17:23,580 --> 00:17:25,900
The total number of plants that are active.

276
00:17:25,900 --> 00:17:29,660
And so this is my first crude attempt.

277
00:17:29,660 --> 00:17:33,540
And so these numbers depend on how people are

278
00:17:33,540 --> 00:17:37,420
keeping track of their plants.

279
00:17:37,420 --> 00:17:41,380
So what would cause a rise of 37,000 plants in one day

280
00:17:41,380 --> 00:17:46,220
and then go back down to almost 40,000 plants

281
00:17:46,220 --> 00:17:51,820
the next day? Harvesting? Or do I not understand the data?

282
00:17:51,820 --> 00:17:56,220
Well, that's why these first few observations

283
00:17:56,220 --> 00:17:57,860
are a bit anomalous.

284
00:17:57,860 --> 00:18:00,860
Let's go ahead and plot this data

285
00:18:00,860 --> 00:18:02,740
because it is quite interesting here.

286
00:18:02,740 --> 00:18:12,820
OK, so this is stuff you can do here.

287
00:18:12,820 --> 00:18:16,980
We're just going to focus here on the Cannabis Data Science

288
00:18:16,980 --> 00:18:17,980
code.

289
00:18:17,980 --> 00:18:21,700
And I'll post this to GitHub afterwards.

290
00:18:21,700 --> 00:18:27,500
Just still need to clean it up a bit.

291
00:18:27,500 --> 00:18:35,020
So pardon the brim we're doing here.

292
00:18:35,020 --> 00:18:39,620
But let's go ahead and look at these plants here.

293
00:18:39,620 --> 00:18:41,140
So let's see here.

294
00:18:53,060 --> 00:18:54,660
So this is how it looks.

295
00:18:54,660 --> 00:19:00,940
So this is how you go about cleaning the data.

296
00:19:00,940 --> 00:19:06,460
We're just going to go ahead and read in the pre-cleaned data.

297
00:19:17,220 --> 00:19:20,100
And we're just using the test data

298
00:19:20,100 --> 00:19:24,860
because we still need to complete the full data set.

299
00:19:32,980 --> 00:19:37,780
We can at least look at the data so far.

300
00:19:37,780 --> 00:19:44,100
So for example, if we just want to look at total plants so far,

301
00:19:44,100 --> 00:19:50,380
then I still need to collect about a year and a half or so of data.

302
00:19:50,380 --> 00:19:58,340
But you'll see in the first few days here,

303
00:19:58,340 --> 00:20:01,380
just say that we just want to look at the first 40 days.

304
00:20:04,180 --> 00:20:09,700
It looks like there is a bit of data entry going on.

305
00:20:09,700 --> 00:20:15,460
So maybe people are just entering in their plants and harvesting them

306
00:20:15,460 --> 00:20:16,780
right away.

307
00:20:16,780 --> 00:20:21,100
I don't want to conjecture too much because I barely

308
00:20:21,100 --> 00:20:23,820
looked at this data.

309
00:20:23,820 --> 00:20:27,740
This is sort of my first pass at looking at this data.

310
00:20:27,740 --> 00:20:31,500
So as you can see, I'm still collecting it.

311
00:20:31,500 --> 00:20:36,460
I've only collected it through July of 2020 at this point.

312
00:20:36,460 --> 00:20:40,860
So still collecting these.

313
00:20:40,860 --> 00:20:44,620
Still have a little more than a year's worth of data to collect.

314
00:20:44,620 --> 00:20:50,380
So the processing is just slow.

315
00:20:50,380 --> 00:20:58,020
And so if you have ways that you can improve this cleaning algorithm,

316
00:20:58,020 --> 00:21:04,700
where I'm basically just calculating total plants by day, day by day,

317
00:21:04,700 --> 00:21:07,100
then you can speed this up.
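One possible speed-up for the day-by-day count, sketched under the same assumption as in the talk (a plant is in the ground from `created_at` through its last `updated_at`). Instead of checking every plant for every day, it records +1 on the planting date and -1 on the day after the last update, then takes a running sum. This is a swapped-in technique (a difference array, or sweep line), not the algorithm used in the talk, and the column names and toy rows are hypothetical.

```python
import pandas as pd

# Hypothetical plants table: licensee ID plus planting and last-update dates.
plants = pd.DataFrame({
    "mme_id": ["WAL100", "WAL100", "WAL200"],
    "created_at": pd.to_datetime(["2018-02-01", "2018-02-03", "2018-02-02"]),
    "updated_at": pd.to_datetime(["2018-02-04", "2018-02-05", "2018-02-03"]),
})


def daily_active_plants(df: pd.DataFrame) -> pd.Series:
    """Count plants in the ground on each day in one pass over the plants."""
    start = df["created_at"].dt.normalize()
    # A plant stops counting on the day *after* its last update.
    end = df["updated_at"].dt.normalize() + pd.Timedelta(days=1)
    events = pd.concat([
        pd.Series(1, index=pd.DatetimeIndex(start)),   # plant enters
        pd.Series(-1, index=pd.DatetimeIndex(end)),    # plant leaves
    ])
    # Net change per calendar day, with zero on days where nothing happened.
    daily_change = events.groupby(level=0).sum()
    all_days = pd.date_range(daily_change.index.min(), daily_change.index.max(), freq="D")
    # Running total of the changes = number of active plants on each day.
    return daily_change.reindex(all_days, fill_value=0).cumsum()


counts = daily_active_plants(plants)
print(counts)
```

The cost grows with the number of plants plus the number of days, rather than their product. Grouping the events by licensee as well as by date would give the per-licensee counts in the same way.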

318
00:21:07,100 --> 00:21:10,460
But for now, the algorithm's slow.

319
00:21:10,460 --> 00:21:12,900
But we can do a lot with this.

320
00:21:12,900 --> 00:21:17,580
And so long story short, we can go ahead and start plotting it.

321
00:21:17,580 --> 00:21:26,700
So that way, we can get a rough idea of what our figures will end up

322
00:21:26,700 --> 00:21:28,580
looking like.

323
00:21:28,580 --> 00:21:29,940
What's the word for it?

324
00:21:29,940 --> 00:21:33,900
We can do a proof of concept of some figures.

325
00:21:33,900 --> 00:21:41,460
And then we can do the final copy once all of this is said and done.

326
00:21:41,460 --> 00:21:46,860
So we'll start mapping here in a second.

327
00:21:46,860 --> 00:21:50,860
Since we already have these daily plants pulled up,

328
00:21:50,860 --> 00:21:56,460
we can actually look at this monthly before the mapping.

329
00:21:56,460 --> 00:22:02,060
We can actually look at this monthly and weekly,

330
00:22:02,060 --> 00:22:08,860
because it'll be less noisy than the daily,

331
00:22:08,860 --> 00:22:12,140
so we can get maybe a better plot here.
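A minimal sketch of rolling the daily series up to weekly and monthly averages with pandas `resample`. The daily values here are made-up placeholders standing in for the real daily plant counts.

```python
import pandas as pd

# Placeholder daily totals indexed by date (not real counts).
days = pd.date_range("2018-01-01", "2018-03-31", freq="D")
daily = pd.Series(range(len(days)), index=days, name="total_plants")

# Average plants in the ground per week (weeks ending on Sunday) and per month.
# Averaging smooths out the day-to-day data-entry noise.
weekly = daily.resample("W").mean()
monthly = daily.resample("MS").mean()   # "MS" labels each month by its first day

print(monthly)
```

Taking the mean gives "average plants in the ground" for the period; `.max()` or `.last()` would be reasonable alternatives depending on the question.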

332
00:22:12,140 --> 00:22:18,620
So let's try to look at these monthly plant counts.

333
00:22:18,620 --> 00:22:23,700
So long story short, you can look at the number of plants.

334
00:22:23,700 --> 00:22:34,060
And just wanted to show you that we can actually even group these

335
00:22:34,060 --> 00:22:36,540
by licensee.

336
00:22:36,540 --> 00:22:43,220
So that way, you can see which licensees are growing at which times.

337
00:22:43,220 --> 00:22:48,420
And then we can actually look at the number of plants.

338
00:22:48,420 --> 00:22:51,900
And we've talked about entry and exit before.

339
00:22:51,900 --> 00:22:59,020
And so I think this would be an interesting game-theoretic model.

340
00:22:59,020 --> 00:23:02,660
Here, I'll pull up the data, and then I'll explain the model.

341
00:23:02,660 --> 00:23:13,300
So we've got a nice panel data set here with some identifiers.

342
00:23:13,300 --> 00:23:17,460
So we've got our panel data going over time.

343
00:23:17,460 --> 00:23:25,820
So we've got time, and the individual dimension is the licensee.

344
00:23:25,820 --> 00:23:30,100
And then we have our time series of total plants.

345
00:23:30,100 --> 00:23:37,260
So we can track how many plants each licensee has in the ground

346
00:23:37,260 --> 00:23:39,340
at any given time.
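A minimal sketch of aggregating that daily panel to one row per licensee per month, using `groupby` with a `pd.Grouper` on the date column. The column names (`date`, `mme_id`, `total_plants`) and the rows are hypothetical.

```python
import pandas as pd

# Hypothetical daily panel: one row per licensee per day.
panel = pd.DataFrame({
    "date": pd.to_datetime([
        "2018-01-15", "2018-01-16", "2018-02-01", "2018-01-15", "2018-02-10",
    ]),
    "mme_id": ["WAL100", "WAL100", "WAL100", "WAL200", "WAL200"],
    "total_plants": [100, 120, 90, 40, 60],
})

# Group by licensee first, then calendar month, and average the daily counts.
monthly_panel = (
    panel
    .groupby(["mme_id", pd.Grouper(key="date", freq="MS")])["total_plants"]
    .mean()
)

# Wide view: one row per licensee, one column per month.
wide = monthly_panel.unstack("date")
print(wide)
```

Swapping the order of the two group keys gives the month-first view; the numbers are the same, only the index order changes.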

347
00:23:39,340 --> 00:23:41,900
And like I said, this is first pass.

348
00:23:41,900 --> 00:23:45,780
So take the data with a grain of salt, and just

349
00:23:45,780 --> 00:23:49,820
know that things may not be counted right.

350
00:23:49,820 --> 00:23:52,900
So I highly encourage you to look at that algorithm.

351
00:23:52,900 --> 00:23:58,900
This is just my first pass at looking at the data.

352
00:23:58,900 --> 00:24:02,820
So what's cool is we've got this nice identifier.

353
00:24:02,820 --> 00:24:07,500
And so we talked last week about augmenting the data.

354
00:24:07,500 --> 00:24:13,900
So we can augment the licensee data and go ahead and get

355
00:24:13,900 --> 00:24:17,700
the latitude and longitude, which are variables

356
00:24:17,700 --> 00:24:20,860
that Jerry was interested in.

357
00:24:20,860 --> 00:24:24,980
And I realized last week we basically

358
00:24:24,980 --> 00:24:28,180
coded latitude and longitude as strings.

359
00:24:28,180 --> 00:24:37,300
If we code them as floats, then everything goes through smoothly.
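A minimal sketch of that fix in pandas: convert the latitude and longitude columns from strings to floats with `pd.to_numeric`, turning anything unparseable (such as blanks on test licenses) into `NaN`. The licensee rows and column names are hypothetical.

```python
import pandas as pd

# Hypothetical licensee records where the coordinates were read in as strings.
licensees = pd.DataFrame({
    "global_id": ["WAL100", "WAL200", "WAL999"],
    "name": ["Grower A", "Shop B", "TEST LICENSE"],
    "latitude": ["47.6062", "46.6021", ""],
    "longitude": ["-122.3321", "-120.5059", ""],
})

for col in ["latitude", "longitude"]:
    # errors="coerce" turns blanks and junk into NaN instead of raising.
    licensees[col] = pd.to_numeric(licensees[col], errors="coerce")

# Keep only licensees that can actually be placed on a map.
mappable = licensees.dropna(subset=["latitude", "longitude"])
print(mappable)
```

Passing `dtype={"latitude": float, "longitude": float}` to `read_csv` works too, but only if every value parses; coercing afterwards is more forgiving of messy rows.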

360
00:24:37,300 --> 00:24:41,180
But here, just reading in the licensees data,

361
00:24:41,180 --> 00:24:49,780
which we'll visualize momentarily,

362
00:24:49,780 --> 00:24:54,260
just getting the name.

363
00:24:54,260 --> 00:24:58,300
There are just a couple of test cases in here,

364
00:24:58,300 --> 00:25:02,180
but the rest are active licenses.

365
00:25:02,180 --> 00:25:05,620
And then you just have latitude, longitude.

366
00:25:05,620 --> 00:25:12,100
Those were the only variables we needed.

367
00:25:12,100 --> 00:25:16,700
And long story short, you can begin visualizing.

368
00:25:16,700 --> 00:25:24,100
So long story short, we had just our simple panel data,

369
00:25:24,100 --> 00:25:29,060
where we just have, well, that's just our daily data,

370
00:25:29,060 --> 00:25:30,340
which is even more simple.

371
00:25:30,340 --> 00:25:32,180
That's just the time series.

372
00:25:32,180 --> 00:25:35,260
We actually have a nice panel data set here,

373
00:25:35,260 --> 00:25:39,660
where we've got date and individual.

374
00:25:39,660 --> 00:25:41,940
We can augment this.

375
00:25:41,940 --> 00:25:46,460
And so we'll augment this with our licensee data.

376
00:25:46,460 --> 00:25:48,140
We're merging.

377
00:25:48,140 --> 00:25:51,300
So the thing about augmentation is we basically just

378
00:25:51,300 --> 00:25:54,340
need a nice field we can merge on.

379
00:25:54,340 --> 00:25:57,900
So basically, just think about a Venn diagram,

380
00:25:57,900 --> 00:26:03,660
and we're just going to basically merge these two objects.

381
00:26:03,660 --> 00:26:06,100
And so we'll merge on this ID.

382
00:26:12,580 --> 00:26:18,180
And then we get a nice geocoded data set.
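A minimal sketch of that merge in pandas, assuming the panel identifies licensees by `mme_id` and the licensee table by `global_id` (both names hypothetical). `validate="many_to_one"` raises if a licensee ID appears twice on the right, and `indicator=True` makes it easy to spot panel rows that found no coordinates.

```python
import pandas as pd

# Hypothetical daily panel keyed by licensee ID.
panel = pd.DataFrame({
    "mme_id": ["WAL100", "WAL100", "WAL200"],
    "date": pd.to_datetime(["2018-01-01", "2018-01-02", "2018-01-01"]),
    "total_plants": [100, 120, 40],
})

# Hypothetical licensee table with coordinates already stored as floats.
licensees = pd.DataFrame({
    "global_id": ["WAL100", "WAL200"],
    "latitude": [47.6062, 46.6021],
    "longitude": [-122.3321, -120.5059],
})

geocoded = panel.merge(
    licensees,
    left_on="mme_id",
    right_on="global_id",
    how="left",               # keep every panel row, even without coordinates
    validate="many_to_one",   # each licensee ID must be unique on the right
    indicator=True,           # adds a "_merge" column describing the match
)

# Rows whose licensee ID was not found in the licensee table.
unmatched = geocoded[geocoded["_merge"] == "left_only"]
print(geocoded)
```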

383
00:26:18,180 --> 00:26:23,340
And the problem I had last week was all of the geocodes

384
00:26:23,340 --> 00:26:27,580
were strings, and they need to be

385
00:26:27,580 --> 00:26:31,860
floats for all the plotting to go through correctly.

386
00:26:31,860 --> 00:26:37,740
So simple fix, but it threw me for a week last time.

387
00:26:37,740 --> 00:26:41,820
So we've got our data here.

388
00:26:41,820 --> 00:26:49,500
And we can even visualize this as panel data.

389
00:26:49,500 --> 00:26:54,420
So whether you want to group this by month first or ID,

390
00:26:54,420 --> 00:27:02,620
or you could just group this by ID and then month.

391
00:27:02,620 --> 00:27:04,380
Well, maybe you can.

392
00:27:04,380 --> 00:27:06,940
Maybe you can't.

393
00:27:06,940 --> 00:27:15,300
So that was not really our main focus here.

394
00:27:15,300 --> 00:27:18,580
So let's get on to plotting, because long story short,

395
00:27:18,580 --> 00:27:21,020
we've got some awesome data.

396
00:27:21,020 --> 00:27:25,500
Rule number one, look at the data.

397
00:27:25,500 --> 00:27:32,380
So we're just going to start off with a simple map here,

398
00:27:32,380 --> 00:27:35,180
since we've got the licensees data.

399
00:27:41,140 --> 00:27:47,700
So just reading in the licensees again with one more field,

400
00:27:47,700 --> 00:27:54,820
the type, so that way we can look at the licensees.

401
00:27:54,820 --> 00:27:59,780
Specifically, we're going to exclude some of the minor

402
00:27:59,780 --> 00:28:04,540
licenses just to save space on the map.

403
00:28:04,540 --> 00:28:07,180
And you'll see here in a second.

404
00:28:07,180 --> 00:28:10,260
So long story short, the code's boring.

405
00:28:10,260 --> 00:28:12,340
So if you're interested in the code,

406
00:28:12,340 --> 00:28:15,980
I'll let you look through it afterwards on GitHub.

407
00:28:15,980 --> 00:28:23,620
But we're just making a simple base map.

408
00:28:23,620 --> 00:28:28,300
I'm specifying some latitude and longitude coordinates

409
00:28:28,300 --> 00:28:31,220
of Washington state.

410
00:28:31,220 --> 00:28:36,300
And then there are some formatting functions.

411
00:28:36,300 --> 00:28:43,660
But what's cool is we're basically, I'll run it,

412
00:28:43,660 --> 00:28:46,700
and then we can talk about the code.

413
00:28:46,700 --> 00:28:49,300
So basically what we're doing here

414
00:28:49,300 --> 00:28:56,620
is we're putting a bubble at each location.

415
00:28:56,620 --> 00:29:04,420
So for now, we're just going to color code these by license type

416
00:29:04,420 --> 00:29:05,700
here.

417
00:29:05,700 --> 00:29:10,340
And so I'll open this up as an actual figure,

418
00:29:10,340 --> 00:29:14,020
so that way we can see this in full here.
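A minimal matplotlib sketch of that kind of map: drop the minor license types, then scatter one bubble per licensee at its longitude and latitude, colored by license type. The license type labels, the rows, and the choice of which types count as "minor" are all hypothetical, and the axis limits are only a rough bounding box for Washington state.

```python
import matplotlib

matplotlib.use("Agg")  # render off-screen; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical licensee records with a license type and float coordinates.
licensees = pd.DataFrame({
    "name": ["Grower A", "Processor B", "Shop C", "Lab D"],
    "type": ["cultivator", "cultivator_production", "dispensary", "lab"],
    "latitude": [47.61, 46.60, 47.04, 47.25],
    "longitude": [-122.33, -120.51, -122.90, -122.44],
})

# Keep only the major license types (which ones are "major" is a judgment call).
MAJOR_TYPES = ["cultivator", "cultivator_production", "dispensary"]
major = licensees[licensees["type"].isin(MAJOR_TYPES)]

fig, ax = plt.subplots(figsize=(8, 5))
for license_type, group in major.groupby("type"):
    # One scatter call per type, so each type gets its own color and legend entry.
    ax.scatter(group["longitude"], group["latitude"], s=40, alpha=0.7, label=license_type)

# Rough bounding box for Washington state (approximate).
ax.set_xlim(-124.8, -116.9)
ax.set_ylim(45.5, 49.0)
ax.set_xlabel("Longitude")
ax.set_ylabel("Latitude")
ax.legend(title="License type")
# fig.savefig("licensees_map.png", dpi=150) or plt.show() to view it.
```

Passing `s=` a column such as total plants (scaled down) would turn the markers into true bubbles and add a fourth dimension to the same map.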

419
00:29:17,660 --> 00:29:19,620
Actually, that's very interesting.

420
00:29:19,620 --> 00:29:24,860
So here we have the first visualization.

421
00:29:24,860 --> 00:29:27,900
And what we were talking about last week

422
00:29:27,900 --> 00:29:30,980
is what's cool about data augmentation

423
00:29:30,980 --> 00:29:36,380
is we can begin to visualize multiple dimensions

424
00:29:36,380 --> 00:29:40,260
of the data in two dimensions.

425
00:29:40,260 --> 00:29:45,140
And so here we're visualizing the geographic dimension,

426
00:29:45,140 --> 00:29:50,660
latitude and longitude, and the type.

427
00:29:50,660 --> 00:29:58,460
So we're able to visualize three dimensions here on this one map.
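
Here is a minimal sketch of this kind of map for readers who want something to run before the GitHub code is posted. It assumes a pandas DataFrame with hypothetical columns `latitude`, `longitude`, and `type`. The Washington bounding box is approximate, and the excluded license types are placeholders. The meetup code appears to use a `hue` parameter (seaborn-style). This version draws one matplotlib scatter per license type instead, which gives the same color-by-type result.

```python
import matplotlib

matplotlib.use("Agg")  # off-screen rendering; omit in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Approximate Washington State bounds: (min lon, max lon, min lat, max lat).
WA_EXTENT = (-124.85, -116.9, 45.5, 49.05)


def plot_licensees_by_type(licensees, exclude=()):
    """Scatter licensees by location with one color per license type.

    Assumes hypothetical columns `latitude`, `longitude`, and `type`.
    License types listed in `exclude` are dropped to declutter the map.
    """
    data = licensees[~licensees["type"].isin(list(exclude))]
    fig, ax = plt.subplots(figsize=(10, 7))
    for license_type, group in data.groupby("type"):
        ax.scatter(group["longitude"], group["latitude"], s=12, alpha=0.7, label=license_type)
    ax.set_xlim(WA_EXTENT[0], WA_EXTENT[1])
    ax.set_ylim(WA_EXTENT[2], WA_EXTENT[3])
    ax.set_xlabel("Longitude")
    ax.set_ylabel("Latitude")
    ax.legend(title="License type")
    return fig, ax


if __name__ == "__main__":
    # Toy rows for illustration only.
    example = pd.DataFrame({
        "latitude": [47.61, 47.25, 46.60, 47.66],
        "longitude": [-122.33, -122.44, -120.51, -117.43],
        "type": ["retailer", "processor", "cultivator", "transporter"],
    })
    fig, ax = plot_licensees_by_type(example, exclude=["transporter"])
    fig.savefig("licensees_by_type.png")
```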

428
00:29:58,460 --> 00:30:03,140
And there are ways you can make this more complex.

429
00:30:03,140 --> 00:30:06,340
But any thoughts at first, Jerry?

430
00:30:06,340 --> 00:30:08,380
Or anyone?

431
00:30:08,380 --> 00:30:12,580
So the orange is the dispensaries.

432
00:30:12,580 --> 00:30:13,380
Let's see.

433
00:30:13,380 --> 00:30:17,380
Orange will be people who are doing processing.

434
00:30:17,380 --> 00:30:19,460
Processing.

435
00:30:19,460 --> 00:30:20,460
That's production.

436
00:30:20,460 --> 00:30:22,220
So OK.

437
00:30:22,220 --> 00:30:26,580
So the reddish color is dispensaries.

438
00:30:26,580 --> 00:30:27,780
Exactly.

439
00:30:27,780 --> 00:30:33,220
And I think it would be worthwhile to plot major cities here,

440
00:30:33,220 --> 00:30:39,020
because basically you have Seattle, Olympia.

441
00:30:39,020 --> 00:30:41,020
This is Vancouver.
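
The suggestion to plot major cities as reference points can be sketched as follows. The coordinates are rounded, approximate city-center positions. `label_cities` is a hypothetical helper, and it expects an axes already drawn in longitude/latitude, such as the licensee map.

```python
import matplotlib

matplotlib.use("Agg")
import matplotlib.pyplot as plt

# Rounded, approximate (latitude, longitude) of some Washington cities.
MAJOR_CITIES = {
    "Seattle": (47.61, -122.33),
    "Tacoma": (47.25, -122.44),
    "Olympia": (47.04, -122.90),
    "Vancouver": (45.64, -122.66),
    "Yakima": (46.60, -120.51),
    "Spokane": (47.66, -117.43),
}


def label_cities(ax, cities=MAJOR_CITIES):
    """Mark and name each city on an axes plotted in longitude/latitude."""
    for name, (lat, lon) in cities.items():
        ax.plot(lon, lat, marker="*", color="black", markersize=8, linestyle="none")
        ax.annotate(name, (lon, lat), xytext=(4, 4), textcoords="offset points", fontsize=9)
    return ax


if __name__ == "__main__":
    fig, ax = plt.subplots(figsize=(10, 7))
    label_cities(ax)
    fig.savefig("cities.png")
```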

442
00:30:41,020 --> 00:30:45,660
It's interesting that a lot of the stuff is close to water.

443
00:30:47,660 --> 00:30:52,180
Exactly, because a lot of the cities are close to the water.

444
00:30:52,180 --> 00:30:57,820
In particular, Seattle and Tacoma.

445
00:30:57,820 --> 00:30:59,740
And you've got the Cascade Mountains

446
00:30:59,740 --> 00:31:04,820
with that big empty area in the center, pretty much.

447
00:31:04,820 --> 00:31:09,220
And then to the east of the Cascades is desert.

448
00:31:09,220 --> 00:31:10,660
Very dry.

449
00:31:10,660 --> 00:31:13,620
It's an arid area.

450
00:31:13,620 --> 00:31:14,620
Exactly.

451
00:31:14,620 --> 00:31:20,460
And it's a good place for these cultivator producers,

452
00:31:20,460 --> 00:31:23,220
so people doing cultivation and production.

453
00:31:23,220 --> 00:31:29,940
And just a few outposts.

454
00:31:29,940 --> 00:31:32,540
So a lot more blue to the east of the mountains

455
00:31:32,540 --> 00:31:36,020
than to the west, it looks like.

456
00:31:36,020 --> 00:31:41,220
Definitely a lot more green, it looks like.

457
00:31:41,220 --> 00:31:45,540
You don't see too many green dots, the cultivators,

458
00:31:45,540 --> 00:31:49,340
here in Seattle.

459
00:31:49,340 --> 00:31:51,300
That's what I observe.

460
00:31:51,300 --> 00:31:56,500
You still see a lot of processors, it looks like.

461
00:31:56,500 --> 00:31:58,620
Or not a lot, but a handful.

462
00:31:58,620 --> 00:31:59,100
Right.

463
00:32:01,700 --> 00:32:06,540
So this is what we wanted to do last week.

464
00:32:06,540 --> 00:32:09,020
So we'll keep trailblazing.

465
00:32:09,020 --> 00:32:12,580
And so I'll get this data shared with you afterwards,

466
00:32:12,580 --> 00:32:15,500
so that way you can actually do this.

467
00:32:15,500 --> 00:32:20,380
Because believe it or not, I think in an old meetup,

468
00:32:20,380 --> 00:32:22,620
we geocoded these at one point.

469
00:32:22,620 --> 00:32:26,180
So I'll share the latest data with you,

470
00:32:26,180 --> 00:32:31,660
because it'll just have the latest licensees geocoded.

471
00:32:31,660 --> 00:32:40,340
If you also want to find the script yourself,

472
00:32:40,340 --> 00:32:47,980
basically I'm just using the Google Maps API

473
00:32:47,980 --> 00:32:56,660
and I'm just pinging them one by one with the address.

474
00:32:56,660 --> 00:33:01,140
So I'm just formatting it as street, city, state, zip,

475
00:33:01,140 --> 00:33:05,900
and asking them for the latitude and longitude

476
00:33:05,900 --> 00:33:07,900
and keeping track of that.

477
00:33:07,900 --> 00:33:11,100
But long story short.

478
00:33:11,100 --> 00:33:13,140
That's coming from Google Maps all the way.

479
00:33:13,140 --> 00:33:13,820
Exactly.

480
00:33:13,820 --> 00:33:16,820
And so they have their own terms.

481
00:33:16,820 --> 00:33:19,220
So they've got usage limits.

482
00:33:19,220 --> 00:33:23,220
So you may be able to make so many free calls,

483
00:33:23,220 --> 00:33:29,460
but they generally want you to be passing them an API key.

484
00:33:29,460 --> 00:33:33,220
And they will bill you to a certain extent

485
00:33:33,220 --> 00:33:34,900
if you make too many calls.

486
00:33:34,900 --> 00:33:38,340
But I think data is valuable.

487
00:33:38,340 --> 00:33:43,820
So you can get these latitude and longitude points.

488
00:33:43,820 --> 00:33:47,060
So feel free to use this function.

489
00:33:47,060 --> 00:33:49,780
It generally works with any data as long

490
00:33:49,780 --> 00:33:56,660
as it's a data frame with street, city, state, and zip.
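
The group's geocoding function lives in the GitHub repo and is not reproduced here. As a rough stand-in, this standard-library sketch does the same job one address at a time against Google's Geocoding API JSON endpoint. It assumes you have your own API key; calls are metered and may be billed under Google's terms. It also assumes rows with street, city, state, and zip keys. The helper names and the 0.2-second pause are assumptions.

```python
import json
import time
import urllib.parse
import urllib.request

# Google Geocoding API JSON endpoint; requires your own API key.
GEOCODE_URL = "https://maps.googleapis.com/maps/api/geocode/json"


def format_address(street, city, state, zip_code):
    """Join the parts as 'street, city, state, zip'."""
    return ", ".join(str(part) for part in (street, city, state, zip_code))


def parse_geocode_response(payload):
    """Return (lat, lng) for the first result, or None if nothing matched."""
    if payload.get("status") != "OK" or not payload.get("results"):
        return None
    location = payload["results"][0]["geometry"]["location"]
    return location["lat"], location["lng"]


def geocode(address, api_key, timeout=10):
    """Look up one address; each call counts against your quota."""
    query = urllib.parse.urlencode({"address": address, "key": api_key})
    with urllib.request.urlopen(f"{GEOCODE_URL}?{query}", timeout=timeout) as response:
        return parse_geocode_response(json.load(response))


def geocode_rows(rows, api_key, pause=0.2):
    """Geocode dict-like rows with street/city/state/zip keys, one at a time."""
    coordinates = []
    for row in rows:
        address = format_address(row["street"], row["city"], row["state"], row["zip"])
        coordinates.append(geocode(address, api_key))
        time.sleep(pause)  # simple throttle to stay well under rate limits
    return coordinates
```

With a pandas DataFrame, passing `df.to_dict("records")` to `geocode_rows` returns one (lat, lng) tuple, or None, per row in the original order. You can then split those values back into latitude and longitude columns.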

491
00:33:56,660 --> 00:33:59,540
Have you tried the Microsoft map?

492
00:33:59,540 --> 00:34:01,580
I haven't, but I've been thinking

493
00:34:01,580 --> 00:34:03,900
about looking at different tools,

494
00:34:03,900 --> 00:34:09,060
because you want to have different options for people.

495
00:34:09,060 --> 00:34:10,940
You don't want to just force people

496
00:34:10,940 --> 00:34:15,340
into this certain solution that may be paid.

497
00:34:15,340 --> 00:34:20,100
I'm sure Microsoft will find a way of monetizing it.

498
00:34:20,100 --> 00:34:23,620
For now, this is just an old function

499
00:34:23,620 --> 00:34:25,420
that we wrote a long time ago.

500
00:34:25,420 --> 00:34:28,860
So I use it.

501
00:34:28,860 --> 00:34:33,140
But we've got the licensees geocoded now.

502
00:34:33,140 --> 00:34:39,980
And so now I was going to show you a novel visualization.

503
00:34:39,980 --> 00:34:43,460
And I think it's pretty cool.

504
00:34:43,460 --> 00:34:47,220
So we've got our plants.

505
00:34:47,220 --> 00:34:52,460
And we've got the locations of all the cultivators.

506
00:34:52,460 --> 00:34:58,180
So I figured, OK, let's try to plot even more dimensions

507
00:34:58,180 --> 00:35:00,700
on this map here.

508
00:35:00,700 --> 00:35:08,460
So here, I'll just do it for the first observation.

509
00:35:08,460 --> 00:35:10,940
And then we'll do it for all of them here.

510
00:35:10,940 --> 00:35:16,900
So just going to be looking at the geocoded plants.

511
00:35:16,900 --> 00:35:21,260
And I'm just going to look at them month by month.

512
00:35:23,860 --> 00:35:27,340
Well, we can do day by day, but month by month

513
00:35:27,340 --> 00:35:31,460
just to shorten the number of observations

514
00:35:31,460 --> 00:35:33,660
for the time being.
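
Rolling plant records up to months might look like the sketch below. It assumes one row per plant, with hypothetical columns `licensee_id` and `created_at`. Each plant is counted in the month its record was created, which is only one possible definition. Counting the plants alive on a given date would also need planting and harvest dates.

```python
import pandas as pd


def monthly_plant_counts(plants):
    """Count plants per licensee per calendar month.

    Assumes one row per plant with hypothetical columns
    `licensee_id` and `created_at` (a date or date string).
    """
    data = plants.assign(month=pd.to_datetime(plants["created_at"]).dt.to_period("M"))
    return (
        data.groupby(["month", "licensee_id"])
        .size()
        .rename("total_plants")
        .reset_index()
    )
```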

515
00:35:33,660 --> 00:35:35,980
And so I'll show you what we're going to do here.

516
00:35:35,980 --> 00:35:40,620
So we just made this map, very similar code here.

517
00:35:40,620 --> 00:35:45,980
Except now, instead of the hue parameter,

518
00:35:45,980 --> 00:35:49,980
I'm going to be using the size parameter, s.

519
00:35:49,980 --> 00:35:56,660
And we just say, OK, size is a function of total plants.

520
00:35:56,660 --> 00:36:03,220
I'm scaling it by 10% just to keep the markers

521
00:36:03,220 --> 00:36:04,700
under control.

522
00:36:04,700 --> 00:36:07,100
You'll see here in a second.

523
00:36:07,100 --> 00:36:09,540
It's all just a function of plants.
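
Swapping color for size can be sketched with matplotlib's `scatter`. Its `s` argument is the marker area in points squared, so area (not radius) grows linearly with the plant count. The column names are hypothetical. The 0.1 factor mirrors the "scaling it by 10%" described here.

```python
import matplotlib

matplotlib.use("Agg")
import matplotlib.pyplot as plt
import pandas as pd


def plot_plants_for_month(counts, licensees, month, scale=0.1):
    """Draw one bubble per cultivator, sized by its plant count for `month`.

    `counts` has hypothetical columns month, licensee_id, total_plants;
    `licensees` has licensee_id, latitude, longitude.
    """
    month_counts = counts[counts["month"].astype(str) == str(month)]
    data = month_counts.merge(licensees, on="licensee_id", how="inner")
    fig, ax = plt.subplots(figsize=(10, 7))
    ax.scatter(
        data["longitude"],
        data["latitude"],
        s=data["total_plants"] * scale,  # marker area in points^2, scaled to 10%
        alpha=0.5,
        color="green",
        edgecolors="black",
        linewidths=0.3,
    )
    ax.set_title(f"Plants by cultivator, {month}")
    ax.set_xlabel("Longitude")
    ax.set_ylabel("Latitude")
    return fig, ax, data
```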

524
00:36:09,540 --> 00:36:14,980
And eventually, we'd like to put a legend on here.

525
00:36:14,980 --> 00:36:19,380
But I'll just show you the first plot.

526
00:36:24,540 --> 00:36:30,700
Let's make sure we have augmented our data correctly.

527
00:36:30,700 --> 00:36:38,460
Then try this plot one more time.

528
00:36:38,460 --> 00:36:43,580
So here, we're just plotting plants.

529
00:36:43,580 --> 00:36:49,340
And for some reason, it didn't import all of our helper

530
00:36:49,340 --> 00:36:51,820
functions.

531
00:36:51,820 --> 00:36:55,380
For some reason, we needed that one.

532
00:36:55,380 --> 00:37:09,980
Let's try to make this plot one more time.

533
00:37:09,980 --> 00:37:12,980
Cool.

534
00:37:12,980 --> 00:37:29,820
And so I think this is essentially, well, maybe this.

535
00:37:35,300 --> 00:37:41,580
Well, I hope we're getting saved now.

536
00:37:41,580 --> 00:37:49,340
But anyways, we can start looking at more of these plots

537
00:37:49,340 --> 00:37:49,940
here.

538
00:37:49,940 --> 00:37:52,540
And so basically, what we have here

539
00:37:52,540 --> 00:37:56,740
are plants at a particular date.

540
00:37:56,740 --> 00:38:18,700
And so, well, sometimes we just have

541
00:38:18,700 --> 00:38:22,220
to power through some of these technical problems.

542
00:38:22,220 --> 00:38:29,660
But I think we've got it mostly figured out here now.

543
00:38:29,660 --> 00:38:32,620
OK, this figure is looking a little better.

544
00:38:32,620 --> 00:38:34,540
So here it is full screen.

545
00:38:34,540 --> 00:38:39,980
So this is actually sort of a misleading month.

546
00:38:39,980 --> 00:38:44,060
So this is a month where we're doing a lot of data entry.

547
00:38:44,060 --> 00:38:49,900
So let's go ahead and let this play forward a few months.

548
00:38:49,900 --> 00:38:52,580
And we can start to see essentially

549
00:38:52,580 --> 00:38:56,180
where the canopy is over time.

550
00:38:56,180 --> 00:38:59,660
So I like to think about it as cannabis

551
00:38:59,660 --> 00:39:02,620
is this biological organism.

552
00:39:02,620 --> 00:39:15,660
And it's found this creative way to inhabit Washington State.

553
00:39:15,660 --> 00:39:20,620
It does, coincidentally, happen to be largely indoors.

554
00:39:20,620 --> 00:39:27,060
But here this plant found this ingenious...

555
00:39:27,060 --> 00:39:29,660
So you're not differentiating between indoor

556
00:39:29,660 --> 00:39:33,380
and outdoor cultivation?

557
00:39:33,380 --> 00:39:34,900
I am not.

558
00:39:34,900 --> 00:39:38,380
However, I would conjecture the vast majority

559
00:39:38,380 --> 00:39:40,860
is indoor cultivation.

560
00:39:40,860 --> 00:39:47,620
There's a small amount of outdoor cultivation done.

561
00:39:47,620 --> 00:39:51,260
And I know there is greenhouse cultivation done.

562
00:39:51,260 --> 00:39:54,340
But I just don't feel like it's the bulk.

563
00:39:54,340 --> 00:39:57,220
And this is just my naive.

564
00:39:57,220 --> 00:40:00,980
Would you consider the greenhouse to be indoor or outdoor?

565
00:40:00,980 --> 00:40:05,060
My naive prior is almost all of this is indoor.

566
00:40:05,060 --> 00:40:08,140
I'm sure there are people doing it outdoors.

567
00:40:08,140 --> 00:40:16,900
But it's just a different product to a certain extent.

568
00:40:16,900 --> 00:40:19,260
OK.

569
00:40:19,260 --> 00:40:23,780
Well, so it's hard to maintain the same quality when

570
00:40:23,780 --> 00:40:25,220
you do it outdoors.

571
00:40:25,220 --> 00:40:31,420
You do see people growing hemp outdoors on a large scale.

572
00:40:31,420 --> 00:40:38,820
And I don't know.

573
00:40:38,820 --> 00:40:42,260
So I think it's definitely a profitable thing

574
00:40:42,260 --> 00:40:44,060
if you're in the right part of the country.

575
00:40:44,060 --> 00:40:49,060
So just anecdotally, I know I've heard people say that, OK,

576
00:40:49,060 --> 00:40:53,660
a lot of the Californian cannabis is grown outdoors.

577
00:40:53,660 --> 00:40:59,020
But I'm just personally skeptical about places

578
00:40:59,020 --> 00:41:00,940
like Washington State.

579
00:41:00,940 --> 00:41:02,340
And we've got the Massachusetts.

580
00:41:02,340 --> 00:41:03,580
Cold and wet.

581
00:41:03,580 --> 00:41:05,580
Yeah.

582
00:41:05,580 --> 00:41:13,300
I'm just uncertain as to how much is actually grown outdoors.

583
00:41:13,300 --> 00:41:15,140
But that's just me talking.

584
00:41:15,140 --> 00:41:19,060
So this warrants a lot more investigation.

585
00:41:19,060 --> 00:41:25,060
And this would be sort of an investigative project.

586
00:41:25,060 --> 00:41:31,340
But we do have all of the licensees' data here.

587
00:41:31,340 --> 00:41:36,220
So I don't know if they would be thrilled about this.

588
00:41:36,220 --> 00:41:41,140
But you could potentially reach out to these licensees.

589
00:41:41,140 --> 00:41:43,020
Some of them may have websites.

590
00:41:43,020 --> 00:41:46,220
And just ask them, hey, do you do indoor?

591
00:41:46,220 --> 00:41:48,620
Do you do outdoor?

592
00:41:48,620 --> 00:41:52,460
Sort of try to conduct a survey.

593
00:41:52,460 --> 00:41:54,660
That would really be the only way you could

594
00:41:54,660 --> 00:41:56,820
get that data point.

595
00:41:56,820 --> 00:41:59,980
Nobody collects that information.

596
00:41:59,980 --> 00:42:00,900
Not to my knowledge.

597
00:42:03,980 --> 00:42:06,780
I think it would be a good data point to have.

598
00:42:06,780 --> 00:42:10,380
Obviously, there's demand for that data point.

599
00:42:10,380 --> 00:42:12,900
Well, and to your point, it's a lot easier

600
00:42:12,900 --> 00:42:14,860
to do quality control when it's indoors.

601
00:42:18,100 --> 00:42:20,340
There's pros and cons both ways.

602
00:42:20,340 --> 00:42:24,860
So the major con to indoors is just the high energy

603
00:42:24,860 --> 00:42:27,380
intensity.

604
00:42:27,380 --> 00:42:33,860
So you're basically replicating the sun with indoor lights.

605
00:42:33,860 --> 00:42:38,260
And the sun is remarkably good at doing what it does.

606
00:42:38,260 --> 00:42:41,300
And it just does it for free.

607
00:42:41,300 --> 00:42:44,580
But at the same time, you are controlling

608
00:42:44,580 --> 00:42:47,380
for things like pests.

609
00:42:47,380 --> 00:42:51,740
You're keeping a really standardized climate.

610
00:42:51,740 --> 00:42:53,860
So it's easy to control.

611
00:42:53,860 --> 00:42:55,820
Because that's what a lot of people say

612
00:42:55,820 --> 00:42:57,500
is the name of the game:

613
00:42:57,500 --> 00:43:02,060
climate control, or environment control.

614
00:43:02,060 --> 00:43:08,260
Just anecdotally, people

615
00:43:08,260 --> 00:43:14,900
have said that the more standardized and regular it is,

616
00:43:14,900 --> 00:43:17,300
the more it becomes a manufacturing product

617
00:43:17,300 --> 00:43:20,420
than an agricultural product in a lot of ways.

618
00:43:20,420 --> 00:43:24,340
Well, I think that's a lot of the ways people are doing

619
00:43:24,340 --> 00:43:25,900
agriculture these days.

620
00:43:25,900 --> 00:43:28,660
They're really methodical about it.

621
00:43:28,660 --> 00:43:33,860
So they're measuring it and just trying

622
00:43:33,860 --> 00:43:35,780
to find out what works, what doesn't,

623
00:43:35,780 --> 00:43:38,540
and just trying to be as efficient as possible.

624
00:43:38,540 --> 00:43:47,340
And you see that happening a lot with Canada's cultivation.

625
00:43:47,340 --> 00:43:49,220
It's just different.

626
00:43:49,220 --> 00:43:52,180
And like I said, I don't want to say this is the end-all, be-all.

627
00:43:52,180 --> 00:43:54,020
Because you do see a lot of people

628
00:43:54,020 --> 00:43:58,060
doing it in Oregon and Washington.

629
00:43:58,060 --> 00:44:02,860
But that's just not captured in this data set.

630
00:44:02,860 --> 00:44:05,220
You were talking about data augmentation.

631
00:44:05,220 --> 00:44:11,980
It would be really interesting to overlay this cultivation.

632
00:44:11,980 --> 00:44:15,060
So this would be called the I-502.

633
00:44:15,060 --> 00:44:18,140
So this would be the recreational cannabis.

634
00:44:18,140 --> 00:44:21,140
It would be really interesting to overlay this

635
00:44:21,140 --> 00:44:25,820
with a map of hemp production.

636
00:44:25,820 --> 00:44:31,660
So that way you can see maybe if people

637
00:44:31,660 --> 00:44:36,100
are doing hemp in different locations than they're doing

638
00:44:36,100 --> 00:44:37,820
potentially indoor cannabis.

639
00:44:42,300 --> 00:44:43,740
Awesome discussion there, Jerry.

640
00:44:43,740 --> 00:44:45,980
Definitely let me know if any more thoughts

641
00:44:45,980 --> 00:44:47,180
come to mind.

642
00:44:47,180 --> 00:44:51,660
But basically, I'm just going to quickly start generating

643
00:44:51,660 --> 00:44:53,540
these charts one by one.

644
00:44:53,540 --> 00:44:56,100
And then I'll start to tell you about sort

645
00:44:56,100 --> 00:44:58,900
of the grand scheme of all of this.

646
00:44:58,900 --> 00:45:04,740
So here we are just generating these charts through time.
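
Generating the charts one month at a time can be sketched as a loop that saves one image per month. The key design choice is to hold the axis limits and the size scale fixed across months. Otherwise, bubbles cannot be compared between frames. The column names (month, licensee_id, total_plants, latitude, longitude) are hypothetical.

```python
import os
import tempfile

import matplotlib

matplotlib.use("Agg")  # render to files without a display
import matplotlib.pyplot as plt
import pandas as pd

# Approximate Washington State bounds, held fixed so frames line up.
WA_EXTENT = (-124.85, -116.9, 45.5, 49.05)


def save_monthly_frames(counts, licensees, out_dir, scale=0.1):
    """Save one PNG per month of plants-by-cultivator bubbles.

    Returns the file paths in month order.
    """
    os.makedirs(out_dir, exist_ok=True)
    data = counts.merge(licensees, on="licensee_id", how="inner")
    paths = []
    for month, group in data.groupby("month"):
        fig, ax = plt.subplots(figsize=(10, 7))
        ax.scatter(
            group["longitude"],
            group["latitude"],
            s=group["total_plants"] * scale,  # same scale every month
            alpha=0.5,
            color="green",
        )
        ax.set_xlim(WA_EXTENT[0], WA_EXTENT[1])
        ax.set_ylim(WA_EXTENT[2], WA_EXTENT[3])
        ax.set_title(f"Plants by cultivator, {month}")
        path = os.path.join(out_dir, f"plants_{month}.png")
        fig.savefig(path, dpi=100)
        plt.close(fig)  # avoid piling up open figures in a long loop
        paths.append(path)
    return paths


if __name__ == "__main__":
    # Toy data for illustration only.
    counts = pd.DataFrame({
        "month": ["2021-06", "2021-06", "2021-07"],
        "licensee_id": ["a", "b", "a"],
        "total_plants": [1000, 200, 50],
    })
    licensees = pd.DataFrame({
        "licensee_id": ["a", "b"],
        "latitude": [46.60, 47.66],
        "longitude": [-120.51, -117.43],
    })
    with tempfile.TemporaryDirectory() as tmp:
        print(save_monthly_frames(counts, licensees, tmp))
```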

647
00:45:04,740 --> 00:45:12,420
And we may have to increase the size of these squares.

648
00:45:12,420 --> 00:45:15,220
I kind of shrunk them quite a lot.

649
00:45:15,220 --> 00:45:25,940
So let's maybe even crank these sizes up a bit here.

650
00:45:25,940 --> 00:45:26,420
OK.

651
00:45:40,220 --> 00:45:40,740
OK.

652
00:45:40,740 --> 00:45:43,140
Then we can just run through these real quick.

653
00:45:43,140 --> 00:45:45,060
Anyone else from the group?

654
00:45:45,060 --> 00:45:47,620
Any questions, comments about sort of the work

655
00:45:47,620 --> 00:45:48,860
we've done here today?

656
00:45:48,860 --> 00:45:51,900
So this is just sort of a first pass

657
00:45:51,900 --> 00:45:55,420
at looking at some plant statistics here

658
00:45:55,420 --> 00:45:56,940
in Washington state.

661
00:46:06,420 --> 00:46:12,220
And so basically, the first thought that came to my mind

662
00:46:12,220 --> 00:46:16,180
was, OK, let's just get a count of the number of plants

663
00:46:16,180 --> 00:46:17,700
in Washington state.

664
00:46:17,700 --> 00:46:20,380
And just look at them essentially

665
00:46:20,380 --> 00:46:22,260
in a spacetime dimension.

666
00:46:22,260 --> 00:46:26,060
So we can just look at them in space,

667
00:46:26,060 --> 00:46:28,820
so where they're happening in Washington state.

668
00:46:28,820 --> 00:46:32,140
And then over time, so here we're

669
00:46:32,140 --> 00:46:34,700
looking at them month by month.

670
00:46:34,700 --> 00:46:42,460
And so we can just see how they're doing over time.

671
00:46:42,460 --> 00:46:46,180
It would be interesting to plot this against sales.

672
00:46:46,180 --> 00:46:47,660
Sure.

673
00:46:47,660 --> 00:46:52,860
See how cultivation should lead sales.

674
00:46:55,820 --> 00:46:56,300
OK.

675
00:46:56,300 --> 00:47:01,300
So you're thinking similar bubbles

676
00:47:01,300 --> 00:47:06,540
for how long it takes to get from the production

677
00:47:06,540 --> 00:47:09,700
phase to the consumer.

678
00:47:09,700 --> 00:47:13,340
I love how you think, Jerry, because that's basically

679
00:47:13,340 --> 00:47:17,300
just data augmentation on data augmentation

680
00:47:17,300 --> 00:47:21,140
and just adding more and more dimensions to this.

681
00:47:21,140 --> 00:47:25,300
Because if we can add the retail dimension,

682
00:47:25,300 --> 00:47:28,780
then like you said, you can see how

683
00:47:28,780 --> 00:47:31,700
this evolves over time.

684
00:47:31,700 --> 00:47:38,060
And you can see how cultivation will bubble up.

685
00:47:38,060 --> 00:47:46,180
And that may be followed by or preceded by bubbles in retail.
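
A lagged correlation is a simple first check on whether cultivation leads or follows retail. This sketch assumes two hypothetical monthly pandas Series on the same index, one of plant totals and one of sales totals. It picks the shift with the highest Pearson correlation. With only a few dozen months, a high correlation at some lag can easily be spurious. Treat the result as a lead to investigate, not a finding.

```python
import pandas as pd


def best_lag(plants, sales, max_lag=6):
    """Return (lag, correlations), where lag is the shift, in periods, at
    which plant counts correlate most strongly with sales.

    A positive lag k compares sales in month t with plants in month t - k,
    i.e. cultivation leading sales by k months. Both Series must share the
    same monthly index; Series.corr drops NaN pairs created by the shift.
    Raises ValueError if no lag yields a defined correlation.
    """
    correlations = {}
    for lag in range(max_lag + 1):
        r = sales.corr(plants.shift(lag))
        if pd.notna(r):
            correlations[lag] = r
    lag = max(correlations, key=correlations.get)
    return lag, correlations
```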

686
00:47:49,500 --> 00:47:54,020
So that's brilliant.

687
00:47:54,020 --> 00:48:00,740
So that will add a lot more, at least another dimension.

688
00:48:04,900 --> 00:48:08,460
More than that, depending on how you classify dimensions.

689
00:48:08,460 --> 00:48:16,220
But it's just another demonstration

690
00:48:16,220 --> 00:48:20,180
about how the more you augment data,

691
00:48:20,180 --> 00:48:27,420
the more you're able to visualize it and essentially tell

692
00:48:27,420 --> 00:48:29,820
stories and inform people.

693
00:48:29,820 --> 00:48:35,220
So just whatever your goal may be with the data.

694
00:48:35,220 --> 00:48:41,300
So right now, we're just doing an exploratory exercise here.

695
00:48:41,300 --> 00:48:49,580
So next week, we'll start diving into some of these sales items.

696
00:48:49,580 --> 00:48:55,620
And so maybe we can try to augment these maps with sales

697
00:48:55,620 --> 00:48:57,820
by retailer over time.

698
00:48:57,820 --> 00:49:01,580
So here are plants by cultivator.

699
00:49:01,580 --> 00:49:04,740
So now we can try to add sales by retailer.
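
Layering retail sales on top of the cultivation map could be sketched as two scatter layers on one axes. Retailer sales data is not loaded in this session, so the `sales` column and both scale factors are assumptions.

```python
import matplotlib

matplotlib.use("Agg")
import matplotlib.pyplot as plt
import pandas as pd


def plot_plants_and_sales(cultivators, retailers, plant_scale=0.1, sales_scale=0.001):
    """Overlay cultivator plant counts and retailer sales on one map.

    Hypothetical columns: cultivators(latitude, longitude, total_plants),
    retailers(latitude, longitude, sales). The scale factors are arbitrary
    and only keep the two bubble layers readable side by side.
    """
    fig, ax = plt.subplots(figsize=(10, 7))
    ax.scatter(
        cultivators["longitude"], cultivators["latitude"],
        s=cultivators["total_plants"] * plant_scale,
        color="green", alpha=0.4, label="Plants (cultivators)",
    )
    ax.scatter(
        retailers["longitude"], retailers["latitude"],
        s=retailers["sales"] * sales_scale,
        color="red", marker="s", alpha=0.4, label="Sales (retailers)",
    )
    ax.set_xlabel("Longitude")
    ax.set_ylabel("Latitude")
    ax.legend(loc="upper right")
    return fig, ax
```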

700
00:49:04,740 --> 00:49:09,740
I wonder why you go from 359 cultivators in June

701
00:49:09,740 --> 00:49:12,300
to 46 cultivators in July.

702
00:49:12,300 --> 00:49:15,060
There are two things that could be going on here.

703
00:49:15,060 --> 00:49:19,300
One, I stopped counting it at a certain point.

704
00:49:19,300 --> 00:49:24,540
So I may have just interrupted this routine

705
00:49:24,540 --> 00:49:27,340
while this was mid-count.

706
00:49:27,340 --> 00:49:30,180
So that's one rookie answer.

707
00:49:30,180 --> 00:49:33,700
And the second is there's something going on here

708
00:49:33,700 --> 00:49:37,500
with these total numbers of cultivators.

709
00:49:37,500 --> 00:49:42,500
And total plants.

710
00:49:42,500 --> 00:49:45,740
And so I think it could just be how people are keeping

711
00:49:45,740 --> 00:49:47,740
track of their data.

712
00:49:47,740 --> 00:49:58,860
So it doesn't really make sense for 30 cultivators just

713
00:49:58,860 --> 00:50:02,980
to go without plants.

714
00:50:02,980 --> 00:50:08,700
So there could be oddities to how these cultivators actually

715
00:50:08,700 --> 00:50:12,420
record their data.

716
00:50:12,420 --> 00:50:16,900
So I'm not ruling that out, that there's just weird things

717
00:50:16,900 --> 00:50:19,460
going on with the way people are keeping

718
00:50:19,460 --> 00:50:22,860
track of these cultivations.
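
A drop like 359 cultivators in June to 46 in July is worth catching automatically before anyone trusts the chart. A minimal standard-library check is below. The 50% threshold is an arbitrary assumption. A flagged month only shows that something changed. It does not tell you whether the cause was an interrupted count or a gap in traceability data entry.

```python
def flag_sudden_drops(counts_by_month, threshold=0.5):
    """Return (month, previous, current) for months whose count fell by more
    than `threshold` (a fraction, 0.5 = 50%) versus the month before.

    `counts_by_month` is a list of (month_label, count) pairs in date order.
    """
    flagged = []
    for (_, previous), (month, current) in zip(counts_by_month, counts_by_month[1:]):
        if previous > 0 and (previous - current) / previous > threshold:
            flagged.append((month, previous, current))
    return flagged


if __name__ == "__main__":
    # Cultivator counts mentioned in the meetup.
    print(flag_sudden_drops([("June", 359), ("July", 46)]))
```

With the counts from the meetup, `flag_sudden_drops([("June", 359), ("July", 46)])` returns `[("July", 359, 46)]`, since 313 of 359 is roughly an 87% drop.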

719
00:50:22,860 --> 00:50:25,940
However, I think that's a good point.

722
00:50:33,140 --> 00:50:36,620
However, I do actually think this last one,

723
00:50:36,620 --> 00:50:38,820
I think this last chart, you may just

724
00:50:38,820 --> 00:50:41,220
want to scrap this for the time being.

725
00:50:41,220 --> 00:50:42,620
Not to an outlier.

726
00:50:42,620 --> 00:50:43,580
Oh, actually.

727
00:50:46,260 --> 00:50:47,500
Yeah.

728
00:50:47,500 --> 00:50:52,460
So this either was a miscount or there was one period

729
00:50:52,460 --> 00:50:57,020
where the Washington State traceability system sort

730
00:50:57,020 --> 00:51:01,180
of stopped updating and there wasn't much data entry.

731
00:51:01,180 --> 00:51:04,220
And so this could have been that time period.

732
00:51:04,220 --> 00:51:06,020
I don't think it was.

733
00:51:06,020 --> 00:51:09,580
I think this is just, I think I just stopped counting this

734
00:51:09,580 --> 00:51:14,220
halfway because I needed to have some material prepared

735
00:51:14,220 --> 00:51:16,620
for today's presentation.

736
00:51:16,620 --> 00:51:20,140
And so after the meetup, I'm going

737
00:51:20,140 --> 00:51:23,020
to finish running this algorithm.

738
00:51:23,020 --> 00:51:25,180
And you're welcome to run it too.

739
00:51:25,180 --> 00:51:28,900
I'll run it with you once it's finished.

740
00:51:28,900 --> 00:51:34,660
But essentially the idea is you can walk away

741
00:51:34,660 --> 00:51:40,620
at the end of the day with a series of these figures.

742
00:51:43,220 --> 00:51:51,140
And I think it would be cool to create a video out of this,

743
00:51:51,140 --> 00:51:52,900
sort of a video of this.

744
00:51:52,900 --> 00:51:54,620
You've got a little animation.
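
Turning the monthly maps into an animation can be sketched with matplotlib's `FuncAnimation`, saved as a GIF through `PillowWriter`. This requires Pillow, which current matplotlib releases install as a dependency. The `frames` structure is a hypothetical format: one (label, longitudes, latitudes, plant counts) tuple per month, each with at least one point.

```python
import os
import tempfile

import matplotlib

matplotlib.use("Agg")
import matplotlib.pyplot as plt
from matplotlib.animation import FuncAnimation, PillowWriter

# Approximate Washington State bounds, kept fixed so frames line up.
EXTENT = (-124.85, -116.9, 45.5, 49.05)


def animate_months(frames, out_path, scale=0.1, fps=2):
    """Write an animated GIF with one frame per month."""
    fig, ax = plt.subplots(figsize=(8, 6))
    ax.set_xlim(EXTENT[0], EXTENT[1])
    ax.set_ylim(EXTENT[2], EXTENT[3])
    ax.set_xlabel("Longitude")
    ax.set_ylabel("Latitude")
    bubbles = ax.scatter([], [], alpha=0.5, color="green")
    title = ax.set_title("")

    def update(i):
        # Move and resize the same scatter artist for month i.
        label, lons, lats, counts = frames[i]
        bubbles.set_offsets(list(zip(lons, lats)))
        bubbles.set_sizes([count * scale for count in counts])
        title.set_text(f"Plants by cultivator, {label}")
        return bubbles, title

    animation = FuncAnimation(fig, update, frames=len(frames))
    animation.save(out_path, writer=PillowWriter(fps=fps))
    plt.close(fig)
    return out_path


if __name__ == "__main__":
    # Toy frames for illustration only.
    demo = [
        ("Month 1", [-120.5, -117.4], [46.6, 47.7], [1000, 200]),
        ("Month 2", [-120.5], [46.6], [50]),
    ]
    with tempfile.TemporaryDirectory() as tmp:
        print(animate_months(demo, os.path.join(tmp, "plants.gif")))
```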

745
00:51:54,620 --> 00:51:55,460
Exactly.

746
00:51:55,460 --> 00:52:02,580
So I was thinking you see things like the trees

747
00:52:02,580 --> 00:52:06,180
breathing in and out.

748
00:52:06,180 --> 00:52:08,700
Well, this is sort of cannabis.

749
00:52:12,140 --> 00:52:16,660
You can think about this from a biologist's point of view

750
00:52:16,660 --> 00:52:24,180
where you see species migrate and populations wax and wane

751
00:52:24,180 --> 00:52:26,540
over time.

752
00:52:26,540 --> 00:52:28,140
And so I just think it's interesting

753
00:52:28,140 --> 00:52:33,780
that you can see cannabis in Washington state

754
00:52:33,780 --> 00:52:40,100
wax and wane over time across different parts of the state.

755
00:52:40,100 --> 00:52:45,540
And largely, this is a human phenomenon.

756
00:52:45,540 --> 00:52:48,580
Like I said, it wouldn't surprise me

757
00:52:48,580 --> 00:52:54,660
if cannabis just would grow here maybe in splotches here

758
00:52:54,660 --> 00:52:58,340
and there because it is such a lush climate.

759
00:52:58,340 --> 00:53:01,580
But I would say a lot of this canopy

760
00:53:01,580 --> 00:53:07,420
is because of humans maintaining these cultivations.

761
00:53:07,420 --> 00:53:11,140
So I just thought it was just an interesting observation

762
00:53:11,140 --> 00:53:17,780
that cannabis has just found a unique niche

763
00:53:17,780 --> 00:53:22,700
and way to thrive here in the Pacific Northwest.

764
00:53:22,700 --> 00:53:25,180
Thanks for bearing with me through all those interruptions.

765
00:53:25,180 --> 00:53:27,260
But I think we got on track there.

766
00:53:27,260 --> 00:53:32,020
And let me know what you think of it.

767
00:53:32,020 --> 00:53:36,380
But I think we're just seeing the tip of the iceberg

768
00:53:36,380 --> 00:53:39,380
of some real interesting analysis here.

769
00:53:39,380 --> 00:53:42,460
So that was just a first pass.

770
00:53:42,460 --> 00:53:47,900
The data still needs to finish being collected.

771
00:53:47,900 --> 00:53:50,340
Well, the algorithm needs to finish running.

772
00:53:50,340 --> 00:53:53,180
Then I'll share that with you because I

773
00:53:53,180 --> 00:53:55,580
think these are valuable data points to have.

774
00:53:55,580 --> 00:54:01,660
So let me know if you think of any other interesting ways

775
00:54:01,660 --> 00:54:03,300
to augment this data.

776
00:54:03,300 --> 00:54:05,540
I love the sales idea.

777
00:54:05,540 --> 00:54:07,740
So we can add that to the figure.

778
00:54:07,740 --> 00:54:09,820
And let's just keep adding layers.

779
00:54:09,820 --> 00:54:13,940
I think there's a lot we can do with this.

780
00:54:13,940 --> 00:54:17,180
Unfortunately, I guess it's all just

781
00:54:17,180 --> 00:54:17,940
so new to me.

782
00:54:17,940 --> 00:54:19,580
I don't really have anything to add.

783
00:54:19,580 --> 00:54:22,260
But I'm intrigued enough to continue.

784
00:54:22,260 --> 00:54:28,340
Do you have your meetings at the same time every week?

785
00:54:28,340 --> 00:54:29,140
Exactly.

786
00:54:29,140 --> 00:54:33,700
So you can catch Cannabis Data Science on Wednesdays,

787
00:54:33,700 --> 00:54:38,100
8:30 AM Pacific time, and then 11:30

788
00:54:38,100 --> 00:54:41,420
AM Eastern Standard Time.

789
00:54:41,420 --> 00:54:43,020
And yeah, exactly.

790
00:54:43,020 --> 00:54:45,580
I'll get the code posted to GitHub.

791
00:54:45,580 --> 00:54:47,940
So that way, you can follow through there.

792
00:54:47,940 --> 00:54:50,660
And then, yeah, feel free to let me

793
00:54:50,660 --> 00:54:52,220
know if you have any ideas or anything

794
00:54:52,220 --> 00:54:53,380
you want covered in the group.

795
00:54:53,380 --> 00:54:56,300
Because it goes.

796
00:54:56,300 --> 00:54:58,420
You mentioned the speed of the code.

797
00:54:58,420 --> 00:54:59,580
I am interested.

798
00:54:59,580 --> 00:55:03,940
There's been a push towards this other language called Julia.

799
00:55:03,940 --> 00:55:09,020
So I'm wondering if it would run a lot faster than that.

800
00:55:09,020 --> 00:55:11,340
I heard about Julia a long time ago.

801
00:55:11,340 --> 00:55:16,700
I haven't heard too much noise.

802
00:55:16,700 --> 00:55:19,100
I don't necessarily have my ear to the ground.

803
00:55:22,140 --> 00:55:25,260
There's a few things that I've been interested in.

804
00:55:25,260 --> 00:55:27,940
I've just been lucky enough that Python

805
00:55:27,940 --> 00:55:30,420
has found a bit of a popularity.

806
00:55:30,420 --> 00:55:34,380
I know it's got its downsides.

807
00:55:34,380 --> 00:55:36,460
But I just say, oh, it's just a good job.

808
00:55:36,460 --> 00:55:38,540
I got to go, guys.

809
00:55:38,540 --> 00:55:40,060
Thank you very much.

810
00:55:40,060 --> 00:55:42,460
See you, Jerry.

811
00:55:42,460 --> 00:55:48,420
But yeah, definitely, we're not set in stone

812
00:55:48,420 --> 00:55:50,340
with any one tool or the other.

813
00:55:50,340 --> 00:55:55,060
So if you have any good tools or uses

814
00:55:55,060 --> 00:55:58,100
or you want to share your work, you're always welcome to.

815
00:55:58,100 --> 00:56:00,540
Because this is a platform for everyone.

816
00:56:00,540 --> 00:56:02,940
So you're always welcome to.

817
00:56:02,940 --> 00:56:05,260
If you did crunch some interesting numbers,

818
00:56:05,260 --> 00:56:07,940
you're really welcome to share them with the group.

819
00:56:07,940 --> 00:56:09,060
Oh, cool.

820
00:56:09,060 --> 00:56:14,420
Yeah, I'll try to make it to next week's one.

821
00:56:14,420 --> 00:56:19,340
And I'll try to find you on LinkedIn.

822
00:56:19,340 --> 00:56:24,580
You can find Cannlytics on LinkedIn.

823
00:56:24,580 --> 00:56:27,580
It's the data analytics company I started.

824
00:56:27,580 --> 00:56:31,340
So that's sort of where I spend the bulk of my time.

825
00:56:31,340 --> 00:56:36,260
And they help just get the videos up,

826
00:56:36,260 --> 00:56:39,220
make sure everything stays online and up and running.

827
00:56:39,220 --> 00:56:43,020
So that's sort of the backbone of the group.

828
00:56:43,020 --> 00:56:49,500
But the main presence is the Meetup page for now.

829
00:56:49,500 --> 00:56:54,700
But we've put together some good work on the Cannlytics website.

830
00:56:54,700 --> 00:56:59,700
You can actually find the paper from one of our team members,

831
00:56:59,700 --> 00:57:00,980
Paul Kitko.

832
00:57:00,980 --> 00:57:05,220
So he was a regular while he was pursuing his master's

833
00:57:05,220 --> 00:57:06,540
in data science.

834
00:57:06,540 --> 00:57:09,020
He's in hot demand right now.

835
00:57:09,020 --> 00:57:12,900
So he's basically a hot cake since he's

836
00:57:12,900 --> 00:57:15,340
got a master's in data science.

837
00:57:15,340 --> 00:57:19,660
So we were lucky enough to have him write the paper with sales

838
00:57:19,660 --> 00:57:24,180
data in Washington state, which we'll be exploring next week.

839
00:57:24,180 --> 00:57:28,140
So if you want a teaser, check out some of his work.

840
00:57:28,140 --> 00:57:29,660
All right.

841
00:57:29,660 --> 00:57:31,060
Awesome.

842
00:57:31,060 --> 00:57:31,580
Awesome.

843
00:57:31,580 --> 00:57:33,460
Well, keep your nose to the grindstone.

844
00:57:33,460 --> 00:57:35,220
And I hope you have an awesome week.

845
00:57:35,220 --> 00:57:42,140
Full

