1
00:00:00,000 --> 00:00:15,280
And I guess just to introduce you to Charles, this is Paul.

2
00:00:15,280 --> 00:00:16,880
Paul, this is Charles.

3
00:00:16,880 --> 00:00:17,880
Hi.

4
00:00:17,880 --> 00:00:27,520
Charles has been in the data science field for, he's a veteran and he knows his way around

5
00:00:27,520 --> 00:00:30,840
data.

6
00:00:30,840 --> 00:00:37,680
Charles, Paul is getting, correct me if I'm wrong, a master's in data science.

7
00:00:37,680 --> 00:00:38,680
Yeah.

8
00:00:38,680 --> 00:00:42,640
Why don't you introduce yourselves?

9
00:00:42,640 --> 00:00:43,640
Sure.

10
00:00:43,640 --> 00:00:46,480
Okay, I can go first, Charles.

11
00:00:46,480 --> 00:00:49,560
Yeah, pleasure to meet you.

12
00:00:49,560 --> 00:00:54,600
Yeah, I'm just wrapping up a master's in data science from the University of Wisconsin.

13
00:00:54,600 --> 00:00:57,160
I'm a data scientist at General Motors.

14
00:00:57,160 --> 00:01:05,240
I work in our global vehicle safety division and we use vehicle telematics to try and

15
00:01:05,240 --> 00:01:11,960
uncover safety issues with vehicles before our customers experience those things to the

16
00:01:11,960 --> 00:01:14,000
greatest extent that we can.

17
00:01:14,000 --> 00:01:15,760
I'm part of a pretty large team.

18
00:01:15,760 --> 00:01:18,440
There's about a dozen data scientists on the team.

19
00:01:18,440 --> 00:01:23,040
We're kind of split between unstructured data and structured data.

20
00:01:23,040 --> 00:01:31,000
We do a lot of text analysis of customer feedback and technician feedback from dealership services.

21
00:01:31,000 --> 00:01:36,680
We look at that data and then we also use OnStar, our telematics data, to try and understand

22
00:01:36,680 --> 00:01:39,320
what's going on with customers' vehicles.

23
00:01:39,320 --> 00:01:43,120
That's kind of my background in data science.

24
00:01:43,120 --> 00:01:47,620
Before that, I got my start in business intelligence.

25
00:01:47,620 --> 00:01:51,640
Before that, in IT project management and before that, I was an officer in the Air Force

26
00:01:51,640 --> 00:01:55,320
for a while, so that's a bit about my background.

27
00:01:55,320 --> 00:01:56,320
Okay.

28
00:01:56,320 --> 00:02:04,040
So, yeah, I'm Charles and I've been a programmer for like over 27 years.

29
00:02:04,040 --> 00:02:09,960
Over kind of the last year, year and a half, I've been getting into data science.

30
00:02:09,960 --> 00:02:18,840
I took some graduate level classes in machine learning in the early 2000s.

31
00:02:18,840 --> 00:02:25,320
I worked for Ford on a system that tested vehicles at the end of the assembly line and

32
00:02:25,320 --> 00:02:26,800
it would detect defects.

33
00:02:26,800 --> 00:02:27,800
Yeah, yeah.

34
00:02:27,800 --> 00:02:28,800
Okay.

35
00:02:28,800 --> 00:02:31,040
I know exactly. We have the same thing at GM.

36
00:02:31,040 --> 00:02:35,000
Right as they come off the line, they plug them in and run all kinds of diagnostic tests

37
00:02:35,000 --> 00:02:37,280
on it, so I'm sure it's the same type of thing.

38
00:02:37,280 --> 00:02:38,280
So are you in Detroit?

39
00:02:38,280 --> 00:02:39,640
Yeah, I'm just north.

40
00:02:39,640 --> 00:02:43,360
I'm about, I'd say about an hour north of Detroit.

41
00:02:43,360 --> 00:02:44,360
Okay.

42
00:02:44,360 --> 00:02:45,640
I went to Wayne State.

43
00:02:45,640 --> 00:02:46,640
Oh, yeah.

44
00:02:46,640 --> 00:02:51,280
I know my wife, she did some of her schooling there.

45
00:02:51,280 --> 00:02:56,080
I've been down to Wayne State a couple of times for some predictive analytics conferences.

46
00:02:56,080 --> 00:02:57,080
So yeah, good school.

47
00:02:57,080 --> 00:02:58,080
Yeah.

48
00:02:58,080 --> 00:03:01,480
Yeah, I got a degree in math from there.

49
00:03:01,480 --> 00:03:02,480
Okay.

50
00:03:02,480 --> 00:03:06,720
Did, so are you from here, Southeast Michigan?

51
00:03:06,720 --> 00:03:07,720
Yeah.

52
00:03:07,720 --> 00:03:12,440
Yeah, I grew up in like Detroit, Roseville.

53
00:03:12,440 --> 00:03:15,200
My wife grew up in Roseville.

54
00:03:15,200 --> 00:03:20,160
Yeah, she grew up in Roseville.

55
00:03:20,160 --> 00:03:27,320
She's Middle Eastern by descent and her family emigrated in the early 80s and she went to

56
00:03:27,320 --> 00:03:29,240
Roseville High School and everything.

57
00:03:29,240 --> 00:03:30,240
So yeah.

58
00:03:30,240 --> 00:03:31,240
I went to Fraser High.

59
00:03:31,240 --> 00:03:33,520
I was like at the north end of Roseville.

60
00:03:33,520 --> 00:03:34,520
Okay.

61
00:03:34,520 --> 00:03:35,520
Wow, such a small world.

62
00:03:35,520 --> 00:03:38,360
Are you still in Southeast Michigan or are you somewhere else now?

63
00:03:38,360 --> 00:03:39,840
I'm in Portland, Oregon.

64
00:03:39,840 --> 00:03:40,840
Oh, okay.

65
00:03:40,840 --> 00:03:43,800
So you're just in the same neck of the woods as Keegan then?

66
00:03:43,800 --> 00:03:44,800
Yeah.

67
00:03:44,800 --> 00:03:45,800
Okay.

68
00:03:45,800 --> 00:03:46,800
All right.

69
00:03:46,800 --> 00:03:47,800
That's cool.

70
00:03:47,800 --> 00:03:48,800
It's good to meet you.

71
00:03:48,800 --> 00:03:49,800
It's a small world, huh?

72
00:03:49,800 --> 00:03:50,800
It is.

73
00:03:50,800 --> 00:03:51,800
Yeah.

74
00:03:51,800 --> 00:03:58,000
That's so funny because both of you seem to have deep programming backgrounds.

75
00:03:58,000 --> 00:03:59,000
I'm more of a hacker.

76
00:03:59,000 --> 00:04:04,080
I try to do whatever I can to try and get the answer I'm looking for, but it sounds like

77
00:04:04,080 --> 00:04:08,440
you guys are pretty experienced developers.

78
00:04:08,440 --> 00:04:13,120
Well, you know, it's just tools of the trade.

79
00:04:13,120 --> 00:04:15,440
So you've got a task at hand.

80
00:04:15,440 --> 00:04:20,880
So it's awesome to see you using the tool.

81
00:04:20,880 --> 00:04:24,160
And like whenever you're doing anything new, you end up sort of hacking around anyway to

82
00:04:24,160 --> 00:04:26,400
figure it out and then...

83
00:04:26,400 --> 00:04:27,400
Exactly.

84
00:04:27,400 --> 00:04:28,400
Yeah.

85
00:04:28,400 --> 00:04:29,400
Yeah, we...

86
00:04:29,400 --> 00:04:36,600
Yeah, here at GM, we use Python, PySpark.

87
00:04:36,600 --> 00:04:41,280
We have a Hadoop environment because our data sets are so large.

88
00:04:41,280 --> 00:04:47,000
So we're usually using a JupyterLab environment to work in that environment.

89
00:04:47,000 --> 00:04:54,240
And then some of our desktop kind of exploratory analysis is done in R. And I tend to lean

90
00:04:54,240 --> 00:04:57,960
more towards R, although it's not as popular as Python.

91
00:04:57,960 --> 00:05:00,120
But for me, it's a little easier.

92
00:05:00,120 --> 00:05:02,640
So that's why I tend to lean towards the R side of the house.

93
00:05:02,640 --> 00:05:03,640
But yeah.

94
00:05:03,640 --> 00:05:04,640
So yeah, all good.

95
00:05:04,640 --> 00:05:05,640
It's good to...

96
00:05:05,640 --> 00:05:11,800
I just can't believe that you're actually from Roseville. That blows me away.

97
00:05:11,800 --> 00:05:12,800
Yeah.

98
00:05:12,800 --> 00:05:22,000
Yeah, go figure the odds.

99
00:05:22,000 --> 00:05:24,560
I guess you wanted...

100
00:05:24,560 --> 00:05:31,600
I guess I'm just real curious about your use with big data because we've been having a

101
00:05:31,600 --> 00:05:36,160
bit of...

102
00:05:36,160 --> 00:05:40,280
That's one of our things, one of our tasks at hand is working with big data, in particular

103
00:05:40,280 --> 00:05:42,120
this large Washington state data.

104
00:05:42,120 --> 00:05:45,640
So I'm just curious, what's been your experience?

105
00:05:45,640 --> 00:05:51,600
I pointed you in the direction of the sales data and you seemed to read it in without any

106
00:05:51,600 --> 00:05:52,600
trouble.

107
00:05:52,600 --> 00:05:55,800
So is R pretty powerful for that?

108
00:05:55,800 --> 00:05:57,960
I didn't have any real big problems.

109
00:05:57,960 --> 00:06:01,800
I just used...

110
00:06:01,800 --> 00:06:06,200
There's a group of packages called the tidyverse.

111
00:06:06,200 --> 00:06:09,480
And that's what I lean towards.

112
00:06:09,480 --> 00:06:13,400
And I just read it in using the read_csv function.

113
00:06:13,400 --> 00:06:19,040
And I read in the first 20,000 rows just to take a look, take a peek and it read right

114
00:06:19,040 --> 00:06:20,040
in fine.

115
00:06:20,040 --> 00:06:21,640
And that's a large data set too.

116
00:06:21,640 --> 00:06:26,040
I forgot how many gigabytes that is, but I didn't have any trouble.

117
00:06:26,040 --> 00:06:28,280
My machine, I think I have...

118
00:06:28,280 --> 00:06:30,440
I'm not sure.

119
00:06:30,440 --> 00:06:37,600
I have to check how much RAM I've got in here, but I didn't really have a problem just taking

120
00:06:37,600 --> 00:06:38,600
a peek.
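
[Editor's note: a minimal sketch of the "peek at the first 20,000 rows" idea described above. Paul did this in R with the tidyverse; this version uses only the Python standard library and is not his code. The function name peek_csv and the sample column names are hypothetical.]

```python
import csv
import io
from itertools import islice


def peek_csv(source, n_rows=20_000):
    """Return the header and at most n_rows data rows from an open CSV stream."""
    reader = csv.reader(source)
    header = next(reader, None)
    # islice stops after n_rows, so the rest of the file is never read.
    rows = list(islice(reader, n_rows))
    return header, rows


# Tiny in-memory stand-in for a large file; the column names are made up.
sample = io.StringIO("global_id,price_total\nS1,10.00\nS2,12.50\nS3,8.25\n")
header, rows = peek_csv(sample, n_rows=2)
```

[For a real file, the same function can be passed an object from open(path, newline=""). Because reading stops early, memory use depends on n_rows rather than on the size of the file.]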

121
00:06:38,600 --> 00:06:39,600
Awesome.

122
00:06:39,600 --> 00:06:40,600
Awesome, awesome.

123
00:06:40,600 --> 00:06:44,440
Well, I'm glad you started to get a look at that.

124
00:06:44,440 --> 00:06:45,440
And...

125
00:06:45,440 --> 00:06:46,440
Yeah, it's...

126
00:06:46,440 --> 00:06:49,440
Oh, what were you going to say?

127
00:06:49,440 --> 00:06:50,440
Oh, I'm sorry.

128
00:06:50,440 --> 00:06:54,800
I think it might be a little lag.

129
00:06:54,800 --> 00:06:59,360
I hope I'm not interrupting you too much.

130
00:06:59,360 --> 00:07:00,360
So I printed out...

131
00:07:00,360 --> 00:07:01,360
Where is it?

132
00:07:01,360 --> 00:07:02,360
Yeah.

133
00:07:02,360 --> 00:07:12,960
I printed out the data set, the Washington data set, the user manual, the table of contents,

134
00:07:12,960 --> 00:07:18,760
just to learn my way around all the different table names and try to get used to the business

135
00:07:18,760 --> 00:07:24,480
logic that is required of the system to run the different types of businesses, whether

136
00:07:24,480 --> 00:07:28,880
you're a producer or if you're a lab or anything like that.

137
00:07:28,880 --> 00:07:32,720
And just try and get familiar with some of those things.

138
00:07:32,720 --> 00:07:35,920
What I would like to try and take a shot at...

139
00:07:35,920 --> 00:07:40,280
And let me back up real quick for the benefit of Charles.

140
00:07:40,280 --> 00:07:42,200
So I reached out to Keegan last week, actually.

141
00:07:42,200 --> 00:07:46,120
I just stumbled onto this meetup.

142
00:07:46,120 --> 00:07:51,920
And I'm wrapping up my graduate program and I have to do a graduate project.

143
00:07:51,920 --> 00:07:54,800
And I was trying to figure out, what am I going to do?

144
00:07:54,800 --> 00:07:59,280
And I have some family members who are actually involved in the business.

145
00:07:59,280 --> 00:08:02,920
And I have a peripheral view of that.

146
00:08:02,920 --> 00:08:08,080
But I thought, well, is there something I could do for my project?

147
00:08:08,080 --> 00:08:12,320
So I reached out to Keegan and he was kind enough to talk with me and give me some ideas

148
00:08:12,320 --> 00:08:16,080
and maybe some questions I could try to answer with the data.

149
00:08:16,080 --> 00:08:18,120
So that's why I've kind of...

150
00:08:18,120 --> 00:08:22,560
The last week or so I've been trying to dig into the data set and try to understand it

151
00:08:22,560 --> 00:08:26,720
as best I could with the amount of time I've had.

152
00:08:26,720 --> 00:08:30,360
But yeah, so one of the things I wanted to do is...

153
00:08:30,360 --> 00:08:33,800
It may be a waste of time depending on what work you guys have already done.

154
00:08:33,800 --> 00:08:40,240
But for me, it would be nice if there was an entity relationship diagram of the tables.

155
00:08:40,240 --> 00:08:46,520
So we know how the different keys are related and how all these different fields are connected

156
00:08:46,520 --> 00:08:49,440
within the different tables.

157
00:08:49,440 --> 00:08:53,200
And so I was thinking about taking a shot at doing something like that based on what

158
00:08:53,200 --> 00:09:01,320
I could scrape out of the user guide and sort of have this big picture map of what's going

159
00:09:01,320 --> 00:09:02,320
on in the data.

160
00:09:02,320 --> 00:09:06,960
I don't know if you think it's a worthwhile thing or something you've already done or

161
00:09:06,960 --> 00:09:09,920
maybe it's a lot of extra...

162
00:09:09,920 --> 00:09:11,760
I like how you think.

163
00:09:11,760 --> 00:09:17,760
I think that needs to be done because I think at the moment I just have this vague mental

164
00:09:17,760 --> 00:09:21,840
map about how IDs connect.

165
00:09:21,840 --> 00:09:27,040
But it's better to formalize things because then you can actually have it down on paper

166
00:09:27,040 --> 00:09:31,440
and you can actually start to see all of the connections.

167
00:09:31,440 --> 00:09:34,480
So I think that needs to be done.

168
00:09:34,480 --> 00:09:38,640
It's all essentially based on global IDs.

169
00:09:38,640 --> 00:09:48,320
So each object for the most part has a global ID.

170
00:09:48,320 --> 00:09:52,160
And so it's all about sort of tracing.

171
00:09:52,160 --> 00:09:55,920
Yeah, I guess tracing those.

172
00:09:55,920 --> 00:10:09,680
But it'll be...

173
00:10:09,680 --> 00:10:13,720
So basically what I was working on right now was essentially mapping.

174
00:10:13,720 --> 00:10:18,400
The first connection was just mapping the lab result.

175
00:10:18,400 --> 00:10:19,840
For me, I started with lab results.

176
00:10:19,840 --> 00:10:26,360
So I was just mapping the lab result to the licensee.

177
00:10:26,360 --> 00:10:35,800
And so that one, I guess you would just say the connection would be like the for MME ID of

178
00:10:35,800 --> 00:10:37,280
the lab result ID.

179
00:10:37,280 --> 00:10:45,200
Here, I'll share my screen and share what I'm talking about.

180
00:10:45,200 --> 00:10:48,600
So one thing, every table has a global ID.

181
00:10:48,600 --> 00:10:49,600
Okay.

182
00:10:49,600 --> 00:10:54,280
But those global IDs are actually...

183
00:10:54,280 --> 00:11:02,280
It's actually the record ID for that entry in the table.

184
00:11:02,280 --> 00:11:06,080
Like the sales table has a global ID, but that's actually a sales ID.

185
00:11:06,080 --> 00:11:12,600
And the lab results table has a global ID, but that's actually the lab results ID.

186
00:11:12,600 --> 00:11:17,800
And so trying to combine those tables, there's always this conflict that every table has

187
00:11:17,800 --> 00:11:19,440
a global ID.

188
00:11:19,440 --> 00:11:24,160
So I think one thing to do is to go through and rename those.

189
00:11:24,160 --> 00:11:25,160
Okay.

190
00:11:25,160 --> 00:11:26,160
Okay.

191
00:11:26,160 --> 00:11:31,120
So there is more of a local ID to the table in question.

192
00:11:31,120 --> 00:11:34,320
And then linking those IDs...

193
00:11:34,320 --> 00:11:39,160
I mean, I assume that in some of these tables, they have ID sharing, right?

194
00:11:39,160 --> 00:11:41,680
So you would have a global ID from...

195
00:11:41,680 --> 00:11:43,280
I'm just looking at lists here from...

196
00:11:43,280 --> 00:11:44,640
I'm making an example.

197
00:11:44,640 --> 00:11:51,560
A global ID from the sales table would also show up in the sales item table, right?

198
00:11:51,560 --> 00:11:52,560
To make that linkage.

199
00:11:52,560 --> 00:11:54,680
So, okay, I see what you're saying.

200
00:11:54,680 --> 00:11:59,640
So there needs to be some distinction made between those global IDs, given that they actually

201
00:11:59,640 --> 00:12:03,160
are specific to the table that they're embedded in.

202
00:12:03,160 --> 00:12:04,160
Right.

203
00:12:04,160 --> 00:12:05,160
Yeah.

204
00:12:05,160 --> 00:12:09,200
And now we're actually making a map of...

205
00:12:09,200 --> 00:12:10,560
Sort of a schema map of the tables.

206
00:12:10,560 --> 00:12:11,560
That would be great.

207
00:12:11,560 --> 00:12:14,480
That would be helpful.

208
00:12:14,480 --> 00:12:16,680
They are linked.

209
00:12:16,680 --> 00:12:25,680
The global ID that's in the lab results shows up as the lab result ID in the inventories.

210
00:12:25,680 --> 00:12:26,680
Okay.

211
00:12:26,680 --> 00:12:27,680
Yeah.

212
00:12:27,680 --> 00:12:35,040
And until you get it all down and can see it all laid out, I mean, they have that in

213
00:12:35,040 --> 00:12:41,640
the user guide, but it's obviously not a very practical way to look at it.

214
00:12:41,640 --> 00:12:42,640
Okay.

215
00:12:42,640 --> 00:12:43,640
Go ahead, Keegan.

216
00:12:43,640 --> 00:12:44,640
Go ahead, Keegan.

217
00:12:44,640 --> 00:12:51,280
I think you both have a very good mental framework, and we've all come to the conclusion

218
00:12:51,280 --> 00:12:55,920
that we need to map this out, but I was just going to show you an example.

219
00:12:55,920 --> 00:12:59,640
So here you found the data.

220
00:12:59,640 --> 00:13:05,200
So pretty much the core is the licensees, right?

221
00:13:05,200 --> 00:13:10,680
So they're sort of what I think of as like the core object.

222
00:13:10,680 --> 00:13:17,000
So here you just have just a given licensee.

223
00:13:17,000 --> 00:13:24,280
And I think this is what Charles hinted at, which, I kind of like that idea, is...

224
00:13:24,280 --> 00:13:32,400
So here you have sort of a global ID for the license.

225
00:13:32,400 --> 00:13:36,520
You deal with a lot of global IDs along the way.

226
00:13:36,520 --> 00:13:40,960
So we may want to...

227
00:13:40,960 --> 00:13:45,280
I guess that's also maybe called the MME ID.

228
00:13:45,280 --> 00:13:48,560
So we may want to make...

229
00:13:48,560 --> 00:13:56,960
That was maybe an idea is to just make our own formalization of what they call that.

230
00:13:56,960 --> 00:13:59,560
But we can think about that.

231
00:13:59,560 --> 00:14:01,080
What is the format of that ID?

232
00:14:01,080 --> 00:14:08,200
You've got WAWA1. and then something else after that. Is that a typical formatting of the

233
00:14:08,200 --> 00:14:09,200
ID?

234
00:14:09,200 --> 00:14:14,200
I think it's...

235
00:14:14,200 --> 00:14:22,960
I think it's usually just...

236
00:14:22,960 --> 00:14:27,440
Well, it looks like it maybe begins with MN.

237
00:14:27,440 --> 00:14:34,040
So that's probably like, I'm not 100% certain for the abbreviation, maybe like medical marijuana

238
00:14:34,040 --> 00:14:35,040
or something.

239
00:14:35,040 --> 00:14:39,600
And then it looks like followed by just a random...

240
00:14:39,600 --> 00:14:47,000
It looks like it increments through one to nine and then...

241
00:14:47,000 --> 00:14:48,000
Exactly.

242
00:14:48,000 --> 00:14:51,680
So just the alphanumeric counter.

243
00:14:51,680 --> 00:14:58,160
Yeah, and there's a different format for labs, for producers.

244
00:14:58,160 --> 00:14:59,160
Okay.

245
00:14:59,160 --> 00:15:16,400
And this is another interesting point is the...

246
00:15:16,400 --> 00:15:23,880
The code is essentially their license number.

247
00:15:23,880 --> 00:15:28,960
So let's look at a lab result real quick.

248
00:15:28,960 --> 00:15:33,200
So I just read in the lab results.

249
00:15:33,200 --> 00:15:47,560
There's a lot of data here, but...

250
00:15:47,560 --> 00:15:52,640
Okay.

251
00:15:52,640 --> 00:16:01,720
So essentially, I guess we can talk about the IDs more in a second, but just to hit

252
00:16:01,720 --> 00:16:02,720
on it real quick.

253
00:16:02,720 --> 00:16:09,880
The way I was able to connect these was to basically say, okay, so the lab results has this field

254
00:16:09,880 --> 00:16:15,920
that's for MME ID.

255
00:16:15,920 --> 00:16:25,360
And so that is the global ID of the license data.

256
00:16:25,360 --> 00:16:31,360
And so you're able to merge the data on those two.
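
[Editor's note: a minimal sketch of the merge described above, where the lab result's "for MME ID" field points at the licensee's global ID. The spellings for_mme_id and global_id and the name field are assumptions for illustration, and the sample records are made up. This is plain Python, not the script shown on screen.]

```python
def join_lab_results_to_licensees(lab_results, licensees):
    """Attach the matching licensee to each lab result (a left join)."""
    # Index licensees by their own global_id for constant-time lookups.
    licensee_by_id = {row["global_id"]: row for row in licensees}
    joined = []
    for result in lab_results:
        licensee = licensee_by_id.get(result["for_mme_id"])
        joined.append({
            "lab_result_id": result["global_id"],
            "licensee_id": result["for_mme_id"],
            # None marks a lab result whose licensee was not found.
            "licensee_name": licensee["name"] if licensee else None,
        })
    return joined


licensees = [{"global_id": "LIC-1", "name": "Example Farm"}]
lab_results = [
    {"global_id": "LAB-1", "for_mme_id": "LIC-1"},
    {"global_id": "LAB-2", "for_mme_id": "LIC-9"},  # no matching licensee
]
joined = join_lab_results_to_licensees(lab_results, licensees)
```

[Keeping unmatched lab results in the output, rather than dropping them, makes it easier to spot IDs that do not line up.]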

257
00:16:31,360 --> 00:16:34,320
And so I think that's what you were talking about.

258
00:16:34,320 --> 00:16:42,960
I think that's what you were talking about, that we can maybe start to create a visual of the

259
00:16:42,960 --> 00:16:43,960
actual...

260
00:16:43,960 --> 00:16:44,960
Right.

261
00:16:44,960 --> 00:16:47,960
Like a bit more abstraction.

262
00:16:47,960 --> 00:16:54,400
What were you going to say?

263
00:16:54,400 --> 00:17:01,360
So I just found this online utility where I fed in the user guide, and it takes

264
00:17:01,360 --> 00:17:05,600
files like PDFs and tries to convert them into a table.

265
00:17:05,600 --> 00:17:10,280
So obviously where there's all the text and everything in the user guide, it has a bunch

266
00:17:10,280 --> 00:17:15,840
of just line after line if you import it into Excel.

267
00:17:15,840 --> 00:17:22,620
But then the tables themselves with a little bit of minimal cleanup, I can extract those

268
00:17:22,620 --> 00:17:25,520
tables and make them their own CSV files.

269
00:17:25,520 --> 00:17:30,880
And then there was another online utility I found that you can feed in CSV files and

270
00:17:30,880 --> 00:17:38,960
it will try to join the tables together as best it can based on common field naming.

271
00:17:38,960 --> 00:17:41,280
And that might get us part of the way there.

272
00:17:41,280 --> 00:17:44,840
I haven't tried it yet, but that might get us part of the way there to speed it up because

273
00:17:44,840 --> 00:17:49,720
obviously going through and trying to map things manually is not the optimal way of

274
00:17:49,720 --> 00:17:50,720
doing it.

275
00:17:50,720 --> 00:17:56,440
But there may be, I don't know, what do you guys think of that idea?

276
00:17:56,440 --> 00:18:03,880
Honestly, I think it's worth a stab just to try it.

277
00:18:03,880 --> 00:18:18,000
However, actually, we have the data guide here.

278
00:18:18,000 --> 00:18:19,880
I think it's worth a stab.

279
00:18:19,880 --> 00:18:25,160
I think ultimately we're just going to have to get in here though and just sort of match

280
00:18:25,160 --> 00:18:27,440
up the main fields.

281
00:18:27,440 --> 00:18:39,280
So yeah, because like the licensee global ID is the for MME ID.

282
00:18:39,280 --> 00:18:46,120
It doesn't match the global ID in the lab results.

283
00:18:46,120 --> 00:18:54,080
I mean, pandas has a really hard time with this and I have to rename things.

284
00:18:54,080 --> 00:18:56,120
Right.

285
00:18:56,120 --> 00:19:00,080
Well, at the very least, I'll go through the exercise.

286
00:19:00,080 --> 00:19:06,480
I mean it's a good exercise for me to go through for familiarity reasons, right?

287
00:19:06,480 --> 00:19:13,160
To go through and scrape the tables out of the document, put them into like a CSV file

288
00:19:13,160 --> 00:19:16,440
that I could share with you guys.

289
00:19:16,440 --> 00:19:21,720
And we could then start, if we need to go through and do some manual mapping, we could

290
00:19:21,720 --> 00:19:22,720
do that too.

291
00:19:22,720 --> 00:19:29,680
At least it will give us a common data file, a common worksheet that we could collaborate

292
00:19:29,680 --> 00:19:33,240
on or whatever.

293
00:19:33,240 --> 00:19:40,920
So I think that would be incredibly helpful, Paul, because basically any stab, any way we

294
00:19:40,920 --> 00:19:44,080
can try to connect this data, the better.

295
00:19:44,080 --> 00:19:48,600
Because I think it's worth a go because what do we have here?

296
00:19:48,600 --> 00:19:56,240
We've got about, I don't know, a dozen or two dozen, more than a dozen, or maybe two

297
00:19:56,240 --> 00:20:01,120
dozen whole like data sets here.

298
00:20:01,120 --> 00:20:05,280
And they're all loosely connected.

299
00:20:05,280 --> 00:20:10,160
And I think just first would be just a nice mental map.

300
00:20:10,160 --> 00:20:17,040
So what I'm imagining is almost just, you can do it a little cleaner, but just a real

301
00:20:17,040 --> 00:20:18,520
quick stab at it.

302
00:20:18,520 --> 00:20:27,880
You could just almost just do something like you've got the licensees here and they've

303
00:20:27,880 --> 00:20:35,120
got some data.

304
00:20:35,120 --> 00:20:37,800
But they maybe connect up.

305
00:20:37,800 --> 00:20:42,520
And we can think about how we want to visualize this.

306
00:20:42,520 --> 00:21:02,360
But basically, we basically show that licensees connect to lab results.

307
00:21:02,360 --> 00:21:13,000
And then this would be like from, this is just real quick and dirty.

308
00:21:13,000 --> 00:21:15,480
You could just sort of have arrows.
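
[Editor's note: a minimal sketch of the "tables with arrows" map being drawn on screen, written as an edge list. It records only the two links stated in this conversation: the lab result's for MME ID points to the licensee's global ID, and the inventory's lab result ID points to the lab result's global ID. The snake_case table and field spellings are assumptions.]

```python
from collections import defaultdict

# Each edge: (child table, foreign-key field, parent table, parent key).
EDGES = [
    ("lab_results", "for_mme_id", "licensees", "global_id"),
    ("inventories", "lab_result_id", "lab_results", "global_id"),
]


def build_schema_map(edges):
    """Group the outgoing links of each table under its name."""
    graph = defaultdict(list)
    for child, fk, parent, pk in edges:
        graph[child].append((fk, parent, pk))
    return dict(graph)


def describe(edges):
    """Render each link as a one-line 'arrow' for a quick text diagram."""
    return [f"{child}.{fk} -> {parent}.{pk}" for child, fk, parent, pk in edges]


schema_map = build_schema_map(EDGES)
arrows = describe(EDGES)
```

[New links can be appended to EDGES as they are confirmed against the user guide, so the list doubles as shared documentation.]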

309
00:21:15,480 --> 00:21:16,480
Exactly.

310
00:21:16,480 --> 00:21:17,480
Yeah.

311
00:21:17,480 --> 00:21:20,760
And that would just be a good starting point.

312
00:21:20,760 --> 00:21:27,480
I know that for my school projects, I have to be able to account for the data set that

313
00:21:27,480 --> 00:21:30,320
I'm using, how I made sense of it, and those types of things.

314
00:21:30,320 --> 00:21:37,000
So I would have to go through some of this exercise anyway.

315
00:21:37,000 --> 00:21:43,520
So yeah, I could take a stab at it, take a first pass to see how far I get.

316
00:21:43,520 --> 00:21:47,440
And for next time, when we come together, I can show you guys what I have up to that

317
00:21:47,440 --> 00:21:50,000
point.

318
00:21:50,000 --> 00:21:54,240
Feel free to bounce things off me when questions come up.

319
00:21:54,240 --> 00:22:01,840
Well, the group really, Charles too, he's gotten pretty deep into this data as well.

320
00:22:01,840 --> 00:22:06,920
Because the matching game is tricky.

321
00:22:06,920 --> 00:22:16,400
So for example, let's look at this lab result ID real quick.

322
00:22:16,400 --> 00:22:20,080
Oh, here we go.

323
00:22:20,080 --> 00:22:22,740
This is what I was looking for.

324
00:22:22,740 --> 00:22:31,600
This inventory ID is critical as well.

325
00:22:31,600 --> 00:22:40,120
So this inventory ID may actually be better than the for MME ID.

326
00:22:40,120 --> 00:22:51,320
So see, the for MME ID connects to the global ID here.

327
00:22:51,320 --> 00:23:08,520
But if you look at the inventory ID, if you split the WA to the period, you see that?

328
00:23:08,520 --> 00:23:13,600
That is the licensee's code.

329
00:23:13,600 --> 00:23:17,520
And so that's basically their license number.
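
[Editor's note: a minimal sketch of pulling the license code out of an inventory ID by taking the text between the leading "WA" and the first period, as described above. Whether the "WA" prefix belongs to the code is not settled in the conversation; this sketch assumes it does not. The example ID below is hypothetical, not a real record.]

```python
def license_code(inventory_id):
    """Return the text between a leading 'WA' and the first period, or None."""
    prefix, sep, _rest = inventory_id.partition(".")
    # No period, or no 'WA' prefix: the ID does not fit the assumed pattern.
    if not sep or not prefix.startswith("WA"):
        return None
    return prefix[len("WA"):]


code = license_code("WAX12345.INABC")  # hypothetical ID
missing = license_code("no-period-here")
```

[Returning None for IDs that do not fit the pattern keeps malformed rows visible instead of silently producing a wrong code.]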

330
00:23:17,520 --> 00:23:23,160
And that's an important number on its own.

331
00:23:23,160 --> 00:23:32,080
I think you can navigate the system through MME IDs.

332
00:23:32,080 --> 00:23:41,040
However, you may actually sort of have better luck.

333
00:23:41,040 --> 00:23:46,880
Well, actually, they're for tracking different things.

334
00:23:46,880 --> 00:23:49,200
So this is for tracking the licensee.

335
00:23:49,200 --> 00:23:53,480
And then this is actually for tracking a particular inventory item.

336
00:23:53,480 --> 00:23:54,480
Right.

337
00:23:54,480 --> 00:24:03,480
But long story short, this is an important ID as well.

338
00:24:03,480 --> 00:24:04,480
Yeah.

339
00:24:04,480 --> 00:24:14,120
I mean, obviously, the benefit of going through some of this mapping is that it puts everybody

340
00:24:14,120 --> 00:24:17,440
on the same page with the experience that you and Charles have had so far.

341
00:24:17,440 --> 00:24:21,560
You've kind of figured out some of these relationships.

342
00:24:21,560 --> 00:24:28,160
But if we had it documented somewhere, it makes it easier to share and collaborate and

343
00:24:28,160 --> 00:24:29,880
kind of speak the same language.

344
00:24:29,880 --> 00:24:32,800
Yeah, no, that would be really helpful.

345
00:24:32,800 --> 00:24:36,840
Because yeah, it all is sort of in my head.

346
00:24:36,840 --> 00:24:41,360
And I have to, you know, sometimes I take a few days off from working on this and I

347
00:24:41,360 --> 00:24:45,000
have to go back and think, oh, this is how it all works together.

348
00:24:45,000 --> 00:24:46,000
Right.

349
00:24:46,000 --> 00:24:47,000
It's always the case.

350
00:24:47,000 --> 00:24:55,320
You ask me about something I coded five days ago and you might as well be talking to a wall.

351
00:24:55,320 --> 00:25:00,720
And I think this is something that's been a long time coming because I just, yeah, it's

352
00:25:00,720 --> 00:25:08,480
just a brilliant idea because we all just have these sort of mental maps about, oh,

353
00:25:08,480 --> 00:25:09,480
yeah.

354
00:25:09,480 --> 00:25:14,280
For example, when I was coding this up, I was like, oh, you know, I just sort of looked

355
00:25:14,280 --> 00:25:20,680
at the data and I was like, okay, you know, this sort of matches global ID to MME ID.

356
00:25:20,680 --> 00:25:30,520
However, if we had just almost not just this connection, but if we had the whole web, that

357
00:25:30,520 --> 00:25:35,240
way we could see, oh, you know, this ID connects to this ID.

358
00:25:35,240 --> 00:25:36,240
Right.

359
00:25:36,240 --> 00:25:38,240
So that way we could connect.

360
00:25:38,240 --> 00:25:43,520
Okay, so that way we can connect the licensee to the lab result.

361
00:25:43,520 --> 00:25:53,040
And then, you know, what if we could also then, you know, connect, what if we could

362
00:25:53,040 --> 00:25:58,800
then, you know, connect this lab result up to sales essentially?

363
00:25:58,800 --> 00:25:59,800
Exactly.

364
00:25:59,800 --> 00:26:00,800
Right.

365
00:26:00,800 --> 00:26:06,800
You can answer a lot more interesting questions if you just know the lay of the land.

366
00:26:06,800 --> 00:26:07,800
Yeah.

367
00:26:07,800 --> 00:26:08,800
Exactly.

368
00:26:08,800 --> 00:26:13,000
And I don't even know really the first thing about sales.

369
00:26:13,000 --> 00:26:21,840
So for example, well, just since the spirit of the data science group is to actually look

370
00:26:21,840 --> 00:26:31,080
at the data, let's actually look at the script from last week where we were reading in sales

371
00:26:31,080 --> 00:26:32,960
last week.

372
00:26:32,960 --> 00:26:34,560
Now.

373
00:26:34,560 --> 00:26:41,920
Maybe that was something I was doing on my own.

374
00:26:41,920 --> 00:26:45,600
But have no fear.

375
00:26:45,600 --> 00:26:46,960
We can go to the documentation.

376
00:26:46,960 --> 00:26:52,080
By the way, I just saw your analytics logo on there.

377
00:26:52,080 --> 00:26:53,080
That was pretty cool.

378
00:26:53,080 --> 00:26:55,080
Oh, thank you.

379
00:26:55,080 --> 00:26:56,080
So.

380
00:26:56,080 --> 00:27:04,720
But so.

381
00:27:04,720 --> 00:27:06,800
Okay.

382
00:27:06,800 --> 00:27:16,480
Here is a sales data point.

383
00:27:16,480 --> 00:27:24,040
And so, as you'll see, once again, we've got the pesky global ID.

384
00:27:24,040 --> 00:27:29,600
Well, or the helpful global ID, whichever.

385
00:27:29,600 --> 00:27:33,160
If you're a glass-half-full person.

386
00:27:33,160 --> 00:27:41,080
So like Charles was saying, so this is sort of my inclination.

387
00:27:41,080 --> 00:27:44,080
But there's a pro and a con.

388
00:27:44,080 --> 00:27:51,240
So my inclination would be to just call this a sales ID.

389
00:27:51,240 --> 00:27:57,040
And that's just what we'll internally refer to it as.

390
00:27:57,040 --> 00:28:02,160
So table, no, table name underscore ID.

391
00:28:02,160 --> 00:28:03,160
Exactly.

392
00:28:03,160 --> 00:28:04,160
Sales underscore.

393
00:28:04,160 --> 00:28:10,480
And so let's see if that would work for the most part.

394
00:28:10,480 --> 00:28:13,800
So I don't think taxes are in there.

395
00:28:13,800 --> 00:28:18,240
But the sales are the big one.

396
00:28:18,240 --> 00:28:23,480
We can get plants and see if we can incorporate some plants or like I would call that like

397
00:28:23,480 --> 00:28:27,440
plant ID.

398
00:28:27,440 --> 00:28:30,760
Inventory type ID.

399
00:28:30,760 --> 00:28:32,800
Lab result ID.

400
00:28:32,800 --> 00:28:33,800
Sure.

401
00:28:33,800 --> 00:28:36,800
Licensee ID.

402
00:28:36,800 --> 00:28:41,720
And then the inventory transfer ID.

403
00:28:41,720 --> 00:28:42,720
Yeah.

404
00:28:42,720 --> 00:28:43,720
That's right.

405
00:28:43,720 --> 00:28:44,720
That's ID.

406
00:28:44,720 --> 00:28:48,800
That's what I would do.

407
00:28:48,800 --> 00:28:58,520
But just to go ahead and play devil's advocate, that's introducing, like, yet a whole other

408
00:28:58,520 --> 00:29:04,760
variable into the mix of variables.

409
00:29:04,760 --> 00:29:12,920
But, you know, to counter that, it can be nice to just add a little structure;

410
00:29:12,920 --> 00:29:17,600
adding a bit more schema is always nice.

411
00:29:17,600 --> 00:29:25,960
Well, you have to do that at some point, because if you try and combine entries from the lab

412
00:29:25,960 --> 00:29:31,880
results and the inventories, they both have a global ID, but they're not the same global

413
00:29:31,880 --> 00:29:32,880
ID.

414
00:29:32,880 --> 00:29:36,600
And then pandas tries to think, well, it's a global ID.

415
00:29:36,600 --> 00:29:41,060
So it's all the same column, but it's not.

416
00:29:41,060 --> 00:29:44,420
So you have to go back and you have to go through and rename them anyway at some point

417
00:29:44,420 --> 00:29:48,880
if you want to do analysis using more than one table.
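A minimal sketch of that renaming step, in plain Python rather than pandas so it runs without extra packages. The rows and every field name other than `global_id` are made up for illustration, and `rename_id` is a hypothetical helper. The idea is to give each table's `global_id` a table-specific name before combining, so the two IDs can no longer be mistaken for the same column.

```python
# Hypothetical rows; field names other than "global_id" are assumptions.
lab_results = [{"global_id": "LR1", "inventory_id": "INV1", "thc": 21.5}]
inventories = [{"global_id": "INV1", "strain": "Example Strain"}]


def rename_id(rows, table_name):
    """Return copies of rows with 'global_id' renamed to '<table_name>_id'."""
    renamed = []
    for row in rows:
        row = dict(row)  # copy so the input rows are left untouched
        row[f"{table_name}_id"] = row.pop("global_id")
        renamed.append(row)
    return renamed


lab_results = rename_id(lab_results, "lab_result")
inventories = rename_id(inventories, "inventory")

# Join lab results to inventories on the now-unambiguous inventory ID.
inventory_by_id = {row["inventory_id"]: row for row in inventories}
combined = [
    {**inventory_by_id[result["inventory_id"]], **result}
    for result in lab_results
    if result["inventory_id"] in inventory_by_id
]
```

With pandas, the same renaming could be done with `DataFrame.rename(columns=...)` before calling `merge`.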

418
00:29:48,880 --> 00:29:49,880
Exactly.

419
00:29:49,880 --> 00:29:58,800
I was just curious, Keegan, about the files that you've collected up to this point.

420
00:29:58,800 --> 00:30:06,160
I might be getting off the subject a little bit, but you mentioned previously that there's

421
00:30:06,160 --> 00:30:09,480
a Freedom of Information Act that you had to use to get all this data.

422
00:30:09,480 --> 00:30:10,480
Is that correct?

423
00:30:10,480 --> 00:30:22,400
I already have this stuff posted out there.

424
00:30:22,400 --> 00:30:26,840
So I'll share this link with you.

425
00:30:26,840 --> 00:30:31,160
So this is essentially the way to go about things.

426
00:30:31,160 --> 00:30:52,760
So if you go to the LCB.WA.gov slash records slash make dash public dash request records

427
00:30:52,760 --> 00:31:01,320
request, they've basically got this set of guidelines.

428
00:31:01,320 --> 00:31:15,400
So basically just submit your information and sort of the data you're seeking.

429
00:31:15,400 --> 00:31:17,760
And so it may be helpful.

430
00:31:17,760 --> 00:31:26,160
So this was shared with me by someone who just does sort of monthly records requests.

431
00:31:26,160 --> 00:31:30,640
So they're just each month just getting all the latest data.

432
00:31:30,640 --> 00:31:35,360
And so this actually, you know, just goes up through December.

433
00:31:35,360 --> 00:31:42,440
So it'd be worthwhile to do another records request.

434
00:31:42,440 --> 00:31:48,960
I'm not 100% certain like how fast they're able to turn around the data, but you can

435
00:31:48,960 --> 00:31:53,040
probably get like up to like the last month.

436
00:31:53,040 --> 00:31:57,960
So everything up to before the prior month.

437
00:31:57,960 --> 00:32:02,080
So let's say this month is May.

438
00:32:02,080 --> 00:32:11,000
I wouldn't be surprised if you could get everything up through March, through the end of March.

439
00:32:11,000 --> 00:32:22,320
But I just haven't gone about making records requests for the latest data myself.

440
00:32:22,320 --> 00:32:26,440
And they published all that out on their kind of public access point.

441
00:32:26,440 --> 00:32:27,440
Yes.

442
00:32:27,440 --> 00:32:28,440
Yeah.

443
00:32:28,440 --> 00:32:29,440
Okay.

444
00:32:29,440 --> 00:32:30,440
Okay.

445
00:32:30,440 --> 00:32:33,480
We do something similar with government data.

446
00:32:33,480 --> 00:32:40,120
So the National Highway Traffic Safety Administration, they publish a lot of

447
00:32:40,120 --> 00:32:48,720
public databases on vehicle accidents and complaints from customers.

448
00:32:48,720 --> 00:32:53,080
And we actually scrape that data and ingest it.

449
00:32:53,080 --> 00:32:57,040
But we do that daily for General Motors because they update it daily.

450
00:32:57,040 --> 00:33:02,840
But I don't know what your long-term vision is, Keegan, for the Canalytics company.

451
00:33:02,840 --> 00:33:08,960
But if you're thinking about aggregation across states or something like that, you'd want

452
00:33:08,960 --> 00:33:17,360
to think about some formalized ETL type tools or whatever to ingest it and consolidate it.

453
00:33:17,360 --> 00:33:26,040
But back to our mapping schema, whatever we end up calling, if we call these different

454
00:33:26,040 --> 00:33:32,880
IDs by their table name and all that stuff, that would be part of your formal definition

455
00:33:32,880 --> 00:33:34,400
of map.

456
00:33:34,400 --> 00:33:40,160
As you ingest them, you can then translate the fields to whatever it is that you want

457
00:33:40,160 --> 00:33:41,160
to use.

458
00:33:41,160 --> 00:33:50,560
And then they'll be like data ready to use after you ingest them.

459
00:33:50,560 --> 00:33:54,520
You've actually hit on a brilliant idea.

460
00:33:54,520 --> 00:34:05,880
So essentially, that's something that essentially I sort of started fiddling with over here

461
00:34:05,880 --> 00:34:06,880
because you're right.

462
00:34:06,880 --> 00:34:12,960
So we're essentially going to need a formal schema for all the data.

463
00:34:12,960 --> 00:34:19,380
So you're right, that's actually something that Canalytics is sort of setting about.

464
00:34:19,380 --> 00:34:30,680
So the way I'm approaching this is I'm trying to essentially consolidate the two main traceability

465
00:34:30,680 --> 00:34:31,680
systems.

466
00:34:31,680 --> 00:34:41,800
So you've got LEAF here and then you've got Metrc over here.

467
00:34:41,800 --> 00:34:45,120
And that's the one Michigan uses, Metrc.

468
00:34:45,120 --> 00:34:50,720
I haven't done much digging around yet, but as I find out more about how to access

469
00:34:50,720 --> 00:34:57,200
Michigan's Metrc data, I'll share that with you.

470
00:34:57,200 --> 00:35:06,080
So I'm not 100% sure if you can actually get public Metrc data.

471
00:35:06,080 --> 00:35:19,280
But essentially, I was just looking at this just to try to consolidate these names here.

472
00:35:19,280 --> 00:35:25,200
So let's see here.

473
00:35:25,200 --> 00:35:37,520
So let's just look at these two side by side.

474
00:35:37,520 --> 00:35:44,880
So here would be, well, this is a delivery.

475
00:35:44,880 --> 00:35:59,520
An actual...

476
00:35:59,520 --> 00:36:05,640
When you were working in your lab environment and you were running APIs, obviously, I say

477
00:36:05,640 --> 00:36:11,360
obviously, but you have to be a license holder in order to access the data, right?

478
00:36:11,360 --> 00:36:13,080
Yes, 100%.

479
00:36:13,080 --> 00:36:17,920
So well, in Washington state, you can do the Freedom of Information Act.

480
00:36:17,920 --> 00:36:18,920
That's just a...

481
00:36:18,920 --> 00:36:25,600
It's sort of funny how that works, but that's like historic data.

482
00:36:25,600 --> 00:36:36,600
But to get sort of the live active data, yes, you have to be a license holder or technically

483
00:36:36,600 --> 00:36:42,040
there are software companies that are verified integrators with the state, provide software

484
00:36:42,040 --> 00:36:44,800
services to the companies.

485
00:36:44,800 --> 00:36:58,520
And then they can actually interface with the state traceability API.

486
00:36:58,520 --> 00:37:02,080
Okay.

487
00:37:02,080 --> 00:37:08,000
The way I'm interested in it though is basically just trying to help people just have sort

488
00:37:08,000 --> 00:37:12,000
of a common way of talking about data.

489
00:37:12,000 --> 00:37:17,760
The data they're looking at.

490
00:37:17,760 --> 00:37:28,960
So for example, on the left here, you've got like essentially a sales receipt in LEAF.

491
00:37:28,960 --> 00:37:36,120
And then here you've got a sales receipt on the right in Metrc.

492
00:37:36,120 --> 00:37:41,500
And so they're generally capturing the same thing.

493
00:37:41,500 --> 00:37:43,080
So you've got the time.

494
00:37:43,080 --> 00:37:51,440
So here we've got sales date and they have it in ISO formatted timestamp.

495
00:37:51,440 --> 00:37:54,760
And then here we've got created at.

496
00:37:54,760 --> 00:38:01,760
And I do believe you can actually pass ISO formatted timestamps.

497
00:38:01,760 --> 00:38:11,800
But the documentation, it says it should be in that format.
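One way the two receipt shapes could be translated into a common schema on ingest, as a rough sketch. The field maps, the standardized names `sale_id` and `sold_at`, the non-ISO timestamp format, and the `standardize` helper are all assumptions made for illustration; they are not taken from either system's documentation.

```python
from datetime import datetime

# Hypothetical field maps: source field name -> standardized field name.
FIELD_MAPS = {
    "leaf": {"global_id": "sale_id", "created_at": "sold_at"},
    "metrc": {"Id": "sale_id", "SalesDateTime": "sold_at"},
}

# Assumed non-ISO timestamp format, used for the "leaf" example only.
LEAF_TIME_FORMAT = "%m/%d/%Y %I:%M%p"


def standardize(record, system):
    """Rename mapped fields and convert the timestamp to ISO 8601."""
    field_map = FIELD_MAPS[system]
    out = {field_map[key]: value for key, value in record.items() if key in field_map}
    if system == "leaf":
        parsed = datetime.strptime(out["sold_at"], LEAF_TIME_FORMAT)
    else:
        parsed = datetime.fromisoformat(out["sold_at"])
    out["sold_at"] = parsed.isoformat()
    out["source"] = system  # keep track of where the record came from
    return out


leaf_sale = standardize({"global_id": "S1", "created_at": "05/01/2021 02:30PM"}, "leaf")
metrc_sale = standardize({"Id": "77", "SalesDateTime": "2021-05-01T14:30:00"}, "metrc")
```

Once both records carry the same field names and ISO timestamps, they can be stacked and analyzed together regardless of which state they came from.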

498
00:38:11,800 --> 00:38:17,800
Yes, you're already thinking along the lines of data consolidation and aggregation.

499
00:38:17,800 --> 00:38:18,800
Exactly.

500
00:38:18,800 --> 00:38:22,120
And so basically just for my...

501
00:38:22,120 --> 00:38:25,800
I'm just trying to create some standards here.

502
00:38:25,800 --> 00:38:29,320
Well, and there's other efforts to do this.

503
00:38:29,320 --> 00:38:36,600
I should note there's OpenTHC and other groups who are trying to do sort of standardization

504
00:38:36,600 --> 00:38:39,040
in the cannabis industry.

505
00:38:39,040 --> 00:38:45,000
Oh yes.

506
00:38:45,000 --> 00:38:50,880
This was specifically lab results.

507
00:38:50,880 --> 00:38:54,720
But essentially it would just be nice to create models for these.

508
00:38:54,720 --> 00:39:00,880
And my formalization would be yes to go with the ISO.

509
00:39:00,880 --> 00:39:06,040
You definitely want the ISO formatted timestamps.

510
00:39:06,040 --> 00:39:11,000
But it's not critical, but it's...

511
00:39:11,000 --> 00:39:17,640
I think it would be awesome to just have a standardized way to talk about cannabis data

512
00:39:17,640 --> 00:39:20,120
even between traceability systems.

513
00:39:20,120 --> 00:39:28,480
So that way you could do analysis regardless if they're in Washington state or in Michigan.

514
00:39:28,480 --> 00:39:29,480
Right.

515
00:39:29,480 --> 00:39:30,480
Absolutely.

516
00:39:30,480 --> 00:39:38,200
So Charles, one of the ideas that Keegan was sharing with me for my graduate project was

517
00:39:38,200 --> 00:39:49,200
possibly mapping THC concentrations in different products and relating that to sales to see

518
00:39:49,200 --> 00:39:57,920
what the relationship is, whether there's any kind of correlation between those factors.

519
00:39:57,920 --> 00:40:01,440
And so that was one good idea.

520
00:40:01,440 --> 00:40:04,600
And I'm definitely open to other ideas as well.

521
00:40:04,600 --> 00:40:06,480
I just have to come up with something pretty quick.

522
00:40:06,480 --> 00:40:09,960
But if you also, if you think of any kind of interesting questions, Charles, please

523
00:40:09,960 --> 00:40:10,960
let me know.

524
00:40:10,960 --> 00:40:16,120
But again, the first step for me to do anything with this is trying to kind of map out the data

525
00:40:16,120 --> 00:40:20,400
so I understand it, the relationship and how I can pull the data that I would need for

526
00:40:20,400 --> 00:40:21,400
my analysis.

527
00:40:21,400 --> 00:40:22,400
Right.

528
00:40:22,400 --> 00:40:27,480
I started actually, I've started working on that particular problem.

529
00:40:27,480 --> 00:40:28,480
Okay.

530
00:40:28,480 --> 00:40:37,360
So one of the things you're going to run into is just the massive amount of data because

531
00:40:37,360 --> 00:40:45,160
you need the lab results, you need the inventory, you need the sales.

532
00:40:45,160 --> 00:40:51,680
And so, I mean, it's a lot of data.

533
00:40:51,680 --> 00:41:01,080
And so what I've been trying to do is get Dask working so that, although what ends up

534
00:41:01,080 --> 00:41:11,040
happening is even with Dask, you're swap file bound and you're still, if you don't have

535
00:41:11,040 --> 00:41:18,600
a lot of memory, then it spends a lot of time swapping between the disk and the memory.

536
00:41:18,600 --> 00:41:27,960
I wonder if you could do some sort of data sampling methodology as opposed to trying

537
00:41:27,960 --> 00:41:35,160
to get it all, just maybe do a segment of the products, at least as a start.

538
00:41:35,160 --> 00:41:38,440
Yeah, that would be a good approach.
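A sketch of one possible sampling approach, assuming the data arrives as a large CSV file: stream it row by row and keep a random fraction, so only the sample is held in memory. The column names, the `sample_rows` helper, and the in-memory stand-in file are hypothetical.

```python
import csv
import io
import random


def sample_rows(lines, fraction, seed=0):
    """Stream CSV rows and keep roughly `fraction` of them."""
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    reader = csv.DictReader(lines)
    return [row for row in reader if rng.random() < fraction]


# A small in-memory stand-in for a large file. With a real file,
# pass an open file handle instead of io.StringIO.
fake_file = io.StringIO(
    "sale_id,price\n" + "\n".join(f"{i},{i * 1.5}" for i in range(1000))
)
sample = sample_rows(fake_file, fraction=0.1)
```

Sampling rows independently like this would break up receipts, so for basket-style analysis it may be better to sample whole sales by ID instead.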

539
00:41:38,440 --> 00:41:44,040
So Paul, here, oops, I keep bringing that up.

540
00:41:44,040 --> 00:41:50,760
So here is actually how you make the connection.

541
00:41:50,760 --> 00:41:58,720
You can do it more elegantly because I'm not sure how you're going to capture the, basically

542
00:41:58,720 --> 00:42:04,880
the things you need to capture are the quantity.

543
00:42:04,880 --> 00:42:10,560
So it's going to be a mess, right?

544
00:42:10,560 --> 00:42:13,840
So you've got a receipt here.

545
00:42:13,840 --> 00:42:19,120
A receipt can have multiple items, right?

546
00:42:19,120 --> 00:42:30,120
And this is where you got into the basket idea because a consumer may buy a soda and

547
00:42:30,120 --> 00:42:41,360
a jar of flower or they may buy two jars of different flower from two different licensees

548
00:42:41,360 --> 00:42:46,920
or two different cultivators.

549
00:42:46,920 --> 00:42:57,200
So what you're basically going to have to do is for each sale item, I would aggregate

550
00:42:57,200 --> 00:43:04,800
by sale item, so just essentially think of every sale item independently.

551
00:43:04,800 --> 00:43:08,480
They are independent in the sale items table.

552
00:43:08,480 --> 00:43:11,240
Oh, I see.

553
00:43:11,240 --> 00:43:13,640
The sales is just a total amount.

554
00:43:13,640 --> 00:43:17,080
The sale items actually have the individual items.

555
00:43:17,080 --> 00:43:20,200
It's like a parent-child relationship.

556
00:43:20,200 --> 00:43:21,200
Yeah.

557
00:43:21,200 --> 00:43:22,200
Okay.

558
00:43:22,200 --> 00:43:25,200
That's good to know.

559
00:43:25,200 --> 00:43:33,080
Yeah, so that was one idea that we were talking about, Charles: market basket analysis, where

560
00:43:33,080 --> 00:43:40,520
the example is if I buy peanut butter and jelly, I'm likely to buy bread, right?

561
00:43:40,520 --> 00:43:44,280
And some of that analysis might be kind of interesting just to see what the product mix

562
00:43:44,280 --> 00:43:46,600
is of what people are buying.

563
00:43:46,600 --> 00:43:48,080
Yeah.

564
00:43:48,080 --> 00:43:55,800
I don't know how you can, or I just don't know off the top of my head how to quantify

565
00:43:55,800 --> 00:43:56,800
it.

566
00:43:56,800 --> 00:43:57,800
I'm sure you can.

567
00:43:57,800 --> 00:44:06,080
We have to get the sale ID or somehow the sale ID I think is embedded in the sale items

568
00:44:06,080 --> 00:44:08,560
ID or in the sale items table.

569
00:44:08,560 --> 00:44:09,560
Yeah.

570
00:44:09,560 --> 00:44:15,840
You have to link them that way and then find everything that was sold with that particular

571
00:44:15,840 --> 00:44:16,840
sales ID.
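A minimal sketch of that linking step: group the sale items by the sale they belong to, so each sale becomes one basket. The field names `sale_id` and `product_type` are assumptions for illustration.

```python
from collections import defaultdict

# Hypothetical sale items; "sale_id" and "product_type" are assumed field names.
sale_items = [
    {"sale_id": "S1", "product_type": "flower"},
    {"sale_id": "S1", "product_type": "edible"},
    {"sale_id": "S2", "product_type": "flower"},
]

# Parent-child link: every sale item points back to its parent sale.
baskets = defaultdict(list)
for item in sale_items:
    baskets[item["sale_id"]].append(item["product_type"])
```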

572
00:44:16,840 --> 00:44:19,520
Yeah, that seems to make sense.

573
00:44:19,520 --> 00:44:20,520
Okay.

574
00:44:20,520 --> 00:44:26,040
We have to take a look. Maybe I'll start off with that one then, as far as the mapping

575
00:44:26,040 --> 00:44:27,040
goes.

576
00:44:27,040 --> 00:44:36,400
But you were saying that the, was it the customer or the member ID or whatever, the MMI, I think

577
00:44:36,400 --> 00:44:42,280
you were calling it, that seems to be like the core ID in the whole system, the licensee

578
00:44:42,280 --> 00:44:45,280
ID.

579
00:44:45,280 --> 00:44:55,440
Oh, so let me just get this thought out and then I'll explain the ID.

580
00:44:55,440 --> 00:44:56,440
Yeah, go ahead.

581
00:44:56,440 --> 00:45:02,240
I was just thinking a way that you could actually quantify the baskets would be to just do percent

582
00:45:02,240 --> 00:45:07,000
out of 100 of the different types of goods.

583
00:45:07,000 --> 00:45:18,320
So did this person buy 100% flower or did they buy or was their sale like 80% flower,

584
00:45:18,320 --> 00:45:19,800
20% edible?
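The percent-of-basket idea could be sketched like this, assuming each sale item carries a product type and a price (both field names, and the `basket_shares` helper, are hypothetical). Shares here are by price; counting items instead would be an equally reasonable choice.

```python
from collections import defaultdict


def basket_shares(items):
    """Return each product type's percent share of a sale's total price."""
    totals = defaultdict(float)
    for item in items:
        totals[item["product_type"]] += item["price"]
    sale_total = sum(totals.values())
    return {ptype: 100 * amount / sale_total for ptype, amount in totals.items()}


shares = basket_shares([
    {"product_type": "flower", "price": 40.0},
    {"product_type": "edible", "price": 10.0},
])
```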

585
00:45:19,800 --> 00:45:26,080
Yeah, in market basket analysis, they have something called support.

586
00:45:26,080 --> 00:45:31,240
There's like four or five metrics and you're hitting on one of them.

587
00:45:31,240 --> 00:45:35,680
One's called support; it's basically like a popularity measure.

588
00:45:35,680 --> 00:45:41,840
There's another one called lift and that's the strength of relationship amongst the items

589
00:45:41,840 --> 00:45:43,100
in the basket.

590
00:45:43,100 --> 00:45:48,000
So there's various different metrics in market basket analysis that will kind of give us

591
00:45:48,000 --> 00:45:49,000
that picture.

592
00:45:49,000 --> 00:45:55,440
It gives you different, so you could have a combination that's very, very popular, but

593
00:45:55,440 --> 00:45:59,800
it's weird, it's almost counterintuitive, but the relationship amongst those items in

594
00:45:59,800 --> 00:46:03,080
the basket may not be that strong.

595
00:46:03,080 --> 00:46:06,400
If I have a propensity to buy one, do I buy the other?

596
00:46:06,400 --> 00:46:11,600
So there's some interesting things, relationships you can get out of that.
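Support and lift, as they are usually defined in market basket analysis, can be sketched in a few lines. The baskets below are made up, and `support` and `lift` are illustrative helper functions rather than calls from any particular library.

```python
# Hypothetical baskets: the set of product types bought in each sale.
baskets = [
    {"flower", "edible"},
    {"flower", "edible"},
    {"flower"},
    {"concentrate"},
]


def support(itemset, baskets):
    """Fraction of baskets that contain every item in `itemset`."""
    return sum(itemset <= basket for basket in baskets) / len(baskets)


def lift(a, b, baskets):
    """support(A and B) / (support(A) * support(B)).

    A value above 1 suggests A and B appear together more often than
    they would if they were bought independently.
    """
    return support(a | b, baskets) / (support(a, baskets) * support(b, baskets))


flower_support = support({"flower"}, baskets)
flower_edible_lift = lift({"flower"}, {"edible"}, baskets)
```

This shows the point about popularity versus strength: an item can have high support simply because it is in most baskets, while lift measures whether the pairing is stronger than chance.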

597
00:46:11,600 --> 00:46:12,680
That's fascinating.

598
00:46:12,680 --> 00:46:16,120
You are an expert on this.

599
00:46:16,120 --> 00:46:19,840
So I'm stunned.

600
00:46:19,840 --> 00:46:23,840
I want to learn a lot more from you about this because this is something that I'm interested

601
00:46:23,840 --> 00:46:28,800
in and I apparently, I don't know that much.

602
00:46:28,800 --> 00:46:36,280
It's actually, maybe next time we can go over some of it, but it's actually not that difficult

603
00:46:36,280 --> 00:46:38,920
after you look at it a little bit.

604
00:46:38,920 --> 00:46:46,640
It's actually one of the most basic machine learning algorithms that you use, or not you

605
00:46:46,640 --> 00:46:48,400
use, but it can be used.

606
00:46:48,400 --> 00:46:51,720
It's just real basic, it's been around for a long time.

607
00:46:51,720 --> 00:46:55,400
And one of the reasons, this sounds, maybe I shouldn't say this, but for this graduate

608
00:46:55,400 --> 00:46:58,760
project, I was looking for something that wouldn't be too, too overwhelming because

609
00:46:58,760 --> 00:47:02,640
there's such a large time commitment in writing the paper and everything else that I didn't

610
00:47:02,640 --> 00:47:07,280
want to trip myself up and get in something that's really advanced that might be a real

611
00:47:07,280 --> 00:47:08,280
time suck.

612
00:47:08,280 --> 00:47:11,080
So I thought, this is something that could be useful.

613
00:47:11,080 --> 00:47:12,080
It's well known.

614
00:47:12,080 --> 00:47:14,000
It's not too complicated.

615
00:47:14,000 --> 00:47:18,800
And it may be a low hanging fruit for this data set.

616
00:47:18,800 --> 00:47:25,800
So just something that I'm keeping in mind.

617
00:47:25,800 --> 00:47:28,320
Okay.

618
00:47:28,320 --> 00:47:32,280
Yeah.

619
00:47:32,280 --> 00:47:35,920
I'm glad, Charles, you said that about the sale item relationship.

620
00:47:35,920 --> 00:47:41,920
That'll be the first thing I'm going to go check out as far as mapping goes.

621
00:47:41,920 --> 00:47:47,160
But yeah, I mean, if you guys are cool with me taking a stab at some of the, getting started

622
00:47:47,160 --> 00:47:51,040
with some of the data mapping, I'll be glad to do that.

623
00:47:51,040 --> 00:47:55,200
But yeah, I probably will have some questions for you as I get tripped up on anything.

624
00:47:55,200 --> 00:47:56,200
Sure.

625
00:47:56,200 --> 00:47:57,200
Well, yeah.

626
00:47:57,200 --> 00:48:05,440
I mean, contact Keegan or me and we'll help you out or try and help you out.

627
00:48:05,440 --> 00:48:06,440
It's a lot of data.

628
00:48:06,440 --> 00:48:10,680
It's a lot of information.

629
00:48:10,680 --> 00:48:12,000
It's hard to keep it all straight.

630
00:48:12,000 --> 00:48:15,000
So yeah, certainly.

631
00:48:15,000 --> 00:48:25,000
So just curious about...

632
00:48:25,000 --> 00:48:30,240
The connection may be a little choppy, but I was just going to jump in real quick and

633
00:48:30,240 --> 00:48:34,440
point you in maybe a direction connecting these IDs real quick.

634
00:48:34,440 --> 00:48:35,440
That's okay?

635
00:48:35,440 --> 00:48:36,440
Yeah, go ahead.

636
00:48:36,440 --> 00:48:37,440
Right here.

637
00:48:37,440 --> 00:48:38,440
Yeah.

638
00:48:38,440 --> 00:48:42,320
Am I coming through okay?

639
00:48:42,320 --> 00:48:55,080
It chops a little bit, but for the most part, you're fine.

640
00:48:55,080 --> 00:48:57,960
Am I still coming through okay?

641
00:48:57,960 --> 00:48:58,960
Yeah.

642
00:48:58,960 --> 00:48:59,960
Yeah.

643
00:48:59,960 --> 00:49:00,960
Yes.

644
00:49:00,960 --> 00:49:01,960
Okay.

645
00:49:01,960 --> 00:49:05,760
Well, I'll just try to power through this.

646
00:49:05,760 --> 00:49:11,080
Essentially I was just going to say real quick that, you know, we've basically mapped the

647
00:49:11,080 --> 00:49:15,360
licensee to the lab result.

648
00:49:15,360 --> 00:49:21,640
And then you can use the inventory ID.

649
00:49:21,640 --> 00:49:31,800
So you can use the inventory ID of the lab result to map that to the global inventory

650
00:49:31,800 --> 00:49:36,480
ID of the sale.

651
00:49:36,480 --> 00:49:44,640
So that way you've mapped the cultivator and the lab result to the sale item.

652
00:49:44,640 --> 00:49:48,440
So in the sale table, what was the one that maps back to the lab result?

653
00:49:48,440 --> 00:49:49,840
Can you repeat that?

654
00:49:49,840 --> 00:49:51,760
I was looking down, sorry.

655
00:49:51,760 --> 00:49:52,760
Yes.

656
00:49:52,760 --> 00:50:07,040
So you have the inventory ID of the lab result maps to the global inventory ID of actually

657
00:50:07,040 --> 00:50:11,040
the sale item.

658
00:50:11,040 --> 00:50:13,040
I got you.

659
00:50:13,040 --> 00:50:14,440
Yep.

660
00:50:14,440 --> 00:50:20,200
And what Charles was saying is you can actually basically use this.

661
00:50:20,200 --> 00:50:31,760
I think this is what Charles was saying, you can use this global inventory ID and then

662
00:50:31,760 --> 00:50:36,320
you could also potentially, you know, incorporate some other data points.

663
00:50:36,320 --> 00:50:46,200
So you could also maybe get, look that inventory item up in inventories.

664
00:50:46,200 --> 00:50:56,040
And then you can find out, you know, whatever, you know, you could find out more data points

665
00:50:56,040 --> 00:51:05,440
about that inventory, like what strain it is.
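The chain of lookups described here could be sketched as follows. All records and field names are hypothetical stand-ins for the tables mentioned in the conversation, and the sketch assumes at most one lab result per inventory item.

```python
# Hypothetical records; field names are assumptions based on the conversation.
lab_results = [
    {"lab_result_id": "LR1", "inventory_id": "INV1", "licensee_id": "L9", "thc": 21.5}
]
sale_items = [
    {"sale_item_id": "SI1", "sale_id": "S1", "inventory_id": "INV1", "price": 40.0}
]
inventories = [{"inventory_id": "INV1", "strain": "Example Strain"}]

# Index the lookup tables by inventory ID.
lab_by_inventory = {row["inventory_id"]: row for row in lab_results}
inventory_by_id = {row["inventory_id"]: row for row in inventories}

# Attach the lab result, cultivator, and strain to each sale item.
enriched = []
for item in sale_items:
    lab = lab_by_inventory.get(item["inventory_id"], {})
    inventory = inventory_by_id.get(item["inventory_id"], {})
    enriched.append({
        **item,
        "thc": lab.get("thc"),
        "licensee_id": lab.get("licensee_id"),
        "strain": inventory.get("strain"),
    })
```

Sale items with no matching lab result or inventory record keep `None` in those fields instead of being dropped.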

666
00:51:05,440 --> 00:51:09,560
This system is just a wealth for business intelligence, isn't it?

667
00:51:09,560 --> 00:51:17,080
I mean, if you wrap your hands around this, I mean, you'll definitely have the pulse.

668
00:51:17,080 --> 00:51:20,320
I mean, obviously if you only get the data set once a month, but you'll have a pulse

669
00:51:20,320 --> 00:51:22,720
of what's going on in Washington state for sure.

670
00:51:22,720 --> 00:51:26,840
And that could be pretty good for your company, I would think.

671
00:51:26,840 --> 00:51:27,840
Well definitely.

672
00:51:27,840 --> 00:51:34,800
And it's one of those things where the data is sitting there, but it's such like a big

673
00:51:34,800 --> 00:51:40,960
pile of data and it's coming out so fast that I don't think people have a great handle

674
00:51:40,960 --> 00:51:41,960
on it.

675
00:51:41,960 --> 00:51:48,200
So you do see companies out there that are sort of just doing monthly totals.

676
00:51:48,200 --> 00:51:58,520
So that's useful just to know, okay, you know, what's the total sales, you know, each month.

677
00:51:58,520 --> 00:52:00,160
And so those are the big ones.

678
00:52:00,160 --> 00:52:05,880
Like maybe the more ambitious people are maybe trying to do like average price.

679
00:52:05,880 --> 00:52:09,600
But I don't even think people have that great of a handle on that.

680
00:52:09,600 --> 00:52:15,520
Maybe I just haven't seen it.

681
00:52:15,520 --> 00:52:22,920
But the more in-depth connections, I'm not certain that, and that's why it's so cool

682
00:52:22,920 --> 00:52:29,480
working with it because, you know, we can be some of the first people to actually connect

683
00:52:29,480 --> 00:52:40,600
the data sets here and really have some real cool findings and just some real interesting

684
00:52:40,600 --> 00:52:48,720
findings and often all you really need to do is some conditional averages and you can.

685
00:52:48,720 --> 00:52:55,080
Yeah, it's definitely like first mover advantage.

686
00:52:55,080 --> 00:53:01,080
So in your opinion, for you as well, Charles, I mean, obviously you guys know a lot more

687
00:53:01,080 --> 00:53:02,080
about this than I do.

688
00:53:02,080 --> 00:53:11,920
But would you say the data science slash analytics maturity level in the industry right now is

689
00:53:11,920 --> 00:53:13,220
pretty low then?

690
00:53:13,220 --> 00:53:18,560
So there's not a whole lot of, I mean, it sounds like people are doing some basic aggregation

691
00:53:18,560 --> 00:53:23,080
based on what you said, but there's really not much in the way of data science being

692
00:53:23,080 --> 00:53:28,320
applied in the space.

693
00:53:28,320 --> 00:53:34,400
Well so that's why I think there's a shortage.

694
00:53:34,400 --> 00:53:37,280
So it's being done.

695
00:53:37,280 --> 00:53:40,240
What I see is mostly just averages.

696
00:53:40,240 --> 00:53:47,760
So I'll just see people just do monthly averages.

697
00:53:47,760 --> 00:53:52,040
So they'll just do total sales by month.

698
00:53:52,040 --> 00:53:59,120
And then they'll almost just conjecture what may be in the future.

699
00:53:59,120 --> 00:54:07,400
So I rarely even see like formal forecasting models.

700
00:54:07,400 --> 00:54:12,400
And then I think like the, I take that back.

701
00:54:12,400 --> 00:54:18,580
There are some companies I see that are starting to do a bit more in-depth analysis of sort

702
00:54:18,580 --> 00:54:20,240
of consumer sales.

703
00:54:20,240 --> 00:54:34,280
So they're saying, oh, the average consumer, maybe they're 60, 70% male, 30, 40% female.

704
00:54:34,280 --> 00:54:38,160
And so that's where, so I think they're starting to get there.

705
00:54:38,160 --> 00:54:43,400
And so that's where I was talking about sort of the conditional averages.

706
00:54:43,400 --> 00:54:49,880
And that's sort of, I think, where you start to get into the better insights is instead

707
00:54:49,880 --> 00:55:03,440
of just doing monthly sales, do sales by X.

708
00:55:03,440 --> 00:55:06,440
Instead of monthly sales, do sales by flower versus concentrate.

709
00:55:06,440 --> 00:55:08,600
So you're starting to get there.

710
00:55:08,600 --> 00:55:17,800
But then like the analysis you're doing where you're actually saying, OK, does lab results,

711
00:55:17,800 --> 00:55:22,560
do lab results have an effect on sales?

712
00:55:22,560 --> 00:55:29,960
I think the only people who are doing that sort of research are, like, the

713
00:55:29,960 --> 00:55:30,960
academics.

714
00:55:30,960 --> 00:55:34,160
And like I said, it's in high demand.

715
00:55:34,160 --> 00:55:36,520
Everybody wants to know those sort of questions.

716
00:55:36,520 --> 00:55:45,760
But there's just a shortage of data scientists and people that have time to wrangle this

717
00:55:45,760 --> 00:55:46,760
stuff.

718
00:55:46,760 --> 00:55:49,280
So that's why it's just such a good opportunity.

719
00:55:49,280 --> 00:55:52,880
Yeah, the wrangling is the biggest part.

720
00:55:52,880 --> 00:55:58,400
And so there's univariate type statistics out there, summary statistics.

721
00:55:58,400 --> 00:56:03,360
You're kind of talking about some multivariate stuff, this versus that.

722
00:56:03,360 --> 00:56:10,040
But it sounds like there's really no kind of machine learning effort or that we're aware

723
00:56:10,040 --> 00:56:13,400
of anyway that's currently.

724
00:56:13,400 --> 00:56:14,400
Nothing super public.

725
00:56:14,400 --> 00:56:21,880
I mean, like I'm sure people are making their attempts internally at making like machine

726
00:56:21,880 --> 00:56:22,880
learning models.

727
00:56:22,880 --> 00:56:31,880
And I've heard of like biotechnology companies who are trying to maybe do predictive analysis

728
00:56:31,880 --> 00:56:41,880
with cultivation, but really, really early stages.

729
00:56:41,880 --> 00:56:49,920
So I have one contact, so this is a bit of a stretch, but my cousin's husband's brother.

730
00:56:49,920 --> 00:56:51,720
I know he lives in Chicago.

731
00:56:51,720 --> 00:56:58,640
I know for a time he was looking, he worked for a hedge fund for a while.

732
00:56:58,640 --> 00:57:05,280
And then all I heard was he was looking into the cannabis industry.

733
00:57:05,280 --> 00:57:07,880
And this was probably about 18 months ago.

734
00:57:07,880 --> 00:57:10,840
That's the last I heard of it.

735
00:57:10,840 --> 00:57:17,640
And so my guess is that investment firms probably have some quants that have tried to do some

736
00:57:17,640 --> 00:57:24,800
of this stuff as far as trying to figure out what the investment landscape looks like.

737
00:57:24,800 --> 00:57:28,000
So those are probably one of the few, along with the academics that you mentioned.

738
00:57:28,000 --> 00:57:33,320
Those are probably the few people then that probably have delved into it that much.

739
00:57:33,320 --> 00:57:39,760
And I imagine once they look at some of the data wrangling and that it's not consolidated,

740
00:57:39,760 --> 00:57:43,480
it's probably not a comprehensive view.

741
00:57:43,480 --> 00:57:46,840
But I'm just kind of theorizing here, but those are probably the only folks that then

742
00:57:46,840 --> 00:57:50,440
have really probably dug into this that much.

743
00:57:50,440 --> 00:57:51,440
Exactly.

744
00:57:51,440 --> 00:57:56,120
The quants... unless you were going to chime in, Charles.

745
00:57:56,120 --> 00:57:57,120
Oh, no.

746
00:57:57,120 --> 00:58:02,360
I mean, I really kind of don't know what other people are doing or what's going on out there.

747
00:58:02,360 --> 00:58:09,440
I'm just sort of doing this for an academic sense, not really like, you know.

748
00:58:09,440 --> 00:58:11,880
But it would be nice to see what other people are doing.

749
00:58:11,880 --> 00:58:12,880
Yeah.

750
00:58:12,880 --> 00:58:13,880
Yeah.

751
00:58:13,880 --> 00:58:16,080
At the very least, it's fun.

752
00:58:16,080 --> 00:58:19,080
So it's interesting.

753
00:58:19,080 --> 00:58:20,080
Exactly.

754
00:58:20,080 --> 00:58:21,080
And there's so many avenues.

755
00:58:21,080 --> 00:58:25,320
And I was just going to chime in that I think the quantitative guys, they're looking at

756
00:58:25,320 --> 00:58:28,080
stocks and sales.

757
00:58:28,080 --> 00:58:32,680
So I think there's a ton of interest in like consumer sales.

758
00:58:32,680 --> 00:58:37,640
So I think they're starting to get a bit more advanced there.

759
00:58:37,640 --> 00:58:43,960
But I think people are just barely scraping the surface with sort of cultivation

760
00:58:43,960 --> 00:58:50,840
analytics and then just the market analysis as a whole.

761
00:58:50,840 --> 00:58:53,240
There are maybe some people taking a stab at it.

762
00:58:53,240 --> 00:58:55,160
And then, of course, academics.

763
00:58:55,160 --> 00:59:01,600
But a lot of times they're just late to the game.

764
00:59:01,600 --> 00:59:06,720
Maybe don't have the best inside knowledge of the industry.

765
00:59:06,720 --> 00:59:10,960
It'd be great because there's a lot of like combining data sets you need to do.

766
00:59:10,960 --> 00:59:13,080
So I'm not certain.

767
00:59:13,080 --> 00:59:18,240
But like I said, it's crazy because all this data is just sitting there on the

768
00:59:18,240 --> 00:59:19,240
table.

769
00:59:19,240 --> 00:59:26,920
So yeah, there's a lot of opportunity for sure to do something around and find out some

770
00:59:26,920 --> 00:59:29,720
interesting things.

771
00:59:29,720 --> 00:59:36,160
So I will share some things that I've learned from digging through this data.

772
00:59:36,160 --> 00:59:48,640
One, there's a lot of records that have dates of 1-1-1900.

773
00:59:48,640 --> 00:59:56,720
Basically I think anything that's like dated before April of 2018 is kind of messy because

774
00:59:56,720 --> 01:00:00,520
apparently there was some sort of system conversion.

775
01:00:00,520 --> 01:00:05,800
And so a lot of stuff that was from before 2018 is crammed into the first couple months

776
01:00:05,800 --> 01:00:07,720
of 2018.

777
01:00:07,720 --> 01:00:09,960
So you'll get some, there's a lot of outliers.

778
01:00:09,960 --> 01:00:14,280
There's a lot of really off the charts results.

779
01:00:14,280 --> 01:00:17,320
And then it sort of stabilizes after that.

780
01:00:17,320 --> 01:00:20,160
So some data quality issues then.

781
01:00:20,160 --> 01:00:21,160
Yeah.

782
01:00:21,160 --> 01:00:25,720
I feel like there's something else I have to tell you.

783
01:00:25,720 --> 01:00:26,720
But let's see.

784
01:00:26,720 --> 01:00:33,920
I mean, those are two of the big... Oh, the other thing is data isn't necessarily recorded in

785
01:00:33,920 --> 01:00:36,200
real time.

786
01:00:36,200 --> 01:00:42,560
Like I think some places enter their data every day, some every week, some every month.

787
01:00:42,560 --> 01:00:47,400
So it's kind of hard to do any sort of time series analysis.

788
01:00:47,400 --> 01:00:49,360
I see.

789
01:00:49,360 --> 01:00:59,040
My two cents on those is, exactly, you may want to restrict your analysis to either

790
01:00:59,040 --> 01:01:06,520
after 2018 or after April of 2018, because that's sort of when the traceability was formalized.

791
01:01:06,520 --> 01:01:10,600
But look at your data and make your best judgment.
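
The advice above (drop the 1-1-1900 placeholder dates and restrict to records from April 2018 onward) could be sketched roughly as follows. This is only an illustration: the field names `date` and `total`, the sample rows, and the exact cutoff of April 1, 2018 are assumptions, not the actual traceability schema.

```python
from datetime import date

# Assumed cutoff: the speakers suggest "after April of 2018"; check your own data.
CUTOFF = date(2018, 4, 1)
# Placeholder date mentioned in the discussion (1-1-1900).
PLACEHOLDER = date(1900, 1, 1)


def restrict_records(records, cutoff=CUTOFF):
    """Keep records dated on or after the cutoff, skipping placeholder dates.

    The placeholder check is redundant for any cutoff after 1900, but it
    keeps the intent explicit if the cutoff is ever relaxed.
    """
    return [
        r for r in records
        if r["date"] != PLACEHOLDER and r["date"] >= cutoff
    ]


# Made-up example rows; the field names are hypothetical.
records = [
    {"id": 1, "date": date(1900, 1, 1), "total": 10.0},
    {"id": 2, "date": date(2018, 2, 15), "total": 9999.0},
    {"id": 3, "date": date(2018, 4, 1), "total": 25.0},
    {"id": 4, "date": date(2019, 7, 4), "total": 40.0},
]
clean = restrict_records(records)
```

Here `clean` keeps only the two records dated on or after the cutoff. Whether to cut at April 2018 or at the start of 2018 is a judgment call to make after looking at the data.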

792
01:01:10,600 --> 01:01:19,400
And then what was the second point Charles?

793
01:01:19,400 --> 01:01:20,400
It's not recorded.

794
01:01:20,400 --> 01:01:23,080
It's not recorded at the same time.

795
01:01:23,080 --> 01:01:24,080
Yes.

796
01:01:24,080 --> 01:01:25,080
Yeah.

797
01:01:25,080 --> 01:01:36,080
A lot of the times I find weekly analysis real interesting, because you still get the

798
01:01:36,080 --> 01:01:45,840
dynamics, you know, a bit more dynamic than like a monthly series.

799
01:01:45,840 --> 01:01:56,360
But then you don't get this noise that's present in the daily data where you may have these

800
01:01:56,360 --> 01:02:00,760
unnatural spikes where like Charles said, maybe everybody just does all their data entry

801
01:02:00,760 --> 01:02:01,760
on Monday.

802
01:02:01,760 --> 01:02:06,160
Or, you know, but maybe that's not when the sales actually happen.
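
A minimal sketch of the weekly aggregation described above, assuming each record has a `date` and a numeric `total` (both hypothetical field names). Summing by week absorbs the day-of-entry spikes, such as everything being keyed in on a Monday, while keeping more movement than a monthly series.

```python
from collections import defaultdict
from datetime import date, timedelta


def weekly_totals(records):
    """Sum 'total' by the Monday that starts each record's week."""
    totals = defaultdict(float)
    for r in records:
        # date.weekday() is 0 for Monday, so this rolls back to the week start.
        week_start = r["date"] - timedelta(days=r["date"].weekday())
        totals[week_start] += r["total"]
    return dict(sorted(totals.items()))


# Made-up daily entries with batch data entry on Mondays.
entries = [
    {"date": date(2019, 7, 1), "total": 300.0},  # Monday batch
    {"date": date(2019, 7, 3), "total": 20.0},
    {"date": date(2019, 7, 8), "total": 280.0},  # next Monday batch
    {"date": date(2019, 7, 12), "total": 30.0},
]
weekly = weekly_totals(entries)
```

The daily values swing between 20 and 300, while the two weekly totals come out close to each other, which is the smoothing effect being described.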

803
01:02:06,160 --> 01:02:07,160
Sure.

804
01:02:07,160 --> 01:02:11,960
That makes a lot of sense because one of the first things that I felt when I was reading

805
01:02:11,960 --> 01:02:19,200
through this, I thought, what a business burden for these organizations, right?

806
01:02:19,200 --> 01:02:23,280
To have to actually do all this work, right?

807
01:02:23,280 --> 01:02:24,760
All this accounting for the government.

808
01:02:24,760 --> 01:02:27,520
And that's a lot of time, a lot of effort.

809
01:02:27,520 --> 01:02:32,200
So if they're going to figure out ways to make it as easy for them as possible and batch

810
01:02:32,200 --> 01:02:37,160
their submissions, I'm sure that's what they're going to do.

811
01:02:37,160 --> 01:02:41,280
Well, you've hit on essentially what Cannlytics is all about.

812
01:02:41,280 --> 01:02:45,040
And so not just Cannlytics, but many software companies out there.

813
01:02:45,040 --> 01:02:47,680
So that's one thing, it's incredible.

814
01:02:47,680 --> 01:02:53,400
You know, you throw these companies a challenge and they just, you know, there's companies

815
01:02:53,400 --> 01:02:54,560
that rise to it.

816
01:02:54,560 --> 01:03:00,360
So there's basically, you know, software companies that specifically just essentially try to

817
01:03:00,360 --> 01:03:03,280
make interfacing with the traceability system easy.

818
01:03:03,280 --> 01:03:10,040
And then, you know, they often have to have some sort of like ERP aspect.

819
01:03:10,040 --> 01:03:13,160
But you know, some sort of value added.

820
01:03:13,160 --> 01:03:19,880
And then, because you're right, it's either that or, you know, sit your employee down

821
01:03:19,880 --> 01:03:22,480
to do some data entry.

822
01:03:22,480 --> 01:03:23,480
Right.

823
01:03:23,480 --> 01:03:27,960
So from the Cannlytics perspective, is that the space that you're most interested

824
01:03:27,960 --> 01:03:32,280
in right now is like how to facilitate making these guys' lives easier?

825
01:03:32,280 --> 01:03:36,040
Well, essentially so.

826
01:03:36,040 --> 01:03:39,840
Right now, Cannlytics is verified in Oklahoma.

827
01:03:39,840 --> 01:03:42,400
So with Metrc.

828
01:03:42,400 --> 01:03:51,640
And so the idea is, so, you know, the laboratories need to operate and, you know, they need to

829
01:03:51,640 --> 01:03:53,120
comply with traceability.

830
01:03:53,120 --> 01:03:59,760
And at the end of the day, the producers just care about getting their products

831
01:03:59,760 --> 01:04:02,480
tested and then sold.

832
01:04:02,480 --> 01:04:07,200
And they need to comply with traceability, preferably as an afterthought.

833
01:04:07,200 --> 01:04:08,200
Right.

834
01:04:08,200 --> 01:04:12,400
So they don't want to spend all day worrying about that.

835
01:04:12,400 --> 01:04:13,400
Sure.

836
01:04:13,400 --> 01:04:19,440
And so that's sort of what Cannlytics does: you sort of leverage the API so that

837
01:04:19,440 --> 01:04:25,440
way, you know, the data just flows nicely through the whole system.

838
01:04:25,440 --> 01:04:27,960
So you collect it smoothly.

839
01:04:27,960 --> 01:04:30,320
You create your certificates.

840
01:04:30,320 --> 01:04:33,120
It goes up into the traceability system.

841
01:04:33,120 --> 01:04:37,120
The cultivators, processors receive it.

842
01:04:37,120 --> 01:04:45,160
And then they can, you know, have the results in their system, sell it to the retailer.

843
01:04:45,160 --> 01:04:48,040
Everything's just moving along smoothly.

844
01:04:48,040 --> 01:04:51,800
And then they can sell it.

845
01:04:51,800 --> 01:04:57,040
And so that's the ideal world that we're trying to move towards.

846
01:04:57,040 --> 01:05:04,560
And then we're just trying to, you know, iron everything out, help those that are forced

847
01:05:04,560 --> 01:05:12,960
to still do data entry because I've been there and I don't think anybody, you know, no one

848
01:05:12,960 --> 01:05:14,600
deserves to be doing data entry.

849
01:05:14,600 --> 01:05:16,960
So we're trying to solve that problem.

850
01:05:16,960 --> 01:05:22,200
Yeah, no, that's a great spot that you're in then because you're just trying to make

851
01:05:22,200 --> 01:05:24,040
the lives easier for the businesses.

852
01:05:24,040 --> 01:05:29,160
Yeah, they just want to have a quality product and make sure it's following the process and

853
01:05:29,160 --> 01:05:30,160
they make their money.

854
01:05:30,160 --> 01:05:31,160
Right.

855
01:05:31,160 --> 01:05:34,880
And so I think that's a great spot for you coming in and kind of greasing the skids for

856
01:05:34,880 --> 01:05:38,920
that process is a great spot to be in, it sounds like.

857
01:05:38,920 --> 01:05:39,920
Just trying to help.

858
01:05:39,920 --> 01:05:42,920
Because, you know, like that's what I've learned.

859
01:05:42,920 --> 01:05:48,600
And there's just so much help and value that can be added there because, you know, there's

860
01:05:48,600 --> 01:05:50,320
some people that are figuring it out.

861
01:05:50,320 --> 01:05:54,240
But, you know, when you start talking to people, there are a lot of people that are frustrated

862
01:05:54,240 --> 01:06:06,680
with it and they are doing a lot of unnecessary data entry and it can only help.

863
01:06:06,680 --> 01:06:07,680
Right.

864
01:06:07,680 --> 01:06:14,920
And then it has a ripple effect because, you know, if all of a sudden they don't have that

865
01:06:14,920 --> 01:06:20,560
burden of data entry, they can spend that much time to make their business better, have

866
01:06:20,560 --> 01:06:21,560
better operations.

867
01:06:21,560 --> 01:06:22,560
Yeah.

868
01:06:22,560 --> 01:06:32,240
Do you know if the state agencies, especially like Washington or Oregon, do they have like

869
01:06:32,240 --> 01:06:38,160
auditing where you have to go into different maybe labs or producers or whatever and they

870
01:06:38,160 --> 01:06:42,200
have to be audited for compliance?

871
01:06:42,200 --> 01:06:44,360
So that's probably a state by state basis.

872
01:06:44,360 --> 01:06:49,720
I think they essentially, I'm sure they do sort of spot checks.

873
01:06:49,720 --> 01:06:55,480
So I don't actually know too much about like the actual enforcement side.

874
01:06:55,480 --> 01:07:01,960
So like for example, pesticide testing in Washington state, it's one of those things

875
01:07:01,960 --> 01:07:08,560
where you have to comply with the regulations so you can't use unregulated pesticides.

876
01:07:08,560 --> 01:07:14,280
But they're currently just, it's one of those things where basically if they showed up at

877
01:07:14,280 --> 01:07:27,640
your facility and tested your product, you know, you could be in violation of the regulations.

878
01:07:27,640 --> 01:07:32,160
But I don't really know like if they're like, if they have a schedule that they're going

879
01:07:32,160 --> 01:07:38,800
about and spot checking or if it's random or if they just do it, you know, if they suspect

880
01:07:38,800 --> 01:07:41,900
there's a pesticide being used.

881
01:07:41,900 --> 01:07:43,400
And that's just Washington state.

882
01:07:43,400 --> 01:07:48,600
And so I really think it probably varies state by state.

883
01:07:48,600 --> 01:07:58,640
I just heard secondhand that like in Oklahoma, they basically just pick random samples off

884
01:07:58,640 --> 01:08:01,440
of the shelf and get them retested.

885
01:08:01,440 --> 01:08:04,000
So that's sort of a form of auditing.

886
01:08:04,000 --> 01:08:05,000
Yeah.

887
01:08:05,000 --> 01:08:06,000
Okay.

888
01:08:06,000 --> 01:08:07,000
I was just curious.

889
01:08:07,000 --> 01:08:10,840
It's interesting to see what kind of maturity level the industry is in.

890
01:08:10,840 --> 01:08:14,920
And it just seems like, well, it's all just getting started really.

891
01:08:14,920 --> 01:08:23,160
So I read in Oregon that if a producer goes to a lab and their sample fails, they'll take

892
01:08:23,160 --> 01:08:25,000
it to another lab.

893
01:08:25,000 --> 01:08:30,080
And basically labs that fail products just don't get any business and the ones that just

894
01:08:30,080 --> 01:08:33,080
pass products get all the business.

895
01:08:33,080 --> 01:08:36,200
So it's this pressure on labs to just pass things.

896
01:08:36,200 --> 01:08:37,440
Oh my word.

897
01:08:37,440 --> 01:08:38,440
Yeah.

898
01:08:38,440 --> 01:08:42,080
So who's testing the testers, right?

899
01:08:42,080 --> 01:08:43,080
Yeah.

900
01:08:43,080 --> 01:08:44,080
Interesting.

901
01:08:44,080 --> 01:08:45,080
Okay.

902
01:08:45,080 --> 01:08:48,360
Well, correct me if I'm wrong, too.

903
01:08:48,360 --> 01:08:52,280
In Oregon, you do have mandated pesticide testing.

904
01:08:52,280 --> 01:08:57,000
Do they also have mandated metals or?

905
01:08:57,000 --> 01:09:00,360
It's, you know, everything in Oregon, right?

906
01:09:00,360 --> 01:09:03,200
It's the strictest testing.

907
01:09:03,200 --> 01:09:12,520
It's everything from the seed to the sale is tracked and we have the most, the strictest

908
01:09:12,520 --> 01:09:14,640
testing standards and stuff.

909
01:09:14,640 --> 01:09:21,440
But I don't believe any of it because I've, especially after reading stuff like that,

910
01:09:21,440 --> 01:09:26,200
I mean, it seems like there's always a way around it.

911
01:09:26,200 --> 01:09:28,760
And that was one of the things I looked at in Washington.

912
01:09:28,760 --> 01:09:32,560
There's very little retest data.

913
01:09:32,560 --> 01:09:34,160
That table is almost non-existent.

914
01:09:34,160 --> 01:09:35,160
Interesting.

915
01:09:35,160 --> 01:09:40,680
Yes, I hadn't even looked at that data.

916
01:09:40,680 --> 01:09:46,200
So that's worth a look.

917
01:09:46,200 --> 01:09:47,200
So that's interesting.

918
01:09:47,200 --> 01:09:48,200
Yeah.

919
01:09:48,200 --> 01:09:55,280
I mean, could you imagine if Cannlytics had a white paper about retesting results,

920
01:09:55,280 --> 01:09:56,280
right?

921
01:09:56,280 --> 01:10:03,680
That would definitely get you some attention, I would think.

922
01:10:03,680 --> 01:10:04,680
Yeah.

923
01:10:04,680 --> 01:10:10,200
But yeah, I don't know.

924
01:10:10,200 --> 01:10:16,480
That's something I read in the newspaper, you know, the local newspaper.

925
01:10:16,480 --> 01:10:19,920
So it would be interesting to just go out.

926
01:10:19,920 --> 01:10:23,600
Actually there's a lot of, you can actually come up with a lot of good project ideas from

927
01:10:23,600 --> 01:10:24,800
reading the newspaper.

928
01:10:24,800 --> 01:10:25,800
Yeah.

929
01:10:25,800 --> 01:10:32,000
There's one about like sales along the Oregon-Idaho border because it's illegal in Idaho, but

930
01:10:32,000 --> 01:10:33,640
it's legal in Oregon.

931
01:10:33,640 --> 01:10:40,200
And like, you know, these little towns have this huge per capita spending on cannabis.

932
01:10:40,200 --> 01:10:46,400
Than the larger cities, because people are coming across the border.

933
01:10:46,400 --> 01:10:48,400
Yeah, sure.

934
01:10:48,400 --> 01:10:49,440
Yeah.

935
01:10:49,440 --> 01:10:51,680
This is all part of the story that could be told, right?

936
01:10:51,680 --> 01:10:54,720
I mean, yeah, it's interesting stuff.

937
01:10:54,720 --> 01:11:02,680
But first we just have to get a handle on the data.

938
01:11:02,680 --> 01:11:06,240
Yeah, the data always comes first.

939
01:11:06,240 --> 01:11:08,240
So I do have to drop off here.

940
01:11:08,240 --> 01:11:09,240
Oh yes, yes.

941
01:11:09,240 --> 01:11:10,240
We've overstayed.

942
01:11:10,240 --> 01:11:11,240
No, no, no.

943
01:11:11,240 --> 01:11:12,240
It's my lunch break.

944
01:11:12,240 --> 01:11:17,640
I'm supposed to be working, but this is more fun.

945
01:11:17,640 --> 01:11:23,680
So I'll start trying to do the best I can on the mapping.

946
01:11:23,680 --> 01:11:28,240
I'll touch base with you guys before we meet again, just to let you know where I am.

947
01:11:28,240 --> 01:11:33,640
Right now, what kind of project I'm going to do for my schoolwork, I'm not exactly 100%

948
01:11:33,640 --> 01:11:34,640
sure.

949
01:11:34,640 --> 01:11:39,680
I think it's going to be really low hanging fruit on what data seems to present itself

950
01:11:39,680 --> 01:11:41,520
to pull together most easily.

951
01:11:41,520 --> 01:11:43,440
I think that's probably going to be the driving factor.

952
01:11:43,440 --> 01:11:47,840
But yeah, I'll take a shot at doing what I can for the mapping and reach out to you guys

953
01:11:47,840 --> 01:11:50,200
if I have any questions and I'll give you a little update.

954
01:11:50,200 --> 01:11:53,840
Probably beginning of next week or something like that.

955
01:11:53,840 --> 01:11:55,360
Awesome, Paul.

956
01:11:55,360 --> 01:11:57,320
It's been fun and I like your ideas.

957
01:11:57,320 --> 01:11:58,840
I think it's going to be incredibly helpful.

958
01:11:58,840 --> 01:12:00,440
Yeah, I appreciate it.

959
01:12:00,440 --> 01:12:01,800
And it's great meeting you, Charles.

960
01:12:01,800 --> 01:12:07,040
And I'm looking forward to telling my wife that I met a guy in Oregon who's from Roseville.

961
01:12:07,040 --> 01:12:10,840
It's great meeting you.

962
01:12:10,840 --> 01:12:11,840
Okay.

963
01:12:11,840 --> 01:12:12,840
All right.

964
01:12:12,840 --> 01:12:13,840
All right, everyone.

965
01:12:13,840 --> 01:12:14,840
It's been awesome.

966
01:12:14,840 --> 01:12:15,840
Have a productive week.

967
01:12:15,840 --> 01:12:16,840
All right.

968
01:12:16,840 --> 01:12:17,840
Take care, guys.

969
01:12:17,840 --> 01:12:18,840
Talk to you later.

970
01:12:18,840 --> 01:12:19,840
Bye.

971
01:12:19,840 --> 01:12:20,840
See you soon.

972
01:12:20,840 --> 01:12:49,840
Bye.

