1
00:00:00,000 --> 00:00:10,840
The strategy that I would love to do is you would basically mix some of the tried and

2
00:00:10,840 --> 00:00:17,520
true genetics like the MAC with some of these crazy landraces that people are discovering

3
00:00:17,520 --> 00:00:24,280
in Southeast Asia, or wherever people are growing cannabis.

4
00:00:24,280 --> 00:00:29,480
One, you want to make sure you get your seeds from a legitimate vendor.

5
00:00:29,480 --> 00:00:36,720
People are growing cannabis all over the world from Afghanistan, Pakistan to Southeast Asia

6
00:00:36,720 --> 00:00:41,920
down to South America, plus all over the United States.

7
00:00:41,920 --> 00:00:51,360
Some of the best seed vendors are right in California, right in Humboldt County.

8
00:00:51,360 --> 00:00:58,640
And then there's the whole thing where you could do feminized or unfeminized seeds.

9
00:00:58,640 --> 00:01:08,880
And the more I looked into that, I don't want to harsh on anyone's seed business, but basically

10
00:01:08,880 --> 00:01:14,040
definitely the big-time producers, they just go straight to the feminized seeds.

11
00:01:14,040 --> 00:01:15,280
They just don't want to mess around.

12
00:01:15,280 --> 00:01:21,920
They just want to grow a bunch of flower and just female flower and be happy with that.

13
00:01:21,920 --> 00:01:25,280
I'm more interested in genetics and whatnot.

14
00:01:25,280 --> 00:01:30,320
And so if you're breeding, you may not want to get feminized seeds, right?

15
00:01:30,320 --> 00:01:35,640
Because you need some males in there and all that jazz, but we're going down a whole can

16
00:01:35,640 --> 00:01:36,640
of worms.

17
00:01:36,640 --> 00:01:41,340
But anywho, enough seed talk.

18
00:01:41,340 --> 00:01:46,240
So for those of you who are new, Grant, my name is Keegan.

19
00:01:46,240 --> 00:01:47,760
Started a company, Cannlytics.

20
00:01:47,760 --> 00:01:52,760
So got into the space first by helping out labs that test cannabis.

21
00:01:52,760 --> 00:01:58,960
And now we're just trying to organize and make cannabis data accessible.

22
00:01:58,960 --> 00:02:01,340
So it's sort of an all-hands-on-deck moment.

23
00:02:01,340 --> 00:02:04,120
So happy to have you here.

24
00:02:04,120 --> 00:02:06,160
Thanks.

25
00:02:06,160 --> 00:02:10,200
I'm happy to be here.

26
00:02:10,200 --> 00:02:12,520
I'm just kind of curious about the space.

27
00:02:12,520 --> 00:02:17,840
I'm a pretty young data scientist and was honestly just wanting to learn more.

28
00:02:17,840 --> 00:02:19,200
Too cool.

29
00:02:19,200 --> 00:02:23,200
Well, we've got a real interesting project at hand.

30
00:02:23,200 --> 00:02:29,000
At the moment, we're basically wrangling data from certificates of analysis.

31
00:02:29,000 --> 00:02:34,520
So basically, cannabis has to get mandated quality assurance testing.

32
00:02:34,520 --> 00:02:41,720
And while that can be a sore point for a lot of producers, what I try to point out is this

33
00:02:41,720 --> 00:02:45,080
is also putting a lot of data in your hands, right?

34
00:02:45,080 --> 00:02:50,360
These are rigorous tests, right, and you're getting all the compounds in your products

35
00:02:50,360 --> 00:02:51,360
measured.

36
00:02:51,360 --> 00:02:58,040
So instead of just leaving that data on the table, let's take full advantage of it.

37
00:02:58,040 --> 00:03:03,320
And so basically, people have all this data in PDFs.

38
00:03:03,320 --> 00:03:05,440
Or sometimes it's on the web.

39
00:03:05,440 --> 00:03:15,280
But it's amazing how much data can get locked up in a PDF and get rendered useless.

40
00:03:15,280 --> 00:03:21,840
And so I've seen it where people will have hundreds, if not thousands of these COAs,

41
00:03:21,840 --> 00:03:27,960
and they're just saved on their Google Drive somewhere and they're not doing anything with

42
00:03:27,960 --> 00:03:28,960
them.

43
00:03:28,960 --> 00:03:34,040
So basically, we're building tools here to unlock that data.

44
00:03:34,040 --> 00:03:35,100
And it's open source.

45
00:03:35,100 --> 00:03:42,160
So I'll show you today how, one, you can go pick up these tools and make money with them,

46
00:03:42,160 --> 00:03:43,160
right?

47
00:03:43,160 --> 00:03:55,440
You can go talk with labs and laboratory software providers and help them out and help out retailers,

48
00:03:55,440 --> 00:03:56,440
help out producers.

49
00:03:56,440 --> 00:03:59,880
So the sky's the limit here.

50
00:03:59,880 --> 00:04:10,600
And anyone have any thoughts, comments, questions before we just dive straight into the fun?

51
00:04:10,600 --> 00:04:15,280
One curiosity that I have is just generally how much analytics is actually happening right

52
00:04:15,280 --> 00:04:18,040
now in the cannabis space.

53
00:04:18,040 --> 00:04:22,720
Is data analysis or data science a pretty hot topic right now, or is it still kind of

54
00:04:22,720 --> 00:04:23,720
budding?

55
00:04:23,720 --> 00:04:29,080
John may have a thought or two on this.

56
00:04:29,080 --> 00:04:37,280
My thought is there's a supply, but I think the demand far outstrips the supply.

57
00:04:37,280 --> 00:04:41,520
So it also can be a tough sell, right?

58
00:04:41,520 --> 00:04:48,040
Going to these producer processors or retailers, I think a lot would benefit from analytics.

59
00:04:48,040 --> 00:04:52,360
But their first question is going to be, what's my return on investment?

60
00:04:52,360 --> 00:04:58,460
So if you're not able to clearly articulate what their return is going to be, they probably

61
00:04:58,460 --> 00:05:04,760
won't even entertain a conversation with you.

62
00:05:04,760 --> 00:05:10,400
So like I said, some of the more savvy players are realizing this is something that they

63
00:05:10,400 --> 00:05:14,680
need, and so they're trying to find analytics providers.

64
00:05:14,680 --> 00:05:19,440
And now, as I said, I think there's just a very, very small supply.

65
00:05:19,440 --> 00:05:25,520
So if you're interested in bringing your skills into the space, I think there's a demand for

66
00:05:25,520 --> 00:05:26,520
it.

67
00:05:26,520 --> 00:05:31,800
John or anyone else, do you have any thoughts about how much analytics is needed here in

68
00:05:31,800 --> 00:05:32,800
the cannabis space?

69
00:05:32,800 --> 00:05:33,800
I may be biased.

70
00:05:33,800 --> 00:05:36,800
Well, I could go on forever.

71
00:05:36,800 --> 00:05:43,120
But as I said, people are getting more savvy to it.

72
00:05:43,120 --> 00:05:45,120
It's not even just the cannabis space.

73
00:05:45,120 --> 00:05:47,920
It's more just industry in general.

74
00:05:47,920 --> 00:05:59,160
The way I describe it, maybe 2015 to 2018 or so were the database years.

75
00:05:59,160 --> 00:06:02,840
Those were basically where people were just trying to figure out how can we just put a

76
00:06:02,840 --> 00:06:06,800
cap on this fire hose of data coming out.

77
00:06:06,800 --> 00:06:13,560
People were trying to figure out how to collect it, what could be collected, where to store

78
00:06:13,560 --> 00:06:19,800
it, how to store it, what the data should look like, what should standards be, all that

79
00:06:19,800 --> 00:06:20,800
jazz.

80
00:06:20,800 --> 00:06:23,200
And that's largely been figured out.

81
00:06:23,200 --> 00:06:25,920
And so now people have these giant databases.

82
00:06:25,920 --> 00:06:29,080
So now people have all their data.

83
00:06:29,080 --> 00:06:34,600
It may have been cleaned by 2015 standards, but so be it.

84
00:06:34,600 --> 00:06:35,600
We'll take it.

85
00:06:35,600 --> 00:06:41,840
So now they've got their data, and now it's time to do something with it.

86
00:06:41,840 --> 00:06:49,840
And the issue is severalfold.

87
00:06:49,840 --> 00:06:57,280
The data sets in the cannabis space are by and large pretty narrow in focus.

88
00:06:57,280 --> 00:07:03,360
Part of that is historical, and as regulation has come on, they've principally focused on

89
00:07:03,360 --> 00:07:09,320
product safety and product safety attributes that can be measured.

90
00:07:09,320 --> 00:07:14,920
Again, California took the lead because of their pesticide focus.

91
00:07:14,920 --> 00:07:23,440
And it really drove pesticide analysis big time in the analytics space, microbial testing,

92
00:07:23,440 --> 00:07:26,000
and the cannabinoid levels as they come online.

93
00:07:26,000 --> 00:07:27,500
But that's about it.

94
00:07:27,500 --> 00:07:35,160
We tried very hard during the public comment period in California, during the run

95
00:07:35,160 --> 00:07:38,740
up to Prop 64.

96
00:07:38,740 --> 00:07:43,320
And we tried to push pretty hard that they should be expanding it beyond just cannabinoids.

97
00:07:43,320 --> 00:07:48,680
And we pushed hard for terpene analysis to be required by regulatory.

98
00:07:48,680 --> 00:07:49,800
And that never happened.

99
00:07:49,800 --> 00:07:53,220
And so that's hit or miss.

100
00:07:53,220 --> 00:08:02,320
The field is moving towards having terpene analysis more prevalent in the commodity.

101
00:08:02,320 --> 00:08:04,780
Now consumers are starting to demand it.

102
00:08:04,780 --> 00:08:07,000
So you're starting to see it.

103
00:08:07,000 --> 00:08:14,080
But the other key compounds that would be affecting the outcome, namely esters and thiols,

104
00:08:14,080 --> 00:08:18,600
sulfur containing compounds, are not really part of the analysis yet.

105
00:08:18,600 --> 00:08:26,400
So we get a very narrow perspective on what you can tease out and interpret.

106
00:08:26,400 --> 00:08:29,120
Anyway, I'll stop there.

107
00:08:29,120 --> 00:08:35,480
But I would like to see broader types of analyses.

108
00:08:35,480 --> 00:08:40,320
Again, it's a marketing question because what's the ROI?

109
00:08:40,320 --> 00:08:46,760
But at the end of the day, I think cannabis, it's pretty narrow in its focus.

110
00:08:46,760 --> 00:08:54,560
And so there's only so much you can get from narrow data sets.

111
00:08:54,560 --> 00:08:57,720
And that's where the cannabis data science team comes in.

112
00:08:57,720 --> 00:09:02,720
And I think that's where sort of the lesson of the day will fit in so nicely because I'll

113
00:09:02,720 --> 00:09:09,440
show you how we've been stuck on these real easy low-hanging fruit.

114
00:09:09,440 --> 00:09:14,280
And I'll show you how we can use a cherry picker to go and get the hard to reach fruit

115
00:09:14,280 --> 00:09:20,760
today because that's what we're all about, using clever tools to get the juicy fruit

116
00:09:20,760 --> 00:09:24,420
that's just out of reach.

117
00:09:24,420 --> 00:09:31,240
So here is one of these certificates of analysis that John is talking about.

118
00:09:31,240 --> 00:09:41,880
So this is, I think, just a mock product or maybe not, but just some flower that was sampled

119
00:09:41,880 --> 00:09:47,080
here in January.

120
00:09:47,080 --> 00:09:52,000
And you can see, okay, it passed the overall tests.

121
00:09:52,000 --> 00:10:00,840
And typically, what people will put on their label is they'll just put maybe the total

122
00:10:00,840 --> 00:10:13,480
THC, the total CBD, and the total cannabinoids because chances are the vendor who purchased

123
00:10:13,480 --> 00:10:20,640
– so at the retailer, right, some vendor purchased this flower from a wholesaler, like

124
00:10:20,640 --> 00:10:24,440
a producer in California, it would be a distributor.

125
00:10:24,440 --> 00:10:27,920
So this distributor sold this product to a retailer.

126
00:10:27,920 --> 00:10:33,440
They're putting it on the shelf, and all they have is this PDF.

127
00:10:33,440 --> 00:10:36,600
So this is cool, right?

128
00:10:36,600 --> 00:10:41,440
So I always say the labs, their product is twofold.

129
00:10:41,440 --> 00:10:49,760
One, it's the certificate saying, yes, this got signed off on by the laboratory director

130
00:10:49,760 --> 00:10:52,320
at Greenleaf Labs.

131
00:10:52,320 --> 00:10:55,960
And then the second part are the data points.

132
00:10:55,960 --> 00:11:03,280
So the retailer could try to copy and paste this and put this into Excel.

133
00:11:03,280 --> 00:11:06,200
And so they probably do that to a certain extent, right?

134
00:11:06,200 --> 00:11:11,960
They probably grab the cannabinoids and they make their labels.

135
00:11:11,960 --> 00:11:16,520
However, I mean, look at all these data points here.

136
00:11:16,520 --> 00:11:18,960
And so this is what John was talking about.

137
00:11:18,960 --> 00:11:24,560
So now you come down and, okay, this sample was tested for terpenes.

138
00:11:24,560 --> 00:11:33,760
So now you have all the monoterpenes, sesquiterpenes, but that's going to, you know, you're

139
00:11:33,760 --> 00:11:37,800
going to have to start paying someone by the hour, right?

140
00:11:37,800 --> 00:11:42,640
If you're just going to have somebody copy and paste this into Excel, you're getting

141
00:11:42,640 --> 00:11:45,600
dozens of these a day, maybe.

142
00:11:45,600 --> 00:11:47,960
Maybe you've got hundreds of these, right?

143
00:11:47,960 --> 00:11:56,360
You're looking at quite a cost if you just want to pay someone to copy and paste this

144
00:11:56,360 --> 00:12:00,760
into Excel.

145
00:12:00,760 --> 00:12:09,480
And then, oh, man, like, you know, now you've got all the pesticides down here.

146
00:12:09,480 --> 00:12:22,640
So all the pesticides, heavy metals, mycotoxins, which are produced by microbes.

147
00:12:22,640 --> 00:12:26,480
And so then you also have your product screen.

148
00:12:26,480 --> 00:12:27,480
And double check beyond that.

149
00:12:27,480 --> 00:12:30,080
I'm not a microbiologist.

150
00:12:30,080 --> 00:12:31,400
Definitely double check.

151
00:12:31,400 --> 00:12:33,040
That's a whole field of its own.

152
00:12:33,040 --> 00:12:35,480
That's what's cool about the laboratory, right?

153
00:12:35,480 --> 00:12:44,200
You may have microbiologists working who know these analytes really well.

154
00:12:44,200 --> 00:12:48,520
You may have a chemist who's performing your heavy metal test.

155
00:12:48,520 --> 00:12:53,080
You may have another chemist performing your pesticide test.

156
00:12:53,080 --> 00:12:57,760
So this is awesome data, right?

157
00:12:57,760 --> 00:13:02,720
These are top-notch scientists producing this data.

158
00:13:02,720 --> 00:13:12,720
So why just let this go to the garbage bin, or why just let this collect dust in your

159
00:13:12,720 --> 00:13:13,720
drive?

160
00:13:13,720 --> 00:13:20,280
Was there a question or comment?

161
00:13:20,280 --> 00:13:22,800
Yeah.

162
00:13:22,800 --> 00:13:27,760
So you mentioned that this is to some degree open source.

163
00:13:27,760 --> 00:13:33,800
I guess I'm just curious now, like, how you get access to these types of compliance testing

164
00:13:33,800 --> 00:13:35,080
reports and whatnot.

165
00:13:35,080 --> 00:13:36,520
And I don't know.

166
00:13:36,520 --> 00:13:38,160
I'll probably learn more.

167
00:13:38,160 --> 00:13:43,160
I don't have too much time left until my next meeting, but I'll be coming next week to kind

168
00:13:43,160 --> 00:13:45,480
of a longer meeting to talk more.

169
00:13:45,480 --> 00:13:47,600
Grant, where are you?

170
00:13:47,600 --> 00:13:49,600
I'm in Salt Lake City right now.

171
00:13:49,600 --> 00:13:50,600
Yeah.

172
00:13:50,600 --> 00:13:55,720
So you're going to be more limited in Utah than you are in jurisdictions like Washington,

173
00:13:55,720 --> 00:14:00,320
California, Oregon, in terms of accessing this.

174
00:14:00,320 --> 00:14:02,400
It's more or less open source.

175
00:14:02,400 --> 00:14:08,600
I believe Washington state is doing a much better job at getting this out in open source

176
00:14:08,600 --> 00:14:09,600
mode.

177
00:14:09,600 --> 00:14:14,000
California, it's all required, but it doesn't circulate.

178
00:14:14,000 --> 00:14:20,680
It circulates at the level of the distributor and the retailer, and the consumer can ask

179
00:14:20,680 --> 00:14:23,360
for it and hopefully get it.

180
00:14:23,360 --> 00:14:28,880
But it's not like it's all online and you can just go download it.

181
00:14:28,880 --> 00:14:34,800
And Grant, I'll have to point you to our episode from last week.

182
00:14:34,800 --> 00:14:40,880
I'll try to get that uploaded for you because basically the idea is these tests are being

183
00:14:40,880 --> 00:14:48,360
performed for consumer safety.

184
00:14:48,360 --> 00:14:55,840
So if you're a consumer, you should be able to access your data here.

185
00:14:55,840 --> 00:15:13,920
So let me – for example, let's say one of your products – so here is a product that

186
00:15:13,920 --> 00:15:19,120
I recently purchased here in California.

187
00:15:19,120 --> 00:15:28,240
So you can get this product and then on the label it'll have a – it may have a QR code.

188
00:15:28,240 --> 00:15:31,760
And so here you could find these results online.

189
00:15:31,760 --> 00:15:38,600
And so the idea is at the end of the day the reason the quality assurance is mandated is

190
00:15:38,600 --> 00:15:44,000
to get the consumer educated about their product.

191
00:15:44,000 --> 00:15:46,720
And so that's sort of what Cannlytics is here to solve.

192
00:15:46,720 --> 00:15:52,160
We're just saying, okay, they've done all this fantastic quality assurance testing,

193
00:15:52,160 --> 00:16:01,520
but now we need to do the final step and get this data into the hands of the consumer.

194
00:16:01,520 --> 00:16:06,120
So this is a publicly available certificate.

195
00:16:06,120 --> 00:16:11,840
So this is a public URL.

196
00:16:11,840 --> 00:16:18,440
You can browse around this and go download this certificate.

197
00:16:18,440 --> 00:16:23,160
And as a consumer you want to know what's in your product.

198
00:16:23,160 --> 00:16:28,520
You want to know your cannabinoids, your terpenes.

199
00:16:28,520 --> 00:16:38,000
And basically having the PDF is cool, but what we're going to do – and we've already

200
00:16:38,000 --> 00:16:44,840
begun with – we've already checked off Confident Cannabis, TagLeaf LIMS, and then

201
00:16:44,840 --> 00:16:47,760
I'll show you today Greenleaf Labs.

202
00:16:47,760 --> 00:16:52,480
But basically we can just start parsing these COAs.

203
00:16:52,480 --> 00:16:59,040
That way you can actually walk away at the end of the day with – whether it's a JSON

204
00:16:59,040 --> 00:17:04,160
file or just a simple Excel file, you can just walk away at the end of the day with

205
00:17:04,160 --> 00:17:14,000
all of this data in an Excel file that you can give to your favorite data scientist.
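A minimal sketch of that hand-off, assuming a COA has already been parsed into a dictionary. The field names and values below are hypothetical placeholders, not the project's actual schema. It flattens one COA into one row per analyte, then writes JSON text and CSV text (CSV opens directly in Excel).

```python
import csv
import io
import json

# Hypothetical parsed COA: field names and values are illustrative only.
coa = {
    "lab": "Example Lab",
    "product_name": "Example Flower",
    "status": "pass",
    "results": [
        {"analysis": "cannabinoids", "name": "THCA", "value": 20.0, "units": "percent"},
        {"analysis": "terpenes", "name": "terpinolene", "value": 0.3, "units": "percent"},
    ],
}


def to_rows(coa):
    """Flatten one COA into one row per analyte, repeating the sample details."""
    sample = {key: value for key, value in coa.items() if key != "results"}
    return [{**sample, **result} for result in coa["results"]]


def to_csv(rows):
    """Write the rows to CSV text that a spreadsheet program can open."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


rows = to_rows(coa)
json_text = json.dumps(coa, indent=2)  # Nested form, good for a database or API.
csv_text = to_csv(rows)  # Flat form, good for a spreadsheet.
```

The nested JSON keeps each sample together, while the flat rows are usually easier for a data scientist to load and group.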

206
00:17:14,000 --> 00:17:20,200
You can call Grant up on the phone and say, hey, I've got all of my data now.

207
00:17:20,200 --> 00:17:24,200
Can you crank some numbers on this?

208
00:17:24,200 --> 00:17:29,600
Does that help answer your question?

209
00:17:29,600 --> 00:17:31,600
Yeah, it does.

210
00:17:31,600 --> 00:17:34,320
I really appreciate that.

211
00:17:34,320 --> 00:17:36,320
This is really neat.

212
00:17:36,320 --> 00:17:44,280
So, Gary, I'm not as knowledgeable on terpenes or these other compounds that are in the products,

213
00:17:44,280 --> 00:17:52,640
but I guess I'm just interested now more in what consumers care about and why they

214
00:17:52,640 --> 00:17:53,640
care about it.

215
00:17:53,640 --> 00:17:54,640
Consumers care about getting high.

216
00:17:54,640 --> 00:18:02,620
If you want to know that, go be a budtender.

217
00:18:02,620 --> 00:18:07,800
Do it for a week and you will learn so much.

218
00:18:07,800 --> 00:18:09,680
That's actually a good recommendation.

219
00:18:09,680 --> 00:18:23,080
In fact, I have heard somebody make the same recommendation that if you truly want to learn

220
00:18:23,080 --> 00:18:29,280
a lot, you spend a week as a budtender.

221
00:18:29,280 --> 00:18:31,320
Different consumers will want different things.

222
00:18:31,320 --> 00:18:35,600
Some just want the highest cannabinoid concentration.

223
00:18:35,600 --> 00:18:38,760
A lot – I mean, think about it.

224
00:18:38,760 --> 00:18:44,440
Here we're having the conversation about these analytes, but we're probably on the far end

225
00:18:44,440 --> 00:18:50,640
of the distribution of people who are educated about this.

226
00:18:50,640 --> 00:18:56,480
What I have found is the more that people find out about, say, terpenes or, in fact,

227
00:18:56,480 --> 00:18:59,480
pesticides, the more they care about them.

228
00:18:59,480 --> 00:19:06,380
For example, I, in particular, after spending time at the laboratory, I'm particularly

229
00:19:06,380 --> 00:19:13,160
interested in seeing pesticide results because what people don't realize is, yes, you may

230
00:19:13,160 --> 00:19:23,000
have a pesticide pass, but I want to then go and make sure that it's non-detect across

231
00:19:23,000 --> 00:19:25,440
the board.
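As a rough sketch of that check, assuming pesticide results have already been parsed into a list of dictionaries: the analyte values, limits, and field names here are hypothetical, and real action limits vary by jurisdiction. The point is that a sample can pass and still have detects worth reading about.

```python
# Hypothetical pesticide results: values, limits, and field names are illustrative.
results = [
    {"name": "pyrethrins", "value": 0.1, "units": "ug/g", "limit": 1.0},
    {"name": "myclobutanil", "value": "ND", "units": "ug/g", "limit": 0.2},
    {"name": "bifenazate", "value": None, "units": "ug/g", "limit": 0.2},
]


def is_detect(value):
    """Return True when a result is a number above zero (a quantified detect)."""
    try:
        return float(value) > 0
    except (TypeError, ValueError):
        # Codes such as "ND" and missing values are treated as non-detects here.
        return False


def detected(results):
    """List the analytes that were detected at any level."""
    return [result["name"] for result in results if is_detect(result["value"])]


def passes(results):
    """A sample passes when every detected analyte is at or below its limit."""
    return all(
        float(result["value"]) <= result["limit"]
        for result in results
        if is_detect(result["value"])
    )


overall_pass = passes(results)
detects = detected(results)
```

With the sample data above, the overall status is a pass, yet pyrethrins still shows up in the list of detects.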

232
00:19:25,440 --> 00:19:31,160
In this one, it did detect for pyrethrins, but you may have to double-check on that because

233
00:19:31,160 --> 00:19:39,440
it's possible that there may be some naturally occurring pyrethrins.

234
00:19:39,440 --> 00:19:46,600
But now as an educated consumer, you can now go and do your homework.

235
00:19:46,600 --> 00:19:52,480
Now you don't have to research 70-some pesticides.

236
00:19:52,480 --> 00:20:04,120
You can now just go and say, oh, should I be concerned about 0.1 nanogram per gram or

237
00:20:04,120 --> 00:20:06,880
we'll have to find the units here.

238
00:20:06,880 --> 00:20:12,600
Yeah, ug per g. That's microgram per gram.
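For reference on the units: "ug/g" on a COA denotes micrograms per gram, which is the same as parts per million by mass and is 1,000 times larger than nanograms per gram. A small sketch of the conversions, with hypothetical helper names:

```python
NG_PER_UG = 1000  # 1 microgram = 1,000 nanograms.
UG_PER_G = 1_000_000  # 1 gram = 1,000,000 micrograms.


def ug_per_g_to_ng_per_g(value):
    """Convert micrograms per gram to nanograms per gram."""
    return value * NG_PER_UG


def ug_per_g_to_ppm(value):
    """Micrograms per gram and parts per million (by mass) are the same number."""
    return value


def ug_per_g_to_percent(value):
    """Convert micrograms per gram to percent by mass."""
    return value / UG_PER_G * 100


reading = 0.1  # A reading of 0.1 ug/g, as in the example being discussed.
in_ng_per_g = ug_per_g_to_ng_per_g(reading)
in_percent = ug_per_g_to_percent(reading)
```

So a reading of 0.1 ug/g works out to about 100 ng/g, or about 0.00001 percent by mass.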

239
00:20:12,600 --> 00:20:18,760
Long story short, from my background, I believe there's some naturally occurring pyrethrins,

240
00:20:18,760 --> 00:20:27,120
so I think this one you may not – once again, do your own homework.

241
00:20:27,120 --> 00:20:32,840
I would say do a quick search on pyrethrins and you'll probably learn more than I can

242
00:20:32,840 --> 00:20:34,280
tell you here.

243
00:20:34,280 --> 00:20:42,800
But this is what I would look for as a consumer, is I would look for any detects.

244
00:20:42,800 --> 00:20:47,120
I don't think producers are going to love to hear that news, but like I said, I think

245
00:20:47,120 --> 00:20:56,360
it's just – it'll just kind of make everybody try a little bit harder because – and not

246
00:20:56,360 --> 00:21:03,880
every consumer would care about this, but me in particular, this is what I look for.

247
00:21:03,880 --> 00:21:08,320
And then the second thing I look for are just terpenes, right?

248
00:21:08,320 --> 00:21:16,880
So just again, speaking from personal preference, I've recently kind of learned that limonene

249
00:21:16,880 --> 00:21:23,560
may be what puts me on the couch.

250
00:21:23,560 --> 00:21:34,960
So I maybe would want to try to avoid limonene if I'm wanting to be productive, so to speak.

251
00:21:34,960 --> 00:21:40,080
But once again, this is getting into a whole can of worms and this is where you may want

252
00:21:40,080 --> 00:21:45,000
to speak with John because this is sort of right up John's alley and the work he's

253
00:21:45,000 --> 00:21:56,000
doing is you're consuming many, many milligrams of many different compounds here and what

254
00:21:56,000 --> 00:21:58,440
effect is this having on your body?

255
00:21:58,440 --> 00:22:04,400
Are you kind of moving all these chemical dials in your brain?

256
00:22:04,400 --> 00:22:11,200
But a lot of it's just trial and error and people are just kind of just figuring it out

257
00:22:11,200 --> 00:22:12,200
haphazardly.

258
00:22:12,200 --> 00:22:19,640
I've heard a saying once that if you're not measuring it, you're not managing it.

259
00:22:19,640 --> 00:22:27,040
And once again, I've basically kind of found that I sort of gravitate towards cannabis

260
00:22:27,040 --> 00:22:34,040
that has terpinolene in it and it took me forever to figure this out.

261
00:22:34,040 --> 00:22:37,680
And then it's like, what is a lot of terpinolene?

262
00:22:37,680 --> 00:22:48,880
And we've done the statistics here and I think maybe 0.2 or 0.25 is maybe the average, around

263
00:22:48,880 --> 00:22:51,120
the mean or median.

264
00:22:51,120 --> 00:22:55,320
So this would be above-average terpinolene.

265
00:22:55,320 --> 00:23:00,720
Once again, I'm just pulling that number out of my memory, so go double check us on this.

266
00:23:00,720 --> 00:23:07,080
But that's the thing is we're sort of these real educated consumers and so the idea is

267
00:23:07,080 --> 00:23:13,160
can we get this data to more and more people so that way hopefully other people can have

268
00:23:13,160 --> 00:23:21,800
these revelations about what works well for them and what doesn't because we're also

269
00:23:21,800 --> 00:23:24,840
kind of finding out that everybody's a little different.

270
00:23:24,840 --> 00:23:27,400
We all have our different biochemistries.

271
00:23:27,400 --> 00:23:35,740
So we all have individual biochemistry and then you're mixing in this complex chemistry

272
00:23:35,740 --> 00:23:41,160
and now we kind of need to connect the dots.

273
00:23:41,160 --> 00:23:46,360
So that's one of the reasons Cannlytics is helping with this is we're trying to help

274
00:23:46,360 --> 00:23:54,480
people like John who essentially are doing clinical studies.

275
00:23:54,480 --> 00:24:00,520
John's doing the dosing project where people are consuming cannabis and reporting effects

276
00:24:00,520 --> 00:24:11,240
and now if we can get these data connected to all of that then people can start to figure

277
00:24:11,240 --> 00:24:18,840
out what a reasonable dose may look like and how they could track their consumption over

278
00:24:18,840 --> 00:24:19,840
time.

279
00:24:19,840 --> 00:24:21,840
That's really neat.

280
00:24:21,840 --> 00:24:22,840
I apologize.

281
00:24:22,840 --> 00:24:27,360
I have to step out for a work meeting now but it's been really good to meet everyone

282
00:24:27,360 --> 00:24:30,020
here and you'll see me again.

283
00:24:30,020 --> 00:24:33,680
And Grant, thanks for tuning in, and tune in anytime.

284
00:24:33,680 --> 00:24:34,680
Yeah appreciate it.

285
00:24:34,680 --> 00:24:36,680
Have a good day everyone.

286
00:24:36,680 --> 00:24:38,840
Thank you.

287
00:24:38,840 --> 00:24:47,720
Well since we can kind of change gears now and since I believe everyone here for the

288
00:24:47,720 --> 00:24:55,320
most part is a you know COA aficionado do you want to see how to start parsing these?

289
00:24:55,320 --> 00:24:59,120
Yeah I'm curious to see how this is working and then I have a question for you based on

290
00:24:59,120 --> 00:25:00,120
that.

291
00:25:00,120 --> 00:25:01,120
Yes please.

292
00:25:01,120 --> 00:25:04,360
So I'll show you how it works.

293
00:25:04,360 --> 00:25:12,520
So essentially the way I've figured the way I like to do a lot of programming is all the

294
00:25:12,520 --> 00:25:20,600
constants that you can sort of... was there a question or comment?

295
00:25:20,600 --> 00:25:28,960
So all the constants that you sort of can pull out into a JSON object the better.

296
00:25:28,960 --> 00:25:35,160
So here's sort of just a Greenleaf lab observation.

297
00:25:35,160 --> 00:25:42,080
Give me one second.

298
00:25:42,080 --> 00:25:45,160
I may have to restart this.

299
00:25:45,160 --> 00:25:58,080
Give me one second.

300
00:25:58,080 --> 00:26:05,160
Okay so basically we'll try to pull out all of the constants so that way you know you

301
00:26:05,160 --> 00:26:07,680
could store this in your database.

302
00:26:07,680 --> 00:26:14,720
And you know this would be a typical database entry you know for your given lab right.

303
00:26:14,720 --> 00:26:19,560
You've got Greenleaf lab and you've got all their details about them.
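A minimal sketch of what such a lab entry could look like, pulled out as a constant that serializes straight to JSON. Every key and value below is a hypothetical placeholder, not the lab's real details or the project's actual schema.

```python
import json

# Hypothetical lab record: all values are placeholders for illustration.
GREEN_LEAF_LAB = {
    "lab": "Greenleaf Lab",
    "lab_website": "https://example.com",
    "lab_license_number": "EXAMPLE-000",
    "lab_street": "123 Example Street",
    "lab_city": "Example City",
    "lab_state": "CA",
}


def to_database_entry(lab):
    """Serialize the lab constants to JSON text for storage."""
    return json.dumps(lab, sort_keys=True)


def from_database_entry(text):
    """Load the lab constants back from stored JSON text."""
    return json.loads(text)


entry = to_database_entry(GREEN_LEAF_LAB)
restored = from_database_entry(entry)
```

Keeping the lab details in one object like this means the parsing code only handles what changes from COA to COA.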

304
00:26:19,560 --> 00:26:27,600
And so then the idea is we'd like to make the analyses more variable than this but in

305
00:26:27,600 --> 00:26:37,960
this case I just went ahead and looked at Greenleaf lab COA and I'm like okay you know

306
00:26:37,960 --> 00:26:42,040
here are their analyses.

307
00:26:42,040 --> 00:26:47,840
And it may seem like a little bit of work but hopefully you should only really have

308
00:26:47,840 --> 00:26:53,520
to do this you know once or twice because you know I visited labs all across the country

309
00:26:53,520 --> 00:27:00,360
and these are basically the standard analyses that you'll have at laboratories right.

310
00:27:00,360 --> 00:27:06,440
Cannabinoids, pesticides, water activity and moisture content.

311
00:27:06,440 --> 00:27:10,720
You'll have terpenes right that may be optional so you should basically view all of these

312
00:27:10,720 --> 00:27:13,720
analyses as optional.

313
00:27:13,720 --> 00:27:20,080
Then you've got heavy metals, mycotoxins, microbial and then foreign matter.

314
00:27:20,080 --> 00:27:24,920
And then in California they call it foreign material so that's a curve ball but you know

315
00:27:24,920 --> 00:27:26,920
we can work with that.

316
00:27:26,920 --> 00:27:35,120
So once again we'd try to avoid that so if there's any way we could sort of figure this

317
00:27:35,120 --> 00:27:40,240
out through natural language processing that would be the best right.

318
00:27:40,240 --> 00:27:46,000
So we could just read microbials by PCR and know that's microbials.

319
00:27:46,000 --> 00:27:52,840
And as you can see you could come close to doing that but I've just predefined them.
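A rough sketch of predefining the analyses with simple keyword matching rather than natural language processing. The standard keys and keyword lists are assumptions for illustration, not the project's actual mapping.

```python
# Hypothetical mapping from a standard key to keywords seen in COA headings.
ANALYSES = {
    "cannabinoids": ["cannabinoid"],
    "pesticides": ["pesticide"],
    "water_activity": ["water activity"],
    "moisture_content": ["moisture"],
    "terpenes": ["terpene"],
    "heavy_metals": ["heavy metal"],
    "mycotoxins": ["mycotoxin"],
    "microbes": ["microbial"],
    "foreign_matter": ["foreign matter", "foreign material"],
}


def standardize_analysis(heading):
    """Map a COA section heading to a standard analysis key, or None if unknown."""
    text = heading.lower()
    for key, keywords in ANALYSES.items():
        if any(keyword in text for keyword in keywords):
            return key
    return None


headings = ["Microbials by PCR", "Foreign Material", "Terpenes", "Residual Solvents"]
found = [standardize_analysis(heading) for heading in headings]
```

Every analysis is treated as optional: a heading that matches nothing returns None, so a new lab or state only needs another keyword added.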

320
00:27:52,840 --> 00:27:59,400
Once again natural language processing could probably solve that but that's using quite

321
00:27:59,400 --> 00:28:04,920
the heavy power tool for a small task.

322
00:28:04,920 --> 00:28:10,600
And then basically the way I'm approaching this is I'm going to basically divide the

323
00:28:10,600 --> 00:28:12,280
page up.

324
00:28:12,280 --> 00:28:17,060
So I'll just show you how I'll do that real quick.

325
00:28:17,060 --> 00:28:23,320
So the idea is we can read in this PDF.

326
00:28:23,320 --> 00:28:30,080
One second here.

327
00:28:30,080 --> 00:28:38,080
The idea is we can read in the PDF and here I'll just look at the first page and this

328
00:28:38,080 --> 00:28:44,960
will just be sort of all the rectangles that we can detect on the first page.
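A minimal sketch of that step, assuming the third-party pdfplumber package is installed. The size thresholds and helper names are hypothetical, and this is one way to divide the page, not necessarily the exact code shown in the session.

```python
def rect_bbox(rect):
    """Return (x0, top, x1, bottom) from a pdfplumber-style rectangle dictionary."""
    return (rect["x0"], rect["top"], rect["x1"], rect["bottom"])


def table_like(rects, min_width=100, min_height=50):
    """Keep rectangles large enough to plausibly enclose a table (assumed sizes)."""
    return [
        rect for rect in rects
        if rect["x1"] - rect["x0"] >= min_width
        and rect["bottom"] - rect["top"] >= min_height
    ]


def first_page_sections(pdf_path):
    """Extract the text inside each table-like rectangle on the first page."""
    import pdfplumber  # Third-party package: pip install pdfplumber

    sections = []
    with pdfplumber.open(pdf_path) as pdf:
        page = pdf.pages[0]
        for rect in table_like(page.rects):
            try:
                cropped = page.crop(rect_bbox(rect))
            except ValueError:
                # Skip rectangles that extend beyond the page bounds.
                continue
            sections.append(cropped.extract_text())
    return sections
```

Calling `first_page_sections("coa.pdf")` on a COA (the file name is a placeholder) would return one block of text per detected area, which can then be matched against the predefined analyses.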

329
00:28:44,960 --> 00:28:56,480
So you can see, okay, awesome, we can detect this area that kind of looks like a

330
00:28:56,480 --> 00:29:03,080
table rectangle area so we can get that data real easily.

331
00:29:03,080 --> 00:29:10,000
And so the way I approach this is and this is why I was saying this could almost be a

332
00:29:10,000 --> 00:29:13,160
lab by lab project that you could potentially get paid for right.

333
00:29:13,160 --> 00:29:19,720
So say you go to a retailer and they say hey we've got labs for or we've got COAs from

334
00:29:19,720 --> 00:29:23,960
these 10 labs.

335
00:29:23,960 --> 00:29:30,080
You may want to you know that could be a good contract for you.

336
00:29:30,080 --> 00:29:33,440
But once again it could pay off big time for them right.

337
00:29:33,440 --> 00:29:42,280
If they're able to unlock COAs from 10 different laboratories for you know the foreseeable

338
00:29:42,280 --> 00:29:49,140
future and all their historic COAs and that's a lot of value to be added.

339
00:29:49,140 --> 00:29:53,720
So once again and I'll show you how you can kind of make short work of this and hopefully

340
00:29:53,720 --> 00:29:57,560
it'll get faster and faster each COA we parse.

341
00:29:57,560 --> 00:30:02,760
But the idea is you can just start to essentially use computer vision right.

342
00:30:02,760 --> 00:30:08,680
And that's why I was saying you know that's what's so cool about the PDF plumber package

343
00:30:08,680 --> 00:30:17,320
is it sort of works on you know kind of I believe it leverages sort of you know smart

344
00:30:17,320 --> 00:30:19,440
parsing techniques under the hood.

345
00:30:19,440 --> 00:30:24,080
So if you're interested you can get the images.

346
00:30:24,080 --> 00:30:29,800
So I'll show you down below how you could actually get this image data.

347
00:30:29,800 --> 00:30:37,160
So if you were predisposed you could get that image and display it on your website if you

348
00:30:37,160 --> 00:30:40,440
wanted to.

349
00:30:40,440 --> 00:30:48,840
This is where you kind of get into who owns the copyright of that image and what exactly

350
00:30:48,840 --> 00:30:49,840
is copyrighted.

351
00:30:49,840 --> 00:30:58,440
And so this is where I would almost wonder, like, who owns the raw data of that

352
00:30:58,440 --> 00:30:59,440
image.

353
00:30:59,440 --> 00:31:06,960
I guess right like is the image data itself part of the COA product right.

354
00:31:06,960 --> 00:31:16,920
It's like the image data is similar to like a cannabinoid result or is the image like

355
00:31:16,920 --> 00:31:19,920
a proprietary Greenleaf lab image.

356
00:31:19,920 --> 00:31:23,080
So I don't know.

357
00:31:23,080 --> 00:31:30,360
I suspect it goes with the client.

358
00:31:30,360 --> 00:31:39,040
Exactly so you know if the client paid for the COA and Greenleaf lab took that photo

359
00:31:39,040 --> 00:31:43,240
you know can the client then do anything they want with that photo.

360
00:31:43,240 --> 00:31:46,040
Yeah that's part of the compliance package.

361
00:31:46,040 --> 00:31:49,400
That would be my argument.

362
00:31:49,400 --> 00:31:53,320
Of course Greenleaf lab owns their own logo but I would say you know if you paid for your

363
00:31:53,320 --> 00:32:00,160
test then I would say your image is part of your result.

364
00:32:00,160 --> 00:32:04,720
But anywho that's getting a little nitpicky but these are the kind of the cool things

365
00:32:04,720 --> 00:32:11,360
that we can start figuring out here in you know this digital 21st century.

366
00:32:11,360 --> 00:32:19,720
Because like I said the image data is quite rich and as we're figuring out you know some

367
00:32:19,720 --> 00:32:24,600
of these image processing techniques can provide a lot of data.

368
00:32:24,600 --> 00:32:29,520
But we're just more interested in the words.

369
00:32:29,520 --> 00:32:36,600
So basically whenever you get a PDF I would say you know the first thing you should do

370
00:32:36,600 --> 00:32:45,160
is just look at all of the words on the page just to make sure that you can get everything.

371
00:32:45,160 --> 00:32:53,440
And so here I'm just looking at the front page but we can look at you know subsequent

372
00:32:53,440 --> 00:32:57,780
pages here.

373
00:32:57,780 --> 00:33:03,760
This one we'll need to make sure that we can in fact get this area here.

374
00:33:03,760 --> 00:33:09,080
Have no fear I think we can do it.

375
00:33:09,080 --> 00:33:14,680
But that's sort of the first thing is you know make sure that you can extract the words.

376
00:33:14,680 --> 00:33:20,680
Have no fear we can get those ones.

377
00:33:20,680 --> 00:33:30,160
But I think we may have to get down to the character level.

378
00:33:30,160 --> 00:33:35,800
So here if you just look at all the characters yes so that's the thing is right couldn't

379
00:33:35,800 --> 00:33:40,920
find all the words but look we can get all of the characters.

380
00:33:40,920 --> 00:33:50,280
So as long as you know we can get every last data point you know there should be a way

381
00:33:50,280 --> 00:33:55,120
to write a clever enough algorithm to actually get the data points.
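
One hedged sketch of such an algorithm: rebuild words from single characters. It assumes character dicts shaped like pdfplumber's `page.chars` (keys `text`, `x0`, `x1`, `top`); the function name and tolerance values are illustrative. In practice, adjusting the `x_tolerance` and `y_tolerance` arguments of `page.extract_words()` may already be enough.

```python
def chars_to_words(chars, x_tol=2.0, y_tol=2.0):
    """Group characters into lines by vertical position, then into words by gaps."""
    # 1. Cluster characters into lines: a character joins the current line
    #    if its top is within y_tol of the previous character's top.
    lines = []
    for ch in sorted(chars, key=lambda c: c["top"]):
        if lines and abs(ch["top"] - lines[-1][-1]["top"]) <= y_tol:
            lines[-1].append(ch)
        else:
            lines.append([ch])

    # 2. Within each line, walk left to right and start a new word at
    #    whitespace or at a horizontal gap wider than x_tol.
    words = []
    for line in lines:
        current, prev = [], None
        for ch in sorted(line, key=lambda c: c["x0"]):
            gap = prev is not None and ch["x0"] - prev["x1"] > x_tol
            if current and (ch["text"].isspace() or gap):
                words.append("".join(current))
                current = []
            if not ch["text"].isspace():
                current.append(ch["text"])
            prev = ch
        if current:
            words.append("".join(current))
    return words
```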

382
00:33:55,120 --> 00:34:01,480
So that's sort of the first step is just make sure that you have like an actual PDF and

383
00:34:01,480 --> 00:34:08,040
not like a right because you may get wild things right you may get like an image or

384
00:34:08,040 --> 00:34:15,600
something that's you know or some sort of and this would be a good detection right into

385
00:34:15,600 --> 00:34:20,600
this you know not that I would expect them but you know I did talk to someone once in

386
00:34:20,600 --> 00:34:26,720
Oregon who mentioned that this was a problem with like you know counterfeit COAs but this

387
00:34:26,720 --> 00:34:32,400
would you know detect immediately if something was say a counterfeit COA because you just

388
00:34:32,400 --> 00:34:38,800
wouldn't be able to read the characters.
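
A small sketch of that check, assuming you pass in one list of characters per page (for example `[page.chars for page in pdf.pages]` with pdfplumber). The function name and the threshold are hypothetical. A page with no text layer may simply be a scan, so treat the result as a flag for manual review, not as proof of anything.

```python
def pages_without_text(pages_chars, min_chars=20):
    """Return the 1-based numbers of pages that have almost no readable characters."""
    flagged = []
    for number, chars in enumerate(pages_chars, start=1):
        if len(chars) < min_chars:
            flagged.append(number)
    return flagged
```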

389
00:34:38,800 --> 00:34:45,400
But anywho like I said I don't think that's going to be very common.

390
00:34:45,400 --> 00:34:50,960
But the way I go about parsing this is by the page area.

391
00:34:50,960 --> 00:35:06,920
So basically you know if you look at the front page so if you look at the COA which I recommend

392
00:35:06,920 --> 00:35:12,600
you do first thing is basically you just want to look at the COA and just take stock of

393
00:35:12,600 --> 00:35:19,160
every single data point on the COA and then think about you know how you know we're going

394
00:35:19,160 --> 00:35:24,000
to get every last data point off of this COA.

395
00:35:24,000 --> 00:35:33,120
So basically we've got the distributor, we've got the cultivator, we've got the sample details,

396
00:35:33,120 --> 00:35:40,400
we've got the status, we could potentially get the director and stuff but I'm not doing

397
00:35:40,400 --> 00:35:42,800
that for now.

398
00:35:42,800 --> 00:35:48,520
If we see reason to in the future we could, but I don't see a reason.

399
00:35:48,520 --> 00:35:56,600
But then we see okay we've got this information repeated so we only need to get that once

400
00:35:56,600 --> 00:36:03,840
and then it just repeats and then we've got new data here in this table and then we've

401
00:36:03,840 --> 00:36:06,320
got a repeated footer.

402
00:36:06,320 --> 00:36:15,080
And so this is a common pattern that you can expect on a certificate of analysis is a common

403
00:36:15,080 --> 00:36:22,320
header a common footer with your meat in the middle.

404
00:36:22,320 --> 00:36:32,560
So the way I approached this was I was like okay I'll just read the distributor data and

405
00:36:32,560 --> 00:36:44,000
read the producer data so you can get the distributor, get the manufacturer and I was

406
00:36:44,000 --> 00:36:51,800
saying okay you know then you could read the sample details so you could specify you know

407
00:36:51,800 --> 00:36:59,280
these three areas and you know one may expect that different labs may put these in different

408
00:36:59,280 --> 00:37:06,560
areas or potentially Greenleaf labs could update their COA and things may change but

409
00:37:06,560 --> 00:37:10,920
hopefully you could target these areas.

410
00:37:10,920 --> 00:37:15,480
And basically I was actually describing this to John the other day but I'll just tell you

411
00:37:15,480 --> 00:37:22,680
so basically there's two approaches you could take to parsing these COAs.

412
00:37:22,680 --> 00:37:34,560
There's the blunt crude approach which would just be to read everything and then just try

413
00:37:34,560 --> 00:37:42,920
to split the data at known points so we'll just split lab samples, split at sampling

414
00:37:42,920 --> 00:37:44,660
method.
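
The crude approach can be sketched roughly like this, assuming you already know the field labels. The labels and the sample text in the usage comment are made up for illustration.

```python
import re


def split_at_labels(text, labels):
    """Split one long string into fields at known labels."""
    # Try longer labels first so a short label never shadows a longer one.
    ordered = sorted(labels, key=len, reverse=True)
    pattern = "|".join(re.escape(label) for label in ordered)
    matches = list(re.finditer(pattern, text))
    fields = {}
    for i, match in enumerate(matches):
        # A field's value runs from the end of its label to the next label.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        value = text[match.end():end].strip(" :\n\t")
        fields.setdefault(match.group(0), value)
    return fields


# Usage with made-up text:
#     split_at_labels("Lab Sample ID: 123 Sampling Method: SOP-7",
#                     ["Lab Sample ID", "Sampling Method"])
```

The weakness is visible in the signature: every label has to be supplied up front.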

415
00:37:44,660 --> 00:37:51,780
So that was the route I originally went down but then you have to know every single field

416
00:37:51,780 --> 00:37:55,040
on the COA ahead of time.

417
00:37:55,040 --> 00:38:01,760
So if you do know that, so say you're the LIMS provider for Greenleaf labs, then yes that

418
00:38:01,760 --> 00:38:06,520
would be the way you'd want to go but the way that I'm going to do this is basically

419
00:38:06,520 --> 00:38:14,440
just partition the COA into different areas and then just divide and conquer.

420
00:38:14,440 --> 00:38:21,000
So I'll just get the distributor data, get the producer data, get the sample details

421
00:38:21,000 --> 00:38:29,160
and then look you can get the results, just get this block of results.

422
00:38:29,160 --> 00:38:38,560
And so that way if you're trying to get these results and you cut out all of this noise

423
00:38:38,560 --> 00:38:47,060
then it's going to make parsing this much, much easier and I'll show you that right now.

424
00:38:47,060 --> 00:38:54,360
So basically I'll just show you, we've talked the talk, let's walk the walk.

425
00:38:54,360 --> 00:39:03,600
So let's cut out the distributor details from the front page.

426
00:39:03,600 --> 00:39:13,480
So as you can see it takes a little bit of cleaning to, right, so we make our crop and

427
00:39:13,480 --> 00:39:20,480
we just get this, right, we can read this block of text, right.

428
00:39:20,480 --> 00:39:30,620
So that's this distributor, Shimmons Consulting Corporation. Distributor, Shimmons Consulting

429
00:39:30,620 --> 00:39:32,120
Corporation.

430
00:39:32,120 --> 00:39:35,600
So we can read that text in pretty cleanly.

431
00:39:35,600 --> 00:39:45,320
So the idea is it's pretty easy to parse that crop of text and it's fairly easy to parse

432
00:39:45,320 --> 00:39:50,840
the producer crop of text.

433
00:39:50,840 --> 00:40:00,880
So the idea is if you can specify what areas are these different sections then it can make

434
00:40:00,880 --> 00:40:03,400
parsing it super simple.
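
A sketch of the divide-and-conquer idea, assuming word dicts shaped like the output of pdfplumber's `page.extract_words()`. The area names and fractions in `AREAS` are hypothetical, not the real coordinates of any lab's COA. Storing areas as fractions of the page is a design choice made here so the same numbers work at any page size.

```python
# Hypothetical areas as fractions of the page: (left, top, right, bottom).
AREAS = {
    "distributor": (0.0, 0.125, 0.5, 0.25),
    "producer": (0.5, 0.125, 1.0, 0.25),
}


def to_bbox(area, page_width, page_height):
    """Convert a fractional area to an (x0, top, x1, bottom) box in page units."""
    left, top, right, bottom = area
    return (left * page_width, top * page_height,
            right * page_width, bottom * page_height)


def words_in_bbox(words, bbox):
    """Keep only the words that sit fully inside the box."""
    x0, top, x1, bottom = bbox
    return [w["text"] for w in words
            if w["x0"] >= x0 and w["x1"] <= x1
            and w["top"] >= top and w["bottom"] <= bottom]


def crop_text(page, bbox):
    """With a real pdfplumber page, cropping and extracting does the same job."""
    return page.crop(bbox).extract_text()
```

A click-the-squares user interface would only need to produce entries for `AREAS`.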

435
00:40:03,400 --> 00:40:10,920
And potentially I was thinking you could even build a user interface to make it even simpler,

436
00:40:10,920 --> 00:40:11,920
right.

437
00:40:11,920 --> 00:40:18,760
So your user interface would just be to show the PDF to somebody and just have them click

438
00:40:18,760 --> 00:40:26,840
squares and just say oh that's the square for distributor, that's the square for cultivator,

439
00:40:26,840 --> 00:40:28,160
so on and so forth.

440
00:40:28,160 --> 00:40:34,200
But once again it depends on how many of these you have to do.

441
00:40:34,200 --> 00:40:42,160
Once again just since we've got a little bit of extra time I'll show you how you can get

442
00:40:42,160 --> 00:40:44,280
the image data.

443
00:40:44,280 --> 00:40:58,000
So CoADoc, which I've initialized right here, has a nice built-in function to get

444
00:40:58,000 --> 00:41:00,400
the PDF image data.

445
00:41:00,400 --> 00:41:06,000
So here I want to go ahead and just get the image.

446
00:41:06,000 --> 00:41:13,980
And so this will be the raw data for that image, right.

447
00:41:13,980 --> 00:41:23,800
So this image right here, as far as the computer is concerned, that image may as well be this

448
00:41:23,800 --> 00:41:26,440
block of text.

449
00:41:26,440 --> 00:41:30,960
So this doesn't mean anything to a human.

450
00:41:30,960 --> 00:41:39,360
This is in fact just 34,000 characters of essentially randomness.

451
00:41:39,360 --> 00:41:41,440
But that's your image.

452
00:41:41,440 --> 00:41:48,420
And so the idea is you can give this block of text to your developer and they can display

453
00:41:48,420 --> 00:41:58,880
that on your website or print it on a label if you were so inclined on doing so.
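
The transcript does not say how the image is encoded. One common way to turn image bytes into a block of text that a web page can display is base64 inside a data URL, so the sketch below assumes that. The function name is hypothetical, and how you obtain the raw bytes from the PDF is left out.

```python
import base64


def to_data_url(image_bytes, mime="image/jpeg"):
    """Encode raw image bytes as a data URL usable in an <img src="..."> tag."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime};base64,{encoded}"
```

Base64 output is about a third larger than the bytes it encodes, which is part of why these strings run to tens of thousands of characters.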

454
00:41:58,880 --> 00:42:05,400
But that's a lot of text so I'm not going to be collecting all that data.

455
00:42:05,400 --> 00:42:11,960
But if you want it, it's there for the taking.

456
00:42:11,960 --> 00:42:17,200
And you can get the sample details.

457
00:42:17,200 --> 00:42:23,040
Once again, the Raspberry Parfait.

458
00:42:23,040 --> 00:42:26,320
So here you see Raspberry Parfait.

459
00:42:26,320 --> 00:42:31,360
And it's a little bit trickier to parse these.

460
00:42:31,360 --> 00:42:38,240
So this block of code is a little bit specialized.

461
00:42:38,240 --> 00:42:46,640
But the idea is if myself and all of you awesome data scientists, if we keep tinkering on this,

462
00:42:46,640 --> 00:42:55,800
hopefully we can generalize these algorithms so that way you can just point at the area

463
00:42:55,800 --> 00:43:05,400
on the PDF that you want parsed and you'll get your data here.

464
00:43:05,400 --> 00:43:12,520
And I'm just going to clear this out, just to save some room here.

465
00:43:12,520 --> 00:43:20,360
So now we've got all the distributor data, we've got all the producer data, we've got

466
00:43:20,360 --> 00:43:22,640
all the sample details.

467
00:43:22,640 --> 00:43:27,360
So that's pretty cool.

468
00:43:27,360 --> 00:43:29,480
We've got the product name, right?

469
00:43:29,480 --> 00:43:30,480
We may not yet.

470
00:43:30,480 --> 00:43:37,320
I was hoping we did.

471
00:43:37,320 --> 00:43:40,160
Well we may not have the product name yet.

472
00:43:40,160 --> 00:43:46,040
Oh, that's right, I added it at the end.

473
00:43:46,040 --> 00:43:48,000
So there it is.

474
00:43:48,000 --> 00:43:49,000
Cool.

475
00:43:49,000 --> 00:43:55,140
So now the idea is divide and conquer.

476
00:43:55,140 --> 00:43:58,280
So basically you need to get a couple things at this point.

477
00:43:58,280 --> 00:44:03,960
And so this is why, you know, Cannlytics is here to help you standardize these.

478
00:44:03,960 --> 00:44:11,680
So basically we'll standardize the analyses and so basically at the top I've got, I put

479
00:44:11,680 --> 00:44:15,520
a list of all the data points that we'll collect.

480
00:44:15,520 --> 00:44:23,680
So we basically collect the analyses, collect all the distributor details, and then the

481
00:44:23,680 --> 00:44:26,760
big thing are the results, right?

482
00:44:26,760 --> 00:44:33,800
So we'll want to go ahead and get the analyses real quick just to be complete, but then we'll

483
00:44:33,800 --> 00:44:35,400
get to the results.

484
00:44:35,400 --> 00:44:46,480
We've got all the analyses the product underwent, and then we have the status for the various

485
00:44:46,480 --> 00:44:48,200
tests.

486
00:44:48,200 --> 00:44:52,600
We can get the methods.

487
00:44:52,600 --> 00:45:00,800
As we found out with PSI Labs, the method to the madness, just kidding, the method does

488
00:45:00,800 --> 00:45:04,000
matter, so it's worth keeping.

489
00:45:04,000 --> 00:45:11,720
I would like to tie these methods to the specific analysis, but just for time's sake, I haven't

490
00:45:11,720 --> 00:45:13,240
gotten to that yet.

491
00:45:13,240 --> 00:45:17,160
But we've got the methods just in case.

492
00:45:17,160 --> 00:45:18,880
Now the fun part.

493
00:45:18,880 --> 00:45:30,200
And so now because we've nicely cropped each page, remember for the results, we'll be cropping

494
00:45:30,200 --> 00:45:34,640
the page to the meat.

495
00:45:34,640 --> 00:45:45,280
We can just pull out all those rows, and then we basically have all of that data.

496
00:45:45,280 --> 00:45:55,200
Yes, we have to now organize it with a pretty clever algorithm, but basically we have all

497
00:45:55,200 --> 00:45:56,200
the data, right?

498
00:45:56,200 --> 00:45:59,280
We've got all the rows, right?

499
00:45:59,280 --> 00:46:06,400
We've got all the cannabinoids, and we've got all the terpenes.

500
00:46:06,400 --> 00:46:17,520
And so the way I approach this is in this case, I predefined the columns, but ideally

501
00:46:17,520 --> 00:46:22,240
your algorithm would be smart enough to detect the columns.

502
00:46:22,240 --> 00:46:30,480
But I just went ahead and as you can see up top, I just predefined, okay, your cannabinoids

503
00:46:30,480 --> 00:46:38,360
will have the name, the LOD, the LOQ, the value, milligrams per gram.

504
00:46:38,360 --> 00:46:48,200
On the COA: the name, LOD, LOQ, percentage, milligrams per gram.
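
A minimal sketch of parsing one row of text with predefined columns, in the order just described. The key names in `CANNABINOID_COLUMNS` and the sample rows in the usage comment are illustrative. Splitting from the right is a choice made here so analyte names that contain spaces stay in one piece.

```python
CANNABINOID_COLUMNS = ["name", "lod", "loq", "percent", "mg_g"]


def to_number(token):
    """Convert a token to a float when possible, else keep it (e.g. 'ND')."""
    try:
        return float(token.replace(",", ""))
    except ValueError:
        return token


def parse_row(row, columns):
    """Map one row of text onto the predefined columns, or return None."""
    # Split from the right so the name keeps any spaces it contains.
    parts = row.rsplit(None, len(columns) - 1)
    if len(parts) != len(columns):
        return None
    record = {columns[0]: parts[0]}
    for key, token in zip(columns[1:], parts[1:]):
        record[key] = to_number(token)
    return record


# Usage with made-up rows:
#     parse_row("Total THC 0.01 0.03 20.5 205", CANNABINOID_COLUMNS)
#     parse_row("CBD 0.01 0.03 ND ND", CANNABINOID_COLUMNS)
```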

505
00:46:48,200 --> 00:46:54,680
So it's not perfect, but at least we kind of know what order the data is in.

506
00:46:54,680 --> 00:47:03,840
And as long as Greenleaf Labs isn't changing their COA willy-nilly, and I'll tell you now

507
00:47:03,840 --> 00:47:08,160
that labs don't change their COAs willy-nilly.

508
00:47:08,160 --> 00:47:10,760
They may change them once in a while.

509
00:47:10,760 --> 00:47:19,000
So once in a while the routine may need to be updated, but these are controlled documents

510
00:47:19,000 --> 00:47:26,120
at the laboratory that have to go through a controlled process in order to be changed.

511
00:47:26,120 --> 00:47:36,320
So it kind of puts the brakes on things just changing rapidly overnight.

512
00:47:36,320 --> 00:47:42,760
So as I said, it's not going to be impossible for them to change their COA, but if they

513
00:47:42,760 --> 00:47:52,240
do so we can make a quick pull request to the algorithm and hopefully stay up to speed.

514
00:47:52,240 --> 00:48:00,400
So it's not perfect, but as I said, it's unlocking such rich data that I think the code was worth

515
00:48:00,400 --> 00:48:01,400
writing.

516
00:48:01,400 --> 00:48:10,040
Keegan, what is it doing where you have these, I don't know if I'd call them headers and footers

517
00:48:10,040 --> 00:48:19,840
in the tables, but you've got columns with data and then it's interrupted with bars there.

518
00:48:19,840 --> 00:48:24,240
In this case it's black text bars.

519
00:48:24,240 --> 00:48:25,240
This?

520
00:48:25,240 --> 00:48:26,240
Yeah.

521
00:48:26,240 --> 00:48:29,520
So I'm skipping those.

522
00:48:29,520 --> 00:48:34,560
So basically I'm just going through and I'm saying, oh, if it starts with an analysis

523
00:48:34,560 --> 00:48:42,880
name, if it starts with date, time, I'll just skip it.
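
That skipping step might look roughly like this. The prefixes in `SKIP_PREFIXES` are guesses at what the bars contain, not the actual list used in the talk.

```python
# Hypothetical prefixes for the bars that interrupt the table.
SKIP_PREFIXES = ("Cannabinoids", "Terpenes", "Date/Time", "Analyte")


def data_rows(lines, skip_prefixes=SKIP_PREFIXES):
    """Drop blank lines and any line that starts with a known bar prefix."""
    lowered = tuple(prefix.lower() for prefix in skip_prefixes)
    rows = []
    for line in lines:
        text = line.strip()
        if not text or text.lower().startswith(lowered):
            continue
        rows.append(text)
    return rows
```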

524
00:48:42,880 --> 00:48:47,560
And so that's here, I'll show you where in the code I'm doing that.

525
00:48:47,560 --> 00:48:50,720
Okay, I see what you're doing now.

526
00:48:50,720 --> 00:48:51,720
Okay.

527
00:48:51,720 --> 00:48:53,680
Well, I forget where in the code that is.

528
00:48:53,680 --> 00:48:57,160
Oh, I'll never find it in a million years.

529
00:48:57,160 --> 00:49:04,880
So that's what's happening is there are still some custom algorithms.

530
00:49:04,880 --> 00:49:13,800
The idea is, and this is the way I approach programming in general is I don't even mind

531
00:49:13,800 --> 00:49:16,380
writing a specific use case.

532
00:49:16,380 --> 00:49:21,400
So I wrote this specific use case and basically I was like, I'm going to parse all the data

533
00:49:21,400 --> 00:49:26,560
out of this COA one way or the other.

534
00:49:26,560 --> 00:49:32,760
Once I found out that yes, we can read every single piece of data on here, that was the

535
00:49:32,760 --> 00:49:33,760
goal.

536
00:49:33,760 --> 00:49:37,800
It was like one way or the other, we're going to read every single piece of data.

537
00:49:37,800 --> 00:49:43,880
And essentially we did exactly that.

538
00:49:43,880 --> 00:49:49,040
So here I'll just show you this right now.

539
00:49:49,040 --> 00:49:58,960
But basically, I'm also running low on battery, so if I cut out it's because I lost battery.

540
00:49:58,960 --> 00:50:08,400
But basically we've read every single data point on this COA.

541
00:50:08,400 --> 00:50:16,080
So this would be 59 or I think it's 58 data points.

542
00:50:16,080 --> 00:50:28,880
So tons of data and I've tried to be as specific as possible.

543
00:50:28,880 --> 00:50:34,320
I mean not as specific, I tried to be as general as possible so that way you could generalize

544
00:50:34,320 --> 00:50:37,840
the code to more and more labs.

545
00:50:37,840 --> 00:50:45,520
But inevitably I had to write Greenleaf Labs specific code.

546
00:50:45,520 --> 00:50:53,600
So basically I tried to put as many parameters as I could into JSON.

547
00:50:53,600 --> 00:50:56,960
Then I tried to generalize the code as much as possible.

548
00:50:56,960 --> 00:51:08,360
And at the end of the day, I still had to have a Greenleaf Labs specific function.

549
00:51:08,360 --> 00:51:12,880
So here I wrote a parse Greenleaf Labs function.

550
00:51:12,880 --> 00:51:21,240
And so the idea is all of these different functions, the parse Greenleaf Labs function,

551
00:51:21,240 --> 00:51:32,680
the parse Confident Cannabis function, these can be imported pretty quickly into CoADoc.
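
One way to sketch that pluggable idea is a small registry that maps a lab name to its parsing function. Everything here is hypothetical: the registry, the decorator, and the placeholder parser bodies are not how CoADoc is actually implemented.

```python
LAB_PARSERS = {}  # maps a lab name to the function that parses its COAs


def lab_parser(lab_name):
    """Decorator that registers a lab-specific parsing function."""
    def decorator(func):
        LAB_PARSERS[lab_name] = func
        return func
    return decorator


@lab_parser("Greenleaf Labs")
def parse_greenleaf_labs(text):
    # Placeholder body; a real parser would crop areas and read rows.
    return {"lab": "Greenleaf Labs", "n_chars": len(text)}


@lab_parser("Confident Cannabis")
def parse_confident_cannabis(text):
    # Placeholder body, same shape as above.
    return {"lab": "Confident Cannabis", "n_chars": len(text)}


def parse_coa(text):
    """Pick the parser whose lab name appears in the COA text."""
    for lab_name, parser in LAB_PARSERS.items():
        if lab_name.lower() in text.lower():
            return parser(text)
    raise ValueError("No parser registered for this COA.")
```

Adding a new lab then means writing one function and decorating it, without touching `parse_coa`.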

552
00:51:32,680 --> 00:51:38,760
And so the idea is it's sort of this pluggable user interface.

553
00:51:38,760 --> 00:51:45,680
So you can basically just import your various labs.

554
00:51:45,680 --> 00:51:54,120
So here I've done Confident Cannabis, TagLeaf LIMS, Greenleaf Labs, Veda Scientific, and

555
00:51:54,120 --> 00:51:57,120
then SC Labs.

556
00:51:57,120 --> 00:51:58,920
I'm still finishing that one.

557
00:51:58,920 --> 00:52:02,400
And then MCR Labs, we've done the URL.

558
00:52:02,400 --> 00:52:07,320
I don't actually have a COA PDF from them.

559
00:52:07,320 --> 00:52:11,600
But if they put a QR code on their PDF, then we're good to go.

560
00:52:11,600 --> 00:52:25,640
So Keegan, given what you've got so far, can you issue a report that would show how homogeneous

561
00:52:25,640 --> 00:52:29,200
or not the analyte names are?

562
00:52:29,200 --> 00:52:31,720
Because these are all California COAs.

563
00:52:31,720 --> 00:52:38,640
And they should be, for the most part, all synonymous.

564
00:52:38,640 --> 00:52:43,920
But it would be interesting to see how bad that variance is on a California set right

565
00:52:43,920 --> 00:52:46,640
now.

566
00:52:46,640 --> 00:52:48,360
That would be a fun exercise.

567
00:52:48,360 --> 00:52:52,560
How easy is it to do with what you've got right now?

568
00:52:52,560 --> 00:52:59,160
Well, in fact, this was a test that I put Candace on with, I think Candace may have done it

569
00:52:59,160 --> 00:53:03,520
with the PSI Labs data, but we could do it with the SC Labs data.

570
00:53:03,520 --> 00:53:05,240
But we can do it.

571
00:53:05,240 --> 00:53:08,640
And essentially, we can get all the...

572
00:53:08,640 --> 00:53:12,440
I wouldn't get down with the analyte names.

573
00:53:12,440 --> 00:53:20,600
I would at first ask it at the top level of the assay classes themselves.

574
00:53:20,600 --> 00:53:21,600
The analysis.

575
00:53:21,600 --> 00:53:24,080
The analysis classes themselves.

576
00:53:24,080 --> 00:53:26,360
Start 50,000 feet.

577
00:53:26,360 --> 00:53:36,160
And let's just see how good or bad the synonymy is in a set of California COAs.

578
00:53:36,160 --> 00:53:43,360
Because that's going to tell us how we have to run our synonym databases.
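
A report like that could start as a simple count of analysis names across parsed COAs. This sketch assumes each parsed COA is a dict with an `analyses` list; that shape and the sample names in the usage comment are assumptions.

```python
from collections import Counter


def analysis_name_report(coas):
    """Count how often each analysis name appears across a set of COAs."""
    counts = Counter()
    for coa in coas:
        for name in coa.get("analyses", []):
            # Ignore case and stray spaces so only real naming differences show.
            counts[name.strip().lower()] += 1
    return counts.most_common()


# Usage with made-up COAs:
#     analysis_name_report([{"analyses": ["Cannabinoids"]},
#                           {"analyses": ["Potency"]}])
```

Names that show up only once or twice are the candidates for a synonym table.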

579
00:53:43,360 --> 00:53:46,200
I think actually the analyses are pretty similar.

580
00:53:46,200 --> 00:53:48,360
They could be.

581
00:53:48,360 --> 00:53:56,800
And I would recommend since VEDA is the most recent and is the most closest to current

582
00:53:56,800 --> 00:54:02,960
DCC because they've been now recently licensed.

583
00:54:02,960 --> 00:54:09,280
And what we just got back from VEDA is their current compliance COA.

584
00:54:09,280 --> 00:54:14,960
It might be pretty damn current.

585
00:54:14,960 --> 00:54:18,960
So here...

586
00:54:18,960 --> 00:54:23,240
So long story short is I think the analysis names are similar.

587
00:54:23,240 --> 00:54:28,360
Sometimes people use potency instead of cannabinoids.

588
00:54:28,360 --> 00:54:34,080
For whatever reason, people just like the term potency in the industry.

589
00:54:34,080 --> 00:54:37,880
I just always felt it was just less formal.

590
00:54:37,880 --> 00:54:38,920
It's wrong.

591
00:54:38,920 --> 00:54:42,680
It's totally fucking wrong.

592
00:54:42,680 --> 00:54:44,720
People just prefer it.

593
00:54:44,720 --> 00:54:47,240
It's a hard habit to break.

594
00:54:47,240 --> 00:54:51,000
It's just not right.

595
00:54:51,000 --> 00:54:55,720
The word potency and potent does have its time and place.

596
00:54:55,720 --> 00:55:00,680
But I feel like the word cannabinoids is more appropriate.

597
00:55:00,680 --> 00:55:03,280
It is totally more appropriate.

598
00:55:03,280 --> 00:55:08,080
You cannot define potency without an activity assay.

599
00:55:08,080 --> 00:55:12,680
And that's the last I'm going to be saying this.

600
00:55:12,680 --> 00:55:16,120
But it's nomenclature that you'll see.

601
00:55:16,120 --> 00:55:17,120
And so...

602
00:55:17,120 --> 00:55:19,640
Fucked up cannabis industry bullshit.

603
00:55:19,640 --> 00:55:23,920
And in fact, I even have a way to deal with this.

604
00:55:23,920 --> 00:55:34,240
And the way we do it is we basically just have replacements.

605
00:55:34,240 --> 00:55:39,800
Here are basically just common replacements that I'll make on the COA.

606
00:55:39,800 --> 00:55:45,000
But at least we're not bad at the level of the analyses.

607
00:55:45,000 --> 00:55:50,240
Our terms are pretty synonymous, I guess, in the set you've got right now.

608
00:55:50,240 --> 00:55:51,240
Exactly.

609
00:55:51,240 --> 00:55:57,920
And so basically, if there's any oddball case like that, you can basically just define a

610
00:55:57,920 --> 00:56:07,440
new replacement and say, okay, we just want to replace potency with cannabinoids.
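
A sketch of configurable replacements, assuming the rules live in JSON as mentioned earlier. The JSON layout and the key names `text` and `replace` are hypothetical.

```python
import json

# Hypothetical configuration; in practice this could live in a JSON file.
CONFIG = json.loads("""
{
  "replacements": [
    {"text": "potency", "replace": "cannabinoids"}
  ]
}
""")


def standardize(name, replacements):
    """Lowercase an analysis name and apply each configured replacement."""
    key = name.strip().lower()
    for rule in replacements:
        key = key.replace(rule["text"], rule["replace"])
    return key
```

Handling a new oddball term then means adding one line to the configuration and leaving the code alone.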

611
00:56:07,440 --> 00:56:17,040
So the idea is to make all of these little adjustments as sort of configurable as possible.

612
00:56:17,040 --> 00:56:24,440
And then the idea is you can just configure away the intricacies of everybody's COA.

613
00:56:24,440 --> 00:56:28,160
So say Greenleaf Lab uses the word potency.

614
00:56:28,160 --> 00:56:30,600
That's not going to break anything.

615
00:56:30,600 --> 00:56:34,760
We'll just make that replacement.

616
00:56:34,760 --> 00:56:39,240
Can I make a recommendation?

617
00:56:39,240 --> 00:56:41,080
What you're doing is wonderful.

618
00:56:41,080 --> 00:56:48,440
Do not go outside the flower box, as it were, because it's going to get a lot worse when

619
00:56:48,440 --> 00:56:54,460
you get processed cannabis products and it's going to be confounding.

620
00:56:54,460 --> 00:57:02,480
So I would at this point just be working with flower because it'll become rabbit holes and

621
00:57:02,480 --> 00:57:08,360
a mess once you go off that.

622
00:57:08,360 --> 00:57:11,680
It could get complex once you get to edibles.

623
00:57:11,680 --> 00:57:13,400
Yeah, it's going to be a mess.

624
00:57:13,400 --> 00:57:15,200
We'll deal with that later.

625
00:57:15,200 --> 00:57:19,160
Let's just do flowers.

626
00:57:19,160 --> 00:57:22,360
And so, exactly.

627
00:57:22,360 --> 00:57:29,080
So edibles often are measured in milligrams per gram and then the milligrams per serving

628
00:57:29,080 --> 00:57:30,480
matters.

629
00:57:30,480 --> 00:57:35,240
So the product classes and everything else get real gnarly.

630
00:57:35,240 --> 00:57:37,440
So edibles could be a whole can of worms.

631
00:57:37,440 --> 00:57:42,320
If anyone ever comes across an edible COA that they want parsed, we could take a look at

632
00:57:42,320 --> 00:57:43,320
it.

633
00:57:43,320 --> 00:57:45,160
But as you said, we may have to take it from the top.

634
00:57:45,160 --> 00:57:48,320
Later, later.

635
00:57:48,320 --> 00:57:51,400
But for now, and I'll let you all get out of here.

636
00:57:51,400 --> 00:57:53,000
I think my battery's about to go.

637
00:57:53,000 --> 00:57:56,720
But for now, as I said, we can get all the results.

638
00:57:56,720 --> 00:58:01,600
It still needs to be flattened and normalized a bit.
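
Flattening could look something like this: one output row per result, with the sample-level fields repeated on each row. The nested shape with a `results` list is an assumption about the parsed output, and the field names in the usage comment are made up.

```python
def flatten_results(coa):
    """Turn one nested COA dict into a list of flat rows, one per result."""
    # Everything except the results list is sample-level detail.
    base = {key: value for key, value in coa.items() if key != "results"}
    # Copy the sample-level detail onto every result row.
    return [{**base, **result} for result in coa.get("results", [])]


# Usage with a made-up COA:
#     flatten_results({"product_name": "Example",
#                      "results": [{"name": "CBD", "value": "ND"}]})
```

Rows in this shape can be saved straight to a spreadsheet or loaded into a data frame.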

639
00:58:01,600 --> 00:58:06,680
I forgot to add the units, so I need to go back and add the units.

640
00:58:06,680 --> 00:58:13,840
But at the moment, you can parse the COA and get all of this awesome data.

641
00:58:13,840 --> 00:58:16,680
And there's still more work to be done.

642
00:58:16,680 --> 00:58:21,880
But I was reading a pretty inspirational blog post.

643
00:58:21,880 --> 00:58:26,920
And it was basically, or actually, I think it was even a Stack Overflow answer, but basically

644
00:58:26,920 --> 00:58:34,560
somebody said that it's sort of miraculous what you can pull off sort of in a business

645
00:58:34,560 --> 00:58:42,840
model sense with good PDF parsing.

646
00:58:42,840 --> 00:58:43,840
It's a tough task.

647
00:58:43,840 --> 00:58:46,480
And not everybody does this well.

648
00:58:46,480 --> 00:58:50,840
And it's just like it's all about attention to detail.

649
00:58:50,840 --> 00:58:58,200
You could do this quick and dirty, but we had meticulous attention to detail to the

650
00:58:58,200 --> 00:59:04,840
point where we can get the sample image, we can get every last piece of data off of this

651
00:59:04,840 --> 00:59:05,840
COA.

652
00:59:05,840 --> 00:59:10,960
And that's what really makes it at the end of the day is the fact that we can get it

653
00:59:10,960 --> 00:59:22,080
all and organize it in a way that you can save it and your favorite data scientist can

654
00:59:22,080 --> 00:59:24,040
use this.

655
00:59:24,040 --> 00:59:27,320
So that's what really makes this special, I think.

656
00:59:27,320 --> 00:59:34,680
And so for any of you that want to work on this, as I said, we've got a pretty well-defined

657
00:59:34,680 --> 00:59:39,080
way to go about collecting this data.

658
00:59:39,080 --> 00:59:46,680
At least for California, we've got a nice list here of about 50 to 60 data points that

659
00:59:46,680 --> 00:59:49,920
you can expect on almost any COA.

660
00:59:49,920 --> 00:59:58,200
And you should be able to readily parse these data points without too much trouble.

661
00:59:58,200 --> 01:00:04,320
And then write some routines and hopefully get paid for your work.

662
01:00:04,320 --> 01:00:07,320
And the sky's the limit.

663
01:00:07,320 --> 01:00:11,700
And as I said, I think each bit that you tinker on this, I think, will be helping the cannabis

664
01:00:11,700 --> 01:00:15,560
industry out.

665
01:00:15,560 --> 01:00:19,080
So that's the presentation for today.

666
01:00:19,080 --> 01:00:24,960
Any thoughts, comments, questions before we get out of here and enjoy the bright sunny

667
01:00:24,960 --> 01:00:25,960
day?

668
01:00:25,960 --> 01:00:32,200
That's looking good.

669
01:00:32,200 --> 01:00:34,280
And I want to thank you all for coming along.

670
01:00:34,280 --> 01:00:38,600
And your eyes and your ears are helping.

671
01:00:38,600 --> 01:00:39,600
Thanks Jerry.

672
01:00:39,600 --> 01:00:40,600
Thanks Candice.

673
01:00:40,600 --> 01:00:41,600
John.

674
01:00:41,600 --> 01:00:42,600
All right.

675
01:00:42,600 --> 01:00:44,480
Keegan, hopefully we get more.

676
01:00:44,480 --> 01:00:50,920
As I said, we're waiting on Cafe Floor to see what they can roust up for us.

677
01:00:50,920 --> 01:01:00,440
So maybe it gets faster and easier as if you've gotten this far, maybe it's not so hard to

678
01:01:00,440 --> 01:01:04,440
put a new one in and see how fast it parses.

679
01:01:04,440 --> 01:01:05,440
Exactly.

680
01:01:05,440 --> 01:01:10,800
And that's what I was thinking is it's almost the classic learning by doing.

681
01:01:10,800 --> 01:01:38,200
So the more of these that we do, the...

