1
00:00:00,000 --> 00:00:07,800
Welcome to the Cannabis Data Science Meetup Group.

2
00:00:07,800 --> 00:00:09,000
Happy to have you all here.

3
00:00:09,000 --> 00:00:11,560
So long story short, my name is Keegan.

4
00:00:11,560 --> 00:00:17,120
I started a company, Cannlytics, and spent a little bit of time in the cannabis space, preaching

5
00:00:17,120 --> 00:00:18,120
data.

6
00:00:18,120 --> 00:00:20,360
My background is in statistics.

7
00:00:20,360 --> 00:00:24,400
And so I was thinking, well, this is something that I know a bit about.

8
00:00:24,400 --> 00:00:33,280
So why not get together with some other knowledgeable people, data scientists, cannabis operators,

9
00:00:33,280 --> 00:00:39,280
regulators, you name it, and see if we can use statistics to help advance the cannabis

10
00:00:39,280 --> 00:00:40,320
industry.

11
00:00:40,320 --> 00:00:42,920
So that's where I'm coming from.

12
00:00:42,920 --> 00:00:47,920
This is basically a roundtable for us all to get together, talk about what's on our

13
00:00:47,920 --> 00:00:54,320
minds, what's pressing, what we want to accomplish, how we think data science can help us.

14
00:00:54,320 --> 00:00:57,360
Any pressing research questions we may have.

15
00:00:57,360 --> 00:01:02,680
Then I'm always happy to share my research and ideas with you.

16
00:01:02,680 --> 00:01:09,320
Always open to your feedback, because your feedback is how improvement happens.

17
00:01:09,320 --> 00:01:13,680
And then if you have any cool ideas or research endeavors, I'm always happy to hear about

18
00:01:13,680 --> 00:01:17,600
those too, just so we can keep bouncing ideas off of each other.

19
00:01:17,600 --> 00:01:19,080
So that's enough about me.

20
00:01:19,080 --> 00:01:24,240
I'll share with you some cool statistics here today, cannabis related.

21
00:01:24,240 --> 00:01:27,880
Before we get to that, I just want to give everybody a chance to say hi. There are a bunch of new

22
00:01:27,880 --> 00:01:29,320
faces today.

23
00:01:29,320 --> 00:01:35,720
So you don't have to, but if you want, spend anywhere from 30 seconds to a couple minutes

24
00:01:35,720 --> 00:01:42,680
talking about maybe your background or what you hope to get out of applying data science

25
00:01:42,680 --> 00:01:43,840
to cannabis.

26
00:01:43,840 --> 00:01:49,240
So Nick, good to meet you.

27
00:01:49,240 --> 00:01:50,240
What are you interested in?

28
00:01:50,240 --> 00:01:54,520
What do you hope to learn here in 2023?

29
00:01:54,520 --> 00:01:59,880
Well, I was involved in cannabis growing for about four years.

30
00:01:59,880 --> 00:02:08,920
And basically I'm interested in how the business will go further in 2023, because we all know

31
00:02:08,920 --> 00:02:18,920
what is happening now and that people are struggling, and in how data analytics can help people achieve

32
00:02:18,920 --> 00:02:25,240
their goals, I guess.

33
00:02:25,240 --> 00:02:26,240
I love it.

34
00:02:26,240 --> 00:02:29,440
And I love your optimism, right?

35
00:02:29,440 --> 00:02:35,120
And just mute while I talk; there was a little bit of feedback there.

36
00:02:35,120 --> 00:02:39,240
So long story short is I love your optimism because yes, you pointed out there's a problem

37
00:02:39,240 --> 00:02:42,280
at hand, but there are ways to fix it.

38
00:02:42,280 --> 00:02:43,280
And how do we start?

39
00:02:43,280 --> 00:02:45,880
Let's just start with any solution.

40
00:02:45,880 --> 00:02:48,520
Let's just start thinking of ideas.

41
00:02:48,520 --> 00:02:52,960
It doesn't have to be the best idea in the world, but it can be a starting point.

42
00:02:52,960 --> 00:02:56,400
So as you said, there's some companies that are struggling.

43
00:02:56,400 --> 00:03:02,320
We looked at the survival rate of cannabis companies and it varies by the type of business,

44
00:03:02,320 --> 00:03:09,160
say if you're a retailer, a processor, a cultivator, but on average, cannabis companies weren't

45
00:03:09,160 --> 00:03:17,760
having a lifespan much longer than a year, which is not an encouraging sign.

46
00:03:17,760 --> 00:03:23,000
It's not comparable to, say, the restaurant industry, but people are out there trying

47
00:03:23,000 --> 00:03:25,640
to start long-term companies.

48
00:03:25,640 --> 00:03:27,560
They want to have successful cultivations.

49
00:03:27,560 --> 00:03:35,680
They want to have successful retail establishments, successful processing.

50
00:03:35,680 --> 00:03:36,680
So exactly.

51
00:03:36,680 --> 00:03:39,920
So how can we alleviate some of the burdens?

52
00:03:39,920 --> 00:03:42,360
Well, maybe statistics can help.

53
00:03:42,360 --> 00:03:46,920
So we'll talk about how we can get back to the basics today.

54
00:03:46,920 --> 00:03:48,200
Real cool things coming up.

55
00:03:48,200 --> 00:03:54,480
So just once again, give everybody a chance to say, hey, Noah, good to meet you.

56
00:03:54,480 --> 00:03:57,400
We'd love to hear about the area that you're interested in.

57
00:03:57,400 --> 00:03:58,400
Yeah.

58
00:03:58,400 --> 00:03:59,400
Hi.

59
00:03:59,400 --> 00:04:00,400
Good to meet everyone.

60
00:04:00,400 --> 00:04:01,400
Yeah.

61
00:04:01,400 --> 00:04:04,720
I just found this meetup a few days ago.

62
00:04:04,720 --> 00:04:06,040
I thought it'd be really interesting.

63
00:04:06,040 --> 00:04:12,800
I just graduated with a degree in computer science with a data science emphasis.

64
00:04:12,800 --> 00:04:14,840
I'm still kind of looking around for work.

65
00:04:14,840 --> 00:04:17,720
I just relocated to the Portland area.

66
00:04:17,720 --> 00:04:18,720
And I don't know.

67
00:04:18,720 --> 00:04:21,120
I'm just really passionate about cannabis.

68
00:04:21,120 --> 00:04:24,520
And I've spent some time growing my own cannabis.

69
00:04:24,520 --> 00:04:30,560
And I'm really interested in some of the applications of using reinforcement learning to improve

70
00:04:30,560 --> 00:04:31,560
growth and that kind of thing.

71
00:04:31,560 --> 00:04:38,080
And I'm also interested in using large language models to kind of work out some of the effects

72
00:04:38,080 --> 00:04:40,400
of cannabis on different people.

73
00:04:40,400 --> 00:04:44,820
I think there are some really interesting applications there, and some gaps where I think

74
00:04:44,820 --> 00:04:49,240
consumers could benefit from understanding other people's experiences with certain

75
00:04:49,240 --> 00:04:50,720
strains and that kind of thing.

76
00:04:50,720 --> 00:04:55,640
But yeah, I'm mainly just here to hear what's going on. I didn't even know there was

77
00:04:55,640 --> 00:04:58,640
a field of analytics in cannabis.

78
00:04:58,640 --> 00:05:05,440
So it's exciting to see that there are people working here and I'm just here to learn.

79
00:05:05,440 --> 00:05:10,200
So I'll give you a little taste of everything.

80
00:05:10,200 --> 00:05:14,400
So of course, I've got a lot of enthusiasm for this area.

81
00:05:14,400 --> 00:05:19,920
So I think there's a lot of value to be added by applying data science to cannabis.

82
00:05:19,920 --> 00:05:26,120
And so I started Cannlytics. Just to show the other side:

83
00:05:26,120 --> 00:05:30,720
Data science is essentially a bleeding edge field.

84
00:05:30,720 --> 00:05:31,720
So it's really new.

85
00:05:31,720 --> 00:05:37,280
And so it can be difficult to convince people of the value.

86
00:05:37,280 --> 00:05:46,160
One of the difficulties in a lot of what analytics companies do is

87
00:05:46,160 --> 00:05:52,640
talking to people and essentially trying to convince them that the analytics are in fact

88
00:05:52,640 --> 00:05:53,640
valuable.

89
00:05:53,640 --> 00:05:58,980
I don't want to, you know, mis-sell anything.

90
00:05:58,980 --> 00:06:01,080
But I think the value is there.

91
00:06:01,080 --> 00:06:05,200
I think it's just asymmetric information.

92
00:06:05,200 --> 00:06:09,720
So anywho, there is a little bit of an uphill battle.

93
00:06:09,720 --> 00:06:12,760
But in my opinion, there's a lot of value to be added.

94
00:06:12,760 --> 00:06:17,720
The subjects you're talking about are exactly what people are interested in.

95
00:06:17,720 --> 00:06:26,000
Of course, on the retail side, and especially the medical industry, they're, how do

96
00:06:26,000 --> 00:06:33,520
you say, they can't wait to dip their toes in.

97
00:06:33,520 --> 00:06:35,560
And of course, why?

98
00:06:35,560 --> 00:06:39,600
Because they're interested in potential medical effects.

99
00:06:39,600 --> 00:06:42,940
This is a lot of the talk of the town on the retail side.

100
00:06:42,940 --> 00:06:47,640
Another thing you raised is, well, what about the home growers?

101
00:06:47,640 --> 00:06:52,800
And this was something that was neglected for a long time.

102
00:06:52,800 --> 00:06:58,020
The reason being that a lot of states, for example Washington State, are still

103
00:06:58,020 --> 00:07:05,680
hesitant to permit home grow, as Massachusetts has, because they're worried about diversion.

104
00:07:05,680 --> 00:07:11,160
I don't know how legitimate a claim that is, but I was thinking that analytics could

105
00:07:11,160 --> 00:07:15,400
be something that home growers could find quite useful.

106
00:07:15,400 --> 00:07:23,080
And I've encouraged meetup members in the past to look at this. It's something that I think people

107
00:07:23,080 --> 00:07:24,080
have dabbled in.

108
00:07:24,080 --> 00:07:27,600
But once again, you can always have a fresh take on it.

109
00:07:27,600 --> 00:07:30,280
Especially home grow analytics.

110
00:07:30,280 --> 00:07:33,640
And I don't know if it's just the material that I've searched for.

111
00:07:33,640 --> 00:07:40,240
But I see a lot of people doing, you know, hydroponic setups.

112
00:07:40,240 --> 00:07:46,760
And I wouldn't be surprised if at-home hydroponic farming could be a thing in the future.

113
00:07:46,760 --> 00:07:51,240
But once again, bleeding edge, but you're well positioned.

114
00:07:51,240 --> 00:07:53,560
So I'm kind of rambling.

115
00:07:53,560 --> 00:07:57,720
If you have any more thoughts, feel free to chime in.

116
00:07:57,720 --> 00:08:01,840
And then as I continue to think about all your cool ideas, Noah.

117
00:08:01,840 --> 00:08:04,280
Michelle, welcome to the group.

118
00:08:04,280 --> 00:08:05,280
Happy to have you.

119
00:08:05,280 --> 00:08:08,160
We'd love to hear about what you're particularly interested in.

120
00:08:08,160 --> 00:08:11,480
That way we can make sure to touch on your interests.

121
00:08:11,480 --> 00:08:12,480
Sure.

122
00:08:12,480 --> 00:08:13,480
Hi.

123
00:08:13,480 --> 00:08:17,040
Keeping my camera turned off because I tend to get up and move around a lot.

124
00:08:17,040 --> 00:08:22,840
But like Noah, I kind of just stumbled upon the meetup a few days ago.

125
00:08:22,840 --> 00:08:29,520
I'm a math teacher who's actually, I'm not a math teacher anymore, but I'm transitioning

126
00:08:29,520 --> 00:08:32,120
into data analysis.

127
00:08:32,120 --> 00:08:37,580
And as I've been looking for projects to do and things like that, one of the things on

128
00:08:37,580 --> 00:08:40,720
my list was the cannabis industry.

129
00:08:40,720 --> 00:08:46,160
And I have a gajillion questions from not only the business aspect, but also how it's

130
00:08:46,160 --> 00:08:51,480
affecting society, you know, benefiting, I think probably benefiting most.

131
00:08:51,480 --> 00:08:57,280
But I would like to see the data on that for, you know, just other, you know, social ailments

132
00:08:57,280 --> 00:09:00,760
and mental struggles and that sort of thing.

133
00:09:00,760 --> 00:09:01,760
Depression.

134
00:09:01,760 --> 00:09:03,880
I deal with depression.

135
00:09:03,880 --> 00:09:08,760
But most of all, cannabis has been a part of my life for as long as I can remember.

136
00:09:08,760 --> 00:09:11,320
I grew up in the 70s with a hippie dad.

137
00:09:11,320 --> 00:09:14,880
He grew it on the roof of our apartment in New York.

138
00:09:14,880 --> 00:09:18,960
And yeah, so I just thought it was a really interesting way that I could maybe develop

139
00:09:18,960 --> 00:09:23,160
a project for my portfolio as well.

140
00:09:23,160 --> 00:09:24,160
I love it, Michelle.

141
00:09:24,160 --> 00:09:27,560
And you attended on the perfect day.

142
00:09:27,560 --> 00:09:31,240
Everything that you raised is exactly what we're touching on.

143
00:09:31,240 --> 00:09:33,160
So first off, math.

144
00:09:33,160 --> 00:09:40,660
Coincidentally, we actually have a fun math or statistical exercise today.

145
00:09:40,660 --> 00:09:46,760
Going back to the 70s, we're actually going to hit on some real cool history today, which

146
00:09:46,760 --> 00:09:50,840
is tangentially related to growing on rooftops.

147
00:09:50,840 --> 00:09:55,240
And we'll get more into that next week.

148
00:09:55,240 --> 00:10:01,720
And then the final thing is, and I'll get to this more, relates to the insight of the

149
00:10:01,720 --> 00:10:02,720
day.

150
00:10:02,720 --> 00:10:05,200
And I'll sort of tease that.

151
00:10:05,200 --> 00:10:11,480
There's sort of a big takeaway at the end of today.

152
00:10:11,480 --> 00:10:12,600
Statistical insight.

153
00:10:12,600 --> 00:10:14,360
So I'll tease that.

154
00:10:14,360 --> 00:10:15,400
But that's all coming up.

155
00:10:15,400 --> 00:10:17,400
So you attended on the perfect day.

156
00:10:17,400 --> 00:10:20,680
Now, Isaac, happy to see you today.

157
00:10:20,680 --> 00:10:23,380
I love that you made it to the morning.

158
00:10:23,380 --> 00:10:26,360
So hopefully this works out well for you.

159
00:10:26,360 --> 00:10:31,840
And the final thing is, we're going to be working on a lot of projects.

160
00:10:31,840 --> 00:10:36,540
So we're essentially going into the third year of the Cannabis Data Science

161
00:10:36,540 --> 00:10:37,840
Meetup Group.

162
00:10:37,840 --> 00:10:40,320
The first year was a lot of data wrangling.

163
00:10:40,320 --> 00:10:44,880
So we were just trying to find what are all the cannabis data sources?

164
00:10:44,880 --> 00:10:46,480
What are all the data points?

165
00:10:46,480 --> 00:10:48,480
How can we get them?

166
00:10:48,480 --> 00:10:50,880
That was a lot of what the first year was.

167
00:10:50,880 --> 00:10:57,080
The second year, we laid out all of the statistical tools that we'll be using.

168
00:10:57,080 --> 00:10:59,680
So we started to lay out model after model.

169
00:10:59,680 --> 00:11:04,080
We have about 30 solid models that we can pull from.

170
00:11:04,080 --> 00:11:08,440
And then we began to see how we could apply those in the cannabis space.

171
00:11:08,440 --> 00:11:12,000
Now this year, we've got the data.

172
00:11:12,000 --> 00:11:19,320
We've got our statistical tools, and now it's time to finally embark on interesting projects.

173
00:11:19,320 --> 00:11:23,000
And so that's why it's all hands on deck.

174
00:11:23,000 --> 00:11:30,520
So if any of you, Michelle, Noah, Nag, Isaac, are looking for projects, feel free to get

175
00:11:30,520 --> 00:11:38,160
in touch because I'm actually starting to work with this group, cannabis data, and we're

176
00:11:38,160 --> 00:11:41,520
just wrangling data and statistics.

177
00:11:41,520 --> 00:11:44,760
It's an open-source project, and there are a lot of open issues.

178
00:11:44,760 --> 00:11:49,120
So feel free to see if there's any way that that could strike your fancy and you could

179
00:11:49,120 --> 00:11:52,960
get some value out of that.

180
00:11:52,960 --> 00:12:03,120
So anywho, before I continue rambling on, Isaac, any cool projects that you're interested

181
00:12:03,120 --> 00:12:05,680
in tackling in this coming year?

182
00:12:05,680 --> 00:12:15,960
Well, one thing that I'm interested in is lab fraud.

183
00:12:15,960 --> 00:12:23,040
So last year, well, at the end of last year, I tried to use Benford's law to do some fraud detection.

184
00:12:23,040 --> 00:12:32,560
And also, last week, I plotted out the microbial results from Washington, and I noticed a strange

185
00:12:32,560 --> 00:12:39,120
decrease of detections around the state regulatory limit.

186
00:12:39,120 --> 00:12:46,300
So that seems pretty fishy to me, and I may dig into that a little bit more.

187
00:12:46,300 --> 00:12:51,240
And also, I found a pretty good test.

188
00:12:51,240 --> 00:12:54,520
It's called the McCrary density test.

189
00:12:54,520 --> 00:13:08,200
Well, one method for detection, or just in general for measurement around

190
00:13:08,200 --> 00:13:13,360
one intervention, is called regression discontinuity design.

191
00:13:13,360 --> 00:13:20,280
Basically, you're working with a window of data around one single

192
00:13:20,280 --> 00:13:23,440
cutoff point, and you're trying to see if the means on either side are different.

193
00:13:23,440 --> 00:13:29,280
And this method, called the McCrary density test, allows you to do that on a histogram, say

194
00:13:29,280 --> 00:13:30,840
of a normal distribution.

195
00:13:30,840 --> 00:13:38,640
And you can use that to measure whether there is a gap, a jump in the density right at the cutoff,

196
00:13:38,640 --> 00:13:45,380
to detect if something is really off.

197
00:13:45,380 --> 00:13:48,500
So that seems like a pretty promising method.

198
00:13:48,500 --> 00:13:51,360
So yeah.
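
The McCrary (2008) test fits local linear regressions to a finely binned histogram on each side of a cutoff and tests for a jump in the log density there. As a much cruder stand-in, and not the McCrary test itself, here is a minimal Python sketch of the kind of check Isaac describes: count results in a narrow window just below and just above a regulatory limit, and apply an exact two-sided binomial test. It assumes the results are a plain list of numbers and that the density is roughly flat inside the window; the function names and the window width are hypothetical choices.

```python
import math
from typing import Sequence, Tuple


def window_counts(values: Sequence[float], cutoff: float, half_width: float) -> Tuple[int, int]:
    """Count observations in [cutoff - half_width, cutoff) and [cutoff, cutoff + half_width)."""
    below = sum(1 for v in values if cutoff - half_width <= v < cutoff)
    above = sum(1 for v in values if cutoff <= v < cutoff + half_width)
    return below, above


def symmetric_binomial_p(k: int, n: int) -> float:
    """Exact two-sided p-value for k successes in n trials under Binomial(n, 0.5).

    Probabilities are computed in log space via lgamma so large n does not underflow.
    """
    if n == 0:
        return 1.0
    tail = min(k, n - k)
    log_half_pow_n = n * math.log(0.5)
    lower_tail = sum(
        math.exp(math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1) + log_half_pow_n)
        for i in range(tail + 1)
    )
    # Binomial(n, 0.5) is symmetric, so the two-sided p-value is twice the smaller tail.
    return min(1.0, 2 * lower_tail)


def bunching_check(values: Sequence[float], limit: float, half_width: float) -> Tuple[int, int, float]:
    """Compare counts just below vs. just above a limit; a small p-value flags an imbalance."""
    below, above = window_counts(values, limit, half_width)
    return below, above, symmetric_binomial_p(below, below + above)
```

A narrow window keeps the flat-density assumption more plausible. If results naturally thin out across the limit anyway, a sloped density alone can produce an imbalance, which is what the McCrary local linear fit is designed to account for.
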

199
00:13:51,360 --> 00:13:55,920
I love it, Isaac, and I love that you bring this topic up today because once again, it

200
00:13:55,920 --> 00:14:03,960
is the exact topic at hand, lab results and statistical distributions.

201
00:14:03,960 --> 00:14:08,000
And if you've got time, I'll be attending.

202
00:14:08,000 --> 00:14:15,120
There is a quality control meeting that's open to the public in Washington state tomorrow.

203
00:14:15,120 --> 00:14:19,320
And these meetings, there's often a lot of people in the industry there, right?

204
00:14:19,320 --> 00:14:24,560
There'll be representatives from the labs and the licensees, but very rarely do you

205
00:14:24,560 --> 00:14:26,040
see people from the public.

206
00:14:26,040 --> 00:14:30,120
So if you were able to attend, that would be a whole new perspective.

207
00:14:30,120 --> 00:14:35,840
And as you mentioned, and this was sort of the topic last week, of course

208
00:14:35,840 --> 00:14:39,840
there's a lot of talk about, you know, THC and CBD.

209
00:14:39,840 --> 00:14:42,840
There's a lot of natural variation there too.

210
00:14:42,840 --> 00:14:49,840
And like you said, of course we do want to identify any fraud.

211
00:14:49,840 --> 00:14:53,320
However, do we only want to look at those analytes?

212
00:14:53,320 --> 00:14:58,720
And as we were pointing out last week, you know, moisture and water activity were looking

213
00:14:58,720 --> 00:15:00,480
a little odd.

214
00:15:00,480 --> 00:15:03,840
And I kind of messed up my Benford's analysis, I realized.
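
Benford's law predicts that in many datasets spanning several orders of magnitude, the leading digit d occurs with probability log10(1 + 1/d), so about 30.1% of values start with a 1. Here is a minimal Python sketch of a first-digit check, assuming the lab results are a plain list of numbers; the function names are hypothetical, and this is not necessarily how either analysis mentioned here was run. Measurements confined to a narrow range, as moisture and water activity values often are, may not follow Benford's law even when nothing is wrong.

```python
import math
from collections import Counter
from typing import Dict, Optional, Sequence

# Expected share of values whose leading significant digit is d under Benford's law.
BENFORD: Dict[int, float] = {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def leading_digit(x: float) -> Optional[int]:
    """Leading significant digit of x (1-9), or None for zero or non-finite values."""
    if x == 0 or not math.isfinite(x):
        return None
    # Scientific notation puts the first significant digit first, e.g. 0.0456 -> '4.56...e-02'.
    return int(format(abs(x), ".14e")[0])


def benford_chi_square(values: Sequence[float]) -> float:
    """Pearson chi-square statistic of observed leading digits against Benford's law."""
    digits = [d for d in map(leading_digit, values) if d is not None]
    n = len(digits)
    if n == 0:
        raise ValueError("no nonzero finite values to test")
    counts = Counter(digits)
    return sum((counts[d] - n * p) ** 2 / (n * p) for d, p in BENFORD.items())
```

With nine digit categories there are 8 degrees of freedom, so a statistic above roughly 15.51 would be flagged at the 5% level.
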

215
00:15:03,840 --> 00:15:06,680
So I'm happy that you repeated it.

216
00:15:06,680 --> 00:15:10,320
And so the moisture and water activity could use another look.

217
00:15:10,320 --> 00:15:16,360
And then as you said, the microbes and the mycotoxins, especially the mycotoxins could

218
00:15:16,360 --> 00:15:19,600
potentially use another look.

219
00:15:19,600 --> 00:15:22,920
Because they may have changed the method in Washington state.

220
00:15:22,920 --> 00:15:27,440
I need to follow it a bit more closely, but as I'm sure you know, there are multiple

221
00:15:27,440 --> 00:15:32,160
ways that you can screen for mycotoxins.

222
00:15:32,160 --> 00:15:39,680
From my understanding, you can use what's called plating, which I think takes maybe

223
00:15:39,680 --> 00:15:47,840
24 to 48 hours and maybe isn't as precise as one may hope.

224
00:15:47,840 --> 00:15:54,000
And then I think people may be moving more towards testing for mycotoxins on a mass spec,

225
00:15:54,000 --> 00:16:03,000
which is faster because I think you can do a mass spec run in maybe under an hour.

226
00:16:03,000 --> 00:16:07,540
But it's more expensive because you have to have a mass spec.

227
00:16:07,540 --> 00:16:11,320
You have to have somebody that can run a mass spec, and then you have to have somebody that can

228
00:16:11,320 --> 00:16:13,320
analyze the results.

229
00:16:13,320 --> 00:16:15,960
So it's a bit more complicated.

230
00:16:15,960 --> 00:16:20,160
So it's interesting that you noted that.

231
00:16:20,160 --> 00:16:26,560
And so it's worth investigating a bit more because the labs and the licensees, they have

232
00:16:26,560 --> 00:16:33,120
their priorities about what they want to talk about, which will probably be that lab testing

233
00:16:33,120 --> 00:16:34,600
is expensive.

234
00:16:34,600 --> 00:16:38,240
However, consumers have their concerns.

235
00:16:38,240 --> 00:16:44,960
And so if we see, say, mycotoxins are being measured oddly, that's a good point to bring

236
00:16:44,960 --> 00:16:45,960
up.

237
00:16:45,960 --> 00:16:48,680
So I like it.

238
00:16:48,680 --> 00:16:58,440
So let's go ahead and dive into the topic at hand.

239
00:16:58,440 --> 00:17:00,120
Well, I'll go ahead and show you.

240
00:17:00,120 --> 00:17:02,520
I got a hold of this book.

241
00:17:02,520 --> 00:17:04,920
I think you can get it for fairly cheap.

242
00:17:04,920 --> 00:17:10,080
This is Sinsemilla Tips, compiled by Tom Alexander.

243
00:17:10,080 --> 00:17:14,480
I got a used copy, I think, for under 10 bucks or so.

244
00:17:14,480 --> 00:17:16,680
So you may be able to find a copy.

245
00:17:16,680 --> 00:17:20,400
This is a surprisingly good find.

246
00:17:20,400 --> 00:17:26,460
So I was expecting just your typical news journal.

247
00:17:26,460 --> 00:17:29,720
But there's actually a good bit of substance to this.

248
00:17:29,720 --> 00:17:32,840
This was published in 1988.

249
00:17:32,840 --> 00:17:39,160
So that's, I don't know when in 1988, but that's going on 35 years ago.

250
00:17:39,160 --> 00:17:42,060
And that was when it was compiled.

251
00:17:42,060 --> 00:17:44,720
So the articles in it could have been written before then.

252
00:17:44,720 --> 00:17:47,640
So this could be over 35 years old.

253
00:17:47,640 --> 00:17:51,080
And so let's see if there's any nuggets of wisdom in there.

254
00:17:51,080 --> 00:17:54,000
Sorry, could you repeat the title again?

255
00:17:54,000 --> 00:17:56,040
I couldn't see the title clearly.

256
00:17:56,040 --> 00:17:57,040
Sure.

257
00:17:57,040 --> 00:18:09,160
I'll make all these links available, but it's Sinsemilla Tips.

258
00:18:09,160 --> 00:18:17,120
And so sinsemilla is, I think, actually a Spanish term, meaning "without seed."

259
00:18:17,120 --> 00:18:23,200
Originally, when people cultivated cannabis, you would just have one

260
00:18:23,200 --> 00:18:28,680
field with both female and male plants.

261
00:18:28,680 --> 00:18:36,120
Well, the male plants would pollinate the female plants, and you'd get flowers filled

262
00:18:36,120 --> 00:18:37,520
with seeds.

263
00:18:37,520 --> 00:18:44,840
And so that's the whole Sublime song, picking out the seeds and stems.

264
00:18:44,840 --> 00:18:51,760
So people used to have to pick out the seeds because they just grew it all together.

265
00:18:51,760 --> 00:19:02,120
And then in the 70s or so, definitely by the 80s, people realized that if you are solely

266
00:19:02,120 --> 00:19:09,600
interested in the flower bud and you pick out the male plants, then it's more

267
00:19:09,600 --> 00:19:10,600
efficient.

268
00:19:10,600 --> 00:19:19,600
You're using water and nutrients and light and space only for the productive female plants,

269
00:19:19,600 --> 00:19:22,960
and they don't get pollinated by the males.

270
00:19:22,960 --> 00:19:31,040
Cannabis is an odd plant, so there still may be an odd seed or two, but there are fewer

271
00:19:31,040 --> 00:19:32,040
seeds.

272
00:19:32,040 --> 00:19:39,000
So that's sinsemilla, a technique that became more and more popular.

273
00:19:39,000 --> 00:19:43,760
So anywho, that's the name of the article.

274
00:19:43,760 --> 00:19:51,200
And this journal was put together by this interesting character, Tom Alexander, who

275
00:19:51,200 --> 00:19:54,240
was growing cannabis in Oregon.

276
00:19:54,240 --> 00:20:05,200
And I think they confiscated his crop, and I think he was facing some criminal penalties.

277
00:20:05,200 --> 00:20:10,840
I believe the case ended up getting dropped for one reason or another.

278
00:20:10,840 --> 00:20:14,160
But then after that, he just went into publishing.

279
00:20:14,160 --> 00:20:20,680
And so he just talked with a bunch of cultivators, growers, and compiled a lot of techniques.

280
00:20:20,680 --> 00:20:23,800
So it's a real interesting look into the past here.

281
00:20:23,800 --> 00:20:25,040
But enough of me rambling on.

282
00:20:25,040 --> 00:20:32,720
Let me actually show you what's valuable about this.

283
00:20:32,720 --> 00:20:38,800
Okay, too cool.

284
00:20:38,800 --> 00:20:42,840
So this is a question that's come up multiple times.

285
00:20:42,840 --> 00:20:46,560
And Isaac, you may have brought this up at one point.

286
00:20:46,560 --> 00:20:57,580
And I think it's pretty well known, but I wanted to go back to the source and really

287
00:20:57,580 --> 00:21:02,520
try to figure out some of the early research that's been done on this.

288
00:21:02,520 --> 00:21:10,400
So okay, so we see a lot of variation in THC and CBD.

289
00:21:10,400 --> 00:21:16,200
And so the question is, well, is that because of environmental factors?

290
00:21:16,200 --> 00:21:19,080
And so that would be, are some cultivators better than others?

291
00:21:19,080 --> 00:21:21,160
Do some people have brighter lights?

292
00:21:21,160 --> 00:21:24,160
Do some people have more nutrients?

293
00:21:24,160 --> 00:21:28,040
Do some people have less disease?

294
00:21:28,040 --> 00:21:29,320
So on and so forth.

295
00:21:29,320 --> 00:21:32,360
So those would be environmental factors.

296
00:21:32,360 --> 00:21:36,980
And then, of course, people are interested in breeding.

297
00:21:36,980 --> 00:21:40,120
And so here we get to the math.

298
00:21:40,120 --> 00:21:47,760
So I'll blast you with some of the math and statistics, and then we'll get to the takeaways.

299
00:21:47,760 --> 00:21:54,320
So sigma, this is a typo.

300
00:21:54,320 --> 00:21:57,520
This should be a sigma squared.

301
00:21:57,520 --> 00:22:02,840
This is the notation for variance.

302
00:22:02,840 --> 00:22:06,760
It's a measure of spread, rather than a measure of central tendency.

303
00:22:06,760 --> 00:22:09,760
So we know about the mean.

304
00:22:09,760 --> 00:22:15,280
So if you've got a bunch of observations, you can calculate the average.

305
00:22:15,280 --> 00:22:21,720
But then you can also calculate how much each of these varies around that average.

306
00:22:21,720 --> 00:22:23,960
So that's your variance.

307
00:22:23,960 --> 00:22:32,800
And so if we're interested in, say, THC, well, presumably, there's some variance

308
00:22:32,800 --> 00:22:35,880
in THC that comes from genetics.

309
00:22:35,880 --> 00:22:46,040
So if you have a good seed, maybe that contributes a certain amount to your variance in THC,

310
00:22:46,040 --> 00:22:49,340
as well as the environment.

311
00:22:49,340 --> 00:22:57,560
So if you have dim lights, one would expect your THC to be a little lower.

312
00:22:57,560 --> 00:23:06,960
Likewise, if you've got poor quality soil, poor nutrients, poor airflow, odd temperatures,

313
00:23:06,960 --> 00:23:13,380
all of these things are your environment and could lead to different THC concentrations.

314
00:23:13,380 --> 00:23:23,160
And as we've seen, we've observed people selecting different breeds of cannabis to go into quite

315
00:23:23,160 --> 00:23:24,720
different directions.

316
00:23:24,720 --> 00:23:28,160
You see people breeding for high THC.

317
00:23:28,160 --> 00:23:34,400
And then you also see people breeding for low THC, people trying to grow the CBD hemp,

318
00:23:34,400 --> 00:23:38,920
who are trying to stay under the 0.3% THC limit.

319
00:23:38,920 --> 00:23:44,840
OK, so you've got genetic variance and environmental variance.

320
00:23:44,840 --> 00:23:54,760
And so we're going to define the heritability of a trait as the share of the total variance that comes from genetic variance.

321
00:23:54,760 --> 00:24:04,520
So the total variance we see is the genetic variance plus the environmental variance.

322
00:24:04,520 --> 00:24:13,840
So we want to know what percentage of all the variance that we see comes from genetics.
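
Written out, with the genetic variance and the environmental variance denoted as below, the definition is a simple ratio. In quantitative genetics this ratio using the total genetic variance is usually called broad-sense heritability (narrow-sense heritability counts only the additive part of the genetic variance), and writing the total as a plain sum assumes no genotype-by-environment interaction or covariance term.

$$
\sigma^2_P = \sigma^2_G + \sigma^2_E, \qquad
H^2 = \frac{\sigma^2_G}{\sigma^2_P} = \frac{\sigma^2_G}{\sigma^2_G + \sigma^2_E}, \qquad
0 \le H^2 \le 1
$$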

323
00:24:13,840 --> 00:24:19,040
Is it a small amount or is it a large amount?

324
00:24:19,040 --> 00:24:22,280
This is really the crux.

325
00:24:22,280 --> 00:24:30,200
Because if genetics doesn't matter, then you just want to get your environment as good

326
00:24:30,200 --> 00:24:31,680
as possible.

327
00:24:31,680 --> 00:24:40,120
Likewise, if genetic variance represents a large proportion of the total variance, that is, if heritability is high, then you'd

328
00:24:40,120 --> 00:24:46,720
be more interested in focusing on breeding and less focused on environment.

329
00:24:46,720 --> 00:24:52,400
So can we disentangle these two?

330
00:24:52,400 --> 00:25:00,760
Also, just pointing this out, this was written by Chief 7 Turtles.

331
00:25:00,760 --> 00:25:08,840
And the turtle is a symbol for Cannlytics because we're sort of the turtle over the

332
00:25:08,840 --> 00:25:09,840
hare.

333
00:25:09,840 --> 00:25:12,920
And so I just thought this was just a funny coincidence.

334
00:25:12,920 --> 00:25:16,040
Anywho, what are some of the key points?

335
00:25:16,040 --> 00:25:18,560
And then I'll get to the actual data.

336
00:25:18,560 --> 00:25:25,520
So some populations have more genetic variation than others.

337
00:25:25,520 --> 00:25:34,960
Likewise, certain traits are more susceptible to genetic variation than others.

338
00:25:34,960 --> 00:25:39,680
Of course, environmental variation depends on your conditions.

339
00:25:39,680 --> 00:25:47,280
The example given in Sinsemilla Tips was that your environmental variation will be different

340
00:25:47,280 --> 00:25:51,160
outdoors than it will be in a greenhouse.

341
00:25:51,160 --> 00:25:57,280
I need to think a little bit harder on that.

342
00:25:57,280 --> 00:26:03,480
But the conditions matter, and also the trait matters as well.

343
00:26:03,480 --> 00:26:09,040
So those two are pretty straightforward and logical.

344
00:26:09,040 --> 00:26:10,400
Those make sense.

345
00:26:10,400 --> 00:26:14,240
Okay, well, why am I droning on about this?

346
00:26:14,240 --> 00:26:21,640
Well, heritability is actually a difficult topic to study.

347
00:26:21,640 --> 00:26:25,560
And it's normally infeasible to separate the genetic and environmental variance.

348
00:26:25,560 --> 00:26:33,960
What's interesting about cannabis is it's often grown with clones.

349
00:26:33,960 --> 00:26:37,640
So people will clone cannabis.

350
00:26:37,640 --> 00:26:40,120
It's pretty typical.

351
00:26:40,120 --> 00:26:48,160
Well, when you make clones, there's going to be no genetic variation between them.

352
00:26:48,160 --> 00:26:58,400
All the variation you'll see among the clones is environmental variation.

353
00:26:58,400 --> 00:27:12,060
So the idea is you can see the total variation in a crop of plants, and you can subtract

354
00:27:12,060 --> 00:27:17,120
away the variation from the clones.

355
00:27:17,120 --> 00:27:22,800
And what you're left with is the genetic variation.

356
00:27:22,800 --> 00:27:30,160
And then once you're left with the genetic variation and you know the environmental variation,

357
00:27:30,160 --> 00:27:33,560
you can calculate the heritability.
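In symbols, this is broad-sense heritability, H² = V_G / V_P, where V_P is the phenotypic variance (the seed-grown plants), V_E is the environmental variance (the clones), and V_G = V_P - V_E. A minimal Python sketch of that arithmetic, assuming the two variances have already been estimated; the function name is hypothetical and not taken from Sinsemilla Tips:

```python
def broad_sense_heritability(var_phenotypic: float, var_environmental: float) -> float:
    """Return H^2 = V_G / V_P, with V_G = V_P - V_E.

    var_phenotypic: variance of seed-grown plants (genetics + environment).
    var_environmental: variance of clones (environment only).
    """
    if var_phenotypic <= 0:
        raise ValueError("phenotypic variance must be positive")
    var_genetic = var_phenotypic - var_environmental
    # Falls between 0 and 1 only when 0 <= V_E <= V_P; with small samples
    # the clones can vary more than the seeds, giving a negative estimate.
    return var_genetic / var_phenotypic


# Made-up inputs for illustration only.
print(broad_sense_heritability(4.0, 1.0))  # 0.75
```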

358
00:27:33,560 --> 00:27:42,100
So, long story short, heritability can range from zero, meaning the trait isn't inherited at all,

359
00:27:42,100 --> 00:27:43,360
to one.

360
00:27:43,360 --> 00:27:47,000
At one,

361
00:27:47,000 --> 00:27:52,540
the genetics explain 100% of the variability in that trait.

362
00:27:52,540 --> 00:28:01,600
So what we're interested in is: is the heritability of THC closer to zero, or is it closer

363
00:28:01,600 --> 00:28:07,080
to one?

364
00:28:07,080 --> 00:28:09,580
So that is sort of the question at hand.

365
00:28:09,580 --> 00:28:14,840
And if this is getting a little abstract, then don't worry because we're going to get

366
00:28:14,840 --> 00:28:16,480
our hands on the data.

367
00:28:16,480 --> 00:28:25,920
And I should have committed this to the repository beforehand, but I'll commit this right afterwards.

368
00:28:25,920 --> 00:28:37,920
So in this book, Sinsemilla Tips, there's a table where they have two different plots,

369
00:28:37,920 --> 00:28:48,440
plot one and plot two, and in each plot, they have some plants that were grown from seed

370
00:28:48,440 --> 00:28:57,240
and some plants that were grown from clone, and they were measuring the THC concentration

371
00:28:57,240 --> 00:28:59,520
of these plants.

372
00:28:59,520 --> 00:29:06,280
As I was saying last week, I think cultivators have been keeping track

373
00:29:06,280 --> 00:29:09,320
of THC concentration for a long time.

374
00:29:09,320 --> 00:29:16,800
Well, it turns out they've been keeping track of THC concentration perhaps for more than

375
00:29:16,800 --> 00:29:21,800
35 years, which is wild to think about.

376
00:29:21,800 --> 00:29:23,800
So let's actually...

377
00:29:23,800 --> 00:29:24,800
Oops.

378
00:29:24,800 --> 00:29:25,800
Oops.

379
00:29:25,800 --> 00:29:30,200
Let's not give everything away.

380
00:29:30,200 --> 00:29:40,240
So first things first, just going to read this data in, and just to show you, this is

381
00:29:40,240 --> 00:29:42,360
all the data, right?

382
00:29:42,360 --> 00:29:52,280
This is what we were just looking at in Excel; I'm just going to print it out to the console, and then we'll

383
00:29:52,280 --> 00:29:58,280
essentially repeat the analysis done in Sinsemilla Tips.

384
00:29:58,280 --> 00:30:12,080
I wasn't sure whether I could just take a picture of their table.

385
00:30:12,080 --> 00:30:19,400
I decided not to, but maybe it would have been a helpful benchmark.

386
00:30:19,400 --> 00:30:29,320
So basically, one idea of the scientific process is reproducibility.

387
00:30:29,320 --> 00:30:33,640
And for statistics, this becomes really interesting.

388
00:30:33,640 --> 00:30:42,960
It's actually non-trivial to repeat someone's statistics, and it's often a good exercise.

389
00:30:42,960 --> 00:30:48,560
So if somebody publishes a paper, go back through, get hold of the data, and try to

390
00:30:48,560 --> 00:30:56,200
follow the steps they took to reproduce the same statistics, because you'll have to follow

391
00:30:56,200 --> 00:30:59,560
through all the assumptions, and it can be difficult.

392
00:30:59,560 --> 00:31:01,640
So anywho, we've got this data here.

393
00:31:01,640 --> 00:31:03,240
Well, let's start looking at it.

394
00:31:03,240 --> 00:31:05,680
So we've got seeds and clones.

395
00:31:05,680 --> 00:31:13,840
So if we just look at the seeds, these were cannabis plants grown in two different plots

396
00:31:13,840 --> 00:31:15,420
by seed.

397
00:31:15,420 --> 00:31:28,160
And so as you can see, back in 1988, you've got an average THC of around 8%.

398
00:31:28,160 --> 00:31:35,680
So the average THC has more than doubled in the past 35 years.

399
00:31:35,680 --> 00:31:38,640
So that's an observation.

400
00:31:38,640 --> 00:31:43,080
Next we can look at the distribution of clones.

401
00:31:43,080 --> 00:31:47,160
And so these are clones in two different plots.

402
00:31:47,160 --> 00:31:57,120
And this would be environmental variance between these two different plots.

403
00:31:57,120 --> 00:32:03,200
There's no genetic variation between clones.

404
00:32:03,200 --> 00:32:12,280
So any variation in the distribution of the clones will be explainable

405
00:32:12,280 --> 00:32:14,600
from environment.

406
00:32:14,600 --> 00:32:26,520
So if you did the same study that Chief Seven Turtles conducted, then you would see, oh,

407
00:32:26,520 --> 00:32:35,120
plot one is the better plot for growing clones than plot two.

408
00:32:35,120 --> 00:32:37,440
And that's purely environmental.

409
00:32:37,440 --> 00:32:39,360
Cool.

410
00:32:39,360 --> 00:32:50,080
So now if you plot the seeds versus the clones, it becomes quite interesting.

411
00:32:50,080 --> 00:32:58,120
So here you see the distribution of plants grown from seed, and the distribution of plants grown from clone.

412
00:32:58,120 --> 00:33:08,520
What's readily apparent is that the variance is much smaller for plants grown from clone than for

413
00:33:08,520 --> 00:33:12,120
plants grown from seed.

414
00:33:12,120 --> 00:33:18,480
The mean is lower, but that may just be the clone that was selected.

415
00:33:18,480 --> 00:33:26,720
So for example, if you just selected a random plant from this distribution as your clone,

416
00:33:26,720 --> 00:33:33,720
maybe they selected a seed-grown plant from this end of the distribution.

417
00:33:33,720 --> 00:33:35,520
Cool.

418
00:33:35,520 --> 00:33:45,080
So now we can actually calculate heritability the same way that Tom Alexander did.

419
00:33:45,080 --> 00:33:49,520
The way they did it, they actually used standard deviation.

420
00:33:49,520 --> 00:33:55,920
And so I'll show you this with standard deviation, and then I'll go back to variance.

421
00:33:55,920 --> 00:34:06,320
But basically, they also calculated the variation grouped by plot.

422
00:34:06,320 --> 00:34:13,720
And so this is what's so interesting about reproducing statistics, because all they have

423
00:34:13,720 --> 00:34:19,320
is a table with the means and variances from their calculations.

424
00:34:19,320 --> 00:34:27,320
And so it takes reproducing it to realize that, OK, when they calculated the standard

425
00:34:27,320 --> 00:34:35,320
deviation, they did this by grouping it by plot, and then they took the average.
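That grouping step can be sketched in pandas. This assumes the table has been loaded into a DataFrame with hypothetical columns `propagation` ("seed" or "clone"), `plot`, and `thc` (percent); those names are not from the original table. It also assumes the sample standard deviation (`ddof=1`, the pandas default), which may not be what the original authors used.

```python
import pandas as pd


def mean_within_plot_sd(df: pd.DataFrame) -> pd.Series:
    """Standard deviation of THC within each plot, averaged over plots,
    reported separately for seed-grown and clone-grown plants."""
    # One SD per (propagation, plot) pair; pandas uses ddof=1 by default.
    per_plot = df.groupby(["propagation", "plot"])["thc"].std()
    # Average the per-plot SDs within each propagation method.
    return per_plot.groupby(level="propagation").mean()


# Made-up numbers, not the Sinsemilla Tips data.
toy = pd.DataFrame({
    "propagation": ["seed"] * 4 + ["clone"] * 4,
    "plot": [1, 1, 2, 2, 1, 1, 2, 2],
    "thc": [6.0, 10.0, 5.0, 9.0, 7.0, 8.0, 6.0, 7.0],
})
print(mean_within_plot_sd(toy))
```

Averaging the within-plot SDs keeps the difference between plot means out of the figure; pooling all plants together would fold that plot effect into the spread instead.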

426
00:34:35,320 --> 00:34:45,720
So basically, they took the standard deviation of each of these two plots, which, oddly, is

427
00:34:45,720 --> 00:34:48,320
the same, but it is.

428
00:34:48,320 --> 00:34:55,320
It could be just the nature of numbers.

429
00:34:55,320 --> 00:34:59,320
Anywho, you've got the standard deviation of clones.

430
00:34:59,320 --> 00:35:04,320
And as you can see, the standard deviation of seeds is higher.

431
00:35:04,320 --> 00:35:15,320
So basically, the argument is all of the variation from clones is environmental variation.

432
00:35:15,320 --> 00:35:25,320
So that's how you get the environmental variation of 0.79% THC.

433
00:35:25,320 --> 00:35:30,320
And then they say, OK, well, what's the phenotypic variation?

434
00:35:30,320 --> 00:35:36,320
And so the phenotypic variation, well, that's environment plus genetics.

435
00:35:36,320 --> 00:35:43,320
And so they say, oh, we'll just get that by taking the standard deviation of the seeds.

436
00:35:43,320 --> 00:35:50,320
And we'll call that our phenotypic variation, because that's going to be all of the environmental

437
00:35:50,320 --> 00:35:59,320
variation that the seeds experience plus the genetic variation of the seeds.

438
00:35:59,320 --> 00:36:00,320
Cool.

439
00:36:00,320 --> 00:36:07,320
And so then they say, oh, well, we can just calculate the genetic variation by subtracting

440
00:36:07,320 --> 00:36:13,320
the environmental variation from the phenotypic variation.

441
00:36:13,320 --> 00:36:14,320
Cool.

442
00:36:14,320 --> 00:36:22,320
So now they're saying the genetic variation is around 1.27% THC.

443
00:36:22,320 --> 00:36:26,320
And so you can now get a measure of heritability.

444
00:36:26,320 --> 00:36:28,320
Drum roll.

445
00:36:28,320 --> 00:36:37,320
So you're interested in, on a scale of 0 to 1, how heritable is THC?

446
00:36:37,320 --> 00:36:42,320
Dun-dun-dun-dun.

447
00:36:42,320 --> 00:36:50,320
Chief Seven Turtles calculated the heritability of THC in cannabis

448
00:36:50,320 --> 00:36:57,320
to be approximately 0.62, or 62%, which is quite high.
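The arithmetic behind the 0.62 can be checked from the two figures quoted above. A sketch, assuming the phenotypic standard deviation is simply the sum 0.79 + 1.27 = 2.06, which is implied by the subtraction step rather than read off the original table:

```python
# Figures quoted in the talk, in percent THC.
sd_environmental = 0.79  # mean within-plot SD of the clones
sd_genetic = 1.27        # phenotypic SD minus environmental SD

# Implied phenotypic SD of the seed-grown plants.
sd_phenotypic = sd_environmental + sd_genetic

# Heritability computed on the standard-deviation scale, as in the article.
h_sd = sd_genetic / sd_phenotypic
print(round(sd_phenotypic, 2), round(h_sd, 2))  # 2.06 0.62
```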

449
00:36:57,320 --> 00:37:06,320
So this would indicate that THC, and I guess you could argue about what counts as strong,

450
00:37:06,320 --> 00:37:10,320
but in my opinion, is strongly heritable.

451
00:37:10,320 --> 00:37:22,320
And so this would explain, okay, people figured out in 1988 that you can control

452
00:37:22,320 --> 00:37:27,320
variation in THC through genetics.

453
00:37:27,320 --> 00:37:33,320
So, for example, going back to our chart here,

454
00:37:33,320 --> 00:37:45,320
this now explains how people have been able to grow both low-THC CBD cannabis

455
00:37:45,320 --> 00:37:52,320
and high THC cannabis because it's strongly heritable.

456
00:37:52,320 --> 00:37:59,320
So, for example, this clone would actually be a good selection if you wanted to move

457
00:37:59,320 --> 00:38:02,320
in the low THC direction.

458
00:38:02,320 --> 00:38:11,320
And so the idea would be you'd grow a bunch of seeds and you would pick one from the low end

459
00:38:11,320 --> 00:38:17,320
of the distribution and use that one for breeding and just repeat the process over

460
00:38:17,320 --> 00:38:24,320
and over again and try to push this distribution towards zero.

461
00:38:24,320 --> 00:38:33,320
Conversely, if you're interested in high THC, you would breed the plants

462
00:38:33,320 --> 00:38:42,320
that tested at high concentrations and just keep doing selective breeding.

463
00:38:42,320 --> 00:38:53,320
Quick note: technically, the formula for heritability uses variances, not standard deviations.

464
00:38:53,320 --> 00:39:04,320
And for whatever reason, they chose to use standard deviation.

465
00:39:04,320 --> 00:39:10,320
I don't know if this is just because statistics has kind of evolved or people's understanding

466
00:39:10,320 --> 00:39:13,320
of statistics has kind of evolved in the past 35 years.

467
00:39:13,320 --> 00:39:19,320
I mean, statistics as a whole is a relatively new field in the grand scheme of things,

468
00:39:19,320 --> 00:39:29,320
maybe the past 150 years or so, and that's maybe just when it began, not when it became rigorous,

469
00:39:29,320 --> 00:39:36,320
and then it became more rigorous later, I don't know exactly when. I'll quit conjecturing.

470
00:39:36,320 --> 00:39:46,320
Long story short, if we did use variance, then we would get a higher measure

471
00:39:46,320 --> 00:39:47,320
of heritability.

472
00:39:47,320 --> 00:39:56,320
So if we used variance, we would measure heritability at almost 85%, which seems too high to me.
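Squaring the same figures reproduces the "almost 85%". Variances of independent components add (V_P = V_G + V_E) while standard deviations do not, which is why the textbook formula works with variances. A sketch, assuming the plot-averaged SDs quoted above can simply be squared; averaging the per-plot variances instead could give a slightly different number:

```python
sd_environmental = 0.79          # percent THC, clones
sd_phenotypic = 0.79 + 1.27      # percent THC, seeds (implied sum)

var_environmental = sd_environmental ** 2
var_phenotypic = sd_phenotypic ** 2
var_genetic = var_phenotypic - var_environmental

# Variance-based broad-sense heritability.
h2 = var_genetic / var_phenotypic
print(round(h2, 3))  # 0.853
```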

473
00:39:56,320 --> 00:40:02,320
But I also want to point out a couple shortcomings real quick.

474
00:40:02,320 --> 00:40:06,320
One, this is a statistic.

475
00:40:06,320 --> 00:40:17,320
So our estimation of heritability will become better as sample size increases.
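To see why sample size matters, one can simulate the experiment many times and look at how much the estimate moves. This sketch assumes normally distributed THC with true standard deviations set to the figures above (0.79 for clones, 2.06 for seeds), equal group sizes, and no plot structure; all of these are simplifying assumptions, not properties of the original data.

```python
import numpy as np


def simulate_h2(n_per_group: int, sd_env: float = 0.79, sd_pheno: float = 2.06,
                n_reps: int = 2000, seed: int = 0) -> np.ndarray:
    """Repeat a seed-vs-clone experiment n_reps times and return the
    variance-based heritability estimate from each repetition."""
    rng = np.random.default_rng(seed)
    estimates = np.empty(n_reps)
    for i in range(n_reps):
        clones = rng.normal(0.0, sd_env, size=n_per_group)   # environment only
        seeds = rng.normal(0.0, sd_pheno, size=n_per_group)  # genetics + environment
        v_e = clones.var(ddof=1)
        v_p = seeds.var(ddof=1)
        estimates[i] = (v_p - v_e) / v_p
    return estimates


for n in (20, 200):
    est = simulate_h2(n)
    # Sample size, mean estimate, and spread of the estimates.
    print(n, round(est.mean(), 3), round(est.std(), 3))
```

With 20 plants per group, as in the original table, the estimates scatter noticeably more than with 200.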

476
00:40:17,320 --> 00:40:28,320
So here we have a sample size of 20 plants grown by seed and 20 plants grown by clone.

477
00:40:28,320 --> 00:40:33,320
So we could potentially use a bigger sample size.

478
00:40:33,320 --> 00:40:41,320
Also, as the author points out, measurement matters.

479
00:40:41,320 --> 00:40:49,320
And they say that you can view measurement error as an environmental variation, essentially,

480
00:40:49,320 --> 00:40:59,320
but they stress that you really want to measure as accurately and as consistently as possible.

481
00:40:59,320 --> 00:41:07,320
And then also, if you really do want to try to disentangle environmental variation from genetic variation,

482
00:41:07,320 --> 00:41:13,320
then you want to try to have as uniform an environment as possible.

483
00:41:13,320 --> 00:41:27,320
This was just a quick exercise demonstrating how you can measure heritability of traits in cannabis.

484
00:41:27,320 --> 00:41:32,320
And remember, this will vary by population.

485
00:41:32,320 --> 00:41:37,320
So that means it may vary by strain population.

486
00:41:37,320 --> 00:41:42,320
I mean, it's not out of the realm of possibility. You'd have to do some investigation.

487
00:41:42,320 --> 00:41:49,320
But potentially, different strains may have different rates of heritability.

488
00:41:49,320 --> 00:41:54,320
Different traits may have different rates of heritability.

489
00:41:54,320 --> 00:42:01,320
So, for example, CBD may have a different rate of heritability than THC.

490
00:42:01,320 --> 00:42:04,320
And there's other factors people select for.

491
00:42:04,320 --> 00:42:16,320
So right out of the gate, they mention that people are selecting for yield, for height, and for how many days it takes to flower.

492
00:42:16,320 --> 00:42:19,320
There's many factors that people are selecting for.

493
00:42:19,320 --> 00:42:24,320
So you'll have to look at heritability for each one.

494
00:42:24,320 --> 00:42:37,320
But this lays the framework for how you can apply statistics to be a better breeder.

495
00:42:37,320 --> 00:42:43,320
So this could apply to home growers or large-scale cultivators.

496
00:42:43,320 --> 00:42:51,320
So I've been rambling long enough. I'll let you all ask any questions.

497
00:42:51,320 --> 00:42:56,320
And a teaser for next week. This is what I was going to do this week.

498
00:42:56,320 --> 00:42:59,320
But once again, lots of material.

499
00:42:59,320 --> 00:43:04,320
And so we wanted to take this quick tangent on heritability.

500
00:43:04,320 --> 00:43:16,320
For next week, see if you can find the author of this quote, or the person it can be attributed to.

501
00:43:16,320 --> 00:43:27,320
But in relation to statistics, the idea is you don't want to try too hard.

502
00:43:27,320 --> 00:43:31,320
If you do, you may get spurious causation.

503
00:43:31,320 --> 00:43:40,320
And so, obviously, I started a company, Cannlytics.

504
00:43:40,320 --> 00:43:45,320
So we're interested in the cannabis space.

505
00:43:45,320 --> 00:43:53,320
As you're researching cannabis, it would be easy to let your biases take over, so you have to be careful.

506
00:43:53,320 --> 00:44:05,320
So, for example, if you really, really did want to see a medicinal effect from CBD, that may bias your research.

507
00:44:05,320 --> 00:44:15,320
And so if there is just a readily apparent effect from CBD, then there may be something there that's worth looking at more.

508
00:44:15,320 --> 00:44:25,320
But the idea is, if there isn't a readily apparent signal above the noise, you may want to be cautious.

509
00:44:25,320 --> 00:44:27,320
It doesn't mean don't proceed.

510
00:44:27,320 --> 00:44:39,320
You just may want to be cautious about biases, spurious causation, overfitting your model.

511
00:44:39,320 --> 00:44:49,320
So how could we apply that lesson to today's example?

512
00:44:49,320 --> 00:44:56,320
Well, if you really wanted to disentangle the environmental variation,

513
00:44:56,320 --> 00:45:07,320
there may be something going on between plot one and plot two, but it may be a stretch.

514
00:45:07,320 --> 00:45:18,320
However, it does seem readily apparent that there are differences in variance between seeds and clones.

515
00:45:18,320 --> 00:45:30,320
You've got a small sample size, 40 total observations, and already you can start to see a difference between the two.

516
00:45:30,320 --> 00:45:34,320
So that may warrant further investigation.

517
00:45:34,320 --> 00:45:40,320
So that's just sort of a word of caution as you're using statistics.

518
00:45:40,320 --> 00:45:50,320
And so if you're interested in this, then try to find the author behind this, because I'll at least continue to talk about this.

519
00:45:50,320 --> 00:46:06,320
And if you're not interested, then please let me know what sparks your interest as we continue to search for the genetic histories behind some of these cannabis strains.

520
00:46:06,320 --> 00:46:15,320
Anywho, that's what I had to share with you all today, my latest journey.

521
00:46:15,320 --> 00:46:21,320
So my main project is compiling cannabis data from Washington state.

522
00:46:21,320 --> 00:46:37,320
And so, Michelle, this may be of direct relevance to you, because you could start to do really interesting population-level statistics there.

523
00:46:37,320 --> 00:46:40,320
So there's lots more cool data coming down the pipeline.

524
00:46:40,320 --> 00:46:46,320
But real quick, does anyone have any thoughts, comments, questions from the material on heritability?

525
00:46:46,320 --> 00:47:07,320
Yeah, I can go first. I mean, I'm just wondering if there's anything we can do with more modern data, because the environmental variables themselves could also be different.

526
00:47:07,320 --> 00:47:19,320
So I'm just thinking how the experiment can be improved with modern data to get a better measurement.

527
00:47:19,320 --> 00:47:27,320
Maybe we can even get a constant that's applied to most cannabis plants as one number, say 0.7.

528
00:47:27,320 --> 00:47:32,320
And then that number as an anchor point will allow us to do more analysis.

529
00:47:32,320 --> 00:47:38,320
But yeah, I agree 100 percent, Isaac.

530
00:47:38,320 --> 00:47:46,320
In fact, I would encourage you to, and I'm going to be setting up a lot more work on heritability now that we've figured this out.

531
00:47:46,320 --> 00:47:50,320
I think there are more sophisticated ways to estimate heritability.

532
00:47:50,320 --> 00:48:01,320
I think what you're talking about is more of a regression based approach where you'll want to control for more factors and, of course, increase your sample size.

533
00:48:01,320 --> 00:48:06,320
So I think the regression-based approach may be the modern-day approach.

534
00:48:06,320 --> 00:48:11,320
Remember, this was research that was done 35 years ago.

535
00:48:11,320 --> 00:48:15,320
So I think there are more sophisticated methods readily available.

536
00:48:15,320 --> 00:48:21,320
And I mean, for all we know, they may have been calculating these means and variances with pen and paper.

537
00:48:21,320 --> 00:48:24,320
So we can do a lot better job today.

538
00:48:24,320 --> 00:48:37,320
So, for example, just by loading the data into Python, right, I'm able to do all these groupings and calculate means and variances and standard deviations really quickly.

539
00:48:37,320 --> 00:48:43,320
So it was really quick for me to figure out, oh, they used standard deviation and not variance.

540
00:48:43,320 --> 00:48:47,320
So we can work quicker, work smarter.

541
00:48:47,320 --> 00:48:51,320
I think it's just a starting point.

542
00:48:51,320 --> 00:49:09,320
And then the other part you raised about the environment, I think that does need to be taken into consideration because the way they went about identifying the environmental variation and then subtracting it from the variation of the seeds,

543
00:49:09,320 --> 00:49:12,320
it seems a little ad hoc.

544
00:49:12,320 --> 00:49:20,320
So this may be an ad hoc method for calculating or estimating heritability.

545
00:49:20,320 --> 00:49:31,320
So this may be a starting point, just to get an idea of whether heritability is closer to zero or closer to one.

546
00:49:31,320 --> 00:49:37,320
So this got us an idea that, OK, there may be a significant amount of heritability there.

547
00:49:37,320 --> 00:49:42,320
And so, as you said, I think a more sophisticated analysis is needed.

548
00:49:42,320 --> 00:49:46,320
People are going to want to see a lot more control of the environment.

549
00:49:46,320 --> 00:49:53,320
That's a big criticism that people give to cannabis research.

550
00:49:53,320 --> 00:49:56,320
It's inevitable. We've talked about this before.

551
00:49:56,320 --> 00:50:00,320
You can't control for every factor under the sun.

552
00:50:00,320 --> 00:50:08,320
Right. If you controlled for everything and you did it in Massachusetts, someone will say, well, why didn't you do that same study in Washington state?

553
00:50:08,320 --> 00:50:12,320
You know, the air pressure is different, and so on and so forth.

554
00:50:12,320 --> 00:50:18,320
So there's always going to be something. But as you said, just the more things you control for the better.

555
00:50:18,320 --> 00:50:22,320
That gets us to sort of the lesson of the day.

556
00:50:22,320 --> 00:50:28,320
Controlling for more things should get you a more accurate and consistent measure.

557
00:50:28,320 --> 00:50:38,320
But hopefully the signal is still relatively apparent, even if you, say, make a mistake or two or don't control for everything.

558
00:50:38,320 --> 00:50:41,320
I love it, Isaac. And so I'll continue thinking about this.

559
00:50:41,320 --> 00:50:51,320
The main thing I'm working on is the Washington State data. I almost had hold of some Oregon data; that's still coming down the pipeline.

560
00:50:51,320 --> 00:50:55,320
And that may come through, as well as Massachusetts and Florida data.

561
00:50:55,320 --> 00:50:58,320
So hopefully those things come.

562
00:50:58,320 --> 00:51:03,320
OK, what about now? Awesome. Awesome. I can hear you. Love it.

563
00:51:03,320 --> 00:51:18,320
I'm talking from a grower's perspective about these levels. Basically, I was growing the same strain outdoors, in a greenhouse, and indoors.

564
00:51:18,320 --> 00:51:23,320
And it was giving very different levels of THC.

565
00:51:23,320 --> 00:51:33,320
So the environment variance is very important in growing. Also, the seeds were growing faster than the clones.

566
00:51:33,320 --> 00:51:41,320
I'm not sure why they develop faster. It just looks like they have better genetics than the clones.

567
00:51:41,320 --> 00:51:46,320
And that's what I observed. OK, so there's a lot going on here.

568
00:51:46,320 --> 00:51:52,320
One thing I want to hit on, which I don't know if a lot of people mention, has come up in my research.

569
00:51:52,320 --> 00:51:59,320
From my understanding, seeds have tap roots, whereas clones do not.

570
00:51:59,320 --> 00:52:05,320
And to me, that makes it seem like that would have a big impact.

571
00:52:05,320 --> 00:52:09,320
It seems to me like the tap root would be important.

572
00:52:09,320 --> 00:52:13,320
So that may explain why they would grow faster.

573
00:52:13,320 --> 00:52:22,320
Technically, the clones are genetically identical to where they were cut from.

574
00:52:22,320 --> 00:52:32,320
So if they did grow differently, one could maybe chalk that up to structural differences.

575
00:52:32,320 --> 00:52:37,320
They don't have a tap root. Also, there may be environment there.

576
00:52:37,320 --> 00:52:44,320
Also, it depends on where on the mother plant you cut the clones.

577
00:52:44,320 --> 00:52:57,320
If you cut more than 30 percent of the plant, they say, and I also noticed that the clones can become hermaphrodites.

578
00:52:57,320 --> 00:53:08,320
So they can just change. But also, we didn't talk about the nutrients that you put into the soil.

579
00:53:08,320 --> 00:53:15,320
So that's also very important for the plant to develop THC.

580
00:53:15,320 --> 00:53:28,320
Also, the timing of when you add certain things, like phosphorus or calcium or things like that.

581
00:53:28,320 --> 00:53:30,320
That's also very important.

582
00:53:30,320 --> 00:53:35,320
A hundred percent. This is the tricky part.

583
00:53:35,320 --> 00:53:40,320
Because there's variation in THC and obviously there's some genetics.

584
00:53:40,320 --> 00:53:47,320
That's how people have been able to breed over time. But even if, say, the genetics is 60 percent,

585
00:53:47,320 --> 00:53:56,320
that's still 40 percent, give or take, of the variation that's explained by your environment.

586
00:53:56,320 --> 00:54:03,320
And that will be things like whether it was indoor or outdoor, because those are two different environments.

587
00:54:03,320 --> 00:54:10,320
The amount of nutrients the plant gets, that's an environmental factor.

588
00:54:10,320 --> 00:54:14,320
So those will all affect the environmental variation.

589
00:54:14,320 --> 00:54:20,320
And this matters when you're cultivating, right, because you want to grow top notch flower.

590
00:54:20,320 --> 00:54:33,320
The idea is, and this may be obvious,

591
00:54:33,320 --> 00:54:43,320
you can't just select by clone to get better and better cannabis, because it's going to be genetically identical.

592
00:54:43,320 --> 00:54:50,320
Basically, the point is that in order to increase THC levels over time,

593
00:54:50,320 --> 00:54:55,320
you would have to approach that genetically.

594
00:54:55,320 --> 00:55:03,320
It wouldn't be possible to just select clones that test at higher and higher percentages.

595
00:55:03,320 --> 00:55:13,320
And then this is sort of the interesting thing is the seed may have genetics to produce high THC,

596
00:55:13,320 --> 00:55:21,320
but say, because of the environment, it may just have a lot of environmental variation.

597
00:55:21,320 --> 00:55:25,320
So it doesn't look like a high THC plant.

598
00:55:25,320 --> 00:55:32,320
And this is my argument about why California may have such prized genetics,

599
00:55:32,320 --> 00:55:37,320
because I think they're growing outside by seed.

600
00:55:37,320 --> 00:55:44,320
So these plants grown from seed may not grow into the best plants ever, right?

601
00:55:44,320 --> 00:55:51,320
They may be getting eaten by bugs and they're subject to all the elements.

602
00:55:51,320 --> 00:55:56,320
But people are then able to select better and better varieties.

603
00:55:56,320 --> 00:56:05,320
So if you, say, took a clone from those plants and grew it in a warehouse,

604
00:56:05,320 --> 00:56:09,320
you may be able to get higher THC on average.

605
00:56:09,320 --> 00:56:18,320
But over time, the people outdoors selecting and selecting and selecting will be able to increase the average.

606
00:56:18,320 --> 00:56:20,320
And then Noah, you had a question?

607
00:56:20,320 --> 00:56:26,320
Yeah, it was kind of related to that. Have there been any studies yet on like epigenetic factors?

608
00:56:26,320 --> 00:56:32,320
I'm curious whether that's even in effect here, or I don't know.

609
00:56:32,320 --> 00:56:36,320
So this is bleeding edge.

610
00:56:36,320 --> 00:56:39,320
I can only report back hearsay.

611
00:56:39,320 --> 00:56:49,320
As you may know, LinkedIn is a good place for people in the cannabis industry to socialize.

612
00:56:49,320 --> 00:57:00,320
And I've actually heard talk around town that, yes, potentially cannabis may be especially susceptible to epigenetic change.

613
00:57:00,320 --> 00:57:04,320
But this is really a new frontier.

614
00:57:04,320 --> 00:57:10,320
And I don't know how much I want to stray into it because I'm a super novice.

615
00:57:10,320 --> 00:57:13,320
I wouldn't even consider myself a biologist or a chemist.

616
00:57:13,320 --> 00:57:22,320
So actual biologists and chemists would probably frown upon me. So I kind of want to stay as humble as possible.

617
00:57:22,320 --> 00:57:26,320
But Michelle, you had a thought or question?

618
00:57:26,320 --> 00:57:34,320
Yeah, I was just wondering whether or not given that so many states are legalizing cannabis now, whether or not commercial growers,

619
00:57:34,320 --> 00:57:41,320
I know that they're tracking their THC levels, their CBD levels, and so on.

620
00:57:41,320 --> 00:57:46,320
Are they required to report that data so that there's some sort of database that can be accessed?

621
00:57:46,320 --> 00:57:52,320
Yes. We have that data, for the entire population in Washington State.

622
00:57:52,320 --> 00:58:00,320
And so that's what's really cool. We may even be able to parse out whether things came from clone or seed.

623
00:58:00,320 --> 00:58:07,320
I don't know how well that's recorded in the traceability system, but we do have plant data.

624
00:58:07,320 --> 00:58:14,320
And all they were keeping track of were THC levels. And we have those.

625
00:58:14,320 --> 00:58:22,320
So we've got THC levels, CBD levels, and we may even have the propagation.

626
00:58:22,320 --> 00:58:25,320
We just may not have the genetic lineage.

627
00:58:25,320 --> 00:58:33,320
So there may be work we can do with the data we have right now in Washington State.

628
00:58:33,320 --> 00:58:38,320
So let's keep talking about this because I think there is work we can do, Michelle.

629
00:58:38,320 --> 00:58:41,320
Noah, any further thoughts?

630
00:58:41,320 --> 00:58:49,320
Yeah, just following up on what Michelle said, I've heard anecdotal reports that some of those THC test results,

631
00:58:49,320 --> 00:58:53,320
at least the ones reported on labels, can be dubious.

632
00:58:53,320 --> 00:59:01,320
How do we know that these numbers being reported by the growers or the cultivators are accurate

633
00:59:01,320 --> 00:59:07,320
and aren't just kind of being, you know, torturing the numbers to kind of get high THC or low THC or whatever?

634
00:59:07,320 --> 00:59:13,320
Perfectly reasonable question. And the short answer is that's measurement error.

635
00:59:13,320 --> 00:59:19,320
So this is specifically, you know, even way back, I keep pointing this out,

636
00:59:19,320 --> 00:59:26,320
but it just blows my mind how these topics were on people's minds 35 years ago.

637
00:59:26,320 --> 00:59:32,320
And exactly: if you have inaccurate lab results,

638
00:59:32,320 --> 00:59:38,320
you're not going to be able to parse out the genetic variation as well.

639
00:59:38,320 --> 00:59:49,320
So disentangling environmental variation and genetic variation requires accurate and consistent testing.

640
00:59:49,320 --> 00:59:56,320
So that's why this is important, Isaac. I love that you're working on this.

641
00:59:56,320 --> 01:00:02,320
And that's why, you know, it is important for labs to measure accurately and consistently.

642
01:00:02,320 --> 01:00:05,320
It helps the breeders in the long run.

643
01:00:05,320 --> 01:00:10,320
Unfortunately, we may just have to anticipate that there is some measurement error.

644
01:00:10,320 --> 01:00:15,320
What I would say is, so for example, Isaac or anyone else studying this,

645
01:00:15,320 --> 01:00:19,320
that can be a condition that you try to control for.

646
01:00:19,320 --> 01:00:25,320
So say you were going to replicate their study at scale.

647
01:00:25,320 --> 01:00:30,320
Well, maybe one of the conditions you could control for would be laboratory.

648
01:00:30,320 --> 01:00:35,320
So maybe you would just grow all these plants from seed and clone,

649
01:00:35,320 --> 01:00:39,320
and then you'd send them to three or more different laboratories.

650
01:00:39,320 --> 01:00:43,320
And then you would just use that as one of your conditions

651
01:00:43,320 --> 01:00:49,320
and try to disentangle any measurement error coming from labs.

652
01:00:49,320 --> 01:00:53,320
Try to disentangle that from your genetic variation.
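One simple way to treat the laboratory as a condition, assuming a split-sample design in which every plant is tested by every lab: compare each lab's result with the cross-lab average for the same plant, then average those deviations per lab. The column names `sample_id`, `lab`, and `thc` are hypothetical. This only estimates a constant bias per lab; it does not separate out random scatter within a lab.

```python
import pandas as pd


def lab_offsets(results: pd.DataFrame) -> pd.Series:
    """Average deviation of each lab from the cross-lab mean for the same sample.

    Expects columns: sample_id, lab, thc (percent).
    """
    # Mean THC for each sample across all labs that tested it.
    sample_mean = results.groupby("sample_id")["thc"].transform("mean")
    deviation = results["thc"] - sample_mean
    # A positive offset means the lab tends to read high relative to its peers.
    return deviation.groupby(results["lab"]).mean()


# Made-up split-sample results for two plants and three labs.
toy = pd.DataFrame({
    "sample_id": ["A", "A", "A", "B", "B", "B"],
    "lab": ["X", "Y", "Z", "X", "Y", "Z"],
    "thc": [10.0, 10.0, 11.0, 20.0, 20.0, 21.0],
})
print(lab_offsets(toy))
```

Subtracting each lab's offset from its results before estimating variances is one way to keep a lab's systematic bias from being counted as environmental or genetic variation.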

653
01:00:53,320 --> 01:01:00,320
So from a cultivator point of view, the measurement error is surmountable.

654
01:01:00,320 --> 01:01:05,320
I think it's not great, but I think you can overcome it.

655
01:01:05,320 --> 01:01:10,320
And I think you can still do heritability studies.

656
01:01:10,320 --> 01:01:15,320
It just throws a monkey wrench into things, but you can just account for that.

657
01:01:15,320 --> 01:01:19,320
But Isaac, the question at hand?

658
01:01:19,320 --> 01:01:23,320
Yeah, just to chime in my two cents on the question.

659
01:01:23,320 --> 01:01:28,320
I mean, there are many sources of this uncertainty in measurement.

660
01:01:28,320 --> 01:01:34,320
On one hand, there's the most standard kind: the spread of measurement.

661
01:01:34,320 --> 01:01:38,320
You measure exactly the same thing repeatedly, and you'll get a spread.

662
01:01:38,320 --> 01:01:44,320
That's one type. And another type is that there might be lab issues

663
01:01:44,320 --> 01:01:51,320
where their control samples, or their spikes as they're called, are having issues.

664
01:01:51,320 --> 01:01:54,320
So those are quality control issues.

665
01:01:54,320 --> 01:02:00,320
And that might skew their test results, maybe 10% plus or minus,

666
01:02:00,320 --> 01:02:05,320
and also their habits of actually measuring it.

667
01:02:05,320 --> 01:02:11,320
So first, there is just natural variation.

668
01:02:11,320 --> 01:02:15,320
And second is variation based on their reference standards.

669
01:02:15,320 --> 01:02:21,320
And the third one, it could be based on their method for calculating the area

670
01:02:21,320 --> 01:02:25,320
or the amount of say THC.

671
01:02:25,320 --> 01:02:31,320
And only the fourth one is that they might knowingly fudge the numbers.

672
01:02:31,320 --> 01:02:38,320
So there are a lot of sources, and the variation can be attributed to a combination of them.

673
01:02:38,320 --> 01:02:44,320
But in terms of how to get a data set that you can trust,

674
01:02:44,320 --> 01:02:47,320
it will take a small leap of faith.

675
01:02:47,320 --> 01:02:52,320
That is, you just look at the data, for example, the Washington data set.

676
01:02:52,320 --> 01:02:57,320
It consists of data from 12 labs, I believe.

677
01:02:57,320 --> 01:03:03,320
And just by looking at the distribution of their THC measurements

678
01:03:03,320 --> 01:03:09,320
and also their micro measurements, some labs do have a better or more normal,

679
01:03:09,320 --> 01:03:13,320
like naturally occurring distribution than other ones.

680
01:03:13,320 --> 01:03:19,320
So if reliable data is a concern for you,

681
01:03:19,320 --> 01:03:24,320
then perhaps you can filter for the labs that have the best looking distributions.

682
01:03:24,320 --> 01:03:30,320
Good points. And this is the beauty in the art of statistics.

683
01:03:30,320 --> 01:03:39,320
And it's why I stress: don't read into things too hard, because there is a lot of noise.

684
01:03:39,320 --> 01:03:42,320
So you don't want to lean too much into the noise.

685
01:03:42,320 --> 01:03:46,320
But I love that this is the way you're thinking, Noah,

686
01:03:46,320 --> 01:03:49,320
because this is something that's often overlooked.

687
01:03:49,320 --> 01:03:55,320
And a good statistician like yourself will hammer this home.

688
01:03:55,320 --> 01:03:57,320
What about measurement error?

689
01:03:57,320 --> 01:04:01,320
Because things like this compound, right?

690
01:04:01,320 --> 01:04:07,320
Because your statistical analysis assumes maybe some variation.

691
01:04:07,320 --> 01:04:13,320
But then if all of a sudden there's other variation that you're not taking into consideration,

692
01:04:13,320 --> 01:04:20,320
like measurement error or other imperfections in the data, missing data, imputations,

693
01:04:20,320 --> 01:04:23,320
it just adds to the uncertainty.

694
01:04:23,320 --> 01:04:28,320
And I'll just leave it there. I don't know if you all know,

695
01:04:28,320 --> 01:04:34,320
but from my understanding, statistics is the study of uncertainty.

696
01:04:34,320 --> 01:04:36,320
And so that's what we're doing.

697
01:04:36,320 --> 01:04:39,320
This is an uncertain topic.

698
01:04:39,320 --> 01:04:44,320
And we're just trying to study it and see if we can't glean any knowledge out of it.

699
01:04:44,320 --> 01:04:46,320
Let's continue on this.

700
01:04:46,320 --> 01:04:49,320
I may want to go ahead and conclude this here.

701
01:04:49,320 --> 01:04:52,320
I just want to be respectful of everybody's time.

702
01:04:52,320 --> 01:04:55,320
And we've covered a lot of ground today.

703
01:04:55,320 --> 01:05:04,320
And for next week, I'll continue teasing some real fun things out of cannabis history

704
01:05:04,320 --> 01:05:10,320
and try to tie them back to the modern day with modern data sets,

705
01:05:10,320 --> 01:05:13,320
for example, this data set out of Washington State.

706
01:05:13,320 --> 01:05:15,320
So lots of cool things coming.

707
01:05:15,320 --> 01:05:20,320
And then also, if any of you have projects that you want to embark on,

708
01:05:20,320 --> 01:05:24,320
feel free to get in touch and let's keep the conversation going throughout the week.

709
01:05:24,320 --> 01:05:30,320
Because as I said, we're at the cutting edge, the bleeding edge, of cannabis research.

710
01:05:30,320 --> 01:05:32,320
And there's a lot to uncover.

711
01:05:32,320 --> 01:05:35,320
So I think there's exciting things to come.

712
01:05:35,320 --> 01:05:36,320
Thank you all for coming.

713
01:05:36,320 --> 01:05:41,320
Thank you for bringing your eyes, your ears, your brilliant minds.

714
01:05:41,320 --> 01:05:45,320
Your attention helps advance cannabis research, cannabis science.

715
01:05:45,320 --> 01:05:51,320
And so I think if we can help move things forward, even if it's only one molecule at a time,

716
01:05:51,320 --> 01:06:03,320
then let's keep at it.

