1
00:00:00,000 --> 00:00:09,000
Welcome to Saturday Morning Statistics for November 13th.

2
00:00:09,000 --> 00:00:24,000
It's going to be a really good day today. We're going to pick up with some history of statistics, because we want to give a nice introductory lesson to statistics.

3
00:00:24,000 --> 00:00:36,000
So that way we have a nice foundation for analysis. And of course, we'll be working with real cannabis data to apply some of these fundamental statistics.

4
00:00:36,000 --> 00:00:48,000
And we can already begin making some awesome insights and inferences. So this is what's exciting. So you can do a lot with the fundamentals of statistics.

5
00:00:48,000 --> 00:01:02,000
As a lot of people like to say, there's low-hanging fruit to be had. So you don't have to necessarily come at it with the most complex models that people may not understand.

6
00:01:02,000 --> 00:01:14,000
You can come at cannabis data with simple statistics and inform people. So let's jump into it.

7
00:01:14,000 --> 00:01:27,000
I want to preface everything with the fact that, in this day and age, these characters in statistics are quite controversial.

8
00:01:27,000 --> 00:01:51,000
So we're not here to really talk about the people or people's opinions of them. We're just here to talk about the ideas they introduced, because, like it or not, they had quite an impact on the field of statistics.

9
00:01:51,000 --> 00:02:01,000
And these are measures and metrics that we use every day.

10
00:02:01,000 --> 00:02:09,000
You know, we heavily utilize some of these statistics. So it's interesting to know where they came from.

11
00:02:09,000 --> 00:02:30,000
I always find the history of mathematics, statistics, and economics all quite interesting, because you get to see how the field developed. Right, you have these brilliant thinkers throughout time.

12
00:02:30,000 --> 00:02:40,000
But at their time, things that we take as commonplace weren't yet discovered. So their approaches are really interesting, right?

13
00:02:40,000 --> 00:02:52,000
So they may do these real roundabout approaches to solve problems that we understand much more formally these days.

14
00:02:52,000 --> 00:03:06,000
And we take them as given; however, these people were just discovering them. So, as you can see, you may recognize the name Pearson.

15
00:03:06,000 --> 00:03:19,000
So Karl Pearson, a statistician, introduced the Pearson correlation coefficient. So this is a metric we will talk about today.

16
00:03:19,000 --> 00:03:23,000
He also introduced the p-value.

17
00:03:23,000 --> 00:03:27,000
So we'll talk about that today. Another common measure.

18
00:03:27,000 --> 00:03:40,000
He was the first to formally do principal component analysis, which is utilized in the field today in the cannabis industry.

19
00:03:40,000 --> 00:03:46,000
So you see chemists will use principal component analysis.

20
00:03:46,000 --> 00:03:51,000
And dear to my heart, he created the first histogram.

21
00:03:51,000 --> 00:03:56,000
And the histogram is probably my favorite type of chart.

22
00:03:56,000 --> 00:04:03,000
So, well, you know, I'm a big fan of the distributions.

23
00:04:03,000 --> 00:04:20,000
So I think they capture a lot of dimensionality with just one figure. So he was the first one to create a histogram.

24
00:04:20,000 --> 00:04:25,000
And so he lived from 1857 to 1936.

25
00:04:25,000 --> 00:04:33,000
So he was born more than 150 years ago.

26
00:04:33,000 --> 00:04:44,000
So it's interesting to note that this is essentially when, you know, statistics was really starting to pick up steam.

27
00:04:44,000 --> 00:04:48,000
Next, yet another controversial figure.

28
00:04:48,000 --> 00:04:57,000
So if you want to read about the controversies about him, then hit the books.

29
00:04:57,000 --> 00:05:05,000
We're once again primarily focused on some of the methods that were introduced that we'll be talking about.

30
00:05:05,000 --> 00:05:13,000
Right. So we're always talking about variance and variability.

31
00:05:13,000 --> 00:05:22,000
Well, Ronald Fisher, actually Sir Ronald Fisher; he was knighted in 1952.

32
00:05:22,000 --> 00:05:27,000
So he introduced the term variance.

33
00:05:27,000 --> 00:05:37,000
And so variance, as we know it, is the average of the squared deviations from the mean.

34
00:05:37,000 --> 00:05:45,000
Well, actually, the variance is the square of the standard deviation.

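As a quick sketch of those two equivalent statements, here is a hypothetical Python example with made-up numbers (not the lesson's actual data):

```python
import numpy as np

# Hypothetical sample (illustrative numbers only).
x = np.array([100.0, 102.0, 98.0, 105.0, 95.0])

# Variance as the average of the squared deviations from the mean
# (population variance, ddof=0).
var_manual = np.mean((x - x.mean()) ** 2)

# Equivalently, variance is the square of the standard deviation.
assert np.isclose(var_manual, np.std(x) ** 2)
print(var_manual)  # 11.6 for this sample
```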
35
00:05:45,000 --> 00:05:50,000
He also introduced and popularized analysis of variance.

36
00:05:50,000 --> 00:05:55,000
And so this is where you can compare means between populations by analyzing variance.

37
00:05:55,000 --> 00:06:01,000
And so this is where we can start with our hypothesis testing.

38
00:06:01,000 --> 00:06:10,000
And so he also formalized what we now know as the hypothesis test.

39
00:06:10,000 --> 00:06:18,000
So there were hypothesis tests before, but they heavily relied on alternative hypotheses.

40
00:06:18,000 --> 00:06:22,000
And the mathematics was complex.

41
00:06:22,000 --> 00:06:34,000
And so the single hypothesis, having just a null hypothesis that you can reject, was an idea introduced by Fisher.

42
00:06:34,000 --> 00:06:44,000
And it's a useful, simplified way to go about hypothesis testing.

43
00:06:44,000 --> 00:06:48,000
And then the F distribution was not actually formalized by Fisher himself.

44
00:06:48,000 --> 00:06:58,000
There was, it's interesting, while I was reading all of this, there is a current statistician, I believe, named Stigler.

45
00:06:58,000 --> 00:07:05,000
We were talking about a Stigler in economics. This is a different Stigler, Stephen Stigler.

46
00:07:05,000 --> 00:07:12,000
And he introduced a law.

47
00:07:12,000 --> 00:07:27,000
His law, Stigler's law of eponymy, goes along the lines of: no famous result in mathematics is named after the person who originally discovered it.

48
00:07:27,000 --> 00:07:41,000
So long story short, someone else really formalized the F distribution, but it is called an F distribution in honor of Ronald Fisher.

49
00:07:41,000 --> 00:07:59,000
So long story short, these are two major figures who really formalized statistics into what became modern-day statistics.

50
00:07:59,000 --> 00:08:08,000
So, you know, it wasn't really necessarily the field before these characters came along.

51
00:08:08,000 --> 00:08:18,000
And, you know, these days, statistics underlies essentially every major field.

52
00:08:18,000 --> 00:08:28,000
So, Karl Pearson, and then Ronald Fisher, who lived from 1890 to 1962.

53
00:08:28,000 --> 00:08:33,000
So also, you know, once again, we're moving on through history here.

54
00:08:33,000 --> 00:08:45,000
So that's the history about where these ideas came from. Now, let's jump into it.

55
00:08:45,000 --> 00:08:57,000
So I thought it was interesting that before these characters came along, no one had really formalized the idea of variance.

56
00:08:57,000 --> 00:09:10,000
So what you would often do is just calculate means of different populations, and you would just see if the means were different.

57
00:09:10,000 --> 00:09:25,000
Well, in some circumstances, and I'll just show you some today, the mean will be similar or the same across populations.

58
00:09:25,000 --> 00:09:30,000
So here you have two different populations, red and blue.

59
00:09:30,000 --> 00:09:34,000
They both have the same mean, 100.

60
00:09:34,000 --> 00:09:41,000
However, the variance is different between the two groups.

61
00:09:41,000 --> 00:09:51,000
So there's a lower standard deviation in the red, higher standard deviation in the blue.

62
00:09:51,000 --> 00:10:10,000
So although the means are the same, this is where, you know, the insight of the histogram comes in, because you can start to see that the tail ends of the distributions are going to be different.

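A minimal simulation of this picture, assuming numpy; the red and blue parameters here are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Two illustrative populations with the same mean but different spreads,
# mirroring the red (narrow) and blue (wide) distributions.
red = rng.normal(loc=100, scale=5, size=10_000)
blue = rng.normal(loc=100, scale=20, size=10_000)

print(red.mean(), blue.mean())  # both close to 100
print(red.std(), blue.std())    # very different spreads
```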
63
00:10:10,000 --> 00:10:15,000
And so this could have implications.

64
00:10:15,000 --> 00:10:33,000
So, for example, for hypothesis testing, if you're trying to find if an observation lies outside of one of these distributions, well, say you have a value of 140.

65
00:10:33,000 --> 00:10:48,000
Well, you can probably, and we'll get to it, you know, conclude that 140 is statistically outside of this. You know, we wouldn't conclude that that would be in the red population.

66
00:10:48,000 --> 00:10:52,000
But, you know, 140 may be in the blue population.

67
00:10:52,000 --> 00:10:59,000
And so, you know, you wouldn't be able to statistically conclude that 140 is not in the blue.

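One way to make the 140 example concrete is a standard-score sketch; the two standard deviations below are assumed values, not the ones from the actual chart:

```python
# How far is an observation of 140 from a mean of 100, in standard
# deviations? The two spreads here are assumed for illustration.
mean = 100.0
red_sd = 10.0   # the narrow (red) population
blue_sd = 25.0  # the wide (blue) population

obs = 140.0
z_red = (obs - mean) / red_sd    # 4.0: far outside the red population
z_blue = (obs - mean) / blue_sd  # 1.6: plausibly still in the blue one

# A common rule of thumb: |z| greater than about 2 is unlikely
# to belong to the population.
print(z_red, z_blue)
```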
68
00:10:59,000 --> 00:11:11,000
So long story short, the variance will have implications for hypothesis testing.

69
00:11:11,000 --> 00:11:20,000
Now, we've introduced the idea of these two groups.

70
00:11:20,000 --> 00:11:35,000
Well, actually, as we go along with this, let's go ahead and look at some of this with real cannabis data here.

71
00:11:35,000 --> 00:11:51,000
Because, you know, that's what we're all about. So, okay, where can we find something where we may have the same mean but different variances with cannabis data?

72
00:11:51,000 --> 00:12:02,000
Well, let's start up a new console here.

73
00:12:02,000 --> 00:12:07,000
Right. So, once again, working with Massachusetts data.

74
00:12:07,000 --> 00:12:24,000
So, we're simply reading in this data here from Massachusetts.

75
00:12:24,000 --> 00:12:41,000
And we're going to go ahead and define a couple periods here, right? So, we've got essentially when the industry went on pause and when it was resumed.

76
00:12:41,000 --> 00:12:51,000
Because we've noted there's a gap there.

77
00:12:51,000 --> 00:13:01,000
Well,

78
00:13:01,000 --> 00:13:29,000
Okay, so long story short, the idea is we're trying to basically see: okay, are there two populations with different variances? And the idea I was thinking was, what if sales takes on a different underlying, almost like a different production function?

79
00:13:29,000 --> 00:13:40,000
And so, you know, it's like the production function changes in some fundamental manner. And so it looks like, you know, you're going along.

80
00:13:40,000 --> 00:13:52,000
And I even kind of want to compare this period to this one, but we'll do pre- to post-pandemic first.

81
00:13:52,000 --> 00:14:00,000
Well, not really pandemic, but more closure, because this was just, you know, the period where shops were closed.

82
00:14:00,000 --> 00:14:13,000
But long story short, I figured we could start defining some of these variables here. So, how can we get these periods to have the same mean?

83
00:14:13,000 --> 00:14:26,000
Well, what you can do is you can actually calculate the change. So here, you know, I've calculated the change in sales.

84
00:14:26,000 --> 00:14:43,000
So you actually expect the change to be closer to zero, you know, if it was just a straight line, it would be zero, but maybe you would expect a slight positive mean.

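A sketch of that change calculation with pandas; the sales figures here are hypothetical stand-ins for the Massachusetts series being read in:

```python
import pandas as pd

# Hypothetical weekly sales figures standing in for the real series.
sales = pd.Series([100.0, 110.0, 121.0, 133.1])

# Week-over-week percent change; a flat series would give 0,
# steady growth a positive mean.
change = sales.pct_change().dropna()
print(change.mean())  # 0.10 with these made-up numbers
```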
85
00:14:43,000 --> 00:15:01,000
So, for example, right, your mean change in sales is around 11 percent. So positive.

86
00:15:01,000 --> 00:15:09,000
And so you would expect, you know, the change in sales to fluctuate around 0.11.

87
00:15:09,000 --> 00:15:29,000
Well, you could, you know, calculate the change in plants.

88
00:15:29,000 --> 00:15:47,000
OK, so I meant to restrict the time frame here from 2019 and onwards, because the 2018 data is partially missing.

89
00:15:47,000 --> 00:15:52,000
So let's try this one more time.

90
00:15:52,000 --> 00:16:01,000
OK, there we are. So here we have the change in plants and then the change in inventory.

91
00:16:01,000 --> 00:16:10,000
So note today we're looking at total flowering plants instead of total tracked plants.

92
00:16:10,000 --> 00:16:20,000
This is sort of an arbitrary change. I just started to have doubts about the total tracked plant count.

93
00:16:20,000 --> 00:16:30,000
So I thought maybe the total flowering count may be a bit more accurate because it doesn't really matter how many plants you have in vegetative state, right?

94
00:16:30,000 --> 00:16:39,000
Because some people may have tons and tons of vegetative plants and some may not really have that many.

95
00:16:39,000 --> 00:16:45,000
So it's really more the ones that are like in their flower room.

96
00:16:45,000 --> 00:16:55,000
And that kind of count. But long story short, it's similar, and you can look at various other variables.

97
00:16:55,000 --> 00:17:00,000
But we're just sort of picking some random variables today, right?

98
00:17:00,000 --> 00:17:08,000
We're not really approaching this with much theory. We're just looking at rudimentary statistics here, right?

99
00:17:08,000 --> 00:17:18,000
We're just looking at variance. So, OK, great. So these are like definitely two different populations, right?

100
00:17:18,000 --> 00:17:25,000
Here you have change in sales and then here you've got change in plants.

101
00:17:25,000 --> 00:17:33,000
So these are entirely different series. Change in sales.

102
00:17:33,000 --> 00:17:38,000
Oh, yes, I wanted the mean of these.

103
00:17:38,000 --> 00:17:53,000
Sales is around 11 percent. Change in plants is about 0.3 percent mean.

104
00:17:53,000 --> 00:18:00,000
And then change in inventory is about 0.4 percent.

105
00:18:00,000 --> 00:18:09,000
And so what does this look like?

106
00:18:09,000 --> 00:18:28,000
Perhaps we could draw a histogram so you could see, OK, what does the frequency of these changes look like?

107
00:18:28,000 --> 00:18:41,000
But here you see the distribution of changes in inventory.

108
00:18:41,000 --> 00:18:57,000
And here you could look at the distribution of changes in plants.

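The frequency view above can be sketched by bucketing a change series into histogram bins (numpy assumed; the series here is simulated, not the real inventory data):

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated stand-in for the change-in-inventory series.
changes = rng.normal(loc=0.004, scale=0.05, size=500)

# Bin the changes; the counts are what a plotted histogram displays.
counts, edges = np.histogram(changes, bins=10)
print(counts)

# Every observation falls into exactly one bin.
assert counts.sum() == changes.size
```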
109
00:18:57,000 --> 00:19:07,000
So these distributions, you know, just on the face of it, look similar.

110
00:19:07,000 --> 00:19:16,000
Let's look at the change in sales.

111
00:19:16,000 --> 00:19:23,000
And here's the change in sales.

112
00:19:23,000 --> 00:19:36,000
Right. And what's a little interesting about this is, you know, the change in sales.

113
00:19:36,000 --> 00:19:55,000
The mean is higher, but it looks like it may have this sort of fat tail, as they call it.

114
00:19:55,000 --> 00:19:59,000
OK, that's interesting.

115
00:19:59,000 --> 00:20:12,000
Well, I figured we could do better than that. And at this point, we're going to start getting into some new data here.

116
00:20:12,000 --> 00:20:30,000
So there's a bunch of ways you can break this down by sales and we may come back and do this.

117
00:20:30,000 --> 00:20:47,000
You know, we may return. OK, I'll just show it to you real quick. Long story short, I was saying, OK, look, you can maybe look at it by flower.

118
00:20:47,000 --> 00:21:00,000
So this is the change in flower sales. So here, you know, here you just have weekly flower sales.

119
00:21:00,000 --> 00:21:09,000
And, you know, you can calculate the, you know, the change in flower sales.

120
00:21:09,000 --> 00:21:17,000
And then you could also, you know, calculate, you know, the change in processed goods.

121
00:21:17,000 --> 00:21:44,000
So basically, this is where you kind of get into the art of data science: how do you want to break these goods up? Right. So if we're looking at the products.

122
00:21:44,000 --> 00:21:52,000
Here, I've got a list of products. Some of them are similar and some of them are dissimilar.

123
00:21:52,000 --> 00:21:59,000
So what I've done is first, I've just deleted some that I just didn't think were relevant.

124
00:21:59,000 --> 00:22:07,000
So I'm not measuring waste. I'm not measuring seeds. Those are just not included in the analysis.

125
00:22:07,000 --> 00:22:26,000
And then basically what I've done is I've separated these into two groups. Essentially, the flower type goods, which I determined are goods that I think a cultivator could produce and sell to a retailer.

126
00:22:26,000 --> 00:22:47,000
So we'll have to, you know, check the rule book in Massachusetts. But from the naming, it would seem to me like a cultivator could sell buds, shake, raw pre-rolls, or trim to a retailer.

127
00:22:47,000 --> 00:23:09,000
And then the processed goods go to someone who has set up a processing laboratory and is either processing the flower into concentrates, or maybe they have a food facility and they're just taking concentrated products and making edibles.

128
00:23:09,000 --> 00:23:25,000
So this sort of may lump in dissimilar groups, right, because just because you're producing edibles doesn't necessarily mean you're producing concentrates and so on and so forth.

129
00:23:25,000 --> 00:23:43,000
However, I just tried to have two dichotomous groups here. And so I just said, okay, these are the flower goods that cultivators will produce, and these are the processed goods that a processor would produce.

130
00:23:43,000 --> 00:24:02,000
And the idea is, you know, are flower goods growing more than concentrate goods? So, as we saw, here are flower sales.

131
00:24:02,000 --> 00:24:08,000
Then we can, you know, look at concentrate sales.

132
00:24:08,000 --> 00:24:17,000
And, you know, we may note that, you know, overall flower sales.

133
00:24:17,000 --> 00:24:30,000
I wonder if we can plot these two together.

134
00:24:30,000 --> 00:24:44,000
Okay, so either something's going on, but flower, hold on, I plotted the wrong series here. Let's try this one more time.

135
00:24:44,000 --> 00:25:06,000
Okay, there we are. So this is interesting here, right? So here you have flower in blue and non-flower in orange. And they're actually similar quantities, right? It looks like flower's kind of broken away.

136
00:25:06,000 --> 00:25:23,000
In other states, you typically see a 60-40 split, with about 60% of sales being flower and 40% other. At least that's what we were observing in Washington state.

137
00:25:23,000 --> 00:25:48,000
So it would be interesting, right? This is what we're all about, calculating new statistics here. So we could calculate the ratio here of, you know, flower to concentrate.

138
00:25:48,000 --> 00:26:02,000
Or we could say, oh, what's, hold on, let's do this out of the total. So let's see what flower...

139
00:26:02,000 --> 00:26:11,000
Hold on, I don't think I did that right. Because basically I want to see what flower is out of the total.

140
00:26:11,000 --> 00:26:23,000
Okay, so this is real cool. So right now we just calculated the percent of sales that are from flower.

141
00:26:23,000 --> 00:26:45,000
And so, as I conjectured, similarly to Washington state, you see a similar breakout where flower stabilizes at around 60% of sales.

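A sketch of that share-of-total calculation, using made-up weekly dollar figures in place of the real categories:

```python
import pandas as pd

# Hypothetical weekly sales by category, in dollars.
flower = pd.Series([600.0, 620.0, 580.0])
other = pd.Series([400.0, 380.0, 420.0])

# Flower's share of total sales each week.
flower_share = flower / (flower + other)
print(flower_share.mean())  # 0.60 with these made-up numbers
```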
142
00:26:45,000 --> 00:26:56,000
And we can even say, you know, what's the mean of this? Well, it's not going to work well because of this break.

143
00:26:56,000 --> 00:27:14,000
But we could say the last year. Okay, so this is interesting. So in the last year, flower has been around 54% of sales.

144
00:27:14,000 --> 00:27:25,000
So that's interesting. So Massachusetts may see a higher proportion of other goods sold than in other states.

145
00:27:25,000 --> 00:27:42,000
And as we've noted, this may be because of income. So as income rises, you may see people spending more money on edibles and less money on flower.

146
00:27:42,000 --> 00:28:00,000
So already we're starting to uncover ways where populations may be different. Of course, the means may be different, but the variances as well.

147
00:28:00,000 --> 00:28:16,000
So those are some interesting ones. And then, just to show you these remaining statistics.

148
00:28:16,000 --> 00:28:30,000
This is just the percent change in flower pre-pandemic and post-pandemic or pre-closure, post-closure.

149
00:28:30,000 --> 00:28:40,000
And so this is where you can see, oh, you know, are these, you know, are these histograms different?

150
00:28:40,000 --> 00:28:59,000
So here I was, you know, trying to recreate this chart. But as you can see, you know, these distributions overlap pretty, pretty strongly.

151
00:28:59,000 --> 00:29:07,000
So quite a bit of overlap there.

152
00:29:07,000 --> 00:29:28,000
And then I was saying, okay, you could also look at concentrates pre-closure and post-closure, a similar thing: has the change in concentrate sales

153
00:29:28,000 --> 00:29:44,000
changed systemically? And, you know, on the face of it, it doesn't appear to be different.

154
00:29:44,000 --> 00:29:51,000
Right. You know, those distributions look quite similar.

155
00:29:51,000 --> 00:29:59,000
Well, you know, that's just plotting.

156
00:29:59,000 --> 00:30:07,000
But let's try to open this full screen again.

157
00:30:07,000 --> 00:30:15,000
Well, we've now noted that we can actually quantify this.

158
00:30:15,000 --> 00:30:32,000
So if we are given samples, samples of X and samples of Y, we can calculate the sample correlation coefficient

159
00:30:32,000 --> 00:30:44,000
to see how correlated these samples are with each other.

160
00:30:44,000 --> 00:30:58,000
And I think we can do that here in

161
00:30:58,000 --> 00:31:18,000
the next part. So

162
00:31:18,000 --> 00:31:36,000
I've already had this pulled up for some reason. I thought I had this one ready.

163
00:31:36,000 --> 00:31:52,000
Let's just try some of these out. These ones I haven't tried before. But ideally, I think this would be a visualization of the

164
00:31:52,000 --> 00:32:01,000
correlation here. Unfortunately, we may have to make this correlation matrix.

165
00:32:01,000 --> 00:32:14,000
Okay, yeah, this is maybe something that we may need to

166
00:32:14,000 --> 00:32:32,000
do. My apologies. I should have had this one pulled up.

167
00:32:32,000 --> 00:32:45,000
I'm going to leave this in.

168
00:32:45,000 --> 00:33:10,000
Okay, let's try this real quick. If it works, it works. If it doesn't, we'll move on. It would just be interesting to see, you know, what are some of the correlations between, say,

169
00:33:10,000 --> 00:33:22,000
some of these change in sales.

170
00:33:22,000 --> 00:33:43,000
So let's just see if we can't correlate pre-change in concentrate sales and pre-change in flower sales.

171
00:33:43,000 --> 00:33:53,000
I think it's going to be because of this NaN.

172
00:33:53,000 --> 00:34:15,000
Well, we can probably just get around that. Let me try this one more time and then we'll move on because there's more interesting statistics we can get to here.

173
00:34:15,000 --> 00:34:33,000
Just skip this first observation. Okay, awesome. We were able to do it. So we just calculated the correlation coefficient here between

174
00:34:33,000 --> 00:34:52,000
the pre-closure change in flower sales and the pre-closure change in concentrate sales.

175
00:34:52,000 --> 00:35:05,000
As you can see, they're moving along and we can say that they are positively correlated.

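A sketch of that calculation in pandas, with hypothetical series; note that Series.corr computes the Pearson coefficient and skips NaN pairs automatically, which sidesteps the NaN that pct_change leaves in the first slot:

```python
import pandas as pd

# Hypothetical weekly sales levels; pct_change puts a NaN in
# the first position of each change series.
flower = pd.Series([100.0, 105.0, 103.0, 110.0, 108.0]).pct_change()
concentrate = pd.Series([50.0, 53.0, 52.0, 56.0, 55.0]).pct_change()

# Pearson correlation coefficient; NaN pairs are ignored automatically.
r = flower.corr(concentrate)
print(r)

# The coefficient is always between -1 and 1.
assert -1.0 <= r <= 1.0
```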
176
00:35:05,000 --> 00:35:27,000
And this measure ranges from positive one to negative one. So if it's one, they're perfectly positively correlated. If it's negative one, they're perfectly negatively correlated.

177
00:35:27,000 --> 00:35:47,000
At zero, they are not correlated. And here you had 0.6, so you would say these are moderately positively correlated.

178
00:35:47,000 --> 00:36:02,000
The general rule of thumb is anything greater than 0.7 is strongly correlated. Anything less than around 0.3 is weakly correlated.

179
00:36:02,000 --> 00:36:16,000
And then anything between 0.3 and 0.7 is moderate. Those are my rules of thumb. So a little bit of a panic, but we were able to calculate this statistic.

180
00:36:16,000 --> 00:36:26,000
What's cool is, well, we can actually look at the correlation between these posts.

181
00:36:26,000 --> 00:36:29,000
So let's see if...

182
00:36:29,000 --> 00:36:35,000
Look at this. Post.

183
00:36:35,000 --> 00:36:41,000
Does that make sense?

184
00:36:41,000 --> 00:36:48,000
Let's visualize that. Look at that. Post.

185
00:36:48,000 --> 00:36:58,000
Post closure, concentrate sales and flower sales are almost perfectly correlated.

186
00:36:58,000 --> 00:37:06,000
Isn't that bizarre?

187
00:37:06,000 --> 00:37:17,000
Actually, that is quite bizarre. I wasn't expecting that. That's quite a big difference there.

188
00:37:17,000 --> 00:37:24,000
And so here we're starting to uncover some of these nuances.

189
00:37:24,000 --> 00:37:47,000
And so, I mean, think about it. We've taken the most rudimentary statistic here, the sample correlation coefficient, pioneered by Karl Pearson in the late 1800s and early 1900s.

190
00:37:47,000 --> 00:38:05,000
We've taken this old statistic, rudimentary, and already we're looking at this and we're saying, OK, like, look, there's only a 0.6 correlation between flower sales and concentrate.

191
00:38:05,000 --> 00:38:10,000
Well, this is actually the change, the change in sales.

192
00:38:10,000 --> 00:38:20,000
So, you know, for example,

193
00:38:20,000 --> 00:38:27,000
you know, you could look at the correlation coefficient here.

194
00:38:27,000 --> 00:38:30,000
Let's not get into it. It's going to be a whole can of worms.

195
00:38:30,000 --> 00:38:48,000
But anyways, there are a lot of variables here whose correlations you can look at. And just so you know, time series data like this actually tends to be strongly positively correlated.

196
00:38:48,000 --> 00:39:13,000
So, for example, like GDP and a lot of other of these economic variables tend to be strongly correlated with each other. So it's not that surprising that, you know, the change in flower sales and the change in concentrate sales would be highly correlated.

197
00:39:13,000 --> 00:39:26,000
But I just think this is shockingly high, not like 0.9, but 0.97; a coefficient of 0.97 is high.

198
00:39:26,000 --> 00:39:32,000
You know, like I said, 0.6, that's a bit more typical.

199
00:39:32,000 --> 00:39:40,000
You would expect. But like I said, what's typical? We don't even know what typical is here in the cannabis industry.

200
00:39:40,000 --> 00:39:50,000
And that's why, you know, we need to start looking at these measures state by state, right? Because then you can start looking at the analysis of variance, right?

201
00:39:50,000 --> 00:40:04,000
Maybe red Washington, blue is Massachusetts or what have you. So we can start looking at these two different populations here.

202
00:40:04,000 --> 00:40:17,000
So here's, you know, one way to look at it, the Pearson correlation coefficient. And as you can look at the data, right? These series are kind of static.

203
00:40:17,000 --> 00:40:29,000
You know, it's a little bit of white noise here. And then here, yeah, I mean, they're kind of sitting on top of each other.

204
00:40:29,000 --> 00:40:48,000
So that's an interesting, an interesting observation. But like I said, it's just correlation. You know, we can't really read much into it other than it's, you know, this is just statistical correlation.

205
00:40:48,000 --> 00:40:55,000
So awesome work here. Let's just document that we did this.

206
00:40:55,000 --> 00:41:06,000
Flower and pre and post. So let's just save this work over here just so we can come back to it if we need it.

207
00:41:06,000 --> 00:41:08,000
Awesome.

208
00:41:08,000 --> 00:41:11,000
Well,

209
00:41:11,000 --> 00:41:30,000
let's go ahead and get into these next metrics here. So I just want to finish the presentation real quick, and then I'll show you some brand new work that can be done with these measures.

210
00:41:30,000 --> 00:41:35,000
Okay.

211
00:41:35,000 --> 00:41:39,000
Okay, back to full screen.

212
00:41:39,000 --> 00:41:47,000
Okay, so Fisher introduced the analysis of variance.

213
00:41:47,000 --> 00:41:55,000
Right. And so this is a statistical test where I should have

214
00:41:55,000 --> 00:42:13,000
given a screenshot of the formula here.

215
00:42:13,000 --> 00:42:26,000
So to cut a long story short, it's a test where we can tell if two or more populations are different.

216
00:42:26,000 --> 00:42:40,000
Right. And so, you know, essentially, we'll be able to tell if, you know, these two populations are different or not.

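As a hedged sketch of that test, here is scipy's one-way ANOVA (assuming scipy is available; the two groups are simulated with equal means, so the test should generally not flag a difference):

```python
import numpy as np
from scipy.stats import f_oneway

rng = np.random.default_rng(2)

# Two simulated populations with the same mean; ANOVA asks whether
# the group means differ by more than chance would explain.
a = rng.normal(loc=100, scale=5, size=200)
b = rng.normal(loc=100, scale=20, size=200)

f_stat, p_value = f_oneway(a, b)
print(f_stat, p_value)
```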
217
00:42:40,000 --> 00:42:45,000
So let's go ahead and bring up

218
00:42:45,000 --> 00:43:00,000
this may be out of place, but I think it needs to be mentioned before the end of the day: essentially, hypothesis testing and the types of errors you can make.

219
00:43:00,000 --> 00:43:04,000
Right. So we introduced the

220
00:43:04,000 --> 00:43:19,000
test scenario, which is

221
00:43:19,000 --> 00:43:40,000
dependent on your significance level. And so what's your significance level? Well, that's where we basically want to state what our probability is for making various types of errors.

222
00:43:40,000 --> 00:43:51,000
The general idea is, right, we've seen the distribution and the idea is it may go to negative infinity and positive infinity.

223
00:43:51,000 --> 00:44:16,000
So, you know, we can never be 100% sure of really anything. And I guess that's sort of where statistics starts to merge into worldview.

224
00:44:16,000 --> 00:44:27,000
And this is where, you know, frequentists and Bayesians, they start to butt heads and

225
00:44:27,000 --> 00:44:43,000
the discussion gets a little philosophical and may even get over my head a bit. So I'm more of a practitioner. So, you know, pardon me if I don't convey the theory 100% accurately or formally.

226
00:44:43,000 --> 00:44:50,000
But I'll just convey it the best I can, the way I understand it and the way I use it.

227
00:44:50,000 --> 00:44:59,000
Essentially, there's various errors that you can make when you're doing a hypothesis test. So, for example,

228
00:44:59,000 --> 00:45:18,000
the idea is you set up a null hypothesis and then try to reject it. So the null hypothesis in this case would be: these two groups are identical.

229
00:45:18,000 --> 00:45:33,000
And then you would reject your null hypothesis if you can provide substantial evidence that these two groups are different.

230
00:45:33,000 --> 00:45:38,000
So,

231
00:45:38,000 --> 00:45:49,000
I really should have had the formula pulled up here. But long story short,

232
00:45:49,000 --> 00:46:01,000
and I think that's where I may pick up. I think I'm going to pick up next week with a formal formula for analysis of variance, but we've laid the groundwork today.

233
00:46:01,000 --> 00:46:13,000
But the idea is, if you conclude that they're the same, that is, we fail to reject the null hypothesis that they're identical.

234
00:46:13,000 --> 00:46:23,000
Well, in reality, the two groups, well, they're either different or they're not. Right.

235
00:46:23,000 --> 00:46:32,000
And so this is sort of where you get into the philosophical argument, but we'll just take it, you know, theta, our parameter here as given.

236
00:46:32,000 --> 00:46:47,000
So in reality, we're saying there is a true parameter here and that the parameter does have a true value and it's either the groups are either the same or they're not.

237
00:46:47,000 --> 00:46:58,000
Well, if we say they're not different and they are, in fact,

238
00:46:58,000 --> 00:47:02,000
not different,

239
00:47:02,000 --> 00:47:10,000
then that's what we call a true negative.

240
00:47:10,000 --> 00:47:22,000
If we say they're not different and they are different, that's a false negative.

241
00:47:22,000 --> 00:47:30,000
If we say they are different and they aren't, that's a false positive.

242
00:47:30,000 --> 00:47:37,000
And then if we say they are different and they are, in fact, different, then that's a true positive.
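
The four outcomes just described can be sketched in a few lines of Python. This is only an illustration; the function name `classify_outcome` is hypothetical, not part of any library.

```python
# Sketch of the four hypothesis-test outcomes just described.
# "decision_different" is what the test concludes; "truly_different"
# is the actual (unknown) state of the world.
def classify_outcome(decision_different: bool, truly_different: bool) -> str:
    if decision_different and truly_different:
        return "true positive"
    if decision_different and not truly_different:
        return "false positive"    # Type I error
    if not decision_different and truly_different:
        return "false negative"    # Type II error
    return "true negative"

print(classify_outcome(False, False))  # true negative
print(classify_outcome(True, False))   # false positive
```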

243
00:47:37,000 --> 00:47:47,000
So a lot goes into picking your significance value, your significance level.

244
00:47:47,000 --> 00:48:01,000
You could argue, coming from an economics point of view, that you would essentially weigh the costs and benefits.

245
00:48:01,000 --> 00:48:10,000
I mean, you would basically weigh the benefits of predicting true negatives and predicting true positives,

246
00:48:10,000 --> 00:48:28,000
and you would compare that to the cost of predicting a false negative and the cost of predicting a false positive.

247
00:48:28,000 --> 00:48:36,000
And in different areas of life, these costs are asymmetric.

248
00:48:36,000 --> 00:48:44,000
So I think this is a really good starting point when you're determining your significance level here,

249
00:48:44,000 --> 00:48:49,000
is you kind of want to start thinking about the outcome.

250
00:48:49,000 --> 00:48:55,000
And this is where we started talking about the rules of forecasting, you know, acknowledge your forecasting error.

251
00:48:55,000 --> 00:49:09,000
So here you want to acknowledge if there's a symmetric or an asymmetric cost to your error.

252
00:49:09,000 --> 00:49:19,000
I'm trying to think of an example.

253
00:49:19,000 --> 00:49:36,000
Okay, let's say we conclude that the change in sales and the change in flower are the same, but they're actually different.

254
00:49:36,000 --> 00:49:43,000
You know, people may stock the shelves wrong.

255
00:49:43,000 --> 00:49:49,000
So the cost there is not readily apparent.

256
00:49:49,000 --> 00:49:54,000
I'm thinking more of, like, health care and diagnostics.

257
00:49:54,000 --> 00:50:02,000
That's where it'll have a big effect. But long story short, this is where you set your alpha.

258
00:50:02,000 --> 00:50:14,000
And so I would say, okay, you know, maybe you set your alpha at 0.05, which is what Fisher actually recommended.

259
00:50:14,000 --> 00:50:34,000
So this is where, when the null is actually true, you're getting true negatives 95% of the time and you only have a false positive 5% of the time.

260
00:50:34,000 --> 00:50:53,000
So you can actually, you know, do the math and, you know, if you can put a cost on your false positive, you can actually do the cost benefit analysis.

261
00:50:53,000 --> 00:51:08,000
You know, if you can put if you can measure everything, if you can put a cost on the false positives, put a cost on the false negatives.

262
00:51:08,000 --> 00:51:11,000
And so on and so forth.
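
The cost-benefit idea can be sketched as a tiny expected-cost calculation. This is a made-up illustration: `expected_cost` is a hypothetical helper, and every rate and dollar figure below is invented.

```python
# Made-up sketch of weighing error costs when choosing a significance level.
def expected_cost(alpha, beta, cost_fp, cost_fn, p_null_true=0.5):
    # False positives occur at rate alpha when the null is true;
    # false negatives occur at rate beta when the null is false.
    return p_null_true * alpha * cost_fp + (1 - p_null_true) * beta * cost_fn

# Symmetric costs vs. asymmetric costs (e.g. diagnostics, where a
# missed difference might be 10x worse than a false alarm).
print(expected_cost(alpha=0.05, beta=0.20, cost_fp=100, cost_fn=100))
print(expected_cost(alpha=0.05, beta=0.20, cost_fp=100, cost_fn=1000))
```

With asymmetric costs, the second expected cost is much higher, which is the argument for tightening or loosening alpha depending on which error hurts more.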

263
00:51:11,000 --> 00:51:18,000
But I won't bore you to death with this.

264
00:51:18,000 --> 00:51:23,000
And then we'll resume with the ANOVA next week.

265
00:51:23,000 --> 00:51:33,000
But I would just say, so this is, you know, the very rudimentary framework of statistics that Pearson and Fisher introduced.

266
00:51:33,000 --> 00:51:40,000
And then these all lay the groundwork for modern day statistics.

267
00:51:40,000 --> 00:51:57,000
And these are models that we've already been using: the fixed effects model, which is a type of analysis of variance, where we're basically assuming that there's been a treatment applied to one of the groups.

268
00:51:57,000 --> 00:52:01,000
Or you can make this random.

269
00:52:01,000 --> 00:52:13,000
You can treat the treatment as random with the random effects model. So we may get to these two models later down the road.

270
00:52:13,000 --> 00:52:23,000
I think next week we'll resume with analysis of variance, and I'll do this a bit more formally.

271
00:52:23,000 --> 00:52:32,000
Just to end on some real world applications of this. So when would we be doing a hypothesis?

272
00:52:32,000 --> 00:52:40,000
And when would we be analyzing the variance between populations here?

273
00:52:40,000 --> 00:52:46,000
Or even testing if the means are equal?

274
00:52:46,000 --> 00:52:55,000
Well, I promise that this data point would be interesting.

275
00:52:55,000 --> 00:53:00,000
And so without further ado, let's start looking at it.

276
00:53:00,000 --> 00:53:16,000
So the parameter I wanted to look at was the square footage of the licensees, so we can start seeing what the sizes of these operations are.

277
00:53:16,000 --> 00:53:30,000
So for example, one of the metrics people are always bandying about is: what is the square footage required per plant?

278
00:53:30,000 --> 00:53:43,000
So we could actually measure that on average, right? We could look at the total number of cultivators, add up all their square footage, and then look at the total number of plants.

279
00:53:43,000 --> 00:53:54,000
And so then we can just divide the cultivators' square footage by the number of plants. So that's an interesting metric.

280
00:53:54,000 --> 00:54:02,000
We may look at that on Wednesday in the cannabis data science group, since that's more cannabis data science.

281
00:54:02,000 --> 00:54:10,000
But for Saturday morning statistics, well, we can look at the statistics.

282
00:54:10,000 --> 00:54:27,000
So if we just look at the square feet per licensee, we see, okay, there are about 900 licensees.

283
00:54:27,000 --> 00:54:51,000
The average square footage is 30,000. But check this out, a 64,000 square foot standard deviation. That is a wide distribution.
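
A rough sketch of computing these summary statistics. The data here is simulated, not the real licensee data: it's drawn from a lognormal distribution with made-up parameters, just to mimic the clumping-near-zero, long-right-tail shape being described.

```python
# Simulated sketch, not the real licensee data: draw ~900 square
# footages from a lognormal distribution to mimic the described shape
# (clumping near zero, long right tail). Parameters are invented.
import random
import statistics

random.seed(0)
square_feet = [random.lognormvariate(mu=9.0, sigma=1.5) for _ in range(900)]

print(len(square_feet))                     # number of licensees
print(round(statistics.mean(square_feet)))  # sample mean
print(round(statistics.stdev(square_feet))) # sample standard deviation
```

On data this skewed, the standard deviation can exceed the mean, which is exactly the pattern in the quoted 30,000 mean versus 64,000 standard deviation.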

284
00:54:51,000 --> 00:55:07,000
So, you know, so here's, you know, essentially our distribution of square feet in their establishment.

285
00:55:07,000 --> 00:55:21,000
And as you can see, it's almost a lognormal distribution here, where you have clumping around zero.

286
00:55:21,000 --> 00:55:32,000
And then you've got this, you know, long tail going way out.

287
00:55:32,000 --> 00:55:45,000
So there's a couple of ways you can look at that. That's the picture as a whole, but I figured, well, not all of these businesses are probably the same.

288
00:55:45,000 --> 00:56:02,000
Right. So, for example, you know, we could look at the mean by all of the different types.

289
00:56:02,000 --> 00:56:18,000
So here we have the, you know, the average square feet.

290
00:56:18,000 --> 00:56:28,000
Let's just round this just so it's something that we can look at without having a headache.

291
00:56:28,000 --> 00:56:41,000
Okay. Actually, we can just round this to the nearest foot, right? We don't need more precision than that.

292
00:56:41,000 --> 00:56:51,000
Anyways, so it looks like we can kind of skip the craft cooperatives. Maybe those licenses aren't issued yet.

293
00:56:51,000 --> 00:57:00,000
So as you can see, transporters, they probably don't need much more than an office space.

294
00:57:00,000 --> 00:57:06,000
Laboratories, right? So this is, you know, I think this is going to be interesting, right?

295
00:57:06,000 --> 00:57:15,000
What, well, I may be biased here, right? Operating analytics. We help out a lot of laboratories.

296
00:57:15,000 --> 00:57:31,000
And so it would be interesting to know, okay, what's the distribution here of the square feet needed for a laboratory.

297
00:57:31,000 --> 00:57:37,000
So if someone asks you off the top of your head, like, how many square feet do you need to run a laboratory?

298
00:57:37,000 --> 00:58:05,000
Well, you can tell them, you know, well, let's find all the licensees where their license type is an independent laboratory.

299
00:58:05,000 --> 00:58:10,000
Right. And here are all the different, you know, observations here.

300
00:58:10,000 --> 00:58:18,000
So you have some that are real small, right, around 800 square feet.

301
00:58:18,000 --> 00:58:27,000
That doesn't seem possible, but maybe it is. Maybe they're growing here.

302
00:58:27,000 --> 00:58:36,000
And so you have them all over the board, right? So you have some that are like 1,600 and you have some that are 16,000.

303
00:58:36,000 --> 00:58:57,000
You know, 16,000 is on the high end there, you know, and so once again, our handy dandy histogram.

304
00:58:57,000 --> 00:59:13,000
A histogram is the best way to visualize the data. But long story short, it appears there are different square footages.

305
00:59:13,000 --> 00:59:28,000
You know, you could even do a conditional variance, right? So we can calculate the variance in square feet.

306
00:59:28,000 --> 00:59:43,000
You got more.

307
00:59:43,000 --> 01:00:02,000
Hold on, I could have just done it this way.

308
01:00:02,000 --> 01:00:14,000
I should do this. All right. Cool. So now we can see the mean and standard deviation for the different types.

309
01:00:14,000 --> 01:00:29,000
Right. And so it's going to be interesting to compare these. So, you know, the mean for the laboratories is 5000 and their standard deviation is 4000.
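
A sketch of this group-by computation. The license types come from the discussion, but the square footages below are invented, chosen only so the group means echo the averages quoted (55,000 for cultivators, 40,000 for manufacturers, 5,000 for laboratories, 8,000 for retail).

```python
# Hypothetical licensee data: license types from the discussion, square
# footages invented so the group means match the quoted averages.
from collections import defaultdict
import statistics

licensees = [
    ("cultivator", 30_000), ("cultivator", 55_000), ("cultivator", 80_000),
    ("manufacturer", 20_000), ("manufacturer", 40_000), ("manufacturer", 60_000),
    ("laboratory", 1_000), ("laboratory", 5_000), ("laboratory", 9_000),
    ("retailer", 2_000), ("retailer", 8_000), ("retailer", 14_000),
]

# Group square footage by license type.
by_type = defaultdict(list)
for license_type, sq_ft in licensees:
    by_type[license_type].append(sq_ft)

# Conditional mean and standard deviation, by license type.
for license_type, values in sorted(by_type.items()):
    print(license_type, statistics.mean(values), round(statistics.stdev(values)))
```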

310
01:00:29,000 --> 01:00:35,000
So, you know, we could do this.

311
01:00:35,000 --> 01:00:39,000
So,

312
01:00:39,000 --> 01:01:01,000
So, you know, long story short, it looks like there are differences in square feet depending on the license type. And so this gives us a nice opportunity to do an analysis of variance and see if we can conclude that their sizes are statistically different.

313
01:01:01,000 --> 01:01:22,000
For example, can we conclude that the cultivators, at 55,000 square feet on average, are statistically different from manufacturers, who have 40,000 square feet on average?

314
01:01:22,000 --> 01:01:40,000
We may not be able to because look how high the standard deviations are. So the standard deviation is quite high for both manufacturers and for cultivators.

315
01:01:40,000 --> 01:01:58,000
Right. But look, for retail, the average is around 8,000 and you have a standard deviation of about 20,000. So if you just use Fisher's ballpark of two standard deviations.

316
01:01:58,000 --> 01:02:08,000
Well, two standard deviations away from the mean here is around, you know, 48,000 square feet.

317
01:02:08,000 --> 01:02:21,000
So if somebody told you, you know, they had a, you know, a 50,000 square foot facility or 100,000 square foot facility.

318
01:02:21,000 --> 01:02:43,000
Well, you could do a hypothesis test, and you could probably say, okay, I'm 95% sure that that person is not a cannabis retailer.
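
The two-standard-deviation rule of thumb can be sketched as a z-score check, using the retail figures quoted above (mean about 8,000 square feet, standard deviation about 20,000). Those figures are the ones read off screen, not a verified dataset.

```python
# Sketch of the two-standard-deviation rule of thumb, using the retail
# figures quoted in the discussion (mean ~8,000 sq ft, SD ~20,000).
def z_score(x: float, mean: float, std: float) -> float:
    """How many standard deviations x sits from the mean."""
    return (x - mean) / std

retail_mean, retail_std = 8_000, 20_000

for facility_sq_ft in (50_000, 100_000):
    z = z_score(facility_sq_ft, retail_mean, retail_std)
    # Roughly 95% of a normal distribution lies within 2 SD of the mean,
    # so |z| > 2 makes "cannabis retailer" an unlikely label.
    verdict = "unlikely a retailer" if abs(z) > 2 else "plausible retailer"
    print(facility_sq_ft, round(z, 2), verdict)
```

A 50,000 square foot facility sits just past two standard deviations from the retail mean, and 100,000 square feet sits far past it, which is the basis for the "95% sure" claim.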

319
01:02:43,000 --> 01:02:53,000
So you could start to, you know, predict license type based off of the square foot of their facility.

320
01:02:53,000 --> 01:03:08,000
Why would that be useful? Well, maybe you are in

321
01:03:08,000 --> 01:03:18,000
the formal name is escaping me.

322
01:03:18,000 --> 01:03:24,000
The branch of the economy where they buy and sell homes. Real estate, sorry.

323
01:03:24,000 --> 01:03:41,000
It's getting there. So for example, if you're in real estate and somebody approaches you and they want a 50,000 square foot facility, and you know they're a cannabis licensee.

324
01:03:41,000 --> 01:03:55,000
Well, you may be able to conclude that they're a cultivator and not a retailer, just by knowing how many square feet they're asking for.

325
01:03:55,000 --> 01:04:08,000
That's maybe not the best example ever. But it could be useful. You know, maybe they're asking for a 35,000 square foot facility.

326
01:04:08,000 --> 01:04:19,000
Well, you could find the probability that they're a manufacturer. You may be able to conclude if they're a manufacturer versus a cultivator.

327
01:04:19,000 --> 01:04:23,000
So,

328
01:04:23,000 --> 01:04:42,000
so long story short, there are definitely differences of means here and differences of variances here, and we're starting with the basics to quantify these.

329
01:04:42,000 --> 01:05:00,000
And next week, I'm going to continue with an interesting application here with licensees and square feet per facility. So there's a particular

330
01:05:00,000 --> 01:05:19,000
ANOVA test we can do where, just like Pearson and Fisher, we can be controversial. So I'm going to plan it out well for next week and

331
01:05:19,000 --> 01:05:39,000
prepare well for next week. That way we're not having to do things like look up these correlation coefficients on the spot. We were lucky and were able to calculate those. I don't want to have to get lucky with this. So I'm going to prepare the ANOVA analysis for next week.

332
01:05:39,000 --> 01:05:51,000
And we are going to do a quite interesting analysis on licensees square feet.

333
01:05:51,000 --> 01:05:55,000
We'll incorporate license type.

334
01:05:55,000 --> 01:06:04,000
But we'll principally be looking at square feet and another variable

335
01:06:04,000 --> 01:06:15,000
so we'll be using another variable

336
01:06:15,000 --> 01:06:26,000
that's provided with the licensees. And so we'll pick up next week with hypothesis testing.

337
01:06:26,000 --> 01:06:41,000
We'll be looking at groups that may have different variances or different means and see if we can't

338
01:06:41,000 --> 01:06:59,000
perform hypothesis tests to say how confident we are or not that the groups may be different. So, you know, next week, we'll, you know, start looking at

339
01:06:59,000 --> 01:07:22,000
square feet conditionally and we'll perform some ANOVA tests and I will include these statistics, the formal formulas that I should have included this week. I'll go ahead and type those up and include those for next week.

340
01:07:22,000 --> 01:07:39,000
So we're well underway to getting up to speed with modern day statistics while building up a strong foundation with the fundamental statistics here.

341
01:07:39,000 --> 01:07:55,000
So, going to go ahead and pause it for now until next week. If there are any questions, I'd be happy to field any.

342
01:07:55,000 --> 01:08:04,000
Yeah, Keegan, can you hear me? Yes. Ah, here we are. Can you answer my questions in the chat?

343
01:08:04,000 --> 01:08:12,000
Okay. Skewness and kurtosis. So.

344
01:08:12,000 --> 01:08:19,000
Essentially, these, if we're talking about the method of moments,

345
01:08:19,000 --> 01:08:41,000
these are the statistics that you gather from a particular group, a particular sample population. So the first moment would be the mean. Right, we saw that given this sample, we can calculate the mean.

346
01:08:41,000 --> 01:09:06,000
The next moment is your variance. And that's where we saw, you know, how much variance there is, so either how tight the distribution is or how wide it is. So that's more about width.

347
01:09:06,000 --> 01:09:19,000
Skewness is the next moment. You should research the method of moments, but it's essentially,

348
01:09:19,000 --> 01:09:38,000
You know, I don't know the formal formula off the top of my head, but it's similar to variance in that these are statistics that you can calculate from a group of data. Just like given a sample, you can calculate its mean.

349
01:09:38,000 --> 01:09:57,000
You can calculate its variance. You can also calculate its skewness. And that is basically the asymmetry of its tails. So remember, today we were talking about

350
01:09:57,000 --> 01:10:13,000
The change of sales having a thick tail, like one side had a thick tail and the other side had a thin tail. That would be skewness.

351
01:10:13,000 --> 01:10:37,000
And kurtosis, don't quote me on this, double check, but this is my understanding: kurtosis is about how heavy the tails are overall, how peaked or flat the curve is compared to a normal distribution.

352
01:10:37,000 --> 01:10:53,000
So, you know, you don't have quite the normal distribution curve; your curve is either more peaked with heavier tails, or flatter with thinner tails.

353
01:10:53,000 --> 01:11:13,000
And these are aspects, like we saw with the visualization, that wouldn't really be captured by a mean, and skewness wouldn't even necessarily be captured by a variance, because you can have samples that have the same mean.

354
01:11:13,000 --> 01:11:34,000
They have the same variance, but one has thick tails and the other has thin tails. So it's possible to just vary on skewness and kurtosis alone.
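
A sketch of the moments under discussion, using the standard moment formulas: skewness as the third standardized moment (asymmetry), and kurtosis as the fourth (tail weight; "excess" kurtosis, so a normal distribution scores 0). The sample data is made up.

```python
# Sketch of higher moments: skewness (asymmetry) and excess kurtosis
# (tail weight). Standard moment formulas; sample data is invented.
import statistics

def central_moment(xs, k):
    m = statistics.fmean(xs)
    return sum((x - m) ** k for x in xs) / len(xs)

def skewness(xs):
    # Third standardized moment: positive means a long right tail.
    return central_moment(xs, 3) / central_moment(xs, 2) ** 1.5

def kurtosis(xs):
    # Fourth standardized moment minus 3 ("excess" kurtosis).
    return central_moment(xs, 4) / central_moment(xs, 2) ** 2 - 3

symmetric = [-2, -1, 0, 1, 2]
right_skewed = [1, 1, 1, 2, 10]

print(skewness(symmetric))          # 0.0 for a perfectly symmetric sample
print(skewness(right_skewed) > 0)   # True: long right tail
```

The two samples can share a mean yet differ on these higher moments, which is the point being made above.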

355
01:11:34,000 --> 01:11:44,000
So these are just sort of higher order ways to classify data versus a mean.

356
01:11:44,000 --> 01:11:57,000
So it's all sort of talking about how the distribution looks. That's how I would describe it, just more characteristics of your distribution.

357
01:11:57,000 --> 01:12:01,000
That's an incredibly informal explanation.

358
01:12:01,000 --> 01:12:10,000
Spearman correlation, okay, and spurious correlation. Okay, Spearman I'm going to have to research.

359
01:12:10,000 --> 01:12:29,000
I don't know that off the top of my head. Spurious correlation, is it real? So that's essentially where we were talking about with the time series, where we may have two different time series moving along.

360
01:12:29,000 --> 01:12:39,000
If you did a regression on the two, they would...

361
01:12:39,000 --> 01:12:56,000
The regression would look as if one variable was highly explanatory, but in reality they just move in similar ways.

362
01:12:56,000 --> 01:13:07,000
Trying to think of good examples here.

363
01:13:07,000 --> 01:13:17,000
I think really it's just any two trending time series.

364
01:13:17,000 --> 01:13:32,000
I'm having a tough time because the way I think, I always think about all these factors as being dependent on each other. So I'm having a hard time thinking of any two factors that are independent.

365
01:13:32,000 --> 01:13:41,000
So I'm having a tough time thinking of an example, but the idea is if you have two time series,

366
01:13:41,000 --> 01:13:50,000
that's why it's often useful to take the difference and look at the change in the growth rates like we were doing today.

367
01:13:50,000 --> 01:14:07,000
Because if we had done a regression of sales on plants, they're both just increasing over time, and we may have a spurious correlation where one isn't really driving the other.
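
The differencing point can be sketched with two made-up, independent trending series: their levels correlate strongly (the spurious part), while their first differences don't. The series names `sales` and `plants` just echo the example; the numbers are simulated.

```python
# Two made-up, independent series that both just trend upward. Their
# levels look highly correlated; their first differences do not.
import random

random.seed(1)
n = 200
sales = [0.5 * t + random.gauss(0, 1) for t in range(n)]
plants = [0.3 * t + random.gauss(0, 1) for t in range(n)]

def pearson(xs, ys):
    # Plain Pearson correlation coefficient.
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

def diff(xs):
    # First differences: the period-to-period changes.
    return [b - a for a, b in zip(xs, xs[1:])]

print(round(pearson(sales, plants), 3))              # near 1: spurious
print(round(pearson(diff(sales), diff(plants)), 3))  # near 0
```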

368
01:14:07,000 --> 01:14:14,000
Long story short, you can sum that up by saying correlation does not mean causation.

369
01:14:14,000 --> 01:14:22,000
Just because you have correlation doesn't mean you've proved causation.

370
01:14:22,000 --> 01:14:39,000
And the final question, if means are equal.

371
01:14:39,000 --> 01:14:45,000
They may have different standard deviations and variances.

372
01:14:45,000 --> 01:14:54,000
And so as you hinted at in your question, you can't really conclude that these two groups are the same.

373
01:14:54,000 --> 01:15:06,000
So it depends on the purpose. If you're only concerned about the mean, then you could make your decision and just say, OK.

374
01:15:06,000 --> 01:15:11,000
So this is the classic example in the stock market.

375
01:15:11,000 --> 01:15:22,000
The efficient market hypothesis is that nobody can make an excess return above the market on average in the long run, so greater than 0%.

376
01:15:22,000 --> 01:15:34,000
So you may see somebody make a 20% return, but then they may also make a minus 40% return and a 5% return.

377
01:15:34,000 --> 01:15:46,000
And so then you may see someone else, and they're just doing 1% return, minus 1% return, 2% return, minus 3% return.

378
01:15:46,000 --> 01:15:54,000
So the idea is the means are the same. They both are getting 0% return.

379
01:15:54,000 --> 01:16:04,000
But you may actually care about the variance. So in the stock market, variance is risk.

380
01:16:04,000 --> 01:16:09,000
So people have a preference for low variance.

381
01:16:09,000 --> 01:16:22,000
So if you had two traders and they both had a mean of 0, but one of them had a high variance and the other one had a low variance,

382
01:16:22,000 --> 01:16:26,000
you would prefer the trader with the low variance.

383
01:16:26,000 --> 01:16:31,000
And so this is actually taken into consideration in the stock market.

384
01:16:31,000 --> 01:16:46,000
So say you have two assets. Let's just say they both return a 5% yield.

385
01:16:46,000 --> 01:16:52,000
And that's common for assets to return a similar yield on average.

386
01:16:52,000 --> 01:16:58,000
Well, then you actually have to look at the variance. What's the variance?

387
01:16:58,000 --> 01:17:03,000
What's the range of returns that you could get?

388
01:17:03,000 --> 01:17:10,000
And so maybe they both return 5% on average, but one has a greater variance than the other.

389
01:17:10,000 --> 01:17:23,000
Well, the risk-averse trader, and economists would argue everybody's risk averse, would favor the bundle with the lower variance.

390
01:17:23,000 --> 01:17:29,000
So that is an actual example where you've got two groups.

391
01:17:29,000 --> 01:17:38,000
They've got the same mean, but you clearly have a preference for the group that has the lower variance.
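
The equal-mean, different-variance point can be sketched with two invented return series: both traders average exactly 0%, but their spreads differ enormously.

```python
# Two invented traders with identical average returns but very
# different variances; the numbers are made up for illustration.
import statistics

wild_trader = [20, -40, 5, 35, -20]  # percent returns, mean 0
steady_trader = [1, -1, 2, -3, 1]    # percent returns, mean 0

for name, returns in [("wild", wild_trader), ("steady", steady_trader)]:
    print(name,
          statistics.mean(returns),
          round(statistics.stdev(returns), 1))
```

Same first moment, very different second moment, and since variance is risk here, the second moment is what drives the preference.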

392
01:17:38,000 --> 01:17:46,000
So it's circumstance dependent.

393
01:17:46,000 --> 01:17:50,000
And then survival functions.

394
01:17:50,000 --> 01:18:02,000
I don't know those as well. Survival functions are more like how long things will survive or go unchanged.

395
01:18:02,000 --> 01:18:16,000
Maybe. Well, like I said, all of these things are kind of built upon each other. So I'm sure, you know, in its own way, survival functions are built upon analysis of variance.

396
01:18:16,000 --> 01:18:24,000
But I'll have to do some homework to connect the dots on that one.

397
01:18:24,000 --> 01:18:33,000
So, you're welcome, Cheyenne. Hopefully that answers your questions.

398
01:18:33,000 --> 01:18:39,000
As I said, I can do a lot more homework myself on statistics.

399
01:18:39,000 --> 01:18:45,000
And that's what's awesome about Saturday Morning Statistics: hopefully you can learn a thing or two.

400
01:18:45,000 --> 01:19:05,000
And then I learn a thing or two myself from preparing, because sometimes I need to knock the dust off of some of these statistical concepts that I take for granted, concepts that we need to formally define and prove we can do well.

401
01:19:05,000 --> 01:19:15,000
Because, right, we have to start with good fundamentals. And that's what we stress here at the Cannabis Data Science Group: you know, you've got to walk before you can run.

402
01:19:15,000 --> 01:19:25,000
And so we're here proving that we can do simple statistics well with cannabis data.

403
01:19:25,000 --> 01:19:43,000
That way we can build up trust and confidence in ourselves, and we can make some good insights, because as we showed today, with simple statistics you can have deep insights.

404
01:19:43,000 --> 01:19:47,000
And that's where we're going to keep delivering.

405
01:19:47,000 --> 01:19:49,000
Awesome Cheyenne.

406
01:19:49,000 --> 01:19:52,000
Thank you for staying a little extra longer today.

407
01:19:52,000 --> 01:20:10,000
Next week we'll pick up with analysis of variance and keep building upon it. Because like I said, we can eventually build upon it to the point where we get to fixed effects models, random effects models, we'll look into survival functions and see if we can't tie those in.

408
01:20:10,000 --> 01:20:23,000
And we're going to keep, keep making some awesome insights. As I teased, we've got some real interesting work next week. So definitely tune in.

409
01:20:23,000 --> 01:20:41,000
I think we're going to have a good time and make some good discoveries.

