1
00:00:00,000 --> 00:00:09,880
My name is Keegan. I founded a company, Cannlytics. We primarily help out laboratories that test

2
00:00:09,880 --> 00:00:15,840
cannabis. And we'd love to hear about what you do in the space, or what you're interested in within

3
00:00:15,840 --> 00:00:19,240
the space so we can address some of your interests.

4
00:00:19,240 --> 00:00:25,680
Hi, everyone. Like you mentioned, my name is Jorge. You can call me George if you like.

5
00:00:25,680 --> 00:00:36,480
I've actually been involved one way or another since the year 2002. I have a bachelor's in computer

6
00:00:36,480 --> 00:00:44,160
information systems. I'm from New York, and I've been a cannabis user since the age of

7
00:00:44,160 --> 00:00:53,400
16, I would say. I'm waiting for the permit to come out here, and I'm excited to learn as much

8
00:00:53,400 --> 00:01:01,080
as I can. I've already incorporated and I'm ready to open a dispensary here in the state, and I'm

9
00:01:01,080 --> 00:01:06,120
looking forward to the new adventure we're going to have here as New Yorkers

10
00:01:06,120 --> 00:01:11,760
once this is all done and settled. I just caught up a little bit on your

11
00:01:11,760 --> 00:01:18,840
previous lecture. I just finished watching that video, and it's very interesting how you

12
00:01:18,840 --> 00:01:29,640
use the programs in your GitHub repo. I was trying to follow along, and

13
00:01:29,640 --> 00:01:34,760
I cloned the repo. So I'm going to see if I can follow along and teach myself how to

14
00:01:34,760 --> 00:01:38,960
use it a little better. But it's great. From what I've seen so far, I love what you're

15
00:01:38,960 --> 00:01:42,920
doing. Again, hi everyone. I'm excited to be here.

16
00:01:42,920 --> 00:01:51,840
Hi, Jorge. Awesome, Jorge. I love that you found the repository and the videos. So by all means,

17
00:01:51,840 --> 00:01:59,400
the source code's there for you to use however you can get value out of it. And as you said,

18
00:01:59,400 --> 00:02:05,840
you're up in New York. We've been looking at data in Massachusetts, and today you're

19
00:02:05,840 --> 00:02:11,920
in luck: at the very least, I can share with you

20
00:02:11,920 --> 00:02:21,280
data that I collected from Illinois, so we can finally do an interstate comparison and

21
00:02:21,280 --> 00:02:28,240
we can compare Massachusetts to Illinois. That's so interesting. Let me introduce myself.

22
00:02:28,240 --> 00:02:35,120
I'm also in the same boat as you. I'm very interested in cannabis and data, and I mean,

23
00:02:35,120 --> 00:02:40,360
I love both. And I think, you know, if you're a user, it's very important to inform yourself

24
00:02:40,360 --> 00:02:45,360
and to be part of this time when so many things are being discovered about cannabis

25
00:02:45,360 --> 00:02:50,520
based on data, right? So important. So yeah, I just wanted to introduce myself.

26
00:02:50,520 --> 00:02:57,760
Professionally, I'm mainly a software developer, but I consider myself a data lover, so I do

27
00:02:57,760 --> 00:03:02,240
a little of everything. So yeah. Oh, and my name is Carolina.

28
00:03:02,240 --> 00:03:08,520
Long story short, we needed some data from other states. And as ambitious as I was about getting

29
00:03:08,520 --> 00:03:12,800
all the states, and we will get them eventually, we had to start with

30
00:03:12,800 --> 00:03:25,520
one. So, as we were looking at last Wednesday, these are the statistics that we began to

31
00:03:25,520 --> 00:03:32,280
look at. And, as Marjana was brilliant enough to point out...

32
00:03:32,280 --> 00:03:39,240
Hey, I didn't introduce myself yet. Yeah. Anyway, I've been to

33
00:03:39,240 --> 00:03:47,720
about two of Keegan's Cannlytics data science presentations. I was in academic research

34
00:03:47,720 --> 00:03:54,360
for 10 years, and then during the pandemic I switched fields. I was done with being in the

35
00:03:54,360 --> 00:04:03,360
lab, and now I primarily work in software development. I also love data, just like Carolina.

36
00:04:03,360 --> 00:04:08,200
So that's great. And I love cannabis too. I'm from Michigan.

37
00:04:08,200 --> 00:04:15,520
Wow. Okay, great. Well, yeah, I'm with you. I didn't go into detail because of

38
00:04:15,520 --> 00:04:21,920
time; I didn't want to hog all the time. But yeah, about my situation, just to go a little

39
00:04:21,920 --> 00:04:26,520
deeper: I've been in the industry since 2003, one way or another,

40
00:04:26,520 --> 00:04:34,880
you know, and unfortunately, my path was not the same. I ended up... you know, I

41
00:04:34,880 --> 00:04:39,720
don't like to go into this a lot. But yeah, the social justice thing affected me early

42
00:04:39,720 --> 00:04:45,560
on as soon as I finished college, and I didn't have many opportunities as a new grad. So

43
00:04:45,560 --> 00:04:51,320
I took another path. Here in New York, I ended up running my own car service

44
00:04:51,320 --> 00:04:59,040
business for a few years, and I've been in different fields for many years. But because

45
00:04:59,040 --> 00:05:06,080
of the passion I have for technology, I've always kept up with it in every single field.

46
00:05:06,080 --> 00:05:11,120
You know, I've built apps in React, and I've built things in different languages, Python

47
00:05:11,120 --> 00:05:19,480
and so on. And for a person like myself, who has been affected, who has been, you know, criminalized

48
00:05:19,480 --> 00:05:24,440
for things like marijuana that today, you know, we see are not as big

49
00:05:24,440 --> 00:05:30,720
of a deal as we grew up thinking, fearing, you know, the consequences and prosecution

50
00:05:30,720 --> 00:05:37,640
for using it as a young man, a brown Latino man. And now, seeing how the world is

51
00:05:37,640 --> 00:05:43,320
changing, I personally never thought I would live to see this day. So I'm very passionate

52
00:05:43,320 --> 00:05:51,800
about, you know, hoping, you know, to be an early retail business owner here in New York.

53
00:05:51,800 --> 00:05:58,840
And I'm also seeing how I can apply my knowledge of web design and data analytics, and

54
00:05:58,840 --> 00:06:05,160
what I can learn here, to maybe help Keegan input some data from New York as soon as we

55
00:06:05,160 --> 00:06:09,800
start, as soon as they start offering these licenses. So I'm just waiting. Every

56
00:06:09,800 --> 00:06:16,480
week I call the Office of Cannabis Management in New York, hoping that, you know, I could get a

57
00:06:16,480 --> 00:06:20,640
license. But yeah, we're still waiting day by day. So hopefully soon I can give you guys

58
00:06:20,640 --> 00:06:24,080
more information. So Keegan, with that, I'll cut off.

59
00:06:24,080 --> 00:06:29,520
That's awesome, Jorge. I came in at the tail end of it, and I heard that you have

60
00:06:29,520 --> 00:06:37,080
experience with React. And I love your mission, and I share your dream

61
00:06:37,080 --> 00:06:44,480
too. Long story short, I'd love for you to get value from Cannlytics. And like

62
00:06:44,480 --> 00:06:50,280
you said, there may be ways that you can help out too. For example, React can

63
00:06:50,280 --> 00:06:56,720
be quite useful for the user interfaces. And we're always trying to think of new innovative

64
00:06:56,720 --> 00:07:02,160
ways to get data to and from places. So that's awesome.

65
00:07:02,160 --> 00:07:05,640
Oh, well, thanks for sharing that. And just to add a little something, actually:

66
00:07:05,640 --> 00:07:13,920
right now I'm really digging deep into the blockchain, building DApps in Solidity.

67
00:07:13,920 --> 00:07:19,280
And from what I caught of your previous week, you know, I just finished

68
00:07:19,280 --> 00:07:25,480
watching the full video of last Wednesday's lecture, I think it was. I can really see

69
00:07:25,480 --> 00:07:31,320
it being useful to have this data somehow on the blockchain and then having some sort

70
00:07:31,320 --> 00:07:39,160
of live API or DApp where you could constantly see every state as it starts legalizing and,

71
00:07:39,160 --> 00:07:45,000
you know, giving out more public data. I could really see it being useful to make it a public

72
00:07:45,000 --> 00:07:49,440
record, and I think that's where the blockchain might be useful. So I wanted to add that

73
00:07:49,440 --> 00:07:53,160
in there because maybe that could be of help as well.

74
00:07:53,160 --> 00:07:55,160
That's really cool, Jorge. Thank you for sharing that.

75
00:07:55,160 --> 00:07:58,760
I think you're onto something there, Jorge. And I would say try attending some

76
00:07:58,760 --> 00:08:05,040
of these public meetings, because I think a lot of the commissions are looking at ways

77
00:08:05,040 --> 00:08:11,600
to make the data more traceable, more transparent, right? Because we're all concerned about data

78
00:08:11,600 --> 00:08:19,920
authenticity, and data availability. So I think your insights could

79
00:08:19,920 --> 00:08:29,720
add a lot. Well, unless anyone else has anything to chime in about, I could start

80
00:08:29,720 --> 00:08:36,520
to try to show you some data as long as the internet's stable. Does anybody else have

81
00:08:36,520 --> 00:08:45,000
any thoughts, comments, ideas? There's Illinois data here. Unfortunately, it's in PDFs, which

82
00:08:45,000 --> 00:08:55,560
can be a little tough to extract from. So long story short, I tried to work

83
00:08:55,560 --> 00:09:05,440
with a few tools to just try to extract the text. And so that's still a work in progress.
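A minimal sketch of the parsing step, assuming the raw text has already been pulled out of the PDF (for example with a library such as pdfplumber; the talk doesn't say which tool was used). The row layout here, a licensee name, a city, and an MM/DD/YYYY issue date separated by runs of two or more spaces, is hypothetical; the real Illinois PDFs will need their own pattern.

```python
import re
from datetime import datetime

# Hypothetical row layout: "<licensee name>  <city>  <MM/DD/YYYY>", with fields
# separated by two or more spaces. Adjust to the real PDF text.
ROW_PATTERN = re.compile(
    r"^(?P<name>.+?)\s{2,}(?P<city>.+?)\s{2,}(?P<issued>\d{2}/\d{2}/\d{4})$"
)


def parse_licensee_rows(text: str) -> list[dict]:
    """Turn text extracted from a licensee PDF into records with an issue date."""
    records = []
    for line in text.splitlines():
        match = ROW_PATTERN.match(line.strip())
        if match is None:
            continue  # headers, footers, page numbers, wrapped lines
        records.append(
            {
                "name": match["name"],
                "city": match["city"],
                "issue_date": datetime.strptime(match["issued"], "%m/%d/%Y").date(),
            }
        )
    return records
```

Lines that don't match, such as headers and page numbers, are skipped rather than guessed at, so it's worth checking the record count against the PDF's own total.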

84
00:09:05,440 --> 00:09:18,760
But I was at least able to get the data with a relatively manual process, with the help

85
00:09:18,760 --> 00:09:25,360
of Python a little bit, and I can share with you the script I wrote to get this data. And

86
00:09:25,360 --> 00:09:36,400
so here are the Illinois licensees that are licensed for retail. And so we were

87
00:09:36,400 --> 00:09:45,600
looking at retail over the past few weeks, so it would be interesting to compare, say,

88
00:09:45,600 --> 00:09:55,360
retail in Massachusetts to Illinois. They're both on this end of the curve, but we can

89
00:09:55,360 --> 00:10:02,840
still compare the two. One dispensary per 100,000 people.

90
00:10:02,840 --> 00:10:11,760
Massachusetts: 1.5. For Jorge, these are states that are relevant to you, because

91
00:10:11,760 --> 00:10:22,320
Massachusetts is right next door. So that could be a good comparison state. And so we're

92
00:10:22,320 --> 00:10:29,560
going to introduce some statistics. One thing you'll learn is: good data in, and you

93
00:10:29,560 --> 00:10:37,600
can get away with a simpler model. Often, if you've got really nice data, you don't necessarily

94
00:10:37,600 --> 00:10:48,080
have to have the most elegant model out there. So if you can get it, panel data adds so much

95
00:10:48,080 --> 00:10:56,340
power to your statistics. There are two awesome models, the fixed effects and the random

96
00:10:56,340 --> 00:11:04,620
effects models, where we're essentially adding another dimension. So we've been looking at

97
00:11:04,620 --> 00:11:12,480
data over time. And now we're going to say that data can also vary by this individual

98
00:11:12,480 --> 00:11:22,040
dimension, which we'll call the state. So we'll have Illinois as one state, i = 1, and

99
00:11:22,040 --> 00:11:30,120
then Massachusetts as the other. And then, you know, I just got some things from Wikipedia

100
00:11:30,120 --> 00:11:44,360
here. So long story short, check out Wikipedia or a better source here. But the long story

101
00:11:44,360 --> 00:11:57,400
short is: if you expect the state-specific part of the error term to be uncorrelated with your independent variable,

102
00:11:57,400 --> 00:12:09,460
so in our case, the independent variable is retailers per capita, and we think this effect

103
00:12:09,460 --> 00:12:23,680
is coming from the state. So if this effect is independent

104
00:12:23,680 --> 00:12:34,720
from the effect of the retailers, it's useful to look at this line. We know that

105
00:12:34,720 --> 00:12:45,800
the states are different. But if the way they're different also affects the relationship, then

106
00:12:45,800 --> 00:12:54,600
we need a fixed effects model. If the way the states are different doesn't really affect

107
00:12:54,600 --> 00:13:04,760
this relationship, then we're better off using the random effects model. And so to give you

108
00:13:04,760 --> 00:13:21,800
an example of how I think the error may be correlated, take California.

109
00:13:21,800 --> 00:13:28,600
There's an unmeasured variable there, the black market, where California has a larger

110
00:13:28,600 --> 00:13:38,840
black market than Massachusetts. So that can be captured by either this fixed effect or

111
00:13:38,840 --> 00:13:51,280
a random effect. But will that variable, the size of the black market, affect

112
00:13:51,280 --> 00:14:04,640
the effect of increasing retailers on sales per retailer? And so that's sort of the argument

113
00:14:04,640 --> 00:14:10,640
you have to make when you're justifying which of these models you're going to use, because

114
00:14:10,640 --> 00:14:17,280
it's essentially an assumption: whether you're assuming that this, what they call a latent

115
00:14:17,280 --> 00:14:23,200
variable, which is just some unmeasured variable, is correlated with your independent variable

116
00:14:23,200 --> 00:14:31,560
or not. So one could argue that, oh, if there's a large amount of black market sales, then

117
00:14:31,560 --> 00:14:41,300
that could affect sales per dispensary in some way, either positively or negatively.
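To make the distinction concrete, here is one standard textbook way to write the model being described; the notation is mine, not from the talk. Let y_it be sales per retailer in state i at time t, x_it retailers per 100,000 people, mu a common intercept, alpha_i an unmeasured state effect (such as the size of the black market), and epsilon_it the remaining error:

```latex
\begin{aligned}
y_{it} &= \mu + \beta\, x_{it} + \alpha_i + \varepsilon_{it} \\
\text{Fixed effects: } \operatorname{Cov}(\alpha_i, x_{it}) \neq 0 \text{ allowed}
  &\;\Rightarrow\; y_{it} - \bar{y}_i = \beta\,(x_{it} - \bar{x}_i) + (\varepsilon_{it} - \bar{\varepsilon}_i) \\
\text{Random effects: } \operatorname{Cov}(\alpha_i, x_{it}) = 0 \text{ assumed}
  &\;\Rightarrow\; y_{it} = \mu + \beta\, x_{it} + u_{it}, \quad u_{it} = \alpha_i + \varepsilon_{it}
\end{aligned}
```

Demeaning within each state removes alpha_i whatever its correlation with x_it, which is why fixed effects is the safer choice when the black-market story is plausible; random effects is more efficient when the zero-covariance assumption actually holds.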

118
00:14:41,300 --> 00:14:48,520
So long story short, you could argue that the fixed effects model may be better. But

119
00:14:48,520 --> 00:15:00,240
that's just essentially a story that I'm telling. So it really depends on your use case here.

120
00:15:00,240 --> 00:15:05,840
If you're presenting this to a room full of economists or statisticians, they'll start

121
00:15:05,840 --> 00:15:16,640
asking you about latent variables that they think up. So you just have to

122
00:15:16,640 --> 00:15:25,120
kind of think about how these factors may or may not be related to this relationship.
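To see what the fixed effects model does mechanically, here is a minimal, dependency-free sketch of the within (demeaning) estimator for a single regressor. It's an illustration, not the code from the talk: it returns only the slope (no standard errors), and a random effects fit, which needs a GLS step, isn't shown. Real work would use a panel-data library.

```python
from collections import defaultdict


def within_estimator(panel: list[tuple[str, float, float]]) -> float:
    """Fixed effects (within) slope for y = beta * x + alpha_state + error.

    panel: rows of (state, x, y). Demeaning x and y inside each state
    removes alpha_state, however it correlates with x.
    """
    by_state = defaultdict(list)  # group observations by state
    for state, x, y in panel:
        by_state[state].append((x, y))

    numerator = 0.0
    denominator = 0.0
    for rows in by_state.values():
        x_mean = sum(x for x, _ in rows) / len(rows)
        y_mean = sum(y for _, y in rows) / len(rows)
        for x, y in rows:
            dx, dy = x - x_mean, y - y_mean
            numerator += dx * dy
            denominator += dx * dx
    if denominator == 0:
        raise ValueError("x never varies within a state; beta is not identified")
    return numerator / denominator


if __name__ == "__main__":
    # Toy panel (made-up numbers): y = 2x + state effect, and the state with
    # more retailers per capita has the lower baseline.
    toy = [("state_a", 0.2, 10.4), ("state_a", 0.4, 10.8), ("state_a", 0.6, 11.2),
           ("state_b", 1.0, -3.0), ("state_b", 1.5, -2.0), ("state_b", 2.0, -1.0)]
    print(round(within_estimator(toy), 6))  # 2.0
```

On this toy panel, an ordinary regression that ignores the state would report a negative slope, because the between-state difference in baselines swamps the within-state relationship; the within estimator recovers the slope of 2.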

123
00:15:25,120 --> 00:15:32,360
So like I said, that's probably a poor explanation of the random effects or the fixed effects

124
00:15:32,360 --> 00:15:40,640
models. But I just wanted to try to introduce them today. And you may just have to pardon

125
00:15:40,640 --> 00:15:48,200
my poor introduction, and I'll try to iron it out for next week. And then we'll just

126
00:15:48,200 --> 00:15:55,760
make up for it with good data. Without further ado, let's look at the Illinois data, and

127
00:15:55,760 --> 00:16:04,640
then we can look at Massachusetts, and then we'll look at them together. So just grabbing

128
00:16:04,640 --> 00:16:12,520
some fairly common Python packages plus a couple of utility scripts I wrote. So this

129
00:16:12,520 --> 00:16:19,640
is the cleaning, which was mostly done manually. A critical data point, in my opinion,

130
00:16:19,640 --> 00:16:27,280
is the issue date. So this is cool, because we can now figure out approximately when the

131
00:16:27,280 --> 00:16:42,080
retailers began to operate. So let's do exactly that. Keep in mind, the two statistics

132
00:16:42,080 --> 00:16:56,280
we're trying to get here are retailers per 100,000 people and sales per dispensary. A little

133
00:16:56,280 --> 00:17:07,880
out of order here, but I'm going to go ahead and get Illinois's population from the Federal

134
00:17:07,880 --> 00:17:18,640
Reserve's FRED database. In the meantime, I've just read in the population for Illinois.
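A minimal sketch for pulling a state's population from FRED with only the standard library. The series ID `ILPOP` (resident population of Illinois, reported in thousands of persons) and the `fredgraph.csv` download URL are assumptions to verify on the FRED site; the parser skips the header row rather than relying on its column names.

```python
import csv
import io
from urllib.request import urlopen

# Assumed CSV endpoint; verify on fred.stlouisfed.org before relying on it.
FRED_CSV_URL = "https://fred.stlouisfed.org/graph/fredgraph.csv?id={series_id}"


def parse_fred_csv(text: str, scale: float = 1000.0) -> dict[int, float]:
    """Map year -> value from a two-column FRED CSV (date, value).

    `scale` converts units, e.g. thousands of persons -> persons.
    Missing observations, written as "." by FRED, are dropped.
    """
    reader = csv.reader(io.StringIO(text))
    next(reader, None)  # skip the header row
    values = {}
    for row in reader:
        if len(row) < 2 or row[1] in ("", "."):
            continue
        year = int(row[0][:4])  # dates look like YYYY-MM-DD
        values[year] = float(row[1]) * scale
    return values


def fetch_fred_series(series_id: str = "ILPOP") -> dict[int, float]:
    """Download and parse a FRED series (requires network access)."""
    with urlopen(FRED_CSV_URL.format(series_id=series_id)) as response:
        return parse_fred_csv(response.read().decode("utf-8"))
```

Keeping the download and the parsing separate makes the parser easy to check offline against a saved CSV.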

135
00:17:18,640 --> 00:17:31,160
So we now have Illinois's population for 2020 and 2021. I also just manually

136
00:17:31,160 --> 00:17:42,160
parsed the Illinois sales data. Once again, we need a good automated way to do this, but

137
00:17:42,160 --> 00:17:55,360
at the moment it's in a PDF, so that makes it tricky. But we have Illinois sales, and

138
00:17:55,360 --> 00:18:03,520
so we're interested in a couple of things here, right? Sales per retailer and retailers per

139
00:18:03,520 --> 00:18:14,680
capita. So what we can do is, well, let's find out how many retailers there were at

140
00:18:14,680 --> 00:18:25,040
each point in time. So here are sales over time. Well, we know when each of these licensees

141
00:18:25,040 --> 00:18:32,120
opened. So similar to how we approach things in Massachusetts, we'll say the retailers

142
00:18:32,120 --> 00:18:41,480
began operating when they got their license issued. So that way we can try to get a measure

143
00:18:41,480 --> 00:18:52,800
of when retailers began to operate over time. So this is an interesting statistic, right?
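The rule just described, that a retailer counts as operating from its license issue date onward, can be sketched with a sorted list and a binary search. This assumes no closures and that each sales figure is keyed by the date it's measured at (say, the first of the month); both are simplifications, and the function names are illustrative.

```python
from bisect import bisect_right
from datetime import date


def retailers_open(issue_dates: list[date], on: date) -> int:
    """Count licensees whose license was issued on or before `on`.

    Assumes a retailer starts operating on its issue date and never closes.
    """
    return bisect_right(sorted(issue_dates), on)


def sales_per_retailer(monthly_sales: dict[date, float], issue_dates: list[date]) -> dict[date, float]:
    """Divide each period's sales by the number of retailers open at that date."""
    ordered = sorted(issue_dates)
    result = {}
    for period, sales in sorted(monthly_sales.items()):
        count = bisect_right(ordered, period)
        if count:  # skip periods before any retailer was licensed
            result[period] = sales / count
    return result
```

Using `bisect_right` means a license issued on the measurement date itself is counted as open that period.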

144
00:18:52,800 --> 00:19:01,040
These may be among the first; like I said, there may have been some entrepreneurial people out

145
00:19:01,040 --> 00:19:12,240
there who have already calculated these statistics, but I'm not certain that they have. Maybe

146
00:19:12,240 --> 00:19:18,560
we should be humble. So maybe there are people out there who have calculated these statistics,

147
00:19:18,560 --> 00:19:25,120
but we can do them in an open-source manner. And now we can actually get a count of

148
00:19:25,120 --> 00:19:32,400
these retailers over time, which is helpful information, because just because there are

149
00:19:32,400 --> 00:19:47,040
110 today doesn't mean there always were; it took a year and a half for Illinois to

150
00:19:47,040 --> 00:20:03,640
get to 110. So we can now get a nice calculation of sales per retailer. So let's just calculate

151
00:20:03,640 --> 00:20:11,080
both of these real quick. Remember, we're comparing these statistics to statistics that

152
00:20:11,080 --> 00:20:20,640
were calculated in a technical memorandum commissioned by Nevada. So we're looking for

153
00:20:20,640 --> 00:20:36,560
one dispensary per 100,000 people and $10.8 million in annual revenue per dispensary. So we can

154
00:20:36,560 --> 00:20:44,200
start to calculate some of these statistics here. So just to grab some of this, it's already

155
00:20:44,200 --> 00:20:57,360
been done. We can calculate the average 2020 sales. So that's going to be the total sales

156
00:20:57,360 --> 00:21:06,920
here in Illinois. That's interesting. So we calculated that in 2020 there was only 670 million

157
00:21:06,920 --> 00:21:21,040
in sales per retailer. I'm going to have to think more about this statistic. That doesn't

158
00:21:21,040 --> 00:21:30,560
seem right, though, for some reason. The main thing of value today is the actual data, which

159
00:21:30,560 --> 00:21:36,840
I'll email you after the presentation. But we can at least power through this and

160
00:21:36,840 --> 00:21:43,300
try to get some of these other statistics here. So the other statistic we were curious

161
00:21:43,300 --> 00:21:52,920
about was retailers per capita. Here we've plotted retailers per capita. And keep in

162
00:21:52,920 --> 00:22:02,820
mind, we're using the entire population of Illinois, whereas the technical memorandum

163
00:22:02,820 --> 00:22:09,720
used the adult population, which is a better metric. I just haven't found a source for

164
00:22:09,720 --> 00:22:16,800
this data yet. Once we find a good source for the adult population, we can compare apples to apples.
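The per-capita statistic itself is a one-liner; the only real decision is the denominator. A minimal sketch that leaves the adult-versus-total choice to the caller (the population figures in the example are placeholders, not real counts):

```python
def per_100k(count: float, population: float) -> float:
    """Rate per 100,000 people; pass the adult population to match the memorandum."""
    if population <= 0:
        raise ValueError("population must be positive")
    return count / population * 100_000


if __name__ == "__main__":
    retailers = 110
    total_population = 12_500_000  # placeholder, not a real count
    adult_population = 10_000_000  # placeholder, not a real count
    print(per_100k(retailers, total_population))  # total-population basis
    print(per_100k(retailers, adult_population))  # adult basis: larger rate
```

Because the adult population is smaller than the total, the same retailer count gives a higher rate on the adult basis, so the two shouldn't be mixed when comparing against the memorandum's one-per-100,000 benchmark.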

165
00:22:16,800 --> 00:22:26,880
Really, until 2021, there was less than half a dispensary per 100,000 people, or one

166
00:22:26,880 --> 00:22:37,640
dispensary per 200,000. So we've got this statistic, and we can continue to monitor

167
00:22:37,640 --> 00:22:47,040
the statistic over time. So this is cool, because we now have these two variables, and

168
00:22:47,040 --> 00:23:00,960
we can now redo this regression in Illinois. So that way we can basically zoom in on Illinois

169
00:23:00,960 --> 00:23:10,920
and not only that, but also add a whole other dimension. So we'll be adding the time dimension.

170
00:23:10,920 --> 00:23:33,640
So that's essentially what I mean by zooming in on Illinois.
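As a starting point for that Illinois-only regression over time, here is a minimal ordinary least squares sketch using only the standard library. It assumes the two series are already aligned period by period, and it ignores trends and autocorrelation, which monthly data usually has, so the slope is descriptive rather than causal.

```python
from statistics import fmean


def ols_fit(x: list[float], y: list[float]) -> tuple[float, float]:
    """Return (slope, intercept) of y = intercept + slope * x by least squares.

    x: retailers per 100,000 in each period; y: sales per retailer in the
    same periods, aligned by index.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two aligned series with at least two periods")
    x_mean, y_mean = fmean(x), fmean(y)
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x is constant; the slope is undefined")
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, y_mean - slope * x_mean
```

Once Massachusetts is added alongside Illinois, the same pair of series per state is what a fixed or random effects model would take as input.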

