1
00:00:00,000 --> 00:00:15,600
Yes, so just kicking things off today. So, Charles, so you've begun looking at the sales

2
00:00:15,600 --> 00:00:22,240
items. And so what's this you're saying about novel values?

3
00:00:22,240 --> 00:00:28,680
When I merge it with the valid lab results, they all turn out, no, but I think there's

4
00:00:28,680 --> 00:00:33,320
a typo somewhere because one of the columns is not getting renamed properly. And I need

5
00:00:33,320 --> 00:00:38,640
to figure out. I think that's causing the problem.

6
00:00:38,640 --> 00:00:48,240
Okay, we can talk about that today. So essentially mapping these things out. So it seems that

7
00:00:48,240 --> 00:00:59,800
the sales items just have the global inventory ID. And so then we have to then hunt down

8
00:00:59,800 --> 00:01:06,320
the inventory item. And then I think the inventory item has the lab result ID that we need.

9
00:01:06,320 --> 00:01:12,560
Well, actually, the lab results have the inventory ID.

10
00:01:12,560 --> 00:01:26,880
Exactly. So basically, we have to match them via the inventory item is the middle man,

11
00:01:26,880 --> 00:01:28,640
so to speak.

12
00:01:28,640 --> 00:01:38,400
Although I kind of remember there was something like that inventory ID doesn't match the batch

13
00:01:38,400 --> 00:01:43,320
ID. And if you have to go, maybe I had to go to the batch file and get the inventory

14
00:01:43,320 --> 00:01:47,120
ID from that in order to make it work.

15
00:01:47,120 --> 00:01:53,000
You're saying that file BAT?

16
00:01:53,000 --> 00:01:56,720
BATCH. It's a batch file. Pardon.

17
00:01:56,720 --> 00:02:01,120
Oh, they're batches. It's batches of, of Canada's.

18
00:02:01,120 --> 00:02:06,320
Oh, gotcha. Thought it was dot bat. I was like, okay, thanks.

19
00:02:06,320 --> 00:02:12,320
So we'll need to make a mental map of it. So to welcome Paul. So we're just talking

20
00:02:12,320 --> 00:02:21,280
about how we're planning on wrangling the sales data for the future. However, however,

21
00:02:21,280 --> 00:02:26,720
for today, Charles and Heather and Paul, if you're still on board, Paul's put together

22
00:02:26,720 --> 00:02:37,920
some groundbreaking research on cannabis dispensaries. So I was thinking that today would be a good

23
00:02:37,920 --> 00:02:43,920
platform for Paul to present. So maybe at the end, we can talk a bit more about wrangling

24
00:02:43,920 --> 00:02:48,960
these sales items. But what do you think, Paul, you want to present your market basket

25
00:02:48,960 --> 00:02:49,960
analysis?

26
00:02:49,960 --> 00:02:55,200
Sure. Yeah. What I'll do, sorry with you Keegan, is I'll go through some just highlights

27
00:02:55,200 --> 00:03:00,880
of the paper that I did. So there's some visuals that people, that people, Heather and Charles

28
00:03:00,880 --> 00:03:07,400
and you can look at. So I was going to start off just by conceptually telling you what

29
00:03:07,400 --> 00:03:13,560
market basket analysis is, and then go from there if that sounds okay.

30
00:03:13,560 --> 00:03:20,960
Definitely. So usually I'm up on my podium. So this week, let's, let's listen to your

31
00:03:20,960 --> 00:03:27,080
research. So I'm all ears and I've got a couple of questions and then, and anyone feel

32
00:03:27,080 --> 00:03:32,160
free to chime in if you've got questions at any point. So the floor is yours, Paul.

33
00:03:32,160 --> 00:03:37,560
All right. So I'm going to, just going to share out some information here. This is from

34
00:03:37,560 --> 00:03:48,120
my graduate project on market basket analysis used to identify interesting product combinations

35
00:03:48,120 --> 00:03:52,840
and dispensaries. So can everybody see my screen?

36
00:03:52,840 --> 00:03:53,840
Yes.

37
00:03:53,840 --> 00:04:01,720
Okay. All right. So market basket analysis has been around for a long time as a data

38
00:04:01,720 --> 00:04:10,600
mining technique. It started in the early 1990s when point of sale systems came on board

39
00:04:10,600 --> 00:04:15,840
where at the checkout counter, you could scan in the UPC code and it helped out the checkout

40
00:04:15,840 --> 00:04:22,320
person to speed up the process of tying up what you bought and also helped inventory

41
00:04:22,320 --> 00:04:27,000
control with everything that was in a, in a grocery store or retail store. So these

42
00:04:27,000 --> 00:04:33,160
systems really came into full swing in the, in the early nineties. That combined with

43
00:04:33,160 --> 00:04:40,040
the low cost of data storage really kind of set the stage for a perfect storm in the retail

44
00:04:40,040 --> 00:04:45,340
market for deeper analytics because all this information was being collected. So research

45
00:04:45,340 --> 00:04:53,440
was done by a couple of kind of groundbreaking scientists at the time in this area and they

46
00:04:53,440 --> 00:04:58,460
came up with this approach and it's called different terms, but probably the most popular

47
00:04:58,460 --> 00:05:05,680
term is market basket analysis. The idea behind market basket analysis is that if customers

48
00:05:05,680 --> 00:05:11,920
buy products in certain combinations, it can tell you a little bit of a story. So if you're,

49
00:05:11,920 --> 00:05:17,960
I got some examples here in the middle of the page here. So if you buy peanut butter

50
00:05:17,960 --> 00:05:24,160
at the grocery store, you may be likely to buy jelly. And something that's maybe even

51
00:05:24,160 --> 00:05:29,360
more interesting would be if in the third example, if you bought hot dogs and buns,

52
00:05:29,360 --> 00:05:35,120
you may be likely to buy ketchup. So you kind of get the idea that these interesting useful

53
00:05:35,120 --> 00:05:41,800
combinations start to emerge in the data. So I thought it'd be interesting to apply

54
00:05:41,800 --> 00:05:49,180
this kind of proven technique to a new market, which is cannabis retail sales and dispensaries

55
00:05:49,180 --> 00:05:55,880
to see if we could start seeing, you know, it's more of an exploratory analysis to see

56
00:05:55,880 --> 00:06:02,680
what kind of product groupings are starting to pop up. So before I could really do that,

57
00:06:02,680 --> 00:06:08,600
I reached out to Keegan and he was kind enough to, and Charles, both of you were kind enough

58
00:06:08,600 --> 00:06:15,560
to help me out and get me started with a data set. And that was the Michigan State seed

59
00:06:15,560 --> 00:06:25,060
to sale, Michigan State, I'm sorry, Washington State seed to sale data set. So I got those

60
00:06:25,060 --> 00:06:31,680
raw files and imported those up into the cloud. I had to do some data conversion and things

61
00:06:31,680 --> 00:06:37,280
to make them usable. And the files were very, very large, larger than what I'm typically

62
00:06:37,280 --> 00:06:43,280
used to working on, you know, on personal projects at work, I work on some large data

63
00:06:43,280 --> 00:06:45,720
sets, but this was this is pretty good size.

64
00:06:45,720 --> 00:06:54,840
Can I ask you how large were the files once they were unzipped? So these sales items,

65
00:06:54,840 --> 00:06:56,960
is that the data you were working with?

66
00:06:56,960 --> 00:07:03,400
Yeah, yeah. So some of the larger ones were about 15 gig. And of course, that was too

67
00:07:03,400 --> 00:07:09,720
way too large to work on my local laptop. They would just keep it just keep crashing.

68
00:07:09,720 --> 00:07:14,760
So that that information is out actually on Google Cloud. And I think I've got about a

69
00:07:14,760 --> 00:07:24,000
week's worth of free time left to to use those those those tables that I might export it

70
00:07:24,000 --> 00:07:30,760
up to the cloud. So if you guys are interested to try and take a look at those before the

71
00:07:30,760 --> 00:07:39,040
the free the free cloud, I guess, they give you like a 60 or 90 days worth of free time.

72
00:07:39,040 --> 00:07:42,600
Well, if you guys are interested in trying to get access that we can figure that out

73
00:07:42,600 --> 00:07:44,440
in case you want to poke around.

74
00:07:44,440 --> 00:07:51,000
Well, the task that I had kind of set Charles down this rabbit hole, and I cannot talk to

75
00:07:51,000 --> 00:07:58,360
you about it as well is there's this one data point that I really need, essentially, trying

76
00:07:58,360 --> 00:08:09,760
to answer the long question, does THC matter? And so that is the total sales per lab result.

77
00:08:09,760 --> 00:08:20,200
And so the algorithm that I sketched out, let me open it up so I can essentially read

78
00:08:20,200 --> 00:08:29,880
it verbatim. Because I basically wrote like an algorithm that Okay, so how can we go about

79
00:08:29,880 --> 00:08:32,080
getting these?

80
00:08:32,080 --> 00:08:42,120
Okay, so you and Charles are working with, with complete data sets, are you or are you?

81
00:08:42,120 --> 00:08:49,440
Well, essentially, we were just reading in the sales items chunk by chunk. And so you've

82
00:08:49,440 --> 00:08:59,600
got them in the cloud. But regardless, the logic is still similar. Given the sales items,

83
00:08:59,600 --> 00:09:12,520
each sale item has a global inventory ID. If you match the global inventory ID to the

84
00:09:12,520 --> 00:09:21,880
inventories, you can get the global lab result ID. And essentially, we would like to just

85
00:09:21,880 --> 00:09:33,240
iterate over all the sales items, and then add the price total for each sale item to

86
00:09:33,240 --> 00:09:37,800
running total for each global lab result ID.

87
00:09:37,800 --> 00:09:42,680
So let's go through those IDs one more time. Is this for the Washington State data? Is

88
00:09:42,680 --> 00:09:47,520
this for a different state? Yes, this is the there should be the same data set that you've

89
00:09:47,520 --> 00:09:54,160
been. Yeah, okay. So you have you have a sales item ID to a global inventory ID. And then

90
00:09:54,160 --> 00:10:05,160
the global inventory ID goes to just the what was the next one? It goes to the inventories.

91
00:10:05,160 --> 00:10:16,680
So there's a data set called inventories. And it has a global ID. Right. So that should

92
00:10:16,680 --> 00:10:27,160
be your connection there. And then the inventories data set has the global lab result ID. Okay,

93
00:10:27,160 --> 00:10:32,360
global lab result. And that would have to price that final lab result. Exactly. And

94
00:10:32,360 --> 00:10:36,240
so then if you're thinking about just like an Excel spreadsheet, essentially, I just

95
00:10:36,240 --> 00:10:43,400
would love one column, that's just global lab result ID. And then the other column is

96
00:10:43,400 --> 00:10:52,920
total sales. Okay, so lab result ID, plus the total sales column.

97
00:10:52,920 --> 00:10:58,680
And so essentially, what I'm trying to get after there is a couple things. One, that's,

98
00:10:58,680 --> 00:11:05,360
if you take the average of that, that's the that's the expected cost of failure. So that's

99
00:11:05,360 --> 00:11:14,760
an interesting metric. And then two, I want to run a regression of total sales on on two

100
00:11:14,760 --> 00:11:20,760
things one on cannabinoid concentration, to find out if there's a statistically significant

101
00:11:20,760 --> 00:11:30,680
relationship between say THC and sales, or even CBD and sales. And then also, well, I

102
00:11:30,680 --> 00:11:37,240
was actually talking to a purchasing manager recently, so someone who does purchasing for

103
00:11:37,240 --> 00:11:44,200
a retailer, a large retailer. And so they said in the early days, so they've been in

104
00:11:44,200 --> 00:11:49,080
the industry, maybe 12 years or so. And so they said in the early days, you know, your

105
00:11:49,080 --> 00:11:54,880
buddy would come over, and they would just, they would, they would manually just, they

106
00:11:54,880 --> 00:12:00,560
would just try a sample out. And that's how product was sold. These days, you know, the

107
00:12:00,560 --> 00:12:07,200
large retailer will, they'll typically get a, they'll get an Excel spreadsheet, and maybe,

108
00:12:07,200 --> 00:12:14,160
you know, 30 certificates of analysis. And then they need to select their purchases,

109
00:12:14,160 --> 00:12:22,400
based off of those certificate of analyses. And so, so it's really interesting, because,

110
00:12:22,400 --> 00:12:25,680
you know, they're not actually seeing the product hands on, they just have to go straight

111
00:12:25,680 --> 00:12:33,340
off the certificates. And so I asked this purchasing manager, okay, what are the variables

112
00:12:33,340 --> 00:12:40,480
that you look at on a certificate? And surprisingly, they said, you know, you look at your cannabinoids,

113
00:12:40,480 --> 00:12:47,280
and surprisingly, they said they look at the moisture content, because their producing

114
00:12:47,280 --> 00:12:57,720
manager was taking a moisture corrected adjustment of the cannabinoid concentrations. So you

115
00:12:57,720 --> 00:13:06,520
would divide by one plus the moisture content to try to be able to compare products with

116
00:13:06,520 --> 00:13:12,720
different moisture contents. So trying to, they're trying to normalize, normalize it

117
00:13:12,720 --> 00:13:22,640
so they can compare. Exactly. And so I'm interested to look at this question, because just anecdotally,

118
00:13:22,640 --> 00:13:30,640
in my personal belief is that maybe the lower quality cannabis may tend to be drier. And

119
00:13:30,640 --> 00:13:36,960
then the it's a tricky balance. I don't think you want like overly moist cannabis, but I

120
00:13:36,960 --> 00:13:44,400
don't think you want overly dry cannabis either. Yeah, it's not unlike the idea behind a humidor

121
00:13:44,400 --> 00:13:52,760
and cigars, right? Exactly. And so I'm curious, I'm just curious, what does the THC concentration

122
00:13:52,760 --> 00:13:59,240
even look like once you take the moisture correction? And two, if that introduces another

123
00:13:59,240 --> 00:14:06,320
interesting question is, okay, let's say, you know, we're asking the question, does

124
00:14:06,320 --> 00:14:14,920
THC predict sales? Well, we can also ask, which predicts sales better, just your THC

125
00:14:14,920 --> 00:14:21,960
content or your moisture corrected THC content? Right, or any other factor you might consider.

126
00:14:21,960 --> 00:14:29,120
Exactly. Yeah. And so at least starting with those two, you know, we could just do a regression

127
00:14:29,120 --> 00:14:36,760
and essentially, whichever regression has the high higher R squared, that that variable

128
00:14:36,760 --> 00:14:42,960
would do a better job of explaining total sales. So then you could tell purchasing managers,

129
00:14:42,960 --> 00:14:50,840
hey, you know, from the data, it's better to make your purchases based off of either

130
00:14:50,840 --> 00:14:56,880
the THC concentration or the moisture corrected THC concentration. Yeah, that's a valuable

131
00:14:56,880 --> 00:15:03,200
decision point, right? So I'll take a look at Keegan as far as pulling that data together.

132
00:15:03,200 --> 00:15:08,960
So just let me go through that one more time to make sure I've got the tables and fields

133
00:15:08,960 --> 00:15:14,480
knitted together correctly. So you got the sales item ID. And of course, the sale item

134
00:15:14,480 --> 00:15:20,640
has a global inventory ID is right? Is that right? Exactly. And you're all pumping in

135
00:15:20,640 --> 00:15:29,280
the chat and also send send it to you afterwards. Okay, and then I'll see what I can find for

136
00:15:29,280 --> 00:15:35,320
you and Charles as far as pulling that data out. And if it's, you know, depending on the

137
00:15:35,320 --> 00:15:41,880
size of the file that it generates, we can get that to you. Exactly. And it shouldn't

138
00:15:41,880 --> 00:15:46,700
the file size shouldn't be it may be a few gigabytes. So you may need to share via Google

139
00:15:46,700 --> 00:15:52,200
Drive or something. But it shouldn't be overly large, because it's just an observation for

140
00:15:52,200 --> 00:15:59,160
each lab result ID. Yeah. Okay. Yeah, I'd be glad to do that. Sorry to take up your

141
00:15:59,160 --> 00:16:08,840
time and send you on a wild goose chase. But that's the direction that my research is heading.

142
00:16:08,840 --> 00:16:14,560
So and then what possible we can look at it next week. Sure. Yeah, I what's great is because

143
00:16:14,560 --> 00:16:19,320
you're talking to so many people in the industry, you're finding out what's important to them.

144
00:16:19,320 --> 00:16:21,120
And that's what matters. So it's

145
00:16:21,120 --> 00:16:29,160
got a new guest joining here in one second. So

146
00:16:29,160 --> 00:16:35,360
welcome, Brad. Welcome to the cannabis data science meetup group.

147
00:16:35,360 --> 00:16:42,880
We're discussing cannabis sales at dispensaries. And Paul is presenting what's called a market

148
00:16:42,880 --> 00:16:49,280
basket analysis, which essentially looks at the breakdown of products that people buy

149
00:16:49,280 --> 00:17:00,480
at the same time. Okay. Welcome to have you. Thanks. Sorry, I'm late.

150
00:17:00,480 --> 00:17:06,640
So I was just going through some very generic examples of what market basket analysis, they're

151
00:17:06,640 --> 00:17:12,360
called associate association looks like in a grocery store setting, before going into

152
00:17:12,360 --> 00:17:20,040
what doing the same thing in a in a dispensary setting. So so there's a few examples here,

153
00:17:20,040 --> 00:17:24,680
I just was mentioning that if you went to a store, and lots of customers, you know,

154
00:17:24,680 --> 00:17:30,080
bought hot dogs and hot hot dog buns, then you could potentially infer that they would

155
00:17:30,080 --> 00:17:34,200
buy ketchup. And that's kind of what more market basket analysis is, right? It just

156
00:17:34,200 --> 00:17:38,760
has these different common sense associations. Sometimes they're not so common sense, you

157
00:17:38,760 --> 00:17:44,400
can find some interesting things, but these little association rules pop up in in retail

158
00:17:44,400 --> 00:17:50,800
sales. So on the project that I did, I looked at, I wanted to look at some good candidate

159
00:17:50,800 --> 00:17:58,600
dispensaries to run this analysis on. So part of my process was picking out three, what

160
00:17:58,600 --> 00:18:05,640
I what I decided was going to be three dispensaries to run this analysis on. And I did that. Let's

161
00:18:05,640 --> 00:18:14,240
see if I can come here. So the criteria for picking dispensaries to run my analysis on,

162
00:18:14,240 --> 00:18:20,600
there's three or four factors I was looking at. I was looking at dispensaries, the total

163
00:18:20,600 --> 00:18:26,640
number of sales transactions that a dispensary would have. So the higher the number of transactions,

164
00:18:26,640 --> 00:18:33,280
the better, because it would be potentially be a richer data set to look at. And along

165
00:18:33,280 --> 00:18:41,960
the same lines, total revenue, so those dispensaries that are just selling a lot. Also here in panel

166
00:18:41,960 --> 00:18:47,580
C, there's dispensaries with the distinct distinct product sold. So they've got a lot

167
00:18:47,580 --> 00:18:51,980
of variety that they're selling to their customers. And then probably one of the more interesting

168
00:18:51,980 --> 00:18:59,260
parts is I wanted to differentiate dispensaries by the the area that you can find them in

169
00:18:59,260 --> 00:19:05,920
based on the median income in their zip code area. So we have dispensaries that are in

170
00:19:05,920 --> 00:19:12,540
lower income areas and some in higher income areas. And I wanted to pick three dispensaries

171
00:19:12,540 --> 00:19:17,920
that represented that that span of median income. And if you look at this distribution

172
00:19:17,920 --> 00:19:26,960
here, we've got the count of dispensaries on our y axis, how many fell into these different

173
00:19:26,960 --> 00:19:33,280
levels of median income across the bottom here. And you kind of get this bimodal distribution

174
00:19:33,280 --> 00:19:38,600
here where you've got this big hump here in the lower median income bracket, and then

175
00:19:38,600 --> 00:19:44,960
more towards the the medium income bracket, you've got this other hump and then actually,

176
00:19:44,960 --> 00:19:50,040
I guess it's trimodal. So and then finally, you've got a higher income median income bracket

177
00:19:50,040 --> 00:19:56,000
where dispensaries are located. So I took that signal and broke it into these three

178
00:19:56,000 --> 00:20:00,440
different sections with these dotted lines, and decided that I was going to pick a dispensary

179
00:20:00,440 --> 00:20:05,120
out of each one of these three areas that I was going to run my market basket analysis

180
00:20:05,120 --> 00:20:14,680
on. So if you come to these, these three plots here, we've got every dot here represents

181
00:20:14,680 --> 00:20:21,360
a dispensary. And this is in the low median income bracket, less than 80k per year, median

182
00:20:21,360 --> 00:20:26,920
income. And then on the left hand side, we have the count of products that the dispensary

183
00:20:26,920 --> 00:20:35,160
selling across the x axis is the number of transactions. And then the size of the bubble

184
00:20:35,160 --> 00:20:42,480
is the amount of sales they're making. So we've got in this particular left hand plot

185
00:20:42,480 --> 00:20:48,200
here, we've got four variables, counter products, number of transactions, the median income

186
00:20:48,200 --> 00:20:53,880
bracket that's in and then the size of the bubble is how much money they're making. So

187
00:20:53,880 --> 00:21:03,840
looking at these three distributions by income level, I picked these three different dispensaries.

188
00:21:03,840 --> 00:21:13,440
And so DL means dispensary low DM is dispensary medium and DH is dispensary high. And I forgot

189
00:21:13,440 --> 00:21:18,720
where I put it in my paper, but they these dispensaries map to specific dispensaries

190
00:21:18,720 --> 00:21:28,200
in Washington State and I forget I think I think DL here is Zips dispensary and I forgot

191
00:21:28,200 --> 00:21:34,120
what the other two are called off the top of my head. So those were the three dispensaries

192
00:21:34,120 --> 00:21:42,680
that I picked to run my market basket analysis on. Here we go. Here they are. So the low

193
00:21:42,680 --> 00:21:49,480
income dispensary is Zips, the median is place called PRC and the high median income is a

194
00:21:49,480 --> 00:21:58,320
place called Green Theory Factoria. And I think Green Theory Factoria is in, is it called

195
00:21:58,320 --> 00:22:01,160
Bellevue? Kagan, I think they use Bellevue?

196
00:22:01,160 --> 00:22:04,840
Yes Bellevue. So that's where Microsoft is headquartered.

197
00:22:04,840 --> 00:22:10,480
So right, right. So kind of more of an up market area, I guess. So those are the three

198
00:22:10,480 --> 00:22:13,760
dispensaries. Now, go ahead.

199
00:22:13,760 --> 00:22:19,960
Not to throw you off track, I just sort of had a question while we're on the previous

200
00:22:19,960 --> 00:22:32,440
chart. So Kanekan, I heard, let's see, it was the president of Love Buds to Brett Harris

201
00:22:32,440 --> 00:22:40,520
and he was talking about, so he runs a retail chain and he was talking about, okay, so what

202
00:22:40,520 --> 00:22:48,600
does he do? He's constantly stocking new products and stocking a lot of products. So it looks

203
00:22:48,600 --> 00:22:57,320
like in looking from your charts, there may be a positive correlation between just the

204
00:22:57,320 --> 00:23:06,080
number of products you have in your transactions. Is that something that you saw or?

205
00:23:06,080 --> 00:23:10,640
So the number of the counter products and the number of transactions, like a positive

206
00:23:10,640 --> 00:23:15,000
correlation, it does seem to be that way, if that's what you're saying.

207
00:23:15,000 --> 00:23:24,280
Yeah. And I guess the bigger question would be essentially, do stores with the more variety,

208
00:23:24,280 --> 00:23:28,240
do they have higher sales on average?

209
00:23:28,240 --> 00:23:44,320
I see what you're saying. I guess not necessarily, when you say overall sales, not necessarily.

210
00:23:44,320 --> 00:23:48,960
So if you look at the size of the bubbles in these three charts, DL and DH are, I think,

211
00:23:48,960 --> 00:23:57,640
about the same size, so they have the same amount of income. And then you got the counter

212
00:23:57,640 --> 00:24:04,320
product. So that DH has less products, less number of transactions, but they are making,

213
00:24:04,320 --> 00:24:10,760
they must be charging a premium for what they're selling, I guess. Just my hypothesis.

214
00:24:10,760 --> 00:24:13,680
So you're saying DH?

215
00:24:13,680 --> 00:24:20,840
So DH in the high median income area, they have lower number of transactions, lower account

216
00:24:20,840 --> 00:24:26,560
of products, but the size of their bubble is not much different than DL. So in other

217
00:24:26,560 --> 00:24:31,720
words, they're making relatively the same amount of money, not too much difference.

218
00:24:31,720 --> 00:24:38,680
Okay, I see the size of the bubbles, their revenue. So one could argue that they're not

219
00:24:38,680 --> 00:24:48,800
in the same markets. So one could argue that DH is only competing against, well, it may

220
00:24:48,800 --> 00:24:53,680
not be competing against all the other ones. I guess these may not necessarily be in the

221
00:24:53,680 --> 00:24:55,440
same geographic region.

222
00:24:55,440 --> 00:25:02,400
No, I don't believe they are. And I can get you more information on that. But I think

223
00:25:02,400 --> 00:25:09,200
the DH is more probably a boutique, I would think more of a boutique type business. And

224
00:25:09,200 --> 00:25:13,080
they have more margin, they can charge higher prices just because that's what the market

225
00:25:13,080 --> 00:25:14,080
will bear.

226
00:25:14,080 --> 00:25:20,640
Can I just ask, are the number of transactions, the number of total transactions of people

227
00:25:20,640 --> 00:25:26,000
buying more, that would be per transaction that could make a higher revenue?

228
00:25:26,000 --> 00:25:32,680
Oh, I see you're saying so for every transaction, are they buying more products per transaction?

229
00:25:32,680 --> 00:25:35,920
Yeah, people buying $400 worth instead of $100 worth.

230
00:25:35,920 --> 00:25:40,240
Yeah, I guess that that could be possible. Yeah, that could be possible. And I actually

231
00:25:40,240 --> 00:25:47,440
didn't look into segmenting out the transactions by like products purchased in a specific transaction.

232
00:25:47,440 --> 00:25:51,280
I was looking just at dollar amounts of transaction, but it could be.

233
00:25:51,280 --> 00:25:57,040
So long story short, I'll let you continue Paul, I just wanted to bring up the fact that

234
00:25:57,040 --> 00:26:06,560
I think people that run retail establishments, they may want to stress variety. But at least

235
00:26:06,560 --> 00:26:09,200
that's what that manager was saying.

236
00:26:09,200 --> 00:26:16,240
Yeah, no, and thank you for the questions. Because whenever I'm presenting something,

237
00:26:16,240 --> 00:26:22,160
and people start digging and asking more questions, that means that we're kind of onto something,

238
00:26:22,160 --> 00:26:26,480
right? I mean, it seems to make sense. And if it generates more questions, then it probably

239
00:26:26,480 --> 00:26:32,680
has there's more value there to be had. So thanks. Thanks for that.

240
00:26:32,680 --> 00:26:38,680
So yeah, this is my kind of simple way of trying to find some good candidate dispensaries

241
00:26:38,680 --> 00:26:44,800
to run my market basket analysis on. So something to know about market basket analysis is there's

242
00:26:44,800 --> 00:26:50,960
a couple of different metrics that you use to determine how important your association

243
00:26:50,960 --> 00:26:58,880
rules are. So, you know, peanut butter jelly is likely to yield a bread purchase, that

244
00:26:58,880 --> 00:27:06,320
particular combination has a value has several different kinds of values attributed to it.

245
00:27:06,320 --> 00:27:11,360
In my analysis, I'm using three different metrics to determine how valuable those combinations

246
00:27:11,360 --> 00:27:17,960
are. The most basic one is called support. And that's on the x axis here. And support

247
00:27:17,960 --> 00:27:24,400
is considered kind of a popular popularity measure, right? So out of all the different

248
00:27:24,400 --> 00:27:28,640
rule combinations I could produce, some are going to rise to the top. All right. And it's

249
00:27:28,640 --> 00:27:36,960
a very simple, proportional metric. It's called support. There's another metric called confidence.

250
00:27:36,960 --> 00:27:41,560
And when you look at a rule, as we've seen before, you got peanut butter and jelly is

251
00:27:41,560 --> 00:27:46,720
likely to yield bread. You've got two parts of the rule. There's the peanut butter and

252
00:27:46,720 --> 00:27:53,520
jelly. And there's the bread. So you got the antecedent, which comes first and the consequent

253
00:27:53,520 --> 00:28:00,220
that comes second. So you've got two pieces. And the likelihood or the strength of association

254
00:28:00,220 --> 00:28:04,760
with the first part of the rule, going to the second part of the rule, and I hope this

255
00:28:04,760 --> 00:28:10,000
is making sense. It's kind of the tightness of association going in one direction. It's

256
00:28:10,000 --> 00:28:14,880
called the confidence. And that's what we've got here on the left-hand side. So you got

257
00:28:14,880 --> 00:28:24,480
popularity. You got strength of association in one direction. And then there's a final

258
00:28:24,480 --> 00:28:31,120
measurement called lift. And lift takes into account the strength of association in both

259
00:28:31,120 --> 00:28:35,920
directions. So if I have peanut butter, what's my strength of association with bread? And

260
00:28:35,920 --> 00:28:39,760
then if I have bread, what's my strength of association with peanut butter and jelly?

261
00:28:39,760 --> 00:28:49,240
I think I left jelly out the first part there. And that's a much stronger metric to use than

262
00:28:49,240 --> 00:28:54,600
confidence. So when we look at this chart here, we're really looking for those items

263
00:28:54,600 --> 00:29:03,000
that have the highest support and the highest lift. And lift is based on the darkness of

264
00:29:03,000 --> 00:29:09,240
the color and support is how far over to the right it is. We only see a few plot points

265
00:29:09,240 --> 00:29:14,080
here because there's literally thousands upon thousands of combinations. And we can set

266
00:29:14,080 --> 00:29:22,240
thresholds that screen out kind of the very numerous low interest combinations that have

267
00:29:22,240 --> 00:29:28,680
kind of low value for us. And we can set those thresholds for support and confidence. We

268
00:29:28,680 --> 00:29:32,280
can set those thresholds and only tease out those things that are kind of at the top of

269
00:29:32,280 --> 00:29:39,080
the pyramid, so to speak. And so that's what these scatter plots are here, just various

270
00:29:39,080 --> 00:29:45,540
association rule combinations that have kind of bubbled to the top. And I'm going to show

271
00:29:45,540 --> 00:29:51,840
you what those associations look like for the three dispensaries, the actual rules.

272
00:29:51,840 --> 00:30:00,960
So this is a bit of an eye chart. Let me see if I can zoom in a little bit here. So here

273
00:30:00,960 --> 00:30:11,600
is our top 10 association rules for the DL dispensary low median income. And we can see

274
00:30:11,600 --> 00:30:19,760
right out the gate here, you've got a wax called Chernobyl and another wax called pink.

275
00:30:19,760 --> 00:30:28,120
And those two together were fought in association with a wax called Starfighter. And you notice

276
00:30:28,120 --> 00:30:33,160
that all these rules down here, and these are all ranked by lift, that strength of association

277
00:30:33,160 --> 00:30:41,200
between the two rules back and forth. And a lift value starts at one and it can go up

278
00:30:41,200 --> 00:30:49,920
to, you know, I've seen them up into the 170s before. But something like around 75 is a pretty

279
00:30:49,920 --> 00:30:57,120
very reasonably strong association between these two halves of the rule. And so but we

280
00:30:57,120 --> 00:31:02,240
notice right off the gate, right off the gate here that all we're dealing with here are

281
00:31:02,240 --> 00:31:08,360
wax products, which I thought was, to me was not know much about the industry, was pretty

282
00:31:08,360 --> 00:31:16,680
interesting because I would have expected to see more kind of like a cross association

283
00:31:16,680 --> 00:31:21,880
with other types of products. But I was talking about some of these results with Keegan and

284
00:31:21,880 --> 00:31:28,280
it seems like just a first blush that the cannabis industry might be like the alcohol

285
00:31:28,280 --> 00:31:36,000
business right where you might be a beer drinker or a wine drinker or a spirits drinker. And

286
00:31:36,000 --> 00:31:40,920
you kind of stay in your swim line, you know, you stick with the things that you like. And

287
00:31:40,920 --> 00:31:45,360
so could the next table? Yeah, go ahead.

288
00:31:45,360 --> 00:31:52,080
Chime in for a second. Yeah. So I was just thinking, so I don't know. So if we harken

289
00:31:52,080 --> 00:32:00,680
back if a few months ago, we talked about inflation and we looked at, okay, in Oregon,

290
00:32:00,680 --> 00:32:06,120
we hypothesized, okay, what if we essentially said, okay, what if people buy a basket of

291
00:32:06,120 --> 00:32:14,800
goods and they buy 60% flour, 40% wax? So that was just an assumption we made that,

292
00:32:14,800 --> 00:32:21,440
okay, you know, people are, you know, spending 60% of their money on flour, 40% on concentrates.

293
00:32:21,440 --> 00:32:27,080
Well, that may just be the breakdown of the market, you know, flour, people may buy 100%

294
00:32:27,080 --> 00:32:33,680
flour, concentrate, people may buy 100% concentrate. So there really may not be people buying these

295
00:32:33,680 --> 00:32:41,240
baskets is essentially is that that's what I'm starting to take away. But yeah. But anyways,

296
00:32:41,240 --> 00:32:43,240
just gonna jump in real quick. I'll let you get.

297
00:32:43,240 --> 00:32:46,960
Yeah, no, that's, that's a good observation. And, you know, got to keep in mind, we're

298
00:32:46,960 --> 00:32:52,360
only looking at three stores here. And it'd be interesting to look at a lot more to try

299
00:32:52,360 --> 00:32:57,280
and get a sense of, of that. But it's you're right, it seems like that's kind of the direction

300
00:32:57,280 --> 00:33:05,400
this is pointing in. So the second table here is for the median income dispensary. And here

301
00:33:05,400 --> 00:33:13,320
we do see a little bit of a mix of different products. But when I say a mix, there's two

302
00:33:13,320 --> 00:33:17,920
different categories, there's blunts, and there's waxes. But again, they're not being

303
00:33:17,920 --> 00:33:25,600
bought in, in tandem together, they're blunts are with blunts and waxes are with waxes.

304
00:33:25,600 --> 00:33:31,760
And so we can see that these rules here seem to be consistent in your people staying in

305
00:33:31,760 --> 00:33:43,480
their own consumer swim lane. And then finally, with the high median income dispensary, there's

306
00:33:43,480 --> 00:33:52,080
a little more going on here. And the category changes over to consumables. Am I saying that

307
00:33:52,080 --> 00:33:59,140
right? Keegan consumables or so these are like, oh, actually, sorry, there's pre rolls

308
00:33:59,140 --> 00:34:06,840
in here as well. Sorry about that. So I think they mean, I mean, so I think we're seeing

309
00:34:06,840 --> 00:34:15,320
is it a few edibles. So there's like the panda panda fruit drops look like look like edibles

310
00:34:15,320 --> 00:34:25,440
same for the blueberry belt, apple rings. Yeah, there was a those sound a lot actually,

311
00:34:25,440 --> 00:34:31,000
and given that they say 100 milligrams, typically, that's a giveaway that you're working with

312
00:34:31,000 --> 00:34:39,160
an edible. Yes. Okay. Okay, that's good to know. So edibles and then pre rolls are in

313
00:34:39,160 --> 00:34:46,260
here as well. But again, if you look at the top rule, we got pre rolls or pre rolls, go

314
00:34:46,260 --> 00:34:51,760
down a few and you've got edibles with edibles. So it seems like people are staying still

315
00:34:51,760 --> 00:34:57,600
staying consistent in in purchasing the things that they like to get, you know, the same

316
00:34:57,600 --> 00:35:08,240
category things together. But those are higher end more expensive items. Okay, okay. And

317
00:35:08,240 --> 00:35:16,760
this is this is what I found too. I just found that edibles they're they're priced high for

318
00:35:16,760 --> 00:35:23,360
you know. And so it's Yeah, it's real interesting that you're finding that those are some of

319
00:35:23,360 --> 00:35:31,960
the top pairings at this is the dispensary and Bellevue. Yeah, yeah. And so this is what

320
00:35:31,960 --> 00:35:38,320
I mean, this is interesting in that, you know, I don't know if edibles cost more to produce.

321
00:35:38,320 --> 00:35:43,440
Therefore, they're just a higher selling product. But what's really interesting is the is the

322
00:35:43,440 --> 00:35:51,320
demographics because maybe there's more disposable income here, but also smoking in and of itself,

323
00:35:51,320 --> 00:35:55,200
and I'm going out on a limb here, but it you know, as time has gone by, especially over

324
00:35:55,200 --> 00:36:00,800
the last, you know, 2030 years, smoking has been kind of looked down upon as being obviously

325
00:36:00,800 --> 00:36:06,760
a less healthy lifestyle. But here you can still consume cannabis and not have to worry

326
00:36:06,760 --> 00:36:11,280
about the smoking aspect of it. But I mean, pre rolls are here, we see that as so popular,

327
00:36:11,280 --> 00:36:16,560
but can I just throw in an economic concept real quick, I'm not sure how it plays plays

328
00:36:16,560 --> 00:36:23,720
in. But essentially, the amount that you can raise prices depends on essentially the price

329
00:36:23,720 --> 00:36:33,120
elasticity that consumers have for a given good. So I'm thinking that edibles may have

330
00:36:33,120 --> 00:36:43,760
a fairly inelastic. They may be fairly inelastic in regards to price. So people who aren't

331
00:36:43,760 --> 00:36:49,240
going to buy edibles, they're just not going to buy them even if you mark them way down.

332
00:36:49,240 --> 00:36:53,560
And then the people who are going to buy them are still going to buy them even if you mark

333
00:36:53,560 --> 00:37:01,640
them way up. So that may be what we're seeing with edibles where there's just there's low

334
00:37:01,640 --> 00:37:09,400
elasticity of demand. So they're just raising the price. Yeah. Yeah, there's a lot of interesting

335
00:37:09,400 --> 00:37:14,360
about going through this, this work is just there's a lot of once you start seeing some

336
00:37:14,360 --> 00:37:19,320
of the results, it just your mind immediately wants to try and start filling in the the

337
00:37:19,320 --> 00:37:25,240
gaps in some of these things, which I think is really compelling, it'd be fun to do more

338
00:37:25,240 --> 00:37:32,720
work in this area. Or also, you know, work with with a dispensary with maybe their point

339
00:37:32,720 --> 00:37:37,000
of sale data set that they actually have from the store. And if they were willing to share

340
00:37:37,000 --> 00:37:43,760
that just kind of dive deep into their particular their particular market backs, basket analysis

341
00:37:43,760 --> 00:37:50,560
and do some more work in that area. But that is, go ahead. I just think it's so interesting

342
00:37:50,560 --> 00:37:58,720
seeing the the pre rules being at one of the top selling pairs at the Bellevue one. So

343
00:37:58,720 --> 00:38:03,920
maybe that's those are just people just going and getting a couple pre rolls. But that was

344
00:38:03,920 --> 00:38:11,080
that's unexpected to me. So yeah, anyways, I'll let you know.

345
00:38:11,080 --> 00:38:16,680
Sure. I'm almost done here. I just wanted to, you know, kind of fair warning with any

346
00:38:16,680 --> 00:38:22,600
kind of research product project. There are limitations here. And one of the limitations

347
00:38:22,600 --> 00:38:28,460
I ran into with the market basket analysis is that I'm doing a market basket analysis

348
00:38:28,460 --> 00:38:36,480
on these very specific product names. And typically do market basket analysis on categories

349
00:38:36,480 --> 00:38:47,000
of purchased items. So not necessarily skippy peanut butter, but just peanut butter, right.

350
00:38:47,000 --> 00:38:52,240
And what that does is if you have kind of intermediate naming convention for your different

351
00:38:52,240 --> 00:38:58,720
products, you tend to get a richer data set at the end of it. And for instance, the support

352
00:38:58,720 --> 00:39:04,320
values of these these combinations are very, very low, which means they're they're low

353
00:39:04,320 --> 00:39:10,320
low popularity compared to all the rope, the rules that are available. But that's because

354
00:39:10,320 --> 00:39:14,480
there's just so many rules, because there's so many different name products, and it just

355
00:39:14,480 --> 00:39:20,880
gets very, very granular, very noisy. So follow on work for this, I think there would need

356
00:39:20,880 --> 00:39:28,680
to be some naming convention or a taxonomy developed, where each of these different types

357
00:39:28,680 --> 00:39:34,680
of products like, you know, edibles and pre rolls actually fit into something that's not

358
00:39:34,680 --> 00:39:41,080
really just like an edible category, because that's too general, but something in between.

359
00:39:41,080 --> 00:39:46,600
So I think we could squeeze more value out of this by developing that taxonomy. And then,

360
00:39:46,600 --> 00:39:51,320
you know, approaching a customer and running through the analysis for them. So I just wanted

361
00:39:51,320 --> 00:39:52,320
to share that.

362
00:39:52,320 --> 00:40:02,000
I think. So in that vein, it may be one of the variables that you have to sort of chain

363
00:40:02,000 --> 00:40:09,280
in there. However, there should be essentially what's called an intermediate type for these

364
00:40:09,280 --> 00:40:10,280
these products.

365
00:40:10,280 --> 00:40:16,200
There there is that intermediate type, there's a I think there's like 24, some around 24

366
00:40:16,200 --> 00:40:22,800
different types that you can assign these products to. But we could definitely do an

367
00:40:22,800 --> 00:40:28,480
analysis to that level. But I still think that is that would be too high, or too broad

368
00:40:28,480 --> 00:40:29,480
or vague.

369
00:40:29,480 --> 00:40:34,000
Oh, okay. So I see what you're saying. So you're looking for something more granular

370
00:40:34,000 --> 00:40:40,480
than the intermediate types, but not quite so granular as these just the absolutely straight

371
00:40:40,480 --> 00:40:48,400
things. Yeah, and I think that's a value proposition for anybody that's going to be doing cannabis

372
00:40:48,400 --> 00:40:56,120
sales analytics, especially in kind of customer segmentation. And just this this type of technique

373
00:40:56,120 --> 00:41:02,200
is we'd have to develop some sort of, you know, nomenclature that we could use these

374
00:41:02,200 --> 00:41:05,240
products to fit into.

375
00:41:05,240 --> 00:41:11,760
And this is where, like you said, the retail establishment by retail establishment analysis

376
00:41:11,760 --> 00:41:16,840
may work well because different retailers may have different naming conventions. So for

377
00:41:16,840 --> 00:41:25,360
example, this is maybe to not granular enough, but most retail places, they typically mark

378
00:41:25,360 --> 00:41:34,420
flower as either hybrid, indica or sativa, just to kind of put them in three broad categories

379
00:41:34,420 --> 00:41:41,160
of flower. So that would be helpful. So for your market basket analysis, like I think

380
00:41:41,160 --> 00:41:48,800
it would be interesting. You could apply it just to flower. So like what I would expect

381
00:41:48,800 --> 00:41:56,040
is, you know, so it'd be interesting to see, okay, do people buy hybrids with sativas?

382
00:41:56,040 --> 00:42:00,240
Do they buy a mix of indica and sativa?

383
00:42:00,240 --> 00:42:08,760
Yeah, that's a great idea, Keegan. You know, some more work would have to be done, but

384
00:42:08,760 --> 00:42:16,560
it seems like people buy within their particular swim lane of product. So we could just pluck

385
00:42:16,560 --> 00:42:22,960
out a swim lane, recategorize the products that are in that particular swim lane and

386
00:42:22,960 --> 00:42:30,760
run an analysis within that. Yeah. So for flower, for edibles, for, you know, whatever.

387
00:42:30,760 --> 00:42:36,920
I'm not certain if they, I don't know as much about the concentrates. They may still do

388
00:42:36,920 --> 00:42:42,400
the hybrid sativa, indica dichotomy, or I guess not a dichotomy, but they may do that

389
00:42:42,400 --> 00:42:46,800
distinction with oils. I'm not certain they do though.

390
00:42:46,800 --> 00:42:48,800
They do. Yes.

391
00:42:48,800 --> 00:42:54,560
Oh, they do. So, okay, so Heather's chiming in. So, so yeah, so there's even that distinction

392
00:42:54,560 --> 00:42:56,720
within concentrates as well.

393
00:42:56,720 --> 00:42:57,720
Yeah.

394
00:42:57,720 --> 00:43:03,000
Yes, oils, concentrates, so RSOs and concentrates. Absolutely.

395
00:43:03,000 --> 00:43:09,480
So I wonder if it'd be worthwhile to pull out from the Washington state data, pull out

396
00:43:09,480 --> 00:43:16,160
the, those 24, 25 or whatever it is, different kinds of categories, send it out to the group

397
00:43:16,160 --> 00:43:21,360
and see if they have any recommendations for under each one of those categories, how they

398
00:43:21,360 --> 00:43:27,360
could be broken down into kind of a more granular description.

399
00:43:27,360 --> 00:43:36,960
Definitely. So that way we can get some better categories for you.

400
00:43:36,960 --> 00:43:41,800
Yeah. Yeah. So I'll do that. I'll send it out to the group and if anybody wants to chime

401
00:43:41,800 --> 00:43:49,720
in with their opinion, I'd be glad to see any kind of input that there is.

402
00:43:49,720 --> 00:43:57,200
But that's essentially it for the research I did here. Are there any other questions

403
00:43:57,200 --> 00:43:59,200
before I hand it back over to you Keegan?

404
00:43:59,200 --> 00:44:07,440
I had a quick question. Okay, so the global variables that you're referring to, that's

405
00:44:07,440 --> 00:44:13,920
global to your state where I can't pick up that ID and use that like a VIN anywhere in

406
00:44:13,920 --> 00:44:19,000
the US to be able to locate that. Maybe is that your point or is that, so you're saying

407
00:44:19,000 --> 00:44:21,240
global to your state?

408
00:44:21,240 --> 00:44:27,760
Well, so this is, this is interesting. So this is specific to Washington state, at least

409
00:44:27,760 --> 00:44:34,640
the IDs are. And so this is where each state is its own little island in the cannabis space.

410
00:44:34,640 --> 00:44:43,200
However, one could start to potentially draw some inferences. So you could see, oh, in

411
00:44:43,200 --> 00:44:51,720
Washington state, at this dispensary in Bellevue, they're primarily selling edibles. So maybe

412
00:44:51,720 --> 00:44:58,200
you could try to find like, like a sister city, so to speak of Bellevue in your given

413
00:44:58,200 --> 00:45:05,080
state. So okay, in Maryland, you know, where there are some of the, you know, higher income

414
00:45:05,080 --> 00:45:10,960
localities, and then they, those may be selling mostly edibles, or they may, so you can start

415
00:45:10,960 --> 00:45:18,280
to draw inferences, but it is still, they are still islands at this time.

416
00:45:18,280 --> 00:45:21,960
Thank you.

417
00:45:21,960 --> 00:45:29,240
At least that's my take. Is that essentially how you would explain things Paul?

418
00:45:29,240 --> 00:45:33,840
That seems reasonable to me.

419
00:45:33,840 --> 00:45:35,560
Okay.

420
00:45:35,560 --> 00:45:39,000
All right.

421
00:45:39,000 --> 00:45:45,200
My main question is kind of broad is just, do we just know like the percent, like the

422
00:45:45,200 --> 00:45:52,720
percentage breakdowns of these, like, well, I guess that's more just the intermediate

423
00:45:52,720 --> 00:45:58,480
types. So like, essentially, like, like how much concentrates being sold, how much flowers

424
00:45:58,480 --> 00:45:59,480
being sold?

425
00:45:59,480 --> 00:46:08,640
Yeah, I didn't, I didn't really look at, look at the percentages by those categories. But

426
00:46:08,640 --> 00:46:13,600
that's, that's a good, I mean, to your point, if you mentioned this several times when dealing

427
00:46:13,600 --> 00:46:18,800
with these data sets is getting basic summary statistics is the starting point really, isn't

428
00:46:18,800 --> 00:46:24,920
it? I mean, that should inform some of the questions and levels of kind of research that

429
00:46:24,920 --> 00:46:28,560
you want to head in. But for this particular project, I didn't actually do any of those

430
00:46:28,560 --> 00:46:29,560
summaries.

431
00:46:29,560 --> 00:46:35,240
It's okay. It's just sort of, you know, the state where we're at, you know, there's a

432
00:46:35,240 --> 00:46:42,960
lot of other industries, these summary statistics, people just sort of know offhand, oh, you

433
00:46:42,960 --> 00:46:49,480
know, like, you know, like, for example, at the grocery stores, they have a much better

434
00:46:49,480 --> 00:46:55,960
idea of, okay, what's the breakdown of people buying milk and things like that. So yeah,

435
00:46:55,960 --> 00:47:00,360
that's, yeah, they've had that stuff for years and years and inside now.

436
00:47:00,360 --> 00:47:05,680
So that's why the summary statistics are just so meaningful. And I've just found that if

437
00:47:05,680 --> 00:47:11,880
you just keep taking like conditional, conditional averages, you can get really far with that.

438
00:47:11,880 --> 00:47:19,320
So yeah, yeah, absolutely.

439
00:47:19,320 --> 00:47:28,120
So I think this is incredibly interesting. So unless anyone else wants to chime in, basically,

440
00:47:28,120 --> 00:47:36,480
my main takeaway is, is, and it kind of confirmed the sneaking suspicion I had was there's almost

441
00:47:36,480 --> 00:47:45,680
just different types of cannabis consumers. Yeah. And they don't cross. So, and so it

442
00:47:45,680 --> 00:47:52,400
may like, so for example, like the cross selling, it may be like futile. So, like, you know,

443
00:47:52,400 --> 00:48:00,240
if like you're like a flower person, it really may just be tough for anyone to convince you

444
00:48:00,240 --> 00:48:07,840
to spend money on a pre-roll or spend money on an edible or oils. Yeah. Don't waste your

445
00:48:07,840 --> 00:48:12,600
time trying to convert somebody that's not going to get converted. Yeah. So that was

446
00:48:12,600 --> 00:48:22,640
my main like takeaway that you can like act on today is upsell versus cross sell. If you're

447
00:48:22,640 --> 00:48:27,440
right. Yeah, that's, that's a great, that's a great way of putting it Keegan. Yeah. I

448
00:48:27,440 --> 00:48:31,720
didn't really think of it in those terms, but yeah, upsell as opposed to cross sell.

449
00:48:31,720 --> 00:48:36,600
Yeah. Yeah. If you're into edibles, you can, every, every form of the flower or the product

450
00:48:36,600 --> 00:48:43,520
may work for you. So if you consume edibles, the flower, the concentrate, RSOs, those will

451
00:48:43,520 --> 00:48:52,960
work for you. So edibles are sold sort of as an alternative to smoking. And so, and

452
00:48:52,960 --> 00:48:57,760
I believe that's why they sort of, you know, they, they get a premium for them. They also,

453
00:48:57,760 --> 00:49:02,240
I think are more expensive to produce. I'm sure they're more expensive to produce. Yes.

454
00:49:02,240 --> 00:49:09,720
Pre-rolls, I mean, somebody has to do the, well, I'm sure it's done by machine, you know,

455
00:49:09,720 --> 00:49:13,920
they're pre-rolled, but it is an extra step in the process. And it's probably geared more

456
00:49:13,920 --> 00:49:21,240
towards people who don't have like smoking paraphernalia or don't have the skills to roll

457
00:49:21,240 --> 00:49:25,960
their own and can afford to have someone else do it for them. Right. Or they just don't

458
00:49:25,960 --> 00:49:29,960
want to, they, they're not too thrilled about the smell, right? There's always that aspect

459
00:49:29,960 --> 00:49:40,640
to it as well. So. It's interesting. So I think the pre-rolls are worth looking at more.

460
00:49:40,640 --> 00:49:45,860
So that was one that always sort of flew under my radar. So I just never paid too much attention

461
00:49:45,860 --> 00:49:53,240
to those, but there was a talk at this latest CannaCon about pre-rolls and they're maybe

462
00:49:53,240 --> 00:50:01,880
more popular than I give them credit for. So that's another takeaway. Also, it's good

463
00:50:01,880 --> 00:50:07,600
to see you. Tafique. I was going to say, Hey, and introduce you to the group. I didn't want

464
00:50:07,600 --> 00:50:14,800
to interrupt Paul. So, so now. Yeah, I wasn't sure what a good point to do that was. But

465
00:50:14,800 --> 00:50:21,160
it's good to have you. So in YouTube, Brad. So I guess before we get out of here, you

466
00:50:21,160 --> 00:50:28,800
wouldn't mind maybe introducing yourselves just so we can get to know each other. Yeah,

467
00:50:28,800 --> 00:50:35,240
no problem. So currently I'm a student in a data science bootcamp right now going through

468
00:50:35,240 --> 00:50:42,760
with thankful I'm coming up on the last bit. So pretty much like final capstones, technical

469
00:50:42,760 --> 00:50:49,440
interviews like learning how to proceed through those. And then on the side, which is why

470
00:50:49,440 --> 00:50:56,120
I found this particular media very interesting is I work part time at the dispensary. So

471
00:50:56,120 --> 00:51:02,240
this was very, very interesting enough. Yeah. So I'm listening in a little bit more intensely

472
00:51:02,240 --> 00:51:08,520
to like how you guys are collecting data, how you guys are analyzing it. Because at

473
00:51:08,520 --> 00:51:13,840
least for me, like if I use cannabis, like I don't use pre rolls at all because they

474
00:51:13,840 --> 00:51:20,120
are more convenient. But I mean, the smell is a factor. Living situation is a factor.

475
00:51:20,120 --> 00:51:27,040
And then I think tolerance is also like a very nebulous concept as well to where if

476
00:51:27,040 --> 00:51:35,240
you are used to edibles or concentrate for instance, then maybe pre rolls don't hit as

477
00:51:35,240 --> 00:51:43,880
consistently. So, but yeah. That's cool Tawfiq that you're enrolled in a program and you

478
00:51:43,880 --> 00:51:48,120
work at a dispensary, you're kind of at a really cool nexus there in between those two

479
00:51:48,120 --> 00:51:54,480
overlaps. But if you have any like further questions about the stuff that I did here,

480
00:51:54,480 --> 00:51:58,600
because I just, this is part of my graduate program and this is part of my capstone. Well,

481
00:51:58,600 --> 00:52:03,640
this was my capstone. Oh, okay. So yeah, so if you have you reach out if you have any

482
00:52:03,640 --> 00:52:09,080
other questions or whatever they are. It'll be interesting. I'll probably hang around

483
00:52:09,080 --> 00:52:16,480
afterwards. Okay. You'll have to just share some anecdotal evidence if you can with us

484
00:52:16,480 --> 00:52:22,720
and just kind of, or just keep an eye out yourself and just see like, okay, like are

485
00:52:22,720 --> 00:52:34,800
people mixing products and then if so, like, what are they mixing? Yes. Even like when

486
00:52:34,800 --> 00:52:42,080
they're purchasing or usage. So basically, like, so when the consumer comes in, if they

487
00:52:42,080 --> 00:52:49,440
buy more than one product, what are those products? So are they buying a wax and an

488
00:52:49,440 --> 00:52:57,400
edible? Are they buying a wax and an RSO? Are they buying like two things of flower?

489
00:52:57,400 --> 00:53:04,840
So that's sort of the idea is what like combination of products, if any, are people generally

490
00:53:04,840 --> 00:53:15,160
buying this sort of we're trying to figure out. I have to think about that. There's like

491
00:53:15,160 --> 00:53:22,520
a lot of I don't know if this is very universal across dispensaries in Michigan or even in

492
00:53:22,520 --> 00:53:28,760
other states like Washington or California. But I know that the dispensary that I work

493
00:53:28,760 --> 00:53:35,680
at runs promotion specifically to either double up on certain products or like bundle deals

494
00:53:35,680 --> 00:53:42,000
or like specials like buy two of any brand product and then you get one free of a equal

495
00:53:42,000 --> 00:53:49,640
or lesser value. So typically, sometimes they'll buy maybe flower or edibles and then maybe

496
00:53:49,640 --> 00:53:57,440
some paraphernalia or maybe an edible on top of that. But it's I haven't seen enough to

497
00:53:57,440 --> 00:54:03,720
kind of like make a kind of consistent rule, I would say, and I don't have access to this

498
00:54:03,720 --> 00:54:07,720
data, although I've been trying to ask for it.

499
00:54:07,720 --> 00:54:14,400
Well, and that's why you kind of mix the anecdotes with the data. So you look at the data to

500
00:54:14,400 --> 00:54:19,920
actually get the look and then you just do your real world observations just to make

501
00:54:19,920 --> 00:54:30,000
sure things are just for like a sanity check, essentially. So what you mentioned with the

502
00:54:30,000 --> 00:54:34,960
discounts is those look to me like essentially quantity discounts.

503
00:54:34,960 --> 00:54:36,760
Right, the bundling.

504
00:54:36,760 --> 00:54:43,160
So it seems like just people have just sort of stumbled or maybe they've done the research

505
00:54:43,160 --> 00:54:48,000
at the dispensary, but it seems that people have just kind of come to the realization

506
00:54:48,000 --> 00:54:56,000
that, OK, we can sell more by giving people quantity discounts, maybe not necessarily

507
00:54:56,000 --> 00:54:57,600
cross promotional.

508
00:54:57,600 --> 00:55:05,480
Yeah, this is great that you're able to join today, Todd, because they're based on this

509
00:55:05,480 --> 00:55:11,000
conversation that kind of aligns with what we're seeing in the results that people stay

510
00:55:11,000 --> 00:55:17,640
within their own products. And so much so that it seems like dispensaries either just

511
00:55:17,640 --> 00:55:22,480
going to try and upsell or try and increase quantity, but they're not going to really

512
00:55:22,480 --> 00:55:30,160
bother with trying to cross sell. So it seems like it's heading in that direction anyway.

513
00:55:30,160 --> 00:55:38,920
It could even lead for like future, you know, some future research, because if we've got

514
00:55:38,920 --> 00:55:44,840
these different consumer groups, then it may be worth trying to figure out, OK, what puts

515
00:55:44,840 --> 00:55:51,800
a consumer in a particular group. And so I think this is where there is some sort of

516
00:55:51,800 --> 00:55:55,960
research being done. So this is where I was saying a lot of people are interested in,

517
00:55:55,960 --> 00:56:02,600
OK, what are the demographics of cannabis consumers, what demographics by certain products.

518
00:56:02,600 --> 00:56:09,080
So now I think we're finally sort of joining the current state of cannabis research, which

519
00:56:09,080 --> 00:56:19,280
is basically what do different people buy? Right. So we're finally connecting all the

520
00:56:19,280 --> 00:56:20,280
dots here.

521
00:56:20,280 --> 00:56:26,280
Yeah, it'd be nice to know what some other people have done in consumer analytics in

522
00:56:26,280 --> 00:56:33,880
this space. I know Brad hasn't had much chance to talk, but I was just curious what Brad's

523
00:56:33,880 --> 00:56:37,800
background is and if he's got any overlap in this space or any of the other spaces within

524
00:56:37,800 --> 00:56:38,800
the industry.

525
00:56:38,800 --> 00:56:47,200
No, not really. I've been a contract software developer most of my life and I'm making a

526
00:56:47,200 --> 00:56:55,200
break into data science. I had some work at Leo Bernetta Marketing Company in this space.

527
00:56:55,200 --> 00:57:03,320
I'm taking a boot camp now, so I'm kind of new also.

528
00:57:03,320 --> 00:57:05,520
Very good.

529
00:57:05,520 --> 00:57:11,720
It's really interesting to see the work you're doing.

530
00:57:11,720 --> 00:57:18,520
Basically the only things I've seen people concretely do are just age and gender. So,

531
00:57:18,520 --> 00:57:25,240
OK, you know, I just vaguely remember that I think older people tend to buy more edibles.

532
00:57:25,240 --> 00:57:31,400
That was the main breakaway. And then, of course, there's I think there's more male

533
00:57:31,400 --> 00:57:37,040
consumers. You're just that's just the way the cannabis industry is. It's kind of dominated

534
00:57:37,040 --> 00:57:46,080
by young males. But that's the whole thing is not it may be for specific products. Right.

535
00:57:46,080 --> 00:57:52,000
So if you look at edibles, young males may not necessarily be the people buying edibles

536
00:57:52,000 --> 00:58:02,760
per se. So that's that's kind of what people are breaking into. But there's just. I was

537
00:58:02,760 --> 00:58:06,400
just saying there's just so much more that can be done there.

538
00:58:06,400 --> 00:58:14,840
I wonder if part of it, because I know that when we scan IDs, at least in the state of

539
00:58:14,840 --> 00:58:23,120
Michigan, the only demographic data that is collected is I mean, I do think that the gender

540
00:58:23,120 --> 00:58:29,080
is collected and there is a non-binary option, but I don't know if that's included in any

541
00:58:29,080 --> 00:58:36,200
of the database that we have internally. So I don't know whether we even have any in-house

542
00:58:36,200 --> 00:58:40,520
like data analytics that's being done, because usually for like rewards or anything like

543
00:58:40,520 --> 00:58:48,120
that, we go through a third party called Alpine IQ. They run some analytics software, but

544
00:58:48,120 --> 00:58:53,560
there isn't really anything done on our end specifically.

545
00:58:53,560 --> 00:59:00,080
This is the final tidbit I've got to offer. So the other thing that I heard was, OK, people

546
00:59:00,080 --> 00:59:07,080
tend to only buy things like within their immediate zip code or potentially on the way

547
00:59:07,080 --> 00:59:14,640
to work. So the way we could potentially look at this in Washington state is just say, OK,

548
00:59:14,640 --> 00:59:20,820
people are probably buying things locally. So you could try to maybe do some correlation

549
00:59:20,820 --> 00:59:26,700
between like you were saying, like essentially your income and then like the average sales

550
00:59:26,700 --> 00:59:37,160
in that zip code. So maybe your income affects the products you buy and then you can add

551
00:59:37,160 --> 00:59:40,760
whatever variables you can potentially get at the zip code level.

552
00:59:40,760 --> 00:59:47,880
So yeah, yeah. Some summary stats at the zip code level based on median income could probably

553
00:59:47,880 --> 00:59:53,840
confirm a lot of that. Maybe profession. I don't know if you can uncover that, like what

554
00:59:53,840 --> 00:59:59,240
profession you work at may determine. I don't know. I'm just spitballing at this point.

555
00:59:59,240 --> 01:00:04,960
So but yeah, so the median income by zip code I got from the US Census Bureau from their

556
01:00:04,960 --> 01:00:13,760
website. So they may have other interesting geographic. You could do average age. So yeah,

557
01:00:13,760 --> 01:00:18,760
yeah, there could be some quite a few other things that we could tie into that just by

558
01:00:18,760 --> 01:00:29,040
the zip code of dispensary. So that brings us here to the end of the hour. And so I think

559
01:00:29,040 --> 01:00:35,360
Paul, you've done a fantastic job. So I think it's groundbreaking research. I think it answers

560
01:00:35,360 --> 01:00:42,440
an important question. Do people buy different types of goods? And I think it provides evidence

561
01:00:42,440 --> 01:00:50,840
that there's distinct consumers and everybody can take their own takeaways from this. But

562
01:00:50,840 --> 01:00:56,320
I think that that in itself is valuable to retailers because now you just know that if

563
01:00:56,320 --> 01:00:59,480
you're trying to cross sell, you may be barking up the wrong tree.

564
01:00:59,480 --> 01:01:05,120
Yeah. Yeah. And thanks to you and Charles for kind of taking me into the group here

565
01:01:05,120 --> 01:01:09,320
a few months back and getting me started. And I really appreciate the help you've given

566
01:01:09,320 --> 01:01:17,400
me. It was a great presentation. Thank you. Very, very interesting. So it's given us a

567
01:01:17,400 --> 01:01:24,920
lot of avenues. And so for next week, we'll try to wrangle the sales per lab result and

568
01:01:24,920 --> 01:01:33,800
then see if THC matters. And then if so, how much so just before you wrap up one quick

569
01:01:33,800 --> 01:01:39,240
thing, I don't know if I is it Taufeek? I'm sorry if I'm pronouncing you wrong incorrectly.

570
01:01:39,240 --> 01:01:41,720
That's close enough.

571
01:01:41,720 --> 01:01:48,360
Okay. Well, I just to let you know, I do have to drop. I know you wanted to talk, but I'm

572
01:01:48,360 --> 01:01:53,320
on my lunch break. So I have to drop, but I'll reach out to you through the meetup. So if

573
01:01:53,320 --> 01:01:56,560
you have any follow on questions, we can talk.

574
01:01:56,560 --> 01:01:59,080
Okay, perfect. That sounds good.

575
01:01:59,080 --> 01:02:00,680
Okay. All right.

576
01:02:00,680 --> 01:02:05,960
All right, everyone. Well, thank you for coming to the Canvas Data Science Meetup Group and

577
01:02:05,960 --> 01:02:09,520
enjoy your week and stay productive. So.

578
01:02:09,520 --> 01:02:12,520
Bye guys. Thanks.

579
01:02:12,520 --> 01:02:13,520
Have a good one.

580
01:02:13,520 --> 01:02:14,520
Bye now.

581
01:02:14,520 --> 01:02:35,400
cambiagoose Or

