1
00:00:00,000 --> 00:00:14,700
Wha, good evening, badgeman!

2
00:00:14,700 --> 00:00:36,840
All right.

3
00:00:36,840 --> 00:00:42,340
Welcome to the Cannabis Data Science Meetup for March 24th.

4
00:00:42,340 --> 00:00:46,900
Last week we talked about market concentration.

5
00:00:46,900 --> 00:00:56,500
So this week we'll take a slightly more formal look and we'll actually look at measures of

6
00:00:56,500 --> 00:00:59,260
market concentration.

7
00:00:59,260 --> 00:01:09,420
Just to set the stage, we've been looking at Colorado data and just from the total revenue

8
00:01:09,420 --> 00:01:19,940
we have and the total number of distributors, we can just calculate the average revenue

9
00:01:19,940 --> 00:01:27,200
per distributor just to get a gauge of what the market share may look like.

10
00:01:27,200 --> 00:01:33,300
This may not actually be representative since it's just an average, but if we do a plot,

11
00:01:33,300 --> 00:01:41,020
we'll see that medical average market share is relatively constant and retail decreases

12
00:01:41,020 --> 00:01:50,700
from 2014 to 2017, indicating that the Colorado cannabis market is becoming more competitive

13
00:01:50,700 --> 00:01:53,140
over time.

14
00:01:53,140 --> 00:02:00,980
So we'll try to measure the market concentration and see if it changes over time.

15
00:02:00,980 --> 00:02:04,540
And how will we do that?

16
00:02:04,540 --> 00:02:12,580
Well, the Herfindahl-Hirschman Index is a measure of market concentration.

17
00:02:12,580 --> 00:02:14,060
This is widely accepted.

18
00:02:14,060 --> 00:02:24,300
It's used in legal cases and it's a go-to tool for economists.

19
00:02:24,300 --> 00:02:28,460
The Herfindahl-Hirschman Index is a simple calculation.

20
00:02:28,460 --> 00:02:32,540
It's simply the sum of the squares of all market shares.

21
00:02:32,540 --> 00:02:40,980
For example, if there were two firms in the market and each one had a 50% market share,

22
00:02:40,980 --> 00:02:48,300
think Coca-Cola has 50% market share, Pepsi has 50% market share, then you would have

23
00:02:48,300 --> 00:02:56,860
a Herfindahl-Hirschman Index of 5,000.

24
00:02:56,860 --> 00:03:08,700
What you'll note is, as there are more and more firms in the market and they each have

25
00:03:08,700 --> 00:03:16,060
a relatively equal market size, HHI will approach zero.

26
00:03:16,060 --> 00:03:28,220
And if there is one firm with a 100% market share, so 100 squared, then the HHI would be 10,000.

27
00:03:28,220 --> 00:03:37,380
So the HHI increases as the market becomes less competitive and it decreases as the market

28
00:03:37,380 --> 00:03:41,020
becomes more competitive.
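
As a minimal sketch of the calculation described above, assuming market shares are expressed in percentage points (the function name `hhi` is ours for illustration, not something shown in the session):

```python
def hhi(shares_pct):
    """Herfindahl-Hirschman Index: the sum of squared market shares (in percent)."""
    return sum(share ** 2 for share in shares_pct)


# Two firms with 50% each, as in the Coca-Cola / Pepsi example.
print(hhi([50, 50]))   # 5000

# A single firm holding 100% of the market.
print(hhi([100]))      # 10000

# 100 equally sized firms with 1% each: the index falls toward zero.
print(hhi([1] * 100))  # 100
```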

29
00:03:41,020 --> 00:03:43,060
Why does that matter?

30
00:03:43,060 --> 00:03:52,620
Well, the HHI is often used in legal cases to take into consideration if certain policies

31
00:03:52,620 --> 00:03:57,140
increase or decrease market concentration.

32
00:03:57,140 --> 00:04:07,420
For example, HHIs between 1,500 and 2,500 are considered moderately concentrated.

33
00:04:07,420 --> 00:04:13,420
Anything above 2,500, like the two-firm example we saw earlier, would be considered highly

34
00:04:13,420 --> 00:04:14,640
concentrated.

35
00:04:14,640 --> 00:04:23,500
So if a market truly were dominated by two firms, each with a 50% market share, that would

36
00:04:23,500 --> 00:04:26,620
be considered a highly concentrated market.
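
The thresholds mentioned above can be sketched as a small helper. The 1,500 and 2,500 cutoffs come from the discussion; the function name `classify_hhi`, the band labels, and the handling of values that fall exactly on a boundary are assumptions made for illustration.

```python
def classify_hhi(hhi):
    """Label a market by its HHI using the 1,500 and 2,500 cutoffs."""
    if hhi > 2500:
        return "highly concentrated"
    if hhi >= 1500:
        return "moderately concentrated"
    return "unconcentrated"


# The two-firm, 50/50 market from earlier has an HHI of 5,000.
print(classify_hhi(5000))  # highly concentrated
print(classify_hhi(2000))  # moderately concentrated
print(classify_hhi(100))   # unconcentrated
```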

37
00:04:26,620 --> 00:04:33,860
It's not typical to see market shares greater than 30%, but the cannabis industry is new,

38
00:04:33,860 --> 00:04:46,740
so the markets are turbulent, with a lot of variance in market shares.

39
00:04:46,740 --> 00:04:48,860
Good.

40
00:04:48,860 --> 00:04:56,340
Nick, joining us?

41
00:04:56,340 --> 00:05:02,260
Welcome, Nick.

42
00:05:02,260 --> 00:05:21,900
Essentially, we're discussing market concentration, and this can be

43
00:05:21,900 --> 00:05:29,220
measured by the Herfindahl-Hirschman Index, which is just the sum of the squared market shares.

44
00:05:29,220 --> 00:05:41,340
And the reason this is important is, for example, if there is a policy decision, and it's going

45
00:05:41,340 --> 00:05:51,740
to increase the Herfindahl-Hirschman Index by more than 200 points, that's going to increase

46
00:05:51,740 --> 00:05:53,940
somebody's market power.

47
00:05:53,940 --> 00:06:04,460
So it's just interesting to see how different policy decisions may affect market concentration.
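
To illustrate the 200-point idea, here is a hedged sketch that compares the HHI before and after a hypothetical change that combines two firms. The market shares are made up for the example. Combining firms with shares a and b raises the index by 2ab, since (a + b)² − a² − b² = 2ab.

```python
def hhi(shares_pct):
    """Sum of squared market shares, with shares in percent."""
    return sum(share ** 2 for share in shares_pct)


# Hypothetical market shares (percent) before the change.
before = [30, 20, 20, 15, 15]

# Hypothetical outcome: the two 15% firms become a single 30% firm.
after = [30, 20, 20, 30]

change = hhi(after) - hhi(before)
print(hhi(before), hhi(after), change)  # 2150 2600 450

# The change matches 2 * a * b for the two combined firms.
print(change == 2 * 15 * 15)  # True
```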

48
00:06:04,460 --> 00:06:11,860
So we're just going to go ahead and jump into that with the data here.

49
00:06:11,860 --> 00:06:20,480
So as always, the first step is to look at the data.

50
00:06:20,480 --> 00:06:26,900
So let me go ahead and...

51
00:06:26,900 --> 00:06:33,380
All right.

52
00:06:33,380 --> 00:06:53,980
And so this will be available on the Cannabis Data Science GitHub here in just one second.

53
00:06:53,980 --> 00:06:59,860
So essentially, what we're going to do is we're going to attempt to measure the market

54
00:06:59,860 --> 00:07:08,260
concentration, and we're going to do it in Washington State.

55
00:07:08,260 --> 00:07:13,580
So we have wholesale sales.

56
00:07:13,580 --> 00:07:16,680
We have it in two different formats here.

57
00:07:16,680 --> 00:07:19,180
So you have it in columns.

58
00:07:19,180 --> 00:07:31,140
So we have monthly wholesale sales for these six months, from August of 2020 to January

59
00:07:31,140 --> 00:07:33,540
of 2021.

60
00:07:33,540 --> 00:07:41,340
And we have the firms and their monthly sales.

61
00:07:41,340 --> 00:07:46,920
And a common way that you can look at this is what's called panel data.

62
00:07:46,920 --> 00:07:53,740
So this is what you would consider panel data, where you have your organization, you have

63
00:07:53,740 --> 00:07:57,100
their sales, and you have their month.

64
00:07:57,100 --> 00:08:20,940
So you'll see, for example, you'll have each organization, and you'll have their sales

65
00:08:20,940 --> 00:08:23,700
by month.
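
The two layouts described above (months as columns versus panel data) can be sketched with pandas. The firm names, month labels, and sales figures here are hypothetical placeholders, not values from the Washington State data.

```python
import pandas as pd

# Hypothetical wide-format data: one row per firm, one column per month.
wide = pd.DataFrame({
    "organization": ["Firm A", "Firm B"],
    "2020-08": [100.0, 300.0],
    "2020-09": [150.0, 250.0],
})

# Reshape to panel (long) format: one row per organization per month.
panel = wide.melt(id_vars="organization", var_name="month", value_name="sales")
print(panel)
```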

66
00:08:23,700 --> 00:08:26,820
All right.

67
00:08:26,820 --> 00:08:28,460
So we've got our data.

68
00:08:28,460 --> 00:08:34,820
So now let's see if we can't estimate this.

69
00:08:34,820 --> 00:08:50,460
So we're going to jump over to Spyder and go ahead and read in this data.

70
00:08:50,460 --> 00:09:02,980
And I'm going to try working with the panel data to begin with.

71
00:09:02,980 --> 00:09:10,520
All right.

72
00:09:10,520 --> 00:09:14,700
So let's look at our data, as always.

73
00:09:14,700 --> 00:09:20,660
So we have our organization, our sales, and our month.

74
00:09:20,660 --> 00:09:23,180
All right.

75
00:09:23,180 --> 00:09:28,500
So I've broken this down into the various steps we need to do.

76
00:09:28,500 --> 00:09:33,180
And we're going to see if we can't code this up here in the next 15 minutes.

77
00:09:33,180 --> 00:09:41,780
So we need the total sales by month.

78
00:09:41,780 --> 00:10:03,980
So we can get our months.

79
00:10:03,980 --> 00:10:14,900
All right.

80
00:10:14,900 --> 00:10:19,220
So let's just start hacking at this.

81
00:10:19,220 --> 00:10:32,740
So I guess we just need sales by month.

82
00:10:32,740 --> 00:10:38,900
So there are probably more elegant ways to do this, but let's just get it done to begin

83
00:10:38,900 --> 00:10:39,900
with.

84
00:10:39,900 --> 00:10:46,380
So, for month in months.

85
00:10:46,380 --> 00:10:53,900
All right.

86
00:10:53,900 --> 00:10:56,180
So we'll just need to do.

87
00:10:56,180 --> 00:10:57,180
Right.

88
00:10:57,180 --> 00:11:06,140
So data.sales.

89
00:11:06,140 --> 00:11:17,740
We don't want all of the data, so we'll just want to

90
00:11:17,740 --> 00:11:27,980
select the rows where our month is equal to the month.

91
00:11:27,980 --> 00:11:33,100
So let's see if this snippet of code.

92
00:11:33,100 --> 00:11:37,060
All right.

93
00:11:37,060 --> 00:11:50,300
This is actually some.

94
00:11:50,300 --> 00:11:59,100
I forgot to.

95
00:11:59,100 --> 00:12:07,020
Record the.

96
00:12:07,020 --> 00:12:08,020
All right.

97
00:12:08,020 --> 00:12:15,620
So we now have total sales by month.
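
A minimal sketch of this step, assuming a panel DataFrame named `data` with `organization`, `sales`, and `month` columns as described. The rows below are hypothetical stand-ins for the real file, and the exact code written in the session may differ.

```python
import pandas as pd

# Hypothetical panel data standing in for the Washington wholesale file.
data = pd.DataFrame({
    "organization": ["Firm A", "Firm B", "Firm C"] * 2,
    "sales": [600.0, 300.0, 100.0, 500.0, 250.0, 250.0],
    "month": ["2020-08"] * 3 + ["2020-09"] * 3,
})

# Loop version, following the steps talked through above.
sales_by_month = {}
for month in data["month"].unique():
    # Keep only the rows for this month, then total their sales.
    month_data = data.loc[data["month"] == month]
    sales_by_month[month] = month_data["sales"].sum()

print(sales_by_month)

# A more compact alternative that gives the same totals.
totals = data.groupby("month")["sales"].sum()
print(totals)
```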

98
00:12:15,620 --> 00:12:32,220
So now we need to calculate the market share for each wholesaler.

99
00:12:32,220 --> 00:12:34,020
By month.

100
00:12:34,020 --> 00:12:41,420
So.

101
00:12:41,420 --> 00:12:47,380
That will actually be fairly easy.

102
00:12:47,380 --> 00:12:54,020
To do.

103
00:12:54,020 --> 00:12:56,500
Let's just think about how to do that.

104
00:12:56,500 --> 00:13:01,500
So.

105
00:13:01,500 --> 00:13:20,140
Well.

106
00:13:20,140 --> 00:13:24,980
This is probably not going to be the most elegant way to do things, but we'll come back

107
00:13:24,980 --> 00:13:27,940
and we can refactor as need be.

108
00:13:27,940 --> 00:13:34,100
All right, let's just iterate over the

109
00:13:34,100 --> 00:13:35,500
DataFrame.

110
00:13:35,500 --> 00:13:37,700
So let's just.

111
00:13:37,700 --> 00:13:50,900
Create a series so.

112
00:13:50,900 --> 00:14:14,060
We're just going to iterate over the rows, and then the market share is simply going

113
00:14:14,060 --> 00:14:16,060
to be

114
00:14:16,060 --> 00:14:18,100
the sales

115
00:14:18,100 --> 00:14:22,380
divided by

116
00:14:22,380 --> 00:14:27,220
the sales by month

117
00:14:27,220 --> 00:14:40,660
of that particular month.

118
00:14:40,660 --> 00:14:48,900
And I wonder.

119
00:14:48,900 --> 00:14:56,460
If we could just assign this to the data itself.

120
00:14:56,460 --> 00:15:04,100
So.

121
00:15:04,100 --> 00:15:22,500
Think this is.

122
00:15:22,500 --> 00:15:35,380
Yeah.

123
00:15:35,380 --> 00:15:40,300
lt.

124
00:15:40,300 --> 00:15:41,300
Right.

125
00:15:41,300 --> 00:15:58,780
I'm sort of having a mind blank on how to assign a value to.

126
00:15:58,780 --> 00:16:17,780
And.

127
00:16:17,780 --> 00:16:38,780
Okay, let's see if.

128
00:16:38,780 --> 00:16:47,780
So that dirty piece of code, like I said.

129
00:16:47,780 --> 00:16:50,780
Always refactor.

130
00:16:50,780 --> 00:16:58,780
So we've now calculated the market share for each organization.
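
One possible refactor of the row-by-row approach, sketched under the same assumptions as before (a `data` DataFrame with `organization`, `sales`, and `month` columns, filled here with hypothetical values). The column names `total_sales` and `market_share` are ours, and shares are expressed in percent.

```python
import pandas as pd

# Hypothetical panel data standing in for the Washington wholesale file.
data = pd.DataFrame({
    "organization": ["Firm A", "Firm B", "Firm C"] * 2,
    "sales": [600.0, 300.0, 100.0, 500.0, 250.0, 250.0],
    "month": ["2020-08"] * 3 + ["2020-09"] * 3,
})

# Attach each month's total sales to every row of that month.
data["total_sales"] = data.groupby("month")["sales"].transform("sum")

# Market share in percent: the firm's sales divided by that month's total.
data["market_share"] = 100 * data["sales"] / data["total_sales"]

print(data[["organization", "month", "market_share"]])

# Rule number one: look at the data.
print(data["market_share"].describe())
```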

131
00:16:58,780 --> 00:17:00,780
So, rule number one.

132
00:17:00,780 --> 00:17:03,780
Look at the data.

133
00:17:03,780 --> 00:17:06,780
So.

134
00:17:06,780 --> 00:17:11,780
Just the mean market share is close to 1%.

135
00:17:11,780 --> 00:17:17,780
And just as a sanity check.

136
00:17:17,780 --> 00:17:25,780
In Colorado, the average market share was dipping down to 0.1%.

137
00:17:25,780 --> 00:17:31,780
So that's interesting.

138
00:17:31,780 --> 00:17:39,780
All right. And so we can actually now calculate the HHI by month.

139
00:17:39,780 --> 00:17:47,780
So let's go ahead and do that to see how concentrated these markets are.

140
00:17:47,780 --> 00:17:53,780
And I'm going to do that in a similar way as we calculated total sales by month.

141
00:17:53,780 --> 00:18:10,780
So.

142
00:18:10,780 --> 00:18:25,780
We'll just look at the data for that particular month.

143
00:18:40,780 --> 00:18:50,780
All right. And so then we'll want to calculate the HHI.

144
00:18:50,780 --> 00:19:05,780
And so now we'll iterate over all the firms. So for.

145
00:19:05,780 --> 00:19:11,780
And remember the HHI.

146
00:19:11,780 --> 00:19:24,780
We're going to add the square of the market share.

147
00:19:24,780 --> 00:19:49,780
And we'll keep track of that value.

148
00:19:49,780 --> 00:20:00,780
And let's see if this works.
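
The loop described above might look something like the following sketch. It assumes a `market_share` column in percent already exists; the data is hypothetical and the variable names are ours rather than the ones typed in the session.

```python
import pandas as pd

# Hypothetical market shares (percent) for three firms over two months.
data = pd.DataFrame({
    "organization": ["Firm A", "Firm B", "Firm C"] * 2,
    "month": ["2020-08"] * 3 + ["2020-09"] * 3,
    "market_share": [60.0, 30.0, 10.0, 50.0, 25.0, 25.0],
})

hhi_by_month = {}
for month in data["month"].unique():
    # Look at the data for that particular month only.
    month_data = data.loc[data["month"] == month]
    hhi = 0.0
    # Iterate over the firms, adding the square of each market share.
    for _, row in month_data.iterrows():
        hhi += row["market_share"] ** 2
    hhi_by_month[month] = hhi

print(hhi_by_month)

# A compact alternative that gives the same numbers.
hhi_series = data.groupby("month")["market_share"].apply(lambda s: (s ** 2).sum())
print(hhi_series)
```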

149
00:20:00,780 --> 00:20:07,780
So.

150
00:20:07,780 --> 00:20:24,780
We now have the Herfindahl-Hirschman Index, so let's go ahead and see if we can plot this real quick in the last five minutes, because that's not doing us any good right there.

151
00:20:24,780 --> 00:20:27,780
So.

152
00:20:27,780 --> 00:20:40,780
Let's see if we can create a series out of this somehow.

153
00:20:57,780 --> 00:21:25,780
I just wish I could sort this.
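
One way to get a sortable series out of a dictionary of monthly values is sketched below. It assumes the month labels look like `2020-08`, which may not match the real data, and the HHI numbers are made-up placeholders, not results from the session.

```python
import pandas as pd

# Placeholder values keyed by month, deliberately out of order.
hhi_by_month = {
    "2021-01": 410.0,
    "2020-08": 450.0,
    "2020-10": 620.0,
    "2020-09": 440.0,
}

# Build a Series, parse the labels as dates, then sort chronologically.
series = pd.Series(hhi_by_month)
series.index = pd.to_datetime(series.index, format="%Y-%m")
series = series.sort_index()
print(series)

# With matplotlib installed, series.plot() would then draw the
# months in chronological order.
```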

154
00:21:25,780 --> 00:21:46,780
Let's see if we can.

155
00:21:46,780 --> 00:21:59,780
So.

156
00:21:59,780 --> 00:22:27,780
So.

157
00:22:27,780 --> 00:22:37,780
Real quick.

158
00:22:37,780 --> 00:22:50,780
You're right.

159
00:22:50,780 --> 00:23:02,780
Oh, you're correct, Charles. So that's incredibly helpful.

160
00:23:02,780 --> 00:23:21,780
And just to polish this off in the last two minutes here, in 30 minutes we have now discussed market share.

161
00:23:21,780 --> 00:23:30,780
We've introduced the Herfindahl-Hirschman Index, which is a common measure of market concentration.

162
00:23:30,780 --> 00:23:46,780
We've discussed what the HHI is and how low values represent competitive markets and high values represent less competitive markets.

163
00:23:46,780 --> 00:24:01,780
And we've noted that it can be used in the legal system to determine if policy decisions increase or decrease market concentration.

164
00:24:01,780 --> 00:24:30,780
We have then compiled wholesale sales from open Washington State data. And if you need the data source, we've compiled it here from OpenTHC, which gets its data from open records requests.

165
00:24:30,780 --> 00:24:40,780
We have then read in the data. We calculated sales by month.

166
00:24:40,780 --> 00:24:48,780
We then calculated market share for each wholesaler by month.

167
00:24:48,780 --> 00:25:04,780
We finally calculated the HHI by month and we plotted it so we can visualize it. So this is a complete data science exercise in 30 minutes.

168
00:25:04,780 --> 00:25:32,780
And our takeaway is it looks like the market concentration is relatively constant and perhaps even decreasing, except for this quite noticeable spike in October of 2020, where there was a sharp decrease in market competitiveness.

169
00:25:32,780 --> 00:25:49,780
So I think I need to end it here to let people get to the Cannabis Science Virtual Conference, where they'll be talking about laboratory information management systems for testing cannabis.

170
00:25:49,780 --> 00:26:04,780
However, I'm going to pass this off to the group to discuss why we may have seen a rise in market concentration in October.

171
00:26:04,780 --> 00:26:28,780
So just to throw my idea out there, that is harvest month. But I'll leave that there for the group, because I actually have to go ahead and wrap this up today.

172
00:26:28,780 --> 00:26:38,780
I've got to get to the Cannabis Science Conference.

173
00:26:38,780 --> 00:26:50,780
Awesome. So sorry that was a short day, but I'll try to do an extra offering sometime next week in the afternoon to make up for lost time.

174
00:26:50,780 --> 00:26:54,780
So thanks for attending today.

175
00:26:54,780 --> 00:26:56,780
Thank you. It was great.

176
00:26:56,780 --> 00:27:05,780
Oh, I'm glad you learned something, and I'll post this code so that you can go through it and expand upon it.

177
00:27:05,780 --> 00:27:06,780
Okay.

178
00:27:06,780 --> 00:27:09,780
Awesome, Charles. And Nick, thank you for coming today.

179
00:27:09,780 --> 00:27:26,780
Have a nice day. Bye.

