1
00:00:00,000 --> 00:00:11,200
So is this your new place?

2
00:00:11,200 --> 00:00:15,560
Well I am actually visiting Oklahoma City.

3
00:00:15,560 --> 00:00:17,600
CannaCon is this week.

4
00:00:17,600 --> 00:00:27,120
So tomorrow, Thursday, May 27th, and then Friday, May 28th, there's CannaCon in Oklahoma.

5
00:00:27,120 --> 00:00:36,320
So I thought it was a good opportunity to see how this market is developing, meet some

6
00:00:36,320 --> 00:00:47,880
people in Oklahoma and try to talk with some laboratories here.

7
00:00:47,880 --> 00:00:48,880
Good morning Jake.

8
00:00:48,880 --> 00:00:49,880
Hey how's it going?

9
00:00:49,880 --> 00:00:54,920
Welcome to the Cannabis Data Science Meetup Group.

10
00:00:54,920 --> 00:00:55,920
Nice to meet you both.

11
00:00:55,920 --> 00:00:58,400
So it's great to be here.

12
00:00:58,400 --> 00:01:02,440
Well just to introduce myself, so my name is Keegan.

13
00:01:02,440 --> 00:01:08,880
My background, I started working as an analyst at a cannabis testing laboratory and then

14
00:01:08,880 --> 00:01:11,800
I found there was a lot of room for improvement.

15
00:01:11,800 --> 00:01:17,720
So for example there's data entry that needed to be automated, certificates needed to be

16
00:01:17,720 --> 00:01:24,320
created easily, and then just a website and client portal.

17
00:01:24,320 --> 00:01:32,240
So as a software developer I realized there's a lot of laboratories that need software solutions

18
00:01:32,240 --> 00:01:38,600
so I launched Cannlytics, a company to help provide software solutions to laboratories

19
00:01:38,600 --> 00:01:40,920
to make cannabis testing simple and easy.

20
00:01:40,920 --> 00:01:46,960
Because I've got a lot of knowledge so I may as well share with you.

21
00:01:46,960 --> 00:01:55,080
I guess I'll let Charles and yourself introduce yourselves.

22
00:01:55,080 --> 00:01:58,680
Charles I think you were here first, go ahead if you want to.

23
00:01:58,680 --> 00:02:06,920
So yeah I have like 27 years of software development experience and I'm moving into like data science

24
00:02:06,920 --> 00:02:10,400
and machine learning now.

25
00:02:10,400 --> 00:02:19,240
And this group has some really good data to explore and do some interesting things with.

26
00:02:19,240 --> 00:02:25,560
And so I've been working on that, I've been working on some Kaggle competitions.

27
00:02:25,560 --> 00:02:28,560
Oh cool, that's awesome.

28
00:02:28,560 --> 00:02:37,160
How about yourself Jake, how did you hear about the group and...

29
00:02:37,160 --> 00:02:40,280
So I guess it's kind of two factors.

30
00:02:40,280 --> 00:02:52,120
So I've been interested in cannabis, I do like stock analytics data, like investing analytics.

31
00:02:52,120 --> 00:02:56,260
And cannabis is not an industry I'm current with, it's one of two industries I'm really

32
00:02:56,260 --> 00:02:57,260
looking at.

33
00:02:57,260 --> 00:03:00,920
But I'm also just interested in data science generally, so like that piece.

34
00:03:00,920 --> 00:03:08,200
And then the second thing is what I do most of my time is I'm building this data science

35
00:03:08,200 --> 00:03:15,600
tool and I'm sort of trying to explore what industries it could be interesting in.

36
00:03:15,600 --> 00:03:20,240
And we've had like one or two people that I've just met with recently who were doing

37
00:03:20,240 --> 00:03:24,520
like analytics in the cannabis or at cannabis companies I believe.

38
00:03:24,520 --> 00:03:28,520
And so I thought you know maybe this would be cool people to meet here in that space

39
00:03:28,520 --> 00:03:29,520
as well.

40
00:03:29,520 --> 00:03:35,200
That's awesome, if you're open to talk about it I would love to hear about the data science

41
00:03:35,200 --> 00:03:36,560
tool you're building.

42
00:03:36,560 --> 00:03:41,560
But if you want, if you guys want to try it out I could...

43
00:03:41,560 --> 00:03:42,560
It's a...

44
00:03:42,560 --> 00:03:43,560
If you want to...

45
00:03:43,560 --> 00:03:48,200
Well honestly today was going to be a bit more of an open day, so I'm traveling to Oklahoma

46
00:03:48,200 --> 00:03:50,840
City to visit CannaCon.

47
00:03:50,840 --> 00:03:57,080
So I haven't had as much time to prepare things this week as I normally do, so the whole point

48
00:03:57,080 --> 00:03:59,760
of the group is to let everybody have a chance to share.

49
00:03:59,760 --> 00:04:02,680
So yes, if you want, it would be...

50
00:04:02,680 --> 00:04:06,720
Yeah why don't I... what I'll do here, I'll... this is a...

51
00:04:06,720 --> 00:04:08,400
That'd be really cool actually.

52
00:04:08,400 --> 00:04:09,400
Why don't I...

53
00:04:09,400 --> 00:04:11,240
I'll send you both a link.

54
00:04:11,240 --> 00:04:18,720
So I have a hosted version and a local version, but as I'm sure you're aware like downloading...

55
00:04:18,720 --> 00:04:23,960
It's a Python tool, so like downloading Python packages is a brutal, brutal process.

56
00:04:23,960 --> 00:04:31,160
So I'll send you both this hosted version.

57
00:04:31,160 --> 00:04:33,520
And if you want to...

58
00:04:33,520 --> 00:04:35,000
You can just log in there.

59
00:04:35,000 --> 00:04:36,000
Oh do you...

60
00:04:36,000 --> 00:04:37,000
Are you still there?

61
00:04:37,000 --> 00:04:38,000
I think you might have froze.

62
00:04:38,000 --> 00:04:45,560
Sorry, I just had to get reconnected real quick.

63
00:04:45,560 --> 00:04:46,560
Oh no worries.

64
00:04:46,560 --> 00:04:53,600
So if you go to that link that I just sent, it's a JupyterHub link.

65
00:04:53,600 --> 00:04:55,080
So if you essentially...

66
00:04:55,080 --> 00:04:59,840
It's like a hosted Jupyter notebook essentially, or JupyterLab really.

67
00:04:59,840 --> 00:05:03,360
You have to put your email in to access it, but you're welcome to put in like a fake email

68
00:05:03,360 --> 00:05:05,680
if you don't want to put in a real email.

69
00:05:05,680 --> 00:05:06,680
I understand.

70
00:05:06,680 --> 00:05:11,480
Oh sweet, yeah I can sort of guide you through.

71
00:05:11,480 --> 00:05:16,240
Yes, so I'm just going to share my screen that way.

72
00:05:16,240 --> 00:05:19,160
So just a heads up, we're recording too, that way people...

73
00:05:19,160 --> 00:05:21,680
Oh yeah, no worries at all.

74
00:05:21,680 --> 00:05:22,680
Totally cool.

75
00:05:22,680 --> 00:05:25,480
If anything, if more people get to see it, that's cool.

76
00:05:25,480 --> 00:05:29,320
How long have you been running this group?

77
00:05:29,320 --> 00:05:36,720
So the data science group kicked off in February this year, so I think our first meetup was

78
00:05:36,720 --> 00:05:41,840
February 24th or 26th.

79
00:05:41,840 --> 00:05:42,840
Okay cool.

80
00:05:42,840 --> 00:05:43,840
And...

81
00:05:43,840 --> 00:05:44,840
Oh interesting, so...

82
00:05:44,840 --> 00:05:51,560
So I'll give a little background.

83
00:05:51,560 --> 00:05:54,880
So what it is, it's a visual...

84
00:05:54,880 --> 00:06:01,160
It's supposed to be like a visual Python representation and sort of visualizing Python as a spreadsheet.

85
00:06:01,160 --> 00:06:05,720
So we'd imagine everything you'll do in the spreadsheet will generate the equivalent Python

86
00:06:05,720 --> 00:06:06,720
for you.

87
00:06:06,720 --> 00:06:12,440
So if you run this cell right here, that first cell...

88
00:06:12,440 --> 00:06:17,040
Yeah you can go from there.

89
00:06:17,040 --> 00:06:19,240
That should do it.

90
00:06:19,240 --> 00:06:23,240
You're going to have to put your email in again.

91
00:06:23,240 --> 00:06:34,080
But I sort of just knew I could just do this to hopefully be able to get in touch with

92
00:06:34,080 --> 00:06:35,080
people.

93
00:06:35,080 --> 00:06:36,080
Just formally?

94
00:06:36,080 --> 00:06:39,080
Yeah, yeah, you're welcome.

95
00:06:39,080 --> 00:06:44,840
And we're not taking anything.

96
00:06:44,840 --> 00:06:46,720
You can just put like NA here.

97
00:06:46,720 --> 00:06:47,720
This is...

98
00:06:47,720 --> 00:06:50,960
I kind of have this question here for like self-service users, but obviously since we're

99
00:06:50,960 --> 00:06:56,920
on a call, you're welcome to do that.

100
00:06:56,920 --> 00:06:57,920
Cool.

101
00:06:57,920 --> 00:06:58,920
Oh sweet.

102
00:06:58,920 --> 00:07:08,640
I didn't know you could do that.

103
00:07:08,640 --> 00:07:09,640
So this is the tool.

104
00:07:09,640 --> 00:07:10,640
So essentially...

105
00:07:10,640 --> 00:07:14,960
So are you a Python user at all?

106
00:07:14,960 --> 00:07:15,960
Oh definitely.

107
00:07:15,960 --> 00:07:19,720
Both Charles and I love Python.

108
00:07:19,720 --> 00:07:25,320
So I fell in love with Python five, six years ago.

109
00:07:25,320 --> 00:07:29,000
And it's my weapon of choice.

110
00:07:29,000 --> 00:07:31,400
And so I'm fascinated by this.

111
00:07:31,400 --> 00:07:33,600
So I've done a little work.

112
00:07:33,600 --> 00:07:40,960
So this looks almost like something you would build like with PyQt, but obviously it's running

113
00:07:40,960 --> 00:07:42,400
here in the browser.

114
00:07:42,400 --> 00:07:47,520
So honestly, I'm kind of curious how you built this, if that's something you're willing to

115
00:07:47,520 --> 00:07:50,280
share, but it also looks like a cool tool.

116
00:07:50,280 --> 00:08:02,200
Yeah I have a co-creator, who I made this with, specifically my twin brother.

117
00:08:02,200 --> 00:08:07,660
And so essentially it's a Python backend.

118
00:08:07,660 --> 00:08:10,920
So let me actually, it's easier to explain once I show you a little bit of the functionality.

119
00:08:10,920 --> 00:08:12,760
So let's import a data set really quick.

120
00:08:12,760 --> 00:08:22,360
You can click the import button and then you can add files from your folder.

121
00:08:22,360 --> 00:08:28,600
Because you're on the hosted version, you have to upload your file to the system first

122
00:08:28,600 --> 00:08:31,400
and then you can import it into the sheet with that button we just clicked.

123
00:08:31,400 --> 00:08:36,400
If you're on a local version, then you can just sort of, you're in your local files,

124
00:08:36,400 --> 00:08:43,280
you can connect to whatever files you want.

125
00:08:43,280 --> 00:08:46,960
But if you want to go back and click that button again, so it is add files from current

126
00:08:46,960 --> 00:08:54,040
folder, that's just going to put in our sort of the demo data airport pets.

127
00:08:54,040 --> 00:08:57,560
And I'll just show you, so let's do like, you can close that little window on the right

128
00:08:57,560 --> 00:08:58,560
there.

129
00:08:58,560 --> 00:09:03,120
The one that you're, the one you had the import window, you can close that.

130
00:09:03,120 --> 00:09:04,120
This one?

131
00:09:04,120 --> 00:09:05,800
No, the, if you scroll up.

132
00:09:05,800 --> 00:09:07,800
Yeah, where it says import CSV.

133
00:09:07,800 --> 00:09:09,000
You can close that.

134
00:09:09,000 --> 00:09:13,320
And then now we, so essentially the gist of it is it's a spreadsheet and everything you

135
00:09:13,320 --> 00:09:16,680
do in the spreadsheet is going to generate the equivalent code for you.

136
00:09:16,680 --> 00:09:23,480
So if you filter, maybe, so apply, try applying a filter to a column.

137
00:09:23,480 --> 00:09:29,560
Like is it possible to import our own data set here real quick?

138
00:09:29,560 --> 00:09:30,560
Yeah, yeah, totally.

139
00:09:30,560 --> 00:09:31,560
For sure.

140
00:09:31,560 --> 00:09:34,120
Also all you have to, so one of two ways to do it.

141
00:09:34,120 --> 00:09:37,760
One is you could, if you have it as like a CSV file, you could just upload it in here.

142
00:09:37,760 --> 00:09:38,760
Okay, sweet.

143
00:09:38,760 --> 00:09:39,760
So yeah.

144
00:09:39,760 --> 00:09:46,360
Because essentially, you know, just put together some data here.

145
00:09:46,360 --> 00:09:48,360
Yeah, totally, totally.

146
00:09:48,360 --> 00:09:53,760
So I think the best way, did you click the upload button yet or are you just in your

147
00:09:53,760 --> 00:09:54,760
local file?

148
00:09:54,760 --> 00:09:58,760
Let me just get some data for you real quick.

149
00:09:58,760 --> 00:10:01,760
Yeah, for sure.

150
00:10:01,760 --> 00:10:02,760
No.

151
00:10:02,760 --> 00:10:05,800
And then how do we upload data?

152
00:10:05,800 --> 00:10:07,440
So yeah, so you want, yeah, exactly.

153
00:10:07,440 --> 00:10:18,680
Click that button there, select your data.

154
00:10:18,680 --> 00:10:22,000
And then now, so if we, let's rerun that top.

155
00:10:22,000 --> 00:10:24,520
Oh, was that a, did you upload it as a CSV?

156
00:10:24,520 --> 00:10:27,840
It was an Excel, does it need to be a CSV?

157
00:10:27,840 --> 00:10:28,840
Yes.

158
00:10:28,840 --> 00:10:32,000
I mean, it doesn't have to be, but if you want to do it sort of through the UI, it has

159
00:10:32,000 --> 00:10:33,000
to be a CSV.

160
00:10:33,000 --> 00:10:37,600
You can, the other thing you can do is you can import any data frame into Mito just

161
00:10:37,600 --> 00:10:40,800
by passing it as an argument to the mitosheet.sheet call.

162
00:10:40,800 --> 00:10:44,080
So for example, if you're working in a notebook and you have a data frame you're working with

163
00:10:44,080 --> 00:10:48,840
and you want to quickly call those in, you can do that.

164
00:10:48,840 --> 00:10:50,960
Cool.

165
00:10:50,960 --> 00:10:58,000
So now that top cell in the notebook where we call the Mito sheet, if you want to run

166
00:10:58,000 --> 00:11:05,880
that again and then click the import button, you should be able to select your file.

167
00:11:05,880 --> 00:11:07,920
But I think this is.

168
00:11:07,920 --> 00:11:09,440
That's yeah, that's the wrong data.

169
00:11:09,440 --> 00:11:13,760
But if you click, there you go.

170
00:11:13,760 --> 00:11:14,760
You can close that.

171
00:11:14,760 --> 00:11:16,760
It's a little tutorial.

172
00:11:16,760 --> 00:11:17,760
Interesting.

173
00:11:17,760 --> 00:11:18,760
So.

174
00:11:18,760 --> 00:11:27,680
And if you want to build that original data.

175
00:11:27,680 --> 00:11:37,120
Yeah, you can see this little drop down next to where it says airport pets at the bottom.

176
00:11:37,120 --> 00:11:38,120
You can just click delete.

177
00:11:38,120 --> 00:11:43,920
Because yes, we're all about, you know, working with actual data here.

178
00:11:43,920 --> 00:11:49,080
So we just have a licensee data here.

179
00:11:49,080 --> 00:11:51,080
I'm tired of doing the same demo data.

180
00:11:51,080 --> 00:11:52,080
So this is great.

181
00:11:52,080 --> 00:11:56,440
But so yeah, let me, so let me show you maybe like a few things.

182
00:11:56,440 --> 00:11:59,960
And I just love to hear your thoughts or if you want to, if you have thoughts of how you

183
00:11:59,960 --> 00:12:02,160
might want to play with that, that'd be cool.

184
00:12:02,160 --> 00:12:05,400
For example, so let's do like a filter, for example.

185
00:12:05,400 --> 00:12:10,280
So if we go to the top or if we go to the actual the sheet, and then if you click, so

186
00:12:10,280 --> 00:12:14,960
each it's, you know, it's the same feel as a spreadsheet or it's, you know, aspiring

187
00:12:14,960 --> 00:12:15,960
to be that.

188
00:12:15,960 --> 00:12:19,600
So for every column, there's a little filter icon next to it.

189
00:12:19,600 --> 00:12:22,560
So if you want to filter.

190
00:12:22,560 --> 00:12:25,940
So like a good example here, yeah, we can add a filter.

191
00:12:25,940 --> 00:12:27,960
See we have that NaN value.

192
00:12:27,960 --> 00:12:28,960
This is kind of useful.

193
00:12:28,960 --> 00:12:32,800
If you want one of the filter conditions is you can do, yeah, you can filter to a city

194
00:12:32,800 --> 00:12:38,400
or you could filter like out null values by doing is not empty.

195
00:12:38,400 --> 00:12:41,560
The last option.

196
00:12:41,560 --> 00:12:46,860
You got rid of your null values.

197
00:12:46,860 --> 00:12:54,600
And then if you scroll down to below the, yeah, let's see.

198
00:12:54,600 --> 00:12:56,280
So this is code we're generating.

199
00:12:56,280 --> 00:13:01,520
So that final step, step four, that is the equivalent code for dropping null values from

200
00:13:01,520 --> 00:13:02,520
the column.

201
00:13:02,520 --> 00:13:03,520
Yeah.

202
00:13:03,520 --> 00:13:07,800
So we generate the equivalent code for each edit.

203
00:13:07,800 --> 00:13:11,120
And it kind of goes back to why I made the tool, which is like, I think you mentioned

204
00:13:11,120 --> 00:13:15,000
like it was five or six years ago you started learning Python, like I started doing

205
00:13:15,000 --> 00:13:17,560
Python maybe like two years ago.

206
00:13:17,560 --> 00:13:23,480
And I just found that like memorizing the syntax and getting the syntax right all the time

207
00:13:23,480 --> 00:13:27,480
was just like was such a maybe it just wasn't my skill set, but it was just such a learning

208
00:13:27,480 --> 00:13:30,200
curve for me.

209
00:13:30,200 --> 00:13:37,600
So I kind of wanted with my brother like make a way to make Python more accessible essentially.

210
00:13:37,600 --> 00:13:44,600
Question, do we need mitosheet or is this all just pandas?

211
00:13:44,600 --> 00:13:45,600
So that's a good question.

212
00:13:45,600 --> 00:13:48,960
So everything you have right there is pandas.

213
00:13:48,960 --> 00:13:51,280
Where you will need mitosheet is one specific place.

214
00:13:51,280 --> 00:13:54,920
And it's the reason we have that.

215
00:13:54,920 --> 00:14:00,760
So if you scroll up, if you do so what you can actually you can do like spreadsheet formulas

216
00:14:00,760 --> 00:14:01,760
in the sheet.

217
00:14:01,760 --> 00:14:17,360
So like if you want to try adding a column, for example, you can just click add column

218
00:14:17,360 --> 00:14:30,920
button in the toolbar at the top.

219
00:14:30,920 --> 00:14:34,920
And then in there you could use a form do a formula that's like normal spreadsheet syntax.

220
00:14:34,920 --> 00:14:39,440
So if you want to use like a concat function that you would do in Excel, you could do it

221
00:14:39,440 --> 00:14:45,000
in there and sort of concat the two.

222
00:14:45,000 --> 00:14:59,040
So just write concat and you'll see a suggestion pop up.

223
00:14:59,040 --> 00:15:02,960
And then you can select the columns you want.

224
00:15:02,960 --> 00:15:07,720
Ideally I'm trying to like let's say.

225
00:15:07,720 --> 00:15:10,840
Oh, yeah, pretty soon.

226
00:15:10,840 --> 00:15:13,920
I don't know why it wasn't working like that before.

227
00:15:13,920 --> 00:15:17,480
Maybe if you try, if you close that menu on the side, I think it might make it a little

228
00:15:17,480 --> 00:15:18,480
easier.

229
00:15:18,480 --> 00:15:21,680
That A where it says AB, if you close that.

230
00:15:21,680 --> 00:15:28,880
Yeah, but also, yeah, if you know the names of your columns, it will suggest it to you.

231
00:15:28,880 --> 00:15:36,720
We use this thing called qgrid, which is like a... no, it's AG Grid, which is like

232
00:15:36,720 --> 00:15:42,640
a it's a JavaScript like table library essentially.

233
00:15:42,640 --> 00:15:46,920
And they kind of help us like get a lot of these features out of the box, which is cool.

234
00:15:46,920 --> 00:15:52,560
But yeah, so there's your.

235
00:15:52,560 --> 00:16:13,640
If you wanted to put like.

236
00:16:13,640 --> 00:16:17,600
And so, yeah, so it's actually kind of what you're doing right now is like formatting.

237
00:16:17,600 --> 00:16:21,360
It's like one of the main uses I think people use this for and then you can rename it.

238
00:16:21,360 --> 00:16:24,240
But then to your question about, like, can you just copy this code into any

239
00:16:24,240 --> 00:16:26,700
other environment?

240
00:16:26,700 --> 00:16:29,520
You can, except for when you use something like...

241
00:16:29,520 --> 00:16:33,800
Where do we have it?

242
00:16:33,800 --> 00:16:34,800
Scroll down to the bottom.

243
00:16:34,800 --> 00:16:35,800
Exactly.

244
00:16:35,800 --> 00:16:40,400
But that function, it is pandas that we're using.

245
00:16:40,400 --> 00:16:46,000
But we it's like, we had to use a complex combination of pandas to like, get that to

246
00:16:46,000 --> 00:16:47,000
work.

247
00:16:47,000 --> 00:16:51,520
So we sort of wrapped it in our own function.

248
00:16:51,520 --> 00:16:53,240
But I thought I saw you opening Spyder.

249
00:16:53,240 --> 00:16:55,160
Were you about to try and copy this into Spyder?

250
00:16:55,160 --> 00:16:56,160
Yes, actually.

251
00:16:56,160 --> 00:16:57,160
Yes, that'd be cool.

252
00:16:57,160 --> 00:16:59,160
I actually haven't done a ton of testing with that.

253
00:16:59,160 --> 00:17:01,200
So it'd be interesting to see how that goes.

254
00:17:01,200 --> 00:17:04,680
What I'm thinking is like the functions, you don't need the UI.

255
00:17:04,680 --> 00:17:07,920
The UI won't pop up in Spyder or at least it won't work well.

256
00:17:07,920 --> 00:17:14,640
But I'm thinking if you copy, if you also install the mitosheet package for Spyder, the

257
00:17:14,640 --> 00:17:16,240
syntax might actually work still.

258
00:17:16,240 --> 00:17:18,240
We haven't done a lot of testing on that.

259
00:17:18,240 --> 00:17:19,240
I'll need to.

260
00:17:19,240 --> 00:17:20,240
This your project.

261
00:17:20,240 --> 00:17:21,240
Yes.

262
00:17:21,240 --> 00:17:34,240
Are you looking to... are you looking to do the local install?

263
00:17:34,240 --> 00:17:37,000
I believe so, right?

264
00:17:37,000 --> 00:17:39,520
Because I'll need mitosheet too.

265
00:17:39,520 --> 00:17:41,480
Yeah, you will.

266
00:17:41,480 --> 00:17:42,480
So this is the thing.

267
00:17:42,480 --> 00:17:43,480
It's the local install.

268
00:17:43,480 --> 00:17:44,480
Yeah, it should.

269
00:17:44,480 --> 00:17:45,480
It should work.

270
00:17:45,480 --> 00:17:51,400
So the thing is, the way Mito works is there's the back end and then there's

271
00:17:51,400 --> 00:17:54,160
a front end, which is a JupyterLab extension.

272
00:17:54,160 --> 00:17:58,320
But I think you probably... I actually don't think you'll even have to download the extension

273
00:17:58,320 --> 00:18:00,320
part, just do the just install the package.

274
00:18:00,320 --> 00:18:01,320
You should be fine.

275
00:18:01,320 --> 00:18:02,320
So yeah, that'd be cool if you want to try that.

276
00:18:02,320 --> 00:18:24,760
And I guess.

277
00:18:24,760 --> 00:18:34,800
Is your principal audience data scientists?

278
00:18:34,800 --> 00:18:37,000
I think it definitely is.

279
00:18:37,000 --> 00:18:41,640
I think I'm I'm definitely doing a lot of thinking right now about like trying to specify

280
00:18:41,640 --> 00:18:43,200
for like a more specific audience.

281
00:18:43,200 --> 00:18:51,480
But yes, certainly right now, I think it's like data scientists who are like beginner

282
00:18:51,480 --> 00:18:53,000
to intermediate at Python.

283
00:18:53,000 --> 00:18:58,640
I'd say like the lowest level is like like a lot of people think like, oh, this would

284
00:18:58,640 --> 00:19:00,680
be a good learning tool.

285
00:19:00,680 --> 00:19:01,680
And I see that.

286
00:19:01,680 --> 00:19:06,120
But I disagree because I think it would actually wouldn't it wouldn't let you like it wouldn't

287
00:19:06,120 --> 00:19:07,920
make you become better at Python.

288
00:19:07,920 --> 00:19:11,200
Like it would just make you be able to do some of the things we want to do in Python

289
00:19:11,200 --> 00:19:12,200
more quickly.

290
00:19:12,200 --> 00:19:17,680
So I think it's actually better for like someone who's like six months to like three years

291
00:19:17,680 --> 00:19:19,960
experience with Python.

292
00:19:19,960 --> 00:19:23,360
And they know what they want to do, and they know how it works, but they're just like they're

293
00:19:23,360 --> 00:19:26,760
tired of constantly going to Stack Overflow or like Googling syntax.

294
00:19:26,760 --> 00:19:31,520
It's just easier to do some things very quickly in here like a pivot table, for example, pivoting

295
00:19:31,520 --> 00:19:33,440
in pandas is like a pretty common thing to do.

296
00:19:33,440 --> 00:19:38,600
But I've talked to like even really advanced data scientists who just like never remember

297
00:19:38,600 --> 00:19:41,960
the pandas pivot syntax, and they're constantly Googling that.

298
00:19:41,960 --> 00:19:48,480
And so our tool is like, you know, quickly do your pivot table in our sheet.

299
00:19:48,480 --> 00:19:51,520
And so one is like the ability to just generate the syntax more quickly.

300
00:19:51,520 --> 00:19:54,320
And then two is just the visual aspect.

301
00:19:54,320 --> 00:19:58,440
A lot of people just like, you know, looking at their data during their analysis more in

302
00:19:58,440 --> 00:20:01,360
a more dynamic way than just like printing out a data frame.

303
00:20:01,360 --> 00:20:04,120
Which I think is cool.

304
00:20:04,120 --> 00:20:11,360
That's definitely one of the things I like about it.

305
00:20:11,360 --> 00:20:14,080
Did the package install?

306
00:20:14,080 --> 00:20:19,160
We're still installing here.

307
00:20:19,160 --> 00:20:22,720
Yes the other thing is like it's installing Plotly, which is like for graphing, but like

308
00:20:22,720 --> 00:20:25,000
you're not going to use graphing if you're not using the front end.

309
00:20:25,000 --> 00:20:26,960
So it's kind of annoying.

310
00:20:26,960 --> 00:20:30,920
How about you Charles?

311
00:20:30,920 --> 00:20:36,480
Do you have thoughts so far or?

312
00:20:36,480 --> 00:20:39,320
No, this is really interesting.

313
00:20:39,320 --> 00:20:47,760
This is, you know, you might be targeting people who use like, was it Power BI?

314
00:20:47,760 --> 00:20:49,920
Yeah, I see that.

315
00:20:49,920 --> 00:20:57,000
People who do data science stuff, but don't really know programming.

316
00:20:57,000 --> 00:21:02,680
That would be, you know, I think that would be a good audience.

317
00:21:02,680 --> 00:21:03,680
That's interesting.

318
00:21:03,680 --> 00:21:04,680
Yeah.

319
00:21:04,680 --> 00:21:08,720
What do you think about the cannabis space?

320
00:21:08,720 --> 00:21:12,000
So I think one thing, one of my interests in cannabis, obviously it's like, you know,

321
00:21:12,000 --> 00:21:14,940
one from an investing standpoint and the two from like a standpoint of this tool maybe

322
00:21:14,940 --> 00:21:20,720
is like a lot of places I've seen people enjoy the tool is like these like transitional industries

323
00:21:20,720 --> 00:21:25,640
where it's like they're typically using things like Excel, more analog tools.

324
00:21:25,640 --> 00:21:28,840
And because of like they have much larger data sizes, so they want to do more advanced

325
00:21:28,840 --> 00:21:30,160
analyses now.

326
00:21:30,160 --> 00:21:32,800
They're transitioning to Python.

327
00:21:32,800 --> 00:21:37,960
So we I've seen that in like finance a little bit, like pharma, like bio research.

328
00:21:37,960 --> 00:21:38,960
I'm seeing that.

329
00:21:38,960 --> 00:21:46,480
Do you think that cannabis is a place where, you know, there is a push to get these like

330
00:21:46,480 --> 00:21:49,400
non technical people more technical?

331
00:21:49,400 --> 00:21:59,360
Well, I what I have heard is people do want a hold of their data and to be able to analyze

332
00:21:59,360 --> 00:22:00,360
it.

333
00:22:00,360 --> 00:22:12,880
So that's, you know, something that I'm not sure if it's their design or, you know, a

334
00:22:12,880 --> 00:22:15,360
lot of people kind of are holding that hostage these days.

335
00:22:15,360 --> 00:22:19,800
So it seems well, that's what I've heard from the industry.

336
00:22:19,800 --> 00:22:24,880
So it seems like people are having a hard time getting their data out of whatever program

337
00:22:24,880 --> 00:22:25,880
they have it in.

338
00:22:25,880 --> 00:22:30,640
They just want to do statistics on their own.

339
00:22:30,640 --> 00:22:34,520
I just kind of heard that just word of mouth yesterday.

340
00:22:34,520 --> 00:22:44,000
So I think there is a demand for just people to explore their own data.

341
00:22:44,000 --> 00:22:45,000
Right.

342
00:22:45,000 --> 00:22:49,920
So I think people are becoming increasingly more sophisticated.

343
00:22:49,920 --> 00:22:58,080
And so people want to get a hold of their data and look at it themselves.

344
00:22:58,080 --> 00:23:06,960
Charles, in your work as a data scientist, do you think people are wanting to get a hold

345
00:23:06,960 --> 00:23:11,320
of their data more and more these days?

346
00:23:11,320 --> 00:23:12,320
Yeah, I think so.

347
00:23:12,320 --> 00:23:14,640
I mean, I'm not involved in the cannabis industry.

348
00:23:14,640 --> 00:23:26,360
So I can't speak to that, but there are a lot of people that do data analysis but don't

349
00:23:26,360 --> 00:23:31,320
really know like Python or any sort of programming languages.

350
00:23:31,320 --> 00:23:36,120
And again, they use tools like, you know, they use Excel, they use Power BI, they understand

351
00:23:36,120 --> 00:23:37,920
how to use those things.

352
00:23:37,920 --> 00:23:42,480
So this would be kind of good for people who aren't as technical but still do a lot of

353
00:23:42,480 --> 00:23:46,560
data analysis.

354
00:23:46,560 --> 00:23:53,360
So my other question is how large of a data set have you tried this on?

355
00:23:53,360 --> 00:23:56,800
Yeah, that's a good question.

356
00:23:56,800 --> 00:24:00,240
I'm confident that it will...

357
00:24:00,240 --> 00:24:03,840
So the idea is that what you're doing is really in the sheet, you're editing a data frame.

358
00:24:03,840 --> 00:24:07,440
So you're either passing a data frame as an argument and that will populate the sheet.

359
00:24:07,440 --> 00:24:11,000
Or if you're passing in a CSV, the first thing it automatically does is turn your CSV into

360
00:24:11,000 --> 00:24:12,000
a data frame.

361
00:24:12,000 --> 00:24:16,680
So as long as you can fit the data frame...

362
00:24:16,680 --> 00:24:19,680
You can fit...

363
00:24:19,680 --> 00:24:23,440
As long as you can fit that set into a data frame, which I think the most we have

364
00:24:23,440 --> 00:24:25,400
done is like 15 million rows.

365
00:24:25,400 --> 00:24:28,240
I'm not sure how many columns that was.

366
00:24:28,240 --> 00:24:29,720
So it's more about obviously the area.

367
00:24:29,720 --> 00:24:38,240
But then I think it's definitely usable and of course way faster than doing it in other

368
00:24:38,240 --> 00:24:42,320
visual tools like Excel, for example.

369
00:24:42,320 --> 00:24:46,200
Excel is just really slow.

370
00:24:46,200 --> 00:24:47,800
But then there's also like...

371
00:24:47,800 --> 00:24:50,200
It's actually a whole space of data science I'm not too familiar with.

372
00:24:50,200 --> 00:24:52,600
It's like these people doing...

373
00:24:52,600 --> 00:24:56,640
I think it's like PySpark where it's like they have too much data.

374
00:24:56,640 --> 00:24:59,160
It's like they can't even fit in data frames.

375
00:24:59,160 --> 00:25:02,800
It's this whole other dimension of data science that I really am not too experienced in.

376
00:25:02,800 --> 00:25:08,760
So I think we're more focused on this...

377
00:25:08,760 --> 00:25:12,720
I think the people who have these somewhat smaller data sets, maybe doing more exploratory

378
00:25:12,720 --> 00:25:14,400
data analysis to start.

379
00:25:14,400 --> 00:25:18,800
Smaller, meaning still 5 million rows potentially.

380
00:25:18,800 --> 00:25:28,280
But we're definitely not there for huge productionized machine learning models for insurance companies

381
00:25:28,280 --> 00:25:29,280
and stuff like that.

382
00:25:29,280 --> 00:25:33,000
Those are just like crazy, crazy terabytes of data.

383
00:25:33,000 --> 00:25:40,720
If you want, I just started to do this brief analysis on comparing Eastern and Western

384
00:25:40,720 --> 00:25:41,720
Washington rainfall.

385
00:25:41,720 --> 00:25:45,600
So, how that affects cannabis yields.

386
00:25:45,600 --> 00:25:53,000
So if you want, could I show you that brief analysis and see if that may be something

387
00:25:53,000 --> 00:25:57,880
that could be recreated in your tool?

388
00:25:57,880 --> 00:26:03,480
Would you want to try uploading that into the tool?

389
00:26:03,480 --> 00:26:06,480
Whatever you want to do.

390
00:26:06,480 --> 00:26:14,840
Yes, so here, why don't I run through it real quick and then we can try it with your program.

391
00:26:14,840 --> 00:26:18,600
Sweet, that'd be awesome.

392
00:26:18,600 --> 00:26:29,360
And that way we can try to replicate the results.

393
00:26:29,360 --> 00:26:32,120
Just a brief overview.

394
00:26:32,120 --> 00:26:37,080
Essentially we're just doing sort of a difference in difference.

395
00:26:37,080 --> 00:26:42,440
So just kind of comparing one region to the next.

396
00:26:42,440 --> 00:26:50,200
And so essentially we're just seeing, okay, it looks like there's quite a different climate

397
00:26:50,200 --> 00:26:56,680
in these counties compared to these counties.

398
00:26:56,680 --> 00:27:03,560
So over here it looks wet and incredibly wet.

399
00:27:03,560 --> 00:27:06,640
And then over here it's much drier.

400
00:27:06,640 --> 00:27:11,520
And so does that have any effect on cannabis yields?

401
00:27:11,520 --> 00:27:17,600
Is everybody doing things inside, in temperature- and humidity-controlled rooms?

402
00:27:17,600 --> 00:27:20,720
And it may not have any effect whatsoever.

403
00:27:20,720 --> 00:27:25,320
Or maybe people over here are having a hard time keeping their crops cool.

404
00:27:25,320 --> 00:27:34,360
So we actually will have to do some statistics to see if there's any fundamental difference

405
00:27:34,360 --> 00:27:36,640
in the cannabis yields.

406
00:27:36,640 --> 00:27:41,560
So it's not actually that tricky of an analysis here.

407
00:27:41,560 --> 00:27:43,560
It's pretty short really.

408
00:27:43,560 --> 00:27:47,760
So essentially we're just going to use dummy variables.

409
00:27:47,760 --> 00:27:57,680
So one, if you're in Eastern Washington, zero if you're in Western Washington.

410
00:27:57,680 --> 00:27:59,680
Or I forget which.

411
00:27:59,680 --> 00:28:00,680
Exactly.

412
00:28:00,680 --> 00:28:11,760
So one, if you're in Eastern Washington, zero if you're in Western Washington.

413
00:28:11,760 --> 00:28:19,960
And so the past few weeks we've been working with lab result data in Washington state as

414
00:28:19,960 --> 00:28:28,480
well as earlier in the year. Also, what is the name of your program?

415
00:28:28,480 --> 00:28:31,240
Hold on, Mito.

416
00:28:31,240 --> 00:28:40,000
So in Mito we have already read in the licensees.

417
00:28:40,000 --> 00:28:50,640
And so we're joining that data with the lab results.

418
00:28:50,640 --> 00:28:57,320
Cool.

419
00:28:57,320 --> 00:29:01,440
Okay so.

420
00:29:01,440 --> 00:29:03,160
And you're reading this from an Excel file?

421
00:29:03,160 --> 00:29:04,160
Oh that's the basis.

422
00:29:04,160 --> 00:29:06,320
Did you say that one more time please?

423
00:29:06,320 --> 00:29:07,480
You're reading this from an Excel file?

424
00:29:07,480 --> 00:29:10,560
I think I saw a line in there that said read Excel.

425
00:29:10,560 --> 00:29:13,880
Yes, I think I may have moved this file.

426
00:29:13,880 --> 00:29:18,440
So one second and I am going to make a...

427
00:29:18,440 --> 00:29:26,440
Let me pause this for one second and I'm going to...

428
00:29:26,440 --> 00:29:27,440
Yeah no worries.

429
00:29:27,440 --> 00:29:28,440
All right.

430
00:29:28,440 --> 00:29:29,440
Hey.

431
00:29:29,440 --> 00:29:30,440
Hold on.

432
00:29:30,440 --> 00:29:34,440
I almost forgot Paul here.

433
00:29:34,440 --> 00:29:37,440
Hey Paul.

434
00:29:37,440 --> 00:29:40,440
Hey Paul.

435
00:29:40,440 --> 00:29:46,440
Hey guys, how you doing?

436
00:29:46,440 --> 00:29:47,440
Good.

437
00:29:47,440 --> 00:29:48,440
Excellent.

438
00:29:48,440 --> 00:29:56,840
So you actually joined in a good time because I'm sort of getting the data connected here.

439
00:29:56,840 --> 00:30:05,680
So we've got Jake today who's building a data science tool and we're using it to explore

440
00:30:05,680 --> 00:30:08,440
some of the cannabis data we're working with.

441
00:30:08,440 --> 00:30:09,440
Oh very cool.

442
00:30:09,440 --> 00:30:10,440
Nice to meet you Paul.

443
00:30:10,440 --> 00:30:11,440
How you doing Jake?

444
00:30:11,440 --> 00:30:12,440
Doing well.

445
00:30:12,440 --> 00:30:13,440
Where are you located?

446
00:30:13,440 --> 00:30:14,440
Southeast Michigan.

447
00:30:14,440 --> 00:30:15,440
Where are you?

448
00:30:15,440 --> 00:30:16,440
Cool.

449
00:30:16,440 --> 00:30:17,440
I'm in...

450
00:30:17,440 --> 00:30:18,440
Philadelphia.

451
00:30:18,440 --> 00:30:19,440
I'm actually at the Jersey Shore right now though.

452
00:30:19,440 --> 00:30:20,440
Oh okay.

453
00:30:20,440 --> 00:30:21,440
Okay.

454
00:30:21,440 --> 00:30:31,440
So what we're doing Paul, so this is sort of merging both tools.

455
00:30:31,440 --> 00:30:41,440
We're basically doing an analysis on Eastern versus Western Washington lab results.

456
00:30:41,440 --> 00:30:47,600
So just a simple analysis just to demonstrate working with the data.

457
00:30:47,600 --> 00:30:50,440
Is it kind of like an A/B test or something?

458
00:30:50,440 --> 00:30:51,440
Essentially.

459
00:30:51,440 --> 00:31:00,600
So this was just sort of an example of merging the data sets and just running a regression.

460
00:31:00,600 --> 00:31:01,600
Right on.

461
00:31:01,600 --> 00:31:08,360
But I'm just getting the data pulled up here real quick.

462
00:31:08,360 --> 00:31:12,080
And while you're doing that, Charles thanks again for helping me out this week.

463
00:31:12,080 --> 00:31:13,080
I appreciate it.

464
00:31:13,080 --> 00:31:22,280
I haven't had a chance to get into the parquet files yet but I'm still trying to use PowerShell

465
00:31:22,280 --> 00:31:25,480
to get some of the conversions over and it seems to be working.

466
00:31:25,480 --> 00:31:27,600
It takes a long time though.

467
00:31:27,600 --> 00:31:28,600
Yes.

468
00:31:28,600 --> 00:31:35,920
Converting it in pandas was...

469
00:31:35,920 --> 00:31:36,920
It was slow going.

470
00:31:36,920 --> 00:31:38,880
It was very slow going.

471
00:31:38,880 --> 00:31:39,880
Pretty painful.

472
00:31:39,880 --> 00:31:40,880
Yeah.

473
00:31:40,880 --> 00:31:42,560
So that's interesting.

474
00:31:42,560 --> 00:31:43,560
I was going to...

475
00:31:43,560 --> 00:31:44,560
I thought about like...

476
00:31:44,560 --> 00:31:49,200
I don't know a lot about R but I thought about trying to do some of this analysis in R and

477
00:31:49,200 --> 00:31:52,680
thinking oh maybe that would be better but you're running into the same problems I'm

478
00:31:52,680 --> 00:31:53,680
running into.

479
00:31:53,680 --> 00:31:54,680
Yeah.

480
00:31:54,680 --> 00:31:55,680
Yeah.

481
00:31:55,680 --> 00:32:03,400
And after you were mentioning, gosh, in Python, what is the name of the chunking application

482
00:32:03,400 --> 00:32:05,400
you referred to?

483
00:32:05,400 --> 00:32:06,400
Dask?

484
00:32:06,400 --> 00:32:07,400
Dask.

485
00:32:07,400 --> 00:32:14,240
Which I don't know anything about but within R there was a library that does essentially

486
00:32:14,240 --> 00:32:15,520
well, actually there are like three libraries.

487
00:32:15,520 --> 00:32:21,360
I've tried two of them and those two didn't work but this other one seems to be kind of

488
00:32:21,360 --> 00:32:26,080
promising where it just essentially takes the data and chunks it out into files and

489
00:32:26,080 --> 00:32:28,960
kind of distributes them and you do some parallel processing on it.

490
00:32:28,960 --> 00:32:34,760
So I'm going to give that a whirl and see what happens and yeah just test it out.

491
00:32:34,760 --> 00:32:35,760
Cool.

492
00:32:35,760 --> 00:32:36,760
Sorry about that.

493
00:32:36,760 --> 00:32:37,760
I can focus now.

494
00:32:37,760 --> 00:32:48,760
So that's good and I've read your emails or your messages that you're working to get the

495
00:32:48,760 --> 00:32:52,720
UTF-16 converted to UTF-8.

496
00:32:52,720 --> 00:32:59,720
So correct me if I'm wrong but I think you've got a solution.

497
00:32:59,720 --> 00:33:08,760
Yeah, it's essentially just using PowerShell to run a small script, and it just chunks

498
00:33:08,760 --> 00:33:17,240
through the whole file and then gives me a UTF-8 TSV file on the other side.

499
00:33:17,240 --> 00:33:22,560
But then there's another script I can run that will convert the TSV to CSV and then I've

500
00:33:22,560 --> 00:33:26,280
got the full CSV file and then what I've been doing is I'm in the process of doing this

501
00:33:26,280 --> 00:33:32,200
right now is uploading them to Google Cloud with the free instance that I've got right

502
00:33:32,200 --> 00:33:38,200
now and then just to get some of these larger files up there to see if this tests out the

503
00:33:38,200 --> 00:33:40,880
process and then run some small queries against them.

504
00:33:40,880 --> 00:33:42,880
That's what I'm trying to do.
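
The conversion described here was done with PowerShell scripts that are not shown in the call. As a rough Python equivalent, the sketch below streams a UTF-16 TSV file into a UTF-8 CSV file one row at a time, so the whole file never has to fit in memory. The function name, file names, and sample row are made up for illustration, and it assumes the TSV uses standard double-quote quoting.

```python
import csv
import os
import tempfile


def utf16_tsv_to_utf8_csv(src_path, dst_path):
    """Stream a UTF-16 TSV file into a UTF-8 CSV file; return the row count."""
    rows = 0
    # newline="" lets the csv module handle line endings itself.
    with open(src_path, "r", encoding="utf-16", newline="") as src, \
         open(dst_path, "w", encoding="utf-8", newline="") as dst:
        reader = csv.reader(src, delimiter="\t")
        writer = csv.writer(dst)
        for row in reader:  # only one row is held in memory at a time
            writer.writerow(row)
            rows += 1
    return rows


# Tiny demonstration with a made-up two-line file.
workdir = tempfile.mkdtemp()
src_path = os.path.join(workdir, "sample_utf16.tsv")
dst_path = os.path.join(workdir, "sample_utf8.csv")
with open(src_path, "w", encoding="utf-16", newline="") as f:
    f.write("global_id\tcity\nWA.1\tBellingham, WA\n")

row_count = utf16_tsv_to_utf8_csv(src_path, dst_path)
with open(dst_path, encoding="utf-8", newline="") as f:
    converted = f.read()
```

Doing both steps in one pass avoids writing an intermediate UTF-8 TSV file, which may matter when the source files are very large.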

505
00:33:42,880 --> 00:33:44,840
Excellent work.

506
00:33:44,840 --> 00:33:53,600
So just to show just a demonstration of this data to Jake and just some statistics that

507
00:33:53,600 --> 00:33:56,720
I was beginning to tinker on.

508
00:33:56,720 --> 00:34:00,520
So just always just start with the question.

509
00:34:00,520 --> 00:34:08,240
So my question was does rainfall, so does like the amount of moisture affect the cannabinoid

510
00:34:08,240 --> 00:34:09,240
results?

511
00:34:09,240 --> 00:34:16,960
And so I kind of wanted to look at all of the different results maybe some fail more

512
00:34:16,960 --> 00:34:23,120
for moisture content, maybe the water activity is higher on some than the others.

513
00:34:23,120 --> 00:34:32,360
So the way Washington is set up, it works well for essentially just a quick and simple

514
00:34:32,360 --> 00:34:34,720
dummy variable analysis.

515
00:34:34,720 --> 00:34:40,000
So we're just going to get.

516
00:34:40,000 --> 00:34:45,560
I think we just lost him.

517
00:34:45,560 --> 00:34:50,720
Hey Keegan, I don't know if you can hear us, but you froze up.

518
00:34:50,720 --> 00:34:53,680
At least it's a nice image to look at.

519
00:34:53,680 --> 00:34:55,840
He's got his power fist going, so that's good.

520
00:34:55,840 --> 00:34:58,840
Oh, there you are Keegan.

521
00:34:58,840 --> 00:34:59,840
You came back.

522
00:34:59,840 --> 00:35:00,840
My apologies.

523
00:35:00,840 --> 00:35:07,160
So I actually should let you know Jake, so I am working at the Oklahoma City Library.

524
00:35:07,160 --> 00:35:14,120
I'm in Oklahoma City for a CannaCon tomorrow and Friday.

525
00:35:14,120 --> 00:35:23,040
So I'm going to be meeting people in the cannabis industry here, see what their challenges are,

526
00:35:23,040 --> 00:35:26,280
how things are going, how the market is developing.

527
00:35:26,280 --> 00:35:28,280
That's really cool.

528
00:35:28,280 --> 00:35:33,280
So back to the work here, Eastern versus Western Washington.

529
00:35:33,280 --> 00:35:48,480
What I'm doing here is just reading in the lab result data and the licensee data and

530
00:35:48,480 --> 00:35:52,200
then merging it.

531
00:35:52,200 --> 00:36:06,080
And so, for example.

532
00:36:06,080 --> 00:36:15,680
So those are the variables for licensees and we can merge these with the licensees' global

533
00:36:15,680 --> 00:36:25,760
ID and then the lab result for MME ID.

534
00:36:25,760 --> 00:36:42,200
So you can look at the data points, but that way we can merge them.
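
The notebook code for this merge is not captured in the transcript, so here is a minimal pandas sketch of the idea. The column names `global_id` and `for_mme_id` are a reading of the spoken description and should be treated as assumptions, and the rows are made up.

```python
import pandas as pd

# Made-up rows; column names are assumptions based on the spoken description.
licensees = pd.DataFrame({
    "global_id": ["WA.L1", "WA.L2"],
    "city": ["Bellingham", "Spokane"],
    "county": ["Whatcom", "Spokane"],
})
lab_results = pd.DataFrame({
    "lab_result_id": ["R1", "R2", "R3"],
    "for_mme_id": ["WA.L1", "WA.L2", "WA.L1"],
    "thc": [18.2, 21.5, 16.9],
})

# Attach each licensee's location to every one of its lab results.
# validate="many_to_one" raises an error if a licensee ID is duplicated.
merged = lab_results.merge(
    licensees,
    left_on="for_mme_id",
    right_on="global_id",
    how="left",
    validate="many_to_one",
)
```

A left join keeps every lab result even when no matching licensee is found, which makes missing matches easy to spot as empty location columns.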

535
00:36:42,200 --> 00:36:49,440
And then we have a lot of data points for this one lab result.

536
00:36:49,440 --> 00:36:53,400
So now we have this is a lab result here.

537
00:36:53,400 --> 00:36:58,480
It is with this piece of inventory.

538
00:36:58,480 --> 00:37:07,520
So we could actually look up the inventory at this point and get even more data points.

539
00:37:07,520 --> 00:37:15,720
For now, our analysis doesn't need those data points, but in the future, we'll want to just

540
00:37:15,720 --> 00:37:19,920
get all the data points that we can together.

541
00:37:19,920 --> 00:37:25,680
And then essentially what we've done here is just added on the licensees data.

542
00:37:25,680 --> 00:37:33,360
So now we know, okay, this licensee is in Bellingham, Washington.

543
00:37:33,360 --> 00:37:38,480
And they're in Whatcom County.

544
00:37:38,480 --> 00:37:46,200
So now it's pretty easy just to create a dummy variable if they're in Eastern or Western

545
00:37:46,200 --> 00:37:47,200
Washington.

546
00:37:47,200 --> 00:37:54,320
Pandas has tricks for creating dummy variables, but I actually just wrote a quick script,

547
00:37:54,320 --> 00:38:00,680
you know, just wrote a quick, you know, quick little loop here just to create that.
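
The loop itself is not shown in the transcript. One loop-free way to build the same kind of dummy variable in pandas is sketched below. The county set is only an illustrative subset of counties east of the Cascades, and the `county` and `eastern` column names are assumptions.

```python
import pandas as pd

# Illustrative subset only; a real analysis needs the full county list.
EASTERN_COUNTIES = {"Spokane", "Yakima", "Chelan"}

licensees = pd.DataFrame({
    "county": ["Whatcom", "Spokane", "King", "Yakima"],
})

# 1 if the licensee is in an Eastern Washington county, 0 otherwise.
licensees["eastern"] = licensees["county"].isin(EASTERN_COUNTIES).astype(int)

# The mean of a 0/1 column is the share of rows equal to 1.
share_eastern = licensees["eastern"].mean()
```

Because the dummy is 0 or 1, its mean gives the share of licensees in Eastern Washington directly.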

548
00:38:00,680 --> 00:38:03,200
Can I jump in for a sec?

549
00:38:03,200 --> 00:38:06,760
So I think what's interesting there is you wrote, so you wrote that little script there,

550
00:38:06,760 --> 00:38:09,600
you know, it's an if statement essentially.

551
00:38:09,600 --> 00:38:14,680
So what you would do in my tool is you would just use like, you would just make a new column

552
00:38:14,680 --> 00:38:17,400
and then you could do an if statement in that column.

553
00:38:17,400 --> 00:38:23,120
Just like: if this, then that. If x, 1; if y, 0. It's kind of interesting.

554
00:38:23,120 --> 00:38:24,120
Yes.

555
00:38:24,120 --> 00:38:26,840
So Paul, this is Jake's tool.

556
00:38:26,840 --> 00:38:32,760
And so this sort of combines a user interface with a Python code generator.

557
00:38:32,760 --> 00:38:33,760
Okay.

558
00:38:33,760 --> 00:38:34,760
This would be...

559
00:38:34,760 --> 00:38:35,760
That's called Mito sheet.

560
00:38:35,760 --> 00:38:36,760
Yeah.

561
00:38:36,760 --> 00:38:49,760
Here, I'll put the, I'll put some links in the chat.

562
00:38:49,760 --> 00:38:56,640
So just for everybody's sake, I'll run through mine real quick, but then essentially I see

563
00:38:56,640 --> 00:39:03,920
you would basically, you know, create your dummy variable here, sort of Excel style.

564
00:39:03,920 --> 00:39:11,040
And then you'll have, you know, the code automatically generated.

565
00:39:11,040 --> 00:39:14,040
Yeah, totally, totally.

566
00:39:14,040 --> 00:39:16,080
Was it cool?

567
00:39:16,080 --> 00:39:21,200
So we've got, we've got that dummy variable now.

568
00:39:21,200 --> 00:39:25,360
And so, I mean, so this is actually interesting.

569
00:39:25,360 --> 00:39:34,000
This is actually the first time I've calculated this statistic, but like, so this is an interesting

570
00:39:34,000 --> 00:39:37,220
statistic that maybe not a lot of people even know.

571
00:39:37,220 --> 00:39:50,800
So it looks like, you know, 62% of licensees are in Eastern Washington and, you know,

572
00:39:50,800 --> 00:40:00,000
about 38% are in the other, you know, Western Washington.

573
00:40:00,000 --> 00:40:06,760
And so what you could do is remember, I always say, well, I'm not sure if you've heard me

574
00:40:06,760 --> 00:40:13,760
say this, but what I would say is you can get real interesting by just doing conditional,

575
00:40:13,760 --> 00:40:16,740
conditional averages.

576
00:40:16,740 --> 00:40:26,440
So if you just, just took the mean conditional on, okay, if it's a producer or processor,

577
00:40:26,440 --> 00:40:35,000
you can now find out, okay, are there more producers in Western Washington or Eastern

578
00:40:35,000 --> 00:40:36,880
Washington?
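
A conditional average of this kind can be sketched with a pandas group-by. The `license_type` column and its values are hypothetical, and the rows are made up, so the numbers say nothing about the real market.

```python
import pandas as pd

# Made-up rows; `license_type` is a hypothetical column name.
licensees = pd.DataFrame({
    "license_type": ["cultivator", "cultivator", "retailer", "retailer", "retailer"],
    "eastern": [1, 0, 0, 0, 1],
})

# Mean of the 0/1 dummy within each license type:
# the share of that type located in Eastern Washington.
share_by_type = licensees.groupby("license_type")["eastern"].mean()
```

Comparing these shares across license types is one simple way to check the hypothesis about where cultivators and retailers are located.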

579
00:40:36,880 --> 00:40:48,600
Because my, you know, hypothesis is that there are a lot of retailers in Eastern Washington

580
00:40:48,600 --> 00:40:55,480
and a lot of cultivators in Western Washington.

581
00:40:55,480 --> 00:41:03,960
And wonder, wonder what the implications would be on transportation for that.

582
00:41:03,960 --> 00:41:13,920
Well, it depends on the way the laboratory, well, actually, well, there's a lot of factors.

583
00:41:13,920 --> 00:41:14,920
So you're right.

584
00:41:14,920 --> 00:41:21,440
So the people will end up, and that's the thing in Washington is essentially transporting

585
00:41:21,440 --> 00:41:29,480
the cannabis from Western Washington to Eastern Washington, you know, first to test it at

586
00:41:29,480 --> 00:41:30,800
a laboratory.

587
00:41:30,800 --> 00:41:37,520
There are some laboratories in Western Washington, and then to send it to all their retailers.

588
00:41:37,520 --> 00:41:43,800
So just to understand the flow a little bit, because I'm new to this, so any grower has

589
00:41:43,800 --> 00:41:47,960
to, there's some formality, they have to get it tested at a specific type of lab before

590
00:41:47,960 --> 00:41:49,320
they can sell it.

591
00:41:49,320 --> 00:41:50,320
They can't do any of that testing in-house.

592
00:41:50,320 --> 00:41:51,320
Exactly.

593
00:41:51,320 --> 00:42:00,920
So they need to get their products tested at a third party independent laboratory.

594
00:42:00,920 --> 00:42:01,920
That's interesting.

595
00:42:01,920 --> 00:42:02,920
I wonder why.

596
00:42:02,920 --> 00:42:03,920
Yeah, that's interesting.

597
00:42:03,920 --> 00:42:07,680
I think it's for, well, I'm just speculating.

598
00:42:07,680 --> 00:42:14,280
I'm new like you, but I would think it's for overlapping interests, right?

599
00:42:14,280 --> 00:42:16,360
Because you could probably cut a lot of corners.

600
00:42:16,360 --> 00:42:21,800
If you're a transporter and a producer, I wonder if some of the checks and balances

601
00:42:21,800 --> 00:42:24,800
would be kind of compromised by doing that.

602
00:42:24,800 --> 00:42:25,800
Yeah, totally.

603
00:42:25,800 --> 00:42:31,720
It would be interesting if you could, like, if you could, there's a way you could, I don't

604
00:42:31,720 --> 00:42:35,320
know, there's a way you could have it so you could do that testing in-house.

605
00:42:35,320 --> 00:42:37,560
You could potentially save a lot of money.

606
00:42:37,560 --> 00:42:40,200
I imagine it's the growers who are paying for that transportation.

607
00:42:40,200 --> 00:42:48,080
Yeah, but they have a big problem with people trying to fake results.

608
00:42:48,080 --> 00:42:49,080
Right, right.

609
00:42:49,080 --> 00:42:58,720
Yeah, I mean, there's a lot of growers who are using a lot of pesticides and agents

610
00:42:58,720 --> 00:43:01,720
to stop mold.

611
00:43:01,720 --> 00:43:05,600
It wasn't a safe product to sell anymore.

612
00:43:05,600 --> 00:43:09,760
So that's why you have independent labs to do that kind of stuff.

613
00:43:09,760 --> 00:43:18,360
Also, we should actually be careful with the statistics that I was calculating because

614
00:43:18,360 --> 00:43:25,640
I was only reading in 10,000 and there are about 2 million lab results.

615
00:43:25,640 --> 00:43:29,160
So that is not even going to be close to accurate.

616
00:43:29,160 --> 00:43:32,480
Yeah, you can do a random sample if you don't want to read it all in.

617
00:43:32,480 --> 00:43:33,480
Quick question, Charles.

618
00:43:33,480 --> 00:43:45,400
Do you know how to read in a CSV with just a random number of samples?

619
00:43:45,400 --> 00:43:46,400
No.
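
For reference, one way to read a random sample of rows from a CSV with pandas is to pass a callable to the `skiprows` parameter of `read_csv`. The sketch below keeps the header and keeps each data row with a chosen probability. The helper name `read_csv_sample` and the data are made up for illustration.

```python
import io
import random

import pandas as pd


def read_csv_sample(source, fraction, seed=0, **kwargs):
    """Read roughly `fraction` of the data rows of a CSV, chosen at random."""
    rng = random.Random(seed)

    def skip(row_index):
        # Row 0 is the header and is always kept; each data row is
        # skipped with probability 1 - fraction.
        return row_index > 0 and rng.random() >= fraction

    return pd.read_csv(source, skiprows=skip, **kwargs)


# Made-up CSV with 1,000 data rows.
csv_text = "lab_result_id,thc\n" + "".join(
    f"R{i},{i % 30}\n" for i in range(1000)
)
sample = read_csv_sample(io.StringIO(csv_text), fraction=0.1, seed=42)
everything = read_csv_sample(io.StringIO(csv_text), fraction=1.0)
```

By contrast, `pd.read_csv(path, nrows=10000)` reads only the first 10,000 rows, which is fast but not random, so it can give skewed statistics if the file is ordered in some way.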

620
00:43:46,400 --> 00:43:56,320
Yeah, so you may just have to hedge this, but this is sort of just a demo of an

621
00:43:56,320 --> 00:43:57,320
analysis.

622
00:43:57,320 --> 00:43:58,320
That's pretty cool.

623
00:43:58,320 --> 00:44:02,920
I mean, we were talking about this before, just the fact that you're going through this

624
00:44:02,920 --> 00:44:07,320
and answering some of those high level stats questions gives you that baseline of understanding

625
00:44:07,320 --> 00:44:10,840
of what the market's like and how the market operates too.

626
00:44:10,840 --> 00:44:15,000
So just having that information, you'll be way ahead of a lot of people.

627
00:44:15,000 --> 00:44:16,000
Exactly.

628
00:44:16,000 --> 00:44:21,160
So my red flag was, that's what you do when you're looking at data.

629
00:44:21,160 --> 00:44:25,280
You've got to look at the data because there could be some red flags.

630
00:44:25,280 --> 00:44:32,760
And so what my red flag was, was 100% of the cultivators are in Eastern Washington.

631
00:44:32,760 --> 00:44:40,880
And so that's just because I just read in the first 10,000 observations.

632
00:44:40,880 --> 00:44:44,720
One thing that comes to mind is you're showing me this, and we talked about this a little

633
00:44:44,720 --> 00:44:51,800
bit last time, but like Washington and other states, they're in this kind of weird situation

634
00:44:51,800 --> 00:44:53,640
where they're trying to grow an industry, right?

635
00:44:53,640 --> 00:44:57,640
They're really trying to get it off the ground so they can get their tax money out of it.

636
00:44:57,640 --> 00:45:03,200
But they're also trying to rein it in enough to where it's controlled and people aren't

637
00:45:03,200 --> 00:45:07,440
getting sick from bad product or what have you, try to mitigate the risks.

638
00:45:07,440 --> 00:45:09,600
So they're in kind of this weird balancing act.

639
00:45:09,600 --> 00:45:14,480
But at some point, I would imagine that they're going to have to do some auditing and compliance,

640
00:45:14,480 --> 00:45:20,360
like pretty stringent auditing and compliance checks.

641
00:45:20,360 --> 00:45:28,800
Or also checks for if there's any way to check for what's the word I'm looking for?

642
00:45:28,800 --> 00:45:29,800
Shoot.

643
00:45:29,800 --> 00:45:32,960
That will come to me in a second.

644
00:45:32,960 --> 00:45:37,720
But to position yourself to where you can provide data that would help with auditing

645
00:45:37,720 --> 00:45:42,840
and compliance would probably be a pretty good strategic move.

646
00:45:42,840 --> 00:45:52,440
That's interesting because I think you're right, LEAF Data Systems is collecting all

647
00:45:52,440 --> 00:45:57,440
this data just to look at it.

648
00:45:57,440 --> 00:46:03,640
Rule number one about data, look at the data.

649
00:46:03,640 --> 00:46:05,640
So you're right.

650
00:46:05,640 --> 00:46:09,640
And it's one of those things where like I was saying, there's such a high demand for

651
00:46:09,640 --> 00:46:12,600
this and such a low supply.

652
00:46:12,600 --> 00:46:20,040
So LEAF Data Systems is probably trying to do their best to supply visualizations and

653
00:46:20,040 --> 00:46:25,400
reports and statistics, but they may be underwater.

654
00:46:25,400 --> 00:46:33,840
There may be such a high demand, they just can't fill it.

655
00:46:33,840 --> 00:46:35,640
So you're right.

656
00:46:35,640 --> 00:46:42,200
So that's where other people in the industry can kind of help out.

657
00:46:42,200 --> 00:46:45,960
So I want to give a shout out to Jim McCray.

658
00:46:45,960 --> 00:46:53,240
And so he is actually the person who originally provided this data set to us.

659
00:46:53,240 --> 00:46:55,240
And that's essentially what he does.

660
00:46:55,240 --> 00:46:58,240
He's sort of the canary in the coal mine.

661
00:46:58,240 --> 00:47:01,120
So he, the canary in the data mine.

662
00:47:01,120 --> 00:47:09,640
So he is digging around with Kim's data, essentially doing exactly what you're talking about, just

663
00:47:09,640 --> 00:47:19,360
seeing if there's any oddities or being an extra eye on.

664
00:47:19,360 --> 00:47:20,920
So who is Jim McCray?

665
00:47:20,920 --> 00:47:23,880
Is he with the State of Washington?

666
00:47:23,880 --> 00:47:30,400
I think he, I'm not 100% sure the organization he's associated with, but he's a researcher

667
00:47:30,400 --> 00:47:31,400
in Washington.

668
00:47:31,400 --> 00:47:38,120
And he's done a lot of research on the cannabis industry.

669
00:47:38,120 --> 00:47:43,720
So you may be able to find him on LinkedIn.

670
00:47:43,720 --> 00:47:53,960
That could honestly be the best way to get a hold of him. Maybe here, his LinkedIn.

671
00:47:53,960 --> 00:47:54,960
I was just curious.

672
00:47:54,960 --> 00:48:00,920
So yeah, so he's trying to, in a researchy kind of way, try and keep a pulse on the market

673
00:48:00,920 --> 00:48:08,880
and keep his eye open for things like compliance and that sort of thing.

674
00:48:08,880 --> 00:48:09,880
Exactly.

675
00:48:09,880 --> 00:48:13,640
So there's interesting things to look at.

676
00:48:13,640 --> 00:48:23,400
So one is just the total amount of sales per lab result, because some products just sell

677
00:48:23,400 --> 00:48:25,360
a lot and others don't.

678
00:48:25,360 --> 00:48:37,360
And then you just want to keep an eye on failure rates just to make sure that there's not a

679
00:48:37,360 --> 00:48:43,240
significant, or you just want to be aware of a significant trend one way or the other.

680
00:48:43,240 --> 00:48:49,440
So for example, so just to give them a shout out.

681
00:48:49,440 --> 00:48:57,160
So if you want a good example of someone who's sort of, someone keeping their finger on the

682
00:48:57,160 --> 00:49:01,400
pulse, but they're doing it in California.

683
00:49:01,400 --> 00:49:14,640
So they're just seeing, okay, how many things are being tested and they plot failure rates.

684
00:49:14,640 --> 00:49:23,040
So you can see that failure rates are going down over time while the number of tests are

685
00:49:23,040 --> 00:49:27,920
going up.

686
00:49:27,920 --> 00:49:34,240
I think they may have, and this is, I think one of their best contributions is they actually

687
00:49:34,240 --> 00:49:42,120
show the seasonality in, I believe this is seasonality in failures.

688
00:49:42,120 --> 00:49:45,920
It may not necessarily be of cannabis.

689
00:49:45,920 --> 00:49:53,240
Oh, actually, no, no, no, this is pesticide use.

690
00:49:53,240 --> 00:50:00,840
So this is the analysis they've done that I think can be extended.

691
00:50:00,840 --> 00:50:13,840
So in California, they record the amount of pesticides that are used over time and it

692
00:50:13,840 --> 00:50:18,800
seems there's a cyclical use of pesticides.

693
00:50:18,800 --> 00:50:28,320
I think it would be interesting to try to correlate pesticide use with failure rates

694
00:50:28,320 --> 00:50:40,120
to see if somebody's using pesticides, does that affect the failure rate?

695
00:50:40,120 --> 00:50:48,880
But enough of that tangent.

696
00:50:48,880 --> 00:50:53,120
Sorry about that.

697
00:50:53,120 --> 00:51:00,840
So extra eye on the data.

698
00:51:00,840 --> 00:51:08,240
So we can actually use some statistics here to start answering some of these questions.

699
00:51:08,240 --> 00:51:16,240
So back to our original question of the day, does rainfall have an effect on cannabinoid

700
00:51:16,240 --> 00:51:17,240
results?

701
00:51:17,240 --> 00:51:24,680
Hedge this: this is not a random sample, this is just the first 10,000 lab results.

702
00:51:24,680 --> 00:51:29,880
So you'd want to actually repeat this analysis with all of the lab results and then you'd

703
00:51:29,880 --> 00:51:34,280
actually get a better estimate of the effect.

704
00:51:34,280 --> 00:51:39,000
So hedge this, this is just a demonstration of the statistics.

705
00:51:39,000 --> 00:51:45,800
It's not actually representative of what we think the true estimates are.

706
00:51:45,800 --> 00:51:51,320
So Keegan, you pulled out like the first 10,000 results.

707
00:51:51,320 --> 00:51:57,960
I was able to do the same sort of thing with the files that I have.

708
00:51:57,960 --> 00:52:03,520
But to look across the entire data set, do you have the capacity to do that yourself

709
00:52:03,520 --> 00:52:11,120
or are you kind of constrained on how much data you can work with?

710
00:52:11,120 --> 00:52:13,000
I can probably pull it off.

711
00:52:13,000 --> 00:52:18,680
I just haven't yet.

712
00:52:18,680 --> 00:52:23,840
Were you thinking that you may be able to do this analysis or?

713
00:52:23,840 --> 00:52:28,920
Well, right now, like I've been kind of complaining the last couple of meetings and leaning on

714
00:52:28,920 --> 00:52:35,160
your guys's expertise, but I'm still trying just to get the entire data sets in a usable

715
00:52:35,160 --> 00:52:38,000
kind of format and I'm getting there.

716
00:52:38,000 --> 00:52:44,000
But as soon as I get it up to Google Cloud, I mean, running queries on this data is going

717
00:52:44,000 --> 00:52:46,360
to be really easy.

718
00:52:46,360 --> 00:52:47,840
Just running SQL queries on it.

719
00:52:47,840 --> 00:52:49,640
Of course, how much is it going to cost?

720
00:52:49,640 --> 00:52:50,640
I don't know yet.

721
00:52:50,640 --> 00:52:52,600
So I got to be careful of that.

722
00:52:52,600 --> 00:52:59,040
But if I get those things up in place and it seems like they're intact, they're not

723
00:52:59,040 --> 00:53:03,320
corrupt because I know that Charles, you're saying that like the inventories file you

724
00:53:03,320 --> 00:53:06,500
found to be corrupted somehow.

725
00:53:06,500 --> 00:53:09,940
So I'm just saying if I can get it up there and they're in place, I'll let you know.

726
00:53:09,940 --> 00:53:13,720
And then maybe we can do something to kind of speed up some of your exploratory analysis

727
00:53:13,720 --> 00:53:14,720
a little bit.

728
00:53:14,720 --> 00:53:15,720
Definitely, definitely, definitely.

729
00:53:15,720 --> 00:53:16,720
So how about this next week?

730
00:53:16,720 --> 00:53:25,720
I can focus and then it seems that you're going to be focusing on this as well.

731
00:53:25,720 --> 00:53:30,160
Let's get this data usable because that's the first step.

732
00:53:30,160 --> 00:53:36,040
I think, you know, we've sort of mocked out how we can do some analysis.

733
00:53:36,040 --> 00:53:41,760
But you know, we've got to do our yak shaving.

734
00:53:41,760 --> 00:53:49,400
And so we're basically stuck right here where we need to get all this data ready.

735
00:53:49,400 --> 00:53:56,080
And Charles has come up with a couple of good solutions.

736
00:53:56,080 --> 00:54:00,120
I just need to spend time and implement them.

737
00:54:00,120 --> 00:54:07,080
So this next week, that's what I'll be focusing on is getting this data usable.

738
00:54:07,080 --> 00:54:12,640
And then next week we can touch base again and I'll send you messages throughout the

739
00:54:12,640 --> 00:54:13,640
week.

740
00:54:13,640 --> 00:54:22,040
And that will be essentially the next step in our data analysis journey.

741
00:54:22,040 --> 00:54:28,120
Just for this, just so Jake knows, I'm actually just finishing up a master's in data science

742
00:54:28,120 --> 00:54:30,240
and I'm getting ready to do my capstone project.

743
00:54:30,240 --> 00:54:36,920
So I kind of lucked out and hooked up with the guys here.

744
00:54:36,920 --> 00:54:42,460
And I thought it would be kind of interesting to do a project around the cannabis industry

745
00:54:42,460 --> 00:54:45,120
just because it's a novel data set.

746
00:54:45,120 --> 00:54:50,320
And you can apply it like Keegan was saying the first time I was talking to him that because

747
00:54:50,320 --> 00:54:55,660
it's a new data set, there's some really good low-hanging fruit here with some basic analysis.

748
00:54:55,660 --> 00:54:57,960
So that was kind of that's my focus.

749
00:54:57,960 --> 00:55:02,760
And I have to actually try and get a data set together pretty quick because my project's

750
00:55:02,760 --> 00:55:04,760
due by the beginning of August.

751
00:55:04,760 --> 00:55:08,280
And there's a bunch of other stuff I've got to do as well.

752
00:55:08,280 --> 00:55:11,880
So my goal is to try and get this data in a usable format as soon as possible.

753
00:55:11,880 --> 00:55:16,480
And of course, whatever success I have, I'll share with you guys.

754
00:55:16,480 --> 00:55:18,560
And then we can all kind of pound on it.

755
00:55:18,560 --> 00:55:19,560
Cool.

756
00:55:19,560 --> 00:55:24,040
Like I said in the chat, the tool, it's of course free.

757
00:55:24,040 --> 00:55:27,400
If it's at all helpful to you in the process, definitely.

758
00:55:27,400 --> 00:55:29,600
I put the docs in the chat if you want to try it out.

759
00:55:29,600 --> 00:55:30,600
Awesome.

760
00:55:30,600 --> 00:55:31,600
Yeah.

761
00:55:31,600 --> 00:55:32,600
Thanks, Jake.

762
00:55:32,600 --> 00:55:33,600
I appreciate that.

763
00:55:33,600 --> 00:55:39,400
Just to take it home real quick and then we'll close out.

764
00:55:39,400 --> 00:55:46,360
Basically we just ran a regression here, the THC on Eastern Washington.

765
00:55:46,360 --> 00:55:56,560
So you know, you could interpret this coefficient as, you know, the THC difference in Eastern

766
00:55:56,560 --> 00:55:58,520
Washington versus Western Washington.

767
00:55:58,520 --> 00:56:05,440
Keep in mind, we haven't read in all the data, but you would interpret this if you were confident

768
00:56:05,440 --> 00:56:08,240
that that was about a 5% difference.

769
00:56:08,240 --> 00:56:16,480
But there were some red flags, R squared, and principally R squared is a pretty big

770
00:56:16,480 --> 00:56:19,480
red flag.
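
[Editor's sketch] The regression described here, THC on an Eastern Washington dummy variable, can be illustrated with a minimal pure-Python example. The numbers below are made up for illustration and are not the meetup's data; the function name `ols_on_dummy` and the variable names are hypothetical. With a single 0/1 regressor, the fitted coefficient equals the difference in group means, and R squared shows how much of the variation that difference explains, which is why a very low value is a red flag.

```python
from statistics import mean

def ols_on_dummy(y, d):
    """Fit y = a + b*d by ordinary least squares, where d is a 0/1 dummy."""
    y_bar, d_bar = mean(y), mean(d)
    # Slope = cov(d, y) / var(d); the intercept follows from the means.
    s_dy = sum((di - d_bar) * (yi - y_bar) for di, yi in zip(d, y))
    s_dd = sum((di - d_bar) ** 2 for di in d)
    b = s_dy / s_dd
    a = y_bar - b * d_bar
    # R squared = 1 - (residual sum of squares / total sum of squares).
    ss_res = sum((yi - (a + b * di)) ** 2 for di, yi in zip(d, y))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return a, b, 1 - ss_res / ss_tot

# Made-up illustrative values, NOT the data discussed in the meetup.
regions = ["east", "west", "east", "west", "east", "west"]
thc = [24.0, 18.0, 20.0, 16.0, 22.0, 20.0]

# Dummy variable: 1 for Eastern Washington, 0 for Western Washington.
eastern = [1 if r == "east" else 0 for r in regions]

intercept, coef, r2 = ols_on_dummy(thc, eastern)
print(f"west mean (intercept): {intercept:.2f}")
print(f"east minus west (coefficient): {coef:.2f}")
print(f"R squared: {r2:.2f}")
```

In practice a library such as statsmodels would also report standard errors, which this sketch leaves out.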

771
00:56:19,480 --> 00:56:29,920
So, Jake, this is essentially the final piece that would need to be done in your program

772
00:56:29,920 --> 00:56:30,920
if possible.

773
00:56:30,920 --> 00:56:40,960
So, you know, we can get all the way through here where we've created the dummy variable.

774
00:56:40,960 --> 00:56:47,920
And so that could take us all the way to there and we can be content with that.

775
00:56:47,920 --> 00:56:53,960
And then if you use statsmodels, or if you're interested in doing future work, you know,

776
00:56:53,960 --> 00:56:58,280
this may be something you'd be interested in looking at is, you know, just a way where

777
00:56:58,280 --> 00:57:01,920
you could run simple regressions.

778
00:57:01,920 --> 00:57:04,160
Yeah, totally.

779
00:57:04,160 --> 00:57:07,240
That's definitely something I think I want to add for sure.

780
00:57:07,240 --> 00:57:10,360
We're sort of, you know, we're building it out.

781
00:57:10,360 --> 00:57:15,920
I think right now we have like the base functionality you need, which is like transformations and

782
00:57:15,920 --> 00:57:19,000
pivot tables and merging and filtering.

783
00:57:19,000 --> 00:57:23,240
And then I think the question from there is like, do we go more into like the visualization

784
00:57:23,240 --> 00:57:28,720
space or the statistical space or the modeling space, you know, like machine learning, because

785
00:57:28,720 --> 00:57:34,800
obviously there's all these amazing packages we could try and develop this front end for.

786
00:57:34,800 --> 00:57:36,920
And so that's, yeah, that's just kind of the crossroads we're at.

787
00:57:36,920 --> 00:57:40,920
But definitely something like that, like running a simple regression, like, you know, it's

788
00:57:40,920 --> 00:57:41,920
not simple at all.

789
00:57:41,920 --> 00:57:48,160
But you kind of hit the nail on the head because we all agree that, you know, with data science,

790
00:57:48,160 --> 00:57:55,560
love it or hate it, you end up spending about 90% of your time or more, sometimes 99% of

791
00:57:55,560 --> 00:57:59,360
your time just cleaning the data and getting it into the checklist.

792
00:57:59,360 --> 00:58:05,160
And then, but that's the beauty of it because you spend all this work and then all of a

793
00:58:05,160 --> 00:58:08,000
sudden your data set's actually manageable.

794
00:58:08,000 --> 00:58:11,040
And then all you have to do is run a regression.

795
00:58:11,040 --> 00:58:17,400
And if it's good, clean data and you have all the data points you need, often you can

796
00:58:17,400 --> 00:58:23,920
just do an ordinary least squares regression and that'll be incredibly informative.

797
00:58:23,920 --> 00:58:24,920
Yeah, totally.

798
00:58:24,920 --> 00:58:25,920
Yeah, definitely.

799
00:58:25,920 --> 00:58:35,240
So, yeah, no, Jake, Keegan has a bunch of data and I'd be interested, you know, it'd

800
00:58:35,240 --> 00:58:41,880
be interesting for you to like try and load that into your program and see, because we've

801
00:58:41,880 --> 00:58:46,760
had a lot of problems with this data.

802
00:58:46,760 --> 00:58:50,240
And so if you're going to make it, you know, if it's going to be something sort of like

803
00:58:50,240 --> 00:58:56,120
an easy to use kind of tool, these are the kind of problems people are going to run into.

804
00:58:56,120 --> 00:58:58,280
Yeah, that'd be cool.

805
00:58:58,280 --> 00:59:04,000
Keegan, if you're up for it, I'd love maybe, or wherever, I'd love to maybe just, if you

806
00:59:04,000 --> 00:59:07,880
wanted to have like a 20 minute call sometime, maybe just try and go through some of this

807
00:59:07,880 --> 00:59:08,880
together would be cool.

808
00:59:08,880 --> 00:59:09,880
Absolutely.

809
00:59:09,880 --> 00:59:17,040
And I also put the data in the chat because I think Charles was right.

810
00:59:17,040 --> 00:59:27,880
Like if you want a data set to really, you know, what's the word, crash test your program,

811
00:59:27,880 --> 00:59:32,000
that could be a good data set for you.

812
00:59:32,000 --> 00:59:38,000
If I want to reach out to you, is this, this like contact cannabis email, is that good

813
00:59:38,000 --> 00:59:39,000
for you?

814
00:59:39,000 --> 00:59:44,000
You can actually reach me at keegan at cannlytics.com.

815
00:59:44,000 --> 00:59:47,000
Cool, cool, cool, cool.

816
00:59:47,000 --> 00:59:49,000
I'll definitely do that.

817
00:59:49,000 --> 00:59:50,000
All right.

818
00:59:50,000 --> 00:59:51,000
Great.

819
00:59:51,000 --> 00:59:54,000
Well, everyone, thank you for joining today.

820
00:59:54,000 --> 00:59:59,000
And we've got a big week ahead of data wrangling.

821
00:59:59,000 --> 01:00:06,000
So I'm excited, Paul's got a grin, Charles got a grin, so Jake, I'll look forward to

822
01:00:06,000 --> 01:00:09,000
talking with you and Charles as well.

823
01:00:09,000 --> 01:00:10,000
All right.

824
01:00:10,000 --> 01:00:11,000
Good talking guys.

825
01:00:11,000 --> 01:00:12,000
Nice to meet you, Jake.

826
01:00:12,000 --> 01:00:13,000
You as well.

827
01:00:13,000 --> 01:00:14,000
How are we going, guys?

828
01:00:14,000 --> 01:00:15,000
All right, take care.

829
01:00:15,000 --> 01:00:16,000
Have a good week.

830
01:00:16,000 --> 01:00:17,000
You too.

831
01:00:17,000 --> 01:00:18,000
Bye.

832
01:00:18,000 --> 01:00:31,880
Bye.

