1
00:00:00,000 --> 00:00:06,340
I guess what I'll kind of call out here is it's hard to productize generative AI.

2
00:00:06,340 --> 00:00:08,800
It's not a trivial matter, right?

3
00:00:08,800 --> 00:00:15,320
And I think a lot of us, Monte Carlo included, started from the blocking and tackling stuff.

4
00:00:15,320 --> 00:00:20,000
Let's take stuff that generative AI is pretty good at, and maybe code generation is something

5
00:00:20,000 --> 00:00:25,860
that generative AI does fairly well out of the box, and let's plug it into our applications

6
00:00:25,860 --> 00:00:29,260
where we need to perform those tasks, right?

7
00:00:29,260 --> 00:00:35,800
And we'll use the foundation models that OpenAI and others have kindly built for us through

8
00:00:35,800 --> 00:00:42,840
their APIs, and boom, you have generative AI in production.

9
00:00:42,840 --> 00:00:48,160
Hello and welcome to Coffee with Coalesce, a monthly podcast about all things data and

10
00:00:48,160 --> 00:00:51,120
the trends and technology transforming our industry.

11
00:00:51,120 --> 00:00:56,160
I'm Armand Petrosyan, CEO of Coalesce, and here with me is my co-founder and CTO Satish

12
00:00:56,160 --> 00:00:57,160
Jayanthi.

13
00:00:57,160 --> 00:01:01,320
Together, we'll be your hosts for the next hour.

14
00:01:01,320 --> 00:01:05,600
Hello, everybody.

15
00:01:05,600 --> 00:01:12,040
I'm super excited to have a great guest here, Mr. Lior.

16
00:01:12,040 --> 00:01:17,160
I think, Kent, you've been on here numerous times in the past, so maybe a quick introduction

17
00:01:17,160 --> 00:01:24,240
from you, Kent, and then our centerpiece guest here with Lior will let you fill in as well,

18
00:01:24,240 --> 00:01:25,240
and then Satish.

19
00:01:25,240 --> 00:01:27,240
But, Kent, why don't you go first?

20
00:01:27,240 --> 00:01:30,240
Do you want to just give everybody a quick background intro on your end?

21
00:01:30,240 --> 00:01:31,240
Sure.

22
00:01:31,240 --> 00:01:32,240
Thanks.

23
00:01:32,240 --> 00:01:33,720
For the folks who don't know me, I'm Kent Graziano.

24
00:01:33,720 --> 00:01:35,800
I'm known as the Data Warrior.

25
00:01:35,800 --> 00:01:42,840
I have been in the data space for multiple decades, something like 30 or 40 years.

26
00:01:42,840 --> 00:01:44,840
I can't remember anymore.

27
00:01:44,840 --> 00:01:51,400
I had the pleasure of working with Armand and Satish for over a decade through numerous

28
00:01:51,400 --> 00:01:58,920
other companies specialized in data architecture, data warehousing, transformations, Data Vault

29
00:01:58,920 --> 00:01:59,920
in particular.

30
00:01:59,920 --> 00:02:07,160
I was the chief technical evangelist at Snowflake for six years, and then retired and had people

31
00:02:07,160 --> 00:02:11,640
like Armand saying, well, no, you can't retire, dude.

32
00:02:11,640 --> 00:02:17,880
We need you as an advisor, so I spend most of my time these days as a strategic advisor

33
00:02:17,880 --> 00:02:22,920
for basically a lot of folks in the Snowflake ecosystem.

34
00:02:22,920 --> 00:02:29,520
I do have my own podcast called the True Data Ops podcast, and Lior's better

35
00:02:29,520 --> 00:02:31,000
half was on that.

36
00:02:31,000 --> 00:02:32,000
That's awesome.

37
00:02:32,000 --> 00:02:39,720
In the last year, I got to talk to her about Monte Carlo and all things in that range.

38
00:02:39,720 --> 00:02:40,720
That's awesome.

39
00:02:40,720 --> 00:02:41,720
That's awesome.

40
00:02:41,720 --> 00:02:45,520
Hey, Bram, cool, Kent, and you mentioned it, the Better Half.

41
00:02:45,520 --> 00:02:50,960
I oftentimes call Satish my better half as far as the co-founder conversation goes, but

42
00:02:50,960 --> 00:02:53,040
Lior, this is actually your better half.

43
00:02:53,040 --> 00:02:55,640
You're married to Barr, the CEO, the co-founder.

44
00:02:55,640 --> 00:02:57,440
You co-founded the company together.

45
00:02:57,440 --> 00:03:02,360
Can you give a quick background yourself and also just for all the people that are tuning

46
00:03:02,360 --> 00:03:05,400
in right now, I would love to hear a founding story.

47
00:03:05,400 --> 00:03:12,940
Just talk us through how you both decided to start Monte Carlo, start the company.

48
00:03:12,940 --> 00:03:15,800
Anything you'd like to share on your end would be awesome to hear.

49
00:03:15,800 --> 00:03:19,240
For me personally, and I think everybody else, that's on as well.

50
00:03:19,240 --> 00:03:20,240
Yeah, absolutely.

51
00:03:20,240 --> 00:03:21,240
Thanks for having me.

52
00:03:21,240 --> 00:03:27,600
It's so fun to be with such a great group of people in data.

53
00:03:27,600 --> 00:03:34,480
I have Barr as my better half both at home and at work.

54
00:03:34,480 --> 00:03:36,080
It makes things easy.

55
00:03:36,080 --> 00:03:41,080
My background, I'm a software engineer by training.

56
00:03:41,080 --> 00:03:48,160
By trade, I'm probably some mix of software engineer, data engineer, data scientist.

57
00:03:48,160 --> 00:03:53,840
I've become the jack of all trades because I've always worked in startups and was excited

58
00:03:53,840 --> 00:03:56,880
about things that you can do with data.

59
00:03:56,880 --> 00:04:01,480
I started working on machine learning projects.

60
00:04:01,480 --> 00:04:09,840
I'm worried about disclosing how long ago, but the story that started Monte Carlo actually

61
00:04:09,840 --> 00:04:15,600
started with my previous startup, which was in the cybersecurity space.

62
00:04:15,600 --> 00:04:22,320
We basically used data and analytics to help companies manage and control their data, especially

63
00:04:22,320 --> 00:04:23,780
their sensitive data.

64
00:04:23,780 --> 00:04:28,720
We got acquired at some point by a larger cybersecurity firm called Barracuda, which

65
00:04:28,720 --> 00:04:33,320
is where I spent, I think over three years, led the engineering team there.

66
00:04:33,320 --> 00:04:40,280
One of the things that I spent the most time on was actually building machine learning

67
00:04:40,280 --> 00:04:47,840
models so that we can help our customers identify types of fraud that are difficult to identify

68
00:04:47,840 --> 00:04:51,520
with rule-based systems, and the product became extremely successful.

69
00:04:51,520 --> 00:04:58,020
I think it was the fastest growing product that Barracuda had since its inception.

70
00:04:58,020 --> 00:05:02,360
It got quite a bit of adoption and helped, I think, millions of people.

71
00:05:02,360 --> 00:05:09,200
The flip side of that is that when I was thinking about the times that we disappointed our customers,

72
00:05:09,200 --> 00:05:15,720
it was primarily when our data was wrong, when something in our pipelines that feed

73
00:05:15,720 --> 00:05:23,080
our models and or feed the features that are consumed by those models broke.

74
00:05:23,080 --> 00:05:29,280
In a way, that was far more dominant in creating issues and frustrations for our customers

75
00:05:29,280 --> 00:05:36,200
than what my software engineer self would consider traditional downtime, like your code

76
00:05:36,200 --> 00:05:42,080
is broken or your infrastructure isn't running fast enough or those kinds of things.

77
00:05:42,080 --> 00:05:48,940
When you think about this, you realize that on the software engineering side of the house,

78
00:05:48,940 --> 00:05:53,200
this methodology of how to make things reliable has existed for a while.

79
00:05:53,200 --> 00:05:55,880
We generally call it DevOps today.

80
00:05:55,880 --> 00:06:00,800
It's something that people have been practicing for decades, essentially.

81
00:06:00,800 --> 00:06:06,760
There's a very good understanding of how you do this, what is the process, and what are

82
00:06:06,760 --> 00:06:12,440
the tools that you need in order to do that, starting from...

83
00:06:12,440 --> 00:06:17,640
There's hundreds of different tools, but starting from a CI/CD system and all the way up to

84
00:06:17,640 --> 00:06:24,640
observability and monitoring things in production, all of that to a large extent did not exist

85
00:06:24,640 --> 00:06:26,440
on the data side of the house.

86
00:06:26,440 --> 00:06:33,280
All the stuff that we built in terms of data pipelines and machine learning models that

87
00:06:33,280 --> 00:06:37,640
run real time had almost none of that.

88
00:06:37,640 --> 00:06:43,240
Even worse, it wasn't even clear how to do this, like what is the process by which you

89
00:06:43,240 --> 00:06:46,960
create reliable data and reliable models.

90
00:06:46,960 --> 00:06:50,640
That's a little bit where my inspiration for Monte Carlo came from.

91
00:06:50,640 --> 00:06:54,760
Barr, independently, was working more on the analytics side of the world.

92
00:06:54,760 --> 00:07:00,720
She was leading operations in an enterprise SaaS company that basically helped customer

93
00:07:00,720 --> 00:07:08,120
success teams use data to act on churn and upsells and things like that.

94
00:07:08,120 --> 00:07:09,960
Were you two together at the time or no?

95
00:07:09,960 --> 00:07:11,520
Did you meet later after this?

96
00:07:11,520 --> 00:07:12,520
Oh, yeah.

97
00:07:12,520 --> 00:07:13,520
No, we were married.

98
00:07:13,520 --> 00:07:14,520
We've been married for...

99
00:07:14,520 --> 00:07:15,520
Okay.

100
00:07:15,520 --> 00:07:16,520
Yeah.

101
00:07:16,520 --> 00:07:17,520
Way before Monte Carlo existed.

102
00:07:17,520 --> 00:07:18,520
Yeah, yeah.

103
00:07:18,520 --> 00:07:19,520
Got it.

104
00:07:19,520 --> 00:07:20,520
Okay.

105
00:07:20,520 --> 00:07:21,520
So, marriage came first.

106
00:07:21,520 --> 00:07:29,440
And what happened was, Barr left her job and I was helping her after hours, like over the

107
00:07:29,440 --> 00:07:33,600
weekends and nights, because she was thinking about starting a company.

108
00:07:33,600 --> 00:07:39,200
I'll keep her story shorter, but basically she ran into a similar set of challenges in

109
00:07:39,200 --> 00:07:40,480
the analytics world.

110
00:07:40,480 --> 00:07:43,400
So I had unreliable machine learning models.

111
00:07:43,400 --> 00:07:48,400
She had unreliable dashboards that caused customer frustration, et cetera, and it clicked

112
00:07:48,400 --> 00:07:58,160
for us that I was helping her again as a supporting husband.

113
00:07:58,160 --> 00:08:00,280
And Barr kind of thought, oh, it was interesting.

114
00:08:00,280 --> 00:08:05,400
She started researching the space and trying to understand whether we just suck at our

115
00:08:05,400 --> 00:08:10,720
jobs or whether it's something that a lot of people experience.

116
00:08:10,720 --> 00:08:17,080
And she did find out that this is a common problem that everybody that's building data

117
00:08:17,080 --> 00:08:18,480
is experiencing pretty much.

118
00:08:18,480 --> 00:08:23,760
And it was kind of like a cue, oh, it might be interesting to go out and solve this because

119
00:08:23,760 --> 00:08:25,360
this is a very important problem.

120
00:08:25,360 --> 00:08:32,200
And the use of data, the use of machine learning and production was definitely on the rise,

121
00:08:32,200 --> 00:08:33,960
which still holds true today.

122
00:08:33,960 --> 00:08:38,680
And I wasn't actually planning to join forces and work with her, but a mutual friend of

123
00:08:38,680 --> 00:08:43,680
ours that actually was working at Snowflake at the time, still works there today, and

124
00:08:43,680 --> 00:08:49,520
Barr went to consult with him and get feedback about what she was working on.

125
00:08:49,520 --> 00:08:52,720
And he basically said, oh, you know, Lior has the perfect background.

126
00:08:52,720 --> 00:08:56,840
He worked on fraud detection and data analytics, and he's cheap labor.

127
00:08:56,840 --> 00:08:59,840
So why don't we get into it?

128
00:08:59,840 --> 00:09:02,240
Yeah, it doesn't get cheaper than free.

129
00:09:02,240 --> 00:09:04,520
I would imagine you weren't charging her for this consulting.

130
00:09:04,520 --> 00:09:05,520
I see.

131
00:09:05,520 --> 00:09:06,520
There's nothing like a family-run business.

132
00:09:06,520 --> 00:09:07,520
I was not.

133
00:09:07,520 --> 00:09:12,520
The most I ever got was maybe, I don't know, help with the dishes or something like that.

134
00:09:12,520 --> 00:09:13,520
Yeah.

135
00:09:13,520 --> 00:09:20,800
So me, cheap labor, was asked to join the team, and Barr was very clever.

136
00:09:20,800 --> 00:09:25,560
She got me to, she would go out there and talk to presumably customers, like future

137
00:09:25,560 --> 00:09:29,760
customers of this, trying to research the market and understand how people are tackling

138
00:09:29,760 --> 00:09:32,400
this and what the level of pain is and so on.

139
00:09:32,400 --> 00:09:35,400
And she cleverly invited me to join a few of those.

140
00:09:35,400 --> 00:09:41,240
And then, you know, it's pretty evident that if we were able to solve this problem, this

141
00:09:41,240 --> 00:09:42,240
would be very meaningful.

142
00:09:42,240 --> 00:09:45,800
Like this is something that people, you know, lose sleep over.

143
00:09:45,800 --> 00:09:51,280
And this is something that would make for a fantastic company that could have a lot

144
00:09:51,280 --> 00:09:52,280
of impact.

145
00:09:52,280 --> 00:09:55,760
And that was, you know, and those are the type of opportunities that you only get a

146
00:09:55,760 --> 00:09:58,060
handful of times in your career.

147
00:09:58,060 --> 00:10:04,720
Maybe only once. I couldn't pass it up and decided to join her full time, which I did several

148
00:10:04,720 --> 00:10:06,000
months later.

149
00:10:06,000 --> 00:10:10,400
And we started Monte Carlo to basically help companies deal with those sleepless nights

150
00:10:10,400 --> 00:10:13,480
and frustrated stakeholders.

151
00:10:13,480 --> 00:10:20,640
And ended up actually at the end of the day, Monte Carlo takes a lot of the ideas that

152
00:10:20,640 --> 00:10:25,360
we all learned from, you know, Barr working on the analytics side and trying to operationalize

153
00:10:25,360 --> 00:10:33,400
it and me kind of, you know, having that DevOps discipline, we applied a lot of those ideas

154
00:10:33,400 --> 00:10:38,960
and again, both forming the methodology of how to create reliable data products.

155
00:10:38,960 --> 00:10:42,880
And obviously we were also excited to build the technology that supports it.

156
00:10:42,880 --> 00:10:46,360
And at the end of the day, Monte Carlo is an observability tool.

157
00:10:46,360 --> 00:10:50,280
It's probably the equivalent of a Datadog or a New Relic.

158
00:10:50,280 --> 00:10:55,920
And, you know, in the same way they use those to review the, you know, the reliability of

159
00:10:55,920 --> 00:11:01,400
applications, of infrastructure, of security increasingly, you know, Monte Carlo is how

160
00:11:01,400 --> 00:11:03,800
you do that within the data stack, right?

161
00:11:03,800 --> 00:11:12,600
With Snowflake and Looker and a million other tools that data people have adopted.

162
00:11:12,600 --> 00:11:16,000
And that's what we've been doing since and it's been an exciting journey so far.

163
00:11:16,000 --> 00:11:17,000
That's amazing.

164
00:11:17,000 --> 00:11:18,000
Wow.

165
00:11:18,000 --> 00:11:19,000
Yeah.

166
00:11:19,000 --> 00:11:20,560
Sounds like a great journey for sure.

167
00:11:20,560 --> 00:11:21,560
I think.

168
00:11:21,560 --> 00:11:23,440
And we're still married by the way.

169
00:11:23,440 --> 00:11:24,440
Still married.

170
00:11:24,440 --> 00:11:25,440
Yeah.

171
00:11:25,440 --> 00:11:26,440
That's good.

172
00:11:26,440 --> 00:11:27,440
That's good.

173
00:11:27,440 --> 00:11:28,440
That's most important.

174
00:11:28,440 --> 00:11:32,280
Hopefully this is a unifying thing that you've gone through together and you've clearly had

175
00:11:32,280 --> 00:11:35,280
a lot of success and congrats on that so far.

176
00:11:35,280 --> 00:11:39,080
We certainly know how it goes, at least to be co-founders, both Satish and I obviously

177
00:11:39,080 --> 00:11:41,040
co-founders here at Coalesce.

178
00:11:41,040 --> 00:11:43,600
But this sounds like a totally different ballgame.

179
00:11:43,600 --> 00:11:44,600
You're married to the person.

180
00:11:44,600 --> 00:11:45,600
It's amazing.

181
00:11:45,600 --> 00:11:51,200
I got plenty more questions around all that, but that's such a good background and definitely

182
00:11:51,200 --> 00:11:55,000
familiar with the phase of not taking a paycheck, starting the company.

183
00:11:55,000 --> 00:12:00,440
Like both Satish and I did this for no money the first year, year and a half, I think it

184
00:12:00,440 --> 00:12:02,440
was, Satish, when we started Coalesce.

185
00:12:02,440 --> 00:12:05,280
But anyways, I'm CEO of the company. Satish,

186
00:12:05,280 --> 00:12:09,080
I'll let you introduce yourself and then let me jump into a couple of questions I got for

187
00:12:09,080 --> 00:12:11,080
Lior and Kent.

188
00:12:11,080 --> 00:12:12,080
Sure.

189
00:12:12,080 --> 00:12:17,520
Hey guys, Satish Jayanthi, CTO, co-founder of Coalesce.

190
00:12:17,520 --> 00:12:23,400
My background is basically before Armand and I started working together, I was on the other

191
00:12:23,400 --> 00:12:34,600
side actually making those problems that Lior, you were alluding to, which is building pipelines,

192
00:12:34,600 --> 00:12:37,080
but then you have data issues.

193
00:12:37,080 --> 00:12:43,720
But essentially being on the engineering side, managing data teams, solving business

194
00:12:43,720 --> 00:12:45,160
problems for large enterprises.

195
00:12:45,160 --> 00:12:47,640
That was what I was doing.

196
00:12:47,640 --> 00:12:48,640
Cool.

197
00:12:48,640 --> 00:12:53,120
So you certainly experienced some of the issues that Monte Carlo aims to solve.

198
00:12:53,120 --> 00:12:59,640
I'm curious, were there any specific use cases, especially because from my understanding,

199
00:12:59,640 --> 00:13:05,280
Monte Carlo was either the first or one of the first pure play data observability products

200
00:13:05,280 --> 00:13:06,280
in the market, right?

201
00:13:06,280 --> 00:13:11,240
As the modern data stack expanded, you saw all these different solutions appear for specific

202
00:13:11,240 --> 00:13:16,440
issues as data has become democratized and just so much more common.

203
00:13:16,440 --> 00:13:18,640
And so were there specific use cases?

204
00:13:18,640 --> 00:13:22,620
You mentioned fraud detection was one that you were exposed to firsthand.

205
00:13:22,620 --> 00:13:28,720
Was that the initial beachhead use case that you were looking at when you approached starting

206
00:13:28,720 --> 00:13:29,720
the company?

207
00:13:29,720 --> 00:13:31,800
What were the first couple of things where you were like, okay, we definitely need to

208
00:13:31,800 --> 00:13:34,600
solve this right out of the gates?

209
00:13:34,600 --> 00:13:36,120
Yeah, great question.

210
00:13:36,120 --> 00:13:43,160
Putting aside my speculations back then from five years ago, I think probably the biggest

211
00:13:43,160 --> 00:13:49,360
surprise to me starting Monte Carlo and then kind of living through it was that it is not

212
00:13:49,360 --> 00:13:51,600
very use case specific, right?

213
00:13:51,600 --> 00:13:57,040
I thought a lot of our customers were going to be essentially tech companies using data

214
00:13:57,040 --> 00:14:02,360
for fraud detection and other places where data matters.

215
00:14:02,360 --> 00:14:08,240
What we learned though, it's quite incredible, is that every single industry you can think

216
00:14:08,240 --> 00:14:13,120
about is using data today and using data in a meaningful way.

217
00:14:13,120 --> 00:14:19,320
And so our customers ended up being from, and this is from the early days even.

218
00:14:19,320 --> 00:14:23,240
So of course, all the prime suspects are there, right?

219
00:14:23,240 --> 00:14:27,920
Like you'll find tech companies, you'll find e-commerce, financial tech companies, right?

220
00:14:27,920 --> 00:14:29,160
All these are there.

221
00:14:29,160 --> 00:14:37,320
But you will also find a lot of manufacturing and a lot of education.

222
00:14:37,320 --> 00:14:43,120
And pretty much any sector of the economy that you could possibly imagine is using data

223
00:14:43,120 --> 00:14:44,120
in a meaningful way.

224
00:14:44,120 --> 00:14:45,440
Kent, you probably saw that.

225
00:14:45,440 --> 00:14:49,160
Yeah, no, I'm just thinking through all the companies I've worked for over the

226
00:14:49,160 --> 00:14:56,040
years and really the whole, what we now call observability, has been a, like you said, a problem

227
00:14:56,040 --> 00:15:00,080
just in the analytics space, which is where Barr came out of, is there was always those

228
00:15:00,080 --> 00:15:03,960
questions about, can I trust the data in this dashboard or in this chart?

229
00:15:03,960 --> 00:15:09,040
How come I'm getting two different customer counts from these two different managers

230
00:15:09,040 --> 00:15:10,440
out of our data warehouse, right?

231
00:15:10,440 --> 00:15:12,680
It was like, where's that data coming from?

232
00:15:12,680 --> 00:15:17,920
And trying to prove to the CEO, somebody having to go through piles and piles of hand coded

233
00:15:17,920 --> 00:15:23,000
ETL code to go, well, how did we end up there over in this mart?

234
00:15:23,000 --> 00:15:25,360
And we got a different answer over in that mart.

235
00:15:25,360 --> 00:15:28,160
Yeah, it is every industry.

236
00:15:28,160 --> 00:15:33,200
I mean, starting in the mid nineties with data warehousing really starting to boom back

237
00:15:33,200 --> 00:15:38,960
then, I saw that thing and said, this idea of business intelligence slightly.

238
00:15:38,960 --> 00:15:44,080
Okay, at the time, some of us thought it was an oxymoron, granted, can we actually have

239
00:15:44,080 --> 00:15:49,760
intelligence in business, but everybody needed that data and you could see anybody who's

240
00:15:49,760 --> 00:15:56,400
going to be successful regardless of the industry, like you said, they need to use the data effectively,

241
00:15:56,400 --> 00:16:03,920
but you get down to the data governance and the reliability, the auditability and the

242
00:16:03,920 --> 00:16:06,880
overall trust factor of that data.

243
00:16:06,880 --> 00:16:12,200
The more important the data became to a company, the more important all of those things became,

244
00:16:12,200 --> 00:16:13,200
right?

245
00:16:13,200 --> 00:16:15,240
We've got to be able to trust that data.

246
00:16:15,240 --> 00:16:20,680
And now that we're moving into AI and gen AI and your experience in machine learning,

247
00:16:20,680 --> 00:16:21,680
it's even more important.

248
00:16:21,680 --> 00:16:28,760
It's like, how do you trust the results of a black box gen AI thing if you can't trust

249
00:16:28,760 --> 00:16:32,440
the data that went into it?

250
00:16:32,440 --> 00:16:33,960
That makes complete sense.

251
00:16:33,960 --> 00:16:38,880
We talk about this all the time, especially with the black box in the AI world, the foundation

252
00:16:38,880 --> 00:16:41,080
you're feeding it with is so critical.

253
00:16:41,080 --> 00:16:47,680
Real quick, and maybe this is for everybody here, but when we think about data observability,

254
00:16:47,680 --> 00:16:52,600
some of this feels like it is on the fringes of data quality as well, because we talk about

255
00:16:52,600 --> 00:16:54,440
making sure that that quality is high.

256
00:16:54,440 --> 00:17:00,960
I guess, Lior, just for the audience here, how would you decipher or compare the two?

257
00:17:00,960 --> 00:17:05,600
Is it completely a separate thing or do you see observability is related to quality?

258
00:17:05,600 --> 00:17:06,600
What are your thoughts there?

259
00:17:06,600 --> 00:17:11,880
And also, it looks like as people are tuning in here, if you have any questions, feel free

260
00:17:11,880 --> 00:17:15,240
to ask for anybody that's on the webcast right now.

261
00:17:15,240 --> 00:17:18,560
But yeah, can you help decipher that?

262
00:17:18,560 --> 00:17:24,920
Yeah, to me, and there's obviously different ideas about this, but the way I view it is

263
00:17:24,920 --> 00:17:28,000
data observability is an extension of data quality.

264
00:17:28,000 --> 00:17:39,280
I think a lot of data quality, both the concepts and the tooling around it came from this viewpoint

265
00:17:39,280 --> 00:17:45,080
of I'm going to do something once very manually.

266
00:17:45,080 --> 00:17:52,680
I'm going to take data and ingest it, clean it, transform it, and put it in the binder.

267
00:17:52,680 --> 00:17:55,760
That's where the methodology came from.

268
00:17:55,760 --> 00:18:04,880
It's very much oriented towards point of ingestion or very, very specific parts of the pipeline.

269
00:18:04,880 --> 00:18:12,120
It's very focused on exclusively the data and the rows.

270
00:18:12,120 --> 00:18:13,120
That's critical.

271
00:18:13,120 --> 00:18:14,120
That's building blocks.

272
00:18:14,120 --> 00:18:19,240
You can't get a reliable dashboard if the numbers are broken or if the data that was

273
00:18:19,240 --> 00:18:23,720
ingested has values that shouldn't be there.

274
00:18:23,720 --> 00:18:27,760
That's absolutely a critical thing.

275
00:18:27,760 --> 00:18:30,480
It's a big part of data observability, a big part of what we do today.

276
00:18:30,480 --> 00:18:35,600
The thing where data observability took it a step further was, hey, look, we're not putting

277
00:18:35,600 --> 00:18:37,960
data in binders anymore.

278
00:18:37,960 --> 00:18:44,080
We're not doing this pull once a quarter that we analyze and scrutinize and have a person

279
00:18:44,080 --> 00:18:46,200
look at and manually transform.

280
00:18:46,200 --> 00:18:48,400
We're not doing that anymore.

281
00:18:48,400 --> 00:18:54,800
In a modern company, there could be hundreds and thousands of people that are staring at

282
00:18:54,800 --> 00:18:57,200
dashboards every day to do their jobs.

283
00:18:57,200 --> 00:19:02,000
There's models that are making decisions on behalf of the business every single day or

284
00:19:02,000 --> 00:19:04,840
billions of times a day sometimes.

285
00:19:04,840 --> 00:19:06,400
That idea no longer works.

286
00:19:06,400 --> 00:19:09,000
You have to think about how do you scale this thing?

287
00:19:09,000 --> 00:19:16,080
How do you make sure the entire system, all the way from the data that gets ingested

288
00:19:16,080 --> 00:19:21,160
and on through typically dozens and hundreds of steps of transformation, all the way down

289
00:19:21,160 --> 00:19:26,720
to the end product, be it a dashboard or a model or whatnot, how do you make sure this

290
00:19:26,720 --> 00:19:28,880
whole thing works reliably?

291
00:19:28,880 --> 00:19:36,960
One part of it is definitely making sure that values are correct in a sense or meet certain

292
00:19:36,960 --> 00:19:37,960
business rules.

293
00:19:37,960 --> 00:19:43,560
But you really have to start thinking about how every single step of the way of this long

294
00:19:43,560 --> 00:19:45,560
pipeline, how reliable is it?

295
00:19:45,560 --> 00:19:46,560
How healthy is it?

296
00:19:46,560 --> 00:19:49,800
And you have to think about it in several dimensions.

297
00:19:49,800 --> 00:19:51,200
These systems are pretty complicated.

298
00:19:51,200 --> 00:19:56,200
They have both the data that's flowing in, like there's this external input that you're

299
00:19:56,200 --> 00:20:02,280
taking from either another team in your own company or sometimes from an external source

300
00:20:02,280 --> 00:20:05,780
that can change unexpectedly in ways that you don't anticipate.

301
00:20:05,780 --> 00:20:11,400
You have the code that you're using to transform all that data.

302
00:20:11,400 --> 00:20:17,520
You're usually, again, applying at least several dozens of steps of calculation.

303
00:20:17,520 --> 00:20:18,680
And that can change.

304
00:20:18,680 --> 00:20:22,320
You're hiring people to build those pipelines, to make those pipelines better.

305
00:20:22,320 --> 00:20:23,320
They're going to change the code.

306
00:20:23,320 --> 00:20:27,560
The code is going to have unintended consequences, like, it just happens.

307
00:20:27,560 --> 00:20:30,240
And then the third piece, of course, is infrastructure.

308
00:20:30,240 --> 00:20:33,640
All this thing is running in a variety of tools.

309
00:20:33,640 --> 00:20:42,960
And all these tools work and combine together in sometimes mysterious ways to create the

310
00:20:42,960 --> 00:20:43,960
end product.

311
00:20:43,960 --> 00:20:49,840
And you have to understand how reliably and how healthy all these things are working and

312
00:20:49,840 --> 00:20:52,760
how they're combining to create the final result.

313
00:20:52,760 --> 00:20:58,080
And that's probably the biggest philosophical difference in terms of observability, like

314
00:20:58,080 --> 00:21:01,080
for data observability and where it extends data quality.

315
00:21:01,080 --> 00:21:06,520
In practice, this means that a data observability solution will give you tools that allow you

316
00:21:06,520 --> 00:21:12,760
to look at all of the tables that you have, not just at the point of ingestion or consumption,

317
00:21:12,760 --> 00:21:15,600
but all of the different steps of the pipeline.

318
00:21:15,600 --> 00:21:18,600
And it will try to measure health at every single step.

319
00:21:18,600 --> 00:21:21,560
And it will try to give you meaningful alerts.

320
00:21:21,560 --> 00:21:25,720
And it will, even more important, it will give you meaningful context about those alerts.

321
00:21:25,720 --> 00:21:28,120
Like, OK, there's a problem here.

322
00:21:28,120 --> 00:21:31,200
The data is wrong one way or another.

323
00:21:31,200 --> 00:21:33,000
Where is this data coming from?

324
00:21:33,000 --> 00:21:34,560
What happened there?

325
00:21:34,560 --> 00:21:36,360
Did someone change the code there?

326
00:21:36,360 --> 00:21:42,360
Did someone or the data that you ingest change in some way that you didn't expect?

327
00:21:42,360 --> 00:21:48,000
Did the infrastructure that was running all of this have a certain issue, performance

328
00:21:48,000 --> 00:21:50,360
or errors or otherwise?

329
00:21:50,360 --> 00:21:55,320
All these things are combined together into the single pane of glass that gives you visibility

330
00:21:55,320 --> 00:22:00,040
into data quality and other things that are important for the...

331
00:22:00,040 --> 00:22:05,360
So this is really, I'll say, automating the overall monitoring of what's happening in

332
00:22:05,360 --> 00:22:06,360
the data ecosystem, right?

333
00:22:06,360 --> 00:22:10,120
There's just no way to scale without doing the automation.

334
00:22:10,120 --> 00:22:12,080
Right, right, right, right.

335
00:22:12,080 --> 00:22:18,120
And that's exactly where data quality, quote unquote, struggled in the past, right?

336
00:22:18,120 --> 00:22:22,840
Yeah, running one SQL script occasionally, like before you move something to production,

337
00:22:22,840 --> 00:22:27,960
OK, so the code works today on the data we're looking at today.

338
00:22:27,960 --> 00:22:33,560
But like you said, something changes in a source system or a rule changes, somebody builds

339
00:22:33,560 --> 00:22:38,400
this pipeline a little different, and now you've got data flowing into tables that,

340
00:22:38,400 --> 00:22:40,080
is it really still right?

341
00:22:40,080 --> 00:22:42,600
Uh-huh, yeah, yeah, yeah.

342
00:22:42,600 --> 00:22:43,920
And these things break, right?

343
00:22:43,920 --> 00:22:45,920
Like they do.

344
00:22:45,920 --> 00:22:49,800
It's the nature of complex systems, right?

345
00:22:49,800 --> 00:22:51,440
Yeah, yeah, that's helpful.

346
00:22:51,440 --> 00:22:52,560
I love that.

347
00:22:52,560 --> 00:22:55,960
And I like the way you, at a high level, compared it to something like Datadog as

348
00:22:55,960 --> 00:23:01,080
far as software engineering goes, but appropriating that to the data pipelines, people go through

349
00:23:01,080 --> 00:23:03,600
building them and managing them.

350
00:23:03,600 --> 00:23:06,400
Any questions from you that you're curious about?

351
00:23:06,400 --> 00:23:13,720
Well, you know, it's the age of AI here, and since Lior has got the background in machine

352
00:23:13,720 --> 00:23:17,640
learning and all that, I was really kind of curious as to what are you seeing with your

353
00:23:17,640 --> 00:23:19,520
customer base today?

354
00:23:19,520 --> 00:23:23,080
Are people really getting into gen AI and machine learning?

355
00:23:23,080 --> 00:23:25,400
Are they just dipping their toes in?

356
00:23:25,400 --> 00:23:31,040
And how many are getting past this sort of, well, let's try it out with ChatGPT sort

357
00:23:31,040 --> 00:23:36,160
of thing and experiment and really looking at putting stuff into production and using

358
00:23:36,160 --> 00:23:41,400
your product as a way of monitoring those pipelines to make sure that everything's good.

359
00:23:41,400 --> 00:23:43,520
Yeah, great question.

360
00:23:43,520 --> 00:23:47,640
It's been really hard not to hear about generative AI in the last, I've been trying sometimes

361
00:23:47,640 --> 00:23:50,840
to not hear about it, you still do, right?

362
00:23:50,840 --> 00:23:57,440
And so I think, kind of like you called out, I want to say that, you know, 80 or 90%

363
00:23:57,440 --> 00:24:02,760
of teams that you talk to have plans around it.

364
00:24:02,760 --> 00:24:10,320
And have taken at least some steps to experiment with it, to understand it and to do things

365
00:24:10,320 --> 00:24:11,320
with it.

366
00:24:11,320 --> 00:24:18,040
But you also call out there's a pretty broad range of maturities around that, where some

367
00:24:18,040 --> 00:24:25,200
teams have gotten all the way up to, you know, a customer facing production app that leverages

368
00:24:25,200 --> 00:24:31,320
gen AI, and a lot of companies are still in the phases of figuring out what to even do with

369
00:24:31,320 --> 00:24:32,680
this and how.

370
00:24:32,680 --> 00:24:34,160
And we see it across the board.

371
00:24:34,160 --> 00:24:40,560
And I guess what I'll kind of call out here is it's hard to productize generative AI.

372
00:24:40,560 --> 00:24:43,040
It's not a trivial matter, right?

373
00:24:43,040 --> 00:24:48,400
And I think a lot of us, Monte Carlo included, started from the block and tackle stuff.

374
00:24:48,400 --> 00:24:52,640
Let's take stuff that generative AI is pretty good at, you know, and maybe code generation

375
00:24:52,640 --> 00:24:57,720
is something that gen AI does fairly well out of the box.

376
00:24:57,720 --> 00:25:04,160
And let's plug it into our applications where we need to perform those tasks, right?

377
00:25:04,160 --> 00:25:10,680
And we'll use the foundation models that OpenAI and others have kindly built for us through

378
00:25:10,680 --> 00:25:12,600
their APIs.

379
00:25:12,600 --> 00:25:15,200
And boom, you know, you have generative AI in production.

380
00:25:15,200 --> 00:25:19,720
And that is probably the most common, you know, success we've seen.

381
00:25:19,720 --> 00:25:24,040
And there's a good number of companies that have been able to do that.

382
00:25:24,040 --> 00:25:25,800
Monte Carlo is one of them, by the way.

383
00:25:25,800 --> 00:25:29,360
Like we do use generative AI in our product in a number of use cases.

384
00:25:29,360 --> 00:25:32,160
And it's gotten good adoption and good feedback.

385
00:25:32,160 --> 00:25:34,720
And can you talk about that a little bit?

386
00:25:34,720 --> 00:25:38,480
So like, as you mentioned, it's hard to, it's difficult to productize.

387
00:25:38,480 --> 00:25:44,880
Like when you think about Monte Carlo as a product, leveraging gen AI to impact your

388
00:25:44,880 --> 00:25:49,040
customers in some different ways, like what are some of the use cases that you saw were

389
00:25:49,040 --> 00:25:56,760
low hanging fruit or opportunities to leverage LLMs when it comes to your value proposition?

390
00:25:56,760 --> 00:26:03,120
So in our world, and I'm also happy to share examples outside of data observability, but

391
00:26:03,120 --> 00:26:06,560
in our world, code generation is probably big, right?

392
00:26:06,560 --> 00:26:11,600
Like we do help our customers deal with code in various ways, right?

393
00:26:11,600 --> 00:26:18,120
Especially SQL most commonly, but you know, we do help our customers process logs from

394
00:26:18,120 --> 00:26:19,960
their data warehouse, for example.

395
00:26:19,960 --> 00:26:20,960
Makes sense.

396
00:26:20,960 --> 00:26:30,640
Our customers do use logs to, sorry, use SQL queries to basically define quality rules

397
00:26:30,640 --> 00:26:34,000
about their, you know, the data that they have.
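
A quality rule written as a SQL query can be sketched as below. This is only an illustration, not Monte Carlo's implementation: the `orders` table, the rule name, and the `run_quality_rule` helper are all hypothetical, and an in-memory SQLite database stands in for a real warehouse. The convention assumed here is that the query selects the rows that violate the rule, so an empty result means the rule passes.

```python
import sqlite3


def run_quality_rule(conn, name, violation_sql):
    """Run a rule expressed as a SQL query that returns the violating rows."""
    violations = conn.execute(violation_sql).fetchall()
    return {
        "rule": name,
        "passed": len(violations) == 0,
        "violations": len(violations),
    }


# Hypothetical sample data standing in for a warehouse table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(1, 10.0), (2, -5.0), (3, None)],
)

# Rule: every order must have a non-null, non-negative amount.
result = run_quality_rule(
    conn,
    "amount_is_non_negative",
    "SELECT id FROM orders WHERE amount IS NULL OR amount < 0",
)
print(result)
```

Expressing the rule as a violation query keeps the check declarative, so the same helper can run any number of rules and report how many rows failed each one.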

398
00:26:34,000 --> 00:26:39,560
And so we've found different ways to help them create that code, debug it, optimize

399
00:26:39,560 --> 00:26:40,760
it, things like that.

400
00:26:40,760 --> 00:26:46,240
And so all of that is built into Monte Carlo, and those have been some of

401
00:26:46,240 --> 00:26:49,080
the first, you know, first implementations.

402
00:26:49,080 --> 00:26:53,080
Yeah, that worked pretty well.

403
00:26:53,080 --> 00:26:58,080
We've seen similar patterns with our customers specifically around usually code generation

404
00:26:58,080 --> 00:27:02,360
and summarization is also something that works pretty well.

405
00:27:02,360 --> 00:27:09,400
So how accurate is it usually? Like, we've gone through internal POCs at Coalesce as

406
00:27:09,400 --> 00:27:10,400
well.

407
00:27:10,400 --> 00:27:14,200
And like, for example, like creating a join or something using GenAI for doing something

408
00:27:14,200 --> 00:27:15,200
like that.

409
00:27:15,200 --> 00:27:16,800
What we found is it gets you pretty close.

410
00:27:16,800 --> 00:27:18,200
It's not a hundred percent accurate.

411
00:27:18,200 --> 00:27:21,520
Is that similar to the experience you've had as well?

412
00:27:21,520 --> 00:27:22,520
Yeah, absolutely.

413
00:27:22,520 --> 00:27:23,520
It's a copilot.

414
00:27:23,520 --> 00:27:24,520
Right, right.

415
00:27:24,520 --> 00:27:25,520
Isn't that what Snowflake called theirs?

416
00:27:25,520 --> 00:27:26,520
Isn't that the Snowflake copilot?

417
00:27:26,520 --> 00:27:27,520
Yeah, there's like Cortex.

418
00:27:27,520 --> 00:27:28,520
So we'll be doing a demo with Doug and Snowflake later this month.

419
00:27:28,520 --> 00:27:29,520
That's for like a different theme.

420
00:27:29,520 --> 00:27:30,520
We've got demos with Doug.

421
00:27:30,520 --> 00:27:31,520
Super fun.

422
00:27:31,520 --> 00:27:32,520
If you haven't tuned into one of those, definitely check it out.

423
00:27:32,520 --> 00:27:33,520
Doug's amazing.

424
00:27:33,520 --> 00:27:34,520
But Snowflake is going to be coming on to the next one.

425
00:27:34,520 --> 00:27:35,520
So if you haven't, you can check it out.

426
00:27:35,520 --> 00:27:36,520
It's a great one.

427
00:27:36,520 --> 00:27:50,460
They're coming on to talk through Cortex, which is

428
00:27:50,460 --> 00:27:55,940
a lot of their gen AI copilot functionality, but similar themes.

429
00:27:55,940 --> 00:28:00,780
So it sounds like for you, Lior, that has been like the first use case that

430
00:28:00,780 --> 00:28:04,300
you saw as an opportunity, and that's helped customers

431
00:28:04,300 --> 00:28:08,940
at least cut down on some of the time that they would have maybe brings down the skill

432
00:28:08,940 --> 00:28:15,660
set required a bit for anybody. It saves a ton of time, right? Even for an experienced

433
00:28:15,660 --> 00:28:21,900
engineer, writing SQL is tedious, right? I think where we saw, though, the most, you know, the

434
00:28:21,900 --> 00:28:27,900
biggest jump in functionality is when we started... So one thing is getting the user experience right,

435
00:28:27,900 --> 00:28:35,660
and the right user experience for generative AI is, not surprisingly, an interactive experience,

436
00:28:35,660 --> 00:28:44,700
right? If you just try to throw answers at people, they will get limited value because of what

437
00:28:44,700 --> 00:28:50,300
you mentioned: like, how close is it? Does it need refining and tuning? Does it need

438
00:28:50,300 --> 00:28:56,060
more context? I think the other part that really made this so much better, where our customers started

439
00:28:56,060 --> 00:29:02,140
getting a lot more success with it, is when we started incorporating

440
00:29:03,660 --> 00:29:09,100
proprietary information into the process, right? And you can think about it as a very,

441
00:29:09,900 --> 00:29:14,860
in our case, simple version of RAG, of retrieval-augmented generation, right?

442
00:29:14,860 --> 00:29:20,060
Then when we started augmenting the information that the user provides about what they want to do

443
00:29:20,060 --> 00:29:27,420
with information that we already have about, in this particular case, the user's data

444
00:29:27,420 --> 00:29:33,580
ecosystem, starting from simple things like what tables do they even have and what columns do those

445
00:29:33,580 --> 00:29:38,620
tables have and what do we know about them, right, and then you can get more advanced, the results are

446
00:29:38,620 --> 00:29:46,860
so much better and more personalized, in a sense, and it also creates a more differentiated

447
00:29:46,860 --> 00:29:52,140
experience, if you will, compared to going to ChatGPT, right? Because if we just use

448
00:29:53,100 --> 00:29:58,620
the APIs as they are, I mean, it maybe saves you a couple seconds of going to another tab, but

449
00:29:58,620 --> 00:30:05,100
we wouldn't be offering anything that is really better than going to chatgpt.com or whatever the

450
00:30:05,100 --> 00:30:13,900
URL is, I forget. And so where it really became nice for our customers is when we started incorporating

451
00:30:13,900 --> 00:30:19,020
our proprietary information, not proprietary, but information that we have about the customer,

452
00:30:19,020 --> 00:30:23,900
into that experience, into the model. I was gonna say, Satish talks about this pretty often too, like

453
00:30:23,900 --> 00:30:29,420
it feels like it's a race to be able to train the model itself and like that's really where the value

454
00:30:29,420 --> 00:30:39,020
is versus just some like public API. It's about the metadata, right? Or RAG, right? And personally I

455
00:30:39,020 --> 00:30:46,380
think RAG is the easier thing to do, in a sense, but yeah, I think the point is correct.
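
The simple form of RAG described above, where the user's request is augmented with what is already known about their tables and columns, might look roughly like this. It is a sketch under stated assumptions: the `SCHEMA` catalog, the keyword-overlap scoring, and the `retrieve_tables` and `build_prompt` helpers are all made up for illustration, and the call to a foundation model is left out entirely. Real systems would more likely retrieve with embeddings than with word overlap.

```python
# Hypothetical metadata catalog: table name -> column names.
SCHEMA = {
    "orders": ["id", "customer_id", "amount", "created_at"],
    "customers": ["id", "name", "region"],
    "web_events": ["event_id", "url", "ts"],
}


def retrieve_tables(request, schema, top_k=2):
    """Keep the tables whose name or columns overlap most with the request."""
    words = set(request.lower().replace(",", " ").split())

    def score(item):
        table, columns = item
        tokens = {table.lower(), *(c.lower() for c in columns)}
        return len(tokens & words)

    ranked = sorted(schema.items(), key=score, reverse=True)
    return [item for item in ranked if score(item) > 0][:top_k]


def build_prompt(request, schema):
    """Augment the user's request with the retrieved schema context."""
    context = "\n".join(
        f"table {table}({', '.join(columns)})"
        for table, columns in retrieve_tables(request, schema)
    )
    return f"Known tables:\n{context}\n\nWrite a SQL query for: {request}"


prompt = build_prompt("total amount per region for customers", SCHEMA)
print(prompt)
```

The point of the design is that the model only sees the slice of metadata relevant to the request, which is what makes the generated SQL specific to this user's data ecosystem rather than generic.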

456
00:30:46,380 --> 00:30:52,540
Satish, I'd love to hear your thoughts as well, but like, the secret sauce

457
00:30:52,540 --> 00:30:58,220
here is the data that you have however you choose to incorporate it into the model into the application

458
00:30:58,220 --> 00:31:04,380
RAG or fine tuning or, you know, train your own foundation model if you really want to. It's hard,

459
00:31:04,380 --> 00:31:09,580
but you kind of have to do that to make generative AI effective. Otherwise you're nothing

460
00:31:09,580 --> 00:31:17,020
but a wrapper for GPT, right? Satish, any thoughts on that? I know you obviously

461
00:31:17,020 --> 00:31:21,340
built some of the world's largest most complex data warehouses you've probably seen these problems

462
00:31:21,340 --> 00:31:28,540
over and over again as far as it relates to gen AI, data observability, training models.

463
00:31:28,540 --> 00:31:37,500
And I think you know implementing the augmentation piece is probably the easiest of all the solutions

464
00:31:37,500 --> 00:31:44,780
out there to improve and add value as opposed to the generic API that's available so that's for sure

465
00:31:44,780 --> 00:31:49,020
because, while everybody says training and fine tuning, training and fine

466
00:31:49,020 --> 00:31:56,220
tuning is not that easy. So that's that. But as far as the observability goes, I have a question

467
00:31:56,220 --> 00:32:05,740
for you, Lior. So, you know, when we implemented data quality in my past life, you know, just like we

468
00:32:05,740 --> 00:32:11,740
discussed, hey, you write some SQL to test something either at the beginning of the pipeline,

469
00:32:11,740 --> 00:32:16,620
in the middle of the pipeline or at the end of the pipeline and then we say hey here's my first rule

470
00:32:16,620 --> 00:32:21,980
here's my second rule third rule and once you get to a dozen rules then you're kind of getting

471
00:32:21,980 --> 00:32:28,300
to a point where you're losing control of what is happening and you don't have a proper structure so

472
00:32:29,820 --> 00:32:36,060
my question to you is if these companies are getting started let's say on data observability

473
00:32:36,060 --> 00:32:43,340
what would be some of the things that they need to be you know thinking about from the best practices

474
00:32:43,340 --> 00:32:48,540
or how do they start I mean obviously you don't want to boil the ocean but what's the best way

475
00:32:48,540 --> 00:32:54,380
to kind of get started with observability? Great question, and

476
00:32:54,380 --> 00:33:01,340
very relevant to generative AI as well. I think the way to do it is, in my opinion, several things. First

477
00:33:01,340 --> 00:33:09,180
part is leverage automation right like you can absolutely go ahead and write a lot of rules like

478
00:33:09,180 --> 00:33:14,780
you called out that gets really complicated really quickly because it's hard to anticipate all the

479
00:33:14,780 --> 00:33:22,140
things that are going to break, it's hard to manage the configuration and thresholds and whatnot, and it can

480
00:33:22,140 --> 00:33:28,540
create a lot of noise as a result, which would then cause alert fatigue and make the whole

481
00:33:28,540 --> 00:33:33,500
initiative fail. And so the first thing is to leverage automation, right? A lot of the stuff

482
00:33:33,500 --> 00:33:38,700
that we built into Monte Carlo is this ability to basically automatically collect a lot of health

483
00:33:38,700 --> 00:33:46,220
metrics about the data starting at the pipeline level or table level things like you know how

484
00:33:46,220 --> 00:33:53,020
recently the table was updated or you know how many rows it has does it kind of does it make sense

485
00:33:53,020 --> 00:33:57,980
for it to have as many rows as it does today and then going sometimes many levels deeper into the

486
00:33:57,980 --> 00:34:02,460
data itself, you know, again starting from the basic stuff like, you know, how many nulls you might have

487
00:34:02,460 --> 00:34:09,100
in a particular field or how many unique values and then going down into as sophisticated of a

488
00:34:09,100 --> 00:34:13,900
metric as you want to measure the health of your particular you know data set in your particular

489
00:34:13,900 --> 00:34:19,580
business and we build a lot of tools to make that scalable right whether it's the ability to collect

490
00:34:19,580 --> 00:34:25,900
a lot of those metrics you know with a single click or a single line of configuration like allows

491
00:34:25,900 --> 00:34:32,140
you to do a lot of these metrics across a lot of tables at once, whether it's machine learning

492
00:34:32,140 --> 00:34:39,740
models that help set thresholds in a way that may not be perfect or exactly what a human with a lot

493
00:34:39,740 --> 00:34:46,140
of context would have but they're very good sanity checks and they will put a lot of confidence into

494
00:34:46,140 --> 00:34:52,380
the pipeline without having a human go and manually set up a lot of rules. So I think that's

495
00:34:52,380 --> 00:34:59,100
the first piece of it. The second piece of it is rules have a very important place,

496
00:34:59,100 --> 00:35:05,500
right? Our customers build a lot of those, various forms of rules. The trick is to also create

497
00:35:05,500 --> 00:35:13,260
the operational discipline around it, I think, right? Making sure that it's clear, A, what needs to

498
00:35:13,260 --> 00:35:20,780
be monitored right mapping hey you know there are these critical products data products that

499
00:35:20,780 --> 00:35:26,220
I'm trying to make reliable, right? Whether it's the dashboard that the CEO uses or, you know, the

500
00:35:26,220 --> 00:35:32,300
table that feeds into my latest and greatest generative AI application but I need to understand

501
00:35:32,300 --> 00:35:39,980
what it is that needs to be reliable, what the SLAs are. I understand what feeds into that,

502
00:35:39,980 --> 00:35:47,180
what are the breaking points I understand how to monitor for those issues and make sure these

503
00:35:47,180 --> 00:35:54,380
things go to the right people right because I could send all of my alerts to one single channel and

504
00:35:54,380 --> 00:35:59,500
hope that something happens. What happens is that everybody ignores that channel, right? What we need to

505
00:35:59,500 --> 00:36:05,020
do is make sure it goes to the right person at the right time for them to act on it and so you need

506
00:36:05,020 --> 00:36:13,260
you know, both the organizational discipline and the tooling, again, to make sure all this is

507
00:36:13,260 --> 00:36:18,620
possible and attack it in kind of a methodical and measurable way right that's the other part

508
00:36:18,620 --> 00:36:22,300
If you're a small team it may not be a big issue, but like some of our customers, you know, have

509
00:36:22,300 --> 00:36:27,420
thousands of people building data things, right? Literally thousands. And it gets out of hand

510
00:36:28,140 --> 00:36:33,340
very quickly, like way before you get to thousands of developers. And so you need to start measuring

511
00:36:33,340 --> 00:36:43,100
like well how reliable are different you know data products and how rigorous are different teams and

512
00:36:43,100 --> 00:36:48,940
in managing reliability and doing the operational stuff, and you need the visibility

513
00:36:48,940 --> 00:36:56,300
into that in order to drive accountability and eventually trust, right? So those are probably

514
00:36:56,300 --> 00:37:02,460
like the three key elements, I'd say, in terms of actually rolling it out in a sizable

515
00:37:02,460 --> 00:37:08,620
company so you got to have the right people and processes agreed upon in addition to having the

516
00:37:08,620 --> 00:37:15,340
technology, right? Otherwise it becomes chaos. It always comes back to this, doesn't it?

517
00:37:15,340 --> 00:37:21,100
I know people you know because you like you said you got to have agreement otherwise otherwise

518
00:37:21,100 --> 00:37:26,140
it's you know people are not going to be like you said again overloaded by the alerts well if they're

519
00:37:26,140 --> 00:37:32,940
getting overloaded by the alerts, then we picked either the wrong people or the wrong process or both

520
00:37:32,940 --> 00:37:39,660
and how we're going to take advantage of all of this information right it's like the technology

521
00:37:39,660 --> 00:37:43,580
you guys have done that right you got the technology now it's like how do we deploy

522
00:37:43,580 --> 00:37:49,100
it properly, and then we get to talking about the culture, the data culture, right? And data

523
00:37:49,100 --> 00:37:53,900
literacy, you know, are we getting it to the right people that understand what that alert even

524
00:37:53,900 --> 00:38:02,220
means right yeah yeah hey there's a quick question in the comment section James Daly here kind of

525
00:38:02,220 --> 00:38:05,820
ties into where we're going to close it out we've got a few minutes left so I just want to make sure

526
00:38:05,820 --> 00:38:11,100
we got this question asked and I remember you James we worked together in a past life so love

527
00:38:11,100 --> 00:38:15,900
seeing you on here. But the question is: the merging of proprietary information to augment

528
00:38:15,900 --> 00:38:21,340
AI can be a tricky topic for some organizations. Customers need safeguards that their proprietary

529
00:38:21,340 --> 00:38:25,900
information is not used to train models shared with the general public. Interested to hear how

530
00:38:25,900 --> 00:38:31,020
Monte Carlo manages that fear any thoughts there and like do we see this as a theme coming up in

531
00:38:31,020 --> 00:38:39,500
2024 for companies as they express that towards vendors like ourselves? Yeah, such a great

532
00:38:39,500 --> 00:38:45,740
question first of all I'll admit like Monte Carlo doesn't necessarily manage this directly and so

533
00:38:45,740 --> 00:38:52,620
we're usually not, but I do have a point of view on that, and I think as you call out,

534
00:38:52,620 --> 00:39:00,300
James, to adopt generative AI with proprietary data in an enterprise, there's a bunch of requirements,

535
00:39:00,300 --> 00:39:05,100
right, that go beyond the, like, oh, a demo that I'm going to post on X or Twitter or whatever it's

536
00:39:05,100 --> 00:39:09,820
called. You need to start thinking about, like, what data goes where, right? Which is what

537
00:39:09,820 --> 00:39:15,900
you're pointing out around privacy, compliance, data security, things like that. You

538
00:39:16,380 --> 00:39:22,540
need to think about these things, and secondly, you need to think about scale, right? It isn't going to,

539
00:39:22,540 --> 00:39:27,180
you know it's one thing for me to build a demo out that works on my computer but like how do I

540
00:39:27,180 --> 00:39:32,780
do that in a way that serves my customers which could be many and they might not be talking to me

541
00:39:32,780 --> 00:39:36,620
or in a tightly controlled environment and you need to think about trust right like how do you

542
00:39:36,620 --> 00:39:41,900
make this thing, you know, produce the right results, and that ties into data quality, data

543
00:39:41,900 --> 00:39:46,780
observability and some things we discussed today to answer your question specifically about security

544
00:39:46,780 --> 00:39:54,940
but I'd argue the same argument would apply to the other two, this is a place where

545
00:39:54,940 --> 00:40:02,620
this partially makes me say that RAG is probably the architecture of choice it is not only

546
00:40:02,620 --> 00:40:09,660
easier to implement from a technical standpoint but it also very naturally lends itself to all of

547
00:40:09,660 --> 00:40:17,180
these questions, right? If you go ahead and train your own model, like, I'm not aware of

548
00:40:17,180 --> 00:40:23,260
any way you can control who gets what data, right? Like, you can block the model from

549
00:40:23,260 --> 00:40:28,380
talking to certain people, but whoever gets access to the model

550
00:40:28,380 --> 00:40:34,060
gets access to all the data that was fed into it essentially plus some hallucinations but that's

551
00:40:34,060 --> 00:40:39,740
a different story in a RAG model you can actually control that much more tightly right like we've

552
00:40:39,740 --> 00:40:46,940
solved the issues around data privacy in the database world, right? We've been doing it for

553
00:40:46,940 --> 00:40:52,060
a while, we have a lot of good constructs there. Snowflake has a lot of good controls, you know,

554
00:40:52,060 --> 00:40:58,460
other solutions too. And so if you go down the RAG route, and RAG can actually

555
00:40:58,460 --> 00:41:03,660
solve a lot of the problems that we want to solve with generative AI, you can actually control security

556
00:41:03,660 --> 00:41:11,820
and privacy very well with something that already exists today, right? It's not like I'm talking about

557
00:41:11,820 --> 00:41:17,820
some futuristic capabilities you can actually make sure that the person that is using the app

558
00:41:17,820 --> 00:41:22,940
gets access to exactly the data set that they should be getting access to according to their

559
00:41:22,940 --> 00:41:28,540
role or user ID or you know whatever it is and that is a very effective way to do that if you're

560
00:41:28,540 --> 00:41:34,140
going into the fine tuning and training world, which has some merit obviously in a lot of use

561
00:41:34,140 --> 00:41:38,860
cases then that's a whole other ballgame right that's about you know managing different models

562
00:41:38,860 --> 00:41:43,340
for different people, and that could be really, really hard to do at scale. For example, like,

563
00:41:43,340 --> 00:41:48,300
you know, if you have millions of users, some of our customers do, or tens of millions,

564
00:41:48,300 --> 00:41:52,940
you know, you can't maintain a model, or it's very, very difficult to maintain a model

565
00:41:53,500 --> 00:41:59,340
that is custom trained and custom built for every single customer like you kind of have to do it

566
00:41:59,340 --> 00:42:05,580
with RAG, and you can use the security and privacy controls that have been created around

567
00:42:05,580 --> 00:42:11,740
databases to really accomplish that same objective with generative AI. Yeah, it

568
00:42:11,740 --> 00:42:17,340
basically helps reinforce, yeah, because typically they use a vector database or something

569
00:42:17,340 --> 00:42:23,900
to hold this, right, you know, information, and that has all the database permissions and policies

570
00:42:23,900 --> 00:42:29,900
that you can use. Most people are familiar with that type of security mechanism anyways. Yeah.
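
The access-control argument for RAG can be illustrated with a retrieval step that filters by the caller's role before anything reaches the model. This is a toy sketch, not any vendor's API: the `Document` class, the role names, and the keyword matching are assumptions for illustration, and in practice the filter would be enforced by the database's own permissions and policies rather than in application code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Document:
    text: str
    allowed_roles: frozenset


# Hypothetical document store with per-document role permissions.
DOCS = [
    Document("Q3 revenue by region", frozenset({"finance", "exec"})),
    Document("Pipeline runbook for orders table", frozenset({"data_eng"})),
    Document("Public product FAQ", frozenset({"finance", "exec", "data_eng", "support"})),
]


def retrieve_for_user(query, role, docs):
    """Apply the permission filter first, then match only on what the role may see."""
    visible = [doc for doc in docs if role in doc.allowed_roles]
    terms = set(query.lower().split())
    return [doc.text for doc in visible if terms & set(doc.text.lower().split())]


finance_context = retrieve_for_user("revenue by region", "finance", DOCS)
support_context = retrieve_for_user("revenue by region", "support", DOCS)
print(finance_context, support_context)
```

Because the filter runs before retrieval, a document the role cannot see never enters the prompt, which is the property a custom-trained model cannot offer once the data is baked into its weights.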

571
00:42:29,900 --> 00:42:35,900
Yeah, cool. Well, we're a little bit past time. I could stay on this all day. Lior, it's been awesome

572
00:42:35,900 --> 00:42:41,580
having you as a guest. Kent as well, it's always a pleasure. Thanks everybody for hopping in, and I'm

573
00:42:41,580 --> 00:42:45,820
looking forward to the next one of these, but it's been awesome having you on, both of you.

574
00:42:45,820 --> 00:43:06,460
Lior, Kent, thank you so much, and thanks everybody for the great questions and for jumping on with us.

