1
00:00:00,000 --> 00:00:06,800
Welcome to Making Data Matter, where we have practical conversations about data and leadership

2
00:00:06,800 --> 00:00:11,840
at mission-driven organizations with practical insights into the intersection of nonprofits,

3
00:00:11,840 --> 00:00:16,800
mission strategy, and data. And I'm your host, Sawyer Nyquist.

4
00:00:16,800 --> 00:00:18,320
And I'm your co-host, Troy Dueck.

5
00:00:19,040 --> 00:00:21,440
And today we're joined by guests.

6
00:00:22,960 --> 00:00:23,360
Us.

7
00:00:23,360 --> 00:00:24,480
Us, yeah, nobody.

8
00:00:24,480 --> 00:00:32,480
So this is a Sawyer and Troy solo episode, solo duo episode, no guests today.

9
00:00:33,280 --> 00:00:41,440
But the guest is the experiences that we're doing. So Sawyer, you got started doing another cohort

10
00:00:41,440 --> 00:00:53,200
for data leaders three weeks ago. And I finished up the dataengineer.io boot camp that Zach Wilson

11
00:00:53,200 --> 00:00:57,200
puts on. And I finished that a couple months ago. So we got some great topics that we're going to

12
00:00:57,200 --> 00:01:03,520
cover to sort of compare and contrast those experiences and talk about just that intersection

13
00:01:03,520 --> 00:01:09,440
with our work with nonprofits and mission-driven orgs. So just you're in the thick of your

14
00:01:10,480 --> 00:01:14,560
cohort right now. Tell me how it's been going. Just high level. What do you do there?

15
00:01:15,280 --> 00:01:21,760
Maybe, you know, conceptually or categorically, who's in the group this time? Tell me a little

16
00:01:21,760 --> 00:01:29,600
bit about that. Yeah. So it's called the Technical and Strategic Data Leader. And that's probably a

17
00:01:29,600 --> 00:01:37,040
mouthful, but we call it the TSDL cohort. And it's a group of people who are in leadership roles

18
00:01:37,040 --> 00:01:42,400
over data teams. So we've got, there's seven people in the cohort and there's three mentors

19
00:01:42,400 --> 00:01:47,600
who are kind of leading the cohort. And so there's a head of data, there's a director of data,

20
00:01:47,600 --> 00:01:53,280
BI supervisors, VP of business intelligence. Those types of roles and titles are the people

21
00:01:53,280 --> 00:02:00,800
who are in the cohort. And the goal is to think about the bridging the gap between the strategic

22
00:02:00,800 --> 00:02:07,200
side of being in a leadership function in data and the technical side of being over data, which is

23
00:02:07,840 --> 00:02:12,320
fundamentally also a technical discipline. And how do we get data leaders and how do we equip

24
00:02:12,320 --> 00:02:17,920
data leaders to do a good job of both of those skillsets of business, strategic thinking,

25
00:02:17,920 --> 00:02:23,840
as well as technical and architectural thinking as they lead their team. So all of these people

26
00:02:23,840 --> 00:02:30,720
in the cohort are leaders of a team or leaders of teams of people, leaders of leaders of teams of

27
00:02:30,720 --> 00:02:35,440
people. And we really, it's really a really diverse group, which is fun. We've got people,

28
00:02:36,000 --> 00:02:41,120
we span nine time zones in our group. We've got people from Europe all the way to Pacific coast

29
00:02:41,120 --> 00:02:49,200
of the US. People at small organizations of a couple hundred people, people at a Fortune 50

30
00:02:49,200 --> 00:02:54,960
organization. So just massive in terms of like team sizes and organizational complexities and

31
00:02:54,960 --> 00:03:02,800
politics. And then people, a couple of specific people in the cohort are in government and then

32
00:03:02,800 --> 00:03:08,320
in like nonprofit healthcare and Medicare related services. And so it spans both the for-profit

33
00:03:08,320 --> 00:03:12,720
world as well as the nonprofit world. So I don't know, that's a little bit about like how the

34
00:03:12,720 --> 00:03:17,040
cohort is and who's in it. And it's a different sort of learning experience than we've seen

35
00:03:17,040 --> 00:03:21,120
anywhere else, which is why we thought about, which is why me and the other mentors got together to

36
00:03:21,120 --> 00:03:27,440
put this together, because there's a combination of both content of delivering and talking through

37
00:03:27,440 --> 00:03:32,640
what's happening with data architectures and technologies and strategy, as well as lots of

38
00:03:32,640 --> 00:03:37,600
conversations and community building that we try to invest in. So because it's a small group

39
00:03:37,600 --> 00:03:43,120
and there's only 10 of us, we've got a really kind of intimate conversation space and dialogue

40
00:03:43,120 --> 00:03:48,080
space. People feel safe to really share and engage with their challenges that they're facing.

41
00:03:48,080 --> 00:03:52,960
And then there's also some like core content that they're getting in a synchronous format as well.

42
00:03:52,960 --> 00:03:58,400
So there's what the cohort looks like at a high level. And I'm interested now, like let's flip

43
00:03:58,400 --> 00:04:01,920
this around because we're going to talk about how you learn with data and how you grow your data

44
00:04:01,920 --> 00:04:04,640
career in a couple of different ways. So the cohort is kind of like one model we've talked

45
00:04:04,640 --> 00:04:10,080
about. I'm curious about, and we can go more to that later. Troy, tell me a bit about your experience

46
00:04:10,080 --> 00:04:15,600
with a data engineering bootcamp, because that was quite a bit different than what I was just describing.

47
00:04:15,600 --> 00:04:22,240
Yeah, definitely parallels. You're talking about big organizations represented in the people that

48
00:04:22,240 --> 00:04:29,120
are the students in the cohort. Some are data leaders, but it was more geared towards practitioners.

49
00:04:29,120 --> 00:04:35,920
And so it's those who really want to take their data game, technically speaking, to the next level.

50
00:04:35,920 --> 00:04:42,080
And so this was the fourth version of the bootcamp that Zach Wilson has put together. And,

51
00:04:42,080 --> 00:04:50,640
you know, for those of you who don't know, he has worked at Facebook, at Netflix, and has dealt with

52
00:04:50,640 --> 00:04:56,080
big data. You know, we're talking about maybe hundreds of terabytes of data that were being

53
00:04:56,080 --> 00:05:03,680
moved over an hour or two. So I can't think of a better definition for big data than some of the

54
00:05:03,680 --> 00:05:10,000
stuff that he was dealing with. And so it got very technical. You're coding. So you've got VS Code

55
00:05:10,000 --> 00:05:19,600
open. You're doing stuff with SQL. You're using dbt, learning Kafka, Flink, Apache Spark. I mean,

56
00:05:19,600 --> 00:05:24,320
just really nerdy kind of stuff. And so... Yeah, we should have had a nerd warning at the beginning

57
00:05:24,320 --> 00:05:28,960
of this episode, because it sounds like we're going to be stepping our toe into the nerdy waters.

58
00:05:28,960 --> 00:05:36,000
Well, we might. We might. I'm just naming them off. And I want to get at, okay, so what does

59
00:05:36,000 --> 00:05:42,320
that matter for someone who's dealing with smaller data sets? You know, I'm thinking about my own

60
00:05:42,320 --> 00:05:46,880
experience and some of the biggest data sets I'm dealing with is only a few hundred million rows.

61
00:05:46,880 --> 00:05:54,880
And so, you know, that's not big data yet. That's still, by and large, manageable with certain

62
00:05:55,360 --> 00:06:02,240
small practices that you can do with data. It's not 200 terabytes in an hour or something like

63
00:06:02,240 --> 00:06:07,520
that. So a very different world. Now, a little bit about this group of students is there was

64
00:06:07,520 --> 00:06:15,280
a hundred and forty or so that had originally signed up for this bootcamp. Now, I think it

65
00:06:15,280 --> 00:06:22,240
kind of whittled down to a core group of 70 or 80 that were regularly active. And even that group

66
00:06:22,240 --> 00:06:28,240
of students was split between those who were really thinking of a career move. They either

67
00:06:28,640 --> 00:06:34,560
wanted to really break into data engineering. Maybe they were a data analyst or a BI person.

68
00:06:34,560 --> 00:06:40,880
Maybe they were even a program manager and they just wanted to get into data from more technical

69
00:06:40,880 --> 00:06:47,360
perspective. And so they were really making a huge career move into going further upstream and doing

70
00:06:47,360 --> 00:06:54,320
the real data engineering, setting some pipelines. You've got your DAGs to, you know, in Airflow to

71
00:06:54,320 --> 00:06:59,600
orchestrate your jobs. And so it was all the nuts and bolts they hadn't played with in their former

72
00:06:59,600 --> 00:07:05,520
experience. That was one group of students. Then you had a second group of students that were the

73
00:07:05,520 --> 00:07:10,480
current data engineers who wanted to fill in their knowledge gaps. They wanted to learn some

74
00:07:10,480 --> 00:07:18,160
new SQL patterns. They wanted to just sort of see end-to-end solutions and think at a higher level

75
00:07:18,160 --> 00:07:23,840
of how can I do data engineering with best practices and industry standards, not just the

76
00:07:23,840 --> 00:07:30,560
duct tape and baling wire ways that I've put things together over time. And so I think the

77
00:07:30,560 --> 00:07:35,200
group that was doing the career moves were maybe a little more invested in the community. And so

78
00:07:35,200 --> 00:07:43,280
that was one of the things that they really promoted throughout the course was here's a

79
00:07:43,280 --> 00:07:48,960
platform that you can use to share ideas with one another, troubleshoot your code together,

80
00:07:48,960 --> 00:07:59,120
work in groups and pairs and learn that way. There was also the utilization of LLM auto grading in

81
00:07:59,120 --> 00:08:05,760
the assignments that we would submit. And so using GitHub Classroom, we would submit our code and

82
00:08:05,760 --> 00:08:12,560
then we'd get the automated grading feedback from the LLM that they had instituted. So really neat to

83
00:08:12,560 --> 00:08:20,480
see a real live use case of some cool tech on my homework. At the same time, it didn't give you as

84
00:08:20,480 --> 00:08:27,760
much of a personalized grading experience. So it's a little bit more the bigger, not as much of a

85
00:08:27,760 --> 00:08:31,600
cohort experience that I think you're talking about where you get to know these people, you get

86
00:08:31,600 --> 00:08:37,520
to have a relationship with these people and talk about very specific use cases that they're facing

87
00:08:37,520 --> 00:08:43,200
in the business. So that's just some of the, I guess, I don't know, culture or experience that

88
00:08:43,200 --> 00:08:48,000
might've been similar and different to what you're talking about in the cohort. So I don't know,

89
00:08:48,000 --> 00:08:52,240
what questions do you have about parallels between that cohort and the bootcamp?

90
00:08:52,240 --> 00:08:58,160
Yeah. Well, I think it's interesting. So for listeners here, we're not really promoting

91
00:08:58,160 --> 00:09:02,400
either of these things. We're really just talking about different ways that Troy and I have

92
00:09:02,400 --> 00:09:09,040
experienced this summer for learning and growing and expanding our skill sets as data people.

93
00:09:09,600 --> 00:09:14,480
And so one, Troy's bootcamp experience, mine's a cohort leadership experience. And so these are

94
00:09:14,480 --> 00:09:19,520
two different ways that we spent our summers with data. The thing that I'm curious about is really

95
00:09:19,520 --> 00:09:25,040
the learning experience. And maybe for you, Troy, as a learner, you've also worked in education in

96
00:09:25,040 --> 00:09:29,280
the past too. So I'm curious to poke at like, and you've also taught classes online. I'm curious

97
00:09:29,280 --> 00:09:34,240
about from a learning perspective, what are the things about that cohort? You had a lot of homework,

98
00:09:34,240 --> 00:09:38,640
you had some live instruction, you had pair programming options, talk about which parts of

99
00:09:38,640 --> 00:09:43,920
learning experience best equipped you to gain new skills. You got exposed to a lot of new technologies

100
00:09:43,920 --> 00:09:48,240
too in this. So you were learning lots of new stuff. What learning experiences were most

101
00:09:48,240 --> 00:09:55,840
influential for you? That's a great question. I'll camp on two particular nuggets that I took away

102
00:09:55,840 --> 00:10:03,680
from the bootcamp. And the first was just filling in gaps on my SQL knowledge. And so seeing patterns

103
00:10:03,680 --> 00:10:10,480
that were repeatable was helpful for me. One of those was the cumulative table design. I'm already

104
00:10:10,480 --> 00:10:15,840
familiar with slowly changing dimensions, the type two, which is the kind of the gold standard,

105
00:10:15,840 --> 00:10:23,680
I guess, of those slowly changing dimensions. And yet the cumulative table design was how can you

106
00:10:23,680 --> 00:10:30,240
not truncate this entire table and reload from the database every single night on a schedule?

107
00:10:30,240 --> 00:10:36,400
How can you incrementally load this table? And what's the SQL pattern that'll enable you to get

108
00:10:36,400 --> 00:10:43,120
there? And you can use that on almost any data set. And so it's a pattern that you can continue

109
00:10:43,120 --> 00:10:48,880
to iterate on. And you don't have to get highly technical using some kind of change data capture

110
00:10:48,880 --> 00:10:54,960
function that the database has available to you or delta files to try to find all the change logs

111
00:10:54,960 --> 00:11:01,680
and do it that way. It was a much simpler way within SQL to still get at the same idea of

112
00:11:01,680 --> 00:11:08,160
incrementally loading this table rather than truncating the entire thing and loading it nightly

113
00:11:08,160 --> 00:11:13,520
kind of thing. And so that was a really helpful pattern to see. Sorry, you're going to ask a

114
00:11:13,520 --> 00:11:17,680
question. That's part of the content that you received in. Were there lectures about that?

115
00:11:17,680 --> 00:11:20,960
Or was that also part of your hands-on homework, was building some cumulative?

116
00:11:20,960 --> 00:11:27,040
Right. It was both. And so most of the lectures were, there was option for live,

117
00:11:27,040 --> 00:11:33,440
and Zach Wilson would usually do an hour lecture with an hour lab. And so it was usually a two-hour,

118
00:11:33,440 --> 00:11:38,720
and it was almost nightly. I think it was four nights a week were basically mandatory for you

119
00:11:38,720 --> 00:11:45,600
to be involved. So it was about eight to 10 hours that you were watching or attending live. And then

120
00:11:45,600 --> 00:11:51,120
there was another eight to 10 hours that you would be investing in homework. So it is a boot camp. I

121
00:11:51,120 --> 00:11:58,000
mean, you are really spending 20 plus hours a week, depending on your skill level, working on this

122
00:11:58,000 --> 00:12:04,000
stuff. And so you'd watch Zach go through some of his examples, and then he would give you similar

123
00:12:04,000 --> 00:12:10,480
examples with some data sets that he made available through his learning platform. And so they're using

124
00:12:10,480 --> 00:12:18,400
a Trino database, and there's a web UI that you can go to to write your queries right there,

125
00:12:18,400 --> 00:12:23,040
test them against the actual data sets. So you're actually coding in this thing. You're actually

126
00:12:23,040 --> 00:12:29,200
running that query, getting the results set back, looking at, did I do it right? And then you submit

127
00:12:29,200 --> 00:12:35,760
your code. Again, you can do it in VS Code, submit it through GitHub, and then the autograder would

128
00:12:35,760 --> 00:12:43,120
come back at you. So that one particular pattern was a lecture that just really stuck with me, is

129
00:12:43,120 --> 00:12:49,200
looking at that SQL pattern to be able to continue to use that. I've already utilized it three or

130
00:12:49,200 --> 00:12:56,080
four times in my work day-to-day experience. So that was a really great nugget of learning.
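
A minimal sketch of the incremental idea described above, not the bootcamp's actual code: each day's snapshot is built from the previous day's snapshot plus only that day's new rows, instead of truncating and reloading everything. The table names (`events`, `users_cumulated`), the columns, and the `create_tables` and `build_cumulated` helpers are assumptions made up for illustration, and SQLite stands in for whatever database is actually in use.

```python
import sqlite3


def create_tables(conn):
    """Create the hypothetical source table and the cumulated snapshot table."""
    conn.execute("CREATE TABLE events (user_id INTEGER, event_date TEXT)")
    conn.execute(
        """
        CREATE TABLE users_cumulated (
            user_id INTEGER,
            first_active TEXT,
            last_active TEXT,
            days_active INTEGER,
            snapshot_date TEXT
        )
        """
    )


def build_cumulated(conn, prev_date, curr_date):
    """Append the snapshot for curr_date using only the prev_date snapshot
    and the events that happened on curr_date."""
    conn.execute(
        """
        INSERT INTO users_cumulated
            (user_id, first_active, last_active, days_active, snapshot_date)
        -- users already known yesterday, carried forward and updated if active today
        SELECT y.user_id,
               y.first_active,
               COALESCE(t.event_date, y.last_active),
               y.days_active + (t.user_id IS NOT NULL),
               :curr
        FROM users_cumulated AS y
        LEFT JOIN (
            SELECT DISTINCT user_id, event_date FROM events WHERE event_date = :curr
        ) AS t ON t.user_id = y.user_id
        WHERE y.snapshot_date = :prev
        UNION ALL
        -- users seen for the first time today
        SELECT t.user_id, t.event_date, t.event_date, 1, :curr
        FROM (
            SELECT DISTINCT user_id, event_date FROM events WHERE event_date = :curr
        ) AS t
        WHERE t.user_id NOT IN (
            SELECT user_id FROM users_cumulated WHERE snapshot_date = :prev
        )
        """,
        {"prev": prev_date, "curr": curr_date},
    )


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_tables(conn)
    conn.executemany(
        "INSERT INTO events VALUES (?, ?)",
        [(1, "2024-01-01"), (2, "2024-01-01"), (1, "2024-01-02"), (3, "2024-01-02")],
    )
    build_cumulated(conn, "2023-12-31", "2024-01-01")
    build_cumulated(conn, "2024-01-01", "2024-01-02")
    for row in conn.execute("SELECT * FROM users_cumulated ORDER BY snapshot_date, user_id"):
        print(row)
```

Run once per day in date order, each load only touches one day of events plus the previous snapshot. In databases that support FULL OUTER JOIN, the two halves of the UNION ALL can usually be written as a single join of yesterday against today.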

131
00:12:56,720 --> 00:13:02,640
The other learning that I enjoyed was the optional learning, which was interviews that Zach would

132
00:13:02,640 --> 00:13:07,520
have with other practitioners. There was even something he had done a few years ago that he

133
00:13:07,520 --> 00:13:12,560
makes available in his learning platform, where he interviewed Bill Inmon. And that was just a

134
00:13:12,560 --> 00:13:18,560
really neat conversation to hear Zach asking one of the fathers of data warehouses, or rather,

135
00:13:18,560 --> 00:13:26,640
data warehousing, how data has changed over the decades and what does the horizon look like.

136
00:13:26,640 --> 00:13:33,520
And so similar types of interviews he tried to have throughout the course, some were past ones that

137
00:13:33,520 --> 00:13:39,040
you could go and look at recorded, or he had some live ones where you could participate asking

138
00:13:39,040 --> 00:13:42,880
questions of these other data leaders and practitioners that are out there, well-known

139
00:13:43,680 --> 00:13:48,000
personalities. So that was part of that learning experience. And so whether it was the more

140
00:13:48,000 --> 00:13:53,440
technical learning the nuggets over here or getting those interviews of what's happening

141
00:13:53,440 --> 00:14:00,960
in the landscape, lots of exposure to technologies. And here's the one takeaway that was good for me,

142
00:14:00,960 --> 00:14:07,360
but it was maybe both a, it was a bittersweet thing that I realized. And it's a lot of these

143
00:14:07,360 --> 00:14:12,960
tools are intended for big tech. And so unless you're going to be doing some really crazy

144
00:14:12,960 --> 00:14:19,360
things with digital analytics, low latency data, these large data sets that are coming

145
00:14:19,360 --> 00:14:24,720
from streaming data of some kind, you probably aren't going to use these tools. You're not going

146
00:14:24,720 --> 00:14:32,960
to have to dig into Flink and Kafka and even Apache Spark as powerful as it is, might be

147
00:14:32,960 --> 00:14:37,760
overkill for the small job you're trying to accomplish. And so it was helpful for me as

148
00:14:37,760 --> 00:14:43,920
someone who's not worked in large data sets, but I've got over a decade of experience working with

149
00:14:43,920 --> 00:14:50,880
data to recognize that just because it's cool, new, shiny, really powerful, doesn't mean that

150
00:14:50,880 --> 00:14:56,320
it's the right tool for the job. And so that was a good, again, bittersweet thing. Like I love

151
00:14:56,640 --> 00:15:02,800
learning new tech. I love learning new skills and recognizing that. So the use cases, I think,

152
00:15:02,800 --> 00:15:09,200
at nonprofit for some of these really cool, new, shiny tools is that they're intended for big tech.

153
00:15:09,200 --> 00:15:16,240
And so unless you're going to be utilizing large data sets, you're going to be doing low latency

154
00:15:16,240 --> 00:15:25,040
data, or you're going to be trying to mash up tons and tons of data, the tool's just too big,

155
00:15:25,040 --> 00:15:30,320
too powerful for what you're trying to do. And the amount of time invested learning and growing in

156
00:15:30,320 --> 00:15:36,480
that may not actually reap a benefit for the organization that you work with. And so it was

157
00:15:36,480 --> 00:15:41,200
just a bittersweet realization that I came to, where it's like, oh, man, as much as I'd love to

158
00:15:41,200 --> 00:15:46,080
keep growing in some of these tools, I don't currently have that use case. And this is after

159
00:15:46,080 --> 00:15:53,840
having a decade of experience working for three relatively large nonprofit organizations and

160
00:15:53,840 --> 00:15:59,440
recognizing that's just not where they're at from a data maturity perspective. There's other things

161
00:15:59,440 --> 00:16:04,800
that they need to solve today. And the SQL patterns became more important and relevant for

162
00:16:04,800 --> 00:16:11,920
that work today. These other tools, they may have that use case down the road as data maturity grows

163
00:16:11,920 --> 00:16:14,320
and gets better at these organizations.

164
00:16:16,560 --> 00:16:23,040
Yeah, that's a really interesting call out to think about the buzz and the hype cycle that happens

165
00:16:23,040 --> 00:16:28,640
around these, or like the big sexy tools that the big tech uses, or that you see that it's

166
00:16:28,640 --> 00:16:33,200
got, or that Facebook or wherever, or those tools were oftentimes built at those places, and now

167
00:16:33,200 --> 00:16:39,920
they become commonplace. But the hype cycle doesn't apply well across all industries or across all

168
00:16:39,920 --> 00:16:44,640
sizes of organizations. And we see this with LLMs as well, like it's super exciting, like let's go

169
00:16:44,640 --> 00:16:50,960
build your custom LLM model. And that doesn't work or really make sense or is practical for the vast

170
00:16:50,960 --> 00:16:57,200
majority of organizations out there. And so I guess you got exposed to a lot of new tools that

171
00:16:57,200 --> 00:17:02,080
were probably really interesting and fun to learn about. Tell me what were like, practically though,

172
00:17:02,800 --> 00:17:09,040
how would you look to apply those frameworks? Is this just like good to know those tools exist

173
00:17:09,040 --> 00:17:13,680
and what's out there? Are there any some takeaways for even like understanding the open source tools,

174
00:17:13,680 --> 00:17:15,680
Spark, Flink, Kafka, et cetera?

175
00:17:17,040 --> 00:17:21,680
Yeah, that's a great question. I think exposure is always a great thing to just know what tools are

176
00:17:21,680 --> 00:17:28,080
out there that you could put in your tool belt and have at your fingertips when needed. And so

177
00:17:28,080 --> 00:17:35,120
it's, to think of these things as either/ors is always pretty dangerous. You end up getting siloed

178
00:17:35,120 --> 00:17:40,480
and that can have its own detriment. At the same time, and it's going to sound like I'm talking

179
00:17:40,480 --> 00:17:45,040
out of both sides of my mouth, there's so many of them. So how do you stay abreast of everything

180
00:17:45,040 --> 00:17:51,520
that's going on and know how to use that tool in that particular use case in a way that's

181
00:17:51,520 --> 00:17:58,320
it's a challenge. And so I think you have to find that sweet spot of I've got enough exposure

182
00:17:58,320 --> 00:18:04,080
to know what's out there, to know when I should look deeper into this and need to use it. But

183
00:18:04,080 --> 00:18:11,440
for the current day to day, more helpful is maybe just the end to end of thinking through processing.

184
00:18:12,080 --> 00:18:18,640
Medallion architecture, I think, has got a lot of hype around it. And it also can be a bit confusing,

185
00:18:18,640 --> 00:18:23,280
like, well, what do you put in the bronze layer? What do you put in silver? What do you put in gold? And

186
00:18:23,280 --> 00:18:29,360
everybody might do something a little bit different there. And so to see a more technical end to end

187
00:18:29,360 --> 00:18:36,000
and not the categorical end to end was helpful. Oh, you first need to go to the source. You need to

188
00:18:36,000 --> 00:18:39,680
set up a pipeline. You need to know how you're going to orchestrate that. And you're going to

189
00:18:39,680 --> 00:18:44,080
need to think through the priority of your jobs. Well, I haven't even hit bronze yet. I'm just

190
00:18:44,080 --> 00:18:50,960
thinking about the nuts and bolts of getting that data extracted before I do something with it.

191
00:18:50,960 --> 00:18:58,640
And so just being able to break down those pieces further into the technical, what's happening here,

192
00:18:58,640 --> 00:19:04,720
here, here, and here with the different tools was helpful to just think from a maybe architectural

193
00:19:04,720 --> 00:19:09,840
standpoint that's a little more detailed than the, well, that's bronze, that's silver, that's gold.

194
00:19:09,840 --> 00:19:15,280
That didn't tell me very much. I got to learn a lot more of that in seeing it with different tools.
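
A small sketch of the "what has to run before what" thinking described above, assuming a made-up set of jobs. The job names and the `run_order` helper are hypothetical, and Python's standard-library `graphlib` stands in for a real orchestrator such as Airflow, whose own API is not shown here.

```python
from graphlib import TopologicalSorter

# Hypothetical jobs; each job maps to the set of jobs that must finish first.
PIPELINE = {
    "extract_crm": set(),
    "extract_donations": set(),
    "land_bronze": {"extract_crm", "extract_donations"},
    "clean_silver": {"land_bronze"},
    "report_gold": {"clean_silver"},
}


def run_order(jobs):
    """Return one valid execution order in which every job follows its dependencies."""
    return list(TopologicalSorter(jobs).static_order())


if __name__ == "__main__":
    for step, job in enumerate(run_order(PIPELINE), start=1):
        print(step, job)
```

The two extract jobs have no dependency on each other, so either may come first. Everything from the bronze landing onward is forced into a single order by the dependencies.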

195
00:19:15,280 --> 00:19:21,520
And open source lets you play. Open source lets you dabble without getting into having to

196
00:19:23,040 --> 00:19:26,960
raise the cost unnecessarily just to learn something new.

197
00:19:27,920 --> 00:19:34,080
Yeah. Yeah. So like the patterns or like the more the framework and the wiring of how it all fits

198
00:19:34,080 --> 00:19:39,360
together is going to be useful even when you swap out different tools, because you're still

199
00:19:39,360 --> 00:19:43,520
thinking through the same sorts of problems about data ingestion. We're getting out of

200
00:19:43,520 --> 00:19:47,280
these source systems. We're landing it into a new place. Like what are the different configurations,

201
00:19:47,280 --> 00:19:50,800
requirements, and patterns that we want on ingestion? Same thing as you move through the

202
00:19:50,800 --> 00:19:55,600
different maybe layers of an architecture. And that looks the same whether you're using

203
00:19:55,600 --> 00:20:00,400
an open source project or whether you're using like a SaaS tool that's like plug and play or

204
00:20:00,400 --> 00:20:05,680
GUI drag-and-drop interface. And you can configure, you still want to think through the same patterns

205
00:20:05,680 --> 00:20:10,480
and the same considerations come into play. Some of that gets easier when you don't have

206
00:20:10,480 --> 00:20:15,920
to configure it all manually in code. But I think that sounds like probably the thinking

207
00:20:15,920 --> 00:20:21,200
through the problems or understanding how it's wired together at a really like root code level

208
00:20:21,200 --> 00:20:27,840
helps you a lot more when you are working at a more abstract level or more user friendly tool set level.

209
00:20:29,360 --> 00:20:34,160
Yeah. Yeah. Now I want to ask you about your cohort. And I'm curious to hear,

210
00:20:34,160 --> 00:20:42,720
is I'm thinking from a practitioner standpoint, I need to know how to, you know, do this technical

211
00:20:42,720 --> 00:20:51,440
thing over here. But what are the problems that nonprofit data leaders or mission driven org data

212
00:20:51,440 --> 00:20:57,200
leaders are trying to solve? And so I'm curious as you're in this leadership cohort, what are the

213
00:20:57,200 --> 00:21:02,400
kinds of problems that you hear are coming to the surface that even as you're looking at all the

214
00:21:02,400 --> 00:21:09,280
different businesses represented, the different levels within the organization, what commonalities

215
00:21:09,280 --> 00:21:14,800
are there when you're hearing about problems that they're trying to solve from a data perspective?

216
00:21:14,800 --> 00:21:18,400
Yeah, I like that. Because this is one thing you said earlier, when you talked about one of the

217
00:21:18,400 --> 00:21:24,400
helpful things for you is pattern recognition or getting reusable patterns. And that was fitting

218
00:21:24,400 --> 00:21:28,560
with something I was thinking about when I was reflecting on the cohort is one of the useful

219
00:21:28,560 --> 00:21:34,320
pieces of gathering in a room with other leaders is patterns and hearing and getting like,

220
00:21:34,880 --> 00:21:39,120
honestly, like empathy of like, I see that too, or I deal with that too. And you start to see

221
00:21:39,120 --> 00:21:44,880
patterns, you start to see similar frameworks, and you're just getting validation for like,

222
00:21:44,880 --> 00:21:50,480
oh, this is how it looks over here. That maps to my experience over here. Being in a leadership role

223
00:21:51,360 --> 00:21:55,920
over a team, usually there's not a lot of data leaders at a company, maybe there's one person

224
00:21:55,920 --> 00:21:59,760
who's responsible for business intelligence, maybe one person is responsible for like data

225
00:21:59,760 --> 00:22:04,400
engineering, maybe not even multiple people, maybe it's just one person. Anyways, there's not a lot

226
00:22:04,400 --> 00:22:14,160
of peers you have internally at a company. And so the space to get into a room with other data

227
00:22:14,160 --> 00:22:22,880
leaders who have similar experiences to you, and to share patterns and to see commonalities is one

228
00:22:22,880 --> 00:22:30,720
of the more powerful parts of this, of getting validation of, yes, that issue I'm encountering,

229
00:22:30,720 --> 00:22:36,720
the cohort that we're talking through a lot is the decisions and the challenges are a little bit

230
00:22:36,720 --> 00:22:45,520
higher level. So it's questions about, so making technology decisions, like, hey, we're trying to

231
00:22:45,520 --> 00:22:51,280
decide, we're moving to the cloud, and we're trying to think about, are we moving to Databricks? Are we

232
00:22:51,280 --> 00:22:59,280
going to Azure, AWS? Are we using Tableau or Power BI? And these are large, big picture questions

233
00:22:59,280 --> 00:23:03,920
around technology choices. That's something that's come up a lot as companies feel like,

234
00:23:03,920 --> 00:23:08,240
as leaders feel like they're in transition between legacy data stacks and moving to new data stacks.

235
00:23:08,800 --> 00:23:13,360
So we talk about technology choices and considerations and how do you even make those

236
00:23:13,360 --> 00:23:20,960
choices. The other common patterns that we talk through is skill sets and staffing of teams.

237
00:23:20,960 --> 00:23:29,280
And so how you weight your team around maybe more technical data engineers or database administrators,

238
00:23:29,280 --> 00:23:34,800
and then more business focused analysts, and then all the different little skill sets that maybe fall

239
00:23:34,800 --> 00:23:41,360
in between there, and how you have a team that's well equipped to bridge the full gap that is data

240
00:23:41,360 --> 00:23:48,960
in a source system to business intelligence for a business user. And so there's lots of different

241
00:23:48,960 --> 00:23:53,440
ways that gets configured based on the size of the org and the size of the team of who's supposed to

242
00:23:53,440 --> 00:24:00,240
do what. And your tools play into that as well. Some tools allow users to build all that maybe

243
00:24:00,240 --> 00:24:04,880
in one place. Some tools require that's multiple roles or different functions handling those

244
00:24:04,880 --> 00:24:11,920
different parts of the data value chain. But that's another common question and the common

245
00:24:11,920 --> 00:24:20,720
consideration we've had. As well as this pattern has surfaced as well, one of the main hats that

246
00:24:20,720 --> 00:24:25,440
a data leader wears is their interaction with the rest of the business organization that they serve

247
00:24:25,440 --> 00:24:30,640
and the stakeholders that they interact with. And how do they interact with those organizations?

248
00:24:30,640 --> 00:24:34,320
And how do they measure the success of their interactions with those other organizations?

249
00:24:35,040 --> 00:24:40,800
And so it's like, hey, we're a data team and we're pumping out reports and we're delivering data

250
00:24:40,800 --> 00:24:47,920
products to finance and to operations and to donor relations and all those different sorts of things

251
00:24:47,920 --> 00:24:53,600
and government affairs. And so there's lots of different business units that they interact with.

252
00:24:54,560 --> 00:24:57,680
And how do they know if they're doing a good job at that? Or how do they measure or gauge

253
00:24:57,680 --> 00:25:02,320
the interaction? Because the goal is they are a trustworthy source of data and they want to be a

254
00:25:02,320 --> 00:25:07,200
partner and they want to be analytical advisors and really help them make strategic decisions.

255
00:25:07,200 --> 00:25:11,360
And how do they not just become ticket takers, just viewed as somebody who pumps out

256
00:25:11,360 --> 00:25:17,040
reports and responds to my data requests. And so there's some good conversations here about how

257
00:25:17,040 --> 00:25:23,760
you show up as a data team and a data leader in relation to those business teams. And so how do

258
00:25:23,760 --> 00:25:29,840
you do things like NPS surveys and different ways of collecting feedback: Are we doing a good job?

259
00:25:30,560 --> 00:25:35,120
What do they think of us? And are we responding well to their needs? Are we meeting their needs

260
00:25:35,120 --> 00:25:41,040
well? So I found those conversations really useful, because every data leader has kind of homegrown

261
00:25:41,040 --> 00:25:46,000
a different way of thinking about that and different way of managing that and trying to assess

262
00:25:46,000 --> 00:25:49,840
their success in relation to other teams. And so sharing the cross learnings there has been also

263
00:25:49,840 --> 00:25:58,560
really useful. This makes me curious about one of the most maybe elusive things that I've seen

264
00:25:58,560 --> 00:26:08,000
within my data career: the importance of data modeling. That was something that we spent two of

265
00:26:08,000 --> 00:26:15,600
six weeks discussing in this bootcamp. So one third of the entire bootcamp is how do you do

266
00:26:15,600 --> 00:26:21,520
data modeling and how do you do data modeling well? Clearly that was an important topic for

267
00:26:21,520 --> 00:26:28,000
data engineers. And yet in my experience, I've seen data engineers think of themselves as the

268
00:26:28,000 --> 00:26:36,160
pipeline gurus. They're more interested in their ingestion pipelines and not as interested in

269
00:26:36,160 --> 00:26:44,160
modeling the data for the business use and the business analytics. So what role does data modeling

270
00:26:44,160 --> 00:26:50,560
play in your cohort? And does that get more technical than you typically intend for this

271
00:26:50,560 --> 00:26:55,680
leadership cohort? So I'm curious, what does it look like to talk about that maybe elusive thing,

272
00:26:55,680 --> 00:26:59,280
because data modeling can mean so many things to so many different people.

273
00:26:59,280 --> 00:27:03,040
Sure. So just to give you an outline of the course real quick. So this is the way we tackle topics.

274
00:27:04,160 --> 00:27:09,280
There's myself and then two other mentors who do some lectures and content and kind of lead this

275
00:27:09,280 --> 00:27:14,720
thing. So I do a couple of weeks on foundations of great data teams. And so the first one's

276
00:27:14,720 --> 00:27:20,080
kind of like purpose and strategy. And then the second one that I do around foundations of great

277
00:27:20,080 --> 00:27:24,560
data teams is more technical and functional behaviors. And so what those technical and

278
00:27:24,560 --> 00:27:31,440
functional behaviors look like in addition to data governance and kind of like developer

279
00:27:31,440 --> 00:27:36,400
processes or change control or CI/CD, those are two of them. The third one is data modeling,

280
00:27:36,400 --> 00:27:42,400
as well. So great data teams have the functional and technical patterns and behaviors

281
00:27:42,400 --> 00:27:46,640
and data modeling is one of those core ones. And then we spent a couple of weeks on data architecture

282
00:27:46,640 --> 00:27:53,680
and a couple of weeks on Power BI and administration governance strategy, managing Power BI developers.

283
00:27:53,680 --> 00:27:59,520
And then there's a bonus week on LLMs. But the point is like probably like week four is when we

284
00:27:59,520 --> 00:28:05,040
get to these technical and functional behaviors of data modeling, data governance and CI/CD or like

285
00:28:05,040 --> 00:28:10,960
good developer practices. And so data modeling really fits into this, the main gap that I hear

286
00:28:10,960 --> 00:28:15,120
and the pain points that I hear from data leaders usually comes down to, I usually end up back

287
00:28:15,120 --> 00:28:20,400
talking about data modeling because they talk about all the data landing into a data lake.

288
00:28:20,400 --> 00:28:24,640
And it's basically just a copy of, you know, copy of source systems. And then they remodel

289
00:28:24,640 --> 00:28:29,680
and reintegrate the data every single time they build a report. And so it's like, they're probably

290
00:28:29,680 --> 00:28:33,360
really good modelers. The fact is they just do it every single time they build a report

291
00:28:33,360 --> 00:28:38,240
and it becomes really burdensome and it takes a long time to deliver reports and it's not

292
00:28:38,960 --> 00:28:44,000
unified. The reports may not be modeled the same way every time and the logic may not be the same.

293
00:28:44,000 --> 00:28:50,720
So data modeling becomes an important pain point there. Another person was talking about how this

294
00:28:50,720 --> 00:28:54,880
ends up being like in the BI tool. In the BI tool, they're calculating measures and they're calculating

295
00:28:55,440 --> 00:28:59,520
relationships between these data sets, integrating these data sets every single time they're

296
00:28:59,520 --> 00:29:02,960
building the reports. Sometimes it's happening in the database, sometimes it's happening in a

297
00:29:02,960 --> 00:29:07,600
BI tool, but there's a lack of data modeling. And that pain point shows up when it takes them

298
00:29:07,600 --> 00:29:12,560
a long time to respond to report requests. And they realize that they're building the same logic

299
00:29:12,560 --> 00:29:17,680
every single time because they haven't thought through and they haven't done the work yet to

300
00:29:17,680 --> 00:29:23,200
design a business data model, like a model that represents how the business functions and is

301
00:29:23,200 --> 00:29:27,520
structured. And so this comes back to the conversation that I've been having with them around

302
00:29:28,720 --> 00:29:32,000
how do you interact with the business teams and how do you show up with empathy and appreciation

303
00:29:32,000 --> 00:29:38,080
for the business problems and challenges that they have. And those are the conversations that

304
00:29:38,080 --> 00:29:43,280
allow you to do data modeling. Once I can understand and empathize with the challenges and

305
00:29:43,280 --> 00:29:48,400
opportunities that they're facing, I can start to think about mapping their business process into a

306
00:29:48,960 --> 00:29:55,040
data model that represents the business. And it creates then a shared language between technical

307
00:29:55,040 --> 00:30:01,520
world and the business world, where now when we talk about a customer or when we talk about

308
00:30:01,520 --> 00:30:08,320
revenue or when we talk about events or operations, we have some similar language because we've

309
00:30:08,320 --> 00:30:14,880
designed data and we've constrained data and we've manufactured data to look like the entities we

310
00:30:14,880 --> 00:30:20,240
want it to represent. So I think about it as a shared language now between business and data.

311
00:30:21,040 --> 00:30:25,520
And anybody who's worked in the data field knows that that's the biggest breakdown that always

312
00:30:25,520 --> 00:30:30,240
occurs is we don't have a shared way to talk about stuff. And so the business person will say one

313
00:30:30,240 --> 00:30:33,680
thing over here, data person will say one thing over here, and there's a giant chasm in between.

314
00:30:33,680 --> 00:30:38,320
A data model bridges that gap. So that was a long way of saying, yeah, data modeling comes up in

315
00:30:38,320 --> 00:30:42,400
like all of the pain points that they're talking about usually. It's like, well, let's fix our data

316
00:30:42,400 --> 00:30:47,760
modeling problem first. Yeah. Oh, that's great. That's great. The shared language I think is so

317
00:30:47,760 --> 00:30:52,960
important because even if you're getting technical and you're trying to just determine what's the

318
00:30:52,960 --> 00:31:00,160
best name for this column in the BI tool. Well, that's a good indicator if you're

319
00:31:00,800 --> 00:31:04,960
close to sharing that language with the business, because if you name it something and the business

320
00:31:04,960 --> 00:31:09,120
is like, oh, I didn't use that column because I didn't know what it was, you've missed it. Like

321
00:31:09,120 --> 00:31:14,320
that's poor data modeling because you didn't have enough conversations with the business users to

322
00:31:14,320 --> 00:31:21,200
determine what should that column really be named for them to get value out of it. It even reminds

323
00:31:21,200 --> 00:31:27,360
me a little bit about data lineage. And I was thinking about this the other day in like, you

324
00:31:27,360 --> 00:31:35,920
know, what would a baby computer call its father? Oh, no, I know what this is. Okay, I don't know,

325
00:31:35,920 --> 00:31:42,400
Troy. Go ahead. Data. Oh, that's like so perfect. And

326
00:31:45,520 --> 00:31:50,240
Had to get my dad joke in. You said data lineage, and I thought we were headed somewhere super productive,

327
00:31:50,240 --> 00:31:58,960
Troy. Okay. I think data lineage is also important, but not in the context of a baby, a dad joke.

328
00:32:04,640 --> 00:32:11,120
Troy, so tell me about the tech stack you're currently working with and the tech stack that

329
00:32:11,120 --> 00:32:17,280
you learned and maybe parallel or compare and contrast those two pros and cons that you're

330
00:32:17,280 --> 00:32:20,800
seeing between the different tech stacks that you've encountered in your day to day and then in this

331
00:32:20,800 --> 00:32:29,280
bootcamp. Yeah, yeah. Great question. So, you know, my day to day has been Microsoft stack. I have

332
00:32:29,280 --> 00:32:37,200
experience with Google BigQuery, Oracle products, PeopleSoft, you know, some of these other

333
00:32:38,160 --> 00:32:45,120
big organizations that are out there. But right now day to day is the Microsoft stack. And

334
00:32:45,120 --> 00:32:53,120
the difference there is in order for this bootcamp to function, it was mainly open source tools. And

335
00:32:53,120 --> 00:33:01,840
so you get a very different feel when you're using the proprietary low code citizen developer

336
00:33:01,840 --> 00:33:10,240
type feel of a tech stack that your enterprise has purchased and is invested in. And so Microsoft

337
00:33:10,240 --> 00:33:16,800
is very much that it makes it easier to do things at just the click of a button. At the same time,

338
00:33:16,800 --> 00:33:22,160
you might find it limiting, like you almost have your hands tied behind your back because

339
00:33:22,160 --> 00:33:28,720
you can't do something you know it could do, if you could just get behind that GUI and actually

340
00:33:28,720 --> 00:33:35,440
punch something in the way you want to. And so that was the big difference. Right now, I'm

341
00:33:35,440 --> 00:33:42,400
trying to think through a way of integrating dbt, dbt Core, I should say, which is the free open

342
00:33:42,400 --> 00:33:49,200
source version and trying to get that integrated within the Microsoft stack and use it effectively.

343
00:33:49,200 --> 00:33:55,760
And I'm just struggling because now I'm trying to take an open source tool that's, you know,

344
00:33:55,760 --> 00:34:02,560
code heavy. And that's just not my experience. I am not a coder by trade. And so how do I get that

345
00:34:02,560 --> 00:34:12,560
to plug into the Microsoft environment well and work seamlessly where I can have a better way of

346
00:34:12,560 --> 00:34:20,880
building models efficiently, quickly with version control, using Git, all those kinds of things

347
00:34:20,880 --> 00:34:25,040
that I'd like to be doing. And I'm just not there yet. So I learned some things I'm really excited

348
00:34:25,040 --> 00:34:31,680
about, and I'm struggling now to plug them into my particular experience. So I think that

349
00:34:31,680 --> 00:34:41,120
the context around using more SaaS tools or abstracted tools versus using

350
00:34:41,120 --> 00:34:46,480
low level open source tools is your access to the knobs and switches under the covers.

351
00:34:46,480 --> 00:34:53,520
And you're always making a trade-off: ease of use means fewer knobs and switches available.

352
00:34:54,320 --> 00:34:58,480
And so for smaller organizations with less technical skills or just less technical

353
00:34:58,480 --> 00:35:04,880
capacity, fewer people, you probably opt for fewer switches and easier use cases.

354
00:35:04,880 --> 00:35:12,880
For large organizations with more complex requirements and use cases and heavier technical

355
00:35:12,880 --> 00:35:18,720
skills and capacity, hey, give me more knobs and switches. I want to configure things at a detailed

356
00:35:18,720 --> 00:35:24,560
level. Let me ask you this, Troy, would you recommend a data engineering boot camp to somebody?

357
00:35:24,560 --> 00:35:29,840
Or was this boot camp worth it? Are there other boot camps you considered that you'd recommend to people?

358
00:35:31,360 --> 00:35:37,280
That's a great question. And with many things in life, it depends, right? And so, you know,

359
00:35:37,280 --> 00:35:42,480
I thought I had a great experience. Does that mean that everything was perfect? No, there was always

360
00:35:42,480 --> 00:35:49,120
a little bit of bumps along the way. But if you're looking to grow in the technical side of it, as

361
00:35:49,120 --> 00:35:54,800
you've heard in this conversation here, I would highly recommend the boot camp by Zach Wilson.

362
00:35:54,800 --> 00:36:00,160
I think he's doing a great job, always iterating, always making it better. And so there's some great

363
00:36:00,160 --> 00:36:06,240
content there. And Sawyer, of course, I know you're going to toot your own horn on the cohort here,

364
00:36:06,240 --> 00:36:12,160
but, you know, share one testimonial that you've heard from someone in this current boot camp in

365
00:36:12,160 --> 00:36:16,560
terms of what they're thinking and how they might recommend just something off the top of your head

366
00:36:16,560 --> 00:36:20,320
that you've got. Sure. I was about to say, like, we're not getting paid to say this stuff, but like,

367
00:36:20,320 --> 00:36:24,960
and we're not getting paid to promote Zach's boot camp, that's for sure. That's right. But it is

368
00:36:24,960 --> 00:36:30,960
part of my business, this community. So the cohort, yeah, one of the things that someone shared with

369
00:36:30,960 --> 00:36:35,280
me at the end of the last cohort, this is the second one we've run, is, hey, they feel like they

370
00:36:35,280 --> 00:36:39,680
got a lot of empathy and community for the first time as a data leader. They have their team and

371
00:36:39,680 --> 00:36:43,040
they have their bosses. But the first time they're like, I know some other data leaders, and I'm

372
00:36:43,040 --> 00:36:46,960
going to keep these relationships for a long time. And so that was super validating because that was

373
00:36:46,960 --> 00:36:51,120
one of our goals. We hoped it would happen. And at the end of the first seven weeks of this, it

374
00:36:51,120 --> 00:36:54,960
seemed like it did. And that was a big takeaway for one of the cohort participants for the

375
00:36:54,960 --> 00:36:58,720
first time. And it seems like it's shaping up that way during the second cohort as well.

376
00:36:58,720 --> 00:37:03,520
That's great. That's cool. Yeah, of course, if anybody wants to chat more with me about

377
00:37:03,520 --> 00:37:08,720
my experience in the boot camp, of course, I'm happy to respond to any questions that people have

378
00:37:08,720 --> 00:37:15,120
and they can just reach out via my information, which I think we have somewhere on the website.

379
00:37:15,120 --> 00:37:42,960
And so we can do that for sure. All right. Well, this has been Making Data Matter. Thanks, everybody.

