1
00:00:00,000 --> 00:00:03,600
When I had entered into my graduate program,

2
00:00:03,600 --> 00:00:06,320
I was told that it was a waste of time

3
00:00:06,320 --> 00:00:10,000
to do like computer science and programming,

4
00:00:10,000 --> 00:00:12,560
that I would never be respected if all I did

5
00:00:12,560 --> 00:00:15,640
was like build tools or build libraries

6
00:00:15,640 --> 00:00:19,800
or write code to solve science problems.

7
00:00:24,120 --> 00:00:26,160
How did the best machine learning practitioners

8
00:00:26,160 --> 00:00:28,120
get involved in the field?

9
00:00:28,120 --> 00:00:30,600
What challenges have they faced?

10
00:00:30,600 --> 00:00:32,760
What has helped them flourish?

11
00:00:32,760 --> 00:00:34,560
Let's ask them.

12
00:00:34,560 --> 00:00:36,960
Welcome to Learning from Machine Learning.

13
00:00:36,960 --> 00:00:38,720
I'm your host, Seth Levine.

14
00:00:41,180 --> 00:00:43,520
Welcome to Learning from Machine Learning.

15
00:00:43,520 --> 00:00:46,680
On this episode, we have a very special guest,

16
00:00:46,680 --> 00:00:49,560
Paige Bailey, the lead product manager

17
00:00:49,560 --> 00:00:52,480
for generative models at Google DeepMind.

18
00:00:52,480 --> 00:00:56,720
And at such an exciting time, this week was Google I/O,

19
00:00:56,720 --> 00:01:01,200
where they got to announce new advancements for Google Bard,

20
00:01:01,200 --> 00:01:04,520
talk about Gemini, and how they're incorporating

21
00:01:04,520 --> 00:01:06,760
many aspects of generative models

22
00:01:06,760 --> 00:01:09,880
into all of their products, really.

23
00:01:09,880 --> 00:01:12,160
Paige, welcome to the show.

24
00:01:12,160 --> 00:01:12,680
Awesome.

25
00:01:12,680 --> 00:01:14,720
Thank you for having me.

26
00:01:14,720 --> 00:01:17,000
It's such a pleasure to have you.

27
00:01:17,000 --> 00:01:20,480
You have such an interesting background.

28
00:01:20,480 --> 00:01:22,600
You want to just give a little bit of background,

29
00:01:22,600 --> 00:01:24,920
introduce yourself, how you got interested in computers

30
00:01:24,920 --> 00:01:26,640
and machine learning?

31
00:01:26,640 --> 00:01:27,760
Sure, that sounds good.

32
00:01:27,760 --> 00:01:31,680
So my background, I did geophysics and applied math

33
00:01:31,680 --> 00:01:34,080
when I was in school.

34
00:01:34,080 --> 00:01:37,360
That was kind of the focus of my career.

35
00:01:37,360 --> 00:01:39,000
I think when I was younger, I wanted

36
00:01:39,000 --> 00:01:41,920
to be like some sort of lady Carl Sagan,

37
00:01:41,920 --> 00:01:43,720
like focused on planetary science

38
00:01:43,720 --> 00:01:47,360
and sort of being able to explain

39
00:01:47,360 --> 00:01:51,280
these complicated technical topics to the world.

40
00:01:51,280 --> 00:01:54,240
I got into computers very early.

41
00:01:54,240 --> 00:01:57,360
I grew up in quite a small town.

42
00:01:57,360 --> 00:02:02,280
And my family kind of rescued an Apple II

43
00:02:02,280 --> 00:02:03,560
from being thrown away.

44
00:02:03,560 --> 00:02:06,960
And that was how I first learned how to program.

45
00:02:06,960 --> 00:02:12,080
And then really, I think this time in particular

46
00:02:12,080 --> 00:02:15,680
is kind of what I've been waiting my entire career for.

47
00:02:15,680 --> 00:02:19,120
I've been doing machine learning for a bit over a decade.

48
00:02:19,120 --> 00:02:23,000
And previously, you would have to go through all of this pain

49
00:02:23,000 --> 00:02:25,200
to get the data, to build a model,

50
00:02:25,200 --> 00:02:27,760
to try to choose an algorithm, and then

51
00:02:27,760 --> 00:02:30,560
to do the really hard work of trying to get

52
00:02:30,560 --> 00:02:33,000
those models into production.

53
00:02:33,000 --> 00:02:35,560
And even then, those models were just kind of single task

54
00:02:35,560 --> 00:02:36,520
models.

55
00:02:36,520 --> 00:02:40,080
And today, we have these highly capable, very general purpose

56
00:02:40,080 --> 00:02:45,800
models that are doing an overwhelming number of things.

57
00:02:45,800 --> 00:02:49,360
And really, we keep discovering new ways

58
00:02:49,360 --> 00:02:53,800
that they're useful based on the way that people are sort

59
00:02:53,800 --> 00:02:58,160
of testing them out and stress testing their boundaries.

60
00:02:58,160 --> 00:03:03,000
So it's been really exhilarating, honestly.

61
00:03:03,000 --> 00:03:05,080
I would never have imagined that this

62
00:03:05,080 --> 00:03:08,240
would happen back when I started doing machine learning in 2009,

63
00:03:08,240 --> 00:03:09,720
2010.

64
00:03:09,720 --> 00:03:14,600
So back in 2009, 2010, what were the types of machine learning

65
00:03:14,600 --> 00:03:17,280
problems that you were working on?

66
00:03:17,280 --> 00:03:18,800
Oh, lord.

67
00:03:18,800 --> 00:03:23,240
So back then, I was fortunate enough to do,

68
00:03:23,240 --> 00:03:30,320
I got some NSF kind of grants to do planetary sciences research

69
00:03:30,320 --> 00:03:32,800
both at the Laboratory for Atmospheric and Space Physics

70
00:03:32,800 --> 00:03:36,760
in Boulder, as well as at Southwest Research Institute.

71
00:03:36,760 --> 00:03:38,720
And most of the machine learning there

72
00:03:38,720 --> 00:03:41,520
was just kind of fancy statistics, right?

73
00:03:41,520 --> 00:03:43,760
So you were doing linear regression,

74
00:03:43,760 --> 00:03:44,920
logistic regression.

75
00:03:44,920 --> 00:03:48,160
You might be doing decision trees

76
00:03:48,160 --> 00:03:51,040
or support vector machines.

77
00:03:51,040 --> 00:03:55,600
But really, it was just taking this very tabular data

78
00:03:55,600 --> 00:03:58,040
and attempting to make sense out of it.

79
00:03:58,040 --> 00:04:01,200
And it was useful, very, very useful,

80
00:04:01,200 --> 00:04:06,680
and capable of driving many interesting scientific advances,

81
00:04:06,680 --> 00:04:09,280
but nothing like what we have today.

82
00:04:09,280 --> 00:04:20,760
And also using sometimes very, I would say, very niche tools.

83
00:04:20,760 --> 00:04:23,320
So there was, in the space sciences,

84
00:04:23,320 --> 00:04:25,560
there's something called IDL.

85
00:04:25,560 --> 00:04:30,080
There's also, I remember, many, many sleepless nights

86
00:04:30,080 --> 00:04:33,680
attempting to wrangle MATLAB into doing

87
00:04:33,680 --> 00:04:36,480
what I needed it to do, whereas Python always

88
00:04:36,480 --> 00:04:38,840
seemed to make much more sense.

89
00:04:38,840 --> 00:04:45,080
So just, again, like a sort of an explosion

90
00:04:45,080 --> 00:04:47,880
and a revolution in terms of the kinds of models

91
00:04:47,880 --> 00:04:50,520
that we can build now and then also the tools that are

92
00:04:50,520 --> 00:04:52,520
available to help build those models.

93
00:04:52,520 --> 00:04:53,400
Yeah.

94
00:04:53,400 --> 00:04:55,480
I love MATLAB, by the way.

95
00:04:55,480 --> 00:04:56,240
Oh, wow.

96
00:04:56,240 --> 00:04:57,320
Yeah.

97
00:04:57,320 --> 00:04:59,960
A lot of my undergrad work, I use MATLAB.

98
00:04:59,960 --> 00:05:02,600
And one of my first big machine learning projects,

99
00:05:02,600 --> 00:05:04,000
I inherited the project.

100
00:05:04,000 --> 00:05:05,480
And a lot of it was in MATLAB.

101
00:05:05,480 --> 00:05:08,440
So I started out doing a lot of work there.

102
00:05:08,440 --> 00:05:10,520
I think it's great.

103
00:05:10,520 --> 00:05:11,880
People don't talk about it enough.

104
00:05:11,880 --> 00:05:15,560
Python has kind of eaten it up like a lot of other things.

105
00:05:15,560 --> 00:05:17,480
But I love MATLAB.

106
00:05:17,480 --> 00:05:19,600
Yeah.

107
00:05:19,600 --> 00:05:21,920
One of my professors always used to say

108
00:05:21,920 --> 00:05:24,280
that it doesn't really matter what tool you're using.

109
00:05:24,280 --> 00:05:27,000
It doesn't matter if it's Python, a spreadsheet, MATLAB,

110
00:05:27,000 --> 00:05:32,840
SPSS, Stata, or R, or whatever it happens to be.

111
00:05:32,840 --> 00:05:34,840
The important thing is that you're

112
00:05:34,840 --> 00:05:36,400
asking an interesting question.

113
00:05:36,400 --> 00:05:39,560
And you're being thoughtful about analyzing data.

114
00:05:39,560 --> 00:05:42,000
And the tool is just kind of something

115
00:05:42,000 --> 00:05:43,360
that gets you to the answer.

116
00:05:43,360 --> 00:05:45,400
It shouldn't be the answer itself.

117
00:05:45,400 --> 00:05:46,640
Yeah.

118
00:05:46,640 --> 00:05:47,840
Yeah, definitely.

119
00:05:47,840 --> 00:05:51,200
That's for sure.

120
00:05:51,200 --> 00:05:54,320
Yeah, I remember I did some very interesting work with SPSS

121
00:05:54,320 --> 00:05:55,840
as well.

122
00:05:55,840 --> 00:05:58,160
But yeah, it's not the tool, really.

123
00:05:58,160 --> 00:06:02,480
It's understanding the problem, understanding that,

124
00:06:02,480 --> 00:06:04,600
do you even have the right data to start

125
00:06:04,600 --> 00:06:06,520
approaching this problem?

126
00:06:06,520 --> 00:06:10,640
So yeah, so I guess at face value,

127
00:06:10,640 --> 00:06:11,880
people might not understand.

128
00:06:11,880 --> 00:06:13,320
So you're in geophysics.

129
00:06:13,320 --> 00:06:19,400
How did you become the lead product for generative models

130
00:06:19,400 --> 00:06:21,200
at Google DeepMind?

131
00:06:21,200 --> 00:06:22,560
But I mean, it makes sense.

132
00:06:22,560 --> 00:06:25,640
There's so much dealing with so much data.

133
00:06:25,640 --> 00:06:30,080
And I guess a lot of the lessons that you

134
00:06:30,080 --> 00:06:32,960
learn with how to handle all of that kind of data apply.

135
00:06:32,960 --> 00:06:36,000
But if you want to speak to that.

136
00:06:36,000 --> 00:06:36,680
Absolutely.

137
00:06:36,680 --> 00:06:40,280
So one of the cool things about geophysics

138
00:06:40,280 --> 00:06:45,720
is that geophysicists have been using GPUs for a long time.

139
00:06:45,720 --> 00:06:51,760
And they've also been kind of the flavor of Earth scientists

140
00:06:51,760 --> 00:06:57,040
that are more likely to be delighted by computers

141
00:06:57,040 --> 00:07:02,200
as opposed to running away from computers to the mountains.

142
00:07:02,200 --> 00:07:10,240
So seismic data is very massive, as well as well log data,

143
00:07:10,240 --> 00:07:14,080
kind of the interpretations of subsurface data.

144
00:07:14,080 --> 00:07:20,320
And all of this Earth sciences data

145
00:07:20,320 --> 00:07:23,680
really needs heavy kind of computational horsepower

146
00:07:23,680 --> 00:07:25,360
in order to analyze it.

147
00:07:25,360 --> 00:07:28,240
So even before deep learning was a thing,

148
00:07:28,240 --> 00:07:33,120
Earth scientists were already building models

149
00:07:33,120 --> 00:07:35,720
and sort of attempting to analyze these patterns

150
00:07:35,720 --> 00:07:39,480
in the subsurface and in seismic using GPUs.

151
00:07:39,480 --> 00:07:43,360
And that's something that I don't know if many people know.

152
00:07:43,360 --> 00:07:48,840
But I actually had started experimenting with CUDA,

153
00:07:48,840 --> 00:07:53,640
not just for deep learning, but because of Earth sciences

154
00:07:53,640 --> 00:07:56,640
problems and quantitative hydrogeology,

155
00:07:56,640 --> 00:08:02,440
and also understanding fluid dynamics problems.

156
00:08:02,440 --> 00:08:04,880
Yeah, that's fascinating.

157
00:08:04,880 --> 00:08:07,840
I guess having that experience with CUDA

158
00:08:07,840 --> 00:08:10,760
and dealing with that type of data,

159
00:08:10,760 --> 00:08:15,760
understanding how to process it set you up probably pretty

160
00:08:15,760 --> 00:08:19,000
well for your role with TensorFlow.

161
00:08:19,000 --> 00:08:22,400
I know that you played a pivotal role

162
00:08:22,400 --> 00:08:25,360
in the development of that.

163
00:08:25,360 --> 00:08:28,960
Can you speak to some of your work with TensorFlow?

164
00:08:28,960 --> 00:08:30,680
I don't know if it was a pivotal role

165
00:08:30,680 --> 00:08:32,280
in the development of TensorFlow,

166
00:08:32,280 --> 00:08:34,640
but I certainly was delighted.

167
00:08:34,640 --> 00:08:38,240
So TensorFlow was, I've been experimenting

168
00:08:38,240 --> 00:08:42,560
with kind of open source machine learning tools for a long time.

169
00:08:42,560 --> 00:08:44,800
I think I mentioned before, Python always

170
00:08:44,800 --> 00:08:49,080
seemed to make more sense than MATLAB.

171
00:08:49,080 --> 00:08:53,800
And there's kind of a really nice Python data science

172
00:08:53,800 --> 00:08:58,280
ecosystem, SciPy, NumPy, Matplotlib, Scikit-learn,

173
00:08:58,280 --> 00:09:02,080
like all of those great things.

174
00:09:02,080 --> 00:09:07,040
But then TensorFlow came out and was open sourced in late 2015.

175
00:09:07,040 --> 00:09:11,400
And it was kind of revolutionary in a few ways.

176
00:09:11,400 --> 00:09:12,920
It had some documentation.

177
00:09:12,920 --> 00:09:15,640
It had some examples that you could use.

178
00:09:15,640 --> 00:09:17,800
I think all of those DeepDream images

179
00:09:17,800 --> 00:09:20,320
went viral over the internet.

180
00:09:20,320 --> 00:09:21,920
But it was also in Python.

181
00:09:21,920 --> 00:09:25,600
And even though it was a really kind of janky, weird sort

182
00:09:25,600 --> 00:09:30,280
of Python that you had to construct graphs to use it,

183
00:09:30,280 --> 00:09:34,720
it was still something that felt a little bit more approachable

184
00:09:34,720 --> 00:09:37,680
than maybe some of the other deep learning frameworks that

185
00:09:37,680 --> 00:09:44,440
were implemented in C++ or Lua or whatever they might have been.

186
00:09:44,440 --> 00:09:49,760
So I got very excited about it, started

187
00:09:49,760 --> 00:09:52,600
learning how to use it, trying it for my projects.

188
00:09:52,600 --> 00:09:56,400
My first deep learning project was

189
00:09:56,400 --> 00:09:59,120
applied to the Earth sciences, so understanding

190
00:09:59,120 --> 00:10:06,320
how to categorize different shapes of atolls and reefs

191
00:10:06,320 --> 00:10:11,360
and just being overwhelmed at how something would have taken

192
00:10:11,360 --> 00:10:14,200
a poor grad student six months to do.

193
00:10:14,200 --> 00:10:16,440
And then suddenly, this was able to blast through it

194
00:10:16,440 --> 00:10:19,240
in just five minutes with a really modest amount

195
00:10:19,240 --> 00:10:21,520
of training data.

196
00:10:21,520 --> 00:10:24,440
But I started contributing.

197
00:10:24,440 --> 00:10:29,040
Eventually, the TensorFlow team kind of took notice.

198
00:10:29,040 --> 00:10:35,440
And I got to join them and to work at Google Brain.

199
00:10:35,440 --> 00:10:40,200
And in addition to getting to work with the TensorFlow team,

200
00:10:40,200 --> 00:10:45,680
the JAX team, which is also an open source numerical library

201
00:10:45,680 --> 00:10:50,120
for doing deep learning in addition to many other things,

202
00:10:50,120 --> 00:10:53,480
especially in the sciences, I got to work with them quite closely.

203
00:10:53,480 --> 00:10:55,440
And then all of our machine learning frameworks

204
00:10:55,440 --> 00:10:56,920
teams at Alphabet.

205
00:10:56,920 --> 00:11:00,200
So like I said, it's very exhilarating.

206
00:11:00,200 --> 00:11:03,120
And it's been really interesting to see

207
00:11:03,120 --> 00:11:07,720
how the space has evolved over the course of the last many years.

208
00:11:07,720 --> 00:11:09,200
Yeah.

209
00:11:09,200 --> 00:11:13,040
So fast forwarding to today and some

210
00:11:13,040 --> 00:11:17,440
of your more recent projects with all the things that

211
00:11:17,440 --> 00:11:21,520
are happening with these large language models,

212
00:11:21,520 --> 00:11:24,120
can you speak to the work that you've

213
00:11:24,120 --> 00:11:26,400
been doing with applying large language models

214
00:11:26,400 --> 00:11:30,000
to different software development tasks?

215
00:11:30,000 --> 00:11:30,760
Absolutely.

216
00:11:30,760 --> 00:11:36,080
So I guess this is also a nice segue.

217
00:11:36,080 --> 00:11:39,040
So I boomeranged back to Alphabet.

218
00:11:39,040 --> 00:11:41,760
During the pandemic, I spent a little bit over a year

219
00:11:41,760 --> 00:11:45,000
at Microsoft, specifically GitHub,

220
00:11:45,000 --> 00:11:49,880
helping with introducing machine learning features into VS Code,

221
00:11:49,880 --> 00:11:54,080
GPUs in Codespaces, and then also, of course, Copilot.

222
00:11:54,080 --> 00:12:02,240
And so I think the potential for generative models

223
00:12:02,240 --> 00:12:05,880
in the software development space is huge.

224
00:12:05,880 --> 00:12:10,840
Historically, single task models for things

225
00:12:10,840 --> 00:12:12,960
like docstring generation or single task

226
00:12:12,960 --> 00:12:16,720
models for code generation or for code completions,

227
00:12:16,720 --> 00:12:19,480
single task models for build repair

228
00:12:19,480 --> 00:12:22,440
or for helping resolve or identify errors.

229
00:12:22,440 --> 00:12:24,800
And now we're seeing models do all of these things

230
00:12:24,800 --> 00:12:29,800
and even more with just a singular model.

231
00:12:29,800 --> 00:12:32,720
One of the coolest things about the announcements

232
00:12:32,720 --> 00:12:35,600
that we had at I/O this past week

233
00:12:35,600 --> 00:12:39,600
is that they're all using kind of our latest large language

234
00:12:39,600 --> 00:12:41,680
model. The technical paper is out.

235
00:12:41,680 --> 00:12:43,840
I encourage everyone to read it.

236
00:12:43,840 --> 00:12:48,320
We also have a website at g.co/palm2.

237
00:12:48,320 --> 00:12:53,040
But this model, based on kind of the way

238
00:12:53,040 --> 00:12:57,840
that it was trained and the input data,

239
00:12:57,840 --> 00:13:02,080
it's capable of doing a broad variety of software development

240
00:13:02,080 --> 00:13:02,800
tasks.

241
00:13:02,800 --> 00:13:06,560
And it's supporting code generation, code explanation,

242
00:13:06,560 --> 00:13:11,360
error explanation, error fixing across so many Alphabet

243
00:13:11,360 --> 00:13:11,960
products.

244
00:13:11,960 --> 00:13:16,000
So both our tools within Google Cloud,

245
00:13:16,000 --> 00:13:19,840
the product is called Duet, as well as

246
00:13:19,840 --> 00:13:23,120
all of those features within Google Colab, which is,

247
00:13:23,120 --> 00:13:25,360
if folks aren't familiar, it's kind of a data science

248
00:13:25,360 --> 00:13:28,920
notebook environment that's ready to use

249
00:13:28,920 --> 00:13:32,360
and that you can kind of have handy in your Google Drive

250
00:13:32,360 --> 00:13:37,600
instance, as well as Android Studio, as well as code

251
00:13:37,600 --> 00:13:39,480
features in Bard.

252
00:13:39,480 --> 00:13:42,560
But it's all powered by the same model.

253
00:13:42,560 --> 00:13:46,200
And it's been really energizing to see

254
00:13:46,200 --> 00:13:49,280
how people have been testing it out and using it.

255
00:13:49,280 --> 00:13:52,480
And then also the features that we have coming down the pipes,

256
00:13:52,480 --> 00:13:57,680
so things like self-healing code, and then also

257
00:13:57,680 --> 00:13:58,960
things like tool use.

258
00:13:58,960 --> 00:14:05,640
So being able to access a Python interpreter with code.

259
00:14:05,640 --> 00:14:08,440
Yeah, so I'm familiar with a lot of those tools.

260
00:14:08,440 --> 00:14:12,680
I think I was an early adopter for Google Colab.

261
00:14:12,680 --> 00:14:15,040
Yeah, I've loved it for so long, the ability

262
00:14:15,040 --> 00:14:18,520
to have free access to GPUs.

263
00:14:18,520 --> 00:14:20,640
In the past, it was a little bit longer access.

264
00:14:20,640 --> 00:14:23,080
Now it's a little bit less, but that's OK.

265
00:14:23,080 --> 00:14:26,520
Just being able to, any practitioner getting access

266
00:14:26,520 --> 00:14:30,560
to a GPU is just like, yeah, it just

267
00:14:30,560 --> 00:14:32,800
changes your iteration speed.

268
00:14:32,800 --> 00:14:35,960
And you can kind of work so much faster.

269
00:14:35,960 --> 00:14:39,040
For the other things, I don't know why,

270
00:14:39,040 --> 00:14:42,520
but I was more of a late adopter for Copilot.

271
00:14:42,520 --> 00:14:47,240
I got it within the last month, it's embarrassing to say.

272
00:14:47,240 --> 00:14:50,800
But that's when I finally got around to trying it.

273
00:14:50,800 --> 00:14:52,280
I don't know, I was hesitant.

274
00:14:52,280 --> 00:14:54,360
I just thought, oh, maybe it's going

275
00:14:54,360 --> 00:14:57,400
to introduce bugs or something like that,

276
00:14:57,400 --> 00:15:00,280
or maybe it was my pride or something.

277
00:15:00,280 --> 00:15:05,040
I wanted to just continue to be coding on my own.

278
00:15:05,040 --> 00:15:08,480
But I started to use ChatGPT, I started

279
00:15:08,480 --> 00:15:12,600
to use Bard to help me with certain things.

280
00:15:12,600 --> 00:15:14,400
I mean, it's not like I wasn't finding myself

281
00:15:14,400 --> 00:15:17,640
on Stack Overflow like every other programmer looking

282
00:15:17,640 --> 00:15:19,480
at things.

283
00:15:19,480 --> 00:15:21,560
But yeah, it's incredible.

284
00:15:21,560 --> 00:15:26,560
I think that this paradigm shift that's taking place,

285
00:15:26,560 --> 00:15:28,160
I mean, with machine learning in general,

286
00:15:28,160 --> 00:15:32,280
but for generating code, it's like what it used to be

287
00:15:32,280 --> 00:15:34,640
is you used to write a function, and then you

288
00:15:34,640 --> 00:15:37,240
would struggle to write your documentation

289
00:15:37,240 --> 00:15:38,960
or your comments and things.

290
00:15:38,960 --> 00:15:43,960
Now with this new technology to do code generation,

291
00:15:43,960 --> 00:15:46,720
you're writing what you want the code to do,

292
00:15:46,720 --> 00:15:49,200
and then it's making these suggestions

293
00:15:49,200 --> 00:15:50,560
for what the code should be.

294
00:15:50,560 --> 00:15:53,680
And multiple lines also, which is really cool.

295
00:15:53,680 --> 00:15:57,000
And it's not just trying to autocomplete,

296
00:15:57,000 --> 00:16:00,480
it's very smart.

297
00:16:00,480 --> 00:16:03,720
At least it appears to be very smart.

298
00:16:03,720 --> 00:16:07,600
One of the questions that I have is, so yeah,

299
00:16:07,600 --> 00:16:11,440
there's a lot of code out there, just like there's

300
00:16:11,440 --> 00:16:13,880
a lot of information out there.

301
00:16:13,880 --> 00:16:19,560
GitHub is filled with so many libraries,

302
00:16:19,560 --> 00:16:23,960
but not all of it is battle tested.

303
00:16:23,960 --> 00:16:26,920
Not all of it is peer reviewed.

304
00:16:26,920 --> 00:16:29,040
Not all code goes through code review, so it might not all

305
00:16:29,040 --> 00:16:31,400
be the highest quality.

306
00:16:31,400 --> 00:16:35,360
Or it's from, I mean, the way that things are moving

307
00:16:35,360 --> 00:16:38,520
these days, it's just from a year ago or two years ago,

308
00:16:38,520 --> 00:16:43,080
and it's using a different version of a library that

309
00:16:43,080 --> 00:16:46,960
has a different dependency or whatever the actual specific is.

310
00:16:46,960 --> 00:16:50,400
But I guess, how do you mitigate those sorts of problems?

311
00:16:50,400 --> 00:16:51,960
I know this is like a loaded question,

312
00:16:51,960 --> 00:16:54,120
but how do you mitigate those sorts of problems

313
00:16:54,120 --> 00:16:56,440
where you might be training on data that's

314
00:16:56,440 --> 00:17:00,640
either not the highest quality or out of date?

315
00:17:00,640 --> 00:17:02,800
Yeah, that's a great question.

316
00:17:02,800 --> 00:17:08,840
So we have a collection of fine-tuned versions

317
00:17:08,840 --> 00:17:15,800
of our large models for internal use only.

318
00:17:15,800 --> 00:17:18,040
So kind of supporting the software engineers

319
00:17:18,040 --> 00:17:20,600
within Google who are building the software that

320
00:17:20,600 --> 00:17:23,280
powers all of the Alphabet products.

321
00:17:23,280 --> 00:17:27,040
And these fine-tuned models are fine-tuned on google3 internal

322
00:17:27,040 --> 00:17:34,400
source code, which is many, many tokens of super high quality

323
00:17:34,400 --> 00:17:40,360
peer reviewed data that is the equivalent of an L5

324
00:17:40,360 --> 00:17:42,360
SWE or more.

325
00:17:42,360 --> 00:17:44,320
And that moves the needle quite a bit

326
00:17:44,320 --> 00:17:46,920
in terms of making sure that the recommendations that we're

327
00:17:46,920 --> 00:17:52,720
giving, the explanations that we're giving are pretty solid.

328
00:17:52,720 --> 00:17:58,760
For the GitHub code, I can certainly attest.

329
00:17:58,760 --> 00:18:01,360
There's a lot of very low quality code on GitHub.

330
00:18:01,360 --> 00:18:02,920
A lot of it doesn't run.

331
00:18:02,920 --> 00:18:06,600
A lot of it is using dated APIs or maybe

332
00:18:06,600 --> 00:18:08,920
has this very low process of somebody just

333
00:18:08,920 --> 00:18:12,640
push committed it to the repo without going

334
00:18:12,640 --> 00:18:16,840
through any evaluation from a peer.

335
00:18:16,840 --> 00:18:21,560
And so I think that if you are building these kinds of models

336
00:18:21,560 --> 00:18:24,200
externally, there's a lot of work

337
00:18:24,200 --> 00:18:28,120
that needs to go into kind of carefully curating and cleaning

338
00:18:28,120 --> 00:18:31,680
the GitHub data sets in order to make it solid for use.

339
00:18:31,680 --> 00:18:34,760
And if people are curious about this topic,

340
00:18:34,760 --> 00:18:37,160
I strongly, strongly recommend taking a look.

341
00:18:37,160 --> 00:18:41,160
There's a recent paper from the Hugging Face and ServiceNow

342
00:18:41,160 --> 00:18:43,600
team called StarCoder.

343
00:18:43,600 --> 00:18:47,160
And they go through kind of with their BigCode data set

344
00:18:47,160 --> 00:18:50,120
like all of the things that they needed

345
00:18:50,120 --> 00:18:54,000
to do in order to get a higher quality data set to train.

346
00:18:54,000 --> 00:18:57,600
And it includes things like deduplication.

347
00:18:57,600 --> 00:19:00,280
I think they also were considering preferentially

348
00:19:00,280 --> 00:19:03,360
weighting code that is newer or code that

349
00:19:03,360 --> 00:19:06,480
comes from repos that follow software engineering best

350
00:19:06,480 --> 00:19:10,440
practices as opposed to code that might be from like a Python

351
00:19:10,440 --> 00:19:12,560
101 student.

352
00:19:12,560 --> 00:19:17,560
And all of those kind of careful bits of attention

353
00:19:17,560 --> 00:19:22,680
to the pre-training data set move the needle significantly

354
00:19:22,680 --> 00:19:25,600
in terms of model performance down the line.

355
00:19:25,600 --> 00:19:28,040
But those things, there are also other tricks

356
00:19:28,040 --> 00:19:31,640
that you can do like retrieval techniques

357
00:19:31,640 --> 00:19:37,440
or being able to do these kind of self-healing operations

358
00:19:37,440 --> 00:19:40,040
where you recursively apply the model to the output code

359
00:19:40,040 --> 00:19:42,960
to see if it would actually run and then like fixing anything

360
00:19:42,960 --> 00:19:46,520
that might be wrong or spotting any security vulnerabilities

361
00:19:46,520 --> 00:19:49,480
if there are any in the output code.

362
00:19:49,480 --> 00:19:52,840
But it's certainly like an ongoing field

363
00:19:52,840 --> 00:19:54,760
of research in order to understand

364
00:19:54,760 --> 00:19:58,920
what the best options might be.

365
00:19:58,920 --> 00:19:59,960
Right.

366
00:19:59,960 --> 00:20:05,520
So yeah, so I guess designing, I don't know what to call it.

367
00:20:05,520 --> 00:20:09,520
I guess systems that help software engineers

368
00:20:09,520 --> 00:20:13,360
or machine learning practitioners sort of do their job,

369
00:20:13,360 --> 00:20:17,720
generate code, test code, help you write documentation,

370
00:20:17,720 --> 00:20:20,360
do all these things.

371
00:20:20,360 --> 00:20:25,320
I guess other than dealing with maybe outdated code

372
00:20:25,320 --> 00:20:27,320
or the things that we were just talking about.

373
00:20:27,320 --> 00:20:29,280
Are there any other challenges that you

374
00:20:29,280 --> 00:20:31,800
face when trying to design those systems?

375
00:20:31,800 --> 00:20:34,440
Oh, absolutely.

376
00:20:34,440 --> 00:20:38,160
And I'm sure I'm going to list a few, but there are many more.

377
00:20:38,160 --> 00:20:45,120
And I think people are discovering even more every day.

378
00:20:45,120 --> 00:20:51,880
So one is likelihood of reciting code.

379
00:20:51,880 --> 00:20:54,880
There's something that we've implemented for Bard

380
00:20:54,880 --> 00:20:57,800
where if you generate code and a portion of it

381
00:20:57,800 --> 00:21:00,440
is verbatim identical to something that

382
00:21:00,440 --> 00:21:03,160
might be within a GitHub repo, we point you

383
00:21:03,160 --> 00:21:05,240
towards the GitHub repo and then also tell you

384
00:21:05,240 --> 00:21:06,600
what license it was under.

385
00:21:06,600 --> 00:21:08,760
So whether it was Apache 2 or MIT,

386
00:21:08,760 --> 00:21:11,520
which are very permissive, versus GPL,

387
00:21:11,520 --> 00:21:13,640
which is not permissive at all and is something

388
00:21:13,640 --> 00:21:16,600
that you would probably, you would not certainly

389
00:21:16,600 --> 00:21:19,520
want to use for your business.

390
00:21:19,520 --> 00:21:24,880
I think there are also questions about security vulnerabilities

391
00:21:24,880 --> 00:21:26,920
and performance.

392
00:21:26,920 --> 00:21:28,680
So if you generate code, ideally you

393
00:21:28,680 --> 00:21:30,720
would want it to be efficient.

394
00:21:30,720 --> 00:21:32,840
You wouldn't want it to be something

395
00:21:32,840 --> 00:21:36,360
that would take 10 or 100 times longer, more compute

396
00:21:36,360 --> 00:21:38,880
in order to execute.

397
00:21:38,880 --> 00:21:40,680
And you would also hopefully want

398
00:21:40,680 --> 00:21:44,320
the code that you generate to be consistent in terms

399
00:21:44,320 --> 00:21:46,680
of syntax and conventions with the code

400
00:21:46,680 --> 00:21:49,360
in your existing code bases.

401
00:21:49,360 --> 00:21:53,720
So all of these things are considerations

402
00:21:53,720 --> 00:21:58,880
that folks have to think about when implementing tools

403
00:21:58,880 --> 00:22:02,160
for their own users.

404
00:22:02,160 --> 00:22:05,000
And it's something that we certainly

405
00:22:05,000 --> 00:22:10,480
think very deeply about when orchestrating ML applied

406
00:22:10,480 --> 00:22:14,240
to software systems internally at Alphabet.

407
00:22:14,240 --> 00:22:15,960
Right.

408
00:22:15,960 --> 00:22:19,400
So for Bard, this week you guys announced

409
00:22:19,400 --> 00:22:21,840
that the underlying model, I guess,

410
00:22:21,840 --> 00:22:28,600
was upgraded from PaLM 1 to PaLM 2.

411
00:22:28,600 --> 00:22:31,080
What is it you think about PaLM 2 that makes it

412
00:22:31,080 --> 00:22:33,000
like a better base model?

413
00:22:33,000 --> 00:22:34,560
Is it more data?

414
00:22:34,560 --> 00:22:40,440
Is it whatever you can speak to about it?

415
00:22:40,440 --> 00:22:41,520
That's a great question.

416
00:22:41,520 --> 00:22:43,560
So the model that we upgraded from,

417
00:22:43,560 --> 00:22:45,880
it wasn't actually PaLM v1.

418
00:22:45,880 --> 00:22:51,560
And PaLM v1 was a very, very large model

419
00:22:51,560 --> 00:22:56,680
that there's a paper about it so folks can go

420
00:22:56,680 --> 00:22:58,640
read if they're curious.

421
00:22:58,640 --> 00:23:04,680
But it was not ever used in production for Alphabet,

422
00:23:04,680 --> 00:23:06,320
I don't believe.

423
00:23:06,320 --> 00:23:13,920
But the first model for Bard was a version of our LaMDA model.

424
00:23:13,920 --> 00:23:16,400
The model that we have upgraded to

425
00:23:16,400 --> 00:23:22,160
is a version of PaLM v2, which was announced on Wednesday.

426
00:23:22,160 --> 00:23:24,320
And some of the capabilities of it

427
00:23:24,320 --> 00:23:27,960
are that it's much, much better at code, at math, at reasoning.

428
00:23:27,960 --> 00:23:31,280
It's also much better at multilingual tasks.

429
00:23:31,280 --> 00:23:36,400
So PaLM v2 was trained on over 100 spoken word languages,

430
00:23:36,400 --> 00:23:39,800
dozens and dozens of computer programming languages.

431
00:23:39,800 --> 00:23:44,480
And as a result of that really robust and very diverse

432
00:23:44,480 --> 00:23:46,480
pre-training data mixture, it can

433
00:23:46,480 --> 00:23:49,240
do things like translate from one language to another.

434
00:23:49,240 --> 00:23:54,000
It can explain idioms and riddles.

435
00:23:54,000 --> 00:23:55,840
Even in different languages, it can

436
00:23:55,840 --> 00:23:58,560
translate from one programming language to another.

437
00:23:58,560 --> 00:24:02,720
It can tell you if code might be vulnerable

438
00:24:02,720 --> 00:24:04,720
or if it needs performance fixes.

439
00:24:04,720 --> 00:24:05,960
It can generate code.

440
00:24:05,960 --> 00:24:06,960
It can explain code.

441
00:24:12,360 --> 00:24:15,400
It can write mathematical proofs.

442
00:24:15,400 --> 00:24:17,440
It's just lots and lots of different things.

443
00:24:17,440 --> 00:24:20,480
We're discovering new uses for it every day.

444
00:24:20,480 --> 00:24:24,840
And one of the other most compelling features

445
00:24:24,840 --> 00:24:27,360
about this PaLM v2 family of models

446
00:24:27,360 --> 00:24:30,840
is that it comes in a broad variety of sizes.

447
00:24:30,840 --> 00:24:34,800
So everything from our smallest version, which

448
00:24:34,800 --> 00:24:37,640
can fit directly on a mobile device,

449
00:24:37,640 --> 00:24:43,120
to more modest sized versions that are still,

450
00:24:43,120 --> 00:24:45,320
despite being an order of magnitude

451
00:24:45,320 --> 00:24:48,760
or more tinier than the largest version,

452
00:24:48,760 --> 00:24:51,880
still preserving all of the capabilities

453
00:24:51,880 --> 00:25:00,320
and doing it just faster, more efficient, and cheaper.

454
00:25:00,320 --> 00:25:02,840
So it's definitely, from a business perspective,

455
00:25:02,840 --> 00:25:06,160
the PaLM v2 family makes a lot of sense.

456
00:25:06,160 --> 00:25:08,040
Yeah, that makes sense.

457
00:25:08,040 --> 00:25:10,440
Just to dig into the idea, because this

458
00:25:10,440 --> 00:25:12,960
is something that you see across a lot of large language

459
00:25:12,960 --> 00:25:20,120
models, you see Llama that has four maybe different sizes

460
00:25:20,120 --> 00:25:24,960
and obviously PaLM as well.

461
00:25:24,960 --> 00:25:26,840
So the reason for the different sizes,

462
00:25:26,840 --> 00:25:28,240
the different number of parameters,

463
00:25:28,240 --> 00:25:30,680
is it just to deal with the different trade-offs

464
00:25:30,680 --> 00:25:33,960
of where you're running the model, what your trade-off is

465
00:25:33,960 --> 00:25:35,920
for latency and performance?

466
00:25:35,920 --> 00:25:37,560
Is there anything else that goes into it?

467
00:25:41,320 --> 00:25:43,000
That's a great synopsis.

468
00:25:43,000 --> 00:25:46,640
So the smaller versions of the models

469
00:25:46,640 --> 00:25:49,760
make it easier to serve in a broad variety of locations.

470
00:25:49,760 --> 00:25:54,640
It also makes them much, much quicker at inference.

471
00:25:54,640 --> 00:25:57,400
And then the kinds of capabilities

472
00:25:57,400 --> 00:25:59,760
that we've been seeing from these smaller models that

473
00:25:59,760 --> 00:26:03,880
were open sourced is you can have very, very modest sized

474
00:26:03,880 --> 00:26:04,920
models.

475
00:26:04,920 --> 00:26:07,640
And as long as you fine tune or instruction

476
00:26:07,640 --> 00:26:10,760
tune on very high quality data sets,

477
00:26:10,760 --> 00:26:13,440
you're capable of getting the exact same performance

478
00:26:13,440 --> 00:26:15,740
that you would from a much larger model,

479
00:26:15,740 --> 00:26:20,160
despite it being faster, cheaper, easier to deploy.

480
00:26:20,160 --> 00:26:24,680
And I personally, I think that's the most interesting field

481
00:26:24,680 --> 00:26:27,120
of research right now is trying to take

482
00:26:27,120 --> 00:26:33,360
these highly capable models and make them more accessible

483
00:26:33,360 --> 00:26:37,200
for people all over the world, even if the only device

484
00:26:37,200 --> 00:26:40,200
that they have to work with is a mobile device.

485
00:26:40,200 --> 00:26:44,600
Yeah, that's something that I'm also really interested in.

486
00:26:44,600 --> 00:26:46,880
Training like teacher-student models

487
00:26:46,880 --> 00:26:50,880
and distilling information from these large language models

488
00:26:50,880 --> 00:26:55,080
to get them to be stored into smaller systems.

489
00:26:55,080 --> 00:26:58,520
Because yeah, it really depends on your use case.

490
00:26:58,520 --> 00:27:00,320
Sometimes there's just something really nice

491
00:27:00,320 --> 00:27:02,640
about being able to run it on your laptop.

492
00:27:02,640 --> 00:27:05,720
And a lot, I mean, almost all of these models,

493
00:27:05,720 --> 00:27:08,160
it's just like it's not even possible.

494
00:27:08,160 --> 00:27:10,360
You have to access it through an API

495
00:27:10,360 --> 00:27:14,360
or you have to access it through the interface

496
00:27:14,360 --> 00:27:15,520
that the company offers it.

497
00:27:15,520 --> 00:27:18,000
But yeah, I think that there's something really nice.

498
00:27:18,000 --> 00:27:21,080
Like even, I mean, I'm doing some of my work.

499
00:27:21,080 --> 00:27:23,720
Like I'm trying to get things that are like 400 megs down

500
00:27:23,720 --> 00:27:26,720
to 40 megs, just because it gives you the ability

501
00:27:26,720 --> 00:27:31,280
to like run maybe five or 10 times the amount of things

502
00:27:31,280 --> 00:27:35,360
in a much faster iteration time.

503
00:27:38,000 --> 00:27:41,640
So there's been some like dropping of some ideas

504
00:27:41,640 --> 00:27:45,000
about the new release, the new model

505
00:27:45,000 --> 00:27:48,200
that I believe it's DeepMind with Gemini.

506
00:27:49,440 --> 00:27:51,880
What can you tell us about Gemini?

507
00:27:52,880 --> 00:27:57,680
So many of the people from the PaLM v2 team

508
00:27:57,680 --> 00:28:00,000
are on the Gemini project, including myself.

509
00:28:00,980 --> 00:28:04,040
And we're still actively training the model.

510
00:28:04,040 --> 00:28:09,040
It's intended to be Alphabet's most compelling model,

511
00:28:09,040 --> 00:28:14,040
which kind of tracks with, every time we build a model,

512
00:28:15,040 --> 00:28:19,000
we hope that it is a super set of the capabilities

513
00:28:19,000 --> 00:28:20,400
of the models that preceded it.

514
00:28:21,360 --> 00:28:24,560
But Gemini is very special in that it was built

515
00:28:24,560 --> 00:28:27,760
from kind of the ground up to be multimodal.

516
00:28:27,760 --> 00:28:30,840
So we're already seeing multimodal features

517
00:28:30,840 --> 00:28:35,200
in the first versions of the model that we've trained.

518
00:28:35,200 --> 00:28:40,200
And we're anticipating that it will be in production

519
00:28:40,200 --> 00:28:42,720
very quickly, or at least the smaller versions of it

520
00:28:42,720 --> 00:28:45,240
will be in production quite quickly.

521
00:28:45,240 --> 00:28:50,240
But I guess the only thing that I can say specifically

522
00:28:51,120 --> 00:28:53,400
is stay tuned.

523
00:28:53,400 --> 00:28:54,440
We're very excited.

524
00:28:54,440 --> 00:28:56,440
It's the first model that Google DeepMind

525
00:28:56,440 --> 00:28:58,600
has trained kind of jointly together.

526
00:28:59,600 --> 00:29:03,320
And it should be particularly compelling

527
00:29:03,320 --> 00:29:06,480
for not just the text and the code use cases,

528
00:29:06,480 --> 00:29:09,040
but also the multimodal use cases.

529
00:29:09,040 --> 00:29:11,560
Yeah, that's very exciting.

530
00:29:11,560 --> 00:29:13,120
I'm looking forward to it.

531
00:29:14,000 --> 00:29:16,160
In regards to multimodal,

532
00:29:16,160 --> 00:29:20,000
what multimodal use cases are you most excited for?

533
00:29:21,120 --> 00:29:26,120
Yeah, so multimodal, I love this idea

534
00:29:26,800 --> 00:29:31,800
that you can have audio, video, images, text, code,

535
00:29:31,800 --> 00:29:36,800
as inputs, including many of them

536
00:29:36,800 --> 00:29:39,320
kind of being interspersed together.

537
00:29:39,320 --> 00:29:43,600
And then kind of define what your output should be,

538
00:29:43,600 --> 00:29:47,600
either by, you might get some text as output

539
00:29:47,600 --> 00:29:48,760
from your original model,

540
00:29:48,760 --> 00:29:50,720
and then you stack a diffusion model on top

541
00:29:50,720 --> 00:29:53,120
such that you can generate an image back out,

542
00:29:53,120 --> 00:29:56,880
or you can generate audio or video.

543
00:29:56,880 --> 00:30:00,240
But some of the use cases that I'm most excited about

544
00:30:00,240 --> 00:30:04,240
for multimodal models is that you can imagine,

545
00:30:04,920 --> 00:30:07,760
say you're taking a physics course,

546
00:30:07,760 --> 00:30:11,760
and you just can't nail a concept.

547
00:30:11,760 --> 00:30:14,480
Like you just, for whatever reason,

548
00:30:14,480 --> 00:30:17,400
like angular velocity just isn't clicking.

549
00:30:17,400 --> 00:30:22,400
And you could easily just ask, like, hey,

550
00:30:22,640 --> 00:30:25,640
I really, what I would love to have is,

551
00:30:25,640 --> 00:30:30,640
like a video that explains angular velocity to me.

552
00:30:30,920 --> 00:30:34,920
And I really want it to just have like cats.

553
00:30:34,920 --> 00:30:38,480
And then I also want you to have like a quiz

554
00:30:38,480 --> 00:30:42,480
to check my comprehension every like one minute of the video.

555
00:30:42,480 --> 00:30:45,920
I don't want this video to be longer than four minutes.

556
00:30:45,920 --> 00:30:49,920
And then I also want to have like an outline at the end

557
00:30:49,920 --> 00:30:53,920
that explains what the concepts were in the video.

558
00:30:53,920 --> 00:30:55,720
Also with images.

559
00:30:55,720 --> 00:30:57,760
And that's something that a multimodal model

560
00:30:57,760 --> 00:30:59,320
would be able to support.

561
00:31:00,160 --> 00:31:04,680
So just being able to do something like that is huge.

562
00:31:05,560 --> 00:31:10,560
And it also brings about this world of,

563
00:31:10,840 --> 00:31:15,840
super, super tailored custom materials for folks.

564
00:31:16,640 --> 00:31:19,480
Like you could imagine each kid gets

565
00:31:19,480 --> 00:31:22,040
a new bedtime story every night.

566
00:31:22,040 --> 00:31:25,360
That's complete with, you know, a new story,

567
00:31:25,360 --> 00:31:27,640
a new adventure and new images.

568
00:31:27,640 --> 00:31:30,400
It's like the, I'm not sure if you've ever read

569
00:31:30,400 --> 00:31:32,640
The Diamond Age by Neal Stephenson,

570
00:31:32,640 --> 00:31:35,560
but it's like having the primer just kind of like

571
00:31:35,560 --> 00:31:38,440
available for every person.

572
00:31:38,440 --> 00:31:41,320
And that's, it's really energizing.

573
00:31:41,320 --> 00:31:42,320
That's awesome.

574
00:31:42,320 --> 00:31:45,120
Yeah, I think that that's such a cool use case

575
00:31:45,120 --> 00:31:48,360
because, yeah, everyone's unique.

576
00:31:48,360 --> 00:31:50,560
Everyone learns in their own, in their own ways.

577
00:31:50,560 --> 00:31:55,560
People respond to different mediums, you know, differently.

578
00:31:55,560 --> 00:31:59,560
So being able to tailor the learning approach,

579
00:31:59,560 --> 00:32:01,560
that, yeah, that could change,

580
00:32:01,560 --> 00:32:04,080
that can really change the way that we learn,

581
00:32:04,080 --> 00:32:05,400
the way that we interact,

582
00:32:05,400 --> 00:32:08,600
the way that we interact with the different, you know,

583
00:32:08,600 --> 00:32:09,960
material that we're trying to learn.

584
00:32:09,960 --> 00:32:11,560
That's really cool.

585
00:32:11,560 --> 00:32:15,000
Yeah, I saw a tweet the other day

586
00:32:15,000 --> 00:32:18,360
that someone with the, with the,

587
00:32:18,360 --> 00:32:21,760
that someone with the, with the learning disability,

588
00:32:21,760 --> 00:32:25,720
they didn't share what their learning disability was,

589
00:32:25,720 --> 00:32:29,200
but they mentioned that interacting

590
00:32:29,200 --> 00:32:34,200
with these generative models has sort of, you know,

591
00:32:35,120 --> 00:32:37,400
made it that their learning disability

592
00:32:37,400 --> 00:32:40,480
isn't hindering their life as much anymore.

593
00:32:40,480 --> 00:32:43,480
You know, they're able to, you know,

594
00:32:43,480 --> 00:32:48,000
instead of having to attempt to digest all of this content,

595
00:32:48,000 --> 00:32:52,160
even though it's not architected in a way that is,

596
00:32:52,160 --> 00:32:55,600
that's optimal for their learning style,

597
00:32:55,600 --> 00:32:57,960
they're capable of working with generative models

598
00:32:57,960 --> 00:33:00,600
to consume the information in a different way.

599
00:33:00,600 --> 00:33:03,760
And it clicks, and it's the first time in their life

600
00:33:03,760 --> 00:33:06,280
that they had said that it felt like that.

601
00:33:06,280 --> 00:33:08,080
And that's huge, right?

602
00:33:08,080 --> 00:33:12,600
Like being able to, being able to, you know,

603
00:33:12,600 --> 00:33:14,560
unlock the joy of learning for people

604
00:33:14,560 --> 00:33:16,720
who had previously been frustrated.

605
00:33:16,720 --> 00:33:19,880
That's opening up the world to so many more creators

606
00:33:19,880 --> 00:33:22,240
and so many more potential engineers

607
00:33:22,240 --> 00:33:25,800
and folks that, you know, can contribute.

608
00:33:26,760 --> 00:33:27,960
Yeah, absolutely.

609
00:33:27,960 --> 00:33:30,200
That's so rewarding.

610
00:33:30,200 --> 00:33:32,160
It's so nice to hear, you know,

611
00:33:32,160 --> 00:33:35,760
especially now with all of the people that are sort of

612
00:33:35,760 --> 00:33:37,800
bringing up all of these negative

613
00:33:37,800 --> 00:33:40,320
potential future use cases for AI,

614
00:33:40,320 --> 00:33:42,960
but to know that there are all of these use cases

615
00:33:42,960 --> 00:33:46,640
where it can help level the playing field

616
00:33:46,640 --> 00:33:51,160
you know, in some ways and open up opportunities for people.

617
00:33:51,160 --> 00:33:55,200
You know, things are changing, things are changing rapidly.

618
00:33:56,320 --> 00:33:58,880
And machine learning does have this tendency

619
00:33:58,880 --> 00:34:02,000
to perpetuate a lot of the things from the past,

620
00:34:02,960 --> 00:34:05,800
but it's nice to know that there are times

621
00:34:05,800 --> 00:34:08,880
when it can actually increase accessibility.

622
00:34:08,880 --> 00:34:12,560
So that's a nice use case.

623
00:34:12,560 --> 00:34:14,280
One of my favorite stories,

624
00:34:14,280 --> 00:34:16,840
one of my favorite stories from the Copilot days

625
00:34:16,840 --> 00:34:19,800
were somebody, usually on GitHub,

626
00:34:19,800 --> 00:34:23,040
when somebody files an issue, it's like not a fun thing.

627
00:34:23,040 --> 00:34:25,080
You know, it's like, man, this is broken,

628
00:34:25,080 --> 00:34:28,760
or like, I'm confused or whatever.

629
00:34:28,760 --> 00:34:29,840
But there was one person

630
00:34:29,840 --> 00:34:32,640
whenever we had first released Copilot

631
00:34:32,640 --> 00:34:37,640
who wrote and said, you know, like, I have tremors,

632
00:34:37,640 --> 00:34:40,680
like, and they've gotten so bad recently

633
00:34:40,680 --> 00:34:45,680
that, you know, I was forced to, you know, stop doing my job

634
00:34:45,800 --> 00:34:48,240
and this person had been a software engineer.

635
00:34:49,440 --> 00:34:53,560
And they said, you know, you've introduced this kind of like

636
00:34:53,560 --> 00:34:57,200
speech to code, basically,

637
00:34:58,200 --> 00:35:01,200
generation features within the IDE.

638
00:35:01,200 --> 00:35:02,720
And, you know, it's made it

639
00:35:02,720 --> 00:35:04,720
so that I can actually build software again.

640
00:35:04,720 --> 00:35:07,480
And I never thought that I would be able to do that.

641
00:35:07,480 --> 00:35:10,960
And that is also, you know, not late.

642
00:35:10,960 --> 00:35:14,320
That's the kind of thing that makes you delighted

643
00:35:14,320 --> 00:35:16,320
to come to work every day, I think,

644
00:35:16,320 --> 00:35:21,320
is, you know, the potential of making it

645
00:35:21,560 --> 00:35:24,520
so that people can do the things that they love,

646
00:35:24,520 --> 00:35:27,440
even though they might be physically limited in some way.

647
00:35:27,440 --> 00:35:28,280
Right.

648
00:35:28,280 --> 00:35:31,320
Yeah, knowing that you're having a positive impact

649
00:35:31,320 --> 00:35:32,640
on that person, I mean, who knows?

650
00:35:32,640 --> 00:35:34,960
There's probably many other people that are also, you know,

651
00:35:34,960 --> 00:35:38,120
getting that positive impact.

652
00:35:38,120 --> 00:35:41,200
So yeah, that it's definitely delightful.

653
00:35:41,200 --> 00:35:42,960
It's rewarding to hear, you know,

654
00:35:42,960 --> 00:35:44,840
working in this field.

655
00:35:47,080 --> 00:35:48,680
Thanks for geeking out with me,

656
00:35:48,680 --> 00:35:51,920
talking about large language models and all of that.

657
00:35:51,920 --> 00:35:54,520
But I wanna switch and zoom out

658
00:35:54,520 --> 00:35:58,440
into just the machine learning field in general.

659
00:36:00,480 --> 00:36:04,120
So what do you believe is an important question

660
00:36:04,120 --> 00:36:07,160
that remains unanswered in machine learning?

661
00:36:08,520 --> 00:36:10,040
There's so many, right?

662
00:36:10,040 --> 00:36:15,040
And I think one of the questions that I'm most interested in

663
00:36:16,280 --> 00:36:21,280
is how do kind of the pre-training data mixtures

664
00:36:22,840 --> 00:36:25,240
for large language models,

665
00:36:25,240 --> 00:36:29,560
how do they impact performance on downstream tasks?

666
00:36:29,560 --> 00:36:31,440
And this is an unsolved problem.

667
00:36:31,440 --> 00:36:32,720
There's this notion of like,

668
00:36:32,720 --> 00:36:35,000
oh, well, I wanna do code stuff.

669
00:36:35,000 --> 00:36:37,600
So like, perhaps I should have more code

670
00:36:39,000 --> 00:36:41,120
or I wanna do multilingual stuff.

671
00:36:41,120 --> 00:36:44,760
So perhaps I should have more multilingual data,

672
00:36:44,760 --> 00:36:47,120
but nobody knows how much,

673
00:36:47,120 --> 00:36:49,720
nobody knows how much quality impacts

674
00:36:49,720 --> 00:36:52,680
that performance for data.

675
00:36:52,680 --> 00:36:55,960
Nobody knows like how the data should be structured

676
00:36:55,960 --> 00:36:59,080
or formatted or if it should be included

677
00:36:59,080 --> 00:37:00,920
in a broad variety of ways.

678
00:37:00,920 --> 00:37:03,520
Like if you want to predict edits,

679
00:37:03,520 --> 00:37:06,520
perhaps you should have like code diffs, right?

680
00:37:06,520 --> 00:37:10,840
Like instead of just the source code itself.

681
00:37:10,840 --> 00:37:14,360
So all of this experimentation

682
00:37:14,360 --> 00:37:17,000
ends up being pretty expensive, right?

683
00:37:17,000 --> 00:37:22,000
Like, and people end up taking like really expensive risks.

684
00:37:22,240 --> 00:37:25,080
So for PaLM v2 as an example,

685
00:37:26,000 --> 00:37:28,720
the input pre-training data mixture

686
00:37:28,720 --> 00:37:31,520
included an awful lot of multilingual tokens,

687
00:37:31,520 --> 00:37:34,440
which meant that the number of English tokens

688
00:37:34,440 --> 00:37:35,400
was much lower.

689
00:37:37,240 --> 00:37:40,360
And there was a risk from the team that like,

690
00:37:40,360 --> 00:37:42,240
perhaps we'll train this model,

691
00:37:42,240 --> 00:37:43,840
this like super expensive,

692
00:37:43,840 --> 00:37:45,760
like millions of dollars a model,

693
00:37:46,600 --> 00:37:48,960
and then we'll get something

694
00:37:48,960 --> 00:37:51,360
and it won't be as good at English tasks

695
00:37:51,360 --> 00:37:53,600
as it is on all of these other things.

696
00:37:53,600 --> 00:37:54,680
Right.

697
00:37:54,680 --> 00:37:57,800
And that was a risk that the team took.

698
00:37:57,800 --> 00:38:01,720
And it ended up paying off because the performance

699
00:38:01,720 --> 00:38:04,400
ended up being better actually

700
00:38:04,400 --> 00:38:07,960
across all of the task ranges.

701
00:38:07,960 --> 00:38:11,360
But that could have been something

702
00:38:11,360 --> 00:38:14,600
that turned out very differently.

703
00:38:14,600 --> 00:38:19,400
And I think for people who are interested

704
00:38:19,400 --> 00:38:21,040
in the large language model space

705
00:38:21,040 --> 00:38:22,720
and the deep learning space,

706
00:38:22,720 --> 00:38:26,480
understanding how the data choices that you make,

707
00:38:26,480 --> 00:38:28,920
and then also the quality of the data that you make

708
00:38:28,920 --> 00:38:33,160
impacts how your model performs

709
00:38:33,160 --> 00:38:36,440
is really, really compelling and important.

710
00:38:36,440 --> 00:38:39,160
And then also how does that,

711
00:38:39,160 --> 00:38:42,720
like the pre-training phase

712
00:38:42,720 --> 00:38:46,000
and the attention that you pay in pre-training,

713
00:38:46,000 --> 00:38:51,000
how is that compared to like downstream fine tuning

714
00:38:52,640 --> 00:38:53,840
and instruction tuning?

715
00:38:53,840 --> 00:38:55,520
Because we're increasingly seeing

716
00:38:55,520 --> 00:38:59,520
that the fine-tuning, instruction-tuning, RLHF portions

717
00:38:59,520 --> 00:39:01,960
are much more impactful and move the needle

718
00:39:01,960 --> 00:39:05,120
much, much more than everything else.

719
00:39:05,120 --> 00:39:05,960
Right.

720
00:39:05,960 --> 00:39:08,280
What's your intuition on that?

721
00:39:08,280 --> 00:39:10,880
Why do you think it has such a big effect?

722
00:39:12,200 --> 00:39:16,720
So I think it's because it helps the model focus,

723
00:39:16,720 --> 00:39:20,480
which is not a technical way to describe it at all,

724
00:39:20,480 --> 00:39:23,360
but I think it's intuitively maybe a little bit easier

725
00:39:23,360 --> 00:39:24,520
to understand, right?

726
00:39:24,520 --> 00:39:29,240
Like the model, you initially train it on a lot of data,

727
00:39:29,240 --> 00:39:31,160
like the entire internet, right?

728
00:39:31,160 --> 00:39:33,640
Or a lot of data.

729
00:39:34,840 --> 00:39:38,320
And so it's capable of doing a broad variety of things.

730
00:39:38,320 --> 00:39:40,520
Whenever you instruction tune it or fine tune it

731
00:39:40,520 --> 00:39:44,000
or RLHF it, you help it kind of focus

732
00:39:44,000 --> 00:39:47,000
on the kinds of questions that you're really interested

733
00:39:47,000 --> 00:39:48,840
in answering or the kind of format

734
00:39:48,840 --> 00:39:50,840
that you would really, really like to see.

735
00:39:50,840 --> 00:39:55,840
Like say, your model is giving relatively short outputs

736
00:39:57,560 --> 00:40:01,080
and you want them to be a little bit more long form.

737
00:40:01,080 --> 00:40:04,480
You can kind of tune that with RLHF

738
00:40:04,480 --> 00:40:09,480
such that your model is kind of rewarded

739
00:40:10,320 --> 00:40:14,040
whenever it outputs longer context things

740
00:40:14,040 --> 00:40:15,800
as opposed to shorter things.

741
00:40:17,240 --> 00:40:19,640
Right, so you can kind of tune it

742
00:40:19,640 --> 00:40:23,040
to how you want those outputs to be.

743
00:40:23,040 --> 00:40:24,480
Yeah.

744
00:40:24,480 --> 00:40:26,720
One of the things that you mentioned before,

745
00:40:26,720 --> 00:40:29,760
it made me think about like, well, two things.

746
00:40:29,760 --> 00:40:33,160
I guess how there are these different waves

747
00:40:33,160 --> 00:40:36,240
that have happened, I view in machine learning.

748
00:40:36,240 --> 00:40:38,800
Like I think there was a time a couple of years ago

749
00:40:38,800 --> 00:40:42,160
where it was like, you need to have a fine-tuned model

750
00:40:42,160 --> 00:40:43,440
on this task, right?

751
00:40:43,440 --> 00:40:48,120
And now with all of the advances in generative models,

752
00:40:48,120 --> 00:40:50,920
it's like, oh, maybe there's one model

753
00:40:50,920 --> 00:40:54,400
that can kind of answer all of these questions.

754
00:40:54,400 --> 00:40:56,880
But there still is this question about,

755
00:40:56,880 --> 00:41:01,040
can we use that generative model to then point you

756
00:41:01,040 --> 00:41:03,640
in the right direction of the best tool

757
00:41:03,640 --> 00:41:06,120
to use to solve your problem?

758
00:41:07,600 --> 00:41:12,040
I guess, what's your take on that

759
00:41:12,040 --> 00:41:15,240
and how have you viewed that sort of transition

760
00:41:15,240 --> 00:41:17,200
between having a generative model

761
00:41:17,200 --> 00:41:19,600
that can kind of solve many problems

762
00:41:19,600 --> 00:41:24,600
versus fine-tuned models that are for specific tasks?

763
00:41:24,760 --> 00:41:25,600
Yeah.

764
00:41:25,600 --> 00:41:28,960
So we found just anecdotally,

765
00:41:30,960 --> 00:41:35,080
so initially for some of our large models

766
00:41:36,680 --> 00:41:40,640
that perhaps weren't pre-trained on as much source code,

767
00:41:41,640 --> 00:41:45,320
we did have to have fine tuned versions

768
00:41:45,320 --> 00:41:48,080
of those models using source code.

769
00:41:48,080 --> 00:41:52,320
And then for future iterations of the larger models,

770
00:41:52,320 --> 00:41:55,200
we included all of the tokens that we had used

771
00:41:55,200 --> 00:41:56,840
during the fine tuning process

772
00:41:56,840 --> 00:41:59,280
within the pre-training data mixture.

773
00:41:59,280 --> 00:42:03,600
And the resulting model exceeded the performance

774
00:42:03,600 --> 00:42:08,600
of the fine tuned model based on those kinds of inclusions

775
00:42:09,680 --> 00:42:12,680
and then also using a tokenizer

776
00:42:12,680 --> 00:42:15,240
that was a little bit friendlier for code.

777
00:42:15,240 --> 00:42:20,240
So practically we've seen that the generative models

778
00:42:22,280 --> 00:42:26,920
provided you keep adding in high quality pre-training data,

779
00:42:26,920 --> 00:42:30,440
they can sort of absorb the tasks

780
00:42:30,440 --> 00:42:32,200
from the fine tuned models.

781
00:42:33,320 --> 00:42:37,080
But I also will say that for businesses,

782
00:42:38,760 --> 00:42:42,280
whatever solves your problem,

783
00:42:42,280 --> 00:42:45,240
like that's the thing that you should use.

784
00:42:45,240 --> 00:42:50,240
And Lord knows the world is like running on Excel spreadsheets

785
00:42:50,400 --> 00:42:53,120
for much of the finance sector.

786
00:42:53,120 --> 00:42:55,920
And the government is running on COBOL.

787
00:42:55,920 --> 00:43:00,200
And like the cost of migrating off of either of those things

788
00:43:00,200 --> 00:43:01,600
is just overwhelming.

789
00:43:01,600 --> 00:43:03,280
And I don't wanna be the one causing

790
00:43:03,280 --> 00:43:05,800
like the financial downturn of like

791
00:43:05,800 --> 00:43:07,440
telling the finance community

792
00:43:07,440 --> 00:43:09,760
that they shouldn't be using Excel spreadsheets

793
00:43:09,760 --> 00:43:12,680
or basic or whatever it is.

794
00:43:12,680 --> 00:43:17,680
So I would say that whatever modeling meets your needs

795
00:43:21,960 --> 00:43:23,720
is what you should use.

796
00:43:23,720 --> 00:43:27,080
And certainly experiment with generative models,

797
00:43:27,080 --> 00:43:29,240
see if it makes sense.

798
00:43:29,240 --> 00:43:31,680
And then also see if it makes financial sense.

799
00:43:31,680 --> 00:43:34,520
Cause if you can get by with a smaller model

800
00:43:34,520 --> 00:43:38,320
that perhaps is open source and easier to maintain,

801
00:43:38,320 --> 00:43:42,280
then why should you be paying for API calls?

802
00:43:44,720 --> 00:43:47,360
But that's personal opinion.

803
00:43:47,360 --> 00:43:51,120
Like new technology is always going to be very cool

804
00:43:51,120 --> 00:43:52,200
and push boundaries.

805
00:43:52,200 --> 00:43:56,520
But at the end of the day, what matters is your business,

806
00:43:56,520 --> 00:43:58,520
your users and their problems.

807
00:43:59,800 --> 00:44:01,080
Yeah, absolutely.

808
00:44:01,080 --> 00:44:03,560
I agree 100%.

809
00:44:03,560 --> 00:44:06,280
Identifying the problem, the underlying need,

810
00:44:06,280 --> 00:44:10,240
knowing that there's many approaches.

811
00:44:10,240 --> 00:44:15,240
I view that generative models are one tool in your toolset.

812
00:44:18,600 --> 00:44:22,360
You don't necessarily need to use it for everything.

813
00:44:22,360 --> 00:44:25,320
It has a lot of amazing use cases, but yeah.

814
00:44:27,040 --> 00:44:31,320
Certain business practices like you were mentioning are,

815
00:44:32,360 --> 00:44:33,720
I'll say stuck in their,

816
00:44:33,720 --> 00:44:35,400
stuck in their certain ways.

817
00:44:35,400 --> 00:44:39,440
And we're moving towards this new future,

818
00:44:39,440 --> 00:44:43,640
but everyone's not ready for the leap over the canyon.

819
00:44:46,880 --> 00:44:47,700
I don't know.

820
00:44:47,700 --> 00:44:49,840
There's a place, there's a space

821
00:44:49,840 --> 00:44:52,080
and we have to get people over.

822
00:44:52,080 --> 00:44:57,080
And some people will take smaller steps than others.

823
00:44:57,400 --> 00:45:01,840
Yeah, and there's also like some appetite for risk.

824
00:45:01,840 --> 00:45:02,960
I think you have to have

825
00:45:02,960 --> 00:45:04,960
if you're preferring generative models,

826
00:45:04,960 --> 00:45:09,960
just given that we're still exploring factuality

827
00:45:10,560 --> 00:45:12,320
and being able to verify

828
00:45:12,320 --> 00:45:14,860
that models are returning correct responses.

829
00:45:14,860 --> 00:45:16,960
Cause a lot of the time they don't.

830
00:45:16,960 --> 00:45:21,960
Like there are always examples towards like counterfactuals.

831
00:45:24,920 --> 00:45:29,920
But the one thing that I saw on Twitter earlier this week,

832
00:45:30,480 --> 00:45:32,240
cause all of these generative models things

833
00:45:32,240 --> 00:45:34,880
seem to be percolating to Twitter.

834
00:45:34,880 --> 00:45:38,880
But someone instead of like consulting

835
00:45:38,880 --> 00:45:41,960
with a financial analyst, they were like, I have some money

836
00:45:41,960 --> 00:45:43,840
and I wanna invest it in some stocks,

837
00:45:43,840 --> 00:45:46,440
like recommend some stocks that I should buy.

838
00:45:46,440 --> 00:45:49,980
And they got ChatGPT to do that.

839
00:45:49,980 --> 00:45:52,200
And then they purchased the stock.

840
00:45:52,200 --> 00:45:54,800
They kind of mapped it out over time

841
00:45:54,800 --> 00:45:56,120
as to what would have happened

842
00:45:56,120 --> 00:45:59,800
if they followed the portfolio recommendation

843
00:45:59,800 --> 00:46:04,120
versus the buy some stock recommendation.

844
00:46:04,120 --> 00:46:09,120
And the ChatGPT recommended stock purchases

845
00:46:09,440 --> 00:46:13,380
or the generative model recommended stock purchases

846
00:46:13,380 --> 00:46:15,440
actually ended up doing quite, quite well

847
00:46:15,440 --> 00:46:20,440
compared to the kind of investment portfolio recommendations.

848
00:46:21,120 --> 00:46:22,720
But all I could think was just like,

849
00:46:22,720 --> 00:46:25,760
I don't want my 401k to be at the mercy

850
00:46:25,760 --> 00:46:28,420
of a generative model, at least not now.

851
00:46:28,420 --> 00:46:32,000
Like it's, and it might be that perhaps

852
00:46:32,000 --> 00:46:36,600
these investment strategists are recommending certain things

853
00:46:36,600 --> 00:46:39,500
because they know that it's more stable

854
00:46:39,500 --> 00:46:44,120
than perhaps these stocks that might be in the near term,

855
00:46:45,240 --> 00:46:47,040
high performers, but long-term,

856
00:46:47,040 --> 00:46:52,040
like much more variable in terms of return.

857
00:46:52,880 --> 00:46:56,040
So it's like choose your own adventure

858
00:46:56,040 --> 00:47:01,040
and be very careful for betting all of your money

859
00:47:01,040 --> 00:47:02,980
on generative models.

860
00:47:02,980 --> 00:47:07,320
Yeah, I mean, yeah, in terms of financial decisions,

861
00:47:07,320 --> 00:47:08,980
I think, yeah, everyone's sort of like,

862
00:47:08,980 --> 00:47:11,300
how much risk are you willing to take?

863
00:47:11,300 --> 00:47:12,780
And then there should be like another step,

864
00:47:12,780 --> 00:47:14,260
like, are you willing to take the risk

865
00:47:14,260 --> 00:47:17,140
of using a generative model for your financial portfolio?

866
00:47:17,140 --> 00:47:19,140
Maybe some people will say yes, who knows?

867
00:47:22,120 --> 00:47:24,340
Okay, so switching into like

868
00:47:25,180 --> 00:47:28,020
Learning from Machine Learning standpoint

869
00:47:28,020 --> 00:47:33,020
and just sort of general advice for people in the industry.

870
00:47:35,140 --> 00:47:36,680
Well, I'll start with this one.

871
00:47:37,660 --> 00:47:40,140
Who are some people in the machine learning field

872
00:47:40,140 --> 00:47:41,480
that influence you?

873
00:47:43,740 --> 00:47:47,260
So there have been so many,

874
00:47:48,500 --> 00:47:51,220
I think, and I'm fortunate to work

875
00:47:51,220 --> 00:47:53,620
with many of them every day.

876
00:47:53,620 --> 00:47:58,620
So I've been coming from kind of the open source tools,

877
00:48:00,420 --> 00:48:02,020
developer tool space.

878
00:48:02,020 --> 00:48:06,040
I've really loved working with the JAX team.

879
00:48:06,040 --> 00:48:08,620
So JAX is for folks who might not know,

880
00:48:08,620 --> 00:48:13,140
it's an open source library for doing,

881
00:48:14,060 --> 00:48:15,940
kind of building deep learning models,

882
00:48:15,940 --> 00:48:20,940
but also doing kind of scientific experimentation

883
00:48:20,940 --> 00:48:23,900
and all of the models that we build at Alphabet

884
00:48:23,900 --> 00:48:25,400
are built using JAX.

885
00:48:26,820 --> 00:48:30,940
Matt Johnson, Peter Hawkins, James Bradbury,

886
00:48:30,940 --> 00:48:33,620
like they are all, Skye Wanderman-Milne,

887
00:48:33,620 --> 00:48:35,540
they're all very, very close collaborators,

888
00:48:35,540 --> 00:48:39,060
Roy Frostig as well, Yash Katariya,

889
00:48:39,060 --> 00:48:42,840
and they've been delightful to work with and to learn from.

890
00:48:42,840 --> 00:48:47,840
I've also really, really loved working with Jeff Dean,

891
00:48:54,120 --> 00:48:58,840
as well as the collaborators at DeepMind.

892
00:48:58,840 --> 00:49:00,920
So as part of this Gemini effort,

893
00:49:00,920 --> 00:49:05,920
I've gotten to work more closely with Oriol Vinyals,

894
00:49:06,040 --> 00:49:10,160
who's kind of driving much of the efforts.

895
00:49:10,160 --> 00:49:13,520
And then also, of course, the learning for code folks,

896
00:49:14,520 --> 00:49:18,080
who care very deeply about machine learning

897
00:49:18,080 --> 00:49:19,760
as applied to software systems

898
00:49:19,760 --> 00:49:23,100
and think very carefully and thoughtfully about it.

899
00:49:23,100 --> 00:49:26,560
And if you're on Twitter,

900
00:49:26,560 --> 00:49:29,600
I would also highly recommend following

901
00:49:29,600 --> 00:49:31,920
the Hugging Face team,

902
00:49:31,920 --> 00:49:34,160
because they have a lot of passion

903
00:49:34,160 --> 00:49:36,600
around open source machine learning and deep learning

904
00:49:36,600 --> 00:49:38,120
and sharing their knowledge,

905
00:49:38,120 --> 00:49:40,800
as well as Andrej Karpathy,

906
00:49:40,800 --> 00:49:45,280
who is a dear, sweet human just generally,

907
00:49:45,280 --> 00:49:49,000
but also cares a lot about education and advocacy

908
00:49:49,000 --> 00:49:51,000
and making sure that these concepts

909
00:49:51,000 --> 00:49:53,560
are understandable to everyday humans.

910
00:49:54,520 --> 00:49:59,520
So there are a lot of people

911
00:50:01,320 --> 00:50:02,680
to admire in this community.

912
00:50:02,680 --> 00:50:04,440
I feel very fortunate.

913
00:50:04,440 --> 00:50:09,080
Yeah, absolutely. Yeah, there's so many great people

914
00:50:09,080 --> 00:50:12,720
in the field and people are pretty generous

915
00:50:12,720 --> 00:50:13,760
with their time also.

916
00:50:13,760 --> 00:50:18,760
And there's this sense of wanting to help other people out.

917
00:50:19,940 --> 00:50:22,440
And it's a really nice community

918
00:50:22,440 --> 00:50:24,440
to be in the machine learning community.

919
00:50:26,080 --> 00:50:30,560
On your website, which is nicely named

920
00:50:30,560 --> 00:50:32,800
DynamicWebPaige, right?

921
00:50:32,800 --> 00:50:33,640
Is that?

922
00:50:33,640 --> 00:50:37,800
Oh, so my website,

923
00:50:37,800 --> 00:50:42,800
I have a tendency to purchase URLs that are puns.

924
00:50:43,520 --> 00:50:47,880
And I think probably most of us in this industry

925
00:50:47,880 --> 00:50:50,480
are like URL hoarders,

926
00:50:50,480 --> 00:50:54,400
but the one that might have,

927
00:50:55,600 --> 00:50:57,920
I think the URL might have even expired,

928
00:50:57,920 --> 00:51:01,960
but Paige Views is the one.

929
00:51:01,960 --> 00:51:04,400
Okay, so under a section that I saw

930
00:51:04,400 --> 00:51:06,680
on one of your websites, it was under Paige Views.

931
00:51:06,680 --> 00:51:10,840
You had some really interesting tenets.

932
00:51:10,840 --> 00:51:12,800
I'm gonna read them if you don't mind.

933
00:51:12,800 --> 00:51:14,200
Yep. Yep.

934
00:51:14,200 --> 00:51:17,960
So bring data to opinion fights,

935
00:51:17,960 --> 00:51:20,760
relentlessly ask questions,

936
00:51:20,760 --> 00:51:24,800
communication is everything, especially in open source,

937
00:51:24,800 --> 00:51:27,160
choose growth opportunities,

938
00:51:27,160 --> 00:51:29,820
nurture and build communities.

939
00:51:29,820 --> 00:51:31,200
If you don't have documentation,

940
00:51:31,200 --> 00:51:33,240
you don't have an MVP,

941
00:51:33,240 --> 00:51:35,360
give without expecting a return

942
00:51:35,360 --> 00:51:38,280
and believe in people, not acronyms.

943
00:51:38,280 --> 00:51:39,400
I love these.

944
00:51:39,400 --> 00:51:41,440
I love them so much.

945
00:51:41,440 --> 00:51:42,280
Thank you.

946
00:51:42,280 --> 00:51:43,400
Are there any that you,

947
00:51:43,400 --> 00:51:45,560
is there anything that you would add to any of them

948
00:51:45,560 --> 00:51:48,600
or anything that you wanna talk about with any of those?

949
00:51:48,600 --> 00:51:53,600
So those, I think I put down on paper

950
00:51:54,800 --> 00:51:56,440
about half a decade ago

951
00:51:56,440 --> 00:52:01,440
and they still seem like pretty good life rules.

952
00:52:01,520 --> 00:52:06,520
I think now, especially the believe in people,

953
00:52:06,520 --> 00:52:11,520
not acronyms, one is going to become even more compelling

954
00:52:13,440 --> 00:52:14,960
and important.

955
00:52:14,960 --> 00:52:18,120
So COVID kind of turned everything on its head, right?

956
00:52:18,120 --> 00:52:22,940
Like I personally, like it helped prioritize things for me.

957
00:52:22,940 --> 00:52:24,880
It was part of the reason why I left Alphabet.

958
00:52:24,880 --> 00:52:26,680
Like I wanted to move back to Texas

959
00:52:26,680 --> 00:52:28,000
to take care of my parents.

960
00:52:28,000 --> 00:52:33,000
They're like 70 plus, 80 plus.

961
00:52:34,880 --> 00:52:39,880
And I think a lot of other people also,

962
00:52:40,480 --> 00:52:42,320
it helped them prioritize

963
00:52:42,320 --> 00:52:45,520
what they should be spending their time on.

964
00:52:45,520 --> 00:52:49,400
So we saw a lot of kids choose not to go to college

965
00:52:49,400 --> 00:52:51,400
and to perhaps enter into the workforce

966
00:52:51,400 --> 00:52:54,520
or to perhaps like start building software

967
00:52:54,520 --> 00:52:57,160
or to build their own businesses, right?

968
00:52:57,160 --> 00:52:59,040
And these kids are incredible.

969
00:52:59,040 --> 00:53:01,480
Like they're doing such amazing things.

970
00:53:01,480 --> 00:53:05,240
Like the most killer use cases for generative models

971
00:53:05,240 --> 00:53:08,080
are all these kids that are like entrepreneurs

972
00:53:08,080 --> 00:53:11,320
and somehow like 16 years old, right?

973
00:53:11,320 --> 00:53:16,320
And so I feel like, and then also

974
00:53:16,520 --> 00:53:18,360
in the deep learning community,

975
00:53:18,360 --> 00:53:21,140
we're increasingly seeing that folks

976
00:53:21,140 --> 00:53:23,800
aren't really publishing papers anymore, right?

977
00:53:23,800 --> 00:53:27,840
Like whereas previously there was this academic ivory tower

978
00:53:27,840 --> 00:53:30,680
of like, oh, you do some research, you produce a paper,

979
00:53:30,680 --> 00:53:32,000
you present to the conference,

980
00:53:32,000 --> 00:53:33,160
and then you add it to like

981
00:53:33,160 --> 00:53:36,120
this extensive scholarly pedigree.

982
00:53:36,120 --> 00:53:39,560
Whereas now it's just kinda like, well, I did some work.

983
00:53:39,560 --> 00:53:41,600
I produced something that people can test out.

984
00:53:41,600 --> 00:53:44,360
I will perhaps write a blog post, but for the most part,

985
00:53:44,360 --> 00:53:47,080
it's just like something cool that's out in the world

986
00:53:47,080 --> 00:53:50,160
that people can like see and touch and experience.

987
00:53:50,160 --> 00:53:55,160
So I think that if you don't have a college degree,

988
00:53:57,720 --> 00:54:01,640
if you are debating whether or not college is right for you,

989
00:54:01,640 --> 00:54:04,880
if you're concerned that like perhaps not having a degree

990
00:54:04,880 --> 00:54:06,620
will limit you in some way,

991
00:54:06,620 --> 00:54:09,160
whether it's an undergrad degree or a graduate degree,

992
00:54:09,160 --> 00:54:11,680
or if you have like a journalism degree and you're like,

993
00:54:11,680 --> 00:54:13,820
but can I do this work?

994
00:54:13,820 --> 00:54:17,760
You can, there are no rules in this space.

995
00:54:17,760 --> 00:54:20,960
Like, you can do anything that you would like to.

996
00:54:20,960 --> 00:54:24,200
And the important part is just kind of focus

997
00:54:24,200 --> 00:54:28,860
on building something great that delights you

998
00:54:28,860 --> 00:54:31,120
and that delights other people.

999
00:54:31,120 --> 00:54:32,960
And that's all that really matters.

1000
00:54:33,960 --> 00:54:36,880
That's really good advice for people

1001
00:54:36,880 --> 00:54:38,440
starting out in the field.

1002
00:54:40,760 --> 00:54:43,380
A little other, another variation of that,

1003
00:54:43,380 --> 00:54:48,380
when you were just kind of getting your feet under you,

1004
00:54:49,220 --> 00:54:51,180
or I mean, I don't know if that is even the case,

1005
00:54:51,180 --> 00:54:53,500
but what advice would you give yourself

1006
00:54:53,500 --> 00:54:57,200
when you were starting your career in data science?

1007
00:54:58,860 --> 00:55:03,860
So I would say, I would just kind of reinforce,

1008
00:55:04,660 --> 00:55:08,200
do what you think is right and what interests you,

1009
00:55:08,200 --> 00:55:13,200
and then don't feel, give yourself permission

1010
00:55:13,200 --> 00:55:18,200
to do that and don't feel guilty or like you aren't,

1011
00:55:18,840 --> 00:55:23,840
like you aren't meeting up to the expectations

1012
00:55:24,320 --> 00:55:28,520
of your peers or your academic advisors.

1013
00:55:28,520 --> 00:55:33,520
I was told when I had entered into my graduate program,

1014
00:55:33,620 --> 00:55:36,360
I was told that it was a waste of time

1015
00:55:36,360 --> 00:55:40,020
to do like computer science and programming,

1016
00:55:40,020 --> 00:55:41,760
that I would never be respected

1017
00:55:41,760 --> 00:55:45,680
if all I did was like build tools or build libraries

1018
00:55:45,680 --> 00:55:49,840
or write code to solve science problems.

1019
00:55:50,760 --> 00:55:52,880
And I was also told that machine learning

1020
00:55:52,880 --> 00:55:55,320
would never have a place in the earth sciences,

1021
00:55:56,760 --> 00:56:01,760
which was in hindsight, kind of silly.

1022
00:56:04,680 --> 00:56:06,400
But at the time I was just like,

1023
00:56:06,400 --> 00:56:12,040
am I making the worst career choice of my entire life?

1024
00:56:12,040 --> 00:56:16,400
And I had a lot of internal emotional angst about it

1025
00:56:17,920 --> 00:56:22,920
back then, but now it's definitely the right choice.

1026
00:56:23,400 --> 00:56:26,840
If you feel like something is important,

1027
00:56:28,560 --> 00:56:32,280
sometimes you can see the future,

1028
00:56:32,280 --> 00:56:36,920
even if other people are standing,

1029
00:56:36,920 --> 00:56:38,580
facing a different direction.

1030
00:56:40,720 --> 00:56:42,440
That's great.

1031
00:56:42,440 --> 00:56:44,200
Yeah, it's so interesting, right?

1032
00:56:44,200 --> 00:56:49,040
When you're in the moment, things can seem so chaotic

1033
00:56:49,040 --> 00:56:51,440
and there could be so much uncertainty.

1034
00:56:51,440 --> 00:56:54,560
And then, I stop myself,

1035
00:56:54,560 --> 00:56:56,000
like the things that you're saying,

1036
00:56:56,000 --> 00:56:59,120
they're like looking at things now,

1037
00:56:59,120 --> 00:57:00,960
it's like some of that's like laughable.

1038
00:57:00,960 --> 00:57:04,800
Like, you look now, I mean, hindsight's 20/20, of course,

1039
00:57:04,800 --> 00:57:05,640
but it's just, I mean,

1040
00:57:05,640 --> 00:57:08,200
machine learning has become so pervasive

1041
00:57:08,200 --> 00:57:11,640
and using all of these sorts of techniques and technologies.

1042
00:57:13,040 --> 00:57:17,000
But it's amazing how, yeah, in the moment,

1043
00:57:17,000 --> 00:57:20,000
it can be so uncertain, but then when you look back,

1044
00:57:20,000 --> 00:57:23,160
it can become so clear.

1045
00:57:24,320 --> 00:57:29,320
And that was just like eight or nine years ago, by the way,

1046
00:57:29,320 --> 00:57:33,120
that people were thinking that machine learning

1047
00:57:33,120 --> 00:57:34,980
wouldn't have a place in the earth sciences,

1048
00:57:34,980 --> 00:57:37,220
wouldn't have a place in the physical sciences.

1049
00:57:37,220 --> 00:57:39,060
All of this is super new.

1050
00:57:40,080 --> 00:57:44,220
And so I think that given the trajectory

1051
00:57:44,220 --> 00:57:46,960
and kind of the exponential increase

1052
00:57:46,960 --> 00:57:49,680
and how these things are progressing,

1053
00:57:49,680 --> 00:57:52,400
it's really tricky to predict the future.

1054
00:57:52,400 --> 00:57:54,720
And there's, I forget who said it,

1055
00:57:54,720 --> 00:57:58,040
but like, and it's kind of cliche, I guess now,

1056
00:57:58,040 --> 00:58:00,160
but the best way to predict the future

1057
00:58:00,160 --> 00:58:01,680
is to be the one building it.

1058
00:58:01,680 --> 00:58:04,080
Like, it is very clear that the future

1059
00:58:04,080 --> 00:58:08,440
is going to be built using the generative technologies

1060
00:58:08,440 --> 00:58:10,440
and assistive technologies

1061
00:58:10,440 --> 00:58:13,040
and productivity enhancing technologies.

1062
00:58:13,040 --> 00:58:15,000
So if you're doing that,

1063
00:58:15,000 --> 00:58:16,960
you're probably gonna be heading in

1064
00:58:16,960 --> 00:58:18,360
towards the right direction.

1065
00:58:19,360 --> 00:58:20,800
Absolutely.

1066
00:58:20,800 --> 00:58:24,360
So speaking of how fast things are changing

1067
00:58:24,360 --> 00:58:28,920
in the last decade, in the last five years,

1068
00:58:28,920 --> 00:58:31,360
even in the last year or week,

1069
00:58:32,920 --> 00:58:37,920
how do you stay up on all of the newest techniques?

1070
00:58:38,720 --> 00:58:41,280
I mean, I know in a way you're part of it,

1071
00:58:41,280 --> 00:58:43,280
you're building the future,

1072
00:58:43,280 --> 00:58:47,280
but there's so many people that are releasing stuff.

1073
00:58:48,520 --> 00:58:50,200
Do you have any techniques

1074
00:58:50,200 --> 00:58:54,400
to sort of stay up on top of everything?

1075
00:58:54,400 --> 00:58:57,200
It's a really tricky problem.

1076
00:58:57,200 --> 00:59:00,640
I would, and again, this is gonna sound silly.

1077
00:59:00,640 --> 00:59:04,160
I would be on Twitter

1078
00:59:04,160 --> 00:59:06,560
just because the machine learning community

1079
00:59:06,560 --> 00:59:09,440
seems to hang out on Twitter

1080
00:59:09,440 --> 00:59:12,240
and they talk about things that they find interesting.

1081
00:59:12,240 --> 00:59:15,280
It's like getting to sneakily listen in

1082
00:59:15,280 --> 00:59:17,280
to all of the hallway conversations

1083
00:59:17,280 --> 00:59:20,120
in all of these AI research labs.

1084
00:59:20,120 --> 00:59:24,200
And increasingly, we had mentioned it earlier,

1085
00:59:24,200 --> 00:59:27,840
but people aren't writing papers anymore.

1086
00:59:29,040 --> 00:59:32,680
And so the best insights that you can get

1087
00:59:32,680 --> 00:59:37,680
into kind of pedagogy and kind of how these models

1088
00:59:37,920 --> 00:59:41,920
were built is by hearing what people

1089
00:59:41,920 --> 00:59:43,280
are currently focused on.

1090
00:59:43,280 --> 00:59:45,120
Like, is it long context?

1091
00:59:45,120 --> 00:59:49,680
Is it this new tokenizer that seems interesting?

1092
00:59:49,680 --> 00:59:51,960
Is it some sort of algorithmic advance

1093
00:59:51,960 --> 00:59:55,000
that people are really paying attention to?

1094
00:59:55,000 --> 00:59:56,960
Is it the data mixtures?

1095
00:59:56,960 --> 01:00:00,680
Is it some other model architecture?

1096
01:00:00,680 --> 01:00:05,080
But the kind of honing in on where the conversation

1097
01:00:05,080 --> 01:00:09,080
seems to be happening is really, really important.

1098
01:00:09,080 --> 01:00:11,040
And right now, if you're just looking

1099
01:00:11,040 --> 01:00:13,200
or attempting to look at every deep learning paper

1100
01:00:13,200 --> 01:00:16,200
that's posted on arXiv, your life's gonna be insane.

1101
01:00:16,200 --> 01:00:18,880
Like there was a website a while back,

1102
01:00:18,880 --> 01:00:21,040
I think it was created by Andrej Karpathy,

1103
01:00:21,040 --> 01:00:25,000
the arXiv Sanity website, and it would post the paper

1104
01:00:25,000 --> 01:00:30,000
and then have like a really nice kind of images with PDFs.

1105
01:00:31,080 --> 01:00:34,760
But in the last three or four years,

1106
01:00:34,760 --> 01:00:37,920
it's gotten overwhelming to keep up even with that.

1107
01:00:37,920 --> 01:00:41,040
So I would strongly encourage you,

1108
01:00:41,040 --> 01:00:44,000
pick people that work at the AI research labs

1109
01:00:44,000 --> 01:00:49,000
that you care about, Anthropic, Google DeepMind, OpenAI,

1110
01:00:50,320 --> 01:00:53,280
and perhaps some of the others,

1111
01:00:53,280 --> 01:00:55,400
like Hugging Face that I had mentioned,

1112
01:00:56,360 --> 01:00:59,440
and follow them, see what they're talking about,

1113
01:00:59,440 --> 01:01:01,300
see who else they're following.

1114
01:01:02,200 --> 01:01:04,720
And then also don't be afraid to ask questions

1115
01:01:04,720 --> 01:01:07,160
if something isn't clear, because a lot of time,

1116
01:01:07,160 --> 01:01:09,800
if you ask a question to these people on Twitter,

1117
01:01:09,800 --> 01:01:12,480
they will respond, and that's huge, right?

1118
01:01:12,480 --> 01:01:17,480
Like being able to have access to some of the folks

1119
01:01:17,960 --> 01:01:20,620
that are building the future is massive.

1120
01:01:21,760 --> 01:01:24,520
Yeah, absolutely.

1121
01:01:24,520 --> 01:01:27,120
One thing that I didn't get to touch on,

1122
01:01:27,120 --> 01:01:30,800
so yeah, obviously machine learning, AI, generative models

1123
01:01:30,800 --> 01:01:33,060
has really made it into the mainstream,

1124
01:01:33,060 --> 01:01:37,200
and it's created this like frenzied hype.

1125
01:01:37,200 --> 01:01:42,200
I'm wondering from your perspective,

1126
01:01:42,640 --> 01:01:45,120
how do you view the gap between the hype

1127
01:01:45,120 --> 01:01:48,080
and the reality of AI?

1128
01:01:49,560 --> 01:01:52,960
Yeah, and this industry in particular,

1129
01:01:52,960 --> 01:01:57,320
there are all these hype cycles can be overwhelming.

1130
01:01:57,320 --> 01:02:01,200
I personally cannot wait for the AI hype cycle to be over

1131
01:02:01,200 --> 01:02:03,960
so that we can all have a little bit more peace

1132
01:02:03,960 --> 01:02:05,700
to do the work.

1133
01:02:05,700 --> 01:02:08,360
I miss when AI was not cool,

1134
01:02:08,360 --> 01:02:12,480
because like, and NeurIPS was a lot more fun then too.

1135
01:02:12,480 --> 01:02:15,600
Like all of the academic conferences were much more chill.

1136
01:02:15,600 --> 01:02:18,680
Now it feels like there's more VCs than researchers.

1137
01:02:19,680 --> 01:02:24,680
But I do think whenever you're trying to distinguish

1138
01:02:25,880 --> 01:02:30,160
between who is like an AI influencer versus

1139
01:02:30,160 --> 01:02:33,440
who is someone who does this for their day job,

1140
01:02:33,440 --> 01:02:36,560
like definitely look and see where they work,

1141
01:02:36,560 --> 01:02:38,280
look and see what they've built.

1142
01:02:39,320 --> 01:02:43,920
And if it's somebody that decided to get excited about AI,

1143
01:02:43,920 --> 01:02:46,880
just about the time that Web3 went downhill,

1144
01:02:48,080 --> 01:02:51,720
because like somehow a lot of these Web3 influencers

1145
01:02:51,720 --> 01:02:54,280
have turned into like generative AI influencers,

1146
01:02:54,280 --> 01:02:55,960
and I don't know how that happened.

1147
01:02:55,960 --> 01:02:59,280
But just kind of like look at the backlog,

1148
01:02:59,280 --> 01:03:02,160
see what the person has accomplished

1149
01:03:02,160 --> 01:03:04,400
and what they've been interested in.

1150
01:03:04,400 --> 01:03:08,600
And if they haven't been doing AI for longer

1151
01:03:08,600 --> 01:03:10,640
than the last year or two,

1152
01:03:10,640 --> 01:03:14,560
then I would take what they say with a grain of salt.

1153
01:03:15,920 --> 01:03:17,320
Yeah, absolutely.

1154
01:03:17,320 --> 01:03:20,480
I think it's really good advice.

1155
01:03:20,480 --> 01:03:23,120
That's why it's really nice talking to somebody like you,

1156
01:03:24,080 --> 01:03:29,080
who has experience at Microsoft and GitHub and Google

1157
01:03:29,080 --> 01:03:32,640
and Alphabet and DeepMind and everything like that.

1158
01:03:33,960 --> 01:03:36,080
The last juicy meaty question,

1159
01:03:37,240 --> 01:03:41,440
what has a career in machine learning taught you about life?

1160
01:03:44,240 --> 01:03:45,800
That is a wonderful question.

1161
01:03:45,800 --> 01:03:48,400
I think what it has taught me the most

1162
01:03:48,400 --> 01:03:52,060
is what to appreciate about being human.

1163
01:03:52,060 --> 01:03:56,920
Like the more you see what these models can do,

1164
01:03:56,920 --> 01:03:59,920
the more you can understand what they're doing.

1165
01:03:59,920 --> 01:04:04,920
And it's very exciting and it's very cool.

1166
01:04:05,000 --> 01:04:10,000
And it's like discovering new capabilities every single day,

1167
01:04:11,160 --> 01:04:13,920
new ways in which I can automate myself

1168
01:04:14,880 --> 01:04:17,960
and sort of remove the tedious parts of the day

1169
01:04:17,960 --> 01:04:22,440
that I like creating meeting transcripts

1170
01:04:22,440 --> 01:04:27,440
or emails or drafting a paragraph for a doc or something.

1171
01:04:30,080 --> 01:04:34,320
But also seeing the ways in which it falls short.

1172
01:04:34,320 --> 01:04:39,320
Like it's not going to be able to give you a hug.

1173
01:04:42,560 --> 01:04:47,560
It's not going to sort of understand, at least not yet.

1174
01:04:48,760 --> 01:04:51,280
It's not going to understand that you might have a bad day

1175
01:04:51,280 --> 01:04:55,460
and that it should ask you, how are you doing?

1176
01:04:56,440 --> 01:05:01,440
It's not going to do those things

1177
01:05:03,240 --> 01:05:06,640
that are uniquely nice experiences

1178
01:05:06,640 --> 01:05:09,540
of interacting with humans in real life every day.

1179
01:05:10,640 --> 01:05:13,600
And I think that's important.

1180
01:05:13,600 --> 01:05:18,600
And I think one of the nicest possible outcomes

1181
01:05:18,760 --> 01:05:21,160
for generative AI becoming ingrained

1182
01:05:21,160 --> 01:05:23,480
in every person's life is that,

1183
01:05:25,120 --> 01:05:29,640
it helps us appreciate more what it is to be a human

1184
01:05:29,640 --> 01:05:31,240
and to interact with humans.

1185
01:05:32,640 --> 01:05:34,600
That's really nice.

1186
01:05:34,600 --> 01:05:35,660
Yeah, that's beautiful.

1187
01:05:35,660 --> 01:05:40,560
So by exploring the capabilities of machine learning and AI,

1188
01:05:40,560 --> 01:05:44,040
you can really appreciate the human connection.

1189
01:05:44,040 --> 01:05:45,360
Yeah.

1190
01:05:45,360 --> 01:05:48,200
So for folks that want to learn more about you

1191
01:05:48,200 --> 01:05:49,400
or the work that you're doing,

1192
01:05:49,400 --> 01:05:52,200
what would be a good resource for them?

1193
01:05:53,400 --> 01:05:54,240
Cool.

1194
01:05:54,240 --> 01:05:59,240
So I strongly recommend taking a look at both the Google AI

1195
01:05:59,480 --> 01:06:01,040
and the DeepMind websites,

1196
01:06:01,040 --> 01:06:04,400
though they will probably be merging at some point

1197
01:06:04,400 --> 01:06:06,120
in the near future.

1198
01:06:06,120 --> 01:06:09,240
I'm also chronically available on Twitter.

1199
01:06:09,240 --> 01:06:11,560
So twitter.com slash DynamicWebPaige,

1200
01:06:11,560 --> 01:06:12,720
and I'm DynamicWebPaige

1201
01:06:12,720 --> 01:06:15,520
pretty much everywhere else on the internet.

1202
01:06:15,520 --> 01:06:19,320
And then I also strongly, strongly encourage

1203
01:06:19,320 --> 01:06:24,320
everyone to just kind of get involved.

1204
01:06:26,760 --> 01:06:30,720
There are certainly ways to begin learning more

1205
01:06:30,720 --> 01:06:33,400
about the AI community without necessarily

1206
01:06:33,400 --> 01:06:35,160
having to have a career in it.

1207
01:06:36,160 --> 01:06:39,320
And then if you do want to have a career in it,

1208
01:06:39,320 --> 01:06:42,600
I think it's going to become increasingly possible

1209
01:06:42,600 --> 01:06:46,400
as businesses seem to be adopting AI

1210
01:06:46,400 --> 01:06:50,600
at a much quicker rate these days.

1211
01:06:50,600 --> 01:06:53,600
So don't be afraid to pitch to your boss

1212
01:06:53,600 --> 01:06:56,200
if you have an idea and you want to create a prototype.

1213
01:06:57,680 --> 01:06:58,720
Absolutely.

1214
01:06:59,680 --> 01:07:02,280
Paige, it has been such a pleasure.

1215
01:07:02,280 --> 01:07:04,000
I really appreciate your time.

1216
01:07:04,000 --> 01:07:07,200
Thank you for letting me pick your brain for this time.

1217
01:07:07,200 --> 01:07:08,800
Thank you so much.

1218
01:07:08,800 --> 01:07:09,620
No, thank you.

1219
01:07:09,620 --> 01:07:10,560
This has been delightful.

1220
01:07:10,560 --> 01:07:14,080
I am so glad to have gotten the chance to chat

1221
01:07:14,080 --> 01:07:17,000
and thank you for the awesome questions.

1222
01:07:17,000 --> 01:07:17,920
Really appreciate it.

1223
01:07:17,920 --> 01:07:19,520
Thanks so much.

1224
01:07:19,520 --> 01:07:20,360
Cool, cool.

1225
01:07:25,240 --> 01:07:27,360
Thank you for listening to this episode

1226
01:07:27,360 --> 01:07:28,920
of Learning from Machine Learning

1227
01:07:28,920 --> 01:07:31,000
with the remarkable Paige Bailey,

1228
01:07:31,000 --> 01:07:33,360
the lead product manager for generative models

1229
01:07:33,360 --> 01:07:35,000
at Google DeepMind.

1230
01:07:35,000 --> 01:07:37,360
Her work is pushing the boundaries of innovation

1231
01:07:37,360 --> 01:07:40,560
with Bard and the soon to be released Gemini.

1232
01:07:40,560 --> 01:07:43,640
Don't miss out on the valuable resources in the show notes.

1233
01:07:43,640 --> 01:07:46,240
Please leave a review, share it with your friends

1234
01:07:46,240 --> 01:07:49,400
and let's create a community of continuous learning.

1235
01:07:49,400 --> 01:08:14,000
Until next time, keep on learning.

