1
00:00:00,000 --> 00:00:01,920
Hey everyone, welcome back to the show.

2
00:00:01,920 --> 00:00:04,760
Today, we're gonna be taking a deep dive

3
00:00:04,760 --> 00:00:07,720
into a paper that's all about AI.

4
00:00:07,720 --> 00:00:11,120
And specifically making AI more adaptable.

5
00:00:11,120 --> 00:00:14,580
So think of like self-driving cars

6
00:00:14,580 --> 00:00:18,900
that can handle new cities without freaking out

7
00:00:18,900 --> 00:00:21,680
or robots that can pick up unfamiliar objects

8
00:00:21,680 --> 00:00:23,500
without like totally fumbling it.

9
00:00:23,500 --> 00:00:24,920
That's the kind of future this research

10
00:00:24,920 --> 00:00:25,760
is pointing us toward.

11
00:00:25,760 --> 00:00:27,660
Yeah, it's a really interesting area.

12
00:00:27,660 --> 00:00:29,520
I mean, right now a lot of AI systems,

13
00:00:29,520 --> 00:00:30,680
they're kind of fragile, right?

14
00:00:30,680 --> 00:00:31,960
They're good at what they're trained for,

15
00:00:31,960 --> 00:00:34,080
but even like small changes can throw them off.

16
00:00:34,080 --> 00:00:36,200
Right, yeah, it's like they get stuck in their ways.

17
00:00:36,200 --> 00:00:39,520
So how does this paper propose we get AI unstuck?

18
00:00:39,520 --> 00:00:43,040
Well, the paper is called Model-Based Transfer Learning

19
00:00:43,040 --> 00:00:45,400
for Contextual Reinforcement Learning.

20
00:00:45,400 --> 00:00:46,640
And it's focused on this idea

21
00:00:46,640 --> 00:00:47,920
of contextual reinforcement learning,

22
00:00:47,920 --> 00:00:49,280
which is kind of like, you know,

23
00:00:49,280 --> 00:00:51,240
if you think about teaching an AI to play a video game,

24
00:00:51,240 --> 00:00:53,680
it's like teaching it to play different levels of that game.

25
00:00:53,680 --> 00:00:56,120
And the goal is to help the AI learn

26
00:00:56,120 --> 00:00:57,320
from those previous levels,

27
00:00:57,320 --> 00:00:59,840
so it doesn't have to like start from scratch

28
00:00:59,840 --> 00:01:01,000
every single time.

29
00:01:01,000 --> 00:01:03,840
Okay, so instead of training like a separate AI

30
00:01:03,840 --> 00:01:07,640
for every possible city a self-driving car might encounter,

31
00:01:07,640 --> 00:01:09,800
we want one AI that can just kind of figure things out

32
00:01:09,800 --> 00:01:10,640
on the fly.

33
00:01:10,640 --> 00:01:11,560
Exactly, yeah.

34
00:01:11,560 --> 00:01:13,240
And the key here is something called

35
00:01:13,240 --> 00:01:15,240
Model-Based Transfer Learning.

36
00:01:15,240 --> 00:01:16,080
MBTL.

37
00:01:16,080 --> 00:01:16,920
Yeah, MBTL.

38
00:01:16,920 --> 00:01:17,760
Okay.

39
00:01:17,760 --> 00:01:19,760
You can think of it kind of like a coach for your AI.

40
00:01:19,760 --> 00:01:22,860
It's constantly evaluating which training exercises

41
00:01:22,860 --> 00:01:25,120
will help the AI become, you know,

42
00:01:25,120 --> 00:01:28,520
a more well-rounded athlete, so to speak.

43
00:01:28,520 --> 00:01:30,320
Okay, I like that analogy.

44
00:01:30,320 --> 00:01:33,120
So how does this AI coaching actually work?

45
00:01:33,120 --> 00:01:35,700
Well, MBTL has kind of two main parts to it.

46
00:01:35,700 --> 00:01:37,960
The first is what they call the performance set point.

47
00:01:37,960 --> 00:01:40,900
And basically what it does is it uses a statistical model

48
00:01:40,900 --> 00:01:43,040
to predict how well the AI will perform

49
00:01:43,040 --> 00:01:45,120
after training on a specific task.

50
00:01:45,120 --> 00:01:47,640
It's like if you were looking at an athlete's past performance

51
00:01:47,640 --> 00:01:49,920
to get a sense of how they might do in a new competition.

52
00:01:49,920 --> 00:01:51,840
So it's all about making educated guesses

53
00:01:51,840 --> 00:01:53,560
based on like past data.

54
00:01:53,560 --> 00:01:54,600
Precisely.

55
00:01:54,600 --> 00:01:56,600
Now, the second part of it is what's called

56
00:01:56,600 --> 00:01:58,240
the generalization gap.

57
00:01:58,240 --> 00:02:01,880
And this measures how much performance drops

58
00:02:01,880 --> 00:02:05,120
when the AI faces a slightly different situation.

59
00:02:05,120 --> 00:02:09,120
So say if the self-driving car had to deal with rain

60
00:02:09,120 --> 00:02:13,040
for the first time, or the robot encountered, you know,

61
00:02:13,040 --> 00:02:15,560
a different shaped object than what it was used to.

62
00:02:15,560 --> 00:02:16,400
Interesting.

63
00:02:16,400 --> 00:02:17,920
So is this generalization gap something

64
00:02:17,920 --> 00:02:19,520
that can also be predicted?

65
00:02:19,520 --> 00:02:20,560
That's what they're trying to do.

66
00:02:20,560 --> 00:02:23,200
The paper uses what's called a linear model for this,

67
00:02:23,200 --> 00:02:27,120
meaning they're assuming that basically the further away

68
00:02:27,120 --> 00:02:29,720
the new situation is from what the AI is used to,

69
00:02:29,720 --> 00:02:31,800
the bigger that drop in performance will be.

70
00:02:31,800 --> 00:02:32,300
OK.

71
00:02:32,300 --> 00:02:34,520
So it's kind of like saying, like the more unfamiliar

72
00:02:34,520 --> 00:02:36,720
the situation, the steeper the learning curve.

73
00:02:36,720 --> 00:02:37,280
Exactly.

74
00:02:37,280 --> 00:02:39,640
And they actually tested this out with, you know,

75
00:02:39,640 --> 00:02:42,640
that classic problem called the cart-pole simulation.

76
00:02:42,640 --> 00:02:45,000
Where you're trying to balance the pole upright on a moving cart.

77
00:02:45,000 --> 00:02:45,600
Oh, yeah.

78
00:02:45,600 --> 00:02:49,040
I remember those from like my intro to AI class.

79
00:02:49,040 --> 00:02:51,440
They always seem so simple, but they're actually surprisingly

80
00:02:51,440 --> 00:02:52,600
tricky to get right.

81
00:02:52,600 --> 00:02:53,560
Exactly, yeah.

82
00:02:53,560 --> 00:02:55,960
And you can actually see this in Figure 1 of the paper,

83
00:02:55,960 --> 00:03:00,840
like how the AI's performance drops off pretty predictably

84
00:03:00,840 --> 00:03:04,280
as the conditions of that cart-pole task change.

85
00:03:04,280 --> 00:03:04,920
OK.

86
00:03:04,920 --> 00:03:06,920
So we've got this system for predicting

87
00:03:06,920 --> 00:03:10,840
how well training will go and also how much performance

88
00:03:10,840 --> 00:03:13,160
will drop off in new situations.

89
00:03:13,160 --> 00:03:17,480
But how does MBTL actually, like, decide what the AI should

90
00:03:17,480 --> 00:03:18,520
train on next?

91
00:03:18,520 --> 00:03:20,080
Well, that's where things get really clever.

92
00:03:20,080 --> 00:03:23,080
They use something called Bayesian optimization, which

93
00:03:23,080 --> 00:03:25,000
sounds kind of complicated, but it's actually

94
00:03:25,000 --> 00:03:27,400
a pretty intuitive idea.

95
00:03:27,400 --> 00:03:29,080
Imagine you're training for a marathon.

96
00:03:29,080 --> 00:03:31,280
You wouldn't just run the same distance every day, right?

97
00:03:31,280 --> 00:03:34,000
You'd vary your workouts, maybe do some speed drills one day,

98
00:03:34,000 --> 00:03:36,960
some long runs another day, and take rest days in between.

99
00:03:36,960 --> 00:03:37,800
Yeah, that makes sense.

100
00:03:37,800 --> 00:03:39,320
You need variety to improve.

101
00:03:39,320 --> 00:03:39,840
Exactly.

102
00:03:39,840 --> 00:03:41,440
And that's kind of what Bayesian optimization

103
00:03:41,440 --> 00:03:42,400
is doing for the AI.

104
00:03:42,400 --> 00:03:44,760
It's using those predictions about performance

105
00:03:44,760 --> 00:03:47,760
and the generalization gap to pick the training tasks that

106
00:03:47,760 --> 00:03:50,120
are going to offer the most learning value.

107
00:03:50,120 --> 00:03:53,440
It's all about finding that right balance between pushing

108
00:03:53,440 --> 00:03:55,200
the AI outside of its comfort zone,

109
00:03:55,200 --> 00:03:57,760
but also making sure it doesn't get completely overwhelmed.

110
00:03:57,760 --> 00:04:00,280
So it's kind of like a personal trainer for the AI

111
00:04:00,280 --> 00:04:02,240
that's constantly adjusting the workout routine

112
00:04:02,240 --> 00:04:04,320
to maximize its learning potential.

113
00:04:04,320 --> 00:04:05,320
Yeah, exactly.

114
00:04:05,320 --> 00:04:07,280
It's not just throwing random data at the AI

115
00:04:07,280 --> 00:04:08,640
and hoping for the best.

116
00:04:08,640 --> 00:04:11,360
It's about strategically choosing the experiences that

117
00:04:11,360 --> 00:04:14,760
will help it learn the fastest and generalize its knowledge

118
00:04:14,760 --> 00:04:16,360
to new situations.

119
00:04:16,360 --> 00:04:19,840
This all sounds super promising, but does it actually work?

120
00:04:19,840 --> 00:04:20,440
Yeah.

121
00:04:20,440 --> 00:04:23,480
How well does this MBTL actually perform compared

122
00:04:23,480 --> 00:04:25,280
to other approaches?

123
00:04:25,280 --> 00:04:26,760
Yeah, so that's the exciting part.

124
00:04:26,760 --> 00:04:29,320
They tested it out on a bunch of different tasks

125
00:04:29,320 --> 00:04:32,000
from those classic continuous control problems,

126
00:04:32,000 --> 00:04:34,680
like making a robotic arm move smoothly,

127
00:04:34,680 --> 00:04:38,240
to some really cool real-world simulations.

128
00:04:38,240 --> 00:04:42,200
OK, so like robots doing parkour or self-driving cars

129
00:04:42,200 --> 00:04:44,120
navigating rush hour traffic?

130
00:04:44,120 --> 00:04:45,920
Not quite that advanced, not yet.

131
00:04:45,920 --> 00:04:47,840
But they did look at some pretty challenging stuff.

132
00:04:47,840 --> 00:04:50,320
So for example, they had AI controlling traffic lights

133
00:04:50,320 --> 00:04:52,280
to optimize flow through intersections.

134
00:04:52,280 --> 00:04:52,800
Oh, wow.

135
00:04:52,800 --> 00:04:56,120
And they even looked at systems for advising human drivers

136
00:04:56,120 --> 00:04:59,320
in real time, giving them tips on how to drive more efficiently.

137
00:04:59,320 --> 00:05:01,720
So how did MBTL do in these tests?

138
00:05:01,720 --> 00:05:04,000
Did it actually live up to the hype?

139
00:05:04,000 --> 00:05:04,840
It really did.

140
00:05:04,840 --> 00:05:06,640
They found that across the board,

141
00:05:06,640 --> 00:05:09,640
MBTL achieved really significant improvements

142
00:05:09,640 --> 00:05:12,320
in what's called sample efficiency, which basically

143
00:05:12,320 --> 00:05:15,560
means it learned a lot faster than standard methods.

144
00:05:15,560 --> 00:05:18,440
OK, so are we talking like a noticeable difference?

145
00:05:18,440 --> 00:05:20,000
Or just like a few percentage points?

146
00:05:20,000 --> 00:05:21,680
Oh, no, much more than that.

147
00:05:21,680 --> 00:05:25,200
We're talking about improvements of like up to 50 times.

148
00:05:25,200 --> 00:05:25,600
Wow.

149
00:05:25,600 --> 00:05:26,360
Yeah.

150
00:05:26,360 --> 00:05:29,800
In some cases, the AI actually needed 25 times fewer training

151
00:05:29,800 --> 00:05:32,440
tasks to reach the same level of performance

152
00:05:32,440 --> 00:05:35,080
as just training it separately for every single scenario.

153
00:05:35,080 --> 00:05:37,280
Wow, that's a massive time saver.

154
00:05:37,280 --> 00:05:39,480
What about those other approaches we talked about earlier?

155
00:05:39,480 --> 00:05:42,320
The ones that use multiple policies, but not

156
00:05:42,320 --> 00:05:44,240
that fancy Bayesian optimization?

157
00:05:44,240 --> 00:05:44,640
Yeah.

158
00:05:44,640 --> 00:05:46,560
Did MBTL beat those too?

159
00:05:46,560 --> 00:05:47,400
Yeah, it did.

160
00:05:47,400 --> 00:05:50,200
It consistently outperformed those simpler methods, which

161
00:05:50,200 --> 00:05:54,000
really highlights the power of this strategic task selection.

162
00:05:54,000 --> 00:05:56,280
It's not just about having multiple policies.

163
00:05:56,280 --> 00:05:58,480
It's about picking the right ones to learn from.

164
00:05:58,480 --> 00:06:01,320
So it's like having a really good coach who knows exactly

165
00:06:01,320 --> 00:06:02,760
which drills are going to make the biggest

166
00:06:02,760 --> 00:06:03,880
difference for the athlete.

167
00:06:03,880 --> 00:06:04,800
Exactly.

168
00:06:04,800 --> 00:06:07,560
It's about working smarter, not harder.

169
00:06:07,560 --> 00:06:09,880
But of course, like any new approach,

170
00:06:09,880 --> 00:06:11,440
there are always limitations.

171
00:06:11,440 --> 00:06:14,560
And the authors are very upfront about these in the paper.

172
00:06:14,560 --> 00:06:17,360
OK, so what are the caveats here?

173
00:06:17,360 --> 00:06:21,360
What can't this super coach AI do?

174
00:06:21,360 --> 00:06:24,080
So one of the key limitations right now

175
00:06:24,080 --> 00:06:26,720
is that it's really designed for situations where there's only

176
00:06:26,720 --> 00:06:29,480
one varying factor at a time.

177
00:06:29,480 --> 00:06:30,800
OK, so walk me through that.

178
00:06:30,800 --> 00:06:32,680
What does that mean, like in a practical sense?

179
00:06:32,680 --> 00:06:34,920
OK, so let's go back to that self-driving car example.

180
00:06:34,920 --> 00:06:37,000
Imagine the AI is learning to handle

181
00:06:37,000 --> 00:06:38,360
different weather conditions, right?

182
00:06:38,360 --> 00:06:40,840
So it can learn to drive in rain or snow.

183
00:06:40,840 --> 00:06:43,360
But not both at the same time, at least not yet.

184
00:06:43,360 --> 00:06:46,560
So it's kind of like saying, you can be a master chef specializing

185
00:06:46,560 --> 00:06:50,520
in French cuisine, but don't try to mix in some Japanese

186
00:06:50,520 --> 00:06:51,800
techniques just yet.

187
00:06:51,800 --> 00:06:53,200
Yeah, that's a great analogy.

188
00:06:53,200 --> 00:06:54,920
And of course, that's a simplification,

189
00:06:54,920 --> 00:06:56,320
but it kind of captures the idea.

190
00:06:56,320 --> 00:06:57,760
So yeah, addressing this limitation

191
00:06:57,760 --> 00:07:00,040
is definitely an area for future research.

192
00:07:00,040 --> 00:07:01,840
Yeah, it makes sense that that would be a challenge.

193
00:07:01,840 --> 00:07:04,120
But even with that constraint, this research

194
00:07:04,120 --> 00:07:05,680
seems like a huge step forward.

195
00:07:05,680 --> 00:07:06,160
Yeah.

196
00:07:06,160 --> 00:07:09,120
It could really change how we approach training AI

197
00:07:09,120 --> 00:07:12,000
for these messy real-world problems.

198
00:07:12,000 --> 00:07:12,680
Absolutely.

199
00:07:12,680 --> 00:07:15,240
I think what's really exciting is the potential

200
00:07:15,240 --> 00:07:19,920
this has for making AI more robust and adaptable, right?

201
00:07:19,920 --> 00:07:22,960
It kind of moves us closer to this goal of having AI that

202
00:07:22,960 --> 00:07:25,800
can truly learn and generalize in a way that's more

203
00:07:25,800 --> 00:07:27,600
similar to how humans do it.

204
00:07:27,600 --> 00:07:29,680
So before we wrap up, what would you

205
00:07:29,680 --> 00:07:31,920
say is the biggest takeaway from this research?

206
00:07:31,920 --> 00:07:35,080
What should our listeners be most excited about?

207
00:07:35,080 --> 00:07:36,440
That's a good question.

208
00:07:36,440 --> 00:07:39,720
I think for me, it's this focus on really understanding

209
00:07:39,720 --> 00:07:42,080
and modeling the process of generalization.

210
00:07:42,080 --> 00:07:45,560
So MBTL isn't just a clever trick that

211
00:07:45,560 --> 00:07:47,320
works in a few specific cases.

212
00:07:47,320 --> 00:07:50,480
It's about gaining a deeper understanding of how AI actually

213
00:07:50,480 --> 00:07:51,560
learns and adapts.

214
00:07:51,560 --> 00:07:53,800
And I think that's a really crucial step towards building

215
00:07:53,800 --> 00:07:55,640
AI systems that are more intelligent and more

216
00:07:55,640 --> 00:07:56,840
reliable in the future.

217
00:07:56,840 --> 00:07:57,640
That's a great point.

218
00:07:57,640 --> 00:08:00,320
It's not just about throwing more data at the problem.

219
00:08:00,320 --> 00:08:04,200
It's about figuring out how to make the AI learn more

220
00:08:04,200 --> 00:08:06,120
effectively from the data it does have.

221
00:08:06,120 --> 00:08:06,920
Exactly.

222
00:08:06,920 --> 00:08:08,840
And that's why I think this research is so important.

223
00:08:08,840 --> 00:08:11,040
It's really pushing the boundaries of what

224
00:08:11,040 --> 00:08:12,200
we thought was possible.

225
00:08:12,200 --> 00:08:15,840
And it's opening up a whole new world of possibilities

226
00:08:15,840 --> 00:08:17,440
for the future of AI.

227
00:08:17,440 --> 00:08:18,640
I'm totally with you there.

228
00:08:18,640 --> 00:08:21,520
This has been a super fascinating deep dive.

229
00:08:21,520 --> 00:08:25,080
But it also leaves us with a pretty big question.

230
00:08:25,080 --> 00:08:27,880
If MBTL is so good at picking up on these similarities

231
00:08:27,880 --> 00:08:31,880
between tasks, what happens when the AI encounters

232
00:08:31,880 --> 00:08:34,640
something that's truly novel, something it's just

233
00:08:34,640 --> 00:08:35,960
never seen before?

234
00:08:35,960 --> 00:08:37,840
Yeah, that's the million-dollar question.

235
00:08:37,840 --> 00:08:39,600
And it's definitely one that researchers

236
00:08:39,600 --> 00:08:42,480
will be grappling with for years to come.

237
00:08:42,480 --> 00:08:45,000
It really makes you think about just how far AI still

238
00:08:45,000 --> 00:08:46,160
has to go.

239
00:08:46,160 --> 00:08:49,720
So if MBTL is all about finding those connections,

240
00:08:49,720 --> 00:08:52,160
those similarities between tasks, is there

241
00:08:52,160 --> 00:08:56,160
a way we can prepare AI for the truly unexpected?

242
00:08:56,160 --> 00:08:59,120
Yeah, that's the big challenge, right?

243
00:08:59,120 --> 00:09:01,360
How do you train for something that you can't even

244
00:09:01,360 --> 00:09:02,640
anticipate?

245
00:09:02,640 --> 00:09:04,000
And I think that's where researchers

246
00:09:04,000 --> 00:09:06,360
are starting to look at combining different approaches.

247
00:09:06,360 --> 00:09:11,240
So maybe you use MBTL to build that strong foundation

248
00:09:11,240 --> 00:09:12,160
of knowledge.

249
00:09:12,160 --> 00:09:14,360
But then you layer on other techniques

250
00:09:14,360 --> 00:09:18,600
that are better at handling novelty and uncertainty.

251
00:09:18,600 --> 00:09:21,760
So it's like giving the AI a toolbox of different learning

252
00:09:21,760 --> 00:09:22,240
strategies.

253
00:09:22,240 --> 00:09:22,920
Exactly.

254
00:09:22,920 --> 00:09:26,720
And then it can learn to switch between those tools

255
00:09:26,720 --> 00:09:28,120
depending on the situation.

256
00:09:28,120 --> 00:09:31,280
And that's still very much an open research question,

257
00:09:31,280 --> 00:09:33,600
but it's a super exciting area.

258
00:09:33,600 --> 00:09:34,360
Yeah, for sure.

259
00:09:34,360 --> 00:09:37,000
It's like we're trying to teach AI not just

260
00:09:37,000 --> 00:09:41,440
to be good at specific tasks, but to be good learners

261
00:09:41,440 --> 00:09:41,960
in general.

262
00:09:41,960 --> 00:09:43,080
Exactly, yeah.

263
00:09:43,080 --> 00:09:44,640
And that's what I think will ultimately

264
00:09:44,640 --> 00:09:49,720
lead to more robust AI systems, AI that can actually

265
00:09:49,720 --> 00:09:51,240
thrive in the real world.

266
00:09:51,240 --> 00:09:52,720
Well, I think we've given our listeners

267
00:09:52,720 --> 00:09:54,280
a lot to think about today.

268
00:09:54,280 --> 00:09:55,920
Thanks for joining us for this deep dive

269
00:09:55,920 --> 00:09:58,120
into the world of model-based transfer learning.

270
00:09:58,120 --> 00:09:59,920
It's been a fascinating conversation.

271
00:09:59,920 --> 00:10:02,040
And I have a feeling we're going to be hearing a lot more

272
00:10:02,040 --> 00:10:06,120
about this kind of research in the very near future.

