1
00:00:00,000 --> 00:00:01,640
All right, strap in everyone.

2
00:00:01,640 --> 00:00:04,480
Today we are going deep on a paper all about,

3
00:00:04,480 --> 00:00:06,360
well, how AI remembers things.

4
00:00:06,360 --> 00:00:08,040
You mean remembers like us?

5
00:00:08,040 --> 00:00:10,960
Well, kind of, but not really.

6
00:00:10,960 --> 00:00:13,120
Specifically, it's about transformers,

7
00:00:13,120 --> 00:00:14,720
those powerful AI models.

8
00:00:14,720 --> 00:00:15,720
Ah.

9
00:00:15,720 --> 00:00:16,560
The paper is called

10
00:00:16,560 --> 00:00:20,080
An Evolved Universal Transformer Memory.

11
00:00:20,080 --> 00:00:21,000
Catchy, right?

12
00:00:21,000 --> 00:00:22,240
Definitely grabs your attention.

13
00:00:22,240 --> 00:00:23,440
And it's a pretty important topic,

14
00:00:23,440 --> 00:00:26,480
how memory impacts an AI's ability to, you know,

15
00:00:26,480 --> 00:00:28,240
learn and solve problems.

16
00:00:28,240 --> 00:00:29,080
It is.

17
00:00:29,080 --> 00:00:31,040
One of the big challenges is figuring out

18
00:00:31,040 --> 00:00:33,480
how to make these transformers smarter,

19
00:00:33,480 --> 00:00:36,000
but without making them super expensive to run,

20
00:00:36,000 --> 00:00:37,080
computationally speaking.

21
00:00:37,080 --> 00:00:38,560
Yeah, that's the balance, right?

22
00:00:38,560 --> 00:00:40,280
Totally, because, you know,

23
00:00:40,280 --> 00:00:42,160
the more a transformer can remember,

24
00:00:42,160 --> 00:00:44,000
like the bigger its context window,

25
00:00:44,000 --> 00:00:45,840
the better it usually performs.

26
00:00:45,840 --> 00:00:48,720
But all that memory requires a ton of computational power.

27
00:00:48,720 --> 00:00:50,280
Exactly, it gets costly.

28
00:00:50,280 --> 00:00:53,440
So this paper, they're proposing a pretty different approach.

29
00:00:53,440 --> 00:00:55,960
Instead of just telling the AI what to remember,

30
00:00:55,960 --> 00:00:57,840
you know, manually programming it.

31
00:00:57,840 --> 00:00:59,000
And what do they do?

32
00:00:59,000 --> 00:01:02,240
Well, they call it Neural Attention Memory Models.

33
00:01:02,240 --> 00:01:03,440
Yeah, NAMMs.

34
00:01:03,440 --> 00:01:05,760
It's basically using an evolutionary algorithm

35
00:01:05,760 --> 00:01:08,760
to teach AI how to manage its own memory.

36
00:01:08,760 --> 00:01:12,640
So it's like survival of the fittest, but for AI memories.

37
00:01:12,640 --> 00:01:16,080
The most important info gets to stay, the rest gets tossed.

38
00:01:16,080 --> 00:01:17,440
Uh-huh, you got it.

39
00:01:17,440 --> 00:01:20,920
So these NAMMs analyze what the AI is focusing on,

40
00:01:20,920 --> 00:01:23,840
what it's paying attention to, and then, get this,

41
00:01:23,840 --> 00:01:27,920
they use a technique from audio processing, a spectrogram.

42
00:01:27,920 --> 00:01:30,000
Wait a second, spectrograms?

43
00:01:30,000 --> 00:01:32,000
Like those visualizations of sound waves?

44
00:01:32,000 --> 00:01:34,560
What do those have to do with AI and memory?

45
00:01:34,560 --> 00:01:37,000
I know, it sounds strange, but it's really clever.

46
00:01:37,000 --> 00:01:40,440
By taking the AI's focus and turning that into a spectrogram,

47
00:01:40,440 --> 00:01:42,960
they can train the NAMMs to prune away

48
00:01:42,960 --> 00:01:46,200
all the unimportant stuff, keeping only the key information.

49
00:01:46,200 --> 00:01:48,080
And this works across different tasks too,

50
00:01:48,080 --> 00:01:49,640
not just in one specific area.

51
00:01:49,640 --> 00:01:52,440
So this model, it learns how to manage memory in a way

52
00:01:52,440 --> 00:01:54,040
that works for lots of different things.

53
00:01:54,040 --> 00:01:55,600
Universal, like the title says.

54
00:01:55,600 --> 00:01:57,480
Right, that's what makes it so interesting.

55
00:01:57,480 --> 00:02:00,640
They focused mostly on transformers in the paper,

56
00:02:00,640 --> 00:02:02,040
but they think it could be adapted

57
00:02:02,040 --> 00:02:04,160
to other kinds of AI models too.

58
00:02:04,160 --> 00:02:07,480
Wow, that is like next level.

59
00:02:07,480 --> 00:02:09,520
So it's not just about making transformers better,

60
00:02:09,520 --> 00:02:12,240
it's about giving any AI the ability

61
00:02:12,240 --> 00:02:15,280
to manage its own memory, regardless of the specific task.

62
00:02:15,280 --> 00:02:17,880
Yeah, exactly, it's about equipping AI

63
00:02:17,880 --> 00:02:20,120
with the tools to handle memory independently.

64
00:02:20,120 --> 00:02:22,120
Okay, this is blowing my mind a little.

65
00:02:22,120 --> 00:02:24,120
But let's get down to brass tacks.

66
00:02:24,120 --> 00:02:27,040
Did they actually see any real improvements,

67
00:02:27,040 --> 00:02:29,080
like in how well the AI performed

68
00:02:29,080 --> 00:02:30,880
and how efficiently it used resources?

69
00:02:30,880 --> 00:02:33,560
Oh yeah, they did, they tested it on this benchmark

70
00:02:33,560 --> 00:02:37,080
called InfiniteBench, which uses these super long contexts,

71
00:02:37,080 --> 00:02:38,600
like way more information than usual.

72
00:02:38,600 --> 00:02:39,440
What did they find?

73
00:02:39,440 --> 00:02:41,480
Well, they saw a 10-fold improvement,

74
00:02:41,480 --> 00:02:43,520
like 10 times better performance.

75
00:02:43,520 --> 00:02:44,360
No way.

76
00:02:44,360 --> 00:02:48,320
And get this, when they applied NAMMs to video analysis,

77
00:02:48,320 --> 00:02:50,360
the model figured out how to prioritize

78
00:02:50,360 --> 00:02:52,320
the important language instructions

79
00:02:52,320 --> 00:02:55,080
and just ditch all the redundant video frames.

80
00:02:55,080 --> 00:02:57,520
So it learned to pay attention to the right things

81
00:02:57,520 --> 00:02:58,720
and ignore the noise.

82
00:02:59,600 --> 00:03:01,040
That's seriously impressive.

83
00:03:02,720 --> 00:03:04,400
It sounds like these NAMMs are picking up

84
00:03:04,400 --> 00:03:07,080
some pretty clever tricks when it comes to memory.

85
00:03:07,080 --> 00:03:10,040
But did they compare this approach to other methods?

86
00:03:10,040 --> 00:03:12,560
Like how do we know this is better than what we already had?

87
00:03:12,560 --> 00:03:13,440
Good question.

88
00:03:13,440 --> 00:03:15,280
They compared it to three other methods

89
00:03:15,280 --> 00:03:16,640
that people are already using,

90
00:03:16,640 --> 00:03:19,320
and all of those relied on manually designed rules

91
00:03:19,320 --> 00:03:20,720
to manage memory.

92
00:03:20,720 --> 00:03:22,400
Okay, so someone had to program in

93
00:03:22,400 --> 00:03:24,400
exactly what the AI should remember.

94
00:03:24,400 --> 00:03:25,240
Yep, that's right.

95
00:03:25,240 --> 00:03:26,920
And the results were pretty clear.

96
00:03:26,920 --> 00:03:28,880
NAMMs did way better,

97
00:03:28,880 --> 00:03:30,680
both in terms of getting the right answers

98
00:03:30,680 --> 00:03:33,120
and using resources more efficiently.

99
00:03:33,120 --> 00:03:34,920
So this whole idea of letting the AI

100
00:03:34,920 --> 00:03:36,680
learn its own memory management,

101
00:03:36,680 --> 00:03:39,600
instead of forcing it to follow some pre-programmed rules,

102
00:03:39,600 --> 00:03:40,640
it actually works better.

103
00:03:40,640 --> 00:03:41,880
That's a major takeaway.

104
00:03:41,880 --> 00:03:44,880
They're essentially allowing the AI to adapt,

105
00:03:44,880 --> 00:03:46,440
to adjust its memory strategy,

106
00:03:46,440 --> 00:03:47,880
depending on what it's trying to do.

107
00:03:47,880 --> 00:03:50,000
And that leads to better results overall.

108
00:03:50,000 --> 00:03:51,960
Okay, now I'm really curious.

109
00:03:51,960 --> 00:03:53,800
How does all this actually work?

110
00:03:53,800 --> 00:03:56,360
What are the nuts and bolts of these NAMMs?

111
00:03:56,360 --> 00:03:58,680
Like, walk me through it step by step.

112
00:03:58,680 --> 00:04:01,200
Sure, so first we need to understand this thing

113
00:04:01,200 --> 00:04:02,960
called the attention matrix.

114
00:04:02,960 --> 00:04:03,960
Attention matrix.

115
00:04:03,960 --> 00:04:05,240
Sounds kind of sci-fi.

116
00:04:05,240 --> 00:04:06,440
A little bit, yeah.

117
00:04:06,440 --> 00:04:09,040
But it's really just a way of visualizing

118
00:04:09,040 --> 00:04:11,120
what the AI is focusing on.

119
00:04:11,120 --> 00:04:12,680
You know how when we read a sentence,

120
00:04:12,680 --> 00:04:15,640
we pay attention to certain words more than others?

121
00:04:15,640 --> 00:04:16,640
Yeah, the important ones.

122
00:04:16,640 --> 00:04:18,680
Right, and AI models do the same thing.

123
00:04:18,680 --> 00:04:21,840
They learn to focus on the most relevant information.

124
00:04:21,840 --> 00:04:24,600
The attention matrix is like a map of that focus.

125
00:04:24,600 --> 00:04:26,480
So it's like looking into the AI's brain,

126
00:04:26,480 --> 00:04:28,360
seeing what it thinks is important.

127
00:04:28,360 --> 00:04:30,920
Exactly, and that's where the NAMMs come in.

128
00:04:30,920 --> 00:04:32,560
They look at this attention matrix,

129
00:04:32,560 --> 00:04:35,160
and then they use a technique called token pruning.

130
00:04:35,160 --> 00:04:36,800
Pruning, like trimming a tree.

131
00:04:36,800 --> 00:04:37,720
Kind of, yeah.

132
00:04:37,720 --> 00:04:40,640
It's about getting rid of the less important information,

133
00:04:40,640 --> 00:04:43,000
keeping only the stuff that really matters.

134
00:04:43,000 --> 00:04:45,640
And that makes the AI much more efficient and effective.

135
00:04:45,640 --> 00:04:47,840
Okay, so we've got an evolutionary algorithm

136
00:04:47,840 --> 00:04:49,720
training these NAMMs.

137
00:04:49,720 --> 00:04:53,600
Then they analyze what the AI is focused on,

138
00:04:53,600 --> 00:04:56,160
and they prune away the unimportant stuff.

139
00:04:56,160 --> 00:04:57,280
Makes sense.

140
00:04:57,280 --> 00:04:59,600
But I'm still stuck on the spectrogram thing.

141
00:04:59,600 --> 00:05:03,720
How does analyzing sound waves help with AI memory?

142
00:05:03,720 --> 00:05:05,280
I just don't get that connection.

143
00:05:05,280 --> 00:05:07,280
That's the really cool part of this paper.

144
00:05:07,280 --> 00:05:10,560
They figured out that by representing the AI's attention

145
00:05:10,560 --> 00:05:12,000
as a spectrogram, you know,

146
00:05:12,000 --> 00:05:14,600
that technique used for sound analysis,

147
00:05:14,600 --> 00:05:17,320
they could make it much easier for the NAMMs

148
00:05:17,320 --> 00:05:19,440
to learn how to manage memory.

149
00:05:19,440 --> 00:05:23,080
Hold up, they're using sound analysis to study AI memory.

150
00:05:23,080 --> 00:05:23,920
That's wild.

151
00:05:23,920 --> 00:05:24,920
I know, right?

152
00:05:24,920 --> 00:05:26,520
But it seems to work really well.

153
00:05:26,520 --> 00:05:28,160
They found that by using these spectrograms,

154
00:05:28,160 --> 00:05:31,160
the NAMMs got way better at figuring out

155
00:05:31,160 --> 00:05:34,120
which information to keep and which to toss out.

156
00:05:34,120 --> 00:05:35,440
This is blowing my mind.

157
00:05:35,440 --> 00:05:36,560
Okay, before we go any further,

158
00:05:36,560 --> 00:05:37,960
let's just do a quick recap.

159
00:05:37,960 --> 00:05:39,960
We started by talking about this problem

160
00:05:39,960 --> 00:05:43,240
of managing memory in transformer models.

161
00:05:43,240 --> 00:05:44,760
Like, bigger isn't always better

162
00:05:44,760 --> 00:05:46,160
when it comes to context windows.

163
00:05:46,160 --> 00:05:47,000
Right.

164
00:05:47,000 --> 00:05:48,800
Then we dove into these NAMMs,

165
00:05:48,800 --> 00:05:50,120
these evolved memory models

166
00:05:50,120 --> 00:05:52,560
that learned to pick out the important information

167
00:05:52,560 --> 00:05:54,520
by looking at what the AI focuses on

168
00:05:54,520 --> 00:05:56,960
and strategically forgetting the less important stuff.

169
00:05:56,960 --> 00:05:58,680
And they tested it against those other methods,

170
00:05:58,680 --> 00:06:00,360
the ones with the pre-programmed rules,

171
00:06:00,360 --> 00:06:02,520
and it totally blew them out of the water,

172
00:06:02,520 --> 00:06:05,000
both in terms of performance and efficiency.

173
00:06:05,000 --> 00:06:06,280
You got it.

174
00:06:06,280 --> 00:06:08,040
But what really got me excited

175
00:06:08,040 --> 00:06:10,120
was this whole thing with spectrograms.

176
00:06:10,120 --> 00:06:13,240
Like, they borrowed this tool from sound analysis

177
00:06:13,240 --> 00:06:15,840
to make memory management even more efficient.

178
00:06:15,840 --> 00:06:18,920
Yeah, that's some seriously creative problem solving.

179
00:06:18,920 --> 00:06:21,320
But we've only just scratched the surface of this paper.

180
00:06:21,320 --> 00:06:22,640
I know, right?

181
00:06:22,640 --> 00:06:24,720
There's still so much more to explore.

182
00:06:24,720 --> 00:06:27,680
Like, what about this whole zero-shot transfer thing?

183
00:06:27,680 --> 00:06:30,680
And what are the real-world implications of all this?

184
00:06:30,680 --> 00:06:32,280
We'll have to dive into that next time.

185
00:06:32,280 --> 00:06:33,280
Welcome back.

186
00:06:33,280 --> 00:06:34,760
We're in the middle of exploring

187
00:06:34,760 --> 00:06:38,080
these evolved memory models, NAMMs.

188
00:06:38,080 --> 00:06:40,400
It's incredible how they analyze attention

189
00:06:40,400 --> 00:06:42,480
and prune information.

190
00:06:42,480 --> 00:06:43,480
But I wanna know,

191
00:06:43,480 --> 00:06:45,760
where could we actually use this technology?

192
00:06:45,760 --> 00:06:47,400
What kind of impact could it have?

193
00:06:47,400 --> 00:06:48,560
Yeah, exactly.

194
00:06:48,560 --> 00:06:50,640
It's cool to see all these impressive benchmark results,

195
00:06:50,640 --> 00:06:52,360
but let's get real.

196
00:06:52,360 --> 00:06:54,240
Are we talking actual real-world impact

197
00:06:54,240 --> 00:06:56,200
or just theoretical possibilities?

198
00:06:56,200 --> 00:06:57,760
I think this research could really change

199
00:06:57,760 --> 00:06:59,720
a lot of different AI applications.

200
00:06:59,720 --> 00:07:01,560
One area that immediately comes to mind

201
00:07:01,560 --> 00:07:03,800
is natural language processing.

202
00:07:03,800 --> 00:07:05,760
Okay, so like chatbots and stuff.

203
00:07:05,760 --> 00:07:06,760
Exactly.

204
00:07:06,760 --> 00:07:09,200
Imagine chatbots that can actually remember

205
00:07:09,200 --> 00:07:10,800
your past conversations,

206
00:07:10,800 --> 00:07:12,320
or language translation tools

207
00:07:12,320 --> 00:07:14,680
that are way more accurate, more nuanced.

208
00:07:14,680 --> 00:07:18,040
So Siri, but a million times better.

209
00:07:18,040 --> 00:07:18,880
Yeah.

210
00:07:18,880 --> 00:07:20,440
Able to understand all my weird requests.

211
00:07:20,440 --> 00:07:21,960
Ah, something like that.

212
00:07:21,960 --> 00:07:23,720
With this better memory management,

213
00:07:23,720 --> 00:07:26,560
these AI systems could be so much more personalized,

214
00:07:26,560 --> 00:07:27,720
so much more helpful.

215
00:07:27,720 --> 00:07:29,320
Okay, I see where you're going with this.

216
00:07:29,320 --> 00:07:31,200
Like customer service bots

217
00:07:31,200 --> 00:07:32,800
that actually remember who you are,

218
00:07:32,800 --> 00:07:34,320
what you talked about before.

219
00:07:34,320 --> 00:07:36,360
Or imagine educational tools

220
00:07:36,360 --> 00:07:38,080
that adapt to how you learn best.

221
00:07:38,080 --> 00:07:38,920
Exactly.

222
00:07:38,920 --> 00:07:40,280
Those are some great examples.

223
00:07:40,280 --> 00:07:42,640
What about stuff beyond just language?

224
00:07:42,640 --> 00:07:46,120
Could these NAMMs be used for other types of AI too?

225
00:07:46,120 --> 00:07:47,080
Oh, absolutely.

226
00:07:47,080 --> 00:07:48,120
The researchers talked about

227
00:07:48,120 --> 00:07:50,120
a lot of potential applications in fields

228
00:07:50,120 --> 00:07:52,880
like robotics and computer vision.

229
00:07:52,880 --> 00:07:54,320
Hold on, robots with memories?

230
00:07:54,320 --> 00:07:55,240
Well, think about it.

231
00:07:55,240 --> 00:07:58,040
Robots that can learn new tasks way faster

232
00:07:58,040 --> 00:08:00,760
because they remember their past experiences.

233
00:08:00,760 --> 00:08:02,320
Or computer vision systems

234
00:08:02,320 --> 00:08:04,400
that can analyze really complex scenes,

235
00:08:04,400 --> 00:08:06,000
but much more efficiently.

236
00:08:06,000 --> 00:08:08,880
So robots that learn from their mistakes

237
00:08:08,880 --> 00:08:11,160
and adapt to new environments.

238
00:08:11,160 --> 00:08:13,600
That sounds like, I don't know, something out of a movie.

239
00:08:13,600 --> 00:08:15,760
It might be closer to reality than you think.

240
00:08:15,760 --> 00:08:17,360
And this research is definitely

241
00:08:17,360 --> 00:08:19,120
a step in the right direction.

242
00:08:19,120 --> 00:08:22,360
By giving AI the ability to handle its own memory,

243
00:08:22,360 --> 00:08:24,680
we open up so many possibilities

244
00:08:24,680 --> 00:08:27,160
for how these systems learn and interact

245
00:08:27,160 --> 00:08:28,520
with the world around them.

246
00:08:28,520 --> 00:08:30,040
This is all incredibly exciting,

247
00:08:30,040 --> 00:08:31,920
but I have to ask,

248
00:08:31,920 --> 00:08:35,320
were there any limitations to this research?

249
00:08:35,320 --> 00:08:37,600
Or areas where they could improve?

250
00:08:37,600 --> 00:08:39,160
No research is perfect, right?

251
00:08:39,160 --> 00:08:40,000
You're right.

252
00:08:40,000 --> 00:08:41,520
They did point out some limitations.

253
00:08:41,520 --> 00:08:42,920
One of the big ones is that, you know,

254
00:08:42,920 --> 00:08:44,920
they mostly focused on language tasks

255
00:08:44,920 --> 00:08:46,560
for training the NAMMs.

256
00:08:46,560 --> 00:08:48,080
So we need more research to see

257
00:08:48,080 --> 00:08:49,760
how well they perform in other areas.

258
00:08:49,760 --> 00:08:50,600
Yeah, exactly.

259
00:08:50,600 --> 00:08:52,600
It's still early days for this technology.

260
00:08:52,600 --> 00:08:54,120
Another thing they mentioned was the cost

261
00:08:54,120 --> 00:08:55,560
of actually training these NAMMs.

262
00:08:55,560 --> 00:08:57,760
It can still be quite resource intensive, you know,

263
00:08:57,760 --> 00:08:59,880
even though they make running the AI models

264
00:08:59,880 --> 00:09:01,480
themselves much cheaper.

265
00:09:01,480 --> 00:09:02,760
Well, that makes sense.

266
00:09:02,760 --> 00:09:04,080
Training any AI model

267
00:09:04,080 --> 00:09:06,000
takes a lot of computing power, right?

268
00:09:06,000 --> 00:09:07,800
Especially when you're dealing with something as complex

269
00:09:07,800 --> 00:09:09,880
as evolutionary algorithms.

270
00:09:09,880 --> 00:09:12,240
But it sounds like the benefits down the line,

271
00:09:12,240 --> 00:09:15,600
in terms of efficiency, could outweigh the initial costs.

272
00:09:15,600 --> 00:09:16,440
That's the hope.

273
00:09:16,440 --> 00:09:17,640
And the researchers believe that

274
00:09:17,640 --> 00:09:20,080
as this technology gets more refined,

275
00:09:20,080 --> 00:09:23,280
the training process will become more efficient too.

276
00:09:23,280 --> 00:09:24,480
So we've got this new way

277
00:09:24,480 --> 00:09:26,440
of approaching AI memory management

278
00:09:26,440 --> 00:09:29,080
with potential uses in all sorts of fields.

279
00:09:29,080 --> 00:09:31,440
But I'm also thinking about the bigger picture.

280
00:09:31,440 --> 00:09:32,960
What does this research mean

281
00:09:32,960 --> 00:09:35,680
for the future of AI in general?

282
00:09:35,680 --> 00:09:38,720
Well, I think this work is really pushing the boundaries

283
00:09:38,720 --> 00:09:40,280
of what AI can do.

284
00:09:40,280 --> 00:09:41,920
One of the most exciting aspects

285
00:09:41,920 --> 00:09:44,440
is this whole concept of zero-shot transfer.

286
00:09:44,440 --> 00:09:45,520
Zero-shot transfer.

287
00:09:45,520 --> 00:09:47,040
Refresh my memory, what was that again?

288
00:09:47,040 --> 00:09:50,240
It means you can train an AI model on one task

289
00:09:50,240 --> 00:09:52,240
and then use it for a totally different task

290
00:09:52,240 --> 00:09:53,800
without any additional training.

291
00:09:53,800 --> 00:09:55,240
Whoa.

292
00:09:55,240 --> 00:09:57,400
So you're saying, like, you could train these NAMMs

293
00:09:57,400 --> 00:10:00,960
on language data and then use them to improve

294
00:10:00,960 --> 00:10:03,160
memory management in a robot, for example.

295
00:10:03,160 --> 00:10:04,000
Exactly.

296
00:10:04,000 --> 00:10:06,920
And that kind of flexibility, that adaptability,

297
00:10:06,920 --> 00:10:11,400
is a huge step towards creating more versatile AI systems.

298
00:10:11,400 --> 00:10:12,960
It's like moving away from AI

299
00:10:12,960 --> 00:10:15,000
that's good at one specific thing

300
00:10:15,000 --> 00:10:17,080
and toward AI that can learn and adapt

301
00:10:17,080 --> 00:10:18,560
to all sorts of situations.

302
00:10:18,560 --> 00:10:19,600
You got it.

303
00:10:19,600 --> 00:10:22,000
And that shift could completely revolutionize

304
00:10:22,000 --> 00:10:25,000
how we develop and interact with AI in the future.

305
00:10:25,000 --> 00:10:26,640
This is mind-blowing stuff.

306
00:10:26,640 --> 00:10:29,120
But I wanna circle back to the specifics of this paper.

307
00:10:29,120 --> 00:10:31,400
They talked about like digging into the why

308
00:10:31,400 --> 00:10:33,800
behind how these NAMMs actually work.

309
00:10:33,800 --> 00:10:35,080
Did they uncover anything interesting

310
00:10:35,080 --> 00:10:37,400
about the AI's thought process?

311
00:10:37,400 --> 00:10:38,360
They did.

312
00:10:38,360 --> 00:10:40,840
One of the things they looked at was how memory

313
00:10:40,840 --> 00:10:44,280
is distributed across different layers of the AI network.

314
00:10:44,280 --> 00:10:46,040
Like different parts of the AI's brain.

315
00:10:46,040 --> 00:10:46,880
Exactly.

316
00:10:46,880 --> 00:10:49,200
And they found that some layers tend to hold on

317
00:10:49,200 --> 00:10:51,600
to information from further back in time.

318
00:10:51,600 --> 00:10:55,000
So certain parts of the AI are better at long-term memory.

319
00:10:55,000 --> 00:10:56,120
That's what it seems like.

320
00:10:56,120 --> 00:10:58,640
And it suggests that these layers are crucial

321
00:10:58,640 --> 00:11:00,680
for understanding complex relationships,

322
00:11:00,680 --> 00:11:02,400
the ones that span across a lot of data.

323
00:11:02,400 --> 00:11:05,280
It's like the AI is developing its own sense of history

324
00:11:05,280 --> 00:11:07,120
which helps it make better decisions.

325
00:11:07,120 --> 00:11:09,040
Whoa, that's deep.

326
00:11:09,040 --> 00:11:11,520
It's like we're watching this new form of intelligence

327
00:11:11,520 --> 00:11:15,600
emerge, shaped by its own unique experiences and memories.

328
00:11:15,600 --> 00:11:17,240
Did they find anything else interesting

329
00:11:17,240 --> 00:11:19,840
about how these NAMMs manage memory?

330
00:11:19,840 --> 00:11:22,880
They also noticed that the way NAMMs get rid of information,

331
00:11:22,880 --> 00:11:25,200
the way they prune, it actually varies depending

332
00:11:25,200 --> 00:11:26,720
on what tasks they're doing.

333
00:11:26,720 --> 00:11:27,560
Makes sense.

334
00:11:27,560 --> 00:11:29,000
So it adapts its strategy.

335
00:11:29,000 --> 00:11:29,840
Yeah.

336
00:11:29,840 --> 00:11:31,640
For example, with code completion tasks,

337
00:11:31,640 --> 00:11:34,960
they found that NAMMs removed more of the redundant bits

338
00:11:34,960 --> 00:11:36,960
compared to, say, natural language tasks.

339
00:11:36,960 --> 00:11:39,560
Well, code has to be super precise and efficient.

340
00:11:39,560 --> 00:11:41,040
So it makes sense that the memory management

341
00:11:41,040 --> 00:11:41,880
would be different.

342
00:11:41,880 --> 00:11:44,440
It's like the NAMMs are learning to think like programmers.

343
00:11:44,440 --> 00:11:45,920
Ha ha, exactly.

344
00:11:45,920 --> 00:11:49,000
And that's a big part of what makes this research so cool.

345
00:11:49,000 --> 00:11:51,160
These NAMMs aren't just remembering everything.

346
00:11:51,160 --> 00:11:53,680
They're learning what's important, what to prioritize,

347
00:11:53,680 --> 00:11:57,200
and they adjust their strategy based on the specific tasks.

348
00:11:57,200 --> 00:11:58,440
Just like we do, right?

349
00:11:58,440 --> 00:12:01,920
As humans, we don't remember every detail of every experience.

350
00:12:01,920 --> 00:12:03,520
We focus on what matters most.

351
00:12:03,520 --> 00:12:04,040
Right.

352
00:12:04,040 --> 00:12:06,160
And that's crucial for intelligent behavior,

353
00:12:06,160 --> 00:12:09,000
whether you're talking about humans or AI.

354
00:12:09,000 --> 00:12:10,800
This is all painting a really interesting picture,

355
00:12:10,800 --> 00:12:15,320
like how these NAMMs work and what they could be capable of.

356
00:12:15,320 --> 00:12:17,040
But what about the data they used?

357
00:12:17,040 --> 00:12:19,400
Did they mention anything specific about the data itself?

358
00:12:19,400 --> 00:12:20,400
They did.

359
00:12:20,400 --> 00:12:21,720
One of the things they highlighted

360
00:12:21,720 --> 00:12:26,000
was this new benchmark dataset they created called ChouBun.

361
00:12:26,000 --> 00:12:26,960
ChouBun.

362
00:12:26,960 --> 00:12:27,600
That's a cool name.

363
00:12:27,600 --> 00:12:29,240
What's so special about this data set?

364
00:12:29,240 --> 00:12:31,240
Well, it's specifically designed to test

365
00:12:31,240 --> 00:12:35,680
how well AI can understand long stretches of text in Japanese.

366
00:12:35,680 --> 00:12:36,760
Ah, I see.

367
00:12:36,760 --> 00:12:39,760
So it's about expanding the scope of language understanding

368
00:12:39,760 --> 00:12:40,480
for AI.

369
00:12:40,480 --> 00:12:41,480
Exactly.

370
00:12:41,480 --> 00:12:44,160
Most of the long-context language benchmarks out there

371
00:12:44,160 --> 00:12:46,440
focus on English or Chinese.

372
00:12:46,440 --> 00:12:49,960
So ChouBun helps to make sure AI development is inclusive,

373
00:12:49,960 --> 00:12:52,320
you know, that it benefits speakers of different languages.

374
00:12:52,320 --> 00:12:53,480
That's fantastic.

375
00:12:53,480 --> 00:12:56,000
Did they talk about how the NAMMs performed

376
00:12:56,000 --> 00:12:57,240
on this new data set?

377
00:12:57,240 --> 00:12:59,360
They did, and the results were great.

378
00:12:59,360 --> 00:13:02,000
The NAMMs showed significant improvements

379
00:13:02,000 --> 00:13:04,360
compared to other memory management techniques.

380
00:13:04,360 --> 00:13:06,760
It shows how well they can adapt to a new language.

381
00:13:06,760 --> 00:13:08,240
This is amazing.

382
00:13:08,240 --> 00:13:10,640
It's incredible to see this research pushing the boundaries

383
00:13:10,640 --> 00:13:14,320
of AI in so many different ways, from creating these new memory

384
00:13:14,320 --> 00:13:18,280
management techniques to expanding language understanding

385
00:13:18,280 --> 00:13:20,640
to making sure different languages are included.

386
00:13:20,640 --> 00:13:22,960
It feels like a real turning point for AI.

387
00:13:22,960 --> 00:13:23,840
I agree.

388
00:13:23,840 --> 00:13:26,960
This research has the potential to change how we build and use

389
00:13:26,960 --> 00:13:28,200
AI systems in the future.

390
00:13:28,200 --> 00:13:29,960
This has been an awesome journey so far,

391
00:13:29,960 --> 00:13:31,280
learning about AI memory.

392
00:13:31,280 --> 00:13:33,320
We've covered a lot of ground, but there's still

393
00:13:33,320 --> 00:13:34,440
so much more to this paper.

394
00:13:34,440 --> 00:13:35,360
I know, right?

395
00:13:35,360 --> 00:13:37,080
There's still so much to uncover.

396
00:13:37,080 --> 00:13:40,200
Stay tuned, because we're going to wrap up our deep dive

397
00:13:40,200 --> 00:13:43,720
into An Evolved Universal Transformer Memory

398
00:13:43,720 --> 00:13:46,440
with some final thoughts and key takeaways.

399
00:13:46,440 --> 00:13:48,840
And we're back for the final part of our deep dive.

400
00:13:48,840 --> 00:13:51,280
We've been talking all about this amazing paper,

401
00:13:51,280 --> 00:13:54,080
An Evolved Universal Transformer Memory,

402
00:13:54,080 --> 00:13:56,360
specifically those evolved memory models.

403
00:13:56,360 --> 00:13:57,840
The NAMMs.

404
00:13:57,840 --> 00:13:59,400
What a wild ride this has been.

405
00:13:59,400 --> 00:14:00,480
It really has.

406
00:14:00,480 --> 00:14:03,400
From all those technical details to those mind-blowing

407
00:14:03,400 --> 00:14:06,760
potential uses to, I don't know, even touching on some pretty

408
00:14:06,760 --> 00:14:09,280
deep philosophical stuff about AI and memory.

409
00:14:09,280 --> 00:14:10,200
Right.

410
00:14:10,200 --> 00:14:12,800
I think we can both agree this research is impressive.

411
00:14:12,800 --> 00:14:15,880
But before we wrap up, I want to zoom out a little.

412
00:14:15,880 --> 00:14:19,400
What does this all mean for the future of AI?

413
00:14:19,400 --> 00:14:20,280
The big picture.

414
00:14:20,280 --> 00:14:23,640
Well, to me, this paper shows a pretty big shift in how we

415
00:14:23,640 --> 00:14:25,440
think about and develop AI.

416
00:14:25,440 --> 00:14:28,480
We're moving beyond just throwing tons of data at an AI

417
00:14:28,480 --> 00:14:29,720
and hoping for the best.

418
00:14:29,720 --> 00:14:31,560
Yeah, it's not just about brute force anymore.

419
00:14:31,560 --> 00:14:32,120
Right.

420
00:14:32,120 --> 00:14:35,600
Now we're talking about AI systems that can learn how to learn,

421
00:14:35,600 --> 00:14:37,080
how to figure out what's important,

422
00:14:37,080 --> 00:14:38,640
how to adapt to new stuff.

423
00:14:38,640 --> 00:14:42,440
It's almost like we're giving AI the keys to its own mind,

424
00:14:42,440 --> 00:14:45,280
empowering it to be more in control of its own development.

425
00:14:45,280 --> 00:14:46,440
Exactly.

426
00:14:46,440 --> 00:14:48,680
And that has some pretty huge implications

427
00:14:48,680 --> 00:14:51,360
for the future of intelligence itself,

428
00:14:51,360 --> 00:14:53,640
both for AI and for humans.

429
00:14:53,640 --> 00:14:56,720
As these AI systems get better at managing their own memories,

430
00:14:56,720 --> 00:14:58,760
they'll be able to work with us in deeper ways,

431
00:14:58,760 --> 00:15:00,480
solve even more complex problems,

432
00:15:00,480 --> 00:15:03,160
maybe even help us understand ourselves better.

433
00:15:03,160 --> 00:15:06,240
It's like we're on the verge of this whole new partnership,

434
00:15:06,240 --> 00:15:09,000
humans and AI working together to push

435
00:15:09,000 --> 00:15:11,800
the boundaries of what's possible in terms

436
00:15:11,800 --> 00:15:13,200
of knowledge and creativity.

437
00:15:13,200 --> 00:15:14,200
But let's be real first.

438
00:15:14,200 --> 00:15:16,360
I get that this research is still pretty new.

439
00:15:16,360 --> 00:15:17,560
What are the next steps?

440
00:15:17,560 --> 00:15:19,160
Where does this all go from here?

441
00:15:19,160 --> 00:15:20,840
Oh, there's so many possibilities.

442
00:15:20,840 --> 00:15:24,480
One thing is to see if these NAMMs work with other types

443
00:15:24,480 --> 00:15:26,640
of AI models, not just transformers.

444
00:15:26,640 --> 00:15:30,040
So expanding their memory magic to other AI species,

445
00:15:30,040 --> 00:15:30,560
so to speak.

446
00:15:30,560 --> 00:15:31,920
Exactly.

447
00:15:31,920 --> 00:15:33,680
We need to know if this approach works

448
00:15:33,680 --> 00:15:36,880
for other AI architectures, other kinds of tasks.

449
00:15:36,880 --> 00:15:38,720
Another exciting direction is to develop

450
00:15:38,720 --> 00:15:41,240
even more advanced evolutionary algorithms,

451
00:15:41,240 --> 00:15:44,400
ones that can train these NAMMs even more effectively.

452
00:15:44,400 --> 00:15:47,840
Survival of the fittest, but for AI memories on steroids.

453
00:15:47,840 --> 00:15:49,640
Uh-huh, that's a good way to put it.

454
00:15:49,640 --> 00:15:51,080
And as we push these boundaries,

455
00:15:51,080 --> 00:15:53,720
we've got to be mindful of the impact this has on society.

456
00:15:53,720 --> 00:15:56,320
Make sure these advancements benefit everyone.

457
00:15:56,320 --> 00:15:59,000
AI should be a tool for good, helping us

458
00:15:59,000 --> 00:16:01,720
build a better future, not just for a few,

459
00:16:01,720 --> 00:16:03,280
but for all of humanity.

460
00:16:03,280 --> 00:16:04,080
Definitely.

461
00:16:04,080 --> 00:16:06,760
This research shows us that the future of AI

462
00:16:06,760 --> 00:16:08,800
isn't set in stone.

463
00:16:08,800 --> 00:16:11,160
It's up to us to guide it in a direction

464
00:16:11,160 --> 00:16:14,760
that aligns with our values, what we want to see in the world.

465
00:16:14,760 --> 00:16:17,280
So to bring it all together, this paper,

466
00:16:17,280 --> 00:16:20,160
An Evolved Universal Transformer Memory,

467
00:16:20,160 --> 00:16:25,160
gives us a glimpse into a future where AI isn't just powerful,

468
00:16:25,240 --> 00:16:27,960
but also adaptable, efficient,

469
00:16:27,960 --> 00:16:30,600
maybe even more intelligent than we can imagine right now.

470
00:16:30,600 --> 00:16:32,400
It's a future that's both exciting

471
00:16:32,400 --> 00:16:33,960
and a little bit intimidating.

472
00:16:33,960 --> 00:16:34,800
Definitely.

473
00:16:34,800 --> 00:16:36,520
And we have to approach it with a good balance

474
00:16:36,520 --> 00:16:38,560
of curiosity and caution.

475
00:16:38,560 --> 00:16:39,400
Couldn't agree more.

476
00:16:39,400 --> 00:16:41,760
Well, I think we've covered a lot of ground on this paper.

477
00:16:41,760 --> 00:16:42,680
I think so too.

478
00:16:42,680 --> 00:16:44,680
It's been fantastic exploring this with you.

479
00:16:44,680 --> 00:16:45,760
It's been a pleasure.

480
00:16:45,760 --> 00:16:47,240
And to all our listeners out there,

481
00:16:47,240 --> 00:16:49,000
thanks for joining us on this deep dive

482
00:16:49,000 --> 00:16:50,880
into the world of AI memory.

483
00:16:50,880 --> 00:17:16,280
Until next time, keep those brains buzzing.