1
00:00:00,000 --> 00:00:03,360
Okay, so imagine this AI that can understand language

2
00:00:03,360 --> 00:00:05,600
without like chopping it up into pieces first.

3
00:00:05,600 --> 00:00:07,240
Yeah, it's a pretty wild concept, right?

4
00:00:07,240 --> 00:00:08,440
Yeah, it kind of blows my mind.

5
00:00:08,440 --> 00:00:10,640
And that's exactly what we're gonna deep dive into today.

6
00:00:10,640 --> 00:00:11,480
Exactly.

7
00:00:11,480 --> 00:00:13,480
We have a fascinating paper all about

8
00:00:15,000 --> 00:00:19,000
Byte Latent Transformers, BLT for short.

9
00:00:19,000 --> 00:00:20,000
B-L-T, yeah.

10
00:00:20,000 --> 00:00:22,080
And how it's kind of shaking up the AI world.

11
00:00:22,080 --> 00:00:23,960
Yeah, so what's really interesting about this

12
00:00:23,960 --> 00:00:27,360
is this model, it actually works directly

13
00:00:27,360 --> 00:00:29,520
with the raw bytes of information.

14
00:00:29,520 --> 00:00:30,360
Oh, wow.

15
00:00:30,360 --> 00:00:32,800
So those ones and zeros that make up the digital data.

16
00:00:32,800 --> 00:00:33,640
Okay.

17
00:00:33,640 --> 00:00:35,240
It basically skips over this whole process

18
00:00:35,240 --> 00:00:38,520
of tokenization that's been a standard in AI for ages.

19
00:00:38,520 --> 00:00:42,120
So for those of us who don't speak fluent AI,

20
00:00:42,120 --> 00:00:44,400
what is tokenization and like,

21
00:00:44,400 --> 00:00:46,400
why is ditching it such a big deal?

22
00:00:46,400 --> 00:00:48,400
Yeah, that's a great question.

23
00:00:48,400 --> 00:00:50,080
So think about tokenization,

24
00:00:50,080 --> 00:00:53,640
like breaking down a sentence into individual words.

25
00:00:53,640 --> 00:00:54,480
Okay.

26
00:00:54,480 --> 00:00:56,520
AI models, they traditionally use this method

27
00:00:56,520 --> 00:00:58,000
to try to understand the text.

28
00:00:58,000 --> 00:01:00,240
But imagine you're trying to understand a new language

29
00:01:00,240 --> 00:01:01,760
or maybe a technical document

30
00:01:01,760 --> 00:01:03,320
that's filled with a bunch of jargon.

31
00:01:03,320 --> 00:01:06,440
If your dictionary doesn't have those words,

32
00:01:06,440 --> 00:01:07,640
you're kind of stuck, right?

33
00:01:07,640 --> 00:01:09,200
Yeah, you're gonna have a bad time.

34
00:01:09,200 --> 00:01:12,080
That's exactly the limitation that BLT is trying to solve.

35
00:01:12,080 --> 00:01:15,040
So BLT is kind of like learning a language

36
00:01:15,040 --> 00:01:16,960
by understanding the letters

37
00:01:16,960 --> 00:01:19,840
instead of needing like a massive dictionary of words.

38
00:01:19,840 --> 00:01:20,680
Exactly.

39
00:01:20,680 --> 00:01:22,240
It's like going back to the basics.

40
00:01:22,240 --> 00:01:23,080
Interesting.

41
00:01:23,080 --> 00:01:25,240
And that's what makes it so much more adaptable.

42
00:01:25,240 --> 00:01:28,320
It can handle new words and complex terms.

43
00:01:28,320 --> 00:01:29,160
Wow.

44
00:01:29,160 --> 00:01:31,280
Even those pesky typos without missing a beat.

45
00:01:31,280 --> 00:01:33,240
Okay, cool, but how does it actually work?

46
00:01:33,240 --> 00:01:34,960
That is the question.

47
00:01:34,960 --> 00:01:37,800
And that's where this dynamic patching thing comes in.

48
00:01:37,800 --> 00:01:39,000
Ooh, that sounds fancy.

49
00:01:39,000 --> 00:01:40,320
What is dynamic patching?

50
00:01:40,320 --> 00:01:43,640
So imagine you're packing a suitcase.

51
00:01:43,640 --> 00:01:44,680
Okay.

52
00:01:44,680 --> 00:01:47,400
You wouldn't pack a super delicate vase

53
00:01:47,400 --> 00:01:48,240
Right.

54
00:01:48,240 --> 00:01:50,080
the same way you would pack, like, a T-shirt.

55
00:01:50,080 --> 00:01:51,040
Of course not.

56
00:01:51,040 --> 00:01:53,600
BLT does something similar with bytes.

57
00:01:53,600 --> 00:01:55,320
It groups them into these little patches

58
00:01:55,320 --> 00:01:57,560
based on how complex the information is.

59
00:01:57,560 --> 00:01:59,600
So it's like it can tailor its focus

60
00:01:59,600 --> 00:02:00,720
depending on what it's reading.

61
00:02:00,720 --> 00:02:02,320
Yeah, like it's kind of reading the room.

62
00:02:02,320 --> 00:02:05,040
Spending more effort on the parts that are tricky

63
00:02:05,040 --> 00:02:06,320
and then just zipping through the stuff

64
00:02:06,320 --> 00:02:08,000
that's pretty simple.

65
00:02:08,000 --> 00:02:09,680
It's all about efficiency.

66
00:02:09,680 --> 00:02:12,040
So it's not just like blindly chewing through data.

67
00:02:12,040 --> 00:02:13,600
It's actually being smart about it.

68
00:02:13,600 --> 00:02:14,440
Exactly.

69
00:02:14,440 --> 00:02:15,280
That's so cool.

70
00:02:15,280 --> 00:02:17,040
But how does it actually turn these patches

71
00:02:17,040 --> 00:02:18,840
into something that we can understand?

72
00:02:18,840 --> 00:02:20,200
Well, that's where the magic happens.

73
00:02:20,200 --> 00:02:21,040
The magic.

74
00:02:21,040 --> 00:02:21,880
Yeah.

75
00:02:21,880 --> 00:02:25,000
BLT, it uses this three-part system.

76
00:02:25,000 --> 00:02:26,160
Okay, a three-part system.

77
00:02:26,160 --> 00:02:28,320
Yeah, first you have the local encoder,

78
00:02:28,320 --> 00:02:30,320
which basically takes those raw bytes

79
00:02:30,320 --> 00:02:32,400
and maps them into these dynamic patches.

80
00:02:32,400 --> 00:02:33,240
Okay.

81
00:02:33,240 --> 00:02:35,720
Then we have the latent global transformer.

82
00:02:35,720 --> 00:02:36,560
Okay.

83
00:02:36,560 --> 00:02:38,680
Which is like the main powerhouse of the whole system.

84
00:02:38,680 --> 00:02:41,680
This is where the actual processing happens,

85
00:02:41,680 --> 00:02:44,880
understanding the relationships between those patches

86
00:02:44,880 --> 00:02:46,360
and extracting the meaning.

87
00:02:46,360 --> 00:02:48,480
So it's like the brains of the operation.

88
00:02:48,480 --> 00:02:49,600
That's a great way to put it.

89
00:02:49,600 --> 00:02:51,080
Okay, and then finally there's a decoder

90
00:02:51,080 --> 00:02:52,360
to kind of put it all back together

91
00:02:52,360 --> 00:02:53,320
so we can understand it.

92
00:02:53,320 --> 00:02:54,160
Yeah, you got it.

93
00:02:54,160 --> 00:02:57,400
The local decoder takes those processed patches

94
00:02:57,400 --> 00:02:58,880
and turns them back into bytes.

95
00:02:58,880 --> 00:02:59,720
Okay.

96
00:02:59,720 --> 00:03:03,160
But there's this other really cool trick that BLT uses

97
00:03:03,160 --> 00:03:04,560
to really get the hang of language.

98
00:03:04,560 --> 00:03:07,720
And it's called hash n-gram embeddings.

99
00:03:07,720 --> 00:03:09,040
Hash n-what?

100
00:03:09,040 --> 00:03:10,800
I know it sounds a little intimidating.

101
00:03:10,800 --> 00:03:11,720
Yeah, a little bit.

102
00:03:11,720 --> 00:03:13,520
But it's not as scary as it sounds.

103
00:03:13,520 --> 00:03:14,360
Okay.

104
00:03:14,360 --> 00:03:16,680
Just think about it like recognizing patterns

105
00:03:16,680 --> 00:03:18,360
in letter sequences.

106
00:03:18,360 --> 00:03:19,200
Okay.

107
00:03:19,200 --> 00:03:22,840
So for example, if you see the letter T in English,

108
00:03:22,840 --> 00:03:23,680
Right.

109
00:03:23,680 --> 00:03:25,360
there's a pretty good chance the next letter

110
00:03:25,360 --> 00:03:26,560
is gonna be an H.

111
00:03:26,560 --> 00:03:27,920
Yeah, an H, yeah.

112
00:03:27,920 --> 00:03:28,920
Exactly.

113
00:03:28,920 --> 00:03:31,960
BLT uses these embeddings to kind of pick up

114
00:03:31,960 --> 00:03:34,880
on those same types of patterns, but in the bytes.

115
00:03:34,880 --> 00:03:35,720
Okay.

116
00:03:35,720 --> 00:03:37,760
Which really helps to understand the structure

117
00:03:37,760 --> 00:03:39,480
of the language at a deeper level.

118
00:03:39,480 --> 00:03:41,600
So it's not just seeing individual letters.

119
00:03:41,600 --> 00:03:43,600
It's like understanding how those letters

120
00:03:43,600 --> 00:03:44,440
usually go together.

121
00:03:44,440 --> 00:03:45,960
Exactly, it's all about context.

122
00:03:45,960 --> 00:03:47,720
Okay, that makes a lot of sense.

123
00:03:47,720 --> 00:03:50,480
But how does all this compare to the models

124
00:03:50,480 --> 00:03:51,760
that we're already using?

125
00:03:51,760 --> 00:03:53,400
That's a million dollar question.

126
00:03:53,400 --> 00:03:54,640
Yeah, is it actually better?

127
00:03:54,640 --> 00:03:56,240
Well, the research is pretty promising.

128
00:03:56,240 --> 00:04:00,200
They actually compared BLT to a state-of-the-art model

129
00:04:00,200 --> 00:04:01,560
called Llama 3.

130
00:04:01,560 --> 00:04:02,400
Okay.

131
00:04:02,400 --> 00:04:04,240
And they found that BLT could actually match

132
00:04:04,240 --> 00:04:08,600
its performance using significantly less processing power.

133
00:04:08,600 --> 00:04:09,640
Hold on, less power.

134
00:04:09,640 --> 00:04:10,480
Yeah.

135
00:04:10,480 --> 00:04:11,320
Same performance.

136
00:04:11,320 --> 00:04:12,920
Yeah, pretty amazing, right?

137
00:04:12,920 --> 00:04:14,360
That sounds like a win-win.

138
00:04:14,360 --> 00:04:15,200
Yeah.

139
00:04:15,200 --> 00:04:17,400
What does that mean for someone who's like building AI apps

140
00:04:17,400 --> 00:04:18,240
or something?

141
00:04:18,240 --> 00:04:19,960
So that means you could potentially achieve

142
00:04:19,960 --> 00:04:21,720
the same level of sophistication.

143
00:04:21,720 --> 00:04:22,560
Okay.

144
00:04:22,560 --> 00:04:25,320
But with way less computational resources.

145
00:04:25,320 --> 00:04:26,160
I like the sound of that.

146
00:04:26,160 --> 00:04:27,000
Yeah.

147
00:04:27,000 --> 00:04:28,520
It opens up a lot of possibilities,

148
00:04:28,520 --> 00:04:32,360
especially if you're working with limited hardware or budget.

149
00:04:32,360 --> 00:04:34,000
Right, or if you're just trying to be more efficient.

150
00:04:34,000 --> 00:04:35,080
Exactly.

151
00:04:35,080 --> 00:04:35,920
That's huge.

152
00:04:35,920 --> 00:04:38,120
It could really democratize access

153
00:04:38,120 --> 00:04:39,760
to these powerful AI models.

154
00:04:39,760 --> 00:04:41,280
Yeah, exactly.

155
00:04:41,280 --> 00:04:43,200
But hold on, comparing it to Llama 3,

156
00:04:43,200 --> 00:04:45,440
isn't that kind of like apples and oranges?

157
00:04:45,440 --> 00:04:46,840
Yeah, that's a good point.

158
00:04:46,840 --> 00:04:50,280
Because Llama 3 uses those fixed tokens, right?

159
00:04:50,280 --> 00:04:53,000
While BLT is all about these dynamic patches.

160
00:04:53,000 --> 00:04:53,960
You're exactly right.

161
00:04:53,960 --> 00:04:55,600
And that's why the researchers,

162
00:04:55,600 --> 00:04:57,280
they took it a step further.

163
00:04:57,280 --> 00:05:00,960
They wanted to see how well each model could scale.

164
00:05:00,960 --> 00:05:04,800
So basically how they perform as you increase the size

165
00:05:04,800 --> 00:05:06,160
and the complexity.

166
00:05:06,160 --> 00:05:08,720
And that's where that whole patches are better

167
00:05:08,720 --> 00:05:10,560
than tokens thing really comes in.

168
00:05:10,560 --> 00:05:11,480
I am all ears.

169
00:05:11,480 --> 00:05:12,400
What did they find?

170
00:05:12,400 --> 00:05:15,760
So with BLT, you can do something really unique.

171
00:05:15,760 --> 00:05:17,520
You can increase the model size,

172
00:05:17,520 --> 00:05:20,000
and the patch size at the same time,

173
00:05:20,000 --> 00:05:22,480
without needing more computational power.

174
00:05:22,480 --> 00:05:23,760
That sounds like a game changer.

175
00:05:23,760 --> 00:05:25,160
It is, it's a game changer,

176
00:05:25,160 --> 00:05:30,080
because it means BLT can adapt to more complex tasks

177
00:05:30,080 --> 00:05:32,520
and data sets without kind of hitting

178
00:05:32,520 --> 00:05:34,000
that performance bottleneck.

179
00:05:34,000 --> 00:05:36,240
So why can't traditional models do that?

180
00:05:36,240 --> 00:05:38,680
Because they're stuck with that fixed vocabulary

181
00:05:38,680 --> 00:05:40,280
and that fixed token size.

182
00:05:40,280 --> 00:05:42,520
It's like trying to fit more and more clothes

183
00:05:42,520 --> 00:05:44,640
into a suitcase that can't get any bigger.

184
00:05:44,640 --> 00:05:45,480
That's not gonna work.

185
00:05:45,480 --> 00:05:48,240
But BLT is like having this magical suitcase

186
00:05:48,240 --> 00:05:49,760
that expands as you need it.

187
00:05:49,760 --> 00:05:50,920
Okay, I love that analogy.

188
00:05:50,920 --> 00:05:54,480
So more flexibility, less computing power.

189
00:05:54,480 --> 00:05:55,840
What's the catch?

190
00:05:55,840 --> 00:05:57,480
Well, there's always a catch, right?

191
00:05:57,480 --> 00:05:59,880
So the research is still in its early stages.

192
00:05:59,880 --> 00:06:03,280
While BLT shows this incredible potential,

193
00:06:03,280 --> 00:06:05,360
there's a lot more to explore.

194
00:06:05,360 --> 00:06:08,160
The existing tools and the infrastructure for AI,

195
00:06:08,160 --> 00:06:12,480
they're mostly designed for those token-based models.

196
00:06:12,480 --> 00:06:16,440
So we need new approaches to really optimize BLT

197
00:06:16,440 --> 00:06:18,320
and to unlock its full power.

198
00:06:18,320 --> 00:06:20,120
So it's like we have this super fast car,

199
00:06:20,120 --> 00:06:21,720
but there's no race track designed for it.

200
00:06:21,720 --> 00:06:23,200
Exactly, that's a great analogy.

201
00:06:23,200 --> 00:06:24,200
Where it can reach its top speed.

202
00:06:24,200 --> 00:06:26,640
Yeah, but the potential is definitely there.

203
00:06:26,640 --> 00:06:29,680
And the researchers, they're already looking into ways

204
00:06:29,680 --> 00:06:32,800
to improve things like the training process

205
00:06:32,800 --> 00:06:35,320
and make BLT even more efficient.

206
00:06:35,320 --> 00:06:38,880
So BLT might not be ready to take over the world just yet.

207
00:06:38,880 --> 00:06:40,560
But it's definitely a force to be reckoned with.

208
00:06:40,560 --> 00:06:41,400
Absolutely.

209
00:06:41,400 --> 00:06:44,040
This is getting me really excited about the future of AI.

210
00:06:44,040 --> 00:06:46,640
What other cool things did they find out about BLT?

211
00:06:46,640 --> 00:06:48,160
Oh, there's a lot more to unpack.

212
00:06:48,160 --> 00:06:50,040
What kind of tasks is it really good at?

213
00:06:50,040 --> 00:06:51,280
Let's dive into that.

214
00:06:51,280 --> 00:06:53,400
So one of the really impressive things they found

215
00:06:53,400 --> 00:06:59,880
is that BLT can handle messy real-world data, like a champ.

216
00:06:59,880 --> 00:07:03,640
Think about social media posts with all that slang and typos,

217
00:07:03,640 --> 00:07:06,760
or maybe technical documents filled with jargon.

218
00:07:06,760 --> 00:07:07,920
Yeah, that stuff could be tough.

219
00:07:07,920 --> 00:07:09,520
Yeah, those kinds of things can really

220
00:07:09,520 --> 00:07:11,200
trip up those traditional models.

221
00:07:11,200 --> 00:07:13,560
That makes sense if you're relying on a fixed dictionary.

222
00:07:13,560 --> 00:07:15,920
Anything outside of that is going to throw you off.

223
00:07:15,920 --> 00:07:16,800
Exactly.

224
00:07:16,800 --> 00:07:19,720
But BLT's, like, byte-level approach seems

225
00:07:19,720 --> 00:07:22,440
way more adaptable to those kind of curveballs.

226
00:07:22,440 --> 00:07:22,960
Exactly.

227
00:07:22,960 --> 00:07:25,480
It's like it can roll with the punches.

228
00:07:25,480 --> 00:07:26,160
OK, cool.

229
00:07:26,160 --> 00:07:28,240
So it's not just about efficiency then.

230
00:07:28,240 --> 00:07:30,360
It's about fairness and accuracy too.

231
00:07:30,360 --> 00:07:31,680
It's about robustness.

232
00:07:31,680 --> 00:07:33,120
Yeah, that's a really important point.

233
00:07:33,120 --> 00:07:33,680
It is.

234
00:07:33,680 --> 00:07:35,520
But how did they actually test this?

235
00:07:35,520 --> 00:07:38,920
So to test this, what kind of evidence did they find?

236
00:07:38,920 --> 00:07:42,440
So they used this benchmark called CUTE.

237
00:07:42,440 --> 00:07:44,000
C-U-T-E.

238
00:07:44,000 --> 00:07:45,120
Cute?

239
00:07:45,120 --> 00:07:47,800
Yeah, it's a cute name for a pretty tough test.

240
00:07:47,800 --> 00:07:48,400
Oh, OK.

241
00:07:48,400 --> 00:07:50,680
It tests how well a model understands

242
00:07:50,680 --> 00:07:54,280
the building blocks of language, like those individual characters

243
00:07:54,280 --> 00:07:54,840
and sounds.

244
00:07:54,840 --> 00:07:55,880
OK, the nitty-gritty.

245
00:07:55,880 --> 00:07:57,040
Yeah, the tiny little pieces.

246
00:07:57,040 --> 00:07:57,600
OK.

247
00:07:57,600 --> 00:08:01,240
And these tasks are surprisingly tricky for those token-based

248
00:08:01,240 --> 00:08:01,960
models.

249
00:08:01,960 --> 00:08:02,320
OK.

250
00:08:02,320 --> 00:08:04,720
But BLT excelled.

251
00:08:04,720 --> 00:08:07,240
So wait, it's not just understanding words.

252
00:08:07,240 --> 00:08:09,760
It's understanding the individual letters.

253
00:08:09,760 --> 00:08:10,400
Yeah.

254
00:08:10,400 --> 00:08:11,960
And how they relate to each other.

255
00:08:11,960 --> 00:08:13,640
Yeah, it's going deep.

256
00:08:13,640 --> 00:08:15,160
That's pretty amazing.

257
00:08:15,160 --> 00:08:17,440
What kind of tasks did they test specifically?

258
00:08:17,440 --> 00:08:20,040
So they looked at things like spelling correction.

259
00:08:20,040 --> 00:08:20,760
Oh, wow.

260
00:08:20,760 --> 00:08:23,360
Identifying words with similar spellings,

261
00:08:23,360 --> 00:08:26,400
even figuring out how a word should be pronounced just

262
00:08:26,400 --> 00:08:27,600
by looking at the letters.

263
00:08:27,600 --> 00:08:28,760
That's wild.

264
00:08:28,760 --> 00:08:32,320
Yeah, BLT achieved near perfect accuracy

265
00:08:32,320 --> 00:08:33,400
on some of those tasks.

266
00:08:33,400 --> 00:08:34,120
That's incredible.

267
00:08:34,120 --> 00:08:39,120
It's like BLT is learning to read and understand language

268
00:08:39,120 --> 00:08:40,080
the way humans do.

269
00:08:40,080 --> 00:08:41,840
Yeah, it's learning like a kid.

270
00:08:41,840 --> 00:08:44,040
By paying attention to all the smallest details.

271
00:08:44,040 --> 00:08:45,280
It's really paying attention.

272
00:08:45,280 --> 00:08:47,840
But if it's so good at these fundamental tasks,

273
00:08:47,840 --> 00:08:51,680
what about more complex applications, like translation,

274
00:08:51,680 --> 00:08:52,240
for example?

275
00:08:52,240 --> 00:08:52,640
That's where

276
00:08:52,640 --> 00:08:53,560
it's really interesting.

277
00:08:53,560 --> 00:08:54,000
OK.

278
00:08:54,000 --> 00:08:56,920
They found that BLT was especially good at translating

279
00:08:56,920 --> 00:08:59,440
languages with very little training data.

280
00:08:59,440 --> 00:09:00,000
Oh, right.

281
00:09:00,000 --> 00:09:02,720
So think about rare indigenous languages.

282
00:09:02,720 --> 00:09:03,080
OK.

283
00:09:03,080 --> 00:09:05,720
Or maybe dialects with limited written resources.

284
00:09:05,720 --> 00:09:06,160
Right.

285
00:09:06,160 --> 00:09:07,760
Where you don't have tons of data to work with.

286
00:09:07,760 --> 00:09:08,200
Exactly.

287
00:09:08,200 --> 00:09:10,960
And BLT was able to outperform those traditional models

288
00:09:10,960 --> 00:09:12,640
by a significant margin.

289
00:09:12,640 --> 00:09:14,080
Wow, that's a game changer.

290
00:09:14,080 --> 00:09:14,920
Yeah, it is.

291
00:09:14,920 --> 00:09:18,000
It could help preserve endangered languages

292
00:09:18,000 --> 00:09:19,800
or like bridge communication gaps

293
00:09:19,800 --> 00:09:20,920
between different cultures.

294
00:09:20,920 --> 00:09:21,880
Yeah, exactly.

295
00:09:21,880 --> 00:09:23,240
It has huge potential.

296
00:09:23,240 --> 00:09:26,240
But how is it able to do that with so little data?

297
00:09:26,240 --> 00:09:27,680
That's the question, right?

298
00:09:27,680 --> 00:09:29,560
Yeah, is it like AI magic or something?

299
00:09:29,560 --> 00:09:32,280
Well, researchers think it has to do with that direct engagement

300
00:09:32,280 --> 00:09:33,360
with the bytes.

301
00:09:33,360 --> 00:09:33,840
OK.

302
00:09:33,840 --> 00:09:37,000
By looking at that raw data, it can pick up

303
00:09:37,000 --> 00:09:40,560
on these subtle patterns and relationships

304
00:09:40,560 --> 00:09:44,320
that those token-based models might miss.

305
00:09:44,320 --> 00:09:44,800
OK.

306
00:09:44,800 --> 00:09:48,080
It's like BLT is a more attentive student.

307
00:09:48,080 --> 00:09:48,720
I like that.

308
00:09:48,720 --> 00:09:49,920
It's really paying attention.

309
00:09:49,920 --> 00:09:51,000
It's soaking it all up.

310
00:09:51,000 --> 00:09:51,720
Exactly.

311
00:09:51,720 --> 00:09:53,000
OK, that makes a lot of sense.

312
00:09:53,000 --> 00:09:54,680
It's like the difference between trying

313
00:09:54,680 --> 00:09:58,360
to learn a language from a phrase book versus like immersing

314
00:09:58,360 --> 00:10:01,080
yourself in the culture and absorbing all the nuances

315
00:10:01,080 --> 00:10:01,760
naturally.

316
00:10:01,760 --> 00:10:03,000
That's a great analogy.

317
00:10:03,000 --> 00:10:05,880
So all this talk about BLT's potential is really exciting.

318
00:10:05,880 --> 00:10:06,400
Yeah.

319
00:10:06,400 --> 00:10:07,440
What are the next steps?

320
00:10:07,440 --> 00:10:11,000
What kind of research needs to be done to make this a reality?

321
00:10:11,000 --> 00:10:14,240
Well, there's a few exciting avenues for future exploration.

322
00:10:14,240 --> 00:10:15,200
Lay it on me.

323
00:10:15,200 --> 00:10:17,640
One is optimizing that training process.

324
00:10:17,640 --> 00:10:18,040
OK.

325
00:10:18,040 --> 00:10:21,000
Right now, it takes a lot of time and computing power

326
00:10:21,000 --> 00:10:23,920
to train these large language models.

327
00:10:23,920 --> 00:10:25,720
But researchers are looking for ways

328
00:10:25,720 --> 00:10:27,520
to make that faster and more efficient.

329
00:10:27,520 --> 00:10:29,480
OK, because that's key.

330
00:10:29,480 --> 00:10:31,920
If we want to make this tech accessible to more people.

331
00:10:31,920 --> 00:10:33,640
Yeah, we got to make it practical.

332
00:10:33,640 --> 00:10:34,160
OK.

333
00:10:34,160 --> 00:10:35,240
What else have they worked on?

334
00:10:35,240 --> 00:10:37,360
Another area is figuring out the best way

335
00:10:37,360 --> 00:10:39,600
to byteify existing models.

336
00:10:39,600 --> 00:10:42,000
OK, we talked about how BLT starts from scratch.

337
00:10:42,000 --> 00:10:42,520
Right.

338
00:10:42,520 --> 00:10:44,960
But what if we could take the knowledge that's

339
00:10:44,960 --> 00:10:49,400
already stored in those massive token-based models

340
00:10:49,400 --> 00:10:51,840
and, like, convert it to a byte-level format?

341
00:10:51,840 --> 00:10:54,000
Yeah, what if we could just translate it?

342
00:10:54,000 --> 00:10:55,560
That would be a huge time saver.

343
00:10:55,560 --> 00:10:56,200
Yeah, it would be.

344
00:10:56,200 --> 00:10:57,760
It's like giving BLT a head start.

345
00:10:57,760 --> 00:10:58,600
A little cheat sheet.

346
00:10:58,600 --> 00:10:58,960
Right.

347
00:10:58,960 --> 00:10:59,800
Exactly.

348
00:10:59,800 --> 00:11:01,680
Did the researchers experiment with that at all?

349
00:11:01,680 --> 00:11:02,520
They did.

350
00:11:02,520 --> 00:11:06,840
They took a pre-trained model called Llama 3.1.

351
00:11:06,840 --> 00:11:07,360
OK.

352
00:11:07,360 --> 00:11:11,760
And they used its parameters to basically initialize BLT.

353
00:11:11,760 --> 00:11:12,360
Oh, wow.

354
00:11:12,360 --> 00:11:14,320
And the results were pretty promising,

355
00:11:14,320 --> 00:11:16,360
showing that this approach could lead to faster

356
00:11:16,360 --> 00:11:18,320
and more efficient training.

357
00:11:18,320 --> 00:11:21,960
So we could potentially build on the work that's already

358
00:11:21,960 --> 00:11:22,640
been done.

359
00:11:22,640 --> 00:11:23,160
Yeah.

360
00:11:23,160 --> 00:11:25,160
Instead of having to reinvent the wheel every time.

361
00:11:25,160 --> 00:11:25,600
Exactly.

362
00:11:25,600 --> 00:11:27,240
We don't have to start from zero.

363
00:11:27,240 --> 00:11:28,160
That's super cool.

364
00:11:28,160 --> 00:11:28,880
Yeah.

365
00:11:28,880 --> 00:11:30,680
But what about the practical side of things?

366
00:11:30,680 --> 00:11:32,480
Yeah, the real world stuff.

367
00:11:32,480 --> 00:11:36,160
Yeah, we talked earlier about the need for new tools

368
00:11:36,160 --> 00:11:38,040
and infrastructure to support BLT.

369
00:11:38,040 --> 00:11:38,440
Right.

370
00:11:38,440 --> 00:11:39,680
All the behind the scenes stuff.

371
00:11:39,680 --> 00:11:39,920
Yeah.

372
00:11:39,920 --> 00:11:41,200
What does that actually look like?

373
00:11:41,200 --> 00:11:46,680
So right now, most of the libraries and code bases for AI

374
00:11:46,680 --> 00:11:49,640
are optimized for token-based models.

375
00:11:49,640 --> 00:11:50,040
OK.

376
00:11:50,040 --> 00:11:53,880
So we need to develop new software and hardware

377
00:11:53,880 --> 00:11:56,960
that can take advantage of BLT's unique capabilities.

378
00:11:56,960 --> 00:11:57,320
OK.

379
00:11:57,320 --> 00:12:00,000
Like that dynamic patching and byte-level processing.

380
00:12:00,000 --> 00:12:00,320
OK.

381
00:12:00,320 --> 00:12:01,200
So it's a big challenge.

382
00:12:01,200 --> 00:12:02,400
Yeah, it's a big undertaking.

383
00:12:02,400 --> 00:12:04,000
But if we can overcome it.

384
00:12:04,000 --> 00:12:04,920
I think we can pull it off.

385
00:12:04,920 --> 00:12:06,680
The potential benefits seem huge.

386
00:12:06,680 --> 00:12:07,320
Yeah.

387
00:12:07,320 --> 00:12:10,680
More efficient models, better language understanding,

388
00:12:10,680 --> 00:12:12,920
fairer and more accurate results.

389
00:12:12,920 --> 00:12:13,240
Yeah.

390
00:12:13,240 --> 00:12:14,480
It's a whole new world.

391
00:12:14,480 --> 00:12:17,240
It's like opening up this whole new world of possibilities.

392
00:12:17,240 --> 00:12:17,920
Exactly.

393
00:12:17,920 --> 00:12:21,400
And it's important to remember this research is still going.

394
00:12:21,400 --> 00:12:21,560
Right.

395
00:12:21,560 --> 00:12:24,720
There's still so much more to learn and explore.

396
00:12:24,720 --> 00:12:25,200
OK.

397
00:12:25,200 --> 00:12:28,800
So what's the impact of BLT for the future of AI?

398
00:12:28,800 --> 00:12:32,400
This whole deep dive into BLT has been absolutely fascinating.

399
00:12:32,400 --> 00:12:33,520
I'm glad you're enjoying it.

400
00:12:33,520 --> 00:12:33,760
Yeah.

401
00:12:33,760 --> 00:12:36,520
It's made me think about AI in a completely different way.

402
00:12:36,520 --> 00:12:36,840
Yeah.

403
00:12:36,840 --> 00:12:38,920
It really challenges the status quo.

404
00:12:38,920 --> 00:12:39,280
Yeah.

405
00:12:39,280 --> 00:12:41,280
It's not just about processing words anymore.

406
00:12:41,280 --> 00:12:45,920
It's about understanding the fundamental building blocks

407
00:12:45,920 --> 00:12:46,600
of language.

408
00:12:46,600 --> 00:12:47,880
The DNA of language.

409
00:12:47,880 --> 00:12:48,560
Right.

410
00:12:48,560 --> 00:12:50,360
But before we wrap up, I'm curious.

411
00:12:50,360 --> 00:12:50,800
Yeah.

412
00:12:50,800 --> 00:12:52,960
What are your thoughts on the bigger picture?

413
00:12:52,960 --> 00:12:54,960
Ooh, the big picture.

414
00:12:54,960 --> 00:12:57,400
What does this research tell us about the direction of AI

415
00:12:57,400 --> 00:12:58,080
as a whole?

416
00:12:58,080 --> 00:12:59,880
Well, I think BLT really highlights

417
00:12:59,880 --> 00:13:02,240
the importance of thinking outside the box.

418
00:13:02,240 --> 00:13:02,800
OK.

419
00:13:02,800 --> 00:13:04,360
You know, for years, tokenization

420
00:13:04,360 --> 00:13:05,840
has been the dominant approach.

421
00:13:05,840 --> 00:13:06,040
Right.

422
00:13:06,040 --> 00:13:06,880
The go-to method.

423
00:13:06,880 --> 00:13:07,440
Yeah.

424
00:13:07,440 --> 00:13:08,560
Everyone was doing it.

425
00:13:08,560 --> 00:13:09,000
OK.

426
00:13:09,000 --> 00:13:10,680
But this research shows that there

427
00:13:10,680 --> 00:13:13,800
are other ways that might be even more effective

428
00:13:13,800 --> 00:13:17,000
and aligned with how humans actually understand language.

429
00:13:17,000 --> 00:13:18,480
That's a really powerful insight.

430
00:13:18,480 --> 00:13:19,040
It is.

431
00:13:19,040 --> 00:13:22,440
It seems like BLT is really pushing the boundaries

432
00:13:22,440 --> 00:13:24,800
of what we thought was possible with AI.

433
00:13:24,800 --> 00:13:26,720
Yeah, it's breaking down those barriers.

434
00:13:26,720 --> 00:13:28,440
But what are the potential downsides?

435
00:13:28,440 --> 00:13:29,040
Yeah.

436
00:13:29,040 --> 00:13:32,880
Are there any ethical considerations or risks

437
00:13:32,880 --> 00:13:34,200
that we need to be aware of?

438
00:13:34,200 --> 00:13:37,440
Oh, that's a great question and a really important one too.

439
00:13:37,440 --> 00:13:39,680
As this technology develops, yeah,

440
00:13:39,680 --> 00:13:41,840
there's always that potential for misuse.

441
00:13:41,840 --> 00:13:43,320
Right, because it's so powerful.

442
00:13:43,320 --> 00:13:43,960
Exactly.

443
00:13:43,960 --> 00:13:46,400
With any powerful technology, you

444
00:13:46,400 --> 00:13:50,560
need to be really careful about how it's used. As BLT

445
00:13:50,560 --> 00:13:51,240
and other models

446
00:13:51,240 --> 00:13:53,200
like it get more sophisticated,

447
00:13:53,200 --> 00:13:56,160
we need to think about how they're being used and developed.

448
00:13:56,160 --> 00:13:56,800
OK.

449
00:13:56,800 --> 00:14:00,560
Things like bias in the data that's used to train them

450
00:14:00,560 --> 00:14:03,120
and the possibility that they could be used to generate

451
00:14:03,120 --> 00:14:04,480
harmful content.

452
00:14:04,480 --> 00:14:06,040
Oh, yeah, that's a big one.

453
00:14:06,040 --> 00:14:07,840
And we have to make sure that they're being developed

454
00:14:07,840 --> 00:14:08,800
responsibly.

455
00:14:08,800 --> 00:14:10,360
So it's a lot of responsibility.

456
00:14:10,360 --> 00:14:11,480
Yeah, it's a big responsibility.

457
00:14:11,480 --> 00:14:13,200
Like, we're not just building a better engine.

458
00:14:13,200 --> 00:14:15,240
We're building a whole new vehicle.

459
00:14:15,240 --> 00:14:17,320
Exactly, a whole new way of thinking.

460
00:14:17,320 --> 00:14:21,120
And we need to make sure it's safe and ethical to drive.

461
00:14:21,120 --> 00:14:23,560
It's like giving AI a driver's license.

462
00:14:23,560 --> 00:14:24,520
Right, exactly.

463
00:14:24,520 --> 00:14:27,440
So focusing on the positive for a second.

464
00:14:27,440 --> 00:14:29,160
What are you most excited about when

465
00:14:29,160 --> 00:14:30,960
you think about the future of BLT?

466
00:14:30,960 --> 00:14:31,720
OK.

467
00:14:31,720 --> 00:14:33,800
And byte-level AI in general?

468
00:14:33,800 --> 00:14:36,160
I'm really excited about the potential for this thing

469
00:14:36,160 --> 00:14:38,320
called multimodal AI.

470
00:14:38,320 --> 00:14:39,480
Multimodal AI.

471
00:14:39,480 --> 00:14:40,720
Yeah, it's a mouthful.

472
00:14:40,720 --> 00:14:44,080
But it basically means models that can understand,

473
00:14:44,080 --> 00:14:48,000
not just text, but also images, audio, video.

474
00:14:48,000 --> 00:14:48,600
Oh, wow.

475
00:14:48,600 --> 00:14:50,920
So it's like AI that can experience the world

476
00:14:50,920 --> 00:14:52,160
through all our senses.

477
00:14:52,160 --> 00:14:54,440
Yeah, it's like giving AI all the senses.

478
00:14:54,440 --> 00:14:55,240
That's amazing.

479
00:14:55,240 --> 00:14:55,760
It is.

480
00:14:55,760 --> 00:14:57,160
What would that look like in practice?

481
00:14:57,160 --> 00:15:00,120
So imagine a model that can seamlessly analyze

482
00:15:00,120 --> 00:15:01,520
the text of a news article.

483
00:15:01,520 --> 00:15:02,080
OK.

484
00:15:02,080 --> 00:15:04,280
And then look at the images that go with it.

485
00:15:04,280 --> 00:15:04,600
Very.

486
00:15:04,600 --> 00:15:07,960
And even listen to the tone of voice in a related audio clip.

487
00:15:07,960 --> 00:15:08,440
Oh, wow.

488
00:15:08,440 --> 00:15:12,200
And put all of that together into one unified understanding.

489
00:15:12,200 --> 00:15:13,040
That's mind-blowing.

490
00:15:13,040 --> 00:15:16,000
It would be like having an AI assistant that could really

491
00:15:16,000 --> 00:15:19,760
grasp the world in all its complexity and richness.

492
00:15:19,760 --> 00:15:20,720
That would be incredible.

493
00:15:20,720 --> 00:15:21,880
That would be a game changer.

494
00:15:21,880 --> 00:15:23,760
But it's not just about understanding, right?

495
00:15:23,760 --> 00:15:24,240
No.

496
00:15:24,240 --> 00:15:25,720
What about the creative potential?

497
00:15:25,720 --> 00:15:27,480
Oh, that's the really exciting part.

498
00:15:27,480 --> 00:15:27,760
Yeah.

499
00:15:27,760 --> 00:15:31,880
Imagine being able to generate these brand new forms

500
00:15:31,880 --> 00:15:33,720
of art and media.

501
00:15:33,720 --> 00:15:34,080
Oh, wow.

502
00:15:34,080 --> 00:15:35,960
Blending those different modalities.

503
00:15:35,960 --> 00:15:36,440
OK.

504
00:15:36,440 --> 00:15:38,680
In ways that we haven't even thought of yet.

505
00:15:38,680 --> 00:15:42,200
So like AI-generated art, but on a whole other level?

506
00:15:42,200 --> 00:15:44,360
Yeah, it's taking it to the next dimension.

507
00:15:44,360 --> 00:15:48,600
A world where AI can help us tell stories, compose music,

508
00:15:48,600 --> 00:15:51,160
create art in ways we haven't even imagined.

509
00:15:51,160 --> 00:15:52,760
It's like opening up this whole new dimension

510
00:15:52,760 --> 00:15:54,240
of artistic expression.

511
00:15:54,240 --> 00:15:54,720
It is.

512
00:15:54,720 --> 00:15:55,220
Yes.

513
00:15:55,220 --> 00:15:56,800
Powered by AI.

514
00:15:56,800 --> 00:15:57,320
But hold on.

515
00:15:57,320 --> 00:16:00,960
If we're talking about AI models that can process all this data.

516
00:16:00,960 --> 00:16:01,480
Yeah.

517
00:16:01,480 --> 00:16:02,520
All that data.

518
00:16:02,520 --> 00:16:06,120
Aren't we also talking about massive computational demands?

519
00:16:06,120 --> 00:16:07,040
That's a good point.

520
00:16:07,040 --> 00:16:07,540
Yeah.

521
00:16:07,540 --> 00:16:10,280
Wouldn't that limit who could actually access and use

522
00:16:10,280 --> 00:16:11,160
this technology?

523
00:16:11,160 --> 00:16:11,480
Yeah.

524
00:16:11,480 --> 00:16:12,600
That's a valid concern.

525
00:16:12,600 --> 00:16:15,000
But that's where the efficiency of BLT

526
00:16:15,000 --> 00:16:16,360
becomes super important.

527
00:16:16,360 --> 00:16:16,800
OK.

528
00:16:16,800 --> 00:16:20,360
If we can develop these models that can process information

529
00:16:20,360 --> 00:16:21,480
at the byte level.

530
00:16:21,480 --> 00:16:21,720
Right.

531
00:16:21,720 --> 00:16:24,920
With the same efficiency and scalability as BLT.

532
00:16:24,920 --> 00:16:25,440
OK.

533
00:16:25,440 --> 00:16:30,480
We could potentially make those multimodal dreams a reality

534
00:16:30,480 --> 00:16:33,720
without needing those massive supercomputers.

535
00:16:33,720 --> 00:16:34,440
That's amazing.

536
00:16:34,440 --> 00:16:36,720
It could democratize access to these tools.

537
00:16:36,720 --> 00:16:37,000
Right.

538
00:16:37,000 --> 00:16:38,480
And make them available to anyone.

539
00:16:38,480 --> 00:16:40,840
Enabling innovation across different fields.

540
00:16:40,840 --> 00:16:41,800
That's really cool.

541
00:16:41,800 --> 00:16:45,600
So BLT isn't just pushing the boundaries of what AI can do.

542
00:16:45,600 --> 00:16:46,440
No, it's not.

543
00:16:46,440 --> 00:16:50,160
It's also potentially making it more accessible to everyone.

544
00:16:50,160 --> 00:16:50,840
Exactly.

545
00:16:50,840 --> 00:16:53,600
It's about making AI more inclusive.

546
00:16:53,600 --> 00:16:54,520
That's pretty inspiring.

547
00:16:54,520 --> 00:16:54,920
It is.

548
00:16:54,920 --> 00:16:57,520
It's a really exciting time to be working in AI.

549
00:16:57,520 --> 00:17:00,360
This whole deep dive into BLT has been incredible.

550
00:17:00,360 --> 00:17:01,320
I'm glad you enjoyed it.

551
00:17:01,320 --> 00:17:02,000
It really has.

552
00:17:02,000 --> 00:17:04,040
It's made me realize that we're living

553
00:17:04,040 --> 00:17:07,960
in this time of unprecedented advancement in AI.

554
00:17:07,960 --> 00:17:08,120
Yeah.

555
00:17:08,120 --> 00:17:09,880
Things are moving so fast.

556
00:17:09,880 --> 00:17:12,720
And the potential for positive change is huge.

557
00:17:12,720 --> 00:17:13,800
It's immense.

558
00:17:13,800 --> 00:17:14,560
Yeah.

559
00:17:14,560 --> 00:17:17,920
But as you said, it's also a time for careful consideration

560
00:17:17,920 --> 00:17:19,080
and responsible development.

561
00:17:19,080 --> 00:17:19,720
Absolutely.

562
00:17:19,720 --> 00:17:21,800
We need to make sure that this powerful technology is

563
00:17:21,800 --> 00:17:22,880
used for good.

564
00:17:22,880 --> 00:17:24,360
We don't want to go down the wrong path.

565
00:17:24,360 --> 00:17:24,920
Right.

566
00:17:24,920 --> 00:17:26,920
And that it benefits all of humanity.

567
00:17:26,920 --> 00:17:30,560
It's about using AI to make the world a better place.

568
00:17:30,560 --> 00:17:32,640
That's a perfect note to end on.

569
00:17:32,640 --> 00:17:35,320
Thank you so much for taking us on this deep dive

570
00:17:35,320 --> 00:17:38,160
into the world of byte-latent transformers.

571
00:17:38,160 --> 00:17:39,160
It's been my pleasure.

572
00:17:39,160 --> 00:17:42,080
It's been an eye-opening and inspiring conversation.

573
00:17:42,080 --> 00:17:43,120
And to our listeners,

574
00:17:43,120 --> 00:17:44,920
Yeah, to everyone out there.

575
00:17:44,920 --> 00:17:46,920
Keep those imaginations running wild.

576
00:17:46,920 --> 00:17:48,960
Don't be afraid to dream big.

577
00:17:48,960 --> 00:17:51,360
And stay tuned for our next deep dive

578
00:17:51,360 --> 00:17:54,640
into the ever-evolving world of AI.

579
00:17:54,640 --> 00:17:56,920
We'll be back with more cutting-edge research.

580
00:17:56,920 --> 00:18:01,120
Until then, keep exploring and keep questioning.

581
00:18:01,120 --> 00:18:01,960
Keep learning.

582
00:18:01,960 --> 00:18:04,120
Because that's how we unlock the true potential

583
00:18:04,120 --> 00:18:05,360
of artificial intelligence.

584
00:18:05,360 --> 00:18:07,920
That's how we build a better future.

585
00:18:07,920 --> 00:18:09,640
And that's it for today's deep dive.

586
00:18:09,640 --> 00:18:10,280
See you next time.

587
00:18:10,280 --> 00:18:13,280
We'll see you next time.

