1
00:00:00,000 --> 00:00:01,760
Okay, so get this.

2
00:00:01,760 --> 00:00:06,760
AI that thinks in ideas, not just words, but actual ideas.

3
00:00:07,880 --> 00:00:09,360
Yeah, it's a pretty big shift.

4
00:00:09,360 --> 00:00:11,960
Right, like totally different from how we usually think

5
00:00:11,960 --> 00:00:13,960
about language AI.

6
00:00:13,960 --> 00:00:14,800
Definitely.

7
00:00:14,800 --> 00:00:18,200
So this paper you sent in, Large Concept Models:

8
00:00:18,200 --> 00:00:21,480
Language Modeling in a Sentence Representation Space.

9
00:00:21,480 --> 00:00:24,000
It basically says, forget the old way of doing things,

10
00:00:24,000 --> 00:00:25,760
we're going concept driven.

11
00:00:25,760 --> 00:00:28,280
It's exciting, trying to like get closer

12
00:00:28,280 --> 00:00:29,760
to how we think, you know?

13
00:00:29,760 --> 00:00:30,840
Oh yeah, for sure.

14
00:00:30,840 --> 00:00:32,760
Like our brains don't think word by word, right?

15
00:00:32,760 --> 00:00:33,600
It's more like-

16
00:00:33,600 --> 00:00:34,560
Concepts, exactly.

17
00:00:34,560 --> 00:00:35,920
Think about when you're writing something,

18
00:00:35,920 --> 00:00:37,000
planning a speech.

19
00:00:37,000 --> 00:00:39,320
Totally, you're not just like stringing words together,

20
00:00:39,320 --> 00:00:40,320
there's a bigger picture.

21
00:00:40,320 --> 00:00:42,000
You've got these ideas you want to get across.

22
00:00:42,000 --> 00:00:43,280
Okay, I'm with you.

23
00:00:43,280 --> 00:00:45,840
But how do they actually teach an AI to do that?

24
00:00:45,840 --> 00:00:48,640
What are these, what did you call them, large concept models?

25
00:00:48,640 --> 00:00:49,480
It's LCMs.

26
00:00:49,480 --> 00:00:51,480
Right, LCMs, what's the deal with those?

27
00:00:51,480 --> 00:00:54,760
Think of it like this, the language AI we have now,

28
00:00:54,760 --> 00:00:57,080
like the stuff behind ChatGPT.

29
00:00:57,080 --> 00:00:59,380
They're good at guessing the next word.

30
00:00:59,380 --> 00:01:01,800
Like statistically what word comes next.

31
00:01:01,800 --> 00:01:02,640
Right, right.

32
00:01:02,640 --> 00:01:05,440
But LCMs, they're trying to predict the next concept.

33
00:01:05,440 --> 00:01:07,480
And the way they do that is with something

34
00:01:07,480 --> 00:01:09,040
called sentence embeddings.

35
00:01:09,040 --> 00:01:12,320
Sentence embeddings, so like taking a whole sentence

36
00:01:12,320 --> 00:01:13,960
and squishing it down into a code.

37
00:01:13,960 --> 00:01:16,240
Yeah, basically like a fingerprint of the idea

38
00:01:16,240 --> 00:01:17,880
behind the whole sentence.

39
00:01:17,880 --> 00:01:20,320
And the really cool part is this paper uses a system

40
00:01:20,320 --> 00:01:21,600
called SONAR.

41
00:01:21,600 --> 00:01:24,600
And SONAR can make these embeddings for like

42
00:01:24,600 --> 00:01:27,440
over 200 languages, even speech.

43
00:01:27,440 --> 00:01:29,400
Hold on, 200 languages.

44
00:01:29,400 --> 00:01:32,140
So you're saying this AI could like listen to me

45
00:01:32,140 --> 00:01:34,320
talk in English and then I don't know,

46
00:01:34,320 --> 00:01:36,320
translate the ideas into French.

47
00:01:36,320 --> 00:01:38,120
But without actually translating word for word.

48
00:01:38,120 --> 00:01:40,880
Yeah, that's the goal, it's still early, but.

49
00:01:40,880 --> 00:01:41,720
That's wild.

50
00:01:41,720 --> 00:01:43,200
Okay, but back up a sec.

51
00:01:43,200 --> 00:01:45,080
How do these LCMs actually work?

52
00:01:45,080 --> 00:01:46,200
Like step by step.

53
00:01:46,200 --> 00:01:48,480
All right, so imagine you feed some text into the AI.

54
00:01:48,480 --> 00:01:51,680
First thing it does, splits that text into sentences.

55
00:01:51,680 --> 00:01:53,880
Gotcha, like breaking a book into chapters.

56
00:01:53,880 --> 00:01:55,000
Yeah, good analogy.

57
00:01:55,000 --> 00:01:58,920
Then each sentence, it gets turned into this concept vector.

58
00:01:58,920 --> 00:02:00,640
That's where the sentence embeddings come in.

59
00:02:00,640 --> 00:02:03,480
Okay, so each sentence is like a little package

60
00:02:03,480 --> 00:02:05,600
of meaning, this concept vector.

61
00:02:05,600 --> 00:02:08,500
Right, and then the LCM comes in,

62
00:02:08,500 --> 00:02:10,360
it takes those concept vectors

63
00:02:10,360 --> 00:02:12,680
and it tries to predict what the next one should be.

64
00:02:12,680 --> 00:02:14,400
So it's not just the next word,

65
00:02:14,400 --> 00:02:16,120
it's figuring out the next idea.

66
00:02:16,120 --> 00:02:19,320
Exactly, it's about the flow of ideas, not just words.

67
00:02:19,320 --> 00:02:22,080
So you're getting the logic behind a story

68
00:02:22,080 --> 00:02:23,360
an argument, whatever.

69
00:02:23,360 --> 00:02:25,800
It's just like moving past just grammar

70
00:02:25,800 --> 00:02:27,200
and getting into, ah.

71
00:02:27,200 --> 00:02:28,040
Semantics.

72
00:02:28,040 --> 00:02:28,880
Yeah, semantics.

73
00:02:28,880 --> 00:02:29,720
Yeah.

74
00:02:29,720 --> 00:02:31,360
And once it's predicted the next concept,

75
00:02:31,360 --> 00:02:34,120
it uses SONAR again to turn that vector back

76
00:02:34,120 --> 00:02:35,600
into a sentence we can read.

77
00:02:35,600 --> 00:02:39,140
So it's like text to concept, predict next concept,

78
00:02:39,140 --> 00:02:40,680
concept back to text.

79
00:02:40,680 --> 00:02:43,000
Pretty neat, but I'm guessing it's not that easy, right?

80
00:02:43,000 --> 00:02:43,840
You got it.

81
00:02:43,840 --> 00:02:44,920
Language is messy.

82
00:02:44,920 --> 00:02:45,760
Oh yeah.

83
00:02:45,760 --> 00:02:47,760
Like there could be tons of ways to continue a sentence,

84
00:02:47,760 --> 00:02:48,760
lots of different meanings.

85
00:02:48,760 --> 00:02:49,640
Right, right.

86
00:02:49,640 --> 00:02:51,200
So how does the LCM deal with that?

87
00:02:51,200 --> 00:02:54,040
Does it just pick like the most statistically likely

88
00:02:54,040 --> 00:02:55,120
next concept?

89
00:02:55,120 --> 00:02:57,040
They do something more interesting than just statistics.

90
00:02:57,040 --> 00:03:00,400
They have these things called diffusion-based LCMs.

91
00:03:00,400 --> 00:03:02,080
And these diffusion-based LCMs,

92
00:03:02,080 --> 00:03:06,400
they use this process of noising and denoising.

93
00:03:06,400 --> 00:03:08,160
Imagine you take a picture, like a photograph,

94
00:03:08,160 --> 00:03:10,000
and you add a bunch of static to it.

95
00:03:10,000 --> 00:03:11,040
Like on an old TV.

96
00:03:11,040 --> 00:03:12,000
Yeah, exactly.

97
00:03:12,000 --> 00:03:15,200
And then they train the LCM to like remove that static

98
00:03:15,200 --> 00:03:16,440
and get back the original picture.

99
00:03:16,440 --> 00:03:17,760
And by doing that over and over,

100
00:03:17,760 --> 00:03:20,880
the LCM learns to see through the noise,

101
00:03:20,880 --> 00:03:23,320
to get the structure, even with all that mess.

102
00:03:23,320 --> 00:03:26,720
So like training the AI to see the core meaning,

103
00:03:26,720 --> 00:03:27,560
even if it's hidden.

104
00:03:27,560 --> 00:03:28,600
Exactly.

105
00:03:28,600 --> 00:03:32,480
And there are two main types of these diffusion-based things.

106
00:03:32,480 --> 00:03:33,880
One-tower and two-tower.

107
00:03:33,880 --> 00:03:35,480
One tower, two tower.

108
00:03:35,480 --> 00:03:36,480
Why the different names?

109
00:03:36,480 --> 00:03:37,680
It's all about their architecture,

110
00:03:37,680 --> 00:03:38,760
like how they're built inside.

111
00:03:38,760 --> 00:03:41,000
One tower, everything happens in one structure,

112
00:03:41,000 --> 00:03:42,000
like a one-stop shop.

113
00:03:42,000 --> 00:03:42,640
OK.

114
00:03:42,640 --> 00:03:44,560
But two-tower models, they split it up.

115
00:03:44,560 --> 00:03:47,360
One tower for encoding the input into concepts,

116
00:03:47,360 --> 00:03:50,040
and another tower for predicting and decoding the next one.

117
00:03:50,040 --> 00:03:50,600
Huh.

118
00:03:50,600 --> 00:03:52,040
So it's like dividing the labor?

119
00:03:52,040 --> 00:03:52,880
Yeah.

120
00:03:52,880 --> 00:03:54,120
Each tower is a specialist.

121
00:03:54,120 --> 00:03:54,440
Yeah.

122
00:03:54,440 --> 00:03:56,560
And what they found is this two-tower setup,

123
00:03:56,560 --> 00:03:58,600
it seems to work really well for language,

124
00:03:58,600 --> 00:04:01,120
especially for generating text that makes sense, you know,

125
00:04:01,120 --> 00:04:01,720
is coherent.

126
00:04:01,720 --> 00:04:04,040
Like it consistently outperforms the others.

127
00:04:04,040 --> 00:04:04,560
Makes sense, right?

128
00:04:04,560 --> 00:04:05,800
Like a team of experts.

129
00:04:05,800 --> 00:04:06,640
Exactly.

130
00:04:06,640 --> 00:04:08,840
But there's another type of LCM they looked at too.

131
00:04:08,840 --> 00:04:11,560
It's called a quantized LCM.

132
00:04:11,560 --> 00:04:14,120
Quantized.

133
00:04:14,120 --> 00:04:15,240
That sounds complicated.

134
00:04:15,240 --> 00:04:18,400
Well, it's about dealing with how huge the concept space is.

135
00:04:18,400 --> 00:04:20,200
Imagine all the possible concepts, right?

136
00:04:20,200 --> 00:04:21,880
Like every idea ever.

137
00:04:21,880 --> 00:04:22,240
OK.

138
00:04:22,240 --> 00:04:23,480
That's a lot.

139
00:04:23,480 --> 00:04:26,320
So quantized LCMs, they try to simplify that

140
00:04:26,320 --> 00:04:28,440
by like breaking it down into chunks.

141
00:04:28,440 --> 00:04:30,920
So instead of infinite concepts, it's more like a grid.

142
00:04:30,920 --> 00:04:32,000
Yeah, exactly.

143
00:04:32,000 --> 00:04:33,360
It makes it easier to learn.

144
00:04:33,360 --> 00:04:34,920
And they have two kinds of these,

145
00:04:34,920 --> 00:04:39,320
Quant-LCM-d and Quant-LCM-c, which do this chunking

146
00:04:39,320 --> 00:04:40,440
a little differently.

147
00:04:40,440 --> 00:04:43,960
OK, so we've got the basic LCM, the diffusion-based ones

148
00:04:43,960 --> 00:04:47,600
with their towers, and these quantized ones.

149
00:04:47,600 --> 00:04:48,760
That's a lot of options.

150
00:04:48,760 --> 00:04:50,320
How do they know which one's best?

151
00:04:50,320 --> 00:04:51,760
That's where the experiments come in.

152
00:04:51,760 --> 00:04:54,360
They do what's called ablation studies.

153
00:04:54,360 --> 00:04:56,480
Basically, they test them all out head to head.

154
00:04:56,480 --> 00:04:57,920
So like an AI bake-off.

155
00:04:57,920 --> 00:04:58,720
Yeah.

156
00:04:58,720 --> 00:05:01,200
And what they found is those diffusion-based LCMs,

157
00:05:01,200 --> 00:05:03,480
especially the two-tower ones, they do really well.

158
00:05:03,480 --> 00:05:05,120
So two-tower's the winner so far.

159
00:05:05,120 --> 00:05:07,880
What else did they learn from these tests?

160
00:05:07,880 --> 00:05:11,360
One interesting thing was about tuning the noise schedule.

161
00:05:11,360 --> 00:05:12,040
Noise schedule.

162
00:05:12,040 --> 00:05:13,400
Remind me what that was again.

163
00:05:13,400 --> 00:05:15,440
Remember that whole noising and denoising thing?

164
00:05:15,440 --> 00:05:18,320
The noise schedule is, like, how much noise you add and when.

165
00:05:18,320 --> 00:05:20,640
It's like how much static you put on the TV picture.

166
00:05:20,640 --> 00:05:21,520
Yep.

167
00:05:21,520 --> 00:05:23,320
And by adjusting that, they can actually

168
00:05:23,320 --> 00:05:26,840
control how well the LCM predicts the next concept

169
00:05:26,840 --> 00:05:29,040
and also how diverse the output is.

170
00:05:29,040 --> 00:05:29,720
Oh, that's cool.

171
00:05:29,720 --> 00:05:32,440
So like finding the balance between sticking to the script

172
00:05:32,440 --> 00:05:33,520
and being creative.

173
00:05:33,520 --> 00:05:35,160
Right, exactly.

174
00:05:35,160 --> 00:05:37,560
And another thing they found, this was interesting,

175
00:05:37,560 --> 00:05:41,440
this idea of fragility in the SONAR space.

176
00:05:41,440 --> 00:05:42,640
Fragility.

177
00:05:42,640 --> 00:05:45,680
So it turns out that some sentence embeddings,

178
00:05:45,680 --> 00:05:47,920
they're really sensitive to changes.

179
00:05:47,920 --> 00:05:50,280
Like you change one tiny thing and the meaning's

180
00:05:50,280 --> 00:05:51,320
totally different.

181
00:05:51,320 --> 00:05:52,600
Oh, like that telephone game.

182
00:05:52,600 --> 00:05:53,520
Exactly.

183
00:05:53,520 --> 00:05:56,560
And that's a big deal for training these LCMs.

184
00:05:56,560 --> 00:05:58,240
Because if the foundation's shaky,

185
00:05:58,240 --> 00:05:59,800
the whole thing could fall apart.

186
00:05:59,800 --> 00:06:01,480
So it's not just about the LCM itself.

187
00:06:01,480 --> 00:06:03,880
It's got to have good, solid concepts to work with.

188
00:06:03,880 --> 00:06:04,360
Right.

189
00:06:04,360 --> 00:06:06,320
If the sentence embeddings are messed up,

190
00:06:06,320 --> 00:06:07,400
the whole thing's messed up.

191
00:06:07,400 --> 00:06:07,920
OK.

192
00:06:07,920 --> 00:06:10,720
So we've been talking about these kind of smaller models.

193
00:06:10,720 --> 00:06:11,720
Yeah.

194
00:06:11,720 --> 00:06:14,400
But I have a feeling they didn't stop there, right?

195
00:06:14,400 --> 00:06:15,400
You know it.

196
00:06:15,400 --> 00:06:16,680
They went big.

197
00:06:16,680 --> 00:06:20,600
They built a 7-billion-parameter two-tower LCM.

198
00:06:20,600 --> 00:06:22,800
Whoa, 7 billion.

199
00:06:22,800 --> 00:06:24,080
That's massive.

200
00:06:24,080 --> 00:06:26,000
What could you even do with a model that big?

201
00:06:26,000 --> 00:06:27,840
Well, they tried out some tougher tasks,

202
00:06:27,840 --> 00:06:30,000
like text summarization.

203
00:06:30,000 --> 00:06:32,080
You know, taking a long piece of text

204
00:06:32,080 --> 00:06:34,480
and boiling it down to the main points.

205
00:06:34,480 --> 00:06:37,400
And they even did this new thing called summary expansion.

206
00:06:37,400 --> 00:06:39,120
So you take a short summary and you

207
00:06:39,120 --> 00:06:42,640
generate a longer text from it, like fleshing out an outline.

208
00:06:42,640 --> 00:06:42,920
Wow.

209
00:06:42,920 --> 00:06:44,400
So not just thinking in concepts,

210
00:06:44,400 --> 00:06:46,640
but also doing these complex tasks.

211
00:06:46,640 --> 00:06:47,160
Yeah.

212
00:06:47,160 --> 00:06:47,800
Did it work?

213
00:06:47,800 --> 00:06:49,840
Like was this mega LCM any good?

214
00:06:49,840 --> 00:06:51,840
They used a bunch of different metrics to check it out.

215
00:06:51,840 --> 00:06:53,160
And yeah, it's impressive.

216
00:06:53,160 --> 00:06:55,280
Even at that scale, LCMs were holding their own

217
00:06:55,280 --> 00:06:58,080
against the usual large language models,

218
00:06:58,080 --> 00:06:59,200
sometimes even beating them.

219
00:06:59,200 --> 00:07:00,240
That's incredible.

220
00:07:00,240 --> 00:07:01,400
But wait, there's more, right?

221
00:07:01,400 --> 00:07:04,680
You mentioned something about multilingual magic.

222
00:07:04,680 --> 00:07:06,240
Yeah, this is where it gets really cool.

223
00:07:06,240 --> 00:07:08,640
Because these LCMs are working with concepts,

224
00:07:08,640 --> 00:07:10,920
they can potentially handle any language that's

225
00:07:10,920 --> 00:07:14,280
SONAR supports, and get this, without extra training.

226
00:07:14,280 --> 00:07:15,000
Hold up.

227
00:07:15,000 --> 00:07:19,000
You mean I could train an LCM on English text,

228
00:07:19,000 --> 00:07:21,480
and it could understand, I don't know, Japanese

229
00:07:21,480 --> 00:07:23,200
without ever seeing Japanese before?

230
00:07:23,200 --> 00:07:24,000
That's the idea.

231
00:07:24,000 --> 00:07:26,240
It's learning the concepts, not just the words.

232
00:07:26,240 --> 00:07:27,040
My mind is blown.

233
00:07:27,040 --> 00:07:28,240
Did they actually test this out?

234
00:07:28,240 --> 00:07:29,000
Oh, yeah.

235
00:07:29,000 --> 00:07:31,920
They did a test on this multilingual data set called

236
00:07:31,920 --> 00:07:33,240
XLSum.

237
00:07:33,240 --> 00:07:36,720
And the LCM, it beat a really powerful language model,

238
00:07:36,720 --> 00:07:42,760
Llama 3.1 8B IT, on 42 different languages, many of which

239
00:07:42,760 --> 00:07:44,000
it had never seen before.

240
00:07:44,000 --> 00:07:44,520
No way.

241
00:07:44,520 --> 00:07:45,600
It's pretty amazing.

242
00:07:45,600 --> 00:07:47,800
This zero-shot multilingual thing,

243
00:07:47,800 --> 00:07:49,240
it's a total game changer.

244
00:07:49,240 --> 00:07:50,320
And this is huge.

245
00:07:50,320 --> 00:07:51,920
It sounds almost too good to be true.

246
00:07:51,920 --> 00:07:52,440
I know.

247
00:07:52,440 --> 00:07:53,200
It's still early.

248
00:07:53,200 --> 00:07:54,160
But think about it.

249
00:07:54,160 --> 00:07:56,920
Communication, collaboration, understanding.

250
00:07:56,920 --> 00:07:58,080
No more language barriers.

251
00:07:58,080 --> 00:07:59,040
Exactly.

252
00:07:59,040 --> 00:08:00,920
And they even go one step further in the paper.

253
00:08:00,920 --> 00:08:04,360
They start talking about giving LCMs a plan to follow,

254
00:08:04,360 --> 00:08:05,440
like an outline.

255
00:08:05,440 --> 00:08:08,120
So instead of just predicting whatever concept comes next,

256
00:08:08,120 --> 00:08:09,160
it's got a roadmap.

257
00:08:09,160 --> 00:08:09,560
Right.

258
00:08:09,560 --> 00:08:13,440
They call them break concepts and plan concepts

259
00:08:13,440 --> 00:08:14,640
to help structure things.

260
00:08:14,640 --> 00:08:17,920
So break concepts are like chapter breaks in a book.

261
00:08:17,920 --> 00:08:21,080
And plan concepts tell you what should be in each section.

262
00:08:21,080 --> 00:08:25,640
They even tried it out with what they call a large planning

263
00:08:25,640 --> 00:08:28,640
concept model, or LPCM.

264
00:08:28,640 --> 00:08:30,160
And it looks really good.

265
00:08:30,160 --> 00:08:32,920
The text it generated was way more coherent.

266
00:08:32,920 --> 00:08:35,040
So these LCMs, they can think in concepts,

267
00:08:35,040 --> 00:08:36,960
they can do multiple languages, and now they

268
00:08:36,960 --> 00:08:38,480
can follow a plan.

269
00:08:38,480 --> 00:08:39,480
This is revolutionary.

270
00:08:39,480 --> 00:08:41,280
It's pushing the boundaries, for sure.

271
00:08:41,280 --> 00:08:44,880
We're getting closer to AI that can actually think like us.

272
00:08:44,880 --> 00:08:46,320
This is mind-blowing.

273
00:08:46,320 --> 00:08:48,040
But OK, before we get too carried away,

274
00:08:48,040 --> 00:08:49,800
we've got to talk about the limitations too.

275
00:08:49,800 --> 00:08:51,480
I mean, no AI is perfect, right?

276
00:08:51,480 --> 00:08:52,360
Definitely not.

277
00:08:52,360 --> 00:08:53,920
And the paper is honest about that.

278
00:08:53,920 --> 00:08:54,800
Good.

279
00:08:54,800 --> 00:08:56,960
So that's what we'll do in the next part of this deep dive.

280
00:08:56,960 --> 00:08:57,760
Stick around.

281
00:08:57,760 --> 00:08:58,960
We've got more to uncover.

282
00:08:58,960 --> 00:09:01,000
Back again, ready to talk limitations.

283
00:09:01,000 --> 00:09:01,720
Let's do it.

284
00:09:01,720 --> 00:09:05,440
It all sounds amazing, but no AI is perfect.

285
00:09:05,440 --> 00:09:05,960
Right.

286
00:09:05,960 --> 00:09:07,160
The paper's upfront about that.

287
00:09:07,160 --> 00:09:08,680
There's still a lot to figure out.

288
00:09:08,680 --> 00:09:09,160
Yeah.

289
00:09:09,160 --> 00:09:09,680
Yeah.

290
00:09:09,680 --> 00:09:11,000
I mean, concepts themselves, they're

291
00:09:11,000 --> 00:09:11,880
kind of fuzzy, right?

292
00:09:11,880 --> 00:09:13,040
Mm-hmm.

293
00:09:13,040 --> 00:09:16,920
How do you even define a concept for an AI?

294
00:09:16,920 --> 00:09:18,560
That's the million dollar question.

295
00:09:18,560 --> 00:09:20,680
How do you really capture what a concept is?

296
00:09:20,680 --> 00:09:23,800
And that ties into, remember that whole fragility thing

297
00:09:23,800 --> 00:09:24,520
we were talking about?

298
00:09:24,520 --> 00:09:26,640
Oh, yeah, with the sentence embeddings.

299
00:09:26,640 --> 00:09:29,640
Like, a tiny change can totally mess up the meaning.

300
00:09:29,640 --> 00:09:30,480
Exactly.

301
00:09:30,480 --> 00:09:32,280
So when you're talking about concepts,

302
00:09:32,280 --> 00:09:34,680
that fragility is a big deal.

303
00:09:34,680 --> 00:09:37,040
Imagine the AI trying to understand something

304
00:09:37,040 --> 00:09:40,480
complex, democracy, justice, these big ideas.

305
00:09:40,480 --> 00:09:42,920
If the sentence embedding for those is off,

306
00:09:42,920 --> 00:09:44,200
even by a little bit.

307
00:09:44,200 --> 00:09:45,240
The whole thing's messed up.

308
00:09:45,240 --> 00:09:45,840
Yeah.

309
00:09:45,840 --> 00:09:47,760
It's like you're building on a shaky foundation.

310
00:09:47,760 --> 00:09:49,360
Doesn't matter how fancy the rest of it is.

311
00:09:49,360 --> 00:09:49,880
Right, right.

312
00:09:49,880 --> 00:09:52,360
So you've got to make sure those concept vectors are, like,

313
00:09:52,360 --> 00:09:54,520
really solid, accurate.

314
00:09:54,520 --> 00:09:55,320
Absolutely.

315
00:09:55,320 --> 00:09:57,280
And that kind of leads into another limitation,

316
00:09:57,280 --> 00:10:00,160
the computing power, encoding all these sentences

317
00:10:00,160 --> 00:10:03,360
into concept vectors, and then decoding them back.

318
00:10:03,360 --> 00:10:03,800
Oh, yeah.

319
00:10:03,800 --> 00:10:05,240
I bet that takes a lot of juice.

320
00:10:05,240 --> 00:10:07,640
It's a lot, especially with these huge data sets.

321
00:10:07,640 --> 00:10:09,560
It's like a whole extra layer on top

322
00:10:09,560 --> 00:10:11,200
of regular language processing.

323
00:10:11,200 --> 00:10:12,320
So it's a trade-off, right?

324
00:10:12,320 --> 00:10:14,120
You want the AI to think big, but it's

325
00:10:14,120 --> 00:10:16,000
got to be able to handle it computationally.

326
00:10:16,000 --> 00:10:16,560
Totally.

327
00:10:16,560 --> 00:10:18,200
And then there's the data itself.

328
00:10:18,200 --> 00:10:21,480
Like, LCMs still need to learn from tons of text, right?

329
00:10:21,480 --> 00:10:22,200
For sure.

330
00:10:22,200 --> 00:10:25,840
And if that data is bad, biased, or just limited.

331
00:10:25,840 --> 00:10:26,920
Garbage in, garbage out.

332
00:10:26,920 --> 00:10:27,840
Exactly.

333
00:10:27,840 --> 00:10:30,760
So finding good data, that's a huge challenge,

334
00:10:30,760 --> 00:10:32,920
especially for all those languages.

335
00:10:32,920 --> 00:10:35,400
200-plus that SONAR can handle.

336
00:10:35,400 --> 00:10:37,520
It's got to be high quality, diverse,

337
00:10:37,520 --> 00:10:40,160
and represent all those different ways people use language.

338
00:10:40,160 --> 00:10:41,120
That's a tall order.

339
00:10:41,120 --> 00:10:41,680
Yeah.

340
00:10:41,680 --> 00:10:43,880
OK, so we've got the abstractness of concepts,

341
00:10:43,880 --> 00:10:47,040
the computing power issue, and the data problem.

342
00:10:47,040 --> 00:10:47,760
Anything else?

343
00:10:47,760 --> 00:10:50,480
Well, there's also the question of how

344
00:10:50,480 --> 00:10:51,760
you evaluate these things.

345
00:10:51,760 --> 00:10:54,400
Like, how do we know if an LCM is doing a good job?

346
00:10:54,400 --> 00:10:54,760
Right.

347
00:10:54,760 --> 00:10:56,800
It's not just about predicting the next word anymore.

348
00:10:56,800 --> 00:10:57,600
Exactly.

349
00:10:57,600 --> 00:11:01,840
We need ways to measure how well it understands ideas,

350
00:11:01,840 --> 00:11:04,600
how well it can think conceptually.

351
00:11:04,600 --> 00:11:05,400
That's tricky.

352
00:11:05,400 --> 00:11:06,200
Yeah.

353
00:11:06,200 --> 00:11:08,360
So we need new metrics, metrics that

354
00:11:08,360 --> 00:11:11,000
make sense to humans that capture what we think of

355
00:11:11,000 --> 00:11:12,680
as conceptual understanding.

356
00:11:12,680 --> 00:11:14,160
Like, you can't judge a painting just

357
00:11:14,160 --> 00:11:15,640
by looking at the brushstrokes.

358
00:11:15,640 --> 00:11:16,680
Perfect analogy.

359
00:11:16,680 --> 00:11:17,680
There's more to it than that.

360
00:11:17,680 --> 00:11:19,440
We need to see the whole picture, you know?

361
00:11:19,440 --> 00:11:22,200
OK, so limitations, check.

362
00:11:22,200 --> 00:11:24,640
But we can't forget about the potential here.

363
00:11:24,640 --> 00:11:27,440
I mean, this is AI that could change everything.

364
00:11:27,440 --> 00:11:31,720
Communication, understanding, the way we learn, create,

365
00:11:31,720 --> 00:11:32,680
even solve problems.

366
00:11:32,680 --> 00:11:34,120
I know, right?

367
00:11:34,120 --> 00:11:34,960
It's mind-blowing.

368
00:11:34,960 --> 00:11:37,600
This paper, it just opens up so many doors.

369
00:11:37,600 --> 00:11:38,040
It does.

370
00:11:38,040 --> 00:11:40,520
Even with the challenges, this feels like a huge step forward.

371
00:11:40,520 --> 00:11:41,360
Absolutely.

372
00:11:41,360 --> 00:11:44,720
Like, we're getting closer to AI that's truly intelligent.

373
00:11:44,720 --> 00:11:46,840
So in the last part of this deep dive,

374
00:11:46,840 --> 00:11:48,480
we'll explore some of those possibilities.

375
00:11:48,480 --> 00:11:50,280
Get ready for a glimpse into the future.

376
00:11:50,280 --> 00:11:51,440
All right, final part.

377
00:11:51,440 --> 00:11:52,560
The future.

378
00:11:52,560 --> 00:11:53,440
Let's get into it.

379
00:11:53,440 --> 00:11:54,400
The big picture, right?

380
00:11:54,400 --> 00:11:55,000
Yeah.

381
00:11:55,000 --> 00:11:56,600
I want to hear about all those possibilities.

382
00:11:56,600 --> 00:11:58,680
We've talked about how it works, the challenges.

383
00:11:58,680 --> 00:11:59,000
Yeah.

384
00:11:59,000 --> 00:12:01,640
Now let's, like, unleash our imaginations.

385
00:12:01,640 --> 00:12:06,240
Imagine a world where language isn't a barrier anymore.

386
00:12:06,240 --> 00:12:09,760
Where AI can translate, understand, but not just words,

387
00:12:09,760 --> 00:12:13,800
the ideas across, like, all these different languages.

388
00:12:13,800 --> 00:12:14,400
That's powerful.

389
00:12:14,400 --> 00:12:15,920
No more struggling with translations

390
00:12:15,920 --> 00:12:17,840
or trying to figure out what someone really means.

391
00:12:17,840 --> 00:12:20,600
Just clear communication.

392
00:12:20,600 --> 00:12:22,080
Global collaboration.

393
00:12:22,080 --> 00:12:23,360
And it's not just language, right?

394
00:12:23,360 --> 00:12:24,640
Because it's all about concepts.

395
00:12:24,640 --> 00:12:26,640
These LCMs could work with other stuff, too.

396
00:12:26,640 --> 00:12:29,720
Images, music, even, like, scientific data.

397
00:12:29,720 --> 00:12:30,400
Wait, hold on.

398
00:12:30,400 --> 00:12:34,040
You're saying an AI could get the meaning behind a photo

399
00:12:34,040 --> 00:12:36,040
or, like, the emotions in a song.

400
00:12:36,040 --> 00:12:39,080
That's the idea, bridging those gaps between how we express

401
00:12:39,080 --> 00:12:41,160
ourselves and how we understand things.

402
00:12:41,160 --> 00:12:44,640
So, like, an AI could read a scientific paper,

403
00:12:44,640 --> 00:12:47,360
get the key points, and then, I don't know,

404
00:12:47,360 --> 00:12:48,680
make a chart of the data.

405
00:12:48,680 --> 00:12:48,920
Yeah.

406
00:12:48,920 --> 00:12:50,560
And explain it all in simple language.

407
00:12:50,560 --> 00:12:51,400
Exactly.

408
00:12:51,400 --> 00:12:54,440
Think about how useful that would be for research,

409
00:12:54,440 --> 00:12:55,400
for learning.

410
00:12:55,400 --> 00:12:57,120
So many possibilities.

411
00:12:57,120 --> 00:12:59,640
It's like having this super smart assistant

412
00:12:59,640 --> 00:13:01,720
that can process any kind of information

413
00:13:01,720 --> 00:13:02,840
and tailor it to you.

414
00:13:02,840 --> 00:13:04,160
Personalized learning.

415
00:13:04,160 --> 00:13:04,920
Imagine that.

416
00:13:04,920 --> 00:13:08,080
An AI tutor that knows exactly how you learn best.

417
00:13:08,080 --> 00:13:09,200
OK, that's amazing.

418
00:13:09,200 --> 00:13:11,040
But what about, like, creative stuff?

419
00:13:11,040 --> 00:13:14,600
Could these LCMs help you write a story or compose music,

420
00:13:14,600 --> 00:13:16,000
design new products even?

421
00:13:16,000 --> 00:13:16,640
I think so.

422
00:13:16,640 --> 00:13:18,080
Because you're working with concepts,

423
00:13:18,080 --> 00:13:20,520
it's, like, breaking free from how we create now.

424
00:13:20,520 --> 00:13:23,560
You could explore new ideas, combine them in different ways.

425
00:13:23,560 --> 00:13:24,800
Push those boundaries.

426
00:13:24,800 --> 00:13:25,600
Exactly.

427
00:13:25,600 --> 00:13:27,960
It's like having a brainstorming partner, one that never

428
00:13:27,960 --> 00:13:29,120
runs out of ideas.

429
00:13:29,120 --> 00:13:30,160
That would be awesome.

430
00:13:30,160 --> 00:13:34,040
But OK, zooming out even more, what about the impact on,

431
00:13:34,040 --> 00:13:36,240
like, society as a whole?

432
00:13:36,240 --> 00:13:38,760
Could these LCMs help us solve big problems?

433
00:13:38,760 --> 00:13:40,400
That's the big question, right?

434
00:13:40,400 --> 00:13:42,840
What if they could analyze all that data

435
00:13:42,840 --> 00:13:46,720
we have on climate change, poverty, whatever,

436
00:13:46,720 --> 00:13:49,320
and come up with solutions that we haven't even thought of?

437
00:13:49,320 --> 00:13:51,720
It's like a global brain working on all these problems.

438
00:13:51,720 --> 00:13:53,600
It's exciting, but we've got to be careful, too.

439
00:13:53,600 --> 00:13:55,360
Like, with any powerful technology,

440
00:13:55,360 --> 00:13:57,200
there are ethical things to think about.

441
00:13:57,200 --> 00:13:59,600
We need to make sure these LCMs are used for good.

442
00:13:59,600 --> 00:14:01,200
Right, for good.

443
00:14:01,200 --> 00:14:03,840
But still, this is pretty amazing stuff.

444
00:14:03,840 --> 00:14:06,520
This whole concept of conceptual AI, it's-

445
00:14:06,520 --> 00:14:07,400
It's a game changer.

446
00:14:07,400 --> 00:14:08,080
It is.

447
00:14:08,080 --> 00:14:09,880
This paper, it's really opened my eyes.

448
00:14:09,880 --> 00:14:11,480
Thanks for taking this deep dive with me.

449
00:14:11,480 --> 00:14:12,600
It's been a pleasure.

450
00:14:12,600 --> 00:14:14,240
This research, it's mind-blowing.

451
00:14:14,240 --> 00:14:15,480
And we're just at the beginning.

452
00:14:15,480 --> 00:14:16,720
That's what's so cool about it, right?

453
00:14:16,720 --> 00:14:18,960
Who knows what the future holds?

454
00:14:18,960 --> 00:14:20,600
All right, that wraps up our deep dive

455
00:14:20,600 --> 00:14:22,240
into large concept models.

456
00:14:22,240 --> 00:14:25,200
Until next time, everyone, keep those brains buzzing.

457
00:14:25,200 --> 00:14:27,800
And if you find any cool AI research out there.

458
00:14:27,800 --> 00:14:28,800
Send it our way.

459
00:14:28,800 --> 00:14:56,360
We'll be here ready for the next deep dive.