1
00:00:00,000 --> 00:00:01,920
Hey everyone and welcome back.

2
00:00:01,920 --> 00:00:03,800
Today we're gonna be taking a deep dive

3
00:00:03,800 --> 00:00:07,680
into a paper that's been making some waves in the AI world.

4
00:00:07,680 --> 00:00:09,040
Oh yeah, I saw that one.

5
00:00:09,040 --> 00:00:10,800
DeepSeek V3,

6
00:00:10,800 --> 00:00:13,800
scaling open-source language models with longtermism.

7
00:00:15,080 --> 00:00:19,440
DeepSeek V3 is a new open-source large language model.

8
00:00:21,120 --> 00:00:22,840
An LLM. LLM.

9
00:00:22,840 --> 00:00:25,040
And it's designed to be really efficient,

10
00:00:25,040 --> 00:00:27,640
cost-effective, and to perform really well.

11
00:00:27,640 --> 00:00:29,960
Especially when it comes to things like code and math.

12
00:00:29,960 --> 00:00:32,680
Yeah, they're really trying to challenge those

13
00:00:32,680 --> 00:00:36,120
big closed source models we hear about all the time.

14
00:00:36,120 --> 00:00:37,800
It's really cool to see open source models

15
00:00:37,800 --> 00:00:38,920
kind of pushing the boundaries.

16
00:00:38,920 --> 00:00:40,880
Yeah, making it more accessible to everybody.

17
00:00:40,880 --> 00:00:44,040
Right, so let's unpack what makes DeepSeek V3 so special.

18
00:00:44,040 --> 00:00:44,880
Okay.

19
00:00:44,880 --> 00:00:46,120
They start off the paper,

20
00:00:46,120 --> 00:00:48,880
talking about this need for these open source models

21
00:00:48,880 --> 00:00:51,040
in the AI landscape.

22
00:00:51,040 --> 00:00:54,000
And they present DeepSeek V3 as their answer to that call.

23
00:00:54,000 --> 00:00:56,560
Yeah, that it's crucial for driving innovation

24
00:00:56,560 --> 00:00:59,400
and making AI more democratic, collaborative.

25
00:00:59,400 --> 00:01:01,720
I like that, democratizing AI.

26
00:01:01,720 --> 00:01:04,000
It shouldn't just be for the big players, right?

27
00:01:04,000 --> 00:01:07,280
So how do they actually build this impressive model?

28
00:01:07,280 --> 00:01:09,640
Well, they started with that familiar framework

29
00:01:09,640 --> 00:01:13,960
called Transformer, which is pretty standard for LLMs.

30
00:01:13,960 --> 00:01:15,640
You can think of it as the blueprint

31
00:01:15,640 --> 00:01:18,960
for how the model processes and understands language.

32
00:01:18,960 --> 00:01:19,920
So a foundation.

33
00:01:19,920 --> 00:01:21,760
Exactly, a foundation.

34
00:01:21,760 --> 00:01:22,960
But on top of this foundation,

35
00:01:22,960 --> 00:01:26,040
they made some really clever tweaks to help it stand out.

36
00:01:26,040 --> 00:01:29,880
For instance, they use something called multi-head latent attention

37
00:01:29,880 --> 00:01:31,120
or MLA.

38
00:01:31,120 --> 00:01:31,960
MLA.

39
00:01:31,960 --> 00:01:33,280
Okay, now, that's a mouthful.

40
00:01:33,280 --> 00:01:34,880
Yeah, it sounds intimidating a little bit.

41
00:01:34,880 --> 00:01:35,800
Can you break that down?

42
00:01:35,800 --> 00:01:37,280
Yeah, so think of it this way.

43
00:01:37,280 --> 00:01:40,320
Imagine the model has this massive filing cabinet

44
00:01:40,320 --> 00:01:41,600
full of information.

45
00:01:41,600 --> 00:01:42,440
Okay.

46
00:01:42,440 --> 00:01:44,760
MLA is like a super efficient filing system.

47
00:01:44,760 --> 00:01:45,600
Okay.

48
00:01:45,600 --> 00:01:49,360
That helps the model find the information it needs much faster.

49
00:01:49,360 --> 00:01:50,840
So making it more efficient.

50
00:01:50,840 --> 00:01:52,280
Exactly.

51
00:01:52,280 --> 00:01:56,760
This means DeepSeek V3 can run on less powerful hardware,

52
00:01:56,760 --> 00:01:59,800
making it more accessible for developers and researchers.

53
00:01:59,800 --> 00:02:00,640
I see.

54
00:02:00,640 --> 00:02:04,640
Who might not have access to those giant supercomputers.

55
00:02:04,640 --> 00:02:07,120
So it's all about kind of making things leaner and faster.

56
00:02:07,120 --> 00:02:08,240
Exactly.

57
00:02:08,240 --> 00:02:09,680
And they didn't stop there.

58
00:02:09,680 --> 00:02:12,000
They also incorporated DeepSeekMoE.

59
00:02:12,000 --> 00:02:12,840
Okay.

60
00:02:12,840 --> 00:02:16,040
Which uses a mix of shared and routed experts.

61
00:02:16,040 --> 00:02:19,520
Oh, experts, like having different parts of the model

62
00:02:19,520 --> 00:02:20,960
specialized in different tasks?

63
00:02:20,960 --> 00:02:21,800
That's a great way to put it.

64
00:02:21,800 --> 00:02:22,640
Yeah.

65
00:02:22,640 --> 00:02:24,960
It's like having a team of specialists all working together.

66
00:02:24,960 --> 00:02:25,800
I see.

67
00:02:25,800 --> 00:02:28,120
Instead of one giant model trying to do everything,

68
00:02:28,120 --> 00:02:30,800
this approach improves training efficiency.

69
00:02:30,800 --> 00:02:31,640
Okay.

70
00:02:31,640 --> 00:02:34,400
Meaning they can train DeepSeek V3 faster

71
00:02:34,400 --> 00:02:36,160
and use fewer resources.
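The "team of specialists" idea can be sketched in a few lines. This is a generic illustration and not DeepSeek's implementation: the experts are plain Python functions on a single number, the router scores are passed in by hand, and the softmax gate and `top_k=2` are assumptions chosen for the example.

```python
import math

def softmax(scores):
    # Numerically stable softmax over a list of floats.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def moe_layer(x, shared_experts, routed_experts, router_scores, top_k=2):
    """Combine always-on shared experts with the top-k routed experts."""
    # Shared experts process every input.
    out = sum(expert(x) for expert in shared_experts)
    # The router activates only the k best-scoring routed experts.
    gates = softmax(router_scores)
    top = sorted(range(len(routed_experts)), key=lambda i: gates[i], reverse=True)[:top_k]
    norm = sum(gates[i] for i in top)
    for i in top:
        out += (gates[i] / norm) * routed_experts[i](x)
    return out, top

# Hypothetical toy experts: each one is just a different scaling of the input.
shared = [lambda x: x]
routed = [lambda x: 2 * x, lambda x: 3 * x, lambda x: -x, lambda x: 10 * x]
output, chosen = moe_layer(1.0, shared, routed, router_scores=[0.1, 2.0, -1.0, 2.0])
print(chosen, output)
```

Only two of the four routed experts run for this input, which is where the savings come from: the model can hold many experts while each token pays for just a few.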

72
00:02:36,160 --> 00:02:39,040
So MLA is like the efficient filing system.

73
00:02:39,040 --> 00:02:42,000
And DeepSeekMoE is like the team of specialized experts.

74
00:02:42,000 --> 00:02:43,040
Exactly.

75
00:02:43,040 --> 00:02:43,880
I like that.

76
00:02:43,880 --> 00:02:46,600
What else did they do to make this model unique?

77
00:02:46,600 --> 00:02:51,480
One really cool innovation is multi-token prediction or MTP.

78
00:02:51,480 --> 00:02:52,320
MTP.

79
00:02:52,320 --> 00:02:55,360
So think of traditional models as someone reading a sentence

80
00:02:55,360 --> 00:02:57,440
one word at a time and always guessing the next word.

81
00:02:57,440 --> 00:02:58,280
Okay, yeah.

82
00:02:58,280 --> 00:03:01,080
With MTP, DeepSeek V3 can actually predict

83
00:03:01,080 --> 00:03:02,400
multiple words at once.

84
00:03:02,400 --> 00:03:03,240
Wow.

85
00:03:03,240 --> 00:03:05,360
Making it more efficient and potentially leading

86
00:03:05,360 --> 00:03:06,760
to better predictions overall.
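To make the difference from next-word prediction concrete, here is a minimal sketch of how the training targets change. It only builds the (context, targets) pairs. The function name and the `depth` parameter are hypothetical, and the sketch says nothing about how the model itself produces the extra predictions.

```python
def multi_token_targets(tokens, depth=2):
    """For each position, list the next `depth` tokens the model should predict."""
    examples = []
    # Stop early enough that every position still has `depth` future tokens.
    for i in range(len(tokens) - depth):
        context = tokens[: i + 1]
        targets = tokens[i + 1 : i + 1 + depth]
        examples.append((context, targets))
    return examples

sentence = ["the", "cat", "sat", "on", "the", "mat"]
for context, targets in multi_token_targets(sentence, depth=2):
    print(context, "->", targets)
```

With `depth=1` this reduces to ordinary next-token prediction, so the multi-token setup can be read as asking for more supervision from the same text.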

87
00:03:06,760 --> 00:03:08,760
So it's like the model is thinking ahead,

88
00:03:08,760 --> 00:03:10,720
trying to anticipate what's coming next.

89
00:03:10,720 --> 00:03:11,920
That's smart.

90
00:03:11,920 --> 00:03:14,720
So how much computing power do you need to train a model

91
00:03:14,720 --> 00:03:15,560
that can do all this?

92
00:03:15,560 --> 00:03:17,520
I mean, that's gotta be some serious hardware, right?

93
00:03:17,520 --> 00:03:20,640
Oh, it definitely requires a lot of computational resources.

94
00:03:20,640 --> 00:03:27,040
They used a massive compute cluster with 2048 NVIDIA H800 GPUs

95
00:03:27,040 --> 00:03:29,240
to train DeepSeek V3.

96
00:03:29,240 --> 00:03:31,640
2048 GPUs?

97
00:03:31,640 --> 00:03:33,440
Wow, that's a lot of GPUs.

98
00:03:33,440 --> 00:03:34,680
Do they need all of that?

99
00:03:34,680 --> 00:03:36,520
Training a model of this scale, it

100
00:03:36,520 --> 00:03:39,000
does require that immense computational power.

101
00:03:39,000 --> 00:03:40,040
Yeah.

102
00:03:40,040 --> 00:03:41,080
But they were smart about it.

103
00:03:41,080 --> 00:03:43,080
They didn't just throw hardware at the problem.

104
00:03:43,080 --> 00:03:45,560
They developed a really sophisticated training

105
00:03:45,560 --> 00:03:47,880
framework to handle that kind of scale.

106
00:03:47,880 --> 00:03:49,160
So it's not just about the hardware,

107
00:03:49,160 --> 00:03:51,400
but it's also about being smart with how you use it.

108
00:03:51,400 --> 00:03:52,480
Absolutely.

109
00:03:52,480 --> 00:03:55,120
They used what's called a dual pipeline approach,

110
00:03:55,120 --> 00:03:57,680
where they break the training process into chunks,

111
00:03:57,680 --> 00:04:00,920
and they cleverly overlap the computation and communication

112
00:04:00,920 --> 00:04:02,320
to maximize efficiency.

113
00:04:02,320 --> 00:04:02,800
I see.

114
00:04:02,800 --> 00:04:04,400
It's kind of like an assembly line,

115
00:04:04,400 --> 00:04:05,800
where different parts of the model

116
00:04:05,800 --> 00:04:07,840
are being built and tested simultaneously.
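A toy timing model shows why overlapping computation and communication pays off. This is not the paper's pipeline schedule, only a two-stage assembly-line estimate. The chunk count and per-chunk costs are made-up numbers, and both functions assume at least one chunk.

```python
def sequential_time(chunks, compute, comm):
    # Each chunk computes, then communicates, before the next chunk starts.
    return chunks * (compute + comm)

def overlapped_time(chunks, compute, comm):
    # While one chunk communicates, the next one is already computing.
    # After the first compute, each step is bounded by the slower stage,
    # and the last chunk still has to finish its communication.
    return compute + (chunks - 1) * max(compute, comm) + comm

# Hypothetical costs in arbitrary time units.
print(sequential_time(8, compute=3, comm=2))   # 40
print(overlapped_time(8, compute=3, comm=2))   # 26
```

In this model the communication cost is almost entirely hidden behind computation whenever communication is the faster of the two stages.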

117
00:04:07,840 --> 00:04:10,120
Yeah, like an assembly line. I like that analogy.

118
00:04:10,120 --> 00:04:12,200
But all that data being processed,

119
00:04:12,200 --> 00:04:14,440
how do they manage the memory demands?

120
00:04:14,440 --> 00:04:18,480
They had to get creative with memory saving techniques,

121
00:04:18,480 --> 00:04:23,000
things like recomputation, storing things in the CPU,

122
00:04:23,000 --> 00:04:25,680
really optimizing how they used all those resources.

123
00:04:25,680 --> 00:04:28,920
OK, so we've got the efficient architecture with MLA

124
00:04:28,920 --> 00:04:30,440
and DeepSeekMoE.

125
00:04:30,440 --> 00:04:33,400
We've got the smart prediction with MTP,

126
00:04:33,400 --> 00:04:36,320
and then we've got the massive compute cluster

127
00:04:36,320 --> 00:04:38,800
with this really clever memory management.

128
00:04:38,800 --> 00:04:40,560
What else did they do that's worth highlighting?

129
00:04:40,560 --> 00:04:42,360
This is where it gets a little bit technical.

130
00:04:42,360 --> 00:04:44,160
But they also use something called

131
00:04:44,160 --> 00:04:47,000
FP8 mixed precision training.

132
00:04:47,000 --> 00:04:48,200
FP8.

133
00:04:48,200 --> 00:04:49,680
Now, that sounds a little intimidating.

134
00:04:49,680 --> 00:04:52,280
Can you explain that in a way that even I can understand?

135
00:04:52,280 --> 00:04:53,840
Of course.

136
00:04:53,840 --> 00:04:55,400
Imagine you're baking a cake.

137
00:04:55,400 --> 00:04:57,160
You could use a super precise scale

138
00:04:57,160 --> 00:05:00,800
to measure every ingredient, but it would take forever.

139
00:05:00,800 --> 00:05:04,000
FP8 is kind of like using a slightly less precise scale.

140
00:05:04,000 --> 00:05:07,320
You might be off by a tiny bit, but it's much faster,

141
00:05:07,320 --> 00:05:09,080
and the cake still turns out delicious.

142
00:05:09,080 --> 00:05:12,000
So it's a trade-off between being super accurate

143
00:05:12,000 --> 00:05:13,840
and being faster and more efficient.

144
00:05:13,840 --> 00:05:14,600
Exactly.

145
00:05:14,600 --> 00:05:18,000
And they were very clever about how they implemented it.

146
00:05:18,000 --> 00:05:22,280
They carefully chose which operations were done in FP8

147
00:05:22,280 --> 00:05:24,640
and which needed that higher precision,

148
00:05:24,640 --> 00:05:27,280
making sure they didn't sacrifice too much accuracy.

149
00:05:27,280 --> 00:05:27,640
I see.

150
00:05:27,640 --> 00:05:29,760
But they still got a significant speed boost.

151
00:05:29,760 --> 00:05:31,360
So they found that sweet spot.
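The "less precise scale" can be imitated in plain Python. This is a simplified sketch and not real FP8: it only limits the number of significand bits (4 here, matching the 3 stored mantissa bits plus the implicit leading bit of the E4M3 format) and ignores FP8's limited exponent range. The dot product multiplies low-precision inputs but keeps the running sum in an ordinary float, which is one way of choosing which operations get the cheaper format.

```python
import math

def round_to_bits(x, significant_bits=4):
    """Round x to the nearest value with only `significant_bits` significand bits."""
    if x == 0.0:
        return 0.0
    # x == mantissa * 2**exponent, with 0.5 <= |mantissa| < 1.
    mantissa, exponent = math.frexp(x)
    scale = 2 ** significant_bits
    return math.ldexp(round(mantissa * scale) / scale, exponent)

def mixed_precision_dot(a, b, significant_bits=4):
    """Multiply low-precision inputs, accumulate in full precision."""
    total = 0.0  # the accumulator stays an ordinary Python float
    for x, y in zip(a, b):
        total += round_to_bits(x, significant_bits) * round_to_bits(y, significant_bits)
    return total

a = [0.3, 1.7, -2.2]
b = [1.1, 0.9, 0.4]
exact = sum(x * y for x, y in zip(a, b))
print(round_to_bits(0.3))            # 0.3125
print(exact, mixed_precision_dot(a, b))
```

Each rounded input is off by at most about 6 percent in this setup, which is the "tiny bit" from the cake analogy.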

152
00:05:31,360 --> 00:05:32,240
Yeah, that's fascinating.

153
00:05:32,240 --> 00:05:35,200
So training a model is just one part of the story, right?

154
00:05:35,200 --> 00:05:36,560
You actually have to be able to use it.

155
00:05:36,560 --> 00:05:40,600
How easy is it to actually deploy DeepSeek V3?

156
00:05:40,600 --> 00:05:43,240
That's where DeepSeek V3 really shines.

157
00:05:43,240 --> 00:05:46,360
They developed a smart deployment strategy

158
00:05:46,360 --> 00:05:49,480
that involves two stages, prefilling and decoding.

159
00:05:49,480 --> 00:05:50,480
Prefilling and decoding.

160
00:05:50,480 --> 00:05:50,980
OK.

161
00:05:50,980 --> 00:05:53,120
So prefilling, I imagine that's like getting the model

162
00:05:53,120 --> 00:05:54,480
ready for action.

163
00:05:54,480 --> 00:05:56,400
And then decoding is where the magic happens,

164
00:05:56,400 --> 00:05:57,960
where it actually generates text.
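A common way to read those two stages: prefilling processes the whole prompt in one pass and caches the intermediate state, and decoding then produces one token per step while reusing that cache. The sketch below uses a made-up stand-in for the model (`toy_next_token`) and a plain list as the cache, so it shows only the control flow and not a real inference engine. It assumes `max_new_tokens` is at least 1.

```python
def toy_next_token(cache, vocab_size=50):
    # Stand-in for a real model: the next token depends on everything cached so far.
    return sum(cache) % vocab_size

def prefill(prompt_tokens):
    """Process the whole prompt at once and build the cache."""
    cache = list(prompt_tokens)  # a real system would cache attention state here
    first_token = toy_next_token(cache)
    return cache, first_token

def decode(cache, first_token, max_new_tokens):
    """Generate one token per step, extending the cache as it goes."""
    generated = [first_token]
    cache.append(first_token)
    while len(generated) < max_new_tokens:
        token = toy_next_token(cache)
        cache.append(token)
        generated.append(token)
    return generated

cache, first = prefill([3, 7, 11])
print(decode(cache, first, max_new_tokens=4))  # [21, 42, 34, 18]
```

The two stages stress hardware differently: prefill handles many tokens in one pass, while decoding is a long run of small steps. That is one reason to tune them separately.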

165
00:05:57,960 --> 00:05:58,440
You got it.

166
00:05:58,440 --> 00:05:59,000
OK.

167
00:05:59,000 --> 00:06:00,960
And they've optimized each of these stages

168
00:06:00,960 --> 00:06:03,400
for different needs, using techniques like tensor

169
00:06:03,400 --> 00:06:07,080
parallelism, sequence parallelism, data parallelism,

170
00:06:07,080 --> 00:06:08,880
and expert parallelism.

171
00:06:08,880 --> 00:06:11,960
It's a pretty complex system, but the key takeaway

172
00:06:11,960 --> 00:06:15,240
is they've put a lot of thought into making DeepSeek V3

173
00:06:15,240 --> 00:06:18,120
as efficient as possible, both in training

174
00:06:18,120 --> 00:06:19,480
and in how it's used.

175
00:06:19,480 --> 00:06:20,760
Yeah, that makes sense.

176
00:06:20,760 --> 00:06:22,560
So it sounds very impressive.

177
00:06:22,560 --> 00:06:26,360
But with all this talk about efficiency and speed,

178
00:06:26,360 --> 00:06:29,640
did they have any thoughts on how future hardware could

179
00:06:29,640 --> 00:06:32,320
be designed to make AI even better?

180
00:06:32,320 --> 00:06:34,280
You know, that's one thing that really impressed me

181
00:06:34,280 --> 00:06:35,120
about this paper.

182
00:06:35,120 --> 00:06:35,440
OK.

183
00:06:35,440 --> 00:06:37,360
They didn't just focus on the model itself.

184
00:06:37,360 --> 00:06:39,200
They looked at the bigger picture.

185
00:06:39,200 --> 00:06:41,440
Based on their experience with DeepSeek V3,

186
00:06:41,440 --> 00:06:44,440
they actually have some specific suggestions

187
00:06:44,440 --> 00:06:46,320
for future hardware design.

188
00:06:46,320 --> 00:06:49,040
Like what kind of suggestions? Did they get into the details?

189
00:06:49,040 --> 00:06:49,560
They did.

190
00:06:49,560 --> 00:06:53,440
They talk about things like offloading certain communication

191
00:06:53,440 --> 00:06:58,120
tasks to dedicated hardware to free up resources.

192
00:06:58,120 --> 00:07:00,080
They talked about increasing the precision

193
00:07:00,080 --> 00:07:01,520
of certain calculations.

194
00:07:01,520 --> 00:07:01,960
I see.

195
00:07:01,960 --> 00:07:04,800
And improving support for this FP8 mixed-precision approach

196
00:07:04,800 --> 00:07:05,640
we talked about.

197
00:07:05,640 --> 00:07:07,800
It's clear they're thinking long term, not just

198
00:07:07,800 --> 00:07:11,480
about this model, but about how to advance AI hardware

199
00:07:11,480 --> 00:07:12,040
in general.

200
00:07:12,040 --> 00:07:12,680
That's cool.

201
00:07:12,680 --> 00:07:13,960
So they're not just building a model.

202
00:07:13,960 --> 00:07:16,000
They're kind of pushing the whole field forward.

203
00:07:16,000 --> 00:07:17,120
Yeah.

204
00:07:17,120 --> 00:07:17,440
OK.

205
00:07:17,440 --> 00:07:19,480
So we've talked about the architecture, the training

206
00:07:19,480 --> 00:07:24,200
process, the deployment, even future hardware suggestions.

207
00:07:24,200 --> 00:07:26,720
What about the data they use to train this thing?

208
00:07:26,720 --> 00:07:28,880
Where did they get all that information from?

209
00:07:28,880 --> 00:07:31,680
This is where the scale of this project really hits you.

210
00:07:31,680 --> 00:07:38,240
They trained DeepSeek V3 on a dataset of 14.8 trillion tokens.

211
00:07:38,240 --> 00:07:40,240
14.8 trillion?

212
00:07:40,240 --> 00:07:41,360
That's mind boggling.

213
00:07:41,360 --> 00:07:43,880
Where on earth do you find that much data?

214
00:07:43,880 --> 00:07:46,400
They gather data from a huge variety of sources,

215
00:07:46,400 --> 00:07:49,240
ensuring that DeepSeek V3 has a broad understanding

216
00:07:49,240 --> 00:07:51,560
of different topics and writing styles.

217
00:07:51,560 --> 00:07:53,800
And to train on this massive data set,

218
00:07:53,800 --> 00:07:56,920
they used a strategy called Fill-in-the-Middle, or FIM.

219
00:07:56,920 --> 00:07:58,240
Fill-in-the-Middle, or FIM.

220
00:07:58,240 --> 00:07:58,720
Yeah.

221
00:07:58,720 --> 00:07:59,440
I'm not familiar with that.

222
00:07:59,440 --> 00:07:59,960
What's that?

223
00:07:59,960 --> 00:08:02,240
So imagine you have a sentence with a missing word

224
00:08:02,240 --> 00:08:02,760
in the middle.

225
00:08:02,760 --> 00:08:03,280
OK.

226
00:08:03,280 --> 00:08:05,280
The FIM strategy trains the model

227
00:08:05,280 --> 00:08:07,280
to predict that missing word based

228
00:08:07,280 --> 00:08:08,600
on the surrounding context.

229
00:08:08,600 --> 00:08:10,640
So it's like a giant game of fill in the blanks.

230
00:08:10,640 --> 00:08:11,080
It is.

231
00:08:11,080 --> 00:08:11,640
OK.

232
00:08:11,640 --> 00:08:13,640
Forcing the model to understand relationships

233
00:08:13,640 --> 00:08:15,440
between words and concepts.
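Here is a minimal sketch of how a fill-in-the-blank training example might be built: cut the text at two random points, then move the middle piece to the end so the model sees both sides before it has to produce the gap. The sentinel strings are placeholders invented for this example (a real tokenizer defines its own special tokens), and the text is assumed to be non-empty.

```python
import random

# Placeholder sentinel strings, assumed for illustration only.
FIM_BEGIN, FIM_HOLE, FIM_END = "<fim_begin>", "<fim_hole>", "<fim_end>"

def make_fim_example(text, rng):
    """Cut text into prefix/middle/suffix and move the middle to the end."""
    # Two distinct cut points, so the middle is never empty.
    i, j = sorted(rng.sample(range(len(text) + 1), 2))
    prefix, middle, suffix = text[:i], text[i:j], text[j:]
    # Prefix-suffix-middle order: both sides come first, the gap comes last.
    example = f"{FIM_BEGIN}{prefix}{FIM_HOLE}{suffix}{FIM_END}{middle}"
    return example, middle

rng = random.Random(0)
example, middle = make_fim_example("the cat sat on the mat", rng)
print(example)
print("model must produce:", repr(middle))
```

Because the middle ends up last, an ordinary left-to-right model can be trained on the rearranged string without any change to how it predicts tokens.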

234
00:08:15,440 --> 00:08:17,480
So it's not just understanding individual words,

235
00:08:17,480 --> 00:08:20,080
but how they fit together in a bigger context.

236
00:08:20,080 --> 00:08:20,840
OK, cool.

237
00:08:20,840 --> 00:08:23,480
You mentioned earlier something about DeepSeek V3

238
00:08:23,480 --> 00:08:26,800
being able to handle really long pieces of text.

239
00:08:26,800 --> 00:08:27,720
How is that possible?

240
00:08:27,720 --> 00:08:29,960
That's not something you see in every language model, right?

241
00:08:29,960 --> 00:08:30,560
You're right.

242
00:08:30,560 --> 00:08:32,680
They dedicated a whole section of the paper

243
00:08:32,680 --> 00:08:35,800
to explaining how they extended its context window

244
00:08:35,800 --> 00:08:39,000
to a whopping 128,000 tokens.

245
00:08:39,000 --> 00:08:40,640
128,000 tokens.

246
00:08:40,640 --> 00:08:41,760
That's like a small novel.

247
00:08:41,760 --> 00:08:42,840
I know.

248
00:08:42,840 --> 00:08:44,800
They used a technique called YaRN,

249
00:08:44,800 --> 00:08:48,080
which allows the model to process much longer sequences

250
00:08:48,080 --> 00:08:48,760
of text.

251
00:08:48,760 --> 00:08:51,040
So that would be good for things like analyzing

252
00:08:51,040 --> 00:08:54,600
long legal documents, maybe summarizing research papers,

253
00:08:54,600 --> 00:08:56,920
even having a conversation that spans multiple pages.

254
00:08:56,920 --> 00:08:57,560
Exactly.

255
00:08:57,560 --> 00:08:58,040
OK.

256
00:08:58,040 --> 00:08:58,320
Wow.

257
00:08:58,320 --> 00:09:00,000
So we've covered the architecture, the training

258
00:09:00,000 --> 00:09:02,680
process, the massive data set, even the long context

259
00:09:02,680 --> 00:09:03,760
capability.

260
00:09:03,760 --> 00:09:07,000
But how well does DeepSeek V3 actually perform?

261
00:09:07,000 --> 00:09:08,520
Did they put it to the test?

262
00:09:08,520 --> 00:09:09,000
They did.

263
00:09:09,000 --> 00:09:11,520
And the results are quite impressive,

264
00:09:11,520 --> 00:09:13,600
especially for an open source model.

265
00:09:13,600 --> 00:09:16,240
They tested it on a wide range of benchmarks,

266
00:09:16,240 --> 00:09:18,200
covering everything from code and math

267
00:09:18,200 --> 00:09:21,120
to general language understanding and reasoning tasks.

268
00:09:21,120 --> 00:09:23,000
So they really wanted to see how it stacked up

269
00:09:23,000 --> 00:09:26,240
against the competition, both open source and closed source.

270
00:09:26,240 --> 00:09:27,680
Exactly.

271
00:09:27,680 --> 00:09:30,680
And on many of these benchmarks, DeepSeek V3 actually

272
00:09:30,680 --> 00:09:32,960
outperformed other open source models,

273
00:09:32,960 --> 00:09:37,160
especially when it came to those tasks involving code and math.

274
00:09:37,160 --> 00:09:40,000
It even held its own against some of the leading closed

275
00:09:40,000 --> 00:09:40,680
source models.

276
00:09:40,680 --> 00:09:41,680
That's promising.

277
00:09:41,680 --> 00:09:43,560
But I'm guessing the story doesn't end there.

278
00:09:43,560 --> 00:09:46,040
I mean, they didn't just build this amazing model

279
00:09:46,040 --> 00:09:46,960
and call it a day.

280
00:09:46,960 --> 00:09:47,480
You're right.

281
00:09:47,480 --> 00:09:49,680
The journey doesn't end with pre-training.

282
00:09:49,680 --> 00:09:51,840
They took this powerful base model

283
00:09:51,840 --> 00:09:54,520
and refined it further using techniques

284
00:09:54,520 --> 00:09:59,200
like supervised fine tuning or SFT and reinforcement learning

285
00:09:59,200 --> 00:10:00,120
or RL.

286
00:10:00,120 --> 00:10:01,160
SFT and RL.

287
00:10:01,160 --> 00:10:02,320
Can you unpack those a little bit?

288
00:10:02,320 --> 00:10:04,680
I know they're common in AI, but a quick refresher

289
00:10:04,680 --> 00:10:05,120
would be great.

290
00:10:05,120 --> 00:10:05,620
Sure.

291
00:10:05,620 --> 00:10:09,520
So think of SFT as giving the model a crash course

292
00:10:09,520 --> 00:10:11,800
in how to be helpful and follow instructions.

293
00:10:11,800 --> 00:10:12,320
OK.

294
00:10:12,320 --> 00:10:15,280
They used a massive data set of instructions and examples

295
00:10:15,280 --> 00:10:17,800
to fine-tune DeepSeek V3, teaching it

296
00:10:17,800 --> 00:10:21,080
to generate more human-like text and respond appropriately

297
00:10:21,080 --> 00:10:22,240
to different prompts.

298
00:10:22,240 --> 00:10:23,960
So it's like polishing a raw diamond.

299
00:10:23,960 --> 00:10:25,680
You're taking something that's already powerful

300
00:10:25,680 --> 00:10:28,360
and you're making it even more refined and useful.

301
00:10:28,360 --> 00:10:29,360
What about RL?

302
00:10:29,360 --> 00:10:31,040
How does that fit into the picture?

303
00:10:31,040 --> 00:10:33,080
RL is where things get really interesting.

304
00:10:33,080 --> 00:10:35,200
It's like giving the model a personal trainer,

305
00:10:35,200 --> 00:10:37,520
helping it learn through trial and error.

306
00:10:37,520 --> 00:10:38,200
Interesting.

307
00:10:38,200 --> 00:10:39,280
So how does that work?

308
00:10:39,280 --> 00:10:41,480
They use what are called reward models

309
00:10:41,480 --> 00:10:45,280
to provide feedback to DeepSeek V3 during the RL process.

310
00:10:45,280 --> 00:10:47,600
These reward models act as judges.

311
00:10:47,600 --> 00:10:48,100
OK.

312
00:10:48,100 --> 00:10:50,640
Evaluating the responses generated by the model

313
00:10:50,640 --> 00:10:53,840
and giving it rewards for good responses and penalties

314
00:10:53,840 --> 00:10:54,560
for bad ones.

315
00:10:54,560 --> 00:10:56,880
So it's like the model is learning from its mistakes

316
00:10:56,880 --> 00:10:59,120
and trying to improve its performance over time,

317
00:10:59,120 --> 00:11:01,080
like a student getting grades on their homework.

318
00:11:01,080 --> 00:11:01,920
Exactly.

319
00:11:01,920 --> 00:11:04,800
And they used a really clever algorithm called GRPO

320
00:11:04,800 --> 00:11:07,840
to make this RL process even more efficient.
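The hosts don't go into how GRPO works, so the following is only a sketch of its central idea as commonly described. Several responses are sampled for the same prompt, each gets a reward, and each response is scored relative to the rest of its group, so no separate value model is needed to supply a baseline. The function name and the example rewards are made up, and the policy update that would consume these scores is left out.

```python
from statistics import mean, pstdev

def group_relative_advantages(rewards):
    """Score each sampled response relative to the others in its group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    if sigma == 0:
        # Every response got the same reward, so none stands out.
        return [0.0 for _ in rewards]
    return [(r - mu) / sigma for r in rewards]

# Hypothetical rewards from a reward model for four responses to one prompt.
print(group_relative_advantages([1.0, 0.0, 0.0, 1.0]))  # [1.0, -1.0, -1.0, 1.0]
```

Responses that beat the group average get a positive score and are reinforced, and the rest are discouraged, which matches the "grades on homework" picture.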

321
00:11:07,840 --> 00:11:11,080
The result is a fine-tuned model called DeepSeek V3

322
00:11:11,080 --> 00:11:13,320
Chat, which is specifically designed

323
00:11:13,320 --> 00:11:15,680
for interactive conversations and tasks.

324
00:11:15,680 --> 00:11:17,280
DeepSeek V3 Chat.

325
00:11:17,280 --> 00:11:18,080
OK.

326
00:11:18,080 --> 00:11:21,160
And how well does it perform compared to that base model?

327
00:11:21,160 --> 00:11:23,320
Did they see a significant improvement

328
00:11:23,320 --> 00:11:24,760
after all this fine-tuning?

329
00:11:24,760 --> 00:11:25,640
Oh, they did.

330
00:11:25,640 --> 00:11:29,040
DeepSeek V3 Chat performs remarkably well, even

331
00:11:29,040 --> 00:11:31,960
rivaling some of the leading closed-source models out there.

332
00:11:31,960 --> 00:11:32,680
Really?

333
00:11:32,680 --> 00:11:33,160
But you know what?

334
00:11:33,160 --> 00:11:35,440
I think we should save the details of those evaluations

335
00:11:35,440 --> 00:11:37,040
for the next part of our deep dive.

336
00:11:37,040 --> 00:11:37,540
OK.

337
00:11:37,540 --> 00:11:38,040
You're right.

338
00:11:38,040 --> 00:11:39,640
We've covered a lot of ground already.

339
00:11:39,640 --> 00:11:40,240
We have.

340
00:11:40,240 --> 00:11:42,320
And there's still so much more to explore.

341
00:11:42,320 --> 00:11:44,560
So stay tuned for part two of our deep dive

342
00:11:44,560 --> 00:11:47,320
where we'll explore how DeepSeek V3 Chat performs

343
00:11:47,320 --> 00:11:50,560
in different tasks, what its strengths and limitations are,

344
00:11:50,560 --> 00:11:53,960
and what its potential impact could be on the AI field.

345
00:11:53,960 --> 00:11:55,000
Sounds good.

346
00:11:55,000 --> 00:11:55,960
Welcome back, everyone.

347
00:11:55,960 --> 00:11:57,320
Before the break, we were talking

348
00:11:57,320 --> 00:11:59,440
about that extensive fine-tuning process

349
00:11:59,440 --> 00:12:01,320
that DeepSeek V3 went through.

350
00:12:01,320 --> 00:12:02,280
Right.

351
00:12:02,280 --> 00:12:05,480
We left off with you mentioning DeepSeek V3 Chat

352
00:12:05,480 --> 00:12:06,960
and how well it performs.

353
00:12:06,960 --> 00:12:09,320
So how did it do in those evaluations?

354
00:12:09,320 --> 00:12:10,200
Spill the beans.

355
00:12:10,200 --> 00:12:13,840
Well, they tested it on a whole new set of benchmarks.

356
00:12:13,840 --> 00:12:15,760
And here's where it gets really interesting.

357
00:12:15,760 --> 00:12:19,640
It really excels in areas like long-form text

358
00:12:19,640 --> 00:12:23,960
understanding, coding, math, and even Chinese language tasks.

359
00:12:23,960 --> 00:12:26,560
Yeah, for instance, on the DROP benchmark,

360
00:12:26,560 --> 00:12:29,560
which tests how well a model can understand and answer

361
00:12:29,560 --> 00:12:32,520
questions from long passages of text,

362
00:12:32,520 --> 00:12:35,920
DeepSeek V3 achieved a really impressive score.

363
00:12:35,920 --> 00:12:38,960
OK, so it can handle those long, complex pieces of text

364
00:12:38,960 --> 00:12:40,280
we talked about earlier.

365
00:12:40,280 --> 00:12:43,240
So that's great for things like summarizing research papers

366
00:12:43,240 --> 00:12:45,440
or analyzing legal documents.

367
00:12:45,440 --> 00:12:48,160
What about its performance in coding and math?

368
00:12:48,160 --> 00:12:50,680
Those are pretty important areas for AI these days.

369
00:12:50,680 --> 00:12:51,920
Absolutely.

370
00:12:51,920 --> 00:12:55,560
In coding, DeepSeek V3 really shines, especially in tasks

371
00:12:55,560 --> 00:12:58,040
that require that algorithmic thinking.

372
00:12:58,040 --> 00:13:01,600
They tested it on benchmarks like HumanEval-Mul and LiveCode

373
00:13:01,600 --> 00:13:04,640
Bench, and it consistently outperformed other models.

374
00:13:04,640 --> 00:13:05,520
I see.

375
00:13:05,520 --> 00:13:07,840
This is partly thanks to that knowledge distillation

376
00:13:07,840 --> 00:13:10,160
technique we discussed earlier, where they essentially

377
00:13:10,160 --> 00:13:12,960
transfer knowledge from a more specialized coding model

378
00:13:12,960 --> 00:13:14,280
into DeepSeek V3.

379
00:13:14,280 --> 00:13:16,760
So it's like they gave it a crash course in coding

380
00:13:16,760 --> 00:13:18,000
from a seasoned expert.

381
00:13:18,000 --> 00:13:18,880
That makes sense.

382
00:13:18,880 --> 00:13:19,560
What about math?

383
00:13:19,560 --> 00:13:22,160
Can it solve those tricky word problems that used to give us

384
00:13:22,160 --> 00:13:23,240
headaches back in school?

385
00:13:23,240 --> 00:13:24,000
You bet.

386
00:13:24,000 --> 00:13:26,640
They even tested it on problems from the Chinese National

387
00:13:26,640 --> 00:13:28,680
High School Mathematics Olympiad.

388
00:13:28,680 --> 00:13:32,640
And the results suggest that it can grasp those complex

389
00:13:32,640 --> 00:13:35,760
mathematical concepts and solve problems that require

390
00:13:35,760 --> 00:13:38,480
logical reasoning and multi-step calculations.

391
00:13:38,480 --> 00:13:39,240
That's impressive.

392
00:13:39,240 --> 00:13:40,800
They really put it through the wringer.

393
00:13:40,800 --> 00:13:43,760
But a lot of these AI models are developed in English-speaking

394
00:13:43,760 --> 00:13:45,200
countries.

395
00:13:45,200 --> 00:13:49,280
How well does DeepSeek V3 actually perform in other languages,

396
00:13:49,280 --> 00:13:50,400
like Chinese, for example?

397
00:13:50,400 --> 00:13:51,480
That's a great question.

398
00:13:51,480 --> 00:13:53,040
And it's something that they address in the paper.

399
00:13:53,040 --> 00:13:56,680
They actually compared DeepSeek V3 to a leading Chinese language

400
00:13:56,680 --> 00:13:58,200
model called Qwen.

401
00:13:58,200 --> 00:14:01,600
And DeepSeek V3 held its own, showing impressive accuracy

402
00:14:01,600 --> 00:14:03,240
and fluency in Chinese.

403
00:14:03,240 --> 00:14:05,200
So that's encouraging to hear.

404
00:14:05,200 --> 00:14:06,680
It's important for these AI models

405
00:14:06,680 --> 00:14:09,320
to be accessible and useful for people around the world,

406
00:14:09,320 --> 00:14:10,600
not just English speakers.

407
00:14:10,600 --> 00:14:11,280
Definitely.

408
00:14:11,280 --> 00:14:12,800
It speaks to the team's commitment

409
00:14:12,800 --> 00:14:15,400
to making AI more inclusive and versatile.

410
00:14:15,400 --> 00:14:18,680
Now, you mentioned earlier that DeepSeek V3 even rivals some

411
00:14:18,680 --> 00:14:21,400
of the top closed-source models out there.

412
00:14:21,400 --> 00:14:23,400
Did they do any head-to-head comparisons

413
00:14:23,400 --> 00:14:24,480
to see how it stacks up?

414
00:14:24,480 --> 00:14:25,080
They did.

415
00:14:25,080 --> 00:14:27,280
And this is where things get really exciting.

416
00:14:27,280 --> 00:14:29,920
They used a variety of benchmarks,

417
00:14:29,920 --> 00:14:32,120
including some that are specifically designed

418
00:14:32,120 --> 00:14:35,600
to test a model's ability to reason, be creative,

419
00:14:35,600 --> 00:14:38,000
and even generate different writing styles.

420
00:14:38,000 --> 00:14:39,800
So they weren't just looking at accuracy,

421
00:14:39,800 --> 00:14:41,680
but also at things like creativity

422
00:14:41,680 --> 00:14:43,760
and how human-like the text is.

423
00:14:43,760 --> 00:14:45,200
That's really interesting.

424
00:14:45,200 --> 00:14:49,960
And how did DeepSeek V3 fare in these more subjective evaluations?

425
00:14:49,960 --> 00:14:53,400
It performed remarkably well, even exceeding expectations

426
00:14:53,400 --> 00:14:55,440
on some of the tougher benchmarks.

427
00:14:55,440 --> 00:14:57,960
For instance, on Arena-Hard, a benchmark known

428
00:14:57,960 --> 00:14:59,880
for its challenging prompts, it became

429
00:14:59,880 --> 00:15:03,440
the first open-source model to achieve a win rate over 85%.

430
00:15:03,440 --> 00:15:04,440
Over 85%.

431
00:15:04,440 --> 00:15:05,360
That's a big deal.

432
00:15:05,360 --> 00:15:07,680
It really shows that open-source models are catching up

433
00:15:07,680 --> 00:15:09,320
to their closed-source counterparts.

434
00:15:09,320 --> 00:15:10,280
Absolutely.

435
00:15:10,280 --> 00:15:12,000
And here's another fascinating tidbit.

436
00:15:12,000 --> 00:15:14,440
They even tested DeepSeek V3 as what's

437
00:15:14,440 --> 00:15:16,600
called a generative reward model.

438
00:15:16,600 --> 00:15:17,360
A reward model.

439
00:15:17,360 --> 00:15:20,280
You mean they used DeepSeek V3 to judge the output

440
00:15:20,280 --> 00:15:21,400
of other AI models?

441
00:15:21,400 --> 00:15:22,240
Exactly.

442
00:15:22,240 --> 00:15:24,720
Remember how we talked about those reward models being used

443
00:15:24,720 --> 00:15:27,640
to train DeepSeek V3 during reinforcement learning?

444
00:15:27,640 --> 00:15:27,960
Yeah.

445
00:15:27,960 --> 00:15:30,600
Well, they found that DeepSeek V3 was surprisingly

446
00:15:30,600 --> 00:15:33,280
good at evaluating the quality of text generated

447
00:15:33,280 --> 00:15:36,440
by other models, even performing on par with models

448
00:15:36,440 --> 00:15:39,720
like GPT-4o and Claude 3.5 in some cases.

449
00:15:39,720 --> 00:15:41,760
So it's like the student becoming the teacher.

450
00:15:41,760 --> 00:15:42,640
That's really cool.

451
00:15:42,640 --> 00:15:45,520
It sounds like DeepSeek-V3 is a real all-rounder,

452
00:15:45,520 --> 00:15:49,040
capable of not just generating text but also evaluating it.

453
00:15:49,040 --> 00:15:51,040
But I'm curious, did the researchers uncover

454
00:15:51,040 --> 00:15:55,000
any limitations or areas where the model could be improved?

455
00:15:55,000 --> 00:15:55,960
That's a great question.

456
00:15:55,960 --> 00:15:58,160
And you're right, no model is perfect.

457
00:15:58,160 --> 00:16:00,000
One thing they highlighted was the size

458
00:16:00,000 --> 00:16:01,720
of the recommended deployment unit.

459
00:16:01,720 --> 00:16:02,960
It's still quite large.

460
00:16:02,960 --> 00:16:03,480
OK.

461
00:16:03,480 --> 00:16:05,440
Which could be a challenge for smaller teams

462
00:16:05,440 --> 00:16:08,520
who might not have access to that level of computing power.

463
00:16:08,520 --> 00:16:09,360
That makes sense.

464
00:16:09,360 --> 00:16:10,200
It's a trade-off, right?

465
00:16:10,200 --> 00:16:11,840
You get this amazing performance,

466
00:16:11,840 --> 00:16:13,560
but it comes at the cost of needing

467
00:16:13,560 --> 00:16:15,480
significant computational resources.

468
00:16:15,480 --> 00:16:16,080
Exactly.

469
00:16:16,080 --> 00:16:17,640
And while their deployment strategy

470
00:16:17,640 --> 00:16:20,080
is much improved compared to previous versions,

471
00:16:20,080 --> 00:16:22,400
they acknowledge that there's still room for improvement

472
00:16:22,400 --> 00:16:24,280
in terms of speed and efficiency.

473
00:16:24,280 --> 00:16:27,600
They're actively exploring ways to make DeepSeek-V3 more

474
00:16:27,600 --> 00:16:30,240
accessible and easier to deploy, especially

475
00:16:30,240 --> 00:16:33,040
for real-time applications like chatbots.

476
00:16:33,040 --> 00:16:34,760
So it's an ongoing journey.

477
00:16:34,760 --> 00:16:36,880
There are still hurdles to overcome.

478
00:16:36,880 --> 00:16:38,640
But even with these limitations,

479
00:16:38,640 --> 00:16:42,480
it's clear that DeepSeek-V3 is a significant achievement.

480
00:16:42,480 --> 00:16:45,520
What do you think are the biggest takeaways from this research?

481
00:16:45,520 --> 00:16:47,000
I think one of the key takeaways

482
00:16:47,000 --> 00:16:49,480
is the sheer potential of open source AI.

483
00:16:49,480 --> 00:16:51,960
DeepSeek-V3 demonstrates that open source models

484
00:16:51,960 --> 00:16:55,120
can be just as powerful and versatile as their closed-source

485
00:16:55,120 --> 00:16:58,360
counterparts, if not more so in certain areas.

486
00:16:58,360 --> 00:17:00,960
And the fact that the researchers have been so transparent

487
00:17:00,960 --> 00:17:03,360
about their methods and findings is hugely

488
00:17:03,360 --> 00:17:05,760
beneficial for the entire AI community.

489
00:17:05,760 --> 00:17:06,000
Right.

490
00:17:06,000 --> 00:17:08,040
It makes these powerful tools available to a wider

491
00:17:08,040 --> 00:17:11,560
range of researchers, developers, and even hobbyists

492
00:17:11,560 --> 00:17:13,640
who might not have the resources to access

493
00:17:13,640 --> 00:17:15,480
those big proprietary models.

494
00:17:15,480 --> 00:17:18,000
It really democratizes AI like we were talking about earlier.

495
00:17:18,000 --> 00:17:18,920
Exactly.

496
00:17:18,920 --> 00:17:21,640
And this leads to the second major implication.

497
00:17:21,640 --> 00:17:23,800
With more people able to experiment with and build

498
00:17:23,800 --> 00:17:25,320
upon these open source models, we're

499
00:17:25,320 --> 00:17:27,800
likely to see an explosion of creativity and innovation

500
00:17:27,800 --> 00:17:29,040
in the AI space.

501
00:17:29,040 --> 00:17:31,600
It's exciting to think about all the potential applications

502
00:17:31,600 --> 00:17:32,840
that could emerge.

503
00:17:32,840 --> 00:17:36,720
And I imagine this will also lead to more robust and reliable AI

504
00:17:36,720 --> 00:17:38,760
systems as researchers around the world

505
00:17:38,760 --> 00:17:40,920
can collaborate, share their findings,

506
00:17:40,920 --> 00:17:42,920
and work together to improve these models.

507
00:17:42,920 --> 00:17:43,560
Absolutely.

508
00:17:43,560 --> 00:17:46,280
Openness and transparency are crucial for building trust

509
00:17:46,280 --> 00:17:47,240
in AI.

510
00:17:47,240 --> 00:17:49,120
And DeepSeek-V3 is a great example

511
00:17:49,120 --> 00:17:51,960
of how we can move towards a more open and collaborative

512
00:17:51,960 --> 00:17:53,800
future for this technology.

513
00:17:53,800 --> 00:17:55,480
But there's another implication that I think

514
00:17:55,480 --> 00:17:57,040
is particularly important.

515
00:17:57,040 --> 00:17:58,800
And it relates to the paper's title,

516
00:17:58,800 --> 00:18:02,440
Scaling Open-Source Language Models with Longtermism.

517
00:18:02,440 --> 00:18:04,000
OK, longtermism.

518
00:18:04,000 --> 00:18:05,400
That word definitely got my attention

519
00:18:05,400 --> 00:18:07,200
when we first introduced the paper.

520
00:18:07,200 --> 00:18:09,480
Can you elaborate on what the researchers mean by that

521
00:18:09,480 --> 00:18:10,920
and why it's important?

522
00:18:10,920 --> 00:18:12,880
They're emphasizing the need to think

523
00:18:12,880 --> 00:18:15,720
beyond just the immediate capabilities of these models.

524
00:18:15,720 --> 00:18:16,240
OK.

525
00:18:16,240 --> 00:18:17,640
They argue that we need to consider

526
00:18:17,640 --> 00:18:21,840
the long term ethical and societal implications of developing

527
00:18:21,840 --> 00:18:25,160
and deploying increasingly powerful AI systems.

528
00:18:25,160 --> 00:18:26,360
That's a crucial point.

529
00:18:26,360 --> 00:18:28,680
It's not just about building bigger and better models.

530
00:18:28,680 --> 00:18:31,520
It's about ensuring that these advancements benefit

531
00:18:31,520 --> 00:18:33,800
humanity as a whole and that we're

532
00:18:33,800 --> 00:18:37,200
prepared for the challenges that might arise as AI becomes

533
00:18:37,200 --> 00:18:38,760
more integrated into our lives.

534
00:18:38,760 --> 00:18:39,600
Exactly.

535
00:18:39,600 --> 00:18:42,280
And I think the researchers behind DeepSeek-V3

536
00:18:42,280 --> 00:18:45,160
are setting a great example by not just focusing

537
00:18:45,160 --> 00:18:46,760
on technical achievements, but also

538
00:18:46,760 --> 00:18:49,600
by being mindful of the broader impact of their work

539
00:18:49,600 --> 00:18:52,120
and advocating for a more responsible and inclusive

540
00:18:52,120 --> 00:18:54,000
approach to AI development.

541
00:18:54,000 --> 00:18:55,160
Well said.

542
00:18:55,160 --> 00:18:58,560
It's a reminder that AI is not just a technological pursuit.

543
00:18:58,560 --> 00:19:00,120
It's also a human one.

544
00:19:00,120 --> 00:19:02,160
And we need to ensure that these advancements align

545
00:19:02,160 --> 00:19:05,800
with our values and contribute to a better future for everyone.

546
00:19:05,800 --> 00:19:07,160
So where do we go from here?

547
00:19:07,160 --> 00:19:10,480
What's next for DeepSeek-V3 and for the open source AI

548
00:19:10,480 --> 00:19:11,760
landscape in general?

549
00:19:11,760 --> 00:19:13,160
That's a great question to ponder

550
00:19:13,160 --> 00:19:14,680
as we move to our final part.

551
00:19:14,680 --> 00:19:16,920
Join us after a short break as we delve into the future

552
00:19:16,920 --> 00:19:18,240
of this exciting field.

553
00:19:18,240 --> 00:19:18,960
We're back.

554
00:19:18,960 --> 00:19:22,160
And I'm still thinking about all this stuff about DeepSeek-V3.

555
00:19:22,160 --> 00:19:25,440
It's amazing to see how far open source AI has come.

556
00:19:25,440 --> 00:19:27,720
Yeah, it's really remarkable.

557
00:19:27,720 --> 00:19:29,480
And this is just the beginning.

558
00:19:29,480 --> 00:19:32,280
The DeepSeek team has already hinted at future versions

559
00:19:32,280 --> 00:19:34,880
that promise even more capabilities and efficiency.

560
00:19:34,880 --> 00:19:35,520
Really?

561
00:19:35,520 --> 00:19:35,600
Yeah.

562
00:19:35,600 --> 00:19:37,440
What will these future models be able to do?

563
00:19:37,440 --> 00:19:38,880
What kind of problems will they solve?

564
00:19:38,880 --> 00:19:41,800
Well, I think we can look at the trajectory of AI development

565
00:19:41,800 --> 00:19:43,200
to get some clues.

566
00:19:43,200 --> 00:19:45,480
We're seeing these models become increasingly

567
00:19:45,480 --> 00:19:48,200
adept at understanding and generating human language,

568
00:19:48,200 --> 00:19:50,640
which opens up a whole world of possibilities.

569
00:19:50,640 --> 00:19:51,320
Like what?

570
00:19:51,320 --> 00:19:52,360
Give me some examples.

571
00:19:52,360 --> 00:19:55,640
OK, so imagine personalized tutors powered

572
00:19:55,640 --> 00:19:59,120
by DeepSeek, tailoring lessons to each student's individual needs

573
00:19:59,120 --> 00:20:00,360
and learning styles.

574
00:20:00,360 --> 00:20:01,120
OK, yeah.

575
00:20:01,120 --> 00:20:03,760
Or imagine AI assistants that can help us write more

576
00:20:03,760 --> 00:20:06,760
effectively, translate languages seamlessly,

577
00:20:06,760 --> 00:20:10,040
or even generate creative content like stories, poems,

578
00:20:10,040 --> 00:20:10,800
or scripts.

579
00:20:10,800 --> 00:20:13,600
It's like having a super-powered brainstorming partner

580
00:20:13,600 --> 00:20:16,080
who can just help you unlock your creative potential.

581
00:20:16,080 --> 00:20:18,120
And I bet there are huge implications

582
00:20:18,120 --> 00:20:21,160
for fields like scientific research and health care, too.

583
00:20:21,160 --> 00:20:22,320
Oh, absolutely.

584
00:20:22,320 --> 00:20:25,040
Imagine DeepSeek being used to analyze vast amounts

585
00:20:25,040 --> 00:20:28,200
of scientific data, helping researchers make new discoveries

586
00:20:28,200 --> 00:20:29,320
and breakthroughs.

587
00:20:29,320 --> 00:20:32,520
Or imagine it being used to develop more personalized medical

588
00:20:32,520 --> 00:20:35,720
treatments tailored to each patient's unique genetic makeup

589
00:20:35,720 --> 00:20:37,040
and health history.

590
00:20:37,040 --> 00:20:39,320
Yeah, the possibilities are pretty mind-boggling.

591
00:20:39,320 --> 00:20:42,280
It's really exciting to think about how this technology could

592
00:20:42,280 --> 00:20:45,800
be used to improve so many aspects of our lives.

593
00:20:45,800 --> 00:20:48,920
But with all this talk about potential and possibilities,

594
00:20:48,920 --> 00:20:51,320
it's important to remember that we need to develop and deploy

595
00:20:51,320 --> 00:20:52,880
these technologies responsibly.

596
00:20:52,880 --> 00:20:53,560
Yeah, for sure.

597
00:20:53,560 --> 00:20:55,760
We need to make sure AI is used for good

598
00:20:55,760 --> 00:20:58,040
and that it benefits humanity as a whole.

599
00:20:58,040 --> 00:21:00,360
And the DeepSeek team has emphasized that longtermism

600
00:21:00,360 --> 00:21:04,280
approach, thinking about the long-term ethical and societal

601
00:21:04,280 --> 00:21:05,760
implications of AI.

602
00:21:05,760 --> 00:21:08,040
But what can we as individuals do

603
00:21:08,040 --> 00:21:10,320
to ensure a positive future for AI?

604
00:21:10,320 --> 00:21:12,200
I think one of the most important things we can do

605
00:21:12,200 --> 00:21:15,440
is stay informed and engaged in the conversation about AI.

606
00:21:15,440 --> 00:21:17,560
We need to educate ourselves about the potential benefits

607
00:21:17,560 --> 00:21:20,800
and risks of this technology and participate in discussions

608
00:21:20,800 --> 00:21:22,800
about how it should be developed and used.

609
00:21:22,800 --> 00:21:25,000
So it's not just about the researchers and developers.

610
00:21:25,000 --> 00:21:27,480
It's about all of us taking an active role

611
00:21:27,480 --> 00:21:30,000
in shaping the future of AI.

612
00:21:30,000 --> 00:21:31,600
What are some practical steps people

613
00:21:31,600 --> 00:21:33,560
can take to get more involved?

614
00:21:33,560 --> 00:21:35,520
Well, there are many ways to get involved.

615
00:21:35,520 --> 00:21:38,200
You can read articles and books about AI,

616
00:21:38,200 --> 00:21:41,400
attend conferences and workshops, or even join online communities

617
00:21:41,400 --> 00:21:43,680
where people are discussing these topics.

618
00:21:43,680 --> 00:21:46,760
And don't be afraid to ask questions, challenge assumptions,

619
00:21:46,760 --> 00:21:48,240
and voice your concerns.

620
00:21:48,240 --> 00:21:50,680
It's through open and honest dialogue

621
00:21:50,680 --> 00:21:53,320
that we can ensure AI is developed and used

622
00:21:53,320 --> 00:21:55,320
in a way that aligns with our values.

623
00:21:55,320 --> 00:21:56,720
Well said.

624
00:21:56,720 --> 00:22:00,120
It's a reminder that the future of AI is not predetermined.

625
00:22:00,120 --> 00:22:01,800
It's something that we're actively shaping

626
00:22:01,800 --> 00:22:03,800
through our choices and our actions.

627
00:22:03,800 --> 00:22:07,040
And that's both a responsibility and an incredible opportunity.

628
00:22:07,040 --> 00:22:07,840
Definitely.

629
00:22:07,840 --> 00:22:10,720
As we learn more about these powerful AI models,

630
00:22:10,720 --> 00:22:12,880
like DeepSeek-V3, it's up to all of us

631
00:22:12,880 --> 00:22:15,640
to ensure that we're using them wisely and ethically.

632
00:22:15,640 --> 00:22:17,480
OK, before we wrap up, I want to go back

633
00:22:17,480 --> 00:22:19,680
to the DeepSeek-V3 paper itself.

634
00:22:19,680 --> 00:22:21,440
What's your final takeaway for our listeners?

635
00:22:21,440 --> 00:22:23,520
What do you hope they'll remember from this deep dive?

636
00:22:23,520 --> 00:22:26,040
I hope they'll remember that the world of open source AI

637
00:22:26,040 --> 00:22:29,880
is vibrant and innovative and full of potential.

638
00:22:29,880 --> 00:22:32,360
DeepSeek-V3 is a testament to what's

639
00:22:32,360 --> 00:22:35,400
possible when researchers collaborate, share their knowledge,

640
00:22:35,400 --> 00:22:37,840
and push the boundaries of this technology.

641
00:22:37,840 --> 00:22:39,600
And I hope that they're inspired to learn more,

642
00:22:39,600 --> 00:22:42,080
to explore these ideas further and contribute

643
00:22:42,080 --> 00:22:45,160
to the conversation about how we can shape the future of AI

644
00:22:45,160 --> 00:22:47,600
in a positive and beneficial way.

645
00:22:47,600 --> 00:22:49,440
So to our listeners, we encourage you to check out

646
00:22:49,440 --> 00:22:51,400
the full DeepSeek-V3 paper.

647
00:22:51,400 --> 00:22:54,440
It's a fascinating read, even if you're not a technical expert.

648
00:22:54,440 --> 00:22:57,200
And as you explore the paper, think about the questions we've

649
00:22:57,200 --> 00:22:57,560
raised today.

650
00:22:57,560 --> 00:22:59,800
What excites you about the potential of AI?

651
00:22:59,800 --> 00:23:01,480
What concerns do you have?

652
00:23:01,480 --> 00:23:04,080
And most importantly, how can you be a part of shaping a future

653
00:23:04,080 --> 00:23:06,320
where AI is used for good?

654
00:23:06,320 --> 00:23:09,440
Thanks for joining us on this deep dive into DeepSeek-V3.

655
00:23:09,440 --> 00:23:10,920
It's been an insightful journey.

656
00:23:10,920 --> 00:23:13,320
And we look forward to exploring more fascinating corners

657
00:23:13,320 --> 00:23:15,880
of the AI world with you in our next episode.

658
00:23:15,880 --> 00:23:17,880
Until then, keep learning, keep questioning,

659
00:23:17,880 --> 00:23:32,640
and keep imagining the possibilities.

