1
00:00:00,000 --> 00:00:04,920
Okay, so let's dive into this world of open source AI.

2
00:00:04,920 --> 00:00:05,760
Sounds good.

3
00:00:05,760 --> 00:00:08,760
We're taking a deep dive into the Tülu 3:

4
00:00:08,760 --> 00:00:10,920
Pushing Frontiers in Open Language Model

5
00:00:10,920 --> 00:00:12,640
post-training research paper.

6
00:00:12,640 --> 00:00:13,600
Okay.

7
00:00:13,600 --> 00:00:14,960
I think you've probably heard whispers

8
00:00:14,960 --> 00:00:17,120
about how this model has kind of shaken things up.

9
00:00:17,120 --> 00:00:17,960
Right.

10
00:00:17,960 --> 00:00:19,520
And you're in the right place to find out why.

11
00:00:19,520 --> 00:00:21,960
Yeah, this is definitely a really interesting paper.

12
00:00:21,960 --> 00:00:24,000
And what's really capturing everyone's attention

13
00:00:24,000 --> 00:00:28,080
is how this model, which was built by just like a small team,

14
00:00:28,080 --> 00:00:31,000
is challenging some of the biggest names in AI.

15
00:00:31,000 --> 00:00:31,840
Oh, wow.

16
00:00:31,840 --> 00:00:34,480
Even those like super secret closed source ones.

17
00:00:34,480 --> 00:00:37,560
It's like a David and Goliath situation in the AI world.

18
00:00:37,560 --> 00:00:38,400
Exactly.

19
00:00:38,400 --> 00:00:41,600
So for those of us who aren't, you know, AI experts,

20
00:00:41,600 --> 00:00:43,200
what exactly is Tülu 3?

21
00:00:43,200 --> 00:00:45,320
So imagine a family of language models

22
00:00:45,320 --> 00:00:48,400
that anyone can access and tinker with.

23
00:00:48,400 --> 00:00:50,360
That's what Tülu 3 is.

24
00:00:50,360 --> 00:00:52,280
Open source AI.

25
00:00:52,280 --> 00:00:55,760
They started with a solid foundation called Llama 3.1.

26
00:00:55,760 --> 00:00:56,600
Okay.

27
00:00:56,600 --> 00:00:58,320
They added their own secret sauce

28
00:00:58,320 --> 00:01:01,080
to create something truly extraordinary.

29
00:01:01,080 --> 00:01:02,920
Okay, so they're not building from scratch.

30
00:01:02,920 --> 00:01:04,720
They're taking something that already exists

31
00:01:04,720 --> 00:01:05,560
and improving it.

32
00:01:05,560 --> 00:01:06,400
Exactly.

33
00:01:06,400 --> 00:01:07,240
Okay.

34
00:01:07,240 --> 00:01:09,360
Think of it like taking a basic cake recipe.

35
00:01:09,360 --> 00:01:10,200
Okay.

36
00:01:10,200 --> 00:01:12,320
And adding unique ingredients and techniques

37
00:01:12,320 --> 00:01:14,960
to transform it into a gourmet masterpiece.

38
00:01:14,960 --> 00:01:15,920
I like that analogy.

39
00:01:15,920 --> 00:01:16,760
Yeah.

40
00:01:16,760 --> 00:01:20,760
So in this case, the team used a four stage process

41
00:01:20,760 --> 00:01:22,240
to train Tülu 3.

42
00:01:22,240 --> 00:01:23,080
Right.

43
00:01:23,080 --> 00:01:25,240
It starts with what's called supervised fine tuning.

44
00:01:25,240 --> 00:01:26,080
Right.

45
00:01:26,080 --> 00:01:28,800
And they teach the model to follow instructions

46
00:01:28,800 --> 00:01:30,920
using really carefully selected data.

47
00:01:30,920 --> 00:01:34,320
Yeah, it's like showing the model good examples

48
00:01:34,320 --> 00:01:35,520
of how to respond.

49
00:01:35,520 --> 00:01:36,360
Okay.

50
00:01:36,360 --> 00:01:38,440
Like a teacher giving a student model answers.

51
00:01:38,440 --> 00:01:39,720
Yeah, like a study guide almost.

52
00:01:39,720 --> 00:01:40,560
Exactly.

53
00:01:40,560 --> 00:01:42,680
But here's where it gets really interesting.

54
00:01:42,680 --> 00:01:44,680
They went above and beyond to make sure

55
00:01:44,680 --> 00:01:46,800
that the data they used was top notch.

56
00:01:46,800 --> 00:01:47,640
Okay.

57
00:01:47,640 --> 00:01:48,480
They even developed tools

58
00:01:48,480 --> 00:01:51,280
to decontaminate existing data sets,

59
00:01:51,280 --> 00:01:54,400
removing any overlap with benchmark tests

60
00:01:54,400 --> 00:01:56,320
to avoid inflated scores.

61
00:01:56,320 --> 00:01:57,640
So it's like making sure the students

62
00:01:57,640 --> 00:02:00,280
aren't sneaking peeks at the answer key before the test.

63
00:02:00,280 --> 00:02:01,120
Exactly.

64
00:02:01,120 --> 00:02:01,960
Yeah.

65
00:02:01,960 --> 00:02:03,720
That level of rigor is impressive.

66
00:02:03,720 --> 00:02:04,560
It is.

67
00:02:04,560 --> 00:02:05,400
It is.

68
00:02:05,400 --> 00:02:06,240
Yeah.

69
00:02:06,240 --> 00:02:07,060
They're committed to building a model

70
00:02:07,060 --> 00:02:10,200
that can truly generalize and tackle new challenges.

71
00:02:10,200 --> 00:02:11,040
Okay.

72
00:02:11,040 --> 00:02:12,400
After the supervised fine tuning,

73
00:02:12,400 --> 00:02:15,200
they move on to what's called preference fine tuning.

74
00:02:15,200 --> 00:02:16,040
Right.

75
00:02:16,040 --> 00:02:18,120
Which is all about refining the model's responses

76
00:02:18,120 --> 00:02:20,520
based on what humans prefer.

77
00:02:20,520 --> 00:02:23,240
So they're taking into account what we,

78
00:02:23,240 --> 00:02:27,440
like what we humans find helpful, interesting, or enjoyable.

79
00:02:27,440 --> 00:02:28,520
You got it.

80
00:02:28,520 --> 00:02:29,800
And for this stage,

81
00:02:29,800 --> 00:02:32,680
they used a fascinating mix of human

82
00:02:32,680 --> 00:02:34,640
and AI-generated feedback.

83
00:02:34,640 --> 00:02:35,720
Oh, wow.

84
00:02:35,720 --> 00:02:38,280
It's like having a panel of judges

85
00:02:38,280 --> 00:02:39,620
with different perspectives,

86
00:02:39,620 --> 00:02:41,320
helping the model to improve.

87
00:02:41,320 --> 00:02:42,360
That's really clever.

88
00:02:42,360 --> 00:02:43,200
Yeah.

89
00:02:43,200 --> 00:02:45,120
It's like they're combining the best of both worlds.

90
00:02:45,120 --> 00:02:48,000
Human intuition and AI's ability to process

91
00:02:48,000 --> 00:02:49,760
massive amounts of information.

92
00:02:49,760 --> 00:02:50,600
Right.

93
00:02:50,600 --> 00:02:51,760
What's the third stage of training?

94
00:02:51,760 --> 00:02:54,200
The third stage is where they really break new ground.

95
00:02:54,200 --> 00:02:55,040
Okay.

96
00:02:55,040 --> 00:02:56,280
It's called reinforcement learning

97
00:02:56,280 --> 00:02:59,560
with verifiable rewards, or RLVR for short.

98
00:02:59,560 --> 00:03:00,400
RLVR.

99
00:03:00,400 --> 00:03:01,240
Yeah.

100
00:03:01,240 --> 00:03:02,080
Okay.

101
00:03:02,080 --> 00:03:02,920
And this technique is designed for tasks

102
00:03:02,920 --> 00:03:04,640
with clear right or wrong answers,

103
00:03:04,640 --> 00:03:07,040
like math problems or coding challenges.

104
00:03:07,040 --> 00:03:08,660
So it's like giving the model points

105
00:03:08,660 --> 00:03:10,280
for every correct solution.

106
00:03:10,280 --> 00:03:12,520
Conditioning it to become a master problem solver.

107
00:03:12,520 --> 00:03:13,360
Right.

108
00:03:13,360 --> 00:03:14,200
Okay.

109
00:03:14,200 --> 00:03:16,160
And the final stage is all about transparency.

110
00:03:16,160 --> 00:03:17,000
Okay.

111
00:03:17,000 --> 00:03:18,920
They've made all of their data, code,

112
00:03:18,920 --> 00:03:20,960
and even their training recipes public.

113
00:03:20,960 --> 00:03:21,720
Wow.

114
00:03:21,720 --> 00:03:23,760
This is a huge deal in the AI world

115
00:03:23,760 --> 00:03:25,560
where things are often kept secret.

116
00:03:25,560 --> 00:03:26,400
Yeah.

117
00:03:26,400 --> 00:03:27,240
Very secretive.

118
00:03:27,240 --> 00:03:28,080
Yeah.

119
00:03:28,080 --> 00:03:30,120
So they're basically inviting everyone to explore,

120
00:03:30,120 --> 00:03:32,800
learn from, and build upon their work.

121
00:03:32,800 --> 00:03:33,640
Exactly.

122
00:03:33,640 --> 00:03:35,120
That's the essence of open source, right?

123
00:03:35,120 --> 00:03:35,960
That's it.

124
00:03:35,960 --> 00:03:36,800
Okay.

125
00:03:36,800 --> 00:03:37,640
Let's back up for a second.

126
00:03:37,640 --> 00:03:38,480
Sure.

127
00:03:38,480 --> 00:03:40,280
You mentioned how they decontaminated data.

128
00:03:40,280 --> 00:03:41,120
Right.

129
00:03:41,120 --> 00:03:42,240
What exactly does that mean,

130
00:03:42,240 --> 00:03:44,080
and why is it so important?

131
00:03:44,080 --> 00:03:46,320
So it turns out that a lot of data sets

132
00:03:46,320 --> 00:03:49,840
used to train AI models contain information

133
00:03:49,840 --> 00:03:54,120
that overlaps with the benchmark tests used to evaluate them.

134
00:03:54,120 --> 00:03:54,960
Oh, okay.

135
00:03:54,960 --> 00:03:57,680
It's like giving a student the test questions in advance.

136
00:03:57,680 --> 00:03:58,600
Oh, I see, yeah.

137
00:03:58,600 --> 00:04:01,960
This contamination can lead to artificially inflated scores.

138
00:04:01,960 --> 00:04:02,800
Yeah.

139
00:04:02,800 --> 00:04:05,080
Making a model seem smarter than it actually is.

140
00:04:05,080 --> 00:04:05,920
Right.

141
00:04:05,920 --> 00:04:08,000
Ah, so they're ensuring that the model

142
00:04:08,000 --> 00:04:10,200
isn't just memorizing answers,

143
00:04:10,200 --> 00:04:12,520
but actually learning how to solve problems.

144
00:04:12,520 --> 00:04:14,200
Like that's a crucial distinction.

145
00:04:14,200 --> 00:04:15,200
Yeah, it is, yeah.

146
00:04:15,200 --> 00:04:19,320
Did they talk about how they chose the data for training?

147
00:04:19,320 --> 00:04:20,160
They did.

148
00:04:20,160 --> 00:04:21,000
Okay.

149
00:04:21,000 --> 00:04:22,280
It seems like that's a key factor

150
00:04:22,280 --> 00:04:24,120
in building a well-rounded AI.

151
00:04:24,120 --> 00:04:24,960
It is.

152
00:04:24,960 --> 00:04:27,680
They emphasized using data from a wide range of sources.

153
00:04:27,680 --> 00:04:28,520
Okay.

154
00:04:28,520 --> 00:04:31,080
They tapped into real user interactions with models.

155
00:04:31,080 --> 00:04:32,200
Oh, wow.

156
00:04:32,200 --> 00:04:35,120
Imagine learning from how people actually talk to AI.

157
00:04:35,120 --> 00:04:36,000
That's cool.

158
00:04:36,000 --> 00:04:38,800
They also included data sets for specific skills

159
00:04:38,800 --> 00:04:42,080
like math, coding, and even multiple languages.

160
00:04:42,080 --> 00:04:42,920
Nice.

161
00:04:42,920 --> 00:04:45,000
It's like giving the model a diverse education

162
00:04:45,000 --> 00:04:46,880
to make it as versatile as possible.

163
00:04:46,880 --> 00:04:47,960
That makes a lot of sense.

164
00:04:47,960 --> 00:04:48,800
Yeah.

165
00:04:48,800 --> 00:04:50,480
So, is there anything about how they generated

166
00:04:50,480 --> 00:04:51,840
some of their own data?

167
00:04:51,840 --> 00:04:52,680
They did, yeah.

168
00:04:52,680 --> 00:04:53,560
Okay.

169
00:04:53,560 --> 00:04:55,080
That sounds intriguing.

170
00:04:55,080 --> 00:04:55,920
It is, yeah.

171
00:04:55,920 --> 00:04:58,720
They actually used AI to generate some of the training data.

172
00:04:58,720 --> 00:04:59,680
Oh, wow.

173
00:04:59,680 --> 00:05:02,680
But they were careful to avoid creating repetitive patterns

174
00:05:02,680 --> 00:05:05,920
that can make AI-generated data less effective.

175
00:05:05,920 --> 00:05:06,760
Right.

176
00:05:06,760 --> 00:05:08,640
So they used a clever approach

177
00:05:08,640 --> 00:05:11,040
where they created different personas

178
00:05:11,040 --> 00:05:12,600
for the AI to role play.

179
00:05:12,600 --> 00:05:13,440
Oh, interesting.

180
00:05:13,440 --> 00:05:16,600
Like a machine learning researcher or a beginner coder.

181
00:05:16,600 --> 00:05:19,160
This helped to diversify the generated data

182
00:05:19,160 --> 00:05:20,720
and make it more realistic.

183
00:05:20,720 --> 00:05:22,640
So they've carefully curated their data.

184
00:05:22,640 --> 00:05:23,480
Right.

185
00:05:23,480 --> 00:05:25,040
Now let's talk about the specifics

186
00:05:25,040 --> 00:05:26,400
of how they train the model.

187
00:05:26,400 --> 00:05:27,240
Yeah.

188
00:05:27,240 --> 00:05:29,320
It seems like this multi-stage process

189
00:05:29,320 --> 00:05:32,040
is where their innovation really shines through.

190
00:05:32,040 --> 00:05:32,880
I think so too.

191
00:05:32,880 --> 00:05:35,760
In the first stage, supervised fine tuning,

192
00:05:35,760 --> 00:05:38,000
they collect a massive amount of data

193
00:05:38,000 --> 00:05:40,360
containing prompts or instructions

194
00:05:40,360 --> 00:05:42,120
paired with the desired responses.

195
00:05:42,120 --> 00:05:42,960
Okay.

196
00:05:42,960 --> 00:05:44,520
It's like giving the model examples

197
00:05:44,520 --> 00:05:46,640
of good conversations to learn from.

198
00:05:46,640 --> 00:05:47,480
I see.

199
00:05:47,480 --> 00:05:49,720
And this is where all those diverse data sources

200
00:05:49,720 --> 00:05:50,880
come in handy, right?

201
00:05:50,880 --> 00:05:51,720
Yeah.

202
00:05:51,720 --> 00:05:54,480
Real user interactions, skill-specific data sets,

203
00:05:54,480 --> 00:05:56,360
and even the persona-generated data.

204
00:05:56,360 --> 00:05:57,200
All coming together.

205
00:05:57,200 --> 00:05:59,400
They're not just throwing everything in at random, are they?

206
00:05:59,400 --> 00:06:01,200
They're not, no, they're very strategic about it.

207
00:06:01,200 --> 00:06:02,040
Okay.

208
00:06:02,040 --> 00:06:03,280
They refine the data mix

209
00:06:03,280 --> 00:06:05,600
through multiple rounds of training,

210
00:06:05,600 --> 00:06:08,280
tweaking the recipe as they go to get the best results.

211
00:06:08,280 --> 00:06:10,240
So it's a very iterative process.

212
00:06:10,240 --> 00:06:11,080
It is.

213
00:06:11,080 --> 00:06:12,480
Constantly adjusting and improving.

214
00:06:12,480 --> 00:06:13,320
Yeah.

215
00:06:13,320 --> 00:06:14,560
What kind of results did they see

216
00:06:14,560 --> 00:06:16,440
after this first stage of training?

217
00:06:16,440 --> 00:06:18,360
Well, the paper includes a table

218
00:06:18,360 --> 00:06:20,720
that compares the performance of their model

219
00:06:20,720 --> 00:06:24,520
called Tülu 3 8B SFT.

220
00:06:24,520 --> 00:06:25,360
Okay.

221
00:06:25,360 --> 00:06:27,920
Against other open source and even some closed models.

222
00:06:27,920 --> 00:06:28,760
Okay.

223
00:06:28,760 --> 00:06:30,280
And let me tell you, the results are impressive.

224
00:06:30,280 --> 00:06:32,400
I'm guessing Tülu 3 came out on top?

225
00:06:32,400 --> 00:06:33,240
You bet.

226
00:06:33,240 --> 00:06:34,060
Nice.

227
00:06:34,060 --> 00:06:37,120
It significantly outperformed prior state-of-the-art models

228
00:06:37,120 --> 00:06:38,960
across a range of benchmarks,

229
00:06:38,960 --> 00:06:42,760
including MMLU, TruthfulQA, and GSM8K,

230
00:06:42,760 --> 00:06:45,200
which tests math problem solving skills.

231
00:06:45,200 --> 00:06:47,600
Wow, those are some tough benchmarks to beat.

232
00:06:47,600 --> 00:06:48,440
They are.

233
00:06:48,440 --> 00:06:50,360
That really speaks to the effectiveness

234
00:06:50,360 --> 00:06:54,320
of their data curation and multi-stage training approach.

235
00:06:54,320 --> 00:06:56,160
Yeah, they're really putting a lot of thought into this.

236
00:06:56,160 --> 00:06:57,000
Yeah.

237
00:06:57,000 --> 00:06:59,520
But I'm curious about this length-normalized

238
00:06:59,520 --> 00:07:01,480
direct preference optimization.

239
00:07:01,480 --> 00:07:02,320
Okay.

240
00:07:02,320 --> 00:07:03,160
What is that exactly?

241
00:07:03,160 --> 00:07:04,000
Sounds pretty technical, right?

242
00:07:04,000 --> 00:07:04,840
Yeah, a little bit.

243
00:07:04,840 --> 00:07:07,400
Well, direct preference optimization, or DPO,

244
00:07:07,400 --> 00:07:08,800
is the key algorithm they use

245
00:07:08,800 --> 00:07:10,560
for the second stage of training.

246
00:07:10,560 --> 00:07:11,400
Right.

247
00:07:11,400 --> 00:07:12,720
Preference fine tuning.

248
00:07:12,720 --> 00:07:15,640
Remember, this stage is all about making the AI's responses

249
00:07:15,640 --> 00:07:17,640
more aligned with human preferences.

250
00:07:17,640 --> 00:07:19,840
Right, so they're essentially teaching the model

251
00:07:19,840 --> 00:07:22,880
to be more human-like in its communication.

252
00:07:22,880 --> 00:07:23,720
Yeah.

253
00:07:23,720 --> 00:07:24,560
Okay.

254
00:07:24,560 --> 00:07:26,240
How does this DPO method work?

255
00:07:26,240 --> 00:07:30,920
It's a way of training the AI directly on human preferences

256
00:07:30,920 --> 00:07:33,800
without needing to create a separate reward model,

257
00:07:33,800 --> 00:07:36,160
which can be very complex and time-consuming.

258
00:07:36,160 --> 00:07:37,000
I see.

259
00:07:37,000 --> 00:07:39,640
The length-normalized part is a clever tweak they made

260
00:07:39,640 --> 00:07:42,280
to the algorithm to make it even more effective.

261
00:07:42,280 --> 00:07:44,080
So they're simplifying the training process

262
00:07:44,080 --> 00:07:45,560
while making it more effective.

263
00:07:45,560 --> 00:07:46,440
Exactly.

264
00:07:46,440 --> 00:07:47,280
That's impressive.

265
00:07:47,280 --> 00:07:48,360
Yeah, they're pretty clever.

266
00:07:48,360 --> 00:07:49,640
What kind of data do they use

267
00:07:49,640 --> 00:07:52,080
for this preference fine tuning stage?

268
00:07:52,080 --> 00:07:53,520
They use a mix of sources.

269
00:07:53,520 --> 00:07:54,360
Okay.

270
00:07:54,360 --> 00:07:56,680
But they really focus on capturing human judgments

271
00:07:56,680 --> 00:07:59,280
about the quality of the AI's responses.

272
00:07:59,280 --> 00:08:00,120
Okay.

273
00:08:00,120 --> 00:08:01,240
They even developed a system

274
00:08:01,240 --> 00:08:04,760
where they use different AI models as judges

275
00:08:04,760 --> 00:08:07,840
to rate the responses generated by their model.

276
00:08:07,840 --> 00:08:10,560
Wait, they're having AI judge other AI?

277
00:08:10,560 --> 00:08:12,200
Yeah, it's a little bit meta, isn't it?

278
00:08:12,200 --> 00:08:13,040
It is, yeah.

279
00:08:13,040 --> 00:08:15,040
But they're essentially leveraging the capabilities

280
00:08:15,040 --> 00:08:18,440
of different AI models to provide diverse perspectives

281
00:08:18,440 --> 00:08:20,920
on the quality of their model's responses.

282
00:08:20,920 --> 00:08:23,160
It's like having a panel of expert judges

283
00:08:23,160 --> 00:08:24,880
with different areas of expertise.

284
00:08:24,880 --> 00:08:27,200
That's an interesting way to get a more well-rounded

285
00:08:27,200 --> 00:08:29,160
assessment of the AI's performance.

286
00:08:30,120 --> 00:08:30,960
It is.

287
00:08:30,960 --> 00:08:32,400
Do they mention what kind of improvements they saw

288
00:08:32,400 --> 00:08:33,960
after using this DPO method?

289
00:08:33,960 --> 00:08:36,760
They found that DPO led to significant improvements

290
00:08:36,760 --> 00:08:39,320
in several areas, including AlpacaEval,

291
00:08:39,320 --> 00:08:42,520
which tests the model's ability to follow instructions,

292
00:08:42,520 --> 00:08:45,440
and Safety, which evaluates its ability

293
00:08:45,440 --> 00:08:48,640
to avoid generating harmful or offensive content.

294
00:08:48,640 --> 00:08:51,600
So after two rounds of intense training,

295
00:08:51,600 --> 00:08:53,640
they've got a model that's not only smart,

296
00:08:53,640 --> 00:08:55,680
but also safe and well-behaved.

297
00:08:55,680 --> 00:08:56,520
That's right.

298
00:08:56,520 --> 00:08:57,920
Now let's move on to their final,

299
00:08:57,920 --> 00:08:58,760
Okay.

300
00:08:58,760 --> 00:09:01,040
and perhaps most innovative, stage of training:

301
00:09:01,040 --> 00:09:04,880
Reinforcement learning with verifiable rewards,

302
00:09:04,880 --> 00:09:06,440
or RLVR.

303
00:09:06,440 --> 00:09:07,280
Right.

304
00:09:07,280 --> 00:09:08,760
This one sounds like it's pushing the boundaries

305
00:09:08,760 --> 00:09:10,720
of what's possible with AI training.

306
00:09:10,720 --> 00:09:11,560
It is.

307
00:09:11,560 --> 00:09:12,600
It's where it gets really interesting.

308
00:09:12,600 --> 00:09:15,360
RLVR is all about focusing on tasks

309
00:09:15,360 --> 00:09:16,840
where we can clearly determine

310
00:09:16,840 --> 00:09:18,400
if the AI got the answer right.

311
00:09:18,400 --> 00:09:19,240
Okay.

312
00:09:19,240 --> 00:09:20,800
Like math problems or coding challenges.

313
00:09:20,800 --> 00:09:22,760
Right, like a very specific grading rubric.

314
00:09:22,760 --> 00:09:23,600
Exactly.

315
00:09:23,600 --> 00:09:25,160
No room for subjectivity here.

316
00:09:25,160 --> 00:09:26,000
Right.

317
00:09:26,000 --> 00:09:26,840
How does it actually work?

318
00:09:26,840 --> 00:09:30,240
Well, in RLVR, the AI receives a reward.

319
00:09:30,240 --> 00:09:31,400
Think of it like a point.

320
00:09:31,400 --> 00:09:32,240
Okay.

321
00:09:32,240 --> 00:09:33,640
Only when it produces a correct solution.

322
00:09:33,640 --> 00:09:34,480
Right.

323
00:09:34,480 --> 00:09:36,640
This reward system helps to sharpen

324
00:09:36,640 --> 00:09:38,480
the AI's focus on accuracy.

325
00:09:38,480 --> 00:09:39,320
Okay.

326
00:09:39,320 --> 00:09:40,560
Making it a master problem solver.

327
00:09:40,560 --> 00:09:41,400
And they're training it

328
00:09:41,400 --> 00:09:42,960
on some challenging problems, right?

329
00:09:42,960 --> 00:09:43,800
They are.

330
00:09:43,800 --> 00:09:46,720
They specifically focus on three domains.

331
00:09:46,720 --> 00:09:49,680
Math problems from the GSM8K dataset.

332
00:09:49,680 --> 00:09:50,520
Okay.

333
00:09:50,520 --> 00:09:53,000
More complex math problems from the MATH dataset.

334
00:09:53,000 --> 00:09:54,160
Wow.

335
00:09:54,160 --> 00:09:55,720
And precise instruction following

336
00:09:55,720 --> 00:09:57,240
from the IFEval dataset.

337
00:09:57,240 --> 00:09:59,720
So they're covering a broad spectrum of tasks

338
00:09:59,720 --> 00:10:03,160
that require accurate solutions and logical reasoning.

339
00:10:03,160 --> 00:10:04,000
Yeah.

340
00:10:04,000 --> 00:10:05,200
Did they encounter any hurdles

341
00:10:05,200 --> 00:10:07,360
while implementing this RLVR technique?

342
00:10:07,360 --> 00:10:08,200
They did.

343
00:10:08,200 --> 00:10:09,040
Yeah.

344
00:10:09,040 --> 00:10:10,160
Training a large language model

345
00:10:10,160 --> 00:10:11,320
with reinforcement learning

346
00:10:11,320 --> 00:10:13,560
can be very computationally expensive.

347
00:10:13,560 --> 00:10:14,640
Yeah, that makes sense.

348
00:10:14,640 --> 00:10:16,120
It takes a lot of processing power

349
00:10:16,120 --> 00:10:18,240
to teach a massive AI system

350
00:10:18,240 --> 00:10:20,800
by giving it points for every correct answer.

351
00:10:20,800 --> 00:10:21,640
Right.

352
00:10:21,640 --> 00:10:23,320
So what do they do to overcome this challenge?

353
00:10:23,320 --> 00:10:25,320
Well, they built a specialized system

354
00:10:25,320 --> 00:10:27,680
to run the RLVR training process.

355
00:10:27,680 --> 00:10:28,520
Okay.

356
00:10:28,520 --> 00:10:30,080
This system leverages a technique

357
00:10:30,080 --> 00:10:32,200
called asynchronous RL training.

358
00:10:32,200 --> 00:10:33,040
Okay.

359
00:10:33,040 --> 00:10:33,880
Which improves efficiency

360
00:10:33,880 --> 00:10:36,040
by allowing different parts of the training process

361
00:10:36,040 --> 00:10:37,240
to run concurrently.

362
00:10:37,240 --> 00:10:38,840
So it's like having multiple coaches

363
00:10:38,840 --> 00:10:40,480
working with an athlete at the same time.

364
00:10:40,480 --> 00:10:41,360
Exactly.

365
00:10:41,360 --> 00:10:42,920
A much faster way to train.

366
00:10:42,920 --> 00:10:44,040
What were the outcomes

367
00:10:44,040 --> 00:10:47,000
of this final stage of training?

368
00:10:47,000 --> 00:10:48,280
The results are impressive.

369
00:10:48,280 --> 00:10:49,120
Okay.

370
00:10:49,120 --> 00:10:51,680
They found that RLVR further enhanced

371
00:10:51,680 --> 00:10:54,840
the model's performance on those targeted tasks,

372
00:10:54,840 --> 00:10:59,280
particularly on GSM8K, MATH, and IFEval.

373
00:10:59,280 --> 00:11:00,120
Is that working?

374
00:11:00,120 --> 00:11:00,960
It's working.

375
00:11:00,960 --> 00:11:01,800
Okay.

376
00:11:01,800 --> 00:11:03,120
This shows that their approach is effective

377
00:11:03,120 --> 00:11:05,400
for both general language tasks

378
00:11:05,400 --> 00:11:07,320
and those requiring precise reasoning.

379
00:11:07,320 --> 00:11:08,160
That's incredible.

380
00:11:08,160 --> 00:11:09,000
It is.

381
00:11:09,000 --> 00:11:10,760
So they've taken this open source AI,

382
00:11:10,760 --> 00:11:13,600
put it through this really rigorous training program,

383
00:11:13,600 --> 00:11:16,760
and it's performing at a level that rivals

384
00:11:16,760 --> 00:11:20,000
or even surpasses some of the big names in AI.

385
00:11:20,000 --> 00:11:20,840
You're absolutely right.

386
00:11:20,840 --> 00:11:21,800
And it all comes back

387
00:11:21,800 --> 00:11:23,920
to their commitment to transparency.

388
00:11:23,920 --> 00:11:26,240
They've openly shared their data,

389
00:11:26,240 --> 00:11:28,720
code, and training methods,

390
00:11:28,720 --> 00:11:31,080
which is a game changer in the world of AI.

391
00:11:31,080 --> 00:11:31,920
It is.

392
00:11:31,920 --> 00:11:34,680
They're essentially inviting the entire AI community

393
00:11:34,680 --> 00:11:37,760
to learn from their work and build upon it.

394
00:11:37,760 --> 00:11:38,600
It's like they're saying,

395
00:11:38,600 --> 00:11:41,520
let's all work together to build a better AI future.

396
00:11:41,520 --> 00:11:42,360
Exactly.

397
00:11:42,360 --> 00:11:43,600
But hold on a second.

398
00:11:43,600 --> 00:11:46,320
We've talked a lot about training and data.

399
00:11:46,320 --> 00:11:47,160
Right.

400
00:11:47,160 --> 00:11:50,560
But what about evaluating Tülu 3's performance?

401
00:11:50,560 --> 00:11:51,400
Oh, that's a great point.

402
00:11:51,400 --> 00:11:53,240
How do we know it actually works?

403
00:11:53,240 --> 00:11:54,600
You raise a great point.

404
00:11:54,600 --> 00:11:58,240
Evaluation is a critical part of any AI research.

405
00:11:58,240 --> 00:12:02,360
And the Tülu 3 team didn't just rely on standard benchmarks.

406
00:12:02,360 --> 00:12:03,360
Okay.

407
00:12:03,360 --> 00:12:05,080
They went a step further

408
00:12:05,080 --> 00:12:07,440
and created a new evaluation framework

409
00:12:07,440 --> 00:12:10,160
called the Open Language Model Evaluation System,

410
00:12:10,160 --> 00:12:11,000
or OLMES.

411
00:12:11,000 --> 00:12:11,840
OLMES.

412
00:12:11,840 --> 00:12:12,680
Yeah.

413
00:12:12,680 --> 00:12:14,520
Okay, so it's like designing a standardized testing system

414
00:12:14,520 --> 00:12:17,760
for AI to ensure that everyone's playing by the same rules.

415
00:12:17,760 --> 00:12:18,600
Exactly.

416
00:12:18,600 --> 00:12:19,600
What's special about OLMES?

417
00:12:19,600 --> 00:12:22,640
OLMES includes a collection of carefully selected benchmarks.

418
00:12:22,640 --> 00:12:23,480
Okay.

419
00:12:23,480 --> 00:12:25,840
Covering a diverse range of tasks and skills.

420
00:12:25,840 --> 00:12:26,680
Right.

421
00:12:26,680 --> 00:12:28,000
But here's the key.

422
00:12:28,000 --> 00:12:31,200
They divided the evaluation suite into two sets.

423
00:12:31,200 --> 00:12:32,040
Oh, okay.

424
00:12:32,040 --> 00:12:34,520
One set, called the Development Suite,

425
00:12:34,520 --> 00:12:36,840
is used throughout the training process

426
00:12:36,840 --> 00:12:38,800
to monitor the model's progress

427
00:12:38,800 --> 00:12:40,560
and make adjustments as needed.

428
00:12:40,560 --> 00:12:43,680
So it's like giving the AI regular quizzes

429
00:12:43,680 --> 00:12:45,720
and practice tests throughout its training.

430
00:12:45,720 --> 00:12:46,560
Exactly.

431
00:12:46,560 --> 00:12:48,480
But what about the other set of evaluations?

432
00:12:48,480 --> 00:12:51,240
The second set, called the Unseen Suite,

433
00:12:51,240 --> 00:12:53,040
is kept hidden until the very end.

434
00:12:53,040 --> 00:12:53,880
Okay.

435
00:12:53,880 --> 00:12:54,720
It's like a final exam.

436
00:12:54,720 --> 00:12:55,560
Oh.

437
00:12:55,560 --> 00:12:57,760
Where the AI has never seen the questions before.

438
00:12:57,760 --> 00:12:58,640
Okay.

439
00:12:58,640 --> 00:13:01,480
This is the ultimate test of the model's ability

440
00:13:01,480 --> 00:13:03,960
to generalize and solve problems

441
00:13:03,960 --> 00:13:06,200
it hasn't specifically been trained on.

442
00:13:06,200 --> 00:13:08,920
That's a clever way to assess its true capabilities.

443
00:13:08,920 --> 00:13:09,760
It is.

444
00:13:09,760 --> 00:13:11,000
What kind of challenges do they include

445
00:13:11,000 --> 00:13:12,560
in this Unseen Suite?

446
00:13:12,560 --> 00:13:15,560
They included a range of new and challenging tasks,

447
00:13:15,560 --> 00:13:19,480
like AGIEval, which focuses on complex reasoning,

448
00:13:19,480 --> 00:13:22,560
and HREF, which evaluates the model's ability

449
00:13:22,560 --> 00:13:26,080
to follow instructions across different categories.

450
00:13:26,080 --> 00:13:26,920
Wow.

451
00:13:26,920 --> 00:13:28,560
They really wanted to push the model to its limits

452
00:13:28,560 --> 00:13:29,400
and see what it could do.

453
00:13:29,400 --> 00:13:31,840
So they didn't just create this powerful AI.

454
00:13:31,840 --> 00:13:32,680
Right.

455
00:13:32,680 --> 00:13:35,360
They also developed a really rigorous way

456
00:13:35,360 --> 00:13:37,040
to assess its capabilities.

457
00:13:37,040 --> 00:13:37,880
That's right.

458
00:13:37,880 --> 00:13:39,920
That's a commendable approach to research.

459
00:13:39,920 --> 00:13:40,760
It is.

460
00:13:40,760 --> 00:13:42,840
And in the spirit of transparency,

461
00:13:42,840 --> 00:13:45,920
they even shared some insights into what didn't work

462
00:13:45,920 --> 00:13:47,560
during the development process.

463
00:13:47,560 --> 00:13:48,440
Oh, interesting.

464
00:13:48,440 --> 00:13:50,880
For example, they experimented with different methods

465
00:13:50,880 --> 00:13:54,040
for generating data for preference fine tuning,

466
00:13:54,040 --> 00:13:56,200
but found that some approaches weren't as effective

467
00:13:56,200 --> 00:13:57,040
as others.

468
00:13:57,040 --> 00:14:00,360
So they're not just showcasing their successes.

469
00:14:00,360 --> 00:14:02,120
They're also highlighting their challenges

470
00:14:02,120 --> 00:14:03,920
and the lessons they learned along the way.

471
00:14:03,920 --> 00:14:04,520
Exactly.

472
00:14:04,520 --> 00:14:06,880
That's incredibly valuable for other researchers

473
00:14:06,880 --> 00:14:08,720
and developers building on their work.

474
00:14:08,720 --> 00:14:09,600
It is.

475
00:14:09,600 --> 00:14:12,640
And they also discussed areas for future research,

476
00:14:12,640 --> 00:14:15,320
like improving the model's ability to handle

477
00:14:15,320 --> 00:14:19,400
longer conversations, expanding its multilingual capability,

478
00:14:19,400 --> 00:14:22,560
and exploring ways to integrate it with tools and agents.

479
00:14:22,560 --> 00:14:24,200
It sounds like they've only just begun

480
00:14:24,200 --> 00:14:29,520
to explore the possibilities of open source AI with Tülu 3.

481
00:14:29,520 --> 00:14:30,680
It does, yeah.

482
00:14:30,680 --> 00:14:33,520
This research is truly groundbreaking,

483
00:14:33,520 --> 00:14:35,400
not just for the technical advancements,

484
00:14:35,400 --> 00:14:39,120
but also for their commitment to transparency and collaboration.

485
00:14:39,120 --> 00:14:39,680
Yeah.

486
00:14:39,680 --> 00:14:42,520
It's an exciting glimpse into the future of AI.

487
00:14:42,520 --> 00:14:43,160
It is.

488
00:14:43,160 --> 00:14:44,480
It is.

489
00:14:44,480 --> 00:14:45,640
Welcome back.

490
00:14:45,640 --> 00:14:48,720
I'm ready to hear how well Tülu 3 did on those hidden tests.

491
00:14:48,720 --> 00:14:49,240
Right.

492
00:14:49,240 --> 00:14:51,400
It's like waiting for the results of a big exam.

493
00:14:51,400 --> 00:14:52,800
Well, the results are in.

494
00:14:52,800 --> 00:14:54,160
And it's safe

495
00:14:54,160 --> 00:14:57,040
to say that Tülu 3 passed with flying colors.

496
00:14:57,040 --> 00:14:58,360
OK, that's a relief.

497
00:14:58,360 --> 00:14:58,720
Yeah.

498
00:14:58,720 --> 00:15:00,400
But I'm guessing it wasn't a perfect score.

499
00:15:00,400 --> 00:15:01,080
Right.

500
00:15:01,080 --> 00:15:04,200
Were there any areas where Tülu 3 struggled?

501
00:15:04,200 --> 00:15:06,400
So there were a few interesting observations.

502
00:15:06,400 --> 00:15:06,800
OK.

503
00:15:06,800 --> 00:15:09,840
You see, while Tülu 3 excelled at math problems

504
00:15:09,840 --> 00:15:12,240
in the development suite, its performance

505
00:15:12,240 --> 00:15:15,760
dipped a bit on a similar but unseen set of math problems

506
00:15:15,760 --> 00:15:17,760
called DeepMind Mathematics.

507
00:15:17,760 --> 00:15:20,640
So even though it aced the practice tests,

508
00:15:20,640 --> 00:15:23,280
it stumbled a bit when faced with brand new challenges.

509
00:15:23,280 --> 00:15:24,120
Exactly.

510
00:15:24,120 --> 00:15:25,280
Why do you think that happened?

511
00:15:25,280 --> 00:15:28,200
Well, the researchers noticed that Tülu 3 had a tendency

512
00:15:28,200 --> 00:15:30,400
to over-apply formatting rules

513
00:15:30,400 --> 00:15:30,960
OK.

514
00:15:30,960 --> 00:15:33,040
it had learned during training.

515
00:15:33,040 --> 00:15:35,280
For instance, in the MATH dataset,

516
00:15:35,280 --> 00:15:38,720
answers are often expected in LaTeX format,

517
00:15:38,720 --> 00:15:41,800
which is a specific way of writing mathematical formulas.

518
00:15:41,800 --> 00:15:42,560
I see.

519
00:15:42,560 --> 00:15:45,720
Tülu 3 seemed to apply this formatting even when

520
00:15:45,720 --> 00:15:48,720
it wasn't necessary, and that sometimes tripped it up.

521
00:15:48,720 --> 00:15:52,160
It's like a student who's so focused on using perfect grammar

522
00:15:52,160 --> 00:15:54,120
that they forget to answer the question itself.

523
00:15:54,120 --> 00:15:54,960
Exactly.

524
00:15:54,960 --> 00:15:55,600
That's funny.

525
00:15:55,600 --> 00:16:00,360
It highlights how even subtle aspects of the training data

526
00:16:00,360 --> 00:16:02,280
can influence the model's behavior.

527
00:16:02,280 --> 00:16:02,840
Right.

528
00:16:02,840 --> 00:16:05,720
Yeah, because it's picking up on these patterns,

529
00:16:05,720 --> 00:16:07,040
but not necessarily understanding

530
00:16:07,040 --> 00:16:08,240
like the underlying concept.

531
00:16:08,240 --> 00:16:08,880
Exactly.

532
00:16:08,880 --> 00:16:10,800
Were there any other unexpected findings

533
00:16:10,800 --> 00:16:12,360
in these unseen evaluations?

534
00:16:12,360 --> 00:16:13,160
Yes.

535
00:16:13,160 --> 00:16:15,720
They found a significant difference in performance

536
00:16:15,720 --> 00:16:20,080
between IFEval, a benchmark they used during development,

537
00:16:20,080 --> 00:16:23,560
and a similar but unseen benchmark they created called

538
00:16:23,560 --> 00:16:24,800
IFEval-OOD.

539
00:16:24,800 --> 00:16:25,200
OK.

540
00:16:25,200 --> 00:16:27,800
I remember we talked about out-of-distribution tasks.

541
00:16:27,800 --> 00:16:28,240
Right.

542
00:16:28,240 --> 00:16:31,160
So IFEval-OOD is designed to test how well the model handles

543
00:16:31,160 --> 00:16:32,840
instructions it's never seen before.

544
00:16:32,840 --> 00:16:33,800
Exactly.

545
00:16:33,800 --> 00:16:37,800
Both benchmarks focus on precise instruction following,

546
00:16:37,800 --> 00:16:40,680
meaning the AI has to understand and execute

547
00:16:40,680 --> 00:16:42,840
very detailed instructions.

548
00:16:42,840 --> 00:16:43,080
Right.

549
00:16:43,080 --> 00:16:46,240
So it's like giving the AI a very specific recipe

550
00:16:46,240 --> 00:16:47,920
and seeing if it can follow it perfectly.

551
00:16:47,920 --> 00:16:48,480
Exactly.

552
00:16:48,480 --> 00:16:50,120
So how did Tülu 3 do?

553
00:16:50,120 --> 00:16:51,520
Well, here's the surprising part.

554
00:16:51,520 --> 00:16:52,040
OK.

555
00:16:52,040 --> 00:16:55,280
Even models that did exceptionally well on IFEval,

556
00:16:55,280 --> 00:16:59,160
including Tülu 3, saw a noticeable drop

557
00:16:59,160 --> 00:17:01,440
in performance on IFEval-OOD.

558
00:17:01,440 --> 00:17:04,080
So it's like they ace the practice test

559
00:17:04,080 --> 00:17:06,080
but struggle with the real exam.

560
00:17:06,080 --> 00:17:06,560
Right.

561
00:17:06,560 --> 00:17:07,480
What does that tell us?

562
00:17:07,480 --> 00:17:10,280
Well, it suggests that even though the model appears

563
00:17:10,280 --> 00:17:13,280
to be mastering precise instruction following,

564
00:17:13,280 --> 00:17:16,200
it might be overfitting to the specific examples

565
00:17:16,200 --> 00:17:17,680
it encountered during training.

566
00:17:17,680 --> 00:17:18,080
OK.

567
00:17:18,080 --> 00:17:19,880
In other words, it's not truly grasping

568
00:17:19,880 --> 00:17:21,960
the underlying principles, which

569
00:17:21,960 --> 00:17:24,880
makes it harder to generalize to new instructions.

570
00:17:24,880 --> 00:17:27,840
It seems like the AI is still learning by rote to some extent

571
00:17:27,840 --> 00:17:30,160
rather than developing a deeper understanding of how

572
00:17:30,160 --> 00:17:31,480
to follow instructions.

573
00:17:31,480 --> 00:17:32,680
That seems to be the case.

574
00:17:32,680 --> 00:17:33,000
OK.

575
00:17:33,000 --> 00:17:35,560
And it highlights an ongoing challenge in AI,

576
00:17:35,560 --> 00:17:38,440
developing models that can truly generalize and learn

577
00:17:38,440 --> 00:17:41,360
like humans do, adapting to new situations

578
00:17:41,360 --> 00:17:43,120
and applying knowledge flexibly.

579
00:17:43,120 --> 00:17:44,640
A good reality check that we're still

580
00:17:44,640 --> 00:17:46,040
in the early stages of AI.

581
00:17:46,040 --> 00:17:46,520
Yeah.

582
00:17:46,520 --> 00:17:47,640
And there's a lot more to learn.

583
00:17:47,640 --> 00:17:48,120
Right.

584
00:17:48,120 --> 00:17:52,400
But even with those limitations, the results of Tülu 3's

585
00:17:52,400 --> 00:17:54,720
evaluations are still pretty impressive.

586
00:17:54,720 --> 00:17:55,640
They are.

587
00:17:55,640 --> 00:18:00,000
They show that open source AI can be just as good as or even

588
00:18:00,000 --> 00:18:02,880
better than proprietary models in many areas.

589
00:18:02,880 --> 00:18:03,560
Absolutely.

590
00:18:03,560 --> 00:18:06,640
And the team's commitment to transparency

591
00:18:06,640 --> 00:18:09,560
is incredibly valuable for the entire AI community.

592
00:18:09,560 --> 00:18:10,240
It is.

593
00:18:10,240 --> 00:18:12,120
By sharing their methods and data,

594
00:18:12,120 --> 00:18:15,120
they're encouraging collaboration and driving innovation.

595
00:18:15,120 --> 00:18:15,600
Exactly.

596
00:18:15,600 --> 00:18:16,760
It's like they're saying, let's all

597
00:18:16,760 --> 00:18:19,400
work together to build a better AI future.

598
00:18:19,400 --> 00:18:20,760
So where do we go from here?

599
00:18:20,760 --> 00:18:21,920
That's a great question.

600
00:18:21,920 --> 00:18:23,960
What are the next steps for Tülu 3?

601
00:18:23,960 --> 00:18:25,560
And what are the broader implications

602
00:18:25,560 --> 00:18:28,360
of this research for the future of open source AI?

603
00:18:28,360 --> 00:18:30,360
Well, they've identified several exciting avenues

604
00:18:30,360 --> 00:18:31,520
for future research.

605
00:18:31,520 --> 00:18:32,080
OK.

606
00:18:32,080 --> 00:18:33,440
One area they're keen on exploring

607
00:18:33,440 --> 00:18:35,760
is enhancing the model's ability to handle longer

608
00:18:35,760 --> 00:18:38,440
conversations and more complex interactions.

609
00:18:38,440 --> 00:18:43,000
Right now, Tülu 3 is primarily trained on shorter conversations.

610
00:18:43,000 --> 00:18:43,720
Right.

611
00:18:43,720 --> 00:18:47,280
But imagine an AI that can understand and engage

612
00:18:47,280 --> 00:18:48,560
in more nuanced dialogues.

613
00:18:48,560 --> 00:18:49,000
Right.

614
00:18:49,000 --> 00:18:50,680
Like a real back and forth conversation.

615
00:18:50,680 --> 00:18:51,760
Exactly.

616
00:18:51,760 --> 00:18:55,560
That would be a huge advancement for things like chatbots,

617
00:18:55,560 --> 00:18:59,640
virtual assistants, and AI-powered writing tools.

618
00:18:59,640 --> 00:19:01,000
All sorts of things.

619
00:19:01,000 --> 00:19:03,080
They're also looking into expanding

620
00:19:03,080 --> 00:19:05,200
its multilingual capabilities.

621
00:19:05,200 --> 00:19:05,760
Yes.

622
00:19:05,760 --> 00:19:07,160
So beyond English.

623
00:19:07,160 --> 00:19:07,960
Beyond English.

624
00:19:07,960 --> 00:19:10,640
Imagine an AI that can seamlessly communicate

625
00:19:10,640 --> 00:19:11,920
in multiple languages.

626
00:19:11,920 --> 00:19:12,680
Wow.

627
00:19:12,680 --> 00:19:14,640
It would open up so many possibilities

628
00:19:14,640 --> 00:19:17,520
for cross-cultural communication and collaboration.

629
00:19:17,520 --> 00:19:19,000
That's an inspiring vision.

630
00:19:19,000 --> 00:19:19,400
It is.

631
00:19:19,400 --> 00:19:22,320
Making AI accessible to people from all over the world.

632
00:19:22,320 --> 00:19:22,920
Precisely.

633
00:19:22,920 --> 00:19:23,720
That's amazing.

634
00:19:23,720 --> 00:19:26,120
It aligns with the idea of democratizing AI.

635
00:19:26,120 --> 00:19:26,600
Right.

636
00:19:26,600 --> 00:19:29,360
And ensuring that everyone can benefit from its potential.

637
00:19:29,360 --> 00:19:30,000
Yeah.

638
00:19:30,000 --> 00:19:33,680
Another area they're exploring is integrating Tülu 3

639
00:19:33,680 --> 00:19:36,560
with tools and agents.

640
00:19:36,560 --> 00:19:38,240
So not just communicating with words,

641
00:19:38,240 --> 00:19:40,440
but also interacting with the world around it.

642
00:19:40,440 --> 00:19:41,360
That's the idea.

643
00:19:41,360 --> 00:19:42,000
Wow.

644
00:19:42,000 --> 00:19:46,280
Imagine an AI that can access external tools and systems.

645
00:19:46,280 --> 00:19:46,760
Correct.

646
00:19:46,760 --> 00:19:50,280
Search engines, databases, even physical devices.

647
00:19:50,280 --> 00:19:50,880
Wow.

648
00:19:50,880 --> 00:19:53,160
So it could actually take actions based on what

649
00:19:53,160 --> 00:19:54,480
it learns and understands.

650
00:19:54,480 --> 00:19:55,280
Right.

651
00:19:55,280 --> 00:19:57,040
The possibilities seem endless.

652
00:19:57,040 --> 00:19:58,000
They really are.

653
00:19:58,000 --> 00:20:01,280
It shows how open source AI can be a driving force

654
00:20:01,280 --> 00:20:03,840
in developing really versatile AI systems that

655
00:20:03,840 --> 00:20:06,160
can address a wide range of real world problems.

656
00:20:06,160 --> 00:20:07,280
I agree.

657
00:20:07,280 --> 00:20:09,720
This research is truly pushing the boundaries of what

658
00:20:09,720 --> 00:20:10,880
we thought was possible.

659
00:20:10,880 --> 00:20:11,920
It is.

660
00:20:11,920 --> 00:20:16,040
It's exciting to think about how Tülu 3 will continue to evolve

661
00:20:16,040 --> 00:20:18,880
and inspire new innovations in the AI community.

662
00:20:18,880 --> 00:20:19,280
It is.

663
00:20:19,280 --> 00:20:20,800
It is a very exciting time.

664
00:20:20,800 --> 00:20:22,000
I'm glad we're here to talk about it.

665
00:20:22,000 --> 00:20:22,520
Me too.

666
00:20:22,520 --> 00:20:25,720
Welcome back for the final part of our deep dive into Tülu 3.

667
00:20:25,720 --> 00:20:27,040
It's been quite a journey.

668
00:20:27,040 --> 00:20:28,240
We've covered a lot of ground.

669
00:20:28,240 --> 00:20:28,760
We have.

670
00:20:28,760 --> 00:20:31,880
Exploring this technology, the training process,

671
00:20:31,880 --> 00:20:33,680
and even those unseen evaluations.

672
00:20:33,680 --> 00:20:34,000
Yeah.

673
00:20:34,000 --> 00:20:36,600
It's really fascinating to see how this open source model is

674
00:20:36,600 --> 00:20:38,520
kind of shaking up the AI world.

675
00:20:38,520 --> 00:20:38,920
Yeah.

676
00:20:38,920 --> 00:20:40,840
Challenging those preconceived notions.

677
00:20:40,840 --> 00:20:41,280
Right.

678
00:20:41,280 --> 00:20:43,560
And pushing those boundaries of what's possible.

679
00:20:43,560 --> 00:20:44,680
Exactly.

680
00:20:44,680 --> 00:20:48,240
But now it's time to step back and consider that bigger picture.

681
00:20:48,240 --> 00:20:48,720
Right.

682
00:20:48,720 --> 00:20:53,920
What does the rise of powerful open source AI mean for us?

683
00:20:53,920 --> 00:20:56,440
Like what are the implications for society,

684
00:20:56,440 --> 00:20:59,120
for the future of work, and even for our understanding

685
00:20:59,120 --> 00:21:00,440
of intelligence itself?

686
00:21:00,440 --> 00:21:00,640
Yeah.

687
00:21:00,640 --> 00:21:02,480
Those are the million dollar questions, aren't they?

688
00:21:02,480 --> 00:21:03,160
They are.

689
00:21:03,160 --> 00:21:04,880
And they're not easy to answer.

690
00:21:04,880 --> 00:21:05,360
No.

691
00:21:05,360 --> 00:21:08,680
But it is crucial that we start having these conversations

692
00:21:08,680 --> 00:21:10,960
now as this technology continues

693
00:21:10,960 --> 00:21:12,880
to evolve at such an incredible pace.

694
00:21:12,880 --> 00:21:13,760
Absolutely.

695
00:21:13,760 --> 00:21:16,000
So let's dive into some of these big picture questions.

696
00:21:16,000 --> 00:21:16,640
OK.

697
00:21:16,640 --> 00:21:19,840
First up, what does the rise of open source AI

698
00:21:19,840 --> 00:21:22,040
mean for the future of work?

699
00:21:22,040 --> 00:21:24,960
Well, one of the most immediate impacts we're likely to see

700
00:21:24,960 --> 00:21:26,880
is an acceleration of automation.

701
00:21:26,880 --> 00:21:27,200
OK.

702
00:21:27,200 --> 00:21:30,200
Tasks that were once thought to require human intelligence,

703
00:21:30,200 --> 00:21:34,000
like writing, coding, and data analysis,

704
00:21:34,000 --> 00:21:37,680
might become increasingly automated as AI, like

705
00:21:37,680 --> 00:21:40,240
Tülu 3, becomes more widely available.

706
00:21:40,240 --> 00:21:40,920
That makes sense.

707
00:21:40,920 --> 00:21:41,400
Yeah.

708
00:21:41,400 --> 00:21:43,960
If AI can do these things faster and more efficiently,

709
00:21:43,960 --> 00:21:47,040
it's only natural that businesses would adopt these tools.

710
00:21:47,040 --> 00:21:47,480
Right.

711
00:21:47,480 --> 00:21:49,600
But I can also see how that could lead to concerns

712
00:21:49,600 --> 00:21:51,400
about job displacement.

713
00:21:51,400 --> 00:21:51,920
You're right.

714
00:21:51,920 --> 00:21:53,000
It's a valid concern.

715
00:21:53,000 --> 00:21:53,440
Yeah.

716
00:21:53,440 --> 00:21:54,680
That's why it's essential that we

717
00:21:54,680 --> 00:21:57,760
start thinking about how we can adapt our workforce,

718
00:21:57,760 --> 00:21:59,920
invest in retraining and education,

719
00:21:59,920 --> 00:22:02,320
and create new opportunities that leverage

720
00:22:02,320 --> 00:22:05,120
the unique strengths of both humans and AI.

721
00:22:05,120 --> 00:22:08,400
It's about finding that synergy, that sweet spot,

722
00:22:08,400 --> 00:22:11,320
where humans and AI can work together

723
00:22:11,320 --> 00:22:12,880
to achieve incredible things.

724
00:22:12,880 --> 00:22:13,600
Exactly.

725
00:22:13,600 --> 00:22:15,960
And open source AI could play a key role

726
00:22:15,960 --> 00:22:19,280
in making these advanced tools more accessible

727
00:22:19,280 --> 00:22:21,960
to a wider range of businesses and individuals.

728
00:22:21,960 --> 00:22:22,200
Right?

729
00:22:22,200 --> 00:22:22,840
Absolutely.

730
00:22:22,840 --> 00:22:25,120
It's not just large corporations that

731
00:22:25,120 --> 00:22:26,680
can benefit from this technology.

732
00:22:26,680 --> 00:22:27,120
Right.

733
00:22:27,120 --> 00:22:30,680
Smaller businesses, startups, even individual entrepreneurs

734
00:22:30,680 --> 00:22:33,000
could have access to these powerful tools.

735
00:22:33,000 --> 00:22:33,520
That's great.

736
00:22:33,520 --> 00:22:37,280
Which could lead to a surge in innovation and creativity

737
00:22:37,280 --> 00:22:38,920
across all sectors.

738
00:22:38,920 --> 00:22:40,360
That's a really exciting prospect.

739
00:22:40,360 --> 00:22:42,120
It's like leveling the playing field.

740
00:22:42,120 --> 00:22:42,600
It is.

741
00:22:42,600 --> 00:22:46,560
And empowering everyone to participate in this AI revolution.

742
00:22:46,560 --> 00:22:47,400
Exactly.

743
00:22:47,400 --> 00:22:49,640
What about the implications for education?

744
00:22:49,640 --> 00:22:50,280
Oh, yeah.

745
00:22:50,280 --> 00:22:52,840
I can imagine AI having a profound impact

746
00:22:52,840 --> 00:22:54,320
on how we learn and teach.

747
00:22:54,320 --> 00:22:57,120
Imagine a world where students have access

748
00:22:57,120 --> 00:22:59,440
to personalized AI tutors.

749
00:22:59,440 --> 00:23:00,440
OK, yeah.

750
00:23:00,440 --> 00:23:03,520
That can adapt to their individual learning styles,

751
00:23:03,520 --> 00:23:05,280
provide instant feedback, and make

752
00:23:05,280 --> 00:23:07,480
learning more engaging and interactive.

753
00:23:07,480 --> 00:23:10,360
It's like having a personal coach for every subject,

754
00:23:10,360 --> 00:23:11,640
guiding you along the way.

755
00:23:11,640 --> 00:23:12,360
Exactly.

756
00:23:12,360 --> 00:23:13,440
That could be incredible.

757
00:23:13,440 --> 00:23:14,000
Yeah.

758
00:23:14,000 --> 00:23:16,440
But I can also see potential downsides.

759
00:23:16,440 --> 00:23:21,160
What about concerns that AI could stifle creativity

760
00:23:21,160 --> 00:23:23,000
or discourage critical thinking?

761
00:23:23,000 --> 00:23:24,680
Those are valid concerns.

762
00:23:24,680 --> 00:23:27,360
And it's essential that we address them thoughtfully.

763
00:23:27,360 --> 00:23:30,640
We need to ensure that AI tools are used responsibly

764
00:23:30,640 --> 00:23:34,200
in education, that they complement rather than replace

765
00:23:34,200 --> 00:23:36,520
human teachers, and that they encourage students

766
00:23:36,520 --> 00:23:38,760
to develop their critical thinking skills

767
00:23:38,760 --> 00:23:41,840
and explore their own unique creative potential.

768
00:23:41,840 --> 00:23:43,720
It's about striking that balance,

769
00:23:43,720 --> 00:23:47,080
using AI to enhance the learning experience,

770
00:23:47,080 --> 00:23:51,200
while preserving those really essential human qualities.

771
00:23:51,200 --> 00:23:51,920
Absolutely.

772
00:23:51,920 --> 00:23:53,920
Now, I can't let you off the hook without tackling

773
00:23:53,920 --> 00:23:55,160
one of the biggest questions of all.

774
00:23:55,160 --> 00:23:55,960
Oh, OK.

775
00:23:55,960 --> 00:23:59,280
What does the rise of AI, like Tülu 3,

776
00:23:59,280 --> 00:24:03,360
mean for our understanding of intelligence itself?

777
00:24:03,360 --> 00:24:05,600
The age-old question.

778
00:24:05,600 --> 00:24:09,200
As AI systems become increasingly sophisticated,

779
00:24:09,200 --> 00:24:11,520
they're challenging our traditional notions of what

780
00:24:11,520 --> 00:24:13,400
it means to be intelligent.

781
00:24:13,400 --> 00:24:15,760
They can perform tasks that were once thought

782
00:24:15,760 --> 00:24:17,320
to be uniquely human.

783
00:24:17,320 --> 00:24:19,640
And they're constantly learning and evolving.

784
00:24:19,640 --> 00:24:21,960
It's like we're witnessing a new form of intelligence

785
00:24:21,960 --> 00:24:25,160
emerging, and it's forcing us to rethink our own place

786
00:24:25,160 --> 00:24:25,760
in the world.

787
00:24:25,760 --> 00:24:27,040
Exactly.

788
00:24:27,040 --> 00:24:29,280
And the beauty of open source AI is

789
00:24:29,280 --> 00:24:32,600
that it allows us to explore these questions together

790
00:24:32,600 --> 00:24:33,800
as a global community.

791
00:24:33,800 --> 00:24:34,240
That's great.

792
00:24:34,240 --> 00:24:36,680
By sharing our knowledge and insights,

793
00:24:36,680 --> 00:24:39,920
we can collectively navigate the complexities of AI

794
00:24:39,920 --> 00:24:43,320
and ensure that its development benefits all of humanity.

795
00:24:43,320 --> 00:24:44,520
That's a powerful message.

796
00:24:44,520 --> 00:24:45,040
It is.

797
00:24:45,040 --> 00:24:46,800
It's not just about the technology itself.

798
00:24:46,800 --> 00:24:47,320
Right.

799
00:24:47,320 --> 00:24:50,160
It's about the collaborative spirit, the open exchange

800
00:24:50,160 --> 00:24:52,680
of ideas, and the shared responsibility

801
00:24:52,680 --> 00:24:55,600
we have in shaping the future of AI.

802
00:24:55,600 --> 00:24:56,920
Well said.

803
00:24:56,920 --> 00:24:59,880
Well, it's been a privilege to explore the world of Tülu 3

804
00:24:59,880 --> 00:25:00,800
with you today.

805
00:25:00,800 --> 00:25:02,000
Likewise.

806
00:25:02,000 --> 00:25:04,680
This deep dive has been a fascinating journey

807
00:25:04,680 --> 00:25:07,640
from the technical details to the philosophical implications.

808
00:25:07,640 --> 00:25:08,480
I agree.

809
00:25:08,480 --> 00:25:10,920
And I think it's safe to say that open source AI,

810
00:25:10,920 --> 00:25:13,960
like Tülu 3, will continue to shape our world

811
00:25:13,960 --> 00:25:15,440
in really profound ways.

812
00:25:15,440 --> 00:25:16,120
It will.

813
00:25:16,120 --> 00:25:18,800
It's an exciting time to be witnessing these developments

814
00:25:18,800 --> 00:25:20,520
and participating in this conversation.

815
00:25:20,520 --> 00:25:21,080
It is.

816
00:25:21,080 --> 00:25:22,000
Thanks for joining us.

817
00:25:22,000 --> 00:25:32,560
Thanks for having me.