1
00:00:00,000 --> 00:00:02,920
All right, so today we're taking a deep dive into

2
00:00:04,200 --> 00:00:07,280
Tencent's latest creation in the AI world.

3
00:00:07,280 --> 00:00:09,480
Yeah, their new model, Hunyuan-Large.

4
00:00:09,480 --> 00:00:10,320
Hunyuan-Large.

5
00:00:10,320 --> 00:00:11,160
That's right.

6
00:00:11,160 --> 00:00:12,400
And what's really interesting about this,

7
00:00:12,400 --> 00:00:14,000
and I know we've talked about some other

8
00:00:14,000 --> 00:00:16,280
large language models on the show before,

9
00:00:16,280 --> 00:00:19,840
but this one takes kind of a different approach.

10
00:00:19,840 --> 00:00:20,680
It does, yeah.

11
00:00:20,680 --> 00:00:23,560
And I think what's really interesting is the efficiency

12
00:00:23,560 --> 00:00:27,480
and specialization that this model brings to the table.

13
00:00:27,480 --> 00:00:29,840
It's built on this structure called a mixture

14
00:00:29,840 --> 00:00:32,000
of experts, or MoE.

15
00:00:32,000 --> 00:00:34,440
And what that means is that the model itself

16
00:00:34,440 --> 00:00:35,920
has different parts that are really good

17
00:00:35,920 --> 00:00:37,680
at very specific things.

18
00:00:37,680 --> 00:00:40,640
So instead of just one AI brain that tries to be good

19
00:00:40,640 --> 00:00:42,880
at everything, it's more like a team.

20
00:00:42,880 --> 00:00:44,720
Yeah, like a team of specialists, you got it.

21
00:00:44,720 --> 00:00:45,960
Okay, I like that.

22
00:00:45,960 --> 00:00:48,920
And so you have an expert for math,

23
00:00:48,920 --> 00:00:50,320
you have an expert for coding,

24
00:00:50,320 --> 00:00:51,640
you have an expert for understanding

25
00:00:51,640 --> 00:00:53,240
long chunks of text.

26
00:00:53,240 --> 00:00:56,400
So it knows when to call on the right expert for the job.

27
00:00:56,400 --> 00:00:57,440
Exactly, yeah.

28
00:00:57,440 --> 00:00:59,560
It's smart enough to know which expert to use

29
00:00:59,560 --> 00:01:02,200
for any given task and that's what makes it so efficient.
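The "team of specialists" idea can be made concrete with a toy routing layer. This is a minimal sketch of generic top-k mixture-of-experts gating with an always-on shared expert, written in Python with NumPy. The `moe_layer` function, the expert count, the hidden size, and the random weights are hypothetical illustrations, not Hunyuan-Large's actual configuration or code. In real MoE models the experts are not assigned subjects like math or coding up front; any specialization is learned during training.

```python
import numpy as np

def softmax(x: np.ndarray) -> np.ndarray:
    # Numerically stable softmax over the last axis.
    z = x - x.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def moe_layer(tokens, router_w, experts, shared_expert, top_k=1):
    """Toy MoE layer: each token goes through the shared expert plus its top-k routed experts.

    tokens:        (n_tokens, d) array of token representations
    router_w:      (d, n_experts) gating weights (hypothetical)
    experts:       list of callables mapping a (d,) vector to a (d,) vector
    shared_expert: callable applied to every token
    """
    probs = softmax(tokens @ router_w)                # routing probabilities per token
    chosen = np.argsort(-probs, axis=-1)[:, :top_k]   # indices of the top-k experts
    out = np.zeros_like(tokens)
    for i, tok in enumerate(tokens):
        out[i] = shared_expert(tok)                   # the shared expert always runs
        for e in chosen[i]:
            # Only the selected experts run for this token; the others stay idle.
            out[i] += probs[i, e] * experts[e](tok)
    return out, chosen

# Usage: 5 tokens, 8-dimensional, 4 routed experts, top-1 routing.
rng = np.random.default_rng(0)
d, n_experts = 8, 4
tokens = rng.standard_normal((5, d))
router_w = rng.standard_normal((d, n_experts))
experts = [(lambda v, W=rng.standard_normal((d, d)): v @ W) for _ in range(n_experts)]
shared_W = rng.standard_normal((d, d))
out, chosen = moe_layer(tokens, router_w, experts, lambda v: v @ shared_W)
print(chosen.ravel())  # which routed expert each token was sent to
```

With `top_k=1`, each token pays for the shared expert plus one routed expert, no matter how many routed experts exist in total.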

30
00:01:02,200 --> 00:01:04,400
So is that part of the reason why,

31
00:01:04,400 --> 00:01:05,880
even though it's a massive model

32
00:01:05,880 --> 00:01:08,800
with 389 billion parameters,

33
00:01:08,800 --> 00:01:11,760
it only activates 52 billion of those parameters for any given token?

34
00:01:11,760 --> 00:01:12,720
That's exactly it.

35
00:01:12,720 --> 00:01:14,480
And that's, I think, one of the most interesting things

36
00:01:14,480 --> 00:01:16,280
about this research is that it's showing

37
00:01:16,280 --> 00:01:19,160
that you can get incredible performance

38
00:01:19,160 --> 00:01:21,960
with a smaller number of activated parameters.

39
00:01:21,960 --> 00:01:24,920
So Hunyuan-Large is going head to head

40
00:01:24,920 --> 00:01:29,920
with models like Llama 3.1, which uses 405 billion parameters.
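The "David and Goliath" comparison comes down to simple arithmetic on the figures quoted in the episode: 389 billion total parameters and 52 billion activated per token, versus 405 billion for Llama 3.1, a dense model that uses every parameter on every token. The sketch below assumes the common rule of thumb that a forward pass costs roughly 2 FLOPs per activated parameter per token; that rule ignores attention and routing overhead, so treat the ratio as a rough estimate only.

```python
TOTAL_PARAMS = 389e9   # Hunyuan-Large total parameters (figure from the episode)
ACTIVE_PARAMS = 52e9   # parameters activated per token (figure from the episode)
LLAMA_PARAMS = 405e9   # Llama 3.1 405B is dense, so all parameters are active

def flops_per_token(active_params: float) -> float:
    # Rule-of-thumb forward-pass cost: about 2 FLOPs per active parameter per token.
    return 2 * active_params

active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS
compute_ratio = flops_per_token(LLAMA_PARAMS) / flops_per_token(ACTIVE_PARAMS)

print(f"active fraction: {active_fraction:.1%}")              # active fraction: 13.4%
print(f"dense/MoE compute per token: {compute_ratio:.1f}x")   # dense/MoE compute per token: 7.8x
```

Under this rough estimate, each token costs the dense model about eight times as much forward-pass compute, even though the two models have similar total parameter counts.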

41
00:01:30,800 --> 00:01:31,640
Wow.

42
00:01:31,640 --> 00:01:33,160
And it's outperforming it on many tasks.

43
00:01:33,160 --> 00:01:35,360
It's just kind of like a David and Goliath situation

44
00:01:35,360 --> 00:01:36,360
in the AI world.

45
00:01:36,360 --> 00:01:38,000
It really is, yeah, it's this smaller,

46
00:01:38,000 --> 00:01:41,720
more specialized model that's outperforming a giant.

47
00:01:41,720 --> 00:01:43,480
Yeah, and so how are they pulling this off?

48
00:01:43,480 --> 00:01:44,760
What's the secret sauce here?

49
00:01:44,760 --> 00:01:45,920
Well, there are three key things

50
00:01:45,920 --> 00:01:47,360
that they highlighted in the paper.

51
00:01:47,360 --> 00:01:50,160
The first is the data. They trained this model

52
00:01:50,160 --> 00:01:52,200
on an absolutely massive amount of data,

53
00:01:52,200 --> 00:01:55,680
seven trillion tokens, which is like feeding it a library

54
00:01:55,680 --> 00:01:57,240
the size of a small country.

55
00:01:57,240 --> 00:01:58,520
Wow, that's a lot of data.

56
00:01:58,520 --> 00:01:59,480
Yeah, it's a ton of data,

57
00:01:59,480 --> 00:02:01,880
but it's not just the size of the data that's important,

58
00:02:01,880 --> 00:02:03,200
it's also the quality.

59
00:02:03,200 --> 00:02:04,040
Oh, interesting, okay.

60
00:02:04,040 --> 00:02:07,600
They actually generated 1.5 trillion tokens

61
00:02:07,600 --> 00:02:09,040
of synthetic data.

62
00:02:09,040 --> 00:02:11,480
Now, hold on, I'm not an AI expert,

63
00:02:11,480 --> 00:02:12,920
so what is synthetic data?

64
00:02:12,920 --> 00:02:14,280
So think about it this way.

65
00:02:14,280 --> 00:02:17,080
Imagine you're teaching a kid about animals.

66
00:02:17,080 --> 00:02:18,520
You could take them to a zoo

67
00:02:18,520 --> 00:02:20,480
and show them all the different animals,

68
00:02:20,480 --> 00:02:23,320
or you could show them carefully curated pictures

69
00:02:23,320 --> 00:02:25,680
and videos of those animals.

70
00:02:25,680 --> 00:02:29,120
So synthetic data is kind of like those pictures and videos.

71
00:02:29,120 --> 00:02:30,960
It's data that's generated artificially

72
00:02:30,960 --> 00:02:33,200
to mimic real world data.

73
00:02:33,200 --> 00:02:34,560
So it's more controlled and efficient.

74
00:02:34,560 --> 00:02:36,280
Exactly, it's like giving the model

75
00:02:36,280 --> 00:02:38,480
a super concentrated learning smoothie.

76
00:02:38,480 --> 00:02:39,400
I love that analogy.

77
00:02:39,400 --> 00:02:41,080
Packed with all the essential information.

78
00:02:41,080 --> 00:02:42,920
That's great, so massive amounts of data,

79
00:02:42,920 --> 00:02:45,040
including this synthetic data.

80
00:02:45,040 --> 00:02:46,720
What else is in the secret sauce?

81
00:02:46,720 --> 00:02:48,800
So the second thing is a system

82
00:02:48,800 --> 00:02:50,680
that they call recycle routing.

83
00:02:50,680 --> 00:02:51,640
Recycle routing.

84
00:02:51,640 --> 00:02:53,000
Now, remember how we talked about

85
00:02:53,000 --> 00:02:54,440
the different experts in the model?

86
00:02:54,440 --> 00:02:55,280
Yes.

87
00:02:55,280 --> 00:02:57,320
Well, this routing system basically makes sure

88
00:02:57,320 --> 00:02:59,520
that none of that information gets lost

89
00:02:59,520 --> 00:03:01,200
as it's processed by those experts.

90
00:03:01,200 --> 00:03:03,680
So it's like, if one expert's already overloaded,

91
00:03:03,680 --> 00:03:05,960
it'll send the token to another one that has spare capacity.

92
00:03:05,960 --> 00:03:08,040
Exactly, it's like having a super efficient

93
00:03:08,040 --> 00:03:10,640
air traffic control system for the AI model.
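One way to picture this "air traffic control" is the minimal sketch below. It assumes each expert can accept at most `capacity` tokens per batch, and that tokens overflowing their first-choice expert are reassigned at random to experts that still have room instead of being dropped. The function name, the random reassignment policy, and the capacity value are illustrative assumptions; Tencent's actual recycle routing may differ in its details.

```python
import random

def route_with_recycling(assignments, n_experts, capacity, seed=0):
    """Assign tokens to experts under a per-expert capacity limit (hypothetical sketch).

    assignments[t] is the expert the router picked for token t.
    Tokens that overflow their chosen expert are recycled to a random expert
    with spare capacity rather than dropped. Returns the final expert per token,
    or None for tokens that could not be placed because every expert was full.
    """
    rng = random.Random(seed)
    load = [0] * n_experts
    final = [None] * len(assignments)
    overflow = []
    # First pass: honour the router's choice while the expert has room.
    for t, e in enumerate(assignments):
        if load[e] < capacity:
            load[e] += 1
            final[t] = e
        else:
            overflow.append(t)
    # Second pass: recycle overflow tokens to experts with spare capacity.
    for t in overflow:
        free = [e for e in range(n_experts) if load[e] < capacity]
        if not free:
            break  # every expert is full; remaining tokens stay unassigned
        e = rng.choice(free)
        load[e] += 1
        final[t] = e
    return final

# Usage: the router sends 4 of 5 tokens to expert 0, but each expert holds only 2.
final = route_with_recycling([0, 0, 0, 0, 1], n_experts=3, capacity=2)
print(final)  # tokens 2 and 3 land on expert 1 or 2 instead of being dropped
```

Without the second pass, tokens 2 and 3 would simply be skipped by the expert layer, which is the information loss the episode describes.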

94
00:03:10,640 --> 00:03:12,160
I like that. Okay, so that's two ingredients.

95
00:03:12,160 --> 00:03:13,000
What's the third?

96
00:03:13,000 --> 00:03:14,280
The third one is what they call

97
00:03:14,280 --> 00:03:17,160
expert-specific learning rates.

98
00:03:17,160 --> 00:03:18,600
Okay, break that down for me.

99
00:03:18,600 --> 00:03:20,800
Essentially, they realized that different parts

100
00:03:20,800 --> 00:03:24,640
of the MoE structure need to learn at different paces

101
00:03:24,640 --> 00:03:25,960
to be most effective.

102
00:03:25,960 --> 00:03:26,800
Interesting.

103
00:03:26,800 --> 00:03:29,160
So it's kind of like having a personalized learning plan

104
00:03:29,160 --> 00:03:31,160
for each AI expert.
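A "personalized learning plan" per expert amounts to giving different groups of parameters different learning rates. The sketch below assumes one shared expert that sees every token and routed experts that each see roughly `top_k / n_experts` of the batch, and it scales the routed experts' learning rate with a square-root batch-size heuristic. That heuristic, the function names, and the numbers are illustrative assumptions, not the scaling rule from the Hunyuan-Large paper.

```python
import math
import numpy as np

def expert_learning_rates(base_lr, n_experts, top_k=1):
    """Hypothetical per-group learning rates for an MoE layer.

    The shared expert sees every token in a batch, so it keeps base_lr.
    A routed expert sees about top_k / n_experts of the tokens on average;
    its smaller effective batch is matched here with a square-root
    batch-size heuristic (an assumption, not the paper's formula).
    """
    token_fraction = top_k / n_experts
    return {"shared": base_lr, "routed": base_lr * math.sqrt(token_fraction)}

def sgd_step(params, grads, lrs, group_of):
    # Plain SGD update in which each parameter uses the learning rate of its group.
    return {name: params[name] - lrs[group_of[name]] * grads[name] for name in params}

lrs = expert_learning_rates(base_lr=1e-4, n_experts=16, top_k=1)
params = {"shared.w": np.ones(2), "expert_3.w": np.ones(2)}
grads = {"shared.w": np.ones(2), "expert_3.w": np.ones(2)}
group_of = {"shared.w": "shared", "expert_3.w": "routed"}
new_params = sgd_step(params, grads, lrs, group_of)
print(lrs)  # with 16 experts and top-1 routing, routed experts get a quarter of base_lr
```

In PyTorch the same idea is usually expressed with optimizer parameter groups, each carrying its own `lr`.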

105
00:03:31,160 --> 00:03:32,160
Fascinating.

106
00:03:32,160 --> 00:03:34,480
So they've got this really efficient model.

107
00:03:34,480 --> 00:03:37,680
They're training it on a ton of high quality data

108
00:03:37,680 --> 00:03:40,320
and then they're even optimizing the learning process

109
00:03:40,320 --> 00:03:42,080
for each expert.

110
00:03:42,080 --> 00:03:44,280
What kind of results are they seeing with all of this?

111
00:03:44,280 --> 00:03:45,960
Yeah, well, that's where things get really exciting.

112
00:03:45,960 --> 00:03:48,760
Hunyuan-Large has really blown some benchmarks out of the water.

113
00:03:48,760 --> 00:03:49,600
Oh, really?

114
00:03:49,600 --> 00:03:51,120
Especially when it comes to things like language

115
00:03:51,120 --> 00:03:53,120
understanding and even coding.

116
00:03:53,120 --> 00:03:54,600
Okay, so give me some specifics.

117
00:03:54,600 --> 00:03:56,040
Like what benchmarks are we talking about?

118
00:03:56,040 --> 00:03:58,600
Well, one of the big ones they use is called MMLU.

119
00:03:58,600 --> 00:03:59,440
Okay.

120
00:03:59,440 --> 00:04:01,000
And that basically measures how well a model

121
00:04:01,000 --> 00:04:04,640
answers questions across a huge range of subjects.

122
00:04:04,640 --> 00:04:06,520
So it's like a test of like AI smarts.

123
00:04:06,520 --> 00:04:08,960
Exactly, it's a pretty comprehensive test.

124
00:04:08,960 --> 00:04:12,840
And Hunyuan-Large scored 88.4% on this benchmark,

125
00:04:12,840 --> 00:04:15,120
which is significantly higher than models

126
00:04:15,120 --> 00:04:18,840
like Llama 3.1, which only got 85.2%.

127
00:04:18,840 --> 00:04:20,480
Wow, and remind me how many parameters

128
00:04:20,480 --> 00:04:21,760
each of those models is using.

129
00:04:21,760 --> 00:04:24,360
Right, so Hunyuan-Large is only using 52 billion

130
00:04:24,360 --> 00:04:29,200
activated parameters, and Llama 3.1 is using 405 billion.

131
00:04:29,200 --> 00:04:31,600
So it's not just winning, it's winning by a landslide.

132
00:04:31,600 --> 00:04:34,320
It really is, and it's not just the raw scores.

133
00:04:34,320 --> 00:04:36,560
Hunyuan-Large is also showing that it can do things

134
00:04:36,560 --> 00:04:39,200
like common sense reasoning, question answering,

135
00:04:39,200 --> 00:04:40,720
and even generate code.

136
00:04:40,720 --> 00:04:41,560
Oh, wow.

137
00:04:41,560 --> 00:04:42,640
Okay, so it's not just talk,

138
00:04:42,640 --> 00:04:44,800
this model can actually walk the walk.

139
00:04:44,800 --> 00:04:46,240
Exactly, it's the real deal.

140
00:04:46,240 --> 00:04:48,040
And they really wanted to push it to the limits

141
00:04:48,040 --> 00:04:51,600
and see how well it could handle long chunks of information.

142
00:04:51,600 --> 00:04:53,880
So they actually created a special benchmark

143
00:04:53,880 --> 00:04:56,120
just for this, called PenguinScrolls.

144
00:04:56,120 --> 00:04:57,840
PenguinScrolls, that sounds adorable.

145
00:04:57,840 --> 00:04:59,240
It's a very catchy name.

146
00:04:59,240 --> 00:05:01,800
But the benchmark itself is very rigorous.

147
00:05:01,800 --> 00:05:05,480
They used things like financial reports and academic papers,

148
00:05:05,480 --> 00:05:08,640
and some of these documents were over 100,000 words long.

149
00:05:08,640 --> 00:05:11,040
Wow, that's like feeding in an entire encyclopedia.

150
00:05:11,040 --> 00:05:12,720
Yeah, it's a ton of information.

151
00:05:12,720 --> 00:05:15,240
And they tested how well Hunyuan-Large

152
00:05:15,240 --> 00:05:17,400
could extract key information,

153
00:05:17,400 --> 00:05:19,320
answer really complex questions,

154
00:05:19,320 --> 00:05:21,600
and even engage in multi-turn dialogues

155
00:05:21,600 --> 00:05:23,520
based on these massive documents.

156
00:05:23,520 --> 00:05:25,480
So it's not just about understanding language,

157
00:05:25,480 --> 00:05:27,360
it's about understanding and reasoning

158
00:05:27,360 --> 00:05:28,960
about complex information.

159
00:05:28,960 --> 00:05:30,200
You got it.

160
00:05:30,200 --> 00:05:32,720
And the results were outstanding.

161
00:05:32,720 --> 00:05:36,560
It even outperformed models that are specifically designed

162
00:05:36,560 --> 00:05:39,000
for these long context tasks.

163
00:05:39,000 --> 00:05:40,280
That's seriously impressive.

164
00:05:40,280 --> 00:05:41,120
Yeah.

165
00:05:41,120 --> 00:05:42,160
So zooming out for a second,

166
00:05:42,160 --> 00:05:44,640
what are the broader implications of this research?

167
00:05:44,640 --> 00:05:45,480
What does it all mean?

168
00:05:45,480 --> 00:05:47,560
I think it's really challenging our assumptions

169
00:05:47,560 --> 00:05:50,560
about how we design AI in the future.

170
00:05:50,560 --> 00:05:52,680
We've always thought that bigger is better,

171
00:05:52,680 --> 00:05:54,760
but this research shows that it's not just about size,

172
00:05:54,760 --> 00:05:56,480
it's about being smarter, more focused.

173
00:05:56,480 --> 00:05:57,800
It's about being more efficient.

174
00:05:57,800 --> 00:06:00,400
Exactly, and that has huge implications

175
00:06:00,400 --> 00:06:02,720
for efficiency and accessibility.

176
00:06:02,720 --> 00:06:06,080
Imagine AI models that are not only more powerful,

177
00:06:06,080 --> 00:06:07,880
but also more energy efficient

178
00:06:07,880 --> 00:06:10,280
and less computationally demanding.

179
00:06:10,280 --> 00:06:12,240
Yeah, that would be a game changer,

180
00:06:12,240 --> 00:06:14,680
especially with all the concerns that we have these days

181
00:06:14,680 --> 00:06:16,920
about the environmental impact of AI.

182
00:06:16,920 --> 00:06:18,200
Absolutely.

183
00:06:18,200 --> 00:06:19,720
So what's next for Hunyuan-Large?

184
00:06:19,720 --> 00:06:21,160
What are they planning to do with it?

185
00:06:21,160 --> 00:06:22,520
Well, one of the coolest things about this

186
00:06:22,520 --> 00:06:24,240
is that they're committed to open-sourcing it.

187
00:06:24,240 --> 00:06:26,040
Oh, so you mean making the model

188
00:06:26,040 --> 00:06:28,360
and code available for anybody to use?

189
00:06:28,360 --> 00:06:30,080
Exactly, they're setting a great example

190
00:06:30,080 --> 00:06:32,000
by making this technology accessible

191
00:06:32,000 --> 00:06:34,640
to the wider AI community.

192
00:06:34,640 --> 00:06:37,160
And that fosters collaboration and innovation.

193
00:06:37,160 --> 00:06:39,400
Absolutely, and it also helps address concerns

194
00:06:39,400 --> 00:06:42,280
about transparency and ethical considerations.

195
00:06:42,280 --> 00:06:45,880
Yeah, okay, so what does this mean for the average person?

196
00:06:45,880 --> 00:06:48,960
How is this gonna impact our lives in the coming years?

197
00:06:48,960 --> 00:06:50,720
That's the million-dollar question.

198
00:06:50,720 --> 00:06:52,120
It's hard to say for certain,

199
00:06:52,120 --> 00:06:54,040
but this research is paving the way

200
00:06:54,040 --> 00:06:57,040
for more powerful, more versatile AI systems.

201
00:06:57,040 --> 00:06:59,760
So AI that can help us do more, understand more,

202
00:06:59,760 --> 00:07:01,160
even create more?

203
00:07:01,160 --> 00:07:02,840
Absolutely, and that has the potential

204
00:07:02,840 --> 00:07:05,880
to really revolutionize everything from healthcare

205
00:07:05,880 --> 00:07:08,440
and education to scientific research

206
00:07:08,440 --> 00:07:09,840
and artistic expression.

207
00:07:09,840 --> 00:07:10,680
Yeah, it really sounds like

208
00:07:10,680 --> 00:07:12,840
we're entering a new era of AI.

209
00:07:12,840 --> 00:07:13,920
I think so too.

210
00:07:13,920 --> 00:07:16,640
And it's exciting to see where this journey takes us.

211
00:07:16,640 --> 00:07:18,040
Is there anything else you wanna add

212
00:07:18,040 --> 00:07:21,040
before we wrap up this deep dive into Hunyuan-Large?

213
00:07:21,040 --> 00:07:22,560
Just one final thought.

214
00:07:22,560 --> 00:07:24,760
It's not just about building bigger models,

215
00:07:24,760 --> 00:07:27,000
it's about building smarter models.

216
00:07:27,000 --> 00:07:29,760
And that, I think, is the big takeaway here.

217
00:07:29,760 --> 00:07:32,160
So it's about being more strategic

218
00:07:32,160 --> 00:07:34,080
with how we approach AI development.

219
00:07:34,080 --> 00:07:37,400
Exactly, and that opens up a whole world of possibilities.

220
00:07:37,400 --> 00:07:41,800
Imagine AI systems that are not only more capable,

221
00:07:41,800 --> 00:07:44,560
but also more accessible to researchers

222
00:07:44,560 --> 00:07:46,360
and developers all over the world.

223
00:07:46,360 --> 00:07:47,840
So if I'm a developer out there,

224
00:07:47,840 --> 00:07:49,840
and I'm like, man, I wanna play around

225
00:07:49,840 --> 00:07:51,520
with Hunyuan-Large,

226
00:07:51,520 --> 00:07:54,200
what can I actually do with this open source model?

227
00:07:54,200 --> 00:07:56,280
Well, you can actually download the model weights

228
00:07:56,280 --> 00:07:59,640
and the code right from Tencent's GitHub repository.

229
00:07:59,640 --> 00:08:01,160
I can get my hands on the same tech

230
00:08:01,160 --> 00:08:02,760
that powered all those benchmark results

231
00:08:02,760 --> 00:08:03,600
we were talking about.

232
00:08:03,600 --> 00:08:04,440
You got it.

233
00:08:04,440 --> 00:08:05,760
Wow, okay, what could I do with that?

234
00:08:05,760 --> 00:08:07,960
So the possibilities are endless.

235
00:08:07,960 --> 00:08:10,280
You could fine-tune the model for specific tasks,

236
00:08:10,280 --> 00:08:12,800
you could experiment with different architectures.

237
00:08:12,800 --> 00:08:14,720
You can even contribute to the development

238
00:08:14,720 --> 00:08:16,480
of the core technology itself.

239
00:08:16,480 --> 00:08:17,960
So it sounds like they're really creating

240
00:08:17,960 --> 00:08:19,960
a sort of community around this.

241
00:08:19,960 --> 00:08:22,520
Yeah, definitely a very collaborative

242
00:08:22,520 --> 00:08:25,600
and open approach to AI development.

243
00:08:25,600 --> 00:08:26,440
I like it.

244
00:08:26,440 --> 00:08:28,240
So to wrap this all up,

245
00:08:28,240 --> 00:08:30,440
what are your final thoughts on Hunyuan-Large

246
00:08:30,440 --> 00:08:33,080
and this whole idea of the future of AI?

247
00:08:33,080 --> 00:08:35,240
I think we're seeing a real paradigm shift here.

248
00:08:35,240 --> 00:08:40,000
We're moving away from these giant monolithic AI systems

249
00:08:40,000 --> 00:08:42,080
and towards these more specialized,

250
00:08:42,080 --> 00:08:44,080
adaptable, collaborative models.

251
00:08:44,080 --> 00:08:45,680
And this is a prime example of that.

252
00:08:45,680 --> 00:08:48,000
Hunyuan-Large is a fantastic example

253
00:08:48,000 --> 00:08:50,280
of this new wave of AI.

254
00:08:50,280 --> 00:08:52,840
And I think it's just the tip of the iceberg, honestly.

255
00:08:52,840 --> 00:08:55,360
I think we're entering a golden age of AI innovation

256
00:08:55,360 --> 00:08:57,800
and it's gonna be really interesting to see where it goes.

257
00:08:57,800 --> 00:08:58,640
I'm excited for it.

258
00:08:58,640 --> 00:09:00,040
Well, I wanna thank you for joining us

259
00:09:00,040 --> 00:09:02,440
on this deep dive into the world of Hunyuan-Large.

260
00:09:02,440 --> 00:09:03,560
It's been my pleasure.

261
00:09:03,560 --> 00:09:05,800
It's been a really thought-provoking conversation

262
00:09:05,800 --> 00:09:07,200
and to our listeners out there,

263
00:09:07,200 --> 00:09:09,240
if you wanna learn more about this research,

264
00:09:09,240 --> 00:09:11,440
we'll be sure to include links to the paper

265
00:09:11,440 --> 00:09:14,720
and Tencent's GitHub repository in the show notes.

266
00:09:14,720 --> 00:09:34,680
And until next time, stay curious.