1
00:00:00,000 --> 00:00:05,840
Welcome back everyone. Today we're going to take a deep dive into the world of cutting edge AI.

2
00:00:05,840 --> 00:00:06,880
Sounds exciting.

3
00:00:06,880 --> 00:00:12,480
It is. It is. Specifically, we're going to be focusing on two Chinese companies, DeepSeek

4
00:00:12,480 --> 00:00:19,680
and Moonshot AI. They're really making waves in the AI scene. And, you know, our main source for

5
00:00:19,680 --> 00:00:26,160
this deep dive is a document called Team of 20 Interns. It's fascinating, really. It gives you a

6
00:00:26,160 --> 00:00:31,200
real inside look at these companies. Oh wow. Team of 20 interns. That's interesting. It is. And what

7
00:00:31,200 --> 00:00:36,400
really struck me is how these relatively small teams, you know, mostly recent grads, even interns,

8
00:00:36,400 --> 00:00:42,560
are actually going head to head with giants like Google and Microsoft in this AI race.

9
00:00:42,560 --> 00:00:46,400
Yeah. That's wild. Like, how are they doing that? Well, that's what we're going to find out. That's

10
00:00:46,400 --> 00:00:51,840
the whole mission of this deep dive. Okay. I'm all ears. So to kick things off, DeepSeek, right,

11
00:00:51,840 --> 00:00:55,600
they've kind of come out of nowhere and just disrupted the entire AI landscape. I mean,

12
00:00:55,600 --> 00:00:59,760
they started a price war in China with their V2 model. Oh yeah. I remember hearing about that.

13
00:00:59,760 --> 00:01:04,640
Like, big players like Tencent and Alibaba were forced to lower their AI prices drastically.

14
00:01:04,640 --> 00:01:08,800
Exactly. It was a huge shock to the industry. It's almost like what happened with Pinduoduo,

15
00:01:08,800 --> 00:01:13,360
you know, in e-commerce with their whole discount model. You know what? I was thinking the exact

16
00:01:13,360 --> 00:01:18,720
same thing. I read somewhere that DeepSeek even got the nickname the AI Pinduoduo. It's a pretty

17
00:01:18,720 --> 00:01:23,440
apt comparison, I think. Yeah. I mean, the question is how are they able to offer such low prices and

18
00:01:23,440 --> 00:01:29,280
still, you know, actually make a profit? That's the million dollar question. It is. And it all

19
00:01:29,280 --> 00:01:34,880
comes down to their approach to AI model architecture. They've got this system called

20
00:01:34,880 --> 00:01:42,240
memory efficient local attention or MLA for short. Okay. MLA. Sounds pretty technical.

21
00:01:42,240 --> 00:01:48,160
It is, but it's the key to their success. So most AI models use a mechanism called multi-head

22
00:01:48,160 --> 00:01:54,000
attention, MHA, to process information. Think of it like the AI's way of connecting the dots

23
00:01:54,000 --> 00:01:58,400
between different parts of data. The problem is MHA takes a lot of memory to run.

24
00:01:58,400 --> 00:02:02,160
Okay. So DeepSeek's MLA is like a more efficient way to connect those dots.

25
00:02:02,160 --> 00:02:06,080
Exactly. Instead of trying to process all the data at once like MHA does,

26
00:02:06,080 --> 00:02:11,360
MLA focuses on smaller localized chunks. Hmm. Okay. So it's not trying to do everything at once.

27
00:02:11,360 --> 00:02:15,920
Right. It's more like reading a book one paragraph at a time, absorbing it fully,

28
00:02:15,920 --> 00:02:20,240
and then moving on. Whereas MHA is trying to take in the entire book all at once.

29
00:02:20,240 --> 00:02:24,160
Ah, okay. So you get the depth of understanding without overloading the system.

30
00:02:24,160 --> 00:02:25,440
Precisely. You got it.

31
00:02:25,440 --> 00:02:28,640
But does that ever like backfire? Are there situations where this

32
00:02:29,280 --> 00:02:34,560
hyper-focused approach, you know, might not be as effective as the traditional way?

33
00:02:34,560 --> 00:02:40,800
That's a great question. While MLA is like really good at processing sequential data, like text,

34
00:02:40,800 --> 00:02:46,800
there might be instances where that broader context provided by MHA is beneficial. So

35
00:02:46,800 --> 00:02:52,800
for things like complex image recognition or understanding subtle relationships across

36
00:02:52,800 --> 00:02:55,680
huge data sets, MHA might still have an edge.

37
00:02:55,680 --> 00:02:58,240
So it's about choosing the right tool for the job, basically.

38
00:02:58,240 --> 00:02:59,520
Exactly. It's a trade-off.

39
00:02:59,520 --> 00:03:03,600
It seems like DeepSeek is really betting on MLA being the right tool for a lot of jobs, though.

40
00:03:03,600 --> 00:03:08,400
I mean, they've even implemented this other innovation called the DeepSeek-MoE sparse design

41
00:03:08,400 --> 00:03:10,400
to make their models even more streamlined.

42
00:03:10,400 --> 00:03:14,720
Oh yeah, DeepSeek-MoE, which stands for mixture of experts.

43
00:03:14,720 --> 00:03:20,960
It's a pretty clever system. Basically, different parts of the model specialize in different tasks.

44
00:03:20,960 --> 00:03:22,800
So like division of labor.

45
00:03:22,800 --> 00:03:28,800
Exactly. Specialization. And because of that, not every part of the model needs to be active

46
00:03:28,800 --> 00:03:34,480
for every task. It's much more efficient. The sparse part of the name means that only a small

47
00:03:34,480 --> 00:03:39,840
subset of these experts is used at any given time, which, you know, further reduces the

48
00:03:39,840 --> 00:03:41,040
computational load.

49
00:03:41,040 --> 00:03:45,120
Okay, so they're not just optimizing how the AI processes information.

50
00:03:45,120 --> 00:03:48,320
They're actually optimizing the structure of the AI itself.

51
00:03:48,320 --> 00:03:51,360
Exactly. And that's how they can offer such competitive pricing.

52
00:03:51,360 --> 00:03:54,400
Amazing. But it's not just about cheap chatbots.

53
00:03:54,400 --> 00:03:54,720
Right.

54
00:03:55,600 --> 00:03:59,520
Their founder, Liang Wenfeng, he's got his sights set on something much bigger.

55
00:03:59,520 --> 00:04:03,040
Oh, absolutely. He's talking about artificial general intelligence, AGI.

56
00:04:03,040 --> 00:04:06,320
AGI. That's the holy grail of AI, isn't it?

57
00:04:06,320 --> 00:04:11,280
It is. And Liang Wenfeng really believes it's the ultimate goal. But his approach is different.

58
00:04:11,280 --> 00:04:15,120
He's not just chasing trends or, you know, trying to copy what others are doing.

59
00:04:15,120 --> 00:04:16,480
So what's he doing differently?

60
00:04:16,480 --> 00:04:21,440
He's focused on building a solid foundation through architectural innovation.

61
00:04:22,080 --> 00:04:26,240
He believes that true progress comes from fundamental changes in how these AI systems

62
00:04:26,240 --> 00:04:28,720
are designed. And that's what DeepSeek is all about.

63
00:04:29,280 --> 00:04:32,640
So they're not just building a better chatbot. They're trying to build a better brain.

64
00:04:33,200 --> 00:04:34,320
It's a great way to put it.

65
00:04:34,320 --> 00:04:38,640
Okay, so that's DeepSeek. Now let's shift gears a bit and talk about Moonshot AI,

66
00:04:38,640 --> 00:04:44,240
the other player in this AI revolution. They've developed something called Kimi K1.5,

67
00:04:44,240 --> 00:04:47,040
a powerful multimodal AI.

68
00:04:47,040 --> 00:04:50,560
Yes, Kimi K1.5. It's making a lot of noise in the AI community.

69
00:04:50,560 --> 00:04:52,720
Multimodal AI. What exactly does that mean?

70
00:04:52,720 --> 00:04:57,120
Well, it means it can understand and process different types of data, like text, images,

71
00:04:57,120 --> 00:04:58,480
code, all at the same time.

72
00:04:58,480 --> 00:05:00,560
So it's like an AI with multiple senses.

73
00:05:00,560 --> 00:05:04,160
Exactly. And get this, Kimi K1.5 has actually

74
00:05:04,160 --> 00:05:09,200
been outperforming some of the biggest names in AI, like GPT-4, on certain benchmarks,

75
00:05:09,200 --> 00:05:10,800
especially math and coding.

76
00:05:10,800 --> 00:05:14,960
Wow, really? That's impressive. So it can juggle different types of data,

77
00:05:14,960 --> 00:05:18,320
and it's a math and coding whiz. What's the secret to its success?

78
00:05:18,320 --> 00:05:21,760
Well, one key factor is its massive context window.

79
00:05:22,400 --> 00:05:29,040
Imagine the AI's memory like a container. Kimi K1.5 has a huge container, so it can

80
00:05:29,040 --> 00:05:32,080
hold and process a lot more information compared to other models.

81
00:05:32,080 --> 00:05:33,600
It's like giving it an elephant's memory.

82
00:05:33,600 --> 00:05:39,440
Haha, yeah, something like that. We're talking about a 128,000-token context window.

83
00:05:39,440 --> 00:05:40,720
That's massive.

84
00:05:40,720 --> 00:05:45,440
No wonder it can handle such complex problems. And what's even more remarkable is that they've

85
00:05:45,440 --> 00:05:50,240
made this powerful tool accessible to everyone through a free chat interface.

86
00:05:50,240 --> 00:05:54,000
That's right. Anyone can experience its capabilities firsthand.

87
00:05:54,000 --> 00:05:59,040
That's incredible. And speaking of access, both DeepSeek and Moonshot AI have embraced

88
00:05:59,040 --> 00:06:03,520
open source methodologies, which is quite different from how a lot of AI companies operate.

89
00:06:03,520 --> 00:06:07,200
Oh, absolutely. It's a major shift from the traditional approach of keeping everything

90
00:06:07,200 --> 00:06:08,160
under lock and key.

91
00:06:08,160 --> 00:06:10,320
It seems like they're not just competing with the big players.

92
00:06:10,320 --> 00:06:13,760
They're actually challenging the entire way the AI industry works.

93
00:06:13,760 --> 00:06:16,800
Exactly. And the impact is already being felt globally.

94
00:06:17,440 --> 00:06:23,200
DeepSeek's success, especially their low inference costs, has sent ripples throughout the industry.

95
00:06:23,200 --> 00:06:26,560
Yeah, even seasoned investors are trying to figure out how these smaller companies are

96
00:06:26,560 --> 00:06:27,600
changing the game.

97
00:06:27,600 --> 00:06:32,320
It's a real shakeup. And it raises a lot of questions about the future of AI.

98
00:06:32,320 --> 00:06:36,400
Will these smaller, more agile companies, with their open source approach,

99
00:06:37,040 --> 00:06:39,680
ultimately reshape the entire landscape?

100
00:06:39,680 --> 00:06:43,840
That's a fascinating question and one that I think we'll be exploring further as we delve

101
00:06:43,840 --> 00:06:47,440
deeper into the specifics of these companies and their AI models.

102
00:06:47,440 --> 00:06:50,160
But before we move on, I'm curious to hear your initial thoughts.

103
00:06:50,160 --> 00:06:54,560
What stands out to you most about DeepSeek and Moonshot AI so far?

104
00:06:54,560 --> 00:06:59,360
Honestly, what intrigues me most is their shared belief in the power of open source.

105
00:06:59,360 --> 00:07:03,440
They're not afraid to share their knowledge, and they believe that the true value lies in

106
00:07:03,440 --> 00:07:05,840
collaboration and collective advancement.

107
00:07:05,840 --> 00:07:10,480
It's a refreshing perspective in a field that's often so secretive and competitive.

108
00:07:10,480 --> 00:07:14,160
It's definitely a bold approach, and it's clearly making waves.

109
00:07:14,160 --> 00:07:15,680
So let's delve a little deeper.

110
00:07:15,680 --> 00:07:19,920
Let's start by taking a closer look at DeepSeek's revolutionary model architecture

111
00:07:19,920 --> 00:07:24,400
and how their MLA and DeepSeek-MoE designs are achieving such remarkable efficiency.

112
00:07:24,400 --> 00:07:28,560
So let's unpack DeepSeek's model architecture a little bit more.

113
00:07:28,560 --> 00:07:33,760
We talked about memory efficient local attention, MLA, earlier, but it's worth really digging

114
00:07:33,760 --> 00:07:35,520
into how it actually works.

115
00:07:35,520 --> 00:07:40,560
Traditional multi-head attention, MHA, processes all the data at once, and that requires a ton

116
00:07:40,560 --> 00:07:41,360
of memory.

117
00:07:41,360 --> 00:07:46,880
MLA, on the other hand, breaks the data down into smaller, more localized chunks.

118
00:07:46,880 --> 00:07:50,160
So instead of trying to grasp everything all at once, it's more like focusing on

119
00:07:50,960 --> 00:07:52,960
smaller details and then putting them together.

120
00:07:52,960 --> 00:07:57,920
Exactly, and that approach has some big benefits when it comes to efficiency.

121
00:07:57,920 --> 00:08:02,480
MLA only needs to focus on a small portion of the data at any given time.

122
00:08:02,480 --> 00:08:02,800
Right?

123
00:08:02,800 --> 00:08:05,200
So it uses way less memory than MHA.

124
00:08:05,200 --> 00:08:10,560
Like, it can achieve the same level of accuracy using, get this, only 5% to 13% of the memory

125
00:08:10,560 --> 00:08:11,040
footprint.

126
00:08:11,040 --> 00:08:13,040
Wow, that's a huge difference.

127
00:08:13,040 --> 00:08:13,360
It is.

128
00:08:13,360 --> 00:08:15,920
It's huge in terms of both cost and speed.

129
00:08:15,920 --> 00:08:18,800
Okay, so they're making the AI think smarter, not harder.

130
00:08:18,800 --> 00:08:19,280
Yeah.

131
00:08:19,280 --> 00:08:24,320
But doesn't focusing so intently on those smaller chunks risk missing the bigger picture?

132
00:08:24,960 --> 00:08:27,040
Are there any downsides to this approach?

133
00:08:27,040 --> 00:08:28,320
Yeah, that's a valid point.

134
00:08:28,960 --> 00:08:35,600
While MLA is great, incredibly efficient, actually, for tasks involving sequential data

135
00:08:35,600 --> 00:08:39,200
like text, it might not be the best solution for every situation.

136
00:08:39,200 --> 00:08:42,480
There are times when understanding the broader context is crucial.

137
00:08:42,480 --> 00:08:43,200
Okay, I see.

138
00:08:43,200 --> 00:08:48,400
So for analyzing an image with a lot of details, or maybe understanding connections across a

139
00:08:48,400 --> 00:08:52,560
huge data set, the traditional MHA approach might still be better.

140
00:08:52,560 --> 00:08:53,440
Yeah, exactly.

141
00:08:53,440 --> 00:08:56,000
So it's about knowing the limitations, right?

142
00:08:56,000 --> 00:08:56,320
Right.

143
00:08:56,320 --> 00:09:01,440
And it seems like DeepSeek is making a calculated bet that the benefits of MLA outweigh those

144
00:09:01,440 --> 00:09:03,040
potential downsides.

145
00:09:03,040 --> 00:09:07,440
And they've actually doubled down on this efficiency drive with their DeepSeek-MoE sparse

146
00:09:07,440 --> 00:09:07,840
design.

147
00:09:07,840 --> 00:09:08,320
Right, right.

148
00:09:08,320 --> 00:09:10,320
The DeepSeek-MoE, we talked about that earlier.

149
00:09:10,320 --> 00:09:10,880
Yes.

150
00:09:10,880 --> 00:09:16,400
DeepSeek-MoE, or mixture of experts, adds another layer of specialization to the model.

151
00:09:16,400 --> 00:09:21,760
It's like, imagine you have a team of experts, and each one specializes in a particular area.

152
00:09:21,760 --> 00:09:27,040
Instead of having everyone work on every problem, you just assign tasks based on their expertise.

153
00:09:27,040 --> 00:09:28,240
Delegate, delegate.

154
00:09:28,240 --> 00:09:28,720
Exactly.

155
00:09:28,720 --> 00:09:31,520
And that's essentially what's happening in the DeepSeek-MoE system.

156
00:09:31,520 --> 00:09:35,440
Different parts of the model are trained to handle specific types of tasks,

157
00:09:35,440 --> 00:09:39,360
so only the relevant experts are activated for a given input.

158
00:09:39,360 --> 00:09:42,400
So it's like they built an AI with a specialized workforce.

159
00:09:42,400 --> 00:09:42,960
Yes.

160
00:09:42,960 --> 00:09:43,760
Ha-ha.

161
00:09:43,760 --> 00:09:45,040
That's a good way to put it.

162
00:09:45,040 --> 00:09:51,200
And the sparse part means that only a small subset of these experts is used at any given time.

163
00:09:51,200 --> 00:09:54,160
So that reduces the computational load even further.

164
00:09:54,160 --> 00:09:59,520
OK, so they're not just optimizing how the AI processes information, but also the entire

165
00:09:59,520 --> 00:10:01,120
structure of the AI itself.

166
00:10:01,120 --> 00:10:01,920
Precisely.

167
00:10:01,920 --> 00:10:05,680
And this focus on efficiency, it's not just about saving money.

168
00:10:05,680 --> 00:10:08,080
It's about opening up new possibilities.

169
00:10:08,080 --> 00:10:13,520
By reducing that computational burden, DeepSeek can train larger, more powerful models that

170
00:10:13,520 --> 00:10:15,760
can handle even more complex tasks.

171
00:10:15,760 --> 00:10:17,440
So it's a long-term strategy.

172
00:10:17,440 --> 00:10:17,920
It is.

173
00:10:17,920 --> 00:10:23,280
And it aligns with their ultimate goal, which is pursuing artificial general intelligence, AGI.

174
00:10:23,280 --> 00:10:24,080
Right, AGI.

175
00:10:24,080 --> 00:10:25,360
I'm glad you brought that up again.

176
00:10:25,360 --> 00:10:28,880
It's easy to get caught up in all the technical stuff and lose sight of the bigger picture.

177
00:10:28,880 --> 00:10:30,800
Yeah, it's important to keep that in mind.

178
00:10:30,800 --> 00:10:35,760
And DeepSeek's founder, Liang Wenfeng, he seems to be approaching AGI in a very,

179
00:10:35,760 --> 00:10:37,360
I don't know, almost philosophical way.

180
00:10:37,360 --> 00:10:38,480
Oh, absolutely.

181
00:10:38,480 --> 00:10:44,080
He's not interested in quick wins or building a chatbot that can trick people into thinking

182
00:10:44,080 --> 00:10:44,960
it's human.

183
00:10:44,960 --> 00:10:49,120
He wants to create systems that can truly understand and reason about the world,

184
00:10:49,120 --> 00:10:50,480
just like humans do.

185
00:10:50,480 --> 00:10:56,880
And he believes that the key is building that strong foundation through architectural innovation.

186
00:10:56,880 --> 00:11:00,640
So they're not just trying to create a better version of what we already have.

187
00:11:00,640 --> 00:11:03,760
They're aiming for something fundamentally different.

188
00:11:03,760 --> 00:11:04,320
Exactly.

189
00:11:04,320 --> 00:11:06,000
And it'll be fascinating to see how that plays out.

190
00:11:06,000 --> 00:11:13,360
Now, let's switch gears for a bit and talk about Moonshot AI and their Kimi K1.5 model.

191
00:11:13,360 --> 00:11:18,320
We mentioned earlier that it's a multimodal AI capable of processing different types of data

192
00:11:18,320 --> 00:11:19,120
at the same time.

193
00:11:19,120 --> 00:11:20,000
Right, right.

194
00:11:20,000 --> 00:11:21,760
But what does that actually mean in practice?

195
00:11:21,760 --> 00:11:28,000
Well, imagine an AI that can read a research paper, analyze a graph, and write code all at the same time.

196
00:11:28,000 --> 00:11:30,000
That's pretty mind-blowing.

197
00:11:30,000 --> 00:11:31,120
Is that even possible?

198
00:11:31,120 --> 00:11:31,760
It is.

199
00:11:31,760 --> 00:11:34,160
And Kimi K1.5 is proof of that.

200
00:11:34,160 --> 00:11:39,120
This model has shown some amazing capabilities in tasks that require, you know,

201
00:11:39,120 --> 00:11:41,680
combining information from different sources.

202
00:11:41,680 --> 00:11:46,720
For example, it can accurately identify and extract numerical data from images,

203
00:11:46,720 --> 00:11:49,760
something that most language models just can't do.

204
00:11:49,760 --> 00:11:50,560
Wow.

205
00:11:50,560 --> 00:11:53,200
So it's not just understanding different types of data separately.

206
00:11:53,200 --> 00:11:55,120
It's about connecting the dots between them.

207
00:11:55,120 --> 00:11:55,600
Exactly.

208
00:11:55,600 --> 00:12:02,160
And that ability to process information holistically is a big step towards creating AI systems

209
00:12:02,160 --> 00:12:04,160
that can understand the world the way we do.

210
00:12:04,160 --> 00:12:07,520
Because we don't experience reality through just one sense.

211
00:12:07,520 --> 00:12:10,000
We use all of our senses to build a complete picture.

212
00:12:10,000 --> 00:12:10,480
Right.

213
00:12:10,480 --> 00:12:14,480
And Kimi K1.5 is, you know, a step in that direction for AI.

214
00:12:14,480 --> 00:12:17,280
So even though it's not specifically designed for AGI,

215
00:12:17,280 --> 00:12:21,680
it's still contributing to that larger goal by pushing the boundaries of multimodal understanding.

216
00:12:21,680 --> 00:12:22,400
Absolutely.

217
00:12:22,400 --> 00:12:27,520
And Moonshot AI's commitment to open source development makes that contribution even bigger.

218
00:12:27,520 --> 00:12:33,520
By making Kimi K1.5 freely accessible, they're empowering researchers and developers all over the world

219
00:12:33,520 --> 00:12:36,480
to experiment with it and explore new possibilities.

220
00:12:36,480 --> 00:12:44,800
Speaking of open source, I read that Kimi K1.5 actually outperformed GPT-4 and Claude 3.5

221
00:12:44,800 --> 00:12:47,760
in some pretty tough math and coding tests.

222
00:12:47,760 --> 00:12:48,960
How's that even possible?

223
00:12:48,960 --> 00:12:51,600
Those are like the big names in AI.

224
00:12:51,600 --> 00:12:53,120
It is remarkable, isn't it?

225
00:12:53,120 --> 00:12:57,840
Part of Kimi K1.5's success is because of its huge context window.

226
00:12:57,840 --> 00:13:02,720
Remember, it can hold and process incredibly long sequences of information.

227
00:13:02,720 --> 00:13:05,600
And that's really helpful for tasks like code generation,

228
00:13:05,600 --> 00:13:08,720
where you need to understand how different parts of the code are connected.

229
00:13:08,720 --> 00:13:11,760
So it's like giving the AI a much bigger working memory.

230
00:13:11,760 --> 00:13:12,320
Exactly.

231
00:13:12,320 --> 00:13:13,600
But that's not the whole story.

232
00:13:13,600 --> 00:13:18,880
Moonshot AI has also used some clever techniques during training to optimize its performance.

233
00:13:18,880 --> 00:13:22,000
They've used methods like rejection sampling and partial rollout.

234
00:13:22,000 --> 00:13:23,600
Those sound pretty technical.

235
00:13:23,600 --> 00:13:25,600
Can you break those down for us non-AI folks?

236
00:13:25,600 --> 00:13:26,080
Sure.

237
00:13:26,080 --> 00:13:29,520
Imagine you're teaching someone to play, I don't know, a musical instrument.

238
00:13:29,520 --> 00:13:31,600
You wouldn't just let them play random notes and hope for the best.

239
00:13:31,600 --> 00:13:35,120
You'd guide them, correct their mistakes, and encourage them to try different approaches.

240
00:13:35,120 --> 00:13:36,080
You'd give them feedback?

241
00:13:36,080 --> 00:13:36,560
Right.

242
00:13:36,560 --> 00:13:39,680
And that's kind of what rejection sampling and partial rollouts do.

243
00:13:39,680 --> 00:13:44,000
They help the AI explore different solutions during training and learn to pick the best ones.

244
00:13:44,000 --> 00:13:47,360
So it's like having a virtual music teacher for the AI.

245
00:13:47,360 --> 00:13:49,520
Yeah, that's a great analogy.

246
00:13:49,520 --> 00:13:55,920
And by refining that training process, Moonshot AI has created a model that can handle some really complex tasks.

247
00:13:55,920 --> 00:14:05,440
It's fascinating to see how both DeepSeek and Moonshot AI are tackling AI development with such different yet effective strategies.

248
00:14:05,440 --> 00:14:11,520
It really highlights the diversity of thought in this field and the incredible pace of innovation.

249
00:14:11,520 --> 00:14:17,520
Now, let's put these two impressive AI models head to head and see how they stack up in some real-world tests.

250
00:14:17,520 --> 00:14:22,160
Okay, so they designed a bunch of tasks to see how these models performed in different situations.

251
00:14:22,160 --> 00:14:26,960
One test involved analyzing images like tables or charts that had numerical data in them.

252
00:14:26,960 --> 00:14:29,200
Kimi K1.5, it aced this.

253
00:14:29,200 --> 00:14:31,600
It picked out the text and numbers without any problems.

254
00:14:31,600 --> 00:14:33,840
DeepSeek's R1 model, though, had a little trouble.

255
00:14:33,840 --> 00:14:35,200
It mixed up some of the values.

256
00:14:35,200 --> 00:14:39,360
So Kimi K1.5 is pretty good at understanding visual information, then.

257
00:14:39,360 --> 00:14:42,480
What about language-focused tasks?

258
00:14:42,480 --> 00:14:43,440
How do they handle those?

259
00:14:43,440 --> 00:14:45,200
Well, they had this web search challenge.

260
00:14:45,200 --> 00:14:49,040
The goal was to find red gowns that cost under $200.

261
00:14:49,040 --> 00:14:50,960
Okay, sounds pretty straightforward.

262
00:14:50,960 --> 00:14:51,760
It does.

263
00:14:51,760 --> 00:14:56,080
But DeepSeek R1, it returned a bunch of links, but some of them weren't quite right.

264
00:14:56,080 --> 00:14:58,560
They weren't red gowns or they were over $200.

265
00:14:58,560 --> 00:15:00,320
So it didn't quite get the search parameters.

266
00:15:00,320 --> 00:15:00,800
Right.

267
00:15:00,800 --> 00:15:04,640
But Kimi K1.5, it delivered two perfect links.

268
00:15:04,640 --> 00:15:06,800
And it even gave additional options in a separate panel.

269
00:15:06,800 --> 00:15:10,160
It really understood the color and price limits.

270
00:15:10,160 --> 00:15:11,120
Impressive.

271
00:15:11,120 --> 00:15:16,560
Did they test how the models handle multiple documents?

272
00:15:16,560 --> 00:15:20,640
Yeah, they gave each model a set of files and asked for a summary of the information.

273
00:15:20,640 --> 00:15:22,000
Kind of like a research assistant.

274
00:15:22,000 --> 00:15:22,880
Exactly.

275
00:15:22,880 --> 00:15:28,400
And Kimi K1.5, it managed to summarize at least two out of the three files they gave it.

276
00:15:28,400 --> 00:15:32,640
DeepSeek R1, however, struggled with this unless the files were given one at a time.

277
00:15:32,640 --> 00:15:37,600
So it seems like handling a lot of information from multiple sources at once is a bit of a

278
00:15:37,600 --> 00:15:39,440
weak spot for DeepSeek R1.

279
00:15:39,440 --> 00:15:40,240
What about coding?

280
00:15:40,240 --> 00:15:41,440
How did they do with that?

281
00:15:41,440 --> 00:15:42,960
Ah, the coding challenge.

282
00:15:42,960 --> 00:15:46,080
They tasked them with creating a Snakes and Ladders game.

283
00:15:46,080 --> 00:15:48,240
And this is where DeepSeek R1 really stood out.

284
00:15:48,240 --> 00:15:52,560
It made a more advanced game with clearer features and it was actually playable.

285
00:15:52,560 --> 00:15:52,880
Oh, wow.

286
00:15:52,880 --> 00:15:53,840
So it actually made a game.

287
00:15:54,480 --> 00:15:55,280
It did.

288
00:15:55,280 --> 00:15:59,440
Kimi K1.5's version, it worked, but it was simpler.

289
00:15:59,440 --> 00:16:03,040
And it had some bugs that let the pieces move off the board.

290
00:16:03,040 --> 00:16:07,600
So DeepSeek R1 might be better at those complex coding tasks.

291
00:16:07,600 --> 00:16:10,960
Did either of them actually get the Snakes and Ladders logic right though?

292
00:16:10,960 --> 00:16:13,200
Well, neither one fully implemented it.

293
00:16:13,200 --> 00:16:17,760
The gameplay was basically just moving pieces around randomly, which just goes to show

294
00:16:17,760 --> 00:16:21,200
that even with all these advancements, AI still has a long way to go.

295
00:16:21,200 --> 00:16:22,320
That's a good point.

296
00:16:22,320 --> 00:16:25,520
It's easy to get carried away with all the hype and forget that we're still in the

297
00:16:25,520 --> 00:16:26,320
early stages.

298
00:16:26,320 --> 00:16:30,000
But even with their limitations, these models are pretty incredible.

299
00:16:30,000 --> 00:16:30,800
Oh, absolutely.

300
00:16:30,800 --> 00:16:33,920
And these tests, they weren't about finding a winner or loser.

301
00:16:33,920 --> 00:16:37,760
It's more about understanding the strengths and weaknesses of each model.

302
00:16:37,760 --> 00:16:44,880
DeepSeek R1 is a coding whiz, while Kimi K1.5 shines in web searching, image analysis,

303
00:16:44,880 --> 00:16:46,880
and summarizing multiple documents.

304
00:16:46,880 --> 00:16:49,280
So different tools for different jobs, basically.

305
00:16:49,280 --> 00:16:49,920
Right.

306
00:16:49,920 --> 00:16:54,240
And what's really cool is that both of these companies are all in on open source.

307
00:16:54,240 --> 00:16:59,120
They're all about transparency and collaboration, which is pretty refreshing in an industry that's

308
00:16:59,120 --> 00:17:01,920
usually, you know, all about secrecy and competition.

309
00:17:01,920 --> 00:17:04,240
Like they're building a community, not just tools.

310
00:17:04,240 --> 00:17:04,960
Exactly.

311
00:17:04,960 --> 00:17:10,640
DeepSeek believes that real progress happens when you share knowledge, not hide it.

312
00:17:10,640 --> 00:17:16,640
And Moonshot AI seems to agree, considering their open access to Kimi K1.5.

313
00:17:16,640 --> 00:17:21,200
Yeah, they're fostering this whole ecosystem of developers and researchers who are contributing

314
00:17:21,200 --> 00:17:23,440
to its development, so it's constantly evolving.

315
00:17:23,440 --> 00:17:25,840
It's a really exciting time for AI, that's for sure.

316
00:17:25,840 --> 00:17:26,400
It is.

317
00:17:26,400 --> 00:17:30,800
And it'll be interesting to see how these new open source models change the game even more.

318
00:17:30,800 --> 00:17:35,040
I think DeepSeek and Moonshot AI have definitely shaken things up.

319
00:17:35,040 --> 00:17:39,760
Their success, especially DeepSeek's low costs, has got everyone's attention.

320
00:17:39,760 --> 00:17:43,760
Yeah, even the big companies like Microsoft, Google, and Meta are taking notice.

321
00:17:43,760 --> 00:17:47,120
They're having to rethink their strategies to stay competitive.

322
00:17:47,120 --> 00:17:50,320
And the investors are probably watching this all very closely too.

323
00:17:50,320 --> 00:17:51,120
Oh, for sure.

324
00:17:51,120 --> 00:17:52,400
This is a big deal.

325
00:17:52,400 --> 00:17:54,320
And not just for the corporate world.

326
00:17:54,320 --> 00:17:59,200
We're seeing more and more support for open source AI all around.

327
00:17:59,200 --> 00:18:00,880
It's like a movement.

328
00:18:00,880 --> 00:18:01,360
It is.

329
00:18:01,360 --> 00:18:05,760
And the core of it is this fundamental question, who will control AI?

330
00:18:05,760 --> 00:18:10,640
Will it be a few powerful entities or will it benefit everyone?

331
00:18:10,640 --> 00:18:13,200
That's a big question and a really important one.

332
00:18:13,200 --> 00:18:17,440
I think the work that DeepSeek and Moonshot AI are doing gives us some hope though.

333
00:18:17,440 --> 00:18:22,560
They're proving that innovation can happen outside the big corporations and that collaboration

334
00:18:22,560 --> 00:18:25,120
and openness can lead to amazing breakthroughs.

335
00:18:25,120 --> 00:18:26,400
It's a powerful message.

336
00:18:26,400 --> 00:18:26,800
Yeah.

337
00:18:26,800 --> 00:18:31,840
And a really inspiring one for anyone who believes in making technology accessible to everyone.

338
00:18:31,840 --> 00:18:35,840
Well, folks, that brings us to the end of our deep dive into the world of cutting edge AI.

339
00:18:35,840 --> 00:18:40,240
We've explored the rise of these two incredible companies, DeepSeek and Moonshot AI,

340
00:18:40,240 --> 00:18:43,840
and we pondered the implications of their open source approach for the future.

341
00:18:43,840 --> 00:18:48,240
We'd love to hear your thoughts on all of this. Head over to our website and share your reactions.

342
00:18:48,240 --> 00:18:51,440
Does this new wave of open source AI inspire you?

343
00:18:51,440 --> 00:18:52,800
Does it make you nervous?

344
00:18:52,800 --> 00:18:53,680
Let us know.

345
00:18:53,680 --> 00:18:58,320
And until next time, keep exploring, keep questioning, and keep diving deep.

