1
00:00:00,000 --> 00:00:02,600
Welcome to the Daily AI News podcast.

2
00:00:02,600 --> 00:00:04,880
Get ready for another deep dive today.

3
00:00:04,880 --> 00:00:07,400
We've got a ton of really cool stories today

4
00:00:07,400 --> 00:00:10,880
from image generation to video understanding

5
00:00:10,880 --> 00:00:14,160
and even ChatGPT's been doing some stuff.

6
00:00:14,160 --> 00:00:16,680
But I guess the big theme is how AI models

7
00:00:16,680 --> 00:00:18,800
are interacting with different kinds of data.

8
00:00:18,800 --> 00:00:19,680
Yeah, that's right.

9
00:00:19,680 --> 00:00:20,520
Like video.

10
00:00:20,520 --> 00:00:22,400
Yeah, it's always been pretty tricky for AI

11
00:00:22,400 --> 00:00:25,120
to really process and understand video content.

12
00:00:25,120 --> 00:00:27,680
So seeing progress in that area is really exciting.

13
00:00:27,680 --> 00:00:29,600
Okay, so let's start with this new research paper

14
00:00:29,600 --> 00:00:32,640
from Meta and Stanford about video understanding

15
00:00:32,640 --> 00:00:34,960
in large multimodal models.

16
00:00:34,960 --> 00:00:37,440
Right, so they basically dug into all these design

17
00:00:37,440 --> 00:00:38,400
and training factors.

18
00:00:38,400 --> 00:00:40,880
Their findings are like gold for optimizing

19
00:00:40,880 --> 00:00:42,920
how these models learn and do their thing.

20
00:00:42,920 --> 00:00:44,800
Okay, so one of their key discoveries

21
00:00:44,800 --> 00:00:47,160
is this scaling consistency.

22
00:00:47,160 --> 00:00:49,960
It's basically like you're trying out a cake recipe

23
00:00:49,960 --> 00:00:51,560
but you don't wanna waste a bunch of ingredients

24
00:00:51,560 --> 00:00:53,240
on a huge cake if it doesn't turn out.

25
00:00:53,240 --> 00:00:55,120
So you make a small test batch first.

26
00:00:55,120 --> 00:00:56,600
That's perfect, yeah.

27
00:00:56,600 --> 00:00:57,720
It's incredibly efficient.

28
00:00:57,720 --> 00:00:59,880
Like why waste time and resources

29
00:00:59,880 --> 00:01:03,160
on training these huge massive models?

30
00:01:03,160 --> 00:01:05,640
When you can figure things out with smaller models

31
00:01:05,640 --> 00:01:08,360
and have confidence, those findings

32
00:01:08,360 --> 00:01:10,080
will translate to the bigger ones.

33
00:01:10,080 --> 00:01:11,880
Yeah, it's like making sure your recipe works

34
00:01:11,880 --> 00:01:13,520
before you go all out.

35
00:01:13,520 --> 00:01:17,200
Okay, so in the paper, they introduced this family

36
00:01:17,200 --> 00:01:19,400
of video language models called Apollo.

37
00:01:19,400 --> 00:01:21,760
Apollo is pretty groundbreaking actually.

38
00:01:21,760 --> 00:01:24,480
It's a big jump forward in video understanding.

39
00:01:24,480 --> 00:01:25,960
Like setting new benchmarks

40
00:01:25,960 --> 00:01:27,800
for how well these models perform.

41
00:01:27,800 --> 00:01:31,440
So what does that actually mean for us non-AI experts?

42
00:01:31,440 --> 00:01:33,880
Well, think about searching for a specific moment

43
00:01:33,880 --> 00:01:36,400
in a video just by describing what happens.

44
00:01:36,400 --> 00:01:38,800
That's the kind of capability Apollo is unlocking.

45
00:01:38,800 --> 00:01:40,920
Oh, wow, so like if I'm looking for the part

46
00:01:40,920 --> 00:01:43,320
in a cooking video where they add the secret ingredient,

47
00:01:43,320 --> 00:01:45,160
I could just type in, show me

48
00:01:45,160 --> 00:01:46,600
when they add the secret ingredient.

49
00:01:46,600 --> 00:01:47,560
Exactly.

50
00:01:47,560 --> 00:01:48,520
Wow, that is super cool.

51
00:01:48,520 --> 00:01:50,200
So there were some pretty interesting details

52
00:01:50,200 --> 00:01:51,040
in the paper too.

53
00:01:51,040 --> 00:01:53,160
Yeah, lots of little things that make a big difference.

54
00:01:53,160 --> 00:01:55,080
Like they found that frames per second sampling

55
00:01:55,080 --> 00:01:56,800
is better than uniform sampling.

56
00:01:56,800 --> 00:01:58,880
And there's also this sweet spot

57
00:01:58,880 --> 00:02:01,680
for the number of tokens per frame,

58
00:02:01,680 --> 00:02:03,720
with 8 to 32 being optimal.

59
00:02:03,720 --> 00:02:04,680
Oh, interesting.

60
00:02:04,680 --> 00:02:07,400
And get this, just adding simple tokens

61
00:02:07,400 --> 00:02:10,560
between video tokens from different frames or clips

62
00:02:10,560 --> 00:02:12,560
is enough for efficient integration.

63
00:02:12,560 --> 00:02:14,560
Right, it's like giving the AI little hints

64
00:02:14,560 --> 00:02:15,800
to help it connect the dots.

65
00:02:15,800 --> 00:02:18,400
Yeah, and speaking of things that are coming out right now,

66
00:02:18,400 --> 00:02:20,680
Meta just announced a major update

67
00:02:20,680 --> 00:02:23,440
to their Ray-Ban Meta smart glasses.

68
00:02:23,440 --> 00:02:26,880
They're adding real-time AI video capabilities.

69
00:02:26,880 --> 00:02:27,720
That's pretty awesome.

70
00:02:27,720 --> 00:02:29,600
It's like seeing those cutting-edge research ideas

71
00:02:29,600 --> 00:02:31,000
coming to life right in front of us.

72
00:02:31,000 --> 00:02:32,640
Right, and it's happening so fast.

73
00:02:32,640 --> 00:02:35,080
So what can you actually do with these new glasses?

74
00:02:35,080 --> 00:02:39,080
Well, imagine having ongoing conversations with Meta AI,

75
00:02:39,080 --> 00:02:40,520
but through your glasses.

76
00:02:40,520 --> 00:02:41,360
Oh, wow.

77
00:02:41,360 --> 00:02:42,480
Like it can understand what you're seeing

78
00:02:42,480 --> 00:02:44,280
and provide real-time information.

79
00:02:44,280 --> 00:02:46,240
You can even get real-time translations.

80
00:02:46,240 --> 00:02:48,480
Oh my gosh, like if I was traveling in a foreign country,

81
00:02:48,480 --> 00:02:51,000
I could just look at a sign and it would translate it for me.

82
00:02:51,000 --> 00:02:53,080
Exactly, it's like having a superpower.

83
00:02:53,080 --> 00:02:54,200
It really is.

84
00:02:54,200 --> 00:02:55,480
Okay, let's switch gears a bit

85
00:02:55,480 --> 00:02:58,120
and talk about image and video generation.

86
00:02:58,120 --> 00:03:00,200
Google just released some major updates

87
00:03:00,200 --> 00:03:02,880
with Veo 2 and Imagen 3,

88
00:03:02,880 --> 00:03:06,480
and both of them are achieving state-of-the-art results.

89
00:03:06,480 --> 00:03:08,160
Yeah, they're really pushing the boundaries

90
00:03:08,160 --> 00:03:09,560
of what's possible.

91
00:03:09,560 --> 00:03:11,560
So what does state-of-the-art actually mean?

92
00:03:11,560 --> 00:03:15,000
Well, it basically means these models are creating images

93
00:03:15,000 --> 00:03:17,760
and videos that are more realistic and detailed

94
00:03:17,760 --> 00:03:19,440
than anything we've seen before.

95
00:03:19,440 --> 00:03:20,800
So let's start with Veo 2.

96
00:03:20,800 --> 00:03:22,760
This was all about video generation, right?

97
00:03:22,760 --> 00:03:24,920
Yeah, and it's not just about creating videos,

98
00:03:24,920 --> 00:03:26,280
it's about understanding them.

99
00:03:26,280 --> 00:03:29,240
Like Veo 2 can actually understand physics

100
00:03:29,240 --> 00:03:30,480
and human movement.

101
00:03:30,480 --> 00:03:32,320
Oh wow, so it wouldn't create a video

102
00:03:32,320 --> 00:03:34,640
of someone walking through a wall.

103
00:03:34,640 --> 00:03:36,760
Right, it understands how things move

104
00:03:36,760 --> 00:03:38,400
and interact in the real world,

105
00:03:38,400 --> 00:03:40,600
and it can even respond to cinematic prompts

106
00:03:40,600 --> 00:03:42,560
like camera angles and lens types.

107
00:03:42,560 --> 00:03:44,880
Oh, so I could tell it to use a low-angle tracking shot.

108
00:03:44,880 --> 00:03:45,720
Yeah, exactly.

109
00:03:45,720 --> 00:03:48,360
It's like having a virtual film crew at your fingertips.

110
00:03:48,360 --> 00:03:52,040
That is wild, and it produces fewer unwanted details.

111
00:03:52,040 --> 00:03:55,640
Which we call hallucinations in the AI world.

112
00:03:55,640 --> 00:03:58,560
Right, so like if it was generating a video of a cat,

113
00:03:58,560 --> 00:04:00,920
it wouldn't accidentally give the cat

114
00:04:00,920 --> 00:04:02,240
like six legs or something.

115
00:04:02,240 --> 00:04:04,240
Exactly, those weird little glitches

116
00:04:04,240 --> 00:04:06,880
are becoming less and less common.

117
00:04:06,880 --> 00:04:08,280
Okay, so what about Imagen 3?

118
00:04:08,280 --> 00:04:10,760
Imagen 3 is all about image generation,

119
00:04:10,760 --> 00:04:12,960
but it's not just about making pretty pictures,

120
00:04:12,960 --> 00:04:14,920
it's about artistic expression.

121
00:04:14,920 --> 00:04:17,320
Like it can understand different art styles

122
00:04:17,320 --> 00:04:19,560
and render them with incredible accuracy.

123
00:04:19,560 --> 00:04:22,960
Oh, so like if I wanted a picture in the style of Van Gogh.

124
00:04:22,960 --> 00:04:24,880
You could tell Imagen 3 to do that,

125
00:04:24,880 --> 00:04:25,880
and it would create something

126
00:04:25,880 --> 00:04:27,920
that looks like a real Van Gogh painting.

127
00:04:27,920 --> 00:04:28,800
That's insane.

128
00:04:28,800 --> 00:04:31,120
And it's not just mimicking existing styles,

129
00:04:31,120 --> 00:04:32,600
it can also blend styles together

130
00:04:32,600 --> 00:04:34,240
and create something entirely new.

131
00:04:34,240 --> 00:04:35,720
So what are some examples of things

132
00:04:35,720 --> 00:04:36,920
Imagen 3 has created?

133
00:04:36,920 --> 00:04:40,920
It can render a winter wonderland scene with a red squirrel,

134
00:04:40,920 --> 00:04:43,760
or a hyperrealistic image of a strawberry sculpted

135
00:04:43,760 --> 00:04:46,360
into a hummingbird, or even bring to life

136
00:04:46,360 --> 00:04:49,160
a foggy 1940s European train station.

137
00:04:49,160 --> 00:04:50,280
Oh wow, that's amazing.

138
00:04:50,280 --> 00:04:51,520
It's really pushing the boundaries

139
00:04:51,520 --> 00:04:53,640
of what we thought was possible with AI.

140
00:04:53,640 --> 00:04:55,080
Okay, so let's move on to a topic

141
00:04:55,080 --> 00:04:56,960
that's near and dear to my heart.

142
00:04:56,960 --> 00:04:58,720
ChatGPT.

143
00:04:58,720 --> 00:05:00,200
It seems like it's becoming way more

144
00:05:00,200 --> 00:05:02,040
than just a simple chatbot.

145
00:05:02,040 --> 00:05:04,560
Yeah, OpenAI is definitely positioning it

146
00:05:04,560 --> 00:05:07,040
as a central hub for all sorts of tasks.

147
00:05:07,040 --> 00:05:09,000
Right, so it's not just about writing anymore.

148
00:05:09,000 --> 00:05:12,480
It's also for research, programming, even web search.

149
00:05:12,480 --> 00:05:14,280
It's like an everything app for AI.

150
00:05:14,280 --> 00:05:16,200
Yeah, that's a good way to put it.

151
00:05:16,200 --> 00:05:18,600
So one of the new features is called Projects.

152
00:05:18,600 --> 00:05:21,200
Right, it's basically a way to organize your conversations

153
00:05:21,200 --> 00:05:22,520
within ChatGPT.

154
00:05:22,520 --> 00:05:24,800
Like having different folders for your various projects.

155
00:05:24,800 --> 00:05:27,280
Yeah, it can be really helpful for staying organized,

156
00:05:27,280 --> 00:05:31,040
especially if you're using ChatGPT for complex tasks

157
00:05:31,040 --> 00:05:32,880
or managing multiple things at once.

158
00:05:32,880 --> 00:05:34,640
Okay, that makes a lot of sense.

159
00:05:34,640 --> 00:05:36,400
And speaking of new features,

160
00:05:36,400 --> 00:05:39,520
ChatGPT's AI search engine is finally rolling out

161
00:05:39,520 --> 00:05:40,520
to all users.

162
00:05:40,520 --> 00:05:41,720
Yeah, this is a pretty big deal.

163
00:05:41,720 --> 00:05:43,200
It's not just a regular search engine.

164
00:05:43,200 --> 00:05:46,680
It uses AI to understand what you're actually looking for.

165
00:05:46,680 --> 00:05:48,440
So it's like, it can read your mind.

166
00:05:48,440 --> 00:05:50,880
Well, not quite, but it gets pretty close.

167
00:05:50,880 --> 00:05:53,480
It can pick up on subtle cues and nuances

168
00:05:53,480 --> 00:05:56,120
in your search queries to deliver results

169
00:05:56,120 --> 00:05:58,960
that are more personalized and relevant to your needs.

170
00:05:58,960 --> 00:06:01,480
And I heard that the mobile version is really interesting.

171
00:06:01,480 --> 00:06:04,480
Yeah, they actually designed it to look and feel

172
00:06:04,480 --> 00:06:06,400
more like a traditional search engine.

173
00:06:06,400 --> 00:06:09,160
So it has things like images and ratings and...

174
00:06:09,160 --> 00:06:11,280
Maps and directions, all that stuff.

175
00:06:11,280 --> 00:06:12,920
Okay, so why would they do that?

176
00:06:12,920 --> 00:06:15,400
Well, a lot of people are used to traditional search engines.

177
00:06:15,400 --> 00:06:18,520
So it makes sense to create something that feels familiar.

178
00:06:18,520 --> 00:06:20,200
Right, it's like meeting people where they are.

179
00:06:20,200 --> 00:06:21,040
Exactly.

180
00:06:21,040 --> 00:06:22,960
Okay, now let's talk about NotebookLM

181
00:06:22,960 --> 00:06:24,440
because there are some exciting updates

182
00:06:24,440 --> 00:06:25,960
to this tool as well.

183
00:06:25,960 --> 00:06:28,720
So remind me again, what is NotebookLM?

184
00:06:28,720 --> 00:06:32,320
It's basically an AI-powered research notebook.

185
00:06:32,320 --> 00:06:33,400
So what does that mean?

186
00:06:33,400 --> 00:06:36,840
Well, you can upload documents, summarize key points,

187
00:06:36,840 --> 00:06:38,320
highlight important passages,

188
00:06:38,320 --> 00:06:41,680
and even generate audio overviews of your research material.

189
00:06:41,680 --> 00:06:42,520
Oh, wow.

190
00:06:42,520 --> 00:06:44,080
So it's like having a research assistant

191
00:06:44,080 --> 00:06:46,000
who can do all the heavy lifting for you.

192
00:06:46,000 --> 00:06:47,960
Yeah, that's a good way to think about it.

193
00:06:47,960 --> 00:06:49,560
And they've just released a premium version

194
00:06:49,560 --> 00:06:51,600
called NotebookLM Plus.

195
00:06:51,600 --> 00:06:53,400
Which comes with all sorts of extra features,

196
00:06:53,400 --> 00:06:56,560
like increased usage limits, customization options,

197
00:06:56,560 --> 00:06:58,720
shared team notebooks with analytics,

198
00:06:58,720 --> 00:07:01,160
and even enhanced privacy and security.

199
00:07:01,160 --> 00:07:04,240
Oh, wow, that sounds like a power user's dream.

200
00:07:04,240 --> 00:07:06,000
But wait, it gets even better.

201
00:07:06,000 --> 00:07:10,200
NotebookLM is now integrated with Gemini 2.0 Flash.

202
00:07:10,200 --> 00:07:11,480
That's a huge upgrade.

203
00:07:11,480 --> 00:07:13,160
So what does that mean for users?

204
00:07:13,160 --> 00:07:14,600
Well, during those audio overviews,

205
00:07:14,600 --> 00:07:17,240
you can now interact with AI hosts.

206
00:07:17,240 --> 00:07:19,560
Okay, so I'm listening to a summary of a research paper.

207
00:07:19,560 --> 00:07:22,920
And you can actually pause and ask the AI a question,

208
00:07:22,920 --> 00:07:25,040
like to clarify something or delve deeper

209
00:07:25,040 --> 00:07:26,400
into a specific topic.

210
00:07:26,400 --> 00:07:27,640
Oh, wow, that's amazing.

211
00:07:27,640 --> 00:07:28,760
It's like having a conversation

212
00:07:28,760 --> 00:07:30,480
with the experts who wrote the paper.

213
00:07:30,480 --> 00:07:31,760
That is so cool.

214
00:07:31,760 --> 00:07:33,840
Okay, but not all the news in the AI world

215
00:07:33,840 --> 00:07:35,760
is sunshine and roses.

216
00:07:35,760 --> 00:07:37,560
We need to talk about this legal battle

217
00:07:37,560 --> 00:07:39,920
that's brewing between Elon Musk and OpenAI.

218
00:07:39,920 --> 00:07:41,480
Yeah, this one's a bit messy.

219
00:07:41,480 --> 00:07:44,360
It's basically a clash of visions about the future of AI.

220
00:07:44,360 --> 00:07:46,920
So Elon Musk was one of the early investors

221
00:07:46,920 --> 00:07:49,360
and board members of OpenAI, right?

222
00:07:49,360 --> 00:07:50,960
And he's now claiming that the company

223
00:07:50,960 --> 00:07:52,840
has strayed from its original mission.

224
00:07:52,840 --> 00:07:55,840
Which was to develop AI safely and ethically

225
00:07:55,840 --> 00:07:57,400
for the benefit of humanity.

226
00:07:57,400 --> 00:08:00,240
And he's worried that OpenAI is now more focused

227
00:08:00,240 --> 00:08:02,680
on profits than on ethics.

228
00:08:02,680 --> 00:08:04,160
It's a tricky situation for sure.

229
00:08:04,160 --> 00:08:06,440
So what exactly is Musk alleging?

230
00:08:06,440 --> 00:08:09,280
Well, court filings reveal that Musk's original vision

231
00:08:09,280 --> 00:08:11,640
for OpenAI was as a nonprofit.

232
00:08:11,640 --> 00:08:14,640
He wanted it to be a force for good in the world.

233
00:08:14,640 --> 00:08:16,920
And he's concerned that the company's shift

234
00:08:16,920 --> 00:08:19,960
to a for-profit model has compromised that mission.

235
00:08:19,960 --> 00:08:21,840
Yeah, he's basically saying that OpenAI

236
00:08:21,840 --> 00:08:23,680
has betrayed its original principles.

237
00:08:23,680 --> 00:08:25,800
And there are even emails from 2017

238
00:08:25,800 --> 00:08:28,000
showing this tension between Musk and Altman

239
00:08:28,000 --> 00:08:30,640
and the other co-founders over control of the company.

240
00:08:30,640 --> 00:08:31,840
It was a real power struggle.

241
00:08:31,840 --> 00:08:34,400
So it's like a real-life drama about the future of AI.

242
00:08:34,400 --> 00:08:35,240
It really is.

243
00:08:35,240 --> 00:08:36,800
It'll be interesting to see how this all plays out.

244
00:08:36,800 --> 00:08:37,800
Definitely.

245
00:08:37,800 --> 00:08:39,760
Okay, let's end on a slightly lighter note.

246
00:08:39,760 --> 00:08:43,680
Ilya Sutskever, a former chief scientist at OpenAI,

247
00:08:43,680 --> 00:08:45,480
made a pretty interesting prediction recently.

248
00:08:45,480 --> 00:08:48,280
Yeah, he said that AI with reasoning power

249
00:08:48,280 --> 00:08:50,000
will become less predictable.

250
00:08:50,000 --> 00:08:51,240
Okay, so what does that mean?

251
00:08:51,240 --> 00:08:53,160
Well, as AI models get smarter

252
00:08:53,160 --> 00:08:55,480
and more capable of complex reasoning,

253
00:08:55,480 --> 00:08:58,080
they'll be able to consider millions of possibilities

254
00:08:58,080 --> 00:08:59,960
before making a decision,

255
00:08:59,960 --> 00:09:02,360
which means their actions will become less obvious

256
00:09:02,360 --> 00:09:04,240
and more surprising to us.

257
00:09:04,240 --> 00:09:06,440
So it's like they'll be able to think outside the box.

258
00:09:06,440 --> 00:09:07,280
Exactly.

259
00:09:07,280 --> 00:09:10,360
And that could lead to some really amazing breakthroughs.

260
00:09:10,360 --> 00:09:12,640
But it also raises questions about control

261
00:09:12,640 --> 00:09:16,040
and ensuring that AI remains aligned with human values.

262
00:09:16,040 --> 00:09:18,480
It's like the more intelligent AI becomes,

263
00:09:18,480 --> 00:09:20,280
the less we'll be able to predict what it will do.

264
00:09:20,280 --> 00:09:22,960
Yeah, it's both exciting and a little bit scary.

265
00:09:22,960 --> 00:09:24,880
It's definitely something to think about.

266
00:09:24,880 --> 00:09:26,360
Okay, so let's shift gears one more time

267
00:09:26,360 --> 00:09:30,200
and talk about this new research on Best-of-N jailbreaking.

268
00:09:30,200 --> 00:09:33,240
Okay, so jailbreaking is basically a way to trick a chatbot

269
00:09:33,240 --> 00:09:35,760
into giving you answers it's not supposed to.

270
00:09:35,760 --> 00:09:37,960
Like bypassing the safety protocols.

271
00:09:37,960 --> 00:09:38,800
Exactly.

272
00:09:38,800 --> 00:09:41,040
So is this research saying that it's becoming easier

273
00:09:41,040 --> 00:09:42,920
to jailbreak these AI systems?

274
00:09:42,920 --> 00:09:45,240
Well, they found that the success rate of these attacks

275
00:09:45,240 --> 00:09:48,520
increases with the amount of computing power used.

276
00:09:48,520 --> 00:09:50,800
So basically if you have enough computers,

277
00:09:50,800 --> 00:09:54,280
you can eventually find a way to jailbreak any AI model.

278
00:09:54,280 --> 00:09:56,320
Oh, that's a little unsettling.

279
00:09:56,320 --> 00:09:58,800
Yeah, it's a reminder that we need to be constantly

280
00:09:58,800 --> 00:10:00,200
improving our defenses.

281
00:10:00,200 --> 00:10:01,320
Right, it's like an arms race.

282
00:10:01,320 --> 00:10:02,160
Exactly.

283
00:10:02,160 --> 00:10:04,720
Okay, but the good news is that the company

284
00:10:04,720 --> 00:10:07,360
behind this research, Anthropic,

285
00:10:07,360 --> 00:10:11,480
responsibly disclosed the vulnerability to other AI labs

286
00:10:11,480 --> 00:10:13,360
through the Frontier Model Forum.

287
00:10:13,360 --> 00:10:15,880
Yeah, that's really important for the AI safety community.

288
00:10:15,880 --> 00:10:17,080
We need to be working together

289
00:10:17,080 --> 00:10:18,840
to make these systems more secure.

290
00:10:18,840 --> 00:10:19,680
Totally.

291
00:10:19,680 --> 00:10:22,680
Okay, so for our last story in this part of our deep dive,

292
00:10:22,680 --> 00:10:27,000
let's zoom out and look at the big picture of AI investment.

293
00:10:27,000 --> 00:10:29,520
SoftBank CEO, Masayoshi Son,

294
00:10:29,520 --> 00:10:32,160
just announced a massive investment plan.

295
00:10:32,160 --> 00:10:33,200
A hundred billion dollars.

296
00:10:33,200 --> 00:10:35,680
Yes, a hundred billion dollars over the next four years

297
00:10:35,680 --> 00:10:37,760
to support AI development in the US.

298
00:10:37,760 --> 00:10:39,360
That's a serious commitment.

299
00:10:39,360 --> 00:10:41,120
So what's behind this big move?

300
00:10:41,120 --> 00:10:43,720
Well, Son has a history of making bold bets

301
00:10:43,720 --> 00:10:45,400
on emerging technologies,

302
00:10:45,400 --> 00:10:48,800
and he clearly sees AI as the next big thing.

303
00:10:48,800 --> 00:10:50,320
So where's all this money gonna go?

304
00:10:50,320 --> 00:10:52,560
He's focusing on building the core infrastructure

305
00:10:52,560 --> 00:10:55,040
that AI development needs, like data centers,

306
00:10:55,040 --> 00:10:57,000
semiconductor manufacturing,

307
00:10:57,000 --> 00:10:58,800
and even energy infrastructure.

308
00:10:58,800 --> 00:11:01,120
So it's not just about developing new AI models.

309
00:11:01,120 --> 00:11:02,960
It's about creating the ecosystem

310
00:11:02,960 --> 00:11:05,240
that will allow AI to flourish.

311
00:11:05,240 --> 00:11:08,560
And that's it for part one of our AI deep dive.

312
00:11:08,560 --> 00:11:10,000
We've covered a lot of ground today,

313
00:11:10,000 --> 00:11:12,640
from video understanding and image generation,

314
00:11:12,640 --> 00:11:15,240
to the evolving world of ChatGPT,

315
00:11:15,240 --> 00:11:18,240
and the big picture trends in the AI landscape.

316
00:11:18,240 --> 00:11:20,560
Yeah, it's amazing how fast this field is moving.

317
00:11:20,560 --> 00:11:22,720
But there is still more to explore.

318
00:11:22,720 --> 00:11:24,480
Be sure to join us for part two,

319
00:11:24,480 --> 00:11:25,680
where we'll continue our journey

320
00:11:25,680 --> 00:11:27,400
through the fascinating world of AI.

321
00:11:27,400 --> 00:11:28,240
I can't wait.

322
00:11:28,240 --> 00:11:31,200
So we were talking about the ethical side of AI

323
00:11:31,200 --> 00:11:34,280
and those potential risks as AI gets more and more powerful.

324
00:11:34,280 --> 00:11:36,400
And this next story really dives into that.

325
00:11:36,400 --> 00:11:38,920
Okay, so this is about a paper exploring something

326
00:11:38,920 --> 00:11:40,720
called Best-of-N jailbreaking, right?

327
00:11:40,720 --> 00:11:41,560
Yes, exactly.

328
00:11:41,560 --> 00:11:43,280
Jailbreaking might sound kind of scary,

329
00:11:43,280 --> 00:11:44,440
but in the AI world,

330
00:11:44,440 --> 00:11:47,200
it's basically about tricking those powerful models.

331
00:11:47,200 --> 00:11:49,280
Oh, so like bypassing those safety protocols

332
00:11:49,280 --> 00:11:50,120
they have in place.

333
00:11:50,120 --> 00:11:50,960
Exactly.

334
00:11:50,960 --> 00:11:53,080
And what's concerning is this research found

335
00:11:53,080 --> 00:11:55,960
that those attack success rates increased predictably

336
00:11:55,960 --> 00:11:57,920
with the amount of computing power used.

337
00:11:57,920 --> 00:12:00,560
So like the more resources an attacker has,

338
00:12:00,560 --> 00:12:03,040
the easier it is to find those jailbreaks.

339
00:12:03,040 --> 00:12:04,840
Right, and that highlights something

340
00:12:04,840 --> 00:12:06,480
we really need to be thinking about.

341
00:12:06,480 --> 00:12:07,840
We need strong defenses,

342
00:12:07,840 --> 00:12:09,320
and we need to make sure they can keep up

343
00:12:09,320 --> 00:12:11,280
with how fast AI is evolving.

344
00:12:11,280 --> 00:12:13,160
Yeah, it's like they're always one step ahead.

345
00:12:13,160 --> 00:12:14,000
Exactly.

346
00:12:14,000 --> 00:12:16,920
And that's why I was happy to see that Anthropic,

347
00:12:16,920 --> 00:12:18,640
the company behind this research,

348
00:12:18,640 --> 00:12:20,720
actually disclosed this vulnerability.

349
00:12:20,720 --> 00:12:21,560
Oh, that's good.

350
00:12:21,560 --> 00:12:23,400
They shared it with other AI labs

351
00:12:23,400 --> 00:12:25,280
through the Frontier Model Forum, right?

352
00:12:25,280 --> 00:12:27,240
Yeah, that kind of collaboration

353
00:12:27,240 --> 00:12:29,040
is so important in the AI community.

354
00:12:29,040 --> 00:12:30,600
It's like we all need to work together

355
00:12:30,600 --> 00:12:33,040
to make sure these systems are as secure as possible.

356
00:12:33,040 --> 00:12:34,760
Okay, so shifting gears a bit,

357
00:12:34,760 --> 00:12:37,720
let's talk about the impact of AI on investments.

358
00:12:37,720 --> 00:12:40,360
Right, SoftBank CEO Masayoshi Son

359
00:12:40,360 --> 00:12:44,840
just announced this massive $100 billion investment plan.

360
00:12:44,840 --> 00:12:47,200
Wow, $100 billion?

361
00:12:47,200 --> 00:12:48,680
That's a huge number.

362
00:12:48,680 --> 00:12:49,720
What's the thinking behind that?

363
00:12:49,720 --> 00:12:52,440
Well, Son's known for making big moves in the tech world,

364
00:12:52,440 --> 00:12:54,920
and this shows he's really serious about AI.

365
00:12:54,920 --> 00:12:56,520
So where is all that money going?

366
00:12:56,520 --> 00:12:59,440
He's focusing on the infrastructure AI needs to grow,

367
00:12:59,440 --> 00:13:00,840
things like data centers,

368
00:13:00,840 --> 00:13:03,360
semiconductor manufacturing, and even energy.

369
00:13:03,360 --> 00:13:05,400
So he's really laying the groundwork.

370
00:13:05,400 --> 00:13:06,600
It's a strategic move.

371
00:13:06,600 --> 00:13:09,080
It could really accelerate how AI is used

372
00:13:09,080 --> 00:13:10,520
in all sorts of industries.

373
00:13:10,520 --> 00:13:12,680
It's interesting to see these big investments

374
00:13:12,680 --> 00:13:14,400
shaping the future of AI.

375
00:13:14,400 --> 00:13:16,200
It really does bring up that question

376
00:13:16,200 --> 00:13:17,680
we keep coming back to.

377
00:13:17,680 --> 00:13:19,480
What does all this mean for society?

378
00:13:19,480 --> 00:13:20,560
Yeah, that's a big one.

379
00:13:20,560 --> 00:13:23,000
Okay, let's go back to ChatGPT for a minute.

380
00:13:23,000 --> 00:13:25,760
OpenAI just keeps adding new features, it seems like.

381
00:13:25,760 --> 00:13:28,560
They really want to make ChatGPT more user-friendly

382
00:13:28,560 --> 00:13:30,240
and accessible for everyone.

383
00:13:30,240 --> 00:13:32,320
And one of those updates is Projects, right?

384
00:13:32,320 --> 00:13:34,640
Yeah, basically think of it as folders

385
00:13:34,640 --> 00:13:37,000
for your ChatGPT conversations.

386
00:13:37,000 --> 00:13:40,840
Okay, so I can organize my chats by topic or project or...

387
00:13:40,840 --> 00:13:42,880
Exactly, it can really help you stay organized,

388
00:13:42,880 --> 00:13:44,480
especially if you're using ChatGPT

389
00:13:44,480 --> 00:13:45,840
for lots of different things.

390
00:13:45,840 --> 00:13:46,680
Yeah, that makes sense.

391
00:13:46,680 --> 00:13:48,160
I'm always losing track of things.

392
00:13:48,160 --> 00:13:49,760
Well, Projects might help with that.

393
00:13:49,760 --> 00:13:51,600
And speaking of new features,

394
00:13:51,600 --> 00:13:55,800
ChatGPT's AI search engine is available to everyone now.

395
00:13:55,800 --> 00:13:56,760
That's exciting news.

396
00:13:56,760 --> 00:13:59,320
This is a pretty big step forward in search.

397
00:13:59,320 --> 00:14:00,160
So what makes it different

398
00:14:00,160 --> 00:14:01,960
from those traditional search engines?

399
00:14:01,960 --> 00:14:05,440
Well, traditional search engines really rely on keywords,

400
00:14:05,440 --> 00:14:08,600
but ChatGPT uses AI to understand

401
00:14:08,600 --> 00:14:10,120
what you're actually looking for.

402
00:14:10,120 --> 00:14:12,080
So it can understand the context of my search.

403
00:14:12,080 --> 00:14:14,160
Exactly, it picks up on those nuances

404
00:14:14,160 --> 00:14:16,280
to give you more personalized results.

405
00:14:16,280 --> 00:14:17,400
And I heard the mobile version

406
00:14:17,400 --> 00:14:19,080
is designed to feel familiar.

407
00:14:19,080 --> 00:14:21,400
Yeah, it looks a lot like a traditional search engine

408
00:14:21,400 --> 00:14:23,400
with images, ratings, and...

409
00:14:23,400 --> 00:14:24,360
Maps and all that.

410
00:14:24,360 --> 00:14:26,680
Exactly, they wanna make it easy for people to use.

411
00:14:26,680 --> 00:14:28,520
Okay, so let's talk about NotebookLM too.

412
00:14:28,520 --> 00:14:30,320
There are some updates to that as well, right?

413
00:14:30,320 --> 00:14:32,240
Yes, NotebookLM is basically

414
00:14:32,240 --> 00:14:35,560
like a super-powered research notebook.

415
00:14:35,560 --> 00:14:36,760
Remind me what it does again?

416
00:14:36,760 --> 00:14:38,320
Well, you can upload documents,

417
00:14:38,320 --> 00:14:39,960
summarize key points,

418
00:14:39,960 --> 00:14:41,760
highlight important passages,

419
00:14:41,760 --> 00:14:44,480
even generate audio overviews of your research.

420
00:14:44,480 --> 00:14:46,360
Oh, wow, so it's like having a research assistant.

421
00:14:46,360 --> 00:14:48,640
Exactly, and they just launched a premium version

422
00:14:48,640 --> 00:14:50,480
called NotebookLM Plus.

423
00:14:50,480 --> 00:14:51,320
Which has?

424
00:14:51,320 --> 00:14:54,440
Increased usage limits, customization options,

425
00:14:54,440 --> 00:14:56,520
shared notebooks for teams,

426
00:14:56,520 --> 00:14:58,680
and even better privacy and security.

427
00:14:58,680 --> 00:15:00,360
Wow, that sounds amazing.

428
00:15:00,360 --> 00:15:01,200
Yeah.

429
00:15:01,200 --> 00:15:02,040
But wait, there's more.

430
00:15:02,040 --> 00:15:04,800
It's now integrated with Gemini 2.0 Flash.

431
00:15:04,800 --> 00:15:07,280
Oh yeah, that integration is huge.

432
00:15:07,280 --> 00:15:08,600
So what does that mean for users?

433
00:15:08,600 --> 00:15:10,120
During those audio overviews,

434
00:15:10,120 --> 00:15:12,960
you can actually interact with AI hosts.

435
00:15:12,960 --> 00:15:15,800
No way, so I could be listening to a summary of a paper.

436
00:15:15,800 --> 00:15:17,200
And if you need clarification,

437
00:15:17,200 --> 00:15:19,200
you can actually ask the AI.

438
00:15:19,200 --> 00:15:20,240
That is incredible.

439
00:15:20,240 --> 00:15:21,560
It's like having a personal tutor.

440
00:15:21,560 --> 00:15:23,640
Yeah, it's really pushing the boundaries

441
00:15:23,640 --> 00:15:26,680
of how we learn and work with information.

442
00:15:26,680 --> 00:15:29,480
Okay, so now for a story that reminds us that

443
00:15:29,480 --> 00:15:32,720
even with all these advancements, AI isn't perfect.

444
00:15:32,720 --> 00:15:36,640
Apparently Apple's new AI notification feature

445
00:15:36,640 --> 00:15:37,560
messed up a little bit.

446
00:15:37,560 --> 00:15:39,960
Yeah, it generated this false headline

447
00:15:39,960 --> 00:15:41,240
about a murder case.

448
00:15:41,240 --> 00:15:42,960
Oh no, that's not good.

449
00:15:42,960 --> 00:15:43,800
What happened?

450
00:15:43,800 --> 00:15:46,040
So it was summarizing a news article from the BBC

451
00:15:46,040 --> 00:15:49,200
about the murder of this healthcare insurance CEO.

452
00:15:49,200 --> 00:15:50,800
But it got the facts wrong and said

453
00:15:50,800 --> 00:15:52,840
the suspect shot himself.

454
00:15:52,840 --> 00:15:55,760
Oh wow, so the AI actually presented misinformation.

455
00:15:55,760 --> 00:15:58,200
Yeah, and it's a good reminder that even though AI

456
00:15:58,200 --> 00:16:00,120
is getting better at summarizing information,

457
00:16:00,120 --> 00:16:01,880
it's still prone to errors.

458
00:16:01,880 --> 00:16:04,840
So we can't just blindly trust everything an AI tells us.

459
00:16:04,840 --> 00:16:07,320
Exactly, we still need to be critical thinkers.

460
00:16:07,320 --> 00:16:09,440
Okay, back to Meta for a second.

461
00:16:09,440 --> 00:16:10,880
It seems like they're everywhere.

462
00:16:10,880 --> 00:16:12,160
What's this new story about?

463
00:16:12,160 --> 00:16:14,840
They're updating their Ray-Ban Meta smart glasses

464
00:16:14,840 --> 00:16:18,240
to include real-time AI video capabilities.

465
00:16:18,240 --> 00:16:19,840
Wow, so what does that even mean?

466
00:16:19,840 --> 00:16:22,200
Think of it like having a personal AI assistant

467
00:16:22,200 --> 00:16:23,680
who can see what you're seeing.

468
00:16:23,680 --> 00:16:25,600
You can ask it questions about your surroundings,

469
00:16:25,600 --> 00:16:28,600
get real-time info, even have conversations translated.

470
00:16:28,600 --> 00:16:30,720
Oh my gosh, that's like straight out of a sci-fi movie.

471
00:16:30,720 --> 00:16:31,560
It really is.

472
00:16:31,560 --> 00:16:33,040
Imagine traveling in a foreign country

473
00:16:33,040 --> 00:16:35,440
and your glasses just translate everything for you.

474
00:16:35,440 --> 00:16:36,960
That would be amazing.

475
00:16:36,960 --> 00:16:38,960
But it also makes me think about privacy.

476
00:16:38,960 --> 00:16:41,120
Yeah, that's a valid concern. What happens

477
00:16:41,120 --> 00:16:42,640
to all that data being collected?

478
00:16:42,640 --> 00:16:43,680
Exactly.

479
00:16:43,680 --> 00:16:46,400
Okay, let's switch gears again and talk about Google.

480
00:16:46,400 --> 00:16:48,720
They've been making waves with their image

481
00:16:48,720 --> 00:16:52,520
and video generation models, Veo 2 and Imagen 3.

482
00:16:52,520 --> 00:16:54,160
Veo 2 is really impressive.

483
00:16:54,160 --> 00:16:55,960
It can create these lifelike videos

484
00:16:55,960 --> 00:16:58,920
that understand physics, movement, and facial expressions.

485
00:16:58,920 --> 00:17:01,760
And it can even respond to directions like camera angles.

486
00:17:01,760 --> 00:17:04,640
Exactly, it's like having a virtual film crew.

487
00:17:04,640 --> 00:17:05,720
That's insane.

488
00:17:05,720 --> 00:17:07,920
And Imagen 3 is equally impressive.

489
00:17:07,920 --> 00:17:10,240
It can generate super realistic images

490
00:17:10,240 --> 00:17:15,000
from landscapes to portraits to just imaginative scenes.

491
00:17:15,000 --> 00:17:17,840
It's like AI is making it hard to tell what's real and what's not.

492
00:17:17,840 --> 00:17:19,400
It really does raise those questions

493
00:17:19,400 --> 00:17:22,640
about authenticity and ownership, especially with deepfakes.

494
00:17:22,640 --> 00:17:23,680
Yeah, it's a double-edged sword.

495
00:17:23,680 --> 00:17:26,320
For sure, but these are the conversations we need to have.

496
00:17:26,320 --> 00:17:28,680
We need to be thoughtful about how we use AI.

497
00:17:28,680 --> 00:17:29,400
That's a great point.

498
00:17:29,400 --> 00:17:31,000
We need to make sure we're using it for good.

499
00:17:31,000 --> 00:17:32,040
Exactly.

500
00:17:32,040 --> 00:17:35,680
Okay, so that brings us to the end of part two of our deep dive.

501
00:17:35,680 --> 00:17:38,880
We've covered a lot of ground from the ethics of AI

502
00:17:38,880 --> 00:17:40,920
to those incredible advancements.

503
00:17:40,920 --> 00:17:42,920
And we've still got more to explore.

504
00:17:42,920 --> 00:17:44,680
So be sure to join us for part three

505
00:17:44,680 --> 00:17:47,920
where we'll dive into some more fascinating developments.

506
00:17:47,920 --> 00:17:50,960
Okay, so we're back for the final part of our deep dive.

507
00:17:50,960 --> 00:17:52,160
And we're kicking things off

508
00:17:52,160 --> 00:17:54,400
with a pretty big announcement from Meta.

509
00:17:54,400 --> 00:17:56,880
They've just revealed this new AI architecture

510
00:17:56,880 --> 00:18:01,440
called the Byte Latent Transformer, or BLT for short.

511
00:18:01,440 --> 00:18:03,760
Yeah, BLT is definitely making some waves.

512
00:18:03,760 --> 00:18:06,000
So with all these new models coming out all the time,

513
00:18:06,000 --> 00:18:07,400
what makes this one so special?

514
00:18:07,400 --> 00:18:10,280
Well, BLT is a pretty fundamental shift

515
00:18:10,280 --> 00:18:13,760
in how AI models process information.

516
00:18:13,760 --> 00:18:15,840
Okay, I'm intrigued.

517
00:18:15,840 --> 00:18:16,680
Tell me more.

518
00:18:16,680 --> 00:18:19,520
Most AI models, they break down text into words

519
00:18:19,520 --> 00:18:21,680
or even parts of words before they can understand it.

520
00:18:21,680 --> 00:18:22,920
It's called tokenization.

521
00:18:22,920 --> 00:18:23,920
Yeah, I've heard of that.

522
00:18:23,920 --> 00:18:26,000
But BLT skips that whole step.

523
00:18:26,000 --> 00:18:28,480
So it works with the raw data directly.

524
00:18:28,480 --> 00:18:30,240
Exactly, it works with bytes,

525
00:18:30,240 --> 00:18:33,840
the fundamental building blocks of digital information.

526
00:18:33,840 --> 00:18:36,240
Okay, but how does that actually change things?

527
00:18:36,240 --> 00:18:37,280
Well, for one thing,

528
00:18:37,280 --> 00:18:40,200
it could make AI models much more efficient.

529
00:18:40,200 --> 00:18:42,880
They can process information faster

530
00:18:42,880 --> 00:18:44,760
and use less computing power.

531
00:18:44,760 --> 00:18:45,600
That makes sense.

532
00:18:45,600 --> 00:18:48,360
Cutting out that extra step must save a lot of resources.

533
00:18:48,360 --> 00:18:50,000
And it goes beyond efficiency too.

534
00:18:50,000 --> 00:18:52,200
Working with bytes could make AI models

535
00:18:52,200 --> 00:18:53,880
more adaptable and versatile.

536
00:18:53,880 --> 00:18:56,400
So they can handle different types of data more easily.

537
00:18:56,400 --> 00:19:00,080
Exactly, not just text, but images, video,

538
00:19:00,080 --> 00:19:02,160
maybe even music or code.

539
00:19:02,160 --> 00:19:05,320
Wow, so it's like giving AI a universal language.

540
00:19:05,320 --> 00:19:06,160
That's the idea.

541
00:19:06,160 --> 00:19:08,360
BLT is still in its early stages,

542
00:19:08,360 --> 00:19:10,320
but it's incredibly promising.

543
00:19:10,320 --> 00:19:11,320
It sounds like it, yeah,

544
00:19:11,320 --> 00:19:13,320
we'll definitely be keeping an eye on that one.

545
00:19:13,320 --> 00:19:14,600
Okay, so switching gears a bit.

546
00:19:14,600 --> 00:19:17,520
Google has this new AI experiment called Whisk.

547
00:19:17,520 --> 00:19:18,360
What's that all about?

548
00:19:18,360 --> 00:19:19,880
Whisk is all about creativity.

549
00:19:19,880 --> 00:19:22,400
It's using AI to blend images together.

550
00:19:22,400 --> 00:19:24,640
So instead of text prompts, you start with images.

551
00:19:24,640 --> 00:19:27,280
Exactly, you can upload your own pictures

552
00:19:27,280 --> 00:19:29,560
or choose from a library of existing ones.

553
00:19:29,560 --> 00:19:31,920
And then you use Whisk to blend them together

554
00:19:31,920 --> 00:19:33,440
and create something totally new.

555
00:19:33,440 --> 00:19:36,200
It's like a digital art studio powered by AI.

556
00:19:36,200 --> 00:19:39,360
Yeah, and you can explore different styles, textures,

557
00:19:39,360 --> 00:19:41,320
combine elements from different pictures.

558
00:19:41,320 --> 00:19:42,240
It's really cool.

559
00:19:42,240 --> 00:19:43,760
So how does it actually work?

560
00:19:43,760 --> 00:19:47,400
It uses two of Google's most advanced AI models,

561
00:19:47,400 --> 00:19:49,280
Imagen 3 and Gemini.

562
00:19:49,280 --> 00:19:50,640
Okay, so those two are working together?

563
00:19:50,640 --> 00:19:53,040
Yeah, Gemini analyzes the images you give it

564
00:19:53,040 --> 00:19:55,280
and creates descriptions of what it sees.

565
00:19:55,280 --> 00:19:58,280
Then Imagen 3 uses those descriptions as prompts

566
00:19:58,280 --> 00:20:00,400
to create the final blended image.

567
00:20:00,400 --> 00:20:03,120
So it's like two AI models collaborating.

568
00:20:03,120 --> 00:20:05,840
Exactly, and that's what's so exciting about this field, right?

569
00:20:05,840 --> 00:20:08,120
We're seeing all these new ways AI can be used

570
00:20:08,120 --> 00:20:10,560
to enhance creativity and solve problems.

571
00:20:10,560 --> 00:20:11,760
It's mind-blowing, really.

572
00:20:11,760 --> 00:20:14,400
Well, we've covered a lot of ground in this deep dive

573
00:20:14,400 --> 00:20:17,200
from video understanding and image generation

574
00:20:17,200 --> 00:20:21,560
to ethical considerations and these big investment trends.

575
00:20:21,560 --> 00:20:22,720
And everything in between.

576
00:20:22,720 --> 00:20:25,320
It's amazing to see how quickly this field is evolving.

577
00:20:25,320 --> 00:20:27,920
It really is, and it's important to stay informed

578
00:20:27,920 --> 00:20:29,400
and engaged with these developments.

579
00:20:29,400 --> 00:20:32,000
Absolutely, because AI is gonna touch every aspect

580
00:20:32,000 --> 00:20:33,480
of our lives in the future.

581
00:20:33,480 --> 00:20:34,920
It already is, in many ways.

582
00:20:34,920 --> 00:20:37,400
It's definitely a fascinating and complex field.

583
00:20:37,400 --> 00:20:39,080
And it's only gonna get more so.

584
00:20:39,080 --> 00:20:41,400
Well, that brings us to the end of our deep dive today.

585
00:20:41,400 --> 00:20:44,200
It's been a pleasure exploring these topics with you.

586
00:20:44,200 --> 00:20:47,000
Thanks for listening to the Daily AI News podcast,

587
00:20:47,000 --> 00:20:48,400
and be sure to join us next time

588
00:20:48,400 --> 00:21:12,400
for another deep dive into the world of AI.