1
00:00:00,000 --> 00:00:02,280
Hey everyone and welcome back for another deep dive.

2
00:00:02,280 --> 00:00:05,480
Today, we're gonna be taking a look at this paper.

3
00:00:05,480 --> 00:00:08,520
It's called the FACTS Grounding Leaderboard:

4
00:00:09,440 --> 00:00:12,960
Benchmarking LLMs' ability to ground responses

5
00:00:12,960 --> 00:00:15,680
to long-form input.

6
00:00:15,680 --> 00:00:17,480
Ooh, long-form, that sounds good.

7
00:00:17,480 --> 00:00:19,760
Yeah, that's the key here.

8
00:00:19,760 --> 00:00:22,120
And we're always talking about factuality, right?

9
00:00:22,120 --> 00:00:23,440
With AI. Yeah.

10
00:00:23,440 --> 00:00:24,480
And this is really interesting

11
00:00:24,480 --> 00:00:26,640
because it's not just like a regular paper,

12
00:00:26,640 --> 00:00:28,160
it's actually a leaderboard.

13
00:00:28,160 --> 00:00:29,840
So it's kind of like a competition

14
00:00:29,840 --> 00:00:33,480
to see which AI is best at giving you accurate information.

15
00:00:33,480 --> 00:00:34,440
Okay.

16
00:00:34,440 --> 00:00:37,320
And the really wild thing is the length of the text

17
00:00:37,320 --> 00:00:38,400
they're using to test this.

18
00:00:38,400 --> 00:00:41,040
Like we're talking up to 32,000 words.

19
00:00:41,040 --> 00:00:43,480
That's like, I don't know.

20
00:00:43,480 --> 00:00:44,320
It's a lot.

21
00:00:44,320 --> 00:00:45,160
A novella.

22
00:00:45,160 --> 00:00:46,000
Yeah.

23
00:00:46,000 --> 00:00:47,400
And the AI has to be able to answer questions

24
00:00:47,400 --> 00:00:49,200
based on that and only that.

25
00:00:49,200 --> 00:00:50,280
It can't just go on Google.

26
00:00:50,280 --> 00:00:51,120
Right, so it has to.

27
00:00:51,120 --> 00:00:51,960
Like closed book.

28
00:00:51,960 --> 00:00:53,200
Yeah, closed book, totally.

29
00:00:53,200 --> 00:00:54,040
Yeah.

30
00:00:54,040 --> 00:00:55,880
So how does this leaderboard even work?

31
00:00:55,880 --> 00:00:57,640
Like is there a panel of judges or something?

32
00:00:57,640 --> 00:01:02,600
Well, it's not quite like a talent show with scorecards.

33
00:01:02,600 --> 00:01:05,360
But the great thing is it's designed to be really thorough.

34
00:01:05,360 --> 00:01:08,240
So they test all kinds of AI models.

35
00:01:08,240 --> 00:01:12,440
The things, the brains behind, like, ChatGPT, for example,

36
00:01:12,440 --> 00:01:13,840
across all sorts of topics,

37
00:01:13,840 --> 00:01:16,480
finance, medicine, law,

38
00:01:16,480 --> 00:01:20,040
and they ask them to do a whole variety of tasks,

39
00:01:20,040 --> 00:01:22,800
like summarizing, finding specific facts.

40
00:01:22,800 --> 00:01:25,440
And the best part is it's all judged by people.

41
00:01:25,440 --> 00:01:26,400
Oh, wow.

42
00:01:26,400 --> 00:01:28,160
Okay, so there is a human element to this.

43
00:01:28,160 --> 00:01:31,280
So it's not just, you know, AI grading AI.

44
00:01:31,280 --> 00:01:34,200
Right, because it's not just about giving any answer, right?

45
00:01:34,200 --> 00:01:35,520
It has to be the right answer.

46
00:01:35,520 --> 00:01:37,240
And it has to understand what's being asked

47
00:01:37,240 --> 00:01:38,440
and be able to find that answer

48
00:01:38,440 --> 00:01:40,520
in this massive, you know, document.

49
00:01:40,520 --> 00:01:43,200
Right, so no more of those AI excuses of,

50
00:01:43,200 --> 00:01:44,320
oh, I couldn't find the answer,

51
00:01:44,320 --> 00:01:45,960
or the dog ate my homework.

52
00:01:45,960 --> 00:01:48,960
Exactly, and they actually had this two-step process

53
00:01:48,960 --> 00:01:50,720
to make sure that that doesn't happen.

54
00:01:50,720 --> 00:01:52,760
So the first is eligibility.

55
00:01:52,760 --> 00:01:56,200
Like is the AI even trying to answer the question properly?

56
00:01:56,200 --> 00:01:57,040
Gotcha.

57
00:01:57,040 --> 00:01:57,960
So for instance, you know,

58
00:01:57,960 --> 00:01:59,640
if you ask something about wind energy

59
00:01:59,640 --> 00:02:01,360
and the AI just says, well, it's good,

60
00:02:01,360 --> 00:02:03,280
but it has challenges, that's out.

61
00:02:03,280 --> 00:02:05,880
It has to give you a real detailed answer

62
00:02:05,880 --> 00:02:07,080
grounded in the document.

63
00:02:07,080 --> 00:02:08,240
Okay, that makes sense.

64
00:02:08,240 --> 00:02:09,760
And then what's the second step?

65
00:02:09,760 --> 00:02:11,640
Ah, that's where the big guns come in.

66
00:02:11,640 --> 00:02:13,680
They bring in other AI models, you know,

67
00:02:13,680 --> 00:02:15,240
some of the names you might recognize,

68
00:02:15,240 --> 00:02:17,880
Claude, Gemini, GPT-4.

69
00:02:17,880 --> 00:02:18,720
Wow.

70
00:02:18,720 --> 00:02:22,000
And they judge how accurate the answers are

71
00:02:22,000 --> 00:02:24,840
compared to what's actually in the document.

72
00:02:24,840 --> 00:02:27,000
And this way it's not biased by just using one,

73
00:02:27,000 --> 00:02:29,160
you're using multiple AI judges.

74
00:02:29,160 --> 00:02:33,640
So it's like a jury of their peers, but all robots.

75
00:02:33,640 --> 00:02:35,120
That's kind of wild.

76
00:02:35,120 --> 00:02:36,040
So what did they find?

77
00:02:36,040 --> 00:02:38,480
Like were there any clear winners

78
00:02:38,480 --> 00:02:41,080
in this AI factual Olympics?

79
00:02:41,080 --> 00:02:42,880
There were some that really stood out.

80
00:02:42,880 --> 00:02:45,080
Gemini, especially the flash version,

81
00:02:45,080 --> 00:02:46,560
did really well overall.

82
00:02:46,560 --> 00:02:47,400
Okay.

83
00:02:47,400 --> 00:02:49,040
They tested the models on both, you know,

84
00:02:49,040 --> 00:02:50,360
things they might have seen before,

85
00:02:50,360 --> 00:02:52,320
and then brand new information.

86
00:02:52,320 --> 00:02:53,800
Just to see how they adapted.

87
00:02:53,800 --> 00:02:56,080
And Gemini was consistently up at the top.

88
00:02:56,080 --> 00:02:59,600
So does that mean that, like, bigger or more complex AIs

89
00:02:59,600 --> 00:03:01,800
are always better at being factual?

90
00:03:01,800 --> 00:03:03,720
Well, it's not always that easy.

91
00:03:03,720 --> 00:03:05,160
Because remember that first step,

92
00:03:05,160 --> 00:03:08,280
the eligibility, weeding out the bad answers.

93
00:03:08,280 --> 00:03:09,160
Right, yeah.

94
00:03:09,160 --> 00:03:10,840
Well, when they factored that in,

95
00:03:10,840 --> 00:03:13,360
some of the models that initially looked pretty good

96
00:03:13,360 --> 00:03:15,640
actually dropped in the rankings.

97
00:03:15,640 --> 00:03:16,480
No, I'm not sure.

98
00:03:16,480 --> 00:03:17,320
So it's kind of like, you know,

99
00:03:17,320 --> 00:03:19,160
showing your work in math class.

100
00:03:19,160 --> 00:03:20,000
Right.

101
00:03:20,000 --> 00:03:20,840
You can't just put down the answer.

102
00:03:20,840 --> 00:03:23,000
You gotta prove that you understand how you got there.

103
00:03:23,000 --> 00:03:24,680
That's a great point.

104
00:03:24,680 --> 00:03:26,360
Yeah, it's not just about knowing the facts.

105
00:03:26,360 --> 00:03:28,280
It's about how to use them.

106
00:03:28,280 --> 00:03:32,800
So, okay, this leaderboard seems like a really big step forward

107
00:03:32,800 --> 00:03:35,800
in figuring out how to make AI more trustworthy.

108
00:03:35,800 --> 00:03:38,440
But what are like the limitations

109
00:03:38,440 --> 00:03:39,640
of this kind of research?

110
00:03:39,640 --> 00:03:42,080
Like what couldn't they address fully?

111
00:03:42,080 --> 00:03:43,160
Well, there were a couple of things.

112
00:03:43,160 --> 00:03:46,240
First, some of the documents that they used to test the AI

113
00:03:46,240 --> 00:03:48,480
might have already been part of its training data.

114
00:03:48,480 --> 00:03:49,320
Oh, interesting.

115
00:03:49,320 --> 00:03:52,600
So you can think of it as accidentally giving

116
00:03:52,600 --> 00:03:55,240
a student a test with questions they've already seen, right?

117
00:03:55,240 --> 00:03:56,080
Right, yeah.

118
00:03:56,080 --> 00:03:59,000
Not a perfect measure, but still valuable.

119
00:03:59,000 --> 00:04:02,280
Because the questions they're asking are brand new.

120
00:04:02,280 --> 00:04:06,520
So it's still testing how well it can apply its knowledge

121
00:04:06,520 --> 00:04:07,680
in a new situation.

122
00:04:07,680 --> 00:04:10,080
So it's like seeing if they can think on their feet,

123
00:04:10,080 --> 00:04:12,440
even if they might have skimmed the textbook beforehand.

124
00:04:12,440 --> 00:04:13,600
Yeah, exactly.

125
00:04:13,600 --> 00:04:15,920
And then the other limitation is that this leaderboard

126
00:04:15,920 --> 00:04:20,680
is really good at testing what they call grounded factuality.

127
00:04:20,680 --> 00:04:21,520
Okay.

128
00:04:21,520 --> 00:04:25,120
So it means checking if the AI can pull the right information

129
00:04:25,120 --> 00:04:27,400
from the specific document it's given.

130
00:04:27,400 --> 00:04:29,520
It's not testing like general knowledge.

131
00:04:29,520 --> 00:04:30,360
Right, right.

132
00:04:30,360 --> 00:04:33,120
Or if it can double check facts against other sources.

133
00:04:33,120 --> 00:04:36,680
So it's kind of like saying, OK, AI, here's your little world.

134
00:04:36,680 --> 00:04:40,000
Tell me what's true within these boundaries.

135
00:04:40,000 --> 00:04:42,560
But in the real world, information's everywhere.

136
00:04:42,560 --> 00:04:43,160
Exactly.

137
00:04:43,160 --> 00:04:46,720
So it would be interesting to see, in the future, research

138
00:04:46,720 --> 00:04:50,440
where the AI has to deal with maybe conflicting information

139
00:04:50,440 --> 00:04:52,520
or verify facts from multiple places.

140
00:04:52,520 --> 00:04:52,920
Right.

141
00:04:52,920 --> 00:04:53,400
That's a whole.

142
00:04:53,400 --> 00:04:54,440
That would be really challenging.

143
00:04:54,440 --> 00:04:55,600
Yeah, a whole other level.

144
00:04:55,600 --> 00:04:56,080
Yeah.

145
00:04:56,080 --> 00:04:56,760
Yeah.

146
00:04:56,760 --> 00:04:58,880
So it sounds like this research has really opened up

147
00:04:58,880 --> 00:05:00,760
a lot of possibilities.

148
00:05:00,760 --> 00:05:04,080
But there's still so much more to learn about how

149
00:05:04,080 --> 00:05:06,240
to make AI truly reliable.

150
00:05:06,240 --> 00:05:07,000
Absolutely.

151
00:05:07,000 --> 00:05:10,520
And we'll be right back after a quick word from our sponsor.

152
00:05:10,520 --> 00:05:11,360
Stay tuned.

153
00:05:11,360 --> 00:05:12,920
Welcome back to the deep dive.

154
00:05:12,920 --> 00:05:16,560
You know, we're talking about making AI more accurate

155
00:05:16,560 --> 00:05:19,880
and reliable even when dealing with mountains

156
00:05:19,880 --> 00:05:21,160
of information.

157
00:05:21,160 --> 00:05:23,840
It's really amazing to think about the potential here.

158
00:05:23,840 --> 00:05:27,160
Like imagine AI that could just go through tons and tons

159
00:05:27,160 --> 00:05:29,600
of data and just tell you like, hey, this is true.

160
00:05:29,600 --> 00:05:30,960
This is not true.

161
00:05:30,960 --> 00:05:33,520
It'd be a total game changer for everything,

162
00:05:33,520 --> 00:05:36,040
from like fighting fake news to making sure you're

163
00:05:36,040 --> 00:05:37,520
getting the right medical information.

164
00:05:37,520 --> 00:05:38,360
Oh, absolutely.

165
00:05:38,360 --> 00:05:40,400
Like think about a world where you could instantly

166
00:05:40,400 --> 00:05:44,840
check whether a social media post or a news article is accurate.

167
00:05:44,840 --> 00:05:48,480
Just by running it through like an AI fact checker.

168
00:05:48,480 --> 00:05:49,160
Oh, wow.

169
00:05:49,160 --> 00:05:52,320
That kind of technology could be so powerful in the fight

170
00:05:52,320 --> 00:05:54,240
against misinformation.

171
00:05:54,240 --> 00:05:57,200
Yeah, it sounds amazing, but also a little scary.

172
00:05:57,200 --> 00:05:59,720
Like, I mean, who gets to decide what's considered true

173
00:05:59,720 --> 00:06:00,680
in the first place?

174
00:06:00,680 --> 00:06:02,680
It feels like there'd be a lot of gray areas.

175
00:06:02,680 --> 00:06:03,200
You're right.

176
00:06:03,200 --> 00:06:05,400
It's definitely not like a simple solution.

177
00:06:05,400 --> 00:06:08,200
Like, you know, building a truly, you know,

178
00:06:08,200 --> 00:06:10,800
comprehensive and unbiased database of information,

179
00:06:10,800 --> 00:06:12,160
that's a huge task.

180
00:06:12,160 --> 00:06:15,040
But that's exactly why research, like this leaderboard,

181
00:06:15,040 --> 00:06:15,880
is so important.

182
00:06:15,880 --> 00:06:18,280
It's really, you know, laying the foundation

183
00:06:18,280 --> 00:06:21,320
for developing these AI systems that are, you know,

184
00:06:21,320 --> 00:06:23,800
not just smart, but also trustworthy.

185
00:06:23,800 --> 00:06:27,240
So it's like we're teaching AI to be like a responsible citizen

186
00:06:27,240 --> 00:06:28,400
of the information world.

187
00:06:28,400 --> 00:06:29,000
Yeah.

188
00:06:29,000 --> 00:06:31,360
You know, making sure they know all the rules of the road

189
00:06:31,360 --> 00:06:32,600
before we let them drive.

190
00:06:32,600 --> 00:06:33,600
Exactly.

191
00:06:33,600 --> 00:06:36,280
And, you know, just like with any powerful tool, we have

192
00:06:36,280 --> 00:06:37,800
to make sure we're using it for good.

193
00:06:37,800 --> 00:06:38,040
Right.

194
00:06:38,040 --> 00:06:40,480
You know, we need to think about the potential downsides

195
00:06:40,480 --> 00:06:43,760
and make sure that AI is empowering people,

196
00:06:43,760 --> 00:06:45,200
not manipulating them.

197
00:06:45,200 --> 00:06:46,880
Yeah, well said.

198
00:06:46,880 --> 00:06:50,480
So besides fighting fake news, what are some other ways

199
00:06:50,480 --> 00:06:52,880
this kind of AI could be used in the real world?

200
00:06:52,880 --> 00:06:54,160
The possibilities are huge.

201
00:06:54,160 --> 00:06:55,920
Like, imagine an AI that could summarize,

202
00:06:55,920 --> 00:06:58,640
like, really complex research papers, you know,

203
00:06:58,640 --> 00:07:01,640
making scientific knowledge more accessible to everyone.

204
00:07:01,640 --> 00:07:01,960
Right.

205
00:07:01,960 --> 00:07:03,320
Or think about legal documents.

206
00:07:03,320 --> 00:07:06,720
AI could help you analyze contracts, you know,

207
00:07:06,720 --> 00:07:08,920
making sure you understand the fine print

208
00:07:08,920 --> 00:07:10,560
before you sign anything.

209
00:07:10,560 --> 00:07:12,800
Yeah, it's like having a super-powered research assistant

210
00:07:12,800 --> 00:07:13,920
that never sleeps.

211
00:07:13,920 --> 00:07:15,080
Exactly.

212
00:07:15,080 --> 00:07:17,760
And because this research is focused on AI

213
00:07:17,760 --> 00:07:20,880
that can deal with these, you know, really long documents,

214
00:07:20,880 --> 00:07:22,760
it opens up even more possibilities.

215
00:07:22,760 --> 00:07:23,240
OK.

216
00:07:23,240 --> 00:07:26,440
Imagine AI that can personalize your education

217
00:07:26,440 --> 00:07:29,560
by pulling information from tons of different sources

218
00:07:29,560 --> 00:07:31,720
and tailoring it to your specific needs.

219
00:07:31,720 --> 00:07:33,720
Wow, that would be incredible.

220
00:07:33,720 --> 00:07:35,560
It really does sound like we're only just starting

221
00:07:35,560 --> 00:07:38,160
to understand what AI can do when it comes to,

222
00:07:38,160 --> 00:07:40,120
like, using information responsibly.

223
00:07:40,120 --> 00:07:40,600
We are.

224
00:07:40,600 --> 00:07:43,720
This research is just, you know, one step on a much longer

225
00:07:43,720 --> 00:07:46,240
journey, but it's a super exciting one.

226
00:07:46,240 --> 00:07:48,360
So we've talked about, like, the big picture,

227
00:07:48,360 --> 00:07:49,680
you know, implications.

228
00:07:49,680 --> 00:07:52,240
But let's zoom back in on the research itself.

229
00:07:52,240 --> 00:07:53,960
What are some key takeaways for our listeners?

230
00:07:53,960 --> 00:07:57,080
Like, what do they really need to get about this leaderboard

231
00:07:57,080 --> 00:07:59,200
and what it means for the future of AI?

232
00:07:59,200 --> 00:08:00,840
Well, I think, first of all, this research shows

233
00:08:00,840 --> 00:08:02,600
that we are getting better at measuring

234
00:08:02,600 --> 00:08:04,920
how good AI is at understanding facts.

235
00:08:04,920 --> 00:08:05,440
Right.

236
00:08:05,440 --> 00:08:07,160
This leaderboard gives us a way to, like,

237
00:08:07,160 --> 00:08:09,640
compare different AI models, you know, side by side,

238
00:08:09,640 --> 00:08:11,440
and see which ones are actually doing the best job.

239
00:08:11,440 --> 00:08:11,920
Right.

240
00:08:11,920 --> 00:08:14,720
And that's crucial for driving progress in the field.

241
00:08:14,720 --> 00:08:15,120
Yeah.

242
00:08:15,120 --> 00:08:17,000
It's like having a standardized test for AI.

243
00:08:17,000 --> 00:08:17,200
Yeah.

244
00:08:17,200 --> 00:08:18,600
So everyone's playing on the same field.

245
00:08:18,600 --> 00:08:19,480
Exactly.

246
00:08:19,480 --> 00:08:22,000
And because it's a competition, it encourages developers

247
00:08:22,000 --> 00:08:24,080
to keep pushing, you know, the boundaries

248
00:08:24,080 --> 00:08:26,560
and make their models even more accurate.

249
00:08:26,560 --> 00:08:28,560
Secondly, this research highlights

250
00:08:28,560 --> 00:08:31,200
how important it is to ground AI's answers

251
00:08:31,200 --> 00:08:32,680
in actual evidence.

252
00:08:32,680 --> 00:08:34,520
Remember that two-step process?

253
00:08:34,520 --> 00:08:35,360
Mm-hmm.

254
00:08:35,360 --> 00:08:38,680
That makes sure that the AI isn't just pulling random facts

255
00:08:38,680 --> 00:08:39,520
out of thin air.

256
00:08:39,520 --> 00:08:39,840
Right.

257
00:08:39,840 --> 00:08:41,840
It has to be able to back up its claims

258
00:08:41,840 --> 00:08:43,880
with info from the document it's given.

259
00:08:43,880 --> 00:08:45,560
So it's not enough to just be smart.

260
00:08:45,560 --> 00:08:46,840
You've got to be able to show your work.

261
00:08:46,840 --> 00:08:48,240
Exactly.

262
00:08:48,240 --> 00:08:50,680
And finally, this research shows that we are making,

263
00:08:50,680 --> 00:08:52,880
you know, real progress in teaching AI

264
00:08:52,880 --> 00:08:56,200
to understand and reason about complex information.

265
00:08:56,200 --> 00:08:56,520
Yeah.

266
00:08:56,520 --> 00:08:58,880
Like, the fact that these models can process documents

267
00:08:58,880 --> 00:09:01,240
as long as novels and still give accurate answers,

268
00:09:01,240 --> 00:09:02,400
that's a huge step.

269
00:09:02,400 --> 00:09:02,880
Yeah.

270
00:09:02,880 --> 00:09:04,680
That's pretty amazing when you think about it.

271
00:09:04,680 --> 00:09:06,160
We're basically teaching machines

272
00:09:06,160 --> 00:09:08,880
to think like, you know, expert researchers

273
00:09:08,880 --> 00:09:11,960
to be able to just dig through all this info and pull out the key facts.

274
00:09:11,960 --> 00:09:12,440
Yeah.

275
00:09:12,440 --> 00:09:15,360
And the implications of that are massive.

276
00:09:15,360 --> 00:09:18,840
Imagine a world where AI can help us make sense

277
00:09:18,840 --> 00:09:22,200
of all this information that's coming at us all the time.

278
00:09:22,200 --> 00:09:22,480
Yeah.

279
00:09:22,480 --> 00:09:24,960
You know, it could help us be better citizens,

280
00:09:24,960 --> 00:09:28,360
you know, better students, and even more efficient workers.

281
00:09:28,360 --> 00:09:29,920
I'm definitely seeing the potential here.

282
00:09:29,920 --> 00:09:30,420
Yeah.

283
00:09:30,420 --> 00:09:32,760
It's like AI is becoming our partner

284
00:09:32,760 --> 00:09:34,720
in this quest for knowledge.

285
00:09:34,720 --> 00:09:35,160
Yeah.

286
00:09:35,160 --> 00:09:37,840
You know, helping us navigate like an increasingly

287
00:09:37,840 --> 00:09:38,880
complex world.

288
00:09:38,880 --> 00:09:39,200
Yeah.

289
00:09:39,200 --> 00:09:40,200
That's a great way to put it.

290
00:09:40,200 --> 00:09:42,520
But of course, we have to remember that AI is a tool.

291
00:09:42,520 --> 00:09:42,880
Yeah.

292
00:09:42,880 --> 00:09:46,040
And like any tool, you know, it can be used for good or bad.

293
00:09:46,040 --> 00:09:48,640
And that's why it's important to keep having these conversations

294
00:09:48,640 --> 00:09:50,880
about, you know, AI ethics and make sure

295
00:09:50,880 --> 00:09:53,240
that we're developing these technologies responsibly.

296
00:09:53,240 --> 00:09:54,280
Absolutely.

297
00:09:54,280 --> 00:09:55,840
It's a powerful reminder that we need

298
00:09:55,840 --> 00:09:58,720
to be thoughtful about how we're integrating AI into our lives,

299
00:09:58,720 --> 00:09:59,520
you know.

300
00:09:59,520 --> 00:10:02,160
But with the right approach, it has the potential

301
00:10:02,160 --> 00:10:05,080
to make us, you know, smarter, more informed,

302
00:10:05,080 --> 00:10:08,040
and ultimately better equipped to face the challenges of the future.

303
00:10:08,040 --> 00:10:08,540
Well said.

304
00:10:08,540 --> 00:10:10,800
This research is, you know, a fascinating glimpse

305
00:10:10,800 --> 00:10:11,600
into that future.

306
00:10:11,600 --> 00:10:14,400
And it's clear that there's still so much more to explore.

307
00:10:14,400 --> 00:10:15,720
We've covered a lot in this deep dive,

308
00:10:15,720 --> 00:10:18,800
from like the nitty-gritty details of the leaderboard

309
00:10:18,800 --> 00:10:23,720
to, you know, its potential impact on society as a whole.

310
00:10:23,720 --> 00:10:24,920
It's been a great discussion.

311
00:10:24,920 --> 00:10:27,480
And I think it really highlights just how important it

312
00:10:27,480 --> 00:10:31,360
is to have these open and honest conversations about the role

313
00:10:31,360 --> 00:10:33,120
of AI in our world.

314
00:10:33,120 --> 00:10:34,760
And speaking of conversations, we

315
00:10:34,760 --> 00:10:36,520
want to hear from you, our listeners.

316
00:10:36,520 --> 00:10:37,200
Yeah.

317
00:10:37,200 --> 00:10:38,680
What do you think about this research?

318
00:10:38,680 --> 00:10:41,480
What excites you the most about the potential of AI?

319
00:10:41,480 --> 00:10:43,040
And what concerns do you have?

320
00:10:43,040 --> 00:10:45,480
Join the conversation on our social media

321
00:10:45,480 --> 00:10:47,160
and, you know, share your thoughts.

322
00:10:47,160 --> 00:10:50,520
We're really eager to hear your perspective on the future of AI

323
00:10:50,520 --> 00:10:52,280
and how it's going to impact our lives.

324
00:10:52,280 --> 00:10:54,320
Thanks for joining us on this deep dive

325
00:10:54,320 --> 00:10:57,360
into the world of AI factuality.

326
00:10:57,360 --> 00:11:00,400
We'll be back next time with another fascinating look

327
00:11:00,400 --> 00:11:04,160
at the latest developments in artificial intelligence.

328
00:11:04,160 --> 00:11:05,000
Yeah.

329
00:11:05,000 --> 00:11:06,680
Until then, stay curious.

330
00:11:09,040 --> 00:11:09,920
All right, so we're back.

331
00:11:09,920 --> 00:11:12,360
And, you know, we've been talking about how this AI leaderboard

332
00:11:12,360 --> 00:11:14,920
is really kind of pushing the boundaries of, you know,

333
00:11:14,920 --> 00:11:18,080
AI and truth, and seeing how well it can stick to the facts,

334
00:11:18,080 --> 00:11:20,680
even when there's like tons of information.

335
00:11:20,680 --> 00:11:21,200
Yeah.

336
00:11:21,200 --> 00:11:23,560
But before we wrap up, I wanted to go back to something

337
00:11:23,560 --> 00:11:25,640
we touched on earlier, the human element in all this.

338
00:11:25,640 --> 00:11:26,120
Yeah.

339
00:11:26,120 --> 00:11:28,040
We talked about how people are the ones writing the questions,

340
00:11:28,040 --> 00:11:28,840
judging the answers.

341
00:11:28,840 --> 00:11:32,200
But like, how do we actually teach AI

342
00:11:32,200 --> 00:11:34,520
to be more factual in the first place?

343
00:11:34,520 --> 00:11:35,800
It's not like they're born knowing

344
00:11:35,800 --> 00:11:37,840
the difference between what's true and what's not.

345
00:11:37,840 --> 00:11:38,680
That's a great point.

346
00:11:38,680 --> 00:11:40,760
And that's actually where it gets really interesting,

347
00:11:40,760 --> 00:11:45,640
because these large language models, the brains behind the AI

348
00:11:45,640 --> 00:11:48,440
that we're talking about, they're trained on so much data.

349
00:11:48,440 --> 00:11:50,880
I mean, just massive amounts of text, mostly from the internet.

350
00:11:50,880 --> 00:11:52,880
So that gives them a huge vocabulary, you know,

351
00:11:52,880 --> 00:11:55,800
like a general understanding of the world.

352
00:11:55,800 --> 00:11:59,440
But that doesn't necessarily make them experts in truth.

353
00:11:59,440 --> 00:11:59,940
Right.

354
00:11:59,940 --> 00:12:02,720
So it's kind of like they've read every book in the library,

355
00:12:02,720 --> 00:12:05,560
but they don't know which ones are actually based on facts

356
00:12:05,560 --> 00:12:07,560
and which ones are just made up stories.

357
00:12:07,560 --> 00:12:08,400
Exactly.

358
00:12:08,400 --> 00:12:12,240
They need some help to kind of figure out what's reliable

359
00:12:12,240 --> 00:12:13,560
and what's not.

360
00:12:13,560 --> 00:12:15,960
And that's where the humans come in.

361
00:12:15,960 --> 00:12:18,880
So are there teams of fact checkers out there,

362
00:12:18,880 --> 00:12:24,400
like grading AI essays and giving them gold stars for accuracy?

363
00:12:24,400 --> 00:12:27,200
Well, not exactly.

364
00:12:27,200 --> 00:12:29,120
But there are definitely humans involved

365
00:12:29,120 --> 00:12:31,040
in that training process.

366
00:12:31,040 --> 00:12:33,680
One way is to create special data sets of text

367
00:12:33,680 --> 00:12:36,760
that are really designed to teach AI about factuality.

368
00:12:36,760 --> 00:12:40,200
So think of it like, I guess, like a textbook for AI.

369
00:12:40,200 --> 00:12:41,960
But instead of grammar or vocabulary,

370
00:12:41,960 --> 00:12:45,240
it's all about how to identify reliable sources

371
00:12:45,240 --> 00:12:47,320
and how to support your claims with evidence.

372
00:12:47,320 --> 00:12:50,000
So like AI school, but the subject is truth 101.

373
00:12:50,000 --> 00:12:50,800
Exactly.

374
00:12:50,800 --> 00:12:51,640
Yeah.

375
00:12:51,640 --> 00:12:53,720
Another way to train AI is through something called

376
00:12:53,720 --> 00:12:55,120
reinforcement learning.

377
00:12:55,120 --> 00:12:57,720
It's kind of like, well, it's kind of like training a dog,

378
00:12:57,720 --> 00:12:58,440
right?

379
00:12:58,440 --> 00:13:00,600
Give them a treat when they do something good.

380
00:13:00,600 --> 00:13:03,680
And over time, they learn to do that thing more.

381
00:13:03,680 --> 00:13:05,940
So with AI, instead of a treat, it's

382
00:13:05,940 --> 00:13:09,000
like they get some kind of positive feedback or reward

383
00:13:09,000 --> 00:13:10,600
for giving an accurate answer.

384
00:13:10,600 --> 00:13:14,440
So it's like, good AI, you found the right fact

385
00:13:14,440 --> 00:13:16,640
in that massive pile of text.

386
00:13:16,640 --> 00:13:18,240
Here's a virtual high five.

387
00:13:18,240 --> 00:13:19,360
Yeah.

388
00:13:19,360 --> 00:13:20,040
Exactly.

389
00:13:20,040 --> 00:13:22,400
And this training, it's not a one-time thing.

390
00:13:22,400 --> 00:13:23,320
It's ongoing.

391
00:13:23,320 --> 00:13:26,560
So as they get more data, as they get more feedback from us,

392
00:13:26,560 --> 00:13:29,040
they're constantly kind of getting better

393
00:13:29,040 --> 00:13:30,600
at figuring out what's true.

394
00:13:30,600 --> 00:13:31,480
Wow.

395
00:13:31,480 --> 00:13:34,880
So it really is this amazing kind of collaboration

396
00:13:34,880 --> 00:13:37,000
between humans and machines.

397
00:13:37,000 --> 00:13:39,240
It's not just that we're building these systems,

398
00:13:39,240 --> 00:13:41,560
but we're also kind of shaping how they think,

399
00:13:41,560 --> 00:13:42,960
how they learn about the world.

400
00:13:42,960 --> 00:13:43,460
Yeah.

401
00:13:43,460 --> 00:13:44,520
It really is this partnership.

402
00:13:44,520 --> 00:13:46,520
And it's a partnership that's always evolving.

403
00:13:46,520 --> 00:13:50,000
As the AI gets more advanced, our role changes.

404
00:13:50,000 --> 00:13:52,960
So it's less about programming and more about teaching

405
00:13:52,960 --> 00:13:53,600
and mentoring.

406
00:13:53,600 --> 00:13:55,920
It's like we're raising a generation of AI kids

407
00:13:55,920 --> 00:13:59,240
and guiding them as they explore this world of information.

408
00:13:59,240 --> 00:14:00,720
That's a great analogy.

409
00:14:00,720 --> 00:14:02,240
And just like kids, there are going

410
00:14:02,240 --> 00:14:04,360
to be challenges and rewards along the way.

411
00:14:04,360 --> 00:14:06,800
So yeah, there are going to be times when the AI makes

412
00:14:06,800 --> 00:14:10,400
mistakes, but there will also be these really incredible moments

413
00:14:10,400 --> 00:14:13,000
of insight and discovery.

414
00:14:13,000 --> 00:14:15,320
The important thing is to approach it

415
00:14:15,320 --> 00:14:18,480
with a balance of excitement, but also some caution.

416
00:14:18,480 --> 00:14:21,720
Making sure we're using it in a way that's good for everybody.

417
00:14:21,720 --> 00:14:22,880
Well said.

418
00:14:22,880 --> 00:14:27,800
So I think as we wrap up this deep dive into this FACTS

419
00:14:27,800 --> 00:14:31,840
Grounding Leaderboard, this quest for factual AI,

420
00:14:31,840 --> 00:14:35,600
I think the biggest takeaway is that AI and truth,

421
00:14:35,600 --> 00:14:37,040
it's not just a technical problem.

422
00:14:37,040 --> 00:14:38,400
It's really a human one.

423
00:14:38,400 --> 00:14:41,040
It takes this collaboration, this critical thinking.

424
00:14:41,040 --> 00:14:44,600
It's about building a future where AI and human intelligence

425
00:14:44,600 --> 00:14:46,720
can actually work together to help us better understand

426
00:14:46,720 --> 00:14:47,840
the world.

427
00:14:47,840 --> 00:14:49,520
So with that, we want to thank you all

428
00:14:49,520 --> 00:14:50,960
for joining us on this exploration.

429
00:14:50,960 --> 00:14:52,120
We really hope you enjoyed it.

430
00:14:52,120 --> 00:14:53,480
Yeah, thanks for listening.

431
00:14:53,480 --> 00:14:56,160
And keep digging deeper.

432
00:14:56,160 --> 00:14:58,200
Keep asking questions, have these conversations

433
00:14:58,200 --> 00:15:00,720
about the role of AI tech in our lives.

434
00:15:00,720 --> 00:15:03,120
And until next time, stay curious.

435
00:15:03,120 --> 00:15:23,880
And remember, the truth is out there.