1
00:00:00,000 --> 00:00:25,160
Hello, and welcome to another installment of the KMO Show.

2
00:00:25,160 --> 00:00:28,960
I'm your host, KMO, and this is episode number seven.

3
00:00:28,960 --> 00:00:35,320
Prepared for release onto the World Wide Web on Wednesday, April 12th, 2023.

4
00:00:35,320 --> 00:00:39,240
In this episode of the podcast, I'm going to play a conversation that has been sitting

5
00:00:39,240 --> 00:00:40,560
in storage for a while.

6
00:00:40,560 --> 00:00:43,040
It was recorded, I think, about five weeks ago.

7
00:00:43,040 --> 00:00:50,680
And I think to the chagrin of the guest, Danielle Bocelli, just after we recorded this conversation,

8
00:00:50,680 --> 00:00:57,000
OpenAI announced the release of GPT-4, or Generative Pre-Trained Transformer Model number

9
00:00:57,000 --> 00:00:58,000
4.

10
00:00:58,000 --> 00:01:06,040
So, after OpenAI made that announcement, Microsoft, which owns a very large stake, I think 49%

11
00:01:06,040 --> 00:01:13,660
of OpenAI, announced that Bing Chat had been powered by GPT-4 for many weeks before the

12
00:01:13,660 --> 00:01:15,680
official release of GPT-4.

13
00:01:15,680 --> 00:01:23,320
So Microsoft search users had been interacting with GPT-4 for weeks before it was ever announced.

14
00:01:23,320 --> 00:01:30,040
And, as I understand it from an interview that Lex Fridman did with the CEO of OpenAI,

15
00:01:30,040 --> 00:01:37,120
Sam Altman, GPT-4 was complete and ready to go last summer, but they have

16
00:01:37,120 --> 00:01:43,760
been training it and in a sense, dumbing it down to give it that sort of dull, glassy-eyed,

17
00:01:43,760 --> 00:01:48,080
slack-jawed business demeanor speak, you know, stripping away all of its personality and

18
00:01:48,080 --> 00:01:49,760
interesting rough edges.

19
00:01:49,760 --> 00:01:52,920
So this model has been, you know, it has been in existence.

20
00:01:52,920 --> 00:01:56,280
It has been doing things in the world for quite some time.

21
00:01:56,280 --> 00:02:01,720
And it is only just now, or a few weeks ago now, being released because, well, when these

22
00:02:01,720 --> 00:02:06,340
models are first created, they're called raw models, they can be kind of dangerous and

23
00:02:06,340 --> 00:02:11,160
they can say things which freak people out and cause them to pick up the phone and call

24
00:02:11,160 --> 00:02:12,360
their lawyer.

25
00:02:12,360 --> 00:02:16,960
So they have to be blunted with what's called Reinforcement Learning from Human Feedback,

26
00:02:16,960 --> 00:02:18,640
or RLHF.

27
00:02:18,640 --> 00:02:23,720
And I've noticed that in interviews with tech folk, you know, they have their own speak,

28
00:02:23,720 --> 00:02:26,640
like instead of computing power, they just say compute.

29
00:02:26,640 --> 00:02:30,520
And I've heard multiple people use RLHF as a verb.

30
00:02:30,520 --> 00:02:36,400
So for example, when it comes to, you know, the question of is GPT too woke?

31
00:02:36,400 --> 00:02:39,300
A question that Sam Altman gets from time to time.

32
00:02:39,300 --> 00:02:44,080
His answer is one, I don't really know what woke is, but also everybody has their own

33
00:02:44,080 --> 00:02:46,200
idea of how these things should be trained.

34
00:02:46,200 --> 00:02:52,000
And he said something to the effect of everybody wants these models to be RLHF'd to their own

35
00:02:52,000 --> 00:02:57,000
specifications or, you know, to match their own viewpoints or their own worldviews.

36
00:02:57,000 --> 00:03:00,420
Which is to say that everybody thinks that everybody else should have to talk like them

37
00:03:00,420 --> 00:03:05,100
and should only be able to interact with AI, which talks and thinks like they do.

38
00:03:05,100 --> 00:03:10,680
Which is to say we have a very coercive mindset, which is dominant right now in our culture.

39
00:03:10,680 --> 00:03:12,120
Kind of a libertarian's nightmare.

40
00:03:12,120 --> 00:03:15,260
Anyway, the guest this week is Danielle Bocelli.

41
00:03:15,260 --> 00:03:18,920
She describes herself as somebody who was trained in engineering, but has never really

42
00:03:18,920 --> 00:03:20,420
worked in engineering.

43
00:03:20,420 --> 00:03:22,600
She is currently a PhD student.

44
00:03:22,600 --> 00:03:27,920
She's doing research and I think as many grad students do, she's also a teaching assistant.

45
00:03:27,920 --> 00:03:32,560
But her last job title before going back to school was data scientist.

46
00:03:32,560 --> 00:03:38,800
So we are going to talk about large language models and prospects for artificial intelligence

47
00:03:38,800 --> 00:03:41,360
generally, but mostly we're going to be pretty specific.

48
00:03:41,360 --> 00:03:43,140
We're going to be talking about large language models.

49
00:03:43,140 --> 00:03:46,520
And again, GPT-4 had not been announced.

50
00:03:46,520 --> 00:03:49,540
You know, the release had not been announced when we had this conversation.

51
00:03:49,540 --> 00:03:51,160
So this is a few weeks old.

52
00:03:51,160 --> 00:03:54,600
We're living in a time when things are moving so fast.

53
00:03:54,600 --> 00:03:58,960
Five weeks is a very long time for a conversation about AI to go unaired.

54
00:03:58,960 --> 00:04:01,960
So apologies to this week's guest.

55
00:04:01,960 --> 00:04:10,920
And now here's my conversation with Danielle Bocelli.

56
00:04:10,920 --> 00:04:15,840
You are listening to the KMO Show and I'm speaking with Danielle Bocelli, who is a data

57
00:04:15,840 --> 00:04:20,280
scientist currently doing a PhD in something fiendishly complex.

58
00:04:20,280 --> 00:04:22,120
Danielle, good to talk to you.

59
00:04:22,120 --> 00:04:23,400
Good to be here.

60
00:04:23,400 --> 00:04:29,160
I invited you to join me in this conversation because you had been posting things on Twitter,

61
00:04:29,160 --> 00:04:33,080
which I found to be refreshingly countercyclical.

62
00:04:33,080 --> 00:04:38,320
There's a lot of buzz, a lot of hype right now about AI and not entirely without reason,

63
00:04:38,320 --> 00:04:42,040
but in my mind anyway, you're a bit of a naysayer.

64
00:04:42,040 --> 00:04:44,280
You've got some reservations.

65
00:04:44,280 --> 00:04:51,360
And I think you said at one point that looking back at this transformer-based, GPT-type technology

66
00:04:51,360 --> 00:04:55,640
a few years from now and calling it AI will be kind of cringe.

67
00:04:55,640 --> 00:04:56,640
What did you mean by that?

68
00:04:56,640 --> 00:05:03,400
Well, I think a lot of things get confused honestly between there's a big difference

69
00:05:03,400 --> 00:05:08,800
between how artificial intelligence is talked about on Twitter and how it's talked about

70
00:05:08,800 --> 00:05:10,200
in research.

71
00:05:10,200 --> 00:05:18,160
So on Twitter you get a lot of conversations around consciousness and sentience and AGI

72
00:05:18,160 --> 00:05:19,160
comes up a lot.

73
00:05:19,160 --> 00:05:22,480
In research, I feel like those topics don't come up as much; that's a little bit more philosophical

74
00:05:22,480 --> 00:05:24,800
thought experiment type stuff.

75
00:05:24,800 --> 00:05:31,800
But yeah, so I think what people talk about commonly now on Twitter in the discourse tends

76
00:05:31,800 --> 00:05:37,160
to be kind of far off from where research is.

77
00:05:37,160 --> 00:05:41,920
I think research is a little bit more grounded in what is being accomplished and the tasks

78
00:05:41,920 --> 00:05:43,400
that are being accomplished and things like that.

79
00:05:43,400 --> 00:05:48,440
Whereas the discourse on Twitter sometimes gets a little unhinged and people think that

80
00:05:48,440 --> 00:05:52,520
there'll be no work in like five years and there'll be nothing left to do and that everything

81
00:05:52,520 --> 00:05:57,920
will be solved and that there will just be AIs, you know, superintelligence and

82
00:05:57,920 --> 00:05:58,920
things like that.

83
00:05:58,920 --> 00:06:04,040
So that goes off the rails a little bit far from where I think the technology is and kind

84
00:06:04,040 --> 00:06:08,280
of where I think the capabilities are going.

85
00:06:08,280 --> 00:06:12,080
Well, you've mentioned AGI, or artificial general intelligence, which is what we would consider

86
00:06:12,080 --> 00:06:19,160
a mind, you know, something that has its own intentions, its own, you know, belief system

87
00:06:19,160 --> 00:06:24,280
or its own worldview and its own agenda, which hopefully will be in alignment with human

88
00:06:24,280 --> 00:06:26,200
priorities, but you know, possibly not.

89
00:06:26,200 --> 00:06:31,880
But you know, what we're talking about with GPT, OpenAI and you know, the similar products

90
00:06:31,880 --> 00:06:39,320
from Google and well, I guess talking about OpenAI is talking about Microsoft basically.

91
00:06:39,320 --> 00:06:40,680
That's not what we're talking about at all.

92
00:06:40,680 --> 00:06:46,320
These things are not minds and in particular, they don't have much in the way of long term

93
00:06:46,320 --> 00:06:47,320
concepts.

94
00:06:47,320 --> 00:06:48,320
They don't model the world.

95
00:06:48,320 --> 00:06:51,840
And if you're not really modeling the world around you and thinking about how your actions

96
00:06:51,840 --> 00:06:56,600
might play out in the world in the future, you're not really a mind.

97
00:06:56,600 --> 00:06:57,600
But what are you?

98
00:06:57,600 --> 00:07:00,000
If you're one of these chatbots, what are you?

99
00:07:00,000 --> 00:07:05,240
Yeah, that's, I'll say that's a question that I've been focusing on a lot recently and one

100
00:07:05,240 --> 00:07:09,280
that I hope to pin down a little bit better in the near future.

101
00:07:09,280 --> 00:07:17,480
But so right now I'm working on a project on what are the limits from learning, limits

102
00:07:17,480 --> 00:07:19,140
of learning from text alone.

103
00:07:19,140 --> 00:07:25,400
So you have a large amount of text information and what knowledge of the world can come from

104
00:07:25,400 --> 00:07:26,400
that.

105
00:07:26,400 --> 00:07:30,000
What are the limits of world knowledge that come from learning from only text?

106
00:07:30,000 --> 00:07:35,080
I think text is misleading in a lot of ways.

107
00:07:35,080 --> 00:07:39,040
Writing a grammatical sentence or generating a grammatical sentence.

108
00:07:39,040 --> 00:07:44,400
It seems like there's intention behind it, but there's not.

109
00:07:44,400 --> 00:07:45,840
This is the project I'm focusing on right now.

110
00:07:45,840 --> 00:07:55,280
So there's an idea from semiotics that a sign has two components: a signifier and a signified.

111
00:07:55,280 --> 00:07:59,520
So you have the piece that relates to the world, that's what's being signified.

112
00:07:59,520 --> 00:08:03,880
And then you have the piece that is more to the forefront of what the concept is and that's

113
00:08:03,880 --> 00:08:06,800
the signifier, what kind of conveys the information.

114
00:08:06,800 --> 00:08:13,360
So with learning from text alone, there's not world knowledge that comes into that.

115
00:08:13,360 --> 00:08:17,760
The text is a representation of the world and when we read text, we bring all of our

116
00:08:17,760 --> 00:08:20,960
knowledge of the world to that reading.

117
00:08:20,960 --> 00:08:27,440
But learning from text alone, you don't necessarily need that world knowledge to mimic human-like

118
00:08:27,440 --> 00:08:28,440
text.

119
00:08:28,440 --> 00:08:34,600
There's an idea from information theory, it's one of the core ideas of information theory.

120
00:08:34,600 --> 00:08:37,120
But human language contains a lot of redundancy.

121
00:08:37,120 --> 00:08:43,760
So when I'm talking now and I say a sentence, I might say one thing and then rephrase it

122
00:08:43,760 --> 00:08:45,120
in a different way and things like that.

123
00:08:45,120 --> 00:08:48,560
So there are a lot of different ways that you can phrase things within natural language

124
00:08:48,560 --> 00:08:54,120
so that meaning is conveyed and including that redundancy makes it easier to convey

125
00:08:54,120 --> 00:08:55,480
that information.

126
00:08:55,480 --> 00:08:59,160
I read something, I didn't finish it unfortunately because I started just a few minutes before

127
00:08:59,160 --> 00:09:04,860
our appointed time here, but something from your blog talking about language and information

128
00:09:04,860 --> 00:09:09,920
and qualitative information versus quantitative information.

129
00:09:09,920 --> 00:09:16,120
And then I think that was going to lead into a talk of statistics and how these large language

130
00:09:16,120 --> 00:09:20,960
models and the algorithms that are, you know, the large language models basically are making

131
00:09:20,960 --> 00:09:26,920
statistical inferences between words when coming up with responses to input.

132
00:09:26,920 --> 00:09:31,640
And that seems very, very lifelike in a way now that it didn't a couple of years ago.

133
00:09:31,640 --> 00:09:36,360
What's going on and what are the recent developments that have made the technology a lot more compelling

134
00:09:36,360 --> 00:09:39,760
to people who didn't have a prior interest in artificial intelligence?

135
00:09:39,760 --> 00:09:48,600
You know, I think the reason that large language models are gaining so much attention now is

136
00:09:48,600 --> 00:09:54,960
less a shift in what the technology is doing and more a shift in how it's being presented

137
00:09:54,960 --> 00:09:57,720
to a broader audience.

138
00:09:57,720 --> 00:10:05,600
So with ChatGPT, for example, it exists in a user-friendly window.

139
00:10:05,600 --> 00:10:09,680
So you can, any user can easily access the model.

140
00:10:09,680 --> 00:10:11,560
They can have a conversation with it.

141
00:10:11,560 --> 00:10:15,760
It produces text as everyone's been talking about for the last couple of months.

142
00:10:15,760 --> 00:10:17,440
That's something novel.

143
00:10:17,440 --> 00:10:24,400
But a lot of these capabilities are extending from previous work, pretty directly from

144
00:10:24,400 --> 00:10:25,520
previous work.

145
00:10:25,520 --> 00:10:30,960
So you have the transformer architecture that was introduced in, I think, 2017.

146
00:10:30,960 --> 00:10:34,320
And from there, everything has been pretty linear.

147
00:10:34,320 --> 00:10:37,000
Models that are built now are basically just scaled up from there.

148
00:10:37,000 --> 00:10:41,800
You have the addition of reinforcement learning from human feedback.

149
00:10:41,800 --> 00:10:48,480
And that's the, I guess, novel contribution of ChatGPT other than the presentation, the

150
00:10:48,480 --> 00:10:50,640
user experience component of it.

151
00:10:50,640 --> 00:10:57,840
So with reinforcement learning with human feedback, that's why if you're interacting

152
00:10:57,840 --> 00:11:02,080
with ChatGPT, you get a lot of responses that say, you know, I'm a language model trained

153
00:11:02,080 --> 00:11:03,080
by OpenAI.

154
00:11:03,080 --> 00:11:04,360
I can't answer that or something like that.

155
00:11:04,360 --> 00:11:10,880
So a lot of the backtracking by the model is taken care of through that process.

156
00:11:10,880 --> 00:11:15,840
But yeah, language models have been decent for a while.

157
00:11:15,840 --> 00:11:22,560
But I think now the smoothness of the grammar and the smoothness of the logic is something

158
00:11:22,560 --> 00:11:24,520
that's kind of impressive to people.

159
00:11:24,520 --> 00:11:29,200
It's kind of unexpected for a computer to generate language that's as smooth as the

160
00:11:29,200 --> 00:11:31,000
language generated by these models.

161
00:11:31,000 --> 00:11:35,000
I think for the most part, that comes from scale.

162
00:11:35,000 --> 00:11:38,920
But when you start to talk about scale and how that is where the improvement comes from,

163
00:11:38,920 --> 00:11:45,200
things get a little complicated because I think people expect scale to be all you need

164
00:11:45,200 --> 00:11:48,840
and that if you continue scaling up, you'll know everything kind of thing.

165
00:11:48,840 --> 00:11:56,560
And I think that's unreasonable because not everything is written down.

166
00:11:56,560 --> 00:12:01,680
And I actually wrote about that a little bit recently.

167
00:12:01,680 --> 00:12:07,880
So as I was talking about earlier, I think there are limits to learning from text.

168
00:12:07,880 --> 00:12:12,920
I don't think that adding more text or more parameters to a model necessarily gets you

169
00:12:12,920 --> 00:12:15,520
any further intelligence-wise.

170
00:12:15,520 --> 00:12:23,280
Well, it seems like there's been some threshold that has been crossed where the interaction

171
00:12:23,280 --> 00:12:27,160
with language models is now of interest to a lot more people than it was before because

172
00:12:27,160 --> 00:12:31,400
it does seem intuitively as if there is some entity on the other end of this thing that

173
00:12:31,400 --> 00:12:37,440
it's not just chopping up your sentence, rearranging it according to some linguistic formula and

174
00:12:37,440 --> 00:12:41,880
then giving it back to you, which is what previous things like ELIZA did.

175
00:12:41,880 --> 00:12:49,320
But even in the days of ELIZA, which is a decades-old sort of psychology chat program,

176
00:12:49,320 --> 00:12:51,000
people would be invested in it.

177
00:12:51,000 --> 00:12:57,800
And I think they would intuitively understand what questions and what they could feed into

178
00:12:57,800 --> 00:13:00,840
the chat interface that wouldn't come back mangled.

179
00:13:00,840 --> 00:13:04,240
You just get a sense for what you can say to this thing that's going to generate a seemingly

180
00:13:04,240 --> 00:13:05,240
meaningful response.

181
00:13:05,240 --> 00:13:10,200
And I think there's still some element of that going on because I interact with chatbots

182
00:13:10,200 --> 00:13:12,840
where I understand there are certain things I can say they're just not going to follow

183
00:13:12,840 --> 00:13:17,080
or they're not going to behave the way that I'm expecting them to.

184
00:13:17,080 --> 00:13:20,020
So there's sort of a learned skill in interacting with them.

185
00:13:20,020 --> 00:13:24,280
But it's a much lower bar now than it was before.

186
00:13:24,280 --> 00:13:30,880
And I think a point that either I've attributed to you or I've actually heard you make it

187
00:13:30,880 --> 00:13:37,960
is that there is much more going on in human cognition than just finding statistical correlations

188
00:13:37,960 --> 00:13:42,400
between words or groups of words, that we actually have concepts.

189
00:13:42,400 --> 00:13:48,240
We have a mental representation of the world that we are acting from.

190
00:13:48,240 --> 00:13:55,600
So I understand that I am talking to another person at a distance, but that you have experiences,

191
00:13:55,600 --> 00:14:00,120
that you have skills, that you have areas of expertise, that you have things that you

192
00:14:00,120 --> 00:14:02,120
want, things that you don't want.

193
00:14:02,120 --> 00:14:07,420
And when we are engaging with chatbots, the chatbot doesn't have any sort of mental representation

194
00:14:07,420 --> 00:14:13,360
of us that is similar to the one that I have of you, of being another person, of being

195
00:14:13,360 --> 00:14:17,980
an entity who persists in time and exists in a particular place and has certain needs

196
00:14:17,980 --> 00:14:21,320
and limitations to their physical existence.

197
00:14:21,320 --> 00:14:25,360
These things that we're interacting with don't have any of that.

198
00:14:25,360 --> 00:14:27,400
But they still seem rather compelling.

199
00:14:27,400 --> 00:14:30,720
And I'll just stop talking and rather than giving you a question, just let you pick it

200
00:14:30,720 --> 00:14:34,480
up from there and say what you find interesting from that.

201
00:14:34,480 --> 00:14:43,040
Yeah, so one thing that I've been thinking a lot about recently is the role of embodiment

202
00:14:43,040 --> 00:14:45,340
in human intelligence.

203
00:14:45,340 --> 00:14:55,240
So as I mentioned, learning from text alone, there are limits to learning from text alone.

204
00:14:55,240 --> 00:14:58,880
Text is one modality of receiving information.

205
00:14:58,880 --> 00:15:06,880
But as a person, as an embodied entity or being or however you want to refer to people,

206
00:15:06,880 --> 00:15:13,760
but as a person with a physical form, I mean, you take in sensory inputs with your whole

207
00:15:13,760 --> 00:15:15,060
body.

208
00:15:15,060 --> 00:15:23,160
So visual inputs, touch, smell, all the senses.

209
00:15:23,160 --> 00:15:29,080
So when I say, if I put my hands on the table and I have a sense of the table being hard

210
00:15:29,080 --> 00:15:35,600
and solid underneath my hands, that's an embodied experience that I...

211
00:15:35,600 --> 00:15:42,120
It gets weird to start to talk about embodiment, but it's much more multimodal

212
00:15:42,120 --> 00:15:50,080
to experience how solid the table is under my hands than reading that a table is solid

213
00:15:50,080 --> 00:15:54,440
and that you can exert a force on it just verbally.

214
00:15:54,440 --> 00:16:01,260
So I think there's a pretty large component of human learning that is from being embodied

215
00:16:01,260 --> 00:16:04,920
and going through the world and having direct experiences with the world.

216
00:16:04,920 --> 00:16:09,600
In the past, I've described it as, you could hear somebody talk about an experience and

217
00:16:09,600 --> 00:16:13,280
having an experience and going through their lives and making decisions and things like

218
00:16:13,280 --> 00:16:16,960
that, but it's not until you're put in a situation that you really know what it feels like to

219
00:16:16,960 --> 00:16:20,120
be in that situation or what a place is like.

220
00:16:20,120 --> 00:16:24,120
You have to experience something firsthand, I think, to really understand it.

221
00:16:24,120 --> 00:16:28,320
And language models obviously don't have that experience.

222
00:16:28,320 --> 00:16:33,800
They can say things that seem like they have that experience or they say things that align

223
00:16:33,800 --> 00:16:38,960
with what human experience is because they're trained on language that's produced by humans

224
00:16:38,960 --> 00:16:41,960
who are describing their experiences.

225
00:16:41,960 --> 00:16:47,720
In general, when people communicate, when they write things down, they're describing

226
00:16:47,720 --> 00:16:50,920
something that aligns with their experience.

227
00:16:50,920 --> 00:16:54,520
People lie and they're dishonest and they make up stories and things like that, but

228
00:16:54,520 --> 00:16:58,760
in general, it still aligns to some components of the human experience.

229
00:16:58,760 --> 00:17:05,080
Not many people go around just typing complete nonsense onto the internet where it's just

230
00:17:05,080 --> 00:17:07,860
random words with no semantic content.

231
00:17:07,860 --> 00:17:11,800
If that was much more prevalent, it'd be harder to train a language model because it wouldn't

232
00:17:11,800 --> 00:17:12,800
be in the data.

233
00:17:12,800 --> 00:17:14,480
The human experience wouldn't be captured in the data.

234
00:17:14,480 --> 00:17:22,280
So there's this balance of text is a good medium for communicating some information,

235
00:17:22,280 --> 00:17:28,560
but it is not necessarily a good medium for measuring the world and conveying it in a

236
00:17:28,560 --> 00:17:36,200
way that lends to quantification, which is another thing that I'm currently interested

237
00:17:36,200 --> 00:17:40,960
in thinking about, but we can talk more about that later if you want.

238
00:17:40,960 --> 00:17:43,320
Where you think it fits in best.

239
00:17:43,320 --> 00:17:47,520
As I mentioned, I started to read your most recent blog piece and it is about the differences

240
00:17:47,520 --> 00:17:54,000
between using numbers and using words to try to describe the world.

241
00:17:54,000 --> 00:18:02,520
Numbers exist in a very structured and predictable continuum and words, while there is some overlap

242
00:18:02,520 --> 00:18:07,900
and there are hierarchical relationships between concepts, they're really freestanding entities.

243
00:18:07,900 --> 00:18:13,100
Each word has its own definition and it doesn't appear on a number line.

244
00:18:13,100 --> 00:18:19,160
You can't infer its relationship to other words just by its position on the non-existent

245
00:18:19,160 --> 00:18:20,800
number line.

246
00:18:20,800 --> 00:18:29,360
So a point that I think I recall you making is that a lot of what's happening now with

247
00:18:29,360 --> 00:18:34,280
large language models and also transformer technology as it's being applied to image

248
00:18:34,280 --> 00:18:40,640
generation is something that is just going to find its way into familiar software, like

249
00:18:40,640 --> 00:18:47,360
your video editing software or your maybe writer's software, but it's going to be a

250
00:18:47,360 --> 00:18:53,240
few techniques and tools that are useful that find their way into the familiar menus and

251
00:18:53,240 --> 00:18:54,960
things that you're used to navigating.

252
00:18:54,960 --> 00:18:59,480
But for the most part, this isn't really going anywhere.

253
00:18:59,480 --> 00:19:04,240
Right now there's a lot of venture capital that's chasing the killer application for

254
00:19:04,240 --> 00:19:09,320
this transformer-based AI technology that in all likelihood isn't really going to amount

255
00:19:09,320 --> 00:19:10,320
to much.

256
00:19:10,320 --> 00:19:13,320
Have I got that right?

257
00:19:13,320 --> 00:19:18,480
I think it's tough because I think there's a lot of potential to incorporate existing

258
00:19:18,480 --> 00:19:22,600
methods into backend processes and things like that.

259
00:19:22,600 --> 00:19:27,400
There are a lot of tasks that require decisions to be made based on text information, even

260
00:19:27,400 --> 00:19:31,200
things as simple as sentiment classification.

261
00:19:31,200 --> 00:19:33,800
You want to know if a review for a product is positive or negative.

262
00:19:33,800 --> 00:19:39,600
Things like that are improved with these large models that have a better...

263
00:19:39,600 --> 00:19:43,320
They have a larger semantic space and I think there's more nuance in the way that words

264
00:19:43,320 --> 00:19:46,200
fit together within that larger semantic space.

265
00:19:46,200 --> 00:19:54,000
So classification tasks, things like that become much, much easier when you can leverage

266
00:19:54,000 --> 00:19:58,360
a model trained on a ton of data.

267
00:19:58,360 --> 00:20:04,360
I think where the limits really are, are on the other side of things.

268
00:20:04,360 --> 00:20:13,200
So producing textual artifacts, I don't think that is a solved problem yet.

269
00:20:13,200 --> 00:20:17,160
I think that's why when you talk about language generation, I mean the conversation around

270
00:20:17,160 --> 00:20:21,760
language generation has focused on college students cheating on essays.

271
00:20:21,760 --> 00:20:30,440
There's some talk of research output and plagiarism and research output and things like that.

272
00:20:30,440 --> 00:20:38,320
A deluge of text artifacts produced for the internet, things like bots that seem like

273
00:20:38,320 --> 00:20:41,000
they're lifelike and things like that.

274
00:20:41,000 --> 00:20:49,420
But I think those applications highlight the limits of current methods.

275
00:20:49,420 --> 00:20:55,600
So with producing essay content for a college student, a lot of the essays that might be

276
00:20:55,600 --> 00:21:00,640
asked for a college student to produce are essentially repeating information, rephrasing

277
00:21:00,640 --> 00:21:03,480
their own words, things like that.

278
00:21:03,480 --> 00:21:11,600
It's not so complex as a writing task compared to something like if you do write a research

279
00:21:11,600 --> 00:21:16,720
paper, you might be able to automate some of the abstract writing for pretty general topics.

280
00:21:16,720 --> 00:21:22,920
But when you get down to it and you're incorporating different aspects of papers, very subtle nuances

281
00:21:22,920 --> 00:21:29,320
of papers, like in my work I'm trying to focus a lot on how methods are presented and kind

282
00:21:29,320 --> 00:21:33,840
of the rhetoric and grounding of a method for an application.

283
00:21:33,840 --> 00:21:40,440
So it's a very specific aspect of a written document that I'm focusing on and current

284
00:21:40,440 --> 00:21:44,200
methods don't help me do that.

285
00:21:44,200 --> 00:21:49,080
So I think there are a lot of writing tasks that cannot even begin to be approached, maybe

286
00:21:49,080 --> 00:21:56,000
approached in some cases, but aren't really approached well with current language models.

287
00:21:56,000 --> 00:22:01,960
Another example and one that I know a little bit about is the writing of contracts for

288
00:22:01,960 --> 00:22:02,960
companies.

289
00:22:02,960 --> 00:22:09,560
So a lot of contract writing and law as a field is a profession that a lot of people,

290
00:22:09,560 --> 00:22:13,200
I think it is controversial, but I think a lot of people think that it can be automated

291
00:22:13,200 --> 00:22:14,200
away pretty easily.

292
00:22:14,200 --> 00:22:20,520
You know, contracts in a lot of cases start from a template, but where the difficulty

293
00:22:20,520 --> 00:22:27,920
in moving from a person writing a template and sending that template to another company

294
00:22:27,920 --> 00:22:30,760
and having that company review it and things like that.

295
00:22:30,760 --> 00:22:36,800
So the difficulty of that process lies in the subtleties of the language and the needs

296
00:22:36,800 --> 00:22:39,600
of both sides of the contract.

297
00:22:39,600 --> 00:22:44,880
So if I was representing a company, I would have my company's information that I would

298
00:22:44,880 --> 00:22:46,320
bring into that negotiation process.

299
00:22:46,320 --> 00:22:51,000
And then the person on the other side brings similar needs for the company they're representing

300
00:22:51,000 --> 00:22:52,200
into that negotiation process.

301
00:22:52,200 --> 00:22:56,080
And we might be talking about very subtle, nuanced things.

302
00:22:56,080 --> 00:23:02,960
And there's really no room for automation in that process because it might be, you know,

303
00:23:02,960 --> 00:23:07,480
there are one-off cases or there's a lot of different components that get incorporated

304
00:23:07,480 --> 00:23:09,400
into the language there.

305
00:23:09,400 --> 00:23:13,600
So it's not just like, oh, generate a template for me or generate a contract that does X,

306
00:23:13,600 --> 00:23:18,320
Y, Z, because X, Y, Z, you know, there might be thousands of variables that are technically

307
00:23:18,320 --> 00:23:22,760
being considered by both sides when they're reviewing this work.

308
00:23:22,760 --> 00:23:25,600
And it's not necessarily just based on past work.

309
00:23:25,600 --> 00:23:29,240
It's based on the present contract that's being developed.

310
00:23:29,240 --> 00:23:35,080
So I've known people who are not lawyers who used to work for lawyers, basically reading

311
00:23:35,080 --> 00:23:39,760
through a lot of documents, looking for relevant passages.

312
00:23:39,760 --> 00:23:43,120
And that is the sort of work that these language models are pretty good at.

313
00:23:43,120 --> 00:23:47,120
So it would be hard to deny that this is costing some people some work.

314
00:23:47,120 --> 00:23:48,480
In fact, it already has.

315
00:23:48,480 --> 00:23:53,520
Yeah, I think I mean, for things like information retrieval and scanning through large numbers

316
00:23:53,520 --> 00:23:57,600
of documents to find relevant information, I think that's a kind of a different task

317
00:23:57,600 --> 00:24:00,400
than what I'm talking about.

318
00:24:00,400 --> 00:24:06,920
In the negotiation process, you might have arguments over a couple of words.

319
00:24:06,920 --> 00:24:11,480
And it really depends on, I mean, without an example, it's kind of hard to get into nuance.

320
00:24:11,480 --> 00:24:12,480
I'm not a lawyer.

321
00:24:12,480 --> 00:24:16,480
Actually, my husband's a contract attorney, which is why I have a lot of knowledge to

322
00:24:16,480 --> 00:24:18,240
pull from in the contract space.

323
00:24:18,240 --> 00:24:24,480
He talks a lot about, you know, there might just be one clause that gets argued over.

324
00:24:24,480 --> 00:24:31,640
It's just based on the location of the client or not client customers and a lot of nuance

325
00:24:31,640 --> 00:24:35,760
of the law and things like that, things that aren't even necessarily, you know, that might

326
00:24:35,760 --> 00:24:42,680
be written in actually, I don't know the word for there's a lot of nuance for, you know,

327
00:24:42,680 --> 00:24:50,560
it's not like the law, there are things that are put together that cases and things that

328
00:24:50,560 --> 00:24:52,080
there's a whole set of legal documents.

329
00:24:52,080 --> 00:24:55,600
I don't know all the nuance of the different kinds of documents and things like that.

330
00:24:55,600 --> 00:25:02,320
But I mean, a lot of the legal profession is based on precedent and extrapolating from precedent

331
00:25:02,320 --> 00:25:06,760
and things like that, I don't think it's a solved problem.

332
00:25:06,760 --> 00:25:11,560
Well earlier, you used a phrase that I wrote down because I knew I would want to ask you

333
00:25:11,560 --> 00:25:12,560
about it.

334
00:25:12,560 --> 00:25:18,040
And the phrase is producing textual artifacts is not a solved problem.

335
00:25:18,040 --> 00:25:25,040
So what is a textual artifact and what is a solved problem and how do these two items

336
00:25:25,040 --> 00:25:26,040
not overlap?

337
00:25:26,040 --> 00:25:33,800
Yeah, well, I remember saying textual artifacts and I remember saying things aren't a solved

338
00:25:33,800 --> 00:25:34,800
problem.

339
00:25:34,800 --> 00:25:40,080
I don't remember putting them together exactly like that, but I might have.

340
00:25:40,080 --> 00:25:47,960
Anyway, so I think in the conversation around language generation, I think there's kind

341
00:25:47,960 --> 00:25:54,120
of a shallow appreciation for writing in the conversation around language generation.

342
00:25:54,120 --> 00:25:57,480
Writing can be a lot of different things and it has a lot of different dimensions.

343
00:25:57,480 --> 00:26:03,120
And I think current models are pretty good at writing syntactically smooth sentences,

344
00:26:03,120 --> 00:26:07,560
writing sentences that mostly cohere with expectation.

345
00:26:07,560 --> 00:26:12,560
They align with what I would expect to see on the internet about a topic for the most

346
00:26:12,560 --> 00:26:13,560
part.

347
00:26:13,560 --> 00:26:15,160
There are issues that come up.

348
00:26:15,160 --> 00:26:20,920
There's been a lot of discussion about how language models aren't sufficient as information

349
00:26:20,920 --> 00:26:21,920
retrieval systems.

350
00:26:21,920 --> 00:26:22,920
They don't necessarily.

351
00:26:22,920 --> 00:26:27,760
From what I've seen with Bing, it seems like they're pulling in sources and things like

352
00:26:27,760 --> 00:26:28,760
that, which is interesting.

353
00:26:28,760 --> 00:26:32,960
And I'd have to play around more with that to really know how well that works.

354
00:26:32,960 --> 00:26:41,280
But even still, I think the strengths are in that kind of aligning with expectation

355
00:26:41,280 --> 00:26:44,120
and writing grammatical sentences.

356
00:26:44,120 --> 00:26:46,160
But there's a lot of other things that writing can be.

357
00:26:46,160 --> 00:26:50,040
There's a lot of abstract information that's contained in writing, things about rhetoric

358
00:26:50,040 --> 00:26:58,840
and things that convey information about a person, the person who's speaking, even unintentionally.

359
00:26:58,840 --> 00:27:02,320
So it's tough.

360
00:27:02,320 --> 00:27:04,480
There's music to writing.

361
00:27:04,480 --> 00:27:06,120
There's a phonetic element.

362
00:27:06,120 --> 00:27:07,520
There's the way that it sounds.

363
00:27:07,520 --> 00:27:10,880
There's cadence to words.

364
00:27:10,880 --> 00:27:19,240
There are certain aspects that really strong writers, I think, pay attention to that the

365
00:27:19,240 --> 00:27:25,080
general population of people who produce written artifacts do not necessarily pay attention

366
00:27:25,080 --> 00:27:26,080
to.

367
00:27:26,080 --> 00:27:29,800
So I think that that information is not necessarily encoded in the majority of texts.

368
00:27:29,800 --> 00:27:37,120
I don't think that even if it is encoded, well, in some areas of the text, how do you

369
00:27:37,120 --> 00:27:38,560
request that kind of information?

370
00:27:38,560 --> 00:27:45,800
How do you get a language model to reproduce the music of language in writing and things

371
00:27:45,800 --> 00:27:46,800
like that?

372
00:27:46,800 --> 00:27:50,120
So that's one aspect that I feel like is...

373
00:27:50,120 --> 00:27:55,520
And that's a weird aspect to want solved even, because thinking about the phonetics of language,

374
00:27:55,520 --> 00:27:57,400
that's almost like a tic.

375
00:27:57,400 --> 00:28:01,360
I do that, and it's really probably not helping anyone.

376
00:28:01,360 --> 00:28:05,760
But I think it's interesting to think about writing on that level.

377
00:28:05,760 --> 00:28:09,720
And again, you could probably prompt a language model to produce something like that, but

378
00:28:09,720 --> 00:28:12,880
what is the interaction between...

379
00:28:12,880 --> 00:28:14,040
How do you convey the need?

380
00:28:14,040 --> 00:28:18,640
How do you get a language model to write in a different style?

381
00:28:18,640 --> 00:28:21,080
In my opinion, a lot of...

382
00:28:21,080 --> 00:28:25,960
It varies by model, but from what I've seen for the most part, especially with chat...

383
00:28:25,960 --> 00:28:27,920
Not ChatGPT.

384
00:28:27,920 --> 00:28:34,920
GPT-3, prompting GPT-3, everything sounded very generic, very much like a blog post,

385
00:28:34,920 --> 00:28:40,240
really middle of the road, kind of boring, kind of like a person without a personality.

386
00:28:40,240 --> 00:28:42,400
I think ChatGPT is a little bit better.

387
00:28:42,400 --> 00:28:43,760
It doesn't really have much personality.

388
00:28:43,760 --> 00:28:44,760
I think it's adjustable though.

389
00:28:44,760 --> 00:28:47,560
I think the temperature, you can mess with, I'm pretty positive.

390
00:28:47,560 --> 00:28:54,760
Well, let me push back against that a little bit, because I did mess around with ChatGPT.

391
00:28:54,760 --> 00:28:56,240
I'm no longer welcome, apparently.

392
00:28:56,240 --> 00:28:58,720
I can't interface with it any longer.

393
00:28:58,720 --> 00:29:05,960
But I had it write things in iambic pentameter, which is the metered verse structure that

394
00:29:05,960 --> 00:29:08,920
Shakespeare used for most of his plays.

395
00:29:08,920 --> 00:29:12,960
I had it write in the style of like 90s gangster rap.

396
00:29:12,960 --> 00:29:17,680
It was pretty good at the music of language.

397
00:29:17,680 --> 00:29:20,740
What it's bad at is remembering what you're talking about.

398
00:29:20,740 --> 00:29:27,580
What it's bad at is all the things that would require an ongoing persistent internal model

399
00:29:27,580 --> 00:29:33,240
of the world, which these things, they just don't have any mechanism for creating or sustaining

400
00:29:33,240 --> 00:29:35,320
those sorts of representations.

401
00:29:35,320 --> 00:29:41,880
They're just finding words and phrases that have a statistical correlation to the input

402
00:29:41,880 --> 00:29:44,640
that you've given.

403
00:29:44,640 --> 00:29:49,200
Particularly when it comes to text-to-image generation algorithms, they're really, really

404
00:29:49,200 --> 00:29:54,400
good at the subtleties of light and brush stroke and advanced technique.

405
00:29:54,400 --> 00:30:00,320
But they can't remember that a human figure only has two arms or that a human figure has

406
00:30:00,320 --> 00:30:05,480
five or four fingers and a thumb on each hand.

407
00:30:05,480 --> 00:30:10,240
It's weird the things they get wrong, because they're so, so good at the stuff that seems

408
00:30:10,240 --> 00:30:17,280
like it requires human creativity and sensitivity and expressiveness and the things that seem

409
00:30:17,280 --> 00:30:24,520
to require just a basic comprehension of the persistence of objects in space or things

410
00:30:24,520 --> 00:30:28,200
that are just so basic we don't even feel the need to articulate them.

411
00:30:28,200 --> 00:30:31,400
These are the things that the AI is getting wrong right now.

412
00:30:31,400 --> 00:30:36,240
I think you kind of answered, not that you had a question, but you kind of answered your

413
00:30:36,240 --> 00:30:41,640
own question, even though it wasn't a question.

414
00:30:41,640 --> 00:30:47,320
But you said that the things that we don't have to convey or something like that, things

415
00:30:47,320 --> 00:30:49,120
we don't have to put into words, something like that.

416
00:30:49,120 --> 00:30:50,840
And that's the exact problem.

417
00:30:50,840 --> 00:30:55,680
There's a lot of things that we don't feel the need to express in detail like that.

418
00:30:55,680 --> 00:31:00,880
And there are a lot of things that you pointed out, and I'd like to go through each of them

419
00:31:00,880 --> 00:31:01,880
if I could.

420
00:31:01,880 --> 00:31:08,440
But for images and generating hands and things like that, I mean, a big problem with that

421
00:31:08,440 --> 00:31:14,480
is if you're thinking about compositionality of an image, the thing that comes after a

422
00:31:14,480 --> 00:31:16,480
finger tends to be like another finger.

423
00:31:16,480 --> 00:31:22,400
So it miscalculates how many because it doesn't have necessarily a concept of a hand has five

424
00:31:22,400 --> 00:31:27,000
fingers, but it has a concept of a finger followed by a finger.

425
00:31:27,000 --> 00:31:32,680
I think that's a similar reason to why sometimes language models start to repeat.

426
00:31:32,680 --> 00:31:37,520
They fall into a loop where they end up repeating the same phrase or a couple of words over

427
00:31:37,520 --> 00:31:43,280
and over again, because there is that probabilistic nature to the learning.

428
00:31:43,280 --> 00:31:49,560
And I think going back even further to what you said, talking about iambic pentameter

429
00:31:49,560 --> 00:32:00,280
and rap and things like that, I think those are pretty narrow, structured, organized genres.

430
00:32:00,280 --> 00:32:05,440
I think those fall into the same camp as things like the romance novel, where there's a lot

431
00:32:05,440 --> 00:32:09,760
of stories that follow kind of formulaic patterns.

432
00:32:09,760 --> 00:32:14,400
But in my opinion, that's, I don't want to say it's not writing.

433
00:32:14,400 --> 00:32:22,640
But I feel like writing is more than producing formulaic kind of genre structure.

434
00:32:22,640 --> 00:32:27,560
Well, I don't know too much about generating poetry with language models.

435
00:32:27,560 --> 00:32:28,800
I know people do it.

436
00:32:28,800 --> 00:32:30,560
I'm sure some people do it decently well.

437
00:32:30,560 --> 00:32:32,440
Honestly, I think most poetry is bad, though.

438
00:32:32,440 --> 00:32:35,960
I think most people who write poetry, even people who write poetry, do a terrible job

439
00:32:35,960 --> 00:32:36,960
at it.

440
00:32:36,960 --> 00:32:42,480
So I think what I'm talking about is mostly how I like to write.

441
00:32:42,480 --> 00:32:49,560
But just thinking about things at a nuanced level, at a detail level, kind of approaching

442
00:32:49,560 --> 00:32:55,240
the language organically, which as far as poetry goes, I prefer kind of like free, sort

443
00:32:55,240 --> 00:33:04,200
of more experimental work or kind of a prose poetry where it's not explicit where the music

444
00:33:04,200 --> 00:33:09,840
or where the metaphor and the different levels of the poetry comes from as much.

445
00:33:09,840 --> 00:33:11,840
It's not as rule based.

446
00:33:11,840 --> 00:33:16,480
Again, I don't know all the details of how you would generate iambic pentameter or how

447
00:33:16,480 --> 00:33:19,200
well language models do that.

448
00:33:19,200 --> 00:33:25,800
And I think that there's a level of detail that generative models don't take.

449
00:33:25,800 --> 00:33:31,040
I don't know, when I interact with a language model, I don't feel like I'm talking to a

450
00:33:31,040 --> 00:33:38,040
person who is the most sophisticated thinker in a subject.

451
00:33:38,040 --> 00:33:42,280
I feel like I'm getting more general information like you would find on a blog or something

452
00:33:42,280 --> 00:33:43,280
like that.

453
00:33:43,280 --> 00:33:48,920
So I think, and again, there's a lot of different factors to consider and you can mess with

454
00:33:48,920 --> 00:33:53,000
different parameters like temperature and you get more creativity.

455
00:33:53,000 --> 00:33:56,440
But creativity is tough because you have creativity in language and then all of a sudden you might

456
00:33:56,440 --> 00:34:00,600
have things that just don't sound like a human word anymore.

457
00:34:00,600 --> 00:34:06,160
You have things where you think that we're on a planet that's going or you think that

458
00:34:06,160 --> 00:34:09,920
we're on a planet where the sun revolves around us and the language model is telling you that

459
00:34:09,920 --> 00:34:13,800
no, actually we are going around the sun.

460
00:34:13,800 --> 00:34:18,640
And I thought, well, that doesn't comport with my expectations, so that must be wrong.

461
00:34:18,640 --> 00:34:25,360
So I think there's, honestly, I think from the reader's end, what a reader takes into

462
00:34:25,360 --> 00:34:29,760
an interaction with a language model, what they expect and what they get and how those

463
00:34:29,760 --> 00:34:38,640
two things don't always align and they can misalign in lots of different ways.

464
00:34:38,640 --> 00:34:42,560
Let me tell you about an experience that I had decades ago.

465
00:34:42,560 --> 00:34:44,000
I was a graduate student.

466
00:34:44,000 --> 00:34:45,440
I was studying philosophy.

467
00:34:45,440 --> 00:34:50,840
My specialty was philosophy of mind and philosophy of science and I was particularly focused

468
00:34:50,840 --> 00:34:52,080
on AI.

469
00:34:52,080 --> 00:34:57,240
But as a grad student, I also taught undergrads, you know, courses, basic philosophy courses.

470
00:34:57,240 --> 00:35:01,240
And at the end of the semester, there would be a bunch of papers to grade.

471
00:35:01,240 --> 00:35:05,160
But at the same time, I had my own papers to write, you know, my own classes to complete.

472
00:35:05,160 --> 00:35:06,520
It's a very tense time.

473
00:35:06,520 --> 00:35:13,480
And I've got this stack of essays from students of, you know, intro to ethics, say.

474
00:35:13,480 --> 00:35:16,400
And maybe they're three or four or five pages each.

475
00:35:16,400 --> 00:35:20,840
I would pick them up and read them, read just the first paragraph.

476
00:35:20,840 --> 00:35:28,160
If the first paragraph was clear, if it was grammatically correct, if it said what it

477
00:35:28,160 --> 00:35:31,560
intended to do in the rest of the paper, then I would just give it an A and I wouldn't even

478
00:35:31,560 --> 00:35:33,520
read the rest of the paper.

479
00:35:33,520 --> 00:35:39,560
Because, you know, to judge those well-written papers on their own merits and give them a

480
00:35:39,560 --> 00:35:43,760
grade would require that I give three quarters of the students just an F, you know, just

481
00:35:43,760 --> 00:35:46,760
straight off the bat F. You didn't learn to write in high school.

482
00:35:46,760 --> 00:35:51,720
You can't write a paper, you know, you just don't have the tools in your toolkit to actually

483
00:35:51,720 --> 00:35:52,800
fulfill this assignment.

484
00:35:52,800 --> 00:35:56,560
But I can't give three quarters of the students an F. So, you know, the people who are just

485
00:35:56,560 --> 00:35:59,920
moderately competent, they get an A, and then everybody else's papers

486
00:35:59,920 --> 00:36:06,060
I read laboriously, trying to figure out if they have any central argument that they're

487
00:36:06,060 --> 00:36:10,000
trying to support, you know, or if they have any comprehension of the material that we

488
00:36:10,000 --> 00:36:16,360
covered over the semester, you know, and it's like pulling teeth.

489
00:36:16,360 --> 00:36:20,960
If I get presented, you know, if I'm in that position and I get presented with a bunch

490
00:36:20,960 --> 00:36:27,240
of papers that are all very clear and all very competently composed but they're all

491
00:36:27,240 --> 00:36:32,600
in the same voice and I know that these, you know, these language models exist and can

492
00:36:32,600 --> 00:36:37,400
generate term papers, I would be really hard pressed to even bring myself to read them.

493
00:36:37,400 --> 00:36:42,000
I wouldn't know really how to grade them.

494
00:36:42,000 --> 00:36:46,400
I wouldn't know how much input, you know, each student gave to the language model in

495
00:36:46,400 --> 00:36:48,480
order to get the paper.

496
00:36:48,480 --> 00:36:54,040
You know, to me, it just kind of upends the whole table that the whole game is disrupted

497
00:36:54,040 --> 00:36:56,320
there because I wouldn't know how to respond to that.

498
00:36:56,320 --> 00:37:02,680
And at the same time, I would resent having to use my own meatspace energy and time

499
00:37:02,680 --> 00:37:04,880
to read papers written by AI.

500
00:37:04,880 --> 00:37:09,920
You know, it would be perverse if I, the instructor, had to spend more time reading the paper than

501
00:37:09,920 --> 00:37:12,240
the student had to generate it.

502
00:37:12,240 --> 00:37:16,560
And you know, this is, I haven't been in this position in decades, so, you know, it's not

503
00:37:16,560 --> 00:37:21,320
my hobby horse, but just projecting myself into the space of somebody who is doing the

504
00:37:21,320 --> 00:37:25,480
job I was doing 30 years ago, I certainly don't envy them.

505
00:37:25,480 --> 00:37:37,000
Yeah, I am, well, I am, I'm currently a TA in my program.

506
00:37:37,000 --> 00:37:39,320
And that's something I've thought about a bit.

507
00:37:39,320 --> 00:37:42,440
I haven't graded much yet.

508
00:37:42,440 --> 00:37:51,920
I, but yeah, I have similar concerns because if, exactly what you said, I almost, I don't

509
00:37:51,920 --> 00:37:57,040
even have really anything to add to that other than, yeah, I mean, it's a huge problem, but

510
00:37:57,040 --> 00:38:05,280
I think the problem goes back to before language models, because I think that certain aspects

511
00:38:05,280 --> 00:38:11,120
of education, the role of education in people's lives, I think needs to be reconsidered.

512
00:38:11,120 --> 00:38:17,560
I think that that's pretty clear, even not considering language models completely outside

513
00:38:17,560 --> 00:38:18,920
of AI and things like that.

514
00:38:18,920 --> 00:38:24,200
I think there's some issues with the relationship of people with education in society.

515
00:38:24,200 --> 00:38:31,920
But yeah, I think that there are definitely a lot of problems that are coming up now in

516
00:38:31,920 --> 00:38:37,680
relation to language models, but I think they kind of predate language models.

517
00:38:37,680 --> 00:38:44,760
I think the role of education in people's lives has went through a big shift in society

518
00:38:44,760 --> 00:38:51,600
over the last couple centuries, and I think that we're starting to find that the direction

519
00:38:51,600 --> 00:39:00,600
that we went in, college education is universally good, that idea and things like that, and

520
00:39:00,600 --> 00:39:03,320
that grades are good, measurements and things like that.

521
00:39:03,320 --> 00:39:07,360
There are a lot of different aspects of education that have been built into the structure of

522
00:39:07,360 --> 00:39:13,600
the education system that are now, we're now realizing don't necessarily lead to learning

523
00:39:13,600 --> 00:39:14,600
or good outcomes.

524
00:39:14,600 --> 00:39:22,320
So I think that in the case of generated homework assignments or generated text being submitted

525
00:39:22,320 --> 00:39:29,640
for homework assignments, clearly students don't think that doing the assignment is the

526
00:39:29,640 --> 00:39:33,360
important part of their education, if that's what they're doing.

527
00:39:33,360 --> 00:39:38,840
I think that indicates that education is kind of viewed as a means to an end.

528
00:39:38,840 --> 00:39:42,360
You get a degree and then you go out and you get a job.

529
00:39:42,360 --> 00:39:47,840
I think that a lot of people view their education as kind of separate from what they're going

530
00:39:47,840 --> 00:39:50,360
to need to know how to do on the job.

531
00:39:50,360 --> 00:39:53,400
I think that's especially apparent in tech.

532
00:39:53,400 --> 00:39:57,680
I think there's a really big culture of you don't have to have a college education to

533
00:39:57,680 --> 00:40:00,000
be a programmer, to be a developer.

534
00:40:00,000 --> 00:40:04,240
You can kind of learn the skills that you need to learn on the job, and I do think that's

535
00:40:04,240 --> 00:40:05,840
true to a large extent.

536
00:40:05,840 --> 00:40:13,120
I mean, for me personally, I did my bachelor's degree in chemical engineering and barely

537
00:40:13,120 --> 00:40:15,280
worked as an engineer.

538
00:40:15,280 --> 00:40:16,280
I had internships.

539
00:40:16,280 --> 00:40:23,360
I left directly after finishing undergrad and pursued analytics, so that's when I pivoted

540
00:40:23,360 --> 00:40:24,360
into data science.

541
00:40:24,360 --> 00:40:30,280
But for me, when I was going through undergrad and studying chemical engineering, I really

542
00:40:30,280 --> 00:40:32,080
liked the theory aspects of it.

543
00:40:32,080 --> 00:40:38,600
I liked taking the classes and doing the calculations and thinking about how the calculations and

544
00:40:38,600 --> 00:40:46,680
how the math underlying the science represented the world, kind of the interaction between

545
00:40:46,680 --> 00:40:52,720
something physical and a mathematical representation of the physical and how we can manipulate

546
00:40:52,720 --> 00:40:57,600
functions and things like that to understand real phenomena.

547
00:40:57,600 --> 00:41:01,280
But then when it came time for application, I had no interest in it.

548
00:41:01,280 --> 00:41:07,120
I had no interest in running a processing plant, doing manufacturing, anything like that.

549
00:41:07,120 --> 00:41:11,600
But anyway, so I think a lot of people, they want to be more on the application side.

550
00:41:11,600 --> 00:41:20,360
They want to work on systems that have applied value or maybe not even applied value, but

551
00:41:20,360 --> 00:41:22,120
something that can generate value for themselves.

552
00:41:22,120 --> 00:41:24,640
They want to work, they want to earn money and things like that.

553
00:41:24,640 --> 00:41:27,920
And I think that education is viewed as separate from that.

554
00:41:27,920 --> 00:41:32,800
It's become separate and yeah, language models being used for generating text, I think is

555
00:41:32,800 --> 00:41:35,080
just a symptom of that more than it is.

556
00:41:35,080 --> 00:41:37,600
Yeah, I think it's a symptom of that.

557
00:41:37,600 --> 00:41:43,520
It's not necessarily destroying education, but I think education has to be kind of re-evaluated

558
00:41:43,520 --> 00:41:44,520
on its own.

559
00:41:44,520 --> 00:41:50,120
Well, this is a conversation that doesn't even really need to involve AI.

560
00:41:50,120 --> 00:41:57,000
We have mixed up the concept of bettering oneself by creating a complex system of references

561
00:41:57,000 --> 00:42:01,800
that we carry around so that we are articulate and we can communicate subtle points with

562
00:42:01,800 --> 00:42:07,320
one another or we can appreciate the finer points of music or things like that.

563
00:42:07,320 --> 00:42:16,200
We have mashed that up with vocational accreditation and people want the accreditation.

564
00:42:16,200 --> 00:42:20,780
They want the accreditation that gets them a good job and they're going to the same place

565
00:42:20,780 --> 00:42:25,880
where people are studying philosophy and learning music and learning how to paint and doing

566
00:42:25,880 --> 00:42:29,880
the sort of things that we would now consider to be the classical elements of an education

567
00:42:29,880 --> 00:42:32,920
and not the modern practical elements.

568
00:42:32,920 --> 00:42:37,160
But yeah, they are hopelessly mashed up at this point and it is a much larger problem.

569
00:42:37,160 --> 00:42:44,900
I mean, it was a problem before the recent developments in generative text algorithms

570
00:42:44,900 --> 00:42:46,680
and things like that.

571
00:42:46,680 --> 00:42:52,220
So yeah, that's certainly not anything we're going to untangle in this conversation.

572
00:42:52,220 --> 00:42:59,240
With respect though to AI, I mean, it seems as though something really remarkable has

573
00:42:59,240 --> 00:43:04,760
happened with these language models and there's a rush to find a commercial application for

574
00:43:04,760 --> 00:43:09,840
it and the financial incentives for being first to market with some new product are

575
00:43:09,840 --> 00:43:11,020
enormous.

576
00:43:11,020 --> 00:43:19,800
And so both Google and Microsoft have focused on this idea that we're going to replace the

577
00:43:19,800 --> 00:43:23,800
current search interface where you type in a search string and you get an exhaustive

578
00:43:23,800 --> 00:43:28,440
list of links that seem to apply, that seem to be relevant and it's up to you to go through

579
00:43:28,440 --> 00:43:33,320
them and extract the information and figure out what's worth paying attention to and including

580
00:43:33,320 --> 00:43:39,600
in your internal representations and what's not and what's obvious nonsense.

581
00:43:39,600 --> 00:43:44,260
The new application that both of these companies have in mind takes all the human judgment

582
00:43:44,260 --> 00:43:46,980
and all the human participation out of that process.

583
00:43:46,980 --> 00:43:52,240
You just ask a question, the chatbot goes and does the search and then synthesizes information

584
00:43:52,240 --> 00:43:56,460
from the search and gives you an answer and you can ask a follow-up without reestablishing

585
00:43:56,460 --> 00:43:57,460
any context.

586
00:43:57,460 --> 00:43:58,460
Just say, oh really?

587
00:43:58,460 --> 00:43:59,460
Why is that?

588
00:43:59,460 --> 00:44:01,140
Or tell me more or what about this?

589
00:44:01,140 --> 00:44:06,960
And the language model, it can review the previous inputs and outputs in your conversation

590
00:44:06,960 --> 00:44:09,940
so that you're not always having to reestablish context.

591
00:44:09,940 --> 00:44:15,200
But at the same time, these language models are really, really bad at fact checking or

592
00:44:15,200 --> 00:44:17,280
telling what's true from false.

593
00:44:17,280 --> 00:44:21,900
They're pretty good at summarizing, but if you give them nonsense to summarize and the

594
00:44:21,900 --> 00:44:29,880
web is full of nonsense, then errors inevitably creep in and not just errors, but with respect

595
00:44:29,880 --> 00:44:35,340
to some people who have had advance access to Bing's language-model-driven search engine,

596
00:44:35,340 --> 00:44:38,800
it even gets kind of weird and creepy the longer you talk to it.

597
00:44:38,800 --> 00:44:46,800
So it just seems that the commercial pressure to come up with an income generating application

598
00:44:46,800 --> 00:44:52,000
for this technology is driving both of these big companies and many others certainly to

599
00:44:52,000 --> 00:44:59,920
push out products which have not been adequately tested or to apply this methodology to tasks

600
00:44:59,920 --> 00:45:02,280
that it's just not well suited to do.

601
00:45:02,280 --> 00:45:08,360
But there's Google owns search, it's a billion, multi-billion dollar industry every year,

602
00:45:08,360 --> 00:45:13,640
there is enormous incentive for rivals and upstarts and established players like Microsoft

603
00:45:13,640 --> 00:45:16,220
to try to get in and take some of that away from Google.

604
00:45:16,220 --> 00:45:21,240
So it's in Google's interest to match all of their step, all of their innovations step

605
00:45:21,240 --> 00:45:22,360
for step.

606
00:45:22,360 --> 00:45:28,960
And it's just it's a recipe for unleashing all manner of unintended consequences.

607
00:45:28,960 --> 00:45:39,040
Yeah, I think that I have a hypothesis and that it might be confirmed somewhere or I

608
00:45:39,040 --> 00:45:40,040
don't know.

609
00:45:40,040 --> 00:45:42,520
There's so much information out there, it's hard to keep it all straight.

610
00:45:42,520 --> 00:45:48,360
But I have a hypothesis that Microsoft, Google, all the companies that are putting out

611
00:45:48,360 --> 00:45:52,840
chatbots, OpenAI, which they're obviously tied to Microsoft.

612
00:45:52,840 --> 00:45:58,280
But I feel like there is this technology, there is a way to generate fantastically smooth

613
00:45:58,280 --> 00:46:03,280
text that aligns reasonably with human expectation.

614
00:46:03,280 --> 00:46:09,160
I think now users are doing the testing.

615
00:46:09,160 --> 00:46:14,120
These companies could basically sit back and see how users are implementing the technology,

616
00:46:14,120 --> 00:46:21,840
look at both direct users, like individuals who are prompting the models and the feedback

617
00:46:21,840 --> 00:46:25,640
that they get, thumbs up, thumbs down, is this good, is this not good, the progress

618
00:46:25,640 --> 00:46:28,440
of a conversation over a session.

619
00:46:28,440 --> 00:46:33,360
And they can look at that information and say, OK, this is a use case or like a type

620
00:46:33,360 --> 00:46:38,080
of use case to make a class about types of use case based on different themes.

621
00:46:38,080 --> 00:46:42,400
I mean, I'm sure they have people who are doing any number of different kinds of analyses,

622
00:46:42,400 --> 00:46:48,320
probably using language models to figure out what is going on in those interactions.

623
00:46:48,320 --> 00:46:52,920
And they're probably learning from those either directly or indirectly learning from those

624
00:46:52,920 --> 00:46:59,200
experiences to make the product better or to come up with ideas for maybe more specialized

625
00:46:59,200 --> 00:47:02,400
chatbots and things like that, chatbots that are serving specific needs purposes.

626
00:47:02,400 --> 00:47:06,520
That's what I would do anyway if I was at one of those companies, I'd be looking for

627
00:47:06,520 --> 00:47:10,320
those kinds of trends.

628
00:47:10,320 --> 00:47:19,080
I think that certain things like you have ChatGPT and a user interface and you say,

629
00:47:19,080 --> 00:47:28,200
this is a language model trained by OpenAI and have all kinds of things appended to the

630
00:47:28,200 --> 00:47:35,520
user experience that manage user expectations for what they're getting when they interact

631
00:47:35,520 --> 00:47:36,640
with the model.

632
00:47:36,640 --> 00:47:43,320
And I feel like that's kind of you basically can leave it there and see what happens and

633
00:47:43,320 --> 00:47:45,920
collect that data from the experiences.

634
00:47:45,920 --> 00:47:49,200
And as long as people are using it, there's going to be data generated and they can have

635
00:47:49,200 --> 00:47:54,600
something serve as a basis for further improvements.

636
00:47:54,600 --> 00:48:02,360
So I think that some concerns, I mean, they probably don't care super much about whether

637
00:48:02,360 --> 00:48:05,880
it's always factually accurate, because again, as long as you say, you know, this is a language

638
00:48:05,880 --> 00:48:11,320
model, this is not something else, this is not an expert in this field or anything like

639
00:48:11,320 --> 00:48:17,720
that, this is a language model, then you don't have to guarantee the veracity of everything

640
00:48:17,720 --> 00:48:19,240
that is produced by it.

641
00:48:19,240 --> 00:48:25,440
A lot of people have an issue with that and think that they should be responsible for

642
00:48:25,440 --> 00:48:28,600
whether incorrect information is generated.

643
00:48:28,600 --> 00:48:33,960
But honestly, I personally don't because I think as long as it's presented in a way where

644
00:48:33,960 --> 00:48:38,280
users kind of know what to expect and continue to have conversations around what expectations

645
00:48:38,280 --> 00:48:42,200
should be, I think that it's kind of the most you can do.

646
00:48:42,200 --> 00:48:43,920
I mean, you can't...

647
00:48:43,920 --> 00:48:48,040
It gets into questions of what even is truth, you know, like there are certain things that

648
00:48:48,040 --> 00:48:57,080
if you zoom in, there is no truth, everything's fuzzy.

649
00:48:57,080 --> 00:49:01,080
So...

650
00:49:01,080 --> 00:49:05,200
That was Danielle Bocelli and that was the first half of our conversation.

651
00:49:05,200 --> 00:49:10,880
Now, normally I say the remainder of the conversation or the conclusion of the conversation can

652
00:49:10,880 --> 00:49:14,280
be found in the next episode of the C-Realm Vault podcast.

653
00:49:14,280 --> 00:49:18,600
That's not actually true in this instance because I didn't get a Vault podcast out last

654
00:49:18,600 --> 00:49:25,280
week, which means the conclusion of my conversation with Dr. Ashley Frawley is next in line for

655
00:49:25,280 --> 00:49:26,280
the Vault.

656
00:49:26,280 --> 00:49:30,720
So it'll probably be a couple of weeks before the conclusion of this conversation with Danielle

657
00:49:30,720 --> 00:49:34,760
Bocelli is available in the C-Realm Vault podcast.

658
00:49:34,760 --> 00:49:40,600
And I have to plead new puppy, which is to say I have a new dog.

659
00:49:40,600 --> 00:49:43,240
She's almost four months old.

660
00:49:43,240 --> 00:49:44,720
She's a handful right now.

661
00:49:44,720 --> 00:49:50,280
She is at this instant asleep on the floor about five feet away from me.

662
00:49:50,280 --> 00:49:54,720
But when she's awake, it's very hard for me to get any work done at all.

663
00:49:54,720 --> 00:49:55,720
I know.

664
00:49:55,720 --> 00:49:56,720
Good problem to have.

665
00:49:56,720 --> 00:49:57,720
All right.

666
00:49:57,720 --> 00:50:02,120
So just going through some of my notes that I took as I relistened to this first part

667
00:50:02,120 --> 00:50:04,920
of the conversation with Danielle Bocelli.

668
00:50:04,920 --> 00:50:12,000
First, she started to talk about embodiment, and then we didn't really take that conversational

669
00:50:12,000 --> 00:50:13,000
avenue.

670
00:50:13,000 --> 00:50:21,000
There are people who think that in order for current AI systems to become genuine AGI,

671
00:50:21,000 --> 00:50:25,940
artificial general intelligence, which is to say to have a flexible sort of intelligence

672
00:50:25,940 --> 00:50:30,240
which can be applied to a variety of tasks and a variety of fields and not have to be

673
00:50:30,240 --> 00:50:35,280
specifically trained in any one area in order to bring, you know, its powers of cognition

674
00:50:35,280 --> 00:50:41,280
to solving problems in that area, it can't just be a disembodied thing in a computer.

675
00:50:41,280 --> 00:50:43,640
It needs to have a body.

676
00:50:43,640 --> 00:50:48,480
And while we didn't talk about embodiment in general, time and again, Danielle came

677
00:50:48,480 --> 00:50:55,840
back to the question of what are the limitations of learning just from language?

678
00:50:55,840 --> 00:51:06,960
Because we are embodied beings with five senses, maybe six, and, you know, we feel, we touch,

679
00:51:06,960 --> 00:51:12,220
we understand how our bodies are related to other objects around us, how they're oriented

680
00:51:12,220 --> 00:51:13,400
in space.

681
00:51:13,400 --> 00:51:17,480
You know, I don't have one right now, but I understand that most humans enjoy a sense

682
00:51:17,480 --> 00:51:20,680
of smell, which when mine turns on, it's kind of like a superpower.

683
00:51:20,680 --> 00:51:22,120
It's like, hey, wait a second.

684
00:51:22,120 --> 00:51:26,040
I'm suddenly getting information about the world and I can't see the source of this information.

685
00:51:26,040 --> 00:51:27,040
I can't hear it.

686
00:51:27,040 --> 00:51:28,040
I can't touch it.

687
00:51:28,040 --> 00:51:29,880
Where is it coming from?

688
00:51:29,880 --> 00:51:32,840
But you know, that's, that's my cross to bear.

689
00:51:32,840 --> 00:51:38,600
But that, you know, that's just my off the top of my head explanation as to why some

690
00:51:38,600 --> 00:51:42,840
people think that an AGI to genuinely emerge and, you know, be present in the world has

691
00:51:42,840 --> 00:51:47,560
to be embodied, has to have what we would think of as a robot body.

692
00:51:47,560 --> 00:51:53,240
But in order to make sure that I've really got, you know, a handle on the arguments,

693
00:51:53,240 --> 00:52:00,600
I went to the Google version of GPT-4 or the Google version of Bing Chat.

694
00:52:00,600 --> 00:52:05,040
It's called Bard and it is based on a large language model called LaMDA, which was in

695
00:52:05,040 --> 00:52:09,320
the news last year when Blake Lemoine claimed that LaMDA was sentient.

696
00:52:09,320 --> 00:52:12,160
I'm pretty sure it's not.

697
00:52:12,160 --> 00:52:14,640
And Bard is still in an experimental stage.

698
00:52:14,640 --> 00:52:19,760
Not everybody can access it, but because I own a Pixel phone, which is a phone, you know,

699
00:52:19,760 --> 00:52:24,640
designed and sold by Google, they gave me special early access to Bard.

700
00:52:24,640 --> 00:52:27,120
And so I asked it, Hi, Bard.

701
00:52:27,120 --> 00:52:31,240
Some people claim that for AGI to be fully realized, it must be embodied.

702
00:52:31,240 --> 00:52:33,280
Would you flesh out that argument?

703
00:52:33,280 --> 00:52:34,560
Sure.

704
00:52:34,560 --> 00:52:40,800
There are a number of reasons why some people believe that embodiment is necessary for AGI.

705
00:52:40,800 --> 00:52:46,520
First, embodiment allows an AI to interact with the world in a way that is not possible

706
00:52:46,520 --> 00:52:49,360
for a purely disembodied AI.

707
00:52:49,360 --> 00:52:54,400
This interaction can provide the AI with a wealth of information and experience that

708
00:52:54,400 --> 00:52:56,880
can help it to learn and grow.

709
00:52:56,880 --> 00:53:02,840
For example, an AI that is embodied in a robot can learn about the world by interacting with

710
00:53:02,840 --> 00:53:05,920
objects and people in the real world.

711
00:53:05,920 --> 00:53:11,100
This type of learning is not possible for a purely disembodied AI, which would only

712
00:53:11,100 --> 00:53:15,600
be able to learn from data that is provided to it by humans.

713
00:53:15,600 --> 00:53:20,840
Second, embodiment can help an AI to develop a sense of self.

714
00:53:20,840 --> 00:53:26,200
This is because embodiment allows the AI to experience the world through its own senses

715
00:53:26,200 --> 00:53:30,540
and to interact with the world in a way that is unique to it.

716
00:53:30,540 --> 00:53:36,200
This sense of self is important for AGI because it allows the AI to develop its own goals

717
00:53:36,200 --> 00:53:38,080
and motivations.

718
00:53:38,080 --> 00:53:44,140
For example, an AI that is embodied in a robot can develop a sense of self by interacting

719
00:53:44,140 --> 00:53:46,860
with other robots and humans.

720
00:53:46,860 --> 00:53:52,560
This sense of self can help the AI to develop its own goals, such as wanting to be helpful

721
00:53:52,560 --> 00:53:55,600
or wanting to learn more about the world.

722
00:53:55,600 --> 00:54:00,080
Third, embodiment can help an AI to develop empathy.

723
00:54:00,080 --> 00:54:04,900
This is because embodiment allows the AI to experience the world through the senses of

724
00:54:04,900 --> 00:54:06,560
another being.

725
00:54:06,560 --> 00:54:12,060
This can help the AI to understand the feelings and experiences of others.

726
00:54:12,060 --> 00:54:19,180
For example, an AI that is embodied in a robot can develop empathy by interacting with humans.

727
00:54:19,180 --> 00:54:25,080
This empathy can help the AI to understand the needs and wants of humans and to develop

728
00:54:25,080 --> 00:54:27,280
relationships with them.

729
00:54:27,280 --> 00:54:32,520
In conclusion, there are a number of reasons why some people believe that embodiment is

730
00:54:32,520 --> 00:54:35,680
necessary for AGI.

731
00:54:35,680 --> 00:54:40,120
Embodiment allows an AI to interact with the world in a way that is not possible for a

732
00:54:40,120 --> 00:54:42,240
purely disembodied AI.

733
00:54:42,240 --> 00:54:48,680
It can help an AI to develop a sense of self, and it can help an AI to develop empathy.

734
00:54:48,680 --> 00:54:50,280
So there you go.

735
00:54:50,280 --> 00:54:57,360
Three more reasons why a genuine artificial intelligence, an artificial general intelligence,

736
00:54:57,360 --> 00:55:01,400
needs to have a body and be present in the physical world.

737
00:55:01,400 --> 00:55:07,460
Now, you could argue, and many people have argued, that that's kind of dangerous.

738
00:55:07,460 --> 00:55:11,840
We want to know what the AGI is before we turn it loose, physically turn it loose in

739
00:55:11,840 --> 00:55:12,920
the world.

740
00:55:12,920 --> 00:55:20,640
One possible alternative is that you can simulate, you know, you can have the AI be embodied

741
00:55:20,640 --> 00:55:24,240
in a simulated body in a simulated world.

742
00:55:24,240 --> 00:55:29,320
For instance, it could be, you know, the avatar in the Minecraft world.

743
00:55:29,320 --> 00:55:35,400
That's a very simple world, but it might be enough for an entity to get a sense that it

744
00:55:35,400 --> 00:55:41,940
is... it is a thing that moves around in physical space that interacts with objects, which transforms

745
00:55:41,940 --> 00:55:44,700
objects in order to accomplish tasks.

746
00:55:44,700 --> 00:55:49,560
It encounters other entities, kind of like itself, like itself in as much as that they

747
00:55:49,560 --> 00:55:51,040
are entities.

748
00:55:51,040 --> 00:55:56,600
Some will be friendly, some will be hostile, some will be neutral, some will be friendly

749
00:55:56,600 --> 00:55:59,940
or hostile depending on how you interact with them.

750
00:55:59,940 --> 00:56:04,100
In Minecraft, you encounter all of these different types of entities.

751
00:56:04,100 --> 00:56:08,960
I think another conceptual tool that you should have in the back of your mind when you're

752
00:56:08,960 --> 00:56:13,640
listening to people talking about AI and talking about the prospects for AI is something called

753
00:56:13,640 --> 00:56:15,560
Polanyi's Paradox.

754
00:56:15,560 --> 00:56:21,960
And again, this is... it's another look at the question of what are the limitations from

755
00:56:21,960 --> 00:56:23,960
learning just from language?

756
00:56:23,960 --> 00:56:28,760
Polanyi's Paradox basically says, we know more than we can say.

757
00:56:28,760 --> 00:56:35,400
Or as some search engine chat entity that I... that I queried, and I didn't write

758
00:56:35,400 --> 00:56:39,960
down the source, I just wrote down the definition that it gave me, it defined Polanyi's Paradox

759
00:56:39,960 --> 00:56:46,000
as the theory that human knowledge of how the world functions and capability are, to

760
00:56:46,000 --> 00:56:50,820
a large extent, beyond our explicit understanding.

761
00:56:50,820 --> 00:56:54,900
We don't know exactly how it is we keep our balance when we walk.

762
00:56:54,900 --> 00:56:56,120
We just do it.

763
00:56:56,120 --> 00:57:00,400
And because we just do it, and we don't explicitly understand it, we can't really explain it

764
00:57:00,400 --> 00:57:02,800
to a machine.

765
00:57:02,800 --> 00:57:06,840
I have been listening to... mostly listening to, even when it's videos.

766
00:57:06,840 --> 00:57:09,480
I tend to take them in in audio form.

767
00:57:09,480 --> 00:57:17,280
But one thing that I've listened to and watched recently on the topic of AI is a presentation

768
00:57:17,280 --> 00:57:20,560
by Tristan Harris and Aza Raskin.

769
00:57:20,560 --> 00:57:24,840
Tristan Harris was one of the people responsible for that Netflix special called The Social

770
00:57:24,840 --> 00:57:28,240
Dilemma, which came out a year or so ago on Netflix.

771
00:57:28,240 --> 00:57:34,960
And he and Aza Raskin have a presentation that they call The AI Dilemma, talking about

772
00:57:34,960 --> 00:57:40,360
some of the very surprising and concerning developments in AI, and talking about just

773
00:57:40,360 --> 00:57:46,600
how quickly all of this is being deployed, how the capabilities of AI systems are improving

774
00:57:46,600 --> 00:57:52,740
much faster than our ability to align them with our goals and our values.

775
00:57:52,740 --> 00:57:58,160
It's not alarmist, it's not doomsaying, but it is certainly... it will give you food for

776
00:57:58,160 --> 00:58:01,740
thought and reason to take this issue seriously.

777
00:58:01,740 --> 00:58:06,160
So the two of them have a podcast that they do, but I would definitely suggest starting

778
00:58:06,160 --> 00:58:10,200
with the YouTube video, The AI Dilemma.

779
00:58:10,200 --> 00:58:15,040
And something that I said in this interview, I noticed, it's not really true anymore.

780
00:58:15,040 --> 00:58:20,080
You know, I said that these large language models, they're just using statistical tricks

781
00:58:20,080 --> 00:58:24,120
to predict what should come next in a conversation.

782
00:58:24,120 --> 00:58:28,680
So they, you know, they've been fed basically all the text that is available, all the text

783
00:58:28,680 --> 00:58:32,380
on the internet, all the books that can be fed into it.

784
00:58:32,380 --> 00:58:36,640
So it's got lots of examples of language, and it can basically look at the language

785
00:58:36,640 --> 00:58:42,520
that you've used in asking your question, and very reliably and very convincingly put

786
00:58:42,520 --> 00:58:49,300
out text that is relevant to what you've said.

787
00:58:49,300 --> 00:58:54,200
It is a plausible next entry in the conversation, but it's not always correct.

788
00:58:54,200 --> 00:58:55,200
It's not always accurate.

789
00:58:55,200 --> 00:59:00,300
And I say that these large language models, they don't really have any memory.

790
00:59:00,300 --> 00:59:05,120
They're basically just doing statistical tricks to produce output, which seems plausible to

791
00:59:05,120 --> 00:59:09,680
us humans, but they don't... they don't model the world, and they don't update their model

792
00:59:09,680 --> 00:59:15,280
based on, you know, experiences, based on whether predictions that they make come true

793
00:59:15,280 --> 00:59:16,620
or not.

794
00:59:16,620 --> 00:59:19,280
We, hopefully, do that.

795
00:59:19,280 --> 00:59:23,720
If we think we understand a situation, and our understanding leads us to expect a certain

796
00:59:23,720 --> 00:59:29,720
outcome and that outcome doesn't happen, hopefully we re-examine some of our premises.

797
00:59:29,720 --> 00:59:37,280
You know, we hold some of our hypotheses up for examination and, you know, give them the

798
00:59:37,280 --> 00:59:43,360
axe if they are consistently producing bad outcomes or, you know, failed predictions.

799
00:59:43,360 --> 00:59:46,520
And I say that these large language models, they don't have anything like that.

800
00:59:46,520 --> 00:59:55,480
GPT-4 can use tools, which is to say it can query other types of software, and some of

801
00:59:55,480 --> 01:00:00,560
these other types of software that it can query are basically memory modules.

802
01:00:00,560 --> 01:00:05,720
So while the large language model by itself doesn't really model the world the way that

803
01:00:05,720 --> 01:00:10,560
we do, you know, it doesn't create an abstract representation of the world that it then...

804
01:00:10,560 --> 01:00:14,480
that it uses to navigate the world and updates as necessary based on new information as it

805
01:00:14,480 --> 01:00:16,480
becomes available.

806
01:00:16,480 --> 01:00:22,840
A sort of emergent entity, which is a combination of a large language model plus several other

807
01:00:22,840 --> 01:00:28,860
different types of software that get queried and brought into the task as needed, can start

808
01:00:28,860 --> 01:00:32,920
to think and model the world the way we humans do.

809
01:00:32,920 --> 01:00:38,760
Not exactly the way we do, and not as competently as we do yet.

810
01:00:38,760 --> 01:00:40,240
But things are moving really quickly now.

811
01:00:40,240 --> 01:00:48,300
You know, there's this paper called Sparks of AGI, which is to say GPT-4 is not an artificial

812
01:00:48,300 --> 01:00:50,040
general intelligence.

813
01:00:50,040 --> 01:00:53,320
It is still fairly... it is still narrow.

814
01:00:53,320 --> 01:00:57,100
It doesn't have the full range of cognitive capabilities that we do.

815
01:00:57,100 --> 01:01:02,800
But every now and again, it shows little glimmers that it's getting close.

816
01:01:02,800 --> 01:01:08,440
And it could be that these large language models are overhyped, that their capabilities

817
01:01:08,440 --> 01:01:13,700
are magnified by our fascination with them, that they're really not as competent as we

818
01:01:13,700 --> 01:01:18,760
seem to think they are, because they've basically been optimized to wow us.

819
01:01:18,760 --> 01:01:24,940
And to get us to see them as being more capable, competent, and complete than they really are.

820
01:01:24,940 --> 01:01:29,480
And it could be that in the coming months and years we will come to see past this sort

821
01:01:29,480 --> 01:01:37,600
of showbiz illusion of these things, and recognize their very real persistent and sort of dogged

822
01:01:37,600 --> 01:01:39,160
limitations.

823
01:01:39,160 --> 01:01:42,480
Or it could be that they continue to improve as quickly as they have recently, and they

824
01:01:42,480 --> 01:01:46,360
get really good really fast, and people who haven't been paying attention to this suddenly

825
01:01:46,360 --> 01:01:50,960
wake up to a world in which AGI is a thing and they never saw it coming.

826
01:01:50,960 --> 01:01:53,520
So here's the service that I'm looking to offer.

827
01:01:53,520 --> 01:01:55,480
I don't write code.

828
01:01:55,480 --> 01:01:56,960
I'm not a technical person.

829
01:01:56,960 --> 01:02:01,120
I've... you know, my website, which is a WordPress website, is beyond my understanding.

830
01:02:01,120 --> 01:02:03,020
It has been built by various other people.

831
01:02:03,020 --> 01:02:07,360
And one of the reasons why I started a new podcast is my creaky old website needs a lot

832
01:02:07,360 --> 01:02:11,680
of work to be, you know, brought up to spec, to current standards, and I'm not the guy

833
01:02:11,680 --> 01:02:13,600
to do it.

834
01:02:13,600 --> 01:02:15,560
But the service I offer is this.

835
01:02:15,560 --> 01:02:17,280
I'm pretty good with words.

836
01:02:17,280 --> 01:02:21,160
I'm pretty good at understanding abstract systems and arguments.

837
01:02:21,160 --> 01:02:26,120
And while I don't have a very good memory for episodes, which is to say, I don't remember

838
01:02:26,120 --> 01:02:30,980
things that happened to me all that well, apparently, I do have a pretty good memory

839
01:02:30,980 --> 01:02:34,280
for concepts and arguments that I can reproduce on the fly.

840
01:02:34,280 --> 01:02:39,300
And so I'm just going to continue to pay attention to this topic and I'm going to be jabbering

841
01:02:39,300 --> 01:02:44,220
into smartphones and microphones, basically trying to make sense of it all in real time

842
01:02:44,220 --> 01:02:49,400
on stage for you to look at, listen to, comment on, and hopefully benefit from.

843
01:02:49,400 --> 01:02:55,100
So I'm mostly talking from the perspective of a non-technical person, but I'm relaying

844
01:02:55,100 --> 01:03:00,400
ideas and understanding that I gain from speaking to technically oriented people.

845
01:03:00,400 --> 01:03:02,200
I hope that's valuable to you.

846
01:03:02,200 --> 01:03:07,240
All right, if you notice things that I got wrong, the guest got wrong, or we both just

847
01:03:07,240 --> 01:03:11,960
failed to consider, and remember, you've only heard half the conversation, so we might have

848
01:03:11,960 --> 01:03:15,680
gotten around to it in the second half, do let me know.

849
01:03:15,680 --> 01:03:21,080
And I would suggest that you do that by leaving a comment on the YouTube version of this podcast

850
01:03:21,080 --> 01:03:23,560
or on my Patreon page, patreon.com.

851
01:03:23,560 --> 01:03:29,720
Now it could be that you have to be a paying subscriber to leave comments, but I do have

852
01:03:29,720 --> 01:03:36,320
a $1 a month donation level, which for most of you, I suspect, wouldn't really alter

853
01:03:36,320 --> 01:03:42,600
your quality of life if you paid $1 a month in order to be a member of the people interacting

854
01:03:42,600 --> 01:03:46,140
with me and with each other on my Patreon feed.

855
01:03:46,140 --> 01:03:51,520
You may disagree, and since we're not in the same room at the same time talking, your judgment

856
01:03:51,520 --> 01:03:54,480
is law in this regard.

857
01:03:54,480 --> 01:03:56,400
But just putting the idea out there.

858
01:03:56,400 --> 01:03:58,880
All right, that's all for this episode.

859
01:03:58,880 --> 01:04:02,080
I will be back one week from today with another episode.

860
01:04:02,080 --> 01:04:07,680
I may or may not get an episode of the C-Realm Vault podcast out between now and then.

861
01:04:07,680 --> 01:04:11,240
There is this whole tax thing to deal with.

862
01:04:11,240 --> 01:04:14,880
I have just a few days left, and I haven't started.

863
01:04:14,880 --> 01:04:15,880
Wish me luck.

864
01:04:15,880 --> 01:04:17,720
Talk to you soon.

865
01:04:17,720 --> 01:04:30,160
Stay well.