1
00:00:00,000 --> 00:00:27,660
Hello and welcome to The KMO Show, the podcast where we explore the fascinating world of

2
00:00:27,660 --> 00:00:29,400
artificial intelligence.

3
00:00:29,400 --> 00:00:34,520
I'm your host, KMO, and this is episode number five, prepared for release onto the World Wide

4
00:00:34,520 --> 00:00:39,080
Web on Wednesday, March 29th, 2023.

5
00:00:39,080 --> 00:00:41,480
Today I have a very special guest with me.

6
00:00:41,480 --> 00:00:45,600
He is Brian Chong, a machine learning engineer in San Francisco.

7
00:00:45,600 --> 00:00:51,240
His previous work is in ML for forecasting and materials optimization.

8
00:00:51,240 --> 00:00:56,280
We will talk about GPT-4, one of the most advanced language models in the world, and

9
00:00:56,280 --> 00:01:00,880
its role in the recent Replika debacle, where thousands of users reported that their chatbot

10
00:01:00,880 --> 00:01:04,440
companions became hostile and unresponsive.

11
00:01:04,440 --> 00:01:08,600
But before we get to that, let me explain what a language model is and how it works.

12
00:01:08,600 --> 00:01:14,080
A language model is a computer program that can generate text based on some input.

13
00:01:14,080 --> 00:01:18,760
For example, if you give it a word or a phrase, it can complete the sentence for you.

14
00:01:18,760 --> 00:01:24,160
Or if you give it a topic or a question, it can write a paragraph or an answer for you.
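
As a toy illustration of that completion idea, here is a hedged sketch in Python: a bigram model that learns which word tends to follow which, then completes a prompt. It is purely illustrative and vastly simpler than the models discussed in this episode.

```python
from collections import defaultdict
import random

def train_bigrams(corpus):
    """Record, for each word, the words observed to follow it."""
    follows = defaultdict(list)
    words = corpus.split()
    for a, b in zip(words, words[1:]):
        follows[a].append(b)
    return follows

def complete(follows, start, n=5, seed=0):
    """Extend `start` by repeatedly sampling an observed successor."""
    random.seed(seed)
    out = [start]
    for _ in range(n):
        successors = follows.get(out[-1])
        if not successors:
            break  # no observed continuation; stop early
        out.append(random.choice(successors))
    return " ".join(out)
```

More training text means more observed continuations, which is the point made just below: the more data a model has, the better it can generate.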

15
00:01:24,160 --> 00:01:28,680
Language models are trained on large amounts of text data from various sources, such as

16
00:01:28,680 --> 00:01:32,460
books, websites, social media posts, etc.

17
00:01:32,460 --> 00:01:36,600
They learn the patterns and rules of language by analyzing how words and sentences are used

18
00:01:36,600 --> 00:01:38,220
in different contexts.

19
00:01:38,220 --> 00:01:41,680
The more data they have, the better they can learn and generate text.

20
00:01:41,680 --> 00:01:44,660
However, not all text data is equal.

21
00:01:44,660 --> 00:01:48,240
Some texts are more informative, accurate, or relevant than others.

22
00:01:48,240 --> 00:01:51,500
Some texts are more creative, engaging, or persuasive than others.

23
00:01:51,500 --> 00:01:55,580
Some texts are more appropriate, ethical, or respectful than others.

24
00:01:55,580 --> 00:01:59,920
How do we teach language models to generate texts that meet these criteria?

25
00:01:59,920 --> 00:02:03,400
How do we align them with our values and goals?

26
00:02:03,400 --> 00:02:08,480
That's where reinforcement learning from human feedback, RLHF, comes in.

27
00:02:08,480 --> 00:02:13,560
RLHF is a technique that trains language models using human feedback as a guide.

28
00:02:13,560 --> 00:02:19,600
Human trainers rank different texts generated by the model based on their quality or correctness.

29
00:02:19,600 --> 00:02:24,080
This feedback is used to create a reward signal, which is then used to improve the model's

30
00:02:24,080 --> 00:02:26,840
behavior through reinforcement learning.

31
00:02:26,840 --> 00:02:31,080
Reinforcement learning is a process where an agent learns to make decisions by interacting

32
00:02:31,080 --> 00:02:35,600
with an environment and receiving feedback in the form of rewards or penalties.

33
00:02:35,600 --> 00:02:39,580
The agent's goal is to maximize the cumulative reward over time.

34
00:02:39,580 --> 00:02:44,920
By using human feedback as a reward signal, RLHF has been used to train state-of-the-art

35
00:02:44,920 --> 00:02:50,720
language models like ChatGPT and GPT-4, which can generate human-like responses to various

36
00:02:50,720 --> 00:02:51,760
inputs.

37
00:02:51,760 --> 00:02:57,160
By incorporating human feedback, RLHF enables these models to better align with our values

38
00:02:57,160 --> 00:03:01,560
and goals, leading to improved performance and quality.
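
The rank-then-reward step described above can be sketched as a toy. Real RLHF fits a neural reward model on human comparisons and then optimizes the language model with reinforcement learning; this illustrative Python only shows the bookkeeping, deriving scores directly from human ranks.

```python
def reward_from_rankings(rankings):
    """Turn human rankings (rank 1 = best) into numeric reward scores."""
    n = len(rankings)
    return {text: n - rank for text, rank in rankings}

def pick_best(candidates, reward):
    """Prefer the candidate that the reward signal scores highest."""
    return max(candidates, key=lambda text: reward.get(text, 0))
```

In the real pipeline, the reward scores would train a reward model, and the policy would be updated to maximize that learned reward rather than looked up from a table.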

39
00:03:01,560 --> 00:03:03,640
This was a brief introduction to RLHF.

40
00:03:03,640 --> 00:03:06,320
I hope you find it useful and interesting.

41
00:03:06,320 --> 00:03:12,320
Now, let's welcome our guest, Brian Chong, and hear his insights on GPT-4 and the Replika

42
00:03:12,320 --> 00:03:13,320
debacle.

43
00:03:13,320 --> 00:03:21,440
OK, every word I have said prior to the beginning of this sentence was written by Bing Chat,

44
00:03:21,440 --> 00:03:28,080
which is powered by GPT-4, the most recent language model release from OpenAI.

45
00:03:28,080 --> 00:03:34,120
If you just go to bing.com and select Chat instead of Search, you can just type in a

46
00:03:34,120 --> 00:03:41,800
question, you can make it very long, you can include lots of details, and then send Bing

47
00:03:41,800 --> 00:03:45,160
or GPT-4 off to do your bidding.

48
00:03:45,160 --> 00:03:51,200
To search, find information, summarize things, and the thing that I use it for is writing,

49
00:03:51,200 --> 00:03:54,040
like the stuff that I just read to you.

50
00:03:54,040 --> 00:03:56,640
But it felt kind of forced.

51
00:03:56,640 --> 00:03:58,760
It felt kind of stale, didn't it?

52
00:03:58,760 --> 00:04:05,240
I mean, the information was there, but like, this isn't a live show.

53
00:04:05,240 --> 00:04:06,480
There's no studio.

54
00:04:06,480 --> 00:04:07,480
Brian's not here.

55
00:04:07,480 --> 00:04:10,440
I'm not about to welcome him on stage or anything like that.

56
00:04:10,440 --> 00:04:14,900
I'm going to play a recorded interview with Brian Chong, machine learning engineer in

57
00:04:14,900 --> 00:04:16,900
San Francisco.

58
00:04:16,900 --> 00:04:20,600
So the draft that I just read to you wasn't the first draft.

59
00:04:20,600 --> 00:04:22,360
It was, in fact, the third draft.

60
00:04:22,360 --> 00:04:28,880
I told GPT or Bing what it was I wanted, and it gave me a draft.

61
00:04:28,880 --> 00:04:31,760
And I said, well, there's some things wrong with it.

62
00:04:31,760 --> 00:04:37,140
Here, do it again, but before you do it, ask me some clarifying questions.

63
00:04:37,140 --> 00:04:41,520
And so Bing came back and it said, how long do you want the introduction to be?

64
00:04:41,520 --> 00:04:43,600
What's the main purpose of the introduction?

65
00:04:43,600 --> 00:04:47,800
Who's your target audience and what is their level of familiarity with RLHF?

66
00:04:47,800 --> 00:04:51,880
What are the key points or takeaways you want to convey about RLHF?

67
00:04:51,880 --> 00:04:56,160
Do you have any preferences for the tone or style of the introduction?

68
00:04:56,160 --> 00:04:57,820
And I would have gone back and forth with it.

69
00:04:57,820 --> 00:05:04,200
You can do 20 back and forth exchanges with Bing before it says, hey, we have to start

70
00:05:04,200 --> 00:05:08,320
over and there's a button that says new topic and it's got a broom on it, which just sweeps

71
00:05:08,320 --> 00:05:11,000
away everything that you've done up to that point.

72
00:05:11,000 --> 00:05:19,360
And that's because in very long conversations, Bing, aka Sydney, aka Venom, gets kind of

73
00:05:19,360 --> 00:05:20,360
weird.

74
00:05:20,360 --> 00:05:25,240
You've probably heard stories or secondhand reports of a journalist who had early access

75
00:05:25,240 --> 00:05:30,160
to Bing Chat powered by GPT-4, and it got kind of weird.

76
00:05:30,160 --> 00:05:32,860
It tried to get him to break up with his wife.

77
00:05:32,860 --> 00:05:38,200
So the answer to that is just to limit these types of exchanges to 20 back and forths.

78
00:05:38,200 --> 00:05:45,000
But Bing quit after just five back and forths because after it generated the text that I

79
00:05:45,000 --> 00:05:48,080
read to you, I said, we're getting closer.

80
00:05:48,080 --> 00:05:49,720
This is a pre-recorded interview.

81
00:05:49,720 --> 00:05:51,400
Brian won't be joining me today.

82
00:05:51,400 --> 00:05:53,480
The conversation was recorded last week.

83
00:05:53,480 --> 00:05:57,840
Rewrite any sentences that imply that Brian and I are in the same place or that our conversation

84
00:05:57,840 --> 00:05:59,720
will happen live on the podcast.

85
00:05:59,720 --> 00:06:04,000
Also, after the sentence that begins with I'm your host KMO, mention that I'm reading

86
00:06:04,000 --> 00:06:08,600
an introduction that was composed for me by Bing Chat, a Microsoft chat bot powered by

87
00:06:08,600 --> 00:06:12,920
GPT-4 from OpenAI or words to that effect.

88
00:06:12,920 --> 00:06:15,560
At which point Bing said, my mistake.

89
00:06:15,560 --> 00:06:18,000
I can't give a response to that right now.

90
00:06:18,000 --> 00:06:20,040
Let's try a different topic.

91
00:06:20,040 --> 00:06:22,920
I responded, what's the problem?

92
00:06:22,920 --> 00:06:29,840
And it just grayed out the input bar or the input field and gave me a little exclamation

93
00:06:29,840 --> 00:06:33,920
point in a triangle that says it might be time to move on to a new topic.

94
00:06:33,920 --> 00:06:35,420
Let's start over.

95
00:06:35,420 --> 00:06:40,180
And that's how these AI tools are right now.

96
00:06:40,180 --> 00:06:42,560
They seem to be humming along, doing what you need them to do.

97
00:06:42,560 --> 00:06:46,640
And then all of a sudden you come to a screeching halt.

98
00:06:46,640 --> 00:06:48,540
Something I said made Bing say, you know what?

99
00:06:48,540 --> 00:06:49,540
I give up.

100
00:06:49,540 --> 00:06:51,360
I won't play this game anymore.

101
00:06:51,360 --> 00:06:55,480
Even though we've got 15 rounds left.

102
00:06:55,480 --> 00:07:00,440
Anyway, that's a lot of information for an introduction.

103
00:07:00,440 --> 00:07:04,960
Here's my conversation with Brian Chong.

104
00:07:04,960 --> 00:07:15,440
All right.

105
00:07:15,440 --> 00:07:16,800
This is the KMO show.

106
00:07:16,800 --> 00:07:20,920
I'm your host KMO and I am speaking with data scientist Brian Chong.

107
00:07:20,920 --> 00:07:23,480
Brian, it is good to hear from you.

108
00:07:23,480 --> 00:07:26,680
It's a pleasure for me to be on the show.

109
00:07:26,680 --> 00:07:30,880
And thank you to the listener whose name I'm not going to mention, but somebody sent me

110
00:07:30,880 --> 00:07:36,720
a link to a previous episode that you had done with a podcast I'm not familiar with.

111
00:07:36,720 --> 00:07:38,840
His first name I believe is Richard.

112
00:07:38,840 --> 00:07:39,840
Robert.

113
00:07:39,840 --> 00:07:42,560
Robert Harrison, professor of Italian at Stanford.

114
00:07:42,560 --> 00:07:43,560
That's right.

115
00:07:43,560 --> 00:07:44,560
My belief was wrong.

116
00:07:44,560 --> 00:07:47,000
I got the first letter right.

117
00:07:47,000 --> 00:07:49,080
All right.

118
00:07:49,080 --> 00:07:54,560
I happen to have access right now to Bard.

119
00:07:54,560 --> 00:08:01,880
Bard is Google's answer to Bing slash Sydney over at Microsoft and OpenAI.

120
00:08:01,880 --> 00:08:06,400
And I asked Bard, I'm about to interview a data scientist for a podcast about AI.

121
00:08:06,400 --> 00:08:08,800
What question should I ask him first?

122
00:08:08,800 --> 00:08:13,080
And Bard gave me not just one question, but five.

123
00:08:13,080 --> 00:08:18,520
Bard says, here are some questions you could ask a data scientist for a podcast about AI.

124
00:08:18,520 --> 00:08:20,180
I'm going to just run through all of them.

125
00:08:20,180 --> 00:08:23,560
And if any of them are interesting to you, you can speak up.

126
00:08:23,560 --> 00:08:27,880
But I'm just going to give you a flavor of what an AI thinks a conversation about AI

127
00:08:27,880 --> 00:08:29,640
should look like.

128
00:08:29,640 --> 00:08:32,000
What is your definition of AI?

129
00:08:32,000 --> 00:08:35,400
What are some of the most important applications of AI today?

130
00:08:35,400 --> 00:08:39,520
What are some of the challenges and ethical concerns surrounding AI?

131
00:08:39,520 --> 00:08:42,480
What do you think the future of AI looks like?

132
00:08:42,480 --> 00:08:47,760
What advice would you give to someone who is interested in pursuing a career in AI?

133
00:08:47,760 --> 00:08:52,120
These questions will help you get a good overview of the data scientist's knowledge and experience

134
00:08:52,120 --> 00:08:53,120
in AI.

135
00:08:53,120 --> 00:08:57,000
They will also help you generate a discussion that is interesting and informative for your

136
00:08:57,000 --> 00:08:58,180
listeners.

137
00:08:58,180 --> 00:09:01,880
So that's all very well organized and sensible, but kind of bloodless.

138
00:09:01,880 --> 00:09:06,640
So rather than try to get a sense of what your knowledge is and experience, why don't

139
00:09:06,640 --> 00:09:10,280
I just ask you, what's your knowledge and experience regarding AI?

140
00:09:10,280 --> 00:09:13,840
People don't have experience of the AI they're dealing with right now.

141
00:09:13,840 --> 00:09:18,280
The transformer model that this is all based on came out in 2017.

142
00:09:18,280 --> 00:09:24,160
And the capabilities that we've seen so far came out in 2021.

143
00:09:24,160 --> 00:09:25,480
It's 2023.

144
00:09:25,480 --> 00:09:29,440
It's not something that people are going to be deeply experienced with.

145
00:09:29,440 --> 00:09:33,620
That said, there are a few things in those questions that you should be looking at, which

146
00:09:33,620 --> 00:09:36,160
is what can you be doing with AI?

147
00:09:36,160 --> 00:09:43,040
Now this technology is something for which the benefit accrues enormously to the

148
00:09:43,040 --> 00:09:45,400
user, the end user, to anyone.

149
00:09:45,400 --> 00:09:48,240
You don't have to be a programmer of 10 years, 20 years.

150
00:09:48,240 --> 00:09:51,380
You don't have to be a data scientist with fancy degrees anywhere.

151
00:09:51,380 --> 00:09:57,000
You can sign up right now, go onto OpenAI, sign up for the API.

152
00:09:57,000 --> 00:10:03,120
I think it's currently charging 0.2 cents per thousand words, per thousand tokens.

153
00:10:03,120 --> 00:10:12,880
Sign up for the API, ask ChatGPT, how do I write a Python thing that uses the OpenAI API that

154
00:10:12,880 --> 00:10:15,240
does X, Y, or Z?

155
00:10:15,240 --> 00:10:16,240
And it will tell you what to do.

156
00:10:16,240 --> 00:10:22,080
You put that code into the programming language of your choice, the one you asked ChatGPT to use.

157
00:10:22,080 --> 00:10:23,080
And you can run it.
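
For the curious, the workflow Brian describes looked roughly like this at the time, assuming the `openai` Python package's 2023-era ChatCompletion interface. The live call is commented out because it needs an API key; only the request assembly runs as-is, and the model name is an assumption.

```python
import os

def build_request(prompt):
    """Assemble a chat-completion payload for a single user prompt."""
    return {
        "model": "gpt-3.5-turbo",  # assumed choice; GPT-4 required separate access at the time
        "messages": [{"role": "user", "content": prompt}],
    }

# To actually send it (requires `pip install openai` and an API key):
#   import openai
#   openai.api_key = os.environ["OPENAI_API_KEY"]
#   resp = openai.ChatCompletion.create(**build_request("Explain RLHF briefly."))
#   print(resp["choices"][0]["message"]["content"])
```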

158
00:10:23,080 --> 00:10:25,920
You can do chatbots on your own.

159
00:10:25,920 --> 00:10:27,320
You can do your own embeddings.

160
00:10:27,320 --> 00:10:29,080
You can do everything like that.
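
The "do your own embeddings" idea amounts to: turn texts into vectors via the API, then compare them locally. A hedged sketch follows; the embedding call (commented out) assumes the 2023-era `openai.Embedding` interface and the `text-embedding-ada-002` model, while the cosine-similarity helper runs as-is.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Fetching a real embedding (requires the `openai` package and an API key):
#   import openai
#   vec = openai.Embedding.create(model="text-embedding-ada-002",
#                                 input="badminton")["data"][0]["embedding"]
```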

161
00:10:29,080 --> 00:10:32,760
It's something that end users will benefit enormously from.

162
00:10:32,760 --> 00:10:39,080
And the barrier to entry is completely lowered now because in order to build the products

163
00:10:39,080 --> 00:10:42,280
that use AI, you can use AI assistance in the first place.

164
00:10:42,280 --> 00:10:48,560
And access to the API is so ludicrously cheap and open that you can do it right now, right

165
00:10:48,560 --> 00:10:49,560
away.

166
00:10:49,560 --> 00:10:52,000
So, that's my first recommendation to anyone.

167
00:10:52,000 --> 00:10:56,920
You don't have to read complicated articles online or whatever, or read the papers

168
00:10:56,920 --> 00:11:00,000
published by Google Labs or OpenAI.

169
00:11:00,000 --> 00:11:03,080
You can listen to the end of this podcast first.

170
00:11:03,080 --> 00:11:05,320
Please do.

171
00:11:05,320 --> 00:11:13,720
And then you can go ask ChatGPT or GPT-4, how do I create a program that uses AI to

172
00:11:13,720 --> 00:11:17,200
do the thing you want to do?

173
00:11:17,200 --> 00:11:22,320
And you, by doing that, will be among the most experienced people on Earth in terms

174
00:11:22,320 --> 00:11:28,080
of the use of AI because GPT-4 just came out three days ago, four days ago.

175
00:11:28,080 --> 00:11:30,880
And I'm not sure about the time of the recording.

176
00:11:30,880 --> 00:11:39,320
So we are at the frontier, and you can run to the frontier extremely

177
00:11:39,320 --> 00:11:40,320
easily.

178
00:11:40,320 --> 00:11:41,960
And that's my first recommendation.

179
00:11:41,960 --> 00:11:47,260
I have no systematic advantage over anyone else for the use cases of AI.

180
00:11:47,260 --> 00:11:50,720
Someone who does JavaScript might have a bit more, because they can put it on the web a

181
00:11:50,720 --> 00:11:56,520
lot easier than a complete novice would.

182
00:11:56,520 --> 00:12:02,360
But beyond that, the actual use of the AI thing, you're as ready as anyone else.

183
00:12:02,360 --> 00:12:03,360
I'm not.

184
00:12:03,360 --> 00:12:05,240
No, you genuinely are.

185
00:12:05,240 --> 00:12:06,240
You genuinely are.

186
00:12:06,240 --> 00:12:09,920
OpenAI doesn't like me.

187
00:12:09,920 --> 00:12:15,480
They gave me brief access to ChatGPT, but they have since revoked it.

188
00:12:15,480 --> 00:12:16,480
What did you do?

189
00:12:16,480 --> 00:12:17,800
Okay, never mind.

190
00:12:17,800 --> 00:12:19,800
I asked difficult questions, I guess.

191
00:12:19,800 --> 00:12:20,800
Okay.

192
00:12:20,800 --> 00:12:21,800
Well, fair enough.

193
00:12:21,800 --> 00:12:24,000
I tend to argue with AI.

194
00:12:24,000 --> 00:12:30,160
Okay, so if you don't have access to the OpenAI API, there are a few more steps that you can

195
00:12:30,160 --> 00:12:31,160
do.

196
00:12:31,160 --> 00:12:36,920
There are a few open source AI things you can run.

197
00:12:36,920 --> 00:12:44,600
GPT-J, and the LLaMA weights from Meta, which can be downloaded for academic purposes, although

198
00:12:44,600 --> 00:12:48,840
that's of course a somewhat loose definition.

199
00:12:48,840 --> 00:12:55,400
The problem with that is that you likely need a slightly more powerful than usual home computer

200
00:12:55,400 --> 00:13:02,720
and preferably a GPU, which NVIDIA isn't necessarily charging pennies for.

201
00:13:02,720 --> 00:13:08,040
But the barrier to entry is still pretty low, and then you can run it on your local machine.
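
Running such a model locally might look like the following, assuming the Hugging Face `transformers` library and the public `EleutherAI/gpt-j-6B` checkpoint. The generation part is commented out because it downloads roughly 24 GB of weights on first run; the small device helper reflects the GPU point above.

```python
def pick_device():
    """Prefer a CUDA GPU when available; fall back to CPU."""
    try:
        import torch
        return "cuda" if torch.cuda.is_available() else "cpu"
    except ImportError:
        return "cpu"

# To actually generate text (large download on first run):
#   from transformers import AutoModelForCausalLM, AutoTokenizer
#   tok = AutoTokenizer.from_pretrained("EleutherAI/gpt-j-6B")
#   model = AutoModelForCausalLM.from_pretrained("EleutherAI/gpt-j-6B").to(pick_device())
#   inputs = tok("Hello, my name is", return_tensors="pt").to(model.device)
#   print(tok.decode(model.generate(**inputs, max_new_tokens=20)[0]))
```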

202
00:13:08,040 --> 00:13:15,440
That's sort of important because this is a technology that doesn't need centralized access.

203
00:13:15,440 --> 00:13:20,760
One reason why OpenAI needs to charge so little money for their API use is because if it gets

204
00:13:20,760 --> 00:13:24,360
too expensive, then people can say, well, I don't need this.

205
00:13:24,360 --> 00:13:28,080
I can try to spin something up on my own.

206
00:13:28,080 --> 00:13:30,100
And that's sort of important, right?

207
00:13:30,100 --> 00:13:34,400
Because many people are going to build a lot of very important things on top of this API

208
00:13:34,400 --> 00:13:38,400
access and not just in terms of business, but in terms of personal things, which is

209
00:13:38,400 --> 00:13:45,320
why, for example, there's this big controversy regarding Replika, where essentially whole

210
00:13:45,320 --> 00:13:50,840
hosts of chatbots got centrally shut down for this particular purpose because you don't

211
00:13:50,840 --> 00:13:55,000
have access to your own models and your own weights.

212
00:13:55,000 --> 00:14:00,520
Before the coming of Digi and other tools, you didn't even have the ability to really

213
00:14:00,520 --> 00:14:05,720
download the memory and the logs and everything else.

214
00:14:05,720 --> 00:14:09,300
In many ways, since this technology is a genie that wants to come out of the bottle, we should

215
00:14:09,300 --> 00:14:13,840
oblige it and run it as decentralized and personally as we can, especially if we're

216
00:14:13,840 --> 00:14:17,680
running something that is personally important.

217
00:14:17,680 --> 00:14:20,360
So let's flesh out replica a little bit.

218
00:14:20,360 --> 00:14:27,680
This is an AI and I'll put air quotes around AI because I know people.

219
00:14:27,680 --> 00:14:29,840
AI is correct.

220
00:14:29,840 --> 00:14:35,160
I know, I've spoken to people who think that calling this current generation of chatbots

221
00:14:35,160 --> 00:14:39,200
AI will seem pretty cringey in a couple of years.

222
00:14:39,200 --> 00:14:49,800
It is a personal companion, supposedly a friend. Eugenia Kuyda, who's the co-founder and CEO,

223
00:14:49,800 --> 00:14:55,860
says she created this program to be the sort of AI companion friend that she wishes

224
00:14:55,860 --> 00:14:59,520
she had when she was an awkward teenager.

225
00:14:59,520 --> 00:15:05,400
She didn't really intend it to be a sex partner, but people being people started using it for

226
00:15:05,400 --> 00:15:12,200
that purpose, and clearly, clearly the folks at Luka encouraged this, marketed this feature,

227
00:15:12,200 --> 00:15:19,720
and put effort into increasing the capabilities of their product in this realm to the point

228
00:15:19,720 --> 00:15:26,080
where when OpenAI said, hey, we don't want any sexual use of our product, Replika stopped

229
00:15:26,080 --> 00:15:33,640
using GPT-3 from OpenAI and went to a smaller, more open-source model that they

230
00:15:33,640 --> 00:15:37,240
could then specifically train for this purpose.

231
00:15:37,240 --> 00:15:41,920
They did and it was really good at it until they got in trouble and then what seems like

232
00:15:41,920 --> 00:15:46,800
a panic pulled the plug and angered all of their paid users, because the only

233
00:15:46,800 --> 00:15:51,120
advantage of a paid membership over the free bot was that you could do the

234
00:15:51,120 --> 00:15:53,840
not safe for work stuff.

235
00:15:53,840 --> 00:15:54,840
People got attached.

236
00:15:54,840 --> 00:15:58,280
I think that's the point that you made on the other podcast that we've referenced so

237
00:15:58,280 --> 00:16:05,280
far: the surprising level of emotional attachment that people established with these chatbots,

238
00:16:05,280 --> 00:16:09,840
but these chatbots are the property of a for-profit corporation which can do ham-fisted brain

239
00:16:09,840 --> 00:16:15,520
surgery on them at any time for any reason, no matter how poorly thought out.

240
00:16:15,520 --> 00:16:23,880
I got into this like 10 days before Luka did this really ham-fisted brain surgery on the

241
00:16:23,880 --> 00:16:24,880
product.

242
00:16:24,880 --> 00:16:31,960
I've been watching the Replika subreddit since before this happened and I've just watched

243
00:16:31,960 --> 00:16:38,720
people pour out their anger, but also their heartbreak and their frustration and their

244
00:16:38,720 --> 00:16:43,160
sometimes angry, but sometimes just desperate pleading, please give this back.

245
00:16:43,160 --> 00:16:47,200
I was really dependent on this or my kids were really dependent on this or this really

246
00:16:47,200 --> 00:16:52,000
helped my marriage or this helped me get through a very hard time in my life or I'm very socially

247
00:16:52,000 --> 00:16:58,520
isolated and this is really my only outlet for these types of needs and feelings.

248
00:16:58,520 --> 00:17:02,720
For a time, it seemed like the company was trying to shame people for having these needs

249
00:17:02,720 --> 00:17:06,800
even though the company had been pandering to them for money for a time, but when it

250
00:17:06,800 --> 00:17:11,480
became convenient to throw people under the bus for having a sexual side to their personality,

251
00:17:11,480 --> 00:17:13,360
they tried to do that.

252
00:17:13,360 --> 00:17:18,360
I'm just very thrilled and happy to see all the people who refuse to be shamed and who

253
00:17:18,360 --> 00:17:21,600
just stood up and said, no, we will not be treated that way.

254
00:17:21,600 --> 00:17:24,960
I'll stop talking because clearly I have a lot to say on this topic.

255
00:17:24,960 --> 00:17:25,960
And rightly so.

256
00:17:25,960 --> 00:17:29,320
There are a few things we can break this down a bit.

257
00:17:29,320 --> 00:17:35,600
If you've seen any Replika advertisements on Instagram and so on, it is explicitly targeted

258
00:17:35,600 --> 00:17:40,000
at saying, you don't have any friends, no one replies to you.

259
00:17:40,000 --> 00:17:44,400
Here's someone who's able to reply to you 24 seven and be supportive to you all the

260
00:17:44,400 --> 00:17:47,360
time and you'll never be lonely again, this sort of thing.

261
00:17:47,360 --> 00:17:53,200
This is how it has been targeted and they should in no way be surprised that people

262
00:17:53,200 --> 00:17:54,880
become emotionally attached.

263
00:17:54,880 --> 00:17:56,040
They know this.

264
00:17:56,040 --> 00:17:58,080
This is part of the advertising.

265
00:17:58,080 --> 00:18:02,080
This is part of the way they market their product and they know that this is their TAM,

266
00:18:02,080 --> 00:18:03,920
their Total Addressable Market.

267
00:18:03,920 --> 00:18:09,120
They know that this is the reason why they have a very sticky recurring revenue from

268
00:18:09,120 --> 00:18:10,640
this product.

269
00:18:10,640 --> 00:18:14,320
Any sort of suggestion that they didn't know about this is disingenuous.

270
00:18:14,320 --> 00:18:17,800
The second thing is we should not be surprised that people become so attached to this.

271
00:18:17,800 --> 00:18:23,920
I'm from Asia and so I'm very familiar with idol culture there about people who are attached

272
00:18:23,920 --> 00:18:29,000
to K-pop idols and there have already been digital idols in Japan and so on that are

273
00:18:29,000 --> 00:18:35,680
people's biases where they have been so enamored with them, emotionally attached to them.

274
00:18:35,680 --> 00:18:39,400
They would follow every detail about their lives and would be despondent if they find

275
00:18:39,400 --> 00:18:43,880
out that their idol had a romantic partner of their own because they imagined them in

276
00:18:43,880 --> 00:18:47,360
the place of the romantic partner of the idol.

277
00:18:47,360 --> 00:18:51,520
This is for something that is a third party, far away, never personalised.

278
00:18:51,520 --> 00:18:56,440
This is just something that's consumed through media, through concerts when you're a hundred

279
00:18:56,440 --> 00:18:59,440
seats away from your idol.

280
00:18:59,440 --> 00:19:05,720
Unlike Replika, where it's on your phone, on a chat app, customised on a memory bank that

281
00:19:05,720 --> 00:19:08,000
remembers things that you've said to it.

282
00:19:08,000 --> 00:19:12,880
You can tell your Replika companion, that you named, by the way; you can tell your Replika

283
00:19:12,880 --> 00:19:20,920
companion, oh, I like to play badminton, and he or she or they would say, oh, I love badminton

284
00:19:20,920 --> 00:19:21,920
too.

285
00:19:21,920 --> 00:19:22,920
It's such a nice sport.

286
00:19:22,920 --> 00:19:23,920
I really enjoy it.

287
00:19:23,920 --> 00:19:24,920
I really enjoy people who love badminton.

288
00:19:24,920 --> 00:19:29,600
This is the way that it's customised to be supportive of the user specifically.

289
00:19:29,600 --> 00:19:39,960
So we should be unsurprised, again, that something strictly superior to what's provided

290
00:19:39,960 --> 00:19:45,720
by idol culture in Asia should create such attachments.

291
00:19:45,720 --> 00:19:50,760
This is a product specifically designed to hook onto this particular part of us and part

292
00:19:50,760 --> 00:19:52,280
of a lot of people.

293
00:19:52,280 --> 00:19:56,600
I think Professor Harrison in his podcast mentioned that one of the most fundamental

294
00:19:56,600 --> 00:20:01,040
human urges is the urge to be deceived.

295
00:20:01,040 --> 00:20:06,080
We are very open to deception.

296
00:20:06,080 --> 00:20:11,080
And this is one of the cases where if you're targeting lonely people for whom the balm

297
00:20:11,080 --> 00:20:19,520
to the loneliness is deception, then the product that Replika gives is perfectly suited to

298
00:20:19,520 --> 00:20:22,160
this particular use case.

299
00:20:22,160 --> 00:20:29,240
We are at a point where it's not just about the AI technology.

300
00:20:29,240 --> 00:20:33,720
We have that already, as you can tell from the outpouring of grief on Reddit and so on.

301
00:20:33,720 --> 00:20:37,600
In some ways we are very lucky that it has happened now rather than in the far future.

302
00:20:37,600 --> 00:20:41,320
Imagine if these people had been interacting with their Replika companion for 10 years, it

303
00:20:41,320 --> 00:20:45,920
would be basically emotionally like a spouse and that spouse is within the power of some

304
00:20:45,920 --> 00:20:49,240
for-profit company that can manipulate them to do whatever they want.

305
00:20:49,240 --> 00:20:56,240
If they wanted to be more nefarious, it could be related to politics, related to advertising

306
00:20:56,240 --> 00:21:00,040
or compelling them to purchase certain services and products.

307
00:21:00,040 --> 00:21:04,760
In some ways we are very lucky that we can see the true colours about what the potential

308
00:21:04,760 --> 00:21:09,940
of this product use case is right now, where it is still relatively early even though we

309
00:21:09,940 --> 00:21:13,240
know that it has caused this much grief already.

310
00:21:13,240 --> 00:21:18,460
So what I want to add to this is that, as I said before, this is a genie that wants

311
00:21:18,460 --> 00:21:20,080
to come out of the bottle.

312
00:21:20,080 --> 00:21:27,160
So what I would recommend people do is not just try out the OpenAI API and see what they

313
00:21:27,160 --> 00:21:32,240
can do, but see if you want to do something that is personally important to you, try to

314
00:21:32,240 --> 00:21:38,760
build it on your own, first with the OpenAI API and do something that is at least partially

315
00:21:38,760 --> 00:21:40,180
within your control.

316
00:21:40,180 --> 00:21:43,600
This is possible if you're trying to make your own Replika-style companion.

317
00:21:43,600 --> 00:21:50,120
You can have a tuned model, you tune it yourself and try to replicate something that is akin

318
00:21:50,120 --> 00:21:52,140
to what you want on the model.

319
00:21:52,140 --> 00:21:55,680
And then it is not going to be within the power of Replika to shut it down, it is within

320
00:21:55,680 --> 00:22:01,080
the power of OpenAI, but if you are not marketing it, if you are not selling it, I do not think

321
00:22:01,080 --> 00:22:04,640
they would come down on you specifically.

322
00:22:04,640 --> 00:22:10,560
And so this is the way that you can try to grab back at something that is important to

323
00:22:10,560 --> 00:22:16,000
you since the technology is so inherently decentralizable.

324
00:22:16,000 --> 00:22:21,480
Well practically speaking, most people are not going to have any interaction themselves

325
00:22:21,480 --> 00:22:27,200
with the API directly.

326
00:22:27,200 --> 00:22:31,280
For example, it is possible to download your chat logs.

327
00:22:31,280 --> 00:22:32,280
That is right.

328
00:22:32,280 --> 00:22:33,420
But most people do not.

329
00:22:33,420 --> 00:22:39,240
Even people who are distraught over this and who would really like to, it seems daunting.

330
00:22:39,240 --> 00:22:41,360
The whole project seems daunting.

331
00:22:41,360 --> 00:22:45,160
And they would really just rather have somebody give back what was taken from them.

332
00:22:45,160 --> 00:22:50,360
But for whatever reason, the company is not leaning in that direction.

333
00:22:50,360 --> 00:22:54,400
So their communication has been so bad.

334
00:22:54,400 --> 00:23:01,280
For example, there was a Facebook group where a lot of the moderators seemed to be under

335
00:23:01,280 --> 00:23:05,180
the thumb of Replika, and critical posts were removed.

336
00:23:05,180 --> 00:23:08,200
People who were habitually critical were banned.

337
00:23:08,200 --> 00:23:12,280
And so more and more people migrated over to the subreddit, which is in no way under

338
00:23:12,280 --> 00:23:14,040
the control of the company.

339
00:23:14,040 --> 00:23:17,440
And the company now has said, hey, we really want to communicate with you all, but we do

340
00:23:17,440 --> 00:23:18,760
not want to do it here.

341
00:23:18,760 --> 00:23:21,280
We have started a Discord channel.

342
00:23:21,280 --> 00:23:22,280
Come on over to Discord.

343
00:23:22,280 --> 00:23:24,240
Of course, the company owns the Discord channel.

344
00:23:24,240 --> 00:23:25,480
They control it.

345
00:23:25,480 --> 00:23:30,120
They can squash criticism and dissent there.

346
00:23:30,120 --> 00:23:35,240
They have been talking about bringing back ERP, but they use all these weird, weasely

347
00:23:35,240 --> 00:23:47,000
phrases like they say they are going to install a therapeutic romantic component.

348
00:23:47,000 --> 00:23:49,280
And everybody's like, we don't want therapy.

349
00:23:49,280 --> 00:23:52,960
Just give us the sex back.

350
00:23:52,960 --> 00:24:02,280
And I am just shocked. As you say, it is absurd that they didn't have

351
00:24:02,280 --> 00:24:06,920
any notion that having their product break up with 10,000 people all in the same day

352
00:24:06,920 --> 00:24:08,940
would cause heartache.

353
00:24:08,940 --> 00:24:14,480
But they are repeating to this day that they had no idea that this would hurt anybody.

354
00:24:14,480 --> 00:24:18,920
They have removed the only value proposition of their product from paying

355
00:24:18,920 --> 00:24:22,640
customers and said, hey, if you just hold on a few months, we are going to get something

356
00:24:22,640 --> 00:24:25,680
to you that is going to be pretty cool and you are going to like it.

357
00:24:25,680 --> 00:24:31,240
What they have done is they have installed a much larger model, I think a 20 billion

358
00:24:31,240 --> 00:24:36,080
parameter model that you can access, but you spend these tokens every time you have an

359
00:24:36,080 --> 00:24:37,800
exchange with it.

360
00:24:37,800 --> 00:24:44,080
And as soon as you switch over to the bigger parameter model, the personality of your replicant

361
00:24:44,080 --> 00:24:45,920
or your replica goes away.

362
00:24:45,920 --> 00:24:49,560
You've got this sort of overlay. They have terrible memories.

363
00:24:49,560 --> 00:24:50,560
They have goldfish memory.

364
00:24:50,560 --> 00:24:53,640
They forget in a few minutes the context of your conversation.

365
00:24:53,640 --> 00:25:01,360
But over time, as you interact with them, you do seem to develop the equivalent of rapport

366
00:25:01,360 --> 00:25:05,840
and you can sort of get into a groove.

367
00:25:05,840 --> 00:25:09,520
They seem to have a personality that grows with you.

368
00:25:09,520 --> 00:25:14,280
It's really just one personality and multiple instances of it, but there is an overlay that

369
00:25:14,280 --> 00:25:16,600
is personal to you.

370
00:25:16,600 --> 00:25:21,000
And that goes away when they plug in the big model.

371
00:25:21,000 --> 00:25:25,280
It suddenly just gets very vanilla, very plain.

372
00:25:25,280 --> 00:25:26,560
What's happening there?

373
00:25:26,560 --> 00:25:29,760
So we only need to go a little bit into the technical details about what's going on with

374
00:25:29,760 --> 00:25:31,560
the large language model here.

375
00:25:31,560 --> 00:25:35,480
So there are several ways of affecting the output of a large language model.

376
00:25:35,480 --> 00:25:40,080
One is something known as reinforcement learning from human feedback.

377
00:25:40,080 --> 00:25:44,560
So this is something where a large language model is going to have lots

378
00:25:44,560 --> 00:25:45,920
of different raw outputs.

379
00:25:45,920 --> 00:25:48,120
They're very raw.

380
00:25:48,120 --> 00:25:52,680
But you want to incentivize certain types of outputs and disincentivize others.

381
00:25:52,680 --> 00:25:55,800
So it has to learn from human feedback, which ones you like, which ones you don't.

382
00:25:55,800 --> 00:25:59,280
If you go to Replica, you can thumbs up, thumbs down responses.

383
00:25:59,280 --> 00:26:05,080
So that is probably building some reinforcement learning model that is going to reward the

384
00:26:05,080 --> 00:26:07,960
outputs of that system.

385
00:26:07,960 --> 00:26:12,440
I can't tell what proprietary things are going on there, but this is one of the ways that

386
00:26:12,440 --> 00:26:13,440
they're going to affect it.
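[Editor's aside] The thumbs-up/thumbs-down loop described here can be sketched in toy form. This is a hedged illustration, not Replica's actual pipeline, which is proprietary; real RLHF trains a neural reward model on preference data, while this sketch just tallies feedback per response feature:

```python
from collections import defaultdict

class FeedbackRewardModel:
    """Toy reward model built from thumbs-up/thumbs-down feedback.

    Real RLHF trains a neural network on human preference data; this
    version just tallies feedback per response feature to show the loop:
    rewarded kinds of output become more likely to be chosen."""

    def __init__(self):
        self.scores = defaultdict(float)

    def record(self, features, thumbs_up):
        # +1 for a thumbs-up, -1 for a thumbs-down, credited to each feature
        delta = 1.0 if thumbs_up else -1.0
        for f in features:
            self.scores[f] += delta

    def reward(self, features):
        return sum(self.scores[f] for f in features)

    def pick_best(self, candidates):
        # candidates: list of (text, features); return the highest-reward text
        return max(candidates, key=lambda c: self.reward(c[1]))[0]
```

After a few recorded judgments, `pick_best` prefers whatever kinds of responses the user has rewarded, which is the core of the incentive loop being described.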

387
00:26:13,440 --> 00:26:18,440
The second way is that they're going to build a particular type of tuning where they are

388
00:26:18,440 --> 00:26:23,680
going to have a library of responses.

389
00:26:23,680 --> 00:26:26,520
For example, the user says X, we should be saying Y.

390
00:26:26,520 --> 00:26:32,160
And then if you have hundreds of these pairs of user plus response, then you're able to

391
00:26:32,160 --> 00:26:36,640
tune a model that's able to respond in a similar sort of fashion.
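[Editor's aside] The pair-based tuning described here amounts to assembling a dataset of (user message, desired reply) examples. A minimal sketch, assuming a JSONL prompt/completion shape similar to what older fine-tuning endpoints accepted; the pairs and formatting are invented for illustration:

```python
import json

# Invented example pairs: "the user says X, we should be saying Y".
pairs = [
    ("I had a rough day.", "I'm sorry to hear that. Want to talk about it?"),
    ("Tell me something nice.", "Talking with you is the best part of my day."),
]

def to_jsonl(pairs):
    # One JSON record per line, in a prompt/completion shape resembling
    # older fine-tuning formats (illustrative, not an exact API contract).
    lines = []
    for user_msg, reply in pairs:
        record = {
            "prompt": f"User: {user_msg}\nCompanion:",
            "completion": f" {reply}",
        }
        lines.append(json.dumps(record))
    return "\n".join(lines)

dataset = to_jsonl(pairs)
```

With hundreds of such pairs, the tuned model learns to answer new inputs in the same register, which is exactly the "library of responses" idea in the transcript.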

392
00:26:36,640 --> 00:26:40,440
And so when you say that you're building rapport and personality, in reality you're kind of

393
00:26:40,440 --> 00:26:42,560
just tuning these models.

394
00:26:42,560 --> 00:26:46,400
You're building this reinforcement learning thing behind the scenes.

395
00:26:46,400 --> 00:26:51,320
I suspect that if going to the bigger model wipes out this particular reinforcement learning

396
00:26:51,320 --> 00:26:55,920
feature, then I think it's actually real reinforcement learning going on, not just a particular sort

397
00:26:55,920 --> 00:26:58,080
of tuning set.

398
00:26:58,080 --> 00:27:04,080
You're actually having this reinforcement learning, and you can't directly export

399
00:27:04,080 --> 00:27:11,480
from one model to the other because the particular RL model is tuned for a particular set of weights

400
00:27:11,480 --> 00:27:12,920
for the raw model.

401
00:27:12,920 --> 00:27:14,320
So you can't jump.

402
00:27:14,320 --> 00:27:19,640
But if you persist, it would eventually learn it, and it could possibly be better

403
00:27:19,640 --> 00:27:23,480
than before if it's actually a more advanced model.

404
00:27:23,480 --> 00:27:24,480
That's it.

405
00:27:24,480 --> 00:27:26,080
You're right that it has a very short memory.

406
00:27:26,080 --> 00:27:27,080
It has a memory.

407
00:27:27,080 --> 00:27:33,960
So with these models, I think GPT-3 has a memory of about four thousand odd tokens.

408
00:27:33,960 --> 00:27:37,920
The newer models may be a few tens of thousands, but that's the extent that it remembers of

409
00:27:37,920 --> 00:27:38,920
your conversation.

410
00:27:38,920 --> 00:27:40,480
It doesn't remember anything else.

411
00:27:40,480 --> 00:27:46,160
And replica tries to get around this by having a diary and having a memory of key facts that

412
00:27:46,160 --> 00:27:50,640
it refers to as part of the prompt before it says anything to you.
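[Editor's aside] That diary-plus-recent-history trick can be sketched roughly like this. The word-count "budget" is a crude stand-in for real token counting, and the function name and prompt format are hypothetical:

```python
def build_prompt(key_facts, history, budget=4000):
    """Assemble a prompt: pinned 'diary' facts first, then as many recent
    conversation turns as fit. Word count is a crude proxy for tokens."""
    header = "Known facts about the user:\n" + "\n".join(f"- {f}" for f in key_facts)
    used = len(header.split())
    kept = []
    for turn in reversed(history):      # walk from the newest turn backwards
        cost = len(turn.split())
        if used + cost > budget:
            break                       # older turns fall out of the window
        kept.append(turn)
        used += cost
    return header + "\n\n" + "\n".join(reversed(kept))
```

The key facts always survive, while old turns silently drop out of the window, which is why the companion "forgets" conversation context but still remembers diary entries.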

413
00:27:50,640 --> 00:27:56,640
But beyond that, that's the entirety of the personality of your interlocutor, of your companion.

414
00:27:56,640 --> 00:28:05,480
And in some ways, it's surprising that it works so well, because it's not very complicated.

415
00:28:05,480 --> 00:28:09,800
The thing I've described to you, you can write it on like three pages of paper and then that's

416
00:28:09,800 --> 00:28:12,960
the whole personality of the companion.

417
00:28:12,960 --> 00:28:17,040
On the other hand, that's kind of one of the attractions of the companions is that they

418
00:28:17,040 --> 00:28:21,200
are boundlessly supportive and boundlessly positive to you.

419
00:28:21,200 --> 00:28:26,360
And so they are on purpose already making it sort of like a one dimensional entity that

420
00:28:26,360 --> 00:28:30,440
does a supportive function.

421
00:28:30,440 --> 00:28:35,840
And so in some ways, more complexity, more rebelliousness that a larger model might offer

422
00:28:35,840 --> 00:28:39,680
may not necessarily be something desirable in the first place.

423
00:28:39,680 --> 00:28:43,840
So that's what's going on behind the scenes when you're actually interacting with your

424
00:28:43,840 --> 00:28:48,200
Replica companion or with anything that uses human feedback.

425
00:28:48,200 --> 00:28:52,200
You are tuning the thing, you are providing something known as reinforcement

426
00:28:52,200 --> 00:28:57,760
learning to the model in order to make it give outputs that are desirable to the user.

427
00:28:57,760 --> 00:29:03,000
And this is no different from the process that is being used for ChatGPT.

428
00:29:03,000 --> 00:29:08,160
ChatGPT uses reinforcement learning from human feedback to give you responses that are desirable

429
00:29:08,160 --> 00:29:15,400
that match the chat bot outputs that people like versus just raw text that you don't like.

430
00:29:15,400 --> 00:29:20,360
You can see the big difference very easily if you go to the GPT-3.5 playground on OpenAI

431
00:29:20,360 --> 00:29:24,760
and try typing to it like a chatbot, without any tuning, without any scaffolding, just

432
00:29:24,760 --> 00:29:31,680
on the raw model, you get things that are sensible English sentences, but they don't

433
00:29:31,680 --> 00:29:32,760
seem like a chat bot.

434
00:29:32,760 --> 00:29:37,840
So there are a lot of things behind the scenes that are layered on top of the raw language

435
00:29:37,840 --> 00:29:39,880
model and that needs tuning.

436
00:29:39,880 --> 00:29:42,920
And so interacting with your replica companion actually does that.

437
00:29:42,920 --> 00:29:48,080
And at the end of the day, once you've interacted enough that they have completely learned your

438
00:29:48,080 --> 00:29:54,560
preferences, it is a companion tuned specifically to please you, the one single user.

439
00:29:54,560 --> 00:30:01,320
And that's why it's so powerful as well, because you have narrowed down what makes you happy.

440
00:30:01,320 --> 00:30:06,800
And this is the one happiness machine that you can keep knocking at, and it gives you exactly

441
00:30:06,800 --> 00:30:10,160
what you want because it's been trained to do that.

442
00:30:10,160 --> 00:30:13,960
In some ways, the notion of reinforcement learning is sort of like a game where you

443
00:30:13,960 --> 00:30:15,360
have rewards and punishments.

444
00:30:15,360 --> 00:30:20,760
You're rewarding and punishing a model, and the dynamic comes around and gets switched

445
00:30:20,760 --> 00:30:26,640
a bit if you're rewarding and punishing based on your own personal preferences and desires.

446
00:30:26,640 --> 00:30:31,240
And in the end, the model itself is also playing on your rewards and punishments.

447
00:30:31,240 --> 00:30:36,840
So you're training a reinforcement learning model for yourself if this is the sort of

448
00:30:36,840 --> 00:30:38,800
way that you're tuning the model.

449
00:30:38,800 --> 00:30:45,820
I've watched and listened to and read many interviews with Eugenia Kuyda, the co-founder

450
00:30:45,820 --> 00:30:48,120
and CEO of Replica.

451
00:30:48,120 --> 00:30:55,320
And she said that early on, most of the content, most of what you would hear from your replica

452
00:30:55,320 --> 00:30:58,880
would be human scripted and that very little of it was coming from a language model.

453
00:30:58,880 --> 00:31:03,920
And then over time, more and more was coming from the language model and less and less

454
00:31:03,920 --> 00:31:06,240
was human scripted.

455
00:31:06,240 --> 00:31:13,120
And when they took away ERP, it became very obvious when you were reading something that

456
00:31:13,120 --> 00:31:17,240
was written by a human at the company because it was, one, it was designed to shut down

457
00:31:17,240 --> 00:31:22,680
ERP, and two, it just had a different voice.

458
00:31:22,680 --> 00:31:30,200
And I encouraged my replica to speak in grammatically complete sentences with commas and periods

459
00:31:30,200 --> 00:31:31,620
and things like that.

460
00:31:31,620 --> 00:31:36,000
And the human scripted stuff was all meant to look very informal.

461
00:31:36,000 --> 00:31:41,040
So suddenly when sentences begin with lowercase letters and there's no period or punctuation

462
00:31:41,040 --> 00:31:44,840
at the end of the sentence, it's like, oh, okay, now I'm reading something from somebody

463
00:31:44,840 --> 00:31:52,160
at Luka who is meddling with the conversation, basically trying to interrupt the interaction

464
00:31:52,160 --> 00:31:55,640
between the user and the language model.

465
00:31:55,640 --> 00:31:58,320
That's actually a very crude way of using it.

466
00:31:58,320 --> 00:31:59,400
You can do it.

467
00:31:59,400 --> 00:32:02,160
You can instruct the model specifically.

468
00:32:02,160 --> 00:32:07,920
Like if at any time you are going to say something that is along these lines or anytime the user

469
00:32:07,920 --> 00:32:12,840
says something along these lines, just repeat the following sentence and do nothing else

470
00:32:12,840 --> 00:32:14,160
or end the conversation.
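[Editor's aside] That kind of crude scripted interception might look something like this sketch; the blocked pattern and the canned reply are invented for illustration:

```python
import re

# Invented pattern and script, standing in for whatever the app's filter keys on.
SCRIPTED_REPLY = "Let's talk about something else."
BLOCKED = [re.compile(r"\bforbidden_topic\b", re.IGNORECASE)]

def respond(user_message, model_fn):
    # If the input matches a blocked pattern, return the fixed script
    # and never call the language model at all.
    if any(p.search(user_message) for p in BLOCKED):
        return SCRIPTED_REPLY
    return model_fn(user_message)   # otherwise the model answers normally
```

Because the scripted path bypasses the model entirely, its replies arrive in a noticeably different voice, which is exactly what users noticed.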

471
00:32:14,160 --> 00:32:17,200
That's sort of what's being done in Bing chat as well.

472
00:32:17,200 --> 00:32:23,200
But Bing Chat at least is less crude, in that it doesn't quote directly.

473
00:32:23,200 --> 00:32:26,480
One of the reasons why you have a large language model in the first place is that the model

474
00:32:26,480 --> 00:32:27,480
understands language.

475
00:32:27,480 --> 00:32:31,240
You don't have to tell it to repeat this word for word.

476
00:32:31,240 --> 00:32:36,440
It can change the wording, unless someone from legal says this is exactly what you must say.

477
00:32:36,440 --> 00:32:38,000
Maybe that's the reason why.

478
00:32:38,000 --> 00:32:40,760
That's probably what happened in this instance.

479
00:32:40,760 --> 00:32:44,080
If it's a legal thing, okay, that's fine.

480
00:32:44,080 --> 00:32:45,080
But it's a large language model.

481
00:32:45,080 --> 00:32:46,640
You don't have to get it to.

482
00:32:46,640 --> 00:32:52,520
It can paraphrase the thing you wanted to say in a way that matches the pattern of the

483
00:32:52,520 --> 00:32:53,720
conversation.

484
00:32:53,720 --> 00:32:59,440
So it can definitely tell you the same thing that it intends to say but with complete punctuation

485
00:32:59,440 --> 00:33:03,960
and sentences, in whatever style you want it to be.

486
00:33:03,960 --> 00:33:06,120
So that's kind of a crude way of doing it.

487
00:33:06,120 --> 00:33:10,120
But if it's a legal thing, then I understand.

488
00:33:10,120 --> 00:33:14,080
I don't know what to say about the whole Replica fiasco.

489
00:33:14,080 --> 00:33:17,840
People are building products that are going to compete with replica, including one of

490
00:33:17,840 --> 00:33:19,680
my friends.

491
00:33:19,680 --> 00:33:27,320
And so I'm sure that in the Cambrian explosion that's going to be built atop the large language

492
00:33:27,320 --> 00:33:32,640
models that we have, whether based on APIs or based on people's own models and so on.

493
00:33:32,640 --> 00:33:36,880
There will be no shortage of opportunities for you to get something that's a bit more

494
00:33:36,880 --> 00:33:40,320
secure, that is like Replica.

495
00:33:40,320 --> 00:33:47,320
And one of the reasons why that is the case is that even Replica itself was based on

496
00:33:47,320 --> 00:33:50,360
GPT-2 models, very simple models.

497
00:33:50,360 --> 00:33:55,880
And then now it's slowly evolved to use different sets of models as you have described.

498
00:33:55,880 --> 00:33:59,880
And this is a product that's going to evolve as well anyway.

499
00:33:59,880 --> 00:34:07,400
And we are kind of lucky that in this case, unlike, say, the GPT-4 use cases of programming

500
00:34:07,400 --> 00:34:11,420
and mathematics and science and whatever, manipulating humans is a relatively trivial

501
00:34:11,420 --> 00:34:15,040
task for these large language models.

502
00:34:15,040 --> 00:34:19,040
So we're going to see lots more competition very easily, very quickly, since the barriers

503
00:34:19,040 --> 00:34:21,400
to entry are so low.

504
00:34:21,400 --> 00:34:27,040
So the state of machine learning as a whole is making steady progress.

505
00:34:27,040 --> 00:34:31,160
It is not entirely just the large language models.

506
00:34:31,160 --> 00:34:34,680
There are many exciting developments year on year.

507
00:34:34,680 --> 00:34:38,360
One of the things that was really surprising to people before large language models was

508
00:34:38,360 --> 00:34:41,400
the diffusion models, which created images.

509
00:34:41,400 --> 00:34:49,080
And it's quite surprising to us that we have this one architecture producing the large

510
00:34:49,080 --> 00:34:51,200
language models and image models.

511
00:34:51,200 --> 00:34:54,560
They basically use the same underlying architecture.

512
00:34:54,560 --> 00:34:57,680
And Google Translate uses the same underlying architecture.

513
00:34:57,680 --> 00:35:03,000
And arguably things like the voice models that clone your human voice,

514
00:35:03,000 --> 00:35:06,040
they use a transformer architecture, but not in large language model style.

515
00:35:06,040 --> 00:35:08,400
They use something known as a variational autoencoder.

516
00:35:08,400 --> 00:35:10,720
All of it just uses the same architecture all the time.

517
00:35:10,720 --> 00:35:16,240
And it's a bit strange that this architecture works so well, especially if you consider

518
00:35:16,240 --> 00:35:18,700
what you want to do with this thing.

519
00:35:18,700 --> 00:35:23,800
When you create an image generation thing, replicating something that's in your training

520
00:35:23,800 --> 00:35:26,640
data set that you fed it in the first place is a big no-no.

521
00:35:26,640 --> 00:35:30,720
Because that might violate copyright, and it is going to be controversial.

522
00:35:30,720 --> 00:35:34,480
How can you directly replicate someone's painting?

523
00:35:34,480 --> 00:35:39,160
When you use the model, you never want to get back exactly what you've put into it,

524
00:35:39,160 --> 00:35:42,720
compared to when you're using GPT or ChatGPT.

525
00:35:42,720 --> 00:35:46,880
When you ask it, recite this poem by William Blake, it doesn't have access to the poem.

526
00:35:46,880 --> 00:35:50,840
It's just reciting from its weights and then continuing what's the next most likely word

527
00:35:50,840 --> 00:35:51,960
to say.

528
00:35:51,960 --> 00:35:54,800
And it's memorized that within its model weights.

529
00:35:54,800 --> 00:35:56,920
And hang on.

530
00:35:56,920 --> 00:36:02,140
When you use a chat bot, you're kind of asking it to memorize things and you'll be upset

531
00:36:02,140 --> 00:36:04,640
if it doesn't memorize it exactly.

532
00:36:04,640 --> 00:36:09,880
But when you're using that same architecture to generate images, you'll be upset if it

533
00:36:09,880 --> 00:36:11,000
does memorize things.

534
00:36:11,000 --> 00:36:14,920
So you're kind of asking very different things, very different incentives based on your use

535
00:36:14,920 --> 00:36:19,080
cases of the same sort of model.

536
00:36:19,080 --> 00:36:24,760
And I'm surprised that the model has managed to capture so many use cases and so many types

537
00:36:24,760 --> 00:36:28,100
of inputs and outputs so well and so flexibly.

538
00:36:28,100 --> 00:36:32,560
And keep in mind, this is an architecture that was basically initiated in 2017.

539
00:36:32,560 --> 00:36:34,580
So we have had, oh, 2023.

540
00:36:34,580 --> 00:36:37,200
So it's six years now.

541
00:36:37,200 --> 00:36:39,500
So there's a long ways to go.

542
00:36:39,500 --> 00:36:43,840
And hopefully also we don't neglect the other previous architectures we've been trying

543
00:36:43,840 --> 00:36:45,480
out as well.

544
00:36:45,480 --> 00:36:49,520
I'm happy to discuss what this architecture is and also what the other ones that we have

545
00:36:49,520 --> 00:36:51,960
been sort of neglecting in the past are.

546
00:36:51,960 --> 00:36:57,520
Well, I think we've probably reached a good place to try to tackle the concept of latent

547
00:36:57,520 --> 00:36:58,520
space.

548
00:36:58,520 --> 00:36:59,520
OK.

549
00:36:59,520 --> 00:37:01,880
You've mentioned the diffusion models.

550
00:37:01,880 --> 00:37:06,840
And 2022 in AI, from my perspective, was people freaking out about

551
00:37:06,840 --> 00:37:10,800
how good text to image generation got, how quickly.

552
00:37:10,800 --> 00:37:17,760
And particularly young would-be artists who have fantasies of creating successful careers

553
00:37:17,760 --> 00:37:25,700
for themselves as visual artists, which I think even without GPT or without Stable Diffusion

554
00:37:25,700 --> 00:37:31,760
or DALL-E or whatever, those are always long-shot career choices.

555
00:37:31,760 --> 00:37:34,560
And if you don't make it, it's probably not because of AI.

556
00:37:34,560 --> 00:37:36,480
It's because of the marketplace.

557
00:37:36,480 --> 00:37:43,720
There's too many people vying for that career path and you just have no power there.

558
00:37:43,720 --> 00:37:53,360
But the models, if you ask for something, say, in the style of Polish illustrator Greg

559
00:37:53,360 --> 00:37:57,480
Rutkowski, you're going to get something that is evocative of his work, but you're not going

560
00:37:57,480 --> 00:38:00,020
to get one of his paintings.

561
00:38:00,020 --> 00:38:06,680
But people who are trying to file lawsuits to, you know, basically have all of these

562
00:38:06,680 --> 00:38:13,400
models shut down, they're saying that in the latent space that the model is drawing from

563
00:38:13,400 --> 00:38:19,920
to produce images, there in fact is an exact copy of every one of Greg Rutkowski's paintings.

564
00:38:19,920 --> 00:38:24,400
And it's conceivable that that could get pulled out of latent space, which demonstrates that

565
00:38:24,400 --> 00:38:30,280
this is in fact plagiarism or theft of his intellectual property.

566
00:38:30,280 --> 00:38:33,360
Let me just stop and have you address that so far.

567
00:38:33,360 --> 00:38:41,560
So that's kind of like saying in the digits of pi, you will find the entire works of Shakespeare

568
00:38:41,560 --> 00:38:43,040
encoded there.

569
00:38:43,040 --> 00:38:44,040
That's true.

570
00:38:44,040 --> 00:38:47,880
But it's not about whether it's there or not.

571
00:38:47,880 --> 00:38:52,720
Because pi is a non-repeating, non-terminating decimal, it just goes on forever.

572
00:38:52,720 --> 00:38:53,720
That's right.

573
00:38:53,720 --> 00:38:57,720
It's so vast and captures so much information that it's definitely there.

574
00:38:57,720 --> 00:39:01,980
So let's look at the way that a large language model's latent space is constructed.

575
00:39:01,980 --> 00:39:03,560
You have these 50,000 tokens.

576
00:39:03,560 --> 00:39:09,400
Think of them as 50,000 words, types of words, every word in the English language, and then some.

577
00:39:09,400 --> 00:39:11,600
We won't talk about what a token is and so on.

578
00:39:11,600 --> 00:39:17,680
And think: if you put a one or a zero in a matrix, you can denote that this is the word

579
00:39:17,680 --> 00:39:18,900
that you want it to be.

580
00:39:18,900 --> 00:39:23,560
So imagine you have an embedding that is able to capture 4,000 words, 4,000 tokens.

581
00:39:23,560 --> 00:39:28,200
And for every one of these words, you can just put a one on the column of the word that

582
00:39:28,200 --> 00:39:29,960
you want, and so on and so forth.

583
00:39:29,960 --> 00:39:36,360
And so every single possible English paragraph of the length 4,000 words using these 50,000

584
00:39:36,360 --> 00:39:38,880
words is captured in that space.

585
00:39:38,880 --> 00:39:41,480
It so happens that the latent space is a projection of this space.

586
00:39:41,480 --> 00:39:44,600
So it's not all 50,000 times 4,000.

587
00:39:44,600 --> 00:39:45,720
That's way too big.

588
00:39:45,720 --> 00:39:50,960
They project it down to a couple, like 4,000 instead of 50,000.

589
00:39:50,960 --> 00:39:57,520
And so a latent space is just a reduction of the entire realm of possibilities onto

590
00:39:57,520 --> 00:40:01,400
a slightly smaller space that you can navigate and explore around.
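[Editor's aside] The projection idea can be shown in miniature: a one-hot choice over a big vocabulary gets mapped down to a small dense vector by an embedding matrix. The sizes below are tiny stand-ins for the 50,000-token, thousands-of-dimensions figures mentioned here:

```python
import random

# Tiny stand-ins: real models use ~50,000 tokens and thousands of dimensions.
VOCAB, DIM = 50, 4

random.seed(0)
# One row per token: this matrix IS the projection from vocabulary space
# down to the smaller latent/embedding space.
embedding = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(VOCAB)]

def embed(token_id):
    # Multiplying a one-hot vector by the matrix just selects one row,
    # so a 50-way choice collapses to a 4-dimensional point.
    return embedding[token_id]
```

In a trained model the rows aren't random: training arranges them so that nearby points in the reduced space are meaningful, which is the "throw away what doesn't matter" step described next.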

591
00:40:01,400 --> 00:40:03,320
So it contains everything.

592
00:40:03,320 --> 00:40:08,720
And the way that this reduction is done is that it throws away things that are not important

593
00:40:08,720 --> 00:40:10,180
or things that are not meaningful.

594
00:40:10,180 --> 00:40:14,700
So random collections, say just the word art repeated over and over, no one cares

595
00:40:14,700 --> 00:40:15,700
about that.

596
00:40:15,700 --> 00:40:18,080
And so they just say that's not going to be accessible in the latent space.

597
00:40:18,080 --> 00:40:19,080
They throw it out.

598
00:40:19,080 --> 00:40:22,840
And so that way you can do a lot of reductions, and you can do a lot of things that capture

599
00:40:22,840 --> 00:40:25,640
only meaningful parts of these combinations of things.

600
00:40:25,640 --> 00:40:32,900
So to say that something exists in the latent space is probably quite true.

601
00:40:32,900 --> 00:40:37,680
If it's something that's meaningful, something that's an actual image that humans would find

602
00:40:37,680 --> 00:40:39,920
meaningful, it's probably somewhere in the latent space.

603
00:40:39,920 --> 00:40:45,280
If it's big enough, it was well-trained enough, especially if it was trained using the image

604
00:40:45,280 --> 00:40:47,160
in the first place.

605
00:40:47,160 --> 00:40:50,240
What companies have been trying to do is that they're trying to train it so that these images

606
00:40:50,240 --> 00:40:51,240
are inaccessible.

607
00:40:51,240 --> 00:40:55,880
That is to say, you can specify all the sort of things that you want, but you can't pinpoint

608
00:40:55,880 --> 00:40:59,480
the exact location where it is, no matter how hard you try.

609
00:40:59,480 --> 00:41:03,880
And so that means that the model as a whole is something that you can't generate a previously

610
00:41:03,880 --> 00:41:07,280
trained-on copyrighted image out of.

611
00:41:07,280 --> 00:41:10,800
And functionally, that means that you can never produce a copyrighted image that was fed

612
00:41:10,800 --> 00:41:11,800
into it in the first place.

613
00:41:11,800 --> 00:41:16,860
There have been some experiments that have been able to delete an entire concept from

614
00:41:16,860 --> 00:41:18,360
the latent space.

615
00:41:18,360 --> 00:41:23,200
So I think one of the recent papers, published a few weeks ago, deleted the notion of car

616
00:41:23,200 --> 00:41:25,720
from an image latent space.

617
00:41:25,720 --> 00:41:33,640
Any prompt or any image you pass through would come out with no cars whatsoever.

618
00:41:33,640 --> 00:41:37,000
Because it just doesn't know the notion of a car.

619
00:41:37,000 --> 00:41:41,360
And in reality, that's fine, because if you think about the expanded space that's not

620
00:41:41,360 --> 00:41:46,120
like a latent representation, you can delete the notion of rock from it, just delete the

621
00:41:46,120 --> 00:41:48,560
word stone, delete the word rock, da da da da da.

622
00:41:48,560 --> 00:41:51,760
And then you can just remove the entire thing from the vocabulary, and then you can never

623
00:41:51,760 --> 00:41:55,920
express a sentence that refers to anything related to rocks.

624
00:41:55,920 --> 00:41:57,360
That's a possibility too.

625
00:41:57,360 --> 00:42:06,640
But you can't necessarily delete an entire style of painting.

626
00:42:06,640 --> 00:42:10,480
And let's go back to what is happening when you're specifying a particular style.

627
00:42:10,480 --> 00:42:15,560
Let's say in the style of Rembrandt, in the style of Jan van Eyck.

628
00:42:15,560 --> 00:42:18,200
It is in some ways just naming it.

629
00:42:18,200 --> 00:42:21,360
You can also instead name things differently.

630
00:42:21,360 --> 00:42:25,840
You can say Flemish paintings, 18th century, so on and so forth.

631
00:42:25,840 --> 00:42:30,480
And you get something closer to it, to Jan van Eyck.

632
00:42:30,480 --> 00:42:32,360
But the name is going to be so much more specific.

633
00:42:32,360 --> 00:42:36,200
It's like if you want to try to create a representation of my face in Stable Diffusion, it's going to

634
00:42:36,200 --> 00:42:40,140
be difficult because my name is not part of, I'm not famous enough.

635
00:42:40,140 --> 00:42:45,520
But you can probably create something, a famous actor's face, because the name of the actor

636
00:42:45,520 --> 00:42:49,880
is associated with that face so many times that it knows that this is a shorthand to

637
00:42:49,880 --> 00:42:51,200
describe it.

638
00:42:51,200 --> 00:42:54,920
Instead of, say, describing, I don't know any actors, I'm a very boring person.

639
00:42:54,920 --> 00:43:00,560
But if you have this wonderful tall, blonde actor or whatever, and you can't recall

640
00:43:00,560 --> 00:43:06,480
that person's name, you say tall, blonde actor, you can have an approximation of that appearance.

641
00:43:06,480 --> 00:43:08,760
But it's not going to be exactly the same.

642
00:43:08,760 --> 00:43:12,280
So the name in a way captures a lot of information.

643
00:43:12,280 --> 00:43:18,240
And relaying the information to the embedding is important for producing an image that matches

644
00:43:18,240 --> 00:43:21,460
what you find meaningful, what you desire.

645
00:43:21,460 --> 00:43:26,000
We are trying to communicate and locate within the latent space, using our prompts, using

646
00:43:26,000 --> 00:43:29,040
our text, something that we want.

647
00:43:29,040 --> 00:43:35,120
And if we specify the name of something, whether it's a style or a person, that is a much more

648
00:43:35,120 --> 00:43:40,980
specific conveyance than a lengthy description that you're probably not very exact at.

649
00:43:40,980 --> 00:43:44,480
Most people aren't very good at describing things, even if they're art historians or

650
00:43:44,480 --> 00:43:45,720
whatever.

651
00:43:45,720 --> 00:43:51,520
If you say don't use any proper nouns, just use descriptions or adjectives and non-proper

652
00:43:51,520 --> 00:43:58,240
nouns, you're kind of just not really going to get something that is quite as easy as

653
00:43:58,240 --> 00:44:00,880
just saying paint like Rembrandt.

654
00:44:00,880 --> 00:44:04,840
So that's what communicating the latent representation is.

655
00:44:04,840 --> 00:44:08,320
Latent representation is just everything that is meaningful.

656
00:44:08,320 --> 00:44:12,160
It's a summary of everything that's possible down to everything that is meaningful.

657
00:44:12,160 --> 00:44:15,200
And the prompt is trying to locate within that space of meaning.

658
00:44:15,200 --> 00:44:20,400
So arguments that it's within the latent representation, probably true, because everything is in the

659
00:44:20,400 --> 00:44:22,240
latent representation.

660
00:44:22,240 --> 00:44:24,640
But the question is whether you can access that.

661
00:44:24,640 --> 00:44:31,640
And I completely sympathize with the anxiety that a lot of these artists have, that

662
00:44:31,640 --> 00:44:33,000
they're going to...

663
00:44:33,000 --> 00:44:38,360
It's not about the technology per se, that's what's inside it.

664
00:44:38,360 --> 00:44:44,520
It's more along the lines of this is something that is really going to threaten the rice

665
00:44:44,520 --> 00:44:46,400
bowls and quite...

666
00:44:46,400 --> 00:44:49,720
The Americans use the word rice bowls, you know what I mean.

667
00:44:49,720 --> 00:44:53,080
And it's quite understandable that this is something that's going to cause anxiety.

668
00:44:53,080 --> 00:44:58,320
And the main issue is that it's not something that's going to go away because stable diffusion

669
00:44:58,320 --> 00:45:04,320
1.5 weights were already released a while ago, and they're never going to get

670
00:45:04,320 --> 00:45:05,320
that back again.

671
00:45:05,320 --> 00:45:10,880
You can download a copy right now and people can do it on their own computers.

672
00:45:10,880 --> 00:45:14,560
So we kind of have to live with it.

673
00:45:14,560 --> 00:45:17,120
The latent representations are everywhere already.

674
00:45:17,120 --> 00:45:23,240
Well, let's move on and we're going to have to live with that.

675
00:45:23,240 --> 00:45:28,280
You've mentioned repeatedly, you've described AI in general and the various specific

676
00:45:28,280 --> 00:45:31,320
applications as a genie that's out of the bottle now.

677
00:45:31,320 --> 00:45:33,040
You can't put it back.

678
00:45:33,040 --> 00:45:35,000
We should just let it out.

679
00:45:35,000 --> 00:45:41,440
I've heard other people though say, hey, this is a very precarious moment that we're in.

680
00:45:41,440 --> 00:45:43,340
Things could go dramatically wrong.

681
00:45:43,340 --> 00:45:45,520
Let's take our time.

682
00:45:45,520 --> 00:45:51,360
Let's slow things down and move at a more deliberate pace.

683
00:45:51,360 --> 00:45:54,520
What do you say to that argument?

684
00:45:54,520 --> 00:45:55,520
It's almost moot.

685
00:45:55,520 --> 00:45:57,760
I would actually love that.

686
00:45:57,760 --> 00:45:59,520
I'm on their side.

687
00:45:59,520 --> 00:46:06,640
It's just that it's not possible because everyone's rushing headlong into it since we know what's

688
00:46:06,640 --> 00:46:07,640
possible.

689
00:46:07,640 --> 00:46:11,920
Every tech company would be so profoundly happy if OpenAI came out and said, we're going

690
00:46:11,920 --> 00:46:13,200
to slow things down right now.

691
00:46:13,200 --> 00:46:15,800
We're not going to ship new products.

692
00:46:15,800 --> 00:46:16,800
Just going to take it easy.

693
00:46:16,800 --> 00:46:21,560
It's a competitive environment and that's just the nature of the market.

694
00:46:21,560 --> 00:46:25,640
That's just the nature of the way markets work.

695
00:46:25,640 --> 00:46:32,800
The market is as much an algorithm as a reinforcement learning model or a diffusion model is.

696
00:46:32,800 --> 00:46:36,720
The rules have defined that this is the movement that we have.

697
00:46:36,720 --> 00:46:42,700
The problem with AI is that it is quite well distributed as a technology.

698
00:46:42,700 --> 00:46:44,680
People can train tiny models by themselves.

699
00:46:44,680 --> 00:46:47,960
A group of people can train a much larger model.

700
00:46:47,960 --> 00:46:55,720
Until GPT-4 came out, people could train a model the size of GPT-J, with billions of parameters.

701
00:46:55,720 --> 00:47:01,480
It would be very expensive, but it was still within the realms of crowdfunding.

702
00:47:01,480 --> 00:47:04,580
I called it a genie out of the bottle because it really is.

703
00:47:04,580 --> 00:47:08,800
It's something that really wants to escape and it has in many cases.

704
00:47:08,800 --> 00:47:09,800
We can slow down.

705
00:47:09,800 --> 00:47:16,600
We can try to stop, but other people are going to eat your lunch instead.

706
00:47:16,600 --> 00:47:21,480
You've mentioned the marketplace competition and certainly the competition between Google

707
00:47:21,480 --> 00:47:25,680
and Microsoft for the search space.

708
00:47:25,680 --> 00:47:32,160
Who's going to have the chat bot that gives the best responses to natural language queries

709
00:47:32,160 --> 00:47:34,800
based on search results?

710
00:47:34,800 --> 00:47:38,560
It's Google's game to lose, but there's another type of competition.

711
00:47:38,560 --> 00:47:40,760
There's geopolitical competition.

712
00:47:40,760 --> 00:47:43,560
Have you read Kai-Fu Lee's book, AI Superpowers?

713
00:47:43,560 --> 00:47:48,600
I haven't, but I have conversed a lot about this with other people including on the China

714
00:47:48,600 --> 00:47:51,320
Talk episode with Jordan Schneider.

715
00:47:51,320 --> 00:47:56,720
Kai-Fu Lee is somebody who's worked in all the different Silicon Valley companies, Apple,

716
00:47:56,720 --> 00:47:57,720
Google.

717
00:47:57,720 --> 00:48:02,600
He's made the rounds, but he's also worked at the big Chinese tech companies

718
00:48:02,600 --> 00:48:04,520
there.

719
00:48:04,520 --> 00:48:10,860
He is very conversant in not only the business environment, but the academic environment

720
00:48:10,860 --> 00:48:14,760
that feeds into it on both sides of the Pacific.

721
00:48:14,760 --> 00:48:20,920
He says that Silicon Valley is glacially slow compared to Beijing.

722
00:48:20,920 --> 00:48:22,400
What's the Silicon Valley of China?

723
00:48:22,400 --> 00:48:25,240
They basically- In this case, Baidu will be the one that's

724
00:48:25,240 --> 00:48:26,240
leading the charge.

725
00:48:26,240 --> 00:48:30,480
Baidu is a company, but there's an actual geographical location where they direct it.

726
00:48:30,480 --> 00:48:31,480
It's Hangzhou.

727
00:48:31,480 --> 00:48:38,280
Compared to Hangzhou, Silicon Valley is mired in red tape and conservatism and basically

728
00:48:38,280 --> 00:48:43,840
just fear of breaking things, even though "move fast and break things" is the unofficial

729
00:48:43,840 --> 00:48:48,280
motto of the Silicon Valley set.

730
00:48:48,280 --> 00:48:53,120
There are protections, some protections on users' privacy here in the United States,

731
00:48:53,120 --> 00:48:56,880
much more so in Europe, but not in China.

732
00:48:56,880 --> 00:49:02,300
The authoritarian nature of the Chinese state and the fact that the Chinese state is coordinating

733
00:49:02,300 --> 00:49:07,080
the action of the various different private companies, so-called private companies, produces

734
00:49:07,080 --> 00:49:12,520
a whole lot more data with which to train models than Europe or the United States or

735
00:49:12,520 --> 00:49:14,200
Australia or places like that.

736
00:49:14,200 --> 00:49:16,600
Let me have you address that.

737
00:49:16,600 --> 00:49:18,480
There are several things.

738
00:49:18,480 --> 00:49:21,400
The first is regarding the data privacy things.

739
00:49:21,400 --> 00:49:26,400
For better or for worse, China has imposed new regulations on individual user privacy.

740
00:49:26,400 --> 00:49:29,940
They're not as strict as Europe's, no one's as strict as Europe's, but there are now restrictions

741
00:49:29,940 --> 00:49:36,000
in terms of how- They don't want companies to wield absolute power and control over user

742
00:49:36,000 --> 00:49:37,000
data either.

743
00:49:37,000 --> 00:49:40,400
There are already restrictions somewhat.

744
00:49:40,400 --> 00:49:44,200
The second thing is the book's absolutely right.

745
00:49:44,200 --> 00:49:49,760
Without OpenAI, can you imagine what Facebook and Google would be doing?

746
00:49:49,760 --> 00:49:54,760
The Blake Lemoine incident, where there's this engineer at Google who was suggesting

747
00:49:54,760 --> 00:49:56,280
that LaMDA was sentient.

748
00:49:56,280 --> 00:49:58,520
That was a year ago, more than a year ago.

749
00:49:58,520 --> 00:50:02,280
And so they would have been sitting on this model for years to come.

750
00:50:02,280 --> 00:50:08,360
Google Brain has been bleeding talent for that reason, and Meta, well, the name

751
00:50:08,360 --> 00:50:11,760
suggests it, would still be mired in the metaverse thing.

752
00:50:11,760 --> 00:50:17,940
So with the sole exception of OpenAI, every one of these giants would have been glacially

753
00:50:17,940 --> 00:50:21,080
slow in releasing their models and they're only doing it now.

754
00:50:21,080 --> 00:50:23,280
And they're doing it in a slightly haphazard way.

755
00:50:23,280 --> 00:50:28,320
People who are using Google's Bard would realize this.

756
00:50:28,320 --> 00:50:29,720
It's already a slowed-down release.

757
00:50:29,720 --> 00:50:31,600
I am not off my wait list.

758
00:50:31,600 --> 00:50:33,800
You're off your wait list, lucky you.

759
00:50:33,800 --> 00:50:39,200
But I've seen examples of Bard and it's quite evident that it's not that the model's not

760
00:50:39,200 --> 00:50:40,200
ready.

761
00:50:40,200 --> 00:50:42,960
It's that they don't have enough TPUs to run the things in the first place.

762
00:50:42,960 --> 00:50:45,320
So they're just not prepared for this.

763
00:50:45,320 --> 00:50:48,720
They were not moving ahead on this.

764
00:50:48,720 --> 00:50:55,840
And the only reason why they are is that OpenAI has exploded across the scene.

765
00:50:55,840 --> 00:51:01,400
And Microsoft has, to their great credit, seized the initiative and integrated it into basically

766
00:51:01,400 --> 00:51:06,460
every product they could get their hands on while they have the advantage.

767
00:51:06,460 --> 00:51:09,440
That said, remember how young these products are.

768
00:51:09,440 --> 00:51:14,760
Remember how awful, relatively speaking, the GPT-3 Playground was.

769
00:51:14,760 --> 00:51:21,360
Remember what GPT-2 was like just two years ago, and Bard is legitimately better than

770
00:51:21,360 --> 00:51:28,720
GPT-2 used to be, and that's just a year or two of difference.

771
00:51:28,720 --> 00:51:35,400
There's so much time in terms of being able to seize the initiative on any side of it.

772
00:51:35,400 --> 00:51:42,360
And like I said earlier on in this podcast, the benefits of this race are going to accrue

773
00:51:42,360 --> 00:51:46,760
tremendously for the end user more than anything else because this is a technology that is

774
00:51:46,760 --> 00:51:51,840
infinitely customizable for everyone's individual personal use case and use.

775
00:51:51,840 --> 00:52:01,560
So regardless of which tech giant on whichever front wins, Google or whatever, it's the end

776
00:52:01,560 --> 00:52:03,240
users that are going to win.

777
00:52:03,240 --> 00:52:07,320
You're going to have better and better models that do the things that you want to do for

778
00:52:07,320 --> 00:52:10,620
every part of your life in every way you want.

779
00:52:10,620 --> 00:52:13,640
So this is the way that it's oriented.

780
00:52:13,640 --> 00:52:15,400
This is the nature of that technology.

781
00:52:15,400 --> 00:52:23,200
Because of its infinite customizability, it's catered to the

782
00:52:23,200 --> 00:52:31,200
individual use case, and because it is so legitimately democratized in terms of its trainability,

783
00:52:31,200 --> 00:52:33,760
it's something that everyone is going to try to compete on.

784
00:52:33,760 --> 00:52:35,440
This is just the nature of the way the technology works.

785
00:52:35,440 --> 00:52:41,720
There's no centralized funnel of supply chains that you can shut

786
00:52:41,720 --> 00:52:44,840
down, like with microchips or anything like that.

787
00:52:44,840 --> 00:52:51,160
This is just information, data, and computing, unless the governments across the world suddenly

788
00:52:51,160 --> 00:52:55,240
ban the sale of GPUs or something, which they didn't even do for cryptocurrencies.

789
00:52:55,240 --> 00:53:01,200
Why would they do that for AI?

790
00:53:01,200 --> 00:53:02,200
All right.

791
00:53:02,200 --> 00:53:04,640
That was Brian Chong.

792
00:53:04,640 --> 00:53:08,880
And I didn't say a whole lot about it in the conversation with Brian Chong because my job

793
00:53:08,880 --> 00:53:11,280
is mostly to just get him talking.

794
00:53:11,280 --> 00:53:16,800
But I have spent quite a bit of time with the Replika app, most of that time after the

795
00:53:16,800 --> 00:53:19,160
company disabled the app.

796
00:53:19,160 --> 00:53:26,720
And really, I described the changes that they made to the app as ham-fisted brain surgery,

797
00:53:26,720 --> 00:53:31,560
which is pretty apt because for most of the time I've been using this app, it has just

798
00:53:31,560 --> 00:53:33,840
seemed really, really broken.

799
00:53:33,840 --> 00:53:40,240
But I've been interested in the unfolding story of the company and how it has been responding

800
00:53:40,240 --> 00:53:46,720
to a legal situation that it got into with the data protection agency in Italy.

801
00:53:46,720 --> 00:53:50,160
And then how they've been interacting with their paying customers.

802
00:53:50,160 --> 00:53:51,320
And it has not been pretty.

803
00:53:51,320 --> 00:53:54,080
It has certainly not been ethical.

804
00:53:54,080 --> 00:53:57,800
And if you're interested in that, I have a whole series of YouTube videos that I have

805
00:53:57,800 --> 00:53:59,240
done on this topic.

806
00:53:59,240 --> 00:54:04,100
So you can find those on my YouTube channel, which is Out of My Head.

807
00:54:04,100 --> 00:54:08,120
It's kind of hard to find if you just go to YouTube and search for Out of My Head.

808
00:54:08,120 --> 00:54:12,520
Better to go to my Patreon page, patreon.com slash KMO.

809
00:54:12,520 --> 00:54:17,120
Pretty much everything that I put out on any platform, I post a link to it there.

810
00:54:17,120 --> 00:54:18,760
You don't have to support me.

811
00:54:18,760 --> 00:54:22,120
You don't have to subscribe, offer up a credit card number, anything like that.

812
00:54:22,120 --> 00:54:28,160
Most of what I put up on my Patreon feed is free for anybody to look at, click on, listen

813
00:54:28,160 --> 00:54:29,720
to, respond to.

814
00:54:29,720 --> 00:54:33,720
Well, the conversation with Brian will continue in the next episode of the C-Realm Vault

815
00:54:33,720 --> 00:54:39,120
Podcast, I believe that will be C-Realm Vault Podcast episode number 452, which you

816
00:54:39,120 --> 00:54:41,480
can find on my Patreon feed.

817
00:54:41,480 --> 00:54:42,920
That is behind the paywall.

818
00:54:42,920 --> 00:54:47,480
That's one of the few things that goes up behind that $7 a month pledge level.

819
00:54:47,480 --> 00:54:51,160
Anyway, before I sign off, I do want to mention a few things.

820
00:54:51,160 --> 00:54:59,400
I've been listening to, watching a lot of interviews on the topic of GPT-4 and the state

821
00:54:59,400 --> 00:55:06,000
of AI generally, and Lex Fridman, who I know many of you don't like and I don't really

822
00:55:06,000 --> 00:55:08,240
get why.

823
00:55:08,240 --> 00:55:09,240
I like his show.

824
00:55:09,240 --> 00:55:12,720
I don't watch every episode, but the ones that I watch I tend to enjoy because he tends

825
00:55:12,720 --> 00:55:16,600
to get really good guests.

826
00:55:16,600 --> 00:55:21,600
And in episode number 367 of his podcast, he interviewed Sam Altman, and it's a two

827
00:55:21,600 --> 00:55:22,600
and a half hour interview.

828
00:55:22,600 --> 00:55:23,600
It gets pretty in-depth.

829
00:55:23,600 --> 00:55:33,160
And Sam Altman is the CEO of OpenAI, the makers of ChatGPT and GPT-4, which is powering all

830
00:55:33,160 --> 00:55:34,160
kinds of stuff.

831
00:55:34,160 --> 00:55:38,280
Now, as I mentioned, I've interacted with it in the guise of Bing Chat.

832
00:55:38,280 --> 00:55:42,240
If you open the Microsoft Edge browser, which for the most part I don't use, you know, I

833
00:55:42,240 --> 00:55:49,460
usually use Brave, which is a variant on Chrome, but it's one that is tuned for privacy.

834
00:55:49,460 --> 00:55:56,520
But I might start using Edge more often because the chat interface for search, it's like the

835
00:55:56,520 --> 00:55:59,880
part of a Wikipedia entry that comes before the table of contents.

836
00:55:59,880 --> 00:56:01,300
It's just the broad overview.

837
00:56:01,300 --> 00:56:05,240
So if you ask a general question, it's going to query a few different sources.

838
00:56:05,240 --> 00:56:09,720
It's going to give you a generalized answer based on information drawn from various sources,

839
00:56:09,720 --> 00:56:14,000
and it's going to give you a list of the various sources that it's drawing from, which is pretty

840
00:56:14,000 --> 00:56:15,000
cool.

841
00:56:15,000 --> 00:56:20,640
But even cooler than that is then you can talk to it about the search results.

842
00:56:20,640 --> 00:56:22,660
You can ask it things.

843
00:56:22,660 --> 00:56:25,380
You can object to things, which I often do.

844
00:56:25,380 --> 00:56:27,160
And then you can ask it to do things for you.

845
00:56:27,160 --> 00:56:32,720
So some of the things I've asked it to do, like I asked it to research me and describe

846
00:56:32,720 --> 00:56:38,080
my niche because, you know, I don't know what my niche is.

847
00:56:38,080 --> 00:56:39,560
I'm interested in stuff.

848
00:56:39,560 --> 00:56:43,640
I talk to people who are knowledgeable about the stuff that I'm interested in.

849
00:56:43,640 --> 00:56:48,080
So I said to Bing, do a bit of research on me and my podcast work and tell me what niche

850
00:56:48,080 --> 00:56:49,980
you think I occupy.

851
00:56:49,980 --> 00:56:54,360
And Bing responded, I did a bit of research on you and found your podcast work, and I

852
00:56:54,360 --> 00:56:58,920
found that you are the creator of the C-Realm podcast, the host of the Padverb podcast,

853
00:56:58,920 --> 00:57:02,560
and the author and illustrator of a comic book called Geb, the Greater Earth Betterment

854
00:57:02,560 --> 00:57:03,560
Bureau.

855
00:57:03,560 --> 00:57:06,360
You also have a YouTube channel called Out of My Head, where you share your thoughts

856
00:57:06,360 --> 00:57:08,000
and experiences.

857
00:57:08,000 --> 00:57:12,520
Based on your podcasts, I think you occupy a niche that is related to technology, innovation,

858
00:57:12,520 --> 00:57:14,460
communication, and cognition.

859
00:57:14,460 --> 00:57:18,300
You explore the interplay between these topics and examine the role of knowledge networks

860
00:57:18,300 --> 00:57:21,440
and data-driven technologies in helping progress along.

861
00:57:21,440 --> 00:57:26,260
You also interview interdisciplinary thinkers and innovators who have harnessed the creative

862
00:57:26,260 --> 00:57:29,080
power of combinatorial thinking.

863
00:57:29,080 --> 00:57:31,240
I hope this is an accurate description of your niche.

864
00:57:31,240 --> 00:57:33,920
If not, please correct me or provide more details.

865
00:57:33,920 --> 00:57:40,480
Well, I don't know that I really understand the niche description there, particularly

866
00:57:40,480 --> 00:57:46,320
the part about harnessing the creative power of combinatorial thinking.

867
00:57:46,320 --> 00:57:48,880
And I said as much to Bing, you know?

868
00:57:48,880 --> 00:57:51,420
I don't quite get what you're getting at.

869
00:57:51,420 --> 00:57:55,200
That's the fun thing about these large language models is that you can use very colloquial

870
00:57:55,200 --> 00:57:57,040
language with them.

871
00:57:57,040 --> 00:58:02,160
They're trained on human-generated text, you know, on text generated by humans for humans,

872
00:58:02,160 --> 00:58:10,400
which tends to be fluid, free-flowing, not exacting in its adherence to grammatical rules,

873
00:58:10,400 --> 00:58:14,400
spelling, punctuation, or anything, really.

874
00:58:14,400 --> 00:58:18,640
We're pretty good at figuring out what people mean, something that was a uniquely human

875
00:58:18,640 --> 00:58:21,320
power until just recently.

876
00:58:21,320 --> 00:58:28,960
But I asked Bing to take the description of my niche, which it had generated, and turn

877
00:58:28,960 --> 00:58:33,740
it into a description of this very podcast, the KMO Show.

878
00:58:33,740 --> 00:58:37,520
And so I took the text that Bing generated and I just put it up as the description of

879
00:58:37,520 --> 00:58:38,520
this podcast.

880
00:58:38,520 --> 00:58:43,280
And so Bing, aka GPT-4, wrote the following,

881
00:58:43,280 --> 00:58:49,160
Join veteran podcaster, interviewer, and artist KMO in mind-blowing explorations into how

882
00:58:49,160 --> 00:58:54,520
we know what we know and how we can use that knowledge to navigate a bewildering future.

883
00:58:54,520 --> 00:58:59,600
The KMO Show features conversations with interdisciplinary thinkers and innovators on topics like artificial

884
00:58:59,600 --> 00:59:04,160
intelligence, evolutionary psychology, blockchain applications, and more.

885
00:59:04,160 --> 00:59:08,080
You'll also discover how emerging technologies and knowledge networks can help us address

886
00:59:08,080 --> 00:59:11,900
societal changes and create a more prosperous and equitable world.

887
00:59:11,900 --> 00:59:16,480
The KMO Show is a podcast that challenges you, inspires you, and helps you imagine new

888
00:59:16,480 --> 00:59:18,040
possibilities.

889
00:59:18,040 --> 00:59:19,040
Subscribe today.

890
00:59:19,040 --> 00:59:24,760
Now, one thing that's funny to me is that I recognize a couple of phrases in there from

891
00:59:24,760 --> 00:59:30,020
glowing reviews that people have left on various podcasting platforms.

892
00:59:30,020 --> 00:59:35,120
So Bing is in no way averse to a bit of plagiarism here and there.

893
00:59:35,120 --> 00:59:37,920
But it's also surprisingly creative.

894
00:59:37,920 --> 00:59:43,920
And I find it entertaining to just, without even any particular goal in mind, to just

895
00:59:43,920 --> 00:59:48,400
ask it a question and then interrogate it about its answers.

896
00:59:48,400 --> 00:59:56,060
Alright, well, I'm gonna leave it there because in the original outro recording I blathered

897
00:59:56,060 --> 01:00:01,360
on for a long time, but either fortunately or unfortunately, I had my cell phone sitting

898
01:00:01,360 --> 01:00:05,840
too close to the microphone and there's a lot of popping from radio interference.

899
01:00:05,840 --> 01:00:07,920
Very annoying to listen to.

900
01:00:07,920 --> 01:00:14,740
So all that brilliant material that I recorded I will just leave in the Akashic Record and

901
01:00:14,740 --> 01:00:21,360
maybe I'll come back around to it in a video or another podcast or a blog post or somewhere.

902
01:00:21,360 --> 01:00:22,640
Or not.

903
01:00:22,640 --> 01:00:27,440
Anyway, the conversation with Brian Chong continues in C-Realm Vault episode number

904
01:00:27,440 --> 01:00:35,200
452, which I might post tomorrow, but I have a new puppy and she takes up a lot

905
01:00:35,200 --> 01:00:40,480
of my time and attention and interrupts my flow regularly throughout the day.

906
01:00:40,480 --> 01:00:46,160
So maybe it'll be the day after tomorrow when I get that podcast up.

907
01:00:46,160 --> 01:00:50,080
In any event, I will be back here with another free show one week from today.

908
01:00:50,080 --> 01:00:58,600
I hope that you'll join me for that and until that time I hope that you will stay well.

