1
00:00:00,000 --> 00:00:09,960
Welcome to Artificially Intelligent Marketing, a weekly podcast where we stay on top of the

2
00:00:09,960 --> 00:00:15,700
latest trends, tips, and tools in the world of marketing AI, helping you get the best

3
00:00:15,700 --> 00:00:18,540
results from your marketing efforts.

4
00:00:18,540 --> 00:00:23,240
Now let's join our hosts, Paul Avery and Martin Broadhurst.

5
00:00:23,240 --> 00:00:24,560
Hello everybody.

6
00:00:24,560 --> 00:00:27,920
Welcome to episode 35 of Artificially Intelligent Marketing.

7
00:00:27,920 --> 00:00:33,240
Paul Avery here, guiding you through everything you need to know in the world of AI tech and

8
00:00:33,240 --> 00:00:38,760
marketing and I am joined by the smartest AI person on the planet, my co-host, Martin

9
00:00:38,760 --> 00:00:39,760
Broadhurst.

10
00:00:39,760 --> 00:00:41,440
How are you, Martin?

11
00:00:41,440 --> 00:00:44,440
I'm delighted to be back in the studio with you.

12
00:00:44,440 --> 00:00:50,520
After your break, it feels like it's been forever since we've sat down and discussed

13
00:00:50,520 --> 00:00:52,600
things AI related.

14
00:00:52,600 --> 00:00:58,280
A lot has happened and the fact that we've not had a chance to discuss it is quite frankly

15
00:00:58,280 --> 00:00:59,280
shocking.

16
00:00:59,280 --> 00:01:04,560
To be honest, I thought you'd abandoned me and just got sick and tired of my voice.

17
00:01:04,560 --> 00:01:11,200
So in the intervening weeks while you've been away, I've put a lot of hard effort into completely

18
00:01:11,200 --> 00:01:12,480
reworking my voice.

19
00:01:12,480 --> 00:01:14,800
So much so that now...

20
00:01:14,800 --> 00:01:27,360
I am in fact a confident older woman, described by ElevenLabs as Cecile, who is confident and

21
00:01:27,360 --> 00:01:29,080
strict.

22
00:01:29,080 --> 00:01:31,920
Just the way you like it, Paul.

23
00:01:31,920 --> 00:01:35,400
Yeah, I'm not sure I can tell much of a difference honestly.

24
00:01:35,400 --> 00:01:36,960
No, I'm only joking.

25
00:01:36,960 --> 00:01:38,440
I'm only joking, Martin.

26
00:01:38,440 --> 00:01:40,880
That tool is fairly interesting, isn't it?

27
00:01:40,880 --> 00:01:43,560
That like, re-voicer tool.

28
00:01:43,560 --> 00:01:46,480
So clearly you've been having a good play with that.

29
00:01:46,480 --> 00:01:48,520
Yeah, ElevenLabs speech synthesis.

30
00:01:48,520 --> 00:01:51,280
So previously it was a text-to-speech tool.

31
00:01:51,280 --> 00:01:55,800
So you typed in the text and you got an audio version of what you typed in.

32
00:01:55,800 --> 00:02:01,880
But now you can just say the words and it will turn that speech with your intonation

33
00:02:01,880 --> 00:02:06,120
exactly as you said it into the AI generated voice.

34
00:02:06,120 --> 00:02:07,600
It's really quite impressive.

35
00:02:07,600 --> 00:02:10,080
Yeah, that's crazy.

36
00:02:10,080 --> 00:02:13,800
I can't wait until I can basically speak and be Darth Vader.

37
00:02:13,800 --> 00:02:14,800
That'd be pretty cool.

38
00:02:14,800 --> 00:02:17,120
That can't be far away.

39
00:02:17,120 --> 00:02:18,120
Especially given I've seen...

40
00:02:18,120 --> 00:02:22,760
I don't know if you've seen on the Twittersphere the example where someone's using the APIs

41
00:02:22,760 --> 00:02:30,120
of GPT-4 Vision and ElevenLabs to basically have David Attenborough narrate their life,

42
00:02:30,120 --> 00:02:32,640
which is quite interesting.

43
00:02:32,640 --> 00:02:35,400
So obviously that's a voice clone of a specific person.

44
00:02:35,400 --> 00:02:39,480
I should state ElevenLabs doesn't offer David Attenborough as a voice.

45
00:02:39,480 --> 00:02:48,000
He was flirting with IP and copyright laws by taking a sample of David Attenborough's

46
00:02:48,000 --> 00:02:52,280
voice and using it to train the voice they wanted.

47
00:02:52,280 --> 00:02:56,400
And I should also say for you dear listener, it would be very cool if Martin was able to

48
00:02:56,400 --> 00:03:00,200
speak into the system and it was able to sort of change his voice in real time.

49
00:03:00,200 --> 00:03:06,320
But of course we did that in the edit because Martin recorded it separately.

50
00:03:06,320 --> 00:03:09,160
But it's still a very impressive tool nonetheless.

51
00:03:09,160 --> 00:03:11,280
There is so much to get through today, Martin.

52
00:03:11,280 --> 00:03:13,840
Shall we jump on into the first story?

53
00:03:13,840 --> 00:03:17,600
Yeah, this one feels very timely.

54
00:03:17,600 --> 00:03:22,400
ChatGPT, everybody's favourite chatbot, has turned one.

55
00:03:22,400 --> 00:03:29,760
On 30th of November ChatGPT received its first birthday cake with one little candle on it.

56
00:03:29,760 --> 00:03:32,480
It's been quite the year.

57
00:03:32,480 --> 00:03:38,120
So just to give you a few headlines from the past 12 months, Paul, because I know that

58
00:03:38,120 --> 00:03:40,200
you've not really been paying attention, have you really?

59
00:03:40,200 --> 00:03:41,840
It's just kind of passed you by, I'm sure.

60
00:03:41,840 --> 00:03:42,840
What was that thing?

61
00:03:42,840 --> 00:03:43,840
What was it?

62
00:03:43,840 --> 00:03:44,840
What do you call it?

63
00:03:44,840 --> 00:03:45,840
Chat what?

64
00:03:45,840 --> 00:03:47,800
GPT, it's from a company called OpenAI.

65
00:03:47,800 --> 00:03:48,800
Keep your eye out.

66
00:03:48,800 --> 00:03:49,800
You'll see them in the press.

67
00:03:49,800 --> 00:03:50,800
Cool, cool.

68
00:03:50,800 --> 00:03:58,600
So they reached 100 million users in two months after its launch in November 2022, which just

69
00:03:58,600 --> 00:04:03,120
for comparison, Facebook took four years to reach 100 million users.

70
00:04:03,120 --> 00:04:05,360
Snapchat and MySpace took three years.

71
00:04:05,360 --> 00:04:07,600
Instagram, two years.

72
00:04:07,600 --> 00:04:10,800
And Google took almost a year to reach 100 million users.

73
00:04:10,800 --> 00:04:16,840
So its rise was pretty phenomenal, achieving that in just two months.

74
00:04:16,840 --> 00:04:22,160
What's quite interesting when you look back at November 30th, 2022, is that they originally

75
00:04:22,160 --> 00:04:26,680
released it as a free research preview.

76
00:04:26,680 --> 00:04:31,480
And because all of the OpenAI team had been playing around with this tech for so long,

77
00:04:31,480 --> 00:04:37,160
they'd become a little bit blasé to the fact that being able to do it back and forth,

78
00:04:37,160 --> 00:04:41,760
with moderation, with memory and all of the stuff that we know ChatGPT has compared to

79
00:04:41,760 --> 00:04:47,360
just the old Instruct model, they'd kind of taken it for granted and hadn't thought about

80
00:04:47,360 --> 00:04:53,440
the absolute sensation that it was going to be.

81
00:04:53,440 --> 00:04:56,560
So it really caught them by surprise.

82
00:04:56,560 --> 00:04:58,520
So it launched in November.

83
00:04:58,520 --> 00:05:03,240
ChatGPT Plus was announced in February.

84
00:05:03,240 --> 00:05:08,920
Shortly after that, a week later, Microsoft announced that ChatGPT would be powering some

85
00:05:08,920 --> 00:05:12,440
features in Bing.

86
00:05:12,440 --> 00:05:18,740
In March, we saw the launch of, or the announcement of, GPT-4.

87
00:05:18,740 --> 00:05:22,760
We didn't get vision until a few months later, of course.

88
00:05:22,760 --> 00:05:28,480
Our first major ChatGPT outage was on the 20th of March, earlier this year.

89
00:05:28,480 --> 00:05:33,440
I think that was the first time I realised just how dependent I was on it to get through

90
00:05:33,440 --> 00:05:35,640
the day.

91
00:05:35,640 --> 00:05:38,720
And then the following day, Google launched Bard.

92
00:05:38,720 --> 00:05:39,720
Coincidence?

93
00:05:39,720 --> 00:05:43,040
It's a conspiracy there, Paul.

94
00:05:43,040 --> 00:05:51,680
It's not for us to say, Martin, but yes, please, go dig deep, listeners.

95
00:05:51,680 --> 00:05:55,400
At the end of March, Italy banned ChatGPT.

96
00:05:55,400 --> 00:06:00,920
A month later, they allowed service to resume.

97
00:06:00,920 --> 00:06:08,480
In May, they launched the iOS app, which as an Android user was a great frustration because

98
00:06:08,480 --> 00:06:14,600
I didn't get it until the end of July.

99
00:06:14,600 --> 00:06:15,880
What else happened in that time?

100
00:06:15,880 --> 00:06:25,020
Oh yeah, so there was the Senate subcommittee hearing on AI oversight because the sensation

101
00:06:25,020 --> 00:06:32,240
that was ChatGPT had caused a lot of people to be scared about the potential of AI.

102
00:06:32,240 --> 00:06:39,600
So Sam Altman was hauled in front of the US Senate to explain the existential threat that

103
00:06:39,600 --> 00:06:42,160
we all face.

104
00:06:42,160 --> 00:06:48,880
In November, OpenAI announced the new voice and image capabilities of ChatGPT.

105
00:06:48,880 --> 00:06:55,080
And then we had OpenAI DevDay, where they announced GPT-4 Turbo, the Assistants API and

106
00:06:55,080 --> 00:06:57,040
DALL-E 3 API.

107
00:06:57,040 --> 00:07:06,800
Oh, and then a couple of weeks ago, there was that small issue of Sam Altman being fired

108
00:07:06,800 --> 00:07:08,520
as CEO of OpenAI.

109
00:07:08,520 --> 00:07:11,480
What a way to spend your birthday.

110
00:07:11,480 --> 00:07:13,880
Happy birthday to you.

111
00:07:13,880 --> 00:07:16,800
Yeah, it's been quite a year.

112
00:07:16,800 --> 00:07:21,400
It's quite amazing when you think back and how quickly it developed.

113
00:07:21,400 --> 00:07:26,280
Ethan Mollick, as the regular listeners to the podcast will know, we're fans of Ethan's

114
00:07:26,280 --> 00:07:30,960
work, spends a lot of time tracking what's going on with different AI tools, but especially

115
00:07:30,960 --> 00:07:32,800
OpenAI's tools.

116
00:07:32,800 --> 00:07:42,000
And he was sort of reminiscing on how amazing GPT-3 was when it came out compared to GPT-2.

117
00:07:42,000 --> 00:07:45,040
And then of course, the step change again that came with GPT-4.

118
00:07:45,040 --> 00:07:51,400
And it's very easy to forget how much progress has been made in the year.

119
00:07:51,400 --> 00:07:56,920
I keep seeing online statements like this is what exponential change feels like.

120
00:07:56,920 --> 00:08:03,800
Ironically, I don't personally feel the speed of exponential change when it comes to ChatGPT

121
00:08:03,800 --> 00:08:05,240
and text-based chatbots at this point.

122
00:08:05,240 --> 00:08:07,720
If I'm really honest, it feels more incremental.

123
00:08:07,720 --> 00:08:12,520
I'm mostly feeling the speed of exponential change in video, which we'll probably talk

124
00:08:12,520 --> 00:08:16,720
a bit about later because that feels like there's some massive leaps being made there.

125
00:08:16,720 --> 00:08:23,920
So I don't know if we made our leaps this year with text-based sort of bots.

126
00:08:23,920 --> 00:08:28,840
And there's some significant problems still to solve, aren't there, around hallucinations

127
00:08:28,840 --> 00:08:34,240
and multi-step agents that can actually deliver on those multi-step tasks.

128
00:08:34,240 --> 00:08:39,280
I guess if we do see significant improvements in either of those areas next year, that will

129
00:08:39,280 --> 00:08:43,120
probably be another pretty big leap, wouldn't it?

130
00:08:43,120 --> 00:08:44,120
It would.

131
00:08:44,120 --> 00:08:51,600
And I think we're all expecting some interesting developments from OpenAI and their next models,

132
00:08:51,600 --> 00:08:58,080
particularly after Sam Altman was reinstated as CEO this week and he gave an interview

133
00:08:58,080 --> 00:09:05,080
to The Verge where they touched on some of the reasons why he might have been fired and

134
00:09:05,080 --> 00:09:07,560
some of the potential new technologies.

135
00:09:07,560 --> 00:09:09,480
What have you read about that?

136
00:09:09,480 --> 00:09:13,680
Yeah, so we had a bit of a WhatsApp back and forth on this one, didn't we?

137
00:09:13,680 --> 00:09:16,000
I think there's a number of different aspects.

138
00:09:16,000 --> 00:09:20,240
One of the stories you shared with me, I think, will end up being true, which is it's actually

139
00:09:20,240 --> 00:09:28,160
quite a simple sort of tension between the not-for-profit side of OpenAI and the for-profit side of OpenAI

140
00:09:28,160 --> 00:09:33,880
and what the balance is there and how Sam Altman is leading the organization against

141
00:09:33,880 --> 00:09:34,880
those.

142
00:09:34,880 --> 00:09:42,400
The one that was maybe a bit more intriguing was Q-Star, which if you're an AI nerd, your

143
00:09:42,400 --> 00:09:46,640
Twitter and LinkedIn feed would have been full of this because the speculation is and

144
00:09:46,640 --> 00:09:48,800
was super rife the minute it came out.

145
00:09:48,800 --> 00:09:55,400
And this is born out of a letter sent by a couple of OpenAI employees to, I think it

146
00:09:55,400 --> 00:10:02,960
was Forbes, detailing that the major falling out was because of a spectacular new improvement

147
00:10:02,960 --> 00:10:09,080
in one of the models that they've been working on internally at OpenAI, which would allow

148
00:10:09,080 --> 00:10:14,880
either ChatGPT or some other similar tool to basically reason more logically and perform

149
00:10:14,880 --> 00:10:20,760
things like mathematical reasoning, which on the face of it, especially the example

150
00:10:20,760 --> 00:10:24,080
that was given is that it could do maths at a grade school level, which doesn't sound

151
00:10:24,080 --> 00:10:25,080
that impressive.

152
00:10:25,080 --> 00:10:30,160
But when you think about how these tools work, they don't reason at all at the moment.

153
00:10:30,160 --> 00:10:34,360
They just predict what should be the next word they should output based on all the other

154
00:10:34,360 --> 00:10:38,320
words that came before and all the words in the context of the prompt.

155
00:10:38,320 --> 00:10:42,040
So to be able to actually do mathematical reasoning gives them a better understanding

156
00:10:42,040 --> 00:10:47,880
of the world and therefore a better ability to plan and take actions, which of course,

157
00:10:47,880 --> 00:10:51,360
if you want multi-step agents, which is where some of the real power is going to lie in

158
00:10:51,360 --> 00:10:54,600
these tools, I think, then you do need that.

159
00:10:54,600 --> 00:10:59,440
There's been so much speculation that that's actually all a bunch of baloney.

160
00:10:59,440 --> 00:11:03,880
But what was interesting is that in the most recent interview by Sam Altman, when

161
00:11:03,880 --> 00:11:10,280
that was brought up, he described it as an unfortunate leak, but he didn't deny it.

162
00:11:10,280 --> 00:11:16,840
Did he not deny it because he needs the buzz around OpenAI? Are they doing a stock offering

163
00:11:16,840 --> 00:11:17,840
or something?

164
00:11:17,840 --> 00:11:19,960
They've got something coming up, haven't they?

165
00:11:19,960 --> 00:11:24,160
Financially motivated where elevating the company's value as high as possible is not

166
00:11:24,160 --> 00:11:27,520
necessarily important for him because he's not a direct investor, but it will be important

167
00:11:27,520 --> 00:11:28,520
for the staff.

168
00:11:28,520 --> 00:11:34,320
Is he keeping the buzz around this Q-Star high so that they can get the highest valuation

169
00:11:34,320 --> 00:11:36,440
as part of that process?

170
00:11:36,440 --> 00:11:40,240
Or is there, and it's not true, but he just wants us all to think it's true, which is

171
00:11:40,240 --> 00:11:43,720
why he didn't deny it, or is it true and that's why he didn't deny it?

172
00:11:43,720 --> 00:11:47,760
And the answer, dear listeners, we do not know.

173
00:11:47,760 --> 00:11:49,120
He's a great salesman, isn't he?

174
00:11:49,120 --> 00:11:52,160
And that's one of the things that he's really well known for.

175
00:11:52,160 --> 00:11:57,400
He can get people to buy into him and basically throw money at him.

176
00:11:57,400 --> 00:12:01,600
That's one thing that he's exceptionally good at, as evidenced by the fact that Microsoft

177
00:12:01,600 --> 00:12:05,400
chucked 10 billion at OpenAI a few years ago.

178
00:12:05,400 --> 00:12:12,560
So yeah, I think maybe there's a little bit of gamesmanship here, keep the story going,

179
00:12:12,560 --> 00:12:17,120
but the fact that he didn't just outright deny it, whereas in the past things like GPT-5,

180
00:12:17,120 --> 00:12:22,680
for instance, it wasn't that long ago where he spoke about, no, we're not working on GPT-5.

181
00:12:22,680 --> 00:12:25,360
He did cut discussions of that down.

182
00:12:25,360 --> 00:12:30,720
He has subsequently said they are working on those models now, but yeah, it's interesting.

183
00:12:30,720 --> 00:12:40,240
Yeah, that's the thing with Sam, and Emad Mostaque as well at Stability AI, and a number of the

184
00:12:40,240 --> 00:12:46,480
CEOs of these companies, they have to fuel the hype train because that's where the valuations

185
00:12:46,480 --> 00:12:47,480
come from.

186
00:12:47,480 --> 00:12:51,520
So it ends up being really, really hard to try and sort out the signal from the noise

187
00:12:51,520 --> 00:12:54,800
in what they say.

188
00:12:54,800 --> 00:12:59,240
Because I'm a bit of a tech optimist, I really end up hoping that some of the things that

189
00:12:59,240 --> 00:13:02,640
come out in terms of the improvements that are being made end up being real because I

190
00:13:02,640 --> 00:13:05,440
think it makes the tools more powerful for us.

191
00:13:05,440 --> 00:13:12,080
And personally, I hope for a world where all of the efficiency gains provided by AI basically

192
00:13:12,080 --> 00:13:15,560
are used to make life better for everyone.

193
00:13:15,560 --> 00:13:19,880
Maybe people can work less, three-day weeks enabled by AI, for example.

194
00:13:19,880 --> 00:13:24,760
So I'm really excited for all of these technological things to be real, but at the back of my

195
00:13:24,760 --> 00:13:30,200
head, the little alarm's always going, yeah, but it does serve their purpose in terms of

196
00:13:30,200 --> 00:13:33,160
the valuations of their companies to keep that hype train going.

197
00:13:33,160 --> 00:13:37,240
So you can't really tell.

198
00:13:37,240 --> 00:13:44,920
In terms of the first birthday of ChatGPT, I'm wondering, since GPT-4 Turbo was rolled

199
00:13:44,920 --> 00:13:49,160
out across the platform, have you noticed any performance difference?

200
00:13:49,160 --> 00:13:54,040
And the reason I ask this is because, and we may have discussed it on the previous episode,

201
00:13:54,040 --> 00:14:00,640
but this week I saw a discussion on Twitter that was talking specifically about its coding

202
00:14:00,640 --> 00:14:09,120
capabilities and how it seems to have been clipped in terms of its ability to write long

203
00:14:09,120 --> 00:14:10,520
pieces of code.

204
00:14:10,520 --> 00:14:17,120
It can work with small snippets, but it previously could write really quite extensive bits of

205
00:14:17,120 --> 00:14:22,760
script and that seems to have been, it's as if it's had its wings clipped in that domain.

206
00:14:22,760 --> 00:14:24,600
I've certainly encountered this.

207
00:14:24,600 --> 00:14:28,560
Yesterday, I was trying to get it to do some relatively simple calculations.

208
00:14:28,560 --> 00:14:34,440
I asked it to do it using Code Interpreter and it continuously failed.

209
00:14:34,440 --> 00:14:38,640
It failed about six times and in the end I just couldn't get it to work.

210
00:14:38,640 --> 00:14:44,560
Whereas I feel like a few weeks ago or maybe pre-GPT-4 Turbo, that wouldn't have been the

211
00:14:44,560 --> 00:14:45,560
case.

212
00:14:45,560 --> 00:14:49,240
Have you found anything similar?

213
00:14:49,240 --> 00:14:51,480
The short answer is yes.

214
00:14:51,480 --> 00:14:58,760
So I'm using DALL-E a lot at the moment and at the moment I'm struggling to get photorealistic

215
00:14:58,760 --> 00:14:59,760
images out of it.

216
00:14:59,760 --> 00:15:05,800
So I don't know if they've dialed that down or I'm not pushing and prompting hard enough,

217
00:15:05,800 --> 00:15:09,720
but the photorealistic images I'm getting now versus when it first came out are worse

218
00:15:09,720 --> 00:15:14,760
and they're making me think, oh, I'm going to have to reactivate my Midjourney subscription.

219
00:15:14,760 --> 00:15:16,760
So I'm definitely seeing that.

220
00:15:16,760 --> 00:15:22,800
I was on a webinar yesterday presenting about AI and sales and a good friend of mine, Nick

221
00:15:22,800 --> 00:15:27,560
Clare over at SuccessionBio, shout out to Nick and Harrison, was talking about an application

222
00:15:27,560 --> 00:15:31,040
he was working on where in the end he couldn't get GPT-4 to do it.

223
00:15:31,040 --> 00:15:36,000
So he switched to GPT-3.5 and got a better outcome in terms of the output, which I think

224
00:15:36,000 --> 00:15:39,240
is quite interesting.

225
00:15:39,240 --> 00:15:44,760
The GPTs as a slight aside that I've been building, I haven't been super impressed with.

226
00:15:44,760 --> 00:15:48,520
They're kind of like a slightly glossier version of GPT-4.

227
00:15:48,520 --> 00:15:53,880
I would hazard I could get the same out with just a half-decent prompt in GPT-4.

228
00:15:53,880 --> 00:15:58,320
I'm not playing with anything super sophisticated like trying to use data connectors or anything.

229
00:15:58,320 --> 00:16:03,360
The tests I've run on that have been extremely glitchy and unreliable anyway, to be honest.

230
00:16:03,360 --> 00:16:05,840
So yeah, they're tweaking the models.

231
00:16:05,840 --> 00:16:11,400
One of the things I haven't had a chance to test yet is if we're on GPT-4 Turbo now, have

232
00:16:11,400 --> 00:16:14,680
you been leveraging the context window?

233
00:16:14,680 --> 00:16:16,700
Has that rolled out?

234
00:16:16,700 --> 00:16:18,680
The context window has definitely expanded.

235
00:16:18,680 --> 00:16:20,720
Yeah, without doubt.

236
00:16:20,720 --> 00:16:24,200
I actually haven't hit the limit with it yet.

237
00:16:24,200 --> 00:16:29,680
And I've been throwing in some decent sized text that absolutely categorically would have

238
00:16:29,680 --> 00:16:30,680
hit the limits previously.

239
00:16:30,680 --> 00:16:32,600
I haven't pushed the boundaries though.

240
00:16:32,600 --> 00:16:42,640
In fact, I find it incredibly hard to max out a context window at 128,000 tokens or

241
00:16:42,640 --> 00:16:43,640
even...

242
00:16:43,640 --> 00:16:48,840
It's quite hard to put in a prompt that's even 50,000 tokens.

243
00:16:48,840 --> 00:16:49,840
Yeah.

244
00:16:49,840 --> 00:16:56,000
So my go-to test, which is not proving useful, is to just keep asking ChatGPT what its context

245
00:16:56,000 --> 00:16:58,200
window is and it just keeps telling me 2,000 tokens.

246
00:16:58,200 --> 00:17:03,440
So obviously I'm not pushing it right.

247
00:17:03,440 --> 00:17:10,760
I haven't tested it because of our next story, which is the release of Claude 2.1.

248
00:17:10,760 --> 00:17:15,600
So regular listeners to the podcast will know Claude is a model that's released by the company

249
00:17:15,600 --> 00:17:20,720
Anthropic, which has a bunch of ex-OpenAI folks within it.

250
00:17:20,720 --> 00:17:24,520
And it's another chatbot like ChatGPT.

251
00:17:24,520 --> 00:17:28,280
But the difference always with Claude was its context window was massive compared to

252
00:17:28,280 --> 00:17:31,500
ChatGPT's context window, which is now itself massive.

253
00:17:31,500 --> 00:17:37,620
So Claude needed to one-up the introduction of the 128K context window for ChatGPT.

254
00:17:37,620 --> 00:17:44,640
And they did, with their 200,000 token context window, which you now get in Claude 2.1.

255
00:17:44,640 --> 00:17:51,240
So obviously you can upload PDFs, multiple PDFs, the equivalent to over 500 pages of

256
00:17:51,240 --> 00:17:56,720
text and start asking questions of the text, ask for summaries of the text, et cetera,

257
00:17:56,720 --> 00:18:01,840
et cetera, which is massively beneficial for processing documents like entire code bases

258
00:18:01,840 --> 00:18:06,600
or massive financial reports or anything else you can think of that's a big document that

259
00:18:06,600 --> 00:18:09,280
you'd want to interrogate really.

260
00:18:09,280 --> 00:18:13,100
The other thing that they've done with Claude 2.1, Martin, is they've introduced this reduction

261
00:18:13,100 --> 00:18:14,200
in hallucinations.

262
00:18:14,200 --> 00:18:19,760
They claim that you get a two times decrease in false statements, which of course is important

263
00:18:19,760 --> 00:18:23,160
for all of us because when you're interrogating large documents, you need to be able to trust

264
00:18:23,160 --> 00:18:25,760
that the outputs you're going to get, you can rely on.

265
00:18:25,760 --> 00:18:29,720
A lot of that's being driven by Claude saying that it's not willing to answer.

266
00:18:29,720 --> 00:18:36,480
But honestly, I'd rather have that than inaccuracies and untruths.

267
00:18:36,480 --> 00:18:38,320
I think that really helps.

268
00:18:38,320 --> 00:18:40,760
They've also done some interesting stuff for developers, haven't they, Martin?

269
00:18:40,760 --> 00:18:42,440
Do you know what they've done with it?

270
00:18:42,440 --> 00:18:46,720
They've done some stuff with the console and changed how developers interact with it.

271
00:18:46,720 --> 00:18:47,720
Yeah.

272
00:18:47,720 --> 00:18:53,200
So if you are using it through the API, you've got an API and developer account, they've

273
00:18:53,200 --> 00:19:00,080
introduced something called Workbench, which is effectively like OpenAI's playground where

274
00:19:00,080 --> 00:19:03,440
you have a bit more control over the system.

275
00:19:03,440 --> 00:19:07,980
And actually one of the things that they've introduced is the ability to set a system prompt.

276
00:19:07,980 --> 00:19:13,560
So in the same way that you can with ChatGPT and Assistants where you can say, this is

277
00:19:13,560 --> 00:19:18,060
your role and this is how you will respond and for the whole of this conversation, these

278
00:19:18,060 --> 00:19:20,280
are your rules, basically.

279
00:19:20,280 --> 00:19:23,640
You can now do that within the Workbench.

280
00:19:23,640 --> 00:19:32,060
They've also extended its functionality, basically giving it function calling.

281
00:19:32,060 --> 00:19:37,720
So they're enabling it to call upon APIs from external apps.

282
00:19:37,720 --> 00:19:43,280
All of this at the moment still requires developers to request API access.

283
00:19:43,280 --> 00:19:48,360
So you might be able to get a developer account, but you'll be limited on the number of API

284
00:19:48,360 --> 00:19:49,920
calls that you can make.

285
00:19:49,920 --> 00:19:53,560
And if you want to actually roll this out into production, you actually have to request

286
00:19:53,560 --> 00:19:58,360
that from Anthropic for them to de-restrict your account.

287
00:19:58,360 --> 00:20:02,760
But yeah, some definite advancements for developers.

288
00:20:02,760 --> 00:20:07,280
And I've been playing around with these tools a bit.

289
00:20:07,280 --> 00:20:13,480
Going back to my previous point, it's really hard to fill a context window.

290
00:20:13,480 --> 00:20:16,280
200,000 tokens is a lot.

291
00:20:16,280 --> 00:20:22,520
I went back looking through my API usage over the past month and in early November, there

292
00:20:22,520 --> 00:20:25,960
was one day where I had a really heavy usage day.

293
00:20:25,960 --> 00:20:28,000
I think I was working with quite a lot of transcripts.

294
00:20:28,000 --> 00:20:31,120
It might've even been podcast episode transcripts.

295
00:20:31,120 --> 00:20:34,600
Within one conversation, I was throwing in lots and lots of transcripts.

296
00:20:34,600 --> 00:20:39,760
Bear in mind, each episode is about an hour's worth of conversation, right?

297
00:20:39,760 --> 00:20:45,080
And you can see how many input tokens and how many completion tokens you get.

298
00:20:45,080 --> 00:20:47,840
So you can see that in your API log.

299
00:20:47,840 --> 00:20:51,760
On that day where I was going back and forth, back and forth, back and forth, back and forth,

300
00:20:51,760 --> 00:20:59,000
at no point did my input tokens go above 90,000.

301
00:20:59,000 --> 00:21:09,080
It's a huge amount of text to put 200,000 tokens in a prompt.

302
00:21:09,080 --> 00:21:10,480
It's a lot to get there.

303
00:21:10,480 --> 00:21:13,200
I think you're right.

304
00:21:13,200 --> 00:21:18,200
I think it's an impressive technical feat, but we're probably at 100,000 and certainly

305
00:21:18,200 --> 00:21:23,760
200,000 tokens for, like, 98% of use cases, we're probably tapping out in

306
00:21:23,760 --> 00:21:26,160
terms of what you would really need it for.

307
00:21:26,160 --> 00:21:30,800
Like you, I use it a lot for transcripts.

308
00:21:30,800 --> 00:21:36,360
And what was interesting, so listeners will know I use a tool called Magai, M-A-G-A-I.

309
00:21:36,360 --> 00:21:43,360
The website is magai.co because I don't trust putting sensitive data into Claude directly

310
00:21:43,360 --> 00:21:46,960
or ChatGPT directly, but when you're using the API, you can have a bit more confidence.

311
00:21:46,960 --> 00:21:50,320
And Magai now has access to Claude 2.1.

312
00:21:50,320 --> 00:21:56,440
The transcript summaries are, I feel, significantly better in 2.1 than 2.

313
00:21:56,440 --> 00:21:59,840
The things that it's missing, it's just not missing them as much and its summaries are

314
00:21:59,840 --> 00:22:00,840
really spot on.

315
00:22:00,840 --> 00:22:04,800
And I gave it an incredibly hard task yesterday.

316
00:22:04,800 --> 00:22:08,800
I recorded an onboarding video for a member of my team and then I wanted to summarize

317
00:22:08,800 --> 00:22:09,800
it in an email.

318
00:22:09,800 --> 00:22:14,160
So I pulled the transcript out and dropped it into Claude via Magai.

319
00:22:14,160 --> 00:22:16,700
And I got a summary and all of that was great.

320
00:22:16,700 --> 00:22:20,600
During that handover meeting, I'm describing what I think the kickoff call for the client

321
00:22:20,600 --> 00:22:24,000
should contain, but in an extremely sporadic fashion.

322
00:22:24,000 --> 00:22:26,840
Like it's a 30 minute intro onboarding call.

323
00:22:26,840 --> 00:22:29,280
And at one point I'll be like, oh, we should put that in the kickoff call.

324
00:22:29,280 --> 00:22:32,160
And then I talk for a bit more and then I'm like, and in the kickoff call, we should do

325
00:22:32,160 --> 00:22:33,160
this.

326
00:22:33,160 --> 00:22:38,960
So I asked Claude to build the agenda for the kickoff call by trying to like pull some

327
00:22:38,960 --> 00:22:41,480
reason out of my ramblings.

328
00:22:41,480 --> 00:22:43,200
And it did a surprisingly good job.

329
00:22:43,200 --> 00:22:47,360
It included some things that it thought the kickoff call should have based on its understanding

330
00:22:47,360 --> 00:22:48,360
of the transcript.

331
00:22:48,360 --> 00:22:53,400
So in other words, it didn't just focus on the things I specifically called out, but

332
00:22:53,400 --> 00:22:56,320
it did include all the things I specifically called out.

333
00:22:56,320 --> 00:23:02,720
And I can't help but feel Claude 2.0 wasn't strong enough to do that.

334
00:23:02,720 --> 00:23:07,760
Like it would struggle just accurately summarizing a transcript at times for me.

335
00:23:07,760 --> 00:23:09,720
So yeah, I've got a bit of love.

336
00:23:09,720 --> 00:23:13,640
I know you've always had a bit of Claude love in your life, Martin, but I've got a bit of

337
00:23:13,640 --> 00:23:15,680
Claude 2.1 love going on, if I'm honest.

338
00:23:15,680 --> 00:23:20,280
And with that API access, because now it's going to be able to call databases, it's going

339
00:23:20,280 --> 00:23:24,600
to be able to search the web with web search APIs.

340
00:23:24,600 --> 00:23:27,440
I think this could be an interesting time for Claude.

341
00:23:27,440 --> 00:23:31,000
Oh, it's a pleasure to have you on board.

342
00:23:31,000 --> 00:23:33,800
Oh, and team Claude.

343
00:23:33,800 --> 00:23:38,360
I need to get my, I would use DALL-E 3 to create a photorealistic image of a person

344
00:23:38,360 --> 00:23:42,400
wearing a t-shirt that says I'm team Claude, but it can't do photorealistic images for

345
00:23:42,400 --> 00:23:44,400
me anymore, so I won't bother.

346
00:23:44,400 --> 00:23:46,480
But it can do text.

347
00:23:46,480 --> 00:23:50,520
Just give me one image generation tool that could do all the things I want.

348
00:23:50,520 --> 00:23:55,640
Midjourney, DALL-E 3, go out for a drink, see if you get on well, because I think your

349
00:23:55,640 --> 00:24:00,440
babies would be outstanding tools for us all to use.

350
00:24:00,440 --> 00:24:06,000
With that image that nobody needed, should we move on to our next story, Martin?

351
00:24:06,000 --> 00:24:16,040
Yeah, so this was a story that caught a lot of buzz on the Twitters, and it's all about

352
00:24:16,040 --> 00:24:20,320
an AI enabled SEO heist.

353
00:24:20,320 --> 00:24:32,280
Someone called Jake Ward said that they had effectively managed to use AI to steal 3.6

354
00:24:32,280 --> 00:24:44,960
million hits by generating 1800 articles in a process where they basically scraped a competitor's

355
00:24:44,960 --> 00:24:50,040
website and then used AI to spin that content programmatically.

356
00:24:50,040 --> 00:24:51,720
They automated the entire process.

357
00:24:51,720 --> 00:24:56,680
It took them a few hours rather than many weeks, and then they published all of these

358
00:24:56,680 --> 00:24:58,800
articles online.

359
00:24:58,800 --> 00:25:04,560
Interesting, you know, 1800 articles and the traffic, he shared it in a tweet.

360
00:25:04,560 --> 00:25:10,480
The graph is just going up and to the right in terms of website visitors.

361
00:25:10,480 --> 00:25:18,400
And it caused, understandably, a little bit of controversy because this was self-described

362
00:25:18,400 --> 00:25:21,600
by Jake Ward as theft, right?

363
00:25:21,600 --> 00:25:25,360
He managed to steal the traffic.

364
00:25:25,360 --> 00:25:31,960
So lots of people were saying, you know, is this fair?

365
00:25:31,960 --> 00:25:36,360
Is this something that Google should be clamping down on?

366
00:25:36,360 --> 00:25:44,140
I think Google's helpful content update recently will probably see companies getting away with

367
00:25:44,140 --> 00:25:46,120
this less and less.

368
00:25:46,120 --> 00:25:54,000
But yeah, certainly some ethical questions here, like where are the boundaries?

369
00:25:54,000 --> 00:25:58,200
Just because you can do it, should you do it?

370
00:25:58,200 --> 00:26:05,040
That's a great question, but unfortunately, you're back to the 1% because 99% of people

371
00:26:05,040 --> 00:26:08,680
might say, hmm, that's too black hat for me.

372
00:26:08,680 --> 00:26:10,400
I'm not in on that.

373
00:26:10,400 --> 00:26:12,920
But it only takes the 1% to do it.

374
00:26:12,920 --> 00:26:23,680
And the web is full of duplicate junk that is the equivalent of content farms of old.

375
00:26:23,680 --> 00:26:28,800
And it makes the web a much harder place to go find relevant information.

376
00:26:28,800 --> 00:26:32,800
And we've talked about this quite a lot, right, in terms of how does SEO change, not only

377
00:26:32,800 --> 00:26:39,880
as search experience changes with Bing and Google basically using generative AI to create

378
00:26:39,880 --> 00:26:42,040
written summaries in the search output.

379
00:26:42,040 --> 00:26:48,800
So you may never click on the link to go to someone's page, but also just junk.

380
00:26:48,800 --> 00:26:51,800
There is a, I think, what's the guy's name?

381
00:26:51,800 --> 00:26:54,400
I think it's Doug Kessler.

382
00:26:54,400 --> 00:27:00,000
There was a seminal sort of ebook presentation thing he did maybe 10, 12 years ago called

383
00:27:00,000 --> 00:27:01,080
Crap.

384
00:27:01,080 --> 00:27:05,560
And it was the deluge of crap content in a world where all brands want to be publishers

385
00:27:05,560 --> 00:27:06,560
as well.

386
00:27:06,560 --> 00:27:08,920
And I think to a certain extent, it was right.

387
00:27:08,920 --> 00:27:10,520
I do think it was a deluge of crap.

388
00:27:10,520 --> 00:27:14,480
And we were lucky in that Google got better at sorting the crap from the good.

389
00:27:14,480 --> 00:27:19,080
So it surfaced the best content for us as part of its search engine results.

390
00:27:19,080 --> 00:27:23,480
But also over time, you started to learn which content brands you could really trust that

391
00:27:23,480 --> 00:27:25,720
would give you good content.

392
00:27:25,720 --> 00:27:31,600
I think Moz did a great job of this, Content Marketing Institute, et cetera, in our marketing

393
00:27:31,600 --> 00:27:32,600
land.

394
00:27:32,600 --> 00:27:36,080
There are certainly companies in the life sciences that have done a good job of it as

395
00:27:36,080 --> 00:27:37,080
well.

396
00:27:37,080 --> 00:27:42,840
But I think this just further goes to show SEO is going to change.

397
00:27:42,840 --> 00:27:47,580
If you're relying a lot on content on your website to drive a lot of traffic, you need

398
00:27:47,580 --> 00:27:51,880
to start to be planning what you would do if you lost a load of that traffic, either

399
00:27:51,880 --> 00:27:58,000
because you got content hacked, like this example, or because people don't click on

400
00:27:58,000 --> 00:28:01,560
as many links in Google search anymore and don't come to your website.

401
00:28:01,560 --> 00:28:03,480
So how are you going to use podcasts?

402
00:28:03,480 --> 00:28:04,920
How are you going to use video?

403
00:28:04,920 --> 00:28:09,560
How are you going to use trends, reports, and primary research data that no one else

404
00:28:09,560 --> 00:28:13,160
has access to and only you can share that insight with the world?

405
00:28:13,160 --> 00:28:17,400
How are you going to lean into subject matter experts who know things about your industry

406
00:28:17,400 --> 00:28:19,120
that no one else knows?

407
00:28:19,120 --> 00:28:24,040
That is going to become ever more critical, I think, mine.

408
00:28:24,040 --> 00:28:25,040
It is.

409
00:28:25,040 --> 00:28:30,240
And I think for content creators yourself, everybody's going to have to start asking

410
00:28:30,240 --> 00:28:33,560
themselves a question about what are their own ethical boundaries?

411
00:28:33,560 --> 00:28:35,200
How far are you prepared to go?

412
00:28:35,200 --> 00:28:40,900
Because yeah, you will see results by completely ripping off other people's content and making

413
00:28:40,900 --> 00:28:41,900
it your own.

414
00:28:41,900 --> 00:28:49,680
In fact, when GPTs were announced by OpenAI recently, if you went on YouTube and searched

415
00:28:49,680 --> 00:28:56,720
how to create a GPT, there were loads of videos of people saying, basically, find a publisher

416
00:28:56,720 --> 00:29:01,800
that you like their content, download it, rip it off and upload it as if it's your own,

417
00:29:01,800 --> 00:29:07,820
and now you've just created an expert bot with subject matter expertise about that thing

418
00:29:07,820 --> 00:29:12,720
that that other publisher wrote about and you can pass it off as your own.

419
00:29:12,720 --> 00:29:15,120
And this was rife.

420
00:29:15,120 --> 00:29:18,800
And we all have to say, well, where are our boundaries?

421
00:29:18,800 --> 00:29:26,420
I'm nervous about how that will play out because I think the 1% will cause carnage and it will

422
00:29:26,420 --> 00:29:32,520
get draconian changes by search engines like Google that penalise people who are actually

423
00:29:32,520 --> 00:29:35,920
producing valuable content potentially.

424
00:29:35,920 --> 00:29:43,520
And copyright laws and infringements and let's imagine that the entire GPT infrastructure

425
00:29:43,520 --> 00:29:44,520
becomes really valuable.

426
00:29:44,520 --> 00:29:47,920
I'm not sure it is at the moment, but let's say it did.

427
00:29:47,920 --> 00:29:53,440
And people are then doing this, then OpenAI will have to take the entire GPT marketplace

428
00:29:53,440 --> 00:29:58,240
down while they figure out how to navigate whatever legal and copyright issues they have

429
00:29:58,240 --> 00:30:02,680
to solve for, which will mean that a bunch of useful tools that people rely on, they

430
00:30:02,680 --> 00:30:09,560
can no longer access, but again, because the 1% are trying to think about clever, dare

431
00:30:09,560 --> 00:30:12,800
I say it, slightly underhand ways to make a quick buck.

432
00:30:12,800 --> 00:30:17,280
And I think that will happen because the history of humanity, unfortunately, does suggest it

433
00:30:17,280 --> 00:30:18,280
will happen.

434
00:30:18,280 --> 00:30:19,280
Yeah.

435
00:30:19,280 --> 00:30:21,760
That's a slightly negative note.

436
00:30:21,760 --> 00:30:25,080
Let's talk about something positive and super cool.

437
00:30:25,080 --> 00:30:31,880
Let's talk about HeyGen, because HeyGen, we really love HeyGen, don't we, Martin?

438
00:30:31,880 --> 00:30:35,240
So regular listeners to the podcast, you'll probably have heard us talk about HeyGen

439
00:30:35,240 --> 00:30:44,960
before because it's an AI video startup that basically is extremely good at AI-modified

440
00:30:44,960 --> 00:30:45,960
video.

441
00:30:45,960 --> 00:30:51,720
So in other words, what you can do is you can upload a video of yourself speaking.

442
00:30:51,720 --> 00:30:54,560
I think we did this on a recent podcast episode with you, didn't we, Martin?

443
00:30:54,560 --> 00:31:00,960
You upload a video of yourself speaking and it will overdub your voice in another language

444
00:31:00,960 --> 00:31:06,760
and then it will edit the video so that your mouth moves and lip syncs with the new translated

445
00:31:06,760 --> 00:31:07,760
voice.

446
00:31:07,760 --> 00:31:11,600
So we've done it in German, French, Spanish, and it's impressive.

447
00:31:11,600 --> 00:31:20,040
Well, they've just raised a load more money, 5.6 million, and launched their new near instant

448
00:31:20,040 --> 00:31:22,760
custom avatars.

449
00:31:22,760 --> 00:31:29,660
And why this is important is because previously, if you wanted to use HeyGen to create like

450
00:31:29,660 --> 00:31:34,400
a synthetic version of yourself, either for translations or to write scripts and then

451
00:31:34,400 --> 00:31:38,120
have the synthetic version of you speak your script so you didn't have to record the video

452
00:31:38,120 --> 00:31:44,960
yourself, it was a fairly laborious process of recording five minutes of video in borderline

453
00:31:44,960 --> 00:31:47,360
professional studio environment.

454
00:31:47,360 --> 00:31:51,640
But they've been able to tweak their algorithm now so that it's now able to do this with

455
00:31:51,640 --> 00:31:54,260
far less production level video.

456
00:31:54,260 --> 00:31:58,640
They claim you can record yourself on a mobile phone and still get a reasonably good result,

457
00:31:58,640 --> 00:31:59,840
for example.

458
00:31:59,840 --> 00:32:03,320
And Martin and I have been like wanting to play with this, but just haven't had the resources

459
00:32:03,320 --> 00:32:06,200
to go record ourselves in a professional studio environment.

460
00:32:06,200 --> 00:32:11,160
And of course, now we and all of you, dear listeners, can jump in and go have a play

461
00:32:11,160 --> 00:32:12,960
with that as well.

462
00:32:12,960 --> 00:32:19,560
And it's really impressive because when you watch the synthetic version of you talk, it's

463
00:32:19,560 --> 00:32:24,380
you, it follows the same mannerisms and movements that you have, and it really does deep fake

464
00:32:24,380 --> 00:32:27,160
your mouth movements surprisingly well.

465
00:32:27,160 --> 00:32:32,360
And you can certainly imagine, Martin, a world where you can add videos to your site

466
00:32:32,360 --> 00:32:36,460
where you write the script and then HeyGen creates this synthetic version of you that

467
00:32:36,460 --> 00:32:39,920
looks just like you, as a great use case.

468
00:32:39,920 --> 00:32:44,700
And as I was talking about with a bunch of people yesterday, sending out prospecting

469
00:32:44,700 --> 00:32:50,920
sales emails with custom sales videos at scale, because GPT-4 will write the scripts and then

470
00:32:50,920 --> 00:32:53,960
HeyGen will turn the scripts into videos of you speaking.

471
00:32:53,960 --> 00:32:59,840
So HeyGen, we've been like excited about them for a while, but I think this sort of

472
00:32:59,840 --> 00:33:05,640
then, then now their technological change here, I think is going to open this up and

473
00:33:05,640 --> 00:33:08,320
make it much wider for a lot more people to start playing with it.

474
00:33:08,320 --> 00:33:10,880
What are your thoughts, Martin?

475
00:33:10,880 --> 00:33:11,880
Very much so.

476
00:33:11,880 --> 00:33:12,880
The interface is so simple.

477
00:33:12,880 --> 00:33:18,200
I'd encourage anyone that's not tried it yet, just go over to the HeyGen website and

478
00:33:18,200 --> 00:33:19,200
give it a go.

479
00:33:19,200 --> 00:33:22,000
There's no, there's no complication to it.

480
00:33:22,000 --> 00:33:23,640
It's remarkably simple.

481
00:33:23,640 --> 00:33:31,360
I recorded a video of myself in front of my webcam and translated that into French and

482
00:33:31,360 --> 00:33:32,360
it's brilliant.

483
00:33:32,360 --> 00:33:39,640
I did it in front of a conference this week and people there described it as a wow moment

484
00:33:39,640 --> 00:33:47,760
as they saw my English video immediately turned into French and I sent that video to my French

485
00:33:47,760 --> 00:33:49,580
neighbour and he was blown away.

486
00:33:49,580 --> 00:33:52,560
It sounds like you, it looks like you, it's incredible.

487
00:33:52,560 --> 00:33:58,640
I don't know if you have played around with many TikTok filters.

488
00:33:58,640 --> 00:34:00,920
Have you given them a go?

489
00:34:00,920 --> 00:34:03,680
I'm not a big TikTok user, I have to admit.

490
00:34:03,680 --> 00:34:05,360
They are incredible.

491
00:34:05,360 --> 00:34:06,360
Right.

492
00:34:06,360 --> 00:34:12,160
So some of it, there's like a beautify one, which will make you younger or make you dolled

493
00:34:12,160 --> 00:34:14,520
up, give you a glow up, what have you.

494
00:34:14,520 --> 00:34:17,000
I wondered why you were looking so good.

495
00:34:17,000 --> 00:34:21,880
Yeah, they are impressive.

496
00:34:21,880 --> 00:34:26,360
These filters are really impressive because actually if you put your hand in front of

497
00:34:26,360 --> 00:34:30,400
your face, like when, like I've got a virtual background now and it's doing a pretty good

498
00:34:30,400 --> 00:34:34,280
job of not showing my background.

499
00:34:34,280 --> 00:34:37,160
If you put your hand in front of your face, the image stays the same.

500
00:34:37,160 --> 00:34:42,360
There's no like, yeah, there's no glitchiness there.

501
00:34:42,360 --> 00:34:49,260
It just does a really good job and HeyGen seems to have the same capability that really

502
00:34:49,260 --> 00:34:51,960
reduces the glitchiness.

503
00:34:51,960 --> 00:35:01,840
And it was interesting to me that I saw the founders come from a background at Snap and

504
00:35:01,840 --> 00:35:02,840
TikTok.

505
00:35:02,840 --> 00:35:13,280
So they're both from the video AR augmented reality VR landscape, which explains why they've

506
00:35:13,280 --> 00:35:19,480
got probably such a good tech stack that enables them to do what they're doing.

507
00:35:19,480 --> 00:35:24,400
Yeah, that's really interesting because obviously the market for those types of TikTok filters

508
00:35:24,400 --> 00:35:29,400
is massive because basically all TikTok users, but the business applications that we just

509
00:35:29,400 --> 00:35:35,440
went through are not insignificant and probably an easier place to make money, dare I say.

510
00:35:35,440 --> 00:35:39,080
So it seems like a really smart move from them.

511
00:35:39,080 --> 00:35:47,160
And I guess the downside of this would be what stops someone from taking a video of

512
00:35:47,160 --> 00:35:56,140
me from like a webinar and then using that to train the model and then signing the consent

513
00:35:56,140 --> 00:36:01,080
statement on my behalf and then starting producing videos of me saying things I shouldn't be

514
00:36:01,080 --> 00:36:02,080
saying.

515
00:36:02,080 --> 00:36:04,800
I mean, it's the deep fake discussion, right?

516
00:36:04,800 --> 00:36:09,600
We had this deep fake discussion when Obama was president.

517
00:36:09,600 --> 00:36:12,160
This is something that has been in the landscape for a long time.

518
00:36:12,160 --> 00:36:18,400
It's just that these tools are becoming kind of commercially accessible at a consumer level.

519
00:36:18,400 --> 00:36:21,720
You don't have to be techie to train the model.

520
00:36:21,720 --> 00:36:26,280
You can spend like $25 a month and do it yourself.

521
00:36:26,280 --> 00:36:29,400
Yeah, I do think that is the critical piece.

522
00:36:29,400 --> 00:36:30,680
They are impressive.

523
00:36:30,680 --> 00:36:37,280
I threw in a very difficult video from the podcast where I moved a lot while I was talking

524
00:36:37,280 --> 00:36:43,640
and I turn my head around like to the side and it did a surprisingly good job of making

525
00:36:43,640 --> 00:36:47,680
the lip sync to mouth movements still look accurate.

526
00:36:47,680 --> 00:36:51,720
You can tell the video I gave it wasn't particularly well lit either.

527
00:36:51,720 --> 00:36:53,640
So I think you can tell.

528
00:36:53,640 --> 00:36:58,380
But on the best examples I've seen, it's really hard to tell.

529
00:36:58,380 --> 00:37:05,040
So if you're a marketer or if you're a BD person listening to this, RevOps maybe,

530
00:37:05,040 --> 00:37:10,080
how could you use this technology to drive efficiencies and new ways of working, new

531
00:37:10,080 --> 00:37:14,680
ways of adding value, hopefully for customers, not just spamming them using technologies

532
00:37:14,680 --> 00:37:15,680
like these?

533
00:37:15,680 --> 00:37:23,240
Because I think 2023 was the year of sort of ChatGPT and text generation.

534
00:37:23,240 --> 00:37:28,800
But I really think 2024 is going to be the year of synthetic audio and synthetic video.

535
00:37:28,800 --> 00:37:31,880
And I think we're going to see crazy pace in this area.

536
00:37:31,880 --> 00:37:35,520
So as marketers, I really think we need to be thinking about how does this influence

537
00:37:35,520 --> 00:37:40,900
what we do, how we work and all that good stuff.

538
00:37:40,900 --> 00:37:46,420
The big thing here is about latency.

539
00:37:46,420 --> 00:37:55,600
So with the translation app on HeyGen, you have to upload the video, give it the target

540
00:37:55,600 --> 00:37:58,880
language that you want it translating in, and it does some processing in the background,

541
00:37:58,880 --> 00:38:04,300
it takes a few minutes and you get an email notification telling you that it's been done.

542
00:38:04,300 --> 00:38:12,920
When that time to translate is reduced from minutes to milliseconds, that is where we're

543
00:38:12,920 --> 00:38:20,400
going to see some serious business applications, which leads us neatly into our next story,

544
00:38:20,400 --> 00:38:28,080
Stable Diffusion XL Turbo, the launch of real time AI image generation.

545
00:38:28,080 --> 00:38:35,160
So Stability AI has unveiled its latest breakthrough, which is a turbo model of Stable Diffusion

546
00:38:35,160 --> 00:38:36,160
XL.

547
00:38:36,160 --> 00:38:39,720
And it's very impressive.

548
00:38:39,720 --> 00:38:45,280
You can just type in what you want and, rather than hitting enter, it generates the image

549
00:38:45,280 --> 00:38:51,260
while you are typing, the image is generating before your eyes.

550
00:38:51,260 --> 00:38:55,000
It's generating in real time.

551
00:38:55,000 --> 00:39:03,360
So Stable Diffusion XL Turbo's key advancement is its single step image output capability,

552
00:39:03,360 --> 00:39:10,360
which is significantly more efficient than the 20 to 50 steps required by its predecessor.

553
00:39:10,360 --> 00:39:18,120
So what happened previously was you would put in your prompt and then it would run through

554
00:39:18,120 --> 00:39:20,500
between 20 and 50 steps.

555
00:39:20,500 --> 00:39:26,840
So it would basically diffuse the image 20 or 50 times, one after another sequentially.

556
00:39:26,840 --> 00:39:30,480
Now it does this in a single step.

557
00:39:30,480 --> 00:39:36,880
So this leap in the technology is attributed to a technique called adversarial diffusion

558
00:39:36,880 --> 00:39:43,360
distillation, ADD, which refines the model's ability to distinguish between real and generated

559
00:39:43,360 --> 00:39:50,640
images, and therefore enhances the realism and the output.

560
00:39:50,640 --> 00:39:53,880
So yeah, it's pretty cool.

561
00:39:53,880 --> 00:40:00,680
It's available on, now I've forgotten the name of the website Paul, remind me.

562
00:40:00,680 --> 00:40:03,520
Oh, is it, is it Krea?

563
00:40:03,520 --> 00:40:04,520
K-R-E-A?

564
00:40:04,520 --> 00:40:05,520
Yes.

565
00:40:05,520 --> 00:40:08,160
It's available to try online.

566
00:40:08,160 --> 00:40:11,680
The demos are very cool.

567
00:40:11,680 --> 00:40:17,440
However, one thing to note for anyone that does want to go about using this is that it's

568
00:40:17,440 --> 00:40:25,600
available under a non-commercial research license and Stability AI is currently expressing

569
00:40:25,600 --> 00:40:30,120
interest in exploring commercial applications.

570
00:40:30,120 --> 00:40:38,320
You can see this having a bunch of applications for storytelling apps, game development, any

571
00:40:38,320 --> 00:40:44,600
kind of creative interactive user interface.

572
00:40:44,600 --> 00:40:47,800
Yeah, I think it's very cool.

573
00:40:47,800 --> 00:40:49,280
What did you think of the demos?

574
00:40:49,280 --> 00:40:50,280
Yeah.

575
00:40:50,280 --> 00:40:54,200
So to be fair, I don't know for sure that the Krea app uses it.

576
00:40:54,200 --> 00:41:00,080
I mean, it touts it as a sort of feature, is this sort of near real-time image generation.

577
00:41:00,080 --> 00:41:03,000
So I don't know of any other model that enables this.

578
00:41:03,000 --> 00:41:08,440
So I assume there's some sort of underlying partnership there, but that is an assumption.

579
00:41:08,440 --> 00:41:15,360
I think it's a really good example of how the technology advances in ways that we don't

580
00:41:15,360 --> 00:41:16,360
expect.

581
00:41:16,360 --> 00:41:19,920
So one of the things we've all looked at on the surface as marketers is our ability to

582
00:41:19,920 --> 00:41:22,960
generate images and that the images will be of high quality.

583
00:41:22,960 --> 00:41:27,080
But of course, the researchers in the background are realizing that these images take a while

584
00:41:27,080 --> 00:41:29,320
to develop.

585
00:41:29,320 --> 00:41:32,600
And so they're looking to optimize how the models work.

586
00:41:32,600 --> 00:41:36,860
So firstly, when you're playing with the demo, it's quite fun because your ability to influence

587
00:41:36,860 --> 00:41:40,040
the image output in real time is interesting.

588
00:41:40,040 --> 00:41:43,840
And you do that, at least in Krea, by you have an original prompt that creates your

589
00:41:43,840 --> 00:41:45,280
image and that's on the right-hand side.

590
00:41:45,280 --> 00:41:50,840
And on the left-hand side, you can use like pen and shape tools to influence the image

591
00:41:50,840 --> 00:41:53,600
with different colors and different sort of components.

592
00:41:53,600 --> 00:41:59,360
So the example when you boot up is a, I think it's like a blue square and then it's got

593
00:41:59,360 --> 00:42:02,160
a smaller pink square on top of it.

594
00:42:02,160 --> 00:42:10,320
And the prompt is, I think, a frog sat on top of a blue mushroom or something like that.

595
00:42:10,320 --> 00:42:16,340
And the squares directly dictate where in the image those things should actually be.

596
00:42:16,340 --> 00:42:19,120
And of course, you can move the squares around and the image updates in near real time.

597
00:42:19,120 --> 00:42:22,400
So it does give you much more control over how the image is being generated.

598
00:42:22,400 --> 00:42:27,880
I think the other thing to reflect on here is near real time image generation is a big

599
00:42:27,880 --> 00:42:34,120
step up from what we've had so far, where it might take five or 10 seconds to get your

600
00:42:34,120 --> 00:42:37,840
image, which is like, what, 10, 100 times faster.

601
00:42:37,840 --> 00:42:42,600
The next step from here is real time image generation at video level.

602
00:42:42,600 --> 00:42:43,600
Right.

603
00:42:43,600 --> 00:42:48,480
Once we get down to 24 images a second instead of one image a second, we're now creating

604
00:42:48,480 --> 00:42:49,880
video.

605
00:42:49,880 --> 00:42:55,840
So you can imagine, and I've seen a few examples online, not quite 24 frames a second, but

606
00:42:55,840 --> 00:43:03,520
moving image, the equivalent of like low frame rate video of people generating images in

607
00:43:03,520 --> 00:43:05,400
real time to create videos.

608
00:43:05,400 --> 00:43:08,880
So you mentioned game development, but also there's so many applications.

609
00:43:08,880 --> 00:43:14,300
Like I've heard described, and I don't know how likely this is going to be, but the equivalent

610
00:43:14,300 --> 00:43:21,600
of having Netflix create an animated movie for you in real time based on feedback that

611
00:43:21,600 --> 00:43:24,160
you're giving on how you want the story to develop.

612
00:43:24,160 --> 00:43:28,920
Well, this is the type of technological change that we would need to enable those types of

613
00:43:28,920 --> 00:43:29,920
things.

614
00:43:29,920 --> 00:43:34,800
So I think there's implications today for how marketers are creating images and the

615
00:43:34,800 --> 00:43:38,920
level of control they want and the speed of being able to influence the image, iterate,

616
00:43:38,920 --> 00:43:45,400
iterate, iterate, but also those future ramifications of what does this mean for creating video

617
00:43:45,400 --> 00:43:48,920
on demand and shaping that video in near real time.

618
00:43:48,920 --> 00:43:50,160
That would be kind of cool.

619
00:43:50,160 --> 00:43:52,680
So let's get on to our next story then, Martin.

620
00:43:52,680 --> 00:43:54,760
It's in the video realm as well.

621
00:43:54,760 --> 00:43:59,880
It's from Pika Labs, which blew up on the Twittersphere this week.

622
00:43:59,880 --> 00:44:05,960
They've raised a ton of cash, $55 million, and they are chucking themselves right in

623
00:44:05,960 --> 00:44:10,400
the mix to compete with the likes of Runway and Stability AI that we just talked about.

624
00:44:10,400 --> 00:44:15,280
Stability AI has its text to video generator now that you can play with on a variety of,

625
00:44:15,280 --> 00:44:19,960
I think, Hugging Face models that you can go play with for free and Runway ML we've

626
00:44:19,960 --> 00:44:25,520
talked about a lot on the podcast has its cool video generation features, not least

627
00:44:25,520 --> 00:44:30,720
the motion brush that we've talked about recently where you draw on a bit of an image and only

628
00:44:30,720 --> 00:44:32,840
that bit of the image animates.

629
00:44:32,840 --> 00:44:42,840
What's different about Pika 1.0 is that its text to video looks extremely impressive in

630
00:44:42,840 --> 00:44:45,840
the demo video, like really high quality video.

631
00:44:45,840 --> 00:44:50,560
There's an example, I think, where it might be Elon Musk going up into space or something

632
00:44:50,560 --> 00:44:56,320
similar in a cartoon style and it looks like, I don't know, like a Pixar animation.

633
00:44:56,320 --> 00:45:01,160
It doesn't look kind of like weird diffusion effects or a bit crappy like a lot of the

634
00:45:01,160 --> 00:45:04,560
other examples that we've seen.

635
00:45:04,560 --> 00:45:09,480
They already have a bunch of users, but it really sounds like this next leap in their

636
00:45:09,480 --> 00:45:11,520
technology is going to be significant.

637
00:45:11,520 --> 00:45:12,520
I'm on the waiting list.

638
00:45:12,520 --> 00:45:16,600
I think they're starting to open up the waiting list on Monday, Martin.

639
00:45:16,600 --> 00:45:21,200
So we shall see what it does when we can get access to it.

640
00:45:21,200 --> 00:45:24,720
So yeah, if you haven't checked it out, I think it's worth getting on the Twitterspheres

641
00:45:24,720 --> 00:45:29,320
or searching Google for Pika and just watch that demo video because if they can even do

642
00:45:29,320 --> 00:45:36,040
three quarters of what that demo video suggests, it's going to be significantly impressive.

643
00:45:36,040 --> 00:45:41,960
I think you can even like update your prompt to say, no, no, I wanted the robot in my animation

644
00:45:41,960 --> 00:45:45,000
to do this instead of this and then it will change it for you.

645
00:45:45,000 --> 00:45:50,080
So I guess the caveat here, Martin, based on previous experiences, demo videos that

646
00:45:50,080 --> 00:45:52,680
we see from the companies are usually rather awesome.

647
00:45:52,680 --> 00:45:57,240
And then when you play with it yourself, you struggle to get the same sort of output.

648
00:45:57,240 --> 00:45:58,560
Have you had a look at the Pika video?

649
00:45:58,560 --> 00:46:00,640
What have your thoughts been?

650
00:46:00,640 --> 00:46:03,160
I wholeheartedly agree.

651
00:46:03,160 --> 00:46:07,280
It's a demo video that looks fantastic, but I've seen demo videos that look fantastic

652
00:46:07,280 --> 00:46:12,760
previously and I've not been able to generate anything even close to them.

653
00:46:12,760 --> 00:46:21,200
So we will wait with bated breath to see whether or not it's actually as easy to get outputs

654
00:46:21,200 --> 00:46:23,280
of the quality that they're generating.

655
00:46:23,280 --> 00:46:27,560
I still can't get Runway to give me anything half decent.

656
00:46:27,560 --> 00:46:33,880
I was playing with Motion Brush last week trying to make some very small edits.

657
00:46:33,880 --> 00:46:40,480
I was trying to make somebody just, I thought maybe I could just get somebody to smile.

658
00:46:40,480 --> 00:46:45,520
It was a portrait image and their head just kind of floats up off the screen.

659
00:46:45,520 --> 00:46:54,160
So it's very good for getting things like clouds to move in the sky or water moving.

660
00:46:54,160 --> 00:46:58,640
But yeah, the Motion Brush tends to, when you set it in a direction, it keeps things

661
00:46:58,640 --> 00:47:04,880
going so people's heads just morph off the video entirely.

662
00:47:04,880 --> 00:47:10,840
So we'll see how easy it is to use Pika compared to the demo videos.

663
00:47:10,840 --> 00:47:16,240
I'm sure these technologies, well, they are the worst they're ever going to be right now.

664
00:47:16,240 --> 00:47:17,560
They're only going to get better from here.

665
00:47:17,560 --> 00:47:22,280
So the fact that somebody can make them do cool things like this suggests that in the

666
00:47:22,280 --> 00:47:28,480
future they'll get it to the point where we can all make cool things like their demo video.

667
00:47:28,480 --> 00:47:33,440
Yeah, I mean, I have to assume, one of the things I think about in this area a lot is

668
00:47:33,440 --> 00:47:39,800
if we've got a tool like GPT-4 Vision, which can analyze certainly frames in videos and

669
00:47:39,800 --> 00:47:47,400
images themselves and describe what's going on, why don't we have a DALL-E 3 level

670
00:47:47,400 --> 00:47:51,920
video generator that's creating a video that's being checked in real time by GPT-4 Vision

671
00:47:51,920 --> 00:47:55,840
to make sure that it's not junk and doing some weird stuff like someone's head floating

672
00:47:55,840 --> 00:48:00,340
away that's clearly not going to be the intention from the prompt and therefore able to course

673
00:48:00,340 --> 00:48:03,600
correct its own output.

674
00:48:03,600 --> 00:48:08,160
There doesn't seem to be, at least in my understanding of how all the models work, that much recursiveness

675
00:48:08,160 --> 00:48:10,200
in how they produce stuff at the moment.

676
00:48:10,200 --> 00:48:15,240
And it just seems to the completely untrained person that I am that that would be a really

677
00:48:15,240 --> 00:48:17,240
powerful way to improve the outputs.

678
00:48:17,240 --> 00:48:21,680
I don't know if that's how a lot of these improvements that we're seeing are being made.

679
00:48:21,680 --> 00:48:26,600
I don't know if that's the type of thing that they're trying to bake in, but it does seem

680
00:48:26,600 --> 00:48:31,760
that's my hope is that you almost have multiple models working on a project together where

681
00:48:31,760 --> 00:48:35,040
they have different skills and then they can just like one feeds back to the other to go

682
00:48:35,040 --> 00:48:38,560
no, I think you might have drifted in the wrong direction here.

683
00:48:38,560 --> 00:48:43,960
Exactly as we've just described with Stable Diffusion XL Turbo, where they've got the adversarial

684
00:48:43,960 --> 00:48:45,440
diffusion distillation.

685
00:48:45,440 --> 00:48:46,440
Right.

686
00:48:46,440 --> 00:48:51,040
That's exactly as you've just described and they don't have that in the video realm yet

687
00:48:51,040 --> 00:48:52,040
at all.

688
00:48:52,040 --> 00:48:53,040
So it won't be long.

689
00:48:53,040 --> 00:48:55,040
I'm sure they'll borrow these techniques soon enough.

690
00:48:55,040 --> 00:48:56,040
You're right.

691
00:48:56,040 --> 00:48:58,360
Should we talk music for a bit?

692
00:48:58,360 --> 00:48:59,360
Yeah.

693
00:48:59,360 --> 00:49:08,040
Going back to Stability AI, they've announced some updates to Stable Audio.

694
00:49:08,040 --> 00:49:14,360
And since we last discussed Stable Audio on the podcast, it has been celebrated as one

695
00:49:14,360 --> 00:49:19,960
of Time magazine's best inventions of 2023.

696
00:49:19,960 --> 00:49:25,880
Also, Stable Audio, for those who missed the previous discussion about it, is a text to

697
00:49:25,880 --> 00:49:26,880
audio generator.

698
00:49:26,880 --> 00:49:33,880
So you can describe the audio style, the rhythm, the BPM, all of that kind of stuff.

699
00:49:33,880 --> 00:49:39,560
And then it will generate an audio track and you can use these tracks commercially in your

700
00:49:39,560 --> 00:49:41,760
own projects.

701
00:49:41,760 --> 00:49:47,920
But they've now pushed the first real big set of updates for the product since it was

702
00:49:47,920 --> 00:49:51,280
launched, including input audio.

703
00:49:51,280 --> 00:49:57,420
So you can now actually, in the same way that you can have a seed image for a text to image,

704
00:49:57,420 --> 00:50:00,120
you can provide seed audio.

705
00:50:00,120 --> 00:50:04,920
So tracks that you've previously generated, you can input that and say, use this as the

706
00:50:04,920 --> 00:50:14,320
kind of starting point, which allows for more nuanced and personalized edits building upon

707
00:50:14,320 --> 00:50:16,440
existing creations.

708
00:50:16,440 --> 00:50:22,840
There's now a new set of editing parameters.

709
00:50:22,840 --> 00:50:28,300
So you can have control over the seed, the number of steps.

710
00:50:28,300 --> 00:50:34,480
So as we spoke about with stable diffusion, it goes through a number of steps behind the

711
00:50:34,480 --> 00:50:35,960
scenes to create the output.

712
00:50:35,960 --> 00:50:42,760
The more steps you allow the AI to run through, generally speaking, the better quality the

713
00:50:42,760 --> 00:50:43,800
output.

714
00:50:43,800 --> 00:50:48,480
So if you increase the number of steps, you'll get nicer quality audio closer to what you

715
00:50:48,480 --> 00:50:50,720
asked for in the prompt.

716
00:50:50,720 --> 00:50:52,820
You can also increase the prompt strength.

717
00:50:52,820 --> 00:50:58,940
So how closely it sticks to what you've put in the prompt and also adjusting the number

718
00:50:58,940 --> 00:51:00,200
of generations as well.

719
00:51:00,200 --> 00:51:05,320
So it will give you more than just one generation from an input.

720
00:51:05,320 --> 00:51:08,680
There's some improvements to collaboration.

721
00:51:08,680 --> 00:51:14,560
So you can now add share links, just allowing you to send your creations to other people

722
00:51:14,560 --> 00:51:17,560
for distribution and collaboration.

723
00:51:17,560 --> 00:51:21,040
Another neat feature is the download as video option as well.

724
00:51:21,040 --> 00:51:26,360
So you can export your audio as a video file.

725
00:51:26,360 --> 00:51:35,840
And there is a prompt library, which for someone like me, who doesn't have any kind of musical

726
00:51:35,840 --> 00:51:39,160
creation, I'm not a musician at all.

727
00:51:39,160 --> 00:51:43,000
I don't have a single musical bone in my body.

728
00:51:43,000 --> 00:51:48,720
That can be really helpful because sometimes I might broadly know what kind of audio I'm

729
00:51:48,720 --> 00:51:53,680
looking for, but I can't really articulate it.

730
00:51:53,680 --> 00:51:56,160
I'm not technical.

731
00:51:56,160 --> 00:51:58,080
I've never done music production.

732
00:51:58,080 --> 00:52:02,840
So having a prompt library there is useful.

733
00:52:02,840 --> 00:52:04,920
This is quite cool.

734
00:52:04,920 --> 00:52:09,680
Sometimes when I talk about generative AI for music, when I'm out and about, I sometimes

735
00:52:09,680 --> 00:52:13,560
wonder how is this most applicable for marketers?

736
00:52:13,560 --> 00:52:20,360
And the way I think about a lot of this stuff is there's a lot of decent stock websites

737
00:52:20,360 --> 00:52:24,760
out and about like AudioJungle and ThemeForest.

738
00:52:24,760 --> 00:52:29,400
But I think if you want your images to stand out and not just look like the stock photos

739
00:52:29,400 --> 00:52:32,960
that everyone else is using, that's where a tool like Midjourney or DALL-E 3 comes in,

740
00:52:32,960 --> 00:52:35,800
because you can really generate something that is unique for you.

741
00:52:35,800 --> 00:52:38,280
And I think this is the case for music as well.

742
00:52:38,280 --> 00:52:43,200
If you want little intro jingles for your branded videos and you don't want to use

743
00:52:43,200 --> 00:52:47,120
something that you know could be used by someone else because you've just bought it for $15

744
00:52:47,120 --> 00:52:50,200
on AudioJungle, well, this is a great way to do that.

745
00:52:50,200 --> 00:52:53,840
And the more powerful they get, the more control we're going to have over getting just the

746
00:52:53,840 --> 00:52:55,860
right output that we want.

747
00:52:55,860 --> 00:53:00,480
With all these things, generative AI is about getting approximately what you are asking

748
00:53:00,480 --> 00:53:02,560
for, not exactly what you're asking for.

749
00:53:02,560 --> 00:53:06,660
And I think these tweaks to how the models work and the parameters you can change is

750
00:53:06,660 --> 00:53:08,600
definitely going to give people more control.

751
00:53:08,600 --> 00:53:14,560
But of course, in terms of the absolute fine control of a music producer, of course, not quite

752
00:53:14,560 --> 00:53:15,560
there yet.

753
00:53:15,560 --> 00:53:19,200
But yeah, I think it's amazing to see all the different realms.

754
00:53:19,200 --> 00:53:24,640
Text, image, video, audio, music, everything is sprinting along all at the same

755
00:53:24,640 --> 00:53:25,640
time.

756
00:53:25,640 --> 00:53:26,640
Right.

757
00:53:26,640 --> 00:53:31,240
A couple more stories to whip through.

758
00:53:31,240 --> 00:53:37,320
The first one here is from Meta AI's research division, which has unveiled the groundbreaking

759
00:53:37,320 --> 00:53:41,400
achievement they've made with Cicero, which is the first AI capable of playing the strategy

760
00:53:41,400 --> 00:53:44,640
game Diplomacy at a human level.

761
00:53:44,640 --> 00:53:50,600
So Diplomacy is a tough game to play because you need to negotiate, cooperate and plan

762
00:53:50,600 --> 00:53:51,880
with other humans.

763
00:53:51,880 --> 00:53:56,180
And it's like it's very kind of diplomatic, as the name would suggest, and quite complex

764
00:53:56,180 --> 00:53:58,080
and very human, really.

765
00:53:58,080 --> 00:54:03,280
So to be able to create an AI that can play the game well is really impressive.

766
00:54:03,280 --> 00:54:08,040
The reason it can do it is because of its ability to combine advanced strategic reasoning

767
00:54:08,040 --> 00:54:09,480
with natural language processing.

768
00:54:09,480 --> 00:54:14,000
It can think ahead to a certain extent and it can also understand the nuances of what

769
00:54:14,000 --> 00:54:18,880
the other players are doing and what they're asking for and what they're saying.

770
00:54:18,880 --> 00:54:29,840
So Cicero played online on WebDiplomacy.net and was able to outperform average human scores,

771
00:54:29,840 --> 00:54:33,680
but also ranked in the top 10% of players, which is kind of mental.

772
00:54:33,680 --> 00:54:37,760
We've talked about AlphaGo, haven't we, on the podcast and that's a rather impressive

773
00:54:37,760 --> 00:54:40,240
feat because it's such a complicated game.

774
00:54:40,240 --> 00:54:42,800
But this is such a human game, Martin.

775
00:54:42,800 --> 00:54:45,200
Yeah, this is so different.

776
00:54:45,200 --> 00:54:50,480
Now AlphaGo was complex because of the number of permutations on the board.

777
00:54:50,480 --> 00:54:53,400
This is different because it's about negotiation.

778
00:54:53,400 --> 00:55:03,120
It's about understanding incentives, rewards, punishment, things like, well, human psychology,

779
00:55:03,120 --> 00:55:04,120
right?

780
00:55:04,120 --> 00:55:10,320
Loss aversion, human strategy, power, influence, all of these things.

781
00:55:10,320 --> 00:55:13,920
And it's managed to score in the top 10% of players.

782
00:55:13,920 --> 00:55:16,880
That's scary.

783
00:55:16,880 --> 00:55:24,760
But all of a sudden I'm thinking, okay, well, if ChatGPT with GPT-4 scores in the top 10%

784
00:55:24,760 --> 00:55:32,600
of humans that sit the bar exam, so for legal professionals in America, if I can take that

785
00:55:32,600 --> 00:55:41,840
capability and combine it with this new Cicero AI, then, you know, it sounds like I've got

786
00:55:41,840 --> 00:55:44,800
a legal dream team on my hands there.

787
00:55:44,800 --> 00:55:45,800
It's super interesting.

788
00:55:45,800 --> 00:55:54,640
So Sam Altman, formerly and now currently CEO of OpenAI said at the end of October, so

789
00:55:54,640 --> 00:56:00,560
only a month ago, on Twitter, I expect AI to be capable of superhuman persuasion well

790
00:56:00,560 --> 00:56:06,560
before it is superhuman at general intelligence, which may lead to some very strange outcomes.

791
00:56:06,560 --> 00:56:12,160
And I think having AI being good at a game like Diplomacy is an absolute indication of

792
00:56:12,160 --> 00:56:13,160
that, right?

793
00:56:13,160 --> 00:56:20,960
It might be that before it is solving climate change and helping us develop new medicines,

794
00:56:20,960 --> 00:56:29,000
it is taking part in negotiating contracts and getting rebates and maybe acting in courts

795
00:56:29,000 --> 00:56:34,920
of law, right, because of its ability to understand how humans communicate and how to leverage

796
00:56:34,920 --> 00:56:38,520
that to persuade.

797
00:56:38,520 --> 00:56:39,520
Who knows?

798
00:56:39,520 --> 00:56:41,760
Before long, it could be operating at a geopolitical level.

799
00:56:41,760 --> 00:56:47,080
It will be leading and chairing debates in the UN chamber.

800
00:56:47,080 --> 00:56:48,080
Who knows?

801
00:56:48,080 --> 00:56:50,280
It's kind of mad, isn't it?

802
00:56:50,280 --> 00:56:54,200
Right, let's last story then for this week, Martin.

803
00:56:54,200 --> 00:57:00,400
Yeah, so this is revisiting a story that we had on the podcast recently.

804
00:57:00,400 --> 00:57:05,120
So we recently covered the case where some visual artists were taking action against

805
00:57:05,120 --> 00:57:13,560
AI image generation companies such as Stability AI, Midjourney and Runway.

806
00:57:13,560 --> 00:57:21,840
And the crux of the case was that there was an alleged misuse of artists' work to train

807
00:57:21,840 --> 00:57:24,600
these generative AI systems.

808
00:57:24,600 --> 00:57:32,160
But last time we covered the case, there was basically a partial dismissal by the US District

809
00:57:32,160 --> 00:57:41,440
Court, which basically said that the artists' initial arguments were not strong enough and

810
00:57:41,440 --> 00:57:45,360
didn't really apply to this copyright law.

811
00:57:45,360 --> 00:57:52,960
But they maintained their core claim concerning AI training processes violating artists'

812
00:57:52,960 --> 00:57:53,960
rights.

813
00:57:53,960 --> 00:57:59,440
So now the artists have come back because I think last time three of the arguments were

814
00:57:59,440 --> 00:58:03,160
thrown out but the one was allowed to remain.

815
00:58:03,160 --> 00:58:13,000
In the amended complaint, the seven new artists have basically been...

816
00:58:13,000 --> 00:58:18,120
They've said that their rights have been impinged because now if you search for their name

817
00:58:18,120 --> 00:58:27,200
in Google, it isn't their work that comes up, it's the AI-generated work that comes

818
00:58:27,200 --> 00:58:29,720
up first.

819
00:58:29,720 --> 00:58:34,960
And this is what's forming the basis of the new case because they're saying there is real,

820
00:58:34,960 --> 00:58:40,360
they're evidencing now that there is real commercial harm.

821
00:58:40,360 --> 00:58:41,360
That's so funny.

822
00:58:41,360 --> 00:58:42,360
So it's not the...

823
00:58:42,360 --> 00:58:45,360
Obviously they complained about the first order consequences, which are quite understandable,

824
00:58:45,360 --> 00:58:49,400
but this is a second order consequence where all of the buzz and the news around this ends

825
00:58:49,400 --> 00:58:54,640
up ranking highly on Google, more highly than their actual original work.

826
00:58:54,640 --> 00:58:56,680
Yep.

827
00:58:56,680 --> 00:58:59,320
Complicated.

828
00:58:59,320 --> 00:59:04,600
So this is definitely one to keep an eye on.

829
00:59:04,600 --> 00:59:09,500
Where this ends up, who knows, but we're starting to see some of the more interesting legal

830
00:59:09,500 --> 00:59:10,880
arguments now.

831
00:59:10,880 --> 00:59:17,960
Okay, it was trained on your work, but it's not completely derivative of your work.

832
00:59:17,960 --> 00:59:25,280
But now when people search for the artist's name, people's AI-generated work in the style

833
00:59:25,280 --> 00:59:27,880
of that artist is outranking the actual artist.

834
00:59:27,880 --> 00:59:34,960
So therefore we can see that there is genuine commercial harm.

835
00:59:34,960 --> 00:59:38,160
This is a search engine problem though, surely, right?

836
00:59:38,160 --> 00:59:42,920
Google needs to figure out how to make some sort of update that allows people to know,

837
00:59:42,920 --> 00:59:46,840
here's the original artist, front and center, if you're looking for that.

838
00:59:46,840 --> 00:59:50,880
And if you're looking for derivatives, they come lower down the page.

839
00:59:50,880 --> 00:59:55,280
I feel like there could be a technical solution to this, but a little bit like, you know,

840
00:59:55,280 --> 01:00:00,720
the right to be forgotten in the EU where you can basically contact Google and ask that

841
01:00:00,720 --> 01:00:06,200
basically certain search results about you are not shown anymore.

842
01:00:06,200 --> 01:00:10,720
I would have to hope that they can solve this like this, but clearly it's helping to underpin

843
01:00:10,720 --> 01:00:15,880
a more substantial claim in terms of damages, etc.

844
01:00:15,880 --> 01:00:20,320
So that's probably why they haven't leaned so much into what might be a practical solution

845
01:00:20,320 --> 01:00:24,320
versus the solution they've opted for.

846
01:00:24,320 --> 01:00:27,520
Not familiar with the case, not a lawyer, so this is just my opinion.

847
01:00:27,520 --> 01:00:33,080
But yeah, it's still the wild west, but we are starting to see it coalesce a little bit

848
01:00:33,080 --> 01:00:36,880
in terms of the approaches that people are taking and then the counter approaches that

849
01:00:36,880 --> 01:00:43,440
these companies are taking to avoid being held liable for any sort of copyright or commercial

850
01:00:43,440 --> 01:00:45,880
damages.

851
01:00:45,880 --> 01:00:47,800
Many twists and turns ahead, I'm sure.

852
01:00:47,800 --> 01:00:50,680
I think you may be right on that one, Martin.

853
01:00:50,680 --> 01:00:53,600
One thing that we should say is that that's the end of the episode.

854
01:00:53,600 --> 01:00:55,440
Thank you so much, everyone, for sticking with us.

855
01:00:55,440 --> 01:00:57,600
We hope you find this useful.

856
01:00:57,600 --> 01:01:04,840
We've got another interview for next week's episode. The eagle-eared, eagle-eyed, eagle-somethings

857
01:01:04,840 --> 01:01:09,240
of you will have noticed that to a certain extent we've found a new cadence, Martin and

858
01:01:09,240 --> 01:01:15,360
I, where we have a discussion like this on week one and then on week two, we may have

859
01:01:15,360 --> 01:01:20,560
an in-depth interview with a subject matter expert in the field of AI and technology for

860
01:01:20,560 --> 01:01:22,720
marketers and sales folk to use.

861
01:01:22,720 --> 01:01:27,000
So we have another interview next week, but we will be back with you the week after to

862
01:01:27,000 --> 01:01:32,000
dive into all the new technologies, new stories, tools and stuff that we learn about during

863
01:01:32,000 --> 01:01:33,000
that period.

864
01:01:33,000 --> 01:01:34,000
Thanks so much for your time, Martin.

865
01:01:34,000 --> 01:01:37,000
I look forward to speaking with you soon.

866
01:01:37,000 --> 01:01:38,820
Cheers, Paul.

867
01:01:38,820 --> 01:01:42,280
Thank you for listening to Artificially Intelligent Marketing.

868
01:01:42,280 --> 01:01:48,320
To stay on top of the latest trends, tips and tools in the world of marketing AI, be

869
01:01:48,320 --> 01:01:50,080
sure to subscribe.

870
01:01:50,080 --> 01:01:57,400
We look forward to seeing you again next week.

