1
00:00:00,000 --> 00:00:05,360
You have some artists that are releasing hits, AI hits, from the beyond.

2
00:00:05,520 --> 00:00:09,960
I see here Biggie Smalls, rest in peace, but he's got a new cover.

3
00:00:13,800 --> 00:00:19,640
HTTTA. HTTTA. HTTTA. HTTTA. HTTTA.

4
00:00:19,640 --> 00:00:26,320
It's How to Talk to AI, with your hosts, Goda Go and West the Synth Mind.

5
00:00:26,720 --> 00:00:29,880
Ladies and gentlemen, boys and girls, children of all ages, dogs, cats,

6
00:00:29,880 --> 00:00:35,040
robots and everybody in between, especially you, AI-generated pop music superstars.

7
00:00:35,200 --> 00:00:38,920
This is HTTTA, How to Talk to AI.

8
00:00:39,320 --> 00:00:41,880
I am your host, West the Synth Mind, Synth Mind West.

9
00:00:41,920 --> 00:00:47,040
And as always, I am joined by the gleaming genius, the gregarious,

10
00:00:47,280 --> 00:00:52,040
the genuinely gifted, gorgeous, gracious, gallivanting host herself,

11
00:00:52,280 --> 00:00:54,200
the glamorous Ms. Goda Go.

12
00:00:54,240 --> 00:00:55,840
G, how are you this week?

13
00:00:55,840 --> 00:00:59,120
I am fantastic. Thank you so much.

14
00:00:59,120 --> 00:01:00,440
I'm doing great.

15
00:01:00,440 --> 00:01:03,480
It's an interesting start to the week, to be honest.

16
00:01:03,840 --> 00:01:08,760
I just saw this video, which is a little bit political

17
00:01:08,920 --> 00:01:14,600
and, with the audience's permission, it's about a campaign, about Biden's

18
00:01:14,680 --> 00:01:16,560
re-election or something like that.

19
00:01:16,560 --> 00:01:20,480
I'm not into politics, but you are in the US.

20
00:01:20,600 --> 00:01:22,840
So I'm very curious to hear your opinion.

21
00:01:23,560 --> 00:01:28,520
What impresses me about this video is the editing, the music,

22
00:01:28,520 --> 00:01:33,320
the audio, and as I understand it, it's fully AI-generated.

23
00:01:33,560 --> 00:01:36,280
That's correct. So what G is referring to over there,

24
00:01:36,560 --> 00:01:39,400
the Republican Party here in the United States released

25
00:01:39,640 --> 00:01:44,240
one of their first campaign ads for the 2024 presidential election.

26
00:01:44,600 --> 00:01:49,240
And it is 100% AI-generated, from the photos to the music and editing.

27
00:01:49,560 --> 00:01:52,800
We'll post a link to it, just for everyone's sake,

28
00:01:52,800 --> 00:01:55,760
because if you're like us and you stare at a lot of these pictures

29
00:01:55,760 --> 00:01:58,200
all the time, you'll notice a couple of things that you're like,

30
00:01:58,200 --> 00:01:59,920
oh, that's AI-generated. That's a little funny.

31
00:02:00,320 --> 00:02:03,320
But the way it's cut together, like a political attack ad,

32
00:02:03,760 --> 00:02:06,400
it's pretty sensational, pretty wild.

33
00:02:06,880 --> 00:02:11,840
And it kind of speaks to some of the darker sides of this,

34
00:02:11,840 --> 00:02:16,800
where misinformation can easily jump forth from these generative AI tools.

35
00:02:17,280 --> 00:02:20,960
But even though we will not take a political slant one way

36
00:02:20,960 --> 00:02:24,920
or the other on this show, I think it is important for our listeners

37
00:02:24,920 --> 00:02:28,880
and everyone out there to be aware that these are out there

38
00:02:29,320 --> 00:02:34,080
and they're not openly disclosing the fact that this is AI-generated.

39
00:02:34,360 --> 00:02:38,760
No, not at all. Not in the title; if you didn't look at the video description,

40
00:02:39,280 --> 00:02:43,280
you wouldn't know. And I can see people quickly jumping in and sharing it

41
00:02:43,560 --> 00:02:45,280
because the edit is good.

42
00:02:45,280 --> 00:02:48,760
Like, just from a completely neutral perspective,

43
00:02:49,040 --> 00:02:50,840
I would be fooled for sure.

44
00:02:50,840 --> 00:02:53,760
But I saw another one: The Wall Street Journal

45
00:02:53,760 --> 00:02:57,560
just put out a video titled

46
00:02:57,560 --> 00:03:02,160
"I Challenged My AI Clone to Replace Me for 24 Hours."

47
00:03:02,760 --> 00:03:04,720
Everyone can just go watch it.

48
00:03:04,880 --> 00:03:08,880
But there was also an interesting part touching on this,

49
00:03:08,880 --> 00:03:10,920
you know, how these things can be misused.

50
00:03:11,360 --> 00:03:16,200
She used audio from ElevenLabs, her voice clone,

51
00:03:16,640 --> 00:03:18,560
to actually talk with her bank.

52
00:03:18,880 --> 00:03:21,760
And as you know, banks usually have voice biometrics.

53
00:03:21,760 --> 00:03:24,880
So they listen to your voice saying certain things

54
00:03:25,080 --> 00:03:28,880
and they either put you through to an actual human

55
00:03:29,080 --> 00:03:30,880
or basically allow you more options.

56
00:03:30,880 --> 00:03:36,200
So out of the many tests she did using her AI avatar and cloned voice,

57
00:03:36,400 --> 00:03:40,200
this one actually worked and the bank proceeded.

58
00:03:40,400 --> 00:03:44,080
And then they also tested with her friend trying to impersonate her

59
00:03:44,080 --> 00:03:48,240
and it did not work; the bank spotted that this was not the right person,

60
00:03:48,240 --> 00:03:54,480
but it gives you an impression of how good the audio is.

61
00:03:54,680 --> 00:03:56,040
It's so good, it's a little scary.

62
00:03:56,240 --> 00:03:59,160
And these are both areas where I think there's a need

63
00:03:59,160 --> 00:04:01,480
for some measure of regulation.

64
00:04:01,880 --> 00:04:03,760
How that's going to shake out, I don't know.

65
00:04:03,960 --> 00:04:05,680
You're a YouTube content creator.

66
00:04:05,880 --> 00:04:10,360
If you use AI tools, would you want an AI-generated label

67
00:04:10,360 --> 00:04:12,200
somewhere on the video by law?

68
00:04:12,720 --> 00:04:16,360
Even though most of your video probably isn't AI-generated,

69
00:04:16,360 --> 00:04:17,520
how would that make you feel?

70
00:04:17,520 --> 00:04:20,200
This is fun because I've thought about many aspects,

71
00:04:20,200 --> 00:04:25,560
for example, generating an avatar of myself or using my voice,

72
00:04:25,760 --> 00:04:29,640
cloning my voice, but I have not yet thought about

73
00:04:29,840 --> 00:04:33,440
whether I need to disclose it and how you would disclose that.

74
00:04:33,640 --> 00:04:35,600
What would you even be comfortable with?

75
00:04:35,800 --> 00:04:37,320
I think if there's an advertisement,

76
00:04:37,520 --> 00:04:40,560
if it's something political, or targeted to kids,

77
00:04:40,760 --> 00:04:43,240
you know, put out by a government organization

78
00:04:43,440 --> 00:04:46,800
or a news media source, acting in some official capacity,

79
00:04:46,800 --> 00:04:50,600
I think there should be some sort of label or disclaimer, like

80
00:04:50,800 --> 00:04:54,560
"Some content has been generated with the help of AI tools and programs."

81
00:04:54,760 --> 00:04:58,840
And that hopefully is an indication for people to take this

82
00:04:58,840 --> 00:05:00,200
with a little bit of a grain of salt.

83
00:05:00,400 --> 00:05:02,000
100% agree with you.

84
00:05:02,200 --> 00:05:07,560
I think it's about transparency, just giving the people who consume the content

85
00:05:07,760 --> 00:05:10,440
the freedom to judge for themselves.

86
00:05:10,640 --> 00:05:11,440
And it's fine.

87
00:05:11,640 --> 00:05:15,720
I think we are moving toward a future where people will enjoy music

88
00:05:15,720 --> 00:05:20,680
made by AI, or videos produced by AI.

89
00:05:20,880 --> 00:05:23,320
I think it's completely natural,

90
00:05:23,520 --> 00:05:26,520
and especially with images, too, like on the previous podcast,

91
00:05:26,720 --> 00:05:31,160
you said that people actually preferred AI-generated images in a case study,

92
00:05:31,360 --> 00:05:37,320
right, one done in Germany, but I'm all for making disclaimers for people.

93
00:05:37,520 --> 00:05:40,360
I would include something like, you know, on YouTube,

94
00:05:40,560 --> 00:05:42,560
for example, when you do a paid promotion.

95
00:05:42,560 --> 00:05:47,520
Now you can mark that this video was made with paid promotion.

96
00:05:47,720 --> 00:05:51,600
And there is this little pop-up that comes up in the video.

97
00:05:51,800 --> 00:05:55,160
It says "Includes paid promotion."

98
00:05:55,360 --> 00:05:59,800
So maybe there should be something like that for AI, in the video or even in the tools.

99
00:06:00,000 --> 00:06:03,400
YouTube should be able to kind of detect these things.

100
00:06:03,600 --> 00:06:06,120
As we know from a few episodes ago, too,

101
00:06:06,320 --> 00:06:09,440
they're very good at detecting all sorts of things on their platform.

102
00:06:09,640 --> 00:06:11,160
Well, kudos to them.

103
00:06:11,160 --> 00:06:12,520
Yeah, kudos to them.

104
00:06:12,520 --> 00:06:14,520
I mean, shoot, that's a whole nother discussion, too.

105
00:06:14,720 --> 00:06:19,400
There are still no legitimate AI-generated content detectors.

106
00:06:19,600 --> 00:06:24,520
I've seen so many different stories of kids complaining because they've written

107
00:06:24,720 --> 00:06:28,760
their essays 100% themselves, but their teacher runs them through an AI

108
00:06:28,960 --> 00:06:33,880
content detector and parts of it come up as 30% AI-generated.

109
00:06:34,080 --> 00:06:35,640
A lot of those just use a random number

110
00:06:35,840 --> 00:06:38,960
generator for what they put out,

111
00:06:38,960 --> 00:06:42,240
because they just want traffic to their site, because that's money.

112
00:06:42,440 --> 00:06:45,600
And that's why you can open a text field, put the Declaration of

113
00:06:45,600 --> 00:06:48,640
Independence into it, and it comes out as 70% AI-generated.

114
00:06:48,840 --> 00:06:53,440
Like, no, no, it isn't; we know that text existed before.

115
00:06:53,640 --> 00:06:57,160
But I still see enough teachers at least taking it with a grain of salt.

116
00:06:57,360 --> 00:07:02,000
But I don't think it's in OpenAI's interest to put out their content

117
00:07:02,000 --> 00:07:06,120
detectors. And as these things get better and better, how would you go about doing it?

118
00:07:06,120 --> 00:07:09,440
Maybe there are some traces in AI video.

119
00:07:09,640 --> 00:07:14,040
I know, for example, that in real audio, like you and I talking,

120
00:07:14,240 --> 00:07:16,440
it has a natural kind of like waveform pattern.

121
00:07:16,640 --> 00:07:21,240
There are ups and downs that, zoomed out, look a little more jagged.

122
00:07:21,440 --> 00:07:25,360
But if you were to zoom in, it's much more gradual, like a sine wave.

123
00:07:25,560 --> 00:07:30,360
I know in AI-generated audio, when you zoom all the way in, it kind of looks like a

124
00:07:30,560 --> 00:07:33,440
stepwise function, uniform. It's on or off.

125
00:07:33,440 --> 00:07:38,360
Yeah. And to our ears, we can't tell the difference because it's so subtle.

126
00:07:38,560 --> 00:07:40,960
But to a machine, there's no ramp up.

127
00:07:40,960 --> 00:07:45,200
So you can write algorithms that say, OK, there's no partial-millisecond

128
00:07:45,400 --> 00:07:49,040
ramp-up when this person says their name.

129
00:07:49,240 --> 00:07:52,320
So that is an indication it's fake audio.

130
00:07:52,520 --> 00:07:56,960
I am so curious to check it because remember I shared with you, I think,

131
00:07:57,160 --> 00:08:02,720
in our first conversations, that to edit my videos, I use Descript,

132
00:08:02,720 --> 00:08:06,400
and that was, I think, one of the first

133
00:08:06,600 --> 00:08:08,920
software tools where you can clone your voice.

134
00:08:09,120 --> 00:08:13,680
I could be wrong, but they are, let's say, one of the more prominent ones.

135
00:08:13,880 --> 00:08:16,000
And OpenAI, by the way, invested in them.

136
00:08:16,200 --> 00:08:19,120
That's another story. What's going to come out of that?

137
00:08:19,320 --> 00:08:24,600
So I cloned my voice, and, for example, in my videos, when I'm editing,

138
00:08:24,800 --> 00:08:29,160
if I just need a few words, let's say, to make it

139
00:08:29,160 --> 00:08:34,760
a bit smoother, like an "and" or a "so," just to make the transitions,

140
00:08:34,960 --> 00:08:38,280
and I didn't originally record them, then in a

141
00:08:38,480 --> 00:08:43,800
few of my videos, I used my cloned voice to add those transitions.

142
00:08:44,000 --> 00:08:47,720
And the thing is, in one video I even said that I used it.

143
00:08:47,920 --> 00:08:54,040
And I asked if people could notice where, and I used a whole sentence, with my previous

144
00:08:54,240 --> 00:08:58,360
old bad mic. Imagine if I used this mic; that would be insane.

145
00:08:58,360 --> 00:09:00,880
And people never detected it.

146
00:09:01,080 --> 00:09:06,440
No one ever pointed out to me that, hey, yeah, I can tell this is the part.

147
00:09:06,640 --> 00:09:12,320
I can hear it, but not other people, from, I don't know, YouTube or my family.

148
00:09:12,520 --> 00:09:16,720
I've yet to clone my voice with ElevenLabs.

149
00:09:16,920 --> 00:09:19,240
I'm very excited about that. Have you done it?

150
00:09:19,440 --> 00:09:22,840
Yeah, I know you have an opportunity that I think maybe we'll discuss in a future

151
00:09:22,840 --> 00:09:29,640
episode where you might become an immortal digital avatar, fully scanned, fully recreated.

152
00:09:29,840 --> 00:09:31,680
I'm so excited about that.

153
00:09:31,880 --> 00:09:37,200
I can't disclose more right now, but yeah, if anybody comes across my YouTube

154
00:09:37,400 --> 00:09:42,560
video and there is another version of me, I think that's exciting to just explore

155
00:09:42,760 --> 00:09:45,600
these types of creative opportunities and never age.

156
00:09:45,800 --> 00:09:49,720
That's the thing: I will just freeze myself at this moment in time.

157
00:09:49,720 --> 00:09:53,200
This is a nice segue into another topic.

158
00:09:53,400 --> 00:10:00,480
So, in the same vein as creating an AI voice copy of yourself, ElevenLabs needs

159
00:10:00,680 --> 00:10:06,400
three minutes of audio to create a, like, 99% accurate voice model.

160
00:10:06,600 --> 00:10:09,200
So when you have all these singers

161
00:10:09,400 --> 00:10:13,720
and stars that have hundreds of hours of recorded footage,

162
00:10:13,720 --> 00:10:19,800
of high-quality audio in the form of their music, you now have this proliferation

163
00:10:20,000 --> 00:10:23,320
of AI-generated music hitting the scene.

164
00:10:23,520 --> 00:10:28,240
So we're going to include a link in the show notes and the newsletter this week

165
00:10:28,440 --> 00:10:35,800
to the first AI Hits chart, kind of a Billboard Top 100 of AI-generated music.

166
00:10:36,000 --> 00:10:41,640
It's a very interesting discussion because, from the outside, the music industry,

167
00:11:41,640 --> 00:11:47,640
which has famously always been a big champion, and rightfully so, of the copyrighted

168
00:11:47,840 --> 00:11:51,080
material that artists put out. I think of Napster growing up,

169
00:11:51,280 --> 00:11:55,960
that being the hugest lawsuit. Most of my middle school and high school years were

170
00:11:55,960 --> 00:11:59,080
spent downloading tons of different songs, creating mix CDs.

171
00:10:59,280 --> 00:11:02,600
For you younger listeners, a CD is actually a compact disc.

172
00:11:02,800 --> 00:11:08,240
It's this little plastic thing we ancient creatures used to use that would go

173
00:11:08,240 --> 00:11:12,080
into this machine called a CD player.

174
00:11:12,160 --> 00:11:15,080
Yes, we didn't have our phones to listen to everything.

175
00:11:15,280 --> 00:11:18,120
We are just old souls, vintage souls.

176
00:11:18,320 --> 00:11:18,760
Yeah.

177
00:11:18,960 --> 00:11:23,960
But so now you have these AI-generated songs coming out that, in some respects,

178
00:11:23,960 --> 00:11:28,360
I'm looking at the list right now, you have some artists that are releasing

179
00:11:28,560 --> 00:11:30,680
hits, AI hits, from the beyond.

180
00:11:30,880 --> 00:11:35,640
I see here Biggie Smalls, rest in peace, but he's got a new cover.

181
00:11:35,840 --> 00:11:36,720
So there's a couple of these.

182
00:11:36,720 --> 00:11:41,400
We'll take a little listen to one right now and talk about it after.

183
00:12:06,720 --> 00:12:09,760
...my way. Your stare was holding

184
00:12:09,960 --> 00:12:14,240
ripped jeans, skin was showing, hot night wind was blowing.

185
00:12:14,440 --> 00:12:17,000
Where you think you're going, baby?

186
00:12:17,200 --> 00:12:21,840
Hey, I just met you and this is crazy.

187
00:12:22,040 --> 00:12:24,080
But here's my number.

188
00:12:24,280 --> 00:12:28,880
So call me, maybe. It's hard to look right

189
00:12:29,080 --> 00:12:33,120
at you, baby, but here's my number.

190
00:12:33,320 --> 00:12:35,720
So call me, maybe.

191
00:12:35,720 --> 00:12:36,720
What?

192
00:12:36,920 --> 00:12:39,760
Now I want to listen to all of them. This is crazy.

193
00:12:39,960 --> 00:12:41,120
Some of them are wild.

194
00:12:41,320 --> 00:12:45,800
What we just heard, and we'll put the link to it, was Kanye West singing the Carly Rae

195
00:12:46,000 --> 00:12:48,400
Jepsen classic "Call Me Maybe."

196
00:12:48,600 --> 00:12:54,400
So you have a cross-genre kind of occurrence here that people are doing.

197
00:12:54,600 --> 00:12:59,280
And I think it brings up a couple of interesting points of discussion.

198
00:12:59,480 --> 00:13:04,640
So some artists have actually come out, like Grimes, I think, and said,

199
00:13:04,640 --> 00:13:08,360
hey, anyone who wants to use my voice for AI-generated music,

200
00:13:08,560 --> 00:13:11,040
you can. I just get 50 percent of the royalties.

201
00:13:11,240 --> 00:13:13,360
And she's even created a little, like, on-ramp

202
00:13:13,560 --> 00:13:17,960
for people to submit things, and high-res voice prints.

203
00:13:17,960 --> 00:13:19,160
So it sounds really good.

204
00:13:19,360 --> 00:13:22,240
So in one respect, that's kind of awesome.

205
00:13:22,320 --> 00:13:24,240
That could be an entirely new revenue stream,

206
00:13:24,440 --> 00:13:26,640
especially if you have some control over the release.

207
00:13:26,680 --> 00:13:27,560
That's kind of flattering.

208
00:13:27,760 --> 00:13:32,240
Same with a lot of these mega, timeless stars.

209
00:13:32,240 --> 00:13:36,640
I would love to hear new Prince or Michael Jackson songs come out.

210
00:13:36,840 --> 00:13:41,560
You know, those were both artists who were so quintessential to my growing up.

211
00:13:41,560 --> 00:13:43,520
That's what I heard my parents playing all the time.

212
00:13:43,720 --> 00:13:44,400
You know what?

213
00:13:44,600 --> 00:13:50,960
I'm kind of running this thought in my head that we are unlocking a whole

214
00:13:51,160 --> 00:13:57,080
other level of creativity, and I don't know how to actually describe it.

215
00:13:57,280 --> 00:14:01,840
But maybe you can help me with that. Before, we had artists in their own genre.

216
00:14:01,840 --> 00:14:04,360
And now we can take the same artist,

217
00:14:04,560 --> 00:14:08,080
like Kanye West, and have him sing country music.

218
00:14:08,280 --> 00:14:16,160
Now we have this ability to make these cross-sections between artists, times, styles.

219
00:14:16,360 --> 00:14:19,360
And I saw this also with images the other day.

220
00:14:19,560 --> 00:14:26,160
I saw an image on Twitter of PewDiePie and MrBeast together.

221
00:14:26,360 --> 00:14:30,480
If anybody is wondering who these people are, they're the biggest YouTubers.

222
00:14:30,480 --> 00:14:32,320
And basically I looked at that picture.

223
00:14:32,520 --> 00:14:40,840
I was like, this just screams AI-generated to me, but I don't know if it is, which is fine.

224
00:14:41,040 --> 00:14:44,960
And then I just thought that, you know, people, for example fans, had been waiting

225
00:14:45,160 --> 00:14:48,600
for ages to see these two people in one picture.

226
00:14:48,800 --> 00:14:54,760
And now we have this ability to actually combine concepts. Like, I don't know,

227
00:14:54,760 --> 00:15:00,440
I saw these babies jumping with parachutes, something

228
00:15:00,640 --> 00:15:03,200
you would never see a photograph of.

229
00:15:03,400 --> 00:15:08,280
And now, if we can imagine these cross-sections between different

230
00:15:08,480 --> 00:15:13,360
concepts and styles, we actually have this creative ability to merge them.

231
00:15:13,560 --> 00:15:15,720
Creative synthesis, if you will.

232
00:15:15,920 --> 00:15:19,000
I knew that you would have some good name for it.

233
00:15:19,200 --> 00:15:21,320
There it is, that's the "synth" word coming back.

234
00:15:21,320 --> 00:15:24,720
It's a multifaceted name of mine.

235
00:15:24,920 --> 00:15:28,920
But I think the interesting thing about these AI-generated songs, as compared

236
00:15:29,120 --> 00:15:35,880
to AI-generated photographs or AI-generated text, is this: when we're prompting and we want to

237
00:15:36,080 --> 00:15:42,280
include a specific author or a specific photographer in our Midjourney prompt,

238
00:15:42,480 --> 00:15:49,680
it tends to be in the form of "in the style of this photographer," "written like this

239
00:15:49,680 --> 00:15:51,840
author," "in the style of this author."

240
00:15:52,040 --> 00:15:57,000
And to me, that is very different from these AI-generated charts, where it's like,

241
00:15:57,200 --> 00:16:00,000
oh, I'm the artist, but this is a Drake song.

242
00:16:00,200 --> 00:16:01,640
This is a Kanye song.

243
00:16:01,840 --> 00:16:08,200
It's misrepresenting it a little bit. Even though it's blatant that this is an AI Hits

244
00:16:08,400 --> 00:16:13,960
track, what's to stop someone from sharing the SoundCloud link and going, OK,

245
00:16:14,160 --> 00:16:17,920
this is the new Drake song when it's not Drake at all.

246
00:16:17,920 --> 00:16:24,040
So on the one side, I touched on what kind of impresses me, this ability to push

247
00:16:24,240 --> 00:16:30,520
creativity. But the other aspect is, again, the Grimes example, right?

248
00:16:30,720 --> 00:16:35,680
The music industry has always been very protective of copyright, and they will

249
00:16:35,880 --> 00:16:37,760
collect royalties, that's for sure.

250
00:16:37,960 --> 00:16:39,320
Like they will find a way.

251
00:16:39,520 --> 00:16:45,320
And now it comes down to the same thing: now you can use Drake's or Kanye's voice,

252
00:16:45,320 --> 00:16:51,560
and they will receive revenue, but then what about artists' work used in image

253
00:16:51,760 --> 00:16:56,800
generation? Who is going to give royalties to those artists?

254
00:16:57,000 --> 00:17:01,480
And I know this topic has been hammered; there are lawsuits ongoing.

255
00:17:01,680 --> 00:17:07,800
But for us to actually have this ability, it's very much a hot, ongoing topic.

256
00:17:08,000 --> 00:17:13,720
So I think we will definitely talk in future episodes about OpenAI being

257
00:17:13,720 --> 00:17:20,200
pushed to disclose their training data, what they used, and how much of it is

258
00:17:20,400 --> 00:17:25,600
potentially illegal data, or data they have to pay fees for, like to Reddit.

259
00:17:25,800 --> 00:17:31,360
But also, yeah, say we keep pushing for this transparency.

260
00:17:31,560 --> 00:17:37,720
What happens once that transparency is reached and we actually see what exact

261
00:17:37,920 --> 00:17:41,320
artworks, what exact songs have been used?

262
00:17:41,320 --> 00:17:46,200
So this potentially creates a whole other

263
00:17:46,400 --> 00:17:53,040
industry, which could stand on its own, where, for example, you could maybe license

264
00:17:53,240 --> 00:17:56,080
your voice to AI model labs.

265
00:17:56,160 --> 00:17:56,920
You're absolutely right.

266
00:17:57,120 --> 00:18:03,080
And then they can use that as a cartoon character's voice or, say, for an audiobook.

267
00:18:03,280 --> 00:18:05,360
It's a really, really interesting discussion.

268
00:18:05,560 --> 00:18:09,320
My fear is that there are a lot of

269
00:18:09,320 --> 00:18:15,560
big historical cases that pertain to copyright law and fair use, what is free

270
00:18:15,760 --> 00:18:21,640
and fair use of songs, you know, the music industry famously has led many of these.

271
00:18:21,840 --> 00:18:28,400
And my concern is that it would set a precedent that would carry over into some

272
00:18:28,600 --> 00:18:33,080
of these image and text training sets, because to me, it's fundamentally

273
00:18:33,280 --> 00:18:37,160
different to type something online, where it's out there,

274
00:18:37,160 --> 00:18:40,080
and for that little blog post that you

275
00:18:40,280 --> 00:18:44,840
typed to be used as part of training data, than it is to, say, put out a

276
00:18:45,040 --> 00:18:51,040
copyrighted song where, even if it's paid for, part of that copyright

277
00:18:51,240 --> 00:18:53,240
isn't necessarily the right to scrape the audio.

278
00:18:53,440 --> 00:18:56,120
So, like with this AI-generated audio,

279
00:18:56,320 --> 00:19:01,480
even if the training set included a purchased song, they purchased the song

280
00:19:01,680 --> 00:19:05,360
legally from the iTunes Store, it probably wasn't with the intention

281
00:19:05,360 --> 00:19:08,880
to be included in an AI training set.

282
00:19:09,080 --> 00:19:13,160
So kind of like how maybe you could license images like on Adobe.

283
00:19:13,360 --> 00:19:17,400
Hey, you can pay for the one version of it if you just want to use the image.

284
00:19:17,600 --> 00:19:21,760
But if it's going to be in front of this many people or used in this way,

285
00:19:21,960 --> 00:19:26,800
it's another price. That might be a precedent, or a world that we're stepping

286
00:19:27,000 --> 00:19:30,360
into: you know, for a cost per stream, the artist gets this,

287
00:19:30,560 --> 00:19:33,840
and for a cost per use in a training set, the artist gets that.

288
00:19:33,840 --> 00:19:38,960
I'll go on the record with this thought regarding this.

289
00:19:39,160 --> 00:19:42,480
I think we need to fundamentally rethink

290
00:19:42,680 --> 00:19:47,280
the technology behind metadata for any kind of file,

291
00:19:47,480 --> 00:19:50,800
whether images or audio, if it's voice:

292
00:19:51,000 --> 00:19:56,720
some sort of way to embed the original rights.

293
00:19:56,920 --> 00:20:01,480
And again, I don't want to completely shift to the whole NFT topic and stuff like that.

294
00:20:01,680 --> 00:20:02,760
This isn't about that.

295
00:20:02,760 --> 00:20:07,760
Every kind of innovation we see feels like that, and it probably is like that.

296
00:20:07,960 --> 00:20:09,600
It's like a coin with two sides.

297
00:20:09,800 --> 00:20:15,880
It will enable creativity for people who don't sing or are not able to sing, or who want to bring back

298
00:20:16,080 --> 00:20:22,320
the voices of their relatives, have these conversations, and maybe create things together.

299
00:20:22,520 --> 00:20:24,880
You know, sky's the limit for creativity.

300
00:20:25,080 --> 00:20:27,720
And this is why we are beautiful humans,

301
00:20:27,920 --> 00:20:30,920
that we are creative and we will push boundaries all the time.

302
00:20:30,920 --> 00:20:37,200
And the other side of the coin is also our nature: part of pushing boundaries is

303
00:20:37,400 --> 00:20:44,360
actually coming up with ways to trick people, to seek profit, to scam.

304
00:20:44,560 --> 00:20:49,840
And I just saw a video where a girl was crying that a scammer called

305
00:20:50,040 --> 00:20:54,440
and said that her little brother was in prison or something, or dead.

306
00:20:54,640 --> 00:20:56,240
And it was his voice.

307
00:20:56,240 --> 00:21:01,440
You know, so if our voices leak on a black market or something like that, what happens then?

308
00:21:01,640 --> 00:21:05,600
I just gave you an example of bank security.

309
00:21:05,800 --> 00:21:09,800
Biometrics need to be completely rethought.

310
00:21:10,000 --> 00:21:15,480
And I think Sam Altman said that just as there is so much good

311
00:21:15,680 --> 00:21:20,840
potential in all this technology, there is equally this huge threat.

312
00:21:20,960 --> 00:21:25,960
And the problem is that we don't know how to deal with these things, and we are moving so fast.

313
00:21:25,960 --> 00:21:29,200
And some big companies are in interesting positions, too.

314
00:21:29,400 --> 00:21:33,720
So obviously a lot of these songs are listed on YouTube.

315
00:21:33,920 --> 00:21:38,040
Someone will put them up there to get views, because YouTube has the most mature

316
00:21:38,240 --> 00:21:45,000
revenue generation, revenue per view, revenue per stream platform out of anything out there.

317
00:21:45,200 --> 00:21:46,200
A study just came out.

318
00:21:46,400 --> 00:21:51,480
Users are using ChatGPT 37 times more frequently than they are Google's Bard.

319
00:21:51,680 --> 00:21:53,800
So it's like, OK, if we say no,

320
00:21:53,800 --> 00:21:59,160
we're not going to put out any AI-generated stuff on YouTube to appease the music industry.

321
00:21:59,360 --> 00:22:03,320
OK, you've sent a signal to consumers that you're not AI-friendly.

322
00:22:03,520 --> 00:22:06,840
So some of your AI tools might not then

323
00:22:07,040 --> 00:22:09,960
proliferate as much as you hope they would.

324
00:22:10,160 --> 00:22:13,560
But conversely, you have a huge music industry that could say, OK, well,

325
00:22:13,600 --> 00:22:17,240
because you haven't taken down this Drake AI song, you have to take down all

326
00:22:17,440 --> 00:22:20,520
the real Drake songs that have over a billion views, you know,

327
00:22:20,720 --> 00:22:22,840
that are great sources of revenue for you.

328
00:22:22,840 --> 00:22:26,440
So it's an unenviable place for them.

329
00:22:26,640 --> 00:22:33,560
It's so funny to see this kind of evolution where we go from text to image to music,

330
00:22:33,760 --> 00:22:35,600
maybe voice, audio.

331
00:22:35,800 --> 00:22:40,240
And this is my kind of prediction: maybe 2024.

332
00:22:40,440 --> 00:22:44,360
I don't want to throw it into 2023, but let's say 2024.

333
00:22:44,560 --> 00:22:51,400
We'll have video, which is a merger of image and audio, effects, voice.

334
00:22:51,400 --> 00:22:55,200
And there are insane developments in video already.

335
00:22:55,400 --> 00:22:58,200
So maybe my prediction is the end of 2023.

336
00:22:58,400 --> 00:22:59,560
Predictions next Thursday.

337
00:22:59,760 --> 00:23:00,720
Next week's episode.

338
00:23:00,920 --> 00:23:07,840
But just the fact that, if we are completely moving to video, where all of these

339
00:23:08,040 --> 00:23:13,480
things come together, that fundamentally changes the entertainment industry.

340
00:23:13,680 --> 00:23:14,320
Yeah, it does.

341
00:23:14,520 --> 00:23:20,400
I think we're only a few years away from having a full AI movie, fully AI-generated,

342
00:23:20,400 --> 00:23:25,920
influencer accounts that people can portray themselves as, or that maybe even an AI is running.

343
00:23:26,120 --> 00:23:27,120
AI YouTuber.

344
00:23:27,320 --> 00:23:29,320
Yeah. I mean, to some degree, it's already there.

345
00:23:29,520 --> 00:23:31,640
I just got my Runway ML access.

346
00:23:31,840 --> 00:23:34,560
We'll post the link to that as well in the show notes.

347
00:23:34,760 --> 00:23:39,120
But this is kind of an all-encompassing design tool.

348
00:23:39,320 --> 00:23:42,760
Anything from retouching images to AI-generated images.

349
00:23:42,760 --> 00:23:47,840
But one of the alphas they have right now is a text-to-video generative tool.

350
00:23:48,040 --> 00:23:49,000
Have you tried it?

351
00:23:49,000 --> 00:23:51,320
I started messing with it yesterday.

352
00:23:51,520 --> 00:23:52,560
Gen 2 or Gen 1?

353
00:23:52,760 --> 00:23:53,360
Gen 1.

354
00:23:53,560 --> 00:23:54,600
I have Gen 2.

355
00:23:54,800 --> 00:23:57,760
I created a small snippet with Gen 1.

356
00:23:57,960 --> 00:24:00,600
I was on the waiting list and got early access.

357
00:24:00,800 --> 00:24:05,560
As with many of these early things, I went in with crazy expectations because they put

358
00:24:05,760 --> 00:24:08,520
trailers out there like, oh my God.

359
00:24:08,720 --> 00:24:14,480
And then, of course, again, we always go back to prompting: how good you are at

360
00:24:14,680 --> 00:24:16,200
actually describing what you want.

361
00:24:16,200 --> 00:24:22,360
So I was playing a lot, experimenting, and it takes some actual skill to now get good

362
00:24:22,560 --> 00:24:26,160
results. But yeah, Runway ML is impressive.

363
00:24:26,360 --> 00:24:31,760
So this is where I think a whole other generation of people that maybe aren't as

364
00:24:31,960 --> 00:24:38,520
tech-literate or as deep into things like math, science, and coding, but are very

365
00:24:38,720 --> 00:24:44,000
descriptive writers or can describe a scene very clearly in their head, may become

366
00:24:44,000 --> 00:24:47,080
the next elite level of prompt engineers.

367
00:24:47,280 --> 00:24:51,920
Being able to describe a scene in such detail, with the composition and the

368
00:24:52,120 --> 00:24:58,320
lighting, the "in the style of," and exactly what the AI actors are doing and holding

369
00:24:58,520 --> 00:25:01,920
and looking at and expressing: there'll be so many different layers.

370
00:25:02,120 --> 00:25:04,200
And that also probably applies to music.

371
00:25:04,400 --> 00:25:07,200
How would you prompt a rap song?

372
00:25:07,200 --> 00:25:08,640
Just saying rap is very broad.

373
00:25:08,840 --> 00:25:11,440
There are subcategories, or subgenres, of each.

374
00:25:11,440 --> 00:25:15,440
Do you have to know about what a song is composed of?

375
00:25:15,640 --> 00:25:20,640
The beats per minute, the valence, the cadence, the timbre, all these different

376
00:25:20,840 --> 00:25:24,240
little components that go into what makes a song.

377
00:25:24,440 --> 00:25:26,960
You have to be able to describe those in such a way.

378
00:25:27,160 --> 00:25:28,960
Very interesting to think about.

379
00:25:29,160 --> 00:25:34,720
So it sounds to me like, if we look at career trajectories, let's say 30,

380
00:25:34,920 --> 00:25:40,280
40 years back, you probably had one job, one career, continuously.

381
00:25:40,280 --> 00:25:44,000
Now it's different, and unfortunately or

382
00:25:44,200 --> 00:25:47,920
fortunately, I fall into that group, because by training and education I'm an

383
00:25:48,120 --> 00:25:52,880
architect, right? Then I had the startup, I worked in marketing, and now I'm a content creator.

384
00:25:53,080 --> 00:25:57,360
So this is such a merger of different skills.

385
00:25:57,560 --> 00:26:00,680
So if someone comes to me and asks about,

386
00:26:00,880 --> 00:26:06,280
I don't know, like a creative ad proposal prompt, I will be able to do that.

387
00:26:06,280 --> 00:26:13,400
If you ask me for an image for interior design, a specific style, or a building facade,

388
00:26:13,600 --> 00:26:14,840
I am able to do it.

389
00:26:15,040 --> 00:26:20,560
But that's because of the diverse experiences and skills I acquired.

390
00:26:20,760 --> 00:26:26,200
So it seems that, yes, being able to communicate is a huge advantage.

391
00:26:26,400 --> 00:26:29,200
I saw people who studied for English degrees.

392
00:26:29,400 --> 00:26:33,480
They're like, oh, my God, I can't believe that it could actually pay off.

393
00:26:33,480 --> 00:26:41,560
But also a lot of people who have accumulated huge life experience.

394
00:26:41,760 --> 00:26:47,960
And as I mentioned to you, there are older people who are very good at describing

395
00:26:48,160 --> 00:26:52,120
what they want because they've lived life and seen stuff.

396
00:26:52,320 --> 00:26:54,800
There will always be the need for experts.

397
00:26:55,000 --> 00:26:57,760
AI is not coming to take jobs.

398
00:26:57,960 --> 00:27:02,600
What's happening is someone using AI is going to come and replace other people.

399
00:27:02,600 --> 00:27:06,400
But there will still always need to be that yardstick to measure against.

400
00:27:06,600 --> 00:27:08,120
Like what is good?

401
00:27:08,320 --> 00:27:12,560
And then that person who's using AI that's already at the top of their game,

402
00:27:12,760 --> 00:27:17,400
like an amazing visual artist or an amazing guitar player or singer,

403
00:27:17,600 --> 00:27:22,880
they can create something now that's infinitely more creative than they could

404
00:27:23,080 --> 00:27:28,560
ever attain. So it's still not something that's going to stifle creativity.

405
00:27:28,760 --> 00:27:31,880
If anything, it's going to supercharge it in a lot of different ways.

406
00:27:31,880 --> 00:27:34,760
And we'll always have this need for people to gauge it.

407
00:27:34,960 --> 00:27:41,000
Wes, for future episodes, would you want to take on the challenge

408
00:27:41,200 --> 00:27:45,000
of creating our own music or something with voice?

409
00:27:45,200 --> 00:27:48,560
I think it would be interesting to just take on a challenge with audio.

410
00:27:48,760 --> 00:27:49,840
I think that would be fun.

411
00:27:50,040 --> 00:27:52,120
You are someone that speaks multiple languages.

412
00:27:52,320 --> 00:27:56,560
I just saw that ElevenLabs released a tool where I could say something in English

413
00:27:56,560 --> 00:28:00,040
and it could have me saying the exact same thing in 10 other languages.

414
00:28:00,040 --> 00:28:02,200
It would be fun to see it evaluated.

415
00:28:02,400 --> 00:28:03,600
You know what just popped into my head?

416
00:28:03,800 --> 00:28:08,680
I want to hear you speak my native language, and I'll leave it out for

417
00:28:08,880 --> 00:28:13,720
people to guess, but I would love to test different accents

418
00:28:13,920 --> 00:28:16,920
because of course I'm not a native English speaker.

419
00:28:17,120 --> 00:28:17,880
I have an accent.

420
00:28:18,080 --> 00:28:22,640
But actually it would be so interesting to hear how I sound in proper American.

421
00:28:22,840 --> 00:28:23,480
I don't know.

422
00:28:23,680 --> 00:28:25,000
Is there proper American?

423
00:28:25,200 --> 00:28:26,320
Proper American.

424
00:28:26,520 --> 00:28:27,040
Yeah.

425
00:28:27,240 --> 00:28:28,280
British English.

426
00:28:28,280 --> 00:28:29,840
That would be all like this here.

427
00:28:30,040 --> 00:28:30,640
How you can go?

428
00:28:30,840 --> 00:28:31,760
This is go to go.

429
00:28:31,960 --> 00:28:32,840
Coming in.

430
00:28:33,040 --> 00:28:34,720
Looking all fine, everyone.

431
00:28:34,920 --> 00:28:36,160
We had a fine little episode.

432
00:28:36,360 --> 00:28:37,440
Nick and Anna Picken.

433
00:28:37,640 --> 00:28:39,240
That would be, I guess, proper American.

434
00:28:39,440 --> 00:28:39,760
Yeah.

435
00:28:39,960 --> 00:28:46,280
So I think for the future, and oh, by the way, we hinted at the avatar things.

436
00:28:46,480 --> 00:28:51,720
I think we can tell listeners that I hope to share more in future

437
00:28:51,720 --> 00:28:58,200
episodes about cloning yourself and how it's done and all the details.

438
00:28:58,400 --> 00:28:59,680
Immortality.

439
00:28:59,880 --> 00:29:00,480
Exactly.

440
00:29:00,680 --> 00:29:06,200
So with that being said, Wes and go to go say bye to you.

441
00:29:06,400 --> 00:29:08,720
Happy prompting, everybody.

442
00:29:08,920 --> 00:29:16,240
Thanks for listening to How to Talk to AI with your hosts, go to go and Wes the Synth Mind.

443
00:29:16,240 --> 00:29:22,480
As always, you can check out the show notes and links at how to talk to dot AI.

444
00:29:22,680 --> 00:29:25,000
That's all for this week's episode.

445
00:29:25,000 --> 00:29:53,000
Happy prompting, everyone.

