WEBVTT

00:00:00.000 --> 00:00:02.480
You know that feeling, right? That friction.

00:00:02.899 --> 00:00:05.940
Oh, yeah. The creative buzzkill. You find this

00:00:05.940 --> 00:00:08.140
amazing tutorial for a new workflow. You get

00:00:08.140 --> 00:00:10.480
that little spark, and you think, I can actually

00:00:10.480 --> 00:00:12.480
make this. This is it. You click the link, and

00:00:12.480 --> 00:00:15.679
boom, paywall. Subscription fatigue. It's $40

00:00:15.679 --> 00:00:19.539
a month just to experiment. Exactly. And it just

00:00:19.539 --> 00:00:21.839
prices out curiosity. It's a huge bottleneck.

00:00:21.839 --> 00:00:23.760
It kills the whole tinkering phase. You can't

00:00:23.760 --> 00:00:25.280
just play around if you have to pay up front.

00:00:25.800 --> 00:00:29.160
Precisely. It puts innovation behind this gate.

00:00:29.839 --> 00:00:32.240
But the whole premise of the sources we've pulled

00:00:32.240 --> 00:00:34.899
together today is that we might be looking in

00:00:34.899 --> 00:00:38.359
the wrong place. While the West is so focused

00:00:38.359 --> 00:00:41.049
on... you know, three or four big companies with

00:00:41.049 --> 00:00:44.369
their subscriptions, there's this parallel ecosystem

00:00:44.369 --> 00:00:46.689
rising in the East. A massive one. We're looking

00:00:46.689 --> 00:00:49.729
at a stack of incredibly high powered AI tools

00:00:49.729 --> 00:00:53.170
from China, from giants like Alibaba, Tencent,

00:00:53.350 --> 00:00:56.630
ByteDance. And they are, for the most part, completely

00:00:56.630 --> 00:00:59.130
free. And these aren't elite versions or some

00:00:59.130 --> 00:01:01.030
three day trial. These are the heavy hitters.

00:01:01.369 --> 00:01:03.609
But there's this huge disconnect, right? Western

00:01:03.609 --> 00:01:05.719
users just aren't touching them, because of the

00:01:05.719 --> 00:01:08.120
language barrier. Right. The user interface is

00:01:08.120 --> 00:01:10.739
in Mandarin. And for most people, that's an immediate

00:01:10.739 --> 00:01:14.060
close tab. But the sources we have argue that

00:01:14.060 --> 00:01:17.260
this barrier is... it's thinner than it looks. Way

00:01:17.260 --> 00:01:18.579
thinner. And that's what we're going to walk

00:01:18.579 --> 00:01:21.260
through. First, how to break that barrier. And

00:01:21.260 --> 00:01:23.879
then we're going to build a whole creative toolkit,

00:01:23.980 --> 00:01:26.819
3D modeling, realistic video, coding agents,

00:01:27.040 --> 00:01:30.180
even local hosting. All using tools that cost

00:01:30.180 --> 00:01:32.480
absolutely nothing. Zero. We're building a free

00:01:32.480 --> 00:01:34.680
toolkit. So let's start with the visual side.

00:01:34.959 --> 00:01:38.480
I think the jump from 2D to 3D is probably the

00:01:38.480 --> 00:01:40.780
steepest learning curve in all of digital art.

00:01:40.840 --> 00:01:43.739
Oh, for sure. If I want a 3D asset, I'm thinking,

00:01:43.819 --> 00:01:46.200
OK, I have to learn Blender. I have to deal with

00:01:46.200 --> 00:01:48.500
vertices and meshes. I've tried to learn Blender

00:01:48.500 --> 00:01:50.579
three times. I quit every single time. It's a

00:01:50.579 --> 00:01:53.719
six-month commitment, easy. Right. So the sources

00:01:53.719 --> 00:01:57.280
point to a tool called Hunyuan 3D. What's the

00:01:57.280 --> 00:01:59.750
breakthrough here? What's the big idea? The

00:01:59.750 --> 00:02:02.590
breakthrough is image-to-3D. That's the whole concept.

00:02:02.890 --> 00:02:05.049
Hunyuan lets you just bypass that entire modeling

00:02:05.049 --> 00:02:08.069
process. You skip it. Completely. You upload

00:02:08.069 --> 00:02:11.409
a single photo, a sneaker, a little toy car,

00:02:12.050 --> 00:02:16.139
a chair. And the AI just infers the spatial geometry.

00:02:16.620 --> 00:02:18.460
It predicts what the back looks like from the

00:02:18.460 --> 00:02:22.080
front and spits out a fully rotatable, printable

00:02:22.080 --> 00:02:24.400
3D mesh. And this is where we hit that first

00:02:24.400 --> 00:02:26.400
barrier we were talking about. You land on the

00:02:26.400 --> 00:02:28.819
Hunyuan site and it's all in Chinese characters.

00:02:29.000 --> 00:02:31.460
And this is the translation trick that the sources

00:02:31.460 --> 00:02:34.680
really emphasize. It sounds... It's almost too

00:02:34.680 --> 00:02:37.319
simple, but it's the key to everything. You don't

00:02:37.319 --> 00:02:39.199
need to learn Mandarin. You just right-click

00:02:39.199 --> 00:02:42.099
on the white part of the browser page, literally

00:02:42.099 --> 00:02:43.539
anywhere on the background. Just on the white

00:02:43.539 --> 00:02:45.960
space. Yep. And hit Translate to English. So

00:02:45.960 --> 00:02:48.780
it's a browser-level thing. Exactly. And suddenly

00:02:48.780 --> 00:02:51.379
the whole interface is in English. The buttons

00:02:51.379 --> 00:02:54.400
make sense. But, and this is a really crucial

00:02:54.400 --> 00:02:57.500
detail for Hunyuan, the default settings are

00:02:57.500 --> 00:03:00.479
not what you want. The sources point out a specific

00:03:00.479 --> 00:03:03.020
slider you have to change. The detail slider.

00:03:03.120 --> 00:03:05.740
Right. It's set to medium by default. Probably

00:03:05.740 --> 00:03:07.879
to save on their servers, you have to slide that

00:03:07.879 --> 00:03:09.500
thing all the way to high. And what does that

00:03:09.500 --> 00:03:12.180
do? It might double the generation time, so maybe

00:03:12.180 --> 00:03:14.400
two minutes instead of one. But the difference

00:03:14.400 --> 00:03:17.199
in quality in the polygon count is huge. If you

00:03:17.199 --> 00:03:19.939
want to 3D print it or use it in a game, that

00:03:19.939 --> 00:03:22.659
slider is everything. So this tool is turning

00:03:22.659 --> 00:03:25.639
a flat image into a spatial object in just a

00:03:25.639 --> 00:03:27.819
few minutes. What does that really mean for a

00:03:27.819 --> 00:03:30.889
creator? It means spatial design is democratized.

00:03:31.270 --> 00:03:33.289
You don't need to understand geometry to make

00:03:33.289 --> 00:03:36.189
3D art anymore. You just need a picture. That's

00:03:36.189 --> 00:03:38.370
fascinating. Okay, let's move from objects to

00:03:38.370 --> 00:03:41.009
people. This is the other big challenge. AI image

00:03:41.009 --> 00:03:43.189
generators like Midjourney and DALL-E, they've

00:03:43.189 --> 00:03:45.650
been around, but they have this persistent problem.

00:03:45.770 --> 00:03:47.990
The plastic skin problem. Yeah, everything looks

00:03:47.990 --> 00:03:50.569
way too smooth, too perfect. The uncanny valley.

00:03:50.889 --> 00:03:54.289
Our brains are just hardwired to spot fake humans.

00:03:54.669 --> 00:03:56.969
We're looking for pores, for asymmetry, for the

00:03:56.969 --> 00:03:59.490
way light hits skin. If that's missing, we reject

00:03:59.490 --> 00:04:01.750
it, even if it's super high-res. So the sources

00:04:01.750 --> 00:04:04.590
point to a tool called APOB, and specifically

00:04:04.590 --> 00:04:08.710
their Ultra S4K model as a solution. How is it

00:04:08.710 --> 00:04:11.199
doing things differently? APOB is optimizing

00:04:11.199 --> 00:04:14.120
for imperfection. It's kind of ironic, you know,

00:04:14.259 --> 00:04:16.100
to make it look real, you have to make it look

00:04:16.100 --> 00:04:18.480
a little worse. Right. It adds texture, those

00:04:18.480 --> 00:04:21.120
micro wrinkles, uneven lighting. It generates

00:04:21.120 --> 00:04:23.600
what the sources are calling authentic influencers.

00:04:23.920 --> 00:04:25.600
There was a great prompt example in the source

00:04:25.600 --> 00:04:28.459
material for this one. It was a university student

00:04:28.459 --> 00:04:31.920
studying in a cozy library wearing a yellow sweater,

00:04:32.379 --> 00:04:34.860
soft rain on the window behind her. It's such

00:04:34.860 --> 00:04:37.699
a mood piece. And what APOB gets right isn't

00:04:37.699 --> 00:04:40.199
just the person, it's the whole atmosphere. It

00:04:40.199 --> 00:04:42.860
actually understands how soft light diffuses

00:04:42.860 --> 00:04:45.980
through rain on a window. But the real power

00:04:45.980 --> 00:04:48.959
here is you can change one detail without wrecking

00:04:48.959 --> 00:04:51.759
the whole image. You can swap the yellow sweater

00:04:51.759 --> 00:04:54.420
for a red one and the face stays perfectly consistent.

00:04:54.600 --> 00:04:56.360
That brings up a good question, then. Why does

00:04:56.360 --> 00:04:59.779
texture matter more than, say, just raw resolution?

00:05:00.079 --> 00:05:03.439
Because our brains are wired to spot fake smoothness,

00:05:03.860 --> 00:05:06.699
texture is how we read reality. A high-res photo

00:05:06.699 --> 00:05:09.860
of plastic still looks like plastic. APOB just

00:05:09.860 --> 00:05:12.519
nails the human texture. Okay, so we've got 3D

00:05:12.519 --> 00:05:15.360
objects. We have realistic people. What if I

00:05:15.360 --> 00:05:17.949
just want to edit a photo I already took? I'm

00:05:17.949 --> 00:05:19.689
not starting from scratch, I just want to change

00:05:19.689 --> 00:05:22.490
the background. The sources mention a tool called

00:05:22.490 --> 00:05:25.050
Wan. Yeah, Wan is really interesting. It calls

00:05:25.050 --> 00:05:28.269
itself an infinite canvas. So think Photoshop,

00:05:28.430 --> 00:05:30.089
but instead of using brushes and layers, you

00:05:30.089 --> 00:05:32.829
just edit reality by typing. So you'd highlight

00:05:32.829 --> 00:05:34.870
a window and just type, replace with a tropical

00:05:34.870 --> 00:05:38.439
beach. Exactly. But this is where that free part

00:05:38.439 --> 00:05:40.339
can get a little tricky. Most of these tools

00:05:40.339 --> 00:05:42.480
use a credit system, right? You get five free

00:05:42.480 --> 00:05:45.000
generations and you have to pay. But the sources

00:05:45.000 --> 00:05:48.720
found a specific hack for Wan. The free mode

00:05:48.720 --> 00:05:51.379
hack. This is critical. You have to dig into

00:05:51.379 --> 00:05:53.379
the settings and there's a little toggle switch

00:05:53.379 --> 00:05:56.819
that says, generate with credits. You need to

00:05:56.819 --> 00:05:59.100
turn that off. That feels backwards. Why would

00:05:59.100 --> 00:06:01.000
they let you just turn off their monetization?

00:06:01.139 --> 00:06:03.560
It just moves you to a slower server queue. So

00:06:03.560 --> 00:06:06.560
instead of your image generating in like 10 seconds.

00:06:06.680 --> 00:06:09.199
It might take 40, but it costs you nothing. So

00:06:09.199 --> 00:06:11.620
the trade-off is just 30 seconds of patience

00:06:11.620 --> 00:06:15.060
for unlimited creativity. Exactly. Time is the

00:06:15.060 --> 00:06:17.680
only currency here. If you're not in a huge rush,

00:06:17.839 --> 00:06:20.160
you can try out a hundred different ideas for

00:06:20.160 --> 00:06:22.660
free. It's perfect for just experimenting. I

00:06:22.660 --> 00:06:25.899
love that. Okay, let's move into video. This

00:06:25.899 --> 00:06:28.819
is where AI usually falls apart. We've all seen

00:06:28.819 --> 00:06:32.139
those videos where a person's face just... morphs

00:06:32.139 --> 00:06:34.139
three times in five seconds. Yeah, it's like

00:06:34.139 --> 00:06:36.860
a fever dream. It just breaks the immersion instantly.

00:06:37.339 --> 00:06:39.300
The technical term is object permanence, isn't

00:06:39.300 --> 00:06:42.540
it? Right. Does the AI remember what the person

00:06:42.540 --> 00:06:44.600
looked like in frame one when it gets to frame

00:06:44.600 --> 00:06:47.040
20? Yeah. And usually the answer is a hard no.

00:06:47.459 --> 00:06:49.259
The tool that's apparently solving this is one

00:06:49.259 --> 00:06:51.279
people might actually know, but maybe not for

00:06:51.279 --> 00:06:53.680
this feature. It's CapCut. Right. Specifically

00:06:53.680 --> 00:06:56.680
the web version and the instant AI video feature.

00:06:57.160 --> 00:06:59.779
CapCut is owned by ByteDance, and they have like

00:06:59.819 --> 00:07:02.180
arguably the best video data set on the planet.

00:07:02.379 --> 00:07:04.500
So they've cracked consistency. They really have.

00:07:04.819 --> 00:07:06.699
The source material tells this little story about

00:07:06.699 --> 00:07:09.339
a dog named Max who gets lost in a city and befriends

00:07:09.339 --> 00:07:13.100
a cat. A classic. Totally. But in a normal AI

00:07:13.100 --> 00:07:15.339
video, Max would start as a golden retriever

00:07:15.339 --> 00:07:18.779
and end up as a Labrador. With CapCut, Max stays

00:07:18.779 --> 00:07:22.040
Max. Same collar, same spots, every single clip.

00:07:22.319 --> 00:07:24.839
That is massive for storytelling. But what if

00:07:24.839 --> 00:07:27.459
one of the clips is just... bad. Do you have

00:07:27.459 --> 00:07:29.360
to redo the whole thing? Nope. And that's the

00:07:29.360 --> 00:07:32.120
other key feature. You can regenerate just a

00:07:32.120 --> 00:07:34.819
single clip. So if scene three looks weird, you

00:07:34.819 --> 00:07:36.680
just fix that one scene without touching the

00:07:36.680 --> 00:07:39.240
rest of the movie. So how does having that character

00:07:39.240 --> 00:07:43.100
permanence really change AI storytelling? It

00:07:43.100 --> 00:07:45.279
turns a bunch of random clips into an actual

00:07:45.279 --> 00:07:47.379
narrative. You can build a character arc because

00:07:47.379 --> 00:07:49.740
the audience finally recognizes the character

00:07:49.740 --> 00:07:51.680
scene to scene. OK, but sometimes you don't need

00:07:51.680 --> 00:07:53.800
a whole movie. You just want an image that moves

00:07:53.800 --> 00:07:56.730
a little. A cinemagraph. Yeah, just something

00:07:56.730 --> 00:07:58.870
to stop the scroll on social media. The tool

00:07:58.870 --> 00:08:01.009
for this is Veer. And the best part about Veer

00:08:01.009 --> 00:08:02.670
is how accessible it is. You don't even have

00:08:02.670 --> 00:08:05.209
to make an account. That's so rare now. It's

00:08:05.209 --> 00:08:07.110
incredible. You can take that image of the student

00:08:07.110 --> 00:08:09.790
in the library we made with APOB, upload it to

00:08:09.790 --> 00:08:12.370
Veer, and just tell it. Make the rainfall and

00:08:12.370 --> 00:08:14.610
make the girl breathe. And it isolates those

00:08:14.610 --> 00:08:16.689
parts and animates them. It finds motion vectors,

00:08:16.930 --> 00:08:18.920
yeah. It's super subtle, you can even add little

00:08:18.920 --> 00:08:22.480
camera moves, like a slow zoom-in for a cinematic

00:08:22.480 --> 00:08:25.100
feel. The whole thing takes like 20 seconds.

00:08:25.660 --> 00:08:27.740
Is this really about making a video or is it

00:08:27.740 --> 00:08:30.079
about something else entirely? It's about capturing

00:08:30.079 --> 00:08:32.980
attention. It's designed to arrest your eye on

00:08:32.980 --> 00:08:35.519
a busy social feed. A static image is easy to

00:08:35.519 --> 00:08:38.139
ignore; something that moves even a little

00:08:38.139 --> 00:08:41.100
grabs you. Speaking of grabbing you, let's talk

00:08:41.100 --> 00:08:44.710
about speed. Nothing kills creativity faster

00:08:44.710 --> 00:08:47.269
than waiting 20 minutes for a render that turns

00:08:47.269 --> 00:08:49.990
out to be wrong. The latency of creativity. It's

00:08:49.990 --> 00:08:52.389
a real flow killer. And that's where Qwen comes

00:08:52.389 --> 00:08:54.990
in. It's from Alibaba, and it is built for pure

00:08:54.990 --> 00:08:57.330
speed. How fast are we talking? Usually under

00:08:57.330 --> 00:08:59.669
two minutes. It works like ChatGPT, but instead

00:08:59.669 --> 00:09:01.929
of replying with text, it replies with a video.

00:09:02.029 --> 00:09:04.330
So you just type drone flying over a cyberpunk

00:09:04.330 --> 00:09:07.450
city with neon lights and rain, and it just makes

00:09:07.450 --> 00:09:09.509
it. It just spits it out. And because it's so

00:09:09.509 --> 00:09:12.539
fast, you can iterate. If the video is too dark,

00:09:12.639 --> 00:09:14.179
you don't feel like you wasted your time. You

00:09:14.179 --> 00:09:16.720
just type, okay, make the neon lights brighter

00:09:16.720 --> 00:09:19.320
and maybe more pink, and boom, a new version

00:09:19.320 --> 00:09:22.659
appears. Does that speed actually improve the

00:09:22.659 --> 00:09:25.379
quality of the final art, do you think? I really

00:09:25.379 --> 00:09:28.240
believe it does. It allows for such rapid iteration

00:09:28.240 --> 00:09:30.860
and learning. You learn how to prompt better

00:09:30.860 --> 00:09:33.960
because the feedback is instant. You can fail

00:09:33.960 --> 00:09:37.120
50 times in an hour. Let's pivot a bit. We've

00:09:37.120 --> 00:09:39.039
been very focused on the arts. Let's talk business.

00:09:39.669 --> 00:09:41.750
The sources make this interesting distinction

00:09:41.750 --> 00:09:45.049
between a chatbot and an agent. What's the difference?

00:09:45.610 --> 00:09:48.090
It's simple. A chatbot talks to you. An agent

00:09:48.090 --> 00:09:50.210
does work for you. And the tool here is called

00:09:50.210 --> 00:09:53.149
MiniMax. It's designed to be like a digital employee.

00:09:53.610 --> 00:09:55.450
OK, give me an example. How is that different

00:09:55.450 --> 00:09:57.710
from just asking ChatGPT for business advice?

00:09:57.870 --> 00:09:59.850
So let's say you're planning a new coffee shop.

00:09:59.929 --> 00:10:02.169
You ask MiniMax for help. It doesn't just give

00:10:02.169 --> 00:10:05.029
you a wall of text. It actually generates downloadable

00:10:05.029 --> 00:10:07.820
files for you. Wait, it makes the files. It makes

00:10:07.820 --> 00:10:10.480
the actual files. It'll create a Word document

00:10:10.480 --> 00:10:12.960
for your menu. It'll create a separate text file

00:10:12.960 --> 00:10:15.019
with your marketing plan. You just download them

00:10:15.019 --> 00:10:17.100
and use them. So we're really moving from just

00:10:17.100 --> 00:10:20.399
conversation to actual execution. Precisely.

00:10:20.500 --> 00:10:23.860
It's the shift from advice to deliverables. It's

00:10:23.860 --> 00:10:26.000
the difference between hiring a consultant and

00:10:26.000 --> 00:10:28.679
hiring an intern who actually does the work.

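NOTE
An aside on the "deliverables, not advice" idea above: the same pattern is easy to see in a few lines of code. This is a rough, hypothetical Python sketch of an agent step that writes real files instead of replying with a wall of text; the filenames and contents are invented for illustration, not what MiniMax actually produces.

```python
# Sketch: an agent step that emits deliverables (real files on disk)
# rather than advice (a text reply). Filenames and content are
# illustrative placeholders, not MiniMax output.
from pathlib import Path

def deliver_coffee_shop_plan(outdir: str = "coffee_shop") -> list[str]:
    """Write the planning documents and return the file names created."""
    out = Path(outdir)
    out.mkdir(exist_ok=True)
    files = {
        "menu.txt": "Espresso  $3.00\nLatte     $4.50\nCold brew $4.00\n",
        "marketing_plan.txt": (
            "Week 1: soft opening, free samples\n"
            "Week 2: loyalty cards\n"
            "Week 3: social media push\n"
        ),
    }
    for name, text in files.items():
        (out / name).write_text(text, encoding="utf-8")
    return sorted(p.name for p in out.iterdir())

print(deliver_coffee_shop_plan())  # -> ['marketing_plan.txt', 'menu.txt']
```

The point is only the shape of the interaction: the user downloads artifacts they can use directly, instead of copying text out of a chat window.
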
00:10:28.740 --> 00:10:30.500
And if you're starting that business, you might

00:10:30.500 --> 00:10:32.759
need a simple app. Normally that means hiring

00:10:32.759 --> 00:10:35.519
someone or learning to code yourself, but the

00:10:35.519 --> 00:10:39.019
next tool, GLM-4.7, claims you can do it with

00:10:39.019 --> 00:10:42.570
just English. Pretty mind-blowing, especially

00:10:42.570 --> 00:10:45.970
for non-technical people. You use its full-stack

00:10:45.970 --> 00:10:49.009
or code mode and you just describe the app you

00:10:49.009 --> 00:10:51.049
want. Like, make me a tip calculator. Exactly

00:10:51.049 --> 00:10:53.450
that. Create a tip calculator. I need a box for

00:10:53.450 --> 00:10:55.549
the bill amount and a slider for the percentage.

00:10:56.149 --> 00:10:58.490
GLM writes the code, it runs the code, and then

00:10:58.490 --> 00:11:00.570
a little window just pops up on your screen with

00:11:00.570 --> 00:11:02.250
the working app. You can actually use it right

00:11:02.250 --> 00:11:04.809
there. You can drag the slider, you can type

00:11:04.809 --> 00:11:06.730
in the bill amount, and if you think the numbers

00:11:06.730 --> 00:11:08.870
are too small, you just say, make the numbers

00:11:08.870 --> 00:11:11.370
bigger. You're literally debugging with your

00:11:11.370 --> 00:11:14.029
voice. So does this mean the barrier to software

00:11:14.029 --> 00:11:17.190
engineering, at least for simple tools, is now

00:11:17.190 --> 00:11:20.070
just English? For these kinds of tools, yeah.

00:11:20.350 --> 00:11:23.289
Language is the new syntax. If you can clearly

00:11:23.289 --> 00:11:25.330
articulate what you want, you can basically build

00:11:25.330 --> 00:11:28.029
it. OK, we've covered visuals, video, a business

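NOTE
The tip-calculator example above (a box for the bill amount, a slider for the percentage) really does boil down to a few lines once you strip away the UI. This is a plain-Python sketch of that logic, assumed for illustration; it is not the code GLM-4.7 would actually generate.

```python
# Sketch of the tip-calculator logic described above: plain Python,
# no UI -- the bill "box" and percentage "slider" become two arguments.
def tip_total(bill: float, percent: float) -> tuple[float, float]:
    """Return (tip, total) for a bill and a tip percentage (0-100)."""
    tip = round(bill * percent / 100, 2)
    return tip, round(bill + tip, 2)

print(tip_total(80.00, 15))  # -> (12.0, 92.0)
```

Which is why "make the numbers bigger" works as a debugging command: the model only has to regenerate the thin UI layer around logic this simple.
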
00:11:28.029 --> 00:11:31.330
plan, an app, but we're missing a huge piece,

00:11:31.649 --> 00:11:34.809
audio. The silent film era is over for sure,

00:11:35.169 --> 00:11:37.590
but the copyright strike era is very much alive.

00:11:37.750 --> 00:11:39.929
You use a famous song on YouTube, and you get

00:11:39.929 --> 00:11:42.490
demonetized instantly. So the solution here is

00:11:42.490 --> 00:11:45.149
a tool called Hailuo Audio. Right. And Hailuo solves

00:11:45.149 --> 00:11:47.610
this in two ways. First, it generates music.

00:11:47.970 --> 00:11:50.590
You ask for an upbeat ukulele song for a travel

00:11:50.590 --> 00:11:53.110
vlog. and it composes a totally unique track

00:11:53.110 --> 00:11:55.269
for you. No copyright issues because it didn't

00:11:55.269 --> 00:11:57.970
exist five minutes ago. Exactly. And the second

00:11:57.970 --> 00:12:00.710
part is voice cloning. You just need about 30

00:12:00.710 --> 00:12:02.950
seconds of your own voice to train it. What's

00:12:02.950 --> 00:12:06.909
the practical use case for that? [Laughs] Well,

00:12:06.990 --> 00:12:10.070
besides ego, it's mostly for efficiency. Say

00:12:10.070 --> 00:12:12.690
you record a whole video, but you stumble on

00:12:12.690 --> 00:12:15.769
just one word. The word. Instead of setting up

00:12:15.769 --> 00:12:17.649
your mic and lights and doing a whole new take,

00:12:17.830 --> 00:12:21.210
you just type that one sentence into Hailuo, generate

00:12:21.210 --> 00:12:24.169
the audio in your own voice, and patch it right

00:12:24.169 --> 00:12:26.649
in. Is this basically the end of needing the

00:12:26.649 --> 00:12:29.429
perfect take? Absolutely. It turns voice acting

00:12:29.429 --> 00:12:33.009
into a text editing process. You can fix audio

00:12:33.009 --> 00:12:35.350
like you'd fix a typo. It's incredible. Okay,

00:12:35.509 --> 00:12:37.909
now... There's a whole group of creators out

00:12:37.909 --> 00:12:40.490
there who have great ideas, but are, you know,

00:12:40.669 --> 00:12:42.929
camera shy. They just don't want their face all

00:12:42.929 --> 00:12:44.690
over the internet. And that's a huge group of

00:12:44.690 --> 00:12:46.929
people. The tool for them is TikTok Symphony,

00:12:47.009 --> 00:12:48.950
which is, of course, from ByteDance. It uses

00:12:48.950 --> 00:12:51.289
what they call digital avatars. These are the

00:12:51.289 --> 00:12:53.190
AI-generated people who will read your script

00:12:53.190 --> 00:12:55.669
for you. Yes. But because it's ByteDance, the

00:12:55.669 --> 00:12:58.389
tech is just terrifyingly good, especially the

00:12:58.389 --> 00:13:00.730
lip sync. It's built for vertical video on phone

00:13:00.730 --> 00:13:03.210
screens, so the mouth movements are shockingly

00:13:03.210 --> 00:13:05.470
natural. You just type the script, pick an

00:13:05.320 --> 00:13:08.120
avatar, and it performs the video for you. So

00:13:08.120 --> 00:13:10.440
are we separating the creator from the performance

00:13:10.440 --> 00:13:13.580
itself? I think we are. It lets your personality

00:13:13.580 --> 00:13:16.679
and your ideas shine through without the anxiety

00:13:16.679 --> 00:13:19.580
of actually being on camera. It removes a huge

00:13:19.580 --> 00:13:21.740
barrier for a lot of people. We have covered

00:13:21.740 --> 00:13:25.039
so many cloud-based tools, but that raises a

00:13:25.039 --> 00:13:28.700
question. What happens if the internet goes down?

00:13:29.320 --> 00:13:32.019
Or what if one of these free tools suddenly decides

00:13:32.019 --> 00:13:35.340
to become not free? That is the ultimate vulnerability

00:13:35.340 --> 00:13:37.299
of the cloud, right? You don't own any of it.

00:13:37.320 --> 00:13:39.279
And that's why the final tool we're looking at

00:13:39.279 --> 00:13:41.320
is Pinokio. And Pinokio is not a website.

00:13:41.539 --> 00:13:43.980
No, it's an application. It's like a browser

00:13:43.980 --> 00:13:46.080
and an installer for your own local computer.

00:13:46.480 --> 00:13:49.179
Think of it as an app store for AI models that

00:13:49.179 --> 00:13:51.779
run on your hard drive. So you download the actual

00:13:51.779 --> 00:13:54.279
AI model to your machine. You download it, you

00:13:54.279 --> 00:13:56.720
install it, and it runs completely offline. You

00:13:56.720 --> 00:13:59.580
can run versions of these image and video generators

00:13:59.580 --> 00:14:02.080
using your own computer's power. What's the main

00:14:02.080 --> 00:14:03.960
benefit of doing that? Well, privacy is a big

00:14:03.960 --> 00:14:06.340
one. No one sees what you're making. But also,

00:14:06.559 --> 00:14:09.259
zero limits. No more credit systems, no server

00:14:09.259 --> 00:14:11.860
queues, no corporate censorship. And the best

00:14:11.860 --> 00:14:14.919
part, no one can ever turn it off. Is this the

00:14:14.919 --> 00:14:17.620
ultimate safety net for a digital creator? It's

00:14:17.620 --> 00:14:20.220
digital sovereignty. You actually own the means

00:14:20.220 --> 00:14:23.980
of production again. Okay, let's just

00:14:23.980 --> 00:14:26.799
take a breath here. We have unpacked a massive

00:14:26.799 --> 00:14:29.519
amount of information. We went from 3D modeling

00:14:29.519 --> 00:14:33.620
with Hunyuan to business agents with MiniMax,

00:14:34.120 --> 00:14:36.600
all the way to running your own local servers

00:14:36.600 --> 00:14:39.379
with Pinokio. It's an overwhelming list. It

00:14:39.379 --> 00:14:41.139
really is. And I think that's the real danger

00:14:41.139 --> 00:14:43.019
here. You look at these 11 tools and you think,

00:14:43.159 --> 00:14:45.399
I have to learn all of this right now. And please

00:14:45.399 --> 00:14:47.740
do not do that. Do not do that. The advice from

00:14:47.740 --> 00:14:50.539
the sources is spot on here. Just pick one. Just

00:14:50.539 --> 00:14:52.919
one. If you're an artist, go play with Wan in

00:14:52.919 --> 00:14:55.299
that free mode hack. If you're more business

00:14:55.299 --> 00:14:57.759
minded, give MiniMax a try. Spend 20 minutes

00:14:57.759 --> 00:15:00.480
on it. Make something fun. Don't try to change

00:15:00.480 --> 00:15:02.940
your entire workflow in one afternoon. Exactly.

00:15:03.019 --> 00:15:04.799
The goal isn't to become an expert overnight.

00:15:05.159 --> 00:15:07.799
The goal is just to break the seal, to realize

00:15:07.799 --> 00:15:09.899
that these tools are just sitting there waiting

00:15:09.899 --> 00:15:12.200
for you to try them. And really, the only thing

00:15:12.200 --> 00:15:14.480
standing between you and this entire ecosystem

00:15:14.480 --> 00:15:18.610
is a bit of curiosity, and maybe a right-click

00:15:18.610 --> 00:15:21.129
to hit translate. That's it. That's the whole

00:15:21.129 --> 00:15:23.169
key to the kingdom. So that's the challenge for

00:15:23.169 --> 00:15:26.009
this week. Step out of your comfortable, familiar

00:15:26.009 --> 00:15:29.110
app bubble. Go try one of these tools. See what

00:15:29.110 --> 00:15:31.590
you can build when there's no paywall there to

00:15:31.590 --> 00:15:34.330
stop you. The frontier is wide open. Go explore

00:15:34.330 --> 00:15:36.929
it. Thanks for listening to this deep dive. We'll

00:15:36.929 --> 00:15:37.649
catch you on the next one.
