WEBVTT

00:00:00.000 --> 00:00:02.120
I want you to just visualize the math for a second.

00:00:02.779 --> 00:00:05.360
You have a job, a standard architectural rendering

00:00:05.360 --> 00:00:10.300
job, and you are staring at 75 hours of waiting,

00:00:10.460 --> 00:00:14.820
75 hours of just fans spinning, your office getting

00:00:14.820 --> 00:00:18.780
hot. Yeah. Then you look at this new way and

00:00:18.780 --> 00:00:21.780
it's five minutes. Five minutes. You are taking

00:00:21.780 --> 00:00:25.660
a workflow that needs 850 individual expensive

00:00:25.660 --> 00:00:28.500
images and you are replacing it with just two.

00:00:28.839 --> 00:00:30.800
It sounds, I mean, honestly, it sounds like snake

00:00:30.800 --> 00:00:32.799
oil. It does. It sounds like one of those late

00:00:32.799 --> 00:00:35.159
night ads promising you can learn Spanish in

00:00:35.159 --> 00:00:37.619
your sleep. But we're not talking about a scam

00:00:37.619 --> 00:00:40.280
today. We are looking at a fundamental shift

00:00:40.280 --> 00:00:43.679
in the physics of how architecture gets visualized.

00:00:43.700 --> 00:00:46.240
Right. We're talking about the $500 mistake you

00:00:46.240 --> 00:00:48.520
might be making every single time you hit that

00:00:48.520 --> 00:00:51.159
render button. It is a really startling claim.

00:00:51.340 --> 00:00:54.700
But the numbers, they seem to back it up. Welcome

00:00:54.700 --> 00:00:57.320
to the Deep Dive. I'm your host, and today we

00:00:57.320 --> 00:01:00.420
are unpacking a fascinating and frankly a kind

00:01:00.420 --> 00:01:03.140
of aggressive piece by Max Anne. It's titled

00:01:03.140 --> 00:01:06.200
The $500 Mistake: Why You Should Stop Sending

00:01:06.200 --> 00:01:09.430
Your Archviz to a Render Farm. We're going to

00:01:09.430 --> 00:01:12.390
get into the nuts and bolts of Kling AI 2.6,

00:01:12.510 --> 00:01:15.030
this whole concept of the first and last frame

00:01:15.030 --> 00:01:18.049
workflow, and how some specific prompts, what

00:01:18.049 --> 00:01:20.810
Anne calls power prompts, are completely rewriting

00:01:20.810 --> 00:01:23.010
the rules. And look, usually when we talk design

00:01:23.010 --> 00:01:25.290
tech, it's about the art, right? The aesthetics.

00:01:25.549 --> 00:01:27.650
But today we really need to talk about the financial

00:01:27.650 --> 00:01:29.950
reality. Because if you're still doing things

00:01:29.950 --> 00:01:32.629
the old way, the way we've done them for 20 years,

00:01:32.849 --> 00:01:34.730
Anne suggests you might be leaving about $100,000

00:01:34.730 --> 00:01:37.109
a year on the table. That number just...

00:01:37.280 --> 00:01:39.439
It jumped out at me immediately. That's a salary.

00:01:39.560 --> 00:01:41.780
It's a whole employee. It's a whole person. But

00:01:41.780 --> 00:01:43.680
before we get to the cash, let's start with the

00:01:43.680 --> 00:01:46.700
pain. Anne calls it the progress bar nightmare.

00:01:47.079 --> 00:01:50.019
Oh, I felt that in my soul. If you work in architectural

00:01:50.019 --> 00:01:52.560
visualization or really any high-end 3D work,

00:01:52.700 --> 00:01:55.099
you know this pain viscerally. It's the time

00:01:55.099 --> 00:01:58.180
debt. Let's unpack that term, time debt. Because

00:01:58.180 --> 00:02:00.760
for someone who just sees the final pretty building,

00:02:00.959 --> 00:02:03.340
the process is invisible, what are we actually

00:02:03.340 --> 00:02:05.299
paying with here? It's the part of the job the

00:02:05.299 --> 00:02:08.860
client never, ever sees. Traditionally, if you

00:02:08.860 --> 00:02:11.919
want just a standard 10-second animation, nothing

00:02:11.919 --> 00:02:15.020
fancy, just a simple walkthrough, at 30 frames

00:02:15.020 --> 00:02:19.400
per second, you need 300 individual images. 300?

00:02:19.919 --> 00:02:22.240
And in high-end archviz, we're not talking about,

00:02:22.280 --> 00:02:24.599
like... Video game graphics. We're talking about

00:02:24.599 --> 00:02:27.039
simulating light bouncing off velvet, refraction

00:02:27.039 --> 00:02:29.680
through glass. The details. All the tiny details.

00:02:29.960 --> 00:02:33.199
One single frame can take 15 minutes to render

00:02:33.199 --> 00:02:36.400
on a powerful machine. So if I'm doing the math

00:02:36.400 --> 00:02:39.300
on that, 300 frames times 15 minutes, that's

00:02:39.300 --> 00:02:41.599
not a lunch break. No. You're looking at anywhere

00:02:41.599 --> 00:02:45.639
from 23 to 75 hours. That's three days. Yeah.

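NOTE
A minimal sketch of the render-time arithmetic in this exchange, written out in Python. The function name is illustrative only; the 10 seconds, 30 frames per second, and 15 minutes per frame are the figures quoted in the conversation.
```python
def render_hours(seconds: float, fps: int, minutes_per_frame: float) -> float:
    """Total sequential render time in hours for a clip rendered frame by frame."""
    frames = seconds * fps  # 10 s at 30 fps is 300 frames
    return frames * minutes_per_frame / 60
# The figures quoted in the conversation: 300 frames at 15 minutes each.
hours = render_hours(seconds=10, fps=30, minutes_per_frame=15)
print(hours)  # 75.0, a little over three days
```
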
00:02:45.919 --> 00:02:48.400
Just babysitting a computer. That describes it

00:02:48.400 --> 00:02:50.280
perfectly. He says you stop being a designer

00:02:50.280 --> 00:02:53.759
and you start being a highly paid computer technician

00:02:53.759 --> 00:02:56.439
waiting for a loading screen. And here's the

00:02:56.439 --> 00:02:58.960
kicker. The thing that drives professionals crazy.

00:02:59.460 --> 00:03:02.580
What happens if the client calls you on hour

00:03:02.580 --> 00:03:05.680
60 and says, hey, can we move that chair three

00:03:05.680 --> 00:03:07.960
inches to the left? Oh, no, you have to start

00:03:07.960 --> 00:03:11.159
over. You start the 75 hour clock all over again.

00:03:11.280 --> 00:03:13.240
And that's where the money burns. Because it's

00:03:13.240 --> 00:03:15.699
not just your time. You're paying for the computing

00:03:15.699 --> 00:03:17.900
power. This is the part I think people outside

00:03:17.900 --> 00:03:19.960
the industry just don't get. You don't just have

00:03:19.960 --> 00:03:21.599
the computer power for this. You usually have

00:03:21.599 --> 00:03:24.439
to rent it. Right. From a render farm, these

00:03:24.439 --> 00:03:27.219
are massive data centers, huge racks of servers

00:03:27.219 --> 00:03:29.740
that you rent to do all the number crunching

00:03:29.740 --> 00:03:32.939
for you. And they charge per core hour. Can you

00:03:32.939 --> 00:03:35.180
define that for the listener who hasn't seen

00:03:35.180 --> 00:03:37.620
an invoice for one of these? Sure. A core hour

00:03:37.620 --> 00:03:41.120
is basically the cost to use one processor core

00:03:41.120 --> 00:03:43.810
for one hour. It's usually between, say, $0.02

00:03:43.810 --> 00:03:46.909
and $0.10, which sounds cheap, right? Just pennies.

00:03:47.150 --> 00:03:51.349
But for a complex scene with 850 frames, you're

00:03:51.349 --> 00:03:54.129
looking at a bill of over $500. For one version.

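NOTE
A rough sketch of how a per-core-hour rate can add up to a bill of this size. The 15 minutes per frame and the $0.02 to $0.10 rate range come from the conversation; the 32 cores per render node is a hypothetical figure chosen only for illustration, and real farms price and schedule jobs differently.
```python
def farm_cost(frames: int, minutes_per_frame: float, cores: int, rate_per_core_hour: float) -> float:
    """Estimated render-farm bill in dollars: machine-hours times cores times rate."""
    machine_hours = frames * minutes_per_frame / 60
    return machine_hours * cores * rate_per_core_hour
CORES = 32  # assumption, not stated in the source
low = farm_cost(850, 15, CORES, 0.02)
high = farm_cost(850, 15, CORES, 0.10)
print(f"${low:,.0f} to ${high:,.0f} per version")
```
Under these assumptions the quoted "over $500" falls toward the upper end of the range; a different core count shifts the whole range proportionally.
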
00:03:54.330 --> 00:03:56.289
For one version. If you do three revisions for

00:03:56.289 --> 00:04:01.500
the client, that's $1,500. It is just wild that

00:04:01.500 --> 00:04:03.879
this has been accepted as the cost of doing business.

00:04:04.120 --> 00:04:06.539
It raises a really big question for me. I mean,

00:04:06.580 --> 00:04:08.620
architects, designers, they're obsessed with

00:04:08.620 --> 00:04:11.479
efficiency. Why have smart professionals accepted

00:04:11.479 --> 00:04:14.680
this financial bleed as just normal for so long?

00:04:14.900 --> 00:04:17.019
It really just comes down to necessity. It was

00:04:17.019 --> 00:04:19.459
the only way to get that quality until now. Normal

00:04:19.459 --> 00:04:21.819
didn't mean good. It just meant it was possible.

00:04:22.379 --> 00:04:24.680
If you wanted photorealism, you had to pay the

00:04:24.680 --> 00:04:27.870
tax. But that definition of possible has just

00:04:27.870 --> 00:04:30.230
shifted. Let's talk about the tool that changes

00:04:30.230 --> 00:04:34.009
the physics of this. Kling 2.6. Released late

00:04:34.009 --> 00:04:37.069
2025. This is the big disruptive element here.

00:04:37.250 --> 00:04:38.970
Now, I have to play devil's advocate a little.

00:04:39.110 --> 00:04:41.670
We've seen AI video before. We've all seen the

00:04:41.670 --> 00:04:44.050
memes. Oh, yeah. The walls are breathing. The

00:04:44.050 --> 00:04:46.410
furniture turns into a dog. The camera feels

00:04:46.410 --> 00:04:48.310
like it's floating in soup. It's hallucinogenic.

00:04:49.319 --> 00:04:51.379
Why is this any different? So you're describing

00:04:51.379 --> 00:04:53.259
the single image problem. Yeah. In the older

00:04:53.259 --> 00:04:55.399
AI models, you give it a picture of a living

00:04:55.399 --> 00:04:58.720
room and say, animate this. The AI has to guess

00:04:58.720 --> 00:05:00.220
what the rest of the room looks like. It has

00:05:00.220 --> 00:05:02.439
to guess what's behind the sofa. And it's a terrible

00:05:02.439 --> 00:05:04.879
guesser. A terrible guesser, especially when

00:05:04.879 --> 00:05:07.800
it comes to strict geometry. It improvises. Yeah.

00:05:07.839 --> 00:05:10.459
And architects, they hate improvisation. They

00:05:10.459 --> 00:05:13.019
hate it. They want stability. If a column moves,

00:05:13.300 --> 00:05:15.870
the whole building falls down. So Kling 2.6

00:05:15.870 --> 00:05:18.509
introduces this first and last frame control.

00:05:18.730 --> 00:05:20.810
This is the genius part. You don't just give

00:05:20.810 --> 00:05:23.230
it the start. You go back to your 3D software

00:05:23.230 --> 00:05:26.129
Blender, 3ds Max, whatever, and you render frame

00:05:26.129 --> 00:05:28.910
one. And then you render frame 300. The destination.

00:05:29.170 --> 00:05:31.509
Exactly. You give the AI the beginning and the

00:05:31.509 --> 00:05:34.649
end. Both are perfect, geometrically accurate

00:05:34.649 --> 00:05:37.610
renders from your software. And then you tell

00:05:37.610 --> 00:05:40.110
the AI, find the path. So it's not creating the

00:05:40.110 --> 00:05:42.149
room from scratch. No. It's just connecting the

00:05:42.149 --> 00:05:45.379
dots. Precisely. It's interpolation. It calculates

00:05:45.379 --> 00:05:47.720
the movement between two locked points. It literally

00:05:47.720 --> 00:05:49.819
can't make the wall warp because the wall is

00:05:49.819 --> 00:05:51.579
locked in place at the finish line. It has no

00:05:51.579 --> 00:05:53.959
choice but to stay straight. That is astounding.

00:05:54.420 --> 00:05:58.639
It's elegantly simple. It really is. And Kling

00:05:58.639 --> 00:06:02.300
2.6 also added native audio, so it'll generate

00:06:02.300 --> 00:06:05.660
sound effects like ambient noise, footsteps in

00:06:05.660 --> 00:06:08.180
that same pass. But the visual stability is the

00:06:08.180 --> 00:06:12.279
headline. You are replacing... 850 frames of

00:06:12.279 --> 00:06:15.540
heavy rendering with just two. Two frames. That

00:06:15.540 --> 00:06:19.600
is a 99% reduction in render time. Whoa, that's

00:06:19.600 --> 00:06:21.439
almost hard to wrap your head around. You're

00:06:21.439 --> 00:06:23.259
basically building a bridge and you only have

00:06:23.259 --> 00:06:25.439
to build the two pillars on the banks and the

00:06:25.439 --> 00:06:29.220
AI just manifests the bridge in between. That

00:06:29.220 --> 00:06:31.420
is a great analogy. And because you built the

00:06:31.420 --> 00:06:33.939
pillars, you know the bridge lands exactly where

00:06:33.939 --> 00:06:36.310
it's supposed to. So just to be crystal clear,

00:06:36.449 --> 00:06:38.889
the AI isn't hallucinating the building. It's

00:06:38.889 --> 00:06:41.629
strictly filling in the gap. Exactly. It's constrained

00:06:41.629 --> 00:06:44.970
logic. It forces stability by locking that destination.

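NOTE
A quick check of the "99% reduction" figure, assuming it refers to the number of frames rendered in the 3D software (850 replaced by the first and last frame). It ignores the time the AI generation itself takes.
```python
def render_reduction(original_frames: int, rendered_frames: int = 2) -> float:
    """Fraction of traditional frame renders avoided when only keyframes are rendered."""
    return 1 - rendered_frames / original_frames
print(f"{render_reduction(850):.2%}")  # 99.76%
```
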
00:06:45.329 --> 00:06:46.970
OK, so that's the theory. But I want to know

00:06:46.970 --> 00:06:48.970
how a professional actually does this, because

00:06:48.970 --> 00:06:51.410
I assume you can't just type "make it cool" and

00:06:51.410 --> 00:06:53.589
get a client-ready video. There has to be some

00:06:53.589 --> 00:06:56.129
craft involved. Oh, absolutely. Anne is very clear

00:06:56.129 --> 00:06:58.810
about this. You can't just vibe your way through

00:06:58.810 --> 00:07:00.730
it. You need a workflow. So walk us through it.

00:07:00.730 --> 00:07:04.050
Phase one. Phase one is the 3D prep. You're still

00:07:04.050 --> 00:07:06.230
a designer. You set up your camera path in your

00:07:06.230 --> 00:07:09.449
software. But, and this is a massive constraint,

00:07:09.649 --> 00:07:12.449
the movement has to be intentional. Define intentional

00:07:12.449 --> 00:07:16.269
for me. No roller coasters, no crazy acrobatics,

00:07:16.310 --> 00:07:20.370
slow pushes, smooth pans. Why? Why can't I do

00:07:20.370 --> 00:07:24.620
a 360 spin? Because of object permanence. If

00:07:24.620 --> 00:07:27.639
the camera spins too fast, the AI loses track

00:07:27.639 --> 00:07:29.399
of what the objects are supposed to look like.

00:07:29.439 --> 00:07:31.579
It just gets confused. So you keep it cinematic.

00:07:31.579 --> 00:07:33.980
You export frame one and frame 300. Standard

00:07:33.980 --> 00:07:37.480
HD resolution is fine. Okay. And phase two? Phase

00:07:37.480 --> 00:07:40.079
two is the setup inside Kling. You select image

00:07:40.079 --> 00:07:43.259
to video and you enable that first and last frame

00:07:43.259 --> 00:07:46.079
toggle. Anne suggests starting with five-second

00:07:46.079 --> 00:07:48.199
clips just to get the hang of it. You know, walk

00:07:48.199 --> 00:07:50.660
before you run. Makes sense. Now phase three.

00:07:51.279 --> 00:07:53.100
This is the part that really interested me. The

00:07:53.100 --> 00:07:55.740
language we use. The power prompt. Yes. This

00:07:55.740 --> 00:07:57.480
is where you make or break the entire shot. Anne

00:07:57.480 --> 00:08:00.220
breaks it down into a formula. What is it? It's

00:08:00.220 --> 00:08:03.019
a four-part structure. You need movement, scene

00:08:03.019 --> 00:08:05.819
description, lighting, and technical rules. Okay.

00:08:05.860 --> 00:08:07.899
Give me an example of how a novice would screw

00:08:07.899 --> 00:08:11.120
this up versus how a pro does it. Okay. A bad

00:08:11.120 --> 00:08:14.819
prompt, a novice prompt, is just camera moves

00:08:14.819 --> 00:08:18.839
through room. Vague. Very vague. Very. The AI

00:08:18.839 --> 00:08:20.579
will do whatever it wants. Here's the expert

00:08:20.579 --> 00:08:23.079
version Anne uses: slow forward tracking shot

00:08:23.079 --> 00:08:25.459
through a modern minimalist living room, natural

00:08:25.459 --> 00:08:28.180
daylight streaming through windows, stable geometry,

00:08:28.480 --> 00:08:31.639
no morphing, cinematic photography. Stable geometry,

00:08:31.920 --> 00:08:34.519
no morphing. So you're explicitly telling it

00:08:34.519 --> 00:08:37.320
what not to do. That's the technical rules part

00:08:37.320 --> 00:08:39.419
of the formula. It's the safety rail. It's so

00:08:39.419 --> 00:08:41.320
interesting that we have to speak to the machine

00:08:41.320 --> 00:08:44.080
in its own language of constraints. We have to

00:08:44.080 --> 00:08:46.580
tell it, hey, don't hallucinate. We do. We have

00:08:46.580 --> 00:08:48.559
to remind it of the laws of physics. So looking

00:08:48.559 --> 00:08:51.159
at that whole formula, what is the one specific

00:08:51.159 --> 00:08:53.580
variable in there that prevents all the weirdness?

00:08:53.860 --> 00:08:55.500
It's definitely that technical rules section,

00:08:55.659 --> 00:08:58.080
explicitly commanding stable geometry and no

00:08:58.080 --> 00:09:00.720
morphing. Okay, we're back. We have the tool.

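NOTE
A minimal sketch of the four-part formula as a reusable template. The class and field names are hypothetical; the example values are the ones read out in the conversation. It only builds the prompt string and does not call any Kling API.
```python
from dataclasses import dataclass
@dataclass
class PowerPrompt:
    movement: str         # how the camera moves
    scene: str            # what the camera is moving through
    lighting: str         # light sources and mood
    technical_rules: str  # constraints such as "stable geometry, no morphing"
    def render(self) -> str:
        """Join the four parts into a single comma-separated prompt."""
        return f"{self.movement} {self.scene}, {self.lighting}, {self.technical_rules}"
prompt = PowerPrompt(
    movement="Slow forward tracking shot",
    scene="through a modern minimalist living room",
    lighting="natural daylight streaming through windows",
    technical_rules="stable geometry, no morphing, cinematic photography",
).render()
print(prompt)
```
Keeping the technical rules in their own field makes it harder to forget the "stable geometry, no morphing" safety rail when the other three parts change from shot to shot.
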
00:09:00.779 --> 00:09:03.340
We have the prompt. Now let's talk about selling

00:09:03.340 --> 00:09:06.259
this. Because technology is cool, but applications

00:09:06.259 --> 00:09:09.919
are what get invoices paid. Anne outlines a few

00:09:09.919 --> 00:09:12.379
of these magic applications. And these are honestly

00:09:12.379 --> 00:09:15.120
brilliant because they turn what could be a tech

00:09:15.120 --> 00:09:18.340
limitation into a feature. The first one he calls

00:09:18.340 --> 00:09:21.179
magic staging. This is huge for renovation concepts.

00:09:21.399 --> 00:09:24.899
You render an empty room as frame one. And then

00:09:24.899 --> 00:09:26.860
the fully furnished room as the last frame. And

00:09:26.860 --> 00:09:29.220
the AI just fills in the middle. Yes, but with

00:09:29.220 --> 00:09:32.090
nuance. The prompt is something like. Furniture

00:09:32.090 --> 00:09:34.669
and decorations should appear piece by piece,

00:09:34.850 --> 00:09:37.129
popping up one by one. So it actually creates

00:09:37.129 --> 00:09:39.649
a time-lapse effect. Exactly. A clean, smooth

00:09:39.649 --> 00:09:42.190
transition of a room furnishing itself. That

00:09:42.190 --> 00:09:44.409
used to take days of manual keyframing. Wow.

00:09:44.549 --> 00:09:47.309
Now, three minutes. That is a killer marketing

00:09:47.309 --> 00:09:49.309
asset. I can see that on Instagram immediately.

00:09:49.929 --> 00:09:52.230
It is. Another one is atmospheric lighting shifts.

00:09:52.509 --> 00:09:54.409
Oh, I like this. Day to night. Day to night.

00:09:54.710 --> 00:09:57.169
Frame one is bright daylight. The last frame

00:09:57.169 --> 00:10:00.110
is a moody evening with warm lamps. The prompt

00:10:00.110 --> 00:10:02.830
describes golden afternoon sunbeams crawling

00:10:02.830 --> 00:10:05.269
across the floor. Crawling across the floor.

00:10:05.429 --> 00:10:08.230
That's poetic. It is. And the AI understands

00:10:08.230 --> 00:10:11.190
that flow. Yeah. It animates the shadows lengthening,

00:10:11.190 --> 00:10:13.149
the lights flickering on. It's very emotional.

00:10:13.409 --> 00:10:16.149
And emotion sells architecture. Always. Now,

00:10:16.210 --> 00:10:17.850
I have to ask about the elephant in the room.

00:10:17.929 --> 00:10:20.750
People. 3D people usually look like zombies.

00:10:20.809 --> 00:10:22.730
They slide across the floor. Their eyes are dead.

00:10:23.610 --> 00:10:27.039
Does Kling fix the zombie problem? It helps.

00:10:27.200 --> 00:10:30.940
Anne calls this application humanity. How does

00:10:30.940 --> 00:10:33.940
it work? You don't use motion capture data, which

00:10:33.940 --> 00:10:35.840
is expensive and really hard to clean up. You

00:10:35.840 --> 00:10:38.840
just set frame A, person is at the door. Frame

00:10:38.840 --> 00:10:41.139
B, person is sitting on the sofa. And the AI

00:10:41.139 --> 00:10:43.679
figures out how they walked there. It fills in

00:10:43.679 --> 00:10:46.179
the natural movement. Walking, sitting, shifting

00:10:46.179 --> 00:10:48.940
their weight. No rigging required at all. He

00:10:48.940 --> 00:10:50.759
gives an example of a woman walking to a sofa

00:10:50.759 --> 00:10:53.580
with a cup of tea. The AI handles the subtle

00:10:53.580 --> 00:10:55.879
physics of holding the cup, the fabric of her

00:10:55.879 --> 00:11:00.120
shirt moving. But be honest, is it perfect? Or

00:11:00.120 --> 00:11:02.759
does she suddenly grow a third arm halfway there?

00:11:03.019 --> 00:11:06.039
It's not perfect. There's a catch. You have to

00:11:06.039 --> 00:11:08.519
keep the people secondary. If you make them the

00:11:08.519 --> 00:11:11.240
main focus, you zoom right in on their face,

00:11:11.419 --> 00:11:14.259
the flaws show up. The uncanny valley is still

00:11:14.259 --> 00:11:17.409
there. Okay. But it's background life. It's incredible.

00:11:17.590 --> 00:11:19.190
And then there's this falling furniture concept.

00:11:19.289 --> 00:11:21.269
Yeah, physics and stop motion. This is purely

00:11:21.269 --> 00:11:24.470
for viral reels. You prompt for a stop motion

00:11:24.470 --> 00:11:27.429
animation style. Furniture slides and skids into

00:11:27.429 --> 00:11:30.549
place. It's playful. It's not physically perfect,

00:11:30.629 --> 00:11:32.909
but it absolutely grabs attention on Instagram.

00:11:33.360 --> 00:11:35.299
It seems like the whole paradigm is shifting

00:11:35.299 --> 00:11:39.379
from perfect simulation to mood and flow. That's

00:11:39.379 --> 00:11:41.379
a great distinction. Yeah. So does this work

00:11:41.379 --> 00:11:44.399
for really complex interactions or is it just

00:11:44.399 --> 00:11:46.500
for mood setting? It's mostly mood and flow.

00:11:46.559 --> 00:11:48.460
You have to keep people secondary to the architecture,

00:11:48.559 --> 00:11:50.779
at least for now. Okay. So it's time for a reality

00:11:50.779 --> 00:11:52.759
check. We're painting a very rosy picture here.

00:11:52.820 --> 00:11:56.220
Five minutes, $500 saved. But is the render farm

00:11:56.220 --> 00:12:00.240
actually dead? Not entirely. And Anne is honest

00:12:00.240 --> 00:12:03.120
about this. He puts the tech at about 80% ready.

00:12:03.259 --> 00:12:06.500
What's in that missing 20%? Complex camera moves

00:12:06.500 --> 00:12:10.080
like spirals or really intricate loops or extreme

00:12:10.080 --> 00:12:13.659
close-ups on textures. If you need to show the

00:12:13.659 --> 00:12:16.200
specific weave of a fabric for a manufacturer

00:12:16.200 --> 00:12:20.259
or if you need absolute... dimensional accuracy

00:12:20.259 --> 00:12:22.759
for a legal submission. You can't have the wall

00:12:22.759 --> 00:12:25.539
wobble even a single millimeter. Right. For regulatory

00:12:25.539 --> 00:12:27.820
work or when you're in court proving a sight

00:12:27.820 --> 00:12:30.100
line, you stick to the traditional render farm.

00:12:30.460 --> 00:12:32.740
You need the physics engine, not the AI, I guess.

00:12:32.940 --> 00:12:35.519
But for everything else, concepts, marketing,

00:12:35.659 --> 00:12:38.419
social media, client buy-in. It's a no-brainer.

00:12:38.840 --> 00:12:41.139
And this is where we get back to the money, the

00:12:41.139 --> 00:12:42.759
money math. Let's break it down per project.

00:12:43.080 --> 00:12:45.860
Okay. A traditional 10-second walkthrough costs

00:12:45.860 --> 00:12:49.850
about $2,500. That's the render farm fees, plus

00:12:49.850 --> 00:12:51.990
about 20 hours of your labor at your billing

00:12:51.990 --> 00:12:54.129
rate. Expensive. The AI-powered version, about

00:12:54.129 --> 00:12:58.230
$250, $10 in AI credits, and maybe two and a

00:12:58.230 --> 00:13:00.570
half hours of labor. So you're saving $2,250

00:13:00.570 --> 00:13:03.690
per project. Exactly. If you do four projects

00:13:03.690 --> 00:13:06.710
a month, you are effectively finding $100,000

00:13:06.710 --> 00:13:09.509
a year in overhead. That is life-changing money

00:13:09.509 --> 00:13:11.940
for a small studio. That's the difference between

00:13:11.940 --> 00:13:14.259
profitability and bankruptcy. It really is. It's

00:13:14.259 --> 00:13:17.039
not just saving money. It's reclaiming your profit

00:13:17.039 --> 00:13:20.940
margin. I have to admit, though, listening to

00:13:20.940 --> 00:13:24.940
all this, there is something a little scary about

00:13:24.940 --> 00:13:27.860
it. I still wrestle with prompt drift myself

00:13:27.860 --> 00:13:31.480
or just the fear of the black box. When you render

00:13:31.480 --> 00:13:34.529
manually. You control every photon. You know

00:13:34.529 --> 00:13:37.570
exactly why a shadow falls where it does. When

00:13:37.570 --> 00:13:39.850
you write a prompt, you're trusting an algorithm.

00:13:40.110 --> 00:13:42.450
It feels like giving up the steering wheel. It's

00:13:42.450 --> 00:13:44.909
a loss of control, sure. But it's a gain in leverage.

00:13:44.950 --> 00:13:46.649
You're saying, I don't need to control the photon.

00:13:46.710 --> 00:13:49.149
I just need the result. So based on those numbers,

00:13:49.269 --> 00:13:52.450
is the render farm completely obsolete? Not entirely.

00:13:52.610 --> 00:13:54.850
It's obsolete for concepts. Yeah. But it's still

00:13:54.850 --> 00:13:57.149
essential for that legal precision. It's a hybrid

00:13:57.149 --> 00:13:59.490
world now. Got it. Okay. Now for the listeners

00:13:59.490 --> 00:14:01.389
who are thinking, okay, I'm in, but I want to

00:14:01.389 --> 00:14:03.769
do this at a high level. Anne outlines some advanced

00:14:03.769 --> 00:14:06.309
techniques to really polish this stuff. He does.

00:14:06.450 --> 00:14:09.409
The first one is for length. If you try to generate,

00:14:09.429 --> 00:14:12.669
say, a 20-second clip in one go, the AI will

00:14:12.669 --> 00:14:15.669
drift. It gets amnesia. It loses the plot. Right.

00:14:15.750 --> 00:14:19.190
So you use multi-segment animations. You generate

00:14:19.190 --> 00:14:21.940
a five-second clip, clip A, and you take the

00:14:21.940 --> 00:14:25.759
very last frame of clip A and you make that the

00:14:25.759 --> 00:14:27.580
start frame of clip B. So you're stitching them

00:14:27.580 --> 00:14:30.600
together like a relay race. Exactly. It prevents

00:14:30.600 --> 00:14:32.879
the drifting because you keep re-anchoring the

00:14:32.879 --> 00:14:35.139
reality every five seconds. That's really smart.

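NOTE
A small sketch of the relay idea, assuming clips of five seconds each. The function is hypothetical and only plans the segment boundaries; the point it illustrates is that every clip starts at the moment the previous clip ended, so the last frame of one clip can be reused as the first frame of the next.
```python
def plan_segments(total_seconds: int, clip_seconds: int = 5) -> list[tuple[int, int]]:
    """Split a long shot into chained clips; every clip starts where the previous one ended."""
    bounds = list(range(0, total_seconds, clip_seconds)) + [total_seconds]
    return list(zip(bounds, bounds[1:]))
segments = plan_segments(20)
print(segments)  # [(0, 5), (5, 10), (10, 15), (15, 20)]
```
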
00:14:35.480 --> 00:14:37.700
Then there's the hybrid workflow. This is for

00:14:37.700 --> 00:14:40.559
when the AI gets you 90% there, but there's

00:14:40.559 --> 00:14:42.279
a little glitch in the corner. Maybe a plant

00:14:42.279 --> 00:14:45.039
looks weird. You can export the frames, fix them

00:14:45.039 --> 00:14:48.039
in Photoshop, manually clean it up, and then

00:14:48.039 --> 00:14:50.070
feed them back in. So you're actually collaborating

00:14:50.070 --> 00:14:52.690
with the AI. You fix its mistakes. Precisely.

00:14:52.730 --> 00:14:56.529
And finally, style consistency. If you're doing

00:14:56.529 --> 00:14:58.730
a whole house, you don't want the kitchen to

00:14:58.730 --> 00:15:01.409
look like a moody film noir and the bedroom to

00:15:01.409 --> 00:15:03.269
look like a Pixar cartoon. That would be a little

00:15:03.269 --> 00:15:05.669
jarring. So you use style anchors. These are

00:15:05.669 --> 00:15:07.710
phrases you repeat in every single prompt for

00:15:07.710 --> 00:15:10.190
that project. Like what? Architectural Digest

00:15:10.190 --> 00:15:14.169
photography style. Or Scandinavian minimalist

00:15:14.169 --> 00:15:17.590
aesthetic. You train the AI on the vibe. You

00:15:17.590 --> 00:15:20.519
do. And the last technical tip is upscaling.

00:15:20.620 --> 00:15:24.480
Kling outputs at 1080p. Most clients want 4K.

00:15:24.860 --> 00:15:27.200
So you run it through a tool like Topaz Video

00:15:27.200 --> 00:15:30.539
AI to upscale and sharpen it. It bridges that

00:15:30.539 --> 00:15:33.259
last gap to professional delivery. So if I'm

00:15:33.259 --> 00:15:35.240
understanding this right, the key to not letting

00:15:35.240 --> 00:15:38.120
the style drift is those style anchors repeating

00:15:38.120 --> 00:15:40.659
the exact same aesthetic keywords every single

00:15:40.659 --> 00:15:43.840
time. Yes. Consistency in your language equals

00:15:43.840 --> 00:15:46.980
consistency in the visuals. This has been really

00:15:46.980 --> 00:15:48.679
eye-opening. We are definitely looking at a

00:15:48.679 --> 00:15:50.919
hybrid future. We are. You know, we look back

00:15:50.919 --> 00:15:53.659
at early CGI in movies, think the 90s, and

00:15:53.659 --> 00:15:56.659
it looks charming but primitive. The Scorpion

00:15:56.659 --> 00:15:58.960
King comes to mind. Exactly. We're in the Scorpion

00:15:58.960 --> 00:16:01.700
King era of AI video right now. It's impressive,

00:16:01.759 --> 00:16:03.580
but it's going to get so much better. But the

00:16:03.580 --> 00:16:06.259
point Anne makes is so crucial. You can't wait

00:16:06.259 --> 00:16:08.600
for it to be perfect. No, because by the time

00:16:08.600 --> 00:16:11.039
it's perfect, everyone will use it. The competitive

00:16:11.039 --> 00:16:12.879
advantage belongs to the people who master it

00:16:12.879 --> 00:16:15.500
now, while it's still a little bit messy. The

00:16:15.500 --> 00:16:17.559
people who figure out how to reclaim that hundred

00:16:17.559 --> 00:16:20.779
grand a year. Exactly. Stop asking, can this

00:16:20.779 --> 00:16:23.600
replace rendering? And start asking, how can

00:16:23.600 --> 00:16:26.279
I use this to do things I couldn't do before?

00:16:26.480 --> 00:16:28.820
That's the real takeaway. Don't let the tool

00:16:28.820 --> 00:16:31.519
replace you. Let it multiply you. Couldn't have

00:16:31.519 --> 00:16:33.539
said it better myself. And here's a thought to

00:16:33.539 --> 00:16:36.080
leave you with. If the visualization becomes

00:16:36.080 --> 00:16:39.419
this fast and this cheap, does the design process

00:16:39.419 --> 00:16:42.519
itself change? If you can see the finished building

00:16:42.519 --> 00:16:45.179
in five minutes instead of three days, do you

00:16:45.179 --> 00:16:48.379
take more risks? Do you iterate more? Or do we

00:16:48.379 --> 00:16:50.220
just end up churning out more generic buildings

00:16:50.220 --> 00:16:53.559
but faster? That is the really exciting and kind

00:16:53.559 --> 00:16:55.519
of terrifying question, isn't it? Something to

00:16:55.519 --> 00:16:57.399
mull over. Thanks for diving in with us. Thanks

00:16:57.399 --> 00:16:58.700
for having me. We'll see you on the next one.
