WEBVTT

00:00:00.000 --> 00:00:02.680
I want you to imagine a scenario. It's a Tuesday

00:00:02.680 --> 00:00:05.940
morning. You are sitting at your desk. Maybe

00:00:05.940 --> 00:00:07.799
you have your coffee in hand. Naturally. You

00:00:07.799 --> 00:00:10.400
lean back in your chair. You fold your hands

00:00:10.400 --> 00:00:13.240
in your lap. You are absolutely not touching

00:00:13.240 --> 00:00:15.900
the keyboard. Right. And you are definitely not

00:00:15.900 --> 00:00:17.640
touching the mouse. Kind of hard to work that

00:00:17.640 --> 00:00:20.859
way. Exactly. But on your screen, the cursor

00:00:20.859 --> 00:00:24.600
is moving. And it's not glitching out.

00:00:24.780 --> 00:00:27.920
It's moving with purpose. Yeah. It clicks open

00:00:27.920 --> 00:00:31.320
a web browser. It navigates to a vendor's website.

00:00:31.519 --> 00:00:34.119
It's just by itself? Completely by itself. It

00:00:34.119 --> 00:00:36.520
fills out a complex order form, checking boxes,

00:00:36.780 --> 00:00:39.060
scrolling down to find the submit button. Wow.

00:00:39.240 --> 00:00:42.259
Then it opens a spreadsheet and logs the confirmation

00:00:42.259 --> 00:00:45.299
number. That's wild. It's not a recording? Not

00:00:45.299 --> 00:00:47.719
a script you wrote line by line? No. It's an

00:00:47.719 --> 00:00:50.659
AI literally using the computer interface. Just

00:00:50.659 --> 00:00:53.880
like a human employee would. It sounds like science

00:00:53.880 --> 00:00:56.200
fiction. Or maybe a hacker taking over your machine.

00:00:56.520 --> 00:00:58.700
It really does. But this is actually the reality

00:00:58.700 --> 00:01:00.299
we're looking at today. This is the computer

00:01:00.299 --> 00:01:02.899
use feature. And it is the headline act for the

00:01:02.899 --> 00:01:05.840
model we're discussing. Claude Sonnet 4.6. Welcome

00:01:05.840 --> 00:01:08.359
to the deep dive. Today, we are not just talking

00:01:08.359 --> 00:01:11.120
about a chatbot that writes better poetry. Right.

00:01:11.500 --> 00:01:13.719
We're looking at a fundamental shift in mechanics.

00:01:14.019 --> 00:01:17.799
We are. Our source for this is a guide. It's

00:01:17.799 --> 00:01:21.099
called Mastering Claude 3.5 Sonnet. Though,

00:01:21.400 --> 00:01:23.640
just to be totally clear for you listening, the

00:01:23.640 --> 00:01:26.099
text focuses heavily on the upgraded version.

00:01:26.420 --> 00:01:30.019
Sonnet 4.6. Exactly. It's a fascinating document.

00:01:30.459 --> 00:01:32.799
It frames this moment not just as an upgrade

00:01:32.799 --> 00:01:36.879
in IQ, but as a shift in utility. Exactly. The

00:01:36.879 --> 00:01:39.159
premise here is striking. We're looking at a

00:01:39.159 --> 00:01:42.640
model that claims to be twice as fast as the

00:01:42.640 --> 00:01:45.079
heavy Opus model. Which was the previous king

00:01:45.079 --> 00:01:47.739
of the hill. Right. Yet it costs half the price.

00:01:47.900 --> 00:01:50.420
Half the price. Half. And if you believe the

00:01:50.420 --> 00:01:52.939
benchmarks, it's significantly smarter. That

00:01:52.939 --> 00:01:55.700
is the holy grail of software, isn't it? Better,

00:01:55.980 --> 00:01:58.099
faster, cheaper. Usually in engineering, you're

00:01:58.099 --> 00:02:00.099
told to pick two. If it's fast and cheap, it's

00:02:00.099 --> 00:02:03.099
usually garbage. Yeah, usually. And that is why

00:02:03.099 --> 00:02:05.719
we need to unpack this carefully. Claims like

00:02:05.719 --> 00:02:08.039
that usually come with an asterisk. So we have

00:02:08.039 --> 00:02:10.560
a roadmap for today's deep dive. Let's do it.

00:02:11.379 --> 00:02:13.930
First, we need to understand the brain. Specifically,

00:02:14.129 --> 00:02:17.150
the context window and this spooky new ability

00:02:17.150 --> 00:02:19.750
to control computers. Then we have to look at

00:02:19.750 --> 00:02:22.449
the economics. For businesses, that is the bottom

00:02:22.449 --> 00:02:24.810
line. Then we will walk through the stress tests.

00:02:25.009 --> 00:02:27.830
Everything from fake operating systems to 3D

00:02:27.830 --> 00:02:30.550
survival games. And finally, we'll look at agent

00:02:30.550 --> 00:02:33.870
mode. Where the AI takes the wheel on browser

00:02:33.870 --> 00:02:35.909
automation. Exactly. So let's start with the

00:02:35.909 --> 00:02:38.650
engine. What is going on under the hood here?

00:02:38.830 --> 00:02:41.569
What makes Sonnet 4.6 distinct? Yeah. Well,

00:02:41.569 --> 00:02:43.469
the first thing that jumps out is the sheer size

00:02:43.469 --> 00:02:45.969
of the memory space. OK. This model comes with

00:02:45.969 --> 00:02:49.090
a 1 million token context window. 1 million.

00:02:49.129 --> 00:02:51.509
We hear these numbers thrown around a lot. 32k,

00:02:51.629 --> 00:02:54.530
128k, now a million. Right. But let's ground

00:02:54.530 --> 00:02:57.569
that. What is a token, basically? A token is

00:02:57.569 --> 00:02:59.789
roughly a piece of a word. So what does a million

00:02:59.789 --> 00:03:02.710
tokens actually feel like to you as a user? Think

00:03:02.710 --> 00:03:05.409
of it this way. It's like giving the AI a very

00:03:05.409 --> 00:03:10.060
thick book or a massive, disorganized folder of code.

00:03:10.740 --> 00:03:13.699
In previous generations, the AI was like a sieve.

00:03:13.780 --> 00:03:16.560
It might read the first few chapters,

00:03:16.860 --> 00:03:18.919
but by the time you asked a question at chapter

00:03:18.919 --> 00:03:20.860
10, it already forgot the character names from

00:03:20.860 --> 00:03:24.080
chapter one. Exactly. The goldfish memory problem.

00:03:24.180 --> 00:03:26.259
You have to keep reminding it. Like, hey, remember

00:03:26.259 --> 00:03:28.680
that rule I gave you 20 minutes ago? Precisely.

00:03:29.199 --> 00:03:32.939
With a one million token window, that sieve becomes

00:03:32.939 --> 00:03:36.280
a vault. A vault. It holds that entire thick

00:03:36.280 --> 00:03:39.460
book in its head at once. Wow. It remembers the

00:03:39.460 --> 00:03:41.900
footnote on page three while working on page

00:03:41.900 --> 00:03:45.560
500. That is a huge quality of life improvement.

00:03:45.639 --> 00:03:47.500
It holds the state of the conversation perfectly.

00:03:47.659 --> 00:03:50.280
But the guide suggests the bigger shift isn't

00:03:50.280 --> 00:03:54.539
just memory, it's agency. This computer use capability.

00:03:54.960 --> 00:03:57.419
Right. Let's talk about that. This feels like

00:03:57.419 --> 00:04:00.960
a real departure. Until now, AI has been a text

00:04:00.960 --> 00:04:03.650
generator. You type, it types back. But Sonnet

00:04:03.650 --> 00:04:06.930
4.6 is different. The guide describes this as

00:04:06.930 --> 00:04:10.050
the ability to handle boring clicking tasks.

00:04:10.569 --> 00:04:13.889
Boring clicking tasks. Yeah. It can read complex

00:04:13.889 --> 00:04:16.689
spreadsheets, visually scan them, fill out long

00:04:16.689 --> 00:04:18.709
forms without getting tired. It acts almost like

00:04:18.709 --> 00:04:21.689
a real person sitting at the terminal. When you

00:04:21.689 --> 00:04:24.310
say visually scanned, does it actually see the

00:04:24.310 --> 00:04:27.410
screen? Or is it looking at the code behind the

00:04:27.410 --> 00:04:30.839
website? That is a key distinction. It is actually

00:04:30.839 --> 00:04:33.980
analyzing screenshots of the interface in real

00:04:33.980 --> 00:04:36.180
time. Really? Yeah. It looks at the buttons,

00:04:36.379 --> 00:04:39.100
the layout, the text fields, just like your eyes

00:04:39.100 --> 00:04:41.420
do. The source mentions it's great at step-by-step

00:04:41.420 --> 00:04:43.180
development. Is that just about writing

00:04:43.180 --> 00:04:46.620
code? It's much broader. The model builds a project,

00:04:46.920 --> 00:04:49.240
inspects what it made, and then fixes its own

00:04:49.240 --> 00:04:52.000
errors. As it goes. Right. It's recursive. It

00:04:52.000 --> 00:04:53.639
doesn't just shoot out an answer and hope for

00:04:53.639 --> 00:04:56.620
the best. It manages the task from start to finish.

00:04:56.980 --> 00:04:59.779
Exactly. Does this capacity for step-by-step

00:04:59.779 --> 00:05:02.720
correction fundamentally change how we interact

00:05:02.720 --> 00:05:05.620
with it? Yes. It moves from just answering questions

00:05:05.620 --> 00:05:08.439
to managing entire workflows start to finish.

00:05:08.699 --> 00:05:10.699
So we manage workflows now instead of just asking

00:05:10.699 --> 00:05:13.199
questions. Exactly. You become the architect.

00:05:13.579 --> 00:05:17.019
The AI becomes the builder. But usually, when

00:05:17.019 --> 00:05:20.540
you get an agent that can replace a junior employee's

00:05:20.540 --> 00:05:23.100
clicking capability, you expect a premium price

00:05:23.100 --> 00:05:25.680
tag. And that is where the market is getting

00:05:25.680 --> 00:05:28.779
aggressive. The guide breaks down the economics

00:05:28.779 --> 00:05:31.420
very clearly. Let's hear the numbers. For Sonnet

00:05:31.420 --> 00:05:36.540
4.6, the input cost is $3 for every 1 million

00:05:36.540 --> 00:05:41.480
tokens. OK. The output cost is $15 per 1 million

00:05:41.480 --> 00:05:44.560
tokens. Contextualize that for me. How does that

00:05:44.560 --> 00:05:47.519
stack up against the older versions? It is offering

00:05:47.519 --> 00:05:50.680
top-level smarts at a middle-level price. It

00:05:50.680 --> 00:05:53.279
is half the price of the heavy Opus model, but

00:05:53.279 --> 00:05:55.579
it runs two times faster. And speed matters.

00:05:55.920 --> 00:05:58.839
Waiting 10 seconds versus 20 seconds is the difference

00:05:58.839 --> 00:06:01.720
between staying in flow and getting totally distracted.

00:06:01.899 --> 00:06:04.060
It really is. Yeah. But let's look at the benchmarks.

00:06:04.800 --> 00:06:08.300
Specifically, SWE-bench. SWE-bench. Explain that

00:06:08.300 --> 00:06:10.939
in plain English for us. It is a test checking

00:06:10.939 --> 00:06:13.620
how well AI fixes code like a software engineer.

00:06:13.920 --> 00:06:16.040
OK. So it's not just writing a simple function.

00:06:16.139 --> 00:06:18.500
No, it's much messier. Yeah. The test gives the

00:06:18.500 --> 00:06:21.100
AI a real-world software repository and a bug

00:06:21.100 --> 00:06:23.079
description. And it has to fix it. It has to

00:06:23.079 --> 00:06:24.680
navigate the files, figure out what's broken,

00:06:24.819 --> 00:06:26.639
write the fix, and not break anything else. A

00:06:26.639 --> 00:06:29.319
test of troubleshooting. Exactly. In this test,

00:06:29.699 --> 00:06:33.660
Sonnet 4.6 scores an 80.2%. That sounds high.

00:06:33.860 --> 00:06:36.720
It is very high. It beats its big brother, the

00:06:36.720 --> 00:06:39.579
Opus 4 model. So we are seeing an inversion where

00:06:39.579 --> 00:06:41.899
the cheaper model is actually the more capable

00:06:41.899 --> 00:06:44.740
one. Exactly. You no longer pay a premium for

00:06:44.740 --> 00:06:47.600
intelligence, you pay for efficiency. So the

00:06:47.600 --> 00:06:50.139
math is half the price, twice the speed, and

00:06:50.139 --> 00:06:52.839
better performance. Impossible to ignore. Let's

00:06:52.839 --> 00:06:55.500
move to the visual tests. Numbers are one thing,

00:06:55.639 --> 00:06:58.439
but seeing what it builds is another. The guide

00:06:58.439 --> 00:07:00.959
walks through a few specific test cases. One

00:07:00.959 --> 00:07:03.360
of them was building a website for a local business,

00:07:03.800 --> 00:07:06.899
a locksmith in New York. This was a test of adherence

00:07:06.899 --> 00:07:09.480
to constraints. Right. The prompt was very specific.

00:07:09.920 --> 00:07:12.439
A white background. Dark blue buttons. To build

00:07:12.439 --> 00:07:15.399
trust. Exactly. And the key detail. An emergency

00:07:15.399 --> 00:07:18.680
service price of exactly $99. Sounds simple,

00:07:18.720 --> 00:07:21.519
but AI loves to hallucinate prices. Or they get

00:07:21.519 --> 00:07:23.839
creative and make the button purple. Right. But

00:07:23.839 --> 00:07:26.639
the result was spot on. Yeah. Professional fonts.

00:07:27.120 --> 00:07:30.120
Correct layout. Exact adherence to the price

00:07:30.120 --> 00:07:32.360
instruction. It looks like a human made it. It

00:07:32.360 --> 00:07:34.939
did. But then they pushed it harder. They asked

00:07:34.939 --> 00:07:37.959
it to simulate a macOS computer screen. Using

00:07:37.959 --> 00:07:41.000
only web code. Yes. Building a system inside

00:07:41.000 --> 00:07:43.899
a system. The first try was just okay. It was

00:07:43.899 --> 00:07:46.980
static. A painting of a computer. Right. But

00:07:46.980 --> 00:07:49.889
the tester asked it to improve. The second try

00:07:49.889 --> 00:07:52.910
was amazing. What changed? It added drag and

00:07:52.910 --> 00:07:55.730
drop folders. Wait, drag and drop? Yes. That

00:07:55.730 --> 00:07:59.029
requires complex logic. Tracking mouse position,

00:07:59.350 --> 00:08:01.550
updating elements instantly. It worked. You could

00:08:01.550 --> 00:08:03.810
change backgrounds. It built a music player with

00:08:03.810 --> 00:08:06.069
a moving time bar. I have to admit something

00:08:06.069 --> 00:08:08.389
here. I still wrestle with prompt drift

00:08:08.389 --> 00:08:11.100
myself. Oh, we all do. You ask an AI to fix one

00:08:11.100 --> 00:08:13.060
thing, it breaks three others. It forgets the

00:08:13.060 --> 00:08:15.720
button color. Yeah. Hearing it maintain the

00:08:15.720 --> 00:08:17.939
state of a music player while dragging a folder

00:08:17.939 --> 00:08:21.240
is so relieving. It shows real robustness. It

00:08:21.240 --> 00:08:23.540
handles layout well, but does it have artistic

00:08:23.540 --> 00:08:27.540
intuition? Not quite. Its SVG drawings, like

00:08:27.540 --> 00:08:31.139
butterflies and robots, are just okay. Opus is

00:08:31.139 --> 00:08:33.320
still the better artist. So it's an engineer,

00:08:33.480 --> 00:08:35.980
not a painter. Exactly. Which leads us to the

00:08:35.980 --> 00:08:39.580
gaming tests. Games are systems. Physics and

00:08:39.580 --> 00:08:42.279
logic. This is where it really flexed. First

00:08:42.279 --> 00:08:45.580
test was a marble board game. Using mouse movement

00:08:45.580 --> 00:08:49.000
to tilt a wooden board. Yes. The marble rolls

00:08:49.000 --> 00:08:52.159
faster or slower based on tilt. Black holes act

00:08:52.159 --> 00:08:55.600
as obstacles. Vector math. Understanding the

00:08:55.600 --> 00:08:58.080
connection between input and physics. And it

00:08:58.080 --> 00:09:00.740
worked instantly. The marble rolled realistically.

00:09:01.000 --> 00:09:03.539
But the next test was even bigger. A Minecraft

00:09:03.539 --> 00:09:08.070
clone. Voxelcraft. Tested using KiloCode. Huge

00:09:08.070 --> 00:09:10.870
prompt, I imagine. Massive. Health bars, food

00:09:10.870 --> 00:09:14.830
bars. Breaking and placing blocks. And deep caves

00:09:14.830 --> 00:09:17.529
under the ground. Yes. It included a proper start

00:09:17.529 --> 00:09:20.149
menu, which is rare, and it generated those deep

00:09:20.149 --> 00:09:24.049
caves. Whoa! Think about that,

00:09:24.149 --> 00:09:26.509
generating deep caves under the ground. They're

00:09:26.509 --> 00:09:28.950
crazy. The AI built a hidden part of the world

00:09:28.950 --> 00:09:31.490
just because the prompt asked for it, simulating

00:09:31.490 --> 00:09:33.570
depth we can't initially see. It really is a

00:09:33.570 --> 00:09:35.769
moment of wonder. Did the browser actually handle

00:09:35.769 --> 00:09:39.070
a full 3D survival game written by AI? Mostly

00:09:39.070 --> 00:09:41.830
yes, though the graphics were heavy. The logic

00:09:41.830 --> 00:09:44.309
held up, even if the browser struggled. So the

00:09:44.309 --> 00:09:46.629
underlying system worked flawlessly, even if

00:09:46.629 --> 00:09:48.669
rendering lagged. We're going to take a quick

00:09:48.669 --> 00:09:50.750
break for our sponsors and when we come back,

00:09:50.889 --> 00:09:53.289
we'll talk about agent mode. Sounds good. And

00:09:53.289 --> 00:09:55.570
we're back. Let's look at the agent workflow,

00:09:55.830 --> 00:09:58.649
the big data task. This is about automation,

00:09:59.389 --> 00:10:02.090
treating the AI as an autonomous agent. The prompt

00:10:02.090 --> 00:10:04.549
was a long chain of commands. Create a folder,

00:10:04.809 --> 00:10:07.629
write a Python script, open a browser. Search

00:10:07.629 --> 00:10:10.750
Google for news, save to file, build a dashboard.

00:10:10.889 --> 00:10:13.330
That is where things usually break down. The

00:10:13.330 --> 00:10:16.649
AI forgets step three. But this time, it utilized

00:10:16.649 --> 00:10:19.830
Selenium. Define Selenium for us. It's browser

00:10:19.830 --> 00:10:22.440
automation software: code that clicks buttons

00:10:22.440 --> 00:10:25.860
on websites. Usually very finicky. Very. But

00:10:25.860 --> 00:10:28.580
the AI wrote the script and executed it. Wait,

00:10:28.659 --> 00:10:31.539
it ran the code? Yes. The computer physically

00:10:31.539 --> 00:10:34.000
opened the browser by itself, typed in the search

00:10:34.000 --> 00:10:36.679
bar, found top links, populated the dashboard.

00:10:37.059 --> 00:10:39.519
Is the main advantage here capability, or is

00:10:39.519 --> 00:10:42.440
it reliability? Reliability. Unlike others that

00:10:42.440 --> 00:10:45.600
get lazy, this model finishes long lists of steps

00:10:45.600 --> 00:10:47.940
without stopping. It doesn't get lazy and quit

00:10:47.940 --> 00:10:50.559
halfway. Exactly. It follows through. If someone

00:10:50.559 --> 00:10:52.600
listening wants to actually use this, how do

00:10:52.600 --> 00:10:55.440
they get access? The official chat is free, but

00:10:55.440 --> 00:10:57.940
has limited messages. The dreaded rate limit.

00:10:58.039 --> 00:11:00.820
Yeah. There's the API, which is pay as you go.

00:11:01.059 --> 00:11:04.980
$3 in, $15 out. Right. Then there's LMSYS for

00:11:04.980 --> 00:11:08.259
free, side-by-side testing. And KiloCode. Open

00:11:08.259 --> 00:11:10.899
source platform offering $25 in free credits.

00:11:11.159 --> 00:11:14.519
Great for testing. OK, what is the tactical advice

00:11:14.519 --> 00:11:17.700
for prompting this thing? Three tips. First,

00:11:17.960 --> 00:11:20.879
be specific. Give us an example. Don't just say

00:11:20.879 --> 00:11:24.120
write an ad. The guide uses the AI Fire newsletter.

00:11:24.240 --> 00:11:27.340
OK. Specify the hook, the audience of busy office

00:11:27.340 --> 00:11:29.980
workers, and the promise to save five hours a

00:11:29.980 --> 00:11:32.879
week. Give it constraints. Second tip, iterate.

00:11:33.179 --> 00:11:35.659
Build in small pieces. Like stacking Lego blocks

00:11:35.659 --> 00:11:38.580
of beta. Log in screen first, then dashboard.

00:11:38.700 --> 00:11:41.019
It stops the AI from getting confused. And third

00:11:41.019 --> 00:11:44.000
tip: feed the memory. Use that 1 million token

00:11:44.000 --> 00:11:47.080
window, paste in whole documents. Don't be shy.

00:11:47.279 --> 00:11:49.879
Exactly. Let's recap the big idea here. We are

00:11:49.879 --> 00:11:52.080
seeing a shift toward high intelligence, low

00:11:52.080 --> 00:11:55.740
price. Yes. It excels at doing: coding, clicking,

00:11:56.419 --> 00:11:58.539
automating. Rather than just knowing facts. It

00:11:58.539 --> 00:12:01.620
lacks artistic flair for SVGs, but makes up for

00:12:01.620 --> 00:12:04.279
it with strict adherence to rules and massive

00:12:04.279 --> 00:12:07.000
memory. It's a highly skilled engineer. So here's

00:12:07.000 --> 00:12:09.720
my challenge to you listening. Try the marble

00:12:09.720 --> 00:12:13.230
game prompt. Or use KiloCode credits to build

00:12:13.230 --> 00:12:16.009
a small tool. The source mentions knowing how

00:12:16.009 --> 00:12:18.870
to use these tools reduces work stress. We are

00:12:18.870 --> 00:12:21.549
moving away from reading every line of code to

00:12:21.549 --> 00:12:23.730
orchestrating the system. It's a totally different

00:12:23.730 --> 00:12:25.950
kind of work. Focus on the what and why. Let

00:12:25.950 --> 00:12:29.110
the AI handle the how. Exactly. Thanks for diving

00:12:29.110 --> 00:12:31.289
in with us. Always a pleasure. See you in the

00:12:31.289 --> 00:12:32.009
next deep dive.