WEBVTT

00:00:00.000 --> 00:00:03.020
Okay, so we really need to pause for a second

00:00:03.020 --> 00:00:08.039
here and just absorb this reality. An AI model.

00:00:08.220 --> 00:00:10.460
Well, it recently worked autonomously for 30

00:00:10.460 --> 00:00:13.460
hours straight. And we're not talking about

00:00:13.460 --> 00:00:15.960
a small script or, you know, a quick fix. This

00:00:15.960 --> 00:00:18.000
thing was building entire software applications,

00:00:18.239 --> 00:00:20.320
setting up databases, handling the backend stuff,

00:00:20.460 --> 00:00:23.000
and even passed its own required security audits.

00:00:23.000 --> 00:00:24.600
This isn't really just a tool anymore,

00:00:24.719 --> 00:00:27.320
is it? It feels like the new baseline for...

00:00:28.640 --> 00:00:31.039
AI engineering capabilities. Yeah. Welcome to

00:00:31.039 --> 00:00:33.159
the deep dive. We are pulling straight from the

00:00:33.159 --> 00:00:36.619
latest stack of AI breakthroughs and platform

00:00:36.619 --> 00:00:39.140
updates. And honestly, the speed of change right

00:00:39.140 --> 00:00:41.740
now, it's almost overwhelming. So our mission

00:00:41.740 --> 00:00:43.640
today is basically to filter all that noise.

00:00:43.780 --> 00:00:45.960
We want to deliver like the three most critical

00:00:45.960 --> 00:00:48.060
shifts happening right now, the ones that fundamentally

00:00:48.060 --> 00:00:50.219
change how you approach your work, maybe your

00:00:50.219 --> 00:00:51.880
investment decisions, even how you think about

00:00:51.880 --> 00:00:54.460
creativity. Right. So our roadmap. First, we're

00:00:54.460 --> 00:00:56.600
going to unpack the technical stuff behind Anthropic's

00:00:56.600 --> 00:01:00.549
new staff engineer model. Then we'll look at

00:01:00.549 --> 00:01:03.210
how Microsoft is bringing this idea of vibe working,

00:01:03.429 --> 00:01:05.849
you know, just talking to the system into pretty

00:01:05.849 --> 00:01:08.689
much every office app you use. And finally, we'll

00:01:08.689 --> 00:01:11.250
dive into some specific AI prompting strategies

00:01:11.250 --> 00:01:14.489
for hopefully better, faster financial analysis.

00:01:14.890 --> 00:01:17.409
We know you need the clear insights, you know,

00:01:17.409 --> 00:01:19.829
without getting buried in information. Let's

00:01:19.829 --> 00:01:21.609
get into it. Okay. Let's kick off with segment

00:01:21.609 --> 00:01:24.500
one, the autonomous staff engineer. Uh, Claude

00:01:24.500 --> 00:01:28.200
Sonnet 4.5. The jump here is just staggering,

00:01:28.200 --> 00:01:30.900
both in what it can do and for how long. I mentioned

00:01:30.900 --> 00:01:32.260
it briefly, but yeah, this is the breakthrough

00:01:32.260 --> 00:01:34.840
model. Sonnet 4.5 feels like the most colleague

00:01:34.840 --> 00:01:37.000
-like model we've seen so far. It's a real shift,

00:01:37.000 --> 00:01:39.040
isn't it? From using a static tool for one-off

00:01:39.040 --> 00:01:41.780
tasks to, well, engaging with something that feels

00:01:41.780 --> 00:01:44.120
like an active partner. And what's really fascinating

00:01:44.120 --> 00:01:45.900
here, and this kind of speaks to the core technical

00:01:45.900 --> 00:01:49.560
challenge, is that jump in endurance. Sonnet

00:01:49.560 --> 00:01:52.500
4.5 performed autonomous coding for up to 30

00:01:52.500 --> 00:01:55.299
hours. Now, just to give some context, its predecessor,

00:01:55.540 --> 00:01:58.760
Opus 4.1, it only managed about seven hours

00:01:58.760 --> 00:02:01.879
of sustained unsupervised work. So going from

00:02:01.879 --> 00:02:04.680
seven hours to 30 in what, just a couple of months?

00:02:04.760 --> 00:02:07.980
That's not linear progress. That's serious acceleration.

00:02:08.280 --> 00:02:10.389
Incredible acceleration, yeah. But I think the

00:02:10.389 --> 00:02:12.509
real nugget here is why that duration matters

00:02:12.509 --> 00:02:14.710
so much. It's not just about running code for

00:02:14.710 --> 00:02:16.610
longer, right? It means the model is actually

00:02:16.610 --> 00:02:18.270
overcoming memory limits. It's keeping track

00:02:18.270 --> 00:02:21.050
of complex stuff across many steps. And it's

00:02:21.050 --> 00:02:23.330
preventing what people call prompt drift, you

00:02:23.330 --> 00:02:25.389
know, where the AI kind of loses focus on the

00:02:25.389 --> 00:02:28.090
original big goal over time. 30 hours implies

00:02:28.090 --> 00:02:30.610
sustained complex memory management. That's huge.

00:02:31.030 --> 00:02:34.449
Exactly. And get this. In one test run, it generated

00:02:34.449 --> 00:02:37.569
over 11,000 lines of code. It handled really

00:02:37.569 --> 00:02:40.689
complex end-to-end engineering tasks: building

00:02:40.689 --> 00:02:43.330
full apps, setting up databases, even buying domain

00:02:43.330 --> 00:02:46.550
names, like the whole product lifecycle. And companies,

00:02:46.550 --> 00:02:49.349
big ones like Canva and Cursor, they're already

00:02:49.349 --> 00:02:51.930
starting to rely on it for deep research and

00:02:51.930 --> 00:02:54.870
complicated hiring workflows. Okay, so if this

00:02:54.870 --> 00:02:57.669
model is working that long and doing that kind

00:02:57.669 --> 00:03:00.780
of complex work operating like a junior or maybe

00:03:00.780 --> 00:03:02.979
even a staff engineer, you start wondering about

00:03:02.979 --> 00:03:06.180
the cost, sure, but also the failure rate. Did

00:03:06.180 --> 00:03:08.240
it really work for 30 hours straight or was someone

00:03:08.240 --> 00:03:10.300
constantly nudging it back on track? Because

00:03:10.300 --> 00:03:13.060
if the AI is truly passing its own security audits,

00:03:13.099 --> 00:03:15.219
I mean, where does the human oversight actually

00:03:15.219 --> 00:03:17.870
start? What's left? That really hits the critical

00:03:17.870 --> 00:03:20.210
question, doesn't it? The data seems to suggest

00:03:20.210 --> 00:03:22.530
pretty minimal intervention was needed, and that's

00:03:22.530 --> 00:03:24.830
kind of the whole point. This level of autonomy

00:03:24.830 --> 00:03:27.530
means the human engineer's role, well, it fundamentally

00:03:27.530 --> 00:03:30.189
changes. You're not primarily the coder anymore.

00:03:30.270 --> 00:03:32.469
You're becoming more the validation expert, the

00:03:32.469 --> 00:03:35.069
auditor. Your job shifts maybe from creation

00:03:35.069 --> 00:03:37.930
to error management and oversight. So the new

00:03:37.930 --> 00:03:41.289
skill is agent management. Yeah. Learning how

00:03:41.289 --> 00:03:44.289
to manage these AI agents. Precisely, yeah. Okay,

00:03:44.310 --> 00:03:46.349
let's shift gears. Let's transition to another

00:03:46.349 --> 00:03:49.330
major platform update that really embodies this

00:03:49.330 --> 00:03:52.009
idea of the AI colleague, Microsoft's new agent

00:03:52.009 --> 00:03:55.710
mode and office agent, both inside 365 Copilot.

00:03:55.870 --> 00:04:00.009
Ah, right. This is Microsoft bringing that vibe

00:04:00.009 --> 00:04:03.009
-working idea, that really casual, almost conversational

00:04:03.009 --> 00:04:05.830
way of guiding things into the messy complexity

00:04:05.830 --> 00:04:09.039
of Excel and Word. The idea is you can just talk

00:04:09.039 --> 00:04:11.659
to the system, got it casually without needing

00:04:11.659 --> 00:04:13.699
complex commands. That is the core concept. Yeah.

00:04:13.840 --> 00:04:15.860
You might not need to know the exact function

00:04:15.860 --> 00:04:17.740
for a pivot table anymore. You just tell it what

00:04:17.740 --> 00:04:20.259
you need the data to actually show. We should

00:04:20.259 --> 00:04:21.639
look at the specifics, though, because there

00:04:21.639 --> 00:04:24.339
are two distinct agents here. OK, so what's the

00:04:24.339 --> 00:04:25.740
practical difference between them? How do they

00:04:25.740 --> 00:04:28.129
work? Well, first, you've got agent mode. This

00:04:28.129 --> 00:04:30.750
works inside the Office apps themselves, like

00:04:30.750 --> 00:04:33.189
Excel and Word, with PowerPoint coming soon, apparently.

00:04:33.410 --> 00:04:36.290
This lets you do continuous iterative refinement

00:04:36.290 --> 00:04:38.069
right there in the document. Think of it like

00:04:38.069 --> 00:04:42.009
this. You prompt Copilot. It spits out an initial

00:04:42.009 --> 00:04:44.730
draft of, say, a sales report. Then you tell

00:04:44.730 --> 00:04:47.050
it, OK, bump this chart's projection up by

00:04:47.050 --> 00:04:52.069
10% and change the font to Arial. It's all contextual

00:04:52.069 --> 00:04:54.870
guidance. OK, so it's kind of like having a really

00:04:54.870 --> 00:04:56.689
smart intern sitting right there next to your

00:04:56.689 --> 00:04:58.870
spreadsheet, basically ready to take notes

00:04:58.870 --> 00:05:01.709
and make changes instantly. Exactly. A very capable

00:05:01.709 --> 00:05:04.120
intern. Then there's the office agent. This is

00:05:04.120 --> 00:05:06.480
built into the main Copilot chat interface. And

00:05:06.480 --> 00:05:08.699
this one's focused specifically on, let's call

00:05:08.699 --> 00:05:11.180
it single shot generation, generating polished

00:05:11.180 --> 00:05:13.540
documents or presentations straight from a simple

00:05:13.540 --> 00:05:16.240
chat prompt. You might say, generate a five slide

00:05:16.240 --> 00:05:18.839
presentation on our Q3 marketing strategy. And

00:05:18.839 --> 00:05:21.019
boom, the output is instantly formatted. It looks

00:05:21.019 --> 00:05:23.699
professional. Early benchmarks are showing something

00:05:23.699 --> 00:05:28.160
like 57.2% accuracy on complex multi-step

00:05:28.160 --> 00:05:32.910
tasks like these. 57% accuracy. It sounds low

00:05:32.910 --> 00:05:35.529
maybe at first glance, but for really complex

00:05:35.529 --> 00:05:38.589
multi-step stuff, that's actually a pretty strong

00:05:38.589 --> 00:05:41.410
starting point. I have to admit, you know, I

00:05:41.410 --> 00:05:43.550
still wrestle with prompt drift myself sometimes

00:05:43.550 --> 00:05:46.970
when guiding these large models through complex

00:05:46.970 --> 00:05:50.589
creative work that takes multiple steps. It's

00:05:50.589 --> 00:05:52.550
tricky. It really is tricky to keep the system

00:05:52.550 --> 00:05:54.189
focused, especially if you're building out a

00:05:54.189 --> 00:05:57.279
long document. Yeah, and the accuracy gap that

00:05:57.279 --> 00:05:59.680
40-something percent, where it might miss the

00:05:59.680 --> 00:06:02.139
mark, that's where human review is still absolutely

00:06:02.139 --> 00:06:05.740
essential. But the goal is pretty clear. Automate

00:06:05.740 --> 00:06:07.699
the creation of polished documents from simple

00:06:07.699 --> 00:06:10.000
chat prompts. Which leads to the question, right,

00:06:10.139 --> 00:06:12.240
does this mean we can realistically stop building

00:06:12.240 --> 00:06:14.800
those complex Excel models and, you know, detailed

00:06:14.800 --> 00:06:17.100
word reports entirely by hand soon? The answer

00:06:17.100 --> 00:06:19.579
seems to be leaning towards yes, doesn't it?

00:06:20.009 --> 00:06:22.610
At least the grunt work, the initial heavy lifting

00:06:22.610 --> 00:06:24.990
of document creation looks like it's rapidly

00:06:24.990 --> 00:06:27.319
being automated. OK, so let's talk about using

00:06:27.319 --> 00:06:30.220
this agent power in maybe more high stakes fields.

00:06:30.339 --> 00:06:32.379
Segment three is all about using advanced AI

00:06:32.379 --> 00:06:34.779
prompts for smarter financial decisions. Right.

00:06:34.879 --> 00:06:37.699
We're seeing new resources pop up offering what

00:06:37.699 --> 00:06:39.819
they call a practical playbook of advanced AI

00:06:39.819 --> 00:06:42.100
prompts. These are designed specifically for

00:06:42.100 --> 00:06:45.259
smarter stock and crypto investing analysis.

00:06:45.540 --> 00:06:47.639
And just to be clear, this isn't about getting

00:06:47.639 --> 00:06:50.240
hot stock tips. It's really about automating

00:06:50.240 --> 00:06:53.019
the deep analysis part. Yeah, the benefit here

00:06:53.019 --> 00:06:55.399
is potentially huge because AI can drastically

00:06:55.399 --> 00:06:57.860
cut down the time you spend on fundamental research.

00:06:58.319 --> 00:07:01.680
Hours and hours of reading filings. This playbook

00:07:01.680 --> 00:07:04.160
apparently shows how to use a specific AI to

00:07:04.160 --> 00:07:07.120
get clear, almost analyst -level answers in,

00:07:07.160 --> 00:07:09.670
like, minutes a day. It's the kind of deep dive

00:07:09.670 --> 00:07:11.750
analysis that, you know, Wall Street firms spend

00:07:11.750 --> 00:07:13.649
thousands of human hours generating. OK, but

00:07:13.649 --> 00:07:15.889
how specific are these prompts? Are we just asking

00:07:15.889 --> 00:07:17.910
the AI like, hey, is Apple a good buy right now?

00:07:18.029 --> 00:07:20.029
Oh, not at all. No, no. We're talking about prompts

00:07:20.029 --> 00:07:22.149
that target really specific, often difficult

00:07:22.149 --> 00:07:24.870
to analyze sections of a company's public filings.

00:07:24.889 --> 00:07:27.970
For instance, you might prompt the AI to analyze

00:07:27.970 --> 00:07:30.329
the risk factors section in the last two quarters'

00:07:30.389 --> 00:07:34.670
10-K filings for Company X and synthesize a summary

00:07:34.670 --> 00:07:37.790
of potential legal exposures. Or maybe: conduct

00:07:37.790 --> 00:07:39.709
sentiment analysis on management's discussion

00:07:39.709 --> 00:07:41.829
of the hiring outlook in the last three earnings

00:07:41.829 --> 00:07:44.850
call transcripts. Very specific data extraction.

00:07:45.149 --> 00:07:48.269
Got it. So you're leveraging the AI as a kind

00:07:48.269 --> 00:07:51.129
of research accelerator, like stacking Lego blocks

00:07:51.129 --> 00:07:52.990
of data much faster than a human could alone.

00:07:53.230 --> 00:07:55.550
But this is finance, right? High stakes territory.

00:07:55.610 --> 00:07:58.009
Is this kind of prompt based analysis really

00:07:58.009 --> 00:08:00.569
secure enough or reliable enough for making truly

00:08:00.569 --> 00:08:02.670
serious investing decisions? I mean, we're talking

00:08:02.670 --> 00:08:05.420
about real money here. And that confidence level,

00:08:05.420 --> 00:08:08.019
that's still the key bottleneck. Absolutely. AI

00:08:08.019 --> 00:08:10.680
offers incredibly fast research and synthesis,

00:08:10.680 --> 00:08:14.300
no doubt. But it absolutely, positively requires

00:08:14.300 --> 00:08:17.980
human review for context, for confidence checks,

00:08:17.980 --> 00:08:20.800
for cross-checking against other sources before

00:08:20.800 --> 00:08:23.300
any serious capital decision is made. Think of

00:08:23.300 --> 00:08:25.699
it as a powerful co-pilot, definitely, but you

00:08:25.699 --> 00:08:28.660
are still the pilot in command. Makes sense. OK,

00:08:28.759 --> 00:08:31.199
moving on to our final section, then a sort of

00:08:31.199 --> 00:08:34.279
rapid fire roundup of global quick hits and market

00:08:34.279 --> 00:08:36.379
shifts that just show the pace of all this. Right.

00:08:36.500 --> 00:08:39.139
Let's do it. First up, a major creative breakthrough

00:08:39.139 --> 00:08:41.960
out of China. Tencent just dropped Hunyuan Image

00:08:41.960 --> 00:08:45.039
3.0. Now, this is a free image model that seems

00:08:45.039 --> 00:08:47.100
to exhibit pretty sophisticated reasoning. It

00:08:47.100 --> 00:08:50.240
can handle very long, complex prompts. And this

00:08:50.240 --> 00:08:51.679
is the really interesting part. It can apparently

00:08:51.679 --> 00:08:55.039
draw clean, legible text inside the images it

00:08:55.039 --> 00:08:57.480
generates. Wait, why is drawing clean text inside

00:08:57.480 --> 00:08:59.519
an image such a big deal for these AI models?

00:08:59.620 --> 00:09:01.259
What's the challenge there? Well, it's because

00:09:01.259 --> 00:09:03.840
these diffusion models, the tech behind most

00:09:03.840 --> 00:09:06.860
image generators, they inherently treat text

00:09:06.860 --> 00:09:08.960
kind of like visual noise when they build an

00:09:08.960 --> 00:09:10.940
image. They see the shapes of letters, sure,

00:09:11.059 --> 00:09:13.879
but not the actual coherence, the meaning. So

00:09:13.879 --> 00:09:16.120
text usually comes out looking like garbled nonsense.

00:09:16.600 --> 00:09:19.700
Hunyuan Image 3.0 actually solving that? It

00:09:19.700 --> 00:09:22.419
suggests a pretty significant leap in its understanding

00:09:22.419 --> 00:09:25.200
of semantic meaning, not just pixels. Okay, that

00:09:25.200 --> 00:09:27.080
is interesting. And here's something that really

00:09:27.080 --> 00:09:29.759
caught my eye. OpenAI is reportedly prepping

00:09:29.759 --> 00:09:32.049
a... standalone social app, kind of like TikTok

00:09:32.049 --> 00:09:35.330
in style, but powered entirely by Sora 2. Everything

00:09:35.330 --> 00:09:37.970
in the feed would be AI generated video. Whoa.

00:09:38.169 --> 00:09:40.809
I mean, just imagine scaling a fully AI generated

00:09:40.809 --> 00:09:43.590
video feed to potentially a billion daily views.

00:09:43.710 --> 00:09:45.570
That's a moment of real wonder and maybe some

00:09:45.570 --> 00:09:48.990
disruption too. Exactly. That potential for pure

00:09:48.990 --> 00:09:52.470
targeted synthetic viral content. That's what

00:09:52.470 --> 00:09:54.090
we're talking about when we discuss these new

00:09:54.090 --> 00:09:58.090
market shifts and consumption patterns. OK, also

00:09:58.090 --> 00:10:00.009
on the policy and platform front, several quick

00:10:00.009 --> 00:10:02.490
updates. California Governor Newsom just signed

00:10:02.490 --> 00:10:06.169
that landmark AI safety bill SB 53 into law.

00:10:06.330 --> 00:10:09.230
So that's official. ChatGPT has new parental

00:10:09.230 --> 00:10:11.509
controls rolling out basically immediately to

00:10:11.509 --> 00:10:14.950
all users. And internally, word is Apple is heavily

00:10:14.950 --> 00:10:17.149
testing its own internal model. It's referred

00:10:17.149 --> 00:10:20.529
to internally as Veritas or sometimes Apple GPT.

00:10:20.590 --> 00:10:22.509
So they're definitely in the game. And for tool

00:10:22.509 --> 00:10:25.090
access, it's worth mentioning App 20X again.

00:10:25.190 --> 00:10:27.070
It's a platform that lets basically anyone access

00:10:27.070 --> 00:10:29.070
open source alternative models and customize

00:10:29.070 --> 00:10:31.509
them using AI prompts. It kind of eliminates

00:10:31.509 --> 00:10:34.350
those technical hurdles that keep most non-developers

00:10:34.350 --> 00:10:36.289
from using models outside the big players like

00:10:36.289 --> 00:10:38.590
OpenAI or Google. Yeah, and if we connect this

00:10:38.590 --> 00:10:41.389
back to that Sora 2 social feed idea, the implication

00:10:41.389 --> 00:10:43.889
for just human content consumption is massive,

00:10:43.929 --> 00:10:47.110
isn't it? We could be moving from consuming mainly

00:10:47.110 --> 00:10:49.690
human -created content to interacting with purely

00:10:49.690 --> 00:10:52.029
manufactured, maybe hyper-personalized synthetic

00:10:52.029 --> 00:10:54.690
realities. Does that change everything? It feels

00:10:54.690 --> 00:10:57.570
like it could. Okay, let's try to unpack this

00:10:57.570 --> 00:11:00.190
one last time. Recap the big idea. The biggest

00:11:00.190 --> 00:11:02.230
takeaway from our deep dive today, I think, is

00:11:02.230 --> 00:11:04.509
the sheer velocity, but maybe more importantly,

00:11:04.570 --> 00:11:07.490
the direction of this change. We seem to be firmly

00:11:07.490 --> 00:11:10.929
moving past the era of just simple AI tools,

00:11:11.129 --> 00:11:13.950
you know, the basic chatbots, and really entering

00:11:13.950 --> 00:11:17.850
the age of autonomous agents. These are AIs capable

00:11:17.850 --> 00:11:21.230
of sustained, complex work, often without needing

00:11:21.230 --> 00:11:23.809
constant human hand-holding, whether that's

00:11:24.039 --> 00:11:26.419
coding an app for 30 hours or generating a complex

00:11:26.419 --> 00:11:28.539
financial report from a prompt. Yeah, absolutely.

00:11:28.720 --> 00:11:31.279
So we really encourage you, the listener, to

00:11:31.279 --> 00:11:33.460
maybe start testing Anthropic's new approach

00:11:33.460 --> 00:11:35.179
if you can, or try one of those new free image

00:11:35.179 --> 00:11:37.399
models we mentioned. Just to get a feel for the

00:11:37.399 --> 00:11:39.340
power of this kind of autonomy firsthand, it's

00:11:39.340 --> 00:11:42.059
different. But here's the final, maybe provocative

00:11:42.059 --> 00:11:44.200
thought we want to leave you with. If Claude

00:11:44.200 --> 00:11:47.059
Sonnet 4.5 can genuinely work autonomously for

00:11:47.059 --> 00:11:49.600
30 hours straight and operate effectively like

00:11:49.600 --> 00:11:52.039
a staff engineer, what is the new competitive

00:11:52.039 --> 00:11:54.399
advantage for a human working a standard 40

00:11:54.399 --> 00:11:56.940
-hour week? What does being human actually bring

00:11:56.940 --> 00:11:58.919
to the table now that an agent can't replicate

00:11:58.919 --> 00:12:02.620
or won't be able to soon? That's definitely something

00:12:02.620 --> 00:12:04.340
to think about. We'll be pondering that one too.

00:12:04.419 --> 00:12:06.039
Thanks for diving deep with us today.
