WEBVTT

00:00:00.000 --> 00:00:03.200
Okay, so picture this. AI isn't just your super

00:00:03.200 --> 00:00:05.240
smart friend you ask questions anymore. It's

00:00:05.240 --> 00:00:07.179
actually showing up with, you know, fully formed

00:00:07.179 --> 00:00:10.039
ideas like pitching a startup, but for research

00:00:10.039 --> 00:00:12.039
papers. Yeah, what's really fascinating here

00:00:12.039 --> 00:00:14.660
is just how quickly the AI landscape is evolving,

00:00:14.880 --> 00:00:16.920
right? It's shifting pretty dramatically, you

00:00:16.920 --> 00:00:19.600
know, from just processing information or like

00:00:19.600 --> 00:00:23.230
generating text based on prompts to... actively

00:00:23.230 --> 00:00:25.969
creating and proposing genuinely novel concepts.

00:00:26.190 --> 00:00:28.089
Exactly. It's a whole different ballgame, isn't

00:00:28.089 --> 00:00:29.649
it? And that's what we're really diving into

00:00:29.649 --> 00:00:32.890
today. This deep dive is all about these AI agents

00:00:32.890 --> 00:00:35.049
that are proposing original research ideas, kind

00:00:35.049 --> 00:00:36.689
of like they're pitching a new company concept.

00:00:36.789 --> 00:00:39.109
But yeah, for the world of science and academia.

00:00:39.799 --> 00:00:42.600
And we're pulling insights from this pretty significant

00:00:42.600 --> 00:00:45.060
recent report. It feels like a dispatch straight

00:00:45.060 --> 00:00:47.799
from, I guess, the front lines of AI development.

00:00:47.960 --> 00:00:50.799
It details how these cutting edge idea generating

00:00:50.799 --> 00:00:53.560
agents are actually being put to the test in

00:00:53.560 --> 00:00:55.920
pretty rigorous ways. Yeah, it's like getting

00:00:55.920 --> 00:00:58.479
the inside scoop on what's next. So our mission

00:00:58.479 --> 00:01:00.820
today really is to unpack this source material

00:01:00.820 --> 00:01:03.960
for you. We want to get into what these AI idea

00:01:03.960 --> 00:01:06.760
generators are actually proving capable of right

00:01:06.760 --> 00:01:09.579
now, how they were evaluated in this specific

00:01:09.579 --> 00:01:13.459
benchmark test, and maybe most importantly, what

00:01:13.459 --> 00:01:15.379
the results actually mean for you, whether you're

00:01:15.379 --> 00:01:19.060
using AI for brainstorming, for ideation in your

00:01:19.060 --> 00:01:20.620
own work, you know, whatever creative process

00:01:20.620 --> 00:01:22.319
you're involved in. Let's unpack this. Right.

00:01:22.579 --> 00:01:26.469
It's beyond just hype now. There are concrete

00:01:26.469 --> 00:01:29.730
benchmarks attempting to measure creativity and

00:01:29.730 --> 00:01:32.810
innovation in AI, which is, you know, a big step.

00:01:32.909 --> 00:01:34.790
Right. And the core of this report, you know,

00:01:34.790 --> 00:01:36.769
the main event is this test. They called it the

00:01:36.769 --> 00:01:40.010
AI Idea Bench 2025. It sounds a bit formal, maybe,

00:01:40.109 --> 00:01:42.269
but the premise is pretty cool, I think. They

00:01:42.269 --> 00:01:44.989
essentially put four leading AI idea generator

00:01:44.989 --> 00:01:47.370
agents through their paces like a competition.

00:01:47.670 --> 00:01:49.590
And crucially, they didn't just give them a broad

00:01:49.590 --> 00:01:52.290
topic and say, write something. They designed

00:01:52.290 --> 00:01:56.510
the test framework to really mimic how venture

00:01:56.510 --> 00:01:59.790
capitalists, you know, vet startups. They looked

00:01:59.790 --> 00:02:02.790
at these AI generated ideas based on three key

00:02:02.790 --> 00:02:05.090
criteria that matter in the real world, whether

00:02:05.090 --> 00:02:07.769
it's a business or research idea. Did the idea

00:02:07.769 --> 00:02:10.120
target the right problem? Was it genuinely

00:02:10.120 --> 00:02:13.300
new? That's the novelty aspect. And did it actually

00:02:13.300 --> 00:02:15.360
look like something you could realistically build

00:02:15.360 --> 00:02:17.740
or implement? That's the feasibility side. Oh,

00:02:17.860 --> 00:02:20.379
applying that startup lens to research ideas.

00:02:20.520 --> 00:02:22.479
That makes so much sense. You kind of need all

00:02:22.479 --> 00:02:25.039
three for an idea to really go anywhere, right?

00:02:25.099 --> 00:02:26.939
But here's where it gets really interesting,

00:02:27.060 --> 00:02:28.860
maybe even a little mind-bending, the ground

00:02:28.860 --> 00:02:31.580
truth they used for comparison. They didn't just

00:02:31.580 --> 00:02:33.699
have human experts guess if it was good. They

00:02:33.699 --> 00:02:36.240
compared the AI ideas against a massive data

00:02:36.240 --> 00:02:40.439
set, 3,495 brand new AI papers that were actually

00:02:40.439 --> 00:02:43.120
accepted and presented at top conferences recently.

00:02:43.400 --> 00:02:46.900
This part is absolutely key to the study's validity,

00:02:47.060 --> 00:02:49.580
I think. They were incredibly meticulous about

00:02:49.580 --> 00:02:52.949
the timing. Zero of this data. Zero of these

00:02:52.949 --> 00:02:57.469
3,495 research papers existed before the knowledge

00:02:57.469 --> 00:02:59.830
cutoff date for the models they were testing,

00:02:59.969 --> 00:03:03.110
specifically GPT-4o's cutoff date of October

00:03:03.110 --> 00:03:06.770
2023. Whoa, like zero. None of it existed

00:03:06.770 --> 00:03:09.590
anywhere for the AI to have potentially seen

00:03:09.590 --> 00:03:12.530
it during training. Exactly. Zero. This completely

00:03:12.530 --> 00:03:14.729
eliminates the possibility that the AI was just,

00:03:14.789 --> 00:03:16.590
you know, echoing its training data or variations

00:03:16.590 --> 00:03:18.289
of knowledge that already exist in the public

00:03:18.289 --> 00:03:20.490
domain before that date. This was a true test

00:03:20.490 --> 00:03:22.000
of whether it could propose... something that

00:03:22.000 --> 00:03:25.020
was genuinely post-cutoff, something new to

00:03:25.020 --> 00:03:26.939
the world after its knowledge was frozen. Okay.
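
To make that leakage guard concrete, here's a minimal Python sketch of the idea just described: keep only ground-truth papers published strictly after the model's knowledge cutoff, so nothing in the comparison set could have appeared in training. The titles, dates, and cutoff value below are invented for illustration; this is not the benchmark's actual code.

```python
from datetime import date

# Assumed cutoff for illustration only.
CUTOFF = date(2023, 10, 1)

# Hypothetical candidate ground-truth papers.
papers = [
    {"title": "Pre-cutoff survey", "published": date(2023, 6, 15)},
    {"title": "Post-cutoff method A", "published": date(2024, 2, 1)},
    {"title": "Post-cutoff method B", "published": date(2024, 5, 20)},
]

# Keep only papers published strictly after the cutoff: the model
# cannot have seen any of these during training.
ground_truth = [p for p in papers if p["published"] > CUTOFF]
print([p["title"] for p in ground_truth])
```

The pre-cutoff paper is dropped, leaving a comparison set the model provably never saw.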

00:03:27.039 --> 00:03:29.199
That really isolates its ability to generate

00:03:29.199 --> 00:03:31.080
something novel, something new, which is like

00:03:31.080 --> 00:03:33.219
the whole point of ideation, right? Yeah. How

00:03:33.219 --> 00:03:35.740
did they actually score the ideas against this

00:03:35.740 --> 00:03:37.900
unseen data? They used a pretty smart two-step

00:03:37.900 --> 00:03:40.500
scoring process. First, they looked at how well

00:03:40.500 --> 00:03:43.819
the AI's idea aligned with a real paper that

00:03:43.819 --> 00:03:45.800
actually got published in that post-cutoff data

00:03:45.800 --> 00:03:48.719
set. Does it identify the same problem? Did it

00:03:48.719 --> 00:03:50.979
propose similar approaches or experiment designs?

00:03:50.979 --> 00:03:53.300
That measured the idea's relevance and depth.

00:03:53.479 --> 00:03:56.740
OK, so how close was it to a real new paper?
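
As a rough illustration of this first scoring step, here's a toy alignment measure. The real benchmark judged motivation and experiment design far more richly; this sketch just uses word overlap (Jaccard similarity) as a stand-in, and the example strings are made up.

```python
def jaccard_alignment(idea: str, paper_abstract: str) -> float:
    """Word-overlap similarity in [0, 1] between an AI idea
    and a published paper's abstract."""
    a, b = set(idea.lower().split()), set(paper_abstract.lower().split())
    if not a or not b:
        return 0.0
    # Shared words divided by all distinct words across both texts.
    return len(a & b) / len(a | b)

idea = "use retrieval to ground language model answers"
paper = "we ground language model answers with retrieval over documents"
score = jaccard_alignment(idea, paper)
print(round(score, 2))
```

A higher score means the AI's proposal lands closer to a real post-cutoff paper's problem and approach.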

00:03:56.879 --> 00:03:59.400
Right. And second, they calculated a combined

00:03:59.400 --> 00:04:03.189
score for novelty and feasibility. And this part

00:04:03.189 --> 00:04:07.669
used citation data. Citation data. For novelty

00:04:07.669 --> 00:04:10.270
and feasibility, that feels a bit counterintuitive,

00:04:10.270 --> 00:04:12.009
doesn't it? Yeah. How does whether something

00:04:12.009 --> 00:04:14.449
is cited relate to whether it's new or doable?

00:04:14.750 --> 00:04:17.389
Well, it's used as a proxy, right? So for novelty,

00:04:17.509 --> 00:04:20.310
if an AI proposes an idea or a method and there

00:04:20.310 --> 00:04:22.529
are fewer existing papers citing similar work,

00:04:22.649 --> 00:04:25.149
that could indicate a more novel idea. It's less

00:04:25.149 --> 00:04:27.089
connected to the existing body of knowledge,

00:04:27.189 --> 00:04:29.529
you see. Okay. Fewer citations of similar things

00:04:29.529 --> 00:04:32.189
means it's maybe more novel. Got it. Makes sense.

00:04:32.350 --> 00:04:35.610
And for feasibility, looking at citation data

00:04:35.610 --> 00:04:37.949
related to the specific methods proposed in the

00:04:37.949 --> 00:04:41.569
AI idea can be a pretty powerful proxy. If the

00:04:41.569 --> 00:04:44.250
methods an AI suggests are already being adopted

00:04:44.250 --> 00:04:46.829
and cited frequently in recent research, it suggests

00:04:46.829 --> 00:04:49.410
they are more practical, more established, more

00:04:49.410 --> 00:04:52.759
buildable right now. The source notes that this

00:04:52.759 --> 00:04:55.060
citation-weighted feasibility score is actually

00:04:55.060 --> 00:04:58.389
a handy proxy for, uh, commercial traction too.

00:04:58.389 --> 00:05:00.610
It tracks methods that are already demonstrating

00:05:00.610 --> 00:05:02.829
usefulness and adoption in the field, kind of

00:05:02.829 --> 00:05:04.790
like market validation for the techniques themselves.

00:05:04.790 --> 00:05:07.290
Oh, I see. It's not just about whether the idea

00:05:07.290 --> 00:05:09.430
exists, but whether the building blocks the AI

00:05:09.430 --> 00:05:11.850
is suggesting are already proving useful in practice,

00:05:11.850 --> 00:05:14.350
like it's taking a pulse check on the technical

00:05:14.350 --> 00:05:16.329
approaches to see which ones are gaining traction

00:05:16.329 --> 00:05:19.550
and seem, well, practical. Precisely. Gives you a

00:05:19.550 --> 00:05:22.329
sense of how grounded in current, workable techniques

00:05:22.329 --> 00:05:25.629
the idea is, which is, you know, super useful. Fascinating.
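
A minimal sketch of how citation counts might be turned into the two proxies just described: fewer citations around similar prior work suggests more novelty, while heavily cited methods suggest more feasibility. The formulas here are invented for illustration; they are not the study's actual scoring.

```python
def novelty_proxy(similar_prior_citations: int) -> float:
    """Map citations of similar prior work to a (0, 1] novelty score.
    Fewer related citations -> less connected to existing knowledge
    -> treated as more novel."""
    return 1.0 / (1.0 + similar_prior_citations)

def feasibility_proxy(method_citations: list[int]) -> float:
    """Average adoption signal across the methods an idea relies on.
    Methods already cited frequently are treated as more practical."""
    if not method_citations:
        return 0.0
    # Squash each method's citation count into (0, 1), then average.
    scores = [c / (c + 10.0) for c in method_citations]
    return sum(scores) / len(scores)

# An idea whose closest prior work is barely cited, but whose methods
# are widely adopted, scores high on both proxies.
print(novelty_proxy(2))                   # few similar papers -> more novel
print(feasibility_proxy([500, 120, 80]))  # popular methods -> more feasible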

00:05:25.629 --> 00:05:28.949
So who are these agents and how do they actually

00:05:28.949 --> 00:05:31.129
perform in this benchmark? What did the results

00:05:31.129 --> 00:05:33.490
show? Yeah, so the report specifically highlighted

00:05:33.490 --> 00:05:37.029
two, AI scientist and AI researcher, mainly because

00:05:37.029 --> 00:05:39.189
they showed different strengths, which is interesting

00:05:39.189 --> 00:05:42.930
in itself. AI scientist, this one was frankly

00:05:42.930 --> 00:05:46.829
amazing at alignment. It hit a perfect 5.0 score

00:05:46.829 --> 00:05:49.889
on both the motivation for the research and the

00:05:49.889 --> 00:05:53.060
experiment design parts. A perfect 5.0. Wow.

00:05:53.139 --> 00:05:54.759
So it didn't just get the gist. It got the why

00:05:54.759 --> 00:05:58.199
and the detailed how perfectly. It really nailed

00:05:58.199 --> 00:06:00.759
the concept and the proposed execution. Yeah.

00:06:00.879 --> 00:06:03.579
So perfectly matching the core problem and the

00:06:03.579 --> 00:06:06.199
detailed plan of what a human researcher came

00:06:06.199 --> 00:06:08.680
up with and actually got published after the

00:06:08.680 --> 00:06:11.079
cutoff. That's pretty wild. And maybe unsurprisingly,

00:06:11.100 --> 00:06:13.199
given that depth of alignment with what was truly

00:06:13.199 --> 00:06:15.899
novel and published, it also topped the charts

00:06:15.899 --> 00:06:19.350
for novelty overall. So AI Scientist seems to

00:06:19.350 --> 00:06:21.850
be the agent you'd go to for generating bold,

00:06:22.069 --> 00:06:24.529
deeply aligned and potentially really novel ideas.

00:06:24.790 --> 00:06:27.029
OK, bold and novel, maybe the big swings, the

00:06:27.029 --> 00:06:28.670
breakthrough stuff. What about the other one

00:06:28.670 --> 00:06:30.470
you mentioned, AI researcher? Did it have a different

00:06:30.470 --> 00:06:33.709
profile? It did. AI Researcher scored best on

00:06:33.709 --> 00:06:36.569
feasibility per step. It's hard to give the exact

00:06:36.569 --> 00:06:39.029
number without the study's full context. But

00:06:39.029 --> 00:06:41.350
the source mentions a specific number, 17 times

00:06:41.350 --> 00:06:43.810
10 to the minus three, I think. But the point

00:06:43.810 --> 00:06:45.829
was that its ideas looked more practical, more

00:06:46.509 --> 00:06:48.509
buildable right now, compared to the others in

00:06:48.509 --> 00:06:54.149
the test. Ah, so maybe less blue sky, more grounded.

00:06:54.149 --> 00:06:57.889
One for big new concepts and one for ideas that

00:06:57.889 --> 00:07:00.449
seem more ready to actually implement or, you

00:07:00.449 --> 00:07:02.310
know, build upon today. That seems to be the clear

00:07:02.310 --> 00:07:05.209
distinction emerging from the scoring. Yeah. The

00:07:05.209 --> 00:07:07.170
citation-weighted feasibility score indicated

00:07:07.170 --> 00:07:09.449
that AI Researcher was surfacing plans or methods

00:07:09.449 --> 00:07:11.430
that align more with techniques already gaining

00:07:11.430 --> 00:07:14.089
significant adoption or maybe even commercial

00:07:14.089 --> 00:07:17.389
traction in the field. Okay, this study is super

00:07:17.389 --> 00:07:19.029
interesting from a technical standpoint, but

00:07:19.029 --> 00:07:20.490
let's translate this. What does this actually

00:07:20.490 --> 00:07:22.850
mean for, you know, me, the person trying to

00:07:22.850 --> 00:07:25.250
use AI for brainstorming or come up with new

00:07:25.250 --> 00:07:27.709
projects? What are the practical takeaways from

00:07:27.709 --> 00:07:29.290
this whole thing? All right, this is where we

00:07:29.290 --> 00:07:31.470
really get to the "so what" for you. The first

00:07:31.470 --> 00:07:33.310
big takeaway is about quality having layers.

00:07:33.870 --> 00:07:37.750
You know, just because an AI idea sounds relevant

00:07:37.750 --> 00:07:40.509
or even aligns perfectly with a problem like

00:07:40.509 --> 00:07:43.129
AI Scientist did, doesn't automatically mean

00:07:43.129 --> 00:07:45.769
the details are there to actually implement it

00:07:45.769 --> 00:07:48.810
easily. There's a clear difference between nailing

00:07:48.810 --> 00:07:51.589
the core concept and providing a truly practical,

00:07:51.829 --> 00:07:54.449
buildable plan. Yeah, like the AI Scientist

00:07:54.449 --> 00:07:57.439
got the what and the why down cold. Maybe perfectly,

00:07:57.620 --> 00:07:59.759
but maybe AI researcher was better at the how,

00:07:59.839 --> 00:08:02.199
the actual practical steps. Exactly. Think of

00:08:02.199 --> 00:08:05.199
it like maybe an architect providing a beautiful

00:08:05.199 --> 00:08:08.480
visionary drawing versus the detailed blueprints

00:08:08.480 --> 00:08:11.100
and engineering plans needed to actually construct

00:08:11.100 --> 00:08:13.360
the building. Both are necessary, right? But

00:08:13.360 --> 00:08:15.759
they represent different layers of quality or

00:08:15.759 --> 00:08:17.459
usefulness depending on what you need right now.

00:08:17.680 --> 00:08:19.699
I like that analogy. Vision versus blueprint.

00:08:19.959 --> 00:08:22.139
Okay, that makes sense. What else should we take

00:08:22.139 --> 00:08:24.860
away? Well, that citation-aware scoring approach

00:08:24.860 --> 00:08:27.639
they used. That concept itself is a valuable

00:08:27.639 --> 00:08:30.360
lens you can apply, even outside of this specific

00:08:30.360 --> 00:08:33.580
study. When an AI gives you an idea, thinking

00:08:33.580 --> 00:08:35.679
about how much related work or how many already

00:08:35.679 --> 00:08:38.419
adopted methodologies exist for it, you know,

00:08:38.419 --> 00:08:40.759
checking the citation pulse of the underlying

00:08:40.759 --> 00:08:43.799
techniques is like getting a market or technical

00:08:43.799 --> 00:08:47.460
feasibility signal. It's a quick check. If an

00:08:47.460 --> 00:08:50.440
AI suggests a method for, I don't know, analyzing

00:08:50.440 --> 00:08:53.019
customer data, I could kind of ask myself, based

00:08:53.019 --> 00:08:55.179
on what I know, are people actually using this

00:08:55.179 --> 00:08:57.639
method successfully? Is it proven? Is it gaining

00:08:57.639 --> 00:09:00.620
traction? Right. It's a way to gauge how speculative

00:09:00.620 --> 00:09:03.440
or how grounded the idea is in current practice.

00:09:03.820 --> 00:09:05.960
And then there's the critical point about testing

00:09:05.960 --> 00:09:08.799
against truly unseen data. If you're relying

00:09:08.799 --> 00:09:11.779
on AI for genuinely fresh brainstorming, for

00:09:11.779 --> 00:09:13.700
coming up with ideas that are new to you and

00:09:13.700 --> 00:09:15.720
hopefully new to the world, you really need to

00:09:15.720 --> 00:09:18.659
be wary of that training set echo. Training set

00:09:18.659 --> 00:09:20.559
echo, yeah. I like that term. It feels like the

00:09:20.559 --> 00:09:22.679
AI is just humming a tune it heard before, maybe

00:09:22.679 --> 00:09:24.840
slightly remixed, not composing something really

00:09:24.840 --> 00:09:27.840
new. Precisely. If the models you're using for

00:09:27.840 --> 00:09:30.600
brainstorming haven't been rigorously tested

00:09:30.600 --> 00:09:33.019
against information they absolutely could not

00:09:33.019 --> 00:09:36.100
have seen during training, like... In this benchmark,

00:09:36.320 --> 00:09:38.480
you can't be sure if they're generating genuinely

00:09:38.480 --> 00:09:41.000
novel ideas or just variations of things that

00:09:41.000 --> 00:09:43.299
already exist. That kind of testing is vital

00:09:43.299 --> 00:09:46.120
if novelty is really your goal. That makes total

00:09:46.120 --> 00:09:47.980
sense. You want to know if it's actually inventing

00:09:47.980 --> 00:09:49.940
something or just giving you a slightly different

00:09:49.940 --> 00:09:51.559
version of something you already know is out

00:09:51.559 --> 00:09:53.279
there. Maybe even something I already know is

00:09:53.279 --> 00:09:55.620
out there. And this really leads directly to

00:09:55.620 --> 00:09:58.919
the idea of matching the tool to the task. Don't

00:09:58.919 --> 00:10:01.120
fall into the trap of looking for one single

00:10:01.120 --> 00:10:04.940
best AI agent for all your ideation needs. Based

00:10:04.940 --> 00:10:07.279
on this study, AI Scientist seems like the one

00:10:07.279 --> 00:10:10.220
you'd use for inspiring those bold, maybe slightly

00:10:10.220 --> 00:10:13.059
more theoretical, highly novel ideas that could

00:10:13.059 --> 00:10:15.259
really push boundaries. The big breakthrough

00:10:15.259 --> 00:10:18.779
concept ideas, the real moonshots, maybe. Yeah,

00:10:18.820 --> 00:10:21.860
the big swings. Well, AI researcher. with its

00:10:21.860 --> 00:10:24.440
higher feasibility score, seems better suited

00:10:24.440 --> 00:10:26.720
for surfacing plans or ideas you could actually

00:10:26.720 --> 00:10:29.860
ship or build upon right now using methods that

00:10:29.860 --> 00:10:31.779
are already proving their worth. Right. You need

00:10:31.779 --> 00:10:33.659
to pick the agent or maybe even use different

00:10:33.659 --> 00:10:36.259
agents in different stages that fits your specific

00:10:36.259 --> 00:10:39.570
goal for that brainstorming session. What are

00:10:39.570 --> 00:10:41.909
you trying to do today? So it's less about finding

00:10:41.909 --> 00:10:44.870
the perfect AI unicorn and more about understanding

00:10:44.870 --> 00:10:47.490
the strengths of different tools and using the

00:10:47.490 --> 00:10:49.870
right one for the right job you need done today,

00:10:50.070 --> 00:10:51.870
like having different tools in your toolbox.

00:10:52.190 --> 00:10:54.210
Exactly. And, you know, this applies whether

00:10:54.210 --> 00:10:56.789
you're in academic research or, as the source

00:10:56.789 --> 00:10:59.090
mentions, on a marketing team needing campaign

00:10:59.090 --> 00:11:01.990
ideas, a product team brainstorming new features,

00:11:02.090 --> 00:11:04.690
or even an investment team evaluating potential

00:11:04.690 --> 00:11:08.159
market gaps. When you turn to generative AI for

00:11:08.159 --> 00:11:11.480
ideation, explicitly apply this two -step filter

00:11:11.480 --> 00:11:14.379
derived from this study. First, is the idea truly

00:11:14.379 --> 00:11:16.399
on target for the problem you're trying to solve?

00:11:16.539 --> 00:11:19.139
Okay, step one, relevance. And second, can it

00:11:19.139 --> 00:11:21.039
actually be built or implemented with reasonable

00:11:21.039 --> 00:11:25.740
effort? Step two, feasibility. Is it relevant

00:11:25.740 --> 00:11:30.820
and is it doable? Right. Using that filter consistently

00:11:30.820 --> 00:11:33.940
helps you cut through the noise of maybe brilliant

00:11:33.940 --> 00:11:37.039
sounding but impractical ideas faster and helps

00:11:37.039 --> 00:11:39.519
you spot potentially winning actionable ideas

00:11:39.519 --> 00:11:42.159
more efficiently. It gives you a framework. That's
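
That two-step filter can be sketched as a tiny gate. The `Idea` fields and thresholds below are made up for the example; the point is just the shape: check relevance first, then feasibility, and only ideas clearing both are worth deeper evaluation.

```python
from dataclasses import dataclass

@dataclass
class Idea:
    description: str
    relevance: float    # 0-1: how on-target for the problem (assumed rating)
    feasibility: float  # 0-1: how buildable with reasonable effort (assumed rating)

def passes_filter(idea: Idea, min_relevance: float = 0.7,
                  min_feasibility: float = 0.5) -> bool:
    """Step 1: is it on target? Step 2: is it doable?"""
    return (idea.relevance >= min_relevance
            and idea.feasibility >= min_feasibility)

ideas = [
    Idea("brilliant-sounding but impractical", 0.9, 0.2),
    Idea("on-target and buildable", 0.8, 0.7),
    Idea("feasible but off-topic", 0.3, 0.9),
]
shortlist = [i.description for i in ideas if passes_filter(i)]
print(shortlist)
```

Only the on-target, buildable idea survives; the other two are exactly the noise the filter is meant to cut.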

00:11:42.159 --> 00:11:43.879
super practical. It gives you like a concrete

00:11:43.879 --> 00:11:46.519
framework to evaluate what AI spits out instead

00:11:46.519 --> 00:11:49.360
of just feeling overwhelmed or unsure if the

00:11:49.360 --> 00:11:51.720
idea is any good or just noise. It does. It moves

00:11:51.720 --> 00:11:55.419
you from passively receiving AI output to actively

00:11:55.419 --> 00:11:57.539
and critically evaluating it, like a good investor

00:11:57.539 --> 00:11:59.720
or good editor, really. Okay, so that study gives

00:11:59.720 --> 00:12:02.679
us a deep look at AI's ideation capability, which

00:12:02.679 --> 00:12:05.480
is, you know, a huge step. But the source also touches

00:12:05.480 --> 00:12:07.899
on how AI is impacting other areas right now,

00:12:07.899 --> 00:12:10.019
showing the breadth of development. Let's just

00:12:10.019 --> 00:12:11.860
take a quick look around the landscape it sketches

00:12:11.860 --> 00:12:13.980
out, maybe touch on a few highlights. Yeah, it's

00:12:13.980 --> 00:12:15.919
good to see these specific examples to ground

00:12:15.919 --> 00:12:18.580
the broader trends we discuss. Gives it context.

00:12:19.129 --> 00:12:21.470
Totally. Like Amazon's apparently building this

00:12:21.470 --> 00:12:26.750
big AI brain for its warehouse robots called

00:12:26.750 --> 00:12:29.350
Proteus, I think. It sounds like it'll let the

00:12:29.350 --> 00:12:31.929
robots follow plain English orders from humans,

00:12:32.090 --> 00:12:34.409
which is kind of wild to think about. Plus, new

00:12:34.409 --> 00:12:36.509
models like Wellspring for optimizing delivery

00:12:36.509 --> 00:12:39.330
routes and an upgraded Sculpt model for stocking

00:12:39.330 --> 00:12:41.830
shelves smarter. Yeah, that's AI moving directly

00:12:41.830 --> 00:12:44.129
into complex physical operations and logistics,

00:12:44.470 --> 00:12:47.789
aiming for massive efficiency gains. It's not

00:12:47.789 --> 00:12:49.919
just software anymore. It's interacting with the

00:12:49.919 --> 00:12:53.419
physical world much more. And AlphaFold3. Isomorphic

00:12:53.419 --> 00:12:55.539
Labs, which is part of DeepMind, you know, says

00:12:55.539 --> 00:12:57.940
it can now predict the structure of protein interactions

00:12:57.940 --> 00:13:00.159
with other molecules, not just single proteins.

00:13:00.360 --> 00:13:02.659
This could potentially unlock way faster drug

00:13:02.659 --> 00:13:04.580
design, even for diseases that were previously

00:13:04.580 --> 00:13:07.200
considered undruggable. That's pretty huge for

00:13:07.200 --> 00:13:09.419
medicine and biotech. Oh, absolutely. Predicting

00:13:09.419 --> 00:13:11.600
molecular interactions beyond just protein folding

00:13:11.600 --> 00:13:14.379
has been a major barrier for a long time. AlphaFold's

00:13:14.379 --> 00:13:17.259
progress here is genuinely a significant potential

00:13:17.259 --> 00:13:20.120
accelerator for pharmaceutical discovery if it

00:13:20.120 --> 00:13:23.840
holds up. Big if, but huge potential. And OpenAI

00:13:23.840 --> 00:13:25.799
still growing like crazy, apparently hitting

00:13:25.799 --> 00:13:28.639
3 million paying business users now. That's

00:13:28.639 --> 00:13:31.399
a lot. And they're adding features aimed at making

00:13:31.399 --> 00:13:34.500
ChatGPT more useful in daily work, like connectors,

00:13:34.639 --> 00:13:36.399
which sounds like it integrates with things like

00:13:36.399 --> 00:13:39.860
Google Drive, maybe, and this record mode in

00:13:39.860 --> 00:13:41.820
the app for summarizing meetings you record.

00:13:42.480 --> 00:13:46.399
Those features really signal a move towards embedding

00:13:46.399 --> 00:13:50.600
AI deeper into core business workflows and knowledge

00:13:50.600 --> 00:13:52.799
management. They want it to be an indispensable

00:13:52.799 --> 00:13:55.200
tool for daily productivity, not just something

00:13:55.200 --> 00:13:57.299
you chat with occasionally. There's even a policy

00:13:57.299 --> 00:14:00.899
angle mentioned. Anthropic CEO Dario Amodei apparently

00:14:00.899 --> 00:14:04.000
called for a national AI transparency law, not

00:14:04.000 --> 00:14:06.080
freeze on development, importantly, but saying

00:14:06.080 --> 00:14:08.080
we need transparency about these models. That

00:14:08.080 --> 00:14:10.259
raises an important ongoing discussion about

00:14:10.259 --> 00:14:12.659
how societies and governments will oversee and

00:14:12.659 --> 00:14:15.700
regulate powerful AI models as they become more

00:14:15.700 --> 00:14:17.500
integrated into our infrastructure and decision

00:14:17.500 --> 00:14:20.480
making. Transparency is often seen as a key part

00:14:20.480 --> 00:14:23.279
of responsible development, building trust. And

00:14:23.279 --> 00:14:26.700
a cool safety application. Volvo's upcoming EX60

00:14:26.700 --> 00:14:30.320
electric vehicle will have this AI-driven

00:14:30.320 --> 00:14:33.460
multi-adaptive seatbelt system. It uses sensors and

00:14:33.460 --> 00:14:36.639
AI to adjust based on something like 11 different

00:14:36.639 --> 00:14:38.820
crash profiles in real time to try and protect

00:14:38.820 --> 00:14:42.080
passengers better. Using AI for real-time predictive

00:14:42.080 --> 00:14:44.320
safety adjustments in a physical product like

00:14:44.320 --> 00:14:47.070
a car seatbelt. That's a pretty concrete and

00:14:47.070 --> 00:14:49.809
potentially impactful application. Very tangible

00:14:49.809 --> 00:14:52.460
benefit. And on the business side, this new company,

00:14:52.620 --> 00:14:55.240
Shield Technology Partners, just got $100 million

00:14:55.240 --> 00:14:58.519
in funding to launch an AI-enabled managed IT

00:14:58.519 --> 00:15:02.059
services platform. Their strategy is to use shared

00:15:02.059 --> 00:15:04.600
AI agents to automate a lot of the routine IT

00:15:04.600 --> 00:15:07.779
support tasks that bog down human teams. Leveraging

00:15:07.779 --> 00:15:10.100
AI for service delivery and automation, especially

00:15:10.100 --> 00:15:12.539
in areas like IT support, where tasks are often

00:15:12.539 --> 00:15:14.899
repetitive but require specific knowledge, makes

00:15:14.899 --> 00:15:17.000
a lot of sense for scaling expertise and improving

00:15:17.000 --> 00:15:18.759
efficiency. You'll probably see a lot more of

00:15:18.759 --> 00:15:21.230
that. And just two quick cool tools we've got

00:15:21.230 --> 00:15:23.789
to mention. Eleven Labs dropped version three

00:15:23.789 --> 00:15:26.730
of their platform for even more expressive text

00:15:26.730 --> 00:15:29.269
to speech. That includes emotion tags, which

00:15:29.269 --> 00:15:32.269
is pretty wild for TTS realism, and chat for

00:15:32.269 --> 00:15:34.429
data, which apparently lets you scrape web pages

00:15:34.429 --> 00:15:36.669
just using plain language prompts, which sounds

00:15:36.669 --> 00:15:39.669
way easier than coding it yourself. Yeah, tools

00:15:39.669 --> 00:15:41.730
like those continue to lower the barrier to entry

00:15:41.730 --> 00:15:45.649
for using AI for specific complex tasks, making

00:15:45.649 --> 00:15:48.370
things like creating expressive audio or extracting

00:15:48.370 --> 00:15:50.620
web data accessible to a much wider audience,

00:15:50.779 --> 00:15:53.740
democratizing the tech in a way. So, wow, okay,

00:15:53.799 --> 00:15:56.000
putting it all together, a lot happening. Okay,

00:15:56.080 --> 00:15:57.899
so let's maybe try and unpack the bigger picture

00:15:57.899 --> 00:16:00.179
from this deep dive. The main takeaway is pretty

00:16:00.179 --> 00:16:02.940
stark, right? AI has genuinely moved way beyond

00:16:02.940 --> 00:16:04.820
just answering your search queries or writing

00:16:04.820 --> 00:16:06.879
simple emails. It's now actively stepping into

00:16:06.879 --> 00:16:09.500
the creative space, generating ideas, even complex

00:16:09.500 --> 00:16:11.779
ones like potential research proposals that,

00:16:11.879 --> 00:16:14.960
based on this study, can stack up against novel

00:16:14.960 --> 00:16:17.830
human-written papers. Yeah, it's not just summarizing

00:16:17.830 --> 00:16:20.049
the past. It's helping sketch out the future

00:16:20.049 --> 00:16:22.950
in a way. And, you know, just like investors

00:16:22.950 --> 00:16:25.889
vet startups or like this AI idea bench study

00:16:25.889 --> 00:16:28.450
scored these agents, we really need robust ways,

00:16:28.549 --> 00:16:30.210
maybe that two-step filter we talked about,

00:16:30.309 --> 00:16:33.230
to evaluate the quality of these AI-generated

00:16:33.230 --> 00:16:35.590
ideas ourselves. Looking at things like genuine

00:16:35.590 --> 00:16:38.169
novelty and practical feasibility is absolutely

00:16:38.169 --> 00:16:40.289
crucial to cut through the noise. Absolutely.

00:16:40.450 --> 00:16:44.230
An idea needs to not just sound good or be statistically

00:16:44.230 --> 00:16:47.799
novel. It needs to have legs. It needs to be potentially

00:16:47.799 --> 00:16:50.360
achievable. And that means applying those filters

00:16:50.360 --> 00:16:53.330
Is it on target for the problem? And can it actually

00:16:53.330 --> 00:16:55.549
be built or implemented with reasonable resources?

00:16:55.909 --> 00:16:57.970
So here's something, I guess, to leave you thinking

00:16:57.970 --> 00:16:59.750
about, something to chew on after hearing all

00:16:59.750 --> 00:17:02.769
this. Given that AI can now generate ideas that

00:17:02.769 --> 00:17:05.130
actually align with and stack up against human

00:17:05.130 --> 00:17:07.150
written research papers published in top venues,

00:17:07.410 --> 00:17:09.750
how should we even begin to rethink the whole

00:17:09.750 --> 00:17:12.109
process of creative ideation itself? Does it

00:17:12.109 --> 00:17:15.009
fundamentally change things? This raises an important

00:17:15.009 --> 00:17:17.210
question for all of us, doesn't it? In a world

00:17:17.210 --> 00:17:19.690
where AI is becoming a constant potential

00:17:19.690 --> 00:17:22.009
co-creator, how do we distinguish between ideas

00:17:22.009 --> 00:17:25.109
that are merely novel, maybe novel just based

00:17:25.109 --> 00:17:27.190
on recombining its training data in a clever

00:17:27.190 --> 00:17:30.309
way versus those that are truly impactful, truly

00:17:30.309 --> 00:17:32.490
achievable and meaningful in the real world?

00:17:33.009 --> 00:17:35.670
Where's the real insight versus just clever pattern

00:17:35.670 --> 00:17:38.809
mashing? Right. Like what kind of human oversight,

00:17:38.990 --> 00:17:41.509
what kind of human evaluation and curation becomes

00:17:41.509 --> 00:17:43.549
the most crucial part of the process when AI

00:17:43.549 --> 00:17:46.569
is doing so much of the initial generation and

00:17:46.569 --> 00:17:49.190
heavy lifting on concepts? What's our role now?

00:17:49.430 --> 00:17:52.230
It fundamentally shifts our role, perhaps, from

00:17:52.230 --> 00:17:54.970
being the sole generators of ideas to becoming

00:17:54.970 --> 00:17:57.269
expert curators, evaluators, maybe strategic

00:17:57.269 --> 00:18:00.150
prompters and refiners of AI generated concepts.

00:18:00.809 --> 00:18:03.690
Our value might move up the chain, so to speak.

00:18:04.210 --> 00:18:05.789
Definitely something to think about the next

00:18:05.789 --> 00:18:07.529
time you sit down to brainstorm, whether it's

00:18:07.529 --> 00:18:09.809
with AI helping out or just you and a whiteboard.
