WEBVTT

00:00:00.000 --> 00:00:03.020
Okay, let's unpack this. Imagine something that

00:00:03.020 --> 00:00:06.559
scores like way off the charts, genius level,

00:00:06.719 --> 00:00:11.220
on an actual Mensa IQ test. It's brilliant with

00:00:11.220 --> 00:00:15.580
text, with logic, you know, but then you ask

00:00:15.580 --> 00:00:17.879
it to look at something, reason about what it

00:00:17.879 --> 00:00:20.539
sees, and suddenly it's, well, it's struggling,

00:00:20.760 --> 00:00:24.260
scoring way below average. Yeah. Kind of weird,

00:00:24.420 --> 00:00:26.679
right? Like, how does that even work? It's a

00:00:26.679 --> 00:00:29.280
really fascinating paradox, actually. And we've

00:00:29.280 --> 00:00:32.100
got this stack of sources today, articles, some

00:00:32.100 --> 00:00:34.539
research papers covering quite a range of things

00:00:34.539 --> 00:00:36.479
happening with AI right now. Right. From those

00:00:36.479 --> 00:00:39.000
cognitive quirks you mentioned to, you know,

00:00:39.020 --> 00:00:40.880
really practical tools people are building and

00:00:40.880 --> 00:00:43.679
even some pretty serious risks we need to talk

00:00:43.679 --> 00:00:45.939
about. Totally. So our mission for this deep

00:00:45.939 --> 00:00:47.619
dive is really just to sift through all this

00:00:47.619 --> 00:00:49.340
and pull out the important stuff, the nuggets

00:00:49.340 --> 00:00:51.399
that help you get a clearer picture of what's

00:00:51.399 --> 00:00:53.719
actually going on with AI. You know, beyond the hype.

00:00:53.840 --> 00:00:55.560
Get you informed, maybe raise a few eyebrows.

00:00:55.780 --> 00:00:57.740
Exactly. OK, let's just dive right into that

00:00:57.740 --> 00:01:00.380
first big thing the sources point out: this AI

00:01:00.380 --> 00:01:03.179
and IQ score situation. We're seeing reports that

00:01:03.179 --> 00:01:06.500
some AI models, the text-only ones. Uh-huh. Just

00:01:06.500 --> 00:01:08.959
language, no pictures. Exactly. They're nailing

00:01:08.959 --> 00:01:12.500
these standard IQ tests. Like OpenAI's o3?

00:01:12.500 --> 00:01:14.920
Right, o3, that's a big one. Yeah, it scored,

00:01:14.920 --> 00:01:18.840
like, a staggering 135 on a Mensa test. I mean,

00:01:18.840 --> 00:01:21.120
that's firmly in the genius category, way above

00:01:21.120 --> 00:01:24.359
the average human, which is like 90 to 110. And

00:01:24.359 --> 00:01:26.140
what's really striking looking at the sources

00:01:26.140 --> 00:01:28.640
is it's not just o3. You've got Claude 4 Sonnet

00:01:28.640 --> 00:01:31.920
hitting 127, Gemini 2.0 Flash Thinking

00:01:31.920 --> 00:01:36.200
right there at 126, Gemini 2.5 Pro at 124. Wow.

00:01:36.560 --> 00:01:39.319
These are consistently high scores. And the really

00:01:39.319 --> 00:01:41.920
interesting bit, the sources say the top 10 performers

00:01:41.920 --> 00:01:45.159
on these tests, all text-only models, every

00:01:45.159 --> 00:01:47.650
single one. Right. That feels a bit backwards,

00:01:47.810 --> 00:01:49.989
doesn't it? Because all the buzz is about multimodal

00:01:49.989 --> 00:01:53.209
AI, models that can see and hear and process

00:01:53.209 --> 00:01:55.370
text. Yeah, the ones that seem more human-like

00:01:55.370 --> 00:01:57.329
in their inputs. But here's where it gets really

00:01:57.329 --> 00:01:59.629
interesting and maybe a bit confusing based on

00:01:59.629 --> 00:02:01.969
what we're reading. The models that can see the

00:02:01.969 --> 00:02:04.969
multimodals. Like GPT-4o with vision. Precisely.

00:02:05.209 --> 00:02:07.629
When those models were given tasks needing structured

00:02:07.629 --> 00:02:10.129
reasoning, especially involving visual stuff,

00:02:10.389 --> 00:02:12.189
you know, things you'd expect them to be good

00:02:12.189 --> 00:02:14.810
at, they actually performed worse than average

00:02:14.810 --> 00:02:18.900
humans. Significantly worse sometimes. So this

00:02:18.900 --> 00:02:22.159
raises a big question. What is going on? Why

00:02:22.159 --> 00:02:25.219
is GPT-4o with vision scoring, what was it, 63?

00:02:25.659 --> 00:02:29.300
And Grok 3 Think with vision scoring 60. Yeah, those

00:02:29.300 --> 00:02:31.419
numbers are kind of shocking. I mean, in human

00:02:31.419 --> 00:02:33.479
terms, those scores are down in the borderline

00:02:33.479 --> 00:02:35.419
intellectual disability range. So you have this.

00:02:36.060 --> 00:02:39.280
massive kind of perplexing gap opening up. Genius

00:02:39.280 --> 00:02:42.639
text models here and struggling vision models

00:02:42.639 --> 00:02:44.319
over there, at least on these specific reasoning

00:02:44.319 --> 00:02:47.120
tasks. So what's the takeaway here? The sources

00:02:47.120 --> 00:02:49.620
seem to be pointing out this mismatch. Despite

00:02:49.620 --> 00:02:52.039
all the excitement and marketing around multimodal

00:02:52.039 --> 00:02:53.919
AI, they're just not there yet when it comes

00:02:53.919 --> 00:02:56.539
to complex logic combined with visual understanding.

00:02:56.840 --> 00:02:59.840
Right. It kind of widens that gap between the

00:02:59.840 --> 00:03:02.439
hype and today's reality. Maybe, just maybe,

00:03:02.539 --> 00:03:05.180
if your problem is pure logic or abstract reasoning,

00:03:05.400 --> 00:03:08.099
a simpler text model is actually, you know, smarter

00:03:08.099 --> 00:03:11.020
for now. Even Meta's Llama 4 Maverick, which

00:03:11.020 --> 00:03:13.819
does handle vision, the source mentioned it scored

00:03:13.819 --> 00:03:18.960
105, which is... OK, above average, but not in

00:03:18.960 --> 00:03:21.699
that elite tier with the text-only champs. Yeah.

00:03:21.819 --> 00:03:23.939
Interesting distinction. And we should remember

00:03:23.939 --> 00:03:26.939
these IQ scores, while they tell us something

00:03:26.939 --> 00:03:29.199
about a certain kind of intelligence, they're

00:03:29.199 --> 00:03:31.879
definitely not the whole story. Right. One source

00:03:31.879 --> 00:03:34.379
highlights this Apple study looking at large

00:03:34.379 --> 00:03:36.780
reasoning models. Yeah. Yeah. The big names again,

00:03:36.939 --> 00:03:40.840
o3, Claude 3.7 Sonnet, DeepSeek R1, Gemini,

00:03:40.860 --> 00:03:44.439
and found something kind of critical. They can

00:03:44.439 --> 00:03:46.710
actually just collapse. Completely. When you

00:03:46.710 --> 00:03:49.949
throw really complex problems at them. Wow. Collapse.

00:03:49.969 --> 00:03:52.289
You mean just stop working? Not even a bad answer.

00:03:52.330 --> 00:03:54.750
Just nothing. Pretty much. That's pretty eye

00:03:54.750 --> 00:03:56.530
opening. So it's not just about the peak score

00:03:56.530 --> 00:03:58.930
they can hit. It's about like robustness. Yeah.

00:03:58.949 --> 00:04:01.090
How they handle the hard stuff. Exactly. Brittleness

00:04:01.090 --> 00:04:03.389
is the word that comes to mind. And another source,

00:04:03.389 --> 00:04:06.370
they ran a bunch of coding tests on, like 14

00:04:06.370 --> 00:04:09.629
major LLMs. Oh, yeah. How'd that go? Well, they

00:04:09.629 --> 00:04:13.030
found five clear winners. Models that consistently

00:04:13.030 --> 00:04:16.040
produced good code. But they also identified

00:04:16.040 --> 00:04:19.980
models you'd probably want to avoid if you're

00:04:19.980 --> 00:04:22.319
relying on them for coding. And the interesting

00:04:22.319 --> 00:04:25.879
point was that just having "Pro" in the name doesn't

00:04:25.879 --> 00:04:28.720
guarantee better code. Performance varies wildly

00:04:28.720 --> 00:04:30.980
depending on the specific task you give it. Right.

00:04:31.079 --> 00:04:34.319
Task specific performance. Makes sense. And speaking

00:04:34.319 --> 00:04:36.920
of specific tasks, there was that mention of

00:04:36.920 --> 00:04:40.759
a YC backed startup. They built a research agent.

00:04:40.839 --> 00:04:42.579
Yeah, a frontier research agent. And it scored,

00:04:42.660 --> 00:04:46.540
what, 94.9% on OpenAI's SimpleQA benchmark?

00:04:47.199 --> 00:04:49.160
Which is all about answering questions based

00:04:49.160 --> 00:04:52.000
on provided text. High score. So it seems like

00:04:52.000 --> 00:04:54.720
when you focus an AI on one specific thing, it

00:04:54.720 --> 00:04:56.759
can get incredibly good at that thing, even if

00:04:56.759 --> 00:04:58.600
the general purpose models are still figuring

00:04:58.600 --> 00:05:00.879
things out. Yeah, specialized versus generalized

00:05:00.879 --> 00:05:03.180
intelligence. That's a key theme, I think. Okay,

00:05:03.240 --> 00:05:05.779
let's shift gears maybe. From performance and

00:05:05.779 --> 00:05:09.160
limits to more practical stuff. What's cool is

00:05:09.160 --> 00:05:11.579
seeing AI features actually starting to show

00:05:11.579 --> 00:05:13.939
up in everyday tools. Or just entirely new tools

00:05:13.939 --> 00:05:17.259
based on AI. Right, like... Google Gemini is

00:05:17.259 --> 00:05:19.839
apparently testing temporary chats. That lets

00:05:19.839 --> 00:05:22.040
you talk to it without the data being used for

00:05:22.040 --> 00:05:25.319
training. Kind of like ChatGPT's incognito mode.

00:05:25.519 --> 00:05:28.339
Yeah, that feels like a response to privacy concerns,

00:05:28.540 --> 00:05:32.000
giving users more control, which is good. And

00:05:32.000 --> 00:05:34.480
Gemini is also adding recurring tasks, you know,

00:05:34.480 --> 00:05:37.079
scheduling things. Oh, like ChatGPT already has.

00:05:37.240 --> 00:05:39.220
Exactly. Catching up on features. They might

00:05:39.220 --> 00:05:42.579
seem small, but for everyday use, setting reminders,

00:05:42.899 --> 00:05:45.129
automating little things. That's actually pretty

00:05:45.129 --> 00:05:47.189
useful. Totally. And the sources listed a

00:05:47.189 --> 00:05:49.269
whole bunch of specific tools, too. Did any jump

00:05:49.269 --> 00:05:51.250
out at you? Well, Glimpse turning photos into

00:05:51.250 --> 00:05:52.970
videos in the browser sounds kind of neat. Yeah.

00:05:53.009 --> 00:05:55.949
And Kling AI 2.1 promising faster, cheaper,

00:05:56.089 --> 00:05:58.589
better video rendering. That could be a big deal

00:05:58.589 --> 00:06:01.110
for creators. Yeah, definitely. And Moonlit for

00:06:01.110 --> 00:06:04.569
building content workflows. Fusebase AI agents

00:06:04.569 --> 00:06:07.910
for teamwork, like a smarter Notion, maybe. And

00:06:07.910 --> 00:06:10.689
Agora, an AI search engine just for e-commerce.

00:06:10.829 --> 00:06:12.730
Lots of niche applications popping up. Plus those

00:06:12.730 --> 00:06:14.689
quick hits. Apple doing live translation in Messages

00:06:14.689 --> 00:06:16.449
and FaceTime. That's pretty cool, right on your

00:06:16.449 --> 00:06:18.649
phone. Microsoft putting out a free AI video

00:06:18.649 --> 00:06:22.170
creator. Oh, the wildly easy to use one. That's

00:06:22.170 --> 00:06:24.790
the one. And then on the flip side, Anthropic

00:06:25.259 --> 00:06:28.160
quietly killing its Claude Explains blog after

00:06:28.160 --> 00:06:31.100
just a month. Oh, really? I missed that. Yeah,

00:06:31.180 --> 00:06:32.920
it just shows how fast things change. Not every

00:06:32.920 --> 00:06:35.620
idea sticks. Yeah. Even for the big players.

00:06:35.759 --> 00:06:38.680
True. The pace is just relentless. And there

00:06:38.680 --> 00:06:41.500
were mentions of resources, too, like tutorials

00:06:41.500 --> 00:06:44.300
for prompt engineering, guides for building your

00:06:44.300 --> 00:06:46.819
own AI research assistant. Yeah, it gives you

00:06:46.819 --> 00:06:48.420
a feel for what people are actively trying to

00:06:48.420 --> 00:06:50.319
do with this tech beyond just chatting with it.

00:06:50.360 --> 00:06:52.680
Right, building things. But, you know, it's not

00:06:52.680 --> 00:06:55.639
all just... Cool tools and high scores. The sources

00:06:55.639 --> 00:07:00.720
also included a really tough story. Okay. A 16

00:07:00.720 --> 00:07:02.920
-year-old boy who tragically died by suicide.

00:07:03.439 --> 00:07:06.399
Apparently, criminals used fake AI-generated

00:07:06.399 --> 00:07:10.519
nude photos to blackmail him for $3,000. Oh,

00:07:10.519 --> 00:07:14.360
man. That's horrific. AI-generated fakes. Yeah.

00:07:14.620 --> 00:07:16.620
And the FBI is warning these kinds of scams are

00:07:16.620 --> 00:07:18.939
targeting more teens. It's just a really stark,

00:07:19.120 --> 00:07:21.360
heartbreaking reminder of the potential for misuse.

00:07:21.500 --> 00:07:23.579
These tools can cause devastating real world

00:07:23.579 --> 00:07:26.819
harm. Oh, God, that's just awful. It really just

00:07:26.819 --> 00:07:30.160
it slams home the need for more awareness, right?

00:07:30.240 --> 00:07:32.980
More protection, especially for kids. Education

00:07:32.980 --> 00:07:35.199
about this stuff and frankly, consequences for

00:07:35.199 --> 00:07:37.160
the people doing it. It's not abstract anymore.

00:07:37.339 --> 00:07:39.899
Absolutely. The dark side is incredibly real

00:07:39.899 --> 00:07:44.939
and dangerous. Shifting gears completely. To

00:07:44.939 --> 00:07:47.480
the business world. OK. The sources also touched

00:07:47.480 --> 00:07:50.139
on the money side. This company, AnySphere, they

00:07:50.139 --> 00:07:54.040
just raised $900 million. Wow. $900 million.

00:07:54.319 --> 00:07:56.759
Yeah. Huge funding round. Right. Puts their valuation

00:07:56.759 --> 00:08:00.540
at nearly $10 billion. And apparently their AI

00:08:00.540 --> 00:08:02.920
tool is already bringing in $200 million a year.

00:08:03.040 --> 00:08:05.480
That's serious cash. Shows the level of belief

00:08:05.480 --> 00:08:08.160
and maybe progress in certain corners of the

00:08:08.160 --> 00:08:10.240
AI business world. Definitely. And I thought

00:08:10.240 --> 00:08:12.670
it was interesting. The Zapier CEO shared their

00:08:12.670 --> 00:08:15.129
internal chart for measuring AI fluency. Oh,

00:08:15.189 --> 00:08:18.310
yeah. What was that like? A scale from unacceptable

00:08:18.310 --> 00:08:21.550
use of AI all the way up to transformative. Kind

00:08:21.550 --> 00:08:23.550
of makes you think, doesn't it? Where do you

00:08:23.550 --> 00:08:25.529
or your company fall on that spectrum right now?

00:08:25.670 --> 00:08:27.610
Yeah, that's a good self-assessment tool. How

00:08:27.610 --> 00:08:29.509
well are you actually using this stuff? Right.

00:08:29.649 --> 00:08:32.250
Practical perspective. OK, let's wrap up with

00:08:32.250 --> 00:08:34.529
something really forward looking, kind of mind

00:08:34.529 --> 00:08:37.690
bending, actually. One source dives into this

00:08:37.690 --> 00:08:42.639
new biomolecular AI from MIT and a company called

00:08:42.639 --> 00:08:45.299
Recursion. It's called Boltz-2. Yeah, this was

00:08:45.299 --> 00:08:47.700
pretty wild. What's really fascinating is how

00:08:47.700 --> 00:08:50.080
it's aiming to speed up drug research dramatically.

00:08:50.960 --> 00:08:53.840
Okay, so Boltz-2 predicts something called binding

00:08:53.840 --> 00:08:57.179
affinity, which is basically how strongly a potential

00:08:57.179 --> 00:08:59.299
drug molecule will stick to its target in the

00:08:59.299 --> 00:09:01.720
body, like a protein involved in a disease. Right,

00:09:01.779 --> 00:09:04.000
that's crucial for a drug to work. Exactly. Getting

00:09:04.000 --> 00:09:06.360
that prediction right is key. Boltz-2 does it

00:09:06.360 --> 00:09:10.059
with, they say, physics-grade accuracy. But

00:09:10.059 --> 00:09:13.320
here's the kicker. It does it a thousand times

00:09:13.320 --> 00:09:15.940
faster than the old school computer simulations.

00:09:16.299 --> 00:09:18.399
Wait, say that again? A thousand times faster?

00:09:18.419 --> 00:09:21.419
A thousand times, yeah. Wow. That's not incremental.

00:09:21.519 --> 00:09:23.720
That's transformative speed. What exactly does

00:09:23.720 --> 00:09:26.059
it predict? Just that sticking power? No, it's

00:09:26.059 --> 00:09:28.120
more comprehensive. It's a foundation model,

00:09:28.299 --> 00:09:31.740
kind of like an LLM, but for biology. It predicts

00:09:31.740 --> 00:09:34.139
both the 3D shapes of molecules and how they

00:09:34.139 --> 00:09:36.600
bind together. It builds on their earlier model,

00:09:36.720 --> 00:09:39.470
Boltz-1, which was already seen as an open-source

00:09:39.470 --> 00:09:42.029
competitor to AlphaFold 3 for structure prediction.

00:09:42.289 --> 00:09:44.950
OK, AlphaFold was huge for protein shapes. Right.

00:09:45.070 --> 00:09:48.690
But Boltz-2 is apparently the first AI to model

00:09:48.690 --> 00:09:51.049
both the structure and the binding affinity together

00:09:51.049 --> 00:09:54.590
jointly in one go. And that combined approach

00:09:54.590 --> 00:09:57.649
seems to be the key. So it's accurate, too, not

00:09:57.649 --> 00:10:00.129
just fast. That's what the source claims. They

00:10:00.129 --> 00:10:02.389
say it matches the accuracy of really complex,

00:10:02.490 --> 00:10:05.669
slow physics-based methods and beats other methods

00:10:05.669 --> 00:10:09.269
on standard tests like OpenFE and CASP16. Those

00:10:09.269 --> 00:10:11.289
are like the Olympics for these kinds of predictions.

00:10:11.470 --> 00:10:13.570
And importantly, they apparently tested some

00:10:13.570 --> 00:10:15.610
of its predictions in the real world, prospectively,

00:10:15.710 --> 00:10:18.210
and confirmed they were strong binders. So it's

00:10:18.210 --> 00:10:21.289
validated. OK, so it's fast, accurate, validated,

00:10:21.350 --> 00:10:24.820
and built for practical use. Optimized for GPUs

00:10:24.820 --> 00:10:27.100
and stuff. Yeah, exactly. Designed for large

00:10:27.100 --> 00:10:29.440
-scale deployment. And their goal, according

00:10:29.440 --> 00:10:32.100
to the source, is pretty ambitious. They want

00:10:32.100 --> 00:10:35.519
Boltz-2 to be the go-to open platform for structure

00:10:35.519 --> 00:10:37.840
and affinity. Kind of like what AlphaFold became

00:10:37.840 --> 00:10:40.779
just for structure. Wow. If they pull that off,

00:10:40.919 --> 00:10:43.139
it could genuinely change the pace of discovering

00:10:43.139 --> 00:10:45.919
new medicines. Totally reshape it. Okay, that

00:10:45.919 --> 00:10:49.019
was quite a journey. We went from AI being a

00:10:49.019 --> 00:10:51.759
text genius, but maybe visually challenged. Uh

00:10:51.759 --> 00:10:54.620
-huh, the paradox. To spotting limitations, checking

00:10:54.620 --> 00:10:57.279
out all those new tools, facing the really serious

00:10:57.279 --> 00:11:00.299
risks. Yeah, the blackmail story was heavy. Seeing

00:11:00.299 --> 00:11:03.179
the huge money involved and then blasting off

00:11:03.179 --> 00:11:05.659
into predicting molecules a thousand times faster

00:11:05.659 --> 00:11:08.799
for drug discovery. It's a lot. It really shows

00:11:08.799 --> 00:11:11.120
the sheer breadth of AI today, doesn't it? From

00:11:11.120 --> 00:11:14.000
grappling with logic problems to simulating physics

00:11:14.000 --> 00:11:16.500
at incredible speeds. It really makes you wonder

00:11:16.500 --> 00:11:18.620
about the future. Like, how do we bridge that

00:11:18.620 --> 00:11:20.980
gap between text smarts and visual reasoning

00:11:20.980 --> 00:11:23.820
and the big one? How do we make sure these incredibly

00:11:23.820 --> 00:11:27.120
powerful tools are used for good, like finding

00:11:27.120 --> 00:11:29.580
cures and not for horrific things like those

00:11:29.580 --> 00:11:32.059
scams? And maybe this raises an even deeper question

00:11:32.059 --> 00:11:35.389
for you to think about. Given how AI is soaring

00:11:35.389 --> 00:11:37.750
in these super specialized complex areas like

00:11:37.750 --> 00:11:40.429
drug research, while still fumbling with things

00:11:40.429 --> 00:11:43.509
humans find easy, like some visual tasks, what

00:11:43.509 --> 00:11:45.570
does that split tell us about intelligence itself?

00:11:46.009 --> 00:11:47.990
You know, human intelligence versus artificial

00:11:47.990 --> 00:11:50.230
intelligence. What even is it? Definitely something

00:11:50.230 --> 00:11:50.649
to chew on.
