WEBVTT

00:00:00.000 --> 00:00:02.459
Welcome to the Deep Dive. We sift through sources

00:00:02.459 --> 00:00:04.200
to bring you the key insights, the important

00:00:04.200 --> 00:00:07.059
stuff. And today, well, we're tackling something

00:00:07.059 --> 00:00:09.580
many of us have strong opinions on, standardized

00:00:09.580 --> 00:00:12.560
testing. But what if the whole idea was maybe

00:00:12.560 --> 00:00:15.160
flawed right from the get-go? There's this question

00:00:15.160 --> 00:00:17.539
Peter Greene asked over at Forbes: is the big

00:00:17.539 --> 00:00:21.170
standardized test a big standardized flop? And

00:00:21.170 --> 00:00:23.870
our main source today, an article by Gary Ackerman

00:00:23.870 --> 00:00:27.250
from back in September 2018 on hackscience.education,

00:00:27.609 --> 00:00:29.829
it gives a pretty clear answer, a definite yes.

00:00:30.570 --> 00:00:32.149
Now, lots of reasons have been thrown around

00:00:32.149 --> 00:00:33.829
for why these tests haven't worked out as hoped,

00:00:34.210 --> 00:00:37.170
but the source, it goes a bit deeper. It argues

00:00:37.170 --> 00:00:39.130
it was actually doomed to fail from the very

00:00:39.130 --> 00:00:41.670
beginning. Why? Because it was built on what

00:00:41.670 --> 00:00:44.920
the author calls an untenable foundation. Okay,

00:00:44.979 --> 00:00:47.320
let's unpack this. The basic problem, according

00:00:47.320 --> 00:00:48.960
to Ackerman, seemed to be trying to shoehorn

00:00:48.960 --> 00:00:51.600
a scientific lab experiment model onto, well,

00:00:51.799 --> 00:00:54.280
messy, complex human education. The article kicks

00:00:54.280 --> 00:00:56.320
off by looking at why standardized testing and

00:00:56.320 --> 00:00:58.000
this whole accountability movement even started.

00:00:58.500 --> 00:01:00.359
The basic argument was, you know, the public

00:01:00.359 --> 00:01:02.500
should only pay for what works in schools. And

00:01:02.500 --> 00:01:05.480
how do we know what works? Well, we measure it.

00:01:05.900 --> 00:01:08.299
Seems simple enough, right? Especially if you've

00:01:08.299 --> 00:01:10.969
got a science background. Oh, absolutely. That

00:01:10.969 --> 00:01:13.310
approach feels very familiar if you've spent

00:01:13.310 --> 00:01:17.469
time in, say, the natural sciences. Science often

00:01:17.469 --> 00:01:20.189
uses this pretty straightforward method for finding

00:01:20.189 --> 00:01:22.689
answers. You set up an experiment. You control

00:01:22.689 --> 00:01:24.930
everything in the environment, keep it all constant,

00:01:25.150 --> 00:01:28.569
except for just one single thing, one variable.

00:01:29.129 --> 00:01:31.469
Then you change that one variable for a specific

00:01:31.469 --> 00:01:33.609
group. That's your treatment group. After you

00:01:33.609 --> 00:01:35.510
make the change, you measure. You look for any

00:01:35.510 --> 00:01:37.590
growth or change you're interested in. And here's

00:01:37.590 --> 00:01:40.829
the key: any change you see, you can confidently

00:01:40.829 --> 00:01:43.530
say it was caused only by that one thing you

00:01:43.530 --> 00:01:46.129
changed, because everything else was held steady.

00:01:46.269 --> 00:01:48.010
The source gives a really good example, actually,

00:01:48.189 --> 00:01:50.609
from botany. Imagine a student, maybe an undergrad,

00:01:51.290 --> 00:01:53.590
growing legumes hydroponically under grow lights,

00:01:53.790 --> 00:01:55.870
very controlled. They'd manage everything meticulously,

00:01:55.989 --> 00:01:57.489
the nutrient mix, the substrate they're growing

00:01:57.489 --> 00:01:59.170
in, they'd even randomize the seeds, make sure

00:01:59.170 --> 00:02:00.489
all the plants are right next to each other,

00:02:00.709 --> 00:02:04.269
total control. Then, for just one group of those

00:02:04.269 --> 00:02:05.890
plants, the treatment group, they add something

00:02:05.890 --> 00:02:08.509
tiny. Let's say... trace amounts of heavy metals

00:02:08.509 --> 00:02:11.009
to the nutrient water. Now, if they then observe

00:02:11.009 --> 00:02:14.550
fewer nodules (those are the little root growths

00:00:14.550 --> 00:00:16.669
that grab nitrogen from the air), if they see fewer

00:02:16.669 --> 00:02:18.930
nodules only on that group. Well, the conclusion

00:02:18.930 --> 00:02:21.530
is pretty inescapable. The heavy metals caused

00:02:21.530 --> 00:02:23.330
it. Nothing else could have. And that really

00:02:23.330 --> 00:02:25.949
gets to the heart of it. These experiments, they're

00:02:25.949 --> 00:02:27.810
specifically designed to remove the environment

00:02:27.810 --> 00:02:30.650
as a factor. You're stripping away all the complexity

00:02:30.650 --> 00:02:33.550
to isolate one single cause and its effect. OK,

00:02:33.550 --> 00:02:36.539
that makes total sense. In a lab, where you can

00:02:36.539 --> 00:02:39.020
control light, water, nutrients, everything.

00:02:39.979 --> 00:02:43.219
But taking that idea and applying it to education,

00:02:43.979 --> 00:02:46.719
to kids in a classroom, that feels like a huge

00:02:46.719 --> 00:02:49.900
leap. So here's where it gets really interesting,

00:02:50.479 --> 00:02:53.520
or maybe really problematic. What happens when

00:02:53.520 --> 00:02:55.680
you try to map that super controlled science

00:02:55.680 --> 00:02:58.930
model onto a school? That is precisely the issue

00:02:58.930 --> 00:03:01.909
Ackerman raises. It's the fundamental flaw. Students,

00:03:01.990 --> 00:03:04.689
as the source puts it, they live in a rich and

00:03:04.689 --> 00:03:06.689
variable environment. Rich and variable. Yeah,

00:03:06.810 --> 00:03:10.099
that sounds about right. It's not just a small

00:03:10.099 --> 00:03:12.919
detail you can tweak. It's basically an insurmountable

00:03:12.919 --> 00:03:16.099
barrier for that specific experimental model.

00:03:16.539 --> 00:03:18.340
You just cannot, and you wouldn't even want to,

00:03:18.340 --> 00:03:20.819
control all of the variables that affect how

00:03:20.819 --> 00:03:23.319
they attend, engage with, and learn in school.

00:03:23.479 --> 00:03:24.879
Great. Think about everything that affects a

00:03:24.879 --> 00:03:27.219
student on any given day. Exactly. Their home

00:03:27.219 --> 00:03:29.520
life, what resources they have access to, how

00:03:29.520 --> 00:03:31.800
they're feeling emotionally, their health. Did

00:03:31.800 --> 00:03:33.719
they sleep well? Are they interested? What did

00:03:33.719 --> 00:03:36.129
they eat for breakfast? All these things, they're

00:03:36.129 --> 00:03:38.509
huge variables, constantly changing. Impossible

00:03:38.509 --> 00:03:41.509
to control. Totally impossible. And because we

00:03:41.509 --> 00:03:44.069
can't possibly control or even account for all

00:03:44.069 --> 00:03:46.750
those factors in a student's environment, well,

00:03:46.810 --> 00:03:49.509
then we have no scientifically rigorous way,

00:03:49.729 --> 00:03:52.949
using that model, to say that this specific teaching

00:03:52.949 --> 00:03:56.009
method caused that specific change in test scores.

00:03:56.610 --> 00:03:59.469
The link, that direct cause and effect, it just

00:03:59.469 --> 00:04:02.310
gets completely lost in the noise. It's untraceable.

00:04:02.449 --> 00:04:05.689
Wow. So the entire premise, the very foundation

00:04:05.689 --> 00:04:08.530
of using standardized tests in this way for accountability

00:04:08.530 --> 00:04:11.810
was shaky from day one. That's a pretty big claim.

00:04:12.050 --> 00:04:13.770
It suggests it wasn't just about choosing the

00:04:13.770 --> 00:04:16.069
wrong tests or bad implementation. It was the

00:04:16.069 --> 00:04:19.529
core concept itself that didn't fit the reality

00:04:19.529 --> 00:04:22.110
of learning. Now, the source does mention, you

00:04:22.110 --> 00:04:24.009
know, the other common criticisms people have,

00:04:24.230 --> 00:04:25.930
things like, what are these tests really measuring?

00:04:26.050 --> 00:04:27.810
Are the constructs even well-defined? Questions

00:04:27.810 --> 00:04:30.290
about validity, reliability, how the data is

00:04:30.290 --> 00:04:32.730
collected, ethical concerns, those all get a

00:04:32.730 --> 00:04:34.389
nod. Right, the author acknowledges those are

00:04:34.389 --> 00:04:36.569
out there. But the main thrust, the thing this

00:04:36.569 --> 00:04:38.470
article really hammers home and what we're focusing

00:04:38.470 --> 00:04:42.569
on, is that initial fundamental flaw, the misapplication

00:04:42.569 --> 00:04:45.790
of the experimental design. Exactly. What's fascinating

00:04:45.790 --> 00:04:48.129
here is the argument that you don't even need

00:04:48.129 --> 00:04:50.509
to get bogged down in debating those other details.

00:04:50.930 --> 00:04:53.290
The core idea, the attempt to treat education

00:04:53.290 --> 00:04:56.029
like a controlled botany experiment, was flawed

00:04:56.029 --> 00:04:58.750
from the start. The foundation was just untenable,

00:04:58.930 --> 00:05:00.949
as the author says. Built on sand, essentially.

00:05:01.009 --> 00:05:03.389
Pretty much. And the article touches on the human

00:05:03.389 --> 00:05:05.870
side of this too, doesn't it? The cost. It makes

00:05:05.870 --> 00:05:08.430
the point. It is unfortunate that a generation

00:05:08.430 --> 00:05:11.519
of students suffered while we pursued this approach

00:05:11.519 --> 00:05:14.220
that was perhaps fundamentally mismatched to

00:05:14.220 --> 00:05:16.600
the task. Yeah, that's a sobering thought. So

00:05:16.600 --> 00:05:18.459
wrapping this up, the key takeaway for you listening

00:05:18.459 --> 00:05:22.459
is this. According to this source, the big standardized

00:05:22.459 --> 00:05:25.439
test didn't just stumble. It potentially failed

00:05:25.439 --> 00:05:27.800
because its very premise was flawed. It was built

00:05:27.800 --> 00:05:31.040
on that untenable foundation, trying to use a

00:05:31.040 --> 00:05:33.240
scientific model designed for tightly controlled

00:05:33.240 --> 00:05:36.540
labs and applying it to the messy, complex, wonderful,

00:05:36.800 --> 00:05:39.139
rich, and variable world where actual humans

00:05:39.139 --> 00:05:41.930
learn. Mm-hmm. And if we sort of connect this

00:05:41.930 --> 00:05:44.529
to the bigger picture, really grasping this foundational

00:05:44.529 --> 00:05:47.529
issue, it changes how you might think about evaluating

00:05:47.529 --> 00:05:50.930
education entirely. It pushes you beyond just

00:05:50.930 --> 00:05:53.430
critiquing a test, you know? It makes you question

00:05:53.430 --> 00:05:56.149
the whole framework we use to decide what works

00:05:56.149 --> 00:05:59.029
in a system as complex as learning. Which

00:05:59.029 --> 00:06:00.970
definitely leaves us with a big question to chew

00:06:00.970 --> 00:06:04.160
on, doesn't it? If that strict scientific experimental

00:06:04.160 --> 00:06:06.660
model, the one described here, really can't be

00:06:06.660 --> 00:06:09.180
neatly applied to measure what works in education,

00:06:09.439 --> 00:06:11.740
then how do we genuinely understand and foster

00:06:11.740 --> 00:06:14.120
better learning in this incredibly rich and variable

00:06:14.120 --> 00:06:16.660
world we all live in? Something to think about.

00:06:16.779 --> 00:06:18.120
Thanks for joining us for this deep dive.
