WEBVTT

00:00:00.000 --> 00:00:03.060
Imagine this. You're an experienced surgeon.

00:00:03.299 --> 00:00:05.820
You're looking at an open tibial fracture, one

00:00:05.820 --> 00:00:09.400
of the most common severe injuries really. You're

00:00:09.400 --> 00:00:11.380
asked to classify it using a standard system,

00:00:11.539 --> 00:00:13.960
something you've done, well, countless times.

00:00:14.019 --> 00:00:17.000
Yes, standard practice. Now imagine several of

00:00:17.000 --> 00:00:18.899
your equally experienced colleagues are doing

00:00:18.899 --> 00:00:21.559
the exact same thing for the very same fracture.

00:00:22.199 --> 00:00:24.570
How often do you think you'd all agree on the

00:00:24.570 --> 00:00:27.070
precise classification? You'd hope pretty often,

00:00:27.190 --> 00:00:28.469
wouldn't you? You would, wouldn't you? Maybe

00:00:28.469 --> 00:00:31.769
90%, 100%. Well, astonishingly, research has

00:00:31.769 --> 00:00:34.509
shown that even among highly experienced orthopedic

00:00:34.509 --> 00:00:37.490
trauma surgeons, using a widely accepted standard

00:00:37.490 --> 00:00:41.490
system, agreement can be as low as 60%. 60%.

00:00:41.490 --> 00:00:44.109
That's quite low. It really is. Think about just

00:00:44.109 --> 00:00:46.609
classifying what the injury is. That presents

00:00:46.609 --> 00:00:48.509
a significant challenge. And that's just the

00:00:48.509 --> 00:00:50.869
starting point, you know, before you even begin

00:00:50.869 --> 00:00:53.030
to figure out the absolute best way to treat

00:00:53.030 --> 00:00:55.530
it and definitively know if that treatment actually

00:00:55.530 --> 00:00:58.679
worked. Welcome to the deep dive. This is where

00:00:58.679 --> 00:01:00.920
we take a stack of source material, the articles,

00:01:01.100 --> 00:01:02.960
the research papers, the notes that you've shared

00:01:02.960 --> 00:01:06.540
with us, and we, well, we unpack the vital insights,

00:01:06.579 --> 00:01:09.840
the crucial nuggets of knowledge, really. We

00:01:09.840 --> 00:01:11.980
want to help you become truly well-informed

00:01:11.980 --> 00:01:14.719
on complex topics without, you know, drowning

00:01:14.719 --> 00:01:17.079
in information overload. A very common problem

00:01:17.079 --> 00:01:19.680
these days. It is. Our mission today is to navigate

00:01:19.680 --> 00:01:22.519
the sometimes murky, always intricate waters

00:01:22.519 --> 00:01:25.040
of clinical research. We're going to embark on

00:01:25.040 --> 00:01:27.420
a deep dive focusing on how studies are designed,

00:01:27.280 --> 00:01:30.280
the fascinating, and yes, sometimes frustrating,

00:01:30.480 --> 00:01:33.140
world of data analysis, and perhaps most crucially,

00:01:33.159 --> 00:01:35.400
how we actually measure whether a treatment has

00:01:35.400 --> 00:01:38.159
achieved a meaningful outcome, all with a particular

00:01:38.159 --> 00:01:40.680
eye on the challenges often seen within orthopedics.

00:01:40.939 --> 00:01:43.680
A field with its own unique set of hurdles, certainly.

00:01:44.079 --> 00:01:46.040
Exactly. It's about pulling out those essential

00:01:46.040 --> 00:01:48.340
pieces that make evaluating medical evidence

00:01:48.340 --> 00:01:51.180
less daunting and hopefully more insightful for

00:01:51.180 --> 00:01:53.969
you. And joining me today to guide us through

00:01:53.969 --> 00:01:56.569
this material is an expert with a remarkable

00:01:56.569 --> 00:01:59.890
ability to synthesize diverse information, to

00:01:59.890 --> 00:02:02.010
spot those subtle but critical patterns and,

00:02:02.010 --> 00:02:04.370
well, to connect the dots between the seemingly

00:02:04.370 --> 00:02:07.310
academic world of research methodology and its

00:02:07.310 --> 00:02:10.169
real world implications for clinical practice,

00:02:10.289 --> 00:02:12.370
for informed decision making. Happy to be here

00:02:12.370 --> 00:02:14.509
and try to unravel some of it. Great. He's here

00:02:14.509 --> 00:02:16.449
to help us understand not just what the studies

00:02:16.449 --> 00:02:19.430
say, but why it truly matters for the decisions

00:02:19.430 --> 00:02:22.000
you make. Right, let's jump straight in with

00:02:22.000 --> 00:02:24.620
a rapid-fire setup, just to preview some key

00:02:24.620 --> 00:02:27.539
angles of our deep dive. Thinking about professionals

00:02:27.539 --> 00:02:30.520
interpreting research, what in your view is the

00:02:30.520 --> 00:02:32.719
single most important concept they need to grasp

00:02:32.719 --> 00:02:35.800
right away? Ah, the single most important. I'd

00:02:35.800 --> 00:02:38.120
have to say it's understanding the inherent limitations.

00:02:39.099 --> 00:02:41.620
Limitations imposed by the study design and the

00:02:41.620 --> 00:02:44.560
potential for bias. You see, every study has

00:02:44.560 --> 00:02:47.759
weaknesses. Recognizing where a particular study

00:02:47.759 --> 00:02:50.479
sits within the hierarchy of evidence, along

00:02:50.479 --> 00:02:53.599
with its specific risks of bias, that's the absolute

00:02:53.599 --> 00:02:56.740
cornerstone of responsible interpretation. So

00:02:56.740 --> 00:02:59.500
you can't take it at face value? Never. If you

00:02:59.500 --> 00:03:01.680
don't see the flaws, you can't truly trust the

00:03:01.680 --> 00:03:03.560
findings. It's fundamental. All right. Fundamental

00:03:03.560 --> 00:03:07.620
indeed. Next question. Why is the choice of how

00:03:07.620 --> 00:03:11.099
you measure results sometimes just as or even

00:03:11.099 --> 00:03:13.419
more critical than the treatment itself? Oh,

00:03:13.439 --> 00:03:15.240
absolutely critical. Because if your outcome

00:03:15.240 --> 00:03:17.879
measure isn't truly meaningful, reliable, and

00:03:17.879 --> 00:03:20.340
valid, you know, within the context of the study

00:03:20.340 --> 00:03:22.620
and for the patients involved, then whatever

00:03:22.620 --> 00:03:24.080
findings you report about the treatment's effect

00:03:24.080 --> 00:03:25.979
become highly questionable. So you could get

00:03:25.979 --> 00:03:28.469
a significant result, but... Exactly. You might

00:03:28.469 --> 00:03:31.009
see a statistically significant change in, say,

00:03:31.129 --> 00:03:34.009
a lab value. But if that doesn't translate into

00:03:34.009 --> 00:03:36.949
a tangible improvement in a patient's function

00:03:36.949 --> 00:03:39.289
or quality of life, what they actually care about,

00:03:39.770 --> 00:03:42.050
then the measure, and by extension, the result,

00:03:42.530 --> 00:03:45.360
might lack clinical relevance. The measurements

00:03:45.360 --> 00:03:47.659
simply have to align with what matters. What

00:03:47.659 --> 00:03:49.379
matters to the patient, ultimately? Necessarily.

00:03:49.580 --> 00:03:52.139
Okay. And finally, stepping back from the statistics

00:03:52.139 --> 00:03:54.039
and the measurements for a moment, what's an

00:03:54.039 --> 00:03:56.719
ethical cornerstone of human research that simply

00:03:56.719 --> 00:03:59.400
cannot, under any circumstances, be overlooked?

00:03:59.699 --> 00:04:02.699
Informed consent. It's the absolute bedrock principle.

00:04:03.219 --> 00:04:05.800
It demonstrates fundamental respect for participant

00:04:05.800 --> 00:04:09.020
autonomy, ensuring individuals fully comprehend

00:04:09.020 --> 00:04:11.659
the study's purpose, the methods, the potential

00:04:11.659 --> 00:04:14.599
risks and benefits, all the available alternatives.

00:04:15.379 --> 00:04:17.199
And confirming their decision to participate

00:04:17.199 --> 00:04:20.459
is truly voluntary, free from coercion. That is

00:04:20.459 --> 00:04:22.680
completely non-negotiable. So even a perfectly

00:04:22.680 --> 00:04:25.639
designed study. Without robust, genuine, informed

00:04:25.639 --> 00:04:28.300
consent, even the most methodologically brilliant

00:04:28.300 --> 00:04:30.819
study is built on ethically shaky ground. It's

00:04:30.819 --> 00:04:33.160
just unacceptable. Powerful start, thank you.

00:04:33.600 --> 00:04:35.459
Okay, let's unpack this in more detail, then.

00:04:35.629 --> 00:04:38.589
We begin our deep dive into designing and analyzing

00:04:38.589 --> 00:04:41.769
studies with a concept that's often the first

00:04:41.769 --> 00:04:43.850
framework you encounter when trying to make sense

00:04:43.850 --> 00:04:46.750
of medical evidence, the hierarchy of evidence.

00:04:47.230 --> 00:04:49.889
What exactly is this and why is it such a foundational

00:04:49.889 --> 00:04:52.250
tool? Right, the hierarchy of evidence. It's

00:04:52.250 --> 00:04:54.990
essentially a structured way, a ranking system

00:04:54.990 --> 00:04:57.509
really, for different types of study designs.

00:04:58.310 --> 00:05:01.790
It ranks them based on their inherent methodological

00:05:01.790 --> 00:05:05.009
quality, particularly in terms of their ability

00:05:05.009 --> 00:05:08.769
to minimize bias and, crucially, establish causality,

00:05:09.230 --> 00:05:11.129
especially when you're evaluating the effectiveness

00:05:11.129 --> 00:05:13.930
of interventions. So it helps you judge reliability.

00:05:14.350 --> 00:05:17.350
Exactly. It helps us as readers of research quickly

00:05:17.350 --> 00:05:19.910
gauge the relative reliability and trustworthiness

00:05:19.910 --> 00:05:22.209
of a study's findings compared to other types

00:05:22.209 --> 00:05:24.389
of studies. The version from the Oxford Centre

00:05:24.389 --> 00:05:26.149
for Evidence-Based Medicine is one of the most

00:05:26.149 --> 00:05:28.449
widely used and recognized frameworks for this.

00:05:28.750 --> 00:05:31.209
So it's like a pyramid or maybe a ladder with

00:05:31.209 --> 00:05:33.410
different study types at different levels. What

00:05:33.410 --> 00:05:35.310
sits at the very top when we're talking about

00:05:35.310 --> 00:05:38.069
individual studies looking at treatments? Precisely.

00:05:38.509 --> 00:05:41.050
At the pinnacle, representing the highest level

00:05:41.050 --> 00:05:43.230
of primary evidence for therapeutic interventions,

00:05:43.730 --> 00:05:45.889
you typically find randomized controlled trials,

00:05:46.329 --> 00:05:48.910
RCTs. They're generally classified as level one

00:05:48.910 --> 00:05:51.370
evidence. And why level one? What makes them

00:05:51.370 --> 00:05:54.129
the top? The reason they're placed there is fundamentally

00:05:54.129 --> 00:05:56.889
about their design's ability to reduce selection

00:05:56.889 --> 00:06:00.149
bias, which, as we said, is a major threat to

00:06:00.149 --> 00:06:03.709
validity. As you descend the hierarchy, you encounter

00:06:03.709 --> 00:06:06.399
other study designs. Controlled observational

00:06:06.399 --> 00:06:08.939
studies sit below RCTs, and their exact level

00:06:08.939 --> 00:06:11.060
can depend on factors like whether they collected

00:06:11.060 --> 00:06:13.759
data prospectively, looking forward in time, or

00:06:13.759 --> 00:06:15.939
retrospectively, looking back at existing data.

00:06:16.019 --> 00:06:18.339
And lower down. Further down still, you find

00:06:18.339 --> 00:06:20.699
study types with higher inherent risks of bias:

00:06:20.990 --> 00:06:23.490
things like case series, uncontrolled studies,

00:06:23.629 --> 00:06:25.449
and finally, at the lowest level, typically level

00:06:25.449 --> 00:06:28.230
five, you have things like expert opinion, anecdotal

00:06:28.230 --> 00:06:30.610
evidence, or studies based purely on mechanisms

00:06:30.610 --> 00:06:32.730
of action rather than actual clinical outcomes

00:06:32.730 --> 00:06:35.269
in patients. Okay. You mentioned RCTs are the

00:06:35.269 --> 00:06:37.829
gold standard for primary evidence. What is it

00:06:37.829 --> 00:06:39.670
about that randomization process that makes them

00:06:39.670 --> 00:06:43.230
so powerful, especially against bias? Well, the

00:06:43.230 --> 00:06:45.610
fundamental strength of an RCT lies in the random

00:06:45.610 --> 00:06:47.889
allocation of participants to different groups.

00:06:48.250 --> 00:06:50.310
Typically the group receiving the new treatment

00:06:50.310 --> 00:06:53.730
and a comparison or control group. The idea is

00:06:53.730 --> 00:06:56.709
that by randomizing you ensure that on average

00:06:56.709 --> 00:06:58.970
the groups are comparable right at the start

00:06:58.970 --> 00:07:01.449
of the study. Comparable in what way? In terms of

00:07:01.449 --> 00:07:03.850
both known factors that might influence the outcome

00:07:03.850 --> 00:07:07.329
like age, severity of disease, other health conditions,

00:07:07.930 --> 00:07:10.050
and crucially unknown factors that you might

00:07:10.050 --> 00:07:12.480
not even... be aware of or be able to measure.

00:07:13.100 --> 00:07:15.160
This comparability means that if you observe

00:07:15.160 --> 00:07:17.279
a significant difference in outcomes between

00:07:17.279 --> 00:07:19.540
the groups at the end, you could be much more

00:07:19.540 --> 00:07:21.720
confident that the difference is due to the intervention

00:07:21.720 --> 00:07:23.959
being studied rather than some pre-existing

00:07:23.959 --> 00:07:26.379
difference between the groups. So it isolates

00:07:26.379 --> 00:07:28.379
the effect of the treatment itself. It's the

00:07:28.379 --> 00:07:30.740
most robust way we have to try and establish

00:07:30.740 --> 00:07:32.699
that cause and effect relationship between a

00:07:32.699 --> 00:07:35.170
treatment and an outcome.
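
To make the allocation step concrete, here is a minimal Python sketch, not from the source material: the participant IDs are invented, and the simple shuffle-and-split scheme stands in for the concealed, often blocked or stratified, randomization systems real trials use.

```python
import random

def randomize(participants, seed=None):
    # Shuffle a copy of the list and split it in half: the random order is
    # what makes the two arms comparable on average, for both known and
    # unknown prognostic factors.
    rng = random.Random(seed)
    shuffled = list(participants)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {"treatment": shuffled[:half], "control": shuffled[half:]}

# Hypothetical participant IDs, purely for illustration.
arms = randomize([f"patient_{i:03d}" for i in range(20)], seed=42)
print(arms["treatment"])
print(arms["control"])
```

That logic holds for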

00:07:35.170 --> 00:07:37.389
many medical fields, but the source material

00:07:37.389 --> 00:07:39.389
correctly points out that even this gold standard

00:07:39.389 --> 00:07:41.949
faces significant limitations, particularly in

00:07:41.949 --> 00:07:44.709
surgical disciplines like orthopedics. What are

00:07:44.709 --> 00:07:47.410
some of the practical logistical hurdles there?

00:07:47.610 --> 00:07:50.670
Ah, yes. The reality of applying RCTs to surgery

00:07:50.670 --> 00:07:54.629
is challenging, to say the least. They are notoriously

00:07:54.629 --> 00:07:57.370
difficult and often prohibitively expensive to

00:07:57.370 --> 00:08:00.170
organize and run. They involve a substantial

00:08:00.170 --> 00:08:02.589
workload for the research team, often over a

00:08:02.589 --> 00:08:05.250
very long duration, which can lead to logistical

00:08:05.250 --> 00:08:07.810
nightmares and real challenges in just completing

00:08:07.810 --> 00:08:10.649
the study as planned. And beyond just the cost

00:08:10.649 --> 00:08:13.069
and time. Beyond the practicalities, there are

00:08:13.069 --> 00:08:15.970
profound difficulties with recruitment. Both

00:08:15.970 --> 00:08:18.189
surgeons and patients can struggle with the concept

00:08:18.189 --> 00:08:21.050
of randomization. A surgeon might have a strong

00:08:21.050 --> 00:08:23.250
clinical preference for one technique over another

00:08:23.250 --> 00:08:26.949
based on their experience, right? And randomization

00:08:26.949 --> 00:08:29.149
removes their autonomy to choose what they feel

00:08:29.149 --> 00:08:32.610
is best for that patient. Similarly, a patient

00:08:32.610 --> 00:08:34.870
might have heard about a new technique and specifically

00:08:34.870 --> 00:08:37.909
want that one. Or they might simply be unwilling

00:08:37.909 --> 00:08:40.129
to accept the possibility of being randomized

00:08:40.129 --> 00:08:42.750
to an older or perhaps less preferred method,

00:08:43.210 --> 00:08:45.509
or even a non-surgical option, if that's part

00:08:45.509 --> 00:08:48.070
of the comparison. This resistance from both

00:08:48.070 --> 00:08:51.110
sides can severely impede recruitment and make

00:08:51.110 --> 00:08:54.190
surgical RCTs incredibly hard to complete successfully.

00:08:55.269 --> 00:08:57.710
And this brings us directly to a really critical

00:08:57.710 --> 00:09:01.009
ethical principle, clinical equipoise. This isn't

00:09:01.009 --> 00:09:03.490
just academic jargon. It's absolutely fundamental

00:09:03.490 --> 00:09:05.610
to the ethics of randomizing patients in the

00:09:05.610 --> 00:09:07.830
first place. Okay, clinical equipoise. How does

00:09:07.830 --> 00:09:10.409
that tie into justifying randomization ethically?

00:09:10.570 --> 00:09:12.909
What does it mean? Clinical equipoise means there

00:09:12.909 --> 00:09:15.269
must be genuine uncertainty within the expert

00:09:15.269 --> 00:09:17.590
medical community, including the investigators

00:09:17.590 --> 00:09:20.009
conducting the study, about whether one arm of

00:09:20.009 --> 00:09:22.350
the experiment is definitively superior to another.

00:09:22.529 --> 00:09:24.789
So if doctors already think one treatment is

00:09:24.789 --> 00:09:27.870
better. Exactly. If there is already a clear

00:09:27.870 --> 00:09:30.769
consensus, based on existing evidence, that one

00:09:30.769 --> 00:09:33.049
treatment is better or safer than another, it

00:09:33.049 --> 00:09:35.649
would generally be considered unethical to randomize

00:09:35.649 --> 00:09:37.909
patients to the treatment believed to be inferior.

00:09:38.629 --> 00:09:41.110
Equipoise is the necessary ethical precondition.

00:09:41.350 --> 00:09:43.929
It allows researchers to justify subjecting patients

00:09:43.929 --> 00:09:46.750
to the potential risks of an experimental treatment

00:09:46.750 --> 00:09:49.750
or randomizing them away from potentially beneficial

00:09:49.750 --> 00:09:52.690
treatment precisely because the best course of

00:09:52.690 --> 00:09:55.610
action is genuinely unknown. So if the investigator

00:09:55.610 --> 00:09:59.100
thinks they know the answer. If you, as the investigator,

00:09:59.240 --> 00:10:01.820
truly believe one treatment is superior, you

00:10:01.820 --> 00:10:04.019
have an ethical obligation to provide that treatment,

00:10:04.320 --> 00:10:06.960
not randomize patients away from it. Establishing

00:10:06.960 --> 00:10:09.299
and maintaining that state of genuine uncertainty,

00:10:09.299 --> 00:10:11.960
equipoise, throughout the trial is absolutely

00:10:11.960 --> 00:10:14.019
vital. Right, and the source material mentions

00:10:14.019 --> 00:10:16.340
a particularly challenging application of this

00:10:16.340 --> 00:10:19.779
principle in orthopedics, sham surgeries. That

00:10:19.779 --> 00:10:21.519
sounds ethically complex just from the name.

00:10:21.720 --> 00:10:24.000
It is indeed, and it highlights that ethical

00:10:24.000 --> 00:10:27.379
tension beautifully. Sham surgeries are essentially

00:10:27.379 --> 00:10:29.860
the surgical equivalent of a placebo in a drug

00:10:29.860 --> 00:10:32.539
trial. They involve performing part of a surgical

00:10:32.539 --> 00:10:34.899
procedure, or perhaps a procedure that mimics

00:10:34.899 --> 00:10:38.159
the experience, but crucially omitting the actual

00:10:38.159 --> 00:10:40.720
therapeutic step. Like making the cut but not

00:10:40.720 --> 00:10:43.759
doing the repair. Exactly. Making incisions perhaps,

00:10:44.139 --> 00:10:46.519
but not performing the repair or the decompression

00:10:46.519 --> 00:10:49.019
you're studying. The scientific rationale is

00:10:49.019 --> 00:10:52.470
powerful. They allow for a truly blinded comparison

00:10:52.470 --> 00:10:54.629
because neither the patient nor the assessing

00:10:54.629 --> 00:10:56.769
clinician knows whether the full procedure was

00:10:56.769 --> 00:10:59.789
actually performed. This helps to isolate the

00:10:59.789 --> 00:11:01.850
true effect of the surgical intervention itself

00:11:01.850 --> 00:11:04.490
from the placebo effect, you know, the psychological

00:11:04.490 --> 00:11:07.110
impact of simply having undergone any procedure.

00:11:07.929 --> 00:11:10.809
But the ethical tension is undeniable. You are

00:11:10.809 --> 00:11:13.570
performing an invasive non-therapeutic procedure

00:11:13.570 --> 00:11:17.139
that carries risks: anesthesia risks, infection,

00:11:17.600 --> 00:11:19.980
nerve damage, potentially on a patient who may

00:11:19.980 --> 00:11:22.460
as a result be denied an effective treatment

00:11:22.460 --> 00:11:25.840
by being in that sham group. Now current literature

00:11:25.840 --> 00:11:28.360
and ethical guidelines do suggest that sham surgeries

00:11:28.360 --> 00:11:31.200
can have a place in research, but only under

00:11:31.200 --> 00:11:33.539
very stringent conditions. What sort of conditions?

00:11:33.919 --> 00:11:36.039
These include strict adherence to that principle

00:11:36.039 --> 00:11:39.460
of clinical equipoise. Genuine uncertainty about

00:11:39.460 --> 00:11:42.789
efficacy is paramount. Rigorous informed consent,

00:11:43.169 --> 00:11:44.769
where the patient fully understands they might

00:11:44.769 --> 00:11:47.649
receive a sham procedure, is absolutely essential.

00:11:48.230 --> 00:11:49.850
And there needs to be very careful consideration

00:11:49.850 --> 00:11:52.049
of the risks versus the potential benefits of

00:11:52.049 --> 00:11:53.590
gaining definitive knowledge from the trial.

00:11:54.190 --> 00:11:56.730
It's a very sensitive area, but sometimes it's

00:11:56.730 --> 00:11:58.850
considered necessary to truly determine if a

00:11:58.850 --> 00:12:00.929
surgical intervention works beyond just the placebo

00:12:00.929 --> 00:12:03.029
effect and the natural history of the condition.

00:12:03.210 --> 00:12:05.669
Fascinating, and it definitely highlights the

00:12:05.669 --> 00:12:08.629
complexities researchers face. So, beyond these

00:12:08.629 --> 00:12:11.169
primary studies like RCTs, we also rely heavily

00:12:11.169 --> 00:12:13.830
on secondary and filtered research, things like

00:12:13.830 --> 00:12:17.169
systematic reviews and meta-analyses. The hierarchy

00:12:17.169 --> 00:12:19.669
of evidence traditionally places these above

00:12:19.669 --> 00:12:22.210
individual studies, even RCTs. Why is that the

00:12:22.210 --> 00:12:24.690
case? Yes, they are often positioned at the top

00:12:24.690 --> 00:12:26.669
because they represent a synthesis or a summary

00:12:26.669 --> 00:12:29.090
of the best available primary evidence on a specific

00:12:29.090 --> 00:12:32.509
clinical question. A systematic review is a rigorous

00:12:32.509 --> 00:12:35.570
process. It involves a comprehensive and ideally

00:12:35.570 --> 00:12:38.570
unbiased search for all relevant primary studies

00:12:38.570 --> 00:12:41.029
addressing a particular question, followed by

00:12:41.029 --> 00:12:43.110
a critical appraisal and synthesis of their findings.

00:12:43.409 --> 00:12:45.210
So it aims to be exhaustive and transparent.

00:12:45.789 --> 00:12:49.129
Exactly. The aim is to reduce bias in the review

00:12:49.129 --> 00:12:52.090
process itself by being transparent about the

00:12:52.090 --> 00:12:55.169
search strategy, the inclusion criteria, the

00:12:55.169 --> 00:12:57.289
quality assessment, and how the results were

00:12:57.289 --> 00:13:00.320
combined. A meta-analysis takes this a step

00:13:00.320 --> 00:13:03.179
further. If the studies included in a systematic

00:13:03.179 --> 00:13:05.519
review are sufficiently similar in terms of their

00:13:05.519 --> 00:13:08.080
populations, interventions, outcome measures,

00:13:08.320 --> 00:13:11.340
that sort of thing, requiring a degree of homogeneity,

00:13:11.960 --> 00:13:14.559
then a meta-analysis can statistically pool

00:13:14.559 --> 00:13:16.919
the quantitative data from these studies. And

00:13:16.919 --> 00:13:19.220
the benefit of pooling the data? It increases

00:13:19.220 --> 00:13:22.080
the sample size and the statistical power. This

00:13:22.080 --> 00:13:24.320
potentially allows researchers to detect smaller

00:13:24.320 --> 00:13:27.259
effects or provide a much more precise estimate

00:13:27.259 --> 00:13:29.360
of the treatment effect than any single study

00:13:29.360 --> 00:13:31.940
could on its own. And when pooling those statistics

00:13:31.940 --> 00:13:34.000
in a meta-analysis, the source mentions fixed

00:13:34.000 --> 00:13:36.639
effects versus random effects models. What's

00:13:36.639 --> 00:13:38.700
the key difference in what those models assume

00:13:38.700 --> 00:13:41.399
about the data? Ah yes, that comes down to what

00:13:41.399 --> 00:13:44.120
the model assumes about the underlying effect

00:13:44.120 --> 00:13:48.100
size across the included studies. A fixed effects

00:13:48.100 --> 00:13:50.539
model, something like the Mantel-Haenszel method,

00:13:50.539 --> 00:13:53.820
assumes there is one true single effect size

00:13:53.820 --> 00:13:55.940
for the intervention across all the populations

00:13:55.940 --> 00:13:59.100
represented in the included studies. It assumes

00:13:59.100 --> 00:14:01.500
any variation in results seen between the studies

00:14:01.500 --> 00:14:04.799
is purely due to random chance, sampling error.

00:14:05.139 --> 00:14:06.940
It essentially calculates a weighted average

00:14:06.940 --> 00:14:09.600
giving more weight to larger studies. Okay, and

00:14:09.600 --> 00:14:12.500
random effects. In contrast, a random effects

00:14:12.500 --> 00:14:14.860
model, like the DerSimonian and Laird method,

00:14:15.279 --> 00:14:17.500
is generally considered more conservative and,

00:14:17.500 --> 00:14:19.940
frankly, often more realistic in clinical research.

00:14:20.500 --> 00:14:22.679
It assumes that the true effect size might actually

00:14:22.679 --> 00:14:25.620
vary to some degree from study to study, not

00:14:25.620 --> 00:14:27.340
just because of random chance, but also because

00:14:27.340 --> 00:14:29.299
of actual differences in the population studied,

00:14:29.620 --> 00:14:31.340
perhaps the way the intervention was applied,

00:14:31.360 --> 00:14:33.759
the settings, and so on. It accounts for this

00:14:33.759 --> 00:14:36.539
between-study heterogeneity in addition to the

00:14:36.539 --> 00:14:39.139
within-study random variation. So if the studies

00:14:39.139 --> 00:14:42.200
are quite different? If there's significant heterogeneity

00:14:42.200 --> 00:14:45.019
in the results of the included studies, a random

00:14:45.019 --> 00:14:47.700
effects model is usually more appropriate, and

00:14:47.700 --> 00:14:50.519
it will typically produce wider confidence intervals,

00:14:51.000 --> 00:14:53.519
reflecting the greater uncertainty due to this

00:14:53.519 --> 00:14:56.299
underlying variation between studies.
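
To illustrate the two models, here is a minimal Python sketch with invented effect sizes and variances from five hypothetical studies. It uses generic inverse-variance weighting for the fixed-effect estimate (the Mantel-Haenszel method mentioned above is a related scheme specific to binary outcomes) and the DerSimonian and Laird estimate of the between-study variance, tau squared, for the random-effects step.

```python
import math

def pool(effects, variances):
    # Fixed effect: inverse-variance weights, one true effect assumed.
    w = [1.0 / v for v in variances]
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)

    # Cochran's Q measures disagreement between studies beyond chance;
    # DerSimonian-Laird turns the excess into tau^2, the between-study variance.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (len(effects) - 1)) / c)

    # Random effects: each study's own variance plus tau^2, so heterogeneous
    # studies get down-weighted and the confidence interval widens.
    w_re = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))
    return fixed, pooled, (pooled - 1.96 * se, pooled + 1.96 * se)

# Invented mean differences and variances from five hypothetical studies.
print(pool([0.30, 0.10, 0.55, 0.20, 0.45], [0.04, 0.02, 0.06, 0.03, 0.05]))
```

Notice that any tau squared above zero shrinks every study's weight and widens the interval, which is exactly the extra conservatism just described.

So systematic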

00:14:56.299 --> 00:14:58.700
reviews and meta-analyses sound like powerful

00:14:58.700 --> 00:15:01.360
tools for summarizing evidence. But the source

00:15:01.360 --> 00:15:03.480
material also presents important criticisms,

00:15:03.879 --> 00:15:05.940
suggesting they are, well, only as good as the

00:15:05.940 --> 00:15:08.700
studies they include. What are the main caveats

00:15:08.700 --> 00:15:10.679
we should be aware of when reading them? This

00:15:10.679 --> 00:15:13.080
is an absolutely crucial point, and it really

00:15:13.080 --> 00:15:16.100
underpins that adage you mentioned. Secondary

00:15:16.100 --> 00:15:18.539
evidence is only as strong as the studies that

00:15:18.539 --> 00:15:21.730
comprise it. A beautifully conducted systematic

00:15:21.730 --> 00:15:24.950
review, meticulously done, can still yield misleading

00:15:24.950 --> 00:15:27.129
conclusions if the primary studies it includes

00:15:27.129 --> 00:15:29.610
are methodologically flawed, have high risks

00:15:29.610 --> 00:15:32.269
of bias, or are simply not relevant to current

00:15:32.269 --> 00:15:35.190
practice. So pooling bad studies doesn't make

00:15:35.190 --> 00:15:38.590
the result good. Precisely. Including non-randomized

00:15:38.590 --> 00:15:42.070
studies when good RCTs exist, pooling data from

00:15:42.070 --> 00:15:44.330
studies with vastly different patient populations

00:15:44.330 --> 00:15:47.090
or interventions, or including studies that are

00:15:47.090 --> 00:15:49.970
technically sound but clinically irrelevant to

00:15:49.970 --> 00:15:53.269
the question at hand. All these things can compromise

00:15:53.269 --> 00:15:55.850
the validity and the practical impact of the

00:15:55.850 --> 00:15:58.570
review or meta-analysis. Just because you've

00:15:58.570 --> 00:16:00.509
pooled a lot of data doesn't automatically mean

00:16:00.509 --> 00:16:03.110
the answer is right or useful. If you're pooling

00:16:03.110 --> 00:16:05.250
weak evidence, you might just get a more precise

00:16:05.250 --> 00:16:07.779
estimate of the wrong answer. Readers really

00:16:07.779 --> 00:16:09.519
need to look critically at the methods section

00:16:09.519 --> 00:16:11.879
of any systematic review, understand how the

00:16:11.879 --> 00:16:13.879
studies were identified, how their quality was

00:16:13.879 --> 00:16:16.240
assessed, and whether it was genuinely appropriate

00:16:16.240 --> 00:16:18.740
to pool their results statistically in a

00:16:18.740 --> 00:16:21.600
meta-analysis, given the potential for heterogeneity.

00:16:21.919 --> 00:16:24.440
That brings us right back to the concept of bias,

00:16:24.539 --> 00:16:26.440
which you highlighted at the start as being so

00:16:26.440 --> 00:16:28.779
fundamental. You mentioned selection bias as

00:16:28.779 --> 00:16:31.320
particularly important in study design. Could

00:16:31.320 --> 00:16:33.659
you elaborate on that and maybe give a few more

00:16:33.659 --> 00:16:35.740
examples of where it can creep in, potentially

00:16:35.740 --> 00:16:38.220
distorting findings even early on? Certainly.

00:16:38.879 --> 00:16:41.879
Bias, remember, is a systematic error. It's not

00:16:41.879 --> 00:16:45.200
random noise. It's a consistent slant in the

00:16:45.200 --> 00:16:47.899
study design or conduct that leads to results

00:16:47.899 --> 00:16:49.559
that are consistently different from the true

00:16:49.559 --> 00:16:53.039
effect. Selection bias, as we discussed, arises

00:16:53.039 --> 00:16:54.820
when there are systematic differences in the

00:16:54.820 --> 00:16:56.500
characteristics of the groups being compared

00:16:56.500 --> 00:16:59.139
before the intervention even happens. This can

00:16:59.139 --> 00:17:01.299
happen if participants aren't recruited or allocated

00:17:01.299 --> 00:17:03.600
to groups in a truly random or comparable way.

00:17:04.220 --> 00:17:06.180
So beyond just using historical controls, which

00:17:06.180 --> 00:17:08.819
you said is always bad. Yes. Using historical

00:17:08.819 --> 00:17:11.119
controls is almost always problematic due to

00:17:11.119 --> 00:17:14.119
changes over time in care, diagnosis, everything.

00:17:15.039 --> 00:17:17.640
But selection bias can occur much more subtly.

00:17:17.789 --> 00:17:20.690
For example, if a study is conducted only at

00:17:20.690 --> 00:17:23.089
a highly specialized tertiary referral center,

00:17:23.630 --> 00:17:25.410
the patients recruited there might have more

00:17:25.410 --> 00:17:28.049
complex or severe forms of the condition than

00:17:28.049 --> 00:17:30.089
the average patient you'd see in a general hospital

00:17:30.089 --> 00:17:32.829
with that same condition. Comparing outcomes

00:17:32.829 --> 00:17:35.529
in this highly selected group to outcomes reported

00:17:35.529 --> 00:17:38.890
elsewhere could introduce selection bias. Similarly,

00:17:39.049 --> 00:17:40.690
if the method of recruiting patients differs

00:17:40.690 --> 00:17:43.160
between groups, perhaps one group is recruited

00:17:43.160 --> 00:17:45.119
from an outpatient clinic and another from an

00:17:45.119 --> 00:17:47.500
inpatient ward, you might inadvertently select

00:17:47.500 --> 00:17:49.799
patients with different levels of mobility or

00:17:49.799 --> 00:17:53.430
overall health status. Even in an RCT, if the

00:17:53.430 --> 00:17:55.910
process of randomization isn't properly concealed,

00:17:56.349 --> 00:17:58.309
meaning the researchers enrolling patients know

00:17:58.309 --> 00:18:00.890
or can guess which group the next patient will

00:18:00.890 --> 00:18:03.549
be assigned to, they might consciously or unconsciously

00:18:03.549 --> 00:18:05.589
steer certain types of patients into one group

00:18:05.589 --> 00:18:08.690
over the other based on prognostic factors. That

00:18:08.690 --> 00:18:10.670
undermines the whole point of randomization and

00:18:10.670 --> 00:18:12.809
reintroduces selection bias. And the sources

00:18:12.809 --> 00:18:15.759
mention other types of bias too. Yes, absolutely.

00:18:16.380 --> 00:18:18.859
Detection bias, for instance. This occurs when

00:18:18.859 --> 00:18:20.880
the outcome is assessed differently between the

00:18:20.880 --> 00:18:23.480
groups. If the assessors, the people measuring

00:18:23.480 --> 00:18:25.799
the outcome, are aware of which treatment group

00:18:25.799 --> 00:18:28.400
a patient received, their expectations might

00:18:28.400 --> 00:18:31.549
influence their assessment. Blinding the assessors

00:18:31.549 --> 00:18:33.450
to the treatment group is the key way to mitigate

00:18:33.450 --> 00:18:36.589
this. Then there's reporting bias, which refers

00:18:36.589 --> 00:18:39.369
to the selective reporting of findings. For example,

00:18:39.670 --> 00:18:41.609
only publishing results that are statistically

00:18:41.609 --> 00:18:44.650
significant or favorable to a particular treatment

00:18:44.650 --> 00:18:47.430
while suppressing negative or null findings.

00:18:47.970 --> 00:18:50.190
This seriously distorts the overall evidence

00:18:50.190 --> 00:18:52.390
landscape, making treatments look better than

00:18:52.390 --> 00:18:54.789
they perhaps are. That reporting bias sounds

00:18:54.789 --> 00:18:56.910
particularly concerning, especially when we think

00:18:56.910 --> 00:18:59.470
about the pressure to publish. The source material

00:18:59.630 --> 00:19:02.269
calls out p-hacking as an unethical practice

00:19:02.269 --> 00:19:05.150
linked to this. What exactly is p-hacking and

00:19:05.150 --> 00:19:07.509
why is it considered so damaging to the integrity

00:19:07.509 --> 00:19:10.490
of research? P-hacking is a really problematic

00:19:10.490 --> 00:19:13.500
form of data manipulation. It's often driven

00:19:13.500 --> 00:19:16.500
by that intense pressure researchers feel to

00:19:16.500 --> 00:19:19.460
find statistically significant results, typically

00:19:19.460 --> 00:19:23.059
that p-value below 0.05, because significant

00:19:23.059 --> 00:19:25.880
results are generally easier to publish. It's

00:19:25.880 --> 00:19:28.839
not usually outright fabrication of data, but

00:19:28.839 --> 00:19:31.279
it involves playing with the data or the analysis

00:19:31.279 --> 00:19:33.900
methods after the data have been collected, specifically

00:19:33.900 --> 00:19:36.599
until a desired p-value is achieved. How might

00:19:36.599 --> 00:19:39.250
someone do that? Examples include running multiple

00:19:39.250 --> 00:19:41.009
different statistical tests on the same dataset,

00:19:41.470 --> 00:19:43.470
and only reporting the one that happens to yield

00:19:43.470 --> 00:19:46.349
a significant result. Or, perhaps collecting

00:19:46.349 --> 00:19:48.769
more data points if the initial analysis isn't

00:19:48.769 --> 00:19:50.869
significant, and then rerunning the analysis

00:19:50.869 --> 00:19:53.809
until it crosses the threshold. Selectively excluding

00:19:53.809 --> 00:19:56.009
outlier data points that might dilute the effect

00:19:56.009 --> 00:19:58.990
is another way. Or measuring many, many different

00:19:58.990 --> 00:20:01.250
outcomes, but only reporting the few that happen

00:20:01.250 --> 00:20:03.170
to show a significant difference by chance. And

00:20:03.170 --> 00:20:05.710
why is that so bad? It sounds like just exploring

00:20:05.710 --> 00:20:08.250
the data. The reason p-hacking is so damaging

00:20:08.250 --> 00:20:11.410
and considered unethical is that it fundamentally

00:20:11.410 --> 00:20:14.930
violates the logic of hypothesis testing and

00:20:14.930 --> 00:20:18.369
the p-value itself. The p-value is the probability

00:20:18.369 --> 00:20:21.069
of observing your data, or more extreme data,

00:20:21.390 --> 00:20:24.430
assuming the null hypothesis is true, and, crucially,

00:20:24.910 --> 00:20:26.869
assuming the test and analysis plan were decided

00:20:26.869 --> 00:20:30.009
upon before you saw the data. When you p-hack,

00:20:30.089 --> 00:20:31.730
you're essentially searching through the data

00:20:31.730 --> 00:20:33.750
until you find any pattern that looks significant

00:20:33.750 --> 00:20:36.049
just by chance, and then you report it as if

00:20:36.049 --> 00:20:38.029
it were the primary finding you set out to discover

00:20:38.029 --> 00:20:40.069
all along. So it inflates the false positive

00:20:40.069 --> 00:20:43.170
rate. Massively. It hugely inflates the chance

00:20:43.170 --> 00:20:45.390
of reporting false positive results, finding

00:20:45.390 --> 00:20:47.650
an effect that isn't really there. It misleads

00:20:47.650 --> 00:20:49.710
other researchers, clinicians, and ultimately

00:20:49.710 --> 00:20:52.569
patients about the true state of evidence. It's

00:20:52.569 --> 00:20:55.329
a distortion of reality driven by chasing a specific

00:20:55.329 --> 00:20:58.049
statistical outcome, and it severely undermines

00:20:58.049 --> 00:21:00.130
the trustworthiness of the research process.
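
A small Monte Carlo sketch, entirely illustrative and not from the source, makes the inflation visible: both arms are drawn from the same distribution, so the null hypothesis is true for all ten invented outcomes, yet reporting only the best p-value finds "significance" far more often than the nominal 5%.

```python
import random
from math import sqrt
from statistics import NormalDist, mean, stdev

def two_sample_p(a, b):
    # Two-sided p-value for a difference in means, using a normal
    # approximation to keep the sketch dependency-free.
    se = sqrt(stdev(a) ** 2 / len(a) + stdev(b) ** 2 / len(b))
    z = (mean(a) - mean(b)) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))

rng = random.Random(1)
n_trials, n_outcomes, hits = 1000, 10, 0
for _ in range(n_trials):
    # Both "arms" come from the same distribution: the null hypothesis is
    # true for every outcome, so any significant result is a false positive.
    p_values = [
        two_sample_p([rng.gauss(0, 1) for _ in range(30)],
                     [rng.gauss(0, 1) for _ in range(30)])
        for _ in range(n_outcomes)
    ]
    if min(p_values) < 0.05:  # report only the best-looking outcome
        hits += 1

# Expect roughly 1 - 0.95**10, about 40%, versus the nominal 5%.
print(f"At least one 'significant' outcome in {hits / n_trials:.0%} of trials")
```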

00:21:00.490 --> 00:21:03.769
Absolutely. It's a stark reminder that transparency

00:21:03.769 --> 00:21:06.589
and ethical conduct are just as important as

00:21:06.589 --> 00:21:09.609
getting the statistics right. Speaking of statistics,

00:21:10.009 --> 00:21:12.390
let's delve into some foundational concepts that

00:21:12.390 --> 00:21:14.809
are essential for interpreting the numbers studies

00:21:14.809 --> 00:21:17.450
report, starting with the basic distinction between

00:21:17.450 --> 00:21:19.950
samples and populations. Right. In research,

00:21:20.150 --> 00:21:22.230
we're almost always interested in understanding

00:21:22.230 --> 00:21:25.789
something about a large group, the population.

00:21:26.250 --> 00:21:28.009
They could be all patients with a certain type

00:21:28.009 --> 00:21:30.390
of fracture, all people undergoing a specific

00:21:30.390 --> 00:21:32.950
type of surgery, or maybe all adults in a country.

00:21:33.450 --> 00:21:36.279
However, it's rarely feasible or even possible,

00:21:36.619 --> 00:21:38.779
to study every single member of that population.

00:21:39.160 --> 00:21:41.700
So instead we study a smaller, hopefully representative

00:21:41.700 --> 00:21:44.279
subset, and that's our sample. And the goal is

00:21:44.279 --> 00:21:47.039
to generalize from the sample. Exactly. The goal

00:21:47.039 --> 00:21:49.220
of research is to use the data we collect from

00:21:49.220 --> 00:21:52.380
our sample to make inferences or generalize conclusions

00:21:52.380 --> 00:21:54.599
back to the larger population from which it was

00:21:54.599 --> 00:21:57.019
drawn. This is why how you select your sample

00:21:57.019 --> 00:21:59.440
is so critically important, going right back

00:21:59.440 --> 00:22:02.480
to that issue of selection bias. Random sampling,

00:22:02.680 --> 00:22:04.940
where every member of the population theoretically

00:22:04.940 --> 00:22:07.519
has an equal chance of being included, is the

00:22:07.519 --> 00:22:10.440
ideal because it helps ensure the sample is representative

00:22:10.440 --> 00:22:12.900
of the population, making your inferences more

00:22:12.900 --> 00:22:16.160
reliable. When we describe our sample data, we

00:22:16.160 --> 00:22:19.279
often use descriptive statistics. Data can be

00:22:19.279 --> 00:22:21.539
continuous, meaning it can take any value within

00:22:21.539 --> 00:22:24.119
a range, like height or blood pressure, or it

00:22:24.119 --> 00:22:26.319
can be categorical, meaning it falls into distinct

00:22:26.319 --> 00:22:29.099
categories, like sex, blood type, or fracture

00:22:29.099 --> 00:22:32.099
grade. For continuous data, the mean, or the average,

00:22:32.099 --> 00:22:34.140
is a common way to describe the center of the

00:22:34.140 --> 00:22:36.480
data. The source has also mentioned variance,

00:22:36.799 --> 00:22:38.900
which is a measure of dispersion. It tells you

00:22:38.900 --> 00:22:41.039
how spread out the individual data points are

00:22:41.039 --> 00:22:43.099
around that mean. How is variance calculated?

00:22:43.480 --> 00:22:45.720
It's calculated as the average of the squared

00:22:45.720 --> 00:22:47.759
differences of each data point from the mean.
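
As a quick illustration of that calculation, here is a short Python sketch; the healing times are invented, and the sample flag shows the common n minus 1 correction used when estimating a population's variance from a sample, a detail the spoken definition glosses over.

```python
from math import sqrt

def variance(xs, sample=False):
    # The average of the squared differences from the mean, as described;
    # sample=True divides by n - 1 instead of n, the usual correction when
    # estimating a population's variance from a sample.
    m = sum(xs) / len(xs)
    squared = [(x - m) ** 2 for x in xs]
    return sum(squared) / (len(xs) - 1 if sample else len(xs))

healing_weeks = [10, 12, 9, 14, 11, 13, 10, 12]  # invented data
var = variance(healing_weeks, sample=True)
print(f"mean={sum(healing_weeks)/len(healing_weeks):.1f}, "
      f"variance={var:.2f}, standard deviation={sqrt(var):.2f}")
```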

00:22:48.559 --> 00:22:51.400
Measures of dispersion like variance and its

00:22:51.400 --> 00:22:54.000
square root, the standard deviation, are crucial

00:22:54.000 --> 00:22:56.200
because just knowing the average isn't enough,

00:22:56.799 --> 00:22:58.759
you really need to know how much the data varies

00:22:58.759 --> 00:23:01.240
around that average. And how do these concepts

00:23:01.240 --> 00:23:05.000
then lead us into hypothesis testing? Hypothesis

00:23:05.000 --> 00:23:07.279
testing is the formal process researchers use

00:23:07.279 --> 00:23:09.180
to determine if the results observed in their

00:23:09.180 --> 00:23:11.900
sample provide enough evidence to support a conclusion

00:23:11.900 --> 00:23:14.619
about the wider population. It typically involves

00:23:14.619 --> 00:23:17.339
setting up two competing statements. The null

00:23:17.339 --> 00:23:20.059
hypothesis, often written as H0, and the alternative

00:23:20.059 --> 00:23:22.839
hypothesis, H1 or sometimes HA. What do those usually

00:23:22.839 --> 00:23:25.400
state? The null hypothesis usually states there

00:23:25.400 --> 00:23:27.900
is no effect or no difference between the groups

00:23:27.900 --> 00:23:30.740
being compared. For example, the new treatment

00:23:30.740 --> 00:23:32.819
has no effect on healing time compared to the

00:23:32.819 --> 00:23:35.339
old one. The alternative hypothesis states that

00:23:35.339 --> 00:23:37.420
there is an effect or a difference. For example,

00:23:37.819 --> 00:23:39.759
the new treatment does improve healing time.

00:23:40.359 --> 00:23:42.539
We then perform a statistical test on our sample

00:23:42.539 --> 00:23:44.740
data to see how likely it is we'd observe results

00:23:44.740 --> 00:23:47.519
like ours if the null hypothesis were actually

00:23:47.519 --> 00:23:49.769
true in the population. Okay, and this is where

00:23:49.769 --> 00:23:51.730
the famous p-value comes in, the number that

00:23:51.730 --> 00:23:54.589
often gets so much attention. We often hear researchers

00:23:54.589 --> 00:23:59.029
chasing that magic p < 0.05, but what does the

00:23:59.029 --> 00:24:02.690
p-value actually represent? Ah, yes. This is probably

00:24:02.690 --> 00:24:04.769
one of the most commonly misunderstood concepts

00:24:04.769 --> 00:24:07.309
in statistics, and it's absolutely vital to get

00:24:07.309 --> 00:24:10.609
it right. The p-value is the probability of

00:24:10.609 --> 00:24:13.700
observing data from your sample that is as extreme

00:24:13.700 --> 00:24:16.559
as or more extreme than what you actually measured,

00:24:17.140 --> 00:24:19.079
assuming that the null hypothesis is true. Can

00:24:19.079 --> 00:24:21.029
you give an example? Okay, so if you conduct

00:24:21.029 --> 00:24:22.829
a study comparing two treatments and you get

00:24:22.829 --> 00:24:26.190
a p-value of, say, 0.03, it means that if there

00:24:26.190 --> 00:24:28.029
were truly no difference between those treatments

00:24:28.029 --> 00:24:30.930
in the entire population, if the null hypothesis

00:24:30.930 --> 00:24:33.329
were correct, you would still expect to see a

00:24:33.329 --> 00:24:35.190
difference as large as or larger than the one

00:24:35.190 --> 00:24:37.569
you observed in your sample only 3% of the time,

00:24:37.970 --> 00:24:40.549
just due to random chance or sampling variability.
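
One way to see that definition in action is a permutation test, a stand-in chosen here because it simulates the null hypothesis directly rather than reproducing any specific test from the source: shuffle the group labels many times and count how often chance alone produces a difference at least as extreme as the one observed. The data below are invented.

```python
import random

def permutation_p(a, b, n_perm=10_000, seed=0):
    # The p-value's logic, simulated directly: if the null hypothesis is
    # true, group labels are arbitrary, so reshuffling them shows how often
    # random chance alone yields a difference as extreme as the observed one.
    rng = random.Random(seed)
    observed = abs(sum(a) / len(a) - sum(b) / len(b))
    pooled = a + b
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        new_a, new_b = pooled[:len(a)], pooled[len(a):]
        diff = abs(sum(new_a) / len(new_a) - sum(new_b) / len(new_b))
        if diff >= observed:
            extreme += 1
    return extreme / n_perm

# Invented pain-score improvements for two treatment groups.
group_a = [7, 5, 8, 6, 9, 7, 8, 6]
group_b = [4, 6, 5, 3, 5, 6, 4, 5]
print(f"p = {permutation_p(group_a, group_b):.3f}")
```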

00:24:41.150 --> 00:24:45.589
That traditional threshold of 0.05, or 5%, for

00:24:45.589 --> 00:24:48.009
statistical significance is largely arbitrary.

00:24:48.049 --> 00:24:50.420
It's a convention that arose historically. If

00:24:50.420 --> 00:24:52.859
your p-value is less than this threshold, you

00:24:52.859 --> 00:24:55.400
typically reject the null hypothesis and conclude

00:24:55.400 --> 00:24:57.660
that your results provide statistically significant

00:24:57.660 --> 00:25:00.279
evidence against the null hypothesis and therefore

00:25:00.279 --> 00:25:02.779
in favor of your alternative hypothesis. But

00:25:02.779 --> 00:25:05.440
what is the p-value not? Crucially, it's absolutely

00:25:05.440 --> 00:25:07.579
critical to understand what the p-value is not.

00:25:08.519 --> 00:25:10.980
It is not the probability that the null hypothesis

00:25:10.980 --> 00:25:13.799
is true. It is not the probability that you are

00:25:13.799 --> 00:25:16.980
wrong if you reject the null hypothesis, although

00:25:16.980 --> 00:25:19.900
it's related to that concept. And, very importantly,

00:25:20.000 --> 00:25:21.599
it tells you absolutely nothing directly about

00:25:21.599 --> 00:25:24.259
the size or the clinical importance of the observed

00:25:24.259 --> 00:25:27.839
effect. As the source notes, a small p-value

00:25:27.839 --> 00:25:30.279
indicates that your observed data are unlikely

00:25:30.279 --> 00:25:32.859
if the null hypothesis is true, so it provides

00:25:32.859 --> 00:25:34.880
evidence against the null. You can think of it

00:25:34.880 --> 00:25:37.740
as a sort of currency for statistical robustness.

00:25:38.460 --> 00:25:40.619
Smaller p-values mean stronger evidence from

00:25:40.619 --> 00:25:43.099
the specific sample against the null. But even

00:25:43.099 --> 00:25:47.019
a very low p-value, say 0.001, means there's

00:25:47.019 --> 00:25:49.880
still a 0.1% chance of seeing results that

00:25:49.880 --> 00:25:52.700
extreme if the null were true. In some contexts,

00:25:52.859 --> 00:25:56.519
like drug safety, even a 1% chance, p = 0.01,

00:25:56.940 --> 00:25:58.660
of being wrong about a harmful effect could be

00:25:58.660 --> 00:26:00.440
catastrophic if applied to a large population.

00:26:00.779 --> 00:26:03.700
So necessary but not sufficient. Exactly. Statistical

00:26:03.700 --> 00:26:05.779
significance is often a necessary condition for

00:26:05.779 --> 00:26:07.839
claiming an effect, but it's rarely sufficient

00:26:07.839 --> 00:26:10.240
on its own to make clinical decisions. You mentioned

00:26:10.240 --> 00:26:12.720
that many statistical tests compare groups by

00:26:12.720 --> 00:26:14.400
looking at their variances. Could you give us

00:26:14.400 --> 00:26:17.839
a simplified view of how that works? Yes, many

00:26:17.839 --> 00:26:20.640
common tests, particularly those comparing the

00:26:20.640 --> 00:26:22.920
means of different groups, are fundamentally

00:26:22.920 --> 00:26:26.079
based on analyzing variance. A key concept here

00:26:26.079 --> 00:26:29.559
is partitioning variance. Take ANOVA, for instance,

00:26:30.039 --> 00:26:31.740
analysis of variance, which is used when you're

00:26:31.740 --> 00:26:33.440
comparing the means of three or more groups.

00:26:34.140 --> 00:26:36.400
ANOVA essentially breaks down the total variation

00:26:36.400 --> 00:26:39.180
you see in the data into different sources. What

00:26:39.180 --> 00:26:42.119
sources? Primarily, the variation between the

00:26:42.119 --> 00:26:44.259
groups, which might be due to the treatment effect

00:26:44.259 --> 00:26:46.359
you're interested in, and the variation within

00:26:46.359 --> 00:26:48.759
each group, which is generally considered due

00:26:48.759 --> 00:26:51.640
to random error and individual differences among

00:26:51.640 --> 00:26:54.640
participants in the same group. It then compares

00:26:54.640 --> 00:26:56.940
the magnitude of the variance between the groups

00:26:56.940 --> 00:27:00.920
relative to the variance within the groups. This

00:27:00.920 --> 00:27:02.960
comparison is often done using something called

00:27:02.960 --> 00:27:06.200
the F-statistic, which follows a specific statistical

00:27:06.200 --> 00:27:08.859
distribution called the F-distribution. Fundamentally,

00:27:08.940 --> 00:27:11.359
it's a ratio of two variances. And if the variance

00:27:11.359 --> 00:27:14.009
between groups is large? If the variance between

00:27:14.009 --> 00:27:16.109
groups is significantly larger than the variance

00:27:16.109 --> 00:27:19.190
within groups, the test concludes there's a statistically

00:27:19.190 --> 00:27:21.809
significant difference somewhere among the group

00:27:21.809 --> 00:27:24.609
means. It's about seeing if the groups are more

00:27:24.609 --> 00:27:27.009
different from each other than individuals tend

00:27:27.009 --> 00:27:30.329
to be within their own group.
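
Written out as code, that partitioning looks like the sketch below; the three groups of scores are invented, and turning the F ratio into a p-value via the F-distribution is omitted to keep it short.

```python
def f_statistic(groups):
    # Partition total variation into between-group and within-group parts,
    # then compare them as a ratio of two variances (mean squares).
    all_values = [x for g in groups for x in g]
    grand_mean = sum(all_values) / len(all_values)

    # Between-group sum of squares: how far each group mean sits from the
    # grand mean, weighted by group size.
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    df_between = len(groups) - 1

    # Within-group sum of squares: random variation among individuals
    # inside the same group.
    ss_within = sum(sum((x - sum(g) / len(g)) ** 2 for x in g) for g in groups)
    df_within = len(all_values) - len(groups)

    return (ss_between / df_between) / (ss_within / df_within)

# Invented outcome scores for three treatment groups.
print(f_statistic([[12, 14, 11, 13], [15, 17, 16, 18], [10, 9, 11, 12]]))
```

Okay, so p-values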

00:27:30.329 --> 00:27:31.990
give us a sense of statistical significance,

00:27:32.150 --> 00:27:34.170
of the likelihood of results being due to chance

00:27:34.170 --> 00:27:36.829
if the null hypothesis is true. But the source

00:27:36.829 --> 00:27:39.349
material strongly advocates looking beyond the

00:27:39.349 --> 00:27:41.509
p-value, especially at confidence intervals,

00:27:41.730 --> 00:27:45.509
or CIs. Why are CIs so vital, perhaps even more

00:27:45.509 --> 00:27:48.450
so than the p-value, for interpreting clinical

00:27:48.450 --> 00:27:50.859
relevance? Confidence intervals are absolutely

00:27:50.859 --> 00:27:53.259
critical for moving beyond just that binary,

00:27:53.539 --> 00:27:56.180
significant or not significant decision to understand

00:27:56.180 --> 00:27:58.319
the potential magnitude and the precision of

00:27:58.319 --> 00:28:01.000
the treatment effect, and therefore its clinical

00:28:01.000 --> 00:28:03.980
meaningfulness. A confidence interval, most commonly

00:28:03.980 --> 00:28:07.420
a 95% CI, provides a range of values within

00:28:07.420 --> 00:28:09.900
which the true effect size in the larger population

00:28:09.900 --> 00:28:12.599
is likely to lie, based on the results observed

00:28:12.599 --> 00:28:14.970
in your sample. It gives you a sense of the uncertainty

00:28:14.970 --> 00:28:17.190
around your estimate of the effect. So it gives

00:28:17.190 --> 00:28:20.069
a range, not just a single number or a yes-no.

00:28:20.369 --> 00:28:23.250
How do we interpret that range? The key to interpreting

00:28:23.250 --> 00:28:26.130
a CI lies in where that range falls, relative

00:28:26.130 --> 00:28:29.660
to two important points. Zero. If you're looking

00:28:29.660 --> 00:28:32.099
at a difference between groups, zero means no

00:28:32.099 --> 00:28:34.960
difference. And, critically, relative to what

00:28:34.960 --> 00:28:37.019
is considered a clinically important difference.

00:28:37.539 --> 00:28:40.140
That's the minimum change that patients or clinicians

00:28:40.140 --> 00:28:42.460
would consider meaningful and worthwhile in practice.
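
Here is a minimal sketch of how those two reference points get checked in practice, with invented scores for two groups, an assumed five-point clinically important difference, and the normal critical value 1.96 for an approximate 95% interval.

```python
from math import sqrt
from statistics import mean, stdev

def diff_ci_95(a, b):
    # Approximate 95% CI for a difference in means; 1.96 is the normal
    # critical value (small samples would call for a t value instead).
    diff = mean(a) - mean(b)
    se = sqrt(stdev(a) ** 2 / len(a) + stdev(b) ** 2 / len(b))
    return diff - 1.96 * se, diff + 1.96 * se

MCID = 5.0  # assumed minimum clinically important difference, in scale points
new_tx = [68, 72, 75, 70, 74, 71, 69, 73]  # invented functional scores
old_tx = [61, 65, 63, 60, 66, 62, 64, 59]
low, high = diff_ci_95(new_tx, old_tx)
print(f"95% CI for the difference: ({low:.1f}, {high:.1f})")
print("includes zero (not statistically significant)?", low <= 0 <= high)
print("entirely above the clinically important difference?", low > MCID)
```

With these invented numbers the whole interval sits above both zero and the threshold; swap in different data and the same two checks sort out the other scenarios discussed next.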

00:28:43.180 --> 00:28:45.000
Let's maybe walk through a few scenarios to make

00:28:45.000 --> 00:28:48.099
it clear. If your 95% CI for the difference

00:28:48.099 --> 00:28:50.180
between two treatments does not include zero,

00:28:50.440 --> 00:28:52.579
that aligns with a statistically significant

00:28:52.579 --> 00:28:57.200
result, typically p < 0.05. The entire range suggests

00:28:57.200 --> 00:28:59.650
a difference likely exists. But then you must

00:28:59.650 --> 00:29:02.049
look at where that range lies. If the entire

00:29:02.049 --> 00:29:04.509
CI, both the lower and upper bounds, is above

00:29:04.509 --> 00:29:06.069
what you've defined as a clinically important

00:29:06.069 --> 00:29:08.069
difference, then you can be more confident that

00:29:08.069 --> 00:29:10.170
the treatment not only has a statistically significant

00:29:10.170 --> 00:29:12.569
effect, but also one that is likely to be clinically

00:29:12.569 --> 00:29:14.829
worthwhile. Okay, what if it does include zero?

00:29:15.069 --> 00:29:18.369
If the CI includes zero, your result is not statistically

00:29:18.369 --> 00:29:22.299
significant, typically p > 0.05. This means your

00:29:22.299 --> 00:29:24.220
data don't provide enough evidence to reject

00:29:24.220 --> 00:29:27.000
the null hypothesis of no difference. The true

00:29:27.000 --> 00:29:30.000
effect could plausibly be zero, or it could even

00:29:30.000 --> 00:29:32.519
point in the opposite direction if the CI spans

00:29:32.519 --> 00:29:35.519
both negative and positive values. This outcome

00:29:35.519 --> 00:29:37.299
is essentially uncertain based on your data.

00:29:37.640 --> 00:29:39.900
Right. And what about situations where it's significant,

00:29:39.920 --> 00:29:42.609
but maybe not clinically meaningful? Or vice

00:29:42.609 --> 00:29:45.089
versa. Exactly. Consider a scenario where the

00:29:45.089 --> 00:29:49.349
p-value is, say, 0.06, just missing that conventional

00:29:49.349 --> 00:29:52.549
0.05 cutoff, so not statistically significant.

00:29:52.950 --> 00:29:55.309
The 95% CI for the difference between groups

00:29:55.309 --> 00:29:58.269
might be, for example, -1.5 to +8.5

00:29:58.269 --> 00:30:00.109
points on a pain scale, where you've decided

00:30:00.109 --> 00:30:01.789
a clinically important difference is five points.

00:30:02.289 --> 00:30:04.750
Now, the CI includes zero, consistent with p >

00:30:04.750 --> 00:30:07.569
0.05, but look at the upper end. It includes values

00:30:07.569 --> 00:30:09.890
like 8.5, which are well above your clinically

00:30:09.890 --> 00:30:12.640
important threshold of five. It suggests

00:30:12.640 --> 00:30:14.619
that while your study didn't find statistically

00:30:14.619 --> 00:30:17.579
significant evidence against the null, the results

00:30:17.579 --> 00:30:19.619
are still compatible with a clinically important

00:30:19.619 --> 00:30:22.779
benefit. You might conclude your study was perhaps

00:30:22.779 --> 00:30:25.380
underpowered to detect this difference definitively,

00:30:25.539 --> 00:30:28.180
or the result remains uncertain, but you certainly

00:30:28.180 --> 00:30:30.339
can't confidently say the treatment is ineffective

00:30:30.339 --> 00:30:34.210
based on that CI. Now flip that. Consider a study

00:30:34.210 --> 00:30:37.410
with a p-value of 0.04, statistically significant,

00:30:37.930 --> 00:30:41.690
but the 95% CI for the effect size is, say,

00:30:41.690 --> 00:30:45.170
+0.5 to +2.5 points on that same pain

00:30:45.170 --> 00:30:47.529
scale, where five points is the clinically important

00:30:47.529 --> 00:30:50.170
difference. Here, the CI does not include zero.

00:30:50.269 --> 00:30:52.410
It's statistically significant, but the entire

00:30:52.410 --> 00:30:55.069
range, even the upper limit of 2.5, is well

00:30:55.069 --> 00:30:57.289
below the five-point threshold you decided was

00:30:57.289 --> 00:31:00.309
clinically important. So significant, but maybe

00:31:00.309 --> 00:31:02.809
not worth it. Precisely. You have a statistically

00:31:02.809 --> 00:31:05.890
significant result, but the CI strongly suggests

00:31:05.890 --> 00:31:08.230
that the true effect, while likely existing,

00:31:08.730 --> 00:31:10.789
is probably too small to be clinically meaningful

00:31:10.789 --> 00:31:14.710
for patients. Or, one last scenario. Imagine

00:31:14.710 --> 00:31:17.549
a study where the p-value is high, say, 0.20,

00:31:17.819 --> 00:31:20.599
clearly not significant, and the CI for the effect

00:31:20.599 --> 00:31:24.279
is, say, -3 to +1.5. This CI includes

00:31:24.279 --> 00:31:26.480
zero, consistent with the non-significant

00:31:26.480 --> 00:31:29.240
p-value. And importantly, the upper limit of the

00:31:29.240 --> 00:31:32.779
plausible effect, 1.5, is well below your clinically

00:31:32.779 --> 00:31:35.200
important difference. In this case, both the

00:31:35.200 --> 00:31:37.200
p-value and the CI point towards the treatment

00:31:37.200 --> 00:31:39.599
likely being ineffective in any clinically meaningful

00:31:39.599 --> 00:31:42.559
way. So the CI gives much more context. Absolutely.

00:31:42.720 --> 00:31:44.880
The CI gives you not just a yes -no about statistical

00:31:44.880 --> 00:31:46.839
significance, but a range of plausible values

00:31:46.839 --> 00:31:48.920
for the actual effect size. This allows you to

00:31:48.920 --> 00:31:50.960
judge the precision of the estimate, a wide CI

00:31:50.960 --> 00:31:53.240
meaning less precision, more uncertainty, a narrow

00:31:53.240 --> 00:31:56.220
CI more precision, and, crucially, to directly

00:31:56.220 --> 00:31:58.220
assess its clinical relevance in a way the p

00:31:58.220 --> 00:32:00.480
-value alone simply cannot. Always, always look

00:32:00.480 --> 00:32:02.940
at the CI. That was a really clear explanation,

00:32:03.039 --> 00:32:06.819
thank you. That completes our deep dive into how studies are designed

00:32:06.819 --> 00:32:09.160
and the statistical language we really need to

00:32:09.160 --> 00:32:12.220
understand to interpret them properly. That transitions

00:32:12.220 --> 00:32:15.160
us perfectly into our next segment. The critical

00:32:15.160 --> 00:32:17.380
challenges of measuring what truly matters in

00:32:17.380 --> 00:32:19.420
clinical outcomes and the fundamental ethical

00:32:19.420 --> 00:32:21.859
standards that have to underpin all human research.

00:32:22.559 --> 00:32:24.339
You highlighted earlier that choosing the right

00:32:24.339 --> 00:32:26.960
outcome measure is paramount. Let's expand a

00:32:26.960 --> 00:32:28.680
bit on the different domains of outcomes and

00:32:28.680 --> 00:32:31.049
why that choice is so vital. Right. Outcomes

00:32:31.049 --> 00:32:32.869
in healthcare are multifaceted, aren't they?

00:32:33.009 --> 00:32:35.049
We can think about them in several broad domains.

00:32:35.730 --> 00:32:37.910
There are clinical outcomes which relate directly

00:32:37.910 --> 00:32:40.609
to the patient's health status and the biological

00:32:40.609 --> 00:32:42.990
or functional effects of the intervention. Things

00:32:42.990 --> 00:32:45.710
like pain levels, range of motion, maybe fracture

00:32:45.710 --> 00:32:48.250
healing time on an x -ray, presence of infection,

00:32:48.670 --> 00:32:50.970
or even mortality. So the direct health impact.

00:32:51.450 --> 00:32:54.329
Yes. Then there are process of care outcomes.

00:32:54.640 --> 00:32:57.240
which look more at how care is delivered, things

00:32:57.240 --> 00:32:59.720
like length of hospital stay, readmission rates,

00:33:00.220 --> 00:33:03.119
resource utilization. Patient satisfaction is

00:33:03.119 --> 00:33:04.980
another crucial domain, though it's important

00:33:04.980 --> 00:33:07.200
to note it's also distinct from clinical outcomes.

00:33:07.980 --> 00:33:11.000
And finally, there are economic outcomes, focusing

00:33:11.000 --> 00:33:13.559
on costs, both the direct costs of the treatment

00:33:13.559 --> 00:33:16.359
itself, and perhaps indirect costs, like lost

00:33:16.359 --> 00:33:18.769
productivity for the patient. Selecting the correct

00:33:18.769 --> 00:33:21.269
outcome measure or perhaps measures for a particular

00:33:21.269 --> 00:33:23.109
study is absolutely critical because it must

00:33:23.109 --> 00:33:25.150
align with the research question being asked

00:33:25.150 --> 00:33:27.910
and be meaningful within the context of the intervention

00:33:27.910 --> 00:33:30.450
being studied and the patient population involved.

00:33:31.069 --> 00:33:33.170
If you're studying a surgery designed to reduce

00:33:33.170 --> 00:33:36.009
pain and improve function, measuring only, say,

00:33:36.190 --> 00:33:38.529
an inflammatory marker in the blood might be

00:33:38.529 --> 00:33:40.509
scientifically interesting, but it's unlikely

00:33:40.509 --> 00:33:42.910
to be clinically relevant on its own. And this

00:33:42.910 --> 00:33:45.369
links to the idea of value in healthcare. Very

00:33:45.369 --> 00:33:47.970
much so. This is becoming even more prominent

00:33:47.970 --> 00:33:50.089
with the shift we're seeing toward value -based

00:33:50.089 --> 00:33:53.250
healthcare models, moving away from simply paying

00:33:53.250 --> 00:33:55.869
for the volume of services provided to paying

00:33:55.869 --> 00:33:58.869
for the value delivered. And value is often defined,

00:33:58.950 --> 00:34:01.549
quite simply, as the outcomes achieved relative

00:34:01.549 --> 00:34:04.809
to the costs incurred to achieve them. If you

00:34:04.809 --> 00:34:07.269
can't accurately and meaningfully measure the

00:34:07.269 --> 00:34:09.849
outcomes that genuinely matter to patients and

00:34:09.849 --> 00:34:12.269
to the healthcare system, you simply can't demonstrate

00:34:12.269 --> 00:34:15.679
value. So, the measure must capture what the

00:34:15.679 --> 00:34:18.199
intervention is intended to change, and that

00:34:18.199 --> 00:34:20.460
change must be relevant either to the patient's

00:34:20.460 --> 00:34:22.960
life or to the efficiency of the healthcare system.

00:34:23.159 --> 00:34:25.460
Okay, so assuming we know what we want to measure,

00:34:25.619 --> 00:34:28.320
what properties define a good outcome measure?

00:34:28.820 --> 00:34:30.900
What should we be looking for when a study reports

00:34:30.900 --> 00:34:33.699
using a particular scale or test? The two most

00:34:33.699 --> 00:34:35.699
fundamental properties you need to look for are

00:34:35.699 --> 00:34:38.940
reliability and validity. Think of reliability

00:34:38.940 --> 00:34:42.420
as consistency or repeatability. A reliable instrument

00:34:42.420 --> 00:34:44.840
produces the same result, or very similar results,

00:34:45.260 --> 00:34:47.280
when measuring the same stable phenomenon under

00:34:47.280 --> 00:34:49.139
the same conditions. But measurements aren't

00:34:49.139 --> 00:34:52.960
always perfect? Never. Measurement always includes

00:34:52.960 --> 00:34:55.980
some degree of error. Whether that's error introduced

00:34:55.980 --> 00:34:57.500
by the person taking the measurement, we call

00:34:57.500 --> 00:34:59.460
that rater error, or error from the instrument

00:34:59.460 --> 00:35:02.000
itself, perhaps it's not calibrated properly.

00:35:02.780 --> 00:35:05.500
There's also random fluctuation or variability

00:35:05.500 --> 00:35:07.900
in the patient's state, even over short periods.

00:35:08.800 --> 00:35:10.800
Reliability is concerned with how much of the

00:35:10.800 --> 00:35:13.099
observed variation in measurements is due to

00:35:13.099 --> 00:35:15.940
these errors versus how much reflects the true

00:35:15.940 --> 00:35:18.260
underlying change in the thing you're actually

00:35:18.260 --> 00:35:20.079
trying to measure.

00:35:20.079 --> 00:35:23.860
Are there different types of reliability? Yes, several. Interrater reliability

00:35:23.860 --> 00:35:26.260
refers to the degree of agreement between two

00:35:26.260 --> 00:35:28.820
or more different observers or clinicians measuring

00:35:28.820 --> 00:35:31.659
the same thing. Interrater reliability is the

00:35:31.659 --> 00:35:33.400
consistency of measurements taken by the same

00:35:33.400 --> 00:35:36.010
observer on different occasions. When assessing

00:35:36.010 --> 00:35:38.230
reliability, we talk about concepts like agreement,

00:35:38.469 --> 00:35:40.769
concordance, repeatability, essentially, how

00:35:40.769 --> 00:35:42.989
consistent is the measurement.

00:35:42.989 --> 00:35:45.090
Okay, so that's reliability, consistency. What about validity?

00:35:45.630 --> 00:35:47.630
Validity, on the other hand, is about accuracy.

00:35:48.269 --> 00:35:50.409
It's the degree to which an instrument truly

00:35:50.409 --> 00:35:53.309
measures what it is intended to measure. It answers

00:35:53.309 --> 00:35:55.909
the fundamental question, are we actually measuring

00:35:55.909 --> 00:35:59.039
the concept we think we are measuring? The source

00:35:59.039 --> 00:36:02.260
makes a very important point here. Validity dictates

00:36:02.260 --> 00:36:04.440
what you are able to do with the test results.

00:36:05.340 --> 00:36:07.840
If a measure is valid, you can confidently use

00:36:07.840 --> 00:36:10.300
its results to make decisions or draw conclusions

00:36:10.300 --> 00:36:12.719
about the specific construct it measures, like

00:36:12.719 --> 00:36:15.099
knee function or pain level. And reliability

00:36:15.099 --> 00:36:18.519
is needed for validity. Critically, yes. A measure

00:36:18.519 --> 00:36:22.079
that is inaccurate or unreliable cannot be valid.

00:36:23.179 --> 00:36:24.800
If you're not getting consistent results when

00:36:24.800 --> 00:36:27.300
you repeat the measurement, low reliability,

00:36:27.739 --> 00:36:30.039
you certainly can't be accurately measuring the

00:36:30.039 --> 00:36:32.840
true underlying state, validity. Think about

00:36:32.840 --> 00:36:35.800
it. If your bathroom scale gives you wildly different

00:36:35.800 --> 00:36:38.500
readings every time you step on it, it's unreliable,

00:36:38.719 --> 00:36:40.480
and therefore it's not a valid measure of your

00:36:40.480 --> 00:36:43.360
true weight. Establishing validity is often more

00:36:43.360 --> 00:36:46.039
challenging than reliability and involves accumulating

00:36:46.039 --> 00:36:48.539
evidence from various sources. Things like content

00:36:48.539 --> 00:36:51.139
validity, does it cover all aspects of the concept?

00:36:51.900 --> 00:36:54.039
Construct validity, does it relate to other measures

00:36:54.039 --> 00:36:56.599
as expected? Criterion validity, does it correlate

00:36:56.599 --> 00:36:59.119
with a gold standard? To show the measure behaves

00:36:59.119 --> 00:37:01.420
in ways consistent with the theoretical concept

00:37:01.420 --> 00:37:04.329
it's supposed to capture. Right. Let's explore

00:37:04.329 --> 00:37:05.969
some of the different measurement types then,

00:37:06.110 --> 00:37:08.150
starting with patient -reported outcome measures,

00:37:08.349 --> 00:37:10.949
or PROMs, which seem increasingly prominent in

00:37:10.949 --> 00:37:13.090
clinical research, particularly orthopedics.

00:37:13.340 --> 00:37:16.800
Indeed they are. PROMs are standardized, validated

00:37:16.800 --> 00:37:18.679
questionnaires that are completed directly by

00:37:18.679 --> 00:37:21.480
patients themselves. They are specifically designed

00:37:21.480 --> 00:37:23.440
to capture the patient's own perspective on their

00:37:23.440 --> 00:37:25.840
health status, their functional limitations,

00:37:26.340 --> 00:37:28.860
their symptoms like pain or stiffness, and their

00:37:28.860 --> 00:37:31.340
overall well -being or quality of life. And why

00:37:31.340 --> 00:37:34.019
the shift towards using these more? Well, the

00:37:34.019 --> 00:37:36.400
rise of PROMs really reflects a growing recognition

00:37:36.400 --> 00:37:38.300
that the patient's experience and their functional

00:37:38.300 --> 00:37:40.699
status are often the most important outcomes,

00:37:41.099 --> 00:37:43.739
particularly in areas like orthopedics, where

00:37:43.739 --> 00:37:46.360
improving quality of life, reducing pain, and

00:37:46.360 --> 00:37:48.659
restoring function are frequently the primary

00:37:48.659 --> 00:37:51.400
goals of treatment. They provide a more holistic

00:37:51.400 --> 00:37:53.679
picture that clinician -based measures or imaging

00:37:53.679 --> 00:37:56.739
alone might completely miss. The sources list

00:37:56.739 --> 00:37:59.960
several common examples, many specific to musculoskeletal

00:37:59.960 --> 00:38:02.719
health. Things like simple visual analog scales,

00:38:02.909 --> 00:38:06.090
VAS, or numeric rating scales for pain are very

00:38:06.090 --> 00:38:08.610
common. Then you have condition -specific scores

00:38:08.610 --> 00:38:11.630
like the Oxford hip score or Oxford knee score,

00:38:12.110 --> 00:38:14.210
the International Knee Documentation Committee,

00:38:14.429 --> 00:38:17.289
IKDC, subjective form, which is widely used for

00:38:17.289 --> 00:38:20.070
knee function, the hip disability and osteoarthritis

00:38:20.070 --> 00:38:23.599
outcome score, HOOS, the Western Ontario and McMaster

00:38:23.599 --> 00:38:25.900
Universities Osteoarthritis Index, the WOMAC,

00:38:25.940 --> 00:38:27.980
for hip and knee osteoarthritis symptoms

00:38:27.980 --> 00:38:30.519
and function. For upper limb problems, there's

00:38:30.519 --> 00:38:32.599
the disabilities of the arm, shoulder, and hand,

00:38:33.239 --> 00:38:35.619
or DASH, questionnaire. The Marx Activity Level

00:38:35.619 --> 00:38:37.579
Score is another example, specifically assessing

00:38:37.579 --> 00:38:39.940
how physically active a patient is. And what

00:38:39.940 --> 00:38:42.800
makes a good prom beyond reliability and validity?

00:38:43.360 --> 00:38:45.579
Desirable properties also include acceptability

00:38:45.579 --> 00:38:47.639
to patients. Are the questions understandable?

00:38:47.840 --> 00:38:51.420
Is it too burdensome or long to complete? Feasibility

00:38:51.420 --> 00:38:53.739
and practice is also key. How much time does

00:38:53.739 --> 00:38:57.099
it take to administer and score? The health assessment

00:38:57.099 --> 00:39:00.099
questionnaire, HAQ, for instance, is noted as

00:39:00.099 --> 00:39:02.280
being relatively fast while something like the

00:39:02.280 --> 00:39:05.619
SF -36, a widely used general health status measure,

00:39:06.059 --> 00:39:07.900
can vary quite a bit in completion time depending

00:39:07.900 --> 00:39:10.679
on how it's administered on paper, online, or

00:39:10.679 --> 00:39:12.949
via an interview. It's also really important

00:39:12.949 --> 00:39:15.670
to distinguish PROMs, which measure health outcomes,

00:39:16.070 --> 00:39:17.909
from patient -reported experience measures, or

00:39:17.909 --> 00:39:20.929
PREMs. PREMs measure aspects of the care process

00:39:20.929 --> 00:39:22.889
itself, things like communication with staff,

00:39:23.449 --> 00:39:25.449
cleanliness of the facilities, waiting times,

00:39:25.889 --> 00:39:27.949
not the health outcome. And as the source points

00:39:27.949 --> 00:39:29.710
out, studies have generally shown quite a weak

00:39:29.710 --> 00:39:32.590
association between PREMs and PROMs, meaning a

00:39:32.590 --> 00:39:34.590
patient can report a poor experience of care,

00:39:34.630 --> 00:39:36.849
but still have a good clinical outcome, or vice

00:39:36.849 --> 00:39:39.380
versa. They capture fundamentally different things.

00:39:39.599 --> 00:39:42.800
Okay, that clarifies PROMs. And then we have

00:39:42.800 --> 00:39:45.219
the measures taken by clinicians or observers.

00:39:45.800 --> 00:39:47.780
What are some examples there and what are the

00:39:47.780 --> 00:39:49.539
particular challenges associated with those?

00:39:49.739 --> 00:39:51.300
Right, these are the more traditional objective

00:39:51.300 --> 00:39:53.900
or perhaps semi -objective measures performed

00:39:53.900 --> 00:39:56.539
by healthcare professionals. Common examples

00:39:56.539 --> 00:39:59.059
in orthopedics include goniometry to measure

00:39:59.059 --> 00:40:01.809
joint range of motion. Now while using a goniometer

00:40:01.809 --> 00:40:03.849
is generally considered more reliable than just

00:40:03.849 --> 00:40:06.610
visual estimation, it still has limitations,

00:40:06.789 --> 00:40:09.010
especially for small joints, and the measurement

00:40:09.010 --> 00:40:11.690
technique used is absolutely crucial for consistency.

00:40:12.530 --> 00:40:14.590
Radiographic measurement of angles and alignments

00:40:14.590 --> 00:40:16.949
is often the most accurate, but of course is

00:40:16.949 --> 00:40:19.349
limited by radiation exposure and isn't practical

00:40:19.349 --> 00:40:22.110
for frequent follow -up. Muscle power is very

00:40:22.110 --> 00:40:23.809
commonly assessed using the Medical Research

00:40:23.809 --> 00:40:28.150
Council, MRC, grading system, from M0, no contraction,

00:40:28.469 --> 00:40:31.309
up to M5, normal power against full resistance.

00:40:32.449 --> 00:40:35.070
This is widely used but can have subjective elements,

00:40:35.329 --> 00:40:37.110
particularly when distinguishing between grades

00:40:37.110 --> 00:40:39.769
M4, movement against gravity with some resistance,

00:40:40.130 --> 00:40:43.070
and M5, normal. Sensory tests, like two -point

00:40:43.070 --> 00:40:45.190
discrimination to assess nerve function, require

00:40:45.190 --> 00:40:47.789
really careful technique. Applying too much pressure,

00:40:47.809 --> 00:40:50.269
for instance, can give inaccurate results. Grip

00:40:50.269 --> 00:40:52.869
strength measurement needs standardization. Usually

00:40:52.869 --> 00:40:55.110
multiple attempts are taken, and whether you report

00:40:55.110 --> 00:40:57.389
the mean or the best result depends on the protocol.

00:40:58.269 --> 00:41:00.610
And crucially, the dynamometer, the instrument

00:41:00.610 --> 00:41:03.469
itself, needs regular calibration to ensure its

00:41:03.469 --> 00:41:05.739
accuracy. And what about classification systems?

00:41:05.880 --> 00:41:07.860
You mentioned the issue with tibial fracture

00:41:07.860 --> 00:41:10.219
classification right at the start. Yes, that's

00:41:10.219 --> 00:41:12.579
a significant challenge, particularly relevant

00:41:12.579 --> 00:41:15.619
to the example you used. Classification systems

00:41:15.619 --> 00:41:18.360
are used everywhere in orthopedics to categorize

00:41:18.360 --> 00:41:21.000
the severity or type of injuries or conditions.

00:41:21.280 --> 00:41:23.280
We have the AO classification for fractures,

00:41:23.599 --> 00:41:26.199
Schatzker for tibial plateau fractures, Gustilo

00:41:26.199 --> 00:41:29.099
and Anderson for open fractures, Tscherne for assessing

00:41:29.099 --> 00:41:32.039
soft tissue injury around closed fractures. While

00:41:32.039 --> 00:41:34.099
these systems provide a necessary framework for

00:41:34.099 --> 00:41:36.800
communication and research, their reliability,

00:41:37.139 --> 00:41:39.199
particularly that inter -observer reliability,

00:41:40.059 --> 00:41:42.119
the agreement between different clinicians looking

00:41:42.119 --> 00:41:44.300
at the same case can be surprisingly low. Like

00:41:44.300 --> 00:41:48.099
that 60 % figure. Exactly. That 60 % agreement

00:41:48.099 --> 00:41:50.619
rate among experienced orthopedic trauma surgeons

00:41:50.619 --> 00:41:53.639
classifying open tibial fractures using the standard

00:41:53.639 --> 00:41:56.360
Gustilo and Anderson system is a stark reminder.

00:41:57.079 --> 00:42:00.119
Even with defined criteria, subjective interpretation

00:42:00.119 --> 00:42:02.989
can lead to considerable disagreement. And if

00:42:02.989 --> 00:42:05.269
different surgeons classify the same injury differently,

00:42:05.750 --> 00:42:07.730
how can you consistently use that classification

00:42:07.730 --> 00:42:09.849
either as a baseline characteristic to ensure

00:42:09.849 --> 00:42:12.250
groups are comparable, or even as an outcome

00:42:12.250 --> 00:42:14.869
measure in research? It introduces significant

00:42:14.869 --> 00:42:16.829
measurement error and reduces the study's power

00:42:16.829 --> 00:42:19.469
and credibility. Other observer -based measures

00:42:19.469 --> 00:42:21.389
mentioned include trauma severity scores like

00:42:21.389 --> 00:42:24.190
TRISS, which combine physiological data, like blood

00:42:24.190 --> 00:42:26.869
pressure, and anatomical injury data to predict

00:42:26.869 --> 00:42:30.059
survival after major trauma. Cosmesis, or the

00:42:30.059 --> 00:42:32.000
aesthetic appearance, is another important outcome,

00:42:32.500 --> 00:42:34.179
particularly in specialties like plastic surgery

00:42:34.179 --> 00:42:36.599
or pediatric orthopedics, for instance after

00:42:36.599 --> 00:42:39.500
limb -lengthening procedures. Measuring cosmesis

00:42:39.500 --> 00:42:41.579
is often subjective and can be controversial,

00:42:41.940 --> 00:42:44.019
despite its undeniable psychosocial implications

00:42:44.019 --> 00:42:46.840
for patients. There's a recognized need for validated

00:42:46.840 --> 00:42:49.199
cosmetic outcome measures, which are feasible

00:42:49.199 --> 00:42:51.559
to develop using things like standardized photography

00:42:51.559 --> 00:42:54.860
and scoring systems. For infections like osteomyelitis,

00:42:55.300 --> 00:42:58.059
classification systems exist. The source mentions

00:42:58.059 --> 00:43:00.699
Waldvogel's system, which is more descriptive

00:43:00.699 --> 00:43:03.079
compared to the Cierny-Mader system, which is designed more

00:43:03.079 --> 00:43:05.719
to guide treatment and predict prognosis. And

00:43:05.719 --> 00:43:08.019
finally, cost -utility analysis using outcomes

00:43:08.019 --> 00:43:11.500
like Quality Adjusted Life Years, or QALYs, is

00:43:11.500 --> 00:43:14.969
a critical economic outcome measure. A QALY combines

00:43:14.969 --> 00:43:16.849
the length of life gained with a measure of its

00:43:16.849 --> 00:43:19.250
quality, often derived from patient utility scores

00:43:19.250 --> 00:43:22.250
based on questionnaires like the EQ-5D. It's highly

00:43:22.250 --> 00:43:24.190
relevant in orthopedics because improving quality

00:43:24.190 --> 00:43:26.409
of life is often the primary benefit of many

00:43:26.409 --> 00:43:29.110
interventions, and QALYs allow comparison of

00:43:29.110 --> 00:43:31.289
the cost -effectiveness of interventions across

00:43:31.289 --> 00:43:33.170
different areas of healthcare.

00:43:33.170 --> 00:43:35.469
That provides a really comprehensive overview of the measurement

00:43:35.469 --> 00:43:38.650
challenges. Let's transition now to the equally

00:43:38.650 --> 00:43:41.150
critical foundation of ethical practice in human

00:43:41.150 --> 00:43:43.739
research. The source material outlines several

00:43:43.739 --> 00:43:46.179
key pillars, starting with informed consent.

00:43:46.920 --> 00:43:48.960
We touched on it in the rapid fire, but let's

00:43:48.960 --> 00:43:51.980
delve deeper into what constitutes truly informed

00:43:51.980 --> 00:43:55.760
consent. Informed consent is, without any exaggeration,

00:43:56.059 --> 00:43:58.940
non -negotiable. It's the primary ethical principle

00:43:58.940 --> 00:44:01.400
that upholds the autonomy, the self -determination

00:44:01.400 --> 00:44:04.159
of potential research participants. For consent

00:44:04.159 --> 00:44:07.039
to be truly informed, the participant must be

00:44:07.039 --> 00:44:09.039
provided with all relevant information about

00:44:09.039 --> 00:44:11.860
the study. This includes its purpose, the study

00:44:11.860 --> 00:44:13.980
procedures, what exactly will happen to them,

00:44:14.280 --> 00:44:16.760
how often for how long, any potential risks,

00:44:16.920 --> 00:44:19.840
both common and rare, serious and non -serious.

00:44:20.139 --> 00:44:22.139
Potential benefits, both perhaps directly to

00:44:22.139 --> 00:44:24.039
them, but also importantly to future patients

00:44:24.039 --> 00:44:26.820
or scientific knowledge. What are the alternative

00:44:26.820 --> 00:44:28.820
treatments or procedures available outside the

00:44:28.820 --> 00:44:30.679
study? What are their rights as a participant,

00:44:30.960 --> 00:44:33.400
including the absolute right to withdraw at any

00:44:33.400 --> 00:44:36.869
time without penalty? and information about confidentiality

00:44:36.869 --> 00:44:39.050
and how their data will be handled and protected.

00:44:39.369 --> 00:44:41.449
So it's not just giving them a leaflet? Absolutely

00:44:41.449 --> 00:44:44.329
not. It's not enough just to provide the information,

00:44:44.550 --> 00:44:47.170
perhaps in a lengthy document. The participant

00:44:47.170 --> 00:44:49.849
must also understand it. This often involves

00:44:49.849 --> 00:44:51.710
the researcher checking their comprehension,

00:44:52.489 --> 00:44:54.769
maybe asking them to explain key aspects of the

00:44:54.769 --> 00:44:57.610
study back in their own words. And crucially,

00:44:57.969 --> 00:45:00.190
their decision to participate must be entirely

00:45:00.190 --> 00:45:03.829
voluntary, free from any coercion or undue influence.

00:45:04.650 --> 00:45:07.690
This voluntary aspect can be particularly challenging

00:45:07.690 --> 00:45:09.949
in situations where there is a significant power

00:45:09.949 --> 00:45:12.289
imbalance perhaps between a physician and their

00:45:12.289 --> 00:45:15.010
patient, a professor and their student, or an

00:45:15.010 --> 00:45:17.849
employer and employee. The consent process should

00:45:17.849 --> 00:45:20.630
really be an ongoing conversation, not just a

00:45:20.630 --> 00:45:22.719
one -off signature on a form. And the source

00:45:22.719 --> 00:45:25.179
mentions there are specific populations considered

00:45:25.179 --> 00:45:27.599
vulnerable who require additional protections

00:45:27.599 --> 00:45:30.639
in research. Yes, absolutely. Certain groups

00:45:30.639 --> 00:45:33.139
are deemed vulnerable because various factors

00:45:33.139 --> 00:45:36.000
such as compromised autonomy, cognitive limitations

00:45:36.000 --> 00:45:39.000
being institutionalized, or facing unequal power

00:45:39.000 --> 00:45:41.599
dynamics may affect their ability to give truly

00:45:41.599 --> 00:45:44.400
free and informed consent, or might make them

00:45:44.400 --> 00:45:47.179
more susceptible to coercion or exploitation.

00:45:47.469 --> 00:45:50.309
These groups are afforded extra safeguards in

00:45:50.309 --> 00:45:52.889
research regulations. Who falls into these categories?

00:45:53.230 --> 00:45:55.969
Examples typically include minors, children under

00:45:55.969 --> 00:45:58.969
the legal age of consent, adults with impaired

00:45:58.969 --> 00:46:01.710
mental capacity who would require proxy consent

00:46:01.710 --> 00:46:04.469
from a legally authorized representative, pregnant

00:46:04.469 --> 00:46:07.090
women, fetuses, and neonates due to potential

00:46:07.090 --> 00:46:09.849
risks to the developing child, and prisoners

00:46:09.849 --> 00:46:12.250
due to the constraints on their liberty and potential

00:46:12.250 --> 00:46:14.510
for coercion within the institutional setting.

00:46:14.679 --> 00:46:17.699
For children, specific rules apply, often including

00:46:17.699 --> 00:46:19.960
the need for ongoing permission or assent from

00:46:19.960 --> 00:46:21.679
the child themselves, depending on their age

00:46:21.679 --> 00:46:23.719
and maturity level, in addition to permission

00:46:23.719 --> 00:46:26.099
from their parents or legal guardians. There

00:46:26.099 --> 00:46:28.400
are also very strict conditions under which parental

00:46:28.400 --> 00:46:30.320
permission might be waived, such as in certain

00:46:30.320 --> 00:46:32.920
emergency research settings, but these are tightly

00:46:32.920 --> 00:46:36.760
regulated and reviewed. Another critical ethical

00:46:36.760 --> 00:46:39.079
consideration highlighted is conflicts of interest.

00:46:39.579 --> 00:46:41.500
These seem particularly prevalent or at least

00:46:41.500 --> 00:46:44.199
discussed more often in research with industry

00:46:44.199 --> 00:46:46.780
ties. What constitutes a conflict and why are

00:46:46.780 --> 00:46:49.340
they problematic for research integrity? A conflict

00:46:49.340 --> 00:46:52.360
of interest arises when a person's primary professional

00:46:52.360 --> 00:46:55.639
duty and in research that's primarily upholding

00:46:55.639 --> 00:46:57.860
the integrity of the study and protecting the

00:46:57.860 --> 00:47:00.519
welfare of the participants could be improperly

00:47:00.519 --> 00:47:03.719
influenced by a secondary interest. This secondary

00:47:03.719 --> 00:47:06.000
interest is very often financial, but it could

00:47:06.000 --> 00:47:08.739
also be non -financial things, like seeking academic

00:47:08.739 --> 00:47:11.039
promotion, gaining professional recognition,

00:47:11.400 --> 00:47:13.619
or even just a strong personal belief in the

00:47:13.619 --> 00:47:16.679
value of one's own work or invention. In industry

00:47:16.679 --> 00:47:18.820
-funded research, you often have at least three

00:47:18.820 --> 00:47:22.019
parties involved. The researcher, their institution,

00:47:22.179 --> 00:47:24.559
like a university or hospital, and the funding

00:47:24.559 --> 00:47:26.800
corporation. Each party has its own interests,

00:47:27.119 --> 00:47:29.690
which may not always perfectly align. The corporation

00:47:29.690 --> 00:47:31.969
naturally wants the research to show their product

00:47:31.969 --> 00:47:34.670
is effective and safe. The institution might

00:47:34.670 --> 00:47:37.070
benefit financially from research grants or intellectual

00:47:37.070 --> 00:47:39.329
property rights. The researcher might be seeking funding

00:47:39.329 --> 00:47:41.329
for their lab, publications which help their

00:47:41.329 --> 00:47:43.510
career, and perhaps personal financial benefits

00:47:43.510 --> 00:47:46.289
like consultancy fees or royalties. And the problem

00:47:46.289 --> 00:47:49.849
is the potential for bias. Exactly. Financial

00:47:49.849 --> 00:47:52.389
conflicts of interest are a major concern because

00:47:52.389 --> 00:47:54.710
there's a considerable body of evidence suggesting

00:47:54.710 --> 00:47:58.269
they can subtly, or sometimes even overtly, bias

00:47:58.269 --> 00:48:01.829
the research process and its outcomes. For example,

00:48:01.929 --> 00:48:03.889
a researcher receiving significant royalties

00:48:03.889 --> 00:48:06.750
from an implant they helped develop might, consciously

00:48:06.750 --> 00:48:09.369
or unconsciously, be biased towards designing

00:48:09.369 --> 00:48:11.489
studies or interpreting results in a way that

00:48:11.489 --> 00:48:14.650
favors that implant. Professional bodies, like

00:48:14.650 --> 00:48:17.940
the AAOS, American Academy of Orthopedic Surgeons,

00:48:18.219 --> 00:48:20.280
mentioned in the source, often have principles

00:48:20.280 --> 00:48:22.440
recommending stringent measures to manage these

00:48:22.440 --> 00:48:25.079
conflicts. These might include things like avoiding

00:48:25.079 --> 00:48:26.800
trading stocks in the funding company during

00:48:26.800 --> 00:48:29.679
the trial, and having research on one's own royalty

00:48:29.679 --> 00:48:32.239
-generating products conducted by an independent,

00:48:32.679 --> 00:48:35.460
disinterested third party if possible. The evidence

00:48:35.460 --> 00:48:37.760
base, as noted in the source, does suggest a correlation.

00:48:37.949 --> 00:48:40.550
Clinical trials with significant industry funding

00:48:40.550 --> 00:48:43.230
or author ties are statistically more likely

00:48:43.230 --> 00:48:45.710
to report results favorable to the industry sponsor

00:48:45.710 --> 00:48:48.230
compared to independently funded trials. Now

00:48:48.230 --> 00:48:50.369
this doesn't automatically mean all industry

00:48:50.369 --> 00:48:53.369
funded research is biased, but it certainly highlights

00:48:53.369 --> 00:48:56.989
the inherent risk and the absolute need for transparency

00:48:56.989 --> 00:48:59.489
and robust management strategies for these conflicts.

00:48:59.809 --> 00:49:02.269
Full disclosure is the minimum standard. It certainly

00:49:02.269 --> 00:49:04.690
seems many of these ethical principles we take

00:49:04.690 --> 00:49:07.650
for granted now were solidified in response to

00:49:07.650 --> 00:49:10.329
some quite shocking past abuses in research.

00:49:11.070 --> 00:49:13.829
The source material provides some sobering historical

00:49:13.829 --> 00:49:17.179
examples. Yes. Tragically, many of the current

00:49:17.179 --> 00:49:19.280
ethical regulations and oversight mechanisms

00:49:19.280 --> 00:49:23.159
we have are direct responses to egregious historical

00:49:23.159 --> 00:49:25.940
violations of ethical principles. The Tuskegee

00:49:25.940 --> 00:49:28.099
syphilis experiment, which ran for decades in

00:49:28.099 --> 00:49:30.159
the US, is perhaps the most infamous example.

00:49:30.900 --> 00:49:33.099
African -American men with syphilis were deliberately

00:49:33.099 --> 00:49:36.039
denied effective treatment, penicillin, long after

00:49:36.039 --> 00:49:38.559
it became widely available, purely so researchers

00:49:38.559 --> 00:49:40.579
could study the natural progression of the untreated

00:49:40.579 --> 00:49:43.300
disease. Participants were actively deceived

00:49:43.300 --> 00:49:46.679
and profoundly harmed. The Stanford Prison Experiment,

00:49:47.099 --> 00:49:48.880
though different in nature and smaller scale,

00:49:49.320 --> 00:49:51.000
highlighted the ethical failure of researchers

00:49:51.000 --> 00:49:53.280
to adequately protect participants and to stop

00:49:53.280 --> 00:49:55.639
the study when individuals were clearly experiencing

00:49:55.639 --> 00:49:57.860
severe psychological distress and felt unable

00:49:57.860 --> 00:50:00.840
to withdraw. And then there was Project MKUltra,

00:50:01.099 --> 00:50:03.500
involving CIA -sponsored experiments often conducted

00:50:03.500 --> 00:50:05.960
on unwitting human subjects, using drugs and

00:50:05.960 --> 00:50:07.699
psychological manipulation techniques in the

00:50:07.699 --> 00:50:10.219
mid -20th century. And these led to real changes

00:50:10.219 --> 00:50:13.440
in oversight. Absolutely. Public outcry and subsequent

00:50:13.440 --> 00:50:15.960
investigations into these and other abuses led

00:50:15.960 --> 00:50:19.119
to significant reforms. In the U .S., the National

00:50:19.119 --> 00:50:21.920
Research Act of 1974 was a landmark piece of

00:50:21.920 --> 00:50:24.659
legislation. It established the National Commission

00:50:24.659 --> 00:50:26.940
for the Protection of Human Subjects of Biomedical

00:50:26.940 --> 00:50:29.639
and Behavioral Research. This commission produced

00:50:29.639 --> 00:50:31.980
the pivotal Belmont Report, which outlined the

00:50:31.980 --> 00:50:34.219
three core ethical principles that now guide

00:50:34.219 --> 00:50:37.659
human research. Respect for persons, autonomy

00:50:37.659 --> 00:50:39.800
and protection of the vulnerable; beneficence,

00:50:40.059 --> 00:50:43.079
do no harm, maximize benefits; and justice, fair

00:50:43.079 --> 00:50:45.639
distribution of burdens and benefits. This eventually

00:50:45.639 --> 00:50:47.880
led to the development of the Common Rule, which

00:50:47.880 --> 00:50:49.940
is the set of federal regulations governing most

00:50:49.940 --> 00:50:52.360
research involving human subjects in the U .S.

00:50:52.489 --> 00:50:54.849
It mandates things like institutional review

00:50:54.849 --> 00:50:58.670
boards, IRBs, or ethics committees, IECs, to

00:50:58.670 --> 00:51:00.590
review and approve research protocols before

00:51:00.590 --> 00:51:02.989
they begin, requires detailed informed written

00:51:02.989 --> 00:51:05.130
consent, and sets out specific requirements for

00:51:05.130 --> 00:51:07.349
risk benefit assessments and ensuring subject

00:51:07.349 --> 00:51:09.900
rights and welfare are protected. Internationally,

00:51:10.179 --> 00:51:11.980
guidelines like the International Council for

00:51:11.980 --> 00:51:15.579
Harmonization, ICH, Good Clinical Practice, GCP,

00:51:16.179 --> 00:51:18.599
provide a unified standard for ensuring the quality,

00:51:18.900 --> 00:51:21.519
integrity, and ethical conduct of clinical trials

00:51:21.519 --> 00:51:24.340
involving human participants globally. These

00:51:24.340 --> 00:51:26.539
historical lessons are absolutely vital reminders

00:51:26.539 --> 00:51:28.659
of why these protections are so necessary and

00:51:28.659 --> 00:51:31.059
must be rigorously upheld. So with all these

00:51:31.059 --> 00:51:33.760
layers of design, analysis, measurement, and

00:51:33.760 --> 00:51:36.639
ethics, who ultimately bears the responsibility

00:51:36.639 --> 00:51:39.480
for ensuring a study is conducted properly and

00:51:39.480 --> 00:51:42.480
ethically on the ground? The primary responsibility

00:51:42.480 --> 00:51:44.559
falls squarely on the shoulders of the principal

00:51:44.559 --> 00:51:47.820
investigator, or PI. This is the individual,

00:51:48.000 --> 00:51:50.679
usually a clinician or senior researcher, who

00:51:50.679 --> 00:51:52.760
leads the research team at a particular study

00:51:52.760 --> 00:51:55.880
site. The PI is accountable for the overall conduct

00:51:55.880 --> 00:51:58.739
of the study at their site and, critically, for

00:51:58.739 --> 00:52:00.920
protecting the rights, safety, and welfare of

00:52:00.920 --> 00:52:03.480
the research subjects under their care. They

00:52:03.480 --> 00:52:05.659
are the one ultimately answerable to the sponsor,

00:52:05.900 --> 00:52:07.699
the ethics committee, and regulatory authorities

00:52:07.699 --> 00:52:09.719
for what happens at their site. What are some

00:52:09.719 --> 00:52:12.280
of the key responsibilities of a PI, particularly

00:52:12.280 --> 00:52:15.380
under guidelines like GCP? Under GCP, the PI

00:52:15.380 --> 00:52:17.559
has a wide range of specific responsibilities.

00:52:17.900 --> 00:52:20.199
They must ensure that the study is conducted

00:52:20.199 --> 00:52:22.300
strictly in compliance with the ethically approved

00:52:22.300 --> 00:52:25.159
protocol, those core ethical principles we discussed,

00:52:25.559 --> 00:52:28.159
like informed consent, and all relevant local

00:52:28.159 --> 00:52:31.380
and international regulatory requirements. If

00:52:31.380 --> 00:52:33.500
any deviation from the protocol is necessary,

00:52:33.519 --> 00:52:36.079
for example, if a patient's immediate safety

00:52:36.079 --> 00:52:38.420
requires a different clinical action, the PI

00:52:38.420 --> 00:52:41.239
must document it thoroughly and report it appropriately.

00:52:42.210 --> 00:52:44.570
Importantly, any significant changes or amendments

00:52:44.570 --> 00:52:47.010
to the protocol generally require prior review

00:52:47.010 --> 00:52:50.150
and approval from the Ethics Committee, IRB/IEC,

00:52:50.489 --> 00:52:52.969
before they are implemented. The exception is,

00:52:53.190 --> 00:52:55.789
if a deviation is immediately necessary to eliminate

00:52:55.789 --> 00:52:58.610
an unforeseen hazard to a participant, in that

00:52:58.610 --> 00:53:01.329
case, it should be done, but then documented

00:53:01.329 --> 00:53:03.349
and reported to the Ethics Committee and sponsor

00:53:03.349 --> 00:53:06.449
immediately afterwards. The PI is also responsible

00:53:06.449 --> 00:53:08.809
for ensuring subjects are kept adequately informed

00:53:08.809 --> 00:53:11.130
throughout the study and obtaining re -consent

00:53:11.130 --> 00:53:13.230
if significant new information about risks or

00:53:13.230 --> 00:53:15.690
potential benefits emerges, or if there are major

00:53:15.690 --> 00:53:17.929
changes to the study protocol that might affect

00:53:17.929 --> 00:53:20.170
their willingness to continue. They must also

00:53:20.170 --> 00:53:22.869
ensure the accuracy, completeness, and integrity

00:53:22.869 --> 00:53:25.469
of all the study data collected at their site.

00:53:25.690 --> 00:53:28.230
You mentioned recruitment earlier as being a

00:53:28.230 --> 00:53:31.170
major challenge, particularly for RCTs. How does

00:53:31.170 --> 00:53:33.269
that impact the PI specifically, and what are

00:53:33.269 --> 00:53:35.989
some common issues they might face in just managing

00:53:35.989 --> 00:53:39.070
the day -to -day trial operations? Yes. Poor

00:53:39.070 --> 00:53:41.670
recruitment and participant retention is, unfortunately,

00:53:42.090 --> 00:53:44.789
one of the most common reasons why clinical trials

00:53:44.789 --> 00:53:47.230
fail to meet their objectives or sometimes even

00:53:47.230 --> 00:53:50.929
fail to complete at all. The PI needs to be realistic

00:53:50.929 --> 00:53:53.369
when predicting recruitment rates for their site

00:53:53.369 --> 00:53:56.019
during the planning phase. They need to identify

00:53:56.019 --> 00:53:58.280
potential factors that might hinder enrollment.

00:53:59.059 --> 00:54:00.860
Maybe the protocol is extremely complex with

00:54:00.860 --> 00:54:03.320
lots of visits, perhaps the eligibility criteria

00:54:03.320 --> 00:54:06.179
are very strict, or maybe patients in their locality

00:54:06.179 --> 00:54:08.699
just aren't interested in this specific comparison

00:54:08.699 --> 00:54:11.639
being made. They need to regularly review recruitment

00:54:11.639 --> 00:54:14.260
progress against the targets. If it's lagging

00:54:14.260 --> 00:54:17.039
behind, the PI needs to proactively discuss the

00:54:17.039 --> 00:54:19.119
difficulties with their research staff and the

00:54:19.119 --> 00:54:21.659
trial sponsor. Solutions might involve providing

00:54:21.659 --> 00:54:24.079
additional training or resources to staff, looking

00:54:24.079 --> 00:54:26.239
at ways to simplify aspects of the protocol if

00:54:26.239 --> 00:54:29.139
feasible, which would need ethics approval, collaborating

00:54:29.139 --> 00:54:31.820
with other sites, or perhaps requesting an extension

00:54:31.820 --> 00:54:33.900
to the recruitment period if resources allow.

00:54:34.500 --> 00:54:37.079
It requires active management. Managing the overall

00:54:37.079 --> 00:54:39.179
trial operations involves significant risk management

00:54:39.179 --> 00:54:41.380
as well. Essential documents like the detailed

00:54:41.380 --> 00:54:44.039
protocol itself, standard operating procedures,

00:54:44.340 --> 00:54:47.619
SOPs, for various tasks like data entry or sample

00:54:47.619 --> 00:54:50.119
handling, the monitoring plan from the sponsor,

00:54:50.579 --> 00:54:52.599
and all the ethics committee approvals must be

00:54:52.599 --> 00:54:55.079
meticulously organized and in place before the

00:54:55.079 --> 00:54:58.440
very first subject is enrolled. Regulatory audits

00:54:58.440 --> 00:55:00.619
or inspections often uncover common findings.

00:55:00.860 --> 00:55:03.519
Things like inadequate monitoring by the sponsor,

00:55:04.159 --> 00:55:06.340
instances of PI noncompliance with the protocol

00:55:06.340 --> 00:55:08.800
or regulations, deficiencies in the informed

00:55:08.800 --> 00:55:11.500
consent process or documentation, and issues

00:55:11.500 --> 00:55:13.760
with accountability for the investigational product,

00:55:14.260 --> 00:55:16.400
like ensuring the correct drug or device is stored,

00:55:16.599 --> 00:55:20.019
dispensed, used, and tracked properly. A diligent

00:55:20.019 --> 00:55:22.039
PI and their research team need to be proactive

00:55:22.039 --> 00:55:23.940
in identifying and mitigating these potential

00:55:23.940 --> 00:55:26.519
risks and ready to implement swift, corrective,

00:55:26.559 --> 00:55:28.719
and preventive actions if any issues are identified

00:55:28.719 --> 00:55:31.699
during monitoring or audits. Right, so assuming

00:55:31.699 --> 00:55:33.980
the trial runs successfully, all the data are

00:55:33.980 --> 00:55:36.519
collected and analyzed. The findings then need

00:55:36.519 --> 00:55:38.659
to be communicated, typically through writing

00:55:38.659 --> 00:55:41.880
a research manuscript for publication. The source

00:55:41.880 --> 00:55:45.320
discusses the standard IMRaD structure. Can you

00:55:45.320 --> 00:55:47.840
quickly break down what should ideally go into

00:55:47.840 --> 00:55:51.360
each of those sections for clarity? Yes, IMRaD.

00:55:51.619 --> 00:55:54.659
It stands for Introduction, Methods, Results,

00:55:54.860 --> 00:55:57.250
and Discussion. It's the universally accepted

00:55:57.250 --> 00:55:59.630
structure for most original scientific research

00:55:59.630 --> 00:56:01.869
manuscripts because it provides a logical and

00:56:01.869 --> 00:56:04.230
predictable flow. It allows readers to quickly

00:56:04.230 --> 00:56:07.110
understand why the study was done. Introduction.

00:56:07.449 --> 00:56:10.230
How it was done. Methods. What was found. Results.

00:56:10.510 --> 00:56:13.210
And what those findings mean. Discussion. So

00:56:13.210 --> 00:56:15.250
starting with the introduction. The introduction

00:56:15.250 --> 00:56:17.269
sets the stage. It should provide the necessary

00:56:17.269 --> 00:56:19.710
background context for the research, explain

00:56:19.710 --> 00:56:21.829
the existing knowledge gap or the clinical problem

00:56:21.829 --> 00:56:24.369
the study addresses, clearly state the specific

00:56:24.369 --> 00:56:26.269
research question or hypothesis being tested,

00:56:26.570 --> 00:56:28.630
and finish by presenting the study's primary

00:56:28.630 --> 00:56:31.010
aim or objective. Then the methods sound crucial

00:56:31.010 --> 00:56:33.909
for judging quality. The methods section is absolutely

00:56:33.909 --> 00:56:36.170
critical for transparency and reproducibility.

00:56:36.469 --> 00:56:38.809
It needs to describe exactly how the study was

00:56:38.809 --> 00:56:41.570
conducted in sufficient detail that another researcher

00:56:41.570 --> 00:56:44.699
could, at least in theory, replicate it. This

00:56:44.699 --> 00:56:47.639
includes detailing the study design, for example,

00:56:47.800 --> 00:56:50.659
RCT, cohort study, the setting where it took

00:56:50.659 --> 00:56:53.800
place, the participant selection criteria, inclusion

00:56:53.800 --> 00:56:56.420
and exclusion, and how participants were recruited.

00:56:57.099 --> 00:56:59.480
It needs to describe the interventions or exposures

00:56:59.480 --> 00:57:02.500
being studied, define the primary and secondary

00:57:02.500 --> 00:57:04.880
outcome measures used, and exactly how they were

00:57:04.880 --> 00:57:07.599
assessed. Information about the sample size calculation

00:57:07.599 --> 00:57:10.059
and the statistical methods used for the analysis

00:57:10.059 --> 00:57:13.389
must also be included here. Crucially, the description

00:57:13.389 --> 00:57:15.769
of the study sample itself, demographics, baseline

00:57:15.769 --> 00:57:18.250
characteristics of the participants, often presented

00:57:18.250 --> 00:57:20.670
in Table 1, belongs in the methods section, not

00:57:20.670 --> 00:57:23.070
the results. Okay, results, just the facts. The

00:57:23.070 --> 00:57:25.789
results section should present the findings objectively,

00:57:26.170 --> 00:57:28.150
without interpretation or speculation at this

00:57:28.150 --> 00:57:30.909
stage. You report the data, often using tables

00:57:30.909 --> 00:57:33.800
and figures for clarity. You should include key

00:57:33.800 --> 00:57:37.039
descriptive statistics like means, standard deviations,

00:57:37.320 --> 00:57:40.079
medians, ranges, and the results of your inferential

00:57:40.079 --> 00:57:42.300
statistical tests, making sure to include the

00:57:42.300 --> 00:57:44.840
numerical values for effect sizes, like mean

00:57:44.840 --> 00:57:47.099
differences or odds ratios, their confidence

00:57:47.099 --> 00:57:50.820
intervals, and the exact p -values. If you use

00:57:50.820 --> 00:57:53.039
the word significant in a statistical context,

00:57:53.260 --> 00:57:55.380
it absolutely must be followed by the corresponding

00:57:55.380 --> 00:57:58.769
p -value or CI to justify that claim. You should

00:57:58.769 --> 00:58:00.929
focus on reporting the main findings that directly

00:58:00.929 --> 00:58:03.610
relate to your primary study objectives, especially

00:58:03.610 --> 00:58:06.050
if space is limited. And finally, the discussion

00:58:06.050 --> 00:58:09.349
is where you interpret it all. Exactly. The discussion

00:58:09.349 --> 00:58:11.650
section is where you interpret your results and

00:58:11.650 --> 00:58:14.030
place them in the broader context. You typically

00:58:14.030 --> 00:58:16.170
start by briefly summarizing your study's most

00:58:16.170 --> 00:58:18.510
important findings, answering the research question

00:58:18.510 --> 00:58:21.610
you posed in the introduction. Then you compare

00:58:21.610 --> 00:58:23.550
and contrast your results with previous research

00:58:23.550 --> 00:58:26.440
in the field. Do your findings support, contradict,

00:58:26.760 --> 00:58:29.019
or perhaps add nuance to what was already known?

00:58:30.019 --> 00:58:32.139
You should discuss the clinical relevance or

00:58:32.139 --> 00:58:34.400
implications of your findings. What do they mean

00:58:34.400 --> 00:58:37.099
for patient care, clinical practice, or perhaps

00:58:37.099 --> 00:58:40.159
health policy? And this is absolutely vital for

00:58:40.159 --> 00:58:42.880
trustworthiness and scientific integrity. You

00:58:42.880 --> 00:58:45.039
must explicitly discuss your study's limitations.

00:58:45.480 --> 00:58:48.050
No study is perfect. Honestly acknowledging

00:58:48.050 --> 00:58:50.730
the limitations, perhaps potential biases that

00:58:50.730 --> 00:58:53.210
couldn't be fully controlled, issues with measurement

00:58:53.210 --> 00:58:55.489
tools, constraints imposed by the sample size

00:58:55.489 --> 00:58:58.050
or duration, or factors affecting the generalizability

00:58:58.050 --> 00:58:59.949
of your findings significantly increases the

00:58:59.949 --> 00:59:02.510
credibility of your work. Discussing limitations

00:59:02.510 --> 00:59:05.409
also often helps to identify remaining uncertainties

00:59:05.409 --> 00:59:07.769
and generate important ideas for future research.

00:59:08.329 --> 00:59:10.289
And using reporting guidelines helps with this.

00:59:10.510 --> 00:59:13.570
Yes, adhering to specific reporting guidelines,

00:59:13.869 --> 00:59:15.989
like the PRISMA Statement for Systematic Reviews

00:59:15.989 --> 00:59:19.550
and Meta -Analyses, or CONSORT for RCTs, helps

00:59:19.550 --> 00:59:21.409
ensure that all the essential information is

00:59:21.409 --> 00:59:23.809
included in a standardized way, which improves

00:59:23.809 --> 00:59:26.690
the quality and transparency of reporting. The

00:59:26.690 --> 00:59:29.190
whole submission process to a journal involves

00:59:29.190 --> 00:59:32.170
gathering all these materials, including increasingly

00:59:32.170 --> 00:59:34.889
detailed disclosures of any potential financial

00:59:34.889 --> 00:59:37.710
conflicts of interest from all authors, again

00:59:37.710 --> 00:59:40.420
to promote transparency. That was a really thorough

00:59:40.420 --> 00:59:42.619
breakdown of the entire research journey from

00:59:42.619 --> 00:59:44.559
initial design right through to publication.

00:59:45.119 --> 00:59:46.639
Let's maybe transition into a quick lightning

00:59:46.639 --> 00:59:48.780
round just to reinforce a few of the key concepts

00:59:48.780 --> 00:59:51.059
we've discussed. All right, lightning round time.

00:59:52.099 --> 00:59:55.500
Beyond the p -value, what is one specific statistical

00:59:55.500 --> 00:59:57.739
concept that professionals should always pay

00:59:57.739 --> 00:59:59.480
close attention to when they're reading research

00:59:59.480 --> 01:00:02.119
papers? Confidence intervals. Always look at

01:00:02.119 --> 01:00:04.380
the range of the plausible effect and crucially

01:00:04.380 --> 01:00:06.639
how that range relates to what you'd consider

01:00:06.840 --> 01:00:09.099
clinically important, not just whether it crosses

01:00:09.099 --> 01:00:11.960
zero or not. Good one. Okay, if you're

01:00:11.960 --> 01:00:14.199
reading a paper and something feels a bit off

01:00:14.199 --> 01:00:17.400
or maybe too good to be true, what's a potential

01:00:17.400 --> 01:00:20.420
red flag that might suggest reporting bias or

01:00:20.420 --> 01:00:23.059
maybe even p -hacking could be involved? Hmm.

01:00:23.599 --> 01:00:26.300
Watch out for selective reporting. For instance,

01:00:26.699 --> 01:00:28.780
if lots of different outcomes were measured according

01:00:28.780 --> 01:00:31.500
to the methods section, but only one or two are

01:00:31.500 --> 01:00:34.360
highlighted as significant in the results section

01:00:34.360 --> 01:00:37.500
without mentioning the others. Or perhaps a lack

01:00:37.500 --> 01:00:39.699
of transparency about exactly how the statistical

01:00:39.699 --> 01:00:42.320
analyses were conducted, especially if the results

01:00:42.320 --> 01:00:45.000
seem unexpectedly strong or clean. Makes sense.

01:00:45.500 --> 01:00:48.039
And for someone perhaps new to research ethics

01:00:48.039 --> 01:00:50.880
or wanting a refresher, where is a key place

01:00:50.880 --> 01:00:52.940
they could start looking for foundational information

01:00:52.940 --> 01:00:55.900
on the ethical standards for human studies? Well,

01:00:55.980 --> 01:00:57.880
searching for resources on the U .S. Common Rule,

01:00:58.059 --> 01:00:59.780
especially if working with U .S. federally funded

01:00:59.780 --> 01:01:02.739
research, is a good starting point there. Or,

01:01:02.739 --> 01:01:04.800
for broader international guidance that's widely

01:01:04.800 --> 01:01:07.760
adopted, looking up the ICH Good Clinical Practice

01:01:07.760 --> 01:01:09.900
GCP guidelines would be an excellent place to

01:01:09.900 --> 01:01:12.760
begin. They cover ethical conduct and data quality.

01:01:13.300 --> 01:01:16.260
Excellent. Great. Quick recap. So pulling all

01:01:16.260 --> 01:01:18.019
of this together now, thinking about our listeners

01:01:18.019 --> 01:01:21.059
trying to navigate this complex world, what are

01:01:21.059 --> 01:01:23.420
the key actionable takeaways they should really

01:01:23.420 --> 01:01:25.420
keep in mind when they're trying to evaluate

01:01:25.420 --> 01:01:27.760
clinical research findings for themselves? Okay.

01:01:28.039 --> 01:01:30.829
Key takeaways. First, I'd say use the hierarchy

01:01:30.829 --> 01:01:33.510
of evidence as a useful starting guide, but always

01:01:33.510 --> 01:01:36.250
remember its limitations. Really try to understand

01:01:36.250 --> 01:01:38.389
the inherent strengths and weaknesses of the

01:01:38.389 --> 01:01:40.889
specific study design you're looking at. RCTs

01:01:40.889 --> 01:01:43.869
are theoretically strong against bias, but hard

01:01:43.869 --> 01:01:46.969
to do well, especially in surgery. Systematic

01:01:46.969 --> 01:01:49.750
reviews pool evidence, but remember they're only

01:01:49.750 --> 01:01:51.909
ever as good as the primary studies they include.

01:01:52.570 --> 01:01:55.630
Be critical of the design itself. Second, scrutinize

01:01:55.630 --> 01:01:57.650
how the outcomes were measured. Was a patient

01:01:57.650 --> 01:02:00.630
-reported outcome captured with a validated PROM? Was a clinician

01:02:00.630 --> 01:02:03.289
-observed measure based on a potentially less reliable classification

01:02:03.289 --> 01:02:06.389
system? Understand the importance of demonstrated

01:02:06.389 --> 01:02:09.050
reliability and validity for those specific measures

01:02:09.050 --> 01:02:11.809
used in that study. Critically ask, does the

01:02:11.809 --> 01:02:13.610
measure actually capture what truly matters in

01:02:13.610 --> 01:02:15.550
clinical practice or, more importantly, to the

01:02:15.550 --> 01:02:18.289
patient? Third, be constantly vigilant for potential

01:02:18.289 --> 01:02:20.349
biases and underlying ethical considerations.

01:02:20.969 --> 01:02:23.090
Look for subtle signs of selection bias in how

01:02:23.090 --> 01:02:25.070
participants were recruited or allocated to groups.

01:02:25.420 --> 01:02:27.539
Think about potential reporting bias and how

01:02:27.539 --> 01:02:29.039
the results are presented. Are they telling the

01:02:29.039 --> 01:02:31.199
whole story? And always consider whether potential

01:02:31.199 --> 01:02:33.480
conflicts of interest, especially financial ones,

01:02:33.920 --> 01:02:35.980
might plausibly influence the interpretation

01:02:35.980 --> 01:02:39.119
or conclusions drawn by the authors. Fourth,

01:02:39.679 --> 01:02:43.039
make sure you go beyond the p -value. While statistical

01:02:43.039 --> 01:02:44.960
significance tells you how likely results this extreme

01:02:44.960 --> 01:02:47.719
would be purely by chance if the null hypothesis

01:02:47.719 --> 01:02:50.280
were true, always use the confidence

01:02:50.280 --> 01:02:52.329
interval. Use it to understand the potential

01:02:52.329 --> 01:02:54.469
magnitude and the precision of the treatment

01:02:54.469 --> 01:02:56.769
effect, and then use that information to gauge

01:02:56.769 --> 01:02:59.090
its actual clinical relevance or importance.
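
[Note: to make the confidence-interval point concrete, here is a minimal Python sketch. The pain-score data, group sizes, and the 2-point clinically important difference are all hypothetical, chosen only to illustrate reading a CI for magnitude and precision.]

```python
# Mean difference with a 95% CI (normal approximation, hypothetical data):
# the interval conveys magnitude and precision, which a p-value alone does not.
from statistics import mean, stdev
from math import sqrt

treatment = [12.0, 9.5, 14.0, 11.0, 10.5, 13.0, 8.5, 12.5]  # made-up scores
control = [9.0, 8.0, 10.5, 7.5, 11.0, 9.5, 8.0, 10.0]

diff = mean(treatment) - mean(control)
se = sqrt(stdev(treatment) ** 2 / len(treatment)
          + stdev(control) ** 2 / len(control))
low, high = diff - 1.96 * se, diff + 1.96 * se

print(f"mean difference = {diff:.2f}, 95% CI [{low:.2f}, {high:.2f}]")
# If the whole interval sits above the smallest change patients actually
# notice (say, 2 points), the effect is plausibly clinically relevant,
# not merely statistically significant.
```

A t-based interval or a proper analysis package would be used in practice; the point is only that the interval, not the p-value, carries the clinical question.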

01:03:00.349 --> 01:03:02.690
And finally, I'd say look for transparency and

01:03:02.690 --> 01:03:05.690
honesty in the reporting itself. Does the paper

01:03:05.690 --> 01:03:07.989
follow a standard logical structure like IMRAD?

01:03:08.119 --> 01:03:10.739
Do the authors openly and frankly discuss their

01:03:10.739 --> 01:03:13.559
study's limitations? Acknowledging limitations

01:03:13.559 --> 01:03:15.739
isn't a sign of weakness, it's actually a sign

01:03:15.739 --> 01:03:18.300
of rigorous, trustworthy research, and it helps

01:03:18.300 --> 01:03:20.340
you understand where the evidence is still uncertain,

01:03:20.800 --> 01:03:23.159
or where more research is needed. Those are incredibly

01:03:23.159 --> 01:03:25.679
valuable filters, I think, for anyone to apply

01:03:25.679 --> 01:03:28.139
when they're trying to sift through the, frankly,

01:03:28.960 --> 01:03:30.639
overwhelming amount of medical literature out

01:03:30.639 --> 01:03:32.929
there. Thank you so much for guiding us through

01:03:32.929 --> 01:03:36.170
this really detailed deep dive into clinical

01:03:36.170 --> 01:03:38.590
research design, analysis, outcome measurement,

01:03:38.969 --> 01:03:41.030
and all those crucial ethical considerations.

01:03:41.449 --> 01:03:43.449
It's been my pleasure. It is a complex area,

01:03:43.510 --> 01:03:45.969
but as you say, it's absolutely vital for anyone

01:03:45.969 --> 01:03:47.670
involved in health care, whether researcher or

01:03:47.670 --> 01:03:50.829
clinician or even patient, to have a basic grasp

01:03:50.829 --> 01:03:53.769
of these principles. Absolutely. And if you found

01:03:53.769 --> 01:03:56.369
this deep dive into evaluating clinical research

01:03:56.369 --> 01:03:58.510
valuable, perhaps consider sharing it with a

01:03:58.510 --> 01:04:01.250
colleague who might also benefit or leaving us

01:04:01.250 --> 01:04:03.289
a rating wherever you happen to get your podcasts.

01:04:03.610 --> 01:04:06.230
It genuinely helps other professionals find the

01:04:06.230 --> 01:04:08.289
show. So the next time you pick up a research

01:04:08.289 --> 01:04:10.989
paper, maybe don't just look for that headline

01:04:10.989 --> 01:04:13.489
p-value or the simple conclusion in the abstract.

01:04:13.809 --> 01:04:16.110
Ask yourself, what did they really measure here?

01:04:16.130 --> 01:04:18.769
And can I actually trust that measurement? And

01:04:18.769 --> 01:04:21.050
perhaps more importantly, what does that confidence

01:04:21.050 --> 01:04:22.960
interval really tell me about whether this

01:04:22.960 --> 01:04:25.239
finding actually matters for my patient or for

01:04:25.239 --> 01:04:28.079
my practice? That question, I think, is often much

01:04:28.079 --> 01:04:29.980
harder to answer than just looking at the statistics

01:04:29.980 --> 01:04:32.400
alone, but it's infinitely more important in

01:04:32.400 --> 01:04:34.840
the long run. That's all for this deep dive.

01:04:35.099 --> 01:04:37.440
Join us next time as we unpack another complex

01:04:37.440 --> 01:04:37.800
topic.
