Module 2 · Internal & External Validity
A flawless trial whose result you're not allowed to use.
A new drug is tested in a textbook-perfect randomised trial: properly randomised, allocation concealed, everyone blinded, thousands of patients, a rock-solid result. The drug clearly works.
Then you notice the fine print. Every patient in the trial was aged 18–55, with no other illnesses, no other medications. Your patient is 79, with diabetes, kidney trouble, and a fistful of daily pills.
The trial was flawless. Can you trust its result for your 79-year-old patient?
Two questions, not one
Every study faces two separate tests, and passing one says nothing about the other.
Internal validity — is the result true here?
Within the study itself, is the measured effect real — or an artefact of bias, confounding or chance? This is what all of M2 so far has been about: randomisation, blinding, the three enemies. Internal validity is truth inside the study.
External validity — is the result true there?
Does it carry beyond the study, to the patients, settings and conditions of the real world — your world? Also called generalisability. External validity is reach outside the study.
One asks, "Did they get the right answer?" The other asks, "Is it the answer to my question?" A trial can ace either and flunk the other.
Four kinds of study
Because they're independent, every study lands in one of four quadrants — not on a single scale from bad to good.
[Figure: a two-by-two grid of internal validity (high or low) against external validity (high or low), with one kind of study in each quadrant.]
- High internal, high external: a large, rigorous trial in realistic patients. Rare and precious: true and relevant.
- High internal, low external: the flawless trial from the opening example. Bulletproof, but in volunteers nothing like your patients. True here, useless there.
- Low internal, high external: a messy real-world study of exactly the right patients. Relevant, but maybe biased, so you can't fully trust the number.
- Low internal, low external: small, sloppy, unrepresentative. Tells you almost nothing.
Notice the trap quadrant: high internal, low external. It's the most dangerous, because the study looks authoritative — perfect methods, impressive numbers — and quietly answers a question you didn't ask.
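As a minimal sketch of the two-axis idea, the snippet below treats internal and external validity as two separate yes/no judgements and names the quadrant a study lands in. The `Study` record, the `quadrant` and `is_trap` helpers, and the example entry are all hypothetical, invented here for illustration. Real appraisal is a matter of degree, not a pair of booleans.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Study:
    """Hypothetical record: two independent yes/no judgements about a study."""
    name: str
    internally_valid: bool  # is the result true within the study?
    externally_valid: bool  # does it transfer to your patients?


def quadrant(study: Study) -> str:
    """Name the quadrant from the two judgements, each read on its own axis."""
    internal = "high internal" if study.internally_valid else "low internal"
    external = "high external" if study.externally_valid else "low external"
    return f"{internal}, {external}"


def is_trap(study: Study) -> bool:
    """The trap quadrant: sound methods, but not about your patients."""
    return study.internally_valid and not study.externally_valid


# The flawless trial from the opening example (values assumed for illustration).
hook_trial = Study("Flawless RCT in healthy 18-55 year olds", True, False)
print(quadrant(hook_trial))  # high internal, low external
print(is_trap(hook_trial))   # True
```

Keeping the two fields separate, instead of one overall quality score, is the point: a single score could not tell the trap quadrant apart from its mirror image.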
The built-in tension
Here's the uncomfortable part — and it explains why high-internal, low-external is so common.
The very things that make a trial internally strong tend to make it externally weak. To shut out the three enemies, you control everything: narrow entry criteria (no confounding comorbidities), ideal conditions, perfect adherence, expert centres. Each of those controls buys you internal validity — and each one pulls the trial further from the messy reality where your patients actually live.
There's even a name for the two ends of this dial:
- Explanatory trials sit at the controlled end — can this work, under ideal conditions? High internal validity, lower external.
- Pragmatic trials sit at the realistic end — does this work, in ordinary practice? Higher external validity, but harder to keep internally tight. (You'll meet these properly in M11.)
If this feels familiar, it should: this is exactly the efficacy-versus-effectiveness gap from M1, seen from the methods side. Efficacy is what a high-internal, controlled trial measures. Effectiveness is what external validity asks about. Same tension, named twice.
Five threats to generalisability
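When you ask "does this transfer to my patients?", five specific things are worth checking. Each is a way a true result can fail to reach the real world:
- Patients: are the people studied like yours in age, other illnesses and other medications?
- Comparator: was the drug tested against the alternative you would actually use?
- Conditions: were the setting and adherence close to ordinary practice, or ideal?
- Outcome: did the trial measure the outcome you care about?
- Time horizon: did it follow patients over the timescale you care about?
Run a study through these five and you'll know not just whether it generalises, but where it breaks.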
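One way to picture "where it breaks" is a checklist over the five threats this lesson names (patients, comparator, conditions, outcome, time horizon) that returns the ones a study fails. This is a rough sketch only: the `failed_threats` helper is hypothetical, and the answers given for the opening trial are assumed for illustration, apart from the patient mismatch the lesson describes.

```python
THREATS = ("patients", "comparator", "conditions", "outcome", "time horizon")


def failed_threats(answers: dict[str, bool]) -> list[str]:
    """Return the threats on which the study does NOT match your setting.

    `answers` maps a threat name to True (matches) or False (does not match).
    A missing answer counts as a failure, which is a cautious assumption.
    """
    return [threat for threat in THREATS if not answers.get(threat, False)]


# Hypothetical appraisal of the trial from the opening example.
hook_trial = {
    "patients": False,     # aged 18-55, no other illnesses; your patient is 79
    "comparator": True,    # assumed for illustration
    "conditions": True,    # assumed for illustration
    "outcome": True,       # assumed for illustration
    "time horizon": True,  # assumed for illustration
}

gaps = failed_threats(hook_trial)
print("generalises" if not gaps else f"breaks on: {', '.join(gaps)}")
# breaks on: patients
```

Returning the list of failures, not a single pass or fail, mirrors the lesson's point that knowing where a result breaks is more useful than knowing only that it does.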
Where this sits in M2
Pull the whole module together. You've been building two different skills without quite naming the split:
- Everything about the three enemies, randomisation and blinding was about internal validity — making sure the effect is real for the people studied.
- This lesson is about external validity — a separate audit of whether that real effect reaches the people you care about.
And here's the crucial part: randomisation buys you internal validity, not external. A coin toss makes the groups comparable; it does nothing to make the patients resemble yours. That's why even a perfect RCT can leave a generalisability gap wide open — and why a big, representative observational study sometimes tells you more about your patients than a narrow trial does. (Remember: a rung on the ladder is a starting presumption, not a verdict.)
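A small simulation can make the coin-toss point concrete. The sketch below invents 2000 trial patients aged 18 to 55, splits them at random into two arms, and compares the arms with each other and with a 79-year-old patient. All the numbers are made up for illustration, and age stands in for every other way your patient might differ.

```python
import random
from statistics import mean

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical trial population: 2000 patients aged 18-55 (both ends included).
ages = [rng.randint(18, 55) for _ in range(2000)]

# Randomise: shuffle, then split into two equal arms.
rng.shuffle(ages)
treatment, control = ages[:1000], ages[1000:]

your_patient_age = 79

# Internal validity: randomisation leaves the two arms with similar mean ages.
print(f"mean age, treatment arm: {mean(treatment):.1f}")
print(f"mean age, control arm:   {mean(control):.1f}")

# External validity: nobody in either arm is anywhere near your patient's age.
print(f"oldest patient in trial: {max(ages)}")
print(f"your patient:            {your_patient_age}")
```

The arms resemble each other because of the shuffle. Neither resembles your patient, because the entry criteria were fixed before any coin was tossed.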
Internal validity asks: did they get it right? External validity asks: right for whom? You need a yes to both before a result should move a decision.
Why this matters for HTA
This is the backbone of how you read evidence in a submission, in two moves:
First, the internal question: is this result believable at all? Randomised? Concealed? Blinded where it mattered? Free of the three enemies? If no — stop; the number can't be trusted.
Then, the external question: even if true, is it about us? Our patients, our comparator, our conditions, the outcomes and timescales we care about? This is where the "generalisability gap" becomes one of the most powerful, and most common, challenges an assessor raises — because manufacturers naturally test their drug under the conditions that flatter it most.
A submission's trial can be both unimpeachable and beside the point. Your job is to catch the studies that are perfectly true about the wrong patients, the wrong comparator, or the wrong outcome — and to say so.
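The order of the two moves matters, and a short sketch can show it: the external question is reached only once every internal check passes. The `appraise` function, its check names and its return strings are hypothetical, chosen to echo the wording above. They are not a formal appraisal tool.

```python
INTERNAL_CHECKS = (
    "randomised",
    "allocation concealed",
    "blinded where it mattered",
    "free of the three enemies",
)


def appraise(internal: dict[str, bool], about_us: bool) -> str:
    """Two-move reading of a trial: internal question first, then external."""
    # Move 1: any failed (or unanswered) internal check ends the appraisal.
    failed = [check for check in INTERNAL_CHECKS if not internal.get(check, False)]
    if failed:
        return "stop, number can't be trusted: " + ", ".join(failed)
    # Move 2: only reached when the result is believable.
    if not about_us:
        return "true, but not about us: raise the generalisability gap"
    return "true and about us"


# A trial that passes every internal check but studies the wrong patients.
sound = {check: True for check in INTERNAL_CHECKS}
print(appraise(sound, about_us=False))
# true, but not about us: raise the generalisability gap
```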
Internal vs external validity, in one breath
- Two separate questions: internal = true here (within the study); external = true there (out in your world, a.k.a. generalisability).
- They're independent axes, not one scale — a study can be strong on either and weak on the other.
- The trap quadrant is high-internal, low-external: authoritative-looking, quietly answering the wrong question.
- The controls that buy internal validity often cost external validity — the same efficacy-vs-effectiveness tension from M1.
- Check generalisability across five threats: patients, comparator, conditions, outcome, time horizon.
- Randomisation protects internal validity only — it never guarantees the result transfers.
Before a result can change a decision, it must pass both tests: is it true? — and is it true for us?
That completes the foundations of reading evidence. You can now judge a study's design, defend against the three enemies, and ask whether a result both holds and transfers. What you can't yet do is read the numbers a study reports — the effect sizes, the p-values, the confidence intervals. That's exactly where M3 begins: biostatistics, the language the evidence actually speaks.