M11 · REAL-WORLD EVIDENCE
Where did all our numbers come from?
Step back over everything we've built. The survival curves we extrapolated in Module 8. The treatment effects that drove the ICER in Module 7. The uptake and outcomes behind the budget impact in Module 10. Nearly every number came from the same place: a randomised controlled trial.
And an RCT is a very particular kind of place. It enrols carefully chosen patients, often younger, fitter, and with fewer other illnesses than the people you'd actually treat. It has them take the drug exactly as directed, under close monitoring, for a fixed and usually short window. It is, by design, an idealised world, built that way for excellent reasons we'll come to. But a health system doesn't treat idealised patients in idealised conditions. It treats real ones, in the mess of routine practice, for years.
So a natural question hangs over the whole enterprise: what actually happens out there, in the real world? Answering it is a different kind of evidence — real-world evidence — and it's the subject of this module.
Two different questions: efficacy vs effectiveness.
The temptation is to think real-world evidence is just a lower-quality RCT — the same question, answered less rigorously. That's the crucial mistake. RWE answers a different question.
The distinction has two names, and getting them straight is the foundation of everything here:
- Efficacy — can this intervention work, under ideal, controlled conditions? Right patients, perfect adherence, close monitoring, a clean comparison. This is what an RCT measures.
- Effectiveness — does this intervention work, in real patients under routine care? Everyone who gets it, taking it imperfectly, with all their other illnesses, in ordinary practice. This is what real-world evidence measures.
"Can it work?" and "does it work?" are not the same question, and neither is a worse version of the other. A drug can have superb efficacy — a clear effect in the perfect trial — and disappointing effectiveness once it meets the real world. RWE exists to measure that second thing, which no amount of RCT rigour can reach, because the RCT deliberately built the real world out.
The efficacy-effectiveness gap.
Put the two side by side and you almost always find a gap — and the effect in practice is usually smaller than the effect in the trial. This is the efficacy-effectiveness gap, and it's one of the most important patterns in all of health technology assessment.
Take an oncology drug. In its pivotal trial — patients under 70, no serious comorbidities, good performance status, closely monitored — it delivers 9 extra months of progression-free survival. Impressive efficacy. Now look at it in routine use: patients in their 80s, with heart disease and diabetes, missing doses, taking treatment breaks for side effects. The real-world benefit comes out at 4 months. Same drug, roughly half the effect.
Why the shrinkage? Real patients are older, sicker, and more complicated than trial patients; they adhere less perfectly; they're monitored less closely; and they lack the protocol-driven support that props up outcomes in a trial. None of this is a failure of the drug or the trial — it's the difference between a controlled demonstration and everyday reality. And the size of that gap is itself valuable information: it tells a health system what it will really get, not what it could get in a laboratory. A cost-effectiveness analysis built on trial efficacy, when the real effect is half as large, is quietly overstating the value of everything downstream.
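To see how much that overstatement can matter, here is a minimal sketch of the arithmetic. The 9- and 4-month figures come from the oncology example above; the incremental cost is a made-up number, and dividing cost by months of progression-free survival is a deliberate simplification of a real ICER (which would use QALYs, and would let real-world costs differ too).

```python
# Effect sizes from the oncology example above (extra months of PFS).
trial_gain_months = 9.0
real_world_gain_months = 4.0

# Hypothetical incremental cost versus the comparator.
# This figure is an assumption for illustration only.
incremental_cost = 36_000.0


def cost_per_month_gained(cost: float, gain_months: float) -> float:
    """Incremental cost divided by incremental benefit (a simplified ICER-style ratio)."""
    return cost / gain_months


retained_fraction = real_world_gain_months / trial_gain_months
ratio_trial = cost_per_month_gained(incremental_cost, trial_gain_months)
ratio_real = cost_per_month_gained(incremental_cost, real_world_gain_months)

print(f"Share of trial effect retained in practice: {retained_fraction:.0%}")
print(f"Cost per month gained, trial efficacy:      {ratio_trial:,.0f}")
print(f"Cost per month gained, real-world effect:   {ratio_real:,.0f}")
print(f"Ratio inflates by a factor of:              {ratio_real / ratio_trial:.2f}")
```

Whatever cost you plug in, the inflation factor is just 9 / 4 = 2.25: if the cost stays the same and the benefit shrinks by more than half, the cost per unit of benefit more than doubles.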
The price of realism: losing randomisation.
So real-world evidence is more realistic than a trial. Why, then, isn't it simply the better evidence? Because realism has a price, and the price is steep.
Recall from Module 2 what randomisation does. By assigning patients to treatment or control by chance, an RCT makes the two groups alike on average in every respect, measured and unmeasured, except the treatment itself. That's what lets it claim the treatment caused the difference in outcome. Randomisation is the single mechanism that holds confounding at bay: the risk that some other difference between the groups, not the treatment, explains the result.
Real-world evidence has no randomisation. The patients who got the new drug in routine care chose it, or were chosen for it, for reasons — they were healthier, or sicker, or richer, or treated at better hospitals. Those reasons are tangled up with their outcomes, and nothing separates them out. So the moment you leave the trial's controlled world, confounding comes flooding back — the exact threat randomisation was invented to defeat. This is real-world evidence's central weakness, and untangling it — trying to recover some of what randomisation gave for free — is such a large problem that it gets its own lesson later in this module. For now, hold the trade-off: you buy realism by giving up your protection against confounding.
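A small simulation can make that trade-off tangible. What follows is a toy sketch, not a model of any real drug: the true effect, the "fit patient" confounder, the treatment probabilities, and the outcome formula are all invented for illustration. The only thing that changes between the two runs is how patients end up on the drug.

```python
import random
import statistics

TRUE_EFFECT = 4.0  # hypothetical true benefit of the drug, in months


def naive_difference(n: int, randomised: bool, seed: int = 0) -> float:
    """Return the raw difference in mean outcome, treated minus untreated."""
    rng = random.Random(seed)
    treated, untreated = [], []
    for _ in range(n):
        fit = rng.random() < 0.5  # unmeasured confounder: is the patient fit?
        if randomised:
            # Trial: a coin flip decides treatment and ignores fitness.
            gets_drug = rng.random() < 0.5
        else:
            # Routine care (assumed): fitter patients get the drug more often.
            gets_drug = rng.random() < (0.8 if fit else 0.2)
        # Outcome depends on fitness AND on treatment, plus noise.
        outcome = (
            12.0
            + (6.0 if fit else 0.0)
            + (TRUE_EFFECT if gets_drug else 0.0)
            + rng.gauss(0.0, 3.0)
        )
        (treated if gets_drug else untreated).append(outcome)
    return statistics.mean(treated) - statistics.mean(untreated)


rct_estimate = naive_difference(20_000, randomised=True)
rwe_estimate = naive_difference(20_000, randomised=False)

print(f"True effect:             {TRUE_EFFECT:.1f} months")
print(f"Randomised estimate:     {rct_estimate:.1f} months")
print(f"Non-randomised estimate: {rwe_estimate:.1f} months")
```

Under these assumed numbers the non-randomised comparison is pushed upward by about 6 × (0.8 − 0.2) = 3.6 months, because the treated group is mostly fit patients. Flip the assumption so that sicker patients get the drug, and the same mechanism would make the drug look worse than it is.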
The control–realism trade-off.
Slide between a randomised controlled trial and real-world evidence. Watch both panels move at once — what you gain as you move toward realism, and what you lose. There's no "better" end of this slider; there's only the trade.
What you gain
- Representative patients (the people you'll actually treat)
- Long-term, routine-practice outcomes
- The answer to "does it work?" (effectiveness)
What you keep
- Protection against confounding (randomisation)
- A clean causal claim
- The answer to "can it work?" (efficacy)
At this setting (RCT): measures EFFICACY · population idealised · confounding protection strong · causal claim clean
Drag it all the way to realism and you get exactly the evidence a health system craves — real patients, real outcomes, over real timeframes — while your protection against confounding drains away to nothing. Drag it back to control and confounding is defeated, but you're now measuring a drug in a world that doesn't exist. Every choice on this slider is a trade, never an upgrade. The whole art of real-world evidence is getting as much realism as you can while clawing back as much protection against confounding as possible — which is the story of the rest of this module.
Now you.
For each question, pick the right kind of study — and say whether confounding is a major threat to it.
1. "Under ideal, controlled conditions, does this drug lower blood pressure more than placebo?"
2. "In routine care, do patients on this drug actually live longer than those on the older one?"
3. "What are the 15-year outcomes for patients who started this therapy 15 years ago?"
4. "In a tightly controlled trial, does the drug shrink tumours more than the comparator?"
Three sources, three profiles.
"Real-world data" isn't one thing — it comes from several sources, each with a sharply different profile. The three you'll meet most:
- Disease and product registries — data collected on purpose, following a defined cohort of patients (everyone with a given disease, or everyone on a given drug). Strength: clinically deep, validated, purpose-built for research. Weakness: narrow, expensive, and often selective about who's included.
- Claims data — the billing records a payer generates to pay for care. Strength: enormous, near-complete on anything that was billed, spanning years. Weakness: clinically thin — it records what was charged for, in billing codes, not what happened to the patient, and it's collected to settle payments, not to answer research questions.
- Electronic health records (EHR) — the clinical notes and results from routine care. Strength: clinically rich — lab values, diagnoses, prescriptions, the actual detail of treatment. Weakness: messy, incomplete, and unstandardised — captured for delivering care, not for analysis, so definitions and completeness vary from clinician to clinician.
Notice the theme running through all three: registries aside, most real-world data was gathered for a purpose other than research, whether to bill or to treat. Using it for evidence means putting data to a use it was never collected for, and every quirk of why it was collected leaves fingerprints on what it can reliably tell you.
What RWE gives HTA that an RCT can't.
Set against a randomised trial, real-world evidence looks weaker on causation — so why does HTA increasingly demand it? Because it fills gaps an RCT structurally cannot, no matter how well run:
- Long-term outcomes. Trials run for months or a few years; RWE can follow patients for a decade or more — precisely the data that could validate (or demolish) the survival extrapolations from Module 8, where the unobserved tail drove the whole answer.
- The real population and real uptake. Who actually gets the drug, at what rate, with what adherence — the very assumptions the budget impact analysis of Module 10 rested on, now observed instead of guessed.
- Missing head-to-head comparisons. When no trial ever compared the two drugs a decision actually turns on, real-world data may be the only source of a direct comparison.
- Rare diseases. When a condition is too rare to run a large RCT, real-world data from registries may be the best evidence obtainable at all.
- Post-approval reality. What happens after a drug enters routine use — effectiveness, safety signals, whether the trial's promise held up.
Read that list against the modules behind us and the point lands: RWE isn't competing with the RCT for the same job. It's answering the questions the RCT left open — long horizons, real populations, routine practice — that the whole apparatus of HTA quietly depends on.
What's the flaw?
A manufacturer submits real-world evidence from a database of 500,000 patients, arguing: "This is a far larger sample than the 600-patient pivotal trial, so it provides stronger proof that the drug causes better survival." What's the flaw?
Why this matters for HTA
Real-world evidence now sits in almost every major submission, and reading it well starts with refusing to slot it onto the same scale as a trial.
- Always ask which question the evidence answers. Efficacy (can it work?) and effectiveness (does it work?) are different, and a submission should be clear about which it's claiming. Trial efficacy quoted as if it were real-world effectiveness — with no account of the gap — quietly inflates every downstream number, from the ICER to the budget impact.
- Never let sample size stand in for causal validity. A real-world study's size is the first thing manufacturers highlight and the least relevant to whether it proves causation. Without randomisation, the question is always confounding — and a bigger database doesn't answer it, it just makes the bias precise.
- Match the source to the claim. Registries, claims, and EHR have different strengths and different blind spots, and each was mostly built for a purpose other than research. A claim about clinical outcomes drawn from billing codes, or about a whole population drawn from a selective registry, is using data past the edge of what it can support.
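The sample-size point is easy to check for yourself. The sketch below is a toy simulation under invented assumptions: a drug with no effect at all, an unmeasured "fit patient" confounder, and made-up treatment probabilities. It runs the same confounded comparison at the two sample sizes from the manufacturer's argument, 600 and 500,000.

```python
import math
import random
from typing import List, Tuple

TRUE_EFFECT = 0.0  # hypothetical: the drug does nothing at all


def mean_and_variance(xs: List[float]) -> Tuple[float, float]:
    """Sample mean and unbiased sample variance."""
    n = len(xs)
    m = math.fsum(xs) / n
    var = math.fsum((x - m) ** 2 for x in xs) / (n - 1)
    return m, var


def observational_estimate(n: int, seed: int = 1) -> Tuple[float, float]:
    """Naive treated-minus-untreated difference in means and its standard error."""
    rng = random.Random(seed)
    treated, untreated = [], []
    for _ in range(n):
        fit = rng.random() < 0.5                          # unmeasured confounder
        gets_drug = rng.random() < (0.7 if fit else 0.3)  # assumed prescribing pattern
        outcome = (
            12.0
            + (6.0 if fit else 0.0)
            + (TRUE_EFFECT if gets_drug else 0.0)
            + rng.gauss(0.0, 3.0)
        )
        (treated if gets_drug else untreated).append(outcome)
    m1, v1 = mean_and_variance(treated)
    m0, v0 = mean_and_variance(untreated)
    return m1 - m0, math.sqrt(v1 / len(treated) + v0 / len(untreated))


results = {n: observational_estimate(n) for n in (600, 500_000)}
for n, (diff, se) in results.items():
    low, high = diff - 1.96 * se, diff + 1.96 * se
    print(f"n={n:>7,}: estimate {diff:5.2f}, 95% CI ({low:5.2f}, {high:5.2f}), "
          f"true effect {TRUE_EFFECT:.1f}")
```

Under these assumed numbers the estimate settles near 6 × (0.7 − 0.3) = 2.4 months for a drug that does nothing. The bigger database doesn't move that figure toward zero; it only tightens the interval around the wrong answer.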
An RCT tells you what a drug can do to a patient who doesn't exist. Real-world evidence tells you what it's doing to the patients who do — and the entire discipline of the field is learning to trust that second answer without forgetting why it's harder to trust.
Real-world evidence: sources and why, in one breath.
- RWE answers a different question from an RCT, not a lower-quality version of the same one. RCTs measure efficacy ("can it work?" in ideal, controlled conditions); RWE measures effectiveness ("does it work?" in routine practice with real patients).
- The efficacy-effectiveness gap is real and usually shrinks the effect: real patients are older, sicker, and less adherent than trial patients, so practice delivers less than the trial promised — and that gap is information, not error.
- Realism has a price: RWE has no randomisation, so confounding — the threat randomisation defeats — comes flooding back. Sample size never fixes this; a huge database just makes a confounded estimate precise. (Clawing back causal validity is the next-but-one lesson.)
- Its sources differ sharply: registries (deep, validated, narrow), claims (vast, but billing not clinical), EHR (clinically rich, but messy and built for care not research) — most gathered for a purpose other than research.
You can have the clean causal answer or the realistic one, rarely both at full strength. RCTs choose control; real-world evidence chooses reality, and then spends all its ingenuity trying to win back the rigour that choice gave away.
If the two ends of that slider force a trade, an obvious question follows: can you build a study that sits in the middle — the structure and randomisation of a trial, but run in the messy conditions of real practice? You can, and it's called a pragmatic trial. That's next.