M11 · REAL-WORLD EVIDENCE

The wrong question about RWE.

Three lessons have taken real-world evidence apart. We've seen it answers a different question than a trial (effectiveness, not efficacy), that it loses randomisation's protection, and that no amount of clever adjustment can remove the unmeasured confounding that this leaves behind. A reasonable person, having absorbed all that, might conclude: don't trust real-world evidence.

That conclusion is a trap — because it's built on the wrong question. "Should we trust RWE?" treats trustworthiness as a fixed property of the evidence, like a grade stamped on it. It isn't. Everything we criticised in the last lesson — confounding by indication, residual bias — attaches to one particular job: proving that one treatment works better than another. But that is only one of the many questions HTA needs answered. Ask a different question, and those weaknesses may not apply at all.

So the real question isn't "is this RWE any good?" It's "what is this RWE for — and does its known weakness even matter for that task?" This lesson is about answering that question well, and about the powerful thing RWE does once you do: it turns "too uncertain to decide" into "decide, and keep learning."

Where RWE is weak: the treatment-effect question.

Let's be precise about the weakness, so we can then set it aside where it doesn't apply.

RWE is at its weakest on the comparative treatment-effect question: does drug A produce better outcomes than drug B (or than no treatment)? This is exactly where last lesson's villain lives. In observational data, patients weren't randomised to A or B — they were steered there for reasons tied to prognosis, so confounding by indication contaminates the comparison. Adjustment and propensity scores help with the measured part; residual confounding remains, invisible and irremovable. So when a submission uses RWE to claim "our drug beats the comparator," that claim carries all the fragility of the last lesson, and demands every caution we discussed — sensitivity analysis, negative controls, triangulation with trials.
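To make the mechanism concrete, here is a minimal simulation sketch. Everything in it is an illustrative assumption, not data from any study: half the patients are severe, severe patients are steered to drug A 80% of the time, survival is 12 months if severe and 24 if not, and drug A has no true effect at all.

```python
import random


def simulate_confounding(n=20000, true_effect=0.0, seed=1):
    """Observational comparison where sicker patients are steered to drug A."""
    rng = random.Random(seed)
    # outcomes[(severe, drug)] collects survival months for each cell
    outcomes = {(s, d): [] for s in (True, False) for d in ("A", "B")}
    for _ in range(n):
        severe = rng.random() < 0.5
        # Confounding by indication: prescribing depends on prognosis
        p_drug_a = 0.8 if severe else 0.2
        drug = "A" if rng.random() < p_drug_a else "B"
        baseline = 12.0 if severe else 24.0
        effect = true_effect if drug == "A" else 0.0
        outcomes[(severe, drug)].append(baseline + effect + rng.gauss(0.0, 3.0))

    def mean(xs):
        return sum(xs) / len(xs)

    # Naive comparison: all A patients versus all B patients
    all_a = outcomes[(True, "A")] + outcomes[(False, "A")]
    all_b = outcomes[(True, "B")] + outcomes[(False, "B")]
    naive = mean(all_a) - mean(all_b)

    # Adjusted comparison: compare within severity strata, weight by stratum size
    adjusted = 0.0
    for severe in (True, False):
        a, b = outcomes[(severe, "A")], outcomes[(severe, "B")]
        adjusted += (len(a) + len(b)) / n * (mean(a) - mean(b))
    return naive, adjusted


naive, adjusted = simulate_confounding()
print(f"naive A-B difference:  {naive:+.2f} months")
print(f"severity-adjusted A-B: {adjusted:+.2f} months")
```

Under these assumptions the naive comparison makes drug A look about seven months worse even though its true effect is zero, while stratifying on severity brings the estimate back close to zero. The adjustment only works here because severity was recorded. Had it been unmeasured, the naive figure would be all the analyst could see, which is exactly the residual confounding problem.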

This is real, and it's why the randomised trial remains the gold standard for causal claims about treatment effects. If the treatment effect is the question, RWE is the junior partner, admitted only with heavy caveats.

But hold that phrase — for treatment effects. Because the moment you stop asking a comparative-effect question, the whole objection evaporates.

Where RWE is strong (often the only source).

Look at what HTA actually needs to build an assessment, and notice how much of it isn't a treatment-effect comparison at all:

Natural history of the disease — how do untreated patients fare, and how does the condition progress? A purely descriptive question. There's no treatment comparison to confound.
Long-term survival — what happens to patients 10 or 15 years out? Trials run for months to a few years; only real-world data can see the long horizon. This is precisely the data that could validate the extrapolations from Module 8, where the unobserved survival tail drove the entire ICER.
Epidemiology and population size — how many patients have the disease, and how many are eligible? The core input to the budget impact analysis of Module 10, and a question no RCT is designed to answer.
Real-world costs and treatment patterns — what does routine care actually cost, and how is the disease managed in practice? The cost inputs to CEA and BIA, observable only in the real world.
Safety and rare adverse events — signals too infrequent to catch in a trial's limited sample, visible only across large real-world populations over time.
External control arms — for a disease too rare to run a controlled trial, real-world or historical data on untreated patients may be the only possible comparator.

See the common thread: most of these are descriptive or epidemiological, not causal comparisons of treatments. Confounding by indication — which needs two treatment arms to contaminate — simply doesn't bite on "how many patients have the disease?" or "how long do untreated patients survive?" For these questions, RWE isn't the junior partner. It's the best evidence available — and frequently the only evidence that exists.

The right question: match the role to the task.

Put the last two sections together and the principle falls out. The strength of a piece of real-world evidence is not a property of the data; it's a property of the data-question pair. The same database is weak for one job and gold for another.

A claims dataset can't credibly tell you drug A beats drug B (confounding), but it can tell you exactly how many patients were treated, what it cost, and how long they survived — brilliantly. A disease registry can't randomise anyone, but it may hold the only reliable picture of the natural history of a rare condition anywhere in the world. So the assessor's question is never the blunt "do we trust this RWE?" It's the precise one:

What role is this RWE playing in the argument — and does its weakness matter for that role?

RWE used to establish a treatment effect: weakness matters enormously, apply full scrutiny. The same RWE used to estimate the eligible population, or long-term survival, or real-world cost: weakness largely irrelevant, and it may be the best source there is. One dataset, many possible roles, and the verdict changes with each. Reading RWE well means asking what job it's doing before asking how much to trust it.
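The data-question pair can be sketched as a small lookup. This is only an illustration of the principle: the role names, the `assess_role` function, and the `rct_feasible` flag are hypothetical labels introduced here, and the three verdicts are the ones this lesson uses.

```python
# Roles where RWE describes the world rather than comparing treatments
DESCRIPTIVE_ROLES = {
    "natural_history", "epidemiology", "long_term_survival",
    "real_world_cost", "safety",
}
# Roles where RWE is asked to support a treatment comparison
COMPARATIVE_ROLES = {"treatment_effect", "external_control"}


def assess_role(role, rct_feasible=True):
    """Return the lesson's verdict for RWE playing a given role."""
    if role in DESCRIPTIVE_ROLES:
        return "strong"
    if role in COMPARATIVE_ROLES:
        # Confounding applies; RWE is acceptable only if no trial is possible
        return "weak" if rct_feasible else "only option (with caution)"
    raise ValueError(f"unknown role: {role}")


# One registry, three roles in the same rare-disease dossier
for role in ("epidemiology", "real_world_cost", "external_control"):
    print(f"{role:18s} -> {assess_role(role, rct_feasible=False)}")
```

The point of the sketch is that the function never takes the dataset as an argument. The verdict depends on the role and on whether a trial was possible, so the same registry earns "strong" twice and "only option (with caution)" once.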

Assign RWE to the job.

Here are six questions a real HTA has to answer. For each, reveal how strong real-world evidence is for that specific question — and why. Watch for the pattern in what makes RWE weak versus strong.

1. Does drug A extend survival more than drug B?

2. How long do untreated patients with this disease survive? (natural history)

3. How many patients in the country are eligible? (epidemiology)

4. What is 10-year survival for patients on this therapy? (long-term)

5. What does real-world care actually cost per patient?

6. In a disease too rare for any RCT, how does treated survival compare to historical untreated patients? (external control)


There's the pattern. Out of six real HTA questions, RWE is weak on exactly one: the head-to-head treatment-effect comparison, where confounding by indication lives. On the other five (natural history, epidemiology, long-term survival, real costs, rare-disease external control) RWE ranges from strong to the only evidence in existence. The external control is the one to handle with care: it is still a comparison, so it inherits the confounding risk and earns its place only because no trial is possible. So a submission leaning on RWE isn't automatically weak; it depends entirely on which question the RWE is answering. Judge the role, never the data alone.

Now you.

For each question, how strong is real-world evidence — strong, weak, or the only option (with caution)?

1. "Which of two active drugs works better?"

2. "How many people in the population have the disease?"

3. "What is the untreated natural course of this disease?"

4. "What does treating a patient actually cost in routine practice?"

5. "In an ultra-rare disease where no RCT is feasible, does the drug beat historical controls?"

6. "Does adding drug X to standard care improve outcomes versus standard care alone?"

RWE as the engine of conditional decisions.

Now the payoff that ties three modules together. Real-world evidence doesn't just fill gaps in a static assessment — it enables a fundamentally different kind of decision.

Recall two earlier threads. Module 9 showed that when a decision is uncertain, information has value — sometimes it's worth paying to reduce the uncertainty before committing (the value of information). Module 10 showed that a payer's decision needn't be a blunt yes/no — it can be a conditional deal, a managed entry. Real-world evidence is the missing piece that makes those ideas operational: it's the mechanism that supplies the information, in routine practice, after a conditional yes.

The result is coverage with evidence development (also called "only in research" or managed access): instead of rejecting a promising-but-uncertain technology, or approving it blindly, the payer says "yes, provisionally — and you will collect real-world evidence as we go." Concretely: a promising rare-disease drug, with an encouraging but uncertain effect and a high value of information, enters through a managed-entry agreement with a mandatory registry. For three years, every treated patient's outcomes are recorded. Then the decision is revisited: if the RWE confirms the benefit, it converts to full reimbursement; if it doesn't, the price is renegotiated or the drug withdrawn.

Look at what that loop does. It closes the circle from Module 9 to Module 11: uncertainty (M9) → a conditional deal to buy time and information (M10) → real-world evidence that actually resolves it (M11). The decision stops being a single shot in the dark and becomes a process of learning — commit provisionally, watch what happens in the real world, then confirm or reverse. RWE is the engine that makes "decide and keep learning" possible.
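The review step of that loop can be written down as a rule fixed in advance. The sketch below is a hypothetical illustration: the `ReviewRule` fields, the thresholds, and the "inconclusive" outcome are assumptions added here, and `observed_benefit` stands for whatever outcome the agreement names (for example, the proportion of registry patients alive at three years).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReviewRule:
    """Pre-specified terms of a hypothetical managed-entry agreement."""
    confirm_at: float   # observed benefit at or above this confirms the decision
    min_patients: int   # registry size needed before the evidence counts


def review(rule: ReviewRule, observed_benefit: float, n_patients: int) -> str:
    """Apply the pre-agreed rule to the registry data at the review date."""
    if n_patients < rule.min_patients:
        return "inconclusive: keep collecting"
    if observed_benefit >= rule.confirm_at:
        return "convert to full reimbursement"
    return "renegotiate price or withdraw"


rule = ReviewRule(confirm_at=0.40, min_patients=150)
for benefit, n in [(0.46, 210), (0.31, 210), (0.46, 60)]:
    print(f"benefit={benefit:.2f}, n={n}: {review(rule, benefit, n)}")
```

Writing the rule before the data arrive is the design choice that matters: both parties know at the outset which registry results lead to which outcome.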

The catches (why "conditional" often isn't).

That loop is elegant on a slide. In practice it's harder, and a clear-eyed assessor knows where it strains:

The evidence is slow and messy. Real-world data accumulates at the speed of routine care, often incompletely. The registry that was supposed to resolve the uncertainty in three years may still be ambiguous in five — and meanwhile the drug is in use.
Withdrawal is politically brutal — the ratchet problem. Once patients are receiving a drug and clinicians are prescribing it, taking it away because the RWE disappointed is enormously hard. "Conditional" approval has a way of becoming permanent regardless of what the evidence shows. The condition is easy to impose and hard to enforce.
Not all uncertainty is worth resolving. From Module 9: collecting RWE is only worthwhile if it would reduce decision-relevant uncertainty — if the answer could actually change the decision. A registry that produces tidy data nobody acts on is expense without value. Tie the RWE requirement to the questions whose answers matter.

None of this is a reason to abandon conditional decisions — they're often the right call when a technology is promising but the evidence is thin. It's a reason to design them honestly: specify in advance what evidence would confirm or reverse the decision, make the reversal credible, and only demand the data whose answer could move the outcome. Otherwise "managed entry" is just approval wearing a lab coat.
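The third catch, that evidence is only worth collecting if it could change the decision, can be illustrated with the standard expected value of perfect information (EVPI). This is a minimal sketch with invented net-benefit figures in arbitrary units; Module 9's own formulation may differ in detail.

```python
def evpi(probabilities, net_benefit):
    """Expected value of perfect information for a discrete decision problem.

    probabilities: probability of each state of the world
    net_benefit:   {decision: [net benefit in each state]}
    """
    # Value of deciding now: pick the decision with the best expected net benefit
    expected = {
        decision: sum(p * nb for p, nb in zip(probabilities, row))
        for decision, row in net_benefit.items()
    }
    value_now = max(expected.values())
    # Value with perfect information: pick the best decision in each state
    value_perfect = sum(
        p * max(row[state] for row in net_benefit.values())
        for state, p in enumerate(probabilities)
    )
    return value_perfect - value_now


p = [0.6, 0.4]  # hypothetical: drug works / drug does not work

# Case 1: the right decision differs by state, so evidence could change it
risky = {"reimburse": [100.0, -50.0], "reject": [0.0, 0.0]}
# Case 2: reimbursing is best in both states, so evidence changes nothing
safe = {"reimburse": [100.0, 10.0], "reject": [0.0, 0.0]}

print(f"EVPI, decision could flip: {evpi(p, risky):.1f}")
print(f"EVPI, decision is robust:  {evpi(p, safe):.1f}")
```

In the first case, knowing the true state is worth 20 units, because it would stop the payer reimbursing a drug that does not work. In the second it is worth nothing: reimbursing is the best choice whatever the registry shows, so a mandatory registry there would be expense without value.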

What's the soundest assessment?

A manufacturer submits a single-arm study of a new drug for a very rare cancer (too rare for a randomised trial), using real-world data on historical untreated patients as the comparison. They also provide registry data on how many patients have the disease and what current care costs. A reviewer must weigh the real-world evidence. What's the soundest assessment?

Why this matters for HTA

Reading RWE well is one of the defining skills of modern HTA, precisely because real-world evidence now appears in almost every submission, doing many different jobs at once:

Judge each use of RWE by its role, not by a blanket verdict. The first move on any real-world evidence is to ask what question it's answering. Descriptive uses — epidemiology, natural history, cost, long-term outcomes — are RWE's strength and deserve to be accepted where sound. Comparative treatment-effect uses inherit all of the confounding problems and demand the full scrutiny of the causal-inference lesson. The same dataset can be both in one dossier.
Use RWE to validate the model, not just to make claims. Some of RWE's highest value is quiet: real-world long-term survival that tests whether Module 8's extrapolation was fantasy, real-world uptake and cost that check whether Module 10's budget impact was honest. An assessor can ask for RWE specifically to validate the assumptions a model rests on.
Treat conditional approvals as commitments, not gestures. When RWE underpins a coverage-with-evidence decision, insist the design is real: pre-specified questions whose answers could change the decision, a credible path to reversal, and a focus on the uncertainty that actually matters. A managed-entry agreement that can never be unwound, gathering data no one will act on, is the illusion of learning rather than the substance.

Real-world evidence isn't strong or weak in the abstract — it's strong or weak for a job. Ask what job it's doing before you ask whether to trust it, and the same messy data becomes either a liability or the only window you have onto the world the decision actually lives in.

Real-world evidence in HTA decisions, in one breath.

The right question isn't "should we trust RWE?" but "what is this RWE for, and does its weakness matter for that role?" Trustworthiness is a property of the data-question pair, not the data alone.
RWE is weak on comparative treatment effects (confounding by indication); there the RCT rules. But it's strong, often the only source, for descriptive questions: natural history, long-term survival (validating Module 8's extrapolations), epidemiology and population size (Module 10's inputs), real-world costs, and safety. Rare-disease external controls are the special case: still a comparison, but sometimes the only one possible, so used with caution.
RWE is the engine of conditional decisions: it operationalises Module 9's value of information and Module 10's managed entry. Coverage with evidence development turns "too uncertain to decide" into "approve provisionally, collect RWE, then confirm or reverse" — a loop of learning, not a shot in the dark.
But the loop has catches: real-world data is slow and messy, withdrawal is politically hard (the ratchet), and RWE is only worth collecting if it would resolve decision-relevant uncertainty. Design conditional deals honestly, or they're approval in disguise.

An RCT tells you whether a drug can work; the rest of what a decision needs — who has the disease, what it costs, how long people really live, whether the promise held up — lives in the real world. The craft is knowing which questions to bring to which evidence, and turning uncertainty into something a health system can learn its way out of.

That closes Module 11 — and with it, the evidence and analysis half of HTA is complete: you can appraise a trial, synthesise the literature, value health, model it, quantify its uncertainty, judge affordability, and read real-world evidence for what it's worth. What remains is the world these analyses actually live in: the agencies that run them, the laws that bind them, the reimbursement processes that turn an assessment into a decision. Who are NICE, IQWiG, AOTMiT? What is the EU's new HTA Regulation? How does the money actually get approved? That's the regulatory and reimbursement context of Module 12.