M8 · ECONOMIC MODELLING

The trial ends. The patients don't.

Everything in the last module ran on two numbers: the extra cost and the extra health a technology delivers (ΔCost and ΔEffect), measured over a patient's whole remaining life. The ICER divided them. Net benefit subtracted them, once the health gain had been priced at the threshold. Both took those lifetime figures for granted.

But look at where they'd have to come from. A cost-effectiveness decision commits a health system for as long as patients live — often decades. A clinical trial follows patients for months, sometimes a year or two, occasionally a handful of years. It stops. The patients it enrolled keep living, keep incurring costs, keep gaining or losing health — off-camera, after the data ends.

So the lifetime numbers M7 relied on are never simply read off a trial. They're built. The thing that builds them is a model, and this module is about how — and about why the model, not the trial, is usually where the answer is really decided.

What a trial gives you — and what it doesn't.

A randomised trial is the best tool we have for one specific job: estimating a treatment effect, cleanly, by comparing like with like. But measure it against what a lifetime cost-effectiveness decision actually needs, and the gaps are everywhere:

It runs for a limited time. Follow-up ends at a data cut-off — 12 months, 24, maybe 60. The decision horizon is a lifetime.
It often measures the wrong thing. Trials frequently report surrogate endpoints — progression-free survival, tumour response, a lab value — because final outcomes like overall survival take too long to observe. Cost-effectiveness needs the final outcomes, and the QALYs that flow from them.
It studies a selected population. Trial patients are often younger, fitter, and more closely monitored than the patients seen in routine practice. Protocol-driven care isn't routine care.
It usually has one comparator. The trial's control arm may not be the treatment your decision is actually up against.

A trial, in other words, is a sharp, reliable snapshot. A decision needs a lifetime, in the real population, against the relevant comparator, in final outcomes. Something has to close that distance.

The model is the bridge.

That something is a health economic model: a structured set of assumptions and equations that takes the trial's evidence — plus everything else we know — and produces the lifetime costs and QALYs a decision requires.

Put plainly, a model does four jobs at once, each one bridging a gap the trial leaves open:

The time gap — it extrapolates beyond the follow-up to the full time horizon, projecting what happens after the data stops. (This is the big one, and the rest of the lesson is about it.)
The endpoint gap — it links surrogate measures to final outcomes: how a progression-free month today translates into survival and quality of life later.
The evidence gap — it synthesises many sources: the trial, general-population mortality from life tables, long-term costs, utility values, real-world adherence. No single study contains all of it.
The decision gap — it translates the trial's population, comparator, and setting into the ones the decision is actually about.

A model isn't a nuisance bolted onto a clean trial. It's the machine that turns a short, observed fact into a lifetime decision. And every one of those four jobs is done by assumption — which is exactly why the model deserves as much scrutiny as the trial, often more.

Extrapolation: the leap beyond the data.

Focus on the first and most consequential job: bridging the time gap.

Remember the Kaplan–Meier curve from Module 3 — the staircase that tracks the proportion of patients still alive over time. In a trial, that staircase is drawn from real events, but only up to the data cut-off. At the cut-off, some fraction of patients are still alive. The curve simply stops there, hanging in mid-air.

To compute lifetime QALYs, you can't leave it hanging. You need the whole curve, all the way down to zero — because eventually everyone dies, and the area under the survival curve is (roughly) the mean survival time that feeds your QALYs. So you have to take the curve from where the data ends and carry it the rest of the way yourself. That carrying-forward is extrapolation, and it is a genuine leap beyond the evidence: by definition, there are no data in the region you're drawing.
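The claim that the area under the survival curve is the mean survival time can be checked numerically. The sketch below assumes a purely hypothetical exponential curve with a hazard of 0.25 per year (so the true mean is 4 years); the function names and the trapezoid step size are illustrative choices, not part of the lesson.

```python
import math

RATE_PER_YEAR = 0.25  # hypothetical constant hazard; true mean survival = 1 / 0.25 = 4 years


def survival(t_years):
    """Proportion still alive at time t under the assumed exponential curve."""
    return math.exp(-RATE_PER_YEAR * t_years)


def area_under_curve(horizon_years, step_years=0.01):
    """Trapezoidal area under the survival curve from time 0 to the horizon."""
    n_steps = int(round(horizon_years / step_years))
    area = 0.0
    for i in range(n_steps):
        left = survival(i * step_years)
        right = survival((i + 1) * step_years)
        area += 0.5 * (left + right) * step_years
    return area


# Curve followed all the way down to (effectively) zero.
mean_lifetime = area_under_curve(horizon_years=100)

# Curve left hanging at a 2-year data cut-off.
mean_at_cutoff = area_under_curve(horizon_years=2)

print(f"Area to 100 years: {mean_lifetime:.3f} life-years")
print(f"Area to  2 years: {mean_at_cutoff:.3f} life-years")
```

Stopping the integration at the cut-off recovers well under half of the mean in this example, which is why the curve cannot be left hanging.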

Here's the uncomfortable part. The observed staircase constrains the curve only up to the cut-off. Past it, many different curves can continue on — some diving quickly to zero, some tailing off slowly over decades. They can look almost identical where the data are, and wildly different where they aren't. And it's the part where they differ — the unobserved tail — that determines the mean survival, the QALYs, and ultimately the verdict.
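A small sketch can make that divergence concrete. It assumes two hypothetical curves, an exponential and a Weibull with shape 0.8, both forced through the same point at a 2-year cut-off (60% alive). None of these numbers come from a real trial, and the curves are calibrated to one shared point rather than fitted to data.

```python
import math

CUTOFF_YEARS = 2.0
ALIVE_AT_CUTOFF = 0.60  # hypothetical proportion alive at the data cut-off

# Exponential: S(t) = exp(-rate * t), calibrated to pass through the cut-off point.
exp_rate = -math.log(ALIVE_AT_CUTOFF) / CUTOFF_YEARS

# Weibull: S(t) = exp(-(t / scale) ** shape). A shape below 1 means the hazard falls over time.
weibull_shape = 0.8  # hypothetical
weibull_scale = CUTOFF_YEARS / (-math.log(ALIVE_AT_CUTOFF)) ** (1 / weibull_shape)


def s_exponential(t):
    return math.exp(-exp_rate * t)


def s_weibull(t):
    return math.exp(-((t / weibull_scale) ** weibull_shape))


# Largest vertical gap between the two curves inside the observed window.
grid = [i * 0.01 for i in range(int(CUTOFF_YEARS * 100) + 1)]
max_gap_observed = max(abs(s_exponential(t) - s_weibull(t)) for t in grid)

# Closed-form mean survival, i.e. the area under each full curve.
mean_exponential = 1 / exp_rate
mean_weibull = weibull_scale * math.gamma(1 + 1 / weibull_shape)

print(f"Largest gap before cut-off: {max_gap_observed:.3f}")
print(f"Mean survival, exponential: {mean_exponential:.2f} life-years")
print(f"Mean survival, Weibull:     {mean_weibull:.2f} life-years")
```

Inside the observed window the two curves stay within a few percentage points of each other, yet their means differ by more than a year, and all of that difference comes from the region past the cut-off.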

The tail that decides the answer.

Below is a new cancer drug versus standard care. The trial followed patients for 24 months: that's the observed part, and it's the same whichever extrapolation you pick. Past the cut-off, choose how the survival curve continues, and watch the ICER.

[Figure: survival curves for standard care, new drug (observed), and new drug (extrapolated tail). Extrapolation shown: Middle.]

Mean survival — new drug: 4.5 life-years · standard care: 3.0 life-years

Incremental survival: 1.5 life-years

× utility (0.8) → incremental QALYs: 1.2

Incremental cost (held fixed): £24,000

ICER: £24,000 ÷ 1.2 = £20,000 per QALY

Verdict at £30,000/QALY: COST-EFFECTIVE

Read that again. The trial data never changed: the observed curve is identical under all three extrapolations. The only thing you moved was the tail, the part with no data in it. And that alone swung the ICER from £10,000 to £40,000, flipping the verdict. In this evaluation, the answer doesn't live in the trial. It lives in the assumption.

(Incremental cost is held constant here to isolate the survival extrapolation; in a real model longer survival changes costs too.)
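The arithmetic behind the three results can be sketched in a few lines. Only the middle scenario's mean survival (4.5 life-years) is stated in the lesson. The other two values (3.75 and 6.0) are back-calculated here from the quoted ICERs of £40,000 and £10,000, assuming the same fixed cost and utility, and the scenario labels are illustrative.

```python
STANDARD_CARE_LY = 3.0      # mean survival on standard care, life-years
UTILITY = 0.8               # quality-of-life weight applied to every life-year
INCREMENTAL_COST = 24_000   # £, held fixed across scenarios
THRESHOLD = 30_000          # £ per QALY


def icer(new_drug_ly):
    """Incremental cost per QALY for a given projected mean survival on the new drug."""
    incremental_qalys = (new_drug_ly - STANDARD_CARE_LY) * UTILITY
    return INCREMENTAL_COST / incremental_qalys


# Mean survival on the new drug under each tail.
# 3.75 and 6.0 are back-calculated assumptions, not figures given in the lesson.
scenarios = {"pessimistic": 3.75, "middle": 4.5, "optimistic": 6.0}

results = {name: icer(ly) for name, ly in scenarios.items()}

for name, value in results.items():
    verdict = "cost-effective" if value <= THRESHOLD else "not cost-effective"
    print(f"{name:>11}: £{value:,.0f} per QALY -> {verdict}")
```

Nothing in the function touches the trial data: the only input that varies is the projected mean survival, which is set entirely by the tail.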

Why the tail wins.

Why can an unobserved tail overpower a whole trial's worth of data? Three reasons, and they're worth holding onto:

The observed part is short; the tail is long. Mean survival is the area under the whole curve. If the trial captures two years and some patients live fifteen, then up to thirteen of those years lie in the extrapolated region, and with them often most of the area and most of the QALYs. The data you have is a thin slice of the number you need.
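How thin the slice is can be put in numbers. The sketch below borrows the running example's figures (24 months of follow-up, mean survival of 4.5 life-years) and assumes, purely for illustration, that the curve is exponential so the areas have a closed form.

```python
import math

CUTOFF_YEARS = 2.0     # trial follow-up
MEAN_SURVIVAL = 4.5    # projected mean survival in life-years (middle scenario)

# Assumption: an exponential curve, S(t) = exp(-t / mean), chosen only for tractability.
alive_at_cutoff = math.exp(-CUTOFF_YEARS / MEAN_SURVIVAL)

# Area under the curve up to the cut-off: the part the trial actually observed.
observed_area = MEAN_SURVIVAL * (1 - alive_at_cutoff)

# Everything past the cut-off is drawn by the modeller.
extrapolated_area = MEAN_SURVIVAL - observed_area
share_extrapolated = extrapolated_area / MEAN_SURVIVAL

print(f"Observed area:     {observed_area:.2f} life-years")
print(f"Extrapolated area: {extrapolated_area:.2f} life-years")
print(f"Share of mean survival that is extrapolated: {share_extrapolated:.0%}")
```

Under this assumption roughly two thirds of the mean survival sits past the cut-off. For an exponential curve the extrapolated share equals the proportion still alive at the cut-off, so the more patients survive the trial, the more of the answer is assumption.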

Statistical fit can't choose for you. Fit several standard survival curves — exponential, Weibull, log-logistic, Gompertz — to the observed data and they'll often score almost identically on goodness-of-fit. They agree on the part you can see, then fan out across the part you can't. So "we picked the best-fitting curve" is not an answer: fit to the observed data barely discriminates between extrapolations that imply completely different lifetimes.
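The point about goodness-of-fit can be tried on simulated data. The sketch below is a minimal illustration, not a recommended fitting routine: it invents a 300-patient trial with a 2-year cut-off, fits an exponential and a Weibull by maximum likelihood (using a crude grid search over the Weibull shape), and compares AIC with the mean survival each curve implies. All names and numbers are hypothetical.

```python
import math
import random

rng = random.Random(0)
CUTOFF = 2.0  # years of follow-up; everyone still alive is censored here

# Hypothetical trial: true hazard 0.25 per year, 300 patients.
true_times = [rng.expovariate(0.25) for _ in range(300)]
observed = [min(t, CUTOFF) for t in true_times]
died = [t <= CUTOFF for t in true_times]

n_events = sum(died)
sum_log_event_times = sum(math.log(t) for t, d in zip(observed, died) if d)


def profile_loglik(shape):
    """Weibull log-likelihood with cumulative hazard b * t**shape, with b at its MLE."""
    exposure = sum(t ** shape for t in observed)
    b_hat = n_events / exposure
    loglik = (n_events * (math.log(b_hat) + math.log(shape))
              + (shape - 1) * sum_log_event_times - n_events)
    return loglik, b_hat


# Exponential is the Weibull with shape fixed at 1 (one free parameter).
loglik_exp, rate_exp = profile_loglik(1.0)

# Weibull: grid search over shape from 0.50 to 2.00 (two free parameters).
candidates = []
for i in range(50, 201):
    shape = i / 100
    ll, b = profile_loglik(shape)
    candidates.append((ll, shape, b))
loglik_wei, shape_wei, b_wei = max(candidates)

aic_exp = 2 * 1 - 2 * loglik_exp
aic_wei = 2 * 2 - 2 * loglik_wei

# Mean survival implied by each fitted curve over a lifetime.
mean_exp = 1 / rate_exp
mean_wei = b_wei ** (-1 / shape_wei) * math.gamma(1 + 1 / shape_wei)

print(f"Exponential: AIC {aic_exp:.1f}, mean {mean_exp:.2f} life-years")
print(f"Weibull:     AIC {aic_wei:.1f}, mean {mean_wei:.2f} life-years (shape {shape_wei:.2f})")
```

Because the exponential is a special case of the Weibull, the Weibull can never fit worse here, and AIC can favour the exponential by at most 2 points. The two means are free to differ by far more than the fit statistics do, since the likelihood only sees events before the cut-off.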

Which means the choice must come from outside the trial. A defensible extrapolation is justified by external evidence: the biology of the disease, long-term registry data, similar treatments' known trajectories, and, as a hard ceiling, general-population mortality. A modelled cohort can't be allowed to outlive the general population: at no age should its projected mortality fall below the life-table rate for people of the same age, so the life tables cap the optimism. Extrapolation done well is an argument built from all of these. Done badly, it's whichever curve makes the ICER look best.
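One simple way to apply the life-table ceiling is to stop the extrapolated hazard from ever dropping below background mortality at the same age. The sketch below assumes a hypothetical optimistic disease hazard and a Gompertz-style stand-in for general-population mortality. The stand-in is not a real life table, and a real model would read age- and sex-specific rates from published tables.

```python
import math

START_AGE = 60  # hypothetical age of the cohort at treatment start


def disease_hazard(t_years):
    """Hypothetical extrapolated hazard that keeps falling (an optimistic long tail)."""
    return 0.30 * math.exp(-0.25 * t_years)


def general_population_hazard(age_years):
    """Gompertz-style stand-in for background mortality; NOT a real life table."""
    return 0.0001 * math.exp(0.085 * age_years)


def capped_hazard(t_years):
    """Extrapolated hazard, never allowed below background mortality at that age."""
    return max(disease_hazard(t_years), general_population_hazard(START_AGE + t_years))


def mean_survival(hazard, horizon_years=60.0, step_years=0.01):
    """Area under S(t), stepping the survival curve forward using the hazard."""
    alive, area = 1.0, 0.0
    n_steps = int(round(horizon_years / step_years))
    for i in range(n_steps):
        t_mid = (i + 0.5) * step_years
        alive_next = alive * math.exp(-hazard(t_mid) * step_years)
        area += 0.5 * (alive + alive_next) * step_years
        alive = alive_next
    return area


uncapped_mean = mean_survival(disease_hazard)
capped_mean = mean_survival(capped_hazard)

print(f"Mean survival without the cap: {uncapped_mean:.1f} life-years")
print(f"Mean survival with the cap:    {capped_mean:.1f} life-years")
```

Without the cap, this hazard decays so fast that a share of the cohort effectively never dies within the horizon. The cap removes that impossible tail and pulls the mean down accordingly.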

Now you.

A fourth extrapolation is proposed. It projects the new drug's mean survival at 5.0 life-years; standard care remains 3.0 life-years. Health is valued at a utility of 0.8.

How many incremental QALYs does this projection imply?

"All models are wrong, but some are useful."

The statistician George Box's line is the honest motto of health economic modelling. A model is not a prediction of what will happen — no one believes the exponential curve is literally true. So what is it for?

A model is a transparent, auditable argument. It lays out, in the open, every assumption required to get from the evidence we have to the decision we face — the structure, the extrapolation, the utilities, the costs — so that each one can be interrogated, challenged, and varied. Its worth isn't accuracy; it's explicitness. A good model doesn't hide the leap of faith. It shows you exactly where the leap is, so you can argue about it.

Two consequences follow, and they set up the rest of this module:

Structure should fit the decision, not impress. A one-off acute event might need nothing more than a simple decision tree. A chronic disease with patients moving between health states over years needs a Markov model. Complexity is a cost, justified only when the decision problem demands it — the next two lessons are about choosing.
The model is where the manufacturer's freedom lives. A trial is largely fixed and pre-registered. A model is built by the submitting company, with dozens of structural and parameter choices, most of them defensible-looking in isolation. That's why an assessor's real work isn't only "was the trial good?" — it's "are these assumptions honest, and what happens to the answer when I change the ones I doubt?"

A manufacturer's oncology submission follows patients for two years; at the cut-off, 40% are still alive. To reach lifetime QALYs, they extrapolate the survival curve with the most optimistic long-tail option, justifying it as "the curve with the best statistical fit to the trial data." What is the core problem an assessor should raise?

Why this matters for health technology assessment (HTA).

When a dossier lands on your desk, the trial is usually the part everyone has already scrutinised — peer-reviewed, published, pre-registered. The model is where the unexamined choices hide, and where your attention pays off most.

Find the extrapolation first. In most lifetime evaluations — oncology especially — the survival extrapolation is the single biggest driver of the ICER. Before anything else, locate it, and ask what justifies the chosen curve beyond "it fit the data."
Assumptions are arguments, so demand the evidence for them. Every structural and parametric choice is a claim about the world. A defensible model backs its extrapolation with biology, registries, and life-table limits; an indefensible one backs it with whichever curve flatters the result.
Judge the model by how its answer moves. Because the leap is unobserved, the right question is rarely "is this the true curve?" but "how much does the verdict change across the plausible curves?" A conclusion that survives only under the manufacturer's favourite extrapolation isn't a conclusion — it's a preference.
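Judging a model by how its answer moves can be as simple as a break-even calculation: find the mean survival at which the verdict flips, then ask whether the plausible curves sit on one side of it. The sketch below reuses the figures from the cancer-drug example. The list of plausible means is hypothetical, and net monetary benefit is computed in the usual way, as the threshold times incremental QALYs minus incremental cost.

```python
STANDARD_CARE_LY = 3.0      # mean survival on standard care, life-years
UTILITY = 0.8               # quality-of-life weight
INCREMENTAL_COST = 24_000   # £, held fixed as in the example
THRESHOLD = 30_000          # £ per QALY


def net_monetary_benefit(new_drug_ly):
    """Threshold-valued health gain minus extra cost; positive means cost-effective."""
    incremental_qalys = (new_drug_ly - STANDARD_CARE_LY) * UTILITY
    return THRESHOLD * incremental_qalys - INCREMENTAL_COST


# Mean survival on the new drug at which the ICER exactly equals the threshold.
break_even_ly = STANDARD_CARE_LY + INCREMENTAL_COST / (THRESHOLD * UTILITY)

# Hypothetical range of mean survivals implied by the plausible extrapolations.
plausible_means = [3.75, 4.25, 4.5, 5.0, 6.0]
verdicts = {ly: net_monetary_benefit(ly) >= 0 for ly in plausible_means}

print(f"Break-even mean survival: {break_even_ly:.2f} life-years")
for ly, ok in verdicts.items():
    print(f"  {ly:.2f} life-years -> {'cost-effective' if ok else 'not cost-effective'}")
```

If the break-even value falls inside the range of defensible extrapolations, as it does here, the conclusion depends on which curve is believed, and that is the question to put back to the manufacturer.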

The trial tells you what happened while someone was watching. The model tells you what happens after everyone looks away — and that, usually, is where the decision is really made.

Why a model, in one breath.

A cost-effectiveness decision needs lifetime costs and QALYs; a trial gives a short, surrogate, selected snapshot. The model bridges that gap.
A model does four jobs by assumption: extrapolate in time, link surrogate to final outcomes, synthesise multiple sources, translate to the decision's population and comparator.
Extrapolation is the leap that decides the answer. Many curves fit the observed data equally well but diverge in the unobserved tail, and the tail drives the mean survival, the QALYs, and the ICER. In the example, the same trial gave anywhere from £10,000 to £40,000 per QALY depending only on the curve chosen.
A model isn't a prediction; it's a transparent argument under explicit assumptions. Its structure should fit the decision — and it's where the manufacturer's degrees of freedom concentrate.

All models are wrong. The useful ones are the ones that show you exactly where, and by how much, they might be wrong.

We've established why we model and what the central danger is. The next question is how — and it starts with the simplest structure of all, the one that fits a single decision with a handful of possible outcomes: the decision tree.