M9 · UNCERTAINTY

From a range to a distribution.

Last lesson, the tornado gave each uncertain input two numbers: a low and a high. But think about what that leaves out. If a utility could plausibly run from 0.6 to 0.8, are all those values equally likely? Almost never. The middle is usually far more credible than the extremes — 0.7 is a better bet than either end. A low-and-high pair throws that away, treating a whole span of possibilities as two flat edges.

The honest way to describe an uncertain input isn't two endpoints — it's a distribution: a curve that says how likely each value is, peaked where the estimate is strongest, thinning out toward the improbable extremes. Give every input a distribution instead of a range, and a far more powerful question opens up: not "how far can the ICER swing if I push one lever," but "what does the ICER actually look like when every input is uncertain at once?"

Draw one sample, run the model.

Here's the engine, and it's beautifully simple. Instead of fixing inputs at low, high, or base case, you draw a random sample from each input's distribution — one value apiece, each pulled with a probability matching its curve. The utility comes out as, say, 0.71 this time; the survival gain as 0.76 QALYs; the drug cost as £19,410. One value from every distribution.

Feed that complete set of sampled inputs through the model and it produces one ΔCost and one ΔEffect, and therefore one ICER (ΔCost ÷ ΔEffect). That single run is one plausible version of the world: one coherent guess at what might be true, with every input landing somewhere sensible within its own uncertainty. It is not the base case and not an extreme, just one roll of the dice across the entire model at once.
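To make the single draw concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the three inputs, their distribution parameters, and the toy model (QALYs gained = life-years gained × utility) are hypothetical, not the model used in this course.

```python
import random

rng = random.Random(1)  # fixed seed so the sketch is repeatable

def draw_inputs(rng):
    """Draw one value from each input's distribution (all parameters are illustrative)."""
    return {
        "utility": rng.betavariate(70, 30),               # bounded to 0-1, mean 0.70
        "life_years_gained": rng.gammavariate(25, 0.04),  # positive, mean 1.0 years
        "drug_cost": rng.gammavariate(100, 200),          # positive, mean £20,000
    }

def run_model(inputs):
    """Toy model: incremental QALYs = life-years gained x utility; incremental cost = drug cost."""
    delta_effect = inputs["life_years_gained"] * inputs["utility"]
    delta_cost = inputs["drug_cost"]
    return delta_cost, delta_effect

inputs = draw_inputs(rng)
delta_cost, delta_effect = run_model(inputs)
icer = delta_cost / delta_effect

print(inputs)
print(f"dCost = £{delta_cost:,.0f}, dEffect = {delta_effect:.2f} QALYs, ICER = £{icer:,.0f}/QALY")
```

Change the seed and you get a different, equally plausible version of the world.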

Now do it a thousand times.

One roll tells you almost nothing. So you roll again — a fresh random draw from every distribution — and get a different ICER. And again. And again, thousands of times. This is Monte Carlo simulation: repeat the sample-and-run cycle until you have thousands of ICERs, each from its own independently drawn set of inputs.

The crucial difference from last lesson lives right here. In one-way analysis, exactly one input moved while all the rest sat frozen. In Monte Carlo, every input varies simultaneously in every single run — the utility and the survival gain and the cost, all drawn together, every time. That's what finally captures the realistic case one-way analysis was blind to: several inputs being unfavourable at once. Do this across a whole model of uncertain inputs and it's called probabilistic sensitivity analysis (PSA) — the standard way HTA quantifies how uncertain a cost-effectiveness result really is.
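The sample-and-run cycle is just a loop. The sketch below repeats one draw of every input 5,000 times, again using a hypothetical toy model and illustrative distribution parameters rather than anything from a real dossier. Note that all three inputs are redrawn inside every run, which is the difference from one-way analysis.

```python
import random

def simulate_once(rng):
    """One PSA run: draw every input together, then run a toy model (values illustrative)."""
    utility = rng.betavariate(70, 30)               # mean 0.70
    life_years_gained = rng.gammavariate(25, 0.04)  # mean 1.0 years
    drug_cost = rng.gammavariate(100, 200)          # mean £20,000
    delta_effect = life_years_gained * utility
    delta_cost = drug_cost
    return delta_cost, delta_effect

def run_psa(n_runs, seed=0):
    """Repeat the sample-and-run cycle n_runs times; returns (delta_cost, delta_effect) pairs."""
    rng = random.Random(seed)
    return [simulate_once(rng) for _ in range(n_runs)]

runs = run_psa(5_000)
mean_cost = sum(c for c, _ in runs) / len(runs)
mean_effect = sum(e for _, e in runs) / len(runs)
print(f"{len(runs)} runs, mean dCost = £{mean_cost:,.0f}, mean dEffect = {mean_effect:.2f} QALYs")
```

This sketch draws each input independently. Real PSAs sometimes need correlated draws, which this example does not attempt.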

The cloud on the plane.

Now, how do you look at thousands of ICERs? You put them back on familiar ground: the cost-effectiveness plane from Module 7. Every run produces a ΔEffect and a ΔCost — which is exactly a point on that plane, horizontal axis incremental QALYs, vertical axis incremental cost. Plot one run and you get one point. Plot all of them and you get a cloud.

That cloud is the result of a PSA, and you read it two ways:

Its spread is the uncertainty. A tight little cluster means the inputs, however individually uncertain, combine into a confident answer. A cloud smeared across the plane means the result is genuinely up in the air.
Its position relative to the threshold is the verdict. Lay the threshold line across the plane and the question becomes visual: how much of the cloud sits on the cost-effective side, and how much has spilled across to the wrong side?

A single base-case ICER was one point pretending to be the answer. The cloud shows the answer for what it is — a distribution with a location and a width, and it's the width that a point estimate always hides.
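Reading the cloud against the threshold comes down to one comparison per point: a run sits below the threshold line when its ΔCost is less than the threshold times its ΔEffect. Here is a small sketch, with five made-up points standing in for a cloud.

```python
THRESHOLD = 30_000  # £ per QALY, the threshold used in this lesson

def is_cost_effective(delta_cost, delta_effect, threshold=THRESHOLD):
    """True when the point lies below the threshold line: delta_cost < threshold * delta_effect."""
    return delta_cost < threshold * delta_effect

def share_cost_effective(points, threshold=THRESHOLD):
    """Proportion of (delta_cost, delta_effect) points on the cost-effective side."""
    hits = sum(is_cost_effective(c, e, threshold) for c, e in points)
    return hits / len(points)

# Hypothetical (delta_cost, delta_effect) points, for illustration only
cloud = [(18_000, 0.75), (21_000, 0.62), (19_500, 0.70), (24_000, 0.71), (16_000, 0.50)]

print(share_cost_effective(cloud))  # prints 0.4 (two of the five points are below the line)
```

The test compares ΔCost with threshold × ΔEffect rather than dividing to get an ICER, because the ratio becomes misleading for points where ΔEffect is zero or negative.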

Run the Monte Carlo.

Below is the cost-effectiveness plane, and a model whose incremental cost and QALYs are uncertain. Hit Run to draw samples and plot them — one point per simulated run. Build the cloud up, and use the uncertainty width slider to tighten or widen the input distributions. Watch the share of the cloud that lands below the threshold.

[Interactive chart: the cost-effectiveness plane. Legend: cost-effective run, not cost-effective run, base case, £30,000 threshold. Control: uncertainty width, shown at 1.0×. Readout: runs, mean ΔEffect, mean ΔCost, share below the £30,000 threshold.]

One point was never the answer. The cloud is — and notice what it exposes that the base case couldn't: even with a base-case ICER comfortably under £30,000, a real chunk of the cloud can sit on the wrong side of the line. That share — the proportion of runs that come out cost-effective — is the number decision-makers actually want. (For clarity we sample ΔCost and ΔEffect directly here; a real PSA samples the underlying parameters and lets the model produce ΔCost and ΔEffect — same idea, one layer deeper.)

Choosing the distributions.

A PSA is only as sound as the distributions fed into it, and choosing them isn't cosmetic — it's constrained by what each input is. The rule: the distribution must respect the natural limits of the parameter.

Probabilities and utilities live between 0 and 1, so they get a beta distribution, which is bounded to that interval. Draw a transition probability from a beta and you can never get 1.3 or −0.2 — values that would be nonsense.
Costs can't go below zero and are usually right-skewed (a long tail of expensive cases), so they get a gamma or log-normal distribution, both bounded at zero and skewed to match.

Why does this matter beyond tidiness? Put a normal distribution on a probability near zero and it will cheerfully sample negative probabilities — the model then computes with impossible inputs, and the cloud is quietly corrupted. Matching each distribution to its parameter's real bounds is what keeps every one of those thousands of runs a possible world rather than an arithmetical fiction. When you appraise a PSA, the distributional choices are exactly the kind of thing worth checking.
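You can watch the problem happen. The sketch below takes a hypothetical transition probability with mean 0.03 and standard deviation 0.02, and samples it once from a normal and once from a beta fitted to the same mean and standard deviation. The fitting uses the method of moments, which is one common approach and an assumption here, not something this lesson prescribes. The same trick gives gamma parameters for a hypothetical cost.

```python
import random

def beta_params(mean, sd):
    """Method-of-moments beta parameters (alpha, beta) for a quantity bounded to 0-1."""
    common = mean * (1 - mean) / sd**2 - 1
    if common <= 0:
        raise ValueError("sd is too large for a beta distribution with this mean")
    return mean * common, (1 - mean) * common

def gamma_params(mean, sd):
    """Method-of-moments gamma parameters (shape, scale) for a non-negative quantity."""
    return (mean / sd) ** 2, sd**2 / mean

rng = random.Random(42)
n = 10_000
p_mean, p_sd = 0.03, 0.02  # hypothetical probability close to zero

normal_draws = [rng.gauss(p_mean, p_sd) for _ in range(n)]
alpha, beta = beta_params(p_mean, p_sd)
beta_draws = [rng.betavariate(alpha, beta) for _ in range(n)]

shape, scale = gamma_params(20_000, 2_000)  # hypothetical cost: mean £20,000, sd £2,000
cost_draws = [rng.gammavariate(shape, scale) for _ in range(n)]

print(f"normal: {sum(x < 0 for x in normal_draws)} of {n} draws are negative probabilities")
print(f"beta:   min {min(beta_draws):.4f}, max {max(beta_draws):.4f}")
print(f"gamma:  lowest cost drawn is £{min(cost_draws):,.0f}")
```

With these numbers the normal puts zero only 1.5 standard deviations below the mean, so several hundred of the 10,000 draws come out negative. The beta and gamma draws cannot leave their bounds, whatever the seed.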

Now you.

A PSA runs 2,000 simulations. Plotted on the cost-effectiveness plane, 1,700 of the points fall below the £30,000 threshold line (the cost-effective side).

What percentage of simulations came out cost-effective at this threshold? (Enter a number.)
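If you want to check your answer, the calculation is a single division: the runs below the threshold over the total runs. As a short sketch using the numbers from the question:

```python
n_runs = 2000              # total simulations in the PSA
n_below_threshold = 1700   # points below the £30,000 threshold line

share = n_below_threshold / n_runs
print(f"{share:.0%}")  # prints "85%"
```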

Two things PSA is not.

PSA is powerful enough to be over-read, so pin down two things it does not tell you.

It is not heterogeneity. The cloud shows parameter uncertainty — how unsure we are about the true value of each input, because our evidence is limited. It does not show heterogeneity — the fact that real patients genuinely differ from one another (an older patient really does have different costs than a younger one). Parameter uncertainty shrinks as evidence improves; heterogeneity is a real feature of the world that better data only describes more precisely. They're different questions with different remedies — subgroups for heterogeneity, PSA for uncertainty — and conflating them is a classic error.

More simulations do not mean less uncertainty. Running 100,000 draws instead of 5,000 gives you a smoother, more stable cloud — a better estimate of the same uncertainty. It does not make the cloud narrower. The width is a property of the model's inputs, not of how many times you sampled them. Mistaking a smooth cloud for a confident one is mistaking precision of estimation for certainty of result.
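A quick experiment separates the two ideas. The sketch below draws a hypothetical ΔEffect (mean 0.70, standard deviation 0.15, both invented for illustration) 5,000 times and then 100,000 times. It reports the spread of the draws, which is the cloud's width, alongside the Monte Carlo error of the estimated mean, which is how precisely you have pinned down the cloud's centre.

```python
import random
import statistics

def cloud_summary(n_runs, seed):
    """Return (width of the cloud, Monte Carlo error of its mean) for n_runs draws."""
    rng = random.Random(seed)
    # Hypothetical incremental QALYs per run: mean 0.70, sd 0.15
    draws = [rng.gauss(0.70, 0.15) for _ in range(n_runs)]
    width = statistics.stdev(draws)    # spread of the cloud itself
    std_error = width / n_runs ** 0.5  # uncertainty about the cloud's mean
    return width, std_error

for n in (5_000, 100_000):
    width, std_error = cloud_summary(n, seed=7)
    print(f"{n:>7} runs: width = {width:.3f}, Monte Carlo error of mean = {std_error:.4f}")
```

The width stays close to 0.15 at both run counts, because it is set by the input distribution. Only the Monte Carlo error shrinks, by a factor of about √20 ≈ 4.5 for 20 times as many runs.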

And a boundary worth stating plainly: PSA propagates uncertainty in a model's parameters. It says nothing about whether the model itself is right — the structure, the extrapolation, the assumptions from Module 8. Run a PSA on a flawed model and you get a beautifully precise cloud around the wrong answer. That structural uncertainty needs a different tool, which is where the next-but-one lesson goes.

What's wrong with this reasoning?

A reviewer reads a PSA and concludes: "The cloud is wide, so the model must have been run with too few simulations — they should increase the number of Monte Carlo runs to narrow it." What's wrong with this reasoning?

Why this matters for HTA

PSA is where a modern dossier is supposed to be honest about what it doesn't know — and where a careful assessor separates real confidence from manufactured confidence.

Read the cloud's width, not just its centre. The headline is often a probability — "78% likely to be cost-effective at £30,000." A base-case ICER under the threshold paired with a cloud sprawling well across it is a result far shakier than the point estimate suggests. The gap between the two is exactly what PSA exists to reveal.
Interrogate the distributions like you interrogated the ranges. Same game as the tornado: narrow input distributions produce an artificially tight cloud and an inflated probability of cost-effectiveness. Check that distributions match parameter bounds (beta for 0–1, gamma/log-normal for costs) and that their spreads reflect the real state of the evidence, not the result the sponsor wants.
Don't let a precise cloud vouch for the model. PSA quantifies parameter uncertainty inside a given structure; it cannot tell you the structure is right. A gorgeous, tight cloud around an implausible extrapolation is precision in the service of a wrong answer. Structural doubt needs scenario analysis and validation, not more draws.
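The point about narrow distributions can be checked numerically. The sketch below samples ΔCost and ΔEffect directly, as the interactive chart earlier in this lesson does, from a hypothetical cloud centred at £19,000 and 0.70 QALYs (an ICER of about £27,100, under the threshold). A `width` multiplier scales both spreads, and all the numbers are assumptions chosen for illustration.

```python
import random

THRESHOLD = 30_000  # £ per QALY

def probability_cost_effective(width, n_runs=20_000, seed=3):
    """Share of runs below the threshold line when every spread is scaled by `width`."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_runs):
        # Hypothetical cloud: centre at £19,000 and 0.70 QALYs
        delta_cost = rng.gauss(19_000, 3_000 * width)
        delta_effect = rng.gauss(0.70, 0.10 * width)
        if delta_cost < THRESHOLD * delta_effect:
            hits += 1
    return hits / n_runs

for width in (0.5, 1.0, 2.0):
    share = probability_cost_effective(width)
    print(f"width {width}x: {share:.0%} of runs cost-effective")
```

The centre of the cloud never moves, yet the probability of cost-effectiveness rises as the distributions are narrowed. That holds here because the base case sits on the cost-effective side of the line, which is exactly the situation in which understated uncertainty flatters the result.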

A base-case ICER tells you where the answer is. A PSA tells you how much the answer might be somewhere else — and in a close decision, the second question matters more than the first.

Probabilistic sensitivity analysis, in one breath.

PSA replaces each input's low/high range with a probability distribution, then uses Monte Carlo to draw from all of them at once, run the model, and repeat thousands of times — so, unlike one-way analysis, every input varies jointly.
The output is a cloud of points on the cost-effectiveness plane: its spread is the uncertainty, its position against the threshold is the verdict. The share of the cloud on the cost-effective side is the probability of cost-effectiveness.
Distributions must respect parameter bounds — beta for probabilities and utilities (0–1), gamma/log-normal for costs (≥0, skewed).
Two cautions: PSA is parameter uncertainty, not heterogeneity; and more runs make the cloud smoother, not narrower — width comes from the evidence, and PSA can't vouch for the model's structure.

One-way analysis pulled each lever alone and asked how far the answer could move. PSA pulls every lever at once, thousands of times, and shows you the whole shape of what you don't know.

We ended on a single number — the share of the cloud below £30,000. But that share depends entirely on where we drew the threshold; slide the line and the percentage changes. So why compute it at just one threshold? Draw it for every threshold at once and you get one of the most useful curves in health economics — the cost-effectiveness acceptability curve. That's next.