M4 · EVIDENCE SYNTHESIS

Two models, one dataset

Five trials testing the same antihypertensive. They all show a benefit. But Study A shows +3 mmHg and Study D shows +11 mmHg. Are they measuring the same thing? Fixed-effect says yes. Random-effects says no. Same data — two answers. This lesson decides which assumption you should make, and what changes when you do.

Two claims about reality

Every meta-analysis carries a hidden assumption about why the studies differ. There are exactly two options.

Story one

One true effect

All studies estimate the same underlying truth. Their results scatter around it because of sampling chance alone — like five different labs measuring the same object. More data narrows the answer.

Story two

A distribution of true effects

The studies estimate genuinely different true effects — different populations, doses, follow-up lengths. Their results scatter because the biology differs, not just the luck. No amount of data collapses that spread to zero.

These two stories are not different analysis techniques. They are different claims about what is happening in the world. Choosing between them is a scientific decision, not a statistical one.

The fixed-effect model

The fixed-effect model commits to Story One. Its weight formula follows directly from that commitment.

Weight for study i

wᵢ = 1 / vᵢ

where vᵢ is the within-study variance (the square of the standard error)

A large, precise trial has a small vᵢ, so 1/vᵢ is large and it dominates the pool. A small trial contributes little. That hierarchy is deliberate: if there is one true effect, the biggest trial is the one most likely to land close to it.

Because the only source of uncertainty is within-study chance, pooling more patients keeps narrowing the confidence interval. The pooled estimate is a weighted average of the study effects; its variance is 1/W, where W = Σwᵢ = Σ(1/vᵢ) is the sum of the weights.
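The pooling step can be sketched in a few lines of Python. This is a minimal illustration, not a substitute for a meta-analysis package: the function name `fixed_effect_pool` and the two input studies are hypothetical, chosen only to make the arithmetic visible, and the interval uses the usual normal approximation (z = 1.96).

```python
import math

def fixed_effect_pool(effects, variances, z=1.96):
    """Inverse-variance (fixed-effect) pooling of study effect estimates."""
    weights = [1.0 / v for v in variances]        # w_i = 1 / v_i
    total_weight = sum(weights)                   # W = sum of the weights
    pooled = sum(w * y for w, y in zip(weights, effects)) / total_weight
    se = math.sqrt(1.0 / total_weight)            # pooled variance is 1 / W
    ci = (pooled - z * se, pooled + z * se)       # normal-approximation interval
    return pooled, se, ci

# Hypothetical inputs (not the lesson's five trials).
effects = [3.0, 5.0]      # observed effects, mmHg
variances = [1.0, 4.0]    # squared standard errors

pooled, se, ci = fixed_effect_pool(effects, variances)
print(f"pooled = {pooled:.2f} mmHg, SE = {se:.3f}")
print(f"95% CI = ({ci[0]:.2f}, {ci[1]:.2f})")
```

The pooled value sits much nearer the precise study than the imprecise one, and adding further studies can only increase W, so the interval can only get narrower.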

When the fixed-effect assumption holds, the interval is legitimately tight. When it doesn't, the tight interval is a lie.

The random-effects model

The random-effects model commits to Story Two. The true effects form a distribution, and the studies sample from it. That adds a second source of variance: τ² (tau-squared), the between-study variance.

Weight for study i

wᵢ = 1 / (vᵢ + τ²)

where τ² is estimated from the data, for example by the DerSimonian and Laird method or by restricted maximum likelihood

Adding τ² to every denominator shrinks every weight, but the large, precise studies lose proportionally more, so their share of the total falls and the share of the small studies rises. The hierarchy flattens. Weights become more equal across studies.

Two consequences follow:

Weights — the numbers

Take two studies: P with within-study variance v = 1, and Q with v = 4. Assume τ² = 4 (a plausible estimate for a heterogeneous literature).

            v     Fixed weight = 1/v     Random weight = 1/(v+τ²)
Study P     1     1.0                    0.20
Study Q     4     0.25                   0.125

Fixed weight ratio P : Q = 4 : 1 — the large precise trial dominates. Random weight ratio = 1.6 : 1 — still heavier, but far less so. τ² did the equalising.
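The same arithmetic can be checked in Python. The sketch below uses only the numbers given above (v = 1 and v = 4, with τ² = 4); the function name `inverse_variance_weights` is an illustrative choice. It also prints the pooled standard error under each model, using the standard result that the pooled variance is one over the sum of the weights.

```python
import math

def inverse_variance_weights(variances, tau2=0.0):
    """Weights 1 / (v + tau2); tau2 = 0 gives the fixed-effect weights."""
    return [1.0 / (v + tau2) for v in variances]

names = ["P", "Q"]
variances = [1.0, 4.0]   # within-study variances from the example
tau2 = 4.0               # between-study variance assumed in the example

fixed = inverse_variance_weights(variances)
random_fx = inverse_variance_weights(variances, tau2)

for name, wf, wr in zip(names, fixed, random_fx):
    print(f"Study {name}: fixed weight {wf:.3f}, random weight {wr:.3f}")

fixed_ratio = fixed[0] / fixed[1]
random_ratio = random_fx[0] / random_fx[1]
print(f"weight ratio P:Q  fixed {fixed_ratio:.1f}, random {random_ratio:.1f}")

# Pooled standard error is sqrt(1 / sum of weights) under either model.
fixed_se = math.sqrt(1.0 / sum(fixed))
random_se = math.sqrt(1.0 / sum(random_fx))
print(f"pooled SE  fixed {fixed_se:.3f}, random {random_se:.3f}")
```

With these inputs the ratios come out at 4 : 1 and 1.6 : 1, and the pooled standard error is larger under random effects because every weight has shrunk.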

In the random-effects model, a study's weight is one over which quantity?

Where does τ² itself come from? It is estimated from the dispersion of the observed effect sizes around their weighted mean — a computation the software handles. The key insight is what it represents: how much the true effects genuinely differ across studies.
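For readers who want to see what the software does, here is a minimal Python sketch of the DerSimonian and Laird moment estimator, one of the two methods named earlier. The function name and the three input studies are hypothetical and are not the lesson's trials; real analyses should rely on an established package.

```python
def dersimonian_laird_tau2(effects, variances):
    """Method-of-moments (DerSimonian-Laird) estimate of tau-squared.

    Returns (tau2, q), where q is Cochran's Q statistic.
    """
    k = len(effects)
    w = [1.0 / v for v in variances]                      # fixed-effect weights
    sum_w = sum(w)
    mean_fe = sum(wi * yi for wi, yi in zip(w, effects)) / sum_w
    # Q: weighted squared deviations of the observed effects from their weighted mean
    q = sum(wi * (yi - mean_fe) ** 2 for wi, yi in zip(w, effects))
    # Scaling constant that converts excess dispersion into a variance
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    # Excess of Q over its expected value under homogeneity (k - 1), truncated at zero
    tau2 = max(0.0, (q - (k - 1)) / c)
    return tau2, q

# Hypothetical inputs, chosen so the arithmetic is easy to follow by hand.
effects = [2.0, 4.0, 6.0]
variances = [1.0, 1.0, 1.0]

tau2, q = dersimonian_laird_tau2(effects, variances)
print(f"Q = {q:.1f}, tau2 = {tau2:.1f}")
```

Here Q is 8 against an expected 2 under homogeneity, and the excess is converted into τ² = 3. When the observed effects scatter no more than chance predicts, the estimate is truncated to zero and the random-effects weights collapse back to the fixed-effect ones.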

See it in the forest plot

Below are the same five trials, Study A (+3) through Study E (+4), pooled under each model. Compare how the squares and the diamond change between the two.

[Forest plot, fixed-effect view: Study A +3, Study B +5, Study C +9, Study D +11, Study E +4; axis ticks at 0, 4, 8, 12; pooled 5.1 (95% CI 4.0–6.3).]

Reading the plots

Look at the two diamonds you just saw, one from each model.

In which model is the pooled confidence interval wider?

Which model should you choose?

The choice depends on whether the "one true effect" story is scientifically credible. For each scenario below, pick the appropriate model.

Scenario 1

Four RCTs of the same beta-blocker at 25 mg/day in middle-aged adults with stage-1 hypertension. All trials run for 12 weeks, all measure 24-hour systolic BP by ambulatory monitoring, all report effects between +4 and +7 mmHg. Different sites, identical protocol — run under one coordinating centre.

Scenario 2

Five trials testing different vasodilators across three continents — doses ranging from 10 to 80 mg, follow-up from 4 weeks to 18 months, populations mixing young healthy adults, elderly patients and people with type-2 diabetes. Effects range from +2 to +14 mmHg.

Measuring heterogeneity: I² and τ²

Our five trials give τ² = 7.2 and I² = 79 %. What does that mean?

τ² = 7.2

The estimated variance of the distribution of true effects. Its square root, τ ≈ 2.7 mmHg, is the standard deviation of that distribution — a sense of how much the "true effect" varies from trial population to trial population.

I² = 79 %

The proportion of observed total variance that is due to true heterogeneity rather than chance. I² = τ²/(τ² + v̄), where v̄ is the average within-study variance. (This is a simplified form. The formal definition works through the Q statistic — I² = (Q − df) / Q, which the heterogeneity lesson unpacks in full — but it measures the same thing: the share of the visible disagreement that chance can't explain.) At 79 % most of the spread between trials reflects real differences, not sampling noise.
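Both forms of I² are easy to compute. The Python sketch below is illustrative only: the function names are hypothetical, and the "implied" values are back-calculated from the rounded figures quoted above (τ² = 7.2, I² = 79 %, five trials so df = 4), so they are rough and not results reported by the lesson.

```python
import math

def i_squared_from_q(q, df):
    """Formal definition: I2 = (Q - df) / Q, truncated at zero."""
    if q <= 0:
        return 0.0
    return max(0.0, (q - df) / q)

def i_squared_simplified(tau2, typical_v):
    """Simplified form: I2 = tau2 / (tau2 + v_bar)."""
    return tau2 / (tau2 + typical_v)

tau2, i2, df = 7.2, 0.79, 4      # figures quoted in the text; df = 5 trials - 1

tau = math.sqrt(tau2)            # SD of the distribution of true effects
implied_v = tau2 * (1 - i2) / i2 # v_bar that the simplified form would imply
implied_q = df / (1 - i2)        # Q that the formal definition would imply

print(f"tau = {tau:.2f} mmHg")
print(f"implied typical within-study variance = {implied_v:.2f}")
print(f"implied Q = {implied_q:.1f} on {df} df")
```

The point of the back-calculation is the comparison of sizes: a between-study variance of 7.2 set against a typical within-study variance of roughly 2 is what an I² near 79 % looks like.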

The informal benchmarks proposed by Higgins and colleagues (25 % = low, 50 % = moderate, 75 % = high) give a sense of scale, but their mechanical use is discouraged. What matters is whether the heterogeneity is scientifically explicable, and whether a pooled estimate is meaningful at all when I² is this high.

High I² does not always mean the meta-analysis is invalid. It means you should ask why the trials differ — which is often the most interesting question of all.

Why this matters for HTA

Model choice is not a footnote. It touches the evidence that reaches the decision table.

The confidence interval is not just a number. Its width is a statement about what the model believes.

Fixed vs random effects, in one breath

A narrow pooled interval is only reassuring if the model behind it was honest.

Everything hinged on one quantity we took as given rather than derived: τ², how much the true effects differ. In our five trials it was large (τ² = 7.2, I² = 79 %), meaning most of the spread between these trials is real, not chance. A later lesson makes that heterogeneity precise, working from the visible scatter through the Q statistic to I², and asks what to do when the pooled estimate itself stops being the right answer.