M4 · EVIDENCE SYNTHESIS

The average that isn't an average

Back in the first lesson of this module, you averaged eight trials with equal weight. I owed you a correction. Combining studies properly isn't averaging at all — it's weighing.

Cast your mind back to that first lesson. You had eight trials of a drug, and you took a simple average of their effects — added them up, divided by eight. It made a point about cherry-picking, and it did the job.

But that simple average had a flaw hiding in plain sight, and I told you we'd come back to it. Here we are.

That average treated all eight trials as equals. A trial of 40 patients counted exactly as much as a trial of 4,000. And that is not how combining evidence actually works — because those two trials are not equally believable, and pretending they are throws away the single most useful thing you know about them.

Why equal weight is wrong

Picture two trials of the same drug. One enrolled 4,000 patients and found a +7 point benefit. The other enrolled 40 and found +14.

Average them naively and you get +10.5 — as if the tiny trial's exuberant result deserved the same say as the large one's. But you already know the small trial is shakier: fewer patients means more room for chance, a wider confidence interval, a result you'd trust less if you had to bet.

So why would you let it pull the combined answer just as hard? You wouldn't. The whole idea of meta-analysis is to give each study a say proportional to how much we should believe it — and then combine them. Not one-study-one-vote. Weighted by trust.

Weight is precision

So what is "trust," in a number? It's precision: how tightly a study pins down its estimate. And precision is captured by the confidence interval. A narrow interval means a precise, heavily trusted study; a wide one means an imprecise, lightly trusted one.

Meta-analysis turns that into a weight. The rule, in words: a study's weight is the inverse of its variance — the tighter its interval, the more it counts in the final pooled estimate. Formally that's weight = 1 / variance, and since a bigger sample gives a tighter interval, larger studies usually weigh more.
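If you like to see it in symbols, here is a sketch of that rule under the usual normal approximation. The notation is introduced here for illustration and is not from the lesson: w_i is the weight of study i, SE_i its standard error, L_i and U_i the ends of its 95% confidence interval, n_i its sample size, and sigma a per-patient standard deviation that is assumed, for the last step only, to be the same in every study.

```latex
w_i = \frac{1}{\mathrm{SE}_i^{2}},
\qquad
\mathrm{SE}_i \approx \frac{U_i - L_i}{2 \times 1.96},
\qquad
\mathrm{SE}_i^{2} = \frac{\sigma^{2}}{n_i}
\;\Rightarrow\;
w_i = \frac{n_i}{\sigma^{2}}
```

Read left to right: halve the interval width and the weight quadruples. Only under the shared-sigma assumption does weight become simply proportional to sample size.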

But hold onto the real mechanism, because it matters: it's not the patient count that buys weight, it's the precision. Sample size is the usual route to precision, so we'll use it as our lever in a moment — but a large study with a noisy, subjective outcome can end up weighing less than a smaller study with a clean, hard one. Precision is the currency. Size is just the most common way to earn it.

Tug-of-war

Enough words. Feel it.

Below are four trials of a drug, each a square on the effect axis — right is more benefit. The square's size is its weight; the whiskers are its confidence interval. The diamond at the bottom is the combined, weighted result — the meta-analysis.

Study D sits far out to the right at +14, but notice how small it is: only 100 patients, so it barely counts. The diamond sits down near the bigger studies, almost ignoring D.

[Interactive figure: forest plot of Studies A (+8), B (+6), C (+5) and D (+14), with the pooled diamond at 6.9. Axis: Effect (points on 0–100 QoL scale). Control: Study D sample size, set to 100 patients. Pooled estimate: 6.9.]

Study               Effect   N
A                   +8       2,000
B                   +6       1,000
C                   +5       1,000
D                   +14      100
Pooled (weighted)   6.9

Grab Study D's sample size and drag it up. Watch the diamond slide toward it — even though D's result never changes. You're not changing what D found. You're changing how much it counts.

The pooled estimate isn't the centre of the points. It's their centre of gravity.
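If you can't drag the control, a few lines of Python give a rough feel for the same thing. This is a minimal sketch, not the lesson's actual calculation: it assumes each study's weight is simply proportional to its sample size, which happens to reproduce the 6.9 shown above to one decimal. The function names are made up for illustration.

```python
# Effects and sample sizes copied from the lesson's table.
effects = {"A": 8.0, "B": 6.0, "C": 5.0, "D": 14.0}
sizes = {"A": 2000, "B": 1000, "C": 1000, "D": 100}


def pooled_estimate(effects, sizes):
    """Weighted mean of effects, using sample size as a stand-in for weight (assumption)."""
    total_weight = sum(sizes.values())
    return sum(effects[s] * sizes[s] for s in effects) / total_weight


def pooled_with_d(n_d):
    """Pooled estimate after setting Study D's sample size to n_d; D's effect stays at +14."""
    return pooled_estimate(effects, {**sizes, "D": n_d})


# Try a few positions of the Study D control.
for n_d in (100, 1000, 4000):
    print(f"D with {n_d:>5} patients -> pooled {pooled_with_d(n_d):.2f}")
```

D's +14 never changes in the loop. Only its weight does, and the pooled value climbs with it.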

Naive vs weighted

Compare two ways of combining the same four trials.

[Figure: forest plot of Studies A (+8), B (+6), C (+5) and D (+14) with the pooled estimate. Weighted average: 6.90. Axis: Effect (points on 0–100 QoL scale).]

The naive average — every trial equal — lands at 8.25, dragged upward by little D's exuberant +14. The weighted average — trust-proportional — lands at 6.9, because D, for all its enthusiasm, barely counts. Same four numbers. Two different answers.
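You can check both numbers yourself. The sketch below uses the four effects and sample sizes from the table, and it assumes, as a simplification, that weight is proportional to sample size; the lesson's figure may use a slightly different weighting, so expect agreement to about one decimal place.

```python
# Studies A, B, C, D from the lesson's table.
effects = [8.0, 6.0, 5.0, 14.0]
sizes = [2000, 1000, 1000, 100]  # used directly as weights (assumption)

# Naive: one study, one vote.
naive = sum(effects) / len(effects)

# Weighted: each effect counts in proportion to its weight.
weighted = sum(e * n for e, n in zip(effects, sizes)) / sum(sizes)

# Study D's share of the total weight.
d_share = sizes[3] / sum(sizes)

print(f"Naive average:    {naive:.2f}")
print(f"Weighted average: {weighted:.2f}")
print(f"Study D's share of the weight: {d_share:.1%}")
```

In the naive average D gets a quarter of the say. In the weighted one, under this assumption, it gets only a few percent.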

Which study pulls the weighted average hardest?

Do the weighting

Let's put a number on it, with a clean two-study example.

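A weighted average multiplies each effect by its weight, adds those up, and divides by the total weight. Here, one study found an effect of 10 and carries a weight of 3; the other found an effect of 2 and carries a weight of 1.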
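Written as a formula, the recipe looks like this. The symbols are introduced here for illustration: each study i has an effect, written theta_i, and a weight, written w_i, and the pooled estimate is theta-hat.

```latex
\hat{\theta}
= \frac{\sum_{i} w_i \, \theta_i}{\sum_{i} w_i}
\qquad\text{so for two studies}\qquad
\hat{\theta} = \frac{w_1 \theta_1 + w_2 \theta_2}{w_1 + w_2}
```

Set every weight equal and this collapses back to the simple average from the first lesson.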

Compute the weighted mean.

(10 × 3 + 2 × 1) ÷ (3 + 1) =

Weight adds up

One more thing, so you don't over-learn the last one. It's tempting to conclude "the biggest study always wins." It doesn't — because weight accumulates.

A single large trial might carry more weight than any one other study. But several medium studies that agree with each other pool their weight together — and their combined vote can outweigh the big one, pulling the diamond toward their consensus instead. The pooled estimate answers to the whole distribution of trust, not to any single heavyweight.
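A small sketch makes the accumulation concrete. Every number below is hypothetical, invented purely to illustrate the point: one big trial with weight 5 reporting +2, and three medium trials with weight 2 each, all reporting +8.

```python
# Hypothetical (effect, weight) pairs, invented for illustration only.
big_trial = (2.0, 5.0)
medium_trials = [(8.0, 2.0), (8.0, 2.0), (8.0, 2.0)]

studies = [big_trial] + medium_trials

# Weighted mean across all four studies.
total_weight = sum(w for _, w in studies)
pooled = sum(e * w for e, w in studies) / total_weight

# The medium trials' weights add up.
medium_weight = sum(w for _, w in medium_trials)

print(f"Big trial weight: {big_trial[1]}, medium trials combined: {medium_weight}")
print(f"Pooled estimate: {pooled:.2f}")
```

No single medium trial outweighs the big one, yet together they carry 6 against its 5, so the pooled estimate lands on their side of the midpoint between +2 and +8.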

Which quietly raises a question we've dodged so far: what if the studies don't agree? What if they're not all really measuring the same thing? Hold that thought — it's the whole of the next two lessons.

Why this matters for HTA

A manufacturer's dossier presents a meta-analysis with a single, favourable pooled number. It looks authoritative: one estimate, drawn from many trials. Your job is to ask how that number was built: which trials went into it, and how much weight each one carried.

"A meta-analysis is not a vote of trials. It's a vote of trust — and your job is to check who was allowed to vote, and how loudly."

Pooling, in one breath

"The combined answer isn't the middle of the studies. It's their centre of gravity — and gravity goes to whoever carries the most weight."

You now know what a pooled estimate is and why it lands where it does. But we've been quietly assuming something the whole time: that all these studies are estimating the same underlying effect. What if they're not — what if the differences between them are real, not just chance? That single question splits meta-analysis into two different models of the world, giving two different answers from the same data. That's next.