M4 · EVIDENCE SYNTHESIS

A drug that worked. Until it didn't.

In the published literature, a class of antidepressants looked reliably effective. Trial after trial, positive. Then someone pulled the full set of trials from the drug regulator — including the ones that were run but never published.

Roughly half of them had been negative. Once you put them back, the average benefit shrank by about a third. For reboxetine, the reversal was sharper still: the majority of patient data had never reached a journal, and once included, the advantage over placebo largely evaporated.

Nothing here was fraud. Every individual trial was real. The distortion lived in which trials you got to see.

So here's the assessor's problem: someone hands you a synthesis built on the published trials. How do you spot a hole shaped like the studies you were never handed?

First, the shape of an honest evidence base.

A funnel plot puts each study as one dot: its effect estimate on the x-axis, its precision on the y-axis — measured as standard error, drawn upside down, so the most precise studies sit at the top.

Large studies (small SE) cluster tightly near the true effect. Small studies (large SE) scatter widely, but they scatter both ways. Draw two diagonal lines at pooled effect ± 1.96·SE and, with no bias and no heterogeneity, about 95% of studies fall inside. A neat, symmetric inverted funnel.

That symmetry is the reference. Everything in this lesson is about reading a departure from it.

[Figure: funnel plot. x-axis: mean difference (−2 to 10). y-axis: standard error (0 to 5), precision increasing toward the top. Reference lines: no effect, pooled estimate.]

All nine studies, symmetric and complete. Dashed diagonals: pooled effect ± 1.96·SE.
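As a rough sketch of the geometry, the Python below computes those dashed limits and checks whether a study falls inside them. The study values and the function names (`funnel_limits`, `inside_funnel`) are illustrative assumptions, not the data behind the plot.

```python
Z_95 = 1.96  # two-sided 95% normal quantile, as used in the lesson


def funnel_limits(pooled, se, z=Z_95):
    """Return the (lower, upper) funnel limits at a given standard error."""
    return pooled - z * se, pooled + z * se


def inside_funnel(effect, se, pooled, z=Z_95):
    """True when a study's effect lies between the two diagonal lines."""
    lower, upper = funnel_limits(pooled, se, z)
    return lower <= effect <= upper


# Hypothetical (effect, SE) pairs, invented for illustration only.
studies = [(5.1, 0.5), (4.8, 1.0), (5.6, 1.5), (3.9, 2.0), (1.0, 3.0), (12.5, 3.0)]
pooled = 5.0

for effect, se in studies:
    lower, upper = funnel_limits(pooled, se)
    status = "inside" if inside_funnel(effect, se, pooled) else "outside"
    print(f"effect={effect:5.1f}  SE={se:.1f}  limits=({lower:.2f}, {upper:.2f})  {status}")
```

The limits widen linearly with SE, which is why the lines are straight diagonals when SE is on the vertical axis.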

Why small null studies quietly disappear.

A small study has a large standard error. To reach statistical significance it needs a large observed effect — a small or null result won't clear the bar. And a non-significant small study is the one least likely to get written up, submitted, accepted, cited.

So the studies that vanish aren't random. They're the small ones sitting near "no effect." Remove them and the bottom of the funnel loses one corner — and the studies that survive on that end are the big-effect ones.
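A small simulation makes the mechanism concrete. This is a sketch under loud assumptions: every study has the same SE, the true effect is a made-up 2.0, and "publication" is modelled as a hard filter that keeps only results significant in the positive direction. Real suppression is a matter of probability rather than a hard cut-off, and the function names here are hypothetical.

```python
import random
import statistics


def simulate_small_studies(true_effect, se, n_studies, seed=0):
    """Draw observed effects for n_studies small studies with a shared SE."""
    rng = random.Random(seed)
    return [rng.gauss(true_effect, se) for _ in range(n_studies)]


def published_only(effects, se, z=1.96):
    """Crude filter (assumption): keep only significantly positive results."""
    return [e for e in effects if e > z * se]


effects = simulate_small_studies(true_effect=2.0, se=2.5, n_studies=2000)
survivors = published_only(effects, se=2.5)

print(f"all studies:    mean effect = {statistics.mean(effects):.2f}")
print(f"published only: mean effect = {statistics.mean(survivors):.2f} "
      f"({len(survivors)} of {len(effects)} studies)")
```

Every survivor has an observed effect above 1.96 × SE by construction, so the published small studies all sit on the big-effect side of the funnel even though the underlying effect is modest.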

Toggle the unpublished studies below and watch the shape change.

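[Interactive figure: funnel plot with unpublished studies toggled on or off. x-axis: mean difference (−2 to 10). y-axis: standard error (0 to 5), precision increasing toward the top. Reference lines: no effect, pooled estimate.]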

Pooled effect: 5.0 · symmetric, complete.

The pooled estimate barely moves — the buried studies were small and lightly weighted. The visible damage is to the shape, and to your confidence that the base is complete. Now scale it up: reboxetine lost the majority of its data, and that same modest nudge became a collapse.
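To see why the pooled estimate is so insensitive to small studies, here is a minimal sketch assuming fixed-effect inverse-variance pooling, where each study is weighted by 1/SE². The lesson does not say which pooling model its plot uses, and the nine studies below are invented to be symmetric around 5.0.

```python
def pooled_fixed_effect(studies):
    """Inverse-variance weighted mean of (effect, SE) pairs."""
    weights = [1.0 / se ** 2 for _, se in studies]
    weighted_sum = sum(w * effect for w, (effect, _) in zip(weights, studies))
    return weighted_sum / sum(weights)


# Hypothetical complete evidence base: (effect, SE), symmetric around 5.0.
complete = [
    (5.0, 0.5),
    (4.6, 1.0), (5.4, 1.0),
    (4.0, 1.5), (6.0, 1.5),
    (3.0, 2.0), (7.0, 2.0),
    (1.0, 2.5), (9.0, 2.5),
]

# Assumption: the two small studies nearest "no effect" go unpublished.
buried = {(3.0, 2.0), (1.0, 2.5)}
visible = [study for study in complete if study not in buried]

print(f"pooled, all nine studies: {pooled_fixed_effect(complete):.2f}")
print(f"pooled, seven visible:    {pooled_fixed_effect(visible):.2f}")
```

The most precise study carries a weight of 4.0, while the two buried studies carry 0.25 and 0.16, so removing them shifts the weighted mean only slightly. The picture changes once the missing studies hold most of the total weight.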

Check which studies were destined to disappear.

A study is statistically significant (two-sided p < 0.05) when the absolute size of its effect exceeds 1.96 × SE. That threshold is exactly why small null studies go missing: with a large SE, their bar is set very high.

Take a small study like the ones at the bottom of the funnel:

  1. A small study has effect = +4.0, SE = 2.5. Compute its significance threshold: 1.96 × 2.5 = ?
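If you want to check your answer, the rule can be written as a two-line function. This is a sketch assuming a two-sided test at the 5% level with the normal quantile 1.96. The function names are hypothetical.

```python
Z_95 = 1.96  # two-sided 5% critical value under a normal approximation


def significance_threshold(se, z=Z_95):
    """Smallest absolute effect that would reach significance at this SE."""
    return z * se


def is_significant(effect, se, z=Z_95):
    """True when |effect| exceeds the threshold for its SE."""
    return abs(effect) > significance_threshold(se, z)


# The study from the item above.
effect, se = 4.0, 2.5
print(f"threshold = {significance_threshold(se):.2f}")
print(f"significant? {is_significant(effect, se)}")
```

Try the same effect with SE = 0.5 to see how much lower the bar sits for a large study.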

Now the part most people get wrong.

You've seen an asymmetric funnel and a shifted pooled estimate. The tempting conclusion: publication bias. Stop there and you'll be wrong a lot of the time.

An asymmetric funnel tells you one thing only: the small studies disagree with the big ones. It does not tell you why. At least three suspects:

Toggle the two readings of the same funnel below. Same seven visible studies. Same tilt. From the shape alone you cannot tell them apart.

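[Interactive figure: funnel plot of the seven visible studies, with two alternative readings toggled. x-axis: mean difference (−2 to 10). y-axis: standard error (0 to 5), precision increasing toward the top. Reference lines: no effect, pooled estimate.]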

Two small null studies may be missing here — the funnel can't confirm it.

Same seven dots. Same asymmetry. Two completely different stories — and the plot alone can't choose between them.

One overlay starts to separate them: significance contours.

A contour-enhanced funnel shades the regions where a study would have been significant versus not. Then you ask: where is the gap?

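[Interactive figure: contour-enhanced funnel plot. x-axis: mean difference (−2 to 10). y-axis: standard error (0 to 5), precision increasing toward the top. Reference lines: no effect, pooled estimate.]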

white = non-significant · shaded = significant

In the suppression scenario, the empty region falls in the ___ zone.

Two tools you'll see in submissions — and where each one lies.

Egger's test regresses each study's standardised effect (effect ÷ SE) on its precision (1 ÷ SE); an intercept that differs from zero flags asymmetry. Begg's test uses a rank correlation instead and has less power. Both come with fine print:

Trim-and-fill goes further: it imagines the missing studies, mirror-images them back, and re-computes a "corrected" pooled estimate. It looks like a fix. It isn't. It assumes the asymmetry is suppression — the very thing in question — and it misbehaves under heterogeneity. Read it as a sensitivity analysis ("how fragile is my result?"), never as a recovery of the truth.

No method invents data that was never collected. That's the whole point.
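For readers who want to see what Egger's regression actually computes, here is a minimal sketch in its common form: regress effect ÷ SE on 1 ÷ SE by ordinary least squares and read off the intercept. The data are hypothetical, the function name `egger_intercept` is an assumption, and the significance test on the intercept (a t-test against its standard error) is left out.

```python
def egger_intercept(studies):
    """Return (intercept, slope) from the OLS fit of effect/SE on 1/SE."""
    xs = [1.0 / se for _, se in studies]          # precision
    ys = [effect / se for effect, se in studies]  # standardised effect
    n = len(studies)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all studies share one SE; the regression is undefined")
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return intercept, slope


# Hypothetical (effect, SE) pairs, symmetric around 5.0.
complete = [
    (5.0, 0.5),
    (4.6, 1.0), (5.4, 1.0),
    (4.0, 1.5), (6.0, 1.5),
    (3.0, 2.0), (7.0, 2.0),
    (1.0, 2.5), (9.0, 2.5),
]
# Same set with the two small, near-null studies removed.
visible = [s for s in complete if s not in {(3.0, 2.0), (1.0, 2.5)}]

for label, data in (("complete", complete), ("visible", visible)):
    intercept, slope = egger_intercept(data)
    print(f"{label:8s} intercept = {intercept:.2f}, slope = {slope:.2f}")
```

In the symmetric set the intercept is essentially zero; in the tilted set it is clearly positive. The regression gives the same positive intercept whether the tilt comes from suppression or from genuine differences between small and large studies, which is why a flagged asymmetry does not identify its cause.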

The funnel plot is a smoke detector. It is not the fire brigade.

A clean funnel is reassuring and nowhere near sufficient. The structural defence against reporting bias sits upstream, in the search:

And note what the funnel is blind to: it sees only whole studies that vanish. It cannot see a published trial that quietly buried its primary endpoint. In health technology assessment (HTA), that selective outcome reporting is often the bigger problem, and it's invisible on the plot.

The other chair

Reading a submission: a symmetric funnel and a null Egger test are not a clean bill of health. Ask what the search actually covered: grey literature, conference abstracts, registries, regulators. Check the registered primary outcomes against the published ones. Treat trim-and-fill as a fragility check, not reassurance.

Building one: the assessor is going to ask these questions, so answer them first. Document a search comprehensive enough that "you missed the negative trials" isn't available as a rebuttal. If your funnel tilts, don't hide it; pre-empt it. Show the contour-enhanced plot, offer the heterogeneity explanation on its merits, and don't lean on trim-and-fill as if it settled anything.

Same skill, read from either side — knowing what a complete evidence base looks like, and what its absence looks like.

Why this matters for HTA

When it lands on your desk: a manufacturer's synthesis arrives, meta-analysis included, effect estimate crisp. The evidence it rests on is only as trustworthy as its completeness — and completeness is the one thing a polished forest plot can quietly hide.

The distortion that changes a decision is rarely in the study you're reading. It's in the one you were never sent.

Publication bias, in one breath.

You can't average your way out of the studies you never saw.

You've now covered what can distort a synthesis from the inside — how much its studies scatter (heterogeneity) and whether you have all of them (publication bias). Next: what happens when the trials you need don't compare your two options at all — indirect comparisons and network meta-analysis — before GRADE pulls the whole certainty judgement together.