M4 · EVIDENCE SYNTHESIS
A drug that worked. Until it didn't.
In the published literature, a class of antidepressants looked reliably effective. Trial after trial, positive. Then someone pulled the full set of trials from the drug regulator — including the ones that were run but never published.
Roughly half of them had been negative or questionable. Once you put them back, it turned out the published record had inflated the average benefit by about a third. For reboxetine, the reversal was sharper still: the majority of patient data had never reached a journal, and once included, the advantage over placebo largely evaporated.
Nothing here was fraud. Every individual trial was real. The distortion lived in which trials you got to see.
So here's the assessor's problem: someone hands you a synthesis built on the published trials. How do you spot a hole shaped like the studies you were never handed?
First, the shape of an honest evidence base.
A funnel plot draws each study as one dot: its effect estimate on the x-axis and its precision on the y-axis. Precision is usually shown as the standard error on an inverted axis, so the most precise studies sit at the top.
Large studies (small SE) cluster tightly near the true effect. Small studies (large SE) scatter widely — but they scatter both ways. Draw two diagonal lines at effect ± 1.96·SE and, with no bias and no heterogeneity, about 95% of studies fall inside. A neat, symmetric inverted funnel.
That symmetry is the reference. Everything in this lesson is about reading a departure from it.
All nine studies, symmetric and complete. Dashed diagonals: pooled effect ± 1.96·SE.
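To make the diagonals concrete, here is a minimal Python sketch under stated assumptions: a simulated, hypothetical evidence base with one true effect and no bias or heterogeneity, and a fixed-effect (inverse-variance) pooled estimate as the funnel's centre. It counts the share of studies that land inside pooled ± 1.96·SE. The names `pooled_fixed_effect` and `inside_funnel` are illustrative, not from any library.

```python
import random

def pooled_fixed_effect(effects, ses):
    """Inverse-variance (fixed-effect) pooled estimate."""
    weights = [1 / se ** 2 for se in ses]
    return sum(w * y for w, y in zip(weights, effects)) / sum(weights)

def inside_funnel(effect, se, centre):
    """True if a study lies between the diagonals centre ± 1.96·SE."""
    return abs(effect - centre) <= 1.96 * se

# Hypothetical simulated evidence base: one true effect, no bias, no
# heterogeneity; each study's estimate is drawn around it with its own SE.
random.seed(1)
TRUE_EFFECT = 5.0
ses = [random.uniform(0.5, 4.0) for _ in range(2000)]
effects = [random.gauss(TRUE_EFFECT, se) for se in ses]

centre = pooled_fixed_effect(effects, ses)
share_inside = sum(inside_funnel(y, se, centre) for y, se in zip(effects, ses)) / len(ses)
print(f"pooled effect {centre:.2f}; {share_inside:.1%} of studies inside the funnel")
```

With no bias and no heterogeneity the share should sit close to 95%. Adding between-study heterogeneity to the simulation pushes more studies outside the lines.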
Why small null studies quietly disappear.
A small study has a large standard error. To reach statistical significance it needs a large observed effect — a small or null result won't clear the bar. And a non-significant small study is the one least likely to get written up, submitted, accepted, cited.
So the studies that vanish aren't random. They're the small ones sitting near "no effect." Remove them and the bottom of the funnel loses one corner; the small studies left at the bottom are the ones with big effects.
Figure: the same funnel with and without its unpublished studies. Complete and symmetric, it gives a pooled effect of 5.0.
The pooled estimate barely moves — the buried studies were small and lightly weighted. The visible damage is to the shape, and to your confidence that the base is complete. Now scale it up: reboxetine lost the majority of its data, and that same modest nudge became a collapse.
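A rough simulation of that selection, assuming a hypothetical true effect of 2.0, a hypothetical mix of trial sizes, and a made-up publication rule (large trials always appear; small trials appear only if significant). It compares the fixed-effect pooled estimate with and without the buried studies.

```python
import random

def pooled_fixed_effect(effects, ses):
    """Inverse-variance (fixed-effect) pooled estimate."""
    weights = [1 / se ** 2 for se in ses]
    return sum(w * y for w, y in zip(weights, effects)) / sum(weights)

def is_significant(effect, se):
    """Two-sided p < 0.05 under a normal approximation: |effect| > 1.96·SE."""
    return abs(effect) > 1.96 * se

random.seed(7)
TRUE_EFFECT = 2.0                      # hypothetical true benefit
SE_CHOICES = [0.5, 1.0, 2.0, 3.0]      # hypothetical mix of large and small trials
studies = []
for _ in range(4000):
    se = random.choice(SE_CHOICES)
    studies.append((random.gauss(TRUE_EFFECT, se), se))

# Assumed selection rule: large trials (SE < 2) always publish;
# small trials (SE >= 2) publish only if significant.
published = [(y, se) for y, se in studies if se < 2.0 or is_significant(y, se)]

all_effects, all_ses = zip(*studies)
pub_effects, pub_ses = zip(*published)
complete = pooled_fixed_effect(all_effects, all_ses)
visible = pooled_fixed_effect(pub_effects, pub_ses)
print(f"{len(studies) - len(published)} small studies buried")
print(f"pooled effect: complete {complete:.3f}, published only {visible:.3f}")
```

In this setup the shift is small because the large trials carry most of the inverse-variance weight, which mirrors the point above. Make the buried studies a larger share of the total weight and the gap widens.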
Check which studies were destined to disappear.
A study is statistically significant (two-sided p < 0.05) when the absolute size of its effect exceeds 1.96 × SE. That threshold is exactly why small null studies go missing: for a study with a large SE, the bar sits very high.
Take a small study like the ones at the bottom of the funnel: effect = +4.0, SE = 2.5. Its significance threshold is 1.96 × 2.5 = 4.9.
The study's effect (4.0) is below its own threshold (4.9), so it's not significant: precisely the kind of small study that never makes it to print.
Now compare a large study — effect +3.0, SE 1.0: threshold = 1.96 × 1.0 = 1.96. Effect (3.0) clears it easily → significant → published.
The small study even reported the larger effect, and still met the opposite fate. The bar isn't set by the result; it's set by the size of the study.
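The same arithmetic as a short Python sketch, using the two studies above. The p-value assumes a normal (Wald) test of effect ÷ SE, which is where the 1.96 comes from; `significance_bar` and `two_sided_p` are illustrative names.

```python
import math

Z_CRIT = 1.96  # two-sided 5% critical value of the standard normal

def significance_bar(se):
    """Smallest |effect| that reaches p < 0.05 for a study with this SE."""
    return Z_CRIT * se

def two_sided_p(effect, se):
    """Two-sided p-value from a normal (Wald) test of effect / SE."""
    z = abs(effect) / se
    return math.erfc(z / math.sqrt(2))

for label, effect, se in [("small study", 4.0, 2.5), ("large study", 3.0, 1.0)]:
    bar = significance_bar(se)
    verdict = "significant" if abs(effect) > bar else "not significant"
    print(f"{label}: effect {effect} vs bar {bar:.2f}, "
          f"p = {two_sided_p(effect, se):.3f}, {verdict}")
```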
Now the part most people get wrong.
You've seen an asymmetric funnel and a shifted pooled estimate. The tempting conclusion: publication bias. Stop there and you'll be wrong a lot of the time.
An asymmetric funnel tells you one thing only: the small studies disagree with the big ones. It does not tell you why. At least three suspects:
- Missing studies (suppression) — the case we just built.
- True heterogeneity — the small studies are genuinely different. Small trials often run in sicker patients, with tighter monitoring or more intensive dosing, where the treatment really does more. Remember I²: real between-study variation is not misconduct.
- Arithmetic and chance — with few studies, asymmetry can be noise; and for some effect measures the plot is skewed by the maths itself, before any bias.
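The arithmetic suspect is easiest to see with odds ratios. This sketch uses Woolf's standard-error formula for a log odds ratio and two hypothetical trials of identical size (100 per arm, 20 control events). The trial reporting the stronger effect gets a larger SE from the maths alone, which moves it down the funnel with no suppression involved.

```python
import math

def log_odds_ratio(events_t, n_t, events_c, n_c):
    """Log odds ratio and its standard error (Woolf's formula) for a 2x2 table.
    Assumes no zero cells; zero cells usually need a continuity correction."""
    a, b = events_t, n_t - events_t
    c, d = events_c, n_c - events_c
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return log_or, se

# Two hypothetical trials of identical size (100 per arm, 20 control events).
null_trial = log_odds_ratio(20, 100, 20, 100)
strong_trial = log_odds_ratio(5, 100, 20, 100)
for name, (lor, se) in [("null", null_trial), ("strong", strong_trial)]:
    print(f"{name}: log OR {lor:+.2f}, SE {se:.2f}")
```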
Compare two readings of the same funnel. Same seven visible studies. Same tilt. From the shape alone you cannot tell them apart.
Two small null studies may be missing here — the funnel can't confirm it.
Same seven dots. Same asymmetry. Two completely different stories — and the plot alone can't choose between them.
One overlay starts to separate them: significance contours.
A contour-enhanced funnel shades the regions where a study would have been significant versus not. Then you ask: where is the gap?
- If the missing studies would have sat in the non-significant zone → suppression is plausible (non-significant small studies are exactly what gets buried).
- If the gap is in a significant zone → suppression is a weak story (why bury a positive, significant result?), so look to heterogeneity, study quality, or the effect measure.
Figure legend: white = non-significant · shaded = significant. In the suppression scenario, the empty region falls in the white, non-significant zone.
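A sketch of how a point on the plot gets assigned to a contour. One assumption to flag: many published contour-enhanced funnels draw three contours (p = 0.01, 0.05, 0.10), and this sketch uses those rather than a simple two-way split. The contours are centred on zero effect, not on the pooled estimate, because they describe each study's own significance test. The example points are hypothetical.

```python
import math

def two_sided_p(effect, se):
    """Two-sided p-value for H0: effect = 0, normal approximation."""
    return math.erfc(abs(effect) / se / math.sqrt(2))

def contour_zone(effect, se):
    """Label the significance contour that a funnel-plot point falls in.
    Assumes the commonly drawn 1%, 5% and 10% contours around zero effect."""
    p = two_sided_p(effect, se)
    if p < 0.01:
        return "p < 0.01"
    if p < 0.05:
        return "0.01 <= p < 0.05"
    if p < 0.10:
        return "0.05 <= p < 0.10"
    return "p >= 0.10 (non-significant)"

# Hypothetical locations where a gap in the funnel might sit.
for effect, se in [(1.0, 2.5), (6.0, 2.5), (3.0, 1.0)]:
    print(f"effect {effect:+.1f}, SE {se}: {contour_zone(effect, se)}")
```

If the positions where studies seem to be missing come back "non-significant", suppression stays on the table; if they land in a significant band, look elsewhere.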
Two tools you'll see in submissions — and where each one lies.
Egger's test regresses each study's standardized effect (effect ÷ SE) on its precision (1 ÷ SE); an intercept away from zero flags asymmetry. Begg's test uses a rank correlation between standardized effect and variance, and has less power. Both come with fine print:
- They detect small-study effects in general — not publication bias specifically.
- With fewer than 10 studies, don't run them: they can't tell asymmetry from chance. This is the standard Cochrane threshold, and it rules out most HTA evidence bases.
- For odds ratios, the standard error of the log OR is mathematically tied to the log OR itself, so the standard test can signal asymmetry that isn't there. Modified versions (Harbord, Peters) exist for exactly this reason.
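A standard-library sketch of Egger's regression as described above: regress effect ÷ SE on 1 ÷ SE by ordinary least squares and read off the intercept, its standard error, and a t statistic on n − 2 degrees of freedom. It refuses to run below 10 studies, following the rule of thumb above. The data are hypothetical, turning t into a p-value (for example with SciPy's `scipy.stats.t.sf`) is left out to keep the block dependency-free, and the Harbord and Peters variants are not implemented.

```python
import math

def egger_test(effects, ses, min_studies=10):
    """Egger's regression: standardized effect (y/SE) on precision (1/SE).
    Returns (intercept, its SE, t statistic with n - 2 df)."""
    n = len(effects)
    if n < min_studies:
        raise ValueError(f"{n} studies: too few to separate asymmetry from chance")
    z = [y / se for y, se in zip(effects, ses)]   # standardized effects
    prec = [1 / se for se in ses]                 # precisions
    mx, mz = sum(prec) / n, sum(z) / n
    sxx = sum((x - mx) ** 2 for x in prec)
    slope = sum((x - mx) * (v - mz) for x, v in zip(prec, z)) / sxx
    intercept = mz - slope * mx
    resid = [v - (intercept + slope * x) for x, v in zip(prec, z)]
    s2 = sum(r * r for r in resid) / (n - 2)      # residual variance
    se_int = math.sqrt(s2 * (1 / n + mx ** 2 / sxx))
    return intercept, se_int, intercept / se_int

# Hypothetical data: 12 studies, true effect 2.0, deterministic ±0.3·SE scatter.
ses = [0.5 + 0.25 * i for i in range(12)]
scatter = [0.3 * se * (1 if i % 2 == 0 else -1) for i, se in enumerate(ses)]
symmetric = [2.0 + e for e in scatter]
small_study_effect = [2.0 + 1.5 * se + e for se, e in zip(ses, scatter)]  # small trials inflated

for name, effects in [("symmetric", symmetric), ("small-study effect", small_study_effect)]:
    b0, se_b0, t = egger_test(effects, ses)
    print(f"{name}: intercept {b0:+.2f} (SE {se_b0:.2f}), t = {t:.1f}")
```

A large intercept here says only that small studies report bigger effects; as the list above warns, it does not say why.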
Trim-and-fill goes further: it imagines the missing studies, mirror-images them back, and re-computes a "corrected" pooled estimate. It looks like a fix. It isn't. It assumes the asymmetry is suppression — the very thing in question — and it misbehaves under heterogeneity. Read it as a sensitivity analysis ("how fragile is my result?"), never as a recovery of the truth.
No method invents data that was never collected. That's the whole point.
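To show why trim-and-fill reads best as a sensitivity analysis, here is a deliberately simplified sketch of the "fill" idea only. Assumptions to flag: the number of missing studies `k0` is supplied by you rather than estimated (Duval and Tweedie's method estimates it iteratively), the excess is assumed to sit on the high-effect side, pooling is fixed-effect, and the data are hypothetical. Sweeping `k0` shows how far the answer moves depending on an assumption the data cannot check.

```python
def pooled_fixed_effect(effects, ses):
    """Inverse-variance (fixed-effect) pooled estimate."""
    weights = [1 / se ** 2 for se in ses]
    return sum(w * y for w, y in zip(weights, effects)) / sum(weights)

def fill_sensitivity(effects, ses, k0):
    """Mirror-image 'fill' for an ASSUMED number of missing studies, k0.
    Assumes the excess sits on the high-effect side. Unlike Duval and Tweedie's
    trim-and-fill, k0 is not estimated and there is no iteration."""
    order = sorted(range(len(effects)), key=lambda i: effects[i], reverse=True)
    trimmed = set(order[:k0])                                  # k0 highest effects
    kept = [i for i in range(len(effects)) if i not in trimmed]
    centre = pooled_fixed_effect([effects[i] for i in kept], [ses[i] for i in kept])
    filled_y = [2 * centre - effects[i] for i in order[:k0]]   # mirror images
    filled_se = [ses[i] for i in order[:k0]]
    return pooled_fixed_effect(list(effects) + filled_y, list(ses) + filled_se)

# Hypothetical asymmetric evidence base: small studies (large SE) report bigger effects.
effects = [2.0, 2.2, 1.9, 2.4, 3.5, 4.8, 6.0]
ses = [0.5, 0.6, 0.7, 0.9, 1.5, 2.0, 2.5]
for k0 in range(4):
    print(f"assume {k0} missing -> pooled {fill_sensitivity(effects, ses, k0):.2f}")
```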
The funnel plot is a smoke detector. It is not the fire brigade.
A clean funnel is reassuring and nowhere near sufficient. The structural defence against reporting bias sits upstream, in the search:
- Trial registries — ClinicalTrials.gov, EU CTIS, the WHO ICTRP. Registered-but-never-published trials are your list of suspects. So is outcome switching: compare the pre-registered primary outcome with the one that got published.
- Regulatory documents — EMA assessment reports (EPARs), FDA medical and statistical reviews, and, when you can get them, full clinical study reports. This is where the Cochrane review of oseltamivir found its evidence — data the published papers never carried.
And note what the funnel is blind to: it sees whole studies that vanish. It cannot see a published trial that quietly buried its primary endpoint. In HTA that selective-outcome reporting is often the bigger problem — and it's invisible on the plot.
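The registry cross-check is bookkeeping more than statistics. A minimal sketch, assuming you have already pulled each registered trial's pre-specified primary outcome, and the primary outcome of any matching publication, into a hypothetical `RegisteredTrial` record (no real registry API is used here). It lists registered-but-unpublished trials and published trials whose primary outcome changed.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

@dataclass
class RegisteredTrial:
    registry_id: str                  # hypothetical identifier
    registered_primary: str           # primary outcome as pre-registered
    published_primary: Optional[str]  # None = no publication found

def audit(trials: List[RegisteredTrial]) -> Tuple[List[str], List[str]]:
    """Return (registered but unpublished, published with a different primary outcome).
    Exact text matching is a crude stand-in for reading the protocols."""
    def norm(s: str) -> str:
        return " ".join(s.lower().split())
    unpublished = [t.registry_id for t in trials if t.published_primary is None]
    switched = [t.registry_id for t in trials
                if t.published_primary is not None
                and norm(t.published_primary) != norm(t.registered_primary)]
    return unpublished, switched

trials = [
    RegisteredTrial("TRIAL-A", "Pain score at 12 weeks", "Pain score at 12 weeks"),
    RegisteredTrial("TRIAL-B", "Pain score at 12 weeks", None),
    RegisteredTrial("TRIAL-C", "Pain score at 12 weeks", "Responder rate at 6 weeks"),
]
unpublished, switched = audit(trials)
print("unpublished:", unpublished, "| outcome switched:", switched)
```

Text matching only flags candidates; deciding whether an outcome was genuinely switched still means reading the protocol against the paper.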
The other chair
Reading a submission: a symmetric funnel and a null Egger test are not a clean bill of health. Ask what the search actually covered: grey literature, conference abstracts, registries, regulators. Check the registered primary outcomes against the published ones. Treat trim-and-fill as a fragility check, not reassurance.
Building one: the assessor is going to ask these questions, so answer them first. Document a search comprehensive enough that "you missed the negative trials" isn't available as a rebuttal. If your funnel tilts, don't hide it. Pre-empt it: show the contour-enhanced plot, offer the heterogeneity explanation on its merits, and don't lean on trim-and-fill as if it settled anything.
Same skill, read from either side — knowing what a complete evidence base looks like, and what its absence looks like.
Why this matters for HTA
When it lands on your desk: a manufacturer's synthesis arrives, meta-analysis included, effect estimate crisp. The evidence it rests on is only as trustworthy as its completeness — and completeness is the one thing a polished forest plot can quietly hide.
- You interrogate the search before you trust the pooled number. Publication bias is built into the published literature; a synthesis of that record inherits it.
- You read asymmetry as a question, not a verdict. Missing studies, real heterogeneity, or an artefact of the measure — you decide which, using contours, registries, and clinical reasoning, not a single test's p-value.
- You price the residual uncertainty. An evidence base you can't fully verify is a reason for caution in the recommendation — and sometimes for a managed-access or risk-sharing arrangement rather than a clean yes.
The distortion that changes a decision is rarely in the study you're reading. It's in the one you were never sent.
Publication bias, in one breath.
- A funnel plot reads the completeness of an evidence base from its shape: precise studies at the top, small ones scattering symmetrically below.
- Asymmetry is a symptom, not a diagnosis. Suppressed studies, true heterogeneity, effect-measure arithmetic, and chance all tilt the funnel.
- Contour-enhanced funnels narrow the suspects — a gap where non-significant studies would sit points to suppression — but they don't convict.
- Egger's test flags small-study effects, not publication bias; it's unreliable below 10 studies and for odds ratios. Trim-and-fill is a sensitivity analysis, never a correction.
- The real defence is upstream: a comprehensive search, trial registries, and regulatory data. The plot is blind to a study that buried its own outcome.
You can't average your way out of the studies you never saw.
You've now covered what can distort a synthesis from the inside — how much its studies scatter (heterogeneity) and whether you have all of them (publication bias). Next: what happens when the trials you need don't compare your two options at all — indirect comparisons and network meta-analysis — before GRADE pulls the whole certainty judgement together.