Module 2 · Evidence Hierarchy

They were certain. The evidence was enormous. They were wrong.

For years, doctors believed hormone replacement therapy (HRT) protected women's hearts. This wasn't a hunch — it rested on large, careful studies tracking hundreds of thousands of women. Those who took HRT had noticeably fewer heart attacks. The evidence looked overwhelming.

Huge studies, a consistent result, expert agreement. Is that enough to trust the conclusion?

When the question was finally put to a stricter test, the heart protection vanished. Same therapy, far better evidence, opposite answer. Before we trust any result, we have to ask the question this whole half of the course is built on: how was it found out?

Get fooled

Here's what actually happened. The early studies simply watched: they compared women who happened to take HRT with women who didn't. And the HRT-takers did have healthier hearts.

But those women weren't comparable. Women prescribed HRT in that era tended to be wealthier, more health-conscious, more likely to exercise and see doctors regularly. Their hearts were healthier for reasons that had nothing to do with the therapy.

When researchers finally ran the stricter test, splitting women into HRT-or-not by chance so that the two groups were alike on average in everything except the therapy, the heart benefit was gone.
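Why does splitting by chance make two groups comparable? Here is a minimal sketch, not the method of any real trial. The population, the 40% share of health-conscious women, and the function names are all invented for illustration. It assigns each woman to an arm by a coin flip and then checks how evenly a hidden trait ends up spread across the arms.

```python
import random


def share_health_conscious(group):
    """Fraction of a group that carries the hidden trait (True values)."""
    return sum(group) / len(group)


def randomise(population, seed=0):
    """Split a population into two arms by coin flip."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    treated, control = [], []
    for person in population:
        (treated if rng.random() < 0.5 else control).append(person)
    return treated, control


# Hypothetical population: 10,000 women, 40% health-conscious (True).
# These numbers are assumptions for illustration, not real HRT data.
population = [i % 10 < 4 for i in range(10_000)]

treated, control = randomise(population)
gap = abs(share_health_conscious(treated) - share_health_conscious(control))

print(f"treated arm: {share_health_conscious(treated):.3f} health-conscious")
print(f"control arm: {share_health_conscious(control):.3f} health-conscious")
print(f"gap between arms: {gap:.3f}")
```

The coin flip never looks at the trait, so it cannot favour one arm. The same holds for traits nobody thought to measure, which is the point of randomising.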

The original studies weren't small or sloppy. They were huge. And huge can still be wrong, if the method lets reality fool you.

The real lesson

This is the idea everything ahead is built on:

A claim is only as good as the way it was found out.

Not how confident the expert sounds. Not how big the number is. Not how many studies agree. What matters is whether the method protected the conclusion from being fooled.

So "is this good evidence?" turns into a sharper question: good against what? It turns out there are three ways — and only three big ones — that evidence leads us astray. Learn to spot them, and you can read any study in the world.

The three enemies

Every misleading health claim is some mix of three culprits. Meet them:

Chance. Pure luck. Flip a fair coin four times and you'll get four heads about one time in 16, and you might conclude the coin is rigged. Small numbers lie constantly.

Bias. A systematic tilt in how the study was built or measured: the wrong people studied, or the right people measured the wrong way, so the answer was skewed before the study began.

Confounding. A hidden third factor driving both things at once. HRT didn't protect hearts; being the kind of woman prescribed HRT did.
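The first enemy, chance, is the easiest to put numbers on. The sketch below is an illustration only. The function name `prob_at_least` and the 75% threshold are choices made for this example. It computes exactly how often a fair coin produces a lopsided run of heads, and how that changes as the number of flips grows.

```python
from math import ceil, comb


def prob_at_least(n, share=0.75):
    """Exact chance that a fair coin shows at least `share` heads in n flips."""
    k_min = ceil(share * n)  # smallest head count that meets the threshold
    favourable = sum(comb(n, k) for k in range(k_min, n + 1))
    return favourable / 2**n  # all 2**n sequences are equally likely


# Four heads in four flips: 1 sequence out of 16.
print("4 flips, all heads:", prob_at_least(4, share=1.0))

# The same lopsided share of heads gets rarer as the flips pile up.
for flips in (4, 40, 400):
    print(f"{flips} flips, at least 75% heads: {prob_at_least(flips):.3g}")
```

A result that luck hands you fairly often with four flips becomes very hard to get by luck alone with hundreds.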

Good evidence is evidence that has shut these three out.

A machine for not being fooled

Now the good news. We're not helpless against these three. Over the last century, researchers built methods designed to shut each enemy out.

To beat chance: study more people. Big numbers drown out luck.

To beat bias: standardise who's studied and how they're measured, so nothing tilts the result.

To beat confounding: make the groups you compare alike, on average, in everything except the treatment, so nothing hidden can take the credit.
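To see what "nothing hidden can take the credit" means, here is a sketch with invented counts. They are assumptions for illustration, not real HRT data, and the trait labels and the `risk` helper are hypothetical. In this made-up world the therapy does nothing at all, yet a crude comparison makes it look protective. Comparing like with like makes the effect disappear.

```python
# Hypothetical counts, invented for illustration only.
# Key: (treatment group, hidden trait) -> (women, heart attacks)
counts = {
    ("HRT", "health-conscious"): (800, 16),
    ("HRT", "not health-conscious"): (200, 12),
    ("no HRT", "health-conscious"): (200, 4),
    ("no HRT", "not health-conscious"): (800, 48),
}


def risk(group, trait=None):
    """Heart-attack risk in a treatment group, optionally within one trait."""
    rows = [v for (g, t), v in counts.items() if g == group and trait in (None, t)]
    women = sum(n for n, _ in rows)
    events = sum(e for _, e in rows)
    return events / women


# Crude comparison: ignores the hidden trait, so HRT looks protective.
print("crude:", risk("HRT"), "vs", risk("no HRT"))

# Like-with-like comparison: within each trait, the risks match.
for trait in ("health-conscious", "not health-conscious"):
    print(f"{trait}:", risk("HRT", trait), "vs", risk("no HRT", trait))
```

The crude gap comes entirely from who ended up in each group: most HRT users here are health-conscious and most non-users are not. This trick only works for traits you know about and have measured, which is why later lessons lean on randomisation.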

That's what a study design really is: not paperwork, but a machine for not being fooled. And some machines shut out more enemies than others. That difference is the whole reason evidence comes in ranks.

The hierarchy of evidence

Rank study designs by how many enemies they shut out and a ladder appears, the hierarchy of evidence:

4. Systematic review of RCTs: pools them all, guarding against chance and cherry-picking.
3. Randomised controlled trial (RCT): randomising shuts out confounding too.
2. Observational study: counts systematically, but can't rule out confounding.
1. Anecdote / expert opinion: one person's impression. Beats none of the three enemies.

Each rung sits higher because it shuts out one more way of being fooled. That's all the hierarchy is: a ranking by protection. (The exact study designs on each rung are the next lesson; how reviews pool trials is M4.)

The twist: the ladder is a simplification

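One more question, and this one's a trap.

You've just learned that an RCT beats an observational study. So which would you trust more: an unblinded trial riddled with dropouts, or a large, careful observational study?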

The hierarchy ranks designs at their best. It doesn't promise that every RCT beats every observational study. An unblinded trial riddled with dropouts can be weaker than a large, careful observational study: the dropouts undo the balance that randomisation created, and the lack of blinding lets bias back in.

A study's rung is a starting presumption, not a verdict. How well it was actually run — and whether other studies agree — can move it up or down.

That's why real health technology assessment (HTA) never just counts study labels. It grades each study's quality and consistency, the machinery you'll meet in M4. The ladder tells you where to start trusting. It never tells you where to stop thinking.

Why this matters for HTA

Here's where this lands on your desk. A manufacturer's submission is built from evidence — and not all of it will sit high on the ladder. Where strong evidence is missing, weaker evidence gets dressed up to fill the gap: an observational comparison presented as if it settled the question; a single favourable trial leaned on hard; a striking association quietly implying cause.

None of it need be dishonest. But your job is to read past the packaging to the method underneath, and ask the only question that matters:

Given how this was found out — how much should it move me?

An expert who can't tell a confounded association from a randomised result will be confidently, repeatedly fooled. You are no longer that person.

Why evidence isn't all equal

A claim is only as good as the way it was found out — not how big, loud, or confident it is.
Three enemies mislead us: chance, bias, and confounding.
A study design is a machine for shutting those enemies out. The more it shuts out, the higher it ranks.
The hierarchy, bottom to top: anecdote → observational → RCT → systematic review.
But a rung is a starting presumption, not a verdict — execution and agreement can move any study up or down.

Good evidence isn't evidence that sounds convincing. It's evidence that didn't let reality fool you.

You can now judge how much to trust a result. Next, we open up the rungs themselves — the actual study designs, what each one can prove, and what it can't. That's M2's next lesson: the types of studies.