Module 3 · Survival Analysis

Both drugs: "60% still alive." One is far better. Which?

Two cancer drugs report the same headline: at the end of each trial, 60% of patients were still alive.

But the trials weren't the same length. Drug A's trial followed patients for 5 years — 60% alive at five years. Drug B's trial followed them for 1 year — 60% alive at just one year.

Same '60% alive.' Are these drugs equally good?

Until now, outcomes were yes/no: did the event happen? But many of the most important outcomes — survival, time to relapse, time to progression — are about how long until it happened. Counting events alone discards the timing, and a special problem makes naïve counting fail outright: at the end of any trial, some patients simply haven't had the event yet.

Why a simple percentage fails

Picture a survival trial. Patients don't all arrive on day one and stay to the end. Some enrol late. Some move away and are lost to follow-up. And when the study closes, many patients are still alive — they just haven't had the event yet.

Now try to compute a simple "survival rate." Do you count the still-alive patients as successes? But you don't know what happens to them after the study ends. Do you drop them? Then you're throwing away most of your data — and biasing the result toward whoever did have the event.

Neither works. The naïve percentage can't cope with the fact that different patients were observed for different lengths of time, and that many have incomplete stories. We need a method built precisely for incomplete observation — and it starts by taking those unfinished stories seriously.
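To see the two naïve fixes fail side by side, here is a minimal Python sketch on ten made-up patients. The data, the 12-month cut-off and the helper name `km_survival_at` are all hypothetical. It estimates the chance of surviving beyond 12 months three ways: counting censored patients as survivors, dropping those censored before 12 months, and the Kaplan-Meier approach this module builds towards.

```python
def km_survival_at(patients, T):
    """Kaplan-Meier estimate of the probability of surviving beyond time T."""
    survival, at_risk = 1.0, len(patients)
    for t in sorted({time for time, _ in patients}):
        if t > T:
            break
        group = [died for time, died in patients if time == t]
        deaths = sum(group)                    # True counts as 1
        if deaths:
            survival *= 1 - deaths / at_risk   # an event steps survival down
        at_risk -= len(group)  # deaths and censorings both leave the at-risk pool
    return survival


# Hypothetical trial: (months observed, True if the event occurred)
patients = [(2, True), (3, False), (4, False), (5, True), (6, False),
            (9, True), (11, False), (14, True), (18, False), (24, False)]
T = 12  # question: what fraction survive beyond 12 months?

# Naive fix 1: count every censored patient as a survivor
survived = [not (died and t <= T) for t, died in patients]
counted_as_survivors = sum(survived) / len(patients)

# Naive fix 2: drop anyone censored before T, since their status at T is unknown
kept = [(t, died) for t, died in patients if died or t >= T]
dropped = sum(1 for t, died in kept if not (died and t <= T)) / len(kept)

print(f"censored counted as survivors: {counted_as_survivors:.0%}")
print(f"censored dropped:              {dropped:.0%}")
print(f"Kaplan-Meier:                  {km_survival_at(patients, T):.0%}")
```

On this toy data the first approach gives 70%, the second 50%, and Kaplan-Meier about 62%: one overstates survival, the other understates it.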

Censoring: the unfinished story

When a patient is observed for a while and then their story is cut off — the study ends, or they're lost to follow-up — without the event having happened, we call them censored.

Crucially, a censored patient is not missing data, and not a failure. They carry real information: "this person survived at least this long." A patient censored at 18 months tells you, with certainty, that they were alive at 18 months. You just don't know what happened afterwards.

Censoring is partial truth, not absent truth. The whole art of survival analysis is using that partial truth — "alive at least until here" — instead of discarding it. Throw censored patients away and you bias the result; count them as survivors and you overstate it. Kaplan-Meier does neither.
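In practice each patient's partial truth is usually recorded as a pair: a time, and a flag saying whether that time marks an event or a censoring. The sketch below spells out what such a pair does and does not tell you about a given moment. The `Observation` class and `alive_at` helper are hypothetical names for illustration, not part of any library.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Observation:
    time: float   # months from enrolment to the event or to censoring
    event: bool   # True = event observed; False = censored


def alive_at(obs: Observation, t: float) -> Optional[bool]:
    """What one observation tells us about survival at time t.

    True  -> known to be alive at t
    False -> known to have had the event by t
    None  -> unknown: the story was cut off (censored) before t
    """
    if t < obs.time:
        return True                  # observed alive beyond t
    if obs.event:
        return False                 # event happened at obs.time, which is <= t
    return True if t == obs.time else None


censored = Observation(time=18, event=False)
died = Observation(time=18, event=True)
print(alive_at(censored, 12), alive_at(censored, 18), alive_at(censored, 24))
# True True None
print(alive_at(died, 24))
# False
```

The censored patient is a definite "alive" up to 18 months and an honest "don't know" afterwards; nothing is thrown away and nothing is invented.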

Build a survival curve

Let's build the curve by hand. Below is a timeline and a starting group of patients, all alive (the curve begins at 100%). Add events (deaths) and censorings along the timeline, and watch how each one behaves.

[Figure: static example of a Kaplan-Meier curve, survival (0% to 100%) against time (0 to 24 months), starting with 10 patients at risk and survival at 100%. Events (labelled n=9 and n=7) step the curve down; censoring marks (|) leave it flat.]

Try this: add a couple of events and watch the curve step down. Then add a censoring — and notice it does something completely different.

See the difference? An event drops the curve — that's someone having the outcome. A censoring mark doesn't move the curve at all — the patient simply leaves the at-risk pool, carrying the truth "survived at least this long." That's how Kaplan-Meier uses incomplete data instead of throwing it away.
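The same build-up can be written as a few lines of Python. This is a minimal sketch of the Kaplan-Meier (product-limit) calculation on ten hypothetical patients followed for up to 24 months; `km_trace` is an illustrative name, not a library function. At each time point, deaths multiply survival by (1 - deaths/at risk), and everyone with that time, whether they died or were censored, then leaves the at-risk pool.

```python
def km_trace(patients):
    """Build a Kaplan-Meier curve step by step, printing each stage.

    patients: list of (time, died) pairs; died=False means censored.
    Returns [(time, survival)] after each distinct time point.
    """
    at_risk = len(patients)
    survival = 1.0
    history = []
    print(f"start  at risk: {at_risk}  survival: {survival:.1%}")
    for t in sorted({time for time, _ in patients}):
        group = [died for time, died in patients if time == t]
        deaths = sum(group)                  # True counts as 1
        censored = len(group) - deaths
        if deaths:
            # an event steps the curve down by the fraction deaths/at_risk
            survival *= 1 - deaths / at_risk
        # a censoring never moves the curve; it only shrinks the at-risk pool
        print(f"t={t:>2}  at risk: {at_risk}  deaths: {deaths}  "
              f"censored: {censored}  survival: {survival:.1%}")
        at_risk -= len(group)
        history.append((t, survival))
    return history


# Hypothetical trial: (month, died?) for 10 patients; four still alive at 24 months
patients = [(4, True), (6, False), (8, True), (12, False), (16, True),
            (20, False), (24, False), (24, False), (24, False), (24, False)]
curve = km_trace(patients)
```

Running it shows survival falling to 90% at month 4, staying at 90% through the censoring at month 6, then stepping to about 78.8% and 65.6% at the later deaths; the four patients still alive at 24 months are censored and leave the curve flat.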

How to read a Kaplan-Meier curve

Now you can read any Kaplan-Meier curve on sight. It's a staircase that steps down and never goes up:

The vertical axis is the probability of surviving (starting at 100%).
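Each step down is an event. At that moment the curve is multiplied by (1 - d/n), where d is the number of events and n the number of patients still at risk, so the same single death cuts a larger fraction of the remaining survival when fewer patients are at risk. (Early on, with many at risk, each death is a small step; late on, with few left, each death is a big drop, which is why the tail of a curve is jumpy and uncertain.)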
Each tick mark on the curve is a censoring — a patient leaving the at-risk pool without an event. No step, just a mark.
The curve flattens between events; it never rises.
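A two-line calculation makes the point about step sizes concrete. The sketch below applies one Kaplan-Meier step for a single death in two situations: early on with 100 patients at risk, and late on with survival already at 40% and only 4 patients left. The numbers and the helper `km_step` are hypothetical.

```python
def km_step(survival, at_risk, deaths=1):
    """One Kaplan-Meier event step: survival is multiplied by (1 - deaths/at_risk)."""
    return survival * (1 - deaths / at_risk)


# Hypothetical numbers: one death early (100 at risk) vs late (4 at risk)
early = km_step(1.00, at_risk=100)
late = km_step(0.40, at_risk=4)
print(f"early: 100.0% -> {early:.1%}   late: 40.0% -> {late:.1%}")
```

The same single death costs 1 percentage point early and 10 points late, which is why a sparse tail can swing so much.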

The shape tells the story: a curve that stays high is good survival; one that plunges early is poor. Comparing two curves, treatment versus control, is how survival benefit is shown: other things being equal, the arm whose curve sits higher is doing better, though the traps below are worth checking before you trust that.

Median survival

How do you summarise a whole curve in one number? Not with a mean — survival times are badly skewed (a few patients live far longer than the rest), and you often can't even compute a mean because some patients are still alive. Remember from the distributions lesson: when data is skewed, the median is the honest summary.

So survival uses the median survival time: the earliest time at which the curve falls to 50% or below, which is the time by which half the patients are estimated to have had the event. Draw a line across from 50% to the curve, then down to the time axis, and read off the time. It's robust to the skew, and it works even when some patients are still alive, as long as the curve does reach 50%.

One important phrase you'll meet: "median not reached." If the curve never drops to 50% within the study, the median survival can't be calculated — usually because more than half the patients were still alive at the end. That's often good news, but it's also a signal the follow-up wasn't long enough to see the full picture.
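Reading the median off the curve can be sketched in code as well. One common convention, assumed here, takes the median as the earliest event time at which the Kaplan-Meier estimate falls to 50% or below, and reports "not reached" when it never does; statistical packages differ in small details, such as how they treat a curve that sits exactly at 50%. The helper names and both patient lists are hypothetical.

```python
from typing import Optional


def km_curve(patients):
    """Kaplan-Meier steps as (time, survival) pairs, one per event time."""
    at_risk, survival, steps = len(patients), 1.0, []
    for t in sorted({time for time, _ in patients}):
        group = [died for time, died in patients if time == t]
        deaths = sum(group)
        if deaths:
            survival *= 1 - deaths / at_risk
            steps.append((t, survival))
        at_risk -= len(group)   # events and censorings leave the at-risk pool
    return steps


def median_survival(patients) -> Optional[float]:
    """Earliest time the curve falls to 50% or below; None means 'not reached'."""
    for t, s in km_curve(patients):
        if s <= 0.5:
            return t
    return None


# Hypothetical arms: (months, died?)
arm_a = [(3, True), (5, True), (6, False), (8, True), (10, True),
         (12, False), (15, True), (20, False)]
arm_b = [(4, True), (9, False), (12, True), (18, False), (24, False), (24, False)]

for name, arm in [("A", arm_a), ("B", arm_b)]:
    m = median_survival(arm)
    print(name, "median:", "not reached" if m is None else f"{m} months")
```

Arm A's curve drops from 60% to 45% at month 10, so its median is 10 months. Arm B's curve bottoms out at 62.5% before follow-up ends, so its median is not reached.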

Read the curves

Read each survival curve like an assessor. What does it actually tell you?

[Figure: two survival curves from a trial that ran for 24 months.] What does this pattern tell you?

The traps in survival curves

Three things to check before you trust a survival claim:

Short follow-up. A curve cut off early tells you nothing about long-term survival. The exciting tail of a curve, where few patients remain at risk, is the least reliable part — wide uncertainty, built on a handful of people.
When the curves separate. Two drugs can reach the same point by very different paths — early benefit, late benefit, or a benefit that only appears after a delay. The shape matters as much as the endpoint.
Curves that touch or cross. If treatment and control curves overlap or cross, any "advantage" is fragile — and crossing curves can violate the assumption behind the single summary number we meet next.

Always look at the whole curve and the numbers-at-risk beneath it — not just the headline survival figure or the median. The shape, the censoring, and how many patients remain are where the truth lives.
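The numbers-at-risk row printed beneath many published Kaplan-Meier plots is simple to reproduce. This sketch assumes the usual risk-set definition (a patient counts as at risk at time t if their event or censoring time is t or later); the data and the `numbers_at_risk` helper are hypothetical.

```python
def numbers_at_risk(times, checkpoints):
    """Count patients still at risk at each checkpoint.

    times: each patient's event or censoring time (months).
    A patient is at risk at time c if their time is >= c.
    """
    return [sum(1 for t in times if t >= c) for c in checkpoints]


# Hypothetical follow-up times for 10 patients (events and censorings mixed)
times = [2, 3, 4, 5, 6, 9, 11, 14, 18, 24]
checkpoints = [0, 6, 12, 18, 24]
for c, n in zip(checkpoints, numbers_at_risk(times, checkpoints)):
    print(f"month {c:>2}: {n} at risk")
```

Here only 3 of the original 10 remain at 12 months and just 1 at 24, so any movement in the right-hand part of this curve rests on a handful of people.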

Why this matters for HTA

Survival analysis is the backbone of oncology appraisal — and increasingly everywhere outcomes are about time, not just occurrence.

A median survival gain ("3 months longer") is read against the whole curve: does the benefit last, when does it appear, and how many patients remain at risk to support the tail?
Short trials force extrapolation. When follow-up ends before most events occur, manufacturers must model survival beyond the data — and that extrapolation, which you'll meet in the economic modelling module, is one of the most contested parts of any submission.
"Median not reached" is common in modern cancer drugs — encouraging, but it means the durable benefit is still uncertain and the economic case rests heavily on assumptions about the future.
Beware the tail. Dramatic separation late in a curve, on few remaining patients, is where hope and statistical noise are hardest to tell apart.

A survival curve is a story told over time, not a single number. The median is the headline; the shape, the censoring, and the numbers-at-risk are the plot.

Survival analysis, in one breath

When the "when" matters, a simple survival percentage fails: patients are observed for different lengths of time, and many haven't had the event yet.
Censoring = "survived at least this long": partial information, not missing data. Survival analysis uses it instead of discarding it.
A Kaplan-Meier curve steps down at each event and only marks (doesn't drop at) each censoring; the vertical axis is probability of surviving.
Median survival = where the curve crosses 50%; robust to skew, and may be "not reached" if over half are still alive.
Read the whole curve — when it separates, how the tail behaves, how many remain at risk — not just the headline number.

Survival isn't "how many are alive" — it's "how long they survive," and the curve, with its steps and censoring marks, is how we tell that honestly.

A Kaplan-Meier curve shows you a survival difference beautifully — but it's a whole picture, not a number you can put in a table or a cost-effectiveness model. So how do you compress "this curve versus that curve" into a single measure of benefit? That's the hazard ratio — the standard summary of survival difference, and, like the odds ratio before it, a number that hides important things behind its tidy single value. That's next.