M13 · SPECIAL TOPICS

The same device, two different results.

Take a drug — the same molecule, the same dose — and give it to two similar patients in two different hospitals. You expect, statistically, the same effect. That expectation is the bedrock of everything you've learned about evaluating medicines: the drug is the drug, wherever it goes.

Now take a medical device — say, an instrument for a minimally invasive procedure — and put it in the hands of two different surgeons. One is a world expert who's done the procedure a thousand times; the other is competent but new to it. Same device, same patient type. Will you get the same result? Obviously not. The expert's patients do markedly better. The device didn't change — the hands did. And that gap between the two surgeons isn't noise to be averaged away; it's a real, systematic part of what the device "does."

Sit with how strange that is compared to a drug. A molecule doesn't perform better for a more experienced prescriber. But a device's results are inseparable from who's using it, how many times they've used it before, and which version they're holding. This lesson is about what that does to health technology assessment — because it breaks, at the foundations, an assumption the entire drug-evaluation machine quietly depends on.

A drug is a molecule; a device is a system.

Here's the assumption, stated plainly, and it's so obvious for drugs that nobody names it: the technology being assessed is a stable, isolated causal agent. A drug is a molecule. It's identical in every unit, it acts by itself, it doesn't change over the years of a trial, and it's cleanly separable from the person who administers it. Precisely because the molecule is an isolated, stable factor, a randomised trial can isolate its effect — that's what the trial is for.

A device satisfies none of this. The effect of a device is not a property of the device — it's a property of a whole system: device × operator × version × time. Change the operator (expertise), change the point in time (learning), change the version (iteration), change the context (the hospital, the volume, the team), and you change the effect. There is no clean "device effect" sitting underneath, waiting to be measured, the way there's a clean "molecule effect." The causal agent isn't isolated and isn't stable — it's entangled with its user and its moment.

That single fracture — a device is a system, not an isolated factor — is the source of everything difficult about assessing devices. The rest of this lesson is really just four consequences of it, each one breaking a piece of the standard machine. Get the fracture, and the four breaks are obvious.

Break 1: operator-dependence.

The first consequence: a device's effect is operator-dependent. Effect = device × skill, and the skill term is large, real, and irreducible.

For a drug, the person administering it is causally irrelevant to the effect — the molecule works the same whoever hands it over. For a device, the operator is part of the intervention. A surgical robot in expert hands and the same robot in inexperienced hands are, from the outcome's point of view, two different interventions. This wrecks something you learned to prize in Module 2: internal validity — the trial's ability to attribute the outcome to the intervention itself. In a drug trial you can attribute the effect to the molecule because nothing else systematically varies with the treatment. In a device trial, skill varies, and you often can't separate "the device worked" from "these particular hands worked."

It also breaks blinding. You can blind a patient to whether they got a drug or a placebo; you cannot blind a surgeon to whether they're operating, or blind them to which device they're holding. The operator knows, and their knowledge, skill, and enthusiasm bleed into the result. The clean isolation a drug trial achieves is, for a device, structurally impossible — not because anyone was careless, but because the operator can't be factored out of a device the way a pharmacist can be factored out of a pill.

Break 2: the learning curve.

The second consequence follows from the first: if skill matters, and skill grows, then the device's effect changes over time. This is the learning curve, and it's poison for a stable estimate.

Picture the outcome plotted against the number of times a team has performed the procedure. Early on, results are modest — they're still learning the device, the workflow, the pitfalls. With repetition, outcomes climb, then flatten as mastery sets in. The device never changed; the results moved anyway, purely from accumulating experience. Now ask the fatal question: when do you run the trial? Run it early, before the curve rises, and you'll measure a disappointing effect that undersells a device that would shine once teams learn it. Run it late, in centres that have already climbed the curve, and you'll measure a spectacular effect that no ordinary hospital will match in its first year.

There is no "right" time that gives the true number, because there is no single true number: the effect is a curve, not a point. This is unlike a drug in the deepest way: a molecule's effect doesn't improve because the doctor has prescribed it two hundred times. The device's does. And that means every device trial is really a snapshot of one spot on a moving curve, in one set of hands, at one moment, and whether that snapshot is flattering or damning depends entirely on where and when it was taken.

Why the device trial can't tell you the truth.

Below is a team's learning curve — patient outcomes improving as they perform more procedures. You decide when the trial was run (early or late on the curve) and where (an expert centre or an average one). Then compare the trial's measured effect against what your own hospital would actually get in its first year. Watch them refuse to match. (An illustrative model of the mechanism, not real trial data.)

Example reading: a trial run late on the curve (after 150 procedures) in an expert centre measures 95%, while your hospital gets 62% in year one. The gap is +33 points: the trial overstates what you'll get.

There's the trap in one picture. Slide to where real device trials actually happen — late on the curve, in expert centres — and the measured effect soars far above what your average hospital will get in year one. The trial isn't lying; it's faithfully reporting an effect that belongs to other hands at a later time. Slide the other way and you undersell a device that just hadn't been learned yet. The number you want — "what will this device do for us?" — isn't anywhere on the screen, because it isn't a number. It's a curve, and the trial only ever photographs one point on it.
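For readers who like to see the mechanism written down, here is a minimal sketch of a learning curve and the gap it opens between a trial and a first-year hospital. Everything in it is an assumption made for illustration: the exponential shape, the function names (`expected_outcome`, `year_one_average`), and the parameter values for the two centres. It is not the model behind this lesson's interactive and will not reproduce its exact figures.

```python
import math

def expected_outcome(procedures_done, floor, ceiling, learning_rate):
    """Success rate (0 to 1) once a team has performed `procedures_done` cases.

    Assumed shape: an exponential approach from `floor` (first case)
    towards `ceiling` (full mastery). Real learning curves vary.
    """
    gained = 1.0 - math.exp(-procedures_done / learning_rate)
    return floor + (ceiling - floor) * gained

# Hypothetical parameters, chosen only to illustrate the mechanism.
EXPERT = {"floor": 0.60, "ceiling": 0.96, "learning_rate": 40.0}
AVERAGE = {"floor": 0.50, "ceiling": 0.85, "learning_rate": 60.0}

def year_one_average(centre, cases_in_year_one):
    """Mean outcome across a centre's first `cases_in_year_one` procedures."""
    outcomes = [expected_outcome(n, **centre) for n in range(cases_in_year_one)]
    return sum(outcomes) / len(outcomes)

# The trial: an expert centre, measured 150 procedures up the curve.
trial_effect = expected_outcome(150, **EXPERT)
# Your hospital: an average centre, averaged over its first 30 cases (assumed volume).
local_effect = year_one_average(AVERAGE, 30)
gap = trial_effect - local_effect

print(f"Trial measured: {trial_effect:.0%}")
print(f"Your year one:  {local_effect:.0%}")
print(f"Gap:            {gap * 100:+.0f} points")
```

Change the `150` to a small number, or swap `EXPERT` for `AVERAGE`, and the "trial" number moves while the device stays exactly the same. That is the whole point: the measured effect is a function of where on the curve you stand, not a constant.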

Now you.

For each statement, is it a property of a drug (a stable, isolated factor) or a consequence of being a device (a system of device × operator × version × time)?

1. The same molecule produces the same effect regardless of who prescribes it.

2. The same instrument gives better results in an expert's hands than a novice's.

3. Results improve over the first hundred procedures as the team learns.

4. The active ingredient is identical in every unit and doesn't change over time.

5. By the time the trial reports, the version tested is off the market.

6. Regulatory approval required only observational data and surrogate outcomes.

Break 3: the moving target.

The third consequence attacks from a different direction: the device won't hold still long enough to be studied. A drug molecule is fixed for decades — the aspirin in a 2005 trial is the aspirin on the shelf today. A device is not. It iterates: version 1, version 2, version 3, often a new generation every eighteen months or so, each meaningfully different — better optics, a redesigned tip, new software.

Now collide two timelines. A good randomised trial takes years — to enrol, treat, follow up, analyse, publish. The device generation cycle is often shorter than that. So the sequence plays out cruelly: you start a rigorous trial on version 2; while it runs, version 3 launches and then version 4; by the time your trial reports its gold-standard result, the version it studied is discontinued — you can't even buy it anymore. You've produced a certain, rigorous answer to a question about a device that no longer exists, and clinicians are now using something your evidence never touched.

This is the moving target problem, and it's a genuine dilemma, not a solvable inconvenience. Demand full RCT evidence and you'll only ever have rigorous data on obsolete versions — you'll always be one or two generations behind reality. Accept evidence on the current version and you'll have to accept weaker evidence, because there hasn't been time to generate the strong kind. The very rigour that serves drugs so well works against devices: the slower and more careful the evidence, the more certainly it's out of date by the time it arrives.
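The collision of the two timelines is simple arithmetic, and a small sketch makes it concrete. The function names and the four-year trial duration are assumptions for illustration; the eighteen-month cycle is the rough figure quoted above, and the sketch assumes a perfectly regular release cadence that starts when the trial does, which real devices do not follow.

```python
def generations_launched(trial_months, generation_months):
    """Whole new device generations released while the trial is running.

    Assumes a fixed release cadence starting at trial start.
    """
    return trial_months // generation_months

def version_on_market_at_report(version_studied, trial_months, generation_months):
    """Version clinicians are using by the time the trial reports."""
    return version_studied + generations_launched(trial_months, generation_months)

# Hypothetical trial: 48 months from first enrolment to publication,
# studying version 2 of a device that iterates every 18 months.
current = version_on_market_at_report(version_studied=2,
                                      trial_months=48,
                                      generation_months=18)
print(f"Trial studied version 2; version on the market at report: {current}")
# prints: Trial studied version 2; version on the market at report: 4
```

Lengthen the trial to make it more rigorous and the gap only grows. Shorten it and the evidence gets weaker. Neither direction escapes the dilemma.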

Break 4, and how HTA responds.

The fourth consequence is quieter but pervasive: devices arrive with structurally weaker evidence, by design of the system that regulates them. Historically, getting a device to market (a CE mark in Europe) demanded far less than a drug's marketing authorisation: observational data, small case series, and surrogate endpoints were often enough, and the large randomised trials required of medicines were rarely asked for. So HTA of devices routinely starts from a thinner evidence base than HTA of drugs, not through anyone's negligence, but because the regulatory on-ramp never required more. (Regulation has been tightening, notably under the EU Medical Device Regulation 2017/745, but the gap and its legacy remain.)

Put all four breaks together — operator-dependence, learning curves, moving-target versioning, thin evidence — and you can see why HTA can't just run the drug playbook. So it adapts:

Registries over (or alongside) trials. Instead of trying to isolate the effect in an artificial trial, track the device in real use across many centres and operators — capturing the learning curve and the operator spread rather than hiding them. Registries and real-world data (Module 11) aren't a fallback here; they're often the right tool.
Assess in real hands, and expect a curve. Interpret trial results as expert-centre, post-learning-curve upper bounds, and explicitly ask what an average centre will get, how long the learning curve will last, and what it will cost while teams climb it.
Conditional, iterative decisions. Because the evidence matures later than the decision must be made, devices lean on the Module 11–12 machinery: conditional coverage, monitoring after adoption, revisiting as new versions and registry data arrive. The assessment is ongoing, not a one-off.
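To see what "capturing the learning curve rather than hiding it" might look like in practice, here is a rough sketch of one way registry records could be summarised by operator experience. The rows are made up for illustration and are not real registry data; the record layout, the experience bands, and the function names (`band_for`, `success_by_experience`) are all assumptions.

```python
from collections import defaultdict
from statistics import mean

# Made-up registry rows: (centre, operator's prior case count, success 1 or 0).
records = [
    ("A", 5, 0), ("A", 12, 1), ("A", 40, 1), ("A", 90, 1),
    ("B", 3, 0), ("B", 8, 0), ("B", 25, 1), ("B", 60, 1),
    ("C", 2, 1), ("C", 15, 0), ("C", 55, 1), ("C", 120, 1),
]

# Assumed experience bands: (lower bound inclusive, upper bound exclusive, label).
BANDS = [(0, 10, "cases 0-9"), (10, 50, "cases 10-49"), (50, None, "cases 50+")]

def band_for(prior_cases):
    """Return the label of the experience band a case count falls into."""
    for low, high, label in BANDS:
        if prior_cases >= low and (high is None or prior_cases < high):
            return label
    raise ValueError(f"invalid case count: {prior_cases}")

def success_by_experience(rows):
    """Success rate per experience band, pooled across centres."""
    grouped = defaultdict(list)
    for _centre, prior_cases, success in rows:
        grouped[band_for(prior_cases)].append(success)
    return {label: mean(grouped[label]) for _, _, label in BANDS if grouped[label]}

for label, rate in success_by_experience(records).items():
    print(f"{label}: {rate:.0%}")
```

A single pooled success rate would hide exactly the thing an assessor needs to see. Reporting by experience band (and, with more data, by centre and by device version) keeps the curve and the operator spread visible.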

The deep shift is this: a device is assessed less like a product and more like a procedure or service. What you're really evaluating isn't the object — it's what happens when this device is used, by these people, learning over time, in this system. The unit of assessment is the whole practice, not the thing in the box.

What are the two biggest cautions?

A new surgical device is supported by an excellent randomised trial: it was conducted in three world-leading centres, whose surgeons had each performed over 200 procedures with the device before the trial began, and it studied "version 2," which has since been replaced by version 3. An assessor is deciding how to use this evidence. What are the two biggest cautions?

Why this matters for HTA

Devices are where an assessor most needs to resist drug-shaped instincts, because the familiar tools quietly mislead:

Read every device trial as a context, not a constant. The first questions are whose hands, how far up the learning curve, and which version? An impressive effect from expert centres past the curve is an upper bound, not an expectation — and the gap to an average hospital in year one is a real, quantifiable part of the assessment, not a detail. Treating a device effect as a portable constant, the way a drug effect is, is the central mistake.
Match the evidence method to a moving target. Insisting on slow, rigorous RCTs guarantees you'll only ever have solid data on obsolete versions. Registries, real-world monitoring, and conditional, revisitable decisions aren't second-best compromises for devices — they're often the appropriate response to a technology that iterates faster than a trial can report.
Assess the practice, not just the product. The learning curve, the operator spread, the centre volume, the versioning — these aren't confounders to strip away; they are the intervention. The honest question is never "how good is this device?" but "what happens when this device is adopted, learned, and used, by real teams, over time, in our system?" Evaluate the service, not the object.

A drug is what it is; a device is what you do with it. The molecule's effect belongs to the molecule, but the device's effect belongs to the hands, the moment, and the version — which means assessing a device is never about the thing in the box, but about the whole moving practice that grows up around it.

HTA of medical devices, in one breath.

A drug is a molecule — a stable, isolated causal agent, identical everywhere, unchanging over time, separable from its user — which is exactly why a trial can isolate its effect. A device is a system: device × operator × version × time. Its effect isn't a property of the device but of the whole system, so the drug-assessment machine breaks at the foundations.
Four breaks follow. Operator-dependence: effect = device × skill, wrecking internal validity and blinding. Learning curve: results improve with experience, so the effect is a curve, not a point — early trials undersell, expert-centre trials oversell. Moving target: devices iterate (v1→v2→v3) faster than trials report, so rigorous evidence describes discontinued versions. Weak evidence by design: device regulation (CE mark) historically demanded far less than drug approval.
The consequence: a device trial in expert centres, past the learning curve, on last year's version, is an upper bound on a superseded device in hands you won't have — informative, but never a portable constant.
So HTA adapts: registries and real-world data over artificial isolation, results read as context-bound upper bounds, and conditional, revisitable decisions that mature with the evidence. A device is assessed less like a product and more like a procedure or service — the unit is the whole practice, not the object.

Ask "how good is this device?" and you've already asked the wrong question. The device doesn't have an effect the way a molecule does — the hands, the learning, and the version have the effect, and the device is only where they meet.

Devices showed us a technology whose value is smeared across operators, time, and versions — where a single clean number was never available. The next lesson pushes on a different limit of the standard toolkit: what happens when the thing you're assessing has value on several dimensions at once, and no single measure — not even the QALY — can capture all of it? When cost per QALY isn't enough, how do you decide across many criteria at once? That's multi-criteria decision analysis (MCDA).