By Jacco Thijssen, University of York
The Ebola crisis brings into sharp relief the importance of appropriate regulation for trials of new drugs. The “gold standard” in clinical research is the randomised trial, but the reliance on one particular measure of evidence arguably risks lives and may be holding back our efforts to defeat a disease which has sparked fear around the globe.
A typical Randomised Clinical Trial (RCT) aims to show whether a new treatment is better than an existing one, or better than no treatment at all when none has so far been developed. To do that, a statistical tool is used to discover whether a positive result might be a fluke or whether the results are “statistically significant”. It is the dominant technique, but one which says nothing about the size of the new treatment’s effect and which fails to take into account the opportunity costs of doing nothing: how many lives are lost while we conduct our trial? It is this disregard for the consequences of the actions following a trial that makes the one-size-fits-all approach to assessing the success of a treatment a potential killer.
A randomised trial to evaluate the effectiveness of a new health treatment divides patients into two groups: one receives the new treatment, while the other group gets the existing one. The health outcomes in both groups are measured and their averages computed, as well as the “p-value” of their difference – this is the number used to determine whether a result is statistically significant. If this p-value falls below a pre-specified threshold the new drug is declared to be more effective. This piece of evidence is then used to decide whether or not to allow the use of the new drug.
When a new treatment is tested, scientists typically reason as follows: if, assuming the new treatment is no better than the existing one, the chance of seeing data at least as favourable as ours is greater than 5% (or sometimes 1%), then we cannot exclude the possibility that the new treatment is no better than the old.
To look in a bit more detail, imagine that a new treatment is indeed no better than the existing one. That means that, across the population of all patients, you would expect it to give the same average health outcome (if the existing treatment is “no treatment exists” then this effect is zero). You then observe the average outcomes in your trial, which you would expect to be very close to each other.
Of course, a different group of patients would lead to different observed average effects, so you wouldn’t expect to see exactly zero difference in the average effect, even if the new treatment is no better than the existing one. So, if you actually observe an average effect of the new treatment that exceeds the average effect of the old treatment, then this could be due to a fluke: you were just unlucky in the groups of patients that you had in your trial.
Statisticians compute the probability that you observe such a fluke (or worse) and this is the p-value. If this probability is very low, then either you have observed something that is very unlikely, or the true difference in averages is not zero. A typical threshold for the p-value is 5%. So, if after your trial the p-value is lower than 5% you conclude that “the new treatment is statistically significantly more effective than the existing treatment”.
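The logic of “what is the probability of a fluke this large?” can be sketched directly with a permutation test: shuffle the group labels many times and count how often the shuffled difference in averages is at least as large as the one actually observed. This is a minimal illustration with made-up recovery scores, not data from any real trial.

```python
import random

def permutation_p_value(new, old, n_perm=10_000, seed=0):
    """One-sided permutation test: the p-value is the fraction of
    label shufflings whose difference in group means is at least as
    large as the observed difference."""
    rng = random.Random(seed)
    observed = sum(new) / len(new) - sum(old) / len(old)
    pooled = list(new) + list(old)
    count = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        perm_new = pooled[:len(new)]
        perm_old = pooled[len(new):]
        diff = sum(perm_new) / len(perm_new) - sum(perm_old) / len(perm_old)
        if diff >= observed:
            count += 1
    return count / n_perm

# Hypothetical recovery scores for the two trial arms
new_arm = [7, 9, 8, 10, 9, 8, 11, 9]
old_arm = [6, 7, 8, 7, 6, 9, 7, 8]
p = permutation_p_value(new_arm, old_arm)
print(f"p-value = {p:.3f}")
```

If the computed p-value falls below the chosen threshold (say 5%), the standard procedure declares the new treatment significantly more effective – regardless of how large the difference actually is, which is precisely the limitation discussed above.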
Flaws and Effects
It is important to remember that a low p-value, no matter how low, never guarantees that this particular treatment is more effective than the existing one. All you know is that a procedure with this threshold would make such a mistake in no more than 5% of the possible trials that you could have conducted. How useful is this standard?
Consider the following two cases: (1) a disease (such as Ebola) where non-treatment almost certainly leads to the deaths of many people even though treatment with uncertain benefits is quite cheap, and (2) a rare disease for which there is currently a good, cheap treatment, but for which a new, marginally better and much more expensive treatment is proposed.
Surely, you would want to be much more certain about the effectiveness of the new treatment in the second case. In the first case, if you treat patients with a drug that may not be very effective, but is not very expensive to produce, then you might very well be willing to take a risk. A fetish for p-values as the ultimate arbiter between “proven” and “unproven” effects does not allow this. In short, costs and benefits should be taken into account.
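The cost-benefit point can be made concrete with a toy expected-loss calculation. Suppose we hold a belief (say 30%) that the drug works, and compare the expected number of deaths if we approve it now with the expected number if we withhold it. All numbers here are illustrative assumptions, not estimates for any real disease.

```python
def expected_deaths(approve, p_effective, base_rate, treated_rate, n):
    """Expected deaths among n patients. If we approve, everyone is
    treated: with probability p_effective the drug works and the death
    rate falls to treated_rate; otherwise it stays at base_rate. If we
    withhold, everyone faces base_rate."""
    if approve:
        return n * (p_effective * treated_rate
                    + (1 - p_effective) * base_rate)
    return n * base_rate

# Ebola-like scenario (hypothetical numbers): a 60% death rate
# untreated, 30% if the drug works, and only a 30% belief that it does.
withhold = expected_deaths(False, 0.3, 0.6, 0.3, 1000)
approve = expected_deaths(True, 0.3, 0.6, 0.3, 1000)
print(f"withhold: {withhold:.0f} expected deaths, approve: {approve:.0f}")
```

Even with far-from-conclusive evidence, approving the cheap drug is the lower-risk option here, whereas a fixed 5% significance threshold is blind to this asymmetry in consequences.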
A second, important point is that we do not have to wait until a full trial is finished. Consider the case of Ebola. If a new drug has low production costs, high potential benefits and a high opportunity cost of delay, then you want the drug to be administered as soon as possible. This means that you do not wait until a long trial is finished before you evaluate the evidence. Rather, you evaluate the evidence as it comes in and you take a decision as soon as it is optimal to do so. This approach is always at least as good as the standard approach; in many ways it is simply an enhanced version.
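One classical way to evaluate evidence as it comes in is Wald’s sequential probability ratio test, which updates a running log-likelihood ratio after each patient and stops the moment it crosses a decision boundary. This is a sketch under assumed recovery rates, not the specific method the author has in mind.

```python
import math

def sprt(outcomes, p0=0.4, p1=0.6, alpha=0.05, beta=0.05):
    """Wald's sequential probability ratio test on Bernoulli outcomes
    (1 = patient recovered). p0 and p1 are the assumed recovery rates
    under the old and new hypotheses; alpha and beta are the tolerated
    error rates. The trial stops as soon as the evidence crosses either
    boundary, instead of waiting for a fixed sample size."""
    upper = math.log((1 - beta) / alpha)   # cross: accept new treatment
    lower = math.log(beta / (1 - alpha))   # cross: reject new treatment
    llr = 0.0
    for i, recovered in enumerate(outcomes, start=1):
        if recovered:
            llr += math.log(p1 / p0)
        else:
            llr += math.log((1 - p1) / (1 - p0))
        if llr >= upper:
            return "accept new treatment", i
        if llr <= lower:
            return "reject new treatment", i
    return "continue trial", len(outcomes)

# Hypothetical data: a strong early run of recoveries lets the
# trial stop well before all 20 patients have been observed.
decision, n_used = sprt([1] * 18 + [0] * 2)
print(decision, "after", n_used, "patients")
```

With a clearly effective drug, the test typically reaches a decision after a handful of patients, so an effective treatment can reach the remaining patients much sooner than under a fixed-length trial.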
The mathematical and statistical theory for the approach advocated here exists and has recently been applied to clinical trials. Unfortunately, by its very nature, this kind of approach is more demanding than the rather straightforward p-value. The fate of Ebola patients should, however, not stop us from exerting a little intellectual effort to alleviate their suffering. All it requires is recognition by regulators and the medical profession that the current way of evaluating RCTs may not be the best.
Jacco Thijssen does not work for, consult to, own shares in or receive funding from any company or organisation that would benefit from this article, and has no relevant affiliations.