How to choose the baseline probability that a clinical hypothesis is true
This is a supplementary post to the EBM 2.0 project – you may wish to start there, if you haven’t already.
The EBM 2.0 project explains the longstanding, incorrect belief that the p value represents the probability that the null hypothesis is true, and therefore that 1 minus the p value gives the probability that the evaluated hypothesis (the main alternate hypothesis) is true. This erroneous belief has led to gross overestimation of the probability that study findings are true.
Instead, p values are best understood by clinicians as likelihood ratios that can convert a pre-test probability (Pre-TP) that a hypothesis is true into a post-test probability (Post-TP), where the trial conducted is considered “the test”. How to perform this conversion calculation is explained in this article.
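As a rough illustration of the conversion step itself, the sketch below applies a likelihood ratio to a Pre-TP using the standard odds form of Bayes' rule. The function name `post_test_probability` and the example likelihood ratio of 10 are assumptions made for illustration only; how a likelihood ratio is derived from a trial's p value is covered in the linked article and is not reproduced here.

```python
def post_test_probability(pre_tp: float, likelihood_ratio: float) -> float:
    """Convert a pre-test probability into a post-test probability.

    Uses the odds form of Bayes' rule: post-test odds = pre-test odds x LR.
    The likelihood ratio is taken as an input; this sketch does not derive
    it from a p value.
    """
    if not 0.0 < pre_tp < 1.0:
        raise ValueError("pre_tp must be strictly between 0 and 1")
    if likelihood_ratio <= 0.0:
        raise ValueError("likelihood_ratio must be positive")
    pre_odds = pre_tp / (1.0 - pre_tp)        # probability -> odds
    post_odds = pre_odds * likelihood_ratio   # apply the likelihood ratio
    return post_odds / (1.0 + post_odds)      # odds -> probability


# Hypothetical illustration: a 5% Pre-TP and an assumed likelihood ratio of 10.
example_post_tp = post_test_probability(0.05, 10.0)
print(f"Post-TP: {example_post_tp:.1%}")
```

With these illustrative inputs the Post-TP comes out at roughly 34.5%, which shows how strongly the result depends on the starting Pre-TP.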
However, the question then remains: how does one choose this initial baseline Pre-TP (aka prior probability or pre-trial probability) that the hypothesis being evaluated is true?
In the world of EBM 2.0, the choice of a specific Pre-TP is likely to be one of the most controversial aspects of trial interpretation, as there is no exact science to guide us. We will need to combine evaluations of biological plausibility with prior evidence to come up with an estimate. While we should attempt to keep the process as objective as possible, there will unavoidably be a substantial element of subjectivity.
Probably the most fundamental question is what our initial baseline Pre-TP should be for a clinical hypothesis being evaluated. This could then be adjusted on a case-by-case basis and subsequently modified with each piece of evolving evidence.
I contend that the most appropriate “starting” Pre-TP for an intervention/therapy being tested for efficacy in human trials for the first time is probably in the range of 5-10%. For the purpose of our statistical interpretations and evaluations, it may be prudent to choose a conservative estimate of 5%.
The reasons for this estimate are:
- The law of diminishing returns
- Most of the great innovations in medicine and science that were the “low hanging fruit” in terms of making large improvements in patient benefits (e.g. vaccinations, antibiotics, public health measures) occurred many decades ago. In keeping with the law of diminishing returns, with just a few exceptions, it is now increasingly difficult to develop new therapies that make a substantial difference to mortality or other key health outcomes. Benefits in absolute terms tend to be very small, and there is often great controversy regarding whether the benefits are present at all (e.g. statins for primary prevention of ischaemic heart disease, thrombolysis for stroke).
- Most findings are false
- We now know that most of our historical published findings are likely false (see EBM 2.0). This applies to a whole body of literature that includes a range of trials, many or most of which were subsequent evaluations of an intervention/therapy. The picture is likely to be more extreme (i.e. findings are very likely to be false) when considering first evaluations of an intervention/therapy in clinical trials.
- Counteract bias
- Even embracing an EBM 2.0 approach, there will be many intentional and unintentional actions, decisions and eventualities in the conduct of trials that create substantial amounts of bias. The conglomeration of factors that cause this bias is extensive, and their combined force is exceedingly strong, in fact somewhat overpowering in the EBM world. This bias will not only generate misleadingly low p values but will also affect the interpretation and conclusions based on these values. To counterbalance this predictable and unavoidable effect of bias, which will result in overly optimistic findings and interpretations, it would be prudent for the EBM community to be conservative in its estimate of the baseline Pre-TP.
- Enforce the practice of seeking replications
- Replications are a bedrock requirement of scientific discovery, and yet it has become commonplace for widespread recommendations for practice change (including in guidelines) to be based on a single clinical study. Using Bayesian analysis with a conservative baseline Pre-TP will almost always require the search for replication of findings, as even with very low p values the Post-TP will rarely be high enough to warrant practice change.
- Optimally allocate resources
- In a world of limited resources and unlimited competing priorities, in order to optimally allocate resources to create the greatest societal and health benefits, it would be advisable to take a conservative approach to Pre-TP estimation to maximise the chance of resource allocation towards the most high value therapies and minimise the risk of resource allocation to low value (or no value) care. Starting with a conservative Pre-TP will greatly assist in this goal.
The suggested 5% baseline Pre-TP should then be adjusted based on available data. For example, a large body of observational data supporting the hypothesis might provide a good reason to elevate the Pre-TP. However, given how routinely observational findings fail to pan out in randomised controlled trials (RCTs), I’d again be cautiously conservative and only lift the baseline Pre-TP by small amounts towards the optimistic end of the estimated range, i.e. 10%.
The only data that should make substantial changes to our probability that our hypothesis is true are high quality data such as blinded randomised controlled trials. If subsequent studies replicate the same findings, we can use the p values to calculate Post-TPs and progressively increase our level of certainty in the findings. The Post-TP of the first trial becomes the Pre-TP before the second trial, and so on.
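The chaining described above, where each trial's Post-TP becomes the next trial's Pre-TP, can be sketched as follows. This is a minimal illustration only: the function names and the likelihood ratios of 10 for each trial are hypothetical, and the sketch assumes each trial has already been summarised as a likelihood ratio by whatever method is chosen.

```python
from typing import Iterable, List


def update(pre_tp: float, likelihood_ratio: float) -> float:
    """One Bayesian update in odds form: post odds = pre odds x LR."""
    pre_odds = pre_tp / (1.0 - pre_tp)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1.0 + post_odds)


def sequential_post_tps(baseline_pre_tp: float,
                        likelihood_ratios: Iterable[float]) -> List[float]:
    """Return the Post-TP after each trial, in order.

    The Post-TP of one trial is carried forward as the Pre-TP of the next.
    """
    probabilities = []
    current = baseline_pre_tp
    for lr in likelihood_ratios:
        current = update(current, lr)
        probabilities.append(current)
    return probabilities


# Hypothetical illustration: 5% baseline and three trials, each assumed
# to carry a likelihood ratio of 10 in favour of the hypothesis.
trial_lrs = [10.0, 10.0, 10.0]
for number, post_tp in enumerate(sequential_post_tps(0.05, trial_lrs), start=1):
    print(f"After trial {number}: Post-TP = {post_tp:.1%}")
```

Under these assumed inputs, a single trial leaves the hypothesis more likely false than true, and it takes replication to move the Post-TP into a range where practice change could reasonably be discussed.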
Depending on the real-world costs and benefits of changing practice, we will need to choose a level of certainty that we deem acceptable before we change practice. Typically new therapies provide modest benefits and are extremely costly and thus require the highest levels of certainty before implementation, while old, cheap therapies with good safety profiles might be utilised based on a substantially lower level of certainty.
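To make the idea of a required level of certainty more concrete, the sketch below counts how many consistent trials would be needed before the Post-TP reaches a chosen threshold. Everything numeric here is a hypothetical assumption for illustration (the likelihood ratio of 10 per trial and the 80% and 95% thresholds); these are not the thresholds proposed by the EBM 2.0 project.

```python
def replications_needed(pre_tp: float, likelihood_ratio: float,
                        threshold: float, max_trials: int = 50) -> int:
    """Count the consistent trials needed for the Post-TP to reach threshold.

    Assumes every trial contributes the same likelihood ratio, which is a
    simplification. Returns 0 if the Pre-TP already meets the threshold.
    """
    odds = pre_tp / (1.0 - pre_tp)
    target_odds = threshold / (1.0 - threshold)
    for trials in range(max_trials + 1):
        if odds >= target_odds:
            return trials
        odds *= likelihood_ratio  # carry the Post-TP forward as the next Pre-TP
    raise ValueError("threshold not reached within max_trials")


# Hypothetical thresholds: a lower bar for a cheap, safe, established
# therapy and a higher bar for a costly new one.
for label, bar in [("lower bar (80%)", 0.80), ("higher bar (95%)", 0.95)]:
    count = replications_needed(0.05, 10.0, bar)
    print(f"{label}: {count} trial(s) needed from a 5% Pre-TP")
```

The design choice worth noting is that the threshold is an input rather than a constant, reflecting the point that the acceptable level of certainty depends on the costs and benefits of the specific practice change.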
Proposed Certainty for Change Model
For more information about this, see the EBM 2.0 project
Dr Anand Senthi