Beyond the hype: AI and machine learning in clinical trials and healthcare

There is considerable hype surrounding machine learning (ML) and artificial intelligence (AI), yet these technologies are real and powerful, and their value is starting to be realised in healthcare. In this article we briefly discuss ML and AI alongside some key healthcare examples, including how ML has added value in clinical trials, with hands-on examples from experts in PHASTAR’s newly established data science team.

Although the terms AI and ML are frequently used interchangeably, they are not the same thing. AI is a broad concept that describes how a machine can simulate natural human intelligence to solve a complex problem. AI is, of course, a moving target, defined by those capabilities that a human possesses but a machine does not. ML is one of the ways humans hope to achieve AI: a machine learns on its own, without being explicitly programmed and without our constant supervision.

In addition to the availability of affordable large-scale compute environments, the growth of ML is being driven by the digitisation and accessibility of more diverse data. Globally, the volume of digital data is estimated to grow from 33 zettabytes (ZB) in 2018 to 175 ZB by 2025 [1] (1 ZB is equivalent to 1 trillion GB). In healthcare, for example, there are global initiatives to digitise health records (e.g. [2] in the UK), alongside increasingly complex, higher-resolution medical devices such as MRI machines [3] and other rich data sources such as consumer-grade activity trackers. Using the right elements of these data with the right ML methods at the right time has the potential to revolutionise personalised healthcare.
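To put the quoted data volumes in perspective, the short Python sketch below checks the unit conversion and works out the compound annual growth rate implied by going from 33 ZB to 175 ZB over seven years. It assumes decimal (SI) units and smooth year-on-year growth, which is a simplification rather than something the cited estimate states.

```python
# Figures quoted in the text: 33 ZB in 2018, 175 ZB by 2025.
ZB_2018 = 33.0
ZB_2025 = 175.0
YEARS = 2025 - 2018

# Assumption: decimal (SI) units, so 1 ZB = 10^21 bytes and 1 GB = 10^9 bytes.
GB_PER_ZB = 10**21 // 10**9


def implied_annual_growth(start: float, end: float, years: int) -> float:
    """Compound annual growth rate that takes `start` to `end` over `years`."""
    return (end / start) ** (1.0 / years) - 1.0


growth = implied_annual_growth(ZB_2018, ZB_2025, YEARS)
print(f"1 ZB = {GB_PER_ZB:,} GB")
print(f"Implied compound annual growth: {growth:.1%}")
```

Under these assumptions the implied growth is roughly 27% per year, meaning the global data volume would double about every three years.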

There has been demonstrable success of ML in healthcare [4-10] and in the development of new drugs [11-12]. More recently, ML has been successfully applied to clinical trials, using decades of structured clinical trial data alongside real-world data (RWD) and other valuable data sources to support clinical trial design, execution and analysis, with the aim of reducing the time and cost of clinical trials [13-16]. Below, PHASTAR’s data science experts describe some examples of how they have successfully applied ML to clinical trials to add business value.

“Event prediction to guide clinical study design”

Following the completion of an early phase clinical trial, the Sponsor was planning a larger phase IIb study with the same compound and wanted to understand whether any biomarkers in the clinical data from the first study were predictive of a specific event. After essential discussions with clinicians and scientists at the Sponsor company to fully understand the question and the insights they required, the first component of the work was feature selection: choosing the variables in the clinical data, for example demography and laboratory data, that could potentially be useful as predictors of the event. Once the relevant data had been extracted and processed, statistical and visualisation approaches were applied and documented to explore the data, followed by the application and evaluation of different machine learning approaches. The team ensured that the output was not just a ‘black-box’ prediction but provided insights into any potential biomarkers. The results were used as a source of evidence for the clinical team to support their decision making during the design of the next study.
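As an illustration of what a first, interpretable feature-screening step can look like, the sketch below ranks candidate variables by how well each one alone separates patients with and without the event, using the area under the ROC curve (AUC). This is one simple, generic approach and not necessarily the method used in the project described above. The column names (`lab_marker`, `age`, `event`) are hypothetical and the data are synthetic, generated purely for the example.

```python
import random


def auc(scores, events):
    """Area under the ROC curve via the rank-sum (Mann-Whitney) identity.

    Equals the probability that a randomly chosen patient with the event has
    a higher score than a randomly chosen patient without it; ties count 0.5.
    """
    pos = [s for s, e in zip(scores, events) if e == 1]
    neg = [s for s, e in zip(scores, events) if e == 0]
    if not pos or not neg:
        raise ValueError("need patients both with and without the event")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def rank_candidates(table, event_column):
    """Rank candidate variables by distance of their AUC from 0.5 (chance)."""
    events = table[event_column]
    results = []
    for name, values in table.items():
        if name == event_column:
            continue
        a = auc(values, events)
        results.append((name, a, abs(a - 0.5)))
    return sorted(results, key=lambda r: r[2], reverse=True)


# Synthetic toy data only: NOT taken from any real study.
rng = random.Random(0)
n_patients = 200
event = [1 if rng.random() < 0.3 else 0 for _ in range(n_patients)]
table = {
    "event": event,
    # Informative by construction: shifted upwards in patients with the event.
    "lab_marker": [rng.gauss(1.0 if e else 0.0, 1.0) for e in event],
    # Pure noise by construction: unrelated to the event.
    "age": [rng.gauss(60, 10) for _ in event],
}

for name, a, _ in rank_candidates(table, "event"):
    print(f"{name:12s} AUC = {a:.2f}")
```

Because each variable is reported with its own AUC, the output points to specific candidate biomarkers rather than producing a single opaque prediction. A univariate screen like this cannot detect interactions between variables, so it would normally be a starting point for the modelling that follows, not a replacement for it.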

“Application of external models to predict subgroups of responders”

A similar example from PHASTAR experts was the application of external models to predict subgroups of responders as part of a Phase III program. The Sponsor requested validation of a personalised medicine algorithm, developed by applying a machine learning approach to data from previous studies, which would be used to predict the probability of a patient having a favourable response to the compound. Working closely with the project team, the team delivered a personalised medicine analysis plan fully detailing the algorithm and the planned analysis, which was submitted to the regulatory authorities alongside the main study statistical analysis plans.

The analysis was performed in parallel with the primary analyses so that the project team could interpret the predictive power of the algorithm at the same time as the main results, thereby enabling the Sponsor to make informed decisions regarding the future of the compound.
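When an externally developed algorithm outputs a probability of response, one common way to assess it on new data is to compare those probabilities with what was actually observed. The sketch below shows two generic checks, the Brier score and a simple calibration table. It is an assumption-laden illustration, not the analysis specified in the plan described above, and the predicted and observed values are hypothetical.

```python
def brier_score(predicted, observed):
    """Mean squared difference between predicted probability and 0/1 outcome.

    0 is perfect; always predicting 0.5 scores 0.25.
    """
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((p - y) ** 2 for p, y in zip(predicted, observed)) / len(predicted)


def calibration_table(predicted, observed, n_bins=5):
    """Compare mean predicted probability with observed response rate per bin.

    Patients are grouped into equal-width probability bins; empty bins are
    skipped. Each row is (bin_low, bin_high, n, mean_predicted, observed_rate).
    """
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(predicted, observed):
        index = min(int(p * n_bins), n_bins - 1)  # p == 1.0 joins the top bin
        bins[index].append((p, y))
    rows = []
    for index, members in enumerate(bins):
        if not members:
            continue
        mean_pred = sum(p for p, _ in members) / len(members)
        obs_rate = sum(y for _, y in members) / len(members)
        rows.append((index / n_bins, (index + 1) / n_bins,
                     len(members), mean_pred, obs_rate))
    return rows


# Hypothetical values for illustration only: not study data.
predicted = [0.05, 0.10, 0.30, 0.35, 0.55, 0.60, 0.80, 0.85, 0.90, 0.95]
observed = [0, 0, 0, 1, 0, 1, 1, 1, 1, 1]

print(f"Brier score: {brier_score(predicted, observed):.3f}")
for low, high, count, mean_pred, obs_rate in calibration_table(predicted, observed):
    print(f"[{low:.1f}, {high:.1f}) n={count} "
          f"predicted={mean_pred:.2f} observed={obs_rate:.2f}")
```

A well-calibrated algorithm shows observed response rates close to the mean predicted probability in each bin. In practice, measures like these would be pre-specified in the analysis plan and reported with confidence intervals.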

One of the key challenges in ML with respect to clinical trials is access to large, high-quality datasets. Initiatives are underway to facilitate this [17-20], and the availability of valuable data sources is only set to increase as these initiatives and others gain traction. The application of ML goes beyond the application of statistical methods and, critically, includes a full understanding of the data, e.g. its limitations, nuances, biases and ethical considerations. In addition, there is almost certainly a data processing and integration step, followed by extraction of the key variables in close collaboration with the scientific team. Once the right data are prepared, the appropriate ML method(s) can be selected and systematically evaluated. It is essential that any results are interpreted by the right experts with an appreciation of the limitations of the model. These steps are vital if any resulting ML model is to be trusted and be of benefit to the business.
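One widely used way to evaluate candidate methods systematically is k-fold cross-validation, in which each model is repeatedly fitted on part of the data and scored on the held-out remainder. The sketch below is a minimal, generic illustration and not a description of PHASTAR's own process. The `fit_majority` baseline and `accuracy` scorer are hypothetical placeholders for whatever model and metric a project calls for, and the data are made up.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # near-equal fold sizes
    for i, test in enumerate(folds):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test


def cross_validate(fit, score, X, y, k=5):
    """Fit on each training fold, score on the held-out fold, return scores."""
    scores = []
    for train, test in k_fold_indices(len(X), k):
        model = fit([X[i] for i in train], [y[i] for i in train])
        scores.append(score(model, [X[i] for i in test], [y[i] for i in test]))
    return scores


# Hypothetical baseline "model": always predict the most common training label.
def fit_majority(X_train, y_train):
    return max(set(y_train), key=y_train.count)


def accuracy(model, X_test, y_test):
    return sum(label == model for label in y_test) / len(y_test)


# Made-up data for illustration only.
X = [[i] for i in range(20)]
y = [0] * 14 + [1] * 6

scores = cross_validate(fit_majority, accuracy, X, y, k=5)
print("Per-fold accuracy:", scores)
print("Mean accuracy:", sum(scores) / len(scores))
```

Scoring a trivial baseline in the same way as the candidate models gives a reference point: a method is only adding value if it clearly outperforms the baseline on held-out data.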

At PHASTAR, our teams are specialists in addressing the technical challenges around clinical and other healthcare-related data, and our data science group works closely with our data management, statistics and programming specialists to ensure the right data is used in the right way. The data science team at PHASTAR have a systematic approach to ML problems and partner with clinical and scientific teams to fully define questions and deliver business value. Amongst the projects underway at PHASTAR, our data science experts are currently generating ML insights from publicly available datasets associated with clinical trials and conducting a study of wearable devices and their application within clinical trials.

If you would like to understand more about how ML could be applied to support your study or research question, please get in touch with one of the PHASTAR team.



S. M. Smith and T. E. Nichols, “Statistical challenges in 'big data' human neuroimaging,” Neuroview, vol. 97, no. 2, pp. 263-268, 2018.


[Online]. Available:


[Online]. Available:


[Online]. Available:


[Online]. Available:


[Online]. Available:


[Online]. Available:


[Online]. Available:


[Online]. Available:


[Online]. Available:


[Online]. Available:


[Online]. Available:


[Online]. Available:


[Online]. Available:


[Online]. Available:


M. Thissen et al., “mHealth App for Risk Assessment of Pigmented and Nonpigmented Skin Lesions-A Study on Sensitivity and Specificity in Detecting Malignancy,” Telemedicine and e-Health, vol. 23, no. 12, pp. 948-954, 2017.


D. Reinsel et al., “The digitization of the world from edge to core,” IDC - Seagate, 2018.


F. Pappalardo et al., “In silico clinical trials: concepts and early adoptions,” Briefings in Bioinformatics, 2018.


B. P. Kovatchev et al., “In silico preclinical trials: a proof of concept in closed-loop control of type 1 diabetes,” Journal of Diabetes Science and Technology, vol. 3, no. 1, 2009.


A. Karthikesanlingam et al., “Clinically applicable deep learning for diagnosis and referral in retinal disease,” Nature Medicine, vol. 24, pp. 1342-1350, 2018.