Event prediction to guide clinical study design
Following the completion of an early phase clinical trial, the sponsor was planning a larger phase IIb study with the same compound but wanted to understand whether there were any biomarkers in the clinical data from the first study that were predictive of a specific event.
At the start of the project, before running any analysis, time was spent with the clinicians and scientists at the sponsor company to fully understand the team's question and the insights they required. This interaction forms a critical part of any data science project.
The first component of the work was feature selection: choosing the variables in the clinical data, such as demography and laboratory data, that could potentially be useful as predictors of the event. This work was done in collaboration with the scientific teams, ensuring their experience and knowledge were included in the process.
After the relevant data had been extracted and processed using a combination of specialist workflow tools and R, statistical and visualization approaches were used to explore the data before any machine learning was applied. For example, the team examined data consistency, missing data and outliers, and the findings were provided to the study team as a written report.
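As an illustration of the kind of data checks described above, the following is a minimal sketch of a missing-data summary and a simple outlier screen. It is written in Python for illustration only (the project itself used workflow tools and R). The function names, the 1.5 x IQR rule and the laboratory values are all hypothetical assumptions, not the checks or data from the study.

```python
from statistics import quantiles


def missing_fraction(values):
    """Fraction of entries that are None (None is assumed to mean 'missing')."""
    if not values:
        return 0.0
    return sum(v is None for v in values) / len(values)


def iqr_outliers(values, k=1.5):
    """Return non-missing values lying more than k * IQR outside the quartiles.

    The 1.5 * IQR rule is one common convention, assumed here for illustration.
    """
    present = [v for v in values if v is not None]
    q1, _, q3 = quantiles(present, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in present if v < low or v > high]


# Hypothetical laboratory measurements for one variable (not study data).
lab_values = [22, 25, None, 30, 28, 24, 310, 27, None, 26]

share_missing = missing_fraction(lab_values)
flagged = iqr_outliers(lab_values)
print(f"missing: {share_missing:.0%}, flagged as outliers: {flagged}")
```

In practice, flagged values would be reviewed with the study team rather than removed automatically, since an extreme laboratory value may be clinically real.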
Several machine learning approaches, including random forest and gradient boosting methods, were applied in R and evaluated using cross-validation. The predictive power, precision and recall of each method were analyzed and presented together with, importantly, the variable importance. Variable importance describes how much each feature in the data contributed to the predictions, so the output was not just a ‘black-box’ predictor but also showed which variables drove it. The most predictive variables could then be examined in more detail using visualizations produced by the data science team.
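To make the evaluation terms above concrete, here is a minimal sketch of how precision, recall and one common form of variable importance (permutation importance, measured as the drop in accuracy when a feature is shuffled) can be computed. It is written in Python for illustration, whereas the project used R. The source does not state which importance measure was used, so permutation importance is an assumption, and `toy_model`, `marker_a`, `age` and all values are hypothetical stand-ins for a fitted random forest or gradient boosting model and real study data.

```python
import random


def precision_recall(y_true, y_pred):
    """Precision = TP / (TP + FP); recall = TP / (TP + FN), for 0/1 labels."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def accuracy(model, rows, y_true):
    """Share of rows for which the model's prediction matches the label."""
    return sum(model(r) == t for r, t in zip(rows, y_true)) / len(rows)


def permutation_importance(model, rows, y_true, feature, n_repeats=20, seed=0):
    """Mean drop in accuracy after shuffling one feature across rows."""
    rng = random.Random(seed)
    baseline = accuracy(model, rows, y_true)
    drops = []
    for _ in range(n_repeats):
        column = [r[feature] for r in rows]
        rng.shuffle(column)
        shuffled = [{**r, feature: v} for r, v in zip(rows, column)]
        drops.append(baseline - accuracy(model, shuffled, y_true))
    return sum(drops) / n_repeats


def toy_model(row):
    """Hypothetical rule standing in for a trained model."""
    return 1 if row["marker_a"] > 5.0 else 0


# Hypothetical patients and event labels (1 = event occurred).
rows = [
    {"marker_a": 7.1, "age": 54},
    {"marker_a": 3.2, "age": 61},
    {"marker_a": 6.4, "age": 47},
    {"marker_a": 2.8, "age": 58},
    {"marker_a": 8.0, "age": 66},
    {"marker_a": 4.1, "age": 50},
]
y_true = [1, 0, 1, 0, 0, 1]
y_pred = [toy_model(r) for r in rows]

precision, recall = precision_recall(y_true, y_pred)
importance_age = permutation_importance(toy_model, rows, y_true, "age")
importance_marker = permutation_importance(toy_model, rows, y_true, "marker_a")
print(precision, recall, importance_age, importance_marker)
```

Because the toy rule never reads `age`, shuffling that column cannot change any prediction and its importance is exactly zero. This is the sense in which variable importance shows which inputs a predictor actually relies on. In a cross-validated setting these quantities would be computed on held-out folds rather than on the data used for fitting.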
The results were used as a source of evidence for the clinical team to support their decision making during the design of the next study.
In addition to the data mining performed on the clinical data, the sponsor requested a text mining analysis of the literature to understand whether any variables were reported, outside the study, as associated with the event in the selected patient population. This work examined a strength-of-association score between variables and reports in the literature, and the results were used to support the decision making.
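The source does not say how the strength of association score was defined. Purely as an illustration of the idea, the sketch below uses pointwise mutual information (PMI) over document co-occurrence counts, one widely used text mining measure. It is written in Python, and the choice of PMI, the function name and the counts are all assumptions rather than the method used in the project.

```python
import math


def pmi(n_both, n_variable, n_event, n_total):
    """Pointwise mutual information (log base 2) from document counts.

    n_both:     documents mentioning both the variable and the event
    n_variable: documents mentioning the variable
    n_event:    documents mentioning the event
    n_total:    documents in the corpus
    A score of 0 means the two terms co-occur as often as expected by chance.
    """
    if min(n_variable, n_event, n_total) <= 0:
        raise ValueError("n_variable, n_event and n_total must be positive")
    if n_both == 0:
        return float("-inf")
    return math.log2(n_both * n_total / (n_variable * n_event))


# Hypothetical counts from a literature corpus (not real results).
score = pmi(n_both=40, n_variable=200, n_event=100, n_total=10_000)
chance_level = pmi(n_both=2, n_variable=200, n_event=100, n_total=10_000)
print(round(score, 2), chance_level)
```

A higher score indicates that a variable and the event are mentioned together more often than their individual frequencies would predict. It does not establish a causal or clinical relationship, which is why such a score serves as supporting evidence rather than as a conclusion.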