Data Science: making sense of data

The volume of digital data in healthcare is projected to increase more rapidly in the coming years than in any other sector. On a day-to-day basis, it is vital that clinical teams maximise the value not only of their own trial data but also of the wealth of external data, such as electronic healthcare records, real-world data and peer-reviewed research published in journals.

Utilising this data requires an understanding not only of what is available but also of how to access the data, work with its structure, assess its quality and inherent biases and, importantly, apply the right methodology to extract value. In addition to the large volume of standard data generated on a clinical trial, there can be a raft of other, more specialised data, such as genomics, proteomics, wearables and comprehensive measurements, all of which rely on the skills of an experienced data management, programming and statistics team.

Ensuring teams maximise the value of these data sources, in the most efficient way and at the right time, is a key role of data science. Different data science approaches can be applied to integrate, analyse and present the data in the optimal way to facilitate decision making as clinical trials are designed and executed. It can be time consuming and tedious to summarise large amounts of data from different sources and use them to draw conclusions and drive decisions. Humans naturally process visual data more effectively than any other type of data, and visual analytics can bring together disparate data, extracting the key information and enabling complementary data to be aligned. A visual representation can provide efficiency gains, enable the generation of meaningful insights and reduce errors, often facilitating discussion and interaction with a wider team. A good visualisation provides the relevant information in an attractive, easy-to-use format, enabling the user to answer the question it was created for. In the context of clinical trials, this often means that the data is presented in near real time and can be interactive, enabling some level of ‘drill-down’ to richer information around an observation.

Where more sophisticated analyses of data are required, there is a plethora of approaches beyond the more standard statistical methods. For example, a branch of AI called natural language processing can be applied to trawl through large volumes of unstructured peer-reviewed data and retrieve specific and insightful results much faster than a human could, and machine learning can be applied to identify patterns and signals in data that are not obvious from simple observation. For these approaches to be meaningful, it is vital that their results are presented in a way that enables interpretation in the context of the problem whilst outlining any limitations. Visual analytics again has an important role to play, ensuring the output of such sophisticated approaches can be interpreted easily and rapidly.

An adaptive clinical trial (one that allows for prospectively planned modifications to the trial design in accordance with observations of participant outcomes) can benefit from data science expertise, for example by enabling teams to utilise external data sources during the design of the trial. Moreover, data quality and study conduct are critical to the success of adaptive design studies and require the repeated examination of data. Visual analytics, along with advanced statistical approaches, could streamline this, giving teams continuous access to near real-time data so that they can spot signals or issues earlier, and leading to a richer understanding of tolerability data in dose escalation studies.


Access to the right data, analysed and visualised in the right way for the right audience, could impact clinical trials, including adaptive clinical trials, from the early stages of design through to trial conduct and analysis. Data scientists at PHASTAR have the necessary skills to help clinical teams work out the best way to utilise the abundance of available data.