Data science collaboration with - part two

Click here to read part one 

Phastar embarked on a collaboration with to automate the medical coding process for adverse events using machine learning approaches.

In clinical trials, adverse events are coded using the MedDRA dictionary to standardize terms and allow consistent interpretation of results. The MedDRA hierarchy has five levels of classification to which each verbatim term (the term reported during the trial) needs to be mapped. Even with the aid of auto-coders this is a manual, time-consuming process that is prone to human error. The recent wider adoption of machine learning within clinical trials has led to the semi-automation of certain tasks to increase efficiency in the clinical trial process. The focus of this project was mapping verbatim terms to the Lowest Level Term (LLT) in the MedDRA hierarchy.

The goal of the 8-week project was to ascertain whether the auto-coding process could be improved by applying Natural Language Processing (NLP) to the verbatim terms, suggesting a list of the most appropriate LLTs for each verbatim term together with a confidence score for adjudication by the data manager.

The project explored different ways to achieve this on a small dataset of 4,697 verbatim terms, including different preprocessing of the verbatim terms and different NLP approaches. The preprocessing step was necessary because some verbatim terms contained additional words, were misspelt, or contained inflected forms of a word. Several cleaning approaches were used, including stemming, whitespace trimming and spelling correction, and these were found to improve the performance of the auto-coder.
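A minimal sketch of such a cleaning step is shown below. The correction table and the suffix-stripping rule are illustrative stand-ins, not the project's actual implementation, which would use a proper spelling corrector and stemmer.

```python
import re

# Toy misspelling lookup -- a real pipeline would use a spelling-correction
# library or a dictionary built from MedDRA terms (hypothetical examples).
CORRECTIONS = {"headach": "headache", "nausia": "nausea"}

def clean_verbatim(term: str) -> str:
    """Normalise a verbatim term: lower-case, trim and collapse whitespace,
    correct known misspellings, and strip simple inflectional suffixes."""
    term = term.lower().strip()
    term = re.sub(r"\s+", " ", term)          # collapse repeated spaces
    words = []
    for w in term.split():
        w = CORRECTIONS.get(w, w)             # spelling correction
        # crude suffix stripping as a stand-in for a proper stemmer
        for suffix in ("ing", "es", "s"):
            if w.endswith(suffix) and len(w) > len(suffix) + 2:
                w = w[: -len(suffix)]
                break
        words.append(w)
    return " ".join(words)
```

For example, `clean_verbatim("  Severe   Headach ")` yields `"severe headache"`, collapsing the stray whitespace and correcting the misspelling in one pass.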

Once the verbatim terms were ‘clean’, NLP methods were applied. The methods explored included exact matching of terms (between the verbatim term and the LLT), term-frequency similarity using TF-IDF, and the SBERT deep learning transformer model.

The simple exact term matching approach achieved performance similar to current approaches, with approximately 66% of LLTs correctly returned for the verbatim terms.

TF-IDF (term frequency-inverse document frequency) is a method that evaluates how relevant an individual word is to a document within a collection of documents, or in our case how relevant a verbatim term is to a particular LLT in the collection of all LLTs. This method outperformed exact term matching, with around 80% of terms identified correctly in the top 4 results returned by the algorithm.
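The ranking step can be sketched as follows, treating each LLT as a tiny "document" and scoring it against the verbatim term by cosine similarity of TF-IDF vectors. This is a bare-bones illustration; in practice a library such as scikit-learn would be used, with the smoothing and tokenisation choices here being assumptions.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Build word-level TF-IDF vectors for a list of documents."""
    tokenised = [d.lower().split() for d in docs]
    n = len(docs)
    df = Counter(w for toks in tokenised for w in set(toks))
    idf = {w: math.log(n / df[w]) + 1.0 for w in df}   # smoothed IDF
    vectors = []
    for toks in tokenised:
        tf = Counter(toks)
        vectors.append({w: (tf[w] / len(toks)) * idf[w] for w in tf})
    return vectors, idf

def cosine(u, v):
    """Cosine similarity between two sparse vectors (word -> weight)."""
    dot = sum(u[w] * v.get(w, 0.0) for w in u)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def top_k_llts(verbatim, llts, k=4):
    """Rank LLTs by TF-IDF cosine similarity to the verbatim term."""
    vectors, idf = tfidf_vectors(llts)
    tf = Counter(verbatim.lower().split())
    total = sum(tf.values())
    qvec = {w: (tf[w] / total) * idf.get(w, 0.0) for w in tf}
    scored = sorted(zip(llts, (cosine(qvec, v) for v in vectors)),
                    key=lambda p: p[1], reverse=True)
    return [term for term, _ in scored[:k]]
```

Unlike exact matching, this always returns a ranked shortlist, so a verbatim term like "severe back pain" still surfaces "Back pain" near the top even though no LLT matches it word for word.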

The third method applied was the SBERT deep learning transformer model. BERT (Devlin, Chang, Lee, & Toutanova, 2018) stands for Bidirectional Encoder Representations from Transformers and is a machine learning model for NLP developed by Google AI Language. SBERT (Reimers & Gurevych, 2019) is a modified version of the pretrained BERT network that produces sentence embeddings, so cosine similarity between embeddings can be used to find sentences with similar meaning. This model performed even better, with around 90% of correct LLTs returned in the top 4 hits.
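In an SBERT pipeline, a model (for example, via the sentence-transformers library) encodes each verbatim term and each LLT into a dense vector, and candidates are ranked by cosine similarity. The embedding model itself is too heavy to show here, but the ranking step reduces to the same idea as before, now over dense vectors; the vectors in the test are toy placeholders standing in for real embeddings.

```python
import math

def cosine_similarity(u, v):
    """Cosine similarity between two dense embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def rank_by_embedding(query_vec, llt_vecs, llt_names, k=4):
    """Return the k LLTs whose embeddings are closest to the query embedding."""
    scored = sorted(
        zip(llt_names, (cosine_similarity(query_vec, v) for v in llt_vecs)),
        key=lambda p: p[1], reverse=True)
    return [name for name, _ in scored[:k]]
```

Because embeddings capture meaning rather than shared words, this approach can match a verbatim term to an LLT phrased completely differently, which is where the gain over TF-IDF comes from.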

These early results suggest that sophisticated NLP approaches may improve the auto-coding of verbatim terms to LLTs in the MedDRA hierarchy and, if incorporated into the current process, have the potential to improve the efficiency of coding in the future.


Devlin, J., Chang, M.-W., Lee, K., & Toutanova, K. (2018). BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. arXiv.

Reimers, N., & Gurevych, I. (2019). Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks. arXiv.