AutoCRF: A tool to create a mock eCRF screen from Excel data specifications

Case report forms (CRFs) are critical in clinical trials because they capture vital data on patient safety, treatment efficacy, and overall trial outcomes. Good CRF design helps ensure that CRFs are completed accurately and fully during data collection, which underpins the validity and reliability of trial results and, ultimately, the approval of new treatments and medicines. The design of electronic CRFs (eCRFs) for use in electronic data capture (EDC) tools is usually led by the data management team but requires input from various stakeholders, including the clinical team, statisticians, and programmers, to ensure the eCRFs capture all necessary data accurately and efficiently.

The data specification process for eCRFs typically involves creating an Excel spreadsheet with a row for each data element or ‘variable’ required for the study and columns describing it. These variables usually have specific definitions, including data types, formats, range checks, validation rules, and other requirements specified in the study protocol. Each stakeholder needs to review this spreadsheet carefully to ensure that all required trial information can be collected easily and that no unnecessary or duplicate information is collected. However, the review process is often a challenge. Stakeholders come from diverse backgrounds, and some find it difficult to visualise and interpret the technical language of the data specification file; the overall process can also lead to reviewer fatigue. Insufficient review can result in later amendments to the eCRF, which inevitably cost time and resources.

Working in close collaboration with the data managers and database builders in the Data Operations team, the Digital Health and Data Science (DHDS) team identified a solution to meet the growing demand within Phastar for an automated mock eCRF that presents the technical information from the Excel file in a more meaningful, readable format.


Fig: Screenshots of the data specification Excel sheet (left) and a form from the mock eCRF (right)

The mock eCRF generation process begins by passing the Excel data specification as input to automated R code. This R code generates an R Markdown file, which is compiled to produce the mock eCRF in HTML format. The result resembles the screen that would be used for data collection in the EDC tool, including drop-down lists and on-screen instructions, so the reviewer can explore each form/screen interactively in a logical workflow. The entire creation process runs inside Posit Workbench (formerly RStudio Workbench).
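To make the workflow above more concrete, here is a minimal sketch in base R. It is not the AutoCRF code itself: the specification columns (form, variable, label, type, codelist), the function names field_html and build_rmd, and the small in-memory data frame standing in for the Excel sheet are all assumptions made for illustration. In practice the sheet would be read from the Excel file, for example with the readxl package.

```r
# Hypothetical stand-in for one sheet of the Excel specification.
# All column names and values here are assumptions for illustration.
spec <- data.frame(
  form     = c("Demographics", "Demographics", "Demographics"),
  variable = c("BRTHDAT", "SEX", "AGE"),
  label    = c("Date of birth", "Sex", "Age (years)"),
  type     = c("date", "codelist", "integer"),
  codelist = c(NA, "Male|Female", NA),
  stringsAsFactors = FALSE
)

# Turn one specification row into an HTML form control.
field_html <- function(variable, label, type, codelist) {
  control <- switch(
    type,
    codelist = {
      # Codelist values become the entries of a drop-down list.
      choices <- strsplit(codelist, "|", fixed = TRUE)[[1]]
      opts <- paste0("<option>", choices, "</option>", collapse = "")
      sprintf('<select name="%s"><option></option>%s</select>', variable, opts)
    },
    date    = sprintf('<input type="date" name="%s">', variable),
    integer = sprintf('<input type="number" step="1" name="%s">', variable),
    sprintf('<input type="text" name="%s">', variable)  # fallback for other types
  )
  sprintf("<p><label>%s: %s</label></p>", label, control)
}

# Assemble the lines of an R Markdown document, one section per form.
build_rmd <- function(spec, title = "Mock eCRF") {
  header <- c("---", sprintf('title: "%s"', title),
              "output: html_document", "---", "")
  # Keep forms in the order in which they first appear in the specification.
  forms <- split(spec, factor(spec$form, levels = unique(spec$form)))
  body <- unlist(lapply(names(forms), function(form_name) {
    rows <- forms[[form_name]]
    fields <- mapply(field_html, rows$variable, rows$label,
                     rows$type, rows$codelist, USE.NAMES = FALSE)
    c(paste("##", form_name), "", fields, "")
  }))
  c(header, body)
}

rmd_lines <- build_rmd(spec)
rmd_file <- file.path(tempdir(), "mock_ecrf.Rmd")
writeLines(rmd_lines, rmd_file)

# If the rmarkdown package is installed, the file can be compiled to HTML:
# rmarkdown::render(rmd_file)
```

Because R Markdown passes raw HTML through to the output, the generated controls appear as working drop-down lists and input boxes in the compiled page, which is the kind of interactivity a reviewer would expect from a mock screen.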

We encountered some specific challenges when creating mock eCRFs for forms with dynamic functionality. For example, an Adverse Event (AE) form should be displayed only when the trigger question ‘Were there any AEs YES/NO?’ is answered YES. Exploring and testing other programming tools resolved the problem and allowed us to implement this and other complex relationships within the eCRF.
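The AE example can be illustrated with a small sketch. Which tools were actually used is not stated here, so the following is only one possible approach and an assumption on our part: R code that emits a hidden HTML section together with a few lines of plain JavaScript that reveal it when the trigger is answered YES. The function name conditional_form and all ids and field names are hypothetical.

```r
# Build a trigger question plus a form section that stays hidden until
# the trigger is answered "YES". All ids and names are hypothetical.
conditional_form <- function(trigger_id, trigger_label, form_id, form_html) {
  c(
    sprintf(
      paste0('<p><label>%s <select id="%s"><option></option>',
             '<option>YES</option><option>NO</option></select></label></p>'),
      trigger_label, trigger_id
    ),
    # The dependent form starts out hidden.
    sprintf('<div id="%s" style="display:none">', form_id),
    form_html,
    "</div>",
    # Plain JavaScript: show the form only while the trigger value is YES.
    "<script>",
    sprintf('document.getElementById("%s").addEventListener("change", function () {',
            trigger_id),
    sprintf('  document.getElementById("%s").style.display =', form_id),
    '    this.value === "YES" ? "block" : "none";',
    "});",
    "</script>"
  )
}

ae_lines <- conditional_form(
  trigger_id    = "AEYN",
  trigger_label = "Were there any AEs?",
  form_id       = "AE_FORM",
  form_html     = '<p><label>Adverse event term: <input type="text" name="AETERM"></label></p>'
)
cat(ae_lines, sep = "\n")
```

The returned lines could be written into the R Markdown file like any other form section. The script is emitted after the elements it refers to, so both ids exist by the time the browser runs it.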

At Phastar, we use CDASH (Clinical Data Acquisition Standards Harmonization) standards for our eCRF specifications. Because the entire process is automated and the data specifications do not deviate from the standard structure, generating the mock eCRF can take as little as 10 minutes.

The Data Operations and DHDS teams are now working together on a smooth roll-out of this R-based tool in some of our upcoming projects. If you would like to find out more, please get in touch.