Managing Missing Data

Missing data: Management and Prevention

Data managers strive to produce high-quality, reliable and intact data for analysis. Integral to this standard is ensuring that there are minimal or no missing data. Missing data may arise from different sources, such as equipment failure, missed visits, or the death or withdrawal of a subject, and are usually dealt with during the analysis by defined handling strategies. By contrast, data that are available at the investigational site but were not collected, and so are missing from the eCRF through error or omission, are gaps that good data handling procedures can prevent.

The impact of missing data can be manifold: delays in timelines, additional costs and resources associated with retrieving and reconciling the data and, of course, bias that adversely affects the interpretation of study results. Many data items depend on, or are depended on by, other data items, so the unavailability of a single item may affect the integrity of data points elsewhere in the database.

The optimal approach to dealing with missing data is prevention. Trial design has a role to play, and consideration should be given to practicalities, such as the impact on site and subject, in an effort to avoid missing data due to confusion or errors in study conduct. Effective and efficient data capture processes are essential. Good eCRF (or paper CRF) design is important: a logical data flow that mimics the sequence of procedures in the clinic facilitates efficient data collection. Skip logic is a feature that changes which question or page a respondent sees based on how they answered another question, thus guiding the user through the eCRF and preventing data entry into variables that should remain blank. Clear on-screen data entry instructions and readily available eCRF completion guidelines are essential. User Acceptance Testing (UAT) during the database design stage is key to achieving this, and developers should ensure that a wide range of user types test the design for ease of use. User training and support are equally essential. Along with periodic refresher training demonstrations, vignette-style videos are well received.
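The skip logic described above can be illustrated with a minimal sketch. It is not taken from any particular eCRF system; the field names, the SKIP_RULES table and the visible_fields function are all hypothetical and only show the idea of hiding a dependent question until its controlling answer calls for it.

```python
# Hypothetical skip-logic rules: a dependent field is shown only when its
# controlling field holds one of the listed trigger values.
SKIP_RULES = {
    "pregnancy_test_result": ("sex", {"female"}),
    "smoking_pack_years": ("smoker", {"current", "former"}),
}

# Hypothetical form layout, in the order the questions appear on screen.
FORM_FIELDS = ["sex", "pregnancy_test_result", "smoker", "smoking_pack_years"]


def visible_fields(all_fields, answers):
    """Return the fields the user should see, preserving form order."""
    shown = []
    for field in all_fields:
        rule = SKIP_RULES.get(field)
        if rule is None:
            # No rule: the field is always displayed.
            shown.append(field)
            continue
        controller, trigger_values = rule
        # Show the dependent field only if the controlling answer triggers it.
        if answers.get(controller) in trigger_values:
            shown.append(field)
    return shown


print(visible_fields(FORM_FIELDS, {"sex": "male", "smoker": "current"}))
# prints ['sex', 'smoker', 'smoking_pack_years']
```

Because a skipped field is never offered for entry, a blank value in it can be treated as "not applicable" rather than as missing data.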

Clear guidelines need to be agreed and communicated on how missing data will be handled in the eCRF. If the decision is that missing data fields will be left blank, a site that enters NA into text fields generates further queries and data edits. Mandatory (required) fields in the eCRF may be useful: data must be entered in them before the site user can save the form and proceed. Such fields need to be used with caution to ensure that they do not interfere with the entry of data at the relevant time.
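As a rough illustration of the two checks discussed above, the sketch below flags required fields left blank and placeholder text such as NA typed into a field that should have been left empty. The field names, the REQUIRED_FIELDS and PLACEHOLDERS sets and the check_form function are assumptions made for the example, not features of any specific eCRF product.

```python
# Hypothetical configuration for one form.
REQUIRED_FIELDS = {"visit_date", "systolic_bp"}
# Text a site might type instead of leaving a field blank (assumed list).
PLACEHOLDERS = {"na", "n/a", "none", "unknown", "-"}


def check_form(form):
    """Return a list of (field, problem) tuples for a single form."""
    problems = []
    # Required fields must hold a non-blank value.
    for field in sorted(REQUIRED_FIELDS):
        value = form.get(field)
        if value is None or str(value).strip() == "":
            problems.append((field, "required field is blank"))
    # Placeholder text contradicts a "leave blank" convention.
    for field, value in sorted(form.items()):
        if isinstance(value, str) and value.strip().lower() in PLACEHOLDERS:
            problems.append((field, "placeholder text entered instead of leaving blank"))
    return problems


form = {"visit_date": "2024-01-05", "systolic_bp": "", "comments": "NA"}
for field, problem in check_form(form):
    print(f"{field}: {problem}")
```

In this sketch the problems are only reported, not enforced; whether a blank required field should block saving the form is exactly the judgement call the paragraph above asks teams to make with caution.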

Target rates for missing data can be set at the beginning of a trial and reviewed during it, so that sites falling outside these targets are identified early. Early discussion of the reasons for a high level of missing data can lead to improvement before there is a major impact on study results.
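Monitoring against a target rate can be sketched as follows. The 5% target, the record layout and the function names are hypothetical and chosen only for illustration; a real trial would define its own target and its own list of expected fields per visit.

```python
from collections import defaultdict

# Hypothetical target: at most 5% of expected data points missing per site.
TARGET_MISSING_RATE = 0.05


def missing_rate_by_site(records, expected_fields):
    """Return {site: fraction of expected data points that are blank}."""
    counts = defaultdict(lambda: [0, 0])  # site -> [missing, expected]
    for rec in records:
        tally = counts[rec["site"]]
        for field in expected_fields:
            tally[1] += 1
            value = rec.get(field)
            if value is None or str(value).strip() == "":
                tally[0] += 1
    # Skip sites with no expected data points to avoid dividing by zero.
    return {
        site: missing / expected
        for site, (missing, expected) in counts.items()
        if expected
    }


def sites_over_target(rates, target=TARGET_MISSING_RATE):
    """Return the sites whose missing-data rate exceeds the target."""
    return sorted(site for site, rate in rates.items() if rate > target)


records = [
    {"site": "A", "weight": 70, "height": 175},
    {"site": "A", "weight": 82, "height": 180},
    {"site": "B", "weight": None, "height": 160},
    {"site": "B", "weight": 65, "height": ""},
]
rates = missing_rate_by_site(records, ["weight", "height"])
print(sites_over_target(rates))  # prints ['B']
```

Running such a summary at regular intervals gives an early list of sites to contact about the reasons behind their missing data.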

Using these strategies in conjunction with effective data review during study conduct helps to ensure that we provide high-quality data: errors and missing data are kept to a minimum, the maximum amount of data is gathered for analysis, and the data are complete, reliable and correctly processed.