Failed studies: what went wrong?
A new formulation under investigation was planned to be submitted for use in both the US and Europe. As the reference formulation in the US differs from that in Europe, two separate bioequivalence studies were needed to gain approval from the FDA and EMA. Study 1 was conducted for FDA approval and successfully demonstrated bioequivalence. Study 2 was conducted for EMA approval and whilst AUC was within the bioequivalence limits, Cmax was not, and bioequivalence could not formally be declared.
PHASTAR were approached to critique the failed study and provide a sample size calculation to run the study again. Based on PHASTAR’s recommendations, Study 3 was conducted for EMA approval and successfully demonstrated bioequivalence. On completion of Study 3, and during the application to the EMA, PHASTAR were approached to provide independent, expert advice as to why the initial EU study (Study 2) had failed, in order to justify acceptance by the EMA without an additional study being conducted to confirm the bioequivalence result.
Our approach was to assess the study designs, power assumptions, and results of the two successful bioequivalence studies and compare them with those of Study 2, to determine whether any differences could have contributed to the failure to declare bioequivalence.
The overall design and statistical methodologies of the studies were similar. The only differences were the adoption of a three-sequence design in Study 1 (to determine the intra-subject CoV), an older population in Study 2, and the underlying assumptions used to calculate the sample size in each of the studies.
Studies 1 and 3 used informed estimates of the geometric mean ratio and intra-subject coefficient of variation (CoV) for the sample size calculation, whereas Study 2 assumed a substantially lower CoV, lower than the value observed in any of the three studies. The estimate of the expected variability is a crucial component in adequately powering clinical trials such as these. An observed variability that is higher than the value assumed at the design stage reduces the power to demonstrate bioequivalence: if the expected variability is underestimated in the sample size calculation, the study will be too small and therefore underpowered.
It was also determined that if a trial with the same sample size as Study 2 were to be conducted again but based on the intra-subject CoV observed in Study 2 (which was higher than the initial assumption), it would have 81% power based on a geometric mean ratio (GMR) of 95%, and 12% power based on a GMR of 82.61% (as observed in Study 2).
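The dependence of power on the intra-subject CoV and the GMR can be illustrated with a minimal sketch. It assumes a balanced 2x2 crossover analysed with the two one-sided tests (TOST) procedure at the 5% level, bioequivalence limits of 80% to 125%, and a normal approximation to the power. The function name `tost_power`, the sample size of 24, and the CoV values of 20% and 30% are hypothetical choices for illustration only; they are not the values used in Studies 1 to 3, and the sketch is not the calculation PHASTAR performed.

```python
from math import log, sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def tost_power(gmr, cv, n_total, alpha=0.05, lower=0.80, upper=1.25):
    """Approximate TOST power for a balanced 2x2 crossover.

    Uses a normal approximation, so results will differ slightly from
    exact methods, especially at small sample sizes.
    """
    # Within-subject SD on the log scale, derived from the intra-subject CoV.
    sigma_w = sqrt(log(1.0 + cv ** 2))
    # Standard error of the estimated log(GMR) with n_total subjects in total.
    se = sigma_w * sqrt(2.0 / n_total)
    z = _STD_NORMAL.inv_cdf(1.0 - alpha)
    delta = log(gmr)
    # Probability of rejecting each one-sided null hypothesis.
    upper_term = _STD_NORMAL.cdf((log(upper) - delta) / se - z)
    lower_term = _STD_NORMAL.cdf((delta - log(lower)) / se - z)
    return max(0.0, upper_term + lower_term - 1.0)


# Hypothetical inputs: same sample size, assumed versus higher observed CoV,
# and a GMR near 1 versus a GMR close to the lower bioequivalence limit.
for cv in (0.20, 0.30):
    for gmr in (0.95, 0.8261):
        power = tost_power(gmr, cv, n_total=24)
        print(f"CoV={cv:.0%}  GMR={gmr:.2%}  approx. power={power:.1%}")
```

In this sketch, power falls as the CoV rises for a fixed sample size, and falls sharply as the GMR approaches the 80% limit, which is the pattern described above for Study 2.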
The observed values of the pharmacokinetic parameters (AUC and Cmax) were consistent across studies and formulation groups (test and reference), and no extreme outliers were identified in Study 2. PHASTAR noted that Study 2 reported the lowest average Cmax (based on geometric mean, mean and median) for the test formulation and the highest average Cmax for the reference formulation. This had a substantial impact on the geometric mean ratio, resulting in a lower estimate than in Studies 1 and 3.
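The effect of a low test average combined with a high reference average can be shown with a small worked example. The Cmax values below are invented purely for illustration and are not data from any of the three studies. The GMR is computed here as a simple ratio of geometric means, whereas an actual crossover analysis would estimate it from a model fitted on the log scale.

```python
from math import exp, log


def geometric_mean(values):
    """Geometric mean: exponential of the mean of the log values."""
    return exp(sum(log(v) for v in values) / len(values))


def geometric_mean_ratio(test, reference):
    """Ratio of the test geometric mean to the reference geometric mean."""
    return geometric_mean(test) / geometric_mean(reference)


# Hypothetical Cmax values (arbitrary units) for the same four subjects.
test_cmax = [100.0, 120.0, 90.0, 110.0]
reference_cmax = [105.0, 115.0, 95.0, 112.0]

baseline_gmr = geometric_mean_ratio(test_cmax, reference_cmax)

# Suppose chance variation lowers every test value by 8% and raises
# every reference value by 8%; the two shifts compound in the ratio.
shifted_gmr = geometric_mean_ratio(
    [v * 0.92 for v in test_cmax],
    [v * 1.08 for v in reference_cmax],
)

print(f"baseline GMR: {baseline_gmr:.3f}")
print(f"shifted GMR:  {shifted_gmr:.3f}")
```

Because the GMR is a ratio, modest shifts in opposite directions multiply: the shifted GMR equals the baseline GMR times 0.92 / 1.08, a reduction of roughly 15%.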
The bioequivalence results observed in Study 1 were replicated in Study 3, a study almost identical in design to the failed Study 2 but with a larger sample size. Type II errors are inherent in all clinical trials, and even when a sample size is calculated based on correct assumptions, a study could potentially fail due to chance alone. After inspecting the data from all three studies, we concluded that the failure to declare bioequivalence in Study 2 was likely due to the study being underpowered.