Sample Size for a Diagnostic Study

Case study

PHASTAR recently provided design input and sample size calculations for a study of a new diagnostic test to detect sepsis in blood samples. For patients presenting with suspected symptoms of sepsis, the current method of confirming the diagnosis is blood culture, which takes 36 to 48 hours to process. Because sepsis is severe and escalates rapidly, treatment needs to be initiated immediately. Since it is not possible to wait until a result is obtained, the patient is treated for sepsis empirically. There are 25 different bacteria types that can cause sepsis, and the treatment depends on the bacteria type. The current blood culture method cannot determine the bacteria type causing the infection, so the patient is treated for the most common sepsis-causing bacteria type (E. coli). The new diagnostic test seeks to address both issues: it can return a sepsis diagnosis within 3 to 4 hours while also identifying the individual bacteria type.

Blood culture is the current standard method of testing, but it is not the definitive method. Hence, as per the FDA guidance document (Guidance for Industry and FDA Staff: Statistical Guidance on Reporting Results from Studies Evaluating Diagnostic Tests [1], dated 13 March 2007), the current method is to be regarded as a non-reference standard rather than a gold standard. The definitive method of testing for sepsis is genome sequencing of the blood sample, but it is not routinely used because it is expensive.

In the proposed study, each patient will provide one blood sample. Each blood sample will be tested for sepsis using both the new diagnostic test and the current (non-reference standard) test. In addition, all blood samples will need to be tested via genome sequencing (true gold standard) to obtain the true result. This is needed to confirm all results, both where the new diagnostic test and the current (blood culture) test agree and where they disagree. The possible outcomes of each test are Positive and Negative. The possible combinations of outcomes are therefore:

Actual (Genome Sequencing)  Current (Blood Culture) Test  New Diagnostic Test
Positive                    Positive                      Positive
Positive                    Positive                      Negative
Positive                    Negative                      Positive
Positive                    Negative                      Negative
Negative                    Positive                      Positive
Negative                    Positive                      Negative
Negative                    Negative                      Positive
Negative                    Negative                      Negative

As per the FDA guidance document, comparing a new test to a non-reference standard does not give a true result. Furthermore, discrepant resolution is inappropriate: outcomes that are altered or updated by discrepant resolution should not be used to estimate the sensitivity and specificity of a new test, or the agreement between a new test and a non-reference standard. Instead, all blood samples will also need to be tested via genome sequencing (true gold standard) to get a definitive result.

The proposed primary endpoints for the study are:

  • The number of positive results using the new diagnostic test compared to the number of positive results from the current (non-reference standard) test
  • The number of correct bacteria types identified using the new diagnostic test as confirmed via genome sequencing (true gold standard)

The power to identify the individual bacteria type will be low given the low prevalence of each of the 25 bacteria types. Only the top 5 bacteria types have a prevalence of >5%, whilst the bottom 10 bacteria types have a prevalence of <1%.

At this stage, multiple testing for the co-primary endpoints has not yet been considered.

The proposed analysis will be based on a paired test in proportions, using the following approach:

  • Construct two 2x2 tables – one for the new diagnostic test versus genome sequencing (true gold standard) and one for the current (non-reference standard) test versus genome sequencing (true gold standard)
  • From each 2x2 table, estimate the sensitivity and specificity for the new diagnostic test and the current (non-reference standard) test
  • Compare the sensitivity of the new diagnostic test versus the current (non-reference standard) test using the difference in proportions and similarly for specificity
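
As a minimal sketch of the three steps above, the following Python snippet estimates sensitivity and specificity from the cell counts of each 2x2 table and then takes the differences. The function name and all counts are hypothetical and purely illustrative; they are not study data.

```python
def sensitivity_specificity(tp, fp, fn, tn):
    """Return (sensitivity, specificity) from 2x2 counts versus the gold standard."""
    sensitivity = tp / (tp + fn)  # true positives among gold-standard positives
    specificity = tn / (tn + fp)  # true negatives among gold-standard negatives
    return sensitivity, specificity


# Hypothetical counts for 1000 samples, each classified against genome sequencing
new_sens, new_spec = sensitivity_specificity(tp=360, fp=24, fn=40, tn=576)
cur_sens, cur_spec = sensitivity_specificity(tp=340, fp=30, fn=60, tn=570)

# Differences in proportions (new diagnostic test minus current test)
diff_sens = new_sens - cur_sens
diff_spec = new_spec - cur_spec

print(f"Sensitivity: new {new_sens:.2f}, current {cur_sens:.2f}, difference {diff_sens:+.2f}")
print(f"Specificity: new {new_spec:.2f}, current {cur_spec:.2f}, difference {diff_spec:+.2f}")
```

Because both tests are run on the same samples, formal inference on these differences also needs the paired cross-classification of the two tests, which this sketch does not cover.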

Sample size calculations were conducted using nQuery, based on a paired test of non-inferiority in proportions, with the following assumptions:

  • nQuery - paired test of non-inferiority in proportions
  • Significance level = 5%
  • One-sided
  • Non-inferiority margin = 5%
  • H0: π1 – π0 ≤ -0.05 (inferior) versus H1: π1 – π0 > -0.05 (not inferior)
  • Non-inferiority limit difference Δ0 = -0.05 (null hypothesis of inferiority)
  • Expected difference Δ1 = 0 (alternative hypothesis of non-inferiority)
  • Proportion discordant η = π10 + π01 (proportion of tests that will disagree under the alternative hypothesis)
  • Number of discordant pairs varies according to sensitivity and specificity
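
To make the roles of these inputs concrete, here is a rough sketch based on a textbook normal-approximation formula for a paired non-inferiority test of proportions, n = (z_{1-α} + z_{1-β})² (η − Δ1²) / (Δ1 − Δ0)². This formula is an assumption made for illustration and is not necessarily the algorithm nQuery uses; the function name is hypothetical, and the 90% power and η = 0.09 are example inputs.

```python
from math import ceil
from statistics import NormalDist


def paired_noninferiority_n(eta, delta0=-0.05, delta1=0.0, alpha=0.05, power=0.90):
    """Approximate sample size for a one-sided paired non-inferiority test.

    eta    : proportion of discordant pairs (pi10 + pi01)
    delta0 : non-inferiority limit for the difference (null hypothesis)
    delta1 : expected difference (alternative hypothesis)
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha)  # one-sided critical value
    z_beta = NormalDist().inv_cdf(power)
    variance = eta - delta1 ** 2  # per-pair variance of the paired difference
    return ceil((z_alpha + z_beta) ** 2 * variance / (delta1 - delta0) ** 2)


n = paired_noninferiority_n(eta=0.09)
print(n)  # 309 under this approximation
```

Under this approximation the required n grows in proportion to the discordant proportion η, which is why the sample size depends on the assumed sensitivity and specificity.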

Example

If the sensitivity and specificity of the current test are 85% and 95%, respectively, and the positive rate from sequencing is 40%, as summarized in the table below, then the required sample size for 90% power is 309 samples.

                        Sequencing
                        Positive     Negative     Total
Current Test  Positive  34%          3%           37%
              Negative  6%           57%          63%
              Total     40%          60%          100%
                        Sensitivity  Specificity
                        85%          95%
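
The cell percentages in this table follow directly from the assumed sensitivity, specificity and positive rate. A minimal Python sketch of that arithmetic (the function name is hypothetical):

```python
def two_by_two(sensitivity, specificity, positive_rate):
    """Expected cell proportions of the current test versus sequencing."""
    tp = positive_rate * sensitivity              # test positive, sequencing positive
    fn = positive_rate * (1 - sensitivity)        # test negative, sequencing positive
    fp = (1 - positive_rate) * (1 - specificity)  # test positive, sequencing negative
    tn = (1 - positive_rate) * specificity        # test negative, sequencing negative
    return tp, fp, fn, tn


tp, fp, fn, tn = two_by_two(sensitivity=0.85, specificity=0.95, positive_rate=0.40)
discordant = fp + fn  # off-diagonal cells of the table

print(f"TP {tp:.0%}, FP {fp:.0%}, FN {fn:.0%}, TN {tn:.0%}, discordant {discordant:.0%}")
```

The two off-diagonal cells (false positives and false negatives) sum to the discordant proportion for this scenario.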

Since this calculation is based only on the proportion of discordant samples, it is necessary to check that the sample size will be adequate. To do this, use another sample size table in nQuery to compute the power of a non-inferiority test based on the observed lower limit of the confidence interval for the difference in proportions.

  • nQuery – lower confidence limit for difference in paired proportions (simulation)
  • Confidence level 1 – α (one-sided) = 0.95
  • Expected difference Δ1 = π1 – π0 = 0
  • Proportion discordant η = π10 + π01 = 0.09 (from the above table – 3% + 6% = 9%)

Use the nQuery side table to compute the remaining values and transfer them to the main table (the expected difference may need to be overwritten so that it remains 0 rather than the computed value).

  • Lower limit for π1 – π0 LL = -0.05
  • Number of simulations = 1000
  • Random seed = 2020
  • n = 309 (sample size as calculated from nQuery in the first step above)

nQuery then computes the estimated power. If the estimated power is less than 90%, increase the sample size and recompute with a larger value of n. Repeat this until the estimated power is greater than 90%.

In this example, the estimated power is greater than 90% when n = 309.
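
The simulation check can be sketched in Python as follows. This is an illustrative Monte Carlo under stated assumptions, not nQuery's algorithm: it assumes a Wald-type lower confidence limit for the paired difference, and the function name is hypothetical. Even with the same seed value it will not reproduce nQuery's random numbers, so the estimate will differ slightly.

```python
import random
from math import sqrt
from statistics import NormalDist


def simulated_power(n, eta, delta1=0.0, lower_limit=-0.05, alpha=0.05,
                    n_sims=1000, seed=2020):
    """Estimate power as the share of simulated studies whose one-sided
    lower confidence limit for the paired difference exceeds lower_limit."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1 - alpha)
    p10 = (eta + delta1) / 2  # pairs where only the first test is positive
    p01 = (eta - delta1) / 2  # pairs where only the second test is positive
    successes = 0
    for _ in range(n_sims):
        n10 = n01 = 0
        for _ in range(n):
            u = rng.random()
            if u < p10:
                n10 += 1
            elif u < p10 + p01:
                n01 += 1
        diff = (n10 - n01) / n
        # Wald standard error of the difference in paired proportions (assumed)
        se = sqrt(((n10 + n01) / n - diff ** 2) / n)
        if diff - z * se > lower_limit:
            successes += 1
    return successes / n_sims


power = simulated_power(n=309, eta=0.09)
print(f"Estimated power: {power:.3f}")
```

If the estimate falls below the target, the same function can be called again with a larger n until the target power is reached, mirroring the iteration described above.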

Sample sizes were generated for a range of values of sensitivity (80%, 85%, 90%, 95%), specificity (90%, 95%), positive rate (40%, 30%, 20%, 10%) and power (80%, 85%, 90%).

Estimated Sample Sizes Required for Varying Values of Sensitivity, Specificity and Power

40% Positive Rate

Sensitivity  Specificity  Non-Inferiority Limit  Power  Alpha  Positive Rate  Negative Rate  Proportion Discordant  n
80%          90%          0.05                   80%    0.05   40%            60%            14.0%                  347
85%          90%          0.05                   80%    0.05   40%            60%            12.0%                  297
90%          90%          0.05                   80%    0.05   40%            60%            10.0%                  248
95%          90%          0.05                   80%    0.05   40%            60%            8.0%                   199

80%          95%          0.05                   80%    0.05   40%            60%            11.0%                  273
85%          95%          0.05                   80%    0.05   40%            60%            9.0%                   223
90%          95%          0.05                   80%    0.05   40%            60%            7.0%                   174
95%          95%          0.05                   80%    0.05   40%            60%            5.0%                   149

80%          90%          0.05                   85%    0.05   40%            60%            14.0%                  403
85%          90%          0.05                   85%    0.05   40%            60%            12.0%                  346
90%          90%          0.05                   85%    0.05   40%            60%            10.0%                  288
95%          90%          0.05                   85%    0.05   40%            60%            8.0%                   231

80%          95%          0.05                   85%    0.05   40%            60%            11.0%                  317
85%          95%          0.05                   85%    0.05   40%            60%            9.0%                   259
90%          95%          0.05                   85%    0.05   40%            60%            7.0%                   204
95%          95%          0.05                   85%    0.05   40%            60%            5.0%                   173

80%          90%          0.05                   90%    0.05   40%            60%            14.0%                  480
85%          90%          0.05                   90%    0.05   40%            60%            12.0%                  412
90%          90%          0.05                   90%    0.05   40%            60%            10.0%                  343
95%          90%          0.05                   90%    0.05   40%            60%            8.0%                   275

80%          95%          0.05                   90%    0.05   40%            60%            11.0%                  377
85%          95%          0.05                   90%    0.05   40%            60%            9.0%                   309
90%          95%          0.05                   90%    0.05   40%            60%            7.0%                   240
95%          95%          0.05                   90%    0.05   40%            60%            5.0%                   206

30% Positive Rate

Sensitivity  Specificity  Non-Inferiority Limit  Power  Alpha  Positive Rate  Negative Rate  Proportion Discordant  n
80%          90%          0.05                   80%    0.05   30%            70%            13.0%                  322
85%          90%          0.05                   80%    0.05   30%            70%            11.5%                  285
90%          90%          0.05                   80%    0.05   30%            70%            10.0%                  248
95%          90%          0.05                   80%    0.05   30%            70%            8.5%                   210

80%          95%          0.05                   80%    0.05   30%            70%            9.5%                   235
85%          95%          0.05                   80%    0.05   30%            70%            8.0%                   205
90%          95%          0.05                   80%    0.05   30%            70%            6.5%                   168
95%          95%          0.05                   80%    0.05   30%            70%            5.5%                   137

80%          90%          0.05                   85%    0.05   30%            70%            13.0%                  374
85%          90%          0.05                   85%    0.05   30%            70%            11.5%                  331
90%          90%          0.05                   85%    0.05   30%            70%            10.0%                  288
95%          90%          0.05                   85%    0.05   30%            70%            8.5%                   216

80%          95%          0.05                   85%    0.05   30%            70%            9.5%                   274
85%          95%          0.05                   85%    0.05   30%            70%            8.0%                   231
90%          95%          0.05                   85%    0.05   30%            70%            6.5%                   195
95%          95%          0.05                   85%    0.05   30%            70%            5.5%                   159

80%          90%          0.05                   90%    0.05   30%            70%            13.0%                  446
85%          90%          0.05                   90%    0.05   30%            70%            11.5%                  394
90%          90%          0.05                   90%    0.05   30%            70%            10.0%                  343
95%          90%          0.05                   90%    0.05   30%            70%            8.5%                   290

80%          95%          0.05                   90%    0.05   30%            70%            9.5%                   327
85%          95%          0.05                   90%    0.05   30%            70%            8.0%                   277
90%          95%          0.05                   90%    0.05   30%            70%            6.5%                   223
95%          95%          0.05                   90%    0.05   30%            70%            5.5%                   189

20% Positive Rate

Sensitivity  Specificity  Non-Inferiority Limit  Power  Alpha  Positive Rate  Negative Rate  Proportion Discordant  n
80%          90%          0.05                   80%    0.05   20%            80%            12.0%                  300
85%          90%          0.05                   80%    0.05   20%            80%            11.0%                  277
90%          90%          0.05                   80%    0.05   20%            80%            10.0%                  255
95%          90%          0.05                   80%    0.05   20%            80%            9.0%                   227

80%          95%          0.05                   80%    0.05   20%            80%            8.0%                   210
85%          95%          0.05                   80%    0.05   20%            80%            7.0%                   182
90%          95%          0.05                   80%    0.05   20%            80%            6.0%                   162
95%          95%          0.05                   80%    0.05   20%            80%            5.5%                   143

80%          90%          0.05                   85%    0.05   20%            80%            12.0%                  346
85%          90%          0.05                   85%    0.05   20%            80%            11.0%                  317
90%          90%          0.05                   85%    0.05   20%            80%            10.0%                  288
95%          90%          0.05                   85%    0.05   20%            80%            9.0%                   259

80%          95%          0.05                   85%    0.05   20%            80%            8.0%                   231
85%          95%          0.05                   85%    0.05   20%            80%            7.0%                   210
90%          95%          0.05                   85%    0.05   20%            80%            6.0%                   185
95%          95%          0.05                   85%    0.05   20%            80%            5.5%                   159

80%          90%          0.05                   90%    0.05   20%            80%            12.0%                  415
85%          90%          0.05                   90%    0.05   20%            80%            11.0%                  382
90%          90%          0.05                   90%    0.05   20%            80%            10.0%                  343
95%          90%          0.05                   90%    0.05   20%            80%            9.0%                   309

80%          95%          0.05                   90%    0.05   20%            80%            8.0%                   280
85%          95%          0.05                   90%    0.05   20%            80%            7.0%                   245
90%          95%          0.05                   90%    0.05   20%            80%            6.0%                   215
95%          95%          0.05                   90%    0.05   20%            80%            5.5%                   189

10% Positive Rate

Sensitivity  Specificity  Non-Inferiority Limit  Power  Alpha  Positive Rate  Negative Rate  Proportion Discordant  n
80%          90%          0.05                   80%    0.05   10%            90%            11.0%                  275
85%          90%          0.05                   80%    0.05   10%            90%            10.5%                  262
90%          90%          0.05                   80%    0.05   10%            90%            10.0%                  252
95%          90%          0.05                   80%    0.05   10%            90%            9.5%                   250

80%          95%          0.05                   80%    0.05   10%            90%            6.5%                   182
85%          95%          0.05                   80%    0.05   10%            90%            6.0%                   170
90%          95%          0.05                   80%    0.05   10%            90%            5.5%                   158
95%          95%          0.05                   80%    0.05   10%            90%            5.0%                   157

80%          90%          0.05                   85%    0.05   10%            90%            11.0%                  320
85%          90%          0.05                   85%    0.05   10%            90%            10.5%                  302
90%          90%          0.05                   85%    0.05   10%            90%            10.0%                  288
95%          90%          0.05                   85%    0.05   10%            90%            9.5%                   285

80%          95%          0.05                   85%    0.05   10%            90%            6.5%                   207
85%          95%          0.05                   85%    0.05   10%            90%            6.0%                   197
90%          95%          0.05                   85%    0.05   10%            90%            5.5%                   180
95%          95%          0.05                   85%    0.05   10%            90%            5.0%                   180

80%          90%          0.05                   90%    0.05   10%            90%            11.0%                  377
85%          90%          0.05                   90%    0.05   10%            90%            10.5%                  365
90%          90%          0.05                   90%    0.05   10%            90%            10.0%                  352
95%          90%          0.05                   90%    0.05   10%            90%            9.5%                   352

80%          95%          0.05                   90%    0.05   10%            90%            6.5%                   240
85%          95%          0.05                   90%    0.05   10%            90%            6.0%                   230
90%          95%          0.05                   90%    0.05   10%            90%            5.5%                   207
95%          95%          0.05                   90%    0.05   10%            90%            5.0%                   207

Reference

1 https://www.fda.gov/regulatory-information/search-fda-guidance-documents/statistical-guidance-reporting-results-studies-evaluating-diagnostic-tests-guidance-industry-and-fda
