
Statistical Analysis Plan for Medical Device Trials: Endpoints, Estimands, Missing Data, and Sensitivity Analyses

Comprehensive guide to developing a Statistical Analysis Plan (SAP) for medical device clinical trials — ICH E9 and E9(R1) estimand framework, primary and secondary endpoint hierarchy, analysis populations (ITT, PP, safety), missing data strategies (MI, tipping-point), sensitivity and supplementary analyses, multiplicity adjustment, SAP timing and approval, and regulatory expectations from FDA, EMA, and EU MDR for device submissions.

Ran Chen
Global MedTech Expert | 10× MedTech Global Access
2026-04-24 · 34 min read

What Is a Statistical Analysis Plan and Why It Matters

A Statistical Analysis Plan (SAP) is a standalone document that provides a detailed, technical elaboration of the statistical analyses described in the clinical investigation protocol. While the protocol summarizes the intended statistical approach at a high level, the SAP translates that summary into executable specifications: the exact statistical methods, the rules for deriving analysis variables, the handling of missing data, the definitions of analysis populations, and the complete set of tables, listings, and figures that will be produced.

ICH E9, "Statistical Principles for Clinical Trials," states that the SAP should include "detailed procedures for executing the statistical analysis of the primary and secondary variables and other data." This language positions the SAP as the authoritative reference for how trial results will be generated, analyzed, and presented — a role that goes well beyond the protocol's statistical section.

For medical device trials, the SAP carries particular weight because device clinical evidence packages are scrutinized differently from pharmaceutical submissions. Device trials tend to be smaller, more heterogeneous, and more susceptible to confounding factors such as operator experience, learning curves, and site effects. A well-constructed SAP addresses these challenges head-on by pre-specifying the analytical strategies that will handle them, thereby strengthening the credibility of the results when they are submitted to FDA as part of a 510(k), De Novo, or PMA application, or to EU Notified Bodies under the EU MDR conformity assessment.

The SAP also serves a critical regulatory function: it demonstrates that analytical decisions were made before the data were unblinded or examined, which protects the integrity of the statistical conclusions. Post-hoc analyses — those conceived after seeing the data — have a legitimate role in generating hypotheses but cannot support confirmatory claims. The SAP draws this boundary clearly.

SAP Timing and Regulatory Expectations

When the SAP is finalized matters as much as what it contains. Regulatory authorities and ICH guidelines set clear expectations for SAP timing:

For blinded trials: The SAP should be finalized and signed off before database lock. ICH E9 Section 5.1 specifies that the statistical analysis plan should be completed before the blind is broken and any comparative analysis begins. Finalizing the SAP after unblinding — even if the analysis has not yet been run — fundamentally undermines the pre-specified nature of the analysis and is a serious protocol violation.

For open-label and unblinded device trials: Many medical device trials are inherently unblinded — surgeons know which implant they are placing, patients know which device they received. In these cases, the SAP should be finalized before the First Patient First Visit (FPFV), because the investigators and potentially the sponsor have access to accumulating outcome data from the start. The international standard ISO 14155:2020 on clinical investigations of medical devices in human subjects reinforces the importance of pre-specifying the statistical analysis approach, and the forthcoming ISO 14155:2026 edition goes further by explicitly incorporating the ICH E9(R1) estimand framework.

SAP amendments after database lock are permitted only in exceptional circumstances and must be fully documented with detailed justification. The amendment must explain why the change was necessary, demonstrate that it was not motivated by knowledge of the results, and be approved by the same signatories who approved the original SAP. FDA and EMA both expect a clear audit trail for any SAP modifications.

FDA expectations: For Investigational Device Exemption (IDE) applications, FDA expects to see a detailed statistical analysis section in the protocol and often requests the SAP as part of pre-submission discussions (the Q-Submission process). For pivotal trials supporting PMA or De Novo submissions, FDA reviewers will cross-reference the protocol, SAP, and final clinical study report to verify consistency. Discrepancies between these documents generate major deficiencies that can delay approval.

EU MDR and ISO 14155 expectations: Under the EU MDR, clinical investigations must be conducted in accordance with ISO 14155. The standard requires a clinical investigation plan that includes statistical considerations, and the SAP provides the detailed elaboration of those considerations. Notified Bodies reviewing clinical evidence under Annex XIV and Annex XV of the MDR will look for a pre-specified analysis plan as part of the clinical investigation documentation.

SAP Structure and Contents

A comprehensive SAP for a medical device clinical trial typically contains the following sections. This structure aligns with ICH E9 recommendations and FDA expectations for device submissions:

Comprehensive SAP Outline

1. Title Page and Administrative Information

  • Protocol number, SAP version number, and date
  • Signatures of the biostatistician, medical monitor, and sponsor representative
  • Amendment history table (version, date, description of change, justification)

2. Study Overview

  • Study title, protocol number, and sponsor
  • Study objectives (primary, secondary, exploratory) restated from the protocol
  • Study design summary (randomization, blinding, controls, follow-up schedule)
  • Device description and intended use

3. Objectives, Endpoints, and Estimands

  • Mapping of each objective to its corresponding endpoint
  • Estimand definitions per ICH E9(R1) (treatment condition, population, variable, intercurrent event handling, population-level summary)
  • Hierarchy of endpoints (primary, key secondary, other secondary, exploratory)
  • Derived variable definitions and calculation rules

4. Sample Size Justification

  • Reference to protocol sample size section or independent calculation
  • Assumptions used (effect size, variability, power, significance level)
  • Adjustment for dropouts and missing data

5. Analysis Populations

  • Definitions of each analysis set (ITT, mITT, per-protocol, safety, as-treated)
  • Rules for assigning subjects to populations
  • Handling of subjects with protocol deviations

6. Statistical Methods

  • General analytical conventions (significance level, confidence intervals, one-sided vs two-sided tests)
  • Primary endpoint analysis method with full specification
  • Secondary endpoint analysis methods
  • Exploratory endpoint analyses
  • Covariate adjustment strategies
  • Handling of treatment group comparisons

7. Missing Data Handling

  • Distinction between intercurrent events and missing data
  • Primary imputation strategy aligned with the estimand
  • Sensitivity analyses for missing data
  • Tipping-point analysis specifications

8. Sensitivity and Supplementary Analyses

  • Sensitivity analyses to test robustness of primary results
  • Supplementary analyses investigating alternative estimands
  • Per-protocol analysis, as-treated analysis
  • Covariate-adjusted analyses
  • Subgroup analyses (pre-specified only)

9. Interim Analysis Plan (if applicable)

  • Interim analysis objectives and timing
  • Stopping rules and decision criteria
  • Alpha spending function
  • Independent Data Monitoring Committee (IDMC) procedures
  • Procedures for maintaining confidentiality of interim results

10. Multiplicity Adjustments

  • Multiple endpoint adjustment strategy
  • Multiple comparison procedures
  • Fixed-sequence, gatekeeping, or other hierarchical testing approaches
  • Alpha allocation for interim analyses

11. Subgroup Analyses

  • Pre-specified subgroups with justification
  • Statistical methods for subgroup analyses (interaction tests)
  • Limitations and interpretation guidance

12. Safety Analysis

  • Adverse event coding and classification
  • Severity grading criteria
  • Relationship assessment to device
  • Serious adverse event (SAE) analysis
  • Analysis of device deficiencies and complaints
  • Summary tables by system organ class and preferred term

13. Tables, Listings, and Figures (TLF) Specifications

  • Shell tables with row and column headings
  • Listing specifications
  • Figure specifications (Kaplan-Meier plots, forest plots, change-over-time plots)
  • Reference to each TLF by number throughout the SAP

14. Software and Computational Details

  • Statistical software to be used (SAS, R, or other validated software)
  • Version numbers
  • Procedure calls or macros to be used
  • Rounding conventions

Endpoints and the Estimand Framework (ICH E9(R1))

The ICH E9(R1) addendum on estimands and sensitivity analysis, finalized in 2019 and increasingly adopted across drug and device development, fundamentally changes how clinical trial objectives are translated into statistical analyses. Rather than starting with a statistical test, the estimand framework starts with a precise description of what the trial intends to estimate — the treatment effect that the sponsor wishes to quantify.

ISO 14155:2026, the updated edition of the clinical investigation standard for medical devices, explicitly references the estimand framework in its Annex K on clinical investigation design considerations. This signals that the estimand approach is now formally part of the device clinical investigation landscape, not just a pharmaceutical concern.

The Five Attributes of an Estimand

An estimand is defined by five attributes:

  1. Treatment condition: The intervention being evaluated, including the investigational device and any comparators or control treatments. For device trials, this must specify the device configuration, the delivery technique, and any concomitant therapies that are part of the treatment strategy.

  2. Population: The patients or subjects targeted by the clinical question, defined by inclusion and exclusion criteria and any relevant disease or anatomical characteristics.

  3. Endpoint (variable): The clinical outcome to be measured and the timepoint at which it is assessed, including the measurement instrument or assessment method.

  4. Intercurrent event handling: The strategy for addressing events that occur after treatment initiation and can affect the interpretation of the treatment effect — things like treatment discontinuation, use of rescue medication, device removal or revision, switching to a different treatment, or death.

  5. Population-level summary: How the endpoint will be summarized for the population — a mean difference, a risk ratio, a hazard ratio, a difference in proportions, or another summary measure.

Five Strategies for Handling Intercurrent Events

ICH E9(R1) defines five strategies for addressing intercurrent events. The choice of strategy determines both the estimand and the statistical methods needed to estimate it:

| Strategy | Description | Device Scenario |
|---|---|---|
| Treatment policy | The outcome is included in the analysis regardless of whether the intercurrent event occurred. The treatment effect reflects the policy of offering the treatment. | A cardiac stent trial where patients who receive bailout stenting or additional revascularization are still followed and analyzed. The treatment effect represents the real-world consequence of choosing this device. |
| Composite strategy | The intercurrent event is incorporated into the endpoint definition itself. | An orthopedic implant trial where the primary endpoint is defined as "success" only if the patient achieved functional improvement AND did not require revision surgery. Revision surgery makes the outcome a failure by definition. |
| While-on-treatment strategy | Only outcomes observed while the patient is still receiving or affected by the treatment are considered. | A neuromodulation device trial where outcomes are assessed only while the device is active and functioning. If the device is explanted, subsequent outcomes are not part of this estimand. |
| Hypothetical strategy | The treatment effect is estimated under a hypothetical scenario in which the intercurrent event would not have occurred. | An ablation catheter trial where patients who switch to antiarrhythmic drugs are analyzed as if they had not switched, using multiple imputation to estimate what their outcomes would have been on the ablation strategy alone. |
| Principal stratum strategy | The analysis targets the subset of patients who would (or would not) experience the intercurrent event regardless of which treatment they were assigned. | A transcatheter valve trial estimating the effect among patients in whom the device could have been successfully deployed under either assignment, because deployment failure is an intercurrent event whose occurrence can depend on the assigned device. |

How Estimands Reshape SAP Structure

The estimand framework changes the logical flow of a SAP. Instead of:

Objective -> Endpoint -> Statistical Test

The new structure becomes:

Objective -> Estimand (with all five attributes) -> Estimator (statistical method) -> Sensitivity Analyses (aligned to the same estimand) -> Supplementary Analyses (exploring different estimands)

This structure makes the analytical logic transparent. A device trial might have one primary estimand (treatment policy approach, analyzed using ITT) and a supplementary estimand (composite strategy, analyzed using per-protocol), each with its own estimator and sensitivity analyses.

Practical Example: Orthopedic Implant Trial

Consider a randomized trial comparing a new total knee replacement implant to an established competitor. The primary clinical question is: does the new implant provide superior functional improvement at 24 months?

Using the estimand framework, the primary estimand would be defined as:

  • Treatment condition: New implant system with standard surgical technique vs. predicate implant system
  • Population: Adults aged 50-80 with end-stage osteoarthritis indicated for total knee arthroplasty
  • Endpoint: Change in Knee injury and Osteoarthritis Outcome Score (KOOS) from baseline to 24 months
  • Intercurrent events: Treatment policy strategy — patients who undergo revision surgery, switch to contralateral procedures, or receive additional interventions remain in the analysis with their observed outcomes
  • Population-level summary: Difference in mean change from baseline between groups

A supplementary estimand might use the composite strategy:

  • Treatment condition and Population: Same as above
  • Endpoint: Achievement of a minimally clinically important difference (MCID) in KOOS at 24 months, with patients who underwent revision surgery classified as treatment failures
  • Intercurrent events: Composite strategy — revision surgery is built into the endpoint as a failure
  • Population-level summary: Difference in proportions achieving the MCID, with revision patients counted as non-responders

Both estimands address the same general question but from different perspectives. The first reflects the practical consequence of choosing the device; the second isolates the intrinsic performance of the device itself.
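Because every estimand must pin down the same five attributes, it can help to hold them in a structured record while drafting the SAP, so that no attribute is left implicit. A minimal sketch in Python — the `Estimand` class and its field names are illustrative, not from any standard or library:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Estimand:
    """The five ICH E9(R1) estimand attributes, pre-specified in the SAP."""
    treatment_condition: str
    population: str
    endpoint: str
    # one of: treatment policy, composite, while-on-treatment,
    # hypothetical, principal stratum
    intercurrent_event_strategy: str
    population_level_summary: str

# Primary estimand from the knee implant example above
primary = Estimand(
    treatment_condition="New implant system vs. predicate implant system",
    population="Adults aged 50-80 with end-stage osteoarthritis indicated for TKA",
    endpoint="Change in KOOS from baseline to 24 months",
    intercurrent_event_strategy="treatment policy",
    population_level_summary="difference in mean change from baseline",
)
print(primary.intercurrent_event_strategy)  # treatment policy
```

Writing the supplementary estimand as a second instance of the same record makes the differences between the two explicit and reviewable.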

Analysis Populations

The definition of analysis populations is one of the most consequential decisions in the SAP because it determines which subjects contribute to each analysis and directly affects the interpretation of results. ICH E9 identifies several standard analysis sets, and the choice of primary analysis population depends on the study objective.

Definitions and Regulatory Context

Intention-to-Treat (ITT) Population: All subjects who were randomized, regardless of whether they received the assigned treatment, completed the study, or adhered to the protocol. The ITT principle preserves the benefits of randomization and is the primary analysis population for superiority trials. For device trials, this means subjects randomized to the investigational device are analyzed in that group even if they crossed over to control, received a different device, or withdrew before treatment.

Modified ITT (mITT) Population: A subset of the ITT population defined by additional criteria, most commonly "randomized subjects who received at least one application of the investigational device" or "randomized subjects for whom the index procedure was attempted." The mITT is acceptable when there is a clear rationale for excluding randomized subjects who never had the opportunity to be affected by the treatment — for example, subjects randomized to a surgical device who were taken to the operating room but found to have anatomy that precluded device placement.

Per-Protocol (PP) Population: The subset of ITT subjects who completed the study without major protocol deviations, received the assigned treatment as specified, and had adequate data for the primary endpoint assessment. The PP analysis is important for non-inferiority trials, where protocol deviations in the ITT population can bias results toward non-inferiority by diluting the treatment effect.

Safety Population: All subjects who received any amount of the investigational device or control treatment, regardless of randomization assignment. This is typically the primary population for safety analyses. For device trials, "received" may be defined as having the device implanted, applied, or used during the procedure.

As-Treated Population: Subjects are analyzed according to the treatment they actually received, not the treatment to which they were randomized. This population is used for supplementary analyses to explore whether the actual treatment received, rather than the assigned treatment, drove the observed outcomes.

Choosing the Primary Analysis Population

| Study Objective | Primary Population | Supportive Population | Rationale |
|---|---|---|---|
| Superiority | ITT | PP, mITT | ITT preserves randomization and provides a conservative bias for superiority; PP confirms robustness |
| Non-inferiority | ITT and PP (co-primary) | mITT, as-treated | Both populations are essential because ITT can bias toward non-inferiority while PP can bias against it; consistent results across both strengthen conclusions |
| Safety assessment | Safety | ITT | Safety population captures all device exposures; ITT supports intention-to-treat safety assessment |
| Single-arm device study | mITT or full analysis set | Per-protocol | No randomization to preserve; mITT captures treated patients; PP assesses the effect in ideal adherers |

Decision Rules: Handling Protocol Deviations in Analysis Populations

The SAP should include explicit rules for how subjects with specific protocol deviations are handled in each analysis population:

| Deviation Type | ITT | mITT | PP | Safety |
|---|---|---|---|---|
| Randomized but not treated | Included | Excluded | Excluded | Excluded |
| Received wrong device | Analyzed as randomized | Analyzed as randomized | Excluded | Analyzed as treated |
| Major inclusion/exclusion violation | Included | Included | Excluded | Included |
| Premature discontinuation (device-related) | Included | Included | Excluded | Included |
| Premature discontinuation (not device-related) | Included | Included | Case-by-case | Included |
| Missed primary endpoint visit | Included (with imputation if applicable) | Included | Excluded | Included |
| Site with major GCP findings | Pre-specified rule in SAP | Pre-specified rule in SAP | Typically excluded | Pre-specified rule in SAP |
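Rules like these can be expressed as a single assignment function, which makes the population logic testable before database lock. A simplified sketch — the four boolean flags are illustrative stand-ins for a real SAP's enumerated list of deviations:

```python
def analysis_populations(randomized: bool, treated: bool,
                         major_deviation: bool,
                         device_related_dropout: bool) -> set:
    """Assign a subject to analysis sets using simplified deviation rules.

    Flag names are illustrative; a real SAP enumerates each deviation type
    and its handling per population explicitly.
    """
    pops = set()
    if randomized:
        pops.add("ITT")                  # all randomized subjects
        if treated:
            pops.add("mITT")             # randomized and treated
            if not (major_deviation or device_related_dropout):
                pops.add("PP")           # treated per protocol
    if treated:
        pops.add("Safety")               # any device exposure, as treated
    return pops

# Randomized but never treated: contributes to ITT only
print(analysis_populations(True, False, False, False))  # {'ITT'}
```

Encoding the rules once, then deriving each analysis set from the same function, avoids the inconsistencies that arise when each population is programmed separately.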

Statistical Methods for Primary and Secondary Endpoints

The statistical methods section is the technical core of the SAP. Each endpoint must have a fully specified analytical method, including the model, covariates, estimation approach, and hypothesis testing framework.

Continuous Endpoints

Continuous endpoints — such as change in a clinical score, measurement of a physiological parameter, or patient-reported outcome measure — are common in device trials.

| Method | When to Use | Key Considerations |
|---|---|---|
| Two-sample t-test | Comparison of means between two groups with normally distributed data | Check normality assumption; consider non-parametric alternatives if violated |
| ANCOVA | Comparison of adjusted means controlling for baseline score and other prognostic covariates | Preferred over the t-test when baseline measurements are available; increases statistical power; specify covariates in advance |
| Mixed Model for Repeated Measures (MMRM) | Longitudinal data with measurements at multiple timepoints; handles missing data under the missing-at-random assumption | Accounts for within-subject correlation; does not require imputation of missing values; specify the covariance structure (unstructured, compound symmetry, autoregressive) |
| Linear regression | Adjusting for multiple covariates; predicting a continuous outcome | Specify all covariates a priori; check model assumptions (linearity, homoscedasticity, normality of residuals) |

For device trials, ANCOVA adjusting for baseline score and important prognostic factors is often the primary analysis for continuous endpoints. MMRM is preferred when there are multiple post-baseline measurements over time.
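To make the ANCOVA specification concrete, the sketch below fits change = b0 + b1·baseline + b2·group by least squares on simulated data, where b2 is the covariate-adjusted treatment effect. In practice this model would run in validated software (e.g., SAS or R); the hand-rolled solver and the simulated numbers are only illustrations of the model being pre-specified:

```python
import random

def solve3(A, b):
    """Gauss-Jordan elimination with partial pivoting for a 3x3 system."""
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(3):
        piv = max(range(col, 3), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(3):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [a - f * p for a, p in zip(M[r], M[col])]
    return [M[i][3] / M[i][i] for i in range(3)]

def ancova(baseline, change, group):
    """Least-squares fit of change = b0 + b1*baseline + b2*group.

    b2 is the covariate-adjusted treatment effect (group coded 0/1).
    """
    X = [[1.0, x, g] for x, g in zip(baseline, group)]
    XtX = [[sum(r[i] * r[j] for r in X) for j in range(3)] for i in range(3)]
    Xty = [sum(r[i] * y for r, y in zip(X, change)) for i in range(3)]
    return solve3(XtX, Xty)

# Simulated trial with a true adjusted treatment effect of +5 points
random.seed(1)
baseline = [random.gauss(50, 10) for _ in range(200)]
group = [i % 2 for i in range(200)]
change = [0.4 * b + 5 * g + random.gauss(0, 4) for b, g in zip(baseline, group)]
b0, b1, b2 = ancova(baseline, change, group)
print(round(b2, 1))  # adjusted treatment effect estimate (true value 5)
```

The point of adjusting for baseline is visible here: b2 estimates the between-group difference after removing the variation explained by the baseline score, which is why ANCOVA gains power over a simple t-test on change scores.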

Binary and Categorical Endpoints

Binary outcomes — success/failure, responder/non-responder, complication present/absent — are the most common endpoint type in device trials.

| Method | When to Use | Key Considerations |
|---|---|---|
| Chi-square test | Comparing proportions between two groups with adequate sample size (expected cell counts greater than 5) | Simple and well understood; does not adjust for covariates |
| Fisher's exact test | Comparing proportions with small sample sizes or sparse cells | Appropriate when chi-square assumptions are not met; use for small device trials |
| Logistic regression | Comparing proportions while adjusting for covariates; modeling the probability of a binary outcome | Specify covariates a priori; report odds ratios with confidence intervals; check for separation issues in small samples |
| Cochran-Mantel-Haenszel (CMH) test | Comparing proportions while stratifying by a categorical variable (often study site) | Useful in multicenter device trials where site effects need to be controlled; specify stratification factors in advance |
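For a 2x2 success/failure table, the Pearson chi-square statistic has a closed form, and with 1 degree of freedom its p-value can be derived from the normal distribution (chi-square on 1 df is the square of a standard normal deviate). A stdlib-only sketch with invented counts; production analyses would use validated routines:

```python
from math import sqrt
from statistics import NormalDist

def chi2_2x2(a, b, c, d):
    """Pearson chi-square statistic and 1-df p-value for the 2x2 table
    [[a, b], [c, d]] (e.g., success/failure by treatment group)."""
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For 1 df, the chi-square variate is the square of a standard normal
    p = 2 * (1 - NormalDist().cdf(sqrt(stat)))
    return stat, p

# Invented data: 30/50 successes on device vs 20/50 on control
stat, p = chi2_2x2(30, 20, 20, 30)
print(round(stat, 2), round(p, 3))  # 4.0 0.046
```

When expected cell counts fall below 5 — common in small device trials — the SAP would switch to Fisher's exact test, as the table above notes.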

Time-to-Event Endpoints

Time-to-event analysis is used for endpoints such as time to device failure, time to re-intervention, time to target vessel revascularization, or survival.

| Method | When to Use | Key Considerations |
|---|---|---|
| Kaplan-Meier estimator | Estimating the survival or event-free probability over time | Report median time-to-event with confidence intervals if the event rate is sufficient; display graphically |
| Log-rank test | Comparing survival curves between two or more groups | Standard test for comparing time-to-event distributions; does not adjust for covariates |
| Cox proportional hazards model | Comparing hazard rates while adjusting for covariates | Check the proportional hazards assumption; report hazard ratios with confidence intervals; specify covariates in the SAP |
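The Kaplan-Meier estimator itself is simple to state: at each observed event time, multiply the running event-free probability by the fraction of the risk set that remains event-free, with censored subjects leaving the risk set without triggering a step. A sketch with illustrative follow-up data:

```python
def kaplan_meier(times, events):
    """Kaplan-Meier event-free probability curve.

    times: follow-up time per subject; events: 1 = event observed, 0 = censored.
    Returns a list of (event_time, survival_estimate) steps.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv, curve = 1.0, []
    i = 0
    while i < len(data):
        t = data[i][0]
        d = sum(e for tt, e in data if tt == t)   # events at this time
        m = sum(1 for tt, _ in data if tt == t)   # subjects leaving the risk set
        if d > 0:
            surv *= (n_at_risk - d) / n_at_risk
            curve.append((t, surv))
        n_at_risk -= m
        i += m
    return curve

# Invented data, 6 subjects: events at months 3 and 9, censoring elsewhere
curve = kaplan_meier([3, 5, 7, 9, 12, 12], [1, 0, 0, 1, 0, 0])
print(curve)  # survival steps down at the two event times
```

The curve steps only at event times; censoring shrinks the denominator for later steps, which is exactly why Kaplan-Meier handles incomplete follow-up without discarding censored subjects.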

Non-Inferiority Testing

Non-inferiority trials are common in medical device development — demonstrating that a new device is not meaningfully worse than an established control. Two primary methods are used:

Fixed margin method (also called the two confidence interval method): A pre-specified non-inferiority margin (delta) is established based on historical evidence of the active control's effect. The new device is declared non-inferior if the confidence interval for the treatment difference excludes the margin. FDA typically expects the margin to preserve a fraction (often 50%) of the control's historical treatment effect.

Synthesis method: The historical data and the current trial data are combined to test whether the new device retains a sufficient fraction of the control effect, without explicitly pre-specifying a fixed margin. This method is less commonly used in device submissions but is accepted by both FDA and EMA when properly justified.
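Under the fixed margin method, the decision rule reduces to checking one confidence bound. A sketch for a difference in success proportions using a simple Wald interval — the counts and the 10-point margin are invented, and a real SAP would name the exact interval method (e.g., Farrington-Manning) to be used:

```python
from math import sqrt
from statistics import NormalDist

def noninferiority_test(x_new, n_new, x_ctrl, n_ctrl, margin, alpha=0.05):
    """Fixed-margin non-inferiority check for a difference in proportions.

    Declares non-inferiority if the lower bound of the two-sided (1 - alpha)
    Wald confidence interval for (p_new - p_ctrl) lies above -margin.
    """
    p1, p2 = x_new / n_new, x_ctrl / n_ctrl
    diff = p1 - p2
    se = sqrt(p1 * (1 - p1) / n_new + p2 * (1 - p2) / n_ctrl)
    z = NormalDist().inv_cdf(1 - alpha / 2)
    lower = diff - z * se
    return diff, lower, lower > -margin

# Invented data: 92% vs 94% success with a 10-percentage-point margin
diff, lower, ni = noninferiority_test(184, 200, 188, 200, margin=0.10)
print(round(lower, 3), ni)
```

Here the observed difference is -2 points and the lower confidence bound is about -7 points, which stays above the -10-point margin, so non-inferiority would be concluded under these assumptions.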

Device-Specific Analytical Considerations

Medical device trials introduce statistical challenges that are less common in drug trials:

  • Learning curve effects: Early cases at a site may have different outcomes than later cases as operators gain experience with the device. The SAP may pre-specify sensitivity analyses excluding the first N cases per site or including a learning curve covariate.
  • Operator/surgeon effects: Outcomes may vary by operator skill, volume, or experience. Mixed models with random effects for operator or site can account for this clustering.
  • Clustering by site: Multicenter device trials should account for within-site correlation, particularly when site volume varies substantially. Generalized estimating equations (GEE) or random-effects models handle this.
  • Device configuration or sizing: If the device comes in multiple sizes or configurations, the SAP should specify whether these are pooled or analyzed separately.
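A learning-curve sensitivity analysis of the kind described above can be pre-specified as a simple filtering rule: drop the first N cases per site, ordered by procedure date. An illustrative sketch with invented case records:

```python
def exclude_learning_curve(cases, n_roll_in):
    """Learning-curve sensitivity set: drop the first n_roll_in cases
    per site, ordered by procedure date.

    `cases` is a list of (site_id, procedure_date, subject_id) tuples;
    the tuple layout is illustrative.
    """
    kept, count = [], {}
    for site, date, subject in sorted(cases, key=lambda c: (c[0], c[1])):
        count[site] = count.get(site, 0) + 1
        if count[site] > n_roll_in:       # keep only post-roll-in cases
            kept.append(subject)
    return kept

cases = [("S01", "2026-01-05", "P1"), ("S01", "2026-01-12", "P2"),
         ("S01", "2026-02-01", "P3"), ("S02", "2026-01-20", "P4"),
         ("S02", "2026-03-02", "P5")]
print(exclude_learning_curve(cases, n_roll_in=1))  # ['P2', 'P3', 'P5']
```

The value of N and the ordering rule must be fixed in the SAP before unblinding; choosing them after seeing the data would turn a sensitivity analysis into a post-hoc one.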

Missing Data Strategies

Missing data is an inevitable challenge in clinical trials, and the SAP must specify how it will be handled. ICH E9(R1) makes a critical distinction that reshapes the missing data landscape: intercurrent events are not the same as missing data, and the two require different strategies.

Intercurrent Events vs. Missing Data

An intercurrent event is an event that occurs after treatment initiation and affects the interpretation of the treatment effect — treatment discontinuation, device removal, use of rescue therapy, or death. These are clinical events, not data artifacts, and the estimand framework addresses them through the five strategies described earlier.

Missing data refers to the absence of endpoint measurements that should have been collected but were not — because the subject missed a follow-up visit, withdrew consent, was lost to follow-up, or died (when death is not part of the endpoint). Missing data is a data limitation, not a clinical event.

A subject may experience an intercurrent event but still have complete outcome data (a patient whose device was removed but who returned for the 12-month assessment). Conversely, a subject may have no intercurrent events but have missing outcome data (a patient who was doing well but moved away and could not be reached for the final visit).

Primary Analysis Approach Depends on the Estimand

The handling of missing data in the primary analysis must be consistent with the estimand:

  • For a treatment policy estimand, missing data is a measurement problem. Multiple imputation or mixed models can be used under missing-at-random assumptions, because the estimand includes all patients regardless of intercurrent events.
  • For a composite estimand, missing data may be handled by defining patients with missing outcomes as failures, because the composite endpoint already incorporates unfavorable outcomes.
  • For a hypothetical estimand, missing data after intercurrent events must be imputed under the hypothetical scenario (e.g., what would the outcome have been had the patient not switched treatments?).

Common Missing Data Methods

| Method | Description | Recommendation |
|---|---|---|
| Complete case analysis | Analyze only subjects with observed data for the endpoint of interest | Simple but can introduce bias if missingness is related to outcome; generally not recommended as the primary analysis |
| Multiple imputation (MI) | Create multiple datasets with imputed values drawn from a predictive distribution, analyze each, and pool results using Rubin's rules | Recommended for many device trials; specify the imputation model and number of imputations (at least 20); align with the estimand |
| Mixed models (MMRM) | Implicitly handle missing data under the missing-at-random assumption without requiring explicit imputation | Preferred for longitudinal continuous endpoints; specify the covariance structure |
| Last observation carried forward (LOCF) | The last observed value is used in place of all subsequent missing values | Largely discouraged by FDA and EMA for confirmatory analyses; can bias results in either direction; acceptable only as a sensitivity analysis |
| Baseline observation carried forward (BOCF) | The baseline value is carried forward for all missing post-baseline measurements | Conservative approach that assumes no treatment benefit; sometimes used as a sensitivity analysis in non-inferiority trials |
| Tipping-point analysis | Systematically varies the imputed values for missing data from favorable to unfavorable until the conclusion changes | Strongly recommended as a sensitivity analysis; demonstrates the robustness of conclusions to missing data assumptions |
| Pattern-mixture models | Stratify subjects by their missing data pattern and estimate treatment effects within each pattern | Useful for exploring departures from the missing-at-random assumption |
| Selection models | Jointly model the outcome and the probability of being missing | Technically rigorous but requires strong assumptions; used less frequently in practice |
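Rubin's rules, referenced in the multiple imputation row above, pool the per-imputation point estimates and combine within-imputation and between-imputation variance into one standard error. A minimal sketch with invented numbers:

```python
from math import sqrt

def rubin_pool(estimates, variances):
    """Pool m multiply-imputed analyses with Rubin's rules.

    estimates: treatment-effect estimate from each imputed dataset.
    variances: squared standard error from each imputed dataset.
    Returns the pooled estimate and its total standard error.
    """
    m = len(estimates)
    q_bar = sum(estimates) / m                    # pooled point estimate
    u_bar = sum(variances) / m                    # within-imputation variance
    b = sum((q - q_bar) ** 2 for q in estimates) / (m - 1)  # between-imputation
    total_var = u_bar + (1 + 1 / m) * b
    return q_bar, sqrt(total_var)

# Invented treatment-effect estimates from 5 imputed datasets
est, se = rubin_pool([2.1, 1.8, 2.4, 2.0, 2.2], [0.25, 0.24, 0.26, 0.25, 0.25])
print(round(est, 2), round(se, 3))  # 2.1 0.557
```

The between-imputation term is what makes MI honest about missing data: the more the imputed datasets disagree, the wider the pooled confidence interval becomes.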

Tipping-Point Analysis in Practice

Tipping-point analysis is increasingly expected by regulatory reviewers as a sensitivity analysis for missing data. The approach works as follows:

  1. Identify subjects with missing primary endpoint data in each treatment group.
  2. Impute the missing values using the primary imputation method (e.g., multiple imputation).
  3. Systematically shift the imputed values for subjects in the treatment group to progressively worse values (or shift the control group to better values, or both).
  4. At each shift value, re-run the primary analysis and record whether the conclusion changes.
  5. The tipping point is the value of the shift at which the conclusion changes from statistically significant to non-significant (or vice versa).

If the tipping point requires an implausibly large shift — for example, all missing subjects in the treatment group would need to have outcomes worse than 99% of observed subjects — then the primary conclusion is robust to missing data.
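The five steps above can be sketched end-to-end. This toy version imputes missing treatment-group values at the observed treatment-group mean (standing in for the primary MI estimate), shifts them progressively downward, and reports the first shift at which an approximate two-sample z-test loses significance; all data are invented and the test is deliberately simplified:

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

def two_sample_z(x, y):
    """Approximate two-sample z-test p-value for a difference in means."""
    se = sqrt(stdev(x) ** 2 / len(x) + stdev(y) ** 2 / len(y))
    z = (mean(x) - mean(y)) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))

def tipping_point(trt_obs, ctl_obs, n_trt_missing, shifts):
    """Shift imputed treatment-group values downward until significance is lost.

    Missing values start at the observed treatment-group mean and are
    lowered by each candidate shift; returns the first shift at which
    the re-run analysis is no longer significant at the 0.05 level.
    """
    for shift in shifts:
        imputed = [mean(trt_obs) - shift] * n_trt_missing
        p = two_sample_z(trt_obs + imputed, ctl_obs)
        if p >= 0.05:
            return shift          # first shift that overturns significance
    return None                   # conclusion robust over the whole grid

trt = [12, 15, 14, 16, 13, 15, 14, 17, 13, 16, 15, 14]
ctl = [10, 9, 11, 10, 8, 12, 9, 10, 11, 9, 10, 11]
tip = tipping_point(trt, ctl, n_trt_missing=3, shifts=range(0, 21))
print(tip)  # size of the downward shift needed to overturn the result
```

Interpreting the output follows the logic in the text: if the tipping shift is far outside the range of plausible outcomes for the missing subjects, the primary conclusion is robust to the missing data assumptions.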

Practical Recommendations for Missing Data

The best strategy for missing data is to prevent it through thoughtful trial design: short follow-up periods, patient-friendly visit schedules, retention strategies, and clear communication about the importance of completing the study. When missing data is unavoidable, the SAP should pre-specify a primary approach aligned with the estimand and a set of sensitivity analyses that test the robustness of conclusions under different assumptions about the missing data mechanism.

Sensitivity and Supplementary Analyses

ICH E9(R1) requires that the primary estimator be accompanied by sensitivity analyses that test the robustness of the primary result to deviations from the assumptions underlying the primary analysis. The distinction between sensitivity analyses and supplementary analyses is important and often misunderstood.

Sensitivity vs. Supplementary Analyses

Sensitivity analyses test the robustness of the primary result by varying the assumptions of the primary analysis while remaining aligned to the same estimand. If the primary analysis uses multiple imputation under a missing-at-random assumption, a sensitivity analysis might use tipping-point analysis to explore how robust the conclusion is to departures from missing-at-random. Sensitivity analyses should not change the clinical question — they address the same estimand with different analytical approaches.

Supplementary analyses provide additional perspectives by addressing different estimands, investigating different populations, or using different endpoint definitions. A per-protocol analysis, an as-treated analysis, or an analysis of a composite endpoint that incorporates device revisions are supplementary analyses because they address different clinical questions from the primary estimand.

Recommended Sensitivity Analyses by Trial Type

| Trial Type | Primary Analysis | Sensitivity Analyses | Supplementary Analyses |
| --- | --- | --- | --- |
| Superiority (randomized) | ITT, ANCOVA with multiple imputation | MMRM without imputation; tipping-point analysis; different covariate sets; LOCF; complete case | Per-protocol; as-treated; subgroup analyses; site-adjusted analysis |
| Non-inferiority (randomized) | ITT and PP (co-primary), fixed margin | Tipping-point analysis; different imputation models; BOCF for NI margin sensitivity | As-treated; per-protocol (if not co-primary); historical control comparison |
| Single-arm device study | mITT, one-sample test vs. performance goal | Different imputation strategies; complete case; worst-case imputation for missing failures | Per-protocol; sensitivity to performance goal choice; benchmarking against historical data |
| Diagnostic accuracy study | Sensitivity and specificity vs. reference standard | Different positivity thresholds (ROC analysis); reader variability analysis | Per-reader analysis; subgroup by disease severity; paired comparison of index test vs. comparator |
| Registry-based study | Pre-specified cohort with propensity score adjustment | Different propensity score models; inverse probability weighting; instrumental variable analysis | Subgroup analyses; sensitivity to unmeasured confounding |
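For the single-arm row in the table above, the one-sample test against a performance goal is simple enough to sketch directly. The counts and the 85% goal are hypothetical, and the normal approximation stands in for the exact binomial test an SAP might actually pre-specify:

```python
import math
from statistics import NormalDist

def one_sample_prop_test(successes, n, performance_goal):
    """One-sided z-test that the true success rate exceeds a
    pre-specified performance goal (normal approximation under H0)."""
    p_hat = successes / n
    se = math.sqrt(performance_goal * (1 - performance_goal) / n)
    z = (p_hat - performance_goal) / se
    return z, 1.0 - NormalDist().cdf(z)

# Hypothetical single-arm study: 162/180 successes against an 85% goal
z, p = one_sample_prop_test(162, 180, 0.85)
print(f"z = {z:.2f}, one-sided p = {p:.4f}")
```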

Specific Sensitivity Analyses to Pre-Specify

The SAP should include the following sensitivity analyses where applicable:

  • Alternative missing data approaches: At minimum, one approach that makes a different assumption about the missing data mechanism than the primary analysis.
  • Per-protocol analysis: Excluding subjects with major protocol deviations, to assess whether the treatment effect is robust when the analysis is restricted to subjects who received treatment as intended.
  • Covariate-adjusted analyses: If the primary analysis does not adjust for covariates, a sensitivity analysis with covariate adjustment (and vice versa).
  • Site-adjusted analyses: Including site as a fixed or random effect in multicenter trials, to assess whether site heterogeneity affects results.
  • Learning curve sensitivity: Excluding early cases per site or per operator, to assess whether operator learning affects the treatment effect estimate.
  • Alternative endpoint definitions: Using a different threshold for defining responders, or a different timepoint for the primary assessment.

Multiplicity Adjustments

Multiplicity arises when a clinical trial involves multiple statistical tests, and the chance of a false positive conclusion (Type I error) increases with each additional test. The SAP must specify how the overall Type I error rate will be controlled.

When Multiplicity Adjustment Is Needed

Multiplicity adjustment is required in the following scenarios:

  • Multiple primary endpoints: If a trial has two or more co-primary endpoints, all must be statistically significant for the trial to be considered successful. No alpha adjustment is needed in this case, because requiring every endpoint to succeed controls the familywise Type I error rate conservatively — but the penalty is paid in power, since the probability of winning on all endpoints simultaneously can be far lower than for any single endpoint.
  • Multiple comparisons: Comparing the investigational device to multiple controls, or testing at multiple dose levels or device configurations.
  • Multiple timepoints: If the primary endpoint is assessed at multiple timepoints and a significant result at any timepoint would support a claim of effectiveness.
  • Interim analyses: Each interim look at the data consumes alpha, requiring adjustment through an alpha spending function.

Common Multiplicity Adjustment Methods

| Method | Description | When to Use |
| --- | --- | --- |
| Bonferroni correction | Divides the significance level equally among all tests (alpha / number of tests) | Simple and conservative; appropriate when the number of tests is small |
| Holm procedure | A step-down procedure that tests the smallest p-value against alpha/k, the next smallest against alpha/(k-1), and so on | Uniformly more powerful than Bonferroni; controls familywise error rate; easy to implement |
| Hochberg procedure | A step-up procedure that is more powerful than Holm when test statistics are positively correlated | Appropriate when endpoints are expected to be positively correlated |
| Gatekeeping strategy | Tests are ordered in a sequence; secondary endpoints are tested only if the primary endpoint is significant | Device trials commonly use this approach with a hierarchy: primary -> key secondary -> other secondary |
| Fixed-sequence testing | Endpoints are tested in a pre-specified order at the full alpha level; testing stops at the first non-significant result | Simple and intuitive; commonly used in PMA pivotal trials |
| Alpha spending functions | Allocates portions of the total alpha to each interim analysis | Required for trials with interim analyses; O'Brien-Fleming and Pocock boundaries are the most common choices |
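Of the methods in the table, the Holm procedure is short enough to implement directly. A minimal sketch, applied to hypothetical p-values for a four-endpoint family:

```python
def holm_adjust(pvalues, alpha=0.05):
    """Holm step-down procedure: returns reject/fail-to-reject
    decisions in the original order of the input p-values."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    reject = [False] * m
    for rank, i in enumerate(order):
        # Smallest p tested at alpha/m, next at alpha/(m-1), and so on
        if pvalues[i] <= alpha / (m - rank):
            reject[i] = True
        else:
            break  # step-down: stop at the first non-rejection
    return reject

# Hypothetical primary plus three secondary endpoint p-values
print(holm_adjust([0.009, 0.04, 0.012, 0.2]))
# The 0.04 fails its Holm threshold (0.025), so it and everything
# larger are not rejected, even though 0.04 < 0.05 unadjusted.
```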

Alpha Spending for Interim Analyses

When a device trial includes an interim analysis — for early efficacy, futility, or safety — the SAP must specify how alpha is spent:

  • O'Brien-Fleming boundaries: Conservative at early interim analyses, requiring very strong evidence to stop early, while spending most of the alpha at the final analysis. This is the most common choice for device pivotal trials because it preserves statistical power for the final analysis.
  • Pocock boundaries: Use the same significance threshold at each interim analysis and the final analysis, but the threshold is more stringent than 0.05. This approach makes it easier to stop early but reduces power at the final analysis.
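The shape of the O'Brien-Fleming family can be seen from the classic approximation z_k = c * sqrt(K / k) for look k of K. The sketch below uses c = z_{1-alpha/2}, which slightly understates the exact boundaries; a production SAP would cite exact values computed by validated group-sequential software (e.g., rpact, gsDesign, or East):

```python
import math
from statistics import NormalDist

def obf_approx_boundaries(n_looks, alpha=0.05):
    """Approximate O'Brien-Fleming critical z-values, z_k = c*sqrt(K/k).

    c = z_{1 - alpha/2} is the textbook shortcut; the exact constant
    comes from a multivariate normal computation."""
    c = NormalDist().inv_cdf(1 - alpha / 2)
    K = n_looks
    return [c * math.sqrt(K / k) for k in range(1, K + 1)]

# Three equally spaced looks: very stringent early, near-nominal at the end
print([round(z, 2) for z in obf_approx_boundaries(3)])
```

The monotone drop from roughly 3.4 at the first look toward 1.96 at the final analysis is exactly why O'Brien-Fleming boundaries preserve power for the final analysis, in contrast to the flat (and uniformly stricter) Pocock thresholds.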

The SAP should also specify who has access to unblinded interim results (typically an Independent Data Monitoring Committee), the exact timing of interim analyses (in terms of number of subjects or events), and the decision rules for each possible outcome (stop for efficacy, stop for futility, continue).

SAP for Different Device Trial Types

The rigor and formality of the SAP varies by the type of device trial. The following table summarizes SAP requirements and expectations for common device trial types:

| Trial Type | SAP Formality | Key SAP Features | Regulatory Scrutiny |
| --- | --- | --- | --- |
| Pivotal PMA trial | Most formal; detailed standalone document | Pre-specified primary endpoint with detailed analysis method; full multiplicity plan; comprehensive sensitivity analyses; interim analysis plan if applicable | Highest — FDA Division summary review and panel presentation |
| De Novo clinical study | Formal; standalone SAP recommended | Primary endpoint with effect size justification; non-inferiority or superiority framework; covariate adjustment plan | High — FDA Office of Device Evaluation review |
| 510(k) clinical study | Moderate; can be embedded in protocol or standalone | Clearly defined endpoints and analysis populations; basic sensitivity analyses | Moderate — clinical data is one element of substantial equivalence |
| PMCF study (EU MDR) | Less formal but still pre-specified | Pre-specified analysis plan to prevent data dredging; comparison to performance goals or historical benchmarks | Moderate to high — Notified Body review under Annex XIV Part B |
| Diagnostic accuracy study | Formal standalone SAP | Sensitivity, specificity, ROC analysis; paired comparison design; reader agreement analysis; sample size based on expected accuracy | High — FDA Division of Radiological Health or corresponding division |
| Registry-based study | Moderate; analysis plan published or documented | Propensity score or other confounding adjustment; sensitivity to unmeasured confounding; pre-specified covariates | Variable — depends on whether results support regulatory submission |

Pivotal PMA Trials

Pivotal trials for Premarket Approval applications have the most rigorous SAP requirements. FDA expects the primary endpoint, statistical hypotheses, sample size justification, analysis populations, primary analysis method, multiplicity adjustment strategy, and sensitivity analyses to be fully pre-specified and documented. The SAP is typically reviewed during the IDE process and may be discussed during pre-submission meetings.

The FDA's acceptance of the primary endpoint and success criteria is established before the trial begins. Changes to the primary endpoint after trial initiation — even if documented in an SAP amendment — will face intense regulatory scrutiny and may render the trial unpersuasive.

Diagnostic Accuracy Studies

Diagnostic device trials require specific statistical methods that differ from therapeutic device trials:

  • Sensitivity and specificity against a reference standard, with exact binomial confidence intervals
  • Receiver Operating Characteristic (ROC) curve analysis, including the area under the curve (AUC) and the selection of the optimal positivity threshold
  • Positive and negative predictive values, reported at clinically relevant prevalence rates
  • Paired comparisons when each subject receives both the index test and the reference standard
  • Inter-reader variability analysis using kappa statistics or intraclass correlation coefficients when multiple readers interpret the test results

The SAP for a diagnostic accuracy study should specify the reference standard, the rules for resolving discordant results, the handling of indeterminate or uninterpretable test results, and the statistical methods for comparing the index test to a comparator.
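The first item above — sensitivity and specificity with exact binomial intervals — can be sketched in a few lines. The 2x2 counts are invented, and the Clopper-Pearson interval is computed from the beta quantile (here via scipy, which is assumed to be available):

```python
from scipy.stats import beta

def clopper_pearson(x, n, conf=0.95):
    """Exact (Clopper-Pearson) binomial confidence interval for x/n."""
    a = 1 - conf
    lo = beta.ppf(a / 2, x, n - x + 1) if x > 0 else 0.0
    hi = beta.ppf(1 - a / 2, x + 1, n - x) if x < n else 1.0
    return lo, hi

# Hypothetical 2x2 against the reference standard
tp, fn, tn, fp = 88, 7, 140, 12
sens, spec = tp / (tp + fn), tn / (tn + fp)
sens_ci = clopper_pearson(tp, tp + fn)
spec_ci = clopper_pearson(tn, tn + fp)
print(f"sensitivity {sens:.3f} (95% CI {sens_ci[0]:.3f}-{sens_ci[1]:.3f})")
print(f"specificity {spec:.3f} (95% CI {spec_ci[0]:.3f}-{spec_ci[1]:.3f})")
```

Exact intervals matter here because diagnostic accuracy claims often rest on the lower confidence bound clearing a pre-specified acceptance criterion, where the normal approximation can be anticonservative near the boundary.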

Registry-Based Studies

Registries and real-world data studies present unique challenges because the data are collected outside a controlled experimental setting. The SAP must address confounding by pre-specifying the adjustment strategy (propensity score matching, inverse probability of treatment weighting, or multivariable regression), the covariates to be included, the sensitivity analyses for unmeasured confounding, and the handling of missing covariate data.
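Of these adjustment strategies, inverse probability of treatment weighting is the easiest to illustrate. A minimal sketch with invented data, assuming the propensity scores have already been estimated from a previously fitted logistic model:

```python
def iptw_estimate(outcomes, treated, propensity):
    """Inverse-probability-of-treatment-weighted difference in mean
    outcome. Weights: 1/e(x) for treated subjects, 1/(1 - e(x)) for
    controls, where e(x) is the pre-estimated propensity score."""
    num_t = den_t = num_c = den_c = 0.0
    for y, z, e in zip(outcomes, treated, propensity):
        if z:
            w = 1.0 / e
            num_t += w * y
            den_t += w
        else:
            w = 1.0 / (1.0 - e)
            num_c += w * y
            den_c += w
    return num_t / den_t - num_c / den_c

# Tiny hypothetical registry extract: binary outcome, treatment flag,
# and propensity score per subject
y = [1, 1, 0, 1, 0, 0, 1, 0]
trt = [1, 1, 1, 1, 0, 0, 0, 0]
ps = [0.8, 0.6, 0.7, 0.5, 0.4, 0.3, 0.2, 0.3]
print(round(iptw_estimate(y, trt, ps), 3))
```

A registry SAP would additionally pre-specify the propensity model covariates, any weight truncation rules, and the planned sensitivity analyses for unmeasured confounding — the weighting step itself is the simple part.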

The European Network for Health Technology Assessment (EUnetHTA) and FDA have both published guidance on using real-world evidence in regulatory submissions, and a pre-specified SAP is essential for registry-based studies to be considered credible evidence.

Common SAP Deficiencies Found in Regulatory Review

Regulatory reviewers at FDA and EU Notified Bodies frequently identify the following deficiencies in SAPs submitted as part of device clinical evidence packages:

1. Inadequate Specification of Primary Analysis Method

Many SAPs state that the primary endpoint will be analyzed using "appropriate statistical methods" or name a method without fully specifying it. FDA expects the SAP to include the exact model specification — for ANCOVA, the dependent variable, independent variables (treatment group, baseline score, stratification factors), and the estimation approach; for MMRM, the fixed effects, covariance structure, and degrees of freedom method.
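To illustrate what "exact model specification" means in practice, the ANCOVA described above can be written out completely: dependent variable, covariates, and estimator. The data below are simulated, and ordinary least squares via numpy stands in for whatever validated statistical software the sponsor would actually use:

```python
import numpy as np

# Simulated stand-in for a two-arm trial: the primary analysis is an
# ANCOVA of the follow-up score on treatment group, baseline score,
# and a binary stratification factor, estimated by OLS.
rng = np.random.default_rng(42)
n = 200
treatment = np.repeat([0, 1], n // 2)   # 1 = investigational device
baseline = rng.normal(50.0, 10.0, n)
stratum = rng.integers(0, 2, n)         # e.g., site-volume stratum
true_effect = 5.0
outcome = (true_effect * treatment + 0.8 * baseline
           + 2.0 * stratum + rng.normal(0.0, 3.0, n))

# Fully specified design matrix: intercept, treatment, baseline, stratum
X = np.column_stack([np.ones(n), treatment, baseline, stratum])
coef, *_ = np.linalg.lstsq(X, outcome, rcond=None)
print(f"adjusted treatment effect estimate: {coef[1]:.2f}")
```

An SAP meeting FDA's expectation would state this level of detail in words — which columns enter the model, how factors are coded, and which coefficient carries the treatment effect — so that an independent statistician could reproduce the analysis exactly.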

2. Missing Estimand Definitions

The most common gap in contemporary SAPs is the absence of explicit estimand definitions, particularly the handling of intercurrent events. Without specifying whether treatment discontinuation, device removal, or use of rescue therapy are addressed through a treatment policy, composite, hypothetical, or other strategy, the SAP leaves the clinical question ambiguous. Reviewers increasingly expect ICH E9(R1) estimand language even for device trials.

3. No Pre-Specified Sensitivity Analyses

SAPs that present only a single primary analysis without any sensitivity analyses for missing data, protocol deviations, or model assumptions are vulnerable to regulatory questions. ICH E9(R1) explicitly requires sensitivity analyses aligned with the primary estimand, and their absence suggests that the sponsor has not considered the robustness of the conclusions.

4. Unclear Handling of Protocol Deviations in Analysis Populations

Vague definitions of the per-protocol population — such as "subjects who completed the study without major protocol deviations" without specifying which deviations are considered major — leave too much room for post-hoc decisions. The SAP must enumerate the specific protocol deviations that will lead to exclusion from the per-protocol population.

5. Undocumented Post-Hoc Changes to the SAP

Changes to the SAP after database lock or after unblinding that are not documented, justified, and approved through the amendment process are a serious finding. Even well-intentioned changes — such as adding a covariate to the primary model after noticing an imbalance — can be viewed as data-driven if not pre-specified.

6. Inconsistency Between Protocol Statistical Section and SAP

The protocol's statistical section and the SAP must be consistent. If the protocol states that the primary analysis will use a chi-square test and the SAP specifies logistic regression, the discrepancy must be explained and documented. If the protocol specifies a per-protocol primary analysis and the SAP specifies ITT, the inconsistency will be flagged.

7. Inadequate Handling of Device-Specific Factors

SAPs that treat device trials identically to drug trials — ignoring learning curves, operator effects, site volume differences, and device configuration variability — miss important sources of variability that can affect the treatment effect estimate. Reviewers expect the SAP to address these device-specific factors through pre-specified sensitivity or subgroup analyses.

Key Takeaways

  • Finalize the SAP before unblinding or, for open-label device trials, before the first patient is treated. The timing of SAP finalization is a regulatory requirement, not a recommendation. An SAP finalized after data are available is not a plan — it is a post-hoc description.

  • Define explicit estimands with all five ICH E9(R1) attributes, including intercurrent event handling. The estimand framework is now referenced in ISO 14155:2026 and is increasingly expected by FDA and EU Notified Bodies. Skipping this step creates ambiguity about what the trial actually estimates.

  • Specify the primary analysis method completely — model, covariates, estimator, and hypothesis — not just the name of a statistical test. Reviewers need to be able to replicate the analysis from the SAP specification alone.

  • Pre-specify sensitivity analyses for missing data, protocol deviations, and model assumptions. Tipping-point analysis is the single most effective tool for demonstrating robustness to missing data. Include it in every SAP where missing data is a possibility.

  • Address device-specific factors in the SAP: operator effects, learning curves, site variability, and device configuration. These factors are unique to device trials and ignoring them is a common deficiency flagged in regulatory review.

  • Use hierarchical testing (gatekeeping or fixed-sequence) for multiplicity adjustment in pivotal device trials. This approach is transparent, well-understood by reviewers, and preserves statistical power for the primary endpoint.

  • Ensure consistency between the protocol statistical section, the SAP, and the statistical methods section of the final clinical study report. Discrepancies between these three documents are among the most common — and most easily preventable — regulatory deficiencies.
