SaMD Clinical Evaluation: How to Generate and Document Clinical Evidence for Software as a Medical Device Under FDA and EU MDR
How to conduct clinical evaluation for Software as a Medical Device (SaMD) — valid clinical association, analytical validation, and clinical performance under IMDRF N41, FDA requirements, and EU MDR Article 61, including generating clinical evidence without traditional clinical trials, real-world data strategies, and navigating the January 2026 FDA withdrawal of the SaMD clinical evaluation guidance.
Why SaMD Clinical Evaluation Is Different
Software as a Medical Device (SaMD) — software intended for medical purposes that operates independently of a physical medical device — presents unique challenges for clinical evaluation. Traditional medical device clinical evaluation relies heavily on physical testing, biocompatibility data, and clinical performance studies. SaMD has no physical form, no biocompatibility concerns, and often cannot be evaluated through traditional bench testing. Yet regulators require the same level of confidence in safety and effectiveness.
On January 6, 2026, the FDA formally withdrew its 2017 guidance "Software as a Medical Device (SaMD): Clinical Evaluation." Originally based on the International Medical Device Regulators Forum (IMDRF) framework document N41, the guidance had shaped how many organizations approached clinical evidence for standalone medical software. Its withdrawal does not reduce regulatory expectations — instead, it signals that the FDA wants manufacturers to adopt more tailored, risk-based clinical evidence strategies rather than following a single prescriptive framework. The IMDRF N41 principles remain foundational, and the EU MDR continues to reference them through MDCG 2020-1.
This creates a practical challenge for manufacturers. The regulatory bar has not been lowered, but the prescriptive roadmap has been removed. SaMD developers must now determine for themselves what clinical evidence is sufficient, justify that determination, and maintain lifecycle governance of their clinical evidence. This guide explains the three-pillar framework for SaMD clinical evaluation, how to generate clinical evidence when traditional clinical trials are impractical, what FDA and EU MDR expect in your submissions, and how to manage clinical evidence as an ongoing lifecycle activity.
The Three Pillars of SaMD Clinical Evaluation
The IMDRF N41 framework established three pillars for SaMD clinical evaluation that remain the foundational approach used globally, even after the FDA's withdrawal of its specific guidance. Each pillar builds upon the previous one.
Pillar 1: Valid Clinical Association
A valid clinical association establishes that the output of the SaMD is clinically meaningful — that there is a legitimate scientific basis for the relationship between the software's output and the targeted clinical condition. This answers the question: Does the clinical science support what this software is claiming to do?
For example, if a SaMD analyzes retinal images to screen for diabetic retinopathy, the valid clinical association is the established relationship between the retinal features the software detects and the presence or absence of diabetic retinopathy as confirmed by clinical standards. The association is supported by ophthalmology literature, clinical practice guidelines, and established diagnostic criteria.
Establishing a valid clinical association typically involves:
- Literature review of the clinical science supporting the SaMD's intended function
- Analysis of clinical guidelines and consensus statements
- Expert clinical opinion confirming the clinical relevance of the SaMD output
- Documentation of the clinical pathway from SaMD output to patient management decision
Pillar 2: Analytical Validation (Technical Performance)
Analytical validation demonstrates that the software accurately and reliably measures, detects, or processes the data it claims to handle. This answers the question: Does the software work correctly from a technical standpoint?
Analytical validation is the most straightforward pillar for most SaMD. It involves testing the software's technical performance against predefined specifications:
- Accuracy: Does the software produce correct outputs for known inputs?
- Precision and reproducibility: Does the software produce the same results when given the same inputs repeatedly?
- Sensitivity and specificity: Can the software correctly detect the conditions it claims to detect?
- Robustness: Does the software perform reliably across the range of expected input quality, noise, and edge cases?
- Performance across platforms: Does the software perform consistently across supported operating systems, browsers, and hardware configurations?
For AI/ML-based SaMD, analytical validation also includes:
- Dataset quality assessment: Are the training, validation, and test datasets clinically representative and appropriately curated?
- Ground truth verification: What reference standard was used to label the data, and how was expert adjudication managed?
- Bias analysis: Has the software been evaluated for performance differences across demographic subgroups?
- Algorithm transparency: Can the decision-making process be explained or interpreted?
A critical requirement for AI-based SaMD: validation datasets must be independent from training data. Using the same data for both training and validation is one of the most common causes of FDA additional information requests and can lead to inflated performance claims that do not hold up in clinical use.
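The independence requirement can be enforced mechanically by splitting at the patient level rather than the record level. The following Python sketch is illustrative only (the `patient_id` and `image` fields are invented, not from any specific dataset); it guarantees that no patient contributes records to both partitions:

```python
import random

def patient_level_split(records, val_fraction=0.2, seed=42):
    """Split records into train/validation partitions at the patient level.

    Splitting individual records can leak data when one patient contributes
    several images or encounters; grouping by patient_id first guarantees
    the validation set is fully independent of training.
    """
    patients = sorted({r["patient_id"] for r in records})
    rng = random.Random(seed)
    rng.shuffle(patients)
    n_val = max(1, int(len(patients) * val_fraction))
    val_ids = set(patients[:n_val])
    train = [r for r in records if r["patient_id"] not in val_ids]
    val = [r for r in records if r["patient_id"] in val_ids]
    return train, val

# Example: two images per patient must land in the same partition.
records = [{"patient_id": p, "image": i} for p in range(10) for i in range(2)]
train, val = patient_level_split(records)
assert not ({r["patient_id"] for r in train} & {r["patient_id"] for r in val})
```

Record-level random splits look independent on paper but leak information whenever one patient contributes multiple images or encounters, which is exactly the failure mode that inflates performance claims.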
Pillar 3: Clinical Validation (Clinical Performance)
Clinical validation demonstrates that use of the SaMD leads to clinically meaningful outcomes — that when clinicians or patients use the software output as intended, it safely and effectively supports clinical decision-making. This answers the question: Does using this software actually help patients?
Clinical validation is the most challenging pillar because it requires evidence that connects the software's output to real clinical outcomes. The level of evidence required depends on the risk classification and the nature of the clinical claim.
For the EU MDR, clinical validation is explicitly required for all SaMD, and MDCG 2020-1 states that clinical evidence must demonstrate the clinical benefit claimed in the intended purpose. The EU takes a broad view of what constitutes clinical evidence for SaMD, but the evidence must be proportionate to the risk and must be documented in a Clinical Evaluation Report (CER).
For the FDA, clinical validation requirements depend on the classification and pathway. Many Class II SaMD can be cleared through the 510(k) pathway on analytical validation plus a demonstration of substantial equivalence to a predicate device, often without new clinical data. Higher-risk or novel SaMD may require prospective clinical performance studies. A recent analysis of AI medical device recalls found that devices without clinical validation were significantly more likely to be recalled, underscoring that clinical validation is not merely a regulatory formality but a patient safety necessity.
Generating Clinical Evidence Without Traditional Clinical Trials
One of the most common questions from SaMD manufacturers is whether clinical trials are required. Often, the answer is no — but alternative evidence must be equally rigorous.
Literature-Based Evidence
For SaMD that implements established clinical algorithms or performs functions already validated in the literature, existing clinical data can support the clinical evaluation. This includes:
- Published clinical studies of similar software or algorithms
- Systematic reviews and meta-analyses
- Clinical practice guidelines that validate the underlying clinical approach
- Real-world evidence from post-market surveillance of comparable products
The literature review must be systematic, following a documented search protocol, and must include critical appraisal of the quality and relevance of each source. Under the EU MDR, the search and appraisal methodology should follow MEDDEV 2.7/1 Rev 4, and the resulting report will be assessed by the Notified Body against the MDCG 2020-13 clinical evaluation assessment report template.
Retrospective Clinical Performance Studies
Many SaMD products can generate clinical validation data through retrospective studies using previously collected clinical data. This approach involves:
- Obtaining a clinically relevant dataset (e.g., medical images, EHR data, lab results) with known clinical outcomes
- Running the SaMD on the dataset in a blinded fashion
- Comparing the SaMD outputs against the reference standard (ground truth)
- Calculating performance metrics (sensitivity, specificity, PPV, NPV, AUC) with confidence intervals
Retrospective studies can be conducted faster and at lower cost than prospective trials, but they require careful attention to:
- Dataset representativeness — the data should reflect the intended patient population
- Reference standard quality — the ground truth must be reliable
- Avoiding data leakage between training and evaluation datasets
- Documenting data curation, inclusion/exclusion criteria, and adjudication processes
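The performance metrics listed above can be computed directly from the 2x2 counts, with Wilson score confidence intervals, which behave better than the normal approximation at the sample sizes typical of retrospective studies. A minimal Python sketch (the counts in the example are invented for illustration):

```python
import math

def wilson_ci(successes, n, z=1.96):
    """95% Wilson score interval for a binomial proportion."""
    if n == 0:
        return (0.0, 0.0)
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return (center - half, center + half)

def diagnostic_metrics(tp, fp, tn, fn):
    """Point estimates and 95% CIs for the standard 2x2 diagnostic metrics."""
    return {
        "sensitivity": (tp / (tp + fn), wilson_ci(tp, tp + fn)),
        "specificity": (tn / (tn + fp), wilson_ci(tn, tn + fp)),
        "ppv":         (tp / (tp + fp), wilson_ci(tp, tp + fp)),
        "npv":         (tn / (tn + fn), wilson_ci(tn, tn + fn)),
    }

# Illustrative counts: 90 TP, 10 FN, 180 TN, 20 FP.
m = diagnostic_metrics(tp=90, fp=20, tn=180, fn=10)
```

Reporting the interval alongside the point estimate matters: a 90% sensitivity on 100 positive cases carries a lower bound well below 90%, and reviewers will check whether the lower bound still supports the clinical claim.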
Real-World Evidence
The FDA's December 2025 final guidance "Use of Real-World Evidence to Support Regulatory Decision-Making for Medical Devices" establishes a framework for using real-world data (RWD) to generate real-world evidence (RWE) for regulatory submissions. RWD includes data from electronic health records, medical device registries, administrative claims data, and patient-generated data from connected devices.
For SaMD, RWE can support:
- Supplementing premarket clinical data with real-world performance data
- Generating post-market clinical follow-up data for EU MDR PMCF
- Supporting labeling changes and expanded indications
- Monitoring AI/ML model performance in clinical practice
The FDA evaluates RWE based on the relevance and reliability of the underlying data. Relevance asks whether the data addresses the regulatory question. Reliability asks whether the data is of sufficient quality, including data collection procedures, completeness, accuracy, and appropriateness of the analytical methods applied.
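A reliability assessment usually starts with a completeness screen of the raw records before any performance analysis. The sketch below is illustrative only: the required field names and the 5% missingness threshold are assumptions for the example, not values from any FDA guidance.

```python
# Hypothetical minimal field set for an EHR-derived RWD extract.
REQUIRED_FIELDS = ("patient_id", "timestamp", "device_output", "clinical_outcome")

def screen_rwd(records, max_missing_rate=0.05):
    """Screen real-world records for completeness before analysis.

    Returns the fully-populated records plus a per-field missingness count;
    fields whose missingness exceeds max_missing_rate are flagged so the
    reliability of the dataset can be documented (or the field excluded).
    """
    missing = {f: 0 for f in REQUIRED_FIELDS}
    usable = []
    for r in records:
        gaps = [f for f in REQUIRED_FIELDS if r.get(f) is None]
        for f in gaps:
            missing[f] += 1
        if not gaps:
            usable.append(r)
    n = len(records) or 1
    flagged = [f for f, c in missing.items() if c / n > max_missing_rate]
    return usable, missing, flagged
```

A screen like this does not make the data reliable by itself, but it produces the kind of documented completeness evidence the relevance/reliability framework asks for.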
Simulation and In Silico Testing
For some SaMD, clinical evidence can be generated through simulation studies where the software is tested against synthetic or curated clinical datasets that represent realistic clinical scenarios. This is particularly relevant for diagnostic AI SaMD, where large annotated datasets can serve as proxies for clinical testing.
In silico validation is not a substitute for clinical validation, but it can provide strong supporting evidence, especially when combined with literature-based evidence and real-world performance data.
The Non-Clinical Evidence Exception
Under EU MDR Article 61(10) and MDCG 2020-1, there is an exception for certain lower-risk Class I and IIa SaMD where demonstration of conformity based on clinical data is not deemed appropriate. In such cases, demonstration of compliance with the general safety and performance requirements can be based on non-clinical test methods alone, including performance evaluation, technical testing, and pre-clinical evaluation. However, this must be justified, covered in the risk management file, and considered in light of the device's intended clinical performance.
This exception is narrow and should not be interpreted as a general exemption from clinical evidence requirements. The justification must explain, based on the results of risk management and the nature of the device's interaction with the body, why demonstration of conformity based on clinical data is not deemed appropriate.
EU MDR Requirements for SaMD Clinical Evaluation
The EU MDR imposes specific requirements for SaMD clinical evaluation that differ in important ways from FDA expectations.
Classification Under Rule 11
Under EU MDR Rule 11, most standalone diagnostic or therapeutic software falls into Class IIa, IIb, or III, depending on its intended use and the potential for patient harm. This up-classification means that self-certification, once common under the old Medical Device Directive, is now rare for SaMD. Most SaMD requires Notified Body involvement.
Clinical Evaluation Plan and Report
The EU MDR requires a documented Clinical Evaluation Plan (CEP) and Clinical Evaluation Report (CER) for all devices. For SaMD, the CEP should address:
- The clinical background and current state of the art
- The clinical claim being evaluated
- The type of clinical data to be collected (literature, clinical investigation, clinical experience)
- The methods for appraisal and analysis of clinical data
- How the clinical evidence supports conformity with relevant GSPRs
The CER should synthesize all available clinical evidence and reach a conclusion about whether the device is safe, performs as intended, and achieves the claimed clinical benefits.
MDCG 2020-1 Guidance
MDCG 2020-1 provides specific guidance on clinical evaluation and performance evaluation for medical device software. It introduces the three elements for SaMD clinical evidence — valid clinical association, analytical/technical performance, and clinical performance — and describes how these elements interact. The guidance also addresses how to handle software that is integrated into a larger device system versus standalone software.
Post-Market Clinical Follow-Up
For EU MDR compliance, SaMD manufacturers must implement PMCF activities that continuously generate and evaluate clinical data from real-world use. The PMCF plan should define the methods, endpoints, and timelines for collecting clinical performance data post-market. For AI/ML-based SaMD, PMCF should include monitoring for performance drift — changes in model accuracy or reliability as the input data distribution shifts over time.
FDA Requirements for SaMD Clinical Evidence
After the January 2026 withdrawal of the SaMD Clinical Evaluation guidance, FDA expectations are informed by several sources.
Existing Framework Documents
- The original IMDRF N41 framework, while no longer an FDA guidance, remains a recognized international standard
- The FDA's guidance on real-world evidence (finalized December 2025)
- The FDA's guidance on clinical decision support software (updated January 2026)
- The FDA's guidance on cybersecurity in medical devices (updated February 2026)
- IEC 62304 software lifecycle standard
- IEC 62366-1 usability engineering standard
Risk-Based Evidence Strategy
Manufacturers should adopt a risk-based approach to clinical evidence generation that considers:
- The IMDRF risk categorization (I through IV) and its implications for evidence rigor
- The FDA classification (Class I, II, III) and the corresponding pathway (510(k), De Novo, PMA)
- The nature of the clinical claim — diagnostic claims require different evidence than therapeutic claims
- The availability of predicate devices and existing clinical data
- Whether the software incorporates AI/ML, which requires additional evidence on dataset quality, bias, and model governance
The PCCP Framework
For AI/ML-based SaMD, the FDA's Predetermined Change Control Plan (PCCP) framework allows manufacturers to pre-specify planned algorithm updates that can proceed without a new submission, provided changes remain within the approved parameters. The PCCP should describe the planned changes, the methodology for implementing them, and the impact assessment protocol. This framework enables continuous learning and improvement of AI models while maintaining regulatory oversight.
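A PCCP's acceptance criteria can be operationalized as an automated gate in the model release pipeline. The sketch below assumes hypothetical pre-specified tolerances; real bounds, metrics, and locked test datasets would come from the authorized PCCP itself.

```python
# Hypothetical acceptance criteria of the kind a PCCP might pre-specify:
# each metric may not drop more than its tolerance below the currently
# authorized model's performance on the locked test set.
PCCP_TOLERANCES = {"sensitivity": 0.02, "specificity": 0.02}

def within_pccp_bounds(baseline, candidate, tolerances=PCCP_TOLERANCES):
    """Check a candidate model update against pre-specified bounds.

    baseline and candidate map metric names to performance on the same
    locked evaluation dataset. Returns (ok, list_of_violated_metrics);
    any violation means the change falls outside the PCCP and would
    need a new regulatory submission.
    """
    violations = [
        metric for metric, tol in tolerances.items()
        if candidate[metric] < baseline[metric] - tol
    ]
    return (not violations, violations)
```

Encoding the bounds as a release gate keeps the "within approved parameters" determination auditable rather than ad hoc.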
Common Deficiencies and How to Avoid Them
Based on industry experience and regulatory feedback, the most common deficiencies in SaMD clinical evidence packages include:
Insufficient analytical validation. Many submissions lack adequate testing across the full range of expected input conditions, or fail to demonstrate performance on clinically representative datasets. Test your software against real-world data quality issues — missing values, variable imaging protocols, different hardware configurations — not just clean, curated datasets.
Inadequate clinical validation. The FDA expects clear evidence that the software performs accurately and safely under intended use conditions, not only on curated test sets. Insufficient clinical validation data is one of the most frequent causes of FDA Additional Information requests.
Dataset quality issues for AI/ML. Training data that does not represent the target population, inadequate ground truth labeling, and data leakage between training and validation sets are common problems. Document your data governance, curation process, and quality assurance measures thoroughly.
Overstated claims. Regulatory language matters. Claiming "diagnoses" when the software "assists" can move a product from Class II to Class III. Ensure your intended use and clinical claims are precisely aligned with your evidence.
No bias analysis. AI/ML SaMD must demonstrate performance across relevant demographic subgroups (age, sex, race, ethnicity). Without subgroup analysis, the FDA cannot assess whether the device works safely for the full intended population.
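Subgroup analysis can be as simple as stratifying validation results by each demographic attribute and reporting per-group performance together with group sizes. A minimal Python sketch (the `sex` and `detected` field names are illustrative):

```python
def subgroup_sensitivity(cases, group_key="sex"):
    """Per-subgroup sensitivity computed from positive cases.

    Each case is a dict carrying the subgroup attribute and a boolean
    'detected' flag (illustrative field names). Group size is reported
    alongside the rate because small subgroups give unstable estimates.
    """
    groups = {}
    for c in cases:
        g = groups.setdefault(c[group_key], {"tp": 0, "n": 0})
        g["n"] += 1
        g["tp"] += int(c["detected"])
    return {k: (v["tp"] / v["n"], v["n"]) for k, v in groups.items()}
```

In practice each subgroup estimate should carry a confidence interval, and subgroups too small to support a stable estimate should be flagged as an evidence gap rather than over-interpreted.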
Building a Lifecycle Clinical Evidence Program
Clinical evaluation for SaMD is not a one-time activity. Both FDA and EU MDR require ongoing clinical evidence generation as part of post-market surveillance.
Continuous Monitoring
Implement systems to continuously collect and analyze real-world performance data, including safety data, performance metrics, user feedback, and published research. For AI/ML models, monitor for performance drift and establish predefined thresholds that trigger graduated safety responses when real-world performance declines.
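One way to implement graduated thresholds is a rolling-window monitor over a binary correctness signal. The sketch below is illustrative: the baseline, window size, and warning/alert drops are placeholders for values a PMCF plan would actually pre-specify.

```python
from collections import deque

class DriftMonitor:
    """Rolling-window monitor for a post-market performance metric.

    Thresholds here are illustrative: a 'warning' would trigger an
    investigation, an 'alert' the pre-defined safety response (for
    example, suspending automated use). Real thresholds and responses
    come from the PMCF plan or PCCP, not from this sketch.
    """
    def __init__(self, baseline, window=100, warn_drop=0.03, alert_drop=0.05):
        self.window = deque(maxlen=window)  # keeps only the latest outcomes
        self.warn = baseline - warn_drop
        self.alert = baseline - alert_drop

    def record(self, correct: bool) -> str:
        """Record one adjudicated outcome and return the current status."""
        self.window.append(int(correct))
        rate = sum(self.window) / len(self.window)
        if rate < self.alert:
            return "alert"
        if rate < self.warn:
            return "warning"
        return "ok"
```

The rolling window trades sensitivity for stability: a longer window smooths noise but reacts more slowly to genuine input-distribution shift, so the window length itself is a parameter worth justifying in the monitoring plan.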
Evidence Updates
Update your clinical evaluation report periodically (at least annually for higher-risk devices under the EU MDR; lower-risk devices may justify a longer interval) to incorporate new clinical data, published literature, and post-market surveillance findings. The updated CER should reflect the current state of clinical knowledge and the device's real-world performance.
Change Control
When the software is updated, assess whether the change affects the clinical evidence. Bug fixes may not require new clinical data; algorithm updates, new clinical features, or expanded indications may require additional analytical and clinical validation. Document the assessment in your change control records.

For AI/ML-based SaMD marketed in the EU, the EU AI Act adds obligations for data governance documentation, model performance monitoring, and transparency that should be integrated into the clinical evidence lifecycle from the start. High-risk obligations for standalone AI systems listed in Annex III (Article 6(2)) apply from August 2026, while the high-risk requirements for AI systems that are safety components of CE-marked medical devices requiring Notified Body review under the MDR or IVDR (Article 6(1)) begin applying in August 2027.
Key Takeaways
SaMD clinical evaluation requires demonstrating three things: that the clinical science supports the software's function (valid clinical association), that the software works correctly technically (analytical validation), and that using the software leads to meaningful clinical outcomes (clinical validation). The FDA's withdrawal of its SaMD Clinical Evaluation guidance in January 2026 does not lower the bar — it shifts responsibility to manufacturers to design tailored, risk-based evidence strategies.
For most SaMD, clinical evidence can be generated without traditional clinical trials through a combination of literature review, retrospective performance studies, real-world evidence, and simulation testing. The key is rigor: well-documented methods, representative datasets, independent validation, and transparent reporting of results.
For manufacturers pursuing both FDA and EU MDR clearance, design your clinical evaluation to satisfy the more stringent requirements from the start — typically the EU MDR's requirement for a documented CEP and CER with PMCF — and then adapt the documentation for the FDA's submission-focused format.