AI/ML in Medical Devices: FDA Regulatory Framework, SaMD Classification, and Approval Pathways
The complete guide to artificial intelligence and machine learning in medical devices — FDA's AI/ML action plan, predetermined change control plans, SaMD classification, 510(k) and De Novo pathways, EU MDR requirements, and Good Machine Learning Practice.
The Rise of AI/ML in Medical Devices
Artificial intelligence and machine learning have moved from research curiosity to clinical reality faster than almost any other technology in the history of medical devices. As of early 2026, the FDA has authorized over 1,000 AI/ML-enabled medical devices -- a number that was fewer than 50 in 2016, crossed 100 in 2019, and has accelerated every year since. In 2025 alone, the FDA authorized between 258 and 295 AI/ML-enabled devices (counts vary with methodology), the most in the agency's history and more than 50% above the volume from just two years earlier.
This growth reflects a convergence of factors: the maturation of deep learning architectures (particularly convolutional neural networks for imaging and transformer models for sequential data), the availability of large clinical datasets, advances in computing infrastructure, and a regulatory environment that -- while cautious -- has shown increasing willingness to accommodate AI/ML products.
But the regulatory landscape for AI/ML medical devices is unlike anything that came before. Traditional medical devices are designed, tested, manufactured, and then remain essentially static until the manufacturer submits a new regulatory filing for a modified version. AI/ML devices challenge every assumption in that model. They can learn, adapt, and change their behavior based on new data. They can perform differently across patient populations. Their performance can degrade over time as clinical patterns shift. And their decision-making processes may be opaque even to their developers.
This guide covers the full regulatory framework for AI/ML medical devices: how they are classified, which pathways lead to market authorization, what documentation regulators expect, how predetermined change control plans work, what the EU and other international bodies require, and how manufacturers should approach the entire lifecycle from development through post-market monitoring.
AI/ML-Enabled Devices by the Numbers
The following statistics reflect the state of the AI/ML device market as of early 2026:
| Metric | Value |
|---|---|
| Total FDA-authorized AI/ML-enabled devices | Over 1,000 |
| Authorizations in 2025 | 258-295 (record year) |
| Authorizations in 2024 | ~170 |
| Authorizations in 2023 | ~171 |
| Percentage in radiology | ~75-80% |
| Percentage in cardiology | ~10% |
| Percentage authorized via 510(k) | ~95% |
| Percentage authorized via De Novo | ~5% |
| Percentage classified as SaMD | ~62% |
| Number of unique manufacturers | 400+ |
Note: The FDA maintains a publicly available database of all authorized AI/ML-enabled medical devices, updated regularly. This database is maintained by the Digital Health Center of Excellence (DHCoE) and is available on the FDA website. It is the authoritative source for tracking authorizations and identifying potential predicates.
Market Size and Financial Projections
The commercial opportunity behind AI/ML in medical devices is driving investment and regulatory activity at an unprecedented pace. The following projections reflect consensus estimates across multiple industry analyses:
| Metric | Value |
|---|---|
| Global AI in medical devices market size (2024) | ~$13.7 billion |
| Projected market size (2033) | $255 billion+ |
| Compound annual growth rate (CAGR) | 30-40% |
| Largest segment by application | Medical imaging and diagnostics |
| Fastest-growing segment | AI-enabled surgical robotics and drug discovery |
| North America market share (2024) | ~45-50% |
These figures should be interpreted with caution -- market size estimates vary across research firms and depend heavily on how "AI in medical devices" is defined. Some estimates include AI-enabled drug discovery platforms and health IT systems that fall outside the scope of device regulation. What is not in dispute is the trajectory: the market is growing at a rate that far exceeds the medical device industry average, and regulatory capacity must scale accordingly.
What Makes AI/ML Devices Different
To understand why AI/ML medical devices require a distinct regulatory approach, you must understand how they differ from traditional software-based medical devices at a fundamental level.
Locked Algorithms vs. Adaptive Algorithms
The FDA distinguishes between two categories of AI/ML algorithms:
Locked algorithms produce the same output each time the same input is applied. The algorithm does not change after it is deployed. A locked algorithm may have been developed using machine learning techniques -- for instance, a convolutional neural network trained on millions of chest X-rays -- but once the model is finalized and deployed, the model weights are fixed. The device performs exactly the same computation on day one as it does on day one thousand. Most FDA-authorized AI/ML devices today use locked algorithms.
Adaptive algorithms (also called continuously learning algorithms) change their behavior over time based on new data received during clinical deployment. An adaptive algorithm might retrain its model periodically using new patient data, adjust its decision thresholds based on local performance feedback, or update its feature weights based on newly labeled training examples. Adaptive algorithms are where the most significant regulatory challenges arise, and they are the primary focus of the FDA's evolving framework.
Between these two poles lies a spectrum. Some devices use locked algorithms but are updated periodically by the manufacturer through traditional regulatory channels (new 510(k) submissions). Others use predetermined change control plans to implement pre-authorized updates without new submissions. True continuously learning algorithms -- where the device modifies itself autonomously in real time -- remain rare in clinical deployment and face the highest regulatory scrutiny.
Why Traditional Regulatory Models Fall Short
The traditional regulatory model for medical devices assumes a largely static product. You design it, test it, submit evidence that it is safe and effective, receive authorization, manufacture it, and monitor its performance. Changes trigger new regulatory filings. This model works well for hardware devices and even for most traditional software.
AI/ML devices break this model in several ways:
- Performance depends on data, not just design: An AI/ML device's clinical performance is inseparable from the data on which it was trained. Two identically architected neural networks trained on different datasets can have dramatically different clinical performance. The data is as much a part of the "device" as the algorithm.
- Generalization is not guaranteed: A model trained on data from a specific hospital, patient population, or imaging system may not perform equivalently in a different clinical setting. This is not a software bug -- it is a fundamental property of statistical learning.
- Performance can degrade over time: Dataset shift (changes in the distribution of input data relative to training data) and concept drift (changes in the relationship between inputs and outcomes) can cause AI/ML models to lose accuracy over time without any change to the model itself. A chest X-ray AI trained before COVID-19 may perform differently on post-COVID imaging patterns.
- Opacity of decision-making: Many high-performing AI/ML models, particularly deep neural networks, are not fully interpretable. The model produces an output, but the reasoning path is not always transparent to users or developers. This creates challenges for clinical validation, labeling, and post-market surveillance.
- Bias risk: AI/ML models can encode and amplify biases present in training data. If training data underrepresents certain racial groups, age groups, or clinical presentations, the model may perform less accurately for those populations -- a direct patient safety concern.
FDA's AI/ML Regulatory Framework: A Decade of Evolution
The FDA has been developing its approach to AI/ML medical devices iteratively, through a series of discussion papers, action plans, guidance documents, and real-world authorization decisions. Understanding this evolution is essential for manufacturers, because the framework continues to change.
Timeline of Key FDA AI/ML Regulatory Milestones
| Year | Milestone |
|---|---|
| 2017 | FDA publishes Digital Health Innovation Action Plan |
| 2018 | FDA authorizes first autonomous AI diagnostic device (IDx-DR) through the De Novo pathway; authorizes Apple Watch ECG via De Novo; Digital Health Center of Excellence (DHCoE) precursor activities begin |
| 2019 | FDA publishes "Proposed Regulatory Framework for Modifications to Artificial Intelligence/Machine Learning-Based Software as a Medical Device" discussion paper |
| 2020 | DHCoE formally established within CDRH; FDA begins publishing AI/ML-enabled device authorization database |
| 2021 | FDA publishes AI/ML-Based SaMD Action Plan; FDA/Health Canada/MHRA jointly publish 10 GMLP guiding principles |
| 2022 | FDA publishes draft guidance on predetermined change control plans for ML-enabled device software functions |
| 2023 | FDA publishes updated draft PCCP guidance; number of authorized AI/ML devices crosses 600 |
| 2024 | FDA finalizes PCCP guidance (December), broadening scope from ML-DSF to AI-DSF; authorized AI/ML devices cross 950 |
| 2025 | FDA publishes landmark draft guidance "Artificial Intelligence-Enabled Device Software Functions: Lifecycle Management and Marketing Submission Recommendations" (January); record year for AI/ML authorizations; authorized devices cross 1,000 |
| 2026 | FDA continues refinement of AI/ML framework; QMSR takes effect (February); integration of AI/ML expectations with harmonized quality system requirements |
The 2019 Discussion Paper
The FDA's 2019 discussion paper, "Proposed Regulatory Framework for Modifications to Artificial Intelligence/Machine Learning-Based Software as a Medical Device," was the agency's first comprehensive articulation of how it planned to regulate AI/ML devices. It recognized that the existing modification framework -- which requires a new 510(k) for significant changes -- was poorly suited to AI/ML devices that improve through iterative learning.
The paper proposed a Total Product Lifecycle (TPLC) approach in which manufacturers would submit a predetermined change control plan describing the types of anticipated modifications, a change protocol explaining how modifications would be developed and validated, and an update reporting mechanism for transparency. This concept would eventually become the PCCP framework finalized five years later.
The 2021 AI/ML Action Plan
The FDA's "Artificial Intelligence/Machine Learning (AI/ML)-Based Software as a Medical Device Action Plan," published in January 2021, identified five priorities:
- Tailored regulatory framework: Developing a regulatory approach specifically suited to AI/ML, including predetermined change control plans
- Good Machine Learning Practice (GMLP): Establishing community-wide best practices for AI/ML development, validation, and deployment
- Patient-centered approach: Ensuring transparency so patients and clinicians understand how AI/ML devices work and their limitations
- Regulatory science methods: Investing in new methodologies for evaluating AI/ML performance, including real-world performance assessment
- Real-world performance monitoring: Developing approaches to monitor AI/ML device performance after deployment, including pilots for detecting performance drift
Each of these five pillars has since been advanced through guidance documents, pilot programs, and real-world policy decisions.
The January 2025 TPLC Draft Guidance
The FDA's January 2025 draft guidance, "Artificial Intelligence-Enabled Device Software Functions: Lifecycle Management and Marketing Submission Recommendations," represents the most comprehensive articulation of the agency's expectations for AI/ML device developers. Key elements include:
- Total Product Lifecycle (TPLC) integration: AI considerations must be embedded from design through decommission, not bolted on at the submission stage
- Expanded transparency requirements: Detailed recommendations for communicating algorithm function, training data characteristics, known limitations, and subpopulation performance variations
- Bias analysis framework: Explicit expectations for evaluating performance across demographic subgroups and documenting strategies to address disparities
- Marketing submission content recommendations: Model description, data lineage and partitioning strategy, performance metrics mapped to clinical claims, bias analysis, human-AI workflow characterization, monitoring plans, and PCCP (if applicable)
- Post-market monitoring expectations: Recommendations for ongoing surveillance of model performance in deployment, including detection of data drift and concept drift
Important: This guidance uses the term "AI-enabled Device Software Functions (AI-DSF)" rather than the earlier "ML-DSF." This is intentional -- the FDA is signaling that its framework applies to all forms of AI in medical devices, including rule-based systems, not solely machine learning models.
SaMD Classification and the IMDRF Framework
Most AI/ML-enabled medical devices are classified as Software as a Medical Device (SaMD) -- software that performs a medical function independently of any hardware medical device. The classification of SaMD drives every subsequent regulatory decision: pathway, evidence requirements, post-market obligations, and quality system scope.
The IMDRF Risk Categorization Matrix
The International Medical Device Regulators Forum (IMDRF) published its risk categorization framework for SaMD in 2014 (IMDRF/SaMD WG/N12FINAL:2014). The framework uses a two-dimensional matrix with the following axes:
Significance of the information provided by the SaMD to the healthcare decision:
- Treat or diagnose: Information used for immediate or near-term action -- treatment, diagnosis, or prevention
- Drive clinical management: Information used to guide treatment decisions, clinical management, or intervention urgency
- Inform clinical management: Information that supports but does not directly trigger clinical decisions (trending, population data)
State of the healthcare situation or condition:
- Critical: Life-threatening or requiring urgent intervention to prevent permanent impairment
- Serious: Requires medical intervention to prevent significant or long-term harm
- Non-serious: Slow-progressing conditions not requiring urgent intervention
The IMDRF SaMD risk classification matrix:
| Significance \ Condition | Critical | Serious | Non-serious |
|---|---|---|---|
| Treat or diagnose | Category IV | Category III | Category II |
| Drive clinical management | Category III | Category II | Category I |
| Inform clinical management | Category II | Category I | Category I |
- Category I: Lowest risk. Example: An app that displays aggregated wellness trends from wearable sensors.
- Category II: Low-moderate risk. Example: An AI tool that flags potential skin lesions as "suspicious" for non-urgent dermatologist review.
- Category III: Moderate-high risk. Example: An algorithm that interprets ECG data to detect atrial fibrillation and drives clinical management decisions.
- Category IV: Highest risk. Example: An AI system that autonomously diagnoses a critical, life-threatening condition and triggers or directs treatment without clinician review.
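Because the matrix is a fixed two-axis lookup, it can be expressed directly in code. The sketch below encodes the table above as a Python dictionary; the enum names are illustrative conveniences, not terminology from the IMDRF document.

```python
from enum import Enum

class Significance(Enum):
    TREAT_OR_DIAGNOSE = "treat or diagnose"
    DRIVE_MANAGEMENT = "drive clinical management"
    INFORM_MANAGEMENT = "inform clinical management"

class Condition(Enum):
    CRITICAL = "critical"
    SERIOUS = "serious"
    NON_SERIOUS = "non-serious"

# IMDRF N12 risk categorization matrix (rows: significance, columns: condition state).
IMDRF_MATRIX = {
    (Significance.TREAT_OR_DIAGNOSE, Condition.CRITICAL): "IV",
    (Significance.TREAT_OR_DIAGNOSE, Condition.SERIOUS): "III",
    (Significance.TREAT_OR_DIAGNOSE, Condition.NON_SERIOUS): "II",
    (Significance.DRIVE_MANAGEMENT, Condition.CRITICAL): "III",
    (Significance.DRIVE_MANAGEMENT, Condition.SERIOUS): "II",
    (Significance.DRIVE_MANAGEMENT, Condition.NON_SERIOUS): "I",
    (Significance.INFORM_MANAGEMENT, Condition.CRITICAL): "II",
    (Significance.INFORM_MANAGEMENT, Condition.SERIOUS): "I",
    (Significance.INFORM_MANAGEMENT, Condition.NON_SERIOUS): "I",
}

def imdrf_category(significance: Significance, condition: Condition) -> str:
    """Return the IMDRF SaMD risk category (I-IV) for a significance/condition pair."""
    return IMDRF_MATRIX[(significance, condition)]

# Example: software that diagnoses a serious (but not critical) condition.
print(imdrf_category(Significance.TREAT_OR_DIAGNOSE, Condition.SERIOUS))  # -> "III"
```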
How the IMDRF Matrix Maps to FDA Classification
The FDA does not directly adopt the IMDRF categories as its classification system, but the IMDRF matrix strongly influences how the agency thinks about SaMD risk. In practice:
| IMDRF Category | Typical FDA Classification | Typical Pathway |
|---|---|---|
| Category I | Class I or Class II | 510(k) exempt or 510(k) |
| Category II | Class II | 510(k) or De Novo |
| Category III | Class II (with special controls) | 510(k) or De Novo |
| Category IV | Class III | PMA or De Novo |
The mapping is not one-to-one. FDA classification depends on the specific intended use, the product code, existing predicates, and the specific risks associated with the device. But the IMDRF framework provides a useful starting point for regulatory strategy.
EU MDR Rule 11 and AI/ML Classification
Under the EU MDR (Regulation (EU) 2017/745), standalone software is classified under Rule 11:
- Software intended to provide information used to make decisions for diagnostic or therapeutic purposes is classified as Class IIa at minimum
- If those decisions could cause death or an irreversible deterioration of health, the software is Class III
- If those decisions could cause a serious deterioration of health or a surgical intervention, the software is Class IIb
- Software intended to monitor physiological processes is classified as Class IIa, unless it monitors vital physiological parameters where variations could result in immediate danger to the patient, in which case it is Class IIb
- All other software is classified as Class I
In practice, Rule 11 means that virtually all AI/ML medical device software in the EU is classified as Class IIa or higher, because software that informs diagnostic or therapeutic decisions starts at Class IIa; the residual Class I category covers only software with no diagnostic, therapeutic, or monitoring function. This creates a significantly higher regulatory burden in the EU compared to the US, where many digital health tools are Class I exempt or fall outside the scope of device regulation entirely.
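Rule 11 can be read as a small decision tree. The function below is a sketch of that logic, assuming the manufacturer has already answered the underlying clinical questions; the parameter names are illustrative, and no sketch substitutes for a formal classification rationale reviewed against the regulation and MDCG guidance.

```python
def mdr_rule_11_class(
    informs_diagnosis_or_therapy: bool,
    could_cause_death_or_irreversible_harm: bool,
    could_cause_serious_harm_or_surgery: bool,
    monitors_physiological_processes: bool,
    monitors_vital_parameters_immediate_danger: bool,
) -> str:
    """Sketch of EU MDR Rule 11 classification logic for standalone software."""
    if informs_diagnosis_or_therapy:
        if could_cause_death_or_irreversible_harm:
            return "Class III"
        if could_cause_serious_harm_or_surgery:
            return "Class IIb"
        return "Class IIa"
    if monitors_physiological_processes:
        if monitors_vital_parameters_immediate_danger:
            return "Class IIb"
        return "Class IIa"
    return "Class I"  # residual category; rare for clinically meaningful SaMD

# Example: AI triage software whose missed finding could cause serious deterioration.
print(mdr_rule_11_class(True, False, True, False, False))  # -> "Class IIb"
```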
Regulatory Pathways for AI/ML Devices
Three primary pathways lead to FDA marketing authorization for AI/ML devices: 510(k), De Novo, and PMA. The choice of pathway depends on the device's classification, the existence of predicates, and the level of risk.
510(k): The Dominant Pathway
Approximately 95% of all FDA-authorized AI/ML devices have been cleared through the 510(k) pathway. This reflects the maturation of the AI/ML device ecosystem -- as more devices are authorized through De Novo (which creates new product codes), subsequent similar devices can reference those authorizations as predicates for 510(k) submissions.
When to use 510(k) for an AI/ML device:
- A legally marketed predicate device exists with the same intended use
- Your device has similar technological characteristics (or different characteristics that do not raise new questions of safety and effectiveness)
- The device is Class II
Key considerations for AI/ML 510(k) submissions:
- You must demonstrate substantial equivalence, which for AI/ML devices means comparable performance on a relevant clinical dataset -- not identical algorithm architecture
- The FDA expects AI/ML-specific documentation beyond what is required for traditional software: training data description, performance metrics with confidence intervals, subgroup analysis, and failure mode analysis
- If your algorithm uses a fundamentally different architecture from the predicate (e.g., transformer vs. CNN), you must address whether the different technological characteristics raise new safety or effectiveness questions
- Performance claims must be supported by testing on an independent test set that the algorithm has never seen during training or validation
De Novo: Creating New Device Categories
The De Novo pathway is essential for the AI/ML ecosystem because many AI/ML applications are genuinely novel -- no predicate exists. A De Novo request asks the FDA to classify a novel device as Class I or Class II and establish a new product code with associated special controls.
When to use De Novo for an AI/ML device:
- No predicate exists for your device's intended use
- The device is low-to-moderate risk
- General controls alone (Class I) or general plus special controls (Class II) provide reasonable assurance of safety and effectiveness
Why De Novo matters disproportionately for AI/ML:
- Many of the most consequential AI/ML device authorizations have come through De Novo, including IDx-DR (the first autonomous AI diagnostic), the Apple Watch ECG, and numerous AI-based triage and detection tools
- Each De Novo authorization creates a new product code and special controls that define the regulatory requirements for an entire device category
- Subsequent manufacturers can reference the De Novo device as a predicate for 510(k) submissions, which is why the percentage of AI/ML devices going through 510(k) increases every year
PMA: The High-Risk Path
Premarket Approval (PMA) is required for Class III devices -- those presenting the highest risk or those for which general and special controls are insufficient. PMA requires the most rigorous clinical evidence, typically including prospective clinical trials.
When PMA applies to AI/ML devices:
- The device makes autonomous diagnostic or treatment decisions in critical clinical situations (IMDRF Category IV)
- The intended use involves direct patient impact where failure could cause death or serious harm
- No De Novo classification has been established for the device type, and the risk is too high for Class II
In practice, very few AI/ML devices have required PMA. The FDA has generally been willing to classify AI/ML diagnostic and triage tools as Class II with special controls, even for serious clinical applications. This may change as AI/ML devices move toward more autonomous functions.
Notable FDA-Authorized AI/ML Devices by Specialty
| Device / Manufacturer | Specialty | Pathway | Year | Significance |
|---|---|---|---|---|
| IDx-DR (Digital Diagnostics) | Ophthalmology | De Novo | 2018 | First FDA-authorized autonomous AI diagnostic system; detects diabetic retinopathy without clinician interpretation |
| Apple Watch ECG (Apple) | Cardiology | De Novo | 2018 | First consumer wearable with FDA-authorized ECG; De Novo established a new product classification |
| Viz.ai ContaCT (Viz.ai) | Neurology | De Novo | 2018 | AI-based triage for suspected large vessel occlusion stroke; automated clinical workflow notification |
| Caption Guidance (Caption Health) | Cardiology | De Novo | 2020 | AI guidance for cardiac ultrasound acquisition by non-expert users |
| Paige Prostate (Paige AI) | Pathology | De Novo | 2021 | First AI-based pathology tool for cancer detection in prostate biopsies |
| GI Genius (Medtronic) | Gastroenterology | De Novo | 2021 | AI-assisted polyp detection during colonoscopy |
| Eko Murmur Analysis (Eko Health) | Cardiology | De Novo | 2023 | AI algorithm for heart murmur detection using digital stethoscope |
| Butterfly iQ+ (Butterfly Network) | Radiology | 510(k) | 2020 | AI-assisted point-of-care ultrasound with automated image interpretation |
| Aidoc BriefCase (Aidoc) | Radiology | 510(k) | 2020+ | Suite of AI triage algorithms for CT findings (PE, ICH, C-spine fracture, aortic emergencies) |
| Tempus ECG Analysis (Tempus AI) | Cardiology | 510(k) | 2024 | AI-based ECG interpretation with clinical genomic integration |
Predetermined Change Control Plans (PCCPs)
The Predetermined Change Control Plan is arguably the most important regulatory innovation for AI/ML medical devices in the past decade. Finalized by the FDA in December 2024 as "Marketing Submission Recommendations for a Predetermined Change Control Plan for Artificial Intelligence-Enabled Device Software Functions," the PCCP framework addresses the fundamental tension between AI/ML's need to evolve and the regulatory system's need for oversight.
The Problem PCCPs Solve
Before PCCPs, every significant change to an AI/ML device -- retraining the model on new data, adjusting performance thresholds, expanding the input data types the model accepts -- required a new premarket submission (typically a new 510(k) or PMA supplement). This process could take months and created a perverse dynamic: manufacturers were discouraged from improving their AI/ML devices because each improvement triggered a new regulatory cycle.
This was particularly problematic for AI/ML devices where regular retraining is expected and beneficial. An AI model that detects a specific pathology might improve substantially with access to new, diverse training data. Under the traditional framework, deploying that improved model required a full regulatory submission for what was essentially a performance improvement using the same fundamental approach.
How a PCCP Works
A PCCP is submitted as part of a premarket marketing submission (510(k), De Novo, or PMA) and consists of three components:
Description of Planned Modifications: What types of changes the manufacturer anticipates making. For AI/ML devices, this typically includes model retraining on new data, performance threshold adjustments, expansion of compatible input data formats, addition of new output types within the authorized intended use, and algorithm architecture refinements within defined bounds.
Modification Protocol: A step-by-step description of how each type of modification will be developed, validated, and verified before deployment. For AI/ML devices, the modification protocol typically addresses:
- Data management procedures for new training data (collection, curation, labeling quality)
- Retraining methodology and validation approach
- Performance acceptance criteria (the new model must meet or exceed defined thresholds)
- Testing against a reference test set
- Subgroup performance requirements (ensuring modifications do not degrade performance for specific populations)
- Human factors considerations for any changes to the user interface
- Risk analysis for the proposed modification
Impact Assessment: An evaluation of how the planned modifications might affect the device's safety and effectiveness, including identification of new risks introduced by the modification and how those risks are mitigated.
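To make the modification-protocol idea concrete, here is a minimal sketch of the kind of automated acceptance gate a protocol might specify for a retrained model: the candidate must meet pre-specified overall thresholds on a locked reference test set and must not degrade any subgroup beyond a pre-agreed margin. All thresholds, names, and margins here are hypothetical, not values from the FDA guidance.

```python
from dataclasses import dataclass

@dataclass
class Performance:
    sensitivity: float
    specificity: float

# Hypothetical pre-specified acceptance criteria from an authorized PCCP.
MIN_SENSITIVITY = 0.90
MIN_SPECIFICITY = 0.85
MAX_SUBGROUP_DEGRADATION = 0.02  # no subgroup may lose more than 2 points

def passes_modification_protocol(
    candidate: Performance,
    candidate_subgroups: dict[str, Performance],
    deployed_subgroups: dict[str, Performance],
) -> bool:
    """Return True only if the retrained model meets every pre-specified gate."""
    if candidate.sensitivity < MIN_SENSITIVITY or candidate.specificity < MIN_SPECIFICITY:
        return False
    for name, cand in candidate_subgroups.items():
        base = deployed_subgroups[name]
        if cand.sensitivity < base.sensitivity - MAX_SUBGROUP_DEGRADATION:
            return False
        if cand.specificity < base.specificity - MAX_SUBGROUP_DEGRADATION:
            return False
    return True

# Example: strong overall metrics and stable subgroup performance pass the gate.
perf = Performance(sensitivity=0.93, specificity=0.88)
subs = {"age_65_plus": Performance(0.91, 0.87)}
print(passes_modification_protocol(perf, subs, subs))  # -> True
```

The point of such a gate is that it is agreed with the FDA before any retraining occurs; the manufacturer then executes and documents it for each update rather than negotiating criteria after the fact.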
What PCCPs Can and Cannot Cover
PCCPs can cover:
- Retraining an AI/ML model on new datasets using the same architecture and intended use
- Adjusting algorithm performance thresholds within pre-specified bounds
- Expanding input compatibility (e.g., adding support for new imaging modalities from the same clinical domain)
- Updating preprocessing or postprocessing steps
- Adding new output visualizations or confidence scores within the existing intended use
PCCPs cannot cover:
- Changes to the device's intended use (e.g., expanding from lung nodule detection to lung cancer staging)
- Changes that introduce new, unmitigated risks not addressed in the original submission
- Changes that would alter the device's classification
- Open-ended modifications without defined boundaries (the plan must be specific and bounded)
- Changes outside the scope described in the authorized PCCP
Practical significance: The PCCP framework does not eliminate regulatory oversight -- it frontloads it. The FDA reviews the PCCP as part of the original submission and must agree that the planned modifications, if executed according to the protocol, will maintain safety and effectiveness. The manufacturer then executes changes within the authorized plan and reports to the FDA, but does not need to wait for a new clearance before deploying each change.
PCCP Terminology: From ML-DSF to AI-DSF
A notable evolution in the final PCCP guidance is the shift from "Machine Learning-Enabled Device Software Functions (ML-DSF)" to "Artificial Intelligence-Enabled Device Software Functions (AI-DSF)." This is not merely cosmetic. By broadening the terminology, the FDA signaled that PCCPs are available for any AI-based software function -- including rule-based AI systems, expert systems, and hybrid approaches -- not only those using machine learning techniques. Manufacturers working with non-ML AI approaches should take note: the PCCP pathway is available to them.
Good Machine Learning Practice (GMLP)
In October 2021, the FDA, Health Canada, and the UK's Medicines and Healthcare products Regulatory Agency (MHRA) jointly published "Good Machine Learning Practice for Medical Device Development: Guiding Principles." These 10 principles represent the first international consensus on best practices for developing AI/ML-based medical devices. While not legally binding, they are increasingly referenced by reviewers during premarket evaluation and represent the standard of care that regulators expect.
The 10 GMLP Guiding Principles
1. Multi-Disciplinary Expertise Is Leveraged Throughout the Total Product Lifecycle
AI/ML device development requires collaboration among clinical experts (who understand the clinical problem and patient populations), data scientists and ML engineers (who design and implement algorithms), software engineers (who build production systems), biostatisticians (who design validation studies), human factors specialists (who ensure appropriate user interaction), and regulatory professionals (who navigate submission requirements). No single discipline can adequately address all aspects of AI/ML device development.
2. Good Software Engineering and Security Practices Are Implemented
AI/ML devices are software, and all established software engineering practices apply: version control, code review, testing, documentation, and security. IEC 62304 compliance is expected. The ML pipeline -- data ingestion, preprocessing, training, validation, deployment -- must be as rigorously managed as the production software. Reproducibility is essential: given the same training data and hyperparameters, the training process should produce a model with equivalent performance.
3. Clinical Study Participants and Data Sets Are Representative of the Intended Patient Population
Training, validation, and test data must reflect the diversity of the population for which the device is intended. This includes diversity across age, sex, race, ethnicity, comorbidities, disease severity, and clinical setting. If the intended use population includes both children and adults, the training data must include both. If the device will be deployed across hospitals using different imaging equipment, the training data should reflect that equipment variability.
4. Training Data Sets Are Independent of Test Sets
Data leakage -- where information from the test set influences the training process -- invalidates performance claims. The GMLP principles require strict independence: no data point should appear in both training and test sets, and no information derived from the test set (such as summary statistics) should influence model development decisions. For medical imaging, this includes patient-level separation -- all images from a single patient must be in the same partition.
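Patient-level separation is easy to get wrong when one patient contributes multiple images. One common way to enforce it, sketched below with scikit-learn's GroupShuffleSplit, is to split on patient identifiers rather than on individual images; the data and variable names are toy stand-ins.

```python
import numpy as np
from sklearn.model_selection import GroupShuffleSplit

# Toy stand-ins: 10 images from 4 patients; y is the label, groups the patient ID.
X = np.arange(10).reshape(-1, 1)
y = np.array([0, 0, 1, 1, 0, 1, 0, 0, 1, 1])
groups = np.array(["p1", "p1", "p1", "p2", "p2", "p3", "p3", "p3", "p4", "p4"])

# Split at the patient level: every image from a patient lands in exactly one partition.
splitter = GroupShuffleSplit(n_splits=1, test_size=0.25, random_state=42)
train_idx, test_idx = next(splitter.split(X, y, groups=groups))

# Verify independence: no patient may appear on both sides of the split.
assert set(groups[train_idx]).isdisjoint(set(groups[test_idx]))
```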
5. Selected Reference Datasets Are Based on Best Available Methods
The ground truth labels used for training and testing must be established using the best available clinical methods. If the gold standard for a diagnosis is histopathology, then the reference standard for training an AI to detect that condition from imaging should be histopathology-confirmed cases, not radiologist opinion alone. The limitations of the reference standard must be documented and accounted for in performance claims.
6. Model Design Is Tailored to the Available Data and Intended Use
Algorithm complexity should be appropriate for the available data. Training a 100-million-parameter deep neural network on 500 labeled images invites overfitting. The GMLP principles encourage manufacturers to match model architecture to dataset size, to use appropriate regularization and cross-validation techniques, and to consider simpler models when data is limited. The best model is the one that generalizes best to new data, not the one that achieves the highest training accuracy.
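One practical way to act on this principle is to compare candidate models of different complexity using cross-validation on the available data, rather than trusting training accuracy. A minimal sketch using scikit-learn, with synthetic data standing in for a small labeled clinical dataset:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for a small labeled clinical dataset (500 cases, 30 features).
X, y = make_classification(n_samples=500, n_features=30, random_state=0)

candidates = {
    "logistic regression (simple, regularized)": LogisticRegression(max_iter=1000),
    "random forest (higher capacity)": RandomForestClassifier(n_estimators=300, random_state=0),
}

# Cross-validated AUC estimates generalization; training accuracy would reward overfitting.
for name, model in candidates.items():
    scores = cross_val_score(model, X, y, cv=5, scoring="roc_auc")
    print(f"{name}: AUC = {scores.mean():.3f} +/- {scores.std():.3f}")
```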
7. Focus Is Placed on the Performance of the Human-AI Team
Clinical AI/ML devices almost always operate within a workflow that includes human clinicians. The relevant performance metric is not the algorithm's standalone accuracy -- it is the performance of the human-AI team. A radiologist aided by an AI tool may perform better than either the radiologist alone or the AI alone. Conversely, an overconfident AI output might lead to automation bias, where the clinician defers to the AI even when the AI is wrong. Validation studies should evaluate human-AI team performance, not just algorithm performance in isolation.
8. Testing Demonstrates Device Performance During Clinically Relevant Conditions
Validation must reflect the conditions under which the device will actually be used. This means testing with data from the clinical environments, patient populations, equipment types, and image quality levels that the device will encounter in practice. Testing exclusively on curated, high-quality academic datasets and then deploying the device in community hospitals with older equipment is a recipe for performance surprises.
9. Users Are Provided Clear, Essential Information
Labeling must communicate the device's intended use, how the AI/ML component functions, what data it was trained on, its performance characteristics (including sensitivity, specificity, and predictive values), known limitations, subpopulation performance variations, and conditions under which the device may not perform as intended. Users should understand what the AI is doing and where it might fail.
10. Deployed Models Are Monitored for Performance and Re-Training Risks Are Managed
Post-deployment monitoring must detect model degradation, including performance changes due to data drift (changes in input data distribution) or concept drift (changes in the relationship between inputs and outputs). When a model is retrained, the retraining process itself introduces risks -- the updated model might perform worse on certain subpopulations even if it performs better overall. These risks must be managed through systematic performance monitoring and validation protocols.
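Data drift detection is often implemented as a statistical comparison between a reference window (data the model was validated on) and a live window. Below is a minimal sketch using a two-sample Kolmogorov-Smirnov test on a single input feature; a production system would monitor many features plus output distributions, and the alert threshold here is illustrative.

```python
import numpy as np
from scipy.stats import ks_2samp

def feature_drift_alert(reference: np.ndarray, live: np.ndarray, alpha: float = 0.01) -> bool:
    """Flag drift when the live distribution differs significantly from the reference."""
    statistic, p_value = ks_2samp(reference, live)
    return p_value < alpha

rng = np.random.default_rng(7)
reference = rng.normal(loc=0.0, scale=1.0, size=5000)  # validation-era feature values
live = rng.normal(loc=0.4, scale=1.0, size=5000)       # shifted post-deployment values

print(feature_drift_alert(reference, live))  # -> True: the input distribution has moved
```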
Transparency and Labeling Requirements
The FDA has made transparency a central pillar of its AI/ML framework, recognizing that clinicians and patients need to understand how AI/ML devices work in order to use them safely and effectively.
What the FDA Expects in AI/ML Device Labeling
The January 2025 draft guidance and prior agency communications establish the following labeling expectations for AI/ML devices:
- Intended use and indications for use: Clear articulation of what the device does, for what patient population, in what clinical setting, and by whom
- Description of the AI/ML function: Plain-language explanation of how the algorithm works, what inputs it requires, and what outputs it produces
- Training data description: Sufficient information for users to assess the relevance of the training data to their patient population -- including the demographics, clinical characteristics, geographic sources, and imaging equipment types represented in the training data
- Performance characteristics: Sensitivity, specificity, positive predictive value, negative predictive value, AUC, or other metrics appropriate to the clinical use case, with confidence intervals
- Subpopulation performance: Performance broken down by clinically relevant subgroups (age, sex, race/ethnicity, disease severity) where sample sizes permit
- Known limitations: Conditions under which the device may not perform as intended, including image quality requirements, patient populations not represented in training data, and known failure modes
- Warning against over-reliance: Clear statements that the AI output is intended to assist clinical decision-making, not replace it (for non-autonomous devices)
The Push Toward Algorithmic Transparency
Beyond labeling, the FDA has signaled a broader interest in algorithmic transparency. The agency has explored the concept of a "nutrition label" for AI/ML devices -- a standardized format for communicating key information about an algorithm's training, performance, and limitations. While no formal requirement has been finalized, the direction is clear: the era of black-box AI in clinical medicine is ending.
The Coalition for Health AI (CHAI), an industry-academic consortium, has developed model card and transparency frameworks that align with FDA's direction. Manufacturers would be well-advised to prepare for increasing transparency requirements by documenting algorithm development decisions, training data provenance, and performance characteristics from the outset.
Clinical Validation for AI/ML Devices
Clinical validation for AI/ML medical devices must demonstrate that the device performs safely and effectively in the intended clinical context. The approach differs from traditional medical device clinical evaluation in several important ways.
Study Design Considerations
Standalone performance studies evaluate the algorithm's performance on a curated test set. These studies measure metrics like sensitivity, specificity, AUC, and predictive values against a defined reference standard. For many AI/ML devices, particularly those in radiology, standalone performance studies using retrospective datasets have been sufficient for FDA authorization.
Reader studies compare the performance of clinicians using the AI tool (with-AI) versus clinicians without the AI tool (without-AI). Reader studies are particularly important for AI devices that augment rather than replace clinical judgment, because they measure the clinically relevant outcome: human-AI team performance.
Prospective clinical studies evaluate device performance in real clinical workflows with real patients. The FDA has increasingly expected prospective evidence for higher-risk AI/ML devices, particularly those that influence treatment decisions or operate with a degree of autonomy.
Critical Methodological Requirements
Independent test set: The test dataset must be completely independent from the data used to train or tune the model. Any data leakage invalidates the study.
Pre-specified analysis plan: Performance endpoints, statistical methods, and subgroup analyses must be defined before unblinding the test data. Post-hoc performance analyses are viewed skeptically.
Adequate sample size: The test set must be large enough to demonstrate performance with statistical rigor across the intended population. The FDA expects confidence intervals around all performance estimates.
Subgroup analysis: Performance must be evaluated across clinically relevant subgroups. If the intended use population includes diverse demographics, the test set must include sufficient representation of each subgroup to characterize performance.
Multi-site data: Especially for imaging AI, validation on data from multiple clinical sites, using different equipment manufacturers and acquisition protocols, strengthens the generalizability argument.
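Several of these requirements can be illustrated in a few lines: computing sensitivity and specificity with bootstrap confidence intervals, overall and per subgroup. This is a sketch with synthetic labels, not a substitute for a pre-specified statistical analysis plan; real analyses must also handle edge cases such as resamples with no positive cases.

```python
import numpy as np

def sensitivity_specificity(y_true, y_pred):
    tp = np.sum((y_true == 1) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    tn = np.sum((y_true == 0) & (y_pred == 0))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    return tp / (tp + fn), tn / (tn + fp)

def bootstrap_ci(y_true, y_pred, n_boot=2000, seed=0):
    """95% percentile bootstrap confidence intervals for sensitivity and specificity."""
    rng = np.random.default_rng(seed)
    sens, spec = [], []
    n = len(y_true)
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample cases with replacement
        s, p = sensitivity_specificity(y_true[idx], y_pred[idx])
        sens.append(s)
        spec.append(p)
    return np.percentile(sens, [2.5, 97.5]), np.percentile(spec, [2.5, 97.5])

# Toy example: 400 cases from two sites, treated as clinically relevant subgroups.
rng = np.random.default_rng(1)
y_true = rng.integers(0, 2, size=400)
y_pred = np.where(rng.random(400) < 0.85, y_true, 1 - y_true)  # ~85% agreement
site = rng.choice(["site_A", "site_B"], size=400)

for group in ["site_A", "site_B"]:
    mask = site == group
    ci_sens, ci_spec = bootstrap_ci(y_true[mask], y_pred[mask])
    print(group, "sensitivity 95% CI:", ci_sens, "specificity 95% CI:", ci_spec)
```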
Addressing Bias in AI/ML Devices
Bias in AI/ML medical devices is a patient safety issue, not merely an ethical concern. An AI algorithm that performs less accurately for certain racial groups, age groups, or clinical presentations can lead to delayed diagnoses, missed conditions, or inappropriate treatment.
The FDA expects manufacturers to:
- Analyze training data for demographic representation and document any gaps
- Test device performance across clinically relevant subgroups and report results
- Identify and mitigate sources of bias in the training pipeline (label bias, selection bias, measurement bias)
- Include demographic performance data in labeling so that clinicians can assess the device's relevance to their patient population
- Develop monitoring plans to detect performance disparities in post-market deployment
The Clinical Evidence Gap: What the Data Actually Shows
While the FDA has authorized over 1,000 AI/ML-enabled devices, there is a growing body of evidence suggesting that the clinical rigor behind many of these authorizations may not match the sophistication of the technology itself. A systematic review by Lin et al. (2025), which examined the evidence base underlying FDA-authorized AI/ML devices, revealed findings that should give the industry pause:
| Finding | Value |
|---|---|
| Percentage of AI/ML devices that underwent randomized controlled trials (RCTs) | ~1.6% |
| Percentage reporting actual patient outcome data | Less than 1% |
| FDA summaries lacking demographic data on study populations | A substantial majority |
| Most common evidence basis for authorization | Retrospective standalone performance studies |
| Percentage of devices authorized via 510(k) (lower evidence bar) | ~95% |
These findings do not necessarily mean that authorized AI/ML devices are unsafe. The 510(k) pathway, by design, relies on substantial equivalence to a predicate rather than independent clinical outcome evidence. Standalone performance studies -- demonstrating that an algorithm can detect a finding on a curated test set -- are a legitimate form of evidence for many lower-risk applications.
However, the gap between what is technically authorized and what is clinically validated in real-world conditions is significant. When fewer than 2% of AI/ML devices have been tested in randomized trials, and fewer than 1% report patient outcomes, the evidence base is overwhelmingly dependent on retrospective dataset performance -- which may not reflect real-world clinical utility.
What the industry needs to improve:
- More prospective validation: Particularly for devices that influence treatment decisions, prospective studies measuring clinical workflow impact and patient outcomes should become standard, not exceptional
- Demographic transparency: FDA authorization summaries should consistently report the demographic composition of study populations, enabling clinicians to assess whether the evidence is relevant to their patients
- Post-market outcome tracking: The current system generates limited data on whether AI/ML devices actually improve patient outcomes after deployment. Manufacturers and health systems should collaborate on outcome registries
- Higher evidence standards for higher-risk claims: Devices that claim to autonomously diagnose or triage patients in critical situations should face evidence requirements commensurate with those claims
Context: This evidence gap is not unique to AI/ML devices. The 510(k) pathway has long been debated for its reliance on predicate equivalence rather than independent clinical evidence. But the rapid pace of AI/ML authorization -- hundreds of devices per year -- makes the evidence question particularly urgent. Regulators, manufacturers, and the clinical community share responsibility for closing this gap.
Continuous Learning and Adaptive AI
Continuously learning or adaptive AI/ML devices -- those that modify their behavior based on new data during clinical deployment -- represent the frontier of both AI/ML capability and regulatory challenge.
The Regulatory Challenge
A continuously learning algorithm is, by definition, a moving target. The device that the FDA authorized is not the same device six months later. Traditional regulatory mechanisms have no easy way to handle this: the concept of "substantial equivalence" presumes a stable device, and post-market surveillance assumes a known baseline.
Current FDA Approach
As of 2026, the FDA's approach to adaptive AI/ML is cautious:
- No fully autonomous continuously learning devices have been authorized. Every AI/ML device authorized by the FDA to date either uses a locked algorithm or undergoes periodic manufacturer-controlled updates.
- PCCPs provide a partial solution: Manufacturers can use PCCPs to pre-authorize specific types of algorithm updates within defined parameters. But the PCCP must describe bounded modifications -- open-ended autonomous learning is not within the current PCCP scope.
- The January 2025 draft guidance addresses continuous learning as a topic requiring further development, noting that the agency continues to evaluate appropriate regulatory mechanisms.
- Real-world performance monitoring pilots are exploring how post-market data can be used to detect performance changes in deployed AI/ML devices, which is a prerequisite for safe adaptive algorithms.
The bottom line: Manufacturers developing adaptive AI/ML devices should expect the regulatory pathway to be significantly more complex than for locked-algorithm devices. Early engagement with the FDA through Pre-Submission meetings is strongly recommended. PCCPs offer a viable mechanism for planned, bounded adaptations, but true continuously learning systems remain an open regulatory question.
Generative AI and Foundation Models in Medical Devices
The emergence of large language models (LLMs), vision-language models, and other foundation models has introduced a fundamentally new category of AI technology into the medical device landscape. Unlike the convolutional neural networks and gradient-boosted trees that dominate the current FDA-authorized AI/ML device database, generative AI systems produce novel outputs -- text, images, or structured data -- rather than classifications or scores. This distinction has profound regulatory implications.
LLMs in Clinical Decision Support
Large language models are increasingly being integrated into clinical decision support (CDS) systems: summarizing patient records, generating differential diagnoses, drafting clinical notes, answering clinician questions about treatment guidelines, and interpreting lab results in context. Some of these applications may qualify as medical devices under the FDA's CDS guidance framework, while others may fall within the CDS exemption criteria established under the 21st Century Cures Act.
The FDA's CDS guidance (finalized in September 2022) exempts software functions that meet four criteria: the function is not intended to acquire, process, or analyze a medical image or signal; it is intended for the purpose of displaying, analyzing, or printing medical information; it is intended for the purpose of supporting or providing recommendations to a health care professional; and it enables the professional to independently review the basis for the recommendations. LLM-based systems that meet all four criteria may fall outside the scope of device regulation. However, LLM systems that generate autonomous diagnostic conclusions, bypass clinician review, or process medical images likely qualify as devices and are subject to the full regulatory framework.
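Because all four criteria must be met for the exemption, the logic reduces to a conjunction. The sketch below paraphrases the criteria as boolean inputs; it is a planning aid only, and an actual determination requires careful analysis of the guidance and the underlying statute.

```python
def cds_exemption_candidate(
    analyzes_medical_image_or_signal: bool,
    displays_or_analyzes_medical_information: bool,
    supports_recommendations_to_hcp: bool,
    hcp_can_independently_review_basis: bool,
) -> bool:
    """True if a software function plausibly meets all four CDS exemption criteria."""
    return (
        not analyzes_medical_image_or_signal          # criterion 1: no image/signal analysis
        and displays_or_analyzes_medical_information  # criterion 2
        and supports_recommendations_to_hcp           # criterion 3
        and hcp_can_independently_review_basis        # criterion 4
    )

# An LLM that summarizes guidelines, cites sources, and leaves the decision to the clinician:
print(cds_exemption_candidate(False, True, True, True))  # -> True (possibly exempt)
# The same LLM asked to interpret a chest X-ray fails criterion 1:
print(cds_exemption_candidate(True, True, True, True))   # -> False (likely a device)
```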
Hallucination Risk as a Regulatory Challenge
The defining regulatory challenge for generative AI in medical devices is hallucination -- the generation of plausible but factually incorrect outputs. A radiology AI that classifies an image as "positive" or "negative" has a bounded failure mode: it can be wrong, but the nature of the error is constrained. An LLM that generates a narrative clinical interpretation can introduce entirely fabricated findings, invent citations to nonexistent studies, or produce recommendations that are internally coherent but clinically dangerous.
Hallucination risk is qualitatively different from the misclassification risk that existing AI/ML regulatory frameworks were designed to address. Traditional performance metrics -- sensitivity, specificity, AUC -- do not adequately capture the failure modes of generative systems. Regulators and manufacturers are actively grappling with how to define, measure, and mitigate hallucination in clinical contexts. Approaches under exploration include:
- Retrieval-augmented generation (RAG): Grounding LLM outputs in verified clinical knowledge bases to reduce fabrication
- Output verification layers: Secondary algorithms that check LLM outputs against structured clinical databases before presenting them to users
- Constrained generation: Restricting the model's output space to predefined clinical vocabularies or structured formats
- Human-in-the-loop requirements: Mandating clinician review of all generative AI outputs before clinical action
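As one concrete illustration of an output verification layer, the sketch below checks findings extracted from a generated report against a verified clinical vocabulary and withholds anything unrecognized for human review. Real systems would use terminologies such as SNOMED CT and far more robust extraction; everything here, including the vocabulary, is simplified and hypothetical.

```python
# Hypothetical verified vocabulary; a real system would query SNOMED CT or a curated KB.
VERIFIED_FINDINGS = {"pulmonary embolism", "pleural effusion", "pneumothorax", "cardiomegaly"}

def verify_generated_findings(candidate_findings: list[str]) -> tuple[list[str], list[str]]:
    """Split generated findings into verified terms and unverified terms needing review."""
    verified, needs_review = [], []
    for finding in candidate_findings:
        (verified if finding.lower() in VERIFIED_FINDINGS else needs_review).append(finding)
    return verified, needs_review

# A generative model proposes three findings; the last is not in the verified vocabulary.
verified, flagged = verify_generated_findings(
    ["Pleural effusion", "Cardiomegaly", "Miliary osseous pneumopathy"]
)
print("verified:", verified)        # safe to surface with normal labeling
print("held for review:", flagged)  # potential hallucination; route to a clinician
```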
Non-Deterministic Outputs and Validation
Foundation models are inherently non-deterministic in most deployment configurations: the same input can produce different outputs across repeated queries. This property conflicts with fundamental assumptions in medical device validation, where test-retest reliability is a basic expectation. If a physician queries the system twice with the same patient data and receives two different clinical summaries, the regulatory implications are substantial.
Validation of generative AI medical devices cannot rely on traditional test-set evaluation alone. Manufacturers must develop validation frameworks that account for output variability, including:
- Statistical characterization of output consistency across repeated queries
- Clinical equivalence testing (are different outputs clinically equivalent even if textually different?)
- Worst-case output analysis (what is the most harmful output the system can produce for a given input?)
- Red-teaming and adversarial testing specifically designed for generative failure modes
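The first item on that list can be made concrete: query the system repeatedly with the same input and quantify agreement between outputs. The sketch below uses a trivial token-overlap (Jaccard) score and a stubbed non-deterministic model; a real protocol would use validated clinical-equivalence measures and the deployed system itself.

```python
import itertools
import random

def fake_llm_summary(patient_record: str, seed: int) -> str:
    """Stand-in for a non-deterministic generative model (illustrative only)."""
    rng = random.Random(seed)
    phrases = ["stable vitals", "mild anemia", "no acute distress", "follow-up in 2 weeks"]
    rng.shuffle(phrases)
    return "; ".join(phrases[: rng.randint(2, 4)])

def jaccard(a: str, b: str) -> float:
    """Crude token-overlap agreement between two generated texts."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb)

# Repeat the identical query N times and characterize pairwise output agreement.
outputs = [fake_llm_summary("same patient record", seed=i) for i in range(10)]
scores = [jaccard(a, b) for a, b in itertools.combinations(outputs, 2)]
print(f"mean pairwise agreement: {sum(scores) / len(scores):.2f}, min: {min(scores):.2f}")
```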
FDA's Evolving Position on Generative AI
As of early 2026, the FDA has not published dedicated guidance on generative AI or foundation models in medical devices. However, the agency has signaled through public statements, workshops, and the January 2025 TPLC draft guidance that it is actively evaluating how existing frameworks apply to these technologies. The AI-DSF terminology in the PCCP guidance is deliberately broad enough to encompass generative AI systems.
Manufacturers developing LLM-based or foundation-model-based medical devices should anticipate that the FDA will apply the same fundamental principles -- safety, effectiveness, transparency, equity -- but may require novel evidence types and validation approaches. Early engagement with the FDA through Pre-Submission meetings is particularly important for generative AI products, where the regulatory pathway may not be obvious from existing precedent.
EU MDR and the AI Act
European regulation of AI/ML medical devices involves two overlapping frameworks: the Medical Device Regulation (EU MDR 2017/745) and the Artificial Intelligence Act (Regulation (EU) 2024/1689).
EU MDR Requirements for AI/ML Devices
Under the EU MDR, AI/ML medical devices are subject to the same requirements as all medical devices, plus additional considerations for software:
- Classification under Rule 11: As discussed above, standalone software is classified as Class IIa at minimum, with higher classifications for devices whose outputs affect critical clinical decisions
- General Safety and Performance Requirements (GSPRs): Annex I requirements apply, including those related to software design (Section 17), information security (Sections 17.2, 17.4), and clinical evaluation
- Clinical evaluation: The MDR requires clinical evaluation based on clinical data, which for AI/ML devices means performance data on European patient populations. Clinical investigation (the EU equivalent of a clinical trial) may be required for higher-risk AI/ML devices
- Technical documentation: Must include complete software documentation per IEC 62304, algorithm description, training and validation data characterization, and performance evidence
- Post-market surveillance and PMCF: AI/ML device manufacturers must conduct Post-Market Clinical Follow-up (PMCF) that includes monitoring for performance drift and dataset shift
The EU AI Act and Medical Devices
The EU AI Act (Regulation (EU) 2024/1689), which entered into force in August 2024 with phased compliance deadlines, introduces a horizontal regulatory framework for AI systems across all sectors, including healthcare.
Key implications for AI/ML medical device manufacturers:
- High-risk classification: AI systems intended for use as a medical device (or as a safety component of a medical device) are classified as "high-risk" under Article 6 and Annex I of the AI Act
- Conformity assessment alignment: For AI systems that are medical devices, the AI Act conformity assessment is integrated into the EU MDR conformity assessment. The Notified Body assessing the device under MDR also assesses AI Act compliance. This avoids dual conformity assessment but increases the scope of what the Notified Body must evaluate
- Additional requirements: The AI Act imposes requirements on data governance, transparency, human oversight, accuracy, robustness, and cybersecurity that supplement (and in some areas exceed) the MDR's requirements
- Prohibited practices: The AI Act prohibits certain AI practices outright (Article 5), though these are primarily relevant to social scoring and manipulation rather than medical devices
- Timeline: The AI Act applies generally from August 2, 2026, but for high-risk AI systems embedded in products covered by Annex I sectoral legislation -- including medical devices -- the corresponding obligations apply from August 2, 2027
Practical impact: Manufacturers marketing AI/ML medical devices in the EU must comply with both the MDR and the AI Act. While the conformity assessment is integrated, the documentation and compliance requirements are additive. Manufacturers should begin assessing AI Act compliance requirements now, particularly regarding data governance documentation, transparency obligations, and human oversight mechanisms.
International Regulatory Landscape
Health Canada
Health Canada has been among the most progressive regulators on AI/ML medical devices. Key positions include:
- Co-author of GMLP principles: Health Canada co-published the 10 GMLP guiding principles with the FDA and MHRA
- Pre-market review: AI/ML devices are regulated under Canada's Medical Devices Regulations (SOR/98-282) with classification based on risk. Most AI/ML SaMD falls into Class II, III, or IV
- Transparency expectations: Health Canada has emphasized the importance of algorithmic transparency and has published guidance on digital health technologies that includes AI/ML-specific considerations
- PCCP-like mechanisms: Health Canada is evaluating approaches to predetermined change control for AI/ML devices, informed by the GMLP principles and FDA's PCCP framework
UK MHRA
The MHRA has positioned itself as a leader in AI/ML medical device regulation following Brexit:
- Co-author of GMLP principles: The MHRA co-published the GMLP guiding principles
- Software and AI as a Medical Device Change Programme: The MHRA launched a multi-year program to develop a modernized regulatory framework for software and AI medical devices in the UK
- Adaptive regulation: The MHRA has expressed interest in regulatory sandboxes and adaptive approaches that could accommodate AI/ML device evolution more flexibly than traditional frameworks
- UK MDR 2002 (as amended): AI/ML devices are currently regulated under the UK's existing medical device regulations, with UKCA marking requirements. The MHRA's ongoing reform may result in new, AI-specific requirements
Japan PMDA
Japan's Pharmaceuticals and Medical Devices Agency (PMDA) has taken a structured approach to AI/ML:
- DASH for SaMD: Japan's "DX Action Strategies in Healthcare for SaMD" (DASH for SaMD), led by MHLW with the PMDA, sets out how SaMD, including AI/ML-based software, is developed, reviewed, and expedited in Japan
- IDATEN framework: Japan's IDATEN scheme (Improvement Design within Approval for Timely Evaluation and Notice) permits pre-agreed post-approval changes to authorized devices, a mechanism conceptually similar to the FDA's PCCP and particularly relevant to continuously improving AI/ML devices
- Collaborative frameworks: Japan participates in international harmonization efforts through IMDRF and bilateral agreements with the FDA
- IEC 62304 emphasis: Japan has strongly adopted IEC 62304 for software lifecycle management, including AI/ML components
China NMPA
China's National Medical Products Administration (NMPA) has rapidly expanded its regulatory framework for AI/ML medical devices, reflecting the country's substantial investment in healthcare AI. China has authorized a growing number of AI/ML medical devices, particularly in medical imaging and pathology, and has published several guidance documents specific to AI/ML:
- Classification guidance: The NMPA classifies AI/ML diagnostic software as Class II or Class III medical devices, depending on the clinical risk. AI software that provides autonomous diagnostic conclusions for serious conditions is typically Class III.
- Technical review guidelines: The NMPA has published guidelines for the technical review of AI/ML medical device software, covering algorithm description, training data requirements, performance evaluation, and clinical validation expectations. These guidelines share conceptual similarities with the FDA's TPLC approach but differ in specific requirements.
- Clinical trial requirements: For Class III AI/ML devices, the NMPA generally requires prospective clinical trials conducted at Chinese clinical sites with Chinese patient populations. This creates a significant market-entry barrier for foreign manufacturers, as retrospective studies or studies conducted entirely outside China may not be accepted.
- Data localization considerations: Manufacturers should be aware of China's data security and personal information protection laws, which may affect the transfer of training data and clinical data across borders.
China represents one of the largest markets for AI/ML medical devices, and manufacturers with global ambitions must account for NMPA requirements in their regulatory strategy.
WHO Guidance on AI for Health
The World Health Organization published its landmark guidance document, "Ethics and Governance of Artificial Intelligence for Health," in 2021 and has continued to refine its framework through subsequent publications. In 2023, the WHO articulated a six-pillar governance framework for AI in health:
- Transparency and documentation: AI systems should be transparent in their design, development, and deployment, with adequate documentation of algorithms, training data, and intended use
- Risk management and safety: Comprehensive risk management throughout the AI lifecycle, including identification and mitigation of risks to patients and health systems
- Data governance and privacy: Robust data governance frameworks that protect patient privacy, ensure data quality, and address bias in training datasets
- Collaboration and stakeholder engagement: Inclusive development processes that engage patients, clinicians, regulators, and affected communities
- Regulatory oversight and compliance: Appropriate regulatory mechanisms proportionate to the risk of the AI system, with post-market surveillance capabilities
- Evaluation and monitoring: Continuous monitoring of AI system performance in real-world settings, with mechanisms to detect and address safety concerns
While the WHO framework is not legally binding, it carries significant normative weight and influences regulatory development in low- and middle-income countries that may lack the institutional capacity to develop their own AI/ML device frameworks from scratch. Manufacturers marketing AI/ML devices globally should be familiar with the WHO framework, as it increasingly serves as the baseline for national regulatory development.
OECD AI Principles
The Organisation for Economic Co-operation and Development (OECD) adopted its AI Principles in 2019 (updated in 2024), which have been endorsed by over 40 countries and serve as a foundational reference for AI governance across sectors, including healthcare. The OECD principles emphasize:
- Inclusive growth, sustainable development, and well-being: AI should benefit people and the planet
- Human-centred values and fairness: AI systems should respect human rights, democratic values, and diversity, with safeguards to ensure fairness and prevent discrimination
- Transparency and explainability: AI actors should provide meaningful information about AI systems to enable stakeholders to understand outcomes and challenge them
- Robustness, security, and safety: AI systems should function in a robust, secure, and safe manner throughout their lifecycle, with ongoing risk assessment and management
- Accountability: Organizations and individuals developing or deploying AI should be accountable for the proper functioning of AI systems in line with the above principles
The OECD principles are particularly relevant to medical device manufacturers because many national regulatory frameworks -- including the EU AI Act -- draw directly from them. The OECD also maintains the AI Policy Observatory, which tracks AI policies and initiatives across member states and provides comparative analysis that is useful for manufacturers developing international regulatory strategies.
IEC 62304 and Software Lifecycle Considerations
IEC 62304 (Medical device software -- Software life cycle processes) is the foundational standard for medical device software development, and it applies fully to AI/ML medical devices. However, AI/ML introduces complexities that the standard did not originally contemplate.
Where IEC 62304 Applies to AI/ML
- Software development planning: The ML pipeline (data ingestion, preprocessing, training, validation, deployment) must be planned and documented as part of the software development lifecycle
- Software requirements: Requirements must capture the AI/ML-specific behavior, including expected performance thresholds, input specifications, output specifications, and failure modes
- Software architecture: The architecture must document the ML model as a software component, including its interfaces, dependencies, and integration points
- Software unit verification: Individual software units -- including data preprocessing code, model training scripts, inference engines, and postprocessing logic -- must be verified
- Software integration and system testing: The integrated system, including the ML model, must be tested in the context of the complete device
- Software maintenance: Bug fixes, model updates, and performance improvements must follow the maintenance process, including impact analysis and regression testing
AI/ML-Specific Challenges Under IEC 62304
Software of Unknown Provenance (SOUP): Pre-trained models, transfer learning base models, and third-party ML frameworks (TensorFlow, PyTorch, ONNX Runtime) are SOUP components. IEC 62304 requires manufacturers to identify all SOUP, evaluate its suitability, and manage risks associated with its use. For AI/ML, this extends to pre-trained model weights, which are effectively SOUP with unknown training data provenance.
Reproducibility: IEC 62304 assumes that software behavior is deterministic and reproducible. ML training processes are often stochastic -- random weight initialization, random data shuffling, GPU floating-point nondeterminism -- which means identical training runs may produce models with different weights and slightly different performance. Manufacturers must address this through controlled randomness (fixed seeds), performance-based acceptance criteria rather than bit-exact reproducibility, and documented training procedures.
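As an illustration of what "controlled randomness" can look like in practice, the following minimal sketch pins the common sources of nondeterminism before a training run begins. It assumes a PyTorch-based pipeline; the helper function name is hypothetical, and even with these settings GPU runs may not be bit-exact, which is why acceptance criteria should remain performance-based.

```python
import os
import random

import numpy as np
import torch


def set_reproducible_seeds(seed: int = 42) -> None:
    """Pin the common sources of randomness in a PyTorch training run.

    Full bit-exact reproducibility across GPU hardware is still not guaranteed;
    acceptance criteria should be performance-based rather than weight-identical.
    """
    random.seed(seed)                           # Python's built-in RNG
    np.random.seed(seed)                        # NumPy RNG (shuffling, augmentation)
    torch.manual_seed(seed)                     # PyTorch CPU and CUDA RNGs
    torch.cuda.manual_seed_all(seed)            # all visible GPUs (no-op without CUDA)
    torch.backends.cudnn.deterministic = True   # prefer deterministic cuDNN kernels
    torch.backends.cudnn.benchmark = False      # disable nondeterministic kernel autotuning
    os.environ["PYTHONHASHSEED"] = str(seed)    # hash-based ordering (e.g., set iteration)


set_reproducible_seeds(42)
# ... proceed with dataset loading, model construction, and training ...
```

The seed values and the exact set of flags are choices the manufacturer documents in the training procedure, not requirements stated in IEC 62304 itself.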
Data as a design input: In traditional software, the code is the design and the design inputs are requirements. In AI/ML, the data is also a design input -- perhaps the most important one. IEC 62304 does not explicitly address data management as a software lifecycle activity, but regulators expect data management to be treated with the same rigor as code management: version controlled, traced, validated, and documented.
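One lightweight way to give a dataset release the same version-control rigor as code is a hashed manifest generated for every release. The sketch below is a generic illustration, not a prescribed format; the directory path, release tag, and manifest fields are hypothetical.

```python
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path


def build_dataset_manifest(data_dir: str, release_tag: str) -> dict:
    """Record a SHA-256 hash for every file in a dataset release, for traceability."""
    entries = []
    for path in sorted(Path(data_dir).rglob("*")):
        if path.is_file():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            entries.append({"file": str(path.relative_to(data_dir)), "sha256": digest})
    return {
        "release": release_tag,
        "generated_utc": datetime.now(timezone.utc).isoformat(),
        "file_count": len(entries),
        "files": entries,
    }


if __name__ == "__main__":
    # Hypothetical paths: point at the frozen dataset directory for this release.
    manifest = build_dataset_manifest("training_data/v1", "train-v1.0")
    Path("train-v1.0.manifest.json").write_text(json.dumps(manifest, indent=2))
```

Committing the manifest (rather than the raw data) to the same repository as the training code gives reviewers a traceable link between a model version and the exact data it was trained on.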
IEC 62304 Safety Classification for AI/ML
IEC 62304 classifies software into three safety classes:
| Safety Class | Description | AI/ML Relevance |
|---|---|---|
| Class A | No injury or damage to health is possible | Rare for AI/ML SaMD; most AI/ML devices that provide clinical information will be Class B or C |
| Class B | Non-serious injury is possible | AI/ML devices that inform but do not drive clinical decisions; lower-risk screening tools |
| Class C | Death or serious injury is possible | AI/ML devices that drive critical clinical decisions; autonomous diagnostic tools; treatment recommendation systems |
The safety class determines the rigor of software lifecycle activities required. Class C software requires the most extensive documentation, testing, and process controls.
Cybersecurity Considerations for AI/ML Devices
AI/ML medical devices face all the cybersecurity challenges of traditional software medical devices, plus several that are unique to AI/ML.
AI/ML-Specific Cybersecurity Threats
Adversarial attacks: Carefully crafted perturbations to input data can cause AI/ML models to produce incorrect outputs. For medical imaging AI, this could mean adding imperceptible noise to a medical image that causes the algorithm to miss a tumor or falsely detect a pathology that does not exist. Research has demonstrated successful adversarial attacks against medical imaging classifiers, ECG analysis algorithms, and natural language processing systems used in clinical decision support.
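To make the adversarial-attack concept concrete, the sketch below implements the classic fast gradient sign method (FGSM) against a hypothetical PyTorch image classifier. It is illustrative only: `model`, the input tensor shape, and the epsilon value are assumptions, and real robustness testing would use attacks tailored to the clinical imaging domain.

```python
import torch
import torch.nn.functional as F


def fgsm_perturb(model: torch.nn.Module, image: torch.Tensor, label: torch.Tensor,
                 epsilon: float = 0.01) -> torch.Tensor:
    """Return an adversarially perturbed copy of `image` using FGSM.

    A perturbation of this size can be visually imperceptible yet flip the
    model's prediction, which is why adversarial robustness testing matters
    for imaging AI.
    """
    image = image.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(image), label)       # loss with respect to the true label
    loss.backward()
    perturbed = image + epsilon * image.grad.sign()   # step in the direction that increases the loss
    return perturbed.clamp(0.0, 1.0).detach()         # keep pixel values in a valid range
```

In a robustness evaluation, the model's outputs on the original and perturbed images would be compared across a representative test set to quantify how much performance degrades under this kind of attack.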
Data poisoning: If training data can be manipulated -- either during initial training or during retraining of adaptive systems -- an attacker can systematically degrade model performance or introduce targeted misclassifications. This is a particularly relevant threat for AI/ML devices that retrain on real-world data.
Model inversion and extraction: Attackers may attempt to reverse-engineer the AI model by querying it with specially constructed inputs, potentially extracting proprietary model weights or reconstructing training data (which may include protected health information).
Supply chain attacks on ML frameworks: AI/ML devices depend on complex software supply chains including ML frameworks, numerical libraries, and pre-trained models. Compromised dependencies can introduce vulnerabilities that are difficult to detect through traditional security testing.
Software Bill of Materials (SBOM) for AI/ML Devices
A Software Bill of Materials (SBOM) is a comprehensive, machine-readable inventory of all software components, libraries, frameworks, and dependencies that comprise a software system. For AI/ML medical devices, the SBOM is not optional -- the FDA's 2023 cybersecurity guidance ("Cybersecurity in Medical Devices: Quality System Considerations and Content of Premarket Submissions") establishes the SBOM as a required element of premarket submissions for devices with cybersecurity considerations, which includes virtually all AI/ML devices.
Why the FDA requires SBOMs for AI/ML devices:
AI/ML devices depend on particularly complex software supply chains. A typical AI/ML medical device may include a deep learning framework (TensorFlow, PyTorch, ONNX Runtime), numerical computation libraries (NumPy, cuDNN), data processing libraries, containerization layers, operating system components, and potentially pre-trained model weights from third-party sources. Each of these components can contain known vulnerabilities that must be tracked, assessed, and patched. The SBOM provides the foundation for vulnerability management by making the supply chain transparent.
The SBOM requirement is especially important for AI/ML devices because:
- ML frameworks release frequent updates that address security vulnerabilities, and manufacturers must track which versions are deployed
- Pre-trained models and transfer learning components may have opaque provenance, and the SBOM documents their inclusion
- The complexity of ML dependency chains (where a single framework may pull in hundreds of transitive dependencies) makes manual tracking infeasible
Practical guidance for SBOM generation:
- Standard formats: The FDA accepts SBOMs in two primary formats: SPDX (Software Package Data Exchange, an ISO/IEC 5962:2021 standard maintained by the Linux Foundation) and CycloneDX (an OWASP standard with strong support for vulnerability tracking). Both are machine-readable and widely supported by tooling. SPDX is more established in standards bodies; CycloneDX has gained traction for its native support of vulnerability disclosure and dependency graph representation.
- Generation tools: Automated SBOM generation tools include Syft (open-source, supports both SPDX and CycloneDX), Microsoft's SBOM Tool, CycloneDX generators for specific ecosystems (Python, Node.js, Java), and commercial platforms such as FOSSA, Snyk, and Black Duck. For AI/ML devices, ensure the tooling captures Python dependencies (pip/conda), system-level libraries (CUDA, cuDNN), and containerized components.
- Continuous maintenance: The SBOM is not a one-time deliverable. It must be updated whenever software components are added, removed, or updated. Integrate SBOM generation into your CI/CD pipeline so that every build produces a current SBOM.
- Vulnerability monitoring: Pair your SBOM with a vulnerability monitoring service (such as the NIST NVD, OSV, or commercial alternatives) that alerts you when newly discovered vulnerabilities affect components in your device.
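As a concrete illustration of the vulnerability-monitoring step above, the sketch below reads a CycloneDX JSON SBOM and queries the public OSV API for each component. The file name is hypothetical, the example assumes Python (PyPI) packages only, and a production process would also cover system-level and containerized components and run continuously in CI rather than on demand.

```python
import json

import requests  # third-party HTTP client

OSV_QUERY_URL = "https://api.osv.dev/v1/query"


def check_sbom_against_osv(sbom_path: str) -> list[dict]:
    """Query the OSV database for each component listed in a CycloneDX JSON SBOM."""
    with open(sbom_path) as fh:
        sbom = json.load(fh)

    findings = []
    for component in sbom.get("components", []):
        name, version = component.get("name"), component.get("version")
        if not name or not version:
            continue
        # Assumes PyPI packages; other ecosystems need their own ecosystem string.
        resp = requests.post(OSV_QUERY_URL, json={
            "version": version,
            "package": {"name": name, "ecosystem": "PyPI"},
        }, timeout=30)
        resp.raise_for_status()
        vulns = resp.json().get("vulns", [])
        if vulns:
            findings.append({"component": f"{name}=={version}",
                             "vulnerabilities": [v["id"] for v in vulns]})
    return findings


if __name__ == "__main__":
    for finding in check_sbom_against_osv("device-sbom.cdx.json"):  # hypothetical file
        print(finding)
```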
Cybersecurity Documentation for AI/ML Submissions
In addition to standard cybersecurity documentation (threat model, SBOM, security testing, vulnerability management plan), AI/ML devices should address:
- Resilience to adversarial inputs (including testing with adversarial examples relevant to the clinical domain)
- Data integrity controls throughout the ML pipeline (training, validation, deployment, and any retraining)
- Security of model artifacts (model weights, configuration files, training data); a minimal integrity-check sketch follows this list
- Secure model deployment and update mechanisms (particularly relevant for devices with PCCPs)
- Monitoring for anomalous inputs that may indicate adversarial activity
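One basic control for model-artifact integrity, referenced in the list above, is verifying a known-good checksum before the weights are loaded at runtime. The sketch assumes a PyTorch weights file; the expected hash value shown is a placeholder that would be recorded under change control at release.

```python
import hashlib
from pathlib import Path

import torch

# Recorded at release time under change control (the value here is a placeholder).
EXPECTED_SHA256 = "0000000000000000000000000000000000000000000000000000000000000000"


def load_verified_weights(weights_path: str) -> dict:
    """Refuse to load model weights whose hash does not match the released artifact."""
    digest = hashlib.sha256(Path(weights_path).read_bytes()).hexdigest()
    if digest != EXPECTED_SHA256:
        raise RuntimeError(
            f"Model weights at {weights_path} failed integrity check "
            f"(expected {EXPECTED_SHA256[:12]}..., got {digest[:12]}...)"
        )
    # weights_only=True avoids executing arbitrary pickled code while deserializing.
    return torch.load(weights_path, map_location="cpu", weights_only=True)
```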
Post-Market Monitoring for AI/ML Devices
Post-market monitoring is where the long-term safety and effectiveness of AI/ML devices are determined. The challenges are distinct from those of traditional medical devices because AI/ML performance can change even when the device itself does not.
Performance Drift and Dataset Shift
Data drift (also called covariate shift) occurs when the distribution of input data changes over time relative to the training data. Examples include changes in imaging equipment, changes in clinical protocols that affect image acquisition, shifts in patient demographics, and emergence of new disease presentations (as occurred with COVID-19).
Concept drift occurs when the relationship between inputs and outputs changes. For example, changes in diagnostic criteria, new treatment patterns that alter disease progression, or evolving clinical understanding of a condition can all cause concept drift.
Label drift occurs when the meaning or application of diagnostic labels changes over time, which can affect both model performance assessment and retraining data quality.
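To make data drift measurable, many teams track simple distribution statistics per input feature. The sketch below computes the Population Stability Index (PSI), one widely used drift statistic, for a single numeric feature; the bin count and the interpretation thresholds in the comments are common conventions rather than regulatory requirements.

```python
import numpy as np


def population_stability_index(reference: np.ndarray, current: np.ndarray,
                               bins: int = 10) -> float:
    """PSI between a reference sample (e.g., training data) and a recent production sample.

    Rule-of-thumb interpretation used by many teams (a convention, not a standard):
    < 0.10 stable, 0.10-0.25 moderate shift, > 0.25 significant shift.
    """
    # Bin edges come from the reference distribution so both samples share them.
    edges = np.unique(np.quantile(reference, np.linspace(0.0, 1.0, bins + 1)))
    ref_counts, _ = np.histogram(reference, bins=edges)
    cur_counts, _ = np.histogram(current, bins=edges)

    # Convert counts to proportions, flooring at a small epsilon to avoid log(0).
    eps = 1e-6
    ref_prop = np.maximum(ref_counts / ref_counts.sum(), eps)
    cur_prop = np.maximum(cur_counts / cur_counts.sum(), eps)

    return float(np.sum((cur_prop - ref_prop) * np.log(cur_prop / ref_prop)))
```

The same statistic can be computed per feature, per imaging site, or per scanner model to localize where a shift is occurring.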
Monitoring Approaches
The FDA's January 2025 draft guidance recommends that manufacturers implement systematic post-market performance monitoring, including:
- Ongoing performance assessment: Regular evaluation of device performance metrics using real-world data, compared against the performance established during premarket evaluation (a simple rolling-window sketch follows this list)
- Drift detection: Statistical methods to detect data drift and concept drift before they cause clinically significant performance degradation
- Complaint and adverse event analysis: Integration of AI/ML-specific failure modes into the existing adverse event reporting framework
- User feedback mechanisms: Structured channels for clinicians to report cases where the AI output was incorrect or misleading
- Subpopulation monitoring: Ongoing assessment of performance equity across demographic subgroups
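The sketch below illustrates the simplest form of the ongoing performance assessment noted in the first item above: recomputing sensitivity over a rolling window of ground-truth-confirmed positive cases and flagging when it falls below a prespecified control limit. The window size and limit are illustrative assumptions that a real monitoring plan would justify statistically.

```python
from collections import deque


class RollingSensitivityMonitor:
    """Track sensitivity over the most recent N confirmed-positive cases."""

    def __init__(self, window: int = 200, lower_control_limit: float = 0.88):
        self.results = deque(maxlen=window)     # 1 = detected by the model, 0 = missed
        self.lower_control_limit = lower_control_limit

    def record_confirmed_positive(self, detected_by_model: bool) -> None:
        self.results.append(1 if detected_by_model else 0)

    def current_sensitivity(self) -> float | None:
        if not self.results:
            return None
        return sum(self.results) / len(self.results)

    def alert(self) -> bool:
        """True when the window is full and rolling sensitivity is below the control limit."""
        sens = self.current_sensitivity()
        return (sens is not None
                and len(self.results) == self.results.maxlen
                and sens < self.lower_control_limit)


# Example: feed in confirmed cases as ground truth becomes available.
monitor = RollingSensitivityMonitor()
monitor.record_confirmed_positive(detected_by_model=True)
```

Analogous monitors for specificity, false positive rate, or per-subgroup metrics follow the same pattern.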
Adverse Events and Recall Data for AI/ML Devices
Despite the rapid growth in AI/ML device authorizations, publicly available post-market safety data remains limited. This is partly because many AI/ML devices are relatively new to the market, and partly because adverse event reporting for software devices faces inherent challenges -- software failures may not be recognized or reported with the same frequency as hardware failures.
What the available data shows:
| Post-Market Metric | Value |
|---|---|
| AI/ML devices with reported adverse events post-approval | A small percentage of authorized devices |
| Recall rate for AI/ML-enabled devices | ~5.8% |
| Most common recall cause | Software-related issues (algorithm errors, performance degradation, integration failures) |
| FDA adverse event database (MAUDE) reports for AI/ML devices | Growing but underrepresented relative to deployment volume |
The ~5.8% recall rate for AI/ML devices is roughly in line with the broader medical device recall rate, but the nature of AI/ML recalls is distinctive. The majority are software-related: algorithm errors that produce incorrect outputs under specific input conditions, performance degradation discovered after deployment, compatibility issues with updated clinical IT infrastructure, or cybersecurity vulnerabilities in underlying software components. Hardware-related recalls (which dominate traditional medical device recall statistics) are rare for AI/ML SaMD products.
Key observations:
- Underreporting is likely: The fact that only a small percentage of AI/ML devices have reported adverse events should not be interpreted as evidence of universal safety. Software errors in clinical AI may go undetected if clinicians override incorrect AI outputs as part of their normal workflow -- the error is corrected in practice but never reported.
- Recall ≠ market withdrawal: Most AI/ML device recalls are Class II recalls (situations where use of or exposure to a product may cause temporary or reversible adverse health consequences), not Class I (reasonable probability of serious adverse health consequences or death). Many AI/ML recalls are resolved through software updates deployed remotely, without physical product retrieval.
- Post-market data infrastructure is immature: The current adverse event reporting infrastructure (MDR reporting, MAUDE database) was designed for hardware devices. Software-specific failure modes -- intermittent errors, population-specific performance degradation, drift-related accuracy loss -- are poorly captured by existing reporting categories. The FDA has acknowledged this gap and is exploring enhanced post-market surveillance mechanisms for AI/ML devices.
Manufacturers should treat post-market safety monitoring not as a regulatory obligation to be minimally satisfied, but as a critical quality function. Proactive monitoring -- including automated performance tracking, clinician feedback loops, and systematic comparison of real-world performance against premarket benchmarks -- provides the data needed to identify issues before they become adverse events.
Practical tip: Build performance monitoring infrastructure into your product from the beginning. Retroactively adding monitoring to a deployed AI/ML device is difficult. Design your system to log inputs, outputs, and (where available) clinician feedback from day one. This data becomes the foundation for both post-market surveillance and future model improvement.
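A minimal sketch of the day-one logging described in the tip above: each inference is appended as a structured JSON line that can later feed drift analysis, complaint investigation, and retraining decisions. The field names and log path are hypothetical, and a real implementation must also address privacy, retention, and access controls.

```python
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path

LOG_PATH = Path("inference_log.jsonl")  # hypothetical location


def log_inference(input_bytes: bytes, model_version: str,
                  output_label: str, confidence: float,
                  clinician_feedback: str | None = None) -> None:
    """Append one structured record per inference (JSON Lines format)."""
    record = {
        "timestamp_utc": datetime.now(timezone.utc).isoformat(),
        "model_version": model_version,
        # Hash rather than raw input, so the log itself carries no PHI or pixel data.
        "input_sha256": hashlib.sha256(input_bytes).hexdigest(),
        "output_label": output_label,
        "confidence": round(confidence, 4),
        "clinician_feedback": clinician_feedback,  # e.g., "agree"/"disagree", if captured
    }
    with LOG_PATH.open("a") as fh:
        fh.write(json.dumps(record) + "\n")
```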
Practical Guidance for Manufacturers
Bringing an AI/ML medical device to market requires navigating regulatory, technical, and clinical complexities simultaneously. The following guidance is distilled from the regulatory framework and the experience of manufacturers who have successfully brought AI/ML devices through FDA authorization.
Before You Start Development
Define the intended use precisely: Regulatory strategy, classification, pathway selection, clinical evidence requirements, and labeling all flow from the intended use. Ambiguity in intended use creates ambiguity in every downstream decision.
Search the FDA AI/ML device database: Identify devices that have been authorized for similar clinical applications. Note their product codes, regulatory pathways, special controls, and clinical evidence. This search determines whether a predicate exists (510(k) may be possible) or whether De Novo is required.
Assess your data strategy early: Your training data will determine your device's performance. Evaluate whether you have access to sufficiently large, diverse, well-labeled datasets. Identify gaps in demographic representation, clinical site diversity, and equipment diversity. Address these gaps before you begin training, not after.
Engage the FDA early: For novel AI/ML devices, a Pre-Submission (Q-Sub) meeting with the FDA is strongly recommended. The FDA's Digital Health Center of Excellence (DHCoE) provides specialized review expertise for AI/ML devices and can provide early feedback on classification, pathway, clinical study design, and PCCP strategy.
During Development
Implement IEC 62304 from day one: Do not treat software lifecycle compliance as a documentation exercise to be completed before submission. Build your ML pipeline with version control, traceability, testing, and documentation integrated into the development workflow.
Manage your training data rigorously: Treat training data as a design input. Document data sources, collection protocols, labeling procedures, inter-annotator agreement, and demographic characteristics. Maintain strict separation between training, validation, and test sets at the patient level.
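Patient-level separation means that every image or record from a given patient lands in exactly one partition. A minimal sketch using scikit-learn's group-aware splitter is shown below; the table, column names, and split fractions are hypothetical.

```python
import pandas as pd
from sklearn.model_selection import GroupShuffleSplit

# Hypothetical table: one row per image, with the source patient identified.
df = pd.DataFrame({
    "image_id": ["img001", "img002", "img003", "img004", "img005", "img006"],
    "patient_id": ["P1", "P1", "P2", "P3", "P3", "P4"],
    "label": [1, 1, 0, 0, 1, 0],
})

# First carve out a held-out test set at the patient level (no patient straddles sets).
outer = GroupShuffleSplit(n_splits=1, test_size=0.2, random_state=42)
dev_idx, test_idx = next(outer.split(df, groups=df["patient_id"]))
dev_df, test_df = df.iloc[dev_idx], df.iloc[test_idx]

# Then split the remainder into training and validation, again grouped by patient.
inner = GroupShuffleSplit(n_splits=1, test_size=0.25, random_state=42)
train_idx, val_idx = next(inner.split(dev_df, groups=dev_df["patient_id"]))
train_df, val_df = dev_df.iloc[train_idx], dev_df.iloc[val_idx]

# No patient appears in more than one partition.
assert set(train_df["patient_id"]).isdisjoint(test_df["patient_id"])
```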
Design for transparency: Develop model cards, algorithm descriptions, and performance characterizations as you build. These will form the basis of both your regulatory submission and your device labeling.
Plan for bias assessment: Incorporate subgroup analysis into your validation plan from the outset. If your training data lacks representation of certain demographic groups, document this limitation and develop a plan to address it (additional data collection, transfer learning, or explicit labeling of the limitation).
Consider your PCCP strategy: If you anticipate regular model updates, begin designing your PCCP during development. The modification protocol, acceptance criteria, and validation procedures should be developed in parallel with the initial model, not as an afterthought.
Preparing the Submission
Compile AI/ML-specific documentation: Beyond standard software documentation, your submission should include algorithm description, training data characterization, data partitioning methodology, performance metrics with confidence intervals, subgroup analysis, failure mode analysis, and human-AI workflow description.
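For the "performance metrics with confidence intervals" element above, a common approach is a nonparametric bootstrap over the independent test set. The sketch below estimates sensitivity with a percentile bootstrap interval; it is an illustrative calculation on synthetic binary labels, not a prescribed statistical analysis plan.

```python
import numpy as np


def bootstrap_sensitivity_ci(y_true: np.ndarray, y_pred: np.ndarray,
                             n_boot: int = 2000, alpha: float = 0.05,
                             seed: int = 0) -> tuple[float, float, float]:
    """Point estimate and percentile bootstrap CI for sensitivity (recall on positives)."""
    rng = np.random.default_rng(seed)
    positives = np.flatnonzero(y_true == 1)
    point = y_pred[positives].mean()          # fraction of true positives the model detected

    stats = []
    for _ in range(n_boot):
        sample = rng.choice(positives, size=positives.size, replace=True)
        stats.append(y_pred[sample].mean())
    lower, upper = np.quantile(stats, [alpha / 2, 1 - alpha / 2])
    return float(point), float(lower), float(upper)


# Synthetic example: 1 = disease present (y_true) / detected (y_pred).
y_true = np.array([1, 1, 1, 1, 0, 0, 0, 1, 0, 1])
y_pred = np.array([1, 1, 0, 1, 0, 0, 1, 1, 0, 1])
print(bootstrap_sensitivity_ci(y_true, y_pred))
```

The same resampling applied to true negatives yields a specificity interval, and stratifying the resampling by subgroup supports the subgroup analyses regulators expect.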
Use the eSTAR template: The FDA's Electronic Submission Template and Resource (eSTAR) is now the required submission format for 510(k) and De Novo requests. Familiarize yourself with its structure and populate AI/ML-specific sections thoroughly.
Address cybersecurity proactively: Prepare your threat model, SBOM, and security testing results. For AI/ML devices, include adversarial robustness testing and data integrity controls.
Post-Authorization
Implement performance monitoring: Deploy monitoring infrastructure to track real-world performance, detect drift, and capture clinician feedback. This is both a regulatory expectation and a product quality imperative.
Execute your PCCP (if authorized): When implementing changes under an authorized PCCP, follow the modification protocol precisely and maintain documentation demonstrating compliance with the protocol and acceptance criteria.
Report transparently: Submit periodic reports as required by your authorization conditions. If you detect performance degradation or safety signals, report them promptly through the appropriate channels (MDR reporting for adverse events; PCCP reporting for planned modifications).
Key FDA Databases and Resources
| Resource | URL | Description |
|---|---|---|
| FDA AI/ML-Enabled Medical Device Database | fda.gov (DHCoE) | Complete list of all FDA-authorized AI/ML-enabled devices with product codes, pathways, and intended uses |
| 510(k) Premarket Notification Database | accessdata.fda.gov | Searchable database of all 510(k) clearances |
| De Novo Decision Summaries | accessdata.fda.gov | Decision summaries for all granted De Novo requests, including special controls |
| Digital Health Center of Excellence | fda.gov/medical-devices/digital-health-center-excellence | DHCoE resources, guidance, and AI/ML policy updates |
| CDRH Guidance Documents | fda.gov | All current and draft guidance documents, including AI/ML-specific guidance |
| MAUDE Database | accessdata.fda.gov/scripts/cdrh/cfdocs/cfmaude/search.cfm | Adverse event reports for medical devices, including AI/ML devices |
| IMDRF SaMD Documents | imdrf.org | IMDRF guidance on SaMD definition, classification, clinical evaluation, and quality management |
| GMLP Guiding Principles | fda.gov | The 10 Good Machine Learning Practice principles (FDA/Health Canada/MHRA) |
Future Outlook
The regulatory landscape for AI/ML medical devices will continue to evolve rapidly over the next several years. Several trends are clear:
PCCPs will become standard practice. As more manufacturers gain experience with PCCPs and the FDA builds a body of authorized plans, the PCCP framework will become the default approach for AI/ML devices that anticipate regular model updates. Expect the FDA to publish additional guidance refining PCCP expectations based on real-world experience.
Transparency requirements will increase. The direction is unmistakable: regulators worldwide are moving toward greater transparency in AI/ML medical devices. Standardized algorithm disclosure formats, mandatory subpopulation performance reporting, and possibly algorithmic audit requirements are all plausible developments within the next few years.
Real-world evidence will play a larger role. The FDA's interest in real-world performance monitoring for AI/ML devices signals a shift toward using real-world evidence -- not just premarket clinical studies -- to assess ongoing safety and effectiveness. Post-market performance data may eventually become a formal component of regulatory decision-making for AI/ML device modifications.
International harmonization will advance -- slowly. The GMLP principles demonstrated that international regulators can align on AI/ML expectations. The IMDRF continues to develop harmonized guidance. But practical harmonization -- where a manufacturer can use the same evidence package across multiple jurisdictions -- remains a long-term goal, not a near-term reality. The EU AI Act introduces requirements that have no direct equivalent in other jurisdictions, which may complicate rather than simplify international regulatory strategy.
Adaptive AI will eventually reach the clinic. True continuously learning algorithms -- where the device modifies itself in real time based on new patient data -- are technically feasible but regulatory mechanisms for overseeing them are still immature. Expect incremental progress: broader PCCP scopes, pilot programs for monitored adaptive systems, and eventually a framework that permits bounded autonomous learning with robust post-market safeguards.
Bias and equity will remain at the forefront. Regulators have made clear that AI/ML devices must perform equitably across patient populations. Expect increasing scrutiny of training data diversity, mandatory subgroup performance reporting, and potentially explicit requirements for bias mitigation plans.
Generative AI will raise new questions. Large language models and other generative AI systems are beginning to enter the medical device space -- for clinical documentation, patient communication, and even clinical decision support. As discussed in the generative AI section above, these systems raise regulatory questions that existing frameworks were not designed to address, including hallucination risk, non-deterministic outputs, and the challenge of validating a system whose outputs are essentially unbounded. Expect the FDA to address generative AI in dedicated guidance, building on the CDS framework and the broad AI-DSF terminology established in recent documents.
The regulatory framework for AI/ML medical devices is still being written. Manufacturers who engage early with regulators, build compliance into their development processes, and design for transparency and equity will be best positioned to navigate the evolving landscape. The opportunity is enormous -- AI/ML has the potential to transform medical diagnosis, treatment, and monitoring. But realizing that potential requires navigating a regulatory environment that is demanding, evolving, and appropriately focused on patient safety.
Frequently Asked Questions
Is a PCCP required for all AI medical devices?
No. A Predetermined Change Control Plan (PCCP) is optional, not mandatory. A PCCP is recommended for AI/ML devices where the manufacturer anticipates making iterative modifications after initial authorization -- such as retraining models on new data, adjusting performance thresholds, or expanding input compatibility. If the manufacturer does not plan to modify the device's AI/ML component, or if planned modifications would constitute a new intended use (which PCCPs cannot cover), a PCCP is neither required nor appropriate. Manufacturers of locked-algorithm devices that will not change after deployment may choose to forgo a PCCP entirely. However, for any AI/ML device where regular improvement cycles are part of the product strategy, a PCCP is strongly advisable because the alternative -- filing a new 510(k) or PMA supplement for each modification -- is slow and costly.
What is the difference between SaMD and AiMD?
SaMD (Software as a Medical Device) is a regulatory term defined by the IMDRF to describe software that performs a medical function independently, without being part of a hardware medical device. AiMD (Artificial Intelligence-enabled Medical Device) is a broader, less formally defined term used primarily in the European regulatory context (particularly in discussions around the EU AI Act) to describe any medical device that incorporates AI functionality -- whether that AI is embedded in hardware, runs as standalone software, or operates as a component within a larger system. All AI-based SaMD is AiMD, but not all AiMD is SaMD. An AI algorithm embedded in an MRI scanner that optimizes image reconstruction is AiMD but not SaMD, because the software is integral to a hardware device. An AI algorithm that independently analyzes uploaded chest X-rays and provides a diagnostic output is both SaMD and AiMD. The regulatory distinction matters because SaMD has a well-established classification and regulatory framework (the IMDRF SaMD framework, FDA classification rules, EU MDR Rule 11), while AiMD as a category is still developing its regulatory identity, particularly in the EU where both the MDR and the AI Act apply.
How long does FDA review of an AI/ML device take?
Review timelines vary significantly by pathway and complexity. For 510(k) submissions, the FDA's target review time is 90 FDA review days from acceptance, and many AI/ML devices that are well-prepared and have clear predicates are reviewed within this timeframe. In practice, review cycles often extend to 4-8 months when the FDA issues requests for additional information. De Novo requests typically take longer -- 6 to 12 months or more -- because there is no predicate and the FDA must establish new special controls. PMA reviews for AI/ML devices (rare) can take 12 months or longer. Pre-Submission (Q-Sub) meetings, which are strongly recommended for novel AI/ML devices, add 2-4 months to the front end of the process but can significantly reduce review time by aligning the submission with FDA expectations before filing. The single most effective way to shorten review time is to submit a complete, well-organized application that addresses AI/ML-specific questions proactively.
Can an AI medical device be Class I?
In theory, yes, but in practice it is rare. Under the FDA's classification system, a Class I device is one that presents minimal risk and is subject to general controls only. Some software functions that use AI/ML and fall within the Clinical Decision Support (CDS) exemption criteria of the 21st Century Cures Act may not be regulated as devices at all. Among regulated devices, software that uses AI/ML for administrative or operational functions (workflow optimization, scheduling) rather than clinical decision-making may qualify as Class I. However, the vast majority of AI/ML devices that provide clinical information -- diagnostic outputs, triage recommendations, clinical measurements -- are classified as Class II or higher. In the EU, Rule 11 of the MDR classifies software that provides information used for diagnostic or therapeutic decisions, or that monitors physiological processes, as Class IIa at minimum, which in practice closes the Class I pathway for most SaMD in Europe.
What clinical evidence does the FDA require for AI devices?
The clinical evidence requirements depend on the pathway and the risk profile of the device. For 510(k) submissions (the most common pathway, covering ~95% of AI/ML devices), the FDA typically requires standalone performance data demonstrating algorithm accuracy on an independent test set -- sensitivity, specificity, AUC, and predictive values against a defined reference standard. For devices that augment clinician judgment, reader studies comparing clinician performance with and without the AI tool are often expected. For De Novo submissions, the evidence bar is generally higher and may include prospective clinical studies, multi-site validation, and more extensive subgroup analyses. For PMA submissions (rare for AI/ML), prospective clinical trials with patient outcome data are typically required. Across all pathways, the FDA expects subgroup performance analysis, documentation of training data characteristics, and a pre-specified statistical analysis plan. Notably, systematic reviews have found that fewer than 2% of FDA-authorized AI/ML devices have undergone randomized controlled trials, and fewer than 1% report actual patient outcomes -- the evidence base is overwhelmingly retrospective performance data.
Does the EU AI Act apply to medical devices?
Yes. The EU AI Act (Regulation (EU) 2024/1689) applies to AI systems intended for use as medical devices or as safety components of medical devices. These AI systems are generally classified as "high-risk" under the AI Act when the device requires Notified Body conformity assessment, which triggers requirements for data governance, transparency, human oversight, accuracy, robustness, and cybersecurity. Critically, the AI Act does not replace the EU MDR -- it supplements it. Manufacturers must comply with both regulations simultaneously. The conformity assessment is integrated: the Notified Body assessing the device under the MDR also evaluates AI Act compliance, avoiding separate assessment processes. However, the documentation and compliance requirements are additive, meaning that manufacturers face a broader set of obligations than under the MDR alone. The AI Act's general application date is August 2, 2026, and high-risk AI systems embedded in products that already require Notified Body conformity assessment, including most medical devices, have an extended transition to August 2, 2027. Manufacturers should begin preparing now, particularly regarding data governance documentation and algorithmic transparency requirements.
What is an SBOM and why does the FDA require it?
An SBOM (Software Bill of Materials) is a machine-readable inventory of all software components, libraries, frameworks, and dependencies that make up a software product. Think of it as an ingredient list for software. The FDA requires SBOMs as part of premarket cybersecurity documentation for devices with cybersecurity considerations, which includes virtually all AI/ML medical devices. The FDA requires SBOMs because AI/ML devices depend on complex software supply chains -- ML frameworks, numerical libraries, containerization tools, pre-trained models -- and each component can contain known vulnerabilities. Without an SBOM, neither the manufacturer nor the FDA can systematically assess the device's vulnerability exposure. SBOMs are typically generated in standardized formats such as SPDX (ISO/IEC 5962:2021) or CycloneDX (OWASP), and paired with vulnerability monitoring services that alert manufacturers when newly discovered vulnerabilities affect their components. Automated tools like Syft, Microsoft's SBOM Tool, and commercial platforms can generate SBOMs from build environments.
How do I detect model drift in a deployed AI device?
Model drift detection requires systematic monitoring infrastructure built into the device from deployment. There are several complementary approaches. Statistical process control: Monitor key performance metrics (sensitivity, specificity, false positive rate) over rolling time windows and trigger alerts when metrics fall outside predefined control limits. Input distribution monitoring: Track the statistical properties of incoming data (pixel intensity distributions for imaging, feature distributions for tabular data) and detect when the input distribution diverges significantly from the training data distribution using methods such as the Kolmogorov-Smirnov test, Population Stability Index (PSI), or Maximum Mean Discrepancy (MMD). Prediction confidence monitoring: Track the distribution of model confidence scores over time; a shift toward lower confidence or toward the decision boundary may indicate degradation. Clinical feedback loops: Capture clinician agreement/disagreement with model outputs where feasible, and monitor the disagreement rate over time. Reference standard comparison: Periodically compare model predictions against ground truth outcomes (e.g., biopsy results for a lesion detection algorithm) to directly measure accuracy trends. The key is to establish baseline performance metrics at deployment and then monitor continuously, not episodically.
What is the difference between a locked and adaptive algorithm?
A locked algorithm produces the same output for the same input every time -- the model weights, decision thresholds, and processing logic are fixed after deployment. The algorithm may have been developed using machine learning (e.g., trained on millions of images), but once finalized, it does not learn or change. The vast majority of FDA-authorized AI/ML devices use locked algorithms. An adaptive algorithm (also called a continuously learning algorithm) modifies its behavior over time based on new data received during clinical deployment. This could mean the model retrains periodically on new patient data, adjusts its thresholds based on local performance feedback, or updates its parameters based on newly labeled examples. Adaptive algorithms offer the theoretical advantage of improving over time and adapting to local clinical patterns, but they introduce significant regulatory challenges: the device that was authorized is not the same device months later. As of 2026, no fully autonomous continuously learning medical device has been authorized by the FDA. Manufacturers can use PCCPs to pre-authorize bounded, manufacturer-controlled updates, but true real-time autonomous learning remains an open regulatory question.
How many AI/ML devices has the FDA authorized?
As of early 2026, the FDA has authorized over 1,000 AI/ML-enabled medical devices. The pace of authorization has accelerated dramatically: from fewer than 50 cumulative authorizations in 2016 to over 100 in 2019, approximately 170 in 2024, and between 258 and 295 in 2025 (a record year). Approximately 75-80% of authorized AI/ML devices are in radiology, followed by ~10% in cardiology, with the remainder spread across ophthalmology, pathology, gastroenterology, neurology, and other specialties. About 95% were authorized through the 510(k) pathway, ~5% through De Novo, and a very small number through PMA. Over 400 unique manufacturers have received AI/ML device authorizations. The FDA maintains a publicly available database of all authorized AI/ML-enabled devices through its Digital Health Center of Excellence (DHCoE), which is updated regularly and serves as the authoritative source for these figures.