MedDeviceGuide

IVDR Performance Evaluation: The Complete Guide for IVD Manufacturers

How to conduct performance evaluation under the EU IVDR — scientific validity, analytical performance, clinical performance, classification rules, and what Notified Bodies expect.

Ran Chen
2026-03-14 · Updated 2026-03-24 · 69 min read

What Is Performance Evaluation Under the IVDR?

Performance evaluation is the systematic assessment of the scientific validity, analytical performance, and clinical performance of an in vitro diagnostic medical device. Under the In Vitro Diagnostic Regulation (EU) 2017/746 — the IVDR — it is the IVD equivalent of clinical evaluation for medical devices under the MDR.

But there is a critical distinction. For medical devices, clinical evaluation focuses on whether the device achieves its intended clinical benefit and whether the residual risks are acceptable. For IVDs, performance evaluation asks a different set of questions: Does this device measure what it claims to measure? How accurately? How reliably? And does the information it provides actually lead to the clinical decisions the manufacturer says it does?

Article 56 of the IVDR establishes performance evaluation as a mandatory, continuous process. It is not a one-time exercise completed before CE marking and then forgotten. It must be planned, conducted, documented, and updated throughout the device's lifecycle — fed by post-market performance follow-up data and reflected in the performance evaluation report.

The IVDR applies to all IVDs placed on the EU market, regardless of classification. Whether you manufacture a Class A wash buffer or a Class D HIV screening assay, you need a performance evaluation. The depth, rigor, and regulatory scrutiny differ enormously by risk class, but the obligation is universal.

How Performance Evaluation Changed from IVDD to IVDR

The In Vitro Diagnostic Medical Devices Directive (IVDD 98/79/EC) required manufacturers to demonstrate device performance, but the framework was far less prescriptive than what the IVDR demands. Understanding these differences is essential, especially for manufacturers transitioning legacy devices.

| Aspect | IVDD (98/79/EC) | IVDR (2017/746) |
|---|---|---|
| Legal basis | Directive (transposed by member states) | Regulation (directly applicable) |
| Performance evaluation scope | Analytical performance and clinical performance mentioned, but not structured into explicit pillars | Three explicit pillars: scientific validity, analytical performance, clinical performance (Article 56, Annex XIII) |
| Clinical evidence requirements | Clinical evidence referenced but requirements vague; literature review often sufficient even for high-risk devices | Structured clinical performance study requirements; clinical evidence must be sufficient for the risk class; clinical performance studies (interventional or observational) may be required |
| Performance evaluation plan | Not explicitly required as a standalone document | Mandatory performance evaluation plan (Annex XIII, Part A, Section 1) |
| Performance evaluation report | No explicit PER requirement; performance data included in technical file | Mandatory performance evaluation report with defined structure (Annex XIII, Part A, Section 1.3.2) |
| Post-market follow-up | Post-market surveillance required but PMPF not defined as a formal process | Post-market performance follow-up (PMPF) explicitly defined with plan and report requirements (Article 78, Annex XIII, Part B) |
| Common Specifications | Common Technical Specifications (CTS) for Annex II List A devices | Common Specifications (CS) for Class D devices, with broader scope and legal weight (Article 9) |
| Notified Body involvement | Only for Annex II List A and List B devices (~20% of IVDs) | For all Class B, C, and D devices (~80% of IVDs) |
| Literature review | Accepted broadly, minimal structure required | Accepted but must follow systematic methodology; standalone literature not sufficient for all risk classes |
| ISO 20916 | Did not exist during most of the IVDD era | ISO 20916:2019 provides the framework for clinical performance studies, referenced by MDCG guidance |

The magnitude of the shift cannot be overstated. Under the IVDD, a manufacturer of a Class C equivalent device (say, a companion diagnostic) could often rely on published literature and in-house performance data with minimal external scrutiny, because the device might not have required Notified Body involvement at all. Under the IVDR, that same device requires a structured performance evaluation plan, formal clinical performance data, a performance evaluation report reviewed by a Notified Body, and ongoing PMPF.

The Three Pillars of Performance Evaluation

Annex XIII of the IVDR structures performance evaluation around three pillars. Each pillar addresses a different dimension of what it means for an IVD to "work." They are not independent — they build on each other and must be addressed collectively in your performance evaluation report.

Pillar 1: Scientific Validity

Scientific validity is the association of an analyte with a clinical condition or physiological state. This pillar answers the most foundational question: Is there a legitimate scientific basis for measuring this particular analyte (or marker, or genetic variant, or microorganism) in the context of the intended clinical application?

For example, if your device measures troponin I to aid in the diagnosis of acute myocardial infarction, the scientific validity assessment must demonstrate — through peer-reviewed literature, clinical guidelines, and established medical knowledge — that troponin I levels are, in fact, associated with myocardial injury and that measuring troponin I has a recognized role in the clinical decision-making pathway for MI diagnosis.

Scientific validity is typically established through:

  • Peer-reviewed literature. Published studies demonstrating the association between the analyte and the clinical condition.
  • Clinical practice guidelines. Authoritative guidance from medical professional societies (e.g., ESC guidelines for cardiac biomarkers, WHO recommendations for infectious disease testing).
  • Systematic reviews and meta-analyses. Particularly valuable when the evidence base is large and heterogeneous.
  • International standards and reference materials. WHO international standards, certified reference materials, consensus panels.
  • Textbooks and established medical knowledge. For well-established analyte-disease associations (e.g., blood glucose and diabetes), this may be the primary source.

For well-established analytes — glucose, HbA1c, TSH, HIV antibodies — scientific validity is rarely the bottleneck. The evidence base is mature and well-documented. The challenge arises with novel biomarkers, companion diagnostics targeting emerging mutations, or multi-analyte panels where the clinical utility of the combination is less established than the individual markers.

Key point: Scientific validity is about the analyte-condition association, not about your specific device. It is independent of the measurement technology. Whether you measure troponin I using an immunoassay, mass spectrometry, or a point-of-care lateral flow device, the scientific validity of troponin I as a cardiac biomarker is the same.

Pillar 2: Analytical Performance

Analytical performance is the ability of a device to correctly detect or measure a particular analyte. This is the technical heart of IVD performance evaluation — the evidence that your device measures what it claims to measure, with the accuracy, precision, and reliability needed for its intended use.

Annex I of the IVDR (General Safety and Performance Requirements) and Annex XIII, Part A, Section 1.2.2 define the analytical performance characteristics that must be evaluated. The specific characteristics depend on the device type and intended purpose, but the IVDR provides a comprehensive list.

The following table summarizes the core analytical performance characteristics:

| Characteristic | Definition | Why It Matters | Typical Study Design |
|---|---|---|---|
| Analytical sensitivity | The ability to detect the smallest amount of the analyte (related to limit of detection) | Determines whether the device can detect the analyte at clinically relevant concentrations | Serial dilutions of known positive samples; probit analysis for LoD determination |
| Analytical specificity | The ability to detect only the intended analyte without interference from other substances | Ensures results are not falsely affected by cross-reacting substances, medications, or endogenous interferents | Testing panels of known interfering substances; cross-reactivity panels for microbiology/immunology assays |
| Limit of Detection (LoD) | The lowest amount of analyte that can be reliably distinguished from zero (typically 95% probability of detection) | Critical for screening assays, pathogen detection, and any application where missing low-level positives has clinical consequences | CLSI EP17-A2; serial dilutions with replicate testing at each level; probit regression |
| Limit of Quantitation (LoQ) | The lowest amount of analyte that can be quantitatively determined with acceptable precision | Important for quantitative assays where clinical decisions depend on specific concentration values (e.g., viral load monitoring) | CLSI EP17-A2; replicate measurements at low concentrations; defined by acceptable CV% or total error |
| Trueness/Accuracy | The closeness of measured values to the true value (or accepted reference value) | Ensures the device provides results that are clinically meaningful and comparable to reference methods | Comparison with certified reference materials, reference measurement procedures, or established comparative methods; method comparison studies |
| Precision (Repeatability and Reproducibility) | Repeatability: agreement between successive measurements under the same conditions. Reproducibility: agreement across different conditions (operators, instruments, sites, lots) | Ensures consistent results across real-world usage conditions | CLSI EP05-A3 or EP15-A3; multi-day, multi-operator, multi-instrument, multi-lot study designs |
| Linearity | The ability to provide measured values proportional to the actual concentration across a defined range | Validates that the device's measuring range is accurately calibrated from low to high concentrations | CLSI EP06-A; testing serial dilutions or mixtures of high and low samples across the claimed measuring range |
| Measuring range (AMR) | The range of analyte concentrations over which the device provides results with acceptable accuracy and precision | Defines the boundaries within which the device is validated; results outside this range require dilution or cannot be reported | Established through linearity and precision studies; defines lower and upper limits of the reportable range |
| Interference | The effect of other substances present in the sample on the accuracy of the measured value | Real patient samples contain medications, endogenous substances (hemoglobin, bilirubin, lipids, biotin), and other analytes that may affect results | CLSI EP07-A3; spiking studies with known interferents at clinically relevant concentrations |
| Cross-reactivity | The degree to which the device reacts with analytes other than the target analyte (especially relevant for immunoassays and molecular assays) | Particularly critical for infectious disease assays where cross-reactivity with related organisms could cause false positives | Testing panels of phylogenetically or structurally related organisms/analytes; wet testing and/or in silico analysis for molecular assays |
| Metrological traceability | The traceability of calibrator values to higher-order reference materials or reference measurement procedures | Ensures results are comparable across different measurement systems and over time | Documented calibration hierarchy; traceability to WHO international standards, certified reference materials (CRMs), or reference measurement procedures per ISO 17511 |
| High-dose hook effect | The phenomenon where extremely high analyte concentrations cause a paradoxical decrease in the measured signal, producing falsely low or falsely negative results (primarily affects one-step immunometric "sandwich" assays) | Can cause critically dangerous misclassification — for example, reporting a very high hCG as normal, or a massively elevated troponin as negative. Clinicians relying on the result may miss a life-threatening condition. | Testing with serial dilutions extending well above the upper limit of the measuring range; documenting the concentration at which the hook effect begins and the magnitude of signal suppression; IFU must warn users if the hook effect cannot be eliminated by design |
| Diagnostic cut-off | The threshold value that distinguishes positive from negative results (for qualitative assays) or normal from abnormal (for quantitative assays used in screening or diagnosis) | Determines the balance between clinical sensitivity and specificity; directly affects false-positive and false-negative rates. The cut-off may include equivocal (grey) zones where results are indeterminate and require repeat testing or confirmatory testing. | ROC analysis on well-characterized clinical populations; statistical methods for optimizing the cut-off based on intended clinical application (e.g., maximizing sensitivity for screening vs. maximizing specificity for confirmatory assays); documentation of equivocal/grey zone boundaries and their clinical management |
| Robustness | The stability of analytical results under small, deliberate variations in method parameters — such as temperature fluctuations, timing deviations, sample volume variations, pH changes, or reagent concentration shifts | Demonstrates that the assay tolerates the minor procedural variations that inevitably occur in real-world testing environments — particularly important for point-of-care and self-testing devices where controlled laboratory conditions cannot be guaranteed | Flex studies (also called ruggedness studies) per CLSI EP26-A: systematically vary one parameter at a time while holding others constant; define acceptable performance boundaries for each variable; document failure modes when parameters exceed specified tolerances |
| Carry-over | The transfer of analyte from a sample with a high concentration to the next sample tested, resulting in a falsely elevated result in the subsequent measurement | Relevant for automated analyzers processing sequential samples — carry-over can cause false positives or clinically significant measurement errors, particularly when a high-positive sample is followed by a negative or low-positive sample | Testing high-positive samples followed immediately by blank or negative samples; measuring any residual signal; expressing carry-over as a percentage; demonstrating that carry-over does not exceed clinically acceptable limits per CLSI EP10-A3 |
| Matrix effects | The influence of the sample matrix (the components of the specimen other than the analyte — such as proteins, lipids, anticoagulants, or preservatives) on the measured result | Different specimen types (serum vs. plasma vs. whole blood, EDTA vs. heparin vs. citrate) may produce different results for the same analyte concentration. Failure to characterize matrix effects can lead to clinically significant errors when the device is used with specimen types not adequately validated. | Method comparison studies across claimed specimen types; spiking and recovery experiments; assessment of anticoagulant effects; evaluation of specimen collection device compatibility (tube types, additives) |
| Stability | The ability of the device (reagents, calibrators, controls) to maintain performance within specifications over time and under specified storage conditions | Determines shelf life, open-vial stability, on-board stability, and transport conditions | Real-time stability studies per CLSI EP25-A; accelerated stability for supporting data (not as sole basis for shelf-life claims) |
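To make one of these characteristics concrete: carry-over is usually expressed as a percentage of the high-sample signal remaining in the next (low) sample. A common scheme runs several high samples followed by several low samples and compares the first low result against a residue-free low result. The sketch below shows only that arithmetic; all values are illustrative, not from any real study.

```python
# Carry-over estimate from a high/low alternation sequence:
# run high samples H1..H3 followed by low samples L1..L3.
# L1 may carry residue from H3; L3 is assumed residue-free.
# All numbers are hypothetical illustrations.
h3 = 1000.0   # third consecutive high-sample result
l1 = 12.0     # first low sample after the high block
l3 = 2.0      # third low sample (baseline)

carryover_pct = (l1 - l3) / (h3 - l3) * 100
print(f"carry-over = {carryover_pct:.2f}%")
```

In practice the estimate is compared against a clinically derived acceptance limit, and the sequence is repeated to confirm the effect is reproducible rather than random noise.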

How These Characteristics Interact

No single analytical performance characteristic stands alone. Consider a quantitative immunoassay for a cardiac biomarker:

  • The LoD tells you the lowest concentration you can detect.
  • The LoQ tells you the lowest concentration you can report as a number with acceptable imprecision.
  • Linearity confirms the measuring range is accurate from LoQ to the upper limit.
  • Precision tells you how reproducible those numbers are across days, instruments, and reagent lots.
  • Trueness tells you whether those numbers are correct compared to the reference method.
  • Interference testing tells you whether hemolysis, lipemia, biotin supplementation, or rheumatoid factor will distort those numbers in real patient samples.
  • Stability tells you how long the reagents will maintain all of the above performance.

The performance evaluation must address all applicable characteristics. A common deficiency flagged by Notified Bodies is the failure to assess characteristics that are clearly relevant to the device's intended use — for example, omitting interference studies for a point-of-care assay that will be used on whole blood samples where hemolysis is common.

Pillar 3: Clinical Performance

Clinical performance is the ability of a device to yield results that are correlated with a particular clinical condition or physiological or pathological process or state in accordance with the target population and intended user. Where analytical performance asks "Does the device measure the analyte correctly?", clinical performance asks "Do those measurements actually correspond to the clinical reality?"

Clinical performance is typically evaluated through the following characteristics:

| Characteristic | Definition | Clinical Significance |
|---|---|---|
| Clinical sensitivity (Diagnostic sensitivity) | The proportion of individuals with the target condition who are correctly identified by the device (true positive rate) | A test with 99% clinical sensitivity misses 1 in 100 affected individuals. For screening assays (e.g., blood bank HIV screening), clinical sensitivity requirements approach 100%. |
| Clinical specificity (Diagnostic specificity) | The proportion of individuals without the target condition who are correctly identified as negative by the device (true negative rate) | A test with 95% specificity produces false positives in 5 out of 100 unaffected individuals. Low specificity in population screening creates anxiety, unnecessary follow-up testing, and costs. |
| Positive Predictive Value (PPV) | The probability that a positive result indicates a true positive — depends on clinical sensitivity, specificity, and disease prevalence | PPV is highly prevalence-dependent. A test with 99% sensitivity and 99% specificity has a PPV of only ~50% when prevalence is 1%. This is why confirmatory testing exists. |
| Negative Predictive Value (NPV) | The probability that a negative result indicates a true negative | NPV is critical for rule-out applications. A negative troponin in the emergency department must have extremely high NPV to safely discharge patients. |
| Positive Likelihood Ratio (LR+) | How much more likely a positive result is in someone with the condition vs. without | LR+ > 10 is generally considered strong evidence for ruling in a diagnosis. Combines sensitivity and specificity into a single metric independent of prevalence. |
| Negative Likelihood Ratio (LR-) | How much more likely a negative result is in someone without the condition vs. with | LR- < 0.1 is generally considered strong evidence for ruling out a diagnosis. |
| Receiver Operating Characteristic (ROC) analysis | Plots sensitivity vs. (1 - specificity) across all possible cutoff values for quantitative assays | Used to determine optimal diagnostic cutoff values and to compare overall discriminatory power between assays. Area Under the Curve (AUC) summarizes performance. |
| Agreement with reference method / comparator | Concordance between the device under evaluation and an established diagnostic reference (gold standard or best available comparator) | When no perfect reference standard exists (which is common), clinical performance must be assessed relative to the best available comparator, composite reference standards, or clinical adjudication panels. |
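The prevalence arithmetic behind PPV, NPV, and the likelihood ratios is worth seeing end to end. The short Python sketch below reproduces the 99% sensitivity / 99% specificity / 1% prevalence example from the text; the function name is ours, not from any standard library.

```python
# Predictive values and likelihood ratios from sensitivity,
# specificity, and prevalence. Demonstrates why PPV collapses at
# low prevalence even for an excellent assay.
def predictive_values(sens: float, spec: float, prev: float):
    tp = sens * prev              # true-positive fraction of population
    fp = (1 - spec) * (1 - prev)  # false-positive fraction
    fn = (1 - sens) * prev        # false-negative fraction
    tn = spec * (1 - prev)        # true-negative fraction
    ppv = tp / (tp + fp)
    npv = tn / (tn + fn)
    lr_pos = sens / (1 - spec)
    lr_neg = (1 - sens) / spec
    return ppv, npv, lr_pos, lr_neg

# 99% sensitivity, 99% specificity, 1% prevalence:
ppv, npv, lr_pos, lr_neg = predictive_values(0.99, 0.99, 0.01)
print(f"PPV={ppv:.3f}  NPV={npv:.5f}  LR+={lr_pos:.0f}  LR-={lr_neg:.4f}")
```

Running it shows PPV of exactly 50% despite near-perfect test characteristics, while NPV stays above 99.98% — the asymmetry that drives confirmatory-testing algorithms for low-prevalence screening.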

How Clinical Performance Data Is Generated

Clinical performance data can come from several sources, and the IVDR does not mandate a single approach. The key is that the evidence must be sufficient for the device's risk class and intended purpose.

Clinical performance studies — These are prospective or retrospective studies specifically designed to evaluate the clinical performance of the device. They may be interventional (where the device result influences patient management) or non-interventional (where the device is used alongside the standard of care but does not influence clinical decisions). ISO 20916:2019 provides the framework for planning and conducting these studies.

Published literature — Peer-reviewed studies reporting clinical performance data for the same or similar devices. The IVDR accepts literature-based evidence, but it must be obtained through a systematic and documented search strategy, critically appraised, and its applicability to the specific device demonstrated.

Clinical performance data from routine diagnostic testing — Data generated through the routine use of the device in clinical practice (i.e., real-world evidence). This can include retrospective analyses of laboratory data, registry data, or data from post-market performance follow-up.

Residual sample studies — Using leftover clinical samples with known clinical status to evaluate the device. This is common for IVDs because the device's output is a measurement on a sample, not an intervention on a patient. Ethical requirements still apply, but these studies are generally less burdensome than interventional clinical trials.

Critical distinction: Clinical performance for IVDs is fundamentally different from clinical evaluation for medical devices. For a medical device, you need to show the device works in the patient. For an IVD, the "performance" is about the information the device provides — the measurement result — and whether that information is clinically meaningful. The patient is not directly exposed to the IVD (except for near-patient testing devices where specimen collection itself has risk considerations).

IVDR Classification Rules and Their Impact on Performance Evaluation

The IVDR classifies IVDs into four risk classes — A, B, C, and D — using seven rules set out in Annex VIII. The classification directly determines the depth and rigor of performance evaluation required, the level of Notified Body scrutiny, and whether Common Specifications must be followed.

Classification Overview

| Class | Risk Level | Classification Criteria | Examples | Notified Body Required? |
|---|---|---|---|---|
| D | Highest | Devices for detecting transmissible agents in blood, tissues, cells, or transplantation materials; devices for determining blood groups in the ABO, Rh, Kell, Kidd, or Duffy systems to ensure transfusion compatibility | Blood screening: HIV-1/2, HCV, HBV (HBsAg, anti-HBc, HBV NAT), HTLV-I/II, syphilis (Treponema pallidum), Chagas disease (Trypanosoma cruzi), variant CJD, CMV, West Nile virus, Zika virus, hepatitis D (HDV), hepatitis E (HEV), malaria screening for blood donations. Blood grouping: ABO system, Rh system (RhD, RhC/c, RhE/e), Kell system, Kidd system, Duffy system, plus irregular antibody screening and cross-matching directed at these markers | Yes |
| C | High | Devices for companion diagnostics; devices used in screening, diagnosis, or staging of life-threatening diseases where false results could lead to life-threatening or irreversible harm; human genetic testing; self-testing for management of serious diseases; near-patient testing for critical values; blood grouping and tissue typing for markers outside the Rule 2 Class D list | Companion diagnostics (e.g., HER2, EGFR, ALK, BRAF, PD-L1); prenatal screening (trisomy 21); self-testing blood glucose monitors; HLA tissue typing and HPA typing; tumor markers for treatment selection; HPV screening assays | Yes |
| B | Moderate | Self-testing devices specifically exempted from Class C by Rule 4 (pregnancy, fertility, cholesterol, and glucose/erythrocytes/leucocytes/bacteria in urine); devices not covered by any other rule; controls without an assigned value | Pregnancy self-tests; cholesterol self-tests; clinical chemistry assays; hematology reagents; blood gas measurement consumables; urinalysis dipsticks (non-self-testing); coagulation assays | Yes |
| A | Lowest | General laboratory products, accessories without critical characteristics, instruments intended for IVD procedures, and specimen receptacles | Wash solutions; buffer solutions; general culture media (not selective/differential for Class C/D organisms); specimen receptacles; histological stains | No (self-declaration), unless sterile |

Classification Rule Summary

| Rule | Scope | Typical Classification Outcome |
|---|---|---|
| Rule 1 | Devices for detecting transmissible agents in blood/tissue/cell donations for transfusion or transplantation; life-threatening transmissible agents with a high risk of propagation | Class D |
| Rule 2 | Devices for blood grouping or tissue typing to ensure immunological compatibility | Class C, except for ABO, Rh, Kell, Kidd, and Duffy markers, which are Class D |
| Rule 3 | Devices posing high individual or public health risk: sexually transmitted agents; infectious agents in CSF or blood; companion diagnostics; cancer screening, diagnosis, or staging; human genetic testing; monitoring of medicinal product levels; management of patients with life-threatening disease; congenital screening | Class C |
| Rule 4 | Self-testing devices | Class C, except pregnancy, fertility, cholesterol, and urine glucose/erythrocyte/leucocyte/bacteria tests, which are Class B; near-patient devices are classified in their own right |
| Rule 5 | General laboratory products, accessories without critical characteristics, instruments intended specifically for IVD procedures, specimen receptacles | Class A |
| Rule 6 | Devices not covered by Rules 1–5 | Class B |
| Rule 7 | Controls without a quantitative or qualitative assigned value | Class B |

Impact on Performance Evaluation Requirements

| Performance Evaluation Aspect | Class A | Class B | Class C | Class D |
|---|---|---|---|---|
| Performance evaluation plan | Required (can be simplified) | Required | Required (detailed) | Required (detailed) |
| Scientific validity assessment | Required | Required | Required (comprehensive) | Required (comprehensive) |
| Analytical performance studies | Required (proportionate to risk) | Required | Required (comprehensive) | Required (comprehensive); must address Common Specifications |
| Clinical performance studies | Required (may rely on literature or existing data for well-established analytes) | Required; clinical performance studies often needed for novel assays | Required; clinical performance studies typically expected | Required; must address Common Specifications; interventional or well-powered observational studies expected |
| Performance evaluation report | Required | Required; reviewed by Notified Body | Required; reviewed by Notified Body | Required; reviewed by Notified Body |
| PMPF plan and report | Required (may justify absence of active PMPF for low-risk devices) | Required | Required (active PMPF expected) | Required (active PMPF expected) |
| Common Specifications | Not applicable | Not applicable | May apply for specific device types | Mandatory compliance unless justified |
| Notified Body review | No (except sterile) | Yes | Yes | Yes (including batch verification for some devices) |

Practical note: "Proportionate to risk" does not mean "optional for Class A." Every IVD needs a performance evaluation. For a Class A wash buffer, the evaluation may be brief — limited literature confirming the buffer's role, basic analytical data confirming it performs its function (pH, osmolality, absence of interfering substances), and a justification that clinical performance is not directly applicable because the device is an accessory. But the evaluation must exist, be documented, and be defensible.

The Performance Evaluation Plan

Annex XIII, Part A, Section 1 of the IVDR requires manufacturers to draw up and document a performance evaluation plan. This is the strategic document that defines what evidence you need, how you will obtain it, and how it will be analyzed. It should be written before studies begin — not reverse-engineered after the data is collected.

A well-structured performance evaluation plan includes:

Scope and Device Description

  • Device identification (name, model, variants, intended purpose, target analyte, specimen type, measuring range)
  • Intended user population and intended operator
  • IVDR classification and applicable classification rules
  • Identification of applicable Common Specifications (for Class D devices)
  • Identification of applicable harmonized standards (e.g., ISO 20916, ISO 17511, EN 13612)

Identification of Applicable General Safety and Performance Requirements

  • Mapping of GSPR from Annex I that are relevant to performance evaluation
  • Identification of which GSPRs are addressed through performance evaluation data vs. other evidence (bench testing, biocompatibility, etc.)

Scientific Validity Plan

  • Description of the intended analyte and its clinical/physiological association
  • Literature search strategy (databases, search terms, inclusion/exclusion criteria, time frame)
  • Identification of clinical guidelines and authoritative references
  • Gap analysis: identification of any areas where the scientific validity evidence base is insufficient

Analytical Performance Study Plan

  • Identification of all applicable analytical performance characteristics
  • For each characteristic: study design, acceptance criteria, sample types and sizes, statistical methods, relevant CLSI or ISO protocols
  • Justification for any omitted characteristics
  • Description of reference materials and comparative methods to be used

Clinical Performance Study Plan

  • Identification of required clinical performance characteristics (clinical sensitivity, specificity, PPV, NPV, etc.)
  • Study design (prospective, retrospective, residual sample study, literature-based)
  • Target population definition and inclusion/exclusion criteria
  • Sample size justification (power calculations where applicable)
  • Reference standard or comparator definition
  • Statistical analysis plan
  • Ethical considerations and regulatory notifications (where clinical performance studies are conducted)
  • If ISO 20916-compliant clinical performance study: full study protocol
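On sample size justification: a common starting point for a clinical sensitivity or specificity claim is the normal-approximation formula n = z² · p(1−p) / d², where p is the expected proportion and d the desired confidence-interval half-width. The sketch below illustrates that arithmetic only; exact binomial or Wilson methods are preferable when p is close to 1, and all numbers here are illustrative, not regulatory thresholds.

```python
import math
from statistics import NormalDist

def n_for_proportion(p_expected: float, half_width: float,
                     conf: float = 0.95) -> int:
    """Normal-approximation sample size for estimating a proportion."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)   # 1.96 for 95%
    return math.ceil(z ** 2 * p_expected * (1 - p_expected)
                     / half_width ** 2)

# e.g. expected clinical sensitivity 95%, target CI half-width +/-3%:
n_diseased = n_for_proportion(0.95, 0.03)

# if only ~20% of enrollable subjects have the condition, total
# enrollment scales up accordingly (illustrative prevalence):
n_total = math.ceil(n_diseased / 0.20)
print(n_diseased, n_total)
```

Note that the diseased and non-diseased arms are sized separately (sensitivity depends only on the former, specificity on the latter), which is why prevalence in the recruitment population drives total study size.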

PMPF Plan Reference

  • Reference to the PMPF plan that will generate ongoing clinical performance data post-market
  • Identification of specific performance questions the PMPF is designed to address

Timeline and Milestones

  • Phasing of studies relative to design development and regulatory submission
  • Identification of dependencies and critical path activities

Common deficiency: Notified Bodies frequently cite performance evaluation plans that are generic — essentially a copy-paste template with the device name changed. The plan must be device-specific. It must identify the specific analyte, the specific clinical application, the specific performance characteristics that matter for that application, and the specific study designs that will generate the evidence. A generic plan signals that the manufacturer does not understand their own device's clinical context.

Analytical Performance Studies: Practical Considerations

Conducting analytical performance studies is the most labor-intensive part of the performance evaluation for most IVDs. The IVDR does not specify exact protocols, but the expectation — reinforced by MDCG guidance and Notified Body practice — is that studies follow recognized methodologies, primarily those from the Clinical and Laboratory Standards Institute (CLSI) and relevant ISO standards.

Sensitivity and Limit of Detection (LoD)

For qualitative assays (positive/negative), analytical sensitivity is expressed as the LoD — the lowest concentration of analyte that the device can reliably detect. The standard approach follows CLSI EP17-A2:

  1. Prepare a dilution series of known positive samples spanning the expected LoD region.
  2. Test multiple replicates (typically 20–60) at each concentration level.
  3. The LoD is the concentration at which the device detects the analyte with at least 95% probability (by probit analysis or other statistical method).
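The three steps above can be sketched numerically with a classical probit fit: probit-transform the hit rate at each dilution level, regress against log concentration, and solve for the concentration giving 95% detection probability. The data below are invented for illustration; real EP17 analyses use validated statistical software and the protocol's replicate requirements.

```python
import numpy as np
from statistics import NormalDist

conc = np.array([0.5, 1.0, 2.0, 4.0, 8.0])   # analyte units/mL (illustrative)
hits = np.array([4, 10, 16, 19, 20])         # positive calls out of n replicates
n = 20

# continuity-corrected hit rates keep the probit transform finite at 0% / 100%
p = (hits + 0.5) / (n + 1)
z = np.array([NormalDist().inv_cdf(pi) for pi in p])

# linear fit of probit(hit rate) vs log10(concentration)
slope, intercept = np.polyfit(np.log10(conc), z, 1)

# solve for the concentration with a fitted 95% detection probability
z95 = NormalDist().inv_cdf(0.95)             # about 1.645
lod = 10 ** ((z95 - intercept) / slope)
print(f"estimated LoD = {lod:.2f} units/mL")
```

With this toy data the fitted LoD lands between the 4 and 8 units/mL dilution levels, which matches the raw hit rates (95% and 100% detection at those levels).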

For molecular assays (PCR-based), LoD is often expressed in copies/mL or IU/mL and is critical for pathogen detection. Regulatory expectations for blood screening assays (Class D) are particularly demanding — missing a positive donation can result in transfusion-transmitted infection.
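As an illustration of step 3, the probit fit can be sketched in a few lines. This is a simplified version of the EP17-A2 approach, assuming a balanced design and observed hit rates strictly between 0 and 1; the concentrations and hit counts below are invented for the example.

```python
import numpy as np
from scipy.stats import norm

def probit_lod(conc, hits, n_reps, target=0.95):
    """Estimate LoD by probit analysis: regress probit(hit rate)
    on log10(concentration) and solve for the target detection rate."""
    conc = np.asarray(conc, dtype=float)
    rate = np.asarray(hits, dtype=float) / n_reps
    # The probit transform requires rates strictly between 0 and 1
    mask = (rate > 0) & (rate < 1)
    z = norm.ppf(rate[mask])
    x = np.log10(conc[mask])
    slope, intercept = np.polyfit(x, z, 1)
    log_lod = (norm.ppf(target) - intercept) / slope
    return 10 ** log_lod

# Hypothetical dilution series: copies/mL, detected replicates out of 20
lod = probit_lod([5, 10, 20, 40], hits=[6, 12, 18, 19], n_reps=20)
print(f"Estimated 95% LoD: {lod:.1f} copies/mL")
```

A formal EP17-A2 analysis would additionally report the confidence interval of the LoD and verify the probit model fit; this sketch shows only the core regression.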

Specificity and Cross-Reactivity

Analytical specificity is evaluated through two complementary approaches:

Interference studies assess the effect of endogenous substances (hemoglobin, bilirubin, triglycerides, total protein, biotin) and exogenous substances (common medications) on the assay result. CLSI EP07-A3 provides the standard paired-difference study design. For each potential interferent, the study compares the device's result on a sample with and without the interferent at clinically relevant concentrations.

Cross-reactivity studies assess whether the device produces false-positive results when testing samples containing related but non-target analytes. For infectious disease assays, this means testing panels of phylogenetically related organisms. For immunoassays, this means testing structurally similar molecules (metabolites, isoforms, drug analogs). For molecular assays, in silico sequence analysis can supplement (but typically not replace) wet testing.

Precision Studies

Precision studies are structured around the standard variance component model described in CLSI EP05-A3:

| Component | What It Measures | Study Design Element |
| --- | --- | --- |
| Repeatability (within-run) | Variation when the same sample is tested multiple times in the same run, same instrument, same operator | Multiple replicates per run |
| Between-run (within-day) | Variation between runs on the same day | Multiple runs per day |
| Between-day | Variation across days | Testing over 20+ days |
| Between-instrument | Variation across different instruments of the same type | Multiple instruments (minimum 2–3) |
| Between-lot | Variation across different reagent lots | Multiple lots (minimum 2–3) |
| Between-site (reproducibility) | Variation across different testing sites | Multiple sites (for multi-center studies) |
| Total precision | The combined effect of all sources of variation | Calculated from variance components |

Acceptance criteria for precision must be defined prospectively and should be clinically meaningful. A CV of 5% may be acceptable for a general chemistry analyte but entirely unacceptable for a drug level where the therapeutic window is narrow.
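The variance component arithmetic behind the table above can be illustrated with a minimal one-way (day-nested) ANOVA. A full EP05-A3 analysis uses a two-factor nested design over 20 days, so treat this as a sketch of the calculation only; the replicate values are invented.

```python
import numpy as np

def variance_components(runs):
    """One-way ANOVA variance components for a balanced design:
    `runs` is a list of replicate arrays, one per day.
    Returns (repeatability SD, between-day SD, total SD)."""
    runs = [np.asarray(r, dtype=float) for r in runs]
    n = len(runs[0])          # replicates per day (balanced design assumed)
    k = len(runs)             # number of days
    day_means = np.array([r.mean() for r in runs])
    grand = day_means.mean()
    # Mean squares within and between days
    ms_within = sum(((r - r.mean()) ** 2).sum() for r in runs) / (k * (n - 1))
    ms_between = n * ((day_means - grand) ** 2).sum() / (k - 1)
    var_within = ms_within
    var_between = max((ms_between - ms_within) / n, 0.0)  # clamp at zero
    total_sd = np.sqrt(var_within + var_between)
    return np.sqrt(var_within), np.sqrt(var_between), total_sd

# Hypothetical three days x three replicates of a control material
sr, sb, st = variance_components([[10.1, 10.3, 9.9],
                                  [10.4, 10.6, 10.5],
                                  [9.8, 10.0, 10.1]])
print(f"Repeatability SD {sr:.3f}, between-day SD {sb:.3f}, total SD {st:.3f}")
```

Dividing the total SD by the grand mean gives the total CV, which is then compared against the prospectively defined acceptance criterion.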

Linearity and Measuring Range

Linearity is evaluated per CLSI EP06-A by testing a series of samples at concentrations spanning the claimed measuring range. Typically, this involves preparing mixtures of high and low concentration patient samples to create a series of at least 5–7 concentration levels. The measured values are plotted against the expected values, and deviation from linearity is assessed using polynomial regression analysis.

The measuring range (Analytical Measurement Range, AMR) is established through the linearity and precision studies. The lower end is bounded by the LoQ (the lowest concentration with acceptable precision), and the upper end is bounded by the highest concentration that maintains linearity and precision.
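A minimal sketch of the polynomial comparison used in linearity assessment: fit first- and second-order models and flag nonlinearity where they diverge by more than an allowable limit. The allowable limit and the example data are illustrative assumptions, not values from the standard.

```python
import numpy as np

def linearity_check(expected, measured, allowable_pct=5.0):
    """Compare a second-order polynomial fit to the first-order fit;
    flag nonlinearity if the deviation at any level exceeds the
    allowable percentage of the expected value."""
    x = np.asarray(expected, dtype=float)
    y = np.asarray(measured, dtype=float)
    lin = np.polyval(np.polyfit(x, y, 1), x)
    quad = np.polyval(np.polyfit(x, y, 2), x)
    dev_pct = 100 * np.abs(quad - lin) / x
    return dev_pct, bool(np.all(dev_pct <= allowable_pct))

# Hypothetical 7-level series spanning the claimed measuring range
dev, is_linear = linearity_check(
    [10, 25, 50, 100, 200, 400, 800],
    [10.2, 25.5, 51.0, 102.0, 204.0, 408.0, 816.0])
print("Linear within allowable deviation:", is_linear)
```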

Stability Studies

Stability studies must cover multiple aspects:

| Stability Type | What It Demonstrates | Typical Requirements |
| --- | --- | --- |
| Shelf life (real-time) | Reagents/device components maintain performance over claimed shelf life under labeled storage conditions | Real-time data required; duration must cover the entire claimed shelf life |
| Accelerated stability | Supporting data for shelf-life estimation | Cannot be sole basis for shelf-life claims; used to supplement real-time data or support preliminary claims |
| Open-vial stability | Performance after opening reagent containers | Required when reagents are packaged in multi-use containers |
| On-board stability | Performance of reagents loaded on the instrument over the claimed on-board time | Required for automated analyzers where reagents remain on-board |
| Calibration stability | Duration over which a single calibration remains valid | Required for quantitative assays; defines recalibration interval |
| In-use stability | Performance of cartridges, strips, or sensors after opening sealed packaging | Required for point-of-care and single-use devices |
| Transport/shipping stability | Performance after exposure to transport conditions | Required; typically assessed through temperature excursion studies |
| Specimen stability | Stability of the analyte in the specimen under various storage conditions | Required; defines acceptable pre-analytical handling conditions |
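For planning purposes, accelerated data are often extrapolated with a simple Q10 model. As noted above, accelerated data can support but never replace real-time data; the Q10 value of 2 used here is a common rule-of-thumb assumption, not a regulatory figure.

```python
def q10_shelf_life(t_accel_days, temp_accel_c, temp_storage_c, q10=2.0):
    """Rough Q10 extrapolation: convert a duration survived under
    accelerated conditions into the equivalent real-time duration
    at the labeled storage temperature."""
    return t_accel_days * q10 ** ((temp_accel_c - temp_storage_c) / 10.0)

# Hypothetical: 90 days survived at 37 C, labeled storage at 4 C
print(f"{q10_shelf_life(90, 37, 4):.0f} days equivalent at storage temperature")
```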

Clinical Performance Studies and ISO 20916

When a clinical performance study is needed — because published literature and existing data are insufficient to establish clinical performance — the IVDR requires that the study be conducted in accordance with Annex XIII, Part A, and (for interventional studies or studies involving additional specimen collection) notified to or approved by the relevant Member State authority.

ISO 20916:2019 (Clinical performance studies using specimens from human subjects — Good study practice) provides the operational framework. It aligns with the IVDR requirements and covers:

  • Study planning and protocol development
  • Scientific and ethical review
  • Roles and responsibilities (sponsor, principal investigator, study monitor)
  • Specimen handling and labeling
  • Data management and statistical analysis
  • Study reporting

Key Aspects of Clinical Performance Study Design

Reference standard definition. This is often the hardest part. For many analytes, there is no perfect "gold standard." The IVDR acknowledges this and allows the use of composite reference standards, clinical adjudication panels, or the best available comparator. Whatever reference standard is chosen must be justified in the study protocol and its limitations discussed.

Sample size justification. The study must be powered to demonstrate the claimed clinical performance with appropriate statistical confidence. For clinical sensitivity and specificity, sample size depends on the expected performance, the acceptable confidence interval width, and the disease prevalence in the study population. For rare diseases or rare mutations, achieving adequate sample sizes can be exceptionally challenging and may require multi-center studies or the use of banked specimens.
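The normal-approximation arithmetic behind a sensitivity sample-size estimate can be sketched as follows. The expected sensitivity, confidence half-width, and prevalence below are illustrative assumptions; a formal statistical analysis plan would refine this with exact binomial or Wilson-interval methods.

```python
import math

def n_for_proportion(p_expected, half_width, conf_z=1.96):
    """Normal-approximation sample size for estimating a proportion
    (e.g., clinical sensitivity) within +/- half_width at ~95% confidence."""
    return math.ceil(conf_z ** 2 * p_expected * (1 - p_expected) / half_width ** 2)

# Hypothetical: expected sensitivity 0.95, CI half-width of 3 percentage points
n_pos = n_for_proportion(0.95, 0.03)      # disease-positive subjects needed
# In a prospective cohort with 10% disease prevalence, the total enrollment
# must be inflated so that enough positives accrue:
n_total = math.ceil(n_pos / 0.10)
print(n_pos, n_total)
```

The prevalence inflation in the last step is why prospective studies of low-prevalence conditions become so large, and why banked specimens or enriched populations are often used instead.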

Prospective vs. retrospective design. Both are acceptable under the IVDR, but each has trade-offs:

| Design | Advantages | Limitations |
| --- | --- | --- |
| Prospective | Controlled specimen handling; defined enrollment criteria; no selection bias from sample availability | Longer timeline; higher cost; requires IRB/ethics approval and (for interventional studies) regulatory notification |
| Retrospective (residual samples) | Faster; lower cost; large sample sizes possible from biobanks | Potential pre-analytical variability; selection bias; limited clinical metadata; some analytes degrade in storage |

Multi-site considerations. Clinical performance should ideally be demonstrated across multiple clinical sites to ensure the device performs across different populations, operators, and clinical settings. Single-site studies are acceptable for initial evidence but may need to be supplemented post-market.

Scientific Validity Assessment: Deeper Dive

For most well-established analytes, the scientific validity assessment is a structured literature review. But "structured" is the operative word. The IVDR and MDCG guidance expect a methodology that is systematic, reproducible, and documented.

Conducting the Literature Review

  1. Define the search question. What is the analyte? What is the clinical condition? What is the claimed association?
  2. Select databases. PubMed/MEDLINE is the minimum. Cochrane Library, Embase, and relevant specialty databases should be considered.
  3. Define search terms. Use both MeSH terms and free text. Document the exact search strings used.
  4. Define inclusion and exclusion criteria. Specify date range, language, study type, population, and relevance filters.
  5. Execute the search and screen results. Document the number of hits, articles screened, and articles included/excluded with reasons. A PRISMA flow diagram is best practice.
  6. Critically appraise included studies. Assess study quality, risk of bias, applicability to the device's intended population, and consistency of findings.
  7. Synthesize the evidence. Summarize the strength of the analyte-condition association and identify any gaps or limitations.
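The screening counts that feed a PRISMA flow diagram (step 5) reduce to simple bookkeeping; the database hit counts below are invented for illustration.

```python
def prisma_flow(hits_by_db, duplicates, excluded_screen, excluded_fulltext):
    """Tally the PRISMA-style counts the literature-review record
    should document at each screening stage."""
    identified = sum(hits_by_db.values())
    screened = identified - duplicates            # after de-duplication
    fulltext = screened - excluded_screen         # after title/abstract screen
    included = fulltext - excluded_fulltext       # after full-text appraisal
    return {"identified": identified, "screened": screened,
            "full_text_assessed": fulltext, "included": included}

# Hypothetical search results across three databases
flow = prisma_flow({"PubMed": 412, "Embase": 388, "Cochrane": 57},
                   duplicates=231, excluded_screen=502, excluded_fulltext=84)
print(flow)
```

Each exclusion count must be backed by a documented reason per article; the tally alone is not sufficient evidence of a systematic review.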

For novel biomarkers, the scientific validity assessment may need to include data from the manufacturer's own studies — discovery studies, biomarker validation studies, outcome studies — particularly when the published literature is sparse.

Practical note for companion diagnostics: The scientific validity for a companion diagnostic is tightly linked to the clinical development of the companion drug. The analyte-condition association is effectively "the biomarker predicts response (or non-response) to the specific therapy." This evidence often comes from the drug clinical trials, and the IVD manufacturer must have access to or generate data demonstrating that their device identifies the biomarker in a manner consistent with the method used in the clinical trials.

Common Specifications for Class D Devices

Article 9 of the IVDR empowers the European Commission to adopt Common Specifications (CS) for Class D devices (and for Class A, B, and C devices in justified cases). Common Specifications are not optional — manufacturers must comply with them unless they can demonstrate that their alternative solutions provide a level of safety and performance that is at least equivalent.

Common Specifications replace the Common Technical Specifications (CTS) that existed under the IVDD for Annex II List A devices. The IVDR CS have broader scope and legal weight.

What Common Specifications Cover

For Class D devices, Common Specifications typically address:

  • Performance criteria. Minimum analytical sensitivity (LoD), diagnostic sensitivity, diagnostic specificity, and other performance requirements specific to the analyte or testing scenario.
  • Reference materials and panels. Specified panels of characterized samples that must be tested, including known positives at various concentrations, known negatives, and samples designed to challenge analytical specificity.
  • Study design requirements. Minimum sample sizes, population characteristics, and statistical methods.
  • Specific risk areas. Requirements for seroconversion panels (for blood screening assays), genotype panels (for blood grouping assays), and emerging variant detection (for infectious disease assays).

Compliance and Deviations

If a manufacturer deviates from a Common Specification, the burden of proof shifts to the manufacturer. They must demonstrate — with full documentation — that their alternative approach provides equivalent or superior performance. This deviation and its justification become part of the performance evaluation report and will be scrutinized by the Notified Body.

In practice, deviating from CS is difficult and should only be undertaken with strong technical justification. Notified Bodies expect a detailed gap analysis comparing the manufacturer's approach to the CS requirements, point by point.

Commission Regulation (EU) 2022/1107: What the Common Specifications Actually Contain

The European Commission adopted Commission Implementing Regulation (EU) 2022/1107 on 4 July 2022, laying down the first set of Common Specifications for Class D IVDs under the IVDR. These CS became applicable on 25 July 2024. Until their adoption, manufacturers were required to comply with the specifications in Commission Decision 2002/364/EC (the former Common Technical Specifications under the IVDD).

The regulation covers the following specific device categories — devices for the detection and/or quantification of markers related to:

| Category | Specific Analytes/Pathogens |
| --- | --- |
| Retrovirus infections | HIV-1/HIV-2 (antibodies, antigen, NAT), HTLV-I/HTLV-II |
| Hepatitis virus infections | HBV (HBsAg, anti-HBc, anti-HBs, HBV NAT), HCV (antibodies, antigen, NAT), HDV (hepatitis D) |
| Herpesvirus infections | CMV (cytomegalovirus), EBV (Epstein-Barr virus) |
| Respiratory virus infections | SARS-CoV-2 |
| Bacterial infections | Treponema pallidum (syphilis) |
| Parasitic infections | Trypanosoma cruzi (Chagas disease) |
| Prion diseases | Variant Creutzfeldt-Jakob disease (vCJD) markers |
| Blood grouping | ABO system, Rh system (RhD, RhC/c, RhE/e), Kell system, Kidd system, Duffy system |

The CS are structured in two parts. The first part establishes general requirements, definitions, and transitional provisions. The second part contains device-specific annexes organized by pathogen or analyte group, each specifying:

  • Minimum performance criteria. Required analytical sensitivity (LoD), diagnostic sensitivity, and diagnostic specificity thresholds — often expressed as minimum percentage detection rates on defined panel types (e.g., seroconversion panels, genotype panels, low-titer samples).
  • Mandatory reference panels. Specified panels of well-characterized samples that must be tested, including known positives at various concentrations (including near-cutoff concentrations), known negatives, seroconversion panels (for blood screening assays), and samples designed to challenge analytical specificity.
  • Genotype and subtype coverage. For infectious disease assays, requirements to demonstrate detection across all clinically relevant genotypes, subtypes, or variants — a particularly important consideration as pathogens evolve.
  • Study design minima. Minimum sample sizes, required population characteristics, and statistical methods for establishing clinical performance claims.
  • Emerging variant monitoring. Requirements for ongoing surveillance and validation against newly identified pathogen variants, genotypes, or recombinant forms.
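The point-by-point gap analysis against CS minima that Notified Bodies expect can be organized programmatically. All criterion names and numbers below are illustrative placeholders, not actual values from the EU 2022/1107 annexes.

```python
def cs_gap_check(results, cs_minima):
    """Compare observed performance against minimum criteria,
    point by point; return only the failing (or missing) items."""
    gaps = {}
    for criterion, minimum in cs_minima.items():
        observed = results.get(criterion)
        if observed is None or observed < minimum:
            gaps[criterion] = {"required": minimum, "observed": observed}
    return gaps

# Illustrative numbers only; real criteria come from the CS annexes
minima = {"diagnostic_sensitivity_pct": 99.5,
          "diagnostic_specificity_pct": 99.0,
          "seroconversion_panels_detected_pct": 100.0}
observed = {"diagnostic_sensitivity_pct": 100.0,
            "diagnostic_specificity_pct": 99.4,
            "seroconversion_panels_detected_pct": 98.0}
print(cs_gap_check(observed, minima))
```

Any entry returned by such a check would require either corrective study work or a documented deviation justification in the performance evaluation report.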

Key practical point: Common Specifications are not static. The Commission can update them as scientific knowledge evolves, new pathogen variants emerge, or new device categories are designated. Manufacturers of Class D devices must monitor the Official Journal of the EU for CS amendments and assess their impact on existing conformity assessments.

Additional Device Categories Not Yet Covered by CS

Not all Class D devices have published Common Specifications. Devices for Class D applications outside the categories listed in EU 2022/1107 — such as certain tissue typing reagents, emerging pathogen screening assays, or future blood screening markers — must still meet the general performance evaluation requirements of the IVDR but do not have CS-specific performance benchmarks. In these cases, the manufacturer should benchmark against the state of the art and established reference methods, and the Notified Body will assess the performance evidence on its merits.

EU Reference Laboratories (EURLs) for Class D Devices

EU Reference Laboratories are a significant new element of the IVDR regulatory architecture that did not exist under the IVDD. Established under Article 100 of the IVDR, EURLs add an independent verification layer for Class D devices beyond the Notified Body assessment.

Legal Basis and Mandate

Article 100 of the IVDR empowers the European Commission to designate EU Reference Laboratories for specific categories or groups of Class D IVDs. These laboratories perform two primary functions:

  1. Performance verification — Independent verification of the manufacturer's performance claims and compliance with applicable Common Specifications, conducted as part of the initial conformity assessment.
  2. Batch testing — Testing of samples from manufactured batches of Class D devices to verify ongoing compliance with specifications.

Designated EURLs

The Commission has progressively designated EURLs through implementing regulations:

| Regulation | Publication Date | Device Categories Covered | Application Date |
| --- | --- | --- | --- |
| EU 2023/2713 | December 2023 | Detection/quantification of markers for hepatitis infections, retrovirus infections, herpesvirus infections, respiratory virus infections, and bacterial agents | 1 October 2024 |
| EU 2025/2526 | December 2025 | Parasite infection detection and blood grouping marker detection | 1 May 2026 |

The designated laboratories include institutions such as the Paul-Ehrlich-Institut (Germany) for blood grouping, Instituto de Salud Carlos III (Spain) for parasitic infections, RISE Research Institutes of Sweden (Sweden), and several others across EU member states.

Impact on Manufacturers

The practical impact of EURLs on Class D device manufacturers is substantial:

  • Applications submitted before 1 May 2026. Performance verification is still required but may occur later in the certification cycle. Batch testing will apply from 1 May 2026 for devices already certified or under conformity assessment.
  • Applications submitted after 1 May 2026. Full EURL performance verification must be incorporated into the conformity assessment pathway from the outset. Manufacturers must ensure compliance with applicable Common Specifications, as these feed directly into the EURL test plan.
  • Devices outside EURL scope. Class D IVDs that fall outside the categories of designated EURLs continue to be assessed by the Notified Body alone — no EURL involvement is required for these devices at present.
  • Legacy Annex II List A devices. Devices certified under the IVDD that remain on the market under the extended transition periods (Regulation 2024/1860) undergo batch verification activities under the Notified Body's regime, not the EURL regime.

Manufacturers do not contract directly with EURLs. The EURL liaises with the Notified Body, and the manufacturer's relationship remains with the Notified Body. However, manufacturers must factor EURL timelines, fee structures, and documentation requirements into their conformity assessment planning.

EURL Tasks Under Article 100

| Task | Description |
| --- | --- |
| Performance claim verification | Assess whether the manufacturer's claimed performance (sensitivity, specificity, LoD, etc.) is substantiated by independent testing |
| CS compliance verification | Verify that the device meets applicable Common Specification requirements through independent panel testing |
| Batch release testing | Test samples from manufactured batches to verify lot-to-lot consistency and ongoing performance |
| Scientific and technical advice | Provide opinions and advice to Notified Bodies, Competent Authorities, and the Commission on matters relating to Class D device performance |
| Reference material development | Recommend suitable reference materials and reference measurement procedures for Class D device categories |
| Laboratory network coordination | Establish and coordinate networks of laboratories across the EU for harmonized testing |
| Method development | Contribute to developing and harmonizing testing methodologies and analytical approaches |

Planning consideration: EURL involvement adds time and cost to the Class D conformity assessment process. Manufacturers should engage their Notified Body early to understand the specific EURL requirements for their device category, expected timelines for performance verification and batch testing, and the documentation format expected by the EURL.

The Performance Evaluation Report (PER)

The performance evaluation report is the comprehensive document that compiles and analyzes all performance evaluation evidence. It is the IVD equivalent of the clinical evaluation report (CER) under the MDR. Annex XIII, Part A, Section 1.3.2 defines its requirements.

Recommended PER Structure

While the IVDR does not mandate a rigid format, the following structure aligns with Annex XIII requirements and Notified Body expectations:

| Section | Content |
| --- | --- |
| 1. Executive Summary | Brief overview of the device, intended purpose, classification, and the overall performance evaluation conclusion |
| 2. Scope and Device Description | Detailed device description, variants, intended purpose, target analyte(s), specimen types, intended user, patient population, clinical setting, classification rationale |
| 3. Performance Evaluation Plan Summary | Reference to the PEP, summary of the evaluation strategy, identification of applicable standards and CS |
| 4. State of the Art | Current medical and scientific knowledge relevant to the device's intended purpose; available alternative diagnostic methods; relevant clinical guidelines; benchmarking against competitor devices |
| 5. Scientific Validity Assessment | Summary and critical appraisal of the evidence for the analyte-condition association; literature search methodology and results; gap analysis |
| 6. Analytical Performance Data | Presentation and analysis of all analytical performance studies; comparison to acceptance criteria; discussion of results in the context of intended use and clinical relevance |
| 7. Clinical Performance Data | Presentation and analysis of clinical performance studies and/or literature-based clinical performance evidence; clinical sensitivity, specificity, PPV, NPV, likelihood ratios; comparison to state of the art |
| 8. Benefit-Risk Analysis | Assessment of the device's overall benefit-risk profile in the context of the performance evaluation data; residual risks from known performance limitations (e.g., false negatives in specific populations, known interferences); measures taken to mitigate these risks (confirmatory testing algorithms, IFU warnings) |
| 9. Compliance with Common Specifications | (For Class D, and where applicable for other classes) Point-by-point compliance with applicable CS; justification for any deviations |
| 10. GSPR Compliance Matrix | Mapping of performance evaluation data to applicable GSPRs |
| 11. Conclusions | Overall conclusions regarding scientific validity, analytical performance, and clinical performance; statement on whether the device meets IVDR requirements; identification of any conditions or limitations |
| 12. Post-Market Performance Follow-Up | Summary of the PMPF plan; identification of specific performance questions to be addressed post-market; criteria for updating the PER |
| 13. References | Complete bibliography of all literature, standards, and guidance documents referenced |
| 14. Appendices | Detailed study reports, statistical analyses, literature appraisal tables, search strategies, raw data summaries |

PER Maintenance

The PER is a living document. It must be updated:

  • When new performance data becomes available (from PMPF or additional studies)
  • When the state of the art changes (new clinical guidelines, new reference methods, competitive devices with superior performance)
  • When changes are made to the device (new reagent formulations, new specimen types, expanded claims)
  • At minimum as part of the regular PSUR cycle (Class C and D devices; Class A and B devices maintain a post-market surveillance report instead)
  • When post-market data reveals performance issues or trends

Post-Market Performance Follow-Up (PMPF)

PMPF is to IVDs what PMCF (Post-Market Clinical Follow-up) is to medical devices under the MDR. It is the proactive, ongoing collection and evaluation of clinical performance and safety data from the device's post-market use.

Article 78 and Annex XIII, Part B of the IVDR define PMPF requirements. The PMPF plan is part of the post-market surveillance plan and the performance evaluation plan.

PMPF vs. PMCF: Understanding the Distinction

While PMPF and PMCF serve parallel purposes — maintaining the evidence base for continued market access post-CE marking — they differ in important ways that reflect the fundamental differences between IVDs and other medical devices.

| Aspect | PMPF (IVDs, IVDR Annex XIII Part B) | PMCF (Medical Devices, MDR Annex XIV Part B) |
| --- | --- | --- |
| Primary objective | Update the performance evaluation — confirming scientific validity, analytical performance, and clinical performance | Update the clinical evaluation — confirming clinical safety and performance |
| Focus | Analytical and clinical performance characteristics: sensitivity, specificity, precision, LoD, cut-off validity, variant coverage | Clinical safety endpoints, long-term clinical performance, rare adverse events, real-world effectiveness |
| Regulatory basis | IVDR Article 78, Annex XIII Part B | MDR Article 61(11), Annex XIV Part B |
| Specific data sources | Ring trials (proficiency testing), EQA schemes, epidemiological studies, genetic databanks, disease registries, laboratory feedback, PMPF studies | Patient registries, clinical follow-up studies, PMCF studies, surveys of healthcare professionals, literature monitoring |
| Historical precedent | Entirely new under IVDR — did not exist under the IVDD | Existed conceptually under the MDD but significantly strengthened under the MDR |
| Update frequency | Class C and D: annually; Class A and B: every 2–3 years or sooner if triggered by findings | High-risk devices: annually; lower risk: may justify longer intervals |
| Three-pillar coverage | Must address all three pillars — scientific validity (is the analyte-condition link still current?), analytical performance (is the device still measuring accurately?), and clinical performance (are clinical outcomes consistent with pre-market data?) | Focuses on clinical safety and performance; no separate scientific validity pillar |

PMPF is not simply PMCF with the terminology changed. The IVD-specific data sources — particularly ring trials (inter-laboratory proficiency testing), EQA (External Quality Assessment) participation, and genetic databank monitoring — have no direct parallel in the PMCF framework. Ring trials, in which multiple laboratories simultaneously test identical samples using the same device, provide uniquely powerful evidence of real-world inter-laboratory reproducibility that cannot be obtained through any other means.
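EQA results are conventionally scored as z-scores against an assigned value. A minimal sketch follows, using the interpretation thresholds commonly applied under ISO 13528 (these thresholds are EQA convention, not an IVDR requirement); the standard deviation for proficiency assessment (sd_pa) is set by the EQA provider.

```python
def eqa_z_score(result, assigned_value, sd_pa):
    """EQA-style z-score with the common ISO 13528 interpretation:
    |z| <= 2 satisfactory, 2 < |z| < 3 questionable, |z| >= 3 unsatisfactory."""
    z = (result - assigned_value) / sd_pa
    if abs(z) <= 2:
        verdict = "satisfactory"
    elif abs(z) < 3:
        verdict = "questionable"
    else:
        verdict = "unsatisfactory"
    return z, verdict

# Hypothetical: laboratory reports 110 against an assigned value of 100
z, verdict = eqa_z_score(110, assigned_value=100, sd_pa=4)
print(f"z = {z:.2f} ({verdict})")
```

Trending such scores across participating laboratories, scheme by scheme, is one of the IVD-specific PMPF signals that has no PMCF equivalent.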

For legacy IVD manufacturers: PMPF is required for legacy IVDs transitioning from the IVDD, even if those devices do not yet have a formal Performance Evaluation Report (PER). Per MDCG 2022-8, manufacturers are not required to retroactively generate a PER for legacy devices during the transition period, but they must implement PMPF activities — collecting and evaluating post-market performance data continuously from the date IVDR post-market requirements apply.

PMPF Plan

The PMPF plan must specify:

  • The objectives of the follow-up activities
  • The specific performance questions being addressed
  • The methods for data collection (routine diagnostic data, registries, customer surveys, PMPF studies, literature monitoring)
  • The evaluation methodology and statistical analysis plan
  • The rationale for the scope and frequency of follow-up
  • A timeline for reporting results

What PMPF Should Address

| Area | Specific Questions |
| --- | --- |
| Ongoing clinical performance | Are clinical sensitivity and specificity consistent with pre-market data? Are there specific populations or clinical scenarios where performance differs from expectations? |
| Emerging variants and strains | (For infectious disease assays) Does the device detect new pathogen variants, emerging genotypes, or mutants? This has been a major focus since the COVID-19 pandemic highlighted the importance of variant surveillance. |
| Real-world analytical performance | How does precision, accuracy, and LoD perform in routine clinical settings (vs. controlled study environments)? What is the rate of quality control failures, calibration issues, or invalid results? |
| Evolving state of the art | Have new reference methods, clinical guidelines, or competitive devices changed the expected performance benchmarks? |
| Known limitations | Are the limitations identified in the PER (interferences, cross-reactivities, specific population performance gaps) confirmed or contradicted by post-market data? |
| Complaints and feedback | What performance-related complaints are being received? Are there trends? |

When Active PMPF Studies Are Expected

For Class C and D devices, Notified Bodies generally expect active PMPF — not just passive monitoring of complaints and literature. Active PMPF may involve:

  • Prospective performance monitoring at sentinel clinical sites
  • Periodic re-evaluation against updated reference panels (for infectious disease assays)
  • Systematic collection of discordant result data from clinical laboratories
  • Registry-based studies linking device results to patient outcomes

For Class A and B devices, the manufacturer can justify that passive PMPF (literature monitoring, complaint analysis, trend reporting) is sufficient — but this justification must be documented and reasoned, not merely asserted.

Companion Diagnostics Under the IVDR

Companion diagnostics (CDx) are classified as Class C under the IVDR. These are devices that provide information essential for the safe and effective use of a corresponding medicinal product — for example, determining HER2 status for trastuzumab eligibility, EGFR mutation status for osimertinib, PD-L1 expression for pembrolizumab, or BRAF V600E mutation for vemurafenib.

The performance evaluation for companion diagnostics has unique requirements:

Linkage to Medicinal Product

The IVD manufacturer must demonstrate that their device identifies the biomarker in a manner consistent with the method used in the clinical trials that established the drug's efficacy. This often means the CDx must show concordance (analytical and clinical) with the clinical trial assay (CTA) — the assay used to enroll patients in the pivotal drug trials.

Bridging Studies

If the CDx uses a different technology platform, different antibody clones (for IHC), or different primers/probes (for molecular assays) than the CTA, a bridging study is required to demonstrate concordance. This study compares the CDx result to the CTA result on a shared set of clinical specimens, demonstrating that the CDx classifies patients into the same biomarker-positive and biomarker-negative groups as the CTA.

| Bridging Study Element | Description |
| --- | --- |
| Sample set | Clinical specimens from the drug clinical trial population or a representative population with similar biomarker prevalence |
| Comparator | The clinical trial assay (or the validated reference method used in the trial) |
| Endpoints | Overall percent agreement (OPA), positive percent agreement (PPA), negative percent agreement (NPA) |
| Acceptance criteria | Must be pre-specified; typically OPA > 90%, with lower confidence bounds for PPA and NPA |
| Discordant analysis | All discordant cases must be investigated; root cause analysis for discrepancies |
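The agreement endpoints in the table reduce to a 2x2 tally. A minimal sketch follows (the example calls are invented, and a real bridging analysis would add confidence intervals for each estimate):

```python
def concordance(cdx_calls, cta_calls):
    """2x2 agreement between CDx calls and clinical trial assay calls
    (True = biomarker positive). Returns OPA, PPA, NPA as percentages,
    with PPA/NPA computed relative to the CTA result."""
    pairs = list(zip(cdx_calls, cta_calls))
    tp = sum(1 for d, r in pairs if d and r)
    tn = sum(1 for d, r in pairs if not d and not r)
    pos = sum(1 for _, r in pairs if r)
    neg = len(pairs) - pos
    opa = 100 * (tp + tn) / len(pairs)
    ppa = 100 * tp / pos
    npa = 100 * tn / neg
    return opa, ppa, npa

# Hypothetical paired calls on eight shared specimens
opa, ppa, npa = concordance(
    [True, True, True, False, False, False, True, False],
    [True, True, False, False, False, False, True, False])
print(f"OPA {opa:.1f}%, PPA {ppa:.1f}%, NPA {npa:.1f}%")
```

Note that each discordant pair in the tally (here, one CDx-positive/CTA-negative specimen) would also need an individual root-cause investigation.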

Coordination with Marketing Authorization Holder

The IVDR requires that the CDx manufacturer consult with the marketing authorization holder (MAH) of the medicinal product during the device's development. In practice, this means:

  • Access to clinical trial data linking the biomarker to drug response
  • Agreement on cutoff values and scoring algorithms
  • Coordinated labeling (the drug label references the CDx, and vice versa)
  • Ongoing coordination as new indications or drug combinations are approved

Regulatory complexity: A companion diagnostic may need to be updated when the drug gains a new indication, a new patient population, or when new clinical trial data changes the biomarker cutoff or scoring criteria. Each change potentially triggers a PER update and Notified Body review.

Near-Patient Testing and Self-Testing Requirements

The IVDR applies additional requirements to devices intended for near-patient testing (NPT) and self-testing, reflecting the different use environments and operator capabilities.

Near-Patient Testing (Point-of-Care)

Near-patient testing devices are used outside the traditional clinical laboratory — in emergency departments, physician offices, pharmacies, or other healthcare settings where results are needed rapidly at the point of patient contact. Under Rule 4(b) of Annex VIII, near-patient testing devices are classified in their own right: the class follows from the rules applicable to the analyte and intended purpose, not from the near-patient setting itself.

Additional performance evaluation requirements include:

  • Operator studies (lay-user or trained non-laboratory personnel). The device must be evaluated in the hands of its intended users, not only trained laboratory professionals. If the device is intended for use by nurses in an emergency department, precision and accuracy studies must include nurses as operators.
  • Environmental robustness. Performance under varying temperature, humidity, and altitude conditions typical of the near-patient environment.
  • Specimen type variability. Near-patient devices often use capillary blood, saliva, or urine collected without standard laboratory pre-analytical processing. Performance must be validated for these specimen types.
  • Robustness to user errors. Usability testing must assess whether non-expert users can obtain reliable results following the instructions for use.

Self-Testing

Self-testing devices are used by lay persons (the patient or their caregiver) without professional supervision. Under Annex VIII Rule 4(a), these devices are classified as Class C — e.g., blood glucose self-monitoring for diabetes, INR self-testing for anticoagulation management — except for certain listed lower-risk tests (pregnancy, fertility, cholesterol, and the detection of glucose, erythrocytes, leucocytes, or bacteria in urine), which are Class B.

Additional requirements include all those for near-patient testing, plus:

  • Lay-user performance studies. Studies must demonstrate that untrained users, following only the instructions for use, can obtain results comparable to those obtained by trained professionals. Acceptance criteria typically require that a defined percentage (e.g., 95%) of lay-user results fall within specified agreement limits compared to a laboratory reference method.
  • Instructions for use review. IFU must be comprehensible to the intended lay audience. Readability testing (comprehension studies) may be required.
  • Interpretation of results. The manufacturer must demonstrate that lay users can correctly interpret the device output (especially for qualitative assays with visual read-out, where faint lines, ambiguous color changes, or timing-dependent results can cause misinterpretation).
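A lay-user acceptance criterion of the kind described above is typically tested against an exact binomial (Clopper-Pearson) lower bound. The sketch below uses hypothetical study numbers (146 of 150 lay-user results within the agreement limits) and computes the exact lower bound by bisection; the acceptance rule itself would be whatever the protocol pre-specified:

```python
import math

def binom_tail_geq(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

def clopper_pearson_lower(k, n, alpha=0.025):
    """One-sided exact (Clopper-Pearson) 97.5% lower confidence bound, via bisection."""
    if k == 0:
        return 0.0
    lo, hi = 0.0, k / n
    for _ in range(60):
        mid = (lo + hi) / 2
        if binom_tail_geq(k, n, mid) < alpha:
            lo = mid          # mid is below the true bound: tail probability still under alpha
        else:
            hi = mid
    return lo

# Hypothetical lay-user study: 146 of 150 results within the pre-specified agreement limits
k, n = 146, 150
point = k / n                             # observed agreement rate
lb = clopper_pearson_lower(k, n)          # exact lower bound
print(f"observed agreement {point:.1%}, exact 97.5% lower bound {lb:.1%}")
```

With these illustrative counts the point estimate clears 95% but the exact lower bound does not — a useful reminder that whether the criterion applies to the point estimate or the lower bound must be pre-specified, because the answer can differ.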

Performance Evaluation for Legacy IVDs Transitioning from IVDD

For manufacturers with devices currently on the market under the IVDD, transitioning to the IVDR requires building a performance evaluation that meets the new requirements — often with data generated under the old framework.

The Transition Challenge

Under the IVDD, many manufacturers — particularly those with self-certified devices — had performance data that was generated for internal use, following whatever methodology the manufacturer deemed appropriate. This data may not meet IVDR expectations in terms of:

  • Study design rigor (no formal protocol, no pre-specified acceptance criteria)
  • Statistical analysis (insufficient sample sizes, no confidence intervals)
  • Documentation (incomplete study reports, missing raw data)
  • Scope (missing performance characteristics that the IVDR expects but the IVDD did not explicitly require)

Practical Approach to Legacy Device Performance Evaluation

Step Action Key Considerations
1. Gap analysis Compare existing performance data against IVDR Annex XIII requirements, applicable Common Specifications (Class D), and relevant MDCG guidance Identify what data exists, what data is missing, and what data exists but does not meet IVDR standards for rigor or documentation
2. Evaluate existing data Assess whether existing performance data can be used, with justification, in the IVDR performance evaluation Data generated under IVDD may be acceptable if the study design was sound, even if the documentation is less formal than IVDR ideally expects. Document the study methodology retrospectively where possible.
3. Supplement with new studies Design and conduct targeted studies to fill identified gaps Focus on the highest-priority gaps: characteristics not previously evaluated, populations not previously studied, comparison to updated reference methods
4. Leverage post-market data Use complaint data, external quality assessment (EQA) results, and field performance data accumulated during the IVDD era as supportive evidence Post-market data can be powerful evidence of real-world performance, but it must be systematically collected and analyzed, not cherry-picked
5. Build the PER Compile all evidence into an IVDR-compliant performance evaluation report Clearly distinguish between legacy data (generated under IVDD) and new data (generated to IVDR standards); acknowledge limitations of legacy data
6. Define PMPF Establish the PMPF plan to address remaining gaps or uncertainties identified in the PER The PMPF plan can be used to address residual gaps, but it cannot substitute for critical pre-market evidence

Transition Timelines — Regulation (EU) 2024/1860

The IVDR transition timelines have been extended multiple times. Regulation (EU) 2024/1860, published on 9 July 2024, provides the current extended deadlines for legacy IVDs. These supersede the earlier timelines introduced by Regulation (EU) 2022/112 (Regulation (EU) 2023/607 had separately amended the IVDR to remove the sell-off provision).

Device Category Final IVDR Compliance Deadline Notified Body Application Deadline Signed Written Agreement Deadline
Class D (self-declared under IVDD) 31 December 2027 26 May 2025 26 September 2025
Class D (valid IVDD certificate) 31 December 2027 26 May 2025 26 September 2025
Class C (self-declared under IVDD) 31 December 2028 26 May 2026 26 September 2026
Class B (self-declared under IVDD) 31 December 2029 26 May 2027 26 September 2027
Class A sterile (self-declared under IVDD) 31 December 2029 26 May 2027 26 September 2027
Class A (non-sterile) 26 May 2022 (no extension) Not applicable Not applicable

Manufacturers must have submitted a formal application to a Notified Body by the specified deadline and must have a signed written agreement with the Notified Body. Simply being "in discussions" is not sufficient.

Important: Non-sterile Class A IVDs that were self-declared under the IVDD and do not require Notified Body involvement under the IVDR received no transition period. These devices were required to comply with the IVDR from its date of application on 26 May 2022.

Article 110(3) Conditions for Legacy Device Status

To benefit from the extended transition periods, legacy IVDs must satisfy all conditions set out in Article 110(3) of the IVDR. If any condition ceases to be met, the device loses its legacy status and must immediately comply with the IVDR in full or be withdrawn from the market.

Condition Requirement Practical Implication
Continued IVDD compliance The device must continue to comply with Directive 98/79/EC throughout the transition period Existing quality system and technical documentation per the IVDD must be maintained — they cannot be abandoned or degraded
No significant changes in design or intended purpose The device must not undergo significant changes to its design or intended purpose after 26 May 2022 Any modification beyond routine maintenance or administrative updates may trigger loss of legacy status and immediate IVDR compliance obligation
No unacceptable risk The device must not present an unacceptable risk to health or safety Competent Authorities retain the power to withdraw devices from the market during the transition period if safety concerns arise
QMS compliance Manufacturer must have a QMS in compliance with IVDR Article 10(8) no later than 26 May 2025 This is an earlier deadline than the product-specific compliance dates — manufacturers must have an IVDR-aligned QMS in place regardless of their device class transition date
Notified Body application Formal application to a Notified Body by the class-specific deadline, with signed written agreement by the corresponding agreement deadline Missing the application or agreement deadline eliminates the transition benefit entirely

What Constitutes a "Significant Change" — MDCG 2022-6

MDCG 2022-6 provides guidance on what modifications constitute "significant changes" that would cause a legacy device to lose its transitional status. The following changes are generally considered significant:

  • Intended purpose changes — expanding the intended use, adding new user populations, new clinical applications, or new specimen types
  • Design changes affecting operating principles — modifications to the measurement technology, detection methodology, or assay chemistry
  • Software modifications — most software changes are considered significant, unless they are strictly minor bug fixes with no impact on performance or safety
  • Material or composition changes — changes to reagent formulations, antibody clones, primer/probe sequences, or critical raw materials
  • Changes to control mechanisms — modifications to internal quality controls, calibration algorithms, or result interpretation logic
  • Corrective actions — design changes implemented as corrective actions generally trigger significant change status (unless accepted by the Competent Authority as maintaining IVDD conformity)

Changes that are generally not considered significant include administrative changes (manufacturer name, address), packaging changes with no impact on performance or stability, and editorial corrections to the IFU that do not alter the intended use or performance claims.

Practical warning: The "significant change" assessment is a judgment call, and manufacturers should document their rationale carefully. If a Competent Authority or Notified Body determines that a change was significant, the device retroactively loses its legacy status from the date the change was implemented — with potentially severe consequences including market withdrawal.

IVDR Requirements That Apply to Legacy Devices Immediately

Even during the transition period, certain IVDR requirements apply to all legacy devices regardless of their class-specific compliance deadline:

Requirement IVDR Reference Effective Date
Quality Management System Article 10(8) 26 May 2025
Post-market surveillance Chapter VII (Articles 78–81) Applies from 26 May 2022
Vigilance reporting Articles 82–87 Applies from 26 May 2022; serious incidents must be reported within 2 days (serious public health threats), 10 days (death/serious deterioration), or 15 days (other serious incidents)
Trend reporting Article 83 Applies from 26 May 2022
Field Safety Corrective Actions Article 82(9) Applies from 26 May 2022
Registration of economic operators Articles 28–30 Applies from 26 May 2022; EUDAMED registration required
Device registration Article 26 Basic UDI-DI assignment and EUDAMED registration required; legacy devices without a Basic UDI-DI are registered using EUDAMED-assigned identifiers

Sell-Off Provisions

Regulation (EU) 2023/607 eliminated the previously established one-year sell-off period that would have allowed continued distribution of legacy devices after their transition deadline. Under the current framework, once a device's transition period expires, it can no longer be placed on the market. Devices already in the distribution chain (in the hands of distributors or importers) may continue to be made available on the market, but no new units may be placed on the market by the manufacturer.

Reality check for Class C manufacturers: The 26 May 2026 Notified Body application deadline is imminent. If your performance evaluation is not substantially complete and your Notified Body application not already well advanced, you are at serious risk of a market gap — the period where your IVDD compliance expires but your IVDR certification is not yet granted. Engage your Notified Body now on the timeline and prioritize the highest-risk elements of your performance evaluation.

Notified Body Expectations

Notified Bodies reviewing IVDR performance evaluations have developed increasingly specific expectations, informed by MDCG guidance documents and their own accumulated experience. Understanding these expectations is essential to avoiding review cycles that delay certification.

What Notified Bodies Look For

Area Expectation Common Pitfall
Performance evaluation plan Device-specific, not generic; clearly defines what evidence is needed and why; references applicable standards and CS Generic template with device name inserted; no rationale for study designs or acceptance criteria
Scientific validity Systematic literature review with documented methodology; evidence is current and from authoritative sources; clear linkage between analyte and clinical application Unsystematic collection of abstracts; outdated references; failure to address novel or contested aspects of the analyte-disease association
Analytical performance Studies conducted per recognized protocols (CLSI, ISO); pre-specified acceptance criteria; adequate sample sizes; all applicable characteristics addressed Missing characteristics (especially interference and cross-reactivity); post-hoc acceptance criteria; inadequate sample sizes; no mention of CLSI or equivalent methodology
Clinical performance Adequate evidence for the claimed clinical performance; appropriate reference standard; sufficient sample size; results reported with confidence intervals Unsupported claims (e.g., "99.9% sensitivity" without data); no confidence intervals; inappropriate reference standard; study population not representative of intended use
PER Comprehensive, well-organized, internally consistent; clear conclusions; honest about limitations; benefit-risk analysis that addresses residual performance risks Disorganized; contradictory data not addressed; limitations glossed over; benefit-risk analysis absent or superficial
Common Specifications (Class D) Point-by-point compliance demonstrated; any deviations fully justified CS requirements not addressed individually; deviations not acknowledged or justified
PMPF plan Specific to the device; identifies concrete performance questions; describes methods for data collection and analysis; defines update triggers Generic "we will monitor complaints"; no specific performance questions; no active PMPF activities for high-risk devices

Common Deficiencies Flagged by Notified Bodies

Based on published Notified Body feedback, MDCG guidance, and industry experience, the most frequent deficiencies in IVDR performance evaluations include:

  1. Insufficient clinical performance evidence. The most common major deficiency. Manufacturers rely exclusively on analytical performance data or limited literature, without demonstrating how the device performs in the clinical context of its intended use.

  2. No formal performance evaluation plan. Performance evaluation conducted ad hoc without a prospective plan. This makes it impossible to demonstrate that the evaluation was systematic and pre-specified.

  3. Missing analytical performance characteristics. Failure to evaluate all applicable characteristics — particularly interference studies, cross-reactivity for infectious disease assays, and metrological traceability for quantitative assays.

  4. Inadequate sample sizes without justification. Studies with small sample sizes and no statistical rationale. Even when resource constraints limit sample sizes, the manufacturer must acknowledge the statistical limitations and address them in the benefit-risk analysis and PMPF plan.

  5. No confidence intervals on performance claims. Reporting "sensitivity = 98%" without a confidence interval is unacceptable. The point estimate alone tells you nothing about the reliability of the estimate. A sensitivity of 98% based on 50 samples has a very different lower confidence bound than one based on 500 samples.

  6. State of the art not addressed. Failure to benchmark the device's performance against current clinical guidelines, established reference methods, and competitive devices. The performance evaluation must demonstrate that the device's performance is acceptable in the context of what is currently available and expected.

  7. PMPF plan absent or generic. No PMPF plan, or a plan so generic that it could apply to any device. The plan must identify specific performance questions arising from the pre-market evaluation.

  8. Failure to address known limitations. Every device has performance limitations — known interferences, populations where performance may differ, specimen types not validated. The PER must acknowledge these transparently and describe the risk mitigation measures (IFU warnings, confirmatory testing recommendations, contraindications).

  9. Outdated literature in scientific validity assessment. Relying on literature from 10+ years ago without demonstrating that the conclusions remain current. Medical knowledge evolves; new biomarkers may replace older ones; clinical guidelines may change recommended testing algorithms.

  10. Inconsistency between PER and other technical documentation. The performance claims in the PER do not match the IFU, the risk management file, or the GSPR checklist. Internal consistency across the technical documentation is a basic expectation.
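Deficiency 5 in the list above is worth making concrete. The same 98% point estimate carries very different evidential weight depending on sample size, as a quick Wilson score calculation shows (the 49/50 and 490/500 figures are illustrative, and the Wilson method is one common choice; an exact Clopper-Pearson bound would give similar results):

```python
import math

def wilson_lower(k, n, z=1.96):
    """Lower bound of the two-sided 95% Wilson score interval for a proportion k/n."""
    p = k / n
    denom = 1 + z * z / n
    centre = p + z * z / (2 * n)
    margin = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n))
    return (centre - margin) / denom

# Same 98% point estimate, very different lower bounds:
for k, n in [(49, 50), (490, 500)]:
    print(f"sensitivity {k}/{n} = {k/n:.1%}, 95% Wilson lower bound = {wilson_lower(k, n):.1%}")
```

With these numbers the lower bound comes out near 89–90% for 50 positives versus roughly 96% for 500 — the difference between a claim a Notified Body may accept and one it will not.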

Relationship to ISO 20916

ISO 20916:2019 — In vitro diagnostic medical devices — Clinical performance studies using specimens from human subjects — Good study practice — is the primary international standard for planning and conducting clinical performance studies for IVDs.

The IVDR does not explicitly require compliance with ISO 20916, but it is referenced in MDCG guidance (notably MDCG 2022-2 and related documents) and has become the de facto expected standard for clinical performance studies under the IVDR.

What ISO 20916 Covers

ISO 20916 Element IVDR Relevance
Study planning Aligns with IVDR requirements for a clinical performance study plan; defines sponsor responsibilities, protocol content, and statistical considerations
Ethical requirements Requires ethics committee review and informed consent where applicable; aligns with IVDR Article 58(5) requirements for studies involving additional specimen collection or interventions
Specimen management Defines requirements for specimen collection, handling, storage, and tracking — critical for ensuring pre-analytical quality in clinical performance studies
Reference standard/comparator Provides guidance on selecting and justifying the comparator or reference standard — one of the most challenging aspects of clinical performance study design
Data management Defines requirements for data collection, entry, verification, and archiving — ensuring data integrity and traceability
Statistical analysis Requires a pre-specified statistical analysis plan with defined endpoints, sample size justification, and handling of missing or indeterminate results
Study reporting Defines the content and structure of the clinical performance study report, which feeds into the PER
Residual sample studies Provides specific guidance for studies using leftover clinical specimens, including ethical considerations and limitations

ISO 20916:2024 Revision — Key Changes

ISO 20916 was revised in March 2024 (ISO 20916:2024), marking a significant update that strengthens alignment with the IVDR. The most important change is the introduction of Annex ZA, which formally harmonizes the standard with the IVDR by mapping ISO 20916 clauses to corresponding IVDR provisions.

Change Area ISO 20916:2019 ISO 20916:2024
IVDR mapping No formal mapping to IVDR provisions Annex ZA maps clauses to IVDR provisions, creating a unified regulatory pathway
Presumption of conformity Not established Compliance with clauses listed in Table ZA.1 gives manufacturers a presumption of conformity with IVDR GSPRs
Definition alignment Standard-specific terminology When definitions differ between ISO 20916 and the IVDR, Annex ZA prioritizes IVDR terminology
Foreseeable misuse Excluded from scope Annex ZA clarifies that while ISO 20916 excludes foreseeable misuse, the IVDR requires sponsors to address it — creating a gap that must be bridged in the study protocol
Monitoring requirements Allows rationale for remote monitoring IVDR requires an independent monitor — a stricter requirement that manufacturers must meet regardless of ISO 20916 allowances
Surgically invasive sampling No specific terminology Recognizes surgically invasive sample taking as triggering IVDR Annex XIV study requirements

Formal recognition status: As of early 2026, the official recognition of ISO 20916:2024 as an IVDR harmonized standard in the EU Official Journal is pending confirmation. However, the standard was approved by CEN without modification, and Notified Bodies are already assessing clinical performance studies against its principles. Manufacturers should design new clinical performance studies to comply with the 2024 revision, as it represents the current state of the art.

Clinical Performance Study Types and Regulatory Notification

The IVDR distinguishes between study types based on the level of risk to subjects, with different regulatory requirements for each:

Study Type Description Regulatory Requirements
Non-interventional studies using residual samples Device tested on leftover clinical specimens already collected for routine clinical care; results do not influence patient management Generally do not require notification to Competent Authorities; may require Ethics Committee review depending on national law; informed consent may be general consent or opt-out depending on jurisdiction; ISO 20916 principles still apply to study design and reporting
Non-interventional studies with prospective specimen collection Specimens collected specifically for the study but only alongside routine clinical specimens; device results do not influence patient management May require Ethics Committee approval and notification to Competent Authority depending on national requirements; informed consent typically required for additional specimen collection
Interventional studies (Annex XIV studies) Device results influence patient management decisions — e.g., treatment selection, escalation, or de-escalation based on device output; or studies involving surgically invasive specimen collection beyond routine care Must be notified to or authorized by the relevant Member State Competent Authority per IVDR Articles 58–77; Ethics Committee approval required; informed consent mandatory; sponsor must implement full clinical performance study plan per IVDR Annex XIV; study registration in EUDAMED required

The Clinical Performance Study Plan (CPSP) — required by both IVDR Annex XIII and ISO 20916 — must detail the study design, objectives, participant selection criteria, specimen handling procedures, reference standard justification, statistical analysis plan (including sample size calculation), quality procedures, data monitoring plan, and adverse event reporting procedures. For interventional studies, the CPSP must also address the informed consent process, data safety monitoring, and stopping rules.
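One element of the CPSP that is frequently under-justified is the sample size calculation. A common approach is to choose the number of disease-positive subjects so that, if the study observes the expected sensitivity, the 95% lower confidence bound clears a pre-specified target. The sketch below assumes hypothetical planning values (95% expected sensitivity, 90% target lower bound) and the Wilson score bound; a real plan would also justify the choice of interval method and add margin for attrition and indeterminate results:

```python
import math

def wilson_lower(k, n, z=1.96):
    """Lower bound of the two-sided 95% Wilson score interval for a proportion k/n."""
    p = k / n
    denom = 1 + z * z / n
    centre = p + z * z / (2 * n)
    margin = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n))
    return (centre - margin) / denom

def min_positives(expected=0.95, target=0.90, n_max=5000):
    """Smallest number of disease-positive subjects (restricted to sizes where the
    expected sensitivity is exactly attainable) for which observing exactly the
    expected sensitivity puts the 95% Wilson lower bound above the target."""
    for n in range(10, n_max):
        k = round(expected * n)
        if abs(k - expected * n) > 1e-9:
            continue                  # skip sizes where the expected rate is not a whole count
        if wilson_lower(k, n) > target:
            return n
    return None

n = min_positives()
print(f"plan for at least {n} positives (before attrition and indeterminates)")
```

With these defaults the sketch yields 140 positives — observing 133/140 (95.0%) gives a lower bound just above 90%, whereas 114/120 does not, illustrating how tightly the required size depends on the gap between expected performance and the acceptance threshold.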

When ISO 20916 Applies

ISO 20916 applies when the manufacturer conducts a clinical performance study — that is, a study specifically designed to evaluate clinical performance using human specimens. It does not apply to:

  • Analytical performance studies (which follow CLSI/ISO methodology for laboratory method evaluation)
  • Scientific validity assessments (which are literature-based)
  • Post-market performance monitoring that does not constitute a formal study

However, when PMPF includes a structured clinical performance study, ISO 20916 applies to that study as well.

Practical recommendation: Even if your clinical performance data is primarily literature-based or derived from residual sample testing, familiarize your team with ISO 20916. Notified Body reviewers will assess your clinical performance evidence against the principles embodied in this standard, even when the standard does not formally apply.

Putting It All Together: A Practical Workflow

For manufacturers approaching IVDR performance evaluation — whether for a new device or a legacy device transition — the following workflow provides a practical roadmap:

Step 1: Classify your device. Apply the IVDR Annex VIII classification rules. Get this right first — everything else flows from it. If uncertain, seek guidance from your Notified Body or Competent Authority.

Step 2: Identify applicable requirements. Based on classification, identify applicable GSPRs, Common Specifications (Class D), harmonized standards, and MDCG guidance documents.

Step 3: Write the performance evaluation plan. This is your strategic document. Define what evidence you need for scientific validity, analytical performance, and clinical performance. Specify study designs, acceptance criteria, and standards/protocols to be followed.

Step 4: Conduct the scientific validity assessment. Perform the systematic literature review. For well-established analytes, this is straightforward. For novel biomarkers, this may require significant effort and possibly the generation of new data.

Step 5: Conduct analytical performance studies. Execute the studies defined in your plan. Follow recognized protocols (CLSI, ISO). Document rigorously. Ensure sample sizes are justified and acceptance criteria are pre-specified.

Step 6: Generate clinical performance evidence. Conduct clinical performance studies, systematic literature reviews, or residual sample studies as defined in your plan. Ensure the reference standard is justified, the population is representative, and results are reported with confidence intervals.

Step 7: Compile the performance evaluation report. Bring all three pillars together. Analyze the totality of the evidence. Address the state of the art. Conduct the benefit-risk analysis. Acknowledge limitations. Reference the PMPF plan for ongoing data collection.

Step 8: Define the PMPF plan. Based on the PER conclusions and identified gaps or uncertainties, define concrete PMPF activities. These feed back into the PER at defined intervals.

Step 9: Submit to Notified Body. For Class B, C, and D devices, the PER is part of the technical documentation reviewed by the Notified Body. Expect questions. Address them thoroughly and promptly.

Step 10: Maintain and update. The performance evaluation is never done. Update the PER with PMPF data, new literature, state-of-the-art changes, and device modifications. Include updated conclusions in your PSUR.

Key MDCG Guidance Documents

The Medical Device Coordination Group (MDCG) has published several guidance documents relevant to IVDR performance evaluation. While not legally binding, these represent the consensus interpretation of the regulation and are followed by Notified Bodies:

Document Title Relevance
MDCG 2022-10 Q&A on the interface between Regulation (EU) No 536/2014 (Clinical Trials Regulation) and Regulation (EU) 2017/746 (IVDR) Relevant where performance studies are run within medicinal product trials — particularly for companion diagnostics
MDCG 2020-16 Rev.2 Guidance on classification rules for in vitro diagnostic medical devices under Regulation (EU) 2017/746 Classification rules interpretation — foundational for determining performance evaluation scope
MDCG 2023-1 Guidance on the health institution exemption under Article 5(5) of Regulation (EU) 2017/746 (IVDR) Relevant for in-house IVDs and their performance evaluation obligations
MDCG 2024-1 Transition timelines for the IVDR (or relevant subsequent guidance on sell-off/grace periods) Timeline compliance for legacy device transitions
MDCG 2022-2 Guidance on general principles of clinical evidence for in vitro diagnostic medical devices Core guidance on what constitutes sufficient clinical evidence under the IVDR
MDCG 2023-9 Q&A on IVDR performance evaluation and post-market performance follow-up Practical Q&A addressing common industry questions

Note: MDCG guidance documents are updated periodically. Always verify you are referencing the latest revision. The MDCG website and your Notified Body are the authoritative sources for current guidance.

Summary

Performance evaluation under the IVDR is a fundamentally different undertaking from what the IVDD required. It is more structured, more prescriptive, more comprehensive, and subject to far greater regulatory scrutiny. The three-pillar framework — scientific validity, analytical performance, and clinical performance — provides a logical architecture, but executing it well requires deep understanding of the device's clinical context, rigorous study design, and honest assessment of the evidence.

The manufacturers who succeed in this environment will be those who treat performance evaluation not as a regulatory hurdle to clear, but as a rigorous, scientific discipline that ultimately serves patients. A well-conducted performance evaluation produces a device you can defend — to Notified Bodies, to clinicians, to patients, and to yourself.

The key takeaways:

  • Start with the performance evaluation plan. It is the foundation. A well-structured plan prevents wasted effort and demonstrates regulatory maturity to Notified Bodies.
  • Address all three pillars. Scientific validity, analytical performance, and clinical performance are all mandatory. Omitting or underinvesting in any one of them is a guaranteed deficiency.
  • Follow recognized methodologies. CLSI protocols for analytical performance. ISO 20916 for clinical performance studies. Systematic review methodology for literature. These are not suggestions — they are expectations.
  • Report with statistical rigor. Confidence intervals on every performance claim. Pre-specified acceptance criteria. Justified sample sizes. Notified Bodies have no patience for unsupported claims.
  • Be honest about limitations. Every device has them. Acknowledging limitations and describing mitigation measures demonstrates maturity and builds Notified Body confidence.
  • Plan for the lifecycle. Performance evaluation does not end at CE marking. PMPF is not an afterthought — it is how you maintain the evidence base that justifies continued market access.
  • For legacy devices, start the gap analysis now. The transition deadlines are firm. Identifying gaps early gives you time to fill them. Discovering gaps during Notified Body review costs months you may not have.