
The Model Card Template That Passes FDA Pre-Cert Review

August 11, 2025 · Alex Welcing · 7 min read

The FDA Submission That Got Rejected

Startup: "We're submitting our AI diagnostic tool for FDA Pre-Cert."

FDA Reviewer: "Provide documentation: training data, model architecture, evaluation metrics, clinical validation."

Startup: "We have a white paper..."

FDA: "We need structured documentation. Model card, data card, and clinical evaluation report. Resubmit in 6 months."

The Delay: 6 months of scrambling to create documentation that should've existed from day one.

What FDA Pre-Cert Requires (The Checklist)

Three Documents:

  1. Model Card: What the AI does, how it was trained, limitations
  2. Data Card: Where training data came from, bias testing, quality control
  3. Clinical Evaluation Report: Real-world validation, safety monitoring

Timeline:

  • Without documentation: 12-18 months to approval
  • With documentation: 6-9 months

Cost Savings: 6 months of engineering time, plus faster time to market

The FDA-Ready Model Card Template

Section 1: Intended Use

What FDA Wants:

  • Medical condition/disease targeted
  • Patient population (age, sex, comorbidities)
  • Clinical setting (hospital, clinic, home use)
  • User (physician, nurse, patient)

Example:

INTENDED USE

Medical Condition: Type 2 Diabetes screening
Patient Population: Adults 18-75, no prior diabetes diagnosis
Clinical Setting: Primary care clinic
Primary User: Primary care physician
Decision Support: AI flags high-risk patients for lab testing (HbA1c)

What NOT to Say: "General health screening" (too vague—FDA will reject)

Section 2: Model Architecture

What FDA Wants:

  • Algorithm type (e.g., "Gradient boosting classifier")
  • Input features (e.g., "Age, BMI, blood pressure, family history")
  • Output (e.g., "Risk score 0-100, with threshold at 70 for high-risk")

Example:

MODEL ARCHITECTURE

Algorithm: XGBoost (gradient boosting decision trees)
Version: XGBoost 1.7.0
Inputs: 12 clinical features (age, BMI, systolic BP, fasting glucose, etc.)
Output: Diabetes risk score (0-100)
Threshold: Score ≥70 = High Risk (recommend HbA1c lab test)

Why This Matters: FDA needs to understand how the AI makes decisions (interpretability requirement).
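
To make the architecture section concrete, here is a minimal sketch of the scoring and threshold logic described above. The feature names, the trained model object, and the recommend helper are hypothetical illustrations, not part of any actual submission.

import numpy as np
import xgboost as xgb

FEATURES = ["age", "bmi", "systolic_bp", "fasting_glucose"]  # 4 of the 12 inputs
HIGH_RISK_THRESHOLD = 70  # score >= 70 triggers the HbA1c recommendation

def risk_score(model: xgb.XGBClassifier, patient: dict) -> int:
    """Map the model's predicted probability onto the 0-100 risk score."""
    x = np.array([[patient[f] for f in FEATURES]])
    prob = model.predict_proba(x)[0, 1]  # P(type 2 diabetes)
    return round(prob * 100)

def recommend(score: int) -> str:
    return "Recommend HbA1c lab test" if score >= HIGH_RISK_THRESHOLD else "No action"

Documenting the threshold in code, not just prose, also makes the interpretability story easier to audit.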

Section 3: Training Data

What FDA Wants:

  • Source (where data came from)
  • Volume (how many patients)
  • Demographics (age, sex, race, ethnicity)
  • Date range (when data was collected)
  • Quality control (how you ensured data accuracy)

Example:

TRAINING DATA

Source: Electronic Health Records from [Hospital System], IRB-approved (Protocol #12345)
Volume: 50,000 patients (2018-2023)
Demographics:
  - Age: Mean 52 (range 18-75), SD 14
  - Sex: 52% female, 48% male
  - Race: 60% White, 20% Black, 12% Asian, 8% Other
  - Ethnicity: 85% Non-Hispanic, 15% Hispanic
Data Quality:
  - Missing data: <5% per feature (imputed using median)
  - Outliers: Values >99th percentile reviewed by clinician, corrected or removed
De-Identification: HIPAA-compliant (dates shifted, names removed, rare diagnoses aggregated)

Red Flag: If your training demographics don't match the US patient population, FDA will ask about bias.
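
Here is a sketch of the quality-control steps listed above, assuming a pandas DataFrame and hypothetical column names; a real pipeline would of course be validated and version-controlled.

import pandas as pd

def prepare_training_data(df: pd.DataFrame, features: list[str]) -> pd.DataFrame:
    for col in features:
        # Missing data: impute with the feature median, per the model card
        df[col] = df[col].fillna(df[col].median())
        # Outliers: flag values above the 99th percentile for clinician review
        cutoff = df[col].quantile(0.99)
        df[f"{col}_needs_review"] = df[col] > cutoff
    return df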

Section 4: Evaluation Metrics

What FDA Wants:

  • Accuracy, sensitivity, specificity (clinical gold standards)
  • Performance by demographic subgroup (fairness testing)
  • Comparison to human clinicians (is AI better?)
  • Clinical impact (does AI improve patient outcomes?)

Example:

EVALUATION METRICS

Test Set: 10,000 patients (held out, not used in training)
Overall Performance:
  - Sensitivity (Recall): 87% (95% CI: 85-89%)
  - Specificity: 82% (95% CI: 80-84%)
  - AUC: 0.91

Subgroup Performance (Fairness Testing):
  - Female: Sensitivity 88%, Specificity 83%
  - Male: Sensitivity 86%, Specificity 81%
  - White: Sensitivity 89%, Specificity 84%
  - Black: Sensitivity 84%, Specificity 79% (within 5pp, acceptable)

Comparison to Physician:
  - Physician sensitivity: 78% (AI +9pp improvement)
  - Physician specificity: 85% (AI -3pp, acceptable trade-off)

Clinical Impact:
  - Early detection: AI flags 12% more high-risk patients than physician alone
  - Estimated prevented complications: 200 cases/year per 10,000 patients screened

Why This Matters: FDA cares about patient outcomes, not just model accuracy.
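
The metrics above map directly onto scikit-learn primitives. A hedged sketch, assuming binary labels, hard predictions, predicted probabilities, and a hypothetical demographic column:

import pandas as pd
from sklearn.metrics import confusion_matrix, roc_auc_score

def clinical_metrics(y_true, y_pred, y_prob) -> dict:
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
    return {
        "sensitivity": tp / (tp + fn),  # a.k.a. recall
        "specificity": tn / (tn + fp),
        "auc": roc_auc_score(y_true, y_prob),
    }

def subgroup_metrics(df: pd.DataFrame, group_col: str) -> dict:
    # Fairness testing: the same metrics, computed per demographic subgroup
    return {
        group: clinical_metrics(g["y_true"], g["y_pred"], g["y_prob"])
        for group, g in df.groupby(group_col)
    }

The confidence intervals in the example would come from bootstrapping the test set, omitted here for brevity.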

Section 5: Limitations and Warnings

What FDA Wants:

  • Known failure modes (when AI is unreliable)
  • Contraindications (when NOT to use AI)
  • Required human oversight (physician must review)

Example:

LIMITATIONS

Known Failure Modes:
  - Lower accuracy for patients with rare comorbidities (<1% of population)
  - Not validated for patients under 18 or over 75
  - Not validated for Type 1 Diabetes (only Type 2)

Contraindications:
  - Do NOT use for patients with pre-existing diabetes diagnosis
  - Do NOT use as sole diagnostic tool (lab confirmation required)

Required Human Oversight:
  - Physician must review all high-risk flags before ordering lab tests
  - AI is decision support, not autonomous diagnosis
  - Physician retains final clinical decision authority

Why This Matters: FDA wants proof you're not overselling the AI's capabilities.
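
Contraindications read better to reviewers when they are enforced in the serving path, not just stated. A minimal sketch, with hypothetical field names:

def check_contraindications(patient: dict) -> list[str]:
    """Return reasons the screening model must NOT run for this patient."""
    blockers = []
    if patient.get("prior_diabetes_diagnosis"):
        blockers.append("Pre-existing diabetes diagnosis")
    if not 18 <= patient["age"] <= 75:
        blockers.append("Age outside validated range (18-75)")
    return blockers  # empty list = model may run, pending physician review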

Section 6: Post-Market Surveillance

What FDA Wants:

  • How you'll monitor AI performance in production
  • What triggers a safety alert (accuracy drop, adverse events)
  • How often you'll retrain/update the model

Example:

POST-MARKET SURVEILLANCE

Monitoring Plan:
  - Monthly accuracy tracking on production data (random sample of 500 patients)
  - Alert trigger: Sensitivity drops below 80% OR specificity drops below 75%
  - Physician feedback: Track overrides, false positives, false negatives

Safety Reporting:
  - Adverse events (patient harm) reported to FDA within 30 days
  - Quarterly summary report to FDA (performance metrics, user feedback)

Model Updates:
  - Annual retraining with new data (subject to FDA review)
  - Version control: All model versions documented, old versions archived

Why This Matters: FDA Pre-Cert assumes continuous improvement (not "set it and forget it").
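
The alert trigger above is simple enough to express directly in the monitoring job. A sketch, with thresholds taken from the plan and metric inputs assumed to come from the monthly 500-patient sample:

SENSITIVITY_FLOOR = 0.80
SPECIFICITY_FLOOR = 0.75

def safety_alert(sensitivity: float, specificity: float) -> bool:
    """True if the monitoring plan requires raising a safety alert."""
    return sensitivity < SENSITIVITY_FLOOR or specificity < SPECIFICITY_FLOOR

print(safety_alert(0.79, 0.81))  # True: sensitivity breached the 80% floor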

Real Example: Diabetic Retinopathy Detection AI

Product: AI analyzes retinal images, flags diabetic retinopathy.

FDA Submission:

Intended Use: Screen diabetic patients for retinopathy in primary care settings (not ophthalmology clinics).

Model: Convolutional neural network (ResNet-50 architecture)

Training Data: 120,000 retinal images from 5 hospital systems (2015-2020)

Evaluation:

  • Sensitivity: 92% (FDA target: >85%)
  • Specificity: 88%
  • Comparison: Ophthalmologist sensitivity 95% (AI -3pp, acceptable for screening)

Limitations:

  • Not for patients with cataracts (image quality too poor)
  • Requires human ophthalmologist to confirm positive findings

Post-Market:

  • Monthly monitoring: Random sample of 1,000 images re-reviewed by ophthalmologist
  • Alert: If AI sensitivity drops below 88%, auto-disable pending investigation

FDA Decision: Approved (6 months from submission to clearance).

Why It Worked: Documentation was complete upfront. No back-and-forth with FDA.

The Data Card (Companion to Model Card)

What FDA Wants (separate document):

  • Data provenance: IRB approval, patient consent, HIPAA compliance
  • Bias testing: Performance by race, sex, age, socioeconomic status
  • Data retention: How long you keep training data, why
  • Data security: Encryption, access controls, audit logs

Example Snippet:

DATA CARD

Provenance:
  - Source: [Hospital System] EHR database
  - IRB: Approved under Protocol #12345, waiver of consent (de-identified data)
  - HIPAA: Compliant (Business Associate Agreement signed)

Bias Testing:
  - Racial parity: Sensitivity within 5pp across racial groups
  - Gender parity: Sensitivity within 3pp (female 88%, male 86%)
  - Age: Lower sensitivity for patients >70 (79% vs. 87% for 40-60 age group)
    → Mitigation: Added warning for physicians treating elderly patients

Data Retention:
  - Training data: Retained for 10 years (FDA device record requirement)
  - Production data: De-identified logs retained for 3 years (monitoring)

Data Security:
  - Encryption: AES-256 at rest, TLS 1.3 in transit
  - Access: Role-based (PM, ML engineer, clinical validator—7 people total)
  - Audit logs: Reviewed quarterly by compliance team
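
The parity checks in the bias-testing section above can be automated. A sketch that flags any subgroup whose sensitivity trails the best-performing group by more than the 5-percentage-point tolerance; the tolerance and group labels are the ones used in this article, not an FDA-mandated standard:

PARITY_TOLERANCE_PP = 5

def parity_gaps(sensitivity_by_group: dict[str, float]) -> dict[str, float]:
    best = max(sensitivity_by_group.values())
    return {
        group: round((best - s) * 100, 1)
        for group, s in sensitivity_by_group.items()
        if (best - s) * 100 > PARITY_TOLERANCE_PP
    }

# The age example from the data card: >70 at 79% vs. 87% for ages 40-60
print(parity_gaps({"40-60": 0.87, ">70": 0.79}))  # {'>70': 8.0} -> add a warning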

Checklist: Is Your Model Card FDA-Ready?

  • Intended use (specific medical condition, patient population, clinical setting)
  • Model architecture (algorithm, inputs, outputs, threshold)
  • Training data (source, volume, demographics, quality control)
  • Evaluation metrics (sensitivity, specificity, AUC, subgroup performance)
  • Comparison to human clinician (is AI better/worse?)
  • Clinical impact (does AI improve patient outcomes?)
  • Limitations (failure modes, contraindications, required oversight)
  • Post-market surveillance (monitoring plan, safety reporting, update schedule)

If any box is unchecked, FDA will request more documentation.
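
One way to keep the boxes checked is to treat the checklist as data. A hedged sketch: a machine-checkable skeleton whose section names mirror this template, so an incomplete model card fails an internal review before it ever reaches an FDA reviewer.

REQUIRED_SECTIONS = [
    "intended_use", "model_architecture", "training_data", "evaluation_metrics",
    "clinician_comparison", "clinical_impact", "limitations",
    "post_market_surveillance",
]

def missing_sections(model_card: dict) -> list[str]:
    return [s for s in REQUIRED_SECTIONS if not model_card.get(s)]

card = {"intended_use": "Type 2 Diabetes screening", "limitations": "..."}
print(missing_sections(card))  # each missing section invites an FDA request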

Common PM Mistakes

Mistake 1: Claiming "General Purpose" AI

  • Reality: FDA requires narrow, well-defined medical use cases
  • Fix: Specify exact condition, population, setting (not "health screening")

Mistake 2: No Bias Testing

  • Reality: FDA will reject if you haven't tested performance across demographics
  • Fix: Report sensitivity/specificity by race, sex, age (minimum)

Mistake 3: No Post-Market Plan

  • Reality: FDA Pre-Cert assumes you'll monitor and update the AI
  • Fix: Document monitoring frequency, alert triggers, update process

Alex Welcing is a Senior AI Product Manager in New York who writes FDA-ready model cards before submitting medical device AI. His regulatory approvals take 6 months, not 18, because documentation is a product requirement from day one.
