Beyond the Hype: Why Synthetic Data Falls Short in Healthcare — and How RegenMed Circles Closes the Gap

September 10, 2025

Prepared for stakeholders in clinical development, payer strategy, and health system operations


Executive Summary

The recently published RegenMed White Paper details why AI-generated and other synthetic datasets struggle to meet the scientific, regulatory, and financial standards of high‑stakes healthcare.  Synthetic data can help with exploration and internal testing, but its lack of provenance, difficulty capturing edge cases, and tendency to propagate bias make it indefensible for submissions, audits, or patient‑impacting decisions.

A durable path forward prioritizes validatable, patient‑consented Real‑World Evidence (RWE) and hybrid models that link multiple verifiable sources.  RegenMed’s Circles platform was purpose‑built to deliver this: longitudinal, auditable datasets that preserve an end‑to‑end chain of custody and perform in real clinical settings.

What The White Paper Finds

Foundational pitfalls of synthetic data

  • Fidelity and generalizability limits: generative models mimic the center of the distribution, missing rare/critical events and temporal nuances; models trained this way can fail in real care.
  • Patient‑safety implications: evidence cited shows models missed a large share of in‑hospital deteriorations — illustrating how abstract training can overlook real‑world signals.
  • Bias reinforcement: if the source data are biased, synthetic replicas can amplify inequities and create self‑reinforcing feedback loops that degrade trust and widen disparities.

Audit and regulatory risks

  • Provenance is non‑negotiable for CMS and private payer audits; synthetic records cannot be tied to a beneficiary chart, clinician, timestamp, or EHR source and are therefore rejected as evidence.
  • CMS has expanded RADV scrutiny and sample sizes, increasing clawback exposure when documentation cannot be traced to a patient of record.
  • FDA guidance across drugs and devices emphasizes transparency, bias mitigation, and fit‑for‑purpose validation; it favors high‑quality RWE and requires detailed justification if synthetic data are used.

Stakeholder impacts

  • Manufacturers: devices without rigorous human‑data validation are more likely to be recalled; relying on synthetic datasets can mask performance gaps until post‑market exposure.
  • Researchers: results trained/tested on synthetic sets may fail to generalize; opaque generation methods complicate peer review and replication.
  • Payers & providers: financial denials, clawbacks, and malpractice exposure grow when algorithms trained on unverifiable or biased data affect clinical or utilization decisions.

Evolving legal landscape

States are enacting heterogeneous AI health policies — ranging from disclosure requirements for provider use of AI, to bans on automated adverse determinations, to restrictions on AI in mental/behavioral healthcare. The patchwork raises compliance cost and risk for national deployments.

The path forward: RWE‑first hybrid data strategy

The paper recommends shifting from data generation to data curation: build fit‑for‑purpose, auditable RWE and connect complementary sources (e.g., claims for the longitudinal journey, EHR notes for clinical context, and structured, protocol‑driven datasets capturing outcomes in routine care).  Hybrid clinical trials blend traditional RCT rigor with decentralized, continuous real‑world capture — accelerating timelines while improving external validity.

How RegenMed Circles Addresses The Weaknesses

  • Audit‑ready provenance: Circles datasets are built from patient‑consented, clinician‑documented encounters with timestamps, site/clinician identifiers, and verifiable source links — supporting payer and regulatory audits (a minimal record sketch follows this list).
  • Edge‑case and rare‑event capture: Circles Observational Protocols focus collection on disease‑specific outcomes, complications, and safety signals — surfacing low‑frequency patterns synthetic data tend to erase.
  • Bias mitigation by design: multi‑site enrollment across academic centers and community practices, with stratified capture and QA, improves subgroup representation versus single‑source generative modeling.
  • Regulatory fit‑for‑purpose: Circles maintains data lineage, versioning, and documentation that map to FDA expectations for transparency, validation datasets, and reliability in RWE submissions.
  • Payer defensibility: every measurement is traceable to a patient of record, enabling documentation packs for CMS RADV, medical necessity appeals, and value‑based reconciliation.
  • Hybrid enablement: Circles links clean, protocol‑driven outcomes to claims and EHR context, creating a comprehensive, patient‑level evidence graph for analytics and label‑expansion studies.
  • Operational practicality: with inCytes™ for clinicians and Benchmarc™ for patients, Circles integrates into care pathways to collect longitudinal outcomes without burdening staff or disrupting workflows.
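
To ground the provenance and lineage bullets above, here is a minimal sketch of what an audit‑ready outcome record might carry. The structure, field names (patient_id, site_id, source_ref, consent_ref, and so on), and the hashing step are illustrative assumptions, not the actual Circles schema.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
import hashlib
import json

@dataclass(frozen=True)
class OutcomeRecord:
    """Illustrative audit-ready outcome record (not the actual Circles schema)."""
    patient_id: str        # pseudonymous ID traceable to a patient of record
    site_id: str           # enrolling site (academic center or community practice)
    clinician_id: str      # documenting clinician, for attribution
    protocol_id: str       # Observational Protocol that defined the capture
    measure: str           # a disease-specific outcome or safety signal
    value: float
    captured_at: datetime  # timestamp of the clinical encounter
    source_ref: str        # immutable reference to the EHR/chart source
    consent_ref: str       # link to the patient-consent artifact
    version: int = 1       # dataset versioning for regulatory documentation

    def lineage_hash(self) -> str:
        """Content hash so downstream copies can be checked against the original."""
        payload = json.dumps(
            {k: str(v) for k, v in self.__dict__.items()}, sort_keys=True
        )
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

# Example: a single documented encounter with full provenance (hypothetical values)
record = OutcomeRecord(
    patient_id="PT-0001",
    site_id="SITE-12",
    clinician_id="CLIN-7",
    protocol_id="OP-KNEE-OA-v3",
    measure="KOOS_pain",
    value=62.0,
    captured_at=datetime(2025, 9, 10, 14, 30, tzinfo=timezone.utc),
    source_ref="ehr://example/encounters/abc123",
    consent_ref="consent://example/PT-0001/v2",
)
print(record.lineage_hash())
```

A record along these lines is what allows a measurement to be traced back to a beneficiary chart, clinician, and timestamp when a payer or regulator asks for it.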

Illustrative Use Cases

  • AI device validation: a manufacturer stress‑tests an early‑warning algorithm across comorbid subgroups using Circles outcomes linked to EHR and claims; post‑market surveillance continues with the same auditable pipeline (a subgroup‑validation sketch follows this list).
  • Medicare Advantage risk & quality: a payer evaluates risk adjustment and readmission programs using Circles’ patient‑linked outcomes and documentation packs, reducing denials and surviving RADV scrutiny.
  • Rare‑event research: investigators studying low‑prevalence complications use Circles to enrich cohorts and capture event timing/severity precisely, improving power and external validity compared to synthetic simulations.
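
As a rough illustration of what stress‑testing an early‑warning algorithm across comorbid subgroups might involve, the sketch below computes per‑subgroup sensitivity for a binary deterioration model. The subgroup labels, scores, and threshold are hypothetical placeholders, not Circles data.

```python
from collections import defaultdict

def subgroup_sensitivity(rows, threshold=0.5):
    """Per-subgroup sensitivity (true-positive rate) for a binary early-warning score.

    rows: iterable of (subgroup_label, model_score, true_event) tuples, where
    true_event is 1 if the deterioration actually occurred. Names are illustrative.
    """
    tp = defaultdict(int)  # deteriorations the model flagged, per subgroup
    fn = defaultdict(int)  # deteriorations the model missed, per subgroup
    for subgroup, score, event in rows:
        if event == 1:
            if score >= threshold:
                tp[subgroup] += 1
            else:
                fn[subgroup] += 1
    return {
        g: tp[g] / (tp[g] + fn[g])
        for g in set(tp) | set(fn)
        if (tp[g] + fn[g]) > 0
    }

# Hypothetical linked outcomes: (comorbidity subgroup, model score, observed event)
rows = [
    ("CKD", 0.81, 1), ("CKD", 0.42, 1), ("CKD", 0.10, 0),
    ("CHF", 0.77, 1), ("CHF", 0.55, 1), ("CHF", 0.30, 0),
    ("none", 0.90, 1), ("none", 0.20, 0),
]
print(subgroup_sensitivity(rows))  # e.g., {'CKD': 0.5, 'CHF': 1.0, 'none': 1.0}
```

A model that looks adequate overall but misses deteriorations in a specific comorbidity subgroup is exactly the kind of gap that training on synthetic data tends to hide.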

Implementation Checklist For Partners

  • Define the fit‑for‑purpose question and target decisions (regulatory, reimbursement, labeling, clinical).
  • Select/author Observational Protocols that enumerate required outcomes, timepoints, covariates, and safety signals.
  • Establish data lineage: site IDs, user/clinician attribution, timestamps, and immutable source references.
  • Link Circles outcomes to claims and EHR extracts to create a patient‑level evidence graph (see the sketch after this checklist).
  • Pre‑specify validation plans and bias checks aligned to FDA and payer expectations; document everything.
  • Package audit materials (data dictionaries, lineage, consent artifacts, and chart access pathways) for CMS/private payer reviews.
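
To show what a patient‑level evidence graph could look like in its simplest form, here is a sketch that joins protocol‑driven outcomes with claims and EHR extracts by a shared patient identifier. The dictionary layout, key names, and sample values are assumptions for illustration, not the Circles data model.

```python
def build_evidence_graph(outcomes, claims, ehr_notes):
    """Join protocol-driven outcomes with claims and EHR context by patient ID.

    Each argument is a list of dicts carrying a shared 'patient_id' key; the
    result is a per-patient view for downstream analytics. Illustrative only.
    """
    graph = {}
    for source_name, records in (("outcomes", outcomes),
                                 ("claims", claims),
                                 ("ehr", ehr_notes)):
        for record in records:
            pid = record["patient_id"]
            graph.setdefault(pid, {"outcomes": [], "claims": [], "ehr": []})
            graph[pid][source_name].append(record)
    return graph

# Hypothetical linked sources for one patient
outcomes = [{"patient_id": "PT-0001", "measure": "KOOS_pain", "value": 62.0}]
claims = [{"patient_id": "PT-0001", "cpt": "27447", "service_date": "2025-06-02"}]
ehr_notes = [{"patient_id": "PT-0001", "note": "post-op visit, wound healing well"}]

graph = build_evidence_graph(outcomes, claims, ehr_notes)
print(graph["PT-0001"]["claims"][0]["cpt"])  # "27447"
```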

Conclusion

In healthcare, speed without provenance is a liability.  Synthetic datasets are useful scaffolding for exploration and internal QA, but they cannot carry the weight of clinical validation, regulatory approval, or financial accountability.  Circles provides the auditable RWE backbone — and the hybrid linkages to EHR and claims — needed to translate AI and analytics into safer products, smoother approvals, defensible reimbursement, and better patient outcomes.


Download the White Paper, “Weaknesses of AI and Other Synthetic Data in Healthcare,” Sept 2025.
