Chapter 1: Overview of Real-World Data

Types of Real-World Data

Data in the United States and many other countries are distributed across a complex ecosystem of thousands of institutions. Data in Europe are distributed across many nations, each with its own health care system and vocabularies. Each type of real-world data (RWD), generated within a different layer of health care delivery, has its own set of strengths and potential weaknesses. Whether a real-world data set (or combination of data sets) is fit for purpose depends on the research question.10

As part of its 2018 framework for using real-world data to support regulatory decision making, the FDA has identified a number of potential sources of real-world data:1

Electronic health records (EHRs)

EHRs contain information collected in the ordinary course of hospital and ambulatory care visits and can include structured data on diagnoses, procedures, laboratory results, vital signs, medication orders, and medication administrations, as well as information that is unstructured, such as clinical notes.11 Structured data can include preliminary administrative claims, such as billing diagnoses and procedure codes (e.g., ICD, CPT).
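The distinction between structured and unstructured EHR content can be illustrated with a minimal sketch. The `Encounter` class, its field names, and the sample values below are hypothetical and chosen only for illustration; real EHR systems use far richer schemas.

```python
from dataclasses import dataclass, field


@dataclass
class Encounter:
    """One hypothetical EHR encounter with structured and unstructured parts."""
    patient_id: str
    diagnosis_codes: list = field(default_factory=list)  # e.g., ICD-10-CM codes
    procedure_codes: list = field(default_factory=list)  # e.g., CPT codes
    lab_results: dict = field(default_factory=dict)      # test name -> (value, unit)
    clinical_note: str = ""                              # unstructured free text


# Made-up example visit (not real patient data).
visit = Encounter(
    patient_id="P001",
    diagnosis_codes=["E11.9"],
    procedure_codes=["99213"],
    lab_results={"hemoglobin_a1c": (7.2, "%")},
    clinical_note="Patient reports improved adherence to metformin.",
)

# Structured fields can be queried directly by code.
has_diabetes_code = any(code.startswith("E11") for code in visit.diagnosis_codes)

# Free text needs text processing; a keyword check is the crudest possible form.
mentions_metformin = "metformin" in visit.clinical_note.lower()
```

In practice, extracting reliable information from clinical notes usually requires more than keyword matching, which is one reason structured fields are easier to use at scale.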

Administrative claims

Administrative claims include information attached to charges submitted by health care providers for reimbursement by private insurers and federal insurance programs (e.g., Medicare and Medicaid). These administrative data typically include basic demographic characteristics and enrollment information, in addition to details on diagnoses and procedures associated with medical encounters and outpatient pharmacy prescription fills, or other specialized data such as charges from skilled nursing facilities or home health care providers. In comparison to the billing and diagnosis codes from EHRs, administrative claims from payors have been adjudicated for payment and thus have been somewhat verified, although they are often delayed compared to EHR data.

Patient-generated health data (PGHD)

PGHD are “wellness and/or health-related data created, recorded, or gathered by individuals for themselves (or by family members or others who care for an individual).”12 These data can be generated from devices such as smart watches, internet-connected scales, phone apps, pedometers, and home blood pressure monitors. PGHD can include patient-reported outcomes (PROs), which report the status of a patient's health directly from the patient (FDA Guidance for Industry 2009).13


Patient registries

Most patient registries can be categorized as either product or disease/condition registries. Product registries are typically created to prospectively capture data to support post-marketing surveillance of medical products. Disease or condition registries are defined by condition (e.g., pregnancy, HIV) to manage patient care and/or address questions related to the natural history of the disease or condition as well as comparative effectiveness and/or safety of treatments.

Environmental factors and social determinants of health

Environmental factors and social determinants of health (SDoH) such as socioeconomic status, food insecurity, and access to transportation may or may not be captured in EHRs, although collection of this information is increasing.14 Administrative claims may have supplemental information related to patients’ eligibility (e.g., employment status, income level). Researchers can also use geographic information, such as zip code, to obtain information about neighborhood level characteristics that may provide insight into SDoH.
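As a minimal sketch of using geography to approximate SDoH, the example below attaches neighborhood-level characteristics to patient records by zip code. The function name, field names, and all values are hypothetical and made up for illustration; a real analysis would draw the neighborhood table from an external source such as census-based data.

```python
# Hypothetical patient records; zip codes are strings so leading zeros survive.
patients = [
    {"patient_id": "P001", "zip_code": "02139"},
    {"patient_id": "P002", "zip_code": "27701"},
    {"patient_id": "P003", "zip_code": "99999"},  # no neighborhood data available
]

# Hypothetical neighborhood-level table keyed by zip code (values are made up).
neighborhood = {
    "02139": {"median_household_income": 90000, "pct_no_vehicle": 30.0},
    "27701": {"median_household_income": 55000, "pct_no_vehicle": 15.0},
}


def attach_neighborhood(patients, neighborhood):
    """Left-join neighborhood characteristics onto patients by zip code."""
    enriched = []
    for patient in patients:
        row = dict(patient)  # copy so the input records are not modified
        area = neighborhood.get(patient["zip_code"])
        row["has_neighborhood_data"] = area is not None
        if area is not None:
            row.update(area)
        enriched.append(row)
    return enriched


enriched = attach_neighborhood(patients, neighborhood)
```

Every patient is kept, including those without a match, so the missingness stays visible. The attached values describe the area in which a patient lives, not the individual patient.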

Mortality data

Although death is one of the more common endpoints in clinical studies of treatment safety and effectiveness, information on deaths, including date and cause, is often not captured in health care data. In these cases, health care data must be linked to external sources of mortality data (e.g., death certificates, vital statistics systems, civil registries) to capture death outcomes. In the US, sources of data on death include but are not limited to the National Death Index, State Vital Statistics, the Social Security Administration Death Master File, the Medicare Master Beneficiary Summary File, and the National Association for Public Health Statistics and Information Systems Fact of Death Service.15
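The linkage step can be sketched as a simple deterministic match. Everything below is an assumption made for illustration: the `link_key` field (standing in for whatever shared or hashed identifier is available), the record layouts, and the sample values. Real mortality sources each have their own submission formats and matching procedures, which often include probabilistic matching.

```python
from datetime import date

# Hypothetical health care records in which death is not captured.
patients = [
    {"patient_id": "P001", "link_key": "a1", "birth_date": date(1950, 3, 2)},
    {"patient_id": "P002", "link_key": "b2", "birth_date": date(1962, 7, 19)},
    {"patient_id": "P003", "link_key": "c3", "birth_date": date(1948, 11, 5)},
]

# Hypothetical extract from an external mortality source (values are made up).
deaths = [
    {"link_key": "a1", "birth_date": date(1950, 3, 2),
     "death_date": date(2021, 6, 14), "cause": "I21"},
    # Birth date disagrees with P003, so this row will not link.
    {"link_key": "c3", "birth_date": date(1949, 11, 5),
     "death_date": date(2020, 1, 9), "cause": "C34"},
]


def link_deaths(patients, deaths):
    """Deterministic linkage: require exact agreement on key and birth date."""
    index = {(d["link_key"], d["birth_date"]): d for d in deaths}
    linked = []
    for p in patients:
        match = index.get((p["link_key"], p["birth_date"]))
        linked.append({
            "patient_id": p["patient_id"],
            "death_date": match["death_date"] if match else None,
            "cause_of_death": match["cause"] if match else None,
        })
    return linked


linked = link_deaths(patients, deaths)
```

A missing match does not confirm that a patient is alive; it may reflect a linkage failure (as with the third record here) or a death the external source did not capture.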