Chapter 1: Overview of Real-World Data

Real-World Data Governance

Data governance comprises several core concepts that are critical to the secure and appropriate use of RWD to generate RWE. These core concepts include: data privacy and de-identification; data security; and IRB review.

Data Privacy and De-Identification:

As more RWD are collected digitally and made available for analysis, the risk that a patient’s privacy could be compromised grows. Much of the data available in RWD sources are subject to the US Health Insurance Portability and Accountability Act (HIPAA) and the EU General Data Protection Regulation (GDPR), and there are well-understood methods for removing identifiable information so that the data can be used for research without the explicit consent of the patient. In recent years, there has been renewed focus on privacy-preserving methods for linking or analyzing records across different data sets. These methods include privacy-preserving record linkage, differential privacy, secure multi-party computation, federated learning, and hashing and pseudonymization. The approaches vary in complexity, maturity, and adoption, but each offers a way to protect patient identity while preserving the ability to link or analyze patient data across different data sets.
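Of the approaches listed above, keyed hashing is the simplest to illustrate. The Python sketch below shows one hypothetical way two data holders could derive the same pseudonymous linkage token from a patient's identifiers without exchanging the identifiers themselves. The function names, the choice of fields, the normalization rules, and the example key are all assumptions made for illustration; production privacy-preserving record linkage systems typically use more elaborate normalization, key management, and matching logic.

```python
import hashlib
import hmac
import unicodedata


def normalize(value: str) -> str:
    """Lowercase, strip accents, and drop non-alphanumeric characters
    so that trivially different spellings produce the same string."""
    decomposed = unicodedata.normalize("NFKD", value)
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    return "".join(ch for ch in ascii_only.lower() if ch.isalnum())


def linkage_token(secret_key: bytes, first: str, last: str, dob: str) -> str:
    """Return an HMAC-SHA256 token computed over the normalized identifiers.

    A keyed hash is used instead of a plain hash so that someone without
    the key cannot rebuild tokens by hashing guessed names and dates.
    """
    message = "|".join([normalize(first), normalize(last), normalize(dob)])
    return hmac.new(secret_key, message.encode("utf-8"), hashlib.sha256).hexdigest()


# Hypothetical shared key; in practice it would be issued and protected
# by a trusted party, never hard-coded.
key = b"example-shared-secret"

token_a = linkage_token(key, "María", "O'Neil", "1980-02-14")  # site A
token_b = linkage_token(key, "maria", "ONEIL ", "1980/02/14")  # site B
print(token_a == token_b)  # the two sites can link on the token alone
```

Pseudonymized tokens of this kind are not automatically de-identified data: whether the resulting data set meets a regulatory standard still depends on the evaluation described in the next paragraph.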

HIPAA provides two methods for establishing that the re-identification risk of a given data set is low. The first is Safe Harbor, in which data elements that contain personally identifiable information (PII), specified in the rule as 18 categories of identifiers, are removed. The second is Expert Determination, in which a qualified expert determines that the risk of re-identification in the combined or linked data set is very small. A research team must pair the chosen linkage method with the necessary HIPAA-compliant data evaluation, preparation, and (if necessary) certification by an expert determination service.
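The Safe Harbor approach can be pictured as a set of field-level transformations. The Python sketch below is illustrative only: the field names and the short list of direct identifiers are hypothetical, and it covers only a few of the identifier categories. It drops direct identifiers, reduces dates to the year, truncates ZIP codes to three digits (replacing low-population prefixes with 000), and groups ages over 89 into a single category. It is not a compliance tool, and any real implementation would need to be checked against the full rule.

```python
from typing import Any

# Hypothetical field names. A real data set needs a complete mapping to
# every Safe Harbor identifier category, not just these examples.
DIRECT_IDENTIFIERS = {"name", "ssn", "email", "phone", "mrn", "street_address"}


def safe_harbor_sketch(
    record: dict[str, Any],
    small_zip3: frozenset[str] = frozenset(),
) -> dict[str, Any]:
    """Return a copy of `record` with a few Safe Harbor style transformations.

    `small_zip3` is an assumed, caller-supplied set of three-digit ZIP
    prefixes covering 20,000 or fewer people; those are replaced by "000".
    """
    # Drop fields that directly identify the patient.
    out = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}

    # Keep only the year of a date (assumes ISO "YYYY-MM-DD" strings).
    if "birth_date" in out:
        out["birth_year"] = str(out.pop("birth_date"))[:4]

    # Keep only the first three digits of the ZIP code.
    if "zip" in out:
        zip3 = str(out.pop("zip"))[:3]
        out["zip3"] = "000" if zip3 in small_zip3 else zip3

    # Ages over 89 are grouped, and the birth year that would reveal
    # such an age is removed as well.
    age = out.get("age")
    if isinstance(age, int) and age > 89:
        out["age"] = "90+"
        out.pop("birth_year", None)

    return out


example = {
    "name": "Jane Doe",
    "mrn": "12345",
    "birth_date": "1931-05-02",
    "zip": "02139",
    "age": 93,
    "diagnosis_code": "E11.9",
}
print(safe_harbor_sketch(example))
```

Expert Determination has no comparable mechanical recipe, because it rests on a qualified expert's assessment of re-identification risk in the specific data set and context.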

Data Security:

Data security is fundamental to any infrastructure design for a given RWD project. In the US, the Federal Information Security Management Act (FISMA) defines three components of secure data:

  1. Confidentiality: the data are protected from unauthorized disclosure
  2. Integrity: the data are protected from unauthorized modification or destruction
  3. Availability: the data are protected from disruption of access [20]
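As a small illustration of the integrity component, the Python sketch below records a SHA-256 digest of a data extract and later checks whether the content has changed. The helper names and the sample data are hypothetical, and a digest only detects modification; it is useful only if the recorded digest itself is stored where an attacker cannot alter it.

```python
import hashlib
import hmac


def sha256_digest(data: bytes) -> str:
    """Return the SHA-256 digest of `data` as a hexadecimal string."""
    return hashlib.sha256(data).hexdigest()


def is_unmodified(data: bytes, expected_digest: str) -> bool:
    """Compare the current digest with the recorded one in constant time."""
    return hmac.compare_digest(sha256_digest(data), expected_digest)


# Hypothetical extract; the digest is recorded when the data are delivered.
original = b"patient_token,hba1c\nA17,6.9\n"
recorded = sha256_digest(original)

# A later copy with one altered value fails the check.
tampered = original.replace(b"6.9", b"9.6")

print(is_unmodified(original, recorded))
print(is_unmodified(tampered, recorded))
```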

A data breach can threaten any of the three components and can result from ransomware and other cyberattacks (such as the WannaCry ransomware attack) [21], data theft by employees, loss of devices, and human error. Data security for an RWD project begins with a governance and management framework that typically includes a data use agreement (DUA) and a formal plan for how the data will be used and secured. Considerations include technical components (such as cloud versus on-premises security), access policies, physical security and training, data protection, endpoint security, network security, defect and vulnerability management, identity management, monitoring, and independent testing and review.

IRB Review:

Once the data source definition, fit-for-purpose assessment, and data privacy considerations have been incorporated into a study protocol, the research team should confirm that it has identified the appropriate IRB review process for the study.