The Rise of Real World Data: The Growing Use of RWD to Improve Clinical Trial Outcomes

The use of real world data (RWD) in health innovation is a hot topic. Doctors, biotech companies and pharmaceutical companies alike point to the tremendous potential of leveraging this data to improve health outcomes. According to Acorn AI, RWD was used in about 75% of new drug applications (NDAs) and biologics license applications (BLAs) in 2020 (1).

But as with any innovative field, there is a gap between the promise of RWD analysis in theory and the reality of today’s practices. Collecting, aggregating and analyzing so much disparate data in a way that protects privacy, minimizes bias and delivers meaningfully improved outcomes poses significant challenges.


Definitions: Real-World Data & Real-World Evidence

Because the terms RWD (real world data) and RWE (real world evidence) are often confused, we begin with the generally agreed definition of each, based on FDA guidance.


Real World Data

Data relating to patient health status and/or the delivery of health care, routinely collected from a variety of sources. These include electronic health records (EHRs), claims and billing activities, and product and disease registries, as well as patient-generated data collected in home-use settings and from medical and other health-monitoring devices.

Real World Evidence

The clinical evidence regarding the usage and potential benefits or risks of a medical product derived from analysis of RWD.

Real World Data Applications


Historically, the US Food and Drug Administration (FDA) generally referred to RWD and RWE as a means to monitor post-market safety and adverse events and to make regulatory decisions. Both terms typically referred to patient‐level data gathered outside the conventional clinical research setting.

However, advances in technology, data science, and healthcare policies have significantly increased the volume, sources, and utilization of RWD, beyond those traditionally referenced.

The guidance did not fully take into account the breadth of digital innovations enabling the use of RWD/RWE across the drug development lifecycle. Increasingly, product developers are using RWD and RWE within clinical trials, both to support trial design and to measure outcomes.

A 2018 publication from the FDA broadened the definition of RWE to include evidence “generated from any study design (including RCTs) as long as the data source is from routine care and the design is highly pragmatic, meaning the trial design and conduct closely approximate the eventual use of the product in clinical practice” (1).


Supporting Clinical Trials with Real World Data & Real World Evidence:

Below, we walk through the different ways in which this data and evidence are being applied today.

1. Clinical Planning

There is a tremendous amount of rich data available on any number of disease states, comorbidities and biomarkers…if only researchers were able to access it. This data is housed in many different, siloed ‘banks’ and is structured heterogeneously: different hospital networks use their own nomenclature for electronic medical and health records, while laboratories collect imaging and testing data and house it separately.

Payers and insurance companies summarize their claims data according to their own structure. Add to these the terabytes of pharmaceutical data, disease registries and data collected via wearables and devices and it becomes easier to understand why so much of this data lies dormant. Yet within all those records lies a treasure-trove of information that can be structured, aggregated and analyzed to bring invaluable health insights to life.

Such information would allow clinicians to:

The power of real world data has recently been brought to light with the onset of the COVID-19 pandemic.

The crisis forced clinical trial leaders to compress their discovery processes. They could not progress iteratively, waiting for evidence from a small-scale trial before proceeding to the next stage. Instead, they had to rely on existing data from previous COVID and other studies, registries and EHRs to generate smart hypotheses that would let them shortcut the traditional process. RWD and RWE, generated from a combination of sources, enabled these researchers to respond to questions like:

Collaborative efforts to collect and transparently share meaningful, valid evidence have the potential to accelerate clinicians’ work all along the health value chain and to strengthen its scientific validity. Shared evidence enables other researchers to analyze large volumes of data and reproduce outcomes, delivering more robust findings and highlighting meaningful health trends.

2. Optimization of Clinical Trial Recruitment and Protocol Design

Most clinical trials are conducted as “one-offs,” without a way to draw on the many data sources that could inform the approach. Yet the more a clinical team knows before conducting a trial, the better it can hone the protocol. Real world data can inform cohort recruitment as well as questionnaire design and workflow. Integrating electronic health records can help identify patient subpopulations using more refined demographic criteria, or more granular information such as digital biomarkers or phenotypes, to define patient pools more quickly and accurately. In some cases, this data can eliminate the need for direct patient screening.
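As a rough illustration of EHR-based pre-screening, the sketch below applies simple inclusion rules to records that are assumed to have already been harmonized into one shape. The `PatientRecord` class, the `prescreen` function and the criteria used (an age window, a diagnosis code, a biomarker threshold) are hypothetical placeholders, not any real protocol or ObvioHealth system.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class PatientRecord:
    # Hypothetical, already-harmonized EHR extract; real EHR schemas differ by network.
    patient_id: str
    birth_date: date
    diagnosis_codes: set = field(default_factory=set)
    biomarkers: dict = field(default_factory=dict)


def age_on(birth_date: date, today: date) -> int:
    """Whole years of age on `today`, accounting for whether the birthday has passed."""
    before_birthday = (today.month, today.day) < (birth_date.month, birth_date.day)
    return today.year - birth_date.year - int(before_birthday)


def prescreen(records, today, min_age, max_age, required_dx, biomarker, min_value):
    """Return IDs of patients meeting every (illustrative) inclusion criterion."""
    eligible = []
    for r in records:
        if not min_age <= age_on(r.birth_date, today) <= max_age:
            continue
        if required_dx not in r.diagnosis_codes:
            continue
        value = r.biomarkers.get(biomarker)
        if value is None or value < min_value:
            continue  # a missing value excludes the patient rather than passing silently
        eligible.append(r.patient_id)
    return eligible


# Illustrative records only; "E11" (ICD-10 type 2 diabetes) and the HbA1c cut-off are placeholders.
records = [
    PatientRecord("p1", date(1970, 5, 1), {"E11"}, {"hba1c_pct": 8.1}),
    PatientRecord("p2", date(1990, 1, 1), {"E11"}, {"hba1c_pct": 6.0}),
    PatientRecord("p3", date(1960, 3, 3), {"I10"}, {"hba1c_pct": 9.0}),
]

if __name__ == "__main__":
    print(prescreen(records, date(2021, 6, 1), 40, 75, "E11", "hba1c_pct", 7.0))
```

In this toy data only `p1` passes: `p2` falls outside the age window and `p3` lacks the required diagnosis code. Treating a missing biomarker as a failure is a deliberately conservative choice; a real pipeline might instead route such patients to manual review.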

One way that real world data can be used in study design is for prognostic enrichment: to develop protocols that will achieve more measurable outcomes.

This can be done by recruiting participants whose history suggests they will be more likely to comply with the protocol. The number of endpoint events can also be increased by choosing cohorts or geographies more likely to meet the desired requirements (e.g., recruiting in a geography with high COVID-19 infection rates in order to test the efficacy of a COVID-19 treatment).

Another way that real world data can be used is for predictive enrichment: to choose a cohort that is more likely to achieve the desired outcome.

The emergence of precision medicine has moved drug development, and the clinical trials used to generate evidence, away from a one-size-fits-all approach. Variability in genes, environment and lifestyle is increasingly used to predict which subgroups of patients will respond best to a given intervention. RWD can help reduce heterogeneity in studies by identifying and excluding populations whose characteristics have been negatively correlated with outcomes (e.g., an unhealthy medical history, certain behaviors, or the use of a medication that might reduce the efficacy of the drug being tested).

At the other end of the spectrum, study developers can use predictive enrichment to select patients whom the data suggest will respond more positively to a study product. Both methods have a concrete impact on the sample size required to demonstrate efficacy: by better targeting cohorts, researchers can design more cost-effective protocols.
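To make the sample-size point concrete, the textbook normal-approximation formula for comparing two means gives, per arm, n = 2(z_(1−α/2) + z_(1−β))²σ²/δ², where δ is the expected difference and σ the standard deviation. It shows how both forms of enrichment pay off: a larger expected effect (predictive enrichment) or a less variable population (reduced heterogeneity) shrinks n. The effect sizes below are illustrative assumptions, not figures from any study described here.

```python
import math
from statistics import NormalDist


def n_per_arm(delta: float, sigma: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Participants per arm to detect a difference in means `delta` between two equal arms.

    Normal approximation, two-sided test, common standard deviation `sigma`:
        n = 2 * (z_{1 - alpha/2} + z_{power})**2 * sigma**2 / delta**2, rounded up.
    """
    z = NormalDist().inv_cdf  # standard normal quantile function
    z_alpha = z(1 - alpha / 2)
    z_power = z(power)
    return math.ceil(2 * (z_alpha + z_power) ** 2 * sigma ** 2 / delta ** 2)


if __name__ == "__main__":
    # Illustrative scenarios (effects in units of the unenriched standard deviation).
    print("unenriched:", n_per_arm(delta=0.3, sigma=1.0))
    print("larger effect via predictive enrichment:", n_per_arm(delta=0.5, sigma=1.0))
    print("less heterogeneity via exclusions:", n_per_arm(delta=0.3, sigma=0.8))
```

With a 5% two-sided significance level and 80% power, the requirement falls from 175 to 63 participants per arm when the expected effect rises from 0.3 to 0.5 standard deviations, and to 112 per arm when the standard deviation shrinks by 20%.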

Two examples of improved recruitment and study design through the use of RWD:

Prognostic enrichment example:
DCT Dermatology Study

A nutraceuticals company sought to assess the impact of a carotenoid-based nutritional supplement on skin parameters in healthy women. The study aimed to validate carotenoids' restorative properties and their ability to protect skin from ultraviolet (UV) damage.

The protocol ObvioHealth developed in partnership with the client targeted participants with the Fitzpatrick skin types deemed most reactive to sun exposure, living in regions within a specified range of UV index and weather conditions. This reduced the sample size needed to observe an effect.

Predictive enrichment example:
Pediatric Growth Hormone Deficiency Study (4)

A biopharmaceutical company developing therapeutics for rare diseases sought to recruit the highest-potential population for an oral growth hormone therapy. It conducted a peer-reviewed analysis of data from two prior Merck pediatric growth hormone deficiency (PGHD) studies, as well as a data-mining analysis of children with GHD in Eli Lilly’s GeNeSIS database.

The analysis corroborated the use of IGF-1 and GH as predictive enrichment markers, concluding that patients with moderate GH deficiency would be more likely to respond to the treatment. The company therefore recruited this population for its trial.


Once high-potential patient populations have been identified, researchers can also work with these patients’ physicians to recruit them into trials more readily.

The benefits to patients and providers are considerable. In addition to giving patients access to trials they might not otherwise have been aware of, integrating real world data from their medical records into a study can enrich the trial findings and analysis. Clinical trial data can also be reintegrated into their records after the study.

3. Remote Monitoring from Wearables and Medical Devices

Traditional clinical trials have relied upon ePRO (electronic patient reported outcomes) for much of the data capture. ePRO can offer rich insights on patient perceptions and quality of life.

However, when participants are asked to report on efficacy or safety endpoints, the data is often inaccurate, as patient perceptions can influence reporting. Imagine the difference between asking someone to report how they slept last night vs. collecting data through a device that captures REM sleep... or asking a patient to recount the number of hot flash episodes vs. having them click a button or icon to report each event as it occurs.

The advent of wearables and devices has shifted this paradigm. Healthcare providers and trial investigators can now monitor clinical-grade vital signs, including heart rate, respiratory rate, temperature, blood pressure, oxygen saturation and ECG, as well as coughing episodes, sweat, sleep, activity levels and body positioning, without patients ever having to set foot in a clinic. The real world data from these devices is transmitted back to a clinical trial platform in real time, where it is centrally and continuously monitored.
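Central monitoring of streamed vitals usually means turning raw readings into a manageable number of alerts. The sketch below is one simple approach, assuming readings arrive as (patient, time, signal, value) records: it raises an alert only when a signal stays outside its range for several consecutive readings, which filters out isolated spikes. The alert ranges, the three-reading persistence rule and all names are illustrative assumptions, not clinical thresholds or a description of any particular platform.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Reading:
    patient_id: str
    timestamp_s: int  # seconds since monitoring started (simplified)
    signal: str       # e.g. "temperature_c", "heart_rate_bpm", "resp_rate_bpm"
    value: float


# Hypothetical alert ranges for illustration; real limits come from the study protocol.
ALERT_RANGES = {
    "temperature_c": (35.0, 38.0),
    "heart_rate_bpm": (50.0, 110.0),
    "resp_rate_bpm": (10.0, 24.0),
}


def persistent_alerts(readings, ranges=ALERT_RANGES, min_consecutive=3):
    """Return (patient_id, signal, timestamp_s) when a signal has stayed out of range
    for `min_consecutive` readings in a row; one alert per sustained excursion."""
    streak = defaultdict(int)  # (patient_id, signal) -> consecutive out-of-range count
    alerts = []
    for r in sorted(readings, key=lambda x: x.timestamp_s):
        bounds = ranges.get(r.signal)
        if bounds is None:
            continue  # signal not covered by this rule set
        key = (r.patient_id, r.signal)
        low, high = bounds
        if low <= r.value <= high:
            streak[key] = 0  # back in range: reset the streak
            continue
        streak[key] += 1
        if streak[key] == min_consecutive:
            alerts.append((r.patient_id, r.signal, r.timestamp_s))
    return alerts


readings = [Reading("p1", t, "temperature_c", v)
            for t, v in [(0, 37.0), (60, 38.4), (120, 38.6), (180, 38.9), (240, 39.0)]]
readings += [Reading("p2", 60, "heart_rate_bpm", 120.0), Reading("p2", 120, "heart_rate_bpm", 80.0)]

if __name__ == "__main__":
    print(persistent_alerts(readings))
```

Here patient `p1`'s temperature triggers one alert at 180 s, on its third consecutive out-of-range reading, while `p2`'s single high heart-rate reading does not. The persistence rule trades a short delay for fewer false alarms.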

Two examples of remote patient monitoring through the use of RWD:

Red Hill Study:

In an ongoing decentralized study designed to assess the safety and efficacy of a COVID-19 intervention, ObvioHealth is leveraging telemetric devices to continuously monitor the temperature, pulse and respiratory rate of patients at home. This level of continuous monitoring is generally achieved only in hospitals; ObvioHealth’s capabilities make it possible in decentralized clinical trials.

Mobile phones can also capture unstructured data, such as images and audio recordings, reducing the burden on participants while accurately capturing evidence at the push of a button. Not only is this unstructured data easier to capture and more accurate, it also has the potential for richer analysis.

Pediatric GI Study:

In this study targeting busy caregivers, the goal was to capture data on their babies’ crying episodes. Previous studies had pointed to significant heterogeneity and bias in caregiver reporting. ObvioHealth therefore replaced the ePRO module with one in which caregivers recorded their infants’ crying/fussing through an ambient speaker connected to our app, capturing each event more accurately. Expert scorers then reviewed and analyzed each recorded crying episode to provide more accurate measures of frequency and duration.
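Once expert scorers have marked the start and end of each crying bout, frequency and duration reduce to interval arithmetic. A minimal sketch, assuming scorers output (start, end) times in seconds and that overlapping or nearly adjacent marks belong to the same bout; the `max_gap_s` merging rule is a hypothetical choice, not the study's documented scoring method.

```python
def merge_episodes(episodes, max_gap_s=0):
    """Merge scored (start_s, end_s) crying intervals that overlap or are separated by
    at most `max_gap_s` seconds, so one bout is not counted twice."""
    merged = []
    for start, end in sorted(episodes):
        if merged and start <= merged[-1][1] + max_gap_s:
            merged[-1][1] = max(merged[-1][1], end)  # extend the current bout
        else:
            merged.append([start, end])  # start a new bout
    return [(s, e) for s, e in merged]


def daily_summary(episodes, max_gap_s=0):
    """Frequency (number of bouts) and total duration in seconds for one day."""
    bouts = merge_episodes(episodes, max_gap_s)
    return {"episodes": len(bouts), "total_seconds": sum(e - s for s, e in bouts)}


if __name__ == "__main__":
    # Illustrative scorer output for one day, in seconds since midnight.
    day = [(0, 300), (250, 600), (3600, 3900), (3905, 4000)]
    print(daily_summary(day))
    print(daily_summary(day, max_gap_s=10))
```

The choice of gap tolerance changes the bout count (three bouts with no tolerance, two with a 10-second tolerance in this toy day), which is why such a rule would need to be fixed in advance in a real scoring protocol.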

The combination of patient-reported outcomes and remote patient monitoring in clinical trials contributes to a more holistic understanding of an intervention's impact in the real world, and thereby provides enriched real world evidence of targeted outcomes.


4. Post-Marketing Surveillance

As stated earlier, the most common use of real world data and evidence has been pharmacovigilance: evaluating the safety of drugs after launch. This use case remains highly relevant today. New drugs and devices must be tracked longitudinally, because certain events will not be identified within a trial; researchers must watch for patterns over a longer period of time. The J&J COVID-19 vaccine is one example: an adverse event (blood clots among young women) was identified after the trials, helping doctors to anticipate and minimize such events in the future.
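Post-marketing screening of spontaneous adverse-event reports often starts with disproportionality analysis. The sketch below computes one widely used statistic, the proportional reporting ratio (PRR): how much more often an event is reported for one drug than for all other drugs in the same database. The counts and the flagging rule (at least three cases and a 95% confidence interval lying above 1) are illustrative assumptions; agencies and sponsors use a range of thresholds and methods, and this is not a description of how the J&J signal was detected.

```python
import math


def prr_screen(a: int, b: int, c: int, d: int, min_cases: int = 3):
    """Proportional reporting ratio (PRR) from a 2x2 table of spontaneous reports.

    a: reports of the drug of interest WITH the event
    b: reports of the drug of interest WITHOUT the event
    c: reports of all other drugs WITH the event
    d: reports of all other drugs WITHOUT the event
    Returns (prr, (ci_low, ci_high), flagged). Zero cells are not handled here.
    """
    prr = (a / (a + b)) / (c / (c + d))
    # Standard error of ln(PRR), giving an approximate 95% confidence interval.
    se = math.sqrt(1 / a - 1 / (a + b) + 1 / c - 1 / (c + d))
    low = math.exp(math.log(prr) - 1.96 * se)
    high = math.exp(math.log(prr) + 1.96 * se)
    # Illustrative screening rule: enough cases and a CI lying entirely above 1.
    flagged = a >= min_cases and low > 1
    return prr, (low, high), flagged


if __name__ == "__main__":
    # Hypothetical counts: 20 of 1,000 reports for the drug mention the event,
    # versus 100 of 99,000 reports for all other drugs.
    print(prr_screen(20, 980, 100, 98900))
```

A flag from a screen like this is a prompt for clinical review, not proof of causation; the longitudinal follow-up described above is what confirms or dismisses it.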


Barriers to Adoption of Real-World Data in Clinical Trials

The growing interest in the use of RWD and RWE has also brought to light some significant barriers to delivering on all its promise.  


  • Need for anonymization and/or consent
  • Role for provider as intermediary

Of course, all this data must be collected in a way that guarantees the privacy and anonymity of the individuals from whom it is sourced. This can be accomplished by generating a unique code (token) for each patient that keeps his or her health information private while enabling aggregation of this RWD and RWE across disparate datasets to generate deeper and more longitudinal insights.
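One common way to build such tokens is a keyed hash over normalized identifiers: the same person yields the same token in every dataset, but the token cannot be reversed or recomputed without the secret key. The sketch below assumes HMAC-SHA256 over first name, last name and an ISO-format birth date; the choice of fields, the normalization rules and the key handling are illustrative assumptions, and commercial tokenization services use their own methods.

```python
import hashlib
import hmac
import unicodedata


def normalize(value: str) -> str:
    """Case-fold, strip accents and collapse whitespace so formatting differences
    between datasets ("José " vs "JOSE") do not produce different tokens."""
    decomposed = unicodedata.normalize("NFKD", value)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(no_accents.casefold().split())


def patient_token(secret_key: bytes, first_name: str, last_name: str, birth_date: str) -> str:
    """Keyed hash (HMAC-SHA256) of normalized identifiers. Without the key, a token
    cannot be recomputed from guessed names and birth dates."""
    message = "|".join(normalize(v) for v in (first_name, last_name, birth_date))
    return hmac.new(secret_key, message.encode("utf-8"), hashlib.sha256).hexdigest()


if __name__ == "__main__":
    key = b"example-key-held-by-a-trusted-third-party"  # hypothetical key
    ehr_token = patient_token(key, "José", "García", "1980-02-14")
    claims_token = patient_token(key, "JOSE ", "garcia", "1980-02-14")
    print(ehr_token == claims_token)  # same person, same token across datasets
```

A keyed hash is used rather than a plain hash because names and birth dates have few enough plausible values that an unkeyed hash could be reversed by guessing.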

There are also occasions when it can be beneficial to associate the data with a given individual, for example, to offer access to a clinical trial. In these cases, it is necessary to obtain the individual’s consent to be contacted, either directly or through the provider. New partnerships with health providers are facilitating this form of patient consent: patients who see their providers as a trusted source of recommendations and referrals are more likely to give legal approval for access to their medical records.


  • Multiple silos
  • Disparate data types
  • Different languages
  • Heterogeneous coding mechanisms

According to a recent IQVIA whitepaper, the healthcare system generates approximately a zettabyte (a trillion gigabytes) of data annually, and this amount is doubling every two years. Yet, as mentioned earlier, this data sits in multiple silos, in different languages and formats, and is often coded differently. The heterogeneity of the data is both its greatest richness and its biggest challenge.

There is a need to collate and analyze disparate types and categories of data including both structured (field entries) and unstructured (image and sound recordings, doctor notes…). Secure centralized databases must be made accessible to researchers and, despite tremendous effort and investment, it is only recently that systems have been put in place to extract this data more efficiently. Hospital systems are now partnering with data management experts in an effort to map EMR data into clinical trial case report form systems. Increased electronic data capture (EDC) is also resulting in reduced data fragmentation and data entry errors.
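Mapping EMR data into case report form fields largely comes down to per-site translation tables: each source's local codes and units are mapped onto one CRF definition. The sketch below shows the idea for a single lab test; the site names, local codes and `SITE_MAPPINGS` table are hypothetical, and the only real-world fact it relies on is that glucose is reported in mg/dL by some labs and in mmol/L by others (1 mmol/L is roughly 18 mg/dL).

```python
from dataclasses import dataclass

# Glucose: 1 mmol/L is approximately 18 mg/dL (molar mass about 180 g/mol).
GLUCOSE_MG_DL_PER_MMOL_L = 18.0


@dataclass(frozen=True)
class CrfLabValue:
    subject_id: str
    test: str    # CRF-level test name
    value: float
    unit: str    # the single unit the CRF expects


# Hypothetical per-site mappings: local lab code -> (CRF test, CRF unit, conversion).
SITE_MAPPINGS = {
    "site_a": {"GLU": ("glucose", "mg/dL", lambda v: v)},  # already reports mg/dL
    "site_b": {"Glc-S": ("glucose", "mg/dL", lambda v: v * GLUCOSE_MG_DL_PER_MMOL_L)},  # reports mmol/L
}


def to_crf(site: str, subject_id: str, local_code: str, value: float) -> CrfLabValue:
    """Translate one site-specific lab result into the common CRF representation."""
    try:
        test, unit, convert = SITE_MAPPINGS[site][local_code]
    except KeyError:
        # Unmapped codes are surfaced for manual review instead of being guessed.
        raise ValueError(f"no CRF mapping for {local_code!r} at {site!r}") from None
    return CrfLabValue(subject_id, test, round(convert(value), 1), unit)


if __name__ == "__main__":
    print(to_crf("site_a", "S-001", "GLU", 99.0))
    print(to_crf("site_b", "S-002", "Glc-S", 5.5))
```

Raising on unmapped codes, rather than passing values through, is what keeps heterogeneous coding from silently leaking into the CRF.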


  • Variations by region and country
  • Evolving guidance

The acceptance and application of RWE for decision making is growing. The US FDA has long been interested in using RWD to learn about medical products, particularly drug safety. More recently, in an effort to keep health care costs down, the FDA has also been exploring ways to use RWD to measure health performance. This interest extends beyond US borders. In April 2019, Health Canada approved an expansion of an existing pediatric indication for acute otitis based on real‐world data, and it is now publishing guidance in this direction. The European Union is also open to using RWD to monitor safety and drug utilization for marketed products, but it applies stricter data privacy regulations (GDPR) than the US does. Japan’s Pharmaceuticals and Medical Devices Agency (PMDA) has recently shown greater willingness to apply RWD to regulatory assessments beyond safety. China’s National Medical Products Administration has also issued draft guidance on the use of RWE for regulatory decision making for comment. However, the routes to obtaining pilot guidance are complex, and recommended uses still vary significantly across regions.

Human Error / Bias

  • Selection bias
  • Interpretation bias

The richness and scale of RWD enable researchers to perform deep analyses that can identify patterns that would otherwise remain obscure. This is best accomplished by training algorithms to perform such calculations. Yet there are several risks inherent in this process. First, there is the challenge of identifying and integrating all relevant factors or variables (treatment patterns, drug availability, disease severity, care setting, comorbidities, etc.). Much of the RWD available through medical records or claims tends to be episodic, offering only a partial picture of the health landscape. Researchers need to be sensitive to these gaps and must actively seek out alternative data sources to compensate.
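Because medical records and claims are episodic, a useful first check is to find where a patient's record goes quiet. A minimal sketch, assuming each patient's encounters are available as dates; the one-year `max_gap_days` default is an arbitrary illustrative threshold.

```python
from datetime import date


def coverage_gaps(encounter_dates, max_gap_days=365):
    """Return (previous, next) encounter-date pairs more than `max_gap_days` apart.

    Long gaps in episodic records (claims, EHR encounters) are periods where the
    patient's health is unobserved, not periods where nothing happened.
    """
    ordered = sorted(set(encounter_dates))
    return [(prev, nxt) for prev, nxt in zip(ordered, ordered[1:])
            if (nxt - prev).days > max_gap_days]


if __name__ == "__main__":
    visits = [date(2018, 1, 10), date(2018, 3, 2), date(2020, 6, 15), date(2020, 7, 1)]
    print(coverage_gaps(visits))
```

Flagged gaps mark the windows for which researchers would need to look for alternative data sources.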

RWD is also subject to selection bias, because cohort selection and treatment decisions in clinical practice are not random. For example, when using claims data, researchers may overlook the fact that a certain segment of the population is less likely to visit the doctor or has never made a claim. It is incumbent upon researchers to think carefully about the relationships between variables and to systematically challenge their own biases to ensure the validity of any conclusions.
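A toy calculation shows how the claims example plays out. In the hypothetical population below, people who file claims are sicker on average than those who do not, so a prevalence estimate computed only from claimants is biased even though every individual record is accurate. All counts are invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Person:
    has_condition: bool
    made_claim: bool


def prevalence(people) -> float:
    """Share of people with the condition."""
    return sum(p.has_condition for p in people) / len(people)


# Hypothetical population of 1,000 people, 100 of whom have the condition.
population = (
    [Person(True, True)] * 60        # affected and visible in claims
    + [Person(True, False)] * 40     # affected but never filed a claim
    + [Person(False, True)] * 340    # filed claims for other reasons
    + [Person(False, False)] * 560   # never filed a claim
)
claimants = [p for p in population if p.made_claim]

if __name__ == "__main__":
    print(f"true prevalence:           {prevalence(population):.2f}")
    print(f"estimate from claims only: {prevalence(claimants):.2f}")
```

Here the claims-only estimate is 0.15 against a true prevalence of 0.10; with other toy numbers the bias could just as easily run in the opposite direction.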

The Future of RWD: Increasing Use of AI, NLP and RPA

The power of RWD grows with the size and quality of the data set being analyzed. Given the challenges of data standardization and interoperability across data sets, AI (artificial intelligence), NLP (natural language processing) and RPA (robotic process automation) capabilities will increasingly be employed to reveal powerful insights.

These insights will help in the identification of populations at risk for certain diseases, the selection of more effective medical treatments and the identification of new indications for existing drugs.

As the medical world increases its focus on targeted/precision medicine for niche populations, where RCTs can become prohibitive, aggregating real world datasets using phenotypic, genotypic or laboratory data across smaller cohorts with similar disease stages can create a ‘synthetic arm’ for rare disease analysis. This potentially offers a more cost-efficient route to evidence generation for drug approvals. Such aggregation can also identify inappropriate use of medications or dosing, leading to better patient management.
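The aggregation step behind a synthetic arm can be sketched as pooling small external cohorts that match the trial population on disease stage into one comparator estimate. This is a deliberately naive illustration: the cohort names and counts are hypothetical, and it matches on stage alone, whereas a real external or synthetic control arm would typically need matching or weighting (for example, on propensity scores) across many more covariates.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class ExternalCohort:
    source: str          # e.g. a registry or EHR network (hypothetical names below)
    disease_stage: str
    n_patients: int
    n_responders: int


def pooled_control_rate(cohorts, stage):
    """Pool small real-world cohorts at the same disease stage into one external
    comparator: returns (response rate, approximate 95% CI, patients pooled)."""
    matched = [c for c in cohorts if c.disease_stage == stage]
    n = sum(c.n_patients for c in matched)
    if n == 0:
        raise ValueError(f"no external cohorts at stage {stage!r}")
    responders = sum(c.n_responders for c in matched)
    rate = responders / n
    se = math.sqrt(rate * (1 - rate) / n)  # normal (Wald) approximation
    return rate, (max(0.0, rate - 1.96 * se), min(1.0, rate + 1.96 * se)), n


cohorts = [
    ExternalCohort("registry_a", "II", 40, 8),
    ExternalCohort("registry_b", "II", 25, 4),
    ExternalCohort("ehr_network_c", "III", 30, 3),
]

if __name__ == "__main__":
    print(pooled_control_rate(cohorts, "II"))
```

Pooling the two stage-II cohorts yields 65 patients, more than either source offers alone, which is the practical appeal for rare diseases; the wide interval that remains is a reminder of how small such comparators can still be.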


1. Longitudinal Patient Insights: Linking Clinical Trial and Real World Data to Create an End-to-End Patient Record. Medidata presentation at Reuters Pharma Clinical 2021.

2. Swift B, Jain L, White C, et al. Innovation at the Intersection of Clinical Trials and Real-World Data Science to Advance Patient Care. Clin Transl Sci. 2018;11(5):450-460. doi:10.1111/cts.12559

3. Collaborating to Generate Fit-for-Use RWE Post COVID-19. Aetion presentation at Scope 2021 Conference.

4. Lumos Pharma, Inc. Data Supporting Use of Predictive Enrichment Markers for Lumos Pharma's LUM-201 Therapy in Clinical Trials for Moderate PGHD Published in Journal of the Endocrine Society. GlobeNewswire News Room, Lumos Pharma, Inc., 4 Mar. 2021.

5. IQVIA. Accelerating AI/ML Adoption in Biopharma. 22 Dec. 2020.

6. Andre, Elodie Baumfeld, et al. “Trial Designs Using Real‐World Data: The Changing Landscape of the Regulatory Approval Process.” Wiley Online Library, John Wiley & Sons, Ltd, 10 Dec. 2019.