Cardiovascular Risk Prediction for Maternal Mortality and Morbidity and Beyond Workshop

February 9 - 12, 2021
Virtual event


The National Institutes of Health’s (NIH) National Heart, Lung, and Blood Institute (NHLBI) and the Eunice Kennedy Shriver National Institute of Child Health and Human Development (NICHD), with participation from the Foundation for NIH Biomarker Consortium, hosted a joint virtual workshop on February 9th and 12th, 2021, on prediction modeling for adverse pregnancy outcomes (APOs) associated with maternal morbidity and mortality and subsequent risk for cardiovascular disease (CVD). The workshop had three main goals. The first was to explore whether a risk prediction model or tool could be developed to inform risk for APOs. The second was to determine whether a risk prediction model or tool could predict future CVD in women of childbearing age with and without a history of APOs. The third was to establish, if the answer to either question is yes, what is needed to accomplish these objectives: for example, what additional data are needed and how information from different cohorts can be combined. If such risk prediction models and tools cannot currently be developed, what barriers must be overcome to develop those capabilities? How can we apply lessons learned from other disciplines’ successes and failures?

Purpose of the Workshop

The purpose of this workshop was to identify initial steps towards the development of a risk prediction model and potential tool for maternal morbidity and mortality and CVD in young women. Most risk algorithms rely heavily on age and therefore are not useful in younger women, particularly during childbearing years. In addition, models or tools need to be developed that strengthen our ability to address the persistent racial and ethnic disparities noted in maternal morbidity and CVD. Our expectation was that by assembling a multidisciplinary group of experts (including clinicians, managers of large pregnancy registries, and biomedical informaticians), we could identify key research gaps and propose strategies to develop an initial risk prediction tool. Such a tool could be used for both clinical management and research, with the ultimate goal of altering the trajectories of CVD in young women, particularly for those who have experienced APOs.


APOs (such as preeclampsia, eclampsia, peripartum cardiomyopathy, preterm birth, gestational diabetes) and preexisting conditions (asthma, sleep disordered breathing, obesity, chronic hypertension, diabetes, social determinants of health) are associated with an increased risk of later CVD. However, the positive predictive value of currently available measures is poor. More precise risk stratification is critical to prognosticating CVD risk in young women during their childbearing years, particularly among women of color, in order to identify who would benefit most from intervention and how interventions can be applied to reduce their elevated risk of CVD outcomes. In addition to traditional risk factors (age, race/ethnicity, obesity status, blood pressure, and lipids), genetic and biomarker data may help to create risk algorithms for these young women. Biomarkers may be particularly useful for nulliparous women with no apparent risk factors.
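
One reason positive predictive value tends to be poor in young women is arithmetic: when an outcome is uncommon, even a reasonably accurate marker flags many women who will never develop it. The short sketch below illustrates this with Bayes' rule. The sensitivity, specificity, and prevalence values are hypothetical numbers chosen only for illustration; they were not presented at the workshop.

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """PPV = true positives / (true positives + false positives)."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


# Hypothetical marker: 80% sensitivity and 80% specificity (assumed values).
# The same marker is applied at two assumed outcome prevalences.
ppv_rare = positive_predictive_value(0.80, 0.80, prevalence=0.02)
ppv_common = positive_predictive_value(0.80, 0.80, prevalence=0.20)

print(f"PPV at 2% prevalence:  {ppv_rare:.3f}")
print(f"PPV at 20% prevalence: {ppv_common:.3f}")
```

Under these assumed inputs, fewer than one in ten women flagged at 2% prevalence would go on to have the outcome, compared with one in two at 20% prevalence.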

Combinations of clinical data, genetic data, and biomarkers, assessed via machine learning, artificial neural networks, and systems biology, have the potential to dramatically improve our ability to identify women at risk for CVD compared to single factors. The appropriate combinations of risk factors may be difficult to recognize through traditional statistical approaches. Fortunately, the field of biomedical informatics has developed novel analytic approaches that can analyze the vast quantities of data generated by omics technologies. To be successful, complex analyses that use millions of data points require large data sets, clinical data from electronic health records, biosample banks, and well-characterized phenotypes. Combining datasets is especially important to obtain adequate racial and ethnic diversity in sufficient numbers. In turn, this will facilitate our ability to address the persistent disparities noted in maternal morbidity and CVD in women.

General Principles for Risk Prediction Models

  • Currently, a risk prediction tool for predicting future CVD risk after APOs appears to be more feasible than predicting APOs for a variety of reasons. First, obtaining pre-pregnancy data is challenging since young women rarely encounter the health care system unless they become pregnant or have a chronic condition that requires surveillance. Second, the diversity of pathways leading to APOs makes their phenotyping and prediction modeling extremely challenging, although newly developed analytical approaches hold future promise. Third, much more is known about CVD prevention, and a model focused on CVD risk would allow time to intervene and alter the trajectories. Further, there are fewer interventions for preventing pregnancy complications.
  • The endpoint for the model should be important, modifiable, and actionable for clinicians, women, and researchers. Use risk factors and biomarkers that predict significant risk reduction or that have an intervention available.
  • It is important to characterize whether a prediction tool is implementation-ready, an advanced clinical tool, or at the discovery phase; the prediction model also should be practical to use.  Data for clinical models must be easily obtainable and simple to enter. For example, robust models built on omics or other data may have strong predictive value; however, they may not be feasible for clinicians and their patients who prefer models they can easily understand.
  • A model must account for women who enter pregnancy with high risk versus those who develop an APO unexpectedly. The value of APOs for prediction is that they identify women at higher risk earlier in the life course, so clinicians can intervene by providing them with information to mitigate their risk factors. APO-related CVD risks decline over the long term as more traditional CVD risk factors overwhelm the model. Therefore, APOs have the most predictive value before menopause, because other risk factors become more important after menopause.
  • The goal should be to identify those women at higher risk to offer earlier interventions, prevention, and treatment. There may be surrogate endpoints (e.g., weight loss, blood pressure control) that might prompt changes in clinical practice.
  • Age overwhelms other risk factors in long-term CVD risk prediction. Yet, earlier prediction can lead to earlier prevention and/or treatment strategies. Therefore, improving prediction may require age-specific models.
  • A model developed from scratch and not based on an existing model may not be accepted by the scientific community. Adapting an existing model like the Framingham Risk Model to incorporate pregnancy complications is one approach to consider.  
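
As a rough illustration of the last principle, one simple way to extend an existing risk model is to shift its predicted risk on the log-odds scale when a history of APO is present. The sketch below assumes a baseline risk that has already been produced by some established model and applies a single hypothetical odds ratio for APO history. Neither the baseline value nor the odds ratio comes from the workshop, and in practice the added coefficient would need to be estimated and the extended model recalibrated and validated.

```python
import math


def logit(p):
    """Convert a probability to log-odds."""
    return math.log(p / (1.0 - p))


def inv_logit(x):
    """Convert log-odds back to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def adjusted_risk(baseline_risk, has_apo, log_odds_ratio_apo):
    """Shift an existing model's risk on the log-odds scale for APO history."""
    shift = log_odds_ratio_apo if has_apo else 0.0
    return inv_logit(logit(baseline_risk) + shift)


# Both values below are hypothetical and used only for illustration.
ASSUMED_APO_ODDS_RATIO = 2.0
baseline = 0.02  # assumed risk from an existing model, APO history ignored

risk_with_apo = adjusted_risk(baseline, True, math.log(ASSUMED_APO_ODDS_RATIO))
risk_without_apo = adjusted_risk(baseline, False, math.log(ASSUMED_APO_ODDS_RATIO))

print(f"without APO history: {risk_without_apo:.4f}")
print(f"with APO history:    {risk_with_apo:.4f}")
```

Working on the log-odds scale keeps the adjusted value between 0 and 1 and leaves the original model's prediction unchanged for women without an APO history.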

Variables and Endpoints Important for the Models

  • Traditional risk factors include: BMI, medical/family history, age, race/ethnicity, blood pressure, and gestational age at delivery.
  • Predictive biomarkers help identify women at risk of developing early-onset preeclampsia, while diagnostic and monitoring biomarkers used during the second trimester or beyond can identify women who may benefit from therapeutic interventions. Timing for biomarker collection is important, especially in the third trimester when the woman is under maximum pregnancy stress, as well as at the time of delivery. Placental biomarkers are useful predictors of later CVD.
  • Psychosocial stress and social determinants are adverse health exposures for both CVD and APOs.
  • Race/ethnicity should be included in the models as a covariate or in stratified models.
  • Sleep disorders have been overlooked as pregnancy-related and post-partum risk factors. Sleep apnea, insomnia, and short sleep duration are prevalent conditions associated with incident CVD morbidity and mortality, with potential value for pregnancy and future CVD predictive models.
  • Polygenic scores (i.e., a quantitative metric of inherited risk) exceed the predictive risk of traditional risk factors and enable risk stratification for CVD early in life prior to onset of traditional factors.
  • Potential endpoints could include CVD, heart failure, or stroke. However, for young women, in whom such hard endpoints typically lie years in the future, hypertension after pregnancy, severe maternal morbidity, APOs in a subsequent pregnancy, and measures of subclinical CVD (pulse wave velocity, carotid intima-media thickness) are likely to be more suitable endpoints.
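
For readers less familiar with polygenic scores, the quantity referred to above is commonly computed as a weighted sum of risk-allele counts across many genetic variants, then standardized within a reference group. The sketch below shows that calculation for three variants and three individuals. The effect sizes and genotypes are invented for illustration, and the function names are our own rather than part of any library.

```python
def polygenic_score(effect_sizes, allele_counts):
    """Weighted sum of risk-allele counts (0, 1, or 2 copies per variant)."""
    if len(effect_sizes) != len(allele_counts):
        raise ValueError("one effect size is needed per variant")
    return sum(beta * count for beta, count in zip(effect_sizes, allele_counts))


def standardize(scores):
    """Express raw scores as z-scores relative to the group they come from."""
    mean = sum(scores) / len(scores)
    variance = sum((s - mean) ** 2 for s in scores) / len(scores)
    sd = variance ** 0.5
    return [(s - mean) / sd for s in scores]


# Hypothetical per-variant effect sizes (e.g., log odds ratios); not real data.
weights = [0.10, 0.05, 0.20]

# Hypothetical genotypes: copies of the risk allele at each of the 3 variants.
genotypes = {
    "woman_a": [0, 1, 2],
    "woman_b": [2, 0, 0],
    "woman_c": [1, 1, 1],
}

raw_scores = {name: polygenic_score(weights, g) for name, g in genotypes.items()}
z_scores = standardize(list(raw_scores.values()))

for name, raw in raw_scores.items():
    print(f"{name}: raw score {raw:.2f}")
```

Real scores use thousands to millions of variants with weights taken from genome-wide association studies, but the arithmetic is the same.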

Sources of Data for the Models

There are several large pregnancy registries with data and/or biosamples that could be leveraged to perform discovery and subsequent validation for a model. The Global Pregnancy Collaboration provides access to high-quality data and biospecimens from 40 cohorts around the world. The nuMoM2b Heart Health Study is a highly phenotyped cohort of 4,500 women that defined the incidence of hypertension and the CVD risk profile of women 2-7 years after a first pregnancy. The Preconception to pOst-partum study of cardiometabolic health in Primigravid PregnancY (or POPPY), a pre-conception cohort study, examined different models of life course trajectories of cardiometabolic dysfunction in healthy versus unhealthy pregnancies in nulliparous women. The Pregnancy Outcome Prediction Study (POPS) is a longitudinal pregnancy biobank, with data, including maternal blood and urine at multiple time points, on well-phenotyped, physician-validated pregnancy outcomes. LifeCodes is a biobank with similar data on 3,365 pregnancies, from 2009 to present. Kaiser Permanente Northern California incorporates data elements including demographics, behavioral measures, and clinical encounters, as well as longitudinal data on both pregnancy and CVD.

Other registries to consider include the Nurses’ Health Study 2 and 3; Black Women’s Health Study; Study of Latinos: Hispanic Community Health Study; Growing Up Today Study; the Women’s Health Initiative; and the Child Health Development Studies. The best data are research data, which must be harmonized across sources to provide the strongest CVD prediction modeling possible.

Identification of Outstanding Opportunities/Gaps

In the final breakout session, groups were charged with proposing a model, based on the information heard over the previous day and a half. The following opportunities were identified:

  • Acknowledge that modeling is an iterative process. Start with the simplest model and increase complexity with each subsequent attempt. Begin with modeling for future CVD in a multi-stage model where the first stage considers known risk factors and strategies that can be implemented, while further testing of a more advanced model is undertaken.
  • A developed model requires periodic updating in order to stay current with newly discovered risk factors and technological advances.
  • Perform discovery on large datasets and then validate in a “gold standard” dataset.  Using the high-quality data from nuMoM2b to test the strength of the correlations will be useful.
  • Consider machine learning because it is most helpful when the data and relationships are complex, as well as in situations where traditional statistical methods do not work very well (e.g., observational data with a lot of missing data). It is also useful for hypothesis generation and for finding associations when the incidence of cases is low.
  • Use time-to-event modeling to identify clusters of APOs as well as to predict which women may be at higher risk.
  • Polygenic risk scoring should be considered as one strategy for risk prediction.
  • Predictive modeling can be used to identify participants for important clinical trials, such as using a blood pressure medication after pregnancy to delay the onset of adverse CVD outcomes.
  • Researchers must share a collective will to collaborate, share data, and prioritize the questions. The Environmental Influences on Child Health Outcomes (ECHO) study from NIH provides a model for harmonizing data sets and answering important questions. An ECHO-like approach would significantly advance this field of study.
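
To make the time-to-event recommendation above more concrete, the sketch below implements the Kaplan-Meier estimator, a basic nonparametric building block of time-to-event analysis that handles women who leave follow-up before the event occurs (censoring). It is a descriptive estimator rather than a full prediction model, and it is offered only as an illustration. The follow-up times and event indicators are invented, standing in for something like years from delivery to incident hypertension.

```python
def kaplan_meier(times, events):
    """Return [(time, event_free_probability)] at each distinct event time.

    times  -- follow-up time for each woman
    events -- 1 if the event was observed at that time, 0 if censored
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_leaving = 0
        # Group everyone whose follow-up ends at the same time t.
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_leaving += 1
            i += 1
        if n_events > 0:
            survival *= 1.0 - n_events / n_at_risk
            curve.append((t, survival))
        # Both events and censored observations leave the risk set.
        n_at_risk -= n_leaving
    return curve


# Hypothetical follow-up data for six women (years; 1 = event, 0 = censored).
follow_up_years = [2, 3, 3, 5, 7, 8]
event_observed = [1, 1, 0, 1, 0, 1]

curve = kaplan_meier(follow_up_years, event_observed)
for year, prob in curve:
    print(f"year {year}: event-free probability {prob:.3f}")
```

Regression approaches such as Cox proportional hazards models build on the same risk-set idea while adding covariates, which is where APO history, biomarkers, or polygenic scores would enter.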


Several robust pregnancy registries exist that could form the basis for modeling efforts.  Multidisciplinary teams of clinicians, registry/sample owners, and bioinformaticians are essential to begin to develop potential models. A simpler “implementation” model based on existing knowledge may be the first attempt, later incorporating biological discovery using -omics data to identify pathways to APOs, which can ultimately lead to a clinical prediction model. A clinical prediction tool may be ideal for use at the 6-week post-partum visit, where assessment of CVD risk should happen and the patient-provider conversation about those risks should take place.


10:00–10:15 AM
Setting the stage and discussing the goals

Can we develop a risk prediction tool that is accurate and can inform risk for maternal morbidity and mortality (MMM; adverse pregnancy outcomes)? Can we develop a risk prediction tool for future CV disease in young women? If yes, what do we need to accomplish this? If no, what barriers must be overcome to develop these tools?

10:15–10:35 AM

10:35–11:10 AM
Session #1: Data (clinical and molecular)

Rapid fire talks on successes and failures in prediction models

  • Casey Greene on data integration and methods in cancer research
  • Graeme Smith on OB role in management
  • Ananth Karumanchi on pregnancy risk prediction
  • Jennifer Stuart on CVD prediction using reproductive variables
  • Michael Pencina on CV prediction modeling and endpoints that matter in young people

11:10–11:25 AM

11:25–11:35 AM

11:35–11:40 AM
Polling Questions:

  1. Which types of variables do you think are essential to any risk prediction model for predicting maternal morbidity during pregnancy or post-partum?
  2. Which types of variables do you think are essential to a risk prediction model for predicting later cardiovascular disease in young women who have experienced an adverse pregnancy outcome?
  3. What is the most important but not routinely available variable?

11:40 AM–12:10 PM
Breakout #1

3 Multidisciplinary groups

  • Data for the model
    • What types of data do we have?
    • How robust (quantity and quality) are the data, especially for big data approaches?
    • Are the right variables being collected?
    • What kinds of new data do we need?
    • Is it possible to prioritize what is needed?
  • Sampling/Selection of subjects for the model
    • Do we have adequate sample sizes?
    • Do we have the capacity to investigate differences in racial/ethnic disparities?

12:10–12:40 PM
Return to large group and report out

12:40–1:00 PM

1:00–1:45 PM

1:45–2:00 PM
Session #2: Modeling

Rapid fire talks on modeling techniques, successes, and failures

  • Ricardo Henao on time to event modeling and machine learning
  • Doug McNair on machine learning and research equity
  • Laritza Rodriguez on data mining

2:00–2:20 PM

2:20–2:30 PM

2:30–2:35 PM
Polling Questions:

  1. Can we use intermediate endpoints since hard endpoints of MI or stroke are years in the future from pregnancy cohorts?
  2. What should our models predict?
    1. CVD
    2. Ischemic disease
    3. Stroke
    4. Heart failure
  3. What are the most important endpoints in young adult women (<50 years of age):
    1. Severe APO
    2. Severe maternal morbidity/mortality
    3. Hypertension after pregnancy
    4. Diabetes after pregnancy
    5. Other subclinical measures (PWV, CAC, IMT, etc.)
  4. How would we most effectively use a model:
    1. Patient-facing app
    2. Primary care deployment for risk stratification, medication decisions
    3. OB discharge planning
    4. Other

2:35–3:05 PM
Breakout #2

3 Multidisciplinary groups

  • Modeling
    • Which are the most effective types of models?
    • What has worked for other diseases such as oncology?
    • What has failed?
    • Which subclasses should be considered in a model (e.g., severe vs. moderate disease, genetic subclasses, types of APOs)?
    • How can racial and ethnic diversity be explored in the models?

3:05–3:35 PM
Return to large group for report outs

3:35–3:55 PM

3:55–4:00 PM
Rebecca Roper on Notices of Special Interest (NOSIs) applicable to this work

4:00–4:10 PM
Instructions for Day 2

10:00–10:15 AM
Overview from Day 1

10:15–10:45 AM
Session #3: Risk factors, biomarkers, endpoints

Rapid fire talks on variables, risk factors, biomarkers and endpoints

  • Viola Vaccarino on sex-specific features, SES and stress
  • Sadiya Khan on racial disparities
  • Amit Khera on genetic or omics endpoints
  • Judette Louis on sleep risk factors

10:45–10:50 AM
Polling Questions:

  1. Which risk factors do you believe are the most important to include in a risk prediction model?  Name up to 5.
  2. Which important risk factors are more likely to be available?
  3. Which risk factors are important but not widely available?

10:50–11:20 AM
Breakout #3

3 Multidisciplinary groups

  • Risk factors, biomarkers and endpoints
    • Which risk factors and biomarkers have evidence to support their inclusion in a prediction model?
    • Do risk factors differ with respect to different racial/ethnic populations, pre- vs post-menopause for CVD, before or during pregnancy for maternal mortality?
    • Which datasets will be the most helpful to foster machine learning?
    • What are the right endpoints?
    • Should endpoints differ among various ethnic or racial groups?

11:20–11:30 AM

11:30–11:50 AM
Return to large group for report outs

11:50 AM–12:10 PM

12:10–12:40 PM
Rapid fire talks on potential datasets appropriate for modeling

  • Les Myatt/Jim Roberts on CoLab
  • David Haas on nuMoM2b and nuMoM2b Heart Health Study
  • Abi Fraser on ALSPAC
  • Katie Gray on LIFECODES
  • Janet R-Edwards on NHS, CARDIA, Black Women’s Health Study
  • Yeyi Zhu on Kaiser Permanente

12:40–12:45 PM
Aaron Pawlyk on the Foundation for NIH Biomarker Consortium

12:45–1:30 PM

1:30–1:40 PM
Session #4: Proposing models

Summary of meeting goals

Discussion of potential approaches and the modeling processes    

1:40–2:10 PM
Breakout #4

3 Multidisciplinary groups

  • Proposed models 
    • Create a proposed model for predicting adverse maternal outcomes
    • Create a proposed model for predicting CV risk in young women

2:10–2:50 PM
Return to large group to discuss potential models and approaches

2:50–3:00 PM
Next Steps