Whole-Genome Sequencing (WGS) Project


What is the goal of the WGS project?

The goal of the WGS project is to collect whole-genome sequencing data from individuals with well-defined phenotypes and existing clinical outcomes data. This project supports the Trans-Omics for Precision Medicine (TOPMed) program, which generates scientific resources to improve the understanding of heart, lung, blood, and sleep disorders and advance precision medicine.

In 2016, the NHLBI released its Strategic Vision, which will guide the Institute’s research activities for the coming decade. The WGS project addresses many of the objectives, compelling questions, and critical challenges identified in the plan that relate to genomic research. For example, the WGS project will enable research on how genes affect individual disease processes. The project will also leverage data from diverse participants in NHLBI’s population and epidemiology studies to enable research on how genes contribute to health differences among populations. 


  • The WGS project has sequenced over 90,000 genomes from over 30 studies, and the project aims to sequence more than 120,000 genomes.
  • More than half of these genomes are from individuals of non-European descent, creating a valuable genomic resource reflective of the diverse U.S. population.
  • Among the WGS project participants are people who have a variety of heart, lung, blood, and sleep disorders.
  • The WGS project has released data from over 30,000 genomes through the National Institutes of Health (NIH) Database of Genotypes and Phenotypes (dbGaP).

How does the WGS project contribute to scientific discoveries?

The WGS project aims to provide whole-genome sequencing data that researchers can use to identify genetic markers of increased or decreased risk of heart, lung, blood, and sleep disorders, as well as markers that help define disorder subtypes. Among the WGS project participants are people who have these conditions:

  • Cardiovascular disorders, such as atrial fibrillation, high blood pressure, and stroke
  • Lung disorders, such as asthma, chronic obstructive pulmonary disease, and sarcoidosis
  • Blood disorders, such as sickle cell disease, hemophilia, deep vein thrombosis, and pulmonary embolism
  • Sleep disorders, such as sleep apnea
  • Obesity

The NHLBI TOPMed program will combine genomic data from the WGS project with other -omics data and with molecular, behavioral, imaging, environmental, and clinical data from participants. These data resources will advance research to improve the prevention and treatment of heart, lung, blood, and sleep disorders.


How does the WGS project work?


The WGS project is making strides to establish a genomic resource that reflects the diverse U.S. population. Among current WGS project participants, less than 50 percent are of European descent, 30 percent are of African descent, 10 percent are of Hispanic/Latino descent, and 8 percent are of Asian descent. By contrast, a recent study found that 81 percent of participants in genome-wide association studies are of European ancestry. By intentionally including participants with a variety of racial and ethnic backgrounds, the WGS project is creating a unique and valuable genomic resource.

Researchers have started releasing WGS project data through the NIH Database of Genotypes and Phenotypes (dbGaP). dbGaP was developed to archive and distribute data from studies that have investigated the interaction of genotype and phenotype, including all genome-wide association studies supported by the NIH. Currently, the WGS project has released over 30,000 whole-genome sequences in dbGaP, and approximately 45,000 more will be added in early 2018.

The NHLBI will continue growing this valuable public resource of whole-genome sequence data and hopes to sequence over 120,000 individual genomes. As this effort expands, we will look for ways to include more underrepresented groups and a broader range of heart, lung, blood, and sleep disorders. We will also look for opportunities to integrate whole-genome sequencing with other -omics data.