It was a big moment nearly four years ago when NHLBI’s Trans-Omics for Precision Medicine program (TOPMed) released nearly 9,000 whole genomes to a limited group of researchers. The genomes—complete sets of people’s DNA—were the first the program had ever made available, and their release signaled a new era for precision medicine at NHLBI. One day, researchers said, the data could lead to treatments tailor-made to individual patients and maybe even shed light on the critical problem of racial and ethnic health disparities in the United States.
Since that October day in 2016, TOPMed has added tens of thousands more genomes to the database, courtesy of research investigators and participants in their NHLBI studies. And now, for the first time, any registered researcher—not just TOPMed investigators—can conduct research with the data.
This is important, experts say. With more researchers in a position to better understand the relationship between genetics and disease, the potential to develop innovative diagnostic tools, therapies, and prevention strategies, especially for heart, lung, and blood diseases and sleep disorders, is even greater than before.
“We are delighted,” said George Papanicolaou, Ph.D., Research Geneticist/Program Director and Framingham Heart Study Program Officer in the Division of Cardiovascular Sciences at NHLBI. “This opens up genetic research to so many more investigators, populations, and diseases, and will facilitate far more discovery.”
In practical terms, NHLBI is expanding access to what is called the TOPMed Imputation Server. Genetic imputation servers are like human brains that skim—rather than fully read—written information. The more data in our brain, the better we are at skimming, because even though we may miss words or phrases, our brains can use our understanding of the language to help fill the gaps.
In much the same way, genetic imputation servers can infer missing information in small genetic samples (called arrays) by comparing them to massive amounts of whole-genome data (called reference panels). This means researchers can collect genetic arrays from people living with a disease and, using statistical modeling conducted by the server, get data that is nearly as good in quality as whole-genome data. And the results come much more quickly, and at a significantly lower cost, than collecting and analyzing whole-genome sequences would.
It’s all a researcher’s dream. “We’re ecstatic,” said Cashell E. Jaquish, Ph.D., a genetic epidemiologist in the Division of Cardiovascular Sciences at NHLBI.
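For readers who want to see the gap-filling idea in concrete terms, here is a deliberately simplified sketch in Python. It is not how the TOPMed Imputation Server works internally; real imputation tools rely on far more sophisticated statistical models. The tiny reference panel, the `impute` function, and the `?` marker for unmeasured sites are all hypothetical and made up for illustration. The sketch finds the panel sequences that best agree with the sites a sample does have, then lets those sequences vote on each missing site.

```python
from collections import Counter

# Hypothetical toy reference panel: each string stands for one fully
# sequenced haplotype, one letter per genetic site (made-up data).
REFERENCE_PANEL = [
    "ACGTAC",
    "ACGTAC",
    "ATGTCC",
    "GCGAAC",
]


def impute(sample, panel=REFERENCE_PANEL, missing="?"):
    """Fill missing sites in `sample` by voting among the closest panel entries."""
    # Sites that the array actually measured.
    observed = [i for i, base in enumerate(sample) if base != missing]

    def mismatches(hap):
        # Count disagreements with the sample at the measured sites only.
        return sum(hap[i] != sample[i] for i in observed)

    # Keep every panel haplotype that ties for the fewest mismatches.
    best = min(mismatches(hap) for hap in panel)
    closest = [hap for hap in panel if mismatches(hap) == best]

    filled = []
    for i, base in enumerate(sample):
        if base != missing:
            filled.append(base)  # measured site: keep as-is
        else:
            # Unmeasured site: take the most common letter among the closest matches.
            votes = Counter(hap[i] for hap in closest)
            filled.append(votes.most_common(1)[0][0])
    return "".join(filled)


print(impute("A?GT?C"))  # prints ACGTAC
```

In this toy case, three of the four panel sequences agree with the sample at every measured site, and two of those three carry the same letters at the two gaps, so the majority wins. A larger and more diverse panel gives the vote more relevant sequences to draw on, which is the intuition behind the emphasis on panel size and ancestry.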
The good news in the case of the TOPMed Imputation Server is that tens of thousands of whole genomes have already been sequenced and together can be used as a reference panel. In fact, the TOPMed Imputation Server reference panel is believed to be among the largest in the world, with the genomes of more than 97,000 participants. The panel is also among the most multiethnic: 58% of the whole-genome sequences represent people of non-European ancestry (31% African, 15% Hispanic/Latino, and 9% Asian), while 41% represent people of European ancestry and 2% unknown or other ancestries. The reference panel is considered high-quality, too, because each letter of a participant’s genetic code is determined based on an average of 38 observations, or “reads,” while most genetic imputations are based on 20-30 reads.
This all bodes well for the accuracy of the data researchers can get. “The better quality the data and the more whole-genome sequences a reference panel has, the more accurate the inferred information,” Papanicolaou said. “The accuracy improves even more when the ancestry of the participants in reference panels match up with the populations the researchers are studying.”
Until recently, the University of Michigan housed the TOPMed Imputation Server, but in April the server moved to the NIH’s STRIDES cloud environment and was integrated into NHLBI BioData Catalyst, NHLBI’s developing cloud-based data ecosystem. This change brought the imputation server directly under NHLBI’s management, made the individual genomic data even more secure, and expanded access to the server. So far, the TOPMed Imputation Server has imputed 4.2 million genomes, a number far exceeding what was expected by this point.
Now, Jaquish said, more researchers will have the data that can help answer some of the big questions around genetics and disease, especially those related to racial and ethnic health disparities—for example, why high blood pressure is more common in African American and Hispanic adults. Answering these questions, she said, could one day lead to more effective prevention strategies and personalized therapies.
“As we continue to partner with researchers from around the world,” Papanicolaou added, “we expect the reference panel will grow in number and population diversity,” putting these potential benefits within reach even faster.