Computational Models for Analyzing Genotype-Phenotype Associations in Rare Diseases

Executive Summary

The National Heart, Lung, and Blood Institute (NHLBI) convened a Working Group on July 24-25, 2008, in Bethesda, Maryland, to advise the Institute on new research opportunities for solving some of the challenges in genotype-phenotype association studies of rare diseases. To date, analyses of genotype-phenotype associations to reveal the genetic causes of rare diseases, particularly those that present with variable expression, have been challenging because of the small cohorts available (e.g., 2000 patients or fewer in the US and Canada). Participants in the Working Group included clinical experts in rare heart, lung, and blood disorders; phenotype experts; human and molecular geneticists; epidemiologists; statistical geneticists; and computational biologists. The goals of the Working Group were to:

  • explore how phenotypes can be defined in several of these rare diseases
  • determine the appropriate genotyping tests to be performed
  • evaluate the best computational models that can be developed to answer these questions in light of the challenges of rare diseases.

The meeting addressed NHLBI Strategic Plan Goals 1 and 2 of improving understanding of the molecular and physiologic basis of health and disease as well as the clinical mechanisms of disease, and thereby enabling better prevention, diagnosis, and treatment.


The Working Group began with an overview of challenges in monogenic and polygenic rare heart, lung, and blood diseases, with particular emphasis on the tendency of many patients with the same disease to express a wide array of symptoms. Subsequent presentations focused on tools for defining and collecting phenotypic data, especially ontology-driven phenotyping instruments to pool data across sites; methods of genotyping, such as genome-wide analyses versus selected candidate genes; and the findings of statistical geneticists and the challenges they face in performing genotype-phenotype analyses in large and small patient cohorts.

Major Points of Discussion:

Rare Heart, Lung, and Blood Disorders: Their Symptoms, Signs, and Genetics

  • Rare diseases provide scientifically useful models that can aid in the elucidation of pathways for common diseases
  • Recognition of phenotyping and pathway commonalities between rare and common diseases or among rare diseases is critical to speeding mechanism discovery
  • Access to patient populations with rare diseases can be enabled by collaborations between American and International sites.

Defining and Collecting Phenotypic Information

  • Registries and/or databases are needed. A catalogue of these registries should be made at NIH. Updated consents for registries would permit continuous data collection and long term follow-up of patients, which will strengthen phenotypic accuracy and improve phenotype-genotype association studies in rare diseases.
  • Unlike common diseases, where a large number of cases can be collected to provide sufficient power, rare diseases are often better suited to longitudinal studies. In addition, because phenotypes for rare diseases can be harder to define, studying individuals over time can help to identify homogeneous subsets that can be most useful for genetic studies.
  • Standardization of phenotypic data across facilities and uniformity in data collection are among the greatest current challenges. They can only be addressed through a long-term investment in the collection of new phenotypic data and biological samples. Existing repositories and methods developed for data collection in other diseases may be useful in such an undertaking.
  • Rare diseases provide useful models that can elucidate mechanisms involved in common diseases. For example, mutations in the glucocerebrosidase (GBA) gene identified in patients with Gaucher disease have been shown to correlate with increased risk for Parkinson's disease in the general population. Likewise, one of the proteins implicated in Fanconi anemia is encoded by a breast cancer gene. These examples suggest that studying pathways or general phenotypes shared between common and rare diseases, or among rare diseases, can provide an alternative way of searching for genetic modifiers. However, such analyses require standardized phenotypes that are comparable across studies.

Determining Appropriate Genotyping Procedures

  • Genetic epidemiologists are challenged by the large volumes of data and by the lack of computational tools that make analysis and interpretation of those data accessible to investigators. First-pass analyses often ignore complex interactions between genetic variants and the environment that could actually boost the power of genome-wide association studies (GWAS).
  • Validation of findings in other studies has proved to be fundamental in distinguishing real genetic modifiers from false-positive signals, and the availability of data in dbGaP should enable such efforts.
  • The type of genotyping effort to pursue is an important consideration. Commercially available SNP arrays are designed to capture common variants, but the discovery of genetic modifiers in rare diseases may require different strategies, such as fine mapping, deep sequencing, or whole genome scans. Pathway analysis and customized candidate gene SNP chips are important approaches for studies of rare diseases.

Genotype-Phenotype Analyses and the Development of Novel Computational Models

  • New computational methods are needed that can enable discovery of sets of variants associated with a phenotype rather than associations with single variants. Meta-analysis can play an important role in assessing the robustness of associations across studies.
  • The adequacy of the power of GWAS to discover genetic modifiers of the phenotypes of rare diseases is an important concern. Several analytic strategies can be explored to increase power in studies involving small numbers of patients, including:
    • Focus on the most extreme values of quantitative phenotypes to increase the effect size and boost power; however, such approaches can decrease the applicability to other measures if the extreme values are used as discriminants for study design;
    • Use data mining and knowledge discovery methods that have been employed in other disciplines to discover nuggets of information from massive data sets;
    • Use bioinformatics tools to navigate results and identify important genes to follow up without relying solely on statistical significance.
  • Multi-center joint efforts are needed along with coordinated information systems for easier sharing of data and the development and use of shared/common clinical epidemiology phenotypes and standardized phenotype measurements.

  • More emphasis needs to be placed on statistical and bioinformatics studies, and on attracting researchers who understand biology, genetics, and statistics and are adept at computational programming.

  • Sharing data will be crucial to the discovery of robust genotype-phenotype associations. However, the rarity of these diseases puts subjects at higher risk of being identified, so safe methods of sharing data should be developed.
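
The extreme-phenotype strategy listed above can be illustrated with a small simulation. The following is a minimal sketch, not a method endorsed by the Working Group: it assumes a hypothetical additive model (a single variant with minor allele frequency 0.3, an effect of 0.5 phenotype standard deviations per allele, and a cohort of 2000 patients) and compares the contrast in mean risk-allele count between high and low phenotype groups under two designs with the same genotyping budget. All function names and parameter values are assumptions chosen for illustration.

```python
import random
import statistics


def simulate_cohort(n, maf, beta, rng):
    """Return (allele_count, phenotype) pairs under a simple additive model."""
    cohort = []
    for _ in range(n):
        # Two independent allele draws give 0, 1, or 2 copies of the minor allele.
        g = (rng.random() < maf) + (rng.random() < maf)
        # Phenotype = genetic effect + standard normal noise (assumed model).
        y = beta * g + rng.gauss(0.0, 1.0)
        cohort.append((g, y))
    return cohort


def allele_contrast(low, high):
    """Difference in mean allele count between the high and low groups."""
    return statistics.mean(g for g, _ in high) - statistics.mean(g for g, _ in low)


def extreme_design(cohort, k):
    """Genotype only the k lowest and k highest phenotype values."""
    ranked = sorted(cohort, key=lambda pair: pair[1])
    return allele_contrast(ranked[:k], ranked[-k:])


def random_design(cohort, k, rng):
    """Genotype 2k randomly chosen patients, then split them at their median."""
    sample = sorted(rng.sample(cohort, 2 * k), key=lambda pair: pair[1])
    return allele_contrast(sample[:k], sample[k:])


def compare_designs(reps=50, n=2000, k=200, maf=0.3, beta=0.5, seed=1):
    """Average the allele contrast of each design over repeated simulations."""
    rng = random.Random(seed)
    extreme, unselected = [], []
    for _ in range(reps):
        cohort = simulate_cohort(n, maf, beta, rng)
        extreme.append(extreme_design(cohort, k))
        unselected.append(random_design(cohort, k, rng))
    return statistics.mean(extreme), statistics.mean(unselected)


if __name__ == "__main__":
    extreme, unselected = compare_designs()
    print(f"extreme tails: {extreme:.3f}   random sample: {unselected:.3f}")
```

Under these assumptions, genotyping the phenotypic tails yields a larger allele contrast than genotyping the same number of unselected patients. As the text cautions, estimates obtained from selected tails do not transfer directly to the unselected population or to other phenotypic measures.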


Development of the following resources and infrastructure would advance research into rare diseases and speed application and translation of knowledge to clinical settings:

  • Prospective registries or databases for specific rare diseases using state of the art data entry and data management systems. Coordination of rare disease registries through NIH would allow for interaction between registries.
    • When developing databases/registries, it is important to enroll multigenerational and extended proband-identified families, including first- and second-degree relatives.
    • Longitudinal registries for rare diseases should extend to multiple generations and include first- and second-degree relatives.
    • International collaborations could increase the size of the primary study or contribute validation sets.
    • Confidentiality of data from patients with rare diseases must be ensured because of the relative ease of patient identification in small cohorts.
  • Inter- and intra-group collaborations to create and share standardized phenotyping methods and computerized data collection systems.
  • Collections of continuing phenotypic data with corresponding biological samples.
  • New computational methods, software, and tools that can be used to address the current data overload in all genomic studies. Their development will require formation of inter-disciplinary teams that include physicians, geneticists, molecular biologists, statisticians, epidemiologists, computer scientists, and experts in search and machine learning in addition to investigators responsible for quantitative analyses. Applications can include multiple investigators focused on the study of a rare disease.
  • Methods that can integrate bioinformatic information, genomic studies (particularly sequencing), and intermediate phenotypes to enable studies with higher power to identify the genetic bases of rare diseases.
  • Software to evaluate various approaches for identifying the genetic causes of rare diseases.
  • Common approaches to regulatory and compliance issues and data sharing
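
The re-identification concern noted above can be made concrete with a simple check before a registry extract is shared. The sketch below is one possible approach, assumed here for illustration rather than taken from the Working Group: it counts how many records share each combination of quasi-identifiers and reports the combinations held by fewer than `k` patients (a k-anonymity style check). The field names, the example records, and the threshold `k=5` are all hypothetical.

```python
from collections import Counter


def small_cells(records, quasi_identifiers, k=5):
    """Return quasi-identifier combinations shared by fewer than k records."""
    # Build one tuple per record from the chosen quasi-identifier fields.
    keys = [tuple(record[field] for field in quasi_identifiers) for record in records]
    counts = Counter(keys)
    # Combinations with small counts are the ones most at risk of re-identification.
    return {combo: n for combo, n in counts.items() if n < k}


if __name__ == "__main__":
    # Hypothetical registry extract; all values are invented for illustration.
    registry = (
        [{"diagnosis": "A", "state": "MD", "birth_decade": 1980}] * 6
        + [{"diagnosis": "A", "state": "WY", "birth_decade": 1950}]
    )
    risky = small_cells(registry, ["diagnosis", "state", "birth_decade"], k=5)
    for combo, n in risky.items():
        print(f"{combo}: only {n} record(s); consider coarsening or suppressing")
```

Combinations flagged this way could be coarsened (for example, reporting region instead of state) or suppressed before release. Which fields count as quasi-identifiers and what threshold is acceptable are policy decisions that such a check does not settle.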

Working Group Members:


Jeffrey A. Towbin, M.D., Baylor College of Medicine
Paola Sebastiani, Ph.D., Boston University


Christopher Amos, PhD, MD Anderson Cancer Center
Terri Beaty, PhD, Johns Hopkins University
Sessions Cole, MD, Washington University
Barry Coller, MD, Rockefeller University
Emily Harris, PhD, NHGRI, NIH
Gail Jarvik, MD, PhD, University of Washington
Jeffery Lipton, MD, PhD, Albert Einstein College of Medicine
Idan Menashe, PhD, NCI, NIH
Arthur Moss, MD, University of Rochester
Ellis Neufeld, MD, PhD, Children's Hospital Boston
Michael Province, Washington University
Benjamin Rybicki, PhD, Henry Ford Health System
Steve Sherry, PhD, NCBI, NIH
Ellen Sidransky, MD, NHGRI, NIH
Edwin Silverman, MD, PhD, Brigham and Women's Hospital
Martin Steinberg, MD, Boston University School of Medicine

Sponsoring Institute and Office:

Office of Rare Diseases
National Heart, Lung, and Blood Institute

NHLBI Organizers:


Blaine Moore, Rina Das


Dina Paltoo, George Papanicolaou, David Eckstein, Weinu Gan,
Gang Zheng, Jennie Larkin, Sandra Colombini-Hatch, Lisa Brooks, Greg Evans

Last Updated June 2011
