NHLBI Working Group Report
Computational Models for Analyzing Genotype-Phenotype Associations in
The National Heart, Lung, and Blood Institute (NHLBI) convened a
Working Group on July 24-25, 2008 in Bethesda, Maryland, to advise
the Institute on new research opportunities for solving some of the
challenges in genotype-phenotype association studies of rare
diseases. To date, analyses of genotype-phenotype associations to
reveal the genetic causes of rare diseases, particularly those that
present variable expression, have been challenging because of the
small cohorts available (e.g., 2,000 patients or fewer in the US and
Canada). Participants in the Working Group included clinical experts
in rare heart, lung, and blood disorders; phenotype experts; human
and molecular geneticists; epidemiologists; statistical geneticists;
and computational biologists. The goals of the Working Group were to:
- explore how phenotypes can be defined in several of these rare diseases;
- determine the appropriate genotyping tests to be performed; and
- evaluate the best computational models that can be developed to answer these questions in light of the challenges of rare diseases.
The meeting addressed NHLBI
Strategic Plan Goals 1 and 2 of improving understanding of the
molecular and physiologic basis of health and disease as well as the
clinical mechanisms of disease, and thereby enabling better
prevention, diagnosis, and treatment (http://apps.nhlbi.nih.gov/strategicplan/).
The Working Group began with an overview on
challenges in monogenic and polygenic rare heart, lung, and blood
diseases, with particular emphasis on the tendency of many patients
with the same disease to express a wide array of symptoms.
Subsequent presentations focused on tools for defining and
collecting phenotypic data, especially ontology-driven phenotyping
instruments to pool data across sites; methods of genotyping, such as
genome-wide analyses versus selected candidate genes; and the
findings of statistical geneticists and the challenges they face in
performing genotype-phenotype analyses in large and small patient
Major Points of Discussion:
Rare Heart, Lung, and Blood Disorders: Their Symptoms, Signs, and
- Rare diseases provide scientifically useful models that can aid in the elucidation of pathways for common diseases.
- Recognition of phenotyping and pathway commonalities between rare and common diseases, or among rare diseases, is critical to speeding mechanism discovery.
- Access to patient populations with rare diseases can be enabled by collaborations between American and international sites.
Defining and Collecting
Registries and/or databases are
needed. A catalogue of these registries should be made at NIH.
Updated consents for registries would permit continuous data
collection and long-term follow-up of patients, which will
strengthen phenotypic accuracy and improve phenotype-genotype
association studies in rare diseases.
Unlike common diseases, for which
a large number of cases can be collected to provide sufficient
power, rare diseases are often better suited to longitudinal studies.
In addition, because phenotypes for rare diseases can be
harder to define, studying individuals over time can help to
identify homogeneous subsets that can be most useful for genetic
Standardization of phenotypic
data across facilities and uniformity in data collection are among
the greatest current challenges. They can only be addressed through
a long-term investment in the collection of new phenotypic data and
biological samples. Existing repositories and methods developed for
data collection in other diseases may be useful in such an
Rare diseases provide useful
models that can elucidate mechanisms involved in common diseases.
For example, mutations in the glucocerebrosidase (GBA) gene
identified in patients with Gaucher disease have been shown to
correlate with increased risk for Parkinson's disease in the general
population. Likewise, one of the proteins responsible for Fanconi
anemia is encoded by a breast cancer gene. These examples suggest
that studying pathways or general phenotypes shared between common
and rare diseases or among rare diseases can provide an alternative
way of searching for genetic modifiers. However, such analyses
require standardized phenotypes that are comparable across studies.
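As a minimal sketch of what "comparable across studies" might involve in practice, the Python example below maps two hypothetical site-specific hemoglobin fields onto one shared definition. The site names, field names, and the choice of hemoglobin as the measurement are illustrative assumptions and do not come from the Working Group discussion.

```python
# Hypothetical mapping from each site's local field name and unit
# to one shared phenotype definition (hemoglobin in g/dL).
SITE_MAPPINGS = {
    "site_a": {"field": "hgb_g_dl", "to_g_per_dl": 1.0},         # already g/dL
    "site_b": {"field": "haemoglobin_g_l", "to_g_per_dl": 0.1},  # g/L -> g/dL
}


def harmonize(site, record):
    """Return the record's hemoglobin value under the shared name and unit."""
    mapping = SITE_MAPPINGS[site]
    value = record[mapping["field"]] * mapping["to_g_per_dl"]
    return {"hemoglobin_g_per_dl": value}


# Two records that describe the same measurement in different local conventions.
from_site_a = harmonize("site_a", {"hgb_g_dl": 12.5})
from_site_b = harmonize("site_b", {"haemoglobin_g_l": 125})
```

After harmonization both records carry the same field name and unit, so they can be pooled. A real effort would more likely rely on an agreed ontology than on an ad hoc lookup table like this one.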
Genetic epidemiologists are
challenged by the large volumes of data and by the lack of
computational tools to make analyses and interpretations of them
accessible to investigators. First-pass analyses often ignore
complex interactions between genetic variants and the environment
that could actually boost the power of genome-wide association
Validation of findings in other
studies has proved to be fundamental in distinguishing real genetic
modifiers from false-positive signals, and the availability of data
in dbGaP should enable such efforts.
The type of genotyping effort to
pursue is an important consideration. Commercially available SNP
arrays are designed to capture common variants, but the discovery of
genetic modifiers in rare disease may require different strategies,
such as fine mapping, deep sequencing, or whole genome scans.
Pathway analysis and customized candidate gene SNP chips are
important approaches for the studies of rare diseases.
and the Development of Novel Computational Models
New computational methods are needed
that can enable discovery of sets of variants associated with a
phenotype rather than associations with single variants. Meta-analysis
can play an important role in assessing the robustness of associations.
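One common way to combine association estimates across studies is a fixed-effect, inverse-variance-weighted meta-analysis. The report does not specify a method, so the Python sketch below is only an illustration; the two study estimates are made-up numbers, and `fixed_effect_meta` is a hypothetical helper name.

```python
import math


def fixed_effect_meta(estimates):
    """Pool (effect, standard_error) pairs with inverse-variance weights.

    Returns the pooled effect, its standard error, and Cochran's Q,
    a simple measure of heterogeneity between studies.
    """
    weights = [1.0 / se ** 2 for _, se in estimates]
    total_weight = sum(weights)
    pooled = sum(w * b for w, (b, _) in zip(weights, estimates)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    q = sum(w * (b - pooled) ** 2 for w, (b, _) in zip(weights, estimates))
    return pooled, pooled_se, q


# Made-up per-study effect sizes and standard errors for one variant.
studies = [(0.2, 0.1), (0.4, 0.1)]
pooled, pooled_se, q = fixed_effect_meta(studies)
```

The pooled standard error is smaller than that of either study alone, which is the sense in which pooling helps small cohorts. A large Q relative to the number of studies would suggest that the association is not consistent across them.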
The adequacy of the power of GWAS
to discover genetic modifiers of the phenotypes of rare diseases
is an important concern. Several analytic strategies can be explored
to increase power in studies involving small numbers of patients:
- Focus on the most extreme values of quantitative phenotypes to increase the effect size and boost power; however, such approaches can reduce the applicability of the findings to other measures if the extreme values are used as discriminants for study design;
- Use data mining and knowledge discovery methods that have been employed in other disciplines to discover nuggets of information from massive data sets;
- Use bioinformatics tools to navigate results and identify important genes to follow up without relying solely on statistical significance.
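The first strategy above, sampling the extremes of a quantitative phenotype, can be illustrated with a small simulation. This is a sketch under stated assumptions rather than a power calculation: the cohort size, minor allele frequency, additive effect size, and random seed are arbitrary choices, and the function names are hypothetical.

```python
import random
import statistics


def simulate_cohort(n, maf, beta, seed):
    """Simulate (phenotype, allele_count) pairs under an additive model."""
    rng = random.Random(seed)
    cohort = []
    for _ in range(n):
        # Allele count 0, 1, or 2: two independent draws at frequency maf.
        g = (rng.random() < maf) + (rng.random() < maf)
        y = beta * g + rng.gauss(0.0, 1.0)
        cohort.append((y, g))
    return cohort


def allele_count_gap(cohort, tail_fraction):
    """Mean allele count in the top tail minus the bottom tail of the phenotype."""
    ranked = sorted(cohort)  # tuples sort by phenotype first
    k = max(1, int(len(ranked) * tail_fraction))
    low = [g for _, g in ranked[:k]]
    high = [g for _, g in ranked[-k:]]
    return statistics.mean(high) - statistics.mean(low)


cohort = simulate_cohort(n=2000, maf=0.3, beta=0.5, seed=1)
gap_median_split = allele_count_gap(cohort, 0.5)  # uses every subject
gap_extremes = allele_count_gap(cohort, 0.1)      # top and bottom 10% only
```

Under these assumptions the difference in mean allele count between the two tails is larger than the difference produced by a median split, even though only a fifth of the subjects would need to be genotyped. The sketch does not address the trade-off noted above, namely that findings from an extremes-based design may not transfer to other measures.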
- Multi-center joint efforts are needed along
with coordinated information systems for easier sharing of data and
the development and use of shared/common clinical epidemiology phenotypes
and standardized phenotype measurements.
- More emphasis needs to be placed on statistical
and bioinformatics studies, and on attracting researchers who understand
biology, genetics, and statistics and are adept at computational programming.
- Sharing data will be crucial to the discovery
of robust genotype-phenotype associations, but the rarity of these diseases
puts subjects at higher risk of being identified, so safe methods
of sharing data should be developed.
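The identification risk described in the last point can be made concrete with a small-cell count, in the spirit of k-anonymity. The report does not name a specific method, so this is an illustrative sketch only; the records, the quasi-identifier fields, and the threshold `k` are all hypothetical.

```python
from collections import Counter


def small_cells(records, quasi_identifiers, k):
    """Return quasi-identifier combinations shared by fewer than k records."""
    counts = Counter(
        tuple(record[field] for field in quasi_identifiers) for record in records
    )
    return {combo: n for combo, n in counts.items() if n < k}


# Hypothetical registry extract; no real patient data.
records = [
    {"site": "A", "birth_decade": 1970},
    {"site": "A", "birth_decade": 1970},
    {"site": "B", "birth_decade": 1970},
    {"site": "B", "birth_decade": 1970},
    {"site": "B", "birth_decade": 1980},
]
risky = small_cells(records, ["site", "birth_decade"], k=2)
```

Here the single record from site B with a 1980s birth decade is the only one in its cell, so releasing these two fields unchanged would single that subject out. In a rare disease cohort many cells may be this small, which is why coarsening or suppressing such fields before sharing is often considered.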
Development of the following resources and infrastructure would
advance research into rare diseases and speed application and
translation of knowledge to clinical settings:
Prospective registries or
databases for specific rare diseases using state-of-the-art data
entry and data management systems. Coordination of rare disease
registries through NIH would allow for interaction between
databases/registries. It is important to enroll multi-generation and
extended first- and second-degree proband-identified families.
Longitudinal registries for rare
diseases should extend to multiple generations and include first- and
second-degree relatives
could increase the size of the primary study or contribute
Confidentiality of data from
patients with rare diseases must be ensured because of the relative
ease of patient identification in small cohorts.
Inter- and intra-group
collaborations to create and share standardized phenotyping methods
and computerized data collection systems.
Collections of continuing
phenotypic data with corresponding biological samples.
New computational methods,
software, and tools that can be used to address the current data
overload in all genomic studies. Their development will require
formation of inter-disciplinary teams that include physicians,
geneticists, molecular biologists, statisticians, epidemiologists,
computer scientists, and experts in search and machine learning in
addition to investigators responsible for quantitative analyses.
Applications can include multiple investigators focused on the study
of a rare disease.
Methods that can integrate
bioinformatic information, genomic studies (particularly
sequencing), and intermediate phenotypes to enable studies with
higher power to identify the genetic bases of rare diseases.
Software to evaluate various
approaches for identifying the genetic causes of rare diseases.
Common approaches to regulatory
and compliance issues and data sharing.
Working Group Members:
Jeffrey A. Towbin, M.D., Baylor College of Medicine
Paola Sebastiani, Ph.D., Boston University
Christopher Amos, PhD, MD Anderson Cancer Center
Terri Beaty, PhD, Johns Hopkins University
Sessions Cole, MD, Washington University
Barry Coller, MD, Rockefeller University
Emily Harris, PhD, NHGRI, NIH
Gail Jarvik, MD, PhD, University of Washington
Jeffery Lipton, MD, PhD, Albert Einstein College of Medicine
Idan Menashe, PhD, NCI, NIH
Arthur Moss, MD, University of Rochester
Ellis Neufeld, MD, PhD, Children's Hospital Boston
Michael Province, Washington University
Benjamin Rybicki, PhD, Henry Ford Health System
Steve Sherry, PhD, NCBI, NIH
Ellen Sidransky, MD, NHGRI, NIH
Edwin Silverman, MD, PhD, Brigham and Women's Hospital
Martin Steinberg, MD, Boston University School of Medicine
Sponsoring Institute and Office:
Office of Rare Diseases
National Heart, Lung, and Blood Institute
Blaine Moore, Rina Das
Dina Paltoo, George Papanicolaou, David Eckstein,
Gang Zheng, Jennie Larkin, Sandra Colombini-Hatch, Lisa Brooks, Greg
Last Updated June 2011