The National Heart, Lung, and Blood Institute (NHLBI) convened a Working Group of cardiovascular researchers and bioinformatics experts on June 12, 2006, in Bethesda, Maryland. The meeting aimed to identify key challenges in annotating and integrating cardiovascular data and in developing and promoting the use of controlled vocabularies and ontologies by the cardiovascular research community.
A review of the existing ontology resources led to a discussion of strategies for developing ontologies. “Top-down” approaches mandated by agencies or organizations (e.g., International Classification of Diseases, ICD) were compared with “bottom-up” community-generated approaches (e.g., Gene Ontology, GO). Developing ontologies with use-cases would help to ensure community involvement in ontology development and promote useful applications as well as focus ontology development efforts in a specific area.
Use-cases should be translational and span the research spectrum from basic to clinical and from genotype to phenotype. Specific conditions/diseases recommended as topics for use-cases were: (1) cardiac malformations and congenital heart disease, and (2) ischemic heart disease. Cardiovascular electrophysiology was identified as an area that needs standardized data representation and lacks a culture of data sharing.
During the data annotation and integration discussion, the distinction between data interoperability and data integration was noted, as most existing tools enable data interoperability. Researchers need consistent annotation for low-throughput as well as high-throughput cardiovascular data.
Important annotation challenges/opportunities facing cardiovascular researchers included: (1) better integration across different data types and species, and (2) the development of specific cell models that include a variety of data types. Community-based infrastructure groups should be encouraged to develop annotation tools for data integration and data interoperability. Supporting ontology-based annotation of existing cardiovascular data (e.g., PhysioNet from NCRR) was judged to be an effective way to leverage existing resources.
A wide-ranging discussion identified needs and opportunities for training and information dissemination, as well as for improved education in ontologies. Effective dissemination approaches included success stories from specific research areas (use-cases), presentations at scientific meetings, and web resources.
Phenotype representation is a major challenge facing cardiovascular researchers. A key issue is the granularity of representation at the molecular, cellular, tissue, and phenotype levels.
Recommendations:
- Perform a comprehensive assessment of available ontology and controlled vocabulary resources with due diligence. NHLBI should partner with other major group efforts to catalogue and organize these resources for the community.
- Designate K-Awards to bring clinical and computational scientists together and support cross-training in the area of ontologies related to the heart, lung, and blood.
- Develop model elements for a specific cell type (e.g., cardiac myocyte or smooth muscle cell), including specialized models of complete networks/pathways, and the localization and activity of genes/proteins/molecules in both normal and pathologic conditions.
- Develop a well thought-out phenotype ontology to enhance NHLBI’s mission in basic, translational, and clinical research, including the following considerations:
  - Linking research protocols to a standard phenotype ontology to offer clarity regarding how phenotype is measured and defined.
  - Developing important foundational ontologies including time, space, and morphogenesis.
  - Characterizing the granularity among phenotypes (e.g., molecular, cellular, tissue, and disease levels), which is an important research question.
- Develop ontology-based annotations of existing data resources (e.g., PhysioNet), which would enable data mining and vocabulary/ontology development. To achieve this goal, NHLBI will need to develop an inventory of databases for which improved annotation with controlled vocabularies and/or relevant ontologies would benefit the heart, lung, and blood research community.
- Develop a data-annotation infrastructure that benefits from widespread community input. The heart, lung, and blood research community needs tools to support ontology development through recognized and widely dispersed domain experts, especially when top-down, centralized development is infeasible.
- Develop paradigm projects (such as heart failure, sudden cardiac death, ischemia, pharmacology, pharmacogenetics) that will drive tool development to facilitate data integration (from genotype to phenotype to disease model) and associated controlled vocabulary and ontology development. Each project would build ontologies and data models for a specific scientific area (e.g., genes to myocytes to ischemia; genes to embryonic stem cell differentiation to phenotypes to pathology), possibly restricted to a specific cell type.
Jennie Larkin, Ph.D.
Working Group Members:
- Mark Musen, M.D., Ph.D., Stanford University Medical Center
- Bruce McManus, M.D., Ph.D., University of British Columbia
- James Brinkley, M.D., Ph.D., University of Washington
- Bruce Conklin, M.D., University of California at San Francisco
- Janan Eppig, Ph.D., The Jackson Laboratory
- Bron Kisler, Clinical Data Interchange Standards Consortium (CDISC)
- John Quackenbush, Ph.D., Dana-Farber Cancer Institute
- Shankar Subramaniam, Ph.D., University of California at San Diego
- Simon Twigger, Ph.D., Medical College of Wisconsin
- Monte Westerfield, Ph.D., Institute of Neuroscience
- Mark Wilkinson, Ph.D., St. Paul’s Hospital
- Raimond Winslow, Ph.D., Johns Hopkins University