What is the goal of Data STAGE?
The NHLBI's Data STAGE will develop innovative computing solutions that meet the needs of the NHLBI and our research community, building on the cloud-based infrastructure of the NIH Data Commons. NHLBI's Data STAGE is a cloud-based platform, or technical framework, for tools, applications, and workflows. Data STAGE provides secure workspaces to share, store, cross-link, and compute large sets of data generated from biomedical and behavioral research.
Data STAGE is a critical part of implementing the Data Commons, a virtual shared space where scientists can access and work with the digital objects of biomedical research, such as data and software. Data from NHLBI's Trans-Omics for Precision Medicine (TOPMed) Program make up one of three NIH-funded datasets included in the Data Commons. The TOPMed dataset is being used to test and develop the capabilities of the Data Commons. Data STAGE will enhance access to data from TOPMed-affiliated studies and other NHLBI datasets. Data STAGE will also provide access to tools that can be used to analyze various data types, including imaging data.
- Data STAGE will improve the FAIR-ness (findability, accessibility, interoperability, and reusability) of NHLBI data.
- Data STAGE will accelerate research and engagement to drive discovery of new diagnostics, treatments, and prevention strategies for heart, lung, blood, and sleep (HLBS) conditions.
- Data STAGE will support data democratization, making NHLBI data accessible and understandable to researchers and citizen scientists as they work to accelerate discovery.
- Because of its interoperability, Data STAGE will be able to exchange information with other components of the Data Commons.
- Scientists will be able to use Data STAGE’s capabilities to integrate NHLBI imaging data with TOPMed data.
How does Data STAGE contribute to scientific discoveries?
Data STAGE directly addresses the NHLBI Strategic Vision objective of leveraging emerging opportunities in data science to open new frontiers in heart, lung, blood, and sleep (HLBS) research.
Building on the Data Commons infrastructure, Data STAGE will offer specialized search functions, controlled access to data, and analytic tools via widely available programming interfaces. With these capabilities, NHLBI researchers and other scientists can use NHLBI datasets for scientific discovery.
Data STAGE will use HLBS research to test and expand the platform. These HLBS use cases will also contribute knowledge and tools to the Data Commons. In the long term, Data STAGE will integrate massive datasets from NHLBI-supported clinical, population-based, and genomic studies to support NHLBI efforts toward precision medicine.
How does Data STAGE work?
Data STAGE is a long-term effort to support integration of NHLBI datasets within the Data Commons. The platform will support many different types of activities:
- Develop new solutions that allow NHLBI datasets and platforms to operate within the Data Commons cloud-based environment.
- Construct and enhance annotated metadata for NHLBI datasets that align with the standards and the technical solutions developed for the Data Commons and that also ensure the data comply with FAIR data principles.
- Design and test tools that search and analyze the unique characteristics of NHLBI datasets, and that also group data based on certain shared characteristics so that researchers can test hypotheses.
- Establish and support secure workspaces for collaborative analysis specialized for NHLBI datasets and HLBS research, using a platform that brings the computation to the data, not the data to the computation.
- Leverage the Data Commons as a repository for sharing analytic tools and workflows among HLBS researchers. Data STAGE includes data analysis pipelines that let other researchers reproduce an analysis and so confirm its findings.
Learn more about the experts and organizations involved in Data STAGE.
Data STAGE is a joint effort of NHLBI and data science experts in academic institutions, research organizations, and industry. Harvard Medical School, Seven Bridges Genomics, the Renaissance Computing Institute, University of California Santa Cruz, the Broad Institute, and University of Chicago are working closely with the NHLBI Program Team to develop the platform.
Data STAGE is governed by a steering committee that includes the development teams, NHLBI staff, and data producers and consumers. An external panel of experts will provide guidance to the NHLBI during the development and implementation of Data STAGE, and the panel will provide linkages to the NIH Data Commons Pilot Phase Consortium.