Data Storage, Toolspace, Access, and analytics for biG-data Empowerment (DataSTAGE)

Project began
2018
Point of contact

What is the goal of DataSTAGE?

The NHLBI's DataSTAGE will develop innovative computing solutions that meet the needs of the NHLBI and our research community, building on the cloud-based infrastructure of the NIH Data Commons. NHLBI's DataSTAGE is a cloud-based platform, or technical framework, for tools, applications, and workflows. DataSTAGE provides secure workspaces to share, store, cross-link, and compute large sets of data generated from biomedical and behavioral research.

DataSTAGE is a critical part of implementing the Data Commons, a virtual shared space where scientists can access and work with the digital objects of biomedical research, such as data and software. The dataset from NHLBI's Trans-Omics for Precision Medicine (TOPMed) Program is one of three NIH-funded datasets included in the Data Commons, and it is being used to test and develop the Data Commons' capabilities. DataSTAGE will enhance access to data from TOPMed-affiliated studies and other NHLBI datasets. It will also provide access to tools for analyzing various data types, including phenotypic, genomic, other omics, and imaging data.

AT A GLANCE
  • DataSTAGE will improve FAIR-ness—the findability, accessibility, interoperability, and reusability—of NHLBI data.
  • DataSTAGE will accelerate research and engagement to drive discovery of new diagnostics, treatments, and prevention strategies for HLBS conditions.
  • It supports data democratization, so NHLBI data is accessible and understandable by researchers and citizen scientists as they work to accelerate discovery.
  • Because of its interoperability, DataSTAGE will be able to exchange information with other components of the Data Commons.
  • Scientists will be able to use DataSTAGE’s capabilities to integrate NHLBI imaging data with TOPMed data.

How does DataSTAGE contribute to scientific discoveries?

DataSTAGE directly addresses the NHLBI Strategic Vision objective of leveraging emerging opportunities in data science to open new frontiers in heart, lung, blood, and sleep (HLBS) research.

Building on the Data Commons infrastructure, DataSTAGE will offer specialized search functions, controlled access to data, and analytic tools via widely available programming interfaces. With these capabilities, NHLBI researchers and other scientists can use NHLBI datasets for scientific discovery.

DataSTAGE will use HLBS research to test and expand the platform. These HLBS use cases will also contribute knowledge and tools to the Data Commons. In the long term, DataSTAGE will integrate massive datasets from NHLBI-supported clinical, population-based, and genomic studies to support NHLBI efforts toward precision medicine.

How does DataSTAGE work?

DataSTAGE is a long-term effort to support integration of NHLBI datasets within the Data Commons. The platform will support many different types of activities:

  • Develop new solutions that allow NHLBI datasets and platforms to operate within the Data Commons cloud-based environment.
  • Construct and enhance annotated metadata for NHLBI datasets that align with the standards and the technical solutions developed for the Data Commons and that also ensure the data comply with FAIR data principles.
  • Design and test tools that search and analyze the unique characteristics of NHLBI datasets, and that also group data based on certain shared characteristics so that researchers can test hypotheses.
  • Establish and support secure workspaces for collaborative analysis specialized for NHLBI datasets and HLBS research, using a platform that brings the computation to the data, not the data to the computation.
  • Leverage the Data Commons as a repository for sharing analytic tools and workflows among HLBS researchers. DataSTAGE includes data analysis pipelines that support reproducibility: other researchers can rerun an analysis to confirm the original findings.
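
As an illustration of the activity described above of grouping data by shared characteristics, the short Python sketch below sorts participant records into cohorts. It is not DataSTAGE code and does not use any DataSTAGE or Data Commons interface. The records, the field names (`study`, `sex`, `has_imaging`), and the function `group_by_characteristics` are all hypothetical and were invented for this example.

```python
from collections import defaultdict

# Hypothetical participant records. The field names and values are
# made up for illustration and do not come from any NHLBI dataset.
RECORDS = [
    {"id": "P1", "study": "StudyA", "sex": "F", "has_imaging": True},
    {"id": "P2", "study": "StudyA", "sex": "M", "has_imaging": False},
    {"id": "P3", "study": "StudyB", "sex": "F", "has_imaging": True},
    {"id": "P4", "study": "StudyB", "sex": "F", "has_imaging": True},
]


def group_by_characteristics(records, keys):
    """Group record ids into cohorts that share the same values for `keys`.

    Returns a dict mapping a tuple of characteristic values to the list
    of record ids that have exactly those values.
    """
    cohorts = defaultdict(list)
    for record in records:
        # The cohort key is the combination of the chosen characteristics.
        cohort_key = tuple(record.get(key) for key in keys)
        cohorts[cohort_key].append(record["id"])
    return dict(cohorts)


# Build cohorts of participants who share sex and imaging availability.
cohorts = group_by_characteristics(RECORDS, ["sex", "has_imaging"])
for characteristics, ids in cohorts.items():
    print(characteristics, ids)
```

In this sketch, a researcher could then compare an outcome between cohorts to test a hypothesis. A real platform would apply the same idea to far larger, access-controlled datasets and would run the analysis where the data is stored.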

Learn more about the experts and organizations involved in DataSTAGE.


DataSTAGE is a joint effort of NHLBI and data science experts in academic institutions, research organizations, and industry. Harvard Medical School, Seven Bridges Genomics, the Renaissance Computing Institute, University of California Santa Cruz, the Broad Institute, and University of Chicago are working closely with the NHLBI Program Team to develop the platform.

DataSTAGE is governed by a steering committee that includes the development teams, NHLBI staff, and data producers and consumers. An external panel of experts will advise the NHLBI during the development and implementation of DataSTAGE and will provide linkages to the NIH Data Commons Pilot Phase Consortium.