Guidelines for NHLBI Data Set Preparation
The purpose of this document is to provide information and guidance in the preparation of NHLBI data repository datasets and associated documentation for submission to the Biological Specimen and Data Repository Information Coordinating Center (BioLINCC) in accordance with the NHLBI Policy for Data Sharing from Clinical Trials and Epidemiological Studies.
Refer to the NHLBI Clinical Research Guide Glossary for additional terms not defined here.
Data - Information collected and recorded from study participants through periodic examinations and follow-up contacts, not to include original specimens or images.
Commercial purpose - Data will be considered as being for a commercial purpose if they are to be used by an investigator who is an employee of a for-profit organization, if they are to be used by an investigator to satisfy a contractual relationship with a for-profit organization, or if they are to be used by an investigator as the basis for a consulting relationship with a for-profit organization. Data will also be considered as being for a commercial purpose if the investigator(s) take any affirmative steps to facilitate commercial use of results derived from the data.
Non-Commercial Purpose Data Set - A data set consisting of all records except those for participants who requested that their data not be shared beyond the initial study investigators.
Commercial Purpose Data Set - A data set consisting of all records except those for participants who requested that their data not be shared beyond the initial study investigators or used for commercial purposes.
Non-Commercial Purpose Pedigree/Genetic Data Set - A pedigree/genetic data set consisting of all pedigree and genetic data except those for participants who requested that their data not be shared beyond the initial study investigators.
Commercial Purpose Pedigree/Genetic Data Set - A pedigree/genetic data set consisting of all pedigree and genetic data except those for participants who requested that their data not be shared beyond the initial study investigators or used for commercial purposes.
Overview of Responsibilities in Preparing Data Sets for Sharing
Investigators in NHLBI studies covered by the Policy for Data Sharing from Clinical Trials and Epidemiological Studies are required, as part of the terms and conditions of their awards, to prepare and deliver to the NHLBI data sets that satisfy NHLBI requirements. These requirements include the elimination of personal identifiers and the modification of other data elements so as to reduce the likelihood that any individual participant can be identified. Additional requirements include the provision of adequate dataset documentation to enable the use of prepared datasets by outside investigators, as well as the submission of key study documents (protocol, data collection forms, manuals of procedures, etc.).
Two data sets, i.e., a Non-Commercial Purpose Data Set and a Commercial Purpose Data Set, and, if applicable, two pedigree/genetic data sets, i.e., a Non-Commercial Purpose Pedigree/Genetic Data Set and a Commercial Purpose Pedigree/Genetic Data Set, and associated documentation, must be provided in electronic form to the Institute. In addition, investigators must provide the Institute with two separate lists of participant identification numbers, one consisting of those participants who asked that their data not be shared beyond the initial study investigators and the other of those participants who asked that their data not be used for commercial purposes.
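The relationship between the two data sets and the two participant lists can be illustrated with a minimal sketch. It is not an NHLBI-provided program; the function name, the "participant_id" field, and the sample records are hypothetical and chosen only for illustration.

```python
def build_shared_data_sets(records, no_sharing_ids, no_commercial_ids):
    """Derive the two data sets from the full study records.

    records: list of dicts, each with a hypothetical "participant_id" key.
    no_sharing_ids: IDs of participants who asked that their data not be
        shared beyond the initial study investigators.
    no_commercial_ids: IDs of participants who asked that their data not
        be used for commercial purposes.
    """
    no_sharing = set(no_sharing_ids)
    no_commercial = set(no_commercial_ids)

    # Non-Commercial Purpose Data Set: everything except "no sharing" records.
    non_commercial_set = [
        r for r in records if r["participant_id"] not in no_sharing
    ]
    # Commercial Purpose Data Set: additionally drop "no commercial use" records.
    commercial_set = [
        r for r in non_commercial_set
        if r["participant_id"] not in no_commercial
    ]
    return non_commercial_set, commercial_set


# Hypothetical example data.
study_records = [
    {"participant_id": "P001", "visit": 1},
    {"participant_id": "P002", "visit": 1},
    {"participant_id": "P003", "visit": 1},
]
non_commercial, commercial = build_shared_data_sets(
    study_records, no_sharing_ids=["P001"], no_commercial_ids=["P002"]
)
print(len(non_commercial), len(commercial))
```

Under these definitions the Commercial Purpose Data Set is always a subset of the Non-Commercial Purpose Data Set, which is why the sketch filters the second set from the first.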
Investigators in ancillary studies based on ongoing (parent) studies that are required by this policy to produce data sets must submit ancillary study data to the NHLBI through the parent study coordinating center or data submission process established by the parent study.
Types of Data to be Included in NHLBI Repository Data Sets
In addition to summary information, data sets include for each participant those raw data elements (e.g., food item data, individual electrocardiographic lead scores, etc.) that have not otherwise been processed into summary information.
Guidelines for Redaction/Summarization of NHLBI Data Sets
The NHLBI requires that the data be provided in a manner that protects the privacy of study participants. The Institute requires appropriate documentation of the steps taken to protect their privacy in preparing a data set. A summary of all proposed modifications and deletions to be made to a data set must be submitted to and approved by the NHLBI Data Repository representative prior to their implementation.
The following guidelines provide a framework for decision-making regarding preparation of data sets:
Dataset and Study Documentation
Documentation for data sets must be comprehensive and sufficiently clear to enable investigators who are not familiar with a data set to use it. The documentation must include data collection forms, study procedures and protocols, descriptions of all variable recoding performed, and a list of major study publications.
In addition, a summary documentation file, usually called a "readme" file, is required. It must provide a complete overview of the data and a description of their use for investigators who are not familiar with the data set. It must also contain a brief description of the study (including a general orientation to the study, its components, and its examination and follow-up timeline), a listing of all files being provided, a description of system requirements, the program code for creating a SAS data set from the SAS export data file, and a frequency distribution for selected key variables.
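As an illustration of the frequency distribution mentioned above, the following is a minimal sketch of how counts and percentages for one key variable might be tabulated before being copied into the readme file. The function name, the record layout, and the "sex" variable are assumptions made for this example, not part of the NHLBI requirements.

```python
from collections import Counter


def frequency_table(records, variable):
    """Return (value, count, percent) rows for one variable.

    records: list of dicts, one per participant (hypothetical layout).
    variable: name of the key variable to tabulate.
    Missing values are counted under None so they remain visible.
    """
    counts = Counter(r.get(variable) for r in records)
    total = sum(counts.values())
    return [
        (value, n, round(100 * n / total, 1))
        for value, n in sorted(counts.items(), key=lambda kv: str(kv[0]))
    ]


# Hypothetical example data.
sample = [{"sex": "M"}, {"sex": "M"}, {"sex": "M"}, {"sex": "F"}]
for value, n, pct in frequency_table(sample, "sex"):
    print(f"{value}\t{n}\t{pct}%")
```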
Selected study documentation will be used to describe the study on the Data Repository website. Examples include Forms, Data Dictionaries, Descriptive Statistics, and the Study Protocol. These documents will need to be accessible to those with disabilities according to section 508 of the Rehabilitation Act. The HHS maintains a website devoted to 508 issues with links to resources on creating and checking accessibility at http://www.hhs.gov/web/508/index.html.
Format, Storage and Delivery of Study Materials
Both the comprehensive documentation and the summary documentation must be prepared in a consistent format, as WordPerfect, MS Word, ASCII, or portable document format (PDF) files, and included on the same storage medium as the data set. To ensure access by users with disabilities, all PDF files must be created in Adobe Acrobat version 5.0 or higher. Documentation that is not available in electronic form, such as data collection forms, should be scanned into a graphics file, converted to a PDF file using Adobe Acrobat version 5.0 or higher, and saved on the same medium as the data set. Pedigree data should be provided in a format readable by standard genetic analysis programs such as SAGE and SOLAR, with one individual's data per line beginning with pedigree identifier, individual's ID, father's ID, mother's ID, and individual's sex.
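The pedigree line layout described above can be sketched as follows. The guidelines specify only the order of the leading fields; the space delimiter, the sex codes, and the use of "0" for an unknown parent are assumptions made for this example and should be checked against the requirements of the target analysis program.

```python
MISSING_PARENT = "0"  # assumed placeholder for founders with no recorded parent


def format_pedigree_line(pedigree_id, individual_id, father_id, mother_id,
                         sex, *other_fields):
    """Build one line: pedigree ID, individual ID, father ID, mother ID, sex.

    Any further fields (e.g. phenotype values) follow the five leading
    fields in the order given. None for a parent is written as "0".
    """
    father = MISSING_PARENT if father_id is None else father_id
    mother = MISSING_PARENT if mother_id is None else mother_id
    fields = [pedigree_id, individual_id, father, mother, sex, *other_fields]
    return " ".join(str(f) for f in fields)


# Hypothetical three-person pedigree: two founders and one child.
pedigree = [
    ("FAM1", "1", None, None, "M"),
    ("FAM1", "2", None, None, "F"),
    ("FAM1", "3", "1", "2", "F"),
]
for person in pedigree:
    print(format_pedigree_line(*person))
```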
Data are to be stored on a CD-ROM unless the investigators and the NHLBI mutually agree upon an alternative storage medium.
Data and study materials are to be sent to the NHLBI prior to the end of funding according to the timelines described in the NHLBI Policy for Data Sharing from Clinical Trials and Epidemiological Studies.
The following links highlight NIH policy and related guidance on sharing of research data developed with NIH funding.
For questions or concerns regarding the content of this page, please contact the Clinical Research Policy Manager.
Last Updated: December 2011