Background: Development and Use of Study Quality Assessment Tools

Design of the quality assessment tools

Appraisal of individual study quality was based on tailored quality assessment tools developed jointly by methodologists from NHLBI and Research Triangle Institute International. The tools were based on quality assessment methods, concepts, and other tools developed by researchers in the Agency for Healthcare Research and Quality (AHRQ) Evidence-Based Practice Centers, the Cochrane Collaboration, the USPSTF, the Scottish Intercollegiate Guidelines Network, and the National Health Service Centre for Reviews and Dissemination, as well as consulting epidemiologists and others working in evidence-based medicine, with adaptations by methodologists and NHLBI staff for this project.

The tools were designed to assist reviewers in focusing on concepts that are key for critical appraisal of the internal validity of a study. The tools were not designed to provide a list of factors comprising a numeric score. The tools were specific to individual types of included study designs and are described in more detail below.

The tools included items for evaluating potential flaws in study methods or implementation, including sources of bias (e.g., patient selection, performance, attrition, and detection), confounding, study power, the strength of causality in the association between interventions and outcomes, and other factors. Quality reviewers could select "yes," "no," or "cannot determine/not reported/not applicable" in response to each item on the tool. For each item where "no" was selected, reviewers were instructed to consider the potential risk of bias that could be introduced by that flaw in the study design or implementation. Responses of "cannot determine" and "not reported" were also noted as representing potential flaws.
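As an illustration only, the item-level responses described above could be recorded and screened as in the following Python sketch. The item labels, the names Response, POTENTIAL_FLAW, and items_to_review, and the choice to split the combined "cannot determine/not reported/not applicable" option into three separate values are all assumptions made for this example and are not part of the tools. In keeping with the tools' design, the sketch lists items for the reviewer to consider and does not compute a numeric score.

```python
from enum import Enum


class Response(Enum):
    # Hypothetical encoding of the response options named in the text.
    YES = "yes"
    NO = "no"
    CANNOT_DETERMINE = "cannot determine"
    NOT_REPORTED = "not reported"
    NOT_APPLICABLE = "not applicable"


# Responses the text describes as signalling a potential flaw.
# "Not applicable" is deliberately left out (an assumption of this sketch).
POTENTIAL_FLAW = {Response.NO, Response.CANNOT_DETERMINE, Response.NOT_REPORTED}


def items_to_review(responses):
    """Return the item labels whose response signals a potential flaw.

    `responses` maps an item label to a Response. Order follows the input.
    No score is computed; the reviewer still judges the risk of bias.
    """
    return [item for item, answer in responses.items() if answer in POTENTIAL_FLAW]


# Hypothetical item labels, invented for illustration.
example = {
    "randomization described": Response.YES,
    "outcome assessors blinded": Response.NO,
    "dropout rate reported": Response.NOT_REPORTED,
    "adherence measured": Response.NOT_APPLICABLE,
}

flagged = items_to_review(example)
print(flagged)
```

The returned list is only a prompt for judgment: for each flagged item, the reviewer still has to consider how much bias that flaw could introduce.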

Each of the quality assessment tools had a detailed guidance document, which was also developed by the methodology team and NHLBI. The guidance documents were specific to each tool and provided more detailed descriptions and examples of application of the items, as well as justifications for each item's inclusion. For some items, examples were provided to clarify the intent of the question and the appropriate rater response.

Significance of the quality ratings of good, fair, or poor

Reviewers used their ratings on the range of items in each tool to judge each study to be of "good," "fair," or "poor" quality. The item ratings informed the reviewers' assessment of the risk of bias in the study due to flaws in study design or implementation.

In general terms, a "good" study has the least risk of bias, and results are considered to be valid. A "fair" study is susceptible to some bias deemed not sufficient to invalidate its results. The fair quality category is likely to be broad, so studies with this rating will vary in their strengths and weaknesses.

A "poor" rating indicates significant risk of bias. Studies rated poor were excluded from the body of evidence considered for each CQ. The only exception allowed was when no other evidence was available, in which case poor quality studies could be considered. However, this exception was not applied in this project because no situation arose in which only poor quality studies were available for the body of evidence for a particular CQ.
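The exclusion rule and its exception can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the function name evidence_body, the study identifiers, and the dictionary layout are hypothetical, and the sketch treats the exception as automatic, whereas the text says only that poor quality studies could be considered in that case.

```python
def evidence_body(ratings):
    """Select the studies that form the body of evidence for one CQ.

    `ratings` maps a (hypothetical) study identifier to "good", "fair",
    or "poor". Studies rated poor are excluded unless nothing else exists.
    """
    usable = [study for study, rating in ratings.items() if rating in ("good", "fair")]
    if usable:
        return usable
    # Exception described in the text: only poor quality studies are available.
    # Assumption of this sketch: they are then all retained for consideration.
    return [study for study, rating in ratings.items() if rating == "poor"]


# Hypothetical study identifiers, invented for illustration.
mixed = {"study_a": "good", "study_b": "poor", "study_c": "fair"}
only_poor = {"study_d": "poor", "study_e": "poor"}

print(evidence_body(mixed))
print(evidence_body(only_poor))
```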

Training for application of the quality assessment tools

The methodology team conducted a series of training sessions on the use of four of the quality assessment tools. Initial training consisted of two 2-day, in-person training sessions. Reviewers trained in quality rating were master's- or doctorate-level staff with a background in public health or health sciences. Training sessions provided instruction on identifying the correct study designs, the theory behind evidence-based research and quality assessment, explanations and rationales for the items in each tool, and methods for achieving overall judgments regarding quality ratings of good, fair, or poor. Participants engaged in interactive evaluation of multiple example articles, both with the instructors and during group work. Reviewers were also instructed to refer to related articles on study methods if such papers were cited in the articles being rated.

Following the in-person training sessions, the methodology team assigned several articles with pertinent study designs to test the abilities of each reviewer. The reviewers were asked to individually identify the correct study design, complete the appropriate quality assessment tool, and submit it to the methodology team for grading against a methodologist-developed key. A second round of training sessions was then conducted via telephone to review the results and resolve any remaining misinterpretations. Based on the results of these evaluations, a third round of exercises and training sessions was sometimes convened.

The quality assessment tools for before-after and case series studies were applied only for the Obesity Panel's CQ5, which addresses bariatric surgery interventions. This CQ included those study designs because of the different types of issues addressed for this surgical intervention. As a result, a formal training program for using these quality assessment tools was not conducted. Instead, training was more individualized and focused on reviewing the tool and guidance document with staff working on quality assessment for this CQ.

Quality assessment process

For all studies except systematic reviews and meta-analyses, each article that met the CQ's inclusion criteria was independently rated for quality by two reviewers using the appropriate tool. If the ratings differed, then reviewers discussed the article in an effort to reach consensus. If consensus was not achieved, then the article was forwarded to a methodologist for quality adjudication.

Quality rating of systematic reviews and meta-analyses was performed independently by two methodologists. If ratings differed, then reviewers discussed the article in an effort to reach consensus. When consensus was not achieved, the article was forwarded to a third methodologist for adjudication.
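The dual-rating workflow described above (two independent ratings, then discussion, then adjudication if needed) follows the same pattern for both groups of studies and can be sketched in Python. The function name resolve_rating and its parameters are hypothetical; the sketch records only the outcome of each step and does not model the discussion itself.

```python
VALID_RATINGS = ("good", "fair", "poor")


def resolve_rating(first, second, consensus=None, adjudicated=None):
    """Return the final quality rating for one article.

    first, second: the two independent ratings.
    consensus:     rating agreed on after discussion, or None if none was reached.
    adjudicated:   rating assigned by the adjudicating methodologist, if any.
    """
    for rating in (first, second):
        if rating not in VALID_RATINGS:
            raise ValueError(f"unknown rating: {rating!r}")
    for rating in (consensus, adjudicated):
        if rating is not None and rating not in VALID_RATINGS:
            raise ValueError(f"unknown rating: {rating!r}")

    if first == second:
        return first  # independent ratings agree; nothing to resolve
    if consensus is not None:
        return consensus  # disagreement settled by discussion
    if adjudicated is None:
        raise ValueError("ratings differ without consensus: adjudication required")
    return adjudicated  # disagreement settled by a methodologist


print(resolve_rating("good", "good"))
print(resolve_rating("good", "fair", consensus="fair"))
print(resolve_rating("fair", "poor", adjudicated="poor"))
```

Raising an error when the ratings differ and neither a consensus nor an adjudicated rating is supplied is a design choice of this sketch: it keeps an unresolved disagreement from silently defaulting to either reviewer's rating.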

Expert panel members could appeal the quality rating of a particular study or publication after the initial rating had been reported to them. However, to enhance the objectivity of the quality rating process, the final decision on quality ratings was made by the methodology team and not by the expert panel members.