Models based on the Framingham data have been widely used to predict the risk of cardiovascular disease and, by ATP III, to stratify individuals into risk categories. The accuracy of such models can be assessed in several ways, including calibration, or how well the predicted probabilities agree with actual risk, and discrimination, or how well the model can separate those who do and do not develop the disease. In assessing the addition of CRP to risk prediction models, attention in the medical literature has focused virtually exclusively on the second component of accuracy, namely model discrimination, and in particular on the c-statistic, also known as the c index or area under the ROC curve. The c-statistic describes how well the model can rank order cases and controls, and is not a function of the actual predicted probabilities. While sensitivity and specificity, upon which it is based, are natural parameters of interest in the case-control or diagnostic setting, they are less so in a cohort or prognostic setting.(1) It has been argued that sensitivity and specificity are properties of a test that are not subject to alteration by the prevalence of disease. This has been shown to be false, however, both theoretically and clinically.(2)
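As an illustration of the rank-based nature of the c-statistic described above, it can be computed directly from its definition: the proportion of case/control pairs in which the case received the higher predicted probability, with ties counted as one half. The following is a minimal sketch; the predicted risks and outcomes are invented toy values, not data from any study.

```python
def c_statistic(probs, outcomes):
    """c-statistic (area under the ROC curve) from its rank-based
    definition.  probs: predicted risks; outcomes: 1 = case, 0 = control."""
    cases = [p for p, y in zip(probs, outcomes) if y == 1]
    controls = [p for p, y in zip(probs, outcomes) if y == 0]
    concordant = 0.0
    for pc in cases:
        for pn in controls:
            if pc > pn:        # case ranked above control: concordant pair
                concordant += 1.0
            elif pc == pn:     # tie counts as half
                concordant += 0.5
    return concordant / (len(cases) * len(controls))

probs = [0.05, 0.10, 0.20, 0.40, 0.70]
outcomes = [0, 0, 1, 0, 1]
print(c_statistic(probs, outcomes))                   # 0.8333...
# Halving every predicted risk leaves the c-statistic unchanged,
# showing it depends only on ranks, not on the probabilities themselves.
print(c_statistic([p / 2 for p in probs], outcomes))  # 0.8333...
```

The second call makes the point of the paragraph concrete: a model whose predicted probabilities are badly miscalibrated can still have exactly the same c-statistic, because only the ordering matters.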
Because the c-statistic is based solely on ranks, it is less sensitive than measures based on the likelihood or other global measures of model fit.(3) In data from the Women's Health Study, the addition of HDL or LDL cholesterol to models including age, systolic blood pressure, and smoking raised the c-statistic from only 0.79 to 0.80.(4) The same was true for CRP, which was stronger than any of the lipids by likelihood-based measures but similar by the c-statistic. Thus, if the same criterion of c-statistic improvement were applied to lipids, they would not be included in the Framingham score. This does not mean that lipids, or CRP, are useless as risk biomarkers; rather, it indicates that the c-statistic should not be the only means of model assessment.
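The contrast between rank-based and likelihood-based assessment can be sketched with a toy example. The two "models" below are hypothetical: they assign the same ordering of risk (hence identical c-statistics), but one assigns cases much higher probabilities, which the Bernoulli log-likelihood detects and the c-statistic cannot.

```python
import math

def log_likelihood(probs, outcomes):
    """Bernoulli log-likelihood of the observed outcomes under the
    predicted risks (higher is better fit)."""
    return sum(y * math.log(p) + (1 - y) * math.log(1 - p)
               for p, y in zip(probs, outcomes))

def c_statistic(probs, outcomes):
    """Proportion of case/control pairs ranked concordantly."""
    cases = [p for p, y in zip(probs, outcomes) if y == 1]
    controls = [p for p, y in zip(probs, outcomes) if y == 0]
    score = sum(1.0 if pc > pn else 0.5 if pc == pn else 0.0
                for pc in cases for pn in controls)
    return score / (len(cases) * len(controls))

outcomes = [0, 0, 1, 0, 1]
model_a = [0.05, 0.10, 0.30, 0.20, 0.60]  # cases well separated from controls
model_b = [0.18, 0.19, 0.22, 0.20, 0.25]  # same ordering, but compressed

# Identical c-statistics (same ranks), yet model_a fits the outcomes
# far better by log-likelihood.
print(c_statistic(model_a, outcomes), c_statistic(model_b, outcomes))
print(log_likelihood(model_a, outcomes), log_likelihood(model_b, outcomes))
```

This is a stylized version of the phenomenon described in the paragraph: a marker that substantially improves the likelihood can leave the c-statistic essentially unchanged.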
A second aspect of model fit that has largely been ignored is calibration, the ability of the predicted risk to reflect the true risk accurately. There is, in fact, a trade-off between discrimination and calibration, and it is impossible for a model to be perfect in both.(5) Calibration may be the more important aspect of a prognostic model. Patients (and examining physicians) are interested in the probability of disease given the test result, rather than in the probability of a positive test given disease status.(6) The predictive value, or post-test probability, is thus more relevant for patient care,(7) and ATP III uses such estimates of 10-year risk in its treatment guidelines. If such risk stratification can be made more accurate, the model is improved. In the Women's Health Study, among women classified as having at least 5% 10-year risk using the Framingham covariables alone, more than 20% were more accurately reclassified into new risk strata when CRP was included in the model.(4) Clinically, this suggests that measuring CRP may be warranted, at least among those at intermediate risk, and that it could alter treatment decisions. Statistically, including CRP in risk prediction could lead to more accurate risk classification despite little change in the c-statistic. When the goal of a predictive model is to categorize individuals into risk strata, assessment of the model should be based on how well it achieves this aim.
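The reclassification idea above can be sketched as a cross-tabulation of risk strata assigned by a base model against strata assigned after adding a new marker. The stratum cut-points below (<5%, 5–10%, 10–20%, ≥20% 10-year risk) follow common usage; the predicted risks themselves are invented for illustration.

```python
from collections import Counter

CUTS = [0.05, 0.10, 0.20]  # stratum boundaries on predicted 10-year risk

def stratum(risk):
    """Map a predicted 10-year risk to a stratum index 0..3
    (<5%, 5-10%, 10-20%, >=20%)."""
    return sum(risk >= c for c in CUTS)

def reclassification_table(base_risks, new_risks):
    """Count individuals in each (old stratum, new stratum) cell."""
    return Counter((stratum(b), stratum(n))
                   for b, n in zip(base_risks, new_risks))

# Invented per-person risks: base model vs. base model plus a new marker.
base        = [0.04, 0.06, 0.08, 0.12, 0.18, 0.22]
with_marker = [0.04, 0.11, 0.06, 0.12, 0.25, 0.21]

table = reclassification_table(base, with_marker)
moved = sum(n for (old, new), n in table.items() if old != new)
print(moved, "of", len(base), "reclassified")
```

Whether such movement represents an improvement depends, of course, on whether the new strata better match observed event rates; the table only shows where individuals move.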
1. Gail MH, Pfeiffer RM. On criteria for evaluating models of absolute risk. Biostatistics. 2005;6:227-239.
2. Brenner H, Gefeller O. Variation of sensitivity, specificity, likelihood ratios and predictive values with disease prevalence. Stat Med. 1997;16:981-991.
3. Harrell FE Jr, Lee KL, Mark DB. Multivariable prognostic models: issues in developing models, evaluating assumptions and adequacy, and measuring and reducing errors. Stat Med. 1996;15:361-387.
4. Cook NR, Buring JE, Ridker PM. The effect of including C-reactive protein in cardiovascular risk prediction models for women. Ann Intern Med. 2006;145:21-29.
5. Diamond GA. What price perfection? Calibration and discrimination of clinical prediction models. J Clin Epidemiol. 1992;45:85-89.
6. Moons KGM, Harrell FE. Sensitivity and specificity should be de-emphasized in diagnostic accuracy studies. Acad Radiol. 2003;10:670-672.
7. Guggenmoos-Holzmann I, van Houwelingen HC. The (in)validity of sensitivity and specificity. Stat Med. 2000;19:1783-1792.