Health Services Research & Development

2011 HSR&D National Meeting Abstract

3051 — Accounting for Symptom Severity and Other Item Level Characteristics Yields a More Precise Measure of Depression

Kudel I (Cincinnati VAMC), Edwards MC (The Ohio State University), Justice AC (Veterans Affairs Connecticut Healthcare System), Tsevat J (Cincinnati VAMC)

Depression screening is routinely conducted in outpatient settings, but the statistical approach for deriving the score – summing responses across items – assumes that each item has the same psychometric properties and carries equal weight. Alternative state-of-the-art scoring methods such as item response theory (IRT) can differentially weight item-level characteristics such as symptom severity, thereby yielding more precise scores. In this study, we compared scores on a depression measure using the sum-score vs. an IRT-score.

Outpatients without HIV (N = 2813) enrolled in the Veterans Aging Cohort Study, an ongoing longitudinal, prospective study, responded to the PHQ-9, a 9-item measure used widely to screen for depression. The data were analyzed to produce a sum-score and an IRT-score. The latter procedure required two steps: 1) the Graded Response Model was fitted to estimate item properties, and 2) those item properties were applied to each patient's full set of responses to produce a score.
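The two-step IRT procedure can be sketched roughly as follows. This is a minimal illustration, not the authors' analysis: the item parameters in `ITEMS` are hypothetical values chosen only to run the example (the abstract does not report the estimated parameters), and expected a posteriori (EAP) scoring with a standard normal prior is an assumption, since the abstract does not name the scoring estimator. Step 1 (estimating item properties from data) is represented here only by the fixed parameter list; step 2 applies those parameters to a response pattern.

```python
import math

# Hypothetical Graded Response Model parameters for 9 items:
# (discrimination a, ordered thresholds b1 < b2 < b3).
# These values are illustrative assumptions, not estimates from the study.
ITEMS = [
    (1.2, (-0.8, 0.4, 1.4)),
    (1.4, (-0.6, 0.6, 1.6)),
    (1.6, (-0.4, 0.8, 1.8)),
    (1.8, (-0.2, 1.0, 2.0)),
    (1.5, (0.0, 1.1, 2.1)),
    (2.0, (0.2, 1.2, 2.2)),
    (1.7, (0.4, 1.4, 2.4)),
    (1.9, (0.6, 1.6, 2.6)),
    (2.2, (1.5, 2.2, 2.9)),
]


def grm_category_probs(theta, a, thresholds):
    """Probability of each response category (0..3) under the GRM."""
    # Cumulative probabilities P(X >= k), padded with 1 and 0 at the ends.
    cum = [1.0]
    cum += [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in thresholds]
    cum += [0.0]
    # Category probability is the difference of adjacent cumulative curves.
    return [cum[k] - cum[k + 1] for k in range(len(cum) - 1)]


def eap_score(responses, items, grid_min=-4.0, grid_max=4.0, n_points=81):
    """EAP estimate of theta on an evenly spaced grid, N(0, 1) prior."""
    step = (grid_max - grid_min) / (n_points - 1)
    numerator = 0.0
    denominator = 0.0
    for i in range(n_points):
        theta = grid_min + i * step
        weight = math.exp(-0.5 * theta * theta)  # unnormalized prior density
        for x, (a, thresholds) in zip(responses, items):
            weight *= grm_category_probs(theta, a, thresholds)[x]
        numerator += theta * weight
        denominator += weight
    return numerator / denominator


# Two response patterns with the same sum-score but different items endorsed.
pattern_a = [2, 0, 0, 0, 0, 0, 0, 0, 0]
pattern_b = [0, 0, 0, 0, 0, 0, 0, 0, 2]
score_a = eap_score(pattern_a, ITEMS)
score_b = eap_score(pattern_b, ITEMS)
print(sum(pattern_a), sum(pattern_b), score_a, score_b)
```

Because the score depends on which items were endorsed and not only on how many points were accumulated, the two patterns receive the same sum-score but different IRT scores under these assumed parameters.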

Veterans were predominantly male, African-American, and middle-aged. The sum-score procedure yielded all 28 possible scores (range 0-27). The first step of the IRT analyses ordered the 9 items from least severe (changes in sleep) to most severe (suicidal ideation). The second step yielded 931 discrete scores on a standard normal metric (mean = 0, SD = 1), ranging from -1.06 to 2.58. The number of IRT scores per sum-score ranged from 1 to 89, and each sum-score masked an average of 0.43 SD of IRT scores. Patients with a sum-score at the cut-point of 10 (scores >= 10 indicate major depression) differed by as much as 0.63 SD, indicating different levels of depression. Conversely, some patients with dissimilar sum-scores had very similar IRT scores. For example, two respondents had sum-scores of 7 and 10 but IRT scores of 0.484 and 0.485, respectively, indicating that they actually had comparable levels of depression.
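One way to see why a single sum-score can hide many distinct IRT scores is to count how many response patterns collapse onto each sum. The short sketch below enumerates every possible pattern for a 9-item measure with each item scored 0 to 3 (the standard PHQ-9 response format); the variable names are illustrative only. It counts patterns that are possible in principle, not the patterns observed in the study sample.

```python
from collections import Counter
from itertools import product

N_ITEMS = 9
CATEGORIES = range(4)  # each item is scored 0, 1, 2, or 3

# Tally how many distinct response patterns produce each sum-score.
patterns_per_sum = Counter(
    sum(pattern) for pattern in product(CATEGORIES, repeat=N_ITEMS)
)
total_patterns = sum(patterns_per_sum.values())

print(len(patterns_per_sum), "possible sum-scores")
print(total_patterns, "possible response patterns")
```

Only the extreme sum-scores (0 and 27) correspond to a single pattern; every other sum-score pools multiple patterns, each of which can map to a different IRT score.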

Modeling self-report data on the PHQ-9 using IRT produces scores that reflect more fine-grained differences among respondents and therefore provide more precise indicators of depression.

Applying IRT scoring to the PHQ-9 may revolutionize the detection of depression in primary care if it can be easily administered and proves better at identifying veterans with major depression.
