June 19, 2011

Violence risk meta-meta: Instrument choice does matter

Despite popularity, psychopathy test and actuarials not superior to other prediction methods 

The past couple of decades have seen an explosion of interest in forensic assessment of risk for future violent and sexual recidivism. Accordingly, evaluators can now choose from an array of more than 120 different risk assessment tools. But should this choice be based on individual preference, or are some instruments clearly superior to others?

Several systematic reviews and metaanalyses have addressed this question, but their conclusions often conflict. In the first systematic review of these reviews (called a “meta-review”), Jay Singh and Seena Fazel of Oxford University found that methodological shortcomings may contribute to the confusion. Problems they identified in the 40 metaanalyses and reviews they studied included authors' failure to adequately describe their study search procedures, failure to check for overlapping samples or publication bias, and failure to investigate the confound of sample heterogeneity.

The Oxford scholars, along with Martin Grann of Sweden's Centre for Violence Prevention, set out to rectify this problem with a more methodologically rigorous meta-review that followed best-practice data analysis and reporting procedures. To that end, they used the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA), a 27-item checklist designed to promote transparent and consistent reporting of results.

For their meta-meta (a metaanalysis of the metaanalyses), they collected data from 68 studies involving about 26,000 participants in 13 countries, focusing on the accuracy of the nine most commonly used forensic risk assessment instruments:
  • Psychopathy Checklist-Revised (PCL-R)
  • Static-99
  • Historical, Clinical, Risk Management-20 (HCR-20)
  • Violence Risk Appraisal Guide (VRAG)
  • Sexual Violence Risk-20 (SVR-20)
  • Level of Service Inventory-Revised (LSI-R)
  • Sex Offender Risk Appraisal Guide (SORAG)
  • Spousal Assault Risk Assessment (SARA)
  • Structured Assessment of Violence Risk in Youth (SAVRY)

Big differences in predictive validity

As it turns out, these widely used instruments vary substantially in predictive accuracy. The best performer was the SAVRY, a risk assessment instrument designed for use with adolescents. At the bottom were the Level of Service Inventory and the Psychopathy Checklist. This is not too surprising, as the LSI-R is used with a wide variety of general offenders, and the PCL-R was not designed for risk prediction in the first place.

The present metaanalysis therefore argues against the view of some experts that the PCL-R is unparalleled in its ability to predict future offending.

Statistical method matters: DOR outperforms AUC

The researchers compared several different methods of measuring predictive accuracy. They found that a popular statistic, the area under the receiver operating characteristic curve (AUC), was the weakest of these for comparing instruments. Reliance on the AUC may help to explain why some metaanalyses were unable to find significant differences among instruments, the authors theorize.
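
For readers unfamiliar with the AUC, here is a minimal sketch of what it measures, assuming each person has a single numeric score on a risk tool and a known outcome. The function name `auc` and the scores are hypothetical illustrations, not the authors' code or data. It uses the standard pairwise definition: the probability that a randomly chosen person who reoffended scored higher than a randomly chosen person who did not, with ties counting as half.

```python
from typing import Sequence


def auc(reoffender_scores: Sequence[float], non_reoffender_scores: Sequence[float]) -> float:
    """Hypothetical helper: AUC as the probability that a randomly chosen
    reoffender scores higher than a randomly chosen non-reoffender.

    Ties count as half a "win". This pairwise form is equivalent to the
    Mann-Whitney U statistic divided by the number of pairs.
    """
    if not reoffender_scores or not non_reoffender_scores:
        raise ValueError("both groups need at least one score")
    wins = 0.0
    # Compare every reoffender with every non-reoffender
    for r in reoffender_scores:
        for n in non_reoffender_scores:
            if r > n:
                wins += 1.0
            elif r == n:
                wins += 0.5
    return wins / (len(reoffender_scores) * len(non_reoffender_scores))


if __name__ == "__main__":
    # Made-up total scores on some risk tool, not data from the study
    reoffended = [8, 6, 5]
    did_not = [7, 4, 3, 2]
    print(round(auc(reoffended, did_not), 3))  # prints 0.833
```

No cut-off appears anywhere in this calculation: the AUC summarizes how well scores rank people across all possible thresholds, whereas predictive values and odds ratios depend on where the high-risk line is drawn.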

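Better methods for comparing instruments’ predictive accuracy include calculating positive and negative predictive values (PPV and NPV) and the Diagnostic Odds Ratio (DOR). The PPV is the proportion of people classified as high risk who go on to offend; the NPV is the proportion classified as low risk who do not. The DOR is the odds that a tool gives a positive (high-risk) result to someone who goes on to offend, divided by the odds that it gives a positive result to someone who does not: true positives over false negatives, divided by false positives over true negatives. The authors’ summary performance scores pooled results from all four statistics (AUC, PPV, NPV, and DOR).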
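
As a worked illustration, the sketch below computes PPV, NPV and the DOR from a single 2x2 table of a tool's high-risk/low-risk calls against observed outcomes. The class name `TwoByTwo` and all counts are invented for illustration; they are not taken from the study, and this is not the authors' pooling procedure.

```python
from dataclasses import dataclass


@dataclass
class TwoByTwo:
    """Hypothetical 2x2 table of a tool's high/low-risk calls versus outcome."""
    tp: int  # flagged high risk, went on to offend
    fp: int  # flagged high risk, did not offend
    fn: int  # flagged low risk, went on to offend
    tn: int  # flagged low risk, did not offend

    def ppv(self) -> float:
        # Share of high-risk calls that turned out correct
        return self.tp / (self.tp + self.fp)

    def npv(self) -> float:
        # Share of low-risk calls that turned out correct
        return self.tn / (self.tn + self.fn)

    def dor(self) -> float:
        # Odds of a high-risk call among those who offended (TP/FN),
        # divided by the same odds among those who did not (FP/TN).
        # Raises ZeroDivisionError if fn, fp, or tn is zero.
        return (self.tp / self.fn) / (self.fp / self.tn)


if __name__ == "__main__":
    # Invented counts for illustration only
    table = TwoByTwo(tp=30, fp=20, fn=10, tn=40)
    print(table.ppv(), table.npv(), table.dor())  # prints 0.6 0.8 6.0
```

A DOR of 1 means a high-risk label is no more likely for people who offend than for people who do not; larger values mean better discrimination. Moving the cut-off changes all three numbers, so they are only meaningful at a stated threshold.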

Actuarials not superior; race also matters

The poor performance of the Psychopathy Checklist (PCL-R) was not the only finding that may surprise some forensic evaluators. The researchers also found no evidence that actuarial tools – such as the widely touted Static-99 – outperform structured clinical judgment methods like the HCR-20 or the SVR-20.

They also found that an offender's race is critical to predictive accuracy. Risk assessment instruments perform best on white offenders, most likely because white offenders predominate in the underlying studies. This is consistent with other research showing that instruments lose accuracy in populations unlike those they were developed with, including a study by Dernevik and colleagues finding that risk assessment instruments are poor at predicting misconduct in terrorists.

Caution is therefore warranted when using any risk assessment tool to predict offending in samples dissimilar to its validation samples, the authors stress.

This systematic review appears to be the most methodologically rigorous study of its kind to date in a rapidly evolving field. I recommend obtaining both articles (see below for author contact information) and studying them carefully. The stakes are high, and it behooves us to use the instruments that are most accurate for the specific purpose at hand.

The studies are:
