Rating the articles
The quality of reporting for each tool was rated using a formal set of criteria adopted from the STARD criteria for rating and reporting on diagnostic accuracy. Those criteria were further adapted for the purposes of this project. Complete and accurate reporting allows the reader to assess potential bias and judge the generalizability and applicability of the results. The STARD checklist does not yield an assessment of the research findings per se (i.e., a summary of how well a test actually performs against the reference standard), but rather an assessment of the quality of reporting of essential features of all phases of a validation study. Results from the checklist, therefore, allow a reader to assess the likelihood that the results are unbiased and applicable to their own situation.
In Figure 3, the tools are ordered based on the STARD ratings. The left side of the chart shows the results for the four tools that serve the dual function of screening for both mental health and substance use disorders (POSIT, DPS, GSS and the DUSI-R). The top scorers among these four were the POSIT and the DPS, each with a score of 26, followed closely by the GSS with a score of 22. For the mental health tools that do not include a screening for substance use disorders, the majority yielded similar STARD scores (ranging from 23 to 26), with the two notable exceptions of the GHQ and the ECI-4 (scoring 18).
Figure 3: STARD Ratings for Articles on Mental Health-Related Tools
Figure 4 shows the ratings for the substance use-related screening tools, again with the "dual-function" tools grouped on the left, and those aimed exclusively at substance use disorders on the right. The STARD scores for the dual-function tools are the same as those shown in Figure 3, since these are the same articles. For the substance use tools, the CRAFFT stands out with a score of 29; indeed, it received the highest rating across all the tools being reviewed. The RAFFT follows with a rating of 25, and the remainder fall closer to 20, with the exception of the RAPI, which scored only 9. All the substance use-specific tools, with the exception of the CRAFFT, were the subject of only one article meeting our project criteria for validation design. This stands in contrast to many of the mental health-related tools in Figure 3.
Figure 4: STARD Ratings for Articles on Substance Use-Related Tools
To get a better sense of relative scoring on the STARD, we looked across Figures 3 and 4, and also across the dual-function versus single-function tools, and derived three groups as follows:
- "high" scores: 25 and higher (POSIT, DPS, PSC, Y-OQ-12, SDQ, CRAFFT and RAFFT)
- "moderate" scores: 20-24 (GSS, DUSI-R, YSR, RCQ, CSI-4, ACK, PRO, MAC-R)
- "low" scores: less than 20 (GHQ, ECI-4, RAPI).
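The grouping rule above can be written as a simple threshold function. The following is a minimal sketch for illustration only, not part of the project's method: the function name `stard_group` and the dictionary `reported_scores` are hypothetical, and the dictionary holds only those scores stated explicitly in the text.

```python
def stard_group(score: int) -> str:
    """Assign a STARD score to one of the three relative groups.

    Cut-offs follow the groups listed above: 25 and higher is "high",
    20-24 is "moderate", and less than 20 is "low".
    """
    if score >= 25:
        return "high"
    if score >= 20:
        return "moderate"
    return "low"


# Hypothetical container; only scores given explicitly in the text are included.
reported_scores = {
    "POSIT": 26,
    "DPS": 26,
    "GSS": 22,
    "CRAFFT": 29,
    "RAFFT": 25,
    "RAPI": 9,
}

# Map each tool to its relative group.
groups = {tool: stard_group(score) for tool, score in reported_scores.items()}

for tool, group in groups.items():
    print(f"{tool}: {reported_scores[tool]} ({group})")
```

The cut-offs are relative to the scores observed in this review, so the function describes how the tools compare with one another rather than any absolute standard of reporting quality.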
As already emphasized, the STARD scores, and the groupings based on relative magnitude of the scores, reflect the thoroughness of the reporting of key features of the research and not necessarily the quality of the screening test itself, its psychometric performance and other factors related to its value in certain settings and populations.