Diagnostic accuracy of chest X-ray interpretation for tuberculosis by three artificial intelligence-based software in a screening use-case: an individual patient meta-analysis of global data

Author/s: Sandra V. Kik, Sifrash M. Gelaw, Morten Ruhwald, Rinn Song, Faiz Ahmad Khan, Rob van Hest, Violet Chihota, Nguyen Viet Nhung, Aliasgar Esmail, Anna Marie Celina Garfin, Guy B. Marks, Olga Gorbacheva, Onno W. Akkerman, Kgaugelo Moropane, Le Thi Ngoc Anh, Keertan Dheda, Greg J. Fox, Nina Marano, Knut Lönnroth, Frank Cobelens, Andrea Benedetti, Puneet Dewan, Stefano Ongarello, Claudia M. Denkinger
Language: English
Publication Type: Scientific report (Journal)(External)


Abstract

Background
Chest X-ray (CXR) screening is a useful tool for testing individuals at high risk of tuberculosis (TB), yet image interpretation requires trained human readers, who are in short supply in many high TB burden countries. CXR interpretation by computer-aided detection (CAD) software may help overcome this shortage, but evidence of its accuracy is still limited.

We established a CXR library with images and metadata from individuals and risk groups who underwent TB screening in a variety of countries, and used it to assess the diagnostic accuracy of three commercial CAD solutions through an individual participant meta-analysis.

Methods and findings
We collected digital CXRs and demographic and clinical data from 6 source studies involving a total of 2756 participants, 1753 (64%) of whom also had microbiological test information. All CXR images were analyzed with CAD4TB v6 (Delft Imaging), Lunit Insight CXR TB algorithm v4.9.0 (Lunit Inc.), and qXR v2 (Qure.ai) and re-read by an expert radiologist who was blinded to the initial CXR reading, the CAD scores, and participant information. Although CAD performance varied across source studies, the pooled, meta-analyzed summary receiver operating characteristic (ROC) curves of the three products against a microbiological reference standard were similar, with areas under the curve (AUCs) of 76.4 (95% CI 72.1-80.3) for CAD4TB, 83.3 (95% CI 78.4-87.2) for Lunit, and 76.4 (95% CI 72.1-80.3) for qXR. Neither the CAD products nor the radiologists met the triage-test targets of 90% sensitivity and 70% specificity. At the same sensitivity as the expert radiologist (94.0%), all CAD products had slightly lower point estimates for specificity: 22.4% (95% CI 16.9-29.0) for CAD4TB, 34.6% (95% CI 25.3-45.1) for qXR, and 41.0% (95% CI 30.1-53.0) for Lunit, compared with 45.6% for the expert radiologist. At the same specificity as the expert radiologist (45.6%), all CAD products had lower point estimates for sensitivity, but their CIs overlapped with the radiologist's sensitivity estimate.
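The comparison above rests on three steps: estimating an AUC from continuous CAD scores, choosing a CAD threshold that matches a reference reader's sensitivity and reading off the resulting specificity, and checking an operating point against the 90%/70% triage targets. The following is a minimal single-dataset sketch of those steps, assuming binary labels (1 = microbiologically confirmed TB, 0 = not confirmed) and "score >= threshold" meaning CAD-positive. The function names and the toy data are hypothetical illustrations, not study data, and the sketch does not reproduce the authors' meta-analytic pooling of summary ROC curves across the six source studies.

```python
from typing import Sequence, Tuple


def rates_at(scores: Sequence[float], labels: Sequence[int], threshold: float) -> Tuple[float, float]:
    """Return (sensitivity, specificity) when scores >= threshold are called CAD-positive."""
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= threshold)
    fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s < threshold)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < threshold)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


def match_sensitivity(scores: Sequence[float], labels: Sequence[int],
                      target_sens: float) -> Tuple[float, float, float]:
    """Return (threshold, sensitivity, specificity) at the highest threshold reaching target_sens.

    Lowering the threshold can only raise sensitivity and lower specificity,
    so the highest qualifying threshold gives the best specificity among
    thresholds that meet the sensitivity target.
    """
    for t in sorted(set(scores), reverse=True):
        sens, spec = rates_at(scores, labels, t)
        if sens >= target_sens:
            return t, sens, spec
    raise ValueError("target sensitivity is not reachable with these scores")


def empirical_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Mann-Whitney estimate of AUC: P(case score > non-case score), ties counted as 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def meets_triage_targets(sens: float, spec: float,
                         min_sens: float = 0.90, min_spec: float = 0.70) -> bool:
    """Check an operating point against the triage-test targets (90% sensitivity, 70% specificity)."""
    return sens >= min_sens and spec >= min_spec


# Invented toy data for illustration only (8 readings, 4 confirmed TB cases).
TOY_SCORES = [0.95, 0.80, 0.70, 0.60, 0.40, 0.30, 0.20, 0.10]
TOY_LABELS = [1, 1, 0, 1, 0, 1, 0, 0]

print(empirical_auc(TOY_SCORES, TOY_LABELS))            # → 0.8125
t, sens, spec = match_sensitivity(TOY_SCORES, TOY_LABELS, 0.75)
print((t, sens, spec))                                   # → (0.6, 0.75, 0.75)
print(meets_triage_targets(sens, spec))                  # → False
```

Matching at a fixed specificity, as in the last comparison above, is the mirror image: scan thresholds from low to high and stop at the first one whose specificity reaches the target. On discrete data an exact match is often impossible, which is why the sketch accepts the first threshold that reaches at least the target.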

Conclusions
We showed that, overall, three commercially available CAD products had reasonable diagnostic accuracy for microbiologically confirmed pulmonary TB and may achieve a sensitivity and specificity that approximate those of experienced radiologists. While threshold setting and cost-effectiveness modelling are needed to inform the optimal implementation of CAD products as part of screening programs, the availability of CAD will assist in scaling up active case finding for TB and hence contribute to TB elimination in high TB burden settings.

Publisher
medRxiv