Abstract:
Research on deep learning models for chest radiology applications has attracted increasing public attention. However, most studies develop and evaluate models on in-domain data, so a significant drawback in real-world scenarios is that deployment data often do not match the training set. Consequently, some models perform worse at the deployment stage. This work examines the effects of dataset mismatch in chest radiography and analyzes methods to overcome the mismatch. The lung balance contrast enhancement technique (lung BCET) was developed to automatically identify the lung region and normalize the image accordingly, improving robustness on out-of-domain data. Additionally, augmentation methods suitable for chest radiography were explored. Data on tuberculosis (TB), COVID-19, and pneumonia were compiled from multiple datasets to evaluate and compare the preprocessing and augmentation methods using the area under the receiver operating characteristic curve (AUC) and heatmap quality. Under out-of-domain testing conditions, the lung BCET preprocessing method achieved the highest AUC scores, 0.7978 and 0.6240 on the Maesot and Bureau of TB (BT) datasets, respectively. However, no differences in performance were observed on the COVID-19 and pneumonia datasets. Our study also found that lung BCET can be used for data augmentation in conjunction with standard augmentation techniques to improve performance in both in-domain and out-of-domain conditions on the TB datasets.
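
As an illustration of the lung-region normalization idea, the following is a minimal sketch rather than the method evaluated in this work. It assumes that a binary lung mask is already available (the automatic lung identification step is not shown), that the statistics are taken only from pixels inside the mask, and that the classical parabolic BCET mapping y = a(x - b)^2 + c is then applied to the whole image. The function name `lung_bcet` and the target values `low`, `high`, and `target_mean` are hypothetical choices made for this example.

```python
import numpy as np

def lung_bcet(image, lung_mask, low=0.0, high=255.0, target_mean=110.0):
    """Parabolic BCET fitted on lung pixels only (illustrative sketch).

    image: 2-D array of pixel intensities.
    lung_mask: boolean array of the same shape, True inside the lungs
               (assumed to come from a separate segmentation step).
    low, high, target_mean: desired min, max, and mean of the lung region
               after the mapping (hypothetical defaults).
    """
    x = image.astype(np.float64)
    roi = x[lung_mask]

    l, h = roi.min(), roi.max()   # input min and max inside the lungs
    e = roi.mean()                # input mean inside the lungs
    s = np.mean(roi ** 2)         # input mean square inside the lungs
    if h == l:
        raise ValueError("lung region has constant intensity")

    denom = 2.0 * (h * (target_mean - low) - e * (high - low)
                   + l * (high - target_mean))
    if abs(denom) < 1e-12:
        # Degenerate case: the parabola reduces to a linear stretch.
        return low + (x - l) * (high - low) / (h - l)

    # Coefficients of y = a * (x - b)^2 + c such that the lung region
    # maps l -> low, h -> high, and its mean -> target_mean.
    b = (h ** 2 * (target_mean - low) - s * (high - low)
         + l ** 2 * (high - target_mean)) / denom
    a = (high - low) / ((h - l) * (h + l - 2.0 * b))
    c = low - a * (l - b) ** 2

    # The mapping is applied to every pixel; values outside the lungs are
    # not clipped here, so a caller may wish to clip to [low, high].
    return a * (x - b) ** 2 + c
```

In this sketch, pixels outside the lungs do not influence the fitted coefficients, so differences in borders, annotations, or body habitus between datasets would affect the normalization less than they would with whole-image statistics.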