TY - JOUR
T1 - Biases associated with database structure for COVID-19 detection in X-ray images
AU - Arias-Garzón, Daniel
AU - Tabares-Soto, Reinel
AU - Bernal-Salcedo, Joshua
AU - Ruz, Gonzalo A.
N1 - Funding Information:
We would like to thank Universidad Autónoma de Manizales for supporting this paper as part of the project “Detección de COVID-19 en imágenes de rayos X usando redes neuronales convolucionales” (code 699-106), and Minciencias for funding this project under call No. 874 of 2020, named “Convocatoria para el Fortalecimiento de Proyectos en ejecución de CTeI en Ciencias de la Salud con Talento Joven e Impacto Regional”. We also thank the projects “CH-T1246: Oportunidades de Mercado para las Empresas de Tecnología - Compras Públicas de Algoritmos Responsables, Éticos y Transparentes” and ANID PIA/BASAL FB0002, which helped with applying the ethical tools in this paper.
Publisher Copyright:
© 2023, The Author(s).
PY - 2023/12
Y1 - 2023/12
N2 - Several artificial intelligence algorithms have been developed for COVID-19-related topics. A common one is COVID-19 diagnosis from chest X-rays, where the eagerness to obtain early results has led to the construction of a series of datasets in which bias management has not been thorough with respect to patient information, capture conditions, class imbalance, and careless mixing of multiple datasets. This paper analyses 19 datasets of COVID-19 chest X-ray images, identifying potential biases. Moreover, computational experiments were conducted using one of the most popular datasets in this domain, achieving 96.19% classification accuracy on the complete dataset. Nevertheless, when evaluated with the ethical tool Aequitas, it fails on all the metrics. Ethical tools enhanced with distribution and image quality considerations are key to developing or choosing a dataset with fewer bias issues. We aim to provide a broad study of dataset problems and tools, along with suggestions for future dataset development and COVID-19 applications using chest X-ray images.
AB - Several artificial intelligence algorithms have been developed for COVID-19-related topics. A common one is COVID-19 diagnosis from chest X-rays, where the eagerness to obtain early results has led to the construction of a series of datasets in which bias management has not been thorough with respect to patient information, capture conditions, class imbalance, and careless mixing of multiple datasets. This paper analyses 19 datasets of COVID-19 chest X-ray images, identifying potential biases. Moreover, computational experiments were conducted using one of the most popular datasets in this domain, achieving 96.19% classification accuracy on the complete dataset. Nevertheless, when evaluated with the ethical tool Aequitas, it fails on all the metrics. Ethical tools enhanced with distribution and image quality considerations are key to developing or choosing a dataset with fewer bias issues. We aim to provide a broad study of dataset problems and tools, along with suggestions for future dataset development and COVID-19 applications using chest X-ray images.
UR - http://www.scopus.com/inward/record.url?scp=85149369133&partnerID=8YFLogxK
U2 - 10.1038/s41598-023-30174-1
DO - 10.1038/s41598-023-30174-1
M3 - Article
C2 - 36859430
AN - SCOPUS:85149369133
SN - 2045-2322
VL - 13
JO - Scientific Reports
JF - Scientific Reports
IS - 1
M1 - 3477
ER -