A systematic literature review of explainable risk assessment models for bronchial asthma
Review Article


Cessie Valenie Anak Edwin, Phei Chin Lim, Dayang Nurfatimah binti Awang Iskandar, Diana Leh Ching Ng

Faculty of Computer Science and Information Technology, University Malaysia Sarawak (UNIMAS), Sarawak, Malaysia

Contributions: (I) Conception and design: CVA Edwin, PC Lim; (II) Administrative support: PC Lim; (III) Provision of study materials or patients: None; (IV) Collection and assembly of data: CVA Edwin; (V) Data analysis and interpretation: CVA Edwin, PC Lim; (VI) Manuscript writing: All authors; (VII) Final approval of manuscript: All authors.

Correspondence to: Cessie Valenie Anak Edwin, BSc (Hons); Phei Chin Lim, BSc, MSc, PhD. Faculty of Computer Science and Information Technology, University Malaysia Sarawak (UNIMAS), 94300, Kota Samarahan, Sarawak, Malaysia. Email: cessievalenieedwin@gmail.com; pclim@unimas.my.

Background: The reduced quality of life and risk to life brought on by bronchial asthma (BA) have heightened the need for trustworthy risk assessment solutions with deliberate interpretability and transparency. Improper management of BA, such as ignoring symptoms, improper inhaler technique, or recent admissions to the intensive care unit (ICU), puts a patient at a higher risk of future asthma exacerbations, complications, or even death. This paper details a systematic review of recent literature to identify and analyse current explainable artificial intelligence (XAI) risk assessment models used in BA or for the assessment of risk in healthcare using XAI.

Methods: A systematic review of English-language literature was conducted through Science Direct, Association for Computing Machinery (ACM) Digital Library, Springer, PubMed, and Scopus for publications between January 1, 2019 and October 26, 2023. All studies that incorporated XAI or risk assessment models for BA or health were included in this review. Combinations and permutations of the following search terms were used: “explainable artificial intelligence”, “risk assessment”, “risk assessment model”, “asthma”, and “health”.

Results: A total of 43 studies were included after screening 689 records combined from the specified sources, with duplicates and materials not meeting the inclusion criteria removed. Among them, five studies conducted research on asthma, while seven conducted research on lung-related diseases using explainable machine learning (ML) or deep learning (DL) techniques. Across the 12 most relevant of the 43 studies, extreme gradient boosting (XGBoost) showed the strongest performance, outperforming the other models in two of the three comparisons in which it appeared. The most common output was risk prediction (36 studies), followed by diagnosis (seven studies) and classification (one study).

Conclusions: XAI has been used within the domain of asthma for diagnosis or prediction of future hospital visits; however, there is a scarcity of studies on explainable predictive models for asthma exacerbation risks. Research on XAI within this domain has the potential to contribute towards explainability in asthma risk prediction.

Keywords: Explainable artificial intelligence (XAI); risk assessment model; bronchial asthma (BA); healthcare


Received: 30 June 2024; Accepted: 25 August 2025; Published online: 15 December 2025.

doi: 10.21037/jmai-24-204


Highlight box

Key findings

• Explainable artificial intelligence (XAI) has significant application in asthma risk prediction.

• Random forest was the most frequently applied machine learning (ML) model in the recent literature.

• Extreme gradient boosting was the ML model with the best performance across the 12 most relevant studies.

What is known and what is new?

• Future risk of asthma exacerbation, or a diagnosis of high risk in a patient, is usually assessed by the patient’s doctor or by using a written asthma action plan formulated by their doctor.

• XAI has shown promise in the prediction of risk with deliberation towards accuracy, interpretation, explainability, and patient risk assessment.

What is the implication, and what should change now?

• Research in XAI for risk prediction is seeing rapid development.

• Successful integration of XAI for clinical use in risk prediction of asthma will require clinical confirmation.


Introduction

Background

Bronchial asthma (BA) is a chronic respiratory disease that affects an estimated 300 million people worldwide and causes around 1,000 deaths each day (1). This long-term condition impacts the lives of individuals living with asthma, reducing their quality of life or even putting them at risk of death. Improper management of an individual’s asthma condition, such as ignoring symptoms, may put that person at risk of experiencing asthma exacerbations, which are severe episodes of asthma symptoms that make it difficult to breathe due to the narrowing of the lung’s airways, inflammation, swelling, and increased production of mucus (2). Each patient would have a written asthma action plan formulated specifically for them by their doctor to allow them to monitor their own symptoms and react accordingly (1). A written asthma action plan specifies the dose of medication for a patient and the next action to be taken, such as going to the nearest hospital if the condition worsens. It also allows patients to check whether they are in the green zone, the yellow zone, or the red zone, with green indicating “good condition”, yellow indicating “getting worse”, and red indicating that patients should be on alert as their condition has deteriorated further. Inattentiveness towards asthma symptoms can be life-threatening. Artificial intelligence (AI) has been broadly used in the medical field, such as in medical imaging and other healthcare tasks (3), and has been utilised in the care of BA for the diagnosis and screening of asthma, classifying patients, managing and monitoring asthma, and treating asthma (4). It is expected to enhance medical care using machine learning (ML) or deep learning (DL) models. However, AI is still confronted with issues regarding transparency, interpretability, and explainability, which could affect users’ trust towards the AI model (5).
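The green/yellow/red zoning described above can be sketched as a simple rule. In the sketch below, the 80% and 50% cut-offs of personal-best peak flow are typical illustrative thresholds only; a real action plan's values are set by the patient's doctor:

```python
def asthma_zone(peak_flow: float, personal_best: float) -> str:
    """Classify a peak-flow reading into an action-plan zone.

    The 80%/50% thresholds are illustrative; actual cut-offs come from
    the patient's written asthma action plan."""
    ratio = peak_flow / personal_best
    if ratio >= 0.8:
        return "green"   # good condition: continue usual medication
    if ratio >= 0.5:
        return "yellow"  # getting worse: follow step-up instructions
    return "red"         # alert: seek urgent medical care
```

For example, a reading of 300 L/min against a personal best of 500 L/min would fall in the yellow zone under these illustrative cut-offs.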
The complex inference mechanisms of AI models can make them incomprehensible to their users. This limitation is known as the “black box” problem: the AI model processes input data through mathematical algorithms and produces output data, but it is not always known how the model reached the decision or prediction it made. These are the limitations that research in explainable AI (XAI) seeks to address, especially within the medical sector, where risk prediction models are required to be accurate, trustworthy, and precise.
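As an illustration of the post-hoc, model-agnostic explanation style discussed throughout this review, the sketch below applies permutation importance to a hypothetical black-box risk scorer. The scorer, its features, and its weights are invented for illustration; real XAI toolkits such as SHAP or LIME operate on trained models in an analogous way:

```python
import random

# Hypothetical black-box asthma risk scorer (illustrative stand-in for a
# trained ML model; the features and weights are invented for this sketch).
def black_box_risk(symptom_freq, inhaler_misuse, recent_icu):
    return 0.5 * symptom_freq + 0.3 * inhaler_misuse + 0.2 * recent_icu

FEATURES = ["symptom_freq", "inhaler_misuse", "recent_icu"]

def permutation_importance(model, rows, n_repeats=20, seed=0):
    """Post-hoc, model-agnostic explanation: shuffle one feature at a time
    and measure how much the model's predictions move on average."""
    rng = random.Random(seed)
    baseline = [model(*r) for r in rows]
    importances = {}
    for j, name in enumerate(FEATURES):
        total = 0.0
        for _ in range(n_repeats):
            col = [r[j] for r in rows]
            rng.shuffle(col)
            shuffled = [r[:j] + (v,) + r[j + 1:] for r, v in zip(rows, col)]
            preds = [model(*r) for r in shuffled]
            total += sum(abs(a - b) for a, b in zip(baseline, preds)) / len(rows)
        importances[name] = total / n_repeats
    return importances

data_rng = random.Random(42)
patients = [tuple(data_rng.random() for _ in range(3)) for _ in range(200)]
scores = permutation_importance(black_box_risk, patients)
# Features weighted more heavily inside the (hidden) model perturb the
# predictions more when shuffled, so they receive larger importances.
```

The explanation treats the model purely as an input-output mapping, which is what makes the technique applicable even when the model's internal mechanism is opaque.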

Rationale and knowledge gap

The increasing interest in using AI for risk prediction within a clinical setting extends the need for the models to be explainable. The scarcity of recent publications on XAI application in risk prediction for BA is highlighted within this literature review. Although many studies have used AI for the diagnosis of asthma and lung-related diseases, and for prediction of other diseases in the healthcare sector, there is a lack of literature specifically on XAI for prediction of risk in BA. This paper aims to review the existing literature and find connections between the studies by sectioning and examining them to procure detailed insight into XAI-based risk assessment models, examining the models used, their architecture, their performance, the input data, and the outputs of the models, before concluding with a potentially suitable XAI risk assessment model for BA.

Objective

The objectives of this review were to analyse the current XAI risk assessment models for BA and their performance, identify the optimum explanation technique, and conclude which XAI risk assessment model would work best for risk assessment in BA. This includes any risks to patients that are related to BA, including predicting the risk of the next asthma exacerbation, predicting whether a patient is at higher or lower risk, which would indicate the need for alertness towards the patient’s potential need for hospitalization, and anything else related to assessing risk in BA. We present this article in accordance with the PRISMA reporting checklist (available at https://jmai.amegroups.com/article/view/10.21037/jmai-24-204/rc).


Methods

The systematic literature review was conducted in eight steps, using the methods in a paper by Yu and Watson (6) as a guide. Figure 1 summarises the literature search process for the selection of XAI models. Figure 2 shows the PRISMA flow chart of the review process. The search process is divided into three stages: the first stage, indicated by the blue box, comprises the planning stage with steps 1 and 2, addressing the research problem and specifying the review protocols. The second stage, indicated by the green box, is the literature review itself with steps 3 to 7: literature identification, inclusion screening, eligibility, extracting data, and analysing the data. Finally, the last stage, indicated by the yellow box, is step 8, reporting the findings.

Figure 1 Systematic literature search process. AI, artificial intelligence; XAI, explainable artificial intelligence.
Figure 2 PRISMA flow chart of the systematic literature review process. ACM, Association for Computing Machinery.

Step 1: addressing the research problem

The first step in this systematic literature review is to detail the research problem, which in turn steers the direction of the review (7). It is essential to set a clear boundary and scope to narrow down the problem at hand. A major problem in AI is the lack of interpretability, also known as the black box problem. The demand for interpretability in AI models is increasing, especially in the medical field (8). This research provides an important opportunity to contribute to the field of XAI for risk assessment in BA by exploring current related literature, the data used for feature selection, the performance of the models used, and the current explanation models used, before drawing a conclusion for the next step in future research.

Step 2: specifying the review protocols

The review protocols are specified before starting the literature search. They are constructed by considering the purpose of the study, the research questions, the inclusion criteria, the search strategies, how the screening of the literature would be conducted, the procedure for data procurement, the analysis, and the reporting of findings (9,10). To start determining the review protocols, an initial general search was done using broad keywords such as “artificial intelligence in asthma”, “artificial intelligence in healthcare”, “explainable artificial intelligence in asthma”, and “explainable intelligence in healthcare”. It was then decided that the term AI would be removed from the search terms and only XAI would be kept to narrow down the search. However, some literature that did not distinctly specify that it applied XAI still appeared in the search and was included in the review after thorough screening. In addition, the domain of the search was limited to asthma and other diseases within the medical field where XAI techniques are applied for risk assessment. This limit was chosen because restricting the search to asthma alone would not yield enough literature, while not restricting it to the medical field would yield too many results. Literature using different terms, such as explainable ML and explainable DL instead of XAI, was also accepted. Apart from this, literature with both ad-hoc and post-hoc types of XAI was included.

The inclusion criteria are as follows: (I) literature that included XAI in its research, whether ML or DL, post-hoc or ad-hoc, as well as literature that did not explicitly describe its research as XAI but whose models are considered explainable. (II) Literature that uses XAI within the asthma or medical field, or uses XAI for risk assessment. (III) Literature from the years 2019 until 2023, to construct a review based on recent publications. (IV) Literature written in English. (V) Only articles with available full text.

The exclusion criteria are as follows: (I) literature that was not directly relevant to the research topic. (II) Not written in English. (III) Unavailable full-text literature.

The search strategy was to search through selected databases using search terms formulated with consideration toward the research topic. The five databases used were Science Direct, Association for Computing Machinery (ACM) Digital Library, Springer, PubMed, and Scopus. The initial search terms used before narrowing down were “artificial intelligence”, “explainable artificial intelligence”, “risk assessment”, “risk assessment model”, “asthma”, “healthcare”, and combinations and permutations of these, such as “artificial intelligence and healthcare”, “artificial intelligence and asthma”, “explainable artificial intelligence and healthcare”, “explainable artificial intelligence and asthma”, “explainable artificial intelligence” AND “risk assessment” AND “health”, and “explainable artificial intelligence” AND “risk assessment” AND “asthma”. Search terms that produced overly broad results were further narrowed down and combined to create more specific terms, and the search was then refined to cover only XAI. The four final search terms were “explainable artificial intelligence” AND “asthma”, “explainable artificial intelligence” AND “risk assessment” AND “health OR asthma”, “explainable artificial intelligence” AND “risk assessment model”, and “explainable artificial intelligence” AND “risk assessment model” AND “asthma”.
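The four final Boolean query strings listed above can be assembled programmatically from the common base term; the short sketch below (pure Python, purely illustrative of the query construction, not part of the original protocol) shows one way to do so:

```python
# Base XAI term shared by all four final queries.
base = '"explainable artificial intelligence"'

# Context clauses taken from the four final search terms in the text.
contexts = [
    '"asthma"',
    '"risk assessment" AND "health OR asthma"',
    '"risk assessment model"',
    '"risk assessment model" AND "asthma"',
]

# Combine the base term with each context using the Boolean AND operator.
queries = [f"{base} AND {c}" for c in contexts]
for q in queries:
    print(q)
```

Generating the strings this way keeps the queries consistent across databases, since each engine receives exactly the same Boolean expression.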

The screening of the literature was then conducted by manually reading through all the identified literature after filtering, to determine which studies were relevant based on the set inclusion criteria. Data procurement was carried out by listing all the selected literature in an Excel file, separated by titles, abstracts, and keywords. Several categories were then created to find similarities between the chosen studies; any categories that could be combined were combined. The analysis of the data was then carried out by deciding how to categorise and find similarities between the studies, followed by a discussion to report the findings.
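The Excel-based categorisation described here can be mirrored in code. The sketch below uses hypothetical titles and tags (not the actual reviewed studies) to show how records can be grouped into the overlapping application categories used later in the review, where a paper may belong to more than one category:

```python
# Illustrative screening records; titles and tags are hypothetical,
# not the actual studies included in this review.
papers = [
    {"title": "XGBoost for adult asthma diagnosis", "tags": {"asthma"}},
    {"title": "COPD exacerbation prediction", "tags": {"lung"}},
    {"title": "Early sepsis warning with LightGBM", "tags": {"healthcare"}},
    {"title": "Asthma hospital-visit forecasting", "tags": {"asthma", "lung"}},
]

# A paper may belong to more than one category, so build overlapping
# lists rather than a strict partition.
categories = {"asthma": [], "lung": [], "other": []}
for paper in papers:
    matched = [c for c in ("asthma", "lung") if c in paper["tags"]]
    for c in matched:
        categories[c].append(paper["title"])
    if not matched:
        categories["other"].append(paper["title"])
```

Keeping the categories as overlapping lists rather than a one-to-one mapping reflects the review's rule that each paper may appear in more than one table.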

Step 3: finding the literature

The systematic review was then conducted by searching through literature concentrating on XAI techniques for risk assessment in healthcare or asthma research, using the determined search terms through the five databases. The screening of the literature was done by one reviewer. The included studies were published between January 1, 2019 and October 26, 2023. The determined search terms, “explainable artificial intelligence” AND “asthma”, “explainable artificial intelligence” AND “risk assessment” AND “health OR asthma”, “explainable artificial intelligence” AND “risk assessment model”, and “explainable artificial intelligence” AND “risk assessment model” AND “asthma”, were all used in the same manner for all the databases, following the rules of the Boolean operators to specify the search.

The search was conducted through the five databases, Science Direct, ACM Digital Library, Springer, PubMed, and Scopus, by one reviewer. A search using the combined search term “explainable artificial intelligence” AND “asthma” initially yielded a total of 211 studies, not including preview-only content for Springer. The search was then refined using filters to only include work from 2019 until 2023, English-written literature, free full text, full text and abstracts, research articles, books, and conference proceedings, which yielded a total of 151 studies. The combined search term “explainable artificial intelligence” AND “risk assessment” AND “health OR asthma” returned an initial total of 281 studies; after being filtered with the same constraints, the search yielded 191 studies. Furthermore, the combined search term “explainable artificial intelligence” AND “risk assessment model” yielded an initial total of 191 studies, with 108 remaining after filtering. Finally, the combined search term “explainable artificial intelligence” AND “risk assessment model” AND “asthma” returned a total of six studies, with five remaining after applying the same filters. The initial screening of the literature using the search terms and filters produced a total of 455 research papers from the combined five databases.

Step 4: literature screening for inclusion

The titles and abstracts of the screened studies were then read by one reviewer to determine their relevance to this research on XAI risk assessment models for BA. After manually going through the titles and abstracts of the literature combined from the five databases to exclude non-relevant studies, and after removing 11 duplicates, a total of 51 studies were determined to be related to the research on XAI risk assessment for asthma, including a few articles on XAI risk assessment models for health conditions other than asthma, such as chronic cough, cardiovascular disease (CVD), acute kidney injury (AKI), and allergies.

Step 5: assessing the eligibility of literature

The 51 studies selected based on their titles and abstracts were then further evaluated to determine their relevance and eligibility by reading the full text of the papers. Eight articles were excluded because their full text could not be retrieved. A total of 43 research papers were included in this research. The remaining 43 papers were sorted using Microsoft Excel, listing their titles, keywords, and abstracts for easier categorization. The literature search process is summarised in Figure 1.

Step 6: extracting data from the literature

Information was then extracted from each research paper for the purpose of generalization, categorization, and meta-analysis. The information was manually extracted by one reviewer by reading through each paper and taking notes, such as whether it focused on asthma, lungs, or other healthcare-related topics. Mendeley was also used to help organise and sort the reference papers; tags were manually created in Mendeley to assist with sorting. The papers were tagged by whether they implemented ML or DL; whether they were asthma-, lung-, or other healthcare-related; whether they used predictive or diagnostic models; and what kind of data they were using. Any literature with missing data or that was an ongoing study was kept but noted as ongoing. The results were then recorded in tables for further categorisation. It was determined that the selected literature could be sorted into several categories under three main sections: application, architecture, and models, under which the papers were further sorted into categories based on their shared characteristics. This provides insight into current research trends in XAI-based risk assessment models in BA or healthcare and the overall performance of the models in improving the quality of care by means of risk assessment within their category. Each paper may belong to one or more categories.

Step 7: analysing the data

In this step, we examined the selected literature to procure detailed insight regarding XAI-based risk assessment models. A possible cause of heterogeneity is the scarcity of literature focused solely on XAI-based risk assessment models within the domain of BA: the search, when limited to the specific research topic, procured limited results, hence the need to expand the search to include XAI risk assessment models in healthcare as well. However, similarities were found between the studies. The selected literature was sorted under three sections: application, architecture, and the type of models used. The author(s), the data used, the architecture of the algorithms (whether ML, DL, or other), the models used, their specific application, and the performance metrics used were the information extracted from the literature and listed in the tables as a way of analysing the data. The input data used in the studies were examined to gain insight into the features that the XAI models used in their application domain. The performance results of the models were analysed to eventually draw a conclusion on which XAI model would be suitable for the risk assessment of BA.
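As a concrete illustration of comparing models by a single metric, as done in the result tables, the sketch below computes ROC AUC from scratch on hypothetical labels and scores (the cited studies typically use library implementations such as scikit-learn's; this stand-alone version is for exposition only):

```python
def roc_auc(labels, scores):
    """ROC AUC via the Mann-Whitney rank formulation.

    Ties between scores are ignored for simplicity; illustrative only,
    not the metric code used by the cited studies."""
    ranked = sorted(zip(scores, labels))
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    # Sum of 1-based ranks held by the positive examples.
    rank_sum = sum(i + 1 for i, (_, y) in enumerate(ranked) if y == 1)
    return (rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

# Hypothetical held-out labels and two models' predicted risks.
labels  = [0, 0, 1, 0, 1, 1, 0, 1]
model_a = [0.1, 0.4, 0.35, 0.2, 0.8, 0.7, 0.3, 0.9]
model_b = [0.5, 0.6, 0.4, 0.3, 0.55, 0.45, 0.2, 0.65]

# Pick the model whose ranking of risks best separates the classes.
best = max(("model_a", roc_auc(labels, model_a)),
           ("model_b", roc_auc(labels, model_b)),
           key=lambda t: t[1])
```

A higher AUC means the model ranks a randomly chosen positive case above a randomly chosen negative case more often, which is why it serves as a common yardstick across the heterogeneous studies in Tables 1-3.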

Step 8: reporting the findings

Step 8 of reporting the findings was incorporated throughout steps 1 to 7; therefore, step 8 was completed through the documentation of steps 1 to 7.


Results

A total of 43 studies were found to meet the inclusion criteria. These were then sorted according to their similarities by exploring the XAI techniques used to assess risk, the data used by the models, their performance metrics, and their level of relevance to this study, with some papers being more relevant than others. Although several studies researched XAI techniques for diagnosis, risk prediction, and identification of levels of risk in other health issues, only a limited number specifically examined XAI for asthma risk prediction. The 43 studies are sorted under three sections: application, architecture, and the type of models.

The first section, application, refers to where the AI algorithms are being applied. This section is separated into three categories: literature with an application in asthma, in lungs, or in other areas. The “other” category includes applications in the healthcare sector that are not related to asthma or lungs, such as the prediction of sepsis, risk of mortality or heart attacks, and other healthcare-related work. The application section was separated into these three categories to set apart the literature most relevant to this research, namely literature related to XAI and asthma; the next most relevant category is literature that applies XAI models to lung-related problems or diseases. Some literature may also belong to more than one category. Tables 1-3 list the literature with applications in asthma, lungs, or other sectors, respectively.

Table 1

Research papers that were asthma related (source: author’s analysis)

Authors Data used Architecture Models Applications Performance metrics
Tomita et al., 2023 (11) Medical records data Explainable ML RF, XGBoost, SHAP Diagnosing adult asthma XGBoost with accuracy of 81% and an AUC of 85%. RF with accuracy of 76% and AUC of 79%
Ray et al., 2022 (12) Genomics, transcriptomics, proteomics, and metabolomics data ML Regularized regression techniques (e.g., LASSO, L1 regularized regression, elastic net), Bootstrap aggregated decision trees (e.g., RF), principal components regression, PLS regression, k-fold or leave-one-out cross-validation Diagnosing and managing asthma and providing a perspective on the use of omics and ML in identifying asthma endotypes and its outcomes Not reported. Study is ongoing
Luo et al., 2021 (13) Tabular data Explainable ML XGBoost, class-based association rule XAI post-hoc model Forecasting hospital visits that are asthma related and aid in asthma management and patient outcome XGBoost AUC using 221 features: 0.820. XGBoost AUC using top 50 features: 0.815. Their XAI was able to explain 97.57% of the forecasting results correctly
AlSaad et al., 2019 (14) EHR data Interpretable DL LSTM, BiLSTM, LR model Interpreting predictions of risk that are patient specific using CD of BiLSTM with application to children with asthma LSTM AUC (95% CI): 0.831 (0.824–0.838). BiLSTM AUC (95% CI): 0.819 (0.811–0.827). LR AUC (95% CI): 0.702 (0.692–0.712)
Lotfata et al., 2023 (15) GWRF Explainable geospatial ML Conventional RF and GWRF Examining the prevalence of socioeconomic and environmental determinants of asthma using GWRF MSE for RF is 17.62 and GWRF is 10.61. MAE for RF is 3.02 and GWRF is 2.80. RMSE for RF is 4.20 and GWRF is 3.20. Coefficient of determination (R2) for RF is 0.82 and GWRF is 0.89. Moran’s I statistic (residuals) for RF is 0.28 (P<0.05) and GWRF is 0.0008 (P>0.05)

AUC, area under the curve; BiLSTM, bi-directional LSTM; CD, contextual decomposition; CI, confidence interval; DL, deep learning; EHR, electronic health records; GWRF, geographically weighted random forest; LASSO, least absolute shrinkage and selection operator; LR, logistic regression; LSTM, long short-term memory; MAE, mean absolute error; ML, machine learning; MSE, mean squared error; PLS, partial least squares; RF, random forest; RMSE, root mean squared error; SHAP, SHapley Additive exPlanations; XAI, explainable artificial intelligence; XGBoost, extreme gradient boosting.

Table 2

Research papers that were lung-related (source: author’s analysis)

Paper Data used Architecture Models Applications Performance metrics
Luo et al., 2021 (16) EHR data. Structured data (medication and diagnosis) and unstructured data (clinical notes) Interpretable DL BiLSTM with attention layer and BERT with attention layer. LR, RF, SVM, and kNN Identifying chronic cough cases Sensitivity (0.856), Specificity (0.866) using structured data. Improved to 0.952 and 0.930 respectively after combining with symptoms extracted from unstructured data
Zeng et al., 2022 (17) Structured data. Administrative and clinical data of patients Explainable ML XGBoost Predicting COPD exacerbations AUROC of 0.866, with 56.6% (103/182) sensitivity and 91.17% (6,698/7,347) specificity
Islam et al., 2023 (18) Image data Explainable DL DCNN with a GRU while incorporating XAI techniques. LIME, SHAP, and Grad-CAM Using DCNN to enhance the detection and classification of abnormalities in the lung AdamW, Adam, and RMSprop optimizers, respectively: DCNN-GRU model accuracy of 99.30%, 98.77%, and 99.24%; precision, 96%, 98%, 99%; recall/sensitivity, 96%, 98%, 99%; F1 score, 96%, 98%, 99%; specificity, 96.84%, 95.92%, 97.96%; Cohen’s kappa, 93.84, 95.92, 97.96. MicroAUC of 0.99 and macroAUC of 0.98
Casiraghi et al., 2020 (19) Radiological, clinical, and laboratory variables Explainable ML Boruta feature selection algorithm. RF in 10-fold cross validation scheme. RF classifier using most important variables Predicting COVID-19 in patients rapidly and accurately AUC: 0.81. Sensitivity: 0.72. Specificity: 0.76. F1-score: 0.62. Accuracy: 0.74
de Lima et al., 2022 (20) Tabular data (FOT exams) Explainable ML Genetic programming and GE. Compared with kNN, SVM, AdaBoost, RF, LightGBM, XGBoost, decision trees, and LR Diagnosing respiratory abnormalities in sarcoidosis using explainable ML and providing decision support for clinicians XGBoost AUC >0.90
Kor et al., 2022 (21) Tabular data Explainable ML Four ML models. GBM, SVM, RF, and XGBoost. And an explainable approach SHAP Using explainable ML to predict first-time AECOPD at an individual level GBM AUC (95% CI): 0.833 (0.745–0.921). SVM AUC (95% CI): 0.836 (0.757–0.915). XGB AUC (95% CI): 0.7703 (0.745–0.921). RF AUC (95% CI): 0.751 (0.642–0.859)
Hernández-Pereira et al., 2022 (22) Demographic and clinical data ML Feature selection method. Supervised classification methods: LR, SVM, AdaBoost, Bagging, RF, MLP, and deep network. Over-sampling methods: SMOTE, ADASYN sampling approach for imbalanced learning Predicting the different levels of hospital care for patients diagnosed with COVID-19 (regular hospital admission or ICU admission) AUC of 76.1% for predicting the needs of the hospital and 80.4% AUC for ICU admissions

AdaBoost, adaptive boosting; ADASYN, adaptive synthetic; AECOPD, acute exacerbation of chronic obstructive pulmonary disease; AUROC, area under the receiver operating characteristic curve; BERT, bidirectional encoder representations from transformers; BiLSTM, bi-directional LSTM; CI, confidence interval; COPD, chronic obstructive pulmonary disease; COVID-19, coronavirus disease 2019; DCNN, deep convolutional neural network; DL, deep learning; EHR, electronic health records; FOT, forced oscillation technique; GBM, gradient boosting machine; GE, grammatical evolution; Grad-CAM, gradient-weighted class activation mapping; GRU, gated recurrent unit; ICU, intensive care unit; kNN, k-nearest neighbour; LIME, local interpretable model-agnostic explanation; LR, logistic regression; LSTM, long short-term memory; ML, machine learning; MLP, multilayer perceptron; RF, random forest; SHAP, SHapley Additive exPlanations; SMOTE, Synthetic Minority Oversampling Technique; SVM, support vector machine; XAI, explainable artificial intelligence; XGBoost, extreme gradient boosting.

Table 3

Research papers that were related by healthcare or risk prediction (source: author’s analysis)

Paper Data used Architecture Models Applications Performance metrics
Healthcare
   Nesaragi et al., 2021 (23) EHR data. PhysioNet Challenge 2019 dataset ML Five-fold cross-validation scheme, LightGBM learning models for binary classification Predicting sepsis early using clinical data Utility score: 0.4314. AUROC: 0.8613
   Kim et al., 2019 (24) EHR data. Seven vital signs, patient age and body weight on ICU admission DL CNNs Predicting in real-time the all-cause of mortality within the pediatric ICUs AUROC in the range of 0.89–0.97 for mortality prediction 6 to 60 h prior to death
   Wang et al., 2023 (25) User data and generated synthetic data XAI (DL) ANN and CART using LIME Enhancing the explainability of AI in the diagnosing of diabetes Accuracy of ANN was 92%, MLR was 76%, CART was 88%
   Ordikhani et al., 2022 (26) EHR data ML Novel risk assessment model, and GA Predicting CVD events risk using an evolutionary ML algorithm Framingham model AUROC was 0.633, PROCAM model AUROC was 0.683, a DL model using a three-layered CNN had an AUROC of 0.74, and the GA-based XPARS had an AUROC of 0.76 with eight features and 0.72 with four features
   Du et al., 2022 (27) EHR data Explainable ML Five ML algorithms with five-fold cross-validated grid search, LR, RF, SVM, AdaBoost, and XGBoost, explained with SHAP Identifying women at risk or needing targeted pregnancy intervention using a ML-based CDSS Theoretical (AUC-PR: 0.485, AUC-ROC: 0.792), GDM screening during a normal antenatal visit (AUC-PR: 0.208, AUC-ROC: 0.659), and remote GDM risk assessment (AUC-PR: 0.199, AUC-ROC: 0.656)
   Zhang et al., 2021 (28) EHR data ML LR, SVM, RF, GBM, and AdaBoost Predicting AKI after LT early using an explainable supervised ML predictor GBM model with highest scores of AUC (0.76; 95% CI: 0.70–0.82), F1-score (0.73; 95% CI: 0.66–0.79) and sensitivity (0.74; 95% CI: 0.66–0.8) in the internal validation set, and AUC (0.75; 95% CI: 0.67–0.81) for the external validation set
   Roseiro et al., 2023 (29) Acute coronary syndrome patients’ dataset Interpretable ML GRACE tool (a common model for short-term prognosis) and an RF classifier Estimating the influence of inflammation biomarkers towards the assessment of cardiovascular risk using an interpretable ML approach The sensitivity, specificity, geometric mean, and AUC were 0.83, 0.84, 0.83, and 0.91, respectively
   Zadeh et al., 2024 (30) Medical records ML Predictive models, decision tree, gradient boosting, RF, LARS, neural network, and LR De novo malignancy prediction in pancreas transplant patients Gradient boosting results (accuracy, sensitivity, specificity, precision, F1-score, and AUROC) were 0.6945, 0.7401, 0.6488, 0.6782, 0.7078, and 0.654, respectively
   Qatawneh et al., 2019 (31) Patient data DL ANN and MLP feed forward neural network Predicting the risk of developing VTE in hospitalized patients and classifying it into five risk levels ranging from low to high Stratified 10-fold cross validation had overall user accuracy of 81.2% and an overall procedure accuracy of 80.7%
   Alzoubi et al., 2023 (32) Genotype data DL MLP Predicting the risk of disease using a DL framework An AUC up to 0.94
   Yalçın et al., 2023 (33) Demographic and clinical parameters Explainable ML Fivefold cross-validation in the software R with a NNST, elastic net, RF, SVM, and SHAP Predicting weight of patients and the presence of weight gain at discharge for identifying malnutrition in NICU patients using ML algorithms RF classification (AUROC: 0.847)
   Singh et al., 2023 (34) Patient data Explainable DL Explainable DL model (HARD MACE-DL) using MPI, with Grad-CAM and SHAP Predicting death or nonfatal MI using explainable DL by directly assessing risk from MPI External testing AUC for HARD MACE-DL (0.73; 95% CI: 0.71–0.75)
   Choubey et al., 2022 (35) Patient data DL ANN, GA, DAPP, and EBP-ANN Using enhanced backpropagation to enhance the prediction efficiency of virus borne diseases using an ANN 94% accuracy, 96% precision, and 95% recall
   George et al., 2023 (36) Health record data Interpretable AI VAE-kNN and SHAP Predicting risk of 30-day emergency department visits for cancer patients who are undergoing treatment AUROC: 0.80
   Petrauskas et al., 2021 (37) Patient data XAI Fuzzy logic-based XAI: FSM, a dynamic SWOT analysis network, and an FCM-type network Assessing nutrition-related factors or symptoms and determining the risk of geriatric patient health problems using an XAI-based CDSS Accuracy for malnutrition 87.95%, 87.95% for oropharyngeal dysphagia, 90.36% for eating disorders in dementia, and 86.75% for dehydration
   Zeng et al., 2021 (38) EHR Interpretable ML k-means algorithm (for preprocessing data), dynamic time warping, XGBoost model, SHAP framework Predicting complications after pediatric congenital heart surgery using explainable ML Cardiac complication prediction AUC of 0.946, and lung complication prediction AUC of 0.785
   Giuste et al., 2023 (39) Image data, heart biopsy WSIs Explainable DL, data-driven AI Grad-CAM++, Eigen-CAM, Score-CAM, and Ablation-CAM techniques, PGAN, IGAN Assessing risk of rare pediatric heart transplant rejection using explainable synthetic image generation and an explainable synthetic data augmentation approach 95.55% AUROC, with a sensitivity of 83.33% and specificity of 66.67%
   Islam et al., 2022 (40) DHS data for CS births collected from PDHS XAI HGSORF, ADASYN algorithm, SHAP, LIME, and feature importance curve Predicting the chances of caesarean or CS delivery using a stable predictive model and XAI-based cause analysis Accuracy, F1 score, and AUC above 98%, while its sensitivity is approximately 98.94%
   Liu et al., 2023 (41) Data from Medical Information Mart for Intensive Care database, electronic ICU Collaborative Research Database, and Amsterdam University Medical Centers Database Explainable ML XGBoost algorithm, and SHAP Assessing the severity of illness in older adult patients admitted in the ICU using a ML model, ELDER-ICU with evaluation for potential model bias and cohort-specific calibrations The AUROC (95% CI) of XGBoost was 0.866 (0.851–0.880) for the internal validation set, 0.838 (0.829–0.847) for the external validation (US) set, 0.833 (0.812–0.853) for the external validation (Europe) set, and 0.884 (0.869–0.897) for the temporal validation set
   Wang et al., 2021 (42) Patient data Interpretable ML Six ML models: LR, kNN, SVM, NB, MLP, and XGBoost, with the SHAP method Predicting the all-cause mortality in 3 years for patients with HF caused by CHD using an explainable ML model XGBoost model mean AUC was 0.8207 (95% CI: 0.8143–0.8272) with F1-score of 0.4476 (95% CI: 0.4407–0.4546) for mortality
   Tu et al., 2023 (43) EMR from Chi Mei Medical Center ML Four different ML models: LR, RF, LightGBM, and XGBoost Predicting mortality risk in the ICU for patients with traumatic brain injury using ML algorithm For 42 features: LR accuracy: 0.799, sensitivity: 0.806, specificity: 0.799, and AUC: 0.901. RF accuracy: 0.829, sensitivity: 0.833, specificity: 0.828, and AUC: 0.914. LightGBM accuracy: 0.832, sensitivity: 0.833, specificity: 0.832, and AUC: 0.916. XGBoost accuracy: 0.794, sensitivity: 0.806, specificity: 0.792, and AUC: 0.900
   Kavya et al., 2021 (44) Datasets from allergy testing centers in South India ML models and post-hoc XAI approaches kNN, SVM, C5.0, NN, AdaBag, RF Diagnosing allergies using ML and XAI approaches Training and validation accuracy rate of decision tree was 81.62%, SVM was 81.04%, and RF was 83.07%
   Garcia-Canadilla et al., 2022 (45) Heterogeneous and complex data. EHR data ML Not available Predicting adverse events such as clinical deterioration on and/or ICU (re-)admission after pediatric cardiac surgery using ML based predictive systems Not available
   Shin et al., 2022 (46) Photoplethysmography patient data from PPG. Waveform and image Explainable DL The model has three convolutional layers with two fully connected layers, and used Grad-CAM for XAI. CNN of two 1-dimensional convolutional layers, and two fully connected layers with 1,024 nodes and 0.2 dropout rates prior to the output neuron Assessing and estimating vascular age non-invasively by using DL CNNs on PPG Evaluation metric [mean ± SD (range)]: correlation coefficient (R) [0.59±0.01 (0.55–0.61)], coefficient of determination (R2) [0.34±0.02 (0.30–0.37)], MAE (years) [8.4±0.1 (8.2–8.6)], RMSE (years) [10.3±0.2 (10.0–10.6)]
   Aghamohammadi et al., 2019 (47) Cleveland dataset medical reports XAI GA and ANFIS Using XAI to predict heart attacks Sensitivity (79.1667%), specificity (87.5%), precision (90.4762%), accuracy (82.5%), RMSE (0.85158), and k-fold cross validation training sets and testing sets using 9-fold cross-validation to test efficiency
   Tseng et al., 2020 (48) Patient data Explainable ML LR, SVM, RF, XGboost, ensemble (RF + XGboost), and SHAP Predicting CSA-AKI using an AI-based ML approach and thorough perioperative data-driven learning RF AUC (0.839; 95% CI: 0.772–0.898), RF + XGboost AUC (0.843; 95% CI: 0.778–0.899)
   Li et al., 2021 (49) Inpatients data on type-2 diabetes Explainable ML XGBoost algorithm and SHAP Predicting the risk and the analysis of DR using predictive ML models and a large sample dataset AUC: 0.90
   Wang et al., 2022 (50) Patient data Explainable ML, stacking ensemble learning Decision tree and RF Predicting the outcomes and risk factors of hospitalization in geriatric patients with dementia using ML model Accuracy: 95.6%, AUC: 0.757
   Kovalchuk et al., 2022 (51) Medical data Interpretable ML and data-driven predictive modelling for applicable CDSSs K-neighbors classifier, RF classifier, LR, SVC, XGB classifier, Gaussian NB, Bernoulli NB, multinomial NB, and SHAP Interpretable ML using data-driven predictive modelling for CDSSs Accuracy (70.0%), sensitivity (76.0%), and specificity (60.2%)
   Twick et al., 2022 (52) EMR data Explainable ML Non-linear tree-based gradient boosting classifiers, CatBoost and SHAP Predicting medically grounded risk using EMR-based risk prediction explainable models Leak models, naïve gradient boosting pre-AUC (0.775) and post-AUC (0.855). Literature-based gradient boosting pre-AUC (0.793) and post-AUC (0.861)
Environment
   Lee et al., 2022 (53) Daily or hourly measured meteorological and air pollution data ML and DL RF, XGBoost, LR regularized with an L1 penalty (LASSO regression), an L2 penalty (ridge regression), and a net elastic penalty (elastic net), and DL algorithms for time-series data: LSTM and stacked LSTM Predicting cause-specific mortality using ML and DL algorithms using daily or hourly measured meteorological and air pollution data XGBoost, ridge, and elastic net reduced the MAE by 4.7%, 4.9%, and 16.8% respectively
   Lotfata et al., 2023 (15) Socioeconomic and environmental determinants Explainable geospatial ML Conventional RF and GWRF Examining the prevalence of socioeconomic and environmental determinants of asthma using GWRF MSE for RF is 17.62 and GWRF is 10.61. MAE for RF is 3.02 and GWRF is 2.80. RMSE for RF is 4.20 and GWRF is 3.20. Coefficient of determination (R2) for RF is 0.82 and GWRF is 0.89. Moran’s I statistic (residuals) for RF is 0.28 (P<0.05) and GWRF is 0.0008 (P>0.05)

AdaBoost, adaptive boosting; ADASYN, adaptive synthetic; AI, artificial intelligence; AKI, acute kidney injury; ANFIS, adaptive neural fuzzy inference system; ANN, artificial neural network; AUC, area under the curve; AUC-PR, precision recall area under the curve; AUROC, area under the receiver operating characteristic curve; CART, classification and regression tree; CatBoost, categorical boosting; CDSS, clinical decision support system; CHD, coronary heart disease; CI, confidence interval; CNN, convolutional neural network; CS, C-section; CSA-AKI, cardiac surgery-associated acute kidney injury; CVD, cardiovascular disease; DAPP, dynamic angle projection pattern; DHS, Demographic and Health Surveys; DL, deep learning; DR, diabetic retinopathy; EBP-ANN, enhanced back propagation algorithm using ANN; EHR, electronic health records; EMR, electronic medical records; FCM, fuzzy cognitive maps; FSM, fuzzy SWOT map; GA, genetic algorithm; GBM, gradient boosting machine; GDM, gestational diabetes mellitus; GRACE, Global Registry of Acute Coronary Events; Grad-CAM, gradient-weighted class activation mapping; GWRF, geographically weighted random forest; HF, heart failure; HGSO, Henry gas solubility optimization; HGSORF, HGSO-based RF; ICU, intensive care unit; IGAN, inspirational GAN; kNN, k-nearest neighbour; LARS, layer-wise adaptive rate scaling; LASSO, least absolute shrinkage and selection operator; LightGBM, light gradient boosting machine; LIME, local interpretable model-agnostic explanation; LR, logistic regression; LSTM, long short-term memory; LT, liver transplantation; MACE, major adverse cardiac events; MAE, mean absolute error; MI, myocardial infarction; ML, machine learning; MLP, multilayer perceptron; MLR, multiple linear regression; MPI, myocardial perfusion imaging; MSE, mean squared error; NB, naive Bayesian; NICU, neonatal intensive care unit; NNST, neonatal nutritional screening tool; PDHS, Pakistan Demographic and Health Survey; PGAN, progressive GAN; 
PPG, photoplethysmogram; RF, random forest; RMSE, root mean squared error; SD, standard deviation; SHAP, SHapley Additive exPlanations; SVC, support vector classifier; SVM, support vector machine; VAE-kNN, variational autoencoder k-nearest neighbours’ algorithm; VTE, venous thromboembolism; WSI, whole slide image; XAI, explainable artificial intelligence; XGBoost, extreme gradient boosting; XPARS, eXplanaible Persian Atherosclerotic CVD Risk Stratification.

Section 1: application

Literature in the asthma application category

The asthma application category under the application section highlights the scarcity of studies directly relevant to risk assessment for asthma using XAI, as shown in Table 1. Studies of AI models related to asthma were listed in this category. Five of the 43 selected studies were within the asthma domain. The studies by Tomita et al. (11), Ray et al. (12), and Lotfata et al. (15) covered random forest (RF), making it the most frequently used algorithm in this category. Extreme gradient boosting (XGBoost) was used in two studies, by Tomita et al. (11) and Luo et al. (13), making it the second most commonly used algorithm in this category. Model performance will be discussed in the “Discussion” section.

Literature in the lung application category

Due to the limited number of studies specifically on XAI risk assessment for asthma, studies relating to XAI for risk prediction within the healthcare field were also included. We first look at XAI for lung-related diseases, which may share features needed for an asthma XAI risk assessment model, as depicted in Table 2. Seven of the 43 studies were categorised under the lung application category. Six of these distinctly specified research in explainable or interpretable AI techniques. Analysis showed that the most used algorithm in this category was RF, which appeared in five studies, either as a model in its own right or as a comparison against the authors’ own models. It was followed closely by the support vector machine (SVM) algorithm with four appearances, and by logistic regression (LR) and XGBoost, each used in three studies. The performance of these models is discussed in the “Discussion” section.

Literature in the other application category

Due to the scarcity of studies specifically implementing XAI models for predicting the risks of BA, we examined studies that apply predictive modelling to other medical problems. This part of the review analyses the models used in those studies; Table 3 lists the models used in each. Thirty-one of the 43 selected studies were categorised under this category, of which 21 distinctly specified the employment of XAI techniques in Table 3. Although some studies did not explicitly mention or focus on XAI, they still appeared in the systematic literature review search because they either used interpretable models or reported the use of post hoc XAI methods in their work. The focus is on explainable models suitable for predicting the risks of asthma with precision and accuracy, whether by predicting risk directly or by classifying patients according to the level of care needed. Two further studies were related to healthcare but examined how the environment affects a patient. They were categorised under the environment subsection for their use of environmental data, such as daily or hourly measured meteorological and air pollution data for cause-specific mortality, as in Lee et al. (53), and socioeconomic and environmental determinants of asthma prevalence, as in Lotfata et al. (15).

The most used model within this category is RF, applied in 13 studies, followed by XGBoost with nine and LR with eight. The most popular post hoc method is SHapley Additive exPlanations (SHAP), used in 12 studies. Implementing XAI in risk prediction is crucial for a user’s understanding.
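To illustrate the idea behind the SHAP method mentioned above, the following minimal, self-contained Python sketch computes exact Shapley values for a toy additive risk model. The feature names, weights, and baseline values are hypothetical and purely illustrative, not taken from any reviewed study; a real application would use the `shap` library on a trained model rather than this brute-force enumeration.

```python
from itertools import combinations
from math import factorial

def shapley_values(predict, features, baseline):
    """Exact Shapley values: each feature's weighted average marginal
    contribution to the prediction over all feature subsets.
    Features absent from a subset are set to their `baseline` value."""
    names = list(features)
    n = len(names)

    def value(subset):
        x = {f: (features[f] if f in subset else baseline[f]) for f in names}
        return predict(x)

    phi = {}
    for f in names:
        others = [g for g in names if g != f]
        total = 0.0
        for k in range(len(others) + 1):
            for s in combinations(others, k):
                # Shapley weight for a subset of size k out of n features.
                w = factorial(k) * factorial(n - k - 1) / factorial(n)
                total += w * (value(set(s) | {f}) - value(set(s)))
        phi[f] = total
    return phi

# Toy additive "risk model" with hypothetical weights, for illustration only.
def risk(x):
    return 0.4 * x["recent_icu"] + 0.3 * x["poor_inhaler_technique"] + 0.1

patient = {"recent_icu": 1, "poor_inhaler_technique": 1}
base = {"recent_icu": 0, "poor_inhaler_technique": 0}
phi = shapley_values(risk, patient, base)
# For an additive model the attributions recover the weights, and they
# sum to the difference between the patient's and baseline predictions.
```

For this additive toy model, `phi["recent_icu"]` equals 0.4 and `phi["poor_inhaler_technique"]` equals 0.3, summing to the 0.7 gap between the patient's risk (0.8) and the baseline risk (0.1), which is the "efficiency" property that makes SHAP attributions easy to present to clinicians.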

Section 2: architecture

In section 2, architecture, the studies are separated into two categories, ML or DL, based on the techniques used in their research; each is further split into the sub-categories of explainable ML or explainable DL. ML models are generally regarded as more interpretable than DL models, although interpretability still depends on the complexity of the models themselves. The studies most relevant to this research are those that implement explainability features and focus on asthma applications. Tables 4,5 list only the citations; the information extracted from each study, including the models or algorithms used, is available in Tables 1-3.

Table 4

Authors that used ML or explicitly mentioned research in explainable ML in their paper (source: author’s analysis)

Techniques Authors
ML Ray et al., 2022 (12)
Zadeh et al., 2024 (30)
Ordikhani et al., 2022 (26)
Garcia-Canadilla et al., 2022 (45)
Lee et al., 2022 (53)
Wang et al., 2022 (50)
Tu et al., 2023 (43)
Hernández-Pereira et al., 2022 (22)
Nesaragi et al., 2021 (23)
Explainable ML Luo et al., 2021 (13)
Islam et al., 2022 (40)
Li et al., 2021 (49)
Du et al., 2022 (27)
Zeng et al., 2021 (38)
Zhang et al., 2021 (28)
Casiraghi et al., 2020 (19)
Lotfata et al., 2023 (15)
Kavya et al., 2021 (44)
Yalçın et al., 2023 (33)
Tomita et al., 2023 (11)
Roseiro et al., 2023 (29)
Liu et al., 2023 (41)
Tseng et al., 2020 (48)
Twick et al., 2022 (52)
Kovalchuk et al., 2022 (51)
Wang et al., 2021 (42)
Kor et al., 2022 (21)
de Lima et al., 2022 (20)
Zeng et al., 2022 (17)
George et al., 2023 (36)

ML, machine learning.

Table 5

Authors that used DL or explicitly mentioned explainable DL in their research (source: author’s analysis)

Techniques Authors
DL Choubey et al., 2022 (35)
Kim et al., 2019 (24)
Alzoubi et al., 2023 (32)
Qatawneh et al., 2019 (31)
Lee et al., 2022 (53)
Giuste et al., 2023 (39)
Ordikhani et al., 2022 (26)
Explainable DL Shin et al., 2022 (46)
Singh et al., 2023 (34)
AlSaad et al., 2019 (14)
Wang et al., 2023 (25)
Luo et al., 2021 (16)
Islam et al., 2023 (18)
George et al., 2023 (36)

DL, deep learning.

Table 4 lists nine studies that use ML and 21 studies with explainable ML, whereas Table 5 lists seven studies that use DL and seven with explainable DL. Some studies implement both ML and DL algorithms and are categorized into both categories.

Section 3: models

This section categorizes the studies based on their AI models. There are several types of AI models, such as descriptive models, prescriptive models, classification models, natural language processing (NLP) models, and generative models. The studies were sorted according to how the authors used the models, as shown in Tables 6-8. Most of the models in the selected studies were used as predictive or diagnostic models, listed in Tables 6,7; Table 8 lists the studies whose models were neither predictive nor diagnostic. However, an algorithm may be capable of performing more than one task depending on how it is applied, as in the case of XGBoost, which can perform classification, regression, or survival analysis tasks. Table 9 compares related work, showing the similarities and differences of the applied models. Table 10 compares the performance of the models used and the number of patients in the samples. Table 11 provides more context on the medical tasks performed, such as diagnosing asthma, forecasting hospital visits, or predicting diseases, together with the algorithm applied and the algorithm task in each study. For example, an author may have used an algorithm to diagnose patients, making it a diagnosis model, but the underlying task may have been classification, assigning a patient to a category based on the patient’s parameters. Some studies were categorised into more than one category because the model served more than one purpose, for example, both prediction and diagnosis. The studies most relevant to this research are those applied to asthma that implement explainability and are predictive models.
However, the diagnosis of a disease can also be included as a feature for prediction, and a classification task can be used to classify the level of care a patient may need. A predictive model is relevant because the risk assessment of BA requires identifying patterns and predicting future outcomes, such as a patient being at risk of asthma exacerbations or being in the red zone.
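To make the prediction-versus-classification distinction concrete, the sketch below shows one way a continuous predicted risk score might be mapped to care levels such as the red zone. The thresholds, zone names, and function are hypothetical illustrations, not clinically validated cut-offs from the reviewed literature.

```python
def care_zone(risk_score: float, yellow: float = 0.3, red: float = 0.6) -> str:
    """Map a model's predicted exacerbation risk (0-1) to a care level.

    Thresholds are illustrative placeholders; in practice they would be
    calibrated against clinical outcomes and asthma action-plan guidance.
    """
    if risk_score >= red:
        return "red"     # high risk: urgent clinical review
    if risk_score >= yellow:
        return "yellow"  # elevated risk: closer monitoring
    return "green"       # low risk: routine care

# A predictive model outputs the risk score; this classification step
# then assigns the level of care, e.g. a patient predicted at 72% risk:
print(care_zone(0.72))  # prints "red"
```

In this framing, the predictive model handles pattern recognition over patient history, while a thin classification layer like this turns its output into an actionable care category.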

Table 6

Authors that used predictive models in their research (source: author’s analysis)

Model Authors
Predictive models Nesaragi et al., 2021 (23)
Kim et al., 2019 (24)
Ordikhani et al., 2022 (26)
Du et al., 2022 (27)
Zhang et al., 2021 (28)
Roseiro et al., 2023 (29)
Luo et al., 2021 (16)
Zeng et al., 2022 (17)
Zadeh et al., 2024 (30)
Qatawneh et al., 2019 (31)
Alzoubi et al., 2023 (32)
Yalçın et al., 2023 (33)
Singh et al., 2023 (34)
Choubey et al., 2022 (35)
George et al., 2023 (36)
Petrauskas et al., 2021 (37)
Casiraghi et al., 2020 (19)
Kor et al., 2022 (21)
Zeng et al., 2021 (38)
Giuste et al., 2023 (39)
Lee et al., 2022 (53)
Luo et al., 2021 (13)
Islam et al., 2022 (40)
Liu et al., 2023 (41)
Wang et al., 2021 (42)
AlSaad et al., 2019 (14)
Tu et al., 2023 (43)
Hernández-Pereira et al., 2022 (22)
Garcia-Canadilla et al., 2022 (45)
Aghamohammadi et al., 2019 (47)
Tseng et al., 2020 (48)
Li et al., 2021 (49)
Wang et al., 2022 (50)
Lotfata et al., 2023 (15)
Kovalchuk et al., 2022 (51)
Twick et al., 2022 (52)

Table 7

Authors that used diagnostic models in their research (source: author’s analysis)

Model Authors
Diagnostic models Tomita et al., 2023 (11)
Ray et al., 2022 (12)
Islam et al., 2023 (18)
de Lima et al., 2022 (20)
Kavya et al., 2021 (44)
Shin et al., 2022 (46)
Wang et al., 2023 (25)

Table 8

Authors that used other models in their research (source: author’s analysis)

Model Author
Classification model Islam et al., 2023 (18)

Table 9

Comparison between related work (source: author’s analysis)

Paper Research applied in Topics covered
BA Risk prediction Risk level classification in patients XAI Diagnosis
Luo et al., 2021 (16) Identifying chronic cough cases × ×
Zeng et al., 2022 (17) Predicting COPD exacerbations × × ×
Islam et al., 2023 (18) Using DCNN to enhance the detection and classification abnormalities in the lung × × ×
Casiraghi et al., 2020 (19) Predicting COVID-19 in patients rapidly and accurately × ×
de Lima et al., 2022 (20) Diagnosing respiratory abnormalities in sarcoidosis using explainable ML and providing decision support for clinicians × × ×
Kor et al., 2022 (21) Using explainable ML to predict first-time AECOPD at an individual level × × ×
Hernández-Pereira et al., 2022 (22) Predicting the different levels of hospital care for patients diagnosed with COVID-19 (regular hospital admission or ICU admission) × × × ×
Tomita et al., 2023 (11) Diagnosing adult asthma × ×
Ray et al., 2022 (12) Diagnosing and managing asthma and providing a perspective on the use of omics and ML in identifying asthma endotypes and its outcomes × ×
Luo et al., 2021 (13) Forecasting hospital visits that are asthma related and aid in asthma management and patient outcome × ×
AlSaad et al., 2019 (14) Interpreting predictions of risk that are patient specific using CD of BiLSTM with application to children with asthma × ×
Lotfata et al., 2023 (15) Examining the prevalence of socioeconomic and environmental determinants of asthma using GWRF × × ×

AECOPD, acute exacerbation of chronic obstructive pulmonary disease; BA, bronchial asthma; BiLSTM, bi-directional LSTM; CD, contextual decomposition; COPD, chronic obstructive pulmonary disease; COVID-19, coronavirus disease 2019; DCNN, deep convolutional neural network; GWRF, geographically weighted random forest; ICU, intensive care unit; LSTM, long short-term memory; ML, machine learning; XAI, explainable artificial intelligence.

Table 10

Comparison of number of patients and model performances

Paper Research applied in Source of data Number of patients Type of patients Models and their performance
Luo et al., 2021 (16) Identifying chronic cough cases EHR data: medication, diagnosis and clinical notes 47,144 patients 23,572 chronic cough patients. 23,572 non-chronic cough patients LR, SVM, kNN, RF, BERT, BERT + attention model, and BiLSTM + attention model. BERT + attention model performed the best overall in all their experiments. Mean specificity (95% CI): 0.930 (0.924, 0.936); mean sensitivity (95% CI): 0.948 (0.938, 0.958)
Zeng et al., 2022 (17) Predicting COPD exacerbations University of Washington Medicine: administrative and clinical data 285 patients 103 patients with COPD in the following 12 months. 182 patients with COPD and severe COPD exacerbation in following 12 months XGBoost and a class-based association rule XAI post hoc model. Able to explain 97.1% (100/103) of patients with COPD and 73.6% (134/182) of patients with COPD and severe COPD exacerbation
Islam et al., 2023 (18) Using DCNN to enhance the detection and classification abnormalities in the lung Iraq oncology teaching hospital/National Center for Cancer Diseases 2,073 CT scans from 110 healthy and lung cancer patients. 7,593 COVID-19 images from 466 patients. 6,893 normal images from 604 patients. 2,618 CAP images from 60 patients DCNN + GRU + XAI. LIME, SHAP, Grad-CAM. Accuracy: COVID19-dataset, 99.30%; lung cancer dataset, 98.9%
Casiraghi et al., 2020 (19) Predicting COVID-19 in patients rapidly and accurately Emergency department of multicenter health system: clinical, comorbidity, laboratory, anterior posterior or posterior anterior data 301 patients All patients RT-PCR positive COVID-19. 207 men, 94 women. 214 low risk patients, 87 high risk patients Boruta feature selection algorithm. RF in 10-fold cross validation scheme. RF classifier using most important variables. RF AUC: 0.81. Sensitivity: 0.72. Specificity: 0.76. F1-score: 0.62. Accuracy: 0.74
de Lima et al., 2022 (20) Diagnosing respiratory abnormalities in sarcoidosis using explainable ML and providing decision support for clinicians Biomedical Instrumentation Laboratory of the Rio De Janeiro State University: FOT results 72 patients 25 healthy individuals. 47 patients with sarcoidosis. From the 47, 24 remained in normal conditions while 23 later had respiratory changes Genetic programming in traditional tree form, GE, decision trees, kNN, SVM, AdaBoost, RF, LightGBM, XGBoost, and LR. XGBoost performed better than the other ML models in all three experiments in the paper, with an AUC of more than 90 in each
Kor et al., 2022 (21) Using explainable ML to predict first-time AECOPD at an individual level Changhua Christian Hospital: registry data from COPD Pay-for-Performance Program database 509 patients All patients had COPD GBM AUC (95% CI): 0.833 (0.745–0.921). SVM AUC (95% CI): 0.836 (0.757–0.915). XGB AUC (95% CI): 0.7703 (0.745–0.921). RF AUC (95% CI): 0.751 (0.642–0.859)
Hernández-Pereira et al., 2022 (22) Predicting the different levels of hospital care for patients diagnosed with COVID-19 (regular hospital admission or ICU admission) 14 hospitals in Galicia (Spain): demographic and clinical data 10,454 patients All diagnosed with COVID-19 AUC of 76.1% for predicting the need for hospital admission and AUC of 80.4% for ICU admission. In most cases, the deep network was determined to be the best
Tomita et al., 2023 (11) Diagnosing adult asthma Kindai University: medical records 566 adult outpatients 367 patients with positive case of asthma. 199 negative cases of asthma. 345 were female. 221 were male RF, XGBoost, and SHAP. XGBoost accuracy: 81%, AUC: 85%. RF accuracy: 76%, AUC: 79%
Ray et al., 2022 (12) Diagnosing and managing asthma and providing a perspective on the use of omics and ML in identifying asthma endotypes and its outcomes None None None None (study is still ongoing)
Luo et al., 2021 (13) Forecasting hospital visits that are asthma related and aid in asthma management and patient outcome KPSC data. 987,506 data instances 2,259 patients All patients diagnosed with asthma XGBoost AUC using 221 features: 0.820. XGBoost AUC using top 50 features: 0.815. Their XAI was able to explain 97.57% of the forecasting results correctly
AlSaad et al., 2019 (14) Interpreting predictions of risk that are patient specific using CD of BiLSTM with application to children with asthma Cerner Health Facts EHR dataset 11,071 patients All patients are pre-school students with respiratory-related complications LSTM AUC (95% CI): 0.831 (0.824–0.838). BiLSTM AUC (95% CI): 0.819 (0.811–0.827). Linear regression AUC (95% CI): 0.702 (0.692–0.712)
Lotfata et al., 2023 (15) Examining the prevalence of socioeconomic and environmental determinants of asthma using GWRF Behavioral Risk Factor Surveillance System 3,059 counties Socioeconomic and environmental determinants GWRF. The untransformed median asthma rate was 9.9% per 1,000 people in 3,059 counties

AECOPD, acute exacerbation of chronic obstructive pulmonary disease; AUC, area under the curve; BERT, bidirectional encoder representations from transformers; BiLSTM, bi-directional LSTM; CAP, chest, abdomen, and pelvis; CD, contextual decomposition; CI, confidence interval; COPD, chronic obstructive pulmonary disease; COVID-19, coronavirus disease 2019; CT, computed tomography; EHR, electronic health records; FOT, forced oscillation technique; GBM, gradient boosting machine; GE, grammatical evolution; Grad-CAM, gradient-weighted class activation mapping; GRU, gated recurrent unit; GWRF, geographically weighted random forest; ICU, intensive care unit; kNN, k-nearest neighbour; KPSC, Kaiser Permanente Southern California; LightGBM, light gradient boosting machine; LIME, local interpretable model-agnostic explanation; LR, logistic regression; LSTM, long short-term memory; ML, machine learning; RF, random forest; RT-PCR, reverse transcription-polymerase chain reaction; SHAP, SHapley Additive exPlanations; SVM, support vector machine; XAI, explainable artificial intelligence; XGBoost, extreme gradient boosting.

Table 11

Medical task, algorithm applied, and algorithm task of each article

Authors Medical task applied
Asthma application
   Tomita et al., 2023 (11) RF classifier and an optimized XGBoost classifier to diagnose adult asthma. Classification task to diagnose adult asthma
   Ray et al., 2022 (12) Not reported. Ongoing study
   Luo et al., 2021 (13) XGBoost classification algorithm and their class-based association rule automatic explanation method in forecasting asthma-related hospital visits. Classification task to predict asthma-related hospital visits
   AlSaad et al., 2019 (14) CD method, LSTM and BiLSTM, compared to LR model to predict and interpret future clinical outcomes in application to children with asthma. Predictive modelling with classification layer. Classification task to predict outcomes of children with asthma
   Lotfata et al., 2023 (15) Conventional RF and GWRF regression task for association between asthma prevalence and socioeconomic and environmental determinants. Regression task for asthma prevalence affected by socioeconomic determinants
Lung application
   Luo et al., 2021 (16) Identify chronic cough patients using BiLSTM with attention layer, BERT with attention layer, LR, RF, SVM, and kNN. Classification task to identify chronic cough patients
   Zeng et al., 2022 (17) XGBoost classification algorithm to make predictions on severe COPD exacerbations and their class-based association rule automatic explanation method. Classification task to predict severe COPD exacerbations
   Islam et al., 2023 (18) Detection and classification using DCNN-GRU for lung abnormalities. DCNN to extract meaningful features, including global and local patterns. GRU to model temporal dependencies and capture sequential information in the images. Both DCNN and GRU with a classification layer. Classification task to detect lung abnormalities
   Casiraghi et al., 2020 (19) Boruta feature selection algorithm and RF classifier in a 10-fold cross validation scheme for COVID-19 risk prediction. Classification task to predict COVID-19
   de Lima et al., 2022 (20) Genetic programming and GE classifier for diagnosing respiratory abnormalities in sarcoidosis. Data used from FOT exam. Classification task for diagnosing respiratory abnormalities
   Kor et al., 2022 (21) GBM, SVM, RF and XGBoost and SHAP in predicting AECOPD. Classification task for predicting acute COPD
   Hernández-Pereira et al., 2022 (22) Supervised classification methods: LR, SVM, AdaBoost, Bagging, RF, MLP for predicting different levels of hospital care of COVID-19. Classification task for predicting different levels of hospital care of COVID-19
Healthcare or risk prediction application
   Healthcare
     Nesaragi et al., 2021 (23) Five-fold cross-validation scheme, LightGBM learning models for binary classification to predict sepsis
     Kim et al., 2019 (24) CNNs for real-time mortality prediction in critically ill children in pediatric ICUs (PROMPT, an ML-based model). Classification task
     Wang et al., 2023 (25) ANN and CART using LIME for classification and regression tasks in the diagnosis of diabetes
    Ordikhani et al., 2022 (26) Novel risk assessment model, GA for CVD risk prediction. Classification task for CVD risk prediction
    Du et al., 2022 (27) Five ML algorithms with five-fold cross-validated grid search, LR, RF, SVM, AdaBoost and extreme gradient for prediction of GDM. Classification task for GDM prediction
    Zhang et al., 2021 (28) LR, SVM, RF, GBM, and AdaBoost for classification task in predicting AKI after deceased donor liver transplant
    Roseiro et al., 2023 (29) GRACE tool and a RF classifier to estimate the influence of inflammation biomarkers on cardiovascular risk assessment
    Zadeh et al., 2024 (30) Predictive models, decision tree, gradient boosting, RF, LARS, neural network, LR. Classification task for predicting de novo malignancy in pancreas transplant patients
    Qatawneh et al., 2019 (31) ANN, MLP feed forward neural network. Classification task for VTE risk classification
    Alzoubi et al., 2023 (32) MLP classification task for complex disease risk prediction using genomic variations
     Yalçın et al., 2023 (33) Fivefold cross-validation in the software R with an NNST, elastic net, RF, SVM, and SHAP. Classification and regression task for identifying malnutrition in NICU patients
    Singh et al., 2023 (34) MPI, explainable DL model (HARD MACE-DL). Grad-CAM, and SHAP for predicting death or nonfatal MI. Classification task
    Choubey et al., 2022 (35) ANN, GA, DAPP, EBP-ANN for prediction efficiency of virus borne diseases. Classification task
    George et al., 2023 (36) VAE-kNN and SHAP for predicting risk of 30-day ED visit. Classification task
    Petrauskas et al., 2021 (37) Fuzzy logic-based XAI. FSM which is a dynamic SWOT analysis network. FCM-type network for assessing the nutrition-related geriatric syndromes. Prediction and classification task
     Zeng et al., 2021 (38) k-means algorithm for preprocessing data, dynamic time warping, XGBoost model, SHAP framework for binary and multi-label classification in predictions of complications after pediatric congenital heart surgery
    Giuste et al., 2023 (39) Grad-CAM++, Eigen-CAM, Score-CAM, and Ablation-CAM techniques, PGAN, IGAN. Classification task for risk assessment of rare pediatric heart transplant rejection
    Islam et al., 2022 (40) HGSORF, ADASYN algorithm, SHAP, LIME, and feature importance curve. Classification task for CS prediction
    Liu et al., 2023 (41) XGBoost algorithm, SHAP. Isotonic regression for illness severity assessment of older adults in critical illness
    Wang et al., 2021 (42) LR, kNN, SVM, NB, MLP, XGBoost, and SHAP method. Classifiers for predicting 3-year all-cause mortality in patients with HF caused by CHD
    Tu et al., 2023 (43) LR, RF, LightGBM, and XGBoost for predicting mortality risk in ICU for patients with traumatic brain injury. Classification task
    Kavya et al., 2021 (44) kNN, SVM, C5.0, NN, AdaBag, RF for allergy diagnosis. Classification tasks
    Garcia-Canadilla et al., 2022 (45) Not available
    Shin et al., 2022 (46) CNN and Grad-CAM for vascular aging assessment. Regression task
    Aghamohammadi et al., 2019 (47) GA and ANFIS for predicting heart attack. Classification task
     Tseng et al., 2020 (48) LR, SVM, RF, XGBoost, and ensemble (RF + XGBoost) and SHAP for predicting AKI after cardiac surgery. Classification task
    Li et al., 2021 (49) XGBoost algorithm and SHAP. Classification task
    Wang et al., 2022 (50) Decision tree and RF for predicting hospitalization outcomes in geriatric patients with dementia. Multi-class classification task
    Kovalchuk et al., 2022 (51) K-neighbours classifier, RF classifier, LR, SVC, XGB classifier, Gaussian NB, Bernoulli NB, multinomial NB and SHAP for clinical decision making. Classification task
    Twick et al., 2022 (52) Non-linear tree-based gradient boosting classifiers, CatBoost and SHAP for EMR-based risk prediction models. Classification task
   Environment
     Lee et al., 2022 (53) RF, XGBoost, LR regularized with an L1 penalty (LASSO regression), an L2 penalty (ridge regression), or an elastic net penalty (elastic net), and DL algorithms for time-series data: LSTM and stacked LSTM for predicting non-accidental, cardiovascular, and respiratory mortality with environmental exposures. Regression task
    Lotfata et al., 2023 (15) Conventional RF and GWRF regression task for association between asthma prevalence and socioeconomic and environmental determinants

AdaBoost, adaptive boosting; ADASYN, adaptive synthetic; AECOPD, acute exacerbation of chronic obstructive pulmonary disease; AKI, acute kidney injury; ANFIS, adaptive neural fuzzy inference system; ANN, artificial neural network; BERT, bidirectional encoder representations from transformers; BiLSTM, bi-directional LSTM; CART, classification and regression tree; CatBoost, categorical boosting; CD, contextual decomposition; CHD, coronary heart disease; CNN, convolutional neural network; COPD, chronic obstructive pulmonary disease; COVID-19, coronavirus disease 2019; CS, C-section; CVD, cardiovascular disease; DAPP, dynamic angle projection pattern; DCNN, deep convolutional neural network; DL, deep learning; EBP-ANN, enhanced back propagation algorithm using ANN; ED, emergency department; EMR, electronic medical records; FCM, fuzzy cognitive maps; FOT, forced oscillation technique; FSM, fuzzy SWOT map; GA, genetic algorithm; GBM, gradient boosting machine; GDM, gestational diabetes mellitus; GE, grammatical evolution; GRACE, Global Registry of Acute Coronary Events; Grad-CAM, gradient-weighted class activation mapping; GRU, gated recurrent unit; GWRF, geographically weighted random forest; HF, heart failure; HGSO, Henry gas solubility optimization; HGSORF, HGSO-based RF; ICU, intensive care unit; IGAN, inspirational GAN; kNN, k-nearest neighbour; LARS, layer-wise adaptive rate scaling; LASSO, least absolute shrinkage and selection operator; LightGBM, light gradient boosting machine; LIME, local interpretable model-agnostic explanation; LR, logistic regression; LSTM, long short-term memory; MACE, major adverse cardiac events; MI, myocardial infarction; ML, machine learning; MLP, multilayer perceptron; MPI, myocardial perfusion imaging; NB, naive Bayesian; NICU, neonatal intensive care unit; NNST, neonatal nutritional screening tool; PGAN, progressive GAN; PROMPT, Paediatric Risk of Mortality Prediction Tool; RF, random forest; SHAP, SHapley Additive exPlanations; SVC, 
support vector classifier; SVM, support vector machine; VAE-kNN, variational autoencoder k-nearest neighbours’ algorithm; VTE, venous thromboembolism; XAI, explainable artificial intelligence; XGBoost, extreme gradient boosting.


Discussion

Key findings

We compared the risk assessment methods used in the selected papers. Not all papers address the same asthma-related task; our focus is on the algorithm tasks carried out for risk assessment, preferably in the BA domain, and on explainable risk assessment models. Owing to the scarcity of material, we also included literature on lung disease and broader healthcare, focusing on the performance of the risk assessment tasks accomplished by the algorithms and on the performance of the explainability models. We analysed and compared the XAI algorithms and the algorithm tasks performed for risk assessment; by algorithm tasks we mean tasks, such as classification, regression, or survival analysis, among others, that may be used to assess risk. Connections and similarities were then identified between the studies.

Although this study also included literature on ML, DL, and XAI methods developed for other diseases, such as chronic obstructive pulmonary disease (COPD) (17), diabetes (25), sepsis (23), CVD (26), gestational diabetes (27), kidney injury (28), venous thromboembolism (VTE) (31), malnutrition (33), myocardial infarction (MI) (34), viral infections (35), risk of emergency visits (36), complications after heart surgery (38), and mortality prediction (24), all of these works assess risk using patient data. The models also rely on similar inputs, such as electronic health records (EHR), clinical and medical history, laboratory data, patient vital signs, or demographic data, to predict risk. Since risk assessment in asthma uses similar input data, adapting the methods applied to other diseases is feasible. The research goals, such as predicting the progression of a disease, identifying clinical risk, or predicting a worsening condition, mirror the needs of risk assessment in BA, such as predicting the risk of exacerbation or hospitalization, and the algorithm tasks carried out, such as classification, forecasting, and survival analysis, are equally relevant. Of the 43 selected studies, 12 were decidedly more relevant than the others owing to their similar research requirements: five investigated explainable ML or DL in asthma applications (11-15), and the remaining seven addressed lung applications for the prediction of lung-related diseases (16-22).
In three of the studies, XGBoost was compared against other models: Tomita et al. (11), de Lima et al. (20), and Kor et al. (21). In the first two, XGBoost outperformed the other models, with an accuracy of 81% and an area under the curve (AUC) of 85% in Tomita et al. (11) and an AUC above 90% in de Lima et al. (20); in Kor et al. (21), however, XGBoost achieved an AUC of 0.7703 and underperformed relative to the gradient boosting machine (GBM) and SVM. All three papers used XGBoost for classification tasks. The results, sample sizes, and algorithm performances are presented in Table 10, and Table 11 lists the medical task, algorithm applied, and algorithm task of each study. Tomita et al. (11) used an XGBoost classifier to diagnose adult asthma with a sample of 566 patients. de Lima et al. (20) diagnosed respiratory abnormalities in sarcoidosis with a sample of 72 patients, comparing 10 algorithms as listed in Table 10. Kor et al. (21) predicted first-time acute exacerbation of COPD (AECOPD) with a sample of 509 patients. Among the 12 most relevant studies, XGBoost was the better-performing model, outperforming the alternatives in 2 of the 3 comparisons (66.67%) in which it appeared. Five of the 12 studies applied XGBoost, making it the most frequently chosen model (41.67%); however, two of these, by Zeng et al. (17) and Luo et al. (13), from the same group of researchers, investigated their own automatic explanation method for ML models using XGBoost as the underlying model.
The other studies in this review adopted models that may be directly interpretable on their own, or adopted post-hoc techniques, such as SHAP (14 studies), local interpretable model-agnostic explanation (LIME) (three studies), and gradient-weighted class activation mapping (Grad-CAM) (four studies).
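To make the distinction concrete, the sketch below (not taken from any of the reviewed studies) illustrates the post-hoc idea behind methods such as SHAP and LIME: a fitted classifier is treated as a black box and explained only after training. Here, permutation importance from scikit-learn stands in for the SHAP/LIME libraries, a GradientBoostingClassifier stands in for XGBoost, and the asthma-style features and labels are entirely synthetic and hypothetical.

```python
# Illustrative sketch: post-hoc, model-agnostic explanation of an opaque
# classifier via permutation importance. The feature names and data are
# synthetic and chosen only to resemble an asthma-risk setting.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 400
# Hypothetical features: age, FEV1 (% predicted), prior ED visits
X = np.column_stack([
    rng.normal(45, 15, n),   # age (noise in this synthetic setup)
    rng.normal(80, 12, n),   # FEV1 % predicted
    rng.poisson(0.5, n),     # prior emergency department visits
])
# Synthetic ground truth: risk driven mainly by FEV1 and ED visits
logit = -0.05 * (X[:, 1] - 80) + 0.9 * X[:, 2]
y = (logit + rng.normal(0, 0.5, n) > 0.5).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
model = GradientBoostingClassifier(random_state=0).fit(X_tr, y_tr)

# Post-hoc step: treat the fitted model as a black box and measure how
# shuffling each feature degrades held-out accuracy.
result = permutation_importance(model, X_te, y_te, n_repeats=20,
                                random_state=0)
for name, imp in zip(["age", "FEV1", "ED visits"],
                     result.importances_mean):
    print(f"{name}: {imp:.3f}")
```

SHAP and LIME refine this black-box idea with locally faithful, per-patient attributions, which is why they dominate the post-hoc techniques reported in the reviewed studies.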

Strengths and limitations

Strengths

This study extends the dialogue on XAI for BA and identifies the explainable ML or DL techniques that researchers have included in recent work, covering ante-hoc and post-hoc explainable models, the most used model among them, the most commonly used post-hoc technique, and the best-performing model in studies that include comparisons. This work has identified that XGBoost may be worth investigating as a model for BA risk prediction.

Limitations

Several limitations confine this study. First, the literature included was limited to five databases, namely Science Direct, ACM Digital Library, Springer, PubMed, and Scopus; relevant research outside these five databases may have been excluded. Second, the search criteria may have omitted studies that did not explicitly mention XAI, risk assessment, risk assessment models, asthma, or health. In addition, including only English-language literature may have excluded relevant studies written in other languages, introducing language bias. Furthermore, some studies whose full text could not be obtained were consequently excluded from the review.

Moreover, there was a disproportion between studies of explainable ML algorithms and explainable DL algorithms, with ML-based XAI forming the majority. This focuses the review on ML XAI while under-representing DL XAI.

In this review, we focused on the XAI model itself rather than on feature extraction or selection for the model. Despite asthma being a global issue, this review also insufficiently explores the relationship between XAI and asthma, owing to the scarcity of papers written on that exact topic.

Comparison with similar researches

A comparison with similar research is presented in Tables 9,10.

Explanations of findings

A comparison was made between the 12 most relevant studies, determined by their application to asthma and the lungs, in Tables 9,10. Table 9 compares what was done in each study, and Table 10 compares the performance of the models together with the dataset used, number of patients, and type of patients. It is worth noting that only studies that explicitly mentioned researching XAI were given a tick, although all included studies are considered to use explainable models. Table 11 lists the medical task, algorithm applied, and algorithm task of each study to define more clearly the medical tasks, the algorithms used to solve them, and the algorithm task of each algorithm. For example, the classification task of the XGBoost algorithm was used to solve the medical task of diagnosing adult asthma (11).

XGBoost was found to be the most used and the better-performing model within the 12 studies: it was chosen in five of the 12 studies and performed better in two of the three comparisons against other models. Some studies compared their own models with others but did not include XGBoost among them, such as the work by AlSaad et al. (14), Luo et al. (16), and Hernández-Pereira et al. (22); it is therefore difficult to deduce whether XGBoost would have outperformed the selected models. It is also worth noting that the size and generalizability of each dataset may contribute to model performance, as shown in Table 10. Although frequent use is not in itself an indicator of a better model, XGBoost may be a desirable choice because it is a tree-based ensemble method that, if not too complex, may be relatively interpretable. In addition, XGBoost has built-in feature importance metrics that can be used to identify the most important features and gain insight into how the features may affect the patterns of the predictions.
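As a minimal illustration of the built-in importance metrics mentioned above (a sketch under assumptions, not code from any of the reviewed studies), the snippet below trains a gradient-boosted tree ensemble on synthetic data and reads off its impurity-based feature importances after fitting. scikit-learn's GradientBoostingClassifier is used as a stand-in for XGBoost, whose scikit-learn wrapper exposes an analogous `feature_importances_` attribute; the data and feature names are hypothetical.

```python
# Illustrative sketch: built-in feature importance of a gradient-boosted
# tree ensemble. The label depends only on features 0 and 1; features
# 2-3 are pure noise, so they should receive low importance.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(1)
X = rng.normal(size=(300, 4))
y = (X[:, 0] + 2 * X[:, 1] > 0).astype(int)  # synthetic ground truth

clf = GradientBoostingClassifier(random_state=1).fit(X, y)

# The fitted ensemble exposes normalized importance scores directly,
# with no separate post-hoc explainer required.
ranked = sorted(zip(["f0", "f1", "noise1", "noise2"],
                    clf.feature_importances_),
                key=lambda t: -t[1])
for name, imp in ranked:
    print(f"{name}: {imp:.2f}")
```

Because these scores come free with training, they offer a cheap first look at which clinical variables drive a risk model, which post-hoc methods such as SHAP can then refine with per-patient attributions.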

Implications and actions needed

Research in XAI for risk prediction is developing rapidly. However, more literature is still needed on the use of XAI for predicting risks in asthma. Successful integration of XAI into clinical risk prediction for asthma will require clinical confirmation and further research.


Conclusions

In conclusion, this review has analysed the XAI models used for asthma, lung, and broader healthcare applications, presenting an overview of the XAI models currently used for risk assessment within the healthcare field. XAI for risk assessment in BA that is accurate, trustworthy, and interpretable by its users may soon contribute to improving the quality of life of asthma patients.


Acknowledgments

The authors declare that during the writing process of the work, AI-assisted technology ChatGPT and Monica were used only to improve clarity, enhance the overall quality of the writing, and were solely used as a tool to optimize the writing in the work. The authors edited and reviewed the work as needed.


Footnote

Reporting Checklist: The authors have completed the PRISMA reporting checklist. Available at https://jmai.amegroups.com/article/view/10.21037/jmai-24-204/rc

Peer Review File: Available at https://jmai.amegroups.com/article/view/10.21037/jmai-24-204/prf

Funding: This research was supported by Universiti Malaysia Sarawak under the Vice Chancellor High Impact Grant (UNI/F08/VC-HIRG/85481/P01-03/2022).

Conflicts of Interest: All authors have completed the ICMJE uniform disclosure form (available at https://jmai.amegroups.com/article/view/10.21037/jmai-24-204/coif). All authors report a grant from Universiti Malaysia Sarawak outside the submitted work (UNI/F08/VC-HIRG/85481/P01-03/2022). The authors have no other conflicts of interest to declare.

Ethical Statement: The authors are accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved.

Open Access Statement: This is an Open Access article distributed in accordance with the Creative Commons Attribution-NonCommercial-NoDerivs 4.0 International License (CC BY-NC-ND 4.0), which permits the non-commercial replication and distribution of the article with the strict proviso that no changes or edits are made and the original work is properly cited (including links to both the formal publication through the relevant DOI and the license). See: https://creativecommons.org/licenses/by-nc-nd/4.0/.


References

  1. Global Initiative for Asthma (GINA). 2024 GINA Main Report. 2024 [cited 2024 Jun 25]. Available online: https://ginasthma.org/2024-report/
  2. National Heart, Lung, and Blood Institute. What Is Asthma? 2021. Available online: https://www.nhlbi.nih.gov/health/asthma
  3. Chaddad A, Peng J, Xu J, et al. Survey of Explainable AI Techniques in Healthcare. Sensors (Basel) 2023; [Crossref]
  4. Exarchos KP, Beltsiou M, Votti CA, et al. Artificial intelligence techniques in asthma: a systematic review and critical appraisal of the existing literature. Eur Respir J 2020;56:2000521. [Crossref] [PubMed]
  5. Antoniadi AM, Du Y, Guendouz Y, et al. Current Challenges and Future Opportunities for XAI in Machine Learning-Based Clinical Decision Support Systems: A Systematic Review. Applied Sciences 2021;11:5088.
  6. Yu X, Watson M. Guidance on conducting a systematic literature review. Journal of Planning Education and Research 2017;39:93-112.
  7. Kitchenham B, Charters SM. Guidelines for performing systematic literature reviews in Software Engineering. EBSE Technical Report. 2007. Available online: https://www.researchgate.net/publication/302924724_Guidelines_for_performing_Systematic_Literature_Reviews_in_Software_Engineering_Guidelines_for_performing_Systematic_Literature_Reviews_in_Software_Engineering
  8. Sheu RK, Pardeshi MS. A Survey on Medical Explainable AI (XAI): Recent Progress, Explainability Approach, Human Interaction and Scoring System. Sensors (Basel) 2022; [Crossref]
  9. Gates S. Review of methodology of quantitative reviews using meta-analysis in ecology. Journal of Animal Ecology 2002;71:547-57.
  10. Gomersall JS, Jadotte YT, Xue Y, et al. Conducting systematic reviews of economic evaluations. Int J Evid Based Healthc 2015;13:170-8. [Crossref] [PubMed]
  11. Tomita K, Yamasaki A, Katou R, et al. Construction of a Diagnostic Algorithm for Diagnosis of Adult Asthma Using Machine Learning with Random Forest and XGBoost. Diagnostics (Basel) 2023; [Crossref]
  12. Ray A, Das J, Wenzel SE. Determining asthma endotypes and outcomes: Complementing existing clinical practice with modern machine learning. Cell Rep Med 2022;3:100857. [Crossref] [PubMed]
  13. Luo G, Nau CL, Crawford WW, et al. Generalizability of an Automatic Explanation Method for Machine Learning Prediction Results on Asthma-Related Hospital Visits in Patients With Asthma: Quantitative Analysis. J Med Internet Res 2021;23:e24153. [Crossref] [PubMed]
  14. AlSaad R, Malluhi Q, Janahi I, et al. Interpreting patient-Specific risk prediction using contextual decomposition of BiLSTMs: application to children with asthma. BMC Med Inform Decis Mak 2019;19:214. [Crossref] [PubMed]
  15. Lotfata A, Moosazadeh M, Helbich M, et al. Socioeconomic and environmental determinants of asthma prevalence: a cross-sectional study at the U.S. County level using geographically weighted random forests. Int J Health Geogr 2023;22:18. [Crossref] [PubMed]
  16. Luo X, Gandhi P, Zhang Z, et al. Applying interpretable deep learning models to identify chronic cough patients using EHR data. Comput Methods Programs Biomed 2021;210:106395. [Crossref] [PubMed]
  17. Zeng S, Arjomandi M, Luo G. Automatically Explaining Machine Learning Predictions on Severe Chronic Obstructive Pulmonary Disease Exacerbations: Retrospective Cohort Study. JMIR Med Inform 2022;10:e33043. [Crossref] [PubMed]
  18. Islam MK, Rahman M, Ali MS, et al. Enhancing lung abnormalities detection and classification using a Deep Convolutional Neural Network and GRU with explainable AI: A promising approach for accurate diagnosis. Machine Learning With Applications 2023;14:100492.
  19. Casiraghi E, Malchiodi D, Trucco G, et al. Explainable Machine Learning for Early Assessment of COVID-19 Risk Prediction in Emergency Departments. IEEE Access 2020;8:196299-325.
  20. de Lima AD, Lopes AJ, do Amaral JLM, et al. Explainable machine learning methods and respiratory oscillometry for the diagnosis of respiratory abnormalities in sarcoidosis. BMC Med Inform Decis Mak 2022;22:274. [Crossref] [PubMed]
  21. Kor CT, Li YR, Lin PR, et al. Explainable Machine Learning Model for Predicting First-Time Acute Exacerbation in Patients with Chronic Obstructive Pulmonary Disease. J Pers Med 2022; [Crossref]
  22. Hernández-Pereira E, Fontenla-Romero O, Bolón-Canedo V, et al. Machine learning techniques to predict different levels of hospital care of CoVid-19. Appl Intell (Dordr) 2022;52:6413-31. [Crossref] [PubMed]
  23. Nesaragi N, Patidar S, Veerakumar T. A correlation matrix-based tensor decomposition method for early prediction of sepsis from clinical data. Biocybernetics and Biomedical Engineering 2021;41:1013-24.
  24. Kim SY, Kim S, Cho J, et al. A deep learning model for real-time mortality prediction in critically ill children. Crit Care 2019;23:279. [Crossref] [PubMed]
  25. Wang YC, Chen TCT, Chiu MC. A systematic approach to enhance the explainability of artificial intelligence in healthcare with application to diagnosis of diabetes. Healthcare Analytics 2023;3:100183.
  26. Ordikhani M, Saniee Abadeh M, Prugger C, et al. An evolutionary machine learning algorithm for cardiovascular disease risk prediction. PLoS One 2022;17:e0271723. [Crossref] [PubMed]
  27. Du Y, Rafferty AR, McAuliffe FM, et al. An explainable machine learning-based clinical decision support system for prediction of gestational diabetes mellitus. Sci Rep 2022;12:1170. [Crossref] [PubMed]
  28. Zhang Y, Yang D, Liu Z, et al. An explainable supervised machine learning predictor of acute kidney injury after adult deceased donor liver transplantation. J Transl Med 2021;19:321. [Crossref] [PubMed]
  29. Roseiro M, Henriques J, Paredes S, et al. An interpretable machine learning approach to estimate the influence of inflammation biomarkers on cardiovascular risk assessment. Comput Methods Programs Biomed 2023;230:107347. [Crossref] [PubMed]
  30. Zadeh A, Broach C, Nosoudi N, et al. Building analytical models for predicting de novo malignancy in pancreas transplant patients: A machine learning approach. Expert Systems with Applications 2024;237:121584.
  31. Qatawneh Z, Alshraideh M, Almasri N, et al. Clinical decision support system for venous thromboembolism risk classification. Applied Computing and Informatics 2019;15:12-8.
  32. Alzoubi H, Alzubi R, Ramzan N. Deep Learning Framework for Complex Disease Risk Prediction Using Genomic Variations. Sensors (Basel) 2023; [Crossref]
  33. Yalçın N, Kaşıkcı M, Çelik HT, et al. Development and validation of machine learning-based clinical decision support tool for identifying malnutrition in NICU patients. Sci Rep 2023;13:5227. [Crossref] [PubMed]
  34. Singh A, Miller RJH, Otaki Y, et al. Direct Risk Assessment From Myocardial Perfusion Imaging Using Explainable Deep Learning. JACC Cardiovasc Imaging 2023;16:209-20. [Crossref] [PubMed]
  35. Choubey S, Barde S, Badholia A. Enhancing the prediction efficiency of virus borne diseases using enhanced backpropagation with an artificial neural network. Measurement: Sensors 2022;24:100505.
  36. George R, Ellis B, West A, et al. Ensuring fair, safe, and interpretable artificial intelligence-based prediction tools in a real-world oncological setting. Commun Med (Lond) 2023;3:88. [Crossref] [PubMed]
  37. Petrauskas V, Jasinevicius R, Damuleviciene G, et al. Explainable Artificial Intelligence-Based Decision Support System for assessing the Nutrition-Related Geriatric Syndromes. Applied Sciences 2021;11:11763.
  38. Zeng X, Hu Y, Shu L, et al. Explainable machine-learning predictions for complications after pediatric congenital heart surgery. Sci Rep 2021;11:17244. [Crossref] [PubMed]
  39. Giuste FO, Sequeira R, Keerthipati V, et al. Explainable synthetic image generation to improve risk assessment of rare pediatric heart transplant rejection. J Biomed Inform 2023;139:104303. [Crossref] [PubMed]
  40. Islam MS, Awal MA, Laboni JN, et al. HGSORF: Henry Gas Solubility Optimization-based Random Forest for C-Section prediction and XAI-based cause analysis. Comput Biol Med 2022;147:105671. [Crossref] [PubMed]
  41. Liu X, Hu P, Yeung W, et al. Illness severity assessment of older adults in critical illness using machine learning (ELDER-ICU): an international multicentre study with subgroup bias evaluation. Lancet Digit Health 2023;5:e657-67. [Crossref] [PubMed]
  42. Wang K, Tian J, Zheng C, et al. Interpretable prediction of 3-year all-cause mortality in patients with heart failure caused by coronary heart disease based on machine learning and SHAP. Comput Biol Med 2021;137:104813. [Crossref] [PubMed]
  43. Tu KC, Tau ENT, Chen NC, et al. Machine Learning Algorithm Predicts Mortality Risk in Intensive Care Unit for Patients with Traumatic Brain Injury. Diagnostics (Basel) 2023; [Crossref]
  44. Kavya R, Christopher J, Panda S, et al. Machine Learning and XAI approaches for Allergy Diagnosis. Biomedical Signal Processing and Control 2021;69:102681.
  45. Garcia-Canadilla P, Isabel-Roquero A, Aurensanz-Clemente E, et al. Machine Learning-Based Systems for the Anticipation of Adverse Events After Pediatric Cardiac Surgery. Front Pediatr 2022;10:930913. [Crossref] [PubMed]
  46. Shin H, Noh G, Choi BM. Photoplethysmogram based vascular aging assessment using the deep convolutional neural network. Sci Rep 2022;12:11377. [Crossref] [PubMed]
  47. Aghamohammadi M, Madan M, Hong JK, et al. Predicting heart attack through explainable artificial intelligence. In: International Conference on Computational Science. Cham: Springer International Publishing; 2019:633-45.
  48. Tseng PY, Chen YT, Wang CH, et al. Prediction of the development of acute kidney injury following cardiac surgery by machine learning. Crit Care 2020;24:478. [Crossref] [PubMed]
  49. Li W, Song Y, Chen K, et al. Predictive model and risk analysis for diabetic retinopathy using machine learning: a retrospective cohort study in China. BMJ Open 2021;11:e050989. [Crossref] [PubMed]
  50. Wang X, Ezeana CF, Wang L, et al. Risk factors and machine learning model for predicting hospitalization outcomes in geriatric patients with dementia. Alzheimers Dement (N Y) 2022;8:e12351. [Crossref] [PubMed]
  51. Kovalchuk SV, Kopanitsa GD, Derevitskii IV, et al. Three-stage intelligent support of clinical decision making for higher trust, validity, and explainability. J Biomed Inform 2022;127:104013. [Crossref] [PubMed]
  52. Twick I, Zahavi G, Benvenisti H, et al. Towards interpretable, medically grounded, EMR-based risk prediction models. Sci Rep 2022;12:9990. [Crossref] [PubMed]
  53. Lee W, Lim YH, Ha E, et al. Forecasting of non-accidental, cardiovascular, and respiratory mortality with environmental exposures adopting machine learning approaches. Environ Sci Pollut Res Int 2022;29:88318-29. [Crossref] [PubMed]
doi: 10.21037/jmai-24-204
Cite this article as: Edwin CVA, Lim PC, Awang Iskandar DNB, Ng DLC. A systematic literature review of explainable risk assessment models for bronchial asthma. J Med Artif Intell 2026;9:7.