Recent trends in prediction of chronic kidney disease using different learning approaches: a systematic literature review

Iliyas Ibrahim Iliyas; Souley Boukari; Abdulsalam Ya’u Gital

doi:10.21037/jmai-24-256

Review Article

Iliyas Ibrahim Iliyas¹ , Souley Boukari², Abdulsalam Ya’u Gital²

¹Department of Computer Science, University of Maiduguri, Borno State, Nigeria; ²Department of Computer Science, Abubakar Tafawa Balewa University, Bauchi State, Nigeria

Contributions: (I) Conception and design: All authors; (II) Administrative support: None; (III) Provision of study materials or patients: II Iliyas; (IV) Collection and assembly of data: II Iliyas; (V) Data analysis and interpretation: II Iliyas; (VI) Manuscript writing: II Iliyas; (VII) Final approval of manuscript: All authors.

Correspondence to: Iliyas Ibrahim Iliyas, MSc. Lecturer, Department of Computer Science, University of Maiduguri, Alone Bama Road, Maiduguri, Borno State 600004, Nigeria. Email: iliyasibrahimiliyas@unimaid.edu.ng.

Background: Chronic kidney disease (CKD) usually develops gradually over time. Because experienced nephrologists, biopsy services, and imaging are in short supply, most CKD patients are not tested properly or promptly, which is why computer-aided, automated, convenient, accurate, and low-cost CKD predictive methods could enhance early detection. This study aims to present a comprehensive overview of the existing learning approaches for the prediction of CKD, their strengths and challenges, directions for further study, and the different CKD datasets to which these approaches have been applied.

Methods: Studies published from 2021 to 2024 were retrieved from Scopus, Springer, PubMed, ResearchGate, and Google Scholar. The Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guideline was adopted for selecting the studies; out of 321 studies, 86 were fully reviewed and included, selected on the basis of their relevance to the prediction of CKD and their performance metrics.

Results: We conducted a comprehensive review of the recent use of artificial intelligence (AI) and of how different machine learning (ML) approaches are used to improve CKD prediction. The study groups these approaches into conventional ML (CML), deep learning (DL), and ensemble ML (ES), all of which have effectively predicted CKD. The implications of these approaches are discussed, along with the challenges and the further studies needed to enhance CKD prediction in clinical settings.

Conclusions: AI models have shown promising improvements in the accuracy of CKD prediction by reducing human error and enhancing diagnostics. ML is a core component of AI that provides the ability to learn from data, identify complex patterns, and make decisions. ML can be classified into three learning approaches: CML, DL, and ensemble learning. DL has outperformed the other learning approaches in identifying complex patterns and in incorporating feature engineering, both of which are limitations of CML. Additionally, DL has limited complexity compared to ensemble learning. Further studies should focus on enhancing DL to capture complex patterns, improving feature selection to identify nonlinearity within features, and integrating explainable AI (XAI) techniques to improve interpretability and trust in the health sector.

Keywords: Ensemble learning; conventional machine learning (CML); deep learning (DL); chronic kidney disease (CKD); different learning for kidney detection

Received: 30 July 2024; Accepted: 28 November 2024; Published online: 12 March 2025.

Highlight box

Key findings

• Machine learning has three different learning approaches which are conventional machine learning, deep learning and ensemble learning for the prediction of chronic kidney disease (CKD).

What is known and what is new?

• Previous reviews have discussed different machine learning algorithms for predicting CKD based on performance accuracy.

• However, there has been no previous research that studies different machine learning approaches that are used for the prediction of CKD with the consideration of their strengths and limitations.

What is the implication, and what should change now?

• We will need additional development methods to enhance accuracy, efficiency, features and the use of explainable artificial intelligence techniques to improve the prediction of CKD.

Introduction

Chronic kidney disease (CKD) refers to the condition in which the kidneys are damaged and unable to filter blood and flush out metabolic waste adequately. It normally develops gradually over time. Globally, more than 800 million people are diagnosed with CKD, and in some cases CKD progresses to end-stage renal disease (ESRD) (1). CKD symptoms are not observed in the early stage, until the kidney has lost almost 25% of its function (2). CKD has become a universal health issue, and kidney disease is projected to cause 5 to 10 million deaths annually worldwide. Current projections suggest that by 2040 CKD will be the fifth leading cause of death globally (3). Traditionally, detecting and estimating CKD severity involves various biomarkers and computational methods. The most widely used measure of kidney function is the glomerular filtration rate (GFR); however, manual measurement of GFR is labour-intensive, time-consuming, and requires careful planning, so it is not sufficient on its own for detecting CKD. Furthermore, using GFR for routine CKD screening might be impractical and costly. Limited access to nephrologists, screening facilities, and biopsy services also leads to late detection of CKD, which is why computer-aided, automated, convenient, accurate, and low-cost CKD predictive methods could enhance early detection (4). Machine learning (ML) is a subdivision of data science in which machines learn instead of being rigorously programmed. ML is a set of algorithms that enable software programs to predict outputs accurately without explicit external programming. The basic premise of ML is to build algorithms that receive input data and use statistical analysis to produce an output, updating that output when new data become available.
ML has achieved significant results in the diagnosis of different diseases, and a typical application of a learning algorithm involves classifying people by their symptoms as sick or healthy (5). ML has become a growing field that addresses the study of large amounts of data; it evolved from the study of computational learning theory and pattern recognition in artificial intelligence (AI) and offers various algorithms and techniques to analyse and predict data. ML methods have shown success in detecting and recognizing vital diseases in the medical arena by using previous CKD patient information to train predictive models (6). Although several risk factors are linked to the onset of CKD, methods for predicting the risk of CKD remain limited; as a result, opportunities for interventions that slow CKD development go undetected by health systems, and clinicians face challenges in managing kidney disease (KD) with inadequate tools (3). With the use of AI, we can now improve the diagnosis and prediction of CKD. AI and ML have provided solutions for diagnosing CKD from sparse data and predicting patient prognosis, with researchers contributing to the success of these methods in resolving complex problems in predicting CKD diagnosis and prognosis and in managing treatment (7). Different Learning for Kidney Prediction (DLKP) refers to the different learning approaches, spanning numerous algorithms, adopted for predicting CKD. With the evolution of ML in healthcare, predicting CKD has become increasingly data-driven, providing ways for early detection; however, whether these ML techniques are effective in clinical settings remains unclear. This study seeks to address the following questions: what are the different ML approaches developed for the prediction of CKD, what are their strengths and the challenges they face, what CKD datasets are available, and what further studies can help improve the prediction of CKD?

Previously published surveys on CKD prediction

This section surveys reviews on CKD prediction that have been published in the literature. It highlights insights from those reviews as well as the improvements that the present study offers over them. Recently, many papers on CKD prediction have been published. González-Rocha et al. (8) assessed CKD prediction scores among patients with type 2 diabetes versus those in good health and emphasized the need for further validation across a range of demographics. Zhao et al. (9) presented the use of AI-assisted medical imaging to enhance CKD diagnosis, noting the challenge of integrating AI into clinical practice. Chen et al. (10) systematically reviewed ML-based tools for CKD care, pointing out the need for methodological rigour and real-world application validation. Finally, Sanmarchi et al. (7) assessed AI and ML techniques for CKD, focusing on their clinical utility but identifying issues with generalizability and clinical evaluation. The present study provides a comprehensive review of CKD prediction and an analysis of various learning methods, their strengths, limitations, and directions for further study. It also provides an extensive review of different datasets and techniques over a specific period [2021–2024]. Additionally, this work provides a comparative analysis of the strengths and challenges faced by each learning method and proposes actionable steps for future improvements in CKD prediction, diagnosis, and treatment. We present this article in accordance with the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) reporting checklist (available at https://jmai.amegroups.com/article/view/10.21037/jmai-24-256/rc).

Methods

Study aim

The study reviewed various studies carried out for the prediction of CKD, specifically the different ML, deep learning (DL), and ensemble learning methods. This study investigated CKD datasets, the approaches used in applying those learning methods, and their performances based on several evaluation metrics.

Data source and description

The search strategy used in this review was based on PRISMA, which was introduced by Moher et al. (11). Scopus, Springer, PubMed, ResearchGate, and Google Scholar were searched using the keywords “machine learning and CKD/KD prediction” or “AI CKD/KD prediction”, and studies carried out from 2021 to 2024 were selected. The studies were screened by title and abstract; based on accessibility, 99% of the selected studies were reviewed in full, and models were chosen based on the highest scores reported in each study’s evaluation metrics.

Selection criteria

Based on the aim of the study, only studies on the prediction of CKD using ML were selected. Studies carried out before 2021 and studies not written in English were excluded. Some studies compared multiple evaluation metrics to determine the best model; this review extracted only one metric, accuracy, and when accuracy was not reported as an evaluation metric, other metrics were considered instead. Only studies using numerical datasets were considered.
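Because accuracy is the single metric extracted here, it may help to see how it relates to sensitivity and specificity computed from the same predictions. The following is a minimal sketch; the function names and the label vectors are hypothetical and are not taken from any reviewed study.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary task."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p != positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    return tp, tn, fp, fn


def summary_metrics(y_true, y_pred, positive=1):
    """Return accuracy, sensitivity (recall), and specificity."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred, positive)
    total = tp + tn + fp + fn
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn) if (tp + fn) else float("nan"),
        "specificity": tn / (tn + fp) if (tn + fp) else float("nan"),
    }


# Hypothetical labels: 1 = CKD, 0 = not CKD (illustrative values only).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
metrics = summary_metrics(y_true, y_pred)
print(metrics)
```

In this toy case the classifier misses half of the CKD cases while accuracy still looks acceptable, which is one reason a single accuracy figure should be read with caution.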

Included studies

Of the 321 studies retrieved from Scopus, Springer, PubMed, ResearchGate, and Google Scholar, 104 were excluded after screening the titles of those articles, 196 were fully reviewed, 110 were excluded, and 86 studies were reviewed and analysed. Figure 1 shows the PRISMA flowchart of the study, while the taxonomy of the study is shown in Figure 2.

Figure 1 PRISMA flowchart. KD, kidney disease; ML, machine learning.

Figure 2 Taxonomy for different learning methods developed for predicting CKD. CKD, chronic kidney disease; AGRU, augmented gated recurrent unit; ANN, artificial neural network; BiLSTM, bidirectional long short-term memory; CNN, convolutional neural network; DBN, deep belief network; DNN, deep neural network; DT, decision tree; ERT, extremely randomized trees; GBM, gradient boosting machine; GNN, graph neural network; HRNN, hierarchical recurrent neural network; IBK, instance-based learning k-nearest neighbour; KNN, k-nearest neighbour; LR, logistic regression; MLP, multilayer perceptron; NB, naive Bayes; NN, neural network; RF, random forest; RT, regression tree; SVM, support vector machine; XGBoost, extreme gradient boosting.

Overview of CKD

CKD is characterized by a progressive, gradual decline in kidney function that eventually requires renal replacement therapy (transplantation or dialysis) (7). Globally, CKD is a major public health concern, especially in low- and middle-income countries, where millions of people die from a lack of access to care (12). The disease is caused by long-term, permanent alterations in the structure and function of the kidney; it is called “chronic” because kidney function declines over a long period, in a gradual, steady, and irreversible process. For adult patients, a GFR of less than 60 mL/min/1.73 m², or a GFR of greater than 60 mL/min/1.73 m² with evidence of damage to kidney structures, persisting for three months or more, is regarded as an indicator of CKD (13). The rate of new cases of CKD is increasing globally, and by 2040 CKD is expected to rank as the sixth most prevalent chronic disease. Varying healthcare systems, societal distributions, and risk factors for CKD contribute to the global incidence and prevalence of this disease. The current standardized incidence of CKD [estimated GFR (eGFR) <60 mL/min/1.73 m²] is estimated to be between 10% and 15%. The consequences of CKD are diverse (14). CKD reduces the quality of human life, increases healthcare expenditures, and promotes early death. The consequences of untreated CKD include end-stage kidney disease (ESKD), which is the last stage of CKD. Diseases such as cardiovascular disease (CVD), diabetes, hypertension, and obesity can contribute to the development of CKD (15).
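The diagnostic criterion for adults described in this section can be written as a small check. This is a minimal sketch for illustration only: the function and parameter names are hypothetical, and because the text does not say how a GFR of exactly 60 is handled, the sketch treats only values below 60 as reduced.

```python
def meets_ckd_definition(gfr, has_kidney_damage, duration_months):
    """Apply the GFR/damage/duration criterion quoted in the text.

    gfr: glomerular filtration rate in mL/min/1.73 m^2
    has_kidney_damage: True if there is evidence of structural damage
    duration_months: how long the abnormality has persisted
    """
    persistent = duration_months >= 3   # "three months or more"
    reduced_gfr = gfr < 60              # assumption: 60 itself is not "reduced"
    return persistent and (reduced_gfr or has_kidney_damage)


# Hypothetical cases, for illustration only.
print(meets_ckd_definition(45, False, 6))   # low GFR, persistent
print(meets_ckd_definition(75, True, 4))    # preserved GFR with damage
print(meets_ckd_definition(45, False, 1))   # low GFR, not yet persistent
```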

Results

CKD diagnosis

In the initial stage, CKD can be detected using several biochemical parameters, such as the estimated GFR (eGFR), which is calculated from plasma creatinine with the help of equations such as the Chronic Kidney Disease Epidemiology Collaboration (CKD-EPI) equation (14). The chance of receiving treatment to stop the course of CKD increases with a diagnosis of CKD. Most kidney disease goes undetected and untreated until a later stage, when interventions are less effective, due to low screening rates in high-risk populations, a lack of patient symptoms, and the low sensitivity of creatinine-based measures of kidney function for detecting early kidney damage. Moreover, the cause of CKD is still unknown in a large number of individuals. The current standard for assessing renal microstructure is a kidney biopsy, which is only rarely performed. Furthermore, there is a great deal of variation in eGFR trajectories, which can be attributed in part to genetic variables and the severity of systemic chronic illnesses such as diabetes. Early detection of CKD is recommended for patients who are likely to have advanced disease (16). Published in 2002 by Kidney Disease: Improving Global Outcomes (KDIGO), the Kidney Disease Outcomes Quality Initiative was the first CKD guideline that emphasized the need for the identification and staging of CKD according to the GFR. KDIGO sponsored an international Controversies Conference in 2009 to discuss the findings of a meta-analysis of 45 cohorts, comprising over 1.5 million adults, that examined the relationship of albuminuria and eGFR with kidney outcomes and mortality. Participants at the conference agreed to split CKD stage G3 into two stages and to add urine albumin categories to each CKD stage. They also produced the KDIGO heatmap. The 2012 CKD guideline from KDIGO recommended determining CKD status (16).
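To make the creatinine-based estimate concrete, the sketch below implements one published form of the CKD-EPI equation. The text does not say which version is meant, so the 2021 race-free creatinine equation is an assumption here, as are the function names; the stage cut-offs follow the widely used GFR categories, with G3 split into G3a and G3b. It is an illustration, not clinical software.

```python
def egfr_ckd_epi_2021(serum_creatinine_mg_dl, age_years, is_female):
    """Estimate GFR (mL/min/1.73 m^2) from serum creatinine.

    Assumes the CKD-EPI 2021 creatinine equation:
    142 * min(Scr/k, 1)^a * max(Scr/k, 1)^-1.200 * 0.9938^age * 1.012 (if female)
    """
    kappa = 0.7 if is_female else 0.9
    alpha = -0.241 if is_female else -0.302
    ratio = serum_creatinine_mg_dl / kappa
    egfr = (142.0
            * min(ratio, 1.0) ** alpha
            * max(ratio, 1.0) ** -1.200
            * 0.9938 ** age_years)
    return egfr * 1.012 if is_female else egfr


def gfr_stage(egfr):
    """Map an eGFR value to a GFR category (G1 to G5, with G3 split)."""
    if egfr >= 90:
        return "G1"
    if egfr >= 60:
        return "G2"
    if egfr >= 45:
        return "G3a"
    if egfr >= 30:
        return "G3b"
    if egfr >= 15:
        return "G4"
    return "G5"


# Hypothetical patient values, for illustration only.
value = egfr_ckd_epi_2021(serum_creatinine_mg_dl=2.0, age_years=50, is_female=False)
print(round(value, 1), gfr_stage(value))
```

A GFR category alone does not establish CKD; the albuminuria categories and the three-month persistence requirement mentioned in the text still apply.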

ML model for predicting CKD

Predicting CKD involves analysing clinical, demographic, and laboratory factors to assess the presence or absence of CKD. Manual prediction of CKD can be challenging due to human error, which can lead to unnecessary treatment. One study suggests that a considerable number of people experience at least one diagnostic mistake during their lifetime, linked to a lack of noticeable symptoms, rare disease presentations, and diseases unintentionally excluded from consideration. ML has now spread across the healthcare sector to improve the diagnosis of disease, enhancing diagnostic accuracy and safety (17) and supporting effective CKD diagnosis with accurate detection at the primary stage. ML is now one of the most essential and prosperous areas in the healthcare sector for analysing and making predictions for different diseases and stages.

Conventional ML (CML) in predicting CKD

According to Arthur Samuel, ML is a field of research in which computers learn without explicit programming (12). ML has emerged as a thriving area within AI. It makes use of techniques and models drawn from statistics and probability theory in place of symbolic approaches. By examining enough data samples, ML algorithms aid machines in learning the information required to carry out a specific activity. The features that reflect the most particular information must be recovered as part of a phase called feature extraction before the method can be used. The sample data, which come next in the process, are used to train the system to communicate characteristics and distinguish patterns using a particular ML training technique (18).

Logistic regression (LR)

LR is a common statistical approach for modelling outcomes with two possible results. Various learning methods are used to fit LR in statistical research (19). It has shown success in predicting CKD: the study by Palembang et al. (20) applied K-nearest neighbour (KNN) imputation to the dataset before evaluating the method and achieved 99.75% accuracy; Ifraz et al. (21) showed that LR outperformed other ML techniques in predicting CKD, with 97.00% accuracy; and Mijwil and Alsaadi (22) highlighted that LR predicts kidney failure with 97.80% accuracy.
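As an illustration of how LR models a two-class outcome, the sketch below fits a one-feature logistic model with plain batch gradient descent. The feature values, labels, learning rate, and function names are all hypothetical, and none of the reviewed studies is claimed to have trained LR this way.

```python
import math


def sigmoid(z):
    """Logistic function mapping a real score to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, ys, learning_rate=0.5, epochs=5000):
    """Fit weight w and bias b by batch gradient descent on the log-loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in zip(xs, ys):
            error = sigmoid(w * x + b) - y   # derivative of log-loss w.r.t. score
            grad_w += error * x
            grad_b += error
        w -= learning_rate * grad_w / n
        b -= learning_rate * grad_b / n
    return w, b


def predict_proba(x, w, b):
    """Probability of the positive class (here: CKD)."""
    return sigmoid(w * x + b)


# Hypothetical single feature (e.g. a creatinine-like value); 1 = CKD, 0 = not CKD.
xs = [0.5, 1.0, 1.5, 3.0, 3.5, 4.0]
ys = [0, 0, 0, 1, 1, 1]
w, b = fit_logistic(xs, ys)
print([int(predict_proba(x, w, b) >= 0.5) for x in xs])
```

In practice a library implementation with regularization and multiple features would normally be used; the loop above only exposes the underlying mechanics.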

Random forest (RF)

RF is an ensemble technique, similar to the nearest-neighbour predictor, that applies divide-and-conquer to combine weak learners into a strong one; its decision trees (DTs) repeatedly split the dataset based on an ideal separation criterion to construct a tree structure (23). Banu and Begum (24) combined RF with KNN imputation and predicted CKD with 99.75% accuracy. Debal and Sitote (25) also showed that RF with recursive feature elimination (RFE) feature selection can predict CKD with accuracies of 99.80% and 82.56% for binary and multiclass classification, respectively.

Support vector machine (SVM)

SVM is a widely used supervised ML method for classification that constructs an optimal hyperplane to split data points into groups and categorise new data (23). Swain et al. (23) highlighted that SVM with Chi-squared test feature selection can predict CKD with 99.33% accuracy, and Iftikhar et al. (26) developed SVM-Laplace for predicting CKD with 90.50% accuracy.

DT

DTs are extensively used for supervised classification, organising data into a flowchart in which internal nodes assess features, branches show outcomes, and leaf nodes contain class labels (27). da Silveira et al. (28) highlighted that a DT with manual augmentation and the synthetic minority over-sampling technique (SMOTE) predicts CKD with 92.66% accuracy, and Samatha et al. (29) proposed a DT model for predicting CKD with 98.75% accuracy.
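How an internal node "assesses" a feature can be illustrated with a split search. The sketch below scores candidate thresholds on one numeric feature by weighted Gini impurity; Gini is an assumption here, since the text does not name a splitting criterion, and the data and function names are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0 means a pure node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_threshold(values, labels):
    """Return (threshold, weighted impurity) of the best binary split."""
    pairs = sorted(zip(values, labels))
    best = (None, float("inf"))
    for i in range(1, len(pairs)):
        if pairs[i][0] == pairs[i - 1][0]:
            continue  # no threshold fits between identical values
        threshold = (pairs[i][0] + pairs[i - 1][0]) / 2
        left = [label for value, label in pairs if value <= threshold]
        right = [label for value, label in pairs if value > threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(pairs)
        if score < best[1]:
            best = (threshold, score)
    return best


# Hypothetical feature values and class labels, for illustration only.
values = [0.8, 1.0, 1.1, 2.4, 3.0, 5.2]
labels = ["notckd", "notckd", "notckd", "ckd", "ckd", "ckd"]
print(best_threshold(values, labels))
```

A full tree repeats this search over every feature and then recurses on the two resulting subsets until a stopping rule is met.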

KNN

The KNN classifier measures the distance between an unlabelled instance and all training instances and assigns it to the majority class among its k nearest neighbours (30). Arif et al. (31) proposed a KNN model with the Boruta feature selection approach that predicted CKD with 100% accuracy, and Salkar (32) combined KNN with principal component analysis (PCA) to predict CKD and CVD with the same accuracy of 100%.
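The distance-and-vote procedure can be sketched in a few lines. This is a minimal illustration assuming Euclidean distance and k = 3; the two features, their values, and the function name are hypothetical.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    distances = sorted(
        (math.dist(x, query), label) for x, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


# Hypothetical (creatinine-like, haemoglobin-like) pairs, for illustration only.
train_X = [(1.0, 15.0), (1.2, 14.0), (0.9, 13.5), (4.5, 9.0), (5.1, 8.5), (3.8, 10.0)]
train_y = ["notckd", "notckd", "notckd", "ckd", "ckd", "ckd"]

print(knn_predict(train_X, train_y, (4.0, 9.5)))
print(knn_predict(train_X, train_y, (1.1, 14.2)))
```

Because the vote depends entirely on distances, features on different scales are usually standardized first; the sketch skips that step for brevity.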

Extreme gradient boosting (XGBoost)

In XGBoost, DTs are created sequentially, with weights assigned to each variable. Variables misclassified by the first tree are given higher weights before being fed into the next tree. This procedure results in a robust model, suitable for regression, classification, ranking, and custom prediction tasks (19). Islam et al. (27) predicted CKD using XGBoost with an effective subset of features and achieved 98.30% accuracy, and Chiu et al. (33) developed an XGBoost model for the prediction of CKD with a 0.86 AUC score. Other traditional ML methods used for predicting CKD are listed in Table 1.

Table 1

Traditional ML studies for predicting CKD

References	Method	Proposed algorithm	Contribution	Limitations
(20)	LR	RF & LR	The results show that RF with KNN imputation outperformed LR in the early detection of CKD using the UCI CKD dataset with 99.75%	The study’s generalizability and ability to determine CKD severity is limited by the UCI CKD dataset’s small size (400 samples)
(34)	RF	XGBoost, AdaBoost, & RF	The result showed RF has been superior in classifying CKD with 98.80%	The study’s conclusions are based on a single dataset, which may limit the generalizability of the findings
(35)	LSVM	ANN, C5.0, logistic, CHAID, LSVM, KNN and random tree	The study highlighted the combination of SMOTE for balancing data and LASSO regression for feature selection with LSVM achieving the best accuracy of 98.86% in CKD discovery and prediction	The study relied on the UCI CKD dataset which may limit the findings’ applicability to other data or real-world clinical settings
(36)	RF	LR, KNN, GNB, SVM, SGD, DT, GB, RF, XGB, and LightGBM	RF outperformed LR, KNN, GNB, SVM, SGD, DT, GB, XGB, and LightGBM in predicting CKD in T1DM patients using 16 years of trial data	The EDIC patient dataset could limit the models’ implementation and generalizability to other demographics or therapeutic contexts
(37)	AdaBoost and LightGBM	AdaBoost, XGBoost, and LightGBM	The results show that LightGBM and AdaBoost outperform another boosting algorithm (XGBoost) with 99.75% each in predicting CKD	The need to validate the model on larger and more varied datasets
(37)	RF	KNN, LR, DT, RF, SVM, NB, MLP, and QDA	The result shows that LightGBM and AdaBoost achieved 99.75% accuracy in the prediction of CKD with RandomizedSearchCV for hyper parameter tuning	The study has proven theoretical directions but lacks practical application on local datasets, and no exploration of DL approaches has been implemented
(38)	J48 and DT	DT, random tree, KNN, SGB, J48, and NB	The study showed how J48 outperforms DT in classifying CKD with the integration of feature selection with 99.00%	Lack of generalizability of the study findings due to the focus on a specific dataset, the need for broader validation across different datasets, and adjustments to model parameters
(39)	RF	LR, NB, MLP, SGD, AdaBoost, Bagging, DT, RF	The study indicated that RF outperformed LR, NB, MLP, SGD, AdaBoost, Bagging, and DT in classifying CKD	The study relied on accuracy and ROC curves without deeper analysis of other performance metrics like sensitivity and specificity
(29)	DT and LR	KNN, LR, RF, SVM, and DT	DT and LR performed better than RF, SVM, and KNN with 98.75%	Further validation with larger and more diverse datasets
(40)	KNN	KNN, LR, SVC, DTC, and RF	The result has proven that KNN with the highest accuracy in predicting CKD compared to LR, SVC, DTC, and RF	Further validation with larger real-world datasets
(41)	C5.0	C5.0, DT and C4.0	C5.0 achieved 100% in identifying risk factors of CKD	Limited dataset size and specificity of algorithms used
(42)	XGBoost	XGBoost, LR, and SVM	The results show that XGBoost achieved the highest accuracy of 93.00% in understanding CKD by identifying cIMT	Limited dataset, further validation with larger datasets, and the lack of analysis on “black box” models like artificial neural networks
(1)	KNN	K-Means Clustering, DB-Scan, I-Forest, and Autoencoder	KNN proved to be the best-unsupervised algorithm with 99.00% accuracy with feature selection method in classifying both CKD and Non-CKD	The study’s applicability to detecting specific stages of CKD requires further exploration and validation
(43)	Extremely Randomized Trees	LR, DT, RF, Extremely Randomized Trees, GTB, and NN	The results indicated that Extremely Randomized Trees classifiers accurately predict the time frame for CKD patients needing dialysis, with an accuracy of 94.00%	The study needs validation across different healthcare settings
(44)	DT	NB, LR, NB, KNN, DT, RF and extra tree	Employed ML techniques for prediction and diagnosis of CKD, achieving a high accuracy of 99.36% with extra trees classifier and optimizing feature selection	Reliance on one source of dataset and limited dataset
(45)	J48	J48 and RF	This study utilized ML classification algorithms, specifically J48, to predict stages of CKD with an accuracy of 85.5%	The study utilized one and a small dataset for validation
(21)	LR	LR, KNN and DT	LR outperformed DT and KNN with 97.00% in predicting CKD using clinical data	The study relied on a single dataset with potential biases. The need to validate findings across diverse datasets
(46)	RF	RF	RF combined with Hot-Deck imputation and chi-square test feature selection achieved an accuracy of 99.24% in the early detection of CKD	The study relied on a specific dataset and feature selection approach and evaluation with other ML methods was not done
(32)	KNN	RF, SVM, NB, KNN, and LR	The study indicated that KNN with PCA achieved the highest accuracy with 100% on CKD and CVD	The study did not address potential overfitting concerns and the lack of larger and more diverse datasets
(22)	LR	SVM, LR, RF, NB	The results show that LR achieved an accuracy of 97.80% in predicting kidney failure	The study requires further validation and adaptation.
(47)	RF	KNN, DT and RF	This study demonstrates the effectiveness of RF in early diagnosis of CKD with 100% accuracy	The study’s findings are limited by the small dataset
(48)	NB and RF	KNN, SVM, LR, RF and NB	The results show that NB and RF achieved the best accuracy of 100% in predicting CKD at the early stage	The study relied on a single dataset with 400 patient records
(49)	AdaBoost	LR, DT, XGBoost, RF, SVM, AdaBoost	The study proposed a cost-sensitive AdaBoost classifier with the highest accuracy of 99.8%	The study needs further validation on larger, more diverse datasets
(50)	SVM	SVM, RF, KNN, DT, and logistic	SVM achieved an accuracy of 92.00% in predicting CKD with minimal feature sets	The model’s clinical applicability is limited without incorporating a broader set of features, and the dataset is small
(28)	DT	DT, RF, and multi-class AdaBoosted DTs	The study proposed DT with manual augmentation and SMOTE technique with 92.66% accuracy	The study needs to improve decision rule accuracy and resolve conflicting rules and the size of the dataset is small
(51)	RF	RF and LR	The study improves CKD prediction by using correlation-based feature selection and RF, achieving 98%	The findings are primarily based on a single dataset
(30)	Rotation forest	Rotation forest, SVM, LR, SGD, ANN and KNN	The result indicated that rotation forest is highly effective in predicting CKD with an accuracy of 99.20%	The approach’s current validation is limited by the dataset’s size
(52)	IBK and random tree	IBK, random tree, random forest, MLP, J48, Hoeffding tree, REP Tree, AdaBoostM1	The results show that random tree and IBK achieved the highest accuracy of 93.66% in predicting DKD	The findings are limited by the dataset’s scope (410 instances), and further development is needed to predict DKD progression
(53)	RF and DT	DT and RF	The study proposed an explainable ML model for CKD prediction using an RF and DT algorithm, enhanced by feature selection and explainability methods (LIME, SHAP, SKATER)	The study is limited by the small dataset size and the focus on only the RF and DT
(54)	RF	CDC CKD risk score, XGBoost and RF	The study indicated that RF can detect DKD stages with AUC scores of 0.77 and 0.83 for stages 3 to 5	The study’s conclusions are based on retrospective data
(33)	XGBoost	LR, C5.0, SGB, XGBoost, and MARS	The study identified blood urea nitrogen, uric acid, and socioeconomic status as risk factors for CKD and showed that XGBoost outperformed LR, SGB, MARS, and C5.0 with a 0.86 AUC score	The findings are limited using a single dataset and cross-sectional data
(19)	RF and LR	RF, SVM, NB, LR, KNN, XGBoost, DT, and AdaBoost	The proposed model shows that RF and LR achieved the same accuracy of 100%	The models require further training and testing on diverse datasets
(25)	RF	RF, SVM and DT	RF with RFE indicated an accuracy of 99.80% and 82.56% for both binary and multiclass respectively	The study excluded the DL method in their study
(27)	XGBoost	Adaboost, RF, NB, KNN, LGBM, XGBoost, CatBoost, extra tree, SVM, DT, ANN, SGB, and GB	XGBoost classifier achieved an accuracy of 98.30% with an effective subset of features	The study relied on supervised learning
(26)	SVM	Logistics, Probit, DT, KNN, SVM-RB, SVM-L, SVM-LAP, SVM-B, and RF	SVM-Lap performed better than Logistics, Probit, DT, KNN, SVM-RB, SVM-L, SVM-B, and RF with 90.50% accuracy	The study is limited to a specific dataset from a single region
(55)	LR	NB, DT and LR	The proposed model shows better performance in prognosing and diagnosing CKD early with 80.20% accuracy	The study may require validation in real-world clinical settings
(12)	LightGBM	XGBoost, Random Forest, ExtraTrees Classifier, Naïve Bayes, Logistic Regression, SVC, AdaBoost, LightGBM and KNN	The study shows the superiority of LightGBM in predicting CKD with 99.00% accuracy	The study utilized limited biases and there is a need to validate the model across diverse populations
(24)	RF	LR, NB, Feed forward neural network, KNN, RF, and LOG	The result indicated that RF with KNN imputation outperformed LR, NB, Feed forward neural network, KNN, and LOG with 99.75% accuracy	The study is constrained by a small dataset of 400 samples, and the model’s ability to assess CKD severity due to the binary nature of the dataset (CKD and not CKD)
(31)	KNN	KNN, NB, and GNB	The study shows that KNN with the Boruta feature selection approach outperformed NB and GNB in predicting CKD with 100% accuracy	The study relied on a single CKD dataset, with substantial missing values
(23)	SVM	SVM and RF	The results show that SVM outperformed RF with feature selection via the Chi-squared test in predicting CKD with 99.33% accuracy	The model’s current scope is limited to predicting the presence or absence of CKD, and the need for more sophisticated feature selection techniques

ANN, artificial neural network; AUC, area under the curve; CDC, Centers for Disease Control and Prevention; CHAID, Chi-squared automatic interaction detector; cIMT, carotid intima-media thickness; CKD, chronic kidney disease; CVD, cardiovascular disease; DB-Scan, density-based spatial clustering of applications with noise; DL, deep learning; DT, decision tree; DTC, decision tree classifier; EDIC, epidemiology of diabetes interventions and complications; GB, gradient boosting; GBM, gradient boosting machine; GNB, Gaussian Naive Bayes; GTB, gradient tree boosting; IBK, instance-based K-nearest neighbour; KNN, K-nearest neighbour; LASSO, least absolute shrinkage and selection operator; LIME, local interpretable model-agnostic explanation; LOG, logistic; LR, logistic regression; LSVM, least squares support vector machine; MARS, multivariate adaptive regression spline; ML, machine learning; MLP, multilayer perceptron; NB, Naive Bayes; NN, neural network; QDA, quadratic discriminant analysis; RF, random forest; RFE, recursive feature elimination; ROC, receiver operating characteristic; SGB, stochastic gradient boosting; SGD, stochastic gradient descent; SHAP, SHapley Additive exPlanations; SKATER, Skater model interpretation framework; SMOTE, synthetic minority over-sampling technique; SVC, support vector classification; SVM, support vector machine; SVM-B, support vector machine-bessel kernel; SVM-L, support vector machine-linear kernel; SVM-LAP, support vector machine-laplacian kernel; SVM-RB, support vector machine-radial basis kernel; T1DM, type 1 diabetes mellitus; UCI, University of California, Irvine; XGB/XGBoost, extreme gradient boosting.

DL models in predicting CKD

Artificial neural networks (ANNs) are computing systems modelled on the biological neural networks (NNs) of the human brain, and DL builds on them using supervised, semi-supervised, or unsupervised representation learning. What distinguishes DL from other ML techniques is representation learning, commonly referred to as feature learning. With feature learning, a model automatically identifies the representations needed for classification from raw data, rather than relying on manual feature engineering. As a result, DL requires little manual tuning and can take advantage of growing volumes of data and computation. Manual engineering is still necessary for some choices, such as the number and size of layers (56).

Multilayer perceptron (MLP)

The MLP is an essential and commonly utilised ANN architecture that has contributed significantly to the fields of ML and DL. The MLP is made up of three types of layers: an input layer (IL), one or more hidden layers (HLs) (n≥1), and an output layer (OL). The neurons in adjacent layers are interconnected, and the connections are weighted to reflect the relationships between inputs and outputs. One of the MLP’s features is its ability to model both linear and non-linear relationships within data, making it useful in tasks such as classification, regression, and feature extraction. The HLs introduce nonlinearity, allowing MLPs to capture complex data patterns that simpler models cannot. However, training an MLP involves addressing problems such as overfitting, vanishing gradients, and hyperparameter adjustment (57). Studies have shown that MLP can predict CKD: Khadka et al. (58) proposed an MLP model with an explainable AI (XAI) interface that predicted CKD with 100% accuracy, and Ravindra et al. (59) showed that MLP can predict CKD on the University of California, Irvine (UCI) and local datasets with 93.22% and 92.78% accuracy, respectively.
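The role of the hidden layer can be illustrated with a minimal sketch. It assumes a 2-2-1 MLP whose weights are set by hand rather than learned (the names `W_HIDDEN`, `B_HIDDEN`, `W_OUT`, `B_OUT`, and `mlp_forward` are illustrative and not taken from any reviewed study). The network computes XOR, a relationship that no model without a non-linear hidden layer can represent.

```python
# Hand-set (not learned) weights for a 2-2-1 MLP that computes XOR.
W_HIDDEN = [[1.0, 1.0], [1.0, 1.0]]  # one row of input weights per hidden neuron
B_HIDDEN = [0.0, -1.0]
W_OUT = [[1.0, -2.0]]                # one row of hidden weights per output neuron
B_OUT = [0.0]


def relu(z):
    """Non-linear activation applied in the hidden layer."""
    return max(0.0, z)


def identity(z):
    """Linear activation for the output layer."""
    return z


def dense(inputs, weights, biases, activation):
    """One fully connected layer: weighted sum plus bias, then activation."""
    return [
        activation(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


def mlp_forward(x):
    """Forward pass: input layer -> hidden layer (ReLU) -> output layer."""
    hidden = dense(x, W_HIDDEN, B_HIDDEN, relu)
    return dense(hidden, W_OUT, B_OUT, identity)[0]


if __name__ == "__main__":
    for point in ([0, 0], [0, 1], [1, 0], [1, 1]):
        print(point, mlp_forward(point))
```

In practice the weights are learned by backpropagation rather than fixed by hand, which is where the overfitting and vanishing-gradient issues mentioned above arise.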

Convolutional neural network (CNN)

CNNs are DL algorithms that assign learnable weights and biases to features of the input and use them to distinguish one input from another. The primary justification for using CNN is that it requires little data preprocessing compared with other algorithms, since a CNN learns to optimise its filters automatically (60). Among the studies that applied CNN to CKD prediction, Mondol et al. (61) developed an optimized CNN that achieved an AUC score of 0.99, and Alanazi (60) proposed a CNN that outperformed Naive Bayes (NB), DT, and LR with 95% accuracy.
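The filter operation at the core of a CNN can be sketched in a few lines. This is a minimal one-dimensional illustration under stated assumptions: the kernel values are fixed by hand (a real CNN learns them during training), the input sequence is made up, and the function names `conv1d`, `relu`, and `max_pool` are illustrative only.

```python
def conv1d(signal, kernel):
    """Slide the kernel over the signal and take a dot product at each position
    (valid mode, no padding; cross-correlation, as in most DL libraries)."""
    k = len(kernel)
    return [
        sum(kernel[j] * signal[i + j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]


def relu(values):
    """Keep positive responses and zero out negative ones."""
    return [max(0, v) for v in values]


def max_pool(values, size):
    """Non-overlapping max pooling: keep the strongest response in each window."""
    return [max(values[i:i + size]) for i in range(0, len(values) - size + 1, size)]


if __name__ == "__main__":
    signal = [1, 1, 2, 5, 6, 6, 2, 1]  # hypothetical measurement sequence
    kernel = [-1, 0, 1]                # hand-set filter that responds to increases
    feature_map = conv1d(signal, kernel)
    print(feature_map)
    print(max_pool(relu(feature_map), 2))
```

The filter responds most strongly where the sequence rises, which is the sense in which a CNN extracts features without manual preprocessing.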

Deep neural network (DNN)

DNNs are ANNs with numerous layers between the input features and the output predictions. The linear layers are connected by non-linear activation functions, which allows the network to learn non-linear correlations between the input features (62). Singh et al. (63) showed that a DNN using features selected by recursive feature elimination (RFE) outperformed SVM, KNN, LR, RF, and NB in predicting CKD, with 100% accuracy. DNN has also been superior in predicting the progression of CKD to end-stage renal disease (ESRD) (62).

Other DL models

Apart from the popular DL models, other DL models have been proposed to predict CKD. A Hybrid-Recurrent Neural Network (HRNN) with an embedded feature selection method predicted CKD with an F1 score of 0.67 (64). Gokiladevi and Santhoshkumar (65) proposed an attention-based gated recurrent unit (AGRU) that uses the Henry Gas optimization algorithm (HGOA) to select the best features and the Slime Mould Algorithm to tune the AGRU hyperparameters; the proposed model achieved 99.56% accuracy with a computational time of less than 1 minute. Singh et al. (66) proposed an ANN model with RF and wrapper feature selection, which outperformed traditional ML models with 99.04% accuracy. Many studies have also proposed hybrid DL models. Venkatrao and Shaik (67) introduced the hybrid DL network model (HDLNet), which comprises a Capsule Network (CapsNet) for feature extraction, the Aquila Optimization Algorithm (AO) for feature selection, and a Deep Separable Convolution Neural Network (DSCNN) for classification of CKD; it outperformed existing models with 99.18% accuracy. Subashini and Venkatesh (68) showed that an MLP model using least absolute shrinkage and selection operator (LASSO) and Relief feature selection, with SMOTE followed by edited nearest neighbours (SMOTE-ENN) to handle imbalanced data, outperformed GBM and LSTM in predicting CKD with 100% accuracy. Busi et al. (69) proposed a hybrid model combining CapsNet feature extraction and improved spotted hyena optimizer (ISHO) feature selection with Bi-Directional Convolutional Long-Short Term Memory (Bi-C-LSTM) and Darknet CNN (DNetCNN), which predicted CKD with an accuracy of 99.89%. A summary of other DL models for predicting CKD is given in Table 2.
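The studies above report a mix of accuracy and F1 scores, and the two are not interchangeable when classes are imbalanced. The following sketch, using made-up labels (`Y_TRUE` and `Y_PRED` are hypothetical and do not come from any reviewed dataset), shows how both metrics are derived from the same confusion counts and how they can diverge.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn) for a binary classification problem."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive:
            fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall, and F1 computed from the confusion counts."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical imbalanced sample: 3 positive (CKD) and 7 negative cases.
Y_TRUE = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
Y_PRED = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]

if __name__ == "__main__":
    for name, value in classification_metrics(Y_TRUE, Y_PRED).items():
        print(f"{name}: {value:.2f}")
```

With these labels the accuracy is 0.80 while the F1 score is about 0.67, so results reported with different metrics should be compared with care.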

Table 2

Deep learning studies for predicting CKD

References	Method	Models compared	Contribution	Limitation
(70)	DNN	DNN	The results show that DNN achieved 99.00% accuracy in detecting CKD	No algorithm evaluation was done and the need for diverse dataset is necessary
(71)	DBN	RBM and DBN	The proposed model with softmax activation function achieved 98.50% accuracy in predicting CKD	The study requires further validation and comparison with more existing models
(72)	ANN	LR, ANN, and SVM	The proposed result indicated better performance than other traditional ML with 88.48%	The study relied on a chosen imputation interval and a lack of systematic exploration for optimal intervals
(73)	CNN	CNN, BLSTM, LightGBM, logistic, RF, and DT	The proposed study achieved AUC scores of 0.957 and 0.954 for predicting CKD occurrence within 6 or 12 months respectively	The model’s feature set needs expansion to include laboratory measurements and lifestyle information
(74)	NN-LASSO model	Pearson correlation-NN, Chi-squared-NN, extra tree-NN, and LASSO regularization-NN	The results show that LASSO regularization outperformed Pearson correlation, Chi-squared, and extra tree feature selection in predicting CKD with NN, with an accuracy of 99.98%	Incorporation of hybrid feature extraction and selection techniques
(75)	MLP	SVM and NB	MLP outperformed SVM and NB with 92.5% accuracy in predicting CKD	Creation of an application for detecting CKD and the use of multiple features for detecting CKD
(64)	HRNN	HRNN	The results show that HRNN with embedded feature selection achieved an F1 score of 0.67 in predicting CKD	Further validation is needed to handle missing and noisy data across diverse datasets
(76)	FPA-DNN	D-ACO, DT, PSO, LR, fuzzy neural classifier, MLP, ACO, XGBoost, OlexGA and FPA-DNN	FPA-DNN was proposed to diagnose CKD with 98.75% accuracy	Incorporation of clustering and outlier removal techniques
(77)	DNN	DNN, and RF	DNN outperformed RF for classifying CKD based on binary classification with 98.80% accuracy	The need for a larger dataset and to uncover additional factors related to CKD susceptibility
(62)	DNN	DNN, LASSO, ridge, SVM-RBF, SVM-linear, RF, and XGBoost	DNN achieved a better AUC score of 0.8843 over LASSO, Ridge, SVM-RBF, SVM-linear, RF, and XGBoost in predicting the progression of ESRD	Utilization of a single-center design and retrospective nature
(59)	MLP	MLP	MLP achieved an accuracy of 93.22% and 92.78% in predicting CKD on the UCI and local datasets, respectively	Further validation across more diverse datasets
(60)	CNN	NB, DT, LR, and CNN	CNN outperformed NB, DT, and LR in predicting symptoms of CKD	The study relied on symptom data and living habits, and requires extensive and diverse datasets
(63)	DNN	DNN, SVM, KNN, LR, RF, and NB	The proposed model achieved 100% accuracy using key features by RFE	The model’s reliability is limited by its testing on a small dataset
(61)	CNN	OCNN, OANN, and OLSTM	The proposed optimized model outperformed CNN, ANN, and LSTM in CKD classification with OCNN achieving the highest AUC score of 0.99	The study faces overfitting due to complex model architectures
(78)	MLP	MLP, DT, RF, GB, SGD, ADAB, KNN, SVM, LR, ELM, PAC, and NB	The proposed model achieved an accuracy of 96.48% in forecasting CKD data	Limited benchmark dataset and lack of expert validation
(79)	ANN	ANN, RF, DT, SVM, and LR	ANN outperformed RF, DT, LR, and SVM with 100% accuracy in diagnosing CKD	The study’s claim of perfect accuracy may indicate overfitting
(80)	ANN	SVM, KNN, and RF	ANN outperformed SVM, RF, and KNN in diagnosing kidney disease	The model’s performance is limited by the small dataset size and missing attribute value
(68)	MLP	FNN/MLP, GBM, and LSTM	The study demonstrated that combining LASSO and Relief for feature selection with SMOTEENN for class balancing and employing MLP achieved 100% accuracy on an imbalanced medical dataset	Exploration of advanced DL architectures and additional feature selection techniques to further improve and validate model performance on diverse medical datasets
(57)	MLP	MLP	The study proposed a hybrid model comprising PCA and MLP for diagnosing CKD with 98.34% and 98.54% accuracy for training and testing data respectively	Further enhancement with advanced models and optimization techniques is needed
(81)	CNN	RF, SVM, DT, and KNN	The proposed 1D-CNN model achieved 98.33% recall and F1 score for CKD prediction, outperforming RF, SVM, DT, and KNN models	Further exploration of other deep learning architectures and larger datasets for enhanced robustness
(82)	GNN	LR, SVM, RF, DT, KNN and NB	Proposed a GNN and Tabular model for predicting CKD progression, achieving 95.10% accuracy, compared to LR, SVM, RF, DT, KNN, and NB	Focus on refining the fusion model, expanding the dataset, and assessing its generalizability across different populations
(58)	MLP	MLP, LR, and DT	The proposed model combined with the XAI interface (LIME and SHAP) outperformed LR and DT in predicting the occurrence of kidney disease with 100% accuracy	The study was conducted using a single-source dataset and without evaluating other DL models
(69)	Bi-C-LSTM and DNetCNN	SVM, DT, DBN, D-ACO, and LR-NN	The proposed hybrid deep learning approach, combining Improved CapsNet, ISHO, BConvLSTM, and DNetCNN, achieved a classification accuracy of 99.89% for CKD with reduced computation time	A complex model was developed using small-size data and the exclusion of external validation
(67)	DSCNN	MLP, RBF, SVM, and PNN	Proposed a hybrid model comprising DSCNN optimized by the AO and STOA algorithms, which achieved 99.18% accuracy for CKD classification, outperforming existing methods	A single-source dataset with limited records was used for training and testing; validation across different areas and on a larger dataset is needed
(65)	CKDD-HGSODL	NN-GA, EP-CKD, CKD-DM, EAML-CKD, SVM, BCA-KELM and CBHFSC-CKD	The study proposed an automated CKD detection method using CKDD-HGSODL with an accuracy of 99.56% and a computational time of less than 1 minute	The approach requires extensive validation on larger datasets and faces challenges in result interpretability due to the complexity of the hybrid model
(66)	ANN	KNN, RT, SVM, and GNB	The study shows the potential of applying machine learning and deep learning approaches, such as BRF-MOANN with 99.04% accuracy	Validation with larger datasets is needed to confirm findings and ensure robustness, particularly regarding the influence of family history and other risk factors

ACO, ant colony optimization; ADAB, adaptive boosting; ANN, artificial neural network; AO, aquila optimization algorithm; AUC, area under the curve; BCA, bacterial colony algorithm; Bi-C-LSTM, bidirectional coupled long short-term memory; BLSTM, bidirectional long short-term memory; BRF-MOANN, boosted random forest-multi-objective artificial neural network; CBHFSC, chaotic binary black hole-based feature selection with classification; CKD, chronic kidney disease; CKDD, chronic kidney disease dataset; CNN, convolutional neural network; D-ACO, dynamic ant colony optimization; DBN, deep belief network; DM, data mining; DNetCNN, deep network convolutional neural network; DNN, deep neural network; DSCNN, deep separable convolutional neural network; DT, decision tree; EAML, explainable automated machine learning; ELM, extreme learning machine; ESRD, end-stage renal disease; FNN, feedforward neural network; FPA-DNN, flower pollination algorithm-deep neural network; GA, genetic algorithm; GB, gradient boosting; GBM, gradient boosting machine; GNB, Gaussian Naive Bayes; GNN, Graph Neural Network; HGSODL, hierarchical graph-based semi-supervised online deep learning; HRNN, Hierarchical Recurrent Neural Network; KELM, kernel extreme learning machine; KNN, K-nearest neighbour; LASSO, least absolute shrinkage and selection operator; LightGBM, Light Gradient Boosting Machine; LIME, local interpretable model-agnostic explanation; LR, logistic regression; LSTM, long short-term memory; ML, machine learning; MLP, multilayer perceptron; NB, Naive Bayes; NN, neural network; OANN, online artificial neural network; OCNN, optimal convolutional neural network; OLSTM, optimized long short-term memory; PAC, passive aggressive classifier; PCA, principal component analysis; PNN, probabilistic neural network; PSO, particle swarm optimization; RBF, radial basis function; RBM, restricted Boltzmann machine; RF, random forest; RFE, recursive feature elimination; RT, regression tree; SGD, stochastic gradient descent; SHAP, SHapley Additive exPlanations; SMOTEENN, synthetic minority over-sampling technique-edited nearest neighbour; STOA, sooty tern optimization algorithm; SVM, support vector machine; UCI, University of California, Irvine; XGBoost, extreme gradient boosting.

Ensemble learning models for CKD prediction

Ensemble learning combines multiple models into a single larger model that is more powerful than its constituents (83). Combining many classifiers in this way helps to handle complicated problems and boost model performance, and in tree-based ensembles an appropriate number of trees reduces overfitting and increases precision (84). Alsuhibany et al. (85) developed an ensemble of DL-based clinical decision support systems (EDL-CDSS) for CKD diagnosis that uses the Adaptive Synthetic (ADASYN) technique for the outlier detection process and combines three models, a deep belief network (DBN), a kernel extreme learning machine (KELM), and a CNN with gated recurrent unit (CNN-GRU), achieving 98.00% accuracy. Baidya et al. (86) developed an ensemble of KNN and extra tree that predicted CKD with 99.00%. Srivastava et al. (87) proposed a ranking weighted ensemble of SVM, RF, KNN, DT, and AdaBoost that predicted CKD with 98.75% accuracy. Abdel-Fattah et al. (88) combined DT, SVM, and gradient boosting (GB) tree into an ensemble model that predicted CKD from features selected by Relief-F and achieved 100% accuracy. Awad Yousif et al. (89) proposed an ensemble of long short-term memory (LSTM), bidirectional gated recurrent unit (BiGRU), and bidirectional LSTM (BiLSTM) that uses the Eurygasters Optimization Algorithm to select the best features and the Shuffled Frog Leap Algorithm for hyperparameter tuning; the proposed model achieved 98.12% accuracy. Saif et al. (90) compared optimizers for CNN, LSTM, and BLSTM and identified CNN_Adamax, LSTM_Adam, and LSTM_BLSTM_Adamax as the best models; their ensemble predicted CKD with 98.00% and 97.00% accuracy on 6- and 12-month patient records, respectively. Other ensemble studies are summarised in Table 3.
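The simplest way to combine classifiers is by voting, which can be sketched as follows. This is an illustration only: the per-model predictions and weights are made up, the function `weighted_vote` is hypothetical, and it is not the ranking weighted scheme of any study cited above. With equal weights it reduces to majority voting; unequal weights let a stronger model dominate.

```python
def weighted_vote(predictions, weights=None):
    """Combine per-model label predictions sample by sample.

    predictions: dict mapping model name -> list of predicted labels.
    weights: dict mapping model name -> vote weight (defaults to 1.0 each,
             which gives plain majority voting).
    """
    names = list(predictions)
    if weights is None:
        weights = {name: 1.0 for name in names}
    n_samples = len(predictions[names[0]])
    combined = []
    for i in range(n_samples):
        scores = {}
        for name in names:
            label = predictions[name][i]
            scores[label] = scores.get(label, 0.0) + weights[name]
        # The label with the highest total weight wins.
        combined.append(max(scores, key=scores.get))
    return combined


# Hypothetical predictions of three base classifiers on four patients (1 = CKD).
PREDICTIONS = {
    "svm": [1, 0, 1, 1],
    "rf":  [1, 1, 0, 1],
    "knn": [0, 0, 1, 1],
}

if __name__ == "__main__":
    print(weighted_vote(PREDICTIONS))
    print(weighted_vote(PREDICTIONS, {"svm": 0.2, "rf": 0.6, "knn": 0.2}))
```

Using an odd number of base models for a binary problem avoids ties under majority voting; in practice the weights would be derived from validation performance rather than set by hand.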

Table 3

Ensemble learning studies in predicting CKD

References	Method	Models compared	Contribution	Limitation
(85)	EDL-CDSS	EDL-CDSS, DBN model, CNN-GRU model, KELM model, FNC model, D-ACO MLP classifier, decision tree, ACO	The results show that the proposed model achieved 98.00% in the prediction of CKD	The study did not provide a provision for deploying the EDL-CDSS technique in real-world IoT environments
(86)	KNN & EXT	KNN, GNB, GB, EXT, XGB, ADB, RF and DT	The results show that the proposed model achieved 99.00% in the prediction of CKD	The study utilized a small dataset from limited sources
(87)	Ranking weighted ensemble	SVM, KNN, DT, RF, and AdaBoost	The result showed a proposed model that predicts CKD with 98.75% accuracy	The study utilized a single small dataset for validation
(88)	SVM, DT, and GBT classifier	DT, RF, LR, NB, SVM and GBT	The proposed model with Relief-F and Chi-squared classified CKD on Apache Spark and achieved 100% accuracy	The study utilized one small dataset for validation and incurred computational overhead on Apache Spark
(91)	CNN-LSTM hybrid model	KNN, LR, NB, AdaBoost, and RF	The CNN-LSTM hybrid model achieved a classification accuracy of 99.17% for diagnosing CKD, surpassing other machine-learning methods	The study’s dataset was small and not collected from multiple centres
(92)	Ensemble model integrates pretrained DL models with the SVM	Proposed Model with Chi2, RFE, mutual_info, and Tree_base	The proposed model with SVM metalearner outperformed Chi², RFE, mutual_info, and Tree_base with 99.60% accuracy	The study relied on a limited dataset and computational complexity evaluation was not conducted
(93)	Decision tree, Gaussian Naïve Bayes, gradient boosting, and random forest with stacking classifier	GB, RF, GNB, and DT	The proposed model combined with the Pearson correlation feature selection technique achieved 100% accuracy	The study’s performance evaluation is limited to the small dataset, and there is possible overfitting due to the dataset
(94)	ANN with bagging and voting	ANN + voting classifier, ANN + bagging classifier and ANN + bagging-voting	ANN with bagging and voting outperformed other approaches with 99.17%	The study utilized a single small dataset for validation
(95)	LR, DT, SVM with bagging	DT, RF, and KNN	The proposed model achieved 97.00% in predicting chronic renal disease prognosis	Potential overfitting or lack of generalizability due to reliance on a single dataset and absence of external validation
(96)	RF, KNN, and gradient boosting	RF, KNN, and gradient boosting	The results show that the proposed model achieved 97.80% accuracy in assessing the severity stages of CKD	The generalizability of the findings beyond the dataset is missing
(90)	CNN_Adamax, LSTM_Adam, and LSTM_BLSTM_Adamax models	DT, RF, logistic, LightGBM, LSTM-BLSTM_Adamax, LSTM_Adam, and CNN_Adamax	The study proposed an ensemble model combining the CNN_Adamax, LSTM_Adam, and LSTM_BLSTM_Adamax models, which outperformed all other models, obtaining an accuracy score of 98% for the six-month dataset and 97% for the 12-month dataset	Testing the models’ robustness against distinct datasets requires collaboration with healthcare providers
(89)	EOAEDL-CKDD	LR, NB, SVM, simple RNN, DNN, KNN, and ANN	The study proposed the EOAEDL-CKDD method for CKD detection, which employs the Eurygasters Optimisation Algorithm for feature selection and an ensemble of LSTM, BiGRU, and BiLSTM models with an improved classification accuracy of 98.12%	The technique’s reliance on sequential models limits its scalability to larger datasets

CKD, chronic kidney disease; ACO, ant colony optimization; ADB, adaptive boosting; ANN, artificial neural network; BLSTM, bidirectional long short-term memory; CDSS, clinical decision support system; CKDD, chronic kidney disease dataset; CNN, convolutional neural network; D-ACO, dynamic ant colony optimization; DBN, deep belief network; DL, deep learning; DNN, deep neural network; DT, decision tree; EDL, ensemble of deep learning; EDL-CDSS, ensemble of deep learning based clinical decision support systems; EOAEDL, Eurygasters optimization algorithm with ensemble deep learning; EXT, extra tree; FNC, fuzzy neural classifier; GB, gradient boosting; GBT, gradient boosting tree; GNB, Gaussian Naive Bayes; GRU, gated recurrent unit; IoT, internet of things; KELM, kernel extreme learning machine; KNN, K-nearest neighbour; LR, logistic regression; LSTM, long short-term memory; MLP, multilayer perceptron; NB, Naive Bayes; RF, random forest; RFE, recursive feature elimination; RNN, recurrent neural network; SVM, support vector machine; XGB, extreme gradient boosting.

Kidney disease dataset

ML can only act intelligently when it learns from a sufficient dataset. Studies on predicting CKD and its progression have used various datasets from various places, and this large volume of data includes laboratory tests, medical records, and diagnostic images. The CKD dataset from the UCI Machine Learning Repository (97) is widely utilized. It is an online dataset originating from India that was collected between April 2018 and June 2018 and covers individuals aged 9 to 90 years. This dataset consists of 400 records with 25 attributes. A CKD dataset is also available from the University Hospital Prof. Alberto Antunes, UFAL, Brazil, based on the study conducted by Sobrinho et al. (98). This dataset aims to predict CKD and was collected between 2015 and 2016. It comprises 60 records: 16 patients without kidney damage, 14 patients diagnosed with CKD, and 30 patients diagnosed with CKD, hypertension, and/or diabetes. The age range of the patients was 31 to 79 years. David et al. (52) utilized a dataset related to diabetic kidney disease with 410 instances and 18 attributes. You (42) used ML methods to predict CKD on data collected from the Affiliated Hospital of Nanjing University Medicine between 2016 and 2017; this dataset consists of 224 instances, of which 69 records are on CKD. Salkar (32) employed a dataset from the Mayo Clinic Hospital in the USA to examine the relationship between CKD and heart disease. This CKD dataset comprises 500 instances and includes 24 input attributes along with 1 output attribute. The CKD dataset analysed by Mijwil and Alsaadi (22) included a total of 316 patients, comprising 189 males and 127 females. Chiu et al. (33) utilized the MJ Health-Check-Up-Based Population Database (MJPD) to predict CKD with ML. This dataset was obtained from the MJ Group Screening Centre in Taiwan, covers the period from 2010 to 2015, and comprises a total of 65,394 records gathered from four MJ clinics. Debal and Sitote (25) utilized a dataset collected at St. Paulo’s Hospital from 2018 to 2019 from patients admitted to the renal ward. Some instances within the dataset correspond to the same patients at different times and stages of their condition. The dataset comprises a total of 1,718 records and 19 input variables, consisting of 7 nominal and 12 numeric values. Iftikhar et al. (26) developed ML techniques to predict CKD using a dataset obtained from the Medical Complex in Buner, Khyber Pakhtunkhwa, Pakistan. The dataset covers the timeframe from 2020 to 2021 and consists of 500 records, of which 270 are related to CKD and 230 pertain to patients without CKD. This dataset comprises 20 input attributes and 1 target attribute. The different CKD datasets are summarized in Table 4.

Table 4

A summary of the CKD datasets used in studies

Reference	Source	Type of data	Attributes
(97)	University of California, Irvine (UCI) Machine Learning Repository	CKD	25
(98)	University Hospital Prof. Alberto Antunes at UFAL, Brazil	CKD	8
(52)	Saudi Diabetic Kidney Disease	Diabetic kidney disease	18
(42)	Affiliated Hospital of Nanjing University	CKD	23
(32)	Mayo Clinic Hospital in the USA	CKD	25
(25)	St. Paulo’s Hospital	CKD	19
(26)	Medical Complex in Buner, Khyber Pakhtunkhwa, Pakistan	CKD	20
(99)	UDDANAM area in Srikakulam District of Andhra Pradesh, India	CKD	37
(100)	Erbil Teaching Hospital	CKD	12
(101)	Tawan Hospital Outpatient Clinics in Al Ain	CKD	23
(102)	Gashua General Hospital in Yobe State	CKD	11
(73)	Taiwan’s National Health Insurance Research Database (NHIRD)	CKD	20
(43)	Vimcrete Hospital EMR	CKD	27
(103)	Central Laboratory of Islam Abas-e-Qarb City in Kermanshah, Iran	CKD	13

CKD, chronic kidney disease.

Discussion

Study interpretation

As shown in Figure 3, studies on predicting CKD using various ML approaches were published from 2021 to 2024, with 2021 having the most publications; however, as of 2024, no studies predicting CKD with traditional ML had been documented. Figure 4 shows that DL accounts for the largest share of the 86 studies (24%), followed by RF (17%). Although RF is the most studied traditional ML model, no study showed it outperforming a DL model, and in most of the studies that compared a proposed DL model with traditional ML models, DL performed better. Figure 5 shows that CML had the most articles in 2021, after which the number dropped. DL, which peaked in 2023, has led to considerable gains in CKD studies, demonstrating its importance. While no CML studies were conducted in 2024, some studies using DL and ensemble ML (ES) were.

Figure 3 Publication trends of CKD predicting using different learning models. CKD, chronic kidney disease.

Figure 4 The percentage of ML models for predicting CKD. ML, machine learning; CKD, chronic kidney disease; DT, decision tree; LR, logistic regression; RF, random forest; SVM, support vector machine; DL, deep learning; ES, ensemble machine learning.

Figure 5 Different learning trends for CKD prediction. TML, traditional machine learning; DL, deep learning; ES, ensemble machine learning; CKD, chronic kidney disease.

AI implications in clinical domains

AI is now transforming the clinical domain, particularly CKD prediction, where accurate and early prediction is important for effective intervention. ML approaches such as CML, DL, and ES have been used to analyse large datasets, detect patterns, and predict the occurrence of CKD with high accuracy. ML-driven models support nephrologists by automating the prediction of CKD, reducing prediction errors, and enabling more personalized treatment. Furthermore, DL approaches can handle data directly while reducing the feature engineering stage, which not only enhances prediction accuracy but also accelerates the decision-making process in clinical settings. However, despite the evolution of AI in healthcare, it still faces significant limitations in model generalisation, computational complexity, and interpretability, all of which are relevant to the prediction of CKD with ML approaches.

Challenges of ML approaches in CKD prediction

Despite the progress of different ML approaches in CKD prediction, several challenges remain:

Data quality and preprocessing: the CKD dataset that is mostly used by the reviewed works contains 158 missing cells out of 400 cases, which reduces model generalization and efficiency. Many studies overlook data preprocessing and focus on model development without considering the effects of data quality on the predictive model.
Limitation of binary classification of CKD: most reviewed studies focused on binary classification, distinguishing CKD from not chronic kidney disease (NCKD). However, multiclass classification that determines CKD progression stages can help provide the necessary interventions and treatment.
Feature selection limitation: most feature selection techniques identify the best features based on linear relationships. However, medical datasets often exhibit nonlinear relationships between features and the outcome, so techniques that consider only linear relationships can overlook significant features that affect how CKD is predicted.
Predictive model limitations: CML models struggle with complex and high-dimensional data because they lack complex architectures, and they require feature engineering to capture complex patterns effectively. DL models also have limitations: they need high computational resources, they are difficult to interpret because they operate as black boxes, and they require large amounts of data, whereas many datasets available online have limited records. Finally, ensemble learning is limited by model complexity, since it combines multiple models for prediction and therefore requires more computational and memory resources.
Impact of small datasets on predictive models: a small dataset significantly reduces the generalization and performance of a predictive model, especially in complex domains like healthcare, where a large amount of data is required to provide sensitive and accurate predictions. Additionally, a small dataset can cause a model to learn patterns in the training data that fail to generalize to new data, which is known as overfitting and leads to high accuracy on training datasets but poor performance on unseen datasets.
Interpretability and explainability of predictive models in the medical sector: predictive models in healthcare such as DL and ES operate as black boxes, making it difficult to explain how their predictions are generated. This lack of transparency raises concern for nephrologists, who need to understand and trust the model.
Consequently, there is a pressing need for the development of more complex and robust models, such as DL approaches, to enhance CKD prediction accuracy and reliability. Because of the complexity of DL, few studies have addressed reducing its computational cost to make it effective in real-time applications (89).
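The data quality challenge listed above can be made concrete with a small preprocessing sketch. It assumes that missing cells are marked with a "?" placeholder and uses made-up records with illustrative column names (`age`, `hemo`); the helper functions `count_missing`, `complete_cases`, and `impute_median` are hypothetical and not taken from any reviewed study.

```python
from statistics import median

MISSING = "?"  # assumed placeholder for a missing cell

# Hypothetical records; column names and values are illustrative only.
RECORDS = [
    {"age": "48", "hemo": "15.4"},
    {"age": "?", "hemo": "11.3"},
    {"age": "62", "hemo": "?"},
    {"age": "51", "hemo": "9.6"},
]


def count_missing(records):
    """Number of missing cells across all records."""
    return sum(1 for row in records for value in row.values() if value == MISSING)


def complete_cases(records):
    """Records that have no missing cell at all."""
    return [row for row in records if MISSING not in row.values()]


def impute_median(records, column):
    """Return new records with the column cast to float and gaps filled by the median."""
    observed = [float(row[column]) for row in records if row[column] != MISSING]
    fill = median(observed)
    return [
        {**row, column: fill if row[column] == MISSING else float(row[column])}
        for row in records
    ]


if __name__ == "__main__":
    print(count_missing(RECORDS), len(complete_cases(RECORDS)))
    print([row["age"] for row in impute_median(RECORDS, "age")])
```

Dropping incomplete rows would halve this toy sample, which is why imputation is usually preferred on small datasets. To avoid leaking information, the fill value should be computed on the training split only and then applied to the test split.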

Further studies

Future research on predictive modelling for CKD has several promising directions for enhancing model accuracy, robustness, and clinical applicability:

Development of predictive models on multiple datasets: integrating datasets from different clinical sources will improve model generalization and increase the adaptability of predictive models across different patient populations, enhancing model relevance and reliability in global health settings (12).
Inclusion of more biomarkers in CKD datasets: the conventional approach for diagnosing CKD through urine protein testing relies on 24-hour urine collection. However, alternatives such as the urinary protein-to-creatinine ratio (uPCR) and urinary albumin-to-creatinine ratio (uACR) have proven to be excellent substitutes that allow CKD diagnosis from a single urine sample, which simplifies the process (104).
Advanced feature selection techniques: future research should focus on sophisticated feature selection methods capable of accommodating both linear and nonlinear data relationships. By refining feature selection approaches, predictive accuracy can be improved, enabling models to better capture complex interdependencies in CKD-related data.
Development of multiclass classification models: to advance beyond binary classification, there is a need to develop models capable of predicting not only CKD presence but also its progression across stages. Such models would support a more effective assessment of disease severity, informing treatment strategies and allowing for proactive interventions (79).
Interpretable AI for clinical integration: as interpretability remains a critical challenge, future work should prioritize research on interpretable AI frameworks. Enhancing transparency and explainability within models will help foster clinician trust, facilitating the integration of AI-driven insights into routine clinical workflows and ultimately improving patient outcomes.
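The case for feature selection methods that accommodate nonlinear relationships, raised in the list above, can be illustrated with a short sketch. It uses made-up numbers: `Y_LINEAR` depends linearly on `X`, while `Y_QUADRATIC` is fully determined by `X` through a squared relationship. A filter based on the Pearson correlation coefficient would rank the second feature as irrelevant even though it is perfectly predictive.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient, which measures linear association only."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Hypothetical feature values and two targets derived from them.
X = [-2, -1, 0, 1, 2]
Y_LINEAR = [2 * x + 1 for x in X]   # linear dependence on X
Y_QUADRATIC = [x ** 2 for x in X]   # nonlinear (U-shaped) dependence on X

if __name__ == "__main__":
    print(f"linear:    {pearson(X, Y_LINEAR):.2f}")
    print(f"quadratic: {pearson(X, Y_QUADRATIC):.2f}")
```

The linear target yields a correlation of about 1, whereas the quadratic target yields 0 despite its complete dependence on the feature. Measures such as mutual information or model-based importance are common alternatives in this situation.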

Conclusions

In conclusion, AI through CML, DL and ES learning approaches offers transformative potential in predicting CKD by leveraging different ML approaches and the growing volume of CKD data. AI models enable early prediction and personalised treatment, which is vital for CKD outcomes. This review of recent learning approaches has shown the evolution of AI for CKD prediction and highlights the ongoing transition towards DL as a preferred approach due to its ability to handle complex data with minimal intervention. However, challenges such as data quality, limited development of multiclass classification, feature selection limitations, overfitting, explainability and interpretability must be addressed to fully realise the potential of AI in CKD management. Further research should focus on developing predictive models across multiple datasets to improve generalisability, on including and enhancing additional biomarkers within CKD datasets, and on developing advanced feature selection methods. Furthermore, the development of multiclass classification models for the prediction of CKD progression and of interpretable AI frameworks for clinical integration is vital for enhancing the accuracy and usability of predictive models in healthcare.

Acknowledgments

None.

Footnote

Reporting Checklist: The authors have completed the PRISMA reporting checklist. Available at https://jmai.amegroups.com/article/view/10.21037/jmai-24-256/rc

Peer Review File: Available at https://jmai.amegroups.com/article/view/10.21037/jmai-24-256/prf

Funding: None.

Conflicts of Interest: All authors have completed the ICMJE uniform disclosure form (available at https://jmai.amegroups.com/article/view/10.21037/jmai-24-256/coif). The authors have no conflicts of interest to declare.

Ethical Statement: The authors are accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved.

Open Access Statement: This is an Open Access article distributed in accordance with the Creative Commons Attribution-NonCommercial-NoDerivs 4.0 International License (CC BY-NC-ND 4.0), which permits the non-commercial replication and distribution of the article with the strict proviso that no changes or edits are made and the original work is properly cited (including links to both the formal publication through the relevant DOI and the license). See: https://creativecommons.org/licenses/by-nc-nd/4.0/.

References

Antony L, Azam S, Ignatious EVA, et al. A Comprehensive Unsupervised Framework for Chronic Kidney Disease Prediction. IEEE Access 2021;9:126481-501. [Crossref]
Qin J, Chen LIN, Liu Y, et al. A Machine Learning Methodology for Diagnosing Chronic Kidney Disease. IEEE Access 2019;8:20991-1002.
Lim DKE, Boyd JH, Thomas E, et al. Prediction models used in the progression of chronic kidney disease: A scoping review. PLoS One 2022;17:e0271619. [Crossref] [PubMed]
Rashed-Al-Mahfuz M, Haque A, Azad A, et al. Clinically Applicable Machine Learning Approaches to Identify Attributes of Chronic Kidney Disease (CKD) for Use in Low-Cost Diagnostic Screening. IEEE J Transl Eng Health Med 2021;9:4900511. [Crossref] [PubMed]
Qezelbash-Chamak J, Badamchizadeh S, Eshghi K, et al. A survey of machine learning in kidney disease diagnosis. Mach Learn Appl 2022;10:100418. [Crossref]
Gudeti B, Mishra S, Malik S, et al. A Novel Approach to Predict Chronic Kidney Disease using Machine Learning Algorithms. 2020 4th International Conference on Electronics, Communication and Aerospace Technology (ICECA); 2020:1630-5.
Sanmarchi F, Fanconi C, Golinelli D, et al. Predict, diagnose, and treat chronic kidney disease with machine learning: a systematic literature review. J Nephrol 2023;36:1101-17. [Crossref] [PubMed]
González-Rocha A, Colli VA, Denova-Gutiérrez E. Risk Prediction Score for Chronic Kidney Disease in Healthy Adults and Adults With Type 2 Diabetes: Systematic Review. Prev Chronic Dis 2023;20:E30. [Crossref] [PubMed]
Zhao D, Wang W, Tang T, et al. Current progress in artificial intelligence-assisted medical image analysis for chronic kidney disease: A literature review. Comput Struct Biotechnol J 2023;21:3315-26. [Crossref] [PubMed]
Chen F, Kantagowit P, Nopsopon T, et al. Prediction and diagnosis of chronic kidney disease development and progression using machine-learning: Protocol for a systematic review and meta-analysis of reporting standards and model performance. PLoS One 2023;18:e0278729. [Crossref] [PubMed]
Moher D, Liberati A, Tetzlaff J, et al. Preferred reporting items for systematic reviews and meta-analyses: the PRISMA statement. Int J Surg 2010;8:336-41. Erratum in: Int J Surg 2010;8:658. [Crossref] [PubMed]
Farjana A, Liza TF, Pandit PP, et al. Predicting Chronic Kidney Disease Using Machine Learning Algorithms. In: IEEE 13th Annual Computing and Communication Workshop and Conference (CCWC). IEEE; 2023:1267-71.
Ma K, Zheng ZR, Meng Y. Pathogenesis of Chronic Kidney Disease Is Closely Bound up with Alzheimer’s Disease, Especially via the Renin-Angiotensin System. J Clin Med 2023;12:1459. [Crossref] [PubMed]
Borg R, Carlson N, Søndergaard J, et al. The Growing Challenge of Chronic Kidney Disease: An Overview of Current Knowledge. Int J Nephrol 2023;2023:9609266. [Crossref] [PubMed]
Barhoi A. A mathematical analysis of dialysis report and kidney function tests: a case study. J Nephropharmacology 2024;13:1-5.
Eckardt KU, Delgado C, Heerspink HJL, et al. Trends and perspectives for improving quality of chronic kidney disease care: conclusions from a Kidney Disease: Improving Global Outcomes (KDIGO) Controversies Conference. Kidney Int 2023;104:888-903. [Crossref] [PubMed]
Rahman SM, Ibtisum S, Bazgir E, et al. The Significance of Machine Learning in Clinical Disease Diagnosis: A Review. International Journal of Computer Applications 2023;185:10-7. [Crossref]
Moein MM, Saradar A, Rahmati K, et al. Predictive models for concrete properties using Machine Learning and Deep Learning Approaches: A review. J Build Eng 2023;105444. [Crossref]
Ahmed I, Chowdhury TE, Routh BB, et al. Performance Analysis of Machine Learning Algorithms in Chronic Kidney Disease Prediction. 2022 IEEE 13th Annual Information Technology, Electronics and Mobile Communication Conference; 2022:0417-23.
Palembang K, Novita RP, Different U, et al. A Novel Approach for Chronic Kidney Disease Using Machine Learning Methodology. J Phys Conf Ser 2021;1916:012164. [Crossref]
Ifraz GM, Rashid MH, Tazin T, et al. Comparative Analysis for Prediction of Kidney Disease Using Intelligent Machine Learning Methods. Comput Math Methods Med 2021;2021:6141470. [Crossref] [PubMed]
Mijwil MM, Alsaadi A. Applying Machine Learning Techniques to Predict Chronic Kidney Disease in Humans. International Conference on Data Science and Applications; 2021:6.
Swain D, Mehta U, Bhatt A, et al. A Robust Chronic Kidney Disease Classifier Using Machine Learning. Electronics 2023;12:212. [Crossref]
Banu A, Begum Z. Chronic Disease Diagnosis Using Machine Learning Algorithm. J Emerg Technol Innov Res 2023;10:765-72.
Debal AD, Sitote MT. Chronic kidney disease prediction using machine learning techniques. J Big Data 2022; [Crossref]
Iftikhar H, Khan M, Khan Z, et al. A Comparative Analysis of Machine Learning Models: A Case Study in Predicting Chronic Kidney Disease. Sustainability 2023;15:2754. [Crossref]
Islam MA, Majumder MZH, Hussein MA. Chronic kidney disease prediction based on machine learning algorithms. J Pathol Inform 2023;14:100189. [Crossref] [PubMed]
da Silveira CMA, Sobrinho A, da Silva LD, et al. Exploring Early Prediction of Chronic Kidney Disease Using Machine Learning Algorithms for Small and Imbalanced Datasets. Appl Sci 2022;12:3673. [Crossref]
Samatha K, Reddy MR, Khan PF, et al. Chronic Kidney Disease Prediction using Machine Learning Algorithms. International Journal of Preventive Medicine and Health 2021;1:1-4. [Crossref]
Dritsas E, Trigka M. Machine Learning Techniques for Chronic Kidney Disease Risk Prediction. Big Data Cogn Comput 2022;6:98. [Crossref]
Arif SM, Mukheimer A, Asif D. Enhancing the Early Detection of Chronic Kidney Disease: A Robust Machine Learning Model. Big Data Cogn Comput 2023;7:144. [Crossref]
Salkar C. A Detailed Analysis on Kidney and Heart Disease Prediction using Machine Learning. J Comput Nat Sci 2021;1:9-14. [Crossref]
Chiu YL, Jhou MJ, Lee TS, et al. Health Data-Driven Machine Learning Algorithms Applied to Risk Indicators Assessment for Chronic Kidney Disease. Risk Manag Healthc Policy 2021;14:4401-12. [Crossref] [PubMed]
Raihan MMS, Ahmed E, Karim A, et al. Chronic Renal Disease Prediction using Clinical Data and Different Machine Learning Techniques. 2nd International Informatics and Software Engineering Conference, IISEC 2021; 2021:1-5.
Chittora P, Chaurasia S, Prasun C, et al. Prediction of Chronic Kidney Disease - A Machine Learning Perspective. IEEE Access 2021;9. [Crossref]
Chowdhury NH, Reaz MBI, Haque F, et al. Performance Analysis of Conventional Machine Learning Algorithms for Identification of Chronic Kidney Disease in Type 1 Diabetes Mellitus Patients. Diagnostics (Basel) 2021;11:2267. [Crossref] [PubMed]
Nishat MM, Faisal F, Dip RR, et al. A Comprehensive Analysis on Detecting Chronic Kidney Disease by Employing Machine Learning Algorithms. EAI Endorsed Trans Pervasive Heal Technol 2021;7:1-12. [Crossref]
Almustafa KM. Prediction of chronic Kidney Disease using Different Classification Algorithms. Informatics Med Unlocked 2021;24:100631. [Crossref]
Emon MU, Imran AM, Islam R, et al. Performance Analysis of Chronic Kidney Disease through Machine Learning Approaches. 2021 6th International Conference on Inventive Computation Technologies, ICICT; 2021:713-9.
Sarkar A, Kumar A, Sarkar S. Detection and Evaluation of Chronic Kidney Disease Using Different Regression and Classification Algorithms in Machine Learning. Advances in Electronics, Communications and Computing, Lecture Notes in Electrical Engineering; 2021;709.
Shanmugarajeshwari V, Ilayaraja M. Intelligent Prediction Techniques for Chronic Kidney Disease Data Analysis. International Journal of Intelligent Information Technologies 2021;11:1-22.
You H. Predictive analysis of chronic kidney disease based on machine learning. Eng Appl Sci Lett 2021;4:62-8. [Crossref]
Ventrella P, Delgrossi G, Ferrario G, et al. Supervised machine learning for the assessment of Chronic Kidney Disease advancement. Comput Methods Programs Biomed 2021;209:106329. [Crossref] [PubMed]
Roy MS, Ghosh R, Goswami D, et al. Comparative Analysis of Machine Learning Methods to Detect Chronic Kidney Disease. J Phys Conf Ser 2021;1911:012005. [Crossref]
Ilyas H, Ali S, Ponum M, et al. Chronic kidney disease diagnosis using decision tree algorithms. BMC Nephrol 2021;22:273. [Crossref] [PubMed]
Samet S, Laouar MR, Bendib I. Predicting and Staging Chronic Kidney Disease using Optimized Random Forest Algorithm. 2021 International Conference on Information Systems and Advanced Technologies (ICISAT); 2021:1-8.
Senan EM, Al-Adhaileh MH, Alsaade FW, et al. Diagnosis of Chronic Kidney Disease Using Effective Classification Algorithms and Recursive Feature Elimination Techniques. J Healthc Eng 2021;2021:1004767. [Crossref] [PubMed]
Ahmed R, Ali A, Ahmed N, et al. Computational Intelligence Approaches for Prediction of Chronic Kidney Disease. In: Advances in Distributed Computing and Machine Learning: proceedings of ICADCML; 2021:299-309.
Ebiaredoh-Mienye SA, Swart TG, Esenogho E, et al. A Machine Learning Method with Filter-Based Feature Selection for Improved Prediction of Chronic Kidney Disease. Bioengineering (Basel) 2022;9:350. [Crossref] [PubMed]
Jambulingam U, Kalpana AV, Ganesan I. Chronic kidney disease prediction with feature selection and extraction using. In: The First International Conference on Advances in Computational Science and Engineering; 2022.
Babu KP, Noorullah S. Recognition of Chronic Kidney Disease Using Machine Learning. J Algebr Stat 2022;13:910-7.
David SK, Rafiullah M, Siddiqui K. Comparison of Different Machine Learning Techniques to Predict Diabetic Kidney Disease. J Healthc Eng 2022;2022:7378307. [Crossref] [PubMed]
Islam A, Nittala K, Bajwa G. Adding Explainability to Machine Learning Models to Detect Chronic Kidney Disease. 2022 IEEE 23rd International Conference on Information Reuse and Integration for Data Science (IRI); 2022:297-302.
Allen A, Iqbal Z, Green-Saxena A, et al. Prediction of diabetic kidney disease with machine learning algorithms, upon the initial diagnosis of type 2 diabetes mellitus. BMJ Open Diabetes Res Care 2022;10:e002560. [Crossref] [PubMed]
Dekka S, Raju KN, Manendrasai D, et al. Utilizing Machine Learning Algorithms For Kidney Disease Prognosis. Eur J Mol Clin Med 2023;10:2852-61.
Ahmed SF, Alam MS. Deep learning modelling techniques: current progress, applications, advantages, and challenges. Artificial Intelligence Review 2023;56:13521-617. [Crossref]
Ranga P, Terlapu V, Jayaram D, et al. Optimizing Chronic Kidney Disease Diagnosis in Uddanam: A Smart Fusion of GA-MLP Hybrid and PCA Dimensionality Reduction. Procedia Comput Sci 2023;230:522-31. [Crossref]
Khadka SR, Subedi S, Aidy BK, et al. Predicting Chronic Kidney Disease using ML algorithms and XAI. Medicon Engineering Themes 2023;4:21-9.
Ravindra BV, Narendra VG, Shivaprasad G. A Neural Network Based Data Mining Approach For Recognition Of Chronic Kidney Disease. J Theor Appl Inf Technol 2022;100:1369-77.
Alanazi R. Identification and Prediction of Chronic Diseases Using Machine Learning Approach. J Healthc Eng 2022;2022:2826127. [Crossref] [PubMed]
Mondol C, Shamrat MFMJ, Hasan R, et al. Early Prediction of Chronic Kidney Disease: A Comprehensive Performance Analysis of Deep Learning Models. Algorithms 2022;15:308. [Crossref]
Liang P, Yang J, Wang W, et al. Deep Learning Identifies Intelligible Predictors of Poor Prognosis in Chronic Kidney Disease. IEEE J Biomed Health Inform 2023;27:3677-85. [Crossref] [PubMed]
Singh V, Asari VK, Rajasekaran R. A Deep Neural Network for Early Detection and Prediction of Chronic Kidney Disease. Diagnostics (Basel) 2022;12:116. [Crossref] [PubMed]
Chitra S, Jayalakshmi V. Effective analysis of chronic kidney disease prediction using HRNN algorithm. Louis Savenien Dupuis Journal of Multidisciplinary Research; 2022:1-6.
Gokiladevi M, Santhoshkumar S. Henry Gas Optimization Algorithm with Deep Learning based Chronic Kidney Disease Detection and Classification Model. Int J Intell Eng Syst 2024;17:645-55.
Singh Y, Srivastava M, Mahajan S, et al. Evaluation of Boosted Random Forest and Multi-Objective Artificial Neural Network for the Diagnostic Severity of Chronic Kidney Disease (CKD) Patients. Nanotechnology Perceptions 2024;20:629-38.
Venkatrao K, Shaik K. HDLNET: A Hybrid Deep Learning Network Model With Intelligent IOT for Detection and Classification of Chronic Kidney Disease. IEEE Access 2023;11:99638-52.
Subashini NJ, Venkatesh K. Multimodal deep learning for chronic kidney disease prediction: leveraging feature selection algorithms and ensemble models. Int J Comput Appl 2023;45:647-59.
Busi RAL, Meka JS, Reddy PVGDP. A Hybrid Deep Learning Technique for Feature Selection and Classification of Chronic Kidney Disease. Int J Intell Eng Syst 2023;16:638-49.
Arafat F, Khan T, Bapon DA, et al. A Deep Learning Approach to Predict Chronic Kidney Disease in Human. 2021 IEEE 12th Annual Information Technology, Electronics and Mobile Communication Conference (IEMCON); 2021:1010-5.
Elkholy MMS, Rezk A, Saleh AEFA. Early Prediction of Chronic Kidney Disease Using Deep Belief Network. IEEE Access 2021;9:135542-9.
Au-Yeung L, Xie X, Chess J. Using Machine Learning to Refer Patients with Chronic Kidney Disease to Secondary Care. 2021 25th International Conference on Pattern Recognition (ICPR); 2021:10219-26.
Krishnamurthy S, Ks K, Dovgan E, et al. Machine Learning Prediction Models for Chronic Kidney Disease Using National Health Insurance Claim Data in Taiwan. Healthcare (Basel) 2021;9:546. [Crossref] [PubMed]
Yadav CD, Pal S. Performance based Evaluation of Algorithms on Chronic Kidney Disease using Hybrid Ensemble Model in Machine Learning. Biomed Pharmacol J 2021;14:1633-45. [Crossref]
Vashisth S, Dhall I, Saraswat S. Chronic Kidney Disease (CKD) Diagnosis using Multi-Layer Perceptron Classifier. 2020 10th International Conference on Cloud Computing, Data Science & Engineering (Confluence); 2020:346-50.
Aswathy RH, Suresh P, Sikkandar YM, et al. Optimized Tuned Deep Learning Model for Chronic Kidney. Computers, Materials & Continua 2022;70:2098-111. [Crossref]
Jhumka K, Auzine MM, Shoaib M, et al. Chronic Kidney Disease Prediction using Deep Neural Network. In: 3rd International Conference on Next Generation Computing Application; 2022:114-8.
Ashafuddula NI, Islam B, Islam R. An Intelligent Diagnostic System to Analyze Early-Stage Chronic Kidney Disease for Clinical Application. Applied Computational Intelligence and Soft Computing 2023;2023:3140270. [Crossref]
Sawhney R, Malik A, Sharma S, et al. A comparative assessment of artificial intelligence models used for early prediction and evaluation of chronic kidney disease. Decis Anal J 2023;6:100169. [Crossref]
Rashid AK. Diagnosing Chronic Kidney disease using Artificial Neural Network (ANN). J Inf Technol Comput 2023;4:37-45. [Crossref]
Özdemir R, Taşyürek M, Aslantaş V. A New Deep Learning Model for Chronic Kidney Disease Prediction. 15th International Conference on Engineering & Natural Sciences, Muş, Türkiye; 2023:95-105.
Rao PK, Chatterjee S, Nagaraju K, et al. Fusion of Graph and Tabular Deep Learning Models for Predicting Chronic Kidney Disease. Diagnostics (Basel) 2023;13:1981. [Crossref] [PubMed]
Mohammed A, Kora R. A comprehensive review on ensemble deep learning: Opportunities and challenges. J King Saud Univ - Comput Inf Sci 2023;35:757-74. [Crossref]
Islam MR. Chronic kidney disease prediction using ensemble machine learning. 2023. Available online: https://hstu.ac.bd/uploads/dpt-research/ece/2023tjj_2005107_MMIn2.pdf
Alsuhibany SA, Abdel-Khalek S, Algarni A, et al. Ensemble of Deep Learning Based Clinical Decision Support System for Chronic Kidney Disease Diagnosis in Medical Internet of Things Environment. Comput Intell Neurosci 2021;2021:4931450. [Crossref] [PubMed]
Baidya D, Umaima U, Islam N, et al. A Deep Prediction of Chronic Kidney Disease by Employing Machine Learning Method. In: 6th International Conference on Trends in Electronics and Informatics (ICOEI 2022). Tirunelveli, India; 2022:28-30.
Srivastava S, Yadav KR, Narayan V, et al. An Ensemble Learning Approach for Chronic Kidney Disease Classification. J Pharm Negat Results 2022;13:2401-9.
Abdel-Fattah MA, Othman NA, Goher N. Predicting Chronic Kidney Disease Using Hybrid Machine Learning Based on Apache Spark. Comput Intell Neurosci 2022;2022:9898831. [Crossref] [PubMed]
Awad Yousif SM, Halawani HT, Amoudi G, et al. Early detection of chronic kidney disease using eurygasters optimization algorithm with ensemble deep learning approach. Alexandria Eng J 2024;100:220-31. [Crossref]
Saif D, Sarhan AM, Elshennawy NM. Early Prediction of Chronic Kidney Disease Based on Ensemble of Deep Learning Models and Optimizers. J Electr Syst Inf Technol 2024;11:17. [Crossref]
Yildiz EN, Cengil E, Yildirim M, et al. Diagnosis of Chronic Kidney Disease Based on CNN and LSTM. Acadlore Trans AI Mach Learn 2023;2:66-74. [Crossref]
Alsekait DM, Saleh H, Gabralla LA, et al. Toward Comprehensive Chronic Kidney Disease Prediction Based on Ensemble Deep Learning Models. Appl Sci 2023;13:3937. [Crossref]
Khalid H, Khan A, Zahid Khan M, et al. Machine Learning Hybrid Model for the Prediction of Chronic Kidney Disease. Comput Intell Neurosci 2023;2023:9266889. [Crossref] [PubMed]
Abhilash P, Parhi M, Pattanayak KB. An Ensemble Deep Learning Approach for Chronic Kidney Disease (CKD) Prediction. In: AIP Conference Proceedings; 2023.
Kaur C, Kumar MS, Anjum A, et al. Chronic Kidney Disease Prediction Using Machine Learning. J Adv Inf Technol 2023;14:384-91. [Crossref]
Swathi K, Krishna GV. Prediction of Chronic Kidney Disease with Various Machine Learning Techniques: A Comparative Study. In: Smart Technologies in Data Science and Communication. Lecture Notes in Networks and Systems; 2023:257-62.
Rubini L, Soundarapandian P, Eswaran P. UC Irvine Machine Learning Repository. Chronic Kidney Disease; 2015. doi: 10.24432/C5G020
Sobrinho A, Queiroz CMDASA, De Barros Costa E, et al. Computer-Aided Diagnosis of Chronic Kidney Disease in Developing Countries: A Comparative Analysis of Machine Learning Techniques. IEEE Access 2020;8:25407-19.
Anusha KB, Vital TP, Sangeeta K. Machine Learning Models and Neural Network Techniques for Predicting Uddanam CKD. Int J Recent Technol Eng 2019;8:2550-63. [Crossref]
Alshebly QO, Ahmed MR. Prediction and Factors Affecting of Chronic Kidney Disease Diagnosis using Artificial Neural Networks Model and Logistic Regression Model. Iraqi J Stat Sci 2019;28:140-59.
Vinothini A, Baghavathi Priya S. Design of Chronic Kidney Disease Prediction Model on Imbalanced Data Using Machine Learning Techniques. Indian J Sci Technol 2020;11:708-18.
Iliyas II, Saidu IR, Baba AD, et al. Prediction of Chronic Kidney Disease Using Deep Neural Network. FUDMA J Sci 2020;4:34-41. [Crossref]
Sharifi A, Alizadeh K. A Novel Classification Method Based on Multilayer Perceptron-Artificial Neural Network technique for Diagnosis of Chronic Kidney Disease. Ann Mil Heal Sci Res 2020;18:3. [Crossref]
Huang KJ, Chang WC, Chen CH, et al. Urine Protein to Creatinine Ratio for the Assessment of Bevacizumab-Associated Proteinuria in Patients with Gynecologic Cancers: A Diagnostic and Quality Improvement Study. Diagnostics (Basel) 2024;14:1852. [Crossref] [PubMed]

doi: 10.21037/jmai-24-256
Cite this article as: Iliyas II, Boukari S, Gital AY. Recent trends in prediction of chronic kidney disease using different learning approaches: a systematic literature review. J Med Artif Intell 2025;8:62.