Comparison of machine learning algorithms for predicting admission in intensive care units based on the precision nursing framework
Original Article

Greici Capellari Fabrizzio1, Lincoln Moura de Oliveira2, Diovane Ghignatti Da Costa1, José Luís Guedes Dos Santos1, Rodrigo Jensen3, Elisiane Lorenzini1, Clara Bender4, Simon Lebech Cichosz4, Alacoque Lorenzini Erdmann1

1Department of Nursing, Federal University of Santa Catarina, Florianópolis, SC, Brazil; 2Department of Electrical Engineering at the Federal University of Ceará, Fortaleza, CE, Brazil; 3Department of Nursing, Paulista State University, Botucatu, SP, Brazil; 4Department of Health Science and Technology, Aalborg, North Jutland, Denmark

Contributions: (I) Conception and design: GC Fabrizzio, LM de Oliveira, AL Erdmann; (II) Administrative support: DG Da Costa, JLGD Santos, R Jensen, E Lorenzini; (III) Provision of study materials or patients: DG Da Costa, JLGD Santos, R Jensen, E Lorenzini, C Bender, SL Cichosz, AL Erdmann; (IV) Collection and assembly of data: GC Fabrizzio, LM de Oliveira; (V) Data analysis and interpretation: GC Fabrizzio, LM de Oliveira, DG Da Costa, JLGD Santos, R Jensen, E Lorenzini, C Bender, SL Cichosz, AL Erdmann; (VI) Manuscript writing: All authors; (VII) Final approval of manuscript: All authors.

Correspondence to: Greici Capellari Fabrizzio, PhD. Nurse, Health Sciences Center, Rua Delfino Conti, S/N - Trindade, Florianópolis - SC, 88040-370, Brasil; Department of Nursing, Federal University of Santa Catarina, Florianópolis, SC, Brazil. Email: greicicapellari@gmail.com.

Background: Between 5% and 25% of patients with coronavirus disease 2019 (COVID-19) require admission to an intensive care unit (ICU). This study aimed to compare machine learning models for predicting the ICU admission of patients infected with COVID-19, based on the theoretical framework of Precision Nursing.

Methods: This retrospective, multicenter study included 547 patients who tested positive for COVID-19 and were admitted to hospital. Using a combination of Precision Nursing variables, eight prediction models were tested: neural network, AdaBoost, logistic regression, random forest, K Nearest Neighbor, Naive Bayes, Support Vector Machine, and decision tree. The models were evaluated by cross-validation, observing the area under the curve, sensitivity, specificity, and the confusion matrix.

Results: The most important predictors of ICU admission were age, admission to the university hospital linked to the Universidade Federal do Amazonas, and the number of people living in the household. The decision tree model showed the best area under the curve (0.668) and sensitivity (0.633), with a specificity of 0.669.

Conclusions: Used as a tool to support nurses’ decision making, the model helps to identify deterioration of the condition by flagging early the patients most likely to be admitted to the ICU. Its predictive capacity needs to be re-evaluated frequently, and it is believed that including a vaccination variable could improve predictive performance.

Keywords: Nursing informatics; precision medicine; coronavirus disease 2019 (COVID-19); supervised machine learning; algorithms


Received: 22 August 2024; Accepted: 09 December 2024; Published online: 12 March 2025.

doi: 10.21037/jmai-24-296


Highlight box

Key findings

• Variables that predict worsening are collected by nursing staff.

• Predictive models help healthcare professionals make decisions.

• The early identification of severe cases helps to prevent worsening of the condition.

What is known and what is new?

• Previous studies aimed to predict the hospitalization of patients with coronavirus disease 2019 (COVID-19) in the intensive care unit (ICU) with variables similar to those in this study, but they have not related them to the theoretical framework of individualized care.

• This study advances the field by identifying predictor variables and by applying and comparing machine learning algorithms (MLAs) for predicting the ICU admission of patients infected with COVID-19, based on the theoretical framework of Precision Nursing.

What is the implication, and what should change now?

• The model presents an opportunity for clinical use as a decision-support tool: by indicating the likelihood of ICU admission, it shows health professionals which patients require more frequent assessment. Early identification of serious cases helps to prevent deterioration and reduces the risk of mortality and the use of mechanical ventilation.


Introduction

Patients with mild coronavirus disease 2019 (COVID-19), the disease caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), present with fever, cough, fatigue and myalgia. However, about a week after the onset of symptoms, they can progress to severe disease, with conditions such as severe dyspnea and hypoxemia (1).

Statistics indicate that 20% to 30% of patients with moderate and severe forms of the disease require hospital admission. For hospitalized patients, the literature varies somewhat: studies report rates of admission to the intensive care unit (ICU) between 5% and 12%, reaching up to 25% of patients (1-3).

Traditionally, ICU beds have high occupancy rates, not only for COVID-19 emergencies but also for the many clinical conditions that require intensive care. With the outbreak of the pandemic, these two demands for ICU beds were combined, and the need varied according to the country and the behavior of the virus. Low- and middle-income countries have 0.1 to 2.5 ICU beds per 100,000 inhabitants, while high-income countries such as Monaco have as many as 59.5 ICU beds per 100,000 inhabitants (4).

The World Health Organization (WHO) recommends 1 to 3 ICU beds for every 10,000 inhabitants. In Brazil, beds are shared between the Unified Health System (Sistema Único de Saúde - SUS) and the private sector; however, most of the population uses public health services. Considering ICU beds in the SUS, only 11 Brazilian states had more than 1 ICU bed per 10,000 inhabitants, while in the private sector all states had more than 3 ICU beds per 10,000 inhabitants, except for the state of Santa Catarina, which had 2.61 beds. In addition, ICU beds are unequally distributed by geographic area within the states and are concentrated in the capitals, as is the case of Manaus, in the state of Amazonas (5). Health services are distributed unevenly in Brazil, and the availability of ICU beds also varies among cities, reaching the lowest levels in poorer and black communities (6).

In view of this scenario, health technologies proved to be an important ally in the search for solutions to cope with COVID-19. Previous studies used prediction models to predict the transfer of patients to the ICU (1-3). Predictive models have the potential to be used in ICU bed management; MLAs in particular have shown good results in tracking viral spread, patient monitoring, diagnostic methods, development of therapies, and drug discovery (7).

COVID-19 manifests differently across patients: while some individuals have mild symptoms, others develop the moderate to severe form of the infection, which can lead to death (1-3). This indicates that patients with the same clinical condition may follow different disease courses and respond differently to the same circumstances. The same goes for treatment: the same drug can act in different ways in individuals with the same condition. The explanation lies in the genetic characteristics of groups of individuals, which points to the need for individualized health care (8).

In this search for individualization of health care, the term “precision” came to be used as a synonym for customized health care, aiming at more assertive care interventions. Applied to the field of nursing, these concepts support Precision Nursing studies. Precision Nursing is defined as nursing care aimed at the specific needs of a single individual, based on the integration of large databases with information on lifestyle habits, the context in which individuals live, and the social, economic and cultural influences on their health. These associations provide information that guides individual care for patients, with a view to achieving well-being in health (9-11). In this context, there is a need for prediction models for the ICU admission of patients infected with COVID-19.

This study aimed to compare machine learning models for predicting the ICU admission of patients infected with COVID-19, based on the theoretical framework of Precision Nursing. We present this article in accordance with the TRIPOD reporting checklist (available at https://jmai.amegroups.com/article/view/10.21037/jmai-24-296/rc).


Methods

Study design

This retrospective, multicenter study was conducted in five Brazilian university hospitals, in the following regions of the country: (I) South: Hospital Universitário Polydoro Ernani de São Thiago - Universidade Federal de Santa Catarina (UFSC); (II) Southeast: Hospital Universitário Universidade Federal de São Paulo (UNIFESP); Hospital Universitário Clementino Fraga Filho - Universidade Federal do Rio de Janeiro (UFRJ); (III) Northeast: Hospital Universitário Onofre Lopes - Universidade Federal do Rio Grande do Norte (UFRN); (IV) North: Hospital Universitário Getúlio Vargas - Universidade Federal do Amazonas (UFAM) (12).

Data collection

Data were collected from April to December 2021 by telephone, using a structured interview with pre-defined response options. The responses were recorded on the Google Forms® platform. During the interviews, two instruments were applied first: the Care Transitions Measure (CTM-15), which evaluates the transition of care from hospital services to home, and the Patient Measure of Safety (PMOS), which analyzes patient safety in the hospital context. These instruments are not part of the analysis of the present study, which uses the sociodemographic data collected to characterize patients, reported right after the instruments were applied (13,14).

The inclusion criteria were COVID-19-positive patients admitted to hospital from October 2020 to December 2021, the period in which the study took place. The data collectors made a first contact with the patients. When the patient did not answer the phone call, two more attempts were made, for a total of three, on different shifts and at different times. If contact was still not possible, the patient was recorded as a loss in the spreadsheet. The reasons for loss considered were: the telephone number did not correspond to the patient’s contact, unsuccessful contact after the collection period had expired, patient refusal to participate, and other reasons.

To standardize data collection and minimize information bias, the general coordination of the project produced an instruction manual to guide data collectors. It also carried out online training demonstrating the flowchart to be followed. Contact attempts, including date, time and reason for loss, where applicable, were recorded in an Excel spreadsheet.

Study variables

The input variables, related to Precision Nursing, were as follows. Among the sociodemographic variables, the following were considered: age (continuous variable); gender (male, female, other and prefer not to inform); education level (no education and less than 1 year of study, incomplete elementary education, complete elementary education, incomplete high school education, complete high school education, incomplete higher education, complete higher education); race (white, black, brown-mixed race/ethnicity, indigenous, Asian, other/unknown). Regarding clinical biomarkers: days of hospital stay and days of ICU stay (discrete variables); use of mechanical ventilation; chronic respiratory disease; systemic arterial hypertension; cardiovascular diseases; diabetes mellitus; kidney diseases; obesity; cancer; fever; fatigue; shortness of breath; cough; loss of smell and taste; headache; body pain; nausea and vomiting; diarrhea, categorized as yes and no. Regarding lifestyle habits, a single variable was analyzed: history of smoking (yes and no). Regarding the social determinants and patient’s context, the variables included: city of residence; family income [up to 2,090 Brazilian real (BRL), 2,091 to 5,225 BRL, 5,225 to 10,450 BRL, more than 10,450 BRL, no income, I prefer not to answer]; and number of people living in the household.

Construction of the machine learning model

For the development of the predictive model, the following stages were followed: data preprocessing, attribute engineering, descriptive data exploration, model construction and model evaluation.

In the data preprocessing phase, the missing values in the database were initially identified. The variable income had the highest number of missing values. The following strategies were used to standardize the data: the category ‘99 – interview suspended’ was declared as a missing value; the categories ‘yes’ and ‘no’ were coded as 1 and 0, respectively; the category ‘don’t know/don’t remember’ in the age variable was treated as a missing value; the category ‘6- I don’t know how to answer’ in the race variable was declared as a missing value. In the variable number of people in the household, the record ‘is in a nursing home’ was recorded as 20 people and ‘I prefer not to answer’ was treated as a missing value. In the variable days of ICU stay, the category ‘don’t know/don’t remember’ was transformed into a missing value; in the same variable, values greater than 0 were transformed into 1, and all other values into 0.

The variables city of residence, date of admission and days of hospitalization were removed from the database, as they did not present relevant information for the modeling stage. The variable mechanical ventilation was also removed, as it could suggest the outcome of ICU admission. The 13 patients who did not have a record for the outcome, days spent in the ICU, were removed from the database.
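The recoding rules described in this subsection can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' code: the field names (`icu_days`, `household`, `cough`) and the three sample records are hypothetical, and only the rules themselves follow the text.

```python
# Hypothetical raw answers; field names and records are illustrative only.
MISSING = None

def recode_yes_no(value):
    """Map 'yes'/'no' to 1/0; any other answer becomes a missing value."""
    return {"yes": 1, "no": 0}.get(value, MISSING)

def recode_household(value):
    """Nursing-home residents count as 20 people; refusals are missing."""
    if value == "is in a nursing home":
        return 20
    if value == "I prefer not to answer":
        return MISSING
    return int(value)

def recode_icu_days(value):
    """Binary outcome: 1 if any ICU days were reported, otherwise 0."""
    if value == "don't know/don't remember":
        return MISSING
    return 1 if int(value) > 0 else 0

raw = [
    {"icu_days": "4", "household": "3", "cough": "yes"},
    {"icu_days": "0", "household": "is in a nursing home", "cough": "no"},
    {"icu_days": "don't know/don't remember", "household": "2", "cough": "yes"},
]

clean = []
for row in raw:
    outcome = recode_icu_days(row["icu_days"])
    if outcome is MISSING:
        continue  # records without an ICU outcome are removed
    clean.append({
        "icu": outcome,
        "household": recode_household(row["household"]),
        "cough": recode_yes_no(row["cough"]),
    })

print(clean)
```

Dropping records with a missing outcome, rather than imputing it, mirrors the removal of the 13 patients without an ICU record.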

Data analysis and model validation

The work was developed in Python, both for the implementations and for the visual exploration of data. Orange Data Mining and KNIME Analytics Platform software were also used to identify the best models. The trained models were extracted from the KNIME Analytics Platform and made available on a web page through the Streamlit tool.

The following models were tested: neural network, AdaBoost, logistic regression, random forest, K nearest neighbor (KNN), Naive Bayes, support vector machine (SVM) and decision tree. The parameterization of the models was the result of an iterative process; after varying the parameters, the final models compared were those with the best results, configured as follows:

  • Neural network: multi-layer perceptron (MLP) trained with backpropagation, with 2 hidden layers of 250 and 100 neurons and the rectified linear unit (ReLU) activation function. The optimizer was the stochastic gradient-based Adam, with an L2 penalty (regularization term) of 0.1, a maximum of 300 iterations, and an initial learning rate of 0.001.
  • AdaBoost: an ensemble algorithm that combines weak learners, with 50 estimators and a learning rate of 0.01.
  • Logistic regression: classification algorithm with LASSO (L1) regularization equal to 25.
  • Random forest: an ensemble of decision trees, with 37 trees and growth control such that the smallest subset that can be split is 15.
  • KNN: 10 nearest neighbors, using Euclidean distance with distance weighting (closer neighbors of a query point have a greater influence).
  • Naive Bayes: there are no hyperparameters to tune.
  • SVM: RBF kernel and a penalty term for loss equal to 1.
  • Decision tree: a binary tree was induced, with a minimum of 5 cases in branches and a maximum depth of 12; subsets smaller than 7 were not split, and splitting stopped when the majority class reached 99%.
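For reference, the settings listed above can be collected in a plain Python mapping. This is only a transcription aid: the key names are hypothetical and do not correspond to the parameter names of Orange, KNIME, or any other library.

```python
# Settings transcribed from the list above.
# Key names are illustrative and are NOT parameters of any specific tool.
MODEL_SETTINGS = {
    "neural_network": {
        "hidden_layers": (250, 100),
        "activation": "relu",
        "optimizer": "adam",
        "l2_penalty": 0.1,
        "max_iterations": 300,
        "initial_learning_rate": 0.001,
    },
    "adaboost": {"n_estimators": 50, "learning_rate": 0.01},
    "logistic_regression": {"regularization": "L1", "regularization_value": 25},
    "random_forest": {"n_trees": 37, "smallest_subset_to_split": 15},
    "knn": {"n_neighbors": 10, "metric": "euclidean", "weighting": "distance"},
    "naive_bayes": {},  # no hyperparameters to tune
    "svm": {"kernel": "rbf", "loss_penalty": 1},
    "decision_tree": {
        "binary": True,
        "min_cases_in_branch": 5,
        "max_depth": 12,
        "smallest_subset_to_split": 7,
        "stop_at_majority": 0.99,
    },
}

for name, settings in MODEL_SETTINGS.items():
    print(f"{name}: {settings}")
```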

The algorithms with the best performance were selected using cross-validation, observing the area under the curve (AUC), sensitivity, specificity, and the confusion matrix. The k-fold technique randomly divides the sample into a fixed number of groups and runs the same number of iterations, in this case five of each. In each iteration, four groups are used for training and one for testing. In the next iteration, the roles of the groups rotate, and the final accuracy is computed as the mean accuracy over the iterations (15). The AUC is the area under the receiver operating characteristic (ROC) curve and summarizes the sensitivity and specificity provided by the classifier.
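The five-fold procedure can be sketched with the Python standard library. This is a minimal illustration, assuming a caller-supplied `evaluate(train_idx, test_idx)` function that trains a model and returns a score; that function, the seed, and the helper names are hypothetical and not taken from the study.

```python
import random

def k_fold_indices(n_samples, k=5, seed=0):
    """Shuffle sample indices and split them into k nearly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]

def cross_validate(n_samples, evaluate, k=5, seed=0):
    """Score each fold as the test set once and return the mean score."""
    folds = k_fold_indices(n_samples, k, seed)
    scores = []
    for i, test_idx in enumerate(folds):
        # All folds except the i-th form the training set.
        train_idx = [j for m, fold in enumerate(folds) if m != i for j in fold]
        scores.append(evaluate(train_idx, test_idx))
    return sum(scores) / k

# With 547 patients, the five folds hold 109 or 110 patients each.
folds = k_fold_indices(547, k=5)
fold_sizes = sorted(len(fold) for fold in folds)
print(fold_sizes)
```

Every patient appears in exactly one test fold, so each record contributes to the performance estimate once.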

To validate the best model, the data were randomly split into training (90%) and testing (10%) sets, and the training/test split was repeated 100 times. This technique seeks to obtain stable estimates through a high number of repetitions, reducing the uncertainty in performance estimates (16).
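The repeated training/test splitting can be sketched as follows. It is an illustration under assumptions: `evaluate` is a hypothetical callback that fits a model on the training indices and returns its accuracy on the test indices, and rounding 10% of 547 to 55 test patients is a choice made here, not a detail reported in the study.

```python
import random
import statistics

def repeated_holdout(n_samples, evaluate, test_fraction=0.10, repeats=100, seed=0):
    """Repeat a random train/test split and summarize the resulting scores."""
    rng = random.Random(seed)
    n_test = round(n_samples * test_fraction)
    scores = []
    for _ in range(repeats):
        indices = list(range(n_samples))
        rng.shuffle(indices)
        test_idx, train_idx = indices[:n_test], indices[n_test:]
        scores.append(evaluate(train_idx, test_idx))
    # The spread across repetitions indicates how stable the estimate is.
    return statistics.mean(scores), statistics.stdev(scores)

# Placeholder evaluation that records the split sizes instead of fitting a model.
split_sizes = []

def probe(train_idx, test_idx):
    split_sizes.append((len(train_idx), len(test_idx)))
    return 0.5  # dummy score, for illustration only

mean_score, sd_score = repeated_holdout(547, probe)
print(split_sizes[0], mean_score, sd_score)
```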

Statistical analysis

The remaining missing values were imputed with the most frequent value (mode) of each variable. In the variable income, 71 values were imputed; in the variable number of people in the household, 4 values; and in education level and smoking, two values each. In each of the other variables, a single value was imputed. The variables age and number of people in the household were kept as numeric variables, and the other variables were treated as categorical. Categorical variables were transformed into binary values (0 or 1) using one-hot encoding so that the data could be modeled as numerical values. Numeric variables were standardized by removing the mean and scaling to unit variance (17).
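The three transformations described above (mode imputation, one-hot encoding and standardization) can be sketched with the standard library. The sample values are invented, and dividing by the population standard deviation is an assumption about how "unit variance" was computed.

```python
import statistics

def impute_mode(values):
    """Replace missing entries (None) with the most frequent observed value."""
    observed = [v for v in values if v is not None]
    mode = statistics.mode(observed)
    return [mode if v is None else v for v in values]

def one_hot(values):
    """Encode each category as a 0/1 indicator column."""
    categories = sorted(set(values))
    return categories, [[1 if v == c else 0 for c in categories] for v in values]

def standardize(values):
    """Subtract the mean and divide by the population standard deviation."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return [(v - mean) / sd for v in values]

# Invented sample data, for illustration only.
smoking = ["no", "former", None, "no", "yes"]
ages = [43, 55, 67, 55, 60]

smoking_filled = impute_mode(smoking)
categories, smoking_encoded = one_hot(smoking_filled)
ages_scaled = standardize(ages)

print(smoking_filled)
print(categories, smoking_encoded)
print(ages_scaled)
```

Imputation has to come before encoding, so that a missing answer never produces an indicator column of its own.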

Feature importance was calculated using Gini impurity, which assesses how much a variable contributes to reducing uncertainty in predictions. A higher Gini importance score indicates that the feature has a more significant impact on the model’s performance, especially for tree-based algorithms (18).
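As an illustration of the idea, the sketch below computes the Gini impurity decrease for a single hypothetical split of the sample by hospital (UFAM versus the other four), using the class counts reported in Table 2. It does not reproduce the importance values of the fitted tree, which would typically accumulate such decreases over every node that uses a feature.

```python
def gini(counts):
    """Gini impurity of a node given its class counts."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)

def gini_decrease(parent, children):
    """Impurity of the parent minus the size-weighted impurity of the children."""
    total = sum(parent)
    weighted = sum(sum(child) / total * gini(child) for child in children)
    return gini(parent) - weighted

# Class counts as (not admitted to ICU, admitted to ICU), taken from Table 2.
parent = (309, 238)
ufam = (83, 24)
other_hospitals = (309 - 83, 238 - 24)

decrease = gini_decrease(parent, [ufam, other_hospitals])
print(f"parent impurity: {gini(parent):.4f}")
print(f"impurity decrease for the UFAM split: {decrease:.4f}")
```

A decrease close to zero means the split barely separates the two classes; larger values indicate a more informative variable.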

An AUC of 0.7 means that, in 70% of cases, the model assigns a higher score to a randomly chosen patient who was admitted to the ICU than to a randomly chosen patient who was not. In general, AUC values are interpreted as: 0.5–0.6 (very bad), >0.6–0.7 (bad), >0.7–0.8 (poor), >0.8–0.9 (good), >0.9 (excellent). AUC estimates should therefore be reported with their 95% confidence intervals, allowing them to be compared with the null hypothesis of AUC =0.5.
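The AUC can be computed directly as the share of (admitted, not admitted) patient pairs in which the admitted patient receives the higher predicted score, with ties counted as half. The sketch below uses this pairwise definition with invented scores; it is not the software routine used in the study.

```python
def auc_from_scores(positive_scores, negative_scores):
    """AUC as the fraction of positive/negative pairs ranked correctly."""
    wins = 0.0
    for p in positive_scores:
        for n in negative_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(positive_scores) * len(negative_scores))

# Invented predicted probabilities of ICU admission, for illustration only.
icu = [0.9, 0.7, 0.4]
no_icu = [0.8, 0.3, 0.2, 0.4]

auc = auc_from_scores(icu, no_icu)
print(f"AUC = {auc:.3f}")
```

Here 9.5 of the 12 pairs are ranked correctly, so the AUC is 9.5/12, or about 0.792. A model that ranks at random scores close to 0.5.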

Ethical aspects

The study was conducted in accordance with the Declaration of Helsinki (as revised in 2013). The study was approved by the Human Research Ethics Committee of the Federal University of Santa Catarina (No. 4.347.463), and informed consent was obtained from all individual participants.


Results

The same patient could respond to both instruments; when this happened, the answers to the sociodemographic questionnaire from a single instrument were considered. Table 1 presents the number of patients per hospital and questionnaire.

Table 1

Number of research participants by hospital and by questionnaire, Florianopolis, SC, Brazil, 2021

| University hospital | Collected: CTM | Collected: PMOS | Total |
|---|---|---|---|
| Hospital Universitário Polydoro Ernani de São Thiago – Universidade Federal de Santa Catarina | 78 | 74 | 152 |
| Hospital Universitário – Universidade Federal de São Paulo | 147 | 136 | 283 |
| Hospital Universitário Clementino Fraga Filho – Universidade Federal do Rio de Janeiro | 107 | 83 | 190 |
| Hospital Universitário Onofre Lopes – Universidade Federal do Rio Grande do Norte | 79 | 73 | 152 |
| Hospital Universitário Getúlio Vargas – Universidade Federal do Amazonas | 116 | 75 | 191 |
| Total | 643 | 541 | 968 |

CTM, care transitions measure; PMOS, patient measure of safety.

The data on COVID-19-positive patients who met the other inclusion criteria of the study covered 968 patients; after data processing in Python, 547 patients remained, corresponding to the study sample, as shown in the flowchart in Figure 1. Of these, 309 patients were not admitted to the ICU and 238 were.

Figure 1 Data processing flowchart, Florianopolis, SC, Brazil, 2023. COVID-19, coronavirus disease 2019; CTM, care transitions measure; ICU, intensive care unit; PMOS, patient measure of safety.

Data were grouped regarding ICU admission for descriptive exploration, as shown in Table 2.

Table 2

Descriptive exploration of data regarding ICU admission (n=547), Florianopolis, SC, Brazil, 2021

Variables Not admitted to ICU, n=309 (56.49%) Admission to ICU, n=238 (43.51%) Total
Age (years)
   Mean 56.06 55.06
   Median 55 56
   Q1 (25%) 43 45
   Q3 (75%) 67 67
People in the same house
   Mean 3.27 3.13
   Median 3 3
   Q1 (25%) 2 2
   Q3 (75%) 4 4
Smoker
   No 194 (57%) 147 (43%) 341 (62.34%)
   Former smoker 105 (55%) 85 (45%) 190 (34.73%)
   Yes 10 (62%) 6 (38%) 16 (2.93%)
Gender
   Male 146 (52%) 133 (48%) 279 (51.01%)
   Female 162 (61%) 102 (39%) 264 (48.26%)
   Other 1 (25%) 3 (75%) 4 (0.73%)
Income
   No income 11 (46%) 13 (54%) 24 (4.39%)
   Income up to 2,000 BRL 153 (57%) 115 (43%) 268 (48.99%)
   Income from >2,000 to 5,000 BRL 111 (59%) 78 (41%) 189 (34.55%)
   Income from >5,000 to 10,000 BRL 24 (48%) 26 (52%) 50 (9.14%)
   Income greater than 10,000 BRL 10 (62%) 6 (38%) 16 (2.93%)
Hospital
   Universidade Federal de São Paulo Hospital 85 (50%) 86 (50%) 171 (31.26%)
   Universidade Federal do Amazonas Hospital 83 (78%) 24 (22%) 107 (19.56%)
   Universidade Federal de Santa Catarina Hospital 47 (50%) 47 (50%) 94 (17.18%)
   Universidade Federal do Rio de Janeiro Hospital 53 (56%) 41 (44%) 94 (17.18%)
   Universidade Federal do Rio Grande do Norte Hospital 41 (51%) 40 (49%) 81 (14.81%)
Education level
   Complete higher education 37 (51%) 35 (49%) 72 (13.16%)
   Incomplete higher education 27 (61%) 17 (39%) 44 (8.04%)
   Complete high school 104 (58%) 75 (42%) 179 (32.72%)
   Incomplete high school 27 (66%) 14 (34%) 41 (7.5%)
   Complete elementary school 25 (46%) 29 (54%) 54 (9.87%)
   Incomplete elementary school 77 (58%) 55 (42%) 132 (24.13%)
   No education 12 (48%) 13 (52%) 25 (4.57%)
Race
   Brown (mixed race/ethnicity) color 133 (58%) 98 (42%) 231 (42.23%)
   White color 125 (54%) 105 (46%) 230 (42.05%)
   Black color 43 (60%) 29 (40%) 72 (13.16%)
   Asian ethnicity 5 (83%) 1 (17%) 6 (1.1%)
   Indigenous 1 (25%) 3 (75%) 4 (0.73%)
   Other/unknown 2 (50%) 2 (50%) 4 (0.73%)
Dyspnea 230 (54%) 196 (46%) 426 (77.88%)
Fatigue 221 (52%) 200 (48%) 421 (76.97%)
Cough 211 (56%) 168 (44%) 379 (69.29%)
Body pain 211 (56%) 165 (44%) 376 (68.74%)
Fever 204 (55%) 169 (45%) 373 (68.19%)
Headache 170 (57%) 126 (43%) 296 (54.11%)
Loss of smell/taste 170 (59%) 119 (41%) 289 (52.83%)
Hypertension 168 (59%) 118 (41%) 286 (52.29%)
Diarrhea 132 (59%) 91 (41%) 223 (40.77%)
Nausea and vomiting 117 (62%) 73 (38%) 190 (34.73%)
Diabetes 108 (61%) 70 (39%) 178 (32.54%)
Obesity 81 (54%) 70 (46%) 151 (27.61%)
Cardiovascular diseases 85 (62%) 52 (38%) 137 (25.05%)
Chronic respiratory disease 55 (52%) 51 (48%) 106 (19.38%)
Kidney diseases 65 (66%) 34 (34%) 99 (18.1%)
Cancer 32 (63%) 19 (37%) 51 (9.32%)

BRL, Brazilian real; ICU, intensive care unit; Q, quartile.

According to the AUC estimates, the decision tree, followed by the neural network, showed a significantly higher probability of correct classification for predicting the ICU admission of patients infected with COVID-19 than the logistic regression, KNN, Naive Bayes, SVM and AdaBoost algorithms. The AdaBoost algorithm had the lowest AUC estimate and consequently the lowest chance of correctly classifying the ICU admission of patients infected with COVID-19.

All the AUC estimates were greater than 0.500, indicating that the probability of success is not due to chance. However, the vast majority of estimates were classified as bad (>0.6–0.7).

The performance of each predictive algorithm for the ICU admission of COVID-19 patients is presented in Table 3.

Table 3

Algorithm performance, Florianópolis, SC, Brazil, 2022

| Algorithm | Area under the curve (95% CI) | Sensitivity | Specificity |
|---|---|---|---|
| Decision tree | 0.668 (0.664–0.672) a | 0.633 | 0.669 |
| Neural network | 0.666 (0.663–0.669) a | 0.622 | 0.673 |
| Random forest | 0.656 (0.652–0.670) ab | 0.616 | 0.734 |
| Logistic regression | 0.638 (0.635–0.641) b | 0.618 | 0.692 |
| K Nearest Neighbor | 0.632 (0.628–0.63) b | 0.605 | 0.673 |
| Naive Bayes | 0.623 (0.620–0.626) b | 0.592 | 0.640 |
| Support Vector Machine | 0.615 (0.611–0.619) b | 0.580 | 0.572 |
| AdaBoost | 0.585 (0.580–0.590) c | 0.589 | 0.611 |

a, b, c: System of letters identifying the statistically significant differences between the different algorithms, where areas under the curve with equal letters do not differ at a significance level of 5%. CI, confidence interval.

Considering all the AUC, sensitivity and specificity metrics, the decision tree was the best model. The decision tree was validated by means of a new model training and testing procedure—repeated training/test splits—and the model’s accuracy was assessed, indicating a final accuracy of 60.5%.

For the decision tree model, the predictive variables that had the greatest influence on the model were age, hospital linked to the Universidade Federal do Amazonas, and number of people living in the same household, followed by kidney disease, dyspnea, loss of taste and smell, female, non-smoker, fatigue, diabetes, incomplete higher education, white, headache, male, complete high school education, brown, hypertension, nausea and vomiting, cancer, income up to 2,090 BRL, hospital linked to UNIFESP, incomplete elementary education, diarrhea, body pain, cardiovascular diseases, complete higher education, fever, complete elementary education, black, hospital linked to UFSC, hospital linked to UFRJ, hospital linked to UFRN, former smoker, income between 2,091 and 5,225 BRL, cough, chronic respiratory disease, obesity, incomplete high school education, income between 5,225 and 10,450 BRL, no income, Asian, no education, income of more than 10,450 BRL, smoker, and indigenous.

In the confusion matrix, the model correctly classified 207 patients as not going to the ICU (true negatives) and 139 patients as going to the ICU (true positives). On the other hand, 102 patients were false positives, that is, the model predicted ICU admission but the patients were not admitted, and 99 patients were false negatives, classified as not going to the ICU when they were in fact admitted. Figure 2 shows the decision tree’s predictions down to level 6.

Figure 2 Decision tree, Florianópolis, SC, Brazil, 2023.
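The usual summary metrics can be derived from the four confusion matrix counts reported above. The sketch below applies the standard definitions to those counts; the variable names are illustrative.

```python
# Counts reported for the decision tree confusion matrix.
true_negatives = 207   # predicted no ICU, not admitted
true_positives = 139   # predicted ICU, admitted
false_positives = 102  # predicted ICU, not admitted
false_negatives = 99   # predicted no ICU, admitted

total = true_negatives + true_positives + false_positives + false_negatives

# Share of admitted patients that the model flagged.
sensitivity = true_positives / (true_positives + false_negatives)
# Share of non-admitted patients that the model cleared.
specificity = true_negatives / (true_negatives + false_positives)
# Share of all patients classified correctly.
accuracy = (true_positives + true_negatives) / total

print(f"n = {total}")
print(f"sensitivity = {sensitivity:.3f}")
print(f"specificity = {specificity:.3f}")
print(f"accuracy    = {accuracy:.3f}")
```

With these counts the sketch gives a sensitivity of about 0.584, a specificity of about 0.670 and an accuracy of about 0.633. These values come from a single matrix and need not coincide with the cross-validated figures in Table 3 or with the accuracy from the repeated training/test splits.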

Discussion

Of the eight models tested, the AUC ranged from 0.585 to 0.668, and the decision tree presented the best performance. This algorithm is one of the most elementary MLAs, which makes its complete functioning easier to understand. In the structure of a decision tree, each internal node represents a test on an attribute, the branches are the outcomes of the tests, and the leaves (terminal nodes) carry the class label decided by the model. This division of the dataset into subsets is what leads to learning. The division stops when all cases in a node share the same value of the variable of interest or when further divisions add no value to the prediction. The tree-like structure therefore establishes paths by dividing the data into small groups; that is, it works as a classification model and, within a set of possibilities, predicts the class of a given case (19).
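The node, branch and leaf structure described above can be illustrated with a toy tree and a prediction routine. The features, thresholds and labels below are invented for illustration; this is not the tree fitted in the study or shown in Figure 2.

```python
# Toy tree: features, thresholds and labels are invented for illustration only.
TREE = {
    "feature": "age", "threshold": 60,
    "left": {"label": "no ICU"},            # age <= 60
    "right": {                              # age > 60
        "feature": "dyspnea", "threshold": 0.5,
        "left": {"label": "no ICU"},        # dyspnea absent (0)
        "right": {"label": "ICU"},          # dyspnea present (1)
    },
}

def predict(node, patient):
    """Follow the tests from the root until a leaf and return its label."""
    while "label" not in node:
        go_left = patient[node["feature"]] <= node["threshold"]
        node = node["left"] if go_left else node["right"]
    return node["label"]

print(predict(TREE, {"age": 45, "dyspnea": 1}))
print(predict(TREE, {"age": 70, "dyspnea": 1}))
```

Each prediction corresponds to one root-to-leaf path, which is why the reasoning behind a classification can be read directly from the tree.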

After identification of the decision tree model, discussing the behavior of the disease and how the variables relate to the variable of interest helps us to understand the results presented by the artificial intelligence models. The variables presented in Table 2 show a balance between patients who did not go to the ICU and patients who did. This balance may be related to changes in the scenario of the COVID-19 pandemic, such as the emergence of new variants with different clinical biomarkers and the effect of vaccination on patient admission, since during the data collection period, from April to December 2021, some patients had already been vaccinated against the disease.

Regarding the age variable, the mean age of patients not admitted to the ICU was 56.06 years and that of patients admitted to the ICU was 55.06 years; that is, the patients admitted to the ICU in this database were, on average, younger than the patients not admitted. At the beginning of the pandemic, in April 2020, 40% of patients admitted to the ICU were over 70 years old (20). However, as vaccination rates increased, the mean age at admission decreased. A study conducted in a hospital in Spain found that the mean age of patients admitted to the ICU was 63.22 years before immunization and 58.88 years after immunization, a decrease of 4.34 years at a 95% confidence level. The mean age dropped to 52.35 years after 10% of the population had been immunized, and the percentage of patients over 60 admitted to the ICU due to COVID-19 has remained below 50% since 17% of the population had been vaccinated (21).

Vaccination is a decisive factor for managing the pandemic with the potential to control the disease and reduce severe cases (21,22). Immunization reduces the ICU admission rates of patients vaccinated against COVID-19 by up to 65.6% (22). In patients with the complete vaccine schedule, there is also a 55.4% reduction in the probability of using a mechanical ventilator and a 22.6% lower probability of dying (23).

Regarding the model’s predictive variables, that is, variables that most influence the model, age, hospitalization at the university hospital linked to the Universidade Federal do Amazonas and the number of residents in the same household stand out. Although there is no pattern between the age data as previously mentioned, the algorithm still identified this variable as the most important for the model.

The importance of age as a predictor agrees with research conducted in Spain, in which age was the most parsimonious predictor of ICU admission among COVID-19 patients, followed by temperature and respiratory rate (24). A study conducted in the United States, in turn, identified procalcitonin, lactate dehydrogenase, C-reactive protein, oxygen saturation, ferritin, and temperature as predictive variables of ICU admission (3). However, procalcitonin, lactate dehydrogenase, C-reactive protein and ferritin were not considered in the present study, while the presence or absence of dyspnea and temperature were collected from the statements of patients or family members. Another study conducted in the USA identified respiratory rate, leukocytes, lymphocytes, diastolic blood pressure, C-reactive protein, pulse oximetry and, in 7th position, age as predictor variables for ICU admission (2).

The other two predictive variables that most influenced the model were the number of people living in the household and admission to the university hospital linked to the Universidade Federal do Amazonas. These variables refer to specific characteristics of Brazil. Regarding admission at the UFAM hospital, there were more patients who were not admitted to the ICU than patients who were admitted.

Amazonas is the largest state in Brazil in terms of land area, covering 1,559,168.12 km² with a population of 4,144,597 inhabitants, and was one of the states most affected by COVID-19 in Brazil. ICU beds were concentrated only in the capital, Manaus, with 1.24 public and private ICU beds per 10,000 inhabitants. This concentration may have limited patients’ access to treatment for severe cases of COVID-19, both because of the difficulty in transporting patients to health services and because of limited access to mechanical ventilators, hospitals and ICUs (5).

The findings of the present study, together with the results found in the scientific literature, indicate that there is no consensus on the predictive variables of ICU admission, since the studies analyze different variables and some of these variables are associated with local characteristics. Although studies use different variables, including more costly ones such as laboratory biomarkers, the AUC of these models does not vary substantially. For example, in the present study, the best model (decision tree) showed an AUC of 69%. In the scientific literature, the model that identified age, temperature, and respiratory rate as predictive variables for the probability of ICU admission presented an AUC of 76% (24). The model that included procalcitonin, lactate dehydrogenase, C-reactive protein, oxygen saturation, ferritin and temperature had an AUC of 79% (3), and the model that included respiratory rate, leukocytes, lymphocytes, diastolic blood pressure, C-reactive protein, pulse oximetry and, in 7th position, age showed an AUC of 79.9% (2).
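To make the AUC values compared above more concrete, the following is a minimal sketch of how an AUC can be computed from predicted scores. It uses the rank interpretation of the AUC: the probability that a randomly chosen admitted patient receives a higher score than a randomly chosen non-admitted patient. The function name `roc_auc` and the toy labels and scores are illustrative assumptions, not data or code from the study.

```python
def roc_auc(labels, scores):
    """Return the AUC as the fraction of (positive, negative) pairs
    in which the positive case has the higher score (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present to compute an AUC")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = admitted to ICU, 0 = not admitted.
toy_labels = [1, 1, 1, 0, 0, 0]
toy_scores = [0.9, 0.6, 0.4, 0.7, 0.3, 0.2]
toy_auc = roc_auc(toy_labels, toy_scores)
print(f"toy AUC = {toy_auc:.3f}")
```

Under this reading, an AUC of 0.5 corresponds to a model that ranks patients no better than chance, and an AUC of 1.0 to a model that ranks every admitted patient above every non-admitted one.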

Selecting the correct variables usually improves the model’s performance. It is unlikely that all the variables in a database are useful for constructing an artificial intelligence model; on the contrary, redundant variables reduce the model’s capacity for generalization and consequently influence the prediction outcome. Thus, the variables collected should be limited to those that are necessary and should be carefully refined to achieve good predictive performance. For this reason, unnecessary variables should be removed, as they increase the cost of data collection, the computational cost and the complexity of the model. For example, a model that includes a variable representing chest X-ray results will increase data collection costs (19).
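One simple way to illustrate the removal of redundant variables is a correlation filter, sketched below. This is only one of many possible selection strategies and is not the procedure used in this study; the function names, the 0.95 threshold and the toy columns (including `age_months`, a deliberately redundant copy of age) are assumptions made for illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length numeric lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column has no defined correlation
    return cov / math.sqrt(var_x * var_y)


def drop_redundant(columns, threshold=0.95):
    """Keep a variable only if it is not strongly correlated
    (|r| >= threshold) with a variable that was already kept."""
    kept = []
    for name, values in columns.items():
        if all(abs(pearson(values, columns[k])) < threshold for k in kept):
            kept.append(name)
    return kept


# Hypothetical toy data, not taken from the study database.
ages = [34, 51, 67, 72, 45]
toy_columns = {
    "age": ages,
    "age_months": [a * 12 for a in ages],  # redundant with "age"
    "household_size": [3, 3, 5, 1, 4],
}
selected = drop_redundant(toy_columns)
print(selected)
```

In this toy case the filter keeps `age` and `household_size` and discards `age_months`, which carries no information beyond what age already provides.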

The predictive variables identified in the study are collected by nurses during the nursing history, and other symptoms and biomarkers are identified during the patient’s physical examination. The collection and analysis of these clinical and laboratory biomarkers provide information that helps nurses in their decision making to plan and implement precision interventions based on the identification of these biomarkers. Thus, the biomarkers must be associated with hypotheses of precision nursing interventions and be collected in such a way as to present a broader assessment of the individual’s baseline condition and, based on the assessment of the biomarkers, validate or refute the hypothesis that led to the intervention, based on evidence (25).

This returns to the earlier point that vaccination status, had it been included, could have been a variable with greater predictive value for this study. In addition, the regional characteristics of the patients were not taken into account, even though the study includes patients from four regions of Brazil. The regional characteristics of the population, such as genetics, social context, and lifestyle habits, may have influenced the performance of the models. This study used centralized machine learning, in which a central aggregator gathers patient data from participating institutes and builds a global model. However, federated learning could be a solution to the limited size and diversity of data, since the data can be less biased and more representative of the population. Federated learning is a recent area of machine learning that allows global models to be built without sharing data between the participating institutes (26).
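The idea of building a global model without sharing patient data can be sketched with federated averaging, a commonly used aggregation scheme in which each institute trains locally and sends only its model parameters and sample count to the aggregator. The sketch below is an illustrative assumption: it is not the method of this study, which was centralized, nor necessarily the scheme described in reference (26), and the function name and toy numbers are hypothetical.

```python
def federated_average(site_updates):
    """Combine locally trained parameter vectors into a global vector.

    site_updates is a list of (n_samples, weights) tuples, one per
    institute. Each weight vector is averaged in proportion to the
    number of local samples; no patient-level data is exchanged.
    """
    if not site_updates:
        raise ValueError("at least one site update is required")
    total = sum(n for n, _ in site_updates)
    if total <= 0:
        raise ValueError("total number of samples must be positive")
    dim = len(site_updates[0][1])
    if any(len(w) != dim for _, w in site_updates):
        raise ValueError("all weight vectors must have the same length")

    global_weights = [0.0] * dim
    for n, weights in site_updates:
        share = n / total
        for i, w in enumerate(weights):
            global_weights[i] += share * w
    return global_weights


# Hypothetical updates from two institutes (sample count, local weights).
toy_updates = [
    (100, [0.2, 1.0]),
    (300, [0.6, 2.0]),
]
global_model = federated_average(toy_updates)
print(global_model)
```

Here the second institute contributes three quarters of the samples, so the global parameters lie closer to its local parameters than to those of the first institute.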

One limitation of this study is that it analyzed only data from patients who survived hospitalization for COVID-19. There were also no stratified analyses considering the regional characteristics of the patients, even though the study included patients from four regions of Brazil. The regional characteristics of the population, such as genetics, background and lifestyle habits, may have influenced the performance of the models. For these reasons, it is recommended that these findings be generalized with caution, since the models were developed in a certain context, with certain assumptions. In other words, the models presented here have applicability in a specific scenario with specific patients.


Conclusions

The decision tree algorithm showed the best performance in predicting the ICU admission of patients infected with COVID-19, with the following metrics: AUC = 0.668, sensitivity = 0.633 and specificity = 0.669. The most important predictor variables for the decision tree model were admission to the hospital linked to the Universidade Federal do Amazonas and the number of people living in the same household. There seems to be no consensus in the scientific literature on which variables predict ICU hospitalization of patients infected with COVID-19, since previous studies analyzed different variables and some of them are associated with local and/or regional characteristics.
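For readers less familiar with these metrics, the sketch below shows how sensitivity and specificity are derived from the counts of a confusion matrix. The function name and the counts are hypothetical and were chosen only to keep the arithmetic simple; they are not the confusion matrix of the decision tree model reported here.

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Return (sensitivity, specificity) from confusion-matrix counts.

    sensitivity = TP / (TP + FN): share of ICU admissions correctly flagged.
    specificity = TN / (TN + FP): share of non-admissions correctly cleared.
    """
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present")
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical counts: 40 admitted patients, 60 non-admitted patients.
sens, spec = sensitivity_specificity(tp=30, fn=10, tn=45, fp=15)
print(f"sensitivity = {sens:.2f}, specificity = {spec:.2f}")
```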

The identification of this model presents an opportunity for clinical use, as a tool to support decision-making, by indicating to nurses which patients require more frequent assessments, based on the likelihood of ICU admission. The early identification of severe cases helps to prevent worsening of the condition and reduces the risk of mortality and the need for mechanical ventilation.

However, given the dynamics of the disease, such as virus mutations that bring different clinical and laboratory biomarkers, it is necessary to refine the predictive variables and to conduct frequent evaluations to increase the predictive capacity of the artificial intelligence model. Among the weaknesses of the study is the fact that the prediction analyzed only patients who survived COVID-19 and who had the physical, psychological, and cognitive conditions to respond to the questionnaire. Additionally, information regarding the vaccination status of patients was not collected, even though vaccination is a decisive factor in managing the pandemic, helping to cope with the disease and reducing severe cases. Therefore, future studies should include these markers, starting from the hypothesis that they could improve subsequent iterations of the model.


Acknowledgments

None.


Footnote

Reporting Checklist: The authors have completed the TRIPOD reporting checklist. Available at https://jmai.amegroups.com/article/view/10.21037/jmai-24-296/rc

Data Sharing Statement: Available at https://jmai.amegroups.com/article/view/10.21037/jmai-24-296/dss

Peer Review File: Available at https://jmai.amegroups.com/article/view/10.21037/jmai-24-296/prf

Funding: This work was supported by Coordination for the Improvement of Higher Education Personnel – Brazil (grant number 001), the National Council for Scientific and Technological Development to G.C.F. (141591-2019-6), to A.L.E. and J.L.G.D.S. (317326/2021-0) and to E.L. (307625/2022-2) and State Fund to Support the Maintenance and Development of Higher Education to G.C.F. (261/SED/2022).

Conflicts of Interest: All authors have completed the ICMJE uniform disclosure form (available at https://jmai.amegroups.com/article/view/10.21037/jmai-24-296/coif). G.C.F. reports funding received from Coordination for the Improvement of Higher Education Personnel – Brazil (grant number 001), the National Council for Scientific and Technological Development (141591-2019-6), and State Fund to Support the Maintenance and Development of Higher Education (261/SED/2022). A.L.E. and J.L.G.D.S. report funding from the National Council for Scientific and Technological Development (317326/2021-0). E.L. reports funding from the National Council for Scientific and Technological Development (307625/2022-2). The other authors have no conflicts of interest to declare.

Ethical Statement: The authors are accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved. The study was conducted in accordance with the Declaration of Helsinki (as revised in 2013). The study was approved by the Human research ethics committee of the Federal University of Santa Catarina (No. 4.347.463) and informed consent was obtained from all individual participants.

Open Access Statement: This is an Open Access article distributed in accordance with the Creative Commons Attribution-NonCommercial-NoDerivs 4.0 International License (CC BY-NC-ND 4.0), which permits the non-commercial replication and distribution of the article with the strict proviso that no changes or edits are made and the original work is properly cited (including links to both the formal publication through the relevant DOI and the license). See: https://creativecommons.org/licenses/by-nc-nd/4.0/.


References

  1. Covino M, Sandroni C, Santoro M, et al. Predicting intensive care unit admission and death for COVID-19 patients in the emergency department using early warning scores. Resuscitation 2020;156:84-91. [Crossref] [PubMed]
  2. Cheng FY, Joshi H, Tandon P, et al. Using Machine Learning to Predict ICU Transfer in Hospitalized COVID-19 Patients. J Clin Med 2020;9:1668. [Crossref] [PubMed]
  3. Hou W, Zhao Z, Chen A, et al. Machining learning predicts the need for escalated care and mortality in COVID-19 patients from clinical variables. Int J Med Sci 2021;18:1739-45. [Crossref] [PubMed]
  4. Ma X, Vervoort D. Critical care capacity during the COVID-19 pandemic: Global availability of intensive care beds. J Crit Care 2020;58:96-7. [Crossref] [PubMed]
  5. Palamim CVC, Marson FAL. COVID-19 - The Availability of ICU Beds in Brazil during the Onset of Pandemic. Ann Glob Health 2020;86:100. [Crossref] [PubMed]
  6. Pereira RHM, Braga CKV, Servo LM, et al. Geographic access to COVID-19 healthcare in Brazil using a balanced float catchment area approach. Soc Sci Med 2021;273:113773. [Crossref] [PubMed]
  7. Heidari A, Jafari Navimipour N, Unal M, et al. Machine learning applications for COVID-19 outbreak management. Neural Comput Appl 2022;34:15313-48. [Crossref] [PubMed]
  8. Lopes-Júnior LC. Personalized Nursing Care in Precision-Medicine Era. SAGE Open Nurs 2021;7:23779608211064713. [Crossref] [PubMed]
  9. Lopes Júnior LC. The era of precision medicine and its impact on nursing: paradigm shifts? Rev Bras Enferm 2021;74:e740501. [Crossref] [PubMed]
  10. Yuan C. Precision Nursing: New Era of Cancer Care. Cancer Nurs 2015;38:333-4. [Crossref] [PubMed]
  11. Fu MR, Kurnat-Thoma E, Starkweather A, et al. Precision health: A nursing perspective. Int J Nurs Sci 2020;7:5-12. [Crossref] [PubMed]
  12. Cechinel-Peiter C, Engel FD, Mello ALSF, et al. Data collection via phone in multicentric research on nursing care in the face of Covid-19. Texto Contexto Enferm 2024;33:e20220261. [Crossref]
  13. Acosta AM, Lima MADS, Marques GQ, et al. Brazilian version of the Care Transitions Measure: translation and validation. Int Nurs Rev 2017;64:379-87. [Crossref] [PubMed]
  14. Mello JF, Barbosa SFF. Translation and transcultural adaptation of the Patient Measure of Safety (PMOS) questionnaire to Brazilian Portuguese. Texto Contexto Enferm 2021;30:e20180322. [Crossref]
  15. Netto AV, Berton L, Takahata AK. Data science and Artificial Intelligence in Healthcare. São Paulo: Editora dos Editores; 2021. 224 p.
  16. Kuhn M, Johnson K. Applied Predictive Modeling. New York, NY: Springer New York; 2013.
  17. Scikit learn. sklearn.preprocessing.StandardScaler — scikit-learn 1.0.2 documentation. [Internet]. [cited 2023 Jun 07]. Available online: https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.StandardScaler.html
  18. Géron A. Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow. 2nd ed. Sebastopol, CA: O’Reilly Media; 2019.
  19. Shukla S, Maheshwari A, Johri P. Comparative analysis of machine learning algorithms; Stramlit web application. In: 3rd International Conference on Advances in Computing, Communication Control and Networking (ICAC3N). [Internet]. 2021 Dec 17 [cited 2023 Jun 05]. Available online: https://ieeexplore.ieee.org/document/9725496
  20. Cohen JF, Korevaar DA, Matczak S, et al. COVID-19-Related Fatalities and Intensive-Care-Unit Admissions by Age Groups in Europe: A Meta-Analysis. Front Med (Lausanne) 2020;7:560685. [Crossref] [PubMed]
  21. González-Castro A, Fito EC, Fernandez A, et al. Impact of vaccination on admissions to an intensive care unit for COVID-19 in a third-level hospital. Med Intensiva 2022;46:406-7.
  22. Moghadas SM, Vilches TN, Zhang K, et al. The impact of vaccination on COVID-19 outbreaks in the United States. medRxiv. 2020 Jan. Preprint. doi: 10.1101/2020.11.27.20240051
  23. Maltezou HC, Basoulis D, Bonelis K, et al. Effectiveness of full (booster) COVID-19 vaccination against severe outcomes and work absenteeism in hospitalized patients with COVID-19 during the Delta and Omicron waves in Greece. Vaccine 2023;41:2343-8. [Crossref] [PubMed]
  24. Izquierdo JL, Ancochea J, Savana COVID-19 Research Group, et al. Clinical Characteristics and Prognostic Factors for Intensive Care Unit Admission of Patients With COVID-19: Retrospective Study Using Machine Learning and Natural Language Processing. J Med Internet Res 2020;22:e21801. [Crossref] [PubMed]
  25. Corwin EJ, Ferranti EP. Integration of biomarkers to advance precision nursing interventions for family research across the life span. Nurs Outlook 2016;64:292-8. [Crossref] [PubMed]
  26. Oh W, Nadkarni GN. Federated Learning in Health care Using Structured Medical Data. Adv Kidney Dis Health 2023;30:4-16. [Crossref] [PubMed]
doi: 10.21037/jmai-24-296
Cite this article as: Fabrizzio GC, de Oliveira LM, Da Costa DG, Santos JLGD, Jensen R, Lorenzini E, Bender C, Cichosz SL, Erdmann AL. Comparison of machine learning algorithms for predicting admission in intensive care units based on the precision nursing framework. J Med Artif Intell 2025;8:44.
