Role of artificial intelligence in healthcare settings: a systematic review
Highlight box
Key findings
• The future of clinical medicine can be revolutionized by the integration of artificial intelligence (AI).
What is known and what is new?
• Previous studies have pointed out the benefits of AI in various clinical and diagnostic procedures.
• However, the majority of these studies carried a risk of bias, largely due to the lack of randomized controlled trials. It was also observed that AI is mostly applied in developed countries.
What is the implication, and what should change now?
• Current evidence on the reliability of AI is questionable because few studies are free of bias. Future studies should focus on conducting randomized controlled trials to eliminate such risk of bias.
Introduction
It is necessary to develop new training models for healthcare workers to adapt to changing practice environments. One possibility is the use of tools with enhanced technological capabilities (1). Technology is often defined as objects and materials developed or modified to address real-world issues. Products that fall under the category of such technologies include computer-based virtual reality, artificial intelligence (AI), machine learning (ML), etc. (2,3).
AI and ML are emerging as revolutionary tools at a time of unprecedented advancement in technology (4). Training with the help of simulation technology is a fundamental component of the health profession (5). The potential benefits of using AI and ML in healthcare settings include increased training efficacy, optimized precision and accuracy outcomes, and ultimately, better patient care (6). To fully utilize AI and ML in healthcare settings, it is essential that we evaluate the present landscape, acknowledge the accomplishments, and pinpoint the areas that require substantial research.
It is difficult to overstate the importance of AI applications in healthcare. AI has tremendous potential to transform disease diagnosis, personalized medicine, real-time health monitoring, and operational healthcare management (7). AI-driven diagnostic technologies can effectively evaluate medical images and regularly pick up on details that the human eye could overlook (8). This accuracy leads to more rapid and precise diagnosis, which has a substantial effect on patient outcomes. Similarly, AI algorithms are making major advances toward personalized medicine through treatment individualization (9): they can search through enormous datasets to find patterns and identify which medicines will work best for certain patient profiles. AI is also used in patient monitoring, where wearable technology and remote monitoring systems provide ongoing oversight of patient health, allowing for prompt treatment and a decrease in hospital readmissions (10). Finally, AI can optimize hospital processes and simplify operations, including appointment scheduling, to improve efficiency and patient satisfaction in the delivery of healthcare (11).
The aim of this systematic review is to investigate the various applications of AI and the potential challenges they pose for the healthcare industry. This study offers a thorough overview of how AI technologies address healthcare concerns, improve patient care, and enhance healthcare outcomes. The review also critically evaluates the difficulties that arise when incorporating AI into healthcare procedures, ranging from ethical and technological issues to implementation and regulatory hurdles. The goal is to present an unbiased perspective that acknowledges the benefits of AI in healthcare while also addressing the complications associated with its implementation. We present this article in accordance with the PRISMA reporting checklist (available at https://jmai.amegroups.com/article/view/10.21037/jmai-24-294/rc) (12).
Methods
Studies selection
This systematic review aimed to summarize the role of AI in healthcare settings. The central question guiding this review was: does AI help to enhance the patient care provided in the healthcare industry? Formulated in line with the PICO framework, the breakdown is as follows: P (population) refers to adult healthcare workers of both genders who are involved in providing patient care; I (intervention) refers to AI technologies in healthcare; C (comparison) refers to the comparison of traditional methods with AI-driven methods; O (outcome) refers to training effectiveness, skill acquisition, patient care outcomes, decision-making ability, and patient safety, in all of which AI plays a significant role.
During identification of the articles, duplicates were removed by exporting the records to EndNote Basic (EndNote, 2015). The studies were then selected in two stages. In phase 1, Reviewer 1 independently evaluated titles and abstracts in duplicate to identify qualifying studies.
The following inclusion and exclusion criteria were used during article selection:
- Inclusion criteria:
- Randomized controlled trials, cohort studies, case-control studies, and quasi-experimental studies were included in this review.
- Language: only papers written in English.
- Studies focusing on the role of AI in healthcare settings.
- Studies that are peer-reviewed.
- Exclusion criteria:
- Case reports, systematic reviews, literature reviews, and editorial letters were excluded from this review.
- Studies focusing on the role of AI in non-healthcare populations.
- Studies not reporting the efficacy of AI in healthcare settings.
- Non-peer-reviewed literature.
Search strategy
A systematic search of the relevant literature was conducted in the following five databases to retrieve relevant studies:
- Scopus: (“artificial intelligence” OR “AI” OR “machine learning” OR “deep learning” OR “neural networks” OR “intelligent systems”) AND (“healthcare setting” OR “medical training” OR “clinical setting” OR “virtual reality” OR “augmented reality” OR “simulation training” OR “healthcare training”)
- Web of Science: TS=(“artificial intelligence” OR “AI” OR “machine learning” OR “deep learning” OR “neural networks” OR “intelligent systems”) AND TS=(“healthcare setting” OR “medical training” OR “clinical setting” OR “virtual reality” OR “augmented reality” OR “simulation training” OR “healthcare training”)
- PubMed/EMBASE: (“artificial intelligence” OR “AI” OR “machine learning” OR “deep learning” OR “neural networks” OR “intelligent systems”) AND (“healthcare setting” OR “medical training” OR “clinical setting” OR “virtual reality” OR “augmented reality” OR “simulation training” OR “healthcare training”)
- Google Scholar: “artificial intelligence” OR AI OR “machine learning” OR “deep learning” OR “neural networks” OR “intelligent systems” AND “healthcare setting” OR “medical training” OR “clinical setting” OR “virtual reality” OR “augmented reality” OR “simulation training” OR “healthcare training”
- Cochrane Library: “artificial intelligence” OR AI OR “machine learning” OR “deep learning” OR “neural networks” OR “intelligent systems” AND “healthcare setting” OR “medical training” OR “clinical setting” OR “virtual reality” OR “augmented reality” OR “simulation training” OR “healthcare training”
The databases were also searched for published or ongoing systematic reviews on the same topic. Relevant studies were retrieved and stored in EndNote to discard duplicate results.
Data collection
Data were extracted from the chosen articles independently by the same reviewer. For each included study, the title, authors, journal name, study duration, study design, country, age, gender, number of participants, location, and key findings were recorded.
Assessment of risk of bias
Data extraction was conducted using a standard form, and the full-text articles were assessed according to the Newcastle-Ottawa Scale (NOS) criteria. Publications were scored as low, medium, or high methodological quality based on several domains, including selection, performance, and reporting bias. Selection was scored based on the descriptions of the inclusion and randomization criteria. Performance bias was evaluated with regard to allocation concealment and the description of a control arm. Reporting bias was ranked separately, taking into account industrial sponsorship, incomplete data handling, and selective reporting. Eligibility restrictions and reporting consistency were discussed during several teleconferences, and discrepancies in the reviewers’ scores were resolved by a second author before a study was selected.
Results
Search results
A total of 3,047 studies were identified from 5 different databases: 794 from Scopus, 491 from Web of Science (WOS), 634 from PubMed Central, 733 from Google Scholar, and 395 from the Cochrane Library. All of these articles were sorted in EndNote, where 1,113 were removed as duplicates. The remaining 1,934 articles were screened by title and abstract, and 899 articles found to be irrelevant to this study were removed. The remaining 1,035 articles were checked for full-text availability in these databases; full text was not available for 203 articles. Eight hundred and thirty-two full-text articles were assessed for eligibility for inclusion. Sixty-three articles were not in the English language, 169 were found not to be based on AI in healthcare, and 354 were review articles, industrial reports, book chapters, or case reports. One hundred and ninety-five articles addressed AI enhancement but not specifically in healthcare. After the exclusion of those articles, 51 articles were found to be fully eligible for inclusion in this review (Figure 1).
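For transparency, the screening flow reported above can be checked arithmetically. The short Python sketch below is illustrative only (it was not part of the review workflow); the stage names are descriptive labels, and the counts are the ones quoted in this paragraph.

```python
# Minimal sketch tallying the screening flow reported in the text.
records_identified = {
    "Scopus": 794,
    "Web of Science": 491,
    "PubMed Central": 634,
    "Google Scholar": 733,
    "Cochrane Library": 395,
}
total_identified = sum(records_identified.values())                  # 3,047

duplicates_removed = 1113
screened = total_identified - duplicates_removed                      # 1,934
irrelevant_by_title_abstract = 899
full_text_sought = screened - irrelevant_by_title_abstract            # 1,035
full_text_unavailable = 203
assessed_for_eligibility = full_text_sought - full_text_unavailable   # 832

excluded = {
    "not in English": 63,
    "not AI in healthcare": 169,
    "reviews, reports, chapters, case reports": 354,
    "AI but not healthcare-specific": 195,
}
included = assessed_for_eligibility - sum(excluded.values())          # 51

print(f"Identified: {total_identified}, screened: {screened}, "
      f"assessed: {assessed_for_eligibility}, included: {included}")
```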
Risk of bias assessment
A risk of bias assessment was performed using the NOS. Of the 51 studies, 22 were found to have a low risk of bias, 27 a moderate risk of bias, and 2 studies (13,14) a high risk of bias. In some studies, part of the methodological weakness lay in how the controls were selected. Furthermore, no study disclosed the blinding of controls and patients with respect to exposure, which may have led to measurement bias (Table 1).
Table 1
Study | Selection | Comparability | Exposure | |||||||
---|---|---|---|---|---|---|---|---|---|---|
Is the case definition adequate? | Representativeness of cases | Selection of controls | Definition of controls | Comparability of cases and controls based on the design or analysis | Determination of exposure | The same calculation method for cases and controls | Non-response rate | |||
(15) | ★ | ★ | ★★ | ★ | ★ | |||||
(16) | ★ | ★ | ★ | ★ | ★ | |||||
(17) | ★ | ★ | ★ | ★ | ★ | |||||
(18) | ★ | ★ | ★ | ★★ | ★ | ★ | ★ | |||
(19) | ★ | ★ | ★★ | ★ | ★ | ★ | ||||
(20) | ★ | ★ | ★ | ★ | ★ | ★ | ||||
(21) | ★ | ★ | ★★ | ★ | ★ | |||||
(22) | ★ | ★ | ★ | ★★ | ★ | ★ | ★ | |||
(23) | ★ | ★ | ★ | ★ | ★ | ★ | ||||
(24) | ★ | ★ | ★★ | ★ | ★ | ★ | ||||
(13) | ★ | ★ | ★ | |||||||
(25) | ★ | ★ | ★ | ★ | ★ | |||||
(26) | ★ | ★ | ★ | ★★ | ★ | ★ | ★ | |||
(27) | ★ | ★ | ★ | ★ | ★ | ★ | ||||
(28) | ★ | ★ | ★ | ★ | ||||||
(29) | ★ | ★ | ★★ | ★ | ★ | ★ | ||||
(30) | ★ | ★ | ★ | ★ | ★ | ★ | ||||
(31) | ★ | ★ | ★ | ★ | ★ | ★ | ||||
(32) | ★ | ★ | ★ | ★★ | ★ | ★ | ★ | |||
(33) | ★ | ★ | ★★ | ★ | ★ | ★ | ||||
(34) | ★ | ★ | ★ | ★ | ★ | ★ | ★ | |||
(35) | ★ | ★ | ★★ | ★ | ★ | |||||
(36) | ★ | ★ | ★ | ★ | ★ | |||||
(37) | ★ | ★ | ★ | ★ | ★ | |||||
(38) | ★ | ★ | ★ | ★★ | ★ | ★ | ★ | |||
(39) | ★ | ★ | ★★ | ★ | ★ | ★ | ||||
(40) | ★ | ★ | ★ | ★ | ★ | ★ | ||||
(41) | ★ | ★ | ★★ | ★ | ★ | |||||
(42) | ★ | ★ | ★ | ★★ | ★ | ★ | ★ | |||
(43) | ★ | ★ | ★ | ★ | ★ | ★ | ||||
(44) | ★ | ★ | ★★ | ★ | ★ | ★ | ||||
(14) | ★ | ★ | ★ | |||||||
(45) | ★ | ★ | ★ | ★ | ★ | |||||
(46) | ★ | ★ | ★ | ★★ | ★ | ★ | ★ | |||
(47) | ★ | ★ | ★ | ★ | ★ | ★ | ||||
(48) | ★ | ★ | ★ | ★ | ||||||
(49) | ★ | ★ | ★★ | ★ | ★ | ★ | ||||
(50) | ★ | ★ | ★ | ★ | ★ | ★ | ||||
(51) | ★ | ★ | ★ | ★ | ★ | ★ | ||||
(52) | ★ | ★ | ★ | ★★ | ★ | ★ | ★ | |||
(53) | ★ | ★ | ★★ | ★ | ★ | ★ | ||||
(54) | ★ | ★ | ★ | ★ | ★ | ★ | ★ | |||
(55) | ★ | ★ | ★ | ★ | ||||||
(56) | ★ | ★ | ★★ | ★ | ★ | ★ | ||||
(57) | ★ | ★ | ★ | ★ | ★ | ★ | ||||
(58) | ★ | ★ | ★ | ★ | ★ | ★ | ||||
(59) | ★ | ★ | ★ | ★★ | ★ | ★ | ★ | |||
(60) | ★ | ★ | ★★ | ★ | ★ | ★ | ||||
(61) | ★ | ★ | ★ | ★ | ★ | ★ | ★ | |||
(62) | ★ | ★ | ★ | ★ | ||||||
(63) | ★ | ★ | ★ | ★ | ★ | ★ |
A study can receive a maximum of one star (★) for each numbered item in the selection and exposure categories. A maximum of two stars (★★) can be given for comparability. Rating scale: 7 to 9 stars = low risk of bias; 4 to 6 stars = moderate risk of bias; 0 to 3 stars = high risk of bias.
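As an illustration of this rating scale, the sketch below converts an NOS star total into the three risk-of-bias categories used in this review; the study names and star totals in the example are hypothetical, not drawn from Table 1.

```python
# Minimal sketch of the NOS rating scale described above (illustrative only).
def nos_risk_of_bias(stars: int) -> str:
    """Classify a Newcastle-Ottawa Scale total (0-9 stars) into a risk level."""
    if not 0 <= stars <= 9:
        raise ValueError("NOS totals range from 0 to 9 stars")
    if stars >= 7:
        return "low risk of bias"
    if stars >= 4:
        return "moderate risk of bias"
    return "high risk of bias"

# Hypothetical per-study totals (selection + comparability + exposure stars).
example_totals = {"study A": 8, "study B": 5, "study C": 3}
for study, stars in example_totals.items():
    print(study, stars, nos_risk_of_bias(stars))
```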
GRADEpro GDT indicated that the studies included in this systematic review provided evidence of moderate quality. The inclusion of cohort studies, which increases the risk of bias because exposure cannot be randomized, and the inconsistency across the research were the main reasons the quality of the evidence was downgraded.
Characteristics of included studies
In terms of research design, the 51 studies comprised 1 nonrandomized trial, 1 structured interview, 8 experimental studies, 4 comparative cross-sectional studies, 4 surveys, 8 retrospective studies, 14 randomized controlled trials, and 17 prospective studies. It is important to note that observational studies can be divided into prospective and retrospective studies depending on when the data were collected. In prospective studies, the study design and data-gathering methods are planned before individuals develop the condition or experience other outcomes of interest. Retrospective studies gather data on individuals who are currently or were previously involved in the study; subjects may have developed the condition or experienced other effects of interest before the research design and data-gathering process began (Table 2).
Table 2
Citation | Country | Study design | Sample size | AI applications | AI techniques | Clinical tasks | Implementation of AI systems |
---|---|---|---|---|---|---|---|
(15) | US | Prospective cross-sectional | 819 | IDx-DR diagnostic system (IDx LLC, Coralville, IA) for automatic detection of diabetic retinopathy and diabetic macular edema based on retinal images | ML (CNN) | Disease diagnosis | Conduct a trial of IDx-DR diagnostic system to detect diabetic retinopathy. Length: 7 months |
(16) | Japan | Retrospective cross-sectional | 6 | A deep learning-based system that automatically detects mucosal breaks based on small-bowel capsule endoscopy images | ML (CNN) | Disease screening | Endoscopist readings after the first screening by the AI system for mucosal break detection Task: 20 videos of small-bowel capsule endoscopy procedure |
(17) | US | Prospective cross-sectional | 347 | A deep learning algorithm for intracranial hemorrhage diagnosis based on computed tomography scans of the head | ML (CNN) | Disease diagnosis | Re-prioritize head CT studies by implementing a DL model for ICH diagnosis. Length: 3 months |
(18) | US | Randomized control trials | 20,031 | A real-time algorithm for clinical deterioration for patients in general medical wards based on vital sign measures | ML (DT, LR) | Risk analysis | Patients receive real-time alerts from an AI clinical deterioration prediction tool. Length: 4 years and 5 months
(19) | Not reported | Experiment within subjects | 3 | An AI-based decision support system for breast cancer diagnosis based on ultrasound images | ML (NR) | Disease diagnosis | Sequential workflow. Task: 500 lesion cases |
(20) | Canada | Prospective cross-sectional | 350 | A clinical decision support system for antimicrobial stewardship based on patient clinical data | ML (rule induction, instance-based learning) | Treatment | Implement a system (baseline system with learning module) for antimicrobial stewardship. Length: 2 months |
(21) | US | Non-randomized control trials | 9 | MRNet, a diagnostic algorithm based on knee magnetic resonance imaging | ML (CNN) | Disease diagnosis | Clinical experts assisted by MRNet. Task: 60 exams |
(22) | US | Randomized control trials | 20 | MySurgeryRisk, an algorithm for automated preoperative risk assessment based on the EHR data | ML (RF) | Risk analysis | Physicians with assistance from MySurgeryRisk. Task: 150 patient cases |
(23) | China | Comparative study | 437 | ENDOANGEL (previous WISENSE AI system), a real-time AI system that monitors blind spots during EGD | ML (CNN, DRL) | Disease screening | Patients undergo EGD assisted by ENDOANGEL. Task: 217 patients (lost to follow up not reported) |
(24) | UK | Survey | 2,642 | Streams (DeepMind Technologies Ltd., London, UK), a digitally-enabled care pathway for acute kidney injury management incorporated into a mobile application | ML (NR) | Risk analysis | Implementation of Streams (digitally enabled care pathway) in one hospital
(13) | Kenya | Retrospective cross-sectional | 6 | Parasight Platform (SightDX, Tel Aviv), an automated system that provides malaria diagnosis, species identification, and parasite quantification based on blood samples | ML (SVM) | Disease diagnosis | Microscopists use Parasight Platform for malaria diagnosis Task: 205 samples in India (IN) 263 samples in Kenya (KE) |
(25) | US | Retrospective cross-sectional | 22,280 | Early Warning System 2.0, an ML-based tool that provides severe sepsis and sepsis shock prediction based on the clinical data from EHR | ML (RF) | Risk analysis | Implement Early Warning System 2.0 to predict sepsis. Silent period: 6 months; alert period: 8 months |
(26) | US | Prospective cross-sectional | 87 | An early warning system that provides severe sepsis and sepsis shock prediction based on the clinical data from EHR | ML (NR) | Risk analysis | Implement an early warning system for severe sepsis prediction. Length: 6 weeks
(27) | Spain | Survey | 2,569 | InNoCBR, a case-based reasoning system for automatic healthcare- associated infection surveillance and diagnosis | ML (NB, PART, DT) | Risk analysis | Deploy InNoCBR in a public hospital. Length: 10 months |
(28) | Germany | Experiment within subjects | 18 | e-ASPECT (Brainomix Limited, Summertown, Oxford, UK), a tool for early detection of ischemic damage based on CT scans | ML (RF) | Triage; treatment | Implement e-ASPECT into an ambulance |
(29) | Australia | Prospective cross-sectional | 197 | An AI-based grading system for diabetic retinopathy based on retinal images | ML (CNN) | Disease screening | Deploy an AI-based system for diabetic retinopathy in clinical practice. Length: 6 months
(30) | Australia | Randomized control trials | 96 | A deep learning algorithm for referrable diabetic retinopathy screening based on retinal images | ML (CNN) | Disease screening | Deploy a DL learning algorithm for referrable diabetic retinopathy. Length: 4 months |
(31) | UK | Experiment within subjects | 11 | A DL-based system for histopathologic classification of liver cancer (i.e., hepatocellular carcinoma, cholangiocarcinoma) based on whole-slide images | ML (CNN) | Disease diagnosis | Pathologists assisted by a DL-based system for liver cancer classification. Task: 80 samples |
(32) | UK | Randomized control trials | 2 | An AI-based system that predicts the long-term risk of diabetes-related complications based on individual patient profiles | ML (cox model, AFT model, RSF, SCVR) | Risk analysis | An AI-based system that predicts the long-term risk of diabetes-related complications |
(33) | China | Experiment within subjects | 350 | CC-Cruiser platform, an AI-based online platform for congenital cataract based on eye images | ML (CNN) | Disease diagnosis | Patients received diagnosis from CC-Cruiser platform. Length: about 9 months |
(34) | US | Prospective cross-sectional | 40 | A deep learning algorithm to detect and localize fractures in radiographs | ML (CNN) | Disease diagnosis | Clinicians assisted by a DL-based model to detect fractures in wrist radiographs. Task: 300 radiographs |
(35) | China | Comparative cross-sectional | 1,026 | An algorithm for polyp and adenoma detection in colonoscopy (Henan Xuanweitang Medical Information Technology Co., Ltd., Zhengzhou, Henan, China) | ML (CNN) | Disease screening | Patients undergo colonoscopy with assistance from a CADe system. Length: 6 months |
(36) | US | Prospective cross-sectional | 15 | Koios DS for Breast, an AI-based system that provides decision support for breast ultrasound lesion assessment (Koios Medical, NY, US) | ML (NR) | Disease diagnosis | Physicians assisted by Koios DS for Breast to make breast ultrasound lesion assessment. Task: 900 breast lesions |
(37) | US | Prospective cross-sectional | 214 | Patient Journey Record system, a system that detects adverse changes in patient biopsychosocial trajectories | ML (rule-based algorithms) | Risk analysis | Patients assigned to intervention group (implementation of a complex adaptative chronic care system). Length: 12 months |
(38) | US | Retrospective cross-sectional | 1,328 | InSight (Dascena, Hayward, California), a sepsis prediction algorithm based on EHR data | ML (NR) | Risk analysis | Implementation of InSight to predict sepsis |
(39) | US | Prospective cross-sectional | 3 | Watson for Oncology with Cota RWE platform (IBM Watson Health, Cambridge, MA), a clinical decision support system that provides treatment recommendations for patients with cancer | ML (NR) | Treatment | Clinicians with assistance from IBM Watson for Oncology with Cota RWE platform. Task: 223 patient cases |
(40) | Japan | Randomized control trials | 814 | An AI-based CAD system in assessing diminutive polyps during colonoscopy | ML (SVM) | Disease screening | Real-time use of CAD during colonoscopy. Length: 7 months |
(41) | UK | Experiment within subjects | 1 | e-Stroke Suite (Brainomix Limited, Summertown, Oxford, UK) | ML (DL) | Treatment | Implement e-Stroke Suite to improve mechanical thrombectomy referral pathway |
(42) | India | Comparative cross-sectional | 213 | Medios AI (Remidio Innovative Solutions Pvt Ltd., Karnataka, India), a mobile-based automated system for diabetic retinopathy screening based on retinal images | ML (CNN) | Disease screening | Analyze the retinal images with Medios AI. Length: 2 months |
(43) | Canada | Randomized control trials | 41 | An ML-based prostate implant planning algorithm | ML (kNN) | Treatment | Patients receive treatment planning from a ML-based prostate implant planning system. Length: 7 months |
(44) | US | Prospective cross-sectional | 8 | HeadXNet Model, a segmentation model that predicts intracranial aneurysms based on head computed tomographic angiography imaging | ML (CNN) | Disease diagnosis | Clinicians augmented with HeadXNet Model to predict intracranial aneurysms. Tasks: 115 examinations |
(14) | US | Prospective cross-sectional | 81 | A commercial AI-based clinical decision support tool to improve glycemic control in patients with diabetes | NR | Risk analysis | Implement an AI tool to improve glycemic control in patients with diabetes |
(45) | UK | Prospective cross-sectional | 408 | Technology integrated health management, an IoT system for dementia care | ML (Markov chain model) | Risk analysis | Implement TIHM system for dementia care. Length: 9 months |
(46) | US | Randomized control trials | 16 | SimulConsult diagnostic decision support system (SimulConsult, Chestnut Hill, MA), a system that informs diagnostic assessments based on the temporal pattern of clinical findings in each disease | ML (Bayesian pattern matching) | Disease diagnosis | Neurologists with assistance from SimulConsult diagnostic decision support system. Task: 40 patient vignettes |
(47) | US | Prospective cross-sectional | 26 | SimulConsult diagnostic decision support system | ML (Bayesian pattern matching) | Disease diagnosis | Neurologists with assistance from SimulConsult diagnostic decision support system. Task: 8 patient vignettes |
(48) | US | Structured interview | 10 | SimulConsult diagnostic decision support system | ML (Bayesian pattern matching) | Disease diagnosis | Implementation of SimulConsult diagnostic decision support system in clinical use |
(49) | Israel | Prospective cross-sectional | 3,160 | MedAware, an ML-based system for identification and prevention of prescription error | ML (NR) | Treatment | Integration of MedAware (prescription error identification and prevention system) into clinical practice. Length: 1 year and 11 months |
(50) | US | Randomized control trials | 142 | A commercial sepsis prediction algorithm (Dascena, Hayward, California) | ML (NR) | Risk analysis | Patients receive AI-generated sepsis prediction. Length: 3 months |
(51) | US | Prospective cross-sectional | 12 | Samsung Auto Lung Nodule (ALAND) (Samsung Electronics, Suwon, South Korea), an ML-based software that detects malignant lung nodules on chest radiographs | ML (CNN) | Disease screening | Radiologists assisted by ALAND to detect malignant lung nodules on chest radiographs. Task: 800 radiographs |
(52) | Not reported | Experimental within the subjects | 6 | A deep learning algorithm that reviews lymph nodes for metastatic breast cancer diagnosis | ML (CNN) | Disease screening | Pathologists assisted by a DL model when reviewing lymph nodes for metastatic breast cancer. Task: 70 digitized slides from lymph node sections |
(53) | China | Randomized control trials | 629 | An automated system for polyp and adenoma detection in colonoscopy (Qingdao Medicon Digital Engineering Co Ltd., Shandong, China) | ML (CNN) | Disease screening | Patients undergo colonoscopy with the assistance from an AI system. Length: 8 months |
(54) | Not reported | Randomized control trials | 2 | An algorithm for automated surveillance and triage of cranial images | ML (CNN) | Disease diagnosis; triage | Implement an AI system to triage cranial images. Task: 180 images
(55) | Not reported | Prospective cross-sectional study | 3 | An algorithm that recognizes cancer cell types and predicts HER2 status in breast cancer based on tumor resection samples (AstraZeneca, Cambridge, UK) | ML (CNN) | Disease diagnosis | Use the DL-based algorithm to recognize cancer cell types and diagnose breast cancer Task: 71 breast tumor resection samples |
(56) | US | Comparative cross-sectional | Not reported | A procalcitonin-guided decision algorithm for antibiotic stewardship | ML (NR) | Treatment | Implement an antibiotic stewardship algorithm for hospitalized sepsis and LRTI patients. Length: two 4-year periods (2006–2009, 2010–2014) |
(57) | China | Randomized control trials | 1,066 | A real-time automatic polyp and adenoma detection system in colonoscopy (Shanghai Wision AI, Shanghai, China) | ML (CNN) | Disease screening | Patients undergo colonoscopy with assistance from a polyp and adenoma detection system. Length: 6 months |
(58) | US | Randomized control trials | 75 | An algorithm that identifies patients with atrial fibrillation and recommends anticoagulation usage based on EHR data | ML (NR) | Risk analysis | Implement a system that identifies under-use of anticoagulation in 14 clinics (one clinic entered every 28 days). Length: 14 months |
(59) | China | Randomized control trials | 962 | EndoScreener, a polyp and adenoma detection system in colonoscopy (Shanghai Wision AI, Shanghai, China) | ML (deep learning) | Disease screening | Patients undergo colonoscopy with assistance from EndoScreener. Length: 5 months |
(60) | Netherlands | Randomized control trials | 68 | An early warning system for intraoperative hypotension based on hemodynamic variables (Edwards Lifesciences, Irvine, California) | ML | Risk analysis | Patients receive the early warning system for intraoperative hypotension. Length: 11 months
(61) | China | Prospective cross-sectional study | 3,600 | Cataract AI agent (Beijing Tulip Partners Technology, Beijing, China), a platform for collaborative management of cataracts based on retinal images | ML (CNN) | Disease screening | Implement the cataract AI referral platform for collaborative management of cataracts. Length: 6 months
(62) | China | Randomized control trials | 309 | WISENSE, a real-time AI system that monitors blind spots during EGD and facilitates the diagnosis of upper gastrointestinal lesions | ML (CNN, DRL) | Disease screening | Patients undergo EGD with assistance of WISENSE system. Length: 3 months |
(63) | Korea | Prospective cross-sectional study | 51 | A real-time computer-aided system for thyroid nodule diagnosis based on ultrasonography images (S-Detect for Thyroid; Samsung Medison Co., Ltd.) | NR | Disease diagnosis | A radiologist assisted by a system for thyroid nodule diagnosis. Task: 50 patients undergo ultrasonography |
AI, artificial intelligence; IDx-DR, retinal diabetic retinopathy screening system; ML, machine learning; CNN, convolutional neural network; CT, computed tomography; ICH, intracranial hemorrhage; DL, deep learning; DT, decision tree; LR, logistic regression; NR, not reported; RF, random forest; EHR, electronic health records; DRL, deep reinforcement learning; EGD, esophagogastroduodenoscopy; SVM, support vector machine; NB, Naive Bayes; PART, prognostic model based on decision trees; e-ASPECT, Electronic Alberta Stroke Program Early CT; AFT, accelerated failure time; RSF, random survival forests; SCVR, Synchronized Coronary Venous Retroperfusion; CAD, computer-assisted diagnosis; kNN, k-nearest neighbors; TIHM, technology integrated health management; HER2, human epidermal growth factor receptor 2; LRTI, lower respiratory tract infections.
Two of the 51 studies had an average sample size of fewer than 30 participants, and 29 (57%) of the studies explicitly described the patients involved. Information regarding the participating healthcare providers was reported by 28 (55%) studies, of which 17 included 10 or fewer providers.
Of the 51 studies, 36 were carried out in developed nations: 20 in the US, 5 in the UK, 1 in Australia, 1 each in Canada and Japan, 1 in Germany, 1 in Israel, 1 in Spain, 1 in the Netherlands, and 1 in South Korea. Conversely, ten studies were carried out in developing countries: eight in China, one in India, and one in India and Kenya.
Two of the 51 studies provided no details on the AI methods they employed. Among the remaining 49 studies, neural networks (n=22) emerged as the most widely used ML approach, followed by random forest models (n=3), Bayesian pattern recognition (n=3), support vector machines (n=2), decision tree models (n=2), and deep reinforcement learning (n=2). Additionally, we observed that the majority of the included AI systems offered decision assistance for four types of clinical tasks: risk analysis (n=14), treatment (n=7), disease diagnosis (n=16), and disease screening or triage (n=16). Furthermore, 46 (94%) of the studies used AI technologies to target specific diseases and disorders. Sepsis (n=6), breast cancer (n=5), diabetic retinopathy (n=4), polyps and adenomas (n=4), cataracts (n=2), and stroke (n=2) were the most commonly targeted conditions.
A total of 26 studies assessed how effectively AI applications performed in actual healthcare scenarios. The area under the curve, sensitivity, specificity, accuracy, positive predictive value, and negative predictive value were among the frequently used performance indicators. Of these, 24 studies found that AI applications performed satisfactorily and with acceptable quality in real-world settings. One study (15), for instance, used the IDx-DR diagnostic system in a pivotal trial to identify diabetic retinopathy at ten US primary care clinic locations. The study reported that IDx-DR exceeded the predetermined criteria, with a sensitivity of 87.2%, a specificity of 90.7%, and an imageability rate of 96.1%. Based on these findings, IDx-DR was approved by the FDA as the first autonomous AI diagnostic system, and it has the potential to help several thousand diabetic patients avoid visual loss by enhancing the early diagnosis of diabetic retinopathy.
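For readers less familiar with these performance indicators, the sketch below shows how they are computed from a 2×2 confusion matrix. The true-positive and false-positive counts echo the false-positive pattern discussed in the next paragraph (2 correct referrals out of 17 flagged cases, PPV of about 12%); the true-negative and false-negative counts are hypothetical, since they are not reported in the included studies.

```python
# Minimal sketch of common diagnostic performance metrics (illustrative only).
def diagnostic_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Return screening/diagnostic performance metrics from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),            # true-positive rate
        "specificity": tn / (tn + fp),            # true-negative rate
        "ppv": tp / (tp + fp),                    # positive predictive value
        "npv": tn / (tn + fn),                    # negative predictive value
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }

# 2 true positives and 15 false positives as in the example below;
# tn and fn are hypothetical placeholders.
metrics = diagnostic_metrics(tp=2, fp=15, tn=170, fn=6)
for name, value in metrics.items():
    print(f"{name}: {value:.1%}")
```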
Conversely, two studies showed that AI applications required further development because they were unable to outperform healthcare practitioners (29,33). One study found that, in detecting congenital cataracts in children and choosing a course of treatment, CC-Cruiser achieved a much lower positive predictive value (PPV) and negative predictive value (NPV) than senior doctors (33). The effectiveness of an AI-based diabetic retinopathy grading system at an Australian primary care clinic was assessed in another study (29), which found that the system had a high false-positive rate (PPV of 12%). Specifically, of the 193 patients who consented to the research, the AI system identified 17 individuals with severe diabetic retinopathy who needed referral; only two of these were correctly diagnosed, and the other 15 were false positives.
AI can also assist decision-making in the healthcare field. Sixteen of the identified papers described how AI applications might enhance healthcare decision-making. For instance, Brennan et al. (22) reported that, having worked with MySurgeryRisk, an algorithm used for preoperative risk assessment, physicians gained new knowledge and improved their performance with respect to risk assessment. However, there was no sign of enhanced decision-making in two of the investigated cases (16,31). A probable reason is that AI can provide misleading recommendations and thus cancel out the positive impact. For instance, Kiani et al. (31) investigated the effect of a deep learning-based system for the classification of liver cancer on the diagnostic performance of 11 pathologists; they found that integrating AI did not enhance diagnostic proficiency. They also noted that accuracy was high when the AI provided a correct prediction but dropped when the prediction was wrong. In their study, Aoki et al. (16) assessed endoscopists’ readings of small-bowel capsule endoscopy using a deep learning-based approach for mucosal break detection. The authors observed that the method did not improve endoscopists’ detection performance, particularly among trainees.
Seven studies focused on clinician efficiency and the way clinicians worked. Of these, six concluded that integrating AI into the existing process improved the status quo by expediting clinical processes that previously required extra time. For example, Titano et al. (54) found that a deep learning-based intracranial image triage system improved case triage within the radiology workflow, processing and interpreting images 150 times faster than human radiologists and correctly flagging certain urgent cases. The study by Wu et al. was the only one to evaluate WISENSE, a quality improvement system for monitoring blind spots and procedure timing during esophagogastroduodenoscopy; it found that WISENSE helped endoscopists track the time spent on each step, which in turn led to improved inspection time (62).
Finally, seven studies looked at how clinicians perceived and accepted AI applications. General opinions on AI applications were favorable in five of the seven studies. For instance, in a simulated clinical workflow, Brennan et al. (22) asked 20 surgical intensivists to use and assess MySurgeryRisk for preoperative risk prediction. The majority of respondents said that MySurgeryRisk was practical, simple to use, and would be beneficial when making decisions. However, opinions of AI were mixed or even unfavorable in the other two studies. In particular, Ginestra et al. (26), evaluating physician assessments of an ML-based sepsis prediction system in a secondary teaching hospital, reported that just 16% of medical professionals thought the system-generated sepsis warnings were beneficial. Low algorithm transparency, a lack of defined actions following warnings, and providers’ lack of faith in the alerts might all be reasons for the unfavorable assessments. Romero-Brufau et al. (14) reported survey findings after an AI-based clinical decision support system was used in a regional healthcare practice: only 14% of clinical staff members were willing to endorse the system, and based on staff feedback, some system-recommended treatments were found to be insufficient or unsuitable.
Patient outcomes were reported in 14 studies. The impact of AI on clinical procedures and outcomes, including hospital length of stay, in-hospital mortality, intensive care unit (ICU) transfer, hospital readmission, and time to intervention, was investigated in 11 of the 14 studies. The results were variable, although the majority of studies noted better clinical outcomes. An ML-based severe sepsis prediction system, for instance, was installed and evaluated in two ICUs at the University of California San Francisco Medical Center as part of a randomized controlled trial (RCT) (50). The investigators found that applying the algorithm significantly reduced both the in-hospital death rate, from 21.3% to 8.96%, and the hospital length of stay, from 13.0 to 10.3 days. Nevertheless, three of the investigations found no indication of better clinical outcomes, suggesting that the current iteration of those algorithms has limited utility. Bailey et al. (18) studied the effects of a real-time ML system that generated warnings of clinical deterioration in hospitalized patients and found that warnings alone were not effective in decreasing in-hospital mortality or length of stay. In a study exploring a novel digitally enabled care pathway for managing acute kidney injury (24), the authors observed no changes in the rate of renal recovery or any other secondary clinical outcome after the intervention. Similarly, Giannini et al. (25) developed and rapidly implemented a sepsis prediction algorithm in an academic healthcare system; the algorithm’s alerts did not reduce mortality, change discharge dispositions, or reduce ICU transfers, and affected only a limited set of clinical processes, indicating that further optimization of the algorithm is needed.
Three studies that looked at how patients assessed AI applications produced favorable findings. In an outpatient endocrinology setting, Keel et al. (30) assessed patient acceptance of an AI-based diabetic retinopathy screening tool. They found that 78% of patients in the follow-up survey said they preferred AI screening over conventional screening, indicating that patients generally accepted the AI tool, and 96% of the screened patients expressed satisfaction with it. When Lin et al. (33) evaluated patient satisfaction with CC-Cruiser for pediatric cataracts, they found that patients were somewhat more satisfied with the tool than with senior consultants. One rationale is that pediatric cataracts may result in irreparable visual damage or even blindness if early management is delayed.
Discussion
The potential benefits of AI applications are enormous: they can enhance clinician decision-making, enhance healthcare processes and outcomes for patients, and lower healthcare costs. The purpose of this review was to identify and gather studies on AI applications that have been used in actual clinical settings.
Given the huge quantity of research on healthcare AI, the total number of included studies was rather low. Specifically, the majority of research on AI in healthcare is proof-of-concept work that concentrates on developing and validating AI algorithms using clinical data sets, whereas only a few studies deployed AI in clinical settings and assessed its effectiveness. An AI application must, however, offer strong evidence of its superiority over the conventional standard of care to guarantee safe adoption. Consequently, to showcase the potential of AI in actual clinical settings, we encourage healthcare AI researchers to collaborate closely with healthcare professionals and organizations.
The fact that developed nations accounted for over two-thirds of the included articles, of which over half were from the US, suggests that these nations are at the leading edge of healthcare AI development and application. This is partly because the leading health AI startups and firms are mostly based in Europe and the US. Although our search yielded 890 publications in other languages, we omitted non-English articles, so this result should be interpreted cautiously. Because of the complexity and variety of translations, we were unable to analyze these articles without bias and therefore excluded them. The uneven distribution of publications by nationality or level of economic development may also be explained by the very low publishing rate of scholars from low-income nations.
It is notable, nevertheless, that 8 of the included studies were conducted in China, indicating that the country has been applying and studying healthcare AI extensively. In fact, clinical use of AI has been organized by hospitals, technology firms, and the Chinese government in an effort to reduce disparities in medical resources, lower healthcare costs, and address the shortage of doctors (64,65). Additionally, Chinese researchers have increasingly been publishing their work in international English-language journals.
Future research on healthcare AI assessment needs to be of greater quality. Only 13 of the included studies were randomized controlled trials, and the majority of them had a moderate risk of bias. Eight of the studies were experimental and, because they all used crossover or within-subject designs, they were vulnerable to confounding variables. Only 8 (16%) studies reported sample data on both patients and healthcare providers, while 14 (28%) had a sample size of fewer than 20, which limits the generalizability of the findings.
It was more difficult to determine the added value of AI applications compared with current best practice because the majority of the studies lacked a comparison group. The design, reliability, precision, and potential hazards of an AI system should be transparently described in research, as healthcare practitioners may hold different opinions of AI systems with differing levels of performance and dependability. However, of the manuscripts we reviewed, 21 did not include enough detail on the applications’ design, and 22 did not disclose the effectiveness or possible risks of the AI in question. Furthermore, it is crucial to periodically check and recalibrate AI programs to make sure they are operating as intended, because certain self-evolving, adaptive clinical AI systems regularly incorporate the most recent clinical practice information and published research. Finally, we found that the majority of the studies (n=29, 57%) examined only a single aspect of the assessment outcome. Future studies are urged to undertake a more thorough evaluation of the effectiveness of clinical AI applications and their effects on medical professionals, patients, and institutions. This will make it easier to evaluate and choose among several AI solutions in the same therapeutic area.
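As a concrete illustration of the periodic checking and recalibration recommended above, the sketch below flags a deployed model for review when its recent performance drifts below a deployment baseline. The thresholds, metric values, and drift rule are hypothetical and only indicate the kind of monitoring that could be applied.

```python
# Minimal sketch of periodic performance monitoring (hypothetical thresholds and data).
from statistics import mean

def needs_recalibration(recent_scores: list[float],
                        baseline: float,
                        tolerance: float = 0.05) -> bool:
    """Flag a model for review if its recent average score falls more than
    `tolerance` below the baseline established at deployment."""
    return mean(recent_scores) < baseline - tolerance

# Hypothetical monthly AUC values for a deployed clinical model.
baseline_auc = 0.90
monthly_auc = [0.88, 0.86, 0.83, 0.81]

if needs_recalibration(monthly_auc, baseline_auc):
    print("Performance drift detected: schedule recalibration and review.")
else:
    print("Model performing within the expected range.")
```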
Our review demonstrated that, in some situations, AI applications may offer useful decision support. The level of clinical expertise, for example, can influence how much AI enhances medical decision-making. Specifically, two studies revealed that junior doctors were more likely than senior doctors to benefit from AI because they are more inclined to reevaluate and adjust their clinical judgments in response to contradictory AI recommendations (27,36). It is important to remember, though, that AI can occasionally be misleading. One study in our review, for instance, hypothesized that, owing to their limited reading experience, trainee endoscopists may be confused by false-positive findings from an AI diagnostic tool and, as a result, overlook AI-marked small-bowel mucosal breaks (16). Therefore, future studies must examine the situations in which medical professionals may gain the most from applications of AI. We are confident, however, that AI technology will eventually be incorporated into standard clinical care once it is accurate and mature enough to be considered an evidence-based standard of care.
Regarding AI adoption, we found that in two studies (14,26), healthcare practitioners held negative opinions about AI, suggesting that there were difficulties in integrating AI into the routine workflow. However, a detailed discussion of the challenges to AI implementation is outside the scope of this study.
We found that the majority of the studies focusing on patient outcomes did not thoroughly investigate the accompanying clinical procedures and treatments. Yet patient outcomes may not improve with AI applications in the absence of relevant and helpful therapies. For instance, Bailey et al. (18) showed that simply informing nursing personnel of the risks associated with clinical deterioration was not enough to improve outcomes for higher-risk patients; effective therapies tailored to each patient are also required. Therefore, in order to improve the therapeutic efficacy of AI applications, future studies may develop and assess patient-directed treatments.
Furthermore, three of the included studies showed that the simplicity and efficacy of healthcare AI had resulted in high levels of satisfaction among patients and their families. That may not always be the case, however. Previous studies have demonstrated that people favored receiving basic medical treatment from a human physician even when this meant a greater chance of misdiagnosis (66), because they believed AI was unable to take their particular circumstances into account. Patients may also think less of doctors who use clinical decision support systems and rate them lower than their unassisted colleagues (67). Further research should examine potential patient concerns about, and resistance to, health applications of AI.
Despite the high upfront costs of implementing AI, more than 50 percent of healthcare organizations believe that AI will reduce costs and increase revenue, according to an Accenture survey. However, only one of the included studies provided information on the financial effects of implementing AI (68). This underscores the importance of carrying out more cost-effectiveness studies of AI applications in medical settings.
Conclusions
Applications based on AI have enormous potential to enhance healthcare processes and outcomes for patients. According to the literature reviewed in this study, there is substantial interest in developing AI tools to assist clinical processes, and more high-quality evidence is being generated. At present, however, there is not enough evidence to justify the routine use of AI for decision support in medical settings, which is hindering the field’s advancement and raising concerns about patient safety. Therefore, we conclude that it is essential to carry out robust randomized controlled trials comparing the outcomes and processes of AI-assisted care with those of current standard care.
Acknowledgments
Funding: None.
Footnote
Reporting Checklist: The authors have completed the PRISMA reporting checklist. Available at https://jmai.amegroups.com/article/view/10.21037/jmai-24-294/rc
Peer Review File: Available at https://jmai.amegroups.com/article/view/10.21037/jmai-24-294/prf
Conflicts of Interest: Both authors have completed the ICMJE uniform disclosure form (available at https://jmai.amegroups.com/article/view/10.21037/jmai-24-294/coif). The authors have no conflicts of interest to declare.
Ethical Statement: The authors are accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved.
Open Access Statement: This is an Open Access article distributed in accordance with the Creative Commons Attribution-NonCommercial-NoDerivs 4.0 International License (CC BY-NC-ND 4.0), which permits the non-commercial replication and distribution of the article with the strict proviso that no changes or edits are made and the original work is properly cited (including links to both the formal publication through the relevant DOI and the license). See: https://creativecommons.org/licenses/by-nc-nd/4.0/.
References
- Secinaro S, Calandra D, Secinaro A, et al. The role of artificial intelligence in healthcare: a structured literature review. BMC Med Inform Decis Mak 2021;21:125. [Crossref] [PubMed]
- Naik N, Hameed BMZ, Shetty DK, et al. Legal and Ethical Consideration in Artificial Intelligence in Healthcare: Who Takes Responsibility? Front Surg 2022;9:862322. [Crossref] [PubMed]
- Rajpurkar P, Chen E, Banerjee O, et al. AI in health and medicine. Nat Med 2022;28:31-8. [Crossref] [PubMed]
- Akpan DM. Artificial Intelligence and Machine Learning. In: Future-Proof Accounting: Data and Technology Strategies. Bingley, UK: Emerald Publishing Limited; 2024:49-64.
- INACSL Standards Committee. Healthcare simulation standards of best practiceTM simulation design. Clinical Simulation in Nursing 2021;58:14-21. [Crossref]
- INACSL Standards Committee. Healthcare simulation standards of best practiceTM The debriefing process. Clinical Simulation in Nursing 2021;58:27-32. [Crossref]
- Lamé G, Dixon-Woods M. Using clinical simulation to study how to improve quality and safety in healthcare. BMJ Simul Technol Enhanc Learn 2020;6:87-94. [Crossref] [PubMed]
- Cau R, Faa G, Nardi V, et al. Long-COVID diagnosis: From diagnostic to advanced AI-driven models. Eur J Radiol 2022;148:110164. [Crossref] [PubMed]
- Raparthi M. AI-Driven Decision Support Systems for Precision Medicine: Examining the Development and Implementation of AI-Driven Decision Support Systems in Precision Medicine. Journal of Artificial Intelligence Research 2021;1:11-20.
- Lyon JY, Bogodistov Y, Moormann J. AI-driven optimization in healthcare: the diagnostic process. European Journal of Management Issues 2021;29:218-31. [Crossref]
- Comito C, Falcone D, Forestiero A. AI-driven clinical decision support: enhancing disease diagnosis exploiting patients similarity. IEEE Access 2022;10:6878-88.
- Moher D, Liberati A, Tetzlaff J, et al. Preferred reporting items for systematic reviews and meta-analyses: the PRISMA statement. Ann Intern Med 2009;151:264-9, W64.
- Eshel Y, Houri-Yafin A, Benkuzari H, et al. Evaluation of the Parasight Platform for Malaria Diagnosis. J Clin Microbiol 2017;55:768-75. [Crossref] [PubMed]
- Romero-Brufau S, Wyatt KD, Boyum P, et al. A lesson in implementation: A pre-post study of providers' experience with artificial intelligence-based clinical decision support. Int J Med Inform 2020;137:104072. [Crossref] [PubMed]
- Abràmoff MD, Lavin PT, Birch M, et al. Pivotal trial of an autonomous AI-based diagnostic system for detection of diabetic retinopathy in primary care offices. NPJ Digit Med 2018;1:39. [Crossref] [PubMed]
- Aoki T, Yamada A, Aoyama K, et al. Clinical usefulness of a deep learning-based system as the first screening on small-bowel capsule endoscopy reading. Dig Endosc 2020;32:585-91. [Crossref] [PubMed]
- Arbabshirani MR, Fornwalt BK, Mongelluzzo GJ, et al. Advanced machine learning in action: identification of intracranial hemorrhage on computed tomography scans of the head with clinical workflow integration. NPJ Digit Med 2018;1:9. [Crossref] [PubMed]
- Bailey TC, Chen Y, Mao Y, et al. A trial of a real-time alert for clinical deterioration in patients hospitalized on general medical wards. J Hosp Med 2013;8:236-42. [Crossref] [PubMed]
- Barinov L, Jairaj A, Becker M, et al. Impact of Data Presentation on Physician Performance Utilizing Artificial Intelligence-Based Computer-Aided Diagnosis and Decision Support Systems. J Digit Imaging 2019;32:408-16. [Crossref] [PubMed]
- Beaudoin M, Kabanza F, Nault V, et al. Evaluation of a machine learning capability for a clinical decision support system to enhance antimicrobial stewardship programs. Artif Intell Med 2016;68:29-36. [Crossref] [PubMed]
- Bien N, Rajpurkar P, Ball RL, et al. Deep-learning-assisted diagnosis for knee magnetic resonance imaging: Development and retrospective validation of MRNet. PLoS Med 2018;15:e1002699. [Crossref] [PubMed]
- Brennan M, Puri S, Ozrazgat-Baslanti T, et al. Comparing clinical judgment with the MySurgeryRisk algorithm for preoperative risk assessment: A pilot usability study. Surgery 2019;165:1035-45. [Crossref] [PubMed]
- Chen D, Wu L, Li Y, et al. Comparing blind spots of unsedated ultrafine, sedated, and unsedated conventional gastroscopy with and without artificial intelligence: a prospective, single-blind, 3-parallel-group, randomized, single-center trial. Gastrointest Endosc 2020;91:332-339.e3. [Crossref] [PubMed]
- Connell A, Montgomery H, Martin P, et al. Evaluation of a digitally-enabled care pathway for acute kidney injury management in hospital emergency admissions. NPJ Digit Med 2019;2:67. [Crossref] [PubMed]
- Giannini HM, Ginestra JC, Chivers C, et al. A Machine Learning Algorithm to Predict Severe Sepsis and Septic Shock: Development, Implementation, and Impact on Clinical Practice. Crit Care Med 2019;47:1485-92. [Crossref] [PubMed]
- Ginestra JC, Giannini HM, Schweickert WD, et al. Clinician Perception of a Machine Learning-Based Early Warning System Designed to Predict Severe Sepsis and Septic Shock. Crit Care Med 2019;47:1477-84. [Crossref] [PubMed]
- Gómez-Vallejo H, Uriel-Latorre B, Sande-Meijide M, et al. A case-based reasoning system for aiding detection and classification of nosocomial infections. Decision Support Systems 2016;84:104-16. [Crossref]
- Grunwald IQ, Ragoschke-Schumm A, Kettner M, et al. First Automated Stroke Imaging Evaluation via Electronic Alberta Stroke Program Early CT Score in a Mobile Stroke Unit. Cerebrovasc Dis 2016;42:332-8. [Crossref] [PubMed]
- Kanagasingam Y, Xiao D, Vignarajan J, et al. Evaluation of Artificial Intelligence-Based Grading of Diabetic Retinopathy in Primary Care. JAMA Netw Open 2018;1:e182665. [Crossref] [PubMed]
- Keel S, Lee PY, Scheetz J, et al. Feasibility and patient acceptability of a novel artificial intelligence-based screening model for diabetic retinopathy at endocrinology outpatient services: a pilot study. Sci Rep 2018;8:4330. [Crossref] [PubMed]
- Kiani A, Uyumazturk B, Rajpurkar P, et al. Impact of a deep learning assistant on the histopathologic classification of liver cancer. NPJ Digit Med 2020;3:23. [Crossref] [PubMed]
- Lagani V, Chiarugi F, Manousos D, et al. Realization of a service for the long-term risk assessment of diabetes-related complications. J Diabetes Complications 2015;29:691-8. [Crossref] [PubMed]
- Lin H, Li R, Liu Z, et al. Diagnostic Efficacy and Therapeutic Decision-making Capacity of an Artificial Intelligence Platform for Childhood Cataracts in Eye Clinics: A Multicentre Randomized Controlled Trial. EClinicalMedicine 2019;9:52-9. [Crossref] [PubMed]
- Lindsey R, Daluiski A, Chopra S, et al. Deep neural network improves fracture detection by clinicians. Proc Natl Acad Sci U S A 2018;115:11591-6. [Crossref] [PubMed]
- Liu WN, Zhang YY, Bian XQ, et al. Study on detection rate of polyps and adenomas in artificial-intelligence-aided colonoscopy. Saudi J Gastroenterol 2020;26:13-9. [Crossref] [PubMed]
- Mango VL, Sun M, Wynn RT, et al. Should We Ignore, Follow, or Biopsy? Impact of Artificial Intelligence Decision Support on Breast Ultrasound Lesion Assessment. AJR Am J Roentgenol 2020;214:1445-52. [Crossref] [PubMed]
- Martin CM, Vogel C, Grady D, et al. Implementation of complex adaptive chronic care: the Patient Journey Record system (PaJR). J Eval Clin Pract 2012;18:1226-34. [Crossref] [PubMed]
- McCoy A, Das R. Reducing patient mortality, length of stay and readmissions through machine learning-based sepsis prediction in the emergency department, intensive care unit and hospital floor units. BMJ Open Qual 2017;6:e000158. [Crossref] [PubMed]
- McNamara DM, Goldberg SL, Latts L, et al. Differential impact of cognitive computing augmented by real world evidence on novice and expert oncologists. Cancer Med 2019;8:6578-84. [Crossref] [PubMed]
- Mori Y, Kudo SE, Misawa M, et al. Real-Time Use of Artificial Intelligence in Identification of Diminutive Polyps During Colonoscopy: A Prospective Study. Ann Intern Med 2018;169:357-66. [Crossref] [PubMed]
- Nagaratnam K, Harston G, Flossmann E, et al. Innovative use of artificial intelligence and digital communication in acute stroke pathway in response to COVID-19. Future Healthc J 2020;7:169-73. [Crossref] [PubMed]
- Natarajan S, Jain A, Krishnan R, et al. Diagnostic Accuracy of Community-Based Diabetic Retinopathy Screening With an Offline Artificial Intelligence System on a Smartphone. JAMA Ophthalmol 2019;137:1182-8. [Crossref] [PubMed]
- Nicolae A, Semple M, Lu L, et al. Conventional vs machine learning-based treatment planning in prostate brachytherapy: Results of a Phase I randomized controlled trial. Brachytherapy 2020;19:470-6. [Crossref] [PubMed]
- Park A, Chute C, Rajpurkar P, et al. Deep Learning-Assisted Diagnosis of Cerebral Aneurysms Using the HeadXNet Model. JAMA Netw Open 2019;2:e195600. [Crossref] [PubMed]
- Rostill H, Nilforooshan R, Morgan A, et al. Technology integrated health management for dementia. Br J Community Nurs 2018;23:502-8. [Crossref] [PubMed]
- Segal MM, Williams MS, Gropman AL, et al. Evidence-based decision support for neurological diagnosis reduces errors and unnecessary workup. J Child Neurol 2014;29:487-92. [Crossref] [PubMed]
- Segal MM, Athreya B, Son MB, et al. Evidence-based decision support for pediatric rheumatology reduces diagnostic errors. Pediatr Rheumatol Online J 2016;14:67. [Crossref] [PubMed]
- Segal MM, Rahm AK, Hulse NC, et al. Experience with Integrating Diagnostic Decision Support Software with Electronic Health Records: Benefits versus Risks of Information Sharing. EGEMS (Wash DC) 2017;5:23. [Crossref] [PubMed]
- Segal G, Segev A, Brom A, et al. Reducing drug prescription errors and adverse drug events by application of a probabilistic, machine-learning based clinical decision support system in an inpatient setting. J Am Med Inform Assoc 2019;26:1560-5. [Crossref] [PubMed]
- Shimabukuro DW, Barton CW, Feldman MD, et al. Effect of a machine learning-based severe sepsis prediction algorithm on patient survival and hospital length of stay: a randomised clinical trial. BMJ Open Respir Res 2017;4:e000234. [Crossref] [PubMed]
- Sim Y, Chung MJ, Kotter E, et al. Deep Convolutional Neural Network-based Software Improves Radiologist Detection of Malignant Lung Nodules on Chest Radiographs. Radiology 2020;294:199-209. [Crossref] [PubMed]
- Steiner DF, MacDonald R, Liu Y, et al. Impact of Deep Learning Assistance on the Histopathologic Review of Lymph Nodes for Metastatic Breast Cancer. Am J Surg Pathol 2018;42:1636-46. [Crossref] [PubMed]
- Su JR, Li Z, Shao XJ, et al. Impact of a real-time automatic quality control system on colorectal polyp and adenoma detection: a prospective randomized controlled study (with videos). Gastrointest Endosc 2020;91:415-424.e4. [Crossref] [PubMed]
- Titano JJ, Badgeley M, Schefflein J, et al. Automated deep-neural-network surveillance of cranial images for acute neurologic events. Nat Med 2018;24:1337-41. [Crossref] [PubMed]
- Vandenberghe ME, Scott ML, Scorer PW, et al. Relevance of deep learning to facilitate the diagnosis of HER2 status in breast cancer. Sci Rep 2017;7:45938. [Crossref] [PubMed]
- Voermans AM, Mewes JC, Broyles MR, et al. Cost-Effectiveness Analysis of a Procalcitonin-Guided Decision Algorithm for Antibiotic Stewardship Using Real-World U.S. Hospital Data. OMICS 2019;23:508-515. [Crossref] [PubMed]
- Wang P, Berzin TM, Glissen Brown JR, et al. Real-time automatic detection system increases colonoscopic polyp and adenoma detection rates: a prospective randomised controlled study. Gut 2019;68:1813-9. [Crossref] [PubMed]
- Wang SV, Rogers JR, Jin Y, et al. Stepped-wedge randomised trial to evaluate population health intervention designed to increase appropriate anticoagulation in patients with atrial fibrillation. BMJ Qual Saf 2019;28:835-42. [Crossref] [PubMed]
- Wang P, Liu X, Berzin TM, et al. Effect of a deep-learning computer-aided detection system on adenoma detection during colonoscopy (CADe-DB trial): a double-blind randomised study. Lancet Gastroenterol Hepatol 2020;5:343-51. [Crossref] [PubMed]
- Wijnberge M, Geerts BF, Hol L, et al. Effect of a Machine Learning-Derived Early Warning System for Intraoperative Hypotension vs Standard Care on Depth and Duration of Intraoperative Hypotension During Elective Noncardiac Surgery: The HYPE Randomized Clinical Trial. JAMA 2020;323:1052-60. [Crossref] [PubMed]
- Wu X, Huang Y, Liu Z, et al. Universal artificial intelligence platform for collaborative management of cataracts. Br J Ophthalmol 2019;103:1553-60. [Crossref] [PubMed]
- Wu L, Zhang J, Zhou W, et al. Randomised controlled trial of WISENSE, a real-time quality improving system for monitoring blind spots during esophagogastroduodenoscopy. Gut 2019;68:2161-9. [Crossref] [PubMed]
- Yoo YJ, Ha EJ, Cho YJ, et al. Computer-Aided Diagnosis of Thyroid Nodules via Ultrasonography: Initial Clinical Experience. Korean J Radiol 2018;19:665-72. [Crossref] [PubMed]
- Liu C, Liu X, Wu F, et al. Using Artificial Intelligence (Watson for Oncology) for Treatment Recommendations Amongst Chinese Patients with Lung Cancer: Feasibility Study. J Med Internet Res 2018;20:e11087. [Crossref] [PubMed]
- Sun Y. AI could alleviate China’s doctor shortage. MIT Technology Review. March 21, 2018.
- Longoni C, Bonezzi A, Morewedge CK. Resistance to medical artificial intelligence. Journal of Consumer Research 2019;46:629-50. [Crossref]
- Shaffer VA, Probst CA, Merkle EC, et al. Why do patients derogate physicians who use a computer-based diagnostic support system? Med Decis Making 2013;33:108-18. [Crossref] [PubMed]
- Stephanie FL, Sharma RS. Capturing Value in Digital Health Eco-Systems: Validating Strategies for Stakeholders. Boca Raton: CRC Press, Taylor and Francis Group; 2021.
Cite this article as: Ullah W, Ali Q. Role of artificial intelligence in healthcare settings: a systematic review. J Med Artif Intell 2025;8:24.