Artificial intelligence-augmented electrocardiogram for heart failure diagnosis and risk prediction: a narrative review

Dimitri Chepkunov; Daye Chung; Kimia Ghanbari; Bhumika Khanna; Neri Yakubov; Myoungmee Babu; Benson Babu

doi:10.21037/jmai-2025-178

Review Article


Dimitri Chepkunov^1,2, Daye Chung^1,2, Kimia Ghanbari^1,2, Bhumika Khanna^2,3, Neri Yakubov^2, Myoungmee Babu^4, Benson Babu^2

^1School of Medicine, St. George’s University, True Blue, Grenada, West Indies; ^2Department of Internal Medicine, Wyckoff Heights Medical Center, Brooklyn, NY, USA; ^3School of Medicine, Xavier University, Oranjestad, Aruba; ^4Department of Education, New York City Board of Education, New York, NY, USA

Contributions: (I) Conception and design: All authors; (II) Administrative support: B Babu, N Yakubov; (III) Provision of study materials or patients: B Babu, N Yakubov; (IV) Collection and assembly of data: D Chepkunov, D Chung, K Ghanbari, B Khanna; (V) Data analysis and interpretation: All authors; (VI) Manuscript writing: All authors; (VII) Final approval of manuscript: All authors.

Correspondence to: Benson Babu, MD. Department of Internal Medicine, Wyckoff Heights Medical Center, 374 Stockholm St, Brooklyn, NY 11237, USA. Email: Zero-Click@modooda.org.

Background and Objective: The electrocardiogram (ECG) is a widely available, low-cost diagnostic tool routinely used in cardiovascular care; however, its conventional interpretation has limited sensitivity for detecting heart failure (HF) and predicting disease progression. Artificial intelligence (AI) has emerged as a promising approach to enhance ECG interpretation by identifying latent patterns associated with cardiac dysfunction. This review summarizes the diagnostic performance and clinical utility of AI-augmented ECG models for HF diagnosis and risk prediction based on recent primary studies.

Methods: A structured narrative review of studies published from January 2020 through December 2025 was conducted using Google Scholar, PubMed, AutoLit, Litmaps, Consensus, and Research Rabbit. Inclusion criteria required original studies applying AI to ECG data for the diagnosis, classification, or prognostication of HF, with reported diagnostic or predictive performance metrics. Only English-language studies were included. Data extraction focused on model architecture, training data characteristics, and performance outcomes. A total of 52 studies met the inclusion criteria.

Key Content and Findings: AI-augmented ECG models, predominantly based on deep learning architectures, demonstrated strong diagnostic and predictive performance across multiple HF phenotypes, including reduced, mildly reduced, and preserved ejection fraction. Reported area under the receiver operating characteristic curve (AUC) values commonly ranged from approximately 0.85 to 0.97 for HF detection, with several models also predicting future HF development, hospitalization, and mortality. A subset of studies reported potential clinical or workflow benefits, such as opportunistic screening and improved risk stratification. However, most studies were retrospective and single-center, frequently relying on internal validation only. External validation and prospective evaluation were uncommon, and reporting of blinding, reference standards, and diagnostic timelines was often incomplete. Collectively, these methodological characteristics may limit generalizability and real-world applicability.

Conclusions: AI-augmented ECG analysis demonstrates substantial potential to improve the detection and risk stratification of HF beyond conventional ECG interpretation. While current evidence shows consistently strong diagnostic and prognostic performance, important limitations remain, including study heterogeneity, limited external validation, and a scarcity of prospective, clinically integrated evaluations. Further large-scale, multicenter studies are needed to establish generalizability, clinical impact, and real-world utility before widespread adoption.

Keywords: Artificial intelligence (AI); electrocardiogram (ECG); heart failure (HF); deep learning; risk prediction

Received: 31 July 2025; Accepted: 24 March 2026; Published online: 24 April 2026.


Introduction

Background

Heart failure (HF) is a major global public health challenge, affecting more than 64 million individuals worldwide and contributing substantially to morbidity, mortality, and healthcare expenditures (1). In the United States alone, the direct and indirect costs associated with HF management are projected to exceed 70 billion dollars by 2030, largely driven by recurrent hospitalizations, long-term pharmacologic therapy, and advanced interventions such as device implantation and heart transplantation (2). As populations age and the prevalence of cardiovascular risk factors, including hypertension, diabetes, and obesity, continues to rise, the burden of HF is expected to increase further.

The electrocardiogram (ECG) is one of the most widely used and accessible diagnostic tools in cardiovascular medicine. It is routinely obtained across a broad range of clinical settings and provides essential information regarding cardiac rhythm, conduction abnormalities, and indirect markers of structural heart disease. However, conventional ECG interpretation has limited sensitivity for detecting early or subclinical cardiac dysfunction, particularly in patients with HF with preserved or mildly reduced ejection fraction, in whom overt electrical abnormalities may be minimal or absent (3,4). Diagnostic performance is further limited by interobserver variability and by the inability of standard electrocardiographic criteria to capture complex relationships between electrical signals and myocardial structure or function.

Recent advances in artificial intelligence (AI), particularly machine learning and deep learning methods, have expanded the diagnostic potential of ECG. AI-augmented ECG models, most commonly based on convolutional neural networks, are capable of analyzing large-scale electrocardiographic datasets and identifying latent patterns associated with ventricular dysfunction, elevated filling pressures, and future HF risk that are not readily apparent to human interpretation (5-7). Multiple studies have demonstrated that AI-enhanced ECG analysis can accurately identify left ventricular systolic dysfunction, classify HF phenotypes, and predict adverse outcomes such as hospitalization and mortality (8-11).
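To make the idea of a convolutional model more concrete, the sketch below shows the forward pass of a toy one-dimensional CNN applied to a single ECG lead: each filter slides along the signal, a ReLU keeps positive responses, global average pooling summarizes each filter into one feature, and a logistic output maps the features to a probability (for example, of low ejection fraction). All function names, signal values, and weights are hypothetical and untrained; the published models are far deeper, usually operate on multiple leads, and learn their filters from large labeled datasets.

```python
import math
from typing import List, Sequence


def conv1d(signal: Sequence[float], kernel: Sequence[float], bias: float = 0.0) -> List[float]:
    """'Valid' 1D cross-correlation (no padding, stride 1), as computed in a CNN layer."""
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k)) + bias
        for i in range(len(signal) - k + 1)
    ]


def relu(values: Sequence[float]) -> List[float]:
    """Keep positive filter responses, zero out the rest."""
    return [max(0.0, v) for v in values]


def global_avg_pool(values: Sequence[float]) -> float:
    """Summarize a whole feature map by its mean."""
    return sum(values) / len(values)


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def toy_ecg_cnn(lead, kernels, biases, weights, out_bias) -> float:
    """Hypothetical one-layer CNN: one pooled feature per filter, then a logistic output."""
    features = [
        global_avg_pool(relu(conv1d(lead, kernel, bias)))
        for kernel, bias in zip(kernels, biases)
    ]
    logit = sum(w * f for w, f in zip(weights, features)) + out_bias
    return sigmoid(logit)


# Illustrative, untrained parameters: a slope detector and a smoothing filter.
lead = [0.0, 0.1, 0.9, -0.3, 0.0, 0.2, 0.1, 0.0]  # hypothetical samples (mV)
prob = toy_ecg_cnn(
    lead,
    kernels=[[-1.0, 1.0], [0.5, 0.5]],
    biases=[0.0, 0.0],
    weights=[1.2, -0.8],
    out_bias=-0.5,
)
print(f"toy probability: {prob:.3f}")
```

In a trained model, the kernel weights are not hand-chosen as here but optimized so that the pooled features separate patients with and without the target condition.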

The incorporation of AI-enabled ECG analysis into wearable and portable devices further expands its clinical relevance. Smartwatches, single-lead ECG devices, and mobile monitoring platforms allow for longitudinal cardiac assessment outside traditional healthcare environments. These technologies may facilitate earlier detection of HF and improve access to diagnostic evaluation in underserved or resource-limited populations (12-14).

Rationale and knowledge gap

Several prior reviews have examined the application of AI to ECG, with emphasis on arrhythmia detection, cardiovascular risk prediction, or general AI-enabled diagnostic frameworks (15-17). Other reviews have focused on AI applications in cardiovascular imaging modalities, such as echocardiography or cardiac magnetic resonance imaging, where AI supports image acquisition and interpretation rather than analysis of electrical signals (18).

However, many existing reviews combine heterogeneous cardiovascular outcomes, include secondary or methodological studies, or predate the rapid expansion of large-scale AI ECG models published over the past 5 years. Importantly, few reviews synthesize only primary research studies that evaluate AI-augmented ECG models specifically for HF diagnosis, phenotyping, and risk prediction. As a result, uncertainty remains regarding the robustness, generalizability, and real-world applicability of these models across diverse patient populations and clinical settings.

Objective

Given the rapid development of AI-enhanced ECG technologies and the growing number of primary studies published since 2020, an updated narrative synthesis focused exclusively on HF is needed. By restricting inclusion to original research studies that report diagnostic or prognostic performance of AI-augmented ECG models, and by excluding systematic reviews and other secondary sources, the present review provides a contemporary assessment of model performance, training data characteristics, and methodological limitations.

The objective of this narrative review is to evaluate the current evidence supporting AI-driven ECG analysis for HF detection and risk stratification, to identify sources of bias that may limit clinical translation, and to highlight key research gaps that must be addressed before widespread implementation. We present this article in accordance with the Narrative Review reporting checklist (available at https://jmai.amegroups.com/article/view/10.21037/jmai-2025-178/rc).

Methods

Eligibility criteria

The eligibility criteria and search strategy were guided by the Population, Intervention, Comparison, Outcomes, and Study Design (PICOS) framework to ensure methodological clarity and reproducibility. The population included adult patients evaluated for HF or related cardiac dysfunction using an ECG. The intervention consisted of AI methods applied to ECG interpretation. The comparison involved AI-assisted assessment versus conventional clinical evaluation, imaging-based assessment, or standard electrocardiographic interpretation. Outcomes included diagnostic or prognostic performance metrics, such as sensitivity, specificity, accuracy, and area under the receiver operating characteristic curve, as well as prediction of clinical outcomes, including hospitalization and mortality. The study design was limited to original primary research studies.
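For readers less familiar with these outcome metrics, the sketch below shows how sensitivity, specificity, and accuracy follow from a confusion matrix at a fixed decision threshold, and how the area under the receiver operating characteristic curve (AUC) can be computed without any threshold, as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case (the Mann-Whitney formulation, with ties counted as one half). The labels, scores, and the 0.5 threshold are hypothetical.

```python
from typing import Dict, Sequence


def confusion_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Sensitivity, specificity, and accuracy for binary labels (1 = condition present)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "sensitivity": tp / (tp + fn),  # true-positive rate
        "specificity": tn / (tn + fp),  # true-negative rate
        "accuracy": (tp + tn) / len(pairs),
    }


def auroc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as P(score of a positive > score of a negative); ties count 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    # O(n_pos * n_neg) pairwise comparison: fine for a demonstration.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical model outputs for six patients (1 = low EF on echocardiography).
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.92, 0.71, 0.40, 0.55, 0.20, 0.05]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # hypothetical threshold
print(confusion_metrics(y_true, y_pred))
print(f"AUC = {auroc(y_true, scores):.3f}")
```

Sensitivity and specificity change as the threshold moves, whereas the AUC summarizes ranking performance across all thresholds, which is one reason most included studies report it.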

Searches were performed using standardized phrases across all databases to identify studies evaluating clinical outcomes or diagnostic performance of AI-augmented ECG models. The search syntax was adjusted slightly to accommodate the formatting of each database, and no additional Boolean operators or Medical Subject Headings terms were applied. Systematic reviews, meta-analyses, editorials, conference commentaries, and other non-original research articles were excluded during screening. Studies published before 2020 and studies without an accessible full text were also excluded. Search-engine-specific filters were not applied, to maintain transparency and reproducibility. Only English-language publications were included.

Search strategy

The search strategy is summarized in Table 1. No manual searches, such as reviewing reference lists or consulting grey literature, were performed, as the focus of this review was on peer-reviewed primary studies accessible through electronic databases.

Table 1

Search strategy summary

Items	Specification
Date of search	First search: May 14, 2024; last search: December 8, 2025
Databases and other sources searched	Google Scholar, PubMed, Consensus, AutoLit, Research Rabbit, Litmaps
Search terms used	“Systolic Heart Failure” OR “heart failure” OR “HFrEF” OR “HFpEF” AND “electrocardiogram” OR “ECG” AND “artificial intelligence” OR “deep learning” OR “machine learning”
Timeframe	2020–2025
Inclusion and exclusion criteria	Inclusion criteria: systolic heart failure, heart failure with preserved ejection fraction, improved detection of heart failure, screening tool, diagnostic tool, cost-effective screening tool, clinical outcome. English-language publications only
Inclusion and exclusion criteria	Exclusion criteria: articles published before 2020, systematic reviews or meta-analyses, ST-elevation myocardial infarction (STEMI), non-clinical studies
Selection process	18,722 articles were extracted from the initial search. Further specification of keywords narrowed the search to 737 articles. Four student authors independently screened titles and abstracts. Full-text review was performed to confirm eligibility. Disagreements were resolved through discussion and consensus, with attending physician supervision when needed. Finally, after applying inclusion and exclusion criteria, 52 articles met study eligibility

Selection process

Once the search was completed, the identified studies underwent a structured selection process. Four student authors independently screened studies in Google Scholar and PubMed. Supplemental search engines used for screening included AutoLit, Litmaps, Consensus, and Research Rabbit. Full-text reviews were conducted to determine eligibility based on predefined inclusion and exclusion criteria.

Screening was not blinded, which may have introduced selection bias; independent review and consensus resolution were used to mitigate this risk. ChatGPT version 5.0 was used exclusively to improve syntax, grammar, and diction during manuscript preparation. All data extraction and interpretation were performed manually by four student reviewers and two attending physician supervisors. Any uncertainties were discussed collaboratively and resolved by consensus to ensure accuracy, consistency, and reproducibility. Duplicate studies were manually identified and removed.

Data extraction

Following study selection, data were manually extracted by the four student authors and two senior attending physicians involved in the review. Given the heterogeneity of included studies, extracted data points included study design, sample size, characteristics of the AI model, training data composition, and reported diagnostic or prognostic performance metrics.

In cases where discrepancies arose, such as studies initially meeting inclusion criteria but later identified as retracted or containing incomplete or uninterpretable data, the study was excluded from the final synthesis. These exclusions were acknowledged as limitations in the discussion section.

Given the narrative review design, formal risk-of-bias scoring tools were not applied. Instead, common methodological themes and limitations across studies were identified qualitatively and are discussed in the Results and Discussion sections.

Results

Across the included studies, AI-augmented ECG analysis showed strong reported performance for the early detection and risk stratification of HF, with many models trained on large datasets and a growing number evaluated in clinical settings.

The characteristics, dataset composition, model architecture, training data, and reported performance metrics of the included AI-enabled ECG studies are summarized in Table 2. Across studies, AI models were applied to standard 12-lead ECGs, reduced-lead formats (including single-lead and six-lead devices), ECG images, and wearable or home monitoring signals. Because of substantial heterogeneity in study design, datasets, ECG formats, model architectures, validation approaches, and reported outcomes, a formal meta-analysis was not performed, and findings were synthesized qualitatively.

Table 2

Summary of AI-ECG studies for heart failure diagnosis and risk prediction

Study	ECG type	Model	Training data	Performance metrics	Clinical task/outcome
Lu et al. (1)	12-lead	Interpretable DL	2.3M ECGs	AUROC 0.85–0.93	Mortality & CV risk
Baek et al. (2)	12-lead	DL age model	>400k ECGs	HR mortality 1.60	Biological heart age
Kanaji et al. (3)	12-lead	CNN	Takotsubo cohort	AUROC 0.80	MACE risk prediction
Liu et al. (4)	12-lead	DL	Young–middle age cohort	AUROC 0.89	LVH & mortality
Attia et al. (5)	Single-lead	CNN	70k ECGs	AUROC 0.90	Low LVEF detection
Chou et al. (6)	12-lead	CNN ensemble	150k ECGs	AUROC 0.83	LA enlargement
Kwon et al. (7)	12-lead	CNN	44k ECGs	AUROC 0.87	HFpEF detection
Yagi et al. (8)	12-lead	CNN	External cohorts	AUROC 0.88	LVSD validation
Whyte et al. (9)	Mobile 12-lead	AI-ECG	Conference cohort	Accuracy 88%	HF screening
Al-Messabi et al. (10)	12-lead	ML models	Public datasets	Accuracy 82%	Heart disease detection
König et al. (11)	12-lead	CNN	40k ECGs	AUROC 0.86	LVSD detection
De Michieli et al. (12)	12-lead	CNN	ED cohort	AUROC 0.84	LVSD screening
Shiraishi et al. (13)	12-lead	DL	HF registry	AUROC 0.78	Sudden cardiac death
Khurshid et al. (15)	12-lead	DL	UK Biobank	AUROC 0.86	LV mass
Tsai et al. (16)	12-lead	DL	120k ECGs	AUROC 0.81	Mortality
Vaid et al. (17)	12-lead	CNN	Multicenter	AUROC 0.91	Biventricular dysfunction
Verbrugge et al. (18)	12-lead	CNN	HF cohorts	AUROC 0.88	LA myopathy
Kaur et al. (19)	12-lead	DL	Multi-ethnic	Bias analysis	Equity assessment
Kashou et al. (20)	12-lead	CNN	Population cohort	AUROC 0.87	LVSD screening
Kwon et al. (21)	Single-lead	CNN	Wearable ECGs	AUROC 0.86	HFrEF detection
De Michieli et al. (22)	12-lead	CNN	ED cohort	AUROC reported	LV dysfunction (ED validation)
Jentzer et al. (23)	12-lead	CNN	CICU ECGs	AUROC 0.89	LVSD in ICU
Liu et al. (24)	12-lead	DL	Screening cohort	Cost-effective	LV dysfunction
Park et al. (25)	12-lead	DL	Hospital ECGs	Accuracy reported	HF detection
Lee HS et al. (26)	12-lead	CNN	HF registry	AUROC 0.79	1-year mortality
Lee HS et al. (27)	12-lead	Explainable DL	Hospital ECGs	Visualization	HF subtype detection
Haleem et al. (28)	12-lead	DL	PhysioNet datasets	Accuracy reported	CHF classification
Yang et al. (29)	12-lead	THC-Net	ECG datasets	Accuracy 90%	CHF detection
Yao et al. (30)	12-lead	CNN	Primary care ECGs	↑ low-EF detection	Pragmatic RCT
Cho et al. (31)	12-lead	CNN	Hospital ECGs	AUROC 0.88	HFrEF screening
Dhingra et al. (32)	Single-lead	CNN	UK Biobank	AUROC 0.88	Incident HF
Dhingra et al. (33)	ECG images	CNN	Multinational	AUROC 0.90	HF risk
Sau et al. (34)	12-lead	CNN	UK Biobank	AUROC 0.84	Hypertension
Dhar et al. (35)	ECG + SCG + GSR	DL fusion	HF cohort	AUROC 0.82	Readmission
Dhingra et al. (36)	12-lead	Ensemble DL	Multicenter	AUROC reported	Structural HD screening
Klein et al. (37)	Wearable ECG	AI fusion	HF patients	r=0.84	PCWP estimation
Hasumi et al. (38)	Single-lead	CNN	Home ECG	AUROC 0.86	HF monitoring
Devkota et al. (39)	12-lead	DL	Open ECG	MAE 6–8%	EF estimation
Cheema & Tibrewala (40)	Ambulatory ECG	AI fusion	Remote monitoring	Feasibility	SEISMIC-HF
Hou et al. (41)	12-lead	DL	Hospital ECGs	AUROC 0.89	Low EF
Jentzer et al. (42)	12-lead	CNN	ICU ECGs	AUROC 0.87	Diastolic dysfunction
Kim et al. (43)	12-lead	DL	HFmrEF cohort	AUROC 0.85	HFmrEF
Lee et al. (44)	12-lead	Explainable DL	Hospital ECGs	AUROC 0.90	LVSD
Santana Júnior et al. (45)	12-lead	CNN	Population cohort	AUROC 0.88	LVSD
Lim et al. (46)	6-lead	CNN	Prospective	AUROC 0.89	LVSD
Karabayir et al. (47)	12-lead	DL	Community cohort	AUROC 0.86	HF classification
Kelshiker et al. (48)	ECG + stethoscope	AI fusion	Primary care	AUROC 0.80	HF detection
Lee et al. (49)	12-lead	Foundation AI	Multi-ethnic	AUROC 0.92	LV filling pressure
Sau et al. (50)	12-lead	DL	Population ECGs	AUROC 0.88	Mortality
Croon et al. (51)	12-lead	AI-ECG	UK Biobank	HRs reported	PheWAS
Knorr et al. (52)	12-lead	AI risk model	Primary prevention	C-stat ↑	CV risk
Bandyopadhyay et al. (53)	Ambulatory ECG	DL reconstruction	Wearables	MAE ↓	EF monitoring

AI, artificial intelligence; AUROC, area under the receiver operating characteristic curve; C-stat, concordance statistic; CHF, congestive heart failure; CICU, cardiac intensive care unit; CNN, convolutional neural network; CV, cardiovascular; DL, deep learning; ECG, electrocardiogram; ED, emergency department; GSR, galvanic skin response; HD, heart disease; HF, heart failure; HFmrEF, heart failure with mildly reduced ejection fraction; HFpEF, heart failure with preserved ejection fraction; HFrEF, heart failure with reduced ejection fraction; HR, hazard ratio; ICU, intensive care unit; LA, left atrium; LV, left ventricle; LVEF, left ventricular ejection fraction; LVH, left ventricular hypertrophy; LVSD, left ventricular systolic dysfunction; MACE, major adverse cardiovascular events; MAE, mean absolute error; ML, machine learning; PCWP, pulmonary capillary wedge pressure; PheWAS, Phenome-Wide Association Study; RCT, randomized controlled trial; SCG, seismocardiogram; SEISMIC, SEISMocardiogram In Cardiovascular Monitoring; THC-Net, Two-channel Hybrid Convolutional Network.
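Several entries in Table 2, such as EF estimation and PCWP estimation, report regression metrics rather than AUROC. As a minimal sketch, mean absolute error (MAE) is the average absolute gap between predicted and reference values, and Pearson's r measures linear agreement between them. The EF values below are hypothetical and are not drawn from any included study.

```python
import math
from typing import Sequence


def mean_absolute_error(reference: Sequence[float], predicted: Sequence[float]) -> float:
    """Average absolute difference, in the units of the measurement (here, EF points)."""
    return sum(abs(r - p) for r, p in zip(reference, predicted)) / len(reference)


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


echo_ef = [60.0, 35.0, 25.0, 55.0, 45.0]  # hypothetical echocardiographic EF (%)
ai_ef = [56.0, 41.0, 30.0, 50.0, 47.0]    # hypothetical AI-ECG estimates (%)
print(f"MAE = {mean_absolute_error(echo_ef, ai_ef):.1f} EF points")
print(f"r = {pearson_r(echo_ef, ai_ef):.2f}")
```

The two metrics answer different questions: a model can correlate strongly with the reference (high r) while still being systematically offset, which MAE would reveal.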

Summary of included studies

A total of 52 studies were included. Most were retrospective and derived from electronic health record-linked ECG repositories or registry-based cohorts (1,4,7,16,17,20), while a smaller subset evaluated prospective acquisition or real-world implementation in primary care, emergency, or wearable device settings (12,24,30,48). Study objectives clustered into four dominant categories: detection of reduced ejection fraction and left ventricular systolic dysfunction; identification of preserved ejection fraction phenotypes and structural surrogates; prediction of adverse cardiovascular outcomes; and evaluation of workflow integration or cost-effectiveness.

Detection of reduced ejection fraction and systolic dysfunction

Numerous studies demonstrated strong diagnostic performance for detecting reduced ejection fraction and left ventricular systolic dysfunction using AI-augmented ECGs. Attia et al. reported robust detection of low ejection fraction using a single-lead ECG acquired through a digital stethoscope, achieving an area under the receiver operating characteristic curve of approximately 0.90 (5). Kashou et al. and Jentzer et al. demonstrated effective identification of systolic dysfunction in population-based and intensive care unit cohorts, respectively (20,23). König et al. and Yagi et al. further validated deep learning models on external datasets, supporting generalizability while noting modest performance variability across institutions (8,11). Collectively, these studies support the feasibility of AI-enabled ECG screening for systolic dysfunction in both outpatient and acute care settings.

Detection of preserved ejection fraction phenotypes and structural abnormalities

A second group of studies focused on HF with preserved ejection fraction and structural or functional surrogates inferred from ECG features. Kwon et al. developed a deep learning model capable of identifying HF with preserved ejection fraction, reporting area under the curve values of 0.866 internally and 0.869 in external validation (7). Verbrugge et al. demonstrated that AI-enabled ECG scores could detect left atrial myopathy, which was independently associated with atrial fibrillation and mortality risk (18). Lee et al. introduced a multi-ethnic foundation model-based ECG system that accurately detected elevated left ventricular filling pressure and provided prognostic information across diverse populations (49). Karabayir et al. further demonstrated classification of reduced versus preserved ejection fraction phenotypes using community-based ECG cohorts (47). These studies illustrate the expanding role of AI in identifying HF-related physiology that is traditionally difficult to capture using surface ECG alone.

Prognostic value and risk prediction

Many studies extended beyond diagnostic classification to predict clinically meaningful outcomes, including all-cause mortality, cardiovascular mortality, hospitalization, major adverse cardiac events, arrhythmia risk, and sudden cardiac death. Lu et al. analyzed over two million ECGs and demonstrated that interpretable deep learning models could stratify mortality and cardiovascular risk with high accuracy (1). Baek et al. showed that AI-estimated biological heart age derived from ECGs was strongly associated with mortality and major adverse cardiac events (2). Liu et al. reported that AI-predicted left ventricular hypertrophy was associated with hazard ratios of 1.91 for cardiovascular mortality and 1.54 for all-cause mortality (4). Tsai et al. developed a survival-based deep learning model that identified patients at markedly elevated mortality risk (16). Shiraishi et al. improved the prediction of sudden cardiac death in HF patients by integrating AI-enhanced ECG features with clinical variables (13). Additional studies demonstrated prediction of future arrhythmias, hospitalization, and composite cardiovascular outcomes across diverse populations (3,30,33,52).
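The hazard ratios cited above come from survival models, typically Cox regression. As a simplified illustration only, if the hazard is assumed constant over follow-up, the hazard ratio reduces to the ratio of event rates per person-time, and an approximate 95% confidence interval follows from the standard error of the log rate ratio, sqrt(1/a + 1/b), where a and b are the event counts. The function name, event counts, and person-years below are hypothetical and are not taken from any included study.

```python
import math
from typing import Tuple


def rate_ratio_ci(events_exposed: int, py_exposed: float,
                  events_ref: int, py_ref: float,
                  z: float = 1.96) -> Tuple[float, float, float]:
    """Incidence rate ratio with a Wald CI on the log scale.

    Assumes constant hazards in both groups (a simplification of a Cox model).
    """
    rr = (events_exposed / py_exposed) / (events_ref / py_ref)
    se_log = math.sqrt(1 / events_exposed + 1 / events_ref)
    lower = math.exp(math.log(rr) - z * se_log)
    upper = math.exp(math.log(rr) + z * se_log)
    return rr, lower, upper


# Hypothetical: AI-ECG "high risk" vs "low risk" groups, 1,000 person-years each.
rr, lo, hi = rate_ratio_ci(events_exposed=30, py_exposed=1000.0,
                           events_ref=15, py_ref=1000.0)
print(f"rate ratio {rr:.2f} (95% CI {lo:.2f}-{hi:.2f})")
```

A ratio above 1 whose interval excludes 1 indicates higher event rates in the group flagged by the model; real studies additionally adjust for covariates such as age and comorbidities.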

Training data scale and generalizability

Model performance was closely associated with the size and diversity of training datasets. Large-scale studies leveraging hundreds of thousands to millions of ECGs generally reported more stable and generalizable performance (1,2,7,17). However, Yagi et al. demonstrated that performance could vary across external cohorts, underscoring the importance of multicenter validation (8). Kaur et al. identified disparities in model performance across race, sex, and age, highlighting the need for inclusive training datasets and subgroup analyses prior to clinical deployment (19).
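Subgroup analyses of the kind described by Kaur et al. can be sketched by computing discrimination separately within each demographic stratum. The minimal example below groups hypothetical predictions by a subgroup label and computes a Mann-Whitney AUC in each; the labels, data, and function names are illustrative assumptions, not the method of any included study.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Sequence, Tuple


def auroc(y_true: Sequence[int], scores: Sequence[float]) -> Optional[float]:
    """Mann-Whitney AUC; returns None when only one class is present."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        return None  # AUC is undefined without both cases and controls
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def subgroup_auroc(groups: Sequence[str], y_true: Sequence[int],
                   scores: Sequence[float]) -> Dict[str, Optional[float]]:
    """Compute AUC separately within each subgroup label (e.g., sex or age band)."""
    buckets: Dict[str, Tuple[List[int], List[float]]] = defaultdict(lambda: ([], []))
    for g, t, s in zip(groups, y_true, scores):
        buckets[g][0].append(t)
        buckets[g][1].append(s)
    return {g: auroc(ts, ss) for g, (ts, ss) in sorted(buckets.items())}


# Hypothetical predictions for two subgroups of four patients each.
groups = ["subgroup_A"] * 4 + ["subgroup_B"] * 4
y_true = [1, 0, 1, 0, 1, 0, 1, 0]
scores = [0.9, 0.1, 0.8, 0.2, 0.6, 0.6, 0.4, 0.4]
print(subgroup_auroc(groups, y_true, scores))
```

A pooled AUC can look acceptable even when one stratum performs near chance, which is why stratified reporting is recommended before deployment.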

Clinical workflow integration and cost-effectiveness

Several studies evaluated real-world clinical integration of AI-enabled ECG tools. Yao et al. conducted a cluster-randomized trial in primary care clinics, demonstrating increased detection of low ejection fraction and more targeted echocardiogram utilization following implementation of AI screening (30). De Michieli et al. showed that emergency department deployment of AI-ECG screening reduced unnecessary echocardiography while maintaining high sensitivity for systolic dysfunction (12). Liu et al. evaluated opportunistic screening for asymptomatic left ventricular dysfunction and reported favorable cost-effectiveness, with an estimated incremental cost-effectiveness ratio of $7,439 per quality-adjusted life year gained (24). Wearable and ambulatory monitoring studies further demonstrated the feasibility of continuous or home-based assessment of cardiac function (21,48,53).
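Cost-effectiveness results of this kind are conventionally expressed as an incremental cost-effectiveness ratio (ICER): the additional cost of a strategy divided by the additional quality-adjusted life years (QALYs) it yields relative to a comparator, then compared against a willingness-to-pay threshold. The sketch below uses hypothetical per-patient costs, QALYs, and threshold; it does not reproduce the model used by Liu et al.

```python
from typing import Optional


def icer(cost_new: float, qaly_new: float,
         cost_old: float, qaly_old: float) -> Optional[float]:
    """Incremental cost per QALY gained.

    This sketch only handles the case where the new strategy adds QALYs;
    otherwise it returns None. A negative result means the new strategy is
    both cheaper and more effective (dominant).
    """
    d_cost = cost_new - cost_old
    d_qaly = qaly_new - qaly_old
    if d_qaly <= 0:
        return None
    return d_cost / d_qaly


def is_cost_effective(ratio: Optional[float], wtp_per_qaly: float) -> bool:
    """Compare an ICER against a willingness-to-pay threshold per QALY."""
    return ratio is not None and ratio <= wtp_per_qaly


# Hypothetical per-patient values: AI-ECG screening vs no screening.
ratio = icer(cost_new=1250.0, qaly_new=12.04, cost_old=1000.0, qaly_old=12.00)
print(f"ICER = ${ratio:,.0f} per QALY gained")
print("cost-effective at a hypothetical $50,000/QALY threshold:",
      is_cost_effective(ratio, 50_000.0))
```

Because the denominator is usually a small QALY difference, ICERs are sensitive to modeling assumptions, which is why formal sensitivity analyses accompany them.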

Discussion

Overall performance and comparison to prior literature

The application of AI to ECG analysis for HF diagnosis and risk prediction has expanded rapidly over the past decade. In this narrative review, we observed a consistent trend toward improved diagnostic and prognostic performance among AI-augmented ECG models compared with conventional ECG interpretation. However, the magnitude of improvement varied substantially across study design, dataset size, outcome definition, and validation strategy. These findings are consistent with prior reviews of AI in cardiovascular diagnostics, which emphasize that model performance remains closely tied to data quality, cohort diversity, and methodological transparency (14,54).

Rather than demonstrating universal superiority, the included studies collectively highlight AI-ECG as a supportive diagnostic and prognostic tool. Across the reviewed literature, deep learning-based approaches frequently achieved performance that approached or matched expert clinician interpretation, particularly when trained on large, heterogeneous datasets. Multiple studies demonstrated strong performance for the detection of left ventricular systolic dysfunction and reduced ejection fraction, including those by Attia et al. (5,30,54), Kashou et al. (20), and König et al. (11), with reported area under the curve values approaching or exceeding 0.88.

Beyond reduced ejection fraction, a second group of studies focused on preserved ejection fraction-related HF phenotypes and structural abnormalities inferred from ECG features. Kwon et al. (7) developed a deep learning model for HF with preserved ejection fraction that demonstrated robust internal and external discrimination. Verbrugge et al. (18) showed that AI-derived ECG signatures could identify left atrial myopathy, which was independently associated with atrial fibrillation and adverse cardiovascular outcomes. Additional investigations by Chou et al. (6) and Liu et al. (4) demonstrated that AI-ECG models could infer left atrial enlargement and left ventricular hypertrophy, linking these findings to long-term cardiovascular risk and mortality.

Several studies extended beyond diagnostic classification to predict clinically meaningful outcomes, including all-cause mortality, cardiovascular mortality, hospitalization, major adverse cardiac events, arrhythmia risk, and sudden cardiac death (1,2,4,13,16). Large population-based studies, such as those by Lu et al. (1), Dhingra et al. (32,33), and Lee et al. (49), consistently demonstrated more stable and generalizable performance compared with smaller single-center cohorts. In contrast, smaller retrospective studies often yielded more variable results, reinforcing concerns that limited sample diversity and internal validation alone may overestimate diagnostic accuracy (8,19).
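When data from several institutions are available, one way to approximate external validation is to hold out an entire site at a time, so that no ECGs from the test institution contribute to training. The minimal sketch below implements such a leave-one-site-out split; the record fields ("site", "patient_id", "ecg_id") and values are hypothetical, and it assumes each patient appears at only one site so that patient-level leakage is also avoided.

```python
from typing import Dict, Iterator, List, Tuple

Record = Dict[str, str]


def leave_one_site_out(records: List[Record]) -> Iterator[Tuple[str, List[Record], List[Record]]]:
    """Yield (held-out site, training records, test records), one site at a time."""
    sites = sorted({r["site"] for r in records})
    for held_out in sites:
        train = [r for r in records if r["site"] != held_out]
        test = [r for r in records if r["site"] == held_out]
        yield held_out, train, test


# Hypothetical ECG records from three institutions.
records = [
    {"site": "hospital_A", "patient_id": "p1", "ecg_id": "e1"},
    {"site": "hospital_A", "patient_id": "p1", "ecg_id": "e2"},
    {"site": "hospital_B", "patient_id": "p2", "ecg_id": "e3"},
    {"site": "hospital_B", "patient_id": "p3", "ecg_id": "e4"},
    {"site": "hospital_C", "patient_id": "p4", "ecg_id": "e5"},
]
for site, train, test in leave_one_site_out(records):
    print(f"held out {site}: {len(train)} training ECGs, {len(test)} test ECGs")
```

Performance that drops on a held-out site, relative to a random within-site split, is a signal that the model may have learned institution-specific features rather than physiology.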

Collectively, these findings position AI-augmented ECG analysis not as a replacement for clinical expertise, but as a complementary tool with growing potential to enhance HF detection, phenotyping, and risk stratification. Similar to trends observed in other cardiovascular imaging domains, the generalizability and reliability of AI-ECG models remain strongly dependent on dataset scale, population diversity, and rigorous external validation.

Clinical implications

A subset of studies in this review explicitly evaluated clinically meaningful outcomes beyond diagnostic classification. Multiple investigations demonstrated associations between AI-derived ECG outputs and all-cause mortality, cardiovascular mortality, hospitalization, major adverse cardiac events, arrhythmia risk, and sudden cardiac death (2,4,13,16). These findings suggest that AI-ECG models may provide valuable prognostic information that complements traditional clinical risk stratification.

Other studies focused on improving detection of reduced ejection fraction or identifying preserved ejection fraction-related phenotypes and structural abnormalities, such as left atrial enlargement, left atrial myopathy, or left ventricular hypertrophy (6,7,18). While improved detection may facilitate earlier intervention, diagnostic performance alone does not necessarily translate into improved patient outcomes unless it alters clinical decision-making or management pathways. As such, prospective studies designed to evaluate downstream clinical impact remain necessary.

Integration with clinical workflows

Several studies demonstrated the successful integration of AI-ECG tools into real-world clinical workflows. Pragmatic and cluster-randomized investigations showed that embedding AI-ECG into primary care or emergency department settings increased detection of left ventricular systolic dysfunction and optimized use of echocardiography resources (12,30). These findings suggest that AI-ECG systems may improve efficiency and support earlier referral without substantially increasing clinician burden.
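Whether AI-ECG alerts translate into efficient echocardiography referrals depends not only on sensitivity and specificity but also on disease prevalence in the screened population. By Bayes' theorem, PPV = sens·prev / (sens·prev + (1 − spec)(1 − prev)), and 1/PPV approximates the number of echocardiograms ordered per true case found. The 85% sensitivity and specificity below are hypothetical round numbers, not values from a specific study.

```python
def ppv(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Positive predictive value via Bayes' theorem."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


def npv(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Negative predictive value via Bayes' theorem."""
    true_neg = specificity * (1.0 - prevalence)
    false_neg = (1.0 - sensitivity) * prevalence
    return true_neg / (true_neg + false_neg)


SENS, SPEC = 0.85, 0.85  # hypothetical round numbers
for prevalence in (0.02, 0.10, 0.30):
    p = ppv(SENS, SPEC, prevalence)
    print(f"prevalence {prevalence:.0%}: PPV {p:.2f}, "
          f"NPV {npv(SENS, SPEC, prevalence):.3f}, "
          f"~{1 / p:.1f} echocardiograms per true case")
```

The same model yields many more false-positive referrals in a low-prevalence community screen than in a high-prevalence emergency department population, while a negative result remains highly reassuring in both.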

Wearable and portable ECG platforms further extend the potential reach of AI-ECG models. Studies using smartwatch-derived or reduced-lead ECGs demonstrated diagnostic performance comparable to conventional 12-lead recordings (21,44). Such approaches may expand access to early screening, particularly in underserved or resource-limited settings, although robust prospective validation remains limited.

Economic and health system considerations

Despite frequent claims of improved efficiency, formal cost-effectiveness analyses were uncommon across the included studies. One investigation demonstrated favorable quality-adjusted life year estimates and potential cost savings associated with opportunistic AI-ECG screening for asymptomatic left ventricular dysfunction (24). However, most studies did not quantify economic outcomes directly. Future research should incorporate standardized health economic evaluations alongside diagnostic performance metrics to inform system-level adoption decisions.

Limitations of current evidence

The existing literature is limited by substantial methodological heterogeneity. Most studies were retrospective and single-center, with limited reporting of blinding, preprocessing pipelines, and external validation. Performance variability across institutions and demographic subgroups was noted, raising concerns regarding generalizability and equity (8,19). Additionally, only a minority of studies incorporated explainability techniques, limiting transparency and clinician trust.

Future research directions

Future investigations should prioritize prospective, multicenter study designs and adhere to standardized reporting frameworks such as TRIPOD+AI. External validation across diverse populations, incorporation of explainable AI methods, and evaluation of effects on clinical decision-making and patient outcomes are essential. Randomized and pragmatic trials assessing real-world impact, including cost-effectiveness and workflow efficiency, will be critical to establishing the true clinical value of AI-augmented ECG analysis in HF care.

Conclusions

This narrative review indicates that AI-augmented ECG analysis has substantial potential to enhance the accuracy, consistency, and efficiency of HF diagnosis, phenotyping, and risk prediction. Across the included studies, most AI-ECG models achieved strong diagnostic and prognostic performance, with reported discrimination often approaching or matching that of expert clinician interpretation. These findings support the role of AI-ECG as a diagnostic support tool that complements, rather than replaces, conventional clinical assessment.

However, improved model performance alone does not necessarily translate into improved clinical outcomes. While many studies focused on detecting left ventricular systolic dysfunction, HF subtypes, and structural or electrophysiologic abnormalities, only a limited subset directly evaluated patient-centered or workflow-related outcomes, such as reductions in unnecessary echocardiography, improved triage efficiency, cost-effectiveness, or downstream clinical decision-making. As a result, the current evidence remains largely descriptive and observational.

Accordingly, existing data support the use of AI-ECG as an adjunctive screening and risk-stratification tool but do not yet establish that widespread implementation meaningfully improves patient outcomes at the population level. To address this gap, future research should prioritize prospective, multicenter clinical studies that evaluate patient-centered endpoints alongside diagnostic performance. Formal cost-effectiveness analyses, external validation across diverse populations, and incorporation of explainable AI methods are also essential to ensure safe, transparent, and equitable adoption.

By addressing these limitations, AI-enhanced ECG analysis may evolve from a high-performing diagnostic technology into a clinically transformative tool that improves early detection, optimizes resource utilization, and supports personalized HF care in real-world settings.

Acknowledgments

During the preparation of this work, the author(s) used ChatGPT 4.0–5.0 to improve language and readability, with caution. After using this tool/service, the author(s) reviewed and edited the content as needed and take(s) full responsibility for the content of the publication.

Footnote

Reporting Checklist: The authors have completed the Narrative Review reporting checklist. Available at https://jmai.amegroups.com/article/view/10.21037/jmai-2025-178/rc

Peer Review File: Available at https://jmai.amegroups.com/article/view/10.21037/jmai-2025-178/prf

Funding: None.

Conflicts of Interest: All authors have completed the ICMJE uniform disclosure form (available at https://jmai.amegroups.com/article/view/10.21037/jmai-2025-178/coif). The authors have no conflicts of interest to declare.

Ethical Statement: The authors are accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved.

Open Access Statement: This is an Open Access article distributed in accordance with the Creative Commons Attribution-NonCommercial-NoDerivs 4.0 International License (CC BY-NC-ND 4.0), which permits the non-commercial replication and distribution of the article with the strict proviso that no changes or edits are made and the original work is properly cited (including links to both the formal publication through the relevant DOI and the license). See: https://creativecommons.org/licenses/by-nc-nd/4.0/.

References

1. Lu L, Zhu T, Ribeiro AH, et al. Decoding 2.3 million ECGs: interpretable deep learning for advancing cardiovascular diagnosis and mortality risk stratification. Eur Heart J Digit Health 2024;5:247-59.
2. Baek YS, Lee DH, Jo Y, et al. Artificial intelligence-estimated biological heart age using a 12-lead electrocardiogram predicts mortality and cardiovascular outcomes. Front Cardiovasc Med 2023;10:1137892.
3. Kanaji Y, Ozcan I, Tryon DN, et al. Predictive Value of Artificial Intelligence-Enabled Electrocardiography in Patients With Takotsubo Cardiomyopathy. J Am Heart Assoc 2024;13:e031859.
4. Liu CM, Hsieh ME, Hu YF, et al. Artificial Intelligence-Enabled Model for Early Detection of Left Ventricular Hypertrophy and Mortality Prediction in Young to Middle-Aged Adults. Circ Cardiovasc Qual Outcomes 2022;15:e008360.
5. Attia ZI, Dugan J, Rideout A, et al. Automated detection of low ejection fraction from a one-lead electrocardiogram: application of an AI algorithm to an electrocardiogram-enabled Digital Stethoscope. Eur Heart J Digit Health 2022;3:373-9.
6. Chou CC, Liu ZY, Chang PC, et al. Comparing Artificial Intelligence-Enabled Electrocardiogram Models in Identifying Left Atrium Enlargement and Long-term Cardiovascular Risk. Can J Cardiol 2024;40:585-94.
7. Kwon JM, Kim KH, Eisen HJ, et al. Artificial intelligence assessment for early detection of heart failure with preserved ejection fraction based on electrocardiographic features. Eur Heart J Digit Health 2020;2:106-16. Erratum in: Eur Heart J Digit Health 2021;3:115-6.
8. Yagi R, Goto S, Katsumata Y, et al. Importance of external validation and subgroup analysis of artificial intelligence in the detection of low ejection fraction from electrocardiograms. Eur Heart J Digit Health 2022;3:654-7.
9. Whyte S, Sample K, Mustafina I, et al. Diagnostic accuracy of a mobile AI-guided 12-lead ECG device. Circulation 2024;21:S82-3.
10. Al-Messabi HS, Al-Ali FM, Al-Obeidat F. Exploring the Potential of Analytical Models in Heart Disease Prediction. Int J Inf Sci Technol 2024;8:17-34.
11. König S, Hohenstein S, Nitsche A, et al. Artificial intelligence-based identification of left ventricular systolic dysfunction from 12-lead electrocardiograms: external validation and advanced application of an existing model. Eur Heart J Digit Health 2023;5:144-51.
12. De Michieli L, Knott JD, Attia ZI, et al. Artificial intelligence-augmented electrocardiography for left ventricular systolic dysfunction in patients undergoing high-sensitivity cardiac troponin T. Eur Heart J Acute Cardiovasc Care 2023;12:106-14.
13. Shiraishi Y, Goto S, Niimi N, et al. Improved prediction of sudden cardiac death in patients with heart failure through digital processing of electrocardiography. Europace 2023;25:922-30.
14. Siontis KC, Noseworthy PA, Attia ZI, et al. Artificial intelligence-enhanced electrocardiography in cardiovascular disease management. Nat Rev Cardiol 2021;18:465-78.
15. Khurshid S, Friedman S, Pirruccello JP, et al. Deep Learning to Predict Cardiac Magnetic Resonance-Derived Left Ventricular Mass and Hypertrophy From 12-Lead ECGs. Circ Cardiovasc Imaging 2021;14:e012281.
16. Tsai DJ, Lou YS, Lin CS, et al. Mortality risk prediction of the electrocardiogram as an informative indicator of cardiovascular diseases. Digit Health 2023;9:20552076231187247.
17. Vaid A, Johnson KW, Badgeley MA, et al. Using Deep-Learning Algorithms to Simultaneously Identify Right and Left Ventricular Dysfunction From the Electrocardiogram. JACC Cardiovasc Imaging 2022;15:395-410.
18. Verbrugge FH, Reddy YNV, Attia ZI, et al. Detection of Left Atrial Myopathy Using Artificial Intelligence-Enabled Electrocardiography. Circ Heart Fail 2022;15:e008176.
19. Kaur D, Hughes JW, Rogers AJ, et al. Race, Sex, and Age Disparities in the Performance of ECG Deep Learning Models Predicting Heart Failure. Circ Heart Fail 2024;17:e010879.
20. Kashou AH, Medina-Inojosa JR, Noseworthy PA, et al. Artificial Intelligence-Augmented Electrocardiogram Detection of Left Ventricular Systolic Dysfunction in the General Population. Mayo Clin Proc 2021;96:2576-86.
21. Kwon JM, Jo YY, Lee SY, et al. Artificial Intelligence-Enhanced Smartwatch ECG for Heart Failure-Reduced Ejection Fraction Detection by Generating 12-Lead ECG. Diagnostics (Basel) 2022;12:654.
22. De Michieli L, Knott J, Attia Z, et al. Artificial intelligence-enabled electrocardiographic algorithm for the detection of left ventricular dysfunction in emergency department patients undergoing high-sensitivity cardiac troponin T. Eur Heart J 2022;43:ehac544.1330.
23. Jentzer JC, Kashou AH, Attia ZI, et al. Left ventricular systolic dysfunction identification using artificial intelligence-augmented electrocardiogram in cardiac intensive care unit patients. Int J Cardiol 2021;326:114-23.
24. Liu WT, Hsieh PH, Lin CS, et al. Opportunistic Screening for Asymptomatic Left Ventricular Dysfunction With the Use of Electrocardiographic Artificial Intelligence: A Cost-Effectiveness Approach. Can J Cardiol 2024;40:1310-21.
25. Park Y, Kim KG, Joo S, et al. AI algorithms for ECG-based heart failure detection. Circulation 2023;148:A14208.
26. Lee HS, Kang S, Jang JH, et al. AI-ECG prediction of 1-year mortality in HFrEF. Circulation 2022;146:A13550.
27. Lee HS, Jang JH, Kang S, et al. Visualization of AI detection of heart failure subtypes from ECG. Circulation 2022;146:A13558.
28. Haleem MS, Castaldo R, Pagliara SM, et al. Time adaptive ECG driven cardiovascular disease detector. Biomed Signal Process Control 2021;70:102968.
29. Yang W, Si Y, Zhang G, et al. THC-Net for automated congestive heart failure recognition. Inf Sci 2021;575:112-24.
30. Yao X, Rushlow D, Inselman J, et al. Artificial Intelligence-Enhanced ECG Identification of Low Ejection Fraction: A Pragmatic, Cluster-Randomized Clinical Trial. Health Serv Res 2021;56:28-9.
31. Cho J, Lee B, Kwon JM, et al. Artificial Intelligence Algorithm for Screening Heart Failure with Reduced Ejection Fraction Using Electrocardiography. ASAIO J 2021;67:314-21.
32. Dhingra LS, Aminorroaya A, Pedroso AF, et al. AI-enabled prediction of heart failure risk from single-lead ECGs. JAMA Cardiol 2025;10:574-84.
33. Dhingra LS, Aminorroaya A, Sangha V, et al. Heart failure risk stratification using artificial intelligence applied to electrocardiogram images: a multinational study. Eur Heart J 2025;46:1044-53.
34. Sau A, Barker J, Pastika L, et al. Artificial Intelligence-Enhanced Electrocardiography for Prediction of Incident Hypertension. JAMA Cardiol 2025;10:214-23.
35. Dhar R, Hossen MR, Gamage PT, et al. AI-based prediction of HF readmission using multimodal biosignals. IEEE J Biomed Health Inform 2023;27:5767-74.
36. Dhingra LS, Aminorroaya A, Sangha V, et al. Ensemble Deep Learning Algorithm for Structural Heart Disease Screening Using Electrocardiographic Images: PRESENT SHD. J Am Coll Cardiol 2025;85:1302-13.
37. Klein L, Fudim M, Etemadi M, et al. Noninvasive PCWP estimation using wearable sensing and AI. JACC Heart Fail 2025;13:617-27.
38. Hasumi E, Fujiu K, Chen Y, et al. Heart failure monitoring with a single lead electrocardiogram at home. Int J Cardiol 2025;432:133203.
39. Devkota A, Prajapati R, El-Wakeel A, et al. AI analysis for ejection fraction estimation from 12-lead ECG. Sci Rep 2025;15:13502.
40. Cheema B, Tibrewala A. SEISMIC-HF 1: key findings from AHA24 and implications for remote cardiac monitoring. Heart Fail Rev 2025;30:1099-101.
41. Hou Y, Fan Z, Li J, et al. Deep learning-based 12-lead electrocardiogram for low left ventricular ejection fraction detection in patients. Can J Cardiol 2025;41:278-90.
42. Jentzer JC, Lee E, Attia Z, et al. Artificial Intelligence ECG Diastolic Dysfunction and Survival in Cardiac Intensive Care Unit Patients. J Am Heart Assoc 2025;14:e037839.
43. Kim DY, Lee SW, Lee DH, et al. Electrocardiography-based artificial intelligence predicts the upcoming future of heart failure with mildly reduced ejection fraction. Eur Heart J Digit Health 2025;12:1418914.
44. Lee MS, Jang JH, Kang S, et al. Transparent and Robust Artificial Intelligence-Driven Electrocardiogram Model for Left Ventricular Systolic Dysfunction. Diagnostics (Basel) 2025;15:1837.
45. Santana Júnior WB, Pinto Filho MM, Barreto SM, et al. Use of Artificial Intelligence Applied to Electrocardiogram for Diagnosis of Left Ventricular Systolic Dysfunction. Arq Bras Cardiol 2025;122:e20240740.
46. Lim J, Lee HS, Han GI, et al. Artificial intelligence-enhanced six-lead portable electrocardiogram device for detecting left ventricular systolic dysfunction: a prospective single-centre cohort study. Eur Heart J Digit Health 2025;6:476-85.
47. Karabayir I, Davis R, Tootooni MS, et al. Abstract 4143513: ECG-AI to Assist with the Classification of Low Ejection Fraction and Heart Failure with Preserved Ejection Fraction. Circulation 2024;150:4143513.
48. Kelshiker MA, Chhatwal K, Nakhare S, et al. AI-enabled stethoscope for heart failure detection in primary care. Eur Heart J 2025;46:ehaf784.4424.
49. Lee MS, Lim J, Kang S, et al. Multi-ethnic foundation AI-ECG for LV filling pressure detection. Nat Med 2025;31:678-87.
50. Sau A, Pastika L, Sieliwonczyk E, et al. AI-ECG prediction of mortality, arrhythmia, and cardiovascular disease. Eur Heart J Digit Health 2024;5:zhad245.
51. Croon PM, Dhingra L, Sangha V, et al. PheWAS of AI-enhanced ECG interpretation in UK Biobank. Circulation 2025;152:761-73.
52. Knorr MS, Brederecke J, Bremer JP, et al. Improving cardiovascular risk stratification with ECG-predicted risk factors in primary prevention. Eur Heart J Digit Health 2024;5:ezead087.
53. Bandyopadhyay S, Chiu IM, Liu S, et al. Deep learning-reconstructed ECGs for continuous EF monitoring. Circulation 2024;150:A4142999.
54. Attia ZI, Harmon DM, Behr ER, et al. Application of artificial intelligence to the electrocardiogram. Eur Heart J 2021;42:4717-30.

Cite this article as: Chepkunov D, Chung D, Ghanbari K, Khanna B, Yakubov N, Babu M, Babu B. Artificial intelligence-augmented electrocardiogram for heart failure diagnosis and risk prediction: a narrative review. J Med Artif Intell 2026;9:36.