Letter to the Editor

Artificial intelligence and the limits of predictability in medicine

Jacob T. Rosenthal1 ORCID logo, Mert R. Sabuncu2,3 ORCID logo

1Tri-Institutional MD-PhD Program of Weill Cornell/Rockefeller/Sloan Kettering, New York, NY, USA; 2Department of Radiology, Weill Cornell Medicine, New York, NY, USA; 3School of Electrical and Computer Engineering, Cornell Tech and Cornell University, New York, NY, USA

Correspondence to: Mert R. Sabuncu, PhD. Department of Radiology, Weill Cornell Medicine, 2 W Loop Rd., New York, NY 10044, USA; School of Electrical and Computer Engineering, Cornell Tech and Cornell University, New York, NY, USA. Email: msabuncu@cornell.edu.

Received: 06 March 2025; Accepted: 12 June 2025; Published online: 22 August 2025.

doi: 10.21037/jmai-2025-92


Introduction

The practice of medicine is fundamentally about making predictions. Clinical management of patients rests on predicting the future and intervening to yield the best possible outcome. Even tasks that involve contemporaneous concepts can be framed as predictive: diagnosis, for example, can be viewed as predicting the underlying pathology from available information (e.g., test results) and observations (e.g., clinical presentation). With its emphasis on prediction, machine learning, the workhorse of modern artificial intelligence (AI), is therefore particularly well-suited to medicine.

The AI revolution, and especially the advent of deep learning and large language models (LLMs), illustrates the power of prediction. The basic task of predicting what comes next in a sentence, when trained on massive internet-scale corpora of text, can be used to build ever more complex and capable language models. These powerful prediction engines have found potential applications across fields of medicine, ranging from diagnostics to patient management and workflow tools. How far will these advances take us? Will AI ever be able to “solve” medicine by enabling perfect predictions, taking decision-making processes out of the hands of humans?


Sources of unpredictability

More than 30 years of research from fields such as physics, weather forecasting, and sociology has established the existence of concrete limits to prediction. These bounds arise because there are certain intrinsic sources of error and uncertainty in every non-trivial predictive task (1). This is sometimes referred to as aleatoric uncertainty and is closely related to the concept of Bayes error rates in detection theory.
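For reference, the Bayes error rate formalizes this floor: it is the error incurred by an ideal decision rule that knows the true conditional distribution of outcomes given the available information. The expression below is the standard textbook formulation for a classification task, not a result drawn from the works cited here.

% Bayes error rate for predicting a label y from observed information x.
% Even the optimal rule, which always picks the most probable class,
% errs whenever P(y | x) is not concentrated on a single outcome.
\[
  E_{\mathrm{Bayes}} \;=\; \mathbb{E}_{x}\!\left[\, 1 - \max_{y} P(y \mid x) \,\right]
\]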

Measurement error is one fundamental source of noise due to the limitations of the instruments and sensors by which we collect information from the world. In medical contexts, where supervised machine learning models are often trained on labels created by clinical experts, measurement error can be introduced through inter- and intra-observer variability (2,3); this can be viewed as a form of instrument noise, where the instrument is the expert labeling the data.
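To illustrate how this "instrument noise" caps measurable performance, the illustrative Python sketch below (all numbers hypothetical) scores a predictor that always recovers the true underlying state against reference labels that disagree with that state at an assumed inter-observer disagreement rate; the measured accuracy is bounded near one minus the disagreement rate even though the predictor itself is perfect.

import numpy as np

rng = np.random.default_rng(0)

n = 10_000            # hypothetical number of cases
noise_rate = 0.15     # assumed inter-observer disagreement rate (illustrative)

# True underlying binary states (e.g., disease present/absent).
truth = rng.integers(0, 2, size=n)

# Reference labels from an expert "instrument" that disagrees with the
# truth at the assumed noise rate.
flip = rng.random(n) < noise_rate
labels = np.where(flip, 1 - truth, truth)

# A hypothetical perfect predictor that always outputs the true state.
predictions = truth.copy()

# Measured accuracy is capped near (1 - noise_rate), even though the
# predictor makes no errors with respect to the underlying truth.
measured_accuracy = (predictions == labels).mean()
print(f"Measured accuracy against noisy labels: {measured_accuracy:.3f}")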

Another source of error stems from limitations in computation, which restrict the capacity of our models, built from a finite set of parameters, to capture the infinitely complex and sometimes chaotic natural systems driving causal relationships in the physical world; this fundamental limitation is known as dimensional distortion.

Lack of data can also bound the predictive ability of models, and assembling large enough datasets to train AI models can be particularly difficult in biomedical applications because of data privacy protections and the low prevalence of rare conditions. Time lags in measurements can also make it difficult to capture temporal trends; this can lead to data obsolescence, increasing prediction error on more recent data. Finally, some natural processes may exhibit irreducible randomness, such as random mutation and genetic drift in evolution (4) and indeterminacy in quantum physics (5).

These various sources of error combine and compound non-linearly as prediction horizons are extended and/or involve more complex mechanisms, leading to fundamental limits beyond which the noise from various sources of error overwhelms any true signal, leaving no possibility of perfectly accurate inferences. This frontier is termed the limit of predictability, and it provides a helpful framework for understanding the implications of the rapid advancements in medical AI.
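In the predictability literature, this compounding is often idealized as exponential error growth: a small initial uncertainty is amplified until it saturates at the natural variability of the system, beyond which forecasts carry no usable signal. The expression below is that standard chaos-theory idealization, not a result from the works cited in this letter.

% Idealized error growth: an initial uncertainty epsilon_0 is amplified at a
% rate set by the leading Lyapunov exponent lambda until it reaches the
% saturation level epsilon_sat, which bounds the usable forecast horizon.
\[
  \epsilon(t) \;\approx\; \epsilon_0\, e^{\lambda t},
  \qquad
  t_{\mathrm{limit}} \;\sim\; \frac{1}{\lambda}\,
  \ln\!\left(\frac{\epsilon_{\mathrm{sat}}}{\epsilon_0}\right)
\]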


Characterizing the limits

Knowing that there are fundamental limits to predictability, scientists in multiple fields have worked to characterize the performance ceiling. In weather forecasting, for example, even the most sophisticated models are incapable of reliably predicting high-impact weather events beyond 2 weeks out (6). Empirical analysis of population-level data has estimated the predictability limit for human mobility to be 93% under certain conditions (7). In the medical domain, predictability bounds on the progression of diseases have been characterized using electronic health records (8). While the exact ceiling will vary by context, both theoretical and empirical evidence agree that there will always be some frontier beyond which accurate predictions are fundamentally impossible. An important gap in the medical AI literature is the lack of tools and experimental studies for characterizing these limits.
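The mobility bound reported in (7) was obtained by estimating the entropy rate of each individual's trajectory and inverting Fano's inequality to obtain the maximum achievable prediction accuracy. A minimal Python sketch of that final step is shown below, assuming the entropy rate and the number of possible states have already been estimated elsewhere (e.g., from longitudinal patient records); the specific numbers in the example are hypothetical.

import numpy as np
from scipy.optimize import brentq

def max_predictability(entropy_rate: float, n_states: int) -> float:
    """Upper bound on predictability Pi_max implied by Fano's inequality.

    Solves S = H(Pi) + (1 - Pi) * log2(n_states - 1) for Pi, where H(Pi)
    is the binary entropy function. This mirrors the approach of Song et
    al. (Science 2010); the inputs are assumed to be estimated elsewhere.
    """
    def binary_entropy(p):
        if p in (0.0, 1.0):
            return 0.0
        return -p * np.log2(p) - (1 - p) * np.log2(1 - p)

    def fano_gap(pi):
        return binary_entropy(pi) + (1 - pi) * np.log2(n_states - 1) - entropy_rate

    # Pi_max lies between chance level (1/n_states) and certainty (1).
    return brentq(fano_gap, 1.0 / n_states, 1.0 - 1e-12)

# Hypothetical example: an entropy rate of 0.8 bits over 5 possible states.
print(f"Predictability ceiling: {max_predictability(0.8, 5):.2%}")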

On the other hand, it is not uncommon for research papers in medical AI to report predictive performance seemingly close to perfection, such as accuracy values well above 90% (9). We argue that it is important to be skeptical of such results and properly contextualize them. In some cases, the predictive tasks used in research have been artificially simplified from the real world, so that dimensional distortion no longer applies, and computational models trained on limited data are able to perform well. Information may also be propagated in the research setting along avenues that would not be available in the real world. Such leakage could occur due to basic errors such as overlap between training and test sets, or in much more subtle ways stemming from data curation and analysis. Overfitting to the idiosyncrasies of the research data may also occur, hurting the model’s predictive power on other populations; the use of independent external validation datasets can help ameliorate but not completely prevent this. In general, greater predictive power should be viewed with greater skepticism and presented with a discussion of the fundamental sources of error and the intrinsic limits of the predictive task at hand.
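A common and preventable instance of such leakage arises when data are split at the level of individual records rather than patients, so that samples from the same person end up in both the training and test sets. The sketch below, using a hypothetical table with a patient_id column, shows a patient-level split with scikit-learn and an explicit check that the resulting patient sets are disjoint.

import pandas as pd
from sklearn.model_selection import GroupShuffleSplit

# Hypothetical dataset: several records (e.g., images or encounters) per patient.
df = pd.DataFrame({
    "patient_id": [1, 1, 2, 2, 3, 3, 4, 5, 5, 6],
    "feature":    [0.2, 0.3, 0.8, 0.7, 0.1, 0.2, 0.9, 0.5, 0.4, 0.6],
    "label":      [0, 0, 1, 1, 0, 0, 1, 0, 0, 1],
})

# Split at the patient level so no individual contributes to both sets.
splitter = GroupShuffleSplit(n_splits=1, test_size=0.3, random_state=0)
train_idx, test_idx = next(splitter.split(df, groups=df["patient_id"]))
train, test = df.iloc[train_idx], df.iloc[test_idx]

# Explicit leakage check: the patient ID sets must be disjoint.
overlap = set(train["patient_id"]) & set(test["patient_id"])
assert not overlap, f"Patient-level leakage detected: {overlap}"
print(f"Train patients: {sorted(set(train['patient_id']))}, "
      f"test patients: {sorted(set(test['patient_id']))}")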


Improving prediction in medicine

To push back the limits of prediction, focused initiatives are necessary. Scale is often useful to achieve better performance, and many of the recent successes in medical AI have largely been driven by bigger models, more compute power, and more data. Yet while these might help squeeze every bit of useful information from our data to arrive at the ceiling of predictability, they will not allow us to surpass the fundamental limits of prediction. To truly expand the horizon of predictability for human health, it is necessary to examine the sources of error and target them directly.

New data modalities, ranging from omics assays to wearables, might capture new information about the world not contained in other data sources, thus helping to reduce dimensional distortion. Targeted data collection can also be used to help reduce known sources of uncertainty. For instance, targeted genomic profiling of clonal heterogeneity within tumors allows for better forecasts in precision medicine by improving knowledge of the initial biological conditions (4). Incorporating information about social, behavioral and lifestyle factors can provide essential context to guide interpretation of other biometric data sources at the patient level, allowing for improved predictions.

Similarly, data collection among underrepresented populations can help reduce systematic error by providing more complete information about the world. Investments in better sensors can also reduce instrument error and improve resolution, but may be quite costly: studies in meteorology found that the resolution of atmospheric sensors must be increased by an order of magnitude just to extend weather predictions by 3–5 days (6). Analogously, investments in more powerful magnetic resonance imaging (MRI) scanners can be expected to improve diagnostic predictions, but likely at great expense and possibly with diminishing returns. Finally, advances in causal modeling and basic science also play an important role in constructing better prediction models. However, even high-performing models may be met with resistance from clinicians if their predictions lack a clear causal rationale, especially in high-stakes decisions with major clinical consequences.


Embracing imperfect predictions in medicine

As medicine enters the era of AI, human health certainly stands to benefit from improved predictive models. Physicians should embrace the potential of new technologies to advance the frontiers of healthcare, but also recognize that there are inherent limits to our understanding of the world and the capabilities of our most sophisticated AI models (10,11). These limits underscore the importance of ethical caution in high-stakes clinical decision-making, where human judgment and contextual understanding remain essential. The fundamental limit of predictability provides a useful framework for understanding how targeted efforts can expand prediction horizons by reducing sources of error, but also why this error is impossible to eliminate completely. Because medicine itself is a predictive enterprise, this helps contextualize why human health will never be fully “solved” by AI alone. The quality and utility of AI predictions will depend on the data we acquire, shaped by our understanding of health and disease and our ability to measure relevant factors, and the tasks we assign to the models. Medical AI should not be seen as an infallible oracle but as a powerful tool within a framework of uncertainty—one that, when properly understood, can enhance clinical decision-making and improve patient outcomes.


Acknowledgments

None.


Footnote

Provenance and Peer Review: This article was a standard submission to the journal. The article has undergone external peer review.

Peer Review File: Available at https://jmai.amegroups.com/article/view/10.21037/jmai-2025-92/prf

Funding: The work was supported in part by a Medical Scientist Training Program grant from the National Institute of General Medical Sciences of the NIH under award No. T32GM152349 to the Weill Cornell/Rockefeller/Sloan Kettering Tri-Institutional MD-PhD Program to J.T.R.

Conflicts of Interest: Both authors have completed the ICMJE uniform disclosure form (available at https://jmai.amegroups.com/article/view/10.21037/jmai-2025-92/coif). J.T.R. was supported by a Medical Scientist Training Program grant from the National Institute of General Medical Sciences of the NIH under award No. T32GM152349 to the Weill Cornell/Rockefeller/Sloan Kettering Tri-Institutional MD-PhD Program. The other author has no conflicts of interest to declare.

Ethical Statement: The authors are accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved.

Open Access Statement: This is an Open Access article distributed in accordance with the Creative Commons Attribution-NonCommercial-NoDerivs 4.0 International License (CC BY-NC-ND 4.0), which permits the non-commercial replication and distribution of the article with the strict proviso that no changes or edits are made and the original work is properly cited (including links to both the formal publication through the relevant DOI and the license). See: https://creativecommons.org/licenses/by-nc-nd/4.0/.


References

  1. Kravtsov YA. Fundamental and Practical Limits of Predictability. In: Kravtsov YA, editor. Limits of Predictability. Berlin, Heidelberg: Springer Berlin Heidelberg; 1993:173-203.
  2. Elsheikh TM, Asa SL, Chan JK, et al. Interobserver and intraobserver variation among experts in the diagnosis of thyroid follicular lesions with borderline nuclear features of papillary carcinoma. Am J Clin Pathol 2008;130:736-44. [Crossref] [PubMed]
  3. Popović ZB, Thomas JD. Assessing observer variability: a user’s guide. Cardiovasc Diagn Ther 2017;7:317-24. [Crossref] [PubMed]
  4. Lipinski KA, Barber LJ, Davies MN, et al. Cancer Evolution and the Limits of Predictability in Precision Cancer Medicine. Trends Cancer 2016;2:49-63. [Crossref] [PubMed]
  5. Kochen S, Specker EP. The Problem of Hidden Variables in Quantum Mechanics. Journal of Mathematics and Mechanics 1967;17:59-87.
  6. Zhang F, Sun YQ, Magnusson L, et al. What Is the Predictability Limit of Midlatitude Weather? Journal of the Atmospheric Sciences 2019;76:1077-91.
  7. Song C, Qu Z, Blumm N, et al. Limits of predictability in human mobility. Science 2010;327:1018-21. [Crossref] [PubMed]
  8. Dahlem D, Maniloff D, Ratti C. Predictability Bounds of Electronic Health Records. Sci Rep 2015;5:11865. [Crossref] [PubMed]
  9. Kumar Y, Koul A, Singla R, et al. Artificial intelligence in disease diagnosis: a systematic literature review, synthesizing framework and future research agenda. J Ambient Intell Humaniz Comput 2023;14:8459-86. [Crossref] [PubMed]
  10. Khan B, Fatima H, Qureshi A, et al. Drawbacks of Artificial Intelligence and Their Potential Solutions in the Healthcare Sector. Biomed Mater Devices 2023; Epub ahead of print. [Crossref]
  11. Kolla L, Parikh RB. Uses and limitations of artificial intelligence for oncology. Cancer 2024;130:2101-7. [Crossref] [PubMed]
Cite this article as: Rosenthal JT, Sabuncu MR. Artificial intelligence and the limits of predictability in medicine. J Med Artif Intell 2026;9:29.
