Leveraging artificial intelligence for real-world evidence generation in healthcare: a review of lessons across 58 case reflections
Review Article

Alexandros Sagkriotis ORCID logo

Independent Consultant in Real-World Evidence and Health Data Science, London, UK

Correspondence to: Alexandros Sagkriotis, MSc, BSc. Independent Consultant in Real-World Evidence and Health Data Science, 4 Hillside road, Chorleywood, WD3 5AP, UK. Email: asagkriotis@gmail.com.

Abstract: Artificial intelligence (AI) is revolutionizing healthcare by enhancing real-world evidence (RWE) generation, clinical decision-making, and operational efficiency. Between 2019 and 2023, the number of medical AI publications increased 4.4-fold—from 1,480 to 6,450 articles—reflecting its accelerating adoption. Drawing on 58 professional reflections, this paper presents an integrative synthesis of AI applications across multiple therapeutic areas, including oncology, ophthalmology, and dermatology. Furthermore, it illustrates use cases in predictive modelling, image analysis, protocol automation, drug discovery, and patient engagement. The narrative also discusses regulatory frameworks, methodological limitations, and ethical considerations. The growing adoption of AI by regulators such as the European Medicines Agency (EMA) and Health Technology Assessment (HTA) bodies underscores the need for structured governance, transparency, and human oversight. This manuscript provides a real-world, cross-disciplinary perspective on AI’s expanding role in healthcare evidence generation. Key findings of this review include the practical integration of AI in protocol automation, image-based diagnostics, social media insights, and regulatory modelling. The review concludes that AI, when responsibly governed, can significantly accelerate evidence generation while maintaining clinical relevance and ethical integrity. These insights may guide future research on explainable and scalable AI tools and inform clinical and policy frameworks to embed AI safely into real-world healthcare systems. In addition, this review distinguishes itself from existing literature by focusing not solely on technical performance but on lived implementation experience across cross-functional domains.
By highlighting lessons learned, implementation pitfalls, and real-world adoption patterns, the review provides actionable intelligence for stakeholders beyond data scientists—such as policy makers, HTA assessors, and medical affairs professionals. For instance, the role of AI in automating unstructured data extraction or in generating HTA-relevant modelling scenarios is often underrepresented in traditional reviews. This paper attempts to close that gap. Furthermore, the paper highlights the complementary nature of AI with traditional statistical approaches and emphasizes the need for human-in-the-loop governance and explainability, especially in high-stakes regulatory or diagnostic contexts. These expanded insights can support more responsible adoption of AI technologies across healthcare systems globally.

Keywords: Artificial intelligence (AI); real-world evidence (RWE); healthcare; regulatory science; machine learning (ML)


Received: 26 June 2025; Accepted: 23 September 2025; Published online: 11 December 2025.

doi: 10.21037/jmai-2025-157


Introduction

Background

Publications on medical artificial intelligence (AI) have increased nearly 4.4-fold between 2019 and 2023, rising from 1,480 to 6,450 papers, and reflecting an average annual growth rate of approximately 27%, according to a PubMed-based bibliometric analysis (1). AI encompasses a broad spectrum of technologies, including machine learning (ML), deep learning (DL), natural language processing (NLP), and large language models (LLMs), all of which are transforming the healthcare landscape (2,3). These technologies are not only enhancing diagnostic accuracy and personalized care but also reshaping how clinical evidence is generated, evaluated, and implemented (4,5). Projections suggest that at least 60% of today’s jobs will be transformed by AI technologies within the next 25 years, necessitating new skillsets and cultural shifts across industries, including healthcare (6). Regulatory agencies such as the European Medicines Agency (EMA) and Health Technology Assessment (HTA) bodies have acknowledged this paradigm shift by releasing guidance documents emphasizing transparency, auditability, and human oversight in AI applications (7-10).

Rationale and knowledge gap

Despite the growing body of literature on AI in healthcare, there remains a limited number of practice-oriented reviews that synthesize cross-functional and cross-therapeutic use cases specifically focused on real-world evidence (RWE) generation. Most academic publications focus on either highly technical applications or narrow disease areas, leaving a gap in accessible, real-life examples that can guide policymakers, data scientists, clinicians, and industry professionals.

Compared with prior reviews that focus primarily on methodological taxonomies, technical pipelines, or disease-specific applications of AI in RWE, this review offers a novel contribution by synthesizing 58 practice-based professional reflections authored between 2023 and 2025. These reflections provide cross-functional insights spanning regulatory science, protocol automation, medical affairs, and therapeutic domains such as oncology, ophthalmology, and dermatology. The review bridges technical AI developments with policy-relevant, implementation-focused narratives, offering a grounded understanding of how AI is being applied in diverse RWE contexts.

Objective

The objective of this paper is to provide a practice-driven synthesis of current AI applications in healthcare evidence generation, drawing on real-world experiences reflected in 58 professional LinkedIn posts. The paper illustrates concrete examples of AI integration across oncology, ophthalmology, dermatology, protocol development, medical affairs, and regulatory science, with attention to both opportunities and ongoing challenges.


Methods

This work does not follow a formal systematic review methodology. I do not apply a predefined search strategy, quality appraisal, or selection criteria. Instead, I conduct a qualitative analysis of 58 professional reflections authored between 2023 and 2025, all focusing on the role of AI in healthcare. These reflections come from a curated set of LinkedIn posts that I wrote between 2020 and July 30, 2025.

I draw inspiration for these posts from real-world discussions, stakeholder feedback, and evidence needs that I encounter in my senior roles across the pharmaceutical industry. Each post captures insights into therapeutic areas or methodological domains that directly relate to challenges I face in practice.

The table available online: https://cdn.amegroups.cn/static/public/jmai-2025-157-1.docx lists all 58 reflections and links them to their original posts, offering full traceability. My aim is to synthesize practical, experience-based insights from long-standing industry practice, rather than to deliver an exhaustive academic review.

I group the reflections thematically using manual coding in Excel, based on recurring topics and their relevance to healthcare applications. I do not use qualitative analysis software such as NVivo, nor do I assess inter-coder reliability, as this work reflects a single-author, exploratory effort. I approach the analysis reflexively, drawing on more than 30 years of experience in healthcare and industry-facing roles.

Therefore, the posts are manually categorized into thematic domains: clinical development, regulatory science, oncology, ophthalmology, dermatology, medical affairs, protocol development, drug discovery, and patient engagement. Relevant peer-reviewed literature cited in these reflections is integrated to support each thematic section.


Results

AI applications are increasingly embedded across multiple domains of healthcare evidence generation, as illustrated in Figure 1.

Figure 1 Applications of artificial intelligence across healthcare evidence generation. Visual summary of AI applications across key domains, including clinical development, regulatory science, oncology, ophthalmology, dermatology, medical affairs, protocol development, drug discovery, real-world evidence generation, and patient engagement. AI, artificial intelligence; HCP, healthcare professional; NLP, natural language processing.

NLP: AI-powered NLP models are transforming unstructured data extraction from electronic health records (EHRs), physician notes, and literature databases. NLP automates the synthesis of evidence for unsolicited healthcare professional (HCP) queries, medical information, and pharmacovigilance, reducing time to evidence dissemination and improving accuracy (5,11).

Recent implementations include the development of NLP frameworks that combine rule-based and DL methods to identify adverse event patterns in post-marketing surveillance. For example, NLP algorithms have been deployed to scan spontaneous reporting databases and social media content to detect potential safety signals earlier than traditional systems. These models leverage contextual embeddings and sentiment analysis to extract not only adverse event mentions but also patient experience and severity levels (12).
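The rule-based layer of such a pipeline can be sketched in a few lines. The adverse-event lexicon, severity cues, and sample posts below are illustrative placeholders, not a validated pharmacovigilance vocabulary; a production system would pair a curated dictionary such as MedDRA with trained named-entity-recognition and sentiment models:

```python
import re
from dataclasses import dataclass

# Illustrative vocabularies only -- not a real safety lexicon.
AE_TERMS = {"nausea", "headache", "rash", "dizziness", "anaphylaxis"}
SEVERE_CUES = {"hospital", "emergency", "unbearable", "anaphylaxis"}

@dataclass
class Mention:
    term: str       # adverse-event term detected
    severe: bool    # whether a severity cue co-occurs in the same post
    text: str       # source post, kept for traceability

def extract_mentions(posts):
    """Scan free-text posts for adverse-event terms and flag severity cues."""
    mentions = []
    for text in posts:
        tokens = set(re.findall(r"[a-z]+", text.lower()))
        for term in AE_TERMS & tokens:
            mentions.append(Mention(term, bool(SEVERE_CUES & tokens), text))
    return mentions

posts = [
    "Started the new drug last week, mild headache but manageable.",
    "Severe rash after dose two, ended up in the emergency room.",
]
found = extract_mentions(posts)
```

The appeal of keeping such a transparent rule layer alongside DL components is auditability: every flagged mention can be traced back to a specific term and source text.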

In clinical development and medical affairs, NLP engines now support automated extraction of inclusion/exclusion criteria from clinical trial protocols and matching with real-world patient records, improving trial feasibility assessments and patient recruitment strategies. Tools such as automated literature summarizers assist medical writers and Health Economics & Outcome Research (HEOR) teams in distilling thousands of abstracts into regulatory-compliant summaries (2).

Moreover, NLP has been applied to structured and semi-structured feedback from patient support programs and call center transcripts, allowing companies to extract recurring themes, such as adherence barriers or unmet needs. These insights can inform value dossiers, patient support materials, and label extensions (13).

Several organizations have begun integrating NLP-powered assistants into enterprise systems, offering real-time evidence summaries to medical science liaisons (MSLs) and enabling them to respond more accurately and quickly to HCP queries during field interactions (4,14).

LLMs: LLMs such as ChatGPT (Chat Generative Pre-trained Transformer, developed by OpenAI) are being explored to handle real-time HCP queries, summarize literature, and assist in medical writing for scientific publications, facilitating faster evidence generation (3).

Recent applications demonstrate LLMs’ utility in automating key medical affairs functions, including rapid generation of fair-balanced scientific response letters and summarization of extensive regulatory or clinical documentation. These tools help reduce response turnaround times in HCP engagements while maintaining scientific integrity and traceability (11).

Additionally, LLMs are being piloted to support internal knowledge management—indexing and retrieving large volumes of internal reports, publications, and standard operating procedures across Research & Development (R&D) and commercial teams. Embedding these models within enterprise search tools helps harmonize fragmented evidence across silos and ensures timely access to validated scientific content (2).

Emerging use cases also include the integration of LLMs into patient-facing applications, such as chatbots trained to provide information on disease management, trial eligibility, and digital therapeutics. These applications must be rigorously validated to mitigate the risk of hallucination and ensure compliance with regulatory guidance (8,9).

From an operational standpoint, pharmaceutical companies are investing in fine-tuning open-source LLMs on proprietary datasets to enhance contextual relevance and ensure alignment with internal lexicons and regulatory frameworks. There is increasing interest in retrieval-augmented generation (RAG) techniques that combine LLMs with structured database queries to support evidence traceability and citation integrity (15).
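The retrieval step of a RAG pipeline can be made concrete with a minimal IDF-weighted keyword ranker over a toy snippet store. The snippets and query below are invented for illustration; deployed systems use dense embeddings and a vector index, but the principle of grounding the prompt in retrieved sources is the same:

```python
import math
from collections import Counter

snippets = [
    "anti-VEGF therapy reduces retinal fluid in nAMD patients",
    "SHAP values explain feature importance in tree models",
    "protocol automation extracts eligibility criteria with NLP",
]

def retrieve(query, docs, k=2):
    """Rank documents by the summed IDF weight of terms shared with the query."""
    toks = [set(d.lower().split()) for d in docs]
    n = len(docs)
    df = Counter(t for doc in toks for t in doc)  # document frequency
    q = set(query.lower().split())
    def score(doc):
        # rarer shared terms contribute more (smoothed IDF)
        return sum(math.log((1 + n) / (1 + df[t])) for t in q & doc)
    ranked = sorted(range(n), key=lambda i: score(toks[i]), reverse=True)
    return [docs[i] for i in ranked[:k]]

# Retrieved snippets are pasted into the LLM prompt so the answer can cite
# its sources instead of relying on parametric memory.
context = retrieve("how do SHAP values explain model predictions", snippets)
prompt = "Answer using only these sources:\n" + "\n".join(context)
```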

Collectively, these developments highlight LLMs as foundational tools for scalable, multilingual, and context-sensitive content generation across medical, regulatory, and commercial domains.

Protocol development

AI facilitates the automated drafting of study protocols by analysing historical protocols, clinical guidelines, and regulatory standards to generate consistent, efficient, and regulatory-compliant designs (8,16).

Recent use cases demonstrate how AI algorithms, particularly those using NLP and ontology mapping, are being applied to extract structural elements from legacy protocols and automate the generation of new drafts aligned with International Council for Harmonisation-Good Clinical Practice (ICH-GCP) and therapeutic area-specific guidance. These systems can recommend appropriate inclusion/exclusion criteria, standardize endpoints, and propose validated measurement tools based on prior successful trials, thus enhancing protocol quality and reducing cycle time (17).
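Extraction of structural elements from legacy protocols lends itself to a simple illustration. The protocol fragment and section headings below are invented; real systems combine such pattern rules with ontology mapping and trained NLP models rather than regular expressions alone:

```python
import re

# Hypothetical protocol fragment for illustration only.
protocol_text = """
4. ELIGIBILITY
4.1 Inclusion Criteria
- Age >= 18 years
- Confirmed diagnosis of nAMD
4.2 Exclusion Criteria
- Prior anti-VEGF therapy within 90 days
- Pregnancy
"""

def extract_criteria(text):
    """Split a protocol eligibility section into inclusion/exclusion lists."""
    sections, current = {}, None
    for line in text.splitlines():
        line = line.strip()
        if re.search(r"inclusion criteria", line, re.I):
            current = "inclusion"
            sections[current] = []
        elif re.search(r"exclusion criteria", line, re.I):
            current = "exclusion"
            sections[current] = []
        elif line.startswith("-") and current:
            sections[current].append(line.lstrip("- ").strip())
    return sections

criteria = extract_criteria(protocol_text)
```

Once criteria are structured this way, they can be matched against coded fields in real-world patient records for the feasibility assessments described above.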

In oncology and rare diseases, AI-driven protocol optimization tools also evaluate feasibility metrics such as recruitment potential, burden of procedures, and overlap with competing trials using real-world data (RWD) from registries and claims databases. These predictive insights can help flag operational risks and suggest alternative study designs, including decentralized or adaptive approaches (18,19).

Moreover, regulatory agencies have begun to explore the integration of AI tools in pre-submission advisory processes to assess protocol soundness and harmonize terminology with electronic data capture systems. LLM-based tools are also being trialled to support investigators in navigating protocol templates and justifying key design decisions using evidence from prior literature (8).

Collectively, these applications indicate that AI is not only accelerating protocol development but also improving alignment between scientific rationale, regulatory compliance, and patient-centric trial design.

Predictive modelling

AI enables predictive modelling for disease progression and risk stratification in multiple therapeutic areas. In ophthalmology, AI models analyse optical coherence tomography (OCT) images to predict visual acuity changes in neovascular age-related macular degeneration (nAMD) (20,21). In glaucoma, predictive models assess surgical needs within a year based on multimodal data (3). Oncology predictive models identify high-value subgroups based on biomarker stratification (4,22,23).

Beyond imaging, predictive AI is being applied in cardiology to anticipate rehospitalization and stroke using EHR and wearable data (24), and in dermatology to forecast treatment response or flare-ups in chronic inflammatory conditions (25,26). These applications support earlier intervention and personalized care.

Health systems and regulators are also adopting predictive models to forecast healthcare demand, identify at-risk populations, and guide public health strategies (7). Many of these models now include explainability features [e.g., SHapley Additive exPlanations (SHAP) values] to meet transparency and auditability standards required by regulatory bodies (8).

Training times for AI models vary depending on data size, model complexity, and infrastructure. Convolutional neural networks (CNNs) trained on 3D OCT images can take hours to days, while transformer-based models for oncology may require several weeks (3,27).

AI model accuracy is commonly benchmarked against clinical practice. In ophthalmology, models have achieved area under the receiver operating characteristic curve (AUROC) scores of 0.89–0.94 for predicting disease activity and response to anti-vascular endothelial growth factor (anti-VEGF) treatment (20,28). Oncology models show similar performance, with AUROCs of 0.80–0.93 (22,29). A systematic review found DL matched or outperformed human experts in 98% of diagnostic imaging tasks (30).
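AUROC itself reduces to the probability that a randomly chosen positive case receives a higher model score than a randomly chosen negative case (the Mann-Whitney U formulation), which a few lines of Python make concrete; the labels and scores below are a toy example, not data from the cited studies:

```python
def auroc(labels, scores):
    """Probability that a random positive outranks a random negative
    (ties count half) -- equivalent to the area under the ROC curve."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy example: 1 = disease activity present, scores are model outputs.
labels = [1, 1, 1, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.2]
# One positive (0.4) is outranked by one negative (0.5): AUROC = 5/6.
```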

These examples demonstrate the broad utility of predictive AI while underscoring the need for rigorous validation and governance to ensure safe and effective use in clinical practice.

Ophthalmology

AI-powered image analysis supports real-time quantification of OCT fluid compartments, enhancing monitoring of macular fluid dynamics and guiding anti-VEGF retreatment decisions (20,21,31). These models not only detect intra-retinal fluid (IRF), sub-retinal fluid (SRF), and sub-retinal pigment epithelium (RPE) fluid with high precision but can also evaluate the impact of fluid fluctuations on long-term visual outcomes (28). Automated tools such as the Vienna Fluid Monitor have demonstrated high correlation with expert grading and facilitate individualized treatment planning through longitudinal tracking of retinal biomarkers.

Beyond nAMD, AI is applied in diabetic macular edema (DME), retinal vein occlusion (RVO), and myopic choroidal neovascularization (mCNV) to predict disease activity and optimize dosing regimens (27). Predictive models using AI-based fluid dynamics and lesion growth trends also support real-world decision-making by identifying under-responders or patients at risk of early treatment discontinuation.

Additionally, explainable AI techniques are gaining traction to enhance transparency in ophthalmic imaging diagnostics. For instance, models trained with saliency maps and SHAP values can visually highlight decision-relevant retinal regions, aligning with clinician expectations and aiding trust in model outputs. These capabilities are particularly valuable in teleophthalmology, rural settings, and healthcare systems where access to retinal specialists is limited.

As such, ophthalmology serves as a leading use case for clinically integrated AI, combining high-resolution imaging, longitudinal disease monitoring, and regulatory interest to pioneer scalable, precision-guided eye care.

Dermatology

AI algorithms, particularly CNNs, classify dermoscopic images for early detection of melanoma and non-melanoma skin cancers (2,25,26,32-35). These models have demonstrated diagnostic performance on par with or exceeding that of dermatologists in several comparative studies, achieving AUROC scores above 0.90 for binary skin cancer classification (2,35).

In eczema care, AI-based algorithms enable remote severity assessment and area detection using patient-generated skin images (25). By leveraging AI to identify erythema, scaling, and lesion boundaries, such tools allow for longitudinal monitoring and therapy response tracking without the need for in-person visits. This is particularly impactful in underserved areas and paediatric populations, where clinic visits may be limited or logistically challenging. These tools support home-based screening and improve access to dermatological assessments.

Recent advances have expanded AI applications to multi-disease classification models, which can differentiate between a wide range of dermatologic conditions, including psoriasis, atopic dermatitis, lichen planus, and acne. These models often incorporate multimodal data—merging clinical metadata with imaging—to improve diagnostic granularity and tailor treatment suggestions (26).

Importantly, AI models are increasingly trained on diverse datasets encompassing a wide range of Fitzpatrick skin types to address equity concerns in skin disease detection (33). Several regulatory bodies and dermatology societies are advocating for the systematic validation of these tools across skin tones, lesion types, and care settings to mitigate bias and ensure clinical generalizability.

Collectively, AI-driven dermatological tools support not only early cancer detection but also scalable, remote skin disease management, marking dermatology as a maturing and patient-facing domain of medical AI.

Oncology

AI applications span imaging diagnostics, digital pathology, biomarker discovery, and individualized treatment planning. Platforms like PathAI enhance tissue sample interpretation, while AI-driven liquid biopsy development holds promise for early cancer detection (36,37).

In diagnostic imaging, DL models have shown strong performance in classifying and segmenting radiological scans, such as computed tomography (CT), positron emission tomography (PET), and magnetic resonance imaging (MRI), with accuracy levels comparable to radiologists for detecting lung nodules, breast tumours, and liver lesions (38). AI is also applied in radiogenomics, linking imaging phenotypes with molecular profiles to predict tumour behaviour and therapy response.

In digital pathology, CNNs automate histopathological slide analysis, enabling high-throughput quantification of mitotic activity, tumour-infiltrating lymphocytes, and margin status, which aids in prognostication and therapeutic decision-making (22). These systems are being validated in multi-center trials to ensure reproducibility and regulatory-grade reliability.

AI also accelerates biomarker discovery by integrating multi-omic data—such as transcriptomic, proteomic, and genomic profiles—with clinical endpoints to stratify patients into high- and low-risk subgroups. ML classifiers have achieved AUROC scores above 0.90 in predicting cancer-specific survival or treatment response, as shown in breast, colorectal, and prostate cancer studies (22,29).

Moreover, in the context of clinical trial optimization, AI is being applied to generate randomised clinical trial (RCT)-comparable insights from RWD, simulating patient outcomes and helping to reduce reliance on placebo groups in specific settings such as ophthalmology (27).

The integration of AI across these oncology subdomains not only enhances diagnostic precision but also facilitates a shift toward precision oncology—tailoring interventions based on individual tumour biology and expected therapeutic benefit.

Drug discovery

AI accelerates early drug development by predicting molecular interactions, optimizing compound selection, and simulating clinical outcomes (4,12,29).

Recent applications leverage DL and reinforcement learning algorithms to explore vast chemical libraries and identify lead candidates with desirable pharmacokinetic and toxicity profiles. These models, trained on historical drug data and molecular descriptors, can predict binding affinity, off-target interactions, and absorption, distribution, metabolism, and excretion (ADME) characteristics—often outperforming traditional quantitative structure-activity relationship (QSAR) models (39).
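As a minimal stand-in for the descriptor-based property models described above, a k-nearest-neighbour predictor over two classic descriptors (molecular weight and logP) shows the basic pattern of mapping descriptors to activity. The descriptor values, activities, and scaling factor below are invented for illustration; real QSAR pipelines use far richer descriptor sets and trained DL models:

```python
import math

# Toy descriptor table: (molecular weight, logP) -> measured activity.
# Values are illustrative, not real assay data.
train = [
    ((180.0, 1.2), 0.7),
    ((320.0, 3.5), 0.2),
    ((250.0, 2.0), 0.5),
    ((400.0, 4.8), 0.1),
]

def predict_activity(descriptors, k=2):
    """k-nearest-neighbour prediction in (crudely scaled) descriptor space."""
    def dist(a, b):
        # divide molecular weight by 100 so it does not dominate logP
        return math.hypot((a[0] - b[0]) / 100.0, a[1] - b[1])
    nearest = sorted(train, key=lambda row: dist(row[0], descriptors))[:k]
    return sum(y for _, y in nearest) / k
```

A query compound at (200.0, 1.5) averages the activities of its two nearest training neighbours, illustrating how similarity in descriptor space drives the prediction.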

In silico simulations of drug–target interactions are increasingly used to guide medicinal chemistry, reducing the need for extensive wet-lab screening. For instance, graph neural networks and generative models like variational autoencoders (VAEs) or generative adversarial networks (GANs) are being applied to generate novel chemical entities with optimized drug-likeness and synthetic accessibility.

Moreover, AI-driven multi-objective optimization tools can balance efficacy, safety, and manufacturability constraints, expediting progression from hit to lead and lead to candidate stages. These techniques have been adopted by several pharma companies to design first-in-class molecules or repurpose existing drugs against new targets, including during urgent public health crises such as coronavirus disease 2019 (COVID-19) (12).

Emerging platforms also use AI to simulate clinical trial outcomes using synthetic patient populations, predicting success probabilities and informing go/no-go decisions (4). These predictive frameworks help de-risk portfolios by flagging compounds with high attrition risk early in the pipeline.

Overall, AI is redefining drug discovery workflows—reducing cycle times, improving success rates, and supporting evidence-based decision-making from bench to bedside.

Medical affairs

AI-supported literature searches and unsolicited HCP query handling streamline medical information services. Furthermore, AI organizes insights from MSLs and medical representatives to identify emerging scientific trends (11,40). Additionally, AI-enabled patient-reported outcome platforms such as Kaiku Health provide real-time monitoring and symptom prediction, enhancing patient management in oncology settings (41).

Beyond these applications, medical affairs teams are increasingly adopting AI to automate scientific response letters, classify medical inquiries, and ensure consistency and compliance in global communication. NLP tools are used to synthesize findings across thousands of publications and generate concise, evidence-backed responses for HCP engagement. AI platforms can now extract sentiment, trends, and unmet needs from field force activity reports and social listening data, which enables more strategic medical planning and congress coverage (11).

In field medical settings, AI-based dashboards support territory-level analytics, allowing MSLs to tailor discussions based on regional prescribing behaviours, treatment gaps, and new trial data. This personalization strengthens scientific exchange and elevates the role of Medical Affairs from a reactive support function to a proactive evidence-generation partner.

Furthermore, digital companion tools equipped with AI (e.g., symptom checkers, treatment navigators) are increasingly used to extend patient support and gather longitudinal RWD for outcomes research. These tools feed into medical strategies, inform value dossiers, and complement regulatory interactions.

Collectively, these innovations enhance the operational efficiency, scientific impact, and strategic relevance of Medical Affairs within pharmaceutical organizations, aligning closely with the shift toward more data-driven, externally focused medical strategies.

Social media analytics

AI-powered analysis of patient-generated data from social platforms and wearables provides real-time insights into patient experiences, aiding patient-centric evidence generation (14).

Building on this, AI algorithms—especially those leveraging NLP—are now capable of parsing vast volumes of unstructured social media content, including patient forums, tweets, and health-related discussions. These tools identify emerging concerns, misinformation trends, and gaps in education, which can guide medical communication and public health interventions.

In pharmaceutical applications, AI monitors digital conversations to track real-world drug usage, side effect reporting, and adherence challenges. For example, ML models can classify sentiment toward a newly launched drug, allowing manufacturers to fine-tune messaging or support services. DL methods have also been applied to correlate geotagged social data with local health trends or healthcare access issues—particularly valuable in rare diseases or underserved populations (14).
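Sentiment classification of this kind can be sketched with a transparent lexicon scorer. The cue words and posts below are invented for illustration; deployed systems use trained transformer classifiers validated against labelled data, but a lexicon baseline remains a useful, auditable reference point:

```python
# Illustrative cue lexicons -- not validated sentiment vocabularies.
POSITIVE = {"relief", "improved", "better", "effective", "helps"}
NEGATIVE = {"worse", "nausea", "pain", "expensive", "ineffective"}

def sentiment(post):
    """Crude lexicon score: +1 per positive cue, -1 per negative cue."""
    words = set(post.lower().replace(",", " ").split())
    return len(words & POSITIVE) - len(words & NEGATIVE)

posts = [
    "symptoms improved within a week, great relief",
    "still in pain and the copay is expensive",
]
scores = [sentiment(p) for p in posts]
```

Aggregating such scores over time and geography yields the launch-sentiment trends described above, while keeping each classification traceable to explicit cue words.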

AI-enabled social listening platforms (e.g., CREATION.co) also assist in identifying influential voices within patient communities and clinician networks. These insights are increasingly used in advisory board planning, patient engagement strategy, and medical affairs content development.

Importantly, wearable technologies paired with social media allow for multimodal data integration. This includes real-time symptom tracking, biometric feedback, and qualitative patient-reported insights—all analysable via AI frameworks for disease pattern recognition or trial feasibility assessments.

As regulatory bodies begin to recognize the value of real-world digital insights, social media analytics are evolving into a core component of patient-centric evidence generation—complementing traditional data sources and shaping future engagement models across life sciences.

HTA and regulatory science

Regulators such as EMA, National Institute for Health and Care Excellence (NICE), and Canada’s Drug Agency, Advisory Mechanism on Artificial Intelligence (CDA-AMC) are integrating AI tools for evidence synthesis, clinical trial design, RWD harmonization, and cost-effectiveness modelling (7-10). The EMA Big Data Workplan emphasizes the importance of data governance and AI integration in regulatory-grade evidence generation.

Building on this, AI is increasingly being applied to streamline HTAs, particularly in the appraisal of complex interventions, adaptive pathways, and early access schemes. NLP and ML algorithms support the automation of systematic literature reviews and meta-analyses by rapidly extracting relevant data points from scientific publications and clinical trial registries, thereby reducing reviewer burden and accelerating decision timelines (2,8).

In clinical trial design, generative AI and simulation-based tools are assisting regulators and sponsors in optimizing eligibility criteria, predicting enrolment feasibility, and modelling comparative effectiveness scenarios using synthetic or historical controls (4,7). These capabilities are particularly relevant for rare diseases or precision medicine applications where conventional randomized controlled trials may be impractical.

AI is also supporting the generation of cost-effectiveness models by linking heterogeneous datasets—such as claims, registries, and EHRs—and predicting long-term outcomes and quality-adjusted life years (QALYs). Transparent, auditable AI systems are now a prerequisite for regulatory acceptability, with agencies demanding model explainability, algorithmic fairness, and traceability across development and deployment stages (8,9).
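The core arithmetic these cost-effectiveness models feed is the incremental cost-effectiveness ratio (ICER): extra cost divided by extra QALYs gained. The figures below are hypothetical, purely to make the calculation concrete:

```python
def icer(cost_new, cost_old, qaly_new, qaly_old):
    """Incremental cost-effectiveness ratio: extra cost per QALY gained."""
    return (cost_new - cost_old) / (qaly_new - qaly_old)

# Hypothetical inputs: the new therapy costs 12,000 more per patient
# and yields 0.4 additional QALYs, giving 30,000 per QALY gained.
ratio = icer(52_000, 40_000, 6.0, 5.6)
```

HTA bodies compare such ratios against a willingness-to-pay threshold (NICE, for example, commonly cites a range of roughly £20,000–30,000 per QALY) when appraising new interventions.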

Notably, the increasing use of RWD in regulatory submissions (e.g., for label extensions or conditional approvals) has prompted both the EMA and Food and Drug Administration (FDA) to pilot AI-driven validation frameworks to evaluate the reliability and reproducibility of RWE (5,42). This evolving paradigm reflects a broader regulatory shift toward continuous learning systems, where AI enables adaptive, real-time policy refinement anchored in high-quality evidence.


Discussion

AI offers substantial advantages across healthcare evidence generation:

  • Efficiency gains: automation of routine data extraction, protocol writing, and literature synthesis accelerates evidence generation while reducing resource burden (4).
  • Improved access: real-time image analysis supports remote monitoring, particularly in rural or underserved populations (2,31).
  • Personalization: predictive models enable individualized therapy, optimizing treatment intensity based on patient-specific characteristics (20,26).
  • Expanded data sources: social media mining and patient-reported outcomes capture real-world experiences outside traditional clinical settings, enhancing regulatory and payer decision-making (5,14).

Despite these benefits, several challenges persist:

  • Data quality: variability in image acquisition, device heterogeneity, and training data diversity affect model generalizability (2,25,33).
  • Black box concerns: many AI models operate as “black boxes”, offering little interpretability to end users; this lack of transparency limits clinician trust, impairs clinical decision-making, and reduces regulatory confidence (6,8). To address this, models increasingly integrate explainability frameworks that provide transparent insight into model outputs and feature importance.

  • False positives: AI diagnostic tools risk over-diagnosis, unnecessary interventions, and patient anxiety (2,42).
  • Ethical issues: data privacy, bias in training data, and accountability for AI-driven decisions remain unresolved (15,33). When AI systems inform clinical decisions, the question of who is legally or ethically responsible for outcomes becomes complex. In scenarios where AI influences diagnosis, triage, or treatment recommendations, attribution of liability remains ambiguous, particularly when clinicians override—or blindly trust—automated outputs. For example, in dermatology, models trained on unbalanced image sets may underperform across diverse skin tones. AI also raises concerns about surveillance, data ownership, and patient consent—particularly in the use of wearables and social media for evidence generation. Ensuring AI systems uphold the principles of autonomy, beneficence, and justice requires robust, context-specific governance frameworks. These must include clear delineations of human oversight, regulatory review, auditability, and ethical transparency.
  • Regulatory gaps: current EMA, NICE, and FDA guidelines call for stronger validation frameworks, audit trails, and human oversight to ensure scientific integrity in AI-generated evidence (7-10).

Bias identification

As with any data-driven discipline, the implementation of AI in healthcare must actively address potential sources of bias that could compromise validity, generalizability, and patient safety. Several types of bias were identified across the 58 reflections analysed (available online: https://cdn.amegroups.cn/static/public/jmai-2025-157-1.docx), including demographic bias (e.g., underrepresentation of ethnic minorities and skin tones in dermatological datasets), technological bias (e.g., dependence on high-end imaging devices not equally accessible across care settings), automation bias (e.g., overreliance on AI suggestions that diminishes clinician scrutiny), and sampling bias (e.g., models trained in tertiary care settings that may not generalize to primary care).

Mitigation strategies reported across use cases include:

  • Diverse training datasets: in dermatology, studies stress the inclusion of multiple Fitzpatrick skin types in model development and application of debiasing techniques during model optimization (2,26). Still, disparities persist in both data access and model generalizability.
  • Hybrid decision-making frameworks: several reflections described the importance of human-in-the-loop systems to oversee AI-generated alerts, particularly in pharmacovigilance and patient safety monitoring, favouring hybrid governance models that combine algorithmic alerts with expert review (11).
  • Transparency and explainability tools: applications in oncology and cardiology have begun integrating SHAP (SHapley Additive exPlanations) and LIME (Local Interpretable Model-agnostic Explanations) frameworks to improve model interpretability for clinicians and regulators (18,24). These frameworks help ensure that high-impact AI applications, particularly those involving patient triage, diagnosis, or treatment selection, are auditable and interpretable by HCPs and oversight bodies.
  • Validation in external cohorts: real-world performance of AI models was frequently benchmarked using multi-country datasets, particularly in ophthalmology and oncology, to reduce institutional bias (20,22).
  • Ethical review and governance: protocol automation and predictive modelling pipelines are increasingly evaluated by data ethics committees, as emphasized in HTA guidance from CDA-AMC and EMA (7,8).
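The intuition behind the model-agnostic explainability tools cited above can be illustrated with a much simpler relative: permutation importance, which scores a feature by how much model error grows when that feature's values are shuffled. The sketch below is illustrative only; the "risk model", feature names, and synthetic cohort are hypothetical and stand in for the proprietary models discussed in the reflections.

```python
import random

def toy_risk_model(row):
    # Hypothetical model: weighted sum of three patient features.
    age, biomarker, dose = row
    return 0.5 * age + 2.0 * biomarker + 0.1 * dose

def mean_abs_error(model, X, y):
    return sum(abs(model(row) - t) for row, t in zip(X, y)) / len(X)

def permutation_importance(model, X, y, feature_idx, seed=0):
    """Error increase when one feature column is shuffled:
    the larger the increase, the more the model relies on it."""
    rng = random.Random(seed)
    col = [row[feature_idx] for row in X]
    rng.shuffle(col)
    X_perm = [list(row) for row in X]
    for row, v in zip(X_perm, col):
        row[feature_idx] = v
    return mean_abs_error(model, X_perm, y) - mean_abs_error(model, X, y)

# Synthetic cohort; targets come from the model itself, so baseline error is 0.
rng = random.Random(42)
X = [(rng.uniform(40, 90), rng.uniform(0, 5), rng.uniform(0, 100))
     for _ in range(200)]
y = [toy_risk_model(row) for row in X]

scores = {name: permutation_importance(toy_risk_model, X, y, i)
          for i, name in enumerate(["age", "biomarker", "dose"])}
print(scores)
```

In this toy setting, shuffling "age" degrades the predictions most, so it receives the highest importance score. SHAP and LIME refine the same idea with local, per-patient attributions, which is what makes them suitable for auditing triage or treatment-selection models.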

Acknowledging and proactively mitigating these biases is critical to support fairness, transparency, and trust in the deployment of AI-driven tools and to ensure equitable impact across populations.

Reflections on real-world implementation

The practical insights gathered in this manuscript underscore a critical distinction: AI in healthcare is not merely a technical innovation—it is a cultural and procedural shift. The integration of AI into workflows such as protocol generation, HCP engagement, and RWD synthesis calls for upskilling across roles, regulatory adaptation, and inter-disciplinary collaboration. As highlighted in multiple reflections, initial enthusiasm must be tempered by rigorous testing, lifecycle monitoring, and a patient-centered mindset.

Furthermore, several use cases demonstrated complementarity between traditional statistical approaches and AI-enhanced tools. For example, predictive models in nAMD were validated against clinician-reported outcomes and historical trial benchmarks, designed not to replace human expertise but to enrich it (21,27). Similarly, in regulatory science, AI is used not to automate decisions but to facilitate transparent and reproducible evidence synthesis (10).

Future research agenda

To advance the responsible use of AI in healthcare evidence generation, future work should prioritize:

  • Longitudinal studies to evaluate AI performance and drift across diverse populations.
  • Synthetic data generation frameworks for rare disease modelling.
  • Federated learning approaches to enable model development across decentralized datasets.
  • Cost-effectiveness studies comparing AI-assisted versus standard care pathways.
  • Cross-agency regulatory pilots for evaluating generative AI in real-world trial design.
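The federated learning item above can be made concrete with a minimal sketch of federated averaging (FedAvg) applied to a deliberately trivial model, a single mean parameter. Sites report only aggregates (local estimate and sample count), never patient-level records; the site names and data below are synthetic and purely illustrative.

```python
def local_update(records):
    """Each site computes a local estimate and reports only the
    aggregate (estimate, sample count), never raw records."""
    return sum(records) / len(records), len(records)

def federated_average(site_updates):
    """Central server combines site estimates, weighted by sample size."""
    total_n = sum(n for _, n in site_updates)
    return sum(est * n for est, n in site_updates) / total_n

# Three hypothetical hospital sites with different cohort sizes.
site_a = [1.0, 2.0, 3.0]   # n=3, local mean 2.0
site_b = [4.0, 6.0]        # n=2, local mean 5.0
site_c = [10.0]            # n=1, local mean 10.0

updates = [local_update(s) for s in (site_a, site_b, site_c)]
global_estimate = federated_average(updates)
print(global_estimate)  # 26/6 ≈ 4.33, identical to the pooled mean
```

For this linear statistic the federated result exactly matches what pooling all records would give, which is the appeal of the approach: decentralized datasets contribute to a shared model while raw data stay within each institution. Real deployments iterate the same report-and-average loop over neural network weights and add privacy safeguards such as secure aggregation.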

This agenda supports a more equitable, scalable, and scientifically sound integration of AI technologies.

Limitations

This paper reflects a synthesis of professional observations and practice-driven insights rather than a systematic or hypothesis-testing study. The 58 LinkedIn reflections were authored over several years and stem from real-world interactions in dermatology, oncology, ophthalmology, and cross-functional roles in medical affairs and regulatory science. While this perspective offers valuable ground-level context, it is inherently subjective and shaped by the author’s career trajectory. Consequently, therapeutic areas not directly encountered in practice—such as radiology or infectious diseases—may be underrepresented.

Several limitations should be acknowledged. First, the reflections are not peer-reviewed at the point of original posting and do not follow a pre-specified quality assessment or selection protocol. Second, the examples discussed often highlight successful AI applications, which may introduce confirmation or publication bias. Third, while the manuscript references published evidence and applies thematic grouping to support consistency, it does not provide comparative performance metrics across AI methods. Therefore, generalizability across healthcare settings remains limited.

Ultimately, this reflective work aims to stimulate dialogue on responsible AI adoption and should be interpreted as exploratory in scope. Future research should further validate these experiential insights with prospective studies and multi-stakeholder consensus frameworks.

Closing perspective

As AI continues to evolve from pilot to practice, success will depend not only on technological sophistication but also on values-based design, stakeholder alignment, and longitudinal learning. This paper, by distilling lessons from real-world deployments across domains, aims to contribute to that collective learning journey.


Conclusions

This paper offers a practice-informed synthesis of AI applications across key domains of healthcare, including oncology, dermatology, ophthalmology, medical affairs, and regulatory science. Rather than presenting a systematic or comprehensive analysis, it draws on 58 reflections developed over several years of industry experience, aiming to highlight how AI tools are actively shaping evidence generation and clinical operations.

The examples included reflect firsthand professional observations and published sources, with a focus on translational and real-world relevance. As such, the work is inherently subjective and reflects the areas most directly experienced by the author. Nevertheless, the patterns observed suggest that AI is playing a growing role in predictive modelling, patient stratification, medical communication, and regulatory science.

This manuscript does not seek to generalize AI’s impact across all therapeutic areas or to evaluate the performance of specific tools. Instead, it contributes to the evolving conversation on how industry stakeholders are experiencing AI adoption in practical, operational, and scientific contexts.

Moving forward, collaboration between developers, HCPs, regulators, and patients remains essential to ensure the responsible, transparent, and equitable integration of AI in healthcare. Future research should focus on validating AI performance in real-world settings, addressing algorithmic bias, and aligning emerging technologies with health system needs and regulatory expectations.


Acknowledgments

The author would like to express heartfelt gratitude to his daughter, whose curiosity, resilience, and unwavering encouragement have been a constant source of inspiration throughout this work. Her presence reminds him daily of the importance of striving for a better, evidence-informed future in healthcare.

Portions of this manuscript, including drafting and refinement of certain sections, were supported using OpenAI’s GPT-4 model (ChatGPT) for language enhancement and idea framing. The author critically reviewed, edited, and validated all content to ensure accuracy and originality.

The author has held senior leadership roles across several global pharmaceutical companies over a 30-year career, with a focus on clinical development, RWE, and medical affairs. The author is motivated to share accumulated knowledge and industry perspectives to support future generations of researchers, healthcare leaders, and policymakers. This manuscript is part of that effort—an attempt to contribute experiential insights on the evolving role of AI in healthcare, grounded in real-world practice rather than academic theory.

Additionally, the conceptual foundation and several thematic insights presented in this manuscript were developed and first explored in a series of professional posts published by the author on LinkedIn (Alexandros Sagkriotis). The author gratefully acknowledges the anonymous peer reviewers for their careful evaluation and constructive feedback, which substantially strengthened the clarity, methodological rigor, and overall quality of this work.


Footnote

Peer Review File: Available at https://jmai.amegroups.com/article/view/10.21037/jmai-2025-157/prf

Funding: None.

Conflicts of Interest: The author has completed the ICMJE uniform disclosure form (available at https://jmai.amegroups.com/article/view/10.21037/jmai-2025-157/coif). The author has no conflicts of interest to declare.

Ethical Statement: The author is accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved.

Open Access Statement: This is an Open Access article distributed in accordance with the Creative Commons Attribution-NonCommercial-NoDerivs 4.0 International License (CC BY-NC-ND 4.0), which permits the non-commercial replication and distribution of the article with the strict proviso that no changes or edits are made and the original work is properly cited (including links to both the formal publication through the relevant DOI and the license). See: https://creativecommons.org/licenses/by-nc-nd/4.0/.


References

  1. Xie Y, Zhai Y, Lu G. Evolution of artificial intelligence in healthcare: a 30-year bibliometric study. Front Med (Lausanne) 2025;11:1505692. [Crossref] [PubMed]
  2. Hartmann T, Passauer J, Hartmann J, et al. Basic principles of artificial intelligence in dermatology explained using melanoma. J Dtsch Dermatol Ges 2024;22:339-47. [Crossref] [PubMed]
  3. Jin K, Yuan L, Wu H, et al. Exploring large language models for next generation of artificial intelligence in ophthalmology. Front Med (Lausanne) 2023;10:1291404. [Crossref] [PubMed]
  4. Shah B, Adabala Viswa C, Zurkiya D, et al. Generative AI in the pharmaceutical industry: Moving from hype to reality. McKinsey & Company. January 2024. Available online: https://www.mckinsey.com/industries/life-sciences/our-insights/generative-ai-in-the-pharmaceutical-industry-moving-from-hype-to-reality
  5. Arlett P, Umuhire D, Verpillat P, et al. Clinical Evidence 2030. Clin Pharmacol Ther 2025;117:884-6. [Crossref] [PubMed]
  6. Kotter JP, Akhtar V, Gupta G. Change: how organizations achieve hard-to-imagine results in uncertain and volatile times. Hoboken: Wiley; 2021.
  7. European Medicines Agency (EMA), Heads of Medicines Agencies. Big Data Workplan 2023–2025. HMA/EMA Joint Big Data Steering Group; June 2024. Available online: https://www.ema.europa.eu/en/documents/work-programme/workplan-2023-2025-hma-ema-joint-big-data-steering-group_en.pdf
  8. Canada’s Drug Agency (CDA-AMC). Position statement on the use of artificial intelligence in the generation and reporting of evidence. Canada’s Drug Agency; April 2025. Available online: https://www.cda-amc.ca
  9. National Institute for Health and Care Excellence (NICE). Use of AI in evidence generation: NICE position statement. NICE; 2024. Available online: https://www.nice.org.uk/position-statements/use-of-ai-in-evidence-generation-nice-position-statement
  10. Turner AJ, Sammon C, Latimer N, et al. Transporting Comparative Effectiveness Evidence Between Countries: Considerations for Health Technology Assessments. Pharmacoeconomics 2024;42:165-76. [Crossref] [PubMed]
  11. Medical Affairs Professional Society. Medical affairs: the roles, value, and practice of medical affairs in the biopharmaceutical and medical technology industries. Boca Raton: CRC Press; 2024.
  12. Faiyazuddin M, Rahman SJQ, Anand G, et al. The Impact of Artificial Intelligence on Healthcare: A Comprehensive Review of Advancements in Diagnostics, Treatment, and Operational Efficiency. Health Sci Rep 2025;8:e70312. [Crossref] [PubMed]
  13. Shah P, Kendall F, Khozin S, et al. Artificial intelligence and machine learning in clinical development: a translational perspective. NPJ Digit Med 2019;2:69. [Crossref] [PubMed]
  14. Fernandes J. How social media intelligence can support blockbuster launches: GSK’s depemokimab. Creation.co; 2024. Available online: https://creation.co/knowledge/how-social-media-intelligence-can-support-blockbuster-launches-gsks-depemokimab/
  15. Ontoforce. White paper: the evolution of RAG and what’s next for GenAI in the life sciences industry. Ontoforce; 2025. Available online: https://www.ontoforce.com/whitepaper/the-evolution-of-rag-and-whats-next-for-genai
  16. Fragoulakis V, Vozikis A. Utilizing synthetic data and artificial neural networks for clinical phenotype prediction in precision medicine: a targeted metabolomic analysis of urinary organic acids in autoimmune diseases. J Polit Ethics New Technol AI 2024;3: [Crossref]
  17. European Medicines Agency (EMA). Reflection Paper on the Use of Artificial Intelligence (AI) in the Medicinal Product Lifecycle. EMA/801176/2023. July 2023. Available online: https://www.ema.europa.eu/en/documents/scientific-guideline/reflection-paper-use-artificial-intelligence-ai-medicinal-product-lifecycle_en.pdf
  18. Peltner J, Becker C, Wicherski J, et al. The EU project Real4Reg: unlocking real-world data with AI. Health Res Policy Syst 2025;23:27. [Crossref] [PubMed]
  19. Zou KH. Real-World Data, Big Data, and Artificial Intelligence: Recent Development and Emerging Trends in the European Union. In: Zhou KH, Salem LA, Ray A. editors. RWE in a Patient-Centric Digital Era. New York: Chapman and Hall/CRC; 2023.
  20. Chakravarthy U, Havilio M, Syntosi A, et al. Impact of macular fluid volume fluctuations on visual acuity during anti-VEGF therapy in eyes with nAMD. Eye (Lond) 2021;35:2983-90. [Crossref] [PubMed]
  21. Schmidt-Erfurth U, Bogunovic H, Sadeghipour A, et al. Machine Learning to Analyze the Prognostic Value of Current Imaging Biomarkers in Neovascular Age-Related Macular Degeneration. Ophthalmol Retina 2018;2:24-30. [Crossref] [PubMed]
  22. Al-Tashi Q, Saad MB, Muneer A, et al. Machine Learning Models for the Identification of Prognostic and Predictive Cancer Biomarkers: A Systematic Review. Int J Mol Sci 2023;24:7781. [Crossref] [PubMed]
  23. Peltner J, Becker C, Wicherski J, et al. The EU project Real4Reg: unlocking real-world data with AI. Health Res Policy Syst 2025;23:27. [Crossref] [PubMed]
  24. Feuerriegel S, Frauen D, Melnychuk V, et al. Causal machine learning for predicting treatment outcomes. Nat Med 2024;30:958-68. [Crossref] [PubMed]
  25. Huang L, Tang WH, Attar R, et al. Remote Assessment of Eczema Severity via AI-powered Skin Image Analytics: A Systematic Review. Artif Intell Med 2024;156:102968. [Crossref] [PubMed]
  26. Liopyris K, Gregoriou S, Dias J, et al. Artificial Intelligence in Dermatology: Challenges and Perspectives. Dermatol Ther (Heidelb) 2022;12:2637-51. [Crossref] [PubMed]
  27. Sagkriotis A, Chakravarthy U, Griner R, et al. Application of machine learning methods to bridge the gap between non-interventional studies and randomized controlled trials in ophthalmic patients with neovascular age-related macular degeneration. Contemp Clin Trials 2021;104:106364. [Crossref] [PubMed]
  28. Schmidt-Erfurth U, Vogl WD, Jampol LM, et al. Application of Automated Quantification of Fluid Volumes to Anti-VEGF Therapy of Neovascular Age-Related Macular Degeneration. Ophthalmology 2020;127:1211-9. [Crossref] [PubMed]
  29. Parikh RB, Manz C, Chivers C, et al. Machine Learning Approaches to Predict 6-Month Mortality Among Patients With Cancer. JAMA Netw Open 2019;2:e1915997. [Crossref] [PubMed]
  30. Liu X, Faes L, Kale AU, et al. A comparison of deep learning performance against health-care professionals in detecting diseases from medical imaging: a systematic review and meta-analysis. Lancet Digit Health 2019;1:e271-97. [Crossref] [PubMed]
  31. Schmidt-Erfurth U, Sadeghipour A, Gerendas BS, et al. Artificial intelligence in retina. Prog Retin Eye Res 2018;67:1-29. [Crossref] [PubMed]
  32. Sengupta D. Artificial Intelligence in Diagnostic Dermatology: Challenges and the Way Forward. Indian Dermatol Online J 2023;14:782-7. [Crossref] [PubMed]
  33. Venkatesh KP, Raza MM, Nickel G, et al. Deep learning models across the range of skin disease. NPJ Digit Med 2024;7:32. [Crossref] [PubMed]
  34. Bibi I, Schaffert D, Blauth M, et al. Automated Machine Learning Analysis of Patients With Chronic Skin Disease Using a Medical Smartphone App: Retrospective Study. J Med Internet Res 2023;25:e50886. [Crossref] [PubMed]
  35. Hebebrand M. Comparing AI Models for Dermatological Diagnoses. Dermatology Times. August 9, 2024. Available online: https://www.dermatologytimes.com/view/comparing-ai-models-for-dermatological-diagnoses
  36. Grand View Research. Artificial intelligence in oncology market size, share & trends analysis report by component, by treatment type, by cancer type, by application, by end-user; and segment forecasts, 2025–2030. San Francisco: Grand View Research; 2025. Available online: https://www.grandviewresearch.com/industry-analysis/artificial-intelligence-oncology-market-report
  37. Hu E, Tai M, Nie Z, et al. Development of a machine learning-based model for prognostic prediction in melanoma. Sci Rep 2025;15:45628. [Crossref] [PubMed]
  38. Kleppe A, Skrede OJ, De Raedt S, et al. Designing deep learning studies in cancer diagnostics. Nat Rev Cancer 2021;21:199-211. [Crossref] [PubMed]
  39. Schneider G. Mind and machine in drug design. Nat Mach Intell 2019;1:128-30.
  40. Algazy J, Garcia A, Ryan S, et al. A vision for medical affairs 2030: five priorities for patient impact. McKinsey & Company; 2023. Available online: https://www.mckinsey.com/industries/life-sciences/our-insights/a-vision-for-medical-affairs-2030-five-priorities-for-patient-impact
  41. Mohr P, Ascierto P, Addeo A, et al. Electronic patient-reported outcomes, fever management, and symptom prediction among patients with BRAF V600 mutant stage III–IV melanoma: the Kaiku Health platform. EJC Skin Cancer. 2024;2:100254.
  42. Hebebrand M. Review investigates machine learning in skin disease identification. Dermatology Times. Oct 29, 2024. Available online: https://www.dermatologytimes.com/view/review-investigates-machine-learning-in-skin-disease-identification
doi: 10.21037/jmai-2025-157
Cite this article as: Sagkriotis A. Leveraging artificial intelligence for real-world evidence generation in healthcare: a review of lessons across 58 case reflections. J Med Artif Intell 2026;9:25.
