Review Article

Large language models-powered agentic AI design and implementation in pharmacovigilance—a narrative review

Sanju Mannumadam Venugopal ORCID logo

Healthtech Company, Los Angeles, CA, USA

Correspondence to: Sanju Mannumadam Venugopal, MBA (Information Systems). Senior Manager, Healthtech Company, 10000 Imperial Highway, Apt B310, Downey, CA 90242, USA. Email: sanjumv89@gmail.com.

Background and Objective: Pharmacovigilance (PV) is the science and set of activities relating to the detection, assessment, understanding, and prevention of adverse drug effects. Historically, PV efficiency has been burdened by the analysis of voluminous and diverse data, which is both labor-intensive and costly. This has led to delayed reporting and underreporting of adverse events (AEs), with serious consequences for patients' lives and financial and reputational implications for drug manufacturers. Recent advancements in artificial intelligence (AI), particularly large language models (LLMs) and LLM-based agentic AI systems, show strong potential to play a crucial role in PV: they can augment human capabilities and efforts, transform AE monitoring, significantly reduce turnaround time and underreporting, improve report accuracy, and scale to support the ever-growing number of AEs. This review highlights opportunities for integrating LLM-based agentic AI systems into existing PV workflows, opening new possibilities for human-AI collaboration, and explores how this collaboration can improve safety signal detection, individual case safety report (ICSR) generation, aggregate analysis, benefit-risk assessment, and risk management.

Methods: This review employed a non-systematic search strategy, analyzing relevant literature from major databases within the past decade, focusing on publications in English. The search was designed to capture studies exploring the application of LLMs and LLM-based agents in PV and related AI advancements.

Key Content and Findings: The review outlines the capabilities of standalone LLMs in improving PV efficiency, as well as their limitations. The discussion introduces LLM-based agentic AI systems, which offer enhanced cognitive and autonomous decision-making capabilities compared to standalone LLMs while streamlining PV processes and improving workflow efficiency. The review also identifies LLM agents specific to PV workflows that could operate within an agentic AI system framework to optimize PV activities.

Conclusions: The integration of LLM-based agentic AI systems in PV holds significant promise for enhancing PV processes. By addressing current limitations, these systems can potentially reshape future research, clinical practice, and policymaking, paving the way for more robust and efficient drug safety monitoring.

Keywords: Pharmacovigilance (PV); artificial intelligence (AI); large language models (LLMs); LLM-based agentic AI system


Received: 25 September 2024; Accepted: 08 September 2025; Published online: 12 January 2026.

doi: 10.21037/jmai-24-350


Introduction

An adverse event (AE) is any undesirable experience associated with the use of a medical product in a patient. It becomes a serious adverse event (SAE) when the patient outcome is death, a life-threatening condition, hospitalization (initial or prolonged), disability, a birth defect, or another serious medical event. Pharmacovigilance (PV) involves scientific and data-gathering activities relating to the detection, assessment, and understanding of AEs to support informed regulatory decisions and timely reporting to regulatory agencies. Drug-induced toxicity, an extreme form of adverse drug event (ADE), can result in patient death. It is therefore very important to proactively detect AEs, SAEs, and ADEs to avert serious drug-related harm to patients' health. Doing so would enhance patient safety, decrease patient harm, reduce unnecessary healthcare costs, and boost efficiency in post-marketing surveillance.

PV activities include individual case reporting, spontaneous reporting of adverse drug reactions (ADRs) to AE reporting systems such as the Food and Drug Administration Adverse Event Reporting System (FAERS) and EudraVigilance, and the detection of safety signals, during which the medical and scientific literature is reviewed and analyzed for new information on ADRs to keep drug safety profiles up to date. However, these activities face challenges such as underreporting, delays in AE and safety signal detection (distinguishing true signals from random noise in huge heterogeneous data), and ineffective utilization of unstructured data, for example in case report generation, leading to missed safety alerts (1).

Artificial intelligence (AI) models can help overcome the challenge of underreporting while accelerating ADR reporting, improving the accuracy of safety signals, and exploiting unstructured data for timely reporting of safety issues. However, traditional predictive AI models lack in-depth reasoning capability and the contextual understanding needed to interpret signals in voluminous data; they may fail to capture complex drug-drug interaction relationships, may be unable to track patterns in patient data over time, and so on. Generative AI models known as large language models (LLMs), combined with AI agents in an agentic AI framework, can overcome the limitations of traditional predictive AI models: they offer context-aware reasoning, real-time decision-making, complex problem-solving, and iterative feedback loops that produce the best possible outcome and significantly improve workflow efficiency. This paper aims to offer an understanding of the current and future prospects of LLM-based agentic AI systems in PV, discusses their framework, advantages, and challenges, and concludes with future directions for research in this field. This article is presented in accordance with the Narrative Review reporting checklist (available at https://jmai.amegroups.com/article/view/10.21037/jmai-24-350/rc).


Methods

Literature search strategy

A non-systematic narrative review of the literature (Table 1) was conducted to assess the role of LLMs and LLM-based agentic AI systems in PV, using major electronic databases including PubMed, ScienceDirect, and arXiv, among others. The search used keywords and keyword combinations such as: “Large Language Model AND Pharmacovigilance”, “Pharmacovigilance AND Artificial Intelligence”, “Large Language Model AND Safety Signal”, “Adverse Event AND Large Language Model”, “LLM agents” and “Generative AI AND Pharmacovigilance”. Articles were sourced from scholarly and open-access journal platforms and limited to publications in English. The search was limited to articles and reports published between January 2022 and April 2025.

Table 1

Literature review

Items and specifications
Date of search: 02 June 2024; 30 June 2025
Databases and other sources searched: PubMed, ScienceDirect, arXiv, among others
Search terms used: “Large Language Model AND Pharmacovigilance”, “Pharmacovigilance AND Artificial Intelligence”, “Large Language Model AND Safety Signal”, “Adverse Event AND Large Language Model”, “LLM based agents” and “Generative AI AND Pharmacovigilance”
Timeframe: 2022–2025
Inclusion criteria: recent and relevant research on the use of AI in pharmacovigilance and on LLMs and LLM-based agentic systems in pharmacovigilance; studies must outline either research on or application of AI models, generative AI, LLMs, or LLM-based agentic AI systems to pharmacovigilance
Exclusion criteria: articles that did not use LLMs or LLM-based agentic AI systems at all; studies on generic applications of AI or LLMs; non-English-language articles
Selection process: conducted independently by the author S.M.V.

AI, artificial intelligence; LLM, large language model.

Inclusion criteria

The most recent and relevant research publications retrieved from scholarly databases relating to the use of AI in PV, and to LLMs and LLM-based agentic AI systems in PV, were considered. Articles had to outline their methodology and the application of AI, generative AI, LLMs, or LLM-based agentic AI systems to PV. It was vital for these studies to include detailed information on the research conducted, the issue addressed, and its application to PV.

Exclusion criteria

The review excluded any articles that did not apply LLMs, LLM-based agentic systems, or AI to PV, as well as non-English-language articles. Efforts were made to exclude studies concerned only with generic applications of AI or LLMs. It also excluded articles that used only quantitative methods without considering explicit LLM or LLM-based agentic system techniques.


Discussion

AI in PV

PV is a field primarily driven by data: their collection, management, analysis, and the actions taken on them. In the US, data come from varied, disparate sources such as FAERS, emergency department-based systems, epidemiologic data from automated claims databases, the National Electronic Injury Surveillance System (NEISS), the Drug Abuse Warning Network (DAWN), etc. In other regions, data come from sources such as the General Practice Research Database (GPRD) maintained by the MHRA (UK) and the Saskatchewan Health Database (Canada). It is critical to make sense of these huge, heterogeneous data in order to reduce the frequency of harm and to provide predictions that help prevent possible ADEs and drug-induced toxicities. Challenges faced by traditional PV methodologies include underreporting, delays in AE and safety signal detection (distinguishing true signals from random noise in huge heterogeneous data), ineffective utilization of unstructured data (e.g., in case report generation), and difficulty deriving critical insights from varied data sources such as the medical literature, clinical notes, electronic health records (EHRs), etc. AI has been used in PV for its potential to handle huge volumes of data, analyze them, make predictions, and discover patterns (2). On that note, AI has been applied to the detection of ADEs and ADRs, the processing of safety reports, the extraction of drug-drug interactions, the prediction of drug side effects, and the identification of patients at high risk of ADRs (3). Below are some of the key use cases of AI identified to reduce the frequency of key drug AEs.

  • Prediction of whether patients are likely to have a future ADE, which should help to prevent or effectively manage ADEs.
  • Prediction of therapeutic response to medications.
  • Prediction of optimal medication dosing to balance therapeutic benefit with ADE risk related to a specific medication.
  • Prediction of the most appropriate treatment option to help guide selection of safe and effective pharmacological therapies.

The key AI algorithms considered for PV include machine learning algorithms (such as support vector machines, decision trees, and random forests), natural language processing (NLP), and neural networks (convolutional neural networks, recurrent neural networks) (4,5). Among these, machine learning algorithms provide predictive analytics, employing unique methodologies and techniques to learn from data, make predictions, and anticipate future AEs (6,7). NLP, on the other hand, helps to parse, interpret, and extract insights from text such as medication dosage instructions, pathology reports, and clinical notes. Techniques used include named entity recognition (NER), where text from the medical literature is analyzed for drug names, symptoms, and potential AEs, and relation extraction, where associations are found between drugs and their potential AEs (4,8). Finally, neural networks help identify complex patterns within large structured and unstructured datasets, facilitating the identification of relevant features that signal an ADE within the medical literature or patient records (3,8).
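To make the NER and relation extraction steps concrete, the sketch below implements a minimal dictionary-based entity tagger and a sentence-level co-occurrence relation extractor. The drug and event lexicons are toy examples invented for illustration; a production PV system would use trained models and curated vocabularies (e.g., MedDRA terms) rather than hard-coded sets.

```python
import re

# Toy lexicons standing in for curated vocabularies (illustrative only).
DRUGS = {"metformin", "warfarin", "ibuprofen"}
EVENTS = {"nausea", "bleeding", "rash"}

def extract_entities(text: str) -> dict:
    """Dictionary-based NER: tag known drug names and adverse-event terms."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return {
        "drugs": sorted({t for t in tokens if t in DRUGS}),
        "events": sorted({t for t in tokens if t in EVENTS}),
    }

def extract_relations(text: str) -> list:
    """Naive relation extraction: pair each drug with each adverse event
    co-occurring in the same sentence (a stand-in for a learned classifier)."""
    relations = []
    for sentence in re.split(r"[.!?]", text):
        ents = extract_entities(sentence)
        relations += [(d, e) for d in ents["drugs"] for e in ents["events"]]
    return relations

note = "Patient on warfarin reported bleeding and a rash."
print(extract_relations(note))  # [('warfarin', 'bleeding'), ('warfarin', 'rash')]
```

Restricting candidate pairs to the same sentence is a deliberately crude heuristic; learned relation extractors replace it with classifiers over the sentence context.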

Could LLMs improve efficiency in the PV process?

The recent surge in interest from academia and industry in experimenting with and understanding LLMs has spurred exploration of their application in various fields (9,10). Given their huge success in natural language processing tasks across domains and industries, it has become ever more important to understand their application in PV. If their capabilities are used to full potential and adopted widely, the impact on PV applications will be immense, from analyzing structured and unstructured data for drug safety monitoring, to assessing, understanding, and preventing AEs, to case report generation (11,12). Because these models are trained on large amounts of quality data and can be adapted to specific tasks through techniques such as retrieval-augmented generation (RAG) and fine-tuning, the PV domain stands to benefit substantially from LLMs.
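As a concrete illustration of the RAG idea, the sketch below implements only the retrieval step, using a simple bag-of-words cosine similarity over an in-memory corpus; the documents and query are invented. A real deployment would use dense embeddings and a vector store, and would then pass the retrieved context to the LLM along with the question.

```python
import math
from collections import Counter

def vectorize(text: str) -> Counter:
    """Bag-of-words term counts (a simple stand-in for dense embeddings)."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, corpus: list, k: int = 1) -> list:
    """Return the k corpus documents most similar to the query."""
    qv = vectorize(query)
    return sorted(corpus, key=lambda d: cosine(qv, vectorize(d)), reverse=True)[:k]

corpus = [
    "Label update: drug X associated with hepatotoxicity in elderly patients.",
    "Pharmacokinetics of drug Y in renal impairment.",
    "Case series: drug X and QT prolongation.",
]
context = retrieve("hepatotoxicity reports for drug X", corpus)[0]
# The retrieved passage is then prepended to the LLM prompt:
prompt = f"Context: {context}\nQuestion: summarize hepatotoxicity findings for drug X."
```

The point of the pattern is that the model answers from retrieved, up-to-date safety documents rather than from its frozen pre-training knowledge.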

LLMs in safety signal detection

One of the primary applications of LLMs would be the timely detection of safety signals from vast datasets. This enables timely intervention: identifying medications with unexpected potential adverse effects, validating safety signals, and putting appropriate risk mitigation in place for issues that might otherwise lead to serious health consequences for patients. Li et al. (13) proposed a specialized fine-tuned model called AE-GPT, which uses LLMs for advanced AE detection.

LLMs in individual case safety report (ICSR) generation

ICSR generation involves the intake and processing of information from varied sources (such as spontaneous reporting systems, clinical notes, the literature, etc.), generating reports from it, and finally submitting those reports to regulatory agencies. This benefits patients through more rapid identification and communication of medical product safety information. Landman et al. (14) leveraged the contextual understanding and predictive capabilities of LLMs to analyze patient data, genomic profiles, clinical data, etc., to identify populations prone to AEs and the factors associated with increased risk. LLMs can assist in integrating complex risk factors, such as comorbidities and polypharmacy, into risk assessment algorithms, resulting in more accurate predictions (15).

LLMs in drug repurposing and design

Drug repurposing and design can be enhanced by LLMs analyzing vast amounts of data from the biomedical literature, patient information, safety signal reports, etc., to identify existing drugs with potential for new treatments for which they were not previously indicated. This involves analyzing drug-related data, such as molecular properties and safety profiles, and using them to help predict new therapeutic applications. Yan et al. (16) utilized ChatGPT to gain insights into existing drugs as candidates for repurposing for Alzheimer’s disease.

LLMs in aggregate analysis and periodic reporting

Aggregate analysis and periodic reporting involve compiling safety information for a drug over an extended period, which pharmaceutical companies must submit to regulatory agencies to comply with regulatory requirements. Generating such aggregate and periodic reports, such as periodic safety update reports (PSURs) and periodic adverse drug experience reports (PADERs), is time- and resource-intensive and prone to error. LLMs can overcome these challenges and enable real-time monitoring and faster identification of potential safety concerns for reporting (17). Through predefined criteria, templates, and algorithm settings, LLMs can automatically generate standardized aggregate reports, ensuring consistency in reporting practices (18).

LLMs in benefit-risk assessment and risk management

Weighing a drug’s benefits against its risks is important to ensure that benefits outweigh risks throughout the drug’s lifecycle, taking the patient’s perspective on drug usage into account. LLMs can significantly help by analyzing large volumes of study data to evaluate documented risks (likelihood and severity) and their impacts, and by assisting clinicians in deciding whether use of a drug is acceptable or unacceptable (19,20).

Significant progress has been made in applying LLMs to various pharmacovigilance tasks, as summarized in Table 2.

Table 2

Progress of LLM in pharmacovigilance

Pharmacovigilance aspect Progress of LLMs
Safety signal Li et al. (13) proposed a specialized fine-tuned model called AE-GPT which used LLMs for advanced AE detection
Li et al. (21) assessed the performance of various LLM models in categorizing medical publications and identifying drug safety signals amidst the extensive and ever-growing body of medical literature. The findings highlight the promising potential of LLMs in literature screening, achieving reproducibility of 93%, sensitivity of 97%, and specificity of 67%, showcasing notable strengths in terms of reproducibility and sensitivity, with moderate specificity
Matheny et al. (11) evaluated LLMs and discussed using them to improve post market surveillance in identifying new safety signal from electronic health records and refining existing signals
Sun et al. (22) investigated the use of ChatGPT for pharmacovigilance event extraction, identifying and extracting adverse events or potential therapeutic events from textual medical sources, where ChatGPT showed considerable performance
ICSR Landman et al. (14) leveraged the contextual understanding and predictive capabilities of LLMs to analyze patient data, genomic profiles, clinical data, etc., to identify populations prone to adverse events and the factors associated with increased risk
LLMs can assist in integrating complex risk factors, such as comorbidities and polypharmacy, into risk assessment algorithms, resulting in more accurate predictions (15)
PR von Csefalvay (23) designed an LLM called DAEDRA to detect regulatory-relevant outcomes such as mortality, ER attendance, and hospitalization in adverse event reports elicited through PR, and established the strengths and weaknesses of domain-specific LLMs for handling complex clinical entities
Drug repurposing and design Yan et al. (16) utilized ChatGPT to gain insights on existing drugs to be used for repurposing for Alzheimer’s disease
Aggregate analysis and periodic reporting Praveen et al. (17) explore the potential of using LLMs for data aggregation and analysis and in automating the reporting process
Shi et al. (24) evaluated automated text summarization as an alternative to manual summarization of food-effect information from extensive drug application review documents, where FDA professionals unanimously rated 85% of the summaries generated by GPT-4 as factually consistent with the golden reference summary
Benaïche et al. (25) created an LLM (internal to Bayer AG) called MyGenAssist (a ChatGPT based tool) to be used in pharmacovigilance case documentation process
Social media pharmacovigilance Carpenter et al. (26) demonstrated the capabilities of GPT-3, and similar models, for practical contributions to pharmacovigilance. GPT-3 was able to generate drug synonyms to facilitate pharmacovigilance based on social media
Chang et al. (27) proposed LightNER, a few-shot NER approach to identify ADR from UGC in social media. This approach used few labelled UGCs with an LLM called BART, and was able to achieve considerable performance in ADR identification

ADR, adverse drug reaction; AE, adverse event; ER, emergency room; FDA, Food and Drug Administration; ICSR, individual case safety report; LLM, large language model; NER, named entity recognition; PR, passive reporting; UGC, user-generated content.

Drawbacks of using standalone LLMs in PV

Even though the performance of LLMs has significantly improved and they show great potential to enhance PV workflows, providing high-level instructions to an LLM and expecting it to accomplish a complex PV task remains a challenge (28). Standalone LLMs cannot focus on a domain-specific task (unless fine-tuned for that very task), which can lead them to generate outcomes that amount to hallucinations. The reason is the absence of guided thinking and the inconsistent reasoning LLMs bring to these tasks, where they most often generate an outcome in a single step (29,30). Another drawback is the lack of cross-examination of LLMs’ outcomes and decision-making processes (31). This poses a great risk in PV, where outcomes must be consistent, accurate, and explainable to a great extent for better inference and to improve trustworthiness in LLMs. These drawbacks can be attributed to factors such as LLMs’ lack of access to specific data during pre-training, which leaves them relying mostly on in-built knowledge from generalized training datasets. Additionally, because LLMs are primarily next-token predictors, when given more complex activities they show an inability to break them down into simpler tasks, with no visible mechanism for validating their responses (32). Most high-performing LLMs are largely trained for generalized tasks, with the exception of medical-specific LLMs such as Med-PaLM, MedLlama-3, and BioBERT. Since comparatively little high-quality medical training data is used to train these models, there is a high chance of hallucination in their responses, with no guided thinking or cross-examination of the generated responses. Additionally, there is a high possibility of selection and performance bias that affects patient outcomes for populations from under-represented groups. This is because LLMs are often trained on datasets that contain insufficient data from underserved populations, leading to biased outputs. Moreover, the reasoning of the underlying neural networks that fuel generative AI algorithms and their responses lacks transparency in the decision-making process.

What makes LLM-based agentic AI systems different from standalone LLMs?

To best harness the capabilities of LLMs as components of a complex application, it is necessary to decompose the task into smaller sub-tasks and manage multiple LLM conversations, each with its own set of specifically defined instructions, state, and data sources. An AI agent, in simple terms, is an autonomous unit programmed to perform tasks, make decisions, and communicate with other agents. To elaborate, as seen in Figure 1, AI agents are programmed entities that can perceive, reason, strategize, and generate actions based on their environment using a suite of tools; they collaborate, learn, and communicate with each other through iterative feedback and discussion to provide better outcomes. AI agents work much like the human decision-making process: the environment provides the necessary stimuli, learning experience (past and present), and feedback for the agents to act upon and produce better outcomes (33,34).

Figure 1 LLM based agentic AI framework. AI, artificial intelligence; LLM, large language model.

Some of the key characteristics of AI agents include role play, focus, tool usage, cooperation, guardrails, and memory. These characteristics help AI agents engage effectively in perceiving, reasoning, strategizing, and performing actions while retrieving new information to enable customized decision-making (35). These agents learn actions effectively through trial-and-error experience gained from interactions with their environment. Such multistep reasoning, planning, and execution helps in handling more complex tasks and providing outcomes with confidence (36,37). Different types of AI agents are in use, namely symbolic agents, reactive agents, reinforcement learning-based agents, agents with transfer learning and meta learning, etc. The LLM-based agentic AI system is a new addition to the list, in which an LLM can suitably be the primary component of an agent’s brain. This differs from how LLMs work as standalone tools, compared with working alongside AI agents in an environment (38).

LLMs are considered the brain of agentic AI systems, providing capabilities such as task splitting, autonomy, reactivity, pro-activeness, and social ability. LLMs demonstrate impressive capabilities in various NLP tasks, and their chain-of-thought (CoT) and problem decomposition (task splitting) techniques help in planning and reasoning activities (22,36). Autonomy is visible in LLMs’ ability to generate human-like text, carry on a conversation without requiring human instructions at every step, produce creative outputs that were not explicitly programmed, and adjust their output based on environmental inputs. Reactivity involves responding to immediate changes and stimuli in the environment, evident in LLMs’ processing of information from various modalities such as audio, video, and images, which helps them interact with real-world physical environments and respond accordingly. Proactiveness involves taking actions and respective steps toward given goals: prompting techniques help elicit reasoning abilities, while goal reformulation and task decomposition techniques help the system adapt to its environment (37). Additionally, an agentic system can provide tools for LLMs to interact with and draw inferences from (39). Social ability enables interaction with other agents through collaboration, cooperation, and competition, which is essential for cross-examining an AI agent’s outcomes and generating more efficient responses (40). The relations between agents and LLMs govern the state of interaction and cooperation among them (38,41).

Building blocks of an agentic AI system

Planning and reasoning

Agents use LLMs for planning and reasoning based on goals or objectives to solve problems. Agentic AI systems use an iterative, multistep reasoning framework, integrating aspects such as conditional logic, goal-based planning, and the ReAct (reason + act) framework to dynamically adapt and adjust actionable plans, choose appropriate tools, handle errors, and reprioritize steps based on changes in the environment, updated learnings, or new data, helping to achieve the desired goals or objectives (36).
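A minimal sketch of such a ReAct-style loop is shown below. The LLM reasoning step is replaced by a hard-coded stub and the two tools are hypothetical stand-ins returning canned observations; in practice the decision would come from an LLM call and the tools from real data-source clients.

```python
def llm_decide(goal: str, observations: list) -> tuple:
    """Stub for the LLM reasoning step: choose the next action from context."""
    if not observations:
        return ("query_faers", goal)          # no evidence yet: gather reports
    if "signal" in observations[-1]:
        return ("finish", observations[-1])   # enough evidence: stop
    return ("search_literature", goal)        # otherwise widen the search

# Hypothetical tools returning canned observations (illustrative only).
TOOLS = {
    "query_faers": lambda q: f"FAERS: 12 new reports, possible signal for {q}",
    "search_literature": lambda q: f"PubMed: 2 case reports on {q}",
}

def react_agent(goal: str, max_steps: int = 5) -> str:
    """Alternate Reason -> Act -> Observe until the goal is met or steps run out."""
    observations = []
    for _ in range(max_steps):
        action, arg = llm_decide(goal, observations)  # Reason
        if action == "finish":
            return arg
        observations.append(TOOLS[action](arg))       # Act, then Observe
    return "max steps reached"

print(react_agent("drug X hepatotoxicity"))
```

The step budget (`max_steps`) is the kind of guardrail that keeps an agent's plan-act loop bounded even when the environment never yields a terminating observation.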

Memory

The memory mechanism in the agentic AI system captures and retains contextual information from its environment, along with the past and current interactions, facilitating informed decisions in future actions. Short-term memory holds immediate information, long-term memory accumulates experiences, while knowledge retrieval accesses and utilizes information from large internal or external knowledge bases. This is crucial in building a comprehensive knowledge base of drug safety profiles, such as remembering past AEs linked to a particular drug.
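The three memory tiers described above can be sketched as a simple class; the structure and the drug/AE example are illustrative, not a reference design, and real systems typically back the knowledge tier with a database or vector store.

```python
from collections import deque

class AgentMemory:
    """Illustrative sketch of short-term, long-term, and knowledge memory."""

    def __init__(self, short_term_size: int = 5):
        self.short_term = deque(maxlen=short_term_size)  # recent interactions
        self.long_term = []                              # accumulated experience
        self.knowledge = {}                              # e.g. drug -> known AEs

    def observe(self, event: str) -> None:
        """Record an interaction in both short- and long-term memory."""
        self.short_term.append(event)
        self.long_term.append(event)

    def remember_ae(self, drug: str, ae: str) -> None:
        """Add an adverse event to the drug's safety-profile knowledge base."""
        self.knowledge.setdefault(drug, set()).add(ae)

    def recall(self, drug: str) -> set:
        """Knowledge retrieval: past AEs linked to a particular drug."""
        return self.knowledge.get(drug, set())

mem = AgentMemory(short_term_size=2)
mem.remember_ae("warfarin", "bleeding")
for e in ("case 1", "case 2", "case 3"):
    mem.observe(e)
# Short-term memory keeps only the two most recent interactions;
# long-term memory keeps all three.
```

The bounded `deque` mirrors the limited context window of immediate information, while the unbounded long-term list and the knowledge dictionary persist across interactions.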

Tools

Agentic AI systems use tools to access and interact with external knowledge bases, systems, and environments such as AE reporting application programming interfaces (APIs), drug databases, electronic health records, scientific literature, etc., to retrieve information, validate findings, perform actions and orchestrate multistep tasks to achieve their desired goals or objectives (42,43).
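A common implementation pattern for this is a tool registry that dispatches the actions an agent's LLM requests. The tool names and payloads below are hypothetical stand-ins for real AE-reporting APIs and drug databases, not actual endpoints.

```python
import json

def query_ae_api(drug: str) -> str:
    """Stand-in for an adverse-event reporting API client (hypothetical)."""
    return json.dumps({"drug": drug, "new_reports": 12})

def lookup_drug_db(drug: str) -> str:
    """Stand-in for a drug-database lookup (hypothetical)."""
    return json.dumps({"drug": drug, "class": "anticoagulant"})

TOOL_REGISTRY = {
    "query_ae_api": query_ae_api,
    "lookup_drug_db": lookup_drug_db,
}

def call_tool(name: str, arg: str) -> str:
    """Dispatch a tool call requested by the agent's LLM."""
    if name not in TOOL_REGISTRY:
        return json.dumps({"error": f"unknown tool: {name}"})
    return TOOL_REGISTRY[name](arg)

call_tool("query_ae_api", "warfarin")
```

Returning JSON strings keeps tool outputs in a form the LLM can read back as observations, and the explicit registry doubles as a guardrail: the agent can only invoke tools it has been granted.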

Self-reflection

Self-reflection enables agentic AI systems to systematically analyze their own strategies, decision-making processes, actions, outputs, and performance, and to learn quickly and efficiently from trial and error and past experience, improving outputs and performance on more complex tasks (Figure 1) (44,45).

Scope of LLM-based agentic AI system in PV

Agentic AI systems draw a wide variety of inferences about themselves, other agents, and their environment; they create plans that reflect their characteristics and experiences, act out those plans, react, and re-plan when appropriate; and they respond when the end user changes their environment (46). Real-world scenarios, such as those in PV, often require multistep reasoning, planning, and repeated interactions with data, tools, and other agents to uncover new insights and make informed, personalized decisions.

Some of the key challenges facing PV industry are:

  • The rise in new safety issues for new drugs, reported across various safety reporting systems.
  • Increase in the number and variety of data sources that need to be evaluated for safety information.
  • Limited availability of human safety experts, which delays processing and actions taken based on AE reports.
  • Different regulations, guidelines, standards and formats among the regulatory agencies [FDA, European Medicines Agency (EMA), Medicines and Healthcare products Regulatory Agency (MHRA), etc.] for processing of the reports.
  • The need, as part of the continuous monitoring process, to keep data, and the reports based on those data, up to date in order to identify safety issues, which is a challenge.

PV challenges requiring an LLM-based agentic AI system rather than a standalone LLM

Based on the above challenges, and as seen in Table 3, a few areas in PV are identified below that require significant manual effort, real-time decision-making, problem-solving capabilities, and goal-directed behavior. LLM-based agentic AI solutions provide these capabilities, unlike standard LLMs (47). Efficiently overcoming these labor-intensive and complex challenges with such solutions can deliver high impact and prove significantly beneficial to PV (22,23,45,48).

Table 3

Reasons for utilizing LLM agentic AI systems compared to standalone LLMs

Pharmacovigilance areas Capability required Reason
Safety signal management Real-time decision-making Continuous monitoring of AEs as new information becomes available, for faster detection of potential safety signals before they escalate into major issues with an impact on patients
Processing and evaluation of ICSRs Complex problem-solving Rapid processing of ICSRs involving several interconnected steps, requiring data analysis from diverse sources and formats while satisfying the intricate requirements of ICSR processing and evaluation
Benefit-risk analysis Complex problem-solving Multi-faceted evaluation that requires multistep assessment, weighing, and integration of the diverse factors contributing to a drug’s benefits and risks, and modelling scenarios to predict outcomes, given the intricate and multidimensional nature of this analysis
Aggregate report analysis and documentation Goal-oriented behavior Well-defined, structured reports must be prepared, involving integration and analysis of data from diverse sources; repetitive tasks such as aggregation, tabulation, and formatting can be automated via agents to prepare a comprehensive safety profile in line with regulations and guidelines

AEs, adverse events; AI, artificial intelligence; ICSR, individual case safety report; LLM, large language model.

The identified list of areas in PV includes:

  • Safety signal management.
  • Processing and evaluation of ICSR.
  • Benefit-risk analysis.
  • Aggregate report analysis and documentation.

Choosing multi-agentic system for PV tasks

A multi-agent system, in which every agent collaborates and cooperates in accomplishing the given task, yields much improved outcomes. In such a system, each agent can be assigned a smaller sub-task and collaborates with other agents to accomplish the user objective. Multiple agents make the handling of complex tasks more manageable and maintainable. Additionally, individual agents perform better when trained on specialized data relevant to a particular task or domain (49). Finally, agentic AI systems with multiple agents have improved resilience and fault tolerance, since the failure of one agent is less likely to impact the entire system (50,51).

The list of agents identified for each of the PV areas below is based on the use case objectives, a breakdown of the individual workflows, the task category, external integration needs, and scalability considerations. Each of these agents can be coupled with a critic agent, which provides feedback on the agent’s output and makes the agent’s reasoning and decision-making more efficient. The agent then regenerates its output based on the critic agent’s feedback, and this agent-critic interaction can continue until the critic approves the agent’s response.
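
The agent-critic interaction described above can be sketched as a simple control loop. This is a minimal illustration, not a production implementation: `run_agent` and `run_critic` stand in for real LLM calls, and the round limit is an assumed safeguard.

```python
from dataclasses import dataclass

@dataclass
class CriticVerdict:
    approved: bool
    feedback: str

def agent_critic_loop(task, run_agent, run_critic, max_rounds=3):
    """Regenerate the agent's output until the critic approves or rounds run out."""
    feedback = None
    output = None
    for _ in range(max_rounds):
        output = run_agent(task, feedback)   # agent (re)generates its output
        verdict = run_critic(task, output)   # critic agent reviews the output
        if verdict.approved:
            return output
        feedback = verdict.feedback          # feed the critique back to the agent
    return output  # best effort if the critic never approves
```

The bounded loop matters in practice: without `max_rounds`, a critic that never approves would stall the whole workflow.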

Safety signal management

A safety signal is reported information on a possible causal relationship between an AE and a drug, where the relationship was previously unknown or incompletely documented (52). Data sources from which signals are detected include EHRs and spontaneous reporting systems such as the FDA Adverse Event Reporting System, EU EudraVigilance, etc. The safety signal management process has two parts: signal prioritization and signal evaluation (53,54).

The identified agents (Table 4) related to safety signal management include:

  • DataAggregation agent can aggregate data relevant to signal management and evaluation from various sources, tools, and applications in real time.
  • SafetySignalDetection agent can perform real-time monitoring of safety signals from the data sources. The agent can be provided with criteria for finding potential signals and can be interfaced with all the relevant data sources via an API, a local machine, or open-source datasets. This agent can be trained with structured data (from the FDA AE reporting system, etc.), unstructured data (medical literature, EHRs, etc.), and labelled datasets (data prepared and validated by PV experts).
  • SignalCategory agent can confirm the presence of a signal from sufficient evidence and categorize it as signal presence or signal absence based on criteria such as strength of evidence, medical significance, impact on public health, and clinical relevance and context. This agent can be trained for binary classification of a signal into two classes, signal and non-signal, based on the above criteria.
  • SignalValidation agent can match the details of a signal against datasets of previously identified signals on the same ADE of the drug, check whether sufficient data was used to classify it as a signal, and check whether the signal was handled differently through different procedures.
  • SignalReportGeneration agent can generate reports related to the various signal management and evaluation steps, such as signal detection, signal validation, signal assessment, regulatory signal reports, etc. These reports can be generated in a specified format from the data obtained from the other agents.
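
The chain of agents above can be illustrated as a pipeline of plain functions. This is a hedged sketch: in a real system each step would wrap an LLM call plus tool access, and the disproportionality threshold (`min_reports`) is an illustrative assumption, not a validated detection criterion.

```python
def data_aggregation_agent(sources):
    # DataAggregation: merge reports from all sources into one working list.
    return [report for source in sources for report in source]

def safety_signal_detection_agent(reports, min_reports=3):
    # SafetySignalDetection: flag drug-event pairs seen at least `min_reports` times.
    counts = {}
    for r in reports:
        key = (r["drug"], r["event"])
        counts[key] = counts.get(key, 0) + 1
    return [{"drug": d, "event": e, "count": c}
            for (d, e), c in counts.items() if c >= min_reports]

def signal_category_agent(candidate):
    # SignalCategory: stand-in for binary signal/non-signal classification.
    return "signal" if candidate["count"] >= 3 else "non-signal"

def run_signal_pipeline(sources):
    reports = data_aggregation_agent(sources)
    candidates = safety_signal_detection_agent(reports)
    return [{**c, "category": signal_category_agent(c)} for c in candidates]
```

Validation and report generation would follow as further stages consuming this pipeline's output.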

Table 4

AI agents list related to pharmacovigilance

Pharmacovigilance areas AI agents list
Safety signal management DataAggregation agent
SafetySignalDetection agent
SignalCategory agent
SignalValidation agent
SignalReportGeneration agent
Processing and evaluation of ICSRs InfoElicitation agent
ICSRValidation agent
CausalityRelationFinder agent
Confirmation agent
ICSRReportGeneration agent
Benefit-risk analysis SafetySignalIdentifier agent
PrioritizeSignal agent
BenefitRiskAnalysis agent
BenefitRiskAssessment agent
BenefitRiskFinalEvaluation agent
Aggregate report analysis and documentation DataCollection agent
SignalIdentification agent
RealTimeData agent
AggregateReportGeneration agent

AI, artificial intelligence; ICSR, individual case safety report.

Once a signal is confirmed and found to have a significant impact on the population, recommendations from bodies such as the EMA’s Pharmacovigilance Risk Assessment Committee (PRAC) or the FDA can be followed, and the signal is reported by the marketing authorization holder or a regulator.

Processing and evaluation of ICSR

With the ever-increasing volume of ICSRs being submitted to regulatory agencies (such as the FDA, EMA, etc.) and a visible shortage of human safety experts to review these reports, it has become important to reduce process inefficiencies in this resource-intensive, error-prone, and highly impactful workflow. ICSRs are still the primary source of monitoring and early warnings on drug safety at the individual patient level. There are two parts to this workflow: case processing and case causality assessment.

An LLM agentic AI system can be involved in a process where multiple agents are tasked with sub-tasks of the overall processing and evaluation of an ICSR. The identified agents (Table 4) for this include:

  • InfoElicitation agent can extract and analyze AEs, serious AEs (SAEs), and ICSRs from all relevant sources. This agent can be interfaced with the required tools and data sources from healthcare providers, clinical trials, medical literature, spontaneous reporting systems, etc.
  • ICSRValidation agent can evaluate the minimum criteria to classify the ICSR case (valid/invalid) from the information aggregated by the InfoElicitation agent. The minimum criteria include an identifiable patient (identified by initials, gender, height, weight, etc.), an identifiable reporter of the AE (identified by initials, address, email, etc.), a suspected drug (name, dosage, strength, indications of use), and a suspected AE (reaction description, date of reaction, signs, symptoms, etc.). This agent can be trained to classify valid/invalid ICSR cases with labelled datasets of cases meeting the minimum criteria.
  • CausalityRelationFinder agent can check for a causal relationship between a drug and an AE only on the valid cases provided by the ICSRValidation agent. The agent can provide additional outputs, such as the strength of the relation (“none”, “weak”, or “strong”), a priority value (“life-threatening”, “critical”, or “non-critical”), and a final conclusion (“yes” or “no”) on whether the case should be considered.
  • Confirmation agent can run a duplicate check to ensure that the CausalityRelationFinder agent is not working on a drug and an AE with an existing follow-up case. The agent extracts key features from existing ICSR narratives, performs a high-level assessment of the source documents, and confirms the case as valid before action is taken.
  • ICSRReportGeneration agent can generate the ICSR in the format specified by the regulatory agencies, aggregating information from the previous agents.
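
The minimum-criteria check performed by the ICSRValidation agent can be sketched as below. The field names are illustrative assumptions; a real implementation would map to the E2B data elements rather than these simplified keys.

```python
# Hypothetical minimum criteria for a valid ICSR, per the four elements above:
# identifiable patient, identifiable reporter, suspected drug, suspected AE.
MINIMUM_CRITERIA = ("patient", "reporter", "suspected_drug", "suspected_event")

def icsr_validation_agent(case: dict) -> dict:
    """Classify a case as valid/invalid and list any missing minimum criteria."""
    missing = [field for field in MINIMUM_CRITERIA if not case.get(field)]
    return {"status": "valid" if not missing else "invalid", "missing": missing}
```

Returning the list of missing fields, rather than a bare verdict, lets a downstream follow-up step request exactly the information needed to complete the case.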
Benefit-risk analysis

Benefit-risk assessment involves analysis that aims to appraise the usefulness of a medicine in a given indication, jointly accounting for favorable and unfavorable effects. There is uncertainty in estimating the probability of effects (favorable and unfavorable) due to limited and sometimes conflicting data, differences in perspectives (patient, societal, and regulatory), challenges in choosing valuation criteria, heterogeneity of effects across patient populations, multiple objectives (maximizing benefits, minimizing risks), etc. The complexity and difficulty of synthesizing and interpreting the appropriate information make communicating the reasons and rationale for regulatory decisions a challenge (55).

An LLM agentic system can have multiple LLM-based agents, each assigned a smaller sub-task of the overall benefit-risk evaluation and collaborating with the other agents to accomplish the ultimate goal of weighing the benefits and risks associated with medications. The agents (Table 4) can have sub-tasks such as the following:

  • SafetySignalIdentifier agent analyzes data from various sources (such as ongoing monitoring of AEs, ICSRs, different aggregate reports, scientific and medical literature, etc.) to identify new signals of adverse reactions showing that a drug has an AE.
  • PrioritizeSignal agent could obtain the identified safety signal list from the SafetySignalIdentifier agent and prioritize newly identified signals against the existing ones and their AE details.
  • BenefitRiskAnalysis agent could first extract newly identified signals from the PrioritizeSignal agent and analyze benefit-risk for the drugs based on two major factors. One is the frequency of the drug’s negative effects (e.g., affecting 1 in 100 versus 1 in 10,000) from observational or experimental data on the outcomes of various treatments. The other is the desirability of the drug’s effects, such as therapeutic pharmacology, efficacy, and desirable drug interactions. The agent can be tasked with identifying whether the drug has a negative effect, comparing it with already listed signals and their risks, and identifying any beneficial effect of the drug from the signal.
  • BenefitRiskAssessment agent could weigh benefit-risk for the signals identified with critical and major risks, through frameworks such as PrOACT-URL, BRAT, MCDA, SMAA, decision trees, etc. Suitable tools and models can be interfaced with the agent for the assessment.
  • BenefitRiskFinalEvaluation agent can extract additional evidence where required, re-analyze the benefit-risk balance, present the resulting benefits and risks, and assist in deciding whether to continue with the product based on the various factors mentioned above (56).
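
As a toy illustration of the weighing step, the additive scoring below is in the spirit of MCDA-style frameworks: benefits carry positive scores, risks negative ones, and stakeholder-chosen weights combine them. The criteria, weights, and scores are invented for illustration, not a validated model.

```python
def weighted_benefit_risk(scores: dict, weights: dict) -> float:
    """Net weighted value: positive favors benefit, negative favors risk."""
    return sum(weights[criterion] * scores[criterion] for criterion in scores)

# Illustrative inputs (all values are assumptions):
scores = {"efficacy": 0.8, "serious_ae_rate": -0.4, "qol_gain": 0.5}
weights = {"efficacy": 0.5, "serious_ae_rate": 0.3, "qol_gain": 0.2}
net = weighted_benefit_risk(scores, weights)  # net > 0 favors continued use
```

Real frameworks such as SMAA additionally model the uncertainty in both scores and weights, which a single point estimate like this cannot capture.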
Aggregate report analysis

Aggregate reporting involves reviewing the cumulative safety information from a wide range of data sources, on a periodic basis, to submit the findings to regulators worldwide. Preapproval aggregate reports include the Investigational New Drug (IND) report, the New Drug Application (NDA) report, etc.; post-approval aggregate reports include the Periodic Benefit Risk Evaluation Report (PBRER), the Periodic Adverse Drug Experience Report (PADER), etc. (57,58). Manual data processing and aggregation are time- and resource-intensive and pose risks such as human errors, inconsistencies, and delays in identifying safety signals, potentially impacting patient safety. Additionally, reporting requirements vary by country, approval status, and development stage of the medicinal product (59,60).

An LLM agentic AI system can assist in three major sub-tasks of aggregate report analysis and documentation: data collection from varied sources in different formats, identifying trends or patterns for potential safety signals, and preparation of pre-marketing and post-marketing reports (58). The system can have multiple agents, each assigned a smaller sub-task and collaborating with the other agents to accomplish the ultimate goal of aggregate report analysis. The agents (Table 4) can have sub-tasks such as the following.

  • DataCollection agent could gather relevant PV data from varied internet sources related to medical literature, spontaneous reports, EHRs, clinical trials, etc. This data can be stored in a database by the agent for further analysis.
  • SignalIdentification agent could utilize machine learning or neural network models to decipher patterns or trends and identify safety signals from this dataset and database for a particular drug or pharmaceutical product.
  • RealTimeData agent communicates with the SignalIdentification agent to ensure that real-time data analysis is performed and that patterns and trends are identified from these data.
  • AggregateReportGeneration agent generates reports based on the formats required for different reports and by different agencies. These formats are given as input to the agent, which generates a comprehensive report from them. The SignalIdentification and RealTimeData agents constantly communicate on updates made to the information, and the corresponding signal updates are made; accordingly, the AggregateReportGeneration agent can generate new reports.
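
The template-filling step of the AggregateReportGeneration agent can be sketched with the standard library. The section layout and field names below are assumptions for illustration, not an actual PBRER format.

```python
from string import Template

# Hypothetical report section template; real agency formats are far richer.
PBRER_SECTION = Template(
    "Product: $product\n"
    "Reporting period: $period\n"
    "New signals identified: $signal_count\n"
    "Summary: $summary"
)

def aggregate_report_generation_agent(findings: dict) -> str:
    """Fill the agency-specified template with aggregated findings."""
    return PBRER_SECTION.substitute(findings)
```

Because the format is supplied as data rather than hard-coded, the same agent can serve different report types and agencies by swapping the template.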

Technical architecture for LLM-based agentic AI system

Agent orchestration types

Multi-agent orchestration refers to the design patterns, protocols, and coordination strategies that govern how AI agents interact, delegate responsibilities, and collectively achieve the user’s objective, given each agent’s individual role and tasks. In multi-agent orchestration, an orchestrator plays an important role in ensuring that all the individual agents working towards an objective operate smoothly. The orchestrator’s role includes task allocation to appropriate agents, information aggregation from different agents to ensure nothing is missed during task execution, QA checks on completion of tasks done by agents, and conflict resolution among agents. The three major orchestrator types are the supervisory agent, the hierarchical controller, and the dynamic coordinator, which can be used based on the needs and complexity of the agentic AI system.

  • Supervisory agent is a single, central agent (acting as a supervisor or orchestrator) that assigns, monitors, and coordinates individual specialized agents. For example, in safety signal management, the supervisory agent assigns AE data collection and aggregation to one agent, signal detection to another, validation to a third, etc., ensuring continuous monitoring, resolution of internal conflicts, and eventual completion of their tasks.
  • Hierarchical controller involves agents organized in stages, where each stage handles a specific step or steps in the workflow. For example, in ICSR processing, it structures the pipeline into well-defined, ordered stages, with each stage managed by identified specialized agents such as InfoElicitation, ICSRValidation, CausalityRelationFinder, etc. This ensures a clear division of labor among agents and sequential processing, which mirrors the real-world pipeline.
  • Dynamic coordinator is context-aware and adaptive, routing tasks to agents dynamically based on real-time data. For example, in aggregate report analysis, if there are unexpected changes to data sources late in the reporting cycle or evolving regulatory reporting requirements, the dynamic coordinator integrates new data sources by calling the RealTimeData agent (as applicable), so that all relevant sources and evidence are considered on short notice.
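
The supervisory-agent pattern can be sketched as a central router that dispatches each task to its registered specialist and collects the results. The class and method names are assumptions; real orchestrators add monitoring, retries, and conflict resolution on top of this skeleton.

```python
class SupervisoryOrchestrator:
    """Central supervisor that assigns tasks to registered specialist agents."""

    def __init__(self):
        self._agents = {}  # task type -> agent callable

    def register(self, task_type, agent):
        self._agents[task_type] = agent

    def run(self, tasks):
        # Allocate each task to its specialist and aggregate the outputs.
        results = {}
        for task_type, payload in tasks:
            if task_type not in self._agents:
                raise ValueError(f"no agent registered for {task_type!r}")
            results[task_type] = self._agents[task_type](payload)
        return results
```

A hierarchical controller would instead chain stage outputs into stage inputs, and a dynamic coordinator would choose the agent per task at runtime rather than from a fixed registry.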
Memory management

An agentic memory architecture is required for memory management of LLM agents. This architecture is critical for establishing meaningful connections based on semantic similarity, multistep reasoning, iterative planning, learning and adapting, and reflection. Appropriate memory management helps agents retain the context of communication among themselves, cater to specific use case scenarios based on communication history, and learn, adapt, and improve their decision-making. There are three primary types of memory: short-term, long-term, and knowledge retrieval.

  • Short-term memory is temporarily captured and retained, holding recent interactions and communications. This memory is available for a limited time for immediate analysis and response, and dissipates after task completion.
  • Long-term memory is stored and retained, holds accumulated knowledge from past interactions, scales beyond single tasks, and retains context across multiple sessions.
  • Knowledge retrieval brings in external knowledge, thereby expanding the range of information the agent can access.

The appropriate memory type can be selected based on the PV workflow.
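
The three memory types can be sketched as one small structure: a bounded short-term buffer, an append-only long-term store, and a retrieval hook into an external knowledge source. The buffer size and the keyword lookup are illustrative assumptions; real systems typically use vector stores for retrieval.

```python
from collections import deque

class AgentMemory:
    def __init__(self, short_term_size=5, knowledge_base=None):
        # Short-term: bounded buffer; oldest entries dissipate automatically.
        self.short_term = deque(maxlen=short_term_size)
        # Long-term: accumulated knowledge retained across sessions.
        self.long_term = []
        # Knowledge retrieval: external information beyond the agent's own history.
        self.knowledge_base = knowledge_base or {}

    def remember(self, message):
        self.short_term.append(message)
        self.long_term.append(message)

    def retrieve(self, keyword):
        return self.knowledge_base.get(keyword)
```
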

Tool integration architectures and APIs

Tool integration architectures and APIs are important to PV workflows, providing seamless data exchange, process automation, compliance, and advanced analytics across diverse PV workflows. Architectural approaches include:

  • Modular, API-driven integration.
    • The individual specialized agents can utilize RESTful APIs to extract case data, coding dictionaries, or safety literature from data sources (such as PubMed, MedDRA, etc.).
    • The agents can leverage a microservices architecture to achieve modularity, scalability, efficiency, and seamless integration; without it, these individual agents risk becoming siloed, redundant, and harder to scale or update.
    • Cloud-native integration enables multi-agent PV systems to be scalable, secure, and event-driven. The agents use cloud services for real-time task orchestration, global data ingestion, modular microservice deployment, and secure API communication. This ensures adaptability, high availability, compliance, and efficient workload handling across dynamic, global PV workflows.
  • Centralized data warehousing & ETL.
    • Centralized ETL (extract, transform, load) platforms can aggregate safety data from EHRs, spontaneous reporting databases, literature, and social media. Data are standardized, cleansed, and loaded into a unified repository for analytics or regulatory submissions. The individual agents (such as the ICSRValidation agent, CausalityRelationFinder agent, etc.) can use and update shared data in the warehouse, avoiding siloed data and processes. The ETL pipeline can process structured and unstructured data before providing it to the agents, so they always work with high-quality, regulation-ready information.
  • Multi-modal connectors.
    • The agentic AI systems can use multi-modal data connectors to seamlessly combine and analyze all relevant PV data types, such as structured, unstructured and semi-structured. This enables robust, automated, and contextually rich safety monitoring, leading to earlier and more accurate signal detection as well as more comprehensive assessments for regulatory compliance and patient safety.
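
As a concrete instance of modular, API-driven integration, the sketch below builds a query against NCBI's public E-utilities search endpoint (the `db`, `term`, `retmode`, and `retmax` parameters are documented fields of that API). Authentication, rate limiting, and error handling, which a production agent would need, are omitted.

```python
from urllib.parse import urlencode

# NCBI E-utilities ESearch endpoint (public, documented).
EUTILS_SEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

def build_literature_query(drug: str, event: str, retmax: int = 20) -> str:
    """Build a PubMed search URL for literature on a drug-event pair."""
    params = {
        "db": "pubmed",
        "term": f"{drug} AND {event} AND adverse",  # assumed query strategy
        "retmode": "json",
        "retmax": retmax,
    }
    return f"{EUTILS_SEARCH}?{urlencode(params)}"
```

An InfoElicitation or DataCollection agent could issue such requests through a shared tool layer, keeping the retrieval logic out of the agents themselves.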
Implementation frameworks

Several open-source and research-grade frameworks support multi-agent orchestration in PV. Langroid, used in the PV MALADE agentic system (32), provides flexible, language-driven agent coordination. LangGraph (61,62) can be adopted for graph-based, multi-agent workflows in healthcare applications. CrewAI (61,63) provides structured role-based agent collaboration through well-defined task delegation. AutoGen (64) facilitates dynamic, autonomous agent interactions and conversation management. These frameworks collectively enable scalable, modular, and explainable LLM-based agentic AI system implementation.

Human-in-loop integration

Human-AI collaboration, in terms of monitoring and reviewing AI agents’ actions and outputs, is required for PV, since the consequences of agentic actions and outputs are especially significant in the PV domain. This collaboration can be established in an agentic AI system where human oversight and inputs improve the combined cognitive and value-based judgment on the system’s decisions. It covers strategic trade-offs (e.g., AE detections versus missed contextual nuances), handling of data privacy and regulatory concerns, and uncertain or novel scenarios that agents cannot handle alone. A human feedback loop can be implemented per the requirements of the agentic system, while identifying and minimizing the areas of collaboration that might reduce efficiency (65).
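
A minimal sketch of such a feedback loop is a gate that routes high-impact agent outputs to a human reviewer while auto-accepting the rest. The risk score and threshold are illustrative assumptions; in practice the routing rule would reflect regulatory criteria such as case seriousness.

```python
def human_in_loop_gate(output, risk_score, reviewer, threshold=0.7):
    """Route high-risk outputs to a human reviewer; auto-accept the rest."""
    if risk_score >= threshold:
        # High-impact decision: defer to the human expert.
        return reviewer(output)
    return {"decision": "auto-accepted", "output": output}
```

Keeping the threshold configurable also addresses the efficiency concern above: oversight can be concentrated where it changes outcomes rather than applied uniformly.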

Expected benefits of using LLM-based agentic AI system in PV

Enhanced data ingestion, handling and processing

An LLM-based agentic AI system is capable of autonomously gathering, ingesting, processing, merging, and interpreting diverse, complex data in PV. AI agents will be able to access and integrate with appropriate PV tools (private files, databases, software applications, APIs, etc.) to extract AE data in various available formats (E2B XML, case notes, CSV, PDF, JSON/XML, etc.), convert them into a common intermediary format, process and merge them to ensure schema compatibility, and use them to derive insights on new safety signals and the benefit-risk of a drug, while applying those insights in ICSR and aggregate reporting. However, access to and usage of tools for data analysis will be determined by the goals set for the agentic system, which are extracted from the user’s query, the tasks identified by the agents, and the respective tools chosen autonomously by the agents for those tasks.
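
The conversion to a common intermediary format can be sketched as a small dispatcher over input formats. Only JSON and CSV are shown; the dispatch rule and record keys are assumptions, and real E2B XML handling would require a proper parser against the E2B schema.

```python
import csv
import io
import json

def normalize_record(raw: str, fmt: str) -> list:
    """Convert a raw AE payload into a common list-of-dicts intermediary format."""
    if fmt == "json":
        data = json.loads(raw)
        return [data] if isinstance(data, dict) else data
    if fmt == "csv":
        return list(csv.DictReader(io.StringIO(raw)))
    raise ValueError(f"unsupported format: {fmt}")
```

Once every source is reduced to the same record shape, merging and schema-compatibility checks become a single downstream step instead of one per format.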

Faster turnaround time

An LLM-based agentic AI system enables real-time detection and prioritization of safety signals by dynamically ingesting and analyzing data from its sources, enabling faster identification and escalation that can bring the signal detection cycle down from weeks to days. Similarly, the agents can accelerate ICSR case handling and processing by automating data intake, AE coding, case triage (based on seriousness, expected/unexpected, known product, etc.), and generation of a detailed narrative in the specified format [e.g., ICH E2B(R2)], without missing regulatory reporting deadlines (7, 15, and 90 days). The agentic system can connect to relevant data sources (clinical trials, EHRs, registries, etc.) and dynamically ingest real-world data such as social media, databases, internet sources, etc. This data can be used to aggregate and identify patterns, trends, benefits, and risks using decision-making tools such as a value tree, with risk matrices created to rank risks by severity and frequency, enabling rapid decision-making. AI agents can auto-generate content and populate templates, reducing report preparation time by 50–80% and improving submission timelines.

Scalable processing

The PV workflow involves handling increasing data volumes in varying formats and under varying regulations. For standalone LLM models, this comes with considerable scaling of cost and complexity and degradation of performance. In contrast, agentic AI systems, with their modular, segregation-of-duties approach, can handle this growth without a proportional increase in resource cost and complexity while maintaining performance efficiency. This is implemented in the agentic AI system through distributed processing to handle large data volumes efficiently and through modularity, with a unique, distinct role assigned to each agent operating independently yet cooperatively, reducing the scaling of overall computation cost while tracking decisions and monitoring performance individually. Additionally, assigning distinct, specialized tasks based on the training of individual agents can significantly reduce computation cost, scale performance on each agent’s activities, and provide clear individual goals for each agent, reducing complexity.

Enhanced decision-making

Agentic AI systems significantly enhance the PV workflow by enabling a decentralized, task-specific decision-making process with continuous adaptation through real-time feedback, such as new AE patterns, regulatory agency alerts, reviewer comments, etc. An agentic AI system can handle report sections independently, such as efficacy trends, new safety data, and literature synthesis, leading to more timely, cohesive, and data-rich submissions. This enables scalable, adaptive, and intelligent task distribution across the different PV workflows, supporting enhanced and efficient decision-making. It ultimately leads to faster ICSR handling, multi-source signal detection, and more nuanced benefit-risk evaluations, enhancing workflow efficiency, patient safety, regulatory compliance, and outcomes across the PV lifecycle.

Improved relevancy, consistency and accuracy

The ability of agentic AI systems to plan, reason, integrate with applicable external PV tools, take action, and establish communication between the critic agent and the other agents helps improve the relevancy, accuracy, and consistency of outcomes in PV workflows. Having individual agents carry out tasks for a single process (e.g., safety signal management, benefit-risk analysis, etc.) prevents errors from cascading through overly complex interactions and limits the contextual scope each agent handles. An important aspect of the agentic AI system is the iterative approach of planning, reasoning, tool integration, action, and continuous critic-agent feedback, which ensures ongoing refinement, reduces error propagation, and significantly enhances the overall relevancy, accuracy, and consistency of outcomes across complex PV workflows.

Having real-time insights

The agentic AI system’s ability to plan, reason, act, and iterate in changing, dynamic environments (such as social media, medical literature, spontaneous reporting data, etc.), without relying on the static training data of the underlying LLMs, helps to identify novel ADRs in rapidly changing therapeutic contexts, dynamically reprioritize safety signals as new evidence emerges, iteratively refine data intake to update case processing based on regulatory updates and incoming data volumes, and predict the downstream effects of newly identified risks, enabling preemptive interventions before issues escalate.

Explainability

Explainability in AI decision-making is essential in PV to ensure human understandability, transparency, trust, effective monitoring, and regulatory compliance, clarifying the decision logic and influencing factors behind safety signal detection, benefit-risk assessment, and the data analysis done for aggregate and case reports. AI agents are able to provide post hoc, context-sensitive explanations detailing why a specific input led to an output. Additionally, agents align their explanations with their specific roles (e.g., SafetySignalDetection, SignalValidation, etc.), which provides scope for quick debugging, reduces misinterpretation risk, and gives clarity on each individual agent’s task and outputs (66).

Additionally, Table 5 lists some of the key PV metrics that can be improved using an LLM-based multi-agentic AI system.

Table 5

Some key metrics that can be improved using LLM Agentic AI system in pharmacovigilance

Pharmacovigilance areas Metrics
Safety signal management Percentage of safety signals resulting in label updates or regulatory action
Time from signal detection to report generation for regulatory submission
Rate of false-positive signals identified
Number of new safety signals identified per period (e.g., month, quarter, year, etc.)
Time from signal detection to implementation of risk minimization measures
Processing and evaluation of ICSRs Number of duplicate or misclassified adverse events corrected before impacting their analysis
Rate of data entry errors identified during quality control
Percentage of cases processed within regulatory timelines
Percentage of ICSRs with missing core fields (e.g., suspect drug, indication)
Average number of follow-up attempts per case
Benefit-risk analysis Clinical outcome improvement post-benefit-risk mitigation (e.g., reduced serious adverse event rate)
Average time from signal detection to completed benefit-risk evaluation
Percentage of benefit-risk assessments leading to product label changes or regulatory action
Number of benefit-risk assessments conducted per reporting period
Number of risk management plan updates triggered by benefit-risk re-evaluation
Aggregate report analysis and documentation Percentage of PSURs/PBRERs submitted on time
Average turnaround time for aggregate report generation and internal QA review
Percentage of aggregate reports requiring post-submission corrections or amendments
Number of regulatory queries received per aggregate report
Number of audit or inspection findings related to aggregate report documentation or archiving

AI, artificial intelligence; ICSR, individual case safety report; LLM, large language model; PBRER, periodic benefit-risk evaluation report; PSUR, periodic safety update report; QA, quality assurance.

Challenges of using LLM agentic AI system in PV

Lack of intuitive understanding

The agentic AI system, while capable of ingesting, processing, and reporting from vast, diverse PV data, might lack the true understanding, real-world constraints, and intuition related to PV tasks that professionals develop through real-world experience (67,68). The agentic system might misinterpret nuanced safety signals, patient context, or ethical concerns. These systems use simulated intelligence obtained by predicting patterns in the given context through constant intercommunication among the LLM, tools, and other agents, but realistic, detailed, and comprehensive testing in a PV setup is needed to understand and capture real-world constraints and complexities. The performance of the agentic systems (both the individual agents and the system as a whole) needs to be assessed periodically, in dynamic environments with multiple stakeholders involved (69,70).

Consensus, coordination and contextualization

Agentic AI systems may face challenges in PV tasks due to misaligned objectives, unshared context leading to inconsistent interpretations, and inter-agent coordination issues (69). Misaligned objectives can result from providing vague or incomplete goals, tasks, or sub-tasks to agents (68,71). Lack of clarity on agents’ roles within the system and ineffective communication and coordination among agents can lead to skewed outputs. Failure to maintain shared contextual information or conversation history among agents, contradictory reasoning among agents, and lack of a robust verification protocol can produce incomplete or incorrect results (67,71). These challenges lead to conflicting or missed crucial safety signals, inconsistent ICSR evaluations, skewed benefit-risk analyses, and potentially contradictory aggregate reports. Addressing these issues requires robust agent alignment, improved communication protocols, and context-aware frameworks for reliable decision-making among agents (68,72).

Data quality and privacy

PV is primarily driven by the collection, management, and analysis of data with inherent variation in quality and volume. Noisy data misleads the agents, causing inaccurate insights and biased outcomes in their decision-making. The challenge remains in obtaining high-quality data without exposing protected health information (PHI), personal information (PI), and personally identifiable information (PII), which may lead to violations of data privacy (73). A carefully chosen training strategy should be implemented, since complete de-identification may remove relevant context during training. In such cases, human experts can improve training by providing curated, de-identified data and expert-labeled case narratives, and by incorporating feedback loops into the agentic learning process. Applicable techniques include transfer learning, knowledge distillation, and parameter-efficient fine-tuning (PEFT). During the training process, obtaining informed consent from patients is crucial, in accordance with established ethical guidelines and with stringent data privacy and LLM usage policies in place.

Interoperability

Interoperability between PV systems, their reporting databases, and the agentic AI system offers the necessary real-time integration and information exchange, creating a collaborative framework and collective approach for PV activities. Recent advances show that agent-to-agent (A2A) and Model Context Protocol (MCP) protocols can transform PV interoperability by enabling standardized, secure, real-time communication between AI agents, tools, and various data sources. For safety signal management and ICSR processing, these protocols ensure agents share consistent context and coordinate workflows, reducing errors and delays. For benefit-risk analysis and aggregate reporting, they facilitate seamless integration and orchestration across legacy and modern systems, supporting auditability and compliance. By acting as universal connectors, A2A and MCP break down silos, streamline multi-agent collaboration, and future-proof PV operations for more accurate, efficient, and regulatory-compliant drug safety monitoring (74-76).

The “Hallucination” problem

LLM-based agentic AI systems may generate unreliable outcomes, such as misleading safety and benefit-risk assessments or incorrect AE reports, due to hallucination of non-factual information, reasoning errors, systematic biases, and outright failures. The reliability of LLM agents can be analyzed along three dimensions: Problems (the root causes of reliability issues), Inspection (dealing with the agent's potentially unreliable output), and Improvement (making the agent's responses more reliable) (46,77). PV experts should be actively involved at critical decision points that require oversight, such as confirming safety signals, approving case narratives, and finalizing benefit-risk assessments. Technically, hallucination can be reduced by having LLMs reason from a global perspective rather than relying solely on previously generated tokens; techniques include tree of thought (ToT) (78) and cumulative reasoning (79). Reinforcement learning can be employed to optimize model behavior. The self-reflection technique can improve output reliability by enabling multiple LLM agents to engage in mutual discussion and verification (80). Additionally, embedding safety instructions in structured prompts, for example citing regulatory anchors such as ICH E2B(R3), 21 CFR Part 11, and 21 CFR 314.80 and directing the agent to infer only from trusted sources before proceeding with its tasks, helps reduce hallucination (81).
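A structured prompt with embedded safety instructions, as described above, might be assembled as follows. The preamble wording and section layout are illustrative assumptions, not a validated template:

```python
# Illustrative safety preamble grounding the agent in regulatory sources
# and forbidding unsupported inference; wording is an assumption, not a
# validated or regulator-endorsed instruction set.
SAFETY_PREAMBLE = (
    "Ground every statement in the provided source documents. "
    "Follow ICH E2B(R3) field definitions and 21 CFR 314.80 reporting rules. "
    "If required information is missing, answer 'insufficient information' "
    "instead of inferring it."
)

def build_case_prompt(task: str, source_text: str) -> str:
    """Assemble a structured prompt: safety instructions, task, source."""
    return "\n\n".join([
        f"SYSTEM INSTRUCTIONS:\n{SAFETY_PREAMBLE}",
        f"TASK:\n{task}",
        f"SOURCE:\n{source_text}",
    ])

prompt = build_case_prompt(
    "Draft the adverse event narrative for this case.",
    "Spontaneous report: 54-year-old patient, rash after second dose.",
)
```

Placing the refusal rule ("insufficient information") ahead of the task is the key design choice: it gives the model an explicitly sanctioned alternative to fabricating missing case details.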

Responsible AI

Utilization of agentic AI systems in PV workflows presents ethical challenges, including lack of clarity in coordinated decision-making, distributed accountability for task completion, bias and unfairness in training data, and emergent behaviors that produce misleading results, all of which undermine trust and reliability (27). During ICSR processing or aggregate reporting, misclassification errors or mistakes in compiled safety summaries may be difficult to trace to a specific agent's failure, because the outcome is a collaborative effort of multiple agents. In signal management, prioritization without transparent reasoning reduces confidence in agent-driven decisions. Ignoring underrepresented populations or therapeutic areas during training yields skewed benefit-risk analyses, which can harm patient health outcomes (82). Emergent behaviors arising from probabilistic interactions between the LLM, agents, data, and environment may produce misleading results, potentially leading to unpredictable and harmful outcomes (27,83).

Regulations

To implement an LLM-based agentic AI system in a regulated PV environment, the agentic system and its individual agents must comply with the applicable regulations and guidances, some of which are identified in Table 6. Data ingestion and processing agents must comply with FDA 21 CFR Part 11 and EMA Good Pharmacovigilance Practices (GVP) for audit trails and data traceability. Narrative generation, signal detection, and risk assessment agents must ensure accuracy, traceability, and human oversight per ICH and FDA Risk Evaluation and Mitigation Strategy (REMS) standards. Regulatory submission agents must follow validated workflows under ICH E2B(R3) and eCTD guidelines. Continuous model monitoring, validation, and change control must meet FDA Good Machine Learning Practice (GMLP), EMA AI expectations, and ICH Q9 to ensure trust, compliance, and audit readiness.

Table 6

Some of the key regulations and guidelines related to LLM-based agentic AI

Region Regulation/guideline
USA FDA: 21 CFR Part 314.80/600.80
FDA: 21 CFR Part 11
FDA: FAERS
FDA: REM(S) Program and Guidance on Safety Reporting Requirements for INDs and BA/BE Studies
HIPAA
Europe EMA: Regulation (EU) No. 1235/2010
EMA: Directive 2010/84/EU
EMA: Good Pharmacovigilance Practice (GVP) Modules I–XVI
EMA: EudraVigilance
Regulatory documentation (PSURs, RMPs, PASS)
GDPR
International ICH E2B

AI, artificial intelligence; BA/BE, bioavailability/bioequivalence; CFR, Code of Federal Regulations; EMA, European Medicines Agency; FAERS, FDA Adverse Event Reporting System; FDA, Food and Drug Administration; GDPR, General Data Protection Regulation; HIPAA, Health Insurance Portability and Accountability Act; ICH, International Council for Harmonisation; INDs, investigational new drugs; LLM, large language model; PASS, post-authorization safety studies; PSURs, periodic safety update reports; REM(S), risk evaluation and mitigation strategy/strategies; RMPs, risk management plans.
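The audit-trail and traceability expectations above (e.g., FDA 21 CFR Part 11, EMA GVP) can be illustrated with a minimal hash-chained, append-only log of agent actions. This is a sketch of the tamper-evidence idea only; a compliant system would also need validated storage, electronic signatures, and retention controls:

```python
import hashlib
import json
from datetime import datetime, timezone

class AuditTrail:
    """Append-only, hash-chained log of agent actions (illustrative)."""

    def __init__(self):
        self.entries = []
        self._last_hash = "0" * 64  # genesis value

    def record(self, agent: str, action: str, detail: str) -> dict:
        """Append an entry whose hash covers its content and predecessor."""
        entry = {
            "agent": agent,
            "action": action,
            "detail": detail,
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "prev_hash": self._last_hash,
        }
        entry["hash"] = hashlib.sha256(
            json.dumps(entry, sort_keys=True).encode()
        ).hexdigest()
        self._last_hash = entry["hash"]
        self.entries.append(entry)
        return entry

    def verify(self) -> bool:
        """Re-derive every hash to confirm the chain is unbroken."""
        prev = "0" * 64
        for e in self.entries:
            body = {k: v for k, v in e.items() if k != "hash"}
            expected = hashlib.sha256(
                json.dumps(body, sort_keys=True).encode()
            ).hexdigest()
            if e["prev_hash"] != prev or e["hash"] != expected:
                return False
            prev = e["hash"]
        return True

# Hypothetical agent actions on a single case.
trail = AuditTrail()
trail.record("ingestion_agent", "case_received", "ICSR-0001")
trail.record("narrative_agent", "draft_generated", "ICSR-0001")
```

Because each entry's hash covers the previous one, editing any recorded action after the fact invalidates the whole chain, which is the property auditors look for in electronic records.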

Evaluation

LLM-based multi-agent systems' decision-making processes can be complex and opaque. Each agent is assigned individual tasks, which must be evaluated to ensure efficacy in PV. Evaluating AI agents requires analyzing their theoretical capabilities and assessing their practical implications (84). Evaluation aspects for LLM-based agents in PV should include planning effectiveness in achieving the goal; task diversity (coverage of the various tasks assigned to agents); multi-round interaction (gathering information and completing tasks successfully); partially observable environments (the agent's ability to understand and act on its environment); fine-grained progress metrics (partial task completion reflecting goal attainment at various stages); and analytical evaluation such as grounding accuracy and long-range agent interactions. Additionally, a human-feedback loop, incorporating expert opinions and users' critical evaluation of results at appropriate steps across tasks, is required to bring critical thinking and ethical insight to the process (85-88). Currently, there is no tailored evaluation framework for LLM-based agentic AI systems in PV. However, drawing on existing evaluation frameworks (89), a tailored framework should cover accuracy, transparency, regulatory compliance, and collaboration through component testing, PV domain-specific benchmarks, and human-in-the-loop validation for safety-critical decisions.
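As a toy illustration of a fine-grained progress metric, an agent run can be scored by the fraction of required sub-goals it completed rather than pass/fail. The milestone names below are hypothetical PV workflow steps:

```python
def progress_score(required: list[str], completed: set[str]) -> float:
    """Fraction of required sub-goals the agent run completed."""
    hit = sum(1 for step in required if step in completed)
    return hit / len(required)

# Hypothetical ICSR-processing milestones; a real benchmark would define
# these from the validated workflow specification.
ICSR_MILESTONES = [
    "case_intake", "duplicate_check", "medical_coding",
    "narrative_draft", "qc_review", "submission_ready",
]

score = progress_score(
    ICSR_MILESTONES,
    {"case_intake", "duplicate_check", "medical_coding"},
)
print(score)  # 0.5
```

A binary metric would score this run 0, hiding the fact that the agent reliably completes intake and coding but stalls at narrative drafting, which is exactly the diagnostic signal fine-grained metrics are meant to surface.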


Summary

Strengths and limitations of this paper

The strengths of this paper lie in its well-rounded exploration of how LLM-based agentic AI systems suit PV workflows and where they can be leveraged effectively to improve PV efficiency. It offers a rationale for adopting a multi-agent AI framework over standalone LLMs, emphasizing the need for modularity, task specialization, and contextual orchestration. By situating LLM agents within the broader PV ecosystem, the paper highlights their potential to streamline processes, enhance decision support, and improve overall efficiency. It adds practical value by identifying suitable agents for the PV workflows, outlining their high-level roles, and discussing both the opportunities and the limitations of deploying such systems in real-world PV settings.

While the paper provides a rationale for applying LLM-based agentic AI systems in PV, it has several limitations. It does not delve into the technical software architecture or implementation specifics that would guide system developers in operationalizing such agentic AI systems. It also refrains from in-depth analyses of individual PV workflows, limiting its applicability for process-level integration. Practical considerations such as data integration, infrastructure requirements, and model deployment strategies are only briefly touched upon. The paper also omits the commercial and business dimensions of agentic AI implementation in PV operations, which are critical for evaluating scalability and wider adoption. Finally, while regulatory perspectives are acknowledged, the paper offers only a high-level view, lacking a detailed examination of compliance pathways and risk mitigation strategies within regulated environments.

Future research directions

In reviewing the current landscape of agentic AI systems for PV, several key areas for future research emerge. Studies on optimizing communication protocols among agents, with human feedback on critical inferences (e.g., ADEs and safety signals), could significantly improve performance and outcomes. Research on automatically identifying and integrating specialized scientific tools relevant to PV tasks, using mechanisms such as MCP servers, offers the potential to improve both turnaround times and accuracy. Comprehensive benchmarks for evaluating the full capabilities of agentic AI systems across domains are needed, as existing assessments focus narrowly on individual agent understanding and reasoning. Studies on agentic AI systems that continuously learn from new data and modalities (image, video, audio) while managing the resulting complexity are also important, as is work on mitigating hallucinations in multi-agent settings, where misinformation or an incorrect decision from one agent can cascade through interconnected agent networks. Finally, explainable AI (XAI) techniques and computerized system validation strategies are required for these agentic AI systems to be compliant with their intended use, ensuring the system is technically qualified, clinically useful, and compliant with the appropriate regulations.

Conclusions

In conclusion, this narrative review has explored the current landscape and advancements of AI in PV. In particular, it focused on the benefits and challenges of using LLMs, identifying their potential and shortcomings in PV. The review advocates an agentic AI approach, analyzing at a high level the agents required for PV processes such as safety signal management, ICSR processing and evaluation, benefit-risk analysis, and aggregate report analysis. While significant potential and benefits are identified for LLM-based agentic AI systems, several challenges remain open, since agentic AI is still at a nascent stage; addressing them will require targeted research on improving communication mechanisms, human-AI agent collaboration, scientific tool interface protocols, dynamic data access and management, XAI, and hallucination reduction. Overall, continued innovation in this field will be crucial to unlocking the full potential of LLM-based agentic AI systems in boosting process efficiency in PV.


Acknowledgments

None.


Footnote

Reporting Checklist: The author has completed the Narrative Review reporting checklist. Available at https://jmai.amegroups.com/article/view/10.21037/jmai-24-350/rc

Peer Review File: Available at https://jmai.amegroups.com/article/view/10.21037/jmai-24-350/prf

Funding: None.

Conflicts of Interest: The author has completed the ICMJE uniform disclosure form (available at https://jmai.amegroups.com/article/view/10.21037/jmai-24-350/coif). S.M.V. is an employee at Healthtech Company. The author has no other conflicts of interest to declare.

Ethical Statement: The author is accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved.

Open Access Statement: This is an Open Access article distributed in accordance with the Creative Commons Attribution-NonCommercial-NoDerivs 4.0 International License (CC BY-NC-ND 4.0), which permits the non-commercial replication and distribution of the article with the strict proviso that no changes or edits are made and the original work is properly cited (including links to both the formal publication through the relevant DOI and the license). See: https://creativecommons.org/licenses/by-nc-nd/4.0/.


References

  1. Kumar RKS, Velusamy S. Harnessing Artificial Intelligence for Enhanced Pharmacovigilance: A Comprehensive Review. Indian Journal of Pharmacy Practice 2025;18:171-9.
  2. Syrowatka A, Song W, Amato MG, et al. Key use cases for artificial intelligence to reduce the frequency of adverse drug events: a scoping review. Lancet Digit Health 2022;4:e137-48. [Crossref] [PubMed]
  3. Salas M, Petracek J, Yalamanchili P, et al. The Use of Artificial Intelligence in Pharmacovigilance: A Systematic Review of the Literature. Pharmaceut Med 2022;36:295-306. [Crossref] [PubMed]
  4. Kompa B, Hakim JB, Palepu A, et al. Artificial Intelligence Based on Machine Learning in Pharmacovigilance: A Scoping Review. Drug Saf 2022;45:477-91. [Crossref] [PubMed]
  5. Yang S, Kar S. Application of artificial intelligence and machine learning in early detection of adverse drug reactions (ADRs) and drug-induced toxicity. Artif Intell Chem 2023;1:100011.
  6. Kassekert R, Grabowski N, Lorenz D, et al. Industry Perspective on Artificial Intelligence/Machine Learning in Pharmacovigilance. Drug Saf 2022;45:439-48. [Crossref] [PubMed]
  7. Bate A, Stegmann JU. Artificial intelligence and pharmacovigilance: What is happening, what could happen and what should happen? Health Policy Technol 2023;12:100743.
  8. Rosello J. Deep Learning Techniques for Adverse Event Detection: Advanced Methods and Applications. Pharmacovigilance Analytics 2023. Available online: https://www.pharmacovigilanceanalytics.com/methods/predictive-analytics/deep-learning-techniques-for-adverse-event-detection-advanced-methods-and-applications/
  9. Movva R, Balachandar S, Peng K, et al. Topics, Authors, and Institutions in Large Language Model Research: Trends from 17K arXiv Papers. arXiv 2024. arXiv:2307.10700v4.
  10. Raiaan MAK, Mukta MSH, Fatema K, et al. A Review on Large Language Models: Architectures, Applications, Taxonomies, Open Issues and Challenges. IEEE Access 2024;12:26839-74.
  11. Matheny ME, Yang J, Smith JC, et al. Enhancing Postmarketing Surveillance of Medical Products With Large Language Models. JAMA Netw Open 2024;7:e2428276. [Crossref] [PubMed]
  12. Ong JCL, Chen MH, Ng N, et al. A scoping review on generative AI and large language models in mitigating medication related harm. NPJ Digit Med 2025;8:182. [Crossref] [PubMed]
  13. Li Y, Li J, He J, et al. AE-GPT: Using Large Language Models to extract adverse events from surveillance reports-A use case with influenza vaccine adverse events. PLoS One 2024;19:e0300919. [Crossref] [PubMed]
  14. Landman R, Healey SP, Loprinzo V, et al. Using large language models for safety-related table summarization in clinical study reports. JAMIA Open 2024;7:ooae043. [Crossref] [PubMed]
  15. Pateriya K. Transforming Drug & Device Safety and Pharmacovigilance with Generative AI and Large Language Models. CLOUDBYZ; 2023. Available online: https://www.cloudbyz.com/resources/pharmacovigilance/transforming-drug-device-safety-and-pharmacovigilance-with-generative-ai-and-large-language-models/
  16. Yan C, Grabowska ME, Dickson AL, et al. Leveraging generative AI to prioritize drug repurposing candidates for Alzheimer's disease with real-world clinical validation. NPJ Digit Med 2024;7:46. [Crossref] [PubMed]
  17. Praveen J, Kumar CMK, Channappa AH. Transforming Pharmacovigilance Using Gen AI: Innovations in Aggregate Reporting, Signal Detection, and Safety Surveillance. J Multidiscip Res 2023;3:9-16.
  18. Makridakis S, Petropoulos F, Kang Y. Large Language Models: Their Success and Impact. Forecasting 2023;5:536-49.
  19. Crisafulli S, Ciccimarra F, Bellitto C, et al. Artificial intelligence for optimizing benefits and minimizing risks of pharmacological therapies: challenges and opportunities. Front Drug Saf Regul 2024;4:1356405. [Crossref] [PubMed]
  20. Kawazoe Y, Shimamoto K, Seki T, et al. Post-marketing surveillance of anticancer drugs using natural language processing of electronic medical records. NPJ Digit Med 2024;7:315. [Crossref] [PubMed]
  21. Li D, Wu L, Zhang M, et al. Assessing the performance of large language models in literature screening for pharmacovigilance: a comparative study. Front Drug Saf Regul 2024;4:1379260. [Crossref] [PubMed]
  22. Sun Z, Pergola G, Wallace BC, et al. Leveraging ChatGPT in Pharmacovigilance Event Extraction: An Empirical Study. arXiv 2024. arXiv:2402.15663v1.
  23. Von Csefalvay C. DAEDRA: A language model for predicting outcomes in passive pharmacovigilance reporting. arXiv 2024. arXiv:2402.10951v1.
  24. Shi Y, Ren P, Wang J, et al. Leveraging GPT-4 for food effect summarization to enhance product-specific guidance development via iterative prompting. J Biomed Inform 2023;148:104533. [Crossref] [PubMed]
  25. Benaïche A, Billaut-Laden I, Randriamihaja H, et al. Assessment of the Efficiency of a ChatGPT-Based Tool, MyGenAssist, in an Industry Pharmacovigilance Department for Case Documentation: Cross-Over Study. J Med Internet Res 2025;27:e65651. [Crossref] [PubMed]
  26. Carpenter KA, Altman RB. Using GPT-3 to Build a Lexicon of Drugs of Abuse Synonyms for Social Media Pharmacovigilance. Biomolecules 2023;13:387. [Crossref] [PubMed]
  27. Chang CH, Chang FY, Hwang SY, et al. Prompting for Few-shot Adverse Drug Reaction Recognition from Online Reviews. 2023 IEEE 11th International Conference on Healthcare Informatics (ICHI), Houston, TX, USA; 2023:168-75.
  28. Sun J, Luo Y, Gong Y, et al. Enhancing chain-of-thoughts prompting with iterative bootstrapping in large language models. arXiv 2023. arXiv:2304.11657v3.
  29. Singh I, Blukis V, Mousavian A, et al. ProgPrompt: Generating Situated Robot Task Plans using Large Language Models. arXiv 2022. arXiv:2209.11302v1.
  30. Huang X, Liu W, Chen X, et al. Understanding the planning of LLM agents: A survey. arXiv 2024. arXiv:2402.02716v1.
  31. Wang X, Wei J, Schuurmans D, et al. Self-consistency improves chain of thought reasoning in language models. arXiv 2022. arXiv:2203.11171v4.
  32. Choi J, Palumbo N, Chalasani P, et al. MALADE: Orchestration of LLM-powered Agents with Retrieval Augmented Generation for Pharmacovigilance. arXiv 2024. arXiv:2408.01869.
  33. Greyling C. AppAgent v2 With Advanced Agent for Flexible Mobile Interactions. Medium; 2024. Available online: https://cobusgreyling.medium.com/appagent-v2-with-advanced-agent-for-flexible-mobile-interactions-e58e9c6abc6c
  34. Poole DL, Mackworth AK. Artificial Intelligence: Foundations of Computational Agents. 3rd Edition. Cambridge University Press; 2023.
  35. Sinha P. Understanding LLM-Based Agents and their Multi-Agent Architecture, Medium 2024. Available online: https://medium.com/@pallavisinha12/understanding-llm-based-agents-and-their-multi-agent-architecture-299cf54ebae4
  36. Ruan J, Chen Y, Zhang B, et al. TPTU: Large Language Model-based AI Agents for Task Planning and Tool Usage. arXiv 2023. arXiv:2308.03427v3.
  37. Wang L, Xu W, Lan Y, et al. Plan-and-Solve Prompting: Improving Zero-Shot Chain-of-Thought Reasoning by Large Language Models. arXiv 2023. arXiv:2305.04091v3.
  38. Wang L, Ma C, Feng X, et al. A Survey on Large Language Model based Autonomous Agents. arXiv 2025. arXiv:2308.11432v7.
  39. Schick T, Dwivedi-Yu J, Dessì R, et al. Toolformer: Language Models Can Teach Themselves to Use Tools. arXiv 2023. arXiv:2302.04761v1.
  40. Zhang J, Xu X, Zhang N, et al. Exploring Collaboration Mechanisms for LLM Agents: A Social Psychology View. arXiv 2024. arXiv:2310.02124v3.
  41. Weng L. LLM Powered Autonomous Agents. Available online: https://lilianweng.github.io/posts/2023-06-23-agent/
  42. Shen Y, Song K, Tan X, et al. HuggingGPT: Solving AI tasks with chatGPT and its friends in hugging face. arXiv 2023. arXiv:2303.17580v4.
  43. Gao S, Fang A, Huang Y, et al. Empowering biomedical discovery with AI agents. Cell 2024;187:6125-51. [Crossref] [PubMed]
  44. Madaan A, Tandon N, Gupta P, et al. Self-Refine: Iterative Refinement with Self-Feedback. arXiv 2023. arXiv:2303.17651v2.
  45. Wang H, Ding YJ, Luo Y. Future of ChatGPT in Pharmacovigilance. Drug Saf 2023;46:711-3. [Crossref] [PubMed]
  46. Li Y, Wen H, Wang W, et al. Personal LLM agents: insights and survey about the capability, efficiency and security. arXiv 2024. arXiv:2401.05459v2.
  47. Jin H, Huang L, Cai H, et al. From LLMs to LLM-based Agents for Software Engineering. arXiv 2024. arXiv:2408.02479v2.
  48. Chalamasetti VR. Revolutionizing Pharmacovigilance with LLM-Based AI Agents: Addressing Challenges and Enhancing Efficiency, LinkedIn; 2024. Available online: https://www.linkedin.com/pulse/revolutionizing-pharmacovigilance-llm-based-ai-agents-chalamalasetti-kgihe
  49. Zhang S, Zhang J, Liu J, et al. Offline Training of Language Model Agents with Functions as Learnable Weights. arXiv 2024. arXiv:2402.11359v4.
  50. He J, Treude C, Lo D. LLM-Based Multi-Agent Systems for Software Engineering: Literature Review, Vision and the Road Ahead. arXiv 2024. arXiv:2404.04834v4.
  51. Wei J, Wang X, Schuurmans D, et al. Chain-of-thought prompting elicits reasoning in large language models. arXiv 2023. arXiv:2201.11903v6.
  52. Uppsala Monitoring Centre. What is a Signal? 2025. Available online: https://edit.who-umc.org/signal-management/what-is-a-signal/
  53. Jakobsson J. What is a Safety Signal? BIOMAPAS; 2025. Available online: https://www.biomapas.com/what-is-a-safety-signal/
  54. Pharmacovigilance Analytics. Signal Detection and Management in Pharmacovigilance. 2025. Available online: https://www.pharmacovigilanceanalytics.com/signal-detection-management/
  55. EMA. Benefit-Risk Methodology Project. 2009.
  56. Bala. Evaluation of Benefit-Risk Balance in Pharmacovigilance. Blog article; 2024. Available online: https://blog.drugvigil.com/benefit-risk-balance-in-pharmacovigilance/
  57. Nandhini B, Venkatesh MP, Balamuralidhara V, et al. Aggregate Reporting and Regulatory Requirements. J Pharm Sci Res 2019;11:2504-14.
  58. DDReg Pharma. What is Aggregate Reporting? Available online: https://www.ddregpharma.com/what-is-aggregate-reporting
  59. Markey J, Traverso K. The Untapped Potential of AI & Automation in Pharmacovigilance. 2020. Available online: https://ispe.org/pharmaceutical-engineering/july-august-2020/untapped-potential-ai-automation-pharmacovigilance
  60. Singh R, Nandave M, Kumar A, et al. Role of Artificial Intelligence in Pharmacovigilance. In: Nandave, M, Kumar A, editors. Pharmacovigilance Essentials: Advances, Challenges and Global Perspectives. Singapore: Springer; 2024.
  61. Duan Z, Wang J. Exploration of LLM Multi-Agent Application Implementation Based on LangGraph+CrewAI. arXiv 2024. arXiv:2411.18241v1.
  62. Zhuang Y, Jiang W, Zhang J, et al. Learning to Be A Doctor: Searching for Effective Medical Agent Architectures. arXiv 2025. arXiv:2504.11301v1.
  63. Venkadesh PV, Divya SV, Kumar KS. Unlocking AI Creativity: A Multi-Agent Approach with CrewAI. J Trends Comput Sci Smart Technol 2024;6:338-56.
  64. Chen X, Yi H, You M, et al. Enhancing diagnostic capability with multi-agents conversational large language models. NPJ Digit Med 2025;8:159. [Crossref] [PubMed]
  65. Feng X, Chen ZY, Qin Y, et al. Large Language Model-based Human-Agent Collaboration for Complex Task Solving. arXiv 2024. arXiv:2402.12914v1.
  66. Lee S, Kim S, Lee J, et al. Explainable Artificial Intelligence for Patient Safety: A Review of Application in Pharmacovigilance. IEEE Access 2023;11:50830-40.
  67. Han S, Zhang Q, Yao Y, et al. LLM Multi-Agent Systems: Challenges and Open Problems. arXiv 2024. arXiv:2402.03578v1.
  68. Guo T, Chen X, Wang Y, et al. Large Language Model based Multi-Agents: A Survey of Progress and Challenge. arXiv 2024. arXiv:2402.01680v2.
  69. Han S, Zhang Q, Yao Y, et al. LLM Multi-Agent Systems: Challenges and Open Problems. arXiv 2025. arXiv:2402.03578v2.
  70. Mehandru N, Miao BY, Almaraz ER, et al. Large Language Models as Agents in the Clinic. arXiv 2023. arXiv:2309.10895v1.
  71. Prasad A, Koller A, Hartmann M, et al. ADAPT - Dynamic Decomposition and Planning for LLMs in Complex Decision-Making. arXiv 2023. arXiv:2311.05772v2.
  72. Praveen J. Empowering Pharmacovigilance: Unleashing the Potential of Generative AI in Drug Safety Monitoring. J Innov Appl Pharm Sci 2023;8:24-32.
  73. He F, Zhu T, Ye D, et al. The Emerged Security and Privacy of LLM Agent: A Survey with Case Studies. arXiv 2024. arXiv:2407.19354v1.
  74. Tran KT, Dao D, Nguyen MD, et al. Multi-Agent Collaboration Mechanisms: A Survey of LLMs. arXiv 2025. arXiv:2501.06322v1.
  75. Anthropic. Introducing the Model Context Protocol. 2024 [cited 2024 Nov 25]. Available online: https://www.anthropic.com/news/model-context-protocol
  76. Surapaneni R, Jha M, Vakoc M, et al. Announcing the Agent2Agent Protocol (A2A). Google Developers Blog; 2025 [cited 2025 Apr 9]. Available online: https://developers.googleblog.com/en/a2a-a-new-era-of-agent-interoperability/
  77. Zheng H, Zhang Y, Wang L, et al. Enhancing LLM Reliability via Explicit Knowledge Boundary Modeling. arXiv 2025. arXiv:2503.02233v2.
  78. Yao S, Yu D, Zhao J, et al. Tree of Thoughts: Deliberate Problem-Solving with Large Language Models. arXiv 2023. arXiv:2305.10601v2.
  79. Zhang Y, Yang J, Yuan Y, et al. Cumulative Reasoning with Large Language Models. arXiv 2023. arXiv:2308.04371v6.
  80. Du Y, Li S, Torralba A, et al. Improving factuality and reasoning in language models through multiagent debate. arXiv 2023. arXiv:2305.14325v1.
  81. Tonmoy STI, Zaman SMM, Jain V, et al. A comprehensive survey of hallucination mitigation techniques in Large Language Models. arXiv 2024. arXiv:2401.01313v3.
  82. Hua W, Yang X, Jin M, et al. TrustAgent: Towards Safe and Trustworthy LLM-based Agents. arXiv 2024. arXiv:2402.01586v4.
  83. Hu J, Dong Y, Ao S, et al. Position: Towards a Responsible LLM-empowered Multi-Agent Systems. arXiv 2025. arXiv:2502.01714v1.
  84. Peng JL, Cheng S, Diau E, et al. Survey of Useful LLM Evaluation. arXiv 2024. arXiv:2406.00936v1.
  85. Ma C, Zhang J, Zhu Z, et al. Agentboard: An Analytical Evaluation Board of Multi-turn LLM agents. arXiv 2024. arXiv:2401.13178v1.
  86. Liu X, Yu H, Zhang H, et al. Agentbench: Evaluating LLMs as agents. arXiv 2023. arXiv:2308.03688v2.
  87. Valmeekam K, Sreedharan S, Marquez M, et al. On the Planning Abilities of Large Language Models : A Critical Investigation. arXiv 2023. arXiv:2302.06706v1.
  88. Dibia V. Multi-Agent LLM Applications. A Review of Current Research, Tools, and Challenges. 2023 [cited 2023 Dec 19]. Available online: https://newsletter.victordibia.com/p/multi-agent-llm-applications-a-review
  89. Yehudai A, Eden L, Li A, et al. Survey on Evaluation of LLM-based Agents. arXiv 2025. arXiv:2503.16416v1.
doi: 10.21037/jmai-24-350
Cite this article as: Venugopal SM. Large language models-powered agentic AI design and implementation in pharmacovigilance—a narrative review. J Med Artif Intell 2026;9:8.
