Non-diagnostic generative artificial intelligence in otolaryngology: a narrative review of evidence, risks, and implementation

Jacob E. Karni; Rebecca Attal; Armin Farzad; Sholem Hack

doi:10.21037/jmai-2026-1-0036

Review Article

Jacob E. Karni¹, Rebecca Attal², Armin Farzad³, Sholem Hack²

¹Washington University in St. Louis, St. Louis, MO, USA; ²City St. George’s University of London, School of Medicine, Program Delivered by University of Nicosia at the Chaim Sheba Medical Center, Ramat Gan, Israel; ³City St. George’s University of London, School of Medicine, London, UK

Contributions: (I) Conception and design: JE Karni, S Hack; (II) Administrative support: S Hack; (III) Provision of study materials or patients: None; (IV) Collection and assembly of data: All authors; (V) Data analysis and interpretation: All authors; (VI) Manuscript writing: All authors; (VII) Final approval of manuscript: All authors.

Correspondence to: Sholem Hack, MBBS. City St. George's University of London, School of Medicine, Program Delivered by University of Nicosia at the Chaim Sheba Medical Center, Derech Sheba 2, Ramat Gan, 52621, Israel. Email: sholemhack1@gmail.com.

Background and Objective: Generative artificial intelligence (GenAI) is increasingly used in otolaryngology, with most work focused on diagnostics. Documentation, communication, and care coordination drive much of clinical workload and medicolegal risk, yet evidence on non-diagnostic use remains limited. This review critically synthesizes the peer-reviewed evidence on non-diagnostic GenAI applications in otolaryngology and evaluates their benefits, failure modes, and implementation considerations across common clinical documentation and communication tasks.

Methods: A structured narrative review was conducted of peer-reviewed literature published between January 2020 and December 2025. Searches were performed in PubMed, Scopus, and Web of Science, limited to English-language publications, using terms related to GenAI, large language models (LLMs), and otolaryngology documentation and communication. Studies focused exclusively on diagnostic interpretation or autonomous clinical decision-making were excluded unless directly relevant to non-diagnostic use. Evidence was synthesized qualitatively using a sociotechnical framework distinguishing epistemic (diagnostic) from operational (documentation and communication) functions. Five predefined domains were analyzed: operative notes, discharge summaries, clinic letters and patient-facing summaries, multidisciplinary tumor board documentation, and consent and risk communication.

Key Content and Findings: Across domains, GenAI demonstrated consistent strengths in linguistic fluency, organization, and readability, particularly for discharge summaries and patient-facing materials. However, recurrent failure modes were identified, including laterality errors, omission of low-frequency but high-consequence details, attenuation of clinical uncertainty, and false coherence in multidisciplinary summaries. High-risk applications—most notably operative notes and consent documentation—showed unacceptable error profiles when unconstrained narrative generation was used. Hybrid clinician-GenAI workflows consistently outperformed GenAI-only outputs in accuracy and acceptability. Efficiency gains were highly task-dependent and frequently offset by increased cognitive burden associated with verification and interpretive editing. Implementation challenges, including limited electronic health record (EHR) integration, governance barriers, and shadow use, were repeatedly identified as determinants of real-world safety and effectiveness.

Conclusions: Non-diagnostic GenAI is most defensible in otolaryngology when deployed selectively, with task-specific risk stratification and explicit human oversight. GenAI performs most reliably as a constrained drafting and organizational aid tethered to verified inputs, rather than as an autonomous narrator. Safe adoption depends less on model sophistication than on disciplined workflow integration, transparent attribution, and governance that preserve clinician accountability and patient trust.

Keywords: Artificial intelligence (AI); clinical documentation; electronic health records; medical ethics; workflow

Received: 11 February 2026; Accepted: 27 March 2026; Published online: 14 May 2026.

Introduction

Background

Artificial intelligence (AI) has rapidly entered otolaryngology, with early adoption dominated by diagnostic applications such as imaging interpretation, endoscopic analysis, and disease classification (1-8). This emphasis reflects the specialty’s data-rich environment and parallels fields like radiology and dermatology, where diagnostic AI matured early (9-11). Consequently, much of the otolaryngology AI literature has centered on diagnostic accuracy and comparisons with clinician performance.

However, diagnostic interpretation represents only a small fraction of routine otolaryngologic practice. Documentation, communication, and care coordination consume substantial clinician time and are major contributors to burnout, medicolegal risk, and care fragmentation (12). Despite their operational centrality, these non-diagnostic tasks have received limited specialty-specific evaluation within the AI literature, particularly in otolaryngology where documentation errors may carry disproportionate clinical and medicolegal consequences.

Recent advances in generative artificial intelligence (GenAI), enabled by large language models (LLMs), have expanded AI capabilities beyond classification to natural language generation from clinical data (13-17). These systems can draft operative notes, discharge summaries, clinic correspondence, and patient-facing materials with high linguistic fluency (4,18). Yet fluency does not equate to safety. Evidence from otolaryngology and broader medical meta-analyses shows that GenAI performs more consistently in communication-oriented tasks than in diagnostic reasoning and cautions against autonomous clinical deployment (3,19). Together, these findings suggest that GenAI’s most defensible near-term role lies in augmenting downstream operational functions rather than replacing clinical judgment.

Rationale and knowledge gap

Existing reviews in otolaryngology have similarly focused on diagnostic performance, image-based AI, or broad overviews of AI applications, with limited attention to non-diagnostic domains such as documentation, communication, and workflow integration (20-22). While these studies provide important insights into model accuracy and clinical decision support, they do not address the distinct risks, failure modes, and implementation challenges associated with GenAI in non-diagnostic tasks (9,20,22,23). This gap highlights the need for a focused, specialty-specific synthesis of non-diagnostic GenAI applications.

Non-diagnostic GenAI occupies a distinct conceptual space in clinical care: it shapes how information is recorded and communicated rather than determining clinical truth (24,25). In otolaryngology, where laterality, anatomic specificity, staging, and procedural nuance are routine, small documentation errors can persist and exert disproportionate downstream effects on billing, care coordination, and medicolegal review (26,27). This risk profile underscores the need for specialty-specific evaluation of GenAI in documentation and communication, distinct from diagnostic AI assessment.

Otolaryngology also presents unique challenges for non-diagnostic GenAI compared to other specialties. Documentation frequently requires precise encoding of laterality, anatomic subsite, and procedural detail, where even minor inaccuracies may affect surgical planning, multidisciplinary coordination, or oncologic decision-making. In addition, non-diagnostic GenAI applications in this specialty remain relatively under-characterized compared to diagnostic use cases, supporting the need for a focused, specialty-specific synthesis.

Objective

In this state-of-the-art review, we synthesize peer-reviewed evidence from 2020–2025 on non-diagnostic GenAI applications in otolaryngology through a sociotechnical, workflow-aware lens. We examine five core domains—operative notes, discharge summaries, clinic letters and patient-facing summaries, multidisciplinary tumor board synthesis, and consent and risk communication—to evaluate benefits, failure modes, and implementation constraints. Our aim is to position GenAI not as a diagnostic surrogate, but as a conditional tool for clinical communication and documentation, where both opportunity and risk are substantial. We present this article in accordance with the Narrative Review reporting checklist (available at https://jmai.amegroups.com/article/view/10.21037/jmai-2026-1-0036/rc).

Methods

This state-of-the-art review employed a structured narrative synthesis, informed by systematic search methods, to evaluate peer-reviewed evidence on non-diagnostic applications of GenAI in otolaryngology, integrating clinical, technical, and implementation-focused literature into a practice-relevant framework.

A targeted search of PubMed, Scopus, and Web of Science identified studies published between January 1, 2020, and December 31, 2025 (final search January 2026). Search terms combined GenAI and LLM concepts with otolaryngology-specific documentation and communication tasks (Table 1). Search strategies were adapted for each database using controlled vocabulary and free-text terms, with searches performed in title, abstract, and keyword fields where applicable. No study design filters were applied. Reference lists of included studies, key review articles, and high-impact publications identified during screening were also examined, and duplicate records were identified and removed using reference management software prior to screening.

Table 1

Search strategy

Items	Specification
Date of search	January 15, 2026
Databases searched	PubMed; Scopus; Web of Science
Search terms used	(“Generative AI” OR “large language model” OR “LLM” OR “GPT” OR “transformer” OR “foundation model” OR “AI scribe” OR “prompt-based generation” OR “text generation” OR “summarization”) AND (“otolaryngology” OR “ENT” OR “head and neck” OR “head and neck surgery” OR “otology” OR “rhinology” OR “laryngology”) AND (“documentation” OR “discharge summary” OR “operative note” OR “clinic letter*” OR “patient communication” OR “after-visit summary” OR “consent” OR “risk communication” OR “tumor board” OR “multidisciplinary team”)
Timeframe	January 1, 2020–December 31, 2025
Inclusion criteria	Peer-reviewed journal articles; English language; non-diagnostic GenAI applications related to documentation or communication in otolaryngology
Exclusion criteria	Diagnostic interpretation; autonomous clinical decision-making; prognostic modeling; non-peer-reviewed literature
Selection process	Titles and abstracts were screened independently by two reviewers, followed by full-text review of eligible articles. Disagreements at both stages were resolved by consensus
Additional considerations	Reference lists of key reviews and high-impact studies were hand-searched

AI, artificial intelligence; ENT, ear, nose, and throat; GenAI, generative artificial intelligence; GPT, generative pre-trained transformer; LLM, large language model.
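
The Boolean structure of the search in Table 1 (three concept groups joined by AND, with synonyms joined by OR inside each group) can be sketched in a few lines. This is a minimal illustration rather than the authors' actual search tooling: the names `or_group` and `build_query` are hypothetical, and database-specific field tags and controlled vocabulary are deliberately left out.

```python
# Hypothetical sketch: rebuild the Table 1 Boolean query from its three concept groups.
GENAI_TERMS = [
    "Generative AI", "large language model", "LLM", "GPT", "transformer",
    "foundation model", "AI scribe", "prompt-based generation",
    "text generation", "summarization",
]
SPECIALTY_TERMS = [
    "otolaryngology", "ENT", "head and neck", "head and neck surgery",
    "otology", "rhinology", "laryngology",
]
TASK_TERMS = [
    "documentation", "discharge summary", "operative note", "clinic letter*",
    "patient communication", "after-visit summary", "consent",
    "risk communication", "tumor board", "multidisciplinary team",
]


def or_group(terms):
    """Quote each term and join the group with OR inside parentheses."""
    return "(" + " OR ".join(f'"{t}"' for t in terms) + ")"


def build_query(*groups):
    """Join the parenthesized OR-groups with AND."""
    return " AND ".join(or_group(g) for g in groups)


query = build_query(GENAI_TERMS, SPECIALTY_TERMS, TASK_TERMS)
print(query)
```

Keeping each concept group as a separate list makes it easier to adapt the same strategy per database, which the Methods describe, without retyping the full string.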

Eligible studies were limited to peer-reviewed journal articles. Preprints, conference abstracts, and non-empirical commentaries were excluded. Studies focused exclusively on diagnostic interpretation, autonomous treatment recommendation, or prognostic modeling were excluded unless directly relevant to non-diagnostic GenAI use.

Non-diagnostic GenAI was defined as systems that generate, summarize, or restructure existing clinical information without independently producing diagnostic or therapeutic decisions. For the purposes of this review, included tasks were those involving documentation, summarization, or communication of clinician-derived information (e.g., operative notes, discharge summaries, clinic letters, tumor board summaries, and consent documentation). Excluded tasks included image interpretation, diagnostic classification, differential diagnosis generation, and autonomous treatment recommendation.

Borderline applications, such as tumor board summaries and consent or risk communication, were included only when GenAI was used to organize or rephrase existing clinician-determined information rather than to generate new clinical recommendations. During study selection, tasks were excluded if the GenAI system materially influenced diagnostic reasoning or treatment decision-making beyond framing or summarization.

Five domains were prespecified based on clinical relevance and risk: operative notes, discharge summaries, clinic letters and patient-facing summaries, multidisciplinary tumor board synthesis, and consent and risk communication.

Given heterogeneity in study designs and outcomes, evidence was synthesized qualitatively. Included studies encompassed both simulation-based evaluations and real-world clinical implementations, which were interpreted in context when assessing reported benefits and risks. Formal quality appraisal or risk-of-bias assessment was not performed, consistent with the narrative review design. This review was not intended to be exhaustive but rather to provide a structured, concept-driven synthesis of representative evidence, with attention to performance, failure modes, human oversight, workflow integration, ethical and equity considerations, and downstream documentation risks, interpreted through a sociotechnical lens. Accordingly, formal systematic review reporting frameworks were not applied.

Conceptual framework: non-diagnostic GenAI as sociotechnical redistribution of clinical work

Non-diagnostic applications of GenAI in otolaryngology are best understood through a sociotechnical lens, in which technology reshapes clinical work without displacing professional responsibility (11,28). Unlike diagnostic AI, which aims to infer clinical truth, non-diagnostic GenAI operates downstream of judgment, shaping how clinical information is documented, structured, and communicated (27). This distinction is critical because documentation errors rarely trigger immediate clinical alarms, yet may persist within the longitudinal medical record and exert downstream effects on billing, care coordination, multidisciplinary decision-making, and medicolegal review (29).

From a human-machine interaction perspective, GenAI alters the form rather than the ownership of clinical labor. Tasks that previously required active authorship, such as composing operative notes or clinic correspondence, are transformed into tasks of verification and correction (30,31). Although clinicians retain full legal and ethical responsibility for the medical record, the cognitive experience of authorship changes when fluent narrative text is presented for review rather than composed de novo (30,31). This redistribution of effort introduces verification demands that are not captured by conventional efficiency metrics focused on drafting time alone (30,31).

This shift mirrors the automation paradox described in other safety-critical domains, including aviation and radiology, where increasing automation reduces routine workload while amplifying the consequences of oversight failure (24,32). As systems become “mostly correct,” human vigilance may decline, increasing susceptibility to automation bias, that is, over-trust in automated outputs despite residual error (24,32). In clinical documentation, this risk manifests as verification fatigue, in which clinicians must identify subtle inaccuracies embedded within long, coherent narratives under time pressure (30,31).

Otolaryngology amplifies these risks because of its documentation demands. Laterality, anatomic subsite specificity, staging language, and device identifiers are routine elements of otolaryngology records, and small discrepancies may carry disproportionate downstream consequences (29,31). A laterality error in an otologic operative note or a subtly altered tumor subsite description in a head and neck cancer summary may appear linguistically trivial, yet materially affect subsequent care, reimbursement, or medicolegal interpretation. Such errors are often linguistic rather than diagnostic in nature, making them difficult to detect retrospectively once embedded in the record (27).

The distinction between epistemic and operational AI functions provides a useful organizing framework. Epistemic AI seeks to determine what is clinically true, whereas operational AI influences how that truth is recorded and transmitted (32,33). Evidence across otolaryngology and general medicine suggests that GenAI performs more reliably in operational tasks involving summarization, rephrasing, and organization than in autonomous reasoning or decision-making (24,31,32). This pattern reflects structural properties of probabilistic language models rather than domain-specific deficiencies.

Importantly, operational reliability is not uniform across tasks. The acceptability of GenAI assistance depends on task criticality, tolerance for imprecision, and ease of verification. Low-risk communication tasks may tolerate minor linguistic inaccuracies, whereas operative documentation and consent narratives require near-perfect fidelity. Responsible deployment therefore requires task-specific risk stratification rather than broad endorsement or rejection of GenAI technologies (33,34).
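
One way to make task-specific risk stratification concrete is a small policy table that maps each documentation task to the level of constraint placed on the model. The sketch below is only an illustration under stated assumptions: the `Mode` and `Policy` types, the task keys, and the tier assignments are hypothetical choices that follow the contrast drawn above (high-fidelity operative and consent documents versus lower-risk communication tasks), not a validated scheme.

```python
from dataclasses import dataclass
from enum import Enum


class Mode(Enum):
    TEMPLATE_ONLY = "template_only"          # model fills verified fields only
    DRAFT_WITH_REVIEW = "draft_with_review"  # model drafts prose, clinician edits


@dataclass(frozen=True)
class Policy:
    mode: Mode
    clinician_signoff: bool = True  # every tier keeps a human in the loop


# Illustrative assignment only; a real deployment would set these locally.
POLICIES = {
    "operative_note": Policy(Mode.TEMPLATE_ONLY),
    "consent": Policy(Mode.TEMPLATE_ONLY),
    "discharge_summary": Policy(Mode.DRAFT_WITH_REVIEW),
    "clinic_letter": Policy(Mode.DRAFT_WITH_REVIEW),
}

MOST_RESTRICTIVE = Policy(Mode.TEMPLATE_ONLY)


def policy_for(task: str) -> Policy:
    """Return the policy for a task; unknown tasks get the most restrictive one."""
    return POLICIES.get(task, MOST_RESTRICTIVE)
```

Defaulting unknown tasks to the most restrictive policy reflects the principle that deployment should be selective rather than a broad endorsement.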

This conceptual framework underpins the sections that follow. Rather than evaluating GenAI tools in isolation, the subsequent domain-specific analyses examine how non-diagnostic GenAI interacts with clinician cognition, workflow constraints, and institutional accountability across core otolaryngology documentation contexts.

Operative notes and procedural documentation

Operative notes represent the highest-risk domain for non-diagnostic GenAI deployment in otolaryngology. These documents encode procedural intent, intraoperative findings, and technical detail that guide postoperative care, future surgical planning, billing, and medicolegal review (4,35). Unlike many other clinical documents, operative notes tolerate little ambiguity; inaccuracies related to laterality, anatomic extent, device use, or staging may not be immediately apparent yet can persist within the medical record and exert downstream consequences (4,35).

Across medical specialties, AI-assisted documentation systems demonstrate the ability to generate structurally complete operative reports with high surface-level coherence (4,31,35). However, systematic evaluations consistently identify recurrent failure modes, including hallucinated procedural steps, omission of low-frequency but high-consequence findings, and conflation of intended versus completed actions (36). These risks are most pronounced when GenAI systems are permitted to generate unconstrained narrative text rather than populate predefined templates anchored to verified intraoperative data.

Otolaryngology-specific evidence mirrors these findings. In a systematic review and meta-analysis assessing GenAI performance across otolaryngologic tasks, documentation-related outputs were rated more favorably than autonomous diagnostic reasoning but remained inferior to hybrid clinician-GenAI workflows, particularly for operative notes (3). GenAI-only outputs demonstrated higher rates of subtle inaccuracies, whereas clinician-edited drafts achieved greater accuracy and acceptability, underscoring the necessity of sustained human oversight (3).

The verification burden associated with operative documentation is substantial (4). While GenAI may reduce the time required to produce an initial draft, clinicians must still cognitively reconstruct the procedure, confirm laterality, verify device identifiers, and ensure consistency between narrative description and intraoperative reality (4,33). Documentation burden research indicates that cognitive load, rather than typing time alone, is a major contributor to clinician fatigue; when verification becomes effortful, net efficiency gains may diminish (37,38). As GenAI outputs become increasingly fluent, the risk of automation bias further complicates review, increasing the likelihood that subtle errors remain undetected under time pressure (33,38).

Subspecialty-specific factors heighten these concerns within otolaryngology. In otology, laterality errors or device mislabeling carry well-recognized medicolegal risk, particularly in bilateral disease or staged procedures. In rhinology, imprecise terminology may affect procedural coding, quality metrics, and longitudinal outcome assessment. In laryngology, where operative notes often rely on descriptive nuance rather than discrete procedural labels, linguistically polished but clinically imprecise text may misrepresent surgical extent or intent (27).

Taken together, current evidence does not support unconstrained GenAI authorship of operative notes in otolaryngology. The most defensible applications involve constrained drafting systems embedded within structured templates, procedural taxonomies, and explicit clinician verification workflows (3,4,35). In this domain, GenAI may reduce mechanical documentation burden, but it cannot substitute for surgeon authorship or accountability. Operative documentation therefore represents a conservative benchmark for non-diagnostic GenAI deployment: where tolerance for error is minimal, automation must remain tightly controlled, transparent, and subordinate to human judgment (35).
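
The idea of constrained drafting anchored to verified data, followed by an explicit check, can be sketched as follows. This is a hypothetical illustration, not a system described in the cited studies: the template fields, the function names `draft_operative_note` and `laterality_conflicts`, and the example values are all assumptions. The laterality check is a plain word match, so it will also flag non-anatomic uses of "right" or "left" and cannot replace surgeon review.

```python
import re
from string import Template

# Hypothetical template: every slot must come from a clinician-verified field.
NOTE_TEMPLATE = Template(
    "Procedure: $procedure\n"
    "Side: $side\n"
    "Device ID: $device_id\n"
    "Findings: $findings"
)
REQUIRED_FIELDS = ("procedure", "side", "device_id", "findings")


def draft_operative_note(verified: dict) -> str:
    """Fill the template from verified fields; refuse to draft if any is missing."""
    missing = [f for f in REQUIRED_FIELDS if not verified.get(f)]
    if missing:
        raise ValueError(f"missing verified fields: {missing}")
    return NOTE_TEMPLATE.substitute({f: verified[f] for f in REQUIRED_FIELDS})


def laterality_conflicts(text: str, verified_side: str) -> list:
    """Return occurrences of the opposite side word found in the text."""
    opposite = {"left": "right", "right": "left"}
    other = opposite.get(verified_side.lower())
    if other is None:  # e.g. "bilateral": this simple sketch has nothing to compare
        return []
    return re.findall(rf"\b{other}\b", text, flags=re.IGNORECASE)


# Example values are invented for illustration only.
verified = {
    "procedure": "tympanoplasty",
    "side": "left",
    "device_id": "none",
    "findings": "Posterior perforation of the left tympanic membrane.",
}
note = draft_operative_note(verified)
flags = laterality_conflicts("The right ear was prepped and draped.", "left")
```

A flagged word is only a prompt for review: the surgeon still confirms laterality, device identifiers, and the match between the narrative and the procedure actually performed.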

Discharge summaries and transitions of care

Discharge summaries represent one of the most promising applications of non-diagnostic GenAI in otolaryngology, offering a favorable balance between moderate clinical risk and substantial potential benefit. These documents serve as the primary mechanism for communicating inpatient care to patients, primary care clinicians, and referring specialists, and deficiencies in discharge documentation are a well-established contributor to adverse events following hospitalization. In contrast to operative notes, discharge summaries prioritize synthesis, clarity, and continuity rather than procedural precision, rendering them more amenable to structured automation (3,33,39).

Across medical specialties, GenAI systems have demonstrated the capacity to generate discharge summaries that are coherent, well organized, and linguistically accessible when supplied with appropriate source data (40-42). Comparative studies consistently report improvements in readability and structural completeness relative to clinician-authored documents, often accompanied by substantial reductions in drafting time (43-45). These gains are particularly salient given longstanding evidence that traditional discharge summaries frequently exceed recommended reading levels and inadequately support patient comprehension (43,45,46).

Improved readability, however, does not guarantee clinical safety. A recurring limitation across the literature is GenAI’s tendency to compress or omit contextual qualifiers, uncertainty, and low-frequency but high-consequence instructions (47,48). In otolaryngology, discharge summaries commonly include airway precautions, bleeding risk counseling, device management instructions, and explicit thresholds for urgent evaluation. Narrative smoothing or over-summarization may attenuate the prominence of these elements, particularly when they are not explicitly encoded in structured input fields (47,49).

The human-in-the-loop burden in discharge documentation differs qualitatively from that of operative notes. Verification focuses less on technical accuracy and more on interpretive emphasis, including clarity of follow-up plans, accuracy of medication reconciliation, and appropriate foregrounding of risk counseling. Studies of AI-assisted discharge workflows consistently demonstrate that clinician review remains essential to prevent omission of safety-critical content, even when drafts are generated rapidly and appear complete (43,46,50).

Otolaryngology-specific care complexity further compounds risk. Many discharges involve coordination with audiology, speech-language pathology, oncology, or home care services, increasing vulnerability to fragmentation when summaries are incomplete (42,46,47). Additionally, postoperative otolaryngology patients may experience delayed complications such as hemorrhage or airway compromise, requiring clear anticipatory guidance that cannot be safely abbreviated. Errors in these contexts may not be immediately apparent but can carry serious downstream consequences (27,46,47).

Despite these limitations, the balance of evidence supports discharge summaries as a moderate-risk, high-yield domain for GenAI deployment when appropriate safeguards are in place. The most defensible implementations emphasize structured data ingestion, explicit prompting for safety-critical instructions, and clinician review that is feasible within real-world workflows (43,46,50). When used as a drafting and organizational aid rather than an autonomous narrator, GenAI can improve clarity and continuity of care without undermining professional accountability (43,50,51).
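
A simple safeguard consistent with these recommendations is an automated pre-review check that flags a draft when expected safety-critical topics are absent. The sketch below is a hypothetical illustration: the topic labels follow the discharge elements named earlier in this section, while the keyword lists and the function name `missing_safety_items` are assumptions. Keyword matching can show only that a topic is missing, not that it is stated correctly or with adequate emphasis, so it supplements clinician review rather than replacing it.

```python
# Hypothetical keyword lists; a real deployment would define these locally.
SAFETY_ITEMS = {
    "airway precautions": ("airway", "breathing"),
    "bleeding risk counseling": ("bleeding", "hemorrhage"),
    "device management": ("device", "tube", "implant"),
    "urgent evaluation thresholds": ("emergency", "urgent"),
}


def missing_safety_items(draft: str, required: list) -> list:
    """Return required topics for which no keyword appears in the draft."""
    text = draft.lower()
    return [
        topic
        for topic in required
        if not any(keyword in text for keyword in SAFETY_ITEMS[topic])
    ]


required = [
    "airway precautions",
    "bleeding risk counseling",
    "urgent evaluation thresholds",
]
complete_draft = (
    "Watch for bleeding from the nose. Go to the emergency department "
    "if you have difficulty breathing."
)
sparse_draft = "Follow up in two weeks."

print(missing_safety_items(complete_draft, required))
print(missing_safety_items(sparse_draft, required))
```

The list of required topics would be chosen per procedure by the clinical team; a non-empty result signals that the draft should be returned for editing before sign-off.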

Clinic letters and patient-facing summaries

Clinic letters and patient-facing visit summaries differ fundamentally from other clinical documents in that their primary purpose is interpretive rather than procedural. These texts translate clinical encounters into narratives intended to be revisited by patients, caregivers, and referring clinicians, often well after the encounter itself (52). In otolaryngology, where many conditions require longitudinal management and nuanced counseling, the clarity, tone, and framing of written communication can materially influence patient understanding, adherence, and trust (52).

GenAI systems demonstrate consistent strengths in drafting clinic letters that are organized, grammatically polished, and easier to read than many clinician-authored counterparts (28,46,53). Studies evaluating AI-generated patient summaries report improvements in readability and patient-perceived clarity, aligning with longstanding evidence that traditional clinical correspondence frequently exceeds recommended reading levels (46,53,54). These findings highlight a meaningful opportunity for GenAI to address persistent communication deficits in outpatient care.

At the same time, clinic letters expose a central limitation of non-diagnostic GenAI: the risk of interpretive distortion (55-57). Unlike operative notes, where accuracy is primarily factual, clinic correspondence encodes clinical judgment, uncertainty, and shared decision-making (58,59). Subtle shifts in phrasing may transform a probabilistic recommendation into a definitive plan or alter the perceived balance of risks and benefits (57,60). For example, language suggesting that “observation is reasonable” may be reframed as “no intervention is needed”, or a discussion of optional voice therapy may be rendered as a recommendation with implied urgency. In conditions such as early laryngeal lesions, functional voice disorders, or chronic tinnitus, where management decisions often hinge on patient preference and tolerance for uncertainty, these shifts can meaningfully influence patient expectations and downstream choices (61-63).

From the clinician’s perspective, efficiency gains in this domain are not guaranteed. Editing AI-generated clinic letters requires not only factual verification but also careful assessment of whether tone, emphasis, and framing accurately reflect what was discussed and intended (31,64,65). This form of review is cognitively demanding and less amenable to automation than proofreading for discrete errors. Without explicit recognition of this interpretive workload, GenAI risks becoming an invisible source of cognitive labor rather than a net efficiency gain (31,55,66).

Patient trust and transparency further shape the acceptability of GenAI-assisted communication. Surveys consistently demonstrate greater patient comfort with AI involvement in administrative or explanatory tasks than in diagnostic decision-making, but acceptance depends on perceived clinician oversight and disclosure (67-69). Undisclosed AI authorship of patient-facing documents may undermine authenticity, even when content quality is high (67-71).

Equity considerations are particularly salient in otolaryngology. Many otolaryngology patients experience speech, voice, or hearing impairments that affect how they engage with written and spoken communication (72-75). GenAI systems trained predominantly on normative language patterns may inadequately reflect these experiences, and speech-based inputs may underperform in patients with dysphonia or nonstandard accents (73-75). Without deliberate attention to inclusivity, GenAI-assisted communication risks reinforcing existing disparities rather than mitigating them (73,75-78).

Overall, clinic letters and patient-facing summaries represent a lower-risk but high-sensitivity application of GenAI. When used as a drafting and restructuring assistant under clear clinician supervision, GenAI can enhance clarity and consistency. However, preserving clinician authorship, transparency, and interpretive fidelity remains essential to maintaining trust and shared decision-making.

Multidisciplinary tumor board (MDT) synthesis

MDTs are central to head and neck cancer care, integrating surgical, medical, radiation, radiologic, and pathologic perspectives into coordinated treatment planning (79,80). Unlike most documentation tasks, MDT deliberations are inherently collective and context-dependent, relying on tacit expertise, institutional practice patterns, and patient-specific considerations that are often incompletely captured in the medical record (79,81). These features make MDT synthesis an appealing but fundamentally constrained application for non-diagnostic GenAI (34,81,82).

Recent oncology-focused studies suggest that GenAI systems can summarize patient histories, extract key staging elements, and restate guideline-concordant options with moderate accuracy in straightforward cases (83,84). Concordance with expert recommendations declines substantially, however, in complex scenarios involving advanced disease, prior multimodality treatment, or competing functional priorities (85). Agreement is highest when cases closely follow established guidelines and lowest when deliberation hinges on nuance, feasibility, or patient preference (81,82).

The principal risk in MDT contexts is not simply incorrect output, but false coherence. GenAI-generated summaries may present treatment plans with confident, unified language that obscures disagreement, uncertainty, or contingency (34,83). In multidisciplinary care, the explicit articulation of dissent and negotiation of trade-offs functions as a critical safeguard. Collapsing this process into a single polished narrative risks misrepresenting both the rationale for decisions and the conditions under which they apply (81).

Documentation accuracy carries additional medicolegal and quality-improvement implications. MDT summaries are often referenced to justify treatment decisions, demonstrate guideline adherence, and support institutional accreditation (79). Oversimplification or omission may distort the historical record of deliberation, particularly when downstream clinicians rely on summaries rather than contemporaneous discussion notes (86). As in operative documentation, linguistic fluency can mask missing context or unresolved uncertainty (34,83,87).

Head and neck cancer further amplifies these concerns. Small discrepancies in staging language, margin interpretation, or nodal classification can meaningfully alter management pathways (80). While GenAI systems may reliably restate existing staging information, they lack access to tacit knowledge regarding surgical feasibility, reconstructive complexity, anticipated functional outcomes, or institutional constraints, factors that frequently shape MDT recommendations but are rarely explicit in source data (81,82).

Accordingly, the most defensible role for GenAI in MDT workflows is supportive rather than decisional. As a preparatory tool, it may assist in assembling structured case summaries and ensuring data completeness. As a post-meeting aid, it may facilitate draft documentation that clinicians then edit to accurately reflect discussion, uncertainty, and consensus. In both contexts, GenAI should remain subordinate to human deliberation and tethered to verified source data.

Consent and risk communication represent a uniquely sensitive domain for non-diagnostic GenAI in otolaryngology, where the stakes extend beyond documentation accuracy to ethical validity and patient autonomy. Unlike other documentation tasks, consent processes require not only accurate information transfer but also appropriate framing of uncertainty, alternatives, and individualized risk in a manner that supports informed decision-making (58-60).

Emerging evidence suggests that GenAI can generate consent forms and risk explanations that are linguistically clear and often more readable than traditional clinician-authored materials. However, these strengths are accompanied by important limitations. GenAI systems may standardize or oversimplify risk information, attenuate uncertainty, or omit rare but high-consequence complications, particularly when such details are not explicitly specified in input prompts. As in other documentation domains, fluent narrative output may mask clinically meaningful omissions or distortions (3,17,88).

In otolaryngology, these risks are amplified by the procedural diversity of common interventions and by their potential morbidity, including airway compromise, bleeding, voice change, and adverse hearing outcomes. Small shifts in phrasing may alter patient perception of risk, transforming probabilistic counseling into deterministic language or inadvertently biasing decision-making. Unlike operative documentation errors, which may be detected retrospectively, errors in consent communication directly affect patient understanding at the point of care (3,4,58,60).

From an ethical and medicolegal perspective, reliance on unconstrained GenAI-generated consent narratives is therefore not currently justifiable. The most defensible use cases involve structured assistance, such as generating draft templates anchored to validated procedure-specific risk libraries, with mandatory clinician review and customization. In this context, GenAI may improve readability and standardization while preserving clinician responsibility for content accuracy, framing, and shared decision-making (29,64).

Accordingly, consent and risk communication should be considered a high-risk application of non-diagnostic GenAI, requiring stricter safeguards, explicit oversight, and careful integration into established consent workflows.

Cross-cutting synthesis across non-diagnostic domains (Table 2)

Table 2

Domain-specific risk profiles and safeguards for non-diagnostic GenAI in otolaryngology

Domain	Clinical risk level	Primary benefits	Dominant failure modes	Recommended GenAI role	Required safeguards
Operative notes	High	Rapid drafting; standardized structure	Laterality errors; hallucinated steps; omission of rare findings	Constrained drafting only	Structured templates; source-linked fields; mandatory surgeon verification
Discharge summaries	Moderate	Improved readability; faster synthesis	Loss of uncertainty; omission of safety-critical instructions	Drafting and organization aid	Structured inputs; explicit safety prompts; clinician review
Clinic letters/patient summaries	Low–moderate	Clearer language; improved patient comprehension	Interpretive distortion; overstatement of certainty	Drafting and rewriting assistant	Clinician editing for tone, intent, and shared decision-making
MDT documentation	High	Case preparation; data completeness	False coherence; masking disagreement	Pre- and post-meeting support only	Tethering to source data; explicit documentation of uncertainty
Consent and risk communication	High	Standardized language; readability support	Incomplete risk disclosure; loss of nuance	Supplemental drafting only	Clinician-authored final text; verification against discussion

This table summarizes key non-diagnostic GenAI use cases in otolaryngology, stratified by clinical risk, dominant failure modes, and recommended deployment configurations. Across domains, GenAI demonstrates greatest reliability when used as a constrained drafting or organizational aid tethered to verified source data. High-risk applications require structured inputs, explicit clinician verification, and clear attribution to preserve documentation integrity and accountability. GenAI, generative artificial intelligence; MDT, multidisciplinary tumor board.

Across the five non-diagnostic domains reviewed, several consistent patterns emerge that clarify both the promise and the structural limits of GenAI in otolaryngology (4,89,90). These patterns suggest that performance is determined less by document type than by the interaction between language generation, clinician cognition, and workflow design (34,91).

A defining tradeoff is the tension between granularity and safety. GenAI systems perform most reliably when generation is constrained by structured inputs, templates, or source-linked fields that tether language output to verifiable data (50,91). Such constraints reduce hallucination, laterality drift, and omission, but at the cost of expressive flexibility (50,83,91). Conversely, unconstrained narrative generation enables fluent, natural language output while increasing the risk of confident misrepresentation (64,83). This tradeoff recurs across operative documentation, discharge summaries, clinic correspondence, and MDT records, reflecting a structural property of probabilistic language models rather than a domain-specific limitation (4,83,91).
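The idea of tethering generated text to source-linked fields can be illustrated with a minimal Python sketch. It is not drawn from any of the reviewed studies or from a specific product: the `OperativeSource` record, its field names, and the `laterality_conflicts` helper are hypothetical, and the check covers only laterality terms. It assumes that a verified structured field exists alongside the draft and that any mismatch is flagged for clinician review rather than corrected automatically.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class OperativeSource:
    """Hypothetical verified structured record; field names are illustrative."""
    procedure: str
    laterality: str  # expected values: "left", "right", or "bilateral"


# Whole-word side terms. This is deliberately over-inclusive: "left" used as a
# verb ("a drain was left in place") is also matched, so flags need human review.
_SIDE_PATTERN = re.compile(r"\b(left|right|bilateral)\b", re.IGNORECASE)


def laterality_conflicts(draft: str, source: OperativeSource) -> list[str]:
    """Return side terms in the draft that disagree with the source field."""
    expected = source.laterality.lower()
    found = {m.group(1).lower() for m in _SIDE_PATTERN.finditer(draft)}
    if expected == "bilateral":
        # Individual sides are legitimately mentioned in bilateral procedures.
        return []
    return sorted(found - {expected})


source = OperativeSource(procedure="tympanoplasty", laterality="right")
draft = "Right tympanoplasty performed; the left ear canal was packed."
print(laterality_conflicts(draft, source))
```

A check of this kind trades expressive flexibility for verifiability in the way described above: it cannot judge whether a flagged term is clinically wrong, only that it departs from the verified field and warrants a second look.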

A second unifying pattern is the convergence of failure modes. Laterality errors, omission of low-frequency but high-consequence details, and attenuation of uncertainty recur across tasks (83,91,92). These errors are particularly insidious because they are embedded within coherent, well-structured prose. Evidence from automation research indicates that such conditions heighten susceptibility to automation bias, especially when clinicians are required to verify outputs under time pressure (34,93). In documentation workflows, this manifests as verification fatigue: clinicians retain formal responsibility for review, yet are cognitively incentivized to skim fluent text rather than interrogate it.

As a result, the relationship between efficiency and safety is non-linear. While GenAI can substantially reduce drafting time, net efficiency depends on the cognitive cost of verification and correction (4,91,94). In lower-risk domains, such as patient-facing summaries, limited imprecision may be acceptable when transparency and oversight are preserved. In higher-risk domains, including operative documentation and consent narratives, even small inaccuracies carry disproportionate downstream consequences. These asymmetries argue against uniform deployment strategies and instead favor task-specific risk stratification (93,95).
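The dependence of net efficiency on verification cost can be made explicit with a simple illustrative expression. The symbols and numbers below are assumptions introduced for exposition only; they are not estimates from the reviewed studies. Let $T_{\text{manual}}$ be the time to write a document unaided, $T_{\text{gen}}$ the time to produce a GenAI draft, $T_{\text{verify}}$ the time to verify it, $p_{\text{err}}$ the probability that the draft needs correction, and $T_{\text{fix}}$ the time to correct it.

```latex
\Delta T = T_{\text{manual}} - \left( T_{\text{gen}} + T_{\text{verify}} + p_{\text{err}}\, T_{\text{fix}} \right)

% Hypothetical lower-risk task (minutes):
\Delta T = 10 - (1 + 4 + 0.3 \times 5) = 3.5

% Same task with verification time doubled, as might apply to a higher-risk document:
\Delta T = 10 - (1 + 8 + 0.3 \times 5) = -0.5
```

Under these assumed values, drafting assistance saves time only while $\Delta T > 0$, and a higher verification burden alone is enough to reverse the sign. The expression also omits the cost of errors that escape verification, which is the dominant concern in high-risk domains.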

Across studies, a consistent distinction emerges between linguistic optimization and cognitive substitution. GenAI excels at organizing, summarizing, and rephrasing existing information, but underperforms in tasks requiring contextual prioritization, tacit knowledge, or deliberation under uncertainty (83,95). In otolaryngology, where expertise is frequently anatomy-dependent and experiential, this limitation is particularly salient (4,89). GenAI functions most reliably as a linguistic optimizer that improves signal-to-noise ratio, rather than as a surrogate for clinical reasoning (24,91).

Taken together, these observations underscore that non-diagnostic GenAI cannot be meaningfully evaluated in isolation from workflow. Systems that fragment attention, obscure authorship, or impose excessive verification burden risk negating technical gains. Conversely, tools that are embedded within existing documentation structures, aligned with clinician mental models, and supported by explicit accountability frameworks are more likely to deliver durable benefit. Non-diagnostic GenAI should therefore be understood and assessed as a sociotechnical intervention, not a standalone technology.

Implementation realities: electronic health record (EHR) integration and the “last-mile” problem

The clinical impact of non-diagnostic GenAI in otolaryngology is determined less by model capability than by deployment (96). Across healthcare, AI systems that perform well in controlled evaluations frequently fail to deliver meaningful benefit once introduced into real-world workflows (97,98). This “last-mile” problem is particularly salient for documentation and communication tasks, where value depends on seamless integration into already time-pressured clinical environments (97,98).

Most current GenAI tools function as external or “side-car” applications rather than as native components of the EHR (36,99,100). These workflows require clinicians to transfer content between interfaces and manually reinsert outputs into the chart, fragmenting attention and eroding potential efficiency gains (33,98). Such fragmentation increases the risk of transcription error, version mismatch, and loss of provenance, particularly when multiple drafts circulate outside the authoritative record (31,65).

Even when technical integration is feasible, deployment is constrained by proprietary EHR architectures and limited application programming interfaces (33,100,101). Institutional governance processes, including information security review, legal oversight, and procurement, often further delay or restrict implementation (96,102). As a result, many promising tools remain confined to pilot settings without achieving operational maturity (96,102). Concerns regarding cloud-based processing of protected health information pose an additional barrier, particularly for systems that generate or transform free-text clinical documentation at scale (96,102).

A predictable consequence of these constraints is the emergence of shadow use, in which clinicians independently adopt publicly available GenAI tools outside sanctioned workflows (64,102). While driven by documentation burden rather than intent to bypass governance, such practices introduce inconsistent documentation standards, weaken institutional oversight, and expose clinicians and patients to unquantified privacy and liability risk (64,102). Shadow use reflects a misalignment between clinician needs and system readiness rather than simple resistance to regulation (64,102).

Successful implementation therefore requires alignment between technical design and clinical workflow. GenAI systems that are embedded directly within documentation interfaces, minimize context switching, and preserve clear attribution of AI-assisted content are more likely to deliver net benefit (31,65,97). Importantly, tolerance for flexibility should be task-specific: higher-risk domains such as operative documentation require tighter integration, structured inputs, and explicit verification mechanisms, whereas lower-risk communication tasks may permit greater latitude (33,36,97).

Without addressing these deployment realities, non-diagnostic GenAI risks becoming either underutilized or misused. Conversely, workflow-native integration offers a credible path to translating technical advances into durable clinical value.

Ethical, equity, and privacy considerations

Ethical concerns surrounding non-diagnostic GenAI in otolaryngology arise less from overt clinical error than from subtler shifts in responsibility, trust, and representation (64,103). Because these systems shape how clinical information is recorded and communicated, they influence patient understanding and clinician accountability even when diagnostic and therapeutic decisions remain human (64,103).

A central challenge is accountability opacity. In GenAI-assisted documentation workflows, clinicians retain legal responsibility for content, yet authorship becomes distributed between human and system (64,103). Without explicit attribution and governance, this blurring risks normalizing passive endorsement of AI-generated text rather than active authorship. Over time, such shifts may erode professional documentation norms and weaken the link between clinical judgment and the medical record (64,102,103).

Privacy and data governance further complicate deployment. Non-diagnostic GenAI systems process large volumes of free-text clinical data, raising persistent questions about data retention, secondary use, and model improvement practices. In otolaryngology, where documentation may reference facial images, endoscopic findings, or audio recordings, de-identification is particularly challenging (64,103). Even when systems are used solely for inference, uncertainty regarding data logging and storage can undermine institutional and patient trust (64,103).

Equity considerations intersect directly with both workflow and model performance. GenAI systems trained on dominant linguistic and cultural norms may inadequately represent patients with speech, voice, or hearing impairments, a core otolaryngology population (104,105). Speech-to-text components may underperform in dysphonia or nonstandard accents, leading to documentation gaps that disproportionately affect vulnerable groups (105,106). Standardized consent or summary language may similarly fail to reflect culturally specific decision-making preferences if inclusivity is not explicitly addressed (104,107).

Transparency remains a key determinant of ethical acceptability. Surveys consistently indicate that patients are more comfortable with AI involvement in administrative or explanatory tasks than in diagnostic decision-making, provided clinician oversight is clear and disclosed (98). Undisclosed use of GenAI in patient-facing documentation risks undermining trust even when content quality is high (64,102,103).

Ethically responsible deployment therefore requires operational safeguards rather than abstract principles alone. Clear attribution of AI assistance, clinician education regarding appropriate use, inclusive design practices, and ongoing monitoring of real-world impact are necessary to ensure that gains in efficiency and clarity do not come at the expense of trust, equity, or professional integrity.

Limitations

This review may not capture all relevant studies, particularly non-English-language reports or work outside major biomedical databases. The available evidence is heterogeneous, largely derived from single-center or simulation-based studies, and focuses more on technical performance than on patient outcomes, medicolegal risk, or long-term safety, which limits direct comparison with real-world clinical implementation. Most data come from high-income, English-language settings and text-based GenAI systems, further limiting generalizability. Accordingly, findings should be viewed as a specialty-specific framing of a rapidly evolving field rather than product-level guidance.

Future directions

Future work should prioritize prospective evaluation of high-risk applications, particularly operative notes and consent documentation, with direct measurement of accuracy, hallucination rates, verification burden, and downstream clinical and medicolegal effects across clinician-only, GenAI-assisted, and hybrid workflows. Specialty-specific metrics, such as laterality errors, staging inaccuracies, and omission of airway precautions, are needed, alongside equity-focused testing in patients with communication impairments and multilingual settings. Finally, implementation science studies examining EHR integration, governance, clinician training, and shadow use will be critical to understanding real-world adoption and guiding responsible deployment.

Implications for practice

Current evidence supports a selective, task-specific approach to non-diagnostic GenAI in otolaryngology. High-risk applications, including operative notes and consent documentation, do not justify unconstrained narrative generation; when GenAI is used, it should be limited to structured drafting anchored to verified clinical data, with the clinician retaining full authorship and accountability.

Discharge summaries and clinic correspondence represent more suitable near-term applications. Used to organize reliable EHR inputs rather than reinterpret facts, GenAI can improve clarity and structure, particularly for patient-facing materials. However, any efficiency gains must be weighed against the cognitive burden of verification and interpretive editing.

Across domains, safe use requires clear attribution of GenAI-assisted text, institutional policies defining appropriate use, and clinician training focused on recognizing characteristic failure modes such as omission of high-consequence details or attenuation of uncertainty. Under these conditions, GenAI functions as a modifiable documentation tool rather than an opaque external service.

Conclusions

Non-diagnostic GenAI is likely to be the earliest and most pervasive application of AI in otolaryngology, acting primarily on the clinical record rather than on diagnostic interpretation. Across documentation and communication domains, performance is highest when generation is constrained by structured inputs and lowest when models are asked to improvise under uncertainty or encode tacit expertise, making task-specific risk stratification essential. Safe adoption will depend less on model sophistication than on disciplined workflow integration, explicit human verification, and governance frameworks that preserve accountability and patient trust. Beyond immediate applications, this work provides a conceptual and practical framework to guide future development, evaluation, and governance of non-diagnostic GenAI in otolaryngology.

Acknowledgments

None.

Footnote

Reporting Checklist: The authors have completed the Narrative Review reporting checklist. Available at https://jmai.amegroups.com/article/view/10.21037/jmai-2026-1-0036/rc

Peer Review File: Available at https://jmai.amegroups.com/article/view/10.21037/jmai-2026-1-0036/prf

Funding: None.

Conflicts of Interest: All authors have completed the ICMJE uniform disclosure form (available at https://jmai.amegroups.com/article/view/10.21037/jmai-2026-1-0036/coif). S.H. received a consulting fee from AMCA. The other authors have no conflicts of interest to declare.

Ethical Statement: The authors are accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved.

Open Access Statement: This is an Open Access article distributed in accordance with the Creative Commons Attribution-NonCommercial-NoDerivs 4.0 International License (CC BY-NC-ND 4.0), which permits the non-commercial replication and distribution of the article with the strict proviso that no changes or edits are made and the original work is properly cited (including links to both the formal publication through the relevant DOI and the license). See: https://creativecommons.org/licenses/by-nc-nd/4.0/.

References

Hack S, Alsleibi S, Shemesh S, et al. AI Meets Sleep Surgery: Assessing DISE Interpretation With a Large Language Model. Sleep 2025;zsaf338.
Hack S, Alsleibi S, Saleh N, et al. Are chatbots a reliable source for patient frequently asked questions on neck masses? Eur Arch Otorhinolaryngol 2025;282:4273-82. [Crossref] [PubMed]
Hack S, Attal R, Farzad A, et al. Performance of generative AI across ENT tasks: A systematic review and meta-analysis. Auris Nasus Larynx 2025;52:585-96. [Crossref] [PubMed]
Hack S, Attal R, Locatelli G, et al. Surgeon, Trainee, or GPT? A Blinded Multicentric Study of AI-Augmented Operative Notes. Laryngoscope 2026;136:605-15.
Hack S, Karni RJ, Maniaci A, et al. Evaluation of large language models as decision support tools for head and neck cancer management: A blinded multidisciplinary simulation study. Oral Oncol 2026;174:107877. [Crossref] [PubMed]
Hack S, Zalzal HG, Attal R, et al. Empowering front-line physicians with AI: Evaluating large language models in everyday ENT care. Am J Emerg Med 2026;102:90-7. [Crossref] [PubMed]
Hack S, Craig JR, Lin CF, et al. Retrieval-augmented generative AI enhances clinical reasoning in odontogenic sinusitis versus maxillary sinus mucositis. Eur Arch Otorhinolaryngol 2026;283:2353-60. [Crossref] [PubMed]
Hack S, Attal R, Geva K, et al. Blinded comparative evaluation of GPT-generated, online search-derived, and guideline-based answers for HPV-associated oropharyngeal cancer. Oral Oncol 2026;172:107813. [Crossref] [PubMed]
Demir E, Uğurlu BN, Uğurlu GA, et al. Artificial intelligence in otorhinolaryngology: current trends and application areas. Eur Arch Otorhinolaryngol 2025;282:2697-707. [Crossref] [PubMed]
Liu GS, Fereydooni S, Lee MC, et al. Scoping review of deep learning research illuminates artificial intelligence chasm in otolaryngology-head and neck surgery. NPJ Digit Med 2025;8:265. [Crossref] [PubMed]
Burmeister JR, Dimock E, Haupert M, et al. Bridging the AI implementation gap in otolaryngology: A clinical commentary. Digit Health 2025;11:20552076251396981. [Crossref] [PubMed]
Wandell GM, Giliberto JP. Otolaryngology resident clinic participation and attending electronic health record efficiency-A user activity logs study. Laryngoscope Investig Otolaryngol 2021;6:968-74. [Crossref] [PubMed]
Maddox T, Babski D, Embi P, et al. Risks of generative artificial intelligence in health and medicine. Generative Artificial Intelligence in Health and Medicine: Opportunities and Responsibilities for Transformative Innovation 2025:1-52.
Silverman JA, Ali SA, Rybak A, et al. Generative AI: Potential and Pitfalls in Academic Publishing. JPGN Rep 2023;4:e387. [Crossref] [PubMed]
Liu YM, Chou CC, Jaing TH, et al. Generative AI for clinical reasoning: A scoping review. Teaching and Learning in Nursing 2025;21:e305-e312.
Huang J, Wittbrodt MT, Teague CN, et al. Efficiency and Quality of Generative AI-Assisted Radiograph Reporting. JAMA Netw Open 2025;8:e2513921. [Crossref] [PubMed]
Lechien JR. Generative AI and Otolaryngology-Head & Neck Surgery. Otolaryngol Clin North Am 2024;57:753-65. [Crossref] [PubMed]
Arzideh K, Baldini G, Winnekens P, et al. A Transformer-Based Pipeline for German Clinical Document De-Identification. Appl Clin Inform 2025;16:31-43. [Crossref] [PubMed]
Rashid M, Yi CS, Sathapanasiri T, et al. Role of Generative Artificial Intelligence in Assisting Systematic Review Process in Health Research: A Systematic Review. Value Health 2025;28:1665-82. [Crossref] [PubMed]
Novi SL, Navarathna N, D’Cruz M, et al. Deep Learning in Otolaryngology: A Narrative Review. JAMA Otolaryngol Head Neck Surg 2026;152:71-80. [Crossref] [PubMed]
Mahmood H, Shaban M, Rajpoot N, et al. Artificial Intelligence-based methods in head and neck cancer diagnosis: an overview. Br J Cancer 2021;124:1934-40. [Crossref] [PubMed]
Wu Q, Wang X, Liang G, et al. Advances in Image-Based Artificial Intelligence in Otorhinolaryngology-Head and Neck Surgery: A Systematic Review. Otolaryngol Head Neck Surg 2023;169:1132-42. [Crossref] [PubMed]
Song D, Kim T, Lee Y, et al. Image-Based Artificial Intelligence Technology for Diagnosing Middle Ear Diseases: A Systematic Review. J Clin Med 2023;12:5831. [Crossref] [PubMed]
Takita H, Kabata D, Walston SL, et al. A systematic review and meta-analysis of diagnostic performance comparison between generative AI and physicians. NPJ Digit Med 2025;8:175. [Crossref] [PubMed]
Esnaashari S, Hashem Y, Francis J, et al. Exploring doctors’ perspectives on generative-AI and diagnostic-decision-support systems. BMJ Health Care Inform 2025;32:e101371. [Crossref] [PubMed]
Perkins SW, Muste JC, Alam T, et al. Improving Clinical Documentation with Artificial Intelligence: A Systematic Review. Perspect Health Inf Manag 2024;21:1d.
Ayoub NF, Rameau A, Brenner MJ, et al. American Academy of Otolaryngology-Head and Neck Surgery (AAO-HNS) Report on Artificial Intelligence. Otolaryngol Head Neck Surg 2025;172:734-43. [Crossref] [PubMed]
Topaz M, Peltonen LM, Zhang Z. Beyond human ears: navigating the uncharted risks of AI scribes in clinical practice. NPJ Digit Med 2025;8:569. [Crossref] [PubMed]
Gerke S, Simon DA, Roman BR. Liability Risks of Ambient Clinical Workflows With Artificial Intelligence for Clinicians, Hospitals, and Manufacturers. JCO Oncol Pract 2026;22:357-61. [Crossref] [PubMed]
Cohen IG, Ritzman J, Cahill RF. Ambient Listening-Legal and Ethical Issues. JAMA Netw Open 2025;8:e2460642. [Crossref] [PubMed]
Lukac PJ, Turner W, Vangala S, et al. Ambient AI Scribes in Clinical Practice: A Randomized Trial. NEJM AI 2025;
Feldman MJ, Hoffer EP, Conley JJ, et al. Dedicated AI Expert System vs Generative AI With Large Language Model for Clinical Diagnoses. JAMA Netw Open 2025;8:e2512994. [Crossref] [PubMed]
Fraile Navarro D, Kocaballi AB, Berkovsky S. Understanding Clinician Perceptions of GenAI: A Mixed Methods Analysis of Clinical Documentation Tasks. J Med Syst 2025;49:101. [Crossref] [PubMed]
Garcia Sanchez C, Kharko A, Hägglund M, et al. Health Care Professionals’ Experiences and Opinions About Generative AI and Ambient Scribes in Clinical Documentation: Protocol for a Scoping Review. JMIR Res Protoc 2025;14:e73602. [Crossref] [PubMed]
Kılıçtaş AU, Gül O, Kılıçtaş B, et al. Can LLMs simplify operative notes? A comparative analysis in otorhinolaryngology. Eur Arch Otorhinolaryngol 2026;283:477-89.
Saadat S, Khalilizad Darounkolaei M, et al. Enhancing Clinical Documentation with AI: Reducing Errors, Improving Interoperability, and Supporting Real-Time Note-Taking. InfoScience Trends 2025;2:1-13.
Asgari E, Kaur J, Nuredini G, et al. Impact of Electronic Health Record Use on Cognitive Load and Burnout Among Clinicians: Narrative Review. JMIR Med Inform 2024;12:e55499. [Crossref] [PubMed]
Hudson TJ, Albrecht M, Smith TR, et al. Impact of Ambient Artificial Intelligence Documentation on Cognitive Load. Mayo Clin Proc Digit Health 2025;3:100193. [Crossref] [PubMed]
Kanaparthy NS, Villuendas-Rey Y, Bakare T, et al. Real-World Evidence Synthesis of Digital Scribes Using Ambient Listening and Generative Artificial Intelligence for Clinician Documentation Workflows: Rapid Review. JMIR AI 2025;4:e76743. [Crossref] [PubMed]
Patel SB, Lam K. ChatGPT: the future of discharge summaries? Lancet Digit Health 2023;5:e107-8. [Crossref] [PubMed]
Barak-Corren Y, Wolf R, Rozenblum R, et al. Harnessing the Power of Generative AI for Clinical Summaries: Perspectives From Emergency Physicians. Ann Emerg Med 2024;84:128-38. [Crossref] [PubMed]
Zaretsky J, Kim JM, Baskharoun S, et al. Generative Artificial Intelligence to Transform Inpatient Discharge Summaries to Patient-Friendly Language and Format. JAMA Netw Open 2024;7:e240357. [Crossref] [PubMed]
Klang E, Gill J, Sharma A, et al. Summarize-then-Prompt: A Novel Prompt Engineering Strategy for Generating High-Quality Discharge Summaries. Appl Clin Inform 2025;16:1325-31. [Crossref] [PubMed]
Oliveira JD, Santos HDP, Ulbrich AHDPS, et al. Development and evaluation of a clinical note summarization system using large language models. Commun Med (Lond) 2025;5:376. [Crossref] [PubMed]
Silberlust J, Genes N. Clinical Integration of Local AI Assistants Into Emergency Department Discharge Documentation. JAMA Netw Open 2025;8:e2538437. [Crossref] [PubMed]
Bass J, Bodimeade C, Choudhury N. A quality improvement project of patient perception of AI-generated discharge summaries: a comparison with doctor-written summaries. Ann R Coll Surg Engl 2025;107:559-62. [Crossref] [PubMed]
Patel EA, Fleischer L, Filip P, et al. The Use of Artificial Intelligence to Improve Readability of Otolaryngology Patient Education Materials. Otolaryngol Head Neck Surg 2024;171:603-8. [Crossref] [PubMed]
Zhao YF, Bove A, Thompson D, et al. Generative AI Is Not Ready for Clinical Use in Patient Education for Lower Back Pain Patients, Even With Retrieval-Augmented Generation. AMIA Jt Summits Transl Sci Proc 2025;2025:644-53.
Stanceski K, Zhong S, Zhang X, et al. The quality and safety of using generative AI to produce patient-centred discharge instructions. NPJ Digit Med 2024;7:329. [Crossref] [PubMed]
Ganzinger M, Kunz N, Fuchs P, et al. Automated generation of discharge summaries: leveraging large language models with clinical data. Sci Rep 2025;15:16466. [Crossref] [PubMed]
Alqahtani M, Al-Barakati A, Alotaibi F, et al. Impact of Detailed Versus Generic Instructions on Fine-Tuned Language Models for Patient Discharge Instructions Generation: Comparative Statistical Analysis. JMIR Form Res 2025;9:e80917. [Crossref] [PubMed]
Gupta KK, Ismail A, Balai E, et al. Writing clinic letters in head and neck outpatient services-Are we doing what patients want? Int J Clin Pract 2021;75:e13907. [Crossref] [PubMed]
Caterson J, Ambler O, Cereceda-Monteoliva N, et al. Application of generative language models to orthopaedic practice. BMJ Open 2024;14:e076484. [Crossref] [PubMed]
Shahnam A, Nindra U, Hitchen N, et al. Application of Generative Artificial Intelligence for Physician and Patient Oncology Letters-AI-OncLetters. JCO Clin Cancer Inform 2025;9:e2400323. [Crossref] [PubMed]
Mess SA, Mackey AJ, Yarowsky DE. Artificial Intelligence Scribe and Large Language Model Technology in Healthcare Documentation: Advantages, Limitations, and Recommendations. Plast Reconstr Surg Glob Open 2025;13:e6450. [Crossref] [PubMed]
Ali SR, Dobbs TD, Hutchings HA, et al. Using ChatGPT to write patient clinic letters. Lancet Digit Health 2023;5:e179-81. [Crossref] [PubMed]
Blease C. Placebo, Nocebo, and Machine Learning: How Generative AI Could Shape Patient Perception in Mental Health Care. JMIR Ment Health 2025;12:e78663.
Holm S. Handle with care: Assessing performance measures of medical AI for shared clinical decision-making. Bioethics 2022;36:178-86.
Lorenzini G, Arbelaez Ossa L, Shaw DM, et al. Artificial intelligence and the doctor-patient relationship expanding the paradigm of shared decision making. Bioethics 2023;37:424-9.
Gaube S, Suresh H, Raue M, et al. Do as AI say: susceptibility in deployment of clinical decision-aids. NPJ Digit Med 2021;4:31.
Kleinjung T, Peter N, Schecklmann M, et al. The Current State of Tinnitus Diagnosis and Treatment: a Multidisciplinary Expert Perspective. J Assoc Res Otolaryngol 2024;25:413-25.
Marrero-Gonzalez AR, Diemer TJ, Nguyen SA, et al. Application of artificial intelligence in laryngeal lesions: a systematic review and meta-analysis. Eur Arch Otorhinolaryngol 2025;282:1543-55.
Maniaci A, Chiesa-Estomba CM, Lechien JR. ChatGPT-4 Consistency in Interpreting Laryngeal Clinical Images of Common Lesions and Disorders. Otolaryngol Head Neck Surg 2024;171:1106-13.
Sun QW, Miller J, Hull SC. Charting the ethical landscape of generative AI-augmented clinical documentation. J Med Ethics 2026;52:e4-e7.
Duggan MJ, Gervase J, Schoenbaum A, et al. Clinician Experiences With Ambient Scribe Technology to Assist With Documentation Burden and Efficiency. JAMA Netw Open 2025;8:e2460637.
Ng JJW, Wang E, Zhou X, et al. Evaluating the performance of artificial intelligence-based speech recognition for clinical documentation: a systematic review. BMC Med Inform Decis Mak 2025;25:236.
Esmaeilzadeh P. Use of AI-based tools for healthcare purposes: a survey study from consumers’ perspectives. BMC Med Inform Decis Mak 2020;20:170.
Esmaeilzadeh P, Mirzaei T, Dharanikota S. Patients’ Perceptions Toward Human-Artificial Intelligence Interaction in Health Care: Experimental Study. J Med Internet Res 2021;23:e25856.
Witkowski K, Dougherty RB, Neely SR. Public perceptions of artificial intelligence in healthcare: ethical concerns and opportunities for patient-centered care. BMC Med Ethics 2024;25:74.
Osnat B. Patient perspectives on artificial intelligence in healthcare: A global scoping review of benefits, ethical concerns, and implementation strategies. Int J Med Inform 2025;203:106007.
Khan IU, Rehman U, Hanif S, et al. Patient perspectives on the use of AI in medical decision-making – exploring patient trust and acceptance of AI-driven healthcare services - qualitative study. Insights-Journal of Health and Rehabilitation 2025:416-22.
Ho RA, Shah E, De Armas JS, et al. Artificial Intelligence Models for Dysphonia Patient Education. Otolaryngol Head Neck Surg 2025;173:1455-62.
Saeedi S, Aghajanzadeh M. Assessing the diagnostic capacity of artificial intelligence chatbots for dysphonia types: Model development and validation. Eur Ann Otorhinolaryngol Head Neck Dis 2025;142:171-8.
Glanville B, Oates J, Foley KR, et al. Harmonizing Identities: A Scoping Review on Voice and Communication Supports and Challenges for Autistic Trans and Gender Diverse Individuals. J Autism Dev Disord 2025; Epub ahead of print.
Ferraro T, Villarin C, Jung C, et al. Disparities in Adult Otolaryngology Patients with Limited English Proficiency: A Systematic Review. Laryngoscope 2025;135:1248-58.
Bhamidipaty V, Botchu B, Bhamidipaty DL, et al. ChatGPT for speech-impaired assistance. Disabil Rehabil Assist Technol 2025;20:1575-7.
Rameau A, Cox SR, Sussman SH, et al. Addressing disparities in speech-language pathology and laryngology services with telehealth. J Commun Disord 2023;105:106349.
Schuh MR, Bush ML. Evaluating Equity Through the Social Determinants of Hearing Health. Ear Hear 2022;43:15S-22S.
Ma A, O’Shea R, Wedd L, et al. What is the power of a genomic multidisciplinary team approach? A systematic review of implementation and sustainability. Eur J Hum Genet 2024;32:381-91.
Han AY, Miller JE, Long JL, et al. Time for a Paradigm Shift in Head and Neck Cancer Management During the COVID-19 Pandemic. Otolaryngol Head Neck Surg 2020;163:447-54.
Dharanikota H, Wigmore SJ, Skipworth R, et al. Mapping cognitive biases in multidisciplinary team (MDT) decision-making for cancer care in Scotland: a cognitive ethnography study protocol. BMJ Open 2024;14:e086775.
Winters DA, Soukup T, Sevdalis N, et al. The cancer multidisciplinary team meeting: in need of change? History, challenges and future perspectives. BJU Int 2021;128:271-9.
Tang L, Sun Z, Idnay B, et al. Evaluating large language models on medical evidence summarization. NPJ Digit Med 2023;6:158.
Xiong YT, Liu HN, Zeng YM, et al. Exploring the capabilities of GenAI for oral cancer consultations in remote consultations: Author. BMC Oral Health 2025;25:269.
Li CP, Kalisa AT, Roohani S, et al. The imitation game: large language models versus multidisciplinary tumor boards: benchmarking AI against 21 sarcoma centers from the ring trial. J Cancer Res Clin Oncol 2025;151:248.
Khumalo AC, Kane BT. Perspectives on record-keeping practices in MDT meetings and meeting record utility. Int J Med Inform 2022;161:104711.
King D, Robinson AJM. Evaluating Bias, Fairness, and Transparency in Clinical NLP Algorithms Applied to Electronic Health Record Text for Equitable Healthcare Delivery. International Journal of Advance Research Publication and Reviews 2025;2:568-87.
Abou-Abdallah M, Dar T, Mahmudzade Y, et al. The quality and readability of patient information provided by ChatGPT: can AI reliably explain common ENT operations? Eur Arch Otorhinolaryngol 2024;281:6147-53.
Qu RW, Qureshi U, Petersen G, et al. Diagnostic and Management Applications of ChatGPT in Structured Otolaryngology Clinical Scenarios. OTO Open 2023;7:e67.
Lechien JR, Naunheim MR, Maniaci A, et al. Performance and Consistency of ChatGPT-4 Versus Otolaryngologists: A Clinical Case Series. Otolaryngol Head Neck Surg 2024;170:1519-26.
Asgari E, Montaña-Brown N, Dubois M, et al. A framework to assess clinical safety and hallucination rates of LLMs for medical text summarisation. NPJ Digit Med 2025;8:274.
Kernberg A, Gold JA, Mohan V. Using ChatGPT-4 to Create Structured Medical Notes From Audio Recordings of Physician-Patient Encounters: Comparative Study. J Med Internet Res 2024;26:e54419.
Hakim JB, Painter JL, Ramcharran D, et al. The need for guardrails with large language models in pharmacovigilance and other medical safety critical settings. Sci Rep 2025;15:27886.
Blease C, Torous J, McMillan B, et al. Generative Language Models and Open Notes: Exploring the Promise and Limitations. JMIR Med Educ 2024;10:e51183.
Jung KH. Large Language Models in Medicine: Clinical Applications, Technical Challenges, and Ethical Considerations. Healthc Inform Res 2025;31:114-24.
Reddy S. Generative AI in healthcare: an implementation science informed translational path on application, integration and governance. Implement Sci 2024;19:27.
Bracken A, Reilly C, Feeley A, et al. Artificial Intelligence (AI) - Powered Documentation Systems in Healthcare: A Systematic Review. J Med Syst 2025;49:28.
Poon EG, Lemak CH, Rojas JC, et al. Adoption of artificial intelligence in healthcare: survey of health system priorities, successes, and challenges. J Am Med Inform Assoc 2025;32:1093-100.
Zhang P, Boulos MNK. Generative AI in Medicine and Healthcare: Promises, Opportunities and Challenges. Future Internet 2023;15:286.
Chen Y, Esmaeilzadeh P. Generative AI in Medical Practice: In-Depth Exploration of Privacy and Security Challenges. J Med Internet Res 2024;26:e53008.
Mwogosi A, Mambile C. AI integration in EHR systems in developing countries: a systematic literature review using the TCCM framework. Inf Discov Deliv 2025.
Mennella C, Maniscalco U, De Pietro G, et al. Ethical and regulatory challenges of AI technologies in healthcare: A narrative review. Heliyon 2024;10:e26297.
Zhang J, Zhang ZM. Ethics and governance of trustworthy medical artificial intelligence. BMC Med Inform Decis Mak 2023;23:7.
Al-kfairy M, Mustafa D, Kshetri N, et al. Ethical Challenges and Solutions of Generative AI: An Interdisciplinary Perspective. Informatics 2024;11:58.
Larasati R. Inclusivity of AI Speech in Healthcare: A Decade Look Back. ArXiv 2025:abs/2505.10596.
Lewis A, Dangol A, Suh H, et al. Exploring AI-Based Support in Speech-Language Pathology for Culturally and Linguistically Diverse Children. Association for Computing Machinery 2025;558:1-19.
Omoyemi OE. Ethical Implications of AI-Driven AAC Systems: Ensuring Inclusivity and Equity in Assistive Technologies. World Journal of Advanced Research and Reviews 2024;24:2576-81.

doi: 10.21037/jmai-2026-1-0036
Cite this article as: Karni JE, Attal R, Farzad A, Hack S. Non-diagnostic generative artificial intelligence in otolaryngology: a narrative review of evidence, risks, and implementation. J Med Artif Intell 2026;9:55.
