Review Article
Non-diagnostic generative artificial intelligence in otolaryngology: a narrative review of evidence, risks, and implementation
Abstract
Background and Objective: Generative artificial intelligence (GenAI) is increasingly used in otolaryngology, with most work focused on diagnostics. Documentation, communication, and care coordination drive much of clinical workload and medicolegal risk, yet evidence on non-diagnostic use remains limited. This review critically synthesizes the peer-reviewed evidence on non-diagnostic GenAI applications in otolaryngology and evaluates their benefits, failure modes, and implementation considerations across common clinical documentation and communication tasks.
Methods: A structured narrative review was conducted of peer-reviewed literature published between January 2020 and December 2025. Searches were performed in PubMed, Scopus, and Web of Science, limited to English-language publications, using terms related to GenAI, large language models (LLMs), and otolaryngology documentation and communication. Studies focused exclusively on diagnostic interpretation or autonomous clinical decision-making were excluded unless directly relevant to non-diagnostic use. Evidence was synthesized qualitatively using a sociotechnical framework distinguishing epistemic (diagnostic) from operational (documentation and communication) functions. Five predefined domains were analyzed: operative notes, discharge summaries, clinic letters and patient-facing summaries, multidisciplinary tumor board documentation, and consent and risk communication.
Key Content and Findings: Across domains, GenAI demonstrated consistent strengths in linguistic fluency, organization, and readability, particularly for discharge summaries and patient-facing materials. However, recurrent failure modes were identified, including laterality errors, omission of low-frequency but high-consequence details, attenuation of clinical uncertainty, and false coherence in multidisciplinary summaries. High-risk applications—most notably operative notes and consent documentation—showed unacceptable error profiles when unconstrained narrative generation was used. Hybrid clinician-GenAI workflows consistently outperformed GenAI-only outputs in accuracy and acceptability. Efficiency gains were highly task-dependent and frequently offset by increased cognitive burden associated with verification and interpretive editing. Implementation challenges, including limited electronic health record (EHR) integration, governance barriers, and shadow use, were repeatedly identified as determinants of real-world safety and effectiveness.
Conclusions: Non-diagnostic GenAI is most defensible in otolaryngology when deployed selectively, with task-specific risk stratification and explicit human oversight. GenAI performs most reliably as a constrained drafting and organizational aid tethered to verified inputs, rather than as an autonomous narrator. Safe adoption depends less on model sophistication than on disciplined workflow integration, transparent attribution, and governance that preserve clinician accountability and patient trust.

