Large language model in head and neck research: evaluating the similarity of the American Society of Clinical Oncology (ASCO) with ChatGPT 4.0 answers
Letter to the Editor


Lucas Lacerda de Souza, Ivan José Correia-Neto, Alan Roger Santos-Silva, Pablo Agustin Vargas, Marcio Ajudarte Lopes

Department of Oral Diagnosis, Piracicaba Dental School, University of Campinas, Piracicaba, Brazil

Correspondence to: Lucas Lacerda de Souza, DDS, MSc. Department of Oral Diagnosis, Piracicaba Dental School, University of Campinas, Av. Limeira, 901, Areião - 13414-903 Piracicaba, São Paulo, Brazil. Email: lucaslac@hotmail.com.

Received: 14 April 2024; Accepted: 03 July 2024; Published online: 08 August 2024.

doi: 10.21037/jmai-24-108


Patients’ dependence on the internet as a source of health information has grown markedly, with roughly 80% of US adults now regularly going online to seek advice, understand symptoms, or learn about potential treatments (1). This trend not only underscores the critical role of the internet in modern healthcare communication but also brings to the forefront the pressing need to ensure the accuracy and reliability of health-related content available online. This is particularly pertinent for serious conditions such as head and neck cancer (2), where the quality of information can directly affect patient outcomes and treatment decisions.

In light of this, the emergence of advanced artificial intelligence (AI) systems, exemplified by ChatGPT 4.0, has introduced new dynamics into the domain of online health information. ChatGPT 4.0, celebrated for its sophisticated natural language processing capabilities, offers the promise of providing immediate, accessible, and detailed health information to the public (3,4). However, the advent of such technology also raises important questions regarding the reliability and precision of the health information it dispenses. The potential for AI to either enhance or undermine the quality of health communication necessitates a rigorous evaluation of the content it produces, especially in areas as critical as oncology.

This necessity is acutely felt by healthcare professionals and health communicators, who bear the responsibility of guiding patients through complex health information and ensuring that the public has access to trustworthy and accurate data (4). It is against this backdrop that this letter seeks to undertake a thorough evaluation of the precision and reliability of head and neck cancer information as provided by ChatGPT 4.0. This analysis is juxtaposed against the established and peer-reviewed content offered by the American Society of Clinical Oncology (ASCO), particularly through its “Head and Neck Cancer Answers” webpage. Our comparison aims to shed light on the strengths and potential gaps in AI-driven health communication, with a view to enhancing the quality of information available to patients and ensuring that healthcare professionals are equipped with the best possible tools to support their work.

Using the “Head and Neck Cancer Answers” webpage from ASCO, we asked ChatGPT 4.0 the same five questions about head and neck cancer. The answers provided by ASCO and by ChatGPT 4.0 for each question were compared on word count and readability, using the Flesch-Kincaid grade level and the Flesch reading ease score. To gauge the similarity and precision of the responses, we applied three text-similarity metrics: cosine similarity (the thematic similarity between two documents, used here to compare the ASCO and ChatGPT 4.0 responses), Levenshtein distance (the minimum number of single-character edits needed to transform one text into another, a measure of textual similarity), and the Jaccard similarity coefficient (the overlap between the keyword sets of two responses, a measure of conceptual similarity). All analyses were performed in Python (version 3.10.2; Python Software Foundation).
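The letter does not state which libraries were used for these computations. As a rough illustration only, the sketch below reproduces the three similarity metrics in plain Python on lowercased, punctuation-stripped text; the tokenization scheme and the placeholder example strings are assumptions for illustration, not the exact procedure used in this study.

```python
import re
from collections import Counter
from math import sqrt


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split on non-alphanumeric characters (assumed preprocessing)."""
    return [t for t in re.split(r"\W+", text.lower()) if t]


def cosine_similarity(a: str, b: str) -> float:
    """Cosine similarity between bag-of-words count vectors of two texts."""
    va, vb = Counter(tokenize(a)), Counter(tokenize(b))
    dot = sum(va[t] * vb[t] for t in set(va) & set(vb))
    norm = sqrt(sum(c * c for c in va.values())) * sqrt(sum(c * c for c in vb.values()))
    return dot / norm if norm else 0.0


def levenshtein_distance(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, or substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def jaccard_similarity(a: str, b: str) -> float:
    """Overlap of the two texts' word sets: |intersection| / |union|."""
    sa, sb = set(tokenize(a)), set(tokenize(b))
    return len(sa & sb) / len(sa | sb) if (sa | sb) else 0.0


if __name__ == "__main__":
    # Placeholder strings; the actual study compared full ASCO and ChatGPT 4.0 answers.
    asco_answer = "Head and neck cancer starts in the mouth, the throat, or nearby tissues."
    chatgpt_answer = "Head and neck cancers are malignancies arising in the oral cavity, pharynx, or larynx."
    print(f"Cosine similarity:    {cosine_similarity(asco_answer, chatgpt_answer):.3f}")
    print(f"Levenshtein distance: {levenshtein_distance(asco_answer, chatgpt_answer)}")
    print(f"Jaccard coefficient:  {jaccard_similarity(asco_answer, chatgpt_answer):.3f}")
```

Note that cosine similarity is sensitive to the choice of vectorization (raw counts versus TF-IDF weights), and the Jaccard coefficient depends on how keywords are defined, so values reproduced with this sketch may differ slightly from those reported in Table 1.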

Table S1 presents questions and answers from ASCO and ChatGPT 4.0. ASCO’s responses averaged 104 words (range, 39 to 249 words), had a Flesch-Kincaid grade level of 6.84 (range, 4.8 to 9.0), and achieved a Flesch reading ease score of 61.12 (range, 45.8 to 78.4). ChatGPT’s answers averaged 324 words (range, 193 to 428 words), reached a higher Flesch-Kincaid grade level of 13.04 (range, 11.7 to 14.0), and had a Flesch reading ease score of 38.5 (range, 32.8 to 44.2).
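For context, the two readability indices reported above are standard functions of average sentence length and average syllables per word. The sketch below shows the published Flesch formulas; the regular-expression sentence splitter and the vowel-group syllable counter are simplifying assumptions, and dedicated readability libraries (e.g., textstat) use more refined counting rules, so results may differ slightly.

```python
import re


def count_syllables(word: str) -> int:
    """Crude heuristic: count groups of consecutive vowels (assumption for illustration only)."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def flesch_metrics(text: str) -> tuple[float, float]:
    """Return (Flesch reading ease, Flesch-Kincaid grade level) for a block of text."""
    n_sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z']+", text)
    n_words = max(1, len(words))
    n_syllables = sum(count_syllables(w) for w in words)
    asl = n_words / n_sentences      # average sentence length (words per sentence)
    asw = n_syllables / n_words      # average syllables per word
    reading_ease = 206.835 - 1.015 * asl - 84.6 * asw
    grade_level = 0.39 * asl + 11.8 * asw - 15.59
    return reading_ease, grade_level


if __name__ == "__main__":
    sample = "Head and neck cancer starts in the mouth, the throat, or the voice box."
    ease, grade = flesch_metrics(sample)
    print(f"Flesch reading ease: {ease:.1f} | Flesch-Kincaid grade level: {grade:.1f}")
```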

The statistical analysis revealed an average cosine similarity of 0.669 (range, 0.57 to 0.754), a mean Levenshtein distance of 1,656 (range, 1,079 to 2,391), and an average Jaccard similarity coefficient of 0.4774 (range, 0.354 to 0.602) (Table 1).

Table 1

Word count, readability and statistical analysis

Question | ASCO word count | ASCO Flesch-Kincaid grade level | ASCO Flesch reading ease score | ChatGPT 4.0 word count | ChatGPT 4.0 Flesch-Kincaid grade level | ChatGPT 4.0 Flesch reading ease score | Cosine similarity | Levenshtein distance | Jaccard similarity coefficient
What is head and neck cancer? | 82 | 5.3 | 71.6 | 270 | 14.0 | 38.1 | 0.694 | 1,393 | 0.508
What are the types of head and neck cancer? | 92 | 7.2 | 55.8 | 314 | 11.7 | 44.2 | 0.699 | 1,512 | 0.51
What does stage mean? | 39 | 4.8 | 78.4 | 193 | 14.0 | 32.8 | 0.57 | 1,079 | 0.354
How are cancers of the head and neck treated? | 249 | 7.9 | 54.0 | 428 | 13.6 | 33.8 | 0.754 | 1,905 | 0.602
How can I cope with a cancer diagnosis? | 60 | 9.0 | 45.8 | 419 | 11.9 | 43.6 | 0.629 | 2,391 | 0.413
Average of values | 104 | 6.84 | 61.12 | 324 | 13.04 | 38.5 | 0.669 | 1,656 | 0.4774

The comparative examination of head and neck cancer information provided by ASCO and by ChatGPT 4.0 reveals notable differences in approach and presentation, alongside considerable agreement in the substance of the content. ASCO’s strategy prioritizes brevity and accessibility, presumably to ensure that essential health information remains understandable across a broad spectrum of health literacy. ChatGPT 4.0, in contrast, takes a different tack, offering responses characterized by greater detail and complexity. While this added complexity may pose comprehension challenges for readers with lower literacy levels, it also has the potential to provide fuller explanations for those seeking them (5).

This study highlights the importance of evaluating ChatGPT 4.0’s ability to provide precise and accurate information on head and neck cancer. As AI rapidly advances and becomes more integrated into the healthcare information landscape, it revolutionizes how medical knowledge is accessed and shared. Yet, these developments raise concerns, especially about the potential for AI systems like ChatGPT 4.0 to spread misinformation. This risk is amplified by the systems’ evolving nature, which can lead to inconsistent information accuracy due to continual user interactions (5,6).

Moreover, there are prevailing doubts about the scope of ChatGPT’s knowledge base, especially its competence in addressing topics outside the narrow confines of head and neck cancer. This concern prompts a call for a more expansive evaluation of ChatGPT’s capabilities across a wider spectrum of medical conditions, to determine its reliability and utility as a comprehensive health information resource (4,7).

To this end, establishing robust oversight and strict evaluation processes for digital tools like ChatGPT 4.0 is essential. These mechanisms are crucial for verifying the accuracy and reliability of provided information and ensuring the integrity of online health communication channels (3,4). High standards of factual correctness and ethical practice are vital to maintain the trust of healthcare professionals and patients. This study underscores the need to assess ChatGPT 4.0’s effectiveness in delivering cancer-related information and prompts a broader discussion on governing AI technologies in health information (7,8).

In assessing the potential and limitations of AI-generated content for medical information, it is crucial to acknowledge that such technologies, while promising, are not without constraints. These include reliance on the available data, potential biases in training datasets, and the current inability of AI to capture the nuanced judgment of experienced clinicians. Our study addresses some of these aspects, but its findings must still be interpreted within the context of these inherent limitations.

This study compares the concise guidelines provided by ASCO with the more detailed responses from ChatGPT 4.0 concerning head and neck cancer. This comparison highlights the differing approaches to information delivery: ASCO prioritizes brevity and directness, which can benefit quick reference needs, whereas ChatGPT offers extended discussions, which may be useful for those seeking comprehensive explanations. This distinction emphasizes the need for adaptive communication strategies in medical information, recognizing that both concise and detailed approaches serve critical roles depending on the user’s needs and contexts.


Acknowledgments

Funding: This study was financed in part by the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior - Brasil (CAPES), Finance Code 001, and by the São Paulo State Research Foundation (FAPESP #22/03123-5).


Footnote

Provenance and Peer Review: This article was a standard submission to the journal. The article has undergone external peer review.

Peer Review File: Available at https://jmai.amegroups.com/article/view/10.21037/jmai-24-108/prf

Conflicts of Interest: All authors have completed the ICMJE uniform disclosure form (available at https://jmai.amegroups.com/article/view/10.21037/jmai-24-108/coif). L.L.d.S. serves as the unpaid Associate Editor-in-Chief of Journal of Medical Artificial Intelligence from February 2024 to January 2026. The other authors have no conflicts of interest to declare.

Ethical Statement: The authors are accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved.

Open Access Statement: This is an Open Access article distributed in accordance with the Creative Commons Attribution-NonCommercial-NoDerivs 4.0 International License (CC BY-NC-ND 4.0), which permits the non-commercial replication and distribution of the article with the strict proviso that no changes or edits are made and the original work is properly cited (including links to both the formal publication through the relevant DOI and the license). See: https://creativecommons.org/licenses/by-nc-nd/4.0/.


References

  1. Calixte R, Rivera A, Oridota O, et al. Social and Demographic Patterns of Health-Related Internet Use Among Adults in the United States: A Secondary Data Analysis of the Health Information National Trends Survey. Int J Environ Res Public Health 2020;17:6856. [Crossref] [PubMed]
  2. Tam B, Lin M, Castellanos C, et al. Head and Neck Cancer Online Support Groups: Disparities in Participation and Impact on Patients. OTO Open 2023;7:e87. [Crossref] [PubMed]
  3. de Souza LL, Lopes MA, Santos-Silva AR, et al. The potential of ChatGPT in oral medicine: a new era of patient care? Oral Surg Oral Med Oral Pathol Oral Radiol 2024;137:1-2. [Crossref] [PubMed]
  4. de Souza LL, Fonseca FP, Martins MD, et al. ChatGPT and medicine: a potential threat to science or a step towards the future? J Med Artif Intell 2023;6:19. [Crossref]
  5. Cortegiani A, Battaglini D, Amato G, et al. Dissemination of clinical and scientific practice through social media: a SIAARTI consensus-based document. J Anesth Analg Crit Care 2024;4:21. [Crossref] [PubMed]
  6. Yu KH, Beam AL, Kohane IS. Artificial intelligence in healthcare. Nat Biomed Eng 2018;2:719-31. [Crossref] [PubMed]
  7. Giannakopoulos K, Kavadella A, Aaqel Salim A, et al. Evaluation of the Performance of Generative AI Large Language Models ChatGPT, Google Bard, and Microsoft Bing Chat in Supporting Evidence-Based Dentistry: Comparative Mixed Methods Study. J Med Internet Res 2023;25:e51580. [Crossref] [PubMed]
  8. Jeyaraman M, Balaji S, Jeyaraman N, et al. Unraveling the Ethical Enigma: Artificial Intelligence in Healthcare. Cureus 2023;15:e43262. [Crossref] [PubMed]
Cite this article as: Souza LL, Correia-Neto IJ, Santos-Silva AR, Vargas PA, Lopes MA. Large language model in head and neck research: evaluating the similarity of the American Society of Clinical Oncology (ASCO) with ChatGPT 4.0 answers. J Med Artif Intell 2024;7:31.
