Would ChatPDF be advantageous for expediting the interpretation of imaging and clinical articles in PDF format?
Highlight box
Key findings
• ChatPDF exhibits satisfactory comprehension of the main topics and conclusions in clinical articles. This indicates its potential utility for quickly summarizing key points, making it a helpful tool for preliminary reviews of scientific literature.
What is known and what is new?
• It is known that artificial intelligence tools face challenges in accurately interpreting non-textual information. ChatPDF struggles significantly to extract and interpret data from images, graphs, and tables attached to articles. This limitation is critical, as these visual elements often contain essential data that support and clarify the textual content. Additionally, ChatPDF shows inadequate performance in detailing methods of sampling and data collection, which are crucial for assessing the validity and reproducibility of clinical studies. These shortcomings highlight the tool’s current limitations in providing a comprehensive analysis of scientific documents.
What is the implication, and what should change now?
• While ChatPDF is useful for extracting principal information from PDF articles, it falls short in delivering in-depth details necessary for thorough scientific analysis. This limitation implies that users cannot rely solely on ChatPDF for critical evaluations of clinical research. To address these issues, developers should enhance the AI’s capabilities in visual data recognition and interpretation. This includes advancing algorithms for image processing and improving the parsing of complex scientific texts. Additionally, providing detailed user guidelines on effectively using ChatPDF can help mitigate some of its current limitations. By addressing these areas, ChatPDF could become a more robust and reliable tool for comprehensive literature reviews in clinical research.
Introduction
In the landscape of artificial intelligence (AI), chatbots have progressed from mere conversational tools to indispensable instruments in a multitude of fields. ChatGPT, Bing Chat, and Google Bard showcase the expansive capabilities of AI chatbots, undertaking multifaceted tasks with ease.
However, among the well-known wide-ranging applications, there is also a focus on evaluating more targeted and specialized functionalities to improve specific user experiences (1,2).
Some literature has already examined the roles that AI, and large language models (LLMs) in particular, can play in textual analysis tasks: accelerating the screening of texts for the drafting of systematic reviews (3), with partial results that still need to be completed by a human reviewer, or supporting the search and examination of academic literature (4) by helping researchers draw connections across a huge body of publications. Other studies have examined the ability of LLMs to interpret and summarize textual reports for clinical purposes, stratifying them by the presence of specific findings, especially in emergency settings (5).
ChatPDF is a notable player in this specialized arena, offering a unique approach to document reading experiences. In a landscape dominated by powerful chatbots, ChatPDF distinguishes itself by focusing on the analysis of PDF files. This tool, designed to read, extract information, and respond to user queries, aims to streamline the consumption of scholarly content.
Scientists are becoming increasingly interested in these subjects and are putting ChatGPT-like AI systems to the test. These investigations have shown that rapid and capable reading is possible even without major writing ability or critical comprehension skills (6).
This article navigates through the current milieu of AI-driven chatbots, highlighting the pivotal role they play in shaping how information is accessed and understood. As we delve into the realm of scientific article interpretation, the focus shifts to ChatPDF and its potential impact on academic writing and reading.
The study outlined herein investigates how ChatPDF fares in comprehending the complexity of scientific articles. A set of 48 articles from different fields serves as the testing ground, where experts evaluate the chatbot's performance on different sections of each article. The study aims to go beyond mere assessment, revealing the impact of AI-driven chatbots on scientific debate and providing insight into their benefits and limitations.
In this article, we analyze this scenario by examining the symbiotic relationship between AI chatbots and scientific content. As we ponder the potential applications and challenges presented by tools like ChatPDF, we contribute to the ongoing narrative at the intersection of AI technologies and academic communication.
Methods
Data collection
We conducted a retrospective analysis of 48 scientific medical articles from different fields (neurology, pneumology, musculoskeletal system, gastroenterology, endocrinology, anesthesiology, nuclear medicine, breast medicine and cardiology) and of different types (Review, Original Research, Clinical Case, Multicenter Study and Research Letter) (Table 1), downloaded as open-access or free full-text publications from PubMed (https://pubmed.ncbi.nlm.nih.gov/).
Table 1
| Fields and types of articles | Total |
|---|---|
| *Fields of articles* | |
| Thoracic radiology | 11 |
| Endocrinology | 11 |
| Neuroradiology | 8 |
| Musculoskeletal radiology | 6 |
| Abdominal radiology | 4 |
| Nuclear medicine | 2 |
| Breast imaging | 1 |
| Anesthesiology | 1 |
| Pneumology | 1 |
| Immunology | 1 |
| Cardiology | 1 |
| Management | 1 |
| *Types of articles* | |
| Original Article | 23 |
| Review Article | 18 |
| Case Report | 2 |
| Multicenter Study | 2 |
| Clinical Case Study | 1 |
| Clinical Trial | 1 |
| Research Letter | 1 |
We used the online service ChatPDF (https://www.chatpdf.com; ChatPDF GmbH, Lammertzweg 19, 24235 Laboe, Germany), last updated January 05, 2023. On the web service we uploaded a single file at a time and sent five successive prompts to ChatPDF, respectively:
- Q1: Summarize in 50 words the topic of this article.
- Q2: Summarize the conclusions of this article in 50 words.
- Q3: Summarize the results of this article in 50 words.
- Q4: Summarize how the study and data collection are designed in 50 words.
- Q5: Is there any important information in the images, graphs and tables that is not reported in the text? What? In 50 words.
These queries were designed as prompts to evaluate each part of the article specifically: the abstract/introduction for the topic, the conclusions, the results, the methodology, and the captions of images, graphs and tables.
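Because the same five prompts were issued for every PDF, the collection step lends itself to a simple scripted loop. The sketch below is illustrative only: the study used the ChatPDF web interface manually, and the `ask_chatpdf` helper is hypothetical, standing in for whatever upload-and-query mechanism a given service exposes; it merely shows how the per-article question-answer rows that feed the MOS-AI scoring could be assembled.

```python
import csv

PROMPTS = {
    "Q1": "Summarize in 50 words the topic of this article.",
    "Q2": "Summarize the conclusions of this article in 50 words.",
    "Q3": "Summarize the results of this article in 50 words.",
    "Q4": "Summarize how the study and data collection are designed in 50 words.",
    "Q5": ("Is there any important information in the images, graphs and tables "
           "that is not reported in the text? What? In 50 words."),
}

def ask_chatpdf(pdf_path: str, prompt: str) -> str:
    """Hypothetical helper: in the study each prompt was sent manually through
    the ChatPDF web interface; a real script would call whatever upload/query
    endpoint the service exposes."""
    raise NotImplementedError

def collect_answers(pdf_paths: list[str], out_csv: str = "answers.csv") -> None:
    # One row per (article, prompt) pair; the MOS-AI score column is left
    # blank for the human raters to fill in.
    with open(out_csv, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["article", "prompt_id", "answer", "mos_ai_score"])
        for path in pdf_paths:
            for pid, prompt in PROMPTS.items():
                writer.writerow([path, pid, ask_chatpdf(path, prompt), ""])
```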
Five clinicians examined the answers given by the AI: four clinical radiologists with experience in emergency/abdominal, thoracic, musculoskeletal and neuroradiological imaging, plus a clinical endocrinologist, with varying academic experience and expertise (h-index from 1 to 18, from resident to assistant professor). They qualitatively scored each answer from 0 to 5, based on the Mean Opinion Score (MOS) revisited as a Quality Evaluation Scale (7) for AI (MOS-AI). This index has been used in various settings, including medicine (8-10). The point scale is explained in Table 2.
Table 2
| Point | Description |
|---|---|
| 0 | No understanding or total misunderstanding (including a null response) |
| 1 | Bad comprehension: missing or incorrect information, or a description too generic to convey the essential information |
| 2 | Poor comprehension: the description can be correlated to the text but contains only a few important pieces of information |
| 3 | Correct comprehension: details are missing or the description is imprecise, but overall correct |
| 4 | Correct comprehension, with some incomplete or inaccurate description of details |
| 5 | Perfect comprehension, even of the details, combined with a precise and timely description |
Each contributing author evaluated articles within their sub-specialty, selecting both their own articles and scientific work unknown to them. In this way we intended to capture both deep comprehension of the text (in the case of an author's own articles, known in detail) and a more general collective understanding of unfamiliar content.
No patient-related information was collected. Ethical committee approval was not required because the study involved neither patients nor any identifiable data.
Data analysis
For statistical analysis, categorical variables are expressed as absolute frequencies and percentages; where necessary, relative frequencies and percentages were used. A statistical analysis of subgroups was also performed. The Gini heterogeneity index (11) was calculated to compare the evaluations of each article section.
The Gini heterogeneity index was chosen because, to obtain tangible results from the study, we needed a measure whose value depends only on how the data are distributed (more or less concentrated), independently of the type of distribution.
The Gini heterogeneity index (G) measures how a qualitative variable is distributed, according to the homogeneity or heterogeneity of the statistical sample. The index is greater than or equal to zero: it equals zero in the scenario of maximum homogeneity (least heterogeneity), where all statistical units take the same value, while its maximum is determined by the number of distinct values the distribution can assume; the index reaches this maximum under complete heterogeneity (no homogeneity). An additional representation is the normalized Gini index (Gn), a number between 0 and 1 obtained by dividing by the highest value assumed in the sample under study. In this case the normalized value approaches 1 as the distribution tends toward maximum heterogeneity, and approaches 0 as it tends toward maximum homogeneity relative to the sample.
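The article gives only this verbal definition; written explicitly (in our notation), the index and its normalization are

$$ G = 1 - \sum_{i=1}^{k} f_i^{2}, \qquad G_n = \frac{G}{\max_{j} G_j}, $$

where $f_i$ is the relative frequency of the $i$-th score level (here $k = 6$ levels, 0 to 5) and the maximum runs over the article sections $j$, matching the stated choice of dividing by the highest index in the sample. As a check, for the topic section in Table 3, $G = 1 - (2^2 + 8^2 + 20^2 + 18^2)/48^2 = 0.656$, which reproduces the tabulated value.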
In our data, the Gini index was useful because, as described, it allowed us to study quantitatively the parameters examined, specifically the sections of the scientific articles. High values indicated greater inhomogeneity of the results obtained in a specific section, hence a different behavior of the AI depending on the type of article, and therefore a variable ability to interpret the text of that section. The quantitative Gini indices also corresponded to the distribution of the qualitative scores by article section, as shown in Figures 1,2: high values matched a larger number of data points lying away from the central body of the data, whereas lower Gini indices appear graphically as a rather grouped and homogeneous cloud of data.
The normalization (dividing by the highest index, as described) was used to compare the homogeneity or inhomogeneity obtained across the different sections. The two indices, regular and normalized, thus allow an absolute and a relative evaluation (between the various sections) for better comparison.
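As a minimal sketch (in Python, rather than the SPSS used in the study), the calculation can be reproduced directly from the score counts later reported in Table 3; the function below recovers the G and Gn values of that table.

```python
# Recompute the Gini heterogeneity index (G) and its normalized form (Gn)
# from the MOS-AI score counts reported in Table 3 (n = 48 per section).

score_counts = {  # counts of scores 0-5 per article section
    "topic":        [0, 0, 2, 8, 20, 18],
    "conclusions":  [2, 0, 1, 8, 20, 17],
    "results":      [2, 1, 6, 9, 20, 10],
    "methodology":  [5, 4, 6, 9, 10, 14],
    "images":       [29, 3, 3, 3, 4, 6],
}

def gini(counts):
    """Gini heterogeneity index: 1 minus the sum of squared relative frequencies."""
    n = sum(counts)
    return 1 - sum((c / n) ** 2 for c in counts)

g = {section: gini(counts) for section, counts in score_counts.items()}
g_max = max(g.values())  # the study normalizes by the highest observed index

for section, value in g.items():
    print(f"{section:12s} G = {value:.3f}  Gn = {value / g_max:.3f}")
```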
Statistical analysis
Data were analyzed with dedicated software (SPSS Statistics, version 27.0; IBM, USA) and were graphically represented by histograms, stacked normalized bar chart and scatter plots. The results were considered statistically significant at P value <0.05.
Technical aspects
ChatPDF, based on GPT-3.5 like OpenAI's chatbot, aims to improve learning experiences by reading user-submitted PDFs and answering related questions. It extracts information from documents, presents it as responses to specific queries and, during analysis, uses OpenAI's ChatGPT API to generate responses (12,13) as an LLM. ChatPDF supports multiple languages; users upload PDFs via the official website, where they are stored securely in a cloud archive. As a GPT-3.5-based chatbot, ChatPDF uses the same knowledge base as ChatGPT to search for additional information and answer questions beyond the uploaded PDF. ChatPDF offers a free tier for up to two PDFs daily, while a paid monthly Plus subscription allows analysis of multiple PDFs daily. Integration into websites or mobile apps is possible via application programming interface (API) registration. Lastly, free accounts are not upgraded to GPT-4 due to prohibitive costs, and upgrading is limited to a few messages, with the Plus plan price being a determining factor.
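The general pattern that PDF chat tools of this kind follow can be sketched as follows: extract the PDF's text layer and pass it, together with the user's question, to a chat-completion endpoint. This is a minimal illustrative reconstruction under our own assumptions; the pypdf and openai packages and the model name are our choices for the sketch, not ChatPDF's actual (unpublished) implementation.

```python
from pypdf import PdfReader  # assumption: pypdf for text extraction
from openai import OpenAI    # assumption: OpenAI chat-completion client

def answer_from_pdf(pdf_path: str, question: str) -> str:
    # 1. Extract the raw text layer of the PDF. Tables and figures are
    #    flattened or lost at this step, which is consistent with the
    #    weak scores observed for the images/graphs/tables prompt.
    text = "\n".join(page.extract_text() or "" for page in PdfReader(pdf_path).pages)

    # 2. Send the document text plus the user's question to the model.
    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",  # illustrative; the engine behind ChatPDF per the text
        messages=[
            {"role": "system", "content": "Answer questions using only the document below."},
            {"role": "user", "content": f"Document:\n{text}\n\nQuestion: {question}"},
        ],
    )
    return response.choices[0].message.content
```

A real tool must also split long documents into chunks that fit the model's context window, a step omitted here for brevity.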
Each LLM has distinctive characteristics based on its underlying algorithm, which varies in the number of parameters and in the nature of the data on which it was trained. In our investigation we used ChatPDF, which relies on GPT-3.5 (one of the engines behind ChatGPT), a model with 175 billion parameters. Companies do not always disclose the volume and type of data used for training, leaving these figures undisclosed and unreportable. As a result, the different handling of parameters and training factors across LLMs can have a perceptible impact on their performance.
Results
The distribution of ratings, in terms of both score points and article sections, is shown in Table 3. Specifically, high scores of 4 and 5 are evident in the topic and conclusions sections [topic: 3 points 17% (8/48), 4 points 42% (20/48), 5 points 37% (18/48); conclusions: 3 points 17% (8/48), 4 points 42% (20/48), 5 points 35% (17/48)], and lower scores in the images, graphs and tables subject [0 points 61% (29/48), 1 point 6% (3/48), 2 points 6% (3/48)]. The values obtained in the methodology section have a different distribution [0 points 10% (5/48), 1 point 8% (4/48), 2 points 13% (6/48), 3 points 19% (9/48), 4 points 21% (10/48), 5 points 29% (14/48)], as also shown by the mean, median, standard deviation (SD), minimum and maximum values in Table 3.
This led to the calculation of the Gini heterogeneity index (G) and its normalized index (Gn), reported in Table 3.
Table 3
| Detail | Topic | Conclusion | Results | Methodology | Image, graph & table |
|---|---|---|---|---|---|
| 0-point | 0 | 2 | 2 | 5 | 29 |
| 1-point | 0 | 0 | 1 | 4 | 3 |
| 2-point | 2 | 1 | 6 | 6 | 3 |
| 3-point | 8 | 8 | 9 | 9 | 3 |
| 4-point | 20 | 20 | 20 | 10 | 4 |
| 5-point | 18 | 17 | 10 | 14 | 6 |
| Average | 4.13 | 3.98 | 3.54 | 3.19 | 1.33 |
| Median | 4.00 | 4.00 | 4.00 | 3.50 | 0.00 |
| SD | 0.84 | 1.14 | 1.25 | 1.67 | 1.91 |
| Min | 2.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| Max | 5.00 | 5.00 | 5.00 | 5.00 | 5.00 |
| Gini index (G) | 0.656 | 0.671 | 0.730 | 0.803 (max) | 0.601 (min) |
| Gini index, normalized (Gn) | 0.816 | 0.835 | 0.909 | 1.000 (max) | 0.748 (min) |
SD, standard deviation.
As shown in the graph (Figure 1), the lowest values, i.e., the lowest heterogeneity or greatest homogeneity (Figure 2), were obtained in the images, graphs and tables (G 0.601; Gn 0.748), topic (G 0.656; Gn 0.816) and conclusions (G 0.671; Gn 0.835) sections. The higher values in both results (G 0.730; Gn 0.909) and methodology (G 0.803; Gn 1.000) are associated with high heterogeneity (Figure 3).
We have also provided some real question-answer examples obtained directly from the online tool with different response qualities [Supplementary file (Appendix 1)].
Discussion
In this work we examined AI comprehension of 48 scientific articles from diverse disciplines. Sections such as the main topic, conclusions, results, methods and data were examined, along with additional content such as images and tables. The data analysis revealed that the topic, conclusions, and images, graphs and tables categories had the lowest Gini index values (Figure 3) compared with the other categories (topic: G 0.656, Gn 0.816; conclusions: G 0.671, Gn 0.835; images, graphs and tables: G 0.601, Gn 0.748). This indicates that values tending toward homogeneity were obtained in these three areas. This pattern becomes remarkably interesting when we assess how the qualitative scores are distributed.
In the first two (topic and conclusions) the most common scores were 3, 4, and 5, with a higher frequency of 5 and 4, indicating either perfect comprehension or proper understanding lacking some specifics. This shows that when the AI understands the general meaning and the endpoints, it tends to be trustworthy. However, we would point out that, after careful examination of the data in the conclusions section, scores of 0 were recorded in two cases, indicating null comprehension. In the third case (the images, graphs and tables section) the Gini index values likewise suggest a trend toward a homogeneous distribution. Here, however, the data reveal a high frequency of 0, 1, and 2 points, with a peak at score 0, which denotes a null response or total misunderstanding. In this category the AI is unreliable due to a substantial proportion of inadmissible responses, even though there are some examples with high scores of 4 and 5.
The Gini index in the results and methodology categories showed almost no homogeneity; the methodology in particular had the highest degree of heterogeneity (G 0.803, Gn 1.000). A ubiquitous distribution, with values at every scale score and several 0 points within the methodology, demonstrates significant variability in the answers, including misunderstandings. Because comprehension in these categories proved untrustworthy and erratic, we do not regard it as reliable. We believe that a correct and excellent understanding of the topic and conclusions sections is important, because these two sections represent the AI's ability to grasp the general subject of an article. The variable response to data-collection methods and results certainly makes these systems unreliable for data analysis. Much work remains to be done on reading images, graphs and tables, where responses are usually incomplete or null.
The identification of keywords, particularly their position, their weight and their contextualization within a scientific article, is important for AI. These are fundamental aspects for the efficiency and effectiveness of LLMs in extracting data from extended textual information, and therefore also from scientific articles, as in our case.
The 'human' reading of a scientific article implies, albeit unconsciously, an examination of the text based also on factors such as the reader's experience, the fluency, incisiveness and brevity of the text, the evaluation of the context, and specific clinical knowledge of the field. Human reading thus pays comparable attention to the key words in the text and to the numerical and quantitative values it contains.
Our results, on the other hand, seem to demonstrate different performance in data extraction depending on whether numerical data appear alongside textual information or one type of information appears alone. Indeed, the performance obtained in article sections that present both textual and numerical information (e.g., topic, results, conclusions) was better (more homogeneous) than in sections where only one of the two types predominates (e.g., images, methodology).
Technical considerations include the use of diverse training data and the incorporation of reinforcement learning, both critical to promoting efficient LLMs. Continuous dynamic training, and updating models with contemporary data and knowledge, are essential to ensure relevance and adaptability in real-time applications. Collaborative engagement with specialists, guided by ethical principles, serves to improve trust, accuracy, and fairness. Expert input is crucial to guide model learning and to answer more complex questions.
The results obtained therefore demonstrate that the analysis of scientific articles with this tool has limitations, in particular with respect to the results and materials-and-methods sections. These sections probably yielded more diverse and heterogeneous results because of the difficulty the AI has with a conspicuous presence of numerical data (as often happens in the Materials and Methods section), which in some cases is probably poorly attributed and correlated with the free text. Similarly, in the Results section, the numerous numerical references, as well as references to tables or graphs, can lead the AI to a less precise analysis of the text and its association with the quantitative data presented. It is hoped that a future LLM tool created ad hoc will better examine the sections of a scientific PDF, perhaps through the integration of text-recognition algorithms [such as optical character recognition (OCR)], better algorithms for the analysis of tables or, even better, the integration of domain-specific knowledge bases to improve accuracy in specialized fields; a sketch of this direction is given below.
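As a minimal sketch of that direction, under the assumption that tables carry structure the plain text layer loses, a future tool could extract tables explicitly and hand them to the model as structured text. The pdfplumber package used here is our illustrative choice, not something the article evaluated.

```python
import pdfplumber  # assumption: pdfplumber for table-aware PDF parsing

def extract_tables_as_text(pdf_path: str) -> str:
    """Pull tables out of a PDF and serialize them as pipe-delimited rows,
    so they can be appended to the prompt instead of being lost in the
    flattened text layer."""
    chunks = []
    with pdfplumber.open(pdf_path) as pdf:
        for page_no, page in enumerate(pdf.pages, start=1):
            for table in page.extract_tables():
                rows = [" | ".join(cell or "" for cell in row) for row in table]
                chunks.append(f"[Table on page {page_no}]\n" + "\n".join(rows))
    return "\n\n".join(chunks)
```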
The most important of the concerns surrounding the application of LLM is the considerable risk of over-reliance on technology at the expense of human input in decision-making and interpretation processes. Regardless of the application domain of AI, numerous authors warn against an uncritical interpretation of AI-generated results.
In this scenario, article-reading systems such as ChatPDF could become fundamental in research, improving data collection, data analysis and the elaboration of meta-analyses in ways that could change the future of scientific research. This is not without difficulties.
Recent research has already mentioned several important concerns, including its misuse and ethical and legal consequences. For these reasons, recommendations for its use are still necessary (14).
To address the need to create models for testing chatbot systems, other authors have considered simulating conversations in clinical settings about routine diagnostic problems. Even in these cases, ChatGPT was able to quickly summarize the subject and dig deeper into the topic if needed (15).
The use of chatbots for clinical problems is becoming more and more fascinating. Applications of chatbot conversation systems, for instance, are intriguing in the field of mental health, where several authors are studying the possibility for chatbots to communicate and interact with people in the context of mental illness (16).
In this regard, we created a model, albeit with subjective evaluation parameters, to test the ability of these systems to understand scientific articles across different fields and types of articles.
This study has some limitations that must be acknowledged. Firstly, the reliance on a limited number of scientific articles may affect the depth of analysis. The inclusion of qualitative and interpersonal assessments introduces subjectivity, impacting result variability. Additionally, the chosen article types focus on clinical radiology, potentially underrepresenting diverse scientific themes and hence influencing result generalizability. In summary, these limitations emphasize the need for cautious interpretation and restrict the study’s broader applicability beyond the outlined boundaries.
Conclusions
AI-driven chatbots are becoming increasingly common in a variety of applications (13,17-20). Our work showed a tendency toward good, and at times excellent, understanding of topics and conclusions, which we consider important for comprehension of the text, and almost no ability to read images, graphs, and tables. In other sections, such as results and methodology, the distributions were heterogeneous and therefore difficult to evaluate. This study is among the first to highlight the strengths and weaknesses of chatbot systems, albeit with non-negligible limitations. Further research is needed to build a validated model and to deepen and improve chatbots' reading abilities.
Acknowledgments
Funding: None.
Footnote
Data Sharing Statement: Available at https://jmai.amegroups.com/article/view/10.21037/jmai-24-153/dss
Peer Review File: Available at https://jmai.amegroups.com/article/view/10.21037/jmai-24-153/prf
Conflicts of Interest: All authors have completed the ICMJE uniform disclosure form (available at https://jmai.amegroups.com/article/view/10.21037/jmai-24-153/coif). The authors have no conflicts of interest to declare.
Ethical Statement: The authors are accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved. Ethical committee approval was not required due to the total absence of patient-identifiable data.
Open Access Statement: This is an Open Access article distributed in accordance with the Creative Commons Attribution-NonCommercial-NoDerivs 4.0 International License (CC BY-NC-ND 4.0), which permits the non-commercial replication and distribution of the article with the strict proviso that no changes or edits are made and the original work is properly cited (including links to both the formal publication through the relevant DOI and the license). See: https://creativecommons.org/licenses/by-nc-nd/4.0/.
References
- Infante A, Gaudino S, Orsini F, et al. Large language models (LLMs) in the evaluation of emergency radiology reports: performance of ChatGPT-4, Perplexity, and Bard. Clin Radiol 2024;79:102-6. [Crossref] [PubMed]
- Lecler A, Duron L, Soyer P. Revolutionizing radiology with GPT-based models: Current applications, future possibilities and limitations of ChatGPT. Diagn Interv Imaging 2023;104:269-74. [Crossref] [PubMed]
- van Dijk SHB, Brusse-Keizer MGJ, Bucsán CC, et al. Artificial intelligence in systematic reviews: promising when appropriately used. BMJ Open 2023;13:e072254. [Crossref] [PubMed]
- Heidt A. Artificial-intelligence search engines wrangle academic literature. Nature 2023;620:456-7. [Crossref] [PubMed]
- Preiksaitis C, Ashenburg N, Bunney G, et al. The Role of Large Language Models in Transforming Emergency Medicine: Scoping Review. JMIR Med Inform 2024;12:e53787. [Crossref] [PubMed]
- Hill-Yardin EL, Hutchinson MR, Laycock R, et al. A Chat(GPT) about the future of scientific publishing. Brain Behav Immun 2023;110:152-4. [Crossref] [PubMed]
- Streijl RC, Winkler S, Hands DS. Mean opinion score (MOS) revisited: methods and applications, limitations and alternatives. Multimedia Systems 2016;22:213-27. [Crossref]
- Kumar B, Singh SP, Mohan A, et al. Novel MOS prediction models for compressed medical image quality. J Med Eng Technol 2011;35:161-71. [Crossref] [PubMed]
- Khrulkov V, Babenko A. Neural Side-By-Side: Predicting Human Preferences, for No-Reference Super-Resolution Evaluation. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR); 2021:4988-97. Available online: https://openaccess.thecvf.com/content/CVPR2021/html/Khrulkov_Neural_Side-by-Side_Predicting_Human_Preferences_for_No-Reference_Super-Resolution_Evaluation_CVPR_2021_paper.html
- Basant B, Singh SP, Mohan A, et al. Performance of Quality Metrics for Compressed Medical Images Through Mean Opinion Score Prediction. Journal of Medical Imaging and Health Informatics 2012;2:188-94. [Crossref]
- Das S, Biswas S. Critical Scaling through Gini Index. Phys Rev Lett 2023;131:157101. [Crossref] [PubMed]
- Crook BS, Park CN, Hurley ET, et al. Evaluation of Online Artificial Intelligence-Generated Information on Common Hand Procedures. J Hand Surg Am 2023;48:1122-7. [Crossref] [PubMed]
- Benirschke RC, Wodskow J, Prasai K, et al. Assessment of a large language model's utility in helping pathology professionals answer general knowledge pathology questions. Am J Clin Pathol 2024;161:42-8. [Crossref] [PubMed]
- Sallam M. ChatGPT Utility in Healthcare Education, Research, and Practice: Systematic Review on the Promising Perspectives and Valid Concerns. Healthcare (Basel) 2023;11:887. [Crossref] [PubMed]
- Schukow C, Smith SC, Landgrebe E, et al. Application of ChatGPT in Routine Diagnostic Pathology: Promises, Pitfalls, and Potential Future Directions. Adv Anat Pathol 2024;31:15-21. [Crossref] [PubMed]
- Abd-Alrazaq AA, Alajlani M, Alalwan AA, et al. An overview of the features of chatbots in mental health: A scoping review. Int J Med Inform 2019;132:103978. [Crossref] [PubMed]
- Buvat I, Weber W. Nuclear Medicine from a Novel Perspective: Buvat and Weber Talk with OpenAI's ChatGPT. J Nucl Med 2023;64:505-7. [Crossref] [PubMed]
- Adams LC, Truhn D, Busch F, et al. Leveraging GPT-4 for Post Hoc Transformation of Free-text Radiology Reports into Structured Reporting: A Multilingual Feasibility Study. Radiology 2023;307:e230725. [Crossref] [PubMed]
- Nobel JM, Kok EM, Robben SGF. Redefining the structure of structured reporting in radiology. Insights Imaging 2020;11:10. [Crossref] [PubMed]
- Rogasch JMM, Metzger G, Preisler M, et al. ChatGPT: Can You Prepare My Patients for [18F]FDG PET/CT and Explain My Reports?. J Nucl Med 2023;64:1876-9. [Crossref] [PubMed]
Cite this article as: Infante A, Guerri G, Del Ciello A, Gullì C, Chiloiro S, De Vizio S, Ruscelli L, Merlino B, Natale L, Sala E, Gaudino S. Would ChatPDF be advantageous for expediting the interpretation of imaging and clinical articles in PDF format? J Med Artif Intell 2025;8:1.