A narrative review of the evolving role of language models in biomedicine: bridging text, omics, and multiscale integration
Review Article


Yanling Qi1,2, Ying Wang3*, Wen Wen1,4,5*

1Department of Laboratory Diagnosis, Third Affiliated Hospital of Naval Medical University (Second Military Medical University), Shanghai, China; 2School of Health Science and Engineering, University of Shanghai for Science and Technology, Shanghai, China; 3Department of Clinical Laboratory Medicine Center, Yueyang Hospital of Integrated Traditional Chinese and Western Medicine, Shanghai University of Traditional Chinese Medicine, Shanghai, China; 4Third Affiliated Hospital of Naval Medical University, National Center for Liver Cancer, Shanghai, China; 5International Cooperation Laboratory on Signal Transduction, Third Affiliated Hospital of Naval Medical University (Second Military Medical University), Shanghai, China

Contributions: (I) Conception and design: W Wen, Y Wang; (II) Administrative support: W Wen, Y Wang; (III) Provision of study materials or patients: None; (IV) Collection and assembly of data: Y Qi; (V) Data analysis and interpretation: Y Qi; (VI) Manuscript writing: All authors; (VII) Final approval of manuscript: All authors.

*These authors contributed equally to this work as co-senior authors.

Correspondence to: Wen Wen, PhD. International Cooperation Laboratory on Signal Transduction, Third Affiliated Hospital of Naval Medical University (Second Military Medical University), Changhai Road, No. 225, Shanghai 200438, China; Department of Laboratory Diagnosis, Third Affiliated Hospital of Naval Medical University (Second Military Medical University), Shanghai, China; Third Affiliated Hospital of Naval Medical University, National Center for Liver Cancer, Shanghai, China. Email: wenwen@smmu.edu.cn; Ying Wang, PhD. Department of Clinical Laboratory Medicine Center, Yueyang Hospital of Integrated Traditional Chinese and Western Medicine, Shanghai University of Traditional Chinese Medicine, Ganhe Road, No. 110, Shanghai 200437, China. Email: nadger_wang@139.com.

Background and Objective: The rapid evolution of language models (LMs) is driving their expansion from general-purpose domains into specialized biomedical fields. This transition brings unprecedented opportunities alongside distinct challenges, including the management of complex terminologies, virtual-actual data combination, and the assurance of reliability in clinical contexts. This review investigates and synthesizes genuine LMs in the biomedical field, explores three key application areas inspired by LM architectures (protein structure prediction, cell annotation, and foundation models), and discusses the integration of text and omics data.

Methods: This review adopted a non-systematic search strategy to analyze relevant literature from the PubMed database. The search mainly focused on publications from 2017 to February 2025, with keywords including “language models”, “entity recognition”, “relation extraction”, “ontology”, “knowledge graph”, “protein structure prediction”, and “cell annotation”.

Key Content and Findings: This review examines the applications of LMs in three key areas: biomedical text mining, omics research inspired by LM architectures, and virtual-actual data combination. A key finding is that while LMs exhibit exceptional capabilities, their effectiveness is frequently constrained by inherent challenges, including model hallucination, poor reproducibility, and a fundamental reliance on the quality of their training data. We detail how integrating knowledge graphs (KGs) and ontologies can significantly enhance the interpretability of LMs and mitigate these risks. Furthermore, we analyze the paradigm of leveraging LMs to fuse multi-source data, which addresses data silos between textual and omics datasets. This fusion substantially deepens biomedical research capabilities and supports the visualization of complex biological associations.

Conclusions: The advancement of LMs in biomedicine depends on establishing an intelligent framework that synergizes knowledge-guided and data-driven approaches. This synergy is critical for enabling the transition from simplistic “data splicing” to meaningful “knowledge fusion”, thereby enhancing model interpretability and decision-making transparency. By delineating this pathway, this review informs future research by promoting the development of robust, reliable, and clinically translatable artificial intelligence (AI) models.

Keywords: Language models (LMs); knowledge graph (KG); text mining; ontology; virtual-actual combination


Received: 25 August 2025; Accepted: 01 December 2025; Published online: 05 February 2026.

doi: 10.21037/jmai-2025-203


Introduction

Since the emergence of large language models (LLMs), attention to language models (LMs) has reached an unprecedented level, including in the biomedical field. LMs are not a new technological concept but have undergone a long development process, and with the advent of the LLM era their role has changed significantly. The earliest rule-based LMs originated in the 1950s and relied mainly on rules handwritten by linguists and researchers, which exhibit high interpretability in texts with fixed structures and standardized terminology (1,2). Although they struggle with the complexity and diversity of text and lack adaptive learning capabilities, these methods retain important value in the standardization of biomedical literature, medical terminology mapping, knowledge base development, and rule-based medical record analysis (3). In the 1980s and 1990s, statistical methods became mainstream, with N-gram models (4), hidden Markov models (HMMs) (5), and conditional random fields (CRFs) (6) being the most representative. The application of LMs in biomedicine has since far exceeded the scope of linguistic text mining. HMMs can effectively handle sequence data for gene sequence alignment and prediction (7). CRFs excel in sequence labeling tasks and are widely used for the recognition and annotation of protein functional domains (8). Recurrent neural networks (RNNs) and their variants, such as gated recurrent units (GRUs) and long short-term memory networks (LSTMs), can process variable-length sequences to capture long-range dependencies in gene and protein sequences (9).

In 2017, the Transformer model opened a new era for pre-trained LMs. Pre-trained LMs are initially trained on extensive corpora to capture generalized language representations and can subsequently be fine-tuned to address a variety of specific downstream tasks. In addition to text mining applied to biomedical literature, pre-trained LMs represented by bidirectional encoder representations from transformers (BERT) and its variants have been increasingly applied in biology, including BioBERT (10), DNABERT (11), and ProtBERT (12). The other important application area of LMs is clinical text data. Clinical data are high-dimensional, rapidly updated, scarce, and irregular, which complicates data extraction and analysis. Natural language processing (NLP) technology plays a key role in this field, enabling researchers to extract useful information from complex electronic health record (EHR) data and perform feature selection to build predictive models, as in information extraction (IE) systems (13) and CancerBERT (14). In 2022, the release of ChatGPT, built on GPT-3 (15), marked a landmark breakthrough for generative LMs in conversational systems. This success has dramatically accelerated the widespread adoption and architectural innovation of LLMs across diverse domains. Among them, DeepSeek stands out with its Mixture-of-Experts architecture and advanced multimodal capabilities (16). Meanwhile, in the biomedical field, a series of LM-inspired architectures have emerged, extending the core principles of the Transformer architecture to non-natural-language sequences such as omics data (e.g., genomics and proteomics). Accordingly, the application of LMs is evolving from traditional linguistic text mining toward deeper integration into fundamental life-science research and clinical practice in precision medicine.

Despite the growing number of reviews on LMs in biomedicine, existing studies suffer from notable limitations: a primary focus on single-domain applications, with relatively limited exploration of multi-dimensional data; insufficient attention to the synergistic effect of knowledge graphs (KGs) and LMs in enhancing interpretability; and inadequate discussion of the “knowledge fusion” paradigm as a solution to cross-modal silos. To address these gaps, this review comprehensively covers LM applications across biomedical text mining, omics analysis, and their integration. For clarity, we distinguish two model categories, which are discussed separately to maintain focus on the core topic: genuine LMs (prioritizing linguistic data such as literature and EHRs for tasks such as entity recognition) and LM-inspired architectures (models whose design and training paradigm are adapted from NLP LMs, particularly those based on the Transformer architecture). We present this article in accordance with the Narrative Review reporting checklist (available at https://jmai.amegroups.com/article/view/10.21037/jmai-2025-203/rc).


Methods

This review comprehensively analyzed the literature on the applications of LMs in biomedicine by searching the PubMed database with core keywords. Table 1 details the specific search strategy employed. Keywords included combinations of “language model”, “large language model”, “Transformer”, “BERT”, “text mining”, “entity recognition”, “relation extraction”, “ontology”, “knowledge graph”, “Question & Answer system”, “protein structure prediction”, “cell annotation”, “foundation model”, “gene function annotation”, and “intelligent medicine”.
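For readers who wish to reproduce a comparable retrieval, searches like those in Table 1 can be issued programmatically through NCBI's public E-utilities endpoint. The sketch below only builds the query URL; the keyword subset and result limit are illustrative choices, not the exact strategy used in this review.

```python
from urllib.parse import urlencode

# Public NCBI E-utilities esearch endpoint.
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

def build_pubmed_query(keywords, date_start, date_end):
    """Combine keywords with OR and restrict by publication date (pdat)."""
    term = " OR ".join(f'"{k}"[Title/Abstract]' for k in keywords)
    params = {
        "db": "pubmed",
        "term": term,
        "mindate": date_start,
        "maxdate": date_end,
        "datetype": "pdat",
        "retmax": 100,  # illustrative cap on returned PMIDs
    }
    return ESEARCH + "?" + urlencode(params)

# Subset of the review's keywords, shown for demonstration only.
url = build_pubmed_query(
    ["language model", "knowledge graph", "cell annotation"],
    "2017/01/01", "2025/02/19",
)
```

Fetching the resulting URL (e.g., with `urllib.request`) would return the matching PubMed identifiers in XML for downstream screening.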

Table 1

The search strategy summary

Items Specification
Date of search February 19, 2025
Database searched PubMed
Search terms used Search terms included combinations of: “language model”, “large language model”, “Transformer”, “BERT”, “text mining”, “entity recognition”, “relation extraction”, “ontology”, “knowledge graph”, “Question & Answer system”, “protein structure prediction”, “cell annotation”, “foundation model”, “gene function annotation”, and “intelligent medicine”. No automated filters were applied
Timeframe 2017–2025
Inclusion and exclusion criteria Inclusion: English-language publications; peer-reviewed articles; focus on LMs in biomedical text mining, LM-inspired architectures for biological tasks, or LMs for virtual-actual data integration; studies presenting novel methodologies, foundational insights, or critical challenges
Exclusion: non-English publications, studies unrelated to the target domains
Selection process Two-stage screening: (I) title/abstract review by all authors; (II) full-text review of eligible studies. Discrepancies resolved by consensus based on contribution to the review’s objectives

LM, language model.

The literature search focused on publications from 2017 to February 2025 to balance coverage of recent advancements and foundational works. Articles were selected based on relevance to three domains: (I) application of genuine LMs in biomedical text [e.g., text mining, named entity recognition (NER), relation extraction (RE), ontology, KGs, Question & Answer (Q&A) systems]; (II) research leveraging LM-inspired architectures for protein structure prediction, cell annotation, and foundation models (FMs); (III) applications combining virtual-actual data, including gene function annotation and the integration of text data with omics data.

The literature selection followed a two-stage screening process. First, titles and abstracts were screened against the inclusion criteria. Subsequently, studies deemed eligible underwent a full-text review. Any discrepancies regarding inclusion were resolved through consensus discussions among the authors, with a focus on each study’s contribution to synthesizing applications and methodological innovations of LMs in biomedicine. As this is a narrative review, we prioritized studies that offered novel insights, presented methodological advances, or provided critical discussions of key challenges (e.g., data quality, model interpretability). The review was limited to English-language publications to maintain consistency in interpretation.


Application of genuine LMs in biomedical text

Text mining, which focuses on automatically identifying and extracting relevant knowledge from unstructured text, represents one of the main application areas for LMs. Although LMs have reached a considerable level of maturity in general textual applications, their deployment in complex biomedical contexts remains challenging. To provide a comprehensive overview of LMs in biomedicine, Figure 1 illustrates their historical evolution, technical features, and application scopes. Figure 1A presents a timeline of LM development, beginning with rule-based systems in the 1950s, progressing through statistical approaches in the 1980s–1990s, deep learning models in the 2010s, and pre-trained architectures after 2017, up to contemporary LLMs such as ChatGPT and DeepSeek. This chronology underscores the iterative progress that has enhanced the capacity of LMs to process increasingly sophisticated data and tasks. Figure 1B compares representative LMs across different phases, summarizing their strengths, limitations, and typical use cases in biomedicine. For instance, rule-based LMs enable highly interpretable structured text classification but suffer from limited generalizability, whereas LLMs like ChatGPT support advanced language understanding in Q&A systems yet raise concerns regarding data dependency and privacy. Figure 1C displays a heatmap of the adoption of various LMs across key biomedical tasks, such as NER, RE, KG-related tasks, and Q&A systems. The “Values” legend (0/1) indicates whether a model is applied to a specific task, thereby visualizing the breadth of LM deployment in tackling domain-specific challenges. Collectively, Figure 1 contextualizes the progression and distinctive traits of LMs, thereby laying the groundwork for the subsequent discussion of genuine LMs.
Table 2 systematically collates state-of-the-art LM applications in biomedical text mining, and details key information for each model, including task category, model name, LM type, specific application scenarios, and quantitative performance metrics.

Figure 1 Overview of language models in biomedical text analysis. (A) The development process of LMs. (B) Characteristics of LMs at different stages and in common biomedical scenarios. (C) Heatmap illustrating the application models of LMs in biomedical texts. BERT, bidirectional encoder representations from transformers; BertSRC, biomedical entity relation transformer for semantic relation classification; ChatGPT, chat generative pre-trained transformer; CNN, convolutional neural network; DD, drug discovery; RE, relation extraction; GAMedX, graph-augmented medical text transformer; KG, knowledge graph; LLM, large language model; LM, language model; MKG-GC, multi-modal knowledge graph for gastric cancer; NER, named entity recognition; OSCR, open-schema commonsense reasoning; Q&A, question answering; RARPKB, robotic-assisted radical prostatectomy knowledge base; SIMKGC, similarity-based knowledge graph completion; TABoLiSTM, table-based bidirectional long short-term memory.

Table 2

Application models, usage and evaluation of LMs in biomedical texts

ID Task Name Type Model usage Optimal model performance
1 NER ViMRT Rule-based LMs Virus mutation recognition, virus gene recognition and disease recognition Test set: F1-score 99.17%; validation set: F1-score 96.99%
2 NER TABoLiSTM Deep learning-based LMs Metabolite named entity recognition BioBERT version: F1-score 0.909, precision 0.926, recall 0.893
3 NER MetaboListem Based on pre-trained LMs Metabolite named entity recognition F1-score 0.890, precision 0.892, recall 0.888
4 RE BertSRC Based on pre-trained LMs Classification of semantic relations in biomedical text mining Test set: F1-score 0.852
5 KGC KG-BERT Based on pre-trained LMs Triple classification, link prediction and relation prediction tasks WN11 dataset accuracy: 93.5%; FB13 dataset accuracy: 90.4%
6 KGC SIMKGC Based on pre-trained LMs Knowledge graph completion WN18RR: +19% F1, Wikidata5M (T): +6.8% F1, Wikidata5M (I): +22% F1
7 KG & DD MKG-GC Based on pre-trained LMs Automatically extract biomedical knowledge related to GC from the literature and identify GC drug candidates Hits@1 score: 94.5%; F1-score: 86.9%
8 KGC & DD CPMKG LLMs Personalized medicine and drug discovery Compiled 307,614 knowledge entries
9 KG into Q&A system OSCR Based on pre-trained LMs World knowledge, causal reasoning and healthcare knowledge Obtained 33.3%, 18.6%, and 4% improved accuracy compared to pre-training BERT without OSCR
10 KG into Q&A system RARPKB LLMs Personalized robotic-assisted surgical planning for prostate cancer patients Authenticity, matching, personalized recommendations, patient matching, and complication recommendations: RARPKB (100%) > ChatGPT-4; SUS: 88.88±15.03; NPS: 85
11 KG into Q&A system Text2GraphQL Based on pre-trained LMs Intelligent medical consultation chatbot Top performance: 76.2% (execution with values), 75.8% (deviation), 72.3% (exact set match accuracy), surpassing current text-to-SQL models
12 NER tmVar Rule-based LMs, LMs based on statistical methods A wide range of sequence variants described at the protein, DNA and RNA levels F-measure: 91.4% vs. 78.1% (tmVar corpus); 93.9% vs. 89.4% (external gold standard)
13 RE PubMedBERT (COVID-19) Based on pre-trained LMs Subject relationship strengthening, drug recognition and path prediction, long and short COVID-19 drug search, and coronavirus relationship prediction F1-score: 0.8998

BERT, bidirectional encoder representations from transformers; BertSRC, biomedical entity relation transformer for semantic relation classification; BioBERT, biomedical bidirectional encoder representations from transformers; COVID-19, coronavirus disease 2019; DD, drug discovery; GC, gastric cancer; LLM, large language model; LM, language model; KG, knowledge graph; KGC, knowledge graph completion; MKG-GC, multi-modal knowledge graph for gastric cancer; NER, named entity recognition; NPS, net promoter score; OSCR, open-schema commonsense reasoning; RE, relationship extraction; Q&A, question and answer; RARPKB, robotic-assisted radical prostatectomy knowledge base; SIMKGC, similarity-based knowledge graph completion; SQL, structured query language; SUS, system usability scale; TABoLiSTM, table-based bidirectional long short-term memory; tmVar, text mining for sequence variants; ViMRT, virus mutation recognition tool.

Biomedical NER and RE

In biomedical NLP, conventional research on NER has primarily focused on common entity types such as genes, diseases, and drugs, with representative tools including PubTator (17), GNormPlus (18), DNorm (19) and ChemSpot (20). A key advantage of studying these prevalent entities is the availability of large-scale, expertly annotated corpora, which provide a robust foundation for model training, benchmarking, and iterative improvement. However, when applied to specialized domains or emerging scenarios, these general-purpose resources often fail to capture domain-specific or rare entities, necessitating additional data collection and manual annotation. To bridge biomedical text with standardized terminology, MetaMap, an advanced NLP tool covering 127 semantic types, maps terms from biomedical phrases to standardized concepts in the Unified Medical Language System (21). Although it extracts the largest number of concepts among comparable tools, MetaMap still requires manual preprocessing to handle abbreviations and synonyms, and it struggles with unrecognized or out-of-vocabulary terms (22). In response to the limitations of general tools, domain-specific solutions have emerged. For example, Tong et al. (23) developed the virus mutation recognition tool (ViMRT), a text mining tool that automates the identification of viral mutations using 8 optimization rules and 12 regular expressions to improve the accuracy of mutation entity recognition. MetaboListem and table-based bidirectional long short-term memory (TABoLiSTM), proposed by Yeung et al. (24), significantly improved the recognition of metabolite named entities by using GloVe, BERT, and BioBERT word embeddings.
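ViMRT's specific rules and expressions are not reproduced in this review; the generic patterns below merely illustrate how regular expressions can capture protein point mutations in one-letter (e.g., D614G) and HGVS-style three-letter (e.g., p.Asp614Gly) notation. Both patterns are simplified assumptions for demonstration, not ViMRT's actual rule set.

```python
import re

# Illustrative patterns (not ViMRT's actual rules):
#   one-letter form:   D614G        (reference residue, position, alternate residue)
#   three-letter form: p.Asp614Gly  (HGVS-style protein notation)
ONE_LETTER = re.compile(r"\b([ACDEFGHIKLMNPQRSTVWY])(\d+)([ACDEFGHIKLMNPQRSTVWY])\b")
THREE_LETTER = re.compile(r"\bp\.([A-Z][a-z]{2})(\d+)([A-Z][a-z]{2})\b")

def find_mutations(text):
    """Return (ref, position, alt) tuples for mutation mentions in text."""
    hits = [(m.group(1), int(m.group(2)), m.group(3))
            for m in ONE_LETTER.finditer(text)]
    hits += [(m.group(1), int(m.group(2)), m.group(3))
             for m in THREE_LETTER.finditer(text)]
    return hits

sent = "The D614G substitution (also written p.Asp614Gly) increased infectivity."
```

Real systems layer dictionary lookups and normalization on top of such patterns to distinguish true mutation mentions from look-alike strings (e.g., cell-line identifiers).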

RE is a core task in building KGs. Previous studies have adopted statistical methods, including co-occurrence-based, pattern-based (25), rule-based, feature-based (26) and kernel-based methods (27). SemRep is a well-known rule-based RE tool that extracts semantic relationships from sentences in biomedical texts (28,29). However, owing to its rule dependence and domain limitations, SemRep can extract only a limited number of semantic relations. The complex definition of relationships between biomedical entities poses a challenge for the RE task, and most existing RE datasets, such as gene-disease relation (GDR) (30), are limited in data volume and relationship diversity because of the labor-intensive manual annotation process. BioRel is a large-scale biomedical RE dataset that covers 125 types of biomedical relationships through the use of knowledge bases and distant supervision (31). The biomedical entity relation transformer for semantic relation classification (BertSRC) model constructed by Lee et al. (32) achieves excellent performance in RE across various biological entity types (genes, proteins, diseases, etc.) through an improved fine-tuning method. Zhang et al. (33) developed a large-scale biomedical RE model and compared the performance of different pre-trained model architectures and domain models on the RE task. The PubMedBERT model achieved the best performance after continued pre-training, and the resulting RE model can be effectively applied to the coronavirus disease 2019 (COVID-19) corpus.
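The co-occurrence-based methods mentioned above are the simplest RE baseline: two entities appearing in the same sentence are taken as evidence of a relation. A minimal sketch, with an invented toy corpus and hypothetical entity dictionaries, makes the idea concrete.

```python
from collections import Counter
from itertools import product

# Hypothetical entity dictionaries (toy examples for illustration).
GENES = {"BRCA1", "TP53"}
DISEASES = {"breast cancer", "lung cancer"}

def cooccurrence_pairs(sentences):
    """Count gene-disease pairs that appear within the same sentence."""
    counts = Counter()
    for sent in sentences:
        genes = {g for g in GENES if g in sent}
        diseases = {d for d in DISEASES if d in sent}
        for g, d in product(sorted(genes), sorted(diseases)):
            counts[(g, d)] += 1
    return counts

corpus = [
    "BRCA1 mutations are strongly associated with breast cancer.",
    "TP53 is frequently altered in lung cancer and breast cancer.",
]
```

Such counts capture association but not relation type or direction, which is precisely the gap that pattern-, feature-, kernel-, and Transformer-based extractors like BertSRC aim to close.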

Overall, in biomedical applications, the development of multi-semantics entity recognition technologies and labeled datasets represents a critical direction. A key challenge is that the relationships between genes and diseases often involve multiple intermediate entities and complex regulatory mechanisms, which existing methods struggle to fully capture. Moreover, relying on manually defined rules is not only time-consuming and labor-intensive but also susceptible to subjective biases. Another significant limitation is the domain dependence of RE systems, which leads to performance degradation when applied to new domains.

Ontology and knowledge graph (KG) construction

Ontologies provide researchers with a structured way to organize and understand knowledge by defining concepts, attributes, relationships, and their constraints within a specific domain (34). Gene Ontology (GO) provides a standardized vocabulary for the biological roles of gene products, describing molecular function, cellular components, and biological processes (35). Disease Ontology defines the classification and hierarchy of diseases for disease-related research and medical records (36). Owing to limitations in data quality and automatic extraction techniques, constructed ontologies and KGs are usually incomplete, with some relationships between entities missing. Therefore, several LM-based methods have been proposed for the KG completion task, aiming to infer missing relationships and improve the completeness of KGs. Yang et al. (37) built the multi-modal knowledge graph for gastric cancer (MKG-GC) framework based on multi-task learning and knowledge graphs, using the pre-trained biomedical knowledge graph-enhanced (BioKGE)-BERT LM and a drug-disease discrimination model based on a convolutional neural network-bidirectional long short-term memory (CNN-BiLSTM) network to predict potential gastric cancer drugs from the MKG-GC, providing new possibilities for the treatment of gastric cancer. Yang et al. (38) developed the CPMKG platform by integrating LLMs with a precision medicine KG to provide powerful support for drug discovery and personalized medicine.
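KG completion reduces to scoring candidate triples and ranking entities for the missing slot. The BERT-based scorers above do this with learned text encoders; a much simpler, non-neural sketch of the same idea is the classic TransE scoring function (head + relation ≈ tail), shown here with hand-set 2-D toy embeddings rather than the trained, high-dimensional vectors a real system would learn.

```python
import math

# Hand-set toy embeddings, chosen so that aspirin + treats ≈ fever
# (illustrative only; real KG embeddings are learned from data).
ENT = {"aspirin": (1.0, 0.0), "fever": (1.0, 1.0), "BRCA1": (0.0, 3.0)}
REL = {"treats": (0.0, 1.0)}

def transe_score(head, rel, tail):
    """TransE plausibility: distance ||h + r - t|| (lower = more plausible)."""
    h, r, t = ENT[head], REL[rel], ENT[tail]
    return math.dist((h[0] + r[0], h[1] + r[1]), t)

def predict_tail(head, rel):
    """Rank all other entities as candidate tails for an incomplete triple."""
    return min((e for e in ENT if e != head),
               key=lambda e: transe_score(head, rel, e))
```

Text-based completion models such as KG-BERT replace the vector arithmetic with a fine-tuned encoder over the triple's surface text, but the ranking-by-score interface is the same.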

Ontology and KG into Q&A system

The combination of ontologies, KGs, and Q&A systems can bring revolutionary progress in decision support, intelligent medicine, and other fields (39). Goodwin et al. (40) proposed the open-schema commonsense reasoning (OSCR) method, which performs well in causal reasoning and healthcare knowledge Q&A tasks. The robotic-assisted radical prostatectomy knowledge base (RARPKB) developed by Li et al. (41) integrates an advanced Q&A system that enables physicians and patients to quickly receive personalized recommendations and information on robot-assisted surgery for prostate cancer. Ni et al. (42) proposed Text2GraphQL, an intelligent medical consultation chatbot that transforms natural-language questions into graph database query language tasks, enabling more efficient and direct communication between humans and machines.
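Text2GraphQL's internal translation model is not detailed here; a rudimentary template-based sketch nonetheless conveys what "natural language to graph query" means. The question templates, graph schema, and Cypher-like output syntax below are all invented for illustration.

```python
import re

# Hypothetical question templates mapped onto Cypher-like graph queries
# (invented schema: Drug-[:TREATS]->Disease, Gene-[:ASSOCIATED_WITH]->Disease).
TEMPLATES = [
    (re.compile(r"what drugs treat (?P<disease>[\w ]+)\?", re.I),
     "MATCH (d:Drug)-[:TREATS]->(x:Disease {{name: '{disease}'}}) RETURN d.name"),
    (re.compile(r"which genes are linked to (?P<disease>[\w ]+)\?", re.I),
     "MATCH (g:Gene)-[:ASSOCIATED_WITH]->(x:Disease {{name: '{disease}'}}) RETURN g.name"),
]

def to_graph_query(question):
    """Return the first matching graph query, or None if no template fits."""
    for pattern, query in TEMPLATES:
        m = pattern.search(question)
        if m:
            return query.format(**m.groupdict())
    return None
```

LM-based systems replace the brittle template list with a learned sequence-to-sequence mapping, which is what allows them to generalize beyond a fixed set of question shapes.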

As emphasized by Zhou et al. (43), unlike LLMs trained primarily on textual data, KGQA systems built on structured ontologies and KGs are designed to retrieve specific user-requested information from factually grounded knowledge repositories. This structural advantage enables KGQA outputs to be “traced and lend themselves to easier verification”, as the reasoning process is anchored in explicit entity-relation triples defined by ontologies. The use of domain-specific ontologies further reinforces interpretability by providing formal semantics for entities and relations, ensuring that the system’s inference aligns with domain knowledge and is transparent to users. This level of explainability is essential in accuracy-driven scientific research, where knowing the provenance of an answer is as important as the answer itself for establishing trust and ensuring reliable system deployment.


LM-inspired architectures in biomedicine

Beyond textual data, the architecture that underpins LMs has been successfully repurposed for a variety of biomedical data types. This section highlights applications in areas like protein structure prediction and cell annotation, where models treat biological data (e.g., sequence) as a form of language. These approaches exemplify how LM-inspired architectures are pushing boundaries in biomedicine, distinct from traditional NLP-based models.

To illustrate the application of LMs across different biomedical data analysis tasks, Figure 2 is divided into three panels. Figure 2A provides a structured summary of the biomedical text analysis workflow discussed in the previous section. It begins with literature retrieval using E-Utilities and keyword combinations, followed sequentially by database management, NER, RE, and knowledge discovery supported by ontologies and LM-powered Q&A systems. This pipeline illustrates how LMs facilitate end-to-end extraction of structured knowledge from unstructured text. The present section focuses primarily on Figure 2B, which highlights the extension of LMs to omics data mining tasks such as protein structure prediction, cell annotation, and foundation models (FMs) in medicine. These approaches reinterpret biological data as a “language” to reveal patterns in protein folding, cellular heterogeneity, and regulatory networks.

Figure 2 Three application scenarios of language model. (A) Biomedical text analysis workflow: end-to-end structured knowledge extraction from unstructured text. (B) LM extension to omics data mining: applications in protein structure prediction, cell annotation, and medical foundation models. (C) Virtual-actual data combination: mining text and data for integrative biomedical knowledge. LM, language model; UMAP, uniform manifold approximation and projection.

Protein structure prediction

Analogous to processing textual sequences, LMs treat a protein sequence as a “sentence” of amino acid residues. Whereas traditional NLP models capture contextual dependencies between characters or words, protein-specific LMs learn spatial and functional correlations among residues, thereby transferring text-modeling logic to biological sequence analysis. Table 3 summarizes applications inspired by LM architectures in protein structure prediction tasks, covering model names, specific application scenarios, key performance metrics, and core functional characteristics.
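The "sequence as sentence" framing can be made concrete with tokenization: an amino-acid sequence is split into residue tokens and mapped to integer ids, mirroring how a protein LM prepares its input. The vocabulary and special tokens below are simplified assumptions; real models such as ESM-2 define their own vocabularies.

```python
# Simplified residue vocabulary with special tokens (an assumption for
# illustration; real protein LMs use their own, larger vocabularies).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
VOCAB = {"<cls>": 0, "<eos>": 1, "<unk>": 2}
VOCAB.update({aa: i + 3 for i, aa in enumerate(AMINO_ACIDS)})

def encode(sequence):
    """Tokenize a protein sequence residue-by-residue into integer ids."""
    ids = [VOCAB["<cls>"]]
    ids += [VOCAB.get(residue, VOCAB["<unk>"]) for residue in sequence]
    ids.append(VOCAB["<eos>"])
    return ids

# Each residue is treated exactly like a word token in a sentence.
ids = encode("MKTA")
```

From here, a Transformer's attention layers operate on these ids exactly as they would on word ids, which is why attention maps in protein LMs can be read as residue-residue contact signals.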

Table 3

Application of LM-inspired architectures for protein prediction tasks

Model name Applications Role or effect
Porter 6 (44) Protein secondary structure prediction A more effective protein structure prediction method improves accuracy, overcomes secondary structure limitations, and reduces computational cost
ProstT5 (45) Inverse folding, structure-guided mutation effect prediction, remote homology detection It enhances remote homology detection, generates structurally similar protein sequences, and improves protein classification
DR-BERT (46) Prediction and annotation of disordered regions of protein domains DR-BERT shows high accuracy in predicting regions of protein disorder and can also provide annotations for these regions
A-Prot (47) Protein 3D structure modeling A-Prot excels in long-range contact prediction, generates accurate 3D structures for CASP14 targets, and captures evolutionary and structural information with low computational cost
EMBER2 (48) Prediction of 3D structure of proteins EMBER2 is a fast and accurate protein structure prediction method that rivals or surpasses the performance of existing methods while requiring significantly less computational resources
ProteinBERT (49) Protein annotation prediction The results of several protein property prediction tasks are close to or better than the existing models
ESM-2 (50) Prediction of three-dimensional structure of protein at atomic level ESMFold, based on ESM-2, predicts atomic structures directly from sequences 60× faster than AlphaFold2 and RosettaFold, with comparable accuracy
ESM-IF1 (51) Prediction of antibody mutational effect Compared to sequence-only models, ESM-IF1 achieved 40%+ higher accuracy in antibody mutation effect prediction
Evo (52) Prediction and design at the molecular to genome scale Evo learns nucleotide changes’ impact on organism fitness and generates >1 Mbp DNA sequences with plausible architecture

3D, three-dimensional; A-Prot, alphafold-inspired protein structure prediction; BERT, bidirectional encoder representations from transformers; DR-BERT, disordered region bidirectional encoder representations from transformers; EMBER2, efficient model for biomolecular embedding and retrieval 2; ESM-2, evolutionary scale modeling 2; ESM-IF1, evolutionary scale modeling for inverse folding 1; Evo, evolutionary sequence model; LM, language model; ProstT5, protein structure-aware T5; ProteinBERT, protein-specific bidirectional encoder representations from transformers.

For example, Hie et al. (53) applied LMs to sequences of influenza hemagglutinin, human immunodeficiency virus (HIV)-1 envelope glycoprotein, and severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) spike protein. That study also notes that judging whether a gene sequence encodes a functional protein is more straightforward than directly determining its three-dimensional structure, as the former primarily involves identifying open reading frames and conserved domains rather than resolving atomic-level structural details. Alanazi et al. (44) proposed Porter 6, a protein secondary structure prediction model that uses embeddings generated by the evolutionary scale modeling 2 (ESM-2) pre-trained LM as features and adopts an ensemble convolutional bidirectional recurrent neural network (CBRNN) architecture trained on large-scale Protein Data Bank (PDB) datasets. The model achieves 84.56% accuracy on three-state and 74.18% on eight-state classification tasks, significantly outperforming its predecessor and state-of-the-art methods while eliminating the need for multiple sequence alignments, thus substantially reducing computational complexity. Heinzinger et al. (45) developed the bilingual protein LM ProstT5, which encodes protein 3D structures into 1D 3Di tokens via Foldseek’s 3Di alphabet, enabling bidirectional translation between amino acid sequences and 3Di sequences. The model shows superior performance in structure-related prediction tasks such as remote homology detection and inverse folding. Compared with the traditional workflow of first predicting 3D structures and then extracting 3Di tokens, ProstT5 achieves a three-order-of-magnitude speedup in 3Di sequence generation.
Collectively, these studies highlight the transformative potential of LMs in protein structure prediction, as they not only enhance prediction accuracy and expand applicable tasks but also streamline computational workflows by reducing reliance on resource-intensive steps like multiple sequence alignments or full 3D structure prediction. Although LMs have been widely applied in protein research, some cutting-edge explorations have delved further, to the DNA level. For instance, Fishman et al. proposed GENA-LM, a foundational DNA LM based on the Transformer architecture; it can handle input sequences of up to 36,000 base pairs and is expected to drive new breakthroughs in genomics and multi-omics data analysis (54).

Cell annotation

Single-cell RNA sequencing has revolutionized the study of cellular heterogeneity but faces challenges in annotation due to data sparsity and reliance on manually curated databases. To address this, LMs are being repurposed to treat single-cell data as a linguistic system. In this paradigm, a cell’s transcriptome is conceptualized as a “document”, where genes act as “words” and their expression levels define contextual importance. LMs, particularly Transformer-based architectures, learn the complex syntactic and semantic relationships among these genes, enabling them to decode the “cellular language” and identify subtle patterns characteristic of distinct cell types.
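As an illustration only (not the actual scBERT pipeline, and using invented gene names and expression values), the "cell as document" analogy can be sketched by converting a cell's expression vector into a ranked sequence of gene "words":

```python
# Toy illustration of the "cell as document" analogy: a cell's expression
# profile becomes a sequence of gene "words", ordered by expression level
# (a rank-based tokenization; gene names and values are invented).

def cell_to_sentence(expression, top_k=5):
    """Rank genes by expression and keep the top_k expressed ones as the cell's 'sentence'."""
    ranked = sorted(expression.items(), key=lambda kv: kv[1], reverse=True)
    return [gene for gene, level in ranked[:top_k] if level > 0]

t_cell = {"CD3D": 9.1, "CD8A": 7.4, "GZMB": 5.2, "ALB": 0.0, "INS": 0.0}
hepatocyte = {"ALB": 8.7, "APOA1": 6.3, "TTR": 4.9, "CD3D": 0.0, "INS": 0.0}

print(cell_to_sentence(t_cell))      # ['CD3D', 'CD8A', 'GZMB']
print(cell_to_sentence(hepatocyte))  # ['ALB', 'APOA1', 'TTR']
```

The resulting gene sequences play the role of sentences that a Transformer can then model, with expression-derived ordering standing in for word position and context.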

This approach is demonstrated in several studies. Yang et al. (55) developed scBERT, a large-scale pre-trained deep LM based on the Transformer architecture. By discretizing gene expression into "word bags" and integrating gene2vec for gene embeddings, it simulates natural language semantics and contextual relationships. Pre-trained on millions of unlabeled scRNA-seq profiles to learn universal gene interaction patterns, the model adapts to specific annotation tasks via supervised fine-tuning, with no need for manual gene labeling or dimensionality reduction. It exhibits excellent annotation accuracy, batch effect resistance, and gene-level interpretability across cross-dataset, cross-organ, and class-imbalance scenarios, while effectively discovering new cell types, offering an efficient LM solution for single-cell data analysis.

FM definition and clinical applications

FMs are a class of advanced artificial intelligence (AI) models with large-scale parameters and complex architectures that are trained by self-supervised learning on large unlabeled datasets so that they can adapt to diverse downstream tasks (56). These models are capable of processing data from different modalities, including text and tabular data (57). The training of an FM consists of two phases: pre-training and fine-tuning. In the pre-training phase, the model grasps the semantics and structure of the language by analyzing a large amount of text data and predicting subsequent words. This phase allows the model to acquire extensive linguistic knowledge. Subsequently, in the fine-tuning phase, the model is adjusted, often with human feedback, to generate coherent and sensible responses to a variety of queries and task requirements. For instance, the biomedical FM Evo is pre-trained on a large number of unlabeled prokaryotic and bacteriophage genomes to learn multi-scale biological sequence patterns, and is fine-tuned for tasks such as CRISPR-Cas system design and gene essentiality prediction, supporting precision medical applications like targeted gene therapy (52). Dai et al. (58) proposed TCMChat, a traditional Chinese medicine (TCM)-specific FM based on Baichuan2-7B-Chat. Pre-trained on massive unlabeled TCM texts and fine-tuned on 600,000 TCM instructions, it performs well in tasks like diagnosis, herb recommendation, and ADMET (absorption, distribution, metabolism, excretion, toxicity) prediction, supporting TCM modernization and personalized care. The research and applications of FMs in clinical medicine are shown in Table 4.
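The two-phase recipe can be sketched in miniature. The toy below is not a real FM: "pre-training" here just learns bigram statistics from unlabeled sequences, and "fine-tuning" picks a decision threshold on a small labeled set that reuses those statistics as features; all sequences are invented.

```python
# A deliberately tiny sketch of the two-phase FM recipe (not a real FM):
# phase 1 "pre-trains" next-token statistics on plentiful unlabeled data;
# phase 2 "fine-tunes" a classifier threshold on scarce labeled data,
# reusing the pre-trained statistics as features. All data are invented.
from collections import Counter

def pretrain(corpus):
    """Phase 1: learn bigram counts from unlabeled sequences."""
    bigrams = Counter()
    for seq in corpus:
        bigrams.update(zip(seq, seq[1:]))
    return bigrams

def score(seq, bigrams):
    """Feature from the pre-trained model: how 'typical' a sequence looks."""
    pairs = list(zip(seq, seq[1:]))
    return sum(bigrams[p] for p in pairs) / max(len(pairs), 1)

def finetune(labeled, bigrams):
    """Phase 2: choose the threshold separating the two labeled classes."""
    pos = [score(s, bigrams) for s, y in labeled if y == 1]
    neg = [score(s, bigrams) for s, y in labeled if y == 0]
    return (min(pos) + max(neg)) / 2

unlabeled = ["ATGCGATG", "ATGCGTTG", "ATGCCATG"]  # plentiful, no labels
bigrams = pretrain(unlabeled)
labeled = [("ATGCGATG", 1), ("TTTTAAAA", 0)]      # scarce, labeled
threshold = finetune(labeled, bigrams)
print(score("ATGCCGTG", bigrams) > threshold)     # classify a new sequence: True
```

The point of the sketch is the division of labor: generic structure is learned once from unlabeled data, and only a small labeled set is needed to adapt that knowledge to a downstream task.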

Table 4

Application of foundation models in the field of medicine

Model name Applications Role or effect
GenePT (59) Genes and cells Generate gene embeddings by combining NCBI text descriptions and GPT-3.5 to provide effective embeddings for enhanced gene and cell research
Co-attention based transformer (60) Viral genome sequence analysis Proposes a Transformer-based pre-training model grounded in the P-E theorem to enhance downstream fine-tuning, simulate viral neutral evolution, predict immune escape mutations, and compute viral R0 values, offering a new research paradigm for theoretical and computational biology
EHR foundation models (61) Clinical medical records analysis Builds patient representations from EHR foundation models pre-trained with self-supervised learning and feeds them to logistic regression to predict clinical outcomes such as hospital mortality, long-term hospitalization, 30-day readmission, and ICU admission, improving model robustness under temporal distribution shifts
ENBED (62) DNA sequence analysis Significant improvements were shown in downstream tasks, including enhancer, promoter, and splice site identification, detection of base errors and insertions/deletions, genomic sequence function annotation, and influenza virus mutation generation and validation
TCMChat (58) TCM query, diagnosis, herb/formula recommendation, entity extraction, ADMET prediction Accurate TCM QA, supports personalized care, boosts TCM text utilization, accelerates TCM drug development, facilitates TCM modernization
KED (63) ECG zero-shot/few-shot diagnosis, clinical analysis, interpretable diagnosis Robust across populations/devices, matches cardiologist-level accuracy, aids underserved areas, supports rapid center adaptation, enhances clinical trust via interpretability
Centaur (64) Human behavior prediction, out-of-distribution generalization, cognitive model development Outperforms domain-specific cognitive models, generalizes to unseen tasks/cover stories, aligns with human neural representations, guides interpretable cognitive theory discovery

ADMET, absorption, distribution, metabolism, excretion, toxicity; ECG, electrocardiogram; EHR, electronic health record; ENBED, embedding for nucleotide-based encoding and decoding; GPT-3.5, generative pre-trained transformer 3.5; ICU, intensive care unit; KED, knowledge-enhanced diagnosis; NCBI, National Center for Biotechnology Information; P-E theorem, pre-training-enhancement theorem; QA, question answering; R0, basic reproduction number; TCM, traditional Chinese medicine; TCMChat, traditional chinese medicine chat; Transformer, transformer-based architecture.


Combination of virtual-actual data

Since a single data type often cannot fully reveal complex biological processes, integrating multidimensional data (e.g., text and omics) helps researchers understand biological phenomena from different angles, discover new biomarkers, and reveal disease mechanisms. In connecting text and data, the core function of LMs lies in transforming scattered biomedical literature into systematic knowledge. For instance, entities such as genes and pathways, along with their relationships, are extracted from unstructured texts using NLP techniques and subsequently assembled into gene function fingerprints and KGs. This approach significantly expands our understanding of gene functions and disease mechanisms and is directly applied to discovering novel regulatory genes, predicting drug targets, and developing comprehensive biological knowledge bases.

Gene function and pathway annotation

Gene function annotation is a key step that helps us understand the role of genes in an organism and their association with diseases. However, traditional gene function labeling methods rely on manually collated information from public databases [e.g., Gene Ontology (GO), Kyoto Encyclopedia of Genes and Genomes (KEGG)]. The limited number of functionally annotated genes and the limited knowledge of gene function in public annotation databases make it impossible to identify genes that may affect biological processes but are not included in public gene function annotation files. Qin et al. (65) extracted GO terms related to yeast genes from PubMed, constructed and compared gene ontology fingerprints (GOFs), used GOFs to build similarity networks among genes, and successfully discovered new sphingolipid pathway regulatory genes. Wang et al. (66) introduced the explainable gene ontology fingerprint (XGOF) method, which leverages NLP to extract genes and GO terms from biomedical literature for constructing the XGOF network. XGOF demonstrates outstanding performance in multiple tasks: classifying tumor samples while uncovering similarities in GO terms and tumor-specific patterns; inferring cellular composition of the tumor microenvironment through enrichment analysis; and predicting novel gene functions based on gene-gene similarity scores. In the context of single-gene function research, Wang et al. (67) employed XGOF to extract APOA1-related GO terms from the literature and built the APOA1 ontology fingerprint. The APOA1-GOF indicated that APOA1 plays various crucial biological roles in hepatocellular carcinoma, primarily involving cholesterol efflux. In the context of gastrointestinal (GI) cancer research, Huang et al. built the GI-GOF to show that GI cancers are related to diverse immune functions, especially macrophage migration (68). These studies exemplify the paradigm of virtual-actual combination. The process involves first establishing a prior knowledge framework of gene function through NLP, then formulating predictive hypotheses by integrating omics data, and finally providing empirical verification of these findings through biological experiments.
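The fingerprint-similarity idea behind GOFs can be sketched very simply: represent each gene as the set of GO terms mined from its literature, and link genes whose term sets overlap. The gene-term assignments and the Jaccard measure below are illustrative only, not the scoring used in the cited studies.

```python
# Sketch of the gene ontology fingerprint (GOF) similarity idea:
# each gene is represented by the set of GO terms mined from its
# literature, and genes with overlapping fingerprints are linked.
# The GO term assignments below are invented for illustration.

def jaccard(a, b):
    """Overlap between two GO-term fingerprints."""
    return len(a & b) / len(a | b) if a | b else 0.0

fingerprints = {
    "LCB1": {"GO:sphingolipid_metabolism", "GO:lipid_biosynthesis"},
    "LCB2": {"GO:sphingolipid_metabolism", "GO:lipid_biosynthesis",
             "GO:ER_membrane"},
    "CDC28": {"GO:cell_cycle", "GO:kinase_activity"},
}

# Link gene pairs whose fingerprint similarity passes a threshold.
edges = [
    (g1, g2, round(jaccard(f1, f2), 2))
    for (g1, f1) in fingerprints.items()
    for (g2, f2) in fingerprints.items()
    if g1 < g2 and jaccard(f1, f2) > 0.5
]
print(edges)  # [('LCB1', 'LCB2', 0.67)]
```

In this toy network, the two sphingolipid-related genes are connected while the cell-cycle gene remains isolated, which is exactly the kind of literature-derived neighborhood that can nominate previously unannotated pathway members.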

Development of an integrated text and omics database

The combination of text and data mining can overcome the limitations of single data sources, and through the integration of multi-dimensional information, it provides researchers with more comprehensive data search and analysis results. This integration is indispensable and urgent for the development of databases and knowledge bases. Wang et al. (69) developed GIDB, a knowledge database for GI tumors, integrating a knowledge map that details genes, miRNAs, lncRNAs, copy number variants, fusion genes, and their complex relationships. These interactions are visualized as semantic networks, enabling researchers to explore their roles in GI cancer development. Yang et al. (37) developed MGG-GC, a knowledge graph framework for gastric cancer, integrating scattered biomedical literature into a structured knowledge base. Utilizing multi-task learning, it automatically identifies and classifies disease-related entities and relationships. The system employs pre-trained models (BioKGE-BERT and CNN-BiLSTM) to analyze drug-disease interactions and predict potential therapeutic drugs for gastric cancer. Wang et al. (70) developed ViMIC, a database integrating 31,712 viral mutations, 105,624 integration sites, 16,310 viral target genes, and over one million viral sequences linking 8 viruses related to 77 human diseases. It enables exploration of virus-host cis-effects, including histone modifications, transcription factor binding, and chromatin accessibility at viral integration sites. Huang et al. (71) further updated ViMIC to version 2.0, expanding coverage to 28 viruses, 64,168 mutations, and 255 multi-omics datasets (including scRNA-seq and methylation profiling), supporting comprehensive analysis of virus-host interactions through enhanced visualization and data integration. Huang et al. (72) constructed a multi-semantic annotation dataset and automated recognition tool for viral carcinogenesis factors, integrating 551,715 literature abstracts, 5.8 million entities, and transcriptomic data to identify internal and external carcinogenic factors across 8 oncogenic viruses. Diamant et al. significantly upgraded the Harmonizome database to Harmonizome 3.0, adding 26 datasets with nearly 12 million gene-attribute associations and functionalities including dataset crossing, an LLM-powered chatbot, and KG visualization, offering an integrated, AI-ready multi-omics resource for biomedical research (73).


Strengths and limitations

The strength of this review lies in its broad and integrative scope. Unlike surveys that focus solely on text mining or a single type of omics data, we have systematically categorized and analyzed LM applications across three pivotal fronts: (I) genuine LMs in biomedical text mining; (II) LM-inspired architectures for specialized biological data tasks; and (III) virtual-actual data combination. This provides a cohesive structure for understanding both the direct and the inspirational impact of LMs on the field.

Several limitations of this study warrant acknowledgment. A potential source of bias in this review stems from the literature selection process, which was confined to the PubMed database. Consequently, relevant research available in other sources, including Embase, the Cochrane Library, and non-English journals, may have been overlooked, affecting the review's comprehensiveness and generalizability. Moreover, the field of biomedical LMs is advancing at an extraordinary pace. As such, this review, while including literature up to February 2025, inevitably faces the challenge of timeliness, and some of the most recent breakthroughs may not be captured. Despite these limitations, we believe this review successfully maps the current ecosystem and outlines persistent challenges for future research.


Future directions and perspectives

The application of LMs in the biomedical field has demonstrated significant potential, yet multiple challenges remain to be addressed for successful technical implementation (74). In biomedical text processing, LMs must tackle the complexity of specialized terminology and the scarcity of annotated data. The ambiguity of gene names can lead to errors in entity recognition, while the construction of KGs requires the integration of multi-source heterogeneous data (such as literature, omics, clinical images, and electronic medical records), placing higher demands on models' contextual understanding and cross-domain knowledge alignment capabilities. Additionally, when KGs are incorporated into Q&A systems, the multi-hop reasoning ability of models remains insufficient. For example, deducing disease mechanisms from gene functions often requires cross-hierarchical associations, and the lag in dynamic knowledge updates limits the system's timeliness (75).
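What "multi-hop" means here can be made concrete with a toy traversal over a hand-made KG: deducing a gene-disease link requires chaining a gene-to-pathway edge with a pathway-to-disease edge. All triples and relation names below are invented for illustration.

```python
# Toy multi-hop traversal over a hand-made KG: a gene-disease link
# emerges only by chaining two edges (gene -> pathway, pathway -> disease).
# All triples and relation names are invented for illustration.

kg = {
    ("BRCA1", "participates_in"): ["DNA_repair"],
    ("DNA_repair", "disrupted_in"): ["breast_cancer"],
    ("TP53", "participates_in"): ["apoptosis"],
}

def two_hop(entity, rel1, rel2):
    """Follow rel1 then rel2, collecting every reachable path."""
    paths = []
    for mid in kg.get((entity, rel1), []):
        for end in kg.get((mid, rel2), []):
            paths.append((entity, mid, end))
    return paths

print(two_hop("BRCA1", "participates_in", "disrupted_in"))
# [('BRCA1', 'DNA_repair', 'breast_cancer')]
print(two_hop("TP53", "participates_in", "disrupted_in"))
# []  (the second hop is missing, so no conclusion can be drawn)
```

The failure case for TP53 illustrates the gap described above: a model that cannot reliably chain intermediate facts, or whose graph lags behind current knowledge, returns nothing even when a mechanistic link exists in reality.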

In the field of omics data analysis, it is evident that much of the current research merely replaces traditional machine learning models with LM-based models, adapting to numerical features through fine-tuning while overlooking the heterogeneity between text and omics data in the representation space (76). The discrete symbolic system of text data and the continuous high-dimensional distribution of omics data may create a gap, and this simplistic model "replacement" may leave the model unable to capture cross-modal underlying biological patterns (e.g., the nonlinear relationship between gene mutations and pathological phenotypes described in the literature).

Currently, the integration of text and omics data remains relatively simplistic, with most research limited to shallow associations. For example, existing methods often map entities in biomedical text (e.g., genes) to omics data (e.g., gene expression matrices) without delving into the semantic interactions and hierarchical relationships between the two (77). In clinical scenarios that combine virtual and real data, LMs must bridge the gap from theoretical research to practical application. When deploying these models in clinical settings, issues such as insufficient interpretability and privacy compliance are particularly prominent (78). Federated learning frameworks may alleviate data silo problems through distributed training, while introducing causal reasoning mechanisms can enhance the traceability of the model’s logical chains (79). However, the computational complexity and implementation costs of these approaches still require further optimization.

Moreover, the lag in visualization technology further constrains integration effectiveness. Existing tools often independently display text keyword clouds and omics heatmaps, lacking dynamic interactive interfaces to reveal causal or synergistic relationships. Interpretable visualization modules (e.g., dynamic projections based on graph networks) could be introduced to display in real-time the association strength and path dependencies between textual concepts (e.g., a biological process) and omics entities (e.g., a gene’s expression level) under different experimental conditions, advancing from “data splicing” to “knowledge fusion”.

The choice of model must closely align with the specific biomedical problem at hand. Pretrained LMs (e.g., BERT-based) and generative LMs (e.g., GPT-based) each have their suitable scenarios. For instance, in tasks requiring precise entity recognition or parsing complex semantic relationships (e.g., gene-disease RE), the contextual encoding capabilities of pretrained models can effectively capture the implicit logic of biomedical text (80). In tasks requiring hypothesis generation or explanatory content, generative models can transform patterns into comprehensible narratives through open-ended language synthesis. It is noteworthy that the strengths of both approaches can complement each other in the integration with KG. Pretrained models are better suited for extracting rules from structured knowledge to enhance semantic understanding (e.g., embedding GO terms into model representations) or performing binary or multi-class classification tasks on omics data features (e.g., tumor risk prediction). Generative models can dynamically invoke paths from KG to generate interpretable reasoning chains (e.g., explanation mechanism). This problem-oriented model adaptation principle requires researchers to not only focus on the technical advancements but also deeply understand the intrinsic characteristics of biomedical problems.

The deep fusion of KGs and LMs is a key route to enhancing the interpretability of biomedical intelligent systems (81). Whether it involves injecting structured prior knowledge into LMs through KGs, leveraging LMs to mine potential associations from vast texts to dynamically expand KGs, or collaboratively building chain-of-thought reasoning systems, this bidirectional enhancement mechanism can significantly improve the transparency of model decision-making and mitigate the "hallucination" issue (82). However, it is critical to explicitly acknowledge the inherent limitations of LLMs, particularly regarding reproducibility and hallucination. In biomedical and clinical contexts, where decisions have profound implications, these limitations pose substantial risks. LLMs are fundamentally constrained by the quality and scope of their training data, a constraint captured by the adage "garbage in, garbage out". The disappointing trajectory of IBM Watson in oncology serves as a stark reminder of the consequences when this principle is neglected; its performance was ultimately limited by the data it was trained on and the challenges of contextualizing real-world clinical complexity. Therefore, rigorous validation and continuous monitoring are essential to ensure reliability. A particularly promising role for LLMs in this landscape is their potential to act as bridges between humans and data-centered AI. They can contextualize numerical model outputs, for instance, by translating a complex gene expression pattern into a narrative about potential disease mechanisms, making these outputs more interpretable and actionable for researchers and clinicians. This translational capability underscores the importance of human-AI collaboration. In summary, constructing an intelligent paradigm in which knowledge-guided and data-driven approaches operate in parallel may help LMs transcend the limitations of being "black boxes" (82). The process itself will also promote the deep integration of computational science, life sciences, and medicine.


Conclusions

In summary, while LMs demonstrate significant potential across biomedical applications, their reliable integration into clinical practice faces challenges such as limited annotated data, model opacity, and the risk of hallucinations. Future advancement requires a strategic shift from merely applying large models to constructing intelligent systems that effectively combine data-driven learning with knowledge-guided reasoning. By embedding structured knowledge through mechanisms like knowledge graphs, we can enhance interpretability and ground predictions in established biological facts. Ultimately, these systems should augment human expertise, serving as powerful interfaces to contextualize complex data and generate testable hypotheses that accelerate the translation of raw data into meaningful biological insights and clinical applications.


Acknowledgments

None.


Footnote

Reporting Checklist: The authors have completed the Narrative Review reporting checklist. Available at https://jmai.amegroups.com/article/view/10.21037/jmai-2025-203/rc

Peer Review File: Available at https://jmai.amegroups.com/article/view/10.21037/jmai-2025-203/prf

Funding: This work was supported by the National Natural Science Foundation of China (Nos. 32370694 and L2424328), Shanghai Municipal Health Commission (No. 2022XD036), and Shanghai Public Health Research Project (No. 2024GKM25).

Conflicts of Interest: All authors have completed the ICMJE uniform disclosure form (available at https://jmai.amegroups.com/article/view/10.21037/jmai-2025-203/coif). The authors have no conflicts of interest to declare.

Ethical Statement: The authors are accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved.

Open Access Statement: This is an Open Access article distributed in accordance with the Creative Commons Attribution-NonCommercial-NoDerivs 4.0 International License (CC BY-NC-ND 4.0), which permits the non-commercial replication and distribution of the article with the strict proviso that no changes or edits are made and the original work is properly cited (including links to both the formal publication through the relevant DOI and the license). See: https://creativecommons.org/licenses/by-nc-nd/4.0/.


References

  1. Zeng Z, Deng Y, Li X, et al. Natural Language Processing for EHR-Based Computational Phenotyping. IEEE/ACM Trans Comput Biol Bioinform 2019;16:139-53. [Crossref] [PubMed]
  2. Topaz M, Murga L, Gaddis KM, et al. Mining fall-related information in clinical notes: Comparison of rule-based and novel word embedding-based machine learning approaches. J Biomed Inform 2019;90:103103. [Crossref] [PubMed]
  3. Yan Z, Dube V, Heselton J, et al. Understanding older people's voice interactions with smart voice assistants: a new modified rule-based natural language processing model with human input. Front Digit Health 2024;6:1329910. [Crossref] [PubMed]
  4. Cavnar WB, Trenkle JM. N-gram-based text categorization. 1994. [Cited 30 January, 2025]. Available online: https://www.osti.gov/biblio/68573
  5. Rabiner LR. A tutorial on hidden Markov models and selected applications in speech recognition. Proceedings of the IEEE 1989;77:257-86.
  6. Lafferty J, McCallum A, Pereira F. Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data. International Conference on Machine Learning 2001. Available online: https://www.cs.columbia.edu/~jebara/6772/papers/crf.pdf
  7. Testa AC, Hane JK, Ellwood SR, et al. CodingQuarry: highly accurate hidden Markov model gene prediction in fungal genomes using RNA-seq transcripts. BMC Genomics 2015;16:170. [Crossref] [PubMed]
  8. Wang S, Peng J, Ma J, et al. Protein Secondary Structure Prediction Using Deep Convolutional Neural Fields. Sci Rep 2016;6:18962. [Crossref] [PubMed]
  9. Quang D, Xie X. DanQ: a hybrid convolutional and recurrent deep neural network for quantifying the function of DNA sequences. Nucleic Acids Res 2016;44:e107. [Crossref] [PubMed]
  10. Lee J, Yoon W, Kim S, et al. BioBERT: a pre-trained biomedical language representation model for biomedical text mining. Bioinformatics 2020;36:1234-40. [Crossref] [PubMed]
  11. Ji Y, Zhou Z, Liu H, et al. DNABERT: pre-trained Bidirectional Encoder Representations from Transformers model for DNA-language in genome. Bioinformatics 2021;37:2112-20. [Crossref] [PubMed]
  12. Elnaggar A, Heinzinger M, Dallago C, et al. ProtTrans: Toward Understanding the Language of Life Through Self-Supervised Learning. IEEE Trans Pattern Anal Mach Intell 2022;44:7112-27. [Crossref] [PubMed]
  13. Mykowiecka A, Marciniak M, Kupść A. Rule-based information extraction from patients' clinical data. J Biomed Inform 2009;42:923-36. [Crossref] [PubMed]
  14. Zhou S, Wang N, Wang L, et al. CancerBERT: a cancer domain-specific language model for extracting breast cancer phenotypes from electronic health records. J Am Med Inform Assoc 2022;29:1208-16. [Crossref] [PubMed]
  15. Brown T, Mann B, Ryder N, et al. Language models are few-shot learners. NIPS'20: 34th International Conference on Neural Information Processing Systems; 6-12 December 2020; Vancouver BC Canada. NIPS 2020:1877-1901.
  16. Deng Z, Ma W, Han QL, et al. Exploring DeepSeek: A Survey on Advances, Applications, Challenges and Future Directions. IEEE/CAA J Autom Sinica 2025;12:872-93.
  17. Wei CH, Kao HY, Lu Z. PubTator: a web-based text mining tool for assisting biocuration. Nucleic Acids Res 2013;41:W518-22. [Crossref] [PubMed]
  18. Wei CH, Kao HY, Lu Z. GNormPlus: An Integrative Approach for Tagging Genes, Gene Families, and Protein Domains. Biomed Res Int 2015;2015:918710. [Crossref] [PubMed]
  19. Leaman R, Islamaj Dogan R, Lu Z. DNorm: disease name normalization with pairwise learning to rank. Bioinformatics 2013;29:2909-17. [Crossref] [PubMed]
  20. Rocktäschel T, Weidlich M, Leser U. ChemSpot: a hybrid system for chemical named entity recognition. Bioinformatics 2012;28:1633-40. [Crossref] [PubMed]
  21. Aronson AR. Effective mapping of biomedical text to the UMLS Metathesaurus: the MetaMap program. Proc AMIA Symp 2001;17-21.
  22. Redjdal A, Bouaud J, Gligorov J, et al. Comparison of MetaMap, cTAKES, SIFR, and ECMT to Annotate Breast Cancer Patient Summaries. Stud Health Technol Inform 2022;290:187-91. [Crossref] [PubMed]
  23. Tong Y, Tan F, Huang H, et al. ViMRT: a text-mining tool and search engine for automated virus mutation recognition. Bioinformatics 2023;39:btac721. [Crossref] [PubMed]
  24. Yeung CS, Beck T, Posma JM. MetaboListem and TABoLiSTM: Two Deep Learning Algorithms for Metabolite Named Entity Recognition. Metabolites 2022;12:276. [Crossref] [PubMed]
  25. Corney DP, Buxton BF, Langdon WB, et al. BioRAT: extracting biological information from full-length papers. Bioinformatics 2004;20:3206-13. [Crossref] [PubMed]
  26. Kim S, Liu H, Yeganova L, et al. Extracting drug-drug interactions from literature using a rich feature-based linear kernel approach. J Biomed Inform 2015;55:23-30. [Crossref] [PubMed]
  27. Zhang Y, Lin H, Yang Z, et al. A single kernel-based approach to extract drug-drug interactions from biomedical literature. PLoS One 2012;7:e48901. [Crossref] [PubMed]
  28. Kilicoglu H, Rosemblat G, Fiszman M, et al. Broad-coverage biomedical relation extraction with SemRep. BMC Bioinformatics 2020;21:188. [Crossref] [PubMed]
  29. Rindflesch TC, Fiszman M. The interaction of domain knowledge and linguistic structure in natural language processing: interpreting hypernymic propositions in biomedical text. J Biomed Inform 2003;36:462-77. [Crossref] [PubMed]
  30. Jung S, Lee T, Cheng CH, et al. 15 years of GDR: New data and functionality in the Genome Database for Rosaceae. Nucleic Acids Res 2019;47:D1137-45. [Crossref] [PubMed]
  31. Xing R, Luo J, Song T. BioRel: towards large-scale biomedical relation extraction. BMC Bioinformatics 2020;21:543. [Crossref] [PubMed]
  32. Lee Y, Son J, Song M. BertSRC: transformer-based semantic relation classification. BMC Med Inform Decis Mak 2022;22:234. [Crossref] [PubMed]
  33. Zhang Z, Fang M, Wu R, et al. Large-Scale Biomedical Relation Extraction Across Diverse Relation Types: Model Development and Usability Study on COVID-19. J Med Internet Res 2023;25:e48115. [Crossref] [PubMed]
  34. Haendel MA, Chute CG, Robinson PN. Classification, Ontology, and Precision Medicine. N Engl J Med 2018;379:1452-62. [Crossref] [PubMed]
  35. Ashburner M, Ball CA, Blake JA, et al. Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat Genet 2000;25:25-9. [Crossref] [PubMed]
  36. Schriml LM, Arze C, Nadendla S, et al. Disease Ontology: a backbone for disease semantic integration. Nucleic Acids Res 2012;40:D940-6. [Crossref] [PubMed]
  37. Yang Y, Lu Y, Zheng Z, et al. MKG-GC: A multi-task learning-based knowledge graph construction framework with personalized application to gastric cancer. Comput Struct Biotechnol J 2024;23:1339-47. [Crossref] [PubMed]
  38. Yang J, Zhuang X, Li Z, et al. CPMKG: a condition-based knowledge graph for precision medicine. Database (Oxford) 2024;2024:baae102. [Crossref] [PubMed]
  39. Peng C, Xia F, Naseriparsa M, et al. Knowledge Graphs: Opportunities and Challenges. Artif Intell Rev 2023;1-32. [Crossref] [PubMed]
  40. Goodwin TR, Demner-Fushman D. Enhancing Question Answering by Injecting Ontological Knowledge through Regularization. Proc Conf Empir Methods Nat Lang Process 2020;2020:56-63.
  41. Li J, Tang T, Wu E, et al. RARPKB: a knowledge-guide decision support platform for personalized robot-assisted surgery in prostate cancer. Int J Surg 2024;110:3412-24. [Crossref] [PubMed]
  42. Ni P, Okhrati R, Guan S, et al. Knowledge Graph and Deep Learning-based Text-to-GQL Model for Intelligent Medical Consultation Chatbot. Inf Syst Front 2022;1-19. [Crossref] [PubMed]
  43. Zhou X, Zhang S, Agarwal M, et al. Marie and BERT-A Knowledge Graph Embedding Based Question Answering System for Chemistry. ACS Omega 2023;8:33039-57. [Crossref] [PubMed]
  44. Alanazi W, Meng D, Pollastri G. Porter 6: Protein Secondary Structure Prediction by Leveraging Pre-Trained Language Models (PLMs). Int J Mol Sci 2024;26:130. [Crossref] [PubMed]
  45. Heinzinger M, Weissenow K, Sanchez JG, et al. Bilingual language model for protein sequence and structure. NAR Genom Bioinform 2024;6:lqae150. [Crossref] [PubMed]
  46. Nambiar A, Forsyth JM, Liu S, et al. DR-BERT: A protein language model to annotate disordered regions. Structure 2024;32:1260-1268.e3. [Crossref] [PubMed]
  47. Hong Y, Lee J, Ko J. A-Prot: protein structure modeling using MSA transformer. BMC Bioinformatics 2022;23:93. [Crossref] [PubMed]
  48. Weissenow K, Heinzinger M, Rost B. Protein language-model embeddings for fast, accurate, and alignment-free protein structure prediction. Structure 2022;30:1169-1177.e4. [Crossref] [PubMed]
  49. Brandes N, Ofer D, Peleg Y, et al. ProteinBERT: a universal deep-learning model of protein sequence and function. Bioinformatics 2022;38:2102-10. [Crossref] [PubMed]
  50. Lin Z, Akin H, Rao R, et al. Evolutionary-scale prediction of atomic-level protein structure with a language model. Science 2023;379:1123-30. [Crossref] [PubMed]
  51. Shanker VR, Bruun TUJ, Hie BL, et al. Unsupervised evolution of protein and antibody complexes with a structure-informed language model. Science 2024;385:46-53. [Crossref] [PubMed]
  52. Nguyen E, Poli M, Durrant MG, et al. Sequence modeling and design from molecular to genome scale with Evo. Science 2024;386:eado9336. [Crossref] [PubMed]
  53. Hie B, Zhong ED, Berger B, et al. Learning the language of viral evolution and escape. Science 2021;371:284-8. [Crossref] [PubMed]
  54. Fishman V, Kuratov Y, Shmelev A, et al. GENA-LM: a family of open-source foundational DNA language models for long sequences. Nucleic Acids Res 2025;53:gkae1310. [Crossref] [PubMed]
  55. Yang F, Wang W, Wang F, et al. scBERT as a large-scale pretrained deep language model for cell type annotation of single-cell RNA-seq data. Nat Mach Intell 2022;4:852-66.
  56. Jung KH. Uncover This Tech Term: Foundation Model. Korean J Radiol 2023;24:1038-41. [Crossref] [PubMed]
  57. Waqas A, Bui MM, Glassy EF, et al. Revolutionizing Digital Pathology With the Power of Generative Artificial Intelligence and Foundation Models. Lab Invest 2023;103:100255. [Crossref] [PubMed]
  58. Dai Y, Shao X, Zhang J, et al. TCMChat: A generative large language model for traditional Chinese medicine. Pharmacol Res 2024;210:107530. [Crossref] [PubMed]
  59. Chen Y, Zou J. Simple and effective embedding model for single-cell biology built from ChatGPT. Nat Biomed Eng 2025;9:483-93. [Crossref] [PubMed]
  60. Liu Y, Luo Y, Lu X, et al. Genotypic-phenotypic landscape computation based on first principle and deep learning. Brief Bioinform 2024;25:bbae191. [Crossref] [PubMed]
  61. Guo LL, Steinberg E, Fleming SL, et al. EHR foundation models improve robustness in the presence of temporal distribution shift. Sci Rep 2023;13:3767. [Crossref] [PubMed]
  62. Malusare A, Kothandaraman H, Tamboli D, et al. Understanding the natural language of DNA using encoder-decoder foundation models with byte-level precision. Bioinform Adv 2024;4:vbae117. [Crossref] [PubMed]
  63. Tian Y, Li Z, Jin Y, et al. Foundation model of ECG diagnosis: Diagnostics and explanations of any form and rhythm on ECG. Cell Rep Med 2024;5:101875. [Crossref] [PubMed]
  64. Binz M, Akata E, Bethge M, et al. A foundation model to predict and capture human cognition. Nature 2025;644:1002-9. [Crossref] [PubMed]
  65. Qin T, Matmati N, Tsoi LC, et al. Finding pathway-modulating genes from a novel Ontology Fingerprint-derived gene network. Nucleic Acids Res 2014;42:e138. [Crossref] [PubMed]
  66. Wang Y, Zong H, Yang F, et al. A knowledge empowered explainable gene ontology fingerprint approach to improve gene functional explication and prediction. iScience 2023;26:106356. [Crossref] [PubMed]
  67. Wang Y, Chen S, Xiao X, et al. Impact of apolipoprotein A1 on tumor immune microenvironment, clinical prognosis and genomic landscape in hepatocellular carcinoma. Precis Clin Med 2023;6:pbad021. [Crossref] [PubMed]
  68. Huang H, Zhan Y, Zong H, et al. Decoding hepatobiliary-specific immune gene patterns in gastrointestinal cancers via gene ontology fingerprints, multi-omics, and experimental integration. Precis Clin Med 2025;8:pbaf014. [Crossref] [PubMed]
  69. Wang Y, Wang Y, Wang S, et al. GIDB: a knowledge database for the automated curation and multidimensional analysis of molecular signatures in gastrointestinal cancer. Database (Oxford) 2019;2019:baz051. [Crossref] [PubMed]
  70. Wang Y, Tong Y, Zhang Z, et al. ViMIC: a database of human disease-related virus mutations, integration sites and cis-effects. Nucleic Acids Res 2022;50:D918-27. [Crossref] [PubMed]
  71. Huang C, Huang H, Ding M, et al. ViMIC 2.0: an updated database of human disease-related viral mutations, integration sites, and multi-omics data. Nucleic Acids Res 2025;gkaf1106. [Crossref] [PubMed]
  72. Huang H, Huang D, Wei Z, et al. An open-source multi-semantic annotation dataset and automated recognition tool for viral carcinogenesis factors. Database (Oxford) 2025;2025:baaf038. [Crossref] [PubMed]
  73. Diamant I, Clarke DJB, Evangelista JE, et al. Harmonizome 3.0: integrated knowledge about genes and proteins from diverse multi-omics resources. Nucleic Acids Res 2025;53:D1016-28. [Crossref] [PubMed]
  74. Wang D, Zhang S. Large language models in medical and healthcare fields: applications, advances, and challenges. Artif Intell Rev 2024;57:299.
  75. Yang L, Chen H, Li Z, et al. Give us the Facts: Enhancing Large Language Models With Knowledge Graphs for Fact-Aware Language Modeling. IEEE Transactions on Knowledge and Data Engineering 2024;36:3091-110.
  76. Szałata A, Hrovatin K, Becker S, et al. Transformers in single-cell omics: a review and new perspectives. Nat Methods 2024;21:1430-43. [Crossref] [PubMed]
  77. Wörheide MA, Krumsiek J, Kastenmüller G, et al. Multi-omics integration in biomedical research - A metabolomics-centric review. Anal Chim Acta 2021;1141:144-62. [Crossref] [PubMed]
  78. Ong JCL, Chang SY, William W, et al. Ethical and regulatory challenges of large language models in medicine. Lancet Digit Health 2024;6:e428-32. [Crossref] [PubMed]
  79. Bhanbhro J, Nisticò S, Palopoli L. Issues in federated learning: some experiments and preliminary results. Sci Rep 2024;14:29881. [Crossref] [PubMed]
  80. Goyal N, Singh N. Named entity recognition and relationship extraction for biomedical text: A comprehensive survey, recent advancements, and future research directions. Neurocomputing 2025;618:129171.
  81. Cai L, Yu C, Kang Y, et al. Practices, opportunities and challenges in the fusion of knowledge graphs and large language models. Front Comput Sci 2025;7:1590632.
  82. Pan S, Luo L, Wang Y, et al. Unifying Large Language Models and Knowledge Graphs: A Roadmap. IEEE Trans Knowl Data Eng 2024;36:3580-99.
doi: 10.21037/jmai-2025-203
Cite this article as: Qi Y, Wang Y, Wen W. A narrative review of the evolving role of language models in biomedicine: bridging text, omics, and multiscale integration. J Med Artif Intell 2026;9:28.