Machine learning approaches for enhancing adult height prediction in girls with early-onset and rapidly progressing puberty undergoing GH and GnRHa therapy: a scoping review

Kendah Saif; Amal Alharbi; Halima Samra; Naseem Alyahyawi

doi:10.21037/jmai-2025-149

Review Article

Kendah Saif¹, Amal Alharbi¹, Halima Samra¹, Naseem Alyahyawi²

¹Department of Information Systems, Faculty of Computing and Information Technology, King Abdulaziz University, Jeddah, Saudi Arabia; ²Department of Pediatrics, Faculty of Medicine, King Abdulaziz University Hospital, Jeddah, Saudi Arabia

Contributions: (I) Conception and design: All authors; (II) Administrative support: All authors; (III) Provision of study materials or patients: All authors; (IV) Collection and assembly of data: All authors; (V) Data analysis and interpretation: All authors; (VI) Manuscript writing: All authors; (VII) Final approval of manuscript: All authors.

Correspondence to: Kendah Saif, MSc. Department of Information Systems, Faculty of Computing and Information Technology, King Abdulaziz University, Jeddah, Saudi Arabia. Email: Kendah.A.Saif@gmail.com.

Background: Accurate prediction of adult height in girls undergoing treatment for early-onset or rapidly progressing puberty with growth hormone (GH) and gonadotropin-releasing hormone agonist (GnRHa) is essential for guiding clinical decisions and optimizing outcomes. Traditional statistical models, such as Bayley-Pinneau and Tanner-Whitehouse, were developed in homogeneous cohorts and often fail to capture the heterogeneous growth trajectories seen in treated girls aged 7−10 years. This review examines the potential of machine learning (ML) approaches to improve predictive accuracy and clinical applicability in this underrepresented population.

Methods: A scoping review was conducted following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses extension for Scoping Reviews (PRISMA-ScR) framework. PubMed, Scopus, and IEEE Xplore databases were searched for peer-reviewed studies published between January 2007 and May 2025. Eligible studies included predictive models of adult height or closely related outcomes in pediatric endocrinology, with a focus on early puberty and GH/GnRHa interventions. Data were extracted on study design, sample characteristics, treatment regimen, baseline risk markers (e.g., bone-age advancement, predicted adult height methods), and predictive methodology (traditional versus ML).

Results: Twenty studies were included. Traditional models demonstrated variable accuracy, with systematic overestimation in girls with rapidly progressing puberty. ML models, including random forest (RF), extreme gradient boosting (XGBoost), and deep learning, consistently reduced prediction error compared with conventional methods [root mean square error (RMSE) <3.5 cm in some cohorts]. However, most studies were limited to East Asian and European populations, with minimal representation of Middle Eastern cohorts. In addition, variation in treatment timing, adherence, and genetic background complicates direct comparison across studies. While ML demonstrates superior feature integration and adaptability, questions remain regarding optimal algorithm selection for the 7-10 age group, interpretability for clinical trust, and validation in ethnically diverse, real-world settings.

Conclusions: ML approaches show promise in enhancing adult-height prediction for girls undergoing GH/GnRHa therapy, but evidence remains constrained by limited diversity and short-term validation. Future research should prioritize ethnically representative cohorts, interpretable models, and prospective clinical trials to establish predictive validity and improve real-world decision-making in pediatric endocrinology.

Keywords: Adult height prediction; growth hormone therapy (GH therapy); gonadotropin-releasing hormone agonist (GnRHa); machine learning (ML); pediatric endocrinology

Received: 18 June 2025; Accepted: 21 November 2025; Published online: 27 January 2026.

Highlight box

Key findings

• Machine learning models show clear advantages over traditional methods for predicting adult height in girls undergoing growth hormone (GH) and gonadotropin-releasing hormone agonist (GnRHa) therapy. Across 20 studies published between 2007 and 2025, machine learning (ML) algorithms such as random forest (RF), extreme gradient boosting (XGBoost), and deep learning consistently demonstrated higher predictive accuracy, with several cohorts achieving root mean square error (RMSE) values below 3.5 cm. Traditional models, including Bayley-Pinneau and Tanner-Whitehouse, frequently overestimated adult height, especially in girls with rapidly progressing puberty, and lacked adaptability to real-world clinical variation.

What is known and what is new?

• Traditional statistical models for adult-height prediction were derived from homogeneous cohorts and assume linear growth patterns, limiting their accuracy in children treated with GH and GnRHa.

• This review identifies a growing body of evidence supporting ML’s superior ability to integrate multidimensional data, such as bone age, treatment adherence, hormonal levels, and parental height, allowing more individualized and accurate predictions. It also highlights the lack of representation of Middle Eastern populations in existing research.

What is the implication, and what should change now?

• The findings underscore the need for ethnically diverse datasets and interpretable ML models that clinicians can trust and apply in practice. Future studies should employ federated and transfer learning to ensure privacy-preserving, generalizable frameworks. Integrating ML-based prediction tools into pediatric endocrinology could enhance decision-making, personalize treatment strategies, and improve final height outcomes for girls receiving GH and GnRHa therapy.

Introduction

Background

Predicting adult height is essential for evaluating the effectiveness of interventions for girls with early-onset puberty, particularly those receiving growth hormone (GH) and gonadotropin-releasing hormone agonist (GnRHa) therapy. Clinical guidelines consistently recommend GnRHa therapy for all girls diagnosed with precocious puberty to delay pubertal progression and preserve final adult height. In girls, precocious puberty is defined as the development of secondary sexual characteristics before age 8 years in the general population and before age 7 years in certain ethnic groups. These thresholds correspond to approximately 2–2.5 standard deviations below the mean age of pubertal onset, which is ~10 years 6 months in most populations (1).

A key clinical question concerns the therapeutic benefits for girls whose pubertal onset occurs between the ages of 7 and 10 years. These girls begin puberty earlier than their peers yet do not meet strict criteria for precocious puberty, creating a clinical gray zone that contributes to substantial variability in management (2). Some endocrinologists regard early pubertal development in this range as a normal variant and defer treatment, whereas others initiate therapy due to concerns about potential effects on final adult height and psychosocial well-being. Clarifying the benefits and indications of therapy for this intermediate age group remains an important area of ongoing research (2).

Traditional models, including the Bayley-Pinneau and Tanner-Whitehouse methods, are primarily built on homogeneous populations and assume uniform growth trajectories (3-5). These models often neglect the dynamic interactions between hormonal levels, treatment adherence, genetic predispositions, and other biological factors that influence growth. This limitation is especially pronounced in girls whose puberty begins at 7-9 years of age or who present at 9-10 years with rapidly progressing advanced puberty, in whom puberty-induced hormonal fluctuations significantly alter growth patterns (4,6-8). Accurately identifying delayed puberty is also critical: delays can affect psychosocial well-being, and although delayed puberty is often benign and self-resolving, it can sometimes indicate underlying medical conditions that need further investigation (4).

Rationale and knowledge gap

Research increasingly emphasizes the need for growth prediction tools that adapt to individualized contexts. Treatment success in this age group is highly contingent upon patient-specific factors, a reality that conventional models often fail to capture. Additionally, this demographic remains underrepresented in predictive modeling research, leading to a gap in reliable clinical tools for forecasting adult height outcomes. Machine learning (ML) has emerged as a promising solution to these challenges. By uncovering nonlinear relationships and processing large, complex datasets, ML models offer the potential to significantly enhance prediction accuracy. In fields such as endocrinology and growth forecasting, ML approaches have already demonstrated improved performance compared to traditional statistical techniques (5,9,10). This review explores the potential of ML to transform adult height prediction in pediatric endocrinology, particularly for the underserved population of girls with onset of puberty at 7-9 years, or girls presenting at 9-10 years with rapidly progressing advanced puberty and poor predicted adult height, who are treated with GH and GnRHa therapy.

Despite decades of research on adult height prediction, significant gaps remain. Traditional models such as Bayley-Pinneau and Tanner-Whitehouse were derived from homogeneous historical cohorts and lack adaptability to diverse, contemporary populations. These methods are particularly limited in girls aged 7–10 years with early or rapidly progressing puberty, where growth is nonlinear and strongly influenced by hormonal therapy. Moreover, most existing studies have been conducted in East Asian or European populations, with minimal inclusion of Middle Eastern cohorts, leaving uncertainty about generalizability to these and other underrepresented groups. Current literature also underexplores how treatment-related factors (e.g., GH/GnRHa regimen, adherence, therapy duration) and genetic or familial variables impact prediction accuracy. Finally, while ML has shown potential to reduce prediction error, critical questions remain unanswered: Which algorithms are optimal for 7–10-year-olds? How can accuracy be balanced with interpretability to build clinician trust? And what is the real-world impact of ML predictions on long-term clinical outcomes? Addressing these gaps is essential for advancing toward population-specific, clinically applicable prediction models.

Objective

The primary objective of this literature review is to critically assess the limitations of existing statistical models used to predict adult height in pediatric patients, with a specific focus on girls with early-onset puberty receiving GH and GnRHa therapy. The review aims to explore the contributions of ML in improving predictive accuracy, integrating diverse clinical variables, and enhancing adaptability to individual patient profiles. By comparing traditional and ML models, this study seeks to highlight methodological advancements that can bridge the gap between clinical predictions and real-world variability. Moreover, it examines the ethical and practical considerations of deploying ML tools in pediatric endocrinology and emphasizes the necessity for population-specific predictive frameworks that ensure equitable and effective patient care. We present this article in accordance with the PRISMA-ScR reporting checklist (available at https://jmai.amegroups.com/article/view/10.21037/jmai-2025-149/rc) (11).

Methods

Review design

The purpose of the scoping review was to map the extent, range, and nature of evidence on adult height prediction in girls undergoing GH and GnRHa therapy, with particular attention to the application of ML and traditional statistical approaches. Unlike systematic reviews, which critically appraise study quality, scoping reviews aim to provide a broad overview of relevant evidence, regardless of study type or methodological rigor.

Data sources and search strategy

A comprehensive search was conducted across PubMed, Scopus, and IEEE Xplore databases to identify studies published between January 2007 and May 2025 to ensure inclusion of research reflecting updated clinical criteria for precocious puberty and the emergence of modern ML methods relevant to the study’s objectives. The search strategy was designed to be broad and inclusive, combining medical subject headings and free-text terms such as “adult height prediction”, “growth hormone therapy”, “GnRHa treatment”, and “machine learning in endocrinology”. Boolean operators and filters were applied to restrict results to peer-reviewed, English-language articles. Full search strings are available in Appendix 1.

Eligibility criteria

In alignment with the objectives of a scoping review, this review considered all relevant study types that addressed adult height prediction or closely related outcomes in pediatric endocrinology, including predictive modeling (statistical or ML), diagnostic accuracy studies related to pubertal disorders, and expert/consensus or review papers that inform model inputs and clinical pathways.

Studies were required to focus on pediatric endocrinology and involve children with early-onset puberty or girls with rapidly progressing puberty undergoing GH and/or GnRHa treatment.

We defined poor predicted adult height (poor PAH) at baseline using each study’s reported PAH, calculated per the original authors’ method (typically Bayley-Pinneau accelerated tables with Greulich-Pyle bone age). A study was considered to include girls with “poor PAH” if one or more of the following criteria was met: (I) PAH standard deviation score (SDS) ≤−2.0 relative to sex-specific population norms; (II) PAH < [mid-parental height (MPH) − 1.0 standard deviation (SD)]; or (III) absolute PAH <150 cm.

When multiple criteria were available, we prioritized (I); if only target-height or MPH gaps were reported, we applied (II). We recorded the exact PAH method (tables, bone-age atlas) and thresholds used by each study during data charting.
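The charting rule above can be sketched in code. This is a minimal illustration under stated assumptions, not the procedure actually implemented for this review: the names `BaselinePAH` and `poor_pah_criterion` are hypothetical, and `sd_cm` (the height equivalent of 1 SD in the reference population) is a value each study would have to supply, since no single figure is fixed here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BaselinePAH:
    """Baseline values as reported by a study (heights in cm). Hypothetical container."""
    pah_cm: Optional[float] = None   # predicted adult height
    pah_sds: Optional[float] = None  # PAH standard deviation score
    mph_cm: Optional[float] = None   # mid-parental (target) height
    sd_cm: Optional[float] = None    # assumed height equivalent of 1 SD in the reference population


def poor_pah_criterion(b: BaselinePAH) -> Optional[str]:
    """Return the first satisfied criterion ("I", "II" or "III"), or None if none is met."""
    # Criterion (I) is checked first, mirroring the stated priority.
    if b.pah_sds is not None and b.pah_sds <= -2.0:
        return "I"
    # Criterion (II): PAH more than 1 SD below mid-parental height.
    if all(v is not None for v in (b.pah_cm, b.mph_cm, b.sd_cm)):
        if b.pah_cm < b.mph_cm - b.sd_cm:
            return "II"
    # Criterion (III): absolute PAH below 150 cm.
    if b.pah_cm is not None and b.pah_cm < 150.0:
        return "III"
    return None


# Illustrative inputs only; these numbers are not taken from any included study.
print(poor_pah_criterion(BaselinePAH(pah_sds=-2.3)))                            # I
print(poor_pah_criterion(BaselinePAH(pah_cm=148.0, mph_cm=160.0, sd_cm=6.0)))   # II
print(poor_pah_criterion(BaselinePAH(pah_cm=149.0)))                            # III
```

Returning the label of the satisfied criterion, rather than a simple yes/no, keeps the charted record comparable across studies that report different baseline quantities.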

Inclusion criteria

Studies were included if they:

• Focused on pediatric populations, with emphasis on girls experiencing early-onset or rapidly progressing puberty treated with GH and/or GnRHa.
• Addressed height prediction directly, or indirectly contributed to the understanding of predictors, treatment response, or methodological frameworks.

Exclusion criteria

Studies were excluded if they:

• Focused exclusively on unrelated genetic syndromes, or
• Lacked sufficient detail regarding growth outcomes or predictors.

Study selection

Titles and abstracts were independently screened by multiple reviewers. Full texts were then retrieved for studies meeting initial eligibility. Discrepancies were resolved by discussion and consensus. The number of records identified, screened, and retained is summarized in Figure 1 (12), which illustrates the full identification, screening, eligibility, and inclusion process in accordance with the PRISMA-ScR framework.

Figure 1 PRISMA-ScR flow diagram (12) illustrating the study identification, screening, eligibility, and inclusion process for the scoping review.

Data charting

A standardized data-charting form captured the following fields: author, year, study design, participant characteristics (including age range) and sample size, data source (e.g., clinical trial, cohort, device-based registry), predictive or modeling approach [e.g., random forest (RF), extreme gradient boosting (XGBoost), logistic regression, consensus framework], and key findings. In addition, we extracted methodological details essential for comparability across studies, including the PAH computation method (e.g., Bayley-Pinneau table type), the bone-age assessment method (e.g., Greulich-Pyle), and which “poor PAH” criterion (I–III) was satisfied at baseline. Where reported, treatment regimen (drug and dose), initiation timing, treatment duration, adherence measures, and relevant genetic or familial variables were also recorded. Comprehensive study characteristics (regimen, timing, duration, adherence, bone age/PAH method, genetic/familial factors) are provided in Table S1.
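For readers who wish to reuse the charting form, its fields can be expressed as a simple record type. The sketch below is an assumption about how such a form might be represented in code; the class name `ChartedStudy`, the field names, and the helper `missing_fields` are hypothetical and do not describe the tool used in this review. The example entry is filled in from the Table 1 row for Park & Lee (9).

```python
from dataclasses import dataclass, field, fields
from typing import List, Optional


@dataclass
class ChartedStudy:
    """One row of the data-charting form (hypothetical field names)."""
    author: str
    year: int
    design: str
    participants: str
    sample_size: Optional[int]
    data_source: str
    modeling_approach: str
    key_findings: str
    # Methodological details needed for comparability across studies.
    pah_method: Optional[str] = None           # e.g., Bayley-Pinneau table type
    bone_age_method: Optional[str] = None      # e.g., Greulich-Pyle
    poor_pah_criterion: Optional[str] = None   # "I", "II" or "III"
    # Treatment details, recorded only where reported.
    regimen: Optional[str] = None
    initiation_timing: Optional[str] = None
    treatment_duration: Optional[str] = None
    adherence_measure: Optional[str] = None
    genetic_familial_variables: List[str] = field(default_factory=list)


def missing_fields(record: ChartedStudy) -> List[str]:
    """List the fields a study did not report (left as None)."""
    return [f.name for f in fields(record) if getattr(record, f.name) is None]


example = ChartedStudy(
    author="Park & Lee",
    year=2023,
    design="Retrospective",
    participants="210 Korean girls with CPP; mean age ~8.6 years",
    sample_size=210,
    data_source="Single-center Korean data",
    modeling_approach="Segmented ML (parental height)",
    key_findings="ML reduced RMSE <3.5 cm; improved subgroup predictions by MPH",
)
print(missing_fields(example))
```

Listing unreported fields explicitly makes gaps in regimen, timing, and adherence reporting visible when studies are compared side by side.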

Critical appraisal

In keeping with the purpose of a scoping review, no formal risk-of-bias or quality assessment was undertaken. Instead, the goal was to provide a comprehensive map of available evidence rather than to exclude studies based on methodological limitations.

A total of 35 articles were identified after the initial title and abstract screening phase. Following full-text content screening for relevance, data availability, and methodological rigor, 20 studies were retained for final inclusion in the review. Key data were extracted regarding study design, participant characteristics, datasets, predictive modeling approaches, and principal findings. A summary of the main features of the included studies is provided in Table 1.

Table 1

Summary of key studies on growth prediction and treatment

Author(s)	Year	Design	Participants (age)	Dataset	Model	Key findings
Labarta et al. (3)	2021	Retrospective review/guidance	Children on GH therapy; ages not specified (general pediatric)	Clinical + literature data	Not applicable	Provided clinical guidance; emphasized bone age/PAH methods and SDS thresholds
Palmert & Dunkel (4)	2012	Narrative review	Adolescents with delayed puberty (typically 12–16 years)	Clinical and literature data	Not applicable	Defined delayed puberty, summarized diagnostic approaches, and treatment strategies
Lemaire et al. (5)	2018	Retrospective validation study	284 girls with idiopathic CPP (mean onset ~7–8 years)	Multicenter European trial data	Mathematical prediction model	Validated adult height prediction model; showed improved accuracy vs. Bayley-Pinneau
Collett-Solberg et al. (GRS perspective) (6)	2019	Expert consensus/review	Not applicable	Workshop + review data	Not applicable	Provided etiological insights; framed AI’s role in diagnosis/management
Dotremont et al. (7)	2023	Prospective cohort	Girls with early/fast puberty, poor PAH (mean onset 7–8 years)	Multiyear (4 years) treatment cohort	Not ML; clinical cohort analysis	GH + GnRHa combo improved adult height compared to controls
Alaaraj et al. (8)	2021	Retrospective cohort (Qatar)	Girls with early & fast puberty; mean 7.7±0.7 years	National hospital records	Not applicable	Documented PAH/MPH SDS gaps in regional cohort; highlights Middle Eastern representation
Park & Lee (9)	2023	Retrospective	210 Korean girls with CPP; mean age ~8.6 years	Single-center Korean data	Segmented ML (parental height)	ML reduced RMSE <3.5 cm; improved subgroup predictions by MPH
Shmoish et al. (10)	2021	Retrospective	2,687 children (0–20 years), parents included; strongest signals at 3.4–6 years	Israeli national cohort	Multiple ML models	RMSE <3.5 cm; highlighted predictive value of early-childhood height
Cho et al. (13)	2023	Prospective	Girls with CPP, treated ~7–9 years	Korean multicenter	Mono vs. combo GH + GnRHa	Combo therapy improved final adult height compared to GnRHa alone
Swaiss et al. (14)	2017	Retrospective cohort (Jordan)	Girls with CPP, mean onset ~7 years; treated vs. untreated	University hospital records	Not applicable	Treated girls’ final adult height 158.5 cm vs. untreated 151.2 cm; first Jordanian CPP cohort
Spataru et al. (15)	2022	Retrospective	10,929 children/adolescents (≤18 years) on GH	Real-world device-logged (easypod™)	Logistic regression, RF	Identified early adherence predictors; RF best performing
Tuvemo et al. (16)	2007	Randomized controlled trial	46 adopted girls with early puberty (mean 8.3 years)	Clinical trial	Not applicable	GH + GnRHa significantly improved PAH and growth velocity
Ilyas et al. (17)	2020	Retrospective	1,596 individuals, birth–adolescence (0–18 years)	GrowUp longitudinal cohort	Deep learning (deep neural network, RF)	Predicted adult height from early growth data; high accuracy
Spataru et al. (18)	2021	Retrospective (abstract)	Children on GH (≤18 years)	Easypod™ adherence data	RF, logistic regression	Early adherence data predicted long-term adherence
Loftus et al. (19)	2022	Federated learning perspective	Multi-institutional pediatric data	Decentralized records	Federated learning	Demonstrated feasibility of privacy-preserving pediatric ML
Chen et al. (20)	2024	Meta-analysis	Girls with suspected CPP (typically 6–10 years)	Multi-study hormonal/imaging datasets	ML (XGBoost, RF, support vector machine, etc.)	High diagnostic accuracy for CPP (AUC ≈0.90; sensitivity 0.82; specificity 0.85)
Chen et al. (21)	2023	Retrospective diagnostic ML	1,757 girls (mean ~8 years) undergoing GnRH stimulation test	Hospital records	XGBoost, RF	Accurately classified GnRH test outcomes; AUC ~0.85
Araujo-Moura et al. (22)	2025	Transfer learning study	Children & adolescents (6–18 years) in South America	Multi-cohort	Transfer learning (deep neural network)	Improved hypertension prediction; methodological analogy for adult height prediction

AI, artificial intelligence; AUC, area under the curve; CPP, central precocious puberty; GH, growth hormone; GnRH, gonadotropin-releasing hormone; GnRHa, gonadotropin-releasing hormone agonist; GRS, Growth Hormone Research Society; ML, machine learning; MPH, mid-parental height; PAH, predicted adult height; RF, random forest; RMSE, root mean square error; SDS, standard deviation score; XGBoost, extreme gradient boosting.

Discussion

This section discusses the challenges and recent advancements in predicting adult height for girls undergoing GH and GnRHa treatment. It emphasizes the gaps in existing models and the emerging role of ML techniques in addressing these challenges.

Gaps in existing research

The current body of research reveals multiple gaps that limit the effectiveness and clinical applicability of adult height prediction models for girls treated for early puberty. Many studies fail to adequately address the unique growth characteristics of girls in this group, a demographic that experiences distinct hormonal changes compared to younger girls. Traditional statistical methods dominate the field, often relying on linear assumptions that oversimplify the complex biological relationships influencing growth outcomes. Furthermore, a significant portion of existing studies focuses on specific conditions such as central precocious puberty (CPP), resulting in limited generalizability to broader patient populations. These gaps highlight the urgent need for advanced methodologies, particularly ML, to provide more reliable and individualized predictive tools (5,10,13).

Inconsistent results across studies

Significant inconsistencies exist across studies evaluating height prediction outcomes for similar therapeutic regimens. Differences in study design, patient populations, and treatment protocols contribute to varied findings. For instance, a Qatar cohort of girls with early and fast puberty treated with monthly GnRHa (triptorelin 3.75 mg intramuscular) for three years documented reductions in growth velocity and tracked changes in bone maturation and predicted adult height, illustrating how protocolized suppression shapes longitudinal height trajectories in practice (8). Similarly, a Jordanian retrospective cohort (43 treated vs. 13 untreated) reported higher final adult height in treated girls with CPP (158.5±6.6 vs. 151.2±8.4 cm; P=0.004) and closer alignment to target and MPH, underscoring that measured gains depend on baseline prediction, treatment duration, and timing of initiation (14). Likewise, studies (7,13) report substantial height gains after combination therapy, whereas device-verified adherence work (15) highlights more nuanced results that depend on initiation timing and adherence. A critical consideration is that the advanced bone age observed in certain patients with early puberty may compromise the actual efficacy of GH therapy by limiting the remaining growth potential, which helps to explain why some studies report reduced or inconsistent benefits. Variations in treatment administration, therapy duration, patient compliance, and genetic predispositions further complicate comparability across studies. This inconsistency underscores the need for standardized methodologies and the development of models capable of accommodating demographic and individual variability.
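One way to place such results on a common scale is to convert reported group summaries into a standardized difference. The sketch below applies the usual pooled-SD formula to the Jordanian cohort figures quoted above (14). It assumes the ± values are standard deviations, which is not stated explicitly here, and the function names are hypothetical; it is an illustration of the arithmetic, not a re-analysis of that study.

```python
import math


def pooled_sd(sd1: float, n1: int, sd2: float, n2: int) -> float:
    """Pooled standard deviation of two independent groups."""
    return math.sqrt(((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2))


def standardized_difference(mean1: float, sd1: float, n1: int,
                            mean2: float, sd2: float, n2: int) -> float:
    """Mean difference expressed in units of the pooled SD."""
    return (mean1 - mean2) / pooled_sd(sd1, n1, sd2, n2)


# Treated (n=43) vs. untreated (n=13) final adult height in cm, as quoted in the text.
diff_cm = 158.5 - 151.2
d = standardized_difference(158.5, 6.6, 43, 151.2, 8.4, 13)
print(round(diff_cm, 1))  # 7.3
print(round(d, 2))        # 1.04
```

Under these assumptions, the 7.3 cm difference corresponds to roughly one pooled SD, a unit-free figure that can be set beside results from cohorts with different baseline heights.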

Neglect of age-specific models

Most existing prediction models for adult height focus predominantly on younger children, typically under 8 years of age. Shmoish et al. (10) developed a machine-learning model using a broad pediatric cohort (0-20 years); notably, the most predictive features were gender and height measured at ~3.4-6.0 years, demonstrating high predictive accuracy in early childhood. However, their model does not address the prediction challenges associated with girls who begin therapy during the later stages of pubertal development, particularly those aged 7-8 years. This subgroup, often starting treatment during the onset or progression of puberty, experiences distinct biological and hormonal changes, including variations in growth velocity, hormonal fluctuations, and bone maturation rates, that substantially influence growth patterns and complicate prediction efforts.

Shmoish et al. (10) demonstrated that ML models trained on early childhood data achieved high predictive accuracy, with root mean square error (RMSE) values under 3.5 cm. Although the study did not directly compare broader age ranges, its results suggest that limiting models to specific developmental stages may improve prediction accuracy. These findings support the argument for developing tailored predictive frameworks for clinical subgroups, such as girls with early-onset puberty, where growth dynamics are notably different. Relying on generalized models risks inaccurate projections and suboptimal treatment planning.

The oversight of age-specific variations leaves a substantial portion of the pediatric population underrepresented in research and clinical applications. Current models inadequately address the age-specific hormonal shifts associated with the onset of puberty and its suppression through treatment. These factors collectively emphasize the necessity for predictive tools specifically designed to accommodate the unique growth patterns of older prepubertal and early pubertal girls.

Focus on underlying conditions

Another notable limitation in the literature is the predominant focus on girls with underlying medical diagnoses such as CPP or growth hormone deficiency (GHD), as shown in treatment and modeling cohorts (7,13,14,16). These cohorts do not represent the full spectrum of girls undergoing GH and GnRHa therapy. Otherwise healthy girls who receive early-puberty treatment for idiopathic reasons are frequently excluded from predictive model development, creating a bias that limits model applicability in broader clinical contexts.

For instance, a study (17) utilized a deep feed-forward neural network to model outcomes in girls with GHD, achieving high predictive accuracy within their specific sample. However, because the dataset was restricted to a particular medical condition, the resulting model cannot be confidently generalized to girls experiencing idiopathic early-onset puberty. Without models trained and validated on healthy patient populations, clinicians must extrapolate findings from non-representative cohorts, thereby increasing the risk of misaligned predictions and inappropriate clinical decisions.

Many predictive models prioritize individuals with well-defined underlying conditions while overlooking girls undergoing early-onset puberty treatment. This narrowed focus limits the generalizability of existing models and leaves a significant demographic without tailored predictive tools. Girls with early-onset puberty may exhibit distinct growth patterns compared to those with underlying medical diagnoses, necessitating the development of different predictive frameworks. The lack of research attention to this group contributes to gaps in both research and clinical practice, where treatment strategies are often based on incomplete or non-representative datasets. Addressing this issue requires the creation of predictive models that are inclusive of broader patient populations, ensuring equitable and effective healthcare solutions.

Limited application of ML

Traditional statistical methods continue to dominate adult-height prediction despite well-documented limits. They assume largely linear relations and a narrow feature set, which fails to capture the nonlinear interactions among bone age, hormonal milieu, parental height, and adherence (5,9,10). The limited adoption of advanced technologies, particularly ML methods, has further exacerbated these shortcomings.

ML, with its capacity to manage multidimensional datasets and identify complex, nonlinear patterns, presents a promising alternative. However, its application in the field of pediatric endocrinology remains underexplored, creating an urgent need for innovation and the broader integration of these methodologies into clinical practice. Studies such as (15,18), which incorporated real-world adherence data collected from auto-injector devices, and (9,10,17), which applied ML and deep learning to clinical and longitudinal datasets, have demonstrated the superior predictive capabilities of ML models compared to traditional approaches. These findings support the potential of ML to enhance predictive accuracy and individualized treatment planning.

Despite these promising results, the broader clinical adoption of ML remains limited. Several barriers contribute to this underutilization, including the lack of large, accessible pediatric datasets, ethical concerns surrounding the use of sensitive patient data, and the limited familiarity of many clinicians with advanced analytical methods. Furthermore, a considerable proportion of ML research in this domain remains theoretical or restricted to simulation studies, rather than being validated in real-world clinical environments (19). This gap continues to delay the full integration of ML technologies into standard pediatric endocrine care.

Advances in ML for height prediction

ML has revolutionized predictive modeling in healthcare by offering tools to analyze complex datasets and uncover nonlinear relationships that traditional statistical methods often fail to address. In pediatric endocrinology, ML presents a new opportunity to enhance the prediction of adult height, particularly for girls with early-onset puberty undergoing GH and GnRHa therapy. By integrating diverse data points and applying advanced algorithms, ML models can provide more accurate and personalized predictions, addressing critical gaps in current clinical practice.

ML has introduced a paradigm shift in predictive modeling across multiple areas of medicine, including pediatric growth management. Unlike traditional models that are limited by their reliance on linearity and a small set of variables, ML algorithms can detect and model complex, nonlinear relationships among numerous interrelated clinical, biological, and behavioral factors. This capacity is particularly important for predicting adult height outcomes in girls with early-onset puberty, where growth trajectories are shaped by subtle and interacting variables.

Figure 2 illustrates the fundamental differences between traditional statistical models and ML models used for adult height prediction. Traditional models typically rely on a limited set of fixed inputs, such as chronological age and bone age, and use linear formulas to generate static predictions. In contrast, ML models incorporate a broader and dynamic set of inputs, including parental height, treatment adherence, and hormonal levels, and are capable of adaptively updating predictions as new clinical data emerge.

Figure 2 Comparison of traditional statistical models and machine learning models for adult height prediction.

The subsections that follow describe in detail the specific ways in which ML outperforms traditional methods, supported by evidence from existing studies on predictive accuracy, variable integration, personalized treatment insights, combination therapy analysis, and real-world clinical implementation.

Superior predictive accuracy

ML models have demonstrated substantial improvements in predictive performance compared to traditional statistical methods, particularly within the context of pediatric growth prediction. For example, the study (9) employed various ML algorithms, including RF and XGBoost, to analyze data from girls with CPP. Their models consistently outperformed conventional regression approaches, achieving lower RMSE and mean absolute error (MAE) values. These findings indicate that ML models are more effective in capturing nonlinear relationships and interaction effects among clinical variables that influence height outcomes. Recent studies have validated mathematical models specifically designed to predict adult height in girls with idiopathic CPP, emphasizing the accuracy and reliability of such models compared to traditional prediction methods (5).

Similarly, the study (10) applied multiple ML techniques to a large dataset of 2,687 Korean children and achieved an RMSE of less than 3.5 cm in predicting adult height. This level of predictive accuracy reflects a significant improvement over traditional models, which often rely on fixed growth charts and generalized assessments of bone age. The superior performance observed in these studies confirms that ML techniques offer a more flexible and individualized framework for height prediction, accommodating complex and patient-specific growth trajectories that conventional methods frequently overlook. To ensure transparency, we report per-study predictive accuracy metrics [RMSE/MAE/R² for adult-height prediction, or area under the curve (AUC)/sensitivity/specificity for diagnostic ML]. Where available, we also present the numeric gap versus traditional methods (e.g., Bayley-Pinneau or Tanner-Whitehouse).
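As a minimal sketch of how the regression metrics named above are computed, the following Python example calculates RMSE, MAE, and R² for paired adult-height values and the RMSE gap between two predictors. The function name `regression_metrics` and all height values are hypothetical and chosen for illustration only; they are not data from any study included in this review.

```python
import math


def regression_metrics(actual_cm, predicted_cm):
    """Return RMSE, MAE, and R^2 for paired adult-height values (cm)."""
    if len(actual_cm) != len(predicted_cm) or not actual_cm:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(actual_cm)
    errors = [p - a for a, p in zip(actual_cm, predicted_cm)]
    ss_res = sum(e * e for e in errors)            # residual sum of squares
    rmse = math.sqrt(ss_res / n)
    mae = sum(abs(e) for e in errors) / n
    mean_actual = sum(actual_cm) / n
    ss_tot = sum((a - mean_actual) ** 2 for a in actual_cm)
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")
    return {"rmse": rmse, "mae": mae, "r2": r2}


# Hypothetical values for illustration only (not from any cited study).
actual = [158.0, 162.5, 155.0, 160.0]          # measured adult height
ml_pred = [157.0, 163.5, 156.0, 159.0]         # assumed ML predictions
baseline_pred = [161.0, 159.5, 158.0, 157.0]   # assumed chart-based predictions

ml = regression_metrics(actual, ml_pred)
base = regression_metrics(actual, baseline_pred)

# "Gap vs. comparator": positive means the ML model has the lower error.
gap_rmse = base["rmse"] - ml["rmse"]
print(f"ML RMSE={ml['rmse']:.2f} cm, baseline RMSE={base['rmse']:.2f} cm, "
      f"gap={gap_rmse:.2f} cm")
```

Reporting the gap on the same test cohort for both predictors is what makes the comparison meaningful; an RMSE reported without a comparator, as in several rows marked NR, cannot be converted into a gap.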

Per-study predictive metrics are summarized in Table 2; full study characteristics are available in Table S1.

Table 2

Comparison of per-study metrics and conventional methods

Study	Population/task	Method(s) compared	Metric (as reported)	Traditional comparator	Gap vs. comparator
Lemaire et al. (5)	Idiopathic CPP; AH model validation	Mathematical model vs. Bayley-Pinneau	Improved accuracy vs. Bayley-Pinneau	Bayley-Pinneau	Δ vs. Bayley-Pinneau NR
Park & Lee (9)	General cohort; MPH-segmented AH prediction	Multiple ML (by parental-height segment)	RMSE <3.5 cm (by segment)	Conventional baselines	NR
Shmoish et al. (10)	General cohort; adult-height prediction	Multiple ML models	RMSE <3.5 cm	Chart/formula baselines	NR
Ilyas et al. (17)	GHD planning; height prediction	Deep learning	Metric NR in manuscript	—	—
Diagnostic ML (context—not AH prediction)
Chen et al. (20)	CPP diagnosis (meta-analysis)	Multiple ML	AUC ≈0.90; sensitivity ≈0.82; specificity ≈0.85	—	—
Chen et al. (21)	CPP diagnosis	ML classifiers	AUC/sensitivity/specificity (per study)	—	—

AH, adult height; AUC, area under the curve; CPP, central precocious puberty; GHD, growth hormone deficiency; ML, machine learning; MPH, mid-parental height; NR, not reported; RMSE, root mean square error.
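For the diagnostic rows of Table 2, the sketch below shows one common way to compute sensitivity, specificity, and AUC from binary labels and model scores. It is an illustrative sketch only: the function names, the 0.5 threshold, and the labels and scores are assumptions, not values or methods taken from the cited studies.

```python
def sensitivity_specificity(labels, scores, threshold):
    """Sensitivity and specificity when score >= threshold means 'positive'."""
    tp = fn = tn = fp = 0
    for y, s in zip(labels, scores):
        predicted_positive = s >= threshold
        if y == 1:
            if predicted_positive:
                tp += 1
            else:
                fn += 1
        else:
            if predicted_positive:
                fp += 1
            else:
                tn += 1
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative case")
    return tp / (tp + fn), tn / (tn + fp)


def auc(labels, scores):
    """AUC as the probability that a random positive case scores higher
    than a random negative case (ties count as one half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical labels (1 = CPP, 0 = not CPP) and classifier scores.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2]

sens, spec = sensitivity_specificity(labels, scores, threshold=0.5)
area = auc(labels, scores)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}, AUC={area:.2f}")
```

Sensitivity and specificity depend on the chosen threshold, whereas AUC summarizes ranking quality across all thresholds, which is why the three figures are usually reported together.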

Incorporation of key variables

A significant strength of ML lies in its flexibility to incorporate a wide range of predictive variables into modeling processes. Traditional approaches often rely on a limited set of features, typically chronological age, bone age, and baseline height, while excluding critical factors such as treatment adherence, hormonal levels, and dynamic behavioral responses. ML algorithms overcome this limitation by accommodating a much broader array of both structured and unstructured clinical data.

Studies such as (15,18) have demonstrated that adherence to GH therapy, measured through connected auto-injector devices, can significantly predict long-term treatment outcomes. The same studies highlighted the importance of integrating behavioral patterns and dosing consistency, showing that these real-world factors strongly influenced prediction accuracy within a large cohort of over 10,000 children. These findings underscore the critical role of incorporating real-world behavioral and clinical data into predictive models, a capability that is made feasible through ML and remains largely unattainable with traditional methods.

Moreover, ML models can continuously update predictions as new patient data becomes available. This adaptability is particularly valuable in pediatric growth prediction, where each new height measurement, laboratory result, or treatment adherence record can refine future forecasts. By dynamically incorporating the most current and comprehensive information, ML ensures that clinicians are making better-informed, individualized treatment decisions.

Personalized treatment insights

ML enables clinicians to move beyond generalized predictions by offering personalized treatment insights tailored to individual patient profiles. By analyzing patient-specific data, ML models can identify key factors influencing growth outcomes and suggest optimized treatment strategies, aligning with clinical guidance that emphasizes individualized assessment and treatment planning for children with growth disorders (6). Studies such as (18) have emphasized the value of personalized medicine in improving patient outcomes, and ML tools directly support this goal by providing actionable insights that enhance therapeutic efficacy and patient satisfaction.

ML allows predictive modeling to shift from population-level averages toward individualized forecasts based on each patient’s unique clinical and biological profile. This approach aligns with the growing emphasis on personalized medicine in pediatric care, as discussed in (9). ML models such as XGBoost and LightGBM have demonstrated the ability to isolate the clinical variables most influential to individual outcomes, such as bone maturity, puberty stage, and family growth history. Recent meta-analytic evidence from ML-based CPP diagnosis further supports this, reporting AUC ≈0.90, sensitivity ≈0.82, and specificity ≈0.85 (20), and underscoring that integrated clinical, hormonal, and imaging inputs can be predictively informative for related tasks. By identifying these patient-specific factors, ML empowers physicians to tailor treatment plans more precisely, whether by adjusting therapy durations, modifying intervention timing, or setting more realistic expectations for families. This individualized predictive capability enhances clinical decision-making, reduces psychological burden, and improves overall satisfaction with the care process.

Combination therapy insights

Research on combination therapies, such as GH and GnRHa treatments, has shown significant promise in improving adult height outcomes. However, traditional models often fail to account for the nuanced effects and interactions between these therapies. ML models offer a distinct advantage by analyzing the complex relationships between individual treatment responses and therapeutic outcomes, providing deeper and more actionable insights.

Previous randomized controlled studies further substantiate the benefits of combined therapies. For instance, the study (16) conducted a randomized trial in adopted girls with early puberty and found that combining GH with GnRHa treatment notably improved growth velocity over two years by an average of 3.7 cm compared to GnRHa alone. Their findings highlight the potential clinical utility of combination treatments for improving final adult height outcomes.

Across multiple cohorts (7,13,16), combination therapy has shown benefits, particularly in improving growth outcomes for girls with early-onset puberty or growth deceleration. ML tools can further refine these findings by modeling therapy-specific variables and their interactions, allowing clinicians to optimize treatment plans for maximum efficacy. This approach enables individualized treatment optimization based on patient-specific profiles, rather than relying on generalized population outcomes.

Traditional statistical models are often limited in their ability to accommodate the intricate interactions between multiple treatments, frequently leading to oversimplified conclusions. In contrast, ML models offer unparalleled utility in identifying which subgroups are most likely to benefit from combination therapy and under what specific conditions. While studies such as (13) have supported the efficacy of combination therapies in improving growth outcomes, their conclusions were drawn from traditional cohort analyses that assume similar responses across diverse patient groups. Similarly, the study (21) relied on conventional statistical models, which may overlook the variability in individual treatment response.

ML algorithms can simulate various treatment sequences, predict individualized growth outcomes, and estimate which therapeutic approach will maximize final adult height. This predictive functionality holds substantial potential for refining treatment planning, minimizing unnecessary therapy durations, and reducing the risk of suboptimal outcomes.

Model comparisons and interpretability

Different ML models bring distinct strengths to the task of adult height prediction. RF is well-regarded for its robustness and interpretability, providing variable importance rankings that help clinicians understand the drivers behind model predictions. XGBoost, meanwhile, offers superior generalization performance and often outperforms other algorithms on small- to medium-sized structured datasets, as demonstrated by (9). Deep neural networks, while offering high predictive accuracy, require larger datasets and greater computational resources, and they also pose significant challenges related to model transparency.

As demonstrated in device-based adherence modeling (15), deep learning and ensemble models used in clinical contexts should be accompanied by interpretability frameworks such as SHapley Additive exPlanations (SHAP) to enhance transparency, build clinical trust, and support ethical deployment in real-world settings. Interpretability remains a key barrier to the widespread adoption of ML in healthcare, and ongoing research must prioritize the development of models that balance predictive complexity with clinical usability.
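To make the idea behind SHAP concrete, the sketch below computes exact Shapley values by brute-force enumeration of feature coalitions, which is the quantity that SHAP estimates efficiently for larger models. It does not use the SHAP library. The model `toy_height_model`, its coefficients, the feature names, and the patient and baseline values are all hypothetical and serve only as an illustration.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, instance, baseline):
    """Exact Shapley attribution of predict(instance) - predict(baseline).

    Features outside a coalition are set to their baseline values.
    Cost grows as 2^n, so this is only practical for a handful of features.
    """
    names = list(instance)
    n = len(names)

    def value(coalition):
        x = {k: (instance[k] if k in coalition else baseline[k]) for k in names}
        return predict(x)

    phi = {}
    for name in names:
        others = [k for k in names if k != name]
        total = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {name}) - value(s))
        phi[name] = total
    return phi


def toy_height_model(x):
    """Hypothetical predictor of adult height (cm); coefficients are invented."""
    return (80.0
            + 0.5 * x["mid_parental_height_cm"]
            - 2.0 * x["bone_age_advance_yr"]
            + 4.0 * x["adherence"]
            - 1.0 * x["bone_age_advance_yr"] * (1.0 - x["adherence"]))


patient = {"mid_parental_height_cm": 160.0, "bone_age_advance_yr": 2.0, "adherence": 0.9}
reference = {"mid_parental_height_cm": 162.0, "bone_age_advance_yr": 1.0, "adherence": 0.8}

phi = shapley_values(toy_height_model, patient, reference)
for feature, contribution in phi.items():
    print(f"{feature}: {contribution:+.2f} cm")
```

The contributions sum to the difference between the patient's prediction and the reference prediction, so a clinician can read each value as the number of centimetres a feature adds to or subtracts from the forecast relative to the reference profile.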

Generalizability and real-world challenges

Despite their promise, ML models are not without limitations. One critical concern is generalizability, as models trained on a specific population may not perform well when applied to different demographic groups, clinical practices, or data collection methods. While studies from Qatar and Jordan provide valuable insights into treatment outcomes in Middle Eastern cohorts (8,14), their relatively small sample sizes highlight the continued need for broader, multi-center datasets to ensure robust generalization. This limitation can undermine the broader applicability of predictive models across diverse healthcare settings.

Federated learning has been proposed as a potential solution to this challenge. As discussed by (19), federated learning enables models to be trained collaboratively across multiple institutions without requiring centralized sharing of sensitive patient information. This approach helps maintain data privacy while improving the robustness and generalizability of models. Transfer learning, in which models initially developed for related medical tasks are adapted for new applications such as adult height prediction, also offers a promising pathway for expanding model utility and enhancing performance across varied populations. A study (22) demonstrated the effectiveness of transfer learning in pediatric hypertension prediction across multiple South American cohorts. By adapting models trained on children to adolescent datasets, the study significantly improved performance metrics such as the area under the receiver operating characteristic curve, showing how transfer learning can overcome data scarcity and variability in pediatric populations.
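The federated learning idea described above can be sketched with federated averaging on a one-feature linear model. This is a simplified illustration under stated assumptions, not the protocol of any cited study: the two "hospital" datasets are synthetic, and the function names, learning rate, and round counts are hypothetical. Only model parameters leave each site; the raw records never do.

```python
def local_update(weights, data, lr=0.1, epochs=5):
    """Run a few gradient-descent steps on one site's private data."""
    w, b = weights
    n = len(data)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in data:
            err = (w * x + b) - y
            grad_w += 2.0 * err * x
            grad_b += 2.0 * err
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return (w, b)


def federated_average(site_weights, site_sizes):
    """Combine site models, weighting each by its number of records."""
    total = sum(site_sizes)
    w = sum(wi * n for (wi, _), n in zip(site_weights, site_sizes)) / total
    b = sum(bi * n for (_, bi), n in zip(site_weights, site_sizes)) / total
    return (w, b)


def train_federated(sites, rounds=100):
    """Server loop: broadcast weights, collect local updates, average them."""
    weights = (0.0, 0.0)
    for _ in range(rounds):
        updates = [local_update(weights, data) for data in sites]
        weights = federated_average(updates, [len(d) for d in sites])
    return weights


# Synthetic (x, y) pairs generated from y = 2x + 1; not patient data.
hospital_a = [(-1.0, -1.0), (0.0, 1.0), (1.0, 3.0)]
hospital_b = [(-0.5, 0.0), (0.5, 2.0)]

w, b = train_federated([hospital_a, hospital_b])
print(f"global model: y = {w:.3f} * x + {b:.3f}")
```

In practice the sites would hold differently distributed cohorts, so the averaged model would be a compromise rather than an exact fit, and additional safeguards would be needed because shared parameters can still leak information.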

However, real-world implementation of ML models faces additional logistical and ethical challenges. Data privacy, especially concerning pediatric populations, remains a major concern. Models must comply with stringent data protection regulations while remaining transparent and explainable to both clinicians and patients. Furthermore, successful adoption requires substantial clinician education on ML principles, integration into electronic health record systems, and ongoing model validation to ensure consistent performance over time.

In summary, ML models have demonstrated substantial advantages over traditional statistical approaches in predicting adult height, particularly through their ability to integrate multidimensional clinical data, personalized treatment strategies, and complex model interactions. However, challenges related to generalizability, interpretability, and real-world implementation remain and must be addressed to fully realize their clinical potential. Continued research focusing on ethical deployment, clinician education, and robust validation across diverse populations will be essential to ensure the successful integration of ML tools into pediatric endocrinology practice.

Analysis of existing literature

The literature highlights significant progress as well as critical gaps in the field of adult height prediction for girls undergoing GH and GnRHa treatment. Existing predictive models demonstrate value but often fall short in addressing the specific needs of girls with early-onset puberty, who experience unique growth patterns and hormonal dynamics distinct from other pediatric demographics. Traditional statistical models, although foundational, fail to capture the complex and nonlinear relationships among key variables such as bone age, hormonal levels, and treatment adherence, ultimately limiting their accuracy and applicability in real-world scenarios (9,17).

ML has emerged as a promising alternative, offering superior predictive accuracy and adaptability. Algorithms such as RF and XGBoost have demonstrated strong capabilities in processing multidimensional datasets and identifying critical variable interactions (9,15). Despite these advancements, the application of ML in this context remains limited. Notably, many studies focus primarily on individuals with underlying medical conditions such as CPP, leaving healthy girls undergoing early puberty treatments significantly underrepresented (7,13,14,16).

Several limitations continue to hinder the development of inclusive and reliable predictive tools. A major concern is the lack of model generalizability, as most studies rely on controlled datasets that do not capture the variability inherent in broader clinical settings. There is also a strong focus on younger age groups in the literature. For example, (10) found that features from early childhood (around ages 3.4-6.0 years) contribute most strongly to adult height prediction, but this emphasis on younger cohorts leaves the distinct developmental and therapeutic dynamics of girls aged 7-10 years underexplored. These issues highlight an urgent need for comprehensive predictive frameworks that integrate real-world clinical data and address the diverse needs of this underrepresented demographic.

This review focuses on synthesizing clinically relevant insights across heterogeneous study types (adult-height prediction models, treatment outcome cohorts, and diagnostic ML for CPP). While such diversity precludes a single pooled estimate, it allows a broader appraisal of when and why models perform well. Reporting standards varied across studies (e.g., exact PAH/bone-age methods, dosing/duration, or adherence metrics), so we explicitly extracted and tabulated these fields where available and flagged them when not reported, to maintain transparency without over-interpreting gaps. Several modeling papers emphasized internal validation; we therefore interpret accuracy estimates as context-dependent and highlight the importance of external or temporal validation for routine use. Finally, our deliberate scope (peer-reviewed, English-language sources) prioritizes methodological clarity over exhaustiveness; future multi-center work can extend this synthesis. Taken together, these choices reflect a conservative, clinically oriented interpretation rather than a limitation of substance, and they point to practical next steps already outlined in our roadmap.

Future research should prioritize integrating real-world clinical datasets, developing interpretable ML frameworks for clinical applications, and conducting validation studies across diverse populations and institutions. Additionally, ethical considerations related to data privacy, model transparency, and patient consent must remain central to the responsible deployment of ML tools. The proposed study aims to address these challenges by leveraging ML methodologies to develop a robust framework that aligns with broader goals of precision medicine. By incorporating real-world clinical data from Saudi hospitals and focusing on girls with early-onset puberty, this research seeks to advance clinical decision-making and improve outcomes in pediatric endocrinology.

Building on these priorities, several open questions define a practical agenda for the field: external validity across settings: prospective testing of models in ethnically and clinically diverse cohorts to quantify transportability and calibration; age-tailored algorithm choice: head-to-head comparisons of RF, XGBoost, and compact neural models specifically for the 7-10-year subgroup under identical features and metrics (RMSE/MAE, calibration); complexity-interpretability trade-offs: whether modest accuracy gains justify reduced transparency, evaluated with explainability tools (e.g., SHAP) and clinician usability outcomes; multimodal predictors: the incremental value of genetics over a clinical core (bone age, growth velocity, adherence), assessed with net reclassification and decision-curve analysis; and clinical impact: temporal/external validation linked to real-world outcomes (final adult height, timing/duration of therapy, adverse effects, costs) via pragmatic trials or registry-based studies. Future studies applying ML to ethnically diverse cohorts, younger subgroups, and genetic integration, with attention to interpretability and long-term validation, are needed to address these open questions.

Conclusions

This literature review identifies critical limitations in traditional models used for adult height prediction and highlights the potential of ML approaches to address these deficiencies. Studies such as (10,17,18) demonstrate that ML models consistently outperform conventional methods in both predictive accuracy and adaptability, particularly for underserved populations such as girls with early-onset puberty and poor predicted adult height undergoing combination GH and GnRHa therapy.

ML models are uniquely positioned to integrate diverse and evolving data sources, including growth trajectories, genetic markers, and real-world adherence patterns. This capacity enables the generation of highly individualized and precise predictions, moving beyond the limitations of traditional linear models. Nevertheless, realizing the full clinical potential of ML requires addressing several challenges, including data availability, model interpretability, and ethical implementation practices.

Future research should focus on developing interpretable ML frameworks that clinicians can easily adopt and trust. Collaborative initiatives using federated learning approaches are essential to improve model generalizability across diverse populations without compromising patient privacy. Furthermore, stronger partnerships between endocrinologists, data scientists, and bioethicists are necessary to ensure that ML tools are ethically designed, clinically relevant, and practically deployable.

When responsibly implemented, ML-enhanced predictive tools have the potential to significantly reduce prediction errors, personalize treatment strategies, and ultimately transform pediatric growth management into a more dynamic, equitable, and precise field of care.

Acknowledgments

None.

Footnote

Reporting Checklist: The authors have completed the PRISMA-ScR reporting checklist. Available at https://jmai.amegroups.com/article/view/10.21037/jmai-2025-149/rc

Peer Review File: Available at https://jmai.amegroups.com/article/view/10.21037/jmai-2025-149/prf

Funding: None.

Conflicts of Interest: All authors have completed the ICMJE uniform disclosure form (available at https://jmai.amegroups.com/article/view/10.21037/jmai-2025-149/coif). The authors have no conflicts of interest to declare.

Ethical Statement: The authors are accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved.

Open Access Statement: This is an Open Access article distributed in accordance with the Creative Commons Attribution-NonCommercial-NoDerivs 4.0 International License (CC BY-NC-ND 4.0), which permits the non-commercial replication and distribution of the article with the strict proviso that no changes or edits are made and the original work is properly cited (including links to both the formal publication through the relevant DOI and the license). See: https://creativecommons.org/licenses/by-nc-nd/4.0/.

References

1. Carel JC, Eugster EA, Rogol A, et al. Consensus statement on the use of gonadotropin-releasing hormone analogs in children. Pediatrics 2009;123:e752-62.
2. Berberoğlu M. Precocious puberty and normal variant puberty: definition, etiology, diagnosis and current management. J Clin Res Pediatr Endocrinol 2009;1:164-74.
3. Labarta JI, Ranke MB, Maghnie M, et al. Important Tools for Use by Pediatric Endocrinologists in the Assessment of Short Stature. J Clin Res Pediatr Endocrinol 2021;13:124-35.
4. Palmert MR, Dunkel L. Delayed puberty. N Engl J Med 2012;366:443-53.
5. Lemaire P, Duhil de Bénazé G, Mul D, et al. A mathematical model for predicting the adult height of girls with idiopathic central precocious puberty: A European validation. PLoS One 2018;13:e0205318.
6. Collett-Solberg PF, Ambler G, Backeljauw PF, et al. Diagnosis, Genetics, and Therapy of Short Stature in Children: A Growth Hormone Research Society International Perspective. Horm Res Paediatr 2019;92:1-14.
7. Dotremont H, France A, Heinrichs C, et al. Efficacy and safety of a 4-year combination therapy of growth hormone and gonadotropin-releasing hormone analogue in pubertal girls with short predicted adult height. Front Endocrinol (Lausanne) 2023;14:1113750.
8. Alaaraj N, Soliman AT, De Sanctis V, et al. Growth, bone maturation and ovarian size in girls with early and fast puberty and effects of three years treatment with GnRH analogue. Acta Biomed 2022;92:e2021333.
9. Park JS, Lee DH. Improving the accuracy of adult height prediction with exploiting multiple machine learning models. IEEE Access 2023;11:14454-64.
10. Shmoish M, German A, Devir N, et al. Prediction of Adult Height by Machine Learning Technique. J Clin Endocrinol Metab 2021;106:e2700-10.
11. Tricco AC, Lillie E, Zarin W, et al. PRISMA Extension for Scoping Reviews (PRISMA-ScR): Checklist and Explanation. Ann Intern Med 2018;169:467-73.
12. Page MJ, McKenzie JE, Bossuyt PM, et al. The PRISMA 2020 statement: an updated guideline for reporting systematic reviews. BMJ 2021;372:
13. Cho AY, Shim YS, Lee HS, et al. Effect of gonadotropin-releasing hormone agonist monotherapy and combination therapy with growth hormone on final adult height in girls with central precocious puberty. Sci Rep 2023;13:1264.
14. Swaiss HH, Khawaja NM, Farahid OH, et al. Effect of gonadotropin-releasing hormone analogue on final adult height among Jordanian children with precocious puberty. Saudi Med J 2017;38:1101-7.
15. Spataru A, van Dommelen P, Arnaud L, et al. Use of machine learning to identify patients at risk of suboptimal adherence: study based on real-world data from 10,929 children using a connected auto-injector device. BMC Med Inform Decis Ma 2022;22:179.
16. Tuvemo T. Growth hormone treatment during suppression of early puberty in adopted girls. Acta Paediatr 2007;96:1344-9.
17. Ilyas M, Ahmad J, Lawson A, et al. Height prediction for growth hormone deficiency treatment planning using deep learning. In: Advances in Brain Inspired Cognitive Systems. Cham: Springer; 2020:76-85.
18. Spataru A, van Dommelen P, Arnaud L, et al. A machine learning approach for identifying children at risk of suboptimal adherence to growth hormone therapy. J Endocr Soc 2021;5:A672-3.
19. Loftus TJ, Ruppert MM, Shickel B, et al. Federated learning for preserving data privacy in collaborative healthcare research. Digit Health 2022;8:20552076221134455.
20. Chen Y, Huang X, Tian L. Meta-analysis of machine learning models for the diagnosis of central precocious puberty based on clinical, hormonal and imaging data. Front Endocrinol (Lausanne) 2024;15:1353023.
21. Chen YS, Liu CF, Sung MI, et al. Machine learning approach for prediction of the test results of gonadotropin-releasing hormone stimulation. Diagnostics (Basel) 2023;13:1550.
22. Araujo-Moura K, Souza L, de Oliveira TA, et al. Prediction of Hypertension in the Pediatric Population Using Machine Learning and Transfer Learning: A Multicentric Analysis of the SAYCARE Study. Int J Public Health 2025;70:1607944.

doi: 10.21037/jmai-2025-149
Cite this article as: Saif K, Alharbi A, Samra H, Alyahyawi N. Machine learning approaches for enhancing adult height prediction in girls with early-onset and rapidly progressing puberty undergoing GH and GnRHa therapy: a scoping review. J Med Artif Intell 2026;9:19.
