Development of an explainable machine learning model to predict live birth versus miscarriage among in vitro fertilization-embryo transfer pregnancies

Mehtap Agirsoy; Matthew A. Oehlschlaeger

doi:10.21037/jmai-25-75

Original Article


Department of Mechanical, Aerospace, and Nuclear Engineering, Rensselaer Polytechnic Institute, Troy, NY, USA

Contributions: (I) Conception and design: Both authors; (II) Administrative support: MA Oehlschlaeger; (III) Provision of study materials or patients: M Agirsoy; (IV) Collection and assembly of data: M Agirsoy; (V) Data analysis and interpretation: M Agirsoy; (VI) Manuscript writing: Both authors; (VII) Final approval of manuscript: Both authors.

Correspondence to: Mehtap Agirsoy, PhD Candidate. Department of Mechanical, Aerospace, and Nuclear Engineering, Rensselaer Polytechnic Institute, 110 Eighth St., Troy, 12180, NY, USA. Email: agirsm@rpi.edu.

Background: While in vitro fertilization (IVF) has enabled many couples to conceive, miscarriage remains common even after clinical pregnancy is confirmed. Accurately predicting live birth versus miscarriage among in vitro fertilization-embryo transfer (IVF-ET) patients with confirmed pregnancy is a critical yet underexplored challenge in reproductive medicine. This study aims to develop and validate an interpretable machine learning (ML) model specifically for patients with confirmed clinical pregnancies following IVF.

Methods: We used a publicly available dataset from Yuan et al. (2022), comprising 1,017 singleton IVF-ET pregnancies. Patients with ectopic/multiple gestations, systemic illness, or incomplete records were excluded. Fifty-five clinical, hormonal, immune, and treatment-related features were extracted, including engineered variables such as anti-Müllerian hormone (AMH) × age, follicle-stimulating hormone (FSH)/luteinizing hormone (LH) ratio, and endometrial quality. Data were split into 80% training and 20% testing sets. A soft-voting ensemble model [extreme gradient boosting (XGBoost), LightGBM, logistic regression] was developed and compared to a standalone XGBoost classifier. Models were evaluated using area under the curve (AUC), precision, F1-score, accuracy, sensitivity, specificity, and SHapley Additive exPlanations (SHAP)-based interpretability.

Results: The XGBoost model outperformed the ensemble, achieving an AUC of 0.99 [95% confidence interval (CI): 0.98–1.00] on the training set and 0.91 (95% CI: 0.86–0.96) on the test set, with an F1-score of 0.93 (95% CI: 0.89–0.96) and accuracy of 0.89 (95% CI: 0.85–0.93). The ensemble model achieved AUC =0.89 with improved recall (sensitivity =0.96) and specificity =0.66 at an optimized threshold of 0.16. SHAP analysis highlighted AMH, prolactin (PRL), FSH, erythrocyte sedimentation rate (ESR), and anti-nuclear antibody (ANA) as top contributors. Engineered features were not globally dominant but showed potential in subgroups.

Conclusions: This interpretable ML model provides an accurate prediction of IVF-ET outcomes in patients with confirmed pregnancy. Findings support the integration of artificial intelligence (AI) into post-transfer risk assessment and highlight the need for external validation in diverse populations.

Keywords: Machine learning (ML); extreme gradient boosting (XGBoost); live birth outcomes (LBO); in vitro fertilization-embryo transfer (IVF-ET); patient outcomes optimization

Received: 16 February 2025; Accepted: 18 June 2025; Published online: 25 July 2025.


Highlight box

Key findings

• An extreme gradient boosting (XGBoost)-based model achieved high accuracy [area under the curve (AUC) =0.91] in predicting live birth versus miscarriage among patients with confirmed pregnancy following in vitro fertilization-embryo transfer (IVF-ET).

• SHapley Additive exPlanations (SHAP) analysis identified body mass index (BMI), years of infertility and select immune/coagulation markers [e.g., erythrocyte sedimentation rate (ESR), total testosterone (TT)] as top predictors of outcome.

• Engineered features [e.g., anti-Müllerian hormone (AMH) × age, endometrial quality] showed limited global importance.

What is known and what is new?

• Most machine learning (ML) models for IVF focus on predicting whether pregnancy occurs and use common predictors such as age, AMH, and follicle-stimulating hormone (FSH).

• This study specifically targets patients already pregnant via IVF-ET to distinguish between live birth and miscarriage. It incorporates domain-informed feature engineering and SHAP-based explainability to enhance interpretability and clinical relevance.

What is the implication, and what should change now?

• Post-conception risk prediction using ML enables early risk stratification and tailored monitoring for IVF patients who have conceived.

• Clinical workflows can begin to integrate explainable artificial intelligence (AI) models for individualized post-transfer counseling. Future work should focus on prospective validation, external cohorts, and streamlined feature sets for real-world implementation.

Introduction

Accurately predicting whether a clinically confirmed pregnancy following in vitro fertilization (IVF) will result in live birth or miscarriage remains a critical yet underexplored challenge in reproductive medicine, particularly as up to 20% of such pregnancies result in miscarriage (1,2). While assisted reproductive technologies (ART) have enabled millions of couples to conceive, miscarriage after successful embryo transfer remains a common outcome, contributing to emotional distress and healthcare burden (3). According to the World Health Organization (WHO), approximately 17.5% of adults globally experience infertility at some stage in their lives (4). In the USA, over 238,000 ART cycles were initiated in 2021, resulting in 91,906 live births, a live birth rate of 47.2% among women under 35 (5). However, a significant proportion of pregnancies achieved through IVF end in early pregnancy loss, with reported miscarriage rates among IVF-embryo transfer (IVF-ET) patients ranging from 18% to 30% (6).

Artificial intelligence (AI), particularly machine learning (ML), offers a powerful means of enhancing outcome prediction by identifying complex, nonlinear relationships across diverse physiological domains (7,8). AI-assisted IVF has shown promise in optimizing medication protocols, predicting pregnancy outcomes, and improving clinical decision-making (9,10). Yet, most existing ML models focus on predicting whether a patient will become pregnant following IVF (9,11). They often include both pregnant and non-pregnant patients in the modeling cohort, aiming to estimate overall IVF success (12). In contrast, our study specifically targets a more narrowly defined and clinically relevant subpopulation: patients who have already achieved clinical pregnancy after IVF-ET. The aim is to predict whether these pregnancies will proceed to live birth or result in miscarriage.

This distinction is critical. Prediction models that apply only after pregnancy confirmation serve a different purpose than those predicting IVF success from the outset (12-16). They can support clinicians in early identification of high-risk pregnancies, guide post-transfer monitoring, and inform patient counseling (15-18). However, few ML studies have addressed this stage-specific outcome prediction, and even fewer incorporate interpretable modeling strategies that clinicians can trust and apply in practice.

Literature review

ML has become an increasingly prominent tool in ART, particularly for predicting live birth outcomes (LBO) following IVF (14-21). Among various ML algorithms, extreme gradient boosting (XGBoost) has consistently demonstrated superior performance due to its ability to model complex, nonlinear interactions. As summarized in Table 1, several studies have successfully applied XGBoost to predict LBO or miscarriage outcomes in IVF cohorts. For instance, Chen et al. achieved an area under the curve (AUC) of 0.90 using XGBoost on a dataset of 3,012 patients, identifying age and early-cycle hormone levels as key predictors (30). Similarly, Wen et al. and Qiu et al. reported AUC values exceeding 0.70, underscoring the robustness of XGBoost in reproductive outcome modeling (9,29).

Table 1

Prior studies implementing ML models for predicting LBO-miscarriage among IVF patients

Study (years)	Dataset size	Models used	Key features identified or key findings	Best model performance	Population description
Liu et al. (2024) (22)	1,405 patients	ANN, SVM^†	Age, OSI, COS treatment regimen	AUC: 0.91 (train); 0.85 (test)	IVF-conceived patients (pregnancy confirmed)
			Gn starting dose, endometrial thickness on HCG day, P value on HCG day	Precision: 0.86 (train); 0.78 (test)
			Embryo transfer strategy
Liu et al. (2023) (23)	1,857 patients	LR, RF, XGBoost, LGBM^†	Maternal age, AMH, basal FSH, basal P, basal E2, infertility duration, the number of left sinus follicles	AUC: 0.84 (train); 0.82 (valid); 0.84 (test)	IVF patients (likely mixed conception status)
Wang et al. (2022) (24)	3,012 patients	XGBoost^†	Age, E2 on day 2, PRL on day 2, LH on day 2, basal LH, basal E2, basal PRL, FSH consumption	AUC: 0.90	All IVF cycles (pregnant and non-pregnant)
				Specificity: 0.83
				Sensitivity: 0.81
Zhang et al. (2022) (15)	57,558 patients	DT, LD, LR, NB, SVM, ANN	Not specified	AUC: 0.79	Mixed IVF population from national registry
Zhang et al. (2022) (15)	57,558 patients	BT, AdaBoost, Gentle Boost, LogitBoost^†, RUSBoost, RSM	Not specified	F1 score: 0.71	Mixed IVF population from national registry
Bardet et al. (2022) (25)	3,045 patients	Cox regression, XGBoost^†	Age, ovarian failure, BMI, male age, infertility duration, endometriosis, dysovulatory, tubal infertility, year of 1st cycle	C-index: 0.60	General ART population
Amini et al. (2021) (26)	4,930 cycles	LR, SVM, XGBoost, RF^†, NB, LDA	Embryo number, number of injected oocytes, cause of infertility, age, PCO, number of previous IVF	AUC: 0.68 (train); 0.60 (test)	Mixed IVF patients (pre- and post-conception)
Amini et al. (2021) (26)	4,930 cycles	LR, SVM, XGBoost, RF^†, NB, LDA	History of abortion, infertility type	AUC: 0.68 (train); 0.60 (test)	Mixed IVF patients (pre- and post-conception)
Goyal et al. (2020) (27)	141,160 patients	MLP, KNN, DT, RF^†, Adaboost, voting hard classifier, voting soft classifier	Not specified	AUC: 0.75	ART cycles registry (pregnant and non-pregnant)
Goyal et al. (2020) (27)	141,160 patients		Not specified	F1 score: 0.77	ART cycles registry (pregnant and non-pregnant)
Barnett-Itzhaki et al. (2020) (8)	136 patients	SVM, ANN^†, LR	Clinical data, age, BMI, number of transferred embryos	AUC: 0.77	IVF-conceived cohort (pregnancy likely confirmed)
Barnett-Itzhaki et al. (2020) (8)	136 patients	SVM, ANN^†, LR	Clinical data, age, BMI, number of transferred embryos	Accuracy: 0.87	IVF-conceived cohort (pregnancy likely confirmed)
Vogiatzi et al. (2019) (28)	257 patients	ANN^†	Age, age at menarche, difficulty at ET, endometrium thickness	Sensitivity: 0.78 (training); 0.71 (testing)	IVF patients (mixed cycle types: fresh/frozen)
Vogiatzi et al. (2019) (28)	257 patients	ANN^†	ET/2PN, TQE D3, TQE D3/2PN, total Gn, age group, dyspareunia, fresh or frozen cycle, menarche >12 years	Specificity: 0.75 (training); 0.70 (testing)	IVF patients (mixed cycle types: fresh/frozen)
Qiu et al. (2019) (29)	7,188 patients	XGBoost^†	Age, AMH, BMI, duration of infertility	AUC: 0.73	All IVF patients (pregnancy and outcome status mixed)
			Previous live birth, previous miscarriage	Accuracy: 0.70
			Previous abortion and type of infertility
Yuan et al. (2022) (6)	1,017 patients	XGBoost^†	XGBoost demonstrated high predictive accuracy for missed abortion in IVF-ET patients	AUC: 0.87	Pregnant IVF patients (first-trimester miscarriage prediction)
Yuan et al. (2022) (6)	1,017 patients	XGBoost^†		F1: 0.730
Amitai et al. (2023) (19)	391 patients	XGBoost, RF^†	The models showed moderate accuracy in predicting miscarriage based on time-lapse images of preimplantation development	AUC: 0.69
Current study	1,017 patients	XGBoost^†, Ensemble		AUC: 0.91	Pregnant IVF patients (first-trimester miscarriage prediction)
				F1: 0.93
				Precision: 0.89
				Accuracy: 0.89

The best model for each study is indicated with ^†. AMH, anti-Müllerian hormone; ANA, anti-nuclear antibody; AUC, area under the curve; BMI, body mass index; BT, basal temperature; COS, controlled ovarian stimulation; DT, decision tree; E2, estradiol; ET, embryo transfer; ET/2PN, ratio of number of embryos transferred to the uterus to oocytes fertilized (2PN) following OR; FSH, follicle-stimulating hormone; Gn, gonadotropins; HCG, human chorionic gonadotropin; IVF, in vitro fertilization; IVF-ET, in vitro fertilization-embryo transfer; KNN, K-nearest neighbour; LBO, live birth outcomes; LD, linear discriminant; LDA, linear discriminant analysis; LGBM, light gradient boosting machine; LH, luteinizing hormone; LR, logistic regression; ML, machine learning; MLP, multilayer perceptron; NB, naïve Bayes; OR, oocyte retrieval; OSI, ovarian sensitivity index; P, progesterone; PCO, polycystic ovaries; PRL, prolactin; RF, random forest; RSM, response surface methodology; SVM, support vector machine; TQE D3, top-quality embryos at day 3 of development; TQE D3/2PN, ratio of top quality embryos at day 3 of development to oocytes fertilized (2PN); total gonadotropins, total dose of gonadotropins administered (IU) during the stimulation protocol in a single cycle.

Large-scale studies further validate the performance of ML methods. Goyal et al. developed a random forest (RF) model using data from over 141,000 IVF cycles and achieved an AUC of 0.85 (27). Zhang et al. used a LogitBoost model on 57,558 IVF-linked cycles, reporting an AUC of 0.79 (15). These studies emphasize the value of ensemble models and large datasets in improving predictive accuracy.

Across existing models, several features frequently emerge as dominant predictors: female age, anti-Müllerian hormone (AMH), follicle-stimulating hormone (FSH), luteinizing hormone (LH), estradiol (E2), body mass index (BMI), duration of infertility, and prior miscarriage history. However, most studies focus on predicting whether IVF will result in pregnancy and do not assess outcomes among patients who have already conceived.

Several recent efforts have begun to explore this space. Tarasconi et al. found low AMH levels to be a strong predictor of miscarriage risk in IVF-ET pregnancies (31). Liu et al. used fetal heart rate (FHR) with a RF model to predict early pregnancy loss, achieving an AUC of 0.97 (22). Yuan et al. applied XGBoost to a dataset of IVF pregnancies and achieved an AUC of 0.88 for predicting missed abortion (6). Luan et al. integrated lipidomic biomarkers, while Amitai et al. employed time-lapse embryo imaging to assess early miscarriage risk (19,32).

Despite these contributions, existing models rarely incorporate interaction terms or treatment-stage variables such as embryo quality or endometrial metrics (9,16-19,33-36). Model interpretability is also limited; most approaches function as black boxes, providing accurate predictions but little insight into why a particular outcome was forecasted. Furthermore, few studies focus exclusively on clinically confirmed IVF pregnancies, where miscarriage prediction could provide immediate clinical utility. Notably, this gap is highlighted in several works that span a broad spectrum of approaches and datasets (9,11,15,18,20,22,27,37-45).

Motivation

This study is motivated by the need for a transparent and clinically actionable ML model tailored to a specific cohort: patients with confirmed clinical pregnancy following IVF. Unlike broader IVF success models, this work aims to stratify risk among individuals who have already conceived, thereby supporting post-transfer management and counseling.

Existing models often overlook key physiological domains, such as thyroid function, immune status, and treatment-phase variables. They also tend to rely on standard features without engineering new ones that may better reflect biological mechanisms. Moreover, most lack explainability, making them difficult to integrate into clinical workflows. In addition to well-known predictors such as age, AMH, and FSH, we include lesser-studied but clinically relevant variables like endometrial characteristics and fasting blood glucose. Endometrial thickness and morphology have been associated with implantation potential and endometrial receptivity (46,47), while elevated fasting glucose and insulin resistance have been linked to impaired endometrial function and increased miscarriage risk (48). By integrating these features, the model aims to improve individualized post-conception risk stratification.

To address these gaps, we develop and validate an ensemble ML model that integrates domain-specific feature engineering, such as AMH × age and endometrial quality, and applies SHapley Additive exPlanations (SHAP) to provide both global and individualized interpretations of predictions. By focusing on a well-defined, post-conception IVF population and incorporating underused predictors and interpretable modeling, this study advances current methodologies and supports more informed, personalized fertility care. We present this article in accordance with the TRIPOD reporting checklist (available at https://jmai.amegroups.com/article/view/10.21037/jmai-25-75/rc).

Methods

Study design and dataset description

This retrospective study utilized publicly available data originally published by Yuan et al. (2022), collected from the Reproductive Center of Qingdao Women’s and Children’s Hospital between September 2019 and May 2022 (6). The dataset included 1,017 infertile women who underwent IVF-ET and achieved singleton intrauterine pregnancy confirmed by ultrasound. The inclusion criteria were: (I) complete IVF-ET treatment and follow-up; (II) confirmed singleton pregnancy; (III) intrauterine gestation. Exclusion criteria were: (I) use of donor sperm or eggs, (II) ectopic or multiple pregnancy, (III) pregnancy loss due to trauma or medication, (IV) severe organ disorders (cardiac, renal, hepatic, etc.), and (V) incomplete clinical data. A flowchart of the study can be found in Figure S1.

The primary outcome was live birth versus miscarriage among patients with confirmed pregnancy. Miscarriage was defined using ultrasound-based criteria consistent with American College of Obstetricians and Gynecologists (ACOG) guidelines, including absence of fetal heartbeat, gestational sac size thresholds, and timing-based criteria following yolk sac development. Stillbirths, preterm deliveries, and ectopic pregnancies were excluded based on the original study criteria.

A total of 52 clinical, biochemical, hormonal, immunologic, and treatment-related variables were included, as outlined in Table S1. These features spanned demographic factors, ovarian reserve markers (e.g., AMH, FSH, LH), thyroid hormones [e.g., thyroid-stimulating hormone (TSH), free triiodothyronine (FT3), free thyroxine (FT4)], immune markers [e.g., anti-nuclear antibody (ANA), erythrocyte sedimentation rate (ESR)], coagulation parameters [e.g., D-dimer, prothrombin time (PT), activated partial thromboplastin time (APTT)], infection screening, uterine and ovarian morphology, and embryo transfer characteristics. Most laboratory values were obtained during baseline assessment prior to IVF stimulation. Embryological and endometrial data were collected at or near the time of embryo transfer. Appendix 1 provides detailed explanations of the variables listed in Table S1.

Data preprocessing

Features with >50% missing values were excluded. Remaining missing data were imputed using median imputation for numerical variables and mode imputation for categorical variables. Categorical features were encoded using LabelEncoder. Numerical features were normalized using Min-Max scaling. To address class imbalance (LBO:miscarriage ~2:1), the XGBoost model incorporated the scale_pos_weight parameter; no oversampling techniques (e.g., SMOTE) were applied.
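The preprocessing steps described above can be sketched in plain Python as follows. This is a minimal illustration on hypothetical toy data, not the authors' code (which is provided in Appendix 2); the column names and helper function names are assumptions introduced here, and the study itself relied on Scikit-learn utilities.

```python
from statistics import median, mode

def drop_sparse_columns(table, max_missing=0.5):
    """Keep only columns whose fraction of missing (None) values is <= max_missing."""
    n_rows = len(next(iter(table.values())))
    return {
        name: values
        for name, values in table.items()
        if sum(v is None for v in values) / n_rows <= max_missing
    }

def impute(values, strategy):
    """Fill None with the median (numerical) or the mode (categorical) of observed values."""
    observed = [v for v in values if v is not None]
    fill = median(observed) if strategy == "median" else mode(observed)
    return [fill if v is None else v for v in values]

def min_max_scale(values):
    """Rescale numerical values linearly to the [0, 1] range."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def label_encode(values):
    """Map each category to an integer code, with categories taken in sorted order."""
    codes = {category: i for i, category in enumerate(sorted(set(values)))}
    return [codes[v] for v in values]

# Hypothetical toy table (column names are illustrative, not from Table S1).
toy = {
    "amh": [2.0, None, 4.0, 6.0],
    "infertility_type": ["primary", "secondary", None, "primary"],
    "mostly_missing": [None, None, None, 1.0],  # 75% missing, so it is dropped
}
kept = drop_sparse_columns(toy)
amh_scaled = min_max_scale(impute(kept["amh"], "median"))
type_codes = label_encode(impute(kept["infertility_type"], "mode"))
```

In practice, imputation statistics and scaling ranges are usually estimated on the training split only and then applied to the test split to avoid information leakage; whether the study did so is not stated in this section.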

Model development and ensemble learning

Two parallel modeling pipelines were implemented:

Ensemble model: a soft-voting classifier combining XGBoost, LightGBM, and logistic regression (LR) was trained on the full feature set. XGBoost was initialized with max_depth =4, n_estimators =100, learning_rate =0.1, gamma =0.1, subsample =0.8, colsample_bytree =0.6. LightGBM and LR used default settings.
Standalone XGBoost for SHAP: a separately trained XGBoost model was used for SHAP-based post-hoc interpretability.
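Soft voting averages the class probabilities predicted by each base learner and classifies on the averaged probability. The sketch below shows the idea in plain Python; the probability values are hypothetical, and equal weighting of the three learners is an assumption, since the weights used in the study are not stated here.

```python
def soft_vote(probabilities, weights=None):
    """Weighted average of per-model positive-class probabilities for each patient.

    probabilities: list of lists, one inner list per model.
    weights: optional per-model weights (assumed equal when omitted).
    """
    n_models = len(probabilities)
    weights = weights or [1.0] * n_models
    total = sum(weights)
    n_patients = len(probabilities[0])
    return [
        sum(w * p[i] for w, p in zip(weights, probabilities)) / total
        for i in range(n_patients)
    ]

# Hypothetical predicted probabilities of live birth for three patients.
xgb_p = [0.90, 0.20, 0.60]
lgbm_p = [0.80, 0.30, 0.40]
lr_p = [0.70, 0.10, 0.50]

ensemble_p = soft_vote([xgb_p, lgbm_p, lr_p])  # roughly [0.8, 0.2, 0.5]
```

In Scikit-learn, this behavior corresponds to `VotingClassifier` with `voting="soft"`.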

The dataset was split using stratified random sampling into training (80%) and testing (20%) subsets. Hyperparameters were tuned using 5-fold grid search cross-validation on the training set.
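A stratified split keeps the live birth to miscarriage ratio approximately equal in both subsets. The following is a simplified sketch in plain Python using the class counts reported for this cohort (677 live births, 340 miscarriages); the function name, the seed, and the rounding rule are assumptions, so the exact subset sizes may differ by a record from those produced by a library routine such as Scikit-learn's `train_test_split`.

```python
import random

def stratified_split(labels, test_fraction=0.2, seed=0):
    """Return (train_indices, test_indices), sampling test_fraction from each class."""
    rng = random.Random(seed)
    by_class = {}
    for index, label in enumerate(labels):
        by_class.setdefault(label, []).append(index)

    train_idx, test_idx = [], []
    for indices in by_class.values():
        shuffled = indices[:]
        rng.shuffle(shuffled)
        n_test = round(len(shuffled) * test_fraction)
        test_idx.extend(shuffled[:n_test])
        train_idx.extend(shuffled[n_test:])
    return sorted(train_idx), sorted(test_idx)

# 1 = live birth, 0 = miscarriage (the label coding is an assumption).
labels = [1] * 677 + [0] * 340
train_idx, test_idx = stratified_split(labels)

test_live_births = sum(labels[i] for i in test_idx)
train_live_births = sum(labels[i] for i in train_idx)
```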

To enhance physiological relevance and interpretability, interaction terms were engineered based on established clinical reasoning. The FSH/LH ratio was derived to represent hypothalamic-pituitary-ovarian axis dynamics, as commonly applied in reproductive endocrinology. The AMH × age term was constructed to correlate ovarian reserve with biological age, providing a composite indicator of reproductive potential. Additionally, endometrial quality was defined as the product of endometrial thickness and embryo quality score to approximate implantation readiness.
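The three engineered terms defined above are simple arithmetic combinations of existing variables. A minimal sketch is shown below; the dictionary keys and the example values are hypothetical, and the scale of the embryo quality score is not specified in this section.

```python
def engineer_features(record):
    """Add the three interaction terms described in the text to a patient record."""
    out = dict(record)
    # FSH/LH ratio; left undefined (None) when LH is zero or missing.
    out["fsh_lh_ratio"] = record["fsh"] / record["lh"] if record["lh"] else None
    # AMH x age: ovarian reserve combined with female age.
    out["amh_x_age"] = record["amh"] * record["female_age"]
    # Endometrial quality: endometrial thickness x embryo quality score.
    out["endometrial_quality"] = (
        record["endometrial_thickness"] * record["embryo_quality_score"]
    )
    return out

# Hypothetical patient record (values are illustrative only).
patient = {
    "fsh": 6.0,
    "lh": 4.0,
    "amh": 2.5,
    "female_age": 32,
    "endometrial_thickness": 10.0,
    "embryo_quality_score": 3,
}
features = engineer_features(patient)
```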

A classification threshold of 0.156 was selected by maximizing the F1-score on the precision-recall curve using validation data, ensuring an optimal balance between sensitivity and precision in predicting LBO.
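Threshold selection by F1 maximization can be illustrated as follows. This is a sketch on hypothetical labels and probabilities, not a reproduction of the reported 0.156 cutoff; using every distinct predicted probability as a candidate threshold, and breaking ties in favor of the lowest threshold, are assumptions.

```python
def f1_at_threshold(y_true, y_prob, threshold):
    """F1-score when probabilities >= threshold are classified as positive."""
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    fn = sum(1 for y, p in zip(y_true, y_prob) if p < threshold and y == 1)
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0

def best_f1_threshold(y_true, y_prob):
    """Scan distinct predicted probabilities and return the one maximizing F1."""
    candidates = sorted(set(y_prob))
    # max() returns the first maximal element, i.e., the lowest tied threshold.
    return max(candidates, key=lambda t: f1_at_threshold(y_true, y_prob, t))

# Hypothetical validation labels (1 = live birth) and predicted probabilities.
y_val = [1, 1, 1, 0, 1, 0]
p_val = [0.9, 0.8, 0.4, 0.35, 0.2, 0.1]

threshold = best_f1_threshold(y_val, p_val)
best_f1 = f1_at_threshold(y_val, p_val, threshold)
```

Because the positive class is the majority here, F1 maximization tends to favor low thresholds that recover nearly all positives at the cost of more false positives, which is consistent with the sensitivity and specificity pattern reported for the ensemble.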

A full code repository is available in Appendix 2 to ensure reproducibility.

Figure 1 displays the first decision tree of the XGBoost model for predicting LBO. The tree begins with “years of infertility” as the most significant feature, splitting the data based on a threshold of 0.156. Depending on this value, the model follows different branches, considering other features like “herpes simplex virus IgM” levels, “ESR”, and “female age” to refine predictions.

Figure 1 Decision tree visualization from the XGBoost Model. This figure illustrates one of the top-level decision trees used in the XGBoost model for predicting LBO following IVF-ET. ESR, erythrocyte sedimentation rate; IgM, immunoglobulin M; IVF-ET, in vitro fertilization-embryo transfer; LBO, live birth outcomes.

For patients with fewer years of infertility, low herpes simplex virus IgM levels lead the model to assess ESR, influencing the final prediction. Higher or missing IgM levels shift focus to the patient’s age, affecting the outcome. For those with longer infertility, the model re-evaluates infertility duration and then considers ESR and IgM levels to determine the prognosis.

This decision tree highlights the hierarchical importance of clinical features and their interplay in predicting outcomes. It offers a clear visualization of the model’s decision-making process, providing insights into how different factors contribute to patient outcomes, which can be valuable for personalized care in clinical settings.

Ethics and registration

The study was conducted in accordance with the Declaration of Helsinki and its subsequent amendments. The original dataset was approved by the Ethics Committee of the Affiliated Women and Children’s Hospital of Qingdao University (QFELL-YJ-2022-18). This secondary analysis was not separately registered as no new patient data were collected.

Model evaluation

The model’s performance was evaluated on both the training and testing datasets using standard classification metrics, including area under the receiver operating characteristic (ROC) curve, precision, recall, F1 score, and accuracy. These metrics are reported in the main text, while detailed definitions are provided in Appendix 3. To assess classification performance and threshold trade-offs, confusion matrices (Figure 2A,2B) and ROC curves (Figure 2C,2D) were initially generated. Subsequently, additional confusion matrices (Figure 3A,3B) and ROC curves (Figure 3C,3D) were produced to facilitate comparative analysis across models.
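For reference, the threshold-dependent metrics follow directly from the confusion-matrix counts, and the AUC can be read as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. The sketch below uses hypothetical counts and scores, not the study's results.

```python
def classification_metrics(tp, fp, tn, fn):
    """Standard metrics from confusion-matrix counts (assumes non-zero denominators)."""
    precision = tp / (tp + fp)
    sensitivity = tp / (tp + fn)          # also called recall
    specificity = tn / (tn + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    return {
        "precision": precision,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": f1,
        "accuracy": accuracy,
    }

def roc_auc(y_true, y_score):
    """AUC as the fraction of positive/negative pairs ranked correctly (ties count 0.5)."""
    positives = [s for y, s in zip(y_true, y_score) if y == 1]
    negatives = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = sum(
        1.0 if p > q else 0.5 if p == q else 0.0
        for p in positives
        for q in negatives
    )
    return wins / (len(positives) * len(negatives))

# Hypothetical confusion-matrix counts and scores.
metrics = classification_metrics(tp=90, fp=10, tn=40, fn=10)
auc = roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1])
```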

Figure 2 Confusion matrices for the training (A) and test (B) datasets. ROC curves for the XGBoost model for the training (C) and test (D) datasets. ROC, receiver operating characteristic.

Figure 3 Confusion matrices for the training (A) and test (B) datasets. ROC curves for the ensemble model for the training (C) and test (D) datasets. AUC, area under the curve; ROC, receiver operating characteristic.

The SHAP framework was employed to interpret the XGBoost model’s outputs. SHAP values were computed to quantify the contribution of each input feature to the predicted probability of live birth. The SHAP plot reveals the complex relationships between clinical features and their influence on predicting LBO, providing insight into the model’s decision-making process in a clinically meaningful way.
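To make the quantity behind SHAP concrete, the sketch below computes exact Shapley values for a small hypothetical model by enumerating all feature coalitions. It is only an illustration of the underlying definition: the toy model, its coefficients, and the single reference (baseline) patient are assumptions, and the SHAP library uses efficient tree-specific algorithms rather than this brute-force enumeration.

```python
from itertools import combinations
from math import factorial

def shapley_values(predict, instance, baseline):
    """Exact Shapley values of each feature for one prediction.

    Features outside a coalition are set to their baseline value.
    """
    names = list(instance)
    n = len(names)

    def value(coalition):
        x = {k: (instance[k] if k in coalition else baseline[k]) for k in names}
        return predict(x)

    phi = {}
    for name in names:
        others = [k for k in names if k != name]
        total = 0.0
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                coalition = set(subset)
                total += weight * (value(coalition | {name}) - value(coalition))
        phi[name] = total
    return phi

def toy_model(x):
    """Hypothetical score with an AMH x ESR interaction (coefficients are made up)."""
    return 0.6 + 0.05 * x["amh"] - 0.01 * x["esr"] + 0.001 * x["amh"] * x["esr"]

patient = {"amh": 4.0, "esr": 15.0}
reference = {"amh": 2.0, "esr": 10.0}
phi = shapley_values(toy_model, patient, reference)
```

The contributions sum to the difference between the patient's prediction and the reference prediction, which is the additivity property that allows SHAP values to be read as per-feature pushes toward live birth or miscarriage.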

All analyses were conducted using Python 3.11 within a Jupyter Notebook environment. Libraries used included Scikit-learn, XGBoost, LightGBM, SHAP, pandas, numpy, and matplotlib.

Statistical analysis

Descriptive statistics were computed for all features to summarize central tendency and dispersion. Continuous variables were expressed as medians and interquartile ranges (IQR) because most features were not normally distributed, as determined by the Shapiro-Wilk test for normality. Between-group comparisons for continuous variables (live birth vs. miscarriage) were performed using Welch’s t-test for normally distributed features and the Mann-Whitney U-test for non-normal distributions. For categorical variables, frequencies and percentages were reported, and comparisons were conducted using the chi-square or Fisher’s exact test, as appropriate. A two-sided P<0.05 was considered statistically significant.

All statistical analyses were conducted using Python (v3.11), with SciPy and statsmodels libraries utilized for hypothesis testing.

Results

The final analytic dataset included 1,017 patients (677 live births and 340 miscarriages following IVF-ET) and 52 features (31 continuous and 21 categorical). Descriptive statistics for continuous features in the live birth and miscarriage groups are provided in Table S2. Significant differences were found between the groups. For instance, female age was lower in the live birth group (32.1±3.8 years) than in the miscarriage group (34.1±4.7 years) (P<0.001). Similarly, FSH levels were lower in the live birth group {6.6 mIU/mL [interquartile range (IQR): 5.5–8]} compared to the miscarriage group [7.2 mIU/mL (IQR: 6.2–8.3)] (P<0.001). E2 levels were higher in the live birth group [171.3 pmol/L (IQR: 141.3–195.8)] vs. the miscarriage group [147.9 pmol/L (IQR: 112.0–195.5)] (P<0.001). Other features, such as AMH, prolactin (PRL), APTT, and ESR, also showed differences between the groups. However, features like BMI, years of infertility, and D-Dimer did not show significant differences.

A feature correlation matrix, found in Figure 4, provides a visualization of the relationships between features, with strong correlations identified, such as between male and female age (r=0.7) and between AMH levels and follicle counts. Total testosterone (TT) and fibrinogen (FIB) also exhibited a moderate positive correlation (r=0.6).

Figure 4 Feature correlation matrix. Color intensity reflects the strength of correlations; darker colors indicate stronger correlations and lighter colors indicate weaker correlations; positive values indicate a positive correlation, and negative values indicate a negative correlation. AMH, anti-Müllerian hormone; APTT, activated partial thromboplastin time; BMI, body mass index; E2, estradiol; ESR, erythrocyte sedimentation rate; FBG, fasting blood glucose; FIB, fibrinogen; FSH, follicle-stimulating hormone; FT3, free triiodothyronine; FT4, free thyroxine; LH, luteinizing hormone; P, progesterone; PRL, prolactin; PT, prothrombin time; T, testosterone; TSH, thyroid stimulating hormone; TT, total testosterone.

SHAP analysis was performed to interpret model predictions and identify features most influential to pregnancy outcomes. Figure 5 presents a global summary of SHAP values across all predictors. Among the 52 features evaluated, longer duration of infertility, higher BMI, elevated ESR, and increased TT were associated with decreased likelihood of live birth. Conversely, a greater number of retrieved oocytes contributed positively to successful outcomes. Features such as TSH, female age, estradiol (E2), and male age exhibited bidirectional effects, reflecting patient-specific variability. Notably, engineered features including AMH × age, FSH/LH ratio, and endometrial quality did not demonstrate strong global importance in the model, suggesting limited added predictive value beyond established clinical markers in this cohort.

Figure 5 SHAP analysis. Features with SHAP values less than zero tend to influence miscarriage, and values above zero tend to influence LBO. The red indicates the greater value of the variable, and the blue indicates the lower value. The variables are in decreasing order of importance. ACA, anti-cardiolipin antibody; AMH, anti-Müllerian hormone; ANA, anti-nuclear antibody; APTT, activated partial thromboplastin time; BMI, body mass index; E2, estradiol; ESR, erythrocyte sedimentation rate; FBG, fasting blood glucose; FIB, fibrinogen; FSH, follicle-stimulating hormone; FT3, free triiodothyronine; FT4, free thyroxine; LBO, live birth outcomes; LH, luteinizing hormone; P, progesterone; PRL, prolactin; PT, prothrombin time; SHAP, SHapley Additive exPlanations; T, testosterone; TG-Ab, thyroglobulin antibody; TPO_Ab, thyroid peroxidase antibody; TSH, thyroid stimulating hormone; TT, total testosterone.

Both the ensemble model and standalone XGBoost classifier demonstrated strong predictive performance in distinguishing live birth from miscarriage among patients who achieved clinical pregnancy following IVF-ET.

The XGBoost model, trained with optimized hyperparameters and adjusted for class imbalance, achieved excellent discriminative capability, with an AUC of 0.99 on the training set and 0.91 on the test set. Evaluation metrics, including precision, F1-score, and overall accuracy, confirmed high model fidelity. Confusion matrices (Figure 2A,2B) indicated balanced classification with minimal false negatives, while ROC curves (Figure 2C,2D) demonstrated near-perfect separation in the training set and strong generalizability on the test set.

The ensemble classifier, incorporating XGBoost, LightGBM, and LR in a soft-voting configuration, achieved comparable performance. The model attained an AUC of 0.99 on the training set and 0.89 on the test set. Following threshold optimization based on F1-score maximization, a probability cutoff of 0.16 was selected. This threshold yielded an overall accuracy of 0.88, F1-score of 0.87, precision of 0.85, sensitivity of 0.96, and specificity of 0.66. The ensemble offered a slight improvement in recall and clinical sensitivity, supporting its potential for post-transfer outcome triage. Performance visualizations for the ensemble model, including confusion matrices and ROC curves, are presented in Figure 3A-3D. A performance comparison of the XGBoost and ensemble models is provided in Table 2.

Table 2

Performance metrics for the XGBoost and ensemble models

Performance metrics	XGBoost	Ensemble
Training set
AUC score	0.99	1.0
Precision	0.95	0.71
F1 score	0.98	0.83
Accuracy	0.97	0.72
Test set
AUC score	0.91	0.88
Precision	0.89	0.70
F1 score	0.93	0.82
Accuracy	0.89	0.72

AUC, area under the curve.

After confirming the strong predictive capability of the present models, a t-test was conducted to explore the clinical and demographic differences between patients who had live births and those who experienced miscarriages. The results, found in Table S3, reveal significant differences between these two groups for several key features. Female age (t=7.38, P<0.001) and male age (t=2.05, P=0.04) were both significantly higher in the miscarriage group. Chromosome abnormalities (t=2.02, P=0.04) and abnormal ovarian structures (t=2.65, P=0.01) were also more prevalent among those who miscarried. Hormonal factors, such as PRL (t=3.28, P=0.001) and AMH (t=−4.964, P<0.001), showed significant differences, with higher PRL levels and lower AMH levels in the miscarriage group. Coagulation-related features, including PT (t=−2.05, P=0.04) and APTT (t=−4.07, P<0.001), also differed significantly between the two groups. Additionally, the presence of cytomegalovirus IgM (t=4.34, P<0.001), rubella virus IgM (t=2.01, P=0.04), anti-cardiolipin antibody (ACA) (t=2.67, P=0.01), ANA (t=2.01, P=0.05), and thyroid peroxidase antibody (TPO-Ab) (t=4.10, P<0.001) were significantly associated with miscarriage. However, several variables, such as BMI, infertility type, male sperm abnormalities, and the total number of eggs retrieved, did not show significant differences between the two groups, suggesting that these factors may not play a crucial role in distinguishing between live birth and miscarriage outcomes in this cohort.

Calibration performance was assessed using reliability curves. As shown in Figure S2, the calibration curve on the training set closely follows the ideal diagonal for higher probability bins, indicating accurate probability estimation in confident predictions. The test set curve shows slightly more deviation but maintains a reasonable fit across most thresholds, suggesting that the model’s predicted probabilities are generally well-calibrated. This supports the model’s reliability in estimating true outcome probabilities in new data.
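For readers unfamiliar with reliability curves, the sketch below shows how one is typically built: predicted probabilities are grouped into bins, and the mean predicted probability in each bin is compared with the observed fraction of positive outcomes. The equal-width binning, the number of bins, and the toy data are assumptions. The expected calibration error (ECE) at the end is a common summary statistic added here for illustration; it is not reported in this study.

```python
def reliability_curve(y_true, y_prob, n_bins=5):
    """Return one (mean_predicted, observed_fraction, count) tuple per non-empty bin."""
    bins = [[] for _ in range(n_bins)]
    for y, p in zip(y_true, y_prob):
        index = min(int(p * n_bins), n_bins - 1)  # p == 1.0 falls in the last bin
        bins[index].append((y, p))
    curve = []
    for members in bins:
        if not members:
            continue  # skip empty bins rather than dividing by zero
        mean_pred = sum(p for _, p in members) / len(members)
        frac_pos = sum(y for y, _ in members) / len(members)
        curve.append((mean_pred, frac_pos, len(members)))
    return curve


def expected_calibration_error(curve):
    """Count-weighted mean absolute gap between predicted and observed rates."""
    total = sum(count for _, _, count in curve)
    return sum(count / total * abs(frac - pred) for pred, frac, count in curve)


# Hypothetical toy predictions; a perfectly calibrated model lies on the diagonal.
y_true = [0, 0, 0, 1, 1, 0, 1, 1, 1, 1]
y_prob = [0.10, 0.15, 0.30, 0.35, 0.55, 0.50, 0.70, 0.75, 0.90, 0.95]

curve = reliability_curve(y_true, y_prob, n_bins=5)
ece = expected_calibration_error(curve)
for mean_pred, frac_pos, count in curve:
    print(f"predicted {mean_pred:.3f}  observed {frac_pos:.2f}  n={count}")
print(f"ECE = {ece:.3f}")
```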

Decision curve analysis was used to evaluate the clinical utility of the XGBoost model across a continuum of threshold probabilities. As illustrated in Figure S3, the model consistently demonstrated a higher net benefit compared to “Treat All” and “Treat None” strategies, particularly within the clinically actionable threshold range of 0.10–0.40. This suggests that applying the model to guide clinical decision-making would yield superior outcomes over default strategies that treat all or no patients.
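The net benefit underlying a decision curve can be written as NB = TP/N − (FP/N) × pt/(1 − pt), where pt is the threshold probability; "Treat All" has NB = prevalence − (1 − prevalence) × pt/(1 − pt), and "Treat None" has NB = 0. The sketch below evaluates these quantities on made-up data. The function names, the toy labels and probabilities, and the choice of which outcome is coded as the event are all assumptions for illustration.

```python
def net_benefit(y_true, y_prob, pt):
    """Net benefit of acting on patients whose predicted risk is >= pt."""
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= pt and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= pt and y == 0)
    return tp / n - fp / n * pt / (1 - pt)


def net_benefit_treat_all(y_true, pt):
    """Net benefit of acting on every patient regardless of the model."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1 - prevalence) * pt / (1 - pt)


# Hypothetical toy cohort: 1 = event of interest, 0 = no event.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
y_prob = [0.90, 0.80, 0.35, 0.60, 0.30, 0.20, 0.15, 0.10, 0.05, 0.02]

thresholds = [0.10, 0.25, 0.40]
rows = [
    (pt, net_benefit(y_true, y_prob, pt), net_benefit_treat_all(y_true, pt))
    for pt in thresholds
]
for pt, nb_model, nb_all in rows:
    # "Treat None" always has a net benefit of zero.
    print(f"pt={pt:.2f}  model={nb_model:.3f}  treat_all={nb_all:.3f}  treat_none=0.000")
```

A model is considered useful at a given threshold when its net benefit exceeds both default strategies, which is the comparison the decision curve displays across the whole threshold range.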

Discussion

This study developed an interpretable ML framework to predict live birth versus miscarriage in patients with confirmed clinical pregnancy following IVF-ET. In contrast to conventional models that aim to predict the likelihood of conception, this work focuses on a clinically distinct and underexplored population: patients who have already conceived, for whom early risk stratification can inform post-transfer management.

While ensemble models are commonly used to improve generalization by aggregating multiple learners, the standalone XGBoost model outperformed the ensemble in this study across multiple performance metrics. Specifically, XGBoost achieved a higher test AUC (0.91 vs. 0.88), F1 score (0.93 vs. 0.82), precision (0.89 vs. 0.70), and accuracy (0.89 vs. 0.72). This superior performance can be attributed to XGBoost’s ability to model complex non-linear feature interactions and manage class imbalance through hyperparameter tuning, including scale_pos_weight, subsample, and colsample_bytree. In contrast, the ensemble model, which incorporated XGBoost, LightGBM, and LR, may have suffered from signal dilution, where weaker base learners (e.g., LR) reduced overall prediction precision by overestimating the positive class. This was evident in the ensemble’s confusion matrix, which showed no false negatives but a substantial number of false positives, leading to lower specificity. Furthermore, the voting-based approach, while increasing sensitivity, compromised the model’s ability to discriminate accurately between outcomes. These findings underscore the robustness of XGBoost in capturing non-linearities, handling noisy features, and delivering balanced performance, making it the preferred model for risk stratification in post-IVF pregnancy prediction.
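The signal-dilution argument can be made concrete with a small sketch of soft voting, in which the ensemble probability is the (optionally weighted) average of the base learners' probabilities. All numbers below are invented for illustration and are not outputs of the study's models; the 0.16 cutoff is borrowed from the ensemble threshold reported in the Results. The class-weight helper uses the negatives-to-positives ratio that the XGBoost documentation suggests as a typical starting value for scale_pos_weight; whether the authors used this ratio is not stated.

```python
def soft_vote(prob_lists, weights=None):
    """Average per-sample positive-class probabilities across base learners."""
    if weights is None:
        weights = [1.0] * len(prob_lists)
    total_weight = sum(weights)
    n_samples = len(prob_lists[0])
    return [
        sum(w * probs[i] for w, probs in zip(weights, prob_lists)) / total_weight
        for i in range(n_samples)
    ]


def false_positives(y_true, y_prob, cutoff):
    """Count negatives that are called positive at the given cutoff."""
    return sum(1 for y, p in zip(y_true, y_prob) if p >= cutoff and y == 0)


def suggested_scale_pos_weight(y_true):
    """Ratio of negative to positive samples (a common starting heuristic)."""
    positives = sum(y_true)
    return (len(y_true) - positives) / positives


# Hypothetical probabilities from three base learners on five patients.
y_true = [1, 1, 1, 0, 0]
p_xgb = [0.90, 0.80, 0.70, 0.12, 0.10]
p_lgbm = [0.85, 0.75, 0.65, 0.25, 0.15]
p_lr = [0.90, 0.90, 0.80, 0.70, 0.60]  # overestimates the positive class

p_ensemble = soft_vote([p_xgb, p_lgbm, p_lr])
cutoff = 0.16
fp_xgb = false_positives(y_true, p_xgb, cutoff)
fp_ensemble = false_positives(y_true, p_ensemble, cutoff)
print("false positives, XGBoost alone:", fp_xgb)
print("false positives, soft-voting ensemble:", fp_ensemble)
print("suggested scale_pos_weight:", suggested_scale_pos_weight(y_true))
```

In this toy case, one overconfident learner pulls the averaged probability of both negative cases above the cutoff, so the ensemble gains false positives that the strongest learner alone avoids.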

Feature-level analysis highlighted predictors consistent with reproductive physiology. Female and male ages were significantly higher in the miscarriage group, confirming their established association with reduced fertility potential. AMH and FSH levels reflected expected biological trends: higher AMH and lower FSH in the live birth group suggested better ovarian reserve and stimulation response. Elevated PRL in the miscarriage group aligned with its known role in impairing implantation and luteal function.

Coagulation markers, including PT and APTT, were significantly associated with miscarriage, supporting the hypothesis that thrombophilic states may contribute to early pregnancy loss. Immune and infectious markers, such as ACA, ANA, TPO-Ab, and viral IgM antibodies, were more prevalent in the miscarriage group, suggesting that latent infection and immune dysregulation may impact post-conception outcomes.

Although engineered features (AMH × age, FSH/LH ratio, and endometrial quality) did not rank among the most globally dominant predictors, they may hold stratified utility within patient subgroups and contribute to physiological interpretability.

SHAP analysis further revealed treatment-phase predictors to be highly informative. The number of high-quality embryos and total oocytes retrieved had the strongest positive impact on live birth prediction, underscoring the relevance of intermediate treatment outcomes. Immune and hemostatic markers, particularly ESR, TT, and D-Dimer, also contributed substantially. Elevated ESR and shortened TT were associated with favorable outcomes, while increased APTT and PT predicted miscarriage. These findings emphasize the value of integrating inflammatory and coagulation biomarkers into IVF risk models.

Thyroid-related features, including high TSH and positive thyroid autoantibodies (TG-Ab, TPO-Ab), were negatively associated with live birth probability, supporting literature linking subclinical hypothyroidism to implantation failure. Among clinical variables, years of infertility, female age, and BMI were strong negative predictors. Notably, although not among the top-ranked features, endometrial thickness exhibited a consistent directional influence: a thin endometrial lining corresponded with lower predicted success, in alignment with clinical thresholds for optimal receptivity.
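SHAP attributions are based on Shapley values, which split the difference between a patient's prediction and a baseline prediction among the input features. The sketch below computes exact Shapley values by enumerating feature coalitions for a toy three-feature scoring function. The scoring function, its coefficients, and the patient and baseline values are hypothetical and chosen only so that the signs match the directions described above; in practice, SHAP for tree models relies on faster algorithms than this brute-force enumeration.

```python
from itertools import combinations
from math import factorial


def shapley_values(value_fn, features):
    """Exact Shapley values by enumerating all coalitions of the other features."""
    n = len(features)
    phi = {}
    for feature in features:
        others = [f for f in features if f != feature]
        contribution = 0.0
        for size in range(len(others) + 1):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                coalition = frozenset(subset)
                gain = value_fn(coalition | {feature}) - value_fn(coalition)
                contribution += weight * gain
        phi[feature] = contribution
    return phi


# Hypothetical patient and reference (baseline) feature values.
patient = {"high_quality_embryos": 3, "female_age": 38, "aptt": 40}
baseline = {"high_quality_embryos": 1, "female_age": 32, "aptt": 30}


def toy_score(x):
    """Made-up linear live-birth score; coefficients are illustrative only."""
    return (0.5 + 0.05 * x["high_quality_embryos"]
            - 0.01 * x["female_age"] - 0.005 * x["aptt"])


def coalition_value(coalition):
    """Score when features in the coalition take the patient's values
    and all remaining features stay at the baseline."""
    x = {f: (patient[f] if f in coalition else baseline[f]) for f in patient}
    return toy_score(x)


phi = shapley_values(coalition_value, list(patient))
for name, value in phi.items():
    print(f"{name}: {value:+.3f}")
```

The attributions sum to the difference between the patient's score and the baseline score, which is the additivity property that makes SHAP plots interpretable as per-feature pushes toward or away from the predicted outcome.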

Clinically, the model offers several actionable benefits:

Early identification of high-risk pregnancies enables closer surveillance.
Support for personalized interventions, such as immunologic workup or anticoagulation.
Improved counseling, reducing uncertainty and psychological burden.

Together, these findings validate the predictive strength of the model and highlight the contribution of diverse physiological domains (hormonal, immune, coagulation, and anatomical) to pregnancy maintenance following IVF. By coupling ensemble learning with explainable AI, this study provides a foundation for deploying ML tools to guide post-conception care in reproductive medicine.

Strengths and limitations

Key strengths of this study include its novel focus on post-conception outcome prediction, the use of a multi-domain dataset spanning hormonal, immunologic, structural, and treatment features, and integration of explainability via SHAP. Ensemble modeling and regularization techniques were employed to mitigate overfitting.

However, several limitations warrant discussion. The dataset originates from a single center in China, which may limit generalizability. Although cross-validation was used during training, external validation on geographically and ethnically diverse cohorts is essential. The model currently includes 52 features, which may be challenging for routine clinical use; future work should explore feature reduction techniques to increase feasibility. Additionally, as a retrospective analysis, unmeasured confounding and data quality issues may influence results.

Compared to previous IVF prediction models, which primarily assess pregnancy initiation, our model targets a clinically underexplored question: the likelihood of live birth once pregnancy is confirmed. This represents a meaningful contribution to the literature. In terms of performance, our model achieved an AUC of 0.89–0.90, which is competitive with or exceeds prior models (Table 1). Importantly, we introduced novel clinically relevant features (e.g., endometrial quality, autoantibodies) and explainability techniques that enhance transparency and potential for clinical translation.

While this study demonstrates the utility of interpretable ML models in predicting post-IVF pregnancy outcomes, broader implementation in clinical practice requires careful consideration of data transparency and governance. As highlighted by Hickman et al., integrating blockchain and decentralized data technologies may address longstanding concerns about the traceability, interoperability, and ethical oversight of AI systems in reproductive medicine (49). Future work should explore how such infrastructures can enhance model accountability and support multi-institutional validation without compromising patient privacy.

Conclusions

This study demonstrates the feasibility and clinical relevance of using interpretable ML to predict live birth versus miscarriage among patients with confirmed pregnancies following IVF-ET. While an ensemble model integrating XGBoost, LightGBM, and LR achieved strong predictive performance, the standalone XGBoost model outperformed the ensemble in key metrics such as precision, F1-score, and accuracy. These findings underscore the robustness of XGBoost when appropriately tuned and regularized for class imbalance. SHAP-based explainability applied to the XGBoost model provided granular insights into the clinical and biological features driving prediction outcomes. Collectively, these results support the integration of advanced ML frameworks into post-conception risk stratification and treatment personalization in reproductive medicine.

By integrating a broad range of features across hormonal, immunologic, coagulation, and treatment domains, the model enabled individualized risk stratification. Key predictors included AMH, FSH, PRL, ESR, and the number of retrieved oocytes, while engineered features such as AMH × age and endometrial quality contributed interpretive value. SHAP analysis further elucidated the directionality and clinical implications of each factor.

These findings highlight the potential for ML to support post-conception counseling, early intervention, and personalized care in reproductive medicine. Future work should focus on simplifying the feature set for real-world deployment and externally validating the model across diverse clinical settings to enhance generalizability.

Acknowledgments

During the preparation of this work, the authors used Grammarly to edit and refine the text. After using this tool, the authors reviewed and edited the content as needed and took full responsibility for the content of the published article.

Footnote

Reporting Checklist: The authors have completed the TRIPOD reporting checklist. Available at https://jmai.amegroups.com/article/view/10.21037/jmai-25-75/rc

Peer Review File: Available at https://jmai.amegroups.com/article/view/10.21037/jmai-25-75/prf

Funding: None.

Conflicts of Interest: Both authors have completed the ICMJE uniform disclosure form (available at https://jmai.amegroups.com/article/view/10.21037/jmai-25-75/coif). The authors have no conflicts of interest to declare.

Ethical Statement: The authors are accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved. The study was conducted in accordance with the Declaration of Helsinki and its subsequent amendments. The original dataset was approved by the Ethics Committee of the Affiliated Women and Children’s Hospital of Qingdao University (QFELL-YJ-2022–18). This secondary analysis was not separately registered as no new patient data were collected.

Open Access Statement: This is an Open Access article distributed in accordance with the Creative Commons Attribution-NonCommercial-NoDerivs 4.0 International License (CC BY-NC-ND 4.0), which permits the non-commercial replication and distribution of the article with the strict proviso that no changes or edits are made and the original work is properly cited (including links to both the formal publication through the relevant DOI and the license). See: https://creativecommons.org/licenses/by-nc-nd/4.0/.

References

Zinaman MJ, Clegg ED, Brown CC, et al. Estimates of human fertility and pregnancy loss. Fertil Steril 1996;65:503-9.
Guo J, Feng Q, Chaemsaithong P, et al. Biomarkers at 6 weeks' gestation in the prediction of early miscarriage in pregnancy following assisted reproductive technology. Acta Obstet Gynecol Scand 2023;102:1073-83.
Fact Sheet: In Vitro Fertilization (IVF) Use Across the United States. HHS.gov. Available online: https://www.hhs.gov/about/news/2024/03/13/fact-sheet-in-vitro-fertilization-ivf-use-across-united-states.html
Infertility Prevalence Estimates. Available online: https://iris.who.int/bitstream/handle/10665/366700/9789240068315-eng.pdf
Mardovich S, Jewett A, Zhang Y, et al. 2021 Assisted Reproductive Technology Fertility Clinic and National Summary Report; 2021. Available online: www.cdc.gov/art/artdata/index.html
Yuan G, Lv B, Du X, et al. Prediction model for missed abortion of patients treated with IVF-ET based on XGBoost: a retrospective study. PeerJ 2023;11:e14762.
Bulletti FM, Berrettini M, Sciorio R, et al. Artificial intelligence algorithms for optimizing assisted reproductive technology programs: A systematic review. Glob Transl Med 2023;2:1-12.
Barnett-Itzhaki Z, Elbaz M, Butterman R, et al. Machine learning vs. classic statistics for the prediction of IVF outcomes. J Assist Reprod Genet 2020;37:2405-12.
Wen JY, Liu CF, Chung MT, et al. Artificial intelligence model to predict pregnancy and multiple pregnancy risk following in vitro fertilization-embryo transfer (IVF-ET). Taiwan J Obstet Gynecol 2022;61:837-46.
Hariton E, Pavlovic Z, Fanton M, et al. Applications of artificial intelligence in ovarian stimulation: a tool for improving efficiency and outcomes. Fertil Steril 2023;120:8-16.
Yao MWM, Nguyen ET, Retzloff MG, et al. Improving IVF Utilization with Patient-Centric Artificial Intelligence-Machine Learning (AI/ML): A Retrospective Multicenter Experience. J Clin Med 2024;13:3560.
Chow DJX, Wijesinghe P, Dholakia K, et al. Does artificial intelligence have a role in the IVF clinic? Reprod Fertil 2021;2:C29-34.
DeVilbiss EA, Mumford SL, Sjaarda LA, et al. Prediction of pregnancy loss by early first trimester ultrasound characteristics. Am J Obstet Gynecol 2020;223:242.e1-242.e22.
Wei J, Xiong D, Zhang Y, et al. Predicting ovarian responses to the controlled ovarian hyperstimulation in elderly infertile women using clinical measurements and random forest regression. Eur J Obstet Gynecol Reprod Biol 2023;288:153-9.
Zhang Y, Shen L, Yin X, et al. Live-Birth Prediction of Natural-Cycle In Vitro Fertilization Using 57,558 Linked Cycle Records: A Machine Learning Perspective. Front Endocrinol (Lausanne) 2022;13:838087.
Ma Y, Zhang B, Liu Z, et al. IAS-FET: An intelligent assistant system and an online platform for enhancing successful rate of in-vitro fertilization embryo transfer technology based on clinical features. Comput Methods Programs Biomed 2024;245:108050.
Lin W, Zhuang L, Ren M, et al. IVFPred: A Novel Framework to Expediate Mining the Clinical Data of In Vitro Fertilization-Embryo Transfer. Available online: https://github.com/ExposomeX/IVFPred
Sun L, Li J, Zeng S, et al. Artificial intelligence system for outcome evaluations of human in vitro fertilization-derived embryos. Chin Med J (Engl) 2024;137:1939-49.
Amitai T, Kan-Tor Y, Or Y, et al. Embryo classification beyond pregnancy: early prediction of first trimester miscarriage using machine learning. J Assist Reprod Genet 2023;40:309-22.
Mehrjerd A, Rezaei H, Eslami S, et al. Internal validation and comparison of predictive models to determine success rate of infertility treatments: a retrospective study of 2485 cycles. Sci Rep 2022;12:7216.
Fanton M, Nutting V, Rothman A, et al. An interpretable machine learning model for individualized gonadotrophin starting dose selection during ovarian stimulation. Reprod Biomed Online 2022;45:1152-9.
Liu L, Liang H, Yang J, et al. Clinical data-based modeling of IVF live birth outcome and its application. Reprod Biol Endocrinol 2024;22:76.
Liu X, Chen Z, Ji Y. Construction of the machine learning-based live birth prediction models for the first in vitro fertilization pregnant women. BMC Pregnancy Childbirth 2023;23:476.
Wang CW, Kuo CY, Chen CH, et al. Predicting clinical pregnancy using clinical features and machine learning algorithms in in vitro fertilization. PLoS One 2022;17:e0267554.
Bardet L, Excoffier JB, Salaun-Penquer N, et al. Comparison of predictive models for cumulative live birth rate after treatment with ART. Reprod Biomed Online 2022;45:246-55.
Amini P, Ramezanali F, Parchehbaf-Kashani M, et al. Factors Associated with In Vitro Fertilization Live Birth Outcome: A Comparison of Different Classification Methods. Int J Fertil Steril 2021;15:128-34.
Goyal A, Kuchana M, Ayyagari KPR. Machine learning predicts live-birth occurrence before in-vitro fertilization treatment. Sci Rep 2020;10:20925.
Vogiatzi P, Pouliakis A, Siristatidis C. An artificial neural network for the prediction of assisted reproduction outcome. J Assist Reprod Genet 2019;36:1441-8.
Qiu J, Li P, Dong M, et al. Personalized prediction of live birth prior to the first in vitro fertilization treatment: a machine learning method. J Transl Med 2019;17:317.
Chen Z, Zhang D, Zhen J, et al. Predicting cumulative live birth rate for patients undergoing in vitro fertilization (IVF)/intracytoplasmic sperm injection (ICSI) for tubal and male infertility: a machine learning approach using XGBoost. Chin Med J (Engl) 2022;135:997-9.
Tarasconi B, Tadros T, Ayoubi JM, et al. Serum antimüllerian hormone levels are independently related to miscarriage rates after in vitro fertilization-embryo transfer. Fertil Steril 2017;108:518-24.
Luan CX, Xie WD, Liu D, et al. Candidate Circulating Biomarkers of Spontaneous Miscarriage After IVF-ET Identified via Coupling Machine Learning and Serum Lipidomics Profiling. Reprod Sci 2022;29:750-60.
Drakopoulos P, Blockeel C, Stoop D, et al. Conventional ovarian stimulation and single embryo transfer for IVF/ICSI. How many oocytes do we need to maximize cumulative live birth rates after utilization of all fresh and frozen embryos? Hum Reprod 2016;31:370-6.
Yang L, Peavey M, Kaskar K, et al. Development of a dynamic machine learning algorithm to predict clinical pregnancy and live birth rate with embryo morphokinetics. F S Rep 2022;3:116-23.
Liu L, Jiao Y, Li X, et al. Machine learning algorithms to predict early pregnancy loss after in vitro fertilization-embryo transfer with fetal heart rate as a strong predictor. Comput Methods Programs Biomed 2020;196:105624.
Chen H, Sun ZL, Chen MX, et al. Predicting the probability of a live birth after a freeze-all based in vitro fertilization-embryo transfer (IVF-ET) treatment strategy. Transl Pediatr 2022;11:797-812.
Li S, Zheng PS, Ma HM, et al. Systematic review of subsequent pregnancy outcomes in couples with parental abnormal chromosomal karyotypes and recurrent pregnancy loss. Fertil Steril 2022;118:906-14.
Xia L, Han S, Huang J, et al. Predicting personalized cumulative live birth rate after a complete in vitro fertilization cycle: an analysis of 32,306 treatment cycles in China. Reprod Biol Endocrinol 2024;22:65.
Havrljenko J, Kopitovic V, Pjevic AT, et al. The Prediction of IVF Outcomes with Autologous Oocytes and the Optimal MII Oocyte/Embryo Number for Live Birth at Advanced Maternal Age. Medicina (Kaunas) 2023;59:1799.
Priya GH, Amuthavalli K, Vijayan T, et al. Classifiers with synthetic oversampling pre-process for In Vitro Fertilization predictions. Indian Journal of Computer Science and Engineering 2021;12:1532-41.
Jia L, Chen PG, Chen LN, et al. Machine learning-based prediction of pregnancy outcomes in couples with non-obstructive azoospermia using micro-TESE for ICSI: a retrospective cohort study. Reproductive and Developmental Medicine 2024;8:24-31.
Shingshetty L, Cameron NJ, Mclernon DJ, et al. Predictors of success after in vitro fertilization. Fertil Steril 2024;121:742-51.
Hassan MR, Al-Insaif S, Hossain MI, et al. A machine learning approach for prediction of pregnancy outcome following IVF treatment. Neural Comput Appl 2020;32:2283-97.
Wu J, Li T, Xu L, et al. Development of a machine learning-based prediction model for clinical pregnancy of intrauterine insemination in a large Chinese population. J Assist Reprod Genet 2024;41:2173-83.
Peigné M, Bernard V, Dijols L, et al. Using serum anti-Müllerian hormone levels to predict the chance of live birth after spontaneous or assisted conception: a systematic review and meta-analysis. Hum Reprod 2023;38:1789-806.
Xu S, Diao H, Xiong Y, et al. The study on the clinical efficacy of endometrial receptivity analysis and influence factors of displaced window of implantation. Sci Rep 2025;15:7326.
Richter KS, Bugge KR, Bromer JG, et al. Relationship between endometrial thickness and embryo implantation, based on 1,294 cycles of in vitro fertilization with transfer of two blastocyst-stage embryos. Fertil Steril 2007;87:53-9.
Chen Y, Guo J, Zhang Q, et al. Insulin Resistance is a Risk Factor for Early Miscarriage and Macrosomia in Patients With Polycystic Ovary Syndrome From the First Embryo Transfer Cycle: A Retrospective Cohort Study. Front Endocrinol (Lausanne) 2022;13:853473.
Hickman CFL, Alshubbar H, Chambost J, et al. Data sharing: using blockchain and decentralized data technologies to unlock the potential of artificial intelligence: What can assisted reproduction learn from other areas of medicine? Fertil Steril 2020;114:927-33.

doi: 10.21037/jmai-25-75
Cite this article as: Agirsoy M, Oehlschlaeger MA. Development of an explainable machine learning model to predict live birth versus miscarriage among in vitro fertilization-embryo transfer pregnancies. J Med Artif Intell 2026;9:13.
