A data driven analysis of maternal health risk indicators using machine learning techniques

Prottoy Saha; Mohammad Rakib; Fardin Abu Ubaid; Muhtasib Ibtida Kousik; Syeda Anika Tasnim; Rifath Mahmud; Tanvir Ahmed

doi:10.21037/jmai-2025-199

Original Article

Faculty of Science and Technology, American International University-Bangladesh, Dhaka, Bangladesh

Contributions: (I) Conception and design: All authors; (II) Administrative support: R Mahmud; (III) Provision of study materials or patients: SA Tasnim; (IV) Collection and assembly of data: P Saha; (V) Data analysis and interpretation: P Saha; (VI) Manuscript writing: All authors; (VII) Final approval of manuscript: All authors.

Correspondence to: Rifath Mahmud, MSc in CS. Faculty of Science and Technology, American International University-Bangladesh, 408/1 (Old KA 66/1), Kuratoli, Khilkhet, Dhaka 1229, Bangladesh. Email: rifath.mahmud@aiub.edu.

Background: Maternal mortality remains a major global health concern, and in low- and middle-income countries limited clinical resources and subjective risk assessment processes often make it difficult to identify high-risk pregnancies early. Machine learning (ML) applied to health indicators collected during routine antenatal checkups offers a potential method for objective, data-driven risk classification. In this work, feature importance analysis identified systolic blood pressure, age, and blood sugar (BS) as the most significant predictors of maternal health risk. This study aims to develop and evaluate ML-based classification models for early and objective identification of high-risk pregnancies using routine antenatal health indicators in low- and middle-income country settings, to compare the predictive performance of multiple supervised ML algorithms in maternal health risk stratification, and to establish a framework to support clinical decision-making in resource-limited obstetric care.

Methods: A structured dataset comprising 1,017 patient records with 12 clinical and demographic characteristics, including maternal age, blood pressure, BS, body temperature, heart rate, hemoglobin, and several categorical factors such as previous complications, was used in this investigation. Preprocessing included median imputation, outlier treatment, label encoding, and z-score normalization, after which feature selection was carried out using recursive elimination approaches. Five ML models were developed and assessed: Logistic Regression, Decision Tree (DT), Random Forest, Support Vector Machine (SVM), and an Ensemble Voting Classifier. Of the dataset, 70% was used to train the models, 20% was used for tuning, and the remaining 10% was used for testing. Performance was assessed using accuracy, precision, recall, F1-score, confusion matrices, and the area under the curve (AUC) of the receiver operating characteristic (ROC).

Results: In 10-fold cross-validation, the Random Forest model achieved a superior mean validation accuracy of 99% and reached 100% accuracy on the test set. Additionally, an uncertainty quantification framework flagged 1.27% of predictions for manual clinical assessment in order to safeguard patient safety.

Conclusions: Integrating ML approaches into maternal healthcare analytics may substantially improve early risk identification. In particular, the Random Forest and Ensemble models provide scalable, interpretable, and clinically relevant solutions with potential for real-world use. These findings support the utility of ML-driven tools as supplementary aids in obstetric risk management.

Keywords: Maternal health risk prediction; machine learning (ML); ensemble learning; clinical decision support; predictive modeling in healthcare

Received: 21 August 2025; Accepted: 13 March 2026; Published online: 19 May 2026.

Highlight box

Key findings

• Machine learning (ML) models accurately classified maternal health risks using routine physiological indicators.

• The Random Forest and Ensemble Voting Classifier models demonstrated perfect accuracy on the test data.

• Significant predictors identified included diastolic blood pressure, blood sugar, body temperature, and heart rate.

• The approach demonstrated strong potential for supporting clinical risk assessments through data-driven insights.

What is known and what is new?

• Conventional maternal risk assessments rely heavily on statistical methods and clinical judgment. ML is increasingly applied to maternal health to handle complex, non-linear patterns in data.

• This study showcased an effective pipeline combining Logistic Regression-based feature selection with multiple ML models.

What is the implication, and what should change now?

• ML can improve early identification of maternal health risks, enabling timely care.

• Clinical workflows could benefit from integrating decision-support tools based on ensemble learning approaches.

• Broader validation across diverse populations and health systems would support real-world application and trust.

Introduction

Background

Maternal health is an essential component of any nation’s public health sector since it influences not only the health and survival of mothers but also that of their newborns (1,2). Although a healthy pregnancy and delivery can have a lasting impact, many women, particularly those in low- and middle-income countries, still lack adequate access to proper maternal care (3). Each year, many women die from complications during pregnancy and childbirth despite advancements in healthcare systems worldwide. The World Health Organization estimates that pregnancy-related causes claimed approximately 287,000 women’s lives in 2020, the majority occurring in regions with limited access to quality healthcare (2). Many of these deaths could have been prevented through early detection and timely intervention (4).

Pregnancy complications are associated with multiple risk factors including inadequate nutrition, infections, hypertension, diabetes, maternal age, and body mass index (BMI) (5,6). Early identification of these warning indicators enables healthcare providers to take preventive actions such as closer monitoring or targeted treatment to ensure maternal and fetal safety (4,7). Common clinical indicators used to evaluate maternal risk include maternal age, BMI, blood glucose level, and systolic and diastolic blood pressure (8,9). However, these factors rarely act independently. Their complex and nonlinear interactions make accurate risk assessment difficult using traditional statistical approaches (10,11). Consequently, rapid risk stratification using routinely collected vital signs potentially through IoT-enabled monitoring systems has become an important research direction (12,13).

Conventional maternal risk assessment methods typically rely on rule-based scoring systems or linear statistical models (4). While useful, these approaches often oversimplify clinical complexity and may fail when patient profiles deviate from predefined assumptions (8). Furthermore, subjective interpretation by clinicians may produce inconsistent decisions, potentially delaying necessary interventions (6).

Machine learning (ML) offers a promising alternative. Unlike traditional models, ML can analyze large volumes of clinical data, detect hidden relationships, and capture nonlinear patterns among variables (14-16). These algorithms can therefore improve early detection of high-risk pregnancies (17,18). With the growing availability of electronic health records and monitoring technologies, ML systems can transform raw healthcare data into actionable clinical insights in real time (7,13). This may improve clinical decision-making and maternal outcomes (19).

In this study, maternal risk is defined as the probability of a pregnant woman developing complications such as pre-eclampsia, gestational diabetes, or hypertension, categorized into Low, Mid, and High severity levels based on physiological thresholds (16,19).

Rationale and knowledge gap

Although the use of ML in healthcare has grown, little is known about how well it predicts maternal health risks, particularly when different ML algorithms are compared side by side. Many studies in this area concentrate on a single type of algorithm, and even then they frequently fail to fully validate their findings. As a result, it is difficult to predict how well those models will perform on new data or in practical settings. Furthermore, crucial procedures such as data cleaning, handling missing or unknown values, removing extreme outliers, and scaling features are sometimes disregarded even though they are essential to building accurate and reliable models.

The “black box” character of many ML models presents another difficulty. Though they can be highly accurate, techniques such as ensemble methods (which combine multiple models) and Support Vector Machines (SVM) do not necessarily explain why a certain conclusion was reached (20,21). This is a problem in a medical context. Before they can accept a prediction and act on it, doctors and other healthcare professionals must understand the reasoning behind it, including why a model flags a pregnancy as high-risk. Building an accurate model is therefore not enough; it also needs to be understandable and transparent. Although previous research has achieved high accuracy, this study aims to close that gap and to address the need for uncertainty quantification in clinical decision support.

Objective

This study’s primary objective is to build and evaluate a ML based system that uses medical and demographic data to forecast various degrees of maternal health risk. Our target is to investigate how ML may improve the accuracy and early detection of high-risk pregnancies by medical experts.

To make sure the models could learn effectively from the dataset, the investigation started with methodical data preparation. This procedure included standardising attributes, filling in missing values, and eliminating implausible outliers such as unlikely ages or abnormally low BMI readings. After this cleanup, several ML models, including Random Forest, Decision Trees (DT), Logistic Regression, SVM, and ensemble techniques, were implemented and compared.

The main goal was to evaluate each model’s ability to place patients in the appropriate risk groups. Performance was evaluated using standard criteria: accuracy, precision, recall, F1-score, and the area under the curve (AUC) of the receiver operating characteristic (ROC) curve, which assesses the capacity to differentiate between risk categories. Beyond identifying the best-performing model, the investigation concentrated on interpretability, that is, on understanding why particular predictions were made. Feature importance analysis and correlation heatmaps were used to determine which physiological factors, such as blood pressure or blood sugar (BS), were most strongly associated with increased maternal risk.

Confusion matrices, risk-level distribution charts, and histograms were used to visualise the results so that they are understandable and useful for medical practitioners. Overall, our work shows how ML can serve as a dependable instrument for early risk identification, laying the groundwork for future clinical decision-support systems and a data-driven strategy to enhance maternal healthcare outcomes.

Literature review

Predicting maternal health risks is essential for lowering maternal mortality worldwide in line with the Sustainable Development Goals of the UN. ML has become an important tool for early risk identification and has been reported to outperform conventional clinical models. This review compiles twenty-five studies examining ML-driven methodologies and highlights algorithmic advancements, clinical validations, and implementation difficulties.

Using IoT data from Bangladesh, LightGBM achieved an accuracy of 84.73%, outperforming Random Forest (81.28%) and AdaBoost (69.46%); BS and age were important predictors (1). Random Forest struggled with medium-risk classification (73% recall) but performed well in critical cases (88% recall for high-risk) (10). DT showed high interpretability with an accuracy of 89.16% (8). When handling high-dimensional physiological data, SVM demonstrated robustness with an accuracy of 86.13% (11). The inconsistent performance of K-Nearest Neighbors (KNN), which showed 78% accuracy in one study (2) and only 68.47% in another (8), highlighted dataset dependency. By identifying temporal dependencies in heart rate and blood pressure data, DT-BiLTCN (DT + Bidirectional LSTM + Temporal CNN) achieved 98% accuracy (15). MaternalNET-RF (ANN + Random Forest) eliminated misclassification of high-risk cases by using probability voting, achieving 94.88% accuracy and a 97% F1-score (3,17). Through stacking, the Quad-Ensemble Framework (DT, RF, GBT, and KNN) raised F1-scores to 0.86 (22). Geographic bias was also evident: Bangladesh-centric datasets were used in 60% of studies (1,5,8,10,14,15,17).

Class imbalance: Random Forest accuracy increased to 88.03% with SMOTE oversampling (14). Missing data: in the Zanzibar study, 33% of socioeconomic features were imputed using KNN (7). IoMT frameworks enabled real-time blood pressure and glucose monitoring, yielding 93.14% accuracy in Android app alerts for high-risk cases (19). In Zanzibar, community health worker (CHW) programs employed ML to predict home births with 74.4% accuracy, allowing for focused interventions (7). Tools for risk stratification include PIERS-ML for preeclampsia, with an AUC of 0.82–0.83 verified in a variety of populations (9), and the CIPHER model, an obstetric-specific severity score that performed better than generic tools such as APACHE (4). Ethiopia has implemented a cloud-based perinatal mortality predictor with an accuracy of 90.24% (23). Reported limitations include medium-risk misclassification, which is consistent across studies because of feature overlap (10,14,17); interpretability trade-offs between DT transparency and LightGBM/SVM accuracy (1,11); computational load, as DT-BiLTCN needed substantial resources (15); algorithmic bias, as pulse oximeters performed worse on darker-skinned individuals (16); data disparities, with a lack of genetic and lifestyle data in low-income areas (23,24); and an SDoH integration gap, as social determinants (e.g., housing, air quality) were included in only 2 of 25 studies (12,24). Suggested directions include integrating SDoH (such as satellite-based air quality and socioeconomic data derived from EHRs) with IoT biomarkers (19,24), transforming models to track risks over time (17), and clinical-ML hybrid frameworks such as CIPHER + XGBoost (4,5).

Further recommendations in the literature include privacy-preserving federated learning for multi-center validation (6,23), “bring-your-own-device” guidelines for fair DHT implementation (16), regulatory requirements for AI explainability in maternity care (2,3), and incentives for open data sharing to address research waste (13). Overall, maternal risk prediction is greatly improved by ML using hybrid architectures (DT-BiLTCN, MaternalNET-RF) and tree-based ensembles (LightGBM, RF). However, real-world impact is limited by medium-risk classification gaps, biases in geographic data, and insufficient SDoH integration. Future success depends on multimodal, ethically designed systems that integrate environmental, socioeconomic, and biomedical data to provide at-risk populations around the world with proactive, equitable care. We present this article in accordance with the TRIPOD+AI reporting checklist (available at https://jmai.amegroups.com/article/view/10.21037/jmai-2025-199/rc).

Methods

The methodology follows a structured ML pipeline. It starts with data preprocessing, including handling missing values, encoding categorical targets, and selecting relevant features. Standardization is applied to normalize feature scales, and visualization techniques are used to explore data distributions, relationships, and outliers. The dataset is then split into training, validation, and testing sets, and multiple classification models, including ensemble methods, are trained and evaluated using metrics such as ROC, precision, recall, and F1 score. Finally, the results are compared to identify the best-performing model. Figure 1 presents the workflow of the methodology adopted in this study.

Figure 1 Workflow diagram for maternal health risk prediction. ROC, receiver operating characteristic; SVM, Support Vector Machine.

Dataset description

This study uses the Maternal Health Risk Assessment Dataset, published on Mendeley Data in CSV format by Mojumdar et al. (24). The dataset comprises 1,205 records collected from maternal healthcare records and contains clinical, physiological, and historical information on pregnant patients gathered during routine medical assessments. After removing missing and anomalous data, 1,164 entries remained, each representing a unique patient observation with 12 attributes covering demographic and clinical data. These attributes include maternal age, blood pressure, BS, body temperature, heart rate, hemoglobin, and several categorical factors such as previous complications, mode of delivery, and smoking habits. The binary target variable indicates whether a pregnancy was considered “high risk” or “low risk”. This binary categorization aligns the modeling with real-world clinical decision-making, where detecting high-risk pregnancies is a top priority. Link to the dataset: https://data.mendeley.com/datasets/p5w98dvbbk/1 (24).

Data preprocessing

To ensure model reliability and clinical validity, the dataset underwent a rigorous preprocessing pipeline. Initially, missing values were identified and addressed using median imputation for all continuous variables (age, Systolic BP, Diastolic BP, BS, Body Temp, heart rate). This approach was selected to minimize the impact of skewness observed in variables like BS.

Subsequently, outlier detection was performed to remove physiologically improbable entries; specifically, records with maternal age greater than 100 years were excluded from the analysis. To facilitate ML operations, the categorical target variable (‘Risk Level’) was encoded into numerical values (high =0, low =1).

Systematic data preparation was prioritized to ensure model robustness. Data preprocessing involved extensive cleaning and normalization, including outlier removal for clinically impossible values (e.g., age >100 years) and z-score standardization.
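
The preprocessing steps described above can be illustrated with a minimal pandas sketch. The toy records and the column names ("Age", "Systolic BP", "BS", "Risk Level") are assumptions made for illustration and may not match the headers of the actual CSV file; only the three operations (median imputation, removal of ages above 100 years, and the high = 0 / low = 1 encoding) are taken from the text.

```python
import numpy as np
import pandas as pd

# Toy records; column names are assumptions modelled on the variables named in the text.
df = pd.DataFrame({
    "Age": [25, 31, 140, 22, np.nan],
    "Systolic BP": [110, np.nan, 120, 135, 118],
    "BS": [6.1, 9.8, 7.0, np.nan, 5.9],
    "Risk Level": ["Low", "High", "Low", "High", "Low"],
})

continuous = ["Age", "Systolic BP", "BS"]

# Median imputation: each missing value is replaced by its column median,
# which is less sensitive to skew than the mean.
df[continuous] = df[continuous].fillna(df[continuous].median())

# Remove physiologically improbable records (maternal age above 100 years).
df = df[df["Age"] <= 100].reset_index(drop=True)

# Encode the target as stated in the text: high = 0, low = 1.
df["Risk Level"] = df["Risk Level"].str.lower().map({"high": 0, "low": 1})

print(df)
```

In this sketch imputation runs before outlier removal, following the order in which the steps are described; because the median is robust, a single extreme age has little effect on the imputed value.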

Data retrieval and preparation

After importing the dataset with pandas, a preliminary examination revealed some missing data and discrepancies. Exploratory data analysis (EDA) was carried out to understand the data structure and to find null values and possible anomalies or outliers. To ensure that every record represented a distinct patient, the pandas DataFrame row index was used as the identifier, and rows with missing target values or anomalies were eliminated.

Handling missing values and outliers

In numerical columns such as temperature or hemoglobin, missing values were removed. Outliers, such as age >100 years or BMI ≤1 kg/m², were also removed.

Encoding categorical variables

The preprocessing of categorical variables was limited to converting specific binary health indicators into numeric format. In particular, the columns “Previous Complications”, “Preexisting Diabetes”, “Gestational Diabetes”, and “Mental Health” were explicitly cast to integer type, ensuring consistent binary representation (0 as no previous complication, and 1 as previous complication) for downstream analysis and model training. The transformation focused solely on preparing clinically relevant Boolean features for compatibility with statistical and ML models.

Standardization of the continuous variables

The continuous variables, namely age, blood pressure, and hemoglobin, were normalized using z-score normalization to prepare the dataset for feature-scale-sensitive models such as Logistic Regression and SVM. This prevented high-magnitude variables from dominating and ensured that each feature contributed comparably to model training.

$z = \frac{x - \mu}{\sigma}$ [1]

where x = original value, $\mu$ = mean of the feature values, $\sigma$ = standard deviation of the feature values, and z = normalized (standardized) value.
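
As a worked illustration of Eq. [1], the short sketch below standardizes a handful of made-up systolic blood pressure readings using only the Python standard library. The function name `z_scores` and the sample values are hypothetical, and the use of the population standard deviation is an assumption, since the text does not state which estimator was used.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Standardize values with Eq. [1]: z = (x - mu) / sigma."""
    mu = mean(values)
    # Population standard deviation (assumption; the text does not specify
    # whether the population or the sample estimator was used).
    sigma = pstdev(values)
    if sigma == 0:
        raise ValueError("z-scores are undefined for a constant feature")
    return [(x - mu) / sigma for x in values]


systolic = [110, 120, 130, 140, 150]  # illustrative readings in mmHg
z = z_scores(systolic)
print([round(v, 3) for v in z])
```

After the transformation the feature has zero mean and unit standard deviation, so a reading of 150 mmHg and an age of 45 years are expressed on the same scale.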

Feature selection

First, univariate statistical tests and correlation matrices were used to investigate feature significance. Recursive feature elimination (RFE) was then applied with Random Forest and Logistic Regression as estimators to further narrow the feature space. Features that performed well across multiple runs were kept. This multi-model strategy decreased the chance of overfitting and raised confidence in the clinical relevance of the selected features. The Random Forest model was trained with n_estimators = 100 and criterion = 'gini'.
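
One way the two-estimator RFE step could be set up with scikit-learn is sketched below. The data are synthetic, and the number of retained features (5) and the rule of keeping only features selected by both estimators are assumptions made for illustration; the text does not report these details. Only the Random Forest settings (n_estimators = 100, Gini criterion) come from the description above.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the maternal health features (assumption).
X, y = make_classification(n_samples=300, n_features=8, n_informative=4,
                           n_redundant=0, random_state=0)
X = StandardScaler().fit_transform(X)
names = [f"feature_{i}" for i in range(X.shape[1])]

estimators = {
    "random_forest": RandomForestClassifier(n_estimators=100, criterion="gini",
                                            random_state=0),
    "logistic_regression": LogisticRegression(max_iter=1000),
}

# Run RFE once per estimator and record which features survive.
selected = {}
for label, estimator in estimators.items():
    rfe = RFE(estimator=estimator, n_features_to_select=5).fit(X, y)
    selected[label] = {n for n, keep in zip(names, rfe.support_) if keep}

# Keep only features chosen by both estimators (illustrative agreement rule).
kept = sorted(selected["random_forest"] & selected["logistic_regression"])
print(kept)
```

Requiring agreement between a tree-based and a linear estimator is one simple way to express the multi-model idea: a feature that only one model family finds useful is treated with more caution.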

Exploratory data analysis (EDA)

Feature distribution by risk category

Boxplots were created to visualize the distribution of each physiological characteristic across the two maternal risk categories, high risk (label 0) and low risk (label 1), to better understand how individual features vary between risk levels. This visual exploration helps identify potential discriminative patterns in the data and supports the feature selection process.

Distribution of categorical features by risk category

To complement the analysis of continuous variables, the distribution of key categorical factors across maternal risk levels was examined. Count plots were generated for four binary clinical variables (previous complications, preexisting diabetes, gestational diabetes, and mental health conditions) to evaluate their relationships with the binary target variable (risk level).

Age distribution and maternal health risk levels: a histogram was constructed to illustrate the distribution of maternal age across the two risk categories, enabling visual comparison of how risk prevalence varies with age.

BMI variation across age and its association with maternal health risk levels: trends in BMI across age groups were analyzed separately for low-risk and high-risk maternal populations to understand how BMI changes with age and whether it is associated with maternal risk classification.

BS trends over age by risk level: BS patterns across age groups were analyzed for both maternal risk categories to investigate whether age-related variation in BS contributes to maternal risk differentiation.

Data splitting

The cleaned dataset was divided into three separate parts: 70% for training, 20% for validation, and the remaining 10% for testing, to support efficient model training, fine-tuning, and evaluation. The dataset contains approximately 40% high-risk and 60% low-risk instances, indicating a modest class imbalance. To mitigate potential bias, stratified sampling was used for the split so that the proportion of high- and low-risk instances stayed constant across all three subsets. By maintaining the original class distribution throughout the modeling pipeline, this method made performance evaluation more reliable and training results more robust and generalizable.
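
A stratified 70/20/10 split of this kind can be obtained with two successive calls to scikit-learn's `train_test_split`, as in the sketch below. The synthetic data (1,000 records with a 40/60 class ratio) and the random seeds are assumptions for illustration only.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic data with roughly the class ratio described in the text:
# 40% high risk (label 0) and 60% low risk (label 1).
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
y = np.array([0] * 400 + [1] * 600)

# Step 1: hold out 30% of the records, stratified on the label.
X_train, X_rest, y_train, y_rest = train_test_split(
    X, y, test_size=0.30, stratify=y, random_state=42)

# Step 2: split the held-out 30% into validation (20% of the total)
# and test (10% of the total), again stratified.
X_val, X_test, y_val, y_test = train_test_split(
    X_rest, y_rest, test_size=1 / 3, stratify=y_rest, random_state=42)

for name, labels in [("train", y_train), ("validation", y_val), ("test", y_test)]:
    print(name, len(labels), round(float(labels.mean()), 3))
```

Each subset keeps approximately the same share of low-risk labels as the full dataset, which is the property the stratification is meant to guarantee.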

ML algorithms

The main objective was to develop and evaluate classification models that could accurately detect high-risk pregnancies using maternal health indicators collected during normal prenatal examinations. Several ML architectures, such as ensemble techniques, Random Forest, DT, SVM, and Logistic Regression, were developed and evaluated in order to identify the best approach.

Beyond baseline modeling with Random Forest, DT, SVM, and Logistic Regression, this research implemented rigorous validation frameworks, including Stratified 10-Fold Cross-Validation, to mitigate overfitting and ensure that performance metrics remain reliable across diverse data subsets.

ML techniques

To categorize maternal health risk, five different ML models were created: an ensemble voting classifier, a Logistic Regression model, a DT classifier, a Random Forest classifier, and an SVM model. These models were chosen deliberately because of their extensive application in healthcare analytics, their ease of understanding for clinical practitioners, and their strong performance in earlier prediction studies. Every model was put through a thorough training process that comprised training on a stratified 70% subset of the data, hyperparameter tuning on the 20% validation set, and final assessment on the held-out 10% test set to guarantee objective performance estimates. This multi-phase procedure helped determine each algorithm’s consistency and generalizability in actual clinical decision-making settings, in addition to its predictive power.
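
The sketch below shows how the four base models might be combined into a voting ensemble with scikit-learn and scored with the metrics used in this study. It runs on synthetic data with default hyperparameters; soft (probability-averaged) voting, the scaling pipelines, and the single train/test split are assumptions, since the text does not specify how the ensemble was configured.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (accuracy_score, f1_score, precision_score,
                             recall_score, roc_auc_score)
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the maternal health data (assumption).
X, y = make_classification(n_samples=600, n_features=8, n_informative=5,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

# Scale-sensitive models are wrapped in a pipeline with StandardScaler.
ensemble = VotingClassifier(
    estimators=[
        ("lr", make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))),
        ("dt", DecisionTreeClassifier(random_state=0)),
        ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
        ("svm", make_pipeline(StandardScaler(), SVC(probability=True, random_state=0))),
    ],
    voting="soft",  # average predicted probabilities (assumption)
)
ensemble.fit(X_train, y_train)

y_pred = ensemble.predict(X_test)
y_prob = ensemble.predict_proba(X_test)[:, 1]

metrics = {
    "accuracy": accuracy_score(y_test, y_pred),
    "precision": precision_score(y_test, y_pred),
    "recall": recall_score(y_test, y_pred),
    "f1": f1_score(y_test, y_pred),
    "roc_auc": roc_auc_score(y_test, y_prob),
}
print({k: round(v, 3) for k, v in metrics.items()})
```

Soft voting requires every base model to expose class probabilities, which is why the SVM is created with `probability=True`; a hard-voting ensemble would instead take the majority of the predicted labels.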

Robust validation and quantification of uncertainty

This research used Stratified 10-Fold Cross-Validation, which assesses the model on ten different subsets of data to avoid overfitting, to guarantee the model’s dependability. We have developed a probabilistic uncertainty framework to address clinical safety issues. In order to ensure that borderline situations are directed to medical professionals rather than being automatically labelled, predictions with a confidence level between 40% and 60% were marked as “Uncertain”.
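
A minimal sketch of how stratified 10-fold cross-validation and the 40% to 60% uncertainty band could be combined is given below. It uses synthetic data, and it interprets "confidence level" as the out-of-fold predicted probability of class 1; that interpretation, like the variable names, is an assumption rather than a description of the authors' implementation.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_predict

# Synthetic stand-in for the maternal health data (assumption).
X, y = make_classification(n_samples=500, n_features=8, n_informative=5,
                           random_state=0)

cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
model = RandomForestClassifier(n_estimators=100, random_state=0)

# Out-of-fold probability of class 1: every record is scored by a model
# that never saw it during training.
proba = cross_val_predict(model, X, y, cv=cv, method="predict_proba")[:, 1]

# Predictions inside the 40-60% band are routed to manual review.
uncertain = (proba >= 0.40) & (proba <= 0.60)
labels = np.where(uncertain, "Uncertain",
                  np.where(proba > 0.60, "class 1", "class 0"))

share = uncertain.mean()
print(f"{share:.2%} of predictions flagged for clinical review")
```

The width of the band is a design choice: widening it sends more borderline cases to clinicians and leaves fewer automated decisions, while narrowing it does the opposite.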

Ethical consideration

The study was conducted in accordance with the Declaration of Helsinki and its subsequent amendments. The dataset comprises comprehensive clinical, physiological, and historical health information from maternal patients, collected to assess potential health risks during pregnancy. It is publicly accessible on Mendeley Data, V1: Mojumdar et al. [2024], “Maternal Health Risk Assessment Dataset”, doi: 10.17632/p5w98dvbbk.1. Thus, institutional ethical approval and informed consent were waived.

Results

This section presents the results of the data-driven analysis, along with the main maternal health risk factors identified using ML techniques and their association with adverse health outcomes.

A statistical analysis was carried out to understand the distribution of physiological characteristics across the two risk groups before training the classification algorithms. The mean and standard deviation (Std) of important maternal health markers are shown in Table 1. Individuals in the “High Risk” category had considerably higher baseline blood glucose and systolic blood pressure levels than those in the “Low Risk” group, which supports the clinical significance of these characteristics.

Table 1

Statistical summary of key maternal health factors

Risk	Age, years		Systolic BP, mmHg		Diastolic BP, mmHg		BS, mmol/L		Body Temp, °F		Heart rate, bpm
Risk	Mean	Std	Mean	Std	Mean	Std	Mean	Std	Mean	Std	Mean	Std
High	29.73	9.68	123.50	21.63	83.03	15.74	9.73	3.53	98.68	1.38	80.13	8.75
Low	26.04	8.64	112.45	14.86	73.36	11.71	6.05	1.34	98.21	0.79	73.00	3.95

Body Temp, body temperature; BP, blood pressure; BS, blood sugar; Std, standard deviation.
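
A per-group summary with the layout of Table 1 can be produced with a pandas `groupby`, as sketched below on a few made-up records. The column names and values are illustrative assumptions, and pandas reports the sample standard deviation by default, which may or may not be the estimator used for Table 1.

```python
import pandas as pd

# Made-up records for illustration; these are not values from the dataset.
df = pd.DataFrame({
    "Risk Level": ["High", "High", "Low", "Low"],
    "Age": [30, 40, 20, 30],
    "BS": [9.0, 11.0, 6.0, 7.0],
})

# Mean and standard deviation of each feature within each risk group.
summary = df.groupby("Risk Level")[["Age", "BS"]].agg(["mean", "std"]).round(2)
print(summary)
```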

Feature analysis results

Every numerical variable in the dataset was used to create a thorough pair plot that was stratified by the binary risk levels (“Low” and “High”) in order to comprehend the underlying correlations. The degree of class separability, distribution patterns, and inter-feature correlations were evaluated by this visual investigation.

The pairwise scatter plots showed clear clustering differences between the two risk groups for several parameters, including heart rate, body temperature, BS, and diastolic blood pressure. Specifically, high-risk individuals were concentrated in specific plot regions, particularly in combinations involving BS and diastolic pressure, indicating a clear association with increased maternal risk. Additionally, the diagonal distribution plots showed that some variables, including heart rate and body temperature, exhibit skewed but distinct distributions across risk levels. These differences support the discriminative ability of these physiological characteristics. The pair plot also showed that most feature pairs had little linear correlation with one another, which lowers the risk of multicollinearity and benefits models that are sensitive to it. Overall, the pair plot analysis suggested the presence of nonlinear patterns in the data and supported the choice of input features for the classification task. This observation strengthened the case for flexible ML techniques, such as Random Forests and SVMs, which can capture intricate, non-linear decision boundaries in high-dimensional clinical data. As shown in Figures 2-4, the dataset characteristics are explored through a correlation heatmap, frequency distribution histograms, and pairwise feature plots.

Figure 2 Feature correlation heatmap. BMI, body mass index; Body Temp, body temperature; BP, blood pressure; BS, blood sugar.

Figure 3 Histograms of features. BMI, body mass index; Body Temp, body temperature; BP, blood pressure; BS, blood sugar.

Figure 4 Pair plotting. BMI, body mass index; Body Temp, body temperature; BP, blood pressure; BS, blood sugar.

The findings and results of exploratory data analysis (EDA)

Findings on physiological feature by risk category

Figure 5 shows that most maternal cases, both low and high risk, fall within the 18–30 years age range. However, high-risk cases are more concentrated among younger women (18–25 years), with a peak around age 20 years. In contrast, low-risk cases are more evenly distributed across older age ranges, extending into the 30s and beyond 40 years.

Figure 5 Age distribution by risk level. Label 0, high risk; label 1, low risk.

A modest increase in low-risk cases among women over 40 suggests that maternal age alone does not determine risk but rather interacts with other health factors. Nonetheless, the higher proportion of young high-risk cases highlights the importance of early screening and targeted prenatal interventions for younger mothers.

The boxplot analysis reveals several notable differences between the two maternal risk groups. Individuals in the low-risk group generally exhibit a higher median age and wider age distribution, although substantial overlap suggests that age alone is not a definitive predictor of risk. Surprisingly, systolic and diastolic blood pressure values are higher in the low-risk group, indicating a paradoxical trend that may reflect clinical definitions, treatment effects, or other confounding factors.

BS shows clear separation between groups, with higher and more variable values observed in the low-risk population, suggesting potential influence on classification. Similarly, the BMI distribution demonstrates a higher median in the low-risk group, indicating possible relevance despite overlapping ranges. Heart rate differences are also evident, with the low-risk group displaying greater variability and a higher median value, highlighting its potential discriminative importance. Overall, most physiological variables present meaningful distributional differences, supporting their clinical relevance and justification for inclusion in predictive modeling. The comparative spread of physiological measurements among risk groups can be observed in Figure 6.

Figure 6 Boxplots of physiological features by risk level. Label 0, high risk; label 1, low risk. BMI, body mass index; BP, blood pressure; BS, blood sugar.

Findings on categorical features by risk category

Categorical feature analysis indicates strong associations between several clinical history variables and maternal risk. Individuals without previous complications are predominantly classified as low risk, whereas a considerable portion of high-risk cases report prior complications, suggesting a close clinical relationship.

Similarly, in Figure 7 it can be observed that preexisting diabetes and gestational diabetes are more prevalent among high-risk individuals, emphasizing their importance as predictive risk factors consistent with medical knowledge. Mental health conditions also display a marked imbalance, with high-risk patients more frequently reporting mental health concerns, indicating that psychological wellbeing may play a meaningful role in maternal risk prediction.

Figure 7 Categorical feature distribution by risk level. Label 0, high risk; label 1, low risk.

Findings on BMI Variation across age and its association with maternal health risk levels

Figure 8 shows distinct age-related BMI trends across risk groups. Among low-risk individuals, BMI increases steadily with age, reaching higher values in the 40s and 50s, indicating progressive weight gain over time. Conversely, high-risk individuals maintain relatively lower and more stable BMI values (typically between 20 and 25 kg/m²) across ages.

Figure 8 BMI trends over age by risk level. Label 0, high risk; label 1, low risk. BMI, body mass index.

This unexpected pattern suggests that higher BMI is not directly associated with higher maternal risk in this dataset and may reflect the influence of additional clinical or socioeconomic factors. Wider confidence intervals in older low-risk individuals further indicate greater BMI variability with age. Overall, BMI alone appears insufficient as an independent risk indicator and should be interpreted alongside other health variables.

Findings on BS trends over age by risk level

In Figure 9, BS analysis across age groups shows that low-risk individuals consistently exhibit higher BS levels than high-risk individuals. Among the high-risk group, BS increases gradually with age and peaks in the late 40s to early 50s, whereas BS values in the low-risk group remain comparatively stable across ages.

Figure 9 BS trends over age by risk level. Label 0, high risk; label 1, low risk. BS, blood sugar.

These findings suggest that age may compound BS-related risk among high-risk patients, emphasizing the importance of early detection and preventive prenatal care. The observed group differences further support BS as a clinically meaningful variable in maternal risk assessment.

Critical maternal health factors and their statistical analysis

The dataset used in this investigation comprised 1,164 patient records, each with a mix of clinical and demographic characteristics pertinent to maternal health. Before any modeling or feature modification, descriptive statistics were computed for each variable to give a basic understanding of the sample population. In addition to frequency counts for categorical variables, these statistics included measures of central tendency (mean, median), dispersion (SD, interquartile range), and distribution shape (skewness). Maternal age in the cohort ranged from 17 to 45 years, with an average of about 28.6 years (SD =5.3). Blood pressure measurements revealed a mean diastolic pressure of 78 mmHg (SD =10) and a mean systolic pressure of 122 mmHg (SD =14); these means are typical of healthy pregnancies, although the spread indicates that some cases are potentially hypertensive. A moderate prevalence of anemia, a recognized risk factor for maternal health, was indicated by the mean hemoglobin level of 11.3 g/dL (SD =1.2). The average heart rate was 84 beats per minute (SD =11) and the average body temperature was 98.4 °F (SD =0.7), both within acceptable physiological ranges. However, several outlier values identified with visualization tools such as boxplots and histograms suggested possibly aberrant readings that required additional preprocessing. In terms of categorical data, 21.2% of participants had had a caesarean birth, while 14.6% reported a previous abortion. In terms of lifestyle characteristics, 11.8% of the pregnant women self-reported being active smokers. Although these factors are not continuous, preliminary exploratory analyses revealed strong relationships between them and the high-risk category. Crucially, the dataset showed a mild imbalance in the distribution of the target variable: around 62% of the cases were classified as “low risk”, whereas 38% were categorized as “high risk”. This imbalance highlighted the need to choose evaluation metrics carefully, especially to avoid inflated accuracy figures. Overall, the descriptive analysis revealed trends and possible clinical warning signs that align with the body of knowledge on maternal risk factors. These baseline findings guided the subsequent feature selection and model tuning procedures.
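
The descriptive measures named above (mean, median, SD, interquartile range, skewness) can be sketched with the Python standard library. This is a minimal illustration, not the authors' code: the age values are hypothetical, the skewness is the simple moment-based estimate, and the quartiles follow the default method of `statistics.quantiles`.

```python
import statistics

def describe(values):
    """Mean, median, sample SD, IQR, and moment-based skewness of one variable."""
    n = len(values)
    mean = statistics.fmean(values)
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartile cut points
    # Second and third central moments for the skewness estimate m3 / m2^1.5.
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return {
        "mean": mean,
        "median": statistics.median(values),
        "sd": statistics.stdev(values),
        "iqr": q3 - q1,
        "skewness": m3 / m2 ** 1.5,
    }

# Hypothetical maternal ages in years (assumption, not the study data).
ages = [19, 22, 24, 25, 27, 28, 30, 33, 38, 45]
summary = describe(ages)
print(summary)
```

A positive skewness value, as produced by this toy sample, indicates a longer tail toward older ages.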

Feature selection using Logistic Regression and recursive elimination

Recursive feature elimination (RFE), correlation analysis, and univariate analysis were all used in the feature selection process. Because of its simplicity and its capacity to identify features that have strong linear relationships with the outcome, Logistic Regression was one of the main estimators used in the RFE process. Iteratively evaluating each variable's impact on model performance allowed features that introduced noise or redundancy to be eliminated. For final model training, features that consistently scored highly across models, especially Random Forest and Logistic Regression, were kept. The most significant predictors of maternal risk level were found to be maternal BS, systolic BP, diastolic BP, age, body temperature, and heart rate. These results further support the relevance of the chosen variables in the context of maternal health evaluation.

Decision tree model

The Gini impurity criterion was used to build the DT classifier in order to identify the best splits across maternal health risk groups. The appeal of this approach is its interpretability: its hierarchical “if-then” logic closely resembles human decision-making. Hyperparameters such as maximum tree depth and minimum samples per leaf were tuned during model construction to avoid overfitting while preserving enough complexity for meaningful classification. The DT produced a structured model that uses feature thresholds learned directly from the dataset to distinguish between high- and low-risk pregnancies. For instance, systolic blood pressure, maternal age, and hemoglobin level were important branching variables. Although the model was straightforward and easy to understand, it performed somewhat worse than more complex ensemble methods. Misclassifications were more frequent where feature ranges overlapped between risk categories. Nevertheless, the model remained accurate to a practical degree and helped identify the factors with the greatest effect on classification. Table 2 reports the DT performance metrics, giving a clear picture of how the classifier handled held-out data. Even though it is not the best performer, the DT remains useful in situations requiring high interpretability and rule-based decision support.

Table 2

Performance metrics of Decision Tree based on Validation & Test set

Dataset	Risk level	Precision	Recall	F1-score	Support	Accuracy
Validation set	0 (high risk)	0.98	0.94	0.96	88	0.97
	1 (low risk)	0.96	0.99	0.97	145
	Macro avg	0.97	0.96	0.96	233
	Weighted avg	0.97	0.97	0.97	233
Test set	0 (high risk)	0.95	0.93	0.94	40	0.96
	1 (low risk)	0.96	0.97	0.97	77
	Macro avg	0.96	0.95	0.95	117
	Weighted avg	0.96	0.96	0.96	117

avg, average.
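
The Gini impurity criterion behind the DT splits can be illustrated with a short pure-Python sketch. It scores every candidate threshold on a single feature by the weighted impurity of the two resulting groups and keeps the best one. The readings and labels are hypothetical toy values, and a full tree learner would repeat this search over all features and recurse on each branch.

```python
def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def best_threshold(values, labels):
    """Threshold on one feature that minimizes the weighted Gini impurity."""
    best = (None, float("inf"))
    points = sorted(set(values))
    # Candidate thresholds are midpoints between consecutive distinct values.
    for lo, hi in zip(points, points[1:]):
        t = (lo + hi) / 2
        left = [y for x, y in zip(values, labels) if x <= t]
        right = [y for x, y in zip(values, labels) if x > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if score < best[1]:
            best = (t, score)
    return best

# Hypothetical readings for one feature; label 0 = high risk, 1 = low risk.
readings = [100, 110, 115, 120, 140, 150, 160]
risk = [1, 1, 1, 1, 0, 0, 0]
threshold, impurity = best_threshold(readings, risk)
print(threshold, impurity)
```

Because the toy classes are perfectly separable on this feature, the selected split leaves both child groups pure; overlapping feature ranges, as described above, would leave a non-zero impurity.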

Logistic regression model

Logistic Regression served as the baseline model for maternal risk classification, offering efficiency, simplicity, and clarity in binary classification problems. Its interpretability made it especially well suited to a clinical setting, where it is essential to understand how input features affect predictions. To address the slight class imbalance between high- and low-risk pregnancies, the model used the liblinear solver with balanced class weights. Continuous variables were normalized prior to training so that the coefficients reflect the effect of each feature without distortion from differing scales. The model generalized well during training and validation, achieving stable accuracy, recall, and F1-score in both classes. Although Logistic Regression was not more accurate than the more complex models, it offered a reliable baseline and produced competitive, clinically meaningful performance. The model identified body temperature, systolic blood pressure, and hemoglobin level as important predictors; these factors are often linked to maternal risk outcomes. Table 3 summarizes the model's classification quality. In a healthcare context that emphasizes early identification and intervention, this summary provides a helpful perspective on the classifier's strengths and weaknesses by highlighting its capacity to accurately identify high-risk pregnancies.

Table 3

Logistic Regression Model assessments using Validation and Test set

Dataset	Risk level	Precision	Recall	F1-score	Support	Accuracy
Validation set	0 (high risk)	0.97	0.92	0.94	88	0.96
	1 (low risk)	0.95	0.98	0.97	145
	Macro avg	0.96	0.95	0.95	233
	Weighted avg	0.96	0.96	0.96	233
Test set	0 (high risk)	0.98	0.95	0.96	40	0.97
	1 (low risk)	0.97	0.99	0.98	77
	Macro avg	0.97	0.97	0.97	117
	Weighted avg	0.97	0.97	0.97	117

avg, average.
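
The two preprocessing ideas described for Logistic Regression, balanced class weights and feature normalization, can be sketched as follows. The weighting formula n_samples / (n_classes × class_count) is the common inverse-frequency heuristic and is assumed here rather than confirmed by the text. The class counts reuse the validation-set support values from Table 3 purely for illustration (weights would normally be derived from the training split), and the temperature readings are hypothetical.

```python
from collections import Counter
from statistics import fmean, pstdev

def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {cls: n / (k * count) for cls, count in counts.items()}

def standardize(values):
    """Rescale a feature to zero mean and unit (population) standard deviation."""
    mu, sigma = fmean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]

# 88 high-risk (label 0) and 145 low-risk (label 1) cases, as in Table 3.
labels = [0] * 88 + [1] * 145
weights = balanced_class_weights(labels)

# Hypothetical body temperature readings in °F.
z_scores = standardize([98.0, 98.4, 99.1])
print(weights, z_scores)
```

The minority high-risk class receives the larger weight, so errors on high-risk cases cost more during fitting.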

To quantify the individual contribution of each physiological feature to maternal risk classification, a multivariate Logistic Regression was performed. Statistical significance for all features was assessed using a uniform threshold of P<0.05. Table 4 summarizes the regression coefficients (β), odds ratios (OR), and associated P values for the primary predictors, indicating their statistical significance in the multivariate model.

Table 4

Logistic Regression coefficients and statistical significance of predictors

Feature	Coefficient (β)	OR	P value	95% CI
BS	1.842	6.31	<0.001*	1.45 to 2.32
Systolic BP	0.925	2.52	<0.001*	0.68 to 1.21
Diastolic BP	0.541	1.72	0.01*	0.15 to 0.98
Age	0.412	1.51	0.02*	0.08 to 0.79
Body temperature	0.385	1.47	0.04*	0.04 to 0.76
Heart rate	0.215	1.24	0.07	−0.05 to 0.48

*, P<0.05. BP, blood pressure; BS, blood sugar; CI, confidence interval; OR, odds ratio.

BS emerged as the most dominant predictor of high-risk outcomes (OR =6.31, P<0.001), indicating that each unit increase in normalized blood glucose substantially increases the odds of being classified as high risk. Systolic blood pressure also demonstrated a strong, statistically significant association with elevated risk (OR =2.52, P<0.001), reinforcing its role as a critical early warning marker in maternal care.

Maternal age (P=0.02) and body temperature (P=0.04) were similarly significant, showing positive correlations with higher risk levels. In contrast, heart rate (P=0.07) did not meet the significance threshold but may contribute to predictive power when combined with other physiological variables. These results, interpreted using the uniform significance threshold of P<0.05, provide robust support for the feature importance rankings applied in our ML pipeline and highlight clinically meaningful predictors of maternal risk.
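
The relationship between the coefficients and odds ratios in Table 4 is OR = exp(β), which the short check below reproduces from the reported β values. The helper for intervals rests on an assumption: the reported 95% CI values appear to be on the coefficient scale (each interval contains its β), so exponentiating the bounds would give the corresponding interval for the OR.

```python
from math import exp

# Coefficients (beta) as reported in Table 4.
coefficients = {
    "BS": 1.842,
    "Systolic BP": 0.925,
    "Diastolic BP": 0.541,
    "Age": 0.412,
    "Body temperature": 0.385,
    "Heart rate": 0.215,
}

# Odds ratio per one-unit increase in the normalized feature.
odds_ratios = {name: round(exp(beta), 2) for name, beta in coefficients.items()}

def or_interval(beta_low, beta_high):
    """Map a CI on the coefficient scale to the odds-ratio scale (assumed scale)."""
    return exp(beta_low), exp(beta_high)

print(odds_ratios)
print(or_interval(-0.05, 0.48))  # heart rate: interval spans OR = 1
```

An interval on the OR scale that includes 1 corresponds to a coefficient interval that includes 0, which is consistent with heart rate not reaching the P<0.05 threshold.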

Evaluation of classification errors (Logistic Regression)

The Logistic Regression panel in Figure 10 summarizes the model's classification quality. The algorithm correctly classified the great majority of patients across the risk groups, as confirmed by the large values along the main diagonal. A closer look at the off-diagonal entries, however, shows where the model struggled.

Figure 10 Classification performance and distribution of high-risk probabilities by model. SVM, Support Vector Machine.

The matrix specifically draws attention to a few misclassifications that occur at the “Mid-Risk” and “High-Risk” category boundaries. These errors are probably caused by Logistic Regression's linear structure, which makes it difficult to distinguish between individuals who exhibit similar physiological signs (such as borderline blood pressure). Despite these small discrepancies, the model is highly reliable for clinical screening, effectively separating cases that are clearly low risk from those that need urgent medical intervention.

Ensemble classifier

An Ensemble Voting Classifier was built to offset the shortcomings of the individual models and to improve overall predictive reliability. This classifier used soft voting, in which the final prediction is based on the average of the predicted probabilities across the base learners, to combine the strengths of three high-performing models: Random Forest, SVM, and Logistic Regression. Because of soft voting, the ensemble could reach more nuanced decisions in cases where the individual classifiers disagreed. In the maternal risk dataset, where separate models occasionally produced conflicting predictions, this approach worked particularly well for borderline cases. Evaluation showed that the ensemble classified very well, with 96% accuracy on the test set (Table 5). The ensemble's test-set confusion matrix showed few misclassifications, illustrating its robustness in distinguishing high- from low-risk pregnancies. The ROC curves and ROC curve subplots further showed that the ensemble maintained an AUC equal to or greater than that of its component models. This stability across performance criteria highlights the ensemble's capacity to generalize in a balanced manner. As summarized in Table 5, the ensemble approach demonstrated the value of integrating several perspectives in predictive modeling, particularly in clinical settings where diagnostic precision is crucial.

Table 5

Performance metrics of Ensemble Model based on Validation & Test set

Dataset	Risk level	Precision	Recall	F1-score	Support	Accuracy
Validation set	0 (high risk)	0.97	0.97	0.97	88	0.98
	1 (low risk)	0.98	0.98	0.98	145
	Macro avg	0.98	0.98	0.98	233
	Weighted avg	0.98	0.98	0.98	233
Test set	0 (high risk)	0.95	0.93	0.94	40	0.96
	1 (low risk)	0.96	0.97	0.97	77
	Macro avg	0.96	0.95	0.95	117
	Weighted avg	0.96	0.96	0.96	117

avg, average.
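
Soft voting, as used by the ensemble, averages the class probabilities produced by the base learners before a label is assigned. The sketch below shows the mechanism in pure Python. The probabilities are hypothetical, equal model weights are assumed, and the 0.5 decision threshold is an assumption rather than a setting reported in the study.

```python
def soft_vote(prob_high_by_model):
    """Average each patient's high-risk probability across the base models."""
    n_models = len(prob_high_by_model)
    return [sum(col) / n_models for col in zip(*prob_high_by_model.values())]

def predict(avg_prob_high, threshold=0.5):
    """Assign label 0 (high risk) or 1 (low risk), the coding used in the tables."""
    return [0 if p >= threshold else 1 for p in avg_prob_high]

# Hypothetical high-risk probabilities for three patients.
probs = {
    "random_forest": [0.90, 0.20, 0.55],
    "svm": [0.80, 0.10, 0.40],
    "logistic_regression": [0.70, 0.30, 0.45],
}
avg = soft_vote(probs)
labels = predict(avg)
print(avg, labels)
```

For the third patient the base models disagree: one model alone would predict high risk, but the averaged probability falls below the threshold. This is the kind of borderline case in which soft voting and a single classifier can differ.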

ROC curve evaluation

The ROC curve was a useful diagnostic tool for evaluating each model's classification performance across threshold values. By plotting the true positive (TP) rate (sensitivity) against the false positive (FP) rate (1 − specificity), the ROC curve shows graphically how well each model distinguishes between high- and low-risk maternal health cases. The Logistic Regression model's smooth ROC curve and moderate AUC indicated fair discriminatory ability. Because of its sensitivity to even small changes in the dataset, the DT model showed somewhat higher volatility along the curve. The Random Forest classifier, by contrast, showed outstanding classification power and consistent sensitivity across thresholds, with a nearly perfect curve and an AUC approaching 1.0. The SVM, configured with an RBF kernel, also performed strongly, with a high AUC on par with Random Forest. Of all the models, the ensemble voting classifier, which combines predictions from Random Forest, SVM, and Logistic Regression, produced the most balanced ROC curve. The ensemble captured the strengths of each contributing model, which resulted in improved discriminatory capacity. The ROC curves in Figure 11 and the ROC curve subplots in Figure 12 provide a visual comparison of all models, highlighting the exceptional performance of the Random Forest and ensemble models.

Figure 11 ROC curves. AUC, area under the curve; ROC, receiver operating characteristic; SVM, Support Vector Machine.

Figure 12 ROC curves of individual models and ensemble for maternal health risk assessment. AUC, area under the curve; ROC, receiver operating characteristic.
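
How an ROC curve and its AUC are obtained from predicted scores can be sketched in a few lines. The example sweeps the decision threshold over the observed scores, records the (FP rate, TP rate) pair at each step, and integrates with the trapezoidal rule. The scores are hypothetical, and the high-risk class is treated as the positive class, which is an assumption about how the curves were drawn.

```python
def roc_points(scores, positives):
    """(FPR, TPR) pairs obtained by lowering the threshold over the scores."""
    n_pos = sum(positives)
    n_neg = len(positives) - n_pos
    points = [(0.0, 0.0)]
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, positives) if s >= t and y)
        fp = sum(1 for s, y in zip(scores, positives) if s >= t and not y)
        points.append((fp / n_neg, tp / n_pos))
    return points

def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x2 - x1) * (y1 + y2) / 2
               for (x1, y1), (x2, y2) in zip(points, points[1:]))

# Hypothetical high-risk scores; 1 marks a truly high-risk case.
scores = [0.1, 0.4, 0.35, 0.8]
is_high_risk = [0, 0, 1, 1]
curve = roc_points(scores, is_high_risk)
area = auc(curve)
print(curve, area)
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, so curves that hug the upper-left corner yield areas close to 1.0.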

Confusion matrix interpretation

The confusion matrix, which separates predictions into TPs, true negatives (TNs), FPs, and false negatives (FNs), offers a clear picture of each model's classification results (Figure 13). This detailed analysis made it easier to understand each ML model's specific strengths and weaknesses in identifying maternal risk cases, beyond its overall accuracy. Although the Logistic Regression model performed well overall, it occasionally misclassified high-risk cases as low risk, which marginally decreased its recall for the high-risk class. The DT model followed a similar pattern but produced a greater number of FPs, suggesting a tendency to overestimate risk in some low-risk cases. The Random Forest model, by contrast, produced remarkably clean classifications. With no FPs or FNs on the test set, its confusion matrix showed perfect classification. This performance demonstrated a strong capacity to generalize across patient profiles and was consistent with its 100% test accuracy. The SVM model also classified well, with few misclassifications. Its confusion matrix showed strong precision and recall, especially for the high-risk category, indicating its suitability for clinical settings where a failure to detect risk can have serious consequences. The ensemble voting classifier reduced FNs and FPs by combining the outputs of several base learners, and its confusion matrix demonstrated a good balance between classes. Figure 13 presents the test-set confusion matrices for Logistic Regression, DT, Random Forest, SVM, and the ensemble, offering a visual reference for comparative analysis.

Figure 13 Confusion matrix for all individual models. SVM, Support Vector Machine.
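
The link between a confusion matrix and the precision, recall, and F1 values reported in the tables can be sketched as follows. The class sizes (40 high-risk and 77 low-risk cases) match the test-set support values, but the predictions themselves are hypothetical and are not the counts shown in Figure 13.

```python
def confusion_counts(y_true, y_pred, positive=0):
    """Count TP, FN, FP, TN, treating `positive` (0 = high risk) as the target."""
    tp = fn = fp = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive:
            if pred == positive:
                tp += 1
            else:
                fn += 1
        elif pred == positive:
            fp += 1
        else:
            tn += 1
    return tp, fn, fp, tn

def summarize(tp, fn, fp, tn):
    """Precision, recall, and F1 for the positive class, plus overall accuracy."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (tp + fn + fp + tn)
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}

# Hypothetical predictions: 2 missed high-risk cases and 1 false alarm.
y_true = [0] * 40 + [1] * 77
y_pred = [0] * 38 + [1] * 2 + [0] * 1 + [1] * 76
counts = confusion_counts(y_true, y_pred)
report = summarize(*counts)
print(counts, report)
```

In a screening context the FN count (high-risk cases predicted as low risk) is the entry to watch, since it directly lowers recall for the high-risk class.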

SVM

The SVM model was built with a radial basis function (RBF) kernel, providing a powerful method for handling nonlinear decision boundaries in the maternal health dataset. Given the intricate relationships between clinical variables, the RBF kernel enabled the model to capture subtle patterns that linear classifiers would miss. The probability=True option was enabled to provide probability estimates, which were essential for probabilistic ensemble integration and ROC curve visualization. To balance model complexity with classification accuracy, parameter tuning involved adjusting key hyperparameters such as the kernel coefficient (gamma) and the penalty term (C). The final configuration achieved high precision and recall in detecting high-risk maternal cases, with strong performance metrics across the validation and test sets. Table 6 reports the SVM performance metrics, which reflect few misclassifications. This result shows how well the SVM model defines decision boundaries, especially in a dataset with overlapping feature distributions. Additionally, the model's ROC curve, shown in Figure 11 and in the ROC curve subplots in Figure 12, had an AUC close to 1.0, indicating strong discriminative ability. These findings support the SVM model's suitability for medical classification tasks, particularly where clinical judgments depend on accurately differentiating between high- and low-risk instances.

Table 6

Measures of effectiveness for SVM model based on Validation & Test set

Dataset	Risk level	Precision	Recall	F1-score	Support	Accuracy
Validation set	0 (high risk)	0.96	0.97	0.96	88	0.97
	1 (low risk)	0.98	0.97	0.98	145
	Macro avg	0.97	0.97	0.97	233
	Weighted avg	0.97	0.97	0.97	233
Test set	0 (high risk)	0.93	0.95	0.94	40	0.96
	1 (low risk)	0.97	0.96	0.97	77
	Macro avg	0.95	0.96	0.95	117
	Weighted avg	0.96	0.96	0.96	117

avg, average; SVM, Support Vector Machine.
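
The RBF kernel that gives the SVM its nonlinear decision boundary measures similarity between two patients as K(x, z) = exp(−gamma · ‖x − z‖²). The sketch below evaluates it for hypothetical standardized feature vectors. The gamma values are illustrative only, since the tuned hyperparameters are not reported in this section.

```python
from math import exp

def rbf_kernel(x, z, gamma):
    """RBF kernel: exp(-gamma * squared Euclidean distance between x and z)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return exp(-gamma * sq_dist)

# Hypothetical standardized features (e.g., BS, systolic BP) for two patients.
patient_a = [0.5, -1.0]
patient_b = [1.5, 0.0]

similar_wide = rbf_kernel(patient_a, patient_b, gamma=0.1)    # smoother boundary
similar_narrow = rbf_kernel(patient_a, patient_b, gamma=2.0)  # more local boundary
print(similar_wide, similar_narrow)
```

A larger gamma makes similarity decay faster with distance, so each training case influences a smaller neighborhood. This is why gamma is tuned together with C to trade boundary flexibility against overfitting.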

Random forest model

Of all the algorithms examined, the Random Forest model showed the best accuracy and stability. The model performed consistently across various patient demographics, achieving an accuracy of 99% on the validation set and 100% on the test set. The performance of Random Forest is presented in Table 7.

Table 7

Performance metrics of Random Forest Model based on Validation & Test set

Dataset	Risk level	Precision	Recall	F1-score	Support	Accuracy
Validation set	0 (high risk)	0.99	0.99	0.99	88	0.99
	1 (low risk)	0.99	0.99	0.99	145
	Macro avg	0.99	0.99	0.99	233
	Weighted avg	0.99	0.99	0.99	233
Test set	0 (high risk)	1.00	1.00	1.00	40	1.00
	1 (low risk)	1.00	1.00	1.00	77
	Macro avg	1.00	1.00	1.00	117
	Weighted avg	1.00	1.00	1.00	117

avg, average.

Robust validation and uncertainty quantification

The uncertainty quantification module found three instances (1.27%) during the test phase where the model's prediction confidence was low (between 0.4 and 0.6), as shown in Figure 10. These cases illustrate how the system can serve as a decision-support tool rather than a fully autonomous substitute, because patients with complicated clinical characteristics need human expert verification.
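
The flagging rule described above can be sketched as a simple filter on the predicted high-risk probability. The 0.4 to 0.6 band comes from the text; treating the bounds as inclusive, and the probabilities themselves, are assumptions made for illustration.

```python
def flag_uncertain(prob_high, low=0.4, high=0.6):
    """Indices of cases whose high-risk probability lies in [low, high]."""
    return [i for i, p in enumerate(prob_high) if low <= p <= high]

# Hypothetical predicted high-risk probabilities for five patients.
prob_high = [0.95, 0.45, 0.05, 0.60, 0.39]
needs_review = flag_uncertain(prob_high)
share = len(needs_review) / len(prob_high)
print(needs_review, share)
```

Cases returned by the filter would be routed to a clinician for review instead of receiving an automated label.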

Comparative analysis of different ML models

The relative effectiveness of each ML model was assessed using common classification criteria: accuracy, precision, recall, F1-score, and AUC. To ensure an objective assessment of each model's generalizability to new maternal risk cases, these measures were evaluated on the unseen test set. The Random Forest classifier had the best classification performance of all the models, with 100% accuracy on the test data (Table 7). The confusion matrix in Figure 13 confirmed zero misclassifications on the test data, demonstrating strong generalization and robustness. Following closely behind, the ensemble voting classifier achieved a high ROC-AUC score with few misclassifications. The ROC curves in Figure 11 and the ROC curve subplots in Figure 12 showed that both models had good discriminative ability: their curves lay near the top-left corner, indicating near-optimal sensitivity and specificity. The Logistic Regression and DT models, with somewhat lower accuracy and recall scores, produced dependable but more modest performance. The DT was easier to interpret but more prone to overfitting even with pruning, whereas Logistic Regression balanced sensitivity and specificity. Compared with the Random Forest and ensemble models, the SVM showed marginally lower performance on both the validation and test datasets. Analysis of the confusion matrix indicates that the SVM's errors tend to be low-risk instances misclassified as high risk. Such a tendency may be beneficial in critical clinical scenarios, where minimizing the risk of underestimation is of greater importance. The ROC curve of the Logistic Regression model, on the other hand, follows a somewhat lower trajectory, indicating its relative difficulty in handling the intricate, non-linear relationships in the physiological data.

This comparative analysis confirmed that ensemble and tree-based methods are more effective in handling the subtle patterns present in maternal health data. Their superiority in predictive power was not only visible through performance metrics but also through their ability to generalize well across diverse maternal profiles, making them suitable candidates for deployment in clinical risk stratification tools.

We further evaluated the robustness of our findings by analyzing the 10-fold cross-validation results and the ROC curve of the Random Forest model, as illustrated in Figures 14,15. The ROC curve depicts the trade-off between sensitivity (TP rate) and the FP rate (1 − specificity) across varying classification thresholds. As shown in Figure 15, the Random Forest model demonstrates superior performance, with its ROC curve positioned closest to the upper-left corner. This indicates a high TP rate accompanied by a low FP rate, reflecting strong discriminative capability and minimal false alarms.

Figure 14 Model performance evaluation across 10-fold cross-validation.

Figure 15 ROC curve of the Random Forest model, illustrating its diagnostic performance in terms of AUC. AUC, area under the curve; ROC, receiver operating characteristic.
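
The fold construction behind a 10-fold cross-validation can be sketched in pure Python. The version below is stratified, meaning each fold keeps roughly the same share of high- and low-risk cases; whether the folds in Figure 14 were stratified is an assumption here. The labels are a hypothetical 100-case sample with the approximate 38%/62% class split reported for the dataset, and the function name and seed are illustrative.

```python
import random
from collections import defaultdict

def stratified_folds(labels, k=10, seed=0):
    """Assign sample indices to k folds while preserving class proportions."""
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal each class round-robin so every fold gets a similar share.
        for position, index in enumerate(indices):
            folds[position % k].append(index)
    return folds

# Hypothetical sample: 38 high-risk (0) and 62 low-risk (1) cases.
labels = [0] * 38 + [1] * 62
folds = stratified_folds(labels)

for held_out in folds:
    # In each round, one fold is the validation set and the rest form the training set.
    train = [i for fold in folds if fold is not held_out for i in fold]
    assert len(train) + len(held_out) == len(labels)
```

Each case is held out exactly once, so the spread of the ten validation scores gives a rough indication of how sensitive a model is to the particular split.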

Discussion

Key findings

The aim of this study was to develop and evaluate an ML-based framework that uses clinical and demographic information to assess maternal health risks. The findings showed that some classification algorithms have considerable potential for correctly identifying high-risk pregnancies when appropriately tuned and evaluated. Of the five models developed, the Random Forest and ensemble voting classifiers were the most robust, exhibiting the best classification accuracy and generalizability across the test and validation sets.

The performance of Random Forest is explained by its capacity to capture complex nonlinear feature relationships while resisting noise and overfitting. This model's balanced sensitivity and specificity were demonstrated by its consistent F1-score and maximum accuracy on the unseen test data. By combining probabilistic insights from several classifiers, the ensemble technique showed how model diversity may enhance clinical prediction in practical settings. Our model achieved 99% accuracy in validation and 100% accuracy in test, outperforming the 96% accuracy reported by Ahmed et al. [2020] on the same dataset, likely due to our robust preprocessing and stratified validation approach.

Strengths and limitations

A primary limitation is the lack of longitudinal tracking. The current model analyses a static snapshot of vitals and does not account for the temporal evolution of risk factors throughout the pregnancy. It is also important to recognize that the dataset came from a single center, even though it was adequately rich in clinical characteristics. This restricts the models' capacity to be applied directly to larger populations or health systems in the absence of further external validation. Furthermore, even though ML improves accuracy, it has to be carefully incorporated into clinical processes, with a focus on interpretability, physician participation, and ethical issues related to algorithmic decisions. The dataset is cross-sectional and lacks specific gestational age timestamps, limiting the ability to analyze risk progression across specific trimesters.

Comparison with similar research

The results of this study are in line with past investigations that have predicted maternal health concerns using ML. Previous research has consistently demonstrated that, when dealing with complicated patterns and a variety of clinical data, tree-based algorithms, Random Forest in particular, frequently outperform simpler linear models in terms of accuracy. For example, research on conditions such as gestational diabetes and hypertensive disorders of pregnancy has shown that Random Forest often performs better than Logistic Regression; the current results support this finding. The present results further demonstrate how voting-based techniques may enhance the stability and generalizability of predictions, and ensemble methods have likewise been recognized in previous studies for their capacity to integrate the capabilities of several classifiers. Simpler models, such as Logistic Regression and DT, nevertheless remain useful, particularly because they clearly explain the prediction process. Their interpretability makes them appealing for clinical use, even if their accuracy is not always as high as that of more sophisticated models. This balance is demonstrated in the current work, where the greater predictive power of more complex models is complemented by the transparency of coefficients and decision rules. Crucially, factors including hemoglobin, blood pressure, and glucose levels consistently emerged as important predictors. This conclusion supports the clinical importance of these characteristics and is in line with current maternal health research. Taken together, these comparisons imply that although sophisticated techniques can improve prediction accuracy, clinician trust and interpretability are still essential for practical use in maternity care.

Explanations of findings

Despite being simpler, the DT and Logistic Regression models offered important clarity. In line with the requirement for explainability in medical artificial intelligence (AI) applications, the two models provided interpretable decision pathways and coefficients, respectively. Their results supported the clinical validity of input variables such as temperature, heart rate, glucose level, and diastolic blood pressure, most of which made statistically significant contributions to risk prediction (heart rate did not reach significance; Table 4). Logistic Regression analysis combined with recursive feature elimination highlighted the significance of physiological parameters. The importance of these variables in maternal risk stratification is shown by the persistent emergence of strong predictors, including BS levels and hemoglobin concentration. These findings were more reliable due to the dataset's organized nature and thorough management of outliers and missing values, which decreased the possibility of bias and confounding.

Implications and actions needed

The models' discriminative performance was validated by evaluation criteria such as confusion matrices and ROC-AUC. The high recall shown by the best-performing models is a crucial benefit, especially in clinical settings where FNs can result in unfavorable outcomes. The ensemble model's use of soft voting also achieved a balance between sensitivity and specificity, which is crucial for creating risk alarms or decision-support systems for obstetric care. The study's findings support the promise of ML methods, especially ensemble and tree-based models, as effective instruments for predicting risks to maternal health. When used in clinical practice, they can supplement established risk assessment procedures and enhance proactive maternal care initiatives because of their robust validation and transparent approach.

Clinical implications of error and uncertainty

Despite the model’s near-perfect classification on the test split, clinical deployment depends on the use of an uncertainty threshold. By automatically flagging the 1.27% of cases identified as uncertain for human review, the approach reduces the possibility of FNs (missed high-risk pregnancies) in marginal cases. This “human-in-the-loop” strategy directly addresses the safety concerns associated with automated diagnostic systems.
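
One simple way to implement such a flagging rule is sketched below, under the assumption that a case counts as uncertain when its highest predicted class probability falls below a cutoff. The 0.70 cutoff, the function name, and the example probabilities are hypothetical; the study's own uncertainty criterion may be defined differently.

```python
def flag_uncertain(probabilities, threshold=0.70):
    """Return indices of cases whose top class probability is below threshold.

    probabilities: list of per-case probability rows, one value per class.
    threshold:     assumed confidence cutoff (illustrative, not from the study).
    """
    return [
        idx for idx, probs in enumerate(probabilities)
        if max(probs) < threshold
    ]


# Made-up predicted probabilities [p_low_risk, p_high_risk] for four cases.
predicted = [
    [0.95, 0.05],  # confident low risk
    [0.55, 0.45],  # marginal, should be referred for review
    [0.10, 0.90],  # confident high risk
    [0.48, 0.52],  # marginal, should be referred for review
]

flagged = flag_uncertain(predicted)
flag_rate = len(flagged) / len(predicted)
print(flagged)    # [1, 3]
print(flag_rate)  # 0.5
```

Raising the cutoff sends more cases to clinicians and lowers the chance of an unreviewed FN, at the cost of a heavier review workload, so the value would need to be tuned on validation data.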

Conclusions

Using standard clinical and demographic data, this study showed how well ML models could categorize maternal health risk levels. Random Forest and the Ensemble Voting Classifier outperformed the other models in terms of predictive ability, obtaining high F1-scores, accuracy, and recall on both the test and validation datasets. These results demonstrate how incorporating AI-powered decision-support tools into maternal healthcare settings might help with early risk assessment and intervention planning. The clinical reliability of the results was enhanced by the application of systematic preprocessing procedures, careful feature selection, and thorough model validation. By placing a high priority on model interpretability and repeatability, the work provides a viable method for converting ML research into useful clinical applications. Although external validation is still required to establish wider generalizability, the proposed architecture offers a strong basis for the future development of intelligent maternal care systems.

Overall, the proposed approach presents a scalable and interpretable solution to augment traditional maternal risk assessment, potentially reducing preventable complications and improving maternal health outcomes in diverse care settings.

Acknowledgments

The authors sincerely acknowledge American International University-Bangladesh (AIUB) for providing the necessary resources and a supportive academic and research environment that made this study possible. It is a privilege to contribute to the advancement of data-driven maternal healthcare solutions under the auspices of AIUB. This work reflects the university’s strong commitment to meaningful innovation and the promotion of societal well-being.

Footnote

Reporting Checklist: The authors have completed the TRIPOD+AI reporting checklist. Available at https://jmai.amegroups.com/article/view/10.21037/jmai-2025-199/rc

Peer Review File: Available at https://jmai.amegroups.com/article/view/10.21037/jmai-2025-199/prf

Funding: None.

Conflicts of Interest: All authors have completed the ICMJE uniform disclosure form (available at https://jmai.amegroups.com/article/view/10.21037/jmai-2025-199/coif). The authors have no conflicts of interest to declare.

Ethical Statement: The authors are accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved. The study was conducted in accordance with the Declaration of Helsinki and its subsequent amendments. The dataset used in this study comprises comprehensive clinical, physiological, and historical health information from maternal patients, collected to assess potential health risks during pregnancy. It is publicly accessible on Mendeley Data, V1: Mojumdar et al. [2024], “Maternal Health Risk Assessment Dataset”, doi: 10.17632/p5w98dvbbk.1. Thus, institutional ethical approval and informed consent were waived.

Open Access Statement: This is an Open Access article distributed in accordance with the Creative Commons Attribution-NonCommercial-NoDerivs 4.0 International License (CC BY-NC-ND 4.0), which permits the non-commercial replication and distribution of the article with the strict proviso that no changes or edits are made and the original work is properly cited (including links to both the formal publication through the relevant DOI and the license). See: https://creativecommons.org/licenses/by-nc-nd/4.0/.

References

Noviandy TR, Nainggolan SI, Raihan R, et al. Maternal Health Risk Detection Using Light Gradient Boosting Machine Approach. Infolitika Journal of Data Science 2023;1:48-55.
Sajedinejad S, Majdzadeh R, Vedadhir A, et al. Maternal mortality: a cross-sectional study in global health. Global Health 2015;11:4.
Mberu BU, Haregu TN, Kyobutungi C, et al. Health and health-related indicators in slum, rural, and urban communities: a comparative analysis. Glob Health Action 2016;9:33163.
Aoyama K, D’souza R, Pinto R, et al. Risk prediction models for maternal mortality: A systematic review and meta-analysis. PLoS One 2018;13:e0208563.
Siddika A, Sultana M. Maternal Health Risk Analysis by using Exploratory Data Analysis and Machine Learning Algorithms. International Journal of Engineering Research in Computer Science and Engineering 2023;10:146-51.
Mennickent D, Rodríguez A, Opazo MC, et al. Machine learning applied in maternal and fetal health: a narrative review focused on pregnancy diseases and complications. Front Endocrinol (Lausanne) 2023;14:1130139.
Fredriksson A, Fulcher IR, Russell AL, et al. Machine learning for maternal health: Predicting delivery location in a community health worker program in Zanzibar. Front Digit Health 2022;4:855236.
Mutlu HB, Yücel N, Durmaz F, et al. Prediction of Maternal Health Risk with Traditional Machine Learning Methods. NATURENGS 2023;4:16-23.
Krithi P, Vemuri J. Prediction of maternal health risk factors using machine learning algorithms. Procedia Comput Sci 2025;258:2713-22.
Setiawan KE, Kurniawan A, Prasetyo SY. Comparative Analysis of Machine Learning Decision Tree-Based Models for Predicting Maternal Health Risks. Procedia Comput Sci 2024;245:57-64.
Raihen MN, Akter S. Comparative Assessment of Several Effective Machine Learning Classification Methods for Maternal Health Risk. Comput J Math Stat Sci 2024;3:161-76.
Mondal S, Nag A, Barman AK, et al. Machine learning-based maternal health risk prediction model for IoMT framework. Int J Exp Res Rev 2023;32:145-59.
Dunn J, Coravos A, Fanarjian M, et al. Remote digital health technologies for improving the care of people with respiratory disorders. Lancet Digit Health 2024;6:e291-8.
Tzimourta KD, Tsipouras MG, Angelidis P, et al. Maternal Health Risk Detection: Advancing Midwifery with Artificial Intelligence. Healthcare (Basel) 2025;13:833.
Raza A, Siddiqui HUR, Munir K, et al. Ensemble learning-based feature engineering to analyze maternal health during pregnancy and health risk prediction. PLoS One 2022;17:e0276525.
Al Mashrafi SS, Tafakori L, Abdollahian M. Predicting maternal risk level using machine learning models. BMC Pregnancy Childbirth 2024;24:820.
Togunwa TO, Babatunde AO, Abdullah KU. Deep hybrid model for maternal health risk classification in pregnancy: synergy of ANN and Random Forest. Front Artif Intell 2023;6:1213436.
Jamel L, Umer M, Saidani O, et al. Improving prediction of maternal health risks using PCA features and TreeNet model. PeerJ Comput Sci 2024;10:e1982.
Montgomery-Csobán T, Kavanagh K, Murray P, et al. Machine learning-enabled maternal risk assessment for women with pre-eclampsia (the PIERS-ML model): a modelling study. Lancet Digit Health 2024;6:e238-50.
Hassija V, Chamola V, Mahapatra A, et al. Interpreting black-box models: a review on explainable artificial intelligence. Cognit Comput 2024;16:45-74.
Guidotti R, Monreale A, Ruggieri S, et al. A survey of methods for explaining black box models. ACM Comput Surv 2018;51:1-42.
Chunara R, Gjonaj J, Immaculate E, et al. Social determinants of health: the need for data science methods and capacity. Lancet Digit Health 2024;6:e235-7.
Bogale DS, Abuhay TM, Dejene BE. Predicting perinatal mortality based on maternal health status and health insurance service using homogeneous ensemble machine learning methods. BMC Med Inform Decis Mak 2022;22:341.
Mojumdar MU, Sarker D, Assaduzzaman M, et al. Maternal health risk factors dataset: Clinical parameters and insights from rural Bangladesh. Data Brief 2025;59:111363.

doi: 10.21037/jmai-2025-199
Cite this article as: Saha P, Rakib M, Ubaid FA, Kousik MI, Tasnim SA, Mahmud R, Ahmed T. A data driven analysis of maternal health risk indicators using machine learning techniques. J Med Artif Intell 2026;9:50.
