Developing and validating an artificial intelligence-based application for predicting some pregnancy outcomes: a multi-phase study protocol

Reproductive Health volume 22, Article number: 99 (2025)

Background

Pregnancy complications such as preterm birth, low birth weight, gestational diabetes mellitus, preeclampsia, and intrauterine growth restriction significantly affect both maternal and neonatal health outcomes. Early identification of high-risk pregnancies is essential for timely interventions; however, traditional predictive models often lack accuracy. This study aims to develop and validate an AI-based application to improve risk assessment and clinical decision-making regarding pregnancy outcomes through a multi-phase approach.

Methods

This study comprises three phases. In Phase 1, retrospective case-control data will be collected from medical records, including Mother and Infant System (IMaN), Hospital Information System (HIS), and archived records of women who gave birth at Al-Zahra and Taleghani Educational and Medical Centers in Tabriz between 2022 and 2024. In Phase 2, an artificial intelligence model will be developed using machine learning algorithms such as Random Forest, XGBoost, Support Vector Machines (SVM), and neural networks, followed by model training, validation, and integration into a user-friendly application. Phase 3 will focus on a prospective cohort study of pregnant women attending clinics after 22 weeks of gestation, evaluating the AI model’s predictive performance through metrics like AUROC (area under the receiver operating characteristic curve), sensitivity, specificity, and predictive values, along with real-time data collection. Content validity will be determined through expert reviews.

Discussion

This study protocol presents a multi-phase approach to developing and validating an AI-based application for predicting pregnancy outcomes. By integrating retrospective data analysis, machine learning, and prospective validation, the study aims to improve early risk detection and maternal care. If successful, this application could support personalized obstetric decision-making.

This study aims to develop and validate an artificial intelligence (AI)-based tool to predict pregnancy complications, including preterm birth, low birth weight, gestational diabetes, intrauterine growth restriction, and preeclampsia. The research will be conducted in three phases. First, past medical records from two hospitals will be analysed to identify key risk factors. Next, a machine learning model will be developed and integrated into a user-friendly application. Finally, the tool will be tested on a group of pregnant women to assess its accuracy in predicting adverse pregnancy outcomes.

By leveraging AI, this study seeks to enhance early risk detection, enabling healthcare providers to implement timely preventive measures and improve maternal and neonatal health outcomes. If successful, this AI-based application could serve as a valuable resource in maternity care, assisting midwives and doctors in delivering personalized care and reducing complications. The findings could also advance the use of AI technology in obstetric practice, improving decision-making and optimizing healthcare resources.

Pregnancy represents a critical phase in a woman’s life, significantly impacting maternal and neonatal health outcomes [1]. This period is accompanied by extensive physiological and psychological transformations, leading to various complications such as low birth weight (LBW), preeclampsia, gestational diabetes mellitus (GDM), preterm birth, and intrauterine growth restriction (IUGR), among others [2]. These complications not only threaten immediate maternal and neonatal health but also predispose individuals to long-term health issues, escalate healthcare expenditures, and in extreme cases, result in maternal and neonatal mortality. Prompt identification and intervention are essential for mitigating these risks and enhancing pregnancy outcomes [3].

Preeclampsia, characterized by hypertension and multi-organ dysfunction, occurs in approximately 2–8% of pregnancies globally and stands as a significant contributor to maternal and neonatal morbidity and mortality [4]. Delayed diagnosis can precipitate severe outcomes including organ failure, preterm birth, and even maternal or neonatal death [5]. Gestational diabetes, affecting nearly 10% of pregnancies, poses substantial risks to both mothers and offspring, including long-term metabolic and cardiovascular sequelae [6, 7].

Preterm birth, defined as delivery occurring before 37 weeks of gestation, is a notable contributor to neonatal morbidity and mortality, with a global prevalence documented at 9.9% in 2020 [8]. It is associated with persistent developmental and health challenges for affected children [9]. LBW (< 2500 g), which occurs in 4–16% of births worldwide, serves as a critical indicator of neonatal health, correlating with increased risks of mortality, developmental delays, and chronic diseases later in life [10, 11]. IUGR, observed in approximately 10% of pregnancies, indicates inadequate fetal growth, often associated with placental dysfunction, maternal malnutrition, or other pathological conditions. IUGR significantly heightens the risk of neonatal mortality, developmental deficiencies, and long-term health complications [12, 13].

Conventional predictive models for pregnancy outcomes typically rely on a limited set of variables, inadequately addressing the multifactorial nature of these complications and resulting in suboptimal risk assessment and management [14]. The advent of machine learning (ML) and artificial intelligence (AI) presents a transformative avenue for analyzing complex datasets and identifying intricate patterns that traditional models may overlook [15]. ML algorithms can assimilate various data sources—clinical, genetic, environmental, and lifestyle—to yield precise and comprehensive risk predictions [16, 17].

As a subset of AI, ML encompasses algorithms that derive insights from data to make predictions or inform decisions autonomously [18]. In the healthcare domain, ML has demonstrated significant potential in disease diagnostics, outcome predictions, and personalized treatment strategies [19, 20]. Within obstetrics, ML has the potential to enhance the identification of high-risk pregnancies, and facilitate timely interventions and tailored care [21, 22]. For example, ML models can predict occurrences of different complications by critically analyzing maternal health datasets, thus refining clinical decision-making processes [23, 24].

However, integrating machine learning into pregnancy care faces several challenges, including data quality, accessibility, and privacy concerns [25, 26]. Despite these hurdles, numerous studies worldwide have illustrated the effectiveness of ML in predicting pregnancy outcomes and optimizing delivery methods [27, 28]. A pioneering study in Iran by Ranjbar et al. (2023) investigated ML applications in forecasting outcomes such as LBW, postpartum hemorrhage, IUGR, and perinatal asphyxia, though it faced limitations regarding sample size and regional applicability [29,30,31,32].

This study aims to overcome prior research limitations by developing and validating an AI-driven tool for predicting pregnancy outcomes, including LBW, preeclampsia, GDM, preterm birth, and IUGR. We will analyse a large, diverse dataset that incorporates a broad spectrum of variables to formulate a robust and generalizable predictive model. The study will also evaluate the model’s accuracy and sensitivity during a validation phase, with the ultimate aim of improving maternal and neonatal health outcomes through early risk identification and personalized healthcare interventions. The objectives and hypotheses are set out below.

General objective

To design and validate an artificial intelligence-based application for predicting selected pregnancy outcomes.

To collect data on women who have given birth using existing electronic medical records.

Primary objectives

Hypothesis 1 (H1): The AI model will predict preterm birth with high specificity using the combined data to be collected.

H2: The AI model will predict LBW with high specificity using the combined data to be collected.

Secondary objectives

H3: The AI model will predict GDM with high specificity using the combined data to be collected.

H4: The AI model will predict IUGR with high specificity using the combined data to be collected.

H5: The AI model will predict preeclampsia with high specificity using the combined data to be collected.

This study will be conducted in three phases: (1) Retrospective data collection and analysis from medical records, (2) Development of an AI-based program using machine learning models, and (3) A prospective cohort study to validate the program’s accuracy in predicting pregnancy outcomes. Data will be analysed using statistical and machine learning methods, with results evaluated for sensitivity, specificity, and overall predictive performance (Fig. 1).

Fig. 1 Study design flowchart

In our study design, we plan to develop separate outcome-specific predictive models for each adverse pregnancy outcome (preterm birth, low birth weight, gestational diabetes mellitus, intrauterine growth restriction, and preeclampsia). Initially, distinct models will be developed and optimized independently for each outcome to ensure maximum predictive accuracy and clinical relevance, given the unique risk factors associated with each condition. However, in the later stages of model comparison, we will also explore ensemble modelling techniques (such as stacking and boosting) to assess whether a composite, multivariate model combining predictions can enhance overall predictive performance. Thus, the primary focus is on outcome-specific modelling, but the feasibility of a multivariate approach will be evaluated as a secondary exploratory analysis. We believe this strategy offers both precise predictions for clinical use and the opportunity to leverage the interrelationships among outcomes.

Phase 1: data collection and analysis of historical data

Historical data will be collected from the following sources:

Women who experienced pregnancy outcomes including low birth weight, preterm birth, preeclampsia, gestational diabetes, and intrauterine growth restriction will be selected as the case group. Simultaneously, women who had successful pregnancies without complications during the same period will serve as the control group. The sample size will be adjusted based on the availability of cases and the prevalence of outcomes in the hospital population.

Inclusion criterion: women who gave birth between 2022 and 2024.

Exclusion criterion: women who gave birth to infants with known genetic defects.

Demographic Information: Age, height, weight, ethnicity, education level, medical history, etc.

Obstetric History: Number of pregnancies, miscarriages, previous pregnancy complications, delivery method.

Prenatal Information: Routine test results such as blood pressure, glucose levels, and other prenatal lab results.

Pregnancy Outcomes: Type of delivery, gestational age at birth, birth weight, Apgar score < 7 at 5 min, neonatal intensive care unit (NICU) admission, low birth weight, preeclampsia, gestational diabetes, preterm birth, and intrauterine growth restriction.

The data pre-processing phase is critical for transforming raw data into a format that is clean, consistent, and conducive to training machine learning models. It begins with data cleaning, where incomplete, noisy, or inconsistent entries are identified and rectified to eliminate errors. For instance, unrealistic values, such as a maternal age of 150 years or a weight of 1000 kg, must be corrected to avoid adversely affecting model performance. Addressing missing data is a vital step, often accomplished through techniques like multiple imputation. This statistical method generates several plausible datasets by imputing missing values based on predictive models. Each dataset is then analysed separately, and the results are aggregated to yield more robust and reliable estimates. Following that, normalization is applied to standardize the data, typically scaling it to a range between 0 and 1. This is often executed using Min-Max Normalization, allowing features with divergent scales—such as blood pressure, age, and weight—to contribute equitably to the model. This step is essential for mitigating issues like gradient instability, particularly in neural networks. Lastly, feature engineering is employed to derive new, meaningful features from the existing data. This might include calculating metrics like body mass index (BMI) or investigating interactions between variables, such as maternal age and the number of pregnancies. Such enhancements improve the model’s capacity to discern patterns and enhance predictive accuracy. Overall, this detailed pre-processing workflow ensures that the dataset is accurate, consistent, and optimized for machine learning, establishing a solid foundation for reliable predictions regarding pregnancy outcomes.
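The cleaning, normalization, and feature-engineering steps described above can be illustrated with a minimal Python sketch. The field names and plausibility ranges below are hypothetical placeholders, not values from the protocol, and implausible entries are simply marked as missing here (an assumption) so that a later imputation step could handle them. Multiple imputation itself is not shown.

```python
from typing import Optional

# Hypothetical plausibility ranges; real limits would be agreed with clinicians.
PLAUSIBLE_RANGES = {
    "age_years": (10, 60),
    "weight_kg": (30, 250),
    "height_cm": (120, 210),
}


def clean_value(field: str, value: Optional[float]) -> Optional[float]:
    """Return the value if it is plausible, else None so it is treated as missing."""
    if value is None:
        return None
    low, high = PLAUSIBLE_RANGES[field]
    return value if low <= value <= high else None


def min_max_scale(values: list[float]) -> list[float]:
    """Scale values to the range [0, 1]; a constant column maps to all zeros."""
    low, high = min(values), max(values)
    if high == low:
        return [0.0 for _ in values]
    return [(v - low) / (high - low) for v in values]


def bmi(weight_kg: float, height_cm: float) -> float:
    """Body mass index in kg/m^2, derived from weight and height."""
    height_m = height_cm / 100.0
    return weight_kg / (height_m ** 2)


# Illustrative values only: 150 is the implausible maternal age from the text.
ages = [22, 150, 31, 40]
cleaned_ages = [clean_value("age_years", a) for a in ages]
observed_ages = [a for a in cleaned_ages if a is not None]
scaled_ages = min_max_scale(observed_ages)

print(cleaned_ages)
print(scaled_ages)
print(round(bmi(70, 165), 1))
```

In practice the minimum and maximum used for scaling would be learned on the training data only and then reused on validation data, to avoid leaking information between the two.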

The machine learning model development phase focuses on selecting and optimizing algorithms to predict pregnancy outcomes. The algorithms to be assessed include Random Forest, XGBoost, Support Vector Machines (SVM), and Neural Networks. Random Forest is chosen for its resistance to overfitting and its feature importance estimates. XGBoost handles missing values natively and builds trees sequentially, with each tree correcting the errors of the previous ones. SVM is effective at defining optimal decision boundaries in high-dimensional spaces, while Neural Networks capture complex patterns in large datasets. Transfer learning will also be explored to adapt pre-trained models, reducing the need for extensive datasets. Each model will be fine-tuned with hyperparameter optimization and cross-validation, with the aim of developing accurate and generalizable models that support healthcare providers in managing pregnancy outcomes.

The model training and validation phase is pivotal in achieving high accuracy, stability, and generalizability for machine learning models. We will implement k-fold cross-validation, partitioning the dataset into k subsets to train the model on k-1 folds and validate it on the remaining fold. This iterative process will be repeated k times, yielding robust performance estimates on unseen data. Hyperparameter tuning will be conducted via Grid Search and Bayesian Optimization, allowing us to systematically explore and refine key settings such as learning rates, tree depths, and regularization parameters. This approach is intended to tune our models for both accuracy and computational efficiency. To evaluate model performance, we will use metrics such as accuracy, precision, recall, F1-score, and AUROC (area under the receiver operating characteristic curve). Together, these metrics offer a thorough assessment of the model’s predictive abilities, particularly on the imbalanced datasets commonly encountered in this field. We will also explore whether ensemble methods such as Stacking and Boosting improve predictive performance further. Stacking will use the predictions from various base models (such as Random Forest, XGBoost, and SVM) as inputs for a higher-level meta-model, while Boosting will iteratively refine the model by focusing on correcting the residual errors of prior iterations. This approach is especially beneficial for complex prediction tasks like preterm birth and preeclampsia. This phase is intended to ensure that the developed models are robust, accurate, and generalizable to new, unseen data.
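The k-fold partitioning described above can be sketched in plain Python as follows. The function name `k_fold_indices` and the seed are illustrative choices, not part of the protocol. This sketch shuffles and splits indices only; for imbalanced outcomes, a stratified split that preserves the case proportion in each fold would usually be preferable.

```python
import random


def k_fold_indices(n_samples: int, k: int, seed: int = 0):
    """Yield (train_idx, val_idx) pairs for k-fold cross-validation."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Strided slicing gives folds whose sizes differ by at most one.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        val_idx = folds[i]
        train_idx = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train_idx, val_idx


# Each sample is used for validation exactly once across the k iterations.
splits = list(k_fold_indices(n_samples=10, k=5))
for train_idx, val_idx in splits:
    print(len(train_idx), len(val_idx))
```

A model would be fitted on `train_idx` and scored on `val_idx` in each iteration, and the k scores averaged to estimate performance on unseen data.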

Phase 2: AI program development

In this phase, we will integrate the top-performing machine learning model, identified through superior evaluation metrics, into an AI-powered application aimed at delivering real-time predictions for pregnancy outcomes. The application will utilize a user-centric interface built with frameworks such as “Dash”, ensuring cross-platform compatibility and an intuitive user experience. The interface will focus on clarity, visual appeal, and seamless navigation, featuring interactive elements—buttons, icons, and forms—designed to enhance user engagement. A dynamic dashboard will be implemented to visualize patient data and predictive analytics, leveraging tools like “Plotly” for interactive charts and graphs, which will enable healthcare providers to effectively track and interpret patient progress.

The programming language and specific machine learning libraries are currently under evaluation in collaboration with our artificial intelligence development team. The plan is to use well-established, open-source tools and environments to ensure reproducibility, scalability, and interoperability. Python is the leading candidate due to its comprehensive ecosystem for machine learning and its compatibility with libraries such as Scikit-learn, TensorFlow, and XGBoost. The final selection will depend on the characteristics of the data, model performance, and the ease of integration with the Dash-based user interface.

Key functionalities include real-time data entry capabilities, allowing clinicians to input critical clinical metrics (e.g., blood pressure, glucose levels) and receive immediate predictions concerning potential outcomes such as preterm birth, preeclampsia, or gestational diabetes. The application will also incorporate visual alerts through color-coded indicators (e.g., red for high-risk scenarios and green for stable conditions) to rapidly communicate critical information and facilitate swift decision-making. Additionally, the software will feature a responsive design to ensure optimal functionality across various devices—desktops, tablets, and smartphones—maximizing accessibility for end-users. The AI model will be continuously refined and updated with new data, enhancing its predictive accuracy and adaptability.
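The color-coded alert described above could be driven by a simple mapping from predicted probability to indicator color. The sketch below is illustrative only: the function name `risk_color` and the default threshold of 0.5 are assumptions, since the protocol states that clinically meaningful thresholds will be set with obstetric specialists.

```python
def risk_color(probability: float, high_risk_threshold: float = 0.5) -> str:
    """Map a predicted probability to the red/green indicator used in the interface.

    The threshold is a hypothetical placeholder, not a value from the protocol.
    """
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must lie between 0 and 1")
    return "red" if probability >= high_risk_threshold else "green"


# Example: two hypothetical predicted probabilities for one outcome.
print(risk_color(0.82))
print(risk_color(0.10))
```

Keeping the threshold as a parameter allows a different cut-off per outcome, which matters because the cost of a missed case differs between conditions.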

To verify the reliability and usability of the application, it will undergo comprehensive internal testing focused on functionality, security, and the accuracy of real-time predictions. Following this, a beta release will be distributed to a select group of healthcare providers for feedback, which will inform iterative enhancements and ensure the application meets the specific needs of its users. This phase aims to produce a robust, user-friendly AI application that equips healthcare providers with actionable insights to improve pregnancy outcomes effectively.

Phase 3: prospective cohort study for validation

This phase focuses on validating the AI program through a prospective cohort study involving pregnant women visiting prenatal clinics at hospitals and urban/suburban health centers in Tabriz. The study population includes pregnant women after 22 weeks of gestation, excluding cases with known fetal genetic abnormalities.

To address algorithmic bias, we will ensure the inclusion of diversity within the training dataset by incorporating participants representing a broad spectrum of sociodemographic, clinical, and obstetric variables. Furthermore, we will actively monitor fairness metrics throughout the model evaluation process and report performance across relevant subgroups where appropriate. We are committed to maintaining data privacy and security through the de-identification of all patient records, the encryption of databases, enforced access control policies, and compliance with institutional and national data protection regulations. To mitigate potential risks associated with false positives and false negatives, we will establish clinically significant thresholds for model outputs in collaboration with obstetric specialists, thereby avoiding an overreliance on any singular prediction. The model will be designed as a decision-support tool, rather than a diagnostic authority, thereby preserving the autonomy of clinicians in care decisions. In the prospective phase, the AI application will articulate the certainty of predictions and will recommend confirmatory clinical evaluations when elevated risks are identified. Through these measures, we aim to responsibly balance innovation with patient safety, transparency, and ethical accountability.
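Reporting performance across subgroups, as described above, can be as simple as computing the same metric separately for each group. The sketch below computes sensitivity per subgroup; the subgroup labels and records are made-up illustrative data, and the choice of sensitivity as the monitored metric is an assumption rather than something the protocol specifies.

```python
from collections import defaultdict


def sensitivity_by_subgroup(records):
    """Compute sensitivity per subgroup from (subgroup, y_true, y_pred) tuples.

    Subgroups with no true positive cases are omitted, since sensitivity
    is undefined for them.
    """
    counts = defaultdict(lambda: {"tp": 0, "fn": 0})
    for group, truth, predicted in records:
        if truth == 1:
            counts[group]["tp" if predicted == 1 else "fn"] += 1
    return {g: c["tp"] / (c["tp"] + c["fn"]) for g, c in counts.items()}


# Hypothetical records: (subgroup, actual outcome, predicted outcome).
records = [
    ("urban", 1, 1),
    ("urban", 1, 0),
    ("urban", 0, 0),
    ("suburban", 1, 1),
    ("suburban", 1, 1),
    ("suburban", 0, 1),
]
subgroup_sensitivity = sensitivity_by_subgroup(records)
print(subgroup_sensitivity)
```

A large gap between subgroups would prompt a closer look at how well each group is represented in the training data.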

To determine the sample size, two primary outcomes—low birth weight and preterm birth—are considered. The sample size for each outcome is calculated separately, and the largest value is selected as the overall sample size. The formula for diagnostic studies is used:

$$N=\frac{Z_{\alpha/2}^{2}\,P\left(1-P\right)}{d^{2}}$$

Here are the parameters used: a confidence level of 95% (Zα/₂ = 1.96), a precision (d) of 0.04 for the most prevalent outcome (low birth weight at 20%) [33], and 0.032 for preterm birth (16%) [11, 34], resulting in sample sizes of 384 and 504, respectively. The larger sample size (504) was selected. To ensure sufficient power for model development and to account for potential missing or unusable data, a 10% increase was applied. The final sample size is set at 550 participants.
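The sample size arithmetic above can be checked with a short Python sketch that applies the same formula to the stated parameters. The function name `sample_size` is illustrative, and rounding to the nearest whole participant is an assumption made to match the figures reported in the text.

```python
def sample_size(p: float, d: float, z: float = 1.96) -> float:
    """Sample size N = Z^2 * P * (1 - P) / d^2 for a given prevalence and precision."""
    return (z ** 2) * p * (1 - p) / (d ** 2)


# Parameters as stated in the protocol.
n_lbw = sample_size(p=0.20, d=0.04)       # low birth weight
n_preterm = sample_size(p=0.16, d=0.032)  # preterm birth

# Take the larger requirement, then add 10% for missing or unusable data.
n_required = max(round(n_lbw), round(n_preterm))
n_inflated = n_required * 1.10

print(round(n_lbw), round(n_preterm), n_required, round(n_inflated))
```

The two outcome-specific values come out at about 384 and 504, and a 10% increase on 504 gives roughly 554, close to the 550 participants stated above.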

Data will be collected in real-time via medical records and structured patient interviews throughout the gestational period. Healthcare providers will utilize the AI platform to forecast pregnancy outcomes at various clinical junctures. Outcome assessment will focus on metrics such as sensitivity, specificity, positive and negative predictive values, and overall predictive accuracy, achieved by juxtaposing predicted outcomes with actual results. Regular oversight and documentation of pregnancy outcomes will continue until delivery. Data analysis will initiate with descriptive statistics to summarize demographic and clinical features, followed by comparative analysis of predicted versus actual outcomes utilizing metrics such as AUROC, sensitivity, and specificity. To ensure content validity, a panel of obstetricians and AI experts will evaluate the variables, using tools like the Content Validity Index (CVI) to assess the effectiveness of the program components.
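The comparison of predicted and actual outcomes described above can be sketched in plain Python. The labels, scores, and the 0.5 decision threshold below are made-up illustrative values, not study data. AUROC is computed here by its pairwise definition: the probability that a randomly chosen case scores higher than a randomly chosen non-case, with ties counted as one half.

```python
def diagnostic_metrics(y_true, y_pred):
    """Sensitivity, specificity, PPV and NPV from binary labels and predictions."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    def ratio(num, den):
        # Undefined ratios (empty denominator) are reported as NaN.
        return num / den if den else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
    }


def auroc(y_true, scores):
    """AUROC as the fraction of (case, non-case) pairs ranked correctly."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one case and one non-case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical outcomes (1 = complication occurred) and predicted probabilities.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2, 0.2, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]

m = diagnostic_metrics(y_true, y_pred)
auc = auroc(y_true, scores)
print(m)
print(round(auc, 3))
```

Sensitivity, specificity, and the predictive values depend on the chosen threshold, whereas AUROC summarizes ranking quality across all thresholds, which is why both kinds of metric are reported.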

The principal investigator, a PhD candidate in midwifery, will extract data from the records of women who have given birth in Phase 1 and lead the prospective cohort study in Phase 3. The investigator will also contribute to the AI program’s design in Phase 2 by acquiring coding skills and collaborating on development. Phase 3 aims to validate the program’s predictive accuracy through a well-designed cohort study, ensuring its reliability and effectiveness in real-world clinical environments.

This study adhered to established ethical guidelines and applicable national principles. Approval was granted by the ethics committee of Tabriz University of Medical Sciences (IR.TBZMED.REC.1403.896). Informed written consent will be obtained from all participants in the prospective phase, ensuring the protection of confidentiality and privacy.

Recent advancements in machine learning (ML) and artificial intelligence (AI) show significant promise in predicting adverse pregnancy outcomes such as preterm birth [35], preeclampsia, and gestational diabetes [27]. Studies indicate that ML models, especially XGBoost and Random Forest, outperform traditional statistical approaches like logistic regression in analysing complex datasets. These techniques effectively identify critical risk factors, leading to high prediction accuracy [35, 36].

Recent reviews support the superior efficacy of advanced models in predicting preterm birth and preeclampsia, highlighting their ability to aggregate diverse data inputs [28, 35,36,37]. However, challenges such as data quality, accessibility, and generalizability across populations remain significant hurdles for widespread implementation.

In Iran, ML models have been successfully applied to forecast outcomes like postpartum haemorrhage [30] and low birth weight [31], leveraging localized datasets. Nevertheless, issues like regional data bias emphasize the need for research that addresses population-specific factors.

In summary, while AI and ML offer innovative ways to enhance maternal and neonatal health outcomes, it is crucial to overcome challenges related to data quality and ethical considerations. Future research should focus on integrating diverse datasets and localized studies to improve model applicability across different populations. Addressing these limitations could make AI a transformative tool in advancing global maternal care.

This study integrates diverse data sources like electronic medical records and real-time clinical data, enhancing predictive model accuracy. Advanced machine learning techniques ensure high performance and clinical utility. Utilizing data from the Iranian Maternal and Neonatal Network (IMaN System) tailors the models to the target population, while real-time predictions support timely healthcare decisions. However, limitations include potential data quality issues and challenges in generalization to other populations. Additionally, biases in training data and the limited range of predictors may affect the models’ efficacy. Continuous improvement and ethical considerations are necessary for broader implementation in clinical practice.

This study protocol presents a framework for developing and validating AI-driven predictive models for adverse pregnancy outcomes, utilizing advanced machine learning and diverse local datasets. By integrating real-time data and robust validation techniques, the research aims to provide healthcare practitioners with precise and actionable insights to enhance maternal and neonatal care. However, addressing challenges related to data quality, generalizability, ethical implications, and resource demands is crucial for effective implementation. Ultimately, this research has the potential to transform pregnancy care through early identification of high-risk cases, leading to timely interventions and improved health outcomes for mothers and infants.

Upon request, the corresponding author will provide access to the data.

AI: Artificial intelligence

ML: Machine learning

IMaN: Mother and Infant System

HIS: Hospital Information System

XGBoost: Extreme Gradient Boosting

SVM: Support Vector Machines

AUROC: Area under the receiver operating characteristic curve

LBW: Low birth weight

GDM: Gestational diabetes mellitus

IUGR: Intrauterine growth restriction

NICU: Neonatal intensive care unit

BMI: Body mass index

CVI: Content Validity Index

We acknowledge the Research Unit of Al-Zahra and Taleghani Medical and Educational centers and Tabriz University of Medical Sciences for scientific support. We extend our sincere appreciation to the Vice-Chancellor for Research at Tabriz University of Medical Sciences for their substantial financial backing in support of this study.

The funding for this study is provided by Tabriz University of Medical Sciences. However, the funding source has no involvement in the study’s design, implementation, or decision-making regarding manuscript writing and submission.

    Authors

    1. Fatemeh Abbasalizadeh

    2. Jafar Tanha

    3. Mojgan Mirghafourvand

    FS, AJ, SM-A-C, FA, JT and MM contributed to the protocol design. FS, AJ, JT, and MM were engaged in developing the implementation and analysis framework. FS composed the initial draft of the protocol under the guidance of MM, the Corresponding Author. AJ, SM-A-C, FA and JT undertook a comprehensive review of the manuscript and endorsed the final version for publication.

    Correspondence to Mojgan Mirghafourvand.

    The study followed ethical guidelines, including the Helsinki Declaration, and received approval from the ethics committee of Tabriz University of Medical Sciences (IR.TBZMED.REC.1403.896). Informed written consent will be obtained from all participants in the prospective phase, ensuring the protection of confidentiality and privacy.

    Not applicable.

    The authors declare no competing interests.

    Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

    Shabani, F., Jodeiri, A., Mohammad‑Alizadeh‑Charandabi, S. et al. Developing and validating an artificial intelligence-based application for predicting some pregnancy outcomes: a multi-phase study protocol. Reprod Health 22, 99 (2025). https://doi.org/10.1186/s12978-025-02048-4

    • DOI: https://doi.org/10.1186/s12978-025-02048-4
