Harness machine learning for multiple prognoses prediction in sepsis patients: evidence from the MIMIC-IV database

BMC Medical Informatics and Decision Making volume 25, Article number: 152 (2025)

Sepsis, a severe systemic response to infection, frequently results in adverse outcomes, underscoring the need for prompt and accurate prognostic tools. Machine learning methods such as logistic regression, random forests, and CatBoost have shown potential in early sepsis prediction. This study aimed to develop and validate a machine learning model capable of early prognostic identification of patients with sepsis in intensive care units (ICUs).

Patients meeting the inclusion and exclusion criteria in the MIMIC-IV v2.2 database were divided into a training set and a validation set in a 7:3 ratio. We first used difference analysis to assess the significance of each variable and then screened relevant features with multinomial logistic regression analysis. Logistic regression, random forest, and CatBoost algorithms were used to construct machine learning models to predict rapid recovery, chronic critical illness, and mortality in sepsis. The models were compared in the validation set on precision, accuracy, recall, F1 score, and the area under the receiver-operating-characteristic curve (AUC) to select the optimal model. The best model was visualized and interpreted using the Shapley Additive explanations method.

A total of 13,174 sepsis patients were included. After screening, 26 clinical features were retained to develop three distinct machine learning models. CatBoost performed best among the three models, with a weighted AUC of 0.771. The prognosis with the highest predictive performance was mortality (AUC = 0.804), followed by rapid recovery (AUC = 0.773) and chronic critical illness (AUC = 0.737). Urine output, respiratory rate, and temperature were the three most important features for the overall model prediction.

The machine learning model developed with the CatBoost algorithm shows potential for identifying sepsis prognosis early. It also suggests that early interventions targeting factors such as urine output, respiratory status, and temperature may alter the adverse prognosis of sepsis patients. However, the model still requires external validation.

Sepsis, characterized as a detrimental systemic response to infection, poses a significant risk of life-threatening organ dysfunction [1]. As the condition advances, it may escalate to multiple organ failure and ultimately death, especially if it is not identified and treated promptly. Sepsis is a leading cause of morbidity and mortality in intensive care units (ICUs), with a high incidence of disability among survivors. According to a Global Burden of Disease study, there were approximately 48.9 million cases of sepsis worldwide in 2017, resulting in 11 million deaths, which accounted for 19.7% of global deaths that year [2]. An epidemiological survey in Chinese ICUs indicated that the incidence of sepsis was 20.6% and the 90-day mortality rate was 35.5% [3]. With improvements in life-saving techniques, a portion of patients survive the early acute phase and enter a chronic stage known as Chronic Critical Illness (CCI) [4]. Research from Japan found that among 2,395,016 patients admitted to ICU, 9.0% met the criteria for CCI, with sepsis being the underlying cause in 50.6% of those [5]. Patients with CCI often require long-term intensive care and prolonged ICU stays, leading to substantial consumption of healthcare resources [6, 7]. They frequently encounter challenges such as persistent inflammatory response [8], acquired immunosuppression [9], and hypercatabolism [10], which result in recurrent infections, prolonged hospitalization, and a markedly diminished quality of life.

Conventionally, sepsis prognosis is assessed with clinical scoring systems such as Sequential Organ Failure Assessment (SOFA), quick SOFA (qSOFA), Systemic Inflammatory Response Syndrome (SIRS), and Acute Physiology and Chronic Health Evaluation II (APACHE II) [11,12,13,14]. However, sepsis frequently involves dysfunction of multiple organs, generates a large amount of clinical information, and affects different organs in different patients, so predictions based on traditional assessment methods may be biased. The advent of machine learning (ML) algorithms in recent years has facilitated the prediction of disease events from large and complex clinical information. Provided that AI-associated risks are effectively managed in conformity with the European AI Act, machine learning has the potential to contribute substantially to disease prediction and treatment decision-making [15]. Advanced ML algorithms are adept at analyzing intricate signals in data-rich environments and are therefore promising for sepsis prognostic prediction. Moreover, the integration of ML methods with epidemiology is an emerging trend with potential applications across a broad spectrum of infectious disease research [16].

Currently, sepsis prognosis is commonly predicted by constructing a nomogram based on logistic regression (LR) [17,18,19]. These models are evaluated against classical scores such as SOFA and SAPS II, which often perform worse than the models. However, this approach frequently lacks comparison between models and has inherent limitations. Another approach constructs prognostic models with more flexible ML algorithms, including ensemble learning, such as random forests (RF) [20, 21], support vector machines (SVM) [22, 23], and extreme gradient boosting (XGBoost) [24, 25], and focuses on comparing the performance of multiple models. Such models can deal with more complex data structures but are sometimes less explainable than the former. Additionally, we found that most prognostic models were binary, designed to predict whether sepsis patients die or develop CCI. Since both prognoses have a serious impact on quality of life, we sought to establish a model that predicts these two adverse outcomes concurrently to aid clinical decision-making.

This study was designed to construct a novel ML-based prediction model of multiple prognoses in sepsis using data from the MIMIC-IV v2.2 database, in order to facilitate early clinical intervention. We selected logistic regression (LR), random forest (RF), and CatBoost for training. LR was chosen for its widespread use, RF for its generally excellent performance, and CatBoost for its use of oblivious trees as base learners, which effectively reduces overfitting and offers high precision and robustness. CatBoost is particularly adept at multi-task learning and at handling imbalanced datasets through its built-in balancing strategies.

This was a retrospective study in which three ML algorithms were used to train the models. Subsequent validation identified the best-performing algorithm. The interpretability of the model was enhanced with the Shapley Additive explanations (SHAP) method [26]. The detailed process of the study is shown in Fig. 1.

Ethical concerns were thoroughly considered during the study design. The data used in this study were obtained from MIMIC-IV v2.2, a database developed and maintained by the MIT Laboratory for Computational Physiology. It is the largest publicly accessible, de-identified database in critical care medicine and contains comprehensive information on patients admitted to Beth Israel Deaconess Medical Center between 2008 and 2019. The database includes anonymized clinical information, ensuring that individual patient identities remain confidential [27]. Since the data come from a public database, no additional ethical review was required. Three authors of this study have completed the ethics training for the MIMIC database (one author with certification number: 12780309).

Inclusion criteria: ICU patients diagnosed with sepsis 3.0 for the first time.

Exclusion criteria: i. Patients with multiple ICU admissions, for whom only initial admission data were considered; ii. Patients with an ICU stay shorter than 24 hours; iii. Patients under the age of 18 years; iv. Patients with no SOFA score within 24 hours after admission to the ICU; v. Patients with abnormal data or missing significant clinical information.

The CCI group was defined using the diagnostic criteria of the Research Triangle Institute (RTI) [28]: an ICU stay of at least 8 days together with one of 5 eligible clinical conditions: prolonged acute mechanical ventilation (i.e. mechanical ventilation for at least 96 hours in a single episode); tracheotomy; sepsis and other severe infections; severe wounds; and multiple organ failure, ischemic stroke, intracerebral hemorrhage or traumatic brain injury.
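
As a minimal sketch, the CCI definition above can be written as a simple predicate. This is not the authors' extraction code: the class and field names are hypothetical (they are not MIMIC-IV column names), and the grouping of the fifth condition follows the wording above.

```python
from dataclasses import dataclass


@dataclass
class IcuStay:
    # All field names are illustrative assumptions, not MIMIC-IV columns.
    icu_days: float
    longest_vent_episode_hours: float = 0.0
    tracheotomy: bool = False
    severe_infection: bool = False               # sepsis or other severe infection
    severe_wounds: bool = False
    organ_failure_or_brain_injury: bool = False  # MOF, stroke, ICH, or TBI


def meets_cci_criteria(stay: IcuStay) -> bool:
    """ICU stay of at least 8 days plus at least one eligible condition."""
    eligible_condition = (
        stay.longest_vent_episode_hours >= 96
        or stay.tracheotomy
        or stay.severe_infection
        or stay.severe_wounds
        or stay.organ_failure_or_brain_injury
    )
    return stay.icu_days >= 8 and eligible_condition
```

Because every patient in a sepsis cohort satisfies the infection condition by construction, the ICU-stay threshold is what effectively separates the groups under this reading.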

We used Navicat Premium 16 to write structured query language (SQL) queries to extract data from the MIMIC-IV v2.2 database. A total of 56 common clinical variables were extracted. Demographic information included age, gender, weight, height, ICU type, ICU admission and discharge times, and time of death. The Charlson Comorbidity Index was used to account for comorbidities, considering that the risk of comorbidities is not simply the sum of the risks caused by individual diseases. Vital signs included temperature, respiratory rate, heart rate, systolic blood pressure, diastolic blood pressure, and mean arterial pressure, with mean values used to represent the patient’s average level within 24 hours. For laboratory examinations, each value reflects the patient’s level at the time of testing, so we selected the worst value within 24 hours: the minimum was retained for hemoglobin, platelet count, bicarbonate, blood calcium, base excess, pH, partial pressure of oxygen, partial pressure of carbon dioxide, oxygenation index, lymphocyte count, lymphocyte percentage, and albumin, and the maximum was retained for hematocrit, white blood cell count, C-reactive protein, neutrophil count, anion gap, creatinine, blood urea nitrogen, blood chloride, blood glucose, blood sodium, blood potassium, international normalized ratio, prothrombin time, partial thromboplastin time, lactic acid, red blood cell distribution width, D-dimer, and fibrinogen. ICU monitoring included urine output and central venous pressure monitoring. Related treatments within the first 24 hours of ICU admission included mechanical ventilation, renal replacement therapy, diuretics, milrinone, epinephrine, vasopressin, norepinephrine, phenylephrine, dopamine, and dobutamine.
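
The aggregation rule described above (mean for vital signs, minimum or maximum for laboratory values depending on which direction is worse) can be sketched as a lookup table of aggregators. The short variable names and the function name are illustrative assumptions, and only a handful of the 56 variables are shown.

```python
from statistics import mean

# Aggregation rule per variable; the assignment follows the text above,
# but the short variable names are illustrative.
AGGREGATORS = {
    "heart_rate": mean,   # vital signs: 24-hour mean
    "temperature": mean,
    "platelets": min,     # labs where low is worse: 24-hour minimum
    "albumin": min,
    "creatinine": max,    # labs where high is worse: 24-hour maximum
    "lactate": max,
}


def summarize_first_24h(measurements):
    """Collapse {variable: [values in first 24 h]} to one value per variable.

    Variables with no measurements are returned as None (missing).
    """
    summary = {}
    for name, aggregate in AGGREGATORS.items():
        values = measurements.get(name, [])
        summary[name] = aggregate(values) if values else None
    return summary
```

For example, `summarize_first_24h({"platelets": [150, 90, 120]})` keeps the lowest platelet count and marks every unmeasured variable as missing.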

Variables with more than 25% missing values were removed, and the 1% and 99% quantiles were used to remove outliers in continuous variables. For variables whose outliers were laborious to remove in this way, the outlying values were replaced by the median. Categorical variables in which a category accounted for less than 5% of cases, or which contained ambiguous classifications, were removed. The retained variables were used for further analysis. To avoid data contamination, the dataset was first randomly divided into training and validation sets in a 7:3 ratio. Missing values were then filled separately in the training and validation sets by spline interpolation, using the interpolate function in Python, a piecewise imputation approach that better aligns with the structure of the data. Finally, the data were normalized with MinMaxScaler.
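
Two of the steps above, the 25% missingness filter and min-max normalization, can be sketched in plain Python. The function names are hypothetical, missing values are represented as `None`, and fitting the scaling parameters on the training set only is an assumption made here to match the stated aim of avoiding data contamination; the text does not say how MinMaxScaler was fitted.

```python
def drop_sparse_columns(rows, max_missing=0.25):
    """Keep columns whose fraction of missing (None) values is <= max_missing."""
    columns = rows[0].keys()
    n = len(rows)
    kept = [c for c in columns
            if sum(r[c] is None for r in rows) / n <= max_missing]
    return [{c: r[c] for c in kept} for r in rows]


def fit_min_max(values):
    """Learn the minimum and maximum from training values only."""
    return min(values), max(values)


def apply_min_max(values, lo, hi):
    """Scale values with training-set parameters.

    Training values map to [0, 1]; validation values outside the training
    range fall outside [0, 1].
    """
    span = hi - lo
    return [(v - lo) / span if span else 0.0 for v in values]
```

Applying `fit_min_max` to the training column and `apply_min_max` to both sets keeps validation data from influencing the scaling.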

Difference analysis was applied to check the significance of each variable and multinomial logistic regression analysis was used to screen relevant features.

Data analyses were conducted with Python (version 3.7) and SPSS (version 26.0). Continuous variables following a normal distribution were described as mean (standard deviation), and those deviating from a normal distribution as median (interquartile range, IQR); none of the variables in this study conformed to a normal distribution. Categorical variables were described as frequency (percentage). For between-group comparisons, the Kruskal-Wallis test was used for continuous variables and the chi-square test for categorical variables. Subsequently, multinomial logistic regression analysis was employed to screen features. The area under the receiver-operating-characteristic curve (AUC) was the main measure used to evaluate model performance. The threshold for statistical significance was set at P < 0.05.
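
To illustrate the between-group comparison for continuous variables, the following is a minimal sketch of the Kruskal-Wallis H statistic with tie correction. It is not the SPSS procedure used in the study; the function name is hypothetical, and the p-value relies on the chi-square approximation, which for three groups (2 degrees of freedom) has the closed form exp(-H/2).

```python
import math


def kruskal_wallis(groups):
    """Return (H, p) for the Kruskal-Wallis test with tie correction.

    The p-value uses the chi-square approximation and is computed in closed
    form only for three groups (df = 2), where P(X > H) = exp(-H / 2).
    """
    if len(groups) != 3:
        raise ValueError("this sketch handles exactly three groups")
    pooled = sorted(v for g in groups for v in g)
    n_total = len(pooled)

    # Average rank for each distinct value, and the tie-correction term.
    rank_of = {}
    tie_term = 0
    i = 0
    while i < n_total:
        j = i
        while j < n_total and pooled[j] == pooled[i]:
            j += 1
        t = j - i                             # size of this tie group
        rank_of[pooled[i]] = (i + 1 + j) / 2  # mean of ranks i+1 .. j
        tie_term += t ** 3 - t
        i = j

    h = 12 / (n_total * (n_total + 1)) * sum(
        sum(rank_of[v] for v in g) ** 2 / len(g) for g in groups
    ) - 3 * (n_total + 1)
    correction = 1 - tie_term / (n_total ** 3 - n_total)
    if correction == 0:
        raise ValueError("all values are identical")
    h /= correction
    return h, math.exp(-h / 2)
```

With three fully separated groups such as `[[1, 2, 3], [4, 5, 6], [7, 8, 9]]`, the rank sums are 6, 15, and 24, giving H = 7.2 and a p-value below the 0.05 threshold.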

In this study, LR, RF, and CatBoost algorithms were used to develop prediction models. Open-source scikit-learn (http://scikitlearn.org/) was used for model construction, tuning, validation, and interpretation of results in Python (version 3.7).

The data were divided into a training set and a validation set in a 7:3 ratio by the stratified method; the models were constructed on the training set and validated on the validation set. Taking 7 days as the time node, discharge within 7 days, death within 7 days, and development into CCI were taken as the three prognoses of sepsis, i.e., rapid recovery, mortality, and CCI. These three prognoses served as the outcome labels for the classification models. To optimize the prediction models, randomized search combined with manual fine-tuning was applied to obtain the final hyperparameters. To address the imbalance in the dataset, we used the model’s built-in weight parameter to balance the distribution (i.e., setting class_weight = “balanced”). When set to “balanced”, the model automatically adjusts the weights of the groups, making the weight of each group inversely proportional to its sample size. This gives higher weights to samples of the minority group during training, thereby balancing the impact of each class. The core of this approach lies in adjusting the loss function so that the optimization process takes all groups into account in a more balanced manner. Precision, accuracy, recall, F1-score, and the area under the ROC curve (AUC) were calculated to evaluate the models. Some studies have shown that interpretability and transparency remain challenges for ML [29]. We therefore introduced the SHAP method to enhance the interpretability of the model, as it can reveal the logic behind model predictions.
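
The inverse-proportional weighting described above can be made concrete with the formula scikit-learn documents for its "balanced" setting, n_samples / (n_classes * count of the class). The sketch below is a plain-Python illustration with a hypothetical function name; whether every model in the study applied exactly this formula is not stated in the text.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight for class c = n_samples / (n_classes * count_c)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {c: n_samples / (n_classes * k) for c, k in counts.items()}
```

With 6 rapid-recovery, 3 CCI, and 1 mortality label, the weights are 10/18, 10/9, and 10/3, so each class contributes the same total weight to the loss.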

A total of 33,177 records of patients admitted to the ICU who met the diagnostic criteria for sepsis 3.0 were obtained from the MIMIC-IV v2.2 database. Removing records of repeated admissions left 25,715 patients. Of these, 23,174 patients had ICU stays of more than 24 hours, and 13,174 patients remained after removing records with abnormal values (Fig. 1).

Following data processing, 37 out of the 56 variables were retained. Table 1 shows 37 variables with significant differences in distribution among the three groups through difference analysis.

Table 1 Difference analysis of variables in three groups of rapid recovery, CCI, and mortality

Collinearity diagnosis of the above variables showed that the variance inflation factors (VIF) of prothrombin time (PT), international normalized ratio (INR), pH, and base excess (BE) were greater than 10. Considering the close correlation between PT and INR and between pH and BE, one variable of each pair was removed, and INR and pH were retained. A repeated collinearity diagnosis showed no obvious collinearity among the remaining variables. These were analyzed using multinomial logistic regression, and 26 of them were selected (Table 2).
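
For reference, the variance inflation factor has the standard definition below, where $R_j^2$ is the coefficient of determination obtained by regressing variable $j$ on all other candidate variables. The threshold of 10 is the one stated above; it corresponds to more than 90% of a variable's variance being explained by the others.

```latex
\mathrm{VIF}_j = \frac{1}{1 - R_j^2},
\qquad
\mathrm{VIF}_j > 10 \iff R_j^2 > 0.9
```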

Table 2 Multinomial logistic regression analysis results

Samples were divided into training and validation sets in a 7:3 ratio by the stratified method (Table 3). Three models were then developed on the training set and assessed on the validation set. The evaluation metrics for the three algorithms are shown in Table 4.
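
A stratified 7:3 split keeps the proportion of each prognosis the same in both sets. The following is a minimal standard-library sketch of the idea, not the authors' code; the function name and the fixed seed are assumptions.

```python
import random
from collections import defaultdict


def stratified_split(labels, train_fraction=0.7, seed=0):
    """Return (train_idx, valid_idx) with each class split at train_fraction."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    rng = random.Random(seed)
    train_idx, valid_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        cut = round(len(indices) * train_fraction)
        train_idx.extend(indices[:cut])
        valid_idx.extend(indices[cut:])
    return sorted(train_idx), sorted(valid_idx)
```

Splitting within each class, rather than over the pooled data, guarantees that a rare outcome such as early mortality is represented in the validation set.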

Table 3 Data set dividing information

Table 4 Performance of the three models in the validation set

The best-performing models were CatBoost and random forest, with weighted AUCs in the validation set of 0.771 and 0.755 respectively, both higher than the 0.747 of logistic regression (Table 4). To further evaluate the two superior models, we generated separate ROC curves for each prognosis (Fig. 2). Both models predicted mortality better than the other two prognoses, in line with the prioritization of clinical decision-making. The difference in predictive performance between RF and CatBoost was small, but CatBoost was slightly better, reaching an AUC of 0.804 for mortality prediction. This result was similar to the predictive performance observed in other studies on mortality in infectious diseases [30]. Moreover, the precision-recall (P–R) curves for random forest and CatBoost in Fig. 3 showed minimal difference, with CatBoost showing a slight advantage, which we attribute to its better handling of imbalanced datasets. To better evaluate the model, we also plotted ROC and P–R curves for commonly used clinical scores and compared them with CatBoost (Additional file 1: Supplementary Figure 2). These clinical scores have some predictive value for adverse outcomes, with APACHE II being particularly prominent. Compared with the scores, CatBoost showed relatively better predictive performance for mortality and also predicted the rapid recovery group well.
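
The text does not spell out how the weighted AUC was computed. A common convention, assumed here, is the support-weighted mean of one-vs-rest AUCs, in which each prognosis contributes in proportion to its number of patients. The sketch below uses hypothetical function names and a simple pairwise AUC that is only practical for small samples.

```python
def binary_auc(scores, is_positive):
    """AUC as the probability that a positive outranks a negative (ties = 0.5)."""
    pos = [s for s, p in zip(scores, is_positive) if p]
    neg = [s for s, p in zip(scores, is_positive) if not p]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def weighted_ovr_auc(y_true, proba, classes):
    """Support-weighted mean of one-vs-rest AUCs.

    proba[i][k] is the predicted probability of classes[k] for sample i.
    Assumes every class occurs at least once and never makes up all of y_true.
    """
    total = len(y_true)
    result = 0.0
    for k, cls in enumerate(classes):
        is_pos = [y == cls for y in y_true]
        auc_k = binary_auc([row[k] for row in proba], is_pos)
        result += auc_k * sum(is_pos) / total
    return result
```

Under this convention the weighted value must lie between the smallest and largest per-class AUC, which is consistent with the reported 0.771 falling between 0.737 and 0.804.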

The best model was CatBoost. Ranking the feature importance derived from CatBoost revealed that three clinically relevant features (urine output, respiratory rate, and temperature) were the most influential predictors of sepsis prognosis (Fig. 4). Furthermore, because nonparametric models often lack transparency, we employed the SHAP method to visualize feature importance. This approach quantifies each feature’s contribution to both the overall and individual predictions of the model. SHAP dependence plots were drawn to examine how the three main features contribute to model prediction; their influence on the model was complex (Fig. 5).
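
The idea behind SHAP is the Shapley value: a feature's contribution is its average marginal effect on the prediction over all orders in which features can be added. The sketch below computes exact Shapley values for a tiny toy function by enumerating permutations. It is an illustration of the principle only, not the SHAP library or the authors' analysis, which rely on far more efficient estimators; the function name and the single-baseline treatment of "absent" features are assumptions.

```python
from itertools import permutations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values of predict at x, relative to a baseline input.

    A feature that is "absent" takes its baseline value. The cost grows as
    n! in the number of features, so this is only for tiny examples.
    """
    n = len(x)
    phi = [0.0] * n
    for order in permutations(range(n)):
        current = list(baseline)
        previous_output = predict(current)
        for j in order:
            current[j] = x[j]  # switch feature j from baseline to actual
            output = predict(current)
            phi[j] += output - previous_output
            previous_output = output
    return [p / factorial(n) for p in phi]
```

For a toy model `2*v[0] + v[1]*v[2]` evaluated at `[1, 2, 3]` against an all-zero baseline, the first feature receives 2 and the interaction term of 6 is shared equally between the other two, so the contributions sum to the difference between the prediction and the baseline output.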

As shown in Fig. 6, the main features for predicting each prognosis were distinct. For CCI, the top three features were mechanical ventilation, MAP, and age; for mortality, they were urine output, BUN, and age. The confusion matrix of the CatBoost model in the validation set is presented in Fig. 6d. Although CatBoost had the highest AUC for mortality prediction, we also found possible overconfidence in the prediction of poor prognoses: in many cases rapid recovery was predicted as CCI or mortality (Fig. 6d). Overconfidence in mortality prediction may lead to excessive clinical attention to these patients, potentially resulting in irrational allocation of medical resources relative to the severity of their conditions. However, it may also heighten the clinical vigilance of healthcare providers, potentially reducing mortality to some extent. We performed model calibration using isotonic regression and present the results in Additional file 1 (Supplementary Table 1 and Figures 3 and 4). Initial calibration of the overall model also showed that the predicted probabilities of mortality were higher than the observed ones, placing the calibration curves below the 45-degree line (Supplementary Figure 3). The metrics for the CatBoost model before and after calibration are detailed in Supplementary Table 1. The overall calibration did not improve the model’s mortality prediction. A subsequent mortality-specific recalibration yielded a slight but non-significant improvement in calibration performance (Supplementary Figure 4). The model’s predictive capacity for the rapid recovery outcome is constrained, likely because of the large population and complex individual differences within this patient cohort, which limit predictive performance across different datasets. Moreover, the emphasis on the most adverse prognosis during model training may indirectly contribute to this result. Furthermore, force plots interpreting the predictions for 50 patients in the CCI and mortality groups of the validation set are shown in Fig. 7. They show the combined contribution of each feature to the prediction.
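
Isotonic regression, the calibration method mentioned above, fits a non-decreasing mapping from predicted probability to observed outcome frequency. A minimal sketch of its core, the pool-adjacent-violators algorithm, is shown below with a hypothetical function name. It assumes the observed outcomes have already been sorted by increasing predicted probability and is not the authors' calibration code.

```python
def isotonic_fit(values):
    """Least-squares non-decreasing fit via pool-adjacent-violators (PAV).

    `values` are observed outcomes (0/1) ordered by increasing predicted
    probability; the result is the calibrated probability for each position.
    """
    blocks = []  # each block is [sum, count]
    for v in values:
        blocks.append([float(v), 1])
        # Merge while the previous block's mean exceeds the last block's mean.
        while (len(blocks) > 1 and
               blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            total, count = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
    fitted = []
    for total, count in blocks:
        fitted.extend([total / count] * count)
    return fitted
```

For outcomes `[0, 1, 0, 0, 1, 1]`, the early death followed by two survivors is pooled into a single block with calibrated probability 1/3. This pulls down a prediction that was too high for that score range, which is the kind of overestimation of mortality described above.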

To date, sepsis remains a serious problem that jeopardizes human life and health. The combination of public databases and ML methods offers opportunities for research on sepsis diagnosis [31], complications and prognosis prediction [32,33,34], and treatment strategies [35, 36]. Undoubtedly, ensuring data security, patient privacy and adherence to AI ethics is of paramount importance whether using public databases or in the clinic. In this study, we endeavored to construct a prognostic prediction model of sepsis. Our findings indicated that the CatBoost model outperformed commonly employed models for sepsis prognosis prediction, achieving an AUC of 0.771 and an F1-score of 0.665.

We found that urine output, respiratory rate, and temperature played key roles in model prediction. Urine output serves as an indicator of adequate perfusion. Sepsis-associated decreases in urine output result from reduced renal perfusion due to systemic inflammation, capillary leak, and compensatory vasoconstriction. Pro-inflammatory cytokines and renal vasoconstriction further impair glomerular filtration and tubular function, while microvascular thrombosis exacerbates renal ischemia. This clinical manifestation indicates disease progression and worsening severity in sepsis. Beeswarm plots illustrate that individuals with higher urine output are more likely to recover, whereas those with lower output face a greater risk of mortality. This is consistent with the finding of Heffernan et al. [37] that low urine output implies a high likelihood of death in sepsis patients. On closer examination of the three pivotal predictors (Fig. 5), urine output exhibited opposite U-shaped associations with CCI and mortality risk, suggesting a complex relationship with poor prognoses. Within a certain range, increased urine output was related to an increased risk of CCI and a decreased risk of mortality. However, scattered data points imply that excessively high urine output can also elevate the risk of mortality. Sepsis is associated with a reduction in circulating blood volume, which leads to the accumulation of acidic metabolites and triggers a compensatory increase in respiratory rate. Consequently, an elevated respiratory rate frequently signals the onset of clinical deterioration. Beeswarm plots similarly indicate that patients with fast respiratory rates are more likely to develop CCI or die (Fig. 6). The plots also reveal a monotonically positive relationship between respiratory rate and the risk of CCI and mortality (Fig. 5). 
The impact of body temperature on CCI and mortality differed: patients with very low body temperature were more likely to die. Hypothermia in sepsis is caused by hypothalamic dysfunction and peripheral vasodilation due to infection. The former disrupts thermoregulation, while the latter increases heat loss. Metabolic depression resulting from impaired cellular metabolism and infectious effects further reduces heat production. Together, these factors lead to a significant drop in temperature. Hypothermia may impair immune function, exacerbate organ dysfunction, and disrupt inflammatory regulation, correlating with disease severity and adverse outcomes. A large multicenter study by Saxena et al. also found that the sepsis patients with the lowest mortality risk were those with elevated body temperatures, such as peak temperatures of 38–39.4 °C, during the first 24 hours after ICU admission [38]. This suggests that a certain degree of initial hyperthermia may have a positive impact on prognosis. As body temperature rose, the risk of mortality initially decreased and subsequently increased, indicating that both excessively low and excessively high body temperatures signal a poor prognosis (Fig. 5). Furthermore, patients with very high body temperature were also inclined to develop CCI. A possible reason is that low temperature reflects a weakened immune system, predisposing patients to death, while high temperature is also detrimental because of ongoing inflammatory response and organ failure.

SHAP values offer an intuitive interpretation of model decisions. Although the variables that SHAP highlights as having significant predictive value do not necessarily have a direct causal relationship with the outcome, SHAP analysis indicates that variables such as urine output and respiratory status contribute substantially to outcome prediction. These variables therefore warrant particular attention in the early stage. The results show that patients with higher urine output, a relatively slow respiratory rate, and stable CVP monitoring are more likely to recover. Among the important predictors, those for CCI relate principally to respiratory and circulatory status. Decreased urine output and elevated blood urea nitrogen are strongly associated with death, underscoring the critical role of renal function in survival. This suggests that clinicians should be especially vigilant with patients who have poor renal function and should closely monitor urine output and renal function. Age also has an obvious impact on adverse prognoses: relatively younger patients are prone to develop CCI, while older patients tend to have a higher risk of death (Fig. 6). These features also reflect the importance of 24-hour ICU monitoring and of monitoring methods such as intake-output and CVP monitoring.

However, this study also has limitations. First, the use of a single database and its regional epidemiology limit the model’s generalizability. Some data in the database are incomplete, which restricted the indicators that could be included and may have resulted in the loss of critical features and suboptimal model performance. In addition, patients with an ICU stay of less than 24 hours may have been transferred, discharged, or died of their severe condition. Because data for these patients within the first 24 hours were incomplete, we excluded them from the study, which may somewhat compromise the model’s predictive performance for this patient group. The distribution of outcomes in the data sets was also imbalanced. Although weights were applied to balance the different prognoses, the model’s predictive performance might still be affected. Moreover, this study was retrospective, and the model was validated internally by data set partitioning but not externally. Finally, in the absence of definitive evidence of the onset of sepsis, single cross-sectional biomarker levels obtained at the earliest time point associated with clinical manifestation are a further limitation, as the timing of initial clinical presentation may affect the dynamics of the biomarkers [39]. The probable reason is that the appearance of clinical manifestations may not coincide with the initiation of sepsis.

This study offers insights into the development of a multi-class prediction model for sepsis. However, before any model is used in clinical decision-making, external validation, prospective validation, and randomized clinical trials are essential to make rational judgments [40]. The lack of external validation somewhat restricts our evaluation of the model’s generalization ability. In the future, we will conduct a multi-center clinical study and collect available retrospective clinical data to obtain sufficient external validation data, so as to thoroughly evaluate the generalizability and clinical performance of the model. If the model is verified in clinical practice, it could readily be deployed as a website or a simple predictive tool for use in the ICU. Additionally, model explainability is crucial for understanding, trusting, and applying the model. In this paper, we employed methods such as the confusion matrix and SHAP plots to enhance the explainability of the model. Despite these efforts, tools for interpreting ML models remain limited. The development of clinically viable predictive tools faces technical challenges that require interdisciplinary collaboration with clinical informaticists. Clinicians’ reliance on their experience may limit trust in predictive tools, emphasizing the need for integration with clinical guidelines. Any tool we develop will be designed to support, not replace, clinical judgment, enhancing decision-making through careful interpretation of outputs. We hope that more flexible and comprehensible explainability tools, or improvements in self-explanatory ML algorithms, will emerge in the future. Using only data from the first ICU admission is undoubtedly a limitation, as the progression of sepsis is a dynamic process. 
We will also investigate the influence of the dynamic trajectory of biomarkers on the prognosis of sepsis, so as to better reflect the dynamic changes in the pathophysiology of sepsis. It is worth noting that ensemble models, by leveraging the strengths of multiple models, may achieve better performance than an individual model. We will further explore the use of ensemble techniques to optimize clinical prediction models in our future work.

This study provides an approach for distinguishing multiple prognoses in sepsis simultaneously and in a timely manner. The CatBoost model offers a valuable reference for clinical evaluation and proactive intervention.

The data generated and analyzed during the current study are available on the MIMIC-IV website at http://mimic.physionet.org/, https://doi.org/10.13026/6mm1-ek67. Raw data extracted in our initial stage is provided in supplementary information files.

AUC: The area under the receiver-operating-characteristic curve
SHAP: The Shapley Additive explanations
ICU: Intensive care unit
CCI: Chronic Critical Illness
SOFA: Sequential Organ Failure Assessment
qSOFA: Quick SOFA
SIRS: Systemic Inflammatory Response Syndrome
APACHE II: Acute Physiology and Chronic Health Evaluation II
ML: Machine learning

Not applicable.

This study was supported by the subject of Jiangsu Province TCM Science and Technology Development Program (No.ZD202204), Natural Science Foundation of Nanjing University of Chinese Medicine (XZR2023006, XZR2023031) and the construction project of national famous Chinese medicine experts, Prof. Zhou Min’s research office (2022/07-2025/12).

Authors

Hai-Dong Zhang
Ying-Hao Pei
Hua Jiang

Z.S, D.H, P.Y, and J.H designed the work. S.Y, S.B cleaned and analyzed the data. G.Y, C.Q summarized and selected the features of patients. Z.H extracted the data of sepsis patients from MIMIC-IV v2.2. Z.S, D.H, P.Y, J.H constructed the model and wrote this paper. All authors reviewed the manuscript.

Correspondence to Ying-Hao Pei or Hua Jiang.

Not applicable.

The authors declare no competing interests.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Zhang, SZ., Ding, HY., Shen, YM. et al. Harness machine learning for multiple prognoses prediction in sepsis patients: evidence from the MIMIC-IV database. BMC Med Inform Decis Mak 25, 152 (2025). https://doi.org/10.1186/s12911-025-02976-y