Risk of mortality assessment in pediatric heart surgery

Özge Köner¹, Deniz Özsoy², İsmail Haberal², Ali Ekrem Köner³, Cenk Eray Yıldız², Gürkan Çetin²

¹Department of Anaesthesiology and Reanimation, Yeditepe University Hospital, İstanbul, Turkey
²Department of Cardiovascular Surgery, İstanbul University Institute of Cardiology, İstanbul, Turkey
³Department of Cardiovascular Surgery, Medical Faculty of Gaziosmanpaşa University, Tokat, Turkey

DOI: 10.5606/tgkdc.dergisi.2013.7658

Abstract

Background: This study aims to evaluate the validity of Pediatric Index of Mortality (PIM) 1, PIM 2, and modified Sequential Organ Failure Assessment (m-SOFA) scores for predicting mortality in pediatric heart surgery.

Methods: Between June 2003 and January 2009, the medical files of 456 pediatric patients who were monitored in a 12-bed postoperative cardiac surgery care unit following heart surgery were retrospectively analyzed, and 373 files were included in the study. Age, gender, diagnosis, length of stay in the intensive care unit and hospital, survival, PIM 1 and PIM 2 scores, m-SOFA scores on admission and at 24 and 48 hours, and peak m-SOFA scores were recorded. Student’s t-test was used to compare normally distributed data, and the Mann-Whitney U test was used for non-normally distributed data. Calibration of the scores was assessed with the Hosmer-Lemeshow goodness-of-fit test, and their discrimination power was analyzed using receiver operating characteristic (ROC) curves.

Results: Fifty patients (13.4%) died perioperatively. The peak m-SOFA scores and the m-SOFA scores on admission were significantly higher in non-survivors (9.8±2 and 9.2±2, respectively) than in survivors (5±2.5 and 4.6±2.5, respectively; p<0.01). Calibration assessed with the Hosmer-Lemeshow goodness-of-fit test yielded χ²(8)=30.4, p=0.0002 for PIM 1 and χ²(9)=13.5, p=0.13 for PIM 2. PIM 2 showed good discrimination (area under the ROC curve [AUROC] 0.82) and good calibration, whereas PIM 1 showed better discrimination (AUROC 0.87) but poor calibration. The peak m-SOFA and m-SOFA admission scores showed good discrimination (AUROC 0.93 and 0.92, respectively).

Conclusion: Our results demonstrate that the peak m-SOFA and m-SOFA admission scores predict mortality after pediatric cardiac surgery better than the PIM 1 and PIM 2 scores.

Introduction

Mortality risk and outcome prediction are of great importance in the intensive care unit (ICU), and mortality indices are tools that aid in predicting patient outcome, especially in pediatric ICUs. The standard mortality prediction model in pediatric ICUs is the Pediatric Risk of Mortality (PRISM) score,[1] which is calculated from the most abnormal values of 14 physiological variables obtained within the first 24 hours of ICU stay.[2] However, the PRISM variables are difficult to collect, and the model has other problems. For example, the score may be less accurate than it appears, and because it uses the worst values recorded during the first 24 hours, it can mask differences between centers: a patient whose condition deteriorates because of suboptimal early ICU care will receive a higher score. Because of these issues, another index, the Pediatric Index of Mortality (PIM), based on eight variables, was developed. The PIM score is calculated from data collected at the time of ICU admission; it is simple and has good predictive power.[3] In 2003, a revised version of the PIM score (PIM 2) was developed,[4] which added variables for admission after cardiopulmonary bypass (CPB) and for low-risk diagnoses.
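Both PIM versions are logistic regression models: each admission variable is multiplied by a coefficient, the products are summed with an intercept into a logit, and the logit is converted into a probability of death. The minimal sketch below shows only that conversion step; the variable names, intercept, and coefficient values are hypothetical placeholders, not the published PIM or PIM 2 weights.

```python
import math

def logistic_risk(intercept: float, coefficients: dict[str, float], values: dict[str, float]) -> float:
    """Return a predicted probability of death from a logistic (logit-scale) model."""
    # Linear predictor: intercept plus each coefficient times its variable value.
    logit = intercept + sum(weight * values[name] for name, weight in coefficients.items())
    # The inverse logit maps the linear predictor onto the 0-1 probability scale.
    return 1.0 / (1.0 + math.exp(-logit))

# HYPOTHETICAL intercept and coefficients for illustration only (NOT the PIM/PIM 2 values).
example_coefficients = {"mechanical_ventilation": 1.0, "admitted_after_bypass": 0.5}
example_patient = {"mechanical_ventilation": 1, "admitted_after_bypass": 1}
risk = logistic_risk(-4.0, example_coefficients, example_patient)
print(f"{risk:.3f}")  # logit = -2.5, so the predicted risk is 0.076
```

In a real application the published coefficients of the chosen model would replace the placeholders; the conversion step itself is the same for any logistic mortality model.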

The modified Sequential Organ Failure Assessment (m-SOFA) is another instrument, which evaluates five organ systems. It was designed to describe organ dysfunction, but it has also been used successfully to predict mortality after pediatric heart surgery.[5]

In this study, we retrospectively collected the data of pediatric patients who underwent heart surgery, calculated their PIM 1, PIM 2, and m-SOFA scores, and evaluated the ability of these scores to predict mortality.

Methods

This study, which was approved by the ethics committee, was conducted in a 12-bed postoperative cardiac surgery ICU that admits 300 heart surgery patients yearly, 100-130 of whom are pediatric patients. For this retrospective study, the data of patients who underwent heart surgery at our facility between June 2003 and January 2009 were collected, and a total of 456 files were evaluated. Only files with complete records and laboratory tests were included, since only these allowed the PIM and SOFA scores to be calculated; 77 files were excluded for this reason. Patients older than 16 years were also excluded. Of the 376 patients who remained after applying the exclusion criteria, three died in the operating room, leaving 373 patients (212 boys, 161 girls), whose data were then assembled and recorded. Standard anesthesia and CPB methods were used in all operations.

Age, gender, diagnosis, ICU and hospital length of stay, mortality, CPB and aortic cross-clamp times, PIM 1 and PIM 2 scores, m-SOFA scores on admission and at 24 and 48 hours, and the peak m-SOFA score during the study period were recorded. The maximum possible m-SOFA score was 20 (Table 1),[5] and the PIM 1 and PIM 2 scores were calculated from the data recorded at ICU admission.

Table 1: Modified sequential organ failure assessment score
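The total m-SOFA score is the sum of the organ-system subscores, and the peak score is the highest total recorded across the measurement time points. The minimal sketch below assumes that each of the five subscores has already been graded from Table 1 on a 0-4 scale (consistent with the stated maximum of 20); the organ-system names are hypothetical labels based on the standard SOFA systems without the neurological component, and the grading thresholds themselves are not reproduced.

```python
# Hypothetical labels for the five organ systems (standard SOFA minus the neurological component).
ORGAN_SYSTEMS = ("respiratory", "cardiovascular", "hepatic", "coagulation", "renal")

def msofa_total(subscores: dict[str, int]) -> int:
    """Sum the five organ-system subscores, each assumed to be graded 0-4 (total 0-20)."""
    total = 0
    for system in ORGAN_SYSTEMS:
        value = subscores[system]
        if not 0 <= value <= 4:
            raise ValueError(f"{system} subscore {value} is outside the assumed 0-4 range")
        total += value
    return total

def peak_msofa(scores_by_time: dict[str, dict[str, int]]) -> int:
    """Peak m-SOFA: the highest total score among all recorded time points."""
    return max(msofa_total(subscores) for subscores in scores_by_time.values())

# Hypothetical patient graded on admission and at 24 and 48 hours (not study data).
patient = {
    "admission": {"respiratory": 2, "cardiovascular": 3, "hepatic": 1, "coagulation": 1, "renal": 2},
    "24h": {"respiratory": 1, "cardiovascular": 2, "hepatic": 1, "coagulation": 1, "renal": 1},
    "48h": {"respiratory": 1, "cardiovascular": 1, "hepatic": 0, "coagulation": 1, "renal": 1},
}
print({time: msofa_total(s) for time, s in patient.items()}, peak_msofa(patient))
```

For this hypothetical patient the totals are 9, 6, and 4, so the peak m-SOFA score equals the admission score.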

Normally distributed variables were expressed as mean ± standard deviation (SD), and all variables were tested for normality using the Kolmogorov-Smirnov test. Student’s t-test was used to compare normally distributed data, and the m-SOFA scores of survivors and non-survivors were compared with the Mann-Whitney U test. The overall performance of the scoring systems was assessed with the standardized mortality ratio (SMR), calculated by dividing the observed number of deaths by the predicted number of deaths in the total group. Calibration of the PIM 1 and PIM 2 scores was assessed with the Hosmer-Lemeshow goodness-of-fit test across deciles of mortality risk, formed by ranking all patients by their predicted risk under each scoring system; a p value <0.05 indicated poor calibration (fit). The discrimination power of the scores was assessed with receiver operating characteristic (ROC) curves, in which sensitivity is plotted against 1 - specificity. The area under the ROC curve (AUROC) served as the overall summary measure of discriminatory performance: an AUROC of 0.5 indicates no discriminative ability (equivalent to random chance), whereas an AUROC of 1.0 indicates perfect discrimination. Statistical analysis was performed with SPSS for Windows version 10.0 (SPSS Inc, Chicago, IL, USA), and p<0.05 was considered significant.
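As an illustration of the Mann-Whitney U test used for the m-SOFA comparisons, the minimal sketch below counts, over all non-survivor/survivor pairs, how often the first group's score exceeds the second's (ties count one half) and converts U to a two-sided p value with the large-sample normal approximation. It omits the tie correction that statistical packages typically apply, so with many tied integer scores its p value will be somewhat conservative; the scores are hypothetical toy values, not study data.

```python
import math

def mann_whitney_u(group_a: list[float], group_b: list[float]) -> tuple[float, float]:
    """Return (U for group_a, two-sided p) using the normal approximation without tie correction."""
    # U counts pairs in which the group_a value exceeds the group_b value; ties count 0.5.
    u = 0.0
    for a in group_a:
        for b in group_b:
            if a > b:
                u += 1.0
            elif a == b:
                u += 0.5
    n1, n2 = len(group_a), len(group_b)
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean_u) / sd_u
    # Two-sided p from the standard normal: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    p = math.erfc(abs(z) / math.sqrt(2))
    return u, p

# Hypothetical toy m-SOFA scores (not study data).
non_survivors = [9, 11, 7, 6]
survivors = [3, 5, 6, 4, 8]
u, p = mann_whitney_u(non_survivors, survivors)
print(u, round(p, 3))
```

Swapping the two groups gives U' = n1·n2 - U and the same two-sided p value.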

Results

The distribution of the patients according to the operative procedures is shown in Table 2, and the demographic and operative data are shown in Table 3. Fifty patients died perioperatively (13.4%), 15 of whom were neonates (30% of all non-survivors). Patient ages ranged from three days to 14 years. The non-survivors were younger than the survivors (Table 3); all of the non-survivors were younger than four years of age.

Table 2: Distribution and mortality rate according to the operative procedures

Table 3: Demographic and operative data

The length of ICU stay and the aortic cross-clamp and CPB times were longer for the non-survivors (Table 3). The m-SOFA admission scores were higher for the non-survivors (9.2±2) than for the survivors (4.6±2.5) (p<0.01), as were the m-SOFA scores on day one and day two (non-survivors 7.8±2 and 8.7±2.5 vs. survivors 4.6±2.5 and 3.8±2.7, respectively; p<0.01 for both). The peak m-SOFA score was also significantly higher for the non-survivors (9.8±2) than for the survivors (5±2.5) (p<0.001) (Figure 1). All non-survivors had peak m-SOFA and m-SOFA admission scores of ≥6 (range 6-14).

Figure 1: Comparison of sequential m-SOFA scores among the survivors and non-survivors. m-SOFA: Modified Sequential Organ Failure Assessment.

At a cut-off value of 6.5, the m-SOFA admission score had a sensitivity of 97% and a specificity of 80%, and the peak m-SOFA score had a similar sensitivity and specificity (96% and 74%, respectively) at the same cut-off value. The m-SOFA score on day one had a sensitivity of 88% and a specificity of 75% at a cut-off value of 5.5, and the day-two score had a sensitivity and a specificity of 86% each at a cut-off value of 6.5. Only two non-survivors had peak m-SOFA and m-SOFA admission scores below the cut-off value of 6.5. These results indicate that a cut-off value of 6.5 for the peak and admission m-SOFA scores provides reliable sensitivity and specificity for predicting mortality.
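Sensitivity and specificity at a given cut-off follow from classifying each patient as a predicted death when the score exceeds the threshold. The article does not state how its cut-offs were chosen; the minimal sketch below assumes one common approach, maximizing Youden's index (J = sensitivity + specificity - 1) over candidate thresholds placed midway between adjacent observed scores, which would also account for half-integer cut-offs such as 5.5 and 6.5 on an integer score. The scores are hypothetical toy values, not study data.

```python
def sens_spec(died: list[float], survived: list[float], cutoff: float) -> tuple[float, float]:
    """Classify score > cutoff as a predicted death; return (sensitivity, specificity)."""
    true_positives = sum(1 for score in died if score > cutoff)
    true_negatives = sum(1 for score in survived if score <= cutoff)
    return true_positives / len(died), true_negatives / len(survived)

def youden_cutoff(died: list[float], survived: list[float]) -> tuple[float, float]:
    """Return (cutoff, J) maximizing Youden's J over midpoints between adjacent distinct scores."""
    values = sorted(set(died) | set(survived))
    if len(values) < 2:
        raise ValueError("need at least two distinct scores to form a cut-off")
    best = None
    for lower, upper in zip(values, values[1:]):
        cutoff = (lower + upper) / 2
        sensitivity, specificity = sens_spec(died, survived, cutoff)
        j = sensitivity + specificity - 1
        if best is None or j > best[1]:  # keep the lowest cut-off among ties
            best = (cutoff, j)
    return best

# Hypothetical toy m-SOFA scores (not study data).
died = [9, 11, 7, 6]
survived = [3, 5, 6, 4, 8]
print(sens_spec(died, survived, 6.5))  # (0.75, 0.8)
print(youden_cutoff(died, survived))
```

On these toy data the Youden-optimal threshold is 5.5 (sensitivity 1.0, specificity 0.6); other selection rules, such as fixing a minimum sensitivity, could yield a different cut-off.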

The overall performance of the PIM 1 and PIM 2 scores was evaluated with the SMR, calculated by dividing the observed number of deaths by the expected number in the whole group (Table 4). The PIM 1 score had a sensitivity of 83% and a specificity of 78% at a cut-off value of 2.85, and the PIM 2 score had a similar sensitivity and specificity (83% and 76%, respectively) at a cut-off value of 2.45. Among the non-survivors, 10 had PIM 1 scores below the cut-off value of 2.85 and seven had PIM 2 scores below the cut-off value of 2.45, compared with only two below the m-SOFA cut-off value of 6.5. Therefore, at a cut-off value of 6.5, the admission and peak m-SOFA scores had greater discriminative power than both PIM scores.

Table 4: Observed and expected mortality expressed as the standardized mortality ratio (n=373)
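The SMR divides the observed number of deaths by the expected number, where the expected number is the sum of each patient's predicted probability of death; an SMR above 1 means that the model underestimates mortality. A minimal sketch, assuming the predicted risks are expressed as probabilities between 0 and 1 (a risk reported as a percentage would first be divided by 100); the values are hypothetical toy data.

```python
def standardized_mortality_ratio(outcomes: list[int], predicted_risks: list[float]) -> float:
    """SMR = observed deaths / expected deaths.

    outcomes: 1 for death, 0 for survival.
    predicted_risks: each patient's predicted probability of death (0-1), e.g. from PIM 1 or PIM 2.
    """
    if len(outcomes) != len(predicted_risks):
        raise ValueError("outcomes and predicted_risks must have the same length")
    observed = sum(outcomes)
    expected = sum(predicted_risks)  # expected deaths = sum of the individual risks
    return observed / expected

# Hypothetical toy data: 2 observed deaths versus 1.0 expected death.
print(standardized_mortality_ratio([1, 0, 1], [0.25, 0.25, 0.5]))  # 2.0
```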

Calibration with the Hosmer-Lemeshow goodness-of-fit test for PIM 1 yielded a chi-square value of 30.4 with 8 degrees of freedom (df) (p=0.0002), with a point estimate of 1.2 (95% confidence interval [CI] 1.12-1.27). For PIM 2, the chi-square value was 13.5 with 9 df (p=0.13), with a point estimate of 1.38 (95% CI 1.2-1.5). Because its Hosmer-Lemeshow test showed no significant lack of fit (p=0.13, i.e., >0.05), PIM 2 was better calibrated than PIM 1 for predicting mortality in pediatric patients undergoing heart surgery. However, both scores underestimated the mortality risk in this study (Table 5).

Table 5: Observed and expected mortality for pediatric index of mortality 1 and pediatric index of mortality 2 scores in pediatric patients undergoing heart surgery
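The Hosmer-Lemeshow statistic compares observed (O) and expected (E) deaths within groups of patients ranked by predicted risk: for each group of size n, (O - E)^2 / (E(1 - E/n)) is computed, the terms are summed, and the total is conventionally compared with a chi-square distribution with g - 2 degrees of freedom for g groups (8 for deciles). The minimal sketch below assumes near-equal-sized groups and skips any group whose expected count is 0 or equals its size. For even df the chi-square tail probability has a closed form, which reproduces the reported p=0.0002 for 30.4 with 8 df; a library routine such as SciPy's `scipy.stats.chi2.sf` handles any df.

```python
import math

def hosmer_lemeshow(outcomes: list[int], predicted_risks: list[float], groups: int = 10) -> tuple[float, int]:
    """Return (H statistic, df) using groups of ranked predicted risk ("deciles of risk")."""
    pairs = sorted(zip(predicted_risks, outcomes))  # rank patients by predicted risk
    n = len(pairs)
    statistic = 0.0
    for g in range(groups):
        group = pairs[g * n // groups:(g + 1) * n // groups]  # near-equal-sized slice
        size = len(group)
        observed = sum(outcome for _, outcome in group)
        expected = sum(risk for risk, _ in group)
        # Skip empty or degenerate groups where the variance term would be zero.
        if size and 0 < expected < size:
            statistic += (observed - expected) ** 2 / (expected * (1 - expected / size))
    return statistic, groups - 2

def chi2_sf_even_df(x: float, df: int) -> float:
    """Upper-tail chi-square probability for even df: exp(-x/2) * sum_{i < df/2} (x/2)^i / i!."""
    if df <= 0 or df % 2:
        raise ValueError("this closed form requires a positive even df")
    term = total = 1.0
    for i in range(1, df // 2):
        term *= (x / 2) / i
        total += term
    return math.exp(-x / 2) * total

# Reported PIM 1 result: statistic 30.4 with 8 df.
print(f"{chi2_sf_even_df(30.4, 8):.4f}")  # 0.0002
```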

The ROC analysis showed that PIM 2 had fair discrimination (AUROC 0.82), while PIM 1 discriminated better (AUROC 0.87) but was poorly calibrated (p=0.0002). The m-SOFA scores had good discrimination (Table 6), with the m-SOFA admission score and the m-SOFA score on day two showing the greatest discriminatory power of all (Table 6). The m-SOFA score is shown in Table 1.

Table 6: Discriminatory performance of the scores assessed by receiver operating characteristic curves
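The AUROC can be read as the probability that a randomly chosen non-survivor has a higher score than a randomly chosen survivor, with ties counted as one half; equivalently, it is the Mann-Whitney U statistic divided by the product of the two group sizes. A minimal sketch of that pairwise calculation, using hypothetical toy scores rather than study data:

```python
def auroc(non_survivor_scores: list[float], survivor_scores: list[float]) -> float:
    """AUROC as the proportion of (non-survivor, survivor) pairs ranked correctly; ties count 0.5."""
    if not non_survivor_scores or not survivor_scores:
        raise ValueError("both groups must be non-empty")
    concordant = 0.0
    for died in non_survivor_scores:
        for survived in survivor_scores:
            if died > survived:
                concordant += 1.0
            elif died == survived:
                concordant += 0.5
    return concordant / (len(non_survivor_scores) * len(survivor_scores))

# Hypothetical toy m-SOFA scores (not study data).
print(auroc([9, 11, 7, 6], [3, 5, 6, 4, 8]))  # 0.875
```

An AUROC of 0.875 here means that in 87.5% of non-survivor/survivor pairs the non-survivor had the higher score; 0.5 corresponds to chance and 1.0 to perfect separation.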

Discussion

In this study, we found that the peak m-SOFA and m-SOFA admission scores performed better than the PIM 1 and PIM 2 scores for predicting mortality after pediatric cardiac surgery. Additionally, a cut-off value of 6.5 for the peak and admission m-SOFA scores provided reliable sensitivity and specificity for predicting mortality.

A preliminary study evaluating the mortality prediction power of PIM compared with PRISM in pediatric cardiac surgery demonstrated that PIM performed better than PRISM in infants and children.[6] The authors also noted that data for PIM are much easier to collect than for PRISM. However, in a recent study, PIM 2 showed poor calibration and predictive ability in pediatric cardiac surgery.[7] The differences between these studies regarding the PIM scores might be attributable to differences among centers and to poor inter-rater reliability.[8] The study by Czaja et al.,[7] which comprised 8,391 pediatric patients who underwent cardiac surgery, reported an AUROC of 0.80 for PIM 2, similar to the AUROC of 0.82 for PIM 2 in our study.

In a study performed on 75 pediatric patients who underwent cardiac surgery, Barlas et al.[9] determined that the PRISM score had a poor mortality prediction power when compared with the modified Acute Physiology and Chronic Health Evaluation (APACHE) III score.

SOFA is an organ failure assessment score, but it is also valuable for predicting ICU mortality in both adult and pediatric ICU patients.[10,11] As shown by Pätilä et al.,[12] it can also serve as an independent predictor of mortality in adult patients undergoing heart surgery; in that study, the peak SOFA scores were measured during the first three days. In pediatric patients undergoing heart surgery, a SOFA score of over 20 points was found to reliably predict death within the first 36 hours.[5] The study by Shime et al.,[5] which had a smaller number of participants (n=142) than our study, had a very high neonatal mortality rate (7 out of 8). Furthermore, the authors did not report the AUROC or the sensitivity and specificity of the SOFA scores. Like us, they did not include the neurological component of the SOFA score. In our study, the AUROC values of the m-SOFA admission score and the m-SOFA score at 48 hours were higher than those of the other m-SOFA scores, and the m-SOFA admission and peak m-SOFA scores showed good discrimination at a cut-off value of 6.5. In a systematic review of SOFA-based models for predicting ICU mortality, Minne et al.[13] concluded that SOFA admission scores were competitive with severity-of-illness scores limited to the first 24 hours of admission, and they advocated combining sequential SOFA scores with traditional models (e.g., APACHE). We agree that studies combining SOFA with other severity scores, such as PIM, could also be valuable for predicting mortality after pediatric cardiac surgery.

We acknowledge that our study had several limitations. A retrospective analysis cannot provide evidence for the predictive power of a mortality scoring system as strong as that of a prospective study. Moreover, although statistically significant results emerged in our limited patient set, multi-center studies with larger numbers of patients are needed to increase statistical power and to test the generalizability and reliability of the m-SOFA score as a mortality predictor.

Conclusion

In our retrospective study, the peak and admission m-SOFA scores predicted mortality in pediatric patients who underwent heart surgery better than the PIM 1 and PIM 2 scores. Larger prospective studies are needed to investigate whether these scores are a better means of predicting surgical outcomes in pediatric heart patients.

Declaration of conflicting interests
The authors declared no conflicts of interest with respect to the authorship and/or publication of this article.

Funding
The authors received no financial support for the research and/or authorship of this article.
