Risk group classification for bleeding after coronary artery bypass graft surgery: a comparison of the logistic regression with decision tree models

Koroner arter baypas greft cerrahisi sonrasında kanama açısından riskli grupların sınıflandırılması: Lojistik regresyon ve karar ağacı modellerinin karşılaştırılması

Reza Safiarian¹, Payam Amini², Elham Khodayari Moez², Fatemeh Mohammadzadeh², Mohammad Tavakoli³, Farid Zayeri⁴

¹Baqiyatallah University of Medical Science, Tehran, Iran
²School of Medical Sciences, Tarbiat Modares University, Tehran, Iran
³Ministry of Health Treatments and Medical Education, Tehran, Iran
⁴Proteomics Research Center, Faculty of Paramedical Sciences, Shahid Beheshti University of Medical Sciences, Tehran, Iran

DOI: 10.5606/tgkdc.dergisi.2013.7680

Abstract

Background: This study aims to identify high-risk patient groups for bleeding after coronary artery bypass graft (CABG) surgery.

Methods: We retrospectively evaluated 205 patients (143 males, 62 females; mean age 59.7±10.1 years; range, 28 to 83 years) undergoing CABG surgery between June 2001 and August 2008 at Jamaran Heart Hospital, Tehran, Iran. Baseline characteristics of the patients and postoperative bleeding status were recorded. For classifying the bleeders and non-bleeders, classic logistic regression and decision tree models were utilized.

Results: Logistic regression analysis showed that sex was significantly related to postoperative bleeding. Decision tree model revealed that age (score= 100), diabetes mellitus (score= 16.38), sex (score= 13.67), capital residency (score= 7.31) and dyslipidemia (score= 5.06) were found to have an impact on bleeding. We also observed that the decision tree model provided a better classification of the patients than logistic regression.

Conclusion: Surgeons should be aware of risk indicators of bleeding such as older age, male sex, absence of diabetes mellitus and presence of dyslipidemia in patients with three bypassed vessels before CABG. We also recommend statisticians to utilize the decision tree model instead of logistic regression analysis in classification of risk groups.

Introduction

An association between bleeding after coronary artery bypass graft (CABG) surgery and postoperative mortality and morbidity has been well documented in many previous studies.[1-4] In addition, a higher risk of an adverse outcome is predicted for patients who undergo reoperation because of bleeding after a CABG operation.[2] This risk increases in proportion to the time that elapses before the reoperation. Furthermore, other complications such as stroke, renal failure, transfusion reactions, infections, and cardiac tamponade are expected among patients for whom the reoperation is delayed.[1,2] These patients usually need prolonged intensive care and hospital stays, which increases the cost of treatment.[4] Much research has been conducted to identify changes during cardiac surgery that could decrease the rate of bleeding among patients undergoing CABG,[2,4] since on-pump CABG and longer cardiopulmonary bypass (CPB) duration affect platelet function. Because of the decreased incidence of clotting abnormalities, off-pump CABG has become more popular.[1,3]

Classification assigns a new object to a specific class from a given set of classes and can be performed by using various methods, such as data mining.[5] For example, patients have been classified in order to predict periventricular leukomalacia by employing monitoring variables such as heart rate, blood pressure, and central venous filling pressure,[6] and to identify those who are prone to coronary heart disease because of risk factors such as smoking, blood pressure, and total cholesterol levels.[7] Among the many classification methods, decision trees (DTs), Bayesian networks (BNs), k-nearest neighbor (k-NN) algorithms, and fuzzy logic are the most commonly applied approaches.[5] Aside from evaluating the predictive power of these methods, knowing the ease with which the results can be interpreted is vital when choosing the appropriate analytical method. With this in mind, the DT classifier is a common choice among data analysts.[8-10] This procedure is preferable when the subjects are described through a predetermined set of attributes, the target variable is discrete, disjunctive results are required, or the data contain errors and missing attribute values.[11] In addition to modern analytical techniques such as the DT, regression analysis is one of the traditional statistical methods which can be applied to identify the most influential variables.[9] Predicting the presence or absence of an attribute by means of some predictors is a widely known application of the logistic regression (LR) model.[12]

Our objective was to identify the high-risk groups for bleeding after on-pump CABG according to the preoperative characteristics of the patients in order to give these patients the extra care they require. Therefore, we compared LR and DT, the two most popular classification methods.

Methods

In this study, we analyzed the data from 205 patients (143 males, 62 females; mean age 59.7±10.1 years; range 28 to 83 years) who underwent CABG surgery between June 2001 and August 2008 at Jamaran Heart Hospital in Tehran, Iran. Since reoperation after CABG surgery is rare (2-6%),[3] we randomly selected 40 patients from the reoperated subjects for the cohort, and the remainder were selected from non-reoperated cases in order to have valid inferences.[13,14]

Reoperation took place in these patients as a result of bleeding after initial departure from the operating room. The decision criteria for the reoperation were as follows: (i) drainage of more than 500 mL during the first hour, more than 400 mL during each of the first two hours, more than 300 mL during each of the first three hours, or a total of more than 1000 mL in the first four hours; (ii) sudden massive bleeding; (iii) obvious signs of cardiac tamponade; (iv) excess bleeding despite correction of coagulopathies; and (v) cardiac arrest in a patient who continued to bleed.[15]

In order to eliminate any bias in the estimation, we applied the following restrictions on the sampling design: (i) patients with a history of abnormal coagulation were excluded; (ii) the CABG operations were performed by the same surgeon; and (iii) patients with three bypassed vessels were eligible for this study.

In this archival study, bleeding after CABG was treated as a binary response variable (0 = patients who did not experience bleeding, 1 = patients who bled postoperatively). Patients’ preoperative characteristics such as gender, age at the time of surgery, dyslipidemia, diabetes mellitus (DM), and hypertension (HT) were considered as predictors (independent variables) along with whether they were a resident of Tehran [capital resident (CR)]. This information was obtained from the patients’ medical files at the hospital. To find the variables that affected bleeding after CABG, we applied both the LR model and the DT to the data. The details are shown in the following sections.

Details of statistical analysis
Logistic regression (LR): Logistic regression is a common statistical tool for predicting a binary outcome. In this model, the logit (the natural logarithm of the odds) of the outcome, rather than the outcome itself, is modeled as a linear function of the independent variables. The model can be written as:

$$\ln\left(\frac{P(Y=1)}{1-P(Y=1)}\right) = b_0 + b_1 X_1 + b_2 X_2 + \dots + b_p X_p$$

In this model, $P(Y=1)$ is the probability of the event (here, postoperative bleeding), $X_1, X_2, \dots, X_p$ are the independent variables, $b_0$ is the intercept, and $b_1, b_2, \dots, b_p$ are the regression parameters which should be estimated from the data.[16,17]
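As a rough illustration of how a fitted logit model is used, the sketch below inverts the logit to obtain a predicted probability and converts a coefficient into an odds ratio. The intercept and coefficients are hypothetical values chosen only for the example; they are not the estimates from this study.

```python
import math


def predicted_probability(intercept, coefficients, values):
    """Invert the logit: return P(Y=1) for one patient."""
    linear_predictor = intercept + sum(b * x for b, x in zip(coefficients, values))
    return 1.0 / (1.0 + math.exp(-linear_predictor))


def odds_ratio(coefficient):
    """Odds ratio for a one-unit increase in a predictor."""
    return math.exp(coefficient)


def odds(probability):
    """Convert a probability into odds."""
    return probability / (1.0 - probability)


# Hypothetical parameters (NOT taken from Table 1): intercept, then
# coefficients for male sex (1 = male, 0 = female) and age in years.
b0 = -3.0
b = [0.9, 0.02]

p_male = predicted_probability(b0, b, [1, 60])
p_female = predicted_probability(b0, b, [0, 60])

# The ratio of the two odds equals exp(coefficient for sex).
print(odds(p_male) / odds(p_female), odds_ratio(b[0]))
```

Under this parameterization, an odds ratio reported for a binary predictor is simply the exponential of its regression coefficient.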

Decision tree (DT): The DT is one of the most popular classification methods as it can be applied to many medical diagnosis problems.[18] In the majority of cases in which the aim of the research is to identify or discriminate high-risk subjects, the DT is an excellent analytical choice.[19] It involves three basic components: decision nodes, branches, and leaves. Each path begins at a decision node and extends to a leaf, and it corresponds to a conjunction of test features. The tree can be considered as a disjunction of these conjunctions,[18] which separates the population into groups with a similar likelihood of events. At each branching stage, the split is chosen to give the highest possible predictive power. This method provides a graphical representation of the choices, which allows one to find alternatives for each decision and possible outcome and to compare the different alternatives.[19] Several algorithms have been introduced to construct a decision tree, such as classification and regression trees (CART).[20]
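The branching step can be sketched as a search for the threshold that minimizes the weighted Gini impurity of the two child nodes, which is the criterion commonly associated with CART. This is a minimal illustration under that assumption, not the procedure implemented in the commercial software; the ages and outcomes are invented toy data.

```python
def gini(labels):
    """Gini impurity of a list of binary labels (0 or 1)."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 2.0 * p * (1.0 - p)


def best_threshold(values, labels):
    """Return (threshold, weighted impurity) of the best split on one feature.

    Candidate thresholds are midpoints between consecutive distinct values.
    If no split improves on the unsplit node, the threshold is None.
    """
    pairs = list(zip(values, labels))
    distinct = sorted(set(values))
    best = (None, gini(labels))
    for low, high in zip(distinct, distinct[1:]):
        threshold = (low + high) / 2
        left = [y for x, y in pairs if x <= threshold]
        right = [y for x, y in pairs if x > threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(pairs)
        if score < best[1]:
            best = (threshold, score)
    return best


# Invented toy data: age in years and bleeding status (1 = bled).
ages = [50, 55, 60, 66, 67, 70, 75]
bled = [0, 0, 0, 0, 1, 1, 1]
print(best_threshold(ages, bled))
```

Because candidate thresholds are taken midway between observed values, splits on an integer-valued variable such as age naturally fall on half-year cut-offs.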

Comparison: The Hosmer-Lemeshow test was used to evaluate the adequacy of the LR model. In addition, indices such as sensitivity, specificity, diagnostic accuracy (DA), positive predictive value (PPV), and negative predictive value (NPV) were used to determine the accuracy of the methods. To clarify the results of our study, a receiver operating characteristic (ROC) curve was plotted for both the LR model and the DT. The area under the ROC curve (AUC) was calculated as a measure of discrimination, and McNemar’s test was used to evaluate the differences in proportions between the methods. To find the association between the observed and predicted values in both methods, measures such as the phi (φ) coefficient, contingency coefficient, and Kendall tau-b correlation coefficient were calculated.[21] The CART® version 6.0 (Salford Systems, San Diego, CA, USA) and IBM SPSS Statistics version 19.0 (IBM Corporation, Armonk, NY, USA) software programs were used to analyze the data.
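The diagnostic indices named above follow directly from the four cells of a 2×2 classification table, and McNemar’s statistic from the two discordant counts. The sketch below shows these standard formulas; the counts are hypothetical and are not the values in Tables 3 and 4, and the McNemar statistic is computed without a continuity correction, which is an assumption about the variant used.

```python
import math


def diagnostic_indices(tp, fp, fn, tn):
    """Standard indices from true/false positive and negative counts."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "PPV": tp / (tp + fp),
        "NPV": tn / (tn + fn),
        "DA": (tp + tn) / (tp + fp + fn + tn),
    }


def mcnemar(b, c):
    """McNemar chi-square (1 df, no continuity correction) and its p-value.

    b and c are the discordant counts: cases classified correctly by one
    method but not by the other, and vice versa.
    """
    statistic = (b - c) ** 2 / (b + c)
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


# Hypothetical counts for illustration only.
indices = diagnostic_indices(tp=30, fp=20, fn=10, tn=140)
statistic, p_value = mcnemar(b=25, c=5)
print(indices)
print(statistic, p_value)
```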


Results

About 50% of the patients were CRs, and the majority of these were either nondiabetic (72.7%), nondyslipidemic (63.4%), or nonhypertensive (68.8%). Forty patients (19.5%) experienced bleeding after the CABG surgery. To identify the clinical indicators for this bleeding, we used both the LR model and the DT to analyze the data.

The test sample was composed of 22 randomly selected patients. The remaining 173 subjects made up the learning sample, and these were classified using the DT method. The result derived from the learning sample was then evaluated by utilizing the test sample.

Gender was the only significant variable in the LR model (Table 1), but age (score= 100), DM (score= 16.38), gender (score= 13.67), CR (score= 7.31), and dyslipidemia (score= 5.06) were significant variables in the decision tree analysis. According to the results of LR, the odds of bleeding for men were 2.57 times higher than for women, and the Hosmer-Lemeshow test showed a good fit for the LR model (p=0.524).

Table 1: Logistic regression results for assessing the effect of different risk factors on bleeding

In Figure 1, each node shows the probability of bleeding for patients who met the conditions mentioned on the corresponding branches. For example, the probability of bleeding for female patients who were younger than 66.5 years was 0.07.

Figure 1: The classification tree model. DM: Diabetes mellitus; CR: Capital resident.

The 14 rules extracted from the DT are shown in Table 2. Our data revealed that regardless of residency and diabetes status, 7.4% of women younger than 66.5 years of age experienced postoperative bleeding. Additionally, male diabetics older than 53.5 years who were not CRs had a 34% probability of bleeding after CABG surgery.
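Rules of this kind translate directly into if-then statements. The sketch below encodes only the two rules quoted in the text; the function name and argument names are hypothetical, and the remaining rules of Table 2 are deliberately not reproduced, so the function returns None for any patient they would cover.

```python
def bleeding_probability(sex, age, diabetic, capital_resident):
    """Return the bleeding probability for the two rules quoted in the text.

    Returns None when neither rule applies (the other rules in Table 2
    are not encoded here).
    """
    # Women younger than 66.5 years, regardless of residency and diabetes.
    if sex == "female" and age < 66.5:
        return 0.074
    # Diabetic men older than 53.5 years who are not capital residents.
    if sex == "male" and diabetic and age > 53.5 and not capital_resident:
        return 0.34
    return None


print(bleeding_probability("female", 60, diabetic=True, capital_resident=False))
print(bleeding_probability("male", 58, diabetic=True, capital_resident=False))
```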

Table 2: Risk group classification results for bleeding using the decision tree analysis

The DT outperformed the LR model, with higher sensitivity, specificity, DA, NPV, PPV, and AUC (Table 3). The ROC curves of the two methods likewise showed higher specificity and sensitivity for the DT than for the LR model (Figure 2).

Table 3: Diagnostic values of the decision tree and logistic regression models

Figure 2: Receiver operating characteristic curves for decision tree (left) and logistic regression (right) predictions.

We also used McNemar’s test and association measures to compare the diagnostic accuracy between the fitted models (Table 4), and this strongly confirmed the significant difference between the two methods.

Table 4: Assessment of the correlation between the decision tree and logistic regression classification results

The phi (φ) coefficient, contingency coefficient, and Kendall tau-b between the observed and predicted values were 0.174, 0.172, and 0.174 for LR and 0.438, 0.401, and 0.438 for the DT, respectively. The higher correlation indicates the method with less misclassification.
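These three measures are closely related for a 2×2 table of observed versus predicted classes. Assuming the first coefficient is the phi coefficient, the contingency coefficient can be obtained from it as C = sqrt(φ² / (1 + φ²)), and Kendall tau-b coincides with phi for a 2×2 table, which is consistent with the identical first and third values reported. The sketch below checks the reported figures against this relationship; the 2×2 counts in the last lines are invented.

```python
import math


def phi_from_table(a, b, c, d):
    """Phi coefficient of the 2x2 table [[a, b], [c, d]]."""
    denominator = math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    return (a * d - b * c) / denominator


def contingency_from_phi(phi):
    """Pearson contingency coefficient implied by phi for a 2x2 table."""
    return math.sqrt(phi ** 2 / (1.0 + phi ** 2))


# Reported phi values from the text; the results agree with the reported
# contingency coefficients to within rounding.
print(round(contingency_from_phi(0.174), 3))  # LR
print(round(contingency_from_phi(0.438), 3))  # DT

# Invented 2x2 counts, for illustration only.
print(phi_from_table(30, 10, 20, 140))
```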


Discussion

The shared results of both the DT and LR indicated that men are more prone to bleeding after CABG surgery. The same result can be found in the study by Mehta et al.,[22] who reported that bleeding is 1.39 times more likely in men than in women. However, when assessing the risk factors of reexploration caused by hemorrhage in CABG patients in 1998, Dacey et al.[3] found that gender was not a significant variable.

We found that the likelihood of post-CABG bleeding was significantly influenced by the age of the patient. In the aforementioned study by Mehta et al.,[22] bleeding in patients over the age of 60 was 1.02 times more probable than for other patients. Choong et al.,[2] Dacey et al.,[3] and Al-Fayes et al.[4] also determined that increased age was a significant risk factor for bleeding.

According to our results, diabetics were less likely to experience bleeding. This may be because their rate of blood perfusion is less than for non-diabetics.[23]

Mehta et al.[22] examined the possibility of DM being a risk factor for bleeding in CABG patients and found that non-diabetics were 1.16 times more likely to have bleeding than diabetics in their study of 528,686 patients.

In this study, HT was included in the models, but bleeding was not affected by this variable. Although Mehta et al.[22] found HT to be a significant risk factor for bleeding after CABG surgery, Choong et al.[2] concluded, just as we did, that the effect of HT was insignificant.

In addition, we found few studies in the literature which considered dyslipidemia to be a risk factor for bleeding after CABG surgery. We found it to be an insignificant indicator, and Mehta et al.[22] arrived at the same conclusion.

In a study of the relationships between geographic status and clinical outcomes following CABG surgery, Dao et al.[24] concluded that rural patients experience longer hospital stays as well as higher in-hospital mortality rates. When we took into account geographic status as a risk factor for an adverse outcome after surgery, our data showed that patients living in Tehran experienced less bleeding than those living in other locations. Patients in the capital have more access to advanced medical equipment, specialist physicians, and health coverage; thus, they are followed up regularly and start their treatment at the early stages of the disease. Consequently, fewer complications, for example mortality and morbidity after surgery, are seen.

In order to classify the patients who underwent CABG into high-risk and low-risk categories, we compared the LR and DT classification models. The DT classification provides a rapid and effective method for determining categories and can be applied to a wide range of issues.[10,25] As previous research has proven, DTs have the ability to cope with noisy data[8] while also satisfying the need for accuracy and precision.[26] In addition, this method is strongly recommended because of its distribution-free nature as well as its ability to classify both categorical and numerical data. Furthermore, its tree-shaped structure can be easily interpreted and understood,[8,9] and its results can be described using a set of if-then rules,[11] which is an advantage for many medical applications.[27] In other words, in contrast to many other commonly used methods, for instance LR, there is no need to assume any distribution for the response when using the DT.[19,28] Clearly, the LR model has the benefit of parametric properties while the DT is non-parametric in nature.[29] However, the DT is able to deal with outliers[30] and missing data,[11,31] whereas the LR model estimates are usually biased when using this type of data.[32]

In this study, we found that the DT performed better than the LR model. Samanta et al.[6] also preferred the results of the DT over LR when they selected the hemodynamic features of periventricular leukomalacia in 2009. Sledjeski et al.[20] reached a considerably higher sensitivity (95% for the DT versus 37% for LR) and lower specificity (39% for the DT versus 80% for LR) for the DT when they sought to determine the high-risk group with regard to recurrent maltreatment by using data collected from investigations carried out in one Connecticut county by the Connecticut Department of Children and Families. In contrast to these studies, other research, such as that conducted by Dreiseitl and Ohno-Machado[27] in 2003 and Mirta et al.[33] in 2005, concluded that both methods were equally effective.

Based on this study, surgeons should pay special attention to the potential for postoperative bleeding after CABG in men older than 44.5 years and women older than 66.5 years of age, especially those who are non-diabetic and who live in areas with medical facilities that are not as well equipped. For these high-risk groups, we strongly recommend performing off-pump CABG surgery or, in the case of on-pump CABG surgery, shortening the CPB duration. Furthermore, patients should discontinue the use of antiplatelet medication prior to their surgery.[1-3] We also recommend that statisticians utilize the CART method instead of LR for the purpose of classification.

Declaration of conflicting interests
The authors declared no conflicts of interest with respect to the authorship and/or publication of this article.

Funding
The authors received no financial support for the research and/or authorship of this article.
