Machine learning techniques in cardiac risk assessment

Makine öğrenmesi teknikleriyle kardiyak risk değerlendirmesi (Turkish title)

Elif Kartal¹, Mehmet Erdal Balaban²

¹Informatics Department, İstanbul University, İstanbul, Turkey
²Turkish Community Services Foundation (TOVAK), İstanbul, Turkey

DOI: 10.5606/tgkdc.dergisi.2018.15559

Abstract

Background: The objective of this study was to predict the mortality risk of patients during or shortly after cardiac surgery by using machine learning techniques, which learn from collected data.

Methods: The dataset was obtained from Acıbadem Maslak Hospital. Risk factors of the European System for Cardiac Operative Risk Evaluation (EuroSCORE) were used to predict mortality risk. Because 30-day follow-up information was not available in the dataset, the Standard EuroSCORE scores of patients were calculated first and used to determine risk groups. Models were created with five machine learning algorithms on two datasets: age, serum creatinine, left ventricular dysfunction, and pulmonary hypertension were numerical in Dataset 1 and categorical in Dataset 2. Model performance was evaluated with 10-fold cross-validation.

Results: Data analysis and performance evaluation were performed with R, RStudio, and Shiny. C4.5 was selected as the best algorithm for risk prediction in Dataset 1 (accuracy= 0.989). This model indicated that pulmonary hypertension, recent myocardial infarct, and surgery on thoracic aorta are the three primary risk factors affecting the mortality risk of patients during or shortly after cardiac surgery. This model was also used to develop a dynamic web application that is accessible from mobile devices (https://elifkartal.shinyapps.io/euSCR/).

Conclusion: The C4.5 decision tree model showed the highest performance in predicting patients' mortality risk, in Dataset 1. Using the numerical values of the risk factors can be useful in increasing the performance of machine learning models. Developing hospital-specific local assessment systems from hospital data, such as the application in this study, would be beneficial for both patients and doctors.

Introduction

For many years, researchers have focused on improving patients' life expectancy and quality of life, and the treatment of common diseases has therefore become a top priority for governments. According to the World Health Organization,[1] cardiovascular diseases (CVD) are the leading cause of death around the world. According to the American Heart Association,[2] CVD caused one of every three deaths in the United States and is a top killer both in the United States and worldwide. Turkey is in a similar situation: the Turkish Statistical Institute[3] reports that circulatory system diseases caused 40.1% of all deaths in Turkey in 2015 and 39.8% in 2016. Of the deaths due to circulatory system diseases in 2016, ischemic heart disease, cerebrovascular diseases, hypertensive diseases, and other heart diseases accounted for 40.5%, 23.6%, 8.8%, and 22.3%, respectively.[3] One way to prevent and control deaths caused by CVD is to predict patients' mortality risk.

Risk grouping and forecasting models are regarded as essential tools for assessing the quality of care, medical decision making, patient counseling, and patient consent.[4] Various risk stratification models, such as the Parsonnet Scoring System, the Cleveland Clinic Scoring System, and the Society of Thoracic Surgeons National Database Risk Scoring System, have been developed to evaluate the results of open-heart surgery.[5] Geissler et al.[6] compared six scoring techniques and reported that the European System for Cardiac Operative Risk Evaluation (EuroSCORE) gave the best mortality prediction. Dişcigil et al.[7] pointed out that EuroSCORE includes only four surgery-related factors and is therefore the least affected by surgical factors. Accordingly, its emphasis on patient-based risk assessment and its minimization of differences that may arise from the surgical team are seen as advantages of EuroSCORE.[6,7] In addition, a EuroSCORE-based system called Cardiac Risk Scoring is used by hospitals in Turkey,[8] and hospital charges are determined according to this risk score. Karabulut et al.[9] found EuroSCORE easy to use and applicable in the cardiovascular surgery clinic, although multicenter studies with more observations would increase the validity of the system in Turkey.

European system for cardiac operative risk evaluation
EuroSCORE is a scoring system developed to predict early death in cardiac surgery patients.[10-12] As part of the EuroSCORE development process, Roques et al.[13] identified risk factors for mortality in adult cardiac surgical patients, and a large portion of their study's database was also used to develop EuroSCORE. Ninety-seven risk factors were collected from 20,000 patients in 128 hospitals in eight European countries; however, only 17 of these risk factors (Table 1) were selected for the scoring system as significant, reliable, and objective.[10] Today, three EuroSCORE models provide online risk calculation: Standard (Additive) EuroSCORE,[14] Logistic EuroSCORE,[15] and EuroSCORE II[16] (Figure 1).

Figure 1: Comparison of EuroSCORE models.
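The Standard (Additive) EuroSCORE is a sum of integer points, one weight per risk factor. As a minimal sketch, the function below adds those points for a single patient. The attribute names are hypothetical, and the point values are the published additive weights as we understand them, not values taken from this article; they should be checked against Table 1 and the official calculator before any use.

```python
import math

# Hypothetical attribute names. Point values follow the published additive
# (Standard) EuroSCORE as we understand it; verify against the official calculator.
BINARY_POINTS = {
    "female": 1,
    "chronic_pulmonary_disease": 1,
    "extracardiac_arteriopathy": 2,
    "neurological_dysfunction": 2,
    "previous_cardiac_surgery": 3,
    "serum_creatinine_over_200_umol_l": 2,
    "active_endocarditis": 3,
    "critical_preoperative_state": 3,
    "unstable_angina": 2,
    "recent_myocardial_infarct": 2,      # within 90 days
    "pulmonary_hypertension": 2,         # systolic PA pressure > 60 mmHg
    "emergency": 2,
    "other_than_isolated_cabg": 2,
    "surgery_on_thoracic_aorta": 3,
    "postinfarct_septal_rupture": 4,
}
LV_POINTS = {"good": 0, "moderate": 1, "poor": 3}  # LVEF > 50%, 30-50%, < 30%


def additive_euroscore(age: float, lv_function: str, present: set) -> int:
    """Sum Standard EuroSCORE points for one patient.

    present: names from BINARY_POINTS whose risk factor is present.
    """
    unknown = present - set(BINARY_POINTS)
    if unknown:
        raise KeyError(f"unknown risk factors: {sorted(unknown)}")
    # 1 point per 5 years, or part thereof, over 60
    age_points = math.ceil((age - 60) / 5) if age > 60 else 0
    return age_points + LV_POINTS[lv_function] + sum(BINARY_POINTS[f] for f in present)


# A 72-year-old woman with moderate LV function undergoing a non-isolated-CABG operation:
# 3 (age) + 1 (LV) + 1 (female) + 2 (other than isolated CABG)
print(additive_euroscore(72, "moderate", {"female", "other_than_isolated_cabg"}))  # 7
```

Together, age, left ventricular function, and the 15 yes/no factors make up the 17 risk factors listed in Table 1.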

Machine learning in cardiac risk assessment
The machine learning field is concerned with building computer programs that improve automatically with experience.[17] Machine learning means programming computers to optimize a performance criterion using sample data or past experience.[18] Simon[19] described learning as any change in a system that allows it to perform better the second time on the same task, or on another task drawn from the same population. Mitchell[17] stated how a machine can change its behavior in order to learn by taking performance into consideration: "A computer program is said to learn from experience (E) with respect to some class of tasks (T) and performance measure (P), if its performance at tasks in (T), as measured by (P), improves with experience (E)."

There are two main types of learning: supervised learning and unsupervised learning. In supervised learning, the learner receives a set of labeled training examples and makes predictions for points it has not seen before.[20] In unsupervised learning, the learner's training data contain no labeled examples.[20] The main difference between supervised and unsupervised learning is therefore the presence of a target attribute in the dataset.

Both machine learning and common scoring systems have been used to predict mortality risk after cardiac surgery. Nouei et al.[21] proposed the Lookup Genetic Fuzzy Annealing System to predict mortality risk after coronary artery bypass grafting (CABG) surgery and compared its accuracy (acc= 0.853) with two well-known machine learning techniques: logistic regression (acc= 0.781) and the multilayer perceptron neural network (acc= 0.748). Tu et al.[22] compared artificial neural networks and logistic regression for estimating in-hospital mortality risk after CABG and found that the two methods reported similar relationships between patient characteristics and mortality. Lippmann et al.[23] used artificial neural networks to estimate the risk of death, stroke, and renal impairment in patients who underwent CABG. Tunca[24] developed a risk prediction model using the REMARC (Risk Estimation by Maximizing Area under Receiver Operating Characteristic Curve) algorithm and the TurkoSCORE system, which comprises a database and a learning system, to estimate mortality risk for patients in Turkey.

This study aimed to predict the mortality risk of patients during or shortly after cardiac surgery by using EuroSCORE mortality risk factors and machine learning techniques.

Methods

In this study, the Cross-Industry Standard Process for Data Mining (CRISP-DM) was chosen to perform the machine learning analyses systematically. CRISP-DM was developed by industry leaders with input from over two hundred experts and data mining tool and service providers.[25] It consists of six stages: Business Understanding, Data Understanding, Data Preparation, Modeling, Evaluation, and Deployment. The study's method is described below in terms of CRISP-DM. Approval for this study was obtained from the local committee of Acıbadem Maslak Hospital.

Business understanding
Business understanding is defined as understanding the problem in its business environment; in this study, it was treated as understanding the problem. The problem was defined as predicting the mortality risk of patients during or shortly after cardiac surgery.

Data understanding
In this study, data were obtained from Acıbadem Maslak Hospital. Initially, the dataset consisted of 17 predictive attributes (Table 1) and 1482 observations. In the EuroSCORE study, Roques et al.[13] determined patients' dead/alive status over the 30 days following surgery. However, when the dates of operation and hospital discharge were examined in this study, a standard 30-day postoperative follow-up period could not be obtained. Therefore, patients were grouped according to their Standard EuroSCORE scores: low (0-2 points), moderate (3-5 points), and high (≥6 points). This risk group was used as the target attribute for the machine learning algorithms.
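The grouping rule above can be written as a small function that turns a Standard EuroSCORE into the three-level target attribute. This is a sketch; the function name and the lowercase labels are our own choices.

```python
def risk_group(score: int) -> str:
    """Map a Standard EuroSCORE to the risk group used as the target attribute."""
    if score < 0:
        raise ValueError("a EuroSCORE cannot be negative")
    if score <= 2:
        return "low"        # 0-2 points
    if score <= 5:
        return "moderate"   # 3-5 points
    return "high"           # 6 or more points


print([risk_group(s) for s in [0, 2, 3, 5, 6, 11]])
# ['low', 'low', 'moderate', 'moderate', 'high', 'high']
```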

Table 1: The European system for cardiac operative risk evaluation risk factors

Data preparation
A large number of missing values were detected. The standard and logistic EuroSCORE calculator[26] has no option for missing values, whereas the calculator designed for patients[27] offers both "Do not know" and "No" options, and the calculated scores are equal in both cases. Therefore, in this study, it was decided to complete the missing values in the dataset before the analyses. Missing values of categorical attributes were filled with the most frequent category, and missing values of numerical attributes with the mean, computed separately for each attribute within each risk group (class label of the target attribute).
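A minimal sketch of this class-conditional completion in plain Python: within each risk group, a missing categorical value takes the group's most frequent category and a missing numerical value takes the group's mean. The row format (a list of dicts), the use of `None` for a missing value, and the function name are assumptions made for illustration.

```python
from collections import Counter
from statistics import mean


def impute_by_class(rows, target, categorical, numerical):
    """Fill None values per class label: mode for categorical, mean for numerical columns."""
    fill = {}
    for label in {r[target] for r in rows}:
        group = [r for r in rows if r[target] == label]
        fill[label] = {}
        for col in categorical:
            observed = [r[col] for r in group if r[col] is not None]
            if observed:
                fill[label][col] = Counter(observed).most_common(1)[0][0]
        for col in numerical:
            observed = [r[col] for r in group if r[col] is not None]
            if observed:
                fill[label][col] = mean(observed)
    completed = []
    for r in rows:
        new = dict(r)
        for col in categorical + numerical:
            if new[col] is None:
                # stays None if the whole group has no observed value for this column
                new[col] = fill[r[target]].get(col)
        completed.append(new)
    return completed


rows = [
    {"risk": "low", "sex": "F", "age": 50},
    {"risk": "low", "sex": None, "age": 60},
    {"risk": "low", "sex": "F", "age": None},
    {"risk": "high", "sex": "M", "age": 80},
    {"risk": "high", "sex": None, "age": None},
]
print(impute_by_class(rows, "risk", ["sex"], ["age"]))
```

Because the fill values depend on the risk-group label, this step can only be applied to rows whose label is already known.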

Outliers were detected and removed from the dataset using rules provided by experts. Because the postinfarct septal rupture attribute was present in only one patient, it was removed from the dataset. Duplicate observations were also removed.

EuroSCORE works only with categorical attributes; however, numerical values of the age, serum creatinine, left ventricular dysfunction, and pulmonary hypertension attributes were also available in the dataset. To examine the possible effects of the data type of these attributes, analyses were performed on two datasets: these attributes are numerical in Dataset 1 and categorical in Dataset 2. The numerical attributes in Dataset 1 were normalized using the min-max normalization technique.[28] Table 2 shows the frequency distribution of risk groups in Dataset 1 and Dataset 2.
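Min-max normalization rescales each numerical attribute linearly so its smallest value maps to 0 and its largest to 1: x' = (x - min) / (max - min). A short sketch follows; the optional target range and the handling of a constant column are our own additions.

```python
def min_max_normalize(values, new_min=0.0, new_max=1.0):
    """Linearly rescale values so that min(values) -> new_min and max(values) -> new_max."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # a constant attribute carries no information; map it to new_min
        return [new_min for _ in values]
    return [new_min + (v - lo) / (hi - lo) * (new_max - new_min) for v in values]


ages = [45, 60, 75]
print(min_max_normalize(ages))  # [0.0, 0.5, 1.0]
```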

Table 2: Frequency distribution of risk groups in the datasets

Modeling
Alternative models were created with Naive Bayes classifier, k-nearest neighbor algorithm, logistic regression analysis, ID3, and C4.5 decision tree algorithms to predict the mortality risk of patients during or shortly after cardiac surgery. The basic concepts of these algorithms are briefly explained below.[17,28-30]

Naive Bayes Classifier: An easily understood method based on Bayes' theorem. It estimates the probability that an observation belongs to each class label of the target attribute. The maximum a posteriori (MAP) hypothesis and the assumption of class-conditional independence are the two key elements of the classification process.
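For categorical attributes, the two key elements can be sketched directly: class-conditional independence lets the likelihood factor into one term per attribute, and the MAP rule picks the class with the largest prior × likelihood. Log probabilities are used to avoid underflow. The Laplace smoothing parameter `alpha` is an assumption added here; the article does not state how zero counts were handled.

```python
import math
from collections import Counter


def train_naive_bayes(X, y, alpha=1.0):
    """Count class priors and per-class attribute-value frequencies.

    alpha (> 0): Laplace smoothing, so an attribute value never seen with a class
    does not force that class's posterior to zero.
    """
    class_counts = Counter(y)
    n_attrs = len(X[0])
    value_counts = {c: [Counter() for _ in range(n_attrs)] for c in class_counts}
    domains = [set() for _ in range(n_attrs)]
    for row, label in zip(X, y):
        for j, v in enumerate(row):
            value_counts[label][j][v] += 1
            domains[j].add(v)
    return {"class_counts": class_counts, "value_counts": value_counts,
            "domains": domains, "alpha": alpha, "n": len(y)}


def predict_naive_bayes(model, row):
    """Return the maximum a posteriori class, assuming class-conditional independence."""
    best_label, best_log_post = None, -math.inf
    for label, nc in model["class_counts"].items():
        log_post = math.log(nc / model["n"])  # log prior P(class)
        for j, v in enumerate(row):
            count = model["value_counts"][label][j][v]
            k = len(model["domains"][j])
            # log P(attribute j = v | class), smoothed
            log_post += math.log((count + model["alpha"]) / (nc + model["alpha"] * k))
        if log_post > best_log_post:
            best_label, best_log_post = label, log_post
    return best_label


X = [("yes", "old"), ("yes", "old"), ("no", "young"), ("no", "young"), ("yes", "young")]
y = ["high", "high", "low", "low", "low"]
model = train_naive_bayes(X, y)
print(predict_naive_bayes(model, ("yes", "old")))  # high
```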

k-Nearest Neighbor Algorithm: The distance between the unlabeled observation and every observation in the dataset is calculated. The k observations with the smallest distances are selected, and the most frequent class among them is assigned as the class value.

In this study, the k parameter of the algorithm was selected first: to obtain the best k, the algorithm was applied for k= 1, 2, ..., 10. Gower distance[31] was used for Dataset 1 because it contains both binary-coded and numerical attributes, and Jaccard distance[32] was used for Dataset 2 because its attributes are encoded in asymmetric binary format. The function that provides Gower distance for the algorithm was developed in R by the authors for these analyses.
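A sketch of the two distances and the majority vote, written in Python rather than the authors' R function. Gower distance averages per-attribute dissimilarities (range-scaled absolute difference for numerical attributes, 0/1 mismatch for binary ones); treating the binary attributes as symmetric here is an assumption. Jaccard distance ignores joint absences (0-0), which is what makes it suitable for asymmetric binary data. All function names are hypothetical.

```python
from collections import Counter


def gower_distance(x, z, is_numeric, ranges):
    """Average per-attribute dissimilarity for mixed numerical/binary data.

    Numerical: |x - z| / range of the attribute. Binary: 0 if equal, else 1
    (simple matching, an assumption of this sketch).
    """
    total = 0.0
    for xi, zi, numeric, r in zip(x, z, is_numeric, ranges):
        if numeric:
            total += abs(xi - zi) / r if r > 0 else 0.0
        else:
            total += 0.0 if xi == zi else 1.0
    return total / len(x)


def jaccard_distance(x, z):
    """Jaccard distance for asymmetric binary vectors: 0-0 matches are ignored."""
    both_present = sum(1 for a, b in zip(x, z) if a == 1 and b == 1)
    mismatches = sum(1 for a, b in zip(x, z) if a != b)
    denom = both_present + mismatches
    return mismatches / denom if denom else 0.0


def knn_predict(train_X, train_y, query, k, distance):
    """Majority vote among the k training rows closest to query.

    Ties go to the label seen first, i.e. the one held by the nearer neighbour.
    """
    ranked = sorted(zip(train_X, train_y), key=lambda pair: distance(pair[0], query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Dataset 2 style: asymmetric binary attributes compared with Jaccard distance
train_X = [[1, 0, 1], [1, 1, 1], [0, 0, 0], [0, 0, 1]]
train_y = ["high", "high", "low", "low"]
print(knn_predict(train_X, train_y, [1, 0, 1], k=3, distance=jaccard_distance))  # high
```

Choosing k as described above would mean running this prediction inside cross-validation for each k in 1..10 and keeping the best-scoring value.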

Logistic Regression Analysis: Models the relationship between the predictive attributes and a categorical target attribute. Depending on the data type of the target attribute, it is called binary, ordinal, or multinomial logistic regression.[33]

In this study, because of the number of zero-frequency cells, some categories of the attributes (including age, left ventricular dysfunction, and the target attribute) were merged to make the data more suitable for the analyses, and binary logistic regression was performed. When the risky situation is coded as 1 in the target attribute, binary logistic regression estimates the probability that the target attribute takes the value 1.[33]
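The estimated probability is the logistic function of a linear predictor, P(Y = 1 | x) = 1 / (1 + exp(-(b0 + b·x))). The sketch below fits the coefficients by plain batch gradient ascent on the log-likelihood; this fitting procedure is chosen for brevity and is not necessarily what the authors' software used (standard packages typically use iteratively reweighted least squares).

```python
import math


def sigmoid(t):
    """Numerically stable logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def predict_proba(w, row):
    """P(target = 1 | row), with w[0] as the intercept."""
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], row)))


def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Maximise the log-likelihood by batch gradient ascent; y: 1 = risky, 0 = not."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for row, target in zip(X, y):
            err = target - predict_proba(w, row)
            grad[0] += err
            for j, xj in enumerate(row, start=1):
                grad[j] += err * xj
        w = [wj + lr * g / len(X) for wj, g in zip(w, grad)]
    return w


# One binary predictor: the risky outcome occurs in 1 of 3 rows with x = 0 and 2 of 3 with x = 1
X = [[0], [0], [0], [1], [1], [1]]
y = [0, 0, 1, 1, 1, 0]
w = fit_logistic(X, y)
print(round(predict_proba(w, [0]), 2), round(predict_proba(w, [1]), 2))  # 0.33 0.67
```

With a single binary predictor, the fitted probabilities converge to the observed proportions in each group, which makes the example easy to check by hand.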

ID3 and C4.5 Decision Tree Algorithms: ID3 is one of the simplest decision tree algorithms. It uses entropy and information gain to measure how well an attribute splits the training samples. C4.5 replaces the information gain criterion of ID3 with the gain ratio, which normalizes information gain by a quantity called split information. Because C4.5 can work with both categorical and numerical attributes, whereas ID3 works only with categorical attributes, C4.5 was applied to Dataset 1 and ID3 to Dataset 2.
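The split criteria can be sketched for a categorical attribute: information gain is the parent entropy minus the weighted entropy of the partitions, and the gain ratio divides that gain by the entropy of the partition sizes (split information). This sketch omits C4.5's threshold search for numerical attributes and its pruning.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(values, labels):
    """Entropy reduction from splitting labels by a categorical attribute (ID3)."""
    groups = {}
    for v, label in zip(values, labels):
        groups.setdefault(v, []).append(label)
    n = len(labels)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def split_information(values):
    """Entropy of the partition sizes themselves."""
    return entropy(values)


def gain_ratio(values, labels):
    """C4.5 criterion: information gain normalized by split information."""
    si = split_information(values)
    return information_gain(values, labels) / si if si > 0 else 0.0


labels = ["low", "low", "high", "high"]
print(gain_ratio(["a", "a", "b", "b"], labels))  # 1.0
print(gain_ratio(["a", "b", "c", "d"], labels))  # 0.5
```

Both attributes above have an information gain of 1 bit, but the one with a distinct value per row is penalized by the gain ratio, which is the bias toward many-valued attributes that C4.5's normalization is meant to reduce.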

Evaluation
Various methods have been developed for model performance evaluation such as hold-out, stratified sampling, three-way split, cross-validation, etc.

In this study, the stratified 10-fold cross-validation method was chosen to compare the performance of the models. In k-fold cross-validation, the dataset is divided into k equal parts. Each part is used once for testing while the remaining k-1 parts are used for training. This yields k error rates (or other performance measures), and their average is taken as the model's performance. Stratification keeps the class proportions of each part close to those of the whole dataset.
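A minimal sketch of stratified fold assignment and averaging: indices of each class are shuffled and dealt round-robin into k folds, so every fold keeps roughly the overall class proportions. The function names, the `evaluate` callback, and the fixed random seed are assumptions for illustration.

```python
import random
from collections import defaultdict


def stratified_k_fold(labels, k=10, seed=0):
    """Return k lists of row indices whose class proportions mirror the full dataset."""
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    offset = 0
    for idx in by_class.values():
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            folds[(offset + pos) % k].append(i)
        offset += len(idx)  # continue the round-robin so fold sizes stay balanced
    return folds


def cross_validate(labels, k, evaluate):
    """evaluate(train_idx, test_idx) -> score; returns the mean score over the k folds."""
    folds = stratified_k_fold(labels, k)
    scores = []
    for f in range(k):
        test = folds[f]
        train = [i for g in range(k) if g != f for i in folds[g]]
        scores.append(evaluate(train, test))
    return sum(scores) / k


labels = ["low"] * 70 + ["moderate"] * 20 + ["high"] * 10
folds = stratified_k_fold(labels, k=10)
print([sum(labels[i] == "high" for i in fold) for fold in folds])  # one "high" per fold
```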

In addition, various measures can be used to evaluate model performance.[34] In this study, accuracy, error, and more comprehensive measures such as the F-measure and the diagnostic odds ratio were calculated.
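These measures all derive from a 2×2 confusion matrix. The sketch below computes them for one positive class; how the article aggregated them over the three risk groups is not stated here, so the one-vs-rest reduction is an assumption. The diagnostic odds ratio is (TP·TN)/(FP·FN) and is unbounded when FP or FN is zero.

```python
import math


def binary_measures(tp, fp, fn, tn):
    """Accuracy, error, F-measure and diagnostic odds ratio from a 2x2 confusion matrix."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    if fp * fn:
        dor = (tp * tn) / (fp * fn)
    else:
        # unbounded when FP or FN is zero; some authors add 0.5 to every cell instead
        dor = math.inf if tp * tn else math.nan
    return {"accuracy": accuracy, "error": 1 - accuracy,
            "f_measure": f_measure, "dor": dor}


def one_vs_rest(y_true, y_pred, positive):
    """Collapse a multi-class result (e.g. low/moderate/high) to positive vs. the rest."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = len(pairs) - tp - fp - fn
    return binary_measures(tp, fp, fn, tn)


print(binary_measures(tp=40, fp=10, fn=5, tn=45)["dor"])  # 36.0
```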

Analyses were performed with the R programming language and RStudio. R is a free language and environment for statistical computing and graphics.[35] RStudio[36] is an integrated development environment for R. R packages including e1071,[37] knnGarden,[38] RWeka,[39,40] shiny,[41] and shinythemes[42] were used to perform the analyses. A dynamic web application of the best model was developed with Shiny,[43] which turns R code into applications that run in a web environment. One way to share such applications on the web is to publish them on shinyapps.io.[44]


Results

In Dataset 1, which contains both categorical and numerical attributes, the C4.5 decision tree algorithm performed best in risk prediction (acc= 0.989). It was followed by logistic regression analysis (acc= 0.982), the Naive Bayes classifier (acc= 0.977), and the k-nearest neighbor algorithm (acc= 0.972). By contrast, model performance with only categorical attributes (Dataset 2) was lower than with Dataset 1.

The attributes were ranked by their contribution to the models obtained from ID3, C4.5, and logistic regression analysis (Table 3).

Table 3: Top three attributes according to contribution level of the models

Deployment
The C4.5 decision tree model, which gave the best performance in Dataset 1, was integrated into a dynamic web application that is also accessible from mobile devices (https://elifkartal.shinyapps.io/euSCR/) (Figure 2). Rules such as the following can be derived from the decision tree:

IF pulmonary hypertension is less than or equal to 32 and recent myocardial infarct= NO and Other than isolated CABG= NO; THEN the RISK is LOW.

IF pulmonary hypertension is less than or equal to 32 and recent myocardial infarct= NO and Other than isolated CABG= YES; THEN the RISK is MEDIUM.

IF pulmonary hypertension is greater than 33 and pulmonary hypertension is less than or equal to 42 and recent myocardial infarct= YES; THEN the RISK is HIGH.

Figure 2: Web application for cardiac risk assessment using the C4.5 decision tree model.
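The three example rules quoted from the tree can be expressed as a function. This is only a sketch of those rules, not of the full C4.5 model: patients the quoted rules do not cover return `None`. The thresholds are copied as printed, the function and parameter names are hypothetical, and the unit and scale of the pulmonary hypertension value are not stated in this section.

```python
from typing import Optional


def example_rule_risk(pulmonary_hypertension: float, recent_mi: bool,
                      other_than_isolated_cabg: bool) -> Optional[str]:
    """Apply only the three quoted example rules; None means 'not covered by these rules'."""
    if pulmonary_hypertension <= 32 and not recent_mi:
        return "MEDIUM" if other_than_isolated_cabg else "LOW"
    if 33 < pulmonary_hypertension <= 42 and recent_mi:
        return "HIGH"
    return None  # the full tree has further branches not reproduced here


print(example_rule_risk(30, recent_mi=False, other_than_isolated_cabg=True))  # MEDIUM
```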


Discussion

This study aimed to determine the mortality risk of patients during or shortly after cardiac surgery by using machine learning techniques. It differs from other EuroSCORE-based studies in the literature in the following respects:

Because the dataset lacked the 30-day follow-up data used in EuroSCORE, the Standard EuroSCORE scores of the patients were calculated first, and predictions were made using the resulting risk groups as the target attribute. Seventeen risk factors are used to calculate the Standard EuroSCORE; however, because the postinfarct septal rupture attribute was seen in only one patient, it was not used in the analyses.

In EuroSCORE, a risk factor whose exact value is not known is scored as absent. In this study, by contrast, the missing values of the remaining 16 risk factors were completed.

Numerical and categorical values of age, serum creatinine, left ventricular dysfunction, and pulmonary hypertension attributes were used in Dataset 1 and Dataset 2, respectively.

In addition to accuracy and error, more comprehensive performance measures (F-measure and diagnostic odds ratio) were used.

The highest performance was obtained from the C4.5 decision tree algorithm model in Dataset 1 and the lowest performance was obtained from the ID3 decision tree algorithm in Dataset 2.

The performance measures obtained from Dataset 2 were considerably lower than those obtained from Dataset 1. Overall, the errors in Table 4 ranged from 0.011 to 0.160. Differences of this size might be considered negligible in another application domain; however, because patients' mortality risk is critical, we suggest using the numerical values of the factors that affect the target attribute.

Table 4: Results of model performance evaluation

The 16 attributes were ranked using the ID3 and C4.5 decision tree algorithms and logistic regression analysis. Pulmonary hypertension ranked first in the models derived from Dataset 1. Age was also among the top three attributes for two of the machine learning algorithms.

Conclusion
The C4.5 decision tree model had the highest performance in predicting the mortality risk of patients (accuracy= 0.989). This model can be regarded as a predictive model learned from the data of this study. Using numerical values of the risk factors may be useful in increasing the performance of machine learning models. Developing hospital-specific local assessment systems, such as the application in this study, would be beneficial for both patients and doctors. Furthermore, this model should be tested with datasets collected from other hospitals.

Acknowledgement
This study was conducted within the scope of the PhD thesis Kartal E. (2015). Machine Learning Techniques Based on Classification and a Study on Cardiac Risk Assessment (PhD Thesis). İstanbul University, İstanbul.

The authors would like to thank Acıbadem Maslak Hospital Chief Physician Prof. Dr. Çağlar Çuhadaroğlu, Acıbadem University School of Medicine Head of Department of Cardiovascular Surgery Prof. Dr. Cem Alhan, and Acıbadem Maslak Hospital Cardiovascular Surgery Department Assistant Sevinç Kocaman for providing the dataset.

Declaration of conflicting interests
The authors declared no conflicts of interest with respect to the authorship and/or publication of this article.

Funding
This study was supported by Scientific Research Projects Coordination Unit of Istanbul University (Project number 49091).
