Logistic regression, used to estimate the probability (or risk) of a particular outcome given the value(s) of the independent variable(s), assumes a linear relationship between the predictors and the logit of the outcome (the natural logarithm of the odds). The logistic regression model is defined for a two-level outcome of interest. In the model, the odds of the event of interest are expressed as the ratio of the probability of the event to the probability of its not happening. The logistic regression model takes the natural logarithm of the odds as a regression function of the predictors. If there is only one predictor in the model, it takes the form "ln[odds(Y=1)]=Β0+Β1X". Herein, "Y" is the outcome variable and takes the value "1" when the event occurs and "0" when it does not. "Β0" is the intercept term, and "Β1" is the regression coefficient representing the change in the logarithm of the odds of the event with a 1-unit change in "X". The model can also be extended to contain more than one predictor, in which case it becomes "ln[odds(Y=1)]=Β0+Β1X1+Β2X2+...+ΒkXk". This equation is the general form of logistic regression, and "k" is the number of predictors. Predictors should be selected through a careful review of the literature in relation to the outcome to ensure that the full range of potential predictors is included.[8] The main goal of logistic regression is to estimate the "k+1" unknown parameters "Β" in the equation. These coefficients can be estimated using the maximum likelihood or weighted least squares methods. A coefficient greater than "0", reflecting the relationship between the corresponding "X" and the logit of "Y", indicates an increase in the logit of "Y"; a value less than "0" indicates a decrease in the logit of "Y"; and a value of "0" indicates that there is no linear relationship between the logit of "Y" and that "X".
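As an illustration, the relationship between the linear predictor "Β0+Β1X" and the event probability can be sketched in a few lines of Python. The coefficient values below are hypothetical, chosen only to show the conversion from log-odds to probability; they are not taken from this study.

```python
import math

def logit_to_probability(b0, b1, x):
    """Convert the linear predictor ln[odds(Y=1)] = b0 + b1*x
    into the event probability p = odds / (1 + odds)."""
    log_odds = b0 + b1 * x
    odds = math.exp(log_odds)
    return odds / (1.0 + odds)

# Hypothetical coefficients: intercept b0 = -2.0, slope b1 = 0.5.
# At x = 3 the log-odds are -0.5, giving a probability of about 0.378.
p = logit_to_probability(-2.0, 0.5, x=3.0)
print(round(p, 3))
```

Note that a log-odds of exactly zero always corresponds to a probability of 0.5, which is why a positive coefficient raises and a negative coefficient lowers the predicted probability as "X" grows.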
In logistic regression, a model is created that includes all useful predictor variables so that the outcome category can be accurately predicted for individual cases using the best model. The analysis calculates the probability of success over the probability of failure and reports an odds ratio (OR) for each predictor. When interpreting logistic regression, the OR is included alongside the regression coefficients to evaluate the relationships between the outcome and the predictors. When a predictor variable "Xi; i=1,2,...,k", referred to as the independent variable, increases by one unit, the odds of the dependent variable, called the outcome variable, are multiplied by the factor "exp(Βi)", while all other factors remain constant. This factor is the OR and ranges from zero to positive infinity. It indicates the relative amount by which the odds of the outcome variable increase (OR>1) or decrease (OR<1) when the value of the corresponding predictor variable increases by one (1) unit. The OR is a measure that finds wide use, particularly in epidemiology. For instance, when "Y" indicates liver cancer status (present=1; absent=0) and "X" indicates alcohol use status (using=1; not using=0), an "OR=3" means that the odds of liver cancer among alcohol users are three times the odds among non-users in any given population.
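The identity "OR = exp(Βi)" can be checked numerically. In this sketch the intercept and coefficient are hypothetical, with the coefficient deliberately set to ln(3) so that the odds ratio equals the OR=3 example above:

```python
import math

# Hypothetical coefficient for a binary predictor (e.g. exposure = 1 vs 0),
# chosen so that exp(b1) = 3, matching the OR = 3 example.
b1 = math.log(3.0)
odds_ratio = math.exp(b1)   # OR = exp(b1)

def odds(b0, b1, x):
    """Odds of the event at predictor value x: exp(b0 + b1*x)."""
    return math.exp(b0 + b1 * x)

b0 = -1.0  # hypothetical intercept
# Increasing x by one unit multiplies the odds by exp(b1),
# regardless of the intercept:
ratio = odds(b0, b1, x=1) / odds(b0, b1, x=0)
print(round(odds_ratio, 6), round(ratio, 6))  # both equal 3.0
```

The ratio of odds at "x+1" versus "x" is the same at every starting value of "x", which is what makes the OR a single-number summary of a predictor's effect.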
The validity of the model obtained with the selected variables should be checked by evaluating the goodness-of-fit of the logistic regression model. First, the model is evaluated overall by examining the relationship between all predictor variables in the model and the outcome variable. For this purpose, the fit of two models, one with and one without the predictor variables, is compared. If the model with predictor variables shows an improvement over the model without them, it is said to fit the data better. To this end, the hypothesis "H0: Β1=Β2=...=Βk=0" is tested using the "likelihood ratio test" for the overall fit of the model. If the "p" value for the overall model fit statistic is lower than the significance level of the test ("p<0.05"), the "H0" hypothesis is rejected and at least one of the predictor variables is concluded to contribute to the prediction of the outcome. The Hosmer-Lemeshow test, another indicator of overall model fit, examines whether the observed proportions of events are similar to the predicted probabilities of occurrence in subgroups of the model population. The Hosmer-Lemeshow test statistic asymptotically follows a "χ2" distribution; small values (with a large p-value, closer to 1) indicate a good fit to the data and thus good overall model fit, whereas large values (with p<0.05) indicate a poor fit. If the overall model obtained in logistic regression works well, the second step is to assess how important each of the predictor variables is. The "Wald test" of the logistic regression coefficients for each predictor is performed and its contribution to the model is evaluated. For each model coefficient, the null hypothesis that the coefficient is zero is tested against the alternative that it is not zero using a "Wald test".
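The likelihood ratio test described above compares the maximized log-likelihoods of the two models. The sketch below uses hypothetical log-likelihood values (not from this study) for an intercept-only model and a model with two predictors, and compares the resulting statistic with the chi-square critical value:

```python
# Likelihood ratio test for overall model fit.
# Hypothetical log-likelihoods: intercept-only ("null") model vs. a
# model with k = 2 predictors; values are illustrative only.
ll_null = -120.4   # log-likelihood with no predictors
ll_full = -105.9   # log-likelihood of the full model

# G is asymptotically chi-square distributed with df = k under
# H0: b1 = b2 = 0.
G = -2.0 * (ll_null - ll_full)
df = 2
critical_value = 5.991           # chi-square 0.95 quantile for df = 2

reject_h0 = G > critical_value   # reject => at least one coefficient != 0
print(round(G, 1), reject_h0)
```

Here G = 29.0 far exceeds the critical value, so at least one predictor would be judged to contribute to the prediction of the outcome.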
The Wald statistic is asymptotically distributed as "χ2", and each Wald statistic is compared with a "χ2" critical value with "1" degree of freedom. If the "H0" hypothesis is rejected, the contribution of the coefficient to the model is statistically significant. The classification table, which should be prepared in the third step, is a method to evaluate the predictive accuracy of the logistic regression model.[9] In the 2x2 classification table, the observed values of the outcome and the predicted values are cross-classified. Denoting the cell counts a, b, c, and d, the correct classifications fall into cells a and d and the misclassifications into cells b and c; the counts are expected to be high in cells a and d and very low in cells b and c. High sensitivity and specificity, calculated as for a medical diagnostic test, indicate a better model fit. While the 2x2 classification table specifies a single cut-off point, by extending this table all possible cut-off points in the range 0-1 can be examined. Plotting the pairs of sensitivity and one minus specificity on a scatter plot yields the Receiver Operating Characteristic (ROC) curve. The area under this curve (AUC) provides an overall measure of fit of the model.[10] The larger the AUC, the greater the predictive ability, which is an indicator of the success of the model.
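The classification-table and ROC steps can be sketched as follows. The six cases and their predicted probabilities below are hypothetical, chosen only to show how sensitivity, specificity, and the AUC are computed by sweeping every cut-off point:

```python
def classification_metrics(y_true, p_hat, cutoff=0.5):
    """Sensitivity and specificity from the 2x2 classification table:
    a = true positives, d = true negatives (b and c are the errors)."""
    a = sum(1 for y, p in zip(y_true, p_hat) if y == 1 and p >= cutoff)
    d = sum(1 for y, p in zip(y_true, p_hat) if y == 0 and p < cutoff)
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    return a / n_pos, d / n_neg

def roc_auc(y_true, p_hat):
    """AUC by sweeping all cut-off points in (0, 1) and applying the
    trapezoid rule to the (1 - specificity, sensitivity) pairs."""
    points = [(0.0, 0.0), (1.0, 1.0)]
    for c in sorted(set(p_hat)):
        sens, spec = classification_metrics(y_true, p_hat, cutoff=c)
        points.append((1.0 - spec, sens))
    points.sort()
    return sum((x2 - x1) * (y1 + y2) / 2.0
               for (x1, y1), (x2, y2) in zip(points, points[1:]))

# Hypothetical observed outcomes and predicted probabilities:
y = [1, 1, 1, 0, 0, 0]
p = [0.9, 0.8, 0.4, 0.6, 0.3, 0.2]
print(roc_auc(y, p))  # 8/9, approximately 0.889
```

An AUC near 1 indicates that the model almost always assigns higher probabilities to cases with the event than to cases without it, while 0.5 corresponds to no discrimination.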
In this study, factors affecting fast-track and early extubation were evaluated with the help of multivariate logistic regression analysis. In the established multivariate model, the outcome variable was fast-track or early extubation, and age, body weight, presence of a genetic syndrome, operative risk category, and procedure time were found to be statistically significant predictor variables. The analysis shows that fast-track and early extubation can be successfully applied with low reintubation rates in selected cases undergoing congenital heart surgery. Age over 30 days, less complex procedures (RACHS-1<4), absence of a genetic syndrome, shorter duration of cardiopulmonary bypass, and a lower vasoactive-inotropic score (<8) may predict fast-track and early extubation.
1) Kirkwood BR, Sterne JAC. Essential Medical Statistics.
Oxford: Blackwell Science Ltd.; 2003.
2) Hosmer DW, Lemeshow S. Applied Logistic Regression. 2nd
ed. New York: John Wiley & Sons, Inc.; 2000.
3) LaValley MP. Logistic regression. Circulation 2008;117:2395-9.
doi: 10.1161/CIRCULATIONAHA.106.682658.
4) Hosmer DW, Lemeshow S, Sturdivant RX. The multiple
logistic regression model. Applied Logistic Regression
1989;1:25-37.
5) Boateng EY, Abaye DA. A review of the logistic regression
model with emphasis on medical research. JDAIP 2019;7:190-
207. doi: 10.4236/jdaip.2019.74012.
6) Hosmer DW, Lemeshow S. Applied Logistic Regression.
New York: John Wiley & Sons, Inc.; 2013.
7) Hosmer DW, Jovanovic B, Lemeshow S. Best subsets
logistic regression. Biometrics 1989;45:1265-70. doi:
10.2307/2531779.
8) Reed P, Wu Y. Logistic regression for risk factor modelling
in stuttering research. J Fluency Disord 2013;38:88-101. doi:
10.1016/j.jfludis.2012.09.003.
9) Peng CYJ, So TSH. Logistic regression analysis and
reporting: A primer. Understanding Statistics 2002;1:31-70.
doi: 10.1207/S15328031US0101_04.