Deep learning in distinguishing pulmonary nodules as benign and malignant

Muhammed Bilal Akıncı¹, Mesut Özgökçe¹, Murat Canayaz², Fatma Durmaz¹, Sercan Özkaçmaz¹, İlyas Dündar¹, Ensar Türko¹, Cemil Göya¹

¹Department of Radiology, Van Yüzüncü Yıl University Faculty of Medicine, Van, Türkiye
²Van Yüzüncü Yıl University, Computer Engineering, Van, Türkiye

DOI: 10.5606/tgkdc.dergisi.2024.26027

Abstract

Background: Given the high mortality of lung cancer, this study aimed to identify convolutional neural network models that distinguish benign from malignant pulmonary nodules with high accuracy and can thus support early diagnosis with diagnostic imaging.

Methods: Patients who underwent computed tomography in our clinic between January 2015 and December 2020 and were found to have lung nodules were retrospectively screened. The patients were divided into two groups: benign (n=68; 38 males, 30 females; mean age: 59±12.2 years; range, 27 to 81 years) and malignant (n=29; 19 males, 10 females; mean age: 65±10.4 years; range, 43 to 88 years). In addition, a control group (n=67; 38 males, 29 females; mean age: 56.9±14.1 years; range, 26 to 81 years) of healthy individuals with no pathology on their sections was formed. Deep neural networks were trained on 80% of the resulting three-class dataset and tested on the remaining 20%. After training, features were extracted from these networks and classified with machine learning algorithms. Performance was evaluated using confusion matrix analysis.

Results: After deep neural network training, the highest accuracy among the models used, 80%, was achieved with the AlexNet model. In the second stage, after feature extraction and classification, the highest accuracy, 93.5%, was achieved with the support vector machine classifier applied to the VGG19 model. In addition, use of the support vector machine classifier increased accuracy in all models.

Conclusion: Differentiation of benign and malignant lung nodules using deep learning models and feature extraction will provide important advantages for early diagnosis in radiology practice. The results obtained in our study support this view.

Introduction

Breast, prostate, and lung cancers are the most common cancers across all age groups and both sexes, both worldwide and in our country. Among these, lung cancer causes the most deaths in all age groups and in both sexes.[1] The five-year survival rate for lung cancer is low, between 10 and 20%.[2-4] Five-year survival is high in patients whose lung cancer is detected and resected at an early stage, and stage I cancers can even be cured.[2,5] Computed tomography (CT) is the most commonly used and most effective imaging method for detecting lung cancer.[4]

Lung lesions are divided into two main groups: malignant and benign lesions. About 70 to 80% of benign lesions are infectious granulomas, and 10% are hamartomas. Malignant lesions consist of primary lung cancer or metastases.[6]

Radiologists detect and characterize diseases through the qualitative features of medical images.[7] The qualitative features of nodular lesions detected in the lung include nodule size, location, shape and margin characteristics, the presence of calcification and fat, contrast enhancement, and growth rate.[8] Computer-aided diagnosis systems using convolutional neural networks (CNNs) excel at automatically recognizing complex patterns in imaging data and are highly effective in providing quantitative, rather than qualitative, assessments of radiographic features. In studies by Gonçalves et al.[4] and Kim et al.,[9] the diagnostic accuracy of radiologists increased when they used computer-aided diagnosis systems. In this respect, the use of computer-aided diagnosis systems in characterizing lung lesions will provide significant support for early and rapid diagnosis.[7]

Detecting and characterizing nodules with deep learning first requires feature extraction, the process of highlighting the important information in an image. Two methods are commonly used for feature extraction. The first is manual feature extraction with image processing methods; because each image must be processed separately, this takes a long time. The second, and more widely used, method is to use a CNN: the images are passed through the CNN layers, and features of each image are obtained automatically. Because it automates the work, this is among the most preferred methods in studies, but it requires investigating which deep learning models extract features best. In our study, we aimed to differentiate benign from malignant lung nodules with high accuracy by using CNN models with deep feature extraction, and we searched for the best model for the dataset we created.
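The CNN-based feature extraction described above can be illustrated with a minimal NumPy sketch: a synthetic image passes through one convolution, ReLU, and max-pooling stage, and each resulting feature map is reduced to a single number by global average pooling. The two Sobel edge kernels are hand-picked assumptions for illustration only; networks such as AlexNet or VGG19 instead learn many kernels across stacked layers.

```python
import numpy as np

def conv2d_valid(image: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """'Valid' 2D cross-correlation, the operation CNN convolution layers compute."""
    windows = np.lib.stride_tricks.sliding_window_view(image, kernel.shape)
    return np.einsum("ijkl,kl->ij", windows, kernel)

def max_pool2d(fmap: np.ndarray, size: int = 2) -> np.ndarray:
    """Non-overlapping max pooling; rows/columns that do not fill a window are dropped."""
    h = fmap.shape[0] // size * size
    w = fmap.shape[1] // size * size
    blocks = fmap[:h, :w].reshape(h // size, size, w // size, size)
    return blocks.max(axis=(1, 3))

def extract_features(image: np.ndarray, kernels: list[np.ndarray]) -> np.ndarray:
    """conv -> ReLU -> max-pool -> global average pool: one feature per kernel."""
    features = []
    for kernel in kernels:
        fmap = np.maximum(conv2d_valid(image, kernel), 0.0)  # ReLU
        fmap = max_pool2d(fmap)
        features.append(fmap.mean())  # global average pooling
    return np.array(features)

# Hypothetical hand-crafted kernels (a trained CNN would learn its own).
sobel_x = np.array([[-1.0, 0.0, 1.0], [-2.0, 0.0, 2.0], [-1.0, 0.0, 1.0]])
sobel_y = sobel_x.T

# Synthetic 64x64 "section" with a bright round blob standing in for a nodule.
yy, xx = np.mgrid[:64, :64]
image = ((yy - 32) ** 2 + (xx - 32) ** 2 < 10 ** 2).astype(float)

features = extract_features(image, [sobel_x, sobel_y])
print(features.shape)  # (2,)
```

A real network stacks many such stages and ends in fully connected layers, so its feature vector is far longer than the two values produced here.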

Methods

Thoracic CT scans obtained at the Department of Radiology, Van Yüzüncü Yıl University Faculty of Medicine between January 2015 and December 2020 were retrospectively reviewed, and patients with pulmonary nodules on these scans were evaluated. Of these, patients with a histopathological diagnosis or at least two years of follow-up were included in the study. Patients with nodules smaller than 5 mm, with nodules that could not be clearly distinguished because of image artifacts, or without radiological follow-up or histopathology were excluded. The included patients were then divided into benign and malignant groups according to their pathological, clinical, and radiological findings. A lesion was assigned to the malignant group if it was diagnosed as malignant histopathologically or evaluated as malignant clinicoradiologically. A lesion was assigned to the benign group if it was diagnosed as benign histopathologically or showed no increase in size and no change in nature during at least two years of clinicoradiological follow-up.
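The inclusion, exclusion, and grouping criteria above can be written as a single decision rule. The Python sketch below is one possible reading with hypothetical field names; two choices are assumptions not stated in the text: histopathology takes precedence over clinicoradiological assessment, and the two-year follow-up requirement also applies to lesions judged malignant clinicoradiologically.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class NoduleRecord:
    # All field names are hypothetical; they mirror the criteria in the text.
    size_mm: float
    obscured_by_artifact: bool
    histopathology: Optional[str]          # "benign", "malignant", or None
    followup_years: float                  # duration of radiological follow-up
    clinicoradiologically_malignant: bool
    stable_on_followup: bool               # no growth and no change in nature

def assign_group(rec: NoduleRecord) -> Optional[str]:
    """Return "benign", "malignant", or None (excluded) under the stated criteria."""
    # Exclusion: nodules < 5 mm or not clearly distinguishable because of artifacts.
    if rec.size_mm < 5 or rec.obscured_by_artifact:
        return None
    # Inclusion: histopathological diagnosis or at least two years of follow-up.
    if rec.histopathology is None and rec.followup_years < 2:
        return None
    # Assumption: histopathology, when available, decides the group.
    if rec.histopathology in ("benign", "malignant"):
        return rec.histopathology
    if rec.clinicoradiologically_malignant:
        return "malignant"
    # Follow-up of >= 2 years is guaranteed by the inclusion check above.
    if rec.stable_on_followup:
        return "benign"
    return None  # included, but fits neither group definition

print(assign_group(NoduleRecord(8.0, False, None, 3.0, False, True)))  # benign
```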

Computed tomography images were obtained with multislice CT scanners with 128 detectors (Somatom Definition AS+128; Siemens AG, München, Germany) and 16 detectors (Somatom Emotion 16-slice; Siemens AG, München, Germany). Sections were acquired from the distal neck to the upper abdomen with the patients supine and holding their breath. Axial images with a slice thickness of 3 mm, which had been transferred to the system after the original imaging procedures, were reviewed for eligibility by two radiologists with four and 10 years of experience, respectively, on the high-resolution grayscale medical monitor used for routine CT examinations. Cases were then classified as malignant or benign according to each patient's histopathological or clinicoradiological data.

In the malignant group, 199 images were created from the follow-up tomography images of 29 patients (19 males, 10 females; mean age: 65±10.4 years; range, 43 to 88 years), using all axial sections containing the pathology. In the benign group, 202 images were created from the follow-up tomography images of 68 patients (38 males, 30 females; mean age: 59±12.2 years; range, 27 to 81 years), using all axial sections in which the nodule was present (Figure 1). Because these images, created for training the deep learning algorithms, all contained malignant or benign lesions, training with normal sections was also required. We therefore created a control group without lung pathology, consisting of 67 patients (38 males, 29 females; mean age: 56.9±14.1 years; range, 26 to 81 years). To obtain a homogeneous distribution of normal sections, patients from different age groups were selected, and sections were obtained from the upper, middle, and lower parts of the lung in each patient. In this way, 343 images were created from the 67 patients in the control group (Figure 2).

Figure 1. Section sample of a malignant lesion on the left and a benign lesion on the right.

Figure 2. Normal section samples obtained from the upper, middle, and lower sections of the lung.
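The control-group sampling of upper, middle, and lower lung sections could be approximated as follows. This is a hypothetical helper, not the authors' procedure: it assumes the lung spans `n_slices` consecutive axial images and picks evenly spaced slices from each third. The text does not say how many slices were taken per region or how they were chosen (343 images from 67 patients is about five per patient).

```python
def regional_slice_indices(n_slices: int, per_region: int = 1) -> list[int]:
    """Pick evenly spaced 0-based axial slice indices from the upper, middle,
    and lower thirds of an n_slices-long lung volume (hypothetical helper)."""
    if n_slices < 3 * per_region:
        raise ValueError("not enough slices for the requested sampling")
    third = n_slices / 3
    indices = []
    for region in range(3):  # 0 = upper, 1 = middle, 2 = lower
        start = region * third
        for k in range(per_region):
            # Centre of the k-th equal sub-interval within this third.
            indices.append(int(start + (k + 0.5) * third / per_region))
    return indices

print(regional_slice_indices(90))  # [15, 45, 75]
```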

Artificial intelligence training

The images obtained from the archive review were grouped into three classes (normal, benign, and malignant), and training was then started in the MATLAB (Matrix Laboratory; MathWorks Inc., Natick, MA, USA) environment with the AlexNet, GoogleNet, VGG19, and ResNet models. The images from the three groups were divided into 80% training and 20% test data. The networks were trained on the training data, and each trained network was then evaluated on the test data. The accuracy obtained for each class was determined, and each network trained with the dataset was saved as a model (Figure 3).
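The 80/20 division described above can be sketched in plain Python. The text does not say whether the split was stratified, randomized with a fixed seed, or done per image or per patient; this sketch assumes a per-image split stratified by class with a fixed seed, and it uses the image counts reported for each group (199 malignant, 202 benign, 343 normal). The image IDs are placeholders.

```python
import random

def stratified_split(items_by_class: dict[str, list], train_frac: float = 0.8, seed: int = 0):
    """Shuffle each class separately and split it into train/test portions."""
    rng = random.Random(seed)
    train, test = [], []
    for label, items in items_by_class.items():
        items = list(items)
        rng.shuffle(items)
        n_train = round(train_frac * len(items))
        train += [(x, label) for x in items[:n_train]]
        test += [(x, label) for x in items[n_train:]]
    return train, test

# Image counts reported in the text; the image IDs are placeholders.
counts = {"malignant": 199, "benign": 202, "normal": 343}
dataset = {label: [f"{label}_{i:03d}" for i in range(n)] for label, n in counts.items()}
train, test = stratified_split(dataset)
print(len(train), len(test))  # 595 149
```

With these counts, the split yields 595 training and 149 test images. Because several images come from the same patient, grouping by patient (so that no patient contributes images to both subsets) is a common alternative; the text does not state which approach was used.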

Figure 3. Working flowchart.

In the next step, feature extraction was performed: each image was passed through each trained network, yielding 1,000 features per image. These features were then classified with two machine learning algorithms, the support vector machine (SVM) and k-nearest neighbor (KNN) classifiers.
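The classification stage described above can be sketched with scikit-learn. This is a minimal sketch under stated assumptions, not the authors' pipeline: the text does not report the SVM kernel, the value of k, any feature scaling, or which software ran the classifiers, so the RBF kernel, k = 5, and standardization below are assumptions, and the 1,000-dimensional feature vectors are synthetic Gaussian placeholders standing in for real CNN features.

```python
import numpy as np
from sklearn.metrics import accuracy_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
N_FEATURES = 1000  # one 1,000-dimensional feature vector per image, as in the text
CLASSES = ["normal", "benign", "malignant"]

def synthetic_features(n_per_class: int):
    """Placeholder for CNN features: one Gaussian cluster per class (not real data)."""
    X, y = [], []
    for idx, label in enumerate(CLASSES):
        X.append(rng.normal(loc=float(idx), scale=1.0, size=(n_per_class, N_FEATURES)))
        y += [label] * n_per_class
    return np.vstack(X), np.array(y)

X_train, y_train = synthetic_features(80)
X_test, y_test = synthetic_features(20)

# Assumed settings: standardized features, RBF-kernel SVM, and k = 5 neighbours.
classifiers = {
    "SVM": make_pipeline(StandardScaler(), SVC(kernel="rbf")),
    "KNN": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
}
results = {}
for name, clf in classifiers.items():
    clf.fit(X_train, y_train)
    results[name] = accuracy_score(y_test, clf.predict(X_test))
    print(f"{name}: accuracy = {results[name]:.3f}")
```

With real features, `X_train` and `X_test` would hold the extracted CNN outputs for the training and test images; the synthetic clusters here are deliberately easy to separate and say nothing about the study's results.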

Statistical analysis

Statistical analyses were performed in the Python programming language using its scikit-learn (sklearn) library, and all statistical metrics were computed with the library's built-in functions.
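As an illustration of the kind of scikit-learn functions referred to here, the sketch below computes a confusion matrix, overall accuracy, per-class precision, recall, and F1, and per-class specificity for a hypothetical six-image, three-class example (not data from this study). The text does not say which functions were used; one-vs-rest specificity is derived here from `multilabel_confusion_matrix`.

```python
from sklearn.metrics import (accuracy_score, classification_report,
                             confusion_matrix, multilabel_confusion_matrix)

labels = ["normal", "benign", "malignant"]
# Hypothetical predictions for six test images (not data from this study).
y_true = ["normal", "normal", "benign", "benign", "malignant", "malignant"]
y_pred = ["normal", "normal", "benign", "malignant", "malignant", "malignant"]

# Rows are true classes, columns are predicted classes, in the order of `labels`.
cm = confusion_matrix(y_true, y_pred, labels=labels)
print(cm)
print(f"accuracy = {accuracy_score(y_true, y_pred):.3f}")
print(classification_report(y_true, y_pred, labels=labels, digits=3))

# One 2x2 matrix per class, laid out as [[TN, FP], [FN, TP]].
mcm = multilabel_confusion_matrix(y_true, y_pred, labels=labels)
specificity = {lab: float(m[0, 0] / (m[0, 0] + m[0, 1])) for lab, m in zip(labels, mcm)}
print(specificity)
```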


Results

The parenchyma window was used for all sections (Table 1). Before feature extraction and classification with SVM and KNN, the accuracies obtained from neural network training alone were 78.33% for VGG19, 80% for AlexNet, 75% for ResNet, and 73.33% for GoogleNet.
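The text states only that a parenchyma (lung) window was used. The sketch below shows how such a window is typically applied to CT data with NumPy, assuming a window level of -600 HU and a width of 1,500 HU (common lung-window settings, not values reported in the study) and a default rescale intercept of -1024; in practice, the slope and intercept are read from the DICOM Rescale Slope and Rescale Intercept attributes of each image.

```python
import numpy as np

def to_hounsfield(raw: np.ndarray, slope: float = 1.0, intercept: float = -1024.0) -> np.ndarray:
    """Convert stored CT pixel values to Hounsfield units: HU = raw * slope + intercept.
    The default slope/intercept are assumptions; real values come from the DICOM header."""
    return raw.astype(np.float64) * slope + intercept

def apply_window(hu: np.ndarray, level: float = -600.0, width: float = 1500.0) -> np.ndarray:
    """Map HU to 8-bit grayscale for a window level/width; values outside are clipped."""
    low, high = level - width / 2.0, level + width / 2.0
    scaled = (np.clip(hu, low, high) - low) / (high - low)
    return np.round(scaled * 255.0).astype(np.uint8)

hu = to_hounsfield(np.array([0, 424, 1174, 2024]))  # HU values: -1024, -600, 150, 1000
print(apply_window(hu))  # gray levels: 55, 128, 255, 255
```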

Table 1. Dataset information

After feature extraction, the SVM classifier yielded accuracies of 93.5% for VGG19, 86.7% for AlexNet, 88.5% for ResNet, and 78% for GoogleNet, and the KNN classifier yielded 92.2% for VGG19, 78.9% for AlexNet, 74.6% for ResNet, and 73% for GoogleNet (Table 2).

Table 2. Comparison of accuracy before and after using the classifiers

In the first training stage, AlexNet was the best model, with 80% accuracy. After features were extracted with the models, the SVM and KNN machine learning classifiers were applied; with these classifiers, VGG19 achieved the best result, with an accuracy of 93.5%. Retraining was required whenever new patient images were added to the dataset, and these rates are expected to increase as new images are added.
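To make the before/after comparison explicit, the short Python sketch below recomputes the percentage-point change for each model from the accuracies reported in the text. Only the reported numbers are used; the dictionary layout and the helper name `change_in_points` are illustrative.

```python
# Accuracies (%) as reported in the text: CNN alone, then deep features + SVM / KNN.
cnn_only = {"VGG19": 78.33, "AlexNet": 80.0, "ResNet": 75.0, "GoogleNet": 73.33}
with_svm = {"VGG19": 93.5, "AlexNet": 86.7, "ResNet": 88.5, "GoogleNet": 78.0}
with_knn = {"VGG19": 92.2, "AlexNet": 78.9, "ResNet": 74.6, "GoogleNet": 73.0}

def change_in_points(after: dict, before: dict) -> dict:
    """Percentage-point change per model, rounded to two decimals."""
    return {m: round(after[m] - before[m], 2) for m in before}

svm_gain = change_in_points(with_svm, cnn_only)
knn_gain = change_in_points(with_knn, cnn_only)

# Best (model, classifier, accuracy) combination across both classifiers.
best = max(
    ((m, clf, acc) for clf, table in (("SVM", with_svm), ("KNN", with_knn))
     for m, acc in table.items()),
    key=lambda t: t[2],
)
print(svm_gain)
print(knn_gain)
print(best)
```

Computed this way, SVM improves every model (by 4.67 to 15.17 points), consistent with the abstract, whereas KNN improves only VGG19 (by 13.87 points) and slightly lowers accuracy for AlexNet, ResNet, and GoogleNet.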

The equations for the metrics used in the statistical analysis are given in Figure 4, and the statistical analysis results for SVM and KNN are given in Tables 3 and 4, respectively.

Figure 4. Equations for metric values used for statistical analysis.
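Figure 4 itself is not reproduced in this text. For reference, the standard confusion-matrix definitions of commonly reported metrics are given below; which of them Tables 3 and 4 actually report is an assumption. Here $C_{ij}$ is the number of test images of true class $i$ predicted as class $j$, and $TP_k$, $TN_k$, $FP_k$, and $FN_k$ are the one-vs-rest counts for class $k$ (normal, benign, or malignant).

```latex
\begin{aligned}
\text{Overall accuracy} &= \frac{\sum_{i} C_{ii}}{\sum_{i}\sum_{j} C_{ij}} \\
\text{Accuracy}_k &= \frac{TP_k + TN_k}{TP_k + TN_k + FP_k + FN_k} \\
\text{Precision}_k &= \frac{TP_k}{TP_k + FP_k} \\
\text{Recall}_k \;(\text{sensitivity}) &= \frac{TP_k}{TP_k + FN_k} \\
\text{Specificity}_k &= \frac{TN_k}{TN_k + FP_k} \\
F_{1,k} &= \frac{2\,\text{Precision}_k\,\text{Recall}_k}{\text{Precision}_k + \text{Recall}_k}
\end{aligned}
```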

Table 3. Statistical analysis results after using SVM classifier

Table 4. Statistical analysis results after using KNN classifier


Discussion

Precise and accurate detection and evaluation of pulmonary nodules is one of the best approaches to reducing deaths from lung cancer. With the development of artificial intelligence methods, pulmonary nodules have been classified as malignant or benign. Conventional machine learning methods were used first, but they were time-consuming and had limited discriminative power, which was an important limitation of those studies.[10,11] Later, CNN architectures that can automatically extract high-level features from images were developed.

In our retrospective study, we aimed to classify the nodules observed on lung CT images obtained in our clinic as malignant or benign by using the automatic image analysis capability of CNNs. Similar to our study, Sun et al.[12] used LeNet to classify lung nodules, Hussein et al.[13] estimated nodule malignancy with an AlexNet-based model, and Nibali et al.[14] used ResNet for benign/malignant lung nodule classification.[15] In our study, we used the AlexNet, GoogleNet, ResNet, and VGG19 CNN models. These models were chosen because they have been successful in image analysis and classification competitions.[16,17]

After training the CNN models with our dataset, we extracted features in the next step and classified them with the KNN and SVM machine learning classifiers. The highest accuracy, 93.5%, was achieved with the combination of VGG19 and the SVM classifier.

In 2016, Shen et al.[18] used a multi-crop pooling technique in a network they called the multi-crop CNN. Datasets of three different sizes were used, and the highest accuracy, 87%, was achieved with the largest dataset. This study provides important information on how dataset size affects accuracy.

Nibali et al.[14] reached an accuracy of 89.9% with the ResNet model in 2017, and Dai et al.[15] achieved an accuracy of 91.47% with the artificial neural network model they developed in 2018. Both studies used the publicly available lung CT image database known as the Lung Image Database Consortium image collection (LIDC-IDRI), and in both, nodules were classified as malignant or benign based on radiologists' opinions. The large datasets used by Nibali et al.[14] and Dai et al.[15] may have contributed to their high accuracy. In those studies, radiologists rated nodules as having a low or high probability of malignancy, which differs from our approach: as stated above, we used objective criteria (histopathological or clinicoradiological findings) to classify nodules as malignant or benign. In addition, when creating the groups, we paid attention to the similarity of mean ages and of the distribution of sample numbers, and when forming the control group, we selected samples from different lung sections. Our dataset was therefore more homogeneous before training began, which may have contributed to the accuracy we obtained despite using fewer samples than the studies mentioned above.

Like the researchers mentioned above, Da Nóbrega et al.[10] used the LIDC-IDRI database in 2018, but they evaluated many different architectures and, as in our study, applied a classifier after feature extraction. In this comprehensive study, the highest accuracy, close to 90%, was obtained by combining the ResNet50 CNN with an SVM classifier using a radial basis function (RBF) kernel. That study also used the ResNet and VGG19 architectures with KNN and SVM classifiers, as in our study, and reported accuracies of 86.85% for ResNet-KNN, 86.98% for ResNet-SVM, 86% for VGG19-KNN, and 82.03% for VGG19-SVM. In our study, the corresponding accuracies were 74.6% for ResNet-KNN, 88.5% for ResNet-SVM, 92.2% for VGG19-KNN, and 93.5% for VGG19-SVM. The ResNet-KNN combination was less successful in our study than in that of Da Nóbrega et al.,[10] but for the other combinations, our accuracies were higher. Like ours, that study shows that the accuracy obtained by training CNN architectures increases when classifiers are applied after deep feature extraction.

The main limitation of our study is the small number of patients. The retrospective design is another limitation. In addition, thoracic CT sections were not always obtained with the same protocol, and images were acquired with different CT devices.

In conclusion, the results show that deep learning techniques can distinguish benign from malignant nodules through image analysis and are promising in this respect. For future studies, we suggest training CNN models with larger, more homogeneous samples and suitable classifier combinations to increase accuracy.

Ethics Committee Approval: The study protocol was approved by the Yüzüncü Yıl University Non-Interventional Clinical Research Ethics Committee (date: 11.12.2020, no: 2020/10-01). The study was conducted in accordance with the principles of the Declaration of Helsinki.

Patient Consent for Publication: Written informed consent was obtained from each patient.

Data Sharing Statement: The data that support the findings of this study are available from the corresponding author upon reasonable request.

Author Contributions: Idea/concept: M.B.A., M.Ö., M.C., C.G.; Control/supervision: M.B.A., M.Ö., M.C., F.D., C.G.; Data collection: M.Ö., F.D., İ.D., E.T., S.Ö.; Data processing, statistical analysis: M.B.A., M.C.; Interpretation: M.B.A., F.D., M.C.; Literature review: M.B.A., S.Ö., İ.D., E.T.; Writing: M.B.A., M.Ö., M.C., F.D.; Editing: M.B.A., F.D., S.Ö., E.T., İ.D.

Conflict of Interest: The authors declared no conflicts of interest with respect to the authorship and/or publication of this article.

Funding: The authors received no financial support for the research and/or authorship of this article.
