Methods: Patients who underwent tomography in our clinic and were found to have lung nodules between January 2015 and December 2020 were retrospectively screened. The patients were divided into two groups: benign (n=68; 38 males, 30 females; mean age: 59±12.2 years; range, 27 to 81 years) and malignant (n=29; 19 males, 10 females; mean age: 65±10.4 years; range, 43 to 88 years). In addition, a control group (n=67; 38 males, 29 females; mean age: 56.9±14.1 years; range, 26 to 81 years) consisting of healthy patients with no pathology in their sections was formed. Deep neural networks were trained with 80% of the three-class dataset we created and tested with the remaining 20%. After training, features were extracted from these networks. The features extracted from the dataset were classified by machine learning algorithms. Performance results were obtained using confusion matrix analysis.
Results: After training the deep neural networks, the highest accuracy rate of 80% was achieved with the AlexNet model among the models used. In the second-stage results, obtained after feature extraction and classification, the highest accuracy rate of 93.5% was achieved with the support vector machine classifier applied to the VGG19 model. In addition, increases in accuracy were noted in all models with the use of the support vector machine classifier.
Conclusion: Differentiation of benign and malignant lung nodules using deep learning models and feature extraction will provide important advantages for early diagnosis in radiology practice. The results obtained in our study support this view.
Lung lesions are divided into two main groups: malignant and benign lesions. About 70 to 80% of benign lesions are infectious granulomas, and 10% are hamartomas. Malignant lesions consist of primary lung cancer or metastases.[6]
Radiologists detect and characterize diseases through the qualitative features of medical images.[7] The qualitative features of nodular lesions detected in the lung include the size of the nodule, its location, shape and border features, whether it contains calcification and fat, its contrast status, and growth rate.[8] Computer-aided diagnostic systems using convolutional neural networks (CNNs) are excellent at automatically recognizing complex patterns in imaging data and are highly effective in providing quantitative rather than qualitative assessments of radiographic features. In studies by Gonçalves et al.[4] and Kim et al.,[9] there was an increase in the diagnostic accuracy of radiologists using computer-aided diagnosis systems. In this respect, the use of computer-aided diagnosis systems in the characterization of lung lesions will provide significant support for early and rapid diagnosis.[7]
Feature extraction must first be performed to detect and characterize nodules using deep learning. Feature extraction is the process of highlighting important points in the image. There are two commonly used methods for feature extraction. The first is manual feature extraction with image processing methods; because each image must be processed separately, this takes a long time. The second, and most widely used, method is feature extraction with a CNN in deep learning. Here, the images are passed through the CNN layers, and certain features of each image are obtained automatically. Since it automates the work, it is among the most preferred methods in studies. With this method, it is necessary to investigate which deep learning models extract features best. In our study, we aimed to differentiate lung nodules as benign and malignant with high accuracy by using CNN models with deep feature extraction, and we investigated the best model for the dataset we created.
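As a concrete illustration of CNN-based deep feature extraction, the sketch below passes a single CT slice through a pretrained VGG19 network and takes the 1,000-dimensional output of its final layer as a feature vector. This is only a minimal Python/torchvision sketch under our assumptions; the study itself was carried out in MATLAB, and the file name and preprocessing parameters shown here are placeholders.

```python
# Minimal sketch of deep feature extraction with a pretrained CNN (VGG19).
# Assumptions: PyTorch/torchvision instead of the MATLAB environment used in
# the study; the image path is a hypothetical placeholder.
import torch
from torchvision import models, transforms
from PIL import Image

preprocess = transforms.Compose([
    transforms.Resize((224, 224)),                     # VGG19 expects 224x224 RGB input
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],   # ImageNet normalization
                         std=[0.229, 0.224, 0.225]),
])

model = models.vgg19(weights=models.VGG19_Weights.IMAGENET1K_V1)
model.eval()

def extract_features(image_path: str) -> torch.Tensor:
    """Return the 1,000-dimensional output of the network as a feature vector."""
    img = Image.open(image_path).convert("RGB")        # CT slices converted to 3 channels
    x = preprocess(img).unsqueeze(0)                   # add a batch dimension
    with torch.no_grad():
        features = model(x)                            # shape: (1, 1000)
    return features.squeeze(0)

# features = extract_features("ct_slice_001.png")      # hypothetical file name
```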
Computed tomography images were obtained with multislice tomography devices with 128 detectors (Somatom Definition AS+128; Siemens AG, München, Germany) and 16 detectors (Somatom Emotion 16-slice; Siemens AG, München, Germany). Sections were obtained from the distal neck to the upper abdomen with the patients in the supine position and holding their breath. Axial images with a section thickness of 3 mm, transferred to the system after the previous imaging procedures, were evaluated for suitability for the study by two radiologists with four and 10 years of experience, respectively, on the high-resolution grayscale medical monitor used for routine CT examinations. Afterward, cases were classified as malignant or benign according to the histopathological or clinicoradiological data of the patient.
In the malignant group, 199 images were created using the follow-up tomography images of 29 patients (19 males, 10 females; mean age: 65±10.4 years; range, 43 to 88 years) and all pathological axial sections within these images. In the benign group, 202 images were created using follow-up tomography images of 68 patients (38 males, 30 females; mean age: 59±12.2 years; range, 27 to 81 years) and all axial sections in which the nodule was present (Figure 1). Since these images created for training deep learning algorithms contained malignant or benign lesions, training was also required with normal sections. For this reason, we created a control group without lung pathology. Sixty-seven patients (38 males, 29 females; mean age: 56.9±14.1 years; range, 26 to 81 years) were included in this control group. To homogenize the distribution while creating normal sections, patients from different age groups were selected. In addition, different sections were obtained from the upper, middle, and lower parts of the lung in each patient. In this way, 343 images were created from 67 patients included in the control group (Figure 2).
Figure 1. Section sample of a malignant lesion on the left and a benign lesion on the right.
Figure 2. Normal section samples obtained from the upper, middle, and lower sections of the lung.
Artificial intelligence training
The images obtained after archive scanning were collected in three classes (normal, benign, and malignant), and training was then started in the Matrix Laboratory (MATLAB; MathWorks Inc., Natick, MA, USA) environment with the AlexNet, GoogleNet, VGG19, and ResNet models. The images from the three groups we created were divided into 80% training and 20% test data. The networks were trained with the 80% training data, and the trained networks were then validated using the 20% test data. Afterward, the accuracy values obtained for each class were determined, and each network trained with the dataset was recorded as a model (Figure 3).
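For illustration, the sketch below reproduces this setup in Python: a three-class image folder is split 80/20, and a pretrained AlexNet is fine-tuned on the training portion. It is only a minimal sketch under our assumptions; the actual training was performed in MATLAB, and the folder layout, batch size, learning rate, and epoch count shown here are placeholders.

```python
# Minimal sketch of the 80/20 split and transfer-learning setup (illustrative only;
# the study used MATLAB). The "dataset/normal|benign|malignant" folder layout,
# batch size, learning rate, and epoch count are assumptions.
import torch
from torch import nn
from torch.utils.data import DataLoader, random_split
from torchvision import datasets, models, transforms

tfm = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.Grayscale(num_output_channels=3),   # CT slices are single-channel
    transforms.ToTensor(),
])

full_set = datasets.ImageFolder("dataset", transform=tfm)   # three class subfolders
n_train = int(0.8 * len(full_set))
train_set, test_set = random_split(full_set, [n_train, len(full_set) - n_train])

model = models.alexnet(weights=models.AlexNet_Weights.IMAGENET1K_V1)
model.classifier[6] = nn.Linear(4096, 3)           # replace the 1,000-class head with 3 classes

optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()
loader = DataLoader(train_set, batch_size=16, shuffle=True)

model.train()
for epoch in range(5):                              # epoch count is arbitrary here
    for images, labels in loader:
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
```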
In the next step, feature extraction was started, and 1,000 features were extracted from each network.
After this stage, each image was represented by 1,000 features after passing through the network. These features were classified using support vector machine (SVM) and k-nearest neighbor (KNN) classifiers, which are machine learning algorithms.
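A minimal scikit-learn sketch of this classification step is given below, assuming X is an (n_images, 1000) matrix of deep features and y holds the three class labels; the file names and hyperparameters (RBF kernel, k=5) are placeholders, since the exact settings are not specified here.

```python
# Minimal sketch of classifying the extracted deep features with SVM and KNN.
# X and y would come from the deep feature-extraction step; the file names and
# hyperparameters below are assumptions.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

X = np.load("deep_features.npy")        # hypothetical file: (n_images, 1000) feature matrix
y = np.load("labels.npy")               # hypothetical file: normal/benign/malignant labels

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

for name, clf in [("SVM", SVC(kernel="rbf")),
                  ("KNN", KNeighborsClassifier(n_neighbors=5))]:
    clf.fit(X_train, y_train)
    y_pred = clf.predict(X_test)
    print(f"{name} accuracy: {accuracy_score(y_test, y_pred):.3f}")
```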
Statistical analysis
The Python programming language and its scikit-learn (sklearn) library were used for the statistical analyses in our study. All statistical metrics were obtained using the built-in functions of this library.
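As an illustration, the confusion-matrix-based metrics can be obtained with the library's built-in functions as in the sketch below, assuming y_test and y_pred come from the classification step above; the class names are the three groups in our dataset.

```python
# Minimal sketch of the confusion-matrix evaluation with scikit-learn,
# assuming y_test and y_pred come from the classification step above.
from sklearn.metrics import confusion_matrix, classification_report, accuracy_score

cm = confusion_matrix(y_test, y_pred)        # rows: true classes, columns: predicted classes
print(cm)
print("Accuracy:", accuracy_score(y_test, y_pred))

# Per-class precision, recall (sensitivity), and F1 score
print(classification_report(y_test, y_pred,
                            target_names=["normal", "benign", "malignant"]))
```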
In the results obtained after feature extraction and classification, the accuracy rate with the SVM classifier was 93.5% for VGG19, 86.7% for AlexNet, 88.5% for ResNet, and 78% for GoogleNet. With the KNN classifier, the accuracy rate was 92.2% for VGG19, 78.9% for AlexNet, 74.6% for ResNet, and 73% for GoogleNet (Table 2).
Table 2. Comparison of accuracy rates before and after using the classifiers
As a result of the first training, AlexNet was the best model, with 80% accuracy. After feature extraction through the models, the statistical classifiers SVM and KNN were used. According to the results obtained from these classifiers, the VGG19 model achieved the best result, with an accuracy rate of 93.5%. Retraining is required whenever new patient images are added to the dataset, and these rates are expected to increase with additional images.
The metric values used for statistical analysis are given in Figure 4. Statistical analysis results for SVM and KNN are given in Tables 3 and 4.
Figure 4. Equations for metric values used for statistical analysis.
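For reference, assuming the standard confusion-matrix definitions (which we take to correspond to the equations in Figure 4), the metrics are:

```latex
\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}, \qquad
\mathrm{Sensitivity} = \frac{TP}{TP + FN}, \qquad
\mathrm{Specificity} = \frac{TN}{TN + FP},

\mathrm{Precision} = \frac{TP}{TP + FP}, \qquad
F_1 = \frac{2 \times \mathrm{Precision} \times \mathrm{Sensitivity}}{\mathrm{Precision} + \mathrm{Sensitivity}}
```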
Table 3. Statistical analysis results after using SVM classifier
Table 4. Statistical analysis results after using KNN classifier
In our retrospective study, we aimed to differentiate the nodules observed on lung tomography images obtained in our clinic as malignant or benign by using the automatic image analysis capability of CNNs. Similar to our study, Sun et al.[12] used LeNet to classify lung nodules, Hussein et al.[13] estimated the malignancy of nodules based on AlexNet, and Nibali et al.[14] used ResNet for benign/malignant lung nodule classification.[15] In our study, we used the AlexNet, GoogleNet, ResNet, and VGG19 models as CNNs. These models were chosen because they have achieved success in competitions based on image analysis and classification.[16,17]
After training of CNN models with our dataset, we performed feature extraction in the next step and classified the features obtained with KNN and SVM classifiers, which are machine learning algorithms. The highest accuracy rate of 93.5% was achieved with the combination of VGG19 and SVM classifier.
Shen et al.[18] used a multi-crop pooling technique they called multi-crop CNN in their study in 2016. Three datasets of different sizes were used, and the highest accuracy rate, 87%, was achieved with the largest dataset. This study provides important information on the effect of dataset size on accuracy rates.
Nibali et al.[14] reached an accuracy rate of 89.9% in their study with the ResNet model in 2017. Dai et al.[15] achieved an accuracy rate of 91.47% with the artificial neural network model they developed in 2018. In these two studies, the publicly available lung tomography image database known as the Lung Image Database Consortium image collection (LIDC-IDRI) was used, and the classification of nodules as malignant or benign in both studies was based on the opinions of radiologists. The large datasets used in the studies by Nibali et al.[14] and Dai et al.[15] may have contributed to their higher accuracy rates. The classification by radiologists in these studies as low or high probability of malignancy differs from our study, in which, as stated above, objective criteria were used to classify lesions as malignant or benign. In addition, while creating the groups, we ensured that the mean ages and sample distributions were similar, and while forming the control group, we selected samples from different lung sections. Thus, our dataset was more homogeneous before training began. This may explain the accuracy rates we obtained despite using fewer samples than the studies mentioned above.
Da Nóbrega et al.[10] also used the LIDC-IDRI database in 2018, like the aforementioned researchers, but they used many different architectures and applied a classifier after feature extraction, as in our study. In that comprehensive study, the highest accuracy rate, close to 90%, was obtained with the combination of the support vector machine with a radial basis function (SVM-RBF) classifier and the ResNet50 CNN. Their study used the ResNet and VGG19 architectures and the KNN and SVM classifiers, similar to our study. In the study by Da Nóbrega et al.,[10] the accuracy rates for these combinations were 86.85% for ResNet-KNN, 86.98% for ResNet-SVM, 86% for VGG19-KNN, and 82.03% for VGG19-SVM. In our study, accuracy was 74.6% for ResNet-KNN, 88.5% for ResNet-SVM, 92.2% for VGG19-KNN, and 93.5% for VGG19-SVM. The ResNet-KNN combination was not as successful in our study as it was for Da Nóbrega et al.[10] However, when the other similar combinations were compared, the accuracy rates we obtained were higher. This study shows that, similar to our study, the accuracy rates obtained by training CNN architectures increase with the use of classifiers after deep feature extraction.
The main limitation of our study is the small number of patients. The retrospective nature of the study is another limitation. Additionally, thorax CT sections could not always be obtained with the same protocol, and images were taken using different CT devices.
In conclusion, the results show that deep learning techniques can distinguish benign nodules from malignant ones through image analysis and are promising in this respect. We suggest that future studies train CNN models with larger, homogenized datasets and suitable classifier combinations to increase accuracy rates.
Ethics Committee Approval: The study protocol was approved by the Yüzüncü Yıl University Non-Interventional Clinical Research Ethics Committee (date: 11.12.2020, no: 2020/10-01). The study was conducted in accordance with the principles of the Declaration of Helsinki.
Patient Consent for Publication: A written informed consent was obtained from each patient.
Data Sharing Statement: The data that support the findings of this study are available from the corresponding author upon reasonable request.
Author Contributions: Idea/concept: M.B.A., M.Ö., M.C., C.G.; Control/supervision: M.B.A., M.Ö., M.C., F.D., C.G.; Data collection: M.Ö., F.D., İ.D., E.T., S.Ö.; Data processing, statistical analysis: M.B.A., M.C.; Interpretation: M.B.A., F.D., M.C.; Literature review: M.B.A., S.Ö., İ.D., E.T.; Writing: M.B.A., M.Ö., M.C., F.D.; Editing: M.B.A., F.D., S.Ö., E.T., İ.D.
Conflict of Interest: The authors declared no conflicts of interest with respect to the authorship and/or publication of this article.
Funding: The authors received no financial support for the research and/or authorship of this article.
1) Thandra KC, Barsouk A, Saginala K, Aluru JS, Barsouk A. Epidemiology of lung cancer. Contemp Oncol (Pozn) 2021;25:45-52. doi: 10.5114/wo.2021.103829.
2) Collins LG, Haines C, Perkel R, Enck RE. Lung cancer: Diagnosis and management. Am Fam Physician 2007;75:56-63.
3) Tanoue LT, Tanner NT, Gould MK, Silvestri GA. Lung cancer screening. Am J Respir Crit Care Med 2015;191:19-33. doi: 10.1164/rccm.201410-1777CI.
4) Gonçalves L, Novo J, Cunha A, Campilho A. Learning lung nodule malignancy likelihood from radiologist annotations or diagnosis data. J Med Biol Eng 2018;38:424-42. doi: 10.1007/s40846-017-0317-2.
5) Kartaloğlu Z. Soliter pulmoner nodüle yaklaşım [Approach to the solitary pulmonary nodule]. Turkish J Thorac Cardiovasc Surg 2008;16:274-83.
6) Ost D, Fein AM, Feinsilver SH. Clinical practice. The solitary pulmonary nodule. N Engl J Med 2003;348:2535-42. doi: 10.1056/NEJMcp012290.
7) Hosny A, Parmar C, Quackenbush J, Schwartz LH, Aerts HJWL. Artificial intelligence in radiology. Nat Rev Cancer 2018;18:500-10. doi: 10.1038/s41568-018-0016-5.
8) Soubani AO. The evaluation and management of the solitary pulmonary nodule. Postgrad Med J 2008;84:459-66. doi: 10.1136/pgmj.2007.063545.
9) Kim RY, Oke JL, Pickup LC, Munden RF, Dotson TL, Bellinger CR, et al. Artificial intelligence tool for assessment of indeterminate pulmonary nodules detected with CT. Radiology 2022;304:683-91. doi: 10.1148/radiol.212182.
10) da Nóbrega RVM, Rebouças Filho PP, Rodrigues MB, da Silva SPP, Dourado Júnior CMJM, de Albuquerque VHC. Lung nodule malignancy classification in chest computed tomography images using transfer learning and convolutional neural networks. Neural Comput Applic 2020;32:11065-82. doi: 10.1007/s00521-018-3895-1.
11) Monkam P, Qi S, Ma H, Gao W, Yao Y, Qian W. Detection and classification of pulmonary nodules using convolutional neural networks: A survey. IEEE Access 2019;7:78075-91.
12) Sun W, Zheng B, Qian W. Computer aided lung cancer diagnosis with deep learning algorithms. Presented at SPIE Medical Imaging: Computer-Aided Diagnosis; 2016 Feb 27-Mar 3; San Diego, CA. 2016;9785:241-8. doi: 10.1117/12.2216307.
13) Hussein S, Gillies R, Cao K, Song Q, Bagci U. TumorNet: Lung nodule characterization using multi-view convolutional neural network with Gaussian process. 2017:1007-10. doi: 10.1109/ISBI.2017.7950686.
14) Nibali A, He Z, Wollersheim D. Pulmonary nodule classification with deep residual networks. Int J Comput Assist Radiol Surg 2017;12:1799-808. doi: 10.1007/s11548-017-1605-6.
15) Dai Y, Yan S, Zheng B, Song C. Incorporating automatically learned pulmonary nodule attributes into a convolutional neural network to improve accuracy of benign-malignant nodule classification. Phys Med Biol 2018;63:245004. doi: 10.1088/1361-6560/aaf09f.
16) Krizhevsky A, Sutskever I, Hinton G. ImageNet classification with deep convolutional neural networks. Adv Neural Inf Process Syst 2012;25. doi: 10.1145/3065386.