Comparison of Machine Learning Models for Predicting Lung Cancer Severity

Ninik Lestari, Erliyan Redy Susanto

Abstract


This study compares the performance of four machine learning algorithms, Random Forest (RF), Support Vector Machine (SVM), Logistic Regression (LR), and K-Nearest Neighbors (KNN), in predicting lung cancer severity from patient medical data. The dataset contains clinical information, and the target variable has three severity levels: low, medium, and high. Experiments used an 80:20 train-test split without feature scaling. RF achieved 100% accuracy, LR 99%, KNN 82%, and SVM 43%. The superior performance of RF can be attributed to its ensemble of decision trees, which mitigates overfitting on numerical features of moderate dimensionality, whereas the poor performance of SVM (kernel = RBF, C = 1.0, gamma = "scale") is attributed to the absence of feature scaling and hyperparameter tuning. Recall, precision, and F1-score further confirm the dominance of RF and LR. The study provides insight into the effectiveness of machine learning algorithms in lung cancer diagnosis and highlights the value of comparing multiple algorithms. Based on these findings, RF is recommended as the primary model and LR as a complementary control within clinical decision support systems, which could help physicians make earlier, more personalized treatment decisions and ultimately improve the prognosis of lung cancer patients.
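The abstract reports accuracy, precision, recall, and F1-score for a three-class target. As a minimal sketch of how these metrics are computed, the Python example below derives them from true and predicted labels. The label lists are hypothetical and are not data from the study, the function names are illustrative, and macro averaging is an assumption because the abstract does not state which averaging scheme was used.

```python
from collections import Counter

LABELS = ("low", "medium", "high")  # severity levels named in the abstract


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def per_class_metrics(y_true, y_pred, labels=LABELS):
    """Return {label: (precision, recall, f1)} using one-vs-rest counts."""
    pairs = Counter(zip(y_true, y_pred))  # (true, predicted) -> count
    metrics = {}
    for label in labels:
        tp = pairs[(label, label)]
        fp = sum(n for (t, p), n in pairs.items() if p == label and t != label)
        fn = sum(n for (t, p), n in pairs.items() if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        metrics[label] = (precision, recall, f1)
    return metrics


def macro_average(metrics):
    """Unweighted mean of precision, recall, and F1 across classes."""
    n = len(metrics)
    return tuple(sum(m[i] for m in metrics.values()) / n for i in range(3))


# Hypothetical labels for ten test patients; NOT results from the study.
y_true = ["low", "low", "low", "medium", "medium",
          "medium", "high", "high", "high", "high"]
y_pred = ["low", "low", "medium", "medium", "medium",
          "high", "high", "high", "high", "low"]

acc = accuracy(y_true, y_pred)
per_class = per_class_metrics(y_true, y_pred)
macro_p, macro_r, macro_f1 = macro_average(per_class)

print(f"accuracy={acc:.2f}")
for label, (p, r, f1) in per_class.items():
    print(f"{label:>6}: precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
print(f"macro: precision={macro_p:.2f} recall={macro_r:.2f} f1={macro_f1:.2f}")
```

Reporting per-class values alongside accuracy matters for a severity target, because a model can reach a reasonable overall accuracy while still misclassifying most patients in one severity level.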

Keywords


Lung Cancer Prediction, Machine Learning, Random Forest, Diagnosis

DOI: https://doi.org/10.32520/stmsi.v14i6.5258


Creative Commons License
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.