Implementation of the Random Forest Algorithm with Optuna Optimization in Lung Cancer Classification

Ahmad Ainul Yaqin, Mula Agung Barata, Nur Mahmudah

Abstract


Lung cancer remains one of the leading causes of death worldwide, and many patients are unaware of their condition until it is too late for effective treatment. Accurate prediction methods are therefore urgently needed for the early detection of lung cancer. This research uses the Random Forest algorithm, which is known for its strong performance in classifying medical data, and tunes its hyperparameters with Optuna. The resulting model achieves an accuracy of 98.6%, a promising result for early lung cancer detection. It also achieves a recall of 100% for the positive class and 97% for the negative class, indicating that the model is highly effective at identifying patients who actually have lung cancer. The model's AUC (Area Under the Curve) reaches 1, indicating that its predicted probabilities separate patients with and without lung cancer essentially perfectly. Note that this describes how well the model ranks cases; it does not mean that every individual prediction is correct, as the 98.6% accuracy shows. These results affirm the value of the Random Forest algorithm for building early detection systems for lung cancer, which could improve treatment success rates and reduce lung cancer mortality.

Keywords


lung cancer, random forest, optuna, hyperparameter optimization, classification



DOI: https://doi.org/10.32520/stmsi.v14i2.4877



This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.