Hyperparameter Optimization of Ensemble Learning for Heart Disease Prediction using Patient Data

Nikko Listio Wicaksono, Kusrini Kusrini

Abstract


This study evaluates the impact of hyperparameter optimization on the performance of four machine learning algorithms—Extra Trees, XGBoost, Random Forest, and AdaBoost—in heart disease prediction. The results show that hyperparameter tuning significantly improves model performance for three out of the four algorithms, with varying effects across models. Extra Trees demonstrates the most consistent improvement, achieving the highest Area Under the Curve (AUC) of 0.9107 and a recall of 80.93%, which is particularly crucial in medical contexts for accurately identifying disease cases. XGBoost exhibits the largest increase in accuracy, rising from 78.11% to 81.49%, while Random Forest shows improvements in both recall and F1-score. In contrast, AdaBoost experiences a slight decline in performance, suggesting that the model was already near optimal prior to tuning. Overall, Extra Trees with hyperparameter optimization emerges as the best-performing algorithm for heart disease prediction, offering high reliability in identifying at-risk patients.
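The tuning workflow the abstract describes can be sketched with scikit-learn. This is a minimal illustration, not the authors' published pipeline: the synthetic dataset, the hyperparameter ranges, and the choice of `RandomizedSearchCV` with AUC as the selection metric are all assumptions made for the example.

```python
# Hedged sketch of hyperparameter optimization for Extra Trees on a
# binary classification task, selected by cross-validated ROC AUC.
# Dataset and search space are illustrative assumptions.
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.metrics import recall_score, roc_auc_score
from sklearn.model_selection import RandomizedSearchCV, train_test_split

# Synthetic stand-in for a heart-disease dataset (binary target).
X, y = make_classification(n_samples=1000, n_features=13,
                           n_informative=8, random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

# Candidate hyperparameters; these ranges are assumptions, not the
# grid used in the study.
param_dist = {
    "n_estimators": [100, 200, 400],
    "max_depth": [None, 5, 10, 20],
    "min_samples_split": [2, 5, 10],
    "max_features": ["sqrt", "log2"],
}

search = RandomizedSearchCV(
    ExtraTreesClassifier(random_state=42),
    param_distributions=param_dist,
    n_iter=20, cv=5, scoring="roc_auc",
    random_state=42, n_jobs=-1)
search.fit(X_tr, y_tr)

# Evaluate the tuned model on held-out data; in a medical setting
# recall (sensitivity to true disease cases) matters alongside AUC.
best = search.best_estimator_
auc = roc_auc_score(y_te, best.predict_proba(X_te)[:, 1])
recall = recall_score(y_te, best.predict(X_te))
print(f"best params: {search.best_params_}")
print(f"test AUC={auc:.4f}, recall={recall:.4f}")
```

Scoring the search by `roc_auc` rather than accuracy mirrors the abstract's emphasis on AUC and recall, which are less sensitive to class imbalance than raw accuracy.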

Keywords


adaboost; extra trees; hyperparameter optimization; machine learning algorithms; random forest; xgboost






DOI: https://doi.org/10.32520/stmsi.v15i3.6006



This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.