Comparison of Filter and Wrapper Feature Selection Methods for Heart Disease Risk Classification using K-Nearest Neighbors (k-NN)

Deni Kuswandani, Herman Herman, Rusydi Umar

Abstract


Feature selection plays a crucial role in improving the effectiveness of medical classification models. This study compares two feature selection approaches, the filter method and the wrapper method, for building a k-Nearest Neighbors (k-NN) model that classifies heart disease risk. The dataset consists of patients' demographic data, lifestyle factors, and clinical indicators. The filter method was applied according to data type: Pearson correlation was used for numerical features, and the Chi-Square test was used for categorical features. The features selected by the two tests were then combined, reducing the initial 20 features to four variables considered most relevant for heart disease risk classification: BMI, homocysteine level, blood pressure, and stress level. This approach was computationally efficient, but it yielded only a modest improvement in accuracy, reaching 76.8%, and a low recall for the minority class (0.07). In contrast, the wrapper method, using Sequential Forward Selection (SFS), produced a more informative subset of 11 features and achieved higher accuracy (80.00%) with a ROC-AUC of 0.657, indicating better discrimination of the minority class. These findings suggest that the filter method excels in simplicity and computational efficiency, whereas the wrapper method is more effective at improving classification performance. The study provides empirical guidance for choosing a feature selection strategy according to the analytical objective, particularly for clinical decision support systems.
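The two strategies compared in the abstract can be illustrated with a minimal Python sketch using scikit-learn and SciPy. It is not the authors' pipeline: the data are synthetic, the column names (num_a, cat_a, and so on) are hypothetical, and the choices of k = 5, feature standardization, the train/test split, and the number of features kept are all assumptions, since the abstract states none of them. The filter step ranks numerical features by absolute Pearson correlation with the label and categorical features by the Chi-Square test of independence; the wrapper step runs forward selection around the k-NN model itself.

```python
import numpy as np
import pandas as pd
from scipy.stats import chi2_contingency
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.metrics import accuracy_score, recall_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (NOT the study's dataset); all column names are hypothetical.
rng = np.random.default_rng(0)
n = 400
num_cols = ["num_a", "num_b", "num_c", "num_d"]
cat_cols = ["cat_a", "cat_b", "cat_c"]
X = pd.concat(
    [
        pd.DataFrame(rng.normal(size=(n, 4)), columns=num_cols),
        pd.DataFrame(rng.integers(0, 3, size=(n, 3)), columns=cat_cols),
    ],
    axis=1,
)
# The label depends only on num_a and cat_a; the remaining columns are noise.
signal = 1.5 * X["num_a"] + 1.0 * (X["cat_a"] == 2) - 1.0
y = (signal + rng.normal(scale=0.5, size=n) > 0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)


def make_knn():
    # Scaling and k=5 are assumptions; the abstract specifies neither.
    return make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))


def filter_select(X, y, k_num=2, k_cat=1):
    """Filter method: score each feature against the label, without any model."""
    # Numerical features: rank by absolute Pearson correlation with the label.
    pearson = X[num_cols].corrwith(y).abs().sort_values(ascending=False)
    # Categorical features: rank by Chi-Square p-value (smaller = stronger association).
    p_values = {}
    for col in cat_cols:
        _, p, _, _ = chi2_contingency(pd.crosstab(X[col], y))
        p_values[col] = p
    chi_rank = pd.Series(p_values).sort_values()
    # Combine the top features from both tests into one subset.
    return list(pearson.index[:k_num]) + list(chi_rank.index[:k_cat])


filter_cols = filter_select(X_train, y_train)

# Wrapper method: forward selection scored by cross-validated k-NN accuracy.
sfs = SequentialFeatureSelector(
    make_knn(), n_features_to_select=3, direction="forward", cv=5
)
sfs.fit(X_train, y_train)
wrapper_cols = list(X_train.columns[sfs.get_support()])


def evaluate(cols):
    """Fit k-NN on the chosen columns and report held-out metrics."""
    model = make_knn().fit(X_train[cols], y_train)
    pred = model.predict(X_test[cols])
    proba = model.predict_proba(X_test[cols])[:, 1]
    return {
        "accuracy": accuracy_score(y_test, pred),
        "recall_positive": recall_score(y_test, pred),
        "roc_auc": roc_auc_score(y_test, proba),
    }


results = {"filter": evaluate(filter_cols), "wrapper": evaluate(wrapper_cols)}
for name, cols in [("filter", filter_cols), ("wrapper", wrapper_cols)]:
    print(name, cols, results[name])
```

The filter step never trains a classifier, which is why it is cheap, whereas the wrapper step refits k-NN once per candidate feature per round and therefore costs more. Reporting recall and ROC-AUC alongside accuracy mirrors the abstract's point that accuracy alone can hide poor performance on the minority class.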

Keywords


feature selection; filter method; heart disease classification; k-Nearest Neighbors (k-NN); wrapper method






DOI: https://doi.org/10.32520/stmsi.v15i3.5989



This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.