Graduation Prediction for Prospective University Students Using Stacking Ensemble Learning

Claudia Swastikawati, Ema Utami

Abstract


Graduation is a crucial indicator for accreditation and an integral part of quality management strategies in higher education institutions. Early prediction of student graduation is therefore essential for effective, data-driven decision-making in student admissions. Differences in graduation rates are influenced by a combination of academic, demographic, economic, and family factors. This study applies a Stacking Ensemble Learning method that combines Random Forest, K-Nearest Neighbors, and Support Vector Machine as base learners, with XGBoost as the meta-learner. The dataset integrates new student enrollment data and graduation status reports from the NeoFeeder PDDikti system, covering 16 academic and non-academic feature variables. The model was evaluated using accuracy, precision, recall, F1-score, and Area Under the Curve (AUC). The results show that the stacking ensemble model outperforms the individual models, achieving an accuracy of 82%, a weighted F1-score of 80%, and an AUC of 87.15% on the test data. These findings help identify relevant features and show that an ensemble model is effective for building a machine learning-based prediction system, particularly for handling data imbalance and improving classification accuracy.
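
The stacking setup described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the data is a synthetic stand-in (16 features, imbalanced binary label) generated with scikit-learn, and the hyperparameters, the train/test split, the 5-fold stacking, and the names `base_learners`, `meta_learner`, and `metrics` are all assumptions made for illustration. If the `xgboost` package is not installed, the sketch falls back to scikit-learn's `GradientBoostingClassifier` as a substitute meta-learner.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.metrics import (accuracy_score, f1_score, precision_score,
                             recall_score, roc_auc_score)
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Meta-learner: XGBoost as named in the abstract; the fallback is a
# swapped-in substitute (assumption) for environments without xgboost.
try:
    from xgboost import XGBClassifier
    meta_learner = XGBClassifier(n_estimators=100, max_depth=3, random_state=0)
except ImportError:
    from sklearn.ensemble import GradientBoostingClassifier
    meta_learner = GradientBoostingClassifier(random_state=0)

# Synthetic stand-in for the enrollment data (assumption): 16 features,
# with roughly 70% of samples in the positive ("graduated") class.
X, y = make_classification(n_samples=600, n_features=16, n_informative=8,
                           weights=[0.3, 0.7], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

# Base learners. KNN and SVM are distance-based, so they are wrapped in a
# scaling pipeline; hyperparameters are illustrative defaults.
base_learners = [
    ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
    ("knn", make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))),
    ("svm", make_pipeline(StandardScaler(), SVC(probability=True, random_state=0))),
]

# The meta-learner is trained on out-of-fold class probabilities produced
# by the base learners (5-fold cross-validation on the training set).
stack = StackingClassifier(estimators=base_learners,
                           final_estimator=meta_learner,
                           cv=5, stack_method="predict_proba")
stack.fit(X_train, y_train)

y_pred = stack.predict(X_test)
y_score = stack.predict_proba(X_test)[:, 1]

# The metrics listed in the abstract, computed on the held-out test split.
metrics = {
    "accuracy": accuracy_score(y_test, y_pred),
    "precision_weighted": precision_score(y_test, y_pred, average="weighted",
                                          zero_division=0),
    "recall_weighted": recall_score(y_test, y_pred, average="weighted"),
    "f1_weighted": f1_score(y_test, y_pred, average="weighted"),
    "auc": roc_auc_score(y_test, y_score),
}
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Scores from this sketch reflect the synthetic data only and are not comparable to the results reported above.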

Keywords


stacking; multi-model classification; graduation prediction; machine learning






DOI: https://doi.org/10.32520/stmsi.v14i6.5535



Creative Commons License
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.