Predicting Students' Academic Performance in Mathematics based on Big Five Personality Traits using Random Forest with Synthetic Minority Over-Sampling Technique

Annisa Nurul Pratiwi, Ema Utami

Abstract


The secondary school period is a crucial time for the development of students' academic and social performance. Educational data mining (EDM) has emerged as a strategic approach for exploring patterns in educational data and predicting academic performance from various factors, including students' personalities. However, class imbalance in educational data remains a problem that can bias predictive models. This study aims to identify the factors that contribute to the mathematics performance of junior high school students, covering academic, demographic, and Big Five personality factors. The Random Forest method and the Synthetic Minority Over-sampling Technique (SMOTE) are employed to identify the components that contribute to students' academic performance and to improve the performance of the predictive model. The results indicate that academic factors are significant, whereas socio-economic and personality factors are less strongly related to academic performance. In addition, SMOTE proves effective in addressing the data imbalance, and the Random Forest model achieves its best performance with appropriate hyperparameter tuning. Combining Random Forest, hyperparameter tuning with GridSearchCV, and SMOTE yields a model with an accuracy of 99%.
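The SMOTE step named in the abstract can be illustrated with a minimal, dependency-free sketch of the interpolation rule from the original SMOTE formulation (Chawla et al., 2002): each synthetic sample is placed at a random point on the line segment between a minority-class sample and one of its k nearest minority-class neighbours. This is not the authors' code. The function names `smote`, `balance` and `euclidean`, the Euclidean distance, the default k = 5, and the toy data are all assumptions made for illustration, and the Random Forest and GridSearchCV stages are deliberately left out.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def smote(minority, n_synthetic, k=5, seed=None):
    """Return n_synthetic new samples interpolated between minority-class neighbours.

    minority: list of numeric feature vectors that all belong to the minority class.
    k: how many nearest minority neighbours each sample may be paired with.
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot use more neighbours than exist
    # k nearest minority-class neighbours of every minority sample (brute force).
    neighbours = []
    for i, x in enumerate(minority):
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: euclidean(x, minority[j]))
        neighbours.append(others[:k])
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))  # base sample
        j = rng.choice(neighbours[i])     # one of its k nearest neighbours
        gap = rng.random()                # position along the segment, in [0, 1)
        base, nn = minority[i], minority[j]
        synthetic.append([b + gap * (n - b) for b, n in zip(base, nn)])
    return synthetic


def balance(X, y, minority_label, k=5, seed=None):
    """Oversample one class until it matches the size of the largest other class."""
    minority = [x for x, label in zip(X, y) if label == minority_label]
    counts = {}
    for label in y:
        counts[label] = counts.get(label, 0) + 1
    target = max(c for label, c in counts.items() if label != minority_label)
    new = smote(minority, max(0, target - len(minority)), k=k, seed=seed)
    # Original data is kept unchanged; synthetic rows are appended at the end.
    return X + new, y + [minority_label] * len(new)


if __name__ == "__main__":
    # Hypothetical toy data: two features per student, "low" is the minority class.
    X = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1],
         [5.0, 5.0], [5.1, 4.9], [4.9, 5.2], [5.2, 5.1], [4.8, 4.8]]
    y = ["low"] * 3 + ["high"] * 5
    X_bal, y_bal = balance(X, y, "low", k=2, seed=0)
    print(y_bal.count("low"), y_bal.count("high"))  # 5 5
```

In practice this step is usually delegated to a library such as imbalanced-learn (`imblearn.over_sampling.SMOTE`). When it is combined with scikit-learn's `RandomForestClassifier` and `GridSearchCV`, resampling is commonly placed inside the cross-validation pipeline (for example with `imblearn.pipeline.Pipeline`), so that synthetic samples are generated only from the training folds and never appear in the validation data used to score each hyperparameter setting.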

Keywords


Big Five; Educational Data Mining; Student Performance; Random Forest; SMOTE






DOI: https://doi.org/10.32520/stmsi.v14i2.5102



This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.