Analysis of the Performance Comparison between Random Forest and SVM RBF in Detecting Cyberbullying on Imbalanced Data with the SMOTE Approach

Inna Nur Amalina, Norhikmah Norhikmah, Wahid Miftahul Ashari

Abstract


Cyberbullying has become a growing threat as social media use spreads, posing significant risks to online safety. Automatic detection remains challenging, particularly when the training data is highly imbalanced. This study compares Random Forest and a Support Vector Machine with a Radial Basis Function kernel (SVM RBF) for cyberbullying detection, with the Synthetic Minority Over-sampling Technique (SMOTE) applied to address class imbalance. The experiments used a publicly available, manually annotated dataset of 47,693 English-language tweets from users worldwide, each labeled as cyberbullying or non-cyberbullying. Performance was evaluated using accuracy, precision, recall, and F1-score. Random Forest without SMOTE achieved the highest performance (accuracy = 88.52%, precision = 89.07%, recall = 94.00%, F1-score = 91.49%); applying SMOTE improved recall for both algorithms but reduced accuracy and precision. These findings indicate that both the choice of algorithm and the handling of class imbalance are critical to the reliability of automated cyberbullying detection systems, and thus to more effective content moderation and safer online environments.
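For readers less familiar with the four evaluation metrics, the following is a minimal sketch of how they are conventionally derived from a binary confusion matrix. The class and function names are illustrative assumptions, and the counts are toy values chosen for the example; neither comes from the study.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    """Binary confusion-matrix counts (positive class = cyberbullying)."""
    tp: int  # cyberbullying tweets correctly flagged
    fp: int  # non-cyberbullying tweets wrongly flagged
    fn: int  # cyberbullying tweets missed
    tn: int  # non-cyberbullying tweets correctly passed


def accuracy(c: Confusion) -> float:
    total = c.tp + c.fp + c.fn + c.tn
    return (c.tp + c.tn) / total if total else 0.0


def precision(c: Confusion) -> float:
    flagged = c.tp + c.fp
    return c.tp / flagged if flagged else 0.0


def recall(c: Confusion) -> float:
    actual_pos = c.tp + c.fn
    return c.tp / actual_pos if actual_pos else 0.0


def f1_score(p: float, r: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r) if (p + r) else 0.0


# Toy counts for illustration only; NOT results from the paper.
toy = Confusion(tp=80, fp=10, fn=20, tn=90)
p, r = precision(toy), recall(toy)
print(f"accuracy={accuracy(toy):.4f} precision={p:.4f} "
      f"recall={r:.4f} f1={f1_score(p, r):.4f}")
```

Because F1 is the harmonic mean of precision and recall, a resampling step that raises recall while lowering precision can move F1 in either direction, which is why the abstract reports all four metrics rather than accuracy alone.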

Keywords


cyberbullying; imbalanced data; SMOTE; Random Forest; SVM RBF


DOI: https://doi.org/10.32520/stmsi.v14i6.5574

Creative Commons License
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.