Comparison of Logistic Regression and Random Forest using Correlation-based Feature Selection for Phishing Website Detection
Dublin Core | PKP Metadata Items | Metadata for this Document | |
1. | Title | Title of document | Comparison of Logistic Regression and Random Forest using Correlation-based Feature Selection for Phishing Website Detection |
2. | Creator | Author's name, affiliation, country | Farida Farida |
2. | Creator | Author's name, affiliation, country | Ali Mustopa; Universitas AMIKOM Yogyakarta |
3. | Subject | Discipline(s) | Computer Science (Ilmu Komputer) |
3. | Subject | Keyword(s) | |
4. | Description | Abstract | The world is currently experiencing massive developments in information technology, especially during the current pandemic, which requires all of us to learn and even work online. This has triggered a rise in crime on the internet. One example is the theft of internet users' data through fake websites built to resemble the original, known as phishing websites. To counter the rise of phishing websites in cyberspace, this research builds a classification model for detecting them, using whichever of two classification algorithms, logistic regression or random forest, performs better. Classification performance can be improved with the correlation-based feature selection (CFS) method, which selects the attributes most influential in detecting phishing websites. In testing, logistic regression and random forest achieved accuracies of 93.035% and 96.834%, respectively, on phishing website classification. After feature selection with CFS, their accuracies were 92.718% and 97.015%, respectively. Random forest accuracy therefore increased by 0.181 percentage points, while logistic regression accuracy decreased by an insignificant 0.317 percentage points. The results show that feature selection with CFS can eliminate redundant attributes while yielding classification accuracy close to that obtained with the complete attribute set, and that random forest achieves higher accuracy after applying CFS than before. Keywords: website phishing, classification, logistic regression, random forest, correlation-based |
5. | Publisher | Organizing agency, location | Program Studi Sistem Informasi Fakultas Teknik dan Ilmu Komputer |
6. | Contributor | Sponsor(s) | Universitas AMIKOM Yogyakarta |
7. | Date | (YYYY-MM-DD) | 2023-01-31 |
8. | Type | Status & genre | Peer-reviewed Article |
8. | Type | Type | |
9. | Format | File format | |
10. | Identifier | Uniform Resource Identifier | https://sistemasi.org/index.php/stmsi/article/view/1832 |
10. | Identifier | Digital Object Identifier (DOI) | https://doi.org/10.32520/stmsi.v12i1.1832 |
11. | Source | Title; vol., no. (year) | SISTEMASI; Vol 12, No 1 (2023): Sistemasi: Jurnal Sistem Informasi |
12. | Language | English=en | id |
13. | Relation | Supp. Files | |
14. | Coverage | Geo-spatial location, chronological period, research sample (gender, age, etc.) | |
15. | Rights | Copyright and permissions | Copyright (c) 2023 Sistemasi: Jurnal Sistem Informasi |
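The abstract describes CFS only at a high level. As a minimal sketch, assuming Hall's CFS merit heuristic, Merit(S) = k * mean(r_cf) / sqrt(k + k(k-1) * mean(r_ff)), where k is the number of features in subset S, r_cf is feature-class correlation and r_ff is feature-feature correlation, the idea of favouring attributes that correlate with the class but not with each other can be illustrated as follows. The correlation measure, search strategy, tooling, and dataset used in the paper are not stated here, so the absolute Pearson correlation (CFS is also commonly used with other measures, such as symmetrical uncertainty), the greedy forward search, the hypothetical helper names `merit` and `cfs_forward`, and the synthetic data are all illustrative assumptions.

```python
import numpy as np


def merit(X: np.ndarray, y: np.ndarray, subset: list[int]) -> float:
    """Hall-style CFS merit of a feature subset, using |Pearson r| (an assumed choice)."""
    k = len(subset)
    # Mean absolute feature-class correlation over the subset.
    r_cf = np.mean([abs(np.corrcoef(X[:, j], y)[0, 1]) for j in subset])
    # Mean absolute feature-feature correlation over unordered pairs (0 for a single feature).
    pairs = [(a, b) for i, a in enumerate(subset) for b in subset[i + 1:]]
    r_ff = np.mean([abs(np.corrcoef(X[:, a], X[:, b])[0, 1]) for a, b in pairs]) if pairs else 0.0
    return float(k * r_cf / np.sqrt(k + k * (k - 1) * r_ff))


def cfs_forward(X: np.ndarray, y: np.ndarray) -> tuple[list[int], float]:
    """Hypothetical greedy forward search: add the feature that most raises merit; stop when none does."""
    remaining = list(range(X.shape[1]))
    selected: list[int] = []
    best = 0.0
    while remaining:
        scores = {j: merit(X, y, selected + [j]) for j in remaining}
        j_best = max(scores, key=scores.get)
        if scores[j_best] <= best:
            break  # no remaining candidate improves the subset's merit
        best = scores[j_best]
        selected.append(j_best)
        remaining.remove(j_best)
    return selected, best


# Hypothetical synthetic data (not the paper's phishing dataset); assumes no constant columns.
rng = np.random.default_rng(0)
n = 500
f0 = rng.normal(size=n)
f1 = f0 + 0.05 * rng.normal(size=n)  # near-duplicate of f0, i.e. a redundant attribute
f2 = rng.normal(size=n)
f3 = rng.normal(size=n)              # pure noise, unrelated to the label
y = (f0 + f2 > 0).astype(float)      # binary label (e.g. 1 = phishing, 0 = legitimate)
X = np.column_stack([f0, f1, f2, f3])

selected, score = cfs_forward(X, y)
print(selected, round(score, 3))
```

On this synthetic data the search should keep f2 and only one of the near-duplicate pair f0/f1, dropping the redundant copy and the noise column, which mirrors the abstract's point that CFS eliminates redundant attributes. The reduced matrix `X[:, selected]` would then be passed to a classifier such as logistic regression or random forest.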