Machine learning-based phishing detection using URL features: a comprehensive review

Asif, Asif Uz Zaman, author; Ray, Indrakshi, advisor; Shirazi, Hossein, advisor; Ray, Indrajit, committee member; Wang, Haonan, committee member

Files

Asif_colostate_0053N_18146.pdf (528.67 KB)

Date

2023

Abstract

In phishing, a social engineering attack, a perpetrator poses as a trusted representative and sends a false message to a victim in an effort to collect private information, such as login passwords and financial information, for personal gain. To carry out a phishing attack, the perpetrator uses counterfeit websites, emails, and messages to trick the victim. Machine learning appears to be a promising technique for phishing detection. Typically, features based on website content and the Uniform Resource Locator (URL) are used. However, gathering website content features requires visiting malicious sites, and preparing the data is labor-intensive. To this end, researchers are investigating whether URL-only information can be used for phishing detection. This approach is lightweight and can be installed at the client's end; it does not require data collection from malicious sites and can identify zero-day attacks. We conduct a systematic literature review on URL-based phishing detection. We selected papers that appeared in top cybersecurity conferences and journals and that were either recent (2018 onward) or highly cited (50+ citations in Google Scholar). This survey provides researchers and practitioners with information on the current state of research on URL-based website phishing detection methodologies. The results show that there is no centralized dataset. This is beneficial because it prevents attackers from seeing the features that classifiers employ, but it makes the work time-consuming for researchers. Both machine learning and deep learning algorithms can be used, since they achieve very good classification accuracy; in this work, we found that Random Forest and Long Short-Term Memory are good choices. Using task-specific lexical characteristics, rather than concentrating on the number of features, is essential because feature selection affects how accurately algorithms detect phishing URLs.
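
To illustrate what URL-only lexical features can look like, the following is a minimal sketch that derives a few features from the URL string alone, without visiting the site. The function name `lexical_features` and the particular features are assumptions chosen for illustration; they are not the feature set of the thesis or of any specific surveyed paper.

```python
import re
from urllib.parse import urlparse

# Hypothetical pattern for illustration: host written as a dotted IPv4 address.
IPV4_HOST = re.compile(r"^\d{1,3}(\.\d{1,3}){3}$")


def lexical_features(url: str) -> dict:
    """Compute a few lexical features from the URL string only (no network access).

    The feature choice is an assumption for illustration, not the thesis's set.
    """
    parsed = urlparse(url)
    host = parsed.hostname or ""
    return {
        "url_length": len(url),
        "host_length": len(host),
        "num_dots_in_host": host.count("."),
        "num_hyphens_in_host": host.count("-"),
        "num_digits": sum(ch.isdigit() for ch in url),
        "has_at_symbol": "@" in url,
        "host_is_ipv4": bool(IPV4_HOST.match(host)),
        "uses_https": parsed.scheme == "https",
        # Number of non-empty path segments, e.g. /a/b.php has depth 2.
        "path_depth": len([seg for seg in parsed.path.split("/") if seg]),
    }


if __name__ == "__main__":
    print(lexical_features("http://192.168.0.1/secure-login/update.php"))
```

A dictionary like this could be turned into one row of a feature matrix for a classifier such as Random Forest. Which features are actually informative depends on the task, which is the point the abstract makes about feature selection.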

Subject

machine learning

social engineering

URL-based

phishing

cyber-security

survey

URI

https://hdl.handle.net/10217/237380

Collections

2020-
Theses and Dissertations
