Machine learning-based phishing detection using URL features: a comprehensive review

Asif, Asif Uz Zaman, author; Ray, Indrakshi, advisor; Shirazi, Hossein, advisor; Ray, Indrajit, committee member; Wang, Haonan, committee member

dc.contributor.author	Asif, Asif Uz Zaman, author
dc.contributor.author	Ray, Indrakshi, advisor
dc.contributor.author	Shirazi, Hossein, advisor
dc.contributor.author	Ray, Indrajit, committee member
dc.contributor.author	Wang, Haonan, committee member
dc.date.accessioned	2024-01-01T11:24:22Z
dc.date.available	2024-01-01T11:24:22Z
dc.date.issued	2023
dc.description.abstract	Phishing is a social engineering attack in which a perpetrator, posing as a trusted representative, sends a false message to a victim in an effort to collect private information, such as login passwords and financial information, for personal gain. To carry out a phishing attack, counterfeit websites, emails, and messages are used to trick the victim. Machine learning appears to be a promising technique for phishing detection. Typically, website content and Uniform Resource Locator (URL) based features are used. However, gathering website content features requires visiting malicious sites, and preparing the data is labor-intensive. To address this, researchers are investigating whether URL-only information can be used for phishing detection. Such approaches are lightweight and can be deployed on the client side; they do not require data collection from malicious sites and can identify zero-day attacks. We conduct a systematic literature review on URL-based phishing detection. We selected papers from top cybersecurity conferences and journals that were either recent (2018 onward) or highly cited (50+ citations in Google Scholar). This survey will provide researchers and practitioners with information on the current state of research on URL-based website phishing attack detection methodologies. The results of this study show that there is no centralized dataset. This is beneficial because it prevents attackers from seeing the features that classifiers employ; however, it makes the work time-consuming for researchers. Both machine learning and deep learning algorithms can be used, since they achieve very good classification accuracy; in this work, we found Random Forest and Long Short-Term Memory to be good choices.
Using task-specific lexical features, rather than concentrating on the number of features, is essential because feature selection affects how accurately algorithms detect phishing URLs.
dc.format.medium	born digital
dc.format.medium	masters theses
dc.identifier	Asif_colostate_0053N_18146.pdf
dc.identifier.uri	https://hdl.handle.net/10217/237380
dc.language	English
dc.language.iso	eng
dc.publisher	Colorado State University. Libraries
dc.relation.ispartof	2020-
dc.rights	Copyright and other restrictions may apply. User is responsible for compliance with all applicable laws. For information about copyright law, please see https://libguides.colostate.edu/copyright.
dc.subject	machine learning
dc.subject	social engineering
dc.subject	URL-based
dc.subject	phishing
dc.subject	cyber-security
dc.subject	survey
dc.title	Machine learning-based phishing detection using URL features: a comprehensive review
dc.type	Text
dcterms.rights.dpla	This Item is protected by copyright and/or related rights (https://rightsstatements.org/vocab/InC/1.0/). You are free to use this Item in any way that is permitted by the copyright and related rights legislation that applies to your use. For other uses you need to obtain permission from the rights-holder(s).
thesis.degree.discipline	Computer Science
thesis.degree.grantor	Colorado State University
thesis.degree.level	Masters
thesis.degree.name	Master of Science (M.S.)

Files

Original bundle

Name: Asif_colostate_0053N_18146.pdf
Size: 528.67 KB
Format: Adobe Portable Document Format

Collections

2020-
Theses and Dissertations