Machine learning-based phishing detection using URL features: a comprehensive review

dc.contributor.author: Asif, Asif Uz Zaman, author
dc.contributor.author: Ray, Indrakshi, advisor
dc.contributor.author: Shirazi, Hossein, advisor
dc.contributor.author: Ray, Indrajit, committee member
dc.contributor.author: Wang, Haonan, committee member
dc.date.accessioned: 2024-01-01T11:24:22Z
dc.date.available: 2024-01-01T11:24:22Z
dc.date.issued: 2023
dc.description.abstract: Phishing is a social engineering attack in which a perpetrator sends a false message to a victim while posing as a trusted representative, in an effort to collect private information such as login passwords and financial information for personal gain. To carry out a phishing attack, counterfeit websites, emails, and messages are used to trick the victim. Machine learning appears to be a promising technique for phishing detection. Typically, website content and Uniform Resource Locator (URL) based features are used. However, gathering website content features requires visiting malicious sites, and preparing the data is labor-intensive. To this end, researchers are investigating whether URL-only information can be used for phishing detection. This approach is lightweight and can be installed at the client's end; it does not require data collection from malicious sites, and it can identify zero-day attacks. We conduct a systematic literature review on URL-based phishing detection. We selected papers that appeared in top cybersecurity conferences and journals and that were either recent (2018 onward) or highly cited (50+ citations in Google Scholar). This survey provides researchers and practitioners with information on the current state of research on URL-based website phishing detection methodologies. The results of this study show that there is no centralized dataset. This is beneficial because it prevents attackers from seeing the features that classifiers employ, but it makes the work time-consuming for researchers. Both machine learning and deep learning algorithms can be used, since they achieve very good classification accuracy; in this work, we found that Random Forest and Long Short-Term Memory are good choices. Using task-specific lexical characteristics, rather than concentrating on the number of features, is essential because feature selection affects how accurately algorithms detect phishing URLs.
dc.format.medium: born digital
dc.format.medium: masters theses
dc.identifier: Asif_colostate_0053N_18146.pdf
dc.identifier.uri: https://hdl.handle.net/10217/237380
dc.language: English
dc.language.iso: eng
dc.publisher: Colorado State University. Libraries
dc.relation.ispartof: 2020-
dc.rights: Copyright and other restrictions may apply. User is responsible for compliance with all applicable laws. For information about copyright law, please see https://libguides.colostate.edu/copyright.
dc.subject: machine learning
dc.subject: social engineering
dc.subject: URL-based
dc.subject: phishing
dc.subject: cyber-security
dc.subject: survey
dc.title: Machine learning-based phishing detection using URL features: a comprehensive review
dc.type: Text
dcterms.rights.dpla: This Item is protected by copyright and/or related rights (https://rightsstatements.org/vocab/InC/1.0/). You are free to use this Item in any way that is permitted by the copyright and related rights legislation that applies to your use. For other uses you need to obtain permission from the rights-holder(s).
thesis.degree.discipline: Computer Science
thesis.degree.grantor: Colorado State University
thesis.degree.level: Masters
thesis.degree.name: Master of Science (M.S.)

Files

Original bundle
Name: Asif_colostate_0053N_18146.pdf
Size: 528.67 KB
Format: Adobe Portable Document Format