Harnessing large language models for permission fidelity analysis from android application descriptions

dc.contributor.author: Tamrakar, Yunik, author
dc.contributor.author: Ray, Indrakshi, advisor
dc.contributor.author: Banerjee, Ritwik, advisor
dc.contributor.author: Ghosh, Sudipto, committee member
dc.contributor.author: Simske, Steve, committee member
dc.date.accessioned: 2025-06-02T15:20:02Z
dc.date.available: 2026-05-28
dc.date.issued: 2025
dc.description.abstract: Android applications are extremely popular; as of mid-2024, the Google Play Store hosts over 2 million applications. With such a large number of applications available for download, the threat of privacy leakage increases considerably, primarily because users have limited ability to judge which app permissions are actually necessary. Accurate and consistent checking of the permissions an application requests is therefore essential to protect user privacy. Studies have indicated that inferring permissions from app descriptions is an effective way to determine whether the requested permissions are necessary. Previous research on permission inference has explored techniques such as keyword-based matching, Natural Language Processing methods (including part-of-speech tagging and named entity recognition), and deep-learning approaches based on Recurrent Neural Networks. However, app descriptions are often vague and may omit details to meet length restrictions, which degrades the performance of these models. This limitation motivated our choice of large language models (LLMs), whose advanced contextual understanding and ability to infer implicit information directly address the weaknesses observed in previous approaches. In this work, we explore various LLM architectures for the permission inference task and provide a detailed comparison across models. We evaluate both zero-shot and fine-tuning based approaches, demonstrating that fine-tuned models can achieve state-of-the-art performance. Additionally, by employing targeted generative-AI-based training data augmentation techniques, we show that these fine-tuned models can significantly outperform baseline methods. Furthermore, we illustrate the potential of leveraging paraphrasing to boost fine-tuned performance by over 50 percent, all while using only a very small number of annotated samples, a rarity for LLMs.
dc.format.medium: born digital
dc.format.medium: masters theses
dc.identifier: Tamrakar_colostate_0053N_18881.pdf
dc.identifier.uri: https://hdl.handle.net/10217/240955
dc.language: English
dc.language.iso: eng
dc.publisher: Colorado State University. Libraries
dc.relation.ispartof: 2020-
dc.rights: Copyright and other restrictions may apply. User is responsible for compliance with all applicable laws. For information about copyright law, please see https://libguides.colostate.edu/copyright.
dc.rights.access: Embargo expires: 05/28/2026.
dc.subject: android permissions
dc.subject: LLM
dc.subject: privacy
dc.subject: compliance
dc.subject: android applications
dc.subject: NLP
dc.title: Harnessing large language models for permission fidelity analysis from android application descriptions
dc.type: Text
dcterms.embargo.expires: 2026-05-28
dcterms.embargo.terms: 2026-05-28
dcterms.rights.dpla: This Item is protected by copyright and/or related rights (https://rightsstatements.org/vocab/InC/1.0/). You are free to use this Item in any way that is permitted by the copyright and related rights legislation that applies to your use. For other uses you need to obtain permission from the rights-holder(s).
thesis.degree.discipline: Computer Science
thesis.degree.grantor: Colorado State University
thesis.degree.level: Masters
thesis.degree.name: Master of Science (M.S.)

Files

Original bundle

Name: Tamrakar_colostate_0053N_18881.pdf
Size: 4.83 MB
Format: Adobe Portable Document Format