Harnessing large language models for permission fidelity analysis from android application descriptions
dc.contributor.author | Tamrakar, Yunik, author | |
dc.contributor.advisor | Ray, Indrakshi, advisor | |
dc.contributor.advisor | Banerjee, Ritwik, advisor | |
dc.contributor.committeemember | Ghosh, Sudipto, committee member | |
dc.contributor.committeemember | Simske, Steve, committee member | |
dc.date.accessioned | 2025-06-02T15:20:02Z | |
dc.date.available | 2026-05-28 | |
dc.date.issued | 2025 | |
dc.description.abstract | Android applications are enormously popular: as of mid-2024, the Google Play Store hosts over 2 million of them. With such a large number of applications available for download, the threat of privacy leakage increases considerably, primarily because users have limited ability to judge which app permissions are actually necessary. Accurate and consistent checking of the permissions an application requests is therefore essential to protecting user privacy. Studies have indicated that inferring permissions from app descriptions is an effective way to determine whether the requested permissions are necessary. Previous research in the permission-inference space has explored techniques such as keyword-based matching, Natural Language Processing methods (including part-of-speech tagging and named entity recognition), and deep-learning approaches based on Recurrent Neural Networks. However, app descriptions are often vague and may omit details to meet length restrictions, resulting in suboptimal performance of these models. This limitation motivated our choice of large language models (LLMs), as their advanced contextual understanding and ability to infer implicit information directly address the weaknesses observed in previous approaches. In this work, we explore several LLM architectures for the permission-inference task and provide a detailed cross-model comparison. We evaluate both zero-shot learning and fine-tuning approaches, demonstrating that fine-tuned models can achieve state-of-the-art performance. Additionally, by employing targeted generative-AI-based training-data augmentation, we show that these fine-tuned models can significantly outperform baseline methods. Furthermore, we illustrate the potential of leveraging paraphrasing to boost fine-tuned performance by over 50 percent, while using only a very small number of annotated samples, an unusually low requirement for LLMs. | |
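The zero-shot setting described in the abstract can be illustrated with a minimal sketch of prompt construction for permission inference. The permission list, prompt wording, and function name below are illustrative assumptions, not the thesis's actual prompt or evaluation setup; the LLM call itself is omitted.

```python
# Hedged sketch of zero-shot permission inference from an app description.
# PERMISSIONS and the prompt wording are assumptions for illustration only.

PERMISSIONS = [
    "CAMERA", "READ_CONTACTS", "ACCESS_FINE_LOCATION",
    "RECORD_AUDIO", "READ_SMS",
]

def build_zero_shot_prompt(description: str) -> str:
    """Build a prompt asking an LLM which permissions a description justifies."""
    options = ", ".join(PERMISSIONS)
    return (
        "You are an Android privacy auditor.\n"
        f"App description:\n{description}\n\n"
        f"From this list only ({options}), which permissions does the "
        "description justify? Answer with a comma-separated subset."
    )

# Example: a description implying camera and contacts access.
prompt = build_zero_shot_prompt(
    "Scan QR codes and share them with your contacts."
)
```

In a fine-tuning setup, the same description would instead be paired with gold permission labels (optionally augmented with paraphrased variants, as in the abstract) to form supervised training examples.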
dc.format.medium | born digital | |
dc.format.medium | masters theses | |
dc.identifier | Tamrakar_colostate_0053N_18881.pdf | |
dc.identifier.uri | https://hdl.handle.net/10217/240955 | |
dc.language | English | |
dc.language.iso | eng | |
dc.publisher | Colorado State University. Libraries | |
dc.relation.ispartof | 2020- | |
dc.rights | Copyright and other restrictions may apply. User is responsible for compliance with all applicable laws. For information about copyright law, please see https://libguides.colostate.edu/copyright. | |
dc.rights.access | Embargo expires: 05/28/2026. | |
dc.subject | android permissions | |
dc.subject | LLM | |
dc.subject | privacy | |
dc.subject | compliance | |
dc.subject | android applications | |
dc.subject | NLP | |
dc.title | Harnessing large language models for permission fidelity analysis from android application descriptions | |
dc.type | Text | |
dcterms.embargo.expires | 2026-05-28 | |
dcterms.embargo.terms | 2026-05-28 | |
dcterms.rights.dpla | This Item is protected by copyright and/or related rights (https://rightsstatements.org/vocab/InC/1.0/). You are free to use this Item in any way that is permitted by the copyright and related rights legislation that applies to your use. For other uses you need to obtain permission from the rights-holder(s). | |
thesis.degree.discipline | Computer Science | |
thesis.degree.grantor | Colorado State University | |
thesis.degree.level | Masters | |
thesis.degree.name | Master of Science (M.S.) |
Files
Original bundle (1 of 1)
- Name: Tamrakar_colostate_0053N_18881.pdf
- Size: 4.83 MB
- Format: Adobe Portable Document Format