Browsing by Author "Banerjee, Ritwik, advisor"
Now showing 1 - 2 of 2
Item Embargo

Harnessing large language models for permission fidelity analysis from android application descriptions (Colorado State University. Libraries, 2025)
Tamrakar, Yunik, author; Ray, Indrakshi, advisor; Banerjee, Ritwik, advisor; Ghosh, Sudipto, committee member; Simske, Steve, committee member

As of mid-2024, the Google Play Store hosts over 2 million Android applications. With such a large number of applications available for download, the threat of privacy leakage increases considerably, largely because users have limited ability to judge which app permissions are actually necessary. Accurate and consistent checking of the permissions collected by applications is therefore essential to protecting user privacy. Studies have indicated that inferring permissions from app descriptions is an effective way to determine whether the collected permissions are necessary. Previous research on permission inference has explored techniques such as keyword-based matching, natural language processing methods (including part-of-speech tagging and named entity recognition), and deep learning approaches using recurrent neural networks. However, app descriptions are often vague and may omit details to meet length restrictions, resulting in suboptimal performance of these models. This limitation motivated our choice of large language models (LLMs), whose advanced contextual understanding and ability to infer implicit information directly address the weaknesses observed in previous approaches. In this work, we explore various LLM architectures for the permission inference task and provide a detailed comparison across models. We evaluate both zero-shot and fine-tuning based approaches, demonstrating that fine-tuned models can achieve state-of-the-art performance.
Additionally, by employing targeted generative AI based training data augmentation techniques, we show that these fine-tuned models can significantly outperform baseline methods. Furthermore, we illustrate the potential of leveraging paraphrasing to boost fine-tuned performance by over 50 percent, all while using only a very small number of annotated samples, a rarity for LLMs.

Item Open Access

Redundant complexity in deep learning: an efficacy analysis of NeXtVLAD in NLP (Colorado State University. Libraries, 2022)
Mahdipour Saravani, Sina, author; Ray, Indrakshi, advisor; Banerjee, Ritwik, advisor; Simske, Steven, committee member

While deep learning is prevalent and successful, partly due to its extensive expressive power with little human intervention, it may inherently encourage naive and needlessly complex designs, giving rise to problems in sustainability, reproducibility, and design. Larger, more compute-intensive models carry costs in all of these areas. In this thesis, we probe the effect of a neural component, an architecture called NeXtVLAD, on predictive accuracy for two downstream natural language processing tasks: context-dependent sarcasm detection and deepfake text detection. We investigate the extent to which this novel architecture contributes to the results, and find it ineffective and redundant: it provides no statistically significant benefit. This is only one of several directions in efficiency-aware deep learning research, but it is especially important because it introduces an aspect of interpretability that targets design and efficiency. Studying architectures and topologies in this way makes it possible to ablate redundant components for better sustainability, and yields further insight into the information flow in deep neural architectures and the role of each component.
We hope our insights, which highlight the lack of benefit from introducing a resource-intensive component, will aid future research in distilling the effective elements from long and complex pipelines, thereby providing a boost to the wider research community.