Towards interactive analytics over voluminous spatiotemporal data using a distributed, in-memory framework

Mitra, Saptashwa, author; Pallickara, Sangmi Lee, advisor; Pallickara, Shrideep, committee member; Ortega, Francisco, committee member; Li, Kaigang, committee member

Files

Mitra_colostate_0053A_18041.pdf (7.25 MB)

Date

2023

Authors

Mitra, Saptashwa, author

Pallickara, Sangmi Lee, advisor

Pallickara, Shrideep, committee member

Ortega, Francisco, committee member

Li, Kaigang, committee member

Abstract

The proliferation of heterogeneous data sources, driven by advancements in sensor networks, simulations, and observational devices, has reached unprecedented levels. This surge in data generation and the demand for proper storage have been met with extensive research and development in distributed storage systems, facilitating the scalable housing of these voluminous datasets while enabling analytical processes. Nonetheless, the extraction of meaningful insights from these datasets, especially in the context of low-latency, interactive analytics, poses a formidable challenge. This arises from the persistent gap between the processing capacity of distributed systems and their ever-expanding storage capabilities. Moreover, the interactive querying of these datasets is hindered by disk I/O, redundant network communications, recurrent hotspots, and transient surges of user interest over limited geospatial regions, particularly in systems that concurrently serve multiple users. In environments where interactive querying is paramount, such as visualization systems, addressing these challenges becomes imperative. This dissertation delves into the intricacies of enabling interactive analytics over large-scale spatiotemporal datasets. My research efforts are centered around the conceptualization and implementation of a scalable storage, indexing, and caching framework tailored specifically for spatiotemporal data access. The research aims to create frameworks that facilitate fast query analytics over diverse data types, including point, vector, and raster datasets. The frameworks implemented are characterized by their lightweight nature, their residence primarily in memory, and their capacity to support model-driven extraction of insights from raw data or dynamic reconstruction of compressed or partial in-memory data fragments with an acceptable level of accuracy.
This approach effectively helps reduce the memory footprint of cached data objects and also mitigates the need for frequent client-server communications. Furthermore, we investigate the potential of leveraging various transfer learning techniques to improve the turn-around times of our memory-resident deep learning models, given the voluminous nature of our datasets, while maintaining good overall accuracy across the entire spatiotemporal domain. Additionally, our research explores the extraction of insights from high-dimensional datasets, such as satellite imagery, within this framework. The dissertation also presents empirical evaluations of our frameworks and discusses future directions and anticipated contributions in the domain of interactive analytics over large-scale spatiotemporal datasets, acknowledging the evolving landscape of data analytics, where analytics frameworks increasingly rely on compute-intensive machine learning models.

Subject

distributed caching

science-guided machine learning

data cubes

visual analytics

in-memory storage

URI

https://hdl.handle.net/10217/237507
https://doi.org/10.25675/3.02503

Collections

2020-
Theses and Dissertations
