Browsing by Author "Cooley, Dan, committee member"
Now showing 1 - 10 of 10

Item Open Access
Application of statistical and deep learning methods to power grids (Colorado State University. Libraries, 2023) Rimkus, Mantautas, author; Kokoszka, Piotr, advisor; Wang, Haonan, advisor; Nielsen, Aaron, committee member; Cooley, Dan, committee member; Chen, Haonan, committee member

The structure of power flows in transmission grids is evolving and is likely to change significantly in the coming years due to the rapid growth of renewable energy generation, which introduces randomness and bidirectional power flows. Another transformative aspect is the increasing penetration of various smart-meter technologies. Inexpensive measurement devices can be placed at practically any component of the grid. As a result, traditional fault detection methods may no longer be sufficient. Consequently, there is growing interest in developing new methods to detect power grid faults.

Using model data, we first propose a two-stage procedure for detecting a fault in a regional power grid. In the first stage, a fault is detected in real time. In the second stage, the faulted line is identified with a negligible delay. The approach uses only the voltage modulus measured at buses (nodes of the grid) as the input. Our method does not require prior knowledge of the fault type.

We further explore fault detection based on high-frequency data streams that are becoming available in modern power grids. Our approach can be treated as an online (sequential) change point monitoring methodology. However, due to the mostly unexplored and very nonstandard structure of high-frequency power grid streaming data, substantial new statistical development is required to make this methodology practically applicable. The work includes the development of scalar detectors based on multichannel data streams, the determination of data-driven alarm thresholds, and an investigation of the performance and robustness of the new tools. Thanks to a reasonably large database of faults, we can calculate frequencies of false and correct fault signals and recommend implementations that optimize these empirical success rates.

Next, we extend our proposed method for fault localization in a regional grid to scenarios where partial observability limits the available data. While classification methods have been proposed for fault localization, their effectiveness depends on the availability of labeled data, which is often impractical in real-life situations. Our approach bridges the gap between partial and full observability of the power grid. We develop efficient fault localization methods that operate effectively even when only a subset of power grid bus data is available. This work contributes to the research area of fault diagnosis in scenarios where the number of available phasor measurement unit devices is smaller than the number of buses in the grid. We propose using Graph Neural Networks in combination with statistical fault localization methods to localize faults in a regional power grid with minimal available data. Our contribution to the field of fault localization aims to enable the adoption of effective fault localization methods for future power grids.
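The scalar detectors and data-driven alarm thresholds mentioned in this abstract lend themselves to a simple illustration. The sketch below is not the procedure developed in the dissertation; it is a minimal, hypothetical online monitor that reduces multichannel voltage-modulus streams to one scalar statistic per time step and calibrates its alarm threshold from a fault-free training period. All function names, the window length, and the simulated streams are invented for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

def scalar_detector(v, window=50):
    """Reduce a multichannel stream v (time x buses) to one scalar per time
    step: the largest standardized deviation of any channel from its
    trailing-window mean."""
    t, _ = v.shape
    stats = np.zeros(t)
    for i in range(window, t):
        ref = v[i - window:i]
        z = (v[i] - ref.mean(axis=0)) / (ref.std(axis=0) + 1e-8)
        stats[i] = np.abs(z).max()
    return stats

def calibrate_threshold(fault_free_stream, q=0.999):
    """Data-driven alarm threshold: a high empirical quantile of the
    detector statistic computed on a fault-free training stream."""
    return np.quantile(scalar_detector(fault_free_stream), q)

# Simulated voltage-modulus streams at 20 buses: a fault-free training period
# and a monitoring period with a level shift at one bus starting at t = 300.
train = 1.0 + 0.01 * rng.standard_normal((2000, 20))
monitor = 1.0 + 0.01 * rng.standard_normal((600, 20))
monitor[300:, 7] -= 0.05  # simulated fault at bus 7

threshold = calibrate_threshold(train)
alarms = np.flatnonzero(scalar_detector(monitor) > threshold)
print("first alarm at t =", alarms[0] if alarms.size else None)
```

The point of the sketch is only the structure of such a monitor: a channel-wise standardization, a scalar reduction, and a threshold learned from fault-free history.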

Item Open Access
Bayesian methods for environmental exposures: mixtures and missing data (Colorado State University. Libraries, 2022) Hoskovec, Lauren, author; Wilson, Ander, advisor; Magzamen, Sheryl, committee member; Hoeting, Jennifer, committee member; Cooley, Dan, committee member

Air pollution exposure has been linked to increased morbidity and mortality. Estimating the association between air pollution exposure and health outcomes is complicated by simultaneous exposure to multiple pollutants, referred to as a multipollutant mixture. In a multipollutant mixture, exposures may have both independent and interactive effects on health. In addition, observational studies of air pollution exposure often involve missing data. In this dissertation, we address challenges related to model choice and missing data when studying exposure to a mixture of environmental pollutants.

First, we conduct a formal simulation study of recently developed methods for estimating the association between a health outcome and exposure to a multipollutant mixture. We evaluate methods on their performance in estimating the exposure-response function, identifying mixture components associated with the outcome, and identifying interaction effects. Other studies have reviewed the literature or compared performance on a single data set; however, none have formally compared such a broad range of new methods in a simulation study.

Second, we propose a statistical method to analyze multiple asynchronous multivariate time series with missing data for use in personal exposure assessments. We develop an infinite hidden Markov model for multiple time series to impute missing data and identify shared time-activity patterns in exposures. We estimate hidden states that represent latent environments, each presenting a unique distribution of a mixture of environmental exposures. Through our multiple imputation algorithm, we impute missing exposure data conditional on the hidden states.

Finally, we conduct an individual-level study of the association between long-term exposure to air pollution and COVID-19 severity in a Denver, Colorado, USA cohort. We develop a Bayesian multinomial logistic regression model for data with partially missing categorical outcomes. Our model uses Polya-gamma data augmentation, and we propose a visualization approach for inference on the odds ratio. We conduct one of the first individual-level studies of air pollution exposure and COVID-19 health outcomes using detailed clinical data and individual-level air pollution exposure data.
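For context, Polya-gamma data augmentation of the kind mentioned above rests on a standard integral identity (Polson, Scott and Windle, 2013), quoted here from general knowledge rather than from the dissertation itself:

```latex
% Polya-gamma identity: for b > 0 and kappa = a - b/2, with omega ~ PG(b, 0),
\[
\frac{\bigl(e^{\psi}\bigr)^{a}}{\bigl(1+e^{\psi}\bigr)^{b}}
  \;=\; 2^{-b}\, e^{\kappa \psi}
  \int_{0}^{\infty} e^{-\omega \psi^{2}/2}\, p(\omega)\, d\omega ,
\qquad \kappa = a - \tfrac{b}{2}.
\]
% Conditional on omega, each logistic likelihood term is Gaussian in the
% linear predictor psi, which yields conjugate Gibbs updates.
```

Conditioning on the latent variable turns each logistic contribution into a Gaussian kernel in the linear predictor, which is what makes Gibbs sampling in such models tractable.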

Item Open Access
Detection of multiple correlated time series and its application in synthetic aperture sonar imagery (Colorado State University. Libraries, 2014) Klausner, Nicholas Harold, author; Azimi-Sadjadi, Mahmood R., advisor; Scharf, Louis L., advisor; Pezeshki, Ali, committee member; Cooley, Dan, committee member

Detecting the presence of a common but unknown signal among two or more data channels is a problem that finds its uses in many applications, including collaborative sensor networks, geological monitoring of seismic activity, radar, and sonar. Some detection systems in such situations use decision fusion to combine individual detection decisions into one global decision. However, this detection paradigm can be sub-optimal because local decisions are based on the perspective of a single sensory system. Thus, methods that capture the coherent or mutual information among multiple data sets are needed. This work considers the problem of testing for independence among multiple (≥ 2) random vectors. The solution is attained by considering a Generalized Likelihood Ratio Test (GLRT) that tests the null hypothesis that the composite covariance matrix of the channels, a matrix containing all inter- and intra-channel second-order information, is block-diagonal. The test statistic becomes a generalized Hadamard ratio, given by the ratio of the determinant of the estimate of this composite covariance matrix to the product of the determinants of its diagonal blocks.

One important question in the practical application of any likelihood ratio test concerns the values of the test statistic needed to provide sufficient evidence for rejecting the null hypothesis. To gain some understanding of the false alarm probability, or size, of the test based on the generalized Hadamard ratio, we employ the theory of Gram determinants to show that the likelihood ratio can be written as a product of ratios of squared residuals from two linear prediction problems. This expression leads quite directly to the fact that the generalized Hadamard ratio is stochastically equivalent to a product of independently distributed beta random variables under the null hypothesis. Asymptotically, the scaled logarithm of the generalized Hadamard ratio converges in distribution to a chi-squared random variable as the number of samples used to estimate the composite covariance matrix grows large. The degrees of freedom of this chi-squared distribution are closely related to the dimensions of the parameter spaces considered in the development of the GLRT. Studies of this asymptotic distribution indicate, however, that the rate of convergence is particularly slow for all but the simplest of problems and may therefore lack practicality. For this reason, we consider saddlepoint approximations as a practical alternative, leading to methods that can be used to determine the threshold needed to approximately achieve a desired false alarm probability.

We next turn our attention to an alternative implementation of the generalized Hadamard ratio for 2-dimensional wide-sense stationary random processes. Although the true GLRT for this problem would impose a Toeplitz structure (more specifically, a Toeplitz-block-Toeplitz structure) on the estimate of the composite covariance matrix, an intractable problem with no closed-form solution, the asymptotic theory of large Toeplitz matrices shows that the generalized Hadamard ratio converges to a broadband coherence statistic as the size of the composite covariance matrix grows large. Although this is an asymptotic result, simulations of several applications show that even finite-dimensional implementations of the broadband coherence statistic can provide a significant improvement in detection performance. This improvement is most likely attributable to the fact that, by constraining the model to incorporate stationarity, we have alleviated some of the difficulties associated with estimating highly parameterized models. Although more generally applicable, the unconstrained covariance estimates used in the generalized Hadamard ratio require the estimation of a much larger number of parameters.

These methods are then applied to the detection of underwater targets in pairs of high frequency and broadband sonar images coregistered over the seafloor. This is a difficult problem due to various factors such as variations in the operating and environmental conditions, the presence of spatially varying clutter, and variations in target shapes, compositions, and orientation. A comprehensive study of these methods is conducted using three sonar imagery datasets.
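As a concrete illustration of the generalized Hadamard ratio described above, the following sketch computes det(S) divided by the product of the determinants of the diagonal blocks of S, where S is the sample composite covariance of several jointly observed channels. It is a minimal numerical illustration of the form of the statistic, not the detector developed in the dissertation; the simulated channels, dimensions, and helper name are invented.

```python
import numpy as np

def generalized_hadamard_ratio(channels):
    """channels: list of (n_samples x dim_l) arrays observed jointly.
    Returns det(S) / prod_l det(S_ll), where S is the sample composite
    covariance of the stacked channels and S_ll are its diagonal blocks."""
    x = np.hstack(channels)                 # n x (d_1 + ... + d_L)
    s = np.cov(x, rowvar=False)             # composite covariance estimate
    ratio = np.linalg.det(s)
    start = 0
    for c in channels:
        d = c.shape[1]
        ratio /= np.linalg.det(s[start:start + d, start:start + d])
        start += d
    return ratio  # near 1 under independence, small when channels are coherent

rng = np.random.default_rng(1)
n, d = 500, 4
# Independent channels (null) versus channels sharing a common signal (alternative).
null_channels = [rng.standard_normal((n, d)) for _ in range(3)]
common = rng.standard_normal((n, 1))
corr_channels = [rng.standard_normal((n, d)) + common for _ in range(3)]

print("H0:", generalized_hadamard_ratio(null_channels))
print("H1:", generalized_hadamard_ratio(corr_channels))
```

Small values of the ratio indicate cross-channel coherence, which is the behavior the detector exploits.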

The first two datasets are actual images of objects lying on the seafloor, collected at different geographical locations, with the environments of each presenting unique challenges. These two datasets are used to demonstrate the usefulness of results pertaining to the null distribution of the generalized Hadamard ratio and to study the effects that different clutter environments can have on its applicability. They are also used to compare the performance of the broadband coherence detector to several alternative detection techniques. The third dataset contains actual images of the seafloor with synthetically generated targets of different geometrical shapes inserted into the images. The primary purpose of this dataset is to study the proposed detection technique's robustness to deviations from coregistration, which may occur in practice due to the disparities between high frequency and broadband sonar. Using the results of this section, we show that the fundamental principle of detecting underwater targets using coherence-based approaches is itself a very useful solution for this problem and that the broadband coherence statistic is adequately adept at achieving this.

Item Open Access
Inference for cumulative intraday return curves (Colorado State University. Libraries, 2018) Zheng, Ben, author; Kokoszka, Piotr S., advisor; Cooley, Dan, committee member; Miao, Hong, committee member; Zhou, Wen, committee member

The central theme of this dissertation is inference for cumulative intraday return (CIDR) curves computed from high frequency data. Such curves describe how the return on an investment evolves with time over a relatively short period. We introduce a functional factor model to investigate the dependence of cumulative return curves of individual assets on the market and other factors. We propose a new statistical test to determine whether this dependence is the same in two sample periods. The statistical power of the new test is validated by asymptotic theory and a simulation study. We apply this test to study the impact of the recent financial crisis and of trends in the oil price on individual stocks and Sector Exchange-Traded Funds (ETFs). Our analysis reveals that the functional approach has an information content different from that obtained from scalar factor models for point-to-point returns.

Motivated by the risk inherent in intraday investing, we propose several ways of quantifying the extremal behavior of a time series of curves. A curve can be extreme if it has a shape and/or magnitude much different from the bulk of observed curves. Our approach is at the nexus of Functional Data Analysis and Extreme Value Theory. The risk measures we propose allow us to assess probabilities of observing extreme curves not seen in a historical record. These measures complement risk measures based on point-to-point returns, but have a different interpretation and information content. Using our approach, we study how the financial crisis of 2008 impacted the extreme behavior of intraday cumulative return curves. We discover different impacts on shares in important sectors of the US economy. The information our analysis provides is in some cases different from the conclusions based on the extreme value analysis of daily closing price returns.

In a different direction, we investigate a large-scale multiple testing problem motivated by a biological study. We introduce mixed models to fit the longitudinal data and incorporate a bootstrap method to construct a false discovery rate (FDR) controlling procedure. A simulation study is implemented to show its effectiveness.
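For readers unfamiliar with the object of study, cumulative intraday return curves are conventionally defined as follows; this is the definition commonly used in the functional data analysis literature, stated here as background rather than quoted from the dissertation.

```latex
% Cumulative intraday return (CIDR) curve for trading day i,
% with P_i(t_j) the asset price at intraday time t_j:
\[
R_i(t_j) \;=\; 100\,\bigl[\ln P_i(t_j) - \ln P_i(t_1)\bigr],
\qquad j = 1, \dots, m,
\]
% so each curve starts at zero and records how the return on the asset
% evolves over the course of the trading day.
```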

Item Open Access
Moist static energy and the Madden-Julian oscillation: understanding initiation, maintenance and propagation through the application of novel diagnostics (Colorado State University. Libraries, 2014) Wolding, Brandon, author; Maloney, Eric, advisor; Johnson, Richard, committee member; Cooley, Dan, committee member

As the dominant mode of tropical intraseasonal variability, the Madden-Julian Oscillation (MJO) has enormous societal impacts. Despite four decades of research motivated by these impacts, the processes that drive the initiation, maintenance and propagation of the MJO are still poorly understood. The development of large scale moisture anomalies plays an important role in many recent theories of the MJO, including moisture mode theory. This study identifies processes that support the development, maintenance and propagation of moisture anomalies associated with the MJO.

A new set of objective MJO diagnostics, obtained as an extension of CEOF analysis, is introduced. These diagnostics provide useful measures of previously overlooked information yielded by CEOF analysis, including an objective measure that allows geographically disparate locations to be compared and contrasted throughout a reference MJO lifecycle. Compositing techniques based on this measure are applied to the MJO to determine the key physical processes affecting the MSE budget, identify prominent geographical variability of these processes, and highlight changes in the mean state winds and moisture field that explain this variability.

The MSE budget reveals that variations in MSE associated with the MJO are largely the result of variations in column integrated moisture content (~90%), the majority of which occur between 850-500 hPa (~75%). Easterly (westerly) low level wind anomalies to the east (west) of the MJO result in a reduction (enhancement) of drying due to horizontal advection, which is only partially offset by a reduction (enhancement) of surface latent heat flux. In the deep tropics (5°N-5°S) of the eastern hemisphere, anomalous horizontal advection is primarily the result of the anomalous winds acting on the mean state moisture gradient. Over the broader tropics (15°N-15°S), the anomalous horizontal advection appears to result primarily from the modulation of synoptic scale eddy activity. The incomplete cancellation that occurs between anomalous horizontal advection and anomalous surface latent heat flux allows for the enhancement (reduction) of MSE to the east (west) of the MJO, enhancing (reducing) convection and helping drive propagation of the MJO. Anomalous vertical moisture advection is the primary process maintaining moisture and MSE anomalies against dissipation by anomalous precipitation throughout the MJO lifecycle. Anomalously positive (negative) vertical moisture advection appears to slightly exceed anomalous precipitation during periods of enhanced (suppressed) convection, suggesting a potential positive feedback that could act to destabilize the MJO. Geographical changes in the MSE budget of the MJO are primarily associated with changes in the mean state winds and the mean state moisture gradient. These results suggest that MJO convective anomalies are maintained by anomalous vertical moisture advection, and that the propagation of these convective anomalies results from the large scale asymmetrical dynamical response to equatorial heating occurring in a specific arrangement of mean state winds and mean moisture gradient. The findings of this study support the hypothesis that the MJO is a moisture mode.
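For orientation, the column-integrated moist static energy budget underlying this type of analysis is commonly written in the following textbook form; the dissertation's exact notation and term decomposition may differ.

```latex
% Moist static energy and its column-integrated budget (angle brackets
% denote a mass-weighted vertical integral over the troposphere):
\[
h = c_p T + g z + L_v q ,
\qquad
\frac{\partial \langle h \rangle}{\partial t}
  = -\,\langle \mathbf{v}\cdot\nabla h \rangle
    - \Bigl\langle \omega \,\frac{\partial h}{\partial p} \Bigr\rangle
    + \mathrm{LH} + \mathrm{SH}
    + \langle \mathrm{LW} \rangle + \langle \mathrm{SW} \rangle ,
\]
% where the right-hand side contains horizontal and vertical advection,
% surface latent and sensible heat fluxes, and column longwave and
% shortwave radiative heating.
```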

Item Open Access
Nonparametric tests of spatial isotropy and a calibration-capture-recapture model (Colorado State University. Libraries, 2017) Weller, Zachary D., author; Hoeting, Jennifer A., advisor; Cooley, Dan, committee member; Hooten, Mevin, committee member; Ahola, Jason, committee member

In this dissertation we present applied, theoretical, and methodological advances in the statistical analysis of spatially-referenced and capture-recapture data. An important step in modeling spatially referenced data is choosing the spatial covariance function. Due to the development of a variety of covariance models, practitioners are faced with a myriad of choices for the covariance function. One of these choices is whether or not the covariance function is isotropic. Isotropy means that the covariance function depends only on the distance between observations in space and not on their relative direction. Part I of this dissertation focuses on nonparametric hypothesis tests of spatial isotropy. Statisticians have developed diagnostics, including graphical techniques and hypothesis tests, to assist in determining whether an assumption of isotropy is adequate. Nonparametric tests of isotropy are one subset of these diagnostic methods, and while the theory for several nonparametric tests has been developed, the efficacy of these methods in practice is less well understood.

To begin part I of this dissertation, we develop a comprehensive review of nonparametric hypothesis tests of isotropy for spatially-referenced data. Our review provides informative graphics and insight into how nonparametric tests fit into the bigger picture of modeling spatial data, along with considerations for choosing a test of isotropy. An extensive simulation study offers comparisons of method performance and recommendations for test implementation. Our review also gives rise to a number of open research questions. In the second section of part I, we develop and demonstrate software that implements several of the tests. Because the tests were not available in software, we created the R package spTest, which implements a number of nonparametric tests of isotropy. The package is open source and available on the Comprehensive R Archive Network (CRAN). We provide a detailed demonstration of how to use spTest for testing isotropy on two spatially-referenced data sets. We offer insights into test limitations and how the tests can be used in conjunction with graphical techniques to evaluate isotropy properties. To conclude our work with spatially-referenced data in part I, we develop a new nonparametric test of spatial isotropy using the spectral representation of the spatial covariance function. Our new test overcomes some of the shortcomings of other nonparametric tests. We develop theory that describes the distribution of our test statistic and explore the efficacy of our test via simulations and applications. We also note several difficulties in implementing the test, explore remedies to these difficulties, and propose several areas of future work.

Finally, in part II of this dissertation, we shift our focus from spatially-referenced data to capture-recapture data. Our capture-recapture work is motivated by methane concentration data collected by new mobile sensing technology. Because this technology is still in its infancy, there is a need to develop algorithms to extract meaningful information from the data. We develop a new Bayesian hierarchical capture-recapture model, which we call the calibration-capture-recapture (CCR) model. We use our model and methane data to estimate the number and emission rate of methane sources within an urban sampling region. We apply our CCR model to methane data collected in two U.S. cities. Our new CCR model provides a framework to draw inference from data collected by mobile sensing technologies. The methodology for our capture-recapture model is useful in other capture-recapture settings, and the results of our model are important for informing climate change and infrastructure discussions.
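To make the notion of isotropy concrete, the sketch below compares empirical semivariograms computed in two perpendicular directions on a gridded field; under isotropy the two curves should agree up to sampling noise. This is only an illustrative diagnostic in the spirit of the graphical techniques mentioned above, not one of the formal tests implemented in spTest, and the grid size, kernel, and lags are invented for the example.

```python
import numpy as np

def directional_semivariogram(field, lag, axis):
    """Empirical semivariogram at a given lag along one grid axis:
    0.5 * mean of squared differences of values `lag` cells apart."""
    n = field.shape[axis]
    a = np.take(field, range(lag, n), axis=axis)
    b = np.take(field, range(0, n - lag), axis=axis)
    return 0.5 * np.mean((a - b) ** 2)

rng = np.random.default_rng(2)

# An approximately isotropic field on a grid, generated by smoothing
# white noise equally in both directions.
noise = rng.standard_normal((200, 200))
kernel = np.ones(5) / 5.0
smooth = np.apply_along_axis(lambda r: np.convolve(r, kernel, mode="same"), 1, noise)
field = np.apply_along_axis(lambda c: np.convolve(c, kernel, mode="same"), 0, smooth)

for lag in (1, 2, 4):
    g_ew = directional_semivariogram(field, lag, axis=1)  # east-west
    g_ns = directional_semivariogram(field, lag, axis=0)  # north-south
    print(f"lag {lag}: east-west {g_ew:.3f} vs north-south {g_ns:.3f}")
# Roughly equal values in the two directions are consistent with isotropy;
# a systematic gap would suggest directional (anisotropic) dependence.
```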

Item Open Access
Some topics in high-dimensional robust inference and graphical modeling (Colorado State University. Libraries, 2021) Song, Youngseok, author; Zhou, Wen, advisor; Breidt, Jay, committee member; Cooley, Dan, committee member; Hoke, Kim, committee member

In this dissertation, we focus on large-scale robust inference and high-dimensional graphical modeling. In particular, we study three problems: a large-scale inference method based on tail-robust regression, model specification tests for the dependence structure of Gaussian Markov random fields, and robust Gaussian graph estimation.

First, we consider the problem of simultaneously testing a large number of general linear hypotheses, encompassing covariate-effect analysis, analysis of variance, and model comparisons. The new challenge that comes along with the overwhelmingly large number of tests is the ubiquitous presence of heavy-tailed and/or highly skewed measurement noise, which is the main reason for the failure of conventional least squares based methods. The new testing procedure is built on data-adaptive Huber regression and a new covariance estimator of the regression estimate. Under mild conditions, we show that the proposed methods produce consistent estimates of the false discovery proportion. Extensive numerical experiments, along with an empirical study on quantitative linguistics, demonstrate the advantage of our proposal over many state-of-the-art methods when the data are generated from heavy-tailed and/or skewed distributions.

In the next chapter, we focus on Gaussian Markov random fields (GMRFs) and, by utilizing the connection between GMRFs and precision matrices, we propose an easily implemented procedure to assess the spatial structures modeled by GMRFs based on spatio-temporal observations. The new procedure is flexible enough to assess a variety of structures, including isotropic and directional dependence as well as the Matern class. A comprehensive simulation study demonstrates the finite sample performance of the procedure. Motivated by efforts to model flu spread across the United States, we also apply our method to the Google Flu Trends data and report some very interesting epidemiological findings.

Finally, we propose a high-dimensional precision matrix estimation method via nodewise distributionally robust regressions. The distributionally robust regression with an ambiguity set defined by a Wasserstein-2 ball has a computationally tractable dual formulation, which is linked to square-root regressions. We propose an iterative algorithm that has a substantial advantage in terms of computation time. Extensive numerical experiments study the performance of the proposed method under various precision matrix structures and contamination models.

Item Open Access
Statistical upscaling of stochastic forcing in multiscale, multiphysics modeling (Colorado State University. Libraries, 2019) Vollmer, Charles T., author; Estep, Don, advisor; Tavener, Simon, committee member; Breidt, Jay, committee member; Cooley, Dan, committee member

Modeling nuclear radiation damage necessarily involves multiple scales in both time and space, where molecular-level models have drastically different assumptions and phenomena than continuum-level models. In this thesis, we propose a novel approach to explicitly coupling these multiple scales of microstructure radiation damage in materials. Our proposed stochastic process is a statistical upscaling from physical first principles that explicitly couples the micro, meso, and macro scales of materials under irradiation.

Item Open Access
Test of change point versus long-range dependence in functional time series (Colorado State University. Libraries, 2024) Meng, Xiangdong, author; Kokoszka, Piotr S., advisor; Cooley, Dan, committee member; Wang, Haonan, committee member; Miao, Hong, committee member

In scalar time series analysis, a long-range dependent (LRD) series cannot be easily distinguished from certain non-stationary models, such as the change-in-mean model with short-range dependent (SRD) errors. Specifically, realizations of LRD series typically exhibit a changing local mean if the time span taken into account is long enough, which resembles the behavior of change-in-mean models. Test procedures for distinguishing between these two types of model have been investigated extensively in the scalar case; see, e.g., Berkes et al. (2006) and Baek and Pipiras (2012) and references therein. However, no analogous test for functional observations has been developed yet, partly because methods and theory for analyzing functional time series with long-range dependence have been lacking. My dissertation establishes a procedure for testing change-in-mean models with SRD errors against LRD processes in the functional case, which is an extension of the method of Baek and Pipiras (2012). The test builds on the local Whittle (LW) (or Gaussian semiparametric) estimation of the self-similarity parameter, which is based on the estimated level 1 scores of a suitable functional residual process. Remarkably, unlike fully parametric methods such as Whittle estimation, whose asymptotic properties depend heavily on the validity of the assumed spectral density over the full frequency range (−π, π], LW estimation imposes mild restrictions on the spectral density only near the origin and is thus more robust to model misspecification. We prove that the test statistic based on LW estimation is asymptotically normally distributed under the null hypothesis and diverges to infinity under the LRD alternative.
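For orientation, local Whittle estimation of a memory parameter d from a scalar series with periodogram I(λ_j) at Fourier frequencies λ_j minimizes the following objective over the m lowest frequencies. This is the standard Robinson-type formulation, stated from general knowledge; the dissertation's exact setup, applied to estimated level 1 scores, may differ in detail.

```latex
% Local Whittle (Gaussian semiparametric) objective for the memory
% parameter d, using only the m lowest Fourier frequencies:
\[
\widehat{d} \;=\; \arg\min_{d}\;
  \Biggl[
    \log\!\Bigl( \frac{1}{m} \sum_{j=1}^{m} \lambda_j^{2d}\, I(\lambda_j) \Bigr)
    \;-\; \frac{2d}{m} \sum_{j=1}^{m} \log \lambda_j
  \Biggr],
\qquad \lambda_j = \frac{2\pi j}{n}, \quad m \ll n .
\]
% Only the behavior of the spectral density near the origin enters the
% objective, which is the robustness property noted in the abstract above.
```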

Item Open Access
Underwater target detection using multiple disparate sonar platforms (Colorado State University. Libraries, 2010) Klausner, Nicholas Harold, author; Azimi-Sadjadi, Mahmood R., advisor; Pezeshki, Ali, committee member; Cooley, Dan, committee member

The detection of underwater objects from sonar imagery presents a difficult problem due to various factors such as variations in the operating and environmental conditions, the presence of spatially varying clutter, and variations in target shapes, compositions, and orientation. Additionally, collecting data from multiple platforms raises more challenging questions, such as "how should I collaboratively perform detection to achieve optimal performance?", "how many platforms should be employed?", "when do we reach a point of diminishing returns when adding platforms?", or, more importantly, "when does adding an additional platform not help at all?". To perform multi-platform detection and answer these questions, we use the coherent information among all disparate sources of information and perform detection on the premise that the amount of coherent information will be greater when a target is present in a region of interest within an image than when our observation consists strictly of background clutter.

To exploit the coherent information among the different sources, we recast the standard Neyman-Pearson, Gauss-Gauss detector into the Multi-Channel Coherence Analysis (MCA) framework. The MCA framework allows one to optimally decompose the multi-channel data into a new, appropriate coordinate system in order to analyze their linear dependence or coherence in a more meaningful fashion. To do this, new expressions for the log-likelihood ratio and J-divergence are formulated in this multichannel coordinate system. Using the MCA framework, the data of each channel are first whitened individually, removing the second-order information from each channel. Then, a set of linear mapping matrices is obtained that maximizes the sum of the cross-correlations among the channels in the mapped domain. To perform detection in the coordinate system provided by MCA, we first construct a model suited to this multiple sensor platform problem and subsequently represent observations in their MCA coordinates associated with the H1 hypothesis. Performing detection in the MCA framework results in a log-likelihood ratio that is written in terms of the MCA correlations and mapping vectors as well as a local signal-to-noise ratio matrix. In this coordinate system, the J-divergence, which is a measure of the difference in the means of the likelihood ratio, can effectively be represented in terms of the multi-channel correlations and mapping vectors. Using this J-divergence expression, one can get a clearer picture of the amount of discriminatory information available for detection by analyzing the amount of coherent information present among the channels.

New analytical and experimental results are also presented to provide better insight into the effects of adding a new piece of data to the multi-channel Gauss-Gauss detector represented in the MCA framework. To answer questions like those posed in the first paragraph, one must carefully analyze the amount of discriminatory information that is brought to the detection process when adding observations from an additional channel. Rather than attempting to observe the increase (or lack thereof) in the full detection problem, it is advantageous to look at the change incrementally.

To accomplish this goal, new updating equations for the likelihood ratio are derived that involve linearly estimating the new data from the old (already existing) data and updating the likelihood ratio accordingly. In this case, the change in J-divergence can be written in terms of error covariance matrices under each hypothesis. We then derive a change of coordinate system that can be used to perform dimensionality reduction. This becomes especially useful when the data we wish to add exist in a high-dimensional space. To demonstrate the usefulness of log-likelihood updating, we conduct two simulation studies. The first simulation corresponds to detecting the presence of dynamical structure in observed data and involves a temporal updating scheme. The second is concerned with detecting the presence of a single narrow-band source using multiple linear sensor arrays, in which case we consider a platform (or channel) updating scheme.

A comprehensive study of the MCA-based detector is carried out on three data sets acquired from the Naval Surface Warfare Center (NSWC) in Panama City, FL. The first data set consists of one high frequency (HF) and three broadband (BB) side-looking sonar images coregistered over the same region on the sea floor, captured from an Autonomous Underwater Vehicle (AUV) platform. For this data set we consider three different detection schemes using different combinations of these sonar channels. The next data set consists of one HF and only one BB beamformed sonar image, again coregistered over the same region on the sea floor. This data set contains not only target objects but also lobster traps, giving us experimental intuition as to how the multi-channel correlations change for different object compositions. The use of multiple disparate sonar images, e.g., a high frequency, high resolution sonar with good target definition and a multitude of lower resolution broadband sonars with good clutter suppression ability, significantly improves the detection and false alarm rates compared to situations where only a single sonar is utilized. Finally, a data set consisting of synthetically generated images of targets with differing degrees of disparity, such as signal-to-noise ratio (SNR), aspect angle, and resolution, is used to conduct a thorough sensitivity analysis in order to study the effects of different SNRs, target types, and disparateness in aspect angle.
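As a loose illustration of coherence-based detection of the kind described in this abstract, the sketch below treats the two-channel special case: each channel is whitened, the singular values of the whitened cross-covariance (sample canonical correlations) are computed, and a statistic that grows with cross-channel coherence is formed. This is an illustrative simplification, not the MCA detector developed in the dissertation; the statistic, simulated data, and dimensions are assumptions made for the example.

```python
import numpy as np

def coherence_statistic(x, y):
    """Two-channel sketch: whiten each channel, take the singular values of
    the whitened cross-covariance (sample canonical correlations k_i), and
    return -sum_i log(1 - k_i**2), which grows with channel coherence."""
    def whiten(a):
        a = a - a.mean(axis=0)
        cov = np.cov(a, rowvar=False)
        evals, evecs = np.linalg.eigh(cov)
        return a @ evecs @ np.diag(evals ** -0.5) @ evecs.T
    xw, yw = whiten(x), whiten(y)
    cross = xw.T @ yw / (x.shape[0] - 1)
    k = np.clip(np.linalg.svd(cross, compute_uv=False), 0.0, 0.999999)
    return -np.sum(np.log1p(-k ** 2))

rng = np.random.default_rng(3)
n, d = 1000, 6
clutter_only = (rng.standard_normal((n, d)), rng.standard_normal((n, d)))
target = rng.standard_normal((n, 1))  # common component seen by both channels
with_target = (rng.standard_normal((n, d)) + target,
               rng.standard_normal((n, d)) + target)

print("clutter only  :", coherence_statistic(*clutter_only))
print("target present:", coherence_statistic(*with_target))
```

The larger value in the target-present case reflects the premise stated in the abstract: coherent information across channels increases when a common object is observed by both.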