Browsing by Author "Wang, Haonan, advisor"
Now showing 1 - 14 of 14
Item Open Access: A novel approach to statistical problems without identifiability (Colorado State University. Libraries, 2024)
Adams, Addison D., author; Wang, Haonan, advisor; Zhou, Tianjian, advisor; Kokoszka, Piotr, committee member; Shaby, Ben, committee member; Ray, Indrakshi, committee member
In this dissertation, we propose novel approaches to random coefficient regression (RCR) and to the recovery of mixing distributions under nonidentifiable scenarios. The RCR model extends the classical linear regression model to account for individual variation by treating the regression coefficients as random variables. A major interest lies in estimating the joint probability distribution of these random coefficients from observable samples of the outcome variable evaluated at different values of the explanatory variables. In Chapter 2, we consider fixed-design RCR models, under which the coefficient distribution is not identifiable. To tackle the challenges of nonidentifiability, we consider an equivalence class in which each element is a plausible coefficient distribution that, for each value of the explanatory variables, yields the same distribution for the outcome variable. In particular, we formulate the approximation of the coefficient distributions as a collection of stochastic inverse problems, allowing for a flexible nonparametric approach with minimal assumptions. An iterative approach is proposed to approximate the elements by incorporating an initial guess of a solution, called the global ansatz. We further study its convergence and demonstrate its performance through simulation studies. The proposed approach is applied to a real data set from an acupuncture clinical trial. In Chapter 3, we consider the problem of recovering a mixing distribution, given a component distribution family and observations from a compound distribution. Most existing methods are restricted in scope in that they are developed for particular component distribution families or continuity structures of mixing distributions. We propose a new, flexible nonparametric approach with minimal assumptions. Our proposed method iteratively steps closer to the desired mixing distribution, starting from a user-specified distribution, and we establish its convergence properties. Simulation studies examine the performance of the method, and we demonstrate its utility through applications to two real-world data sets: prostate cancer data and the word counts of Shakespeare's canon.
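To make the Chapter 3 setting (recovering a mixing distribution from compound observations, starting from a user-specified distribution) concrete, here is a minimal sketch of a standard grid-based EM iteration. It is a generic illustration, not the dissertation's algorithm: the Poisson component family, the grid, and the helper name em_mixing_distribution are all illustrative assumptions.

```python
import numpy as np
from scipy.stats import poisson

def em_mixing_distribution(y, grid, n_iter=500):
    """Grid-based EM for a mixing distribution with Poisson(theta) components.

    y    : observed counts from the compound (mixed) distribution
    grid : candidate support points theta_1, ..., theta_m
    Returns estimated mixing weights over the grid.
    """
    w = np.full(len(grid), 1.0 / len(grid))       # user-specified start: uniform
    lik = poisson.pmf(y[:, None], grid[None, :])  # n x m component likelihoods
    for _ in range(n_iter):
        post = lik * w                            # E-step: unnormalized posteriors
        post /= post.sum(axis=1, keepdims=True)
        w = post.mean(axis=0)                     # M-step: update mixing weights
    return w

rng = np.random.default_rng(0)
theta = rng.choice([1.0, 5.0], size=2000, p=[0.3, 0.7])  # true two-point mixing law
y = rng.poisson(theta)
w_hat = em_mixing_distribution(y, grid=np.linspace(0.1, 10, 60))
```

Each pass pulls the weights toward the nonparametric MLE of the mixing law; the fixed grid is the simplifying assumption that keeps the sketch short.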
Item Open Access: A penalized estimation procedure for varying coefficient models (Colorado State University. Libraries, 2015)
Tu, Yan, author; Wang, Haonan, advisor; Breidt, F. Jay, committee member; Chapman, Phillip, committee member; Luo, J. Rockey, committee member
Varying coefficient models are widely used for analyzing longitudinal data, and various methods for estimating coefficient functions have been developed over the years. We revisit the problem under the theme of functional sparsity. The problem of sparsity, including global sparsity and local sparsity, is a recurrent topic in nonparametric function estimation. A function has global sparsity if it is zero over the entire domain, indicating that the corresponding covariate is irrelevant to the response variable. A function has local sparsity if it is nonzero overall but equals zero on a set of intervals, identifying inactive periods of the corresponding covariate. Each type of sparsity has been addressed in the literature using the idea of regularization to improve estimation as well as interpretability. In this dissertation, a penalized estimation procedure is developed to achieve functional sparsity, that is, to address both types of sparsity simultaneously in a unified framework. We exploit the properties of B-spline approximation and group bridge penalization. Our method is illustrated in simulation studies and real data analysis, and it outperforms existing methods in identifying both local and global sparsity. Asymptotic properties of estimation consistency and sparsistency of the proposed method are established, where sparsistency refers to the property that the functional sparsity can be consistently detected.
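The B-spline-plus-group-bridge construction can be sketched in a few lines. The code below builds a B-spline design for two coefficient functions and minimizes a least-squares objective with a group-bridge-type penalty (exponent 1/2 on each group's L1 norm). All settings (basis size, lambda, the eps smoothing constant) are illustrative, and a smoothed surrogate replaces the exact nonsmooth penalty so that a generic optimizer applies; this is a sketch of the idea, not the dissertation's estimator.

```python
import numpy as np
from scipy.interpolate import BSpline
from scipy.optimize import minimize

def bspline_design(t, n_interior=6, k=3):
    """Cubic B-spline design matrix on [0, 1] with clamped boundary knots."""
    knots = np.r_[[0.0] * k, np.linspace(0, 1, n_interior + 2), [1.0] * k]
    n_basis = len(knots) - k - 1
    return np.column_stack(
        [BSpline(knots, np.eye(n_basis)[j], k)(t) for j in range(n_basis)]
    )

rng = np.random.default_rng(1)
n = 300
t = np.sort(rng.uniform(0, 1, n))
x1, x2 = rng.normal(size=(2, n))
y = x1 * np.sin(2 * np.pi * t) + rng.normal(0, 0.3, n)  # beta2 is globally sparse

B = bspline_design(t)
Z = np.hstack([x1[:, None] * B, x2[:, None] * B])       # varying-coefficient design
groups = [np.arange(B.shape[1]), B.shape[1] + np.arange(B.shape[1])]

def objective(g, lam=5.0, eps=1e-8):
    fit = np.sum((y - Z @ g) ** 2)
    # Smoothed group-bridge penalty: (sum of |coef| within a group)^(1/2).
    pen = sum(np.sum(np.sqrt(g[idx] ** 2 + eps)) ** 0.5 for idx in groups)
    return fit + lam * pen

g_hat = minimize(objective, np.zeros(Z.shape[1]), method="L-BFGS-B").x
beta1_hat = B @ g_hat[groups[0]]   # estimated coefficient function for x1
```

Shrinking a whole group's norm toward zero is what flags a globally irrelevant covariate, while zeroed sub-intervals of a fitted coefficient function flag local sparsity.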
Item Open Access: Analysis of structured data and big data with application to neuroscience (Colorado State University. Libraries, 2015)
Sienkiewicz, Ela, author; Wang, Haonan, advisor; Meyer, Mary, committee member; Breidt, F. Jay, committee member; Hayne, Stephen, committee member
Neuroscience research leads to a remarkable set of statistical challenges, many of them due to the complexity of the brain, its intricate structure, and its dynamical, non-linear, often non-stationary behavior. The challenge of modeling brain functions is magnified by the quantity and inhomogeneity of data produced by scientific studies. Here we show how to take advantage of advances in distributed and parallel computing to mitigate memory and processor constraints and to develop models of neural components and neural dynamics. First we consider the problem of function estimation and selection in time-series functional dynamical models. Our motivating application concerns the point-process spiking activity recorded from the brain, which poses major computational challenges for modeling even moderately complex brain functionality. We present a big data approach to the identification of sparse nonlinear dynamical systems using generalized Volterra kernels and their approximation by B-spline basis functions. The performance of the proposed method is demonstrated in experimental studies. We also consider a set of unlabeled tree objects with topological and geometric properties. For each data object, two curve representations are developed to characterize its topological and geometric aspects. We further define the notions of topological and geometric medians as well as quantiles based on both representations. In addition, we take a novel approach to defining Pareto medians and quantiles through a multi-objective optimization problem. In particular, we study two objective functions that measure topological variation and geometric variation, respectively. Analytical solutions are provided for topological and geometric medians and quantiles; for Pareto medians and quantiles in general, a genetic algorithm is implemented. The proposed methods are applied to analyze a data set of pyramidal neurons.

Item Open Access: Application of statistical and deep learning methods to power grids (Colorado State University. Libraries, 2023)
Rimkus, Mantautas, author; Kokoszka, Piotr, advisor; Wang, Haonan, advisor; Nielsen, Aaron, committee member; Cooley, Dan, committee member; Chen, Haonan, committee member
The structure of power flows in transmission grids is evolving and is likely to change significantly in the coming years due to the rapid growth of renewable energy generation, which introduces randomness and bidirectional power flows. Another transformative aspect is the increasing penetration of various smart-meter technologies. Inexpensive measurement devices can be placed at practically any component of the grid. As a result, traditional fault detection methods may no longer be sufficient, and there is growing interest in developing new methods to detect power grid faults. Using model data, we first propose a two-stage procedure for detecting a fault in a regional power grid. In the first stage, a fault is detected in real time; in the second stage, the faulted line is identified with a negligible delay. The approach uses only the voltage modulus measured at buses (nodes of the grid) as the input and does not require prior knowledge of the fault type. We further explore fault detection based on the high-frequency data streams that are becoming available in modern power grids. Our approach can be treated as an online (sequential) change point monitoring methodology. However, due to the mostly unexplored and very nonstandard structure of high-frequency power grid streaming data, substantial new statistical development is required to make this methodology practically applicable. The work includes the development of scalar detectors based on multichannel data streams, the determination of data-driven alarm thresholds, and an investigation of the performance and robustness of the new tools. Using a reasonably large database of faults, we calculate the frequencies of false and correct fault signals and recommend implementations that optimize these empirical success rates. Next, we extend our proposed method for fault localization in a regional grid to scenarios where partial observability limits the available data. While classification methods have been proposed for fault localization, their effectiveness depends on the availability of labeled data, which is often impractical in real-life situations. Our approach bridges the gap between partial and full observability of the power grid. We develop efficient fault localization methods that operate effectively even when only a subset of power grid bus data is available; this contributes to fault diagnosis in scenarios where the number of available phasor measurement unit devices is smaller than the number of buses in the grid. We propose using Graph Neural Networks in combination with statistical fault localization methods to localize faults in a regional power grid with minimal available data. Our contribution to the field of fault localization aims to enable the adoption of effective fault localization methods for future power grids.
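For the online change-point monitoring component, a minimal multichannel detector in the CUSUM family looks as follows. This is a textbook sketch rather than the dissertation's detector: the drift constant k and the threshold h stand in for the data-driven alarm thresholds the work develops, and the function name is hypothetical.

```python
import numpy as np

def online_fault_monitor(stream, mu0, sigma0, k=0.5, h=8.0):
    """Two-sided CUSUM over multiple channels (buses).

    stream      : (T, n_buses) array of voltage-modulus measurements
    mu0, sigma0 : per-bus baseline mean and std from fault-free data
    Returns (alarm time, triggering bus) or None if no alarm is raised.
    """
    z = (stream - mu0) / sigma0                 # standardize each channel
    s_hi = np.zeros(stream.shape[1])
    s_lo = np.zeros(stream.shape[1])
    for t, zt in enumerate(z):
        s_hi = np.maximum(0.0, s_hi + zt - k)   # upward-drift statistic
        s_lo = np.maximum(0.0, s_lo - zt - k)   # downward-drift statistic
        stat = np.maximum(s_hi, s_lo)
        if stat.max() > h:                      # h would be calibrated from data
            return t, int(stat.argmax())
    return None
```

Taking the maximum across channels is one simple way to reduce a multichannel stream to the scalar detector the abstract refers to.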
Item Open Access: Improved inference in heteroskedastic regression models with monotone variance function estimation (Colorado State University. Libraries, 2018)
Kim, Soo Young, author; Wang, Haonan, advisor; Meyer, Mary C., advisor; Fosdick, Bailey K., committee member; Opsomer, Jean D., committee member; Luo, J. Rockey, committee member
The problems associated with heteroskedasticity often lead to incorrect inferences in a regression model, especially when the form of the heteroskedasticity is obscure. In this dissertation, I present methods to estimate a variance function in a heteroskedastic regression model where the variance function is assumed to be smooth and monotone in a predictor variable. Maximum likelihood estimation of the variance function is derived under normal or double-exponential error distribution assumptions, based on regression splines and the cone projection algorithm. A penalized spline estimator is also introduced, which performs well when a spiking problem occurs at a boundary of the domain. The convergence rates of the estimated variance functions are derived, and simulations show that the estimates tend to be closer to the true variance function than those of the existing method in a variety of scenarios. The estimated variance functions from the proposed methods provide improved inference about the mean function, in terms of the coverage probability and average length of interval estimates. The utility of the method is illustrated through the analysis of real datasets such as LIDAR data, abalone data, California air pollution data, and U.S. temperature data. The methodology is implemented in the R package cgam. In addition to the variance function estimation method, a hypothesis testing procedure for a smooth and monotone variance function is discussed. The likelihood ratio test is introduced under normal or double-exponential error distribution assumptions, and comparisons of the proposed test with existing tests are conducted through simulations.
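The monotone-variance idea can be previewed with a rough analogue: the dissertation itself uses regression splines and the cone projection algorithm under normal or double-exponential likelihoods, but a quick sketch with an isotonic least-squares fit to squared residuals conveys the two-stage structure. The moving-average mean estimate and all constants below are illustrative.

```python
import numpy as np
from sklearn.isotonic import IsotonicRegression

rng = np.random.default_rng(2)
x = np.sort(rng.uniform(0, 1, 400))
sigma = 0.2 + 0.8 * x**2                       # smooth, monotone variance function
y = np.sin(2 * np.pi * x) + rng.normal(0, sigma)

# Stage 1: estimate the mean (here a crude moving average) and form residuals.
mean_hat = np.convolve(y, np.ones(15) / 15, mode="same")
r2 = (y - mean_hat) ** 2

# Stage 2: a monotone fit to the squared residuals estimates the variance function.
var_hat = IsotonicRegression(increasing=True).fit_transform(x, r2)
```

The monotonicity constraint plays the role of the cone restriction: the fitted variance can only increase in the predictor, which stabilizes the estimate where residuals are noisy.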
Item Open Access: Linear system design for compression and fusion (Colorado State University. Libraries, 2013)
Wang, Yuan, author; Wang, Haonan, advisor; Scharf, Louis L., advisor; Breidt, F. Jay, committee member; Luo, Rockey J., committee member
This is a study of measurement compression and fusion design. The idea common to both problems is that measurements can often be linearly compressed into lower-dimensional spaces without introducing too much excess mean-squared error or excess volume in a concentration ellipse. The question is how to design the compression to minimize these excesses at any given dimension. The first part of this work is motivated by sensing and wireless communication, where data compression or dimension reduction may be used to reduce the required communication bandwidth. High-dimensional measurements are converted into low-dimensional representations through linear compression. Our aim is to compress a noisy measurement, allowing for the fact that the compressed measurement will be transmitted over a noisy channel. We review optimal compression with no transmission noise and show its connection with canonical coordinates. When the compressed measurement is transmitted with noise, we give closed-form expressions for the optimal compression matrix with respect to the trace and determinant of the error covariance matrix. We show that the solutions are canonical coordinate solutions, scaled by coefficients that account for canonical correlations and transmission noise variance, followed by a coordinate transformation into the sub-dominant invariant subspace of the channel noise. The second part of this work is a problem of integrating multiple sources of measurements. We consider two multiple-input multiple-output channels, a primary channel and a secondary channel, with dependent input signals. The primary channel carries the signal of interest, and the secondary channel carries a signal that shares a joint distribution with the primary signal. The problem of particular interest is designing the secondary channel when the primary channel is fixed. We formulate this as an optimization problem in which the optimal secondary channel maximizes an information-based criterion, and an analytic solution is provided in a special case. Two fast-to-compute algorithms, one extrinsic and the other intrinsic, are proposed to approximate the optimal solutions in general cases. In particular, the intrinsic algorithm exploits the geometry of the unit sphere, a manifold embedded in Euclidean space. The performance of the proposed algorithms is examined through a simulation study, and a discussion of the choice of dimension for the secondary channel leads to rules for dimension reduction.

Item Open Access: Model selection based on expected squared Hellinger distance (Colorado State University. Libraries, 2007)
Cao, Xiaofan, author; Iyer, Hariharan K., advisor; Wang, Haonan, advisor
This dissertation is motivated by a general model selection problem in which the true model is unknown and one or more approximating parametric families of models are given, along with strategies for estimating the parameters using data. We develop model selection methods based on Hellinger distance that can be applied to a wide range of modeling problems without the typical assumptions that the true model lies within the approximating families or comes from a particular parametric family. We propose two estimators of the expected squared Hellinger distance as model selection criteria.

Item Open Access: Modulated renewal process models with functional predictors for neural connectivities (Colorado State University. Libraries, 2015)
Tan, Hongyu, author; Chapman, Phillip L., advisor; Wang, Haonan, advisor; Meyer, Mary C., committee member; Luo, J. Rockey, committee member
Recurrent event data arise in fields such as medicine, business, and the social sciences. In general, there are two types of recurrent event data: data from a relatively large number of independent processes exhibiting a relatively small number of recurrent events, and data from a relatively small number of processes generating a large number of events. We focus on the second type. Our motivating application is a collection of neuron spike trains from a rat brain, recorded during performance of a task. The goal is to model the intensity of events in the response spike train as a function of a set of predictor spike trains and of the spike history of the response itself. We propose a multiplicative modulated renewal process model that is similar to a Cox proportional hazards model. The model for the response intensity includes four components: (1) a baseline intensity, or hazard, function that captures the common pattern of time to next event; (2) a log-linear term that quantifies the impact of the predictor spike histories through coefficient functions; (3) a similar log-linear term for the response history; and (4) a log-linear regression-type term for external time-dependent variables. The coefficient functions for the predictor and response histories are approximated by B-spline basis functions, and model parameters are estimated by partial likelihood. Performance of the proposed methods is demonstrated through simulation. Simulations show that both the coefficient function estimates and the asymptotic standard error function estimates are accurate when the sample size is large. For small samples, simulations show that the smoothly clipped absolute deviation (SCAD) penalty outperforms the LASSO penalty and the unpenalized partial likelihood approach in identifying functional sparsity under various situations. The proposed methods are illustrated on a real spike train data set, in which substantial non-stationarity is identified.
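A simplified, discretized analogue of the history-effect idea can be fit as a Poisson GLM with basis-expanded spike-history covariates. This is not the modulated renewal model itself (there is no baseline hazard or partial likelihood here), and the Gaussian-bump lag basis below stands in for the B-spline expansion; all sizes, names, and settings are invented for illustration.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
T, p = 5000, 40                              # time bins; history depth in bins
pred = rng.binomial(1, 0.05, T)              # predictor spike train (0/1 per bin)

# Lagged history matrix: column l holds the predictor l+1 bins in the past.
H = np.column_stack([np.r_[np.zeros(l + 1), pred[:T - l - 1]] for l in range(p)])
true_kernel = 1.5 * np.exp(-np.arange(1, p + 1) / 8.0)
resp = rng.poisson(np.exp(-3.0 + H @ true_kernel))  # response spike counts

# Smooth basis over lags stands in for the B-spline expansion of the
# coefficient function; the GLM coefficients weight these basis columns.
lags, centers = np.arange(p), np.linspace(0, p - 1, 6)
Phi = np.exp(-0.5 * ((lags[:, None] - centers[None, :]) / 4.0) ** 2)
fit = sm.GLM(resp, sm.add_constant(H @ Phi), family=sm.families.Poisson()).fit()
kernel_hat = Phi @ fit.params[1:]            # estimated coefficient function over lag
```

Expanding the coefficient function in a small basis is what keeps the number of parameters manageable while still recovering a smooth history effect.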
Item Open Access: Multi-channel factor analysis: properties, extensions, and applications (Colorado State University. Libraries, 2024)
Stanton, Gray, author; Wang, Haonan, advisor; Scharf, Louis, advisor; Kokoszka, Piotr, committee member; Wang, Tianying, committee member; Luo, Jie, committee member
Multi-channel Factor Analysis (MFA) extends factor analysis to the multi-channel or multi-view setting, where latent common factors influence all channels while distinct factors are specific to individual channels. The within- and across-channel covariance is determined by a low-rank matrix, a block-diagonal matrix with low-rank blocks, and a diagonal matrix, which together provide a parsimonious model for both covariances. MFA and related multi-channel methods for data fusion are discussed in Chapter 1. Under conditions on the channel sizes and factor numbers, the results of Chapter 2 show that the generic global identifiability of the aforementioned covariance matrices can be guaranteed a priori, and the estimators obtained by maximizing a Gaussian likelihood are shown to be consistent and asymptotically normal even under misspecification. To handle temporal correlation in the latent factors, Chapter 3 introduces Multi-channel Factor Spectral Analysis (MFSA). Results on the identifiability and parameterization properties of the MFSA spectral density model are derived, and a Majorization-Minimization procedure to optimize the Whittle pseudo-likelihood is designed to estimate the MFSA parameters. A simulation study explores how temporal correlation in the latent factors affects estimation, and it is demonstrated that MFSA significantly outperforms MFA when the factor series are highly autocorrelated. In Chapter 4, a locally stationary joint multivariate Gaussian process with MFA-type cross-sectional covariance is developed to model multi-vehicle trajectories in a highway environment. A dynamic model-based clustering procedure is designed to partition cohorts of nearby vehicles into pods based on the stability of the intra-pod relative vehicle configuration. The performance of this procedure is illustrated by its application to the Next GENeration SIMulation dataset of vehicle trajectories on U.S. Highway 101.
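The covariance structure described above is easy to write down directly. The sketch below assembles Sigma = A A' (common factors) + B B' (block-diagonal, channel-specific factors) + D (diagonal noise) for three hypothetical channels and simulates from it; the channel sizes and factor numbers are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(4)
dims, r_common, r_distinct = [6, 5, 4], 2, [2, 1, 1]   # channel sizes, factor counts
n_total = sum(dims)

A = rng.normal(size=(n_total, r_common))               # loadings on common factors
B = np.zeros((n_total, sum(r_distinct)))               # block-diagonal distinct loadings
row = col = 0
for d, r in zip(dims, r_distinct):
    B[row:row + d, col:col + r] = rng.normal(size=(d, r))
    row, col = row + d, col + r
D = np.diag(rng.uniform(0.5, 1.0, n_total))            # idiosyncratic noise variances

# Within- and across-channel covariance: low-rank + block-diagonal low-rank + diagonal.
Sigma = A @ A.T + B @ B.T + D
X = rng.multivariate_normal(np.zeros(n_total), Sigma, size=500)  # stacked channels
```

Across-channel covariance comes only from the A A' term, which is what makes the decomposition identifiable under the conditions the abstract refers to.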
Item Open Access: Non-asymptotic properties of spectral decomposition of large Gram-type matrices with applications to high-dimensional inference (Colorado State University. Libraries, 2020)
Zhang, Lyuou, author; Zhou, Wen, advisor; Wang, Haonan, advisor; Breidt, Jay, committee member; Meyer, Mary, committee member; Yang, Liuqing, committee member
Jointly modeling a large and possibly divergent number of temporally evolving subjects arises ubiquitously in statistics, econometrics, finance, biology, and the environmental sciences. To circumvent the challenges due to high dimensionality as well as temporal and/or contemporaneous dependence, the factor model and its variants have been widely employed. In general, they model large-scale temporally dependent data using low-dimensional structures that capture the variation shared across dimensions. In this dissertation, we investigate the non-asymptotic properties of the spectral decomposition of high-dimensional Gram-type matrices based on factor models. Specifically, we derive exponential tail bounds for the first and second moments of the deviation between the empirical and population eigenvectors of the right Gram matrix, as well as a Berry-Esseen type bound that characterizes the Gaussian approximation of these deviations. We also obtain a non-asymptotic tail bound for the ratio between the eigenvalues of the left Gram matrix, namely the sample covariance matrix, and their population counterparts, regardless of the size of the data matrix. The documented non-asymptotic properties are further demonstrated in a suite of applications, including the non-asymptotic characterization of the estimated number of latent factors in factor models and related machine learning problems, the estimation and forecasting of high-dimensional time series, the spectral properties of large sample covariance matrices such as perturbation bounds and inference on spectral projectors, and low-rank matrix denoising from temporally dependent data. Next, we consider the estimation and inference of a flexible subject-specific heteroskedasticity model for large-scale panel data, which employs a latent semiparametric factor structure to simultaneously account for heteroskedasticity across subjects and contemporaneous and/or serial correlations. Specifically, the subject-specific heteroskedasticity is modeled by the product of an unobserved factor process and a subject-specific covariate effect. Serving as the loading, the covariate effect is further modeled via additive models. We propose a two-step estimation procedure and document its theoretical validity. By scrupulously examining the non-asymptotic rates for recovering the latent factor process and its loading, we show the consistency, asymptotic efficiency, and asymptotic normality of our regression coefficient estimator, which leads to a more efficient confidence set for the regression coefficient. A comprehensive simulation study demonstrates the finite-sample performance of our procedure, and the numerical results corroborate the theoretical findings. Finally, we consider factor model-assisted variable clustering for temporally dependent data, in which the population-level clusters are characterized by the latent factors of the model. We combine the approximate factor model with population-level clusters to give an integrative group factor model as a background model for variable clustering. In this model, variables are loaded on latent factors; the factors are the same for variables from a common cluster and differ across clusters. The commonality among clusters is modeled by common factors, and the clustering structure is modeled by the unique factors of each cluster. We quantify the difficulty of clustering data generated from the integrative group factor model in terms of a permutation-invariant clustering error, develop an algorithm to recover the clustering assignments, and study its minimax-optimality. The analysis of the integrative group factor model and our proposed algorithm partitions a two-dimensional phase space into three regions, showing the impact of the parameters on the possibility of clustering in the integrative group factor model and the statistical guarantee of the proposed algorithm. We also obtain a non-asymptotic characterization of the estimated number of latent factors, and the model can be extended to the case of a diverging number of clusters with similar results.
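One of the applications mentioned, estimating the number of latent factors, can be illustrated with the familiar eigenvalue-ratio criterion applied to the sample covariance (left Gram) matrix. This is a standard device used here only to show the quantities the tail bounds concern, not the dissertation's own characterization.

```python
import numpy as np

def eigenvalue_ratio_factors(X, r_max=10):
    """Estimate the number of latent factors via the eigenvalue-ratio criterion.

    X : (n, p) data matrix. The largest ratio of consecutive eigenvalues of
    the sample covariance marks the gap between signal and noise spectrum.
    """
    n, _ = X.shape
    Xc = X - X.mean(axis=0)
    evals = np.linalg.eigvalsh(Xc.T @ Xc / n)[::-1]   # descending eigenvalues
    ratios = evals[:r_max] / evals[1:r_max + 1]
    return int(np.argmax(ratios) + 1)

rng = np.random.default_rng(5)
F = rng.normal(size=(600, 3))                         # three latent factors
L = rng.normal(size=(40, 3))
X = F @ L.T + rng.normal(scale=0.5, size=(600, 40))   # panel of 40 series
print(eigenvalue_ratio_factors(X))                    # typically recovers 3
```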
Item Open Access: Nonparametric function smoothing: fiducial inference of free knot splines and ecological applications (Colorado State University. Libraries, 2010)
Sonderegger, Derek Lee, author; Wang, Haonan, advisor; Hannig, Jan, advisor; Noon, Barry R. (Barry Richard), 1949-, committee member; Iyer, Hariharan K., committee member
Nonparametric function estimation has proven to be a useful tool for applied statisticians. Classic techniques such as locally weighted regression and smoothing splines are being used in a variety of circumstances to address questions at the forefront of ecological theory. We first examine an ecological threshold problem, defining a threshold as a point where the derivative of the estimated function changes state (negative, possibly zero, or positive), and present a graphical method that examines these state changes across a wide interval of smoothing levels. We apply this method to macroinvertebrate data from the Arkansas River. Next we investigate a measurement error model and a generalization of the commonly used regression calibration method whereby a nonparametric function is used in place of a linear function. We present a simulation study to assess the effectiveness of the method and apply it to a water quality monitoring data set. The possibility of defining thresholds as knot point locations in smoothing splines leads to an investigation of the fiducial distribution of free-knot splines. After introducing the theory behind fiducial inference, we derive conditions sufficient for asymptotic normality of the multivariate fiducial density. We then derive the fiducial density for a spline of arbitrary degree with an arbitrary number of knot points, and we show that free-knot splines of degree 3 or greater satisfy the asymptotic normality conditions. Finally, we conduct a simulation study to assess the quality of the fiducial solution compared to three other commonly used methods.
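The graphical threshold idea, tracking where the estimated derivative changes state as the smoothing level varies, can be sketched with smoothing splines. The smoothing values, the 0.05 "near zero" band, and the simulated change-point data below are all illustrative choices, not the dissertation's settings.

```python
import numpy as np
from scipy.interpolate import UnivariateSpline

rng = np.random.default_rng(6)
x = np.sort(rng.uniform(0, 10, 200))
y = np.where(x < 6, 2.0, 2.0 - 0.8 * (x - 6)) + rng.normal(0, 0.3, 200)  # flat, then declining

grid = np.linspace(0.2, 9.8, 500)
for s in [5, 20, 80]:                          # a range of smoothing levels
    deriv = UnivariateSpline(x, y, s=s).derivative()(grid)
    # Classify the derivative as negative / approximately zero / positive.
    state = np.sign(np.where(np.abs(deriv) < 0.05, 0.0, deriv))
    changes = grid[np.nonzero(np.diff(state))[0]]
    print(f"s={s}: derivative changes state near {np.round(changes, 2)}")
```

State changes that persist across smoothing levels are the candidate thresholds; spurious ones at low smoothing disappear as s grows.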
Item Open Access: Parametric and semiparametric model estimation and selection in geostatistics (Colorado State University. Libraries, 2012)
Chu, Tingjin, author; Wang, Haonan, advisor; Zhu, Jun, advisor; Meyer, Mary, committee member; Luo, J. Rockey, committee member
This dissertation is focused on geostatistical models, which are useful in many scientific disciplines, such as climatology, ecology, and environmental monitoring. In the first part, we consider variable selection in spatial linear models with Gaussian process errors. Penalized maximum likelihood estimation (PMLE), which enables simultaneous variable selection and parameter estimation, is developed, and for ease of computation, PMLE is approximated by one-step sparse estimation (OSE). To further improve computational efficiency, particularly with large sample sizes, we propose penalized maximum covariance-tapered likelihood estimation (PMLET) and its one-step sparse estimation (OSET). General forms of penalty functions, with an emphasis on the smoothly clipped absolute deviation penalty, are used for penalized maximum likelihood. Theoretical properties of PMLE and OSE, as well as their covariance-tapered approximations PMLET and OSET, are derived, including consistency, sparsity, asymptotic normality, and the oracle properties. For covariance tapering, a by-product of our theoretical results is the consistency and asymptotic normality of maximum covariance-tapered likelihood estimates. Finite-sample properties of the proposed methods are demonstrated in a simulation study, and, for illustration, the methods are applied to analyze two real data sets. In the second part, we develop a new semiparametric approach to geostatistical modeling and inference. In particular, we consider a geostatistical model with additive components, where the covariance function of the spatial random error is not pre-specified and is thus flexible. A novel local Karhunen-Loève expansion is developed, and a likelihood-based method is devised for estimating the model parameters. In addition, statistical inference, including spatial interpolation and variable selection, is considered. Our proposed computational algorithm utilizes Newton-Raphson on a Stiefel manifold and is computationally efficient. A simulation study demonstrates sound finite-sample properties, and a real data example is given to illustrate our method. While the numerical results are comparable to maximum likelihood estimation under the true model, our method is shown to be more robust against model misspecification and is computationally far more efficient for larger sample sizes. Finally, the theoretical properties of the estimates are explored; in particular, a consistency result is established.
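Covariance tapering replaces the dense covariance with its elementwise (Schur) product against a compactly supported correlation, so most entries become exactly zero and sparse linear algebra applies. The sketch below uses an exponential covariance and a Wendland-type taper with made-up ranges and nugget, purely to illustrate the tapered likelihood evaluation; it is not the PMLET procedure itself.

```python
import numpy as np
from scipy.spatial.distance import cdist

def wendland_taper(h, gamma):
    """Wendland-type taper: positive definite, exactly zero beyond range gamma."""
    u = np.clip(h / gamma, 0.0, 1.0)
    return (1 - u) ** 4 * (4 * u + 1)

rng = np.random.default_rng(7)
S = rng.uniform(0, 10, size=(300, 2))                  # spatial locations
h = cdist(S, S)
C = np.exp(-h / 2.0)                                   # exponential covariance
C_tap = C * wendland_taper(h, gamma=3.0)               # tapered: mostly exact zeros

y = rng.multivariate_normal(np.zeros(300), C + 0.1 * np.eye(300))
K = C_tap + 0.1 * np.eye(300)                          # tapered covariance + nugget
_, logdet = np.linalg.slogdet(K)
quad = y @ np.linalg.solve(K, y)
loglik_tapered = -0.5 * (logdet + quad + 300 * np.log(2 * np.pi))
print(f"{np.mean(C_tap == 0):.0%} of tapered entries are exactly zero")
```

In a real implementation the zeros would be exploited through sparse factorizations, which is the source of the computational gain the abstract describes.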
Item Open Access: Sliced inverse approach and domain recovery for stochastic inverse problems (Colorado State University. Libraries, 2021)
Chi, Jiarui, author; Wang, Haonan, advisor; Estep, Don, advisor; Breidt, F. Jay, committee member; Tavener, Simon, committee member; Zhou, Wen, committee member
This dissertation tackles several critical challenges related to the Stochastic Inverse Problem (SIP), which is used to perform scientific inference and prediction for complex physical systems characterized by mathematical models, e.g., differential equations. We treat both discrete and continuous cases. The SIP concerns inferring the values, and quantifying the uncertainty, of the inputs of a model, which are considered random and unobservable quantities governing system behavior, by using observational data on the model outputs. Uncertainty of the inputs is quantified through probability distributions on the input domain that induce the probability distribution on the outputs realized by the observational data. The formulation of the SIP is based on rigorous measure-theoretic probability theory and uses all the information encapsulated in both the model and the data. We introduce a problem in which a portion of the inputs can be observed and varied to study the hidden inputs, and we employ a formulation that uses the knowledge in multiple experiments obtained by varying the observable inputs. Since the map that the model induces is typically not one-to-one, an ansatz, i.e., an assumption of some prior information, must be imposed in order to determine a specific solution of the SIP. The resulting solution is heavily conditioned on the observable inputs, and we seek to combine solutions from different values of the observable inputs in order to reduce that dependence. We propose an approach for combining the individual solutions based on the framework of the Dempster-Shafer theory, which removes the dependency on the experiments as well as on the ansatz and provides useful distributional information about the unobservable inputs, more specifically, about the ansatz. We develop an iterative algorithm that updates the ansatz information in order to obtain a best form of the solution across all experiments. The philosophy of Bayesian approaches is similar to that of the SIP in the sense that both consider the model inputs as random variables and both seek to update the unobservable solution using information obtained from observations. We extend the classical Bayesian approach in the context of the SIP by incorporating the knowledge of the model. The input domain is a pre-specified condition for the SIP, given by scientific knowledge, and is often assumed to be a compact metric space. The supports of the probability distributions computed in the SIP are restricted to this domain, so an inappropriate choice of domain can cause a massive loss of information in the solutions. In the same spirit, we combine the individual solutions from multiple experiments to recover a unique domain among the many choices of domain induced by the distribution of the inputs in general cases. In particular, results on the convergence of the domain recovery in linear models are investigated.
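The combination step can be illustrated with Dempster's rule on a discretized input domain. The two mass functions below are invented stand-ins for per-experiment solutions, and the sketch shows only the rule itself, not the dissertation's iterative ansatz-updating algorithm.

```python
from itertools import product

def dempster_combine(m1, m2):
    """Dempster's rule of combination for mass functions over frozensets.

    m1, m2 : dicts mapping frozenset (focal elements of the discretized
    input domain) to mass. Returns the normalized combined mass function.
    """
    combined, conflict = {}, 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            combined[inter] = combined.get(inter, 0.0) + wa * wb
        else:
            conflict += wa * wb                 # mass assigned to disagreement
    return {s: w / (1 - conflict) for s, w in combined.items()}

# Two experiments (observable-input settings), each constraining the hidden input.
m_exp1 = {frozenset({1, 2, 3}): 0.7, frozenset({1, 2, 3, 4, 5}): 0.3}
m_exp2 = {frozenset({2, 3, 4}): 0.6, frozenset({1, 2, 3, 4, 5}): 0.4}
print(dempster_combine(m_exp1, m_exp2))
```

Intersecting focal elements concentrates belief on inputs consistent with every experiment, which is what reduces the dependence on any single observable-input setting.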
Item Open Access: Statistical modeling and inference for spatial and spatio-temporal data (Colorado State University. Libraries, 2019)
Liu, Jialuo, author; Wang, Haonan, advisor; Breidt, F. Jay, committee member; Kokoszka, Piotr S., committee member; Luo, Rockey J., committee member
Spatio-temporal processes with a continuous index in space and time are encountered in many scientific disciplines, such as climatology, the environmental sciences, and public health. A fundamental component for modeling such spatio-temporal processes is the covariance function, which is traditionally assumed to be stationary. While convenient, this stationarity assumption can be unrealistic in many situations. In the first part of this dissertation, we develop a new class of locally stationary spatio-temporal covariance functions. A novel spatio-temporal expanding distance (STED) asymptotic framework is proposed to study the properties of statistical inference. The STED asymptotic framework is established on a fixed spatio-temporal domain, aiming to characterize spatio-temporal processes that are globally nonstationary in a rescaled fixed domain and locally stationary in a distance-expanding domain. The utility of STED is illustrated by establishing the asymptotic properties of maximum likelihood estimation for a general class of spatio-temporal covariance functions, as well as by a simulation study which suggests sound finite-sample properties. We then address the problem of simultaneous estimation of the mean and covariance functions for continuously indexed spatio-temporal processes. A flexible spatio-temporal model with partially linear regression in the mean function and local stationarity in the covariance function is proposed, and we study a profile likelihood method for estimation in the presence of spatio-temporally correlated errors. Specifically, for the nonparametric component, we employ a family of bimodal kernels to alleviate bias, which may be of independent interest for semiparametric spatial statistics. The theoretical properties of our profile likelihood estimation, including consistency and asymptotic normality, are established. A simulation study corroborates our theoretical findings, while a health hazard data example further illustrates the methodology. Maximum likelihood estimation for irregularly spaced spatial datasets is computationally intensive, as it involves the manipulation of sizable dense covariance matrices, and finding the exact likelihood is generally impractical, especially for large datasets. In the third part, we present an approximation to the Gaussian log-likelihood function using Krylov subspace methods. This approach reduces the computational complexity from O(N³) operations to O(N²) for dense matrices, and further to quasi-linear complexity if the matrices are sparse. Specifically, we implement the conjugate gradient method to solve linear systems iteratively, and we use a Monte Carlo method together with a Gauss quadrature rule to obtain a stochastic estimator of the log-determinant. We give conditions that ensure consistency of the estimators. Simulation studies explore various important computational aspects, including complexity, accuracy, and efficiency. We also apply the proposed method to estimate the spatial structure of a big LiDAR dataset.
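The Krylov strategy pairs conjugate gradient solves for the quadratic form with a stochastic log-determinant estimate. Below is a compact sketch combining CG with stochastic Lanczos quadrature (Rademacher probes plus Gauss quadrature built from the Lanczos tridiagonal matrix). The tridiagonal test matrix, probe count, and Lanczos depth are arbitrary choices, and no reorthogonalization is performed, so this is an illustration of the general approach rather than the dissertation's implementation.

```python
import numpy as np
from scipy.sparse import diags
from scipy.sparse.linalg import cg
from scipy.linalg import eigh_tridiagonal

def slq_logdet(matvec, n, n_probe=20, m=30, seed=0):
    """Stochastic Lanczos quadrature estimate of log det(A) for symmetric
    positive definite A, accessed only through matrix-vector products."""
    rng = np.random.default_rng(seed)
    est = 0.0
    for _ in range(n_probe):
        v = rng.choice([-1.0, 1.0], size=n)
        v /= np.linalg.norm(v)                      # normalized Rademacher probe
        alphas, betas = [], []
        w = matvec(v)
        a = v @ w
        w = w - a * v
        alphas.append(a)
        for _ in range(m - 1):                      # Lanczos three-term recurrence
            b = np.linalg.norm(w)
            if b < 1e-10:
                break
            v_prev, v = v, w / b
            w = matvec(v) - b * v_prev
            a = v @ w
            w = w - a * v
            alphas.append(a)
            betas.append(b)
        theta, U = eigh_tridiagonal(alphas, betas)  # quadrature nodes and weights
        est += n * np.sum(U[0] ** 2 * np.log(theta))
    return est / n_probe

n = 2000
A = diags([-0.3, 1.0, -0.3], [-1, 0, 1], shape=(n, n), format="csr")  # sparse SPD
y = np.random.default_rng(1).normal(size=n)

x, _ = cg(A, y)                                     # CG solve for the quadratic form
loglik = -0.5 * (slq_logdet(lambda v: A @ v, n) + y @ x + n * np.log(2 * np.pi))
```

Because both ingredients touch A only through matrix-vector products, the cost scales with the number of nonzeros, which is the source of the quasi-linear complexity for sparse matrices noted above.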