Browsing by Author "Zhou, Wen, committee member"
Now showing 1 - 16 of 16
Item Open Access Automatically detecting task unrelated thoughts during conversations using keystroke analysis (Colorado State University. Libraries, 2022) Kuvar, Vishal Kiran, author; Blanchard, Nathaniel, advisor; Mills, Caitlin, advisor; Ben-Hur, Asa, committee member; Zhou, Wen, committee member

Task-unrelated thought (TUT), commonly known as daydreaming or zoning out, is a mental state in which a person's attention moves away from the task at hand to self-generated thoughts. This state is extremely common, yet little is known about it during dyadic interactions. We built a model to detect when a person experiences TUTs while talking to another person through a chat platform, by analyzing their keystroke patterns. This model differentiated between task-unrelated and task-related thoughts with a kappa of 0.343, a strong indicator that typing behavior is linked with mental states, in our case task-unrelated thoughts.

Item Open Access Evolutionary increase in genome size drives changes in cell biology and organ structure (Colorado State University. Libraries, 2022) Itgen, Michael Walter, author; Mueller, Rachel Lockridge, advisor; Sloan, Daniel B., committee member; Hoke, Kim L., committee member; Zhou, Wen, committee member

The evolution of large genome size is associated with patterns of phenotypic change in cell and organismal biology. The most fundamental of these is the relationship between genome size and cell size, which is strongly positive and deterministic. As a result, increases in cell size alter the structure and function of the cell. Genome and cell size, together, are hypothesized to produce emergent consequences for development and physiology at the cellular and organismal levels. My dissertation aims to better understand these patterns and to identify potential mechanisms underlying these phenotypic changes.
I test for the effects of genome and cell size on cell function, cellular physiology, and organ morphology by leveraging the natural variation in genome size found in salamanders (genus Plethodon). First, I show that transcriptomic data support the prediction that large genome and cell size have functional consequences for cell biology. I also reject the hypothesis that large cell size is functionally linked to lower metabolic rate at the cellular level, but I provide transcriptomic evidence that cell size alters the metabolic state of cells. Finally, I show that genome and cell size drive morphological change in organ-specific ways in the heart and liver. I conclude that large cell size does not lower metabolic rate in salamanders. As an alternative, I propose that the evolution of low metabolic rate lifts the constraint on cell size, thus permitting the evolution of genome gigantism.

Item Open Access Independent measurement of the T2K near detector constraint using the off-axis pi-zero detector (Colorado State University. Libraries, 2019) Hogan, Matthew Gregory, author; Toki, Walter, advisor; Wilson, Robert, committee member; Buchanan, Norman, committee member; Zhou, Wen, committee member

The Tokai to Kamioka (T2K) experiment is a long-baseline neutrino oscillation experiment hosted in Japan searching for electron neutrino appearance in a high-purity muon neutrino beam. To constrain the systematic uncertainties in the oscillation analysis, a dedicated near detector (ND) complex called ND280 is located 280 meters from the neutrino production source, in line with the beam. To date, the Fine-Grained Detector (FGD) in ND280 has provided the ND constraint using a binned maximum-likelihood fit. This thesis describes the effort to validate the ND constraint using the same framework, but with an independent data set from the ND280 pi-zero detector (PØD).
Expanding on previously developed PØD selections, new selections have been developed to select neutrino and antineutrino events in one-track and multiple-track topologies on water and carbon. These selections are shown to have similar sensitivity to the T2K flux and cross-section systematic uncertainties. Using the same parameterization as the official ND constraint result, a hypothesis test was conducted between the PØD-only and FGD-only data fit results. A p-value of 0.2865 was obtained, indicating that the two data sets likely describe the same population of neutrinos and their interactions in T2K.

Item Open Access Inference for cumulative intraday return curves (Colorado State University. Libraries, 2018) Zheng, Ben, author; Kokoszka, Piotr S., advisor; Cooley, Dan, committee member; Miao, Hong, committee member; Zhou, Wen, committee member

The central theme of this dissertation is inference for cumulative intraday return (CIDR) curves computed from high-frequency data. Such curves describe how the return on an investment evolves with time over a relatively short period. We introduce a functional factor model to investigate the dependence of cumulative return curves of individual assets on the market and other factors. We propose a new statistical test to determine whether this dependence is the same in two sample periods. The statistical power of the new test is validated by asymptotic theory and a simulation study. We apply this test to study the impact of the recent financial crisis and of trends in the oil price on individual stocks and Sector Exchange-Traded Funds (ETFs). Our analysis reveals that the functional approach has an information content different from that obtained from scalar factor models for point-to-point returns. Motivated by the risk inherent in intraday investing, we propose several ways of quantifying the extremal behavior of a time series of curves.
A curve can be extreme if its shape and/or magnitude differ substantially from the bulk of observed curves. Our approach is at the nexus of Functional Data Analysis and Extreme Value Theory. The risk measures we propose allow us to assess probabilities of observing extreme curves not seen in a historical record. These measures complement risk measures based on point-to-point returns, but have different interpretation and information content. Using our approach, we study how the financial crisis of 2008 impacted the extreme behavior of intraday cumulative return curves. We discover different impacts on shares in important sectors of the US economy. The information our analysis provides is in some cases different from the conclusions based on the extreme value analysis of daily closing price returns. In a different direction, we investigate a large-scale multiple testing problem motivated by a biological study. We introduce mixed models to fit the longitudinal data and incorporate a bootstrap method to construct a false discovery rate (FDR) controlling procedure. A simulation study demonstrates its effectiveness.

Item Open Access Inference for functional time series with applications to yield curves and intraday cumulative returns (Colorado State University. Libraries, 2016) Young, Gabriel J., author; Kokoszka, Piotr S., advisor; Miao, Hong, committee member; Breidt, F. Jay, committee member; Zhou, Wen, committee member

Econometric and financial data often take the form of a functional time series. Examples include yield curves, intraday price curves, and term structure curves. Before an attempt is made to statistically model or predict such a series, we must address whether it can be assumed stationary or trend stationary. We develop extensions of the KPSS stationarity test to functional time series.
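The classical scalar KPSS statistic that this functional extension builds on is simple to compute. The sketch below implements the level-stationarity form with a Bartlett-weighted long-run variance estimate; it is a textbook illustration only, not the functional test developed in the dissertation:

```python
import numpy as np

def kpss_stat(y, lags=0):
    """KPSS level-stationarity statistic: large values reject stationarity."""
    y = np.asarray(y, dtype=float)
    n = y.size
    e = y - y.mean()                      # residuals from a constant mean
    s = np.cumsum(e)                      # partial-sum process
    # Newey-West long-run variance with Bartlett weights
    lrv = np.dot(e, e) / n
    for k in range(1, lags + 1):
        w = 1 - k / (lags + 1)
        lrv += 2 * w * np.dot(e[:-k], e[k:]) / n
    return s.dot(s) / (n**2 * lrv)

rng = np.random.default_rng(0)
noise = rng.normal(size=500)   # stationary series
walk = np.cumsum(noise)        # random walk: not level-stationary

# Under the KPSS null (stationarity) the statistic stays small;
# the random walk should produce a much larger value.
print(kpss_stat(noise, lags=5), kpss_stat(walk, lags=5))
```

Comparing the computed statistic to tabulated critical values then gives the stationarity decision; the functional version replaces the scalar partial sums with partial sums of curves.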
Motivated by the problem of a change in the mean structure of yield curves, we also introduce several change point methods applied to dynamic factor models. For all testing procedures, we include a complete asymptotic theory, a simulation study, illustrative data examples, and details of the numerical implementation. The impact of scheduled macroeconomic announcements has been shown to account for sizable fractions of total annual realized stock returns. To assess this impact, we develop methods of derivative estimation which utilize a functional analogue of local-polynomial smoothing. The resulting confidence bands are then used to find time intervals of statistically increasing cumulative returns.

Item Open Access Infinite dimensional stochastic inverse problems (Colorado State University. Libraries, 2018) Yang, Lei, author; Estep, Donald, advisor; Breidt, F. Jay, committee member; Tavener, Simon, committee member; Zhou, Wen, committee member

In many disciplines, mathematical models such as differential equations are used to characterize physical systems. The model induces a complex nonlinear measurable map from the domain of physical parameters to the range of observable Quantities of Interest (QoI), computed by applying a set of functionals to the solution of the model. Often the parameters cannot be directly measured, and one is confronted with the task of inferring information about the values of the parameters given measured or imposed information about the values of the QoI. In such applications, there is generally significant uncertainty in the measured values of the QoI. Uncertainty is often modeled using probability distributions. For example, a probability structure imposed on the domain of the parameters induces a corresponding probability structure on the range of the QoI. This is the well-known Stochastic Forward Problem, typically solved using a variation of the Monte Carlo method.
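The stochastic forward problem just described can be illustrated with a toy Monte Carlo push-forward. The model map `qoi`, the parameter domain, and the distribution below are invented for illustration only:

```python
import numpy as np

# Toy QoI map q(lambda): a stand-in for "apply functionals to the
# solution of a model" for a scalar parameter lambda.
def qoi(lam):
    return lam**2 + np.sin(lam)

rng = np.random.default_rng(1)
# Probability structure imposed on the parameter domain [0, 2]
params = rng.uniform(0.0, 2.0, size=100_000)

# Monte Carlo solution of the stochastic forward problem: push the
# samples through the map and summarize the induced QoI distribution.
qvals = qoi(params)
hist, edges = np.histogram(qvals, bins=50, density=True)
print(qvals.mean(), qvals.min(), qvals.max())
```

The inverse problem discussed next runs in the opposite direction: given the distribution of `qvals`, characterize the distributions on `params` consistent with it.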
This dissertation is concerned with the Stochastic Inverse Problem (SIP), in which probability distributions are imposed on the range of the QoI, and the problem is to compute the induced distributions on the domain of the parameters. In our formulation of the SIP and its generalization to the case where the physical parameters are functions, the main topics investigated include the existence, continuity, and numerical approximation of solutions. Chapter 1 introduces the background and previous research on the SIP, and gives useful theorems, results, and notation used later. Chapter 2 begins by establishing a relationship between Lebesgue measures on the domain and the range, and then studies the form of the solution of the SIP and its continuity properties. Chapter 3 proposes an algorithm for computing the solution of the SIP, and discusses the convergence of the algorithm to the true solution. Chapter 4 exploits the fact that a function can be represented by its coefficients with respect to a basis, and extends the SIP framework to cases where the domain representing the basis coefficients is a countable cube with decaying edges, referred to as the infinite dimensional SIP. We then discuss how its solution can be approximated by the SIP for which the domain is the finite dimensional cube obtained by taking a finite dimensional projection of the countable cube. Chapter 5 begins with an algorithm for approximating the solution of the infinite dimensional SIP, and then proves that the algorithm converges to the true solution. Chapter 6 gives a numerical example showing the effects of different decay rates and the relation to truncation to finite dimensions.
Chapter 7 reviews popular probabilistic inverse problem methods and proposes a combination of the SIP and statistical models to address problems encountered in practice.

Item Open Access Influence and regulation of PCBP2 and YTHDF2 RNA-binding proteins during self-renewal and differentiation of human induced pluripotent stem cells (Colorado State University. Libraries, 2019) Heck, Adam M., author; Wilusz, Carol J., advisor; Wilusz, Jeffrey, advisor; Osborne Nishimura, Erin, committee member; Montgomery, Tai, committee member; Zhou, Wen, committee member

Embryonic stem cells (ESCs) are able to self-renew or differentiate into any cell type in the body, a property known as pluripotency that enables them to initiate early growth and development. However, the ethical implications of harvesting and manipulating ESCs hinder their use in basic research and clinical applications. Thus, the discovery that somatic cells can be exogenously reprogrammed into induced pluripotent stem cells (iPSCs) offers new and exciting possibilities for gene therapy, personalized medicine, and basic research. However, more research into the mechanisms that regulate pluripotency is needed for iPSCs to reach their full potential in the research lab and the clinic. To maintain a state of self-renewal, yet also be able to rapidly differentiate in response to external signals, pluripotent stem cells need to exert tight control over gene expression through transcriptional and post-transcriptional mechanisms. Several transcriptional networks that regulate pluripotency are well characterized, but the post-transcriptional mechanisms remain poorly understood. mRNA decay is one form of post-transcriptional regulation that can help either maintain the steady state of a transcriptome or facilitate its rapid remodeling. To this end, degradation rates are influenced by the elements contained in an mRNA and the RNA-binding proteins (RBPs) they associate with.
Previous reports have indicated that the RNA modification N6-methyladenosine (m6A) and C-rich sequence elements (CREs) can affect mRNA decay in pluripotent stem cells. Therefore, we sought to further understand the roles of m6A and CREs in mRNA decay in stem cells by characterizing the expression and mRNA targets of two RBPs that recognize these elements, YTHDF2 and PCBP2, respectively. In this thesis, I report that YTHDF2 is differentially regulated in pluripotent and differentiated cells and that it contributes to pluripotency by targeting a group of mRNAs encoding factors important for neural development. The down-regulation of YTHDF2 during neural differentiation is consistent with the increased expression of neural factors during this time. Moreover, YTHDF2 expression is regulated at the level of translation via elements located in the first 300 nucleotides of the 3' untranslated region of the YTHDF2 mRNA. Based on these results, I propose that stem cells are primed for rapid differentiation by transcribing low levels of mRNAs encoding neural factors, which are subsequently targeted for degradation, in part by YTHDF2, until differentiation is induced. On the other hand, PCBP2 is up-regulated upon differentiation of pluripotent stem cells and regulates several mRNAs associated with pluripotency and development, including LIN28B. Notably, the expression of long non-coding RNAs (lncRNAs) that contain the human endogenous retrovirus element H (HERV-H) is influenced by PCBP2. HERV-H lncRNAs are almost exclusively expressed in stem cells and play a role in maintaining a pluripotent state, although their functions are not fully understood. Intriguingly, some HERV-H lncRNAs can also regulate PCBP2 expression, as altering the expression of LINC01356 or LINC00458 affects PCBP2 protein levels. Based on these results, I propose that the reciprocal regulation of PCBP2 and HERV-H lncRNAs influences whether stem cells maintain a state of self-renewal or differentiate.
Taken together, these findings demonstrate that YTHDF2 and PCBP2 post-transcriptionally regulate gene expression in stem cells and influence pluripotency.

Item Open Access Large-scale automated protein function prediction (Colorado State University. Libraries, 2016) Kahanda, Indika, author; Ben-Hur, Asa, advisor; Anderson, Chuck, committee member; Draper, Bruce, committee member; Zhou, Wen, committee member

Proteins are the workhorses of life, and identifying their functions is a very important biological problem. The function of a protein can be loosely defined as everything it does or that happens to it. The Gene Ontology (GO) is a structured vocabulary which captures protein function in a hierarchical manner and contains thousands of terms. Through various wet-lab experiments over the years, scientists have been able to annotate a large number of proteins with GO categories that reflect their functionality. However, experimentally determining protein functions is a highly resource-intensive task, and a large fraction of proteins remain unannotated. Recently, a plethora of automated methods has emerged, and their reasonable success in computationally determining the functions of proteins using a variety of data sources (sequence or structure similarity, or various biological network data) has established automated function prediction (AFP) as an important problem in bioinformatics. In a typical machine learning problem, cross-validation is the protocol of choice for evaluating the accuracy of a classifier. However, because annotations accumulate over time, we identify AFP as a combination of two sub-tasks: making predictions on annotated proteins and making predictions on previously unannotated proteins. In our first project, we analyze the performance of several protein function prediction methods in these two scenarios.
Our results show that GOstruct, an AFP method previously developed in our lab, and two other popular methods, binary SVMs and guilt-by-association, find it hard to achieve on these two tasks the level of accuracy suggested by cross-validation, and that predicting novel annotations for previously annotated proteins is a harder problem than predicting annotations for uncharacterized proteins. We develop GOstruct 2.0 by proposing improvements that allow the model to use information about a protein's current annotations to better handle the task of predicting novel annotations for previously annotated proteins. Experimental results on yeast and human data show that GOstruct 2.0 outperforms the original GOstruct, demonstrating the effectiveness of the proposed improvements. Although the biomedical literature is a very informative resource for identifying protein function, most AFP methods do not take advantage of the large amount of information contained in it. In our second project, we conduct the first comprehensive evaluation of the effectiveness of literature data for AFP. Specifically, we extract co-mentions of protein-GO term pairs and bag-of-words features from the literature and explore their effectiveness in predicting protein function. Our results show that literature features are very informative of protein function, though with room for improvement. To improve the quality of automatically extracted co-mentions, we formulate the classification of co-mentions as a supervised learning problem and propose a novel method based on graph kernels. Experimental results indicate the feasibility of using this co-mention classifier as a complementary method that aids the bio-curators responsible for maintaining databases such as the Gene Ontology. This is the first study of the problem of protein-function relation extraction from biomedical text.
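The "binary SVMs" baseline mentioned above treats each GO term as an independent binary classification problem over protein feature vectors. A minimal sketch on fabricated data, assuming scikit-learn is available (the features, annotations, and the linear generating rule below are all synthetic, not real protein data):

```python
import numpy as np
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
# Hypothetical protein feature vectors (e.g. similarity profiles)
X = rng.normal(size=(200, 20))
# Hypothetical annotations: 3 GO terms, each protein may carry several
true_w = rng.normal(size=(20, 3))
Y = (X @ true_w > 0.5).astype(int)

# One independent linear SVM per GO term (multi-label, one-vs-rest)
clf = OneVsRestClassifier(LinearSVC()).fit(X[:150], Y[:150])
pred = clf.predict(X[150:])
acc = (pred == Y[150:]).mean()   # element-wise label accuracy
print(acc)
```

Structured methods such as GOstruct differ from this baseline precisely by predicting the full hierarchically consistent set of GO terms jointly rather than term by term.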
The recently developed Human Phenotype Ontology (HPO), which is very similar to GO, is a standardized vocabulary for describing the phenotypic abnormalities associated with human diseases. At present, only a small fraction of human protein-coding genes have HPO annotations, but researchers believe that a large portion of currently unannotated genes are related to disease phenotypes. Therefore, it is important to predict gene-HPO term associations using accurate computational methods. In our third project, we introduce PHENOstruct, a computational method that directly predicts the set of HPO terms for a given gene. We compare PHENOstruct with several baseline methods and show that it outperforms them in every respect. Furthermore, we highlight a collection of informative data sources suitable for the problem of predicting gene-HPO associations, including large-scale literature mining data.

Item Open Access Neutrino oscillation parameter sensitivity in future long-baseline experiments (Colorado State University. Libraries, 2014) Bass, Matthew, author; Wilson, Robert J., advisor; Harton, John, committee member; Toki, Walter, committee member; Zhou, Wen, committee member

The study of neutrino interactions and propagation has produced evidence for physics beyond the standard model and promises to continue to shed light on rare phenomena. Since the discovery of neutrino oscillations in the late 1990s there have been rapid advances in establishing the three flavor paradigm of neutrino oscillations. The 2012 discovery of a large value for the last unmeasured mixing angle has opened the way for future experiments to search for charge-parity symmetry violation in the lepton sector. This thesis presents an analysis of the future sensitivity to neutrino oscillations in the three flavor paradigm for the T2K, NOνA, LBNE, and T2HK experiments.
The theory of the three flavor paradigm is explained, and the methods that use these theoretical predictions to design long-baseline neutrino experiments are described. The sensitivity to the oscillation parameters for each experiment is presented, with a particular focus on the search for CP violation and the measurement of the neutrino mass hierarchy. Variations of these sensitivities under different statistical considerations and experimental design optimizations are explored. The effects of systematic uncertainties in the neutrino flux, interaction, and detection predictions are also considered by incorporating more advanced simulation inputs from the LBNE experiment.

Item Open Access Parameter inference and model selection for differential equation models (Colorado State University. Libraries, 2015) Sun, Libo, author; Hoeting, Jennifer A., advisor; Lee, Chihoon, advisor; Zhou, Wen, committee member; Hobbs, N. Thompson, committee member

First, we consider the problem of estimating parameters of stochastic differential equations with discrete-time observations that are either completely or partially observed. The transition density between two observations is generally unknown. We propose an importance sampling approach with an auxiliary parameter for this case. We embed the auxiliary importance sampler in a penalized maximum likelihood framework, which produces more accurate and computationally efficient parameter estimates. Simulation studies in three different models illustrate promising improvements of the new penalized simulated maximum likelihood method. The new procedure is designed for the challenging case when some state variables are unobserved and, moreover, the observed states are sparse over time, which commonly arises in ecological studies. We apply this new approach to two epidemics of chronic wasting disease in mule deer.
Next, we consider the problem of selecting deterministic or stochastic models for a biological, ecological, or environmental dynamical process. In most cases, one prefers either deterministic or stochastic models as candidate models based on experience or subjective judgment. Because the likelihood is complex or intractable in most dynamical models, likelihood-based approaches to model selection are not suitable. We use approximate Bayesian computation for parameter estimation and model selection to gain further understanding of the dynamics of two epidemics of chronic wasting disease in mule deer. The main novel contribution of this work is that, under a hierarchical model framework, we compare three types of dynamical models: ordinary differential equation, continuous time Markov chain, and stochastic differential equation models. To our knowledge, model selection among these types of models has not appeared previously. As the practice of incorporating dynamical models into data models becomes more common, the proposed approach may be useful in a variety of applications. Lastly, we consider estimation of parameters in nonlinear ordinary differential equation models with measurement error where closed-form solutions are not available. We propose a new numerical algorithm, the data-driven adaptive mesh method, which combines the Euler and 4th order Runge-Kutta methods with different step sizes based on the observation time points. Our results show that the new algorithm improves on the most widely used numerical algorithm, the 4th order Runge-Kutta method, in both accuracy of parameter estimation and computational cost. Moreover, the generalized profiling procedure proposed by Ramsay et al. (2007) does not perform well for data that are sparse in time, as compared to the new approach.
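The two integrators combined by the data-driven adaptive mesh method, forward Euler and classical 4th order Runge-Kutta, trade cost for accuracy per step. A minimal comparison on dy/dt = y, whose exact solution at t = 1 is e (a toy illustration, not the adaptive-mesh algorithm itself):

```python
import math

def f(t, y):            # dy/dt = y, exact solution y(t) = exp(t)
    return y

def euler_step(f, t, y, h):
    return y + h * f(t, y)

def rk4_step(f, t, y, h):
    k1 = f(t, y)
    k2 = f(t + h / 2, y + h * k1 / 2)
    k3 = f(t + h / 2, y + h * k2 / 2)
    k4 = f(t + h, y + h * k3)
    return y + h * (k1 + 2 * k2 + 2 * k3 + k4) / 6

def integrate(step, h, t_end=1.0):
    t, y = 0.0, 1.0
    while t < t_end - 1e-12:
        y = step(f, t, y, h)
        t += h
    return y

exact = math.e
err_euler = abs(integrate(euler_step, 0.01) - exact)
err_rk4 = abs(integrate(rk4_step, 0.01) - exact)
print(err_euler, err_rk4)   # RK4 error is orders of magnitude smaller
```

With the same step size, Euler's global error is O(h) while RK4's is O(h^4), which is why mixing cheap Euler steps with RK4 steps keyed to the observation times can save computation without losing the accuracy that matters for parameter estimation.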
We illustrate our approach with both simulation studies and ecological data on intestinal microbiota.

Item Open Access Penalized unimodal spline density estimate with application to M-estimation (Colorado State University. Libraries, 2020) Chen, Xin, author; Meyer, Mary C., advisor; Wang, Haonan, committee member; Kokoszka, Piotr, committee member; Zhou, Wen, committee member; Miao, Hong, committee member

This dissertation establishes a novel type of robust estimation, Auto-Adaptive M-estimation (AAME), based on a new density estimate. AAME is highly data-driven, requiring no a priori knowledge of the error distribution. It shows improved performance against fat-tailed or highly contaminated errors over existing M-estimators by automatically down-weighting influential outliers. It is shown to be root-n consistent and has an asymptotically normal sampling distribution, which provides asymptotic confidence intervals and the basis for robust prediction intervals. The new density estimate is a penalized unimodal spline density estimate, established as a basis for AAME. It is constrained to be unimodal, symmetric, and to integrate to 1, and it is penalized to have stabilized derivatives and to guard against over-fitting, satisfying the requirements for use in AAME. The new density estimate is shown to be consistent, and its optimal asymptotic convergence rate can be obtained when the penalty is asymptotically bounded. We also extend AAME to linear models with heavy-tailed and dependent errors. The dependence of the errors is modeled by an autoregressive process, and the parameters are estimated jointly.

Item Open Access Protein interface prediction using graph convolutional networks (Colorado State University. Libraries, 2017) Fout, Alex M., author; Ben-Hur, Asa, advisor; Anderson, Chuck, committee member; Chitsaz, Hamidreza, committee member; Zhou, Wen, committee member

Proteins play a critical role in processes both within and between cells, through their interactions with each other and with other molecules. Proteins interact via an interface, forming a protein complex, which is difficult, expensive, and time-consuming to determine experimentally, giving rise to computational approaches. These computational approaches utilize known electrochemical properties of protein amino acid residues to predict whether they are part of an interface. Prediction can occur in a partner-independent fashion, where amino acid residues are considered independently of their neighbors, or in a partner-specific fashion, where pairs of potentially interacting residues are considered together. Ultimately, prediction of protein interfaces can help illuminate cellular biology, improve our understanding of diseases, and aid pharmaceutical research. Interface prediction has historically been performed with a variety of methods, including docking, template matching, and, more recently, machine learning approaches. The field of machine learning has undergone a revolution of sorts with the emergence of convolutional neural networks as the leading method of choice for a wide swath of tasks. Enabled by large quantities of data and the increasing power and availability of computing resources, convolutional neural networks efficiently detect patterns in grid-structured data and generate hierarchical representations that prove useful for many types of problems. This success has motivated the work presented in this thesis, which seeks to improve upon state-of-the-art interface prediction methods by incorporating concepts from convolutional neural networks. Proteins are inherently irregular, so they do not easily conform to a grid structure, whereas a graph representation is much more natural.
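The graph view of a protein suggests a simple form of convolution: each residue combines its own features with an aggregate of its neighbors' features through shared weight matrices. A minimal sketch on random toy data (the operators and features in the thesis differ; everything below is illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
n_res, n_feat, n_out = 6, 4, 8

# Hypothetical residue feature vectors and a ring-shaped neighbor graph
X = rng.normal(size=(n_res, n_feat))
A = np.zeros((n_res, n_res))
for i in range(n_res):
    A[i, (i + 1) % n_res] = A[(i + 1) % n_res, i] = 1.0

def graph_conv(X, A, W_center, W_neigh):
    """One layer: transform each residue's own features (W_center) and
    the mean of its neighbors' features (W_neigh), then apply a ReLU."""
    deg = A.sum(axis=1, keepdims=True)
    neigh_mean = (A @ X) / np.maximum(deg, 1.0)
    return np.maximum(X @ W_center + neigh_mean @ W_neigh, 0.0)

W_center = rng.normal(size=(n_feat, n_out))
W_neigh = rng.normal(size=(n_feat, n_out))
H = graph_conv(X, A, W_center, W_neigh)
print(H.shape)   # one learned representation per residue
```

Because the same weight matrices are applied at every residue, the layer handles graphs of any size and shape, which is exactly what grid convolutions cannot do for irregular protein structures.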
Various convolution operations have been proposed for graph data, each geared towards a particular application. We adapted these convolutions for use in interface prediction and proposed two new variants. Neural networks were trained on the Docking Benchmark Dataset version 4.0 complexes and tested on the new complexes added in version 5.0. Results were compared against the state-of-the-art partner-specific method, PAIRpred [1]. Results show that multiple variants of graph convolution outperform PAIRpred, with no single method emerging as the clear winner. In the future, additional training data may be incorporated from other sources, unsupervised pretraining such as autoencoding may be employed, and a generalization of convolution to simplicial complexes may be explored. In addition, the various graph convolution approaches may be applied to other applications with graph-structured data, such as Quantitative Structure Activity Relationship (QSAR) learning and knowledge base inference.

Item Open Access Quality assessment of protein structures using graph convolutional networks (Colorado State University. Libraries, 2024) Roy, Soumyadip, author; Ben-Hur, Asa, advisor; Blanchard, Nathaniel, committee member; Zhou, Wen, committee member

The prediction of protein 3D structure is essential for understanding protein function, drug discovery, and disease mechanisms; with the advent of methods like AlphaFold that are capable of producing very high quality decoys, ensuring the quality of those decoys can provide further confidence in the accuracy of their predictions. In this work we describe Qε, a graph convolutional network that uses a minimal set of atom and residue features as input to predict the global distance test total score (GDTTS) and the local distance difference test score (lDDT) of a decoy. To improve the model's performance, we introduce a novel loss function based on the ε-insensitive loss used in SVM regression.
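The ε-insensitive loss from SVM regression, on which the novel loss function is based, is zero inside an ε-tube around the target and grows linearly outside it. Its textbook form (the Qε variant builds on this idea; only the standard loss is shown):

```python
import numpy as np

def eps_insensitive(y_true, y_pred, eps=0.1):
    """Standard SVM-regression loss: zero inside the eps-tube around
    the target, linear beyond it."""
    return np.maximum(np.abs(y_true - y_pred) - eps, 0.0)

# Toy quality scores: residuals within the tube cost nothing
y_true = np.array([0.90, 0.50, 0.20])
y_pred = np.array([0.95, 0.10, 0.20])
print(eps_insensitive(y_true, y_pred))   # [0.  0.3 0. ]
```

Ignoring small residuals is a natural fit for quality assessment, where score targets like GDTTS and lDDT are themselves only meaningful up to a tolerance.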
This loss function is specifically designed for the characteristics of the quality assessment problem and provides predictions with improved accuracy over standard loss functions used for this task. Despite using only a minimal set of features, Qε matches the performance of recent state-of-the-art methods like DeepUMQA. The code for Qε is available at https://github.com/soumyadip1997/qepsilon.

Item Open Access Regression of network data: dealing with dependence (Colorado State University. Libraries, 2019) Marrs, Frank W., author; Fosdick, Bailey K., advisor; Breidt, F. Jay, committee member; Zhou, Wen, committee member; Wilson, James B., committee member

Network data, which consist of measured relations between pairs of actors, characterize some of the most pressing problems of our time, from environmental treaty legislation to human migration flows. A canonical problem in analyzing network data is to estimate the effects of exogenous covariates on a response that forms a network. Unlike typical regression scenarios, network data often naturally engender excess statistical dependence, beyond that represented by covariates, due to relations that share an actor. For analyzing bipartite network data observed over time, we propose a new model that accounts for excess network dependence directly, as this dependence is of scientific interest. In an example of international state interactions, we are able to infer the networks of influence among the states, such as which states' military actions are likely to incite other states' military actions. In the remainder of the dissertation, we focus on situations where inference on the effects of exogenous covariates on the network is the primary goal of the analysis, and thus the excess network dependence is a nuisance effect.
In this setting, we leverage an exchangeability assumption to propose novel parsimonious estimators of regression coefficients for both binary and continuous network data, and new estimators for coefficient standard errors for continuous network data. The exchangeability assumption we rely upon is pervasive in network and array models in the statistics literature, but not previously considered when adjusting for dependence in a regression of network data. Although the estimators we propose are aligned with many network models in the literature, our estimators are derived from the assumption of exchangeability rather than proposing a particular parametric model for representing excess network dependence in the data.

Item Open Access Sliced inverse approach and domain recovery for stochastic inverse problems(Colorado State University. Libraries, 2021) Chi, Jiarui, author; Wang, Haonan, advisor; Estep, Don, advisor; Breidt, F. Jay, committee member; Tavener, Simon, committee member; Zhou, Wen, committee memberThis dissertation tackles several critical challenges related to the Stochastic Inverse Problem (SIP) to perform scientific inference and prediction for complex physical systems which are characterized by mathematical models, e.g. differential equations. We treat both discrete and continuous cases. The SIP concerns inferring the values and quantifying the uncertainty of the inputs of a model, which are considered as random and unobservable quantities governing system behavior, by using observational data on the model outputs. Uncertainty of the inputs is quantified through probability distributions on the input domain which induce the probability distribution on the outputs realized by the observational data. The formulation of the SIP is based on rigorous measure-theoretic probability theory that uses all the information encapsulated in both the model and data.
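The excess dependence induced by relations that share an actor (described in the network-regression abstract above) can be seen in a toy simulation; everything here, including the additive data-generating model, is invented purely for illustration and is not the estimator proposed in that dissertation:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200                            # number of actors
a = rng.normal(size=n)             # latent actor effects
e = rng.normal(scale=0.5, size=(n, n))
Y = a[:, None] + a[None, :] + e    # relation y_ij = a_i + a_j + noise

# relations that share sender i (y_{i,0} and y_{i,1}) are correlated ...
shared = np.corrcoef(Y[:, 0], Y[:, 1])[0, 1]
# ... while relations with (essentially) no common actor are not
disjoint = np.corrcoef(Y[:, 0], Y[::-1, 1])[0, 1]
```

Under this toy model the shared-actor correlation is about var(a)/(var(a)+var(e)) = 0.8, while the disjoint correlation is near zero; it is exactly this kind of structure that an exchangeability assumption summarizes without committing to a specific parametric model.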
We introduce a problem in which a portion of the inputs can be observed and varied to study the hidden inputs, and we employ a formulation of the problem that uses all the knowledge in multiple experiments by varying the observable inputs. Since the map that the model induces is typically not one-to-one, an ansatz, i.e. an assumption of some prior information, must be imposed in order to determine a specific solution of the SIP. The resulting solution is heavily conditioned on the observable inputs, and we seek to combine solutions from different values of the observable inputs in order to reduce that dependence. We propose an approach of combining the individual solutions based on the framework of the Dempster-Shafer theory, which removes the dependency on the experiments as well as the ansatz and provides useful distributional information about the unobservable inputs, more specifically, about the ansatz. We develop an iterative algorithm that updates the ansatz information in order to obtain a best form of the solution across all experiments. The philosophy of Bayesian approaches is similar to that of the SIP in the sense that both consider random variables as the model inputs and both seek to update the unobservable solution using information obtained from observations. We extend the classical Bayesian approach in the context of the SIP by incorporating the knowledge of the model. The input domain is a pre-specified condition for the SIP given by the knowledge from scientists and is often assumed to be a compact metric space. The supports of the probability distributions computed in the SIP are restricted to the domain, and thus an inappropriate choice of domain might cause a massive loss of information in the solutions. Similarly, we combine the individual solutions from multiple experiments to recover a unique domain, among the many candidate domains induced by the distribution of the inputs, in general cases.
In particular, results on the convergence of the domain recovery in linear models are investigated.

Item Open Access Some topics on survey estimators under shape constraints(Colorado State University. Libraries, 2021) Xu, Xiaoming, author; Meyer, Mary C., advisor; Breidt, F. Jay, committee member; Zhou, Wen, committee member; Chong, Edwin K. P., committee memberWe consider three topics in this dissertation: 1) Nonresponse weighting adjustment using estimated response probability; 2) Improved variance estimation for inequality constrained domain mean estimators in surveys; and 3) One-sided testing of population domain means in surveys. Weighting by the inverse of the estimated response probabilities is a common type of adjustment for nonresponse in surveys. In the first topic, we propose a new survey estimator under nonresponse where we set the response model in linear form and the parameters are estimated by fitting a constrained least squares regression model, with the constraint being a calibration equation. We examine asymptotic properties of Horvitz-Thompson and Hájek versions of the estimators. Variance estimation for the proposed estimators is also discussed. In a limited simulation study, the performance of the estimators is compared with that of the corresponding uncalibrated estimators in terms of unbiasedness, MSE and coverage rate. In survey domain estimation, a priori information can often be imposed in the form of linear inequality constraints on the domain estimators. Wu et al. (2016) formulated the isotonic domain mean estimator for the simple order restriction, and methods for more general constraints were proposed in Oliva-Avilés et al. (2020). When the assumptions are valid, imposing restrictions on the estimators will ensure that the a priori information is respected, and in addition allows information to be pooled across domains, resulting in estimators with smaller variance.
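The inverse-response-probability weighting described in the first topic above can be sketched with a generic Hájek-type mean (made-up data; the thesis's calibrated constrained-least-squares estimation of the response probabilities is not shown here):

```python
import numpy as np

def hajek_nonresponse_mean(y, responded, p_hat):
    """Hájek-type mean: weight respondents by 1 / estimated
    response probability, then normalize by the summed weights."""
    w = responded / p_hat          # nonrespondents get zero weight
    return np.sum(w * y) / np.sum(w)

y = np.array([1.0, 2.0, 3.0, 4.0])        # y[3] would be unobserved
responded = np.array([1.0, 1.0, 1.0, 0.0])
p_hat = np.array([0.5, 0.5, 1.0, 0.5])    # estimated response probs
est = hajek_nonresponse_mean(y, responded, p_hat)
```

Because nonrespondents receive zero weight, their (unobserved) y values never enter the estimate; here the weighted mean is (2·1 + 2·2 + 1·3)/5 = 1.8.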
In the second topic, we propose a method to further improve the estimation of the covariance matrix for these constrained domain estimators, using a mixture of possible covariance matrices obtained from the inequality constraints. We prove consistency of the improved variance estimator, and simulations demonstrate that the new estimator results in improved coverage probabilities for domain mean confidence intervals, while retaining the smaller confidence interval lengths. Recent work in survey domain estimation allows for estimation of population domain means under a priori assumptions expressed in terms of linear inequality constraints. Imposing the constraints has been shown to provide estimators with smaller variance and tighter confidence intervals. In the third topic, we consider a formal test of the null hypothesis that all the constraints are binding, versus the alternative that at least one constraint is non-binding. The test of constant versus increasing domain means is a special case. The power of the test is substantially better than that of the test with an unconstrained alternative. The new test is used with data from the National Survey of College Graduates to show that salaries are positively related to the subject's father's educational level, across fields of study and over several years of cohorts.
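The simple order restriction underlying these constrained estimators (e.g. nondecreasing domain means) is classically computed with the pool-adjacent-violators algorithm; a minimal unweighted sketch for intuition, not the survey-weighted estimators of Wu et al. (2016) or Oliva-Avilés et al. (2020):

```python
def pava(y):
    """Isotonic (nondecreasing) least-squares fit via
    pool-adjacent-violators: merge adjacent blocks whose
    means violate monotonicity into their weighted mean."""
    means, weights = [], []
    for v in y:
        means.append(float(v))
        weights.append(1.0)
        # pool while the monotonicity constraint is violated
        while len(means) > 1 and means[-2] > means[-1]:
            w = weights[-2] + weights[-1]
            m = (weights[-2] * means[-2] + weights[-1] * means[-1]) / w
            means[-2:] = [m]
            weights[-2:] = [w]
    fit = []
    for m, w in zip(means, weights):
        fit.extend([m] * int(w))
    return fit
```

For example, `pava([1.0, 3.0, 2.0, 4.0])` pools the violating pair (3, 2) into their mean 2.5, yielding the fit [1.0, 2.5, 2.5, 4.0].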