
Penalized estimation for sample surveys in the presence of auxiliary variables

Abstract

In sample surveys, time and financial resources are often limited while research questions are wide and varied. Methods of analysis must therefore make the best use of whatever data are available and produce results that address a variety of needs. This research is motivated by surveys of aquatic resources, in which sample sizes are small to moderate but auxiliary information is available to supplement the measured survey responses. Two problems of survey estimation are considered, tied together by their use of constrained or penalized estimation techniques for combining the auxiliary information with the responses of interest. We study a small area problem with the goal of obtaining a good ensemble estimate, that is, a collection of estimates for individual small areas that collectively give a good estimate of the overall distribution function across small areas. Estimators that are good for one purpose are often not good for others. For example, estimating the distribution function itself (as in Cordy and Thomas, 1997) can address questions of variability and extremes, but it does not provide estimators for the individual small areas, nor is it appropriate when auxiliary information can be put to use. Bayes estimators are good individual estimators in terms of mean squared error but are not variable enough to represent ensemble traits (Ghosh, 1992). We present an algorithm that extends the constrained Bayes (CB) methods of Louis (1984) and Ghosh (1992) to models with a general covariance matrix. The algorithm produces estimators with properties similar to those of CB, and we refer to the method as general constrained Bayes (GCB). The ensemble GCB estimator is asymptotically unbiased for the posterior mean of the empirical distribution function (edf).
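To make the constrained Bayes idea concrete, the following is a minimal sketch of the CB adjustment in the form usually attributed to Ghosh (1992), under the simplifying assumption that the small-area posteriors are independent. It is not the GCB algorithm of this work, which handles a general covariance matrix. The function name `constrained_bayes` and the toy inputs are hypothetical. The posterior means are stretched away from their common average until their spread matches the posterior expected spread of the true small-area values.

```python
import numpy as np

def constrained_bayes(post_mean, post_var):
    """Constrained Bayes adjustment (illustrative; assumes independent posteriors)."""
    post_mean = np.asarray(post_mean, dtype=float)
    post_var = np.asarray(post_var, dtype=float)
    m = post_mean.size
    center = post_mean.mean()
    # H1: posterior expected spread of the true values about their own mean,
    # which equals (1 - 1/m) * sum of posterior variances under independence.
    h1 = (1.0 - 1.0 / m) * post_var.sum()
    # H2: spread of the posterior means about their average.
    h2 = np.sum((post_mean - center) ** 2)
    # Stretch factor a >= 1 restores the variability that shrinkage removed.
    a = np.sqrt(1.0 + h1 / h2)
    return center + a * (post_mean - center)

# Toy posterior summaries for four small areas (made-up numbers).
post_mean = np.array([1.0, 2.0, 3.0, 4.0])
post_var = np.full(4, 0.5)
cb = constrained_bayes(post_mean, post_var)
```

The adjusted estimates keep the same average as the posterior means, while their sum of squared deviations grows from H2 to H1 + H2. This is the sense in which CB estimates are "variable enough" to represent ensemble traits.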
The ensemble properties of transformed GCB estimates are investigated to determine whether the desirable ensemble characteristics of the GCB estimator are preserved under transformation. The GCB algorithm is then applied to more complex models, including conditional autoregressive spatial models and penalized spline models. Illustrative examples include the estimation of lip cancer risk, mean water acidity, and rates of change in water acidity. We also study a moderate area problem in which the goal is to derive a set of survey weights that can be applied to each study variable with reasonable predictive results. Zheng and Little (2003) use penalized spline regression in a model-based approach to finite population estimation in a two-stage sample when predictor variables are available. Breidt et al. (2005) propose a class of model-assisted estimators based on penalized spline regression in single-stage sampling. Because unbiasedness of the model-based estimator requires that the model be correctly specified, we consider extending model-assisted estimation to the two-stage case. By calibrating the degrees of freedom of the smooth to the most important study variables, a set of weights can be obtained that produces design-consistent estimators for all study variables. In a simulation study, the model-assisted estimator is comparable to other estimators when the model is correctly specified and generally superior when the model is incorrectly specified.
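The idea of a single set of weights that serves every study variable can be sketched for the simpler single-stage setting. The sketch below is only an illustration under stated assumptions, not the two-stage estimator developed in this work. It assumes a linear truncated-power spline basis, a ridge-type penalty `lam` on the knot coefficients, and known inclusion probabilities; the names `spline_basis` and `model_assisted_weights` are hypothetical. The weights depend only on the auxiliary variable and the design, so the same vector can be applied to any response.

```python
import numpy as np

def spline_basis(x, knots):
    """Linear truncated-power basis: columns 1, x, and (x - k)_+ for each knot."""
    x = np.asarray(x, dtype=float)
    trunc = np.maximum(x[:, None] - np.asarray(knots, dtype=float)[None, :], 0.0)
    return np.column_stack([np.ones_like(x), x, trunc])

def model_assisted_weights(x_sample, pi_sample, x_population, knots, lam):
    """Weights w such that w @ y_sample is the penalized-spline model-assisted total."""
    d = 1.0 / np.asarray(pi_sample, dtype=float)       # design weights 1/pi
    Xs = spline_basis(x_sample, knots)
    Xu = spline_basis(x_population, knots)
    # Penalize only the knot coefficients, not the intercept or slope.
    D = np.diag([0.0, 0.0] + [float(lam)] * len(knots))
    A = Xs.T @ (d[:, None] * Xs) + D
    # Gap between population basis totals and their design-weighted sample estimates.
    gap = Xu.sum(axis=0) - Xs.T @ d
    return d + d * (Xs @ np.linalg.solve(A, gap))

# Toy population and a simple random sample (made-up data).
rng = np.random.default_rng(0)
N, n = 200, 40
x_pop = rng.uniform(0.0, 1.0, N)
idx = rng.choice(N, size=n, replace=False)
pi = np.full(n, n / N)
knots = np.linspace(0.1, 0.9, 5)
w = model_assisted_weights(x_pop[idx], pi, x_pop, knots, lam=1.0)
```

Because the intercept and slope are unpenalized, these weights reproduce the population size and the population total of the auxiliary variable exactly. Changing `lam` changes the effective degrees of freedom of the smooth, which is the quantity the abstract describes calibrating to the most important study variables.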

Subject

constrained Bayes
penalized splines
sample surveys
semiparametric regression
small areas
statistics
