Accounting for spatial confounding in large scale epidemiological studies
Date
2025
Abstract
Epidemiological analyses of environmental risk factors often involve spatially varying exposures and outcomes. Unmeasured, spatially varying factors can introduce confounding bias into estimates of associations. In this dissertation, I compare existing and new methods that use thin plate regression splines to mitigate spatial confounding bias in both cross-sectional and longitudinal analyses. I also introduce a metric that quantifies the spatial smoothing induced by thin plate regression splines in different geographic domains. I first investigate cross-sectional data, directly comparing existing approaches based on information criteria and cross-validation metrics, and introduce a hybrid selection method that combines features of multiple existing approaches. Based on a simulation study, I recommend the best approach for different settings and demonstrate the use of these approaches in a study of environmental exposures and birth weight in a Colorado cohort. Next, I develop an effective bandwidth metric that quantifies the relationship between spatial splines and the range of implied spatial smoothing. I present an R Shiny application, spconfShiny, that provides a user-friendly platform for computing the metric; it can be accessed at https://g2aging.shinyapps.io/spconfShiny/. I illustrate the procedure for computing the effective bandwidth and demonstrate its use for different numbers of spatial splines across England, India, Ireland, Northern Ireland, and the United States. Finally, I extend two cross-sectional methods for spatial confounding adjustment to longitudinal and time-to-event data. The temporal component of these data requires an additional choice of which coordinates to use to construct the thin plate regression spline basis: the spatial coordinates, the temporal coordinates, or both. I demonstrate these methods for mixed models, generalized estimating equation models, and a proportional hazards regression framework, and apply them in a study of tropical cyclone wind exposures and preterm birth in a North Carolina cohort.
Rights Access
Embargo expires: 05/28/2026.
Subject
regression models
spatial splines
spatial confounding
environmental epidemiology