
A backward elimination discrete optimization algorithm for model selection in spatio-temporal regression models

V. Yadav, K.L. Mueller and A.M. Michalak

In this study, we present novel, computationally efficient algorithms for selecting a regression model for spatio-temporal data from a large set of candidate covariates, based on existing residual sum of squares (RSS)-based model selection criteria. All the algorithms are designed for cases where the number of candidate covariates is large, and they offer a solution for the case where regression residuals are correlated. The algorithms have wide applicability, including problems involving feature selection such as cluster analysis and neural networks, making them especially promising for applications in remote sensing and pattern recognition.


Figure: Pathways for model selection in geostatistical regression (GR). Note: For GR model selection, we first perform multiple linear regression (MLR) model selection (Pathway 1), and if the residuals from the best MLR model are correlated, we proceed with GR model selection. The choice between the dual criterion optimal (DCO) and single criterion heuristic (SCH) branch and bound (B&B) algorithms, and the option of varying the covariance parameters in the former, depends on the number of observations (n) and the total number of covariates (r).

Abstract

Regression models are used in geosciences to extrapolate data and identify significant predictors of a response variable. Criterion approaches based on the residual sum of squares (RSS), such as the Akaike Information Criterion, Bayesian Information Criterion (BIC), Deviance Information Criterion, or Mallows' Cp, can be used to compare non-nested models to identify an optimal subset of covariates. When the number of observations or candidate covariates is large, computational limitations arise both in comparing all possible combinations of the available covariates and in characterizing the covariance of the residuals for each examined model when the residuals are autocorrelated, as is often the case in spatial and temporal regression analysis. This paper presents computationally efficient algorithms for identifying the optimal model as defined using any RSS-based model selection criterion. The proposed dual criterion optimal branch and bound (DCO B&B) algorithm is guaranteed to identify the optimal model, while a single criterion heuristic (SCH) B&B algorithm provides further computational savings and approximates the optimal solution. These algorithms are applicable both to multiple linear regression (MLR) and to response variables with correlated residuals. We also propose an approach for iterative model selection, where a single set of covariance parameters is used in each iteration rather than a different set of parameters being used for each examined model. Simulation experiments are performed to evaluate the performance of the algorithms for regression models, using MLR and geostatistical regression as prototypical regression tools and BIC as a prototypical model selection approach. Results show massive computational savings using the DCO B&B algorithm relative to performing an exhaustive search. The SCH B&B is shown to provide a good approximation of the optimal model in most cases, while the DCO B&B with iterative covariance parameter optimization yields the closest approximation to the DCO B&B algorithm while also providing additional computational savings.
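
To make the idea of pruning a backward-elimination search concrete, the sketch below compares an exhaustive BIC search with a simple branch and bound search for ordinary MLR. It is an illustration under stated assumptions, not the DCO or SCH B&B algorithm of the paper: it assumes uncorrelated residuals, uses the common Gaussian form BIC = n ln(RSS/n) + k ln(n), and relies only on the fact that removing covariates can never decrease the RSS. The function names (`rss`, `bic`, `exhaustive_search`, `branch_and_bound`) and the synthetic data are hypothetical and chosen for this example.

```python
import itertools
import numpy as np

def rss(X, y, cols):
    """RSS of a least-squares fit on an intercept plus the chosen columns."""
    A = np.column_stack([np.ones(len(y))] + [X[:, j] for j in cols])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    return float(resid @ resid)

def bic(rss_value, n, k):
    """Gaussian BIC up to an additive constant; k counts fitted coefficients."""
    return n * np.log(rss_value / n) + k * np.log(n)

def exhaustive_search(X, y):
    """Score every subset of covariates and return (best BIC, best subset)."""
    n, r = X.shape
    best_score, best_cols = np.inf, ()
    for size in range(r + 1):
        for cols in itertools.combinations(range(r), size):
            score = bic(rss(X, y, cols), n, len(cols) + 1)
            if score < best_score:
                best_score, best_cols = score, cols
    return best_score, best_cols

def branch_and_bound(X, y):
    """Backward elimination with pruning; returns (best BIC, subset, fits)."""
    n, r = X.shape
    best = [np.inf, ()]
    fits = 0

    def visit(kept, first_removable):
        nonlocal fits
        node_rss = rss(X, y, kept)
        fits += 1
        score = bic(node_rss, n, len(kept) + 1)
        if score < best[0]:
            best[0], best[1] = score, kept
        # Descendants may only drop covariates with index >= first_removable,
        # so each subset is generated exactly once.
        removable = [j for j in kept if j >= first_removable]
        # Lower bound for all descendants: their RSS is at least node_rss and
        # they keep at least the non-removable covariates plus the intercept.
        k_min = len(kept) - len(removable) + 1
        if bic(node_rss, n, k_min) >= best[0]:
            return  # no descendant can beat the incumbent model
        for j in removable:
            visit(tuple(c for c in kept if c != j), j + 1)

    visit(tuple(range(r)), 0)
    return best[0], best[1], fits

# Synthetic demonstration data (assumed): only covariates 1 and 4 matter.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(60, 8))
y_demo = 2.0 * X_demo[:, 1] - 3.0 * X_demo[:, 4] + 0.5 * rng.normal(size=60)
score, cols, fits = branch_and_bound(X_demo, y_demo)
print("selected covariates:", cols, "model fits:", fits, "of", 2 ** 8)
```

Because the bound is valid for every descendant of a node, this search returns the same minimum-BIC subset as the exhaustive search while typically fitting far fewer models. Handling correlated residuals, as in geostatistical regression, would additionally require a covariance model for the residuals, which this sketch does not attempt.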

Yadav, V., K.L. Mueller, and A.M. Michalak (2013) “A backward elimination discrete optimization algorithm for model selection in spatio-temporal regression models”, Environmental Modelling & Software, 42: 88-98, dx.doi.org/j.envsoft.2012.12.009.