# Michalak Lab Software

You will be asked for a login ID and password to access this software. If you do not have these credentials, you can obtain them by submitting a registration form. Please read our Usage Policy before proceeding.

#### Software Products

Some of these software products also require that you download and install PUORG Common Libraries. This requirement is indicated in the entries below and explained in the relevant project installation guide.

**Multiplication of Separable Matrices** *(Original posting: 17 July 2013)*

Addressing a variety of questions within Earth science disciplines entails the inference of the spatiotemporal distribution of parameters of interest based on observations of related quantities. Such estimation problems often represent inverse problems that are formulated as linear optimization problems. Computational limitations arise when the number of observations and/or the size of the discretized state space becomes large, especially if the inverse problem is formulated in a probabilistic framework and therefore aims to assess the uncertainty associated with the estimates. This work proposes two approaches to lower the computational costs and memory requirements for large linear space–time inverse problems, taking the Bayesian approach for estimating carbon dioxide (CO2) emissions and uptake (a.k.a. fluxes) as a prototypical example. The first algorithm can be used to efficiently multiply two matrices, as long as one can be expressed as a Kronecker product of two smaller matrices, a condition that is typical when multiplying a sensitivity matrix by a covariance matrix in the solution of inverse problems. The second algorithm can be used to compute a posteriori uncertainties directly at aggregated spatiotemporal scales, which are the scales of most interest in many inverse problems. Both algorithms have significantly lower memory requirements and computational complexity relative to direct computation of the same quantities (O(n^{2.5}) vs. O(n^{3})). For an examined benchmark problem, the two algorithms yielded massive savings in floating point operations relative to direct computation of the same quantities.

- Matlab code: Multiplication of any matrix with a matrix expressed as a Kronecker product
- Matlab code: Computation of aggregated a posteriori covariance

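The first of these algorithms rests on a standard Kronecker-product identity: a product H(A ⊗ B) can be computed one row at a time without ever forming A ⊗ B. Below is a minimal NumPy translation of that identity (illustrative only; the distributed code is in Matlab, and the function name here is invented):

```python
import numpy as np

def mult_kron(H, A, B):
    """Compute H @ kron(A, B) without forming the Kronecker product.

    Uses the identity (M kron N) vec(X) = vec(N X M^T) with column-major
    vec: each row h of H is reshaped into a small matrix X, and the
    corresponding row of the product is vec(B^T X A).
    """
    p, q = A.shape
    r, s = B.shape
    m, cols = H.shape
    assert cols == p * r, "H must have p*r columns"
    out = np.empty((m, q * s))
    for i in range(m):
        Xi = H[i].reshape(r, p, order="F")        # h_i = vec(Xi), column-major
        out[i] = (B.T @ Xi @ A).reshape(-1, order="F")
    return out
```

Each row then costs two small matrix products rather than one multiplication with the full pr × qs Kronecker product, which never needs to be stored.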

**Branch and Bound Algorithms** *(Original posting: 17 July 2013)*

Regression models are used in geosciences to extrapolate data and identify significant predictors of a response variable. Criterion approaches based on the residual sum of squares (RSS), such as the Akaike Information Criterion, Bayesian Information Criterion (BIC), Deviance Information Criterion, or Mallows' Cp, can be used to compare non-nested models to identify an optimal subset of covariates. When the number of observations or candidate covariates is large, computational limitations arise in comparing all possible combinations of the available covariates, and in characterizing the covariance of the residuals for each examined model when the residuals are autocorrelated, as is often the case in spatial and temporal regression analysis. This paper presents computationally efficient algorithms for identifying the optimal model as defined using any RSS-based model selection criterion. The proposed dual criterion optimal branch and bound (DCO B&B) algorithm is guaranteed to identify the optimal model, while a single criterion heuristic (SCH) B&B algorithm provides further computational savings and approximates the optimal solution. These algorithms are applicable both to multiple linear regression (MLR) and to response variables with correlated residuals. We also propose an approach for iterative model selection, where a single set of covariance parameters is used in each iteration rather than a different set of parameters being used for each examined model. Simulation experiments are performed to evaluate the performance of the algorithms for regression models, using MLR and geostatistical regression as prototypical regression tools and BIC as a prototypical model selection approach. Results show massive computational savings using the DCO B&B algorithm relative to performing an exhaustive search. The SCH B&B is shown to provide a good approximation of the optimal model in most cases, while the DCO B&B with iterative covariance parameter optimization yields the closest approximation to the DCO B&B algorithm while also providing additional computational savings.

- zip: Branch & Bound source code and test data

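The pruning idea can be made concrete with a short sketch. The code below is a generic single-criterion best-subset search under BIC, written for illustration only (it is not the distributed DCO B&B or SCH B&B code): because adding covariates can never increase the RSS, the fully augmented model of any subtree lower-bounds the BIC of every model in it, so branches that cannot beat the incumbent are skipped.

```python
import numpy as np

def best_subset_bb(y, X):
    """Branch-and-bound search for the BIC-optimal subset of covariates."""
    n, p = X.shape
    best = {"bic": np.inf, "cols": []}

    def rss(cols):
        if not cols:
            return float(y @ y)
        Xs = X[:, cols]
        beta, *_ = np.linalg.lstsq(Xs, y, rcond=None)
        r = y - Xs @ beta
        return float(r @ r)

    def score(cols):
        b = n * np.log(rss(cols) / n) + len(cols) * np.log(n)
        if b < best["bic"]:
            best["bic"], best["cols"] = b, sorted(cols)

    def search(fixed, free):
        if not free:
            return
        # Every model in this subtree has RSS >= rss(fixed + free) and at
        # least len(fixed) covariates, so its BIC is at least `bound`.
        bound = n * np.log(max(rss(fixed + free), 1e-300) / n) \
            + len(fixed) * np.log(n)
        if bound >= best["bic"]:
            return                        # prune the whole branch
        v, rest = free[0], free[1:]
        score(fixed + [v])                # include covariate v
        search(fixed + [v], rest)
        search(fixed, rest)               # exclude covariate v

    score([])
    search([], list(range(p)))
    return best["cols"], best["bic"]
```

The same skeleton works for any RSS-based criterion: only the penalty term in `score` and `bound` changes.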

**GIM: Geostatistical Inverse Modeling - Matlab version** *(Original posting: 21 May 2010)*

GIM is a suite of MATLAB functions for performing geostatistical inverse modeling for CO2 flux estimation. The example provided uses continuous measurements from nine tower locations and a Lagrangian transport model to infer 1°×1° CO2 fluxes across North America. This Geostatistical Inverse Model (GIM) algorithm is based on a data-driven approach to CO2 flux estimation, where the covariance parameters, the fluxes, and the relationship of auxiliary variables to flux are estimated using the atmospheric measurements. This Bayesian method contains a linear trend, which can include environmental variables known to affect or correlate with carbon fluxes, thereby helping to constrain estimates without specifying a prescribed prior flux field. GIM also accounts for the spatiotemporal correlation in the deviations of the estimated fluxes that are not explained by the linear trend. The purpose of providing these algorithms is to allow researchers to (1) learn about the GIM algorithm for estimating regional-scale CO2 fluxes, (2) estimate carbon dioxide fluxes and their associated uncertainties at a 1°×1° resolution in a computationally efficient manner, and (3) modify the provided GIM code for their particular needs.

- zip: GIM source code and test data archive [280 MB]
- zip: GIM source code only archive [23 KB]
- pdf: GIM documentation

**GIM: Geostatistical Inverse Modeling - Fortran version** *(Original posting: July 2013)*

The Geostatistical Inverse Modeling code is now available in Fortran as well. The code and descriptions can be found here.

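At its core, the estimation step of a geostatistical inversion reduces to one saddle-point linear solve for the estimation weights. The sketch below uses generic names and shapes to illustrate the standard GIM equations; it is not an excerpt from the distributed code.

```python
import numpy as np

def gim_weights(H, Q, R, X):
    """Solve the geostatistical inverse modeling system for the weights.

    H : (m, n) transport sensitivities of measurements to fluxes
    Q : (n, n) prior spatiotemporal covariance of flux deviations
    R : (m, m) model-data mismatch covariance
    X : (n, p) trend basis built from auxiliary environmental variables
    The flux estimate for measurements z is then s_hat = Lam @ z.
    """
    m, n = H.shape
    p = X.shape[1]
    HX = H @ X
    K = np.block([[H @ Q @ H.T + R, HX],
                  [HX.T, np.zeros((p, p))]])
    rhs = np.vstack([H @ Q, X.T])
    sol = np.linalg.solve(K, rhs)
    Lam = sol[:m].T                       # (n, m) estimation weights
    return Lam
```

The lower block row of the system enforces the unbiasedness condition `Lam @ H @ X == X`, which is a convenient sanity check on any implementation.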

**GRM: Geostatistical Regression Method** *(Original posting: 28 May 2010; Latest update: 1 June 2010)*

We provide an algorithm for a geostatistical regression method (GRM) that can be used with eddy-covariance data (as collected at a flux tower site) to investigate the relationships between carbon flux and environmental variables at multiple timescales. The approach uses an adaptation of the Bayes Information Criterion to identify an optimal set of environmental variables that are able to explain the observed variability in carbon flux. In addition, GRM quantifies the temporal correlation in the portion of the flux signal that cannot be explained by the selected variables, and directly accounts for this correlation in the analysis. This method has been applied previously to (1) identify the dominant explanatory variables for Net Ecosystem Exchange (NEE), Gross Ecosystem Exchange (GEE) and heterotrophic and autotrophic respiration (Rh+a) at different temporal scales, (2) evaluate whether environmental variables can be used to isolate the GEE and Rh+a signals from the NEE measurements, and (3) determine the impact of temporal scale on the inferred relationships between the environmental variables and CO2 flux.

- zip: GRM source code archive [360 KB]
- pdf: GRM documentation *(Updated: 1 June 2010)*


**GD_Sed: A Computer Code for Geostatistical Downscaling of Multi-Resolution Aquatic Sediment Data** *(Original posting: 23 June 2010)*

In the paper entitled "Characterizing Attribute Distributions in Water Sediments by Geostatistical Downscaling," Zhou and Michalak (2009) developed a geostatistical downscaling method (GD_Sed) to estimate the spatial distribution of attributes (e.g., contaminant concentrations or hydraulic properties) from non-uniform resolution samples in surface water or water sediment. The code provided here can be used to reproduce the results for this paper.

An estimate of an attribute distribution at a uniform spatial resolution is often required for site characterization and the design of appropriate risk-based remediation alternatives. Information about attributes in benthic sediments is typically obtained in core sections of varying lengths, and only the average value is measured in each section, which poses a significant challenge for traditional mapping methods. In this research, geostatistical downscaling is applied and shown to resolve this problem by explicitly accounting for the relationship between the known average measurements and the unknown fine-resolution attribute distribution to be estimated. First, we applied and tested our method with a pseudodata case. Second, we estimated the spatial variability of total organic carbon in the lower Passaic River, New Jersey. The GD_Sed code provided here can be used (1) to reproduce the results and most of the figures for this paper, (2) to compare the GD_Sed estimation with estimation using ordinary kriging, and (3) as a starting point for adapting the code for other applications where sampling is conducted at multiple resolutions.

- zip: GD_Sed source code archive [44 KB]
- pdf: GD_Sed documentation

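The downscaling idea can be sketched as a kriging-type solve: the block averages are related to the unknown fine-resolution field by a linear averaging operator, and the estimate conditions the fine-scale covariance on those averages. Below is a minimal zero-mean sketch for illustration; it is not the GD_Sed code, and all names are invented.

```python
import numpy as np

def downscale(z, A, Q, R):
    """Estimate fine-resolution values s from block-averaged data z = A s + e.

    z : (k,)   measured block averages
    A : (k, n) averaging operator (each row averages one core section)
    Q : (n, n) fine-scale spatial covariance
    R : (k, k) measurement-error covariance
    """
    Psi = A @ Q @ A.T + R
    s_hat = Q @ A.T @ np.linalg.solve(Psi, z)
    # a posteriori covariance of the fine-scale estimate
    V = Q - Q @ A.T @ np.linalg.solve(Psi, A @ Q)
    return s_hat, V
```

With near noise-free data the estimate reproduces the measured block averages exactly, which is the property that distinguishes downscaling from simply interpolating the averages.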

**GDF_AOT: A Computer Code for Geostatistical Data Fusion for Aerosol Optical Depth** *(Original posting: 6 July 2010)*

In the paper entitled "A Geostatistical Data Fusion Technique for Merging Remote Sensing and Ground-based Observations of Aerosol Optical Thickness", Chatterjee et al. (2010) applied a geostatistical data fusion technique that merges aerosol optical depth (AOT) distributions from multiple sensors. The code provided here can be used to reproduce the results for this paper.

The Multi-angle Imaging SpectroRadiometer (MISR) and the Moderate Resolution Imaging Spectroradiometer (MODIS) aboard the NASA Earth Observation System's Terra satellite have been measuring aerosol optical thickness (AOT) since early 2000. These remote-sensing platforms complement the ground-based AErosol RObotic NETwork (AERONET) in better understanding the role of aerosols in climate and atmospheric chemistry. To date, however, there have been only limited attempts to exploit the complementary multiangle (MISR) and multispectral (MODIS) capabilities of these sensors along with the ground-based observations in an integrated analysis. Chatterjee et al. (2010) describe a geostatistical data fusion technique that can take advantage of the spatial autocorrelation of the AOT distribution, while making optimal use of all available datasets. Using Level 2.0 AERONET, MISR and MODIS AOT data for the contiguous US, we demonstrate that this approach can successfully incorporate information from multiple sensors, and provide accurate estimates of AOT with rigorous uncertainty bounds. Cross-validation results show that the resulting AOT product is closer to the ground-based AOT observations than either of the individual satellite measurements.

This code product requires the PUORG Common Functions Library, version dated 20 May 2010 or later.

- zip: GDF_AOT source code archive [221 KB]
- pdf: GDF_AOT documentation

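The fusion step can be sketched by stacking the sensors into one observation vector, so that each dataset is weighted by its own error covariance while the spatial covariance spreads information between locations. The following minimal zero-mean sketch is illustrative only; it is not the GDF_AOT code.

```python
import numpy as np

def fuse(z1, R1, z2, R2, Q):
    """Merge two noisy observations of the same field into one estimate.

    Both sensors are assumed to observe the field directly, with error
    covariances R1 and R2; Q is the spatial covariance of the field.
    """
    n = len(z1)
    H = np.vstack([np.eye(n), np.eye(n)])        # stacked observation operator
    R = np.block([[R1, np.zeros((n, n))],
                  [np.zeros((n, n)), R2]])
    z = np.concatenate([z1, z2])
    Psi = H @ Q @ H.T + R
    return Q @ H.T @ np.linalg.solve(Psi, z)
```

With a very diffuse prior this collapses to inverse-variance weighting at each location; the spatial covariance Q is what additionally lets a well-observed location inform its poorly observed neighbors.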

**GRM_GPP: A Computer Code for Geostatistical Regression Modeling of Gross Primary Productivity at Six AmeriFlux Sites** *(Original posting: 15 July 2010)*

In the paper entitled "A geostatistical synthesis study of factors affecting gross primary productivity in various ecosystems of North America," Yadav et al. (2010) applied a coupled Bayesian model selection and geostatistical regression modeling approach to the empirical analysis of gross primary productivity (GPP) at six AmeriFlux sites, including the Kennedy Space Center Scrub Oak, Vaira Ranch, Tonzi Ranch, Blodgett Forest, Morgan Monroe State Forest, and Harvard Forest sites. The code provided here can be used to reproduce the results for this paper.

The analysis is performed at a continuum of temporal scales ranging from daily to monthly, for a period of seven years. A total of 10 covariates representing environmental stimuli and indices of plant physiology are considered in explaining variations in GPP. Similar to other statistical methods, the proposed approach estimates regression coefficients and uncertainties associated with the covariates in a selected regression model. However, unlike traditional regression methods, the presented approach also estimates the uncertainty associated with the selection of a single best model of GPP. In addition, the approach provides an enhanced understanding of how the importance of specific covariates changes with temporal resolutions. An examination of trends in the importance of specific covariates reveals scaling thresholds above or below which covariates become significant in explaining GPP. Results indicate that most sites (especially those with a stronger seasonal cycle) exhibit at least one prominent scaling threshold between daily to 20-day temporal scales. This demonstrates that environmental variables that explain GPP at synoptic scales are different from those that capture its seasonality. At shorter time scales, radiation, temperature, and vapor pressure deficit exert the most significant influence on GPP at most examined sites. However, at coarser time scales, the importance of these covariates in explaining GPP declines. Overall, unique best models are identified at most sites at the daily scale, whereas multiple competing models are identified at larger time scales. In addition, the selected models are able to explain a larger fraction of the observed variability for sites exhibiting strong seasonality.

This code product requires the PUORG Common Functions Library, version dated 20 May 2010 or later.

- zip: GRM_GPP source code archive [57 KB]
- pdf: GRM_GPP documentation


**June_PS: A Computer Code for the June North American Inversion Pseudo-Data Study** *(Original posting: 23 July 2010)*

In the paper entitled "Regional-scale geostatistical inverse modeling of North American CO2 fluxes: a synthetic data study," Gourdji et al. (2010) use a series of synthetic data experiments to investigate the ability of a one-month regional atmospheric inversion to estimate grid-scale CO2 fluxes during the growing season over North America. The code provided here can be used to reproduce the results and most of the figures for this paper.

The inversions are performed within a geostatistical framework without the use of any prior flux estimates or auxiliary variables, in order to focus on the atmospheric constraint provided by the nine towers collecting continuous, calibrated CO2 measurements in 2004. Using synthetic measurements and their associated concentration footprints, flux and model-data mismatch covariance parameters are first optimized, and then fluxes and their uncertainties are estimated at three different temporal resolutions. These temporal resolutions are chosen to help assess the impact of temporal aggregation errors on the estimated fluxes and covariance parameters, and they include a 4-day average, a 4-day-average diurnal cycle with 3-hourly increments, and 3-hourly fluxes.

Estimating fluxes at a temporal resolution that can resolve the diurnal variability was found to be critical both for recovering covariance parameters directly from the atmospheric data, and for inferring accurate ecoregion-scale fluxes. Accounting for both spatial and temporal covariance in the flux distribution was also found to be necessary for recovering accurate a posteriori uncertainty bounds on the estimated fluxes. Overall, the results suggest that even a fairly sparse network of nine towers collecting continuous CO2 measurements across the continent, used with no auxiliary information or prior estimates of flux variability in time or space, can be used to infer relatively accurate monthly ecoregion-scale CO2 surface fluxes over North America within recovered uncertainty bounds. Simulated random transport error was shown to decrease the quality of flux estimates in under-constrained areas at the ecoregion scale, although the uncertainty bounds were still realistic. While these synthetic data inversions do not consider other concerns associated with using actual measurement data, e.g., systematic transport errors or problems with the boundary conditions, they help to highlight the impact of inversion setup choices and can potentially help to provide a baseline set of fluxes for comparison with CO2 flux estimates from future real-data inversions.

This code product requires the PUORG Common Functions Library, version dated 20 May 2010 or later.

- zip: June_PS source code archive [300 MB]
- zip: June_PS source code only archive [146 KB]
- pdf: June_PS documentation


#### Common Libraries

**PUORG Common Functions Library** *(Original posting: 20 May 2010; Latest update: 7 January 2011)*

This document describes how to obtain and install the PUORG Common Functions Library, a set of MATLAB functions that support the PUORG computer codes.

- zip: PUORG Common Functions Library source code archive [29 KB] *(Updated: 14 Oct 2010)*
- pdf: Common Functions Library documentation *(Updated: 7 January 2011)*

#### Software related to the SI²-SSI project

The software developed for the SI²-SSI project can be found here.