A parallel computing approach to fast geostatistical areal interpolation
Journal
International Journal of Geographical Information Science
Date Issued
September 2011
DOI
10.1080/13658816.2011.563744
Abstract
Areal interpolation is the procedure of using known attribute values at a set of (source)
areal units to predict unknown attribute values at another set of (target) units. Geostatistical
areal interpolation employs spatial prediction algorithms, that is, variants of Kriging,
which explicitly incorporate spatial autocorrelation and scale differences between
source and target units in the interpolation endeavor. When all the available source measurements
are used for interpolation, that is, when a global search neighborhood is adopted,
geostatistical areal interpolation is extremely computationally intensive. Interpolation
in this case requires huge memory space and massive computing power, even with
the dramatic improvement introduced by the spectral algorithms developed by Kyriakidis
et al. (2005. Improving spatial data interoperability using geostatistical support-to-support
interpolation. In: Proceedings of geoComputation. Ann Arbor, MI: University of Michigan)
and Liu et al. (2006. Calculation of average covariance using fast Fourier transform (FFT).
Menlo Park, CA: Stanford Center for Reservoir Forecasting, Petroleum Engineering Department,
Stanford University) based on the fast Fourier transform (FFT). In this study,
a parallel FFT-based geostatistical areal interpolation algorithm was developed to tackle
the computational challenge of such problems. The algorithm includes three parallel processes:
(1) the computation of source-to-source and source-to-target covariance matrices by
means of FFT; (2) the QR factorization of the source-to-source covariance matrix; and (3)
the computation of source-to-target weights via Kriging, and the subsequent computation
of predicted attribute values for the target supports. Experiments with real-world datasets
(i.e., predicting population densities of watersheds from population densities of counties
in the Eastern Time Zone and in the continental United States) showed that the parallel algorithm
drastically reduced the computing time to a practical length that is feasible for actual
spatial analysis applications, and achieved fairly high speed-ups and efficiencies. Experiments
also showed the algorithm scaled reasonably well as the number of processors
increased and as the problem size increased
areal units to predict unknown attribute values at another set of (target) units. Geostatistical
areal interpolation employs spatial prediction algorithms, that is, variants of Kriging,
which explicitly incorporate spatial autocorrelation and scale differences between
source and target units in the interpolation endeavor. When all the available source measurements
are used for interpolation, that is, when a global search neighborhood is adopted,
geostatistical areal interpolation is extremely computationally intensive. Interpolation
in this case requires huge memory space and massive computing power, even with
the dramatic improvement introduced by the spectral algorithms developed by Kyriakidis
et al. (2005. Improving spatial data interoperability using geostatistical support-to-support
interpolation. In: Proceedings of geoComputation. Ann Arbor, MI: University of Michigan)
and Liu et al. (2006. Calculation of average covariance using fast Fourier transform (FFT).
Menlo Park, CA: Stanford Center for Reservoir Forecasting, Petroleum Engineering Department,
Stanford University) based on the fast Fourier transform (FFT). In this study,
a parallel FFT-based geostatistical areal interpolation algorithm was developed to tackle
the computational challenge of such problems. The algorithm includes three parallel processes:
(1) the computation of source-to-source and source-to-target covariance matrices by
means of FFT; (2) the QR factorization of the source-to-source covariance matrix; and (3)
the computation of source-to-target weights via Kriging, and the subsequent computation
of predicted attribute values for the target supports. Experiments with real-world datasets
(i.e., predicting population densities of watersheds from population densities of counties
in the Eastern Time Zone and in the continental United States) showed that the parallel algorithm
drastically reduced the computing time to a practical length that is feasible for actual
spatial analysis applications, and achieved fairly high speed-ups and efficiencies. Experiments
also showed the algorithm scaled reasonably well as the number of processors
increased and as the problem size increased

