Change detection and classification algorithm
The core algorithm used for GLanCE is the Continuous Change Detection and Classification (CCDC) algorithm, which was developed at Boston University (Zhu and Woodcock 2014). CCDC assumes that noise is ephemeral and land cover change is persistent and uses all available Landsat observations at each pixel to simultaneously map land cover and land cover change. To accomplish this, the algorithm includes two core steps:
- Identification of change points and modeling of stable time segments. First, Landsat time series at each pixel are filtered to remove observations affected by clouds, cloud shadows and snow. The resulting time series is then modeled as a Fourier series, starting in the first year. Using this initial model, subsequent Landsat reflectance values at each pixel are then successively compared against model forecasts, and change points are identified based on persistent mismatch between new observations and the model. When no change is detected, new data are appended to the time series and the model is re-fit. Change points are identified based on model fits across all spectral bands using a change vector metric that integrates differences between observed and predicted reflectances in each spectral band, weighted by the root mean squared error of the model fit for each spectral band. Central to the CCDC algorithm is the concept of a time segment, which is the time period between two consecutive land cover changes (if any), bounded by the start and end of the time series.
- Assignment of class labels to each time segment. Once change points and time segments are identified, spectral and temporal information for each time segment is used to assign land cover and land use labels to each pixel for each segment. This approach has two important advantages. First, data from the entire time segment contribute to the classification, providing a richer set of remote sensing inputs than is possible using conventional classification methods. Further, instead of using time series of surface reflectances as the primary inputs, CCDC uses model parameters estimated for each Landsat band, which have been shown to be highly effective for discrimination of land cover and land use classes. Second, this approach only estimates a new classification label after a change has been detected, which avoids problems associated with stochastic changes in classification labels that arise when land cover is mapped based on independent time series from each year. To perform this classification, CCDC uses a supervised classification approach based on Random forest that relies on training data for estimation.
Figure 1. The red dot corresponds to a change point identified by CCDC for a single Landsat pixel time series. Here, CCDC identifies two time segments, one beginning in 2004 and ending in 2012 (when a change is detected), and the second extending from 2012 to the end of the time series.
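The change detection step described above can be illustrated with a minimal sketch. This is not the GLanCE or CCDC implementation: the function names, the single annual harmonic plus linear trend, the score threshold of 3.0, and the requirement of six consecutive exceedances are all assumptions chosen for illustration, and the sketch omits the re-fitting that happens as new observations are appended. It shows only the general idea of fitting a harmonic model per band, scaling forecast residuals by each band's RMSE, combining them across bands, and flagging a change when the mismatch persists.

```python
import numpy as np

def harmonic_design(t_days, n_harmonics=1):
    """Design matrix: intercept, linear trend (assumed), annual sine/cosine terms."""
    t = np.asarray(t_days, dtype=float)
    w = 2.0 * np.pi / 365.25
    cols = [np.ones_like(t), t]
    for k in range(1, n_harmonics + 1):
        cols += [np.cos(k * w * t), np.sin(k * w * t)]
    return np.column_stack(cols)

def fit_segment(t_days, refl):
    """Least-squares fit for every band; refl has shape (n_obs, n_bands)."""
    X = harmonic_design(t_days)
    coef, *_ = np.linalg.lstsq(X, refl, rcond=None)
    resid = refl - X @ coef
    rmse = np.sqrt((resid ** 2).mean(axis=0))  # one RMSE per band
    return coef, rmse

def change_scores(t_days, refl, coef, rmse):
    """Residuals against the model forecast, scaled by band RMSE, combined over bands."""
    resid = refl - harmonic_design(t_days) @ coef
    return np.sqrt(((resid / rmse) ** 2).sum(axis=1))

def first_change(scores, threshold=3.0, n_consecutive=6):
    """Index where the first run of n_consecutive exceedances starts, else None.

    threshold and n_consecutive are illustrative values, not GLanCE settings.
    """
    run = 0
    for i, s in enumerate(scores):
        run = run + 1 if s > threshold else 0
        if run == n_consecutive:
            return i - n_consecutive + 1
    return None
```

Requiring several consecutive exceedances is one simple way to encode the assumption that noise is ephemeral while land cover change is persistent: a single anomalous observation resets nothing in the model, whereas a sustained departure marks the end of a time segment.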
Implementation of CCDC at global scale will require three key modifications relative to how it is generally applied at local or regional scales:
- As it is unrealistic to train a single classification model that can be applied over the entire globe, a moving window approach will be used to estimate unique classification models for regions composed of 3 x 3 blocks of tiles (450 km x 450 km) at 30-meter resolution. To do this, Random forest classification models will be estimated for each 3 x 3 tile region using training data from the 5 x 5 tile window surrounding the region of interest (750 km x 750 km). This approach has three key advantages: (1) by using training data from outside the classification window, the amount of training data available to the classifier will be increased relative to using only data from within the local window, which will improve classifier performance; (2) by using training data from surrounding tiles, classification results will not have discrete changes (i.e., seams) at tile boundaries; (3) in parts of the world where training data are sparse, either because land cover is uniform (e.g., the Sahara desert) or because training data were difficult to obtain, the training data window can be expanded to compensate (e.g., to 7 x 7 or even 9 x 9 tiles).
- Existing map products from around the world will be used to inform and improve the classification process. The growing list of thematically-focused global land cover and land use maps created at moderate spatial resolution will be exploited, including Landsat-derived maps of global forest cover and forest change (Hansen et al., 2013), impervious surfaces (Song et al., 2016), agricultural lands (GCAD30, 2017), and water surfaces (Pekel et al., 2016). To generate layers providing prior probabilities for GLanCE land cover classes, moving windows will be applied to each of these maps to compute the local likelihood of these classes at 30-meter spatial resolution. These priors will then be combined with class conditional probabilities estimated from Random forest to compute posterior probabilities for each class, and class labels will be assigned based on maximum likelihood, following the same basic approach that is used to create the MODIS Land Cover Type product (Friedl et al. 2010). Note that because these ancillary maps were created independently and not all GLanCE classes are represented in these data sets, estimated prior probabilities will need to be normalized at each pixel.
- A “back-up” algorithm will be applied at pixels where CCDC fails because of an insufficient number of observations to support time series analysis. This algorithm will use conventional classification approaches (i.e., Random forest applied to available spectral reflectances) based on individual (or a small number of) images in each year. These methods have been used for decades for local-to-regional scale mapping and monitoring. Note that because this classification approach does not use time series the way that CCDC does, post-processing to reduce spurious land cover changes associated with classification errors will be essential. This will be done using the approach developed by Abercrombie and Friedl (2016), which estimates Markov models from annual time series of classification results to distinguish stable changes from classification errors. Where applied, the back-up algorithm will be identified in the QA/QC data for each pixel and each year it is employed.
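Two of the modifications above lend themselves to a short sketch: selecting the tiles in the training window around a 3 x 3 region, and combining map-derived priors with classifier probabilities. The code below is illustrative only. The tile indexing scheme, the function names, and the small floor `eps` applied to classes that are absent from the ancillary maps are assumptions, not details of the GLanCE processing system.

```python
import numpy as np

def training_window(row, col, region=3, window=5):
    """Tile (row, col) indices in the training window centred on a region.

    (row, col) is the upper-left tile of a region x region block; this
    indexing convention is hypothetical.
    """
    pad = (window - region) // 2
    return [(r, c)
            for r in range(row - pad, row + region + pad)
            for c in range(col - pad, col + region + pad)]

def posterior(class_cond, priors, eps=1e-6):
    """Combine classifier probabilities with per-pixel priors.

    class_cond, priors: arrays of shape (n_pixels, n_classes).
    Returns the posterior probabilities and the most probable class index.
    """
    # Floor the priors so classes missing from the ancillary maps are not
    # ruled out entirely (an assumption), then normalize at each pixel.
    priors = np.clip(priors, eps, None)
    priors = priors / priors.sum(axis=1, keepdims=True)
    post = class_cond * priors
    post = post / post.sum(axis=1, keepdims=True)
    return post, post.argmax(axis=1)
```

With the defaults, `training_window(10, 20)` returns the 25 tiles spanning rows 9 to 13 and columns 19 to 23; passing `window=7` or `window=9` expands the window in the way described for regions with sparse training data.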