Global forest management data for 2015 at a 100 m resolution

Reference data collection

In February 2019, we involved forest experts from different regions around the world and organized a workshop to (1) discuss the variety of forest management practices that take place in various parts of the world; (2) explore what types of forest management information could be collected by visual interpretation of very high-resolution images from Google Maps and Microsoft Bing Maps, in combination with Sentinel time series and Normalized Difference Vegetation Index (NDVI) profiles derived from Google Earth Engine (GEE); (3) generalize and harmonize the definitions at global scale; (4) finalize the Geo-Wiki interface for the crowdsourcing campaigns; and (5) build a data set of control points (or the expert data set), which we used later to monitor the quality of the crowdsourced contributions by the participants. Based on the results of this analysis, we launched the crowdsourcing campaigns by involving a broader group of participants, which included people recruited from remote sensing, geography and forest research institutes and universities. After the crowdsourcing campaigns, we collected additional data with the help of experts. Hence, the final reference data consists of two parts: (1) a randomly stratified sample collected by crowdsourcing (49,982 locations); (2) a targeted sample collected by experts (176,340 locations, at those locations where the information collected from the crowdsourcing campaign was not large enough to ensure a robust classification).


Table 1 contains the initial classification used for visual interpretation of the reference samples and the aggregated classes presented in the final reference data set. For the Geo-Wiki campaigns, we attempted to collect information (1) related to forest management practices and (2) recognizable from very high-resolution satellite imagery or time series of vegetation indices. The final reference data set and the final map contain an aggregation of classes, i.e., only those that were reliably distinguishable from visual interpretation of satellite imagery.

Table 1 Forest management classes and definitions.

Sampling design for the crowdsourcing campaigns

Initially, we generated a random stratified sample of 110,000 sites globally. The total number of sample sites was chosen based on experiences from past Geo-Wiki campaigns12, a practical estimation of the potential number of volunteer participants that we could engage in the campaign, and the expected spatial variation in forest management. We used two spatial data sets for the stratification of the sample: World Wildlife Fund (WWF) Terrestrial Ecoregions13 and Global Forest Change14. The samples were stratified into three biomes, based on WWF Terrestrial Ecoregions (Fig. 2): boreal (25 000 sample sites), temperate (35,000 sample sites) and tropical (50,000 sample sites). Within each biome, we used Hansen’s14 Global Forest Change maps to derive areas with “forest remaining forest” 2000–2015, “forest loss or gain”, and “permanent non-forest” areas.

Fig. 2
figure 2

Biomes for sampling stratification (1 – boreal, 2 – temperate, 3 – sub-tropical and tropical).

The sample size was determined from previous experiences, taking into account the expected spatial variation in forest management within each biome. Tropical forests had the largest sample size because of increasing commodity-driven deforestation15, the wide spatial extent of plantations, and slash and burn agriculture. Temperate forests had a larger sample compared to boreal forests due to their higher fragmentation. Each sample site was classified by at least three different participants, thus accounting for human error and varying expertise16,17,18. At a later stage, following a preliminary analysis of the data collected, we increased the number of sample sites to meet certain accuracy thresholds for every mapped class (aiming to exceed 75% accuracy).

The Geo‐Wiki application

Geo‐ is an online application for crowdsourcing and expert visual interpretation of satellite imagery, e.g., to classify land cover and land use. This application has been used in several data collection campaigns over the last decade16,19,20,21,22,23. Here, we implemented a new custom branch of Geo‐Wiki (‘Human impact on Forest’), which is devoted to the collection of forest management data (Fig. 3). Various map overlays (including satellite images from Google Maps, Microsoft Bing Maps and Sentinel 2), campaign statistics and tools to aid interpretation, such as time series profiles of NDVI, were provided as part of this Geo‐Wiki branch, giving users a range of options and choices to facilitate image classification and general data collection. Google Maps and Microsoft Bing Maps include mosaics of very high-resolution satellite and aerial imagery from different time periods and multiple image providers, including the Landsat satellites operated by NASA and USGS as base imagery to commercial image providers such as Digital Globe. More information on the spatial and temporal distribution of very high-resolution satellite imagery can be found in Lesiv et al.24. This collection of images was supplied as guidance for visual interpretation16,20. Participants could analyze time series profiles of NDVI from Landsat, Sentinel 2 and MODIS images, which were derived from Google Earth Engine (GEE). More information on tools can be found in Supplementary file 1.

Fig. 3
figure 3

Screenshot of the Geo‐Wiki interface showing a very high-resolution image from Google Maps and a sample site as a 100 mx100 m blue square, which the participants classified based on the forest management classes on the right.

The blue box in Fig. 3 corresponds to 100 m × 100 m pixels aligned with the Sentinel grid in UTM projection. It is the same geometry required for the classification workflow that is used to produce the Copernicus Land Cover product for 201511.

Before starting the campaign, the participants were shown a series of slides designed to help them gain familiarity with the interface and to train them in how to visually determine and select the most appropriate type of land use and forest management classes at each given location, thereby increasing both consistency and accuracy of the labelling tasks among experts. Once completed, the participants were shown random locations (from the random stratified sample) on the Geo‐Wiki interface and were then asked to select one of the forest management classes outlined in the Definition section (see Table 1 above).

Alternatively, if there was either insufficient quality in the available imagery, or if a participant was unable to determine the forest management type, they could skip such a site (Fig. 3). If a participant skipped a sample site because it was too difficult, other participants would then receive this sample site for classification, whereas in the case of the absence of high-resolution satellite imagery, i.e., Google Maps and Microsoft Bing Maps, this sample site was then removed from the pool of available sample sites. The skipped locations were less than 1% of the total amount of locations assigned for labeling. Table 2 shows the distribution of the skipped locations by countries, based on the subset of the crowdsourced data where all the participants agreed.

Table 2 Distribution of the skipped locations by countries.

Quality assurance and data aggregation of the crowdsourced data

Based on the experience gained from previous crowdsourcing campaigns12,19, we invested in the training of the participants (130 persons in total) and overall quality assurance. Specifically, we provided initial guidelines for the participants in the form of a video and a presentation that were shown before the participants could start classifying in the forest management branch (Supplementary file 1). Additionally, the participants were asked to classify 20 training samples before contributing to the campaign. For each of these training samples, they received text‐based feedback regarding how each location should be classified. Summary information about the participants who filled in the survey at the end of the campaign (i.e., gender, age, level of education, and their country of residence) is provided in the Supplementary file 2. We would like to note that 130 participants is a high number, especially taking the complexity of the task into consideration.

Furthermore, during the campaign, sample sites that were part of the “control” data set were randomly shown to the participants. The participants received text-based feedback regarding whether the classification had been made correctly or not, with additional information and guidance. By providing immediate feedback, our intention was that participants would learn from their mistakes, increasing the quality and classification accuracy over time. If the text‐based feedback was not sufficient to provide an understanding of the correct classification, the participants were able to submit a request (“Ask the expert”) for a more detailed explanation by email.

The control set was independent of the main sample, and it was created using the same random stratified sampling procedure within each biome and the stratification by Global Forest Change maps14 (see “Sample design” section). To determine the size of the control sample, we considered two aspects: (a) the maximum number of sample sites that one person could classify during the entire campaign; (b) the frequency at which control sites would appear among the task sites (defined at 15%, which is a compromise between the classification of as many unknown locations as possible and a sufficient level of quality control, based on previous experience). Our control sample consisted of 5,000 sites. Each control sample site was classified twice by two different experts. When the two experts agreed, these sample sites were added to the final control sample. Where disagreement occurred (in 25% of cases), these sample sites were checked again by the experts and revised accordingly. During the campaign, participants had the option to disagree with the classification of the control site and submit a request with their opinion and arguments. They received an additional quality score in the situation when they were correct, but the experts were not. This procedure also ensured an increase in the quality of the control data set.

To incentivize participation and high-quality classifications, we offered prizes as part of the campaign design. The ranking system for the prize competition considered both the quality of the classifications and the number of classifications provided by a participant. The quality measure was based on the control sample discussed above. The participants randomly received a control point, which was classified in advance by the experts. For every control point, a participant could receive a maximum of +30 points (fully correct classification) to a minimum of −30 points (incorrect classification). In the case where the answer was partly correct (e.g., the participant correctly classified that the forest is managed, but misclassified the regeneration type), they received points ranging from 5 to 25.

The relative quality score for each participant was then calculated as the total sum of gained points divided by the maximum sum of points that this participant could have earned. For any subsequent data analysis, we excluded classifications from those participants whose relative quality score was less than 70%. This threshold corresponds to an average score of 10 points at each location (out of a maximum of 30 points), i.e., where participants were good at defining the aggregated forest management type but may have been less good at providing the more detailed classification.

Unfortunately, we observed some imbalance in the proportion of participants coming from different countries, e.g. there were not so many participants from the tropics. This could have resulted in interpretation errors, even when all the participants agreed on a classification. To address this, we did an additional quality check. We selected only those sample sites where all the participants agreed and then randomly checked 100 sample sites from each class. Table 3 summarizes the results of this check and explains the selection of the final classes presented in Table 1.

Table 3 Qualitative analysis of the reference sample sites with full agreement.

As a result of the actions outlined in Table 3, we compiled the final reference data set, which consisted of 49,982 consistent sample sites.

Additional expert data collection

We used the reference data set to produce a test map of forest management (the classification algorithm used is described in the next section). By checking visually and comparing against the control data set, we found that the map was of insufficient quality for many locations, especially in the case of heterogeneous landscapes. While several reasons for such an unsatisfactory result are possible, the experts agreed that a larger sample size would likely increase the accuracy of the final map, especially in areas of high heterogeneity and for forest management classes that only cover a small spatial extent. To increase the amount of high-quality training data and hence to improve the map, we collected additional data using a targeted approach. In practice, the map was uploaded to Geo-Wiki, and using the embedded drawing tools, the experts randomly checked locations on the map, focusing on their region of expertise and added classified polygons in locations where the forest management was misclassified. To limit model overfitting and oversampling of certain classes, the experts also added points for correctly mapped classes to keep the density of the points the same. This process involved a few iterations of collecting additional points and training the classification algorithm until the map accuracy reached 75%. In total, we collected an additional 176,340 training points. With the 49,982 consistent training points from the Geo-Wiki campaigns, this resulted in 226,322 (Fig. 4). This two-pronged approach would not have been possible without the exhaustive knowledge obtained from running the initial Geo-Wiki campaigns, including numerous questions raised by the campaign participants. Figure 4 also highlights in yellow the areas of very high sampling density, I.e., those collected by the experts. The sampling intensity of these areas is much higher in comparison with the randomly distributed crowdsourced locations, and these are mainly areas with very mixed forest classes or small patches, in most cases, including plantations.

Fig. 4
figure 4

Distribution of reference locations.

Classification algorithm

To produce the forest management map for the year 2015, we applied a workflow that was developed as part of the production of the Copernicus Global Land Services land cover at 100 m resolution (CGLS-LC100) collection 2 product11. A brief description of the workflow (Fig. 5), focusing on the implemented changes, is given below. A more thorough explanation, including detailed technical descriptions of the algorithms, the ancillary data used, and the intermediate products generated, can be found in the Algorithm Theoretical Basis Document (ATBD) of the CGLS-LC100 collection 2 product25.

Fig. 5
figure 5

Workflow overview for the generation of the Copernicus Global Land Cover Layers. Adapted from the Algorithm Theoretical Basis Document25.

The CGLS-LC100 collection 2 processing workflow can be applied to any satellite data, as it is unspecific to different sensors or resolutions. While the CGLS-LC100 Collection 2 product is based on PROBA-V sensor data, the workflow has already been tested with Sentinel 2 and Landsat data, thereby using it for regional/continental land cover (LC) mapping applications11,26. For generating the forest management layer, the main Earth Observation (EO) input was the PROBA-V UTM Analysis Ready Data (ARD) archive based on the complete PROBA-V L1C archive from 2014 to 2016. The ARD pre-processing included geometric transformation into a UTM coordinate system, which reduced distortions in high northern latitudes, as well as improved atmospheric correction, which converted the Top-of-Atmosphere reflectance to surface reflectance (Top-of-Canopy). In a further processing step, gaps in the 5-daily PROBA-V UTM multi-spectral image data with a Ground Sampling Distance (GSD) of ~0.001 degrees (~100 m) were filled using the PROBA-V UTM daily multi-spectral image data with a GSD of ~0.003 degrees (~300 m). This data fusion is based on a Kalman filtering approach, as in Sedano et al.27, but was further adapted to heterogonous surfaces25. Outputs from the EO pre-processing were temporally cleaned by using the internal quality flags of the PROBA-V UTM L3 data, a temporal cloud and outlier filter built on a Fourier transformation. This was done to produce consistent and dense 5-daily image stacks for all global land masses at 100 m resolution and a quality indicator, called the Data Density Indicator (DDI), used in the supervised learning process of the algorithm.

Since the total time series stack for the epoch 2015 (a three-year period including the reference year 2015 +/− 1 year) would be composed of too many proxies for supervised learning, the time and spectral dimension of the data stack had to be condensed. The spectral domain was condensed by using Vegetation Indices (VIs) instead of the original reflectance values. Overall, ten VIs based on the four PROBA-V reflectance bands were generated, which included: Normalized Difference Vegetation Index (NDVI); Enhanced Vegetation Index (EVI); Structure Intensive Pigment Index (SIPI); Normalized Difference Moisture Index (NDMI); Near-Infrared reflectance of vegetation (NIRv); Angle at NIR; HUE and VALUE of the Hue Saturation Value (HSV) color system transformation. The temporal domain of the time series VI stacks was then condensed by extracting metrics, which are used as general descriptors to enable distinguishing between the different LC classes. Overall, we extracted 266 temporal, descriptive, and textual metrics from the VI times series stacks. The temporal descriptors were derived through a harmonic model, fitted through the time series of each of the VIs based on a Fourier transformation28,29. In addition to the seven parameters of the harmonic model that describe the overall level and seasonality of the VI time series, 11 descriptive statistics (mean, standard deviation, minimum, maximum, sum, median, 10th percentile, 90th percentile, 10th – 90th percentile range, time step of the first minimum appearance, and time step of the first maximum appearance) and one textural metric (median variation of the center pixel to median of the neighbours) were generated for each VI. Additionally, the elevation, slope, aspect, and purity derived at 100 m from a Digital Elevation Model (DEM) were added. Overall, 270 metrics were extracted from the PROBA-V UTM 2015 epoch.

The main difference to the original CGLS-LC100 collection 2 algorithms is the use of forest management training data instead of the global LC reference data set, as well as only using the discrete classification branch of the algorithm. The dedicated regressor branch of the CGLS-LC100 collection 2 algorithm, i.e., outputting cover fraction maps for all LC classes, was not needed for generating the forest management layer.

In order to adapt the classification algorithm to sub-continental and continental patterns, the classification of the data was carried out per biome cluster, with the 73 biome clusters defined by the combination of several global ecological layers, which include the ecoregions 2017 dataset30, the Geiger-Koeppen dataset31, the global FAO eco-regions dataset32, a global tree-line layer33, the Sentinel-2 tiling grid and the PROBA-V imaging extent;30,31 this, effectively, resulted in the creation of 73 classification models, each with its non-overlapping geographic extent and its own training dataset. Next, in preparation for the classification procedure, the metrics of all training points were analyzed for outliers, as well as screened via an all-relevant feature selection approach for the best metric combinations (i.e., best band selection) for each biome cluster in order to reduce redundancy between parameters used in the classification. The best metrics are defined as those that have the highest separability compared to other metrics. For each metric, the separability is calculated by comparing the metric values of one class to the metric values of another class; more details can be found in the ATBD25. The optimized training data set, together with the quality indicator of the input data (DDI data set) as a weight factor, were used in the training of the Random Forest classifier. Moreover, a 5-fold cross-validation was used to optimize the classifier parameters for each generated model (one per biome).

Finally, the Random Forest classification was used to produce a hard classification, showing the discrete class for each pixel, as well as the predicted class probability. In the last step, the discrete classification results (now called the forest management map) are modified by the CGLS-LC100 collection 2 tree cover fraction layer29. Therefore, the tree cover fraction layer, showing the relative distribution of trees within one pixel, was used to remove areas with less than 10% tree cover fraction in the forest management layer, following the FAO definition of forest. Figure 6 shows the class probability layer that illustrates the model behavior, highlighting the areas of class confusion. This layer shows that there is high confusion between forest management classes in heterogeneous landscapes, e.g., in Europe and the Tropics while homogenous landscapes, such as Boreal forests, are mapped with high confidence. It is important to note that a low probability does not mean that the classification is wrong.

Fig. 6
figure 6

The predicted class probability by the Random Forest classification.

Source link

Stay in Touch

To follow the best weight loss journeys, success stories and inspirational interviews with the industry's top coaches and specialists. Start changing your life today!


Related Articles