|
| |||
|
Lake Tanganyika Project: Zambia National Site Characterisation and Catchment Management Design Workshop
GIS training module – Lesson 5 Thomas Gumbricht, ICRAF |
|
|
Lake Tanganyika Regional Integrated Management Project
A United Nations Development Programme (UNDP) Global Environmental Facility (GEF) project covering the Lake Tanganyika riparian countries, Burundi, the Democratic Republic of Congo, Tanzania and Zambia.
Training module created by Thomas Gumbricht, www.mapjourney.com
Last updated: October, 2010
LESSON 5 – VEGETATION AND TIME SERIES DATA
In this lesson you will learn about satellite derived vegetation data, and how to use it for assessing land degradation; how to visually explore changes in both space and time using GIS; how to analyze time series data using DIVA-GIS; and more advanced methods for grid calculations. The lesson also gives an introduction to different data formats for grid data.
The lack of adequate ground data has led to most studies utilizing the growing archives of satellite data for mapping vegetation changes and land degradation. In this lesson, time series data of vegetation for the period 1981 to 2009 is calculated from 8 km resolution Normalized Difference Vegetation Index (NDVI) data obtained from the Advanced Very High Resolution Radiometer (AVHRR) sensor flown on a series of satellites operated by the National Oceanic Atmospheric Administration (NOAA) of the United States of America (USA). From 1981 until today AVHRR instruments have been carried on board, NOAA-7, NOAA-9, NOAA-11, NOAA-14, NOAA-16, and NOAA-17. The AVHRR sensor records electromagnetic radiation in five bands or channels: visible, near infrared, middle infrared and two thermal bands. The resolution at nadir (straight down) is 1.1 km. The spectral bands used for monitoring vegetation are the visible (580-680 nm) and the near infrared (725-1100 nm). Reflectances in these two bands are combined to calculate the Normalized Difference Vegetation Index (NDVI):
NDVI = (near infrared - visible)/ (near infrared + visible) (Eq.1)
In this project we have used the AVHRR NDVI processed by the Global Inventory Monitoring and Modeling System (GIMMS) group at NASA Goddard Space Flight Center (GSFC). The GIMMS-NDVI is corrected for the different generation of sensors, for sensor degradation and for atmospheric disturbances. The global GIMMS dataset used here has a resolution of 8 * 8 km.
The AVHRR sensor was not designed for vegetation mapping, and NDVI is hampered by several shortcomings. To overcome the influence of soil signals at low vegetation densities affecting NDVI, a previously published soil adjustment factor was applied. All the NDVI data used in this lesson, and all NDVI data available in the dataset are hence soil adjusted, sometimes denoted *NDVI. Information about the data source, and where to download, is in the project web-catalog. Details about the soil adjustment and data processing are in the main project report (Section 5).
The *NDVI data was recalculated to annual indexes of vegetation growth using three approaches: 1) annual average vegetation, 2) annual maximum vegetation and, 3) annual increments in vegetation. The annual indexes were calculated using the period January to December. The figure below illustrates the difference in the indexes (see section 5.7 in the main report).
Earlier studies have used either an annual (or growing season) NDVI average (or integral) for mapping vegetation growth. Studies restricted to cropland have tended to use the annual maximum NDVI, hypothesized to represent the standing crop before harvest. For this project we adopted a theoretically neutral annual vegetation index, defined as the summed annual increment in NDVI between each of the individual observations (i.e. difference in vegetation between current and previous observation if positive) over an annual NDVI cycle. The starting dekadof each annual vegetation cycle was set to December the year before. By summing the increments we hypothesize that our index better captures the productivity of rangelands, and neutralizes the differences between rangelands, croplands and woodlands. It further has the advantage of neutralizing initial background effects (e.g. soil moisture conditions, influence of woody biomass – see the illustration above) at the start of an annual growing cycle. The developed index would should hence be more suitable when comparing vegetation in landscapes with transient temporal and spatial changes, compared to both either annual average and or maximum vegetation used in most other studies. Each of the three indexes has its advantage and drawback: annual average vegetation can be misleading as it is sensitive to cloud contamination, and it overestimates vegetation growth in evergreen forests and shrub lands; maximum vegetation is not suitable for pastoral landscapes with continuous or intermittent grazing, where standing crop is low but growth can still be higher compared to fallow or croplands; the annual vegetation increment is sensitive to cloud contamination.
About grid data and data formats
In DIVA-GIS a grid layer is built of three separate files; a data file (with the extension *.gri), a header file that contains information on rows, columns, resolution, projection, data format and legend (with the extension *.grd) and an image file that is displayed in the Data View (usually a bitmap - *.bmp). Most GIS software only use two files for grids, and do not have a separate image file – the GIS software generate the display image when a grid is added. Here is a grid header file from DIVA-GIS, with some explanations:
[General] Creator=Thomas Gumbricht Created=Sun Oct 28 17:55:33 2007 Title=Afpopd90
[GeoReference] Projection=GEOGRAPHIC Datum=WGS84 Mapunits=people/km2 Columns=216 Rows=240 MinX=33.000000 MaxX=42.000000 MinY=-5.000000 MaxY=5.000000 ResolutionX=0.041667 ResolutionY=0.041667
[Data] DataType=INT2BYTES MinValue=0 MaxValue=10410 NoDataValue=-9999 Transparent=1 Units= people/km2
[Application] Opt0= Imported from …. Hist0=Population data made …. Hist1=Produced by CIESIN….
[ContLegend] Count=2 Color1=16711680 Value1=0 Label1= Color2=255 Value2=10410 Label2=
[Legend] Count=9 Color1=13816530 Min1=0 Max1=0 Label1=0 . . . Color9=65634 Min9=5001 Max9=1000000 Label9=>5000 Transparent=1 NoDataColor=16777210 NoDataLabel=No Data isContinuous=0 SpacedByColor=0 | # General information # # #
# The Coordinate system of the grid # Either Geographic or Other (i.e. projected) # How the Earth roundness is set # The units in Other projections (see below) # Nr of columns in grid # Nr of rows in grid # Coordinate of left side # Coordinate of right side # Coordinate of bottom # Coordinate of top # Resolution in Grid X # Resolution in Grid Y
# # Datatype can be byte, integer or real # Minimum data value in grid # Maximum data value in grid # Value denoting no data # if no data is transparent or not in the view # Units in map (here = people/km2)
# # DIVA-GIS operations (entered by DIVA) # Metadata put by the producer/user # Any number of Hist entries allowed
# Default continuous legend (if no [Legend]) # Nr of entries in default legend # Color of entry 1 in default legend # Value of entry in default legend (= min) # Text for entry 1 in default legend # Color for label 2 in default legend # Value for label 2 in default legend (= max) # Color for label 2 in default legend # # If given overrides the [ContLegend] # Nr of entries in the Legend # Color of entry 1 in Legend # Min value for entry 1 in Legend # Max value for entry 1 in Legend # Label for entry 1 in Legend . . . # Color of entry 9 in Legend # Min value for entry 9 in Legend # Max value for entry 9 in Legend # Label for entry 9 in Legend # If nodata is transparent (1) or not (0) # No data color (shown if Transparent = 0) # Label for no data # If legend is continuous or has gaps # If Legend should be spaced by color
|
Most of the information given above can be retrieved from the Properties window of grid layers.
The georeference of a GIS data layer can either be Geographic (unprojected) or Projected (called Other in DIVA-GIS). A Geographic georeference is given in Longitude for X and Latitude for Y. DIVA-GIS can reproject layers from Geographic to many different Projections, but not vice versa. A Geographic georefence is defaulted to have Mapunits in decimal degrees (e.g. 33.000000 and not in degrees-minutes-seconds 33 0 0). For Projected layers Mapunits can be kilometers, meters, millimeters, miles, feet etc. The Datum describes the spherical shape of the Earth, as the Earth is not completely round but is flattened at the poles and thicker around the equator. WGS84 (World Geodetic System 1984) is the default for Geographic georeference. If you reproject data to a Projection, another Datum can be set (see lesson 7).
The DataType in GIS grid layers can have many formats, but is normally restricted to three formats:
BYTE (INT1BYTE) | 0 .. 255 |
INTEGER (INT2BYTES) | –32768 ..32767 |
REAL (FLT4BYTES) | 1.5 x 10^–45 .. 3.4 x 10^38 |
In a Byte image each cell uses one byte (8 bits) for storing each cell value, and the data range can be a whole number between 0 and 255 (called unsigned byte by computer specialists as only positive numbers can be stored). In an Integer image each cell uses 2 bytes (16 bits) for storing each cell value, and the data range can be a whole number between –32768 and 32767 (signed small integer for computer specialists, as the data can have both positive and negative values). An image with Real values uses 4 bytes (32 bits) for storing each cell value, and the data range can be a ratio number (32 bit real is called Single by computer specialists). The trend images that you generated in lesson 4 contain Real data (FLT4BYTES in the DIVA-GIS header).
The image files for vegetation and rainfall that you will encounter in this lesson, where generated outside DIVA-GIS to make a more full use of color ramping. If you make changes to the legends in these files, DIVA-GIS will regenerate the image file and the pre-made color ramp in the images will disappear, You will not be able to recreate the symbolization using DIVA, so if you change the symbolization, then you can make use of the image files to get the original symbolization back by using an image program (photoshop, gimp etc) to convert the jpg image with the same name to a bitmap (bmp).
In order to get some reference and background data to use for interpreting vegetation trends, start by adding the following framework (basic) data layers.
Data layer | Folder | File |
Countries | \data_spatial\laketan\political\vector | generalised_borders |
Rivers | \data_spatial\laketan\hydro\vector | srtm_flowpath |
Basins | \data_spatial\laketan\hydro\vector | watershed |
Climate | \data_spatial\laketan\climate | climate_monthly_average |
Try to symbolize the layers. The country layer you can keep as it is (transparent fill with black boundaries). The river should be symbolized using the ID attribute (only attribute available) that gives information on stream order (number of upstream divisions), for example as shown below. To generate stream order, the river data must have a correct topology (see lesson 2), that is to say that the streams must be connected to each other with no gaps, and a new stream segment at each flow join or bifurcation (splitting of one stream to two). Stream order can be seen as ordinal data (see lesson 2 for an explaining of ordinal data), and as this ordinal data is numerical you can actually use the classification option to symbolize the streams into small and large (as shown below).
The watershed layer (watershed) you can put any color to the border (magenta in the example below), but keep the transparent fill. The climate data you will not need until later in the lesson, and you can leave it for now. The illustration below shows the DIVA-GIS map view if you followed the symbolization suggested above.
For the analysis of vegetation covering the whole of the Lake Tanganyika Basin, vegetation data from the AVHRR (Advanced Very High Resolution Radiometer) was used (see above). The AVHRR data is recorded daily, and then composed into 10-day values for vegetation using the maximum recorded values. The maximum value is preferred in many studies as clouds and atmospheric pollution tend to decrease the value of NDVI. All the NDVI vegetation data for L. Tanganyika basin as a whole is under \data_spatial\laketan\rs\ndvi.
2005 (dry) | 2004 (wet) |
The plot images above clearly illustrate the variation in annual growing cycles under different rainfall conditions in L. Tanganyika.
You can use excel to design a better graph, including setting colors on the bars, putting labels on the axis, adding more years with data etc.
Annual indexes of vegetation production were calculated using 12 months cycles starting in January and ending in December for each year 1981 to 2009. As stated above three different annual indexes were calculated (annual average *NDVI, annual maximum *NDVI and annual increment *NDVI), and they are all included on the project CD. The table below gives information on the different indexes and where to find the data in the dataset.
Annual vegetation production index | Folder |
Annual average | \data_spatial\laketan\rs\ndvig\GIMMS_NDVIg-annual-average |
Annual maximum | \data_spatial\laketan\rs\ndvig\ |
Annual increment | \data_spatial\laketan\rs\ndvig\GIMMS_NDVIg-annual-increment |
For each index there are layers representing the mean situation for the period 1982-2009, the normalized trend slope 1982-2009 based on z-scores (see below), the normalized trend slope coefficient of determination (regression r2) 1982-2009, and the significance (p) level of the normalized trend slopes.
As you are now going to add several grid files at the same time, it is better to uncheck the option for Layer is visible when added, in the Option window (found via the menu: Tools – Options; see lesson 3 to set the Options).
When you add the vegetation layers to your DIVA-GIS project, only add the grid (*.grd) files, do not load any image (*.jpg) file. You do not need them, as the symbolization of the grid files is identical to the image files.
Start by adding at least three images showing the annual mean vegetation production going back to 1982. Use one image from the early 1980’s, one from the 1990’s and one recent image. In the example below, images for 1984 (very dry), 1994 (very wet), and 2004 (“normal”), were used.
Put the added vegetation images at the bottom of the Legend in the Data view, and turn on the layers showing the countries of L.Tanganyika watershed, then sequentially turn on each vegetation image. The years used in this tutorial show a remarkable difference in annual vegetation production. In the example below you can see the vegetation production in 1984, 1994 and 2004 using the increment *NDVI index as an example. During the drought of 1984 vegetation production was much less and did not reach as fat North as after the rainfall recovery.
1984 | 1994 | 2004 |
You can use the framework dataset to do some simple visual interpretation of correlations between various spatial (geographic) phenomena and vegetation production.
The framework dataset includes flow paths and watersheds, and you need to symbolize these to be able to visually interpret if they have any effect on the vegetation pattern. If you followed this tutorial, you should already have rivers and drainage basins properly symbolized in your DIVA-GIS project, if not you need to go the Frameworks data set section above and do the symbolization now.
The image below shows drainage and average vegetation production (1982 to 2009) for Tanzania part of L. Tanganyika, indicated in the overview map (see lesson 1 to repeat how to add an overview map to your DIVA-GIS project).
To make a more thorough analysis of topography and vegetation growth more detailed data is needed for both vegetation and topography.
In lesson 4 you used layers showing protected areas to visually analyze the tree cover and if tree cover differed inside and outside protected areas in L. Tanganyika. You can now redo that analysis but instead of using tree cover you can use NDVI layers. In the example below IUCN protected national areas (under the folder \data_spatila\laketan\mapdata\protected\vector) are put on top of the average vegetation production layer, but there seems to be no significant difference in vegetation production between protected and un-protected areas. You can test it by using the Identify tool.
Change to the Classes tab in the Properties window and create a classification of the annual rainfall attribute (PANN) as shown below (precipitation is an attribute of the type ratio, and symbolization should be done using classes – click here to get to lesson 2 explaining ratio data and other attribute data types). Change the label of the theme to Annual rainfall (mm) (or other relevant label), the label will then be changed in the Legend of the Data View.
Toggle between viewing the annual rainfall layer and the average vegetation production layer by turning the rainfall layer on and off.
The climate statistical layer contains monthly data not only for precipitation, but also for temperature and soil water, and can be used for making further analysis. If you want to use the climate statistical data layer for calculations, DIVA-GIS can do that if you first convert the polygons to a grid, via the menu: Data – Polygon to Grid. In the Polygon to Grid window you can set the parameters of the output grid to exactly fit the vegetation data. As a grid layer only contains one attribute, you need to select each attribute of the vector layer to generate a new grid file (that is why vector files consume less storage space in the memory). For now it is enough if you generate one grid file for the annual average rainfall pattern.
The image below shows the grid layer for annual rainfall generated by DIVA-GIS, and then symbolized using rainfall classes. You can use the Polygon to Grid function in DIVA GIS to convert any vector polygon (shape file) to a grid, including protected areas, countries, river basins etc. In lesson 6 you will use the country layer to create a grid mask in this way.
In lesson 4 you used map algebra to analyze the relations between cells in the same position, but in different maps. Comparing single cells over different grid layers is called map algebra or local analyzes in the GIS jargon. But you can also analyze a cell related to its neighbors, called focal (or neighborhood) analysis. The most common focal analysis is filtering. A filter is a small window (or kernel in GIS jargon) of cells (3x3, 5x5, 7x7 etc), where the central pixel in the kernel is assigned a value dependent on the values of its neighbors in the kernel. Filtering can be used for analyzing many spatial relations at a focal scale, including mean, median, mode, minimum and maximum values. But also more advanced functions can be used, including for analyzing variation and richness. Make the gridded rainfall map the active theme, and go via the menu: Grid – Neighborhood, to open the window for doing neighborhood analysis. In the dropdown menu for Method you can choose which kind of focal analysis you want to do.
In this case you want to smooth the rainfall grid so that the clusters of pixels with similar values get a more smooth transition. Rainfall is a typical ratio attribute, and hence Mean is the best suited filtering method (if you want to filter nominal data you can not use Mean, but any of the other methods could be used). As the cluster of pixels with similar values are of the size 8x8 cells you should set the Neighborhood (filter or kernel) size to be 9 x 9 in the drop down menu as shown above. Then click OK to do the neighborhood analysis.
Open the properties window for the smoothed rainfall grid (precip_annual_smoothed in the tutorial) and load the legend from the original rainfall grid (see lesson 4 to repeat how to load a legend from an existing to grid). Arrange the display order of the rainfall grid maps and the layer showing the country borders as shown below, then toggle the top rainfall grid layer on and off and you will see the result of the Neighborhood analyses – the filtered version of the rainfall grid has a much more smooth appearance.
If you redo the transect analysis using the smoothed grid you will also see that the clusters of similar values have disappeared, and the values are smoothed.
Neighborhood analysis can also be used to calculate slope (steepness) and aspect (direction of slope). In DIVA-GIS this function is available under the menu: Modeling – Terrain modeling. It is intended to use with topographic data, and will not be used in this lesson.
For mapping out land degradation it is more interesting to analyze the temporal trends in vegetation than the vegetation cover in a particular year.
Instead of clicking in the map, you can also give values for which row and column to plot. After entering values in the text fields, just click Apply for DIVA-GIS to analyze your selected point. You can again copy the stack plot graph or the values behind it to the clipboard and just paste the graphics or numerical values into any other Windows application. An efficient way to add data to excel, make an image for a presentation, or for including in a report. Close the Stack Plot window.
A stack can be used as input for map calculations in DIVA-GIS, introduced in lesson 4. One such option is to calculate a regression trend using a stack. DIVA-GIS supports calculating an ordinary least square (OLS) regression using a stack as input. We can hence use DIVA-GIS to find areas where vegetation has increased or decreased over the last 27 years. The regression option is reached via the menu: Stack – Regression. Select the same stack you used for the stack plot above, click the Output button and give a logical output name, and click Apply to perform the regression, illustrated below.
DIVA-GIS will generate two files from the regression, one with the suffix _slope (the strength of the slope) and a second with the suffix _r2 (the correlation in the slope). Only the first file is automatically added to the project, the second must be added using the Add layer tool. Below you see an example of how the regression slope result has been symbolized using color ramping in 8 classes. It is very clear that vegetation production has reduced over most of Lake Tanganyika for the last 27 years.
Instead of analyzing the trend using the map, you can also use DIVA-GIS to plot a histogram of your results. Make sure the time-series regression layer is the active theme, and go via the menu: Analysis – Histogram. If you had the wrong theme as active layer, you can select which file to plot also directly from the Histogram window. You can set the Class width or the Number of classes you want for your histogram plot, but start with the default - Classes as defined in the legend of the grid layer. Just click Apply, and you will get a histogram where each bar is symbolized with the same color it has in the legend. Click the checkbox for Show values to see the actual numerical values (number of cells falling within each class).
As you see above most cells fall in the class in the negative values indicating a decrease in vegetation cover for the last 27 years.
Again you can copy the graph and paste it into any other Windows document, or you can copy the values and paste them to Excel for further calculations or for building another chart type using the classified data. You can also zoom in and out in the chart to enlarge sections of the chart.
In DIVA-GIS you can also do an Ordinary Least Square (OLS) regression with either single or multiple independent variables to estimate the value of a dependent variable.
Cut Grid
If you keep the Cut window open you can just change Input and Output layers to cut another grid to the same size used for the previous cutting. Otherwise you can also enter the dimensions of the output layer, either as Row/Columns positions or as Coordinates. You can also copy the extent of the cutting from an existing grid in smaller size (i.e. a grid that you already cut before). It is important that your grids covering a specific region have the same extent and resolution (see A note on grid data dimensions in lesson 4). Also cut one of the annual vegetation grids (mean for the whole period 1982-2009, or a grid representing a single year).
You can symbolize the cut grids (rainfall and vegetation production) by loading the legends from existing layers that are already symbolized. The two images below show the cut versions of the rainfall and averaget *NDVI (mean for 1982-2009) grid layers, symbolized using the original grid layers. Note how the colors are not as gradual in the cut *NDVI layer as in the original – the original layer was symbolized outside DIVA-GIS and can not easily be recreated using DIVA-GIS (you then need to define 250 entries in the Legend).
With the two smaller grids shown in the image above, you can use DIVA-GIS to test the statistical dependency of vegetation production on rainfall. This regression function is under the menu: Analysis – Regression. In the Regression window set the cut vegetation grid to be the dependent variable (Y), and the rainfall to be the independent variable (X). Then click Apply to perform the regression analysis. When the regression is ready you can watch the results under the two tabs in the Regression window. The Chart tab opens automatically when the regression is done, and you see the regression as a plotted graph.
(If you look at the curve in the figure above, a logarithmic function should fit the data better than a linear model. In DIVA-GIS you can use the Natural Logarithm as a fitting function rather than a linear function, but the result is not as good as for the linear model. With another exponential function than the natural logarithm the fitting would be better.)
Under the result tab you can see some more details about the regression, including the regression intercept and slope.
In the example above the intercept (1124.246) and slope (-0.3741) leads to the following regression model for determining vegetation production in Lake Tanganyika basin:
annual Average *NDVI = 1124.246-0.3741*rainfall
You can also see the coefficient of determination (R2) that is as high as 0.068 in the example above that can be interpreted as rainfall determining 6.8 % of the spatial variation in vegetation growth in Lake Tanganyiaka. There are also statistical results for ANOVA (analysis of variance) and about other statistical properties.
If you have more variables available that you think can explain vegetation growth, you can also include them by using a multiple regression, available in DIVA-GIS under the menu: Analysis – Multiple regression. Factors that could be tested together in a multiple regression apart from rainfall include: topography, protected area, population and rainfall the previous year.
You should now be able to use what you learned above to redo the analysis of vegetation trends for your own country or district (districts for all the countries in L. Tanganyika basin are available in the vector layer \data_spatial\laketan\mapdata\political\vector\admin_districts.shp). You first need to select a country/district to work with, and then cut the grid data fit that region, and then redo the analysis. Select a study site and use the Stack Plot tool to capture the vegetation change as recorded with the three different indexes (annual average *NDVI, annual maximum *NDVI and annual increment *NDVI). Transfer the results to excel and create a graph that shows the difference between the three indexes. Use the stack regression to analyze the rainfall relation of each index to see how they respond to rainfall Compile your results in a short report including a designed map, a histogram for each index, a comparing excel diagram, and a transect for the same position for each index.
In statistics, the standard score, also called the z-score or normal score, is a dimensionless quantity derived by subtracting the population mean from an individual raw score and then dividing the difference by the population standard deviation. This conversion process is called standardizing or normalizing.
The standard score indicates how many standard deviations an observation is above or below the mean. It allows comparison of observations from different normal distributions, which is done frequently in research.
The standard score is: | |
where: | x is a raw score to be standardized |
The quantity z represents the distance between the raw score and the population mean in units of the standard deviation. z is negative when the raw score is below the mean, positive when above.