Authors: Chantal Huijbers and Sarah Richmond
Citation: Huijbers C, Richmond S (2019). Key environmental data layers across Ecoscience facilities. [URL] [date accessed].
TABLE OF CONTENTS
- Spread of environmental data layers across facilities
- Overlap of environmental data layers between facilities
- How FAIR is this selection of environmental data layers?
Spatial environmental data is essential to better understand past, current and potential future states of our environment, and as such this data is widely used by both researchers and decision makers. There are many different producers and aggregators of spatial data, but discoverability and access for end users of this data can sometimes be limited due to licensing or technical constraints. Online platforms have been developed to overcome such issues and can provide a common access method for datasets from disparate sources. However, these platforms often serve a particular user community with specific data analysis activities, and as such only provide access to a subset of all spatial environmental data available.
Within the ecocloud Partnership there are a number of platforms that retain standardised copies of spatial environmental data layers relevant to the analytical capabilities provided by each platform. While this enables easy access and visualisation of data within the platform, it also presents a number of issues with regard to maintenance, storage, and interoperability across platforms. Moreover, there is likely significant overlap between these data collections, which are currently stored and managed independently by each facility.
With the increasing volume of data that becomes available for environmental science, some key issues and challenges emerge for platforms hosting this data:
Standardisation of data in the appropriate format for end use in the platform;
Restrictive and mixed licensing, limiting uptake and comparability of datasets;
Maintenance of multiple versions of the same dataset;
Presentation of accurate dataset information depends on metadata provision by the data provider;
While platforms host the data that is most relevant to their end users, there is likely significant overlap among platforms in relevant data. This duplicates both the management of these issues and the curation effort, that is, the selection, integration and presentation of the best available data including metadata.
These issues and challenges also affect the end user of spatial environmental data layers. The interoperability of data layers hosted by different platforms might be limited if data is provided in different formats, with different reuse licenses, or if metadata is not exhaustive enough to understand the similarities or differences between data layers.
This report provides a brief overview of the key national environmental raster data layers hosted by four of the ecocloud partners, with a particular lens on the use of this data in analytical tools. These data layers could be included in the development of a National Spatial Layers Service that aims to reduce replication and increase coordination and collaboration.
The environmental data layers included in this report are hosted by the following facilities:
Atlas of Living Australia (ALA): ALA manages and presents ~500 bio-environmental data layers from over 60 national and international agencies within its Spatial Portal. For this review we only included raster layers with environmental data, and thus excluded the vector layers with contextual information such as boundaries of biological or management regions.
Biodiversity and Climate Change Virtual Laboratory (BCCVL): the BCCVL is an online platform for biodiversity modelling which uses spatial environmental data layers as input to the models. The platform includes a data portal that provides access to ~60,000 spatial raster layers for current and future climate data as well as environmental layers for vegetation, soil, hydrology, stream, catchment and marine data.
Multi-Criteria Analysis Shell for Spatial Decision Support (MCAS-S): MCAS-S is a free spatial software tool provided by the Australian Bureau of Agricultural and Resource Economics and Sciences (ABARES). It enables users to view and combine spatial layers for multi-criteria analysis, which is often used to inform decision making and help with stakeholder engagement. MCAS-S provides access to ~200 spatial raster layers with socio-economic and environmental data.
Terrestrial Ecosystem Research Network (TERN): TERN is Australia’s ecosystem observatory that provides access to standardised measures of land ecosystems. For this review, we only included the national raster layers made available by TERN and the Australian National University (ANU) through the Landscape Data Visualiser and the ANUClimate collection accessible through the TERN Data Discovery Portal.
It is worth noting here that ALA, BCCVL and MCAS-S copy and host spatial environmental layers primarily as data consumers for specific analytical workflows, whereas TERN creates, collects and aggregates environmental data layers as a data provider. As such, the former three platforms retain their own copy of the data layers, while TERN’s Landscape Data Visualiser provides direct download services.
Spread of environmental data layers across facilities
The four facilities included in this report together provide access to almost 200,000 environmental raster layers for climate, fire, hydrology, land use, socio-economic, substrate, topography, and vegetation data (Table 1).
Table 1. Overview of number of environmental raster layers per facility for different classes of data.
The vast majority of climate data is hosted by TERN and BCCVL. While TERN provides access to >80,000 layers of daily and monthly climate data, BCCVL hosts >50,000 layers with possible future climate projections. ALA and MCAS-S only have a subset of long-term climate averages available for use in their tools. For the other environmental data categories, ALA, BCCVL and MCAS-S all provide a relatively even spread of mostly long-term data for a large number of different variables. The Landscape Data Visualiser by TERN/ANU, on the other hand, focuses on a smaller set of variables, but these are measured at daily or weekly intervals over several years, resulting in thousands of layers for 9 variables. In addition, data for the soil variables are provided at 6 different depths for the best estimate and the 5th and 95th percentiles, adding up to >10,000 substrate layers. Interestingly, ALA is the only platform that hosts layers for future substrate data such as potential growth index for plants and water deficiency for soil moisture.
When only considering spatial layers for current environmental conditions, each facility hosts a significant amount of current climate data, regardless of whether that is daily, monthly, annual or long-term data (Figure 1). While there is little overlap of identical climate layers between the facilities, it is clear that a standardised approach to hosting such data would be helpful for data providers, consumers and end users alike. For example, both ALA and BCCVL host global climate data from WorldClim. This data is made available at 4 different spatial resolutions, and recently a new version of the data has been published. For each resolution there are 103 layers available, so in total there are 412 new layers that would have to be downloaded, stored, standardised and visualised, twice, by two different groups. While in this case only 2 facilities would host this data, it is likely that other tools also provide, or would like to provide, access to the same data.
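The WorldClim arithmetic above can be written out as a short calculation. This is only an illustrative sketch: the resolution and layer counts come from the text, while the function name `layer_copies_to_process` is hypothetical.

```python
# Figures taken from the text: 4 spatial resolutions, 103 layers per resolution.
RESOLUTIONS = 4
LAYERS_PER_RESOLUTION = 103


def layer_copies_to_process(hosting_facilities: int) -> int:
    """Layer copies that must be downloaded, stored, standardised and
    visualised when each hosting facility keeps its own copy."""
    return RESOLUTIONS * LAYERS_PER_RESOLUTION * hosting_facilities


shared_copy = layer_copies_to_process(1)    # one centrally hosted copy
two_facilities = layer_copies_to_process(2)  # ALA and BCCVL each keep a copy
print(shared_copy, two_facilities)
```

Because the count grows linearly with the number of hosting facilities, every additional tool that mirrors the same release adds another 412 layers of handling.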
Figure 1. Spread of current only environmental data layers among categories for each facility separately.
Figure 2. Spread of all, including future, environmental data layers among categories for each facility separately.
Figure 3. Spread of environmental raster layers among facilities visualised as percentage for each category of data separately.
Overlap of environmental data layers between facilities
As far as could be extracted from the metadata provided, only 41 of the layers hosted across the four facilities are exact duplicates. However, there is still considerable overlap between layers that provide information for the same variable but are generated from different input data, based on different temporal scales and/or generated at different spatial resolutions. While such variety can be useful for comparative analyses, it also creates confusion and potentially a decrease in confidence and trust for end users who need to decide which layers are best for their research purpose. The metadata for many of these layers was often unclear (e.g. spatial and/or temporal resolution not listed, link to information not working), and the quality and amount of metadata provided varied significantly between the different facilities. Most overlap was found for the following data classes:
Climate data: ALA, BCCVL and MCAS-S all provide access to long-term climate data. While BCCVL only provides data for the standard 19 bioclim variables, ALA and MCAS-S include custom developed layers such as seasonal precipitation reliability. MCAS-S layers are mostly based on data products from the Bureau of Meteorology (BOM), while ALA's climate layers are mostly generated by CSIRO. BCCVL provides the largest suite of different long-term climate datasets, enabling users to compare data from 5 different providers. While such comparisons can be informative, how does a user choose which layer for a particular climate variable is most useful? See Box 1 for a comparison of mean annual rainfall data layers.
Soil data: ALA, BCCVL and MCAS-S all include layers from the Australian Soil Resource Information System (ASRIS) such as clay content, bulk density and water holding capacity. TERN provides data for the same variables from the Soil and Landscape Grid of Australia, but has a more comprehensive suite including all depths. While the provided information states that the Soil and Landscape Grid is part of ASRIS, it might be unclear to an end user accessing this data through the different platforms whether the layers are the same and, if not, what the difference is or how to decide which layers are best suited to the intended use.
Vegetation data: data layers for several variables that provide an indication of the state of the vegetation, such as fractional ground cover, fraction of photosynthetically active radiation (fPAR) and gross/net primary productivity (GPP/NPP), are made available through more than one facility, but are based on different source data or calculated for different year ranges.
Topography data: data layers for variables such as slope and relief are usually based on the Geodata 9 second Digital Elevation Model (DEM), which is duplicated across ALA and MCAS-S. Other topographical data products such as the ridge top and valley bottom flatness by CSIRO are duplicated across ALA, BCCVL and MCAS-S, all of which host an old version of these layers and thus would all need to upgrade to the newer version available in the near future.
A standardised approach to metadata for these spatial environmental raster layers across facilities would greatly improve transparency for end users, helping them to compare layers and select the appropriate data for their end use.
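The distinction drawn in this section between exact duplicates and layers that merely overlap can be sketched as a comparison of standardised metadata records. The following is a minimal illustration, not the method used for this report: the `LayerRecord` fields, the classification labels and all sample values (such as `DEM-X` and `provider-P`) are hypothetical placeholders, not real metadata from any facility.

```python
from collections import Counter
from dataclasses import dataclass
from itertools import combinations
from typing import Iterable, Iterator, Optional, Tuple


@dataclass(frozen=True)
class LayerRecord:
    """Hypothetical minimal metadata record for one raster layer."""
    facility: str
    variable: str
    source: Optional[str]               # input data product, None if not listed
    period: Optional[str]               # temporal coverage, None if not listed
    resolution_arcsec: Optional[float]  # spatial resolution, None if not listed


def compare(a: LayerRecord, b: LayerRecord) -> str:
    """Classify a pair of layers using their metadata only."""
    if a.variable != b.variable:
        return "different"
    fields_a = (a.source, a.period, a.resolution_arcsec)
    fields_b = (b.source, b.period, b.resolution_arcsec)
    if None in fields_a or None in fields_b:
        # Missing metadata: the layers cannot be told apart with confidence.
        return "undetermined"
    return "duplicate" if fields_a == fields_b else "overlap"


def cross_facility_pairs(
    records: Iterable[LayerRecord],
) -> Iterator[Tuple[LayerRecord, LayerRecord, str]]:
    """Yield every pair of layers hosted by different facilities."""
    for a, b in combinations(list(records), 2):
        if a.facility != b.facility:
            yield a, b, compare(a, b)


# Placeholder records for illustration only.
records = [
    LayerRecord("ALA", "slope", "DEM-X", "static", 9.0),
    LayerRecord("MCAS-S", "slope", "DEM-X", "static", 9.0),
    LayerRecord("BCCVL", "mean annual rainfall", "provider-P", "1976-2005", 30.0),
    LayerRecord("ALA", "mean annual rainfall", "provider-Q", "1961-1990", 36.0),
    LayerRecord("MCAS-S", "mean annual rainfall", "provider-R", None, None),
]

summary = Counter(label for _, _, label in cross_facility_pairs(records))
print(summary)
```

The "undetermined" label reflects the situation described above: where spatial or temporal resolution is not listed, neither a tool nor an end user can decide whether two layers are the same.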
How FAIR is this selection of environmental data layers?
The FAIR data principles provide guidelines to improve the findability, accessibility, interoperability, and reuse of data. While it would be time-consuming to assess all of the almost 200,000 layers against these principles, we used the guidelines in the ARDC FAIR self-assessment tool to broadly assess the data provided by the four facilities.
While there is still room for improvement to make this data more FAIR, all facilities provide some metadata to help users understand the provenance of the data. BCCVL and MCAS-S both enable users to directly download the data through their platform, presenting an easy way for researchers to obtain data in a standardised format. However, that approach means that these facilities need to retain a local copy, which incurs extra effort in indexing and maintaining the data as well as keeping track of updates or new versions. TERN’s approach of providing a visualiser for a large range of spatial data layers, including a direct link to the location of that data, is more sustainable in the long term and reduces the effort spent on duplicating layers. For such large datasets, it is recommended to implement services that enable easy access to a data cube based on a few simple parameters (e.g. spatial and temporal range) for end users who do not have the technical or computational knowledge to access this data through a THREDDS server.
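The kind of simple parameterised access recommended above can be sketched as a single function that takes a bounding box and a date range. This is a toy, in-memory illustration under stated assumptions: the `Cube` class, the `subset` function and the sample grid are hypothetical, and a real service would read from a remote store rather than from Python lists.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, List, Tuple


@dataclass
class Cube:
    """Hypothetical in-memory data cube: one 2-D grid per time step."""
    lats: List[float]                      # row coordinates, degrees north
    lons: List[float]                      # column coordinates, degrees east
    slices: Dict[date, List[List[float]]]  # grid[row][col] for each date


def subset(cube: Cube, bbox: Tuple[float, float, float, float],
           start: date, end: date) -> Cube:
    """Return the cells inside bbox for dates in [start, end], inclusive.

    bbox is (min_lon, min_lat, max_lon, max_lat).
    """
    min_lon, min_lat, max_lon, max_lat = bbox
    rows = [i for i, lat in enumerate(cube.lats) if min_lat <= lat <= max_lat]
    cols = [j for j, lon in enumerate(cube.lons) if min_lon <= lon <= max_lon]
    slices = {
        day: [[grid[i][j] for j in cols] for i in rows]
        for day, grid in cube.slices.items()
        if start <= day <= end
    }
    return Cube([cube.lats[i] for i in rows],
                [cube.lons[j] for j in cols],
                slices)


# Toy cube: 3 latitudes x 4 longitudes x 3 days, with values that encode
# their own position (day index * 100 + row * 10 + column).
lats = [-10.0, -20.0, -30.0]
lons = [120.0, 130.0, 140.0, 150.0]
days = [date(2019, 1, 1), date(2019, 1, 2), date(2019, 1, 3)]
cube = Cube(lats, lons, {
    day: [[d * 100 + i * 10 + j for j in range(len(lons))]
          for i in range(len(lats))]
    for d, day in enumerate(days)
})

result = subset(cube, (125.0, -25.0, 145.0, -15.0),
                date(2019, 1, 2), date(2019, 1, 3))
print(result.lats, result.lons, sorted(result.slices))
```

The point of the design is that the end user supplies only a spatial extent and a time window; file layout, chunking and server protocols stay hidden behind the function.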
The four facilities in this report host a wealth of data, and this is still only a subset of all available spatial environmental data in Australia. Other facilities and organisations such as Geoscience Australia, CSIRO, Department of Environment and Energy, BOM, and the other resources provided by TERN include data layers that will also be of interest for the environmental science community, but it was outside the scope of this report to include them all.
In general, the effort that has gone into visualising such a large variety of spatial data layers is impressive and commendable. The services provided by the four facilities and others have already increased the discoverability and accessibility of this data for end users. With new data generated on a daily basis, it is important to keep striving to make this data more FAIR, increase standardisation across facilities, and decrease duplication of effort. As an example, standardisation could be enabled through the use of Cloud Optimized GeoTIFF (COG) files, which enable more efficient workflows in the cloud and are aimed at reducing duplication of data.
The long-term goal of developing a National Spatial Layers Service has been mentioned by several collaborators and initiatives such as National Map, the National Environmental Information Infrastructure (NEII), E2SIP and the National Earth and Environmental Science Facilities Forum (NEESFF). Belbin and Williams (2016) suggested a centralised approach with standardised formats for all layers that can be accessed through a national data cube. Such a national approach would have several benefits, such as an improved supply chain of data, interoperability between systems, reduced duplication of effort and a single point of access to data for both platforms and end users. Additionally, a national environmental data hub with standardised metadata across data layers would give researchers and decision-makers the confidence and trust they need to easily understand the differences between these datasets and choose the data that is best fit for their purpose. Such a standardised access portal should also provide access to historical data, which is frequently unavailable for download.
In summary, a national environmental data hub would provide:
One place to access spatial environmental data layers;
Shared resources and a common infrastructure framework for long-term success;
Agreement on best practices to publish geospatial data;
Standardised metadata to describe spatial information and the methods that generated the data.
One of the potential delivery mechanisms for such a data hub could be through Digital Earth Australia.
The spatial environmental raster layers included in this report only represent a subset of the much larger amount of data available in Australia. This report therefore does not aim to provide a complete overview of all data layers, but rather some examples of good practice across some of the ecocloud partners, and an indication of where interoperability between these facilities can be improved.
Full list with layers:
Belbin, L., & Williams, K. J. (2016). Towards a national bio-environmental data facility: experiences from the Atlas of Living Australia. International Journal of Geographical Information Science, 30(1), 108-125.