LTER Network Information System grows

Issue: Network News Summer 2015, Vol. 28 No. 2

The Long Term Ecological Research (LTER) Network Information System (NIS) now contains 28,660 data packages from 27 sites, including those from the retired Shortgrass Steppe (SGS) and North Inlet (NIN) LTER sites. Since the early release of PASTA [1] in January 2013, the number of data sets in the repository has increased arithmetically, and many strategic advances have furthered the goals of the NIS.

A major addition to PASTA was the integration of the Apache Solr search engine, which has vastly improved the speed and agility of searches for science metadata in the NIS. Solr replaces the Metacat database, which was used as a transitional technology to shorten the PASTA release schedule. Solr is a web-enabled search engine that uses open standards and therefore integrated seamlessly into the PASTA framework. If you haven’t searched for your favorite dataset in PASTA lately, you should really give it a try. I think you’ll be impressed.
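For readers who like to script their searches, the following Python sketch shows roughly what a Solr-style query looks like as a URL. It is only an illustration: the endpoint `https://pasta.lternet.edu/package/search/eml`, the field names `packageid` and `title`, and the helper `build_search_url` are assumptions not taken from this article, so check them against the PASTA API documentation before use. The parameters `q`, `fq`, `fl`, and `rows` are standard Solr query parameters.

```python
from urllib.parse import urlencode

# Assumed search endpoint (not stated in the article); verify before relying on it.
SEARCH_URL = "https://pasta.lternet.edu/package/search/eml"


def build_search_url(keywords, fields=("packageid", "title"), rows=10, filter_query=None):
    """Return a Solr-style search URL for the given keywords.

    keywords     -- free-text query, sent as Solr's `q` parameter
    fields       -- fields to return, sent as `fl` (field names are assumed)
    rows         -- maximum number of results, sent as `rows`
    filter_query -- optional Solr filter, sent as `fq`
    """
    params = [
        ("q", keywords),
        ("fl", ",".join(fields)),
        ("rows", str(rows)),
    ]
    if filter_query:
        params.append(("fq", filter_query))
    # urlencode percent-escapes spaces, commas, and colons for us.
    return SEARCH_URL + "?" + urlencode(params)


if __name__ == "__main__":
    # Only prints the URL; no network request is made here.
    print(build_search_url("lake temperature", rows=5))
```

The sketch only builds the URL, so it can be pasted into a browser or handed to whichever HTTP client you prefer.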

In addition, in March 2015 the LTER Network Office successfully transitioned from using the LTER Metacat as the primary DataONE Member Node to using the DataONE-supported Generic Member Node (GMN). Whereas only metadata was previously available through Metacat, LTER is now synchronizing all publicly accessible data packages residing in PASTA, including both data and metadata, to the DataONE Federation. At present, we have shared over 28,000 public data packages (nearly 250,000 individual objects) from the LTER Network to DataONE, representing the single largest contribution from any Member Node. For continuity, we have also moved metadata documents from the now-deprecated Metacat to the LTER GMN, where they are still available through DataONE’s search interface.

Furthermore, we have made a number of enhancements to the NIS Data Portal to make it friendlier for data consumers. The most visually noticeable of these improvements is the embedded Google Map that appears in the summary information for each data package containing geographic coverage information. Browse to your favorite data package via the Data Portal and take a look. If, while browsing, you find yourself losing track of your datasets over time, you can take advantage of the new Data Shelf feature, which allows users who have logged in to the Data Portal to store data packages of interest on a virtual shelf for later reference. And if your site takes advantage of the optional EML element specifying funding sources (<funding>), you can now quickly filter your datasets by funding code (e.g., DEB-0832652) in the simple or advanced search interfaces of the Data Portal. For “big data” power users, PASTA now supports downloading of very large (>50GB) data files. Lastly, many users are taking advantage of the ability to generate “data harvest” code for popular statistical packages (Matlab, R, SAS, and SPSS) specifically for each data entity, right from the interface. This feature was enabled by code produced by LTER information managers and operates as a web service through the VCR LTER web site.

On the data producer side, data can now be uploaded directly to PASTA from the user’s desktop computer without having to preload it on an Internet-accessible web server. This is a major timesaver for information managers and scientists alike who do not have easy access to a web server.

North Temperate Lakes (NTL) achieved a major milestone for the NIS with a publication in Nature's Scientific Data that included the first PASTA data package in that journal (doi:10.6073/pasta/379a6cebee50119df2575c469aba19c5). “A global database of lake surface temperatures collected by in situ and satellite methods from 1985–2009,” by Sapna Sharma et al. (2015), was published and featured in the March issue of Scientific Data. The publication was a group effort by the Global Lake Temperature Collaboration (www.laketemperature.org), of which North Temperate Lakes LTER is a major collaborator.

Only last year, the LTER NIS repository was vetted and approved by Nature Publishing Group as a recognized community repository for data published and cited in Scientific Data and other Nature journals. When a data package is published in the LTER Network Information System, a DOI is minted by PASTA and registered through EZID (a DOI subscription service offered to qualifying academic efforts by the California Digital Library) and into DataCite, the premier DOI registry for scientific data. The DOI is used by the Data Citation Index to track acknowledgement and citation of the data package in subsequent publications.
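Because every published data package carries a DOI, it can be cited and looked up like any other publication. As a small illustration, this Python sketch turns a DOI written in the "doi:" form used in this article into a link for the public https://doi.org/ resolver. The helper name `doi_to_url` is hypothetical; the example DOI is the NTL data package mentioned above.

```python
from urllib.parse import quote


def doi_to_url(doi):
    """Turn 'doi:10.6073/pasta/...' (or a bare DOI) into a https://doi.org/ URL."""
    doi = doi.strip()
    # Drop an optional, case-insensitive "doi:" prefix.
    if doi.lower().startswith("doi:"):
        doi = doi[4:].strip()
    # Every DOI starts with the directory indicator "10.".
    if not doi.startswith("10."):
        raise ValueError("not a DOI: " + repr(doi))
    # Keep "/" as-is; percent-escape anything unsafe in a URL path.
    return "https://doi.org/" + quote(doi, safe="/")


if __name__ == "__main__":
    print(doi_to_url("doi:10.6073/pasta/379a6cebee50119df2575c469aba19c5"))
```

The resulting link is what a reader would follow from a citation; the sketch does not contact EZID or DataCite.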

As this effort demonstrates, data publication and citation represent real opportunities to create scientifically viable products and increase the visibility and impact of LTER research.

Last but not least is the harvest of the LTER Landsat collection into PASTA. The Landsat collection of satellite imagery has been maintained for years by John Vande Castle at the LTER Network Office and is now being harvested into PASTA, including the atmospherically corrected data that were recently completed as part of an LTER science working group.

All in all, a very busy couple of years for the PASTA folks and the LTER information management group.

  1. The Provenance Aware Synthesis Tracking Architecture (PASTA), which is accessible through portal.lternet.edu, provides a repository for all LTER data, including both metadata and data.