LTER advances Ecological Informatics

Issue: Network News Fall 2006, Vol. 19 No. 2

Most ecologists agree that synthesis is necessary and important for addressing new ecological questions, yet synthesizing the desired data products from a diverse array of complex datasets in a robust and reproducible way is challenging. Now, teams of researchers from the Harvard Forest Long-Term Ecological Research site (HFR) and the LTER Network Office (LNO) have advanced our understanding of how to design and build scientifically rigorous online information systems that will directly and significantly enhance ecological synthesis.

The LNO team responsible for designing and developing the Network Information System (NIS) has designed and prototyped a data warehouse framework to support ecological synthesis, building on the successful deployment of Ecological Metadata Language (EML), the Metacat repository, and the Metacat Harvester. This framework, code-named PASTA for Provenance Aware SynThesis Architecture (see Figure 1), is:

  1. Efficient, because it builds on existing investments and experience;
  2. Integrative, because it adopts standard interfaces and approaches; and
  3. Innovative, because it incorporates data provenance and data quality into the design.

The PASTA data warehousing architecture has been prototyped using the dynamic part of the Trends project as a case study and demonstrated to scientists on the Trends editorial committee. PASTA has received positive reviews from the Network Information System Advisory Committee (NISAC), members of the Science Environment for Ecological Knowledge (SEEK) development team, the Trends technical committee, and the LTER IM committee. According to Mark Servilla, lead NIS developer, “The project draws upon current and advanced computing science in the management of data provenance and data quality metrics…. Early prototyping will pay off and accelerate development by giving us material with which to solicit partners and proposals.”

While much remains to be done to bring PASTA into production, a major milestone was reached recently with the development and testing of the EML Parser/Loader. Developed in partnership with SEEK and the National Center for Ecological Analysis and Synthesis (NCEAS), the EML Parser/Loader reads an EML document and uses the information it contains to retrieve a dataset and load it into a relational database management system. In early tests, datasets from the Georgia Coastal Ecosystems (GCE) LTER site were successfully extracted, loaded, and queried. The success of the EML Parser/Loader is the next big step toward automating part of the synthesis process.
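To make the Parser/Loader's role concrete, here is a minimal Python sketch of the same idea, assuming a heavily simplified EML fragment with XML namespaces omitted. It is not the SEEK/NCEAS code: the element subset, the measurement-scale-to-SQL type mapping, the table, column, and site names, and the `load_table` helper are all hypothetical, and the data file is supplied as an inline string rather than downloaded from the URL an EML distribution element would provide. It uses only the Python standard library (`xml.etree.ElementTree`, `csv`, `sqlite3`).

```python
import csv
import io
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified EML fragment (namespaces omitted).
EML_DOC = """
<eml>
  <dataset>
    <dataTable>
      <entityName>water_temp</entityName>
      <physical>
        <dataFormat><textFormat>
          <numHeaderLines>1</numHeaderLines>
          <simpleDelimited><fieldDelimiter>,</fieldDelimiter></simpleDelimited>
        </textFormat></dataFormat>
      </physical>
      <attributeList>
        <attribute><attributeName>site</attributeName>
          <measurementScale><nominal/></measurementScale></attribute>
        <attribute><attributeName>temp_c</attributeName>
          <measurementScale><interval/></measurementScale></attribute>
      </attributeList>
    </dataTable>
  </dataset>
</eml>
"""

# Hypothetical stand-in for the data file the EML metadata would point to.
DATA_FILE = "site,temp_c\nA,24.5\nB,26.1\n"

# Assumed mapping from EML measurement scales to SQLite column types.
SCALE_TO_SQL = {"nominal": "TEXT", "ordinal": "TEXT", "interval": "REAL", "ratio": "REAL"}


def load_table(eml_text, data_text, conn):
    """Create and fill one SQL table from the first EML dataTable; return its name."""
    table = ET.fromstring(eml_text).find("dataset/dataTable")
    name = table.findtext("entityName")

    # Read the physical format description: header lines to skip, delimiter.
    fmt = table.find("physical/dataFormat/textFormat")
    header_lines = int(fmt.findtext("numHeaderLines", default="0"))
    delimiter = fmt.findtext("simpleDelimited/fieldDelimiter", default=",")

    # Build the column list from the attribute descriptions.
    columns = []
    for attr in table.findall("attributeList/attribute"):
        scale = attr.find("measurementScale")[0].tag  # e.g. "nominal", "interval"
        columns.append((attr.findtext("attributeName"), SCALE_TO_SQL.get(scale, "TEXT")))

    col_defs = ", ".join(f'"{c}" {t}' for c, t in columns)
    conn.execute(f'CREATE TABLE "{name}" ({col_defs})')

    # Parse the delimited data, skip header lines, and bulk-insert the rows.
    rows = list(csv.reader(io.StringIO(data_text), delimiter=delimiter))[header_lines:]
    placeholders = ", ".join("?" for _ in columns)
    conn.executemany(f'INSERT INTO "{name}" VALUES ({placeholders})', rows)
    return name


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    name = load_table(EML_DOC, DATA_FILE, conn)
    print(conn.execute(f'SELECT site, temp_c FROM "{name}" WHERE temp_c > 25').fetchall())
    # [('B', 26.1)]
```

Because the column types come from the metadata rather than from the data file, the numeric comparison in the query works without any hand-written schema code; removing that manual step for each new dataset is the kind of automation the Parser/Loader aims at.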

One major hurdle in deploying PASTA, or any architecture that recognizes provenance, is defining a mechanism for representing data lineage with complete and precise definitions of the scientific processes used to produce scientific datasets. Enter the researchers from Harvard Forest LTER and their partners at the University of Massachusetts. Through a concept called “analytic webs” (first reported as an update to Network News in July, www.lternet.edu/news/Article98.html), analytic and synthetic processes can be described accurately through a concordance of directed graphs describing data flow, dataset derivation, and data processes (Figure 2). The precise, formal definitions of these graphs are a promising development for describing data provenance in a robust and reproducible way that can work in harmony with the LTER Network's investments in EML.

The team, comprising A.M. Ellison, L.J. Osterweil, L. Clarke, J.L. Hadley, A. Wise, E. Boose, D.R. Foster, A. Hanson, D. Jensen, P. Kuzeja, E. Riseman, and H. Schultz, whose work was supported by the National Science Foundation, also developed a prototype software tool called SciWalker that is used to create analytic webs and synthesize the data. The researchers successfully applied analytic webs to the analysis and synthesis of forest carbon dioxide exchange data from eddy flux towers at Harvard Forest’s Prospect Hill.
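The lineage idea behind analytic webs can be illustrated with a small sketch that models only the dataset-derivation part: each derived dataset records the process that produced it and that process's inputs, and a reverse breadth-first traversal recovers every upstream process and raw source for a final product. This is a hypothetical illustration, not SciWalker or the formal definitions in Ellison et al. (2006); the `DerivationGraph` class and the eddy-flux dataset and process names are invented for the example.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class DerivationGraph:
    """Hypothetical dataset derivation graph (a simplification, not SciWalker).

    Maps each derived dataset to the process that produced it and that
    process's input datasets; datasets never recorded as outputs are sources.
    """
    produced_by: dict = field(default_factory=dict)  # output -> (process, inputs)

    def record(self, process, inputs, output):
        """Record that `process` consumed `inputs` and produced `output`."""
        self.produced_by[output] = (process, tuple(inputs))

    def lineage(self, dataset):
        """Walk derivation edges backwards (breadth-first) from `dataset`.

        Returns (processes, sources): the upstream processes in visit order and
        the raw source datasets the product ultimately depends on.
        """
        processes, sources = [], []
        seen, queue = {dataset}, deque([dataset])
        while queue:
            current = queue.popleft()
            if current not in self.produced_by:
                if current != dataset:  # the queried dataset is not its own source
                    sources.append(current)
                continue
            process, inputs = self.produced_by[current]
            processes.append(process)
            for item in inputs:
                if item not in seen:  # shared inputs are visited only once
                    seen.add(item)
                    queue.append(item)
        return processes, sources


if __name__ == "__main__":
    # Hypothetical eddy-flux workflow; all names are illustrative only.
    g = DerivationGraph()
    g.record("quality_check", ["raw_flux"], "qc_flux")
    g.record("gap_fill", ["qc_flux", "met_data"], "filled_flux")
    g.record("annual_sum", ["filled_flux"], "annual_nee")
    print(g.lineage("annual_nee"))
    # (['annual_sum', 'gap_fill', 'quality_check'], ['met_data', 'raw_flux'])
```

Breadth-first order is an arbitrary choice here; any traversal that follows derivation edges backwards yields the same set of upstream processes and sources, which is the information a provenance-aware system needs to reproduce a data product.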

These independent developments by researchers in the LTER Network and their partners fit together like the pieces of a puzzle to form a promising picture of the future. Look to this space for continued reporting on advances in Ecological Informatics.

Further reading

Ellison, A.M., L.J. Osterweil, L. Clarke, J.L. Hadley, A. Wise, E. Boose, D.R. Foster, A. Hanson, D. Jensen, P. Kuzeja, E. Riseman, and H. Schultz. 2006. Analytic Webs Support the Synthesis of Ecological Data Sets. Ecology 87(6): 1345-1358.

Servilla, M.S., J.W. Brunt, I. San Gil, and D. Costa. 2006. PASTA: A Network-level Architecture Design for Generating Synthetic Data Products in the LTER Network. Databits, Fall 2006. Long Term Ecological Research Network.

James W. Brunt is the Associate Director for Information Management at the LNO.