Making Data Useful: The History and Future of LTER Metadata

Issue: Network News Fall 2001, Vol. 14 No. 2

Since its inception, the LTER Network has been a leader in creating and managing ecological metadata and is on the threshold of making significant leaps that will greatly increase the utility of LTER data to ecological researchers.

History

The LTER program has long recognized the importance of the long-term survivability of data, and that documenting data through metadata (data about data) is one of the factors that determines that survivability. Today, metadata is increasingly being recognized as the key to making scientifically useful data available to ecological researchers (see LTER Network News: Fall 1999).

LTER sites have always recognized the need to maintain documentation that will allow data files to be interpreted. A substantial portion of the "Research Data Management in the Ecological Sciences" book (Michener 1986), which resulted from a 1984 symposium sponsored by the LTER Network, was devoted to issues of "documentation" - then primarily in hard-copy (paper) forms. During the first decade of the LTER program, the emphasis was on site-specific metadata needed to maintain the long-term utility of data within a given site. However, as cross-site and network-wide science grew in the late 1980s and early 1990s, so did interest in cross-site data and metadata exchange.

Early interest was primarily in metadata for cataloging LTER data sets. A cross-site data catalog was discussed at the 1988 LTER Data Managers' Meeting and put forward as a proposal at the 1989 meeting, and by the 1990 meeting the "LTER Network Core Dataset Catalog" was "in press" (Michener, Miller and Nottrott 1990). The catalog presented a limited subset of all LTER data, focusing only on data deemed by the sites to be "core" data sets. It contained only 251 data sets network-wide, compared to over 3,000 data sets in the current online catalog.

1990 was a banner year for LTER work on metadata. The Data Managers' meeting also featured a discussion of the "InterSite" data file format (Conley and Brunt, 1991) pioneered by Walt Conley and a discussion of "meta-data." The meeting even included a proposal to the LTER Coordinating Committee for a network-wide standard for "a minimum set of standard information for data abstracts." This standard requested a title, keywords, a list of parameters (variables) measured, site location, study purpose and goals, experimental design, methods and proprietary limits. It should be noted that prior to 1990 it was extremely difficult to find out what data was included in the databases of other LTER sites. This was not an aberration, but rather reflected the culture of ecological science at the time.

Interest in improving data accessibility at the 1990 LTER All-Scientists' meeting led to the creation of an ad hoc working group (Judy Myers, John Hobbie, John Magnuson, Bill Michener, Susan Stafford and John Porter) that was charged with developing guidelines for site data management policies. The resulting guidelines called for each site to develop its own data management policy that guaranteed the availability of data while protecting the rights of data contributors (Porter and Callahan 1994). These guidelines had broad influence on site policies and paved the way for the creation of an LTER-wide data policy in the late 1990s.

The 1991 Data Managers' Meeting had an extensive discussion of "Interactive Data Access Systems." At that time, in the pre-WWW days, such systems were confined to "bulletin board" systems using modems within the LTER Network, and only Hubbard Brook LTER had an on-line system in operation. Interestingly, the discussion placed a strong emphasis on security and on releasing data only to selected users. In a separate discussion on data publication (largely revolving around CD-ROMs), one recommendation was to "Encourage professional societies to produce data publications in support of their journals." This recommendation preceded the establishment of the Ecological Society of America data journal "Ecological Archives" by nearly a decade.

The vision section of the report noted: "Importantly, the proposed format is for the exchange of data, not for the development and maintenance of meta-data within a site," thus avoiding a pitfall encountered in earlier efforts, which failed to recognize that sites needed to maintain flexibility for dealing with their site-specific metadata and computational needs. By focusing on finding ways to transform site-specific metadata into general forms, it facilitated exchanges while maintaining needed autonomy and flexibility.

Based on that vision of sharing metadata, the sites committed to developing a "common exchange format" based on the minimum metadata standard defined at the 1990 workshop. Thomas Kirchner, then data manager at the Central Plains Experimental Range (now Shortgrass Steppe) LTER, was charged with developing tools based on his concept of a flexible "attribute-value" syntax. By the fall of 1992, using software tools developed by Kirchner, the information manager at the Virginia Coast LTER was able to automatically create a SAS program by reading metadata stored in the "attribute-value" standard. Despite this demonstration of technical feasibility, the development of a usable metadata exchange medium still needed cross-site consensus on minimum metadata for exchange.
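To give a feel for what generating a SAS program from "attribute-value" metadata might have involved, here is a minimal sketch in Python. It is an illustration only: the attribute names (title, file, variable), the "name, type" layout of each variable entry, and the function names are all assumptions made for this example, not the syntax or the tools that Kirchner actually developed.

```python
# Hypothetical attribute-value metadata. The attribute names and the
# "name, type" layout are assumptions for illustration, not the 1992 syntax.
METADATA = """\
title = Daily air temperature
file = airtemp.dat
variable = station, char
variable = year, num
variable = tmax_c, num
"""


def parse_attribute_values(text):
    """Return (attribute, value) pairs in order, keeping repeated attributes."""
    pairs = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        attribute, _, value = line.partition("=")
        pairs.append((attribute.strip(), value.strip()))
    return pairs


def build_sas_program(pairs):
    """Build a simple SAS DATA step that reads the file the metadata describes."""
    single = {a: v for a, v in pairs if a != "variable"}
    input_items = []
    for attribute, value in pairs:
        if attribute == "variable":
            name, kind = [part.strip() for part in value.split(",")]
            # In a SAS INPUT statement, "$" marks a character variable.
            input_items.append(name + " $" if kind == "char" else name)
    return "\n".join([
        "* " + single["title"] + ";",
        "data work.dataset;",
        "  infile '" + single["file"] + "';",
        "  input " + " ".join(input_items) + ";",
        "run;",
    ])


program = build_sas_program(parse_attribute_values(METADATA))
print(program)
```

The point of the sketch is that once the metadata is machine-readable, the data-reading program can be derived from it rather than written by hand for each data set.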

In 1992 and 1993 a quiet revolution occurred. A simple-to-administer Internet information server called "Gopher" became available. By 1993 eight sites were running Gopher systems and were well-positioned for the "Internet Revolution" when the World Wide Web became a practical option with the release of the first graphical web browser (NCSA Mosaic) in 1993-1994.

A greater revolution occurred in 1994 when the LTER Coordinating Committee first mandated that each LTER site should have "at least one" data set available on-line. This led to a vision for a network-wide information system aimed at providing scientists access to LTER data resources. In addition to developing that vision, the Data Managers' Committee also finalized a content standard for cross-site metadata exchanges. This standard did not specify the form the metadata would take, but rather focused on the types of information that would be required to interpret and use data for ecological research. The LTER adoption of metadata exchange standards coincided with the activities of the ESA "Future Long-term Ecological Data" (FLED) committee, chaired by Kay Gross (KBS LTER), and was incorporated into what is now considered a seminal paper on ecological metadata (Michener et al., 1997).

Future

As we enter the third decade of LTER research, LTER metadata efforts have continued and grown, with the development of partnerships with the National Center for Ecological Analysis and Synthesis (NCEAS), the Federal Geographic Data Committee (FGDC), the San Diego Supercomputer Center and others. Recent efforts to develop advanced systems for searching, accessing and using distributed data require both more extensive metadata content and more structured forms of encoding that information to allow more automated forms of use. The Ecological Metadata Language (EML), developed originally at NCEAS and jointly enhanced this year via collaboration with LTER and the Knowledge Network for Biocomplexity (KNB) project (see LTER Network News: Spring 2001), extends the past efforts aimed at exchanging metadata and carries them a giant leap forward. With the associated tools, it makes the vision developed by LTER information managers in the early 1990s a practical reality.

The LTER Information Managers now turn to the problem of implementing a common standard for metadata across the LTER network. Preliminary results indicate that sites vary from requiring major effort to create and input metadata to requiring assistance in developing software to translate existing metadata. The LTER metadata working group has identified several key components that need to be specified as part of a comprehensive plan for implementing metadata standards across the network. The most significant effort is migrating the structure of existing metadata to a schema based on, or compatible with, the recommended standard, EML 2.0. For some sites, this may also require expanding current content coverage to satisfy required content fields. Finally, technical developments will be required at many sites to enhance existing reporting tools to support the new XML-structured metadata output formats.
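As a rough illustration of the two tasks described above (checking content coverage and producing XML-structured output), a site reporting tool might do something along these lines. This is a sketch under stated assumptions: the site record, the list of required fields, the package identifier and the element layout are hypothetical and have not been validated against the EML 2.0 schema, which should be consulted for the real structure.

```python
import xml.etree.ElementTree as ET

# Hypothetical site metadata record; the field names are assumptions.
site_record = {
    "title": "Daily air temperature",
    "keywords": ["temperature", "meteorology"],
    "abstract": "Daily maximum air temperature at a hypothetical station.",
}


def missing_fields(record, required=("title", "keywords", "abstract")):
    """List required content fields that are absent or empty in a record."""
    return [field for field in required if not record.get(field)]


def to_xml(record, package_id):
    """Translate a site record into a simplified, EML-like XML tree.

    The element names are illustrative and are not checked against
    the EML 2.0 schema.
    """
    root = ET.Element("eml", {"packageId": package_id})
    dataset = ET.SubElement(root, "dataset")
    ET.SubElement(dataset, "title").text = record["title"]
    ET.SubElement(dataset, "abstract").text = record["abstract"]
    keyword_set = ET.SubElement(dataset, "keywordSet")
    for word in record["keywords"]:
        ET.SubElement(keyword_set, "keyword").text = word
    return root


gaps = missing_fields(site_record)
if gaps:
    print("Content still needed:", ", ".join(gaps))
else:
    root = to_xml(site_record, "example.1.1")  # hypothetical identifier
    print(ET.tostring(root, encoding="unicode"))
```

Sites whose existing metadata already covers the required content would mainly need the translation step; sites with gaps would first need to expand their content, which is where the larger effort lies.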

Three metadata working groups have been formed, one for each of the three extant LTER metadata types. Each group is responsible for producing:

  • An applicable plan of attack for that metadata type, with cost estimates for implementation,
  • A body of EML 2.0-compliant metadata from one or more sites with that metadata type, deposited in the LTER metadata catalog,
  • A contribution to general interest and training publications on 'implementing ecological metadata language' for that metadata type.

The LTER Network Office and the KNB project will provide coordination, programming, and database support to the working groups. This effort is made possible by supplemental funding from NSF to the Network Office. This activity will have broad impact beyond the LTER community in that we expect ecological metadata language to become the standard used by the entire ecological community.

Literature Cited

This article made extensive use of the reports from LTER Information Managers' meetings, which are available on the WWW at:
http://intranet.lternet.edu/cgi-bin/list.cgi?./reports/committee_reports...

Conley, W. and J.W. Brunt. 1991. An institute for theoretical ecology? - part V: practical data management for cross-site analysis and synthesis of ecological information. Coenoses 6(3):173-180.

Michener, W.K. (ed.). 1986. Research Data Management in the Ecological Sciences. Belle W. Baruch Library in Marine Science No. 16, University of South Carolina Press, Columbia, SC. p. 426.

Michener, W.K., A.B. Miller and R. Nottrott. 1990. Long-term Ecological Research Network core data set catalog. Belle W. Baruch Institute for Marine Biology and Coastal Research. Columbia, South Carolina. p. 322.

Michener, W.K., J.W. Brunt, J.J. Helly, T.B. Kirchner and S.G. Stafford. 1997. Non-geospatial metadata for the ecological sciences. Ecological Applications 7(1):330-342.

Porter, J.H. and J.T. Callahan. 1994. Circumventing a dilemma: historical approaches to data sharing in ecological research. Pages 193-203 in W.K. Michener, S. Stafford, and J.W. Brunt, editors. Environmental Information Management. Taylor and Francis, Bristol, PA.