USGS Data Harvesting Service for HydroDB

Issue: Network News Fall 2003, Vol. 16 No. 2
Section: Network News

In January 2002, Wade Sheldon (GCE LTER) developed an automated system for harvesting streamflow data from any real-time USGS gauging station and processing it for submission to HydroDB, the LTER All-site hydrological database at Andrews LTER. In collaboration with Suzanne Remillard and Don Henshaw (AND LTER), he generalized the system and offered it as a service to the broader LTER community in June 2003.

In this system, recent provisional data are harvested weekly from one or more stations requested by each participating site. The data are converted to units compatible with HydroDB and undergo several levels of quality control analysis and flagging to identify questionable values. Values flagged as invalid (e.g., negative discharge) are removed from data sets prior to submission to HydroDB. In addition, any updates that USGS makes to provisional data are automatically synchronized with the database each week, and provisional values are overwritten with finalized data as soon as they are released.
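The weekly processing described above can be illustrated with a minimal Python sketch. This is not the harvester's actual code: the record layout, the function names convert_and_flag and synchronize, the "Q" flag code, the target unit (cubic meters per second), the questionable-value ceiling, and the rule that a finalized value is never replaced by a provisional one are all assumptions made for illustration.

```python
# Illustrative sketch only; names, flag codes, and thresholds are hypothetical.

# Cubic feet per second to cubic meters per second (1 ft = 0.3048 m exactly).
CFS_TO_CMS = 0.3048 ** 3


def convert_and_flag(readings, questionable_above_cfs=1.0e6):
    """Convert discharge to metric units, drop invalid values, flag doubtful ones.

    Each reading is a dict with keys: station, date, discharge_cfs, provisional.
    The ceiling used for the "Q" (questionable) flag is an arbitrary placeholder.
    """
    kept = []
    for r in readings:
        if r["discharge_cfs"] < 0:
            # Invalid value (negative discharge): removed before submission.
            continue
        flag = "Q" if r["discharge_cfs"] > questionable_above_cfs else ""
        kept.append({
            "station": r["station"],
            "date": r["date"],
            "discharge_cms": round(r["discharge_cfs"] * CFS_TO_CMS, 4),
            "flag": flag,
            "provisional": r["provisional"],
        })
    return kept


def synchronize(database, harvested):
    """Merge a weekly harvest into a dict keyed by (station, date).

    Newer provisional values and finalized values overwrite stored provisional
    values. As an assumption of this sketch, a stored finalized value is never
    replaced by a provisional one.
    """
    for rec in harvested:
        key = (rec["station"], rec["date"])
        existing = database.get(key)
        if existing is not None and not existing["provisional"] and rec["provisional"]:
            continue
        database[key] = rec
    return database
```

Calling convert_and_flag on each week's harvest and passing the result to synchronize mimics the weekly cycle: corrected provisional values replace earlier ones, and finalized values replace provisional ones once they are released.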

This harvesting service provides several important benefits to the LTER and broader scientific community. USGS has made great strides in providing timely access to national monitoring data via the WWW, but the vast size of this monitoring network (over 5500 streamflow stations alone) makes finding data relevant to LTER sites a significant task. In addition, the data are not provided in standard metric units, and provisional data are often not subjected to any quality checks prior to web posting. Harvesting, transforming, and quality-checking data from stations near or within LTER sites on a regular basis, and providing access through a single web interface, greatly enhances the usability of these data and facilitates synthesis. The service also demonstrates how metadata-based data processing technology (see http://gcelter.marsci.uga.edu/lter/research/tools/usgs_harvester.htm), data format standards, and web-based communications protocols can ease the application of information technology developed at sites to network-level problems, providing a significant research benefit with almost no added cost.

Scientists at the San Diego Supercomputer Center (SDSC) and LTER information managers have been collaborating since February 2002 to develop a web services implementation of ClimDB (Network News, Fall 2002, p. 3-4).