Software Calibration & Standardization

The CISN partners are working together on the problem of software calibration, particularly as it pertains to automated earthquake processing. Currently, the software implemented in the Northern and Southern California Management Centers is very different. The CISN initially focused on calibration, but the last year has seen an increased emphasis on standardization.

In 2002-2003, effort was focused on the phase pickers (pick_ew), the association algorithm (binder), the location algorithm (hypoinverse), and magnitude estimation (various methods). Since then, magnitude estimation has continued to be a significant area of focus, along with ShakeMap configuration, metadata exchange, and database standardization.

At this point, the issues involved in a statewide detection and location system have largely been addressed. Configuration files have been standardized, and a statewide system has been running in Menlo Park for nearly a year. It performed well during the December 2003 San Simeon sequence and the 2004 Parkfield earthquake.

However, a number of outstanding issues remain.

Magnitude: Calibrating magnitude estimates has proven to be more difficult than the CISN originally anticipated. As described in some detail last year, there appears to be a bias between the northern and southern California magnitude estimates, as illustrated by three lines of evidence. First, a comparison of nearly 500 earthquakes over a 20-year period in central California recorded by both networks shows a bias of 0.14 magnitude units, with NC magnitudes higher than SC magnitudes. Second, efforts to invert Wood-Anderson amplitudes using a differential approach, a constraint that the BKS and PAS adjustments sum to zero, and an attenuation relationship fixed to one determined by Kanamori (1993), indicate a bias of 0.14. Finally, an independent inversion of a different dataset (an absolute approach, a different set of station constraints, and simultaneous inversion for attenuation) suggests a bias of 0.20.
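The differential inversion can be written as a least-squares problem in which each amplitude observation constrains one event magnitude plus one station adjustment, with the BKS/PAS constraint appended as a heavily weighted extra row. The sketch below is an illustrative reconstruction using synthetic data in Python/NumPy; the station list (beyond BKS and PAS), the noise level, and the weights are invented, and the attenuation term is simply reduced out of the data rather than fixed to the actual Kanamori (1993) relationship.

    import numpy as np

    rng = np.random.default_rng(1)
    stations = ["BKS", "PAS", "MHC", "SBC"]  # BKS/PAS from the text; others invented
    n_ev, n_st = 20, len(stations)

    # In practice each datum is log10(A_WA) + (-log10 A0(r)), with the
    # attenuation term fixed to Kanamori (1993); here we simulate the
    # reduced datum directly: y = M_event + dC_station + noise.
    true_M = rng.uniform(3.0, 5.0, n_ev)
    true_dC = np.array([0.05, -0.05, 0.10, -0.02])  # note BKS + PAS = 0

    rows, y = [], []
    for e in range(n_ev):
        for s in range(n_st):
            row = np.zeros(n_ev + n_st)
            row[e], row[n_ev + s] = 1.0, 1.0
            rows.append(row)
            y.append(true_M[e] + true_dC[s] + rng.normal(0.0, 0.05))

    # Heavily weighted constraint row: the BKS and PAS adjustments sum to zero.
    c = np.zeros(n_ev + n_st)
    c[n_ev + stations.index("BKS")] = c[n_ev + stations.index("PAS")] = 1e3
    rows.append(c)
    y.append(0.0)

    m, *_ = np.linalg.lstsq(np.asarray(rows), np.asarray(y), rcond=None)
    print({s: round(float(a), 3) for s, a in zip(stations, m[n_ev:])})

Without the constraint row, the system has a one-parameter trade-off (a constant can be added to every magnitude and subtracted from every station adjustment); the BKS/PAS constraint removes that ambiguity.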

Efforts to understand this issue have been hampered by the lack of a good statewide dataset. In the past year, Bob Uhrhammer has begun cleaning up the dataset, removing outliers and events with a small number of observations, and analyzing these data. His results examine variations with network, component, orientation, and processing system, as well as station adjustments.
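This kind of culling can be sketched in a few lines. In the example below, the input columns and thresholds are assumptions for illustration, not Uhrhammer's actual criteria:

    import pandas as pd

    # Assumed columns: event id, station, and each reading's magnitude
    # residual relative to the network magnitude (synthetic values).
    df = pd.DataFrame({
        "evid":     [1, 1, 1, 1, 1, 2, 2, 3, 3, 3, 3, 3, 3],
        "sta":      list("ABCDEABABCDEF"),
        "residual": [0.02, -0.05, 0.10, -0.01, 1.50,   # 1.50 is an outlier
                     0.03, -0.04,
                     0.00, 0.06, -0.07, 0.02, -0.03, 0.04],
    })

    MIN_OBS = 5    # assumed cutoff for "small number of observations"
    MAD_CUT = 3.0  # assumed outlier threshold, in scaled-MAD units

    # Drop events observed at too few stations.
    df = df.groupby("evid").filter(lambda g: len(g) >= MIN_OBS)

    # Remove outlier readings using a robust median/MAD criterion.
    med = df["residual"].median()
    mad = (df["residual"] - med).abs().median()
    df = df[(df["residual"] - med).abs() <= MAD_CUT * 1.4826 * mad]
    print(df)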

A final component of the magnitude efforts is the designation of a magnitude reporting hierarchy. There is general agreement at the low end and at the high end of the scale, but the working group is still reviewing the transition points from one magnitude type to another.
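Such a hierarchy amounts to an ordered set of selection rules. The sketch below is purely illustrative; the thresholds are invented placeholders, since the working group is still reviewing the actual transition points.

    def preferred_magnitude(mags):
        """Pick a reporting magnitude from the available estimates.

        `mags` maps magnitude type ("Mw", "ML", "Md") to a value. The
        thresholds are placeholders, not the CISN hierarchy.
        """
        if "Mw" in mags and mags["Mw"] >= 5.5:  # assumed high-end rule
            return "Mw", mags["Mw"]
        if "ML" in mags and mags["ML"] >= 3.0:  # assumed mid-range rule
            return "ML", mags["ML"]
        if "Md" in mags:                        # assumed low-end rule
            return "Md", mags["Md"]
        mtype = next(iter(mags))                # fall back to whatever exists
        return mtype, mags[mtype]

    print(preferred_magnitude({"ML": 4.2, "Md": 4.0}))  # -> ('ML', 4.2)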

ShakeMap: In addition to the efforts to standardize earthquake locations and magnitudes, a CISN working group has been addressing issues related to ShakeMap. At present, ShakeMaps are generated on five systems within the CISN: two systems in Pasadena generate "SoCal" ShakeMaps; two systems in the Bay Area generate "NoCal" ShakeMaps; and one system in Sacramento generates ShakeMaps for all of California. The Sacramento system uses QDDS to provide the authoritative event information for northern and southern California.

Over the last 18 months, the Working Group has addressed a number of issues in the standardization of ShakeMap. Having initially focused on the look and feel of the maps (topography, geology, faults, roads, lake outlines, cities, and fonts), the Working Group has just started to review a comprehensive compilation of the configuration differences among the three implementations. The remaining differences between the centers range from the small (the URL used in the "addon" message) to the significant (the use of regressions, and linear versus log amplitude weighting). This effort will move the CISN toward fully standardized ShakeMaps.
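The weighting difference is easy to see in miniature: averaging peak amplitudes directly and averaging their logarithms yield different interpolated values. A sketch with synthetic numbers (not CISN configuration):

    import numpy as np

    pga = np.array([0.02, 0.05, 0.40])  # synthetic peak accelerations (g)
    w = np.array([0.5, 0.3, 0.2])       # synthetic interpolation weights

    linear = np.average(pga, weights=w)               # linear weighting
    log = 10 ** np.average(np.log10(pga), weights=w)  # log weighting
    print(f"linear: {linear:.3f} g, log: {log:.3f} g")  # about 0.105 vs 0.048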

The lack of stations in the near-source region of the 2003 San Simeon earthquake raised a number of issues related to ShakeMap, including how to measure the quality of the product and how to quantify its uncertainty. Over the past six months, a subset of the Working Group has been working on this issue, based on the work of Hok and Wald (2003). One of the first projects has been to make a map illustrating current CISN capabilities, based on the existing station distribution. The next step is to calculate the uncertainty for some example earthquakes and to compare the uncertainty estimates with different realizations of the ShakeMap produced from various subsets of the data. Once the method of quantifying the uncertainty is validated, this information can be used to assign a grade.
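One way to compare uncertainty estimates against different data realizations is a resampling experiment: regenerate the map from random subsets of stations and measure the scatter. The sketch below uses synthetic data and simple inverse-distance interpolation as a stand-in for ShakeMap's actual interpolation scheme; it is an illustration of the idea, not the Working Group's method.

    import numpy as np

    rng = np.random.default_rng(0)

    # Synthetic station coordinates (km) and log10 peak amplitudes.
    xy = rng.uniform(0.0, 100.0, size=(30, 2))
    amp = -0.5 - 0.02 * np.hypot(xy[:, 0] - 50.0, xy[:, 1] - 50.0)
    amp += rng.normal(0.0, 0.1, size=30)

    def interpolate(sta_xy, vals, grid):
        """Inverse-distance-squared interpolation (stand-in for ShakeMap)."""
        d = np.hypot(grid[:, None, 0] - sta_xy[None, :, 0],
                     grid[:, None, 1] - sta_xy[None, :, 1])
        w = 1.0 / np.maximum(d, 1.0) ** 2
        return (w * vals).sum(axis=1) / w.sum(axis=1)

    gx, gy = np.meshgrid(np.linspace(0, 100, 11), np.linspace(0, 100, 11))
    grid = np.column_stack([gx.ravel(), gy.ravel()])

    # Regenerate the map from random 70% subsets of the stations.
    maps = []
    for _ in range(50):
        keep = rng.random(len(xy)) < 0.7
        maps.append(interpolate(xy[keep], amp[keep], grid))

    spread = np.std(maps, axis=0)  # map-to-map scatter at each grid point
    print(f"median spread {np.median(spread):.3f}, max {spread.max():.3f} (log units)")

The scatter among realizations at each grid point can then be compared with the formal uncertainty estimate there.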

Toward the goal of improving access to ShakeMap, the working group has put together an outline of how to create a unified set of Web pages. While there is general agreement about what to do, not much progress has been made on the actual implementation. The primary difficulty has been time, since creating unified Web pages requires a separation between product generation and Web page generation.

A second goal of this effort was to improve the robustness of ShakeMap generation and delivery by taking advantage of the fact that ShakeMaps are generated in the Bay Area, Pasadena, and Sacramento.

Lind Gee and Bruce Worden recently drafted an updated proposal to address both of these problems. The proposal is still being reviewed by various parties (including the working group), but will hopefully begin to move forward during the next fiscal year.

Location Codes: The CISN adopted a standard for the use of "location" codes (part of the Standard for the Exchange of Earthquake Data (SEED) nomenclature, which describes a time series in terms of network, station, channel, and location) in the late fall of 2003. Over the past few months, USGS and UC Berkeley developers have been working to modify the Earthworm software to support the use of location codes. This effort is nearly complete, and the centers are working on a plan to begin migration to the modified software.
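For reference, a SEED time series identifier is the four-part network-station-location-channel tuple. The minimal parser below is illustrative (it is not CISN software, and the example identifier is just that, an example):

    from dataclasses import dataclass

    @dataclass
    class SeedId:
        network: str   # 2 chars, e.g. "BK"
        station: str   # up to 5 chars, e.g. "BKS"
        location: str  # 2 chars; "00", "01", ... distinguish co-located instruments
        channel: str   # 3 chars, e.g. "BHZ"

    def parse_seed_id(text: str) -> SeedId:
        """Parse a dotted N.S.L.C string such as 'BK.BKS.00.BHZ'."""
        net, sta, loc, cha = text.split(".")
        return SeedId(net, sta, loc, cha)

    print(parse_seed_id("BK.BKS.00.BHZ"))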

Metadata Exchange: The CISN is also working on issues related to metadata exchange. This is an important component of CISN activities, as correct metadata are required to ensure valid interpretation of the data. A Standards Working Group has developed and initiated testing of a model for database replication of metadata, and is currently reviewing how much of the schema to exchange and how to address metadata from partners, such as CGS, who do not currently maintain their metadata in a database.

Two years ago, the Metadata Working Group compiled a list of the metadata necessary for data processing and developed a model for exchanging them. In this model, each CISN member is responsible for the metadata of its own stations and of other stations that enter into CISN processing through it. For example, Menlo Park is responsible for the NSMP, Tremor, and PG&E stations, and Caltech is responsible for the Anza data. The Working Group believes that metadata exchange should proceed on a timely basis, not just when data are generated, and is testing an approach using database replication.

For database exchange, the Working Group proposed that each group or organization have a working or interim database as a staging area (a private sandbox) and a master database. The interim database would contain snapshots of the master tables (that the group/organization is responsible for) and the changes would be pushed manually to the master database by snapshot replication. Changes would be propagated among the master databases by multi-master replication.
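A toy illustration of this staging pattern, using SQLite in place of the production database and the SimpleResponse table mentioned below (the column set is hypothetical); propagation of changes among master databases by multi-master replication is not shown:

    import sqlite3

    interim = sqlite3.connect(":memory:")  # staging "sandbox" database
    master = sqlite3.connect(":memory:")   # master database

    for db in (interim, master):
        db.execute("CREATE TABLE SimpleResponse (sta TEXT PRIMARY KEY, gain REAL)")

    # An operator edits the interim snapshot...
    interim.execute("INSERT INTO SimpleResponse VALUES ('BKS', 1.2e9)")

    # ...and then manually pushes the snapshot to the master.
    rows = interim.execute("SELECT sta, gain FROM SimpleResponse").fetchall()
    master.executemany("INSERT OR REPLACE INTO SimpleResponse VALUES (?, ?)", rows)
    master.commit()
    print(master.execute("SELECT * FROM SimpleResponse").fetchall())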

To start, the Working Group agreed to test replication with a limited number of tables, focusing initially on tables relevant to the real-time system (but not sufficient for archiving). In order to test this solution, the NCMC and SCMC needed to resolve some inconsistencies in their database implementations, including both differences in the physical schema and differences in how the schema is used. The Working Group initiated a pilot test in early 2004, using the SimpleResponse and StaMapping tables and a test database at each of the NCMC and SCMC. The initial results were successful, and the effort has expanded to other tables. As of June 2005, 11 tables are being replicated between the test databases. The next step for the Working Group is to validate the use of the metadata in the real-time system; once that is done, the database replication can be migrated to the primary databases.

In parallel, the Working Group has developed a plan for importing metadata from CGS. CGS metadata are not currently stored in a database but are maintained in simple files. CGS policy is to distribute the metadata as part of a waveform package, and the V0 format was developed to allow for that. The Working Group developed the concept of a "dataless" V0 format (analogous to dataless SEED files), which will be used to distribute the metadata. In the current plan, CGS will initially prepare and distribute dataless V0 files providing the current metadata for ShakeMap-quality stations (i.e., those with channels meeting CISN Reference Station or better standards) in the CGS network. These metadata files will be distributed (probably using a mechanism like sendfile/getfile) and will also be placed on the CGS FTP site. As agreed, the comment field in the V0 header will be used to define the valid time period for the metadata. Each dataless V0 file will contain the three channels of the reference sensor at the site. The Working Group's plan includes the ability to handle corrections, as well as updates as stations are serviced.

In order to make use of the dataless V0 files, tools have been developed to parse them and write an XML file containing the information (an expansion of the capabilities of the v02ms program). The NCMC has taken advantage of previously existing tools to create a system in which the XML is converted into a spreadsheet format and then imported into the database. This plan will be tested further as CGS generates more dataless V0 files and the database is populated.
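A minimal sketch of this XML-to-spreadsheet flattening, with invented tag and column names (the actual v02ms output format is not reproduced here):

    import csv
    import xml.etree.ElementTree as ET

    # Hypothetical XML produced by the parsing tools; the tag names are
    # invented for illustration.
    doc = ET.fromstring("""
    <station code="XXX" net="CE">
      <channel seed="HNZ" azimuth="0" dip="-90" gain="2.5e5"/>
      <channel seed="HNN" azimuth="0" dip="0" gain="2.5e5"/>
      <channel seed="HNE" azimuth="90" dip="0" gain="2.5e5"/>
    </station>
    """)

    # Flatten to the spreadsheet form used for database import.
    with open("metadata.csv", "w", newline="") as f:
        w = csv.writer(f)
        w.writerow(["net", "sta", "seed", "azimuth", "dip", "gain"])
        for ch in doc.iter("channel"):
            w.writerow([doc.get("net"), doc.get("code"),
                        ch.get("seed"), ch.get("azimuth"),
                        ch.get("dip"), ch.get("gain")])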

As part of this process, the issue of mapping the sensor orientation into the SEED channel nomenclature has come up. The v02ms program now uses the same algorithm for generating channel names as CGS.
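For illustration, the usual SEED convention derives the orientation character of the channel name from the sensor's azimuth and dip (vertical maps to Z, horizontals near north and east to N and E, and non-standard orientations to 1, 2, or 3). The sketch below follows that general convention; it is a simplified stand-in, not necessarily the algorithm shared by v02ms and CGS:

    def orientation_code(azimuth: float, dip: float, tol: float = 5.0) -> str:
        """Map sensor azimuth/dip (degrees) to a SEED orientation code."""
        if abs(dip + 90.0) <= tol:        # pointing straight down
            return "Z"
        if abs(dip) <= tol:               # horizontal component
            az = azimuth % 360.0
            if min(az, 360.0 - az) <= tol:
                return "N"
            if abs(az - 90.0) <= tol:
                return "E"
            return "1"  # simplified; real schemes pair 1/2 by azimuth
        return "3"      # simplified fallback for oblique orientations

    print(orientation_code(0.0, -90.0), orientation_code(90.0, 0.0))  # -> Z E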

Standardization: Over the past 18 months, the CISN has begun to focus more on the standardization of software, rather than its calibration. As a result, the CISN is beginning to leverage software resources across the CISN partners. For example, the BSL and the USGS Menlo Park are planning to adopt large portions of the software running at the SCMC for the NCMC. The northern and southern California developers met for two days in October 2004 to discuss plans for joint software development. Examples of collaboration include the development of the CISN Messaging Service (software designed to replace the commercial SmartSockets package used in the initial development of TriNet), the implementation of the RequestCardGenerator and Jiggle in northern California, and initial efforts to develop specifications for a magnitude coordinator.
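As background, SmartSockets is subject-based publish/subscribe middleware, so a replacement must route messages among processing modules by subject in the same way. A toy sketch of that pattern (this is not the CISN Messaging Service API):

    from collections import defaultdict
    from typing import Callable

    class Bus:
        """Minimal publish/subscribe hub, illustrating the middleware role."""

        def __init__(self) -> None:
            self.subs: dict[str, list[Callable[[dict], None]]] = defaultdict(list)

        def subscribe(self, subject: str, handler: Callable[[dict], None]) -> None:
            self.subs[subject].append(handler)

        def publish(self, subject: str, msg: dict) -> None:
            for handler in self.subs[subject]:
                handler(msg)

    bus = Bus()
    bus.subscribe("event.origin", lambda m: print("origin received:", m))
    bus.publish("event.origin", {"evid": 12345, "lat": 35.7, "lon": -121.1})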
