
Data Archive and Distribution: Northern California Earthquake Data Center

Douglas Neuhauser, Stephane Zuzlewski, Rick McKenzie, Steve Fulton, Lind Gee


Overview

The Northern California Earthquake Data Center, a joint project of the Berkeley Seismological Laboratory and the U.S. Geological Survey at Menlo Park, serves as an "on-line" archive for various types of digital data relating to earthquakes in central and northern California. The NCEDC is located at the Berkeley Seismological Laboratory, and has been accessible to users via the Internet since mid-1992.

The primary goal of the NCEDC is to provide a stable and permanent archival and distribution center for digital geophysical data from northern and central California, such as seismic waveforms, electromagnetic data, GPS data, and earthquake parametric data. The principal networks contributing data to the data center are the Berkeley Digital Seismic Network (BDSN) operated by the Seismological Laboratory, the Northern California Seismic Network (NCSN) operated by the USGS, and the Bay Area Regional Deformation (BARD) GPS network. The NCSN digital waveforms date from 1984 to the present, the BDSN digital waveforms date from 1987 to the present, and the BARD GPS data date from 1993 to the present.

The NCEDC continues to use the World Wide Web as a principal interface for users to request, search, and receive data from the NCEDC. The NCEDC has implemented a number of useful and original mechanisms of data search and retrieval using the World Wide Web, which are available to anyone on the Internet. All of the documentation about the NCEDC, including the research users' guide, is available via the Web. Users can perform catalog searches and retrieve hypocentral information and phase readings from the various earthquake catalogs at the NCEDC via easy-to-use forms on the Web. In addition, users can peruse the index of available broadband data at the NCEDC, and can request and retrieve broadband data in standard SEED format via the Web. Access to all datasets is available via research accounts at the NCEDC. The NCEDC's home page address is http://quake.geo.berkeley.edu/

Datasets

The initial phase of archiving the historic NCSN earthquake seismograms from 1984 through the present and BDSN data from 1987 through the present is essentially complete. All historic NCSN data tapes have been read, and data for all local and regional events have been loaded onto the NCEDC. A list of teleseismic events recorded by the NCSN is now available on the NCEDC, and the NCSN is determining which of those events have sufficient data to archive at the NCEDC. The 16-bit BDSN data from 1987-1991 have been converted to MiniSEED and are now online.

The total size of the datasets archived at the NCEDC is shown in Table 7.1.


 
Table 7.1: Volume of Data Archived at the NCEDC by Data Type

Data Type                                                              MBytes
Broadband, electric field, and magnetic field waveforms (compressed)  484,411
NCSN event seismograms (compressed)                                   197,717
GPS data (compressed RINEX and raw data)                              122,863
Unocal Geysers region seismograms                                      25,090
Parkfield HRSN seismograms                                              2,728
Miscellaneous data                                                      8,920
Total size of archived data                                           841,729
 

BDSN/HFN Seismic Data

The archival of current BDSN and HFN seismic data is an ongoing task. BDSN and HFN data are telemetered in real time from 29 seismic dataloggers to the BSL, where they are written to disk files. Each day, an extraction process creates a daily archive by retrieving all continuous and event-triggered data for the previous day. The daily archive is run through quality control procedures to correct any timing errors, the triggered data are reselected based on the REDI, NCSN, and UCB earthquake catalogs, and the resulting daily collection of data is archived at the NCEDC.
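
One plausible reading of the reselection step is sketched below in Python: a triggered segment is kept only if it overlaps a time window around some cataloged event. This is an illustrative sketch, not the NCEDC's actual procedure. The function names, the window lengths (60 s before and 300 s after the origin time), and the representation of segments as (start, end) pairs in seconds are all assumptions.

```python
from typing import List, Tuple

Window = Tuple[float, float]  # (start, end) in seconds; hypothetical representation


def event_windows(origin_times: List[float],
                  pre: float = 60.0, post: float = 300.0) -> List[Window]:
    """Build a time window around each cataloged origin time (assumed lengths)."""
    return [(t - pre, t + post) for t in origin_times]


def reselect(segments: List[Window], origin_times: List[float],
             pre: float = 60.0, post: float = 300.0) -> List[Window]:
    """Keep only the triggered segments that overlap some event window."""
    windows = event_windows(origin_times, pre, post)
    return [
        seg for seg in segments
        # Two intervals overlap when each one starts before the other ends.
        if any(seg[0] <= w_end and seg[1] >= w_start for w_start, w_end in windows)
    ]


if __name__ == "__main__":
    triggered = [(0.0, 50.0), (100.0, 200.0), (5000.0, 5100.0)]
    catalog_origins = [150.0]  # merged origin times from the catalogs (made up)
    print(reselect(triggered, catalog_origins))
```

In this sketch, origin times from several catalogs would simply be concatenated into one list before calling reselect.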

NCSN Seismic Data

NCSN waveform data continue to be collected and processed by the USGS at Menlo Park, and shipped to the data center on Exabyte tapes. The NCSN has developed procedures to send event waveform files from Menlo Park to the NCEDC via the Internet, but this effort was delayed by TCP/IP software problems on the VAX computers at Menlo Park. New software has been acquired that fixes the problem, and we expect to begin Internet transfers shortly. Parametric information, such as event catalogs and phase readings from both the BDSN and NCSN, is automatically updated at the NCEDC on a daily basis.

Several problems prevented the NCEDC from loading many of the NCSN waveforms. First, we discovered that a significant number of waveforms had been erroneously loaded by the NCEDC in VAX (little-endian) byte order instead of the Sun (big-endian) byte order used at the NCEDC. In addition, hardware problems on the new mass storage system and software problems on the old one significantly delayed the migration of data from the old system to the new one. Since no additional space was available on the old mass store, we needed to migrate the old data to the new mass store before fixing the byte ordering problem in the existing data files. We also needed to fix the byte ordering problem before loading new NCEDC data onto the mass store. As a result, we have a significant backlog of current NCSN data to archive at the NCEDC.
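
The byte ordering problem can be illustrated with a short Python sketch. It assumes, purely for illustration, that the samples are 32-bit signed integers; the actual layout of the NCSN waveform files is not described here, and the function name is hypothetical. VAX floating-point values use a different format and would not be repaired by a plain byte swap.

```python
import struct


def swap_int32_samples(raw: bytes) -> bytes:
    """Convert little-endian (VAX-order) 32-bit integers to big-endian (Sun-order)."""
    if len(raw) % 4 != 0:
        raise ValueError("buffer length must be a multiple of 4 bytes")
    count = len(raw) // 4
    samples = struct.unpack(f"<{count}i", raw)   # decode as little-endian
    return struct.pack(f">{count}i", *samples)   # re-encode as big-endian


if __name__ == "__main__":
    little = struct.pack("<3i", 1, -2, 100000)   # samples as written on a VAX
    big = swap_int32_samples(little)
    print(struct.unpack(">3i", big))             # prints (1, -2, 100000)
```

Reading a little-endian file as if it were big-endian produces wildly wrong sample values rather than an error, which is why such a problem can go unnoticed at load time.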

Electro-Magnetic Data

The NCEDC continues to archive and process electric and magnetic field data acquired from three dataloggers at three sites (SAO, PKD, and PKD1). At PKD and SAO, 3 components of magnetic field and 2 or 4 components of electric field are digitized and telemetered in real time along with seismic data to the Seismological Laboratory, where they are processed and archived at the NCEDC in a similar fashion to the seismic data. The system generates continuous data channels at 40 Hz, 1 Hz, and 0.1 Hz for each component of data. All of these data are archived and remain available online at the NCEDC. Using programs written by Dr. Martin Füllekrug at the Stanford University STAR Laboratory, the NCEDC is computing and archiving magnetic activity and Schumann resonance analyses using the 40 Hz data from this dataset.

The datalogger at PKD1 acquires 8 channels of low-frequency, long-baseline electric field data for an ongoing project by Dr. Steve Park of UC Riverside. These data are acquired and archived in the same manner as the other electric field data at the NCEDC.

GPS Data

The NCEDC continues to expand its archive of GPS data through the BARD (Bay Area Regional Deformation) network of continuously monitored GPS receivers in northern California. The NCEDC GPS archive now includes 40 sites in northern California. There are 25 core BARD sites owned and operated by UC Berkeley, LLNL, USGS, UC Davis, Trimble Navigation, and Stanford. Data from the other northern California sites are collected from sites operated by JPL, the US Coast Guard, and Scripps Institution of Oceanography.

Most of the Berkeley BARD sites are co-located with seismic stations, and data from these sites are acquired in real time over a shared frame relay telemetry link. The remaining Berkeley BARD stations use dedicated frame relay and/or spread spectrum radio to provide data in real time to UC Berkeley. These data are automatically processed and archived at the NCEDC on a daily basis. Data from the USGS sites are downloaded by the USGS, transferred to the NCEDC on a daily basis, and automatically archived by the NCEDC. Data from the other sites are automatically acquired from their respective operators on an hourly or daily basis and archived by the NCEDC.

The NCEDC is participating in the UNAVCO-sponsored GPS Seamless Archive Centers (GSAC) initiative, which is developing common protocols and interfaces for the exchange and distribution of continuous and survey-mode GPS data.

Unocal Seismic Data

The Unocal Corporation operated a micro-seismic monitoring network in the Geysers region of northern California. In prior years, Unocal had released six years of triggered event waveform data from 1987-1994 for archival and distribution at the NCEDC. Through an updated agreement with the NCEDC this year, Unocal released triggered event waveform data and a preliminary hypocenter catalog for an additional four years of data from 1995-1998. The total dataset represents over 150,000 events that were recorded by the Unocal Geysers network, and is available via research accounts at the NCEDC. Although Unocal did not release a preliminary hypocenter catalog for the first six years of data or phase readings for any of the events, several scientists have already deemed this dataset a useful addition to the NCEDC. Due to problems encountered with the tape media used by Unocal to archive their data, the NCEDC is still loading portions of the 1995-1998 data.

Parkfield High Resolution Seismic Network Data

Event seismograms from the Parkfield High Resolution Seismic Network from 1987 through June 1998 have been loaded onto the NCEDC in their raw SEGY format. A number of events have faulty timing due to the lack or failure of a precision timesource for the network. Due to funding limitations, there is currently no ongoing work to correct the timing problems in the older events or to create MiniSEED volumes for these events. However, a preliminary catalog for a significant number of these events has been constructed, and the catalog is available via the web at the NCEDC. The raw SEGY data files are available via research accounts at the NCEDC.

CNSS Worldwide Earthquake Catalog

The NCEDC, in conjunction with the Council of the National Seismic System (CNSS), is producing and distributing a world-wide composite catalog of earthquakes based on the catalogs of the national and various US regional networks. Each network updates its earthquake catalog on a daily basis at the NCEDC, and the NCEDC constructs a composite world-wide earthquake catalog by combining the data, removing duplicate entries that may occur when multiple networks record the same event, and giving priority to the data from each network's authoritative region. The catalog, which includes data from 14 regional and national networks, is available for use at the NCEDC and to anyone over the Internet.
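
The duplicate-removal rule can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the CNSS procedure itself: the network codes, the rectangular authoritative regions, and the matching tolerances (16 s in origin time, 1 degree in latitude and longitude) are all hypothetical, as are the function and class names.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Event:
    network: str   # contributing network code (hypothetical)
    time: float    # origin time in seconds
    lat: float
    lon: float
    mag: float


# Hypothetical authoritative regions: (min_lat, max_lat, min_lon, max_lon).
REGIONS = {
    "NC": (36.0, 42.0, -125.0, -118.0),
    "CI": (32.0, 36.0, -121.0, -114.0),
}


def is_authoritative(ev: Event) -> bool:
    """True if the event lies inside its own network's authoritative region."""
    box = REGIONS.get(ev.network)
    if box is None:
        return False
    min_lat, max_lat, min_lon, max_lon = box
    return min_lat <= ev.lat <= max_lat and min_lon <= ev.lon <= max_lon


def same_event(a: Event, b: Event, max_dt: float = 16.0, max_deg: float = 1.0) -> bool:
    """Crude duplicate test using assumed time and location tolerances."""
    return (abs(a.time - b.time) <= max_dt
            and abs(a.lat - b.lat) <= max_deg
            and abs(a.lon - b.lon) <= max_deg)


def build_composite(events: List[Event]) -> List[Event]:
    """Merge network catalogs, keeping one entry per earthquake."""
    composite: List[Event] = []
    for ev in sorted(events, key=lambda e: e.time):
        for i, kept in enumerate(composite):
            if same_event(kept, ev):
                # Duplicate: prefer the solution from the authoritative network.
                if is_authoritative(ev) and not is_authoritative(kept):
                    composite[i] = ev
                break
        else:
            composite.append(ev)  # no match found, so this is a new earthquake
    return composite
```

For example, if networks "CI" and "NC" both report an earthquake near 37.8 N, 122.2 W within a few seconds of each other, only the "NC" solution survives, because the epicenter falls inside the assumed "NC" region.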

New Facilities and Hardware

The NCEDC moved into new facilities at the newly renovated Berkeley Seismological Laboratory in McCone Hall in January 1999. This move reunited all of the NCEDC mass storage systems and computers into a single location and onto the same high-speed 100 MBit switched network.

Last year, partial funding for a new mass storage system was made available by the USGS in mid-year, but the purchase was deferred in an attempt to acquire higher density drives, which ultimately were not available. The new mass storage system was purchased in late spring 1998 and comprises a Sun Ultra 450 computer, a 1.3 TByte DISC 517-slot jukebox with two 2.6 GByte Magneto Optical (MO) drives, an 11-slot AIT tape jukebox which holds 25 GBytes per tape, and the SAM-FS hierarchical filesystem management software. Only two MO drives and minimal media were purchased at that time.

This year, the mass storage system was upgraded from its initial 1.3 TByte capacity to 2.5 TByte capacity by replacing its 2.6 GByte MO drives with four 5.2 GByte MO drives and 5.2 GByte MO media. The mass storage system can be upgraded to a total of 1000 slots (5.2 TByte capacity) with the addition of a second media picker, drives, and media cells.

The older NCEDC data stored on the original two 300 GByte Sony WORM jukeboxes are currently being migrated to the new mass storage system in order to reduce maintenance and operating costs and increase the speed of access to the data. The data migration should be completed by the end of 1999.

The new hardware and software system can be configured to automatically create multiple copies of each data file. The NCEDC is using this feature to create an online copy of each data file on MO media, and another copy on AIT tape which will be stored offline.

Joint Northern California Earthquake Catalog

Currently both the USGS and BSL construct and maintain earthquake catalogs for northern and central California. The "official" UC Berkeley earthquake catalog begins in 1910, and the USGS "official" catalog begins in 1966. Both of these catalogs are archived and available through the NCEDC, but the existence of two catalogs has caused confusion among both researchers and the public. The BSL and the USGS have spent considerable effort over the past year to define procedures for merging the data from the two catalogs into a single northern and central California earthquake catalog in order to present a unified view of northern California seismicity. The differences in time period, variations in data availability, and mismatches in regions of coverage all complicate the task.

From 1910 through 1967, the BSL catalog is the primary source of northern California earthquake information. Only limited phase data are available for this time period, although location and magnitude information is provided. The NCSN began to come online in 1966, and observations from this network are available beginning in July of that year.

Starting with data from 1996, the BSL and USGS are working to generate a "joint" catalog by merging phase data and relocating the earthquakes. One of the initial complications in this project is matching up events between the two catalogs. Due to the sparse nature of the BSL instrumentation over the years, the BSL catalog is only complete at the magnitude 3 level while the USGS catalog is generally complete at the magnitude 2 level. However, the BSL catalog includes regional events from southern California, Nevada, Utah, Oregon, and Washington. Thus neither catalog is a subset of the other. Other complications include foreshock/aftershock sequences, where one organization might read a foreshock and the mainshock and the other might read the mainshock and an immediate aftershock. Since limited phase data are available for the BDSN until 1976, most events during this period will combine the USGS location with the BSL magnitude. Where BSL phase data are available, an event that appears in both catalogs will have its phase and amplitude data merged, the event will be relocated with the combined phase data, and magnitudes will be recomputed using the available amplitude readings and new location.
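
The merge rule for an event that appears in both catalogs can be sketched in Python. The sketch assumes the two entries have already been matched, and it only records the decision; relocation and magnitude recomputation are left as a flag for a later step. All class, field, and function names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class CatalogEntry:
    origin_time: float
    lat: float
    lon: float
    magnitude: Optional[float]
    phases: List[str] = field(default_factory=list)  # phase readings, simplified to labels


@dataclass
class JointEntry:
    origin_time: float
    lat: float
    lon: float
    magnitude: Optional[float]
    phases: List[str]
    needs_relocation: bool


def merge_matched(usgs: CatalogEntry, bsl: CatalogEntry) -> JointEntry:
    """Combine one matched USGS/BSL pair into a joint catalog entry."""
    if bsl.phases:
        # BSL phase data exist: pool the readings and flag the event so that
        # it is relocated and its magnitude recomputed in a later step.
        # The USGS location is carried only as a starting point.
        return JointEntry(usgs.origin_time, usgs.lat, usgs.lon, None,
                          usgs.phases + bsl.phases, True)
    # No BSL phase data: keep the USGS location with the BSL magnitude.
    return JointEntry(usgs.origin_time, usgs.lat, usgs.lon, bsl.magnitude,
                      list(usgs.phases), False)
```

Events that appear in only one catalog, such as small USGS events below the BSL completeness level or BSL regional events outside the USGS coverage, would pass through unchanged.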

The process of consolidating the BSL data to be merged into the joint catalog has uncovered many details in the original catalog that were ambiguous, poorly documented, or inconsistent over time, such as the use of channel names for phase and amplitude readings. Significant time was spent resolving these issues before the joint catalog was constructed.

The USGS and BSL produced an initial joint catalog from their two catalogs in the spring of 1999. The BSL spent considerable time analyzing the resulting catalog and has identified problems with specific earthquake associations, along with other related problems. We anticipate creating a final merged catalog within the next year.

Database Development

Most of the parametric data archived at the NCEDC, such as earthquake catalogs, phase and amplitude readings, waveform inventory, and instrument responses, have been stored in flat text files. Flat files are easily stored and viewed, but are not efficiently searched. Over the last year, the NCEDC, in collaboration with the USGS/SCEC Data Center and TriNet, has continued development of database schemas to store the parametric data from the joint earthquake catalog, station history, complete instrument response for all data channels, and waveform inventory.

The parametric schema supports tables and associations for the joint earthquake catalog. It allows for multiple hypocenters per event, multiple magnitudes per hypocenter, and association of phases and amplitudes with multiple versions of hypocenters and magnitudes respectively. The instrument response schema represents full multi-stage instrument responses (including filter coefficients) for the broadband dataloggers. The hardware tracking schema will represent the interconnection of instruments, amplifiers, filters, and dataloggers over time. The parametric schema will be used to store the joint northern California earthquake catalog and the CNSS composite catalog.
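
The relationships in the parametric schema can be sketched with a small in-memory SQLite database in Python. This is not the NCEDC schema: every table and column name below is hypothetical, the sample values are made up, and amplitudes (which would be linked to magnitudes in the same way phases are linked to hypocenters) are omitted for brevity.

```python
import sqlite3

# Hypothetical tables: one event has many hypocenters, one hypocenter has
# many magnitudes, and a phase may be associated with several hypocenters.
SCHEMA = """
CREATE TABLE event      (evid INTEGER PRIMARY KEY);
CREATE TABLE hypocenter (orid INTEGER PRIMARY KEY,
                         evid INTEGER REFERENCES event(evid),
                         lat REAL, lon REAL, depth_km REAL, origin_time REAL);
CREATE TABLE magnitude  (magid INTEGER PRIMARY KEY,
                         orid INTEGER REFERENCES hypocenter(orid),
                         value REAL, mag_type TEXT);
CREATE TABLE phase      (phid INTEGER PRIMARY KEY,
                         station TEXT, phase_type TEXT, arrival_time REAL);
CREATE TABLE hypo_phase (orid INTEGER REFERENCES hypocenter(orid),
                         phid INTEGER REFERENCES phase(phid),
                         PRIMARY KEY (orid, phid));
"""


def build_demo() -> sqlite3.Connection:
    """Create the sketch schema and load one event with made-up values."""
    db = sqlite3.connect(":memory:")
    db.executescript(SCHEMA)
    db.execute("INSERT INTO event VALUES (1)")
    # Two versions of the hypocenter for the same event.
    db.executemany("INSERT INTO hypocenter VALUES (?, ?, ?, ?, ?, ?)",
                   [(10, 1, 37.80, -122.20, 8.0, 1000.0),
                    (11, 1, 37.81, -122.21, 7.5, 1000.2)])
    # Two magnitudes attached to the second hypocenter.
    db.executemany("INSERT INTO magnitude VALUES (?, ?, ?, ?)",
                   [(100, 11, 4.1, "ML"), (101, 11, 4.0, "Md")])
    # One phase reading associated with both hypocenter versions.
    db.execute("INSERT INTO phase VALUES (500, 'BKS', 'P', 1003.4)")
    db.executemany("INSERT INTO hypo_phase VALUES (?, ?)", [(10, 500), (11, 500)])
    return db


def count(db: sqlite3.Connection, sql: str, *args) -> int:
    """Run a COUNT(*) query and return the single integer result."""
    return db.execute(sql, args).fetchone()[0]


if __name__ == "__main__":
    db = build_demo()
    print(count(db, "SELECT COUNT(*) FROM hypocenter WHERE evid = ?", 1))
    print(count(db, "SELECT COUNT(*) FROM hypo_phase WHERE phid = ?", 500))
```

The association table hypo_phase is what lets a single phase reading belong to more than one version of a hypocenter, something a flat file with one line per event cannot express without duplicating the reading.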

The entire description for the BDSN network and data archive has been entered into the hardware tracking, SEED instrument response, and waveform tables. Programs have been developed to perform queries of waveform inventory and instrument responses, and the NCEDC can now generate full SEED volumes from the BDSN network based on information from the database and the waveforms on the mass storage system. The second stage of development will include the NCSN waveform inventory and later the NCSN instrument response data as they are made available.

Additional details on the joint catalog effort and database schema development may be found at http://quake.geo.berkeley.edu/db

NETDC

In a collaborative project with the IRIS DMC and other worldwide datacenters, the NCEDC has helped develop and implement NETDC, a protocol which will provide a seamless user interface to multiple datacenters for geophysical network and station inventory, instrument responses, and data retrieval requests. The NETDC system is currently operational in beta test mode between the NCEDC and the IRIS DMC, and was demonstrated at the IUGG meeting in Europe this summer. The NETDC implementation at the NCEDC makes significant use of the waveform and instrument response data stored in our newly developed database. It is scheduled to be opened to public access at the time of the Fall 1999 AGU meeting.



The Berkeley Seismological Laboratory, 202 McCone Hall, UC Berkeley, Berkeley CA 94720
Questions and comments to www@seismo.berkeley.edu
Copyright 1999, The Regents of the University of California.