
   
Data Archive and Distribution: Northern California Earthquake Data Center


Introduction

The Northern California Earthquake Data Center, a joint project of the Berkeley Seismological Laboratory and the U.S. Geological Survey at Menlo Park, serves as an "on-line" archive for various types of digital data relating to earthquakes in central and northern California. The NCEDC is located at the Berkeley Seismological Laboratory, and has been accessible to users via the Internet since mid-1992.

The primary goal of the NCEDC is to provide a stable and permanent archival and distribution center of digital geophysical data for northern and central California, such as seismic waveforms, electromagnetic data, GPS data, and earthquake parametric data. The principal networks contributing data to the data center are the Berkeley Digital Seismic Network (BDSN) operated by the Seismological Laboratory, the Northern California Seismic Network (NCSN) operated by the USGS, and the Bay Area Regional Deformation (BARD) GPS network. The collection of NCSN digital waveforms dates from 1984 to the present, the BDSN digital waveforms date from 1987 to the present, and the BARD GPS data date from 1993 to the present.

The NCEDC continues to use the World Wide Web as a principal interface for users to request, search, and receive data from the NCEDC. The NCEDC has implemented a number of useful and original mechanisms of data search and retrieval using the World Wide Web, which are available to anyone on the Internet. All of the documentation about the NCEDC, including the research users' guide, is available via the Web. Users can perform catalog searches and retrieve hypocentral information and phase readings from the various earthquake catalogs at the NCEDC via easy-to-use forms on the Web. In addition, users can peruse the index of available broadband data at the NCEDC, and can request and retrieve broadband data in standard SEED format via the Web. Access to all datasets is available via research accounts at the NCEDC. The NCEDC's home page address is http://quake.geo.berkeley.edu/

Datasets

The bulk of the data at the NCEDC consist of waveform and GPS data from northern California. The total size of the datasets archived at the NCEDC is shown in Table 10.1.


 
Table 10.1: Volume of Data Archived at the NCEDC by Data Type

Data Type                                                                MBytes
Broadband, electric field, and magnetic field waveforms (compressed)    612,768
NCSN event seismograms (compressed)                                     250,460
GPS data (compressed RINEX and raw data)                                182,218
Unocal Geysers region seismograms                                        31,100
Parkfield HRSN seismograms                                                2,732
Misc data                                                                13,281
Total size of archived data                                           1,092,577
 

BDSN/NHFN Seismic Data

The archival of current BDSN (Chapter 2) and NHFN (Chapter 3) seismic data is an ongoing task. BDSN and NHFN data are telemetered from 30 seismic dataloggers in real-time to the BSL, where they are written to disk files. Each day, an extraction process creates a daily archive by retrieving all continuous and event-triggered data for the previous day. The daily archive is run through quality control procedures to correct any timing errors, the triggered data are reselected based on the REDI, NCSN, and UCB earthquake catalogs, and the resulting daily collection of data is archived at the NCEDC.
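The reselection step can be pictured as keeping only those triggered segments that overlap a time window around a cataloged event. The following is a minimal sketch under that assumption; the function name, the window lengths, and the segment representation are hypothetical and are not taken from the NCEDC software.

```python
from datetime import datetime, timedelta

def reselect_triggered(segments, event_times,
                       pre=timedelta(seconds=30),
                       post=timedelta(seconds=300)):
    """Keep triggered segments that overlap a window around any catalog event.

    segments    -- list of (start, end) datetime pairs for triggered data
    event_times -- list of catalog origin times (datetime)
    pre, post   -- hypothetical window lengths before and after each origin time
    """
    kept = []
    for start, end in segments:
        for t in event_times:
            # Two intervals overlap when each one starts before the other ends.
            if start <= t + post and end >= t - pre:
                kept.append((start, end))
                break
    return kept

# Example: one segment near a cataloged event, one unrelated trigger.
events = [datetime(2000, 6, 1, 12, 0, 0)]
segments = [
    (datetime(2000, 6, 1, 11, 59, 50), datetime(2000, 6, 1, 12, 2, 0)),
    (datetime(2000, 6, 1, 15, 0, 0), datetime(2000, 6, 1, 15, 1, 0)),
]
selected = reselect_triggered(segments, events)
```

In this sketch the event times would come from the union of the REDI, NCSN, and UCB catalogs for the day being archived.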

All of the data acquired from the BDSN and NHFN Quanterra dataloggers are archived at the NCEDC. Most of the 16-bit BDSN digital broadband data from 1987-1991 have been converted to MiniSEED and are now online. However, there are still data from the 16-bit BDSN data acquisition systems from MHC, SAO, and PKD1 from late 1991 through mid-1992 that need to be converted to MiniSEED. Likewise, data acquired by portable 24-bit RefTek recorders before the installation of Quanterra dataloggers at NHFN sites have not yet been archived at the NCEDC. Additional temporary disk space recently acquired for the NCEDC should allow these data to be converted to MiniSEED and archived during the next year.


  
Figure 10.1: Chart showing the availability of BDSN and NHFN data at the NCEDC for the 1 Hz and 20 Hz channels from 01/01/1996 - 06/30/2000. The "uptime" of the BDSN and NHFN is better than 95% at nearly all stations. Exceptions are BDM (damaged by a lightning strike in May 2000), FARB (power problems when the USFWS generator failed in 1998), MOD (the newest BDSN station which suffered some delays during installation), WENL (flooded during the winter of 1997), and YBIB (damaged during a lightning strike in 1997). In general, a difference between the 1 and 20 Hz data is indicative of one or more significant telemetry problems. Following a major telemetry outage, BSL staff will recover 1 Hz continuous data but only event data for the 20 Hz channels.

NCSN/SHFN Seismic Data

NCSN and SHFN waveform data are now sent automatically to the NCEDC via the Internet. Previously, NCSN waveform data collected and processed by the USGS at Menlo Park were shipped to the datacenter on 2 GB or 5 GB 8-mm Exabyte helical scan tapes. The NCSN tapes were read by the NCEDC, checked against the parametric data for the events, and archived on the mass storage system. However, there were often significant delays between the time an event was recorded and its waveform was available at the NCEDC.

The USGS Menlo Park and the NCEDC developed and tested reliable transfer procedures during the past year. Waveforms are now automatically transferred from Menlo Park to the NCEDC as part of the routine analysis procedure by the USGS, and are automatically verified and archived by the NCEDC. Parametric information, such as event catalogs and phase readings from both the BDSN and NCSN, is automatically updated at the NCEDC on an hourly basis.

In 1998, a programming bug at the NCEDC resulted in a number of waveforms being archived in VAX (little-endian) byte order instead of the Sun (big-endian) byte order used at the NCEDC. By the time the bug was discovered and fixed, the old mass storage system did not have sufficient space to store the corrected events. Once all of the older NCSN data were migrated to the new mass store, the little-endian events were rearchived in big-endian format. The entire backlog of NCSN data has now been archived.
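To illustrate what rearchiving in the other byte order involves, the sketch below reinterprets a buffer of little-endian samples and rewrites it big-endian. It assumes 32-bit signed integer samples purely for illustration; the actual NCSN waveform layout is not described here, and the function name is hypothetical.

```python
import struct

def swap_int32_byteorder(raw: bytes) -> bytes:
    """Convert a buffer of little-endian int32 samples to big-endian.

    Assumes the buffer holds only 32-bit signed integers (an assumption
    made for this sketch, not a statement about the archived format).
    """
    if len(raw) % 4 != 0:
        raise ValueError("buffer length is not a multiple of 4 bytes")
    n = len(raw) // 4
    samples = struct.unpack(f"<{n}i", raw)   # read as little-endian (VAX order)
    return struct.pack(f">{n}i", *samples)   # write as big-endian (Sun order)

# Example: three samples written little-endian, then converted.
little = struct.pack("<3i", 1, -2, 70000)
big = swap_int32_byteorder(little)
```

The sample values are unchanged by the conversion; only the order of the bytes within each sample differs.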

The NCEDC has created a list of teleseismic events recorded by the NCSN, and the USGS is determining which of those events have sufficient data to archive at the NCEDC. Some of the teleseismic event waveforms are archived and available at the NCEDC, but the remaining events require decimation from 100 Hz to 20 Hz before they are archived.
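Decimating from 100 Hz to 20 Hz means reducing the sample rate by a factor of 5 after low-pass filtering to limit aliasing. The sketch below uses a plain block average (a boxcar filter followed by downsampling) as a stand-in; this is a deliberately crude choice for illustration and is not the filter used by the USGS or the NCEDC, which the text does not specify.

```python
def decimate_block_mean(samples, factor=5):
    """Reduce the sample rate by `factor` using non-overlapping block means.

    Averaging each block acts as a simple boxcar low-pass filter; a
    production system would normally use a properly designed FIR filter.
    Trailing samples that do not fill a complete block are dropped.
    """
    if factor < 1:
        raise ValueError("factor must be a positive integer")
    n_blocks = len(samples) // factor
    return [
        sum(samples[i * factor:(i + 1) * factor]) / factor
        for i in range(n_blocks)
    ]

# Example: ten samples at 100 Hz become two samples at 20 Hz.
decimated = decimate_block_mean(list(range(10)), factor=5)
```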

Electro-Magnetic Data

The NCEDC continues to archive and process electric and magnetic field data acquired from dataloggers at three sites (SAO, PKD1 and PKD). At PKD and SAO, 3 components of magnetic field and 2 or 4 components of electric field are digitized and telemetered in real-time along with seismic data to the Seismological Laboratory, where they are processed and archived at the NCEDC in a similar fashion to the seismic data (Chapter 5). The system generates continuous data channels at 40 Hz, 1 Hz, and 0.1 Hz for each component of data. All of these data are archived and remain available online at the NCEDC. Using programs developed by Dr. Martin Fullerkrug at the Stanford University STAR Laboratory (now at the Institute for Meteorology and Geophysics at the University of Frankfurt), the NCEDC is computing and archiving magnetic activity and Schumann resonance analysis using the 40 Hz data from this dataset. The magnetic activity and Schumann resonance data can be accessed from the Web.

In addition to the electro-magnetic data from PKD and SAO, the NCEDC archives data from a low-frequency, long-baseline electric field project operated by Dr. Steve Park of UC Riverside. This experiment (which is separate from the equipment at PKD1 described in Chapter 5) uses an 8-channel Quanterra datalogger to record the data, which are transmitted to the BSL using the same circuit as the BDSN seismic data. These data are acquired and archived in an identical manner to the other electric field data at the NCEDC.

Parkfield High Resolution Seismic Network Data

Event seismograms from the Parkfield High Resolution Seismic Network (HRSN) from 1987 through June 1998 are available in their raw SEGY format via NCEDC research accounts. A number of events have faulty timing due to the lack or failure of a precision timesource for the network. Due to funding limitations, there is currently no ongoing work to correct the timing problems in the older events or to create MiniSEED volumes for these events. However, a preliminary catalog for a significant number of these events has been constructed, and the catalog is available via the web at the NCEDC.

The original HRSN acquisition system died in late 1998, and an interim system of portable RefTek recorders has acquired data since that time. The portable system is scheduled to be replaced in fall of 2000, at which time event data will be transmitted electronically to the NCEDC. We anticipate archiving the events and earthquake catalog from the interim system when they are provided to the NCEDC during the next year.

GPS Data

The NCEDC continues to expand its archive of GPS data through the BARD (Bay Area Regional Deformation) network of continuously monitored GPS receivers in northern California (Chapter 6). The NCEDC GPS archive now includes 40 sites in northern California. There are 24 core BARD sites owned and operated by UC Berkeley, LLNL, USGS, UC Davis, Trimble Navigation, and Stanford. Data from the other northern California sites are collected from sites operated by JPL, the U.S. Coast Guard, and Scripps Institution of Oceanography.

Most of the Berkeley BARD sites are collocated with seismic stations, and data from these sites are acquired in real-time using a shared frame relay telemetry link. The remaining Berkeley BARD stations use dedicated frame relay and/or spread spectrum radio to provide data in real-time to UC Berkeley, and their data are automatically processed and archived at the NCEDC on a daily basis. Data from the USGS sites are downloaded by the USGS and transferred to the NCEDC on a daily basis, and are automatically archived by the NCEDC. Data from the other sites are automatically acquired from their respective operators on an hourly or daily basis, and are archived by the NCEDC.

This year the NCEDC started developing and implementing procedures to archive non-continuous survey GPS data. The initial dataset to be archived is the survey GPS data collected by the USGS Menlo Park for northern California and other locations. The NCEDC will be the principal archive for this dataset. Significant quality control efforts were implemented by the NCEDC to ensure that the raw data, scanned site log sheets, and RINEX data are archived for each survey. The majority of the USGS Menlo Park data have been transferred to the NCEDC and are currently undergoing quality control procedures.

Unocal Seismic Data

The Unocal Corporation operated a micro-seismic monitoring network in the Geysers region of northern California. In prior years, Unocal had released six years of triggered event waveform data from 1987-1994 for archival and distribution at the NCEDC. Through an updated agreement with the NCEDC last year, Unocal released triggered event waveform data and a preliminary hypocenter catalog for an additional four years of data from 1995-1998. The total dataset represents over 150,000 events that were recorded by the Unocal Geysers network, and is available via research accounts at the NCEDC. Although we experienced significant problems reading the data tapes provided by Unocal, we completed archiving the dataset this summer.

USGS Low Frequency Data

Over the last 25 years, the USGS at Menlo Park, in collaboration with other principal investigators, has collected an extensive low-frequency geophysical data set that contains over 1300 channels of tilt, tensor strain, dilatational strain, creep, magnetic field, water level, and auxiliary channels such as temperature, pore pressure, rain and snow accumulation, and wind speed. We are actively working with the USGS to assemble the requisite information for the hardware representation of the stations and the instrument responses for this diverse dataset.

During the past year, we developed the necessary programs to archive the raw data in standard MiniSEED, and tested our archive and retrieval system with data and a station description from a representative station. There is also considerable interest in having the NCEDC archive and distribute a "processed" version of the principal data channels from this data set. Over the next year, we will archive the raw data as the instrument response information is assembled for the channels, and will work with the USGS to clearly define the attributes of the "processed" data channels.

The NCEDC has also tentatively agreed to archive the low frequency data from the Pinon Flats Observatory (PFO) in southern California, since the characteristics of that data set and archiving issues are similar to those of the USGS low frequency data set. We anticipate working on the PFO data set once the outstanding issues with the USGS data set are resolved.

CNSS Worldwide Earthquake Catalog

The NCEDC, in conjunction with the Council of the National Seismic System (CNSS), is producing and distributing a world-wide composite catalog of earthquakes based on the catalogs of the national and various U.S. regional networks. Each network updates its earthquake catalog on a daily basis at the NCEDC, and the NCEDC constructs a composite world-wide earthquake catalog by combining the data, removing duplicate entries that may occur from multiple networks recording an event, and giving priority to the data from each network's authoritative region. The catalog, which includes data from 14 regional and national networks, is searchable using a Web interface at the NCEDC. The catalog is also freely available to anyone via ftp over the Internet.
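The combine, de-duplicate, and prioritize steps can be sketched as follows. This is an illustration only: the matching tolerances (`max_dt`, `max_km`), the rectangular region boxes, and the record layout are assumptions made for the example and are not the CNSS rules, which the text does not give.

```python
import math

def distance_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points (haversine formula)."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def is_authoritative(ev, regions):
    """True if the event lies inside its reporting network's (hypothetical) box."""
    box = regions.get(ev["net"])
    if box is None:
        return False
    lat_min, lat_max, lon_min, lon_max = box
    return lat_min <= ev["lat"] <= lat_max and lon_min <= ev["lon"] <= lon_max

def merge_catalogs(events, regions, max_dt=16.0, max_km=100.0):
    """Combine events, dropping duplicates and preferring authoritative entries.

    events  -- dicts with keys net, time (epoch seconds), lat, lon, mag
    regions -- network code -> (lat_min, lat_max, lon_min, lon_max)
    """
    merged = []
    for ev in sorted(events, key=lambda e: e["time"]):
        for i, kept in enumerate(merged):
            same = (abs(ev["time"] - kept["time"]) <= max_dt and
                    distance_km(ev["lat"], ev["lon"], kept["lat"], kept["lon"]) <= max_km)
            if same:
                # Duplicate report: keep the one from the authoritative network.
                if is_authoritative(ev, regions) and not is_authoritative(kept, regions):
                    merged[i] = ev
                break
        else:
            merged.append(ev)
    return merged
```

The quadratic scan over already-merged events keeps the sketch short; a real implementation would restrict the comparison to events within the time tolerance.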

NCEDC Hardware and Software System

The NCEDC moved into new facilities at the newly renovated Berkeley Seismological Laboratory in McCone Hall in January 1999. This move reunited all of the NCEDC mass storage systems and computers into a single location and onto the same high-speed 100 MBit switched network.

In 1998, partial funding for a new mass storage system was made available from the USGS in mid-year, but the purchase of the new mass storage system had been deferred in an attempt to acquire higher density drives, which ultimately were not available. The new mass storage system was purchased in late spring 1998, and comprises a Sun Ultra 450 computer, a 1.3 TByte DISC 517-slot jukebox with two 2.6 GByte magneto-optical (MO) drives, an 11-slot AIT tape jukebox which holds 25 GBytes per tape, and the SAM-FS hierarchical filesystem management software. Only two MO drives and minimal media were purchased at that time.

In 1999, the mass storage system was upgraded from its initial 1.3 TByte capacity to 2.5 TByte capacity by the replacement of its 2.6 GByte MO drives by four 5.2 GByte MO drives and 5.2 GB MO media. The mass storage system can be upgraded to a total of 1000 slots (5.2 TByte) capacity with the addition of a second media picker, drives, and media cells.

The new hardware and software system can be configured to automatically create multiple copies of each data file. The NCEDC is using this feature to create an online copy of each data file on MO media, and another copy on AIT tape which will be stored offline.

The older NCEDC data were stored on the original two 300 GByte Sony WORM jukeboxes. During the past year, the NCEDC finished migrating data from the old jukeboxes to the new mass storage system, and the old jukeboxes were decommissioned in order to reduce ongoing maintenance costs. Shortly after this data migration was finished, we identified quality problems with some of the new magneto-optical media for the new mass storage system. All of our magneto-optical media were replaced at no charge by the vendor, and we migrated all of our data to the new media. No data files were lost due to the media problems.

During the past year, the NCEDC replaced its Sparc 20 computer, which functioned as the research account computer and web server, with a dual processor 360 MHz Sun Ultra 60 system. We also acquired additional mirrored disk space for both the Ultra 60 and the Ultra 450, which is the host for the mass storage system and the NCEDC database.

Joint Northern California Earthquake Catalog

Currently both the USGS and BSL construct and maintain earthquake catalogs for northern and central California. The "official" UC Berkeley earthquake catalog begins in 1910, and the USGS "official" catalog begins in 1966. Both of these catalogs are archived and available through the NCEDC, but the existence of two catalogs has caused confusion among both researchers and the public. The BSL and the USGS have spent considerable effort over the past year to define procedures for merging the data from the two catalogs into a single northern and central California earthquake catalog in order to present a unified view of northern California seismicity. The differences in time period, variations in data availability, and mismatches in regions of coverage all complicate the task.

From 1910 through 1967, the BSL catalog is the primary source of northern California earthquake information. Only limited phase data are available for this time period, although location and magnitude information is provided. The NCSN began to come online in 1966, and observations from this network are available beginning in July of that year.

Starting with data from 1996, the BSL and USGS are working to generate a "joint" catalog by merging phase data and relocating the earthquakes. One of the initial complications in this project is matching up events between the two catalogs. Due to the sparse nature of the BSL instrumentation over the years, the BSL catalog is only complete at the magnitude 3 level while the USGS catalog is generally complete at the magnitude 2 level. However, the BSL catalog includes regional events from southern California, Nevada, Utah, Oregon, and Washington. Thus neither catalog is a subset of the other. Other complications include foreshock/aftershock sequences, where one organization might read a foreshock and the mainshock and the other might read the mainshock and an immediate aftershock. Since limited phase data are available for the BDSN until 1976, most events during this period will combine the USGS location with the BSL magnitude. Where BSL phase data are available, an event that appears in both catalogs will have its phase and amplitude data merged, the event will be relocated with the combined phase data, and magnitudes will be recomputed using the available amplitude readings and new location.
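The merge rule for an event that has already been matched in both catalogs can be sketched as below. The record layout and function name are hypothetical, and the relocation and magnitude recomputation themselves are outside the sketch; it only flags events that need them.

```python
def merge_matched_event(usgs, bsl):
    """Merge one event that appears in both catalogs.

    If BSL phase data exist, pool the phase and amplitude readings and mark
    the event for relocation and magnitude recomputation. Otherwise combine
    the USGS location with the BSL magnitude, as described in the text.
    """
    if bsl.get("phases"):
        return {
            "phases": list(usgs.get("phases", [])) + list(bsl["phases"]),
            "amplitudes": list(usgs.get("amplitudes", [])) + list(bsl.get("amplitudes", [])),
            "location": None,       # to be filled in by relocation
            "magnitude": None,      # to be recomputed from pooled amplitudes
            "needs_relocation": True,
        }
    return {
        "phases": list(usgs.get("phases", [])),
        "amplitudes": list(usgs.get("amplitudes", [])),
        "location": usgs["location"],
        "magnitude": bsl["magnitude"],
        "needs_relocation": False,
    }
```

Matching the events in the first place is the harder problem noted above, particularly within foreshock and aftershock sequences where the two catalogs may not contain the same set of events.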

The process of consolidating the BSL data to be merged into the joint catalog has uncovered many details in the original catalog which were ambiguous or poorly documented or inconsistent over time, such as the use of channel names for phase and amplitude readings. Significant time was spent resolving these issues before the joint catalog was actually constructed.

The USGS and BSL produced an initial joint catalog from the USGS and BSL catalogs in the spring of 1999. The BSL spent considerable time analyzing the resulting catalog, and has identified problems with specific earthquake associations and other related problems. We are currently finalizing the joint catalog for 1984 through 1998. We anticipate having a complete merged catalog within the next year.

Database Development

Most of the parametric data archived at the NCEDC, such as earthquake catalogs, phase and amplitude readings, waveform inventory, and instrument responses, have been stored in flat text files. Flat files are easily stored and viewed, but are not efficiently searched. Over the last year, the NCEDC, in collaboration with the USGS/SCEC Data Center and TriNet, has continued development of database schemas to store the parametric data from the joint earthquake catalog, station history, complete instrument response for all data channels, and waveform inventory.

The parametric schema supports tables and associations for the joint earthquake catalog. It allows for multiple hypocenters per event, multiple magnitudes per hypocenter, and association of phases and amplitudes with multiple versions of hypocenters and magnitudes respectively. The instrument response schema represents full multi-stage instrument responses (including filter coefficients) for the broadband dataloggers. The hardware tracking schema will represent the interconnection of instruments, amplifiers, filters, and dataloggers over time. This schema will be used to store the joint northern California earthquake catalog and the CNSS composite catalog.
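The one-to-many relationships in the parametric schema can be pictured with a few record types. The sketch below models them as Python dataclasses rather than database tables; all class and field names are hypothetical and do not reproduce the actual NCEDC schema.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Amplitude:
    channel: str
    value: float

@dataclass
class Magnitude:
    mag_type: str
    value: float
    # Amplitudes are associated with a particular magnitude version.
    amplitudes: List[Amplitude] = field(default_factory=list)

@dataclass
class Phase:
    channel: str
    phase: str
    arrival_time: float

@dataclass
class Hypocenter:
    source: str
    origin_time: float
    lat: float
    lon: float
    depth_km: float
    # Phases are associated with a particular hypocenter version.
    phases: List[Phase] = field(default_factory=list)
    # A hypocenter may carry several magnitudes.
    magnitudes: List[Magnitude] = field(default_factory=list)

@dataclass
class Event:
    event_id: int
    # An event may carry several hypocenters (for example one per network).
    hypocenters: List[Hypocenter] = field(default_factory=list)

# Example: one event with two hypocenters, the first having two magnitudes.
event = Event(event_id=1)
h1 = Hypocenter("NCSN", 0.0, 37.0, -122.0, 8.0)
h1.magnitudes += [Magnitude("Md", 3.2), Magnitude("ML", 3.4)]
h2 = Hypocenter("BSL", 0.5, 37.01, -121.99, 7.5)
event.hypocenters += [h1, h2]
```

In a relational database each of these lists would become a separate table keyed to its parent record.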

The entire description for the BDSN network and data archive has been entered into the hardware tracking, SEED instrument response, and waveform tables. Programs have been developed to perform queries of waveform inventory and instrument responses, and the NCEDC can now generate full SEED volumes from the BDSN network based on information from the database and the waveforms on the mass storage system. The second stage of development will include the NCSN waveform inventory and later the NCSN instrument response data as they are made available. We distributed all of our programs and procedures to populate the hardware tracking and instrument response tables to the SCECDC in order to help them populate their database.

Additional details on the joint catalog effort and database schema development may be found at http://quake.geo.berkeley.edu/db

Data Distribution

In a collaborative project with the IRIS DMC and other worldwide datacenters, the NCEDC has helped develop and implement NETDC, a protocol which will provide a seamless user interface to multiple datacenters for geophysical network and station inventory, instrument responses, and data retrieval requests. The NETDC builds upon the foundation and concepts of the IRIS BREQ_FAST data request system. The NETDC system was put into production in January 2000, and is currently operational at three datacenters worldwide - the NCEDC, IRIS DMC, and Geoscope. The NETDC system receives user requests via email, automatically routes the appropriate portion of the requests to the appropriate datacenter, optionally aggregates the responses from the various datacenters, and delivers the data (or ftp pointers to the data) to the users via email.
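The routing step can be sketched as splitting a request into per-datacenter portions. The sketch below does not reproduce the NETDC message format; the request-line dictionaries and the mapping from network code to datacenter are assumptions made for illustration.

```python
from collections import defaultdict

def route_requests(lines, home_by_network):
    """Group request lines by the datacenter assumed to hold each network.

    lines           -- dicts with at least a "network" key
    home_by_network -- hypothetical mapping of network code to datacenter name
    Lines for networks with no known datacenter are collected under "unrouted".
    """
    routed = defaultdict(list)
    for line in lines:
        center = home_by_network.get(line["network"], "unrouted")
        routed[center].append(line)
    return dict(routed)

# Example with an assumed mapping; the station names are placeholders.
homes = {"BK": "NCEDC", "IU": "IRIS DMC", "G": "Geoscope"}
request = [
    {"network": "BK", "station": "BKS"},
    {"network": "G", "station": "SSB"},
    {"network": "BK", "station": "CMB"},
]
portions = route_requests(request, homes)
```

Each portion would then be forwarded to its datacenter, with the responses optionally aggregated before delivery to the user.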

The NCEDC hosts a web page that allows users to easily query the NCEDC waveform inventory, and generate and submit NETDC requests to the NCEDC. The NCEDC currently supports both the BREQ_FAST and NETDC request formats. As part of our collaboration with SCECDC, the NCEDC provided its BREQ_FAST interface code to SCECDC, and has worked closely with them to implement BREQ_FAST requests at the SCECDC.

The various earthquake catalogs, phase data, and earthquake mechanisms can be searched using NCEDC web interfaces that allow users to select the catalog and attributes such as geographical region, time, and magnitude. The GPS data are available to all users via anonymous ftp. Research accounts are available to any qualified researcher who needs access to the other datasets that currently are not available via the Web.
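A search by region, time, and magnitude amounts to a filter over catalog records. The sketch below shows one way to express it; the record layout and parameter names are hypothetical and do not describe the NCEDC web forms.

```python
def search_catalog(events, lat_range, lon_range, time_range, min_mag):
    """Return events inside a latitude/longitude box and time span with mag >= min_mag.

    events     -- dicts with keys time (epoch seconds), lat, lon, mag
    lat_range  -- (min_lat, max_lat)
    lon_range  -- (min_lon, max_lon)
    time_range -- (start, end), inclusive
    """
    return [
        e for e in events
        if lat_range[0] <= e["lat"] <= lat_range[1]
        and lon_range[0] <= e["lon"] <= lon_range[1]
        and time_range[0] <= e["time"] <= time_range[1]
        and e["mag"] >= min_mag
    ]
```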

The NCEDC is participating in the UNAVCO-sponsored GPS Seamless Archive Centers (GSAC) initiative, which is developing common protocols and interfaces for the exchange and distribution of continuous and survey-mode GPS data. During this year, the NCEDC developed and implemented procedures to generate the appropriate GSAC inventory records for previously archived GPS data at the NCEDC, and now automatically generates the GSAC inventory records on a routine basis as part of its archiving procedures. The GSAC inventory records are available via anonymous ftp for other GSAC data wholesalers and retailers.

Acknowledgements

The NCEDC is a joint project of the BSL and the USGS Menlo Park and is partially funded by the USGS.

Under Barbara Romanowicz's general supervision, and with Doug Neuhauser as head guru, Stephane Zuzlewski, Rick McKenzie, Steve Fulton, Mark Murray, Ray Baxter, and Lind Gee of the BSL and David Oppenheimer and Rick Lester of the USGS Menlo Park contribute to the operation of the NCEDC. Will Prescott of the USGS Menlo Park has been coordinating the contribution of the low-frequency data set. Doug Neuhauser, Stephane Zuzlewski, and Lind Gee contributed to the preparation of this chapter.



The Berkeley Seismological Laboratory, 202 McCone Hall, UC Berkeley, Berkeley CA 94720
Questions and comments to www@seismo.berkeley.edu
Copyright 2000, The Regents of the University of California.