The Northern California Earthquake Data Center (NCEDC), a joint project of the Berkeley Seismological Laboratory and the U.S. Geological Survey at Menlo Park, serves as an "on-line" archive for various types of digital data relating to earthquakes in central and northern California. The NCEDC is located at the Berkeley Seismological Laboratory, and has been accessible to users via the Internet since mid-1992.
The primary goal of the NCEDC is to provide a stable and permanent archival and distribution center of digital geophysical data for northern and central California, such as seismic waveforms, electromagnetic data, GPS data, and earthquake parametric data. The principal networks contributing data to the data center are the Berkeley Digital Seismic Network (BDSN) operated by the Seismological Laboratory, the Northern California Seismic Network (NCSN) operated by the USGS, and the Bay Area Regional Deformation (BARD) GPS network. The collection of NCSN digital waveforms dates from 1984 to the present, the BDSN digital waveforms date from 1987 to the present, and the BARD GPS data date from 1993 to the present.
The NCEDC continues to use the World Wide Web as a principal interface for users to request, search, and receive data from the NCEDC. The NCEDC has implemented a number of useful and original mechanisms of data search and retrieval using the World Wide Web, which are available to anyone on the Internet. All of the documentation about the NCEDC, including the research users' guide, is available via the Web. Users can perform catalog searches and retrieve hypocentral information and phase readings from the various earthquake catalogs at the NCEDC via easy-to-use forms on the Web. In addition, users can peruse the index of available broadband data at the NCEDC, and can request and retrieve broadband data in standard SEED format via the Web. Access to all datasets is available via research accounts at the NCEDC. The NCEDC's Web address is http://quake.geo.berkeley.edu/
The current NCEDC facilities consist of a Sun Ultra 450 computer; a 517-slot DISC magneto-optical (MO) jukebox with a capacity of 2.5 TBytes, containing four 5.2-GByte MO drives and using 5.2-GByte MO media; a 15-slot AIT tape jukebox that holds 25 GBytes per tape; and the SAM-FS hierarchical storage management (HSM) software. A dual-processor Sun Ultra 60 provides Web services and research account access to the NCEDC.
The hardware and software system can be configured to automatically create multiple copies of each data file. The NCEDC uses this feature to create an online copy of each data file on MO media, and another copy on AIT tape which is stored offline.
These activities and projects are described in detail below.
The bulk of the data at the NCEDC consist of waveform and GPS data from northern California. The total size of the datasets archived at the NCEDC is shown in Table 13.1. Figure 13.1 shows the geographic distribution of data archived by the NCEDC.
The archival of current BDSN (Chapter 4), NHFN (Chapter 5), and Mini-PBO (Chapter 9) seismic data (all stations using the network code BK) is an ongoing task. These data are telemetered in real time from more than 30 seismic data loggers to the BSL, where they are written to disk files. Each day, an extraction process creates a daily archive by retrieving all continuous and event-triggered data for the previous day. The daily archive is run through quality control procedures to correct any timing errors, triggered data are reselected based on the REDI, NCSN, and BSL earthquake catalogs, and the resulting daily collection of data is archived at the NCEDC.
All of the data acquired from the BDSN/NHFN/MPBO Quanterra data loggers are archived at the NCEDC. The NCEDC has made an effort to archive older digital data, and the 16-bit BDSN digital broadband data from 1987 to 1991 have been converted to MiniSEED and are now online. In late June 2002, the NCEDC initiated a project to convert the remaining 16-bit BDSN data (MHC, SAO, and PKD1) from late 1991 through mid-1992 to MiniSEED. An undergraduate student has been hired to read the old tapes and to work on the conversion. Data acquired by portable 24-bit RefTek recorders before the installation of Quanterra data loggers at NHFN sites have not yet been converted to MiniSEED and archived.
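The catalog-based reselection of triggered data can be illustrated with a minimal sketch. It assumes the day's triggered segments and the merged origin times from the REDI, NCSN, and BSL catalogs are already in memory; the `Segment` class, the `reselect_triggered` function, and the 60 s / 600 s window lengths are hypothetical choices for illustration, not the NCEDC's actual parameters.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    channel: str
    start: float  # segment start, epoch seconds
    end: float    # segment end, epoch seconds


def reselect_triggered(segments, origin_times, pre=60.0, post=600.0):
    """Keep triggered segments that overlap [origin - pre, origin + post] for any catalog event.

    The pre/post window lengths are hypothetical defaults, not NCEDC settings.
    """
    windows = [(t - pre, t + post) for t in origin_times]
    kept = []
    for seg in segments:
        # Two intervals overlap when each one starts before the other ends.
        if any(seg.start < w_end and seg.end > w_start for w_start, w_end in windows):
            kept.append(seg)
    return kept
```

A segment is retained as soon as it overlaps one event window, so a single catalog hit is enough to keep the data.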
NCSN and SHFN waveform data are sent to the NCEDC via the Internet. The NCSN event waveform files are automatically transferred from Menlo Park to the NCEDC as part of the routine analysis procedure by the USGS, and are automatically verified and archived by the NCEDC.
A few corrupt NCSN event files were discovered at the NCEDC several years ago, and were eventually traced to suspected flaws in the 12-inch WORM media and/or firmware problems on the Sony WDA-600 series jukeboxes used by the NCEDC. When we transcribed the data from the 12-inch WORM media to the current 5.25-inch magneto-optical media, we verified that all files were transcribed accurately. In 2000-2001, using software developed at the NCEDC to detect possibly corrupt NCSN files, we identified 4704 possibly corrupted NCSN waveform event files. We re-read the original NCSN tapes for all of these events, discovered that only 71 of the files were actually corrupt, and replaced the corrupted event waveform files.
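Confirming which suspect files are actually corrupt amounts to comparing each archived file with the copy re-read from the original tape. The sketch below does this with SHA-256 digests from Python's standard `hashlib`; the function names and the (archived, re-read) pairing are assumptions for illustration and do not describe the NCEDC's detection software.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_corrupt(pairs):
    """Given (archived_path, reread_path) pairs, return archived paths whose contents differ."""
    return [archived for archived, reread in pairs if sha256_of(archived) != sha256_of(reread)]
```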
Because teleseismic events recorded by the NCSN do not appear in the NCSN catalog, the NCEDC maintains a list of these events, which is updated automatically whenever a new NCSN event file is received at the NCEDC.
The NCSN has installed nine continuously telemetered digital broadband stations in northwest California and southwest Oregon in support of the USGS/NOAA Consolidated Reporting of EarthquakeS and Tsunamis (CREST) system. This year, the NCEDC established procedures to create an archive of continuous data from these stations, in addition to the event waveform files. These data initially included channels sampled at 50 and 100 Hz, but all channels are now sampled at 100 Hz. The NCEDC had hoped to generate an archive of 20 Hz data (for consistency with the BDSN data) from these 100 Hz waveforms, but missing data have made this difficult. At this point, the NCEDC is archiving the 100 Hz data without decimation.
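Why gaps complicate decimation can be seen in a minimal sketch of 100 Hz to 20 Hz conversion. It assumes each gap-free segment is given as (start time in integer microseconds, samples), with the start on the 100 Hz sample grid, and it uses a 5-point boxcar as a stand-in anti-alias filter; production decimators use much longer FIR filters, which lose correspondingly more output samples at the edges of every gap. The function names are hypothetical.

```python
def decimate_segment(start_us, samples, in_rate=100, factor=5):
    """Boxcar-filter and decimate one gap-free segment onto an absolute output time grid.

    Returns a list of (time_us, value). Outputs fall on multiples of the output interval,
    so segments on either side of a gap stay on the same 20 Hz grid. The boxcar is only
    a simple stand-in for a proper anti-alias FIR filter.
    """
    dt_in = 1_000_000 // in_rate   # 10 000 us at 100 Hz
    dt_out = dt_in * factor        # 50 000 us at 20 Hz
    half = factor // 2             # a centred 5-point window needs 2 samples on each side
    out = []
    for i in range(half, len(samples) - half):
        t = start_us + i * dt_in
        if t % dt_out == 0:
            window = samples[i - half:i + half + 1]
            out.append((t, sum(window) / factor))
    return out


def decimate_with_gaps(segments, in_rate=100, factor=5):
    """Decimate each contiguous segment separately; samples near every gap edge are lost."""
    out = []
    for start_us, samples in segments:
        out.extend(decimate_segment(start_us, samples, in_rate, factor))
    return out
```

Because the filter window cannot span a gap, every gap costs output samples on both sides, and frequent gaps make the decimated record noticeably less complete than the original.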
Event seismograms from the Parkfield High Resolution Seismic Network (HRSN) from 1987 through June 1998 are available in their raw SEGY format via NCEDC research accounts. A number of events have faulty timing due to the lack or failure of a precision timesource for the network. Due to funding limitations, there is currently no ongoing work to correct the timing problems in the older events or to create MiniSEED volumes for these events. However, a preliminary catalog for a significant number of these events has been constructed, and the catalog is available via the web at the NCEDC.
As described in Chapter 6, the original HRSN acquisition system died in late 1998, and an interim system of portable RefTek recorders was installed at some of the sites. Data from this interim system are not currently available online.
In 2000 and 2001, three new borehole sites were installed, and the network was upgraded to operate with Quanterra Q730 data loggers and digital telemetry. The upgraded acquisition system detects events using the HRSN stations and, through collaboration with the temporarily deployed PASSCAL network (PASO), extracts waveforms from both the HRSN and the PASO stations. The event waveform files are automatically transferred to the NCEDC, where they are made available to the research community via anonymous ftp until they are reviewed and permanently archived.
The HRSN 20 Hz (BP) and state-of-health channels are being archived continuously at the NCEDC. As an interim measure, the NCEDC is also archiving continuous data from the 250 Hz (DP) channels in order to help researchers retrieve events that were not detected during the network upgrade.
The University of Nevada, Reno (UNR) operates several broadband stations in western Nevada and eastern California that are important for northern California earthquake processing and analysis. Starting in August 2000, the NCEDC has been receiving and archiving continuous broadband data from four UNR stations. The data are transmitted in real time from UNR to UC Berkeley, where they are made available for real-time earthquake processing and for archiving.
In a situation similar to that of the broadband waveforms from the NCSN, the NCEDC originally planned to create an archive of 20 Hz data from the 100 Hz data. However, frequent gaps in the data complicate the development of a robust decimation process. At this time, the UNR broadband waveforms are being archived at 100 Hz.
The NCEDC continues to archive and process electric and magnetic field data acquired from data loggers at two sites (SAO and PKD). At PKD and SAO, 3 components of magnetic field and 2 or 4 components of electric field are digitized and telemetered in real time along with seismic data to the Seismological Laboratory, where they are processed and archived at the NCEDC in a similar fashion to the seismic data (Chapter 7). The system generates continuous data channels at 40 Hz, 1 Hz, and 0.1 Hz for each component of data. All of these data are archived and remain available online at the NCEDC. Using programs developed by Dr. Martin Füllekrug at the Stanford University STAR Laboratory (now at the Institute for Meteorology and Geophysics at the University of Frankfurt), the NCEDC computes and archives magnetic activity and Schumann resonance analyses from the 40 Hz data in this dataset. The magnetic activity and Schumann resonance data can be accessed from the Web.
In addition to the electromagnetic data from PKD and SAO, the NCEDC archives data from a low-frequency, long-baseline electric field project operated by Dr. Steve Park of UC Riverside at site PKD2. This experiment (which is separate from the equipment at PKD1 described in Chapter 7) uses an 8-channel Quanterra data logger to record the data, which are transmitted to the BSL using the same circuit as the BDSN seismic data. These data are acquired and archived in an identical manner to the other electric field data at the NCEDC.
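The basic idea behind a Schumann resonance analysis, locating spectral peaks near the first modes (about 7.8 Hz and 14 Hz, both below the 20 Hz Nyquist frequency of the 40 Hz channels), can be sketched with a naive DFT periodogram. This is a generic illustration only, not the Füllekrug programs used at the NCEDC; the function names and frequency band are assumptions.

```python
import cmath
import math


def power_spectrum(samples, rate):
    """Naive DFT periodogram of a demeaned series; returns (frequencies, power) for bins 0..N/2."""
    n = len(samples)
    mean = sum(samples) / n
    x = [s - mean for s in samples]
    freqs, power = [], []
    for k in range(n // 2 + 1):
        acc = sum(x[j] * cmath.exp(-2j * math.pi * k * j / n) for j in range(n))
        freqs.append(k * rate / n)
        power.append(abs(acc) ** 2 / n)
    return freqs, power


def peak_frequency(samples, rate, f_lo, f_hi):
    """Return the frequency of maximum power within [f_lo, f_hi]."""
    freqs, power = power_spectrum(samples, rate)
    band = [(p, f) for f, p in zip(freqs, power) if f_lo <= f <= f_hi]
    return max(band)[1]
```

With 10 s of 40 Hz data (400 samples) the frequency resolution is 0.1 Hz; a real analysis would average many spectra to suppress noise.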
The NCEDC continues to expand its archive of GPS data through the BARD (Bay Area Regional Deformation) network of continuously monitored GPS receivers in northern California (Chapter 8). The NCEDC GPS archive now includes 67 continuous sites in northern California. There are approximately 50 core BARD sites owned and operated by UC Berkeley, USGS (Menlo Park and Cascade Volcano Observatory), LLNL, UC Davis, UC Santa Cruz, Trimble Navigation, and Stanford. Data are also archived from sites operated by other agencies, including the East Bay Municipal Utility District, the City of Modesto, the National Geodetic Survey, and the Jet Propulsion Laboratory.
This year the NCEDC continued to archive non-continuous survey GPS data. The initial dataset to be archived is the survey GPS data collected by the USGS Menlo Park for northern California and other locations. The NCEDC is the principal archive for this dataset. Significant quality control efforts were implemented by the NCEDC to ensure that the raw data, scanned site log sheets, and RINEX data are archived for each survey. All of the USGS MP GPS data have been transferred to the NCEDC, and virtually all of the data from 1992 to the present have been archived and are available for distribution.
The Calpine Corporation currently operates a micro-seismic monitoring network in the Geysers region of northern California. Prior to 1999 this network was operated by Unocal. Through various agreements with both Unocal and Calpine, the companies have released triggered event waveform data from 1989 through 2000, along with preliminary event catalogs for the same time period, for archiving and distribution through the NCEDC. This dataset represents over 296,000 events that were recorded by the Calpine/Unocal Geysers network, and is available via research accounts at the NCEDC.
Over the last 26 years, the USGS at Menlo Park, in collaboration with other principal investigators, has collected an extensive low-frequency geophysical data set that contains over 1300 channels of tilt, tensor strain, dilatational strain, creep, magnetic field, water level, and auxiliary channels such as temperature, pore pressure, rain and snow accumulation, and wind speed. In collaboration with the USGS, we assembled the requisite information for the hardware representation of the stations and the instrument responses for many channels of this diverse dataset, and developed the required programs to populate and update the hardware database and generate the instrument responses. We developed the programs and procedures to automate the process of importing the raw waveform data and converting it to MiniSEED format.
We have currently archived timeseries data from 887 data channels from 167 sites, and have instrument response information for 542 channels at 139 sites. The waveform archive is updated on a daily basis with data from 350 currently operating data channels. We will augment the raw data archive as additional instrument response information is assembled for the channels, and will work with the USGS to clearly define the attributes of the "processed" data channels.
Currently both the USGS and BSL construct and maintain earthquake catalogs for northern and central California. The "official" UC Berkeley earthquake catalog begins in 1910, and the USGS "official" catalog begins in 1966. Both of these catalogs are archived and available through the NCEDC, but the existence of two catalogs has caused confusion among both researchers and the public. The BSL and the USGS have spent considerable effort over the past year to define procedures for merging the data from the two catalogs into a single northern and central California earthquake catalog in order to present a unified view of northern California seismicity. The differences in time period, variations in data availability, and mismatches in regions of coverage all complicate the task.
The NCEDC, in conjunction with the Council of the National Seismic System (CNSS), has produced and distributed a world-wide composite catalog of earthquakes, based on the catalogs of the national and various U.S. regional networks, for several years. Each network updates its earthquake catalog at the NCEDC on a daily basis, and the NCEDC constructs the composite world-wide earthquake catalog by combining the data, removing duplicate entries that occur when multiple networks record the same event, and giving priority to the data from each network's authoritative region. The catalog, which includes data from 14 regional and national networks, is searchable using a Web interface at the NCEDC. The catalog is also freely available to anyone via ftp over the Internet.
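The merge step described above (combine, drop duplicates, prefer the authoritative network) can be sketched as follows. The duplicate thresholds (16 s, 100 km), the rectangular authoritative regions, and all names are hypothetical assumptions for illustration; they are not the rules actually used to build the composite catalog.

```python
import math
from dataclasses import dataclass


@dataclass
class Event:
    network: str
    time: float  # origin time, epoch seconds
    lat: float
    lon: float
    mag: float


# Hypothetical authoritative regions as (lat_min, lat_max, lon_min, lon_max) boxes.
AUTH_REGIONS = {
    "NC": (36.0, 42.0, -125.0, -118.0),
    "CI": (32.0, 36.0, -121.0, -114.0),
}


def km_between(a, b):
    """Equirectangular distance approximation, adequate for nearby hypocenters."""
    x = math.radians(b.lon - a.lon) * math.cos(math.radians((a.lat + b.lat) / 2))
    y = math.radians(b.lat - a.lat)
    return 6371.0 * math.hypot(x, y)


def is_authoritative(ev):
    box = AUTH_REGIONS.get(ev.network)
    return box is not None and box[0] <= ev.lat <= box[1] and box[2] <= ev.lon <= box[3]


def composite(events, dt=16.0, dkm=100.0):
    """Merge events from all networks, keeping one entry per earthquake."""
    merged = []
    for ev in sorted(events, key=lambda e: e.time):
        for i, kept in enumerate(merged):
            if abs(ev.time - kept.time) <= dt and km_between(ev, kept) <= dkm:
                # Duplicate: replace the kept entry only if the new one is authoritative.
                if is_authoritative(ev) and not is_authoritative(kept):
                    merged[i] = ev
                break
        else:
            merged.append(ev)  # no duplicate found
    return merged
```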
With the demise of the CNSS and the development of the ANSS, the NCEDC has been asked to update the Web pages to present the composite catalog as a product of the ANSS. This conversion will be completed in the fall of 2002.
The NCEDC developed CalQC, a GUI-based, state-driven system, to facilitate the quality control processing that is applied to the BK, NC broadband, NN, and BP datasets.
The quality control procedures for these datasets include the following tasks:
CalQC uses previously developed programs to perform each function, but it provides a graphical point-and-click interface to automate these procedures and to give the analyst a record of when each process was started, whether it executed correctly, and whether the analyst has indicated that a step has been completed.
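The per-step record that a state-driven QC tool keeps can be sketched as a small state machine. The states, allowed transitions, and class names below are hypothetical and only mirror the three facts CalQC is described as tracking (started, executed correctly, signed off by the analyst); they are not CalQC's internals.

```python
from datetime import datetime
from enum import Enum


class State(Enum):
    PENDING = "pending"
    RUNNING = "running"
    FAILED = "failed"
    SUCCEEDED = "succeeded"
    COMPLETED = "completed"  # analyst has marked the step as done


# Hypothetical transition rules: a failed or succeeded step may be rerun,
# and only a successful step can be signed off.
ALLOWED = {
    State.PENDING: {State.RUNNING},
    State.RUNNING: {State.SUCCEEDED, State.FAILED},
    State.FAILED: {State.RUNNING},
    State.SUCCEEDED: {State.COMPLETED, State.RUNNING},
    State.COMPLETED: set(),
}


class QCStep:
    def __init__(self, name):
        self.name = name
        self.state = State.PENDING
        self.history = []  # (timestamp, new state) for every transition

    def advance(self, new_state):
        if new_state not in ALLOWED[self.state]:
            raise ValueError(f"{self.name}: cannot go from {self.state.value} to {new_state.value}")
        self.state = new_state
        self.history.append((datetime.now(), new_state))
```

Rejecting illegal transitions is what lets the interface guarantee, for example, that no step is signed off before its program has run successfully.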
Most of the parametric data archived at the NCEDC, such as earthquake catalogs, phase and amplitude readings, waveform inventory, and instrument responses, have been stored in flat text files. Flat files are easily stored and viewed, but are not efficiently searched. Over the last year, the NCEDC, in collaboration with the USGS/SCEC Data Center and TriNet, has continued development of database schemas to store the parametric data from the joint earthquake catalog, station history, complete instrument response for all data channels, and waveform inventory.
The parametric schema supports tables and associations for the joint earthquake catalog. It allows for multiple hypocenters per event, multiple magnitudes per hypocenter, and association of phases and amplitudes with multiple versions of hypocenters and magnitudes respectively. The instrument response schema represents full multi-stage instrument responses (including filter coefficients) for the broadband data loggers. The hardware tracking schema will represent the interconnection of instruments, amplifiers, filters, and data loggers over time. This schema will be used to store the joint northern California earthquake catalog and the CNSS composite catalog.
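The one-to-many relationships described above (several hypocenters per event, several magnitudes per hypocenter, phases and amplitudes associated with particular versions) can be sketched as a toy schema in SQLite via Python's standard `sqlite3` module. The table and column names are hypothetical and greatly simplified; they are not the actual NCEDC/SCEDC schema.

```python
import sqlite3

# Hypothetical, simplified schema illustrating the associations only.
SCHEMA = """
CREATE TABLE event    (evid INTEGER PRIMARY KEY, prefor INTEGER);
CREATE TABLE origin   (orid INTEGER PRIMARY KEY,
                       evid INTEGER NOT NULL REFERENCES event(evid),
                       datetime REAL, lat REAL, lon REAL, depth REAL);
CREATE TABLE netmag   (magid INTEGER PRIMARY KEY,
                       orid INTEGER NOT NULL REFERENCES origin(orid),
                       magtype TEXT, magnitude REAL);
CREATE TABLE arrival  (arid INTEGER PRIMARY KEY, sta TEXT, phase TEXT, time REAL);
CREATE TABLE assocaro (arid INTEGER REFERENCES arrival(arid),
                       orid INTEGER REFERENCES origin(orid),
                       PRIMARY KEY (arid, orid));
CREATE TABLE amp      (ampid INTEGER PRIMARY KEY, sta TEXT, amplitude REAL);
CREATE TABLE assocamm (ampid INTEGER REFERENCES amp(ampid),
                       magid INTEGER REFERENCES netmag(magid),
                       PRIMARY KEY (ampid, magid));
"""


def connect(path=":memory:"):
    """Open a database with foreign keys enforced and the toy schema created."""
    con = sqlite3.connect(path)
    con.execute("PRAGMA foreign_keys = ON")
    con.executescript(SCHEMA)
    return con
```

The association tables (`assocaro`, `assocamm`) are what allow one phase pick or amplitude to be linked to more than one hypocenter or magnitude version.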
The entire description for the BDSN/NHFN/MPBO, HRSN, and USGS Low Frequency Geophysical networks and data archive has been entered into the hardware tracking, SEED instrument response, and waveform tables. Programs have been developed to perform queries of waveform inventory and instrument responses, and the NCEDC can now generate full SEED volumes from the BDSN network based on information from the database and the waveforms on the mass storage system. The second stage of development will include the NCSN waveform inventory and later the NCSN instrument response data as they are made available. We distributed all of our programs and procedures to populate the hardware tracking and instrument response tables to the SCEDC in order to help them populate their database.
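Evaluating a multi-stage instrument response reduces to multiplying the complex responses of the individual stages. The sketch below handles analog poles-and-zeros stages in the Laplace domain, H(s) = gain · A0 · Π(s − z) / Π(s − p) with s = 2πif; digital FIR stages are omitted for brevity. The class and function names are hypothetical and are not the NCEDC query programs.

```python
import math
from dataclasses import dataclass, field


@dataclass
class PAZStage:
    gain: float            # stage sensitivity, e.g. V/(m/s) or counts/V
    a0: float = 1.0        # normalization factor
    zeros: list = field(default_factory=list)
    poles: list = field(default_factory=list)

    def response(self, f):
        """Complex response of this stage at frequency f (Hz)."""
        s = 2j * math.pi * f  # Laplace variable in rad/s
        num = complex(1.0)
        for z in self.zeros:
            num *= s - z
        den = complex(1.0)
        for p in self.poles:
            den *= s - p
        return self.gain * self.a0 * num / den


def total_response(stages, f):
    """Overall response is the product of the stage responses."""
    h = complex(1.0)
    for st in stages:
        h *= st.response(f)
    return h
```

For example, a pure-gain sensor stage of 1500 followed by a digitizer stage of 4e5 gives an overall sensitivity of 6e8.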
Additional details on the joint catalog effort and database schema development may be found at http://quake.geo.berkeley.edu/db
The various earthquake catalogs, phase data, and earthquake mechanisms can be searched using NCEDC web interfaces that allow users to select the catalog and attributes such as geographical region, time, and magnitude. The GPS data are available to all users via anonymous ftp. Research accounts are available to any qualified researcher who needs access to the other datasets that currently are not available via the Web.
During 2000 and 2001, the NCEDC developed a generalized database query system to support the development of portable database query applications among data centers with different internal database schemas. The initial goal was to modify the IRIS SeismiQuery web interface program to make installation easier at the NCEDC and other data centers, as well as to introduce a new query language that would be schema independent.
In order to support SeismiQuery and other future database query applications, we defined a set of Generic Data Views (GDV) for the database that encompass the basic objects that we expect most data centers to support. We introduced a new language we call MSQL (Meta SeismiQuery Language), which is based on generic SQL and uses the GDVs for its core schema. MSQL queries are converted to data-center-specific SQL queries by the parsing program MSQL2SQL. This parser stores the MSQL parsing tree in a data structure, and APIs were implemented to browse and modify elements in the parsing tree. These APIs are the only data-center- or database-specific source code. Finally, we modified the SeismiQuery web interface to uniformly generate MSQL requests and to process these requests in a consistent fashion.
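The core idea of translating queries written against generic views into one data center's own table and column names can be shown with a deliberately simplified sketch. Unlike MSQL2SQL, which builds and edits a parse tree, this version only substitutes identifiers token by token (leaving quoted string literals untouched); the view names and the mapping are hypothetical.

```python
import re

# Hypothetical mapping from generic view/column names to one data center's schema.
GDV_TO_LOCAL = {
    "Station": "site_table",
    "Channel": "chan_table",
    "net": "network_code",
    "sta": "station_code",
}

# Tokens: quoted string literals, identifiers/numbers, whitespace runs, or any other character.
TOKEN = re.compile(r"'[^']*'|\w+|\s+|.")


def msql_to_sql(query, mapping=GDV_TO_LOCAL):
    """Rewrite generic identifiers to local ones; string literals are left as-is."""
    out = []
    for tok in TOKEN.findall(query):
        if tok.startswith("'"):
            out.append(tok)
        else:
            out.append(mapping.get(tok, tok))
    return "".join(out)
```

A real translator needs the parse tree because generic views may map to joins or expressions rather than single tables, which plain substitution cannot express.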
We have installed SeismiQuery at the NCEDC, where it provides a common interface for querying attributes and available data for SEED format data, and have provided both IRIS and the SCEC Data Center with our modified version of SeismiQuery. We envision using this approach to support other database query programs in the future.
In a collaborative project with the IRIS DMC and other worldwide datacenters, the NCEDC helped develop and implement NetDC, a protocol that provides a seamless user interface to multiple datacenters for geophysical network and station inventory, instrument responses, and data retrieval requests. NetDC builds upon the foundation and concepts of the IRIS BREQ_FAST data request system. The NetDC system was put into production in January 2000, and is currently operational at three datacenters worldwide: the NCEDC, IRIS DMC, and Geoscope. The NetDC system receives user requests via email, automatically routes the appropriate portion of each request to the appropriate datacenter, optionally aggregates the responses from the various datacenters, and delivers the data (or ftp pointers to the data) to the users via email.
The NCEDC hosts a web page that allows users to easily query the NCEDC waveform inventory and to generate and submit NetDC requests to the NCEDC. The NCEDC currently supports both the BREQ_FAST and NetDC request formats. As part of our collaboration with the SCEDC, the NCEDC provided its BREQ_FAST interface code to the SCEDC and has worked closely with them to implement BREQ_FAST requests at the SCEDC.
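The routing step, splitting one user request into per-datacenter portions, can be sketched for simplified BREQ_FAST-style data lines of the form `STA NET start-time end-time NCHAN CHAN...`. The routing table, default datacenter, and function name are hypothetical; the real NetDC routing and header handling are more involved.

```python
from collections import defaultdict

# Hypothetical routing table: network code -> datacenter that serves it.
ROUTES = {"BK": "NCEDC", "NC": "NCEDC", "IU": "IRIS DMC", "G": "Geoscope"}


def route_request(lines, routes=ROUTES, default="IRIS DMC"):
    """Group request lines by the datacenter responsible for each line's network code."""
    per_dc = defaultdict(list)
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        net = fields[1]
        per_dc[routes.get(net, default)].append(line)
    return dict(per_dc)
```

Each group would then be forwarded to its datacenter, and the replies could optionally be aggregated before delivery to the user.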
This year, the NCEDC wrote a collaborative proposal with the SCEDC to the Southern California Earthquake Center, with the goal of unifying data access between the two data centers. As part of this project, the NCEDC and SCEDC are working to support a common set of 3 tools for accessing waveform and parametric data: SeismiQuery, NetDC, and STP.
The Seismogram Transfer Program, or STP, is a GUI-based client-server program developed at the SCEDC. Access to STP is either through a simple direct interface that is available for Sun or Linux platforms or through a Web interface. With the direct interface, the data are placed directly on a user's computer in several possible formats, with the byte-swap conversion performed automatically. With the Web interface, the selected and converted data are retrieved with a single ftp command. The STP interface also allows rapid access to parametric data such as hypocenters and phases.
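The byte-swap conversion mentioned above is needed because waveform samples written in big-endian order (as on Sun workstations) must be reordered for little-endian machines such as x86 Linux. A minimal sketch using Python's standard `struct` module, assuming the samples are 32-bit signed integers; the function name is hypothetical and this is not STP code.

```python
import struct


def swap_int32(buf, from_order=">", to_order="<"):
    """Reinterpret a buffer of 32-bit signed integers from one byte order to another.

    '>' is big-endian and '<' is little-endian; trailing bytes that do not
    form a whole sample are dropped.
    """
    n = len(buf) // 4
    values = struct.unpack(f"{from_order}{n}i", buf[: 4 * n])
    return struct.pack(f"{to_order}{n}i", *values)
```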
The NCEDC has started implementing STP, working with the SCEDC on extensions and needed additions.
The NCEDC is a joint project of the BSL and the USGS Menlo Park and is partially funded by the USGS.
Doug Neuhauser is the manager of the NCEDC. Stephane Zuzlewski, Rick McKenzie, Mark Murray, André Basset, and Lind Gee of the BSL and David Oppenheimer, Hal Macbeth, and Fred Klein of the USGS Menlo Park contribute to the operation of the NCEDC. Steve Chu developed the CalQC program. Doug Neuhauser, Lind Gee, and Stephane Zuzlewski contributed to the preparation of this chapter.