The Northern California Earthquake Data Center, a joint project of the Berkeley Seismological Laboratory and the U.S. Geological Survey at Menlo Park, serves as an "on-line" archive for various types of digital data relating to earthquakes in central and northern California. The NCEDC is located at the Berkeley Seismological Laboratory, and has been accessible to users via the Internet since mid-1992.
The primary goal of the NCEDC is to provide a stable and permanent archival and distribution center of digital geophysical data for northern and central California such as seismic waveforms, electromagnetic data, GPS data, and earthquake parametric data. The principal networks contributing data to the data center are the Berkeley Digital Seismic Network (BDSN) operated by the Seismological Laboratory, the Northern California Seismic Network (NCSN) operated by the USGS, and the Bay Area Regional Deformation (BARD) GPS network. The collection of NCSN digital waveforms dates from 1984 to the present, the BDSN digital waveforms date from 1987 to the present, and the BARD GPS data date from 1993 to the present.
The NCEDC continues to use the World Wide Web as a principal interface for users to request, search, and receive data from the NCEDC. The NCEDC has implemented a number of useful and original mechanisms of data search and retrieval using the World Wide Web, which are available to anyone on the Internet. All of the documentation about the NCEDC, including the research users' guide, is available via the Web. Users can perform catalog searches and retrieve hypocentral information and phase readings from the various earthquake catalogs at the NCEDC via easy-to-use forms on the Web. In addition, users can peruse the index of available broadband data at the NCEDC, and can request and retrieve broadband data in standard SEED format via the Web. Access to all datasets is available via research accounts at the NCEDC. The NCEDC's Web address is http://quake.geo.berkeley.edu/
The current NCEDC facilities consist of a Sun Ultra 450 computer, a 2.5 TByte capacity DISC 517-slot jukebox with four 5.2 GByte MO drives and 5.2 GB MO media, a 15-slot AIT tape jukebox which holds 25 GBytes per tape, the SAM-FS hierarchical storage management (HSM) software, and 4.6 TB of online disk storage. A dual processor Sun Ultra 60 provides Web services and research account access to the NCEDC.
The hardware and software system can be configured to automatically create multiple copies of each data file. The NCEDC uses this feature to create an online copy of each data file on MO media, and another copy on AIT tape which is stored offline. As of 2003, all data is stored on magnetic disk, with backup copies on MO and tape media.
These activities and projects are described in detail below.
The bulk of the data at the NCEDC consist of waveform and GPS data from northern California. Figure 11.1 shows the relative proportion of each data set at the NCEDC. The total size of the datasets archived at the NCEDC is shown in Table 11.1. Figure 11.2 shows the geographic distribution of data archived by the NCEDC.
The archival of current BDSN (Chapter 3), NHFN (Chapter 4), and Mini-PBO (Chapter 8) (all stations using the network code BK) seismic data is an ongoing task. These data are telemetered from more than 30 seismic data loggers in real-time to the BSL, where they are written to disk files. Each day, an extraction process creates a daily archive by retrieving all continuous and event-triggered data for the previous day. The daily archive is run through quality control procedures to correct any timing errors, triggered data is reselected based on the REDI, NCSN, and BSL earthquake catalogs, and the resulting daily collection of data is archived at the NCEDC.
All of the data acquired from the BDSN/NHFN/MPBO Quanterra data loggers are archived at the NCEDC. The NCEDC has made an effort to archive older digital data, and the 16-bit BDSN digital broadband data from 1987-1991 have been converted to MiniSEED and are now online. In late June 2002, the NCEDC initiated a project to convert the remaining 16-bit BDSN data (MHC, SAO, and PKD1) from late 1991 through mid-1992 to MiniSEED. An undergraduate student was hired to read the old tapes and to work on the conversion. All remaining 20 Hz 16 bit BDSN data has been converted to MiniSEED, and we are working on the decimation procedures to create the 1 Hz data channels. Data acquired by portable 24-bit RefTek recorders before the installation of Quanterra data loggers at NHFN sites has not yet been converted to MiniSEED and archived.
NCSN and SHFN waveform data are sent to the NCEDC via the Internet. The NCSN event waveform files are automatically transferred from Menlo Park to the NCEDC as part of the routine analysis procedure by the USGS, and are automatically verified and archived by the NCEDC.
A few corrupt NCSN event files were discovered at the NCEDC several years ago, and were eventually traced down to suspected flaws in the 12-inch WORM media and/or firmware problems on the Sony WDA-600 series jukeboxes used by the NCEDC. When we transcribed the data from the 12-inch WORM media to the current 5.25 inch magneto-optical media, we verified that all files were transcribed accurately. In 2000-2001, using software developed at the NCEDC to detect possibly corrupt NCSN files, we identified 4704 possibly corrupted NCSN waveform event files. We re-read the original NCSN tapes for all of these events, discovered that only 71 of the files were actually corrupt, and replaced the corrupted event waveform files.
The NCEDC maintains a list of teleseismic events recorded by the NCSN, which is updated automatically whenever a new NCSN event file is received at the NCEDC, since these events do not appear in the NCSN catalog.
The NCSN installed 9 continuously telemetered digital broadband stations in northwest California and southwest Oregon in support of the USGS/NOAA Consolidated Reporting of EarthquakeS and Tsunamis (CREST) system, and 2 continuously telemetered digital broadband stations in the Mammoth region. The NCEDC established procedures to create an archive of continuous data from these stations, in addition to the event waveform files. These data initially included channels at 50 and 100 Hz, but now are all 100 Hz sampling. The NCEDC hoped to generate an archive of 20 Hz data (for consistency with the BDSN data) from these 100 Hz waveforms, but incomplete continuous data due to telemetry problems between the stations and the USGS Menlo Park data collection center has made this difficult. At this point, the NCEDC is archiving the 100 Hz data without decimation.
Event seismograms from the Parkfield High Resolution Seismic Network (HRSN) from 1987 through June 1998 are available in their raw SEGY format via NCEDC research accounts. A number of events have faulty timing due to the lack or failure of a precision time source for the network. Due to funding limitations, there is currently no ongoing work to correct the timing problems in the older events or to create MiniSEED volumes for these events. However, a preliminary catalog for a significant number of these events has been constructed, and the catalog is available via the Web at the NCEDC.
As described in Chapter 5, the original HRSN acquisition system died in late 1998, and an interim system of portable RefTek recorders was installed at some of the sites. Data from this interim system are not currently available online.
In 2000 and 2001, 3 new borehole sites were installed, and the network was upgraded to operate with Quanterra Q730 data loggers and digital telemetry. The upgraded acquisition system detects events using the HRSN stations and extracts waveforms from both the HRSN and the PASO stations. The event waveform files are automatically transferred to the NCEDC, where they are made available to the research community via anonymous FTP until they are reviewed and permanently archived. During the deployment of the temporary PASSCAL network (PASO) at Parkfield in 2000-2003 with the IRIS broadband array telemetry, the HRSN collected event data from both the HRSN and PASO array and provided this integrated data set to researchers in near-real-time.
The HRSN 20 Hz (BP) and state-of-health channels are being archived continuously at the NCEDC. As an interim measure, the NCEDC also archived continuous data from the 250 Hz (DP) channels through mid 2002 in order to help researchers retrieve events that were not detected during the network upgrade.
The University of Nevada, Reno (UNR) operates several broadband stations in western Nevada and eastern California that are important for northern California earthquake processing and analysis. Starting in August 2000, the NCEDC has been receiving and archiving continuous broadband data from four UNR stations. The data are transmitted in real-time from UNR to UC Berkeley, where they are made available for real-time earthquake processing and for archiving.
In a situation similar to that of the broadband waveforms from the NCSN, the NCEDC originally planned to create an archive of 20 Hz data from the 100 Hz data. However, frequent gaps in the data complicate the development of a robust decimation process. At this time, the UNR broadband waveforms are being archived at 100 Hz.
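To illustrate why gaps complicate decimation, the following is a minimal sketch, not the NCEDC's procedure. It assumes a 100 Hz series with explicit sample times, splits it at gaps, and decimates each contiguous segment by 5 with a Hamming-windowed sinc low-pass filter. All function names, the filter length, and the gap tolerance are assumptions chosen for the example.

```python
import math


def split_at_gaps(times, samples, rate_hz, tolerance=0.5):
    """Split a series wherever the time step differs from the nominal
    sample interval by more than `tolerance` intervals (assumed threshold)."""
    dt = 1.0 / rate_hz
    segments = []
    start = 0
    for i in range(1, len(times)):
        if abs((times[i] - times[i - 1]) - dt) > tolerance * dt:
            segments.append((times[start:i], samples[start:i]))
            start = i
    if times:
        segments.append((times[start:], samples[start:]))
    return segments


def lowpass_taps(factor, half_width=4):
    """Hamming-windowed sinc FIR with its cutoff at the new Nyquist frequency,
    normalized to unit gain at zero frequency."""
    n = 2 * half_width * factor + 1
    mid = n // 2
    taps = []
    for k in range(n):
        x = (k - mid) / factor
        sinc = 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * k / (n - 1))
        taps.append(sinc * window)
    total = sum(taps)
    return [t / total for t in taps]


def decimate_segment(times, samples, factor, taps):
    """Filter and keep every `factor`-th sample. Outputs are produced only
    where the filter fits entirely inside the segment."""
    half = len(taps) // 2
    out_t, out_x = [], []
    for center in range(half, len(samples) - half, factor):
        window = samples[center - half:center + half + 1]
        out_t.append(times[center])
        out_x.append(sum(c * s for c, s in zip(taps, window)))
    return out_t, out_x


def decimate_with_gaps(times, samples, rate_hz=100.0, factor=5):
    """Decimate each contiguous segment independently; drop empty results."""
    taps = lowpass_taps(factor)
    results = []
    for seg_t, seg_x in split_at_gaps(times, samples, rate_hz):
        out_t, out_x = decimate_segment(seg_t, seg_x, factor, taps)
        if out_t:
            results.append((out_t, out_x))
    return results
```

In this sketch every gap costs half a filter length of output on each side, and a segment shorter than the filter produces no output at all, which is one way frequent gaps can erode a decimated archive.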
The NCEDC continues to archive and process electric and magnetic field data acquired from data loggers at two sites (SAO and PKD). At PKD and SAO, 3 components of magnetic field and 2 or 4 components of electric field are digitized and telemetered in real-time along with seismic data to the Seismological Laboratory, where they are processed and archived at the NCEDC in a similar fashion to the seismic data (Chapter 6). The system generates continuous data channels at 40 Hz, 1 Hz, and 0.1 Hz for each component of data. All of these data are archived and remain available online at the NCEDC. Using programs developed by Dr. Martin Fullerkrug at the Stanford University STAR Laboratory (now at the Institute for Meteorology and Geophysics at the University of Frankfurt), the NCEDC is computing and archiving magnetic activity and Schumann resonance analysis using the 40 Hz data from this dataset. The magnetic activity and Schumann resonance data can be accessed from the Web.
In addition to the electromagnetic data from PKD and SAO, the NCEDC archives data from a low-frequency, long-baseline electric field project operated by Dr. Steve Park of UC Riverside at site PKD2. This experiment (which is separate from the equipment at PKD1 described in Chapter 6) uses an 8-channel Quanterra data logger to record the data, which are transmitted to the BSL using the same circuit as the BDSN seismic data. These data are acquired and archived in an identical manner to the other electric field data at the NCEDC.
The NCEDC archives GPS data from the BARD (Bay Area Regional Deformation) network of continuously monitored GPS receivers in northern California (Chapter 7). The NCEDC GPS archive now includes 77 continuous sites in northern California. There are approximately 70 core BARD sites owned and operated by UC Berkeley, USGS (Menlo Park and Cascade Volcano Observatory), LLNL, UC Davis, UC Santa Cruz, Trimble Navigation, and Stanford. Data are also archived from sites operated by other agencies including East Bay Municipal Utilities District, the City of Modesto, the National Geodetic Survey, and the Jet Propulsion Laboratory.
The NCEDC also archives non-continuous survey GPS data. The NCEDC is the principal archive for the survey GPS data collected by the USGS Menlo Park for northern California and other locations. Significant quality control efforts were implemented by the NCEDC to ensure that the raw data, scanned site log sheets, and RINEX data are archived for each survey. All of the USGS MP GPS data has been transferred to the NCEDC and virtually all of the data from 1992 to the present has been archived and is available for distribution.
The Calpine Corporation currently operates a micro-seismic monitoring network in the Geysers region of northern California. Prior to 1999 this network was operated by Unocal. Through various agreements with both Unocal and Calpine, the companies have released triggered event waveform data from 1989 through 2000, along with preliminary event catalogs for the same time period, for archiving and distribution through the NCEDC. This dataset represents over 296,000 events that were recorded by the Calpine/Unocal Geysers network and are available via research accounts at the NCEDC.
Over the last 26 years, the USGS at Menlo Park, in collaboration with other principal investigators, has collected an extensive low-frequency geophysical data set that contains over 1300 channels of tilt, tensor strain, dilatational strain, creep, magnetic field, water level, and auxiliary channels such as temperature, pore pressure, rain and snow accumulation, and wind speed. In collaboration with the USGS, we assembled the requisite information for the hardware representation of the stations and the instrument responses for many channels of this diverse dataset, and developed the required programs to populate and update the hardware database and generate the instrument responses. We developed the programs and procedures to automate the process of importing the raw waveform data and converting it to MiniSEED format.
We have currently archived timeseries data from 887 data channels from 167 sites, and have instrument response information for 542 channels at 139 sites. The waveform archive is updated on a daily basis with data from 350 currently operating data channels. We will augment the raw data archive as additional instrument response information is assembled for the channels, and will work with the USGS to clearly define the attributes of the "processed" data channels.
Currently both the USGS and BSL construct and maintain earthquake catalogs for northern and central California. The "official" UC Berkeley earthquake catalog begins in 1910, and the USGS "official" catalog begins in 1966. Both of these catalogs are archived and available through the NCEDC, but the existence of 2 catalogs has caused confusion among both researchers and the public. The BSL and the USGS have spent considerable effort over the past years to define procedures for merging the data from the two catalogs into a single northern and central California earthquake catalog in order to present a unified view of northern California seismicity. The differences in time period, variations in data availability, and mismatches in regions of coverage all complicate the task.
The NCEDC, in conjunction with the Council of the National Seismic System (CNSS), produced and distributed a world-wide composite catalog of earthquakes based on the catalogs of the national and various U.S. regional networks for several years. Each network updates its earthquake catalog on a daily basis at the NCEDC, and the NCEDC constructs a composite world-wide earthquake catalog by combining the data, removing duplicate entries that may occur from multiple networks recording an event, and giving priority to the data from each network's authoritative region. The catalog, which includes data from 14 regional and national networks, is searchable using a Web interface at the NCEDC. The catalog is also freely available to anyone via FTP over the Internet.
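The combine, de-duplicate, and prioritize steps described above can be sketched as follows. This is an illustration under stated assumptions rather than the NCEDC's actual procedure: the `Event` fields, the bounding boxes in `AUTH_REGIONS`, and the time and distance thresholds in `is_duplicate` are all hypothetical values picked for the example.

```python
from dataclasses import dataclass
from math import asin, cos, radians, sin, sqrt


@dataclass(frozen=True)
class Event:
    network: str   # code of the contributing network
    time: float    # origin time, seconds since an arbitrary epoch
    lat: float
    lon: float
    mag: float


# Hypothetical authoritative regions: network -> (min_lat, max_lat, min_lon, max_lon).
AUTH_REGIONS = {
    "NC": (34.5, 43.0, -126.0, -117.8),
    "CI": (31.5, 37.5, -121.3, -114.0),
}


def is_authoritative(ev):
    """True if the event lies inside its own network's (assumed) region."""
    box = AUTH_REGIONS.get(ev.network)
    if box is None:
        return False
    min_lat, max_lat, min_lon, max_lon = box
    return min_lat <= ev.lat <= max_lat and min_lon <= ev.lon <= max_lon


def distance_km(a, b):
    """Great-circle (haversine) distance between two epicenters."""
    dlat = radians(b.lat - a.lat)
    dlon = radians(b.lon - a.lon)
    h = sin(dlat / 2) ** 2 + cos(radians(a.lat)) * cos(radians(b.lat)) * sin(dlon / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(h))


def is_duplicate(a, b, max_dt=16.0, max_km=100.0):
    """Assumed rule: different networks, close in both time and space."""
    return (a.network != b.network
            and abs(a.time - b.time) <= max_dt
            and distance_km(a, b) <= max_km)


def merge_catalogs(events):
    """Combine events, keeping one entry per earthquake and preferring the
    solution from the network that is authoritative at that location."""
    merged = []
    for ev in sorted(events, key=lambda e: e.time):
        for i, kept in enumerate(merged):
            if is_duplicate(ev, kept):
                if is_authoritative(ev) and not is_authoritative(kept):
                    merged[i] = ev
                break
        else:
            merged.append(ev)
    return sorted(merged, key=lambda e: e.time)
```

The linear scan over `merged` keeps the sketch short; a catalog of realistic size would more likely restrict the comparison to a sliding time window.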
With the demise of the CNSS and the development of the ANSS, the NCEDC was asked to update the Web pages to present the composite catalog as a product of the ANSS. This conversion was completed in the fall of 2002.
The NCEDC developed a GUI-based state-driven system CalQC to facilitate the quality control processing that is applied to the BK, NC broadband, NN, and BP data sets.
The quality control procedures for these datasets include the following tasks:
CalQC uses previously developed programs to perform each function, but it provides a graphical point-and-click interface to automate these procedures, and to provide the analyst with a record of when each process was started, whether it executed correctly, and whether the analyst has indicated that a step has been completed. CalQC is used to process all data from the BK network, and all continuous data from the NN, NC, and BP networks that is archived by the NCEDC.
Most of the parametric data archived at the NCEDC, such as earthquake catalogs, phase and amplitude readings, waveform inventory, and instrument responses, have been stored in flat text files. Flat files are easily stored and viewed, but are not efficiently searched. Over the last year, the NCEDC, in collaboration with the Southern California Earthquake Data Center (SCEDC) and the California Integrated Seismic Network (CISN), has continued development of database schemas to store the parametric data from the joint earthquake catalog, station history, complete instrument response for all data channels, and waveform inventory.
The parametric schema supports tables and associations for the joint earthquake catalog. It allows for multiple hypocenters per event, multiple magnitudes per hypocenter, and association of phases and amplitudes with multiple versions of hypocenters and magnitudes respectively. The instrument response schema represents full multi-stage instrument responses (including filter coefficients) for the broadband data loggers. The hardware tracking schema will represent the interconnection of instruments, amplifiers, filters, and data loggers over time. This schema will be used to store the joint northern California earthquake catalog and the ANSS composite catalog.
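The one-to-many relationships in the parametric schema can be illustrated with a small in-memory SQLite database. This is a sketch only: the table and column names below are hypothetical and are not taken from the NCEDC schema, and amplitude-to-magnitude associations are omitted for brevity.

```python
import sqlite3

# Hypothetical tables: one event, many hypocenters (origins) per event,
# many magnitudes per hypocenter, and picks associated with several origins.
SCHEMA = """
CREATE TABLE event (
    evid       INTEGER PRIMARY KEY,
    pref_orid  INTEGER
);
CREATE TABLE origin (
    orid       INTEGER PRIMARY KEY,
    evid       INTEGER NOT NULL REFERENCES event(evid),
    time       REAL,
    lat        REAL,
    lon        REAL,
    depth      REAL,
    pref_magid INTEGER
);
CREATE TABLE netmag (
    magid      INTEGER PRIMARY KEY,
    orid       INTEGER NOT NULL REFERENCES origin(orid),
    magtype    TEXT,
    magnitude  REAL
);
CREATE TABLE arrival (
    arid       INTEGER PRIMARY KEY,
    sta        TEXT,
    phase      TEXT,
    time       REAL
);
CREATE TABLE assoc_arrival_origin (
    arid       INTEGER REFERENCES arrival(arid),
    orid       INTEGER REFERENCES origin(orid),
    PRIMARY KEY (arid, orid)
);
"""


def build_demo():
    """Create the tables and load one event with two alternative solutions."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO event VALUES (1, 11)")
    conn.executemany("INSERT INTO origin VALUES (?, ?, ?, ?, ?, ?, ?)", [
        (10, 1, 1000.0, 37.80, -122.20, 8.0, 100),
        (11, 1, 1000.4, 37.82, -122.21, 9.5, 111),
    ])
    conn.executemany("INSERT INTO netmag VALUES (?, ?, ?, ?)", [
        (100, 10, "Md", 3.9),
        (110, 11, "ML", 4.1),
        (111, 11, "Mw", 4.0),
    ])
    conn.execute("INSERT INTO arrival VALUES (500, 'STA1', 'P', 1003.2)")
    conn.executemany("INSERT INTO assoc_arrival_origin VALUES (?, ?)",
                     [(500, 10), (500, 11)])
    return conn


def preferred_solution(conn, evid):
    """Return the preferred hypocenter and its preferred magnitude."""
    return conn.execute(
        """SELECT o.orid, o.lat, o.lon, m.magtype, m.magnitude
           FROM event e
           JOIN origin o ON o.orid = e.pref_orid
           JOIN netmag m ON m.magid = o.pref_magid
           WHERE e.evid = ?""",
        (evid,),
    ).fetchone()
```

Storing a pointer to the preferred hypocenter and preferred magnitude lets a catalog search return a single row per event while the alternative solutions remain available.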
The entire description for the BDSN/NHFN/MPBO, HRSN, and USGS Low Frequency Geophysical networks and data archive has been entered into the hardware tracking, SEED instrument response, and waveform tables. Using programs developed to perform queries of waveform inventory and instrument responses, the NCEDC can now generate full SEED volumes for these networks based on information from the database and the waveforms on the mass storage system.
During 2002-2003, the NCEDC and NCSN jointly developed a system consisting of an extensive spreadsheet that contains per-channel information that describes the hardware of each NCSN data channel and provides each channel with a SEED-compliant channel name. This spreadsheet, combined with a limited number of files that describe the central-site analog digitizer, FIR decimation filters, and general characteristics of digital acquisition systems, allows the NCSN to assemble its station history in a format that the NCEDC can use to populate the hardware tracking and instrument response database tables for the NCSN. As of June 2003, the NCEDC has the preliminary response for approximately 75 percent of the NCSN network. However, significant work must still be done to complete and verify the NCSN instrument responses.
The second part of this project is the conversion of the NCSN waveforms from their native CUSP format into MiniSEED, the standard NCEDC waveform format. This process must deal with multiple problems such as ambiguous or erroneously labeled CUSP data channels, sensors that were recorded on multiple data channels, and ensuring that each distinct data channel is mapped to a distinct SEED channel name. The NCEDC developed programs to use the time-dependent NCSN instrument response spreadsheet and NCSN-supplied channel name transformation rules to determine the SEED channel naming, and to provide feedback to the NCSN on channel naming problems. When the channel transformation rules have stabilized, the NCEDC will perform a bulk conversion of all historic NCSN waveforms to MiniSEED format.
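The requirement that distinct data channels map to distinct SEED names can be checked mechanically. The sketch below assumes the mapping is available as a dictionary from a native (station, channel) pair to a SEED identifier tuple of (station, network, channel, location); the function name and the sample identifiers are hypothetical, and the time dependence of the real mapping is ignored.

```python
from collections import defaultdict


def find_name_collisions(mapping):
    """Return every SEED name that is assigned to more than one native channel.

    mapping: {(station, native_channel): (station, network, channel, location)}
    """
    by_seed = defaultdict(set)
    for native, seed in mapping.items():
        by_seed[seed].add(native)
    return {seed: sorted(natives)
            for seed, natives in by_seed.items()
            if len(natives) > 1}
```

For example, if two native channels at a station were both assigned the same SEED name, the function would report that name with both native channels listed, which is the kind of feedback that could be passed back to the network operator.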
The second stage of development will include the NCSN waveform inventory and later the NCSN instrument response data as they are made available. We distributed all of our programs and procedures to populate the hardware tracking and instrument response tables to the SCEDC in order to help them populate their database.
During 2002-2003, the BSL has been processing events detected by the HRSN (BP) network. The waveform data and event parameters (picks and hypocenters) are stored in separate HRSN database tables, and will be merged with events from the NCSN when the NCSN catalog is migrated to the database.
Additional details on the joint catalog effort and database schema development may be found at http://quake.geo.berkeley.edu/db
The various earthquake catalogs, phase readings, and earthquake mechanisms can be searched using NCEDC Web interfaces that allow users to select the catalog and attributes such as geographical region, time, and magnitude. The GPS data is available to all users via anonymous FTP. Research accounts are available to any qualified researcher who needs access to the other datasets that currently are not available via the Web.
During 2000 and 2001, the NCEDC developed a generalized database query system to support the development of portable database query applications among data centers with different internal database schemas. The initial goal was to modify the IRIS SeismiQuery Web interface program to make installation easier at the NCEDC and other data centers, as well as to introduce a new query language that would be schema independent.
In order to support SeismiQuery and other future database query applications, we defined a set of Generic Data Views (GDV) for the database that encompassed the basic objects that we expect most data centers to support. We introduced a new language we call MSQL (Meta SeismiQuery Language), which is based on generic SQL and uses the GDVs for its core schema. MSQL queries are converted to data-center-specific SQL queries by the parsing program MSQL2SQL. This parser stores the MSQL parsing tree in a data structure, and APIs were implemented to browse and modify elements in the parsing tree. These APIs are the only source code that is specific to a data center or database. Finally, we modified the SeismiQuery Web interface to uniformly generate MSQL requests and to process these requests in a consistent fashion.
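The idea of rewriting a schema-independent query into a local one can be illustrated with a deliberately simplified translator. This is not MSQL2SQL: it uses regular-expression substitution where the real program builds a parse tree, and the generic view name, the local table and column names, and the query text are all hypothetical.

```python
import re

# Hypothetical mapping: generic view -> (local table, {generic column: local column}).
LOCAL_SCHEMA = {
    "channel": ("channel_data", {
        "network": "net",
        "station": "sta",
        "sample_rate": "samprate",
    }),
}

QUALIFIED = re.compile(r"\b([A-Za-z_]\w*)\.([A-Za-z_]\w*)\b")
TABLE_REF = re.compile(r"\b(FROM|JOIN)\s+([A-Za-z_]\w*)\b", re.IGNORECASE)


def translate(query, schema=LOCAL_SCHEMA):
    """Rewrite `view.column` references and FROM/JOIN view names to local names.
    Names that are not in the mapping are left unchanged."""

    def fix_qualified(match):
        view, column = match.group(1), match.group(2)
        if view not in schema:
            return match.group(0)
        table, columns = schema[view]
        return f"{table}.{columns.get(column, column)}"

    def fix_table(match):
        keyword, view = match.group(1), match.group(2)
        if view not in schema:
            return match.group(0)
        return f"{keyword} {schema[view][0]}"

    return TABLE_REF.sub(fix_table, QUALIFIED.sub(fix_qualified, query))
```

Only the mapping table differs between data centers in this sketch, which mirrors the goal of confining site-specific knowledge to one small layer. A text substitution of this kind would also rewrite matching names inside string literals, which is one reason a real translator works on a parse tree.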
We have installed SeismiQuery at the NCEDC, where it provides a common interface for querying attributes and available data for SEED format data, and have provided both IRIS and the SCEC Data Center with our modified version of SeismiQuery. We envision using this approach to support other database query programs in the future.
In a collaborative project with the IRIS DMC and other worldwide data centers, the NCEDC helped develop and implement NetDC, a protocol which will provide a seamless user interface to multiple data centers for geophysical network and station inventory, instrument responses, and data retrieval requests. The NetDC builds upon the foundation and concepts of the IRIS BREQ_FAST data request system. The NetDC system was put into production in January 2000, and is currently operational at three data centers worldwide - the NCEDC, IRIS DMC, and Geoscope. The NetDC system receives user requests via email, automatically routes the appropriate portion of the requests to the appropriate data center, optionally aggregates the responses from the various data centers, and delivers the data (or FTP pointers to the data) to the users via email.
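The routing step can be sketched as a lookup from network code to the data center that holds the data. The sketch does not reproduce the NetDC request format or protocol: requests are plain dictionaries, and the `NETWORK_HOME` table and function name are assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical table of which data center answers for which network code.
NETWORK_HOME = {
    "BK": "NCEDC",
    "NC": "NCEDC",
    "IU": "IRIS DMC",
    "G": "Geoscope",
}


def route_requests(requests, local_center, network_home=NETWORK_HOME):
    """Split requests into those served locally, those forwarded to other
    data centers (grouped by center), and those with an unknown network."""
    routed = defaultdict(list)
    unroutable = []
    for req in requests:
        center = network_home.get(req["network"])
        if center is None:
            unroutable.append(req)
        else:
            routed[center].append(req)
    local = routed.pop(local_center, [])
    return local, dict(routed), unroutable
```

A center receiving a mixed request would serve the local portion itself and forward each remaining group to the center named in the table.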
The NCEDC hosts a Web page that allows users to easily query the NCEDC waveform inventory and to generate and submit NetDC requests to the NCEDC. The NCEDC currently supports both the BREQ_FAST and NetDC request formats. As part of our collaboration with SCEDC, the NCEDC provided its BREQ_FAST interface code to SCEDC and has worked closely with them to implement BREQ_FAST requests at the SCEDC.
Last year, the NCEDC wrote a collaborative proposal with the SCEDC to the Southern California Earthquake Center, with the goal of unifying data access between the two data centers. As part of this project, the NCEDC and SCEDC are working to support a common set of 3 tools for accessing waveform and parametric data: SeismiQuery, NetDC, and STP.
The Seismogram Transfer Program or STP is a GUI-based client-server program, developed at the SCEDC. Access to STP is either through a simple direct interface that is available for Sun or Linux platforms or through a Web interface. With the direct interface, the data are placed directly on a user's computer in several possible formats, with the byte-swap conversion performed automatically. With the Web interface, the selected and converted data are retrieved with a single FTP command. The STP interface also allows rapid access to parametric data such as hypocenters and phases.
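Byte-swap conversion matters because SPARC-based Sun systems are big-endian while x86 Linux systems are little-endian, so the same bytes decode to different integers on each. The sketch below shows the general idea with Python's `struct` module; it makes no claim about how STP implements the conversion, and the function names and the 32-bit sample width are assumptions.

```python
import struct


def decode_int32(raw, big_endian):
    """Decode a buffer of 32-bit signed integers with an explicit byte order."""
    if len(raw) % 4:
        raise ValueError("buffer length must be a multiple of 4 bytes")
    order = ">" if big_endian else "<"
    return list(struct.unpack(f"{order}{len(raw) // 4}i", raw))


def to_native(raw, big_endian):
    """Re-encode the samples in the byte order of the machine running this code."""
    values = decode_int32(raw, big_endian)
    return struct.pack(f"={len(values)}i", *values)
```

Decoding with the wrong byte order does not raise an error; it silently yields wrong sample values, which is why handling the conversion automatically on the client side is useful.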
The NCEDC has started implementing STP, working with the SCEDC on extensions and needed additions. We are adding support for the full SEED channel identifiers (Station, Network, Channel, and Location), and improving the waveform retrieval formats.
In order to provide Web access to the NCSN waveforms before the SEED conversion and instrument response for the NCSN have been completed, the NCEDC implemented EVT_FAST, an interim email-based waveform request system similar to the BREQ_FAST email request system. Users can email EVT_FAST requests to the NCEDC and request NCSN waveform data based on the NCSN event id. The NCSN waveform data is converted to either SAC ASCII, SAC binary, or AH format, and placed in the anonymous FTP directory so that users can retrieve the data. The EVT_FAST waveforms are currently named with the USGS's native NCSN channel names, since the SEED channel name conversion is not yet complete.
The FISSURES project developed from an initiative by IRIS to improve earth scientists' efficiency by developing a unified environment that can provide interactive or programmatic access to waveform data and the corresponding metadata for instrument response, as well as station, and channel inventory information. FISSURES was developed using CORBA (Common Object Request Broker Architecture) as the architecture to implement a system-independent method for the exchange of this binary data. The IRIS DMC developed a series of services, referred to as the Data Handling Interface (DHI), using the FISSURES architecture to provide waveform and metadata from the IRIS DMC.
The NCEDC has started to implement the FISSURES Data Handling Interface (DHI) services, which involves interfacing the DHI servers with the NCEDC database schema. We started with the source code for the IRIS DMC's DHI servers, which significantly reduced the implementation time. We now have the waveform and event FISSURES services running in demonstration mode at the NCEDC. These services interact with the NCEDC database and data storage system, and can deliver NCEDC event and channel metadata as well as waveforms using the FISSURES interfaces. We are still testing FISSURES and are waiting to import our catalog data into the database before we start running the software in production mode.
The NCEDC is a joint project of the BSL and the USGS Menlo Park and is partially funded by the USGS.
Doug Neuhauser is the manager of the NCEDC. Stephane Zuzlewski, Rick McKenzie, Mark Murray, André Basset, and Lind Gee of the BSL and David Oppenheimer, Hal Macbeth, and Fred Klein of the USGS Menlo Park contribute to the operation of the NCEDC. Steve Chu developed the CalQC program. Doug Neuhauser, Lind Gee, and Stephane Zuzlewski contributed to the preparation of this chapter.
Berkeley Seismological Laboratory
215 McCone Hall, UC Berkeley, Berkeley, CA 94720-4760
© 2004, The Regents of the University of California