Magnetotellurics (MT) relies upon the recording of tiny variations in
the earth's magnetic field. The spike identification is predicated on
the assumption that these variations are horizontally polarized
(spatially uniform) over 100's of km. In the absence of overwhelming
noise, this implies that sensors at
different sites will be strongly correlated, with the magnetic fields
being nearly identical and the electric fields differing by a scale
factor which depends on the local conductivity structure. This scale
factor is approximately stationary as evidenced by the stability
of intersite transfer functions [*Eisel and Egbert* 2002]. Figure 2.11 shows orthogonal
electric and magnetic field data sampled at the earth's surface at two
sites a distance of 120km apart. For a detailed description of the
sites see [*Kappler* 2005]. Note the similarity
in variations between channels of same orientation and field
type, as well as identical scaling of the magnetic components,
compared to the scale factor difference in the electric channels.
This similarity can be seen to extend from periods of hours to
seconds as shown by repeatedly zooming in on the data in
Figures 2.12, 2.13, and 2.14.

This inherent similarity in the fields is exploited to identify
windows in time when the array is not functioning
correctly. Magnetotelluric data is prone to sudden sharp variations in
signal of natural origin. These variations can be differentiated from
variations whose origins are local noise by comparing
channels at different sites,
and flagging sharp variations which
are not present arraywide. A variety of
statistics can be used to identify spikes.
The statistic used
here is simply the variance of each array channel over short time
windows. The variance can be calculated directly in each
time window, or alternatively, the variance of the first difference
data can be calculated. The ratio of the variance in channels which
record the same field type at the same orientation should be
stationary about some typical value; in the case of magnetic fields,
this value should be 1. In the case of electric fields, the value will
be approximately stationary around some value, which reflects
site effects such as local field distortion phenomena. The stationarity
is only approximate because the observed fields' ratios will vary as a
function of frequency, and hence electric field variance ratios
will wander somewhat with the frequency content of the incident radiation. We
approximately account for this by scaling the electric field data
by a 'site factor' *s* which scales the *i*
channel such that non-contaminated windows of channel *i* data
have variance nearly equal to the variance of the same time window at
a reference site. Formally, with the electric field data scaled in V/m and the magnetic
field data still in instrument counts (effectively dB/dt), the time
series of data collected by the i sensor S is represented
as an N-point time series **x**(t). The N-point time series
is chopped into windows of length L and overlap V, resulting in a
total of k=floor(N/(L-V)) windows. Thus **x**(t) is
represented as **X**(t), a KxL array, whose *j* row
is the contiguous stream of data centered at *t*=*j**(L-V).
The last row of **X** is taken as the final L points of **x** in
order to avoid zero-padding. The array **X** is then contracted to
a vector **v** whose *j* element is the variance of the
*j* row of **X**. We thus obtain a windowed variance
time series **v**(t) for each sensor, where the time increment
for **v** is (L-V) times the sampling period of **x**. In our
example, we use day-long time series, sampled at 1Hz (hence N=86400),
and set L=256, V=64, thus k=450.

The choice of window length should be sufficiently wide to account for
possible intersite timing errors and FIR filter noise convolved on
top of spikes, but sufficiently narrow that moderate sized spikes
drive the window variance well above the value it would have without a
spike. To apply the electric field scale factors, we look at the
ratios of the **v** for approximately parallel electrodes at
the different sites. In our example, we use SAO as the reference site,
and point-by-point ratios of **v** to **v**
are examined together with point-by-point ratios of **v**
to **v**. Each cardinal sensor orientation is examined
separately because of possible site distortion effects. Figure
shows the variance ratios over a four-year
window. Note that there is a significant difference in the ratio
amplitudes between the two cardinal orientations, a median of 28 for
EW and 148 for NS, implying a relative electrical anisotropy between
the two sites. Some seasonal effects are also apparent, as are
offsets due to instrument swaps, or times when sites were not
functioning properly. In general however, the ratio is reasonably
stable, and stationary on a day-to-day basis. The EW electric field
at PKD is thus scaled by or equivalently **v** is scaled by 28, and similarly for the NS component.
With E field data appropriately scaled, we recalculate the **v**(t), but this time the rows of **X** are
first-differenced before variances are calculated. Windows of
anomalous variance can now be flagged.
Each **v**(t) is associated with a two digit code, one digit for field
type (E or H) and one for orientation. Channels with the same codes are
grouped together (we assume that sites are not significantly
rotated w.r.t one another). Within each pairing, the log ratio **r**
of the variance time series is calculated:

The method seems to work well. It has application in the
monitoring of array health, and possibly in other mutichannel
systems where channels typically record stationary time series. There
are a very few cases where the method
misidentifies data as being a spike.
Addition of a hard threshold insisting that
spikes be ID-ed as windows having **v** greater than 0.3 or M alpha
trimmed stds, whichever is greater, seems to protect agianst this.There are several parameters which control the algorithm; these are:
initial time series lengths (N), window length (L), window overlap
(V), whether or not to 'first-difference the data' (boolean),
(percent of data to reject when calculating thresholds), and M, the
threshold multiplier. A code has been written which incorporates
these variables into a few simple scripts. Four years of data at 1Hz
can be rapidly despiked in a short amount of time.

Eisel M., Egbert G.D., (2001), On the stability of magnetotelluric
transfer function estimates and the reliability of their variances,
*Geophys. J. Int.*, 144, pp65-82, 2001

Kappler, K.N., Morrison, H.F., Egbert, G.D., Parkfield-Hollister
Electromagentic Monitoring Array, *BSL Annual Report*, 2005-06

Berkeley Seismological Laboratory

215 McCone Hall, UC Berkeley, Berkeley, CA 94720-4760

Questions or comments? Send e-mail: www@seismo.berkeley.edu

© 2007, The Regents of the University of California