A Comparison of Standard Inversion, Neural Networks and Support Vector Machines

Karl Kappler, Heidi Kuzma, James W. Rector

Summary

The object of geophysical inversion is to recover earth parameters from measured data. If the relationship between an earth model and the data is linear, then three different methods of data interpretation, linear inversion, Neural Networks (NNs) and Support Vector Machines (SVMs), arrive at the same model from different paradigms. Linear inversion finds a model by minimizing a least-squares objective function for which there is a closed-form solution. NNs and SVMs use training data to approximate a functional inverse. If the relationship between models and data is non-linear, there is no longer a closed-form solution to the inversion objective function, and a model is found by iterative guessing, or by applying a linear approximation to the non-linear physics governing the problem in conjunction with standard linear inversion. An NN captures non-linear relationships through its architecture and a user-designed search algorithm. An SVM is rendered non-linear by changing a single parameter. All three approaches can be used independently or in combination. We are actively developing a tutorial which explains the differences and relationships between these methods.

Introduction

Geophysical relationships can be written

\begin{displaymath}
d = G(m)
\end{displaymath} (37.1)

d is a vector of measured data, m is a vector of model parameters that describe the earth, and G is a function derived from physics and geometry. The goal of geophysical inversion is to find m, given G and d.

Linear Inversion

If the relationship between data and models is linear, then G can be written as a matrix of coefficients:


\begin{displaymath}
d = Gm
\end{displaymath} (37.2)

Least squares linear inversion is done by minimizing an objective function:


\begin{displaymath}
m_{linv} = \mathop{\mathrm{arg\,min}}_{m} \, (d-Gm)^{T}(d-Gm)
\end{displaymath} (37.3)

which has the well known closed form solution


\begin{displaymath}
m_{linv} = (G^{T}G)^{-1}G^{T}d
\end{displaymath} (37.4)

Notice that any vector of data can be inverted once the matrix $(G^{T}G)^{-1}G^{T}$ has been computed.
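As a minimal numerical sketch of equations 2 and 4 (assuming NumPy; the operator G and the model are made up for illustration):

\begin{verbatim}
import numpy as np

# Hypothetical forward operator G and "true" model m (for illustration only)
rng = np.random.default_rng(0)
G = rng.normal(size=(50, 10))          # 50 data values, 10 model parameters
m_true = rng.normal(size=10)
d = G @ m_true                         # noise-free data, equation 2

# Closed-form least-squares solution, equation 4
G_pinv = np.linalg.inv(G.T @ G) @ G.T  # (G^T G)^{-1} G^T
m_linv = G_pinv @ d

print(np.allclose(m_linv, m_true))     # True: the model is recovered
\end{verbatim}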

Inversion via Machine Learning

In the machine learning paradigm, it is assumed that examples of m and d are available, and G does not necessarily have to be known. The goal of inversion via machine learning, either with a Neural Network or a Support Vector Machine, is to find a function S such that


\begin{displaymath}
m_{ML} = S(d) \approx SVM(d) \approx NN(d)
\end{displaymath} (37.5)

In geophysics, usually G is known, and it is possible to select a series of example models and use G to compute a corresponding set of training data. The goal of a linear machine learning problem is to find S such that


\begin{displaymath}
M=DS
\end{displaymath} (37.6)

M is now a matrix of example or 'training models' and D is a matrix in which each row is the corresponding 'training data'.
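The following sketch (assuming NumPy; the forward operator and training models are made up for illustration) builds a training set from a known G and solves equation 6 for S by least squares; the learned operator then maps new data directly to a model:

\begin{verbatim}
import numpy as np

rng = np.random.default_rng(1)
G = rng.normal(size=(50, 10))        # known forward operator

# Training set: rows of M are example models, rows of D the data G maps them to
M = rng.normal(size=(200, 10))       # 200 training models
D = M @ G.T                          # row i of D is (G m_i)^T

# Solve M = D S for the linear operator S in the least-squares sense
S, *_ = np.linalg.lstsq(D, M, rcond=1e-10)   # cutoff discards numerically-zero singular values

# Interpret new data: with the row convention of equation 6, m^T = d^T S
m_new = rng.normal(size=10)
d_new = G @ m_new
print(np.allclose(d_new @ S, m_new))         # True
\end{verbatim}

With enough training examples, the operator S recovered this way reproduces the result of linear inversion (equation 4).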

Linear Neural Network

Another frame of reference in which to cast information processing is that of a network of interlinked, adaptive data structures, broadly known as neural networks. The neural net format is inspired by the neuron model. A neuron is composed of three main parts: input channels (dendrites), a cell which performs a function on the inputs (the neuron body), and an output channel that conducts the result away from the body (the axon).

Figure 37.1: Schematic of a single Neuron
\begin{figure}\begin{center}
\epsfig{file=kk2fig1.eps, width=8cm}\end{center}\end{figure}

In a simple, linear neural net, the inputs are subjected to a weighting scheme which seeks to optimize the output of the network. The method of steepest descent is the most common method of selecting weights for the input channels. In the case of a linear weighting scheme applied to a single set of inputs, steepest descent reduces to the method of ordinary least squares, or linear inversion. Neural Network approaches to computing partition the code into nodal structures like the neuron and interlink the nodes with axon/dendrite connections. Using the NN paradigm when designing software is helpful in making the design modular.
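A minimal sketch of this reduction (assuming NumPy; the data, targets, learning rate, and iteration count are illustrative): a single linear neuron whose input weights are adjusted by steepest descent on the squared error arrives at the ordinary least-squares weights.

\begin{verbatim}
import numpy as np

rng = np.random.default_rng(2)
D = rng.normal(size=(200, 5))        # training data, one example per row
m = rng.normal(size=200)             # target output for each example

w = np.zeros(5)                      # weights on the neuron's input channels
lr = 1e-3                            # learning rate (step size)
for _ in range(5000):
    err = D @ w - m                  # prediction error over the training set
    w -= lr * D.T @ err              # steepest-descent step on the squared error

w_ls, *_ = np.linalg.lstsq(D, m, rcond=None)
print(np.allclose(w, w_ls))          # True: the neuron finds the least-squares weights
\end{verbatim}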

Linear SVM

Least-squares SVMs (LS-SVMs) are derived starting from equation 6 by minimizing an objective function of the form:


\begin{displaymath}
S_{SVM} = \mathop{\mathrm{arg\,min}}_{S} \left( C(DS-M)^{T}(DS-M) + S^{T}S \right)
\end{displaymath} (37.7)

If the scalar C (picked by the user) is very large, then the solution to equation (7) reduces to


\begin{displaymath}
S_{SVM} = (G^{T}G)^{-1}G^{T}
\end{displaymath} (37.8)

Instead of solving equation 7 directly, an LS-SVM is derived from an equivalent, dual form of the minimization (Kuzma, 2003). The solution is formed in terms of inner products, or kernel functions. In the simplest case the kernel function is just the standard dot product. An LS-SVM does not find S directly, but instead finds a set of coefficients $\alpha$ such that


\begin{displaymath}
\alpha = (DD^{T}+\frac{1}{2C}I)^{-1}M
\end{displaymath} (37.9)

To use the SVM to find a model from new data, the following is computed:


\begin{displaymath}
m = SVM(d) = \sum_{i \in \mathrm{training\ data}} \alpha_{i}(d_{i}^{T}d)
\end{displaymath} (37.10)

In essence, the SVM approximates the model which fits the field data by 'projecting' the field data onto a basis of training data vectors (the support vectors). The solution is thus a weighted linear combination of training models. This is described in more detail in the tutorial we are preparing.
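A minimal sketch of equations 9 and 10 (assuming NumPy; the forward operator, training set, and C are illustrative): the coefficients $\alpha$ are found by solving one linear system in the training data, and a new datum is interpreted as a weighted combination of training models.

\begin{verbatim}
import numpy as np

rng = np.random.default_rng(3)
G = rng.normal(size=(40, 8))            # known forward operator
M = rng.normal(size=(100, 8))           # training models, one per row
D = M @ G.T                             # corresponding training data

C = 1e4                                 # large C: very light regularization
# Equation 9: alpha = (D D^T + I/(2C))^{-1} M
alpha = np.linalg.solve(D @ D.T + np.eye(len(D)) / (2 * C), M)

def ls_svm(d):
    """Equation 10: weighted sum over training data of alpha_i (d_i . d)."""
    return alpha.T @ (D @ d)

m_true = rng.normal(size=8)
d_field = G @ m_true
m_inv = np.linalg.inv(G.T @ G) @ G.T @ d_field          # linear inversion, equation 4
print(np.allclose(ls_svm(d_field), m_inv, atol=1e-4))   # True: the answers agree
\end{verbatim}

For a large C and training data that span the data space, the predicted model agrees with the linear-inversion answer, which is the equivalence shown in Figure 37.2.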

Figure 37.2: Given proper training data, the models found from a linear LS-SVM are equivalent to those found by linear inversion
\begin{figure}\begin{center}
\epsfig{file=kk2fig2.eps, width=8.5cm}\end{center}\end{figure}

Non-Linear Inversion

If the relationship between models and data is not linear, then it is no longer possible to invert G directly. Nonlinear iterative inversions rely on numerical methods to minimize objective functions similar to that in equation 3. Non-linear inversion follows a sequence of steps:

1. Make an initial guess for m

2. Compute data (or gradients) via equation 1

3. Update the guess for m

4. Repeat steps 2 and 3 until a solution is established

In practice, many different algorithms may be used for nonlinear inversion. Constraints, regularization and limits on step size may be used to force the solution toward desired models. In our tutorial we treat the following common numerical methods by outlining them conceptually, and citing formulae:

Newton's Method

Gradient : Steepest Ascent/Descent

Conjugate Gradient

Monte Carlo methods: Genetic Algorithms and Simulated Annealing

Occam's Constraint

Some of the pitfalls of each method, such as slow convergence or the choice of a good starting model, are discussed, as well as tricks to avoid these pitfalls; for example, using a simple SVM to select a good starting model and applying a decaying step size to a subsequent gradient search.
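A minimal sketch of steps 1 through 4 above (assuming NumPy; the toy forward model, its Jacobian, the starting guess, and the decaying step schedule are all made up for illustration), using a plain gradient-descent update:

\begin{verbatim}
import numpy as np

def forward(m):
    """Toy non-linear forward model G(m), made up for illustration."""
    return np.array([m[0] * m[1], m[0]**2 + m[1], np.sin(m[0]) + m[1]**2])

def jacobian(m):
    """Analytic derivatives of the toy forward model."""
    return np.array([[m[1],         m[0]],
                     [2 * m[0],     1.0],
                     [np.cos(m[0]), 2 * m[1]]])

m_true = np.array([1.2, 0.7])
d_obs = forward(m_true)

m = np.array([0.5, 0.5])                  # step 1: initial guess
for k in range(300):
    r = forward(m) - d_obs                # step 2: predicted data and residual
    grad = jacobian(m).T @ r              # gradient of the squared misfit
    step = 0.15 / (1 + 0.02 * k)          # decaying step size
    m = m - step * grad                   # step 3: update the guess
                                          # step 4: repeat until converged
print(np.allclose(m, m_true, atol=1e-3))  # True: the iteration reaches the true model
\end{verbatim}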

Non-linear NN

A linear NN refers to a linear relationship between neurons. Each node itself can operate in a non-linear fashion, as in the case of a network of switching nodes in which node potentials decay exponentially when a node is off and increase linearly when it is on (Kappler et al., 2002). The linearity is in the connections between nodes: the state or potential of a given node depends on a linear function of the states and potentials of its input nodes. The NN offers the advantage of being adaptable, and can respond to a changing data set on the fly, whereas a linear inversion needs to be recomputed whenever a data set changes. The SVM and NN share this property: once trained, they can interpret new data for free.

Figure 37.3: Diagram of a neural net with a single hidden layer composed of four nodes
\begin{figure}\begin{center}
\epsfig{file=kk2fig3.eps, width=8.5cm}\end{center}\end{figure}

A more complicated neural net often uses connectivity as a strength for optimizing a solution. Once the data are input, a series of hidden layers is used to analyze different aspects of the data. The connection strengths between nodes can be varied. An NN could incorporate a group of parallel SVMs, each optimizing a different function, and weight the inversion from each SVM into a resulting output.
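A minimal sketch of the single-hidden-layer architecture of Figure 37.3 (assuming NumPy; the synthetic mapping, four hidden nodes, learning rate, and iteration count are illustrative), trained by steepest descent with backpropagated gradients:

\begin{verbatim}
import numpy as np

rng = np.random.default_rng(5)
D = rng.normal(size=(200, 3))                   # training data, one example per row
M = np.tanh(D @ rng.normal(size=(3, 2)))        # models from a hypothetical non-linear mapping

W1 = 0.1 * rng.normal(size=(3, 4))              # input -> hidden layer (four nodes)
W2 = 0.1 * rng.normal(size=(4, 2))              # hidden layer -> output

def mse():
    return np.mean((np.tanh(D @ W1) @ W2 - M) ** 2)

print("before training:", mse())
lr = 0.1
for _ in range(5000):
    H = np.tanh(D @ W1)                         # hidden-layer activations
    E = H @ W2 - M                              # error of the predicted models
    grad_W2 = H.T @ E / len(D)                  # backpropagated gradients
    grad_W1 = D.T @ ((E @ W2.T) * (1 - H**2)) / len(D)
    W1 -= lr * grad_W1                          # steepest-descent weight updates
    W2 -= lr * grad_W2
print("after training: ", mse())                # the error drops as the hidden layer adapts
\end{verbatim}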

A description of an NN which emulates an SVM and the supporting mathematics can be found in Haykin (1994), Chapter 7. In the NN paradigm, SVMs are known as radial basis function networks.

Non-Linear SVMs

The architecture of a non-linear SVM is the same as the architecture of a linear SVM except that dot products are replaced by a kernel function. To capture non-linear relationships in an LS-SVM, equations 9 and 10 are replaced by


\begin{displaymath}
\alpha = (\Gamma+\frac{1}{2C}I)^{-1}M
\end{displaymath} (37.11)


\begin{displaymath}
\Gamma_{i,j}=K(d_{i}, d_{j})
\end{displaymath} (37.12)


\begin{displaymath}
m=\sum_{i \in \mathrm{training\ data}} \alpha_{i}K(d_{i}, d)
\end{displaymath} (37.13)

The kernel function K(d$_{i}$,d$_{j}$) can be any function such that the Gram matrix, $\Gamma$, is positive semi-definite (has no negative eigenvalues). In practice, K is picked by trial and error out of a small library of kernels. Training the SVM means finding the $\alpha$'s, which still only requires inverting a well-conditioned matrix whose size is the number of training examples.
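A minimal sketch of equations 11 through 13 (assuming NumPy; the Gaussian kernel, its width, the toy forward model, and C are illustrative choices):

\begin{verbatim}
import numpy as np

rng = np.random.default_rng(6)

def forward(m):
    """Toy non-linear forward model, made up for illustration."""
    return np.array([np.exp(m[0]), m[0] + m[1]**3, np.sin(m[1])])

def kernel(a, b, width=0.3):
    """Gaussian (RBF) kernel, one common choice of K."""
    return np.exp(-np.sum((a - b)**2) / (2 * width**2))

# Training set: example models and the data they produce
M = rng.uniform(-1, 1, size=(400, 2))
D = np.array([forward(m) for m in M])

# Equations 11 and 12: coefficients from the Gram matrix
C = 1000.0
Gamma = np.array([[kernel(di, dj) for dj in D] for di in D])
alpha = np.linalg.solve(Gamma + np.eye(len(D)) / (2 * C), M)

# Equation 13: interpret new field data
def svm(d):
    return alpha.T @ np.array([kernel(di, d) for di in D])

m_true = np.array([0.4, -0.3])
print(svm(forward(m_true)), m_true)   # the prediction lies close to the true model
\end{verbatim}

Relative to the linear LS-SVM sketch above, only the inner product has been replaced by the kernel function.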

Implications and Future Work

The ordinary method of linear inversion can be recovered from simple assumptions applied to Neural Networks and SVMs. The NN and SVM approaches, however, offer several advantages, such as adaptability, the ability to expand training sets without restarting the inversion, and natural extensions to non-linear problems. The framework of an NN is also intuitive in the sense that neural clusters can be set up so that each set of nodes works on a specific, user-defined question.

Acknowledgments

The DCI Postdoctoral Fellowship Program partially supported this work.

References

Haykin, S., 1994, Neural Networks: A Comprehensive Foundation, Macmillan.

Kappler, K., et al., 2002, Dynamics in high dimensional networks: Signal Processing, Elsevier.

Kuzma, H. A., and Rector, J. W., 2004, Non-linear AVO inversion using support vector machines: 74th Annual International Meeting, Society of Exploration Geophysicists.

Scholkopf, B., and Smola, A., 2002, Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond: The MIT Press.
