A Comparison of Standard Inversion, Neural Networks and Support Vector Machines

The object of geophysical inversion is to recover earth parameters from measured data. If the relationship between an earth model and the data is linear, then three different methods of data interpretation (linear inversion, Neural Networks (NNs) and Support Vector Machines (SVMs)) arrive at the same model from different paradigms. Linear inversion finds a model by minimizing a least squares objective function to which there is a closed form solution. NNs and SVMs use training data to approximate a functional inverse. If the relationship between models and data is non-linear, there is no longer a closed form solution to the inversion objective function, and a model is found by iterative guessing, or by applying a linear approximation to the non-linear physics governing the problem in conjunction with standard linear inversion. An NN finds non-linear relationships by adopting an architecture tuned to a user-designed search algorithm. An SVM is rendered non-linear by changing a single parameter. All three approaches can be used independently or in combination. We are actively developing a tutorial that explains the differences and relationships between these methods.

Geophysical relationships can be written

$$d = G(m) \tag{37.1}$$

d is a vector of measured data, m is a vector of model parameters that describe the earth, and G is a function derived from physics and geometry. The goal of geophysical inversion is to find m, given G and d.

If the relationship between data and models is linear, then G can be written as a matrix of coefficients:

$$d = Gm \tag{37.2}$$

Least squares linear inversion is done by minimizing an objective function:

$$\phi(m) = \| d - Gm \|^2 \tag{37.3}$$

which has the well known closed form solution

$$m = (G^T G)^{-1} G^T d \tag{37.4}$$

Notice that any vector of data can be inverted once the matrix $(G^T G)^{-1} G^T$ has been computed.
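As a minimal sketch of this closed form solution, assuming it is the standard normal-equations result $m = (G^T G)^{-1} G^T d$, the following NumPy example builds the inverse operator once and applies it to two different data vectors. The matrix `G`, the model `m_true` and the helper `linear_inverse_operator` are hypothetical and chosen only for illustration.

```python
import numpy as np

def linear_inverse_operator(G):
    """Return the least squares operator (G^T G)^{-1} G^T for a full-column-rank G."""
    # Solve the normal equations instead of forming an explicit inverse.
    return np.linalg.solve(G.T @ G, G.T)

# Hypothetical toy problem: 5 measurements, 2 model parameters.
G = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0],
              [1.0, 4.0]])
m_true = np.array([2.0, -0.5])
d = G @ m_true                      # noise-free synthetic data

G_inv = linear_inverse_operator(G)  # computed once
m_est = G_inv @ d                   # applied to a data vector

# The same operator inverts a second, unrelated data vector.
d2 = G @ np.array([0.0, 1.0])
m_est2 = G_inv @ d2
```

With noise-free data and a full-column-rank `G`, both estimates should reproduce the models that generated the data.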

In the machine learning paradigm, it is assumed that examples of m and d are available, and G does not necessarily have to be known. The goal of inversion via machine learning, either with a Neural Network or a Support Vector Machine, is to find a function S such that

$$m = S(d) \tag{37.5}$$

In geophysics, usually G is known, and it is possible to select a series of example models and use G to compute a corresponding set of training data. The goal of a linear machine learning problem is to find S such that

$$M = DS \tag{37.6}$$

M is now a matrix of example or 'training models' and D is a matrix in which each row is the corresponding 'training data'.
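The sketch below illustrates this training view under explicit assumptions: each row of `M` is a training model, each row of `D` is the corresponding training data, and the linear learning problem is read as finding `S` with `M ≈ D S`. The matrix `G`, the random training set and the tolerance `rcond=1e-10` are hypothetical choices. The example compares the learned operator with the least squares inverse of `G`.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical known linear physics: 5 measurements, 2 model parameters.
G = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0],
              [1.0, 4.0]])

n_train = 20
M = rng.normal(size=(n_train, 2))   # each row is one training model
D = M @ G.T                         # each row is the corresponding training data

# Minimum-norm least squares solution of D S = M.
# D is rank deficient (rank 2), so small singular values are cut off.
S, *_ = np.linalg.lstsq(D, M, rcond=1e-10)

# Invert new data with the learned operator and with the classical formula.
d_new = G @ np.array([2.0, -0.5])
m_learned = d_new @ S
m_classic = np.linalg.solve(G.T @ G, G.T @ d_new)
```

In this noise-free setting the learned `S` is the transpose of the least squares operator, so the two paradigms give the same model.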

Another frame of reference in which to cast information processing is that of a network of interlinked, adaptive data structures, broadly known as neural networks. The neural net format is inspired by the neuron model. The neuron is composed of three main parts. There are the input channels (dendrites), a cell which performs a function on the inputs (the neuron body), and an output channel to conduct the function result away from the body (axon).

In a simple, linear neural net, the inputs are subjected to a weighting scheme which seeks to optimize the output of the network. The method of steepest ascent is the most common method of selecting weights for the input channels. In the case of a linear weighting scheme applied to a single set of inputs, the steepest ascent reduces to the method of ordinary least squares, or linear inversion. Neural Network approaches to computing partition the coding into nodal structures like the neuron and interlink the nodes with axons and dendrites. Using the NN paradigm when designing software is helpful in making the design modular.
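The equivalence between gradient-based weight selection and ordinary least squares can be checked numerically. The following is a minimal sketch, not the authors' implementation: a single linear node is trained by steepest descent on the squared misfit (equivalent to steepest ascent on the negative misfit), and its weights are compared with the least squares solution. The inputs `G`, targets `d` and the function `train_linear_neuron` are hypothetical.

```python
import numpy as np

def train_linear_neuron(G, d, n_iter=2000):
    """Fit the weights of one linear node by steepest descent on 0.5 * ||G w - d||^2."""
    w = np.zeros(G.shape[1])
    # A step of 1 / (largest eigenvalue of G^T G) guarantees convergence.
    step = 1.0 / np.linalg.norm(G, 2) ** 2
    for _ in range(n_iter):
        residual = G @ w - d
        w -= step * (G.T @ residual)   # move against the misfit gradient
    return w

# Hypothetical inputs (rows of G) and noisy target outputs d.
G = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0],
              [1.0, 4.0]])
d = np.array([2.1, 1.4, 1.1, 0.4, 0.1])

w_gd = train_linear_neuron(G, d)
w_ls, *_ = np.linalg.lstsq(G, d, rcond=None)
```

After enough iterations the trained weights `w_gd` agree with the least squares weights `w_ls` to within numerical precision.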

$$\min_S \; \tfrac{1}{2} \| S \|^2 + \tfrac{C}{2} \| M - DS \|^2 \tag{37.7}$$

If the scalar C (picked by the user) is very large, then the solution to equation (37.7) reduces to

$$S = (D^T D)^{-1} D^T M \tag{37.8}$$

Instead of solving equation (37.7) directly, a least squares SVM (LS-SVM) is derived from an equivalent, dual form of the minimization (*Kuzma*, 2003). The solution is formed in terms of inner products, or kernel functions. In the simplest case the kernel function is just the standard dot product. An LS-SVM does not find S directly, but instead finds a set of coefficients such that

(37.9) |

To then use the SVM to find a model from new data, the following is computed

(37.10) |

In essence the SVM is approximating the model which fits the field data by 'projecting' the field data onto a basis of training data vectors (the support vectors). Thus a weighted linear combination of training models forms the solution. This is described in more detail in the tutorial we are preparing.
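The relationship between solving for S directly and working with dual coefficients can be sketched as follows. This example assumes that equation (37.7) is a ridge-type objective, $\tfrac{1}{2}\|S\|^2 + \tfrac{C}{2}\|M - DS\|^2$, with no bias term; under that assumption the primal solution is $S = (D^T D + I/C)^{-1} D^T M$ and the dual coefficients are $\alpha = (D D^T + I/C)^{-1} M$. The names `alpha` and `linear_kernel` and all numerical values are hypothetical.

```python
import numpy as np

def linear_kernel(A, B):
    """Dot products between every row of A and every row of B."""
    return A @ B.T

rng = np.random.default_rng(1)
G = rng.normal(size=(5, 2))                      # hypothetical linear physics
M = rng.normal(size=(20, 2))                     # rows: training models
D = M @ G.T + 0.01 * rng.normal(size=(20, 5))    # rows: noisy training data
C = 1.0e3                                        # user-picked scalar

# Primal form: solve for S directly (5 x 5 system).
S_primal = np.linalg.solve(D.T @ D + np.eye(5) / C, D.T @ M)

# Dual form: one coefficient row per training example (20 x 20 system).
alpha = np.linalg.solve(linear_kernel(D, D) + np.eye(20) / C, M)

# New data are compared with every training data vector by dot product.
d_new = G @ np.array([0.3, -1.2])
m_primal = d_new @ S_primal
m_dual = linear_kernel(d_new[None, :], D) @ alpha   # shape (1, 2)
```

Both routes give the same model estimate, and `D.T @ alpha` recovers `S_primal`. The dual route only touches the data through dot products, which is what later allows the dot product to be swapped for another kernel.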

1. Make an initial guess for m

2. Compute data (or gradients) via equation (37.1)

3. Update the guess for m

4. Repeat steps 2 and 3 until a solution is established
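The four steps above can be sketched with a Gauss-Newton update and simple step halving. This is one possible choice of update rule, not necessarily the one used by the authors. The forward function (an exponential decay), the sample times `t` and the starting guess are hypothetical.

```python
import numpy as np

def forward(m, t):
    """Hypothetical non-linear physics: d = m0 * exp(-m1 * t)."""
    return m[0] * np.exp(-m[1] * t)

def jacobian(m, t):
    """Derivatives of the predicted data with respect to m0 and m1."""
    e = np.exp(-m[1] * t)
    return np.column_stack([e, -m[0] * t * e])

def misfit(m, t, d):
    r = d - forward(m, t)
    return r @ r

def invert(d, t, m_start, n_iter=50, tol=1e-20):
    m = np.asarray(m_start, dtype=float)        # step 1: initial guess
    for _ in range(n_iter):
        r = d - forward(m, t)                   # step 2: predicted data and gradients
        J = jacobian(m, t)
        delta, *_ = np.linalg.lstsq(J, r, rcond=None)
        step = 1.0                              # step 3: update, halving if misfit grows
        while misfit(m + step * delta, t, d) > misfit(m, t, d) and step > 1e-8:
            step *= 0.5
        m = m + step * delta
        if misfit(m, t, d) < tol:               # step 4: repeat until converged
            break
    return m

t = np.linspace(0.0, 4.0, 9)
m_true = np.array([2.0, 0.5])
d_obs = forward(m_true, t)                      # noise-free synthetic data
m_est = invert(d_obs, t, m_start=[1.0, 1.0])
```

The step halving is a simple limit on step size: the first full Gauss-Newton step from this starting guess overshoots to a negative decay rate, and halving it keeps the misfit decreasing.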

In practice, many different algorithms may be used for nonlinear inversion. Constraints, regularization and limits on step size may be used to force the solution toward desired models. In our tutorial we treat the following common numerical methods by outlining them conceptually, and citing formulae:

Newton's Method

Gradient: Steepest Ascent/Descent

Conjugate Gradient

Monte Carlo methods: Genetic Algorithms and Simulated Annealing

Occam's Constraint

Some of the pitfalls of each method, such as slow convergence or the choice of a good starting model, are discussed, as well as tricks to avoid these pitfalls. For example, a simple SVM can be used to select a good starting model, and a decaying step size can be applied to the subsequent gradient search.

A more complicated neural net often uses connectivity as a strength for optimizing a solution. Once the data are input, a series of hidden layers is used to analyze different aspects of the data. The connection strengths between nodes can be varied. An NN could incorporate a group of parallel SVMs, each optimizing a different function, and weight the inversion from each SVM into a resulting output.

A description of an NN which emulates an SVM, together with the supporting mathematics, can be found in *Haykin*, Chapter 7. In the NN paradigm, SVMs are known as radial basis function networks.

The architecture of a non-linear SVM is the same as the architecture of a linear SVM, except that dot products are replaced by kernel functions. To capture non-linear relationships in an LS-SVM, equations (37.9) and (37.10) are replaced by

(37.11) |

(37.12) |

(37.13) |
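A minimal sketch of a non-linear LS-SVM follows, assuming a ridge-type dual with no bias term and a Gaussian (radial basis function) kernel in place of the dot product. The kernel width `sigma`, the scalar `C`, the forward function and the training grid are hypothetical choices for illustration, not values taken from this work.

```python
import numpy as np

def rbf_kernel(A, B, sigma=0.5):
    """Gaussian kernel matrix between the rows of A and the rows of B."""
    sq_dist = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-sq_dist / (2.0 * sigma ** 2))

def forward(m):
    """Hypothetical non-linear physics mapping a scalar model to two data values."""
    m = np.atleast_1d(m)
    return np.column_stack([np.exp(-m), np.exp(-0.5 * m)])

# Training set: example models and the data computed from them.
m_train = np.linspace(0.0, 2.0, 40)
D = forward(m_train)              # rows: training data
M = m_train[:, None]              # rows: training models
C = 1.0e6                         # user-picked scalar

# Dual coefficients: same linear solve as the linear case, with a kernel matrix.
alpha = np.linalg.solve(rbf_kernel(D, D) + np.eye(len(D)) / C, M)

# Invert data from a model that is not in the training set.
m_true = 1.3
d_new = forward(m_true)           # shape (1, 2)
m_est = (rbf_kernel(d_new, D) @ alpha).item()
```

Only the kernel function changes relative to the linear case; the linear solve and the prediction step keep the same form.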

The DCI Postdoctoral Fellowship Program partially supported this work.

Kappler et al., 2002, Dynamics in High Dimensional Networks, *Signal Processing, Elsevier*.

Kuzma, H. A. and Rector, J. W., 2004, Non-linear AVO inversion using support vector machines: 74th Annual International Meeting, Soc. Expl. Geophysics.

Schölkopf, B. and Smola, A., 2002, Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond: The MIT Press.

Berkeley Seismological Laboratory


© 2006, The Regents of the University of California