Efficient parallel search using the Neighbourhood AlgorithmP. Rickwood & M. Sambridge Over the past year, we have undertaken development of the Neighbourhood Algorithm, to improve its performance as a distributed/parallel algorithm for geophysical inversion. This has involved re-formulating the Neighbourhood algorithm so that it is equivalent (in a probabilistic sense) to the existing formulation (described on the NA home page), but is capable of being more efficiently parallelised. A parallel implementation has been implemented which can be an order of magnitude faster than the currently available parallel implementation. The existing (old) parallel implementation of the NA used simple MPI capabilities to achieve a limited form of parallelism. However, this parallelism has two important limitations:
The re-worked implementation of the NA takes advantage of more sophisticated MPI capabilities (asynchronous non-blocking send-receive), that, together with minor modifications to the algorithm itself (that preserve the essential nature of the algorithm), allow for a vastly more efficient implementation. Barrier synchronisation overhead is eliminated entirely, and book-keeping is distributed across all computers involved in the computation. Figure 2 demonstrates the speed-up achieved by the new implementation of the NA algorithm.
Figure 2: Performance -- old versus new The improvement in efficiency is most appreciable for problems with objective functions that are relatively quick (< 1 second) to evaluate, and for problems where there is a large degree of variability in the time taken to evaluate the objective function 1. Other work and improvementsIn addition to being more efficient, the new implementation of the Neighbourhood Algorithm is also more robust, as the re-formulation of the algorithm allows an implementation that is resistant to the failure of any individual node. In other words, the parallel implementation will continue to work even in the event that nodes fail mid-computation. As clusters become larger (and cheaper), and parallel computing becomes the norm, this feature will become especially important. The Neighbourhood algorithm has also been re-implemented in the Java programming language, to take advantage of Java's cross-platform portability. This means that the Neighbourhood Algorithm can now be distributed in pre-compiled form, and run on any platform without the need for recompilation. In addition, the Java implementation of the Neighbourhood algorithm can now be easily distributed with the CADI Toolkit, and benefit from the point-and-click interface and other plotting and visualisation capabilities of the CADI toolkit. We plan to make both Fortran and Java implementations of the parallel Neighbourhood Algorithm available from the NA home page. Future development may involve some re-engineering of the second stage of the Neighbourhood Algorithm (the appraisal stage) for more efficient parallel execution. 1 This may happen in varying circumstances. In some cases, the time taken to evaluate the objective function is parameter dependent. In other cases, the computers performing the computation are not equivalent (in terms of speed), or have varying loads. 2 Details of which can be found in: |