LBM: Optimized Implementations of the Lattice Boltzmann Method in 3D
Lattice Boltzmann Methods (LBM) are popular for the numerical simulation of incompressible flows. This project aims to investigate and optimize simple lattice Boltzmann kernels for different architectures. This includes both commodity "off-the-shelf" architectures and tailored HPC systems, such as vector computers. We cover modern 64-bit processors ranging from IA32-compatible (Intel Xeon/Nocona, AMD Opteron), superscalar RISC (IBM Power4) and IA64 (Intel Itanium 2) to classical vector (NEC SX6) and novel vector (Cray X1) architectures.
In the course of this project, we supervised the Bachelor Theses of Stefan Donath and Johannes Habich and published several papers.
The Bachelor Thesis of Stefan Donath, as well as the first report (see section Papers below), deals with the influence of different memory layouts on the performance of simple lattice Boltzmann kernels. By reordering the data within the array used, it was possible to supersede standard cache-optimizing techniques like spatial blocking.
Stefan Donath himself presented his results at the SIAM Conference on Computational Science & Engineering 2005 in Orlando, Florida.
The parallelization and scaling behavior of LBM was examined in a second part of this project. Extensive experiments with both OpenMP and MPI on different contemporary Terascale architectures were carried out and published at, for example, the Supercomputing Conference 2004 and the Parallel CFD Conference 2005.
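The effect of reordering the array can be illustrated with a minimal sketch. It is not taken from the thesis: the function names, the grid size, and the choice of 19 distribution values per cell (as in a D3Q19 model) are assumptions made for illustration. The sketch compares two ways of mapping a cell (x, y, z) and a direction q to a position in one flat array.

```python
# Assumption: 19 distribution values per cell (e.g. a D3Q19 model).
Q = 19

def index_cell_major(x, y, z, q, nx, ny, nz):
    """All Q values of one cell are stored next to each other."""
    return ((z * ny + y) * nx + x) * Q + q

def index_direction_major(x, y, z, q, nx, ny, nz):
    """All cells of one direction q are stored next to each other."""
    return ((q * nz + z) * ny + y) * nx + x

# Hypothetical small grid, chosen only for the demonstration.
nx, ny, nz = 4, 3, 2

# Distance in memory between two x-neighbours for a fixed direction q.
stride_cell_major = (index_cell_major(1, 0, 0, 5, nx, ny, nz)
                     - index_cell_major(0, 0, 0, 5, nx, ny, nz))
stride_direction_major = (index_direction_major(1, 0, 0, 5, nx, ny, nz)
                          - index_direction_major(0, 0, 0, 5, nx, ny, nz))

print(stride_cell_major, stride_direction_major)
```

In the cell-major layout the values needed to update one cell are contiguous, whereas in the direction-major layout a sweep along x touches consecutive memory for each direction. Which layout performs better depends on the kernel and the architecture.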
Optimization and Application of 3D LBM for complex structures
In a further stage of the project we investigated optimization possibilities for an LBM code for complex structures. In cooperation with the Lattice Boltzmann Development Consortium, a data representation which stores only fluid cells, omitting obstacle data, was examined. The results using memory traversal by space-filling curves were presented at the ASIM Conference 2005.
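The idea of storing only fluid cells can be sketched as follows. This is a minimal illustration under stated assumptions, not the representation used in the project: the function name, the reduced set of seven directions, and the use of -1 for a missing neighbour are all hypothetical. Cells are numbered here in plain lexicographic order; a space-filling curve would only change the order in which fluid cells are visited and numbered.

```python
# Assumption: a reduced direction set (rest + 6 axis neighbours) for brevity.
DIRECTIONS = [(0, 0, 0), (1, 0, 0), (-1, 0, 0),
              (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]

def build_fluid_list(is_obstacle, nx, ny, nz):
    """Return (coords, neighbours) for the fluid cells only.

    is_obstacle is a flat list with x running fastest, then y, then z.
    coords[i] is the (x, y, z) position of fluid cell i.
    neighbours[i][d] is the list index of the fluid cell in direction d,
    or -1 if that position is an obstacle or lies outside the domain.
    """
    cell_id = {}
    coords = []
    for z in range(nz):
        for y in range(ny):
            for x in range(nx):
                if not is_obstacle[(z * ny + y) * nx + x]:
                    cell_id[(x, y, z)] = len(coords)
                    coords.append((x, y, z))
    neighbours = []
    for (x, y, z) in coords:
        row = [cell_id.get((x + dx, y + dy, z + dz), -1)
               for (dx, dy, dz) in DIRECTIONS]
        neighbours.append(row)
    return coords, neighbours

# A 3 x 1 x 1 channel whose middle cell is an obstacle.
coords, neighbours = build_fluid_list([False, True, False], 3, 1, 1)
print(coords)      # only the two fluid cells are stored
print(neighbours)
```

The memory footprint then scales with the number of fluid cells rather than with the bounding box, at the cost of one indirect access per neighbour.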
This part of the project was partially funded by the Bavarian Graduate School for Computational Engineering which is part of the Elitenetzwerk Bayern.
To account for the increasing complexity of continuous surfaces, the Bachelor Thesis of Johannes Habich implemented more advanced and accurate second-order boundary conditions. The influence of the additional calculations on performance, as well as that of different fluid-to-obstacle ratios, was investigated. This led to the implementation of a compressed list storage format, which was thoroughly tested for performance with different spatial blocking factors. A shared-memory parallelization was added to meet today's increased medium-grained parallelism.
Optimized GPU (Graphics Processing Unit) Implementations of the Lattice Boltzmann Method in 3D
Special-purpose accelerators have been an emerging topic in recent years. To evaluate the effort of implementing numerical kernels and the expected benefit, the Master Thesis of Johannes Habich implemented several benchmarks to gain hands-on knowledge about the initial implementation effort and optimization techniques on the currently available nVIDIA GeForce G80 GPU. The huge thread-level parallelism leads to a new way of parallel programming, which is supported by the nVIDIA CUDA framework. The well-known STREAM benchmark was implemented and demonstrated the potential of the memory subsystem. The implementation of a lattice Boltzmann fluid flow solver provided deep insights into pitfalls of the hardware and led to sophisticated optimization techniques which are generally applicable.
In cooperation with the Department of Computer Science 10 (Systemsimulation), a new kernel was derived which is better suited for deployment in an MPI-parallelized heterogeneous framework called widely applicable Lattice Boltzmann from Erlangen (waLBerla). An in-depth analysis of the computation and communication patterns led to a very efficient and fast solver, which is now being developed towards different kinds of applications, e.g. particulate flows. Compared with stand-alone solver development, a major concern is that different communication networks lead to inevitable performance drawbacks. Optimizing these communication stages is the most important part of the performance optimization.
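A measured memory bandwidth, such as a STREAM result, can be turned into a rough upper bound for a lattice Boltzmann kernel. The sketch below is an illustration only and rests on several assumptions not stated on this page: the kernel is purely memory-bound, every distribution value is loaded once and stored once per cell update, there are 19 values per cell, and the function names and the bandwidth figure are hypothetical.

```python
def bytes_per_cell_update(q=19, bytes_per_value=8, write_allocate=True):
    """Memory traffic for one cell update under the stated assumptions.

    Each of the q values is loaded once and stored once. On cache-based
    CPUs a store may first load the target cache line (write allocate),
    which adds another q values of traffic.
    """
    transfers = 2 * q + (q if write_allocate else 0)
    return transfers * bytes_per_value

def max_updates_per_second(bandwidth_bytes_per_s, **layout):
    """Bandwidth-limited upper bound on lattice cell updates per second."""
    return bandwidth_bytes_per_s / bytes_per_cell_update(**layout)

# Double precision with write allocate (typical CPU assumption).
cpu_bytes = bytes_per_cell_update(19, 8, True)
# Single precision without write allocate (an assumption for a GPU).
gpu_bytes = bytes_per_cell_update(19, 4, False)
print(cpu_bytes, gpu_bytes)

# Hypothetical bandwidth of 10 GB/s, chosen only as an example input.
bound = max_updates_per_second(10e9, q=19, bytes_per_value=8,
                               write_allocate=True)
print(round(bound / 1e6, 1), "million cell updates per second at most")
```

Comparing a measured kernel against such a bound indicates how much of the available bandwidth an implementation actually uses.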
This project is partially funded by KONWIHR (Competence Network for Technical, Scientific High Performance Computing in Bavaria).
Through cooperation with the Department of Computer Science 10 (Systemsimulation) and the Chair of Fluid Dynamics, we ensure that the project always stays as close as possible to engineering demands. Furthermore, we are working together with Peter Lammers at HLRS and Jörg Bernsdorf of the German Research School for Simulation Sciences GmbH.
This project is partially funded by SKALB (Lattice-Boltzmann-Methoden für skalierbare Multi-Physik-Anwendungen).
Infos & Talks
Gerhard Wellein, Thomas Zeiser, Stefan Donath, Georg Hager
On the Single Processor Performance of Simple Lattice Boltzmann Kernels
Computers & Fluids, 35:8-9 (2006) 910-919
Thomas Pohl, Nils Thürey, Frank Deserno, Ulrich Rüde, Peter Lammers, Gerhard Wellein, Thomas Zeiser
Performance Evaluation of Parallel Large-Scale Lattice Boltzmann Applications on Three Supercomputing Architectures
accepted for Supercomputing Conference, 2004.
Peter Lammers, Gerhard Wellein, Thomas Zeiser, Georg Hager, Michael Breuer
Have the vectors the continuing ability to parry the attack of the killer micros?
accepted for Proceedings of the 2nd Teraflop-Workshop at HLRS, March 2005.
Gerhard Wellein, Thomas Zeiser, Peter Lammers, Uwe Küster
Towards Optimal Performance for Lattice Boltzmann Applications on Terascale Computers
accepted for Parallel CFD Conference, 2005.
Stefan Donath, Thomas Zeiser, Georg Hager, Johannes Habich, Gerhard Wellein
Optimizing Performance of the Lattice Boltzmann Method for Complex Structures on Cache-based Architectures
In Proceedings "Frontiers in Simulation: Simulationstechnique - 18th Symposium in Erlangen, September 2005 (ASIM)" (Editors: F. Hülsemann, M. Kowarschik, U. Rüde), SCS Publishing House, Erlangen, 2005, Pages 728-735.
Johannes Habich, Thomas Zeiser, Georg Hager, Gerhard Wellein
Speeding up a Lattice Boltzmann Kernel on nVIDIA GPUs
In Proceedings of the "First International Conference on Parallel, Distributed and Grid Computing for Engineering, April 2009, Pecs, Hungary, PARENG09-S01" (Editors: B.H.V. Topping and P. Ivanyi), Civil-Comp Press, Stirling, 2009.
Thomas Zeiser, Gerhard Wellein, Georg Hager, Stefan Donath, Frank Deserno, Peter Lammers, Monika Wierse
Optimized Lattice Boltzmann Kernels as Testbeds for Processor Performance
Prof. Ulrich Rüde, Dr. Gerhard Wellein, Dr. Thomas Zeiser, Dr. Georg Hager, Stefan Donath.
Prof. Ulrich Rüde, Dr. Gerhard Wellein, Thomas Zeiser, Georg Hager, Frank Deserno.
Prof. Ulrich Rüde, Dr. Gerhard Wellein, Thomas Zeiser, Georg Hager.
Optimization Approaches and Performance Characteristics of Lattice Boltzmann Kernels
invited talk, International Conference for Mesoscopic Methods in Engineering and Science, Braunschweig, July 28, 2004.
On Optimized Implementations of the Lattice Boltzmann Method on Contemporary High Performance Architectures
SIAM CSE05 Conference, Orlando, February 2005.
Architecture and Performance of Terascale Computers
International Conference on Parallel Computational Fluid Dynamics, Maryland, May 24-27, 2005.
Optimizing Performance of the Lattice Boltzmann Method for Complex Structures
ASIM Conference, Erlangen, September 2005.
Efficient implementations of simple lattice Boltzmann kernels
International Conference for Mesoscopic Methods in Engineering and Science (ICMMES) 2006, Hampton/Norfolk, July 24, 2006.