
3.7 Performance optimization of VASP


The most critical factor for good VASP performance is a good BLAS package. BLAS can be retrieved from many public domain servers, and most machine suppliers also offer a highly optimized BLAS package. BLAS routines are, for instance, part of the following libraries:

  libessl (on IBM)
  libdxml (on DEC ALPHA)
  libblas (available from SGI)
These packages reach peak performance on most machines (240-400 Mflops). Whenever possible, one should buy these routines from the manufacturer of the machine. As an alternative, one can install the public domain version, but this might slow down VASP by a factor of 2-3 for very large systems. We want to stress that using DO loops instead of the optimized BLAS routines would result in a similar performance drop.
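As a rough illustration of how the vendor libraries listed above might be selected at link time, the following sketch maps a system name to a linker flag. The function name blas_link_flag and the uname strings (AIX, OSF1, IRIX) are assumptions introduced here and are not part of the VASP makefiles. The flags simply follow the usual convention that libNAME is linked with -lNAME.

```shell
# Hypothetical helper: choose the linker flag for the vendor BLAS library.
# The system names are assumptions about what `uname -s` reports on each
# platform; adjust them to match the actual machine.
blas_link_flag() {
    system=${1:-$(uname -s)}
    case "$system" in
        AIX)   echo "-lessl" ;;  # libessl on IBM
        OSF1)  echo "-ldxml" ;;  # libdxml on DEC ALPHA
        IRIX*) echo "-lblas" ;;  # libblas from SGI
        *)     echo "-lblas" ;;  # fallback: any installed libblas,
                                 # possibly the slower public domain build
    esac
}
```

Calling blas_link_flag without an argument uses the name reported by the current system. The resulting flag would then be added to the library list in the makefile by hand.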

If possible, an optimized LAPACK should also be installed, although this is of less importance for good performance. All required LAPACK routines are also contained in vasp.lib. If no optimized LAPACK routines are available, it is often possible to improve performance slightly by specifying -DNOZTRMM (see section 3.5.4) in the makefile. One can test this using a large test system (for instance bench.Hg.tar) and running with IALGO=-1. The only timing influenced is ORTHCH.

The performance of the FFT routines is also of considerable importance. VASP is supplied with routines written and optimized by J. Furthmüller (a version of Swarztrauber's multiple sequence FFT, supporting radices 2, 3, 4, 5 and 7). On most machines these routines outperform the manufacturer-supplied routines (for instance on CRAY C90, SGI, DEC). It is possible to optimize these routines by supplying an additional flag, -DCACHE_SIZE, to the precompiler.

The following values resulted in optimal performance:
 IBM       -DCACHE_SIZE=32768
 T3D       -DCACHE_SIZE=8000
 DEC ev5   -DCACHE_SIZE=8000
After changing CACHE_SIZE in the makefile, fft3dfurth.F must be touched
 touch fft3dfurth.F
and VASP recompiled. On vector computers, CACHE_SIZE should be set to 0. (Note that versions older than March 14, 1997 require CACHE_SIZE to be set directly in fft3dfurth.F.) One can also try increasing the optimization level for these routines, but in our tests we have never found a significant performance improvement.
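The edit-and-touch step can be sketched as a small shell function. This is only an illustration: the name set_cache_size is hypothetical, and it assumes that the makefile already contains a -DCACHE_SIZE=<number> token and that fft3dfurth.F lives in the same directory.

```shell
# Hypothetical helper: set -DCACHE_SIZE in a makefile and touch the FFT
# source so that the next build recompiles it.
# Assumptions: the makefile already contains a -DCACHE_SIZE=<number> token,
# and fft3dfurth.F sits in the same directory as the makefile.
set_cache_size() {
    size=$1
    dir=${2:-.}
    # Accept only a non-negative integer (0 is valid, e.g. on vector machines).
    case "$size" in
        ''|*[!0-9]*) echo "usage: set_cache_size SIZE [DIR]" >&2; return 1 ;;
    esac
    # Stop if there is no existing token to replace.
    grep -q -e '-DCACHE_SIZE=[0-9][0-9]*' "$dir/makefile" || {
        echo "no -DCACHE_SIZE=<number> found in $dir/makefile" >&2
        return 1
    }
    # Rewrite via a temporary file to avoid non-portable in-place sed options.
    sed "s/-DCACHE_SIZE=[0-9][0-9]*/-DCACHE_SIZE=$size/g" "$dir/makefile" \
        > "$dir/makefile.tmp" &&
        mv "$dir/makefile.tmp" "$dir/makefile" &&
        touch "$dir/fft3dfurth.F"
}
```

For example, set_cache_size 32768 in the build directory, followed by the usual make invocation, corresponds to the IBM setting listed above.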

A few remaining routines might benefit from higher optimization; the most important are nonl.F and nonlr.F. These routines can be tested with bench.Hg.tar and IALGO=-1. For LREAL=.TRUE. the timings for RPRO and RACC (nonlr.F) are affected, whereas for LREAL=.FALSE. the timings for VNLACC and PROJ (nonl.F) are affected.
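To compare such timings between two builds, one might pull the relevant lines out of the OUTCAR file of each run. The sketch below is a minimal example: the function name show_timings is hypothetical, and it assumes that each timing line carries the routine label followed by a colon (for example "RPRO:"), which should be checked against the actual output.

```shell
# Hypothetical helper: print the timing lines for selected routines.
# Assumption: each timing line contains the routine label followed by a
# colon, either at the start of the line or after whitespace.
show_timings() {
    file=$1
    shift
    for label in "$@"; do
        # Require a line start or whitespace before the label so that
        # a label never matches as the tail of a longer routine name.
        grep -E "(^|[[:space:]])${label}[[:space:]]*:" "$file"
    done
}
```

With LREAL=.TRUE. one would call show_timings OUTCAR RPRO RACC, and with LREAL=.FALSE. show_timings OUTCAR VNLACC PROJ, once for each build being compared.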

On Linux PCs with f2c (and g77) one should always specify -malign-double. However, not all f2c front-ends understand this option (fort77, for instance, does). Note that combining NAG f90 with -malign-double results in corrupt code.


Mon Mar 29 10:38:29 MEST 1999