
N.B. This document is no longer maintained, please visit our wiki.


If specified, VASP will use scaLAPACK instead of LAPACK for the LU decomposition (timing ORTHCH) and diagonalisation (timing SUBROT) of the subspace matrix ($ N_{\rm bands} \times N_{\rm bands}$). These operations are very fast in the serial version ($ 2 \%$) but become a bottleneck on massively parallel machines for systems with many electrons. If scaLAPACK is installed on a massively parallel machine (T3E, SGI, IBM SPX), use this switch. scaLAPACK can be used on the T3E starting from programming environment ( does for instance not offer the required routines). On the T3D (but not T3E) the additional switch

must be specified, at least for the scaLAPACK version we have tested (the T3D scaLAPACK is not compatible with the standard scaLAPACK routines).
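
The remark that a step costing only 2 % in the serial version can become a bottleneck on a massively parallel machine follows from Amdahl's law. The sketch below is an illustration under stated assumptions, not a measurement of VASP: it reads the 2 % as a fraction of the total serial run time, assumes this part is not parallelised at all (the plain LAPACK case), and assumes the remaining 98 % scales perfectly with the number of processors. The function names are hypothetical.

```python
def amdahl_speedup(serial_fraction: float, n_procs: int) -> float:
    """Ideal speedup when only (1 - serial_fraction) of the work scales.

    Assumption: the serial part takes the same time on any number of
    processors and the rest is divided evenly among n_procs.
    """
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / n_procs)


def serial_share(serial_fraction: float, n_procs: int) -> float:
    """Share of the parallel wall time spent in the non-parallelised part."""
    parallel_time = serial_fraction + (1.0 - serial_fraction) / n_procs
    return serial_fraction / parallel_time


if __name__ == "__main__":
    # 0.02 is the 2 % quoted in the text, read as a run-time fraction.
    for n_procs in (1, 16, 64, 256):
        print(
            f"{n_procs:4d} procs: "
            f"speedup {amdahl_speedup(0.02, n_procs):6.2f}, "
            f"serial share {serial_share(0.02, n_procs):6.1%}"
        )
```

Under these assumptions the speedup can never exceed 1/0.02 = 50, and on 64 processors more than half of the wall time is already spent in the part that was negligible in the serial run. Distributing the subspace operations with scaLAPACK is meant to remove this limit.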

On slow networks and PC clusters (100 Mbit Ethernet and even 1 Gbit Ethernet), using scaLAPACK is not recommended: performance improvements are small, or scaLAPACK is even slower than LAPACK. If you still want to give it a try, please download the required source files from Compilation is fairly straightforward, but requires familiarity with MPI, Fortran, C and UNIX makefiles (always make sure that the underlying BLACS routines are working correctly!).

scaLAPACK can be switched off at runtime by specifying

in the INCAR file. Use this as a fallback when you encounter problems with scaLAPACK. Furthermore, in some cases the LU decomposition (timing ORTHCH) based on scaLAPACK is slower than the serial LU decomposition. Hence it is also possible to switch off the parallel LU decomposition by specifying
in the INCAR file (the subspace rotation is still done with scaLAPACK in this case).

N.B. Requests for support are to be addressed to: