N.B. This document is no longer maintained; please visit our wiki.
Performance of parallel code on various machines
For historical reasons, we show the scaling of the VASP.4 code on the T3D.
The system is liquid iron (l-Fe) in a cell containing 64 atoms. Only the Γ point
was used, the number of plane waves was 12500, and the number of
included bands was 384.
The main problem with the current algorithm is the subspace
rotation. Subspace rotation requires the diagonalization of
a relatively small matrix (in this case 384 × 384), and
this step scales badly on a massively parallel
machine. VASP currently uses either scaLAPACK or a fast
Jacobi matrix diagonalization scheme written by Ian Bush (T3D and T3E only). On 64
nodes, the Jacobi scheme requires around 1 s to diagonalize the matrix,
but increasing the number of nodes does not improve the timing.
scaLAPACK requires at least 2 s, and it already reaches this performance
with 16 nodes.
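To illustrate what a Jacobi diagonalization does, here is a minimal serial sketch of the classical cyclic Jacobi method for a real symmetric matrix (the subspace matrix in VASP is in general Hermitian). It is not Ian Bush's parallel T3D/T3E code, and the function name `jacobi_eigh` and its parameters are hypothetical. Each plane rotation zeroes one off-diagonal pair (p, q) at O(n) cost, so one sweep over all n(n-1)/2 pairs costs O(n^3). Rotations acting on disjoint index pairs are independent, which is what parallel Jacobi variants exploit.

```python
import math


def jacobi_eigh(a, tol=1e-12, max_sweeps=100):
    """Cyclic Jacobi sketch for a real symmetric matrix (list of lists).

    The input is copied, not modified. Returns (eigenvalues, v), where
    column j of v is the normalized eigenvector for eigenvalues[j].
    """
    n = len(a)
    a = [[float(x) for x in row] for row in a]
    v = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    scale = math.sqrt(sum(x * x for row in a for x in row)) or 1.0
    for _ in range(max_sweeps):
        # Frobenius norm of the off-diagonal part measures convergence.
        off = math.sqrt(sum(a[i][j] ** 2
                            for i in range(n) for j in range(n) if i != j))
        if off <= tol * scale:
            break
        # One sweep: zero every off-diagonal pair (p, q) once.
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                tau = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                # Smaller-magnitude root of t^2 + 2*tau*t - 1 = 0 (stable).
                t = math.copysign(1.0, tau) / (abs(tau) + math.sqrt(1.0 + tau * tau))
                c = 1.0 / math.sqrt(1.0 + t * t)
                s = t * c
                # A <- A J: rotate columns p and q.
                for k in range(n):
                    akp, akq = a[k][p], a[k][q]
                    a[k][p] = c * akp - s * akq
                    a[k][q] = s * akp + c * akq
                # A <- J^T A: rotate rows p and q.
                for k in range(n):
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k] = c * apk - s * aqk
                    a[q][k] = s * apk + c * aqk
                # V <- V J: accumulate the rotation into the eigenvectors.
                for k in range(n):
                    vkp, vkq = v[k][p], v[k][q]
                    v[k][p] = c * vkp - s * vkq
                    v[k][q] = s * vkp + c * vkq
    return [a[i][i] for i in range(n)], v
```

For example, `jacobi_eigh([[2.0, 1.0], [1.0, 2.0]])` returns the eigenvalues 1 and 3 (in the order they appear on the diagonal) after a single rotation.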
Scaling for a 256-atom Al system.
Fig. 2 shows a more representative result on an SGI 2000 for 256 Al
atoms: up to 32 nodes, an efficiency of 0.8 is found.
A similar efficiency can be expected on most current architectures
with high communication bandwidth (Infiniband, Myrinet, SGI, etc.).
On a Gigabit Ethernet based cluster, you can expect an efficiency of up to 75 %
for up to 16 to 32 cores.
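The efficiencies quoted here are not defined explicitly in this section. A minimal sketch, assuming the usual definition: the achieved speedup T_ref / T_N divided by the ideal speedup N / N_ref, relative to a reference run on N_ref nodes or cores. The function name `parallel_efficiency` is hypothetical.

```python
def parallel_efficiency(t_ref, n_ref, t_n, n):
    """Efficiency of an n-core run relative to an n_ref-core reference run.

    Assumes E = (n_ref * t_ref) / (n * t_n), i.e. the achieved speedup
    t_ref / t_n divided by the ideal speedup n / n_ref.
    """
    if min(t_ref, t_n) <= 0 or min(n_ref, n) <= 0:
        raise ValueError("times and core counts must be positive")
    return (n_ref * t_ref) / (n * t_n)


# Hypothetical timings, chosen only to illustrate an efficiency of 0.8:
# 800 s on 1 node and 31.25 s on 32 nodes.
print(parallel_efficiency(800.0, 1, 31.25, 32))
```

With these made-up timings the ideal 32-node time would be 25 s, so a measured 31.25 s corresponds to an efficiency of 0.8.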
Scaling of bench.PdO on a PC cluster with Gigabit Ethernet.
The final figure, Fig. 3, shows the scaling on an in-house, state-of-the-art machine built by SGI (narwal).
The nodes are linked by a QDR Infiniband switch, and each node consists
of 8 cores (two Intel(R) Xeon(R) E5540 CPUs at 2.53 GHz).
In this case, the RMM-DIIS algorithm shows a very good parallel efficiency
of 65 % from 16 to 256 cores. For the Davidson algorithm, the
parallel efficiency from 16 to 256 cores is only roughly 50 %.
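Assuming the efficiency E is taken relative to the 16-core run, the achieved speedup is E times the ideal factor 256/16 = 16, so the quoted figures correspond to roughly a tenfold (RMM-DIIS) and an eightfold (Davidson) reduction in wall time:

```latex
S = \frac{T_{16}}{T_{256}} = E \cdot \frac{256}{16}, \qquad
S_{\text{RMM-DIIS}} = 0.65 \times 16 = 10.4, \qquad
S_{\text{Davidson}} \approx 0.50 \times 16 = 8
```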
Scaling for a 512-atom GaAs system. The Γ-point-only
version was used, and the total number of filled bands is 1024.
The default plane-wave cutoff of 208 eV was used. Other VASP settings
are PREC = A; ISYM = 0; NELMDL = 5; NELM = 8; LREAL = A.
The left panel shows the timing for RMM-DIIS (ALGO = V), the right panel for
Davidson (ALGO = N).
The time for the 7th SCF step is reported.
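The settings listed for the GaAs benchmark can be collected into one INCAR file per panel. A minimal sketch, assuming all other tags (including the cutoff) stay at their defaults; the helper `make_incar` and the `COMMON_TAGS` dictionary are hypothetical, and only the tag values come from the text.

```python
# Tags shared by both GaAs benchmark runs (values taken from the text).
COMMON_TAGS = {
    "PREC": "A",    # Accurate
    "ISYM": 0,      # symmetry switched off
    "NELMDL": 5,    # non-self-consistent steps at the start
    "NELM": 8,      # maximum number of electronic (SCF) steps
    "LREAL": "A",   # projection in real space, Auto
}


def make_incar(algo):
    """Return INCAR text for one panel: 'V' (RMM-DIIS) or 'N' (Davidson)."""
    if algo not in ("V", "N"):
        raise ValueError("algo must be 'V' or 'N'")
    tags = dict(COMMON_TAGS, ALGO=algo)   # ALGO is appended last
    return "".join(f"{key} = {value}\n" for key, value in tags.items())


print(make_incar("N"), end="")
```

Keeping the shared tags in one place ensures the two panels differ only in ALGO, which is the comparison the figure is meant to show.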
N.B. Requests for support are to be addressed to: email@example.com