
N.B. This document is no longer maintained; please visit our wiki.

Parallelisation: NPAR, NCORE, LPLANE, and the KPAR-tag

VASP currently offers parallelisation (and data distribution) over bands, parallelisation (and data distribution) over plane wave coefficients (see also Section 4), and, as of VASP.5.3.2, parallelisation over $ k$-points (without data distribution).

To obtain high efficiency on massively parallel systems or modern multi-core machines, it is strongly recommended to use all three levels of parallelisation at the same time. Most algorithms work with any data distribution (except for the single-band conjugate gradient algorithm, which is considered obsolete).

NCORE is available from VASP.5.2.13 on and is more convenient than the older parameter NPAR. The user should specify either NCORE or NPAR; if both are given, NPAR takes precedence. The two parameters are related by

$\displaystyle {\tt NCORE} = \mbox{total number of cores} / {\tt NPAR}$

NCORE determines how many cores work on one orbital. Its value is also printed at the beginning of the OUTCAR file. The current default is NCORE=1, meaning that each orbital is treated by a single core; NPAR is then set to the total number of cores. If NCORE equals the total number of cores, NPAR is set to 1. This implies distribution over plane wave coefficients only: all cores work on every individual band, with the plane wave coefficients distributed over all cores. This is usually very slow and should be avoided.
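The relation between NCORE and NPAR, the precedence rule, and the default can be summarised in a few lines. The following is a minimal sketch, not VASP source code: the helper name resolve_ncore_npar is hypothetical, and rejecting values that do not divide the core count evenly is a simplifying assumption made here for illustration.

```python
def resolve_ncore_npar(total_cores, ncore=None, npar=None):
    """Return (ncore, npar) using NCORE = total_cores / NPAR.

    Hypothetical helper for illustration only. NPAR takes precedence
    when both are given; if neither is given, NCORE defaults to 1.
    Requiring exact divisibility is an assumption of this sketch.
    """
    if npar is not None:
        # NPAR was specified: it wins, and NCORE follows from the relation.
        if total_cores % npar != 0:
            raise ValueError("NPAR should divide the total number of cores")
        return total_cores // npar, npar
    if ncore is None:
        ncore = 1  # default: one core per orbital, so NPAR = total cores
    if total_cores % ncore != 0:
        raise ValueError("NCORE should divide the total number of cores")
    return ncore, total_cores // ncore
```

For 64 cores, the default gives (1, 64); setting NCORE=64 gives (64, 1), the slow plane-wave-only distribution; and passing both NCORE=4 and NPAR=8 yields (8, 8), because NPAR takes precedence.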

NCORE=1 is the optimal setting for platforms with a small communication bandwidth and is a good choice for up to cores, as well as for machines with a single core per node and a Gigabit network. However, this mode substantially increases the memory requirements, because the non-local projector functions must be stored entirely on each core. In addition, substantial all-to-all communication is required to orthogonalize the bands. On massively parallel systems and modern multi-core machines we strongly recommend setting

$\displaystyle {\tt NPAR} \approx \sqrt{\mbox{number of cores}}$

or

$\displaystyle {\tt NPAR} =$ number of cores per compute node

In selected cases, we found that this improves the performance by a factor of up to four compared to the default, and it also significantly improves the stability of the code due to reduced memory requirements.
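Because the recommendation ${\tt NPAR} \approx \sqrt{\mbox{number of cores}}$ is only approximate, one practical reading is to take the divisor of the core count closest to the square root, so that NCORE = cores/NPAR stays an integer. The sketch below assumes that reading; the function name suggest_npar and the tie-breaking towards the smaller divisor are hypothetical choices, not part of VASP.

```python
import math


def suggest_npar(total_cores):
    """Pick the divisor of total_cores closest to sqrt(total_cores).

    Hypothetical helper: restricting NPAR to divisors (so that NCORE is
    an integer) and breaking ties towards the smaller value are
    assumptions of this sketch.
    """
    target = math.sqrt(total_cores)
    divisors = [d for d in range(1, total_cores + 1) if total_cores % d == 0]
    # Sort key: distance to sqrt(cores) first, then the divisor itself.
    return min(divisors, key=lambda d: (abs(d - target), d))
```

For example, 64 cores give NPAR = 8 (and hence NCORE = 8), while 96 cores, whose square root is about 9.8, also give NPAR = 8 because 8 is the nearest divisor.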

The second switch that influences the data distribution is LPLANE. If LPLANE is set to .TRUE. in the INCAR file, the data distribution in real space is done plane-wise. Any combination of NPAR and LPLANE can be used. Generally, LPLANE=.TRUE. reduces the communication bandwidth during the FFTs, but at the same time it unfortunately worsens the load balancing on massively parallel machines. LPLANE=.TRUE. should only be used if NGZ is at least 3*(number of nodes)/NPAR, and optimal load balancing is achieved if NGZ=n*NPAR, where n is an arbitrary integer. If LPLANE=.TRUE. and the real space projector functions are used (LREAL=.TRUE., ON, or AUTO), it might be necessary to check the following lines:

 real space projector functions
  total allocation   :
  max/ min on nodes  :
The max/min values should not differ too much; otherwise, the load balancing may worsen as well.
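The two LPLANE rules of thumb (NGZ of at least 3*(number of nodes)/NPAR, and NGZ a multiple of NPAR for optimal load balancing) can be checked before a run. This is a minimal sketch under the assumption that NGZ, the number of nodes, and NPAR are already known; the function name lplane_advice is hypothetical, and the check does not replace inspecting the max/min allocation lines.

```python
def lplane_advice(ngz, nodes, npar):
    """Check the LPLANE rules of thumb for given NGZ, nodes and NPAR.

    Hypothetical helper. Returns (usable, balanced):
      usable   -- NGZ >= 3 * nodes / NPAR (LPLANE=.TRUE. is sensible)
      balanced -- NGZ is an integer multiple of NPAR (optimal balancing)
    """
    # Multiply instead of dividing to avoid floating-point rounding.
    usable = ngz * npar >= 3 * nodes
    balanced = ngz % npar == 0
    return usable, balanced
```

With NGZ=48, 16 nodes, and NPAR=4 both conditions hold, whereas NGZ=30 with the same setup is usable but not optimally balanced, since 30 is not a multiple of 4.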

The optimum settings for NPAR and LPLANE depend very much on the type of machine you are using. Results for some selected machines can be found in Sec. 3.10. Recommended setups:

KPAR is the number of $ k$-points that are treated in parallel (available as of VASP.5.3.2). The set of $ k$-points is distributed over KPAR groups of compute cores in a round-robin fashion. This means that $ N = (\mbox{number of cores})/{\tt KPAR}$ compute cores work together on an individual $ k$-point (choose KPAR such that it is an integer divisor of the total number of cores). Within each group of $ N$ cores that share the work on an individual $ k$-point, the usual parallelism over bands and/or plane wave coefficients applies (see above).

Note: the data is not distributed additionally over $ k$-points.
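The round-robin assignment of $ k$-points to KPAR groups can be illustrated as follows. This is a minimal sketch, not VASP's internal bookkeeping: the function name kpar_layout is hypothetical, and numbering the cores of each group contiguously is an assumption made for readability. It only shows which group handles which $ k$-point; it says nothing about memory, since the data are not distributed over $ k$-points.

```python
def kpar_layout(n_kpoints, total_cores, kpar):
    """Sketch of the k-point distribution over KPAR groups of cores.

    Hypothetical helper. k-point i is assigned to group i % KPAR
    (round robin); each group holds N = total_cores / KPAR cores.
    Contiguous core numbering within a group is an assumption.
    """
    if total_cores % kpar != 0:
        raise ValueError("KPAR should be an integer divisor of the number of cores")
    n = total_cores // kpar  # cores sharing the work on one k-point
    layout = []
    for g in range(kpar):
        cores = list(range(g * n, (g + 1) * n))
        kpoints = list(range(g, n_kpoints, kpar))  # every KPAR-th k-point
        layout.append((cores, kpoints))
    return layout
```

With 5 $ k$-points, 8 cores, and KPAR=2, the first group (cores 0 to 3) handles $ k$-points 0, 2, and 4, and the second group (cores 4 to 7) handles $ k$-points 1 and 3.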

N.B. Requests for support are to be addressed to: