VASP currently offers parallelization (and data distribution) over bands and over plane wave coefficients (see also Section 4). To obtain high efficiency on massively parallel systems or modern multi-core machines, it is strongly recommended to use both at the same time. Most algorithms work with any data distribution (except for the single-band conjugate gradient algorithm, which is considered obsolete).
NCORE is available from VASP.5.2.13 on, and is more convenient than the previous parameter NPAR. The user should specify either NCORE or NPAR; if both are given, NPAR takes precedence. The relation between the two parameters is
    NCORE = (total number of cores) / NPAR
NCORE determines how many cores work on one orbital. The value is also printed at the beginning of the OUTCAR file. The current default is NCORE=1, implying that one orbital is treated by one core. NPAR is then set to the total number of cores. If NCORE equals the total number of cores, NPAR is set to 1. This implies distribution over plane wave coefficients only: all cores will work on every individual band, by distributing the plane wave coefficients over all cores. This is usually very slow and should be avoided.
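The bookkeeping between NCORE and NPAR can be sketched in a few lines of Python. This is only an illustration, not part of VASP: the function name `npar_from_ncore` is hypothetical, and the sketch assumes NPAR = (total number of cores) / NCORE, which is consistent with the two limiting cases described above (NCORE=1 gives NPAR equal to the total number of cores, and NCORE equal to the total number of cores gives NPAR=1). It additionally assumes that only even splits are of interest.

```python
def npar_from_ncore(total_cores: int, ncore: int) -> int:
    """Return the NPAR implied by NCORE, assuming NPAR = total_cores / NCORE."""
    if total_cores < 1 or ncore < 1:
        raise ValueError("total_cores and ncore must be positive integers")
    if total_cores % ncore != 0:
        # Assumption of this sketch: only splits without remainder are considered.
        raise ValueError("ncore must divide total_cores")
    return total_cores // ncore


if __name__ == "__main__":
    # Illustrative run on a hypothetical 32-core job.
    for ncore in (1, 4, 8, 32):
        print(f"NCORE = {ncore:2d}  ->  NPAR = {npar_from_ncore(32, ncore)}")
```

With 32 cores, NCORE=1 corresponds to pure band distribution (NPAR=32), while NCORE=32 corresponds to distribution over plane wave coefficients only (NPAR=1), the case the text warns against.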
NCORE=1 is the optimal setting for platforms with a small communication bandwidth and is a good choice for runs on a small number of cores, as well as for machines with a single core per node and a Gigabit network. However, this mode substantially increases the memory requirements, because the non-local projector functions must be stored entirely on each core. In addition, substantial all-to-all communication is required to orthogonalize the bands.
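On massively parallel systems and modern multi-core machines we strongly urge setting NCORE or NPAR explicitly; concrete values are given under the recommended setups later in this section.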
The second switch that influences the data distribution is LPLANE. If LPLANE is set to .TRUE. in the INCAR file, the data distribution in real space is done plane-wise. Any combination of NPAR and LPLANE can be used. Generally, LPLANE=.TRUE. reduces the communication bandwidth required during the FFTs, but at the same time it unfortunately worsens the load balancing on massively parallel machines. LPLANE=.TRUE. should only be used if NGZ is at least 3*(number of nodes)/NPAR, and optimal load balancing is achieved if NGZ=n*NPAR, where n is an arbitrary integer. If LPLANE=.TRUE. and the real space projector functions (LREAL=.TRUE. or ON or AUTO) are used, it might be necessary to check the lines following
    real space projector functions
    total allocation :
    max/ min on nodes :
The max/ min values should not differ too much, otherwise the load balancing might worsen as well.
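The two rules of thumb for LPLANE=.TRUE. stated above (NGZ at least 3*(number of nodes)/NPAR, and NGZ=n*NPAR for optimal load balancing) are easy to check before submitting a job. The following Python sketch does only that; the function name `lplane_check` and the returned keys are hypothetical, and the values of NGZ, the number of nodes and NPAR must be supplied by the user.

```python
def lplane_check(ngz: int, nodes: int, npar: int) -> dict:
    """Evaluate the two LPLANE=.TRUE. rules of thumb quoted in the text."""
    if min(ngz, nodes, npar) < 1:
        raise ValueError("ngz, nodes and npar must be positive integers")
    return {
        # NGZ >= 3*nodes/NPAR, written without division to stay in integers.
        "advisable": ngz * npar >= 3 * nodes,
        # NGZ = n*NPAR for some integer n.
        "balanced": ngz % npar == 0,
    }


if __name__ == "__main__":
    # Hypothetical example: NGZ=48 on 64 nodes with NPAR=4.
    print(lplane_check(ngz=48, nodes=64, npar=4))
```

A result with "advisable" False suggests trying LPLANE=.FALSE.; "balanced" False only indicates that the load balancing is not optimal.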
The optimum settings for NPAR and LPLANE depend very much on the type of machine you are using. Results for some selected machines can be found in Sec. 3.10. Recommended setups:
On a LINUX cluster with multi-core machines linked by a fast network we recommend setting
    LPLANE = .TRUE.
    NCORE  = number of cores per node (e.g. 4 or 8)
    LSCALU = .FALSE.
    NSIM   = 4
If very many nodes are used, it might be necessary to set LPLANE = .FALSE., but usually this offers very little advantage. For long runs (e.g. molecular dynamics), we recommend optimizing NPAR by trying short runs with different settings.
On a cluster linked by a relatively slow network, or with a single core per node, we recommend setting
    LPLANE = .TRUE.
    NCORE  = 1
    LSCALU = .FALSE.
    NSIM   = 4
Note that at least a 100 Mbit full-duplex network, with a fast switch offering at least 2 Gbit switch capacity, is needed to obtain useful speedups. Multi-core machines should always be linked by InfiniBand, since a Gbit network is too slow for multi-core machines.
On many massively parallel machines one is forced to use a huge number of nodes. In this case load balancing problems and problems with the communication bandwidth are likely to be experienced. In addition, the local memory is fairly small on some massively parallel machines; too small to keep the real space projectors in the cache with any setting. Therefore, we recommend setting NPAR on these machines to approximately sqrt(number of nodes) (explicit timing can be helpful to find the optimum value). The use of LPLANE=.TRUE. is only recommended if the number of nodes is significantly smaller than NGX, NGY and NGZ.
In summary, the following setting is recommended
    LPLANE = .FALSE.
    NPAR   = sqrt(number of nodes)
    NSIM   = 1
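
Since sqrt(number of nodes) is rarely an integer, a starting value for NPAR has to be chosen near it. The Python sketch below picks the divisor of the node count closest to the square root. Restricting NPAR to divisors of the node count is an assumption of this sketch, not a rule stated here, and the function name `suggest_npar` is hypothetical. The result is meant only as a first guess, to be refined by explicit timing.

```python
import math


def suggest_npar(nodes: int) -> int:
    """Return the divisor of `nodes` closest to sqrt(nodes).

    Assumption of this sketch: NPAR should divide the node count evenly.
    Ties are resolved in favour of the smaller divisor.
    """
    if nodes < 1:
        raise ValueError("nodes must be a positive integer")
    target = math.sqrt(nodes)
    divisors = [d for d in range(1, nodes + 1) if nodes % d == 0]
    return min(divisors, key=lambda d: (abs(d - target), d))


if __name__ == "__main__":
    # Print a starting value of NPAR for a few hypothetical node counts.
    for nodes in (64, 128, 512):
        print(f"{nodes:4d} nodes: NPAR = {suggest_npar(nodes)}")
```

For node counts with few divisors (e.g. a prime number of nodes) the suggestion degenerates to 1, in which case a different node count or a manually chosen value is the better option.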