ACFDT/RPA calculations

Latest revision as of 12:52, 12 August 2019

The ACFDT-RPA groundstate energy ($E_{\rm RPA}$) is the sum of the ACFDT-RPA correlation energy $E_{\rm c}$ and the Hartree-Fock energy $E_{\rm EXX}$ evaluated non-self-consistently using DFT orbitals:

$E_{\rm RPA} = E_{\rm c} + E_{\rm EXX}$.

Note that here $E_{\rm EXX}$ also includes the Hartree energy, the kinetic energy, and the Ewald energy of the ions, whereas in the literature $E_{\rm EXX}$ often refers only to the exact exchange energy evaluated using DFT orbitals.

If ALGO=RPA is set in the INCAR file, VASP calculates the correlation energy in the random phase approximation. To this end, VASP calculates first the independent particle response function, using the virtual (unoccupied) states found in the WAVECAR file, and then determines the correlation energy using the plasmon fluctuation equation:

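$E_{\rm c} = \frac{1}{2\pi}\int_0^\infty {\rm Tr}\left\{\ln\left[1-\chi(i\omega)V\right] + \chi(i\omega)V\right\}\,d\omega$, where $\chi$ is the independent particle response function and $V$ the Coulomb kernel.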

More information about the theory behind the RPA is found here.

General Recipe to Calculate ACFDT-RPA Total Energies

As of VASP.6, an RPA energy calculation can be done in a single step using an INCAR file similar to the following:

 ALGO = RPAR # or ACFDTR 

Here it is important not to set NBANDS; otherwise VASP proceeds with the WAVECAR found in the directory (if none is present, random wavefunctions are used!).

Older versions of VASP cannot perform all required steps of an RPA calculation in a single run, so in practice four individual steps have to be carried out:

  • First step (a standard DFT run): All occupied orbitals (and as usual in VASP, a few unoccupied orbitals) of the DFT-Hamiltonian are calculated:
EDIFF = 1E-8
ISMEAR = 0 ; SIGMA = 0.05

This can be done with your favorite setup, but we recommend attaining very high precision (small EDIFF), using a small smearing width (SIGMA), and avoiding higher-order Methfessel-Paxton smearing (see also ISMEAR). We suggest using PBE orbitals as input for the ACFDT-RPA run, but other choices are possible as well, e.g. LDA or hybrid functionals such as HSE. For hybrid functionals, we suggest carefully considering the caveats mentioned in reference [1]; specifically, the RPA dielectric matrix yields significantly too weak screening for hybrid functionals, which deteriorates RPA results.


  • Second step: the Hartree-Fock energy is calculated using the predetermined DFT orbitals:
ALGO  = EIGENVAL ; NELM = 1
LWAVE=.FALSE.                  ! avoid accidental update of WAVECAR
LHFCALC = .TRUE. ; AEXX = 1.0  ! you may set ALDAC = 0.0 but the default is 1-AEXX
ISMEAR = 0 ; SIGMA = 0.05

For insulators and semiconductors with a sizable gap, faster convergence of the Hartree-Fock energy can be obtained by setting HFRCUT=-1, although this slows down k-point convergence for metals.


  • Third step: Search for the line "maximum number of plane-waves:" in the OUTCAR file of the first step, and run VASP again with the following INCAR file to determine all virtual states by an exact diagonalization of the Hamiltonian (DFT or hybrid; make certain to use the same Hamiltonian as in step 1):
NBANDS = maximum number of plane-waves (times 2 for gamma-only calculations)
ALGO = Exact    ! exact diagonalization
NELM = 1        ! one step suffices since WAVECAR is pre-converged
LOPTICS = .TRUE.
ISMEAR = 0 ; SIGMA = 0.05

For calculations using the gamma-point only version of VASP, NBANDS must be set to twice the value found after "maximum number of plane-waves:" in the OUTCAR file of step 1. For metals, we recommend not setting LOPTICS=.TRUE., since this slows down k-point convergence.


  • Fourth step: Calculate the ACFDT-RPA correlation energy:
NBANDS =  maximum number of plane-waves
ALGO = ACFDT
NOMEGA = 12     ! typically between 8 and 24
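
Steps three and four both need the number written after "maximum number of plane-waves:" in the OUTCAR file of step one. The following Python sketch shows one way to extract it. The helper name `nbands_from_outcar` and the sample excerpt (including the number 1139) are hypothetical, and the whitespace layout of the OUTCAR line is an assumption, so the pattern may need adjusting for a particular VASP version.

```python
import re


def nbands_from_outcar(outcar_text, gamma_only=False):
    """Return the NBANDS value for steps 3 and 4 from the OUTCAR text of step 1."""
    # Assumed layout: the label, a colon, then an integer on the same line.
    match = re.search(r"maximum number of plane-waves:\s*(\d+)", outcar_text)
    if match is None:
        raise ValueError("line 'maximum number of plane-waves:' not found")
    n_pw = int(match.group(1))
    # The gamma-point only version needs twice the number of plane waves.
    return 2 * n_pw if gamma_only else n_pw


# Hypothetical OUTCAR excerpt; the number is made up for illustration.
sample = " maximum number of plane-waves:      1139\n"
print("NBANDS =", nbands_from_outcar(sample))
print("NBANDS =", nbands_from_outcar(sample, gamma_only=True), "(gamma-only)")
```

In practice the text would be read from the OUTCAR file of the first step rather than from a string literal.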

Output analysis

The energy is calculated for 8 different cutoff energies and a linear regression is used to extrapolate the results to the infinite cutoff limit (see section below). A successful RPA calculation writes the following lines to the OUTCAR file:

      cutoff energy     smooth cutoff   RPA   correlation   Hartree contr. to MP2
---------------------------------------------------------------------------------
            316.767           316.767      -17.5265976349      -26.2640927215
            301.683           301.683      -17.3846505665      -26.0990489039
            287.317           287.317      -17.2429031341      -25.9344769084
            273.635           273.635      -17.0686574017      -25.7325162480
            260.605           260.605      -16.8914915810      -25.5277026697
            248.195           248.195      -16.7202601717      -25.3302982602
            236.376           236.376      -16.5559849344      -25.1415392478
            225.120           225.120      -16.3635400223      -24.9210737434
  linear regression    
  converged value                          -19.2585393615      -28.2627347266

Here the third and fourth columns correspond to the correlation energy (for that specific cutoff energy) in the RPA and in the direct MP2 approximation (the second-order term in RPA), respectively. The corresponding results of the linear regression are found in the line starting with "converged value".
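
The extrapolation can be checked by hand from the table. The Python sketch below assumes that the correlation energy is linear in the cutoff energy to the power -3/2 (equivalent to a 1/G^3 dependence on the largest reciprocal lattice vector) and that an ordinary least-squares fit over all eight cutoffs is used; the function name `extrapolate_correlation_energy` is hypothetical. Under these assumptions the intercept should land close to the "converged value" line.

```python
# (cutoff energy in eV, RPA correlation energy in eV), copied from the table above
data = [
    (316.767, -17.5265976349),
    (301.683, -17.3846505665),
    (287.317, -17.2429031341),
    (273.635, -17.0686574017),
    (260.605, -16.8914915810),
    (248.195, -16.7202601717),
    (236.376, -16.5559849344),
    (225.120, -16.3635400223),
]


def extrapolate_correlation_energy(points):
    """Least-squares fit E(cutoff) = E_inf + A * cutoff**-1.5; returns E_inf."""
    xs = [cutoff ** -1.5 for cutoff, _ in points]
    ys = [energy for _, energy in points]
    n = len(points)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    slope = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys)) / sum(
        (x - x_mean) ** 2 for x in xs
    )
    # The intercept is the value at infinite cutoff, where cutoff**-1.5 is zero.
    return y_mean - slope * x_mean


print("extrapolated RPA correlation energy (eV):", extrapolate_correlation_energy(data))
```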

Low scaling ACFDT/RPA algorithm

Virtually the same flags and procedures apply to the new low scaling RPA algorithm implemented in vasp.6.[2] However, ALGO=ACFDT or ALGO=RPA needs to be replaced by either ALGO=ACFDTR or ALGO=RPAR.

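With this setting VASP calculates the independent particle polarizability using Green's functions on the imaginary time axis by the contraction formula[3]

$\chi({\bf r},{\bf r}',i\tau_m) = -G({\bf r},{\bf r}',i\tau_m)\,G({\bf r}',{\bf r},-i\tau_m)$.

Subsequently a compressed Fourier transformation on the imaginary axes yields

$\chi({\bf r},{\bf r}',i\omega_n) = \sum_{m=1}^{N}\gamma_{nm}\,\chi({\bf r},{\bf r}',i\tau_m)$,

where $N$ is the number of grid points (NOMEGA) and $\gamma_{nm}$ is the Fourier matrix.

The remaining step is the evaluation of the correlation energy and is the same as described above.

Crucial to this approach is the accuracy of the Fourier transformation from $\chi(i\tau)$ to $\chi(i\omega)$, which in general depends on two factors.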

First, the grid order, which can be set by NOMEGA in the INCAR file. Here, similar choices as for the ACFDT algorithms are recommended. Second, the grid points and Fourier matrix have to be optimized for the same interval as spanned by all possible transition energies in the polarizability. The minimum (maximum) transition energy can be set with the OMEGAMIN (OMEGATL) tag and should be smaller (larger) than the band gap (maximum transition energy) of the previous DFT calculation. VASP determines these values automatically and writes them to the OUTCAR file after the line

Response functions by GG contraction: 

These values should be checked for consistency. Furthermore, we recommend inspecting the grid and transformation errors by looking for the following lines in the OUTCAR file:

nu_ 1=  0.1561030E+00 ERR=   0.6327933E-05 finished after   1 steps    
nu_ 2= ...
Maximum error of frequency grid:  0.3369591E-06

Every frequency point will have a similar line as shown above for the first point. The value after ERR= corresponds to the maximum Fourier transformation error and should be of similar order as the maximum integration error of the frequency grid.
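
This comparison can be scripted. The Python sketch below collects the ERR= values and the maximum error of the frequency grid from OUTCAR text. The function name `fourier_errors` is hypothetical, and the regular expressions assume exactly the line layout shown in the excerpt above.

```python
import re


def fourier_errors(outcar_text):
    """Return (ERR value per frequency point, maximum frequency grid error or None)."""
    # One "nu_ <index>= <frequency> ERR= <error> ..." line per frequency point.
    errs = [
        float(value)
        for value in re.findall(r"nu_\s*\d+=\s*\S+\s+ERR=\s*(\S+)", outcar_text)
    ]
    grid = re.search(r"Maximum error of frequency grid:\s*(\S+)", outcar_text)
    return errs, (float(grid.group(1)) if grid else None)


# The two complete lines from the excerpt above.
sample = (
    "nu_ 1=  0.1561030E+00 ERR=   0.6327933E-05 finished after   1 steps\n"
    "Maximum error of frequency grid:  0.3369591E-06\n"
)
errs, grid_err = fourier_errors(sample)
print("largest transformation error:", max(errs))
print("ratio to frequency grid error:", max(errs) / grid_err)
```

A ratio far above one would suggest that the Fourier transformation, rather than the frequency grid, limits the accuracy.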

Memory bottleneck and Parallelization

The cubic-scaling space-time RPA and GW algorithms require considerably more memory than the corresponding quartic-scaling implementations, because two Green's functions have to be stored in real space. To reduce the memory overhead, VASP on the one hand exploits Fast Fourier Transformations (FFT) to avoid storing the matrices on the (larger) real-space grid. The precision of the FFT can be selected with PRECFOCK, where the value Fast is usually sufficient.

On the other hand, the code avoids storage of redundant information, i.e. both the Green's function and polarizability matrices are distributed, as are the individual imaginary grid points. The distribution of the imaginary grid points can be set by hand with the NTAUPAR and NOMEGAPAR tags, which split the NOMEGA imaginary grid points into NTAUPAR time groups and NOMEGAPAR frequency groups. For this purpose both tags have to be divisors of NOMEGA.
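
As a quick illustration of the divisor constraint, the following Python sketch lists the group counts that are compatible with a given NOMEGA. The function name `valid_group_counts` is hypothetical and not part of VASP.

```python
def valid_group_counts(nomega):
    """Return all values of NTAUPAR or NOMEGAPAR that divide NOMEGA evenly."""
    return [n for n in range(1, nomega + 1) if nomega % n == 0]


print(valid_group_counts(12))  # [1, 2, 3, 4, 6, 12]
```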

The default values are usually reasonable choices provided the tag MAXMEM is set correctly, and we strongly recommend setting MAXMEM instead of NTAUPAR. The number of MPI groups that share the same time points is then set internally to an optimum value.

The required storage for a low-scaling RPA or GW calculation depends mostly on NTAUPAR, the number of MPI groups that share the same imaginary time points. A rough estimate for the required bytes is given by

NKPTS * (NGX*NGY*NGZ)^2 / ( NCPU  / NTAUPAR ) * 16

where "NKPTS" is the number of irreducible q-points, "NCPU" the number of MPI ranks used for the job and "NGX,NGY,NGZ" the number of FFT grid points for the supercell, which is written in the OUTCAR file in the line

FFT grid for supercell:   NGX =  32; NGY =  32; NGZ =  32

The smaller NTAUPAR is set, the less memory the job will require to finish successfully. Note that VASP finds the optimum value of NTAUPAR based on MAXMEM, the freely available memory per MPI rank on each node. Thus it is recommended not to set NTAUPAR in the INCAR, but to set MAXMEM instead and allow VASP to find the optimum NTAUPAR.

The approximate memory requirement is calculated in advance and printed to screen and OUTCAR as follows:

min. memory requirement per mpi rank 1234 MB, per node 9872 MB
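
The rough estimate above is easy to evaluate before submitting a job. The Python sketch below implements the formula as written; the function name `rpa_memory_bytes` and the example values (one q-point, 64 MPI ranks) are assumptions for illustration, while the 32x32x32 grid is taken from the OUTCAR line quoted above.

```python
def rpa_memory_bytes(nkpts, ngx, ngy, ngz, ncpu, ntaupar):
    """Rough estimate: NKPTS * (NGX*NGY*NGZ)^2 / (NCPU / NTAUPAR) * 16 bytes."""
    return nkpts * (ngx * ngy * ngz) ** 2 / (ncpu / ntaupar) * 16


# Hypothetical job: 1 irreducible q-point, 32x32x32 FFT grid, 64 MPI ranks.
for ntaupar in (4, 1):
    estimate = rpa_memory_bytes(1, 32, 32, 32, ncpu=64, ntaupar=ntaupar)
    print(f"NTAUPAR = {ntaupar}: {estimate / 2**30:.2f} GiB")
```

For this example the estimate drops from 1.00 GiB at NTAUPAR = 4 to 0.25 GiB at NTAUPAR = 1, in line with the statement that a smaller NTAUPAR requires less memory.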

Some Issues Particular to ACFDT-RPA Calculations on Metals

For metals, the RPA groundstate energy converges fastest with respect to k-points if the exchange (Eq. (12) in reference [4]) and correlation energy are calculated on the same k-point grid, HFRCUT is not set, and the long-wavelength contributions from the polarizability are not considered (see reference [4]).

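To evaluate Eq. (12), a correction energy for the Hartree-Fock energy $E_{\rm EXX}$ related to partial occupancies has to be added to the RPA groundstate energy:[4]

$E_{\rm RPA} = E_{\rm c} + E_{\rm EXX} + E_{\rm HFc}$, where $E_{\rm HFc}$ denotes the correction energy.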

In vasp.5.4.1, this value is calculated for any HF type calculation (step 2) and can be found in the OUTCAR file after the total energy (in the line starting with exchange ACFDT corr. =).

To neglect the long-wavelength contributions, simply set LOPTICS=.FALSE. in the ALGO=Exact step (third step), and remove the WAVEDER files in the directory.

Possible tests and known issues

Basis set convergence

The expression for the ACFDT-RPA correlation energy written in terms of reciprocal lattice vectors reads:

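$E_{\rm c} = \frac{1}{2\pi}\int_0^\infty d\omega \sum_{{\bf q}\in{\rm BZ}}\sum_{\bf G}\left\{\left(\ln\left[1-\chi({\bf q},i\omega)V({\bf q})\right]\right)_{{\bf G},{\bf G}} + V_{{\bf G},{\bf G}}({\bf q})\,\chi_{{\bf G},{\bf G}}({\bf q},i\omega)\right\}$.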

The sum over reciprocal lattice vectors has to be truncated at some $G_{\rm max}$, determined by $\frac{\hbar^2|{\bf G}+{\bf q}|^2}{2m} <$ ENCUTGW, which can be set in the INCAR file. The default value is ENCUT, and experience has taught us not to change it. For systematic convergence tests, increase ENCUT instead and repeat steps 1 to 4, but be aware that the "maximum number of plane-waves" changes when ENCUT is increased. Note that it is virtually impossible to converge absolute correlation energies. Rather, concentrate on relative energies (e.g. energy differences between two solids, or between a solid and the constituent atoms).

Since correlation energies converge very slowly with respect to $G_{\rm max}$, VASP automatically extrapolates to the infinite basis set limit using a linear regression to the equation:[5][4][6]

$E_{\rm c}(G_{\rm max}) = E_{\rm c}(\infty) + \frac{A}{G_{\rm max}^3}$.

Furthermore, the Coulomb kernel is smoothly truncated between ENCUTGWSOFT and ENCUTGW using a simple cosine-like window function (Hann window function). The default for ENCUTGWSOFT is 0.8 ENCUTGW (again, we do not recommend changing this default).
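
To visualize what such a truncation does, the following Python sketch evaluates a generic Hann-type window between the two cutoffs. It is only an illustration under the assumption of a standard half-cosine taper; the exact functional form used inside VASP is not given here, and the function name `coulomb_window` is hypothetical.

```python
import math


def coulomb_window(energy, encutgw, encutgwsoft=None):
    """Generic Hann-type taper: 1 below ENCUTGWSOFT, 0 above ENCUTGW."""
    if encutgwsoft is None:
        encutgwsoft = 0.8 * encutgw  # default ratio quoted in the text
    if energy <= encutgwsoft:
        return 1.0
    if energy >= encutgw:
        return 0.0
    # Map the transition region onto [0, 1] and apply a half cosine.
    t = (energy - encutgwsoft) / (encutgw - encutgwsoft)
    return 0.5 * (1.0 + math.cos(math.pi * t))


# Example with ENCUTGW = 300 eV, hence ENCUTGWSOFT = 240 eV by default.
for energy in (200.0, 240.0, 270.0, 300.0):
    print(energy, coulomb_window(energy, encutgw=300.0))
```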

The integral over $\omega$ is evaluated by means of a highly accurate minimax integration.[7] The number of points is determined by the flag NOMEGA, whereas the energy range of transitions is determined by the band gap and the energy difference between the lowest occupied and highest unoccupied one-electron orbital. VASP determines these values automatically (from vasp.5.4.1 on), and the user should only carefully converge with respect to the number of frequency points NOMEGA. A good choice is usually NOMEGA=12, however, for large gap systems one might obtain eV convergence per atom already using 8 points, whereas for metals up to NOMEGA=24 frequency points are sometimes necessary, in particular, for large unit cells.

Strictly adhere to the steps outlined above. Specifically, be aware that steps two and three require the WAVECAR file generated in step one, whereas step four requires the WAVECAR and WAVEDER files generated in step three (the latter is written when LOPTICS=.TRUE. is set).

Convergence with respect to the number of plane waves can be rather slow, and we recommend testing the calculations carefully. Specifically, the calculations should be performed at the default energy cutoff ENCUT and at an increased cutoff (ideally the default energy cutoff $\times 1.3$). Another issue is that energy-volume curves are sometimes not particularly smooth. In that case, the best strategy is to set

ENCUT = 1.3 times default cutoff energy
ENCUTGWSOFT = 0.5 times default cutoff energy

where the default cutoff energy is the usual cutoff energy (maximum ENMAX in the POTCAR files). The frequency integration also needs to be checked carefully. In particular, for small gap systems (e.g. some symmetry-broken atoms) convergence can be rather slow, since the one-electron band gap can be very small, which requires a very small minimum frequency in the frequency integration.

Related Tags and Sections

  • ALGO for response functions and ACFDT calculations
  • NOMEGA, NOMEGAR number of frequency points
  • LHFCALC, switches on HF calculations
  • LOPTICS, required in the DFT step to store head and wings
  • ENCUTGW, to set cutoff for response functions
  • ENCUTGWSOFT
  • PRECFOCK controls the FFT grids in HF, GW, RPA calculations
  • NTAUPAR controls the number of imaginary time groups in space-time GW and RPA calculations
  • NOMEGAPAR controls the number of imaginary frequency groups in space-time GW and RPA calculations
  • MAXMEM sets the available memory per MPI rank on each node

References

  1. J. Paier, M. Marsman, G. Kresse, Phys. Rev. B 78, 121201 (2008).
  2. M. Kaltak, J. Klimeš and G. Kresse, Phys. Rev. B 90, 054115 (2014).
  3. H. N. Rojas, R. W. Godby, and R. J. Needs, Phys. Rev. Lett. 74, 1827 (1995).
  4. J. Harl, L. Schimka, and G. Kresse, Phys. Rev. B 81, 115126 (2010).
  5. J. Harl and G. Kresse, Phys. Rev. B 77, 045136 (2008).
  6. J. Klimeš, M. Kaltak, and G. Kresse, Phys. Rev. B 90, 075125 (2014).
  7. M. Kaltak, J. Klimeš, and G. Kresse, J. Chem. Theory Comput. 10, 2498-2507 (2014).