
Out of memory error in calculating band structure

Posted: Fri Apr 13, 2018 8:35 pm
by augolotti
Dear all,

I am trying to calculate the band structure of a simple system, a monolayer of NaCl in an orthorhombic supercell, but I cannot get a result. Either the cluster reports an out-of-memory error (48 CPUs per node, 2 nodes, 190 GB of RAM), or the calculation hangs until the maximum walltime is reached, and I cannot figure out why. My procedure: I first relaxed the system, then ran a non-self-consistent calculation for the bands, starting from the previous WAVECAR and CONTCAR (copied to POSCAR). This is the INCAR I used:

Code:
SYSTEM = Free standing ClNa 12x5 supercell
ISTART = 1
ICHARG = 11
ISPIN = 1
PREC = Accurate
NWRITE = 3 #med-high verbosity

# controlling scf
EDIFF = 0.0001 #electronic accuracy
NELM =  100 # max number of scf steps
ENCUT = 250
#NBANDS = 
ISMEAR = 1 #first-order m-p
SIGMA = 0.1 #half of default
LREAL = A
IMIX = 1
AMIX = 0.2
BMIX = 0.00001
ALGO = All

# controlling relaxation
NSW = 0 # number of ionic steps (0 = static run)
IBRION = -1 #no ionic update
ISIF = 0 #not calculating the stresses

# controlling dos plot
LORBIT = 11
RWIG = 1.111 1.757
NEDOS = 301

# output files settings
LWAVE = .FALSE. #wfcs for dos and bands
LCHARG = .FALSE. #charge for dos, bands,  postprocess
LVHAR = .FALSE. #potential for checking

# parallel settings
LPLANE = .TRUE.
LSCALU = .FALSE.
NSIM = 4
NCORE = 48
KPAR = 8

# optB88 functional settings
GGA = BO
PARAM1 = 0.1833333333
PARAM2 = 0.2200000000
LUSE_VDW = .TRUE.
AGGAC = 0.0000


and this is the KPOINTS file:
Code:
Band Path G-X-S-Y-G-S
20
Line-mode
rec
0.0000000000 0.0000000000 0.0000000000 !G
0.5000000000 0.0000000000 0.0000000000 !X

0.5000000000 0.0000000000 0.0000000000 !X
0.5000000000 0.5000000000 0.0000000000 !S

0.5000000000 0.5000000000 0.0000000000 !S
0.0000000000 0.5000000000 0.0000000000 !Y

0.0000000000 0.5000000000 0.0000000000 !Y
0.0000000000 0.0000000000 0.0000000000 !G

0.0000000000 0.0000000000 0.0000000000 !G
0.5000000000 0.5000000000 0.0000000000 !S


Any help to figure this out would be really appreciated, thank you.

Best,

Aldo Ugolotti

PhD student
Department of Materials Science
Università degli Studi di Milano-Bicocca
Milano, Italy

Re: Out of memory error in calculating band structure

Posted: Mon Apr 30, 2018 3:07 pm
by DannyVanpoucke
Check the amount of memory used via
grep memory OUTCAR

The machine you are using has 3.75 GB of RAM per core, so the amount reported by VASP should stay below that. Furthermore, when a calculation finishes there is a spike in memory usage while the results are combined into a single CHGCAR; at that point you need up to 2-3 times the amount of memory reported per core. If that much memory is not available, your job will seem to be just hanging, but it is probably swapping itself to death. :-)
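
A rough sketch of that check in Python. The function name and the 1.5 GB example figure are made up for illustration; the 3.75 GB/core limit and the factor of 2-3 for the final spike are the ones quoted above.

```python
def memory_headroom(reported_gb_per_core, ram_gb_per_core=3.75, spike_factor=3.0):
    """Return (fits_during_run, fits_at_final_spike) for a per-core memory figure.

    reported_gb_per_core: value read off `grep memory OUTCAR`, converted to GB.
    spike_factor: assumed worst-case multiplier for the final write-out (2-3x).
    """
    fits_run = reported_gb_per_core < ram_gb_per_core
    fits_spike = reported_gb_per_core * spike_factor <= ram_gb_per_core
    return fits_run, fits_spike


# Hypothetical example: 1.5 GB per core fits during the run,
# but three times that amount would not fit in 3.75 GB.
print(memory_headroom(1.5))
```

If the second value is False, the job may run fine and still appear to hang at the very end.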

Solution: look at the KPAR and NPAR parameters you used.
If you use KPAR, reduce it, or double the number of nodes (retaining the KPAR). NPAR parallelisation reduces the memory usage per core, whereas KPAR increases it.

Always remember, the amount of memory presented by VASP is the amount of memory per core, which scales roughly linearly with the number of k-points (100 in your case)!
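
For reference, the figure of 100 k-points follows from the KPOINTS file posted above: 5 segments with 20 points each. A minimal sketch that counts them, assuming the standard line-mode layout (4 header lines, then pairs of coordinate lines); the function name is made up for illustration.

```python
def count_line_mode_kpoints(kpoints_text):
    """Return (segments, total k-points) for a line-mode KPOINTS file.

    Line 2 holds the number of points per segment; after the 4-line header,
    every pair of non-empty coordinate lines defines one segment.
    """
    lines = kpoints_text.splitlines()
    points_per_segment = int(lines[1].split()[0])
    coords = [ln for ln in lines[4:] if ln.strip()]
    segments = len(coords) // 2
    return segments, segments * points_per_segment


# Same path as in the original post, coordinates shortened.
KPOINTS = """Band Path G-X-S-Y-G-S
20
Line-mode
rec
0.0 0.0 0.0 !G
0.5 0.0 0.0 !X

0.5 0.0 0.0 !X
0.5 0.5 0.0 !S

0.5 0.5 0.0 !S
0.0 0.5 0.0 !Y

0.0 0.5 0.0 !Y
0.0 0.0 0.0 !G

0.0 0.0 0.0 !G
0.5 0.5 0.0 !S
"""

print(count_line_mode_kpoints(KPOINTS))  # (5, 100)
```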

Best,
Danny
PS: I hope you are not calculating NaCl with an ENCUT of merely 250 eV. To my knowledge there are no Cl potentials with a default cutoff below 260 eV. Check the ENMAX values in the POTCAR you use, and use at least the larger of the two; below that your results may be rather fishy.
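
A small sketch of that ENMAX check, assuming the usual `ENMAX = ...; ENMIN = ... eV` lines in the POTCAR. The function name is made up, and the sample text holds illustrative numbers rather than the contents of a real POTCAR; in practice you would pass in `open("POTCAR").read()`.

```python
import re


def enmax_values(potcar_text):
    """Return all ENMAX values (eV) found in the text of a POTCAR file."""
    return [float(v) for v in re.findall(r"ENMAX\s*=\s*([0-9.]+)", potcar_text)]


# Illustrative sample only: one ENMAX line per species in a concatenated POTCAR.
sample = """
   ENMAX  =  262.472; ENMIN  =  196.854 eV
   ENMAX  =  101.968; ENMIN  =   76.476 eV
"""

encut = 250.0  # value used in the INCAR above
largest = max(enmax_values(sample))
print(largest, "-> ENCUT too low" if encut < largest else "-> ENCUT ok")
```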

Re: Out of memory error in calculating band structure

Posted: Thu May 03, 2018 7:35 pm
by augolotti
Hello,

thanks for the detailed explanation of the memory usage, that will come in handy.
About the small ENCUT: I was using a smaller value than suggested (though not that small; the values suggested in the POTCAR for Cl are in the range 197-280) to check whether the issue could be related to the cutoff (the total number of plane waves).
In the end I noticed that the problem depends STRONGLY on the parallelization settings, especially KPAR. For this system KPAR>2 results in such an error, and sometimes the NPAR/NCORE tags have to be played with before finding a proper value, i.e. one that gets the simulation at least to run.
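
To narrow down that trial and error, here is a hedged sketch that lists (KPAR, NCORE) pairs worth trying. It assumes 96 MPI ranks (2 nodes with 48 cores each), and the usual rules of thumb that KPAR should divide both the rank count and the number of k-points, and that NCORE should divide the ranks in each k-point group. The function name is made up, and the list says nothing about whether a given pair fits in memory.

```python
def parallel_candidates(total_ranks, n_kpoints, max_kpar=2):
    """List (KPAR, NCORE) pairs that satisfy the divisibility rules of thumb."""
    pairs = []
    for kpar in range(1, max_kpar + 1):
        # KPAR should divide the ranks (required) and the k-points (even load).
        if total_ranks % kpar or n_kpoints % kpar:
            continue
        group = total_ranks // kpar  # ranks working on one k-point group
        for ncore in range(1, group + 1):
            if group % ncore == 0:
                pairs.append((kpar, ncore))
    return pairs


candidates = parallel_candidates(total_ranks=96, n_kpoints=100, max_kpar=2)
print(len(candidates))
print((2, 48) in candidates)
```

Under these assumptions the original combination KPAR = 8, NCORE = 48 is not a candidate even with `max_kpar=8`: it leaves 12 ranks per k-point group, which 48 does not divide, and 8 does not divide the 100 k-points either.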