Dear Forum,
I'm experiencing frequent crashes when running small but fairly low-symmetry structures at dense k-point sampling (KSPACING=0.05) and a high plane-wave cutoff (ENCUT=750). The structures are derived from symmetric crystals but have strong random strain applied to them.
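To illustrate how quickly the mesh grows, here is a rough Python sketch. It assumes that VASP turns KSPACING into a Γ-centred mesh via N_i = max(1, ceil(|b_i| / KSPACING)), where |b_i| are the reciprocal lattice vector lengths including the 2π factor, and that after straining only time-reversal symmetry (k and -k equivalent) is left to reduce the mesh. The lattice and all function names are hypothetical, chosen only for illustration.

```python
import math

# Hypothetical strained 2-atom cell (NOT the structure from my runs);
# lattice vectors in Angstrom, one per row.
LATTICE = [
    [2.60, 0.05, 0.00],
    [0.03, 2.55, 0.07],
    [0.00, 0.04, 2.70],
]


def cross(u, v):
    return [u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0]]


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def reciprocal_lengths(lattice):
    """Lengths |b_i| of the reciprocal vectors in 1/Angstrom, 2*pi included."""
    a1, a2, a3 = lattice
    volume = abs(dot(a1, cross(a2, a3)))
    return [2 * math.pi * math.sqrt(dot(c, c)) / volume
            for c in (cross(a2, a3), cross(a3, a1), cross(a1, a2))]


def kspacing_mesh(lattice, kspacing):
    """Assumed rule: N_i = max(1, ceil(|b_i| / KSPACING))."""
    return tuple(max(1, math.ceil(b / kspacing))
                 for b in reciprocal_lengths(lattice))


def irreducible_time_reversal_only(mesh):
    """Points left in a Gamma-centred mesh if only k ~ -k can be used.

    A point is its own partner only if every index is 0 or N_i/2,
    so all other points pair up and count once per pair.
    """
    total = math.prod(mesh)
    self_partner = math.prod(2 if n % 2 == 0 else 1 for n in mesh)
    return (total + self_partner) // 2


if __name__ == "__main__":
    for ks in (0.5, 0.2, 0.05):
        mesh = kspacing_mesh(LATTICE, ks)
        print(f"KSPACING={ks}: mesh {mesh}, {math.prod(mesh)} points, "
              f"{irreducible_time_reversal_only(mesh)} irreducible")
```

Because each N_i scales like 1/KSPACING, reducing KSPACING by a factor of 10 multiplies the total number of k-points by roughly 1000, and without point-group symmetry almost none of them can be folded away.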
Within ~30 s, before k-point generation starts, the jobs crash with the following error:
munmap_chunk(): invalid pointer
before SLURM cancels the job and VASP prints a stack trace. I suspect it's a memory issue, because the same structures run successfully with a larger KSPACING. My understanding is that reducing KPAR and increasing NCORE should reduce the memory load, so I ran two trials: one with 32 cores, KPAR=2, NCORE=4, and one with 128 cores, KPAR=1, NCORE=32. The same error appears in both. I allocated 2 GB/core of RAM via SLURM. I've attached both runs; the stdout and stderr of the runs are in the files error.out and error.msg.
Our hardware consists of nodes with 2 AMD EPYC 9754 128-core processors and 768 GB RAM, so the calculations should fit within a single processor of one node.
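For reference, the memory arithmetic for the two trials as a small Python sketch. It uses only the numbers above; the "even share" (node RAM split equally over all 256 cores) is my own simplifying assumption for comparison, and the function names are hypothetical.

```python
# Figures from the post: 2 sockets x 128 cores, 768 GB RAM per node,
# and a SLURM request of 2 GB per core in both trials.
CORES_PER_NODE = 2 * 128
NODE_RAM_GB = 768
REQUESTED_GB_PER_CORE = 2


def requested_gb(cores, gb_per_core=REQUESTED_GB_PER_CORE):
    """Total memory requested from SLURM for a given core count."""
    return cores * gb_per_core


def even_share_gb(cores):
    """Memory the job would get if node RAM were split evenly per core."""
    return NODE_RAM_GB * cores / CORES_PER_NODE


for cores in (32, 128):
    print(f"{cores:3d} cores: requested {requested_gb(cores)} GB, "
          f"even share of node {even_share_gb(cores):.0f} GB")
```

In other words, both trials requested 2 GB/core, compared with the 3 GB/core an even split of the node would give.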
The present calculations use ADDGRID, but I've observed the same problem in older runs without it. I suppose LREAL should not make a difference, as the structures contain only 2 atoms.
Are there any other options to reduce memory load that I could try?
If it really does turn out to be an insufficient memory issue, it would be very helpful if VASP could print a better error message before quitting.
Because the crash occurs before the k-point and plane-wave information is printed to OUTCAR, I'm not sure about the exact memory requirements. As a quick check, however, I ran the same structure with the same settings in a different DFT code (SPHInX https://sxrepo.mpie.de/), and those runs finish without problems. So I think we can rule out hardware limitations.
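Since OUTCAR never gets that far, here is a back-of-the-envelope Python sketch of one contribution, the plane-wave coefficients of the wavefunctions. It uses the free-electron estimate N_pw ≈ Ω k_cut³ / (6π²) with k_cut = sqrt(ENCUT / (ħ²/2m_e)) and assumes 16 bytes per coefficient (double-precision complex). The cell volume, band count and k-point count are hypothetical placeholders (66,326 is what a 51×51×51 Γ-centred mesh gives with time reversal as the only symmetry), and FFT grids (larger with ADDGRID), projectors and all other arrays are ignored, so this is an order-of-magnitude estimate, not VASP's actual memory use.

```python
import math

# hbar^2 / (2 m_e) in eV * Angstrom^2
HBAR2_OVER_2ME = 3.80998


def plane_waves_per_kpoint(encut_ev, volume_a3):
    """Approximate count of G vectors inside the cutoff sphere.

    Volume of a sphere of radius k_cut = sqrt(ENCUT / (hbar^2/2m)),
    divided by the reciprocal-space volume per G vector, (2*pi)^3 / Omega.
    """
    k_cut = math.sqrt(encut_ev / HBAR2_OVER_2ME)
    return volume_a3 * k_cut ** 3 / (6 * math.pi ** 2)


def wavefunction_bytes(n_kpoints, n_bands, n_pw, n_spin=1, bytes_per_coeff=16):
    """Coefficients only; assumes double-precision complex (16 bytes)."""
    return n_kpoints * n_bands * n_pw * n_spin * bytes_per_coeff


if __name__ == "__main__":
    # Hypothetical inputs, not read from an OUTCAR (the runs never got that far).
    n_pw = plane_waves_per_kpoint(encut_ev=750.0, volume_a3=20.0)
    total = wavefunction_bytes(n_kpoints=66326, n_bands=16, n_pw=n_pw)
    print(f"~{n_pw:.0f} plane waves per k-point, "
          f"~{total / 1e9:.1f} GB of coefficients in total")
```

The per-k-point cost is modest for a 2-atom cell, but the total scales linearly with the number of irreducible k-points, so it grows roughly a thousandfold when KSPACING is reduced by a factor of 10.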
Let me know if you need more information and thanks for the help already,
Marvin
EDIT: I saw this post forum/viewtopic.php?t=18812, which sounds similar, but I've checked that on our machines:
ulimit -s unlimited