My Community

Posted: **Fri Jul 19, 2024 8:31 am**

Hi,
After a quick installation of VASP on my machine, I noticed a performance issue. The test job had still not finished after several hours, so I interrupted it. I ran the same small task with two versions of VASP. With version 5.4, the task completed in 1 minute. With version 6, the program had not even entered the main loop after that time. I found a post on the forum saying that this is related to mpirun and can be fixed by running I_MPI_FABRICS=shm vasp_std, but this solution does not work for me. Although I assign the calculation to 32 CPUs (mpirun -np 32), VASP spreads across all 64 threads. How can I solve this problem? My makefile.include is attached below. My CPU is an AMD Ryzen Threadripper PRO 5975WX 32-Cores, and I have only GNU libraries. Thanks in advance!

Code:

# Default precompiler options
CPP_OPTIONS = -DHOST=\"LinuxGNU\" \
              -DMPI -DMPI_BLOCK=8000 -Duse_collective \
              -DscaLAPACK \
              -DCACHE_SIZE=4000 \
              -Davoidalloc \
              -Dvasp6 \
              -Duse_bse_te \
              -Dtbdyn \
              -Dfock_dblbuf

CPP         = gcc -E -C -w $*$(FUFFIX) >$*$(SUFFIX) $(CPP_OPTIONS)

FC          = mpif90
FCL         = mpif90

FREE        = -ffree-form -ffree-line-length-none

FFLAGS      = -w -ffpe-summary=none

OFLAG       = -O2
OFLAG_IN    = $(OFLAG)
DEBUG       = -O0

OBJECTS     = fftmpiw.o fftmpi_map.o fftw3d.o fft3dlib.o
OBJECTS_O1 += fftw3d.o fftmpi.o fftmpiw.o
OBJECTS_O2 += fft3dlib.o

# For what used to be vasp.5.lib
CPP_LIB     = $(CPP)
FC_LIB      = $(FC)
CC_LIB      = gcc
CFLAGS_LIB  = -O
FFLAGS_LIB  = -O1
FREE_LIB    = $(FREE)

OBJECTS_LIB = linpack_double.o

# For the parser library
CXX_PARS    = g++
LLIBS       = -lstdc++

##
## Customize as of this point! Of course you may change the preceding
## part of this file as well if you like, but it should rarely be
## necessary ...
##

# When compiling on the target machine itself, change this to the
# relevant target when cross-compiling for another architecture
VASP_TARGET_CPU ?= -march=native
FFLAGS     += $(VASP_TARGET_CPU)

# For gcc-10 and higher (comment out for older versions)
FFLAGS     += -fallow-argument-mismatch

# BLAS and LAPACK (mandatory)
OPENBLAS_ROOT ?= /programs/lapack-3.11/
BLASPACK    = -L$(OPENBLAS_ROOT)/lib -lopenblas

# scaLAPACK (mandatory)
SCALAPACK_ROOT ?= /programs/scalapack-2.2.0
SCALAPACK   = -L$(SCALAPACK_ROOT) -lscalapack

LLIBS      += $(SCALAPACK) $(BLASPACK)

# FFTW (mandatory)
FFTW_ROOT  ?= /usr/include
LLIBS      += -L$(FFTW_ROOT)/lib -lfftw3
INCS       += -I$(FFTW_ROOT)/

# HDF5-support (optional but strongly recommended)
CPP_OPTIONS+= -DVASP_HDF5
HDF5_ROOT  ?= /usr/lib/x86_64-linux-gnu/hdf5/serial
LLIBS      += -L$(HDF5_ROOT)/ -lhdf5_fortran
INCS       += -I$(HDF5_ROOT)/include

# For the VASP-2-Wannier90 interface (optional)
#CPP_OPTIONS    += -DVASP2WANNIER90
#WANNIER90_ROOT ?= /path/to/your/wannier90/installation
#LLIBS          += -L$(WANNIER90_ROOT)/lib -lwannier

Posted: **Fri Jul 19, 2024 8:41 am**

Hi,
Could you attach your POSCAR, INCAR, KPOINTS, POTCAR, OUTCAR, and stdout files? That will help us solve your issue.

Posted: **Fri Jul 19, 2024 9:36 am**

Hi, thanks for the fast answer.
I've run it again, and the job still appears to be stuck. However, this time it runs on 32 CPUs. All files can be found in the attachment.

Posted: **Mon Jul 22, 2024 1:31 pm**

I found the solution. First, I stopped compiling with the HDF5 library. Following what I found on the forum, I also added symlib.o to this line of makefile.include:

Code:

OBJECTS_O1 += fftw3d.o fftmpi.o fftmpiw.o symlib.o

and, perhaps most importantly, I set

Code:

export OMP_NUM_THREADS=1

At this point, my sample task runs 25-30% faster in version 6 than in version 5. I run the program with the command

Code:

mpirun -np 32 vasp_std

It works, so the problem seems to be solved. However, can someone explain to me why setting OMP_NUM_THREADS to 1 is more efficient than, for example, 4 or 16? And thanks for the feedback.

Posted: **Tue Jul 23, 2024 11:58 am**

Glad that you could get it working. It's strange that HDF5 caused issues. Do you have a stdout file from that run, so we can take a closer look at it? HDF5 support is a feature that is here to stay.

In terms of OMP_NUM_THREADS, it depends a lot on your individual architecture what will work best. There's a page on the VASP wiki about general parallelisation that might give some insight.
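As a rough illustration of why a larger OMP_NUM_THREADS can hurt here: each MPI rank may start up to OMP_NUM_THREADS threads of its own, either in VASP itself if it was built with OpenMP, or in a threaded BLAS such as OpenBLAS, which also honours this variable. The sketch below only does the arithmetic for the 32-core machine from this thread. The helper names total_threads and check_fit are made up for the example and are not part of VASP or MPI.

```shell
#!/bin/sh
# Hypothetical helper: total software threads an MPI + OpenMP launch may create.
total_threads() {
    # $1 = MPI ranks (mpirun -np), $2 = OMP_NUM_THREADS
    echo $(( $1 * $2 ))
}

# Hypothetical helper: report whether the launch fits on the physical cores.
check_fit() {
    # $1 = MPI ranks, $2 = OMP_NUM_THREADS, $3 = physical cores
    if [ "$(total_threads "$1" "$2")" -le "$3" ]; then
        echo "ok"
    else
        echo "oversubscribed"
    fi
}

check_fit 32 1 32    # 32 ranks x 1 thread  = 32 threads on 32 cores
check_fit 32 4 32    # 32 ranks x 4 threads = 128 threads on 32 cores
check_fit 8 4 32     # 8 ranks x 4 threads  = 32 threads on 32 cores
```

With mpirun -np 32, any OMP_NUM_THREADS above 1 asks for more threads than the CPU has physical cores, so the ranks compete with each other. If you want to try hybrid runs, reducing the number of ranks so that ranks times threads stays at 32 would be the thing to test.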

Posted: **Mon Jul 29, 2024 12:26 pm**

I did one more test and compiled VASP with the HDF5 library again. It works correctly with the correct setting of OMP_NUM_THREADS. Thanks for your help!

Posted: **Mon Jul 29, 2024 3:05 pm**

Glad you've got it working!

VASP6 is a few times slower than 5.4 on AMD CPU