error encountered while running VASP in GPU
-
- Newbie
- Posts: 39
- Joined: Mon May 29, 2023 8:56 am
Dear experts, I have compiled VASP (without the Wannier90 interface) for my GPU. I got an error while trying to run VASP, as shown in the screenshot (note: I got the same error while trying to run Quantum ESPRESSO as well). Any help to resolve the issue would be highly appreciated. Thank you. Here are my system specifications:
OS: Ubuntu 22.04
CPU: 36 Core
GPU: Nvidia RTX A6000
CUDA Version: 12.2
$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Tue_Jun_13_19:16:58_PDT_2023
Cuda compilation tools, release 12.2, V12.2.91
Build cuda_12.2.r12.2/compiler.32965470_0
Error:
mpirun -np 8 /home/cms-gpu/softwares/vasp.6.4.2/bin/vasp_ncl
running 8 mpi-ranks, on 1 nodes
libgomp: TODO
libgomp: TODO
libgomp: TODO
libgomp: TODO
libgomp: TODO
libgomp: TODO
libgomp: TODO
libgomp: TODO
distrk: each k-point on 8 cores, 1 groups
--------------------------------------------------------------------------
Primary job terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:
Process name: [[45111,1],1]
Exit code: 1
-
- Global Moderator
- Posts: 542
- Joined: Fri Nov 08, 2019 7:18 am
Could you provide the makefile.include and tell us which modules you load, please?
It also seems that the error stems from OpenMP, so as a first attempt you might want to compile without that option.
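Since libgomp is the GNU OpenMP runtime, one quick way to see whether the VASP binary actually depends on it is to inspect its shared-library dependencies with `ldd`. The sketch below only scans an `ldd` listing read from stdin; the helper name `has_libgomp` is hypothetical. Note that `ldd` also lists indirect dependencies, so a hit does not tell you which library pulled libgomp in.

```shell
#!/bin/sh
# Hypothetical helper: report whether an `ldd` listing (read from stdin)
# contains the GNU OpenMP runtime, libgomp.
has_libgomp() {
    if grep -q 'libgomp'; then
        echo "libgomp is linked"
    else
        echo "libgomp not found"
    fi
}

# Usage (binary path taken from the mpirun command in the first post):
# ldd /home/cms-gpu/softwares/vasp.6.4.2/bin/vasp_ncl | has_libgomp
```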
Martin Schlipf
VASP developer
-
- Newbie
- Posts: 39
- Joined: Mon May 29, 2023 8:56 am
Sure, here is the makefile.include file.
Thank you.
-
- Newbie
- Posts: 39
- Joined: Mon May 29, 2023 8:56 am
Also, although I am not completely sure (I am new to Ubuntu), I think I used OpenMP and nvfortran.
-
- Global Moderator
- Posts: 542
- Joined: Fri Nov 08, 2019 7:18 am
Thanks, I will try to reproduce this. At first glance, it seems strange that you get this error: you do not have OpenMP in your makefile.include, so I do not know why you would need to link to libgomp.
Martin Schlipf
VASP developer
-
- Newbie
- Posts: 39
- Joined: Mon May 29, 2023 8:56 am
I look forward to your insight. Thank you.
-
- Global Moderator
- Posts: 542
- Joined: Fri Nov 08, 2019 7:18 am
When looking at your makefile.include, I noticed that you had replaced mpif90 with the explicit path to the NVIDIA compiler. What was the reason for that? If I were to guess, I would assume that you did not add the compiler to your PATH, and hence the system either used a built-in mpif90 or did not find an mpif90 at all.
Did you do the same procedure for mpirun as well, or did you add mpirun to your PATH? If not, it is possible that you use the mpirun of a different library, which will typically not work. You can check this by running
Code:
which mpirun
This should show /opt/nvidia/hpc_sdk/Linux_x86_64/24.3/comm_libs/mpi/bin/mpirun or something very similar. If it does not, please add mpirun to your PATH or explicitly use the path to that executable.
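The concern here is that the compiler wrapper and the launcher may come from different MPI installations. A minimal sketch that compares the directories of the `mpif90` and `mpirun` found on PATH; the function name `same_mpi_dir` is hypothetical, and it assumes GNU `readlink -f` (standard on Ubuntu) to resolve symlinks. If mpif90 is not on PATH at all, the function says so, which is itself useful information given the explicit compiler path in makefile.include.

```shell
#!/bin/sh
# Hypothetical helper: check that the mpif90 and mpirun found on PATH live in
# the same directory, i.e. most likely belong to the same MPI installation.
# Assumes GNU readlink -f to resolve symlinks (standard on Ubuntu).
same_mpi_dir() {
    f90=$(command -v mpif90) || { echo "mpif90 not on PATH"; return 1; }
    run=$(command -v mpirun) || { echo "mpirun not on PATH"; return 1; }
    f90_dir=$(dirname "$(readlink -f "$f90")")
    run_dir=$(dirname "$(readlink -f "$run")")
    if [ "$f90_dir" = "$run_dir" ]; then
        echo "OK: both in $run_dir"
    else
        echo "MISMATCH: mpif90 in $f90_dir, mpirun in $run_dir"
        return 1
    fi
}

# Usage:
# same_mpi_dir
```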
Martin Schlipf
VASP developer
-
- Newbie
- Posts: 39
- Joined: Mon May 29, 2023 8:56 am
Hello, "which mpirun" showed "/opt/nvidia/hpc_sdk/Linux_x86_64/24.3/comm_libs/12.3/openmpi4/openmpi-4.1.5/bin/mpirun". What should I do now to solve the issue?
Thanks.
-
- Global Moderator
- Posts: 542
- Joined: Fri Nov 08, 2019 7:18 am
I wrote a small example code.
Code:
! example.f90
! Minimal OpenACC test: dot product on the GPU compared with the CPU result.
program main
implicit none
real x(1000), y(1000), sum_
integer ii
sum_ = 0.0
call random_number(x)
call random_number(y)
! "parallel loop" distributes the iterations and combines the partial sums
!$acc parallel loop reduction(+:sum_)
do ii = 1, size(x)
sum_ = sum_ + x(ii) * y(ii)
end do
!$acc end parallel loop
! both numbers should agree up to rounding
write(0,*) sum_, sum(x * y)
end program main
Can you try to compile this with the same flags
Code:
/opt/nvidia/hpc_sdk/Linux_x86_64/24.3/comm_libs/mpi/bin/mpif90 -acc -gpu=cc60,cc70,cc80,cuda12.3 example.f90
and check whether you get the same error when you run the executable?
Another difference I noticed is the CUDA version. In our tests, we always use CUDA 11.0.
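The "run the executable" step can be made explicit, both serially and under mpirun to mirror the failing VASP invocation. A minimal sketch assuming a POSIX shell; `check_libgomp_todo` is a hypothetical helper that runs any command, prints its output, and flags the `libgomp: TODO` message seen in the original error log.

```shell
#!/bin/sh
# Hypothetical helper: run a command, print its combined stdout/stderr, and
# flag the "libgomp: TODO" message seen in the failing VASP run.
check_libgomp_todo() {
    out=$("$@" 2>&1)
    status=$?
    printf '%s\n' "$out"
    if printf '%s\n' "$out" | grep -q 'libgomp: TODO'; then
        echo "FOUND 'libgomp: TODO' (exit status $status)"
        return 1
    fi
    echo "no libgomp message (exit status $status)"
    return 0
}

# Usage, after compiling example.f90 as above:
# check_libgomp_todo ./a.out
# check_libgomp_todo mpirun -np 8 ./a.out
```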
Martin Schlipf
VASP developer
-
- Newbie
- Posts: 39
- Joined: Mon May 29, 2023 8:56 am
Hello,
I ran the command you gave and got an "a.out" file. I did not get any error message.
-
- Newbie
- Posts: 39
- Joined: Mon May 29, 2023 8:56 am
Hello, even though the example program you wrote ran without any error, VASP is still showing the same error. What can I do?
Thank you.
-
- Global Moderator
- Posts: 542
- Joined: Fri Nov 08, 2019 7:18 am
When I tried to reproduce your setup, I ran into an issue with FFTW, and looking at your makefile.include, it may be that your FFTW is not compatible with nvfortran. In particular, it seems that you use the OpenMP version, which may explain the errors you see. Perhaps you can modify the example to perform one FFT and see whether that produces the error.
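To see which FFTW build is involved, one can list the FFTW library names that appear in makefile.include or in the `ldd` output of the binary; a name like `fftw3_omp` indicates the OpenMP-threaded FFTW library. A minimal sketch; `fftw_variants` is a hypothetical helper and assumes `grep -o` (available in GNU grep on Ubuntu).

```shell
#!/bin/sh
# Hypothetical helper: list the distinct FFTW library names mentioned in text
# read from stdin (e.g. makefile.include or `ldd` output). A name such as
# fftw3_omp indicates the OpenMP-threaded FFTW library.
fftw_variants() {
    grep -o 'fftw3[a-z_]*' | LC_ALL=C sort -u
}

# Usage:
# fftw_variants < makefile.include
# ldd /home/cms-gpu/softwares/vasp.6.4.2/bin/vasp_ncl | fftw_variants
```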
Martin Schlipf
VASP developer
-
- Newbie
- Posts: 39
- Joined: Mon May 29, 2023 8:56 am
Hello sir,
I am very new to this field, and I'm afraid I won't be able to modify the example to perform an FFT on my own. Can you please assist me with that?
Thank you.
-
- Global Moderator
- Posts: 542
- Joined: Fri Nov 08, 2019 7:18 am
Something like this?
Code:
program main
implicit none
#include "fftw3.f"
integer, parameter :: N = 100
double complex in, out
dimension in(N), out(N)
integer*8 plan
! fill the input so the transform operates on defined data
in = (1.0d0, 0.0d0)
! plan a 1D forward complex-to-complex transform of length N
call dfftw_plan_dft_1d(plan,N,in,out,FFTW_FORWARD,FFTW_ESTIMATE)
call dfftw_execute_dft(plan, in, out)
call dfftw_destroy_plan(plan)
! for a constant input, out(1) should equal N and the other entries ~0
write(0,*) out(1)
end program main
which you can compile with
Code:
nvfortran example.f90 -I $FFTW_ROOT/include -L $FFTW_ROOT/lib -lfftw3
after you set FFTW_ROOT to the appropriate folder.
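The compile line assumes that `$FFTW_ROOT/include` contains `fftw3.f` and that `$FFTW_ROOT/lib` contains the library itself; `-L` must name a directory holding `libfftw3`, not the headers. A minimal sketch that checks this layout before compiling; the helper name `check_fftw_root` and the `/path/to/fftw` placeholder are hypothetical, and some installations use `lib64` instead of `lib`.

```shell
#!/bin/sh
# Hypothetical helper: check that an FFTW root directory (argument, or
# $FFTW_ROOT if none is given) has include/fftw3.f and lib/libfftw3.{so,a}.
check_fftw_root() {
    root=${1:-$FFTW_ROOT}
    [ -n "$root" ] || { echo "FFTW_ROOT is not set"; return 1; }
    [ -f "$root/include/fftw3.f" ] || { echo "missing $root/include/fftw3.f"; return 1; }
    if [ -e "$root/lib/libfftw3.so" ] || [ -e "$root/lib/libfftw3.a" ]; then
        echo "ok: $root"
    else
        echo "missing $root/lib/libfftw3.so or $root/lib/libfftw3.a"
        return 1
    fi
}

# Usage (/path/to/fftw is a hypothetical placeholder):
# export FFTW_ROOT=/path/to/fftw
# check_fftw_root && nvfortran example.f90 -I $FFTW_ROOT/include -L $FFTW_ROOT/lib -lfftw3
```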
Martin Schlipf
VASP developer
-
- Newbie
- Posts: 39
- Joined: Mon May 29, 2023 8:56 am
Hello sir,
I ran "nvfortran example2.f90 -I /opt/intel/oneapi/mkl/2024.0/include -L /opt/intel/oneapi/mkl/2024.0/include/fftw -lfftw3" with the code you gave and got "a.out" without any error.