error encountered while running VASP in GPU

bhargabkakati
Newbie
Posts: 39
Joined: Mon May 29, 2023 8:56 am

#1 Post by bhargabkakati » Tue Mar 26, 2024 6:33 am

Dear experts, I have compiled VASP (without the Wannier90 interface) for my GPU. When I try to run VASP, I get the error shown in the screenshot and reproduced below (note: I get the same error when trying to run Quantum ESPRESSO). Any help resolving the issue would be highly appreciated. Thank you. Here are my system specifications:

OS: Ubuntu 22.04
CPU: 36 Core
GPU: Nvidia RTX A6000
CUDA Version: 12.2

$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Tue_Jun_13_19:16:58_PDT_2023
Cuda compilation tools, release 12.2, V12.2.91
Build cuda_12.2.r12.2/compiler.32965470_0

Error:

mpirun -np 8 /home/cms-gpu/softwares/vasp.6.4.2/bin/vasp_ncl
running 8 mpi-ranks, on 1 nodes

libgomp: TODO

libgomp: TODO

libgomp: TODO

libgomp: TODO

libgomp: TODO

libgomp: TODO

libgomp: TODO

libgomp: TODO
distrk: each k-point on 8 cores, 1 groups
--------------------------------------------------------------------------
Primary job terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:

Process name: [[45111,1],1]
Exit code: 1

martin.schlipf
Global Moderator
Posts: 542
Joined: Fri Nov 08, 2019 7:18 am

#2 Post by martin.schlipf » Tue Mar 26, 2024 10:11 am

Could you provide the makefile.include and tell us which modules you load, please?

It also seems like the error stems from OpenMP, so as a first attempt you might want to compile without that option.

Martin Schlipf
VASP developer


#3 Post by bhargabkakati » Tue Mar 26, 2024 10:15 am

Sure, here is the makefile.include file.
Thank you.

#4 Post by bhargabkakati » Tue Mar 26, 2024 10:17 am

And, although I am not sure (I am new to Ubuntu), I believe I used OpenMP and nvfortran.

#5 Post by martin.schlipf » Tue Mar 26, 2024 10:24 am

Thanks, I will try to reproduce this. At first glance, it seems strange that you get this error: your makefile.include does not enable OpenMP, so I do not know why you would need to link to libgomp.
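
One way to narrow this down is to look at which OpenMP runtime the binary is linked against. A minimal sketch, assuming a Linux system with ldd available; the helper name openmp_runtimes is made up for illustration, and the library names in the pattern are the usual GNU, NVIDIA, Intel and LLVM runtime names rather than anything taken from the attached makefile.include:

```shell
# Print the lines of `ldd` output that mention an OpenMP runtime.
# libgomp is the GNU runtime; libnvomp, libiomp5 and libomp are the
# NVIDIA, Intel and LLVM ones. Reads ldd output on stdin, so it can
# also be tried on saved output.
openmp_runtimes() {
    grep -E 'lib(gomp|nvomp|iomp5|omp)\.so' || true
}

# Usage (binary path taken from the first post):
#   ldd /home/cms-gpu/softwares/vasp.6.4.2/bin/vasp_ncl | openmp_runtimes
```

If libgomp shows up here for a binary built with nvfortran, it was probably pulled in by one of the linked libraries rather than by the compiler itself.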

#6 Post by bhargabkakati » Tue Mar 26, 2024 10:27 am

Will look forward to your insight. Thank you.

#7 Post by martin.schlipf » Tue Mar 26, 2024 2:04 pm

When looking at your makefile.include, I noticed that you had replaced mpif90 with the explicit path to the NVIDIA compiler. What was the reason for that? If I were to guess, I would assume that you did not add the compiler to your PATH and hence the system either picked up a built-in mpif90 or did not find an mpif90 at all.

Did you do the same for mpirun, or did you add mpirun to your PATH? If not, it is possible that you are using the mpirun of a different MPI library, which will typically not work. You can check this with

Code:

which mpirun
This should show /opt/nvidia/hpc_sdk/Linux_x86_64/24.3/comm_libs/mpi/bin/mpirun or something very similar. If it does not, please add mpirun to your PATH or explicitly use the path to that executable.
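
The same check can be scripted. The sketch below only compares path strings; under_prefix is a hypothetical helper, and the SDK root is taken from the path quoted above, so adjust it to the local installation:

```shell
# Succeeds if the given executable path lies under the given
# installation prefix. Both arguments are plain strings; nothing is run.
under_prefix() {
    case "$1" in
        "$2"/*) return 0 ;;
        *)      return 1 ;;
    esac
}

# Usage, with the SDK root from this post as the assumed prefix:
#   under_prefix "$(command -v mpirun)" /opt/nvidia/hpc_sdk/Linux_x86_64/24.3 \
#       && echo "mpirun comes from the NVIDIA HPC SDK" \
#       || echo "mpirun comes from somewhere else"
```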

#8 Post by bhargabkakati » Wed Mar 27, 2024 4:14 am

Hello, "which mpirun" showed "/opt/nvidia/hpc_sdk/Linux_x86_64/24.3/comm_libs/12.3/openmpi4/openmpi-4.1.5/bin/mpirun". What should I do now to solve the issue?
Thanks.

#9 Post by martin.schlipf » Wed Mar 27, 2024 8:59 am

I wrote a small example code.

Code:

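! example.f90
! Dot product of two random vectors, computed once on the GPU via OpenACC
! and once on the CPU with the intrinsic sum, to check that offloading works.
program main

    implicit none

    real x(1000), y(1000), sum_
    integer ii

    sum_ = 0
    call random_number(x)
    call random_number(y)

    ! "parallel loop" distributes the iterations; the reduction combines partial sums
    !$acc parallel loop reduction(+:sum_)
    do ii = 1, size(x)
        sum_ = sum_ + x(ii) * y(ii)
    end do
    !$acc end parallel loop

    ! unit 0 is usually stderr; both numbers should agree up to rounding
    write(0,*) sum_, sum(x * y)

end program main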
Can you try to compile this with the same flags

Code:

/opt/nvidia/hpc_sdk/Linux_x86_64/24.3/comm_libs/mpi/bin/mpif90 -acc -gpu=cc60,cc70,cc80,cuda12.3 example.f90
and check whether you get the same error when you run the executable?
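
Compiling only produces a.out; the comparison with VASP comes from running it. A small sketch of how the run could be checked for the same libgomp message (run_and_check_libgomp is a hypothetical helper, and it assumes the message starts a line with "libgomp:", as in the first post):

```shell
# Run a command, keep its combined stdout/stderr in a log file, and report
# whether the log contains a libgomp message like the one in the first post.
# Arguments: log file, then the command and its arguments.
run_and_check_libgomp() {
    log=$1
    shift
    status=0
    "$@" > "$log" 2>&1 || status=$?
    if grep -q '^libgomp:' "$log"; then
        echo "libgomp message found (exit status $status)"
        return 1
    fi
    echo "no libgomp message (exit status $status)"
    return 0
}

# Usage after compiling example.f90:
#   run_and_check_libgomp example.log ./a.out
```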

Another difference I noticed is the CUDA version. In our tests, we always use CUDA 11.0.

#10 Post by bhargabkakati » Wed Mar 27, 2024 9:35 am

Hello,
I ran the command you gave and got an "a.out" file. I did not get any error message.

#11 Post by bhargabkakati » Wed Mar 27, 2024 2:28 pm

Hello, even though the example program you wrote ran without any error, VASP still shows the same error. What can I do?
Thank you.

#12 Post by martin.schlipf » Thu Mar 28, 2024 8:18 am

When I tried to reproduce your setup, I ran into an issue with FFTW. Looking at your makefile.include, it may be that your FFTW is not compatible with nvfortran. In particular, it seems that you use the OpenMP version, which may explain the errors that you see. Perhaps you can modify the example to do one FFT and see whether that produces the error.
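
To see where OpenMP might enter the build, the makefile.include can be searched for OpenMP-related entries. This is only a sketch: openmp_mentions is a hypothetical helper, and the patterns (the _OPENMP flag, common compiler switches, libfftw3_omp, MKL's GNU threading layer) are assumptions about what may appear, not the contents of the attached file:

```shell
# List the lines of a makefile.include that mention OpenMP in some form,
# with line numbers. The patterns are a guess at what is worth looking for:
# the _OPENMP preprocessor flag, common compiler switches (-fopenmp,
# -qopenmp, -mp), the OpenMP-threaded FFTW library (libfftw3_omp), and
# MKL's GNU OpenMP threading layer (mkl_gnu_thread).
openmp_mentions() {
    grep -nE -e '_OPENMP|-fopenmp|-qopenmp|fftw3_omp|mkl_gnu_thread|gomp' \
             -e '(^|[[:space:]])-mp([[:space:]]|$)' "$1" || true
}

# Usage:
#   openmp_mentions makefile.include
```

An empty result does not rule OpenMP out, since a library linked by plain name can still depend on libgomp internally.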

#13 Post by bhargabkakati » Thu Mar 28, 2024 9:04 am

Hello sir,
I am very new to this field, and I'm afraid I won't be able to modify the example to do an FFT on my own. Could you please assist me with that?
Thank you.

#14 Post by martin.schlipf » Thu Mar 28, 2024 4:57 pm

Something like this?

Code:

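! example.f90: one forward 1D FFT through the FFTW legacy Fortran interface
program main
    implicit none
    ! Fortran include statement, so no preprocessor pass is needed
    include "fftw3.f"
    integer, parameter :: N = 100
    double complex in, out
    dimension in(N), out(N)
    integer*8 plan
    ! constant input, so out(1) should come out as N and the rest near zero
    in = (1.0d0, 0.0d0)
    call dfftw_plan_dft_1d(plan,N,in,out,FFTW_FORWARD,FFTW_ESTIMATE)
    call dfftw_execute_dft(plan, in, out)
    call dfftw_destroy_plan(plan)
    write(*,*) out(1)
end program main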
which you can compile with

Code:

nvfortran example.f90 -I $FFTW_ROOT/include -L $FFTW_ROOT/lib -lfftw3
after you set FFTW_ROOT to the appropriate folder.
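
Before compiling, it may help to confirm that FFTW_ROOT has the layout this command line assumes. A minimal sketch, with check_fftw_root as a hypothetical helper; it assumes the include file is named fftw3.f and the library is libfftw3.so or libfftw3.a:

```shell
# Check that a candidate FFTW_ROOT has the layout the compile line expects:
# include/fftw3.f for the include file, and lib/libfftw3.so or
# lib/libfftw3.a for -lfftw3. Prints what is missing; returns 1 if
# anything is.
check_fftw_root() {
    root=$1
    missing=0
    if [ ! -f "$root/include/fftw3.f" ]; then
        echo "missing $root/include/fftw3.f"
        missing=1
    fi
    if [ ! -f "$root/lib/libfftw3.so" ] && [ ! -f "$root/lib/libfftw3.a" ]; then
        echo "missing $root/lib/libfftw3.so or libfftw3.a"
        missing=1
    fi
    return $missing
}

# Usage:
#   check_fftw_root "$FFTW_ROOT" && nvfortran example.f90 \
#       -I $FFTW_ROOT/include -L $FFTW_ROOT/lib -lfftw3
```

If the helper reports a missing file, the compiler or linker may be silently picking up a different FFTW from a default system path.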

#15 Post by bhargabkakati » Sat Mar 30, 2024 5:51 am

Hello sir,
I ran "nvfortran example2.f90 -I /opt/intel/oneapi/mkl/2024.0/include -L /opt/intel/oneapi/mkl/2024.0/include/fftw -lfftw3" with the code you gave and got "a.out" without any error.
