problem running parallel vasp

Questions regarding the compilation of VASP on various platforms: hardware, compilers and libraries, etc.

jialy_25
Newbie
Posts: 8
Joined: Fri Apr 21, 2006 9:00 am

#1 Post by jialy_25 » Thu May 04, 2006 8:46 am

I am trying to install the parallel version of VASP on a Pentium 4 Linux cluster running Fedora Core 4, using MPICH2-1.0.3. The mpif90 wrapper uses G95 (http://www.g95.org) as its compiler and linker.

This is my Makefile:

.SUFFIXES: .inc .f .f90 .F
#-----------------------------------------------------------------------
# Makefile for the G95 (http://www.g95.com) compiler
#-----------------------------------------------------------------------
#
# BLAS must be installed on the machine
# there are several options:
# 1) very slow but works:
# retrieve the lapackage from ftp.netlib.org
# and compile the blas routines (BLAS/SRC directory)
# please use g77 or f77 for the compilation. When I tried to
# use pgf77 or pgf90 for BLAS, VASP hang up when calling
# ZHEEV (however this was with lapack 1.1 now I use lapack 2.0)
# 2) most desirable: get an optimized BLAS
#
# the two most reliable packages around are presently:
# 3a) Intels own optimised BLAS (PIII, P4, Itanium)
# http://developer.intel.com/software/products/mkl/
# this is really excellent when you use Intel CPU's
#
# 3b) or obtain the atlas based BLAS routines
# http://math-atlas.sourceforge.net/
# you certainly need atlas on the Athlon, since the mkl
# routines are not optimal on the Athlon.
# If you want to use atlas based BLAS, check the lines around LIB=
#
# 3c) mindblowing fast SSE2 (4 GFlops on P4, 2.53 GHz)
# Kazushige Goto's BLAS
# http://www.cs.utexas.edu/users/kgoto/signup_first.html
#
#-----------------------------------------------------------------------

# all CPP processed fortran files have the extension .f90
SUFFIX=.f90

#-----------------------------------------------------------------------
# fortran compiler and linker
#-----------------------------------------------------------------------
FC=g95
# fortran linker
FCL=$(FC)


#-----------------------------------------------------------------------
# whereis CPP ?? (I need CPP, can't use gcc with proper options)
# that's the location of gcc for SUSE 5.3
#
# CPP_ = /usr/lib/gcc-lib/i486-linux/2.7.2/cpp -P -C
#
# that's probably the right line for some Red Hat distribution:
#
# CPP_ = /usr/lib/gcc-lib/i386-redhat-linux/2.7.2.3/cpp -P -C
#
# SUSE X.X, maybe some Red Hat distributions:

CPP_ = ./preprocess <$*.F | /usr/bin/cpp -P -C -traditional >$*$(SUFFIX)

#-----------------------------------------------------------------------
# possible options for CPP:
# NGXhalf charge density reduced in X direction
# wNGXhalf gamma point only reduced in X direction
# avoidalloc avoid ALLOCATE if possible
# IFC work around some IFC bugs
# CACHE_SIZE 1000 for PII,PIII, 5000 for Athlon, 8000-12000 P4
# RPROMU_DGEMV use DGEMV instead of DGEMM in RPRO (depends on used BLAS)
# RACCMU_DGEMV use DGEMV instead of DGEMM in RACC (depends on used BLAS)
#-----------------------------------------------------------------------

#CPP = $(CPP_) -DHOST=\"LinuxIFC\" \
# -Dkind8 -DNGXhalf -DCACHE_SIZE=12000 -DPGF90 -Davoidalloc \
# -DRPROMU_DGEMV -DRACCMU_DGEMV

#-----------------------------------------------------------------------
# general fortran flags (there must a trailing blank on this line)
#-----------------------------------------------------------------------

FFLAGS = -ffree-form -march=pentium4 -mfpmath=sse -msse \
-msse2

#-----------------------------------------------------------------------
# optimization
# no in depth tests done yet -O3 however seems to be sensible
#-----------------------------------------------------------------------

OFLAG=-O2

OFLAG_HIGH = $(OFLAG)
OBJ_HIGH =

OBJ_NOOPT =
DEBUG = -g -O0
INLINE = $(OFLAG)


#-----------------------------------------------------------------------
# the following lines specify the position of BLAS and LAPACK
# on Opteron, VASP works fastest with the libgoto library
# so that's what I recommend
#-----------------------------------------------------------------------

# Atlas based libraries
#ATLASHOME= $(PWD)/Linux_HAMMER64SSE2/lib/
#BLAS= -L$(ATLASHOME) -lf77blas -latlas

# use specific libraries (default library path might point to other libraries)
BLAS= /usr/lib/sse2/libf77blas.a /usr/lib/sse2/libatlas.a

# use the mkl Intel libraries for p4 (www.intel.com)
# mkl.5.1
# set -DRPROMU_DGEMV -DRACCMU_DGEMV in the CPP lines
#BLAS=-L/opt/intel/mkl/lib/32 -lmkl_p4 -lpthread

# mkl.5.2 requires also to -lguide library
# set -DRPROMU_DGEMV -DRACCMU_DGEMV in the CPP lines
#BLAS=-L/opt/intel/mkl/lib/32 -lmkl_p4 -lguide -lpthread

# even faster Kazushige Goto's BLAS
# http://www.cs.utexas.edu/users/kgoto/signup_first.html
#BLAS= /home/kresse/64/vasp.4.6/libgoto_opt64-r0.93.so

# LAPACK, simplest use vasp.4.lib/lapack_double
#LAPACK= ../vasp.4.lib/lapack_double.o

# use atlas optimized part of lapack
LAPACK= /usr/local/vasp-mpich/vasp.4.lib/lapack_atlas.o -L/usr/lib/sse2 -llapack -lcblas

# use the mkl Intel lapack
#LAPACK= -lmkl_lapack

#-----------------------------------------------------------------------

LIB = -L../vasp.4.lib -ldmy \
../vasp.4.lib/linpack_double.o $(LAPACK) \
$(BLAS)

# options for linking (for compiler version 6.X, 7.1) nothing is required
#LINK = /usr/local/g95/lib/gcc-lib/i686-pc-linux-gnu/4.0.1/libf95.a

#-----------------------------------------------------------------------
# fft libraries:
# VASP.4.6 can use fftw.3.0.X (http://www.fftw.org)
# since this version is faster on P4 machines, we recommend to use it
#-----------------------------------------------------------------------

FFT3D = fft3dfurth.o fft3dlib.o
FFT3D = fftw3d.o fft3dlib.o /usr/lib/libfftw3.a


#=======================================================================
# MPI section, uncomment the following lines
#
# one comment for users of mpich or lam:
# You must *not* compile mpi with g77/f77, because f77/g77
# appends *two* underscores to symbols that contain already an
# underscore (i.e. MPI_SEND becomes mpi_send__). The pgf90/ifc
# compilers however append only one underscore.
# Precompiled mpi version will also not work !!!
#
# We found that mpich.1.2.1 and lam-6.5.X to lam-7.0.4 are stable
# mpich.1.2.1 was configured with
# ./configure -prefix=/usr/local/mpich_nodvdbg -fc="pgf77 -Mx,119,0x200000" \
# -f90="pgf90 -Mx,119,0x200000" \
# --without-romio --without-mpe -opt=-O \
#
# lam was configured with the line
# ./configure -prefix /opt/libs/lam-7.0.4 --with-cflags=-O -with-fc=pgf77 \
# --with-f77flags=-O --without-romio
#
# please note that you might be able to use a lam or mpich version
# compiled with f77/g77, but then you need to add the following
# options: -Msecond_underscore (compilation) and -g77libs (linking)
#
# !!! Please do not send me any queries on how to install MPI, I will
# certainly not answer them !!!!
#=======================================================================
#-----------------------------------------------------------------------
# fortran linker for mpi: if you use LAM and compiled it with the options
# suggested above, you can use the following line
#-----------------------------------------------------------------------

FC=mpif90
FCL=$(FC)

#-----------------------------------------------------------------------
# additional options for CPP in parallel version (see also above):
# NGZhalf charge density reduced in Z direction
# wNGZhalf gamma point only reduced in Z direction
# scaLAPACK use scaLAPACK (usually slower on 100 Mbit Net)
#-----------------------------------------------------------------------

CPP = $(CPP_) -DMPI -DHOST=\"LinuxG95\" -DG95 \
-Dkind8 -DNGZhalf -DCACHE_SIZE=4000 -DPGF90 -Davoidalloc \
-DMPI_BLOCK=500 \
# -DRPROMU_DGEMV -DRACCMU_DGEMV

#-----------------------------------------------------------------------
# location of SCALAPACK
# if you do not use SCALAPACK simply uncomment the line SCA
#-----------------------------------------------------------------------

#BLACS=$(HOME)/archives/SCALAPACK/BLACS/
#SCA_=$(HOME)/archives/SCALAPACK/SCALAPACK

#SCA= $(SCA_)/libscalapack.a \
# $(BLACS)/LIB/blacsF77init_MPI-LINUX-0.a $(BLACS)/LIB/blacs_MPI-LINUX-0.a $(BLACS)/LIB/blacsF77init_MPI-LINUX-0.a

#SCA=

#-----------------------------------------------------------------------
# libraries for mpi
#-----------------------------------------------------------------------

LIB = -L/usr/local/vasp-mpich/vasp.4.lib -ldmy \
/usr/local/vasp-mpich/vasp.4.lib/linpack_double.o $(LAPACK) \
$(SCA) $(BLAS)

LINK = -Wl,-rpath,/usr/lib/sse2

# FFT: fftmpi.o with fft3dlib of Juergen Furthmueller
FFT3D = fftmpi.o fftmpi_map.o fft3dlib.o

# fftw.3.0.1 is slighly faster and should be used if available
#FFT3D = fftmpiw.o fftmpi_map.o fft3dlib.o /usr/lib/libfftw3.a

#-----------------------------------------------------------------------
# general rules and compile lines
#-----------------------------------------------------------------------
BASIC= symmetry.o symlib.o lattlib.o random.o

SOURCE= base.o mpi.o smart_allocate.o xml.o \
constant.o jacobi.o main_mpi.o scala.o \
asa.o lattice.o poscar.o ini.o setex.o radial.o \
pseudo.o mgrid.o mkpoints.o wave.o wave_mpi.o $(BASIC) \
nonl.o nonlr.o dfast.o choleski2.o \
mix.o charge.o xcgrad.o xcspin.o potex1.o potex2.o \
metagga.o constrmag.o pot.o cl_shift.o force.o dos.o elf.o \
tet.o hamil.o steep.o \
chain.o dyna.o relativistic.o LDApU.o sphpro.o paw.o us.o \
ebs.o wavpre.o wavpre_noio.o broyden.o \
dynbr.o rmm-diis.o reader.o writer.o tutor.o xml_writer.o \
brent.o stufak.o fileio.o opergrid.o stepver.o \
dipol.o xclib.o chgloc.o subrot.o optreal.o davidson.o \
edtest.o electron.o shm.o pardens.o paircorrection.o \
optics.o constr_cell_relax.o stm.o finite_diff.o \
elpol.o setlocalpp.o

INC=

vasp: $(SOURCE) $(FFT3D) $(INC) main.o
	rm -f vasp
	$(FCL) -o vasp main.o $(SOURCE) $(FFT3D) $(LIB) $(LINK)
makeparam: $(SOURCE) $(FFT3D) makeparam.o main.F $(INC)
	$(FCL) -o makeparam makeparam.o $(SOURCE) $(FFT3D) $(LIB) $(LINK)
zgemmtest: zgemmtest.o base.o random.o $(INC)
	$(FCL) -o zgemmtest zgemmtest.o random.o base.o $(LIB) $(LINK)
dgemmtest: dgemmtest.o base.o random.o $(INC)
	$(FCL) -o dgemmtest dgemmtest.o random.o base.o $(LIB) $(LINK)
ffttest: base.o smart_allocate.o mpi.o mgrid.o random.o ffttest.o $(FFT3D) $(INC)
	$(FCL) -o ffttest ffttest.o mpi.o mgrid.o random.o smart_allocate.o base.o $(FFT3D) $(LIB) $(LINK)
kpoints: $(SOURCE) $(FFT3D) makekpoints.o main.F $(INC)
	$(FCL) -o kpoints makekpoints.o $(SOURCE) $(FFT3D) $(LIB) $(LINK)

clean:
	-rm -f *.g *.f *.o *.L *.mod ; touch *.F

main.o: main$(SUFFIX)
	$(FC) $(FFLAGS) $(DEBUG) $(INCS) -c main$(SUFFIX)
xcgrad.o: xcgrad$(SUFFIX)
	$(FC) $(FFLAGS) $(INLINE) $(INCS) -c xcgrad$(SUFFIX)
xcspin.o: xcspin$(SUFFIX)
	$(FC) $(FFLAGS) $(INLINE) $(INCS) -c xcspin$(SUFFIX)

makeparam.o: makeparam$(SUFFIX)
	$(FC) $(FFLAGS)$(DEBUG) $(INCS) -c makeparam$(SUFFIX)

makeparam$(SUFFIX): makeparam.F main.F
#
# MIND: I do not have a full dependency list for the include
# and MODULES: here are only the minimal basic dependencies
# if one strucuture is changed then touch_dep must be called
# with the corresponding name of the structure
#
base.o: base.inc base.F
mgrid.o: mgrid.inc mgrid.F
constant.o: constant.inc constant.F
lattice.o: lattice.inc lattice.F
setex.o: setexm.inc setex.F
pseudo.o: pseudo.inc pseudo.F
poscar.o: poscar.inc poscar.F
mkpoints.o: mkpoints.inc mkpoints.F
wave.o: wave.inc wave.F
nonl.o: nonl.inc nonl.F
nonlr.o: nonlr.inc nonlr.F

$(OBJ_HIGH):
	$(CPP)
	$(FC) $(FFLAGS) $(OFLAG_HIGH) $(INCS) -c $*$(SUFFIX)
$(OBJ_NOOPT):
	$(CPP)
	$(FC) $(FFLAGS) $(INCS) -c $*$(SUFFIX)

fft3dlib_f77.o: fft3dlib_f77.F
	$(CPP)
	$(F77) $(FFLAGS_F77) -c $*$(SUFFIX)

.F.o:
	$(CPP)
	$(FC) $(FFLAGS) $(OFLAG) $(INCS) -c $*$(SUFFIX)
.F$(SUFFIX):
	$(CPP)
$(SUFFIX).o:
	$(FC) $(FFLAGS) $(OFLAG) $(INCS) -c $*$(SUFFIX)

# special rules
#-----------------------------------------------------------------------
# these special rules are cummulative (that is once failed
# in one compiler version, stays in the list forever)
# -tpp5|6|7 P, PII-PIII, PIV
# -xW use SIMD (does not pay of on PII, since fft3d uses double prec)
# all other options do no affect the code performance since -O1 is used
#-----------------------------------------------------------------------


Compilation initially failed with errors in tet.F and tutor.F:
In file tet.f90:56

IMPLICIT COMPLEX(q) (C)
1
Error: Letter 'Q' already set in IMPLICIT statement at (1)
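The construct named in this error can be tried outside VASP with a small test file. The following is only a minimal sketch: the module name prec, the definition of q via SELECTED_REAL_KIND(10), and the file name implicit_repro.f90 are assumptions made for illustration, not copies of the VASP source. The two IMPLICIT lines are in the order that produced the error above.

```shell
# Write a minimal free-form Fortran file that uses the same IMPLICIT style.
# The module name "prec" and the kind definition are assumptions.
cat > implicit_repro.f90 <<'EOF'
MODULE prec
  INTEGER, PARAMETER :: q = SELECTED_REAL_KIND(10)
END MODULE prec

PROGRAM implicit_repro
  USE prec
  IMPLICIT REAL(q) (A-B,D-H,O-Z)
  IMPLICIT COMPLEX(q) (C)
  AVAL = 3.0_q
  CVAL = (1.0_q, 2.0_q)
  WRITE(*,*) AVAL, CVAL
END PROGRAM implicit_repro
EOF

# Compile only if the chosen compiler is installed (FC defaults to g95).
FC=${FC:-g95}
if command -v "$FC" >/dev/null 2>&1; then
    "$FC" -ffree-form -c implicit_repro.f90 && echo "compiled cleanly"
else
    echo "$FC not found; implicit_repro.f90 written but not compiled"
fi
```

Running the same script with FC set to another compiler gives a quick comparison. If one compiler rejects the file and another accepts it, that points at the compiler rather than at VASP.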


I modified these two files so that the IMPLICIT COMPLEX(q) (C) line comes before IMPLICIT REAL(q) (A-B, D-H, O-Z). After that the compilation went through, but with plenty of warnings like the one below:

In file us.f90:585

LMDIM,CRHODE, CHTOT,CHDEN, IRDMAX)
1
In file us.f90:1371

LMDIM,CRHODE, CHTOT_,CHDEN, IRDMAX )
2
Warning (155): Inconsistent types (COMPLEX(8)/REAL(8)) in actual argument lists at (1) and (2)


Then I ran the benchmark on the master node with the command vasp. The run does not get past 'entering main loop', as shown below:

running on 1 nodes
distr: one band on 1 nodes, 1 groups
vasp.4.6.28 25Jul05 complex
POSCAR found : 1 types and 8 ions
WARNING: mass on POTCAR and INCAR are incompatible
typ 1 Mass 63.55 63.546
LDA part: xc-table for Ceperly-Alder, standard interpolation

-----------------------------------------------------------------------------
| |
| W W AA RRRRR N N II N N GGGG !!! |
| W W A A R R NN N II NN N G G !!! |
| W W A A R R N N N II N N N G !!! |
| W WW W AAAAAA RRRRR N N N II N N N G GGG ! |
| WW WW A A R R N NN II N NN G G |
| W W A A R R N N II N N GGGG !!! |
| |
| VASP found 21 degrees of freedom |
| the temperature will equal 2*E(kin)/ (degrees of freedom) |
| this differs from previous releases, where T was 2*E(kin)/(3 NIONS). |
| The new definition is more consistent |
| |
-----------------------------------------------------------------------------

POSCAR, INCAR and KPOINTS ok, starting setup
WARNING: wrap around errors must be expected
FFT: planning ... 4
reading WAVECAR
prediction of wavefunctions initialized - no I/O
entering main loop
N E dE d eps ncg rms rms(c)


Can anybody tell me what the problem is?

tjf
Full Member
Posts: 107
Joined: Wed Aug 10, 2005 1:30 pm
Location: Leiden, Netherlands

#2 Post by tjf » Thu May 04, 2006 12:03 pm

How old is your g95? There was a g95 issue with the style of implicit typing used in the VASP source. It should have been fixed a while ago (several months?), and it had entered the g95 tree some unknown time before that. (VASP compiled fine with g95 at one time before the issue appeared, and I built it successfully after the fix.) Maybe it has reared its head again?
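A quick way to answer the version question is to print the compiler banners and the command line behind the MPI wrapper. This is a rough sketch: the helper name report_tool is made up for illustration, and the -show option is the one provided by MPICH's compiler wrappers (other MPI implementations may use a different option).

```shell
# Print the first line of a tool's version banner, or a note if it is missing.
# report_tool is a hypothetical helper name used only in this sketch.
report_tool() {
    tool=$1
    if command -v "$tool" >/dev/null 2>&1; then
        "$tool" --version 2>&1 | head -n 1
    else
        echo "$tool: not found in PATH"
    fi
}

report_tool g95
report_tool gfortran

# MPICH's wrappers print the underlying compile command with -show,
# which reveals which Fortran compiler mpif90 actually calls.
if command -v mpif90 >/dev/null 2>&1; then
    mpif90 -show
fi
```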

Presumably the serial binary shows the same problem? (Please confirm.) You really should start there; it takes various complications out of the picture.




jialy_25
Newbie
Posts: 8
Joined: Fri Apr 21, 2006 9:00 am

#3 Post by jialy_25 » Sat May 06, 2006 9:01 am

I have tried compiling the serial version of VASP. Of course, the same inconsistent-type warnings were still there, and the serial executable has the same problem as the parallel version that I described above. The g95 version I am using is 4.0.1; I downloaded the prebuilt binaries in March 2006.

Is this a problem with g95?

tjf
Full Member
Posts: 107
Joined: Wed Aug 10, 2005 1:30 pm
Location: Leiden, Netherlands

#4 Post by tjf » Sat May 06, 2006 11:17 am

[quote="jialy_25"]Is this a problem with g95?[/quote]


It may well be. If it were me, I'd:
a) Download the latest version of g95 and try again.
b) Try changing REAL(q) and COMPLEX(q) to REAL(kind=q) and COMPLEX(kind=q) everywhere (this has worked in the past).
c) Try to hack together a small piece of code using the same construct to see if you can easily show g95 is broken. Then file a bug report.
d) Use gfortran.
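Option b) can be done mechanically with sed. The sketch below is hedged: the function name kindify and the output directory kindified are made up for illustration, only the exact upper-case spellings REAL(q) and COMPLEX(q) are matched, and the result should be reviewed with diff before building, since include files and macros are not covered.

```shell
# Rewrite the ambiguous kind selectors on stdin to the explicit kind= form.
# Only the exact upper-case spellings REAL(q) and COMPLEX(q) are matched.
# kindify is a hypothetical helper name used only in this sketch.
kindify() {
    sed -e 's/REAL(q)/REAL(kind=q)/g' \
        -e 's/COMPLEX(q)/COMPLEX(kind=q)/g'
}

# Preview on the line quoted in the error message
echo 'IMPLICIT COMPLEX(q) (C)' | kindify

# To convert a source tree, work on copies (hypothetical directory name):
#   mkdir -p kindified
#   for f in *.F; do kindify < "$f" > "kindified/$f"; done
```

Writing the converted files to a separate directory keeps the original sources intact, so the two trees can be compared and the change is easy to undo.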
