Registered Member #712
Joined: Thu Jun 08 2006, 07:33PM
posts 4
I am a sysadmin helping a user install VASP on our Linux (RHEL 4.0) Opteron cluster. The compiler is PGI F90 6.1 and the MPI library is Open MPI 1.0.2. The serial version builds and runs just fine, but the parallel version gives the following error:
running on 2 nodes
[nyx.engin.umich.edu:28430] *** An error occurred in MPI_Cart_create
[nyx.engin.umich.edu:28430] *** on communicator MPI_COMM_WORLD
[nyx.engin.umich.edu:28430] *** MPI_ERR_OTHER: known error not in list
[nyx.engin.umich.edu:28430] *** MPI_ERRORS_ARE_FATAL (goodbye)
[nyx.engin.umich.edu:28431] *** An error occurred in MPI_Cart_create
[nyx.engin.umich.edu:28431] *** on communicator MPI_COMM_WORLD
[nyx.engin.umich.edu:28431] *** MPI_ERR_OTHER: known error not in list
[nyx.engin.umich.edu:28431] *** MPI_ERRORS_ARE_FATAL (goodbye)
1 additional process aborted (not shown)
This is a regular OMPI error, and I have contacted the Open MPI developers. I am posting here to see whether anyone else has seen this problem and, if so, how (or if) they were able to fix it.
Registered Member #712
Joined: Thu Jun 08 2006, 07:33PM
posts 4
job wrote: ... Have you compiled vasp with -i8 and the mpi library with default settings? That won't work.
I tried rebuilding Open MPI with -i8, but it chokes at configure (I have contacted the developers).
I also tried forcing 4-byte integers with -i4 and rebuilt both vasp.4.lib and vasp.4.6, and I get a new error:
[brockp@nyx VASP.run]$ mpirun -np 2 -v ./vasp
running on 2 nodes
[nyx.engin.umich.edu:31483] *** An error occurred in MPI_Cartdim_get
[nyx.engin.umich.edu:31483] *** on communicator MPI_COMM_WORLD
[nyx.engin.umich.edu:31483] *** MPI_ERR_COMM: invalid communicator
[nyx.engin.umich.edu:31483] *** MPI_ERRORS_ARE_FATAL (goodbye)
distr: one band on 1 nodes, 1 groups
1 process killed (possibly by Open MPI)
So I still have not made any progress. I also added -Ddebug to the flags, but VASP did not display anything.
Registered Member #712
Joined: Thu Jun 08 2006, 07:33PM
posts 4
The problem was solved using the following:
lam-7.1.2
Open MPI would not work with VASP. This is unfortunate, since both MPICH and LAM are no longer developed. Moving to a more up-to-date MPI library like Open MPI would be a plus in the future. I'm not sure whether Open MPI or VASP is causing the problem, so I will pass it on to the Open MPI developers and see if we can fix it.
PGI 6.1 -i4
Matching the size of LOGICALs and such was a real pain. It's not documented anywhere, but the default PGI makefile for Linux has -i8 in it. This caused quite a headache. The integer size MUST match what your MPI library was built with.
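One quick way to spot the mismatch described above is to scan the makefiles for the integer-size flags before building. This is only a minimal sketch: the function name find_int_flags is made up for illustration, and the makefile paths in the example are assumptions about the build tree layout.

```shell
#!/bin/sh
# Hypothetical helper (not part of VASP or PGI): list every line in the
# given makefiles that contains -i4 or -i8 as a standalone flag.
find_int_flags() {
    # -H prints the file name, -n the line number.
    # The pattern only matches -i4 / -i8 surrounded by whitespace
    # or sitting at the start or end of a line.
    grep -Hn -E '(^|[[:space:]])-i[48]([[:space:]]|$)' "$@"
}

# Example (paths are assumptions; adjust to your own tree):
# find_int_flags vasp.4.lib/makefile vasp.4.6/makefile
```

If this prints an -i8 line while the MPI library was built with default (4-byte) integers, that is the mismatch to fix. It does not inspect the MPI library itself, only the flags in the makefiles.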
VASP is now running under MPI. Goto BLAS was very slow on the example case I had (I don't know why); ACML 3.5 was slightly faster than ATLAS.
This was on Opteron 244 nodes with non-blocking Gigabit Ethernet and jumbo frames. I hope this helps someone else. I can provide makefiles to anyone having trouble.
Brock
1(734)936-1985
Center for Advanced Computing
University of Michigan (Ann Arbor)
Registered Member #111
Joined: Tue Mar 22 2005, 07:45AM
posts 10
I ran into a similar problem. In my case, I could solve it as follows:
Set the compiler to its full path, like this:
FC=/usr/foo/bar/bin/mpif90
'FC=mpif90' resolved through $PATH doesn't work: a shared library is missing. I don't know why.
Another option is to link statically (libmpi.a, liborte.a and libopal.a in the Open MPI case). Remember to copy the header files from Open MPI's 'include' directory.
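To check both points, something like the following may help. It is a sketch only: wrapper_path and missing_libs are hypothetical helper names, and ./vasp is an assumed location for the binary. The first helper prints the full path of the wrapper so it can be pasted into the makefile; the second filters ldd output for libraries the loader cannot find.

```shell
#!/bin/sh
# Hypothetical helpers for the two workarounds above.

# Print the absolute path of the MPI compiler wrapper found on $PATH
# (defaults to mpif90), for use as FC=... in the makefile.
wrapper_path() {
    command -v "${1:-mpif90}"
}

# Read `ldd` output on stdin and print the name of every shared
# library that the dynamic loader reports as "not found".
missing_libs() {
    awk '/not found/ { print $1 }'
}

# Example (./vasp is an assumption for where the binary lives):
# echo "FC=$(wrapper_path mpif90)"
# ldd ./vasp | missing_libs
```

If missing_libs prints anything, the directory holding that library probably needs to be in LD_LIBRARY_PATH at run time, or you can link statically instead.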