Error while running ABINIT in parallel  [SOLVED]

gvtheen
Posts: 4
Joined: Sun Dec 02, 2012 12:13 pm

Error while running ABINIT in parallel

Post by gvtheen » Sun Dec 09, 2012 3:17 am

Respected Sir and abinit users,
Greetings.
I have successfully compiled abinit 6.10.1 in parallel on Linux TC5000 2.6.18-164.el5.
However, when I run the code (/public/program/mpi/openmpi/1.6.3/bin/mpirun --mca btl ^openib -np 4 /public/program/abinit-6.10.1/bin/abinit < t11.file > log), I get the following error:
"
MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD
with errorcode 14.

NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun has exited due to process rank 0 with PID 21771 on
node TC5000 exiting improperly. There are two reasons this could occur:

1. this process did not call "init" before exiting, but others in
the job did. This can cause a job to hang indefinitely while it waits
for all processes to call "init". By rule, if one process calls "init",
then ALL processes must call "init" prior to termination.

2. this process called "init", but exited without calling "finalize".
By rule, all processes that call "init" MUST call "finalize" prior to
exiting or it will be considered an "abnormal termination"

This may have caused other processes in the application to be
terminated by signals sent by mpirun (as reported here).

"
My t11.in is a copy of the test file ~/abinit-6.10.1/tests/tutorial/Input/t11.in.
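
For reference, t11.file is the usual ABINIT "files" file that is redirected to abinit on stdin. Mine follows the layout from the tutorial (main input, main output, three filename roots, then the pseudopotential); the roots and the pseudopotential path below are only indicative, so check them against your own tree:
t11.in
t11.out
t11i
t11o
t11
~/abinit-6.10.1/tests/Psps_for_tests/01h.pspgga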

1. Compile openmpi-1.6.3:
#./configure CC=/root/gcc-4.5.1/gcc-4.5.0/bin/gcc CXX=/root/gcc-4.5.1/gcc-4.5.0/bin/g++ F77=/root/gcc-4.5.1/gcc-4.5.0/bin/gfortran FC=/root/gcc-4.5.1/gcc-4.5.0/bin/gfortran --enable-static --disable-shared --prefix=/public/program/mpi/openmpi/1.6.3 --enable-mpirun-prefix-by-default --enable-mca-no-build=maffinity-libnuma,openib --with-tm=/opt/gridview/pbs/dispatcher
# make
# make install

The installation passes a basic test, as follows:
# /public/program/mpi/openmpi/1.6.3/bin/mpirun --mca btl ^openib -np 4 hello_c
Hello, world, I am 0 of 4
Hello, world, I am 1 of 4
Hello, world, I am 2 of 4
Hello, world, I am 3 of 4
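
As an extra sanity check (optional, and assuming the wrapper compilers were installed under the same prefix), the Open MPI wrappers can be asked which underlying compilers they call, to make sure the intended gcc/gfortran is picked up:
# /public/program/mpi/openmpi/1.6.3/bin/mpif90 --showme
# /public/program/mpi/openmpi/1.6.3/bin/ompi_info | grep -i fortran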

2. Compile abinit-6.10.1
#./configure --prefix=/public/program/abinit-6.10.1 --with-mpi-prefix=/public/program/mpi/openmpi/1.6.3 --enable-mpi=yes --enable-vdwxc=yes
# make
# make mj4

############################
Best wishes!
gvtheen

Alain_Jacques
Posts: 279
Joined: Sat Aug 15, 2009 9:34 pm
Location: Université catholique de Louvain - Belgium

Re: Error while running ABINIT in parallel  [SOLVED]

Post by Alain_Jacques » Thu Feb 07, 2013 8:29 pm

Hi gvtheen,

there could be many reasons why an MPI job crashes, and there is very little information here to diagnose it. Thanks for choosing the tutorial t11.in example - as far as I remember, a hydrogen molecule in a large box with only ONE k-point - as a relevant test for parallelization.

Did you read the Open MPI FAQ about static linking? I gather you don't want the InfiniBand transfer layer, but which one works then?
Did you run the test suite? Did you have any success with an example from the tutorial on parallel execution?
Did you check the log file to figure out where abinit crashed?
Did you try to run the code with only one MPI process (see the commands sketched below)? Did you try a sequential build?
Did you provide a config.log?
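
For the single-process and sequential checks, something along these lines would already tell a lot (the paths are simply the ones from your post, adapt them to your setup):
/public/program/mpi/openmpi/1.6.3/bin/mpirun -np 1 /public/program/abinit-6.10.1/bin/abinit < t11.file > log_np1 2>&1
/public/program/abinit-6.10.1/bin/abinit < t11.file > log_seq 2>&1
tail -n 50 log_np1 log_seq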

Kind regards,

Alain
