error while running the code

Post by guptasanjay.85 » Fri Apr 20, 2012 4:52 pm

Respected Sir and abinit users,
Greetings.
We have successfully compiled abinit-6.12.2 and abinit-6.12.3 in parallel on an HPC cluster.
However, when we run the code, it gives the error below.
We have also discussed this error with the administrator, but it has not yet been resolved.
The error is given below:
--------------------------------------------------------------------------
MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD
with errorcode 14.

NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun has exited due to process rank 0 with PID 25491 on
node compute-2-6.local exiting without calling "finalize". This may
have caused other processes in the application to be
terminated by signals sent by mpirun (as reported here).
--------------------------------------------------------------------------
Please advise.
I look forward to your reply.
With kind regards,

Sanjay D. Gupta
Bhavnagar University,
Bhavnagar

Post by pouillon » Fri Apr 20, 2012 7:49 pm

It will be difficult for us to help you without knowing in more detail what you are trying to do: input file, MPI vendor and version, etc.

At first glance, some of your default MPI parameters might have to be tuned. How to do it will be explained in your MPI documentation.
Yann Pouillon
Simune Atomistics
Donostia-San Sebastián, Spain
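
For readers gathering the details requested above, here is a minimal Python sketch that prints the MPI launcher version and confirms that the input file exists. It assumes the launcher is called `mpirun` and is on the PATH. The default file name `run.in` and the helper names `first_line_of` and `build_report` are hypothetical, chosen only for illustration.

```python
import os
import shutil
import subprocess
import sys


def first_line_of(command):
    """Run command (a list of strings) and return the first line it prints."""
    executable = shutil.which(command[0])
    if executable is None:
        return "not found on PATH"
    result = subprocess.run(
        [executable] + list(command[1:]),
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # some tools print their version on stderr
        universal_newlines=True,
        timeout=30,
    )
    lines = result.stdout.strip().splitlines()
    return lines[0] if lines else "(no output)"


def build_report(input_file):
    """Collect the details a forum reader needs, as a list of text lines."""
    report = ["MPI launcher: " + first_line_of(["mpirun", "--version"])]
    if os.path.isfile(input_file):
        report.append("Input file: " + input_file)
    else:
        report.append("Input file: " + input_file + " (missing)")
    return report


if __name__ == "__main__":
    # "run.in" is a hypothetical default name; pass the real path as an argument.
    path = sys.argv[1] if len(sys.argv) > 1 else "run.in"
    print("\n".join(build_report(path)))
```

Pasting this output together with the input file into a post gives the people answering a reasonable starting point. It does not replace the MPI documentation for tuning parameters.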

Post by guptasanjay.85 » Sat Apr 21, 2012 7:45 am

Respected Sir,
Thank you very much for your kind reply.
We are using Open MPI 1.3.3, and my input file is given below.

Input file
**********************************
#Set 1 : ground state self-consistency
ndtset 31 #self-consistency

getwfk1 0 # Cancel default
kptopt1 1 # Automatic generation of k points, taking into account the symmetry
nqpt1 0 # Cancel default
tolvrs1 1.0d-20 # SCF stopping criterion (modify default)
rfphon1 0 # Cancel default
prtwf1 1
prtden1 1

#Q vectors for all datasets

#Complete set of symmetry-inequivalent qpt chosen to be commensurate
# with kpt mesh so that only one set of GS wave functions is needed.
#Generated automatically by running GS calculation with kptopt=1,
# nshift=0, shiftk=0 0 0 (to include gamma) and taking output kpt set
# file as qpt set. Set nstep=1 so only one iteration runs.

nqpt 1 # One qpt for each dataset (only 0 or 1 allowed)
# This is the default for all datasets and must
# be explicitly turned off for dataset 1.
qpt2 0.00000000E+00 0.00000000E+00 0.00000000E+00
qpt3 0.00000000E+00 0.00000000E+00 0.00000000E+00
qpt4 1.25000000E-01 0.00000000E+00 0.00000000E+00
qpt5 2.50000000E-01 0.00000000E+00 0.00000000E+00
qpt6 3.75000000E-01 0.00000000E+00 0.00000000E+00
qpt7 5.00000000E-01 0.00000000E+00 0.00000000E+00
qpt8 1.25000000E-01 1.25000000E-01 0.00000000E+00
qpt9 2.50000000E-01 1.25000000E-01 0.00000000E+00
qpt10 3.75000000E-01 1.25000000E-01 0.00000000E+00
qpt11 5.00000000E-01 1.25000000E-01 0.00000000E+00
qpt12 -3.75000000E-01 1.25000000E-01 0.00000000E+00
qpt13 -2.50000000E-01 1.25000000E-01 0.00000000E+00
qpt14 -1.25000000E-01 1.25000000E-01 0.00000000E+00
qpt15 2.50000000E-01 2.50000000E-01 0.00000000E+00
qpt16 3.75000000E-01 2.50000000E-01 0.00000000E+00
qpt17 5.00000000E-01 2.50000000E-01 0.00000000E+00
qpt18 -3.75000000E-01 2.50000000E-01 0.00000000E+00
qpt19 -2.50000000E-01 2.50000000E-01 0.00000000E+00
qpt20 3.75000000E-01 3.75000000E-01 0.00000000E+00
qpt21 5.00000000E-01 3.75000000E-01 0.00000000E+00
qpt22 -3.75000000E-01 3.75000000E-01 0.00000000E+00
qpt23 5.00000000E-01 5.00000000E-01 0.00000000E+00
qpt24 3.75000000E-01 2.50000000E-01 1.25000000E-01
qpt25 5.00000000E-01 2.50000000E-01 1.25000000E-01
qpt26 -3.75000000E-01 2.50000000E-01 1.25000000E-01
qpt27 5.00000000E-01 3.75000000E-01 1.25000000E-01
qpt28 -3.75000000E-01 3.75000000E-01 1.25000000E-01
qpt29 -2.50000000E-01 3.75000000E-01 1.25000000E-01
qpt30 -3.75000000E-01 5.00000000E-01 1.25000000E-01
qpt31 -2.50000000E-01 5.00000000E-01 2.50000000E-01
#Set 2 : Response function calculation of d/dk wave function

iscf2 -3 # Need this non-self-consistent option for d/dk
kptopt2 2 # Modify default to use time-reversal symmetry
rfphon2 0 # Cancel default
rfelfd2 2 # Calculate d/dk wave function only
# tolvrs2 0.0 # Cancel default for d/dk
tolwfr2 1.0d-18 # Use wave function residual criterion instead
prtwf2 2
prtden2 2

#Set 3 : Response function calculation of Q=0 phonons and electric field pert.

getddk3 2 # d/dk wave functions from last dataset
kptopt3 2 # Modify default to use time-reversal symmetry
rfelfd3 3 # Electric-field perturbation response only

#Sets 4-10 : Finite-wave-vector phonon calculations (defaults for all datasets)

getwfk 1 # Use GS wave functions from dataset1
kptopt 3 # Need full k-point set for finite-Q response
rfphon 1 # Do phonon response
rfatpol 1 2 # Treat displacements of all atoms
rfdir 1 1 1 # Do all directions (symmetry will be used)
tolvrs 1.0d-8 # This default is active for sets 3-10
#######################################################################
#Common input variables
#Definition of the unit cell

acell 3*8.915
spgroup 225
angdeg 90 90 90
brvltt -1
# strtarget 1.24570127E+01 1.24570127E+01 1.24570127E+01 0.00000000E+00 0.00000000E+00 0.00000000E+00
# rprim 0.0 0.5 0.5 # In lessons 1 and 2, these primitive vectors
# 0.5 0.0 0.5 #(to be scaled by acell) were 1 0 0 0 1 0 0 0 1
# 0.5 0.5 0.0 # that is, the default.

#Definition of the atom types
ntypat 2 # There are two types of atom
znucl 40 6 # The keyword "znucl" refers to the atomic number of the
# possible type(s) of atom. The pseudopotential(s)
# mentioned in the "files" file must correspond
# to the type(s) of atom. Here, type 1 is zirconium (Z=40),
# type 2 is carbon (Z=6).

#Definition of the atoms
natom 2 # There are two atoms
typat 1 2 # The first is of type 1 (Zr), the second is of type 2 (C).

xred 0.0 0.0 0.0
0.5 0.5 0.5

#Gives the number of band, explicitely (do not take the default)
nband 16
#Exchange-correlation functional
ixc 9 # LDA Teter Pade parametrization
#Definition of the planewave basis set
ecut 40.0 # Maximal kinetic energy cut-off, in Hartree
nsppol 2
nspden 2
#Definition of the k-point grid
ngkpt 6 6 6
nshiftk 4 # Four shifts, listed in shiftk below
shiftk 0.0 0.0 0.5 # This gives the usual fcc Monkhorst-Pack grid
0.0 0.5 0.0
0.5 0.0 0.0
0.5 0.5 0.5

#Definition of the SCF procedure
iscf 7 # Self-consistent calculation, using algorithm 7
nstep 100 # Maximal number of SCF cycles
diemac 1.0d60 # is described in the "dielng" input variable section.
occopt 4
tsmear 0.04
#iprcel 45
ntime 50
#nline 10
#nnsclo 12 # The dielectric constant of AlAs is smaller that the one of Si (=12).
***********************************************************************************************

Please advise.

With kind regards,

Sanjay D Gupta

Post by aromero » Tue Apr 24, 2012 11:46 am

Did you try to run a very simple case, such as diamond Si, and see how the parallelization goes? Instead of running a very complicated
test, I would test something much simpler and see whether the system parallelizes well and whether the memory is OK.

Post by guptasanjay.85 » Wed Apr 25, 2012 4:25 pm

Respected Sir,
Thank you very much for your kind and valuable suggestion.
In fact, we first ran this kind of small calculation for testing purposes.
I have also tried to run a simple file, and I have the same problem running the executable in parallel.
The error extracted from the error file is given below.
****************************************
[compute-3-17.local][[65376,1],8][btl_openib_component.c:1484:init_one_device] error obtaining device attributes for mthca0 errno says Resource temporarily unavailable
--------------------------------------------------------------------------
WARNING: There was an error initializing an OpenFabrics device.

Local host: compute-3-17.local
Local device: mthca0
--------------------------------------------------------------------------
[compute-3-17.local][[65376,1],9][btl_openib_component.c:1484:init_one_device] error obtaining device attributes for mthca0 errno says Resource temporarily unavailable[compute-3-31.local:31694] 1 more process has sent help message help-mpi-btl-openib.txt / error in device init
[compute-3-31.local:31694] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages

[compute-3-17.local][[65376,1],10][btl_openib_component.c:1484:init_one_device] error obtaining device attributes for mthca0 errno says Resource temporarily unavailable
[compute-3-31.local:31694] 1 more process has sent help message help-mpi-btl-openib.txt / error in device init
[compute-3-17.local][[65376,1],11][btl_openib_component.c:1484:init_one_device] error obtaining device attributes for mthca0 errno says Resource temporarily unavailable
[compute-3-31.local:31694] 1 more process has sent help message help-mpi-btl-openib.txt / error in device init
[compute-3-17.local][[65376,1],12][btl_openib_component.c:1484:init_one_device] error obtaining device attributes for mthca0 errno says Resource temporarily unavailable
[compute-3-31.local:31694] 1 more process has sent help message help-mpi-btl-openib.txt / error in device init
[compute-3-17.local][[65376,1],13][btl_openib_component.c:1484:init_one_device] error obtaining device attributes for mthca0 errno says Resource temporarily unavailable
[compute-3-31.local:31694] 1 more process has sent help message help-mpi-btl-openib.txt / error in device init
[compute-3-17.local][[65376,1],14][btl_openib_component.c:1484:init_one_device] error obtaining device attributes for mthca0 errno says Resource temporarily unavailable
[compute-3-31.local:31694] 1 more process has sent help message help-mpi-btl-openib.txt / error in device init
[compute-3-17.local][[65376,1],15][btl_openib_component.c:1484:init_one_device] error obtaining device attributes for mthca0 errno says Resource temporarily unavailable
[compute-3-31.local:31694] 1 more process has sent help message help-mpi-btl-openib.txt / error in device init
--------------------------------------------------------------------------
MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD
with errorcode 14.

NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun has exited due to process rank 0 with PID 31697 on
node compute-3-31.local exiting without calling "finalize". This may

The error in the output file is as below:


***********************************************************

-catch_rsh /opt/gridengine/default/spool/compute-3-31/active_jobs/45664.1/pe_hostfile
compute-3-31
compute-3-31
compute-3-31
compute-3-31
compute-3-31
compute-3-31
compute-3-31
compute-3-31
compute-3-17
compute-3-17
compute-3-17
compute-3-17
compute-3-17
compute-3-17
compute-3-17
compute-3-17
CMA: unable to query RDMA device
CMA: unable to query RDMA device
CMA: unable to query RDMA device
CMA: unable to query RDMA device
CMA: unable to query RDMA device
CMA: unable to query RDMA device
CMA: unable to query RDMA device
CMA: unable to query RDMA device
Your architecture is not able to handle 8, 4 or 2-bytes FORTRAN file record markers!
You cannot use ABINIT and MPI/IO.
MPI_ERROR_STRING: MPI_ERR_UNKNOWN: unknown error

**************************************************

Herewith I am also appending the input file:

***************************************************
inputfile


#**************Crystalline ZrC_GGA_CsCl : computation of lattice parameter



ndtset 10
kptopt 1 # Option for the automatic generation of k points, taking
# into account the symmetry
nshiftk 4
shiftk 0.5 0.5 0.5 # These shifts will be the same for all grids
0.5 0.0 0.0
0.0 0.5 0.0
0.0 0.0 0.5
ngkpt 3*6
occopt 4
tsmear 0.01
# *********************

#Definition of the cell

acell: 3*5.517
acell+ 3*0.16
spgroup 221
angdeg 90 90 90
brvltt -1
#*************Definition of the atoms

ntypat 2
znucl 40 6
natom 2
typat 1 2
xred
0.0 0.0 0.0
1/2 1/2 1/2
ecut 40
ecutsm 0.5
nstep 100
tolvrs 1.0d-20
ixc 11


*****************************************

Please advise.

With kind regards,

Sanjay D Gupta

Post by guptasanjay.85 » Thu Apr 26, 2012 9:26 am

Respected Sir,
Greetings!
We have sorted out the problem.
We were running the executables from an external hard disk mounted on the HPC cluster, and abinit does not run from there,
while abinit runs well from the home directory. However, we must run abinit from the mounted external hard disk, because the space allotted in the home directory is only 5 GB.

Please advise.

With kind regards,

Sanjay D Gupta
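
Since the failure depended on which directory the job ran from, a small pre-flight check of a candidate run directory may help narrow such cases down. The Python sketch below is only an illustration: it checks that the directory exists, is writable, has some free space, and can round-trip a small scratch file. It does not reproduce ABINIT's own MPI-IO record-marker test, so a directory that passes can still fail under MPI-IO. The function name `check_workdir` and the 1 GiB default threshold are assumptions made for this example.

```python
import os
import shutil
import tempfile


def check_workdir(path, min_free_bytes=1024 ** 3):
    """Return a list of problems found with path as a run directory."""
    if not os.path.isdir(path):
        return [path + " is not a directory"]
    problems = []
    if not os.access(path, os.W_OK | os.X_OK):
        problems.append("no write permission")
    free = shutil.disk_usage(path).free
    if free < min_free_bytes:
        problems.append("only %d bytes free" % free)
    try:
        # Write a few bytes and read them back from the same directory.
        with tempfile.NamedTemporaryFile(dir=path) as handle:
            handle.write(b"abinit-probe")
            handle.flush()
            handle.seek(0)
            if handle.read() != b"abinit-probe":
                problems.append("read-back mismatch")
    except OSError as error:
        problems.append("cannot create a scratch file: %s" % error)
    return problems


if __name__ == "__main__":
    # Check the current directory; an empty list means no problem was found.
    print(check_workdir(os.getcwd()))
```

Running it once from the home directory and once from the external disk shows whether the two differ in these basic respects. If both pass, the filesystem type and mount options of the external disk are the next thing to ask the administrator about.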
