MPI_Abort error in Abinit 8.0.8  [SOLVED]

Postby sheng » Thu Jul 28, 2016 10:54 am

As the title says, I encounter an MPI_Abort error in one of my supercell calculations.
The Abinit 8.0.8 build I use was compiled with Intel MPI (ifort 15.0) and passed the entire test suite.
The supercell was generated with Phonopy for finite-difference calculations. The previous few supercells ran without any problem.

The errors produced are related to MPI_Abort, for example:
Code:
application called MPI_Abort(MPI_COMM_WORLD, 13) - process 63
application called MPI_Abort(MPI_COMM_WORLD, 13) - process 68
application called MPI_Abort(MPI_COMM_WORLD, 13) - process 71
application called MPI_Abort(MPI_COMM_WORLD, 13) - process 62
application called MPI_Abort(MPI_COMM_WORLD, 13) - process 66
application called MPI_Abort(MPI_COMM_WORLD, 13) - process 67
application called MPI_Abort(MPI_COMM_WORLD, 13) - process 60
application called MPI_Abort(MPI_COMM_WORLD, 13) - process 65
application called MPI_Abort(MPI_COMM_WORLD, 13) - process 69
application called MPI_Abort(MPI_COMM_WORLD, 13) - process 70
application called MPI_Abort(MPI_COMM_WORLD, 13) - process 61
INTERNAL ERROR: invalid error code 78ea36 (Ring ids do not match) in MPIR_Allreduce_impl:1262
INTERNAL ERROR: invalid error code 58ea36 (Ring ids do not match) in MPIR_Allreduce_impl:1262
INTERNAL ERROR: invalid error code 58ea36 (Ring ids do not match) in MPIR_Allreduce_impl:1262
INTERNAL ERROR: invalid error code 68ea36 (Ring ids do not match) in MPIR_Allreduce_impl:1262
Fatal error in MPI_Allreduce: Other MPI error, error stack:
MPI_Allreduce(1421)......: MPI_Allreduce(sbuf=0xf2a2ea0, rbuf=0xf6941c0, count=516706, MPI_DOUBLE_PRECISION, MPI_SUM, comm=0x84000004) failed
MPIR_Allreduce_impl(1262):

The errors only appear when I activate KGB parallelization with certain processor distributions. For example, the following distribution causes the errors:
Code:
paral_kgb 1   npkpt 7 npband 12 npfft 1

but the following processor distribution runs without problems:
Code:
paral_kgb 1   npkpt 14 npband 4 npfft 1

All other parameters are unchanged.

The log file and input files are attached.
Thank you.
Attachments: log.log (209.67 KiB), BaTiO3.in.log (3.1 KiB)

Re: MPI_Abort error in Abinit 8.0.8  [SOLVED]

Postby Jordan » Fri Jul 29, 2016 9:51 am

Hi,

This probably means you are hitting the usual issue with the LOBPCG algorithm, which is automatically activated with paral_kgb. Changing the processor distribution changes the way the algorithm works, which can either avoid or trigger the error.
There is currently no cure available in the public version of Abinit, but hopefully this will change soon.
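
Until then, trying a different distribution is the practical workaround. The sketch below is a rough aid for that, not part of Abinit: it lists the (npkpt, npband, npfft) triples for a given number of MPI processes, assuming the process count must equal npkpt * npband * npfft (other factors such as npspinor are ignored here). The helper name `kgb_distributions` is hypothetical. Under that assumption, the failing layout in this thread corresponds to 84 processes and the working one to 56.

```python
def kgb_distributions(nproc, max_npfft=1):
    """Return all (npkpt, npband, npfft) triples whose product is nproc.

    Hypothetical helper, not part of Abinit. It only enforces the product
    rule; any further constraints Abinit applies are not checked here.
    """
    triples = []
    for npfft in range(1, max_npfft + 1):
        if nproc % npfft != 0:
            continue
        remaining = nproc // npfft
        # Every divisor of the remaining count gives one npkpt/npband split.
        for npkpt in range(1, remaining + 1):
            if remaining % npkpt == 0:
                triples.append((npkpt, remaining // npkpt, npfft))
    return triples


if __name__ == "__main__":
    # 84 = 7 * 12 * 1 is the failing layout from the thread,
    # 56 = 14 * 4 * 1 is the working one.
    for nproc in (84, 56):
        for npkpt, npband, npfft in kgb_distributions(nproc):
            print(f"nproc={nproc}: paral_kgb 1   npkpt {npkpt} "
                  f"npband {npband} npfft {npfft}")
```

Each printed line is only a candidate to try; whether a given layout avoids the error still has to be checked by running the calculation.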

Cheers

Jordan

Re: MPI_Abort error in Abinit 8.0.8

Postby sheng » Fri Jul 29, 2016 10:28 am

Thanks Jordan for the clarification.
