autoparal file writing gives MPI ABORT (cluster install)





Post by fhssn1 » Thu Jul 04, 2019 5:22 am

I compiled abinit-8.10.2 with MPI and MPI-IO support on one of the XSEDE clusters (Comet, to be specific).

I used Spack for the build, and the install is in my local directory (not a system-wide install).

The package appears to be working fine in serial mode.

However, in a multi-dataset run (ndtset greater than 1) with autoparal 1, the first dataset finishes its calculation successfully and then tries to write its data files (e.g., whatever_xo_DS1_DEN). At this point, it gives an MPI ABORT. There are some cryptic messages, one of which is 'whatever_xo_DS1_DEN file does not exist'.
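To narrow down which dataset outputs were actually written before the abort, a small check like the following could be run in the job directory. This is only a minimal sketch: the naming pattern `<prefix>_DS<n>_DEN` is inferred from the single file name quoted above, and the helper name `missing_den_files`, the run directory, and the dataset count are hypothetical choices for illustration.

```python
from pathlib import Path


def missing_den_files(workdir, prefix, ndtset):
    """Return the expected per-dataset DEN file names not present in workdir.

    Assumes the naming pattern <prefix>_DS<n>_DEN, inferred from the
    single file name quoted in the post (whatever_xo_DS1_DEN).
    """
    workdir = Path(workdir)
    # Datasets are numbered from 1 to ndtset in this assumed pattern.
    expected = [f"{prefix}_DS{n}_DEN" for n in range(1, ndtset + 1)]
    return [name for name in expected if not (workdir / name).is_file()]


if __name__ == "__main__":
    # Hypothetical values: current directory, prefix from the post, 2 datasets.
    for name in missing_den_files(".", "whatever_xo", 2):
        print("missing:", name)
```

If the DS1 file is reported missing even though the log says the first dataset converged, that would point at the write step rather than at the calculation itself.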

My understanding is that MPI execution itself is fine, but something is wrong with MPI-IO.

Actually, this error also occurs with the system-wide Abinit install (an old version installed by the cluster admins). So my guess is that my own build and install are not at fault?

Am I failing to load some crucial MPI-IO-related module?

For my abinit, I load the following modules before running abinit: gnutools, intel, mvapich2_ib, fftw, libxc, abinit/8.10.2.

For system-wide abinit, I simply do a 'module load abinit' and it just works (serial part at least).

Any help is highly appreciated, especially from folks who are familiar with XSEDE clusters in general and the Comet cluster in particular.
