Abinit 9.0.4, linalg segfault on cluster  [SOLVED]

Post by pomax » Mon Aug 17, 2020 5:41 pm

Hello everybody,
I am trying to compile the git version of Abinit 9.0.4 on the Beluga cluster.

I am able to configure, build (make) and install, and both abinit --version and abinit --build work. However, whenever I try to run a simulation, I get a segmentation fault inside the linalg module, even with a single process. I've added the stack trace at the end of the post; the configure output, the ac9 file and config.log are attached.

I've tried configuring both directly in the shell and from an interactive session. This is the command I use:
../configure --with-mpi -enable-openmp --with-config-file=olivier.ac9 --prefix="/path/to/Installation/folder/"

In the log, I get 2 errors in the linalg section:
1. ELPA is not found.
2. <lapacke.h> is not found when checking for LAPACKE C API support.

Does somebody have an idea where the error could be coming from and how to fix it?

Thank you,
Olivier

Note: I've disabled MPI-IO to help pinpoint the error.


==== backtrace ====
0 0x0000000000010e90 __funlockfile() ???:0
1 0x0000000000097201 PMPI_Comm_size() ???:0
2 0x0000000000029de9 MKLMPI_Comm_size() ???:0
3 0x0000000000027fb1 mkl_blacs_init() ???:0
4 0x0000000000027ef8 Cblacs_pinfo() ???:0
5 0x00000000000187f9 blacs_gridmap_() ???:0
6 0x00000000000181ce blacs_gridinit_() ???:0
7 0x00000000025bc394 m_slk_mp_init_scalapack_() ???:0
8 0x000000000252a26b m_abi_linalg_mp_abi_linalg_init_() ???:0
9 0x000000000041bda7 m_driver_mp_driver_() ???:0
10 0x000000000040b687 MAIN__() ???:0
11 0x000000000040a0fe main() ???:0
12 0x00000000000202e0 __libc_start_main() ???:0
13 0x000000000040a01a _start() /tmp/nix-build-glibc-2.24.drv-0/glibc-2.24/csu/../sysdeps/x86_64/start.S:120
===================
Attachments:
config.log: config log (377.32 KiB)
ac9.log: olivier.ac9 (43.73 KiB)
output.log: output of configure (51.47 KiB)

Re: Abinit 9.0.4, linalg segfault on cluster  [SOLVED]

Post by jbeuken » Tue Aug 18, 2020 11:10 am

Hi,

Here is your ac9 file without comments:

CC="mpicc"
CFLAGS="-O2 -xCore-AVX512 -ftz -fp-speculation=safe -fp-model source -mkl=cluster"
CXX="mpic++"
CXXFLAGS="-O2 -xCore-AVX512 -ftz -fp-speculation=safe -fp-model source -mkl=cluster"
FC="mpif90"
FCFLAGS="-O2 -xCore-AVX512 -ftz -fp-speculation=safe -fp-model source -mkl=cluster"

with_mpi="yes"
with_mpi_flavor="auto"
enable_mpi_inplace="yes"
enable_mpi_io="no"

with_linalg="/cvmfs/soft.computecanada.ca/easybuild/software/2017/Core/imkl/2018.3.222"
LINALG_LIBS="-L/cvmfs/soft.computecanada.ca/easybuild/software/2017/Core/imkl/2018.3.222/mkl/lib -llapack -lblas -lscalapack"

with_libxc="/cvmfs/soft.computecanada.ca/easybuild/software/2017/avx2/Compiler/intel2018.3/libxc/4.3.4"
with_hdf5="/cvmfs/soft.computecanada.ca/easybuild/software/2017/avx512/Compiler/intel2018.3/hdf5/1.10.3"
H5CC="/cvmfs/soft.computecanada.ca/easybuild/software/2017/avx512/Compiler/intel2018.3/hdf5/1.10.3/bin/h5cc"
with_netcdf="/cvmfs/soft.computecanada.ca/easybuild/software/2017/avx512/Compiler/intel2018.3/netcdf/4.6.1"
with_netcdf_fortran="/cvmfs/soft.computecanada.ca/easybuild/software/2017/avx512/Compiler/intel2018.3/netcdf-fortran/4.4.4"

and the tail of output.log:

Core build parameters
---------------------

  * C compiler       : intel version 18.0
  * Fortran compiler : intel version 18.0
  * architecture     : intel xeon (64 bits)
  * debugging        : basic
  * optimizations    : standard

  * OpenMP enabled   : yes (collapse: yes)
  * MPI    enabled   : yes (flavor: auto)
  * MPI    in-place  : yes
  * MPI-IO enabled   : no
  * GPU    enabled   : no (flavor: none)

  * LibXML2 enabled  : no
  * HDF5 enabled     : yes (MPI support: no)
  * NetCDF enabled   : yes (MPI support: no)
  * NetCDF-F enabled : yes (MPI support: no)

  * FFT flavor       : dfti (libs: auto-detected)
  * LINALG flavor    : mkl (libs: auto-detected)

I'm not familiar with a linalg configuration like this, and I don't know the "-mkl=cluster" option.

But I think I see a problem...

At the end of the output, we see:

LINALG flavor : mkl (libs: auto-detected)

However, you configured LINALG explicitly in the ac9 file:

with_linalg="/cvmfs/soft.computecanada.ca/easybuild/software/2017/Core/imkl/2018.3.222"
LINALG_LIBS="-L/cvmfs/soft.computecanada.ca/easybuild/software/2017/Core/imkl/2018.3.222/mkl/lib -llapack -lblas -lscalapack"

If this information were correct, the output would have been:

* LINALG flavor    : mkl (libs: user-defined)

I think the LINALG_LIBS path is wrong.
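Which of the two cases a build falls into can be read straight from the saved configure output. A minimal sketch, assuming the output was redirected to a file such as output.log; linalg_libs_source is a hypothetical helper, not part of Abinit:

```shell
#!/bin/sh
# linalg_libs_source: hypothetical helper (not part of Abinit).
# Prints the text inside "(libs: ...)" on the "LINALG flavor" line of a
# saved configure summary, e.g. "user-defined" or "auto-detected".
linalg_libs_source() {
    sed -n 's/.*LINALG flavor.*(libs: \([^)]*\)).*/\1/p' "$1"
}

# Example (assumes the configure output was saved to output.log):
# linalg_libs_source output.log
```

Following the reasoning above, the build in this thread would print auto-detected, whereas a LINALG_LIBS that configure actually accepted should give user-defined.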

If you run this command:

ls /cvmfs/soft.computecanada.ca/easybuild/software/2017/Core/imkl/2018.3.222/mkl/lib

do you see the libraries (liblapack, ...)?
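The same check can be scripted for every library named in LINALG_LIBS. A minimal sketch in POSIX sh: check_linalg_libs is a hypothetical helper, it only looks for libNAME.so or libNAME.a in the -L directories (a simplification of what the linker really does), and it assumes the paths contain no spaces:

```shell
#!/bin/sh
# check_linalg_libs: hypothetical helper, not part of Abinit or configure.
# Given a LINALG_LIBS-style string, report for every -lNAME whether
# libNAME.so or libNAME.a exists in one of the -L directories.
# Simplifications: no default linker paths, no versioned .so names,
# and paths must not contain spaces. Returns 1 if any library is missing.
check_linalg_libs() {
    dirs=""
    names=""
    for flag in $1; do              # unquoted on purpose: split on spaces
        case "$flag" in
            -L*) dirs="$dirs ${flag#-L}" ;;
            -l*) names="$names ${flag#-l}" ;;
        esac
    done
    status=0
    for name in $names; do
        found=""
        for dir in $dirs; do
            if [ -e "$dir/lib$name.so" ] || [ -e "$dir/lib$name.a" ]; then
                found="$dir"
                break               # first matching -L directory wins
            fi
        done
        if [ -n "$found" ]; then
            echo "lib$name: found in $found"
        else
            echo "lib$name: NOT FOUND"
            status=1
        fi
    done
    return "$status"
}

# Example (after sourcing the ac9 file so that LINALG_LIBS is set):
# check_linalg_libs "$LINALG_LIBS"
```

Each missing library is reported as NOT FOUND and the function then returns 1, so it can also be used in a script before running configure.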

On "my" cluster, the path ends in mkl/lib/intel64.
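Assuming the libraries do live under intel64 on Beluga as well, only the -L path of LINALG_LIBS needs to change. A sketch of the LINALG lines of the ac9 file under that assumption, with the library names kept exactly as in the original file:

```shell
# LINALG section of the ac9 file, with intel64 appended to the -L path.
# Everything else is unchanged from the original file.
with_linalg="/cvmfs/soft.computecanada.ca/easybuild/software/2017/Core/imkl/2018.3.222"
LINALG_LIBS="-L/cvmfs/soft.computecanada.ca/easybuild/software/2017/Core/imkl/2018.3.222/mkl/lib/intel64 -llapack -lblas -lscalapack"
```

After reconfiguring, the summary line should then read "LINALG flavor : mkl (libs: user-defined)".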

jmb
------
Jean-Michel Beuken
Computer Scientist

Re: Abinit 9.0.4, linalg segfault on cluster

Post by pomax » Tue Aug 18, 2020 5:00 pm

Thanks for the quick reply.
You are right: I was just missing intel64 at the end of my library path.
Everything seems to work fine now.

Best regards,
Olivier
