ABINIT 8.10.3 with GPU, MKL and MAGMA - segmentation fault

Post by Naga » Fri Oct 18, 2019 2:03 pm

Hi,
I have compiled ABINIT 8.10.3 with GPU support enabled, using MKL + MAGMA for linear algebra. These are the settings in my config.ac:
enable_mpi="yes"
with_mpi_level="2"
with_mpi_prefix="$MPI_HOME"
enable_gpu="yes"
with_gpu_flavor="cuda-double"
with_gpu_incs="-I$CUDA_HOME/include/"
with_gpu_libs="-L$CUDA_HOME/lib64/ -lcublas -lcufft -lcudart -lstdc++"
with_gpu_cppflags="-DHAVE_GPU_MPI"
with_linalg_flavor="mkl+magma"
with_linalg_incs="-I$MKLROOT/include/intel64/lp64 -I$MKLROOT/include -I/home/nvydyanathan/Work/DMRL/ABINIT/new-install/magma-2.5.1/build/include"
with_linalg_libs="-L$MKLROOT/lib/intel64 -lmkl_intel_lp64 -lmkl_core -lmkl_sequential -L/home/nvydyanathan/Work/DMRL/ABINIT/new-install/magma-2.5.1/build/lib -lmagma"

Modules loaded are
Currently Loaded Modulefiles:
 1) GCCcore/5.4.0
 2) binutils/2.26-GCCcore-5.4.0
 3) GCC/5.4.0-2.26
 4) OpenBLAS/0.2.18-GCC-5.4.0-2.26-LAPACK-3.6.1
 5) cuda/10.1.105
 6) slurm/16.05.0
 7) PrgEnv/GCC+OpenMPI/2018-05-24
 8) gcc/7.3.0
 9) hwloc/1.11.10
10) openmpi/2.1.3
11) mkl/2017-beta

Running make test_fast ends in a segmentation fault. The gdb session and backtrace follow:
ABINIT 8.10.3

Give name for formatted input file:
testin_fast.in
Give name for formatted output file:
testin_fast.out
Give root name for generic input files:
testin_fast_i
Give root name for generic output files:
testin_fast_o
Give root name for generic temporary files:
testin_fast_tmp

Program received signal SIGSEGV, Segmentation fault.
_gfortrani_next_record (dtp=dtp@entry=0x7fffffff7a00, done=done@entry=1) at ../../../libgfortran/io/transfer.c:3505
3505 ../../../libgfortran/io/transfer.c: No such file or directory.
Missing separate debuginfos, use: debuginfo-install glibc-2.17-222.el7.x86_64 libibverbs-41mlnx1-OFED.4.3.2.1.6.43302.x86_64 libmlx4-41mlnx1-OFED.4.1.0.1.0.43302.x86_64 libmlx5-41mlnx1-OFED.4.3.2.0.0.43302.x86_64 libnl3-3.2.28-4.el7.x86_64 libpciaccess-0.14-1.el7.x86_64 librdmacm-41mlnx1-OFED.4.2.0.1.3.43302.x86_64 librxe-41mlnx1-OFED.4.1.0.1.7.43302.x86_64 munge-libs-0.5.11-3.el7.x86_64 numactl-libs-2.0.9-7.el7.x86_64
(gdb) [dgx03:50755] 2 more processes have sent help message help-mpi-btl-openib.txt / default subnet prefix
[dgx03:50755] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages

(gdb) bt
#0 _gfortrani_next_record (dtp=dtp@entry=0x7fffffff7a00, done=done@entry=1) at ../../../libgfortran/io/transfer.c:3505
#1 0x00002aaaaabba3f3 in finalize_transfer (dtp=dtp@entry=0x7fffffff7a00) at ../../../libgfortran/io/transfer.c:3616
#2 0x00002aaaaabba589 in _gfortran_st_write_done (dtp=0x7fffffff7a00) at ../../../libgfortran/io/transfer.c:3747
#3 0x000000000148a66e in m_errors::msg_hndl (
message=<error reading variable: Asked for position 0 of stack, stack only has 0 elements on it.>,
level=<error reading variable: Asked for position 0 of stack, stack only has 0 elements on it.>,
mode_paral=<error reading variable: Asked for position 0 of stack, stack only has 0 elements on it.>,
file=<error reading variable: Asked for position 0 of stack, stack only has 0 elements on it.>, line=<optimized out>,
nodump=<optimized out>, nostop=<optimized out>, _message=<optimized out>, _level=<optimized out>, _mode_paral=<optimized out>,
_file=<optimized out>) at m_errors.F90:901

Could you please help me resolve this?

Thanks,
Naga
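
A note on the module list above: it contains both GCC/5.4.0-2.26 and gcc/7.3.0, and the crash is reported inside libgfortran, so it may be worth confirming which gfortran and which libgfortran the build actually used. Whether this is related to the segmentation fault is not established in this thread. The sketch below only lists the distinct GCC compiler modules found in a pasted listing; the function name distinct_gcc_modules is made up for illustration.

```shell
# Hypothetical helper: list the distinct GCC compiler modules found in
# `module list`-style text read from stdin.
# Matches entries such as "GCC/5.4.0-2.26" or "gcc/7.3.0", but not
# "GCCcore/..." or names that merely contain "GCC" as a suffix.
distinct_gcc_modules() {
  grep -oiE '(^|[[:space:]])gcc/[0-9]+(\.[0-9]+)*' \
    | sed 's/^[[:space:]]*//' \
    | tr '[:upper:]' '[:lower:]' \
    | sort -u
}

# Module listing copied from the post above.
modules='1) GCCcore/5.4.0 7) PrgEnv/GCC+OpenMPI/2018-05-24
2) binutils/2.26-GCCcore-5.4.0 8) gcc/7.3.0
3) GCC/5.4.0-2.26 9) hwloc/1.11.10
4) OpenBLAS/0.2.18-GCC-5.4.0-2.26-LAPACK-3.6.1 10) openmpi/2.1.3
5) cuda/10.1.105 11) mkl/2017-beta
6) slurm/16.05.0'

# Prints one line per distinct GCC compiler version.
printf '%s\n' "$modules" | distinct_gcc_modules
```

If more than one line is printed, gfortran --version and ldd on the abinit executable (filtered for gfortran) show which compiler and runtime library are really in use.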

Post by jbeuken » Fri Nov 22, 2019 10:50 am

Hi,

This is the .ac file used in our test farm:

FC_LDFLAGS_EXTRA="-Wl,-z,muldefs"
FC_LIBS="-lstdc++ -ldl"
enable_mpi="yes"
enable_mpi_io="yes"
with_mpi_prefix="${MPIHOME}"
enable_gpu="yes"
with_gpu_flavor="cuda-double"
NVCC_CFLAGS="-O3 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_30,code=sm_30 -Xptxas=-v --use_fast_math --compiler-options -O3,-fPIC"
with_linalg_flavor="mkl+magma"
with_linalg_incs="-I${MAGMA_ROOT}/include -I${MKLROOT}/include"
with_linalg_libs="-L${MAGMA_ROOT}/lib -Wl,--start-group -lmagma -lcuda -Wl,--end-group -L${MKLROOT}/lib/intel64 -lmkl_gf_lp64 -lmkl_sequential -lmkl_core -lgomp -lpthread -lm"

with_trio_flavor="netcdf"
with_netcdf_incs="-I/path_netcdf4_installed/include -I/path_netcdf4_fortran_installed/include"
with_netcdf_libs="-L/path_netcdf4_installed/lib -lnetcdff -L/path_netcdf4_fortran_installed/lib -lnetcdf"

with_dft_flavor="libxc"
with_libxc_incs="-I/path_libxc_installed/include"
with_libxc_libs="-L/path_libxc_installed/lib -lxc"
enable_gw_dpc="yes"

PS: replace "/path_[libxc,netcdf4,netcdf4_fortran]_installed" with the correct paths ;)

jmb
------
Jean-Michel Beuken
Computer Scientist
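
Comparing the two with_linalg_libs lines in this thread, one visible difference is the MKL interface layer: the first configuration links -lmkl_intel_lp64, while the test farm file links -lmkl_gf_lp64, which is the interface layer MKL provides for GNU Fortran. Whether this explains the crash is not confirmed here. A minimal sketch for pulling that flag out of a linker string follows; mkl_interface_layer is a hypothetical helper name, and the two strings are abbreviated copies of the lines quoted above.

```shell
# Hypothetical helper: print the MKL interface-layer flag found in a
# linker string (one of -lmkl_intel_lp64, -lmkl_intel_ilp64,
# -lmkl_gf_lp64, -lmkl_gf_ilp64). Prints nothing if none is present.
mkl_interface_layer() {
  printf '%s\n' "$1" | tr ' ' '\n' | grep -E '^-lmkl_(intel|gf)_i?lp64$'
}

# Abbreviated from the first post (MAGMA part omitted). Single quotes keep
# the $MKLROOT reference literal, as it appears in the .ac file.
naga_libs='-L$MKLROOT/lib/intel64 -lmkl_intel_lp64 -lmkl_core -lmkl_sequential'

# Abbreviated from the test farm file (MAGMA part omitted).
farm_libs='-L${MKLROOT}/lib/intel64 -lmkl_gf_lp64 -lmkl_sequential -lmkl_core -lgomp -lpthread -lm'

mkl_interface_layer "$naga_libs"
mkl_interface_layer "$farm_libs"
```

The test farm file also adds -lcuda next to -lmagma and sets FC_LIBS="-lstdc++ -ldl", which the first configuration does not, so the interface layer is only one of several differences to try.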
