defective inter-node parallelism with 2017 Intel compilers

option, parallelism,...

Moderators: jbeuken, Jordan, pouillon

Forum rules
Please have a look at ~abinit/doc/config/build-config.ac in the source package for detailed and up-to-date information about the configuration of Abinit builds.
For a video explanation on how to build Abinit for Linux, please go to: http://www.youtube.com/watch?v=DppLQ-KQA68.
IMPORTANT: when an answer solves your problem, please check the little green V-like button on its upper-right corner to accept it.

defective inter-node parallelism with 2017 Intel compilers

Postby danric » Tue Mar 20, 2018 2:20 pm

Hello!

After many attempts to solve this on my own, I thought I would ask for help on this forum as well. Thank you in advance for your advice.

I would like to start using a system that only has the compilers from Intel Composer 2017. I have configured as suggested by the Intel MKL advisor and can build the abinit executables. Abinit calculates fine when running on the processors of a single node, but whenever I try to use more than one node, the geometry relaxation either refuses to converge after a few Broyden steps, crashes after a few Broyden steps, or completes, but takes longer than on a single node. So, intra-node parallelism is OK, but inter-node parallelism is not. The behavior is essentially the same with abinit versions 7.8.1, 8.2.2 and 8.6.3.

If I compile the same versions of abinit with the Intel Composer 2013 compilers (on another platform that accepts both the 2013 and 2017 versions), everything is fine: my test calculations converge and the time scales reasonably with the number of processors used. On that platform, too, the 2017 compilers still do not give me executables that run well on more than one node.

It might also be relevant to mention that I always need some tricks to get ./configure to recognize the Intel compilers. In addition to specifying the MPI prefix, I have to manually replace mpif90 -> mpiifort etc. (similarly for C and C++) inside the configure file, and also specify with_fc_vendor="intel", with_fc_version="17.0.5" etc. (similarly for C and C++) in the .ac file. If I don't do the above, either the configuration fails or I must manually copy all the .mod files gradually generated in the /src directories into the src/mods or src/incs directory for the compilation to succeed. I wonder whether some of the irregular behavior is caused by the configure script being unable to recognize the compilers by itself, as it should... though again, with the 2013 compilers it always runs fine.

I can provide more details (log files or specific build options used) if someone has an idea about what could be done in order to make inter-node parallelism work with 2017 Intel compilers.

I have seen that someone reported issues when using more than one node back in 2012, but that was with the 2013 Intel compilers, which in my case work fine, and I don't always get job crashes, so the cause is probably different.

viewtopic.php?f=3&t=1851

Many thanks for reading and for any clues.
Dan
danric
 
Posts: 6
Joined: Wed Nov 09, 2011 3:01 pm

Re: defective inter-node parallelism with 2017 Intel compile

Postby ebousquet » Tue Mar 20, 2018 9:17 pm

Dear danric,
This is probably related to the compilation flags you are using. Could you send your config.ac file or the list of configure flags you use?
Which architecture/processors are you running on?
Best wishes,
Eric
ebousquet
 
Posts: 163
Joined: Tue Apr 19, 2011 11:13 am
Location: University of Liege, Belgium

Re: defective inter-node parallelism with 2017 Intel compile

Postby ebousquet » Tue Mar 20, 2018 10:35 pm

This post might help in solving your problem:
https://forum.abinit.org/viewtopic.php?f=3&t=3391&p=10348#p10348
Eric
ebousquet
 
Posts: 163
Joined: Tue Apr 19, 2011 11:13 am
Location: University of Liege, Belgium

Re: defective inter-node parallelism with 2017 Intel compile

Postby danric » Wed Mar 21, 2018 6:56 pm

Thank you, Eric, for your kindness.

I was indeed very hopeful when trying the enable_avx_safe_mode option suggested in that post, but unfortunately the behavior is still not right. With abinit v8.6.3, it is OK when running on 1 node (18 processors), but it refuses to converge even during the first relaxation step of my test case when running with 36 processors on 2 nodes.

I run abinit on an Intel Xeon E5-2670v2 cluster and am also trying one with Intel Xeon Gold 6126 processors. On the first one I can run fine with the 2013 Intel compilers, but I only have the 2017 compilers for the second, which means I can only use one node there.

Here are the key options I usually set in my .ac file. These build options are the result of reading around the internet and my experience compiling abinit (some might be overkill, but they probably do no harm). I have also tried -O2 instead of -O3, with no improvement in the described behavior.

enable_optim="yes"
CFLAGS_OPTIM='-O3 -mtune=native -march=native -xHost'
CXXFLAGS_OPTIM='-O3 -mtune=native -march=native -xHost'
FCFLAGS_OPTIM='-O3 -mtune=native -march=native -xHost'
enable_mpi="yes"
with_mpi_prefix="...... etc /intel/compilers_and_libraries_2017.5.239/linux/mpi/intel64"
with_linalg_flavor="mkl"
with_linalg_libs=" -lmkl_scalapack_ilp64 -lmkl_cdft_core -lmkl_blacs_intelmpi_ilp64 -liomp5 -lpthread -lm -ldl"
with_fft_flavor="fftw3"
with_fft_libs=" -lmkl_scalapack_ilp64 -lmkl_cdft_core -lmkl_blacs_intelmpi_ilp64 -liomp5 -lpthread -lm -ldl"
FCFLAGS_EXTRA=" -mkl=parallel"
CFLAGS_EXTRA=" -DMKL_ILP64 -mkl=parallel"
CXXFLAGS_EXTRA=" -DMKL_ILP64 -mkl=parallel"
with_fc_vendor="intel"
with_fc_version="17.0.5"
with_cc_vendor="intel"
with_cc_version="17.0.5"
with_cxx_vendor="intel"
with_cxx_version="17.0.5"

And here is some build information as printed in the log file of a calculation job.

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

=== Build Information ===
Version : 8.6.3
Build target : x86_64_linux_intel17.0
Build date : 20180307

=== Compiler Suite ===
C compiler : gnu
C++ compiler : gnu17.0
Fortran compiler : intel17.0
CFLAGS : -g -O3 -mtune=native -march=native -xHost -DMKL_ILP64 -mkl=parallel
CXXFLAGS : -g -O3 -mtune=native -march=native -xHost -DMKL_ILP64 -mkl=parallel
FCFLAGS : -g -mkl=parallel
FC_LDFLAGS :

=== Optimizations ===
Debug level : basic
Optimization level : yes
Architecture : intel_xeon

=== Multicore ===
Parallel build : yes
Parallel I/O : auto
openMP support : no
GPU support : no

=== Connectors / Fallbacks ===
Connectors on : yes
Fallbacks on : yes
DFT flavor : none
FFT flavor : fftw3
LINALG flavor : mkl
MATH flavor : none
TIMER flavor : abinit
TRIO flavor : none

=== Experimental features ===
Bindings : @enable_bindings@
Exports : no
GW double-precision : no

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Default optimizations:
-O3 -mtune=native -march=native -xHost

Optimizations for 20_datashare:
-O0
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
CPP options activated during the build:
CC_GNU CXX_GNU FC_INTEL
HAVE_FC_ALLOCATABLE_DT... HAVE_FC_ASYNC HAVE_FC_COMMAND_ARGUMENT
HAVE_FC_COMMAND_LINE HAVE_FC_CONTIGUOUS HAVE_FC_CPUTIME
HAVE_FC_ETIME HAVE_FC_EXIT HAVE_FC_FLUSH
HAVE_FC_GAMMA HAVE_FC_GETENV HAVE_FC_GETPID
HAVE_FC_IEEE_EXCEPTIONS HAVE_FC_IOMSG HAVE_FC_ISO_C_BINDING
HAVE_FC_ISO_FORTRAN_2008 HAVE_FC_LONG_LINES HAVE_FC_MOVE_ALLOC
HAVE_FC_PRIVATE HAVE_FC_PROTECTED HAVE_FC_STREAM_IO
HAVE_FC_SYSTEM HAVE_FFT HAVE_FFT_FFTW3
HAVE_FFT_MPI HAVE_FFT_SERIAL HAVE_LIBPAW_ABINIT
HAVE_LIBTETRA_ABINIT HAVE_LINALG HAVE_LINALG_AXPBY
HAVE_LINALG_GEMM3M HAVE_LINALG_MKL_IMATCOPY HAVE_LINALG_MKL_OMATADD
HAVE_LINALG_MKL_OMATCOPY HAVE_LINALG_MKL_THREADS HAVE_LINALG_MPI
HAVE_LINALG_SERIAL HAVE_MPI HAVE_MPI2
HAVE_MPI_IALLREDUCE HAVE_MPI_IALLTOALL HAVE_MPI_IALLTOALLV
HAVE_MPI_INTEGER16 HAVE_MPI_IO HAVE_MPI_TYPE_CREATE_S...
HAVE_OS_LINUX HAVE_TIMER_ABINIT USE_MACROAVE
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

One thing I don't get is why the C and C++ compilers are recognized as the GNU variety even though I think I did everything to ensure they are seen as Intel's (which should perhaps happen automatically, but never mind). This is probably not the cause of the defective inter-node parallelism, though I am not sure.

Many thanks again if you, Eric, or someone else could share some ideas about what I may be doing wrong, or what I could try in order to solve the inter-node parallelism issue.

Dan
danric
 
Posts: 6
Joined: Wed Nov 09, 2011 3:01 pm

Re: defective inter-node parallelism with 2017 Intel compile

Postby ebousquet » Wed Mar 21, 2018 9:09 pm

Dear Dan,
Dear Dan,
We'll try to solve this problem. Intel 17 applies more optimizations by default than 2013, which speeds up the calculations but can also add numerical noise that makes the code diverge; relaxations are indeed the most affected (this has been observed with other codes too)...
It is indeed strange that it does not recognize your Intel C compiler; maybe you could force it by specifying the compilers directly, for example:
Code: Select all
CC=mpiicc
CXX=mpiicpc
FC=mpiifort


Could you try to compile with the following flags? They helped in my case with Intel 17 and Xeon:
Code: Select all
FCFLAGS="-O2 -axCORE-AVX2 -xavx -mkl -fp-model precise"
FFLAGS="-O2 -axCORE-AVX2 -xavx -mkl -fp-model precise"
CFLAGS="-O2 -axCORE-AVX2 -xavx -mkl -fp-model precise"
CXXFLAGS="-O2 -axCORE-AVX2 -xavx -mkl -fp-model precise"


And in my case I'm using the following (sounds similar to yours):
Code: Select all
 
--with-fc-vendor=intel
--with-fft-flavor=fftw3-mkl
--with-fft-libs="-lmkl_intel_lp64 -lmkl_sequential -lmkl_core"
--with-linalg-flavor="mkl+scalapack"
--with-linalg-libs="-lmkl_scalapack_lp64 -lmkl_intel_lp64 -lmkl_sequential -lmkl_core -lmkl_blacs_intelmpi_lp64 -lpthread -lm -ldl"
--enable-mpi --enable-mpi-inplace --enable-mpi-io
--enable-zdot-bugfix --enable-avx-safe-mode --enable-fallbacks


Let me know how it goes, mostly with the compilation flags.
Best wishes,
Eric
ebousquet
 
Posts: 163
Joined: Tue Apr 19, 2011 11:13 am
Location: University of Liege, Belgium

Re: defective inter-node parallelism with 2017 Intel compile

Postby danric » Thu Mar 22, 2018 7:09 pm

Hello again, Eric, and thank you.

First, the only thing that works to make configure use mpiifort and the others (instead of mpif90, mpicc and mpicxx) is to change those names directly inside the "configure" file. As mentioned, it apparently does not recognize mpiicc and mpiicpc as Intel compilers, but I think these are the ones actually used for compiling. The Fortran compiler is seen as Intel's (though if I don't use the with_fc_vendor="intel" flag I still run into the trouble of the .mod files not being copied into their directory, and the build stops).

I have tried, both today and years ago, to set FC=mpiifort etc., but for some reason the build still wraps to the GNU variety of compilers. Technically, FC is listed as wrap-mpifc in config.log, and inside that wrapper I see the FC I set, followed by "export FC", which then falls back to mpif90. So I concluded that FC=, CC= and CXX= can't help here, and I was happy to find the workaround of modifying those names inside "configure".
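For the record, the edit I make inside "configure" amounts to something like the following sketch (the wrapper names are the standard Intel MPI ones; the guard is my addition so the command is a harmless no-op outside an unpacked abinit source tree):

```shell
# Workaround sketch (not the supported route): point the generic MPI wrapper
# names used by the configure script at Intel's wrappers instead.
if [ -f configure ]; then
  sed -i.bak \
      -e 's/\bmpif90\b/mpiifort/g' \
      -e 's/\bmpicc\b/mpiicc/g' \
      -e 's/\bmpicxx\b/mpiicpc/g' configure   # keeps a configure.bak backup
fi
```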

Now, back to the parallelization issue. I have tried the flags you suggested, and although there are other combinations to try if inspiration dries up, the one silver bullet that would solve this issue is still elusive. Although I did not use exactly the linear algebra and FFT library options you suggested, I compiled with the flags that worked in your case. There are some complaints/warnings while compiling many F90 files during "make", but the build doesn't stop and I get the executables.

The single-node calculation runs OK, though a bit slower than before, and the 2-node one still doesn't converge (perhaps I should mention that when going from 1 node to 2 nodes, the variable that is doubled is the band-parallelization one, "npband").

Here is proof that the executables were indeed built with the flags that worked for you.

Code: Select all
 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 
 === Build Information ===
  Version       : 8.6.3
  Build target  : x86_64_linux_intel17.0
  Build date    : 20180322
 
 === Compiler Suite ===
  C compiler       : gnu
  C++ compiler     : gnu17.0
  Fortran compiler : intel17.0
  CFLAGS           : -O2 -axCORE-AVX2 -xavx -mkl -fp-model precise
  CXXFLAGS         : -O2 -axCORE-AVX2 -xavx -mkl -fp-model precise
  FCFLAGS          : -O2 -axCORE-AVX2 -xavx -mkl -fp-model precise
  FC_LDFLAGS       :
 
 === Optimizations ===
  Debug level        : basic
  Optimization level : yes
  Architecture       : intel_xeon
 
 === Multicore ===
  Parallel build : yes
  Parallel I/O   : auto
  openMP support : no
  GPU support    : no
 
 === Connectors / Fallbacks ===
  Connectors on : yes
  Fallbacks on  : yes
  DFT flavor    : none
  FFT flavor    : fftw3
  LINALG flavor : mkl
  MATH flavor   : none
  TIMER flavor  : abinit
  TRIO flavor   : none
 
 === Experimental features ===
  Bindings            : @enable_bindings@
  Exports             : no
  GW double-precision : no
 
 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 

 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 Default optimizations:
   --- None ---


 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++


 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 CPP options activated during the build:

                    CC_GNU                   CXX_GNU                  FC_INTEL
 
        HAVE_AVX_SAFE_MODE HAVE_FC_ALLOCATABLE_DT...             HAVE_FC_ASYNC
 
  HAVE_FC_COMMAND_ARGUMENT      HAVE_FC_COMMAND_LINE        HAVE_FC_CONTIGUOUS
 
           HAVE_FC_CPUTIME             HAVE_FC_ETIME              HAVE_FC_EXIT
 
             HAVE_FC_FLUSH             HAVE_FC_GAMMA            HAVE_FC_GETENV
 
            HAVE_FC_GETPID   HAVE_FC_IEEE_EXCEPTIONS             HAVE_FC_IOMSG
 
     HAVE_FC_ISO_C_BINDING  HAVE_FC_ISO_FORTRAN_2008        HAVE_FC_LONG_LINES
 
        HAVE_FC_MOVE_ALLOC           HAVE_FC_PRIVATE         HAVE_FC_PROTECTED
 
         HAVE_FC_STREAM_IO            HAVE_FC_SYSTEM                  HAVE_FFT
 
            HAVE_FFT_FFTW3              HAVE_FFT_MPI           HAVE_FFT_SERIAL
 
        HAVE_LIBPAW_ABINIT      HAVE_LIBTETRA_ABINIT               HAVE_LINALG
 
         HAVE_LINALG_AXPBY        HAVE_LINALG_GEMM3M  HAVE_LINALG_MKL_IMATCOPY
 
   HAVE_LINALG_MKL_OMATADD  HAVE_LINALG_MKL_OMATCOPY   HAVE_LINALG_MKL_THREADS
 
           HAVE_LINALG_MPI        HAVE_LINALG_SERIAL     HAVE_LINALG_ZDOTC_B*G
 
     HAVE_LINALG_ZDOTU_B*G                  HAVE_MPI                 HAVE_MPI2
 
         HAVE_MPI2_INPLACE       HAVE_MPI_IALLREDUCE        HAVE_MPI_IALLTOALL
 
       HAVE_MPI_IALLTOALLV        HAVE_MPI_INTEGER16               HAVE_MPI_IO
 
 HAVE_MPI_TYPE_CREATE_S...             HAVE_OS_LINUX         HAVE_TIMER_ABINIT
 
              USE_MACROAVE                                                     
 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++


I also tried a combination with my original compilation flags (there were complaints in config.log about using -xHost and -march=native together with your suggested flags, so I removed those), with the same behavior as with the options above.

Code: Select all
=== Build Information ===
  Version       : 8.6.3
  Build target  : x86_64_linux_intel17.0
  Build date    : 20180322
 
 === Compiler Suite ===
  C compiler       : gnu
  C++ compiler     : gnu17.0
  Fortran compiler : intel17.0
  CFLAGS           :  -g -O2 -mtune=native -axCORE-AVX2 -xavx -mkl -fp-model precise   -DMKL_ILP64 -mkl=parallel
  CXXFLAGS         :  -g -O2 -mtune=native -axCORE-AVX2 -xavx -mkl -fp-model precise   -DMKL_ILP64 -mkl=parallel
  FCFLAGS          :  -g   -mkl=parallel
  FC_LDFLAGS       :
 
 === Optimizations ===
  Debug level        : basic
  Optimization level : yes
  Architecture       : intel_xeon
 
 === Multicore ===
  Parallel build : yes
  Parallel I/O   : auto
  openMP support : no
  GPU support    : no
 
 === Connectors / Fallbacks ===
  Connectors on : yes
  Fallbacks on  : yes
  DFT flavor    : none
  FFT flavor    : fftw3
  LINALG flavor : mkl
  MATH flavor   : none
  TIMER flavor  : abinit
  TRIO flavor   : none
 
 === Experimental features ===
  Bindings            : @enable_bindings@
  Exports             : no
  GW double-precision : no
 
 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 

 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 Default optimizations:
   -O2 -mtune=native -axCORE-AVX2 -xavx -mkl -fp-model precise


 Optimizations for 20_datashare:
   -O0


 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++


 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 CPP options activated during the build:

                    CC_GNU                   CXX_GNU                  FC_INTEL
 
        HAVE_AVX_SAFE_MODE HAVE_FC_ALLOCATABLE_DT...             HAVE_FC_ASYNC
 
  HAVE_FC_COMMAND_ARGUMENT      HAVE_FC_COMMAND_LINE        HAVE_FC_CONTIGUOUS
 
           HAVE_FC_CPUTIME             HAVE_FC_ETIME              HAVE_FC_EXIT
 
             HAVE_FC_FLUSH             HAVE_FC_GAMMA            HAVE_FC_GETENV
 
            HAVE_FC_GETPID   HAVE_FC_IEEE_EXCEPTIONS             HAVE_FC_IOMSG
 
     HAVE_FC_ISO_C_BINDING  HAVE_FC_ISO_FORTRAN_2008        HAVE_FC_LONG_LINES
 
        HAVE_FC_MOVE_ALLOC           HAVE_FC_PRIVATE         HAVE_FC_PROTECTED
 
         HAVE_FC_STREAM_IO            HAVE_FC_SYSTEM                  HAVE_FFT
 
            HAVE_FFT_FFTW3              HAVE_FFT_MPI           HAVE_FFT_SERIAL
 
        HAVE_LIBPAW_ABINIT      HAVE_LIBTETRA_ABINIT               HAVE_LINALG
 
         HAVE_LINALG_AXPBY        HAVE_LINALG_GEMM3M  HAVE_LINALG_MKL_IMATCOPY
 
   HAVE_LINALG_MKL_OMATADD  HAVE_LINALG_MKL_OMATCOPY   HAVE_LINALG_MKL_THREADS
 
           HAVE_LINALG_MPI        HAVE_LINALG_SERIAL                  HAVE_MPI
 
                 HAVE_MPI2       HAVE_MPI_IALLREDUCE        HAVE_MPI_IALLTOALL
 
       HAVE_MPI_IALLTOALLV        HAVE_MPI_INTEGER16               HAVE_MPI_IO
 
 HAVE_MPI_TYPE_CREATE_S...             HAVE_OS_LINUX         HAVE_TIMER_ABINIT
 
              USE_MACROAVE                                                     
 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++



I value your continued advice, in the hope that we can still identify some compilation flags that would, as a first step at least, essentially "downgrade" the optimized vectorization (or whatever it is) of the 2017 compilers to that of an earlier version such as 2013. If you can't think of other obvious things to try, learning more about the compilation flags you suggested might be a good starting point for me to find others that could help.

On the other hand, I worry that my issues are due to the architecture or OS implementation of the system I use, which could reduce the chance of finding a solution in a reasonable time. You mention that other codes have similar issues too, but since you and apparently most everyone else have found solutions (otherwise there would probably be more outcry about the performance of recent Intel compilers), I guess that either the solution will soon be found in my case too, or not at all...

Many thanks for spending your precious time helping me; I will try anything you suggest could work.

best wishes,
Dan
danric
 
Posts: 6
Joined: Wed Nov 09, 2011 3:01 pm

Re: defective inter-node parallelism with 2017 Intel compile

Postby Jordan » Mon Mar 26, 2018 11:04 am

Hi,

My opinion is that there is something wrong with your MPI installation.
0) Remove the optimizations and scalapack.
1) Check that the MPI you use was compiled with Intel 17, not another Intel version or GNU.
2) Check the optimizations of the MPI with your admin. Some variables may be set to accelerate communication and thus decrease accuracy.
3) Don't use your fancy flags; try the basic ones (the ones that make the configure fail) and send us the config.log file.
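A quick way to check point 1 yourself: the Intel MPI wrappers print the underlying compile line with -show (the wrapper names below are the standard Intel ones; adjust to your site):

```shell
# Show which compiler each Intel MPI wrapper actually invokes.
for w in mpiifort mpiicc mpiicpc; do
  if command -v "$w" >/dev/null 2>&1; then
    printf '%s delegates to: ' "$w"
    "$w" -show 2>/dev/null | head -n1   # Intel MPI wrappers accept -show
  else
    echo "$w: not found in PATH"
  fi
done
```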

cheers
Jordan
 
Posts: 280
Joined: Tue May 07, 2013 9:47 am

Re: defective inter-node parallelism with 2017 Intel compile

Postby danric » Mon Mar 26, 2018 1:35 pm

Hi and thank you for taking this on systematically.

Well, I think that doing some of what you suggest sets us back to issues I thought I had already found solutions for, and it might take even more time to figure out, but let's try.

Alright, I removed all the optimization flags (and everything Eric suggested) and scalapack (I am not sure whether you wanted me to give up Intel MKL and FFTW3 altogether, so for now I kept some of the options indicated by the MKL advisory). The .ac now looks as follows:

Code: Select all
FC=mpiifort
CC=mpiicc
CXX=mpiicpc
enable_mpi="yes"
with_mpi_prefix="/octfs/apl/intel/compilers_and_libraries_2017.5.239/linux/mpi/intel64"
with_linalg_flavor="mkl"
with_linalg_libs=" -lmkl_cdft_core -lmkl_blacs_intelmpi_ilp64 -liomp5 -lpthread -lm -ldl"
with_fft_flavor="fftw3"
with_fft_libs=" -lmkl_cdft_core -lmkl_blacs_intelmpi_ilp64 -liomp5 -lpthread -lm -ldl"
FCFLAGS_EXTRA=" -mkl=parallel"
CFLAGS_EXTRA=" -DMKL_ILP64 -mkl=parallel"
CXXFLAGS_EXTRA=" -DMKL_ILP64 -mkl=parallel"


With this, ./configure fails at the point where it realizes that the GNU compilers (yes, Intel's are not recognized) don't have an iso_c_binding module. The config.log for this case is here:
configGNU.log
(168.33 KiB) Downloaded 55 times


I mentioned before that I can make ./configure pick up the Intel MPI compilers if I manually modify the "configure" file (instead of setting FC=mpiifort, CC=mpiicc, CXX=mpiicpc, I just replace mpif90 -> mpiifort etc.). Then, with the .ac below:

Code: Select all
enable_mpi="yes"
with_mpi_prefix="/octfs/apl/intel/compilers_and_libraries_2017.5.239/linux/mpi/intel64"
with_linalg_flavor="mkl"
with_linalg_libs=" -lmkl_cdft_core -lmkl_blacs_intelmpi_ilp64 -liomp5 -lpthread -lm -ldl"
with_fft_flavor="fftw3"
with_fft_libs=" -lmkl_cdft_core -lmkl_blacs_intelmpi_ilp64 -liomp5 -lpthread -lm -ldl"
FCFLAGS_EXTRA=" -mkl=parallel"
CFLAGS_EXTRA=" -DMKL_ILP64 -mkl=parallel"
CXXFLAGS_EXTRA=" -DMKL_ILP64 -mkl=parallel"


I get the following config.log file:
configINTEL.log
(182.12 KiB) Downloaded 53 times


In this case, ./configure completes. According to the config.log file, mpiicc, mpiifort and mpiicpc are apparently used as intended, yet the compiler type is mistaken: GNU for C and C++, and Generic0.0 for Fortran.

It is particularly critical, I think, that the Fortran compiler is not recognized as Intel's, because "make" stops soon after starting, at the point below:

Code: Select all
[...]
ranlib lib14_hidewrite.a
make[1]: Leaving directory `/octfs/home/d45678/abinit/v863_octG/abinit-8.6.3/src/14_hidewrite'
cd src/16_hideleave && make lib16_hideleave.a
make[1]: Entering directory `/octfs/home/d45678/abinit/v863_octG/abinit-8.6.3/src/16_hideleave'
/octfs/apl/intel/compilers_and_libraries_2017.5.239/linux/mpi/intel64/bin/mpiifort -DHAVE_CONFIG_H -I. -I../..  -I../../src/10_dumpinfo -I../../src/10_dumpinfo -I../../src/12_hide_mpi -I../../src/12_hide_mpi -I../../src/11_memory_mpi -I../../src/11_memory_mpi -I../../src/10_defs -I../../src/10_defs -I../../src/14_hidewrite -I../../src/14_hidewrite -I../../src/incs -I../../src/incs -I/octfs/home/d45678/abinit/v863_octG/abinit-8.6.3/fallbacks/exports/include    -g   -mkl=parallel  -c -o leave_new.o leave_new.F90
/octfs/apl/intel/compilers_and_libraries_2017.5.239/linux/mpi/intel64/bin/mpiifort -DHAVE_CONFIG_H -I. -I../..  -I../../src/10_dumpinfo -I../../src/10_dumpinfo -I../../src/12_hide_mpi -I../../src/12_hide_mpi -I../../src/11_memory_mpi -I../../src/11_memory_mpi -I../../src/10_defs -I../../src/10_defs -I../../src/14_hidewrite -I../../src/14_hidewrite -I../../src/incs -I../../src/incs -I/octfs/home/d45678/abinit/v863_octG/abinit-8.6.3/fallbacks/exports/include    -g   -mkl=parallel  -c -o m_xieee.o m_xieee.F90
/octfs/apl/intel/compilers_and_libraries_2017.5.239/linux/mpi/intel64/bin/mpiifort -DHAVE_CONFIG_H -I. -I../..  -I../../src/10_dumpinfo -I../../src/10_dumpinfo -I../../src/12_hide_mpi -I../../src/12_hide_mpi -I../../src/11_memory_mpi -I../../src/11_memory_mpi -I../../src/10_defs -I../../src/10_defs -I../../src/14_hidewrite -I../../src/14_hidewrite -I../../src/incs -I../../src/incs -I/octfs/home/d45678/abinit/v863_octG/abinit-8.6.3/fallbacks/exports/include    -g   -mkl=parallel  -c -o interfaces_16_hideleave.o interfaces_16_hideleave.F90
/octfs/apl/intel/compilers_and_libraries_2017.5.239/linux/mpi/intel64/bin/mpiifort -DHAVE_CONFIG_H -I. -I../..  -I../../src/10_dumpinfo -I../../src/10_dumpinfo -I../../src/12_hide_mpi -I../../src/12_hide_mpi -I../../src/11_memory_mpi -I../../src/11_memory_mpi -I../../src/10_defs -I../../src/10_defs -I../../src/14_hidewrite -I../../src/14_hidewrite -I../../src/incs -I../../src/incs -I/octfs/home/d45678/abinit/v863_octG/abinit-8.6.3/fallbacks/exports/include    -g   -mkl=parallel  -c -o m_errors.o m_errors.F90
rm -f lib16_hideleave.a
ar rc lib16_hideleave.a leave_new.o m_xieee.o m_errors.o interfaces_16_hideleave.o
ranlib lib16_hideleave.a
make[1]: Leaving directory `/octfs/home/d45678/abinit/v863_octG/abinit-8.6.3/src/16_hideleave'
cd src/17_libtetra_ext && make lib17_libtetra_ext.a
make[1]: Entering directory `/octfs/home/d45678/abinit/v863_octG/abinit-8.6.3/src/17_libtetra_ext'
/octfs/apl/intel/compilers_and_libraries_2017.5.239/linux/mpi/intel64/bin/mpiifort -DHAVE_CONFIG_H -I. -I../..  -I../../src/11_memory_mpi -I../../src/11_memory_mpi -I../../src/16_hideleave -I../../src/16_hideleave -I../../src/14_hidewrite -I../../src/14_hidewrite -I../../src/incs -I../../src/incs -I/octfs/home/d45678/abinit/v863_octG/abinit-8.6.3/fallbacks/exports/include    -g   -mkl=parallel  -c -o m_kptrank.o m_kptrank.F90
/octfs/apl/intel/compilers_and_libraries_2017.5.239/linux/mpi/intel64/bin/mpiifort -DHAVE_CONFIG_H -I. -I../..  -I../../src/11_memory_mpi -I../../src/11_memory_mpi -I../../src/16_hideleave -I../../src/16_hideleave -I../../src/14_hidewrite -I../../src/14_hidewrite -I../../src/incs -I../../src/incs -I/octfs/home/d45678/abinit/v863_octG/abinit-8.6.3/fallbacks/exports/include    -g   -mkl=parallel  -c -o interfaces_17_libtetra_ext.o interfaces_17_libtetra_ext.F90
m_kptrank.F90(29): error #7002: Error in opening the compiled module file.  Check INCLUDE paths.   [DEFS_BASIS]
 use m_profiling_abi
-----^
m_kptrank.F90(30): error #6580: Name in only-list does not exist.   [MSG_HNDL]
 use m_errors, only : msg_hndl
----------------------^
m_kptrank.F90(160): error #6632: Keyword arguments are invalid without an explicit interface.   [FILE]
     call msg_hndl(msg,"ERROR", "PERS" ,file="m_kptrank.F90", line=160)
----------------------------------------^
m_kptrank.F90(160): error #6632: Keyword arguments are invalid without an explicit interface.   [LINE]
     call msg_hndl(msg,"ERROR", "PERS" ,file="m_kptrank.F90", line=160)
--------------------------------------------------------------^
m_kptrank.F90(171): error #6632: Keyword arguments are invalid without an explicit interface.   [FILE]
     call msg_hndl(msg,"ERROR", "PERS" ,file="m_kptrank.F90", line=171)
----------------------------------------^
m_kptrank.F90(171): error #6632: Keyword arguments are invalid without an explicit interface.   [LINE]
     call msg_hndl(msg,"ERROR", "PERS" ,file="m_kptrank.F90", line=171)
--------------------------------------------------------------^
m_kptrank.F90(288): error #6406: Conflicting attributes or multiple declaration of name.   [MSG_HNDL]
   call msg_hndl(msg,"ERROR", "PERS" ,file="m_kptrank.F90", line=288)
--------^
compilation aborted for m_kptrank.F90 (code 1)
make[1]: *** [m_kptrank.o] Error 1
make[1]: Leaving directory `/octfs/home/d45678/abinit/v863_octG/abinit-8.6.3/src/17_libtetra_ext'
make: *** [abinit] Error 2


Several years ago I figured out that I can still obtain the executables if I manually copy all the .mod files created in each directory of /src into the /incs or /mods directory. Doing so lets me obtain the executables, yet the build of "abinit" will be marked as "prepared for a x86_64_linux_Generic0.0 computer".
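A sketch of that manual workaround (the paths are illustrative and depend on the abinit version; src/incs is the directory the failing compile lines search via -I../../src/incs):

```shell
# Collect every .mod file generated under src/ into the shared include
# directory, so later compile units can find the modules.
mkdir -p src/incs
find src -name '*.mod' -not -path 'src/incs/*' -exec cp {} src/incs/ \;
```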

I have learned to avoid the hassle of copying all those files manually (and to have "abinit" recognized as built with the Intel compilers) simply by adding the entries below to the .ac file (it might be enough to do this for Fortran only, but I haven't tried; naturally the version number changes according to the one actually used):

Code: Select all
with_fc_vendor="intel"
with_fc_version="17.0.5"
with_cc_vendor="intel"
with_cc_version="17.0.5"
with_cxx_vendor="intel"
with_cxx_version="17.0.5"


Doing the latter allows me to obtain executables with the behavior described in the previous posts. (Well, I have yet to try the latest version of abinit without any optimizations, and will report the result soon, but I somewhat doubt that simply removing the optimizations and scalapack is the solution I was looking for... I could be wrong though; I will try and report.)

Please advise further when you have a moment to consider these.
Thank you and best,
Dan

PS: I have also asked for my admin's help, but haven't got very far yet, which is why I decided to ask here as well. I will confirm the other MPI-related questions with them soon too.
danric
 
Posts: 6
Joined: Wed Nov 09, 2011 3:01 pm

Re: defective inter-node parallelism with 2017 Intel compile

Postby jbeuken » Mon Mar 26, 2018 5:21 pm

Hi,

I successfully run on a cluster with 1, 2, 4 and 8 nodes (16 cores each) with versions 8.4.2, 8.5.0 and 8.7.3.
The Intel compiler version is clusterstudio 2017.3.191.

This is my .ac file:

Code: Select all
# clusterstudio   ifort 17

# Fortran compiler
# ================
FC="mpiifort"
CC="mpiicc"
CXX="mpiicpc"
AR=ar

# Fortran optimization flags
# ==========================
FCFLAGS_EXTRA="-g -O3 -align all"
enable_optim="yes"
enable_gw_dpc="yes"
enable_64bit_flags="yes"

enable_openmp="yes"
FCFLAGS_OPENMP="-openmp"

# Parallel compilation flags
# ==========================
enable_mpi="yes"
enable_mpi_io="yes"
# I_MPI_ROOT=/opt/software/intel/impi/2017.2.191
with_mpi_incs="-I${I_MPI_ROOT}/include64"
with_mpi_libs="-L${I_MPI_ROOT}/lib64 -lmpi"

# Linear Algebra library links (ScaLAPACK)
# ========================================
with_linalg_flavor="mkl+scalapack"
with_linalg_incs="-I${MKLROOT}/include"
with_linalg_libs="-L${MKLROOT}/lib/intel64 -lmkl_scalapack_lp64 -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -lmkl_blacs_intelmpi_lp64 -liomp5 -lpthread -lm -ldl"

# FFTW3 / MKL
# ========================================
with_fft_flavor="dfti"
with_fft_libs="-L${MKLROOT}/lib/intel64 -lmkl_scalapack_lp64 -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -lmkl_blacs_intelmpi_lp64 -liomp5 -lpthread -lm -ldl"
with_fft_incs="-I${MKLROOT}/include"

# Plugins additionels
# ===================
with_dft_flavor="libxc"
with_libxc_incs="-I${HOME}/local/libxc/include"
with_libxc_libs="-L${HOME}/local/libxc/lib -lxcf90 -lxc"

with_trio_flavor="netcdf"
with_netcdf_incs="-I${HOME}/local/netcdf/include -I${HOME}/local/hdf5/include"
with_netcdf_libs="-L${HOME}/local/netcdf/lib -lnetcdff -lnetcdf -L${HOME}/local/hdf5/lib -lhdf5_hl -lhdf5"
enable_netcdf_default="yes"




And this is part of my input file (UO2, 54 atoms):

Code: Select all
#ABINIT - INPUT FILE
#UO2 54 ATOMS

#Process distribution (parallelism) - TO BE ADAPTED
# autoparal 1
# npkpt 2 npband 4 npfft 4 # 32 processors
# npkpt 2 npband 8 npfft 4 # 64 processors - 1
# npkpt 2 npband 4 npfft 8 # 64 processors - 2
 npkpt 2 npband 8 npfft 8 # 128 processors
paral_kgb 1
bandpp 2

#fftalg 512

#Plane wave basis
ecut 6.
pawecutdg 10.

#Self-consistent cycle parameters
toldfe 1.d-5
nstep 30
nline 6
diemac 10.

#K-points and symmetries
nkpt 1
kpt 0.5 0.5 0.5
kptopt 0
nsym 0
maxnsym 2048
chksymbreak 0

#Electronic configuration
nsppol 2
nband 256
occopt 3
tsmear 0.0005 hartree

...


my 5¢

regards

jmb
jbeuken
 
Posts: 278
Joined: Tue Aug 18, 2009 9:24 pm

Re: defective inter-node parallelism with 2017 Intel compile

Postby danric » Mon Mar 26, 2018 6:05 pm

Dear JMB

Thank you. There are a few things worth trying there, so I will and report if I can get it to work.

Many thanks again,
Dan
danric
 
Posts: 6
Joined: Wed Nov 09, 2011 3:01 pm

