SEGFAULT in large calculations

Post by **orubel** » Fri Feb 06, 2015 5:05 pm

Dear ABINIT Community,

I would like to share my experience with running a large-scale (~150-atom) ABINIT calculation, the problem I encountered, and a proposed solution. Of course, your comments and suggestions are highly welcome.

I ran a calculation with ABINIT 7.10.2 (latest version) on 32 cores across 2 nodes using MVAPICH2-1.9. The code is compiled with Intel compilers and MKL (details are provided below). The SEGFAULT occurs in 56_xc/rhohxc.F90, line 440:

rhor_(:,:)=rhor(:,:)-nhat(:,:)

A similar structure with ~70 atoms runs fine. It turns out that nhat has a size of about (2500000,1) for 150 atoms, and half of that for 70 atoms. I should also mention that the stack size is set to "unlimited". I resolved the problem by replacing this statement with an explicit loop. I had to do the same in 42_libpaw/m_pawdij.F90. Here are the details of the code modifications.

edit (line 262): .../src/56_xc/rhohxc.F90

!Local variables-------------------------------
!scalars
...
integer :: jfft, jspin  ! Oleg added
...

edit (line 440): .../src/56_xc/rhohxc.F90

...
!     rhor_(:,:)=rhor(:,:)-nhat(:,:)  ! there is a segfault here
   do jspin = 1, nspden ! Oleg added begin
     do jfft = 1, nfft
        rhor_(jfft,jspin)=rhor(jfft,jspin)-nhat(jfft,jspin)
     end do
   end do            ! Oleg added end
...

edit (line 219): /gs/project/fhu-132-aa/abinit-7.10.2-mvapich2-intel-dbg/src/42_libpaw/m_pawdij.F90

!Local variables ---------------------------------------
!scalars
...
integer :: ioleg, joleg ! Oleg added
...

edit (line 345): /gs/project/fhu-132-aa/abinit-7.10.2-mvapich2-intel-dbg/src/42_libpaw/m_pawdij.F90

!       v_dijhat=vtrial-vxc   ! Segfault
     do joleg = 1, size(vxc,2)   ! Oleg added start
       do ioleg = 1, size(vxc,1)
         v_dijhat(ioleg,joleg) = vtrial(ioleg,joleg) - vxc(ioleg,joleg)
       end do
     end do                      ! Oleg added end
...
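Both edits apply the same pattern: a whole-array expression is replaced by an explicit double loop whose inner loop runs over the first (contiguous, column-major) index. A minimal sketch of that pattern as a reusable routine follows; the module and subroutine names are hypothetical and not part of ABINIT, and `dp` is assumed to be 8-byte double precision as in ABINIT.

```fortran
! Minimal sketch (hypothetical names, not ABINIT code): c = a - b element by
! element, using explicit loops instead of the array expression c(:,:)=a(:,:)-b(:,:).
module elementwise_sketch
  implicit none
  private
  public :: dp, subtract_2d
  integer, parameter :: dp = kind(1.0d0)   ! assumed 8-byte real, as ABINIT's dp
contains
  subroutine subtract_2d(n1, n2, a, b, c)
    integer, intent(in) :: n1, n2
    real(dp), intent(in) :: a(n1, n2), b(n1, n2)
    real(dp), intent(out) :: c(n1, n2)
    integer :: i1, i2
    ! Outer loop over the second index, inner loop over the first:
    ! Fortran arrays are column-major, so memory is traversed contiguously.
    do i2 = 1, n2
      do i1 = 1, n1
        c(i1, i2) = a(i1, i2) - b(i1, i2)
      end do
    end do
  end subroutine subtract_2d
end module elementwise_sketch
```

Assuming rhor, nhat and rhor_ are all dimensioned (nfft,nspden), the rhohxc.F90 edit would correspond to `call subtract_2d(nfft, nspden, rhor, nhat, rhor_)`. One caveat: if an actual argument is not contiguous, the compiler may itself create a copy-in/copy-out temporary when passing it to an explicit-shape dummy, so this only avoids temporaries for contiguous arrays.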

It must be related to memory handling, which becomes problematic for large cases. Is it perhaps possible to fix the problem at the compilation level, without modifying the code? There are some discussions of stack vs. heap memory (https://software.intel.com/en-us/forums/topic/327647).
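The sizes quoted above can be put in perspective with a little arithmetic. The minimal sketch below, assuming ABINIT's real(dp) is 8-byte double precision and using a hypothetical helper name, estimates the memory of one real(dp) array of shape (nfft, nspden); if the compiler creates a temporary for the right-hand side of rhor_(:,:)=rhor(:,:)-nhat(:,:), that temporary has the same shape.

```fortran
! Minimal sketch (hypothetical names): memory footprint of one real(dp) array
! of shape (nfft, nspden). Assumes dp = kind(1.0d0), i.e. 8-byte reals.
module array_size_sketch
  implicit none
  private
  public :: dp, array_mib
  integer, parameter :: dp = kind(1.0d0)
contains
  ! Size in MiB of a real(dp) array with nfft*nspden elements.
  pure function array_mib(nfft, nspden) result(mib)
    integer, intent(in) :: nfft, nspden
    real(dp) :: mib
    real(dp) :: bytes_per_element
    ! storage_size returns bits; divide by 8 to get bytes (8 for real(dp))
    bytes_per_element = real(storage_size(1.0_dp) / 8, dp)
    mib = real(nfft, dp) * real(nspden, dp) * bytes_per_element / 1024.0_dp**2
  end function array_mib
end module array_size_sketch
```

For nhat of shape (2500000,1) this gives about 19 MiB (20,000,000 bytes) per array, and about half of that for the 70-atom case.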

Thank you
Oleg

P.S.
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

=== Build Information ===
Version : 7.10.2
Build target : x86_64_linux_intel14.0
Build date : 20150205

=== Compiler Suite ===
C compiler : intel14.0
C++ compiler : gnu14.0
Fortran compiler : intel14.0
CFLAGS : -g -O2 -vec-report0
CXXFLAGS : -g -O2 -mtune=native -march=native
FCFLAGS : -g -extend-source -vec-report0 -noaltparam -nofpscomp
FC_LDFLAGS : -static-intel -static-libgcc

=== Optimizations ===
Debug level : basic
Optimization level : standard
Architecture : intel_xeon

=== Multicore ===
Parallel build : yes
Parallel I/O : auto
openMP support : no
GPU support : no

=== Connectors / Fallbacks ===
Connectors on : yes
Fallbacks on : yes
DFT flavor : libxc-fallback
FFT flavor : none
LINALG flavor : netlib-fallback
MATH flavor : none
TIMER flavor : abinit
TRIO flavor : none

=== Experimental features ===
Bindings : @enable_bindings@
Exports : no
GW double-precision : no

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Default optimizations:
-O2 -xHost
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
CPP options activated during the build:

CC_INTEL CXX_GNU FC_INTEL
HAVE_DFT_LIBXC HAVE_FC_ALLOCATABLE_DT... HAVE_FC_ASYNC
HAVE_FC_COMMAND_ARGUMENT HAVE_FC_CONTIGUOUS HAVE_FC_CPUTIME
HAVE_FC_ETIME HAVE_FC_EXIT HAVE_FC_FLUSH
HAVE_FC_GAMMA HAVE_FC_GETENV HAVE_FC_GETPID
HAVE_FC_IEEE_EXCEPTIONS HAVE_FC_IOMSG HAVE_FC_ISO_C_BINDING
HAVE_FC_LONG_LINES HAVE_FC_MOVE_ALLOC HAVE_FC_PRIVATE
HAVE_FC_PROTECTED HAVE_FC_STREAM_IO HAVE_FC_SYSTEM
HAVE_LIBPAW_ABINIT HAVE_MPI HAVE_MPI2
HAVE_MPI_IALLREDUCE HAVE_MPI_IALLTOALL HAVE_MPI_IALLTOALLV
HAVE_MPI_IO HAVE_MPI_TYPE_CREATE_S... HAVE_NUMPY
HAVE_OS_LINUX HAVE_TIMER HAVE_TIMER_ABINIT
HAVE_TIMER_MPI HAVE_TIMER_SERIAL USE_MACROAVE

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

Post by **torrent** » Mon Feb 23, 2015 2:25 pm

Dear Oleg,

I have included your modifications in the development version of ABINIT.
As you mention, this should be handled at the compiler level
(but it costs almost nothing to add these lines).
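With the Intel Fortran compiler, the `-heap-arrays` option places automatic arrays and arrays created for temporary computations on the heap instead of the stack, which is the compile-level route the original question asked about (adding it to FCFLAGS for this build is an assumption, not something tested in the thread). On the code side, a minimal sketch of the same idea is to hold a large intermediate in an explicitly ALLOCATABLE array, which ifort and gfortran allocate on the heap. The module and routine names below are hypothetical, not ABINIT code.

```fortran
! Minimal sketch (hypothetical names, not ABINIT code): build a large
! intermediate in an ALLOCATABLE array, which ifort and gfortran place on the
! heap, rather than in an automatic array or a compiler-generated temporary.
module heap_work_sketch
  implicit none
  private
  public :: dp, difference_on_heap
  integer, parameter :: dp = kind(1.0d0)
contains
  subroutine difference_on_heap(a, b, work, ierr)
    real(dp), intent(in) :: a(:, :), b(:, :)          ! assumed to have the same shape
    real(dp), allocatable, intent(out) :: work(:, :)  ! heap storage owned by the caller
    integer, intent(out) :: ierr                      ! nonzero if allocation failed
    integer :: i1, i2
    allocate(work(size(a, 1), size(a, 2)), stat=ierr)
    if (ierr /= 0) return   ! report the failure instead of crashing
    do i2 = 1, size(a, 2)
      do i1 = 1, size(a, 1)
        work(i1, i2) = a(i1, i2) - b(i1, i2)
      end do
    end do
  end subroutine difference_on_heap
end module heap_work_sketch
```

Because `work` is intent(out) and allocatable, it is deallocated automatically on entry, and the caller can free it with `deallocate` once it is no longer needed.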

One remark, however:
the best way to reduce memory for large systems is to distribute data among processes.
For the two code sections you modified, you can do this by parallelizing over FFTs (npfft keyword).
If you do so, nfft becomes nfft/npfft and the sizes of v_dijhat and rhor_ decrease...
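As a rough illustration of this point, the sketch below estimates the per-process share of one real(dp) array of shape (nfft, nspden) when the FFT grid is split over npfft processes. It assumes an even split rounded up, which is a simplification: the actual distribution of the grid need not be exactly even. Module and function names are hypothetical.

```fortran
! Minimal sketch (hypothetical names): approximate local FFT grid size and
! memory of one real(dp) (nfft, nspden) array when the grid is split over
! npfft processes. Assumes an even split rounded up and 8-byte reals.
module fft_split_sketch
  implicit none
  private
  public :: dp, local_nfft, local_array_mib
  integer, parameter :: dp = kind(1.0d0)
contains
  pure function local_nfft(nfft, npfft) result(nloc)
    integer, intent(in) :: nfft, npfft
    integer :: nloc
    nloc = (nfft + npfft - 1) / npfft   ! ceiling(nfft / npfft) in integer arithmetic
  end function local_nfft

  pure function local_array_mib(nfft, nspden, npfft) result(mib)
    integer, intent(in) :: nfft, nspden, npfft
    real(dp) :: mib
    ! 8 bytes per real(dp) element, converted to MiB
    mib = real(local_nfft(nfft, npfft), dp) * real(nspden, dp) * 8.0_dp &
          / 1024.0_dp**2
  end function local_array_mib
end module fft_split_sketch
```

For the nfft of about 2,500,000 mentioned earlier in the thread and npfft = 3, each process would hold about 833,334 grid points of such an array, roughly 6.4 MiB instead of about 19 MiB.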

Regards,

Post by **orubel** » Mon Mar 02, 2015 11:16 pm

Dear Mark,

Thank you for your reply and suggestions. I used "autoparal 4" to determine the best combination of npfft, npband, npspinor and npkpt. As far as I understand, this option optimizes speed, not memory usage. Anyway, here are the results:

32 processes
npfft, npband, npspinor, npkpt: 3 1 1 10
vmem = 1.7 GB/core (out of 4 GB/core available)

64 processes
npfft, npband, npspinor, npkpt: 2 3 1 10
vmem = 1.8 GB/core (out of 4 GB/core available)

It seems that memory was not a bottleneck in this particular case, since usage is roughly half of the available memory. However, I did not try a larger npfft with a smaller npkpt.
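The two autoparal results can be checked with simple arithmetic. The sketch below multiplies the four distribution parameters and computes the fraction of per-core memory in use. It assumes, as in ABINIT's combined k-point/band/FFT/spinor parallelization, that this product is the number of MPI processes a combination occupies; the module and function names are hypothetical.

```fortran
! Minimal sketch (hypothetical names): process count implied by an
! (npfft, npband, npspinor, npkpt) combination, and per-core memory fraction.
module autoparal_check_sketch
  implicit none
  private
  public :: dp, procs_used, memory_fraction
  integer, parameter :: dp = kind(1.0d0)
contains
  ! Assumed: processes occupied = npfft * npband * npspinor * npkpt
  pure function procs_used(npfft, npband, npspinor, npkpt) result(nproc)
    integer, intent(in) :: npfft, npband, npspinor, npkpt
    integer :: nproc
    nproc = npfft * npband * npspinor * npkpt
  end function procs_used

  ! Fraction of the available per-core memory that is in use
  pure function memory_fraction(used_gb, available_gb) result(frac)
    real(dp), intent(in) :: used_gb, available_gb
    real(dp) :: frac
    frac = used_gb / available_gb
  end function memory_fraction
end module autoparal_check_sketch
```

For the two runs this gives 30 and 60 processes (for 32 and 64 requested), and memory fractions of 42.5% and 45%.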

Thank you once again
Oleg

ABINIT Discussion Forums