Execution time and memory with ACML

gryko
Posts: 12
Joined: Tue Feb 26, 2013 5:56 pm

Execution time and memory with ACML

Post by gryko » Thu Oct 03, 2013 6:15 pm

Hello,

Here are the execution times of Abinit compiled with and without the ACML library, using gfortran and the internal FFT.
System: AMD 8350 Vishera 4.0 GHz CPU (8 cores), 32 GB memory. Input: SCF calculation for 34 Si atoms in an FCC cell.
Execution times per core (8 cores)

Abinit 7.4.2 compiled with ACML 1861
Abinit 7.4.2 with internal libs (no ACML) 1890
Abinit 7.2.1 - binary from the Abinit site 1883
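
For scale, the gap between these timings can be worked out directly. This is a small sketch that uses only the three numbers quoted above; the dictionary labels and the helper name percent_faster are illustrative, not part of Abinit.

```python
# Per-core execution times as quoted in the post (units as reported there).
times = {
    "7.4.2 with ACML": 1861,
    "7.4.2 without ACML": 1890,
    "7.2.1 binary": 1883,
}


def percent_faster(baseline, candidate):
    """Return how much faster `candidate` is than `baseline`, in percent."""
    return 100.0 * (baseline - candidate) / baseline


acml = times["7.4.2 with ACML"]
for label, t in times.items():
    if label != "7.4.2 with ACML":
        print(f"ACML build vs {label}: {percent_faster(t, acml):.1f}% faster")
```

With these inputs the ACML build comes out about 1.5% faster than the build without ACML and about 1.2% faster than the 7.2.1 binary, so the difference in run time is small.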

However, for phonon calculations on the same system, Abinit linked with the ACML library uses over 32 GB of memory,
whereas the Abinit binary from the web site, or a build with the internal libs, uses only ~19 GB.

Any suggestions why? Isn't the ACML library supposed to be optimized for AMD processors?

Thank you,
Jan Gryko

Alain_Jacques
Posts: 279
Joined: Sat Aug 15, 2009 9:34 pm
Location: Université catholique de Louvain - Belgium

Re: Execution time and memory with ACML

Post by Alain_Jacques » Fri Oct 04, 2013 12:41 pm

Hi Jan,

First of all, if you invest time in optimizing Abinit, I would compile it with FFTW3: an enhanced FFT lib will probably pay off more than an enhanced BLAS/LAPACK.

Although AMD markets the 8350 as an 8-"core" CPU (very deceptively IMHO), it only has 4 compute units, i.e. 4 caches and 4 arithmetic units. So it is what I (and Intel too) would call a hyperthreaded 4-core CPU. May I suggest comparing ACML to plain BLAS/LAPACK with only 4 parallel threads, to avoid overloading? I don't know how you ran the test: either several MPI slots with single-threaded BLAS/LAPACK, or sequential with multithreaded BLAS/LAPACK. The former is the most efficient and will benefit from enhanced libs.

I have no clue about the memory footprint discrepancy: ACML is multithreaded, but it shouldn't need to replicate data for this to work.
ACML is supposed to be optimized ... but I have had better results with MKL or OpenBLAS on AMD CPUs.

Kind regards,

Alain

gryko
Posts: 12
Joined: Tue Feb 26, 2013 5:56 pm

Re: Execution time and memory with ACML

Post by gryko » Fri Oct 04, 2013 7:50 pm

Thank you very much for your quick answer. Here are more tests with the two ACML libraries, libacml and libacml_mp, on the Vishera 8350:

Abinit linked with libacml, run with mpirun -np 4 or mpirun -np 8, uses 4 or 8 cores at almost 100%.
Abinit linked with libacml_mp, run with mpirun -np 4, uses 8 cores at about 70 - 80%.
Abinit linked with libacml_mp, run with mpirun -np 8, uses 8 cores at 100%, but the execution time is about 5 - 7 times longer.
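
One possible reading of these three results is oversubscription: with a multithreaded BLAS/LAPACK, every MPI rank may start several threads of its own, so the total thread count can exceed the number of cores. The sketch below only does that bookkeeping. The threads-per-rank values are assumptions chosen for illustration (nothing in these tests measured them), and load_factor is a hypothetical helper name.

```python
def load_factor(mpi_ranks, threads_per_rank, hw_cores):
    """Return software threads per hardware core; above 1.0 means oversubscription."""
    if min(mpi_ranks, threads_per_rank, hw_cores) < 1:
        raise ValueError("all arguments must be positive integers")
    return (mpi_ranks * threads_per_rank) / hw_cores


# Hypothetical layouts on an 8-core machine; the thread counts are assumed, not measured.
layouts = {
    "libacml, -np 8, 1 thread per rank": (8, 1),
    "libacml_mp, -np 4, 2 threads per rank": (4, 2),
    "libacml_mp, -np 8, 8 threads per rank": (8, 8),
}

for name, (ranks, threads) in layouts.items():
    print(f"{name}: {load_factor(ranks, threads, 8):.1f} threads per core")
```

If the last layout is close to what actually happens, limiting each rank to one thread (for an OpenMP-based library, typically by setting OMP_NUM_THREADS=1 before mpirun) would be the first thing to try.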


Changing the subject - I am trying to link Abinit with FFTW3. I installed fftw3-3.3 and all its tests passed, but when linking Abinit, several subroutines are reported missing, for example:
../../src/52_fft_mpi_noabirule/lib52_fft_mpi_noabirule.a(m_fftw3.o): In function `__m_fftw3_MOD_cplan_many_dft':
/home/gryko/abinit-7.4.2/src/52_fft_mpi_noabirule/m_fftw3.F90:2908: undefined reference to `sfftw_plan_many_dft_'
../../src/52_fft_mpi_noabirule/lib52_fft_mpi_noabirule.a(m_fftw3.o): In function `__m_fftw3_MOD_fftw3_c2c_op_spc':
/home/gryko/abinit-7.4.2/src/52_fft_mpi_noabirule/m_fftw3.F90:1802: undefined reference to `sfftw_execute_dft_'
../../src/52_fft_mpi_noabirule/lib52_fft_mpi_noabirule.a(m_fftw3.o): In function `__m_fftw3_MOD_fftw3_execute_dft_spc':
/home/gryko/abinit-7.4.2/src/52_fft_mpi_noabirule/m_fftw3.F90:3215: undefined reference to `sfftw_execute_dft_'
/home/gryko/abinit-7.4.2/src/52_fft_mpi_noabirule/m_fftw3.F90:3215: undefined reference to `sfftw_execute_dft_'
/home/gryko/abinit-7.4.2/src/52_fft_mpi_noabirule/m_fftw3.F90:3215: undefined reference to `sfftw_execute_dft_'
/home/gryko/abinit-7.4.2/src/52_fft_mpi_noabirule/m_fftw3.F90:3215: undefined reference to `sfftw_execute_dft_'
../../src/52_fft_mpi_noabirule/lib52_fft_mpi_noabirule.a(m_fftw3.o):/home/gryko/abinit-7.4.2/src/52_fft_mpi_noabirule/m_fftw3.F90:3215: more undefined references to `sfftw_execute_dft_' follow

Any suggestions why?

Thank you very much in advance,
Jan Gryko
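
A note on the linker errors above: in FFTW 3's legacy Fortran interface, the sfftw_ prefix marks the single-precision wrappers, which are provided by libfftw3f (built when FFTW is configured with --enable-float or --enable-single), while a default double-precision build provides only the dfftw_ names in libfftw3. Whether a missing libfftw3f is the actual cause here is an assumption, since the thread does not show how FFTW was configured or which libraries were on the link line. The sketch below only sorts the undefined symbols in a linker log by that prefix; missing_fftw_libs is a hypothetical helper, and the log is shortened from the messages above.

```python
import re

# Prefixes of FFTW 3's legacy Fortran wrappers and the library that provides each.
FFTW_PREFIXES = {
    "sfftw_": "libfftw3f (single precision)",
    "dfftw_": "libfftw3 (double precision)",
}

# Matches both "undefined reference to `x'" and "more undefined references to `x'".
UNDEFINED_REF = re.compile(r"undefined references? to `([^']+)'")


def missing_fftw_libs(linker_log):
    """Map each undefined FFTW symbol in a linker log to the library expected to define it."""
    found = {}
    for symbol in UNDEFINED_REF.findall(linker_log):
        for prefix, lib in FFTW_PREFIXES.items():
            if symbol.startswith(prefix):
                found[symbol] = lib
    return found


log = """
m_fftw3.F90:2908: undefined reference to `sfftw_plan_many_dft_'
m_fftw3.F90:1802: undefined reference to `sfftw_execute_dft_'
m_fftw3.F90:3215: more undefined references to `sfftw_execute_dft_' follow
"""

for symbol, lib in missing_fftw_libs(log).items():
    print(f"{symbol} -> {lib}")
```

Both symbols in this log map to the single-precision library, so checking that libfftw3f was built and is on the link line next to libfftw3 would be a reasonable first step.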
