Bug in ncache assignment for MPI FFTs  [SOLVED]

Post by renatomiceli » Mon Nov 19, 2012 7:46 pm

Hi all,

one of our users here at the Irish Centre for High-End Computing ran into an error while running Abinit 6.12.3. The error message printed right before the execution stopped was as below:

ncache has to be enlarged to be able to hold at
least one 1-d FFT of each size even though this will
reduce the performance for shorter transform lengths

I was able to replicate his 'ncache' issue and trace it to the FFT routines within Abinit, contained in abinit-6.12.3/src/52_fft_mpi_noabirule. The variable 'ncache' defines the size of the working area for the FFT algorithm, and the program aborts if that work array is too small to fit even a single one-dimensional transform. The problem is that the value of 'ncache' is hardcoded, so execution cannot proceed if any of the FFT dimensions exceeds 1024, which is the scenario the user faced. The piece of code that caused the unexpected termination is the following:

Code:
        ncache=4*1024
        ! Integer division: the quotient is 0 as soon as max(n1,n2,n3) exceeds 1024
        if (ncache/(4*max(n1,n2,n3)).lt.1) then
          write(std_out,*) &
&           ' ncache has to be enlarged to be able to hold at', &
&           ' least one 1-d FFT of each size even though this will', &
&           ' reduce the performance for shorter transform lengths'
          stop
        end if

To work around this issue, I patched Abinit so that the work array for the FFT algorithm is sized according to the FFT dimensions. In files accrho.F90, applypot.F90, back.F90, back_wf.F90, forw.F90 and forw_wf.F90, I replaced all the assignments of 'ncache':

Code:
        ncache=4*1024

with the following, so that the working area for the FFT algorithm fits at least a single one-dimensional transform:

Code:
        ncache=4*max(n1,n2,n3,1024)

Here 'n1', 'n2' and 'n3' are the FFT dimensions.
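
To see why the original assignment trips the check while the patched one passes, the integer arithmetic can be reproduced outside Abinit. The following is only a shell sketch, not Abinit code: the helper names max_of and fits_one_fft and the example grid of 1200 x 64 x 64 are assumptions chosen for illustration. Shell $(( )) arithmetic truncates integer division, as Fortran integer division does.

```shell
#!/bin/sh
# Sketch only: reproduces the integer arithmetic of the check, not Abinit code.
# max_of and fits_one_fft are hypothetical helper names.

# Print the largest of the integer arguments.
max_of() {
    m=$1
    for v in "$@"; do
        if [ "$v" -gt "$m" ]; then m=$v; fi
    done
    echo "$m"
}

# Succeed when a work area of size $1 holds one 1-d FFT of length $2.
# $(( )) truncates, as Fortran integer division does.
fits_one_fft() {
    [ $(( $1 / (4 * $2) )) -ge 1 ]
}

# Assumed example grid: one dimension larger than 1024.
n1=1200; n2=64; n3=64
nmax=$(max_of "$n1" "$n2" "$n3")

old_ncache=$(( 4 * 1024 ))                      # original assignment
new_max=$(max_of "$n1" "$n2" "$n3" 1024)
new_ncache=$(( 4 * new_max ))                   # patched assignment

if fits_one_fft "$old_ncache" "$nmax"; then old_ok=yes; else old_ok=no; fi
if fits_one_fft "$new_ncache" "$nmax"; then new_ok=yes; else new_ok=no; fi

echo "original: ncache=$old_ncache passes=$old_ok"
echo "patched:  ncache=$new_ncache passes=$new_ok"
```

For this example grid the original value 4096 divided by 4*1200 truncates to 0, so the check fails, while the patched value 4800 gives exactly 1 and the check passes.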

With Abinit 6.12.3 patched, the code passes the check that was causing the unexpected termination, the job completes successfully and the results are physically consistent.

I would appreciate it if you could fix this bug in Abinit's main trunk.

If you'd like to replicate the bug, I've published the test case scenario here:
http://www-staff.ichec.ie/~rmiceli/abinit/

The Abinit 6.12.3 installation script I used is available here:
http://www-staff.ichec.ie/~rmiceli/abin ... -6.12.3.sh

The data sets, input files and PBS scripts are available here:
http://www-staff.ichec.ie/~rmiceli/abin ... test-case/

(where 01-H.LDA.fhi and 14-Si.LDA.fhi are the pseudopotentials; sislab.files and sislab.in are the input files; sislab.scr is the PBS script for job execution; and sislab.log, sislab.o1209327, sislab.out, sislab_STATUS and sislabo_OUT.nc are the output files after the execution unexpectedly terminates.)

You can also find the patches for the source files in abinit-6.12.3/src/52_fft_mpi_noabirule here:
http://www-staff.ichec.ie/~rmiceli/abin ... g/patches/

And here is the execution script that already applies the patches, using 'sed':
http://www-staff.ichec.ie/~rmiceli/abin ... patched.sh
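
The linked script is not reproduced here. As a rough sketch of what a 'sed'-based patching step of this kind might look like, assuming the assignment appears literally as ncache=4*1024 in each file: the helper name patch_ncache is hypothetical, and the demonstration runs on a throwaway file rather than a real source tree.

```shell
#!/bin/sh
# Sketch only: not the script linked above. patch_ncache is a hypothetical helper.

# Replace the hardcoded ncache assignment in every file given as an argument.
patch_ncache() {
    for f in "$@"; do
        # \* matches a literal asterisk in a basic regular expression.
        sed 's/ncache=4\*1024/ncache=4*max(n1,n2,n3,1024)/' "$f" > "$f.tmp" &&
            mv "$f.tmp" "$f"
    done
}

# Demonstration on a temporary file standing in for one of the sources.
demo_dir=$(mktemp -d)
printf '        ncache=4*1024\n' > "$demo_dir/back.F90"
patch_ncache "$demo_dir/back.F90"
patched_line=$(cat "$demo_dir/back.F90")
echo "$patched_line"
rm -rf "$demo_dir"

# In a real tree the call would be made from src/52_fft_mpi_noabirule, e.g.:
#   patch_ncache accrho.F90 applypot.F90 back.F90 back_wf.F90 forw.F90 forw_wf.F90
```

Writing to a temporary file and moving it into place avoids the in-place option of sed, whose syntax differs between implementations. Running the helper twice is harmless, since the patched line no longer matches the pattern.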

Please let me know if you'd like any further input.
Thank you in advance for your time and patience.

Kind regards,
Renato Miceli
Mr. Renato Miceli
Independent HPC & Supercomputing Consultant
http://renatomiceli.com/
renatomiceli
 
Posts: 10
Joined: Sat Jul 21, 2012 4:11 pm
Location: Salvador da Bahia, Brazil

Re: Bug in ncache assignment for MPI FFTs

Post by renatomiceli » Tue Jan 29, 2013 8:44 pm

Hi,

just to let you know that this bug is still present in both versions 7.0.3 and 7.0.5 (the latest Abinit release). The error message has now changed to the following:
back.F90:120:ERROR
ncache has to be enlarged to be able to hold at
least one 1-d FFT of each size even though this will
reduce the performance for shorter transform lengths
mpiexec_raw: Warning: tasks 0-143 exited with status 1.


It is now even clearer that the bug lies in file back.F90 in directory abinit-7.0.5/src/52_fft_mpi_noabirule/. The other 5 files that contain the same construct, ncache=4*1024, need to be patched as well.

I would appreciate it if you could fix this bug in Abinit's main trunk.

The test case scenario used to reach this error state is the same as before:
http://www-staff.ichec.ie/~rmiceli/abin ... test-case/

Its results with Abinit 7.0.5 are here:
http://www-staff.ichec.ie/~rmiceli/abin ... unpatched/

The Abinit 7.0.5 installation script I used is available here:
http://www-staff.ichec.ie/~rmiceli/abin ... t-7.0.5.sh

The patches I propose to fix this bug are here:
http://www-staff.ichec.ie/~rmiceli/abin ... 5/patches/

Alternatively, you could just run sed on the files, as I've done in the patched installation script here:
http://www-staff.ichec.ie/~rmiceli/abin ... patched.sh

And the results are now properly computed, as you can see here:
http://www-staff.ichec.ie/~rmiceli/abin ... e/patched/

Please let me know if you need my contribution for anything else. I will be happy to see this bug fixed at your earliest convenience.

Kind regards,
Renato Miceli

Re: Bug in ncache assignment for MPI FFTs  [SOLVED]

Post by mverstra » Mon Sep 16, 2013 1:13 pm

Hi Renato,

sorry for taking so long to integrate this. It is now in my devel 7.5.3 branch and will be tested and then released in Abinit 7.6 shortly. I can send you a source tar.gz if you wish to try it now.

PS: your reference and source files are not legible/downloadable. I did get the patch files, though.

cheers, and many thanks for the contrib!

Matthieu
Matthieu Verstraete
University of Liege, Belgium
mverstra
 
Posts: 618
Joined: Wed Aug 19, 2009 12:01 pm

Re: Bug in ncache assignment for MPI FFTs

Post by renatomiceli » Wed Jan 22, 2014 5:27 pm

Hi Matthieu,

I just saw that my patches were applied to Abinit 7.6.1, released last week.
It was a pleasure to contribute to Abinit!

Thanks again for patching the new release!

Kind regards,
Renato Miceli
