SCF cycle deteriorates during molecular dynamics

Total energy, geometry optimization, DFT+U, spin....

Moderator: bguster

gabriel.antonius
Posts: 58
Joined: Mon May 03, 2010 10:34 pm

SCF cycle deteriorates during molecular dynamics

Post by gabriel.antonius » Fri Apr 13, 2018 7:21 pm

Dear all,

I’ve encountered this problem several times. During molecular dynamics, the SCF cycle degrades at each iteration, and at some point it no longer converges for the remaining iterations (see image). However, restarting the molecular dynamics from the last non-converging step stabilizes the cycle for a number of iterations, until it starts diverging again. This points to a bug in the code rather than a physical feature, and it tends to appear in large systems (bulk supercells or 2D slabs with vacuum).

plot-51-110-md-scf.jpg (96.02 KiB)
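For anyone wanting to check their own run for the same trend, here is a minimal Python sketch that groups the SCF iterations of an output file by cycle, so the iteration count and final residual can be followed from one MD step to the next. It assumes the SCF lines look like ` ETOT <iter> <etot> <deltaE> <residm> <vres2>` and that the last column is the squared potential residual (the quantity called V2res later in this thread). The names `scf_cycles` and `summarize` are made up for this example and are not part of Abinit.

```python
import re

# Assumed line shape: " ETOT  <iter>  <etot>  <deltaE>  <residm>  <vres2>"
ETOT_LINE = re.compile(
    r"^\s*ETOT\s+(\d+)\s+(\S+)\s+(\S+)\s+(\S+)\s+(\S+)\s*$"
)


def scf_cycles(output_text):
    """Group the last-column residuals by SCF cycle.

    A new cycle is assumed to start whenever the iteration counter is 1.
    """
    cycles = []
    for line in output_text.splitlines():
        match = ETOT_LINE.match(line)
        if match is None:
            continue  # not an SCF iteration line
        iteration = int(match.group(1))
        residual = float(match.group(5))
        if iteration == 1 or not cycles:
            cycles.append([])
        cycles[-1].append(residual)
    return cycles


def summarize(cycles):
    """Return (number of SCF iterations, last residual) for each cycle."""
    return [(len(cycle), cycle[-1]) for cycle in cycles]
```

Calling `summarize(scf_cycles(open("calc.out").read()))` would then give one pair per SCF cycle. A steadily growing iteration count or final residual is the kind of degradation described above.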


I’ve reported this problem before, and the solution was to disable vectorization when compiling Abinit. This time, however, I’m using neither vectorization nor optimization. I use the Intel compilers with MKL on a Cray XC40 system with Intel Xeon "Haswell" processor nodes. Here are some of the compilation flags that appear in my config.log file.

FCFLAGS='-O0 -z muldefs'
FCFLAGS_64BITS=''
FCFLAGS_DEBUG=''
FCFLAGS_EXTRA=''
FCFLAGS_F90=''
FCFLAGS_FIXEDFORM='-fixed'
FCFLAGS_FREEFORM='-free'
FCFLAGS_HINTS='-extend-source -vec-report0 -noaltparam -nofpscomp'
FCFLAGS_MODDIR='-module $(abinit_moddir)'
FCFLAGS_OPENMP=''
FCFLAGS_OPTIM=''


Attached are the input and output files for a "minimal" working example. It is a supercell with 40 atoms. I hope we can track down this bug soon, and I'm looking for any work-around you might know of.
Attachments
calc.in (2.48 KiB)
calc.out (939.91 KiB)
Gabriel Antonius
Université du Québec à Trois-Rivières

Boris
Posts: 128
Joined: Tue Feb 16, 2010 10:13 am
Location: France

Re: SCF cycle deteriorates during molecular dynamics

Post by Boris » Tue Apr 17, 2018 9:37 pm

Hi

I have the exact same issue for a large 64-atom actinide supercell with spin-orbit coupling. The SCF cycle slowly degrades as the dynamics goes on, except that I cannot restart the molecular dynamics because of convergence issues.

I have not yet been able to identify the source of this problem, but we know it exists. If you run into anything that can help us, please let us know. Try deactivating OpenMP if you're using it.
----------------------------------------------------------
Boris Dorado
Atomic Energy Commission
France
----------------------------------------------------------

jbeuken
Posts: 365
Joined: Tue Aug 18, 2009 9:24 pm

Re: SCF cycle deteriorates during molecular dynamics

Post by jbeuken » Thu Apr 19, 2018 5:17 pm

Hi,

Have you already tried OpenBLAS or netlib instead of MKL?
It's probably slower...

my 5¢
------
Jean-Michel Beuken
Computer Scientist

gabriel.antonius
Posts: 58
Joined: Mon May 03, 2010 10:34 pm

Re: SCF cycle deteriorates during molecular dynamics

Post by gabriel.antonius » Mon Apr 23, 2018 1:53 am

Yes, I tried compiling with Netlib and the GNU compilers, but the result is the same.
Gabriel Antonius
Université du Québec à Trois-Rivières

gabriel.antonius
Posts: 58
Joined: Mon May 03, 2010 10:34 pm

Re: SCF cycle deteriorates during molecular dynamics

Post by gabriel.antonius » Mon Apr 30, 2018 4:56 am

So I have been playing with various parameters to alter the SCF cycle and try to work around the bug, without success. Here I wanted to give a small update.

When I increase the convergence criterion tolvrs, I see that the error in V2res builds up progressively from one iteration to the next. This means that noise is really being added to the potential, and that it grows at each iteration.

plot-53-106-md-scf.jpg (120.23 KiB)


For this last test, Abinit was compiled with the GNU compilers, without OpenMP, and without any external library (using all the fallbacks).

I would like to emphasize again that this bug is frequent, and that it is an important barrier to the usability of the code...
Gabriel Antonius
Université du Québec à Trois-Rivières

gabriel.antonius
Posts: 58
Joined: Mon May 03, 2010 10:34 pm

Re: SCF cycle deteriorates during molecular dynamics

Post by gabriel.antonius » Tue May 01, 2018 3:50 am

I realized the problem was linked to FFT parallelism, and that it arises for all the FFT libraries I've tried. So the work-around is to set npfft=1.

I am trying to take advantage of OpenMP parallelism to circumvent this limitation, but that's another story...
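For anyone applying this work-around to many input files, here is a small Python sketch that forces npfft to 1 in the text of an input file. It assumes the variable, if present, is written as `npfft <integer>` at the start of a line. The helper name `set_npfft` is made up for this example.

```python
import re


def set_npfft(input_text, value=1):
    """Return input_text with npfft set to value, appending it if absent.

    Assumes npfft, when present, starts a line as "npfft <integer>".
    """
    pattern = re.compile(r"^(\s*)npfft\s+\d+", flags=re.MULTILINE)
    new_setting = f"npfft {value}"
    if pattern.search(input_text):
        # Keep the original indentation, replace only the setting itself.
        return pattern.sub(lambda m: m.group(1) + new_setting, input_text)
    separator = "" if (not input_text or input_text.endswith("\n")) else "\n"
    return input_text + separator + new_setting + "\n"
```

The other parallelization variables, or the number of MPI processes, will presumably need adjusting so that the process distribution stays consistent. This sketch does not attempt that.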
Gabriel Antonius
Université du Québec à Trois-Rivières

Boris
Posts: 128
Joined: Tue Feb 16, 2010 10:13 am
Location: France

Re: SCF cycle deteriorates during molecular dynamics

Post by Boris » Wed May 02, 2018 9:25 am

Are you sure it is linked to the FFT parallelism? I'm using npfft 1 and I am still having this issue.

OpenMP should not be a problem if you're using the latest version of Abinit with the new implementation of the LOBPCG algorithm (set by default). It works fairly well with OpenMP.
----------------------------------------------------------
Boris Dorado
Atomic Energy Commission
France
----------------------------------------------------------

Boris
Posts: 128
Joined: Tue Feb 16, 2010 10:13 am
Location: France

Re: SCF cycle deteriorates during molecular dynamics

Post by Boris » Fri May 04, 2018 8:53 am

gabriel.antonius wrote: I realized the problem was linked to FFT parallelism, and that it arises for all the FFT libraries I've tried. So the work-around is to set npfft=1.

I am trying to take advantage of OpenMP parallelism to circumvent this limitation, but that's another story...


We may have a clue about what's going on here. We have found out that the forces are not synchronized correctly, so each MPI process does its own calculation until the differences between the MPI processes become too large and the Broyden goes nuts.

This is indeed linked to the FFT parallelism and can be worked around by setting npfft 1. However, if you turn OpenMP on, you will have the same issue, because you will basically "replace" the FFT parallelism with OpenMP threads.
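To illustrate the mechanism described above (this is a toy model, not Abinit code), the Python sketch below lets several simulated "ranks" each update their own copy of one number. The chaotic update rule and the tiny rank-dependent noise are assumptions standing in for a sensitive nonlinear iteration and for differing round-off. The function name `simulate` is made up for this example.

```python
import random


def simulate(n_ranks=4, n_steps=200, synchronize=False, noise=1e-12):
    """Toy model of replicated data drifting apart across MPI-like ranks.

    Each "rank" updates its own copy of one number with a chaotic map plus
    a tiny rank-dependent perturbation. Returns the spread (max - min)
    across ranks after each step.
    """
    rngs = [random.Random(rank) for rank in range(n_ranks)]
    state = [0.3] * n_ranks  # identical on every rank at the start
    spreads = []
    for _ in range(n_steps):
        state = [
            3.9 * x * (1.0 - x) + rng.uniform(-noise, noise)
            for x, rng in zip(state, rngs)
        ]
        if synchronize:
            # "Broadcast" rank 0's value so that all copies stay identical.
            state = [state[0]] * n_ranks
        spreads.append(max(state) - min(state))
    return spreads
```

Without synchronization, the spread starts at the noise level and grows by many orders of magnitude over the run. With the broadcast step it stays exactly zero. Slow degradation followed by sudden failure is roughly what one would expect if replicated quantities such as forces drift apart between processes.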

We're working on it!
----------------------------------------------------------
Boris Dorado
Atomic Energy Commission
France
----------------------------------------------------------
