SCF convergence deteriorates during structural relaxation resulting in crash

Post by kalkm1 » Mon Mar 15, 2021 12:43 pm

Hi,

I am trying to relax a 64-atom supercell of CdMnTe using ionmov 2 and optcell 2.

For bulk CdTe, the SCF loop converges and the structure relaxes below my tolerance (tolmxf = 10e-4) after a few Broyden iterations. All good!

However, once I introduce Mn into the supercell (e.g. Cd0.5Mn0.5Te), the SCF loop does not converge. Notably, it also starts at a very high density residual (nres2 = 2.060E+02), which is even larger after 30 SCF iterations (nres2 = 7.362E+02). After a few Broyden iterations in which the SCF loop fails to converge, the calculation crashes.

Any suggestions as to why this is happening, and which parameters I could change to get the SCF loop to converge, would be greatly appreciated. I have tried several values of diemac (8, 12, 50), but this has not helped.

I have attached my input file for the CdMnTe supercell calculation and the corresponding output file.
den.in (5.75 KiB)
den_oc2im2_d50.out (74.45 KiB)
Cheers,
kalkm1
Last edited by kalkm1 on Wed Mar 17, 2021 6:58 pm, edited 1 time in total.

Post by kalkm1 » Wed Mar 17, 2021 6:58 pm

Update:

Okay, I have managed to get the SCF loop to behave by reducing diemix from 0.7 to 0.1; it now converges in about 60 steps in the first Broyden iteration (output file attached).
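Why a smaller mixing factor can rescue a diverging SCF loop can be illustrated with a toy fixed-point problem. The Python sketch below is a hypothetical one-variable model, not Abinit's actual mixing scheme (which is considerably more elaborate). It assumes the "output density" responds linearly to the input with a strongly negative coefficient, mimicking charge sloshing, and applies plain damped linear mixing with a fraction alpha playing the role of diemix. For this model the error is multiplied by 1 + alpha*(response - 1) each iteration: with response = -4, alpha = 0.7 gives -2.5 (oscillating growth, qualitatively like a residual that grows instead of shrinking), while alpha = 0.1 gives 0.5 (steady convergence).

```python
def linear_mixing(response, alpha, x0=1.0, x_star=0.0, tol=1e-10, max_iter=100):
    """Damped linear mixing on a hypothetical one-variable 'density'.

    Toy output map (an assumption, not Abinit's physics):
        x_out = x_star + response * (x_in - x_star)
    A strongly negative `response` mimics charge sloshing: the output
    overshoots the fixed point x_star in the opposite direction.
    Returns (converged, iterations_used, residual_history).
    """
    x = x0
    residuals = []
    for it in range(1, max_iter + 1):
        x_out = x_star + response * (x - x_star)
        residual = abs(x_out - x)  # analogue of a density residual
        residuals.append(residual)
        if residual < tol:
            return True, it, residuals
        # Keep a fraction alpha (the role diemix plays here) of the output
        x = x + alpha * (x_out - x)
    return False, max_iter, residuals


if __name__ == "__main__":
    response = -4.0
    for alpha in (0.7, 0.1):
        # Error amplification factor per iteration for this linear toy model
        factor = 1.0 + alpha * (response - 1.0)
        ok, n, res = linear_mixing(response, alpha)
        print(f"alpha={alpha}: factor={factor:+.2f}, converged={ok}, "
              f"iterations={n}, last residual={res[-1]:.3e}")
```

The design choice is deliberately minimal: a real SCF residual has many components with different response coefficients, and the strongly responding ones (often long-wavelength charge fluctuations) are the ones that force a small mixing factor or a better preconditioner.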

However, I have now run into a new issue with the SCF loops in subsequent Broyden (structural relaxation) iterations. For the first 5 Broyden iterations the SCF loop converges in ~30 steps, but in iterations 6 and 7 it deteriorates and fails to converge within 100 SCF steps, and the calculation crashes after the 7th Broyden iteration.

I should note that in some runs of the identical calculation (64-atom Cd0.5Mn0.5Te) the SCF convergence does not deteriorate and the calculation completes successfully; other times it crashes as described above.

When the SCF loop deteriorates, is this a case of the structural relaxation not finding its minimum? If so, why does the calculation crash after the SCF loop has already completed, and (according to the log file, whose last lines are attached) while Abinit is trying to create an HDF5 file? The last line in the log file before the calculation terminates is always:
- Creating HDf5 file with MPI-IO support: tmp/den_o_GSR.nc
I've attached a figure showing the total energy as a function of the stress on the supercell: the structural relaxation progresses well until the stress suddenly increases and the calculation crashes.

If you have any insight into why this is happening (whether it is an issue with the code or a minimization problem) and what can be done to avoid it, please let me know!

den_oc2im2_d8_O2sm.out (262.88 KiB)
den.log (7.74 KiB)
Figure_1.png
Cheers,
kalkm1

Post by gmatteo » Sat Mar 20, 2021 5:04 am

kalkm1 wrote: "The last line in the log file before the calculation terminates is always:
- Creating HDf5 file with MPI-IO support: tmp/den_o_GSR.nc"
The problem is not necessarily related to writing the GSR file.
In your log file, I find the following section:


- Creating HDf5 file with MPI-IO support: tmp/den_o_GSR.nc
application called MPI_Abort(MPI_COMM_WORLD, 13) - process 27
application called MPI_Abort(MPI_COMM_WORLD, 13) - process 28
application called MPI_Abort(MPI_COMM_WORLD, 13) - process 29
application called MPI_Abort(MPI_COMM_WORLD, 13) - process 31
application called MPI_Abort(MPI_COMM_WORLD, 13) - process 32
application called MPI_Abort(MPI_COMM_WORLD, 13) - process 33
application called MPI_Abort(MPI_COMM_WORLD, 13) - process 46
application called MPI_Abort(MPI_COMM_WORLD, 13) - process 47
application called MPI_Abort(MPI_COMM_WORLD, 13) - process 48
application called MPI_Abort(MPI_COMM_WORLD, 13) - process 49
This indicates that Abinit aborted because a critical condition occurred. The condition is handled explicitly in the code, which calls MPI_ABORT to shut down all processes.
In principle, there should be an __ABI_MPIABORTFILE__ file containing the error message written by the first MPI process that invokes MPI_ABORT. That error message would help pinpoint the problem.

Unfortunately, __ABI_MPIABORTFILE__ may be empty, since the MPI runtime environment may kill all the processes before they have time to flush their I/O buffers to the file.
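When triaging such crashes, it can help to check quickly which ranks called MPI_Abort, with which error code, and whether __ABI_MPIABORTFILE__ actually contains anything. The Python sketch below is a hypothetical helper, not part of Abinit. It assumes the log is plain text containing lines in exactly the form quoted above, and that the abort file sits in the current directory; the default log name den.log is simply the attachment name used in this thread.

```python
import re
import sys
from pathlib import Path

# Pattern for log lines such as:
#   application called MPI_Abort(MPI_COMM_WORLD, 13) - process 27
ABORT_LINE = re.compile(r"MPI_Abort\((\w+),\s*(-?\d+)\)\s*-\s*process\s+(\d+)")


def aborted_ranks(log_text):
    """Map each MPI_Abort error code to the sorted list of ranks that reported it."""
    by_code = {}
    for _comm, code, rank in ABORT_LINE.findall(log_text):
        by_code.setdefault(int(code), []).append(int(rank))
    return {code: sorted(ranks) for code, ranks in by_code.items()}


def abort_message(path="__ABI_MPIABORTFILE__"):
    """Return the abort message, '' if the file is empty, or None if it is missing."""
    p = Path(path)
    if not p.is_file():
        return None
    return p.read_text(errors="replace").strip()


if __name__ == "__main__":
    log_file = Path(sys.argv[1] if len(sys.argv) > 1 else "den.log")  # hypothetical default
    if log_file.is_file():
        print("MPI_Abort code -> ranks:", aborted_ranks(log_file.read_text(errors="replace")))
    msg = abort_message()
    if msg is None:
        print("No __ABI_MPIABORTFILE__ found")
    elif not msg:
        print("__ABI_MPIABORTFILE__ is empty (buffers were probably not flushed)")
    else:
        print(msg)
```

An empty result from abort_message only says that nothing reached the file; as explained above, it does not mean no error message was produced.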

Post by kalkm1 » Sun Mar 21, 2021 12:18 pm

Thanks for your response!

I do usually get an __ABI_MPIABORTFILE__ file when the runs crash, but, as you correctly predicted, it is always empty. I had been confused about why it is empty, so your answer clarifies this somewhat. Unfortunately, it also makes the issue trickier to diagnose.

I have since made some more observations that might be of interest. For the 64-atom supercell, as mentioned in an earlier post, the calculation sometimes crashes with this error in the log file and sometimes completes.

For a 216-atom supercell, the calculation never completes: it always crashes at Broyden iteration ~7-8, with the SCF deteriorating after Broyden iteration ~4-5 (i.e. the SCF no longer converges within 100 steps). If I restart the calculation with the atomic positions and acell from one of the relaxation steps that did converge (e.g. Broyden iteration 4), the calculation again runs for 4-5 Broyden iterations before the SCF deteriorates and the job crashes.

This forum post describes a similar issue: viewtopic.php?f=8&t=4131. It suggests compiling Abinit with the -O2 and --enable-avx-safe-mode flags. This solved the problem for the user in that thread, but it has made no difference in my case.

Cheers,
kalkm1
