weird bug appearing only in self-consistent GW (with PAW)

etea
Posts: 2
Joined: Mon Mar 24, 2014 9:38 am

weird bug appearing only in self-consistent GW (with PAW)

Post by etea » Mon Mar 24, 2014 10:40 am

Dear abiniters,

I am getting a SIGBUS (or a SIGSEGV, depending on the machine's mood) with abinit-7.6.2 here:
=> 70_qw/calc_vhxc_me.F90: line 526 : ABI_FREE(kinpw)
This is not an out-of-memory problem, which is quite often the cause of such crashes.
I have 20 GB for a bulk test calculation (2 atoms, 8 k-points in the IBZ, reduced cut-offs, etc.), and all tests were run on 1 and 2 CPUs.

A SIGSEGV usually points to an array bound violation, which is what I found. Debugging:
1/ The array kinpw is allocated with size npw_k = Wfd%Kdata(ik_ibz)%npw, which is the number of plane waves at the current k-point:
70_qw/calc_vhxc_me.F90: line 428 : ABI_MALLOC(kinpw,(npw_k))
2/ It is then passed to mkkin() to calculate the kinetic energy of the plane waves at this k-point, but the plane-wave count given in the call is Wfd%npwwfn, not npw_k.
-> Wfd%npwwfn is not k-dependent (absolute maximum number of pw for any k-point?) and is always larger than npw_k (this has been checked)
70_qw/calc_vhxc_me.F90: line 430 : call mkkin(Dtset%ecutwfn+0.1_dp,Dtset%ecutsm,Dtset%effmass,Cryst%gmet,kg_k,kinpw,kpt,Wfd%npwwfn)
3/ The loop in mkkin() fills the kinpw array unconditionally for all elements ( 1 : npw=Wfd%npwwfn ):
56_recipspace/mkkin.F90: line 107 : do ig=1,npw
56_recipspace/mkkin.F90: line 136 : kinpw(ig)=kinetic/effmass
So the code writes data past the end of kinpw, into memory that does not belong to the array.
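
The mechanism in steps 1/ to 3/ can be reproduced in a small standalone sketch. This is not ABINIT code: fill_kin is a hypothetical stand-in for mkkin(), the sizes are made up, and a guard region inside one larger buffer stands in for whatever memory follows the real kinpw allocation, so that the overrun can be observed without undefined behaviour.

```fortran
module kin_demo
  implicit none
  integer, parameter :: dp = kind(1.0d0)
contains
  ! Hypothetical stand-in for mkkin(): explicit-shape dummy array and an
  ! unconditional loop over npw, as in the lines quoted from mkkin.F90.
  subroutine fill_kin(kin, npw)
    integer, intent(in) :: npw
    real(dp), intent(out) :: kin(npw)
    integer :: ig
    do ig = 1, npw
      kin(ig) = real(ig, dp)   ! placeholder for kinetic/effmass
    end do
  end subroutine fill_kin
end module kin_demo

program overrun_demo
  use kin_demo
  implicit none
  ! Made-up sizes: npw_k plays Wfd%Kdata(ik_ibz)%npw, npw_max plays Wfd%npwwfn.
  integer, parameter :: npw_k = 5, npw_max = 8
  real(dp) :: buf(npw_max)
  integer :: n_clobbered

  ! buf(1:npw_k) plays the role of kinpw; buf(npw_k+1:) is the guard region.
  buf = -1.0_dp
  call fill_kin(buf(1), npw_max)   ! mirrors passing Wfd%npwwfn
  n_clobbered = count(buf(npw_k+1:) /= -1.0_dp)
  print *, 'guard elements overwritten with npw = npw_max:', n_clobbered

  buf = -1.0_dp
  call fill_kin(buf(1), npw_k)     ! mirrors passing npw_k
  n_clobbered = count(buf(npw_k+1:) /= -1.0_dp)
  print *, 'guard elements overwritten with npw = npw_k:  ', n_clobbered
end program overrun_demo
```

With these sizes the first count should be 3 and the second 0: the callee trusts the npw it is given and has no way of knowing how large the caller's array really is.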

Now the weird part:
1/ The signals show up only in self-consistent GW (gwcalctyp>=20) with PAW (no spin-orbit).
2/ The signals do not show up in one-shot GW (default gwcalctyp) with PAW, or in any GW with NC psps, even though the kinpw() bounds are exceeded there as well.

What is reassuring is that in calc_vhxc_me() the out-of-bounds elements kinpw(npw_k+1:Wfd%npwwfn) are never referenced.
However, a SIGBUS or SIGSEGV showing up on deallocation means that the pointer to kinpw() has become ill-defined somewhere by that point (is that right?).
The fact that this 'somewhere' depends on the psps used and/or the type of GW calculation may point to initialization-dependent behaviour (wavefunctions etc.).
I hope that with PAW the writes to kinpw(npw_k+1:Wfd%npwwfn) do not actually overwrite existing data. I have not looked at how ABI_MALLOC() works...

To solve the problem I simply replaced "Wfd%npwwfn" with "npw_k" in the call to mkkin(), which I think is what was intended.
No more SIGBUS or SIGSEGV (even for larger calculations with less available memory).
I would appreciate feedback from the developers.
I have two questions:
1/ Am I doing anything wrong with this substitution?
2/ Am I missing something (cause of the bug, solution, consequences of the bug, etc.)?

Best regards,
Eric Tea

raul_l
Posts: 74
Joined: Sun Jan 08, 2012 7:45 pm

Re: weird bug appearing only in self-consistent GW (with PAW

Post by raul_l » Mon Mar 24, 2014 8:13 pm

I had exactly the same problem. In mkkin.F90 I changed

do ig=1,npw

to

do ig=1,min(size(kinpw),npw)

which, I think, is effectively the same thing you did. Also, I had to change

real(dp),intent(out) :: kinpw(npw)

to

real(dp),intent(in out) :: kinpw(:)

after which it got past line 526 in 70_qw/calc_vhxc_me.F90. However, it still crashed later on (during the 2nd round of the screening or of the QP corrections, I don't remember which anymore; I originally reported it here: viewtopic.php?f=11&t=1922). Hopefully a developer can comment on this issue.
Raul Laasner
Netherlands Institute for Space Research
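
For reference, the guarded variant described in this reply can be sketched as a standalone program. Again this is not ABINIT code: fill_kin_guarded is a hypothetical stand-in for the modified mkkin(), and the sizes are made up. One point worth keeping in mind is that an assumed-shape dummy such as kinpw(:) requires an explicit interface at every call site; here a module provides it, and how mkkin() gets its interface in ABINIT 7.6.2 has not been checked for this sketch.

```fortran
module kin_guarded
  implicit none
  integer, parameter :: dp = kind(1.0d0)
contains
  ! Hypothetical stand-in for the modified mkkin(): assumed-shape dummy,
  ! loop limited to the actual size of the array that was passed in.
  subroutine fill_kin_guarded(kin, npw)
    integer, intent(in) :: npw
    real(dp), intent(inout) :: kin(:)
    integer :: ig
    do ig = 1, min(size(kin), npw)
      kin(ig) = real(ig, dp)   ! placeholder for kinetic/effmass
    end do
  end subroutine fill_kin_guarded
end module kin_guarded

program guarded_demo
  use kin_guarded
  implicit none
  ! Made-up sizes: npw_k plays the per-k-point count, npw_max plays Wfd%npwwfn.
  integer, parameter :: npw_k = 5, npw_max = 8
  real(dp), allocatable :: kin(:)

  allocate(kin(npw_k))
  kin = 0.0_dp
  ! The caller still passes the too-large count, but the callee clamps it.
  call fill_kin_guarded(kin, npw_max)
  print *, 'size(kin) =', size(kin), ' last element =', kin(npw_k)
  deallocate(kin)
end program guarded_demo
```

The clamp keeps the callee from writing past the array whatever count it is given, whereas passing npw_k at the call site removes the mismatch at its source; the two changes are not mutually exclusive.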
