rhohxc crash

Moderator: pouillon

mverstra
Posts: 655
Joined: Wed Aug 19, 2009 12:01 pm


Post by mverstra » Wed Mar 30, 2011 7:25 am

Hello everyone,

Same platform as usual (Magerit: xlf 12 on SUSE Linux with OpenMPI).

Since the last merge with trunk I am getting fairly systematic crashes in rhohxc:
present vxctau F
==18392==
==18392== Invalid read of size 8
==18392== at 0x11303448: *xcpot_stub_in_rhohxc (rhohxc_cpp.f90:1258)
==18392== by 0x1130FA9C: rhohxc (rhohxc_cpp.f90:1261)
==18392== by 0x10BE40C8: setvtr (setvtr_cpp.f90:767)
==18392== by 0x1021FE50: *setvtr_stub_in_scfcv (in /gpfs/storage/home/ulg32/ulg32347/CODES/ABINIT/6.7.1-private/tmp-seq/src/98_main/abinit)
==18392== by 0x1023BEDC: scfcv (scfcv_cpp.f90:1280)
==18392== by 0x100DD268: *scfcv_stub_in_scfcv_new (in /gpfs/storage/home/ulg32/ulg32347/CODES/ABINIT/6.7.1-private/tmp-seq/src/98_main/abinit)
==18392== by 0x100DEA64: scfcv_new (scfcv_new_cpp.f90:681)
==18392== by 0x101B46AC: gstate (gstate_cpp.f90:1374)
==18392== by 0x10041A0C: gstateimg (gstateimg_cpp.f90:840)
==18392== by 0x100110F4: driver (driver_cpp.f90:954)
==18392== by 0x100050FC: *driver_stub_in_abinit (abinit_cpp.f90:827)
==18392== by 0x10007290: main (abinit_cpp.f90:827)
==18392== Address 0x0 is not stack'd, malloc'd or (recently) free'd
==18392==
==18392== Process terminating with default action of signal 11 (SIGSEGV)
==18392== Access not within mapped region at address 0x0
==18392== at 0x11303448: *xcpot_stub_in_rhohxc (rhohxc_cpp.f90:1258)
==18392== by 0x1130FA9C: rhohxc (rhohxc_cpp.f90:1261)
==18392== by 0x10BE40C8: setvtr (setvtr_cpp.f90:767)
==18392== by 0x1021FE50: *setvtr_stub_in_scfcv (in /gpfs/storage/home/ulg32/ulg32347/CODES/ABINIT/6.7.1-private/tmp-seq/src/98_main/abinit)
==18392== by 0x1023BEDC: scfcv (scfcv_cpp.f90:1280)
==18392== by 0x100DD268: *scfcv_stub_in_scfcv_new (in /gpfs/storage/home/ulg32/ulg32347/CODES/ABINIT/6.7.1-private/tmp-seq/src/98_main/abinit)
==18392== by 0x100DEA64: scfcv_new (scfcv_new_cpp.f90:681)
==18392== by 0x101B46AC: gstate (gstate_cpp.f90:1374)
==18392== by 0x10041A0C: gstateimg (gstateimg_cpp.f90:840)
==18392== by 0x100110F4: driver (driver_cpp.f90:954)
==18392== by 0x100050FC: *driver_stub_in_abinit (abinit_cpp.f90:827)
==18392== by 0x10007290: main (abinit_cpp.f90:827)


I can make the code go further by faking the presence of vxctau; the vxctau modifications appear to be the cause of the problems. If I do this (adding a dummy argument in the second call to xcpot), it works for LDA but crashes for GGA calculations in tests_v1.

Any hints or suggestions? It is strange that this appears only on this platform (both sequential and parallel builds), and that adding bounds checking does not help (the behavior is the same at -O0 and -O3).

Matthieu
Matthieu Verstraete
University of Liege, Belgium

pouillon
Posts: 651
Joined: Wed Aug 19, 2009 10:08 am
Location: Spain

Re: rhohxc crash

Post by pouillon » Wed Mar 30, 2011 9:41 am

Zero-sized array issue?
Yann Pouillon
Simune Atomistics
Donostia-San Sebastián, Spain

mverstra
Posts: 655
Joined: Wed Aug 19, 2009 12:01 pm

Re: rhohxc crash

Post by mverstra » Thu Apr 07, 2011 9:19 pm

No, it appears to be intent(in) combined with optional, an IBM compiler bug (this bug is present in the v12.1 compiler):

https://www-304.ibm.com/support/docview ... wg1LI75825

However, it does not fail systematically, and removing the intent(in) does not suffice: there are several levels of optional, intent(in), so it could be more complex. I hope they will patch xlf...
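
For anyone who wants to test their own compiler, here is a minimal sketch of the "several levels of optional, intent(in)" pattern. It is a hypothetical reduction, not the ABINIT source and not the IBM test case: the names optional_chain, inner_level, outer_level, pot and tau_term are all made up for illustration.

```fortran
! Hypothetical reduction for illustration; none of these names come from ABINIT.
module optional_chain
  implicit none
  integer, parameter :: dp = kind(1.0d0)
contains

  ! Innermost routine: reads tau_term only after checking present().
  subroutine inner_level(n, pot, tau_term)
    integer, intent(in) :: n
    real(dp), intent(out) :: pot(n)
    real(dp), intent(in), optional :: tau_term(n)
    pot = 1.0_dp
    if (present(tau_term)) pot = pot + tau_term
  end subroutine inner_level

  ! Outer routine: forwards its own optional dummy without testing it.
  ! The standard allows this; an absent argument stays absent one level down.
  subroutine outer_level(n, pot, tau_term)
    integer, intent(in) :: n
    real(dp), intent(out) :: pot(n)
    real(dp), intent(in), optional :: tau_term(n)
    call inner_level(n, pot, tau_term)
  end subroutine outer_level

end module optional_chain

program demo_optional
  use optional_chain
  implicit none
  real(dp) :: pot(3), tau(3)
  tau = 0.5_dp
  call outer_level(3, pot)        ! tau_term absent at both levels
  print *, 'absent :', pot
  call outer_level(3, pot, tau)   ! tau_term present at both levels
  print *, 'present:', pot
end program demo_optional
```

With a conforming compiler both calls should complete, and the first should never read tau_term. If the first call instead dies with a read at address 0x0, that would suggest the compiler is mishandling the absent argument rather than the calling code being wrong.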


@#%@#%!

Matthieu
Matthieu Verstraete
University of Liege, Belgium

mverstra
Posts: 655
Joined: Wed Aug 19, 2009 12:01 pm

Re: rhohxc crash

Post by mverstra » Thu May 05, 2011 8:29 am

Update: the patch for the compiler bug cited above was applied, and that bug is fixed (I tried with the IBM test case), but abinit still crashes. This is really a pain...
Matthieu Verstraete
University of Liege, Belgium

pouillon
Posts: 651
Joined: Wed Aug 19, 2009 10:08 am
Location: Spain

Re: rhohxc crash

Post by pouillon » Thu May 05, 2011 12:20 pm

Then, either:
  • the bug has only been partly fixed;
  • there is a flaw in Abinit as well;
but knowing IBM and Abinit, it could be a combination of both too. ;)
Yann Pouillon
Simune Atomistics
Donostia-San Sebastián, Spain

mverstra
Posts: 655
Joined: Wed Aug 19, 2009 12:01 pm

Re: rhohxc crash

Post by mverstra » Sun Jun 05, 2011 8:34 pm

Present workaround with the 6.9.0 build: comment out the 2 lines in rhohxc that call xcpot with the optional argument for tau gradients... Not happy, but it passes most tests.

Matthieu
Matthieu Verstraete
University of Liege, Belgium
