Problem with t41.file test

sdwang
Posts: 5
Joined: Thu Apr 22, 2010 5:05 am

Problem with t41.file test

Post by sdwang » Fri Sep 24, 2010 9:56 am

Dear all,
When I ran the test/t41.file test, the following error occurred:

*** glibc detected *** free(): invalid pointer: 0x0000002a99d35010 ***
p0_15043: p4_error: interrupt SIGx: 6
forrtl: error (69): process interrupted (SIGINT)
rm_l_1_29792: (2.921875) net_send: could not write to fd=5, errno = 32
forrtl: error (69): process interrupted (SIGINT)
p0_15043: (5.964844) net_send: could not write to fd=4, errno = 32

What is the problem?
SDwang

mverstra
Posts: 655
Joined: Wed Aug 19, 2009 12:01 pm

Re: Problem with t41.file test

Post by mverstra » Mon Oct 11, 2010 12:18 pm

1) Read the netiquette in viewtopic.php?f=20&t=251
2) This is probably linked to your build of abinit; in particular, p4_error probably means a parallelization error.

You really can't expect us to explain your crash on so little information...
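
As a rough aid for reading the messages above, and not a diagnosis: the number after "interrupt SIGx:" is a signal number, and the number after "errno =" is a system error code. On Linux, errno 32 is EPIPE (broken pipe), which here most likely only means that the partner process had already gone away. Signal numbers can be looked up from a POSIX shell, as in this small sketch (the names in the comments assume Linux numbering):

```shell
# Print the signal name (without the SIG prefix) for a given signal number.
kill -l 6     # ABRT on Linux: abort(), which glibc raises after "free(): invalid pointer"
kill -l 11    # SEGV on Linux: segmentation fault
```

This only names the signals; it does not say which part of the build triggered them.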

Matthieu
Matthieu Verstraete
University of Liege, Belgium

sdwang
Posts: 5
Joined: Thu Apr 22, 2010 5:05 am

Killed parallel run

Post by sdwang » Tue Oct 12, 2010 5:41 am

I have tested a parallel calculation with ./tests/tparal_1.in, but it stops at:
================================================================================

getcut: wavevector= 0.0000 0.0000 0.0000 ngfft= 36 36 36
ecut(hartree)= 30.000 => boxcut(ratio)= 2.06487
scfcv : before setvtr, energies%e_hartree= 0.000000000000000E+000

ewald : nr and ng are 3 and 11
mklocl_recipspace : will add potential with strength vprtrb(:)=
0.000000000000000E+000 0.000000000000000E+000
setvtr : istep,n1xccc,moved_rhor= 1 0 0
scfcv : after setvtr, energies%e_hartree= 0.000000000000000E+000

ITER STEP NUMBER 1
vtorho : nnsclo_now= 2, note that nnsclo,dbl_nnsclo,istep= 0 0 1
p1_32116: p4_error: interrupt SIGSEGV: 11
p0_32111: p4_error: interrupt SIGSEGV: 11
forrtl: error (69): process interrupted (SIGINT)
rm_l_1_32174: (2.261719) net_send: could not write to fd=5, errno = 32
p1_32116: (2.261719) net_send: could not write to fd=5, errno = 32
p0_32111: (4.523438) net_send: could not write to fd=4, errno = 32

I do not know why. Below is part of my log file.
=== Build Information ===
Version : 6.2.2
Build target : x86_64_linux_intel9.0
Build date : 20101012

=== Compiler Suite ===
C compiler : gnu3.4
CFLAGS : -g -O3 -fschedule-insns2 -march=nocona -mmmx -msse -msse2 -msse3 -mfpmath=sse
C++ compiler : gnu3.4
CXXFLAGS : -g -O3 -fschedule-insns2 -march=nocona -mmmx -msse -msse2 -msse3 -mfpmath=sse
Fortran compiler : intel9.0
FCFLAGS : -g -extend-source -vec-report0
FC_LDFLAGS : -static-libgcc -static-intel

=== Optimizations ===
Debug level : yes
Optimization level : standard
Architecture : intel_xeon

=== MPI ===
Parallel build : yes
Parallel I/O : yes

=== Linear algebra ===
Library flavor : @linalg_flavor@
Use ScaLAPACK : no

=== Plug-ins ===
BigDFT : no
ETSF I/O : no
LibXC : no
FoX : no
NetCDF : no
Wannier90 : no

=== Experimental features ===
Bindings : no
Exports : no
GW double-precision : no
Macroave build : yes

mverstra
Posts: 655
Joined: Wed Aug 19, 2009 12:01 pm

Re: Problem with t41.file test

Post by mverstra » Sat Oct 16, 2010 11:53 am

Your code is segfaulting, but there's no way to tell why from this distance. These input files have run on dozens of reference architectures every night for years, so the problem lies with your build or your hardware, or you have modified the input file. Your compilers are quite old, but this should not be the problem.
- Check that your parallel mpif90/mpicc wrappers were compiled with the same versions of the compilers.
- Compile without optimizations or (first) run under a debugger:

* read the gdb manual or a howto

* mpirun -np 4 abinit < etc.etc.etc. > &

* top gives you the pid for the instances of abinit, then you can run

* gdb $ABINITPATH/abinit <pid1>

* inside gdb, type cont to continue execution, and see where it crashes.

Also, does it run sequentially?
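
The attach step in the list above can be sketched as a small shell helper. This is only an illustration under assumptions: `print_attach_commands` is a hypothetical name, `pgrep` is assumed to be installed, and the process is assumed to be called `abinit` with its binary under `$ABINITPATH`. It prints one gdb command per running instance instead of reading the PIDs off `top`.

```shell
#!/bin/sh
# Hypothetical helper: print one "gdb <binary> <pid>" command for each
# running process whose name matches exactly.
print_attach_commands() {
    name=$1      # process name to look for, e.g. abinit
    binary=$2    # path to the executable that was launched
    # pgrep -x matches the process name exactly and prints one PID per line.
    for pid in $(pgrep -x "$name"); do
        echo "gdb $binary $pid"
    done
}

# Example: with the parallel run already started in the background,
# list the attach commands for every abinit instance (prints nothing if none run).
print_attach_commands abinit "${ABINITPATH:-.}/abinit"
```

Run each printed command in its own terminal. Inside gdb, `cont` resumes the process, and once it stops at the crash, `bt` prints the backtrace. A build with `-g` (as in the log above) lets gdb show source lines.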

matthieu
Matthieu Verstraete
University of Liege, Belgium

Naina
Posts: 1
Joined: Tue Sep 27, 2016 1:25 pm

Re: Problem with t41.file test

Post by Naina » Tue Sep 27, 2016 1:29 pm

Hi,

I am running into a similar error and have not been able to figure out why my jobs are crashing. Any help would be greatly appreciated.

Requested basis set is non-standard
Compound shells will be simplified
There are 30 shells and 82 basis functions
A cutoff of 1.0D-12 yielded 442 shell pairs
There are 3388 function pairs ( 4202 Cartesian)
Smallest overlap matrix eigenvalue = 4.51E-03
p0_947: p4_error: interrupt SIGSEGV: 11

Below is what my Q-Chem input looks like:
$molecule
0 5
S
Fe 1 2.030996
$end

$rem
BASIS gen
ECP gen
EXCHANGE PBE
CORRELATION PBE
MAX_SCF_CYCLES 200
SCF_ALGORITHM DIIS_GDM
INCDFT FALSE
VARTHRESH FALSE
SYMMETRY FALSE
JOBTYPE freq
MEM_TOTAL = 4000
MEM_STATIC = 256
$end
