runtest.py failed 12 tests

minyez · Post by **minyez** » Sat Sep 05, 2020 11:10 am

Hi everyone,
Greetings. I am having trouble testing abinit-8.10.3 built with Intel 2019 Update 3. I am wondering whether the failed tests make the build unreliable for practical use. I would greatly appreciate it if someone could help me resolve the problem.

The packages used in building abinit are netCDF (4.7.4), netCDF-Fortran (4.5.3), LibXC (3.0.1), MKL (within Intel 19.3), FFTW3 (3.3.8) and Wannier90 (1.2). They are all built with the same ifort 19 compiler. Here is the configure command used:

Code: Select all

./configure --prefix="/opt/software/abinit/8.10.3/intel-2019.3" \
    CXX=mpiicpc FC=mpiifort CC=mpiicc MPI_RUNNER=mpirun \
    --enable-64-bit-flags \
    --with-mpi-incs="-I$MKLROOT/../mpi/intel64/include" --with-mpi-libs="-L$MKLROOT/../mpi/intel64/lib -lmpi" \
    --with-linalg-flavor="mkl+scalapack" --with-linalg-incs=-I$MKLROOT/include/intel64/lp64 \
    --with-linalg-libs="-L$MKLROOT/lib/intel64 -lmkl_blas95_lp64 -lmkl_lapack95_lp64 -lmkl_scalapack_lp64 -Wl,--start-group -lmkl_intel_lp64 -lmkl_sequential -lmkl_core -lmkl_blacs_intelmpi_lp64 -Wl,--end-group -lpthread -lm -ldl" \
    --enable-gw-dpc  --with-fft-flavor="fftw3" --with-fft-libs="-L$FFTW3DIR/lib -lfftw3_mpi -lfftw3" --with-fft-incs=-I$FFTW3DIR/include/ \
    --with-dft-flavor="atompaw+libxc+wannier90" --with-libxc-incs="-I$LIBXCDIR/include/" --with-libxc-libs="-L$LIBXCDIR/lib -lxcf90 -lxc" \
    --with-trio-flavor="netcdf" --with-netcdf-incs="-I$NETCDF_HOME/include/ -I$NETCDF_F_HOME/include/" --with-netcdf-libs="-L$NETCDF_HOME/lib -L$NETCDF_F_HOME/lib -lnetcdff -lnetcdf" \
    --with-wannier-libs="$WANNIER90_HOME/libwannier90.a" --with-wannier90-bins="$WANNIER90_HOME/" --with-wannier90-incs="-I$WANNIER90_HOME/src"

Compilation succeeded. Tests were run shortly after compilation to check the build, using the command runtests.py -j8 --force-mpi. The overall report is below.

Code: Select all

Suite            failed  passed  succeeded  skipped  disabled  run_etime  tot_etime
atompaw               2       0          0        0         0      12.26      12.48
bigdft                0       0          0       23         0       0.00       0.38
bigdft_paral          0       0          0        4         0       0.00       0.05
built-in              0       0          6        1         0       7.47       7.62
etsf_io               0       0          8        0         0      25.86      26.24
fast                  0       1         10        0         0      42.74      44.12
gpu                   0       0          0        7         0       0.00       0.01
libxc                 0       7         28        0         0     502.62     506.75
mpiio                 0       0          1       17         0       1.84       1.89
paral                 0       5         22       89         0     552.69     556.70
psml                  0       0          0       14         0       0.00       0.02
seq                   0       0          0       18         0       0.00       0.01
tutomultibinit        0       0          3        0         0      38.94      41.86
tutoparal             0       0          1        5         0       0.78       0.81
tutoplugs             0       4          0        0         0      13.25      13.42
tutorespfn            0       6         20        0         0    1368.07    1373.88
tutorial              0      13         46        0         0    1557.80    1562.85
unitary               2       0         19       13         0      96.64      97.32
v1                    0       1         75        0         0     673.91     679.98
v2                    0      14         65        0         0     939.31     945.70
v3                    1      13         64        0         0     707.81     717.84
v4                    0      11         50        0         0     683.25     691.34
v5                    0      13         60        0         0    1199.20    1213.47
v6                    1       8         52        0         0     771.25     781.04
v67mbpt               1       7         17        0         0     376.32     382.42
v7                    2      10         52        0         0    1587.33    1600.73
v8                    3       8         35        2         0    1567.68    1575.06
vdwxc                 0       0          1        0         0      14.04      14.09
wannier90             0       7          0        0         0      32.56      32.94

Completed in 1810.95 [s]. Average time for test=16.48 [s], stdev=29.64 [s]
Summary: failed=12, succeeded=635, passed=128, skipped=193, disabled=0
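As a quick sanity check, the per-suite counts can be added up and compared with the Summary line. The sketch below is only an illustration: the function name `column_totals` is made up here, and only the rows with a non-zero `failed` count are copied from the report.

```python
# Rows with failed > 0, copied from the report above (timing columns dropped).
REPORT_ROWS = """\
Suite            failed  passed  succeeded  skipped  disabled
atompaw               2       0          0        0         0
unitary               2       0         19       13         0
v3                    1      13         64        0         0
v6                    1       8         52        0         0
v67mbpt               1       7         17        0         0
v7                    2      10         52        0         0
v8                    3       8         35        2         0
"""


def column_totals(report: str) -> dict:
    """Sum every numeric column of a whitespace-separated suite table."""
    lines = [ln.split() for ln in report.splitlines() if ln.strip()]
    header, rows = lines[0], lines[1:]
    totals = {name: 0 for name in header[1:]}
    for row in rows:
        # row[0] is the suite name; the remaining fields are integer counts.
        for name, value in zip(header[1:], row[1:]):
            totals[name] += int(value)
    return totals


totals = column_totals(REPORT_ROWS)
print(totals["failed"])
```

The `failed` total of these rows agrees with `failed=12` in the Summary line.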

The 12 failed tests are:

Code: Select all

atompaw_t01-t02
atompaw_t03-t04
unitary_tfftfftw3_01
unitary_tfftfftw3_02
v3_t87-t88-t89-t90-t91
v6_t45
v67mbpt_t37-t38-t39
v7_t43
v7_t85-t86-t87-t88-t89
v8_t41-t42-t43-t44
v8_t45
v8_t61-t62-t63-t64-t65

Since I can't upload the HTML report files, the relevant parts of the stderr are pasted below.

1. atompaw
[atompaw][t02][np=1]: failed: erroneous lines 28 > 25 [file=t02.out]
[atompaw][t04][np=1]: failed: absolute error 0.015 > 0.011 [file=t04.out]

2. unitary
Both tfftfftw3_01 and tfftfftw3_02 tests have the following in the stderr

Code: Select all

forrtl: severe (174): SIGSEGV, segmentation fault occurred
Image PC Routine Line Source
libifcoremt.so.5 000014B4C75E2522 for__signal_handl Unknown Unknown
libpthread-2.28.s 000014B4C9242070 Unknown Unknown Unknown
fftprof 00000000006F1DFA Unknown Unknown Unknown
fftprof 00000000006F0501 fftw_destroy_plan Unknown Unknown
fftprof 000000000047CC1B Unknown Unknown Unknown
fftprof 00000000004354D4 Unknown Unknown Unknown
fftprof 000000000043063E Unknown Unknown Unknown
fftprof 000000000040BF12 Unknown Unknown Unknown
fftprof 0000000000409022 Unknown Unknown Unknown
libc-2.28.so 000014B4C51BE413 __libc_start_main Unknown Unknown
fftprof 0000000000408F2E Unknown Unknown Unknown

3. v3
In t88 and t89, it reports "The diff analysis cannot be pursued: the leading characters differ."

4. v7
t43 failed with absolute error 10.0 > 0.0011
t87 failed with "The diff analysis cannot be pursued: the leading characters differ."

5. v8
t43 and t63 failed with "The diff analysis cannot be pursued: the leading characters differ."
t45 failed with erroneous lines 35 > 33

minyez · Post by **minyez** » Mon Sep 07, 2020 9:59 am

Update:
1. After using atompaw 4.1.0.5 instead of the fallback atompaw, the atompaw errors disappear.
2. I found that the error information for v6_t45 was missing from the post. make check reports the error as

--- !ERROR
src_file: m_energy.F90
src_line: 465
mpi_rank: 0
message: |

BUG: Migdal energy and LDA+U energy do not conincide 0.96416777E+01 0.00000000E+00nlda

gmatteo · Post by **gmatteo** » Tue Sep 08, 2020 10:04 pm

> The packages used in building abinit are netCDF [...] MKL (within Intel 19.3) FFTW3 (3.3.8)

Avoid mixing FFTW3 with MKL: MKL provides wrappers for the fftw3 API that call the Intel DFTI FFT library.
This means that the linker receives multiple definitions of the same procedure, and the
results are unpredictable!
Either use MKL for both linalg and FFT, or use FFTW3 with e.g. OpenBLAS for linalg.

This should explain the segmentation fault in tfftfftw3_01 and tfftfftw3_02.
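A rough way to spot this mix before building is to scan the `-l` options passed to configure. The sketch below is an assumption-laden illustration, not part of the ABINIT build system: the helper names are made up, and matching on the `fftw3` and `mkl_` prefixes is only a heuristic.

```python
import shlex


def linked_libs(flags: str) -> set:
    """Return the library names given via -l in a linker flag string."""
    return {tok[2:] for tok in shlex.split(flags) if tok.startswith("-l")}


def mixes_fftw3_and_mkl(*flag_strings: str) -> bool:
    """True if the combined flags name both FFTW3 and MKL libraries."""
    libs = set()
    for flags in flag_strings:
        libs |= linked_libs(flags)
    # Heuristic (assumption): identify the libraries by name prefix only.
    has_fftw3 = any(name.startswith("fftw3") for name in libs)
    has_mkl = any(name.startswith("mkl_") for name in libs)
    return has_fftw3 and has_mkl


# Abbreviated values of --with-linalg-libs and --with-fft-libs from the post.
linalg_libs = "-L$MKLROOT/lib/intel64 -lmkl_intel_lp64 -lmkl_sequential -lmkl_core"
fft_libs = "-L$FFTW3DIR/lib -lfftw3_mpi -lfftw3"

print(mixes_fftw3_and_mkl(linalg_libs, fft_libs))
```

A `True` result only flags a configuration worth double-checking. It does not prove that symbols actually clash in the final executable.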
The errors in atompaw do not seem to be critical; perhaps they are just numerical noise.

The other tests require some inspection of the output files with e.g. vimdiff to understand if the diffs are physically significant.
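For a first pass before opening vimdiff, the numbers on a reference line and on the corresponding output line can be compared directly. This is a minimal sketch, not the comparison that runtests.py itself performs. The helper names and the two sample lines are invented for illustration, and only the 0.011 tolerance is taken from the atompaw t04 message above.

```python
import re

# Matches plain and exponent-form numbers, including Fortran-style D exponents.
# Caveat: it also picks up digits embedded in labels such as "t04".
_FLOAT = re.compile(r"[-+]?(?:\d+\.\d*|\.\d+|\d+)(?:[eEdD][-+]?\d+)?")


def floats_in(line: str) -> list:
    """Extract all numbers from a line as Python floats."""
    return [float(tok.replace("D", "E").replace("d", "e"))
            for tok in _FLOAT.findall(line)]


def max_abs_diff(ref_line: str, out_line: str) -> float:
    """Largest absolute difference between the numbers on two lines."""
    ref, out = floats_in(ref_line), floats_in(out_line)
    if len(ref) != len(out):
        raise ValueError("lines hold a different number of values")
    return max((abs(a - b) for a, b in zip(ref, out)), default=0.0)


# Hypothetical lines, made up for this example.
ref_line = "etotal  -1.0500E+01"
out_line = "etotal  -1.0515E+01"

diff = max_abs_diff(ref_line, out_line)
print(diff > 0.011)  # tolerance quoted in the atompaw t04 failure
```

Whether a difference of that size matters physically still depends on which quantity the line reports.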

Also note that some tests are expected to fail with intel19.
The test info section has an `exclude_builders` entry that we use to disable a test for particular compilers/machines.

$ grep intel ../tests/v9/Input/*
../tests/v9/Input/t86.in:#%% exclude_builders = .*_intel_19.0_.*, .*_nag_7.0_.*, buda2_intel_17.0_openmpi
../tests/v9/Input/t86.in:#%% Temporarily disabled on higgs_intel_19.0_serial for strange segfault
../tests/v9/Input/t87.in:#%% exclude_builders = .*_intel_19.0_.*, .*_nag_7.0_.*, buda2_intel_17.0_openmpi
../tests/v9/Input/t87.in:#%% Temporarily disabled on higgs_intel_19.0_serial for strange segfault
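To see how such an entry would apply to a given builder, the patterns can be treated as regular expressions and matched against the builder name. This is a sketch under assumptions: the function names are made up, and requiring each pattern to match the whole name (`re.fullmatch`) is a guess at the intended semantics, not a statement about how runtests.py does it.

```python
import re


def parse_exclude_builders(line: str) -> list:
    """Extract the patterns from a '#%% exclude_builders = a, b, c' line."""
    _, _, value = line.partition("=")
    return [p.strip() for p in value.split(",") if p.strip()]


def is_excluded(builder: str, patterns: list) -> bool:
    """True if any pattern matches the builder name."""
    # Assumption: a pattern has to match the entire builder name.
    return any(re.fullmatch(p, builder) for p in patterns)


# The exclude_builders line from t86.in, without the grep file-name prefix.
line = "#%% exclude_builders = .*_intel_19.0_.*, .*_nag_7.0_.*, buda2_intel_17.0_openmpi"
patterns = parse_exclude_builders(line)

print(is_excluded("higgs_intel_19.0_serial", patterns))
```

Under these assumptions, a builder such as `higgs_intel_19.0_serial` falls under the first pattern, so the test would be skipped there.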
