Getting too many errors with tests

Post by **FritoPaez** » Thu May 29, 2014 9:35 pm

Hi guys...

I have just configured and compiled abinit 7.6.4 with the following .ac config:

enable_64bit_flags="yes"
prefix="${HOME}/software/abinit"
CPP="icc -E"
enable_mpi="yes"
enable_mpi_io="yes"
with_mpi_prefix="/usr/local"
enable_gpu="yes"
with_gpu_flavor="cuda-double"
with_gpu_prefix="/usr/local/cuda-5.5"
with_trio_flavor="netcdf+etsf_io+fox"
with_fft_flavor="fftw3-mkl"
with_fft_libs="-L/opt/intel/composer_xe_2013_sp1.2.144/mkl/lib/intel64 -Wl,--start-group  -lmkl_intel_lp64 -lmkl_sequential -lmkl_core -Wl,--end-group"
with_linalg_flavor="mkl+magma"
with_linalg_libs="-L/usr/local/magma/lib -lmagma -lmagmablas -L/opt/intel/composer_xe_2013_sp1.2.144/mkl/lib/intel64 -Wl,--start-group  -lmkl_intel_lp64 -lmkl_sequential -lmkl_core -Wl,--end-group"
with_dft_flavor="atompaw+bigdft+libxc+wannier90"
enable_gw_dpc="yes"
enable_test_timeout="yes"

The MAGMA libraries were built with this make.inc file:

Code:

GPU_TARGET = Tesla
CC        = icc
NVCC      = nvcc
FORT      = ifort
ARCH      = ar
ARCHFLAGS = cr
RANLIB    = ranlib
OPTS      = -O3 -DADD_ -Wall -openmp -DMAGMA_WITH_MKL -DMAGMA_SETAFFINITY
F77OPTS   = -O3 -DADD_ -warn all
FOPTS     = -O3 -DADD_ -warn all
NVOPTS    = -O3 -DADD_ -Xcompiler -fno-strict-aliasing
LDOPTS    = -openmp
LIB       = -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -lpthread -lcublas -lcudart -lstdc++ -lm
-include make.check-mkl
-include make.check-cuda
LIBDIR    = -L$(MKLROOT)/lib/intel64 \
            -L$(CUDADIR)/lib64
INC       = -I$(CUDADIR)/include -I$(MKLROOT)/include
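
The two build configurations above name different MKL threading layers on their link lines. A minimal Python sketch for spotting this kind of difference is shown here; the helper name `mkl_threading_layers` is hypothetical, and the two strings are copied from `with_linalg_libs` and the make.inc `LIB` line. Whether the difference has anything to do with the test failures in this thread is not established.

```python
# Threading-layer libraries shipped with Intel MKL; a link line normally names one.
MKL_THREADING_LAYERS = {
    "mkl_sequential",
    "mkl_intel_thread",
    "mkl_gnu_thread",
    "mkl_tbb_thread",
}


def mkl_threading_layers(link_line):
    """Return the MKL threading-layer libraries named by -l flags on a link line."""
    libs = {tok[2:] for tok in link_line.split() if tok.startswith("-l")}
    return libs & MKL_THREADING_LAYERS


# Copied from with_linalg_libs in the .ac file.
abinit_linalg = (
    "-L/usr/local/magma/lib -lmagma -lmagmablas "
    "-L/opt/intel/composer_xe_2013_sp1.2.144/mkl/lib/intel64 "
    "-Wl,--start-group  -lmkl_intel_lp64 -lmkl_sequential -lmkl_core -Wl,--end-group"
)
# Copied from the LIB line of the MAGMA make.inc.
magma_lib = (
    "-lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core "
    "-lpthread -lcublas -lcudart -lstdc++ -lm"
)

abinit_layers = mkl_threading_layers(abinit_linalg)
magma_layers = mkl_threading_layers(magma_lib)
if abinit_layers != magma_layers:
    print("threading layers differ:", sorted(abinit_layers), "vs", sorted(magma_layers))
```

The check only compares what is written on the link lines; it does not inspect the libraries that were actually built.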

Also, the OpenMPI environment was compiled using the Intel 14 compilers.

./configure went fine, and so did make... but when running runtests.py I got a lot of errors, which fall mainly into two categories:

1. EXAMPLE:
[tutorespfn][telast_5] fldiff.pl fatal error:
The diff analysis cannot be pursued : the leading characters differ.
File .../abinit-7.6.4/tests/tutorespfn/Refs/telast_5.out, line 1206, 144 ignored, character:=
File .../abinit-7.6.4/build/temp_test/Test_suite/tutorespfn_telast_5/telast_5.out, line 1186, 124 ignored, character:

When I examine the output files by eye, they look correct, and when I compare the values nothing looks dramatically different. Could you advise me on the possible origin of this error? Should it be taken very seriously, or is it just a bug in one of the sed/awk/diff routines?
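
In the message quoted above, 1206 - 144 = 1062 and 1186 - 124 = 1062, so both files are at the same position once ignored lines are discounted, and that is where the first characters differ ("=" in the reference, a blank in the new output). The following Python sketch illustrates that kind of lockstep comparison. It is not the logic of fldiff.pl itself: the function names are hypothetical, and treating lines that start with "-" as ignored is an assumption made for the illustration.

```python
def kept_lines(lines, ignore_prefixes=("-",)):
    """Yield (line_number, ignored_so_far, line) for every line that is not skipped.

    ignore_prefixes is an assumed convention for this sketch, not taken from fldiff.pl.
    """
    ignored = 0
    for number, line in enumerate(lines, start=1):
        if line.startswith(ignore_prefixes):
            ignored += 1
            continue
        yield number, ignored, line


def first_leading_mismatch(ref_lines, out_lines, ignore_prefixes=("-",)):
    """Walk both files in lockstep over kept lines and report the first pair
    whose leading characters differ, or None if no such pair is found."""
    ref_iter = kept_lines(ref_lines, ignore_prefixes)
    out_iter = kept_lines(out_lines, ignore_prefixes)
    for (rn, ri, rl), (on, oi, ol) in zip(ref_iter, out_iter):
        if rl[:1] != ol[:1]:
            return {"ref": (rn, ri, rl[:1]), "out": (on, oi, ol[:1])}
    return None


# Tiny made-up inputs, only to show the shape of the result.
ref = ["-timing 1", " a", "-x", "= header"]
out = [" a", "  value"]
print(first_leading_mismatch(ref, out))
```

Each tuple in the result holds the line number, the count of ignored lines so far, and the leading character, mirroring the three values printed in the error message.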

2. EXAMPLE:
[tutorial][tbase2_5] failed: absolute error 9.132 > 3.681e-09

This one speaks for itself... any good advice?

Best regards, boys... and keep up the good work!
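
For scale, 9.132 is roughly nine orders of magnitude above the 3.681e-09 tolerance in the message, so this is not rounding noise. A minimal Python sketch of an absolute-error check between a reference line and a new output line follows. It is an illustration only, not the test suite's actual comparison code; the helper names and the sample lines are made up, and only the tolerance value comes from the quoted message.

```python
import re

# Matches plain and exponent-form numbers, including Fortran-style D exponents.
FLOAT_RE = re.compile(r"[-+]?(?:\d+\.\d*|\.\d+|\d+)(?:[eEdD][-+]?\d+)?")


def floats_in(line):
    """Extract all numbers from a line as Python floats."""
    return [
        float(tok.replace("D", "E").replace("d", "e"))
        for tok in FLOAT_RE.findall(line)
    ]


def max_abs_error(ref_line, out_line):
    """Largest absolute difference between numbers at matching positions."""
    ref_vals = floats_in(ref_line)
    out_vals = floats_in(out_line)
    if len(ref_vals) != len(out_vals):
        raise ValueError("lines hold a different number of values")
    return max((abs(r - o) for r, o in zip(ref_vals, out_vals)), default=0.0)


TOLERANCE = 3.681e-09  # value quoted in the failure message

# Made-up sample lines: the first pair is within tolerance, the second is not.
small = max_abs_error("etotal   -1.1058360644E+00", "etotal   -1.1058360650E+00")
large = max_abs_error("value 10.0", "value 0.868")
print(small <= TOLERANCE, large <= TOLERANCE)
```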

Post by **jbeuken** » Fri May 30, 2014 8:22 pm

Hello,

We have a test farm, and one of the bots tests the cuda functionality ( this bot has 4 x Tesla C1060 ).
We use gcc46 + mkl + magma ( > 1.1.0 ) + cuda 4.2

and compile abinit with ( among others, as you can find in config/spec/testfarm.conf ) :

Code:

NVCC_CFLAGS="-O3 -arch=sm_13 -Xptxas=-v --use_fast_math --compiler-options -O3,-fPIC"

Only 4 tests are run with cuda to validate the "gcc/cuda" part of abinit ( ./runtests.py gpu ):

Code:

==========================================================================
          Serie   #failed   #passed  #succes  #skip  |   #CPU      #WALL
==========================================================================
            gpu |     0   |    2   |    2   |    0   |   264.8  |   265.4
==========================================================================

None of the other tests are validated with cuda, although the results may be correct…

The gpu is not yet fully officially supported ... but we're working on it …

my 50¢

jmb

ABINIT Discussion Forums
