Failed to perform DATASET 2  [SOLVED]

gh.phys
Posts: 6
Joined: Fri Oct 02, 2015 12:30 pm

Failed to perform DATASET 2

Post by gh.phys » Mon Feb 15, 2016 10:03 am

I am running a band structure calculation on a cluster (64 cores, paral_kgb=1). After the first DATASET finishes, DATASET 2 does not start because of the following error:

----iterations are completed or convergence reached----

outwf: write wavefunction to file output_DS1_WFK, with accesswff 1
File locking failed in ADIOI_Set_lock(fd 16,cmd F_SETLKW/7,type F_WRLCK/1,whence 0) with return value FFFFFFFF and errno 26.
- If the file system is NFS, you need to use NFS version 3, ensure that the lockd daemon is running on all the machines, and mount the directory with the 'noac' option (no attribute caching).
- If the file system is LUSTRE, ensure that the directory is mounted with the 'flock' option.
ADIOI_Set_lock:: Function not implemented
ADIOI_Set_lock:offset 1551170, length 524288
--------------------------------------------------------------------------
MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD
with errorcode 1.


I have compiled abinit with the following configuration:

FCFLAGS_EXTRA="-O2 -mtune=native -march=native -mfpmath=sse -ffree-line-length-$
enable_mpi="yes"
enable_mpi_io="yes"
with_mpi_prefix="/common/sw/alarik/pkg/openmpi/1.6.4/gcc/4.6.2"
with_trio_flavor="netcdf+etsf_io"
with_fft_flavor="fftw3"
with_fft_incs="-I/common/sw/alarik/pkg/intel/14.0/composer_xe_2013_sp1.2.144/mk$
with_fft_libs="-L/common/sw/alarik/pkg/intel/14.0/composer_xe_2013_sp1.2.144/mk$
with_linalg_flavor="mkl"
with_linalg_incs="-I/common/sw/alarik/pkg/intel/14.0/composer_xe_2013_sp1.2.144$
with_linalg_libs="-L/common/sw/alarik/pkg/intel/14.0/composer_xe_2013_sp1.2.144$
with_dft_flavor="atompaw+libxc+wannier90"
enable_clib="yes"
enable_gw_dpc="yes"
enable_memory_profiling="no"
enable_openmp="no"
enable_maintainer_checks="no"


I appreciate your help.

gh.phys
Posts: 6
Joined: Fri Oct 02, 2015 12:30 pm

Re: Failed to perform DATASET 2

Post by gh.phys » Mon Feb 22, 2016 10:02 am

Does anyone have any input on the above error? I do appreciate any help.

By the way, I also installed Abinit 10.7.5 with the gcc 4.9 compiler, and the same error happened after the first dataset finished.

mverstra
Posts: 655
Joined: Wed Aug 19, 2009 12:01 pm

Re: Failed to perform DATASET 2

Post by mverstra » Mon Feb 22, 2016 4:05 pm

Hi Masoomeh,

Your run is not able to write the output WFK file at the end of dataset 1. This looks like a network disk issue rather than an abinit one. Perhaps you can post your job submission file and "files" file, but normally only your local sysadmin can fix this kind of thing for you (probably by telling you what is wrong in the way paths and directories are set up for execution, and in your files file).
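
The MPI-IO message itself points at the NFS 'noac' and LUSTRE 'flock' mount options, so it can help to look at how the output directory is mounted before contacting the sysadmin. Below is a minimal sketch, assuming a Linux node where /proc/mounts is readable; the function name find_mount is only illustrative and is not part of abinit or Open MPI.

```python
import os

def find_mount(path, mounts_text):
    """Return (mount_point, fs_type, options) for the mount that holds `path`.

    `mounts_text` is the content of /proc/mounts (assumption: Linux format,
    whitespace-separated fields: device, mount point, fs type, options, ...).
    Mount points containing spaces are escaped there and are not handled here.
    """
    path = os.path.abspath(path)
    best = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue
        mount_point, fs_type, options = fields[1], fields[2], fields[3].split(",")
        # A mount point matches if it is the path itself or one of its parents.
        prefix = mount_point.rstrip("/") + "/"
        if (path + "/").startswith(prefix):
            # Keep the longest (most specific) matching mount point.
            if best is None or len(mount_point) > len(best[0]):
                best = (mount_point, fs_type, options)
    return best
```

On a compute node this could be called as find_mount("/path/to/run/dir", open("/proc/mounts").read()), where the path is a placeholder for the directory holding output_DS1_WFK. If the reported type is nfs or lustre and the options lack 'noac' or 'flock' respectively, that matches the hint printed in the error message.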

It could also be an issue with parallel IO over your network. Could you try a test run in sequential (1 proc), perhaps with very low nstep and ecut, to see if it can save the file? Are all of the files on a local scratch on the node, and are you running on a single node? NB: if you run on several nodes and the WFK files are stored on local disks, only the mother node has access to the WFK needed to start DS2, so the run will fail (this is a classic error). If you copy the files back to your original submission directory and relaunch DS2, it will work, because the WFK is then available to all participating nodes.
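
To check the file system independently of abinit, one can ask for the same kind of lock that the failing ADIOI_Set_lock call requests. The following is a minimal sketch, assuming a POSIX system with Python available on the node; can_lock is a hypothetical helper name. As far as I know, Python's fcntl.lockf with LOCK_EX issues a blocking fcntl write lock (F_SETLKW with F_WRLCK), which is the request shown in the log.

```python
import errno
import fcntl
import os
import tempfile

def can_lock(directory):
    """Try a blocking exclusive POSIX lock on a scratch file in `directory`.

    Returns (True, None) if the lock is granted, or (False, error_name)
    if the file system rejects the request (for example 'ENOSYS' or 'ENOLCK').
    """
    fd, name = tempfile.mkstemp(prefix="locktest_", dir=directory)
    try:
        try:
            fcntl.lockf(fd, fcntl.LOCK_EX)  # blocking write lock on the whole file
            fcntl.lockf(fd, fcntl.LOCK_UN)  # release it again
            return True, None
        except OSError as exc:
            return False, errno.errorcode.get(exc.errno, str(exc.errno))
    finally:
        # Always clean up the scratch file, whether or not locking worked.
        os.close(fd)
        os.remove(name)
```

Running can_lock on the directory where the WFK file is written, on each node of the job, should show whether locking is refused there while it works on a local disk such as /tmp. A (False, ...) result would support the idea that the problem lies in the file system setup and not in abinit.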

cheers

Matthieu
Matthieu Verstraete
University of Liege, Belgium

gh.phys
Posts: 6
Joined: Fri Oct 02, 2015 12:30 pm

Re: Failed to perform DATASET 2  [SOLVED]

Post by gh.phys » Wed Feb 24, 2016 10:11 am

Hi Matthieu,

Thanks for your reply. You are right. I did a test and the job works fine on one node. At least, now I know that it is not abinit-related. I have contacted our grid support to see how I can run on more nodes.
