I have run into a problem with parallelization in ABINIT-8.4.2 after installing it on a new system. I could not find a similar topic, so I would like to ask for help here.
I am a beginner with ABINIT and first-principles packages, and I have been enjoying simple calculations with ABINIT on my desktop PC. It works well there, including in parallel. To start a relatively large calculation, I am moving to a larger computing system (16 cores with 256 GB of memory). For the installation, I followed exactly the same procedure as on my desktop PC. A brief description of the setup is as follows:
CPU: Intel Xeon E5-2687W Sandy Bridge Octa Core 3.1GHz, L3 = 20MB 150W x 2
Compiler: ifort 13.0.1 with Intel MKL
MPI: OpenMPI-1.4.5 combined with torque-2.3.7
../configure --enable-mpi --enable-openmp --enable-64bit-flags FC=mpif90 CC=mpicc CXX=mpicxx LDFLAGS="-L/opt/intel/composer_xe_2013_1.117/mkl/lib/intel64" LIBS="-lmkl_blas95_lp64 -lmkl_lapack95_lp64 -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -liomp5 -lpthread -lm" --prefix=$HOME/abinit-8.4.2
(I attach the config.log file for more details.)
There were no errors from make, make check, or make install. I also confirmed that simple ground-state and band calculations give results consistent with those from my desktop PC. The calculation time is also of the same order.
However, when I tried to run parallel calculations, the calculation time became much longer. For example, a ground-state calculation on a bismuth crystal (the system I am interested in) finishes in 38 sec with "abinit<input.files>&log", but takes more than 180 sec with "mpirun -np 8 abinit<input.files>&log". I believe this is not due to the input file, because exactly the same input file showed a clear speed-up from parallelization on my desktop PC (e.g. 38 sec -> 11 sec with "mpirun -np 8").
I have already confirmed that OpenMPI itself works well, using a simple test program that performs Gram-Schmidt orthonormalization. I do not show the details here, but the calculation time improves as the number of processes increases.
I am not sure whether this is related, but I noticed strange behavior when I checked the CPU usage with the "top" command. Even in a sequential run, the CPU usage was around 1,600%. This may correspond to 16 * 100%, where 16 is the maximum number of processes on this system. When I run "mpirun -np 8", eight processes at about 200% each show up. I have never seen this behavior on my desktop PC.
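A check I can think of is sketched below. It rests on an assumption I have not verified: that the extra CPU usage comes from OpenMP/MKL threads, since the build above uses --enable-openmp and links -lmkl_intel_thread and -liomp5. The script only converts the "top" readings into thread counts and prints the command I would try with one thread per MPI process; it does not run ABINIT.

```shell
# Thread counts implied by the "top" readings quoted above (100% = one busy core).
SEQ_THREADS=$(( 1600 / 100 ))        # sequential run: about 16 threads
MPI_THREADS_EACH=$(( 200 / 100 ))    # each of the 8 MPI processes: about 2 threads
MPI_THREADS_TOTAL=$(( 8 * MPI_THREADS_EACH ))
echo "sequential: $SEQ_THREADS threads, mpirun -np 8: $MPI_THREADS_TOTAL threads"

# Assumption: the OpenMP runtime and threaded MKL honour these standard variables.
export OMP_NUM_THREADS=1
export MKL_NUM_THREADS=1

# Command to try next (printed here, not executed); -x is Open MPI's option
# for exporting an environment variable to the launched processes.
echo "mpirun -np 8 -x OMP_NUM_THREADS -x MKL_NUM_THREADS abinit < input.files >& log"
```

If the eight processes then show about 100% each in "top" and the run time changes, that would point to threading rather than MPI itself.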
Has anyone run into this kind of problem? I am sorry for the long post, but I would appreciate any advice.