Only 1 CPU is used on a multicore CPU [SOLVED]

Post by anemonekgo » Sun Oct 11, 2015 2:34 am

Hi

I'm a beginner.
I recently installed Abinit 7.10.4 on CentOS 6.7.
My PC has a Core i7 990X_ex. (6 cores/12 threads), 24 GB of RAM, and a 256 GB SSD.
When I watch the system monitor while Abinit is running,
only 1 CPU is in use (at 100%) out of the 12 CPUs displayed.
Which setting is causing this?
The calculation results are good, but the run time is long.
Please tell me what to do.

Best regards,
Haruyuki Satou

======================================================================

Configuration information follows
======================================================================
#build.ac file
------------------------------------------------------------
enable_mpi="yes"
enable_mpi_io="yes"
with_mpi_prefix="/usr/lib64/openmpi"
with_trio_flavor="netcdf+etsf_io"
#with_netcdf_incs="-I/usr/include"
#with_netcdf_libs="-L/usr/lib64 -lnetcdf -lnetcdff"
with_fft_flavor="fftw3"
with_fft_incs="-I/usr/include/"
with_fft_libs="-L/usr/local/lib64/gnu/fftw/3.3.4/lib -lfftw3 -lfftw3f"
with_linalg_flavor="atlas"
with_linalg_libs="-L/usr/lib64/atlas -llapack -lf77blas -lcblas -latlas"
#with_dft_flavor="atompaw+libxc"
#with_dft_flavor="atompaw+libxc+wannier90"
with_dft_flavor="atompaw+bigdft+libxc+wannier90"
enable_gw_dpc="yes"
#enable_maintainer_checks="no"
#enable_test_timeout="yes"
------------------------------------------------------------
$./configure --with-config-file=./build.ac
#no error

==============================================================================
=== Final remarks ===
==============================================================================
Summary of important options:

* C compiler : gnu version 4.9
* Fortran compiler: gnu version 4.9
* architecture : unknown unknown (64 bits)

* debugging : basic
* optimizations : standard

* OpenMP enabled : no (collapse: ignored)
* MPI enabled : yes
* MPI-IO enabled : yes
* GPU enabled : no (flavor: none)

* TRIO flavor = netcdf-fallback+etsf_io-fallback
* TIMER flavor = abinit (libs: ignored)
* LINALG flavor = atlas (libs: user-defined)
* ALGO flavor = none (libs: ignored)
* FFT flavor = fftw3 (libs: user-defined)
* MATH flavor = none (libs: ignored)
* DFT flavor = libxc-fallback+atompaw-fallback+bigdft-fallback+wannier90-fallback

Configuration complete.
You may now type "make" to build ABINIT.
(or, on a SMP machine, "make mj4", or "make multi multi_nprocs=<n>")
==============================================================================
$ make mj4
#no error
==============================================================================
$ make install
#no error

**************************************************************************************************
$cd tests
$./runtests.py -j6 fast
Test suite completed in 9.63 s (average time for test = 2.63 s, stdev = 2.63 s)
failed: 0, succeeded: 10, passed: 1, skipped: 0, disabled: 0
Suite failed passed succeeded skipped disabled run_etime tot_etime
fast 0 1 10 0 0 28.92 30.20

$./runtests.py paral -j6
Test suite completed in 86.06 s (average time for test = 4.48 s, stdev = 12.80 s)
failed: 0, succeeded: 14, passed: 7, skipped: 69, disabled: 0
[paral][t06_MPI1][np=1] has run_etime 65.40 s
[paral][t51_MPI1-t52_MPI1-t53_MPI1] has run_etime 32.28 s
[paral][t59_MPI1][np=1] has run_etime 30.34 s
[paral][t71_MPI1][np=1] has run_etime 72.22 s
[paral][t73_MPI1][np=1] has run_etime 37.03 s
[paral][t91_MPI1][np=1] has run_etime 42.28 s
Suite failed passed succeeded skipped disabled run_etime tot_etime
paral 0 7 14 69 0 403.01 406.55
**************************************************************************************************

Post by Jordan » Mon Oct 12, 2015 7:58 am

You did not compile Abinit with OpenMP support (threads), which means that if you want to use, let's say, 6 cores per calculation, you have to launch Abinit with mpirun, like this:

mpirun -n 6 abinit < files > log


Or, for the test suite, you need to specify -n 6 instead of -j 6. For the test suite, the -j option means launch 6 tests at the same time (a parallel launch of several tests, each one running on 1 CPU), whereas -n X means run each test on X CPUs.
So, for instance, runtests.py -n 6 -j 2 means run 2 tests at the same time with 6 MPI processes for each test. So you would use 6*2=12 cores.
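
As a rough sketch of that arithmetic, here is a small Python helper that multiplies the two options and compares the product with the number of CPUs the machine reports. The function names total_cores and fits are made up for illustration and are not part of runtests.py; the script does not launch any test.

```python
import os


def total_cores(mpi_procs_per_test: int, parallel_tests: int) -> int:
    """Cores occupied when `parallel_tests` tests run at once,
    each on `mpi_procs_per_test` MPI processes (the -n and -j values)."""
    if mpi_procs_per_test < 1 or parallel_tests < 1:
        raise ValueError("both counts must be at least 1")
    return mpi_procs_per_test * parallel_tests


def fits(mpi_procs_per_test: int, parallel_tests: int, available=None) -> bool:
    """True if the requested combination does not exceed the available CPUs.
    `available` defaults to what the operating system reports."""
    if available is None:
        available = os.cpu_count() or 1
    return total_cores(mpi_procs_per_test, parallel_tests) <= available


if __name__ == "__main__":
    n, j = 6, 2  # the -n 6 -j 2 example from this post
    print(f"-n {n} -j {j} uses {total_cores(n, j)} cores; "
          f"fits on this machine: {fits(n, j)}")
```

On a machine that shows 12 CPUs, -n 6 -j 2 fills it exactly, while a larger -j would oversubscribe it.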

Cheers

Jordan

Post by anemonekgo » Tue Oct 13, 2015 6:31 am

Dear Jordan

Thanks for all your help.

I was able to run Abinit on multiple cores by writing enable_openmp="yes" in build.ac.

My earlier attempt with "./configure --enable-openmp" had failed (the summary showed OpenMP enabled : yes, MPI enabled : no, MPI-IO enabled : no).

Then I tried the "mpirun -n 6 abinit < files > log" command with the "--allow-run-as-root" option.

Testing with a BaTiO3 calculation (ecut=15, ngkpt=8x8x8), I was able to shorten the calculation time from 90 min to 5 min.

Your explanation really reassured me, thank you.

Sincerely.
Haruyuki Satou

Post by anemonekgo » Fri Oct 16, 2015 11:18 am

Hi

Thanks for your help the other day.

Since then, a new problem has come up.

During the calculation, Abinit stops with the following error.

#chkpawovlp : ERROR -
PAW SPHERES ARE OVERLAPPING !
#Distance between atoms 3 and 37 is : 1.70352
PAW radius of the sphere around atom 3 is: 2.32398
PAW radius of the sphere around atom 37 is: 0.90083
This leads to a (voluminal) overlap ratio of 90.75 %

--from log file ------
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
--- !ERROR
message: |
Action: 1- decrease cutoff radius of PAW dataset
OR 2- ajust "pawovlp" input variable to allow overlap (risky)
src_file: chkpawovlp.F90
src_line: 183
...

leave_new : decision taken to exit ...
----------------------------------------------------------------------------------------------------------------------------------------------------
MPI_ABORT was invoked on rank 2 in communicator MPI_COMM_WORLD
with errorcode 14.

NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
----------------------------------------------------------------------------------------------------------------------------------------------------

Even with pawovlp=-1, the calculation gives a strange result
(something like nuclear fusion, jokingly speaking).

For example, the calculation runs normally until about n = 7, and then a critical problem occurs.

The error occurred with this command:

$ mpirun --allow-run-as-root -n 6 abinit < $name.files > $name.log

I suspect a problem with Open MPI.

I would appreciate it if you would give me some advice.

Best regards
Haruyuki Satou

Post by Jordan » Mon Oct 19, 2015 8:11 am

Please post your input file. Analyzing it is a good starting point before looking for a computing problem.
In particular, check the distances between your atoms; use any visualization software to make sure the structure is correct.

In PAW, overlapping PAW spheres may lead to a wrong calculation. In your case the overlap is very high, so if it is a consequence of your input file, the structure is probably not physical. Allowing the overlap lets the calculation continue, but nothing guarantees the result.
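
To get a feeling for how large this overlap is, here is a minimal Python sketch that computes the intersection volume of two spheres with the standard lens formula and divides it by the volume of the smaller sphere. Taking the ratio relative to the smaller sphere is an assumption on my side: it matches the numbers in the error message, but it is not a statement of how chkpawovlp.F90 is written. The function names are illustrative only.

```python
import math


def sphere_volume(r: float) -> float:
    """Volume of a sphere of radius r."""
    return 4.0 / 3.0 * math.pi * r ** 3


def lens_volume(d: float, r1: float, r2: float) -> float:
    """Intersection volume of two spheres with radii r1, r2 whose
    centres are a distance d apart (standard sphere-sphere formula)."""
    if d >= r1 + r2:
        return 0.0  # spheres do not touch
    if d <= abs(r1 - r2):
        return sphere_volume(min(r1, r2))  # small sphere fully inside
    return (math.pi * (r1 + r2 - d) ** 2
            * (d * d + 2.0 * d * (r1 + r2) - 3.0 * (r1 - r2) ** 2)
            / (12.0 * d))


def overlap_percent(d: float, r1: float, r2: float) -> float:
    """Overlap as a percentage of the smaller sphere's volume
    (assumed convention, see the text above)."""
    return 100.0 * lens_volume(d, r1, r2) / sphere_volume(min(r1, r2))


if __name__ == "__main__":
    # Values copied from the error message for atoms 3 and 37.
    print(f"{overlap_percent(1.70352, 2.32398, 0.90083):.2f} %")
```

With the distance and radii from the error message, this gives about 90.75 %, that is, the small sphere sits almost entirely inside the large one.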

Post by anemonekgo » Mon Oct 19, 2015 5:16 pm

Dear Jordan

Thank you for your reply.

I have attached the input file as you suggested.

I would be grateful if you could take a look at it.

Please tell me if you need anything else.

Sincerely.
Haruyuki Satou
Attachment: 20151019.in (5.34 KiB)

Post by Jordan » Tue Oct 20, 2015 8:57 am

Your overlap occurs between a lead atom and a hydrogen atom, which is a common problem with hydrogen.

Looking at your input file, I noticed you use ecut=5, which is really small and might give bad results, or crashes if it is too small. I would suggest that, before doing your relaxation, you run the usual convergence studies, especially on ecut. You should find a value between 15 and 25 in PAW, or probably between 35 and 50 in NC (norm-conserving).
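
A convergence study on ecut boils down to running the same calculation at increasing ecut values and watching when the total energy stops changing. Here is a minimal Python sketch of that last step. It does not run Abinit, the tolerance and the stopping rule are only one possible convention, and the energies in PLACEHOLDER_RESULTS are made-up numbers for illustration, not results from any real run.

```python
def converged_ecut(results, tol):
    """Return the smallest ecut whose total energy differs from the
    energy at the next higher ecut by less than `tol`, or None.

    `results` is a list of (ecut, total_energy) pairs sorted by
    increasing ecut, collected by hand from separate runs.
    """
    for (ecut, energy), (_, next_energy) in zip(results, results[1:]):
        if abs(next_energy - energy) < tol:
            return ecut
    return None


# Made-up placeholder values, NOT output of a real calculation.
PLACEHOLDER_RESULTS = [
    (5, -10.0000),
    (10, -10.4000),
    (15, -10.4800),
    (20, -10.4805),
    (25, -10.4806),
]

if __name__ == "__main__":
    print("converged ecut:", converged_ecut(PLACEHOLDER_RESULTS, tol=1e-3))
```

If the function returns None, the scan did not go high enough and more ecut values are needed.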

Post by anemonekgo » Tue Oct 20, 2015 3:29 pm

Dear Jordan

Thank you for the valuable suggestion.
I will set ecut to a typical value and try again.
Thank you again for your kindness.

Sincerely,
Haruyuki Satou
