Same time to run with more processors  [SOLVED]

Fionn
Posts: 3
Joined: Mon Aug 10, 2015 2:07 pm

Post by Fionn » Mon Aug 10, 2015 4:46 pm

I'm trying to run Abinit 7.10.4 in parallel with MPICH2 on Ubuntu 14.04, using tmoldyn_01 from tutoparal as a test. I have fifteen computers with two CPUs each, but I'm setting it up on two computers first. The run takes almost exactly the same wall-clock time (approx. 2 mins for ntime=5, measured with an external timer) whether I use 2 processors (one on each computer) or 4, which suggests the extra processors make no difference to the computation. To make sure that both computers were being used, I unplugged one from the Ethernet switch, and the program stopped running, which implies it is using both.

However, when I run it with only one processor, it only takes 7s. I understand that it takes time to send data between computers, but I'm not sure why there is such a large discrepancy.

I have attached the log file for each test. For the tests with np=1, 2 and 4, I used npband=1, 2, 2 and npfft=1, 1, 2 respectively. I've looked through all of the tutorials and can't find anyone else reporting this problem. I hope it's just something obvious I've missed.

Essentially, why is there no difference in time if np=2 or 4, and why does it take so much less time with only one computer? I'm afraid I'm quite new to both Abinit and Linux. Thank you for any assistance!
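For reference, the process counts quoted above can be checked against the process-grid variables. This is a minimal sketch, assuming the run uses paral_kgb=1 (where the product npkpt × npband × npfft × npspinor is expected to equal the number of MPI processes) and that npkpt and npspinor are left at 1; neither assumption is stated in the post. The function names `grid_size` and `check_grid` are hypothetical helpers, not part of Abinit.

```python
from math import prod


def grid_size(npkpt=1, npband=1, npfft=1, npspinor=1):
    """Number of MPI processes implied by the process-grid variables."""
    return prod((npkpt, npband, npfft, npspinor))


def check_grid(nproc, **grid):
    """Return True when the grid variables multiply out to nproc."""
    return grid_size(**grid) == nproc


# Values quoted in the post: np = 1, 2, 4 with npband = 1, 2, 2 and npfft = 1, 1, 2.
# npkpt and npspinor are ASSUMED to be 1 here.
runs = [(1, 1, 1), (2, 2, 1), (4, 2, 2)]
for nproc, npband, npfft in runs:
    ok = check_grid(nproc, npband=npband, npfft=npfft)
    print(f"np={nproc}: npband*npfft={npband * npfft} consistent={ok}")
```

Under those assumptions all three runs are consistent, so a mismatched process grid is unlikely to be what slows the multi-computer runs down.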
Attachments
1cpu.log (163.19 KiB)
2cpu.log (164.47 KiB)
4cpu.log (163.5 KiB)

Fionn
Posts: 3
Joined: Mon Aug 10, 2015 2:07 pm

Re: Same time to run with more processors

Post by Fionn » Tue Aug 11, 2015 11:16 am

An update: It occurred to me that I'd only tried increasing the number of processors while keeping the number of computers constant, and not increasing the number of processors by adding computers. So I added another computer, and the run time went up from 2 to 3 mins. I'm wondering if I could be setting the input file up incorrectly? Or if this could be an MPICH problem, rather than Abinit? If it is an issue with MPICH, sorry for posting here (though some pointers would be appreciated anyway).

jbeuken
Posts: 365
Joined: Tue Aug 18, 2009 9:24 pm

Re: Same time to run with more processors

Post by jbeuken » Tue Aug 11, 2015 9:37 pm

Hi,

First of all, a 7 s run on 1 CPU is too short to use this test for a scalability evaluation…
Fixed overheads such as I/O (reading the input file, writing results,…) take up too large a share of the run time.

Try to find a test that runs for around 10 mins…

Concerning MPICH in SMP (one computer with several cores), check how it was configured:
you need at least: --with-pm=hydra:forker --with-device=ch3:nemesis

regards
------
Jean-Michel Beuken
Computer Scientist
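To put numbers on the scalability point, the timings reported earlier in the thread (about 7 s on one process, about 2 mins on two and on four) can be turned into speedup and parallel efficiency. This is only an illustrative sketch: the 120 s values are the poster's approximate stopwatch figures, and `speedup` and `efficiency` are hypothetical helper names.

```python
def speedup(t_serial, t_parallel):
    """Classical speedup: serial wall time divided by parallel wall time."""
    return t_serial / t_parallel


def efficiency(t_serial, t_parallel, nproc):
    """Parallel efficiency: speedup divided by the number of processes."""
    return speedup(t_serial, t_parallel) / nproc


# Approximate wall times in seconds, taken from the posts above.
timings = {1: 7.0, 2: 120.0, 4: 120.0}
t1 = timings[1]
for nproc, t in sorted(timings.items()):
    s = speedup(t1, t)
    e = efficiency(t1, t, nproc)
    print(f"np={nproc}: speedup={s:.3f} efficiency={e:.3f}")
```

A speedup well below 1, as here, means the parallel runs are roughly 17 times slower than the serial one, which points to overhead outside the computation itself rather than to poor scaling of the algorithm.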

Fionn
Posts: 3
Joined: Mon Aug 10, 2015 2:07 pm

Re: Same time to run with more processors

Post by Fionn » Mon Aug 17, 2015 5:27 pm

Thank you, this seemed to work with multiple cores in one computer.
I'm aware that 7s is not long enough, but I was just curious as to why this took so much longer with more computers, and I still have the same problem with longer times.
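One way to compare longer runs systematically is a small timing harness that launches the same job with different process counts. The sketch below is an assumption-laden illustration, not a recommended procedure: it assumes the launcher is `mpiexec` with the standard `-n` option, that Abinit reads its files file from standard input, and that the files file is called `tmoldyn_01.files` (a hypothetical name). The helpers `mpi_command` and `time_command` are invented for this example.

```python
import subprocess
import sys
import time


def mpi_command(nproc, executable="abinit", launcher="mpiexec"):
    """Build the launcher command line for nproc MPI processes."""
    return [launcher, "-n", str(nproc), executable]


def time_command(cmd, stdin_path=None):
    """Run cmd, optionally feeding a file on stdin, and return wall time in seconds."""
    stdin = open(stdin_path, "rb") if stdin_path else None
    try:
        start = time.perf_counter()
        subprocess.run(cmd, stdin=stdin, stdout=subprocess.DEVNULL, check=True)
        return time.perf_counter() - start
    finally:
        if stdin is not None:
            stdin.close()


if __name__ == "__main__":
    # Harmless demonstration: time a no-op Python process instead of Abinit.
    elapsed = time_command([sys.executable, "-c", "pass"])
    print(f"no-op command took {elapsed:.3f} s")
    # For a real comparison one would loop over process counts, e.g.
    #   time_command(mpi_command(n), stdin_path="tmoldyn_01.files")  # hypothetical file name
```

Timing the same input on one computer with two processes and on two computers with one process each would separate the cost of the network from the cost of adding processes.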
