Hardware question  [SOLVED]

amir
Posts: 19
Joined: Fri Oct 24, 2014 8:44 pm

Hardware question

Post by amir » Fri Jun 19, 2015 8:55 am

Hi.

I am not sure if this is the right place to ask this question, but I couldn't find a more relevant section.

I'm running my jobs on the university's HPC cluster, but the queues are long. I will be working with systems containing 100-200 atoms. My question is whether it makes sense to buy a workstation with 6, 10, 12 or more cores. If so, what other specifications should I take into account? I'd appreciate your suggestions.

Thanks,
Amir

Jordan
Posts: 282
Joined: Tue May 07, 2013 9:47 am

Re: Hardware question  [SOLVED]

Post by Jordan » Fri Jun 19, 2015 10:06 am

This is my personal advice.

For a system of 100-200 atoms with, let's say, ~10 valence electrons per atom, you have between 1000 and 2000 bands in your problem (times 2 if the system is spin-polarized).
For such a system you will also need between one and a few dozen k-points.
Let's assume you want to parallelize that: you would first parallelize over k-points (say 10 CPUs), then over bands (at least 10 CPUs again, which is still ~100 bands per CPU and already a lot!), and maybe 2 or 4 CPUs for the FFT.
That already puts you at 10*10*2 = 200 CPUs.
Of course, 1000 CPUs would make the calculation even faster.
Depending on the application, 100 CPUs can be enough for a ground-state calculation, a DOS, etc. But if you go to DFPT, MD or other heavier workloads, 1000 CPUs is closer to what you would wish to have.
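
To be concrete, the distribution above would look roughly like this in the ABINIT input, using the k-point/band/FFT (paral_kgb) parallelization. This is only a sketch: the values are illustrative and have to match your actual nkpt and nband, and npkpt*npband*npfft must equal the number of MPI processes you launch.

paral_kgb 1    # enable the k-point/band/FFT MPI parallelization
npkpt 10       # 10 MPI processes over k-points
npband 10      # 10 MPI processes over bands
npfft 2        # 2 MPI processes over the FFT
# total: 10*10*2 = 200 MPI processes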

So on a workstation with 12 cores, or even 24 or 48, you will spend a long time on a single calculation.

Another consideration is memory. With 1000-2000 bands and a few k-points, you'll need a lot of it. I would not bet that such a calculation fits in the memory of a 12-core machine (usually 32 GB, maybe 64 GB).
The MPI parallelization distributes the memory over all the cores, so each core needs less of it. With 10 nodes of 12 cores and 32 GB each, you have 320 GB for the calculation, i.e. ~2.6 GB per core.

I just tried a supercell with 720 bands and 26 k-points: it does not fit on my 12-core, 32 GB machine, and of course every iteration is very long.
Even on 1560 CPUs it already takes several minutes per SCF step...

In your place, I would rather spend time waiting in the queue and have the CPUs and memory that fit your needs. But make sure to tune your parallelization before the "real" calculation: check that the input is correct, that the number of CPUs is right, and so on.
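
A practical way to do that tuning, if I remember the input variables correctly: run a short trial job with autoparal and max_ncpus set, and ABINIT will print a table of candidate npkpt/npband/npfft distributions and stop, so you can pick one before submitting the real calculation. Again only a sketch; the value of max_ncpus is just an example.

autoparal 1     # ask ABINIT to propose processor distributions
max_ncpus 200   # largest number of CPUs to consider
# ABINIT lists the candidate distributions in the log and stops;
# copy the chosen npkpt/npband/npfft values into the production input.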

Cheers

Jordan

amir
Posts: 19
Joined: Fri Oct 24, 2014 8:44 pm

Re: Hardware question

Post by amir » Mon Jun 22, 2015 7:10 pm

Thanks, Jordan. I was looking for an answer like this to give me some perspective. If we can get funding, we will go for nodes. Do you have any recommendations or advice on that?

Thanks for your time again :)

Cheers,
Amir

Jordan
Posts: 282
Joined: Tue May 07, 2013 9:47 am

Re: Hardware question

Post by Jordan » Tue Jun 23, 2015 9:11 am

I have never bought nodes myself, but I usually prefer nodes with at least 3 GB of memory per core (for large calculations) and CPUs with a large L2 cache.
For some calculations that need a lot of memory, a large L2 cache can speed things up even at a lower clock frequency, compared to a high-frequency CPU with a small L2.
I would recommend running a typical calculation of yours on several configurations and seeing what matters most.
Also, for MPI parallelization, don't neglect the interconnect between the nodes (a fast InfiniBand network and so on).

Jordan

amir
Posts: 19
Joined: Fri Oct 24, 2014 8:44 pm

Re: Hardware question

Post by amir » Tue Jun 23, 2015 8:59 pm

Thank you Jordan. I appreciate it.
