Follow-up) problems with large natom errors  [SOLVED]

Post by anemonekgo » Mon Mar 28, 2016 2:07 am

Dear all,

Thank you, as always, for your help.
This is a follow-up report on the following topic.
#Subject: About "signal 11 (Segmentation fault)" error when value of natom is making to big.
viewtopic.php?f=2&t=3217
*Because the previous post is long, please refer to the link.

From trial runs on the parallel computer, I found that an error occurs when natom is made large.

The first cause was a LAN traffic error: after I changed the LAN card and switch to 10 Gbps, on the master node only, the calculation ran without a problem.
With this change, calculations were possible up to natom=396.
=================================================================================================================
Run    natom  Result
Ru_1    48    Calculation finished without problems on the 1 Gbps LAN system
Ru_2   156    Calculation finished without problems after changing the LAN card and switch to 10 Gbps on the master node only
Ru_3   252    Same as above
Ru_4   300    Same as above
Ru_5   348    Same as above
Ru_6   396    Same as above
Ru_7   444    Error appears at $filename_o_DS1_TIM6.cif(DEN) and the run stops
Ru_8   492    Error appears at $filename_o_DS1_TIM4.cif(DEN) and the run stops
=================================================================================================================

*I tried changing the value of ecut for Run_7.

ecut=5 ,pawecutdg=20 : error appears at $filename_o_DS1_TIM2.cif(DEN) and the run stops
ecut=10,pawecutdg=20 : error appears at $filename_o_DS1_TIM6.cif(DEN) and the run stops
ecut=15,pawecutdg=30 : error appears at $filename_o_DS1_TIM8.cif(DEN) and the run stops
ecut=20,pawecutdg=40 : calculation gets past $filename_o_DS1_TIM8.cif(DEN), but an error appears at the EIG file writing stage and the run stops
ecut=25,pawecutdg=50 : calculation completes and I get the EIG file, but the computing time is very long

So I have two questions.

#Question_1
Why can the calculation no longer run once natom becomes large? (The limit lies between natom=396 and natom=444.)
*On the other hand, the calculation still runs even when ngkpt is made large.
Could this be caused by a shortage of memory somewhere (RAM, L3, etc.)?
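
One way to gather evidence for or against the memory hypothesis is to print the limits each node actually has. The following is only a minimal sketch, assuming Linux nodes with /proc/meminfo; the function name report_limits is hypothetical, and the script does not by itself explain the failure.

```shell
#!/bin/sh
# Sketch: print the per-process stack limit and the memory the kernel
# reports as available on this node. Assumes Linux (/proc/meminfo).
report_limits() {
    # "ulimit -s" prints the soft stack size limit in KiB, or "unlimited".
    echo "stack_limit=$(ulimit -s)"
    # MemAvailable is given in kB; older kernels may not provide it,
    # in which case "unknown" is printed instead.
    mem=$(awk '/^MemAvailable:/ {print $2}' /proc/meminfo 2>/dev/null)
    echo "mem_available_kb=${mem:-unknown}"
}

report_limits
```

Running it through the same launcher that starts abinit shows the limits the MPI processes inherit, which can differ from those of an interactive login shell.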

#Question_2
Why does the calculation become possible when ecut is increased?
*I assumed that increasing ecut would increase memory use. However, the result is the opposite of what this hypothesis predicts.

Please tell me what is happening in this series of runs.
I want to find the key to solving the problem.

Best regards,
Haruyuki Satou
(anemonekgo)

Post by anemonekgo » Mon Apr 11, 2016 1:54 am

Dear all,

Thank you, as always, for your help.
I solved the problem myself, so I am reporting the result.
However, I do not know which of the changes was the important one.
Therefore, I would appreciate any advice.
I was able to calculate up to natom 492 after making the following two changes.

----------------------------------------------------------------------------------------
1) I upgraded gcc, g++, and gfortran from 4.8 to 4.9 and set optimization flags.

I added the following to the build(ubuntu).ac file used when configuring abinit.

FC="mpif90"
F77="mpif90"
FCFLAGS="-lgfortran -L/usr/lib/gcc/x86_64-linux-gnu/4.9 -O2 -fstack-arrays -m64 -fsignaling-nans -funroll-all-loops -mtune=native -march=native -ftree-vectorize -ffast-math -fno-protect-parens -g -ffree-line-length-none"
CC="mpicc"
CFLAGS="-lgcc -L/usr/lib/gcc/x86_64-linux-gnu/4.9 -O2 -ffast-math -m64 -ftree-vectorize -mtune=native -march=native -funroll-all-loops -g"
CXX="mpicxx"
CXXFLAGS="-lstdc++6 -L/usr/lib/gcc/x86_64-linux-gnu/4.9 -O2 -ffast-math -m64 -ftree-vectorize -mtune=native -march=native -funroll-all-loops -g"


2) I changed the linear algebra library from "ATLAS" to "Netlib+ScaLAPACK".

----------------------------------------------------------------------------------------
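
For readers who want to reproduce change 2), the fragment below is a hedged sketch of how the linear algebra library might be selected in a build .ac file. The option names assume the ABINIT 7.x/8.x build system and should be checked against the build-config.ac template shipped with the source; the library names and the path are placeholders, not the values used in this post.

```shell
# Hypothetical build .ac fragment: Netlib BLAS/LAPACK plus ScaLAPACK.
# Option names are assumed from the ABINIT 7.x/8.x build system.
enable_mpi="yes"
with_linalg_flavor="netlib+scalapack"
# Placeholder library path and names; adjust to the local installation.
with_linalg_libs="-L/usr/lib -lscalapack-openmpi -lblacs-openmpi -llapack -lblas"
```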

If possible, I would like to ask a question.
Which of these changes was the important one?
I would also appreciate any advice for future reference.

Best regards,
Haruyuki Satou
(anemonekgo)
