Posted: Thu Apr 17, 2014 8:55 pm
Having run into some issues with fldiff's behaviour, I am planning to rewrite this script in Python, with more flexibility and possibly new features.
That's why I'm coming to you.
First, I want to know if it is OK with everyone if I rewrite this script?
Second, I want to ask what modifications you would like to see included in the new script?
I'll let you think it over and post your ideas/comments for a couple of weeks. After that, I'll start the work.
Posted: Tue Apr 22, 2014 8:54 am
You're not the first who wants to rewrite fldiff and this is a topic that comes back periodically. However, Matteo is already on it and is taking care of preserving its integration within the test suite as well as its proper working on multicore architectures. If you wish to help, the best is probably that you contact him directly and get updated wrt his plans.
Just be aware that any modification to fldiff has to be considered with a lot of care and coordination with the other developers, since this is the most critical instrument to measure the correctness of the tests. It cannot be done in a "just send me some specs and I'll be coding" way.
Posted: Tue Apr 22, 2014 4:03 pm
I'm already in touch with Matteo.
I know this is a critical part of the test suite and that many people wanted to rewrite it in python but nobody did it so far (except Matteo).
As I am experiencing a misbehavior of the current Perl script, I decided to ask the developers what their thoughts are on writing a new Python script. That's the main purpose of this topic.
I think that if we can get something written in Python with the same old behavior, plus new features where needed, most of the developers will be glad, won't they?
Posted: Wed Apr 23, 2014 1:50 am
I already have a preliminary version of fldiff written in Python, indeed. I'm still
working on it, but I will commit the Python script to my bzr branch ASAP so that
you can look at it and send me comments/suggestions.
I wrote the new script with the following goals in mind:
- The comparison should never fail. If the two files have the same number of lines, we can use the standard line-based algorithm.
If the files have a different number of lines (e.g. a different number of SCF cycles), I use the algorithm implemented in `diff` to match the lines and then proceed with the comparison.
runtests.py will mark the test with the *warning* tag if no significant difference is found in the lines that could be matched *and* the calculation completed
without runtime errors.
- The script will generate a standard txt file as well as an HTML file with hyperlinks so that we can easily analyze the differences.
- The logic used to tokenize the output file is much simpler than the one implemented in the old version. I will explain the new rules
when I'm done with the new version (the explanation will be more lengthy than the Python code, which is actually 3-4 lines).
- One should always take into account that we are not comparing real numbers but their string representation.
Hence we should try to be more flexible when comparing strings representing numbers.
For example: "1.1" and "1.2" should be treated as equivalent **string** representations of a floating-point number.
If you have access to the binary representation of the numbers, you may find that 1.1499999 and 1.1500001 agree within 2e-7,
but their string representations can differ by an amount that is given by the number of significant digits used to print the string!
There's nothing wrong with this approach, because I think that the important results of a calculation should always be written with
enough significant digits, and preferably in scientific format.
- I want to have code that can be easily ported to Python 3; hence the new script will require Python 2.7.
I'm already planning to migrate the test suite to Python >= 2.7 in Abinit8 (I will open a new thread to discuss this topic).
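To make the two main ideas in the list above more concrete, here is a minimal sketch (not the actual fldiff code, which I haven't committed yet): it masks the numeric tokens, uses Python's `difflib` to match lines between files of different lengths, and then compares the numbers themselves with a tolerance rather than as raw strings. All function names and tolerance values below are illustrative assumptions, not taken from the real script.

```python
# Sketch of diff-based line matching + tolerant numeric comparison.
# Names and tolerances are illustrative only.
import difflib
import re

# Matches integers, decimals and scientific notation, including
# Fortran-style 'd' exponents (e.g. 1.234d-05).
NUMBER_RE = re.compile(r"[-+]?\d*\.?\d+(?:[eEdD][-+]?\d+)?")

def numbers_agree(s1, s2, rtol=1e-8, atol=1e-10):
    """Compare two numeric tokens as floats, not as strings."""
    x1 = float(s1.lower().replace("d", "e"))
    x2 = float(s2.lower().replace("d", "e"))
    return abs(x1 - x2) <= atol + rtol * max(abs(x1), abs(x2))

def compare_lines(line1, line2, **tols):
    """True if the non-numeric text is identical and every numeric
    token agrees within the tolerance."""
    if NUMBER_RE.sub("#", line1) != NUMBER_RE.sub("#", line2):
        return False
    toks1, toks2 = NUMBER_RE.findall(line1), NUMBER_RE.findall(line2)
    return all(numbers_agree(t1, t2, **tols) for t1, t2 in zip(toks1, toks2))

def fldiff_sketch(lines1, lines2):
    """Match lines with difflib (robust to a different number of lines,
    e.g. extra SCF cycles) and yield (line1, line2, ok) for matched pairs."""
    masked1 = [NUMBER_RE.sub("#", l) for l in lines1]
    masked2 = [NUMBER_RE.sub("#", l) for l in lines2]
    sm = difflib.SequenceMatcher(None, masked1, masked2)
    for tag, i1, i2, j1, j2 in sm.get_opcodes():
        if tag in ("equal", "replace") and (i2 - i1) == (j2 - j1):
            for l1, l2 in zip(lines1[i1:i2], lines2[j1:j2]):
                yield l1, l2, compare_lines(l1, l2)
```

Unmatched lines (the extra SCF cycles) are simply skipped here; in line with the first goal above, a real implementation would report them as a warning instead of failing outright.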
As Yann says, one has to make sure that the new version is robust enough to replace the old script. I have to say that this is the most boring part
since the new code is not just a translation of fldiff.pl in python but it implements a completely different algorithm.
My goal is not to have the same algorithm implemented in python but to have something that is more flexible and more integrated with the rest of the test suite.
For the first implementation, I will try to be as close as possible to fldiff.pl so that we can compare the results produced by the two versions.
My goal is to have the new pythonic version of fldiff officially integrated in the test farm when we open the branches for Abinit8.
I encourage the other developers to participate in this discussion so that we can take into account their suggestions during
the development of the new code.
Posted: Tue Feb 12, 2019 9:41 pm
This is an old thread, but I am wondering if anyone ever developed a better version of fldiff?
I am getting lots of fldiff.pl error reports when I run the tests. But when I compare the output to the REF output, the results appear to be OK and are only failing because they have a different number of lines for some reason (e.g., different number of SCF cycles, etc).
Posted: Mon Mar 04, 2019 12:27 am
I discovered that the reason I was getting a different number of SCF cycles than the reference runs, and thus causing fldiff.pl to fail, is that I was using the Intel compiler tools (Intel Parallel Studio XE 19.0). They default to more aggressive optimization than the GNU tools, which compromises accuracy. A nice description of this is given at https://software.intel.com/en-us/articles/consistency-of-floating-point-results-using-the-intel-compiler
If I compile with FCFLAGS_OPTIM="-O1 -fp-model precise", I don't see any fldiff.pl errors. So, in this case, the fact that fldiff.pl flagged output with differing numbers of lines helped me to learn more about compiling with Intel.
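For reference, with the usual autotools-based build this flag can be passed at configure time; the exact invocation below is illustrative (compiler names and paths will depend on your setup), but FCFLAGS_OPTIM and the flags are the ones mentioned above:

```shell
# Illustrative configure invocation for an Intel toolchain:
# -O1 and -fp-model precise trade some speed for reproducible
# floating-point results, so SCF iteration counts match the references.
./configure FC=ifort CC=icc \
    FCFLAGS_OPTIM="-O1 -fp-model precise"
```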
It would be nice if the abinit documentation included a warning about this for new users.