Numdiff

by Ivano Primi <ivprimi (at) libero (dot) it>
Last Update: 2017-02-25

News

About

Numdiff (which I will also write numdiff) is a little program that can be used to compare putatively similar files line by line and field by field, ignoring small numeric differences and/or different numeric formats. In other words, Numdiff can sensibly compare files that contain numerical fields, possibly mixed with non-numerical ones.

Whenever you compare two such files, what you usually want to obtain is a list of the numerical fields in the second file which numerically differ from the corresponding fields in the first file. Well-known tools like diff, cmp or wdiff cannot be used for this purpose: they cannot recognize whether a difference between two numerical fields is only due to the notation or is an actual difference of numerical values. In addition, sometimes you might also want to ignore differences in numerical values as long as they do not exceed a certain threshold. Programs like diff and wdiff cannot do this either, since they do not even know what a numerical difference is. These are the reasons why I decided to implement Numdiff.

In writing this program I was inspired by ndiff, a GPL'ed program by Nelson H. F. Beebe of the University of Utah (Salt Lake City), see

http://www.math.utah.edu/~beebe/software/ndiff

ndiff is a good tool and I used it for a while. But I did not completely like the way it works, and so numdiff was conceived. Although ndiff inspired numdiff, the two are completely different from the viewpoint of the source code: numdiff has been written entirely from scratch, with the addition of source code from GNU bc, GNU diff and GNUlib.

When comparing files, Numdiff assumes by default that the fields are separated by white-space characters (spaces, horizontal tabulations and newlines), but users can also specify their own list of separators through the option -s; see the User Manual.

Numdiff has many features that ndiff lacks: for instance, it recognizes complex numbers and allows you to specify different sets of field delimiters for the two files to compare. In addition, starting from version 5, Numdiff includes a filter which keeps it from getting confused if one file contains one or more lines for which there exist no corresponding lines in the other file. This feature is also missing from ndiff.

I know that many people could find Numdiff simply useless. But people working in Scientific Computing or in Numerical Analysis could find it useful for their work. Often they need to compare a file containing the output produced by a given numerical program, when running in a certain environment, with another file containing the output produced by the same program but in a different environment (by different environment I mean e.g. a different operating system or a different compiler on the same system). Or they need to compare the output of a numerical program, which is made to solve a certain problem, with the one produced by another program, which solves the same problem but using a different algorithm. Finally, sometimes they have to compare the output of a numerical program with a sample file containing a list of expected data (which could have been computed theoretically or come from experiments in a laboratory). In all these situations Numdiff could turn out to be very helpful, since it also lets the user specify a tolerance for absolute and/or relative differences and then reports only the fields which differ enough to exceed these tolerances.

To end this presentation, let me say that Numdiff is a console application, i.e. a computer program designed to be used via a text-only computer interface, such as a text terminal or the command line interface of some operating systems. This means no mouse, no windows, no buttons, no silly icons. All modern operating systems provide, along with the Graphical User Interface (GUI), a program to emulate a text terminal. This program has different names depending on the operating system you are using: console, terminal emulator, xterm, rxvt, and so on. To use Numdiff you have to open the console/terminal emulator, type some strange commands there, and then press the Enter key to execute them :) If you do not know how to start with a terminal emulator, search the web for a user guide and, after reading it carefully, come back here.

Sample Output

Since one example is often more useful than many words... Let us suppose that file1 contains the list of numbers:

  1.25	-3.45		1.23456789E-2   -5.98765432e+5  100.00

and file2 the following one:

  1.250001  -3.450003	1.23456788E-2   -5.98765431e+5  100.000022

We can compare these two files by calling numdiff (the name of the program must be written in lower case!) and passing it file1 and file2 as arguments:

  numdiff file1 file2

The output of this command will be:

  ----------------
  ##1       #:1   <== 1.25
  ##1       #:1   ==> 1.250001
  @ Absolute error = 1.0000000000e-6, Relative error = 8.0000000000e-7
  ##1       #:2   <== -3.45
  ##1       #:2   ==> -3.450003
  @ Absolute error = 3.0000000000e-6, Relative error = 8.6956521739e-7
  ##1       #:3   <== 1.23456789E-2
  ##1       #:3   ==> 1.23456788E-2
  @ Absolute error = 1.0000000000e-10, Relative error = 8.1000001393e-9
  ##1       #:4   <== -5.98765432e+5
  ##1       #:4   ==> -5.98765431e+5
  @ Absolute error = 1.0000000000e-3, Relative error = 1.6701030958e-9
  ##1       #:5   <== 100.00
  ##1       #:5   ==> 100.000022
  @ Absolute error = 2.2000000000e-5, Relative error = 2.2000000000e-7
  
  +++  File "file1" differs from file "file2"

This text should be self-explanatory. The tags ##l and #:f, where l and f are integer numbers, refer to the line number and to the position of the field within the line, respectively. Thus,

  ##1       #:1   <== 1.25
  ##1       #:1   ==> 1.250001
  @ Absolute error = 1.0000000000e-6, Relative error = 8.0000000000e-7

means that the first field of the first line is given by 1.25 in the first file, by 1.250001 in the second file. The absolute difference between these two numbers is 1.0000000000e-6, while the relative difference is 8.0000000000e-7.
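
The two figures can be checked by hand. The following Python sketch recomputes them for the five pairs of the example. It rests on an assumption inferred from the sample output above and not taken from the Numdiff sources: the relative error is the absolute difference divided by the smaller of the two magnitudes. The function name abs_rel_error is hypothetical.

```python
from decimal import Decimal

def abs_rel_error(a: str, b: str):
    """Return (absolute error, relative error) for two numeric fields.

    Assumption: relative error = |a - b| / min(|a|, |b|). This matches the
    sample output above, but it is a guess, not Numdiff's documented rule.
    """
    x, y = Decimal(a), Decimal(b)
    abs_err = abs(x - y)
    if abs_err == 0:
        return abs_err, Decimal(0)
    smaller = min(abs(x), abs(y))
    if smaller == 0:
        # One value is zero and the other is not: no finite relative error.
        return abs_err, Decimal("Infinity")
    return abs_err, abs_err / smaller

pairs = [
    ("1.25", "1.250001"),
    ("-3.45", "-3.450003"),
    ("1.23456789E-2", "1.23456788E-2"),
    ("-5.98765432e+5", "-5.98765431e+5"),
    ("100.00", "100.000022"),
]
for field, (a, b) in enumerate(pairs, start=1):
    abs_err, rel_err = abs_rel_error(a, b)
    print(f"#:{field}  absolute = {abs_err:.10e}  relative = {rel_err:.10e}")
```

Decimal arithmetic is used here so that the differences are exact and not affected by binary floating-point rounding.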

Numdiff can also print a sort of statistical report about the numerical differences discovered in the two files. To this end it is sufficient to specify the option -S. If you are interested only in the statistical report and want to remove from the output the detailed list of all differences, then you have to specify additionally the option -q. The output of the command numdiff -S -q file1 file2 is:

  
  5 numeric comparisons have been done, all of them
  have produced an outcome beyond the tolerance threshold
  
  Largest absolute error in the set of the major numerical differences:
  1.0000000000e-3
  Corresponding relative error:
  1.6701030958e-9
  First occurrence (#line, #field) in the  first file: 1, 4
  First occurrence (#line, #field) in the second file: 1, 4
  
  Largest relative error in the set of the major numerical differences:
  8.6956521739e-7
  Corresponding absolute error:
  3.0000000000e-6
  First occurrence (#line, #field) in the  first file: 1, 2
  First occurrence (#line, #field) in the second file: 1, 2
  
  
  Sum of all absolute errors:
  1.0260001000e-3
  Sum of the major absolute errors:
  1.0260001000e-3
  Arithmetic mean of all absolute errors:
  2.0520002000e-4
  Arithmetic mean of the major absolute errors:
  2.0520002000e-4
  Square root of the sum of the squares of all absolute errors:
  1.0002469695e-3
  Quadratic mean of all absolute errors:
  4.4732404362e-4
  Square root of the sum of the squares
  of the major absolute errors:
  1.0002469695e-3
  Quadratic mean of the major absolute errors:
  4.4732404362e-4
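
The aggregate figures of this report can be recomputed from the five absolute errors listed in the detailed output. The following Python sketch does so; the variable names are hypothetical. Since no tolerance was given, every difference counts as a "major" one, which is why the "all" and "major" figures coincide.

```python
import math

# Absolute errors of the five fields, copied from the output of
# "numdiff file1 file2" shown earlier.
abs_errors = [1.0e-6, 3.0e-6, 1.0e-10, 1.0e-3, 2.2e-5]

total = math.fsum(abs_errors)                               # sum of all absolute errors
mean = total / len(abs_errors)                              # arithmetic mean
l2_norm = math.sqrt(math.fsum(e * e for e in abs_errors))   # square root of the sum of the squares
quadratic_mean = l2_norm / math.sqrt(len(abs_errors))       # quadratic mean (root mean square)

for label, value in [
    ("Sum of all absolute errors", total),
    ("Arithmetic mean of all absolute errors", mean),
    ("Square root of the sum of the squares", l2_norm),
    ("Quadratic mean of all absolute errors", quadratic_mean),
]:
    print(f"{label}: {value:.10e}")
```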
  

You can specify an absolute error tolerance (or a relative error tolerance) by means of the option -a (-r). If an absolute error tolerance is specified, numdiff only reports the absolute differences exceeding that tolerance. For instance, the output of numdiff -a 1.0e-5 file1 file2 will be

  ----------------
  ##1       #:4   <== -5.98765432e+5
  ##1       #:4   ==> -5.98765431e+5
  @ Absolute error = 1.0000000000e-3, Relative error = 1.6701030958e-9
  ##1       #:5   <== 100.00
  ##1       #:5   ==> 100.000022
  @ Absolute error = 2.2000000000e-5, Relative error = 2.2000000000e-7
  
  +++  File "file1" differs from file "file2"
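
The selection made by the option -a can be mimicked in a few lines. The Python sketch below uses a hypothetical helper, fields_beyond_tolerance, which assumes that every field is a plain real number and keeps only the positions whose absolute difference exceeds the tolerance. It illustrates the idea only and is not Numdiff's code.

```python
from decimal import Decimal

def fields_beyond_tolerance(line1: str, line2: str, abs_tol: str):
    """Return the 1-based positions of the fields whose absolute
    difference exceeds abs_tol (all fields are assumed to be numeric)."""
    tol = Decimal(abs_tol)
    reported = []
    for pos, (a, b) in enumerate(zip(line1.split(), line2.split()), start=1):
        if abs(Decimal(a) - Decimal(b)) > tol:
            reported.append(pos)
    return reported

line1 = "1.25 -3.45 1.23456789E-2 -5.98765432e+5 100.00"
line2 = "1.250001 -3.450003 1.23456788E-2 -5.98765431e+5 100.000022"
print(fields_beyond_tolerance(line1, line2, "1.0e-5"))  # prints [4, 5]
```

Fields 4 and 5 are the same ones that numdiff -a 1.0e-5 reports above.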

Numdiff can also recognize non-numerical differences between two files. If a certain field in any of the two compared files is of non-numerical type, then, instead of performing a numeric comparison, Numdiff will simply perform a literal (character by character) comparison. For example, if the file example1 contains the line

  1.0     xyz     3.0     x       y

and the file example2 the line

  abc     1.1     3.3     x       z

then numdiff example1 example2 will display

  ----------------
  ##1       #:1   <== 1.0
  ##1       #:1   ==> abc
  @                                                     @@
  ##1       #:2   <== xyz
  ##1       #:2   ==> 1.1
  @                                                     @@
  ##1       #:3   <== 3.0
  ##1       #:3   ==> 3.3
  @ Absolute error = 3.0000000000e-1, Relative error = 1.0000000000e-1
  ##1       #:5   <== y
  ##1       #:5   ==> z
  @                                                     @@
  
  +++  File "example1" differs from file "example2"
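
The rule just described (numeric comparison when both fields are numbers, literal comparison otherwise) can be sketched in Python as follows. The helpers parse_number and compare_fields are hypothetical, and the notion of "number" used here (whatever Python's Decimal accepts as a finite value) is only a rough stand-in for the one used by Numdiff, which, for instance, also recognizes complex numbers.

```python
from decimal import Decimal, InvalidOperation

def parse_number(field: str):
    """Return the field as a Decimal, or None if it is not a finite real number."""
    try:
        value = Decimal(field)
    except InvalidOperation:
        return None
    return value if value.is_finite() else None

def compare_fields(a: str, b: str):
    """Return None if the fields agree, else the kind of difference found."""
    x, y = parse_number(a), parse_number(b)
    if x is None or y is None:
        # At least one field is not numeric: compare character by character.
        return None if a == b else "literal"
    return None if x == y else "numeric"

fields1 = "1.0 xyz 3.0 x y".split()
fields2 = "abc 1.1 3.3 x z".split()
for pos, (a, b) in enumerate(zip(fields1, fields2), start=1):
    kind = compare_fields(a, b)
    if kind is not None:
        print(f"#:{pos}  {a} vs {b}  ({kind} difference)")
```

As in the output of Numdiff, fields 1, 2 and 5 are reported as literal differences, field 3 as a numeric one, and field 4 is not reported at all.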

The most appealing feature of Numdiff is the ability to detect insertions/deletions of lines, similarly to what diff does, through activation of a filter. Suppose that the files list1 and list2 contain the data

  Additional_line_which_creates_confusion
  Additional_line_which_creates_confusion
   +1.000
   +2.510
  +10.022

and

   +1.003
   +2.500
  +10.000
  Final_line_which_creates_confusion

respectively. What you would expect to find in the report displayed by Numdiff is that list1 contains two lines at the beginning which are not present in list2, that the last line of list2 is not present in list1 and, finally, that the three numerical values in list2 differ from the corresponding values in list1, together with the absolute and relative errors. But the output of the command numdiff list1 list2 differs from your expectations, since this is what Numdiff reports:

  ----------------
  ##1       #:1   <== Additional_line_which_creates_confusion
  ##1       #:1   ==> +1.003
  @                                                     @@
  ----------------
  ##2       #:1   <== Additional_line_which_creates_confusion
  ##2       #:1   ==> +2.500
  @                                                     @@
  ----------------
  ##3       #:1   <== +1.000
  ##3       #:1   ==> +10.000
  @ Absolute error = 9.0000000000e+0, Relative error = 9.0000000000e+0
  ----------------
  ##4       #:1   <== +2.510
  ##4       #:1   ==> Final_line_which_creates_confusion
  @                                                     @@
  ----------------
  ##5       <== +10.022
            ==>
  
  
  ***  End of file "list2" reached
       Likely the files "list1" and "list2" do not have the same number of lines !
  
  +++  File "list1" differs from file "list2"

Indeed, by default Numdiff compares the first, second, third line of the first file (in this case list1) with the first, second, third line of the second file (list2), and so on. If one of the two compared files contains one or more lines for which there exist no corresponding lines in the other file, Numdiff gets confused and displays a misleading output.

The filtering mechanism implemented in Numdiff since version 5 can detect such situations and re-synchronize the two files to obtain the final expected result. For instance, the command numdiff -z @ list1 list2, which activates the filter through the option -z @, will print

  ----------------
  ##1       <== Additional_line_which_creates_confusion
            ==>
  
  ----------------
  ##2       <== Additional_line_which_creates_confusion
            ==>
  
  ----------------
  ##3       #:1   <== +1.000
  ##1       #:1   ==> +1.003
  @ Absolute error = 3.0000000000e-3, Relative error = 3.0000000000e-3
  ----------------
  ##4       #:1   <== +2.510
  ##2       #:1   ==> +2.500
  @ Absolute error = 1.0000000000e-2, Relative error = 4.0000000000e-3
  ----------------
  ##5       #:1   <== +10.022
  ##3       #:1   ==> +10.000
  @ Absolute error = 2.2000000000e-2, Relative error = 2.2000000000e-3
  ----------------
            <==
  ##4       ==> Final_line_which_creates_confusion
  
  
  +++  File "list1" differs from file "list2"
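
To give an idea of what re-synchronization means, here is a small Python sketch that aligns the lines of list1 and list2 before any numeric comparison takes place. It is not the filter implemented in Numdiff: it swaps in Python's difflib.SequenceMatcher and a hypothetical helper, shape, which replaces every numeric field by '#' so that lines differing only in their numeric values are still considered matching.

```python
from difflib import SequenceMatcher

def shape(line: str) -> str:
    """Replace every numeric field by '#', keeping the other fields as they are."""
    def is_number(field: str) -> bool:
        try:
            float(field)
            return True
        except ValueError:
            return False
    return " ".join("#" if is_number(f) else f for f in line.split())

def align(lines1, lines2):
    """Yield (index1, index2) pairs; None marks a line with no counterpart."""
    matcher = SequenceMatcher(None,
                              [shape(l) for l in lines1],
                              [shape(l) for l in lines2],
                              autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            yield from zip(range(i1, i2), range(j1, j2))
        else:  # 'delete', 'insert' or 'replace': lines without a counterpart
            yield from ((i, None) for i in range(i1, i2))
            yield from ((None, j) for j in range(j1, j2))

list1 = ["Additional_line_which_creates_confusion",
         "Additional_line_which_creates_confusion",
         " +1.000", " +2.510", "+10.022"]
list2 = [" +1.003", " +2.500", "+10.000",
         "Final_line_which_creates_confusion"]

for i, j in align(list1, list2):
    left = f"##{i + 1} {list1[i].strip()}" if i is not None else "(missing)"
    right = f"##{j + 1} {list2[j].strip()}" if j is not None else "(missing)"
    print(f"{left}  <==>  {right}")
```

The pairing obtained this way (lines 3, 4, 5 of list1 against lines 1, 2, 3 of list2) is the same one shown in the output of numdiff -z @ list1 list2 above.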

The use of the filter can sometimes be tricky; see the User Manual for more examples and additional explanations.

Numdiff has many more options and features. In the User Manual you can find a detailed description of them.

Installation

On Unix(R) and GNU systems, like GNU/Linux, configuration, building and installation of Numdiff can be performed through the standard three steps:

          ./configure
          make
          make install

This works under the assumption that the target system for installation supplies an ANSI C compiler, a POSIX implementation of the make utility, and an sh-compatible shell. The compiler should at least accept the option -o to write its output to a specified file, the option -D to predefine macros, the option -l to search for a specified library, and the options -I and -L to add a given directory to the search path for include and library files, respectively. If you also want to install the documentation in the GNU Info format, then you additionally need a proper installation of GNU Texinfo. Finally, a proper installation of GNU Gettext is needed if you care about support for languages other than English (at the moment only the Italian localization is available). If you leave the Natural Language Support enabled and you also want to install the localization files, then after make you will have to run

          make install-nls

By default, make install will install all the files in /usr/local/bin, /usr/local/info, etc. You can specify an installation prefix different from /usr/local by using the option --prefix in the configure step, for instance --prefix=$HOME:

          ./configure --prefix=$HOME

Type ./configure --help to obtain the complete list of all available options.

Once Numdiff has been installed, you can remove all the previously installed files with a simple make uninstall. If you have also installed the localization files through make install-nls, then, in order to remove these too, use make uninstall-nls in place of make uninstall.

Look at chapter 4 of the User Manual if you need more information on how to compile, build and install Numdiff.

TODO

Known issues

The target installation directory specified by means of the configuration option --prefix cannot contain white spaces: make install does not work at all when the target installation directory is in a path which includes a white space (blank or tab). It is fairly easy to fix this issue in Makefile.in with some double quotes around each usage of $(DESTDIR), but unfortunately also the installation script GNU-shtool (of which Numdiff includes the current version) is broken.

License

Numdiff (also written numdiff) is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

Numdiff is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program. If not, see http://www.gnu.org/licenses/.

Contact and Bug reports

Bug reports have to be sent to the address <ivprimi (at) libero (dot) it>. Please put Numdiff in the subject and indicate the version of the operating system you are running (in particular, do not forget to specify whether it is a 32- or a 64-bit system) and, if you know it, the version of the compiler used to build Numdiff. Please also state whether your version of Numdiff uses the GNU MP library or not. Before writing an email, be sure to run the latest stable version of Numdiff: I do not provide support for older versions.

Download and Documentation

The tar-gzipped archive with the source code of Numdiff can be downloaded from

http://savannah.nongnu.org/download/numdiff

The latest stable release of Numdiff is provided by version 5.9.0. Together with the source code, the archive contains a very detailed user manual (in English). The manual, which was written by using GNU Texinfo, is available in the following formats:

Permission is granted to copy, distribute and/or modify this manual under the terms of the GNU Free Documentation License, Version 1.3 or any later version published by the Free Software Foundation. A copy of the license is always included in the section entitled "GNU Free Documentation License". You can also obtain a copy of the GNU Free Documentation License from http://www.gnu.org/copyleft/.

The manual of Numdiff can also be browsed online here.

Acknowledgments

First I want to thank all the people involved so far in the Free Software community, starting from those directly involved in the GNU project (http://www.gnu.org). Without their great work, this little one would never have been done.

I also have to thank Aurelio Marinho Jargas (verde@aurelio.net), author of txt2tags (http://txt2tags.sf.net), a free (GPL'ed) and wonderful text formatting and conversion tool, which I used in writing this web page.

Many thanks also to Mr. Norman Clerman of Opcon Associates, Inc. for several suggestions he gave me to improve the readability and the effectiveness of the output produced by Numdiff. He also pointed out the need to implement a filter for resynchronizing the lines between two files in case of addition or deletion of one or more lines. I have to give him credit for the urge to prepare the versions 4.x and 5.x of Numdiff.

Finally, I want to thank my friends Mariapia Palombaro, since she removed some errors while reviewing the first version of this document, and Paolo Caramanica, who suggested that I add more information to the output of the option -S of Numdiff.