bintools/evalid/evalid.txt
author Richard Taylor <richard.i.taylor@nokia.com>
Tue, 27 Apr 2010 16:46:48 +0100
branchwip
changeset 498 564986768b79
parent 0 044383f39525
child 678 f0d451bf8bcb
permissions -rw-r--r--
add --query option

EVALID
19/10/2005

EVALID compares two trees of files, ignoring non-significant differences in
some types of files.

EVALID can perform the comparision using two different methodlogies.

1) Direct comparision of two trees of files, where each tree must be accessible
at run to time to EVALID.

2) Indirect comparision which generates a set of MD5 signitures for the files
in each tree in multiple runs of EVALID and then compares sets of MD5 signitures
in an additional run of EVALID. This method does not require each file tree to
be accessible at the same time.


A typical EVALID report is something like:

    OK: tree1\file1 and tree2\file1 (identical)
    OK: tree1\file2 and tree2\file2 (some type)
    
    FAILED: tree1\file3 and tree2\file3 (some type)

This indicates that file1 was completely byte-for-byte identical in the two
trees, that file2 was the same ignoring non-significant differences, and that
file2 was significantly different. Both file2 and file3 were different in the
strict byte-for-byte sense, so EVALID examined the file in the first tree to
identify the file type, and applied an associated comparison function to ignore
the non-significant differences.

The -v option to EVALID will report more detail of the comparison failures: for 
types where the comparison is done on filtered text (e.g. Intel object), the 
comparison failure will show the lines after filtering has taken place.

These are the types of file recognised by EVALID, with the comparison
functions and rules applied in each case. File type recognition is based on an
examination of the first few bytes of the file, not the file name.


--------
(identical)

EVALID did not determine the type: the files are the same size and have
the same sequence of bytes.

--------
(E32 DLL), (E32 EXE), (compressed E32 DLL), (compressed E32 EXE)

E32 Image format files as produced by PETRAN. 

EVALID compares the files using "pediff -e32", which ignores the timestamp and 
the version number in the file header, but requires the rest of the files to 
be identical.

--------
(Intel DLL), (Intel EXE), (MSDOS executable)

PE-COFF format executable for machine type IMAGE_FILE_MACHINE_I386.

EVALID applies "pe_dump" to each file and compares the output.

The "pe_dump" utility ignores timestamps in the COFF header, the debug data, 
the export directory and the .rsrc section. The timestamps are set to zero, but 
then the headers and the contents of each section are required to be identical.

--------
(Intel object)

PE-COFF object file for machine type IMAGE_FILE_MACHINE_I386.

EVALID applies "dumpbin /symbols /exports" to each file and compares the output
subject to the following filtering:

1) Lines beginning "Dump of file" have the filename removed
2) "line #xxxx" references to source line numbers are removed
3) The filenames associated with ".file" information have the drive and path information removed.
4) The absolute symbol for the LIB.EXE version number is ignored.
5) COFF section offsets are ignored
6) The size of each section is ignored.
7) Summary information about the debug section is ignored

The dumpbin output is otherwise expected to be identical.

--------
(ARM object)

PE-COFF object file for machine type 0x0A00.

EVALID applies "nm --no-sort" to each file and compares the output subject to
the following filtering:

1) All filenames are ignored
2) Pathnames of object files are ignored
3) The unique symbol generated by dlltool is based on the -o argument
   specified on the command line, so it is "cleaned" to remove path information

The output of nm is otherwise expected to be identical.

--------
(Intel library), (ARM library)

Archive file in which the first identifable object file is (Intel object) or
(ARM object) respectively. 

EVALID compares libraries using the same approach as used for the corresponding
object files: the tools used accept libraries of object files as well as
individual files.

--------
(Java class)

Files which begin with the 0xCAFEBABE magic number.

Java class files ought to be (identical) as there are no non-significant 
differences. EVALID therefore only reports this type for failures.

--------
(ZIP file)

Files which begin with the signature for PK ZIP format 3.4

EVALID applies "unzip -l -v" to each file and compares the output subject to 
the following filtering:

1) The name of the archive file is ignored
2) The time and datestamps on the files are ignored

The output of zip is otherwise expected to be identical, i.e. the zip file
contains the same filenames with the same sizes and checksums, in the same order.

--------
(EPOC Permanent File Store)

Files which appear to have UID 1 equal to 0x10000050, without actually
computing the Symbian platform checksum to confirm that the UIDs are valid.

EVALID applies "pfsdump -c -v" and compares the output, ignoring the name
of the file being dumped. Pfsdump is a utility which prints out the
contents of the file store in stream ID order, thereby ignoring unreclaimed
free space in the filestore or the current location of each stream within
the store.

--------
(SIS file)

Files which appear to have the UIDs for narrow or unicode SIS files.

EVALID does not know how to compare SIS files, so this type is only
reported for comparison failures.

--------
(MSVC database)

Microsoft database files, usually Debug databases (.PDB files).

EVALID does not know how to compare these files, but assumes that there are no
significant differences in these files which won't also be reflected in the
associated executables: this type is always reported as an "OK" comparison.

--------
(MAP file)

MAP file generated by the GNU linker.

EVALID filters this text format as follows:

1) Names such as ds999.o are ignored (as per ARM object files)
2) Pathnames to object files are ignored
3) The .stab and .stabstr lines are ignored because they relate to debug information
4) Lines which say "size before relaxing" are ignored as they also relate to debug information
5) The unique "_head" and "_iname" symbols in import libraries are ignored (as per ARM object files)

The files are otherwise expected to be identical.

--------
(SGML file)

Files which contain "<!DOCTYPE" and therefore follow the SGML standard - this
includes XML and HTML files.

EVALID filters these text files to remove the text inside single line script
comments, e.g. <!-- comment -->, and expects the files to be otherwise identical.

--------
(Preprocessed text)

Files which begin with "# 1 "filename"" are assumed to be the output of CPP.EXE.

EVALID filters these text files to remove the lines which record the #include structure leading to
the final contents of the file, and expects the files to be otherwise identical.

--------
(unknown format)

Files which weren't recognised as any of the above types, and which weren't
identical.

EVALID only reports this type for comparison failures.

--------
(unknown library)

Archive files which didn't appear to contain Intel or ARM object files

EVALID only reports this type for comparison failures.

--------
(Unknown COFF object)

COFF format object file with a machine_type which doesn't imply either
(ARM object) or (Intel object).

EVALID only reports this type for comparison failures.

--------
(ELF file)

ELF format file.

EVALID applies "elfdump" and compares the output, ignoring the program
header and section header offsets. Elfdump is a utility which prints out 
the significant parts of an ELF executable, ignoring non-significant
differences such as offsets of symbols in string table.

--------
(CHM file)

Microsoft's Compiled HTML Help files

EVALID applies "hh -decompile" to each file expanding it to a temporary directory and then compares
the contents of the temporary directory using the following process:

When directly comparing two chm files EVALID will first compare the file listing of the temporary
directories and return failed if the file listing is not identical.
If the file listing are identical it will then compare the contents of the two temporary directories
using the normal EVALID process.

When using the MD5 comparing functionality EVALID processes the contents of the temporary directory
using the standard EVALID process but amalgamates the results in to one MD5 signiture.