Solved

How to compare two binary files?

Posted on 2010-08-16
Last Modified: 2013-12-26
A library is compiled with the Wind River Diab 4.2b compiler, resulting in the file old.a.
The identical library source is compiled with the Wind River Diab 5.8.0.0 compiler, resulting in new.a.

The SlickEdit DIFFzilla utility was used to compare the old.a and new.a binary files.
The files never line up for comparison: the first part of old.a is shown against Imaginary Buffer, then the first part of new.a is shown against Imaginary Buffer, then the next part of old.a, then the next part of new.a, and so on until the end of both files is reached.

My guess is that the files are not different. Somehow the compiler output formats differ, which prevents the files from being compared.

What other options do I have to compare these binary files?

Other tools I'm using are Clearcase, Codewright, Unix, Linux.  
Question by:naseeam
15 Comments
 
LVL 40

Expert Comment

by:evilrix
ID: 33449266
What is it, exactly, you are trying to achieve?
 
LVL 1

Author Comment

by:naseeam
ID: 33450233
I'm trying to compare two binary files using SlickEdit DIFFzilla utility.
 
LVL 40

Expert Comment

by:evilrix
ID: 33450240
That I understand, but to what end... what is your ultimate goal here?
 
LVL 1

Author Comment

by:naseeam
ID: 33455550
I compile library source files with one version of the compiler to get old.a library file.
Then, I compile exact same source files with newer version of compiler to get new.a library file.

My goal is to find out whether the two .a library files are identical. If they are identical, then I don't need to test the library (built with the new version of the compiler) on the target board.
 
LVL 32

Expert Comment

by:phoffric
ID: 33455801
In some environments two identical compilations of libraries or executable builds still result in different binaries because of date time stamps. But, by experimenting, we could determine what was built from identical source code.

For example, I just built two executables, a.exe and b.exe, a few seconds apart. I ran diff on their od dumps, as seen below, and the results show only minor changes. For our configuration management builds, we dropped the dumps into Codewright and got better (easier to read) results than what I am showing below using the freeware WinMerge.
$ od -c a.exe > a.txt
$ od -c b.exe > b.txt
$ diff a.txt b.txt
9c9
< 0000200   P   E  \0  \0   L 001  \t  \0 361 252   j   L  \0   &  \0  \0
---
> 0000200   P   E  \0  \0   L 001  \t  \0 343 252   j   L  \0   &  \0  \0
14c14
< 0000320  \0 240  \0  \0  \0 004  \0  \0   3 317  \0  \0 003  \0  \0 200
---
> 0000320  \0 240  \0  \0  \0 004  \0  \0   % 317  \0  \0 003  \0  \0 200

binary-diff.PNG
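
A byte-level variant of the od/diff comparison above, offered only as a minimal sketch: `cmp -l` prints one line per differing byte (1-based decimal offset, then the two byte values in octal), so counting those lines shows how small the difference really is. The helper name `diff_bytes` is hypothetical, and the sketch assumes a POSIX shell with the standard `cmp`, `wc` and `tr` utilities.

```shell
# diff_bytes FILE_A FILE_B
# Prints the number of bytes that differ between the two files.
# Hypothetical helper; cmp -l emits one line per differing byte.
diff_bytes() {
    # If one file is shorter, cmp also writes an EOF notice to stderr;
    # the count then covers only the common prefix of the two files.
    cmp -l "$1" "$2" | wc -l | tr -d ' '
}

# Example use (file names taken from the post above):
#   diff_bytes a.exe b.exe     # number of differing bytes
#   cmp -l a.exe b.exe         # offsets and octal values of those bytes
```

A count of a handful of bytes at fixed header offsets is consistent with a timestamp or checksum field; a count in the thousands suggests the generated code itself changed.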
 
LVL 40

Assisted Solution

by:evilrix
evilrix earned 40 total points
ID: 33455893
You can't just binary diff the libraries, because every time you compile them there is a chance that variable content (such as the timestamps or paths phoffric alludes to) will be included. You need to do something a bit smarter than this.

This might get you going...

To find the dependencies of a dynamic library, just use the ldd command.
http://unixhelp.ed.ac.uk/CGI/man-cgi?ldd+1

You can find out what symbols each library exports using the nm command
http://unixhelp.ed.ac.uk/CGI/man-cgi?nm
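
As a rough sketch of the nm idea: dump the external symbols of each library in the portable `-P` format, keep only the name and type columns (addresses and sizes are dropped on purpose), and diff the two lists. The function names `symbol_list` and `compare_symbols` are hypothetical, and the sketch assumes an `nm` that can read the libraries in question; output from the Diab compilers may need the toolchain's own utilities instead.

```shell
# symbol_list FILE
# Prints "name type" for every external symbol in an object or archive.
symbol_list() {
    # With -P each symbol line is "name type [value [size]]";
    # archive member headers have a single field and are skipped.
    nm -gP "$1" | awk 'NF >= 2 { print $1, $2 }' | sort -u
}

# compare_symbols OLD NEW
# Shows symbols present in only one file; returns 0 if the lists match.
compare_symbols() {
    old_syms=$(mktemp) || return 2
    new_syms=$(mktemp) || return 2
    symbol_list "$1" > "$old_syms"
    symbol_list "$2" > "$new_syms"
    diff "$old_syms" "$new_syms"
    status=$?
    rm -f "$old_syms" "$new_syms"
    return $status
}

# Example use:
#   compare_symbols old.a new.a
```

Matching symbol lists only show that both libraries export the same interface. They say nothing about whether the code behind each symbol is the same.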

 
LVL 40

Expert Comment

by:evilrix
ID: 33455911
Oh, and if they are static libraries you can just unarchive them since a static library is nothing more than an archive file with an index table. You can use the ar command to do this.

http://unixhelp.ed.ac.uk/CGI/man-cgi?ar

Once the contents are extracted, you can compare each individual element.
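
The extract-and-compare step could look something like the sketch below, which unpacks both archives into separate scratch directories with `ar x` and runs `cmp -s` on each pair of members. The function name `compare_members` is hypothetical. The sketch assumes that `ar` understands both archives and that member names are unique within each archive (duplicates would overwrite each other on extraction).

```shell
# compare_members OLD.a NEW.a
# Extracts both archives and reports each member as same / differs / missing.
# Returns 0 only if every member matches byte for byte.
compare_members() {
    # ar x extracts into the current directory, so resolve absolute paths first.
    old_lib=$(cd "$(dirname "$1")" && pwd)/$(basename "$1")
    new_lib=$(cd "$(dirname "$2")" && pwd)/$(basename "$2")
    work=$(mktemp -d) || return 2
    mkdir "$work/old" "$work/new"
    (cd "$work/old" && ar x "$old_lib") || return 2
    (cd "$work/new" && ar x "$new_lib") || return 2

    status=0
    for f in "$work/old"/*; do
        [ -e "$f" ] || continue          # empty archive: glob did not match
        m=$(basename "$f")
        if [ ! -e "$work/new/$m" ]; then
            echo "missing in new: $m"; status=1
        elif cmp -s "$f" "$work/new/$m"; then
            echo "same: $m"
        else
            echo "differs: $m"; status=1
        fi
    done
    for f in "$work/new"/*; do
        [ -e "$f" ] || continue
        m=$(basename "$f")
        [ -e "$work/old/$m" ] || { echo "missing in old: $m"; status=1; }
    done

    rm -rf "$work"
    return $status
}

# Example use:
#   compare_members old.a new.a
```

Members reported as "differs" are the ones worth a closer look with a dump tool, since a byte-level mismatch may still be nothing more than an embedded timestamp or path.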
 
LVL 32

Expert Comment

by:phoffric
ID: 33455952
>> (such as timestamps or paths)
True, sometimes the Configuration Management was mapped to a different drive.
But that was what we found were the binary differences: timestamps and/or paths.

And it was easy to determine that if we got a clean comparison, then we knew that the other 1MB represented the same source code.

When comparing libraries, there were many more differences because each object in that library could have a different timestamp. But this process was sound and absolutely necessary to guarantee that what was being sent to the customer was identical to what had been thoroughly tested in the lab.
 
LVL 40

Expert Comment

by:evilrix
ID: 33455977
>> such as timestamps or paths

In the case of a static Unix library, you can add to this the random access index table for symbol names.

http://unixhelp.ed.ac.uk/CGI/man-cgi?ranlib
 
LVL 32

Expert Comment

by:phoffric
ID: 33456008
Each environment was different. That is why we had to experiment to understand exactly what worked for that environment.
 
LVL 1

Author Comment

by:naseeam
ID: 33465774
I'll have to wait for my Unix Account before I can try out above solutions.
 
LVL 40

Expert Comment

by:evilrix
ID: 33465852
>> I'll have to wait for my Unix Account before I can try out above solutions.

This might help you make progress before then.

"Cygwin is a Linux-like environment for Windows."
http://www.cygwin.com/
 
LVL 32

Expert Comment

by:phoffric
ID: 33468954
If you do use Cygwin (I use it), then do not get its X Server
                   http://x.cygwin.com/

Instead, get Xming (and don't waste the time I did trying to use the Cygwin X server - I had to get EE help to learn this)
       http://sourceforge.net/projects/xming/
 
LVL 5

Accepted Solution

by:
shajithchandran earned 460 total points
ID: 33473382
What I would probably do is just extract the text, data, and maybe the loader section from the libraries and compare them. If they are the same, then the libraries are the same.

After all, during execution, all that matters is the instructions (text section), the initialized data (data section), and how the loader will resolve symbols (loader section). If these are the same, then I believe we can safely conclude that the libraries are the same.

I use dump -s on my Unix machine to extract them.
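
On systems without `dump`, a similar comparison can be sketched with GNU binutils `objdump`, whose `-s` option prints section contents and `-j` restricts the output to one section. This is a swapped-in tool and an assumption, not the command used above, and the function names `section_dump` and `same_sections` are hypothetical. The sketch is meant for individual object files (for example, members extracted from the .a archives). Section names such as `.text` and `.data` are typical for ELF objects but may differ for other formats.

```shell
# section_dump OBJECT SECTION
# Prints the contents of one section, without the header line that
# carries the file name (it would differ between old and new).
section_dump() {
    objdump -s -j "$2" "$1" | grep -v 'file format'
}

# same_sections OLD_OBJECT NEW_OBJECT SECTION...
# Reports each named section as same / differs; returns 0 if all match.
same_sections() {
    old=$1
    new=$2
    shift 2
    status=0
    for sec in "$@"; do
        if [ "$(section_dump "$old" "$sec")" = "$(section_dump "$new" "$sec")" ]; then
            echo "same: $sec"
        else
            echo "differs: $sec"
            status=1
        fi
    done
    return $status
}

# Example use on two extracted members (hypothetical file names):
#   same_sections old/foo.o new/foo.o .text .data
```

Comparing only these sections sidesteps timestamps, paths, and symbol index tables stored elsewhere in the file, which is the point of the approach described above.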
 
LVL 1

Author Closing Comment

by:naseeam
ID: 33476005
Excellent solution.  Brilliant Expert!