Solved

How to compare two binary files?

Posted on 2010-08-16
Last Modified: 2013-12-26
A library is compiled with the Wind River Diab 4.2b compiler, resulting in the file old.a.
The identical library is compiled with the Wind River Diab 5.8.0.0 compiler, resulting in new.a.

The SlickEdit DIFFzilla utility was used to compare the old.a and new.a binary files.
The files do not line up for comparison: the first part of old.a is shown against Imaginary Buffer, then the first part of new.a is shown against Imaginary Buffer, then the next part of old.a, then the next part of new.a, and so on until the end of both files is reached.

My guess is that the files are not really different. Somehow the compiler output formats differ, and that keeps the files from being compared.

What other options do I have to compare these binary files?

Other tools I'm using are Clearcase, Codewright, Unix, Linux.  
Question by:naseeam
15 Comments
 
LVL 40

Expert Comment

by:evilrix
ID: 33449266
What is it, exactly, you are trying to achieve?
 

Author Comment

by:naseeam
ID: 33450233
I'm trying to compare two binary files using SlickEdit DIFFzilla utility.
 
LVL 40

Expert Comment

by:evilrix
ID: 33450240
That I understand, but to what end? What is your ultimate goal here?
 

Author Comment

by:naseeam
ID: 33455550
I compile the library source files with one version of the compiler to get the old.a library file.
Then I compile the exact same source files with a newer version of the compiler to get the new.a library file.

My goal is to find out whether the two .a library files are identical. If they are, I don't need to test the library (built with the new version of the compiler) on the target board.
 
LVL 32

Expert Comment

by:phoffric
ID: 33455801
In some environments two identical compilations of libraries or executable builds still result in different binaries because of date time stamps. But, by experimenting, we could determine what was built from identical source code.

For example, I just built two executables, a.exe and b.exe, a few seconds apart. I ran diff on their hex dumps as shown below, and the results show only minor changes. For our configuration management builds, we dropped the hex dumps into Codewright and got better (easier to read) results than what I am showing below with the freeware WinMerge.
$ od -c a.exe > a.txt
$ od -c b.exe > b.txt
$ diff a.txt b.txt
9c9
< 0000200   P   E  \0  \0   L 001  \t  \0 361 252   j   L  \0   &  \0  \0
---
> 0000200   P   E  \0  \0   L 001  \t  \0 343 252   j   L  \0   &  \0  \0
14c14
< 0000320  \0 240  \0  \0  \0 004  \0  \0   3 317  \0  \0 003  \0  \0 200
---
> 0000320  \0 240  \0  \0  \0 004  \0  \0   % 317  \0  \0 003  \0  \0 200

[Attached screenshot: binary-diff.PNG]
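
The od-plus-diff approach above can also be scripted. As a minimal sketch in Python (the helper name `diff_offsets` and its `limit` parameter are illustrative and not part of any tool mentioned in this thread), the function below lists the offsets at which two byte strings differ, so that a few isolated differences can be told apart from wholesale changes:

```python
def diff_offsets(a, b, limit=20):
    """Return up to `limit` tuples (offset, byte_in_a, byte_in_b) where a and b differ.

    Only the common prefix is compared; check len(a) != len(b) separately.
    """
    found = []
    for offset, (x, y) in enumerate(zip(a, b)):
        if x != y:
            found.append((offset, x, y))
            if len(found) >= limit:
                break
    return found


# Hypothetical usage on the two executables from the example above:
# with open("a.exe", "rb") as fa, open("b.exe", "rb") as fb:
#     for offset, x, y in diff_offsets(fa.read(), fb.read()):
#         print(f"{offset:07o}  {x:03o}  {y:03o}")  # octal, like od
```

A short list of offsets clustered in a header suggests build metadata such as a timestamp. Differences spread throughout the file suggest that the generated code itself changed.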
 
LVL 40

Assisted Solution

by:evilrix
evilrix earned 40 total points
ID: 33455893
You can't just binary-diff the libraries, because every time you compile them there is a chance that variable content (such as the timestamps or paths that phoffric alludes to) will be included. You need to do something a bit smarter than this.

This might get you going...

To find the dependencies of a dynamic library, just use the ldd command.
http://unixhelp.ed.ac.uk/CGI/man-cgi?ldd+1

You can find out what symbols each library exports using the nm command
http://unixhelp.ed.ac.uk/CGI/man-cgi?nm
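
One way to use nm for this comparison is to save its output for each library and compare the symbol lists while ignoring addresses. The sketch below is in Python and assumes the common default layout of nm output (`address type name`, or `type name` for undefined symbols); the nm shipped with a particular toolchain may print something different, and the function name `symbols` is illustrative only:

```python
def symbols(nm_output):
    """Parse saved `nm` text into a set of (type, name) pairs, ignoring addresses."""
    found = set()
    for line in nm_output.splitlines():
        parts = line.split()
        if len(parts) == 3:
            # "address type name", e.g. "00000000 T init"
            found.add((parts[1], parts[2]))
        elif len(parts) == 2 and len(parts[0]) == 1:
            # "type name": undefined symbols carry no address
            found.add((parts[0], parts[1]))
        # Anything else (blank lines, "member.o:" headings) is skipped.
    return found


# Hypothetical usage, after running `nm old.a > old.txt` and `nm new.a > new.txt`:
# old = symbols(open("old.txt").read())
# new = symbols(open("new.txt").read())
# print("only in old:", sorted(old - new))
# print("only in new:", sorted(new - old))
```

Matching symbol lists do not prove that the code is identical, but a mismatch is a quick sign that the two builds differ in more than timestamps.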

 
LVL 40

Expert Comment

by:evilrix
ID: 33455911
Oh, and if they are static libraries you can just unarchive them since a static library is nothing more than an archive file with an index table. You can use the ar command to do this.

http://unixhelp.ed.ac.uk/CGI/man-cgi?ar

Once the contents are extracted, you can compare each individual element.
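
If the ar command is not at hand, the member-by-member comparison can be sketched directly in Python. This is a rough illustration that assumes the common Unix ar layout (an 8-byte `!<arch>` magic followed by 60-byte member headers) and unique, short member names; the function names are illustrative. It compares only member contents, so the per-member timestamps stored in the headers are ignored, and it skips the index and name-table members:

```python
def ar_members(blob):
    """Yield (name, data) for each member of a Unix ar archive held in memory."""
    if blob[:8] != b"!<arch>\n":
        raise ValueError("not an ar archive")
    pos = 8
    while pos + 60 <= len(blob):
        header = blob[pos:pos + 60]
        if header[58:60] != b"`\n":
            raise ValueError("bad member header at offset %d" % pos)
        name = header[0:16].decode("ascii", "replace").rstrip()
        size = int(header[48:58].decode("ascii").strip())
        yield name, blob[pos + 60:pos + 60 + size]
        pos += 60 + size + (size & 1)  # member data is padded to an even offset


def compare_archives(old_blob, new_blob):
    """Return {member_name: True/False}; True means the contents match exactly."""
    index_names = {"/", "//", "__.SYMDEF"}  # symbol index and long-name table
    old = {n: d for n, d in ar_members(old_blob) if n not in index_names}
    new = {n: d for n, d in ar_members(new_blob) if n not in index_names}
    return {n: old.get(n) == new.get(n) for n in sorted(old.keys() | new.keys())}


# Hypothetical usage:
# result = compare_archives(open("old.a", "rb").read(), open("new.a", "rb").read())
# for name, same in result.items():
#     print(name, "identical" if same else "DIFFERENT")
```

Members reported as different can then be examined individually, since an object file may itself embed paths or other build metadata.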

 
LVL 32

Expert Comment

by:phoffric
ID: 33455952
>> (such as timestamps or paths)
True, sometimes the Configuration Management was mapped to a different drive.
But that was what we found were the binary differences: timestamps and/or paths.

And it was easy to interpret: if the comparison was otherwise clean, we knew that the remaining 1 MB was built from the same source code.

When comparing libraries, there were many more differences, because each object in the library could have a different timestamp. But this process was sound and absolutely necessary to guarantee that what was being sent to the customer was identical to what had been thoroughly tested in the lab.
 
LVL 40

Expert Comment

by:evilrix
ID: 33455977
>> such as timestamps or paths

In the case of a static Unix library, you can add to this the random-access index table for symbol names.

http://unixhelp.ed.ac.uk/CGI/man-cgi?ranlib
 
LVL 32

Expert Comment

by:phoffric
ID: 33456008
Each environment was different. That is why we had to experiment to understand exactly what worked for that environment.
 

Author Comment

by:naseeam
ID: 33465774
I'll have to wait for my Unix Account before I can try out above solutions.
 
LVL 40

Expert Comment

by:evilrix
ID: 33465852
>> I'll have to wait for my Unix Account before I can try out above solutions.

This might help you make progress before then.

"Cygwin is a Linux-like environment for Windows."
http://www.cygwin.com/
 
LVL 32

Expert Comment

by:phoffric
ID: 33468954
If you do use Cygwin (I use it), then do not get its X Server
                   http://x.cygwin.com/

Instead, get Xming (and don't waste the time I did trying to use the Cygwin X server - I had to get EE help to learn this)
       http://sourceforge.net/projects/xming/
 
LVL 5

Accepted Solution

by:shajithchandran
shajithchandran earned 460 total points
ID: 33473382
What I would probably do is extract just the text, data, and maybe the loader sections from the libraries and compare them. If they are the same, then the libraries are the same.

After all, during execution, all that matters is the instructions (text section), the initialized data (data section), and how the loader will resolve symbols (loader section). If they are the same, then I believe we can safely conclude that the libraries are the same.

I use dump -s on my Unix machine to extract them.
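
Where dump -s is not available, the same idea can be sketched in Python for object files extracted from the libraries. This is not the dump tool itself. It is a rough illustration that assumes the objects are ELF files (an assumption; the thread does not state the object format), and the section names `.text` and `.data` and both function names are illustrative:

```python
import struct

SHT_NOBITS = 8  # section type with no bytes in the file, e.g. .bss


def elf_sections(blob):
    """Return {section_name: bytes} for an ELF object held in memory."""
    if blob[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    is64 = blob[4] == 2                    # EI_CLASS: 1 = 32-bit, 2 = 64-bit
    order = "<" if blob[5] == 1 else ">"   # EI_DATA: 1 = little-endian
    if is64:
        hdr = struct.unpack_from(order + "HHIQQQIHHHHHH", blob, 16)
        sh_fmt = order + "IIQQQQIIQQ"
    else:
        hdr = struct.unpack_from(order + "HHIIIIIHHHHHH", blob, 16)
        sh_fmt = order + "IIIIIIIIII"
    shoff, shentsize, shnum, shstrndx = hdr[5], hdr[10], hdr[11], hdr[12]
    # In both layouts the leading fields are: name, type, flags, addr, offset, size.
    heads = [struct.unpack_from(sh_fmt, blob, shoff + i * shentsize)
             for i in range(shnum)]

    def content(h):
        return b"" if h[1] == SHT_NOBITS else blob[h[4]:h[4] + h[5]]

    names = content(heads[shstrndx])       # section-name string table
    sections = {}
    for h in heads:
        end = names.index(b"\0", h[0])
        sections[names[h[0]:end].decode("ascii", "replace")] = content(h)
    return sections


def same_sections(old_obj, new_obj, wanted=(".text", ".data")):
    """Return {section_name: True/False} for the sections listed in `wanted`."""
    old, new = elf_sections(old_obj), elf_sections(new_obj)
    return {name: old.get(name) == new.get(name) for name in wanted}
```

A section that is missing from both objects is reported as equal, so it is worth checking that the sections of interest actually exist before relying on the result.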
 

Author Closing Comment

by:naseeam
ID: 33476005
Excellent solution.  Brilliant Expert!