Solved

What Portions of Executable Files Need to Match for Executable Files to Be Confirmed as Being Similar

Posted on 2011-10-25
Last Modified: 2012-05-12
I am attempting to compare two executables that are compiled from the same source code, with no changes to the code or to the environment settings between builds. After trying PE Explorer, I found too many differences in the file header and data sections to say definitively that the two executables are the same. In terms of the PE format, which parts of the executable must match for two executables to be considered the same? For instance, each time you compile the code, you end up with a different checksum and a different date/time stamp in the executable. If I want the main pieces of code to be the same when I compile on another PC, assuming all of the Visual Studio environment settings match those on the first PC, which section of the PE file should I look at? I am thinking that I will have to write an application that skips the header and compares only the data sections of the two PE executables. If my assumption about how to show that the two files are similar or different is incorrect, please let me know of any other ideas. I do not have a lot of time to complete this task, so I would also like to know whether any third-party tools examine only the data sections of PE executables to determine whether the executables match.

Thank you in advance for any help that you could provide...it is greatly appreciated.
Question by:thenthorn1010
6 Comments
 
LVL 46

Expert Comment

by:aikimark
ID: 37028768
Help me understand some things, please.

1. You have this in .Net and C# zones, but you have a C++ tag.

2. Why are you comparing executables if you know they were compiled from the same source code?

3. Do you have the source code?

4. If these are .Net programs, were they obfuscated?

5. Is any type of (wrapper) application protection being used?

If no obfuscation was used in the development, I would suggest using ILDASM against both assemblies and then compare the output.
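
A minimal sketch of that comparison in Python, assuming each assembly has already been dumped to a text file with ILDASM (for example via its `/out=` option). The function names and the list of "volatile" comment prefixes are illustrative assumptions: the idea is to drop lines such as the module version ID, which are expected to differ between builds, before diffing. Check an actual dump to see which lines need to be ignored in your case.

```python
import difflib

# Assumption: comment lines starting with these prefixes vary between builds
# and say nothing about the code itself. Adjust after inspecting a real dump.
VOLATILE_PREFIXES = ("// MVID:", "// Image base:")


def normalise_il(text):
    """Split an IL dump into lines, dropping build-specific comment lines."""
    return [
        line.rstrip()
        for line in text.splitlines()
        if not line.lstrip().startswith(VOLATILE_PREFIXES)
    ]


def il_diff(text_a, text_b):
    """Return unified-diff lines for two IL dumps; an empty list means no difference."""
    return list(
        difflib.unified_diff(
            normalise_il(text_a),
            normalise_il(text_b),
            "first.il",
            "second.il",
            lineterm="",
        )
    )
```

Reading the two dump files into strings and printing the result of `il_diff` gives a readable list of the IL that actually changed.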
 
LVL 22

Accepted Solution

by:8080_Diver
8080_Diver earned 500 total points
ID: 37030997
Essentially, in order to confirm that they are "similar", you will have to decide what portions are important.  

If you have source code that is supposed to be the code from which the "original" file was compiled, then you could do some experimentation to determine how much the executables change under certain circumstances.  For instance, if you compile the source code today and then make some inconsequential change (e.g. add a space somewhere that is "inconsequential") and compile it again, you could compare the two resulting executables to determine the extent of the change.  Because you know exactly what you changed, your analysis will provide an idea as to what the change implies.
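
One way to run that experiment is to list the byte ranges where two builds differ. The sketch below is a plain byte-for-byte comparison, not a PE-aware one, and the function name is an illustrative assumption. It is mainly useful for seeing how many places change between two builds of identical source and where they sit in the file.

```python
def differing_ranges(a, b):
    """Return half-open (start, end) offset ranges where two byte strings differ.

    If the inputs have different lengths, the extra tail of the longer one
    is reported as a final range of its own.
    """
    ranges = []
    start = None
    common = min(len(a), len(b))
    for offset in range(common):
        if a[offset] != b[offset]:
            if start is None:
                start = offset          # a new run of differences begins
        elif start is not None:
            ranges.append((start, offset))
            start = None                # the run ended at this offset
    if start is not None:
        ranges.append((start, common))
    if len(a) != len(b):
        ranges.append((common, max(len(a), len(b))))
    return ranges
```

Feeding it the contents of two builds (read with `open(path, "rb").read()`) and printing the ranges in hexadecimal makes it easy to look the offsets up in a PE viewer.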

 
LVL 37

Assisted Solution

by:TommySzalapski
TommySzalapski earned 1000 total points
ID: 37031644
If you skip all the header bytes and the images match then they are definitely the same. If you are using the exact same compiler, this should work.
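
A rough sketch of "skip the headers and compare the rest", using only the Python standard library. It reads the section table from the PE headers and hashes each section's raw bytes, so the COFF timestamp and the optional-header checksum never enter the comparison. The function names are illustrative assumptions, and SHA-256 is just one reasonable choice of hash.

```python
import hashlib
import struct

SECTION_ENTRY = struct.Struct("<8sIIIIIIHHI")  # one 40-byte section table row


def section_digests(data):
    """Map each PE section name to the SHA-256 of its raw on-disk bytes."""
    if data[:2] != b"MZ":
        raise ValueError("missing MZ signature")
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)  # e_lfanew field
    if data[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("missing PE signature")
    # The 20-byte COFF header starts right after the 4-byte signature.
    (section_count,) = struct.unpack_from("<H", data, pe_offset + 6)
    (optional_size,) = struct.unpack_from("<H", data, pe_offset + 20)
    table = pe_offset + 24 + optional_size  # section table follows the optional header
    digests = {}
    for index in range(section_count):
        fields = SECTION_ENTRY.unpack_from(data, table + index * SECTION_ENTRY.size)
        name = fields[0].rstrip(b"\0").decode("ascii", "replace")
        raw_size, raw_offset = fields[3], fields[4]
        digests[name] = hashlib.sha256(data[raw_offset:raw_offset + raw_size]).hexdigest()
    return digests


def differing_sections(data_a, data_b):
    """Names of sections that are missing from one file or whose bytes differ."""
    a, b = section_digests(data_a), section_digests(data_b)
    return sorted(name for name in a.keys() | b.keys() if a.get(name) != b.get(name))
```

An empty result from `differing_sections` means every section matched byte for byte. A non-empty result does not by itself prove the code differs, because sections can also hold build-specific data such as debug directory timestamps, so a mismatch in a data section deserves a closer look before drawing conclusions.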

If you use different compilers (even different versions) then the order of the linking matters and any optimizations that the compiler does will matter and there's no good way of knowing how similar they are.

Perhaps if you explain why you want to test them to check for sameness, we could help you accomplish the same thing a different way.

For example, you could create an MD5 hash of every executable after you build it and keep a list of which ones match. Then you can use that hash to check if an exe is the same as when it was built and use the list to see which ones match.
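
As a sketch of that bookkeeping, the following hashes each file with MD5 and groups the paths by digest. The function names are illustrative assumptions. Note that this compares whole files, so two builds that differ only in a header timestamp will land in different groups.

```python
import hashlib
from collections import defaultdict


def file_md5(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def group_by_digest(paths):
    """Map each digest to the list of paths whose contents produce it."""
    groups = defaultdict(list)
    for path in paths:
        groups[file_md5(path)].append(str(path))
    return dict(groups)
```

Any digest that maps to more than one path identifies executables that are byte-for-byte identical.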

If you create a hash of each code file, then you can check before building each exe to see if it will match an old one.
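
A possible shape for the source-side check: build a manifest of per-file hashes for a source tree, then fold it into a single fingerprint that can be compared between the escrowed copy and any other copy. The function names and the default file patterns are illustrative assumptions; the patterns would need to cover whatever file types the real project contains.

```python
import hashlib
from pathlib import Path


def source_manifest(root, patterns=("*.cpp", "*.h", "*.cs")):
    """Map each matching file (path relative to root) to the MD5 of its bytes."""
    root = Path(root)
    manifest = {}
    for pattern in patterns:
        for path in root.rglob(pattern):
            if path.is_file():
                relative = path.relative_to(root).as_posix()
                manifest[relative] = hashlib.md5(path.read_bytes()).hexdigest()
    return manifest


def tree_fingerprint(manifest):
    """Combine a manifest into one digest, independent of directory walk order."""
    digest = hashlib.md5()
    for name in sorted(manifest):
        digest.update(name.encode("utf-8") + b"\0" + manifest[name].encode("ascii") + b"\n")
    return digest.hexdigest()
```

If two trees yield different fingerprints, comparing the two manifests key by key shows which files were added, removed, or changed.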

This may accomplish the same thing and will certainly be more reliable.

Author Comment

by:thenthorn1010
ID: 37040213
The purpose of this task is to support a large purchase of source code. The source will be pulled from a code repository at Iron Mountain and given to an organization that I am affiliated with, and they need to verify that the source code they get out of Iron Mountain compiles into the same executable that the vendor currently ships. Because of poor customer service (it takes months for simple bugs, such as misspelled words in error messages, to be fixed), my organization has decided to just purchase the source code from a third-party vendor that has previously serviced the code. Given the large amount of money being exchanged with this third-party vendor for the rights to the intellectual property, my organization would like to make sure that the executables are exactly the same. On Linux, you can run md5sum. On Windows, you are unable to do the same because of the PE format. I am looking for a way to check that the code compiled from the Iron Mountain source and the executable provided in the current release of the software on the vendor's FTP site are the same, without any extra features. This vendor has no documentation on the code and poor QA testing. The software is used for multimillion-dollar services that run every day, and the comparison needs to be exact so that the software can change hands without any doubt that the source code from Iron Mountain, once compiled, produces exactly the same executable that is on the FTP site. (Trust is not very high on the list between the vendor and the organization that I work for.)

I hope that explains the reasoning for my question. I have gone through the PE format documentation and read various white papers, and I still have not been able to show that two executables compiled on two different machines are the same, whether the application is built with Visual C++ version 6 or is another application written in C# 2008.
 
LVL 37

Assisted Solution

by:TommySzalapski
TommySzalapski earned 1000 total points
ID: 37040910
If you don't use the exact same compiler version with all the same optimization routines, then it wouldn't even work on Linux. Some of the machine code will surely be different.

When you pull up the executable from the vendor in PE Explorer, it will give you the linker version. You can use that to try to pin down what version of what compiler they used. This will possibly help you generate the same .exe file.
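
The same linker version can also be read directly from the file. This sketch assumes the standard PE layout, where the optional header begins with a 2-byte magic value followed by the major and minor linker version bytes; the function name is an illustrative assumption.

```python
import struct


def linker_version(data):
    """Return (major, minor) linker version from a PE image's optional header."""
    if data[:2] != b"MZ":
        raise ValueError("missing MZ signature")
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)  # e_lfanew field
    if data[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("missing PE signature")
    (optional_size,) = struct.unpack_from("<H", data, pe_offset + 20)
    if optional_size < 4:
        raise ValueError("optional header missing or too short")
    # Optional header: Magic (2 bytes), MajorLinkerVersion, MinorLinkerVersion.
    _magic, major, minor = struct.unpack_from("<HBB", data, pe_offset + 24)
    return major, minor
```

Running it on both the vendor's executable and a local build shows quickly whether the two even came from the same linker release. Mapping the number back to a specific compiler product is a separate lookup that should be confirmed against the toolchain's own documentation.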

The only other thing to try would be to run a large battery of tests to make sure they do the same thing. You could probably script that.
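
A scripted battery could look something like the sketch below, which assumes the programs can be driven from the command line with arguments and standard input. That is an assumption about the software in question, and GUI or service applications would need a different harness. The function names and the shape of a test case are illustrative.

```python
import subprocess


def run_case(command, args, stdin_text=""):
    """Run one command with arguments and stdin; return (exit code, stdout, stderr)."""
    done = subprocess.run(
        [*command, *args],
        input=stdin_text,
        capture_output=True,
        text=True,
        timeout=60,
    )
    return done.returncode, done.stdout, done.stderr


def compare_behaviour(command_a, command_b, cases):
    """Return the cases for which the two programs behave differently.

    Each case is an (args, stdin_text) pair; commands are lists of strings.
    """
    mismatches = []
    for args, stdin_text in cases:
        if run_case(command_a, args, stdin_text) != run_case(command_b, args, stdin_text):
            mismatches.append((args, stdin_text))
    return mismatches
```

An empty result only shows that the two builds agree on the inputs that were tried, so the value of this approach depends entirely on how well the test cases cover the software's behaviour.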
 
LVL 46

Assisted Solution

by:aikimark
aikimark earned 500 total points
ID: 37041094
Here's an idea...DLSuperC.  There is a version of this comparison engine that compares binary files.  Point it to the executables from the source code compile and the binary files you purchased.  It should be able to sync itself.

I think the same thing can be done with the .Net assemblies, although I think my earlier ILDASM step might provide a more illustrative source to compare.