What Portions of Executable Files Need to Match for Executable Files to Be Confirmed as Being Similar

Posted on 2011-10-25
Last Modified: 2012-05-12
I am attempting to compare two executables that are compiled from the same source code, with no changes to the code or the environment settings between builds. After trying PE Explorer, I found too many differences in the file header and data sections to say definitively that the two executables are the same.

Given the PE format, which parts of the executable must match for the two files to be considered the same? For instance, each time you compile the code you end up with a different checksum and a different date/time stamp in the executable. If I wanted the main pieces of code to be the same when compiled on another PC, assuming that all of the Visual Studio environment settings match those on the first PC, which section of the PE file should I look at?

I am thinking that I will have to write an application that skips the header and compares only the data sections of the PE executables. If my assumption about how to show that the two files are similar or different is wrong, please let me know of any other ideas. I do not have a lot of time to complete this task, so I would also like to know whether any third-party tools examine only the data sections of PE executables to determine whether the executables match.

Thank you in advance; any help that you could provide is greatly appreciated.
Question by:thenthorn1010
    LVL 44

    Expert Comment

    Help me understand some things, please.

    1. You have this in .Net and C# zones, but you have a C++ tag.

    2. Why are you comparing executables if you know they were compiled from the same source code?

    3. Do you have the source code?

    4. If these are .Net programs, were they obfuscated?

    5. Is any type of (wrapper) application protection being used?

    If no obfuscation was used in the development, I would suggest running ILDASM against both assemblies and then comparing the output.
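    Comparing two disassembly dumps can be scripted once they are saved as text. The following is only a sketch: the function name `il_diff` is made up for illustration, and the prefixes in `IGNORED_PREFIXES` are assumptions about which comment lines vary from build to build (for example the module version ID), so adjust them after inspecting real dumps.

```python
import difflib

# Assumption: comment lines starting with these prefixes change on every build
# and say nothing about the code itself. Verify against your own dumps.
IGNORED_PREFIXES = ("// MVID:", "// Image base:")


def il_diff(il_a: str, il_b: str) -> list:
    """Return unified-diff lines for two IL text dumps, skipping ignored lines."""

    def keep(text: str) -> list:
        return [
            line
            for line in text.splitlines()
            if not line.strip().startswith(IGNORED_PREFIXES)
        ]

    return list(
        difflib.unified_diff(keep(il_a), keep(il_b), "a.il", "b.il", lineterm="")
    )
```

    An empty result means the two dumps agree once the ignored lines are set aside; anything else shows exactly which instructions or metadata differ.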
    LVL 22

    Accepted Solution

    Essentially, in order to confirm that they are "similar", you will have to decide what portions are important.  

    If you have source code that is supposed to be the code from which the "original" file was compiled, then you could do some experimentation to determine how much the executables change under certain circumstances.  For instance, if you compile the source code today, make some inconsequential change (e.g. add a space somewhere), and compile it again, you can compare the two resulting executables to determine the extent of the change.  Because you know exactly what you changed, your analysis will give you an idea of what a given difference implies.
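    One way to run that experiment is to list the byte ranges where two builds differ and see how they move when you make a known change. This is a minimal sketch with a hypothetical helper name (`diff_ranges`); it works on raw bytes and knows nothing about the PE layout.

```python
def diff_ranges(a: bytes, b: bytes) -> list:
    """Return (start, end) offsets, end exclusive, where a and b differ.

    If the inputs have different lengths, the extra tail of the longer one
    is reported as one final range.
    """
    ranges = []
    start = None
    common = min(len(a), len(b))
    for i in range(common):
        if a[i] != b[i]:
            if start is None:
                start = i  # a new differing run begins here
        elif start is not None:
            ranges.append((start, i))  # the run ended at the previous byte
            start = None
    if start is not None:
        ranges.append((start, common))
    if len(a) != len(b):
        ranges.append((common, max(len(a), len(b))))
    return ranges
```

    Reading both executables with `open(path, "rb").read()` and passing them in gives a list of offsets that can then be matched against the header and section layout shown by a tool such as PE Explorer.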

    LVL 37

    Assisted Solution

    If you skip all the header bytes and the images match then they are definitely the same. If you are using the exact same compiler, this should work.
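    Skipping the headers can be sketched with the standard library alone. The code below assumes the usual PE layout (the `e_lfanew` pointer at offset 0x3C, a 20-byte COFF header, then the optional header and 40-byte section table entries); the function name `pe_section_hashes` and the choice of SHA-256 are mine, not part of any tool mentioned here. Note that sections can themselves embed build-specific data, such as timestamps in debug information, so differing hashes do not by themselves prove that the code differs.

```python
import hashlib
import struct


def pe_section_hashes(data: bytes) -> dict:
    """Map each section name to the SHA-256 of its raw bytes, ignoring headers."""
    if data[:2] != b"MZ":
        raise ValueError("not an MZ executable")
    (pe_off,) = struct.unpack_from("<I", data, 0x3C)  # e_lfanew
    if data[pe_off:pe_off + 4] != b"PE\0\0":
        raise ValueError("PE signature not found")
    (num_sections,) = struct.unpack_from("<H", data, pe_off + 6)
    (opt_size,) = struct.unpack_from("<H", data, pe_off + 20)
    table = pe_off + 24 + opt_size  # section table follows the optional header
    hashes = {}
    for i in range(num_sections):
        entry = table + 40 * i
        name = data[entry:entry + 8].rstrip(b"\0").decode("ascii", "replace")
        # SizeOfRawData and PointerToRawData sit at offsets 16 and 20 of the entry
        raw_size, raw_ptr = struct.unpack_from("<II", data, entry + 16)
        section = data[raw_ptr:raw_ptr + raw_size]
        hashes[name] = hashlib.sha256(section).hexdigest()
    return hashes
```

    Two builds can then be compared with `pe_section_hashes(a) == pe_section_hashes(b)`, and a mismatch names the section to inspect further.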

    If you use different compilers (even different versions) then the order of the linking matters and any optimizations that the compiler does will matter and there's no good way of knowing how similar they are.

    Perhaps if you explain why you want to test them to check for sameness, we could help you accomplish the same thing a different way.

    For example, you could create an MD5 hash of every executable after you build it and keep a list of which ones match. Then you can use that hash to check if an exe is the same as when it was built and use the list to see which ones match.

    If you create a hash of each code file, then you can check before building each exe to see if it will match an old one.

    This may accomplish the same thing and will certainly be more reliable.
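    The hash list described above could be kept with something like the following sketch. The helper names (`file_md5`, `group_by_hash`) are hypothetical; MD5 is used because it is what this comment suggests, and `hashlib.sha256` would drop in the same way.

```python
import hashlib


def file_md5(path: str, chunk_size: int = 65536) -> str:
    """Return the hex MD5 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def group_by_hash(paths) -> dict:
    """Group file paths by digest so that identical builds end up together."""
    groups = {}
    for path in paths:
        groups.setdefault(file_md5(path), []).append(path)
    return groups
```

    Any group with more than one path holds byte-for-byte identical files; the same functions work on source files for the pre-build check.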

    Author Comment

    The purpose of this task is to support a large purchase of source code. The source will be pulled from a code repository at Iron Mountain and given to an organization that I am affiliated with, and that organization needs to verify that the source code it gets out of Iron Mountain compiles into the same executable that the vendor currently ships. Because of poor customer service (it takes months for simple bugs, such as misspelled words in error messages, to be resolved), my organization has decided to just purchase the source code from a third-party vendor that has previously serviced the code. Given the large amount of money being exchanged with this third-party vendor for the rights to the intellectual property, my organization would like to make sure that the executables are exactly the same. On Linux, you can do an MD5SUM. On Windows, you are unable to complete such a feat because of the PE format. I am looking for a way to check that the code compiled from the Iron Mountain source and the executable provided in the current release of the software on the vendor's FTP site are the same, without any extra features. This vendor has no documentation on the code and poor QA testing. The software is for multimillion-dollar services that occur every day, and the match needs to be exact so that the software can change hands without any doubt that the source code from Iron Mountain, once compiled, produces exactly the executable that is on the FTP site. (Trust is not something that is very high on the list between the vendor and the organization that I work for.)

    I hope that explains the reasoning for my question. I have torn through the PE format documentation and read various white papers, and I still have not been able to show that two executables compiled on two different machines are the same, whether they are built in Visual C++ version 6 or, for another application, written in C# 2008.
    LVL 37

    Assisted Solution

    If you don't use the exact same compiler version with all the same optimization routines, then it wouldn't even work on Linux. Some of the machine code will surely be different.

    When you pull up the executable from the vendor in PE Explorer, it will give you the linker version. You can use that to try to pin down what version of what compiler they used. This will possibly help you generate the same .exe file.
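    The linker version can also be read directly from the file. This sketch assumes the standard PE layout, where the optional header starts 24 bytes after the PE signature and holds a 2-byte magic followed by the one-byte major and minor linker versions; the function name `linker_version` is made up for illustration, and mapping the numbers back to a specific compiler release is left to you.

```python
import struct


def linker_version(data: bytes) -> tuple:
    """Return (major, minor) linker version from a PE image's optional header."""
    if data[:2] != b"MZ":
        raise ValueError("not an MZ executable")
    (pe_off,) = struct.unpack_from("<I", data, 0x3C)  # e_lfanew
    if data[pe_off:pe_off + 4] != b"PE\0\0":
        raise ValueError("PE signature not found")
    (opt_size,) = struct.unpack_from("<H", data, pe_off + 20)
    if opt_size < 4:
        raise ValueError("optional header too small to hold a linker version")
    # Optional header: Magic (2 bytes), MajorLinkerVersion, MinorLinkerVersion
    _magic, major, minor = struct.unpack_from("<HBB", data, pe_off + 24)
    return major, minor
```

    Running it on both the vendor's executable and your own build shows quickly whether the two were linked with the same toolchain version.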

    The only other thing to try would be to run a large battery of tests to make sure they do the same thing. You could probably script that.
    LVL 44

    Assisted Solution

    Here's an idea...DLSuperC.  There is a version of this comparison engine that compares binary files.  Point it to the executables from the source code compile and the binary files you purchased.  It should be able to sync itself.

    I think the same thing can be done with the .Net assemblies, although I think my earlier ILDASM step might provide a more illustrative source to compare.
