Inspecting object code

Posted on 2001-06-24
Last Modified: 2010-04-02
I am doing some investigation into program similarity, and I wonder if someone can help.

The two snippets of code below both do essentially the same thing, but one uses a 'for loop', and the other a 'while loop'.

//First Prog
int i;
for (i=0;i<10;i++)
   printf ("%d\n",i);

//Second Prog
int i=0;
while (i<10)
{
   printf ("%d\n",i);
   i++;
}

The thing that interests me is this: after each program has been compiled to an .obj (presumably before linkage takes place), will the .obj's be different? And what are the differences likely to be?

Would it be possible to write a program to parse the .obj's and detect the differences (or in my case the similarities)?

Thanks for any help
Question by:AntBon

Expert Comment

ID: 6222580
1) Run the program in the debugger and look at the disassembly
2) Take a close look at the command line tools that come with your compiler. There is likely a command that displays the generated object code
3) Make the compiler generate assembly code, usually with a command line option like -s (Borland) or -S (GCC)
LVL 22

Accepted Solution

nietod earned 50 total points
ID: 6222582
>>  will the .obj's be different ?
These are so similar that there is a good chance that the assembly code generated by the compiler (the actual instructions that perform the tasks you wrote in your C++ code) will be the same.  If the assembly code is the same, then the object code will be almost identical.  (It might not be 100% identical because the object code might contain portions that include compile-time info, file names, etc.)  However, there is no guarantee that the assembly will be the same; it is just reasonably likely.  In more complex cases there is a greater chance that the two assembly listings would differ.  Also, turning on optimizations will tend to make the two more similar, and turning them off will tend to make them more different.

>> what are the differences likely to be.
No one can say.  In fact I don't think a difference is that likely for this case.

>> Would it be possible to write a program to parse the .obj's, and detect
>> the differences (or in my case the similarities) ?
It would be possible, yes, but a tremendous amount of work.

A better solution is to write, compile and link the two programs and then run them under a debugger that supports disassembly.  Then look at the assembly code that the compiler produces.

LVL 32

Expert Comment

ID: 6222585
It's very compiler dependent but in general, yes, the code will be different.

Just because you've carefully chosen the initial values and loop counters to behave identically here, that doesn't make these two constructs identical for all cases.  The compiler usually doesn't "see" code the way you and I do.  It's really "stupid" and has a really hard time comprehending intent.  We can clearly see that the two code blocks above are going to do the same thing, but that's because we have a higher level understanding of what the programmer is doing.
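One concrete case where the two constructs part ways is `continue`. The sketch below is only an illustration (the function names and the step limit are made up for this example, not taken from the question): in a for loop, `continue` still runs the increment, while in a hand-translated while loop it jumps straight back to the test and skips the increment at the bottom of the body.

```c
/* Sum the odd numbers below n. In a for loop, `continue` jumps to the
 * increment expression, so i++ still runs on every iteration. */
int sum_odd_for(int n)
{
    int sum = 0;
    for (int i = 0; i < n; i++) {
        if (i % 2 == 0)
            continue;
        sum += i;
    }
    return sum;
}

/* Naive while translation of the loop above. Here `continue` jumps back
 * to the test and skips the trailing i++, so the loop gets stuck at i == 0.
 * max_steps is a hypothetical guard added only so the demonstration
 * terminates; the function returns -1 when the guard trips. */
int sum_odd_while_naive(int n, int max_steps)
{
    int sum = 0;
    int i = 0;
    int steps = 0;
    while (i < n) {
        if (++steps > max_steps)
            return -1;
        if (i % 2 == 0)
            continue;
        sum += i;
        i++;
    }
    return sum;
}
```

Calling sum_odd_for(10) gives 25, while sum_odd_while_naive(10, 1000) hits the guard and returns -1. A compiler has to handle both shapes correctly, which is one reason it does not simply treat a for loop and a while loop as the same thing.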

LVL 22

Expert Comment

ID: 6222598
For example, in MS VC in a debug compile I got the following results

For loop.
00401048   mov         dword ptr [ebp-4],0
0040104F   jmp         main+2Ah (0040105a)
00401051   mov         eax,dword ptr [ebp-4]
00401054   add         eax,1
00401057   mov         dword ptr [ebp-4],eax
0040105A   cmp         dword ptr [ebp-4],0Ah
0040105E   jge         main+43h (00401073)
00401060   mov         ecx,dword ptr [ebp-4]
00401063   push        ecx
00401064   push        offset string "%d\n" (0042e01c)
00401069   call        printf (00408170)
0040106E   add         esp,8
00401071   jmp         main+21h (00401051)

While loop
00401048   mov         dword ptr [ebp-4],0
0040104F   cmp         dword ptr [ebp-4],0Ah
00401053   jge         main+41h (00401071)
00401055   mov         eax,dword ptr [ebp-4]
00401058   push        eax
00401059   push        offset string "%d\n" (0042e01c)
0040105E   call        printf (00408170)
00401063   add         esp,8
00401066   mov         ecx,dword ptr [ebp-4]
00401069   add         ecx,1
0040106C   mov         dword ptr [ebp-4],ecx
0040106F   jmp         main+1Fh (0040104f)

In this case the code is slightly different and the while loop code is slightly superior--very slightly.  But as I said, this sort of difference will depend on many, many factors, like the exact compiler used, whether or not you are optimizing, and the effects of other code in the vicinity of the code in question.
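To put a rough number on "slightly different", here is a minimal sketch that compares the instruction mnemonics of the two debug listings above as unordered collections (shared instructions divided by total distinct instructions, counting repeats). The function name and the metric are my own choice for illustration. It ignores operands, addresses and instruction order, so it is a crude measure and nowhere near a real .obj parser.

```c
#include <stdlib.h>
#include <string.h>

#define COUNT_OF(x) (sizeof(x) / sizeof((x)[0]))

/* Mnemonics copied from the debug listings above, in listing order. */
static const char *const for_loop_ops[] = {
    "mov", "jmp", "mov", "add", "mov", "cmp", "jge",
    "mov", "push", "push", "call", "add", "jmp"
};
static const char *const while_loop_ops[] = {
    "mov", "cmp", "jge", "mov", "push", "push",
    "call", "add", "mov", "add", "mov", "jmp"
};

/* Multiset Jaccard similarity: each entry of b can be matched at most once.
 * Returns a value in [0, 1], or -1.0 if memory allocation fails. */
double mnemonic_similarity(const char *const a[], size_t na,
                           const char *const b[], size_t nb)
{
    if (na == 0 && nb == 0)
        return 1.0;

    unsigned char *used = calloc(nb ? nb : 1, 1);
    if (used == NULL)
        return -1.0;

    size_t matches = 0;
    for (size_t i = 0; i < na; i++) {
        for (size_t j = 0; j < nb; j++) {
            if (!used[j] && strcmp(a[i], b[j]) == 0) {
                used[j] = 1;   /* consume this entry of b */
                matches++;
                break;
            }
        }
    }
    free(used);

    /* |A ∪ B| = |A| + |B| - |A ∩ B| for multisets */
    return (double)matches / (double)(na + nb - matches);
}
```

For the two listings this gives 12 shared instructions out of 13, about 0.92; the only instruction left unmatched is the extra jmp in the for loop version.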

LVL 22

Expert Comment

ID: 6222608
When I try this in a release (not debug) version with optimizations, I find that the two algorithms produce exactly the same code and that this code is significantly improved over the code above.   But once again, this is not guaranteed.

00401002   xor         esi,esi
00401004   push        esi
00401005   push        40C0A0h
0040100A   call        00403A91
0040100F   add         esp,8
00401012   inc         esi
00401013   cmp         esi,0Ah
00401016   jl          00401004
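Read back into C, the optimized listing has the shape of a bottom-tested loop: the counter lives in a register (esi), and the test is done once per pass at the end, since the compiler can see that 0 < 10 holds on entry. The sketch below is my own hand translation, not compiler output, and it records the values in an array instead of calling printf so the two versions can be compared directly.

```c
#include <stddef.h>

/* The for loop from the question; stores each value in out[] instead of
 * printing it. out must have room for at least 10 ints. */
size_t run_for(int *out)
{
    size_t n = 0;
    int i;
    for (i = 0; i < 10; i++)
        out[n++] = i;
    return n;
}

/* Hand-written equivalent of the optimized listing: the test sits at the
 * bottom of the loop, so there is one compare and one jump per pass. */
size_t run_bottom_tested(int *out)
{
    size_t n = 0;
    int i = 0;              /* xor esi,esi */
    do {
        out[n++] = i;       /* push esi / push format string / call printf */
        i++;                /* inc esi */
    } while (i < 10);       /* cmp esi,0Ah / jl */
    return n;
}
```

Both functions store the values 0 through 9 and return 10. The do/while form is only safe here because the loop is known to run at least once.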
LVL 30

Expert Comment

ID: 6222671
Hi AntBon:
Feel free to click the [Reject Answer] button near (Answer-poster's)response, even if it seems like a good answer.
Doing so will increase your chance of obtaining additional input from other experts.  Later, you can click the [Select Comment as Answer] button on any response.

Expert Comment

ID: 6222700
I will alert PandorMod to look at this, the primary Moderator for this Topic Area.


Expert Comment

ID: 6225386
There is little to look at: nietod's post came two minutes after mine, so he could never have seen my comment and written his in that time.

Meanwhile AntBon can look at our posts; together we can provide help on the large majority of C++ compilers (besides, they all come with similar tools and options).
LVL 22

Expert Comment

ID: 6225465
Actually, I don't think this question is directed to a specific compiler or even this specific code, but more generally to what can happen with different algorithms that produce the same results.  At least I assume so since this case is too specific to be of much value.

Author Comment

ID: 6447510
Thanks v much and sorry for the delay
