Solved

Inspecting object code

Posted on 2001-06-24
10
200 Views
Last Modified: 2010-04-02
Hi,
I am doing some investigation into program similarity, i wonder if someone can help.

The two snippets of code below both do essentially the same thing, but one uses a 'for loop', and the other a 'while loop'.


//First Prog
int i;
for (i=0;i<10;i++)
   {
   printf ("%d\n",i);
   }


//Second Prog
i=0;
while (i<10)
   {
   printf ("%d\n",i);
   i++;
   }



The thing that interests me is, that after each program has been compiled to an .obj (presumably before linkage takes place), will the .obj's be different ? and what are the difference's likely to be.

Would it be possible to write a program parse the .obj's, and detect the difference's (or in my case the similarities) ?

Thanks for any help
0
Comment
Question by:AntBon
10 Comments
 
LVL 7

Expert Comment

by:KangaRoo
ID: 6222580
1) Run the program in the debugger and look at the disassembly
2) Take a close look at the command line tools that come with your compiler. There is likely a command that displays the generated object code
3) Make the compile generate assembly code, usually a command line option like -s (Borland) or -S (GCC)
0
 
LVL 22

Accepted Solution

by:
nietod earned 50 total points
ID: 6222582
>>  will the .obj's be different ?
These are so somilar that there is a good chance that assembly code generated by the compiler (the actual instructions that perform the tasks you wrote in your C++ code) will be the same.  If the assembly code is the same, the the object code will be almost identical.  (It might not be 100% identical because the boject code mght contain portions that include complile time info, file names, etc etc)   however, there is no guarantee that the assembly will be the same, but it is reasonably likely.  In more complex cases there is a greater chance that the two assembly codes produced would be different.  Also turning on optimizatiosn will tend to encourage the two to be more similar and turning them off will tend to encourage them to be more different  

>> what are the difference's likely to be.
No one can say.    In fact I don't think a difference is that likely for this case.

>> Would it be possible to write a program parse the .obj's, and detect
>> the difference's (or in my case the similarities) ?
It woudl be possible, yes.  but a tremendous amount of work.

A better solution is to write,compile and link the two programs and then run them under a debugger that supports disssasembly.  then look at the assembly code that the compiler produces.

0
 
LVL 32

Expert Comment

by:jhance
ID: 6222585
It's very complier dependent but in general, yes, the code will be different.

Just because you're carefully chosen the initial values and loop counters to behave identically here, doesn't make these two constructs identical for all cases.  So the compiler usually doesn't "see" code the way you and I do.  It's really "stupid" and has a really hard time comprehending intent.  We can clearly see that the two code blocks above are going to do the same thing but that's because we have a higher level understanding of what the programmer is doing.

0
 
LVL 22

Expert Comment

by:nietod
ID: 6222598
For example, in MS VC in a debug compile I got the following results


For loop.
00401048   mov         dword ptr [ebp-4],0
0040104F   jmp         main+2Ah (0040105a)
00401051   mov         eax,dword ptr [ebp-4]
00401054   add         eax,1
00401057   mov         dword ptr [ebp-4],eax
0040105A   cmp         dword ptr [ebp-4],0Ah
0040105E   jge         main+43h (00401073)
00401060   mov         ecx,dword ptr [ebp-4]
00401063   push        ecx
00401064   push        offset string "%d\n" (0042e01c)
00401069   call        printf (00408170)
0040106E   add         esp,8
00401071   jmp         main+21h (00401051)


While loop
00401048   mov         dword ptr [ebp-4],0
0040104F   cmp         dword ptr [ebp-4],0Ah
00401053   jge         main+41h (00401071)
00401055   mov         eax,dword ptr [ebp-4]
00401058   push        eax
00401059   push        offset string "%d\n" (0042e01c)
0040105E   call        printf (00408170)
00401063   add         esp,8
00401066   mov         ecx,dword ptr [ebp-4]
00401069   add         ecx,1
0040106C   mov         dword ptr [ebp-4],ecx
0040106F   jmp         main+1Fh (0040104f)


In this case the code is slightly different and the while loop code is slightly superior--very slightly.  But as i said, this sort of difference will depend on many many factors, like the exact compiler used, the whethor or not you are optiizing, and the offects of other code in the vacinity of the code in question.




0
 
LVL 22

Expert Comment

by:nietod
ID: 6222608
When I try this in a release (not debug) version with optimizations I get the two algorithms produce exactly the same code and that this code is significantly improved over the code above.   But once again, this is not guaranteeed.

00401002   xor         esi,esi
00401004   push        esi
00401005   push        40C0A0h
0040100A   call        00403A91
0040100F   add         esp,8
00401012   inc         esi
00401013   cmp         esi,0Ah
00401016   jl          00401004
0
Maximize Your Threat Intelligence Reporting

Reporting is one of the most important and least talked about aspects of a world-class threat intelligence program. Here’s how to do it right.

 
LVL 30

Expert Comment

by:Axter
ID: 6222671
Hi AntBon:
Feel free to click the [Reject Answer] button near (Answer-poster's)response, even if it seems like a good answer.
Doing so will increase your chance of obtaining additional input from other experts.  Later, you can click the [Select Comment as Answer] button on any response.
0
 

Expert Comment

by:ComTech
ID: 6222700
I will alert PandorMod to look at this, the primary Moderator for this Topic Area.

ComTech
0
 
LVL 7

Expert Comment

by:KangaRoo
ID: 6225386
There is little to look at, nietods post came two minutes after mine, he could never have seen my comment and write his in that time.

Meanwhile Antbon can look at our posts, together we can provide help on the large majority of C++ compilers (besides, they all come with similar tools and options)
0
 
LVL 22

Expert Comment

by:nietod
ID: 6225465
Actually, I don't think this question is directed to a specific compiler or even this specific code, but more generally to what can happen with different algorithms that produce the same results.  At least I assume so since this case is too specific to be of much value.
0
 

Author Comment

by:AntBon
ID: 6447510
Thanks v much and sorry for the delay
0

Featured Post

Better Security Awareness With Threat Intelligence

See how one of the leading financial services organizations uses Recorded Future as part of a holistic threat intelligence program to promote security awareness and proactively and efficiently identify threats.

Join & Write a Comment

Go is an acronym of golang, is a programming language developed Google in 2007. Go is a new language that is mostly in the C family, with significant input from Pascal/Modula/Oberon family. Hence Go arisen as low-level language with fast compilation…
Basic understanding on "OO- Object Orientation" is needed for designing a logical solution to solve a problem. Basic OOAD is a prerequisite for a coder to ensure that they follow the basic design of OO. This would help developers to understand the b…
The goal of the video will be to teach the user the concept of local variables and scope. An example of a locally defined variable will be given as well as an explanation of what scope is in C++. The local variable and concept of scope will be relat…
The viewer will be introduced to the technique of using vectors in C++. The video will cover how to define a vector, store values in the vector and retrieve data from the values stored in the vector.

760 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

23 Experts available now in Live!

Get 1:1 Help Now