Solved

X86 Assembly Code

Posted on 2007-11-27
Last Modified: 2008-02-01
Hey guys... I'm learning to read x86 assembly code as C code, and I need some help getting started.

Here's the original function in C:
--------
int triangle (int width, int height){
    int array[5] = {0,1,2,3,4};
    int area;
    area = width * height/2;
    return (area);
}

Now here's that same code, compiled into ASM (It's AT&T syntax unfortunately, I'm doing this from a textbook):
--------
0x8048430 <triangle>:      push    %ebp
0x8048431 <triangle+1>:    mov     %esp, %ebp
0x8048433 <triangle+3>:    push    %edi
0x8048434 <triangle+4>:    push    %esi
0x8048435 <triangle+5>:    sub     $0x30,%esp
0x8048438 <triangle+8>:    lea     0xffffffd8(%ebp), %edi
0x804843b <triangle+11>:    mov    $0x8049508,%esi
0x8048440 <triangle+16>:    cld
0x8048441 <triangle+17>:    mov    $0x30,%esp
0x8048446 <triangle+22>:    repz movsl    %ds:( %esi), %es:( %edi)
0x8048448 <triangle+24>:    mov    0x8(%ebp),%eax
0x804844b <triangle+27>:    mov    %eax,%edx
0x804844d <triangle+29>:    imul   0xc(%ebp),%edx
0x8048451 <triangle+33>:    mov    %edx,%eax
0x8048453 <triangle+35>:    sar    $0x1f,%eax
0x8048456 <triangle+38>:    shr    $0x1f,%eax
0x8048459 <triangle+41>:    lea    (%eax, %edx, 1), %eax
0x804845c <triangle+44>:    sar    %eax
0x804845e <triangle+46>:    mov    %eax,0xffffffd4(%ebp)
0x8048461 <triangle+49>:    mov    0xffffffd4(%ebp),%eax
0x8048464 <triangle+52>:    mov    %eax,%eax
0x8048466 <triangle+54>:    add    $0x30,%esp
0x8048469 <triangle+57>:    pop    %esi
0x804846a <triangle+58>:    pop    %edi
0x804846b <triangle+59>:    pop    %ebp
0x804846c <triangle+60>:    ret


Okay now my goal here is to convince myself that if I hadn't been given the original C code, I could figure out the function of the ASM code.

Here's what I've managed to deduce myself:
--------
push %ebp                        // save old base pointer
mov %esp,%ebp                        // create new stack frame -- new_bp is old_sp
push %edi                        // save edi
push %esi                        // save esi
sub $0x30,%esp                        // reserve room for local vars -- new_sp is old_sp minus 0x30

lea 0xffffffd8(%ebp),%edi            // address of destination string
mov $0x8049508,%esi                  // address of source string
cld                              // clear direction flag (cld to count up, std to count down)

mov $0x30,%esp                        // wtf?

repz movsl %ds:( %esi),%es:( %edi)      // move source string to destination string

mov 0x8(%ebp),%eax                  // load eax with a value on the stack
mov %eax,%edx                        // put this value into edx
imul 0xc(%ebp),%edx                  // multiply edx by another value on the stack
mov %edx,%eax                        // put the result in eax

sar $0x1f,%eax                        // signed divide -- shift eax 1f bits to the right
shr $0x1f,%eax                        // unsigned divide -- shift eax 1f bits to the right

lea (%eax,%edx,1),%eax
sar %eax
mov %eax,0xffffffd4(%ebp)
mov 0xffffffd4(%ebp),%eax
mov %eax,%eax

add $0x30,%esp                        // destroy stack frame -- restore old_sp, destroy local vars
pop %esi                        // restore saved esi
pop %edi                        // restore saved edi
pop %ebp                        // destroy stack frame -- restore old_bp
ret                              // retn value in eax


1). Okay so I'm fine with the initial and final parts of code which set up and destroy the stack frame...

2). I'm new to the x86 string functions, but after a quick look on google I get the basic idea. What I don't understand is *why* string functions are used here. Obviously it has something to do with the array... but  what is in the source, and why copy it to the destination?

3). What is the purpose of "mov $0x30,%esp" here. It seems like it is screwing around with the stack pointer for no apparent reason. (I'm guessing that in AT&T syntax the dollar symbol just means it's a constant... is this correct?).

4). Obviously the next block of code is the width * height part... no problems there.

5). I'm guessing that these sar and shr instructions have something to do with the /2 part... not sure exactly how this works though.

6). I'm not sure what this chunk of code does.



I know those are a lot of questions, and this is a very long post, but hopefully it should be pretty easy for someone good with x86 to answer. Also I promise you guys that this is not a "homework question". I'm doing a Software Engineering degree (currently on holidays) and I wanted to go into more detail on the low level stuff by myself. I know that sounds really nerdy... I guess it probably is. Anyway, appreciate any pointers.

Cheers!

BTW -- Sorry about the indentation (EE doesn't like tabs for some reason).
Question by:da_mango_bros
Accepted Solution

by:
Infinity08 earned 400 total points
>> 1). Okay so I'm fine with the initial and final parts of code which set up and destroy the stack frame...

Looks good indeed.


>> 2). I'm new to the x86 string functions, but after a quick look on google I get the basic idea. What I don't understand is *why* string functions are used here. Obviously it has something to do with the array... but  what is in the source, and why copy it to the destination?

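The repz movsl instruction copies 32-bit values from the source (the address in esi) to the destination (the address in edi). Here the source, 0x8049508, is where the compiler stored the initializer values in the program's static data, and the destination, ebp-0x28, is the local array on the stack. So it initializes an array of ints with the values found at the source address. The corresponding line in the C code is :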

        int array[5] = {0,1,2,3,4};
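
As a rough C-level picture of what the compiler emitted for that line (a sketch only, not the compiler's actual output): the initializer lives in static data, and the function copies it into the stack slot on every call. The names array_init, fill_array and fill_array_example are made up for illustration.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical name: stands in for the compiler's stored copy of the
   initializer, which sits at 0x8049508 in the listing. */
static const int array_init[5] = {0, 1, 2, 3, 4};

/* Hypothetical helper: copies the initializer into caller-provided storage,
   which is the job done by the lea / mov / cld / rep movsl sequence. */
void fill_array(int array[5])
{
    memcpy(array, array_init, sizeof array_init);
}

/* Small usage example. */
void fill_array_example(void)
{
    int array[5];
    fill_array(array);
    assert(array[0] == 0 && array[4] == 4);
}
```

This is also why the copy happens at all: a local array gets a fresh set of values on each call, so the values have to come from somewhere outside the stack frame.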


>> 3). What is the purpose of "mov $0x30,%esp" here. It seems like it is screwing around with the stack pointer for no apparent reason.

That looks weird indeed ... Are you sure that's correct ? Let me think about this one ...


>> (I'm guessing that in AT&T syntax the dollar symbol just means it's a constant... is this correct?).

Correct.


>> 4). Obviously the next block of code is the width * height part... no problems there.

ok.


>> 5). I'm guessing that these sar and shr instructions have something to do with the /2 part... not sure exactly how this works though.
>> 6). I'm not sure what this chunk of code does.

It is needlessly complicated, but it takes care of the signed division by 2.

First of all, after this line :

        mov %edx,%eax                        // put the result in eax

eax and edx contain the product of the two function parameters.

Then :

        sar $0x1f,%eax                        // signed divide -- shift eax 1f bits to the right

This is indeed a signed (arithmetic) shift over 31 bits to the right. In other words, if eax was a negative value, then it will now contain -1 (0xFFFFFFFF) - if eax was zero or a positive value, then it will now contain 0 (0x00000000).

        shr $0x1f,%eax                        // unsigned divide -- shift eax 1f bits to the right

This is indeed an unsigned shift over 31 bits to the right. In other words, if eax was 0xFFFFFFFF, then it will now contain 0x00000001 - if eax was 0x00000000, then it will still contain 0x00000000.

These two instructions basically extract the sign bit of the result.
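
Here is the same pair of shifts written out in C, as a sketch. It assumes that >> on a negative signed value is an arithmetic shift, which the C standard leaves implementation-defined but which is what gcc does on x86. The function names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Models "sar $0x1f,%eax" followed by "shr $0x1f,%eax".
   Assumption: >> on a negative int32_t shifts in copies of the sign bit. */
uint32_t sign_bit(int32_t product)
{
    int32_t mask = product >> 31;   /* sar: all ones if negative, else 0 */
    return (uint32_t)mask >> 31;    /* shr: 1 if negative, else 0 */
}

/* Small usage example. */
void sign_bit_examples(void)
{
    assert(sign_bit(-7) == 1u);
    assert(sign_bit(0) == 0u);
    assert(sign_bit(7) == 0u);
}
```

Note that the shr alone would give the same result here; the sar in front of it is part of what makes the sequence look needlessly complicated.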

After that :

        lea (%eax,%edx,1),%eax

sets eax = eax + (edx * 1). Remember that eax was 1 in case edx was negative, and was 0 in case edx was zero or positive. So, basically, non-negative values stay the same, but negative values are incremented by 1.

        sar %eax

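next, eax is divided by 2 using an arithmetic (signed) shift to the right by one bit.

You might wonder why 1 needed to be added to negative values ? The reason is for correct rounding when dividing. Integer division in C rounds toward zero, but an arithmetic shift to the right rounds toward negative infinity. Without the adjustment, -7 / 2 would come out as -4 instead of -3.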
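
To see the rounding difference concretely, here is a small C sketch comparing a bare shift with the add-then-shift sequence the compiler used. It assumes >> on negative values is an arithmetic shift (true for gcc on x86, implementation-defined in general). The function names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Bare arithmetic shift: rounds toward negative infinity.
   Assumes >> on a negative int32_t is arithmetic. */
int32_t shift_only(int32_t x)
{
    return x >> 1;
}

/* Models shr $0x1f / lea / sar: add the sign bit, then shift.
   Rounds toward zero, like x / 2 in C. */
int32_t biased_shift(int32_t x)
{
    int32_t bias = (int32_t)((uint32_t)x >> 31);  /* 1 if x < 0, else 0 */
    return (x + bias) >> 1;
}

/* Small usage example. */
void rounding_examples(void)
{
    assert(-7 / 2 == -3);            /* C truncates toward zero */
    assert(shift_only(-7) == -4);    /* the bare shift is off by one */
    assert(biased_shift(-7) == -3);  /* the added 1 fixes it */
    assert(biased_shift(-8) == -4);  /* even values are unaffected */
    assert(biased_shift(7) == 3);    /* non-negative values get no bias */
}
```

Only odd negative values actually change; for everything else the added bit is either 0 or gets shifted back out.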


Finally, these :

        mov %eax,0xffffffd4(%ebp)
        mov 0xffffffd4(%ebp),%eax
        mov %eax,%eax

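are pretty unnecessary ;) The first stores the result in the local variable area (at ebp-0x2c), the second loads it straight back into eax as the return value, and the third copies eax onto itself. An optimizing compiler would drop all three.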

Assisted Solution

by:grg99
grg99 earned 100 total points
the mov #30,esp  is disassembled incorrectly, it's supposed to be mov #30,ecx.  The following rep movs instruction needs its repeat count in ecx (for movsl that is a count of 32-bit words, not bytes).
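
A sketch in C of what cld followed by rep movsl does, with the parameters named after the registers they model. The function names are made up for illustration. For int array[5] one would expect a count of 5 (5 words, 20 bytes), so the 0x30 in the listing is probably not reliable either; that part is a guess, since the original binary isn't available.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Models "cld; rep movsl": copy ecx 32-bit words from esi to edi,
   with both pointers moving upward. */
void rep_movsl(uint32_t *edi, const uint32_t *esi, size_t ecx)
{
    while (ecx != 0) {
        *edi++ = *esi++;   /* one movsl: 4 bytes copied, pointers advance by 4 */
        ecx--;             /* rep decrements ecx after each step */
    }
}

/* Small usage example. */
void rep_movsl_example(void)
{
    const uint32_t src[5] = {0, 1, 2, 3, 4};
    uint32_t dst[5] = {9, 9, 9, 9, 9};
    rep_movsl(dst, src, 5);   /* assumed count: 5 words for int array[5] */
    assert(dst[0] == 0 && dst[4] == 4);
}
```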

BTW the move bytes instruction is usually slower than a well-interleaved loop of plain old MOV instructions.  Do you have an old compiler or one with optimization turned off?

The redundant moves on exit may be left over code to restore any exception stack frames.  Compiling with optimization would probably eliminate those.




Expert Comment

by:grg99
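BTW width * height/2 already means (width * height) / 2, since * and / have the same precedence and group left to right. Writing the parentheses just makes that explicit.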
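
For reference, a small C check of how this expression groups. The function names are made up for illustration. In C, * and / have equal precedence and associate left to right, so the unparenthesized form multiplies first, which matches the imul-before-shift order in the disassembly.

```c
#include <assert.h>

int area_as_written(int width, int height)    { return width * height / 2; }
int area_parenthesized(int width, int height) { return (width * height) / 2; }
int area_halved_first(int width, int height)  { return width * (height / 2); }

/* Small usage example. */
void precedence_examples(void)
{
    assert(area_as_written(3, 5) == 7);      /* 15 / 2 */
    assert(area_parenthesized(3, 5) == 7);   /* same grouping */
    assert(area_halved_first(3, 5) == 6);    /* 3 * 2, a different result */
}
```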


Expert Comment

by:Infinity08
>> the mov #30,esp  is disassembled incorrectly, it's supposed to be mov #30,ecx.  The following rep movs instruction needs its repeat count in ecx

That would explain it :)

Author Comment

by:da_mango_bros
Hey, thanks guys, great explanation :)

>> These two instructions basically extract the sign bit of the result.

That was the main link I was failing to make. And the rest of the code didn't make sense without knowing that.

>> the mov #30,esp  is disassembled incorrectly, it's supposed to be mov #30,ecx.  The following rep movs instruction needs its repeat count in ecx

That makes sense. When I was reading up about x86 string operations, it kept going on about a count variable and said that it was usually stored in ECX... I wondered why I didn't have that!

>> The redundant moves on exit may be left over code to restore any exception stack frames.  Compiling with optimization would probably eliminate those.

Good to know what that is.