
X86 Assembly Code

Hey guys... I'm learning to read x86 assembly and translate it back into C, and I need some help getting started.

Here's the original function in C:
--------
int triangle (int width, int height){
    int array[5] = {0,1,2,3,4};
    int area;
    area = width * height/2;
    return (area);
}

Now here's that same code compiled into ASM (it's AT&T syntax, unfortunately; I'm working from a textbook):
--------
0x8048430 <triangle>: push %ebp
0x8048431 <triangle+1>: mov %esp, %ebp
0x8048433 <triangle+3>: push %edi
0x8048434 <triangle+4>: push %esi
0x8048435 <triangle+5>: sub $0x30,%esp
0x8048438 <triangle+8>: lea 0xffffffd8(%ebp), %edi
0x804843b <triangle+11>: mov $0x8049508,%esi
0x8048440 <triangle+16>: cld
0x8048441 <triangle+17>: mov $0x30,%esp
0x8048446 <triangle+22>: repz movsl %ds:(%esi),%es:(%edi)
0x8048448 <triangle+24>: mov 0x8(%ebp),%eax
0x804844b <triangle+27>: mov %eax,%edx
0x804844d <triangle+29>: imul 0xc(%ebp),%edx
0x8048451 <triangle+33>: mov %edx,%eax
0x8048453 <triangle+35>: sar $0x1f,%eax
0x8048456 <triangle+38>: shr $0x1f,%eax
0x8048459 <triangle+41>: lea (%eax, %edx, 1), %eax
0x804845c <triangle+44>: sar %eax
0x804845e <triangle+46>: mov %eax,0xffffffd4(%ebp)
0x8048461 <triangle+49>: mov 0xffffffd4(%ebp),%eax
0x8048464 <triangle+52>: mov %eax,%eax
0x8048466 <triangle+54>: add $0x30,%esp
0x8048469 <triangle+57>: pop %esi
0x804846a <triangle+58>: pop %edi
0x804846b <triangle+59>: pop %ebp
0x804846c <triangle+60>: ret
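The `cld` / `repz movsl` pair in the listing is a block copy: with the direction flag clear, each `movsl` copies one 4-byte value from `(%esi)` to `(%edi)` and advances both pointers by 4, and the `rep`/`repz` prefix (the same F3 prefix; on `movs` it just means "repeat") does this ECX times. The sketch below models that in C. It assumes the data at `0x8049508` is the compiler's stored copy of the `{0,1,2,3,4}` initializer and that ECX holds the count (as suggested later in the thread about the `mov $0x30,%esp` line); `rep_movsl`, `init_table` and `init_local_array` are hypothetical names.

```c
#include <stdint.h>

/* Hypothetical model of `cld; rep movsl` (direction flag clear, so both
   pointers move forward). ECX counts 4-byte elements, not bytes. */
static void rep_movsl(uint32_t *edi, const uint32_t *esi, uint32_t ecx)
{
    while (ecx != 0) {
        *edi++ = *esi++;   /* one movsl: copy a dword, advance ESI and EDI by 4 */
        ecx--;             /* REP decrements ECX and stops when it reaches 0 */
    }
}

/* Stand-in for the initializer data the compiler (probably) placed at
   0x8049508; the name init_table is made up for this sketch. */
static const uint32_t init_table[5] = {0, 1, 2, 3, 4};

/* Mirrors the setup in the listing: EDI = &array (0xffffffd8(%ebp), i.e.
   -0x28), ESI = &init_table, ECX = number of dwords to copy. */
static void init_local_array(uint32_t array[5])
{
    rep_movsl(array, init_table, 5);
}
```

Because the count is in 4-byte units, a five-element `int` array needs ECX = 5 (20 bytes). With `std` (direction flag set), ESI and EDI would be decremented instead.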

Okay, now my goal here is to convince myself that, if I hadn't been given the original C code, I could still figure out what the ASM code does.

Here's what I've managed to deduce myself:
--------
push %ebp                            // save old base pointer
mov %esp,%ebp                        // create new stack frame -- new_bp is old_sp
push %edi                            // save edi
push %esi                            // save esi
sub $0x30,%esp                       // reserve room for local vars -- new_sp is old_sp minus 0x30

lea 0xffffffd8(%ebp),%edi            // address of destination string
mov $0x8049508,%esi                  // address of source string
cld                                  // clear direction flag (cld to count up, std to count down)

mov $0x30,%esp                       // wtf?

repz movsl %ds:(%esi),%es:(%edi)     // move source string to destination string

mov 0x8(%ebp),%eax                   // load eax with the first argument on the stack
mov %eax,%edx                        // put this value into edx
imul 0xc(%ebp),%edx                  // multiply edx by the second argument on the stack
mov %edx,%eax                        // put the result in eax

sar $0x1f,%eax                       // signed divide? -- arithmetic shift eax right by 0x1f (31) bits
shr $0x1f,%eax                       // unsigned divide? -- logical shift eax right by 0x1f (31) bits

lea (%eax,%edx,1),%eax
sar %eax
mov %eax,0xffffffd4(%ebp)
mov 0xffffffd4(%ebp),%eax
mov %eax,%eax

add $0x30,%esp                       // destroy stack frame -- release local vars
pop %esi                             // restore saved esi
pop %edi                             // restore saved edi
pop %ebp                             // destroy stack frame -- restore old_bp
ret                                  // return; value is in eax
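For comparison, here is one way to transliterate the arithmetic half of the listing (triangle+24 to triangle+52) into C, one statement per instruction. It's a sketch, not the compiler's actual intermediate form: `triangle_decompiled` and the register-named locals are made up, and it assumes a 32-bit two's-complement target where `>>` on a negative value is an arithmetic shift (what GCC and Clang do on x86).

```c
#include <stdint.h>

/* Hypothetical instruction-by-instruction transliteration.
   Each local is named after the register it stands in for. */
static int32_t triangle_decompiled(int32_t width, int32_t height)
{
    int32_t eax, edx, area;

    eax = width;                   /* mov 0x8(%ebp),%eax : first argument         */
    edx = eax;                     /* mov %eax,%edx                               */
    /* imul 0xc(%ebp),%edx : second argument; imul wraps modulo 2^32, so the
       multiply is done unsigned to avoid signed-overflow UB in C */
    edx = (int32_t)((uint32_t)edx * (uint32_t)height);
    eax = edx;                     /* mov %edx,%eax                               */
    eax = eax >> 31;               /* sar $0x1f,%eax : 0 if eax >= 0, else -1     */
    eax = (int32_t)((uint32_t)eax >> 31); /* shr $0x1f,%eax : now 0 or 1          */
    eax = eax + edx;               /* lea (%eax,%edx,1),%eax : plain add          */
    eax = eax >> 1;                /* sar %eax : arithmetic shift right by 1      */
    area = eax;                    /* mov %eax,0xffffffd4(%ebp) : store area      */
    eax = area;                    /* mov 0xffffffd4(%ebp),%eax : reload it       */
    return eax;                    /* mov %eax,%eax is a no-op; ret returns EAX   */
}
```

Collapsing the temporaries gives back `area = (width * height) / 2; return area;`. The store to `0xffffffd4(%ebp)` followed immediately by a reload is typical of unoptimized code keeping `area` in its stack slot.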

1). Okay, so I'm fine with the initial and final parts of the code, which set up and destroy the stack frame...

2). I'm new to the x86 string instructions, but after a quick look on Google I get the basic idea. What I don't understand is *why* string instructions are used here. Obviously it has something to do with the array... but what is in the source, and why copy it to the destination?

3). What is the purpose of "mov $0x30,%esp" here? It seems like it is screwing around with the stack pointer for no apparent reason. (I'm guessing that in AT&T syntax the dollar sign just marks a constant... is this correct?)

4). Obviously the next block of code is the width * height part... no problems there.

5). I'm guessing that these sar and shr instructions have something to do with the /2 part... not sure exactly how this works though.

6). I'm not sure what the last chunk of code (the lea, the sar and the three movs) does.
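The `sar`/`shr`/`lea`/`sar` sequence is the usual way to compile a signed division by 2. An arithmetic shift right by one halves a value but rounds toward negative infinity, while C's `/` truncates toward zero, so 1 must be added to negative values first. `sar $0x1f` turns the product into 0 or -1 (all ones), `shr $0x1f` turns that into 0 or 1 (the sign bit), and `lea (%eax,%edx,1),%eax` adds it to the product before the final `sar`. A small C sketch of the idea, assuming a compiler whose `>>` on negative `int` is arithmetic (GCC and Clang on x86); `shift_only` and `div2_like_compiler` are illustrative names.

```c
#include <limits.h>

/* Shifting alone: rounds toward negative infinity for negative x. */
static int shift_only(int x)
{
    return x >> 1;
}

/* Add the sign bit (1 for negative x, 0 otherwise) before shifting, so the
   result rounds toward zero like C's x / 2. */
static int div2_like_compiler(int x)
{
    /* same 0-or-1 value that `sar $0x1f` followed by `shr $0x1f` produces */
    int bias = (int)((unsigned)x >> (sizeof(int) * CHAR_BIT - 1));
    return (x + bias) >> 1;
}
```

For x = -7, shifting alone gives -4, while `(-7 + 1) >> 1` gives -3, which matches `-7 / 2` in C99 and later.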

I know that's a lot of questions, and this is a very long post, but hopefully it should be pretty easy for someone good with x86 to answer. Also, I promise you guys that this is not a "homework question". I'm doing a Software Engineering degree (currently on holidays) and I wanted to go into more detail on the low-level stuff by myself. I know that sounds really nerdy... I guess it probably is. Anyway, I'd appreciate any pointers.

Cheers!

BTW -- Sorry about the indentation (EE doesn't like tabs for some reason).

ASKER CERTIFIED SOLUTION

Infinity08

SOLUTION

grg99

grg99

BTW the code will work better if you use (width * height) / 2
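In C, `*` and `/` have the same precedence and group left to right, so `width * height/2` already means `(width * height) / 2`; the parentheses make the intent explicit to the reader rather than change the result. The result only changes if the division is applied to `height` first, because integer division truncates before the multiply. A sketch comparing the three spellings (the function names are hypothetical):

```c
/* Same expression: * and / group left to right. */
static int area_as_written(int width, int height)    { return width * height/2; }
static int area_parenthesized(int width, int height) { return (width * height) / 2; }

/* Different expression: height/2 is truncated before multiplying. */
static int area_halve_first(int width, int height)   { return width * (height / 2); }
```

With width = height = 3, the first two return 9 / 2 = 4, while the third returns 3 * 1 = 3.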

Infinity08

>> the mov #30,esp is disassembled incorrectly, it's supposed to be mov #30,ecx. The following move bytes instruction needs a byte count in ecx

That would explain it :)

da_mango_bros

ASKER

Hey, thanks guys, great explanation :)

>> These two instructions basically extract the sign bit of the result.

That was the main link I was failing to make. And the rest of the code didn't make sense without knowing that.

>> the mov #30,esp is disassembled incorrectly, it's supposed to be mov #30,ecx. The following move bytes instruction needs a byte count in ecx

That makes sense. When I was reading up about x86 string operations, it kept going on about a count variable and said that it was usually stored in ECX... I wondered why I didn't have that!

>> The redundant moves on exit may be left over code to restore any exception stack frames. Compiling with optimization would probably eliminate those.

Good to know what that is.