unityxx311

Understanding Hex Dump of helloworld.c

Hi,

I am trying to understand how to read hex dumps. For simplicity I wanted to start with hello.c. The executable it produces is 12904 bytes, or about 807 lines of 16 bytes each. I want to understand the different sections of the hex dump and whether there is anything you can assume in general. Please be thorough.

source:

#include <stdio.h>
int main()
{
  printf("Hello World\n");
  return (0);
}

gcc hello.c -o hello

$ gcc --version
gcc (GCC) 3.3.3 (cygwin special)
Copyright (C) 2003 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

thanks! matt
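
For reference, here is a minimal sketch in C of how a typical 16-bytes-per-line dump is laid out: the offset of the first byte in the line, the bytes in hex, then the same bytes as ASCII. The function names (format_hex_line, dump_line_count) and the exact column layout are assumptions made for illustration; real tools such as od or xxd differ in the details.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdio.h>

/* Number of dump lines needed for `size` bytes at 16 bytes per line. */
size_t dump_line_count(size_t size)
{
    return (size + 15) / 16;
}

/*
 * Format up to 16 bytes as one dump line: hex offset, the bytes in hex,
 * then the same bytes as ASCII ('.' for anything non-printable).
 * `out` must hold at least 96 bytes. Returns the number of characters
 * written (not counting the terminator), or -1 if `out` is too small.
 */
int format_hex_line(const unsigned char *buf, size_t n, unsigned long offset,
                    char *out, size_t outsz)
{
    size_t i;
    int len;

    if (outsz < 96)
        return -1;
    if (n > 16)
        n = 16;

    len = snprintf(out, outsz, "%08lx ", offset);
    for (i = 0; i < 16; i++) {
        if (i < n)
            len += snprintf(out + len, outsz - len, " %02x", buf[i]);
        else
            len += snprintf(out + len, outsz - len, "   "); /* pad short lines */
    }
    len += snprintf(out + len, outsz - len, "  |");
    for (i = 0; i < n; i++)
        len += snprintf(out + len, outsz - len, "%c",
                        isprint(buf[i]) ? buf[i] : '.');
    len += snprintf(out + len, outsz - len, "|");
    return len;
}
```

Calling dump_line_count(12904) gives 807, which matches the line count in the question. Feeding format_hex_line the 12 bytes of "Hello World\n" shows the string as 48 65 6c 6c 6f 20 57 6f 72 6c 64 0a, with the newline printed as '.' in the ASCII column.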
SOLUTION
grg99

unityxx311

ASKER

Yeah, I was a little puzzled at the amount of hex produced from just hello.c. I figured most of it was assembly code...
Using gcc -S:

$ cat hello
        .file   "hello.c"
        .def    ___main;        .scl    2;      .type   32;     .endef
        .section .rdata,"dr"
LC0:
        .ascii "Hello World\12\0"
        .text
.globl _main
        .def    _main;  .scl    2;      .type   32;     .endef
_main:
        pushl   %ebp
        movl    %esp, %ebp
        subl    $8, %esp
        andl    $-16, %esp
        movl    $0, %eax
        movl    %eax, -4(%ebp)
        movl    -4(%ebp), %eax
        call    __alloca
        call    ___main
        movl    $LC0, (%esp)
        call    _printf
        movl    $0, %eax
        leave
        ret
        .def    _printf;        .scl    2;      .type   32;     .endef
grg99's right, in that you are probably better off just starting with the .s files.

gcc -S hello.c
generates this output in hello.s:

        .file   "hello.c"
        .section        .rodata
.LC0:
        .string "Hello World\n"
        .text
.globl main
        .type   main, @function
main:
        pushl   %ebp
        movl    %esp, %ebp
        subl    $8, %esp
        andl    $-16, %esp
        movl    $0, %eax
        subl    %eax, %esp
        movl    $.LC0, (%esp)
        call    printf
        movl    $0, %eax
        leave
        ret
        .size   main, .-main
        .section        .note.GNU-stack,"",@progbits
        .ident  "GCC: (GNU) 3.3.5  (Gentoo Linux 3.3.5-r1, ssp-3.3.2-3, pie-8.7.7.1)"

With my comments:
        .file   "hello.c"              
        .section        .rodata        ; means 'read-only data'.  
.LC0:
        .string "Hello World\n"     ; your string constant, stored at label .LC0
        .text                              ; it says 'text', but this really stands for 'code'.  
.globl main                            ; this tells the assembler to publish the symbol 'main' as a global in the symbol table of the .o, so other programs (namely the linker) will be able to find it.  Note that .LC0 is not published.
        .type   main, @function   ; this is type information associated with the label, to let the outside world know main is a function and not a variable or anything like that.
main:                                   ; this is the actual label for main().
        pushl   %ebp                 ; %ebp is the 'frame pointer', and %esp is the 'stack pointer'.  These two lines preserve the
        movl    %esp, %ebp       ; calling routine's frame pointer and reset %ebp to the current frame pointer.
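        subl    $8, %esp            ; reserves 8 bytes on the stack.  The argument for printf is stored in this area below, at (%esp).
        andl    $-16, %esp         ; aligns top of stack to 16-byte boundary.  gcc keeps the stack 16-byte aligned by default.
        movl    $0, %eax           ; zeroes %eax
        subl    %eax, %esp       ; subtracts 0 from %esp, so it has no effect here.  It looks like a leftover stack adjustment that unoptimized gcc output does not remove (the cygwin listing calls __alloca at the same point).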
        movl    $.LC0, (%esp)   ; puts the address of the "Hello World\n" string on the stack as the first parameter to printf
        call    printf                  ; calls printf
        movl    $0, %eax          ; zeroes %eax again.  function return values go into %eax, so printf clobbered it.
        leave                          ; reverses those first two %ebp and %esp instructions, to get the frame pointer back to how the calling routine expects it
        ret                              ; returns to calling routine
        .size   main, .-main      ; specifies the size of the 'main' entry.  '.' is the current output location, and '.-main' is # of bytes from 'main:' to here.
        .section        .note.GNU-stack,"",@progbits     ; empty marker section telling the linker this object does not need an executable stack
        .ident  "GCC: (GNU) 3.3.5  (Gentoo Linux 3.3.5-r1, ssp-3.3.2-3, pie-8.7.7.1)"   ; identifying information for compiler
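
The effect of the andl $-16, %esp line can be checked with plain arithmetic. As a 32-bit value, -16 is 0xFFFFFFF0, so the AND clears the low four bits and rounds the stack pointer down to a multiple of 16. A minimal sketch in C follows; the function name align_down_16 is made up for illustration, and any stack address you try it with is just a sample value, not one taken from a real run.

```c
#include <stdint.h>

/* Equivalent of `andl $-16, %esp`: clear the low four bits of the value. */
uint32_t align_down_16(uint32_t esp)
{
    return esp & (uint32_t)-16;   /* (uint32_t)-16 is 0xFFFFFFF0 */
}
```

For a hypothetical stack pointer of 0xBFFFF7C8 this returns 0xBFFFF7C0. A value that is already a multiple of 16 comes back unchanged.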

Narendra Kumar S S
So, it would typically contain the memory layout of your code.

-ssnkumar
Many Unix-like operating systems (Linux, Solaris, and HP-UX, for example) use ELF binaries for object code.

This site has a couple of informative links on it:
http://www.answers.com/topic/executable-and-linkable-format
Actually, the wikipedia article is probably a little better:
http://en.wikipedia.org/wiki/Executable_and_Linkable_Format
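
One thing to keep in mind is that the first few bytes of the dump usually tell you which format you are looking at. ELF files begin with the byte 0x7f followed by the letters E, L, F. Windows executables, which is what gcc under Cygwin produces, begin with the letters M, Z. Here is a rough sketch in C of that check; the function name identify_format and the returned labels are made up for illustration, and it only looks at these two signatures.

```c
#include <stddef.h>
#include <string.h>

/* Guess the executable format from the first bytes of the file. */
const char *identify_format(const unsigned char *buf, size_t n)
{
    /* The literal is split so that 'E' is not read as part of the \x escape. */
    if (n >= 4 && memcmp(buf, "\x7f" "ELF", 4) == 0)
        return "ELF";
    if (n >= 2 && buf[0] == 'M' && buf[1] == 'Z')
        return "PE/MZ";
    return "unknown";
}
```

Pass it the first line of the dump (the first 16 bytes of the file are more than enough).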
So essentially, to hex edit any executable file you need to know the format?
What are you trying to do?
I was trying to determine whether there is a standard format for an executable, like the ELF format that someone pointed out. For example, could I look at lines 30-40 and say this is where it does x? I was looking at the hex for just the simple case of hello.c, which had hundreds of lines, and I figured that most of it was overhead, which might be in some standard format for every exe created with gcc.

matt
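
One way to find "where it does x" without decoding the whole format is to search the file for bytes you already know. The string "Hello World\n" is stored as plain ASCII, and the first two instructions of main in the listings above (pushl %ebp, then movl %esp, %ebp) are normally assembled to the bytes 55 89 e5. A minimal sketch in C of such a search follows; the function name find_bytes is made up for illustration, and the offsets it reports depend entirely on the particular file.

```c
#include <stddef.h>
#include <string.h>

/*
 * Return the offset of the first occurrence of `needle` in `hay`,
 * or -1 if it does not occur (or if `needle` is empty).
 */
long find_bytes(const unsigned char *hay, size_t hay_len,
                const unsigned char *needle, size_t needle_len)
{
    size_t i;

    if (needle_len == 0 || needle_len > hay_len)
        return -1;
    for (i = 0; i + needle_len <= hay_len; i++) {
        if (memcmp(hay + i, needle, needle_len) == 0)
            return (long)i;
    }
    return -1;
}
```

Dividing the returned offset by 16 gives the line of the dump that holds the match. Note that 55 89 e5 starts many functions, not just main, so the first hit may belong to startup code rather than to your own function.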