unityxx311
asked on
Understanding Hex Dump of helloworld.c
Hi,
I am trying to understand how to read hex dumps. For simplicity I wanted to start with hello.c. The executable created is 12904 bytes, or ~807 lines of 16 bytes each. I want to understand the different sections of the hex dump and whether there is anything you can assume in general about the layout. Please be thorough.
source:
#include <stdio.h>
int main()
{
    printf("Hello World\n");
    return (0);
}
gcc hello.c -o hello
$ gcc --version
gcc (GCC) 3.3.3 (cygwin special)
Copyright (C) 2003 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
thanks! matt
ASKER
using gcc -S
$ cat hello
.file "hello.c"
.def ___main; .scl 2; .type 32; .endef
.section .rdata,"dr"
LC0:
.ascii "Hello World\12\0"
.text
.globl _main
.def _main; .scl 2; .type 32; .endef
_main:
pushl %ebp
movl %esp, %ebp
subl $8, %esp
andl $-16, %esp
movl $0, %eax
movl %eax, -4(%ebp)
movl -4(%ebp), %eax
call __alloca
call ___main
movl $LC0, (%esp)
call _printf
movl $0, %eax
leave
ret
.def _printf; .scl 2; .type 32; .endef
grg99's right, in that you are probably better off just starting with the .s files.
gcc -S hello.c
generates this output in hello.s:
.file "hello.c"
.section .rodata
.LC0:
.string "Hello World\n"
.text
.globl main
.type main, @function
main:
pushl %ebp
movl %esp, %ebp
subl $8, %esp
andl $-16, %esp
movl $0, %eax
subl %eax, %esp
movl $.LC0, (%esp)
call printf
movl $0, %eax
leave
ret
.size main, .-main
.section .note.GNU-stack,"",@progbits
.ident "GCC: (GNU) 3.3.5 (Gentoo Linux 3.3.5-r1, ssp-3.3.2-3, pie-8.7.7.1)"
With my comments:
.file "hello.c"
.section .rodata ; means 'read-only data'.
.LC0:
.string "Hello World\n" ; your string constant, stored at label .LC0
.text ; it says 'text', but this really stands for 'code'.
.globl main ; this tells the assembler to publish the symbol 'main' as a global in the symbol table of the .o, so other programs (namely the linker) will be able to find it. Note that .LC0 is not published.
.type main, @function ; this is type information associated with the label, to let the outside world know main is a function and not a variable or anything like that.
main: ; this is the actual label for main().
pushl %ebp ; %ebp is the 'frame pointer', and %esp is the 'stack pointer'. These two lines preserve the
movl %esp, %ebp ; calling routine's frame pointer and reset %ebp to the current frame pointer.
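subl $8, %esp ; reserves 8 bytes on the stack, presumably room for the 4-byte printf argument stored at (%esp) below, plus padding.
andl $-16, %esp ; aligns top of stack to 16-byte boundary. GCC does this in main by default on x86 (see -mpreferred-stack-boundary).
movl $0, %eax ; zeroes %eax
subl %eax, %esp ; subtracts 0 from %esp, so the stack pointer does not change. This looks like leftover stack-adjustment code from compiling without optimization rather than a way of clearing the CPU arithmetic flags.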
movl $.LC0, (%esp) ; puts the address of the "Hello World\n" string on the stack as the first parameter to printf
call printf ; calls printf
movl $0, %eax ; zeroes %eax again, this time as main's return value. function return values go into %eax, so printf's own return value had overwritten it.
leave ; reverses those first two %ebp and %esp instructions, to get the frame pointer back to how the calling routine expects it
ret ; returns to calling routine
.size main, .-main ; specifies the size of the 'main' entry. '.' is the current output location, and '.-main' is # of bytes from 'main:' to here.
.section .note.GNU-stack,"",@progbits ; an empty marker section telling the linker this object does not need an executable stack
.ident "GCC: (GNU) 3.3.5 (Gentoo Linux 3.3.5-r1, ssp-3.3.2-3, pie-8.7.7.1)" ; identifying information for compiler
So, it would typically contain the memory layout of your code.
-ssnkumar
Many Unix-like operating systems (Linux, Solaris, and HP-UX, for example) use the ELF format for object files and executables.
This site has a couple of informative links on it:
http://www.answers.com/topic/executable-and-linkable-format
Actually, the wikipedia article is probably a little better:
http://en.wikipedia.org/wiki/Executable_and_Linkable_Format
ASKER
So essentially, to hex edit any executable file you need to know the format?
What are you trying to do?
ASKER
I was trying to determine whether there is a standard format for an executable, like the ELF format that someone pointed out. For example, could I look at lines 30-40 and say this is where it does x? I was looking at the hex for just the simple case of hello.c, which had roughly 800 lines, and I figured that most of it was overhead, which might be in some standard format for every exe created with gcc.
matt