Avatar of 2266180
2266180

interfacing C pointers with asm in real mode (memory addressing issues in C)

Hi experts,

This is a continuation of a project I've been working on for some time (I also opened a question for another issue related to the project: https://www.experts-exchange.com/questions/21833126/Combining-high-level-languages-with-asm-to-generate-com-files.html) and that is why I am now here.
Small introduction:
- I use nasm and gcc to create the application and bootloader (see my last comment in the above question).
- I decided to use real mode instead of protected mode as I do not need multitasking, more than 1 MB of memory and stuff like that, plus I need direct hw access and other low level stuff, so I thought there was no reason to get complicated with (mainly) memory management (I will use the same addressing for code/data and hw access)
- I just hit an issue with memory management. it's probably just confusion on my side, but I read up on pmode and real mode and just can't seem to figure my way out of this.

I already have about 50% of the needed asm code written (UI; the remaining 50% is the application's main objective: hardware interfacing). I also have about 30% of the C code written (also UI).

now I came to the issue of writing a very simple and basic memory manager that will provide access to the 640KB of conventional memory (way too much for my needs). BUT. Until now, I was absolutely sure that I would not need more than 64KB of memory for all operations of my application. Now, after starting with the UI, I am not so sure (the UI is in text mode). As I have only written about 30% of the UI, I now have about 10KB of memory (at least) used only by the existing code (for caching and other stuff) and there are still a few things to be done, so that is why I am starting to think that I might exceed my initial estimate of one segment for data (the code will fit in one segment for sure).
So now I am considering that I will have one segment for code, maybe one for stack (just to be on the safe side) and at least 2 segments for data, and this complicates everything (or at least is generating a lot of confusion for me).

so I know that a C pointer (gcc) is only the offset (I hope I'm not wrong about this...). until now, that wasn't a problem with having one data segment. but now that I will possibly have 2 or more, I just don't have a clue how to work with these pointers.

so I have a C function that takes as an argument a C pointer. say:
extern void func1(char* c);
extern char* func2();
and the function is implemented in asm (nasm):
global func1
global func2

func1:
  blabla
  ret

func2:
  blabla
  ret

questions:
- how will I correctly access the data in func1 ? (C pointer is 4 bytes, but is still only an offset - if I understood it correctly)
- how will I pass to the C part of the program the address of the data initialized in func2? so that:
char* c = func2();
c[0] = '3';
will use the correct data from memory (segm:offs)

ever since I started to think about this issue and read a few things on both protected and real mode, everything is fuzzy in my head and I am not even sure about the single data segment implementation anymore.

As I said above, all my asm code is written with real mode in mind and thus is all 16 bit and all that. So if switching to protected mode is the only way out, is there anything I have to worry about besides memory access/handling, which obviously needs to be changed?

I would really prefer a solution where I do not have to change anything in the current implementation (because then I will most likely have to change the design as well, and the changes might become ugly and drastic)
I would like to underline that char* is a simple example. I will have pointers to structures/unions and various blocks of data, which need to be accessed from both asm and C (I already have a few such structures: that's why I know about the approx. 10 KB need).

I don't mind if I receive some GOOD links to read on this memory issues/C pointers/addresses and how to interface between asm and C (with pointers), but I want those info to be as clear as possible as I am already filled with various information that made me get into this confused state.

I would appreciate some examples with explanations on this interfacing stuff (working with multiple segments in asm only is not an issue for me: it's the way I take that "pointer" (address = segm:offs) from asm and use it in C)

(note: I'll put a pointer in the ASM TA to this just in case)

Thank you.
Avatar of cwwkie
cwwkie

> so I know that a C pointer (gcc) is only the offset (I hope I'm not wrong with this...).

You can choose the memory model: http://en.wikipedia.org/wiki/Memory_models But if you use tiny, it is true.

> wasn't a problem with having one data segment. but now that I will have possibly 2 or more,

I think you should choose another memory model. I am afraid you only get problems if you play with the segment registers.
Avatar of 2266180

ASKER

ok, I see what you mean. so if I choose a memory model of, say, compact, then I can access data as far. nice.
but then how do I break up the far pointer into segment:offset for asm, and vice versa, compute the far pointer from segm:offset in asm?
this must be compiler specific, no? does gcc have some macros/functions for this? also, how do I force the gcc compiler to use a specific memory model?
if you can't help me with that, it's ok, I'll just open up another question: it's a little off-topic anyway :)
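
On the question of splitting an address into segment:offset and back: in real mode the CPU forms the physical address as segment * 16 + offset, so the conversion itself can be sketched in plain C. This is only an illustration of the arithmetic, not something gcc provides; the type `far_addr` and both helper names are made up for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical type for this sketch: a real-mode segment:offset pair. */
typedef struct {
    uint16_t seg;
    uint16_t off;
} far_addr;

/* Real-mode address translation: physical = segment * 16 + offset. */
static uint32_t far_to_linear(far_addr p)
{
    return ((uint32_t)p.seg << 4) + p.off;
}

/* Split a linear address below 1 MB into a normalized pair,
 * i.e. the largest possible segment and an offset of 0..15. */
static far_addr linear_to_far(uint32_t linear)
{
    far_addr p;
    assert(linear <= 0xFFFFFu);          /* must lie in the first megabyte */
    p.seg = (uint16_t)(linear >> 4);
    p.off = (uint16_t)(linear & 0xFu);
    return p;
}
```

Note that many pairs alias the same byte: 0000:7C00 and 07C0:0000 both map to linear 0x7C00, which is why the split direction has to pick a convention (here, the normalized form).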
SOLUTION
Avatar of dimitry
dimitry

This solution is only available to members.
Avatar of 2266180

ASKER

will that be valid in my case: running in real mode?

I made a small test program in C:
int main(){
  char* c = "test";
  c[0] = 'a';
  return 0;
}

compiled it to binary, disassembled it and got:
00000000  55                push ebp
00000001  89E5              mov ebp,esp
00000003  83EC08            sub esp,byte +0x8
00000006  83E4F0            and esp,byte -0x10
00000009  B800000000        mov eax,0x0
0000000E  29C4              sub esp,eax
00000010  C745FC24000000    mov dword [ebp-0x4],0x24
00000017  8B45FC            mov eax,[ebp-0x4]
0000001A  C60061            mov byte [eax],0x61
0000001D  B800000000        mov eax,0x0
00000022  C9                leave
00000023  C3                ret
00000024  7465              jz 0x8b
00000026  7374              jnc 0x9c
00000028  00                db 0x00

that doesn't look like it would run correctly in real mode.

so ... still very confused.
> but then how do I break up the far pointer into segment:offset for asm, and viceversa,

I have not used assembly much, but I think you can use model or .model to handle that.

Maybe this will help: http://www.dre.vanderbilt.edu/~sutambe/documents/misc_c11.htm

Here is an example which includes large pointers: http://www.koders.com/assembler/fid1D592E3B23D8CA1C384B90550038FE7D92111A47.aspx

This function: exec( int swap, char far *program, char far *cmdtail, int environment_seg, char far *tmpfilename );

These are the offsets on the stack:
argbase            equ      6
a_swap            equ      <bp+argbase+0>  ; 16 bit
a_prog            equ      <bp+argbase+2>   ; 32 bit
a_tail            equ      <bp+argbase+6>   ; 32 bit
a_env             equ      <bp+argbase+10>  ; 16 bit
a_tmp            equ      <bp+argbase+12>  ; 32 bit

and this is in the code:
lds      si, ss:[a_tmp]  ; this loads ds:si with the pointer value of tmpfilename
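
The offsets in that table follow directly from the argument sizes (2 bytes for an int, 4 bytes for a far pointer). A rough C sketch of the same layout is below. The names `far_ptr16`, `exec_args`, `BP_OFFSET` and `load_far` are invented for the illustration, `__attribute__((packed))` is a GCC extension, and the reading of `argbase` as "saved bp plus a far return address" is my assumption, not something stated in the linked source.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* A 16-bit far pointer as it sits in memory: offset word, then segment word. */
typedef struct {
    uint16_t off;
    uint16_t seg;
} far_ptr16;

/* Hypothetical mirror of the argument area of exec(), starting at bp+argbase. */
typedef struct __attribute__((packed)) {
    uint16_t  swap;             /* int swap              (16 bit) */
    far_ptr16 program;          /* char far *program     (32 bit) */
    far_ptr16 cmdtail;          /* char far *cmdtail     (32 bit) */
    uint16_t  environment_seg;  /* int environment_seg   (16 bit) */
    far_ptr16 tmpfilename;      /* char far *tmpfilename (32 bit) */
} exec_args;

/* argbase from the table; presumably 2 bytes saved bp + 4 bytes return address. */
#define ARGBASE 6
#define BP_OFFSET(member) (ARGBASE + offsetof(exec_args, member))

/* What "lds si, [bp+offset]" does, modelled on a byte array that stands in
 * for the stack frame: low word goes to the offset, high word to the segment. */
static far_ptr16 load_far(const uint8_t *frame, size_t bp_offset)
{
    far_ptr16 p;
    assert(frame != NULL);
    p.off = (uint16_t)(frame[bp_offset]     | (frame[bp_offset + 1] << 8));
    p.seg = (uint16_t)(frame[bp_offset + 2] | (frame[bp_offset + 3] << 8));
    return p;
}
```

So on the asm side the "conversion" of a far pointer is nothing more than reading two adjacent words from the stack; no macro is needed as long as both sides agree on the layout.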
> that doesn't look like it would run correctly in real mode.

I think you cannot conclude that from that example. At least I don't see what won't run in real mode.
correction,

> 00000017  8B45FC            mov eax,[ebp-0x4]
> 0000001A  C60061            mov byte [eax],0x61

I think you are right. This loads a 32-bit pointer into eax and stores 0x61 at the address it points to.

But I think this must be a compiler option.
SOLUTION
This solution is only available to members.
Avatar of 2266180

ASKER

>I have not used assembly much, but I think you can use model or .model to handle that.
that might do it on the asm side (if I can figure out the corresponding option for nasm), but it still leaves a pretty big hole:
in real mode I have an address formed by a segment and an offset (which together give a 20-bit physical address). I need the code that the C compiler generates to follow that "restriction", but I just can't seem to find a way to tell the compiler (gcc in my case) to do that. all it spits out is 32-bit code for protected mode. this is the default, which is not a big problem unless I can't find a way to override it, either by command line switches or by a compiler directive.

> But I think this must be a compiler option.
maybe. I've been through the man page of gcc but there is a lot of stuff, so I might have overlooked it. one could probably spend 1-2 hours just reading the gcc man page. this is one of the reasons I decided to post a question: too much time to waste on something that I think is a small issue.

as you can see, it's more a C issue here, and that is why I posted the question in this TA :)
the asm part is fine, running ok in real mode, addressing the memory correctly. no problem there. but on my C part... there are issues. At one point I was thinking of just using segment and offset separately and passing them around to the asm functions and back to the C functions. And I implemented this on the C side until I had to write code to access a structure (which had a pointer field to hold about 3 KB). and I said to myself: ok, I copy stuff in asm from wherever I "allocated" the memory to the code segment and give the local offset to the C function ... but how am I going to access the data that the pointer in the structure is pointing to? and that's when it all started to collapse and I got extremely confused. so I solved a small problem, but there is another one which I don't know how to solve, since I will have around 5-6 different structures just in the user interface, and then there will be the actual functional code (which is not yet designed) which might come with its own data structures. so copying the data to the code segment and back again is not an option, since I will have no way of telling what to copy.
so I am back to the drawing board, needing a way to:
- interface C pointers with asm addressing in real mode.
this is a solution that will have the least of modifications to be done in the code (mostly just on the C side).
the other solution is to switch to protected mode, in which case I have to:
- find out whether there are limitations regarding direct hw access (IO and stuff like that). another round of heavy reading. another round of time spent not developing (and the clock is ticking :) )
- modify both C code and asm code to address memory accordingly (though I think I could leave the asm code with little to no modifications, but I'll have to go through that just to be sure)

so I would really like to know if there is a way to make gcc compile code in such a way that it can be used in real mode (just in case it wasn't clear: the bootloader I made will load this code generated by gcc in memory and give execution to it. so there is no OS or anything like that to manage the memory:allocation/access/etc)
Avatar of 2266180

ASKER

hi Paul,

thanks for the input. I know about the segments and their use, and I also understand asm listings. I think :)

anyway, as you are not familiar with gcc: I tried that far thing, but gcc doesn't cope with it. not with the defaults anyway. it seems that it just doesn't recognize the keyword. or maybe there is another notation/keyword for it. google didn't help on this one. it usually does, but on this issue (the one under discussion in this question) it just didn't come up with anything of use to me.
Avatar of 2266180

ASKER

> I could ask the page editor for Assembler to take a look but that would be pointless, I know just as much as he does. :-)

now that I think about it, it could be an idea to put a pointer question in unix and linux programming. those guys should know more about gcc. don't know why I didn't think about that in the first place.
SOLUTION
This solution is only available to members.
Hi ciuly,

If there are only a few assembler functions to call from C we could probably work out a macro that would reliably convert the pointers both ways.

If you can get assembler listings, try posting the following:

1. A sample assembler routine we must call. Ideally, one taking a pointer and returning a pointer.
2. A sample C function you want to use to call the assembler.
3. Assembly listing of 2. above (so make a small one please).

Paul

P.S. My joke about the Page Editor of Assembly was because I am he. :-)
Avatar of 2266180

ASKER

Hi Paul

I thought about that myself. but if you look at the listing I gave a little earlier (this one:)
I made a small test program in C:
int main(){
  char* c = "test";
  c[0] = 'a';
  return 0;
}

compiled it to binary, disassembled it and got:
00000000  55                push ebp
00000001  89E5              mov ebp,esp
00000003  83EC08            sub esp,byte +0x8
00000006  83E4F0            and esp,byte -0x10
00000009  B800000000        mov eax,0x0
0000000E  29C4              sub esp,eax
00000010  C745FC24000000    mov dword [ebp-0x4],0x24
00000017  8B45FC            mov eax,[ebp-0x4]
0000001A  C60061            mov byte [eax],0x61
0000001D  B800000000        mov eax,0x0
00000022  C9                leave
00000023  C3                ret
00000024  7465              jz 0x8b
00000026  7374              jnc 0x9c
00000028  00                db 0x00

more precisely these 2 lines:
00000017  8B45FC            mov eax,[ebp-0x4]
0000001A  C60061            mov byte [eax],0x61
you will see that the pointer is local to the current segment. so for instance, if I have some data residing in segment X, which is not the same as CS=DS=SS (with E's in the above listing), then I'm out of luck. the processor being in real mode, it will compute the physical address from ds and whatever offset happens to be in ax. in any case, nothing related to my segment X with whatever offset Y. (data being in lower conventional memory since we're in real mode)

PS :  I thought so but didn't get around to actually check that with all the reading I'm getting :)

@cwwkie:
I read up on GAS and tried it out, which gave these errors (the C code is the one from above):
/tmp/ccywLLUn.s: Assembler messages:
/tmp/ccywLLUn.s:19: Error: `-4(%ebp)' is not a valid 16 bit base/index expression
/tmp/ccywLLUn.s:20: Error: `-4(%ebp)' is not a valid 16 bit base/index expression
/tmp/ccywLLUn.s:21: Error: `(%eax)' is not a valid 16 bit base/index expression
I read a little further on those pages just to find:
http://tldp.org/HOWTO/Assembly-HOWTO/nasm.html
where I quote: "Unless you're using BCC as a 16-bit compiler (which is out of scope of this 32-bit HOWTO), you should definitely use NASM"
I don't know if that influences GAS or not. I did a ndisasm on the bootloader code (already written and compiled) and it's all 16 bit (compiled with nasm). so that is ok.
however, I can't seem to find a way to see what version of binutils I am having (they say "Binutils (2.9.1.0.25+) now fully support 16-bit mode" ) but my gcc is 3.2.2 and ld says GNU ld version 2.13.90.0.18 20030206
So I guess I should fit the profile :)
c code for GAS giving the above errors:
asm(".code16\n");

int main(){
  char* c = "test";
  c[0] = 'a';
  return 0;
}

dunno if it's on the right track, but it's something :) maybe I also need to put a switch on the gcc command line?
Hi ciuly,

Forgive me if I'm on the wrong track, but if your compiler can't generate 16-bit code then you will have a significant communication problem between them.

Surely, once you have solved that problem, this one will probably go away, or at least change.

If you find a 16bit compiler, this problem will disappear. If you use chunking between them then that mechanism should be doing the address translation.

Paul
Avatar of 2266180

ASKER

after reading some more, I changed .code16 to .code16gcc and got the following output:
00000000  6655              push bp
00000002  6689E5            mov bp,sp
00000005  6683EC08          sub sp,byte +0x8
00000009  6683E4F0          and sp,byte -0x10
0000000D  66B80000          mov ax,0x0
00000011  0000              add [eax],al
00000013  6629C4            sub sp,ax
00000016  6766C745FC3200    mov word [di-0x4],0x32
0000001D  0000              add [eax],al
0000001F  67668B45FC        mov ax,[di-0x4]
00000024  67C60061          mov byte [bx+si],0x61
00000028  66B80000          mov ax,0x0
0000002C  0000              add [eax],al
0000002E  66C9              o16 leave
00000030  66C3              o16 ret
00000032  7465              jz 0x99
00000034  7374              jnc 0xaa
00000036  00                db 0x00

this seems to be more or less valid 16 bit code but I'm still not sure about it. bx is used although it is never initialized, and so are other registers.

I'm still reading a few things...
Avatar of 2266180

ASKER

Paul,

I am still not sure whether gcc can or cannot produce valid 16 bit code to be run in real mode. But I sure hope it is able to, because I would hate to change compilers (given the fact that my X experience is mostly with gcc)
ASKER CERTIFIED SOLUTION
This solution is only available to members.
Hi cwwkie,

>>00000011  0000              add [eax],al
I thought this was strange too. I guess we are looking at debug code with no optimisation, which is why we get "mov ax,0". I think this is an incorrect disassembly. I always saw op code 00 as NOP. I'll find out.

Paul
Avatar of 2266180

ASKER

@cwwkie:
interesting idea, but it keeps talking about dos/windows and, at first look at least, it seems to depend on that. I see that it defines a new gdt, but at this moment I am not sure if it's totally os independent. I'll have to read further and understand the code therein. but it looks promising :)
Hi ciuly,

It does indeed look promising. I suggest you work out why:

00000011  0000              add [eax],al

is there. 'eax' is not available in 16bit so those instructions are suspect (there are three of them). Notice that the opcode is 0000, I bet that has been misinterpreted. If that is the case then I think you found your route.

Paul
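
One possible explanation for the odd `add [eax],al` lines, offered as a hypothesis rather than something confirmed in this thread: the listing looks like 16-bit code that was disassembled in 32-bit mode. The 0x66 prefix flips the operand size relative to the current mode, so `66 B8 00 00 00 00` is a single `mov eax,0` when decoded as 16-bit code, but a 32-bit decoder reads it as `mov ax,0` (4 bytes) and is left with two zero bytes, which it shows as `add [eax],al`. The sketch below models only this one opcode family; the function name is made up.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Length in bytes of a "mov reg, imm" instruction (opcodes B8..BF),
 * optionally preceded by a 0x66 operand-size prefix.
 * mode_bits is the decoder's default operand size: 16 or 32.
 * Returns 0 if the bytes do not start with that instruction. */
static size_t mov_imm_length(const uint8_t *code, size_t len, int mode_bits)
{
    size_t i = 0;
    bool prefix = false;

    assert(mode_bits == 16 || mode_bits == 32);
    if (len > 0 && code[0] == 0x66) {
        prefix = true;
        i = 1;
    }
    if (i >= len || code[i] < 0xB8 || code[i] > 0xBF)
        return 0;

    /* The prefix selects the operand size that is NOT the mode's default. */
    int operand_bits = prefix ? (mode_bits == 16 ? 32 : 16) : mode_bits;
    return i + 1 + (size_t)(operand_bits / 8);
}
```

The same reasoning would apply to the 0x67 address-size prefix, which would turn the `[bx+si]` in the listing into `[eax]` under a 16-bit decode. If ndisasm produced the listing, its `-b 16` / `-b 32` option selects the decoding mode, so re-running it with `-b 16` would be a quick way to check this.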
Avatar of 2266180

ASKER

Hi Paul,

I was just reading something on this. according to: http://www.gnu.org/software/binutils/manual/gas-2.9.1/html_chapter/as_16.html "GAS currently supports only 32-bit addressing modes: when writing 16-bit code, it always outputs address size prefixes for any instruction that uses a non-register addressing mode. So you can write code that runs on 16-bit processors, but only if that code never references memory."
but that is relevant to gas 2.9.1 and
as --version
gives: GNU assembler 2.13.90.0.18 20030206
which is newer and probably complies to my other quote from an older post above:
"Binutils (2.9.1.0.25+) now fully support 16-bit mode (registers and addressing) on i386 PCs."

regarding your comment, yes, eax should not be present there, but as the old docs say, gas wasn't able to produce clean 16 bit code, so I wouldn't be surprised if the issue persists, given the fact that asm(".code16") failed to compile and only .code16gcc compiled (which, among other things, keeps stack access at 32 bit).

The latest release of GNU binutils is 2.16.1 so I will give that a try and see if that is able to get me good 16bit code for real mode. if not, I'll consider to
- see if flat real mode works for me (I'll probably just give it a try and see if it works or not). understanding everything there seems rather complicated
- move everything to protected mode and figure out how to solve the bios and direct hw access issues, or
- change the compiler (again). I'll have to find a decent C compiler that can output 16 bit code for real mode, though. (I'll open up another question if it gets to this)

I just hope latest binutils will save the day.

well, it's 2 am here, work day tomorrow, so I'll get back to you (all) in a few days after I have tried everything and hopefully got something working. given the specifics of my job this will probably last until next weekend.
of course, other inputs/suggestions will still be welcomed :)
SOLUTION
This solution is only available to members.
Avatar of 2266180

ASKER

Hi aib_42,
thank you for your input. I have been considering unreal mode, aka flat real mode, at cwwkie's suggestion, and it looks promising, but I will have to do some small tests to make sure that it will indeed support my application. after reading what cwwkie suggested and now what you wrote, it looks like the best option. but since I am not a low-level expert, as my main areas of work are in high level environments, I cannot be sure until I "see it with my own eyes" or understand how it works.
I am at work right now and will not get home in the next 11 hours or so, but when I do, this flat real mode (unreal mode) will be the first thing to start testing.

>Is your assembly code [going to be] 16-bit? How much of it, exactly -- can you easily port to 32-bit code?
I think so. the only issue I see at the present code would be the way the memory is being addressed.

>Have you considered some of the other benefits of using Protected Mode?
I have, but I don't really need any of that. I rather need the real mode. My application will be interfacing hardware at the very low level to say the least. main functionality will be IO on a variety of IO devices (at least IDE and hopefully SCSI drives depending on the time left to implement all that)

>Your issues seem mainly related to Assembly, I think you would have [had] more luck posting to that TA. You should post pointer questions in Assembly and maybe Linux TAs, yes.
I did put a pointer question in the asm TA right at the beginning. But from what I have done so far with the help of the experts here and on my own, the asm part of my code is fine as it is; the issue is how the C code can be made to run correctly in real mode. and the main problem is not the instructions (which can be forced to 16 bit through GAS, as someone suggested) but the memory addressing, which until now I couldn't find a way to force to be output as segment addressing for real mode.
but if unreal mode works fine for my situation, then this will no longer be an issue. I might need to change a little of the asm code since it accesses the cached video memory at least, but that shouldn't be much of a problem, and since I am not entirely clear on how to use unreal mode yet, maybe no change is really necessary (at least that is what I understand from what they say on those pages about unreal mode). I am more concerned about what the C compiler puts out. there is no OS behind all this to take care of anything, so I must make sure that everything is in place and in the correct "manner". a 16 bit compiler would probably help, but all 16 bit compilers that are good enough are for windows/dos and thus at least they all have an org 100h messing up things, plus all dos compilers will use dos calls for sure (I couldn't find one that doesn't), so that is why I'm on a linux box with gcc :)
For now I will stick with gcc and try unreal mode. if that doesn't work, then I will consider changing the compiler. but that is a problem for later which I hope will not come up :D

peace
SOLUTION
This solution is only available to members.
Avatar of 2266180

ASKER

Hi aib_42,
I was on my way to turn off my pc and get to sleep when I received the notification of your message. I read through it but I must be too tired since I didn't make out much that would help. But I can tell you that I would rather not change my development environment again. The project is due in 3 weeks, of which I'll also be working at my job for the first 2. so even if this project ends up an ugly "hack", I'll have to get something that will allow me to continue development and not start over. I consider the working days as "dead" anyway, and count only the weekends as "planned days for the project", so if I can fix this issue by the coming weekend and just continue, I will only have lost one day (yesterday) and will still be more or less on schedule. but if I have to start over again ...
I know it was bad design on my part, but looking on the bright side: at least I didn't finish most of the project only to launch it, see it crash, spend a few hours debugging and realise when it's too late that the whole thing was badly designed from the start.
this unreal mode looks promising as I keep reading articles and examples and all that. I still have to figure out how the bios interrupts will work in this mode and after that I can make a small test app to see if indeed it works as expected.
I'm hoping to be able to set up my machine at work for this "job" and thus speed up the process of deciding what to use and how to use it and then hopefully everything will work nicely.

let me tell you all this, just as a fact, so that someone will pay extra attention when modifying a design: about a year ago when I started to design this project, there was a small OS involved that would do all the dirty work. I did the design for that and the app, but that took too long (until about 1.5-2 months ago), when I realised I wouldn't have time to implement the OS, so I stripped it out. and this is probably where I misjudged the memory requirements of the UI and for some reason believed that everything would fit in one segment. guess I'll pay more attention next time.

If I can't get this unreal mode running until the day after tomorrow, then I'll post another question in asm TA for some code. I found a few out there but they are more or less different, so I am a little in doubt of which one to consider and in what order, as none of the sites are "known" (for me at least)
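
For readers following along: as I understand the unreal mode pages linked in this thread, the trick is to enter protected mode briefly, load a data segment register with a descriptor whose limit covers 4 GB, and return to real mode with that limit still cached. The switch itself has to be done in assembly, so the only part that can be shown in plain C is how such a descriptor is encoded. The sketch below assumes the standard 8-byte x86 segment descriptor layout; the function name is made up.

```c
#include <assert.h>
#include <stdint.h>

/* Pack a segment descriptor into its 8-byte GDT form.
 * base:   32-bit linear base address
 * limit:  20-bit limit (counted in 4 KB units when the G flag is set)
 * access: access byte (0x92 = present, ring 0, writable data segment)
 * flags:  upper nibble of byte 6 (0xC = 4 KB granularity + 32-bit) */
static uint64_t gdt_descriptor(uint32_t base, uint32_t limit,
                               uint8_t access, uint8_t flags)
{
    uint64_t d = 0;
    assert(limit <= 0xFFFFFu && flags <= 0xFu);
    d |= (uint64_t)(limit & 0xFFFFu);              /* limit bits 0..15  */
    d |= (uint64_t)(base & 0xFFFFFFu) << 16;       /* base bits 0..23   */
    d |= (uint64_t)access << 40;                   /* access byte       */
    d |= (uint64_t)((limit >> 16) & 0xFu) << 48;   /* limit bits 16..19 */
    d |= (uint64_t)(flags & 0xFu) << 52;           /* flags nibble      */
    d |= (uint64_t)(base >> 24) << 56;             /* base bits 24..31  */
    return d;
}
```

A flat 4 GB data segment is then `gdt_descriptor(0, 0xFFFFF, 0x92, 0xC)`, which in memory is the byte sequence FF FF 00 00 00 92 CF 00 that shows up in many unreal mode examples.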
Avatar of 2266180

ASKER

Hi all,

I have finally managed to finish a small "hello world" like app that will do fine for my situation. Here is a small review of everything, for future readers:

Tools:
- nasm
- gcc

1) Unreal Mode:
I have tried several ways of getting into unreal mode; some did not compile, some did not seem to work (the link from cwwkie might work, I don't remember why I didn't use that). the one I used is from here (it also has pretty good explanations): http://my.execpc.com/~geezer/johnfine/segments.htm

2) Parameter passing between C and asm: as pointed out by dimitry, they will be pushed on the stack as 32 bit values. I have forgotten that tip, and spent almost a day trying to figure out why the asm functions were not called/working.

3) cwwkie also pointed out that the *.c files will need an asm(".code16"); at the top. That usually works, but for some reason my gcc compiler spat out errors in a few cases, so for those cases use asm(".code16gcc"); and you'll be fine (right now I am using a mix of the 2)

4) all *.asm files that will be used in linking to c modules will need a BITS 16 directive at the top.

5) I also spent a day on this "minor" problem. When I was writing the bootloader, I followed a "tutorial" in which the author loaded the "kernel" at 200h. This worked pretty nicely, if the kernel was only 1 sector long. If bigger, it will still run fine as long as you don't use an interrupt (video in my case). The thing is that the bootloader loaded the kernel in the interrupt vector table space. After searching for a good memory map, I was provided with one on an irc channel: http://stakface.com/nuggets/index.php?id=10&replyTo=0 
My current implementation loads the code at 500h and allocates memory for the data starting from 7E00h (I might change this a bit in order to allocate a proper stack for the app)
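
The load-address problem from point 5 can be turned into a small sanity check. The sketch below hard-codes the usual low-memory layout of a PC at boot (interrupt vector table, BIOS data area, boot sector); the region list is my summary of that conventional layout rather than a copy of the linked memory map, and the names are invented for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Low-memory regions a real-mode program should not load over
 * (linear addresses; end is exclusive). */
typedef struct {
    uint32_t start;
    uint32_t end;
    const char *name;
} region;

static const region reserved[] = {
    { 0x00000, 0x00400, "interrupt vector table" },
    { 0x00400, 0x00500, "BIOS data area" },
    { 0x07C00, 0x07E00, "boot sector (while it is still running)" },
};

/* Return the name of the first reserved region that the range
 * [load, load + size) overlaps, or NULL if the range is clear. */
static const char *load_conflict(uint32_t load, uint32_t size)
{
    assert(size > 0);
    for (size_t i = 0; i < sizeof reserved / sizeof reserved[0]; i++) {
        if (load < reserved[i].end && load + size > reserved[i].start)
            return reserved[i].name;
    }
    return NULL;
}
```

Assuming the 200h above is a linear address, `load_conflict(0x200, 512)` reports the interrupt vector table, while loading at 500h is clear as long as the image ends before 7C00h, and data placed from 7E00h upward is clear as well.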

Thank you again for your help