Elmo_

Dealing with Memory + Stacks

Hi there, would anybody be able to tell me the best way to declare memory space for bytes and words when dealing with 8086 assembly? What I want to do is simulate how assembly languages store variables and arrays in memory and how they access them.

Cheers, Elmo_.
ASKER CERTIFIED SOLUTION
nietod

This solution is only available to members.
Elmo_

ASKER

Cheers for answering nietod,

I am trying to write an interpreter for 8086 Assembly language instructions.

If I am storing it as a huge array of unsigned chars, how do I deal with variables that store strings and with constants? E.g.

CR EQU 10
LF EQU 13

Mes_1 DB LF,CR,'Hello World',CR,'$'

Elmo_
nietod

I still don't understand whether you are writing an assembler or a CPU simulator.  I'm guessing the assembler, right?

For the assembler you will need to create a symbol table of all the named addresses.  This would probably be a hash table, but you could just use an array (a dynamically sized array, unless you want to place a limit on the number of symbols).

You will need an int variable to store the "current" data address (the offset into the data segment), and you will initialize this variable to 0.  Each time a data declaration is encountered, you add the name of the declaration to the symbol table and store the current address with it (and possibly other information, like the data type (byte, word, etc.) and/or the length).  Then you increment the current data address by the size of the data allocated.

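So for

Mes_1 DB LF,CR,'Hello World',CR,'$'

you would add "Mes_1" to the symbol table and record the current address in the entry.  Then you would add 16 to the current address (or rather 15, counting one byte each for LF, CR, CR and '$' plus the 11 characters of 'Hello World').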
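
To see where the byte count comes from, here is a minimal Python sketch that expands the operands of a DB directive into the bytes they occupy. The helper name encode_db and the idea of passing the operands as a list of already-split tokens are assumptions made for illustration only; a real parser would have to split the line itself.

```python
def encode_db(operands, equates):
    """Expand DB operands into the bytes they occupy.

    operands: list of tokens, e.g. ["LF", "CR", "'Hello World'", "CR", "'$'"]
    equates:  dict mapping EQU names to their numeric values
    """
    out = bytearray()
    for tok in operands:
        if len(tok) >= 2 and tok[0] == "'" and tok[-1] == "'":
            # A quoted string contributes one byte per character.
            out.extend(tok[1:-1].encode("ascii"))
        elif tok in equates:
            # A symbolic constant contributes a single byte.
            out.append(equates[tok] & 0xFF)
        else:
            # Otherwise assume a plain decimal literal.
            out.append(int(tok) & 0xFF)
    return bytes(out)


equates = {"CR": 10, "LF": 13}  # values exactly as given in the question
data = encode_db(["LF", "CR", "'Hello World'", "CR", "'$'"], equates)
print(len(data))  # number of bytes to add to the current address
```

Counted this way the declaration occupies 15 bytes: one each for LF, CR, CR and '$', plus 11 for 'Hello World'.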
You mention entries like

CR EQU 10
LF EQU 13

For that you need to include symbolic entries in the symbol table, i.e. each entry in the symbol table will have a value that indicates its type.  The types might be symbolic (EQUs), BYTE (byte data), WORD, DWORD and CODE, and there might be others.  So when you come to a data declaration, as I suggested in the previous comment, you would mark the entry in the symbol table as BYTE, WORD, or DWORD and adjust the current data address.  When you come to a symbolic definition like those above, you would just add it to the table, probably recording its value, and would not alter the current data address.
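
A rough sketch of such a symbol table in Python might look like this. The class and field names (Symbol, SymbolTable, kind, value, size) and the extra variable Count are made up for the example; a dict stands in for the hash table.

```python
from dataclasses import dataclass


@dataclass
class Symbol:
    kind: str      # "EQU", "BYTE", "WORD" or "DWORD"
    value: int     # constant value for an EQU, data-segment offset otherwise
    size: int = 0  # bytes occupied; stays 0 for an EQU


class SymbolTable:
    def __init__(self):
        self.symbols = {}  # name -> Symbol
        self.current = 0   # current data address

    def define_equ(self, name, value):
        # A symbolic definition records its value and leaves the address alone.
        self.symbols[name] = Symbol("EQU", value)

    def define_data(self, name, kind, size):
        # A data declaration records the current address, then advances it.
        self.symbols[name] = Symbol(kind, self.current, size)
        self.current += size


table = SymbolTable()
table.define_equ("CR", 10)
table.define_equ("LF", 13)
table.define_data("Mes_1", "BYTE", 15)  # 1 + 1 + 11 + 1 + 1 bytes
table.define_data("Count", "WORD", 2)   # hypothetical extra variable
print(table.symbols["Count"].value, table.current)
```

Note that the two EQU lines leave the current address at 0, so Mes_1 still lands at offset 0.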

Does that help?
Elmo_

ASKER

Yes, Cheers.

It helps a lot. I am actually writing a program which interprets 8086 assembly code and runs it.  So the best way of describing it would be that it scans through the code, making sure that it is a valid program, and then runs the program as it would behave if it had been assembled and linked.  My program does not create an executable file.  It just interprets the code.

One more question: Why would I add 16 to the current memory address when storing

Mes_1 DB LF,CR,'Hello World',CR,'$'

??

Cheers, thanks for all the help so far...
So it is both the assembler and the simulated computer.  Right?

The way a simple assembler works is that it just starts positioning stuff in memory sequentially, i.e. the first thing it comes to goes at address 0, the next thing follows the first, and so on.

So there is the concept of the "current location" in the assembly process, that is, the location the next data item will be placed at. This current location is set to 0 at the start of the assembly process, then incremented by the size of each item as it is encountered.  So for example in

BYTE1 DB 0
WORD1 DW 0
BYTARY DB 5 DUP (0)
BYTE2 DB 0

the current location starts at 0.  When BYTE1 is encountered it gets an address of 0, and the current location is increased by the size of BYTE1, which is 1 byte.  So the current location is now 1.  WORD1 is then encountered.  It gets address 1, the current location. The current location is increased by its size, 2 bytes.  So now the current location is 3.  BYTARY gets the address 3.  The current location is increased by the size of BYTARY, 5.  So the current location is 8.  BYTE2 gets an address of 8.

(These addresses are what will go into the symbol table.  They will be offsets into an array that simulates the computer's memory)
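
Here is that walk-through as a small Python sketch, assuming the declarations have already been parsed into (name, directive, count, initial value) tuples. The names MEMORY, UNIT and place are invented for the example, and a bytearray plays the part of the simulated memory.

```python
MEMORY = bytearray(64 * 1024)  # simulated data segment, one byte per cell

UNIT = {"DB": 1, "DW": 2}      # bytes per element for each directive


def place(declarations):
    """Assign sequential offsets and write initial values into MEMORY.

    declarations: list of (name, directive, count, initial_value)
    """
    symbols = {}
    current = 0  # the current location starts at 0
    for name, directive, count, init in declarations:
        symbols[name] = current
        width = UNIT[directive]
        for _ in range(count):
            # The 8086 stores multi-byte values low byte first.
            MEMORY[current:current + width] = init.to_bytes(width, "little")
            current += width
    return symbols, current


symbols, end = place([
    ("BYTE1", "DB", 1, 0),
    ("WORD1", "DW", 1, 0),
    ("BYTARY", "DB", 5, 0),  # 5 DUP (0)
    ("BYTE2", "DB", 1, 0),
])
print(symbols, end)
```

The offsets come out as 0, 1, 3 and 8, and the current location ends at 9, ready for whatever is declared next.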

Now this does get more complex in a real assembler.  You can have multiple segments, so you need a current location for each segment.  Also there are usually options for directly setting the current location, like ORG in the typical x86 assembly languages.  This allows the programmer to move the current location up or down in memory, producing unused space or allowing space to be used twice!

Note also that x86 assembly (like most assembly languages) usually lets you get the current location value.  In x86 assembly this is usually done with $. If you do
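
One possible way to keep a current location per segment and to support an ORG-style directive, sketched in Python. The class LocationCounters and its method names are hypothetical, and the segment names are just labels chosen for the example.

```python
class LocationCounters:
    def __init__(self):
        self.counters = {}  # segment name -> current location
        self.segment = None

    def enter(self, segment):
        # Switch segments; a new segment starts at 0, an old one resumes.
        self.segment = segment
        self.counters.setdefault(segment, 0)

    def org(self, address):
        # Move the current location of the active segment up or down.
        self.counters[self.segment] = address

    def allocate(self, size):
        # Hand out the current location, then advance it by size.
        address = self.counters[self.segment]
        self.counters[self.segment] = address + size
        return address


lc = LocationCounters()
lc.enter("DATA")
a = lc.allocate(4)  # occupies offsets 0..3
lc.org(0x100)
b = lc.allocate(2)  # leaves offsets 4..0xFF unused
lc.org(2)
c = lc.allocate(2)  # overlaps the last two bytes of a
lc.enter("CODE")
d = lc.allocate(3)  # CODE has its own counter
lc.enter("DATA")
e = lc.allocate(1)  # DATA resumes at offset 4
print(a, b, c, d, e)
```

Moving the counter forward leaves a gap, and moving it back makes two names share the same bytes, which is the "used twice" case.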

someword DW $

the word is initialized with the current location.  In other words, the word is at the address of the current location and is set to the current location, so it is in effect a pointer to itself.

You can use this to determine the length of things.  Like

STRBGNADD EQU $ ; Get the string begin address
STRING DB "This is a string."
STRLEN EQU $-STRBGNADD ; The string length.
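
In an interpreter, $ can simply be evaluated as the current location at the moment the line is processed. A minimal Python sketch, where the Assembler class, its methods and the starting offset 0x20 are all assumptions chosen for illustration:

```python
class Assembler:
    def __init__(self, start=0):
        self.current = start  # this is what $ evaluates to
        self.symbols = {}

    def equ(self, name, value):
        # EQU records a value and does not move the current location.
        self.symbols[name] = value

    def data(self, name, size):
        # A data declaration takes the current location, then advances it.
        self.symbols[name] = self.current
        self.current += size
        return self.symbols[name]


asm = Assembler(start=0x20)         # arbitrary starting offset

initial = asm.current               # someword DW $
address = asm.data("someword", 2)   # initial == address: points to itself

asm.equ("STRBGNADD", asm.current)   # STRBGNADD EQU $
asm.data("STRING", len("This is a string."))
asm.equ("STRLEN", asm.current - asm.symbols["STRBGNADD"])  # $-STRBGNADD
print(asm.symbols["STRLEN"])
```

The important detail is that $ is read before the location is advanced for someword and after it has been advanced past STRING, so STRLEN comes out as the 17 characters of the string.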