Solved

Strings and Arrays on Assembler

Posted on 2001-06-08
Last Modified: 2009-12-16
Hi there. I'm programming a small (and very basic) Pascal-to-Assembler compiler in Pascal (Delphi), but I'm kind of lost about string and 2-dimensional array management...

What I need is to know how to 'translate':

-simple string operations like:
x:='string';   or
x[y]:='<character>';

-simple 2D array operations like:
x[y,z]:=<value>;

to Assembler code.

I'm done with the lexical, syntactic, and semantic parts of the compiler, and it can handle 'for', 'while', 'repeat', and basic integer, real, and byte operations, but I'm still lost with strings and arrays. (read(string) and write(string) are working too.)

Step-by-step instructions are welcome since I'm not an assembler expert, and so are pointers to simple tutorials.

Thanks in advance...
Question by:garisoain
6 Comments
 
LVL 24

Expert Comment

by:SunBow
?
Basically, a string can be fixed size.
You give it an address, a name, and a number of bytes,
possibly initializing it to a specific value (either as a default or for debugging).
That is the data part.
The code part uses that address to access the string.
It is a good idea to have another item that records the size of the string (a number) and/or marks its end (a nul at the end).

Depending on language and platform, quite often we'll put the string into a stack-like common store instead. The difference is in how the string's limit is defined. When strings are defined up front with names and preassigned bytes, the length is limited by that declaration. For flexibility, the string is not defined until runtime, and all strings share a common store. This means less space used for small names and longer strings permitted, but a lot of overhead in keeping track of who has what space, managing free space, and garbage collection.

Another method allocates strings as a linked list. More overhead, but easier to manage when coding it up.

Any sense?

A lot depends on the underlying platform and the needs of the programmer.

For dimensions that change at run time, when you don't care about space or efficiency, consider a linked list {a,b,c,d,link}{x,y,z,link}, with pointers kept off to the side.

For efficiency, decide early which index varies fastest (rotates). So in 0,0,0,3 the rotation is on the right: you run that index up to its end, then begin on the next column.

Anything variable length gets either a special terminating character (<nul>) or a neat trick: the first datum entry (the zero-eth) holds the length. This is managed at runtime.
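
A small Python sketch of those two layouts may help. It only models the bytes in memory; the function names `length_prefixed` and `nul_terminated` are invented for this illustration, and zero-filling the unused bytes is an assumption (real code may leave them uninitialized).

```python
def length_prefixed(text, capacity=255):
    """Pascal-style layout: one length byte, then `capacity` data bytes."""
    data = text.encode("ascii")
    if len(data) > capacity:
        raise ValueError("text is longer than the declared capacity")
    # Assumption: unused data bytes are zero-filled so the result is predictable.
    return bytes([len(data)]) + data + bytes(capacity - len(data))


def nul_terminated(text):
    """C-style layout: the characters followed by a single zero byte."""
    return text.encode("ascii") + b"\x00"


print(length_prefixed("hello", capacity=8))
print(nul_terminated("hello"))
```

The length-prefixed form always occupies capacity+1 bytes, while the nul-terminated form occupies only as many bytes as the text needs plus one.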
 
LVL 4

Author Comment

by:garisoain
=/ sorry, but i don't get it...

any explained example???
 
LVL 4

Accepted Solution

by:
Neutron earned 200 total points
As I understand it, you do not generate machine code; the result of your compilation is an .asm file.

Also, it would be very useful _for you_ to specify which processor (8086, 386...) you generate code for.

Anyway:

Pascal-style strings have a fixed size of 1+255 bytes.
This means that the first byte holds the LENGTH of the string, and the remaining 255 bytes hold the DATA (characters).

In Pascal, as you already know, you can have declarations

var
  s : String[77]; { 77 can be any integer in 1..255 }

...and...

var
  s : String; { this is similar to  s : String[255] }

Since these two declarations differ only in the size of the DATA part, I will explain how to translate a string with a specified length. If you support only String with 255 characters, everything in this comment still applies.

Pascal declaration:

var
  s : String[255];

translates into asm declaration

s_LENGTH db 0
s_DATA db 255 dup(?)

This means that s has capacity 255 but has initial length 0.

When you parse the Pascal code, you must declare every string constant you find as an anonymous string in the output .asm code.
So when you find

x:='string';

put this in the data section of the output file:

string_constant_LENGTH db 6
string_constant_DATA db "string"
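
Since the compiler has to print these declarations itself, a rough Python sketch of the two emitters could look like this. The function names `emit_string_var` and `emit_string_const` are hypothetical, and the sketch assumes the `_LENGTH` / `_DATA` naming used above; it deliberately rejects empty constants and constants containing a double quote rather than guessing how your assembler wants them escaped.

```python
def emit_string_var(name, capacity=255):
    """Return the asm data lines for `var name : String[capacity]`."""
    if not 1 <= capacity <= 255:
        raise ValueError("capacity must be in 1..255")
    return [f"{name}_LENGTH db 0", f"{name}_DATA db {capacity} dup(?)"]


def emit_string_const(label, text):
    """Return the asm data lines for an anonymous string constant."""
    # Simplification: empty strings and embedded double quotes are rejected.
    if not 1 <= len(text) <= 255 or '"' in text:
        raise ValueError("unsupported string constant in this sketch")
    return [f"{label}_LENGTH db {len(text)}", f'{label}_DATA db "{text}"']


for line in emit_string_var("x") + emit_string_const("string_constant", "string"):
    print(line)
```

In a real compiler each constant would need its own unique label (for example a counter appended to the name), otherwise two constants would collide.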


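When you want to translate the assignment of this constant to the string variable x, you do it like this:
(I will assume that your output code is 8086 real mode, with ES = DS)



cld                                       ; string instructions count upward
mov   si, OFFSET string_constant_LENGTH   ; source (DS:SI)
mov   di, OFFSET x_LENGTH                 ; destination (ES:DI)

lodsb                                     ; AL = length byte of the constant
stosb                                     ; store it as the new length of x

mov   cl, al                              ; CX = number of characters to copy
xor   ch, ch

rep movsb                                 ; copy CX data bytes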




This code assigns the value of string_constant to the string variable x.

Note that if x was declared as String[4], this code will overwrite whatever is placed in the data segment after x. (Borland Pascal itself avoids this by truncating the value to the declared capacity.)
You can guard against this at compile time and/or at run time. If you decide to do it at run time, you must extend your string definition with one more byte that holds the string's capacity.
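
To see the overwrite and the effect of a capacity check, here is a Python simulation of the copy on a flat byte array. It is only a model of the lodsb / stosb / rep movsb sequence; `assign_string` and its `dst_capacity` parameter are names made up for this sketch, and the memory layout in the demo is an assumption.

```python
def assign_string(memory, dst, src, dst_capacity=None):
    """Copy the length-prefixed string at offset src to offset dst."""
    length = memory[src]                      # lodsb: read the length byte
    if dst_capacity is not None:
        length = min(length, dst_capacity)    # optional run-time truncation
    memory[dst] = length                      # stosb: store the new length
    for k in range(length):                   # rep movsb: copy the characters
        memory[dst + 1 + k] = memory[src + 1 + k]


# Assumed layout: x : String[4] at offsets 0..4, another variable at offset 5,
# and the constant 'string' (length byte + 6 characters) at offsets 8..14.
layout = bytearray(16)
layout[5] = 0xAA
layout[8] = 6
layout[9:15] = b"string"

unchecked = bytearray(layout)
assign_string(unchecked, dst=0, src=8)
print(hex(unchecked[5]))   # the neighbouring variable no longer holds 0xAA

clamped = bytearray(layout)
assign_string(clamped, dst=0, src=8, dst_capacity=4)
print(bytes(clamped[0:5]), hex(clamped[5]))
```

When the capacity is known at compile time, the same clamp can be emitted as a constant comparison instead of storing an extra capacity byte.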

var
  y : Integer; { a 16-bit integer, of course }
  x : String;


x[y]:='c';

;--- data (these are only declarations; somewhere in the Pascal code they will be initialized)

y dw ?

x_LENGTH db 0
x_DATA db 255 dup(?)


;--- code

mov   al, 'c'

mov   si, OFFSET x_DATA
add   si, y

mov   [si], al



In these examples some register values are changed, so if this code is inside a loop, you may need to push/pop the affected registers before/after it.


Arrays:

I will explain this for 2D arrays, but the logic is just the same for arrays with more dimensions.
In assembler you represent a 2D array as a 1D array.

If you have an array declared like this:

type
  Row = array[0..319] of Byte;
  TwoDimensionalArray = array[0..199] of Row;

var
  tda : TwoDimensionalArray;

So, one dimension has 200 elements, the other has 320 elements, and each element is a Byte.
You declare this in your asm output as an array:

tda db (200*320) dup (?)

Note that the expression 200*320 is just a constant, so this is exactly the same as

tda db 64000 dup (?)

and it doesn't carry the semantics of an array declaration.



In an array declared like this, you have in memory
320 elements of first row, followed by 320 elements of second row, then 320 elements of 3rd row, ..., and finally you have 320 elements of the last row.
All rows are lined up one after another in a 1D array.

Now, you have 2 coordinates to access an element of the 2D array.
You use a simple formula to calculate the 1D index of the element from the two 2D indices.

Instead of

tda[j,i]:=<value>;

you actually do this

tda[j*320 + i]:=<value>;



Assembler output would look like this:

; data

i dw ?
j dw ?

tda db 64000 dup (?)

; code

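mov   ax, j            ; row index
mov   dx, 320          ; elements per row
mul   dx               ; DX:AX = j * 320 (high word in DX is ignored)
add   ax, i            ; add the column index
mov   si, ax

mov   al, <value>

mov   tda[si], al      ; store at tda + j*320 + i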



This TwoDimensionalArray structure was intentionally defined with zero based indices: 0..319 and 0..199

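If your array instead has rows 7..206 and columns 5..324, then this code looks
like this:

mov   ax, j
sub   ax, 7            ; row index relative to its lower bound
mov   dx, 320
mul   dx
add   ax, i
sub   ax, 5            ; column index relative to its lower bound
mov   si, ax

mov   al, <value>

mov   tda[si], al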
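
For the compiler itself, the instruction sequence can be produced by a small generator. The following Python sketch follows the formula tda[j*320 + i] given above, with the row index multiplied by the row length; `emit_store_2d` and its parameters are hypothetical names, it handles byte-sized elements only, and it leaves out the `sub` instructions when a lower bound is zero.

```python
def emit_store_2d(array, row_var, col_var, value,
                  row_low=0, col_low=0, columns=320):
    """Return asm lines for `array[row_var, col_var] := value` (byte elements)."""
    lines = [f"mov   ax, {row_var}"]
    if row_low:
        lines.append(f"sub   ax, {row_low}")
    lines += [f"mov   dx, {columns}", "mul   dx", f"add   ax, {col_var}"]
    if col_low:
        lines.append(f"sub   ax, {col_low}")
    lines += ["mov   si, ax", f"mov   al, {value}", f"mov   {array}[si], al"]
    return lines


for line in emit_store_2d("tda", "j", "i", "7", row_low=7, col_low=5):
    print(line)
```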


Also, the size of each element here is one byte. If you have an array of records, you must multiply AX by DATA_SIZE before assigning it to the SI register (just as the code above multiplies by 320).

Your final formula for locating x[j,i] (row j, column i) is

x[((j-ROW_LOW_IDX)*TOTAL_COLUMNS+(i-COL_LOW_IDX))*DATA_SIZE]
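
The formula is easy to check with a few numbers. This Python sketch simply evaluates it; `element_offset` is a name chosen for the illustration, with the row index as the first argument and the column index as the second.

```python
def element_offset(row, col, row_low, col_low, total_columns, data_size=1):
    """Byte offset of element [row, col] from the start of the array."""
    return ((row - row_low) * total_columns + (col - col_low)) * data_size


# Zero-based 200 x 320 byte array: the second row starts right after the first.
print(element_offset(1, 0, 0, 0, 320))
# Rows 7..206, columns 5..324: the first element sits at offset 0.
print(element_offset(7, 5, 7, 5, 320))
# Same array shape, but with 4-byte records.
print(element_offset(1, 2, 0, 0, 320, data_size=4))
```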



The given code doesn't perform index range checking, and it doesn't check for overflow either. If you try to access element x[800, 600], the calculated index will be incorrect because 800*320 + 600 = 256600 cannot fit into a 16-bit register; and even if it could, the array was defined as 320x200, so the element being accessed wouldn't belong to the requested array.

You can add range checking to the given code and add overflow checking after the multiplications and additions.
Furthermore, you could use 386 instructions and 32-bit registers. Finally, you could use a flat memory model with linear addressing, so you don't have to deal with memory segmentation.
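
A short Python model shows what the unchecked 16-bit arithmetic does and what a range check buys you. Both function names are invented for this sketch, and the default sizes assume the zero-based 200 x 320 byte array from above.

```python
def wrapped_offset_16bit(row, col, columns=320):
    """Offset as unchecked 16-bit code computes it: the high bits are lost."""
    return (row * columns + col) & 0xFFFF


def checked_offset(row, col, rows=200, columns=320):
    """Offset with range checking; raises instead of touching foreign memory."""
    if not 0 <= row < rows:
        raise IndexError(f"row {row} outside 0..{rows - 1}")
    if not 0 <= col < columns:
        raise IndexError(f"column {col} outside 0..{columns - 1}")
    return row * columns + col


print(wrapped_offset_16bit(800, 600))   # silently lands inside the array
print(checked_offset(199, 319))         # last valid element
```

For this array the largest valid offset is 63999, so once both indices pass the range check the result always fits in 16 bits and no separate overflow test is needed.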


One more example, for 3D array with zero-based indices:

type
  vector = array[0..99] of byte;
  matrix = array[0..77] of vector;
  world = array[0..11] of matrix;

var
  w : world;

...can be represented as 1D array

w : array[0..(100*78*12)-1] of byte;

...which in asm is translated like this

w db (100*78*12) dup (?)

...and when you are accessing matrix with index m, vector with index v inside that matrix, and byte with index i inside that vector you have

w[m*(78*100) + v*100 + i]

The same rules apply as for 2D arrays:
if some index is not zero based - subtract its lower bound from it
if the data size is not 1 byte - multiply the whole expression by the data size.
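
The same rule extends to any number of dimensions. Here is a Python sketch, with `nd_offset` as a made-up name, that folds the indices from the outermost dimension to the innermost. Note that 100*78*12 = 93600 bytes is more than one 64 KB real-mode segment, so the offsets here are plain integers rather than 16-bit values.

```python
def nd_offset(indices, low_bounds, sizes, data_size=1):
    """Byte offset of an element in a row-major N-dimensional array.

    indices, low_bounds and sizes are listed from the outermost dimension
    to the innermost one.
    """
    offset = 0
    for index, low, size in zip(indices, low_bounds, sizes):
        offset = offset * size + (index - low)
    return offset * data_size


# world = array[0..11] of array[0..77] of array[0..99] of byte
m, v, i = 1, 2, 3
print(nd_offset((m, v, i), (0, 0, 0), (12, 78, 100)))
print(m * (78 * 100) + v * 100 + i)   # same value, written out by hand
```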


'Hope this helps,
   Ntr:)
 
LVL 1

Expert Comment

by:erezsh
A string in Pascal always has a capacity of 255 unless declared otherwise ( string[n] ).
In:
s:='Hello';
(s is a variable with reserved space)
simply copy 'Hello' to the offset of s.
Another tip: the first byte of a Pascal string ( s[0] ) is its length.
For example, 'hello' in memory is chr(5)+'hello'.

As for arrays with 2 or more dimensions...
do the same as you do for 1-dimensional arrays, only scale the leading index by the row length:
a[10] is offset a+10
a[10,5] is offset a+10*N+5, where N is the number of elements in each row

I hope I helped
  Erez Sh.
 
LVL 4

Expert Comment

by:Neutron
In the example of assigning a single char at a given index in a string, it should be

mov   si, OFFSET x_LENGTH


instead of


mov   si, OFFSET x_DATA


because Pascal (at least BP/Delphi) allows access to element 0 just like any other character in a string.

If you use any of this, please make this correction.
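
A tiny Python model of why the base has to be x_LENGTH: with the length byte at index 0, index 1 is the first character. `store_char` is a hypothetical helper, and the byte array stands in for the memory that starts at x_LENGTH.

```python
def store_char(memory, x_length_offset, index, ch):
    """Model of `x[index] := ch` with the base at the length byte."""
    memory[x_length_offset + index] = ord(ch)


x = bytearray(b"\x05hello")      # length byte followed by the characters
store_char(x, 0, 1, "J")         # x[1] is the first character
print(bytes(x))
store_char(x, 0, 0, chr(2))      # x[0] is the length byte itself
print(bytes(x))
```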

Greetings,
    Ntr:)
 
LVL 4

Author Comment

by:garisoain
Thanks to you all!!!

Now I know how to deal with this. =)