Strings and Arrays on Assembler

Posted on 2001-06-08
Last Modified: 2009-12-16
Hi there, well, I'm programming a little (and very basic) Pascal-to-Assembler compiler in Pascal (Delphi), but I'm kinda lost about string and 2-dimensional array management...

what i need is to know how to 'translate':

-simple string operations like:
x:='string';   or

-simple 2D array operations like:

to Assembler Code.

I'm done now with the lexical, syntactic, and semantic parts of the compiler, and can manage 'for', 'while', 'repeat', and basic integer, real, and byte operations, but I'm still lost with strings and arrays. (Well, read(string) and write(string) are working too.)

Step-by-step instructions are welcome since I'm not an assembler expert, and references to simple tutorials too.

Thanx in Advance...
Question by:garisoain

Expert Comment

ID: 6169880
Basically, a string can be a fixed-size block of memory.
In the data part you give it an address (a name) and a number of bytes,
possibly initializing it to a specific value (either as a default or for debugging).
The code part then uses that address to access the string.
It is a good idea to keep another item that records the current length of the string (a number) and/or to mark its end (a nul at the end).

Depending on language and platform, strings are quite often kept in a common runtime store instead (a 'stack' of sorts). The difference is in how the size limit is defined. If strings are declared up front with names and preassigned bytes, the length is limited by that declaration. For flexibility, the string is not allocated until runtime, and all strings share a common store. This means less space used for short strings and longer strings permitted, but a lot of overhead keeping track of who has what space, managing free space, and garbage collection.

Another method would allocate with linked list. More overhead, but easier to manage in coding it up.

Any sense?

A lot depends on the underlying platform and the needs of the programmer.

For dimensions that change at runtime, if you don't care about space or efficiency, consider a linked list: {a,b,c,d,link}{x,y,z,link}, with pointers kept off to the side.

For efficiency, decide early which index varies fastest (rotates). In 0,0,0,3 the rotation is on the right, so you run that index up to its end, then begin on the next value of the index to its left.

Anything of variable length gets either a special terminating character (<nul>) or, as a neat trick, a first (zero-th) entry that holds the length. This is managed at runtime.
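
As an explained example of those two conventions, here is a minimal sketch in C (chosen only because it exposes bytes directly; the function names are made up for illustration). One string ends with a nul byte, the other keeps its length in byte 0:

```c
#include <stddef.h>

/* Nul-terminated convention: the length is found by scanning for the 0 byte. */
size_t c_style_length(const unsigned char *s) {
    size_t n = 0;
    while (s[n] != 0) {
        n++;
    }
    return n;
}

/* Length-prefixed convention: the length is simply the first (zero-th) byte. */
size_t pascal_style_length(const unsigned char *s) {
    return s[0];
}
```

With the bytes 'h','e','l','l','o',0 the first function returns 5; with the bytes 5,'h','e','l','l','o' the second one also returns 5, without scanning.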

Author Comment

ID: 6172403
=/ sorry, but i don't get it...

any explained example???

Accepted Solution

Neutron earned 200 total points
ID: 6173948
As I understand it, you do not generate machine code; the result of your compilation is an .asm file.

Also, it would be very useful _for you_ to specify which processor (8086, 386...) you generate code for.


Pascal-style strings occupy a fixed 1+255 bytes.
This means that the first byte holds the LENGTH of the string, and the remaining 255 bytes hold the DATA (characters).

In Pascal, as you already know, you can have the declarations

  s : String[77]; { 77 can be any integer 1..255 }

or

  s : String; { this is the same as  s : String[255] }

Since these two declarations differ only in the size of the DATA part, I will explain how to translate the string with a specified length; if you support only String with 255 characters, everything in this comment still applies.

Pascal declaration:

  s : String[255];

translates into asm declaration

s_LENGTH db 0
s_DATA db 255 dup(?)

This means that s has capacity 255 but has initial length 0.
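
To picture that layout, here is a rough C equivalent of the two db lines. It is only a sketch: PString, S_CAPACITY and pstring_empty are hypothetical names, not anything the compiler has to emit.

```c
#include <stdint.h>

#define S_CAPACITY 255  /* hypothetical name for the n in String[n] */

/* Mirrors:  s_LENGTH db 0   and   s_DATA db 255 dup(?)  */
typedef struct {
    uint8_t length;            /* s_LENGTH: current length, 0..255 */
    uint8_t data[S_CAPACITY];  /* s_DATA: room for the characters  */
} PString;

/* An empty string: length 0. The data bytes are zeroed here only so the
   result is deterministic; the asm version leaves them uninitialized. */
PString pstring_empty(void) {
    PString s = {0};
    return s;
}
```

The structure is 256 bytes in total, which is the 1+255 mentioned above.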

When you parse the Pascal code, every string constant you find must be declared in the output .asm code as an anonymous string,
so when you find

  x:='string';

put this in the output file, in the data section:

string_constant_LENGTH db 6
string_constant_DATA db "string"

When you want to translate assignment of this constant to string variable x, you do it like this:
(I will assume that your output code is 8086 real mode)

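cld                                       ; make movsb copy forward
mov   si, OFFSET string_constant_LENGTH   ; source: length byte, then data
mov   di, OFFSET x_LENGTH                 ; destination (assumes ES = DS)

mov   al, [si]                            ; AL = length of the constant
mov   cl, al
xor   ch, ch                              ; CX = length
inc   cx                                  ; plus 1 for the length byte itself

rep movsb                                 ; copy CX bytes from DS:SI to ES:DI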

This code will assign value of string_constant to string variable x.
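
The same assignment can be written as a few lines of C, which may help when checking the generated code by hand. This is a sketch under the assumption that both strings are stored as a length byte followed by the data; pstring_assign is a made-up name.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copy a length-prefixed string: the length byte plus that many data bytes.
   Like the assembler version, this does no capacity check on dst. */
void pstring_assign(uint8_t *dst, const uint8_t *src) {
    size_t count = (size_t)src[0] + 1;  /* length + 1, the value placed in CX */
    memcpy(dst, src, count);            /* plays the role of rep movsb        */
}
```

For the constant 6,'s','t','r','i','n','g' this copies 7 bytes, so the destination ends up with length 6 and the data "string".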

Note that if x was declared as String[4], this code will overwrite whatever is declared after x in the data segment, because it copies the whole constant without looking at the capacity of x. Borland Pascal itself does not do that: it truncates the value to the declared capacity.
You can guard against this at compile time and/or at run time. If you decide to do it at run time, you must extend your string definition with one more byte that holds the string's capacity.
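
A run-time check could look roughly like the following C sketch. Truncating to the capacity is one possible policy and is an assumption here, as are the function name and the idea of passing the capacity as a parameter instead of storing it in an extra byte.

```c
#include <stdint.h>
#include <string.h>

/* dst_capacity is the n in String[n]; dst must have room for n + 1 bytes. */
void pstring_assign_checked(uint8_t *dst, uint8_t dst_capacity,
                            const uint8_t *src) {
    uint8_t len = src[0];
    if (len > dst_capacity) {
        len = dst_capacity;         /* truncate instead of overrunning dst */
    }
    dst[0] = len;                   /* store the (possibly shortened) length */
    memcpy(dst + 1, src + 1, len);  /* copy only the characters that fit */
}
```

Assigning 'string' to a String[4] with this routine stores length 4 and the characters "stri", and the byte after the variable is left alone.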

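To assign a single character at a given index, for example

  x[y]:='c';

with the declarations

  y : Integer; { a 16 bit integer, of course }
  x : String;

the output looks like this:

;--- data (these are only declarations; somewhere in the Pascal code they will be initialized)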

y dw ?

x_LENGTH db 0
x_DATA db 255 dup(?)

;--- code

mov   al, 'c'

mov   si, OFFSET x_DATA
add   si, y

mov   [si], al

In these examples some register values are changed, so if this code is within a loop, you may need to push/pop the affected registers before/after the code executes.


I will explain for 2D arrays, but the logic is just the same for multidimensional arrays.
In assembler you represent a 2D array as 1D array.

If you have an array declared like this:

  Row = array[0..319] of Byte;
  TwoDimensionalArray = array[0..199] of Row;

  tda : TwoDimensionalArray;

So, one dimension has 200 elements, other dimension has 320 elements and each element is a Byte.
You declare this in your asm output as an array:

tda db (200*320) dup (?)

Note that the expression 200*320 is just a constant, so this is exactly the same as

tda db 64000 dup (?)

and it doesn't have the semantics of an array declaration.

In an array declared like this, you have in memory
320 elements of first row, followed by 320 elements of second row, then 320 elements of 3rd row, ..., and finally you have 320 elements of the last row.
All rows are lined up one after another in a 1D array.

Now, you have 2 coordinates to access some element in 2D array.
You use a simple formula to calculate 1D index of element from two 2D indices.

Instead of

  tda[i,j]:=<value>;

you actually do this

  tda[i*320 + j]:=<value>;
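
The row-by-row layout can be checked with a small C sketch. The names ROWS, COLS, tda_index and tda_store are illustrative only; the row index is the one that gets multiplied by 320, since each row holds 320 bytes.

```c
#include <stddef.h>
#include <stdint.h>

enum { ROWS = 200, COLS = 320 };

static uint8_t tda[ROWS * COLS];  /* same storage as: tda db 64000 dup (?) */

/* 1D index of the element in the given row and column. */
size_t tda_index(size_t row, size_t col) {
    return row * COLS + col;
}

/* Store one byte at (row, col) through the flat array. */
void tda_store(size_t row, size_t col, uint8_t value) {
    tda[tda_index(row, col)] = value;
}
```

The first element of the second row lands at index 320, and the very last element (row 199, column 319) at index 63999.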

Assembler output would look like this:

; data

i dw ?
j dw ?

tda db 64000 dup (?)

; code

mov   ax, i
mov   dx, 320
mul   dx              ; DX:AX = i*320 (DX is overwritten)
add   ax, j           ; AX = i*320 + j
mov   si, ax

mov   al, <value>

mov   tda[si], al     ; store at (offset of tda) + index

This TwoDimensionalArray structure was intentionally defined with zero based indices: 0..319 and 0..199

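If your array is instead declared with 5..324 for the elements of a row and 7..206 for the rows, you first subtract the lower bound from each index, and the code looks
like this:

mov   ax, i
sub   ax, 7           ; row index relative to its lower bound
mov   dx, 320
mul   dx
add   ax, j
sub   ax, 5           ; column index relative to its lower bound
mov   si, ax

mov   al, <value>

mov   tda[si], al     ; store at (offset of tda) + index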

Also, size of data in array is one byte. If you have array of records, you must multiply AX with DATA_SIZE before assigning it to SI register (you do that just like code above multiplies with 320).

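Your final formula for locating tda[i,j] is

  offset = ((i - I_LOW)*ROW_LENGTH + (j - J_LOW)) * DATA_SIZE

where I_LOW and J_LOW are the lower bounds of the two indices and ROW_LENGTH is the number of elements in one row (320 above).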
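
Putting the lower bounds and the element size together, a C sketch of the offset calculation could look like this. The Array2D structure and its field names are hypothetical; a compiler would normally know these values at compile time and emit them as constants.

```c
#include <stddef.h>

/* Hypothetical description of a 2D array, for illustration only. */
typedef struct {
    long   row_low;     /* lower bound of the first (row) index     */
    long   col_low;     /* lower bound of the second (column) index */
    size_t row_length;  /* number of elements in one row            */
    size_t data_size;   /* size of one element in bytes             */
} Array2D;

/* Byte offset of element [row, col] from the start of the array. */
size_t element_offset(const Array2D *a, long row, long col) {
    size_t r = (size_t)(row - a->row_low);  /* zero-based row    */
    size_t c = (size_t)(col - a->col_low);  /* zero-based column */
    return (r * a->row_length + c) * a->data_size;
}
```

For a byte array with rows 7..206 and columns 5..324, element [7,5] is at offset 0 and element [206,324] at offset 63999.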


The given code doesn't perform index range checking, and it doesn't check for overflow either, so if you try to access element tda[800, 600] the calculated index will be incorrect because 800*320 + 600 = 256600 cannot fit into a 16 bit register, and even if it could, the array was defined as 200x320 so the element being accessed wouldn't be an element of the requested array.

You can add range checking to the given code and add overflow checking after multiplications and additions.
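
Both problems can be seen in a short C sketch. The function names are invented, and the first index is assumed to be the row (0..199) and the second the column (0..319).

```c
#include <stdbool.h>
#include <stdint.h>

enum { ROWS = 200, COLS = 320 };

/* What a 16 bit register would hold after row*320 + col: only the low 16 bits. */
uint16_t index_16bit(uint32_t row, uint32_t col) {
    return (uint16_t)(row * COLS + col);
}

/* Range-checked version: refuses indices outside the declared bounds. */
bool checked_index(uint32_t row, uint32_t col, uint32_t *out) {
    if (row >= ROWS || col >= COLS) {
        return false;          /* would be a run-time range error */
    }
    *out = row * COLS + col;   /* at most 63999, so it fits in 16 bits */
    return true;
}
```

For indices 800 and 600 the unchecked 16 bit result is 59992 instead of 256600, which silently points inside the array; the checked version rejects the access.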
Furthermore you could use 386 instructions and 32bit registers. Finally you could use some decent memory model with linear addressing, so you don't have to deal with memory segmentation.

One more example, for 3D array with zero-based indices:

  vector = array[0..99] of byte;
  matrix = array[0..77] of vector;
  world = array[0..11] of matrix;

  w : world;

...can be represented as 1D array

w : array[0..(100*78*12)-1] of byte;

...which in asm is translated like this

w db (100*78*12) dup (?)

...and when you are accessing matrix with index m, vector with index v inside that matrix, and byte with index i inside that vector you have

w[m*(78*100) + v*100 + i]

The same rules apply as for 2D arrays:
if some index is not zero based - subtract its lower bound from it
if the data size is not 1 byte - multiply the whole expression by the data size.
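
The 3D formula can be checked the same way with a few lines of C (a sketch; the enum and function names are illustrative only):

```c
#include <stddef.h>

enum { MATRICES = 12, VECTORS = 78, BYTES = 100 };

/* 1D index of byte i in vector v of matrix m, all indices zero-based. */
size_t world_index(size_t m, size_t v, size_t i) {
    return m * (VECTORS * BYTES) + v * BYTES + i;
}
```

Matrix 1 starts at index 7800, and the last byte w[11,77,99] is at index 93599, one less than 100*78*12.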

Hope this helps.

Expert Comment

ID: 6173961
A string in Pascal always has capacity 255 unless declared otherwise ( string[n] ).
To assign 'Hello' to s (s is a variable with reserved space),
simply copy 'Hello' to the offset of s.
Another tip: the first byte of a Pascal string ( s[0] ) is its length.
For example, 'hello' in memory is chr(5)+'hello'.

As for arrays of 2 dimensions or more...
do the same as you do for 1-dimensional arrays, only multiply the first index by the row length:
a[10] is at offset a+10
a[10,5] is at offset a+10*n+5, where n is the number of elements in one row

I hope I helped
  Erez Sh.

Expert Comment

ID: 6173970
In example of assigning a single char at given index in string, it should be

mov   si, OFFSET x_LENGTH

instead of

mov   si, OFFSET x_DATA

because Pascal (at least BP/Delphi) allows access to element 0 just like any other character in a string.

If you use any of this, please make this correction.
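
In C terms the correction amounts to indexing from the length byte, so that index 0 is the length and index 1 is the first character. This is a sketch with a made-up function name:

```c
#include <stdint.h>

/* x points at the length byte (x_LENGTH), not at the first character. */
void pstring_set_char(uint8_t *x, uint8_t y, uint8_t c) {
    x[y] = c;  /* y = 0 changes the length, y = 1 the first character */
}
```

With this base address, x[3]:='c' changes the third character, and x[0] can be written like any other element.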


Author Comment

ID: 6182763
Thanx to you all!!!

Now I know how to deal with this. =)
