Basic Assembler Simulator help

Posted on 2003-11-03
Last Modified: 2011-09-20
Just wondering if anyone could point me to some resources or information on a problem I have. I have the Mano Computer System Architecture book, but I'm not very familiar with assembly at all. I've scoured Google, and most sites I find either go into so much detail that I get lost or direct me to a pay site where I have to buy a book. Anyway, the problem at hand:

I'm trying to write a very basic assembler simulator in C that takes input from a text file containing assembly and produces output in the form of a binary object file containing the corresponding machine code image of the program.

This program must be able to handle the directives: ORG, END, DEC
If you need more information I can provide some.

I'm not asking to be handed a completed answer :) Though I wouldn't complain. I'm just looking for some help and a point in the right direction, as I do want to learn how to do this.

Thanks. I will award more points depending on the responses I get.

Question by:Afterlife

Expert Comment

ID: 9678114
Writing an assembler is not too tricky, just a lot of work. Assembly consists of mnemonic instructions that have a corresponding binary representation.

A single mnemonic (e.g. DEC) usually has multiple different binary forms depending on the size and arguments of the instruction. For example:

DEC register
DEC [register]
DEC [register + displacement]

Start out by using the lex/yacc combination to generate a parser that accepts the different mnemonics and their various formats (size, addressing mode). You will need an instruction set reference to find out the combinations that are allowed for each instruction (and of course the list of instructions that your assembler should support).

Author Comment

ID: 9680608
Do you know where I could find an example program that does what I'm trying to do, so I can use it for reference and learn from it? As it stands I'm fairly clueless in assembly.

All I know is I need to be able to take an expression such as A = B + 9, change it into assembly with the loads, stores and so on, and then change that to hex.

This is a project some students across the U.S. have to do, and I'm clueless as to how to begin. I will severely up the points I'm offering for some serious help!

Thanks in advance.

Expert Comment

ID: 9680901
Start by reading the lex/yacc documentation. These tools are invaluable for compiler construction in general.

The 'lex' (or newer 'flex') tool constructs lexical analyzers for mapping characters to tokens. The 'yacc' (or newer 'bison') tool can be used to specify a grammar using productions. These programs generate C code.

Forget about the code generation part for now. First get the input file into a data structure you can work with.

And, I imagine your professor/teacher has given you a grammar, instruction set reference and maybe some example test programs to start with.

Author Comment

ID: 9682379
Grammar I think, yeah, but no test programs or anything. The prof has been fairly unhelpful with regard to this assignment.

Program must satisfy the following specific goals:
Support the directives: ORG, END, DEC.

Support symbolic address reference labels (must start with alphabetic character, length 1-3 alphanumeric characters, terminating in comma – ‘,’)

Support operation mnemonic syntax defined by Mano (Table 6.1, page 175) with hexadecimal object codes defined.

Support assembly instruction grammar, with or without symbolic address

I have no idea what this lex/yacc thing is. This assignment is "supposed" to be simple, though I'm a complete newb where assembler is concerned, and the materials we have to work with focus more on computer architecture and logic/circuit design, not assembler...

I think all I'm supposed to do is read in a file containing, say, A=B+3, convert this to assembler, and then convert that to hex. I don't have to convert it to binary just yet; that's like the 3rd part or something.

Awaiting your response.

Author Comment

ID: 9682389
Oh, and it's not a compiler, it's a simulator. I'm supposed to be faking an assembler in C.

Expert Comment

ID: 9682813
Well, you do have to convert from a text input file to machine code (whether or not you print out hex values or dump them to a binary). Don't know what the "simulation" part is then, since this is what a real assembler does. Perhaps the Mano book (never seen it) defines some "demonstration" assembly language.

Anyway, I still think using lex/yacc is the quickest way to get the job done. Depends on what you're allowed to use of course.

The 'lex' thing is just a tool to recognize patterns, so you don't have to dissect your input file using hand-crafted C code (think <ctype.h>).
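If lex turns out not to be allowed, the hand-crafted route might look roughly like the sketch below. It assumes the line layout used in the Mano book as I understand it: an optional label ending in a comma, a mnemonic, an optional operand, an optional I for indirect addressing, and a comment starting with '/'. The struct, the field sizes and the function names are made up for this example.

```c
#include <ctype.h>
#include <string.h>

/* Fields of one source line; the sizes are arbitrary choices for this sketch. */
struct src_line {
    char label[8];     /* without the trailing comma, "" if absent */
    char mnemonic[8];  /* "" for a blank or comment-only line */
    char operand[16];  /* "" if absent */
    int  indirect;     /* 1 if an I token follows the operand */
};

/* Copy the next whitespace-delimited token from *p into out (size n),
   stopping at a '/' comment, and advance *p. Returns the stored length. */
static size_t next_token(const char **p, char *out, size_t n)
{
    size_t len = 0;
    while (isspace((unsigned char)**p)) (*p)++;
    while (**p && !isspace((unsigned char)**p) && **p != '/') {
        if (len + 1 < n) out[len++] = **p;
        (*p)++;
    }
    out[len] = '\0';
    return len;
}

/* Bounded string copy that always terminates dst. */
static void copy_field(char *dst, size_t n, const char *src)
{
    size_t len = strlen(src);
    if (len >= n) len = n - 1;
    memcpy(dst, src, len);
    dst[len] = '\0';
}

/* Split one line of source text into its fields. */
void split_line(const char *text, struct src_line *out)
{
    char tok[16];
    size_t len;

    memset(out, 0, sizeof *out);
    len = next_token(&text, tok, sizeof tok);
    if (len > 0 && tok[len - 1] == ',') {      /* first token is a label */
        tok[len - 1] = '\0';
        copy_field(out->label, sizeof out->label, tok);
        next_token(&text, tok, sizeof tok);
    }
    copy_field(out->mnemonic, sizeof out->mnemonic, tok);
    next_token(&text, out->operand, sizeof out->operand);
    next_token(&text, tok, sizeof tok);
    out->indirect = (strcmp(tok, "I") == 0);
}
```

Calling split_line("MIN, DEC 83", &line) would fill in label "MIN", mnemonic "DEC" and operand "83". Error reporting is left out to keep the sketch short.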

The flex manual:

The 'yacc' thing can recognize grammars.

The bison manual:

The 'flex' and 'bison' tools are simply improved versions of 'lex' and 'yacc'. They work the same way and are available on most Unix systems. There are Windows versions too.

I am willing to help, but I am not sure what other advice to give you, since I have no clue what the intended assembly looks like, what the input file format (grammar) is and how sophisticated the whole simulator needs to be.

Author Comment

ID: 9683631
Would you like me to send you the webpage with the instructions and stuff? I asked if we're allowed to use yacc or flex, and apparently that's way too advanced for this assignment.

Do you have an e-mail or MSN/ICQ name/number? Maybe we can converse in real time or something.


Expert Comment

ID: 9686033
You can post a link to the webpage (or upload it somewhere) so everyone can take a look at it.

Expert Comment

ID: 9770311
Hi Afterlife,
    I think I know what you are after. You need to simulate an assembler in C which basically will handle certain opcodes like LOAD, ADD, STORE etc. and will also handle directives like ORG, DEC etc.
For this you need to understand a few data structures required for manipulating the assembler instructions.
1) Symbol Table - Holds information relating to variables used in the assembly program.

There are multiple types of assemblers that we can simulate based on requirements:
1) One Pass assembler
2) Two Pass assembler (most frequently used)
3) Macro processor (Additional to implement macro functionality)

If this is what you are looking for I'll be glad to guide you. Please reply to this post saying so. I'll provide more details based on your queries.

Regards,


Author Comment

ID: 9799498
VBS, yeah, that's basically it...

The objective of this project:
To design and write an Assembler program that accepts input in the form of text files containing Assembly language program source codes (expressed for the given Instruction set architecture) and produces output in the form of a binary object file containing the corresponding machine code image of the program.  This text file will contain an assembly language program that conforms to the system and grammar of the assembly language introduced by Mano in Chapter 6.  

Your program must satisfy the following specific goals:
· Support the directives: ORG, END, DEC.
· Support symbolic address reference labels (must start with alphabetic character, length 1-3 alphanumeric characters, terminating in comma – ‘,’)
· Support operation mnemonic syntax defined by Mano (Table 6.1, page 175) with hexadecimal object codes defined.
· Support assembly instruction grammar, with or without symbolic address
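The label rule in the second goal is small enough to check by hand. A minimal sketch, assuming the label token is passed in with its trailing comma still attached; the function name is_valid_label is invented for this example.

```c
#include <ctype.h>
#include <string.h>

/* Return 1 if tok is a valid label field such as "LOP,": a letter,
   then up to two more letters or digits, then a terminating comma. */
int is_valid_label(const char *tok)
{
    size_t len = strlen(tok);
    size_t i;

    if (len < 2 || len > 4) return 0;          /* 1-3 characters plus ',' */
    if (tok[len - 1] != ',') return 0;
    if (!isalpha((unsigned char)tok[0])) return 0;
    for (i = 1; i + 1 < len; i++)
        if (!isalnum((unsigned char)tok[i])) return 0;
    return 1;
}
```

With this, "A," and "X1," are accepted, while "1AB," (starts with a digit), "LOOP," (four characters) and "LOP" (no comma) are rejected.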

The Assembler must produce a listing of the assembly language program and the corresponding allocation addresses and object codes.
It should also produce a listing of the Symbol Table used during the Assembly process. Both the listings should be printable (i.e. text)      

You must include a binary object file.  

Using the principle of required prior declaration, wherein a symbolic reference (such as an address label or variable name label) can be resolved immediately if it has already been defined, a single-pass assembler strategy will be adequate to translate the program into machine code.

One approach to parsing and resolving programming constructs is to fully expand the construct in Assembly language format, generating labels as required on the first pass through the code, then performing the second pass in order to fully resolve all address (branch point) labels.
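The first pass of the two-pass approach could be sketched as below. This is only an illustration under a few assumptions: the source has already been split into label, mnemonic and operand strings, the operand of ORG is a hexadecimal address (as I understand the Mano convention), and every line other than ORG and END occupies exactly one memory word. The struct and function names are invented here.

```c
#include <stdlib.h>
#include <string.h>

struct line   { const char *label, *mnemonic, *operand; };  /* "" = absent */
struct symbol { char name[4]; unsigned address; };

/* Pass 1: assign an address to every label. Returns the number of symbols
   stored in table, or -1 if the table is too small. */
int pass1(const struct line *src, int nlines, struct symbol *table, int max)
{
    unsigned lc = 0;            /* location counter */
    int count = 0, i;

    for (i = 0; i < nlines; i++) {
        if (strcmp(src[i].mnemonic, "ORG") == 0) {
            lc = (unsigned)strtoul(src[i].operand, NULL, 16);
            continue;           /* ORG itself occupies no memory word */
        }
        if (strcmp(src[i].mnemonic, "END") == 0)
            break;
        if (src[i].label[0] != '\0') {
            if (count == max) return -1;
            strncpy(table[count].name, src[i].label, 3);
            table[count].name[3] = '\0';
            table[count].address = lc;
            count++;
        }
        lc++;                   /* every other line fills one word */
    }
    return count;
}
```

The second pass would then walk the same lines again, look each operand up in the table, and emit the object code. Under the prior-declaration rule mentioned above, both steps can be merged into a single pass because every symbol is already in the table when it is referenced.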

In order to demonstrate and test your model, you must produce at least one non-trivial example assembly language program that your Assembler will correctly assemble into a target binary object code file. This object code file contains a loadable, binary executable image of your program, suitable for immediate loading and execution.

Program must implement all instructions to permit your assembler to fully support the machine model.

I honestly have no idea where to even begin. If you could supply at least some code for me to work from, or if you have fully functional code doing what is asked, then I can greatly bump up the points! I wouldn't just use your code though; I would use it to learn and implement my own way, but seeing as I'm strapped for time at the moment, I don't have time to sit down and learn this as slowly as I would like.

Accepted Solution

vbs03 earned 125 total points
ID: 9801834
OK, you need to start off by defining a structure of opcodes, i.e. your assembler statements (LOAD, STORE etc.). This structure could be defined as:
struct opcodes {
     char opname[15];   /* mnemonic text */
     int opval;         /* corresponding object code */
} optab[] = { {"LOAD", 0}, {"STORE", 1}, /* ...... */ };

In this way declare and initialise the opcode table using your Mano table.
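As a rough illustration of such a table with a lookup function, here is a sketch that uses the mnemonics and hexadecimal codes of Mano's basic computer as I remember them rather than the LOAD/STORE placeholders. Please check every value against Table 6.1 before relying on it. The has_operand field and the names find_opcode and encode are additions made for this example.

```c
#include <string.h>

struct opcode {
    const char *name;
    unsigned    value;        /* 16-bit object code with address bits zero */
    int         has_operand;  /* 1 for memory-reference instructions */
};

/* Codes quoted from memory; verify against Table 6.1. */
static const struct opcode optab[] = {
    {"AND", 0x0000, 1}, {"ADD", 0x1000, 1}, {"LDA", 0x2000, 1},
    {"STA", 0x3000, 1}, {"BUN", 0x4000, 1}, {"BSA", 0x5000, 1},
    {"ISZ", 0x6000, 1},
    {"CLA", 0x7800, 0}, {"CLE", 0x7400, 0}, {"CMA", 0x7200, 0},
    {"CME", 0x7100, 0}, {"CIR", 0x7080, 0}, {"CIL", 0x7040, 0},
    {"INC", 0x7020, 0}, {"SPA", 0x7010, 0}, {"SNA", 0x7008, 0},
    {"SZA", 0x7004, 0}, {"SZE", 0x7002, 0}, {"HLT", 0x7001, 0},
    {"INP", 0xF800, 0}, {"OUT", 0xF400, 0}, {"SKI", 0xF200, 0},
    {"SKO", 0xF100, 0}, {"ION", 0xF080, 0}, {"IOF", 0xF040, 0},
};

/* Linear search is plenty for a table this small. Returns NULL if the
   mnemonic is unknown. */
const struct opcode *find_opcode(const char *name)
{
    size_t i;
    for (i = 0; i < sizeof optab / sizeof optab[0]; i++)
        if (strcmp(optab[i].name, name) == 0)
            return &optab[i];
    return NULL;
}

/* Combine opcode, 12-bit address and the indirect bit into one word. */
unsigned encode(const struct opcode *op, unsigned address, int indirect)
{
    if (!op->has_operand)
        return op->value;
    return op->value | (address & 0x0FFFu) | (indirect ? 0x8000u : 0u);
}
```

For example, encode(find_opcode("LDA"), 0x107, 0) gives 0x2107, and the same instruction with the indirect bit set gives 0xA107.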

Now for the symbol table:
struct symtab {
     char id[4];     /* label: up to 3 characters plus the terminating '\0' */
     int address;
};
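A minimal sketch of how such a symbol table might be used, assuming the id field is wide enough to hold a 3-character label. The fixed capacity and the function names sym_find, sym_add and sym_print are choices made for this example only.

```c
#include <stdio.h>
#include <string.h>

#define MAX_SYMBOLS 64   /* arbitrary capacity for this sketch */

struct symtab {
    char id[4];      /* label, up to 3 characters plus '\0' */
    int  address;
};

static struct symtab symbols[MAX_SYMBOLS];
static int nsymbols = 0;

/* Return the address of id, or -1 if it has not been defined. */
int sym_find(const char *id)
{
    int i;
    for (i = 0; i < nsymbols; i++)
        if (strcmp(symbols[i].id, id) == 0)
            return symbols[i].address;
    return -1;
}

/* Define id at address. Returns 0 on success, or -1 on a duplicate,
   an over-long name, or a full table. */
int sym_add(const char *id, int address)
{
    if (strlen(id) > 3 || nsymbols == MAX_SYMBOLS || sym_find(id) != -1)
        return -1;
    strcpy(symbols[nsymbols].id, id);
    symbols[nsymbols].address = address;
    nsymbols++;
    return 0;
}

/* Write the symbol table listing as text, one symbol per line. */
void sym_print(FILE *out)
{
    int i;
    for (i = 0; i < nsymbols; i++)
        fprintf(out, "%-3s  %03X\n", symbols[i].id,
                (unsigned)symbols[i].address);
}
```

Rejecting duplicates in sym_add is a design choice: a label defined twice is almost certainly a mistake in the source program, so it is better reported than silently overwritten.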

Step 1: Then in the main function start your program by opening the source file. Parse this source file line by line.
Step 2: Perform syntax analysis by ensuring that the instruction on each line follows this syntax:

label, opcode <operand1> <operand2> <operand3> ....

To perform the syntax analysis you need to create a new function which takes the opcode as a parameter and verifies the number of operands. This function should also print an error if the opcode does not exist.
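The checking function described here could be sketched like this. The table lists only a handful of mnemonics and directives to keep it short, and both the expected operand counts and the name check_statement are assumptions made for the example.

```c
#include <stdio.h>
#include <string.h>

struct arity { const char *name; int operands; };

/* Illustrative subset only; a real table would list every mnemonic. */
static const struct arity arity_tab[] = {
    {"LDA", 1}, {"STA", 1}, {"ADD", 1}, {"CMA", 0}, {"INC", 0}, {"HLT", 0},
    {"ORG", 1}, {"DEC", 1}, {"END", 0},
};

/* Return 0 if the statement is well formed, -1 otherwise (after printing
   a message that names the offending source line). */
int check_statement(const char *mnemonic, int noperands, int lineno)
{
    size_t i;
    for (i = 0; i < sizeof arity_tab / sizeof arity_tab[0]; i++) {
        if (strcmp(arity_tab[i].name, mnemonic) != 0)
            continue;
        if (arity_tab[i].operands == noperands)
            return 0;
        fprintf(stderr, "line %d: %s expects %d operand(s), got %d\n",
                lineno, mnemonic, arity_tab[i].operands, noperands);
        return -1;
    }
    fprintf(stderr, "line %d: unknown opcode %s\n", lineno, mnemonic);
    return -1;
}
```

In a full assembler the operand count would more naturally live in the opcode table itself, so that the mnemonic is looked up only once per line.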

Step 3: After syntax checking, enter the operands into the symtab structure: first the name of the operand, then the location of the operand, i.e. the line number.

Step 4: In Step 1, along with the source file, we will also open the target file. This is a new file opened in output mode which will contain the translated code. After Step 3 we will write the translated object code for the scanned line into the target file.
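Writing a translated word to the target file might look like the sketch below. It assumes 16-bit words stored high byte first with no file header, which is a guess because the object file layout is not specified in this thread. It also echoes each word to a text listing and converts a DEC operand to a 16-bit two's-complement value. The function names are invented for the example.

```c
#include <stdio.h>
#include <stdlib.h>

/* Append one 16-bit word to the object file, high byte first, and echo
   address, word and source text to the listing. Returns 0 on success. */
int emit_word(FILE *obj, FILE *listing, unsigned address, unsigned word,
              const char *source_text)
{
    unsigned char bytes[2];

    bytes[0] = (unsigned char)((word >> 8) & 0xFF);
    bytes[1] = (unsigned char)(word & 0xFF);
    if (fwrite(bytes, 1, 2, obj) != 2)
        return -1;
    if (fprintf(listing, "%03X  %04X  %s\n",
                address & 0xFFFu, word & 0xFFFFu, source_text) < 0)
        return -1;
    return 0;
}

/* Convert the operand of a DEC directive to a 16-bit two's-complement
   word. Range checking is omitted in this sketch. */
unsigned dec_to_word(const char *operand)
{
    long v = strtol(operand, NULL, 10);
    return (unsigned)((unsigned long)v & 0xFFFFul);
}
```

The object file should be opened in binary mode, for example fopen(name, "wb"), so that no byte is altered on platforms that translate line endings. As a check on the conversion, DEC 83 becomes 0053 and DEC -23 becomes FFE9.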

Step 5: At the end of the source program file, you have the object code file ready to execute and the symbol table in memory as required. Both can be printed at any time.

This is the entire flow of a one-pass assembler, in short of course. If you have followed what I have said, please start implementing it. I will guide you further as your development begins. If you are finding great difficulty, I will provide you with code for some of the important functions used in the one-pass assembler.


Author Comment

ID: 9801865
Time is against me!

I'll try doing some of this tomorrow, though I have 5 classes and 5 finals which start in under 2 weeks, so getting this done in a week and studying for 5 finals is pressure!!!

If you could provide some more detailed instructions, examples, and maybe those functions, that would be great. I know this sounds greedy, but I'm really running out of time and this is only the first part of the project!

If you want I can tell you the second part of the project; I'm not sure if part 1 is needed to implement it or not. I can also give you the Mano table. I have a scan of it, so if you need it I can post it.

My regards - Afterlife

Author Comment

ID: 9832202
Thanks for the help, but I have not had the time to try any of the suggestions from VBS.

I don't know whether I am to assign points no matter what, or if a mod can just close this topic.

