Basic Assembler Simulator help

Just wondering if anyone could point me to some resources or information on a problem I have. I have the Mano Computer System Architecture book, but I'm not very familiar with Assembly at all. I've scoured Google, and most sites I find either go into so much detail that I get lost, or direct me to a pay site where I have to buy a book. Anyway, the problem at hand:

I'm trying to write a very basic Assembler Simulator in C that takes input from a text file containing Assembly and produces output in the form of a binary object file containing the corresponding machine code image of the program.

This program must be able to handle the directives: ORG, END, DEC
If you need more information I can provide some.

I'm not asking to be handed a completed answer :) Though I wouldn't complain. I'm just looking for some help and a point in the right direction, as I do want to learn how to do this.

Thanks, I will award more points depending on the responses I get.

Writing an assembler is not too tricky, just a lot of work. Assembly consists of mnemonic instructions that have a corresponding binary representation.

A single mnemonic (e.g. DEC) usually has multiple different binary forms depending on the size and arguments of the instruction. For example:

DEC register
DEC [register]
DEC [register + displacement]

Start out by using the lex/yacc combination to generate a parser that accepts the different mnemonics and their various formats (size, addressing mode). You will need an instruction set reference to find out the combinations that are allowed for each instruction (and of course the list of instructions that your assembler should support).
AfterlifeAuthor Commented:
Do you know where I could find an example program that does what I'm trying to do, so I can use it for reference and learn from it? As it stands I'm fairly clueless in assembly.

All I know is that I need to be able to take an expression such as A = B + 9, change it into Assembly (the load, the add and the store), and then change that to hex.

This is a project some students across the U.S. have to do, and I'm clueless as to how to begin. I will severely up the points I'm offering for some serious help!

Thanks in advance.
Start by reading the lex/yacc documentation. These tools are invaluable for compiler construction in general.

The 'lex' (or newer 'flex') tool constructs lexical analyzers for mapping characters to tokens. The 'yacc' (or newer 'bison') tool can be used to specify a grammar using productions. These programs generate C code.

Forget about the code generation part for now. First get the input file into a data structure you can work with.

And, I imagine your professor/teacher has given you a grammar, instruction set reference and maybe some example test programs to start with.
AfterlifeAuthor Commented:
A grammar, I think, yes, but no test programs or anything. The prof has been fairly unhelpful with regards to this assignment.

Program must satisfy the following specific goals:
Support the directives: ORG, END, DEC.

Support symbolic address reference labels (must start with alphabetic character, length 1-3 alphanumeric characters, terminating in comma – ‘,’)

Support operation mnemonic syntax defined by Mano (Table 6.1, page 175) with hexadecimal object codes defined.

Support assembly instruction grammar, with or without symbolic address
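The label rule in the goals above is easy to check by hand. Here is a minimal sketch in C, assuming the label field is passed in with its trailing comma still attached; the function name is_label_field is made up for illustration and is not from Mano or the assignment:

```c
#include <ctype.h>
#include <string.h>

/* Returns 1 if `field` is a valid label field such as "A," or "LOP,":
 * 1-3 alphanumeric characters, the first one alphabetic, then a comma.
 * (is_label_field is a hypothetical helper name.) */
int is_label_field(const char *field)
{
    size_t len = strlen(field);
    size_t i;

    if (len < 2 || len > 4)              /* 1-3 name characters plus ',' */
        return 0;
    if (field[len - 1] != ',')           /* must terminate in a comma */
        return 0;
    if (!isalpha((unsigned char)field[0]))
        return 0;
    for (i = 1; i + 1 < len; i++)        /* remaining name characters */
        if (!isalnum((unsigned char)field[i]))
            return 0;
    return 1;
}
```

With this, "LOP," and "A1," pass, while "1A," (starts with a digit), "LOOP," (too long) and "A" (no comma) are rejected.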

I have no idea what this lex/yacc thing is. This assignment is "supposed" to be simple, though I'm a complete newbie where assembler is concerned, and the materials we have to work with focus more on computer architecture and logic/circuit design than on assembler...

I think all I'm supposed to do is read in a file containing something like A=B+3, convert this to Assembler and then from that to hex. I don't have to convert it to binary just yet; that's the 3rd part or something.

Awaiting your response.
AfterlifeAuthor Commented:
Oh, and it's not a compiler, it's a Simulator. I'm supposed to be faking an assembler in C.
Well, you do have to convert from a text input file to machine code (whether or not you print out hex values or dump them to a binary). Don't know what the "simulation" part is then, since this is what a real assembler does. Perhaps the Mano book (never seen it) defines some "demonstration" assembly language.

Anyway, I still think using lex/yacc is the quickest way to get the job done. Depends on what you're allowed to use of course.

The 'lex' thing is just a tool to recognize patterns, so you don't have to dissect your input file using hand-crafted C code (think <ctype.h>).
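For comparison, the hand-crafted route can stay fairly small for a line-oriented assembly language. A rough sketch in C, under these assumptions: fields are separated by whitespace, a label ends in a comma, and a '/' starts a comment (that is how I remember Mano's format, so check it against the book). The names struct fields and split_line are invented for this example, and the indirect-address symbol I is left out:

```c
#include <stdio.h>
#include <string.h>

struct fields {
    char label[16];     /* empty string when the line has no label */
    char mnemonic[16];
    char operand[16];   /* empty string when there is no operand   */
};

/* Split one source line into label, mnemonic and operand.
 * Returns the number of fields found (0 for a blank or comment line). */
int split_line(const char *line, struct fields *f)
{
    char buf[128];
    char tok[3][16] = { "", "", "" };
    char *slash;
    size_t len;
    int n, first = 0;

    f->label[0] = f->mnemonic[0] = f->operand[0] = '\0';

    snprintf(buf, sizeof buf, "%s", line);   /* work on a private copy */
    slash = strchr(buf, '/');                /* assumed comment marker */
    if (slash)
        *slash = '\0';

    n = sscanf(buf, "%15s %15s %15s", tok[0], tok[1], tok[2]);
    if (n <= 0)
        return 0;

    len = strlen(tok[0]);
    if (tok[0][len - 1] == ',') {            /* first field is a label */
        tok[0][len - 1] = '\0';
        strcpy(f->label, tok[0]);
        first = 1;
    }
    if (first < n)
        strcpy(f->mnemonic, tok[first]);
    if (first + 1 < n)
        strcpy(f->operand, tok[first + 1]);
    return n;
}
```

Calling split_line("LOP, LDA X /load", &f) fills in label "LOP", mnemonic "LDA" and operand "X".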

The flex manual:

The 'yacc' thing can recognize grammars.

The bison manual:

The 'flex' and 'bison' tools are simply improved versions of 'lex' and 'yacc'. They work the same way and are present on any Unix system. There are Windows versions too.

I am willing to help, but I am not sure what other advice to give you, since I have no clue what the intended assembly looks like, what the input file format (grammar) is and how sophisticated the whole simulator needs to be.
AfterlifeAuthor Commented:
Would you like me to send you the webpage with the instructions and stuff? I asked if we're allowed to use yacc or flex, and apparently that's way too advanced for this assignment.

Do you have an e-mail or an MSN/ICQ name/number? Maybe we can converse in real time or something.

You can post a link to the webpage (or upload it somewhere) so everyone can take a look at it.
Hi Afterlife,
    I think I know what u r after. Uneed to simulate an assembler in C which bassically will handle certain opcodes like LOAD, ADD, STORE etc and will also handle directives like ORG, DEC etc.
For this u need to understand a few data structures required for manipulating the Assembler instructions.
1) Symbol Table - Hold information relating to variables used in Assembly program.

There are multiple types of Assemblers that we can simulate based on requirements:
1) One Pass assembler
2) Two Pass assembler (most frequently used)
3) Macro processor (Additional to implement macro functionality)

If this is what you are looking for, I'll be glad to guide you. Please reply to this post saying so. I'll provide more details based on your queries.

Regards ,

AfterlifeAuthor Commented:
VBS, yeah, that's basically it...

The objective of this project:
To design and write an Assembler program that accepts input in the form of text files containing Assembly language program source codes (expressed for the given Instruction set architecture) and produces output in the form of a binary object file containing the corresponding machine code image of the program.  This text file will contain an assembly language program that conforms to the system and grammar of the assembly language introduced by Mano in Chapter 6.  

Your program must satisfy the following specific goals:
· Support the directives: ORG, END, DEC.
· Support symbolic address reference labels (must start with alphabetic character, length 1-3 alphanumeric characters, terminating in comma – ‘,’)
· Support operation mnemonic syntax defined by Mano (Table 6.1, page 175) with hexadecimal object codes defined.
· Support assembly instruction grammar, with or without symbolic address

The Assembler must produce a listing of the assembly language program and the corresponding allocation addresses and object codes.
It should also produce a listing of the Symbol Table used during the Assembly process. Both the listings should be printable (i.e. text)      
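One possible shape for a listing line, as a sketch only: a 3-digit hexadecimal address, the 4-digit hexadecimal object code, then the source text. The column layout and the name format_listing_line are my own assumptions; the assignment text does not prescribe a format.

```c
#include <stdio.h>

/* Format one listing line into `out`: address, object code, source text.
 * Returns what snprintf returns (the length the full line needs).
 * (Hypothetical helper; the column layout is an assumption.) */
int format_listing_line(char *out, size_t size,
                        unsigned addr, unsigned word, const char *source)
{
    return snprintf(out, size, "%03X  %04X  %s",
                    addr & 0xFFFu,      /* 12-bit address    */
                    word & 0xFFFFu,     /* 16-bit word       */
                    source);
}
```

The same function can feed fprintf for the printable listing file, and a similar one-line formatter would do for the symbol table listing.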

You must include a binary object file.  
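A minimal sketch of writing the object file in C, assuming 16-bit words stored high byte first. The byte order and any header (for example a load address) are not specified above, so treat both as assumptions to confirm; pack_word and write_words are made-up names.

```c
#include <stdint.h>
#include <stdio.h>

/* Split a 16-bit word into two bytes, high byte first (assumed order). */
void pack_word(uint16_t word, unsigned char out[2])
{
    out[0] = (unsigned char)(word >> 8);
    out[1] = (unsigned char)(word & 0xFF);
}

/* Write `count` words to an already opened binary stream.
 * Returns 0 on success, -1 if a write fails. */
int write_words(FILE *fp, const uint16_t *words, size_t count)
{
    size_t i;

    for (i = 0; i < count; i++) {
        unsigned char bytes[2];
        pack_word(words[i], bytes);
        if (fwrite(bytes, 1, 2, fp) != 2)
            return -1;
    }
    return 0;
}
```

Open the output with fopen(name, "wb") so that no newline translation happens on platforms that distinguish text and binary streams.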

Using the principle of required prior declaration, wherein a symbolic reference (such as an address label or variable name label) can be resolved immediately if it has already been defined, a single-pass assembler strategy will be adequate to translate the program into machine code.

One approach to parsing and resolving programming constructs is to fully expand the construct in Assembly language format, generating labels as required on the first pass through the code, then performing the second pass in order to fully resolve all address (branch point) labels.
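The first pass of such a two-pass scheme could look roughly like this in C. It only tracks the location counter and records label addresses; no code is generated. Assumptions: the ORG operand is hexadecimal (as I recall from Mano), every line other than ORG and END occupies exactly one word, and a line starting with '/' is a comment. The names struct sym and first_pass are invented for the sketch.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_SYMS 64

struct sym {
    char     name[4];    /* up to 3 characters plus '\0' */
    unsigned address;
};

/* First pass: walk the source, track the location counter (lc) and
 * record the address of every label. Returns the number of symbols. */
int first_pass(const char *lines[], int nlines, struct sym table[])
{
    unsigned lc = 0;
    int nsyms = 0;
    int i;

    for (i = 0; i < nlines; i++) {
        char a[16] = "", b[16] = "", c[16] = "";
        size_t len;

        if (sscanf(lines[i], "%15s %15s %15s", a, b, c) <= 0)
            continue;                          /* blank line */
        if (a[0] == '/')
            continue;                          /* comment-only line */

        len = strlen(a);
        if (a[len - 1] == ',') {               /* label field */
            if (len >= 2 && len <= 4 && nsyms < MAX_SYMS) {
                a[len - 1] = '\0';
                strcpy(table[nsyms].name, a);
                table[nsyms].address = lc;
                nsyms++;
            }
            lc++;                              /* labeled line uses one word */
            continue;
        }
        if (strcmp(a, "ORG") == 0) {           /* set the location counter */
            lc = (unsigned)strtoul(b, NULL, 16);
            continue;
        }
        if (strcmp(a, "END") == 0)
            break;
        lc++;                                  /* ordinary instruction */
    }
    return nsyms;
}
```

The second pass would then re-read the same lines and look each operand up in the finished table, which is what allows a label to be used before the line that defines it.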

In order to demonstrate and test your model, you must produce at least one non-trivial example assembly language program that your Assembler will correctly assemble into a target binary object code file. This object code file contains a loadable, binary executable image of your program, suitable for immediate loading and execution.

Program must implement all instructions to permit your assembler to fully support the machine model.

I honestly have no idea where to even begin. If you could supply at least some code for me to work from, or if you have fully functional code doing what is asked, then I can greatly bump up the points! I wouldn't just use your code though; I would use it to learn and implement my own way. But seeing as I'm strapped for time at the moment, I don't have time to sit down and learn this as slowly as I would like.
OK, you need to start off by defining a structure of opcodes, i.e. your Assembler statements (LOAD, STORE, etc.). This structure could be defined as:
struct opcodes {
     char opname[15];   /* mnemonic text */
     int opval;         /* numeric opcode value */
} optab[] = { {"LOAD", 0}, {"STORE", 1} /* , ...... */ };

In this way, declare and initialise the opcode table using your Mano table.
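Here is a sketch of what such a table and its lookup might look like with Mano-style mnemonics in place of the LOAD/STORE placeholders. The hexadecimal values are the basic computer's codes as I remember them, so verify every one against Table 6.1 before relying on it. The kind field and the name find_opcode are my additions, and the input-output instructions are left out for brevity.

```c
#include <string.h>

enum op_kind { OP_MEMREF, OP_FIXED };   /* needs an address, or not */

struct opcode {
    const char  *opname;
    unsigned     opval;   /* base code; a memory-reference op ORs in the address */
    enum op_kind kind;
};

/* Values quoted from memory of Mano's table: check them against the book. */
static const struct opcode optab[] = {
    {"AND", 0x0000, OP_MEMREF}, {"ADD", 0x1000, OP_MEMREF},
    {"LDA", 0x2000, OP_MEMREF}, {"STA", 0x3000, OP_MEMREF},
    {"BUN", 0x4000, OP_MEMREF}, {"BSA", 0x5000, OP_MEMREF},
    {"ISZ", 0x6000, OP_MEMREF},
    {"CLA", 0x7800, OP_FIXED},  {"CLE", 0x7400, OP_FIXED},
    {"CMA", 0x7200, OP_FIXED},  {"CME", 0x7100, OP_FIXED},
    {"CIR", 0x7080, OP_FIXED},  {"CIL", 0x7040, OP_FIXED},
    {"INC", 0x7020, OP_FIXED},  {"SPA", 0x7010, OP_FIXED},
    {"SNA", 0x7008, OP_FIXED},  {"SZA", 0x7004, OP_FIXED},
    {"SZE", 0x7002, OP_FIXED},  {"HLT", 0x7001, OP_FIXED}
};

/* Linear search is plenty for a table this small.
 * Returns a pointer to the entry, or NULL for an unknown mnemonic. */
const struct opcode *find_opcode(const char *name)
{
    size_t i;

    for (i = 0; i < sizeof optab / sizeof optab[0]; i++)
        if (strcmp(optab[i].opname, name) == 0)
            return &optab[i];
    return NULL;
}
```

A NULL result is the natural place to report the "opcode does not exist" error.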

Now for the symbol table:
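struct symtab {
     char id[4];    /* label name: up to 3 characters plus the terminating '\0' */
     int address;   /* address assigned to the label */
};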
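A minimal sketch of the two operations a symbol table needs, add and find, assuming a fixed-size array and label names of at most 3 characters. The capacity MAX_SYMBOLS and the names add_symbol and find_symbol are illustrative choices, not part of the assignment.

```c
#include <string.h>

#define MAX_SYMBOLS 100   /* arbitrary capacity for the sketch */

struct symtab {
    char id[4];           /* up to 3 characters plus '\0' */
    int  address;
};

static struct symtab symbols[MAX_SYMBOLS];
static int nsymbols = 0;

/* Returns the address stored for `id`, or -1 if it is not defined. */
int find_symbol(const char *id)
{
    int i;

    for (i = 0; i < nsymbols; i++)
        if (strcmp(symbols[i].id, id) == 0)
            return symbols[i].address;
    return -1;
}

/* Adds a symbol. Returns 0 on success, -1 if the name is empty or too
 * long, already defined, or the table is full. */
int add_symbol(const char *id, int address)
{
    size_t len = strlen(id);

    if (len < 1 || len > 3)
        return -1;
    if (nsymbols == MAX_SYMBOLS || find_symbol(id) >= 0)
        return -1;
    strcpy(symbols[nsymbols].id, id);
    symbols[nsymbols].address = address;
    nsymbols++;
    return 0;
}
```

Rejecting a duplicate in add_symbol catches a label that is defined twice, which is otherwise an easy error to miss.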

Step 1: In the main function, start your program by opening the source file. Parse this source file line by line.
Step 2: Perform syntax analysis by ensuring that the instruction on every line follows this syntax:

label: opcode <operand1> <operand2> <operand3> ....

To perform this syntax analysis you need to create a new function which takes the opcode as a parameter and verifies the number of operands. In this function, also validate the opcode itself: if it does not exist, print an error.

Step 3: After syntax checking, use the operands to make entries in the symtab structure: first the name of the operand and then the location of the operand, i.e. the line number.

Step 4: In Step 1, along with the source file, we will also open the target file. This is a new file, opened in output mode, which will contain the translated code. After Step 3 we write the translated object code for the scanned line to the target file.

Step 5: At the end of the source program file, you have the object code file ready to execute and the symbol table in memory as required. Both can be printed at any time.
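The steps above can be sketched in C roughly as follows. This is a cut-down illustration, not a full solution: it works on an in-memory array of lines instead of files, knows only four mnemonics (codes as I recall them from Mano, so verify them), stores each label with the location counter value at its definition, requires a label to be defined before it is used, and ignores comments and indirect addressing. All names here (assemble, memory, lookup_sym and so on) are made up for the example.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MEM_WORDS 4096
#define MAX_SYMS  64

struct sym { char name[4]; unsigned addr; };

static struct sym syms[MAX_SYMS];
static int nsyms;
static unsigned memory[MEM_WORDS];   /* assembled words, indexed by address */

/* Tiny opcode subset, just enough for the sketch. */
static const struct { const char *name; unsigned code; int memref; } ops[] = {
    {"LDA", 0x2000, 1}, {"ADD", 0x1000, 1}, {"STA", 0x3000, 1}, {"HLT", 0x7001, 0}
};

static int lookup_sym(const char *name)
{
    int i;
    for (i = 0; i < nsyms; i++)
        if (strcmp(syms[i].name, name) == 0)
            return (int)syms[i].addr;
    return -1;
}

/* Assemble `nlines` lines into memory[]. Returns the number of words
 * emitted, or -1 on the first error. */
int assemble(const char *lines[], int nlines)
{
    unsigned lc = 0;                 /* location counter */
    int emitted = 0, i;

    nsyms = 0;
    for (i = 0; i < nlines; i++) {   /* Step 1: go through the source */
        char t[3][16] = { "", "", "" };
        const char *mn, *arg;
        unsigned word;
        size_t len, j, nops = sizeof ops / sizeof ops[0];
        int k = 0;

        if (sscanf(lines[i], "%15s %15s %15s", t[0], t[1], t[2]) <= 0)
            continue;                /* blank line */

        len = strlen(t[0]);
        if (t[0][len - 1] == ',') {  /* Step 3: record the label */
            if (len < 2 || len > 4 || nsyms == MAX_SYMS)
                return -1;
            t[0][len - 1] = '\0';
            strcpy(syms[nsyms].name, t[0]);
            syms[nsyms++].addr = lc;
            k = 1;
        }
        mn = t[k];
        arg = t[k + 1];

        if (strcmp(mn, "ORG") == 0) {          /* hex operand assumed */
            lc = (unsigned)(strtoul(arg, NULL, 16) & 0xFFFu);
            continue;
        }
        if (strcmp(mn, "END") == 0)
            break;

        if (strcmp(mn, "DEC") == 0) {          /* signed decimal constant */
            word = (unsigned)strtol(arg, NULL, 10) & 0xFFFFu;
        } else {
            for (j = 0; j < nops; j++)
                if (strcmp(ops[j].name, mn) == 0)
                    break;
            if (j == nops)
                return -1;                     /* Step 2: unknown opcode */
            word = ops[j].code;
            if (ops[j].memref) {
                int a = lookup_sym(arg);
                if (a < 0)
                    return -1;                 /* label not declared yet */
                word |= (unsigned)a & 0xFFFu;
            }
        }
        memory[lc] = word;                     /* Step 4: emit object code */
        lc = (lc + 1) & 0xFFFu;
        emitted++;
    }
    return emitted;                            /* Step 5: tables are ready */
}
```

For example, the statement A = B + 9 could be fed in as the lines "ORG 100", "B, DEC 5", "NIN, DEC 9", "A, DEC 0", "LDA B", "ADD NIN", "STA A", "HLT", "END". The data comes first only because this sketch needs labels declared before use; a real program would also need execution to start at the LDA rather than at the data.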

This is the entire flow of a one-pass assembler, in short of course. If you have followed what I have said, please start implementing it. I will guide you further as your development begins. In case you are finding great difficulty, I will provide you with code for some of the important functions used in the one-pass assembler.


AfterlifeAuthor Commented:
Time is against me!

I'll try doing some of this tomorrow, though I have 5 classes and 5 finals which start in under 2 weeks, so getting this done in a week while studying for 5 finals is pressure!!!

If you could provide some more detailed instructions, examples, and maybe those functions, that would be great. I know this sounds greedy, but I'm really running out of time and this is only the first part of the project!

If you want, I can tell you the second part of the project; I'm not sure if part 1 is needed to implement it or not. I can also give you the Mano table. I have a scan of it, so if you need it I can post it.

My regards - Afterlife
AfterlifeAuthor Commented:
Thanks for the help, but I have not had the time to try any of the suggestions from VB.

I don't know whether I am to assign points no matter what, or if a mod can just close this topic.

Question has a verified solution.
