How to write a scanner program for a compiler?

Posted on 2004-11-25
Last Modified: 2006-11-17
I have this simple program saved in a text file:
{ Sample program
  in TINY language -
  computes factorial
}
read x; { input an integer }
if 0 < x then { don't compute if x <= 0 }
  fact := 1;
  repeat
    fact := fact * x;
    x := x - 1
  until x = 0;
  write fact { output factorial of x }
end

I need to figure out how to write a program that will read this file, pick out the tokens, and then write the tokens only to another text file.  Please, can anyone give me some insight on how to even start this project!!!
Question by:morales7_0
    LVL 55

    Expert Comment

    by:Jaime Olivares
    It's a long question. You have to learn a bit about parsing techniques first.
    Have a look at these articles:
    String splitting:
    Simple math parser:
    Math parser:
    Advanced C++ parser:

    LVL 11

    Expert Comment

    For the simple project above you can use lex.

    LVL 39

    Expert Comment

    Do you mean that?

    #include <vector>
    #include <string>
    #include <iostream>
    #include <fstream>

    using namespace std;

    int main(int nArgs, char* pszArgs[])
    {
          vector<string> tokens;
          string token;
          ifstream ifs("input.txt");
          // Read whitespace-separated tokens until extraction fails (EOF or error).
          while (ifs >> token)
               tokens.push_back(token);

          // Write one token per line.
          ofstream ofs("output.txt");
          for (size_t i = 0; i < tokens.size(); ++i)
               ofs << tokens[i] << endl;
          return 0;
    }
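    One caveat with splitting on whitespace alone: punctuation that touches a word stays attached to it, and comment text is split like any other text. The small sketch below isolates the same stream extraction in a helper so the effect is easy to see. The function name `splitOnWhitespace` is made up for illustration.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Split a line on whitespace only, the way operator>> on a stream does.
// Nothing here knows about TINY punctuation or comments.
std::vector<std::string> splitOnWhitespace(const std::string& line) {
    std::istringstream in(line);
    std::vector<std::string> pieces;
    std::string piece;
    while (in >> piece) {
        pieces.push_back(piece);
    }
    return pieces;
}
```

    For the input "read x;" this returns two pieces, "read" and "x;", so the semicolon is not a token of its own. A scanner for TINY has to look at individual characters instead.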

    Regards, Alex
    LVL 22

    Expert Comment

    First we need to know if you're supposed to write the scanner from scratch, or whether you are allowed to use any canned tokenizer tools, like lex, yacc, or token objects.

    If you have to write it from scratch, here's the basic outline:

       get next input character
       repeat
          if it is a left curly brace: keep getting characters until EOF or you find a right curly brace.
          if it is a digit: keep getting input characters until you get a non-digit; return the digits as kind = "number"
          if it is a letter: keep getting input characters until you get a non-letter; return the letters as kind = "word"
          if it is a space, tab, or end-of-line: keep getting characters until you get something that isn't a space, tab, or end-of-line.
          if it is a colon: get the next character; if it is an equals sign, return kind = "assignment operator", otherwise return "error"
          if it is anything else: return the character as kind = "operator"
          get next character
       until EOF
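    A minimal C++ sketch of that outline, under a few assumptions: the whole source is already in a string, and the `Token` struct and `scan` function are names invented here for illustration. It walks the text with an index rather than single-character reads, so the character that ends a number or word is looked at again instead of being skipped.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical token record; the kind names are taken from the outline.
struct Token {
    std::string kind;
    std::string text;
};

// Scan TINY source text into a list of tokens.
std::vector<Token> scan(const std::string& src) {
    std::vector<Token> tokens;
    const std::size_t n = src.size();
    std::size_t i = 0;
    while (i < n) {
        const unsigned char c = static_cast<unsigned char>(src[i]);
        if (c == '{') {
            // Comment: skip to the closing brace or the end of input.
            while (i < n && src[i] != '}') ++i;
            if (i < n) ++i;  // consume the '}'
        } else if (std::isdigit(c)) {
            const std::size_t start = i;
            while (i < n && std::isdigit(static_cast<unsigned char>(src[i]))) ++i;
            tokens.push_back({"number", src.substr(start, i - start)});
        } else if (std::isalpha(c)) {
            const std::size_t start = i;
            while (i < n && std::isalpha(static_cast<unsigned char>(src[i]))) ++i;
            tokens.push_back({"word", src.substr(start, i - start)});
        } else if (std::isspace(c)) {
            ++i;  // whitespace separates tokens but is not one itself
        } else if (c == ':') {
            if (i + 1 < n && src[i + 1] == '=') {
                tokens.push_back({"assignment operator", ":="});
                i += 2;
            } else {
                tokens.push_back({"error", ":"});
                ++i;
            }
        } else {
            // Anything else is treated as a one-character operator.
            tokens.push_back({"operator", std::string(1, src[i])});
            ++i;
        }
    }
    return tokens;
}
```

    To finish the assignment you would read the input file into a string, call `scan`, and write each token's text (and kind, if wanted) to the output file, one per line. Telling keywords such as `read` or `if` apart from identifiers could be done afterwards by looking up each "word" in a small table.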


    Author Comment

    I'm sorry, fellows: the program has to be written from scratch, and in C++. I should have written that in the details of my question. For now, the program only needs to recognize the tokens. The next program I have to write is the parser, and I will have to integrate this scanner with it.
    LVL 11

    Expert Comment

    LCC has an implementation of a scanner and a parser.

    It's in C, though, so you would have to read and understand it and then make a C++ version out of it.
    The source code is available from the website, but it's preferable to get the book too. The book is
    quite a nice read about practical compiler implementation, compared with theory books like the dragon book.
    LVL 22

    Accepted Solution

    From scratch, eh?  Then you could use my outline above, easily translated into C or C++ code. Less than a page of code.

