laeuchli

Looking in a file.

Hi, I have a problem. I am trying to write a program that looks through a .c file and reads some C code, such as ifs and voids. My first step was to write a function that looks on the first line of the file for a word and, if it is not there, goes on to the next line. I can't get it to work! Does anyone know anything about this kind of program? Will someone give me a hand? Did I start out right, and if so, will someone help me write the function? Thanks.
P.S. There will be more points in a while; this is just to get it started.
VEngineer


I can help you out if you use C++ to parse the C file. The C++ I/O library is more powerful than C's get and put functions.
ozo
How did you write it, and what doesn't work?
What exactly does not work in your program? Are you unable to read from the file or to search for text on a single line, or is your program confused by the program's structure, reporting ifs although they aren't there?
Do you really need it in C or C++?
It would be much simpler to implement it in a language designed for text processing, like Perl or awk.
Your approach is wrong.  C is not an ideal language to write parsers in.  Neither is C++.
However, there are tools that integrate into a C/C++ environment and do most of the work for you.  What you need is LEX (or its variants, like flex, etc.).  There are several implementations available free of charge for any platform.

C perhaps is not the best choice but it should work and sometimes there is a reason to use it.

Norbert, LEX produces C code (note that I did not suggest AWK or Perl).

laeuchli
ASKER
I am using C++, VEngineer. I don't need to support all the parts of C. I would rather use the compiler than an add-on, but if the compiler part is too hard...
Thanks for all the comments.
Alexo, lex is a powerful tool, but I think for simple things it is a little bit overdone.
If you know C and nothing about LEX, you first have to learn a new language, and I think for simple parsers you will be finished using plain C before you have finished learning LEX and writing your parser.
Perhaps the parser written in C will not cover all syntax possibilities (remember 'Confusing.C'), and perhaps it is not good enough for a commercial tool.

laeuchli, what do you really want to do? Perhaps you should send some source code.


Since you say are using C++,
here's how you would start writing a search function
using the compiler's standard string library (assuming you have a newer version of the compiler, Visual 5, or Borland 5, or the latest g++ ).  If you have an older version, you can use your own string lib, download the standard library, or use char* objects as strings.

#include <iostream>
#include <fstream>
#include <string>

using namespace std;

// note: if you are using the standard lib, there is no .h
// otherwise you will be loading an older library

// here is a search function that counts the occurrences of a certain word, 'keyword', and returns that count as an int


int countSearch(string keyword) {

   // get a filename
   cout << "Enter filename: " << flush;
   string filename;
   cin >> filename;

   // define the infile stream

   ifstream fin( filename.c_str() );

   int count = 0;
   string temp;
 
   // while you can read in temp (while not end of file)
   // check to see if the words match

   while (fin >> temp) {
      if (temp == keyword)
         ++count;
   }

   return count;
}

Hopefully this will get you started.
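
For the original goal (check the first line for a word and, if it is not there, go on to the next line), a line-oriented variant might look like the sketch below. It assumes a "word" is anything separated by whitespace. `findLineWithWord` is a name made up for illustration, and it takes any `std::istream` so it can be tried without a file.

```cpp
#include <istream>
#include <sstream>
#include <string>

// Returns the 1-based number of the first line that contains `keyword`
// as a whitespace-separated word, or 0 if no line contains it.
// (Hypothetical helper, not part of the code above.)
int findLineWithWord(std::istream& in, const std::string& keyword) {
   std::string line;
   int lineNo = 0;

   // read one line at a time; stop as soon as a line matches
   while (std::getline(in, line)) {
      ++lineNo;
      std::istringstream words(line);
      std::string word;
      while (words >> word) {
         if (word == keyword)
            return lineNo;
      }
   }
   return 0;
}
```

With a real file you would pass an `std::ifstream`. Note that `if(` counts as a different word from `if` here, because only whitespace separates words.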

Thanks for the answer, VEngineer, but I am not sure this will hack it. The problems are: if you look through a file for something, it won't stop looking until the end, which will not work. Also, what about blank spaces? I don't know how much of a problem they cause, but I have always tried to make sure the compiler discounts them. Should I maybe post my code? Unless you have another function up your sleeve, you could help me get the bugs out of that.
  If I do want to use lex, where can I get it?

lex and yacc are part of all UNIXs (or their compiler bundles),
flex and bison (the GNU counterparts):  http://www.gnu.org
You don't need YACC, it's for writing compilers.  LEX (or the GNU equivalent, FLEX, as ahoffman noted) is perfectly suitable.
If you use a Windows platform, go to http://www.cygnus.com/misc/gnu-win32/
If you use UNIX etc. go to http://www.gnu.org

Thanks. I will see if VEngineer has anything to say, and if not I will try your method.
>> Thanks. I will see if VEngineer has anything to say, and if not I will try your method.
EE, points and other stuff aside, a good programmer needs a good set of tools.
I suggest you expand your toolkit with LEX and AWK (or derivatives).
/*  Name of the file:      pars.c
Author                  :      Bhavani P Polimetla
Aim                  :      find the word in the given file.
                                    if it exists it returns 1 else returns 0
Date                  :      26/03/98   */

#include <string.h>
#include <stdio.h>
#include <memory.h>

int findword(char*,char*);
            
int main(void)
{
      int flag=0;

      
      flag = findword("hai.c","bfl");

      if(flag)
            printf("word found");
      else
            printf("word not found");

      getchar();
      return 0;

}  // end of program

int findword(char* filename, char* word)
{      
      
      char line[250],string2[250];      
      // separetares to space the tokens
      char seps[]   = "\t\n ";
      char *token1;
      FILE *hppdos=NULL;      


      // open given file to read data
      hppdos = fopen(filename,"r");      
      if(hppdos == NULL)
      {
            printf("Failed to open %s",filename);
            return 0;
      }

      // check the given line is blank or not
      while( !feof(hppdos ) )
      {
            memset(string2,'\0',strlen(string2));
            memset(line,'\0',strlen(line));
            if( fgets( string2, 250, hppdos ) != NULL)  
            {
                  strcpy(line,string2);
                  token1 = strtok( line, seps );      

                  if(token1 != NULL)
                  {
                        if (strcmp( token1,word) == 0)
                        {
                              _fcloseall();
                              return 1;                        
                        }
                        else
                        {
                              while (token1 !=NULL)
                              {
                                    if ( strcmp( token1,word) == 0)
                                    {
                                          _fcloseall();
                                          return 1;
                                    }
                                    token1 = strtok( NULL, seps );
                              }
                        }
                  }

            }  // if
      }  // while
      
      _fcloseall();
    return 0;
} // end of function


I think you can use the function findword() (given above)
to check for the given word in a given file. If the word exists
it returns 1, else it returns 0.
If you want to check only the first line, change the above code.
 
>>memset(string2,'\0',strlen(string2));
>>memset(line,'\0',strlen(line));
string2 and line are on the stack and their contents are not defined.
strlen searches for the first occurrence of '\0',
which may or may not be inside string2/line.
This is very dangerous because it may corrupt some memory on the stack.
Better use
memset(string2,'\0',sizeof(string2));
memset(line,'\0',sizeof(line));
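
To illustrate the point: `sizeof` applied to an array gives its full size at compile time, no matter what the array currently contains, whereas `strlen` has to read the contents. A small sketch (the function name `clearedBufferIsZero` is invented for this example):

```cpp
#include <cstddef>
#include <cstring>

// Clears a local buffer the safe way and reports whether every byte is zero.
// (Hypothetical demo function.)
bool clearedBufferIsZero() {
   char line[250];

   // sizeof(line) is always 250 here; strlen(line) would read
   // uninitialized memory and could return anything
   std::memset(line, '\0', sizeof(line));

   for (std::size_t i = 0; i < sizeof(line); ++i) {
      if (line[i] != '\0')
         return false;
   }
   return true;
}
```

Writing `char line[250] = {0};` at the declaration has the same effect without a separate call.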

>> The problems are,if you look through a file for something, it won't stop looking until the end, a thing that will not work.

Do you want it to stop and return "true" as soon as it finds the word?  If so, that is a quick fix.  Maybe I'm still not clear on the exact purpose of your program.

As for spaces, all they do is separate the words, and fin/cin does not count them if you use the code above.
If you declare char temp, fin >> temp will read one character at a time; it still skips spaces unless you turn that off (fin >> noskipws) or read with fin.get(temp). If you declare string temp, it will read one word at a time automatically, using spaces and newline characters as word separators only.
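
A quick way to see the difference is to run both kinds of reading over the same text. The sketch below uses an in-memory `std::istringstream` in place of the file stream; `countWords` and `countChars` are illustrative names only.

```cpp
#include <sstream>
#include <string>

// Counts whitespace-separated words, the way `fin >> temp` does with a string.
int countWords(const std::string& text) {
   std::istringstream in(text);
   std::string word;
   int n = 0;
   while (in >> word)
      ++n;
   return n;
}

// Counts characters read one at a time with get(). Unlike `in >> c`,
// get() does not skip whitespace, so spaces and newlines are included.
int countChars(const std::string& text) {
   std::istringstream in(text);
   char c;
   int n = 0;
   while (in.get(c))
      ++n;
   return n;
}
```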

Ok, example, this program would return true as soon as you find the word:

#include <iostream>
#include <fstream>
#include <string>

using namespace std;

bool found(string keyword) {
   // get a filename
   cout << "Enter filename: " << flush;
   string filename;
   cin >> filename;

   // define the infile stream
   ifstream fin( filename.c_str() );

   string temp;
 
   while (fin >> temp) {
      // if found, return true, ending the function immediately
      if (temp == keyword)
         return true;
   }

   // otherwise return false, indicating not found
   return false;
}
I still don't think so. I am trying to parse C code. It is easy to write functions that look in a file to find a string. I am trying to write a parser for C.
Go ahead and dump my answer.  I think the others here may have more experience than I do in straight C and tools and could answer your question better.

If I understood the problem, I think I have found the solution.
The function reads from a .c file and searches for a text word.
#include<stdio.h>
#include<stdlib.h>
#include<string.h>
void findword(void)
{
char file[260],word[64],buf[1024+1];
FILE *f;
size_t length,n,keep=0;
printf("The name of the file(with full path):");
if(scanf("%259s",file)!=1)
   return;
if((f=fopen(file,"r"))==NULL)
  {
   printf("\nError opening file");
   exit(1);
  }
printf("Give me the word:");
if(scanf("%63s",word)!=1)
  {
   fclose(f);
   return;
  }
length=strlen(word);
//read the file in blocks; the last length-1 characters of each block
//are kept at the start of buf, so a word that straddles two blocks
//is still found
while((n=fread(buf+keep,1,1024-keep,f))>0)
{
 n+=keep;
 buf[n]='\0';                  //strstr needs a terminated string
 if(strstr(buf,word))          //strstr searches for a substring in a string
    {                          //for more details see the help
     printf("Word was found\n");
     fclose(f);
     return;
    }
 keep=(n<length-1)?n:length-1;
 memmove(buf,buf+n-keep,keep);
}
printf("Word wasn't found\n");
fclose(f);
}

int main(void)
{
findword();
return 0;
}
>> I am trying to write a parser for C
laeuchli, I hate to be a pain in the ... but LEX is a tool that was specifically created for writing parsers.  There are free implementations for every platform, it interfaces with C (in fact it creates a C file which you compile with your project), it is universally used and extremely well documented.

http://www.cs.columbia.edu/~royr/tools.html

laeuchli: if you really want to write a PARSER for C - that means a program that understands the syntax structure of a C source file - you _should_ be using the lex and yacc tools unless you are _extremely_ experienced or unless you really wish to spend a year or so on this task. You would do better to subscribe to the usenet newsgroup comp.compilers and ask parsing questions there.
In general I agree with many preceding comments: lex or awk are the things to use. However, I'm still not clear on your objectives. There are undoubtedly valid objectives that would lead one to prefer some other solution. So for what it is worth, here is the lexical analyzer for a small processing engine I wrote some years ago. The grammar is simple and listed at the top in comments. It does not handle keywords per se, but that could be added in the section where it recognizes identifiers:

#include "stdio.h"
#include "string.h"
#include "stdlib.h"
#include "conio.h"

/*
 * Recognized commands for the grammar are:
 *
 * id(exp)              copy exp bytes and label them with id
 * id[+ exp]            advance output message pointer exp bytes
 * id[- exp]            retract output message pointer exp bytes
 * {(exp)  ... }        loop through the following specifications exp times
 * / exp /              execute expression
 *
 * exp is (in order of decreasing binding strength)
 * n           some constant
 * id          an identifier
 * (exp)       parenthetical expression
 * [exp,...]   array expression
 * id & exp    a bitwise and operation
 * id | exp    a bitwise or operation
 * exp?exp:exp a conditional expression
 * id = exp    an assignment operation
 * id |= exp   a bitwise or assignment operation
 * id &= exp   a bitwise and assignment operation
 *
 * Expressions result in their value being on the stack. Array expressions
 * leave all their values on the stack in order. Assignments and loop ends
 * consume values on the stack.
 * Assignments to id's with length 1 or 2 take the value from the top of the
 * stack. Id's with length greater than 2 are done one byte at a time starting
 * length bytes from the top of the stack. Assignment to id's with length 0
 * just consume the top stack entry.
 *
 */

#define TRUE      1
#define FALSE     0
#define WRITE    "wb"
#define READ     "rb"
#define AWRITE   "w"
#define NOP       -1L
#define TRKHDSZ  7

#define VARNMSZ   8
#define LINELEN 300

/* tokens */
#define CONS     0   /* constant                  */
#define ID       1   /* identifier                */
#define LPAREN   2   /* (                         */
#define RPAREN   3   /* )                         */
#define LBRACE   4   /* {                         */
#define RBRACE   5   /* }                         */
#define FSLASH   6   /* /                         */
#define ASSNOP   7   /* =, |=, &=                 */
#define CONOP    8   /* ?                         */
#define SEPOP    9   /* :                         */
#define BITOP   10   /* |, &                      */
#define ARRAY   11   /* [                         */
#define INSERT  12   /* [+                        */
#define DELETE  13   /* [-                        */
#define CLOSE   14   /* ]                         */
#define COMMA   15   /* ,                         */
#define UNKNOWN 16   /*                           */

/* lexical analyzer */

#include "ctype.h"

void fetchtoken(void);
char getnextst(void);
char getnext(void);
void unget(char);

int tkp,nextdone,nxtoken,gotnxch,lp;
char nxtokenstr[VARNMSZ+1],nxch,line[LINELEN+1];
FILE *tmf;

void initlex()

{      tmf=fopen(tmn,READ);
      if(tmf==NULL)cabort("Missing source");
      lp=-2;
}


gettoken()

{      if(!nextdone)fetchtoken();
      strcpy(tokenstr,nxtokenstr);
      nextdone=FALSE;
      return(nxtoken);
}


nexttoken()

{      if(!nextdone)
      {      fetchtoken();
            nextdone=TRUE;
      }
      return(nxtoken);
}


void fetchtoken()

{      int i,hc;
      char a,t;

      tkp=-1;
      a=getnextst();
      nxtokenstr[++tkp]=a;
      if(isalnum(a))
      {      while(a=getnext(),isalnum(a))
            {      if(tkp>=VARNMSZ-1)cabort("Value name too long");
                  ++tkp;
                  nxtokenstr[tkp]=a;
            }
            nxtokenstr[++tkp]='\0';
            if(isdigit(nxtokenstr[0]))
            {      i=1;
                  hc=FALSE;
                  if(nxtokenstr[1]=='x')
                  {      if(nxtokenstr[0]!='0')cabort("Invalid constant");
                        i=2;
                        hc=TRUE;
                  }
                  for(; nxtokenstr[i]!='\0' && (!hc && isdigit(nxtokenstr[i])) ||
                                                            (hc && isxdigit(nxtokenstr[i])); ++i);
                  if(nxtokenstr[i]!='\0')cabort("Invalid constant");
                  nxtoken=CONS;
            }
            else nxtoken=ID;
            unget(a);
            return;
      }
      nxtokenstr[1]='\0';
      switch(a)
      {      case '(':
                  nxtoken=LPAREN;
                  return;

            case ')':
                  nxtoken=RPAREN;
                  return;

            case '{':
                  nxtoken=LBRACE;
                  return;

            case '}':
                  nxtoken=RBRACE;
                  return;

            case '/':
                  nxtoken=FSLASH;
                  return;

            case '?':
                  nxtoken=CONOP;
                  return;

            case ':':
                  nxtoken=SEPOP;
                  return;

            case '=':
                  nxtoken=ASSNOP;
                  return;

            case '|':
            case '&':
                  t=getnext();
                  if(t=='=')
                  {      nxtokenstr[1]='=';
                        nxtokenstr[2]='\0';
                        nxtoken=ASSNOP;
                  }
                  else
                  {      unget(t);
                        nxtoken=BITOP;
                  }
                  return;

            case ',':
                  nxtoken=COMMA;
                  return;
                  
            case '[':
                  t=getnext();
                  if(t=='+' || t=='-')
                  {      nxtokenstr[1]=t;
                        nxtokenstr[2]='\0';
                        nxtoken=(t=='+'?INSERT:DELETE);
                  }
                  else
                  {      unget(t);
                        nxtoken=ARRAY;
                  }
                  return;

            case ']':
                  nxtoken=CLOSE;
                  return;
                  
            default:
                  nxtoken=UNKNOWN;
                  return;
      }
      return;
}


char getnextst()

{      char a;

      while(a=getnext(),a==' ' || a=='\t' || a=='\n' || a=='\r');
      return(a);
}


char getnext()

{      if(gotnxch)
      {      gotnxch=FALSE;
            return(nxch);
      }
      if(lp==-2 || line[lp]=='\n' || lp==LINELEN)
      {      fgets(line,LINELEN+1,tmf);
            lp=-1;
      }
      return(line[++lp]);
}


void unget(a)
char a;

{      nxch=a;
      gotnxch=TRUE;
      return;
}


So that's meant mostly as an example of what a lexical analyzer looks like. At its heart is a switch statement that breaks out tokens. It is also responsible for skipping white space. A C lexer would also have to skip comments. The other functions are largely housekeeping, plus the three top routines through which the parser uses the lexer.

P.S. It would have to be completed (and become much more complicated) if you really want to analyze the complete C grammar. That would likely be outside most of our scope. On the other hand, if you are attempting something more limited (like getting all the function names or something like that), it could be smaller.
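
As a rough illustration of the comment-skipping step mentioned above, the sketch below removes `/* ... */` and `// ...` comments from a piece of C source held in a string. It is deliberately simplified: it does not treat string or character literals specially, so a `/*` inside a string literal would be misread. `stripComments` is a name made up for this example.

```cpp
#include <cstddef>
#include <string>

// Returns `src` with comments removed (simplified; ignores string literals).
std::string stripComments(const std::string& src) {
   std::string out;
   std::size_t i = 0;
   while (i < src.size()) {
      if (src.compare(i, 2, "/*") == 0) {
         // block comment: skip to the closing */ (or to the end if unterminated)
         std::size_t end = src.find("*/", i + 2);
         i = (end == std::string::npos) ? src.size() : end + 2;
         out += ' ';   // a comment counts as white space between tokens
      } else if (src.compare(i, 2, "//") == 0) {
         // line comment: skip up to, but not including, the newline
         std::size_t end = src.find('\n', i);
         i = (end == std::string::npos) ? src.size() : end;
      } else {
         out += src[i++];
      }
   }
   return out;
}
```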

You can't parse C just by looking in a file for words. Come on! Please, I am counting on you SOB! :-). Are there any examples of parsing C on the net?
ASKER CERTIFIED SOLUTION
alexo

This solution is only available to members.
What does your answer mean? Please explain.
OK, I think I get your answer, but I don't know lex. Is there a site somewhere where I can get a start on it?
Thanks.

http://www.cs.columbia.edu/~royr/tools.html


BTW, why won't you use a text processing language (awk, perl, sed)?
What is awk, perl, or sed?
Why would I use them instead of lex?

Also, how about a little example to get me started? Let's say we have a file that has the following:
"
int a;
if(a=1)
"
just those two lines. How would one parse them using lex?
Thanks.
P.S.
I will give an A rating and sixty more points if you help me now.
Thanks.

As you can read in other comments, you have a special language to define (text) patterns (== lex) and rules (== yacc) for how to use them. lex (and yacc) compile these patterns and rules into C code, which must then be compiled into an executable. Then you can test.

awk, perl, sed (and some others) read their instructions from the command line or a script file. You don't need a compiler, just the program (awk, for example) itself. This makes developing and testing very simple.
Another advantage is that these programs are designed to deal with text, meaning that they know about "words", "lines", etc., which otherwise must be coded in lex or C.

awk and sed are part of all UNIXes, and nowadays Perl is too.
So I'll just give a few examples of how easy they are to use.

1. find all lines in a file which start with  "void"
    sed -n -e '/^void /p' file

2. find all lines in a file which contain the literal  "if (var==1)"
    sed -n -e '/if (var==1)/p' file

3. find all lines where the second word is  "function"
    awk '$2 == "function" {print}' file

Hope these examples give you a hint of what can be done with these programs on text files.
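
For comparison, the third example (second word equals "function") takes noticeably more code when written by hand. A sketch in C++, with `secondWordIs` as an invented helper name that tests a single line:

```cpp
#include <sstream>
#include <string>

// True if the second whitespace-separated word of `line` equals `wanted`,
// roughly what awk's `$2 == "function"` tests for one input line.
bool secondWordIs(const std::string& line, const std::string& wanted) {
   std::istringstream words(line);
   std::string first, second;
   if (!(words >> first >> second))
      return false;          // fewer than two words on this line
   return second == wanted;
}
```

You would still need a loop that reads the file line by line and prints the lines for which this returns true, which awk does for you.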
Oops, we are posting simultaneously ;-)

Anyway, I gave the answer according to your last comment *and* the initial question in my last comment (using sed).
I have to disagree with ahoffman.  SED, AWK and Perl are indeed suitable for general text processing but not for writing parsers in.  LEX was specifically created for parsers and thus is the best tool for the job (along with its derivatives).

Check:
  http://ironbark.bendigo.latrobe.edu.au/courses/subjects/bitsys/ctutes/tutelist.html
  http://www.cs.huji.ac.il/course/plab/lex

Also see:

LEX tutorial (in postscript)
  http://www.fmi.uni-passau.de/common/lib/archive/doc/michael/programmierung/lex.ps.gz

LEX & YACC notes
  http://opal.cs.binghamton.edu/~zdu/zprof/zprof/NOTES.html

LEX & YACC examples
  http://vcapp.csee.usf.edu/~sundares/lex_yacc.html

FLEX & BISON (derivatives) info
  http://tinf2.vub.ac.be/~dvermeir/courses/compilers
alexo, laeuchli didn't ask for a parser. Well, after several comments he said that he wants to write a parser (Monday, June 22 1998 - 06:57AM PDT).
OK, I didn't read all the comments carefully.
So laeuchli, do you really want to write a new, complete C parser (see Belgarat's comment)?

alexo, if he really wants such a thing, you're right with your last comment and your last answer.

But I disagree with you, alexo, that sed, perl and/or awk are not suitable for parsers. I can parse any kind of data with them, and so they *are* parsers too.
Keep in mind that a `parser' is not a synonym for `tokenizing and parsing C program code'. You know that, I'm sure ;-))

Sorry, but the links don't help too much. The download is dead and I could not find a good lesson on the others. Get trying.  
I think I deserve an explanation for the rejection of my answer.
In my humble opinion..this looks like a troll....
With a bad case of feature Creep.  ;-)
It started with a rather simple premise and has escalated to a full scale parser.
There have been a couple of very good examples which fulfilled the original request and they were rejected. It may be time to move on.
No disrespect intended-just an observation.

John C. Cook

Perl is quite suitable for parsers.  Although I agree that sed and awk are more limiting, and may not be the language of choice for more than simple lexing.
On the other hand,

int a;
if(a=1)

is not valid C, so I'm not sure what you'd want a parser to return for it other than "syntax error"
If you just want to lex it into tokens, that's easy enough.
(Although including support for preprocessor macros would take it beyond the scope of a 17 point question)
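
Lexing the two-line example into tokens can be sketched by hand in a few lines. The version below is a toy under stated assumptions, not what LEX would generate: it knows only identifiers/keywords, decimal numbers, and single-character punctuation, so `==` would come out as two `=` tokens, and it ignores comments and the preprocessor entirely. `tokenize` is an illustrative name.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Splits `src` into identifier, number, and single-character tokens.
std::vector<std::string> tokenize(const std::string& src) {
   std::vector<std::string> tokens;
   std::size_t i = 0;
   while (i < src.size()) {
      unsigned char c = static_cast<unsigned char>(src[i]);
      if (std::isspace(c)) {
         ++i;                                   // white space separates tokens
      } else if (std::isalpha(c) || c == '_') {
         std::size_t start = i;                 // identifier or keyword
         while (i < src.size() &&
                (std::isalnum(static_cast<unsigned char>(src[i])) || src[i] == '_'))
            ++i;
         tokens.push_back(src.substr(start, i - start));
      } else if (std::isdigit(c)) {
         std::size_t start = i;                 // decimal constant
         while (i < src.size() && std::isdigit(static_cast<unsigned char>(src[i])))
            ++i;
         tokens.push_back(src.substr(start, i - start));
      } else {
         tokens.push_back(std::string(1, src[i]));   // any other single character
         ++i;
      }
   }
   return tokens;
}
```

For the text `int a;` followed by `if(a=1)` this yields nine tokens: `int`, `a`, `;`, `if`, `(`, `a`, `=`, `1`, `)`. Deciding whether that sequence is valid C is the parser's job, not the lexer's.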
Thanks alexo, I will look at the links. johncook, you wanted to know why I did not take any of the C answers. Because I spent one month and tried three functions to get it to work, but none of my parse functions could parse an if (at least not without a lot more trouble). Now, granted, some of these functions may be good, but I don't want to spend another month trying to figure out somebody else's source code and getting it ready. I would much rather use something like lex, which promises to be better for the task.
Thanks. Jesse

Reading the last 3 to 5 comments, it would be nice if you told us exactly what you want to do (parse) in detail for 17 points.

laeuchli, you did not answer some of the experts' questions ;(
How would you get the right answer if we don't know what you really want?
Thank you laeuchli for your response.
The first step toward solving a problem is to define exactly what the problem is. If you would take a few moments to outline the scope of your request, you will get the answers you are looking for.
I can see by looking across these responses that there are some extremely knowledgeable people attempting to assist you. And if they are given the information they need, they can.
If I may let me give you an example of a statement that would help.

Problem statement:
I am looking for a program that understands or can decode(parse) 'C' syntax from a source file.
I want to be able to enter a 'C' statement or key word at the command line and have the program return all instances of that statement and also display the entire content of each statement returned.
***you might even want to get more specific - if you do I am sure you will be pleased with the results.**

Good luck with your quest,
John C. Cook
Parting words from my favorite Stooge "Curly"
"I'm try'n to think but nuthin' happens"
OK, sorry, I guess I should be less vague in my questions.
What I am trying to do is make a program that looks in a .c file for things like void main() and ifs. When the program finds the stuff that it is looking for, it writes stuff to a file.
Thanks.
P.S. I will raise the amount of points now so I don't blow it on some other question.
simple task, simple programs:

grep 'void main()' source.c > void.stuff
grep '[[:space:]][[:space:]]*if[( ]' source.c > if.stuff
egrep 'void main\(\)|[[:space:]][[:space:]]*if[( ]' source.c > all.stuff
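
The same filtering can be sketched in C++ if you would rather stay in one language. The function below copies every line containing a given piece of text from one stream to another, which is roughly what the first `grep` line does; it does plain substring matching, not regular expressions. `copyMatchingLines` and `matchingLines` are hypothetical names.

```cpp
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

// Copies each line of `in` that contains `needle` to `out`.
// Returns the number of lines copied.
int copyMatchingLines(std::istream& in, std::ostream& out,
                      const std::string& needle) {
   std::string line;
   int copied = 0;
   while (std::getline(in, line)) {
      if (line.find(needle) != std::string::npos) {
         out << line << '\n';
         ++copied;
      }
   }
   return copied;
}

// Convenience wrapper for trying the function on in-memory text.
std::string matchingLines(const std::string& text, const std::string& needle) {
   std::istringstream in(text);
   std::ostringstream out;
   copyMatchingLines(in, out, needle);
   return out.str();
}
```

For real use you would pass an `std::ifstream` opened on `source.c` and an `std::ofstream` opened on `void.stuff` to `copyMatchingLines`.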
To ahoffmann: "WHAT!!" To alexo: "You think I should be using LEX and not yacc, right? Why? What is different?"
LEX generates the lexical analyzer (scanner): it is used to tokenize a source file.
YACC generates the parser: it checks those tokens against a grammar and runs your code for each rule, which is what a compiler needs.

They are usually used together.
AHH! I got b18 from http://www.cygnus.com/misc/gnu-win32/ 
and start some test lex programs. When I compile them with g++ it says that _WinMain16 is undefined!! HELP!

_WinMain@16 is the mangled name of WinMain(), the equivalent of main() for a windowed application.  Either use WinMain() instead of main(), or explicitly tell the compiler you want a console application.

laeuchli, it seems to me that this is getting farther from the original question.  Why don't you close this one and ask compiler related stuff in another?
I am asking this here because I can't compile the code without the GNU package, and so the answer is not useful if I can't get the stuff working. Is there a plain win32 flex?

laeuchli, the error does not seem to come from flex.  Check your configuration.  Are you compiling a console or a windowed application?  If windowed, you *must* have a WinMain function.

>> Is there a plain win32 flex?
This is as plain as it comes.
laeuchli, you got gnu-win32.

Why didn't you try my suggestions?
Cygnus' bin directory contains all the tools (awk, egrep, grep, sed, etc.); you don't need to compile anything for your simple task.
ahoffmann, I prefer the idea of lex, where you can compile into C.
alexo, I found what may be better: Visual Parse at www.sand-stone.com. If it works I will still give you the points because you pointed me to lex. Will post a comment if it works.

I have tried to set up gnu and that did not work. I could not get into Visual Parse. I could not compile the PCCTS programs.
Is there anything else I can do?
I suggest that you ask help about the specific tools on usenet:
  comp.compilers.tools
  comp.compilers.tools.pccts

I just had an idea. All these Unix ports (gnuc, ptts, etc.) do not like to be ported. Would it work if I downloaded the Linux version of Lex and used the *.c file made in Linux in Windows?
Think that would work?

Dunno.  Why don't you try?
Well, which version of lex (not pttc) do you think would be the most portable, and where can I get it? What does everyone out there think?

laeuchli, I worked pretty hard on this question and gave you quite a fair amount of useful information on a subject that kept getting broader and broader.  However, my knowledge and resources *are* limited and I probably cannot be squeezed for more.  If you think that what you got is not worth the 80 points you offered, feel free to reject my answer.
Look, I downloaded 36mb of stuff, spent 80 points, spent 5 months, and wrote 300 hundred lines of code. I am asking a lot of questions here in the hope of cutting my losses. However, as you seem to be the last person to have any ideas, I will give you the points. I still don't have a parser. Maybe I can figure it out myself.

I'm sorry.  I just ran out of ideas.

Try asking in:
        comp.compilers.tools
        comp.compilers.tools.pccts

Try emailing the author of PCCTS.
That's all right, it does not matter. I found a version of lex that runs on Linux and seems portable. Thanks for your help.