j_k

Who has the best C string parsing method?


I'm wondering what the best, most accepted C-style method is for writing code that loops through the lines of a text file and parses each one.  I've got my method, but would like to see how others do it.  I use strtok and sscanf.  The problem with strtok is that it overwrites the delimiter after each token with a '\0', so the line buffer gets modified, which seems to make sscanf the better choice!

Suppose a text file like:
AAAA 12:54 BBBB 1234 6:34 CCCC 9999
BBBB 2:44 CCCC 4567 7:34 DDDD 0000
HHHH 8:43 BBBB 4321 4:28 EEEE 4568
KKKK 11:10 GGGG 5432 10:34 DDDD 1212

And some code like
while (fgets(line, sizeof line, infile) != NULL) {
  pos1 = strtok(line, " \n");
  pos2 = strtok(NULL, " \n");
  pos3 = strtok(NULL, " \n");
  pos4 = strtok(NULL, " \n");
  pos5 = strtok(NULL, " \n");
  pos6 = strtok(NULL, " \n");
  pos7 = strtok(NULL, " \n");
  if (pos1 != NULL && strcmp(pos1, something) == 0) do something ...
  if (pos2 != NULL && strcmp(pos2, something) == 0) do something ...
  ....
  ....
}

And then a sscanf option like:
while (fgets(line, sizeof line, infile) != NULL) {
  if (sscanf(line, "%s %d:%d %s %d %d:%d %s %d", pos1, &pos2, &pos3, pos4, &pos5, &pos6, &pos7, pos8, &pos9) != 9)
    continue;  /* skip lines that do not match the format */

  if (strcmp(pos1, something) == 0) do something ...
  if (pos2 == something) do something ...
  ....
  ....
}
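
For reference, a minimal sketch of the sscanf variant as a self-contained function. The struct record layout, the parse_record name and the 15-character field widths are assumptions made for illustration, not part of the original code. The widths keep %s from overrunning the buffers, and the return value of sscanf is checked so a short or malformed line is rejected rather than half-parsed.

```c
#include <stdio.h>

/* Hypothetical layout for one line of the sample file. */
struct record {
    char name1[16];
    int  h1, m1;      /* first time field, e.g. 12:54 */
    char name2[16];
    int  num1;
    int  h2, m2;      /* second time field, e.g. 6:34 */
    char name3[16];
    int  num2;
};

/* Returns 1 if all nine fields were converted, 0 otherwise. */
int parse_record(const char *line, struct record *r)
{
    int n = sscanf(line, "%15s %d:%d %15s %d %d:%d %15s %d",
                   r->name1, &r->h1, &r->m1,
                   r->name2, &r->num1,
                   &r->h2, &r->m2,
                   r->name3, &r->num2);
    return n == 9;
}
```

Unlike strtok, this leaves the line buffer untouched, but it only fits files where every line has the same fixed layout.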

What do you think?  Is there a cleaner solution, or is this a case of "if it works, don't mess with it"?

Thanks.
zebada

This is what I use when I want a simple text file parser:

#include <stdio.h>
#include <errno.h>
#include <string.h>

#define MAX_TOKEN_SIZE 255

char *nextToken(FILE *fd,char *delimiter);

int
main(int argc, char* argv[])
{
  FILE *fd;
  char *filename="data.txt";
  char *token;

  if ( (fd=fopen(filename,"r"))==NULL ) {
    fprintf(stderr,"Can't open %s. Error %d\n",filename,errno);
    return -1;
  }

  while ( (token=nextToken(fd," \t\n"))!=NULL ) {
    printf("Token: [%s]\n",token);
  }

  fclose(fd);
  return 0;
}

char *
nextToken(FILE *fd,char *delimiter)
{
  static char token[MAX_TOKEN_SIZE+1];
  char        *t=token;
  int         n=0;
  int         c;  /* int, not char, so EOF can be told apart from data */

  /* Skip any delimiters in front of the token */
  while ( (c=fgetc(fd))!=EOF && strchr(delimiter,c)!=NULL )
    ;
  if ( c==EOF )
    return NULL;

  /* Collect characters up to the next delimiter, EOF or the size limit */
  do {
    *t++ = (char)c;
    n++;
  } while ( n<MAX_TOKEN_SIZE && (c=fgetc(fd))!=EOF && strchr(delimiter,c)==NULL );
  *t = '\0';

  /* Token was too long: discard the rest of it */
  if ( n==MAX_TOKEN_SIZE ) {
    while ( (c=fgetc(fd))!=EOF && strchr(delimiter,c)==NULL )
      ;
  }

  return token;
}
j_k

ASKER

Zebada,
I like what you have shown, and I understand some of it.  What I need to do is build the tokens one line at a time.  After getting the tokens for a line, I need to perform some calculations before moving on to the next line's tokens.  How would you adapt your code to do that?
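
One possible shape for the per-line processing asked about here, as a rough sketch only. The names split_line and process_file, the 256-byte line buffer and the MAX_TOKENS limit are all assumptions chosen for illustration. Each line is read with fgets, split in place with strtok, and its tokens are handled before the next line is read.

```c
#include <stdio.h>
#include <string.h>

#define MAX_TOKENS 16   /* assumed upper bound on tokens per line */

/* Split line in place on spaces, tabs and line endings.
   Stores up to max pointers into line in tokens[]; returns the count. */
int split_line(char *line, char *tokens[], int max)
{
    int n = 0;
    char *tok = strtok(line, " \t\r\n");
    while (tok != NULL && n < max) {
        tokens[n++] = tok;
        tok = strtok(NULL, " \t\r\n");
    }
    return n;
}

/* Read a stream line by line and handle each line's tokens before
   moving on to the next line. Returns the number of non-blank lines. */
int process_file(FILE *in)
{
    char line[256];              /* assumed maximum line length */
    char *tokens[MAX_TOKENS];
    int lines = 0;

    while (fgets(line, sizeof line, in) != NULL) {
        int n = split_line(line, tokens, MAX_TOKENS);
        if (n == 0)
            continue;            /* blank line */
        /* per-line calculations would go here, using tokens[0..n-1] */
        for (int i = 0; i < n; i++)
            printf("[%s] ", tokens[i]);
        putchar('\n');
        lines++;
    }
    return lines;
}
```

A caller would open the file with fopen and pass the stream to process_file. A time field such as 12:54 arrives as a single token and can be split further with sscanf(tok, "%d:%d", &h, &m).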
ASKER CERTIFIED SOLUTION
zebada

Zoppo

Hi j_k,

one of the best methods I know is to use a lexical analyzer and parser generator
such as lex/yacc or flex/bison.

With those tools you define a set of rules for the tokens and the grammar in a
script file, and the tool generates C code from those scripts. The generated code
has a reputation for being very stable and well optimized.

Those tools have the advantage that existing parsers are easy to extend by
modifying the script files and regenerating the parser code.
The disadvantage is that you first have to learn the script language (which is not
too hard).

A lot of applications that parse text files are created with those tools, among them
many C compilers.

ZOPPO