  • Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 1204

Parse a CSV File

Hello Developers

I have a CSV file which looks something like this.....


,,,,,,,,,,,,,Restore,,,,,,,,
6257_005 List,,,,,,,,
,,,,,,,,,,,,,,,,,,,,,
6389_005,,,,,,,,,,,,,,,,,,,,,

,,,,,,,,,,,,,,,,,,,,,
1,R874,,R_02212,,    6109,,   -22.125,,     9.025,,    90.000,,   0,,-,,-,,,,
,,,,,,,,,,,,,,,,,,,,,
,,,,,,,,,,,,,,,,,,,,,
2,R825,,R_03320,,    2312,,    23.225,,   -34.000,,    90.000,,   0,,-,,-,,,,
,,,,,,,,,,,,,,,,,,,,,
,,,,,,,,,,,,,,,,,,,,,
3 -- and so on---

I just want something like this...

R874,R_02212
R825,R_03320
and so on; it's a big file.

How should I proceed? A short piece of code that does this would be great.

Thank You
Harsimrat
Asked by: hthukral
2 Solutions
 
brettmjohnson commented:
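This is really a job for AWK or Perl, but if you must do it in C,
you will need to use strsep() rather than strtok() to parse the
lines, because strtok() treats a run of adjacent delimiters as a single
delimiter and therefore skips empty fields.  Note that strsep() is a
BSD/glibc extension, not part of standard C.  Here is a simplified
example of the use of fgets() and strsep() to parse the lines of a CSV file: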

% cat csv.c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

struct person {
  char * name;
  int age;
  char * country;
  char * emailaddr;
};

#ifdef WRITER
int main (int argc, char ** argv)
{
  struct person p = { "SaiSeng", 19, "malaysia", "sai_seng@expert.com" };

  printf ("%s,%d,%s,%s\n", p.name, p.age, p.country, p.emailaddr);

  return 0;
}
#endif

#ifdef READER
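int main (int argc, char ** argv)
{
  char line[1024];

  while (fgets(line, sizeof(line), stdin)) {
    struct person p;
    char *state = line;
    char *age;

    /* fgets() keeps the line ending; remove it so it does not end up in the last field */
    line[strcspn(line, "\r\n")] = '\0';

    /* strsep() returns NULL once the line has run out of fields */
    p.name = strsep(&state, ",");
    age = strsep(&state, ",");
    p.country = strsep(&state, ",");
    p.emailaddr = strsep(&state, ",");
    if (!p.name || !age || !p.country || !p.emailaddr)
      continue;  /* skip lines with fewer than four fields */
    p.age = atoi(age);

    printf ("Name: %s\n", p.name);
    printf ("Age: %d\n", p.age);
    printf ("Country: %s\n", p.country);
    printf ("Email: %s\n\n", p.emailaddr);
  }

  return 0;
}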
#endif
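
The difference between strtok() and strsep() on empty fields can be checked with a small comparison. This is only a sketch: the two function names are made up for illustration, the 256-byte working buffer is an arbitrary assumption, and strsep() is assumed to be available (BSD or glibc).

```c
#define _DEFAULT_SOURCE /* asks glibc to declare strsep(); assumed harmless elsewhere */
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: count the fields strtok() reports.
 * Runs of adjacent commas collapse into one delimiter. */
int count_fields_strtok(const char *line)
{
  char buf[256];
  char *tok;
  int n = 0;

  assert(line != NULL);
  snprintf(buf, sizeof(buf), "%s", line);  /* strtok() modifies its input */
  for (tok = strtok(buf, ","); tok != NULL; tok = strtok(NULL, ","))
    n++;
  return n;
}

/* Hypothetical helper: count the fields strsep() reports.
 * Every comma ends a field, whether the field is empty or not. */
int count_fields_strsep(const char *line)
{
  char buf[256];
  char *state = buf;
  int n = 0;

  assert(line != NULL);
  snprintf(buf, sizeof(buf), "%s", line);  /* strsep() modifies its input */
  while (strsep(&state, ",") != NULL)
    n++;
  return n;
}
```

For a row such as "1,R874,,R_02212", the strtok() version sees three fields and the strsep() version sees four, so only with strsep() does R_02212 stay in position four.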

brettmjohnson commented:
Here is a more general purpose string parser that I wrote.
You will still need to read each line of the file using fgets() or getline().
Then pass the line to  stringTokenizer() and it returns an array of parsed
tokens.

% cat stringTokenizer.c
/*
   NAME
        stringTokenizer

   SYNOPSIS
        char ** tokens = stringTokenizer(const char * input, const char * delims, int flags);

   DESCRIPTION
        stringTokenizer() parses the input string into an array of text tokens.
        The tokens are separated in the input string by any of the delimiter
        characters specified by the delims string.  The returned array consists
        of a sequence of pointers to the individual tokens.  The array is terminated
        with a NULL pointer after the last valid token.

        If flags has the STRTOK_EMPTY_TOKENS bit set, adjacent delimiters in the
        input are considered to delimit a zero-length token - represented in the
        returned array as a pointer to the empty string, "".  This is appropriate
        for parsing tabular data that may contain empty fields, common in comma-
        separated values.

        If STRTOK_EMPTY_TOKENS is not specified, multiple adjacent delimiters are
        considered as a single delimiter.  This is appropriate for parsing text
        that may have words separated by varying amounts of whitespace and/or
        punctuation.
       
        If flags has the STRTOK_TRIM_WHITE bit set, tokens are trimmed of leading
        and trailing whitespace.  Embedded whitespace is preserved.  Whitespace is
        considered to be the ASCII space character (' ') as well as all ASCII control
        characters, including tab ('\t'), carriage return ('\r'), newline ('\n'),
        and form feed ('\f').  [The token does maintain a NUL termination ('\0').]

   
   RETURN VALUES
        If successful, stringTokenizer() returns a pointer to an array of
        pointers to parsed tokens. The array is terminated with a NULL pointer.
        It is the responsibility of the caller to free the memory returned.
        It is sufficient to free the returned pointer, as the array as well as
        the token text will get freed.
       
        If input or delims is NULL or a memory allocation error occurs,
        NULL is returned.


   EXAMPLES    
        The following example populates a vector of floating values from a
        comma-separated text representation of the floats. Note the empty
        field between e and pi. Missing values are assigned NaN (Not-a-Number)
        rather than 0.0 (a valid float value).

        char * input = "1.41421, 2.71828,, 3.14159, 6.021e23";
        float values[5];
        char ** tokens;

        tokens = stringTokenizer(input, ",", STRTOK_EMPTY_TOKENS | STRTOK_TRIM_WHITE);
        for (i = 0; (i < 5) && (tokens[i] != NULL); i++) {
           if (*tokens[i] == '\0')
              values[i] = NAN;  // empty value is NaN
           else
              values[i] = atof(tokens[i]);
        }
        free(tokens);


        The following example parses words from the primer reader text. It considers
        words one or more characters separated by whitespace and/or punctuation.

        char * input = "See Dick.  See Dick run.  Run, Forest, run!\n";
        char ** tokens;

        tokens = stringTokenizer(input, " \t\r\n.!,?;:", 0);
        for (i = 0; (tokens[i] != NULL); i++) {
           printf("found word: \"%s\"\n", tokens[i]);
        }
        free(tokens);


   BUGS & SIDE EFFECTS
        If realloc() fails, stringTokenizer() returns NULL, but the previous
        allocation is leaked.

        The memory for the array of token pointers and the token text are
        allocated as a single block, so the caller need only free the single
        returned pointer when finished.

        STRTOK_TRIM_WHITE strips only 7-bit ASCII whitespace (space and all
        control characters except NUL).  It does not trim DEL ('\x7F')
        or any 8-bit or multibyte characters that may include regional
        encodings of hard-space (non-breaking space).
*/


/***  PUBLIC DECLARATIONS ***/

char ** stringTokenizer(const char *input, const char *delims, int flags);

/* The following flag bits may be specified to modify the
 * behaviour of stringTokenizer().  Multiple flags may be
 * combined by ORing them together.
 */

/* If the STRTOK_EMPTY_TOKENS bit is set, the tokenizer accepts
 * adjacent delimiters as representing an empty field.
 * This is common in comma-separated values parsed into tables.
 * If the STRTOK_EMPTY_TOKENS bit is not set, the tokenizer accepts
 * multiple adjacent delimiters as a single delimiter.
 * This is useful for tokenizing words separated by variable
 * amounts of whitespace and/or punctuation.
 */
#define STRTOK_EMPTY_TOKENS     1


/* If the STRTOK_TRIM_WHITE bit is set, the tokenizer trims leading
 * and trailing whitespace from the tokens.
 */
#define STRTOK_TRIM_WHITE       2


/***  PUBLIC IMPLEMENTATIONS ***/
#include <stdlib.h>
#include <string.h>
#define _HAVE_STRTOK_R_

/* Prototypes for the private helpers.  They must be declared at file
 * scope: C does not allow 'static' on a block-scope function declaration.
 */
static char * trimWhite(char *input);
static char * strsepWrapper(char * input, const char * delims, char **state);

char ** stringTokenizer(const char *input, const char *delims, int flags)
{
  char ** tokens = NULL;
  int tokenCount = 0, tokenMaxCount = 0;
  char * token, * state, *str;
  char * (*tokfn)(char *, const char *, char **);


  /* Check input params and make a mutable copy of the input string */
  if (input && delims && (str = strdup(input))) {

    /* Determine which tokenizer we will use based upon specified behaviour flags */
    if (flags & STRTOK_EMPTY_TOKENS)
      tokfn = &strsepWrapper;
    else
#ifdef _HAVE_STRTOK_R_
      /* We prefer to use the thread-safe, reentrant version of strtok() */
      tokfn = &strtok_r;
#else
      /* But will settle for the traditional non-reentrant version */
      tokfn = (char * (*)(char *, const char *, char **)) &strtok;
#endif

    /* Separate the string into individual tokens */
    for (token = (*tokfn)(str, delims, &state); token; token = (*tokfn)(NULL, delims, &state))
    {
      if (flags & STRTOK_TRIM_WHITE)
        /* trim leading and trailing whitespace from token */
        token = trimWhite(token);

      /* Extend the array of pointers to tokens, if necessary */
      if (tokenCount >= tokenMaxCount) {
        if ((tokens = (char **)realloc(tokens, (tokenMaxCount+=32)*sizeof(char*))) == NULL)
          break;
      }

      /* Add our new token to the array */
      tokens[tokenCount++] = token;
    }

    /* NULL terminate the array of pointers and include the text of the
     * tokens in the allocation, so that the caller can free it up with
     * a single call to free().
     */
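    if (tokens) {
      size_t len = strlen(input)+1;
      char ** grown = (char **)realloc(tokens, (tokenCount+1)*sizeof(char*)+len);
      if (grown) {
        /* The tokens still point into str, which is freed below, so copy
         * the text into the block and rebase every pointer onto the copy. */
        char * text = (char *)(grown+tokenCount+1);
        int i;
        memcpy(text, str, len);
        for (i = 0; i < tokenCount; i++)
          grown[i] = text + (grown[i] - str);
        grown[tokenCount] = NULL;
      } else
        free(tokens);
      tokens = grown;
    }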

    /* Delete our mutable working copy of the string */
    free(str);
  }

  /* return the array of tokens */
  return tokens;
}


/*** PRIVATE IMPLEMENTATIONS ***/

/* Trim leading and trailing whitespace from the input string.
 * This routine trims in-place, modifying the input string.
 * The address of the first non-white character is returned.
 * If the returned pointer points to a NUL byte, the whole string
 * was white.
 */
static char * trimWhite(char *input)
{
  char *start, *end;
  if ((start = input)) {
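    /* Compare as unsigned char so that bytes >= 0x80 are never treated as white */
    while (*start && ((unsigned char)*start <= ' ')) start++;
    if (*start) {
      for (end = start; *end; end++);
      while ((end > start) && ((unsigned char)*end <= ' ')) end--;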
      *(end+1) = '\0';
    }
  }
  return start;
}

/* This wraps strsep() so that it may be called like strtok_r() */
static char * strsepWrapper(char * input, const char * delims, char **state)
{
  if (input)
    *state = input;
  return strsep(state, delims);
}


/*** UNIT TEST ***/

#ifdef UNIT_TEST

#include <stdio.h>
#include <math.h>

int main()
{    
  {
    int i;
    char * input = "See Dick.  See Dick run.  Run, Forest, run!\n";
    char ** tokens;
   
    tokens = stringTokenizer(input, " \t\r\n.!,?;:", 0);
    for (i = 0; (tokens[i] != NULL); i++) {
      printf("found word: \"%s\"\n", tokens[i]);
    }
    free(tokens);
  }

  {
    int i;
    char * input = "1.41421, 2.71828,, 3.14159, 6.021e23";
    float values[5];
    char ** tokens;
   
    tokens = stringTokenizer(input, ",", STRTOK_EMPTY_TOKENS | STRTOK_TRIM_WHITE);
    for (i = 0; (i < 5) && (tokens[i] != NULL); i++) {
      if (*tokens[i] == '\0')
        values[i] = NAN;
      else
        values[i] = atof(tokens[i]);
      printf("values[%d] = %f\n", i, values[i]);
    }
    free(tokens);
  }

  return 0;
}

#endif
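
The single-allocation layout described above (an array of pointers followed by the token text, released with one free()) can be illustrated in isolation. The sketch below is not the author's code: split_fields is a hypothetical, simplified helper that takes a single delimiter character and always keeps empty fields. The point it shows is that every stored pointer has to refer to the copy of the text that lives inside the returned block, because any temporary working copy is gone by the time the caller reads the tokens.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: split `input` on `delim`, keeping empty fields.
 * Returns a NULL-terminated array that the caller releases with a
 * single free(), or NULL on bad input or allocation failure. */
char **split_fields(const char *input, char delim)
{
  size_t len, count = 1, i, n = 0;
  char **fields;
  char *text;

  if (input == NULL)
    return NULL;

  /* A line with k delimiters has k + 1 fields. */
  len = strlen(input);
  for (i = 0; i < len; i++)
    if (input[i] == delim)
      count++;

  /* One block: (count + 1) pointers followed by a private copy of the text. */
  fields = malloc((count + 1) * sizeof(char *) + len + 1);
  if (fields == NULL)
    return NULL;
  text = (char *)(fields + count + 1);
  memcpy(text, input, len + 1);

  /* Cut the copy at each delimiter; pointers refer to the copy only. */
  fields[n++] = text;
  for (i = 0; i < len; i++) {
    if (text[i] == delim) {
      text[i] = '\0';
      fields[n++] = &text[i + 1];
    }
  }
  assert(n == count);
  fields[n] = NULL;
  return fields;
}
```

Called as split_fields(line, ','), the second and fourth fields of a data row are fields[1] and fields[3], and free(fields) releases both the pointers and the text.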

PaulCaswell commented:
Sounds to me like you may be looking for something simple. I guess you want to select the second and fourth value from the record. How about something like this:

...
char Line[1024];
...
while ( fgets ( Line, sizeof(Line), file ) != NULL )
{
 int i; // Position in line.
 int f; // Field number.
 int printedSomething = 0; // Becomes 1 once a field has been printed (plain C has no true/false).

 for ( i = 0, f = 1; Line[i] != '\0'; i++, f++ )
 {
  char * end = strchr(&Line[i],',');
  if ( end != NULL )
  {
    int length = (int)(end - &Line[i]);
    switch ( f )
    {
      case 2:
      case 4:
             // Grab the 2nd and 4th field.
             printf ( "%*.*s", length, length, &Line[i] );
             if ( f < 4 ) printf(",");
             printedSomething = 1;
             break;
    }
    i += length;
  }
 }
 if ( printedSomething ) printf("\n"); // End the output line.
}

Paul
hthukral (author) commented:
Paul, I have a concern: I am reading everything into a CString (CString inStr;). How can I use that instead of a file in this line?
while ( fgets ( Line, sizeof(Line), file ) != NULL )

I also want to put the result into CString outStr; instead of using printf, but that shouldn't be a big problem.

Only the reading with fgets is a problem, as it gives me this error:

error C2664: 'fgets' : cannot convert parameter 3 from 'class CString' to 'struct _iobuf *'

Thank You
Harsimrat
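
fgets() only reads from a FILE *, which is why the compiler rejects a CString as its third argument. When the text is already in memory, one option is to walk the buffer line by line. The sketch below assumes the contents are reachable as an ordinary NUL-terminated const char * (for example from a non-Unicode CString); next_line is a hypothetical helper name, not a library function. Unlike fgets(), it drops the line ending instead of keeping it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for fgets() that reads from a string in memory.
 * Copies the next line (without its line ending) from *cursor into
 * `line` and advances *cursor past it.
 * Returns 1 if a line was read, 0 at the end of the text. */
int next_line(const char **cursor, char *line, size_t size)
{
  const char *p;
  size_t len, copy;

  if (cursor == NULL || *cursor == NULL || line == NULL || size == 0)
    return 0;
  p = *cursor;
  if (*p == '\0')
    return 0;

  len = strcspn(p, "\r\n");                  /* length up to the line ending */
  copy = (len < size - 1) ? len : size - 1;  /* over-long lines are truncated */
  assert(copy < size);
  memcpy(line, p, copy);
  line[copy] = '\0';

  p += len;
  if (*p == '\r') p++;                       /* accept "\r\n", "\n" or "\r" */
  if (*p == '\n') p++;
  *cursor = p;
  return 1;
}
```

The reading loop then becomes while ( next_line(&cursor, Line, sizeof(Line)) ) with cursor initially pointing at the start of the text, and the body of the loop can stay as it is.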

hthukral (author) commented:
I changed everything like this..

FILE* file =0;
if( (file = fopen(srcStr, "r")) != NULL )
{

}
char Line[1024];
while ( fgets ( Line, sizeof(Line), file) != NULL )
{
 int i; // Position in line.
 int f; // Field number;
 int printedSomething = false;
 for ( i = 0, f = 1; Line[i] != '\0'; i++, f++ )
 {
  char * end = strchr(&Line[i],',');
  if ( end != NULL )
  {
int length = end - &Line[i];
switch ( f )
{
  case 2:
  case 4:
 // Grab the 2nd and 4th field.
//       printf ( "%*.*s", length, length, &Line[i] );
outStr += ( "%*.*s", length, length, &Line[i] );    // outStr is declared something like this CString outStr;
 if ( f < 4 ) outStr += ",";
       printedSomething = true;
 break;
}
      i += length;
}
}
 if ( printedSomething ) outStr += "\r";
}

When I do this, it shows the file in jumbled up form
something like this

,,,,,,,,,,,,Show,,,,,,,,               // Repeated twice here
,,,,,,,,,,,Show,,,,,,,,
,,,,,,,,,,,,help List,,,,,,,,
,,,,,,,,,,,help List,,,,,,,,
,,,,,,,,,,,,,,,,,,,,
,,,,,,,,,,,,,,,,,,,
,,,,,,,,,,,,,,,,,,,,
,,,,,,,,,,,,,,,,,,,
,,,,,,,,,,,,,,,,,,,,
,,,,,,,,,,,,,,,,,,,

,,,,,,,,,,,,,,,,,,,
R301,,RES_02212_002,,    6109,,   -22.125,,     9.025,,    90.000,,   0,,-,,-,,,,
,RES_02212_002,,    6109,,   -22.125,,     9.025,,    90.000,,   0,,-,,-,,,,
,,,,,,,,,,,,,,,,,,,,

and this garbled output goes on and on...

PaulCaswell commented:
I am not terribly well up with C++. I apologise, I assumed that because you posted this in the C TA you wanted a C solution. The problem is here:

outStr += ( "%*.*s", length, length, &Line[i] );    

This is not the same as:

printf ( "%*.*s", length, length, &Line[i] );

printf is available in C++ (through <cstdio>), but it writes to standard output; it does not produce a string that can be appended to outStr.

Essentially, this line is taking 'length' bytes from 'Line' starting at position 'i'. If you can make that happen in C++ you should get what you want.
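
The jumbled output has a simple explanation. In ( "%*.*s", length, length, &Line[i] ) the commas are comma operators, so the whole parenthesised expression evaluates to its last operand, &Line[i]. The += therefore appends the entire rest of the line from position i onwards, once for field 2 and again for field 4, which matches the repeated lines shown above. What is needed is a way to append exactly 'length' bytes. A minimal sketch in C follows, using a plain char buffer in place of the CString; append_bytes is a hypothetical helper name.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: append exactly `length` bytes of `src` to the
 * NUL-terminated string held in `out`, whose capacity is `size` bytes.
 * Returns 1 on success, or 0 if the result would not fit, in which
 * case `out` is left unchanged. */
int append_bytes(char *out, size_t size, const char *src, size_t length)
{
  size_t used;

  assert(out != NULL && src != NULL);
  used = strlen(out);
  if (used >= size || length >= size - used)  /* bytes plus the NUL must fit */
    return 0;
  memcpy(out + used, src, length);
  out[used + length] = '\0';
  return 1;
}
```

In the loop above, the printf call for the field would become append_bytes(out, sizeof(out), &Line[i], length), with append_bytes(out, sizeof(out), ",", 1) for the separator.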

If you'd like me to move this question to C++ just post here and I'll do it.

Paul

PaulCaswell commented:
Jim,

It *looks* like I cracked it and the asker went away, but Brett put quite a lot of work in, so I am loath to take all the credit. I'd suggest a split unless Brett feels otherwise.

Paul

hthukral (author) commented:
Sorry everyone, I have split the points between Paul and Brett. Both solutions worked: Paul's was the simple solution and Brett's was the harder one.

Thanks

Harsimrat