Solved

Parsing a CSV

Posted on 2003-12-05
21
984 Views
Last Modified: 2006-11-17
i am reading in a text file line by line -- i need to parse each line by commas

i am running into trouble when i try to grab an element out of the line

buffer[1] returns the first character not everything between the commas.

what i would like is the ability to read each line and reference the entire element as buffer[1] buffer[2] etc

code is below

#include <iostream.h>
#include <fstream.h>
#include <stdlib.h>


int main ()

{
   char buffer[500];
   char dataline[500];

   int strcmp( const char* s1,
                     const char* s2 );
   
   int count = 0;
   
   ifstream thefile;

   thefile.open ("c:/file.txt", ios::in);    
                                                   
                                                   
   if (! thefile.is_open()) {
      cout << "Error opening file";
      exit (1);
   }


   while (! thefile.eof() ) {
     
       thefile.getline(buffer, 499, '/o');
      
       cout << buffer << endl;

     }

  thefile.close();
  return 0;
}
Question by:tpiazza
21 Comments
 
LVL 16

Expert Comment

by:imladris
ID: 9882889
char buffer[500]; represents a single line of 500 characters.
The getline method reads a single line.

To read multiple lines into memory and be able to address them you would need a two dimensional array:

char buffer[10][500];

That declaration represents 10 lines of 500 characters. You could read into it like:

int i = 0;
while (! thefile.eof() ) {

      thefile.getline(buffer[i], 499);    // the default delimiter '\n' ends each line

      cout << buffer[i] << endl;          // print the line that was just read
      i++;

     }

However that would, of course, run into trouble after 10 lines. Along this route you would have to read the whole file into buffer, which would mean you would have to know or find out beforehand how many lines there are in the file.

It is more common to read and process one line at a time, similar to what you are doing now.
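The line-at-a-time approach might be sketched as follows. This is only an illustration, assuming a standard C++ compiler (the `<string>` and `<istream>` headers rather than the older `<iostream.h>` used in the question); the helper name `process_lines` is hypothetical and not part of the original code.

```cpp
#include <istream>
#include <ostream>
#include <sstream>   // std::istringstream, handy for trying the helper without a file
#include <string>

// Hypothetical helper: handles one line at a time, so the number of
// lines in the input never has to be known in advance.
int process_lines(std::istream& in, std::ostream& out) {
    std::string line;   // grows as needed, so there is no fixed 500-character limit
    int count = 0;
    while (std::getline(in, line)) {   // the loop ends when no further line can be read
        ++count;
        out << count << ": " << line << '\n';
    }
    return count;
}
```

An open `ifstream` can be passed as `in` and `cout` as `out`, since both derive from the stream types used here.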

 

Author Comment

by:tpiazza
ID: 9882974
need to do it line by line -- the files range in size
 
LVL 4

Assisted Solution

by:dhyanesh
dhyanesh earned 20 total points
ID: 9883989
Hi

I think this should be something like:

while (! thefile.eof() ) {
     
     thefile.getline(dataline, 499);         //  You do not need to pass the third argument; it is optional
     
      cout << dataline << endl;

    }

Now to get the data between commas you could use the strtok() function. It returns all data up to a delimiter.

You will have to declare buffer something like:

char (*buffer)[15];                //If you have 15 fields at max

This makes buffer as array of 15 pointers to characters.

In strtok() you have to pass dataline as the first argument, and the second argument is the delimiter string. It will return a pointer to the first field, i.e. all characters before the first comma.

First call to strtok() makes it return a pointer to string before the first delimiter. It also puts a '\0' just before the delimiter.

Subsequent calls to strtok() with NULL as first argument and delimiter as second argument will parse the string and return the subsequent fields until the end. When no more fields are left NULL is returned.
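
The strtok() calling sequence described above could look roughly like this. It is a minimal sketch rather than a definitive implementation; the helper name `split_fields` and its parameters are assumptions made for illustration.

```cpp
#include <cstring>

// Hypothetical helper: splits `line` in place and stores a pointer to each field.
// Returns the number of fields stored (never more than max_fields).
int split_fields(char* line, char* fields[], int max_fields) {
    int count = 0;
    // The first call passes the string itself; later calls pass NULL
    // so that strtok continues where it left off in the same string.
    for (char* p = std::strtok(line, ",");
         p != NULL && count < max_fields;
         p = std::strtok(NULL, ",")) {
        fields[count++] = p;
    }
    return count;
}
```

With `char line[] = "ktc,163008,A,3458.8221,N";` and `char* fields[15];`, a call to `split_fields(line, fields, 15)` leaves `fields[0]` pointing at "ktc" and `fields[1]` at "163008". One caveat: strtok treats consecutive delimiters as one, so empty fields such as the middle one in "a,,b" are skipped.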

Dhyanesh


 

Author Comment

by:tpiazza
ID: 9884243
mind posting some code with the strtok()

mine keeps erroring out
 
LVL 16

Expert Comment

by:imladris
ID: 9884499
char *p;

while (! thefile.eof() ) {
     
      thefile.getline(buffer, 499);
      p=strtok(buffer,",");
      if(p!=NULL)
      {   // process first field
      }
      while((p=strtok(buffer,NULL))!=NULL)
      {   // process next field
      }
     
      cout << buffer << endl;

     }
 
LVL 3

Expert Comment

by:merphle
ID: 9884734
Or, if the code to process the first field and all subsequent fields is the same:

char *p;
while (! thefile.eof() ) {

      thefile.getline(buffer, 499);
      for (p=strtok(buffer,","); p != NULL; p=strtok(buffer,NULL)) {
           // process field
      }

}
 
LVL 4

Expert Comment

by:dhyanesh
ID: 9887771
Hi

I do not think strtok() works the way it is posted above.

As given in documentation of Turbo C++ it should be something like:

char *p;

while (! thefile.eof() ) {

     thefile.getline(buffer, 499);
     p=strtok(buffer,",");
     if(p!=NULL)
     {   // process first field
     }
     // first argument should be NULL and not buffer; 2nd argument should be the delimiter
     while((p=strtok(NULL,","))!=NULL)
     {   // process next fields
     }

     cout << buffer << endl;

}


Dhyanesh
 
LVL 4

Expert Comment

by:dhyanesh
ID: 9887836
Hi

Also if you want to reference each field like buf[0], buf[1] then you would have to do something like:

int i;
char *p;
char (*buf)[15];

while (! thefile.eof() ) {
   
     thefile.getline(buffer, 499);
      p=strtok(buffer,",");
    if(p!=NULL)
    {   buf[0] = p;
    }
    i = 1;
    while((p=strtok(NULL,","))!=NULL) //first argument should be NULL and not 'buffer' and 2nd argument should be the delimiter
    {  
          buf[i++] = p;
    }
   
   }

After using strtok() the original string i.e. 'buffer' will have a '\0' placed just before each delimiter. So if you do

cout << buffer <<endl;


You will see only the first field. However you can access the other fields by buf[0], buf[1], buf[2], .....
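
If the whole line is still needed after tokenizing, one option is to copy it first. The following is a sketch under that assumption; the helper name `tokenize_with_copy` and the `saved` parameter are hypothetical.

```cpp
#include <cstring>

// Hypothetical helper: keeps an untouched copy of `buffer` in `saved`
// (which must hold at least strlen(buffer) + 1 characters), then
// tokenizes `buffer` in place. Returns the number of fields found.
int tokenize_with_copy(char* buffer, char* saved) {
    std::strcpy(saved, buffer);   // copy before strtok overwrites the commas
    int count = 0;
    for (char* p = std::strtok(buffer, ","); p != NULL; p = std::strtok(NULL, ",")) {
        ++count;
    }
    return count;
}
```

After the call, `buffer` prints only its first field because the first comma is now a '\0', while `saved` still prints the complete line.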

Dhyanesh
 

Author Comment

by:tpiazza
ID: 9896513
i keep getting the following error

C:\Program Files\Microsoft Visual Studio\MyProjects\parse\parse.cpp(36) : error C2440: '=' : cannot convert from 'char *' to 'char [500]'
        There are no conversions to array types, although there are conversions to references or pointers to arrays

#include <iostream.h>
#include <fstream.h>
#include <stdlib.h>
#include <string.h>


int main ()

{
   char buffer[500];
   
    int i;
      char *p;
      char (*buf)[500];

   ifstream thefile;

   thefile.open ("c:/file.txt", ios::in);    
                                                   
                                                   
   if (! thefile.is_open()) {
      cout << "Error opening file";
      exit (1);
   }


   while (! thefile.eof() ) {
   
     thefile.getline(buffer, 499);
      p=strtok(buffer,",");
    if(p!=NULL)
    {   buf[0] = p;
    }
    i = 1;
    while((p=strtok(NULL,","))!=NULL)
   {  
          buf[i++] = p;
    }
     
      cout << buffer << endl;
        cout << buf[0] << endl;

     }

  thefile.close();
  return 0;
}
 
LVL 16

Expert Comment

by:imladris
ID: 9897373
This line:

char (*buf)[500];

declares a pointer named buf, which points to an array of 500 characters.

Thus buf[0] is the "first" array of 500 characters that buf points to, and buf[1] is the "second" array of 500 characters. So

buf[0]=p;

where p is a pointer to a single character is going to cause a conversion error. If you want to save the pointers to the tokens you find in buffer you could declare:

char *buf[500];

This is an array of 500 pointers to character. So buf[0] is a pointer to character, just like p is, so the assignment will now work.

Note also that strtok changes buffer, so you will not be able to use buffer to emit the line to cout at the end.
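
The two declarations can be put side by side to see the difference. This is an illustrative sketch only; the names `ptr_to_array`, `array_of_ptrs` and `store_token` are made up for the example.

```cpp
char (*ptr_to_array)[500];   // ONE pointer, which points to an array of 500 chars
char *array_of_ptrs[500];    // an array of 500 separate pointers to char

// Hypothetical helper showing which of the two assignments compiles.
void store_token(char* p) {
    array_of_ptrs[0] = p;      // fine: both sides have type char*
    // ptr_to_array[0] = p;    // would not compile: the left side is a char[500] array
}
```

The sizes tell the same story: `sizeof(array_of_ptrs)` is 500 times the size of one pointer, whereas `sizeof(*ptr_to_array)` is 500 bytes, the size of the array being pointed to.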
 

Author Comment

by:tpiazza
ID: 9898635
that gets me the output i want -- however buf[0] returns the first element once and buf[1], buf[2], etc returns each element twice.

also

i = 1;
    while((p=strtok(NULL,","))!=NULL)
   {  
          buf[i++] = p;
    }
     
      if i change i=0 i get the second element in the list

    please explain how this iterates



 

Author Comment

by:tpiazza
ID: 9898655
actually it does buf[0] twice -- it only displays output once
 
LVL 16

Expert Comment

by:imladris
ID: 9898771
I would expect each element of buf to contain 1 token.

I'm not sure what you mean by "buf[1] returns each element twice".

I would expect:

for(int j=0; j<i; ++j)
{   cout << buf[j] << endl;
}

to show a list of the tokens that were found.

If that didn't clear it up, please post the code you are using, and explain exactly what output you are getting.
 

Author Comment

by:tpiazza
ID: 9899041
each line in the text file is like

  ktc,163008,A,3458.8221,N

with the following code if i ask for buf[1]

i get

163008
163008





#include <iostream.h>
#include <fstream.h>
#include <stdlib.h>
#include <string.h>

int main ()

{
   char buffer[500];
   
    int i;
      char *p;
      char *buf[500];

   ifstream thefile;

   thefile.open ("c:/file.txt", ios::in);    
                                                   
                                                   
   if (! thefile.is_open()) {
      cout << "Error opening file";
      exit (1);
   }


   while (! thefile.eof() ) {
   
     thefile.getline(buffer, 499);

      p=strtok(buffer,",");
   
        if(p!=NULL)
    {   buf[0] = p;
    }
   
        i = 1;
   
      while((p=strtok(NULL,","))!=NULL)
    {  
          buf[i++] = p;
    }
     
        cout  << buf[0] <<endl;

     }

  thefile.close();
  return 0;
}


 

Author Comment

by:tpiazza
ID: 9899078
if i move  

while((p=strtok(NULL,","))!=NULL)
    {  
          buf[i++] = p;
    }
     
       
     }

cout  << buf[0] <<endl;

  thefile.close();

i dont get any results for buf[0]
 
 

Author Comment

by:tpiazza
ID: 9899223
ok this is odd -- i originally only had one line of text in the file -- works like a champ with more than one line -- if it has only one line is when you get the aforementioned output
 
LVL 16

Expert Comment

by:imladris
ID: 9899411
Ah, I see.

For a single line the loop will proceed as follows:

while (! thefile.eof() ) {
     thefile.getline(buffer, 499);
     
     //process line

     cout  << buf[0] <<endl;
}

end of file yet, no
get next line
process line
output first token
go back to top of loop
end of file yet, no
get next line
(end of file condition is now raised)
process contents of buffer (still contains first line)
output first token

So you see, the loop is going one iteration too far. You need something like:

while(!thefile.eof())
{   thefile.getline(buffer,499);
     if(!thefile.eof())
     {   //process line
          cout << buf[0] << endl;
     }
}
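
An alternative that is often used is to let the read itself be the loop condition, so the body only runs when a line was actually read. The sketch below assumes standard C++ headers rather than `<iostream.h>`; the helper name `first_fields` is hypothetical.

```cpp
#include <cstring>
#include <istream>
#include <sstream>   // std::istringstream, handy for trying the helper without a file
#include <string>
#include <vector>

// Hypothetical helper: returns the first comma-separated field of every line.
std::vector<std::string> first_fields(std::istream& in) {
    std::vector<std::string> result;
    char buffer[500];
    // getline returns the stream, which tests false once a read fails,
    // so the body never runs on a stale buffer.
    while (in.getline(buffer, sizeof buffer)) {
        char* p = std::strtok(buffer, ",");
        if (p != NULL) {
            result.push_back(p);
        }
    }
    return result;
}
```

Written this way, a final line without a trailing newline is still processed. A line longer than the buffer makes the read fail, which ends the loop early, so the buffer size remains an assumption about the input.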

 
LVL 4

Expert Comment

by:dhyanesh
ID: 9902143
Hi

Sorry for my mistake with declaration of buf.

It should be char *buf[500] as imladris pointed out

Dhyanesh
 

Author Comment

by:tpiazza
ID: 9904947
thanks so much -- appreciate the detailed explanations

last question

if i want to only output the line where buf[0] = ktc how would i go about it

my file contains info in the following form

ktc,163008,A,3458.8221,N,08200.8754,W,61.7,137.3,120603,6.2,W,A*24
gmg,163008,A,3458.8221,N,08200.8754,W,61.7,137.3,120603,6.2,W,A*24
ktc,163008,A,3458.8221,N,08200.8754,W,61.7,137.3,120603,6.2,W,A*24
gmg,163008,A,3458.8221,N,08200.8754,W,61.7,137.3,120603,6.2,W,A*24

i only need the ktc

if i throw

 if(!thefile.eof())
     {  
           
             if(buf[0] = "ktc")
             {
             cout  << buf[0] << "  " << buf[1] <<endl;
             }
     }
      
      
or move it on top of the while statement it still outputs every line with buf[0] as ktc and then buf[1] as what its supposed to be on every other line
 
LVL 16

Accepted Solution

by:
imladris earned 80 total points
ID: 9904982
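buf[0]="ktc" is an assignment, not a comparison: it makes buf[0] point at the literal "ktc", and because that pointer is non-null the condition is always true. The equality operator is '=='. But even that doesn't work for character arrays, because it compares the two addresses rather than the characters. Assuming you want to compare the token that buf[0] points to with "ktc" you should use strcmp: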

if(strcmp(buf[0],"ktc")==0)
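
Put together with the earlier pieces, filtering for the "ktc" lines might look like the sketch below. It assumes standard C++ headers and the line format shown in the question; the helper name `print_ktc_lines` is hypothetical.

```cpp
#include <cstring>
#include <istream>
#include <ostream>
#include <sstream>   // std::istringstream, handy for trying the helper without a file

// Hypothetical helper: writes the first two fields of each line whose
// first field is "ktc". Returns the number of matching lines.
int print_ktc_lines(std::istream& in, std::ostream& out) {
    char buffer[500];
    int matches = 0;
    while (in.getline(buffer, sizeof buffer)) {
        char* first = std::strtok(buffer, ",");
        // Only ask for a second token if a first one was found.
        char* second = (first != NULL) ? std::strtok(NULL, ",") : NULL;
        // strcmp returns 0 when the two strings hold the same characters.
        if (first != NULL && second != NULL && std::strcmp(first, "ktc") == 0) {
            out << first << "  " << second << '\n';
            ++matches;
        }
    }
    return matches;
}
```

Calling it with the open file stream and `cout` prints only the "ktc" rows, and lines starting with "gmg" are skipped.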

 

Author Comment

by:tpiazza
ID: 9906017
thank you for your help