Solved

Parsing a CSV

Posted on 2003-12-05
Last Modified: 2006-11-17
i am reading in a text file line by line -- i need to parse each line by commas

i am running into trouble when i try to grab an element out of the line

buffer[1] returns the first character not everything between the commas.

what i would like is the ability to read each line and reference the entire element as buffer[1] buffer[2] etc

code is below

#include <iostream.h>
#include <fstream.h>
#include <stdlib.h>


int main ()

{
   char buffer[500];
   char dataline[500];

   int strcmp( const char* s1,
                     const char* s2 );
   
   int count = 0;
   
   ifstream thefile;

   thefile.open ("c:/file.txt", ios::in);    
                                                   
                                                   
   if (! thefile.is_open()) {
      cout << "Error opening file";
      exit (1);
   }


   while (! thefile.eof() ) {
     
       thefile.getline(buffer, 499, '/o');
      
       cout << buffer << endl;

     }

  thefile.close();
  return 0;
}
Question by:tpiazza
LVL 16

Expert Comment

by:imladris
char buffer[500]; represents a single line of 500 characters.
The getline method reads a single line.

To read multiple lines into memory and be able to address them you would need a two dimensional array:

char buffer[10][500];

That declaration represents 10 lines of 500 characters. You could read into it like:

int i = 0;
while (! thefile.eof() ) {
     
      thefile.getline(buffer[i], 499);   // read the next line into row i
     
      cout << buffer[i] << endl;         // print the row that was just read
      i++;

     }

However that would, of course, run into trouble after 10 lines. Along this route you would have to read the whole file into buffer, which would mean you would have to know or find out beforehand how many lines there are in the file.

It is more common to read and process one line at a time, similar to what you are doing now.


Author Comment

by:tpiazza
need to do it line by line -- the files range in size
LVL 4

Assisted Solution

by:dhyanesh
dhyanesh earned 20 total points
Hi

I think this should be something like:

while (! thefile.eof() ) {
     
     thefile.getline(dataline, 499);         //  You do not need to pass the third argument; it is optional
     
      cout << dataline << endl;

    }

Now to get data between commas you could use strtok() function. It gets all data until a delimiter.

You will have to declare buffer something like:

char *buffer[15];                //If you have 15 fields at max

This makes buffer an array of 15 pointers to characters.

In strtok() you have to pass dataline as first argument and second argument will be delimiter. It will return a pointer to first field i.e. all characters before the first comma.

First call to strtok() makes it return a pointer to the string before the first delimiter. It also overwrites that delimiter with a '\0', so the field becomes a string of its own.

Subsequent calls to strtok() with NULL as first argument and delimiter as second argument will parse the string and return the subsequent fields until the end. When no more fields are left NULL is returned.
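The calling sequence described above could look roughly like this. It is a sketch only: splitFields and its parameters are made-up names, and it uses the standard <cstring> header rather than the older headers used elsewhere in this thread. One thing to keep in mind is that strtok() treats a run of delimiters as a single one, so an empty field such as the middle one in "a,,b" is skipped.

```cpp
#include <cstring>   // std::strtok
#include <cstddef>   // std::size_t, NULL

// Splits `line` in place at each comma and stores a pointer to every field
// in `fields`. Returns the number of fields stored (never more than maxFields).
// Hypothetical helper for illustration.
std::size_t splitFields(char* line, char* fields[], std::size_t maxFields)
{
    std::size_t count = 0;
    // First call: pass the string itself.
    char* p = std::strtok(line, ",");
    while (p != NULL && count < maxFields) {
        fields[count++] = p;
        // Later calls: pass NULL so strtok() continues where it stopped.
        p = std::strtok(NULL, ",");
    }
    return count;
}
```

With line holding "ktc,163008,A,3458.8221,N", the call splitFields(line, fields, 15) returns 5, and fields[0] is "ktc", fields[1] is "163008" and so on.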

Dhyanesh



Author Comment

by:tpiazza
mind posting some code with the strtok()

mine keeps erroring out
LVL 16

Expert Comment

by:imladris
char *p;

while (! thefile.eof() ) {
     
      thefile.getline(buffer, 499);
      p=strtok(buffer,",");
      if(p!=NULL)
      {   // process first field
      }
      while((p=strtok(buffer,NULL))!=NULL)
      {   // process next field
      }
     
      cout << buffer << endl;

     }
LVL 3

Expert Comment

by:merphle
Or, if the code to process the first field and all subsequent fields is the same:

char *p;
while (! thefile.eof() ) {

      thefile.getline(buffer, 499);
      for (p=strtok(buffer,","); p != NULL; p=strtok(buffer,NULL)) {
           // process field
      }

}
LVL 4

Expert Comment

by:dhyanesh
Hi

I do not think strtok() works the way it is posted above.

As given in documentation of Turbo C++ it should be something like:

char *p;

while (! thefile.eof() ) {
     
     thefile.getline(buffer, 499);
      p=strtok(buffer,",");
     if(p!=NULL)
     {   // process first field
     }
     while((p=strtok(NULL,","))!=NULL)   //first argument should be NULL and not buffer and 2nd argument should be the delimiter
     {   // process next fields
     }
     
     cout << buffer << endl;

    }


Dhyanesh
LVL 4

Expert Comment

by:dhyanesh
Hi

Also if you want to reference each field like buf[0], buf[1] then you would have to do something like:

int i;
char *p;
char (*buf)[15];

while (! thefile.eof() ) {
   
     thefile.getline(buffer, 499);
      p=strtok(buffer,",");
    if(p!=NULL)
    {   buf[0] = p;
    }
    i = 1;
    while((p=strtok(NULL,","))!=NULL) //first argument should be NULL and not 'buffer' and 2nd argument should be the delimiter
    {  
          buf[i++] = p;
    }
   
   }

After using strtok() the original string i.e. 'buffer' will have a '\0' placed where each delimiter was. So if you do

cout << buffer <<endl;


You will see only the first field. However you can access the other fields by buf[0], buf[1], buf[2], .....
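If the whole line is still needed after parsing, one option is to tokenize a copy instead of the original. A rough sketch, where splitCopy and the work buffer are made-up names for illustration:

```cpp
#include <cstring>   // std::strncpy, std::strtok
#include <cstddef>   // std::size_t, NULL

// Copies `line` into `work` (capacity workSize) and tokenizes the copy,
// so the caller's `line` is left untouched. Returns the number of fields stored.
// Hypothetical helper; the pointers in `fields` point into `work`.
std::size_t splitCopy(const char* line, char* work, std::size_t workSize,
                      char* fields[], std::size_t maxFields)
{
    if (workSize == 0) {
        return 0;
    }
    std::strncpy(work, line, workSize - 1);
    work[workSize - 1] = '\0';           // strncpy may not terminate a long line

    std::size_t count = 0;
    for (char* p = std::strtok(work, ",");
         p != NULL && count < maxFields;
         p = std::strtok(NULL, ",")) {
        fields[count++] = p;
    }
    return count;
}
```

After the call, printing line still shows every field, while printing work shows only the first one.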

Dhyanesh

Author Comment

by:tpiazza
i keep getting the following error

C:\Program Files\Microsoft Visual Studio\MyProjects\parse\parse.cpp(36) : error C2440: '=' : cannot convert from 'char *' to 'char [500]'
        There are no conversions to array types, although there are conversions to references or pointers to arrays

#include <iostream.h>
#include <fstream.h>
#include <stdlib.h>
#include <string.h>


int main ()

{
   char buffer[500];
   
    int i;
      char *p;
      char (*buf)[500];

   ifstream thefile;

   thefile.open ("c:/file.txt", ios::in);    
                                                   
                                                   
   if (! thefile.is_open()) {
      cout << "Error opening file";
      exit (1);
   }


   while (! thefile.eof() ) {
   
     thefile.getline(buffer, 499);
      p=strtok(buffer,",");
    if(p!=NULL)
    {   buf[0] = p;
    }
    i = 1;
    while((p=strtok(NULL,","))!=NULL)
   {  
          buf[i++] = p;
    }
     
      cout << buffer << endl;
        cout << buf[0] << endl;

     }

  thefile.close();
  return 0;
}
LVL 16

Expert Comment

by:imladris
This line:

char (*buf)[500];

declares a pointer named buf, which points to an array of 500 characters.

Thus buf[0] is the "first" array of 500 characters, and buf[1] is the "second" array of 500 characters; neither is a pointer that can be assigned to. So

buf[0]=p;

where p is a pointer to a single character is going to cause a conversion error. If you want to save the pointers to the tokens you find in buffer you could declare:

char *buf[500];

This is an array of 500 pointers to character. So buf[0] is a pointer to character, just like p is, so the assignment will now work.
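To put the two declarations side by side, here is a small sketch. The typedef names and storeFirst are made up for illustration only.

```cpp
typedef char (*PointerToArray)[500];  // same type as: char (*buf)[500];
typedef char *ArrayOfPointers[500];   // same type as: char *buf[500];

// Stores a token pointer in slot 0 of an array of pointers and returns
// what the slot now holds. As a parameter, the array is passed as a
// pointer to its first slot.
char *storeFirst(ArrayOfPointers slots, char *token)
{
    slots[0] = token;      // fine: slots[0] is a char*, just like token
    return slots[0];
}
```

A variable of type ArrayOfPointers holds 500 separate char* slots, whereas a PointerToArray is a single pointer, and dereferencing it gives a whole char[500], which cannot appear on the left of an assignment.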

Note also that strtok changes buffer, so you will not be able to use buffer to emit the line to cout at the end.

Author Comment

by:tpiazza
that gets me the output i want -- however buf[0] returns the first element once and buf[1], buf[2], etc returns each element twice.

also

i = 1;
    while((p=strtok(NULL,","))!=NULL)
   {  
          buf[i++] = p;
    }
     
      if i change i=0 i get the second element in the list

    please explain how this iterates




Author Comment

by:tpiazza
actually it does buf[0] twice -- it only displays output once
LVL 16

Expert Comment

by:imladris
I would expect each element of buf to contain 1 token.

I'm not sure what you mean by "buf[1] returns each element twice".

I would expect:

for(int j=0; j<i; ++j)
{   cout << buf[j] << endl;
}

to show a list of the tokens that were found.

If that didn't clear it up, please post the code you are using, and explain exactly what output you are getting.

Author Comment

by:tpiazza
each line in the text file is like

  ktc,163008,A,3458.8221,N

with the following code if i ask for buf[1]

i get

163008
163008





#include <iostream.h>
#include <fstream.h>
#include <stdlib.h>
#include <string.h>

int main ()

{
   char buffer[500];
   
    int i;
      char *p;
      char *buf[500];

   ifstream thefile;

   thefile.open ("c:/file.txt", ios::in);    
                                                   
                                                   
   if (! thefile.is_open()) {
      cout << "Error opening file";
      exit (1);
   }


   while (! thefile.eof() ) {
   
     thefile.getline(buffer, 499);

      p=strtok(buffer,",");
   
        if(p!=NULL)
    {   buf[0] = p;
    }
   
        i = 1;
   
      while((p=strtok(NULL,","))!=NULL)
    {  
          buf[i++] = p;
    }
     
        cout  << buf[0] <<endl;

     }

  thefile.close();
  return 0;
}



Author Comment

by:tpiazza
if i move  

while((p=strtok(NULL,","))!=NULL)
    {  
          buf[i++] = p;
    }
     
       
     }

cout  << buf[0] <<endl;

  thefile.close();

i dont get any results for buf[0]
 

Author Comment

by:tpiazza
ok this is odd -- i originally only had one line of text in the file -- works like a champ with more than one line -- if it has only one line is when you get the aforementioned output
LVL 16

Expert Comment

by:imladris
Ah, I see.

For a single line the loop will proceed as follows:

while (! thefile.eof() ) {
     thefile.getline(buffer, 499);
     
     //process line

     cout  << buf[0] <<endl;
}

end of file yet, no
get next line
process line
output first token
go back to top of loop
end of file yet, no
get next line
(end of file condition is now raised)
process contents of buffer (still contains first line)
output first token

So you see, the loop is going one iteration too far. You need something like:

while(!thefile.eof())
{   thefile.getline(buffer,499);
     if(!thefile.eof())
     {   //process line
          cout << buf[0] << endl;
     }
}

LVL 4

Expert Comment

by:dhyanesh
Hi

Sorry for my mistake with declaration of buf.

It should be char *buf[500] as imladris pointed out

Dhyanesh

Author Comment

by:tpiazza
thanks so much -- appreciate the detailed explanations

last question

if i want to only output the line where buf[0] = ktc how would i go about it

my file contains info in the following form

ktc,163008,A,3458.8221,N,08200.8754,W,61.7,137.3,120603,6.2,W,A*24
gmg,163008,A,3458.8221,N,08200.8754,W,61.7,137.3,120603,6.2,W,A*24
ktc,163008,A,3458.8221,N,08200.8754,W,61.7,137.3,120603,6.2,W,A*24
gmg,163008,A,3458.8221,N,08200.8754,W,61.7,137.3,120603,6.2,W,A*24

i only need the ktc

if i throw

 if(!thefile.eof())
     {  
           
             if(buf[0] = "ktc")
             {
             cout  << buf[0] << "  " << buf[1] <<endl;
             }
     }
      
      
or move it on top of the while statement it still outputs every line with buf[0] as ktc and then buf[1] as what it's supposed to be on every other line
LVL 16

Accepted Solution

by:imladris
imladris earned 80 total points
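buf[0]="ktc" is an assignment, not a comparison: it makes buf[0] point at the literal "ktc", and since that pointer is never null the condition is always true, which is why every line prints as ktc. The equality operator is '=='. But even that doesn't work here, because on character pointers it compares the addresses rather than the characters. Assuming you want to compare the token that buf[0] points to with "ktc" you should use strcmp, which returns 0 when the two strings match: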

if(strcmp(buf[0],"ktc")==0)


Author Comment

by:tpiazza
thank you for your help