Solved

write code to convert csv file to ascii file

Posted on 2004-08-31
Last Modified: 2008-03-10
Hi there,

I have a CSV file with 30 columns and 8000 rows of records. I want to pick some of the columns and write them to a new file. This new file must be an ASCII flat file in a fixed format. Here is an example:

CSV file at c:\temp\profile.csv:
Name,ID,Age,Gender,Location,
="john",="1001",="32",="M",="NY",
="mary",="1002",="28",="F",="NJ",
="david",="1003",="33",="M",="MA",
="jane",="1004",="34",="F",="PA",


The new ASCII file, c:\temp\profilenew.txt, should have this fixed-width format:

Field       Start   End   Length
Name            1    10       10
ID             11    16        6
Age            17    19        3
Gender         20    20        1
Location       21    22        2


How can I write the code?
Thanks
Question by:justinY
 
LVL 9

Expert Comment

by:Cayce
ID: 11947338
Is this homework?
 

Author Comment

by:justinY
ID: 11948243
Yes. Do you know how to do it?
0
 
LVL 22

Expert Comment

by:grg99
ID: 11951071
Here are the basic steps; you just need to flesh out each one:

open the file

while not end of file:
     read a line
     while not end of line:
        look for a quote; look for another; extract the stuff between; put it in an array
     write out the array elements in the right widths


... that's basically it.

for the file I/O, see "man iostream"
for the string operations, see "man string"

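The steps above might be fleshed out along these lines. This is only a sketch, assuming every field is wrapped in double quotes as in the sample (="john") and that no field contains a quote of its own; the function names quoted_fields, fixed_record and convert are made up for illustration, and the brace initialisers assume a C++11 compiler.

```cpp
#include <cstddef>
#include <iomanip>
#include <iostream>
#include <sstream>
#include <string>
#include <vector>

// Collect the text between each pair of double quotes on one line,
// e.g. ="john",="1001" gives {"john", "1001"}.
// Assumption: no field contains an embedded quote character.
std::vector<std::string> quoted_fields(const std::string& line)
{
    std::vector<std::string> fields;
    std::string::size_type pos = 0;
    for (;;) {
        std::string::size_type open = line.find('"', pos);          // look for a quote
        if (open == std::string::npos) break;
        std::string::size_type close = line.find('"', open + 1);    // look for another
        if (close == std::string::npos) break;                      // unmatched quote: stop
        fields.push_back(line.substr(open + 1, close - open - 1));  // extract the stuff between
        pos = close + 1;
    }
    return fields;
}

// Lay the fields out left-justified in the given widths.
// Note that setw pads short values but never truncates long ones.
std::string fixed_record(const std::vector<std::string>& fields,
                         const std::vector<int>& widths)
{
    std::ostringstream os;
    for (std::size_t i = 0; i < fields.size() && i < widths.size(); ++i)
        os << std::left << std::setw(widths[i]) << fields[i];
    return os.str();
}

// Skip the header line, then write one fixed-width record per data line.
void convert(std::istream& in, std::ostream& out, const std::vector<int>& widths)
{
    std::string line;
    std::getline(in, line);  // header: column names only
    while (std::getline(in, line))
        out << fixed_record(quoted_fields(line), widths) << '\n';
}
```

From main you would open the files with std::ifstream in("c:\\temp\\profile.csv") and std::ofstream out("c:\\temp\\profilenew.txt") (including <fstream>), then call convert(in, out, {10, 6, 3, 1, 2}).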
 
LVL 11

Expert Comment

by:bcladd
ID: 11953616
(1) Do you have to worry about internal quotes or commas? That is, can a field contain quotes or commas? If so, you will want to make sure you find non-escaped quotes (or, if quotes can be internal and commas can't, you can scan for commas).

(2) Show us the code that you have or ask specific questions and we would all be happy to help. We can't write the code for you to turn in but we can help you figure out how to extract the fields and write the fixed format lines.
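To illustrate point (1), here is one way to split a line so that commas inside quotes are kept as data. It is a sketch that assumes the common CSV convention (the one described in RFC 4180) of writing a literal quote inside a quoted field as two quotes; the function name split_csv_line is hypothetical. Characters outside the quotes are kept, so ="john" comes back as =john and the = would still need stripping.

```cpp
#include <string>
#include <vector>

// Split one CSV line into fields, honouring quotes.
// Assumes a quote inside a quoted field is written as two quotes,
// e.g. "say ""hi""" holds: say "hi"
std::vector<std::string> split_csv_line(const std::string& line)
{
    std::vector<std::string> fields;
    std::string field;
    bool in_quotes = false;
    for (std::string::size_type i = 0; i < line.size(); ++i) {
        char c = line[i];
        if (in_quotes) {
            if (c == '"') {
                if (i + 1 < line.size() && line[i + 1] == '"') {
                    field += '"';       // escaped quote: keep one, skip the second
                    ++i;
                } else {
                    in_quotes = false;  // closing quote (not stored)
                }
            } else {
                field += c;             // commas inside quotes are ordinary data
            }
        } else if (c == '"') {
            in_quotes = true;           // opening quote (not stored)
        } else if (c == ',') {
            fields.push_back(field);    // an unquoted comma ends the field
            field.clear();
        } else {
            field += c;
        }
    }
    fields.push_back(field);  // last field (empty if the line ends with ',')
    return fields;
}
```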

-bcl
 

Author Comment

by:justinY
ID: 11980307
grg99 , bcladd

I am struggling with

look for a quote;  look for another;
extract the stuff between;
put it in an array
write out the array elements in the right widths

Can you give me some hints or sample code that can point me in the right direction?
 
LVL 17

Expert Comment

by:rstaveley
ID: 11983149
justinY, show us what you have so far and show us where you've snagged and we'll be able to help.

Here's some help...

To find the offset of a '"' character in a string, try using a loop that compares each character in the string with '"'.

If that doesn't click for you, show us your attempt at writing the loop, and let us know what you reckon is wrong with it.

There are of course many different ways of doing this, and once you get the ball rolling you'll get suggestions from other experts who propose different ways of looking for characters in strings. The best way to do it is *your* way.
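For reference, a minimal version of such a loop might look like this (find_quote is a made-up name). It returns the index of the first quote at or after a starting position, or -1 if there is none.

```cpp
#include <string>

// Return the offset of the first '"' at or after position start,
// or -1 if the string contains no further quote.
int find_quote(const std::string& s, int start = 0)
{
    for (int i = start; i < static_cast<int>(s.size()); ++i)
        if (s[i] == '"')  // compare each character with '"'
            return i;
    return -1;
}
```

The standard library already offers this: s.find('"', start) returns the same index, or std::string::npos when there is no quote.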
 

Author Comment

by:justinY
ID: 11985490
I can simplify the file so it uses only commas (no quotes). My thoughts are:
1. Start reading at the second line, because the first line holds the column names, not values.
2. Read the second line into a string, from the first character to the end of the line.
3. Read from the first character until the first comma and put this field1 into an array; then continue reading until the second comma and put field2 into the array; and so on until the end of the line.
4. Write all the values to an output file in the required fixed-length format.

----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

#include<iostream.h>
#include<fstream>
#include<iomanip.h>

main()
{
char n[40];       // declare an array n has 40 fields //                  
for (int i = 0; i < 40; i++)      // initialize array //
int j;             // declare line count //
for (int j = 1; j < 10000; j++)      // initialize line count //

// open file1 (source file) in reading mode //
fstream file1_op("c:\temp\file1.csv",ios::in);

// open file2 (output file) in writing mode //
fstream file2_op("c:\temp\file2.txt",ios::in);

if (!file1_op)      // check if file1 open for reading //
{
  cout << "Error Open File" << endl;
  return -1;
}
if (!file2_op)                    // check if file2 open for writing //
{
  cout << "Error Open File" << endl;
  return -1;
}

// starting from second line(j=1), loop for j=1,i=0,1,2,3,4...39 //
// get value at field j=1, i=0; j=1,i=1; j=1,i=2; ... j=1,i=39 //
// and write each field's value to file2.txt (output file) //
while(file1_op !=eof())      
{
file1_op.getline(file1_op,line,","); // i am not quite sure if this is right //

/* I was struggling here for a long time. What I want to do is read line 1
   from the first character until the first comma and put the first field's value into position j=1, i=0; then continue reading until the second comma and put the second field's value into position j=1, i=1; and keep doing this until the end of the line, putting the last field's value into position j=1, i=39. Once I have all the fields' values, I start to write them to file2.txt in the new format defined by the first field's length=10, the second field's length=6, the third field's length=3, ... */

}

I don't know if I have made myself clear enough. Please help.
 
LVL 17

Expert Comment

by:rstaveley
ID: 11988253
You need to pace yourself. There is a lot to get your head around. Start off by getting a working framework which reads lines of text from an input file and outputs them pretty much intact in an output file.

(1) The headers should be per the standard.

Don't use:

#include<iostream.h>
#include<iomanip.h>

Use:

#include<iostream>
#include<iomanip>
using namespace std;

(2) You are going to be working with strings. Use the standard library string rather than character arrays. If you find yourself using an array or a pointer, you should feel bad about it.

Use:

#include <string>   /* Provides you with standard library strings */
#include <vector>  /* Provides you with standard library vectors (the proper container to use if you find yourself drawn towards evil arrays). Vectors are a good basic container. */

(3) Remember that DOS backslashes need to be escaped (e.g. "c:\\temp\\file1.csv").

(4) ios::in is not right for the output stream. I prefer using ifstream for input and ofstream for output. There is less scope for getting things wrong than with fstream and ios::in or ios::out.

(5) You'll find that the standard library <string> header defines getline to work with strings. This is what you should be using to read lines from your input stream. The standard library string is basic_string<char>, an instantiation of the basic_string template. Don't be put off by the complicated appearance of the definition at http://www.sgi.com/tech/stl/basic_string.html; it is as simple as this:

       int lines_read = 0;
       string line;   // Here's what you want to read the line into
       while (getline(file1_op,line)) {
              // OK, you've got your line, now tokenise it...
              // ... but for the time being let's simply do this
              lines_read++;
              file1_op << "Line " << lines_read << ": " << line << '\n';
       }

This takes you as far as reading through the input file and writing to the output file. See if you can get to this point before getting into tokenising etc.

Let us know how you get on with this and then you can progress to the next stage.
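Putting points (1) to (5) together, a sketch of the framework might look like this. It uses ifstream and ofstream, escaped backslashes and getline with a string; the paths are the ones from the question, and the function names copy_numbered_lines and run_framework are hypothetical. The copying loop takes istream and ostream references, so it can be tried out with string streams as well as with files.

```cpp
#include <fstream>
#include <iostream>
#include <sstream>  // istringstream/ostringstream, handy for trying the loop without files
#include <string>
using namespace std;

// Copy every line of in to out, prefixed with its line number.
// Returns how many lines were read. Tokenising comes later.
int copy_numbered_lines(istream& in, ostream& out)
{
    int lines_read = 0;
    string line;  // Here's what you want to read the line into
    while (getline(in, line)) {
        lines_read++;
        out << "Line " << lines_read << ": " << line << '\n';
    }
    return lines_read;
}

// Open the input with ifstream and the output with ofstream.
// Note the doubled backslashes in the DOS paths.
int run_framework()
{
    ifstream file1_op("c:\\temp\\file1.csv");
    if (!file1_op) {
        cerr << "Cannot open input file\n";
        return 1;
    }
    ofstream file2_op("c:\\temp\\file2.txt");
    if (!file2_op) {
        cerr << "Cannot open output file\n";
        return 1;
    }
    copy_numbered_lines(file1_op, file2_op);
    return 0;
}
```

main can simply return run_framework().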
 

Author Comment

by:justinY
ID: 11990597
Thank you very much. So far so good.
But I have some questions:
1. I want to skip reading line 0 ( line 0 contains no value, only column names), so can we change int lines_read = 1; ?
2. Do we still need file2_op as output file ? if yes, can we change to
 file2_op << "Line " << lines_read << ": " << line << '\n';
(we are reading the contents between " " and treating ; as the separator, then writing everything we have read to the output file, right?)
3. I only want to write some columns to the output file (not all the columns). For example, I want to write columns 2, 3, 5, 10, 14, 20, 30, 34, 35, 37 and 40 to the output file file2.txt. Columns 2, 3 and 5 hold constants. Column 14 has no direct value; it must be looked up in a small file, file3.txt. How can we do that?
 
LVL 17

Expert Comment

by:rstaveley
ID: 11990869
> 1. I want to skip reading line 0 ( line 0 contains no value, only column names), so can we change int lines_read = 1; ?

    if (lines_read > 1)
              file2_op << "Line " << lines_read << ": " << line << '\n';

> 2. Do we still need file2_op as output file ?

Sorry, I don't understand the question. The code written there was a place-holder. You need to start doing clever things with the string (i.e. separating it into substrings or "tokenising" the string).

> 3. I only want to write some columns...

Before you get to that, have you managed to tokenise the string?

The following illustrates a crude but effective way of tokenising your string (with no handling for quotes).
--------8<--------
#include <iostream>
#include <string>
using namespace std;

void tokenise(const string& line);
void process(const string& substring);

int main()
{
      string line = "one,two,three";
      tokenise(line);
}

void tokenise(const string& line)
{
      // Look for a ',' in the string
      for (string::size_type i = 0; i < line.size(); i++)
            if (line[i] == ',') {
                  process(string(line, 0, i));   // Process the substring up to the ','
                  tokenise(string(line, i + 1)); // Tokenise the remainder
                  return;
            }
      process(line); // Process all of this string (there is no ',')
}

void process(const string& substring)
{
      cout << "Here is a substring: " << substring << '\n';
}
--------8<--------

There are wiser ways of doing this, but this illustrates a simple way of tokenising the line into substrings. See if you can do something with the process function to make it output the substring in a suitable manner for lines read from your input file.
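As one example of a less crude approach, the same splitting can be done without recursion by repeatedly calling string::find from the position after the previous comma. This is a sketch with a hypothetical name (split_commas); like the version above, it does not handle quotes.

```cpp
#include <string>
#include <vector>
using namespace std;

// Split line at every ',' using string::find instead of recursion.
vector<string> split_commas(const string& line)
{
    vector<string> parts;
    string::size_type start = 0;
    string::size_type comma;
    while ((comma = line.find(',', start)) != string::npos) {
        parts.push_back(line.substr(start, comma - start));  // text up to the ','
        start = comma + 1;                                     // carry on after it
    }
    parts.push_back(line.substr(start));  // whatever follows the last ','
    return parts;
}
```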
 

Author Comment

by:justinY
ID: 11991904
Thank you, the following is my process function
// what I want to do is that write the substrings to file2 (output file) and set the length to fixed length.//

void process(const string& substring)
{
               file2 << setw(4) << substring << '\n' ;
               file2 << setw(5) << substring << '\n' ;
               file2 << setw(6) << substring << '\n' ;
}
 
LVL 17

Expert Comment

by:rstaveley
ID: 11992729
Test your implementation of process with some test data. Indeed you can test it with the harness code in my previous comment, if you substitute file2 with cout. I think you'll find that it doesn't do what you want it to do.

 
LVL 17

Expert Comment

by:rstaveley
ID: 11992773
You could implement something like this.

--------8<--------
#include <iostream>
#include <iomanip>
#include <string>
using namespace std;

void tokenise(const string& line, ostream& os, int column = 0);
void process(const string& substring, ostream& os, int column);

int main()
{
     string line = "john,1001,32,M,NY";
     tokenise(line, cout);
     cout << '\n';
}

void tokenise(const string& line, ostream& os, int column)
{
     // Look for a ',' in the string
     for (string::size_type i = 0; i < line.size(); i++)
          if (line[i] == ',') {
               process(string(line, 0, i), os, column);       // Process the substring up to the ','
               tokenise(string(line, i + 1), os, column + 1); // Tokenise the remainder
               return;
          }
     process(line, os, column); // Process all of this string (there is no ',')
}

void process(const string& substring, ostream& os, int column)
{
     const int width[] = {10, 6, 3, 1, 2};  // output width of each column, in order
     if (column < static_cast<int>(sizeof(width) / sizeof(width[0])))  // ignore any extra columns
         os << setw(width[column]) << substring;
}
--------8<--------

All that's missing from this is the line feed at the end of the line (unless I'm mistaken at this late hour...)
 
LVL 17

Expert Comment

by:rstaveley
ID: 11992780
BTW... If you want left justification, you should use:

  os << setiosflags(ios::left);
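A small sketch showing the difference (pad is a hypothetical helper). The left and right manipulators from <ios> do the same job as setiosflags(ios::left), with one difference: they also clear the other adjustment flag, whereas setiosflags only adds a flag, which makes switching back and forth less error-prone. Note also that setw only pads; it never truncates a value longer than the width.

```cpp
#include <iomanip>
#include <sstream>
#include <string>
using namespace std;

// Format value in a field of the given width, right-justified
// (the stream default) or left-justified.
string pad(const string& value, int width, bool left_justify)
{
    ostringstream os;
    if (left_justify)
        os << left;   // same effect as setiosflags(ios::left) on a fresh stream
    else
        os << right;
    os << setw(width) << value;
    return os.str();
}
```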
 

Author Comment

by:justinY
ID: 11997338
Thank you.
Now let's change the layout. The new ASCII file, c:\temp\profilenew.txt, should have this format:

Field       Start   End   Length
ID              1    10       10
Name           11    16        6
Age            17    19        3
Location       20    21        2
Gender         22    22        1

How can I do that?
 
LVL 17

Expert Comment

by:rstaveley
ID: 11997472
Change my suggested width settings from...

> const int width[] = {10,6,3,1,2};

...to...

const int width[] = {10,6,3,2,1};
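One thing to watch: changing the widths alone keeps the input column order (Name, ID, Age, Gender, Location), so the name would still be written first. If the fields also have to be rearranged to match the new layout, one option is a table saying which input column feeds each output field. This is a sketch assuming a C++11 compiler; OutField, new_layout and build_record are hypothetical names, and cutting long values with substr is a choice you may not want for real data.

```cpp
#include <cstddef>
#include <iomanip>
#include <sstream>
#include <string>
#include <vector>
using namespace std;

// One output field: which input column feeds it, and how wide it is.
struct OutField {
    size_t source_column;  // index into the tokenised input line
    int width;             // fixed width in the output record
};

// New layout ID(10) Name(6) Age(3) Location(2) Gender(1), assuming
// the input columns are Name, ID, Age, Gender, Location.
const vector<OutField> new_layout = {
    {1, 10}, {0, 6}, {2, 3}, {4, 2}, {3, 1}
};

// Build one fixed-width record in layout order. Long values are cut
// with substr (setw alone never truncates); missing columns are blank.
string build_record(const vector<string>& columns, const vector<OutField>& layout)
{
    ostringstream os;
    os << left;
    for (const OutField& f : layout) {
        string value = f.source_column < columns.size() ? columns[f.source_column] : string();
        os << setw(f.width) << value.substr(0, f.width);
    }
    return os.str();
}
```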
 

Author Comment

by:justinY
ID: 11998467
Thank you.
More questions for you
1. what about if I want to print Justin in Name column (no other names, only Justin)?
2. This is the program for one line ( one row) only, right ? How about for many rows (like N rows) ?
 
LVL 17

Expert Comment

by:rstaveley
ID: 11999064
> 1. what about if I want to print Justin in Name column (no other names, only Justin)?

I'm not with you. Do you mean skip all records other than Justin's?

> 2. This is the program for one line ( one row) only, right ? How about for many rows (like N rows) ?

This program needs to be merged with your program, which reads all lines from the file and the tokenise function should be called for each line.

i.e.

      int lines_read = 0;
       string line;   // Here's what you want to read the line into
       while (getline(file1_op,line)) {
              lines_read++;
              if (lines_read > 1) {
                  tokenise(line,file2_op);
                  file2_op << '\n';
              }
       }
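A self-contained sketch of that merge. The tokenise and process functions are adapted from the earlier comment (tokenise uses string::find rather than a character loop, and process adds left justification), and the reading loop is wrapped in a hypothetical convert_all function taking stream references. The widths are the original {10, 6, 3, 1, 2}.

```cpp
#include <iomanip>
#include <iostream>
#include <sstream>
#include <string>
using namespace std;

// Write one field, left-justified, in the width for its column.
void process(const string& substring, ostream& os, int column)
{
    const int width[] = {10, 6, 3, 1, 2};
    if (column < static_cast<int>(sizeof(width) / sizeof(width[0])))
        os << left << setw(width[column]) << substring;
}

// Split line at the first ',' and recurse on the remainder.
void tokenise(const string& line, ostream& os, int column = 0)
{
    string::size_type comma = line.find(',');
    if (comma == string::npos) {
        process(line, os, column);                     // last field on the line
        return;
    }
    process(line.substr(0, comma), os, column);        // field up to the ','
    tokenise(line.substr(comma + 1), os, column + 1);  // the rest of the line
}

// Read every line, skip the header, write one record per data line.
void convert_all(istream& in, ostream& out)
{
    int lines_read = 0;
    string line;
    while (getline(in, line)) {
        lines_read++;
        if (lines_read > 1) {
            tokenise(line, out);
            out << '\n';
        }
    }
}
```

From main, with <fstream> included, you would open ifstream file1_op("c:\\temp\\file1.csv") and ofstream file2_op("c:\\temp\\file2.txt") and call convert_all(file1_op, file2_op).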
 

Author Comment

by:justinY
ID: 11999253
Sorry, I didn't make myself clear. Here is what I mean:
1. I want to skip the ID column.
2. I want to write Justin in the Name column. So in my output file, the Name column will show no john, mary, david or jane; only Justin.
 
LVL 17

Expert Comment

by:rstaveley
ID: 11999411
Are you sure you can't do this yourself?

To skip the first column, you could use a test like this:

    if (column == 0) // Skip 1st column...          

To force "Justin" in the second column, you could use a test like this:

     if (column == 1) // Force the value "Justin" when it is the 2nd column...
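Put together, those two tests might sit in process like this. A sketch only: it assumes the input columns arrive in the order ID, Name, Age, Location, Gender, matching the latest layout, and reuses the widths {10, 6, 3, 2, 1}.

```cpp
#include <iomanip>
#include <iostream>
#include <sstream>
#include <string>
using namespace std;

// process() with the two tests: skip column 0 and force "Justin" in column 1.
void process(const string& substring, ostream& os, int column)
{
    const int width[] = {10, 6, 3, 2, 1};
    if (column == 0)  // Skip 1st column (ID)
        return;
    // Force the value "Justin" when it is the 2nd column (Name)
    const string value = (column == 1) ? string("Justin") : substring;
    if (column < static_cast<int>(sizeof(width) / sizeof(width[0])))
        os << left << setw(width[column]) << value;
}
```

Skipping the ID this way also removes its 10 characters from the record; if positions 1 to 10 must stay in the layout as blanks, write os << setw(10) << "" for column 0 instead of returning.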
 

Author Comment

by:justinY
ID: 12000890
Thank you.
Quick question: is there any way to write the substring to a specified position (columns 3 to 10), rather than just padding it to a length (8)? For example, I read the substring between the second and third commas, and I want to write it at columns 3 to 10, assuming that range is long enough to hold the substring.
 
LVL 17

Expert Comment

by:rstaveley
ID: 12003651
You would need a record string that is long enough and filled with spaces, reset with string::assign for each new line. Then use string::replace to put each substring at the required position.
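A sketch of that approach, with a hypothetical put_field helper that takes 1-based start and end positions as in the question. It assumes the positions lie inside the record; a value longer than the field is cut off and a shorter one is padded with spaces, so replace always swaps like for like and the record length never changes.

```cpp
#include <string>
using namespace std;

// Place value into record so that it occupies 1-based columns first..last.
// Assumes 1 <= first <= last <= record.size().
void put_field(string& record, const string& value,
               string::size_type first, string::size_type last)
{
    const string::size_type len = last - first + 1;
    string padded = value.substr(0, len);    // cut to the field length
    padded.resize(len, ' ');                 // pad with spaces to the field length
    record.replace(first - 1, len, padded);  // overwrite exactly len characters
}
```

For each input line you would reset the record with record.assign(22, ' ') (22 being the total width of the layout), call put_field once per field, then write the record followed by '\n'.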
 

Author Comment

by:justinY
ID: 12006629
Thank you
Why don't you use the strtok() function? Don't you like it?
 
LVL 17

Accepted Solution

by:rstaveley (earned 500 total points)
ID: 12008185
strtok belongs in the world of character pointers and C. If you like strtok but want to embrace C++, you'd be better off looking at the tokenizer in the Boost libraries, or at using getline with a ',' delimiter on a stringstream. There is a plethora of better ways of tackling your problem than the approach I have outlined. I aimed to outline an approach which was educational and required minimal experience of libraries, including the standard library. As you hone your skills in C++, you'll find that you increasingly become a librarian rather than a coder, but it's a good idea to focus on coding while you are getting to grips with the language.
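For comparison, the getline-with-a-stringstream idea might look like this (tokenise_with_getline is a made-up name). One difference from the hand-written tokeniser: getline reports no empty field after a trailing comma, so a line such as john,1001,32,M,NY, gives five fields, not six.

```cpp
#include <sstream>
#include <string>
#include <vector>
using namespace std;

// Tokenise one line by reading from a stringstream with getline and a ','
// delimiter. No quote handling, like the hand-written tokeniser.
vector<string> tokenise_with_getline(const string& line)
{
    istringstream ss(line);
    vector<string> fields;
    string field;
    while (getline(ss, field, ','))
        fields.push_back(field);
    return fields;
}
```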

I confess that I've found it difficult to break the habit of resorting to C character arrays and pointers because they offer quick gratification, but it has been worth it. I spend less time debugging code as a consequence. Marshall Cline says that pointers and arrays are "evil" in his C++ FAQ. It doesn't hurt to think along his lines.