Solved

Parsing Text into Tab Delimited File

Posted on 2005-04-05
Last Modified: 2010-04-01

I have some text data from a legacy system (an old mainframe) that I am trying to convert into
a tab-delimited file for import into a relational database.  The goal is to read in a file
of type *.DAT and write out a tab-delimited *.TXT file.
I only have soft copies of the *.DAT files for input.  The data layout is static,
and each field is identified by its line number, :#:.  I would like to parse these
files, but I am not sure how to go about it.  The file structure looks like this:
A5

A5543645674645646446
      :01:KI
:02:AMERA123456C897
:03:
:04:
:05:
:10:BIRDCAGE
:12:A50212USD1234,89
:74:LONG STRING HERE
:113:B
:245:123456XIX1234
-
A62354424334242423234
      :01:KI
:02:EURO123456C897
:03:
:04:
:05:
:06:
:10:BIRDCAGE
:12:A50212USD2345,89
:74:LONG STRING HERE
:113:B
:245:123456XIX1235
-

The values that I would like to import into SQL Server are only :02:, :10:, :12:, :74:,
:113: and :245:.  However, in some cases these values will be null, and
sometimes those lines and line numbers will not exist at all.  When parsing into
tab-delimited format, I want the output to look like this:


:02:      :10:      :12:      :74:      :113:      :245:
Value      Value      Value      Value      Value      Value
Value      Value      Null      Value      Value      Value

So that each line number is a column, and the values for those columns are the strings
next to those values.  If the line number does not exist, or there is no data in the line
number, the value will be null.

As you can see, there are multiple records in one file, and there is not always
the same number of records in each file.

I have never done text parsing before and need a lot of help on this one; any code,
suggestions, or pointers in the right direction will be most helpful.

Thank You!
Question by:superfly18
13 Comments
 
LVL 86

Expert Comment

by:jkr
ID: 13710924
This should be easy using maps and vectors, e.g.:

#include <vector>
#include <map>
#include <fstream>
#include <string>
using namespace std;

map<string,vector<string> > mapColsToData;

vector<string> data_02;
vector<string> data_10;
vector<string> data_12;
vector<string> data_74;
vector<string> data_113;
vector<string> data_245;

mapColsToData.insert(map<string,vector<string>& >::value_type(":02:",data_02));
mapColsToData.insert(map<string,vector<string>& >::value_type(":10:",data_10));
mapColsToData.insert(map<string,vector<string>& >::value_type(":12:",data_12));
mapColsToData.insert(map<string,vector<string>& >::value_type(":74:",data_74));
mapColsToData.insert(map<string,vector<string>& >::value_type(":113:",data_113));
mapColsToData.insert(map<string,vector<string>& >::value_type(":254:",data_254));

ifstream is ("inputfile");
string strLine;
string strCol;
string strData;
int nPos;

while(!is.eof()) {

    getline(is,strLine);

    // locate 2nd ':' (if any)

    nPos = strLine.find(':');

    if ( 0 > nPos)  nPos = strLine.find(':', nPos) else continue;

    strCol = strLine.substr(0,nPos);
    strData = strLine.substr(nPos + 1);

    // find the vector for the column name
    map<string,vector<string> >::iterator i = mapColsToData.find(strCol);

    if ( i == mapColsToData.end()) continue;

    // insert data to vector, empty fields will be empty strings
    i->second.push_back(strData);
}

// now, output that to a file

ofstream os ("outputfile.txt");
size_t szMax = 0;
map<string,vector<string> >::iterator i;

for ( i = mapColsToData.begin(); i != mapColsToData.end(); ++i) {

    size_t sz = i->second.size();

    if ( sz > szMax) szMax = sz;
}


os << ":02:     :10:     :12:     :74:     :113:     :245:"


for ( int n = 0; n < mapColsToData.size(); ++n) {

    for ( i = mapColsToData.begin(); i != mapColsToData.end(); ++i) {

        string s = "NULL";

        if ( n <= i->second.size()) s = i->second[n];

        os << s << "\t";
    }

    os << endl;
}
 
LVL 1

Author Comment

by:superfly18
ID: 13712919
Thanks for the quick help.  I spent some time trying to troubleshoot myself, but have to admit I am a bit stuck.  I am getting the following compiler error:

        F:\parse.cpp(16) : see reference to class template instantiation 'std::map<std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::vector<std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::allocator<s

Please let me know how I can fix this

Thanks!
 
LVL 86

Expert Comment

by:jkr
ID: 13712993
Well, that was meant more like an idea than a solution to work by copy&paste :o)

What exactly is line 16 in your code?
 
LVL 1

Author Comment

by:superfly18
ID: 13713097
Agreed.  However, I have never worked with vectors before, and all of the errors seem to stem from these lines:
mapColsToData.insert(map<string,vector<string>& >::value_type(":12:",data_12));
 
LVL 86

Expert Comment

by:jkr
ID: 13713124
Um, sorry that I forgot to mention this before, but the error message you posted seems to be lacking some info...

And, just to add that, the above lines should somehow be embedded into functions or methods also.
 
LVL 1

Author Comment

by:superfly18
ID: 13713179
They are in my main function.  I just have a few more errors to clean up, and they all have to do with those lines.  Here is the complete error message.

F:\parse.cpp(26) : error C2665: 'pair<class std::basic_string<char,struct std::char_traits<char>,class std::allocator<char> > const ,class std::vector<class std::basic_string<char,struct std::char_traits<char>,class std::allocator<char> >,class std:
:allocator<class std::basic_string<char,struct std::char_traits<char>,class std::allocator<char> > > > &>::pair<class std::basic_string<char,struct std::char_traits<char>,class std::allocator<char> > const ,class std::vector<class std::basic_string<
char,struct std::char_traits<char>,class std::allocator<char> >,class std::allocator<class std::basic_string<char,struct std::char_traits<char>,class std::allocator<char> > > > &>' : none of the 2 overloads can convert parameter 2 from type 'class s
td::vector<class std::basic_string<char,struct std::char_traits<char>,class std::allocator<char> >,class std::allocator<class std::basic_string<char,struct std::char_traits<char>,class std::allocator<char> > > >'

Thank You for your help thus far.  If you can help me fix these last few lines I will be HAPPY!!!

Thanks!
 
LVL 86

Expert Comment

by:jkr
ID: 13713200
Hmm, let's see - the following

mapColsToData.insert(map<string,vector<string> >::value_type(":02:",data_02));
mapColsToData.insert(map<string,vector<string> >::value_type(":10:",data_10));
mapColsToData.insert(map<string,vector<string> >::value_type(":12:",data_12));
mapColsToData.insert(map<string,vector<string> >::value_type(":74:",data_74));
mapColsToData.insert(map<string,vector<string> >::value_type(":113:",data_113));
mapColsToData.insert(map<string,vector<string> >::value_type(":254:",data_254));

should help...
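
For anyone hitting the same C2665 error: standard containers cannot hold references, so `vector<string>&` is not usable as the mapped type. With the plain `vector<string>` version the map stores its own copy of each vector, and the original `data_xx` variables are never touched again. A minimal sketch of what that means in practice (the variable names mirror the ones above; the function name `mapHoldsItsOwnCopy` and the pushed value are only for illustration):

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Returns true when the caller-side vector is still empty after a value has
// been pushed through the map, i.e. when the map holds an independent copy.
bool mapHoldsItsOwnCopy() {
    std::vector<std::string> data_12;
    std::map<std::string, std::vector<std::string> > mapColsToData;

    // insert() copies data_12 into the map; the two vectors are unrelated afterwards
    mapColsToData.insert(std::make_pair(std::string(":12:"), data_12));

    // this appends to the copy owned by the map, not to data_12
    mapColsToData[":12:"].push_back("A50212USD1234,89");
    assert(mapColsToData[":12:"].size() == 1);

    // data_12 is still empty, so calling data_12.front() would be undefined behaviour
    return data_12.empty();
}
```

So any later code should read the values through the map (for example `mapColsToData[":12:"]`) rather than through the standalone vectors.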
 
LVL 86

Expert Comment

by:jkr
ID: 13713207
BTW, lacking the input file(s), I can't test that code, but

#include <vector>
#include <map>
#include <fstream>
#include <string>
using namespace std;

int main () {

map<string,vector<string> > mapColsToData;

vector<string> data_02;
vector<string> data_10;
vector<string> data_12;
vector<string> data_74;
vector<string> data_113;
vector<string> data_245;

mapColsToData.insert(map<string,vector<string> >::value_type(":02:",data_02));
mapColsToData.insert(map<string,vector<string> >::value_type(":10:",data_10));
mapColsToData.insert(map<string,vector<string> >::value_type(":12:",data_12));
mapColsToData.insert(map<string,vector<string> >::value_type(":74:",data_74));
mapColsToData.insert(map<string,vector<string> >::value_type(":113:",data_113));
mapColsToData.insert(map<string,vector<string> >::value_type(":254:",data_245));

ifstream is ("inputfile");
string strLine;
string strCol;
string strData;
int nPos;

while(!is.eof()) {

   getline(is,strLine);

   // locate 2nd ':' (if any)

   nPos = strLine.find(':');

   if ( 0 > nPos)  nPos = strLine.find(':', nPos) ;else continue;

   strCol = strLine.substr(0,nPos);
   strData = strLine.substr(nPos + 1);

   // find the vector for the column name
   map<string,vector<string> >::iterator i = mapColsToData.find(strCol);

   if ( i == mapColsToData.end()) continue;

   // insert data to vector, empty fields will be empty strings
   i->second.push_back(strData);
}

// now, output that to a file

ofstream os ("outputfile.txt");
size_t szMax = 0;
map<string,vector<string> >::iterator i;

for ( i = mapColsToData.begin(); i != mapColsToData.end(); ++i) {

   size_t sz = i->second.size();

   if ( sz > szMax) szMax = sz;
}


os << ":02:     :10:     :12:     :74:     :113:     :245:";


for ( int n = 0; n < mapColsToData.size(); ++n) {

   for ( i = mapColsToData.begin(); i != mapColsToData.end(); ++i) {

       string s = "NULL";

       if ( n <= i->second.size()) s = i->second[n];

       os << s << "\t";
   }

   os << endl;
}

return 0;
}

compiles. The warning about "warning C4503: XYZ decorated name length exceeded, name was truncated" can safely be ignored.
 
LVL 1

Author Comment

by:superfly18
ID: 13713469
Almost got it: it creates the output file, but then gives me an error after execution.  I switched compilers to the Borland BCC32 compiler.

Here is some sanitized data.  Please let me know what's wrong!  I have learned a lot going through this, and I thank you for all the time and effort you have put in thus far.  This has been VERY helpful!

A5021100017890876543                                                                           :01:TE
:01:Test
:02:Test
:03:Test
:10:Test
:21:.
:32A:Test
:74:Test
123 Test Street
Testing
Testing
:51:Test
:52:Test
Test Test Test
:58:Test Data
Test Data
1234 Test
:113:T
:65:Canine
:73:Address
:80:Where is this
Address Shipping
:61:Test
:74:Test
:254:Test
:109:Test
:112:Test
:113:Test
 
LVL 1

Author Comment

by:superfly18
ID: 13717732
While trying to troubleshoot the code, I noticed that the output file is created but has no values in it.  So I added the line below and commented out the output file portion.

I can cout strLine, strData and strCol; however, when I add the following line:

cout << "First element: " << data_12.front() << endl;

The code compiles correctly, and I get the same memory errors I got before when running the program.  

Please let me know what could be causing this error.  

Thanks,
MA
 
LVL 86

Accepted Solution

by:
jkr earned 2000 total points
ID: 13718585
There were still some bugs in the code.

#include <vector>
#include <map>
#include <fstream>
#include <string>
#include <iostream>
using namespace std;

int main () {

map<string,vector<string> > mapColsToData;

vector<string> data_02;
vector<string> data_10;
vector<string> data_12;
vector<string> data_74;
vector<string> data_113;
vector<string> data_245;

mapColsToData.insert(map<string,vector<string> >::value_type(":02:",data_02));
mapColsToData.insert(map<string,vector<string> >::value_type(":10:",data_10));
mapColsToData.insert(map<string,vector<string> >::value_type(":12:",data_12));
mapColsToData.insert(map<string,vector<string> >::value_type(":74:",data_74));
mapColsToData.insert(map<string,vector<string> >::value_type(":113:",data_113));
mapColsToData.insert(map<string,vector<string> >::value_type(":254:",data_245));

ifstream is ("inputfile");
string strLine;
string strCol;
string strData;
int nPos1;
int nPos2;

//__asm { int 3};

while(!is.eof()) {

  getline(is,strLine);

//  cout << strLine << endl;

  // locate 2nd ':' (if any)

  nPos1 = strLine.find(':');

  if ( -1 != nPos1)  {

    nPos2 = strLine.find(':', nPos1 + 1);

    if ( -1 == nPos2) continue;
  }
  else continue;

  cout << nPos1 << " " << nPos2 << endl;

  strCol = strLine.substr(nPos1,nPos2 - nPos1 + 1);
  strData = strLine.substr(nPos2 + 1);

  cout << "Col: " << strCol << "\t" << "Data " <<strData << endl;

  // find the vector for the column name
  map<string,vector<string> >::iterator i = mapColsToData.find(strCol);

  if ( i == mapColsToData.end()) continue;

  // insert data to vector, empty fields will be empty strings
  i->second.push_back(strData);
}

// now, output that to a file

ofstream os ("outputfile.txt");
size_t szMax = 0;
map<string,vector<string> >::iterator i;

for ( i = mapColsToData.begin(); i != mapColsToData.end(); ++i) {

  size_t sz = i->second.size();

  cout << i->first << " " << sz << endl;

  if ( sz > szMax) szMax = sz;
}


os << ":02:     :10:     :12:     :74:     :113:     :245:" << endl;


for ( int n = 0; n < mapColsToData.size(); ++n) {

  for ( i = mapColsToData.begin(); i != mapColsToData.end(); ++i) {

      string s = "NULL";

      if ( n < i->second.size()) s = i->second[n];

      os << s << "\t";
  }

  os << endl;
}

return 0;
}

creates

:02:     :10:     :12:     :74:     :113:     :245:
Test      Test      T      NULL      Test      Test      
NULL      NULL      Test      NULL      NULL      Test      
NULL      NULL      NULL      NULL      NULL      NULL      
NULL      NULL      NULL      NULL      NULL      NULL      
NULL      NULL      NULL      NULL      NULL      NULL      
NULL      NULL      NULL      NULL      NULL      NULL      


with the above input.
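
One thing to watch in that output: `std::map` iterates its keys in sorted order, and compared as strings `":113:"` sorts before `":12:"` and `":254:"` before `":74:"`. The values are therefore written in the order :02:, :10:, :113:, :12:, :254:, :74:, which does not match the hard-coded header line (the `T` in the third column comes from `:113:T`). A small sketch that returns the iteration order for the same six keys (the function name `columnOrder` is my own):

```cpp
#include <map>
#include <string>
#include <vector>

// Builds a map with the same keys as the code above and reports the order
// in which a begin()..end() loop visits them.
std::vector<std::string> columnOrder() {
    std::map<std::string, std::vector<std::string> > mapColsToData;
    const char* tags[] = { ":02:", ":10:", ":12:", ":74:", ":113:", ":254:" };
    for (int k = 0; k < 6; ++k)
        mapColsToData[tags[k]];  // operator[] creates an empty vector for the key

    std::vector<std::string> order;
    std::map<std::string, std::vector<std::string> >::const_iterator i;
    for (i = mapColsToData.begin(); i != mapColsToData.end(); ++i)
        order.push_back(i->first);
    return order;  // sorted by key, not by insertion order
}
```

If the column order matters, one option is to loop over a fixed array of tags and look each one up with `find()`, or to generate the header from the map keys so that it always agrees with the data.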
 
LVL 1

Author Comment

by:superfly18
ID: 13719429
Thanks again for all your help on this....One last question....For some reason it always stops at 6 rows....how can I fix this???

Thanks!
 
LVL 86

Expert Comment

by:jkr
ID: 13720029
Argh, another bug :o)

for ( int n = 0; n < mapColsToData.size(); ++n) {

should be

for ( int n = 0; n < szMax; ++n) {