  • Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 208
  • Last Modified:

Parsing Text into Tab Delimited File


I have some text data from a legacy system (an old mainframe) which I am trying to convert into
a tab-delimited file for input into a relational database.  The goal is to read in a file
of type *.DAT and then output a tab-delimited *.TXT file.
I only have soft copies of the *.DAT files for input.  The data output is static,
and certain fields are specified by their line number, :#:.  I would like to parse these
files, but I am not sure how to go about doing it.  The file structure looks like this:
A5

A5543645674645646446
      :01:KI
:02:AMERA123456C897
:03:
:04:
:05:
:10:BIRDCAGE
:12:A50212USD1234,89
:74:LONG STRING HERE
:113:B
:245:123456XIX1234
-
A62354424334242423234
      :01:KI
:02:EURO123456C897
:03:
:04:
:05:
:06:
:10:BIRDCAGE
:12:A50212USD2345,89
:74:LONG STRING HERE
:113:B
:245:123456XIX1235
-

The values that I would like to import into SQL Server are only :02:, :10:, :12:, :74:,
:113:, and :245:. However, in some cases these values will be null, and
sometimes those lines and line numbers will not exist at all.  When parsing into
tab-delimited format, I want the output to look like this:


:02:      :10:      :12:      :74:      :113:      :245:
Value      Value      Value      Value      Value      Value
Value      Value      Null      Value      Value      Value

So that each line number is a column, and the values for those columns are the strings
next to those values.  If the line number does not exist, or there is no data in the line
number, the value will be null.

As you can see, there are multiple records in one file, and there is not always
the same number of records in each file.

I have never done text parsing before and need a lot of help on this one; any code,
suggestions, or pointers in the right direction will be most helpful.

Thank You!
superfly18
Asked:
1 Solution
 
jkr Commented:
This should be easy using maps and vectors, e.g.:

#include <vector>
#include <map>
#include <fstream>
#include <string>
using namespace std;

map<string,vector<string> > mapColsToData;

vector<string> data_02;
vector<string> data_10;
vector<string> data_12;
vector<string> data_74;
vector<string> data_113;
vector<string> data_245;

mapColsToData.insert(map<string,vector<string>& >::value_type(":02:",data_02));
mapColsToData.insert(map<string,vector<string>& >::value_type(":10:",data_10));
mapColsToData.insert(map<string,vector<string>& >::value_type(":12:",data_12));
mapColsToData.insert(map<string,vector<string>& >::value_type(":74:",data_74));
mapColsToData.insert(map<string,vector<string>& >::value_type(":113:",data_113));
mapColsToData.insert(map<string,vector<string>& >::value_type(":254:",data_254));

ifstream is ("inputfile");
string strLine;
string strCol;
string strData;
int nPos;

while(!is.eof()) {

    getline(is,strLine);

    // locate 2nd ':' (if any)

    nPos = strLine.find(':');

    if ( 0 > nPos)  nPos = strLine.find(':', nPos) else continue;

    strCol = strLine.substr(0,nPos);
    strData = strLine.substr(nPos + 1);

    // find the vector for the column name
    map<string,vector<string> >::iterator i = mapColsToData.find(strCol);

    if ( i == mapColsToData.end()) continue;

    // insert data to vector, empty fields will be empty strings
    i->second.push_back(strData);
}

// now, output that to a file

ofstream os ("outputfile.txt");
size_t szMax = 0;
map<string,vector<string> >::iterator i;

for ( i = mapColsToData.begin(); i != mapColsToData.end(); ++i) {

    size_t sz = i->second.size();

    if ( sz > szMax) szMax = sz;
}


os << ":02:     :10:     :12:     :74:     :113:     :245:"


for ( int n = 0; n < mapColsToData.size(); ++n) {

    for ( i = mapColsToData.begin(); i != mapColsToData.end(); ++i) {

        string s = "NULL";

        if ( n <= i->second.size()) s = i->second[n];

        os << s << "\t";
    }

    os << endl;
}
 
superfly18 (Author) Commented:
Thanks for the quick help.  I spent some time trying to troubleshoot myself, but have to admit I am a bit stuck.  I am getting the following compiler error:

        F:\parse.cpp(16) : see reference to class template instantiation 'std::map<std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::vector<std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::allocator<s

Please let me know how I can fix this

Thanks!
 
jkr Commented:
Well, that was meant more as an idea than as a copy-and-paste solution :o)

What exactly is line 16 in your code?
 
superfly18 (Author) Commented:
Agreed.  However, I have never worked with vectors before, and all of the errors seem to stem from these lines:
mapColsToData.insert(map<string,vector<string>& >::value_type(":12:",data_12));
 
jkr Commented:
Um, sorry that I forgot to mention this before, but the error message you posted seems to be lacking some info...

And, just to add: the lines above also need to be placed inside a function or method.
 
superfly18 (Author) Commented:
They are in my main function.  I just have a few more errors to clean up, and they all have to do with those lines.  Here is the complete error message.

F:\parse.cpp(26) : error C2665: 'pair<class std::basic_string<char,struct std::char_traits<char>,class std::allocator<char> > const ,class std::vector<class std::basic_string<char,struct std::char_traits<char>,class std::allocator<char> >,class std::allocator<class std::basic_string<char,struct std::char_traits<char>,class std::allocator<char> > > > &>::pair<class std::basic_string<char,struct std::char_traits<char>,class std::allocator<char> > const ,class std::vector<class std::basic_string<char,struct std::char_traits<char>,class std::allocator<char> >,class std::allocator<class std::basic_string<char,struct std::char_traits<char>,class std::allocator<char> > > > &>' : none of the 2 overloads can convert parameter 2 from type 'class std::vector<class std::basic_string<char,struct std::char_traits<char>,class std::allocator<char> >,class std::allocator<class std::basic_string<char,struct std::char_traits<char>,class std::allocator<char> > > >'

Thank You for your help thus far.  If you can help me fix these last few lines I will be HAPPY!!!

Thanks!
 
jkr Commented:
Hmm, let's see - the following

mapColsToData.insert(map<string,vector<string> >::value_type(":02:",data_02));
mapColsToData.insert(map<string,vector<string> >::value_type(":10:",data_10));
mapColsToData.insert(map<string,vector<string> >::value_type(":12:",data_12));
mapColsToData.insert(map<string,vector<string> >::value_type(":74:",data_74));
mapColsToData.insert(map<string,vector<string> >::value_type(":113:",data_113));
mapColsToData.insert(map<string,vector<string> >::value_type(":254:",data_245));

should help...
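
A note on why dropping the `&` matters: the map was declared as `map<string,vector<string> >`, so its `value_type` is `pair<const string, vector<string> >`. The version with `&` names the `value_type` of a different map type, which is why the compiler complains about the pair. Below is a minimal sketch of the same setup, assuming the six tags from the question; the helper name `makeColumns` and the `ColumnMap` typedef are hypothetical and not part of the code in this thread.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

typedef std::map<std::string, std::vector<std::string> > ColumnMap;

// Hypothetical helper: creates one empty column per wanted tag.
ColumnMap makeColumns() {
    const char* tags[] = { ":02:", ":10:", ":12:", ":74:", ":113:", ":245:" };
    ColumnMap columns;
    for (std::size_t k = 0; k < sizeof(tags) / sizeof(tags[0]); ++k) {
        // value_type here is pair<const string, vector<string> >, with no '&'.
        columns.insert(ColumnMap::value_type(tags[k], std::vector<std::string>()));
    }
    return columns;
}
```

Because the map owns its vectors, the separate `data_02` ... `data_245` variables are not needed in this form.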
 
jkr Commented:
BTW, lacking the input file(s), I can't test that code, but

#include <vector>
#include <map>
#include <fstream>
#include <string>
using namespace std;

int main () {

map<string,vector<string> > mapColsToData;

vector<string> data_02;
vector<string> data_10;
vector<string> data_12;
vector<string> data_74;
vector<string> data_113;
vector<string> data_245;

mapColsToData.insert(map<string,vector<string> >::value_type(":02:",data_02));
mapColsToData.insert(map<string,vector<string> >::value_type(":10:",data_10));
mapColsToData.insert(map<string,vector<string> >::value_type(":12:",data_12));
mapColsToData.insert(map<string,vector<string> >::value_type(":74:",data_74));
mapColsToData.insert(map<string,vector<string> >::value_type(":113:",data_113));
mapColsToData.insert(map<string,vector<string> >::value_type(":254:",data_245));

ifstream is ("inputfile");
string strLine;
string strCol;
string strData;
int nPos;

while(!is.eof()) {

   getline(is,strLine);

   // locate 2nd ':' (if any)

   nPos = strLine.find(':');

   if ( 0 > nPos)  nPos = strLine.find(':', nPos) ;else continue;

   strCol = strLine.substr(0,nPos);
   strData = strLine.substr(nPos + 1);

   // find the vector for the column name
   map<string,vector<string> >::iterator i = mapColsToData.find(strCol);

   if ( i == mapColsToData.end()) continue;

   // insert data to vector, empty fields will be empty strings
   i->second.push_back(strData);
}

// now, output that to a file

ofstream os ("outputfile.txt");
size_t szMax = 0;
map<string,vector<string> >::iterator i;

for ( i = mapColsToData.begin(); i != mapColsToData.end(); ++i) {

   size_t sz = i->second.size();

   if ( sz > szMax) szMax = sz;
}


os << ":02:     :10:     :12:     :74:     :113:     :245:";


for ( int n = 0; n < mapColsToData.size(); ++n) {

   for ( i = mapColsToData.begin(); i != mapColsToData.end(); ++i) {

       string s = "NULL";

       if ( n <= i->second.size()) s = i->second[n];

       os << s << "\t";
   }

   os << endl;
}

return 0;
}

compiles. The warning about "warning C4503: XYZ decorated name length exceeded, name was truncated" can safely be ignored.
 
superfly18 (Author) Commented:
Almost got it: it creates the output file, but then gives me an error after execution.  I switched compilers to the Borland BCC32 compiler.  

Here is some sanitized data.  Please let me know what's wrong!  I have learned a lot going through this, and I thank you for all the time and effort you have put in thus far.  This has been VERY helpful!

A5021100017890876543                                                                           :01:TE
:01:Test
:02:Test
:03:Test
:10:Test
:21:.
:32A:Test
:74:Test
123 Test Street
Testing
Testing
:51:Test
:52:Test
Test Test Test
:58:Test Data
Test Data
1234 Test
:113:T
:65:Canine
:73:Address
:80:Where is this
Address Shipping
:61:Test
:74:Test
:254:Test
:109:Test
:112:Test
:113:Test
 
superfly18 (Author) Commented:
Trying to troubleshoot the code, I noticed that the output file is created but has no values in it.  So I added the line below and commented out the output file portion.

I can cout strLine, strData, and strCol; however, when I add the following line:

cout << "First element: " << data_12.front() << endl;

The code compiles correctly, but I get the same memory errors I got before when running the program.  

Please let me know what could be causing this error.  

Thanks,
MA
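
One likely cause of that crash, offered as a guess since the failing build is not shown: `insert` copies each vector into the map, so `data_12` itself never receives any elements, and calling `front()` on an empty vector is undefined behaviour. The sample data above has no `:12:` line either, so even the copy inside the map would be empty. A small sketch of a guarded lookup follows; `firstOrNull` and `ColumnMap` are hypothetical names.

```cpp
#include <map>
#include <string>
#include <vector>

typedef std::map<std::string, std::vector<std::string> > ColumnMap;

// Hypothetical helper: returns the first value stored for a tag, or "NULL"
// when the tag is unknown or its column is empty.
std::string firstOrNull(const ColumnMap& columns, const std::string& tag) {
    ColumnMap::const_iterator it = columns.find(tag);
    if (it == columns.end() || it->second.empty()) return "NULL";
    return it->second.front();  // safe: the vector is known to be non-empty
}
```

To inspect the parsed data, look at the vector inside the map (for example `mapColsToData[":12:"]`) rather than at the original `data_12` variable.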
 
jkr Commented:
There were still some bugs in the code.

#include <vector>
#include <map>
#include <fstream>
#include <string>
#include <iostream>
using namespace std;

int main () {

map<string,vector<string> > mapColsToData;

vector<string> data_02;
vector<string> data_10;
vector<string> data_12;
vector<string> data_74;
vector<string> data_113;
vector<string> data_245;

mapColsToData.insert(map<string,vector<string> >::value_type(":02:",data_02));
mapColsToData.insert(map<string,vector<string> >::value_type(":10:",data_10));
mapColsToData.insert(map<string,vector<string> >::value_type(":12:",data_12));
mapColsToData.insert(map<string,vector<string> >::value_type(":74:",data_74));
mapColsToData.insert(map<string,vector<string> >::value_type(":113:",data_113));
mapColsToData.insert(map<string,vector<string> >::value_type(":254:",data_245));

ifstream is ("inputfile");
string strLine;
string strCol;
string strData;
int nPos1;
int nPos2;

//__asm { int 3};

while(!is.eof()) {

  getline(is,strLine);

//  cout << strLine << endl;

  // locate 2nd ':' (if any)

  nPos1 = strLine.find(':');

  if ( -1 != nPos1)  {

    nPos2 = strLine.find(':', nPos1 + 1);

    if ( -1 == nPos2) continue;
  }
  else continue;

  cout << nPos1 << " " << nPos2 << endl;

  strCol = strLine.substr(nPos1,nPos2 - nPos1 + 1);
  strData = strLine.substr(nPos2 + 1);

  cout << "Col: " << strCol << "\t" << "Data " <<strData << endl;

  // find the vector for the column name
  map<string,vector<string> >::iterator i = mapColsToData.find(strCol);

  if ( i == mapColsToData.end()) continue;

  // insert data to vector, empty fields will be empty strings
  i->second.push_back(strData);
}

// now, output that to a file

ofstream os ("outputfile.txt");
size_t szMax = 0;
map<string,vector<string> >::iterator i;

for ( i = mapColsToData.begin(); i != mapColsToData.end(); ++i) {

  size_t sz = i->second.size();

  cout << i->first << " " << sz << endl;

  if ( sz > szMax) szMax = sz;
}


os << ":02:     :10:     :12:     :74:     :113:     :245:" << endl;


for ( int n = 0; n < mapColsToData.size(); ++n) {

  for ( i = mapColsToData.begin(); i != mapColsToData.end(); ++i) {

      string s = "NULL";

      if ( n < i->second.size()) s = i->second[n];

      os << s << "\t";
  }

  os << endl;
}

return 0;
}

creates

:02:     :10:     :12:     :74:     :113:     :245:
Test      Test      T      NULL      Test      Test      
NULL      NULL      Test      NULL      NULL      Test      
NULL      NULL      NULL      NULL      NULL      NULL      
NULL      NULL      NULL      NULL      NULL      NULL      
NULL      NULL      NULL      NULL      NULL      NULL      
NULL      NULL      NULL      NULL      NULL      NULL      


with the above input.
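
Note that the columns in this output do not line up with the header. `std::map` iterates its keys in sorted order, and as strings the six keys sort as `:02:`, `:10:`, `:113:`, `:12:`, `:254:`, `:74:`. So the `T` in the third column is the first `:113:` value, and the last column holds the `:74:` values. One way around this, sketched below with hypothetical helper names, is to keep the wanted tags in a separate list and look each one up in that order.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

typedef std::map<std::string, std::vector<std::string> > ColumnMap;

// Hypothetical helper: lists the map's keys in the order iteration visits them.
std::vector<std::string> keysInIterationOrder(const ColumnMap& columns) {
    std::vector<std::string> keys;
    for (ColumnMap::const_iterator it = columns.begin(); it != columns.end(); ++it) {
        keys.push_back(it->first);
    }
    return keys;
}

// Hypothetical helper: builds one tab-separated row in the caller's tag order,
// writing "NULL" where a column has no value for that row.
std::string rowInOrder(const ColumnMap& columns,
                       const std::vector<std::string>& order, std::size_t row) {
    std::string line;
    for (std::size_t k = 0; k < order.size(); ++k) {
        if (k > 0) line += '\t';
        ColumnMap::const_iterator it = columns.find(order[k]);
        bool present = it != columns.end() && row < it->second.size();
        line += present ? it->second[row] : std::string("NULL");
    }
    return line;
}
```

Writing the tab between fields rather than after each one also avoids the trailing tab at the end of every row.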
 
superfly18 (Author) Commented:
Thanks again for all your help on this. One last question: for some reason it always stops at 6 rows. How can I fix this?

Thanks!
 
jkr Commented:
Argh, another bug :o)

for ( int n = 0; n < mapColsToData.size(); ++n) {

should be

for ( int n = 0; n < szMax; ++n) {