Solved

Parsing Text Using C

Posted on 2011-03-01
Last Modified: 2012-05-11
I'm looking to create a (hopefully simple) program in C that looks in the folder C:/test, goes through all of the text files in that folder, and for each one parses the data from the file into a Word document.

I'm pretty rough on C programming, any ideas? Also, what's the best free compiler?

For now I just want to get the concept started; I'll ask additional questions down the line about how to properly parse the specifics I'm interested in.

Thanks!
Question by:PGRBryant
18 Comments
 
LVL 86

Assisted Solution

by:jkr
jkr earned 125 total points
ID: 35009750
Well, listing text files in a directory is the easier part, that could be done like the following (see also http://msdn.microsoft.com/en-us/library/aa365200%28VS.85%29.aspx - "Listing the Files in a Directory"):
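#include <windows.h>
#include <tchar.h>
#include <stdio.h>

int main () {

	WIN32_FIND_DATA fd;

	// FindFirstFile returns INVALID_HANDLE_VALUE (not NULL) on failure
	HANDLE hFind = FindFirstFile(_T("c:\\path\\*.txt"),&fd);

	if (hFind == INVALID_HANDLE_VALUE) return 1;

	do {

		_tprintf(_T("File: %s\n"),fd.cFileName);

	} while (FindNextFile(hFind,&fd));

	// release the search handle once all matches have been listed
	FindClose(hFind);
	return 0;
}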

However - how would you want these files to be added to a Word document?
 
LVL 4

Assisted Solution

by:parnasso
parnasso earned 125 total points
ID: 35010014
Once you have listed the files within a directory, you can create a new doc file and add content paragraphs to it with COM Word Automation in C++.

There is a very nice example of COM Word automation in the following link http://1code.codeplex.com/releases/view/59632#DownloadId=201236

The sample creates a new doc file and adds some paragraphs to it.

Hope this suits your needs
 
LVL 34

Assisted Solution

by:sarabande
sarabande earned 250 total points
ID: 35010233
visual studio 2010 express is a good and free C and C++ compiler.

you could also use the cygwin or mingw compilers/ides, which already include dirent.h. dirent.h is a portable means to retrieve a list of text files from a folder.
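As a rough illustration of the dirent.h route, here is a minimal sketch that collects the names of the .txt entries in a folder. The function names `ends_with` and `list_text_files` are made up for this example, the match is case-sensitive, and the order of the returned names is whatever the file system reports.

```cpp
#include <dirent.h>   // POSIX; also shipped with MinGW and Cygwin
#include <string>
#include <vector>

// Returns true if `name` ends with `suffix` (case-sensitive).
static bool ends_with(const std::string& name, const std::string& suffix)
{
    return name.size() >= suffix.size() &&
           name.compare(name.size() - suffix.size(), suffix.size(), suffix) == 0;
}

// Collects the names of all entries in `folder` that end in ".txt".
// Returns an empty vector if the folder cannot be opened.
std::vector<std::string> list_text_files(const std::string& folder)
{
    std::vector<std::string> names;
    DIR* dir = opendir(folder.c_str());
    if (!dir)
        return names;

    // readdir returns a null pointer once all entries have been visited
    while (struct dirent* entry = readdir(dir)) {
        std::string name = entry->d_name;
        if (ends_with(name, ".txt"))
            names.push_back(name);
    }
    closedir(dir);
    return names;
}
```

Calling `list_text_files("c:/test")` would then give the file names to open one by one; the names come back without the folder prefix, so it has to be prepended before opening.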

the word document can be done via com and automation like parnasso said, but that is probably not the easiest way. you might think of putting the data into an intermediate database and getting the data from there into word/excel, or writing an rtf file from your data, which can then more easily be converted to a word document.
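To give an idea of the rtf route, the following is a minimal sketch that builds a very small RTF document with one paragraph per string. The names `rtf_escape` and `make_rtf` and the choice of Arial are assumptions for illustration only, and the sketch handles plain ASCII text; other characters would need additional RTF escapes.

```cpp
#include <string>
#include <vector>

// Escapes the three characters that have special meaning in RTF.
std::string rtf_escape(const std::string& text)
{
    std::string out;
    for (std::string::size_type i = 0; i < text.size(); ++i) {
        char c = text[i];
        if (c == '\\' || c == '{' || c == '}')
            out += '\\';
        out += c;
    }
    return out;
}

// Builds a minimal RTF document with one paragraph per input string.
std::string make_rtf(const std::vector<std::string>& paragraphs)
{
    // header: RTF version 1, ANSI character set, default font 0 = Arial
    std::string doc = "{\\rtf1\\ansi\\deff0{\\fonttbl{\\f0 Arial;}}\n";
    for (std::vector<std::string>::size_type i = 0; i < paragraphs.size(); ++i)
        doc += rtf_escape(paragraphs[i]) + "\\par\n";
    doc += "}\n";
    return doc;
}
```

Writing the returned string to a file with an .rtf extension (for example through std::ofstream) gives a document that Word can open and save as .doc.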

Sara
 
LVL 11

Expert Comment

by:DeepuAbrahamK
ID: 35017492
To convert a text file into a Word document, try this:


char szOrigFilename[MAX_PATH];
char szNewFilename[MAX_PATH];

strcpy(szOrigFilename,"c:\\test\\text.txt");
strcpy(szNewFilename,"c:\\test\\text.doc");

// Note: this only renames the file; the contents stay plain text, which Word can still open
BOOL bRename = MoveFileA(szOrigFilename,szNewFilename);

 
LVL 1

Author Comment

by:PGRBryant
ID: 35023476
Okay Sarabande, the point of this process is to parse out tidbits of information that are stored in the text file and then automatically convert them and display them as needed.

I've attached a really silly .txt file that has basically random information.

But let's say that I wanted to pull out the "1" the "#" and the "Tom".

And place those into a word document, how would I do that?

See, down the line I'm actually going to be parsing the data from a series of text files and pasting them into another program. But in order to get there I'm trying to start with something relatively simple.
Test.txt
 
LVL 1

Author Comment

by:PGRBryant
ID: 35023503
I didn't mean to exclude the names of you other experts, my apologies.

@jkr: I'm not sure what you're doing here? You're looking through a folder for all the .txt files and then you're printing them to what?

Let's say the txt files are all in the directory c:\test and then filename of the word document is Parsing.doc in the same folder.

@Parnasso, for some reason my visual basic express didn't know what to do with the file that you supplied, it said "conversion failed" and wouldn't load your example... although that does look very interesting.

@DeepuAbrahamK, what are the #includes and main type for that code? I'm assuming in your pseudocode that szOrigFilename is supposed to be the name of my .txt file, so let's say test.txt, and szNewFilename is the name of my word document, so let's say parsing.doc? Yes?
 
LVL 1

Author Comment

by:PGRBryant
ID: 35023511
Visual C++ express, not basic*... I really wish I could edit posts.
 
LVL 34

Expert Comment

by:sarabande
ID: 35025315
to parse the text file you would define a struct (the snippets below need #include <fstream>, <string> and <vector>)

struct Test
{
    int title1;
    std::string title2;
    std::string title3;
};

then open text file

   std::ifstream  testfile("test.txt");

and read line by line like

  Test record;
   std::string s;
   // ignore title line
   std::getline(testfile, s);

   std::vector<Test> alltests;
   while (testfile >> record.title1 >> record.title2 >> record.title3 )
   {
        // remove here leading and trailing spaces from title2 and title3
       ...
       alltests.push_back(record);
   }

the word part is not so easy.

i would suggest you store the data in an ms access table or a .csv file and get it from there into your word doc. but i am no expert in word or automation.

Sara
 
LVL 1

Author Comment

by:PGRBryant
ID: 35029523
Okay, so let's back it up a bit and perhaps make it simpler... I'll ask more questions down the line.

Let's start with one .txt file and create a new txt file just with the data that I want parsed into it?

From what I gather of your code it should look something like the following, assuming I'm writing a win32 console application w/ defaults on in Visual C++ Express 2010.


#include "stdafx.h"
#include "targetver.h"
#include <fstream>
#include <iostream>
#include <string>
#include <vector>

struct Test //define parts of interest
{
    int title1;
    std::string title2;
    std::string title3;
};

int _tmain(int argc, _TCHAR* argv[])
{
   std::ifstream testfile("test.txt"); //Open text file

   Test record;
   std::string s;
   std::getline(testfile, s); // reads and ignores the title line

   std::vector<Test> alltests;
   while (testfile >> record.title1 >> record.title2 >> record.title3 )
   {
    // remove here leading and trailing spaces from title2 and title3
    ... // what goes here?
    alltests.push_back(record);
   }

   return 0;
}
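For the simpler goal of copying only the wanted fields from one text file into another, a minimal sketch might look like this. It assumes a layout the thread does not confirm: one title line, then lines that start with an integer followed by two words. The function name `extract_fields` is made up for the example. Unlike the loop above, it reads line by line, so a malformed line is skipped instead of stopping the whole read.

```cpp
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

// Copies the three leading fields of every data line from `in` to `out`.
// The first (title) line is skipped, as is any line that does not start
// with an integer followed by two words. Returns the number of lines written.
int extract_fields(std::istream& in, std::ostream& out)
{
    std::string line;
    std::getline(in, line);               // ignore title line

    int written = 0;
    while (std::getline(in, line)) {
        std::istringstream fields(line);  // parse one line at a time
        int title1;
        std::string title2, title3;
        if (fields >> title1 >> title2 >> title3) {
            out << title1 << ' ' << title2 << ' ' << title3 << '\n';
            ++written;
        }
    }
    return written;
}
```

With files, this could be called as `std::ifstream in("test.txt"); std::ofstream out("parsed.txt"); extract_fields(in, out);` where parsed.txt is just a placeholder name and <fstream> is needed in addition.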

 
LVL 86

Expert Comment

by:jkr
ID: 35029918
>>    // remove here leading and trailing spaces from title2 and title3
>>    ... // what goes here?

A call to a function like the following to do that for each string variable:

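#include <string>

void trim_whitespace(std::string& x) {

  // strip leading blanks; the empty() check keeps us from reading past an empty string
  while (!x.empty() && ' ' == x[0]) x.erase(x.begin());
  // strip trailing blanks; erase() takes a forward iterator, so use end() - 1 rather than rbegin()
  while (!x.empty() && ' ' == x[x.size() - 1]) x.erase(x.end() - 1);

}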

 
LVL 34

Accepted Solution

by:
sarabande earned 250 total points
ID: 35030043
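the member strings title2 and title3 may still need trimming. note that

   testfile >> record.title1 >> record.title2 >> record.title3

skips whitespace and stops at the next blank, so it only works when the fields themselves contain no spaces. if you read whole lines with std::getline and split them yourself, the pieces can carry leading or trailing blanks.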

you can remove them using the following helper

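std::string trim(const std::string & s)
{
    // position of the first character that is not a blank or tab
    std::string::size_type n1 = s.find_first_not_of(" \t");
    if (n1 == std::string::npos)
        return "";  // empty or all-blank string
    // position of the last non-blank character (exists because n1 was found)
    std::string::size_type n2 = s.find_last_not_of(" \t");
    // return by value: substr creates a temporary, so a reference would dangle
    return s.substr(n1, n2+1-n1);
}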

Sara
 
LVL 4

Expert Comment

by:parnasso
ID: 35034470
About my example: if the conversion fails, please create a new solution in your Visual Studio and add the cpp files to it. In all likelihood this is an issue with the Visual Studio versions.
 
LVL 34

Expert Comment

by:sarabande
ID: 35034832
parnasso, i also downloaded the zip file, but my winzip (version 14.5, so fairly new) can't extract the files (decompression failed).

Sara
 
LVL 34

Expert Comment

by:sarabande
ID: 35034845
note that my trim helper has to use find_first_not_of and find_last_not_of (rather than find_first_of and find_last_of) to locate the first and last non-blank characters.
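to show where such a helper fits in, here is a minimal sketch that splits a line at a delimiter and trims every piece. the names `trim_copy` and `split_and_trim` are made up for this example, and the delimiter is an assumption since the layout of Test.txt is not shown in the thread.

```cpp
#include <string>
#include <vector>

// Returns `s` without leading and trailing blanks or tabs.
std::string trim_copy(const std::string& s)
{
    const std::string::size_type first = s.find_first_not_of(" \t");
    if (first == std::string::npos)
        return std::string();             // empty or all-blank input
    const std::string::size_type last = s.find_last_not_of(" \t");
    return s.substr(first, last - first + 1);
}

// Splits `line` at every `delimiter` and trims each piece.
// An empty line yields a single empty field.
std::vector<std::string> split_and_trim(const std::string& line, char delimiter)
{
    std::vector<std::string> fields;
    std::string::size_type start = 0;
    for (;;) {
        const std::string::size_type end = line.find(delimiter, start);
        if (end == std::string::npos) {
            fields.push_back(trim_copy(line.substr(start)));   // last piece
            break;
        }
        fields.push_back(trim_copy(line.substr(start, end - start)));
        start = end + 1;                  // continue after the delimiter
    }
    return fields;
}
```

because the split is done on a delimiter and not on blanks, fields that contain spaces inside (like a first and last name) stay in one piece.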

Sara
 
LVL 1

Author Comment

by:PGRBryant
ID: 35045072
I'm still working on this, I don't quite understand what you guys are doing, I'm too green I guess, give me a bit to figure it out... in the meantime I think I've asked something simpler:
http://www.experts-exchange.com/Programming/Languages/C/Q_26866379.html
 
LVL 34

Expert Comment

by:sarabande
ID: 35276725
actually it was 4 questions:

  - search directory for text files
  - read text file
  - parse text lines
  - put results into word document

where each expert made (valid) comments on different parts.

Sara
 
LVL 1

Author Closing Comment

by:PGRBryant
ID: 35287486
Concur with sarabande, points split accordingly.

This question was poorly worded, and I got distracted with other projects and didn't come back until later to verify the experts' quality advice.