  • Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 355

Parsing Text Using C

I'm looking to create a (hopefully simple) program in C that looks in the folder C:/test, goes through all of the text files in that folder, and for each one parses the data from the text file into a Word document.

I'm pretty rough on C programming, any ideas? Also, what's the best free compiler?

For now I just want to get the concept started; I'll ask additional questions down the line about how to properly parse the specifics I'm interested in.

Thanks!
Asked by: PGRBryant
4 Solutions
 
jkr commented:
Well, listing the text files in a directory is the easier part. That could be done like the following (see also http://msdn.microsoft.com/en-us/library/aa365200%28VS.85%29.aspx - "Listing the Files in a Directory"):
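#include <windows.h>
#include <tchar.h>
#include <stdio.h>

int main(void) {

	WIN32_FIND_DATA fd;

	/* FindFirstFile returns INVALID_HANDLE_VALUE (not NULL) when nothing matches */
	HANDLE hFind = FindFirstFile(_T("c:\\test\\*.txt"),&fd);
	if (hFind == INVALID_HANDLE_VALUE) return 1;

	do {
		/* print each matching file name to the console */
		_tprintf(_T("File: %s\n"),fd.cFileName);
	} while (FindNextFile(hFind,&fd));

	FindClose(hFind);
	return 0;
}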

However - how would you want these files to be added to a Word document?
 
parnasso commented:
Once you have listed the files within a directory, you can create a new doc file and add content paragraphs to it with COM Word Automation in C++.

There is a very nice example of COM Word automation at the following link: http://1code.codeplex.com/releases/view/59632#DownloadId=201236

The sample creates a new doc file and adds some paragraphs to it.

Hope this suits your needs.
 
sarabande commented:
Visual Studio 2010 Express is a good, free C and C++ compiler.

You could also use the Cygwin or MinGW compilers/IDEs, which already include dirent.h. dirent.h is a portable way to retrieve the list of text files in a folder.
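A minimal sketch of the dirent.h approach, assuming a POSIX-style environment such as Cygwin or MinGW; `list_text_files` and `has_txt_extension` are hypothetical helper names, not library functions:

```cpp
#include <cstddef>
#include <dirent.h>
#include <string>
#include <vector>

// True if `name` is longer than ".txt" and ends with it.
// The comparison is case-sensitive, so "NOTES.TXT" would not match.
bool has_txt_extension(const std::string& name)
{
    const std::string ext = ".txt";
    return name.size() > ext.size() &&
           name.compare(name.size() - ext.size(), ext.size(), ext) == 0;
}

// Returns the names of the .txt entries in `folder`,
// or an empty list if the folder cannot be opened.
// dirent.h is POSIX (Cygwin, MinGW, Linux), not part of standard C/C++.
std::vector<std::string> list_text_files(const std::string& folder)
{
    std::vector<std::string> names;
    DIR* dir = opendir(folder.c_str());
    if (dir == NULL)
        return names;

    struct dirent* entry;
    while ((entry = readdir(dir)) != NULL)
    {
        std::string name = entry->d_name;
        if (has_txt_extension(name))
            names.push_back(name);
    }
    closedir(dir);
    return names;
}
```

For example, `list_text_files("C:/test")` would return the file names found in the question's folder; each name would then be joined with the folder path before opening the file.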

The Word document can be created via COM automation, as parnasso said, but that is probably not the easiest way. You might consider putting the data into an intermediate database and getting it from there into Word/Excel, or writing an RTF file from your data, which can then be converted to a Word document more easily.
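A minimal sketch of the RTF route, assuming plain ASCII text. `rtf_escape`, `make_rtf` and `write_rtf` are hypothetical names; the code writes a small RTF header (one font, 12 pt) followed by one paragraph per line:

```cpp
#include <fstream>
#include <string>
#include <vector>

// Escapes the three characters that are special in RTF: \ { }
// (7-bit ASCII assumed; other characters would need \'hh or \u escapes).
std::string rtf_escape(const std::string& text)
{
    std::string out;
    for (std::string::size_type i = 0; i < text.size(); ++i)
    {
        char c = text[i];
        if (c == '\\' || c == '{' || c == '}')
            out += '\\';
        out += c;
    }
    return out;
}

// Builds a minimal RTF document with one paragraph (\par) per entry in `lines`.
std::string make_rtf(const std::vector<std::string>& lines)
{
    std::string doc = "{\\rtf1\\ansi\\deff0{\\fonttbl{\\f0 Times New Roman;}}\\f0\\fs24\n";
    for (std::vector<std::string>::size_type i = 0; i < lines.size(); ++i)
        doc += rtf_escape(lines[i]) + "\\par\n";
    doc += "}";
    return doc;
}

// Writes the document to `path`; returns false if the file could not be written.
bool write_rtf(const std::string& path, const std::vector<std::string>& lines)
{
    std::ofstream out(path.c_str());
    out << make_rtf(lines);
    return static_cast<bool>(out);
}
```

Calling, say, `write_rtf("c:\\test\\parsing.rtf", lines)` (the path is only an example) produces a file that Word opens directly and that can then be saved as .doc.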

Sara
 
Deepu Abraham (R & D Engineering Manager) commented:
To convert a text file into a Word document, try this:


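#include <windows.h>
#include <string.h>

int main(void) {
    char szOrigFilename[MAX_PATH], szNewFilename[MAX_PATH];
    strcpy(szOrigFilename, "c:\\test\\text.txt");
    strcpy(szNewFilename, "c:\\test\\text.doc");
    /* MoveFileA only renames the file: the contents stay plain text, which Word can still open */
    BOOL bRename = MoveFileA(szOrigFilename, szNewFilename);
    return bRename ? 0 : 1;
}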

 
PGRBryant (Author) commented:
Okay, sarabande, the point of this process is to parse out tidbits of information that are stored in the text file and then automatically convert them and display them as needed.

I've attached a really silly .txt file that has basically random information.

But let's say that I wanted to pull out the "1", the "#", and the "Tom".

And place those into a Word document. How would I do that?

See, down the line I'm actually going to be parsing the data from a series of text files and pasting them into another program. But in order to get there I'm trying to start with something relatively simple.
Test.txt
 
PGRBryant (Author) commented:
I didn't mean to exclude the names of you other experts, my apologies.

@jkr: I'm not sure what you're doing here. You're looking through a folder for all the .txt files and then printing them to what?

Let's say the txt files are all in the directory c:\test and the filename of the Word document is Parsing.doc, in the same folder.

@parnasso, for some reason my Visual Basic Express didn't know what to do with the file that you supplied; it said "conversion failed" and wouldn't load your example... although it does look very interesting.

@DeepuAbrahamK, what are the #includes and main type for that code? I'm assuming in your pseudocode that szOrigFilename is supposed to be the name of my .txt file, so let's say test.txt, and szNewFilename is the name of my Word document, so let's say parsing.doc? Yes?
 
PGRBryant (Author) commented:
Visual C++ Express, not Basic*... I really wish I could edit posts.
 
sarabande commented:
To parse the text file, you would define a struct:

#include <fstream>
#include <string>
#include <vector>

struct Test
{
    int title1;
    std::string title2;
    std::string title3;
};

Then open the text file:

   std::ifstream  testfile("test.txt");

and read it line by line:

   Test record;
   std::string s;
   // ignore title line
   std::getline(testfile, s);

   std::vector<Test> alltests;
   while (testfile >> record.title1 >> record.title2 >> record.title3)
   {
       // remove here leading and trailing spaces from title2 and title3
       // ...
       alltests.push_back(record);
   }
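To try this loop without the attached Test.txt, it can read from any std::istream. The sketch below wraps it in a hypothetical `read_tests` function and assumes each record is a number followed by two blank-free words, like the "1", "#" and "Tom" PGRBryant wants to extract; the sample input in the usage note is invented:

```cpp
#include <sstream>
#include <string>
#include <vector>

struct Test
{
    int title1;
    std::string title2;
    std::string title3;
};

// Skips the title line, then reads "number word word" records until the
// input ends or a record doesn't have that shape.
std::vector<Test> read_tests(std::istream& in)
{
    std::vector<Test> alltests;
    std::string header;
    std::getline(in, header); // ignore title line

    Test record;
    // operator>> skips blanks, tabs and newlines before each field and stops
    // at the next one, so title2 and title3 each receive a single word
    while (in >> record.title1 >> record.title2 >> record.title3)
        alltests.push_back(record);
    return alltests;
}
```

Passing `std::istringstream sample("Number Symbol Name\n1 # Tom\n");` to `read_tests` yields one record with title1 == 1, title2 == "#" and title3 == "Tom"; with the real file, pass a `std::ifstream` opened on test.txt instead.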

The Word part is not so easy.

I would suggest you store the data in an MS Access table or a .csv file and get it from there into your Word document. But I am no expert in Word or automation.
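A minimal sketch of the .csv option, reusing the Test struct from above (repeated here so the block stands alone). `csv_field` and `to_csv` are hypothetical names and the column headers are made up; quoting follows the common RFC 4180 convention so that a comma inside a field doesn't split the column:

```cpp
#include <sstream>
#include <string>
#include <vector>

struct Test
{
    int title1;
    std::string title2;
    std::string title3;
};

// Quotes a field if it contains a comma, quote or line break,
// doubling any embedded quotes.
std::string csv_field(const std::string& s)
{
    if (s.find_first_of(",\"\r\n") == std::string::npos)
        return s;
    std::string out = "\"";
    for (std::string::size_type i = 0; i < s.size(); ++i)
    {
        if (s[i] == '"')
            out += '"';
        out += s[i];
    }
    return out + "\"";
}

// Returns a header row plus one row per record.
std::string to_csv(const std::vector<Test>& records)
{
    std::ostringstream out;
    out << "title1,title2,title3\n"; // hypothetical column names
    for (std::vector<Test>::size_type i = 0; i < records.size(); ++i)
        out << records[i].title1 << ','
            << csv_field(records[i].title2) << ','
            << csv_field(records[i].title3) << '\n';
    return out.str();
}
```

The returned string can be written to a file with std::ofstream and then imported into Excel or Access.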

Sara
 
PGRBryantAuthor Commented:
Okay, so let's back it up a bit and perhaps make it simpler... I'll ask more questions down the line.

Can we start with one .txt file and create a new .txt file containing just the data that I want parsed out of it?

From what I gather from your code, it should look something like the following, assuming I'm writing a Win32 console application with the default settings in Visual C++ Express 2010.


#include "stdafx.h"
#include "targetver.h"
#include <fstream>
#include <iostream>
#include <string>
#include <vector>

// define parts of interest (outside main: older C++ compilers don't allow
// local types as template arguments such as std::vector<Test>)
struct Test
{
    int title1;
    std::string title2;
    std::string title3;
};

int _tmain(int argc, _TCHAR* argv[])
{
   std::ifstream testfile("test.txt"); // open text file

   Test record;
   std::string s;
   std::getline(testfile, s); // read and ignore the title line

   std::vector<Test> alltests;
   while (testfile >> record.title1 >> record.title2 >> record.title3)
   {
       // remove here leading and trailing spaces from title2 and title3
       // ... what goes here?
       alltests.push_back(record);
   }

   return 0;
}

 
jkr commented:
>>    // remove here leading and trailing spaces from title2 and title3
>>    ... // what goes here?

A call to a function like the following to do that for each string variable:

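#include <string>

void trim_whitespace(std::string& x) {

  // check for empty first: dereferencing begin() of an empty string is undefined
  while (!x.empty() && ' ' == x[0]) x.erase(0, 1);
  while (!x.empty() && ' ' == x[x.size() - 1]) x.erase(x.size() - 1);

}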

 
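sarabande commented:
The member strings title2 and title3 will not actually have leading or trailing spaces if you read them with

   testfile >> record.title1 >> record.title2 >> record.title3

because >> skips the blanks before each field and stops at the next one (which also means a field containing a blank, such as a full name, gets split). They do end up with surrounding blanks if you read whole lines with std::getline and cut them into fields yourself.

In that case you can remove them using the following helper: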

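#include <string>

// returns a copy (not a reference: a reference to a local or temporary would dangle)
std::string trim(const std::string & s)
{
    // positions of the first and last characters that are NOT blank or tab
    std::string::size_type n1 = s.find_first_not_of(" \t");
    if (n1 == std::string::npos)
        return "";  // empty or blanks only
    std::string::size_type n2 = s.find_last_not_of(" \t");
    return s.substr(n1, n2+1-n1);
}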

Sara
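For the simpler goal of reading one .txt file and writing a new .txt file containing only the parsed fields, here is a minimal sketch. It assumes the same layout as the Test struct (a title line, then records of a number and two blank-free words); the function name `extract_records` and the semicolon-separated output format are hypothetical choices. Because operator>> already stops at blanks, no trimming is needed:

```cpp
#include <fstream>
#include <string>

// Reads `in_path`, skips its title line, and writes each "number word word"
// record to `out_path` as one line "number;word;word".
// Returns the number of records written, or -1 if a file couldn't be opened.
int extract_records(const std::string& in_path, const std::string& out_path)
{
    std::ifstream in(in_path.c_str());
    if (!in)
        return -1;
    std::ofstream out(out_path.c_str());
    if (!out)
        return -1;

    std::string title;
    std::getline(in, title); // ignore title line

    int number;
    std::string word1, word2;
    int count = 0;
    while (in >> number >> word1 >> word2)
    {
        out << number << ';' << word1 << ';' << word2 << '\n';
        ++count;
    }
    return count;
}
```

For example, `extract_records("c:\\test\\test.txt", "c:\\test\\parsed.txt")`, where parsed.txt is only an example name.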
 
parnasso commented:
About my example: if it doesn't convert, please create a new solution in your Visual Studio and add the .cpp files to it. In all likelihood this is an issue with the Visual Studio versions.
 
sarabande commented:
parnasso, I also downloaded the zip file, but my WinZip, which is fairly new (version 14.5), can't extract the files (decompression failed).

Sara
 
sarabande commented:
If you use my trim helper, note that it needs find_first_not_of and find_last_not_of, not find_first_of and find_last_of.

Sara
 
PGRBryant (Author) commented:
I'm still working on this. I don't quite understand what you guys are doing (I'm too green, I guess), so give me a bit to figure it out... In the meantime, I've asked something simpler:
http://www.experts-exchange.com/Programming/Languages/C/Q_26866379.html
 
sarabande commented:
Actually, it was 4 questions:

  - search directory for text files
  - read text file
  - parse text lines
  - put results into word document

and each expert made (valid) comments on different parts.

Sara
 
PGRBryant (Author) commented:
Concur with sarabande; points split accordingly.

This question was poorly worded, and I got distracted with other projects and didn't come back until later to verify the experts' quality advice.