• Status: Solved
• Priority: Medium

# Parsing Text Using C

I'm looking to create a (hopefully simple) program in C that looks in the folder C:/test, goes through all of the text files in that folder, and for each one parses the data from the text file into a Word document.

I'm pretty rough at C programming. Any ideas? Also, what's the best free compiler?

For now I just want to get the concept started; I'll ask additional questions down the line about how to properly parse the specifics I'm interested in.

Thanks!
PGRBryant

Commented:
Well, listing text files in a directory is the easier part; that could be done like the following (see also http://msdn.microsoft.com/en-us/library/aa365200%28VS.85%29.aspx - "Listing the Files in a Directory"):
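#include <windows.h>
#include <tchar.h>
#include <stdio.h>

int main(void) {
    WIN32_FIND_DATA fd;

    // Find the first .txt file in the folder
    HANDLE hFind = FindFirstFile(_T("c:\\path\\*.txt"), &fd);
    if (hFind == INVALID_HANDLE_VALUE)
        return 1; // no matching files, or the path is invalid

    do {
        _tprintf(_T("File: %s\n"), fd.cFileName);
    } while (FindNextFile(hFind, &fd));

    FindClose(hFind);
    return 0;
}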


However - how would you want these files to be added to a Word document?

Commented:
Once you have listed the files within a directory, you can create a new doc file and add content paragraphs to it with COM Word Automation in C++.

The sample creates a new doc file and adds some paragraphs to it.


Commented:
Visual Studio 2010 Express is a good, free C and C++ compiler.

You could also use the Cygwin or MinGW compilers/IDEs, which already include dirent.h. dirent.h is a portable way to retrieve the list of text files in a folder.
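As a sketch of the dirent.h route, the function below collects the names of the .txt files in a folder. It assumes a compiler that ships a POSIX-style dirent.h (MinGW, Cygwin, or any Unix); MSVC does not provide one. list_txt_files is a hypothetical helper name, and the extension match is case-sensitive, so a file named NOTES.TXT would be skipped.

```cpp
#include <dirent.h>
#include <algorithm>
#include <string>
#include <vector>

// Returns the sorted names of the entries in dir whose names end in ".txt"
// (case-sensitive). Assumes a POSIX-style <dirent.h> such as MinGW's or Cygwin's.
std::vector<std::string> list_txt_files(const std::string& dir)
{
    std::vector<std::string> result;
    DIR* d = opendir(dir.c_str());
    if (d == NULL)
        return result;                       // folder missing or unreadable

    const std::string ext = ".txt";
    struct dirent* entry;
    while ((entry = readdir(d)) != NULL) {
        std::string name = entry->d_name;
        // keep names that are longer than ".txt" and end with it
        if (name.size() > ext.size() &&
            name.compare(name.size() - ext.size(), ext.size(), ext) == 0)
            result.push_back(name);
    }
    closedir(d);
    std::sort(result.begin(), result.end()); // readdir returns entries in no particular order
    return result;
}
```

With MinGW you could then call list_txt_files("C:/test") and open each returned name relative to that folder.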

The Word document can be done via COM and automation, as parnasso said, but that is probably not the easiest way. You might think of putting the data into an intermediate database and getting it from there into Word/Excel, or of writing an RTF file from your data, which can then more easily be converted to a Word document.
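For the RTF suggestion, here is a minimal sketch that writes each extracted value as its own paragraph in a file Word can open. The helper names (rtf_escape, make_rtf, write_rtf) are hypothetical, and only backslash and braces are escaped; non-ASCII text would need RTF's \uN escapes, which this sketch leaves out.

```cpp
#include <fstream>
#include <string>
#include <vector>

// Escapes the characters RTF treats specially: backslash and braces.
// Non-ASCII characters would need \uN escapes; not handled in this sketch.
std::string rtf_escape(const std::string& text)
{
    std::string out;
    for (std::string::size_type i = 0; i < text.size(); ++i) {
        char c = text[i];
        if (c == '\\' || c == '{' || c == '}')
            out += '\\';
        out += c;
    }
    return out;
}

// Builds a minimal RTF document (one font, 12 pt) with one paragraph per string.
std::string make_rtf(const std::vector<std::string>& paragraphs)
{
    std::string doc = "{\\rtf1\\ansi\\deff0{\\fonttbl{\\f0 Arial;}}\\f0\\fs24 ";
    for (std::string::size_type i = 0; i < paragraphs.size(); ++i)
        doc += rtf_escape(paragraphs[i]) + "\\par\n";
    doc += "}\n";
    return doc;
}

// Writes the RTF text to path; returns false if the file can't be written.
bool write_rtf(const std::string& path, const std::vector<std::string>& paragraphs)
{
    std::ofstream out(path.c_str(), std::ios::binary);
    if (!out)
        return false;
    out << make_rtf(paragraphs);
    return !out.fail();
}
```

For example, calling write_rtf("C:/test/Parsing.rtf", fields) with fields holding "1", "#" and "Tom" produces three paragraphs; the file name is only an illustration.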

Sara

R & D Engineering Manager Commented:
To turn a text file into a Word document, try renaming it to .doc:

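#include <windows.h>
#include <string.h>

int main(void) {
    char szOrigFilename[MAX_PATH];
    char szNewFilename[MAX_PATH];

    strcpy(szOrigFilename, "c:\\test\\text.txt");
    strcpy(szNewFilename, "c:\\test\\text.doc");

    // MoveFileA (the char version) only renames; the contents stay plain text
    BOOL bRename = MoveFileA(szOrigFilename, szNewFilename);
    return bRename ? 0 : 1;
}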


Author Commented:
Okay Sarabande, the point of this process is to parse out tidbits of information that are stored in the text file and then automatically convert them and display them as needed.

I've attached a really silly .txt file that has basically random information.

But let's say that I wanted to pull out the "1", the "#" and the "Tom" and place those into a Word document. How would I do that?

See, down the line I'm actually going to be parsing the data from a series of text files and pasting them into another program. But in order to get there I'm trying to start with something relatively simple.
Test.txt

Author Commented:
I didn't mean to leave out the rest of you experts; my apologies.

@jkr: I'm not sure what you're doing here. You're looking through a folder for all the .txt files and then printing them to what?

Let's say the .txt files are all in the directory c:\test and the filename of the Word document is Parsing.doc, in the same folder.

@Parnasso, for some reason my Visual Basic Express didn't know what to do with the file you supplied; it said "conversion failed" and wouldn't load your example, although it does look very interesting.

@DeepuAbrahamK, what are the #includes and main type for that code? I'm assuming in your pseudocode that szOrigFilename is supposed to be the name of my .txt file (say test.txt) and szNewFilename the name of my Word document (say parsing.doc). Yes?

Author Commented:
Visual C++ express, not basic*... I really wish I could edit posts.

Commented:
To parse the text file you would define a struct:

#include <fstream>
#include <string>
#include <vector>

struct Test
{
    int title1;
    std::string title2;
    std::string title3;
};

Then open the text file:

std::ifstream  testfile("test.txt");

and read it line by line, like this:

Test record;
std::string s;
// ignore title line
std::getline(testfile, s);

std::vector<Test> alltests;
while (testfile >> record.title1 >> record.title2 >> record.title3)
{
    // remove here leading and trailing spaces from title2 and title3
    ...
    alltests.push_back(record);
}
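The fragments above can be assembled into one self-contained function. This sketch assumes test.txt has one title line followed by records made of an integer and two blank-free words per line; the attached Test.txt isn't reproduced here, so that layout is a guess based on the Test struct. read_tests is a hypothetical name; taking a std::istream& lets it read from a std::ifstream or, for testing, from a string.

```cpp
#include <istream>
#include <string>
#include <vector>

struct Test
{
    int title1;
    std::string title2;
    std::string title3;
};

// Skips one header line, then reads "<int> <word> <word>" records until the
// input ends or a record fails to parse.
std::vector<Test> read_tests(std::istream& in)
{
    std::vector<Test> alltests;
    std::string header;
    std::getline(in, header);              // ignore title line

    Test record;
    while (in >> record.title1 >> record.title2 >> record.title3)
        alltests.push_back(record);
    return alltests;
}
```

In a program you would call it as std::ifstream f("test.txt"); std::vector<Test> v = read_tests(f);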

The Word part is not so easy.

I would suggest storing the data in an MS Access table or a .csv file and getting it from there into your Word document. But I am no expert in Word or automation.

Sara

Author Commented:
Okay, so let's back it up a bit and perhaps make it simpler; I'll ask more questions down the line.

Let's start with one .txt file and create a new .txt file containing just the data I want parsed out of it.

From what I gather of your code, it should look something like the following, assuming I'm writing a Win32 console application with the default settings in Visual C++ 2010 Express.

#include "stdafx.h"
#include <fstream>
#include <string>
#include <vector>

struct Test // define parts of interest
{
    int title1;
    std::string title2;
    std::string title3;
};

int _tmain(int argc, _TCHAR* argv[])
{
    std::ifstream testfile("test.txt"); // open text file

    Test record;
    std::string s;
    std::getline(testfile, s); // read and ignore the title line

    std::vector<Test> alltests;
    while (testfile >> record.title1 >> record.title2 >> record.title3)
    {
        // remove here leading and trailing spaces from title2 and title3
        // ... what goes here?
        alltests.push_back(record);
    }

    return 0;
}
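For the one-file-in, one-file-out step, a minimal sketch could read each record line with std::getline, pull out the three fields, and write them tab-separated to a new file. It assumes the same layout as above (a title line, then an integer and two blank-free words per line); extract_fields and the output name parsed.txt are hypothetical. Lines that don't match are skipped rather than ending the loop, which is one difference from the while (testfile >> ...) version.

```cpp
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

// Skips the title line, then copies the three fields of every matching
// record line to out, tab-separated, one record per line.
// Returns the number of records written.
int extract_fields(std::istream& in, std::ostream& out)
{
    std::string line;
    std::getline(in, line);                 // skip the title line

    int count = 0;
    while (std::getline(in, line)) {
        std::istringstream fields(line);
        int number;
        std::string symbol, name;
        if (fields >> number >> symbol >> name) {  // ignore lines that don't match
            out << number << '\t' << symbol << '\t' << name << '\n';
            ++count;
        }
    }
    return count;
}
```

In _tmain you would open std::ifstream in("test.txt") and std::ofstream out("parsed.txt") and then call extract_fields(in, out).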


Commented:
>>    // remove here leading and trailing spaces from title2 and title3
>>    ... // what goes here?

A call to a function like the following to do that for each string variable:

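#include <string>

void trim_whitespace(std::string& x) {
    // Strip leading blanks (checking empty() first avoids reading past an empty string)
    while (!x.empty() && ' ' == x[0]) x.erase(0, 1);
    // Strip trailing blanks
    while (!x.empty() && ' ' == x[x.size() - 1]) x.erase(x.size() - 1);
}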


Commented:
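Note that

testfile >> record.title1 >> record.title2 >> record.title3

skips the blanks between the fields, so title2 and title3 will not carry leading or trailing spaces (a field that itself contains a blank, such as a full name, would be split in two instead).

Blanks do survive when a field is read with std::getline and a delimiter; in that case you can remove them using the following helper: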

#include <string>

// Returns a copy of s without leading and trailing blanks/tabs
std::string trim(const std::string & s)
{
    std::string::size_type n1 = s.find_first_not_of(" \t");
    if (n1 == std::string::npos)
        return "";  // empty or all blanks
    std::string::size_type n2 = s.find_last_not_of(" \t");
    return s.substr(n1, n2 + 1 - n1);
}
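To see when trimming is actually needed, the sketch below contrasts the two ways of reading fields: >> drops the blanks around each word, while std::getline with a delimiter keeps them, so only the second needs a trim. It assumes delimited input such as "1 ; # ; Tom Smith", and split_and_trim is a hypothetical helper; trim here returns a new string rather than a reference.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Returns a copy of s without leading and trailing blanks/tabs.
std::string trim(const std::string& s)
{
    std::string::size_type first = s.find_first_not_of(" \t");
    if (first == std::string::npos)
        return "";                              // empty or all blanks
    std::string::size_type last = s.find_last_not_of(" \t");
    return s.substr(first, last + 1 - first);
}

// Splits line on delim with std::getline (which keeps surrounding blanks)
// and trims each field. Hypothetical helper for delimited input.
std::vector<std::string> split_and_trim(const std::string& line, char delim)
{
    std::vector<std::string> fields;
    std::istringstream in(line);
    std::string field;
    while (std::getline(in, field, delim))
        fields.push_back(trim(field));
    return fields;
}
```

The delimiter approach also keeps a multi-word field such as "Tom Smith" together, which >> cannot do.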

Sara

Commented:
About my example: if the conversion fails, please create a new solution in your Visual Studio and add the .cpp files to it. In all likelihood this is an issue with the Visual Studio versions.

Commented:
Parnasso, I also downloaded the zip file, but my WinZip, which is fairly new (version 14.5), can't extract the files (decompression failed).

Sara

Commented:
If you use my trim helper, make sure it calls find_first_not_of and find_last_not_of; with find_first_of and find_last_of it would locate the blanks instead of the text.

Sara

Author Commented:
I'm still working on this. I don't quite understand what you guys are doing (I'm too green, I guess), so give me a bit to figure it out. In the meantime I think I've asked something simpler:
http://www.experts-exchange.com/Programming/Languages/C/Q_26866379.html

Commented:
Actually it was 4 questions:

- which free compiler to use
- search directory for text files
- parse text lines
- put results into Word document

Sara

Author Commented:
Concur with sarabande, points split accordingly.

This question was poorly worded, and I got distracted with other projects and didn't come back until later to verify the experts' quality advice.
Question has a verified solution.
