
File I/O, and a pointer or stack problem

I have two questions here.

1. For work, I need to collect all the keywords that are scattered across many files. Basically, I have to read all of these files and extract the useful keywords. I know how to do this for a single source file. Can anybody tell me how to handle multiple files?

2. I have a file that contains all the keywords. I need a program to make the keywords unique and sorted alphabetically.

I think I have to read the keywords from the file one by one, and each time compare the new one first with the smallest one and then with those that have already been sorted.

My questions are: How should I store the sorted results? How can I retrieve them when I need them? How should I save them: with pointers, or with a stack?

Thank you very much!

bkrahmer

My realistic answer for number one is to use 'cat' to get all your data flowing into the stdin of the program. If that doesn't trip your trigger, you'd have to give more details about your specifications, like what OS.
For two, I think your first assumption is a good one. The second point can be made extremely easy by using an STL map to store your words after you've parsed them.
brian

phildsp

The cat command would only be available on UNIX and might be slow compared to C++ code, but it would be a quick way to implement the search. Otherwise, why wouldn't you just cycle through the files, closing and deleting the I/O object for the last file processed and creating a new I/O object for the next file?

If you have a lot of keywords or will be accessing them a lot, you might consider using a hash table. It is an extremely efficient and flexible means of storing and looking up keys, although it does not keep them sorted by itself. It is used commonly enough that it should be available in development libraries such as Rogue Wave.

bkrahmer

I guarantee that the unix method I mentioned would be faster. Otherwise, yeah.
Exactly which STL container has the hashtable? An important lesson to be learned from the principles of eXtreme Programming is to give the most simple solution for a given problem. Is his program realistically going to take even a second to run? I doubt it. I didn't hear execution speed mentioned as the primary requirement. Therefore, why not let STL do all the work for you and move on to solving interesting problems instead of reinventing the wheel each time?
brian

bkrahmer

I might also point out that using a map solves 4 of the points of the problem very well. In addition, because of its associative property, one could extend it with a couple more lines of code to show how many times a keyword was found, or print the map sorted by how many times the keyword was present.
brian

anxx0018

ASKER

Yes, I am using a Red Hat Linux system, so I think Brian's method will work.

Could you please give me some simple examples for the methods you mentioned above? They all sound strange to me.

Thanks a lot!

anxx0018

ASKER

Brian, can you help me check the following program? I really do not understand why it does not go into the loop.

Thank you very much!
#include <stdio.h>
#include <stdlib.h>
#include <fstream>

#include <iostream>
#include <string>
#include <map>

using namespace std;

//Min is to return the smaller value of m1 and m2.
int Min(int m1, int m2)
{
    if (m1<m2)
        return m1;
    else
        return m2;
}

//WordCmp is used to compare two strings, if k1 is alphabetically smaller than k2, return 0; if k1 is bigger than k2, return 1; if k1 and k2 are equal, return 2.

int WordCmp(string k1, string k2)
{
    int len1,len2;

    char *s1 = new char[strlen(k1.c_str()+1)];
    strcpy(s1,k1.c_str());

    char *s2 = new char[strlen(k2.c_str()+1)];
    strcpy(s2,k2.c_str());

    int n = Min(strlen(k1.c_str()), strlen(k2.c_str()));

    for (int i= 0; i<n; i++)
    {
        if (s1[i]<s2[i])
        {
            return 0;
            break;
        }
        else if (s1[i]>s2[i])
        {
            return 1;
            break;
        }
    }
    return 2;
}

//This function is to sort names alphabetically. First, delete "END" and "Ì¼@P" from inflowing stream. Then if read-in is smaller than the first name of the array, put it at the first place and put others larger. If read-in is larger than the biggest one, put it at the end of the array. Otherwise, check where is a good place for the new read-in, and then put it there.

int main()
{
    ifstream source("/tmp_mnt/home/anqian/keywords.txt");
    map<int, string> key_array;

    string str_Line;

    if(!source)
    {
        cerr<<"Error opening the File!"<<endl;
        return 1;
    }

    else
    {
        getline(source,str_Line);
        key_array[0]=str_Line;
        getline(source,str_Line);

        while(!source.eof() && str_Line.compare("END")!=0 && str_Line.compare("Ì¼@P")!=0)
        {
            if (WordCmp(str_Line,key_array[0])==0)
            {
                cout<<key_array.size()<<endl;
                for (int i = key_array.size(); i<1; i--)
                {
                    cout<<"I am in for loop!"<<endl;
                    key_array[i]=key_array[i-1];
                }
                key_array[0]=str_Line;
                getline(source,str_Line);
            }

            else if(WordCmp(str_Line,key_array[key_array.size()-1])==1)
            {
                key_array[key_array.size()]=str_Line;
                getline(source,str_Line);
            }

            else
            {
                for (int i=key_array.size(); i=2; i--)
                {
                    if (WordCmp(key_array[i-1],str_Line)==1 && WordCmp(key_array[i-2],str_Line)==0)
                    {
                        cout<<"Yes! I am here"<<endl;
                        for(int j=key_array.size(); j<=i; j--)
                        {
                            key_array[j]=key_array[j-1];
                        }
                        key_array[i-1]=str_Line;
                        break;
                    }
                }

                getline(source,str_Line);
            }
        }
        return 0;
    }

    for(int k=0; k<key_array.size();k++)
    {
        cout<<key_array[k]<<endl;
    }
}

Source File:
SIMPLE
BITPIX
NAXIS
EXTEND
NEXTEND
DATE
FILENAME
FILETYPE
TELESCOP
INSTRUME
EQUINOX
ROOTNAME
PRIMESI
TARGNAME
RA_TARG
DEC_TARG
PROPOSID
LINENUM
PR_INV_L
PR_INV_F
PR_INV_M
TDATEOBS
TTIMEOBS
TEXPSTRT
TEXPEND
TEXPTIME
POSTARG1
POSTARG2
OVERFLOW
CAL_VER
PROCTIME
CFSTATUS
OBSTYPE
OBSMODE
PHOTMOD

Thank you so much!

bkrahmer

How's this?? :) If you are serious about coding, please take note of the OO approach, readable code, well-named variables, simple algorithms, nice use of typedefs, and making the language and the available libraries work for you as much as possible. There's further polishing that could be done...but I digress. Also, if this is for use in a class, please do not cut and paste. Even retyping someone else's code can help you learn it.
brian

#include <fstream>
#include <iostream>
#include <string>
#include <map>

using namespace std;

typedef map<string, string> StringStringMap;
typedef StringStringMap::iterator StringStringMapIter;

class StringSorter
{
public:
    StringSorter(const string &filename) : m_InputFilename(filename)
    {
    }

    // Reads the input file line by line. Returns false if it cannot be opened.
    bool ProcessFile()
    {
        bool retval = true;
        string inputBuffer;

        m_InputFile.open(m_InputFilename.c_str());
        if (m_InputFile.good())
        {
            // Test the read itself. Looping on eof() would process one
            // stale, empty line after the last successful read.
            while (getline(m_InputFile, inputBuffer))
            {
                if (inputBuffer == "END" || inputBuffer == "L< @P")
                {
                    break;
                }

                ProcessRecord(inputBuffer);
            }
            m_InputFile.close();
        }
        else
        {
            retval = false;
        }
        return retval;
    }

    // Prints the keys. A map iterates in sorted key order.
    void DumpSortedKeywords()
    {
        StringStringMapIter iter;
        for(iter = m_KeywordMap.begin(); iter != m_KeywordMap.end(); ++iter)
        {
            cout << (*iter).first << endl;
        }
    }

private:
    // The keyword is the map key, so duplicates collapse into one entry.
    void ProcessRecord(const string &data)
    {
        m_KeywordMap[data] = data;
    }

    string m_InputFilename;
    ifstream m_InputFile;
    StringStringMap m_KeywordMap;
};

int main()
{
    StringSorter sorter("D:\\Brian\\test_code\\stringsort\\keywords.txt");
    if (sorter.ProcessFile())
    {
        sorter.DumpSortedKeywords();
    }
    return 0;
}

anxx0018

ASKER

Brian,

Yes, you did provide an extremely efficient program. I am serious about learning to code.

Can you explain more about how ProcessRecord works, and about typedef?

Can you explain what the OO approach and readable code are? Thank you very much!

Or may I know your email address? Thank you again!

anxx0018

ASKER

Again, why does mine not work?

anxx0018

ASKER

Brian,

Is the map a very useful class?

typedef map<string, string> StringStringMap;

What does "(*iter).first" mean?

What does "m_KeywordMap[data] = data" mean?

I could not find the code that actually does the sorting. The tricky part seems to be m_KeywordMap[data]=data, where I would have used m_KeywordMap[i]=data with an int i. Does m_KeywordMap[data]=data contribute to the sorting?

Where could I find a good C++ book or a website that describes some useful functions and classes?

Thanks a lot!

anxx0018

ASKER

Now I think m_KeywordMap[data]=data did the sorting, because the key "[data]" must be unique and sorted. Am I right?

Can you give me a simple example of how I can cat a lot of files, since I have to read these keywords from more than one file?

Thank you!

ASKER CERTIFIED SOLUTION

bkrahmer


anxx0018

ASKER

Thank you very much!!