Solved

Need to read in file, looking for key terms

Posted on 2003-02-19
Last Modified: 2010-04-01
I need to write a program that reads an input file.  The file is structured like this:
<TITLE>Then here is the title

Info that goes with the title



<TITLE>Next title...
and so on.

I need to scan the file, find each <TITLE>, store the rest of that line as the key, and store everything between that line and the next <TITLE> as the data.  I then make a pair from the key and the data and put it into a list.  What is the most efficient way to do this?
Question by:CletusTheDwarf
8 Comments
 
LVL 7

Expert Comment

by:burcarpat
ID: 7985578
if the file is not very big, read everything into a std::vector<std::string> line by line and use string.find() to locate the <TITLE>
 
LVL 12

Expert Comment

by:Salte
ID: 7986376
Map the file into memory, or read the whole file into one big buffer. Then you can treat the whole file as one long array of char. Unfortunately it is not 0-terminated, but that doesn't matter.

char * beg points to the beginning.
char * end = beg + size_of_file; points one past the end.


Note that the STL treats char pointers like these as iterators, so you can pass the two pointers straight to the standard algorithms. No need for an actual std::string here. Use std::search() from <algorithm> rather than std::find(): find() looks for a single element (one char), while search() looks for a subsequence.

const char tag[] = "<TITLE>";
char * p = std::search(beg, end, tag, tag + 7);

If p == end then there was no <TITLE> in the file.

If p < end then p points to <TITLE> inside the buffer.

Process that <TITLE> and then you can search for the next one:

p = std::search(p + 7, end, tag, tag + 7);

p + 7 points past the <TITLE> we just found.

One way to do it is to have two pointers:

char * p points to the current <TITLE>.
char * q points to the next one.

So:

p = std::search(beg, end, tag, tag + 7);
if p != end: q = std::search(p + 7, end, tag, tag + 7);

[p, q) will then be the current block, starting from the current <TITLE> and running to the next one (or to the end).

If q < end then:

p = q; q = std::search(p + 7, end, tag, tag + 7); to find the next.

Stop when q == end, after processing the final block [p, end).
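
The two-pointer walk over the buffer could be sketched roughly like this. It assumes the whole file is already in memory as [beg, end) and uses std::search for the multi-character match; split_titles and the Entry typedef are illustrative names, not something from the thread.

```cpp
#include <algorithm>
#include <cstring>
#include <string>
#include <utility>
#include <vector>

typedef std::pair<std::string, std::string> Entry;  // (key, data)

// Split the buffer [beg, end) into (key, data) entries.
// Anything before the first "<TITLE>" is ignored.
std::vector<Entry> split_titles(const char* beg, const char* end) {
    static const char tag[] = "<TITLE>";
    const std::size_t tag_len = std::strlen(tag);  // 7
    std::vector<Entry> entries;

    // p points at the current tag, q at the next one (or at end).
    const char* p = std::search(beg, end, tag, tag + tag_len);
    while (p != end) {
        const char* q = std::search(p + tag_len, end, tag, tag + tag_len);

        // The key is the rest of the tag's line; the data is what follows.
        const char* key_beg = p + tag_len;
        const char* key_end = std::find(key_beg, q, '\n');
        const char* data_beg = (key_end == q) ? q : key_end + 1;

        entries.push_back(Entry(std::string(key_beg, key_end),
                                std::string(data_beg, q)));
        p = q;
    }
    return entries;
}
```

Copying each block into std::string objects is a simplification for the sketch; a version that only stores pointer pairs into the buffer would avoid the copies.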

Another way to do this is to use stream iterators. Be careful here: istream_iterator<char> will not work for this. It is only an input iterator, so p + 7 does not compile and a multi-character search cannot be run on it directly, and it also skips whitespace by default. Use istreambuf_iterator<char> to read the file into a std::string and search that instead.

ifstream file("thefile.lst");

istreambuf_iterator<char> first(file);
istreambuf_iterator<char> last; // default constructor makes an 'end' object.

string text(first, last);
string::size_type p = text.find("<TITLE>");
string::size_type q = text.find("<TITLE>", p + 7);

Again, if q == string::npos there was no further <TITLE> in the file, and [p, text.size()) is a section starting with <TITLE> and is the last such section in the file.

If q != string::npos then q is the next position where a new <TITLE> block starts, and [p, q) is one section.

Note that p + 7 assumes that p points to "<TITLE>", so check p != string::npos after the first find() before searching for q.

Alf
 
LVL 1

Expert Comment

by:rainbowsix
ID: 7986585
Hi Cletus... Salte's comment will work, just don't forget to make sure that the literal "<TITLE>" does not appear in the actual title or the text.
OR
Search for "\n<TITLE>", except for the first one (the first title, at the very start of the file, has no newline before it).
Cheers!
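
One way to apply that idea without special-casing the first title is to accept a match only when it sits at the start of the text or directly after a newline. This is just a sketch under the assumption that the file is already in a std::string; next_title is an invented name.

```cpp
#include <string>

// Return the position of the next "<TITLE>" that begins a line,
// searching from `from`, or std::string::npos if there is none.
std::string::size_type next_title(const std::string& text,
                                  std::string::size_type from = 0) {
    const std::string tag = "<TITLE>";
    std::string::size_type pos = text.find(tag, from);
    while (pos != std::string::npos) {
        // Accept the match only at the start of the text or after '\n'.
        if (pos == 0 || text[pos - 1] == '\n') {
            return pos;
        }
        pos = text.find(tag, pos + 1);  // mid-line match, keep looking
    }
    return std::string::npos;
}
```

A "<TITLE>" that appears in the middle of a line of data is skipped, which covers the case rainbowsix warns about as long as the data never starts a line with the tag.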

Author Comment

by:CletusTheDwarf
ID: 7987162
It is a rather large file, about 1.6 MB of text.  Should I still just read it all into a large buffer?  If so, how do I do that?  I'm new to the language so I'm still feeling my way around.  What I've been trying so far is to use the getline of an ifstream.
   while(!infile.getline(temp,100).eof()){
      string line = (string)temp;
      if(line.substr(0,7)=="<TITLE>"){
         string data;
         string key = line.substr(7,line.size());
         cout<<key<<endl;
      }
   }
 

Author Comment

by:CletusTheDwarf
ID: 7987170
Oops, sorry about the duplicate posts.  I hit space three times and it registered as a press of the reply button.  As I was saying: the code above tells me when I find a title and lets me extract it, but I couldn't think of where to go from there.
 
LVL 1

Accepted Solution

by:Intern
Earned 225 total points
ID: 7987521
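Here is another way, using fstream and infile.get():

#include <fstream>
#include <list>
#include <string>
#include <utility>

int main()
{
     std::ifstream infile("yourfile.txt");

     // Each entry is a (title, data) pair.
     std::list<std::pair<std::string, std::string> > entries;
     char ch;

     while (infile.get(ch))   // get() fails at end of file, which ends the loop
     {
          if (ch == '<')
          {
               // Collect a possible tag: at most 7 characters, stopping
               // early at '>', at a newline, or just before another '<'.
               std::string word(1, ch);
               while (word.size() < 7 && ch != '>' && ch != '\n'
                      && infile.peek() != '<' && infile.get(ch))
               {
                    word += ch;
               }
               if (word == "<TITLE>")
               {
                    // The rest of this line is the title (the key).
                    std::string line;
                    std::getline(infile, line);
                    entries.push_back(std::make_pair(line, std::string()));
               }
               else if (!entries.empty())
               {
                    entries.back().second += word;   // ordinary data after all
               }
          }
          else if (!entries.empty())
          {
               entries.back().second += ch;   // data for the most recent title
          }
     }
     return 0;
}


This reads the file character by character.  When it sees a '<' it collects up to seven characters and compares them with "<TITLE>".  If they match, the rest of that line becomes the key of a new (title, data) pair; otherwise the characters are treated as ordinary data.  Everything that is not part of a title line is appended to the data of the most recent pair, so each pair's data runs up to the next <TITLE>.  Text before the first <TITLE> is dropped.

entries is the list that holds the pairs.

line is the title that comes on the same line as <TITLE>.


I don't mean to step on toes here.  The other solutions work too; this is just a different way to approach the problem.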