Solved

Need to read in file, looking for key terms

Posted on 2003-02-19
Last Modified: 2010-04-01
I need to write a program that will read in an input file.  The file is structured:
<TITLE>Then here is the title

Info that goes with the title



<TITLE>Next title...
and so on.  

I need to be able to look through the file, find a <TITLE> and store the rest of the info on that line as the key, and store everything between that and the next title key as the data.  I then make a pair of these things and put them into a list.  What is the most efficient way to do this?
Question by:CletusTheDwarf
8 Comments
 
LVL 7

Expert Comment

by:burcarpat
ID: 7985578
If the file is not very big, read everything into a std::vector<std::string> line by line and use std::string::find() to locate the <TITLE> lines.
 
LVL 12

Expert Comment

by:Salte
ID: 7986376
Map the file into memory or read the whole file into one big buffer. Then you can consider the whole file to be one long array of char. Unfortunately it is not 0-terminated, but that doesn't matter.

char * beg points to the beginning.
char * end = beg + size_of_file; points to the end.


Note that the STL treats char pointers like these as iterators, so you can call the standard algorithms on the two pointers directly. No need for an actual std::string here. Be aware that std::find() only looks for a single element (one char); to locate a sequence such as "<TITLE>" you want std::search(), which takes a second iterator pair for the pattern:

const char tag[] = "<TITLE>";
char * p = std::search(beg, end, tag, tag + 7);

If p == end then there was no <TITLE> in the file.

If p < end then p points to <TITLE> inside the buffer.

Process that <TITLE> and then you can search for the next:

p = std::search(p + 7, end, tag, tag + 7);

p + 7 points past the <TITLE> we just found.

One way to do it is to have two pointers:

char * p points to the previous <TITLE>.
char * q points to the next.

So:

p = beg; // assumes the file starts with <TITLE>
q = std::search(p + 7, end, tag, tag + 7);

p..q will then be the current block, starting from the previous <TITLE> and running to the next (or the end).

If q < end then:

p = q; q = std::search(p + 7, end, tag, tag + 7); to find the next.

Stop when q == end.

Another way to do this is to use stream iterators instead of raw pointers. Note that istream_iterator<char> will not work directly here: it is only an input iterator, so p + 7 does not compile and std::search() (which needs forward iterators) cannot be used on it, and it also skips whitespace by default. What you can do is use istreambuf_iterator<char> to copy the file into a std::string and search that:

ifstream file("thefile.lst");

istreambuf_iterator<char> first(file);
istreambuf_iterator<char> last; // default constructor makes an 'end' object.

string text(first, last);
string::size_type p = text.find("<TITLE>");
string::size_type q = text.find("<TITLE>", p + 7);
Again, if q == string::npos there was no more <TITLE> in the file, and the section starting at p runs to the end of the text and is the last such section in the file.

If q != string::npos then q is the next position where you have a new <TITLE> block and [p,q) is one section.

Note that p + 7 assumes that p points to "<TITLE>", so check that p != string::npos before searching for q.

Alf
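The two-pointer walk described above can be sketched as follows, assuming the file contents are already in memory as a range [beg, end). This version uses std::search for the multi-character marker, and the function name split_blocks is hypothetical.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Split [beg, end) into blocks that each start at a "<TITLE>" marker.
// Anything before the first marker is ignored.
std::vector<std::string> split_blocks(const char* beg, const char* end) {
    static const char tag[] = "<TITLE>";
    const char* tag_end = tag + 7;                  // exclude the trailing '\0'
    std::vector<std::string> blocks;
    const char* p = std::search(beg, end, tag, tag_end);        // first marker
    while (p != end) {
        // p points at a full marker, so p + 7 never runs past end.
        const char* q = std::search(p + 7, end, tag, tag_end);  // next marker or end
        blocks.push_back(std::string(p, q));
        p = q;
    }
    return blocks;
}
```

Each returned block still begins with the literal <TITLE>; splitting it into key and data is a matter of cutting at the first newline.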
 
LVL 1

Expert Comment

by:rainbowsix
ID: 7986585
Hi Cletus... Salte's comment will work, just don't forget to take care that the literal "<TITLE>" does not appear in the actual title or the text.
OR
Search for "\n<TITLE>" except for the first one.
cheers!
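A small sketch of the "\n<TITLE>" idea, assuming the whole file is held in a std::string. The function name next_title is hypothetical. It only accepts a marker at the very start of the text or right after a newline, so a "<TITLE>" in the middle of a line is not mistaken for a new section.

```cpp
#include <cstddef>
#include <string>

// Position of the next "<TITLE>" that begins a line, or std::string::npos.
// Call it with from == 0 first, then with (previous hit + 7) to continue.
std::size_t next_title(const std::string& text, std::size_t from) {
    // The first line has no newline in front of it, so check it separately.
    if (from == 0 && text.compare(0, 7, "<TITLE>") == 0) return 0;
    std::size_t nl = text.find("\n<TITLE>", from);
    return nl == std::string::npos ? nl : nl + 1;   // skip the newline itself
}
```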

 

Author Comment

by:CletusTheDwarf
ID: 7987162
It is a rather large file, about 1.6 MB of text.  Should I still just read it all into a large buffer?  If so, how do I do that?  I'm new to the language so I'm still feeling my way around.  What I've been trying so far is to use the getline of an ifstream.
while(!infile.getline(temp,100).eof()){
   string line = (string)temp;
   if(line.substr(0,7)=="<TITLE>"){
      string data;
      string key = line.substr(7,line.size());
      cout<<key<<endl;
   }
}
 

Author Comment

by:CletusTheDwarf
ID: 7987170
Oops, sorry about that.  I hit space 3 times and it took that as a press of the reply button.  As I was saying, that tells me when I find the title and lets me get it, but I couldn't think of where to go from there.
 
LVL 1

Accepted Solution

by:
Intern earned 225 total points
ID: 7987521
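Here is another way, using ifstream and infile.get():

#include <fstream>
#include <string>
using namespace std;

int main()
{
     ifstream infile("yourfile.txt");

     char ch;
     string word, title, data;   // current tag, current title, current data
     int titles_seen = 0;

     while(infile.get(ch))       // stops cleanly at end of file
     {
          if(ch == '<')
          {
                // Collect the tag, including both angle brackets.
                // A '<' with no matching '>' will swallow the rest of the file.
                word = "<";
                while(infile.get(ch))
                {
                       word += ch;
                       if(ch == '>') break;
                }
                if(word == "<TITLE>")
                {
                      // A new section starts here: store the previous
                      // (title, data) pair before it is overwritten.
                      title.clear();
                      data.clear();
                      while(infile.get(ch) && ch != '\n')
                            title += ch;   // rest of the line is the title
                      titles_seen++;
                }
                else
                {
                      data += word;        // some other <...> text, keep it as data
                }
          }
          else if(titles_seen > 0)
          {
                data += ch;                // ordinary character inside a section
          }
     }
     // The last (title, data) pair is complete here.
     return 0;
}


This does not have the right data structures for you to hold what you want, but it should work.  It reads in character by character until it finds <TITLE>.  When that is found, the counter is incremented, the rest of that line is read as the title, and every character after it is collected as data until the next <TITLE> shows up.  As long as the counter is 0 you are not in a data segment yet, so that text is skipped.

data should be replaced by whatever you will store the data in.

title is the text that comes on the same line as <TITLE>.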


I don't mean to step on toes here; the other solutions are correct, this is just a different way to approach the problem.