Solved

Search algorithms for searching a body of text

Posted on 2003-12-06
Last Modified: 2010-04-17
I'm looking for an efficient search algorithm that searches a body of text (under 3,000 words) for keywords.

What would be the best algorithm to employ?

The body of text may be an XML document (in which case I'd like to be able to search the XML elements, e.g. search for 'Alan Turing' in the element tag author).

Thanks in advance for any pointers.
Question by:mellowmoose

Author Comment

by:mellowmoose
ID: 9888361
To clarify a few points.

The document will be one I've never seen before.

I'd prefer to be able to search the XML before I parse it (if the keywords don't match, the XML document will not be parsed).

I'll be using DOM parsing.

Thanks
LVL 45

Accepted Solution

by:
sunnycoder earned 125 total points
ID: 9888431
Hi mellowmoose,

Since you are only searching under 3,000 words, even a linear search should be fast enough on today's hardware.
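As a rough sketch of the linear approach: do a plain substring scan over the raw text first, and only hand the document to the DOM parser if a keyword turns up. Python is just an assumption here, and the helper names `contains_any` and `element_contains` and the sample document are made up for illustration.

```python
from xml.dom.minidom import parseString


def contains_any(text, keywords):
    """Linear scan: return the keywords that occur in text (case-insensitive)."""
    lowered = text.lower()
    return [kw for kw in keywords if kw.lower() in lowered]


def element_contains(xml_text, tag, keyword):
    """Check whether any <tag> element's direct text contains keyword.

    The raw text is scanned first, so the DOM parse is skipped
    entirely when the keyword does not appear anywhere.
    """
    if not contains_any(xml_text, [keyword]):
        return False  # cheap pre-check failed, no need to parse
    doc = parseString(xml_text)
    for node in doc.getElementsByTagName(tag):
        # Join only the direct text children of the element.
        content = "".join(
            child.data
            for child in node.childNodes
            if child.nodeType == child.TEXT_NODE
        )
        if keyword.lower() in content.lower():
            return True
    return False


# Hypothetical sample document, for illustration only.
sample = (
    "<book><title>On Computable Numbers</title>"
    "<author>Alan Turing</author></book>"
)
```

With the sample above, `element_contains(sample, "author", "Alan Turing")` is true and `element_contains(sample, "title", "Alan Turing")` is false. One caveat: the raw-text pre-check can miss a keyword that is written with entity references in the XML (for example `&amp;`), so it is only safe for keywords made of plain characters.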

However, if you do wish to take the trouble of implementing an efficient algorithm, I would recommend the Aho-Corasick algorithm, which finds all occurrences of a whole set of keywords in a single pass over the text.

you can see a demo here
http://www-sr.informatik.uni-tuebingen.de/~buehler/AC/AC1.html

details of the algorithm
http://courses.cs.vt.edu/~algnbio/algnbio_2001/lectures/AhoCorasick.html
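To give a feel for it, here is a minimal Aho-Corasick sketch in Python (my own illustration, not code from the pages linked above; the function names `build_automaton` and `search` are assumptions). It builds a trie of the keywords, adds failure links with a breadth-first pass, then scans the text once.

```python
from collections import deque


def build_automaton(keywords):
    """Build trie transitions, failure links and output lists."""
    goto = [{}]   # goto[state][char] -> next state
    fail = [0]    # failure link for each state
    out = [[]]    # keywords recognised on reaching each state
    for kw in keywords:
        if not kw:
            continue  # ignore empty keywords
        state = 0
        for ch in kw:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].append(kw)
    # Breadth-first pass; depth-1 states keep the root as failure link.
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            # Inherit keywords that end at the failure state.
            out[nxt] = out[nxt] + out[fail[nxt]]
    return goto, fail, out


def search(text, automaton):
    """Return (start_index, keyword) for every match in text."""
    goto, fail, out = automaton
    state = 0
    matches = []
    for i, ch in enumerate(text):
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for kw in out[state]:
            matches.append((i - len(kw) + 1, kw))
    return matches
```

For example, `search("ushers", build_automaton(["he", "she", "his", "hers"]))` reports "she" at index 1, and "he" and "hers" at index 2. Matching is case-sensitive as written; lower-casing both the keywords and the text first is one simple way around that.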

Cheers!
Sunny:o)