Solved

search algorithms for searching a body of text

Posted on 2003-12-06
4
282 Views
Last Modified: 2010-04-17
Im looking for an efficent search algorithm which searches a body of text (under 3,000 words) looking for keywords.

What would be the best algorithm to employ.

The body of text may be a XML document (in which case id like to be able to search the XML elements eg search for 'Alan Turing' in the element tag author)

Thanks in adavance for any pointers
0
Comment
Question by:mellowmoose
4 Comments
 

Author Comment

by:mellowmoose
ID: 9888361
To clarify a few points.

The document will be one ive never seen before.

I'd prefer to be able to search the XML b4 I parse it (if the keyowrds dont match then XML doc will not be (parsed)

Ill be using DOM parsing.

Thanks
0
 
LVL 45

Accepted Solution

by:
sunnycoder earned 125 total points
ID: 9888431
Hi mellowmoose,

since you are searching only under 300 words, even linear search should be fast enough on todays hardware

however, if you do wish take the troble of implementing efficient algorithms, I would recommend Aho Corasick algorithm

you can see a demo here
http://www-sr.informatik.uni-tuebingen.de/~buehler/AC/AC1.html

details of the algorithm
http://courses.cs.vt.edu/~algnbio/algnbio_2001/lectures/AhoCorasick.html

Cheers!
Sunny:o)
0

Featured Post

IT, Stop Being Called Into Every Meeting

Highfive is so simple that setting up every meeting room takes just minutes and every employee will be able to start or join a call from any room with ease. Never be called into a meeting just to get it started again. This is how video conferencing should work!

Join & Write a Comment

Suggested Solutions

This article is meant to give a basic understanding of how to use R Sweave as a way to merge LaTeX and R code seamlessly into one presentable document.
A short article about a problem I had getting the GPS LocationListener working.
In this fourth video of the Xpdf series, we discuss and demonstrate the PDFinfo utility, which retrieves the contents of a PDF's Info Dictionary, as well as some other information, including the page count. We show how to isolate the page count in a…
In this fifth video of the Xpdf series, we discuss and demonstrate the PDFdetach utility, which is able to list and, more importantly, extract attachments that are embedded in PDF files. It does this via a command line interface, making it suitable …

760 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

17 Experts available now in Live!

Get 1:1 Help Now