Is there any reason you can't import it into MS Word and do the checking there?
Hi Experts
Could you point me to links on spell-checking algorithms?
Suppose I have a 1 MB text file and a dictionary of around 50,000 words (which is also a text file). What is the best and most robust algorithm for detecting spelling errors?
Thanks
Sri
I have already tried "the simplest thing" you suggested, but it takes a long time to complete, at least 2 minutes! I want an algorithm that does it in a few seconds.
Perhaps something like storing the dictionary in a tree structure and then checking each unique word in the content file against that tree. If there is a better approach, please let me know.
BTW, I want to learn the algorithm itself; that's why I'm asking.
Thanks
Srikanth
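The tree structure suggested above is usually called a trie (prefix tree): each edge is one character, and a flag marks the nodes where a dictionary word ends. Below is a minimal Python sketch, assuming lowercase input; the class names `TrieNode` and `Trie` are hypothetical. Lookup cost grows with the length of the word being checked rather than with the size of the dictionary.

```python
class TrieNode:
    """One node per character; `is_word` marks the end of a dictionary word."""
    __slots__ = ("children", "is_word")

    def __init__(self):
        self.children = {}
        self.is_word = False


class Trie:
    """Hypothetical prefix tree holding the dictionary words."""

    def __init__(self, words=()):
        self.root = TrieNode()
        for w in words:
            self.insert(w)

    def insert(self, word):
        # Create missing nodes along the path, then flag the last one.
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.is_word = True

    def contains(self, word):
        # Follow one edge per character: O(len(word)) steps,
        # independent of how many words the dictionary holds.
        node = self.root
        for ch in word:
            node = node.children.get(ch)
            if node is None:
                return False
        return node.is_word


if __name__ == "__main__":
    trie = Trie(["cat", "car", "cart"])
    print(trie.contains("car"))    # True
    print(trie.contains("ca"))     # False: a prefix, not a stored word
    print(trie.contains("carts"))  # False
```

A trie also makes prefix-based suggestions easy to add later, at the cost of more memory per word than a flat hash table.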
If you are interested in the algorithms for this sort of work, you should look at the classic text Sorting and Searching (Volume 3 of The Art of Computer Programming) by Donald Knuth. You can get a used copy for less than $20 at Amazon.
http://www.amazon.com/Art-
The sort of things you need to consider:
A 1MB text file has on the order of 200K words, so you have to do at least 200K operations.
The first strategy I suggested (sorting and eliminating duplicates) is computationally intensive, but there are existing utilities for those functions.
It does help to reduce the number of searches (lookups) that you need to do: a typical 200K-word text file probably uses a vocabulary of fewer than 5K distinct words, and searches can be very computationally intensive.
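To illustrate how deduplication shrinks the work, here is a minimal Python sketch that splits text into lowercase words and keeps only the distinct ones. The tokenizing regular expression (runs of ASCII letters, with internal apostrophes allowed) is an assumption; a real checker would need rules for hyphens, digits, accented letters, and so on.

```python
import re

# Assumption: a "word" is a run of ASCII letters, optionally joined by
# internal apostrophes (so "don't" stays one word).
WORD_RE = re.compile(r"[A-Za-z]+(?:'[A-Za-z]+)*")


def tokenize(text):
    """Return every word in the text, lowercased, in order of appearance."""
    return [w.lower() for w in WORD_RE.findall(text)]


def vocabulary(text):
    """Return the distinct words, sorted; each needs only one dictionary lookup."""
    return sorted(set(tokenize(text)))


if __name__ == "__main__":
    sample = "The cat sat on the mat. The mat was flat."
    words = tokenize(sample)
    vocab = vocabulary(sample)
    print(len(words), len(vocab))  # 10 7
    print(vocab)  # ['cat', 'flat', 'mat', 'on', 'sat', 'the', 'was']
```

Even in this tiny sample, 10 words collapse to 7 lookups; on a long document the ratio is far larger.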
Preprocessing the dictionary is a good idea. The structure you would probably want to use is a hash table. Look at Knuth or Wikipedia for the details:
http://en.wikipedia.org/wi
Hashing can reduce the cost of a lookup to a single operation on average. Spell checking the entire document then reduces to a linear pass through the file, performing a single hash lookup for each word.
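Python's built-in `set` is a hash table, so it can stand in for the preprocessed dictionary. A minimal sketch, assuming the dictionary file holds one word per line; the function names `load_dictionary` and `misspelled` and the file name `dictionary.txt` are hypothetical. Each word costs one average-case O(1) membership test, so the whole check is a single linear pass over the text.

```python
import re

# Same assumed tokenizing rule: ASCII letters with internal apostrophes.
WORD_RE = re.compile(r"[A-Za-z]+(?:'[A-Za-z]+)*")


def load_dictionary(lines):
    """Build a hash set from an iterable of lines, one word per line (assumed format)."""
    return {line.strip().lower() for line in lines if line.strip()}


def misspelled(text, dictionary):
    """Return the distinct words of `text` not found in `dictionary`, in first-seen order."""
    unknown = {}  # dict used as an insertion-ordered set
    for word in WORD_RE.findall(text):
        w = word.lower()
        # Average-case O(1) hash lookup per word.
        if w not in dictionary:
            unknown.setdefault(w, None)
    return list(unknown)


if __name__ == "__main__":
    # With a real file one might write (hypothetical name):
    #   with open("dictionary.txt") as f:
    #       words = load_dictionary(f)
    words = load_dictionary(["the\n", "cat\n", "sat\n", "on\n", "mat\n"])
    print(misspelled("The cat szt on the mta.", words))  # ['szt', 'mta']
```

Building the set is a one-time cost proportional to the dictionary size; after that, lookups do not slow down as the dictionary grows.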
by: d-glitch, posted on 2008-02-20 at 11:30:09 (ID: 20940962)
The simplest thing to do is break the text file into a list of words, alphabetize them, eliminate the duplicates, and then do a comparison with your sorted dictionary.
Throw away words that are in the dictionary. Keep the ones that aren't for human review.
All of these functions are typically available in the UNIX shell.
You can probably do it in one line with pipes.
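The sort-and-compare step can be written as a merge of two sorted, duplicate-free lists, which is roughly what a pipeline ending in `sort -u` and `comm -23` would do. A minimal Python sketch; the function name `not_in_dictionary` is hypothetical. Once both lists are sorted, one simultaneous pass finds the words missing from the dictionary, with no per-word search.

```python
def not_in_dictionary(sorted_words, sorted_dict):
    """Merge-style comparison of two sorted, duplicate-free lists.

    Returns the entries of `sorted_words` that do not appear in `sorted_dict`.
    Runs in O(len(sorted_words) + len(sorted_dict)) time.
    """
    missing = []
    i = j = 0
    while i < len(sorted_words):
        if j >= len(sorted_dict) or sorted_words[i] < sorted_dict[j]:
            # Dictionary has passed this word (or ran out): it is unknown.
            missing.append(sorted_words[i])
            i += 1
        elif sorted_words[i] > sorted_dict[j]:
            # Dictionary entry not used in the text; skip it.
            j += 1
        else:
            # Match: the word is in the dictionary; advance both lists.
            i += 1
            j += 1
    return missing


if __name__ == "__main__":
    text_words = sorted(set("the cat szt on the mta".split()))
    dictionary = sorted(["cat", "mat", "on", "sat", "the"])
    print(not_in_dictionary(text_words, dictionary))  # ['mta', 'szt']
```

Sorting the words dominates the cost here; the comparison itself is linear in the combined length of the two lists.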