How to get a summary from a few short titles?

wsyy asked

I have a few short titles of specific products. For example, they are:

1. "Apple iphone 4S"
2. "Apple iphone 4"
3. "iphone 4S"

I would like to have a summary such as "iphone 4" or "iphone 4s". How can I do so?

No one has answered because the solution is either obvious, harder, or very hard.

The obvious answer is to make a list of every string you consider equivalent to "iphone 4"; if the incoming string is on that list, it goes in that bucket.
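
As a rough illustration of the lookup-list idea, here is a minimal Java sketch. The class name TitleLookup, the KNOWN_TITLES table, and the bucket names are all made up for this example; you would maintain your own list of spellings.

```java
import java.util.Locale;
import java.util.Map;

final class TitleLookup {
    // Hypothetical hand-maintained table: each known spelling (lowercased)
    // maps to the bucket it belongs in.
    static final Map<String, String> KNOWN_TITLES = Map.of(
        "apple iphone 4s", "iphone 4s",
        "iphone 4s", "iphone 4s",
        "apple iphone 4", "iphone 4",
        "iphone 4", "iphone 4"
    );

    // Returns the bucket for a title, or null when the spelling is not in the table.
    static String bucketFor(String title) {
        String key = title.trim().toLowerCase(Locale.ROOT);
        return KNOWN_TITLES.get(key);
    }
}
```

Anything not already in the table comes back as null, which is the main weakness of this approach: every new spelling has to be added by hand.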

You can enhance it slightly with a smarter check, such as: if (instr.toLowerCase().indexOf("iphone") > -1 && instr.toLowerCase().indexOf("4") > -1) iPhone4Group += 1;
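
Spelled out as runnable Java, the keyword check might look something like the sketch below. The class and method names (KeywordBucket, isIphone4, countIphone4) are just placeholders for this example.

```java
import java.util.List;
import java.util.Locale;

final class KeywordBucket {
    // True when the title contains both "iphone" and "4", ignoring case.
    static boolean isIphone4(String title) {
        String lower = title.toLowerCase(Locale.ROOT);
        return lower.contains("iphone") && lower.contains("4");
    }

    // Counts how many of the given titles fall into the iPhone 4 group.
    static int countIphone4(List<String> titles) {
        int count = 0;
        for (String title : titles) {
            if (isIphone4(title)) {
                count += 1;
            }
        }
        return count;
    }
}
```

Note that this is a plain substring test, so a title like "iphone 14" would land in the same group, and "iphone 4" and "iphone 4S" are not told apart.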

The harder way is to use text indexing, like Lucene.

The very hard way is to use natural language processing to read the incoming strings and make a judgement as to whether they are in the iPhone 4 group.


I am looking for the harder way.

How can Lucene solve this problem?

You run the Lucene indexer over the input data, then execute a Lucene query to retrieve every piece of input data that contains both "iphone" and "4". The results might not be terribly different in this case, but the mechanism would be more general for all your cases. With a suitable analyzer it would also handle more of the vagaries of English, such as plurals and tense. I believe there might be a way to do Soundex searching as well.
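
To show the index-then-query mechanism without pulling in the library, here is a toy inverted index in plain Java. To be clear, this is not Lucene's API: TinyIndex, tokenize, add, and searchAll are names invented for this sketch, and a real Lucene analyzer may split and normalize text differently.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

final class TinyIndex {
    private final List<String> docs = new ArrayList<>();
    // term -> ids of the documents that contain that term
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    // Lowercases the text and splits it on anything that is not a letter or digit.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Indexing step: record which terms occur in the new document.
    void add(String doc) {
        int id = docs.size();
        docs.add(doc);
        for (String token : tokenize(doc)) {
            postings.computeIfAbsent(token, k -> new TreeSet<>()).add(id);
        }
    }

    // Query step: return the documents that contain ALL of the given terms,
    // in the order they were added.
    List<String> searchAll(String... terms) {
        Set<Integer> hits = null;
        for (String term : terms) {
            Set<Integer> ids = postings.getOrDefault(term.toLowerCase(Locale.ROOT), Set.of());
            if (hits == null) {
                hits = new TreeSet<>(ids);
            } else {
                hits.retainAll(ids);
            }
        }
        List<String> result = new ArrayList<>();
        if (hits != null) {
            for (int id : hits) {
                result.add(docs.get(id));
            }
        }
        return result;
    }
}
```

One thing this sketch makes visible: because it matches whole tokens rather than substrings, "4" and "4s" are different terms here. With the three example titles indexed, searchAll("iphone", "4s") finds the two "4S" titles and searchAll("iphone", "4") finds only "Apple iphone 4", which may actually be closer to the grouping you want.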