How to get a summary from a few short titles?

wsyy asked

I have a few short titles of specific products. For example, they are:

1. "Apple iphone 4S"
2. "Apple iphone 4"
3. "iphone 4S"

I would like to have a summary such as "iphone 4" or "iphone 4s". How can I do so?

Top Expert 2007

No one has answered because the solution is either obvious, harder, or very hard.

The obvious answer is to make a list of all the possible strings which you think are the same as "iphone 4" and if the incoming string is one of those, then it goes in that bucket.
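As a minimal sketch of that lookup approach, assuming a hand-maintained table (the class name `TitleBuckets`, the map contents, and the summary strings are all made up for illustration):

```java
import java.util.Locale;
import java.util.Map;

class TitleBuckets {
    // Hypothetical hand-maintained list: every known spelling maps to its summary.
    static final Map<String, String> KNOWN = Map.of(
        "apple iphone 4s", "iphone 4s",
        "iphone 4s", "iphone 4s",
        "apple iphone 4", "iphone 4");

    // Returns the summary for a title, or null if the title is not in the list.
    static String bucketFor(String title) {
        return KNOWN.get(title.trim().toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        System.out.println(bucketFor("Apple iphone 4S"));
        System.out.println(bucketFor("iphone 4"));
    }
}
```

The drawback is the one implied above: any spelling you did not anticipate falls through and returns null.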

You can enhance it slightly by doing a little smarter checking, like if (instr.toLowerCase().indexOf("iphone") > -1 && instr.toLowerCase().indexOf("4") > -1) iPhone4Group += 1;
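Written out as compilable Java, that check might look like the sketch below. The class and method names are hypothetical, and `contains` is used as an equivalent of the `indexOf(...) > -1` test.

```java
import java.util.Locale;

class SubstringCheck {
    // True if the title mentions both "iphone" and "4", ignoring case.
    static boolean looksLikeIphone4(String title) {
        String s = title.toLowerCase(Locale.ROOT);
        return s.contains("iphone") && s.contains("4");
    }

    // Counts how many titles fall into the iPhone 4 group.
    static int countIphone4(String[] titles) {
        int iPhone4Group = 0;
        for (String t : titles) {
            if (looksLikeIphone4(t)) {
                iPhone4Group += 1;
            }
        }
        return iPhone4Group;
    }

    public static void main(String[] args) {
        String[] titles = {"Apple iphone 4S", "Apple iphone 4", "iphone 4S"};
        System.out.println(countIphone4(titles));
    }
}
```

Note that a plain substring test is loose: it also accepts any title that contains "iphone" and a "4" anywhere, such as a model number ending in 4 or a "4 pack" of accessories.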

The harder way is to use text indexing, like Lucene.

The very hard way is to use natural language processing to read the incoming strings and make a judgement as to whether they are in the iPhone 4 group.


I am looking for the harder way.

How can Lucene solve this problem?
Top Expert 2007
You run the Lucene text indexer on the input data and execute a Lucene query to retrieve all pieces of input data which contain both "iphone" and "4". The results might not be terribly different in this case, but the mechanism would be more general for all your cases. And it would handle more of the vagaries of English, such as plurals and tense. I believe there might be a way to do Soundex searching as well.
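To make the index-then-query idea concrete without depending on Lucene itself, here is a toy inverted index in plain Java. It is not Lucene's API: the class `TinyIndex`, its methods, and the tokenization rule (lowercase, split on anything that is not a letter or digit) are assumptions chosen only to illustrate the mechanism.

```java
import java.util.*;

class TinyIndex {
    // term -> ids of the documents that contain it
    private final Map<String, Set<Integer>> postings = new HashMap<>();
    private final List<String> docs = new ArrayList<>();

    // Lowercase the text and split it into alphanumeric terms.
    static List<String> tokenize(String text) {
        List<String> terms = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!t.isEmpty()) {
                terms.add(t);
            }
        }
        return terms;
    }

    // Index one document: record its id under every term it contains.
    void add(String text) {
        int id = docs.size();
        docs.add(text);
        for (String term : tokenize(text)) {
            postings.computeIfAbsent(term, k -> new TreeSet<>()).add(id);
        }
    }

    // AND query: return the documents that contain every given term.
    List<String> searchAll(String... terms) {
        Set<Integer> hits = null;
        for (String term : terms) {
            Set<Integer> ids = postings.getOrDefault(
                term.toLowerCase(Locale.ROOT), Collections.emptySet());
            if (hits == null) {
                hits = new TreeSet<>(ids);
            } else {
                hits.retainAll(ids);
            }
        }
        List<String> out = new ArrayList<>();
        if (hits != null) {
            for (int id : hits) {
                out.add(docs.get(id));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        TinyIndex idx = new TinyIndex();
        idx.add("Apple iphone 4S");
        idx.add("Apple iphone 4");
        idx.add("iphone 4S");
        System.out.println(idx.searchAll("iphone", "4s"));
    }
}
```

With this tokenization, "4" and "4s" are distinct terms, so a query for "iphone" and "4" returns only "Apple iphone 4", while a query for "iphone" alone returns all three titles. How a real indexer splits and normalizes terms depends on the analyzer you configure, so check that before relying on term queries to group variants.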