Solved

Excel function/macro -- extract proper nouns

Posted on 2013-12-04
10
1,472 Views
Last Modified: 2013-12-10
Experts:

Need some help with developing -- if all possible -- a function/macro -- that scans through a single columns (in a spreadsheet) and "detects" all proper nouns.  

Then, in step #2, it would add only the nouns into another column (separated by commas).

For example, I may have the following two (2) sentences in column A:

The agency made the announcement late Tuesday night, hours after a union representative told CNN that the train engineer apparently "was nodding off and caught himself too late" before the accident.

On Monday, one of the nation's most esteemed weeklies, New York magazine, announced that it would soon shift to an every-other-week publishing schedule while beefing up its Web production.

Then, once the function/macro has been applied, I'd like to see the following nouns in column B:
B1: agency, announcement, union, representative, train, engineer, accident
B2: nation, magazine, schedule, production

Note:  In actuality, I have thousands of strings in column A.   Some of them are short... some of them are longer.

Even if I only get the 80% solution, I certainly would be a huge time saver.

Any thoughts on how to accomplish this task?

Thanks,
EEH
0
Comment
Question by:ExpExchHelp
[X]
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
  • 3
  • 3
  • 2
  • +2
10 Comments
 
LVL 4

Expert Comment

by:andrew_man
ID: 39695982
Can post your worksheet here?
0
 
LVL 15

Assisted Solution

by:JimFive
JimFive earned 300 total points
ID: 39696005
This is a hard problem, even you got it wrong in your example:  the word union is an adjective in the first statement, as is train.

Probably the easiest first approximation would be to find all words that follow an article (a, an, the)
You could build up a list of adjectives and look for non-adjective words that follow an adjective.  This would also let you skip adjectives after an article.  But you still have a problem with multifunctional words.

A thought that just popped into my head is:  I wonder if you can access the MS-Word spellchecker/thesaurus to get part of speech for words?
0
 
LVL 4

Expert Comment

by:andrew_man
ID: 39696025
So, we need a table of noun.
0
Technology Partners: We Want Your Opinion!

We value your feedback.

Take our survey and automatically be enter to win anyone of the following:
Yeti Cooler, Amazon eGift Card, and Movie eGift Card!

 
LVL 19

Assisted Solution

by:regmigrant
regmigrant earned 100 total points
ID: 39696026
First you will need a list of all the nouns you want to find - there is no way to programmatically identify what's a noun and what's a verb within VBA.

Alternatively you could work with an external service such as this
http://stackoverflow.com/questions/5981547/finding-the-english-definition-of-a-word and I see that Microsoft provide an online tool that -presumably- would allow access to their thesaurus etc. (I haven't checked) http://office.microsoft.com/Research/query.asmx

Once you've established your data base - or written a function to work with an online tool -  you will need to read each piece of text, chop it into words and check against the list.

It would be relatively easy to do this with either a recursive function or an array but I probably wouldn't start with Excel/VBA if this is a production tool

How good are your VBA skills?

Reg
0
 
LVL 4

Expert Comment

by:andrew_man
ID: 39696042
Dear Reg,

I agreed.  Do you think we can use the WebService() function in the Excel 2013.  But, we must find out the webservice provider.

Andrew
0
 

Author Comment

by:ExpExchHelp
ID: 39696255
All:

Thank you for the comments... I appreciate it.

Unfortunately, I won't be able to post the the worksheet.   The "strings" are based on comments collected during a company survey.    As you can imagine, some of the comments are "sensitive"... posting them online would surely expedite my getting laid off.  ;)

Also, I don't have a ready table of nouns so that it compare it against.    I completely understand this is tricky w/o having a data dictionary to compare it against.

Based on a quick look of the comments, anything in the English language would qualify.   On the other hand, if a general e-dictionary with, e.g., 50,000 nouns would be somewhere posted online, I think it'll catch maybe 75% or more of the content.

EEH
0
 
LVL 45

Accepted Solution

by:
aikimark earned 100 total points
ID: 39696359
Please consider the Stanford Parser.  Although there doesn't appear to be a directly consumable interface for VBA, there are several language interfaces.
http://nlp.stanford.edu/software/lex-parser.shtml

There is at least one syntactic parser on sourceforge.net
0
 
LVL 15

Assisted Solution

by:JimFive
JimFive earned 300 total points
ID: 39696482
I think that you need to consider your end point.  Since this is for survey comments you presumably want to be able to aggregate what the respondents commented upon.  It might make sense to just count word frequencies and eliminate the useless words from that list.  If you really wanted the meaningful words for each row you could useyour meaningful wwords list as your starting point.
0
 

Author Comment

by:ExpExchHelp
ID: 39696574
aikimark -- thx, I'll look into that one.

JimFive -- excellent point.  Frequency count may facilitate findings the important "nuggets".  Any recommendations how to accomplish that in Excel or Access?

EEH
0
 
LVL 15

Assisted Solution

by:JimFive
JimFive earned 300 total points
ID: 39696629
In Access I would do this.
Create a table called Words with two columns Word varchar(50), Occurrence longint

Then, in VBA
code up a routine to go through each row of your recordset.  Strip out punctuation.  Split into an array on the spaces.  Loop through the array and if the word is new add it to the Words table with an occurrence of 1, if the word is already in the words table then add 1 to its occurrence.
It would look something like this
Sub CountWords()
Dim rs as Recordset

Set rs = openrecordset(Comments)

While not rs.eof()
    comment = rs("Comment")
    comment = replace(replace(replace(comment,".",""),",",""),"!","")
    commentwords = split(comment, " ")
    for i = lbound(commentwords) to ubound(commentwords)
        in = dlookup("occurrence","Words","Word = """ & commentwords(i) & """")
        if isnull(in) then
            addword
        else
            incrementoccurrence
        end if
    next
    rs.movenext
Wend

Open in new window

0

Featured Post

Industry Leaders: We Want Your Opinion!

We value your feedback.

Take our survey and automatically be enter to win anyone of the following:
Yeti Cooler, Amazon eGift Card, and Movie eGift Card!

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Freeze panes is an option within all variants of Excel to enable parts of a sheet to remain stationary when the cursor is in another part of the sheet. This is a very useful feature which is overlooked or under used.
This article describes a serious pitfall that can happen when deleting shapes using VBA.
This Micro Tutorial will demonstrate the scrolling table in Microsoft Excel using the INDEX function.
Finds all prime numbers in a range requested and places them in a public primes() array. I've demostrated a template size of 30 (2 * 3 * 5) but larger templates can be built such 210  (2 * 3 * 5 * 7) or 2310  (2 * 3 * 5 * 7 * 11). The larger templa‚Ķ

688 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question