Improve company productivity with a Business Account.Sign Up

x
?
Solved

Excel function/macro -- extract proper nouns

Posted on 2013-12-04
10
Medium Priority
?
1,899 Views
Last Modified: 2013-12-10
Experts:

Need some help with developing -- if all possible -- a function/macro -- that scans through a single columns (in a spreadsheet) and "detects" all proper nouns.  

Then, in step #2, it would add only the nouns into another column (separated by commas).

For example, I may have the following two (2) sentences in column A:

The agency made the announcement late Tuesday night, hours after a union representative told CNN that the train engineer apparently "was nodding off and caught himself too late" before the accident.

On Monday, one of the nation's most esteemed weeklies, New York magazine, announced that it would soon shift to an every-other-week publishing schedule while beefing up its Web production.

Then, once the function/macro has been applied, I'd like to see the following nouns in column B:
B1: agency, announcement, union, representative, train, engineer, accident
B2: nation, magazine, schedule, production

Note:  In actuality, I have thousands of strings in column A.   Some of them are short... some of them are longer.

Even if I only get the 80% solution, I certainly would be a huge time saver.

Any thoughts on how to accomplish this task?

Thanks,
EEH
0
Comment
Question by:ExpExchHelp
  • 3
  • 3
  • 2
  • +2
10 Comments
 
LVL 4

Expert Comment

by:andrew_man
ID: 39695982
Can post your worksheet here?
0
 
LVL 15

Assisted Solution

by:JimFive
JimFive earned 900 total points
ID: 39696005
This is a hard problem, even you got it wrong in your example:  the word union is an adjective in the first statement, as is train.

Probably the easiest first approximation would be to find all words that follow an article (a, an, the)
You could build up a list of adjectives and look for non-adjective words that follow an adjective.  This would also let you skip adjectives after an article.  But you still have a problem with multifunctional words.

A thought that just popped into my head is:  I wonder if you can access the MS-Word spellchecker/thesaurus to get part of speech for words?
0
 
LVL 4

Expert Comment

by:andrew_man
ID: 39696025
So, we need a table of noun.
0
Keep up with what's happening at Experts Exchange!

Sign up to receive Decoded, a new monthly digest with product updates, feature release info, continuing education opportunities, and more.

 
LVL 19

Assisted Solution

by:regmigrant
regmigrant earned 300 total points
ID: 39696026
First you will need a list of all the nouns you want to find - there is no way to programmatically identify what's a noun and what's a verb within VBA.

Alternatively you could work with an external service such as this
http://stackoverflow.com/questions/5981547/finding-the-english-definition-of-a-word and I see that Microsoft provide an online tool that -presumably- would allow access to their thesaurus etc. (I haven't checked) http://office.microsoft.com/Research/query.asmx

Once you've established your data base - or written a function to work with an online tool -  you will need to read each piece of text, chop it into words and check against the list.

It would be relatively easy to do this with either a recursive function or an array but I probably wouldn't start with Excel/VBA if this is a production tool

How good are your VBA skills?

Reg
0
 
LVL 4

Expert Comment

by:andrew_man
ID: 39696042
Dear Reg,

I agreed.  Do you think we can use the WebService() function in the Excel 2013.  But, we must find out the webservice provider.

Andrew
0
 

Author Comment

by:ExpExchHelp
ID: 39696255
All:

Thank you for the comments... I appreciate it.

Unfortunately, I won't be able to post the the worksheet.   The "strings" are based on comments collected during a company survey.    As you can imagine, some of the comments are "sensitive"... posting them online would surely expedite my getting laid off.  ;)

Also, I don't have a ready table of nouns so that it compare it against.    I completely understand this is tricky w/o having a data dictionary to compare it against.

Based on a quick look of the comments, anything in the English language would qualify.   On the other hand, if a general e-dictionary with, e.g., 50,000 nouns would be somewhere posted online, I think it'll catch maybe 75% or more of the content.

EEH
0
 
LVL 47

Accepted Solution

by:
aikimark earned 300 total points
ID: 39696359
Please consider the Stanford Parser.  Although there doesn't appear to be a directly consumable interface for VBA, there are several language interfaces.
http://nlp.stanford.edu/software/lex-parser.shtml

There is at least one syntactic parser on sourceforge.net
0
 
LVL 15

Assisted Solution

by:JimFive
JimFive earned 900 total points
ID: 39696482
I think that you need to consider your end point.  Since this is for survey comments you presumably want to be able to aggregate what the respondents commented upon.  It might make sense to just count word frequencies and eliminate the useless words from that list.  If you really wanted the meaningful words for each row you could useyour meaningful wwords list as your starting point.
0
 

Author Comment

by:ExpExchHelp
ID: 39696574
aikimark -- thx, I'll look into that one.

JimFive -- excellent point.  Frequency count may facilitate findings the important "nuggets".  Any recommendations how to accomplish that in Excel or Access?

EEH
0
 
LVL 15

Assisted Solution

by:JimFive
JimFive earned 900 total points
ID: 39696629
In Access I would do this.
Create a table called Words with two columns Word varchar(50), Occurrence longint

Then, in VBA
code up a routine to go through each row of your recordset.  Strip out punctuation.  Split into an array on the spaces.  Loop through the array and if the word is new add it to the Words table with an occurrence of 1, if the word is already in the words table then add 1 to its occurrence.
It would look something like this
Sub CountWords()
Dim rs as Recordset

Set rs = openrecordset(Comments)

While not rs.eof()
    comment = rs("Comment")
    comment = replace(replace(replace(comment,".",""),",",""),"!","")
    commentwords = split(comment, " ")
    for i = lbound(commentwords) to ubound(commentwords)
        in = dlookup("occurrence","Words","Word = """ & commentwords(i) & """")
        if isnull(in) then
            addword
        else
            incrementoccurrence
        end if
    next
    rs.movenext
Wend

Open in new window

0

Featured Post

Free Tool: IP Lookup

Get more info about an IP address or domain name, such as organization, abuse contacts and geolocation.

One of a set of tools we are providing to everyone as a way of saying thank you for being a part of the community.

Question has a verified solution.

Are you are experiencing a similar issue? Get a personalized answer when you ask a related question.

Have a better answer? Share it in a comment.

Join & Write a Comment

After seeing numerous questions for Dynamic Data Validation I notice that most have used Visual Basic to solve the problem. This suggestion is purely formula based and can be used in multiple rows.
Windows Explorer lets you open cabinet (cab) files like any other folder. In VBA you can easily handle normal files and folders, but opening and indeed creating cabinet files takes a lot more - and that's you'll find here.
This Micro Tutorial will demonstrate how to use a scrolling table in Microsoft Excel using the INDEX function.
This Micro Tutorial will demonstrate how to create pivot charts out of a data set. I also added a drop-down menu which allows to choose from different categories in the data set and the chart will automatically update.

606 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question