Solved

Java regular expression for company name

Posted on 2008-10-10
Last Modified: 2010-08-05
I am trying to create a regular expression for use in Java that takes a company name and removes extraneous information and leaves me with the core elements of the name. So, for example it would:

- Exclude words like "co", "inc", "company", "incorporated" from the end of the string
- Ignore words like "the", "a", etc. at the beginning of the string
- Ignore punctuation marks such as commas or periods

Any help is appreciated as I am relatively new to regular expressions in Java.

-- Matt
Question by:aaron_karp
4 Comments
 
LVL 16

Accepted Solution

by:
Bryan Butler earned 500 total points
ID: 22691035
Do you mean searching a string for any words not in (co, inc, company, incorporated, the, a, ",", ".", <etc.>)?

I might be able to give you some code if you can provide more details, such as whether it's reading user input, a file, or a database.  If you just want the regular expressions for each of these, that would be:

[^\.,;<etc>]  where the "^" at the start of the brackets negates the character class, so the regex matches any single character except the ones listed after it; the "\." is a literal period, escaped because an unescaped period is a 'special character' outside brackets (inside brackets the backslash is optional), while the ";" "," and other punctuation are fine alone; so this would match everything that isn't punctuation.

Then you would have to negate each of the words, which would return everything except the "bad words".

^Co
^the
^inc
^<etc>
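
A caveat on the word patterns above: outside square brackets, "^" anchors the match to the start of the input; it negates only when it is the first character inside [...]. So ^the matches "the" at the very beginning of a string and nothing else. A minimal Java sketch of the difference follows; the class name AnchorDemo and the sample strings are made up for illustration.

```java
import java.util.regex.Pattern;

class AnchorDemo {
    // Outside a character class, ^ means "start of input", not "everything except".
    static final Pattern STARTS_WITH_THE = Pattern.compile("^the");

    // True only when the input begins with the literal text "the".
    static boolean startsWithThe(String input) {
        return STARTS_WITH_THE.matcher(input).find();
    }

    // Removes a leading "the" plus the whitespace after it; the rest is untouched.
    static String dropLeadingThe(String input) {
        return input.replaceAll("^the\\s+", "");
    }

    public static void main(String[] args) {
        System.out.println(startsWithThe("the acme company"));  // true
        System.out.println(startsWithThe("acme the company"));  // false
        System.out.println(dropLeadingThe("the acme company")); // acme company
    }
}
```

Removing a set of words from anywhere in a string would more likely call for an alternation such as (?:co|inc) than for a list of anchored patterns.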

Here is a good tutorial on it: http://java.sun.com/docs/books/tutorial/essential/regex/index.html

Is this what you were looking for?
 

Author Comment

by:aaron_karp
ID: 22694291
I think this gets me most of the way there. Here is a little bit more about my problem and what I am trying to accomplish with this regex:

I am writing a function that will compare two strings in a language similar to Java (Salesforce.com's Apex, which uses the Java regex processing functions). There will be two strings, String1 and String2, each holding a company name. I want to process each with a regex, using a replaceAll function to strip out the "bad words", and then compare the two results to see if they're equal.
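
The approach described here (strip the noise words from both names, then compare the results) might look roughly like the following in plain Java. This is a sketch under assumptions: the class and method names are hypothetical, the word lists are only the examples from the question, and it has not been tried in Apex, whose string and pattern methods may differ.

```java
import java.util.Locale;

class CompanyNameMatcher {
    // Reduce a company name to its core words so two spellings can be compared.
    static String normalize(String name) {
        String s = name.toLowerCase(Locale.ROOT);   // lowercasing makes the later matches case-insensitive
        s = s.replaceAll("[.,;]", "");              // drop punctuation
        s = s.trim().replaceAll("\\s+", " ");       // collapse runs of whitespace
        s = s.replaceAll("^(?:the|a)\\s+", "");     // leading article, whole word only
        // One or more trailing suffix words, each preceded by whitespace.
        s = s.replaceAll("(?:\\s+(?:co|inc|company|incorporated))+$", "");
        return s;
    }

    // Two names are treated as the same company when their normalized forms are equal.
    static boolean sameCompany(String first, String second) {
        return normalize(first).equals(normalize(second));
    }

    public static void main(String[] args) {
        System.out.println(normalize("The Acme Company, Inc."));               // acme
        System.out.println(sameCompany("Acme Co.", "the ACME incorporated"));  // true
    }
}
```

Because each suffix word must be preceded by whitespace, a name that consists only of such a word (for example "Company Co") keeps its first word instead of collapsing to an empty string.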

So it looks like I'd want to use something like:

[^\.,;^co^the^inc]
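
One thing to check before relying on this pattern: everything inside [...] is treated as a set of single characters, and a "^" after the first position is a literal caret. As far as Java's Pattern is concerned, the class above means "any one character other than . , ; ^ or the letters c, o, t, h, e, i, n", not "anything except these words". A small sketch to illustrate (the class name CharClassDemo and the sample input are made up):

```java
class CharClassDemo {
    // The pattern from the comment above, written as a Java string literal.
    static final String PATTERN = "[^\\.,;^co^the^inc]";

    // Deletes every character that the negated class matches.
    static String removeMatches(String input) {
        return input.replaceAll(PATTERN, "");
    }

    public static void main(String[] args) {
        // Only characters from the listed set survive, whatever word they were in.
        System.out.println(removeMatches("Acme, Inc.")); // ce,nc.
    }
}
```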

And, what would I need to do to make those matches on "co" etc. case insensitive?

Thanks!
-- Matt
 
LVL 16

Expert Comment

by:Bryan Butler
ID: 22703355
The flag for case-insensitive matching is: (?i)

So I believe you would put this in front of the pattern, like so: (?i)[^\.,;^co^the^inc]
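
For what it's worth, (?i) is an embedded flag that Java's Pattern accepts at the start of an expression, and it has the same effect as passing Pattern.CASE_INSENSITIVE to Pattern.compile. A short sketch follows, using a word alternation rather than the character class so the effect on whole words is visible; the class name, word list, and sample inputs are illustrative assumptions.

```java
import java.util.regex.Pattern;

class CaseInsensitiveDemo {
    // (?i) turns on case-insensitive matching for the rest of the pattern.
    static final String INLINE_FLAG = "(?i)\\b(?:co|inc)\\b";

    // The same expression with the flag passed to Pattern.compile instead.
    static final Pattern COMPILED_FLAG =
            Pattern.compile("\\b(?:co|inc)\\b", Pattern.CASE_INSENSITIVE);

    static String stripInline(String input) {
        return input.replaceAll(INLINE_FLAG, "").trim();
    }

    static String stripCompiled(String input) {
        return COMPILED_FLAG.matcher(input).replaceAll("").trim();
    }

    public static void main(String[] args) {
        System.out.println(stripInline("Acme INC"));  // Acme
        System.out.println(stripCompiled("Acme Co")); // Acme
    }
}
```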

But the "Apex" thing might be a problem.  Cheers,
BB
 

Author Closing Comment

by:aaron_karp
ID: 31505106
Thanks - this was a big help!
