Solved

Java regular expression for company name

Posted on 2008-10-10
Last Modified: 2010-08-05
I am trying to create a regular expression for use in Java that takes a company name and removes extraneous information and leaves me with the core elements of the name. So, for example it would:

- Exclude words like "co", "inc", "company", "incorporated" from the end of the string
- Ignore words like "the", "a" etc. at the beginning of the string
- Ignore punctuation marks such as commas or periods

Any help is appreciated as I am relatively new to regular expressions in Java.

-- Matt
Question by:aaron_karp
 
LVL 16

Accepted Solution

by:
Bryan Butler earned 2000 total points
ID: 22691035
Does this mean to search a string for any words not in (co, inc, company, incorporated, the, a, ",", ".", <etc.>) ?

I might be able to give you some code if you can provide more details, such as whether it's reading user input, a file, or a database. If you just want the regular expressions for each of these, that would be:

[^\.,;<etc>]  where the "^" causes the regex to match everything except the things following it; the "\." is the period, which is escaped because it is a 'special character', while the ";", "," and other punctuation are fine alone; so this would match everything that isn't punctuation.
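As a rough sketch of how that bracketed set behaves in java.util.regex (the class and method names here are made up for illustration): a character class matches one character at a time, and inside the brackets the period is already literal, so the backslash is optional there. Stripping punctuation can be done either by replacing the listed characters or by keeping what the negated class matches.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, not part of the original answer.
class PunctuationDemo {
    // Removes periods, commas and semicolons. Inside [...] the period is
    // literal, so no backslash is needed there.
    static String stripPunctuation(String name) {
        return name.replaceAll("[.,;]", "");
    }

    // Same result from the other direction: keep only the runs of
    // characters that the negated class [^.,;] matches.
    static String keepNonPunctuation(String name) {
        StringBuilder kept = new StringBuilder();
        Matcher m = Pattern.compile("[^.,;]+").matcher(name);
        while (m.find()) {
            kept.append(m.group());
        }
        return kept.toString();
    }

    public static void main(String[] args) {
        System.out.println(stripPunctuation("Acme, Inc."));   // Acme Inc
        System.out.println(keepNonPunctuation("Acme, Inc.")); // Acme Inc
    }
}
```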

Then you would have to negate each of the words, which would return everything except the "bad words".

^Co
^the
^inc
^<etc>
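One caveat worth checking against the tutorial: in java.util.regex a "^" outside square brackets anchors the match to the start of the input rather than negating anything, so "^the" matches "the" at the beginning of the string. A possible way to get rid of the unwanted words is therefore to match them and replace them with an empty string. The sketch below assumes the word lists from the question and lowercase input; the class and method names are invented for the example.

```java
// Hypothetical demo class; word lists taken from the question.
class WordStripDemo {
    // ^ anchors to the start of the input; \b keeps "the" from matching "theta".
    static final String LEADING = "^(the|a)\\b\\s*";
    // $ anchors to the end of the input; \b keeps "co" from matching "taco".
    static final String TRAILING = "\\s*\\b(co|inc|company|incorporated)$";

    // Case-sensitive on purpose: only lowercase words are removed here.
    static String stripWords(String name) {
        return name.replaceAll(LEADING, "").replaceAll(TRAILING, "");
    }

    public static void main(String[] args) {
        System.out.println(stripWords("the acme company")); // acme
        System.out.println(stripWords("a theta co"));       // theta
        System.out.println(stripWords("taco"));             // taco
    }
}
```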

Here is a good tutorial on it: http://java.sun.com/docs/books/tutorial/essential/regex/index.html

Is this what you were looking for?
 

Author Comment

by:aaron_karp
ID: 22694291
I think this gets me most of the way there. Here is a little bit more about my problem and what I am trying to accomplish with this regex:

I am writing a function that will compare two strings in a language similar to Java (Salesforce.com's Apex, which uses the Java regex processing functions). So there will be String1 and String2, which will be equal to company names, and I want to process them with a regex, use a replaceAll function to strip out the "bad words", and then compare the two results to see if they're equal.
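In plain Java, the strip-then-compare flow described above could be sketched roughly as follows. The class and method names are hypothetical, the word lists come from the original question, and whether Apex's replaceAll behaves identically is an assumption that has not been verified here.

```java
// Hypothetical helper; a sketch of the normalize-and-compare idea in plain Java.
class CompanyNameMatcher {
    static String normalize(String name) {
        return name
            .replaceAll("[.,;]", "")                                      // drop punctuation
            .trim()
            .replaceAll("(?i)^(the|a)\\b\\s*", "")                        // leading articles
            .replaceAll("(?i)\\s*\\b(co|inc|company|incorporated)$", "")  // one trailing suffix
            .trim();
    }

    // Two names are treated as the same company when their cores match,
    // ignoring case.
    static boolean sameCompany(String first, String second) {
        return normalize(first).equalsIgnoreCase(normalize(second));
    }

    public static void main(String[] args) {
        System.out.println(sameCompany("The Acme Company", "ACME, Inc.")); // true
        System.out.println(sameCompany("Acme Co.", "Apex Co"));            // false
    }
}
```

Note that this version removes only one trailing suffix, so "Acme Company, Inc." would normalize to "Acme Company"; repeating the suffix group would be one way to handle stacked suffixes.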

So it looks like I'd want to use something like:

[^\.,;^co^the^inc]

And, what would I need to do to make those matches on "co" etc. case insensitive?

Thanks!
-- Matt
 
LVL 16

Expert Comment

by:Bryan Butler
ID: 22703355
The flag for performing a case-insensitive match is: (?i)

So I believe you would put this in front of the pattern, such as: (?i)[^\.,;^co^the^inc]
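A small sketch of the flag in plain Java, with hypothetical names: (?i) at the front makes the rest of the pattern ignore case, and Pattern.CASE_INSENSITIVE is the equivalent compile-time flag. Keep in mind that a bracketed set still matches single characters even with (?i) in front, so the example uses a word alternation for the suffixes instead.

```java
import java.util.regex.Pattern;

// Hypothetical demo class showing two ways to ignore case.
class CaseFlagDemo {
    static final String SUFFIX = "\\s*\\b(co|inc|company|incorporated)$";

    // Precompiled with the CASE_INSENSITIVE flag instead of the embedded (?i).
    static final Pattern SUFFIX_CI = Pattern.compile(SUFFIX, Pattern.CASE_INSENSITIVE);

    // Embedded flag: (?i) applies to everything that follows it in the pattern.
    static String stripSuffixEmbedded(String name) {
        return name.replaceAll("(?i)" + SUFFIX, "");
    }

    static String stripSuffixCompiled(String name) {
        return SUFFIX_CI.matcher(name).replaceAll("");
    }

    public static void main(String[] args) {
        System.out.println(stripSuffixEmbedded("Acme INC"));      // Acme
        System.out.println(stripSuffixCompiled("Acme Inc"));      // Acme
        System.out.println("Acme INC".replaceAll(SUFFIX, ""));    // Acme INC (no flag, no match)
    }
}
```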

But the "Apex" thing might be a problem.  Cheers,
BB
 

Author Closing Comment

by:aaron_karp
ID: 31505106
Thanks - this was a big help!
