Solved

Java regular expression for company name

Posted on 2008-10-10
Last Modified: 2010-08-05
I am trying to create a regular expression for use in Java that takes a company name and removes extraneous information and leaves me with the core elements of the name. So, for example it would:

- Exclude words like "co", "inc", "company", "incorporated" from the end of the string
- Ignore words like "the", "a", etc. at the beginning of the string
- Ignore punctuation marks such as commas or periods

Any help is appreciated as I am relatively new to regular expressions in Java.

-- Matt
Question by:aaron_karp
LVL 16

Accepted Solution

by:
Bryan Butler earned 2000 total points
ID: 22691035
Does this mean to search a string for any words not in (co, inc, company, incorporated, the, a, ",", ".", <etc.>) ?

I might be able to give you some code if you can provide more details, such as if it's reading user input, a file, or a database.  If you just want the regular expressions for each of these, that would be:

[^\.,;<etc>]  where the "^" causes the regex to match everything except the characters following it; the "\." is the period, which is escaped because it is a 'special character', while ";", "," and other punctuation are fine on their own. So this would match every character that isn't punctuation.
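
A minimal Java sketch of the punctuation step, assuming only periods, commas and semicolons need to go (the class and method names are made up for illustration). It shows the negated class collecting the characters to keep, and the simpler equivalent of deleting the punctuation with replaceAll. Inside brackets the period is already literal, so the backslash is optional there.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; names are illustrative, not from the thread.
class PunctuationDemo {
    // Collect every character matched by the negated class [^.,;].
    static String keepNonPunctuation(String input) {
        StringBuilder kept = new StringBuilder();
        Matcher m = Pattern.compile("[^.,;]").matcher(input);
        while (m.find()) {
            kept.append(m.group());
        }
        return kept.toString();
    }

    // Same result, written as "delete the punctuation characters".
    static String stripPunctuation(String input) {
        return input.replaceAll("[.,;]", "");
    }

    public static void main(String[] args) {
        System.out.println(keepNonPunctuation("Acme, Inc.")); // prints "Acme Inc"
        System.out.println(stripPunctuation("Acme, Inc."));   // prints "Acme Inc"
    }
}
```

For a compare-after-cleanup use case, the replaceAll form is probably the more convenient of the two.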

Then you would have to negate each of the words, which would return everything except the "bad words".

^Co
^the
^inc
^<etc>
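
One thing to watch with the patterns above: outside of square brackets, "^" in java.util.regex anchors the match to the start of the input; it does not negate the word. A rough sketch of one way to drop the words, assuming the goal is to remove a leading article and a trailing suffix (the word lists and names here are placeholders, and matching is case-sensitive in this version):

```java
// Hypothetical demo class; the word lists are placeholders for illustration.
class AffixDemo {
    // A leading article: anchored at the start with ^, followed by whitespace.
    static final String LEADING = "^(the|an|a)\\s+";
    // A trailing suffix: preceded by whitespace, anchored at the end with $.
    static final String TRAILING = "\\s+(co|inc|company|incorporated)$";

    // Remove the matched words by replacing them with the empty string.
    static String stripAffixes(String name) {
        return name.replaceAll(LEADING, "").replaceAll(TRAILING, "");
    }

    public static void main(String[] args) {
        System.out.println(stripAffixes("the acme company")); // prints "acme"
        System.out.println(stripAffixes("theater co"));       // prints "theater"
    }
}
```

Requiring whitespace next to each word keeps names like "theater" from losing their first three letters.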

Here is a good tutorial on it: http://java.sun.com/docs/books/tutorial/essential/regex/index.html

Is this what you were looking for?

Author Comment

by:aaron_karp
ID: 22694291
I think this gets me most of the way there. Here is a little bit more about my problem and what I am trying to accomplish with this regex:

I am writing a function that will compare two strings in a language similar to Java (Salesforce.com's Apex, which uses the Java regex processing functions). So there will be String1 and String2, each holding a company name, and I want to process them with a regex, use a replaceAll function to strip out the "bad words", and then compare the two results to see if they're equal.

So it looks like I'd want to use something like:

[^\.,;^co^the^inc]

And, what would I need to do to make those matches on "co" etc. case insensitive?

Thanks!
-- Matt
LVL 16

Expert Comment

by:Bryan Butler
ID: 22703355
The flag for case-insensitive matching is: (?i)

So I believe you would put this in front of the pattern, such as: (?i)[^\.,;^co^the^inc]
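
Keep in mind that everything inside one pair of square brackets is a single character class, so it matches one character at a time (here, any character other than . , ; ^ c o t h e i n) and not whole words. A hedged sketch of the strip-then-compare idea in plain Java, with (?i) in front of word-based patterns; the class name, method names and word lists are assumptions for illustration, and this has not been tried in Apex:

```java
import java.util.regex.Pattern;

// Hypothetical helper; names and word lists are illustrative assumptions.
class CompanyNameMatcher {
    // (?i) at the front makes the whole pattern case-insensitive.
    private static final Pattern LEADING =
            Pattern.compile("(?i)^(the|an|a)\\s+");
    // The outer group with + removes several trailing suffixes in a row.
    private static final Pattern TRAILING =
            Pattern.compile("(?i)(\\s+(co|inc|company|incorporated))+$");
    private static final Pattern PUNCTUATION = Pattern.compile("[.,;]");

    // Strip punctuation first, then the leading article and trailing suffixes.
    static String normalize(String name) {
        String s = PUNCTUATION.matcher(name).replaceAll("").trim();
        s = LEADING.matcher(s).replaceAll("");
        s = TRAILING.matcher(s).replaceAll("");
        return s.trim();
    }

    // Two names are treated as equal if their cleaned-up forms match, ignoring case.
    static boolean sameCompany(String a, String b) {
        return normalize(a).equalsIgnoreCase(normalize(b));
    }

    public static void main(String[] args) {
        System.out.println(normalize("The Acme Company, Inc."));               // prints "Acme"
        System.out.println(sameCompany("The Acme Co.", "ACME, Incorporated")); // prints "true"
    }
}
```

Punctuation is removed before the suffix check so that "Inc." and "Inc" are handled the same way.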

But the "Apex" thing might be a problem.  Cheers,
BB

Author Closing Comment

by:aaron_karp
ID: 31505106
Thanks - this was a big help!