Solved

Java regular expression for company name

Posted on 2008-10-10
4
944 Views
Last Modified: 2010-08-05
I am trying to create a regular expression for use in Java that takes a company name and removes extraneous information and leaves me with the core elements of the name. So, for example it would:

- Exclude words like "co", "inc" "company", "incorporated" from the end of the string
- Ignore words like "the", "a" etc. from the beginning of the string
- It should ignore punctuation marks such as commas or periods

Any help is appreciated as I am relatively new to regular expressions in Java.

-- Matt
0
Comment
Question by:aaron_karp
  • 2
  • 2
4 Comments
 
LVL 16

Accepted Solution

by:
Bryan Butler earned 500 total points
ID: 22691035
Does this mean to search a string for any words not in (co, inc, company, incorporated, the, a, ",", ".", <etc.>) ?

I might be able to give you some code if you can provide more details, such as if it's reading user input, a file, or a database.  If you just want the regular expressions for each of these, that would be:

[^\.,;<etc>]  where the "^" cause the regex to match everthing except the things following it; the "\." is the period which has to be excaped as it is a 'special character', but the ";" "," and other punctuation are fine alone;  so this would return everything that isn't punctuation.  

Then you would have to negate each of the words, which would return everthing except the "bad words".  

^Co
^the
^inc
^<etc>

Here is a good tutorial on it: http://java.sun.com/docs/books/tutorial/essential/regex/index.html

Is this was you were looking for?
0
 

Author Comment

by:aaron_karp
ID: 22694291
I think this gets me most of the way there. Here is a little bit more about my problem and what I am trying to accomplish with this regex:

I am writing a function that will compare two strings in a language similar to Java (Salesforce.com's Apex which uses the Java regex processing functions.) So there will be String1 and String2 which will be equal to company names and I want to process them with a regex, use a replaceAll function to strip out the "bad words", and then compare the two results to see if they're equal.

So it looks like I'd want to use something like:

[^\.,;^co^the^inc]

And, what would I need to do to make those matches on "co" etc. case insensitive?

Thanks!
-- Matt
0
 
LVL 16

Expert Comment

by:Bryan Butler
ID: 22703355
The flag for performing a case-insensitive is: (?i)

So I believe you would put this in front of it such as: (?i)[^\.,;^co^the^inc]

But the "Apex" thing might be a problem.  Cheers,
BB
0
 

Author Closing Comment

by:aaron_karp
ID: 31505106
Thanks - this was a big help!
0

Featured Post

Networking for the Cloud Era

Join Microsoft and Riverbed for a discussion and demonstration of enhancements to SteelConnect:
-One-click orchestration and cloud connectivity in Azure environments
-Tight integration of SD-WAN and WAN optimization capabilities
-Scalability and resiliency equal to a data center

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Suggested Solutions

Title # Comments Views Activity
servlet doXXX methods 3 61
Configure a Bean in an XML file 4 42
null output 3 35
Crystal Reports Licensing Questions 4 33
For beginner Java programmers or at least those new to the Eclipse IDE, the following tutorial will show some (four) ways in which you can import your Java projects to your Eclipse workbench. Introduction While learning Java can be done with…
Basic understanding on "OO- Object Orientation" is needed for designing a logical solution to solve a problem. Basic OOAD is a prerequisite for a coder to ensure that they follow the basic design of OO. This would help developers to understand the b…
Viewers learn about the third conditional statement “else if” and use it in an example program. Then additional information about conditional statements is provided, covering the topic thoroughly. Viewers learn about the third conditional statement …
Viewers learn about the “while” loop and how to utilize it correctly in Java. Additionally, viewers begin exploring how to include conditional statements within a while loop and avoid an endless loop. Define While Loop: Basic Example: Explanatio…

792 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question