How do I use REreplace to filter out abbreviations?

Posted on 2009-07-13
Last Modified: 2013-12-24
I am using Verity and need to expand the "stop words" list to filter out common terms used in company names, such as company, incorporated, corporation, etc.

Not only do I need to filter out those terms, but also their abbreviations, such as com, inc, inc., corp, etc.

I initially used replaceList, which works well, but it hiccups on abbreviations because it also replaces them inside longer words. For example, filtering "inc" works, but it also changes "Lincoln" to "L oln", and filtering "com" changes "Communications" to "munications".

I believe the solution is to pass the result from replaceList to a REReplace filter, but I am not sure how to write the regular expression. I want to filter out the abbreviations both with and without a trailing period. Below is the code I have so far. I've shortened the list of terms I am replacing, since it is quite long.
<cfset search_term = lcase(url.searchTerm)>
<cfset search_term_cleaned = replaceList(search_term, "associates,assoc,bank,companies,company,com,corp,holdings,incorporated,industries,trust,corporation"," , , , , , , , , , , , , ,")>
<cfset search_term_final = REreplace(search_term_cleaned, "REGEX here","")>


Question by: futr_vision

Expert Comment

ID: 24842612
\binc\b matches "inc" only when it is bordered on each side by a \W character ([^A-Za-z0-9_]) or by the start/end of the string, so it leaves the "inc" inside "Lincoln" alone. Maybe that will help you.
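A minimal CFML sketch of that pattern, run against a few made-up sample strings (they are illustrative, not from the question). It assumes ColdFusion 8 or later for the array literal and cfloop array, and passes "ALL" as the fourth argument of REReplace, since the default scope replaces only the first match:

```cfml
<!--- Hypothetical sample inputs, already lower-cased as in the question --->
<cfset samples = ["acme inc", "lincoln trust", "inc services", "incorporated"]>
<cfloop array="#samples#" index="s">
    <!--- \b on both sides means "inc" is removed only as a whole word,
          so "lincoln" and "incorporated" pass through unchanged --->
    <cfoutput>[#s#] -> [#REReplace(s, "\binc\b", "", "ALL")#]<br></cfoutput>
</cfloop>
```

Here "acme inc" becomes "acme " and "inc services" becomes " services", while the other two strings are untouched. Leftover spaces can be collapsed afterwards.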

Author Comment

ID: 24843230
I have a series of these abbreviations I need to filter. Is there a way to include them all in one REReplace statement?

Accepted Solution

ddrudik earned 250 total points
ID: 24843237
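One way to fold every term into a single REReplace call is an alternation group between \b anchors, which can replace the replaceList step entirely. A minimal sketch, assuming a hypothetical literal input in place of url.searchTerm and adding "inc" to the question's shortened list (the question mentions it but the code omits it). Note the "ALL" scope; without it REReplace removes only the first match:

```cfml
<!--- Hypothetical input; in the question this comes from url.searchTerm --->
<cfset search_term = lcase("Lincoln Communications Holdings Inc")>

<!--- All stop words as one alternation; the surrounding \b anchors
      ensure each one is removed only as a whole word --->
<cfset stop_words = "associates|assoc|bank|companies|company|com|corporation|corp|holdings|incorporated|inc|industries|trust">
<cfset search_term_cleaned = REReplace(search_term, "\b(#stop_words#)\b", "", "ALL")>

<!--- Collapse leftover runs of whitespace and trim the ends --->
<cfset search_term_final = Trim(REReplace(search_term_cleaned, "\s+", " ", "ALL"))>

<cfoutput>#search_term_final#</cfoutput>
```

This outputs "lincoln communications": "holdings" and "inc" are dropped, while "com" inside "communications" and "inc" inside "lincoln" survive because no word boundary follows them. Because of the trailing \b, the order of the alternatives does not matter; "corp" cannot match the start of "corporation".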

Author Comment

ID: 24843486
Great! And if there is a period after the abbreviation, as in "inc.", I need to escape the period with "\", correct?

Expert Comment

ID: 24843549
Yes, but note that . is in \W, and \b would allow a \w character following it to match.


would match:
test inc.
test inc.a

but not:
test inc.,

Expert Comment

ID: 24843573
Also, note that \binc\.\b would not match "test inc. something", because "." and " " are both in \W, so there is no word boundary between them.
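One option that avoids this is to put the \b right after the abbreviation and make the escaped period optional afterwards, so the match no longer depends on what follows the period. A minimal sketch with hypothetical sample strings and a shortened, illustrative abbreviation list, assuming ColdFusion 8 or later for the array literal:

```cfml
<!--- Illustrative subset of the abbreviations --->
<cfset abbrevs = "assoc|corp|inc">
<cfset samples = ["test inc. something", "test inc.", "test inc.,", "lincoln corp"]>
<cfloop array="#samples#" index="s">
    <!--- \b follows the word itself; \.? then consumes one trailing period if present --->
    <cfset cleaned = REReplace(s, "\b(#abbrevs#)\b\.?", "", "ALL")>
    <cfoutput>[#s#] -> [#cleaned#]<br></cfoutput>
</cfloop>
```

This gives "test  something", "test ", "test ," and "lincoln ": the abbreviation and its period are removed in each case, other punctuation such as the comma is left alone, and "inc" inside "lincoln" is still untouched.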

Author Comment

ID: 24843717
Looking at this, it is probably not necessary to account for the ".", since "." will not return any results in a search anyway. I'll go with your solution as-is. Thanks!

Expert Comment

ID: 24844411
Thanks for the question and the points.
