Solved

How do I use REreplace to filter out abbreviations?

Posted on 2009-07-13
Last Modified: 2013-12-24
I am using Verity and needed to expand the "stop words" list to filter out common terms used in company names, such as company, incorporated, and corporation.

Not only did I need to filter out those terms, but also their abbreviations, such as com, inc, inc., and corp.

I initially used replaceList, which works well for whole terms but stumbles on abbreviations. For example, using replaceList to filter "inc" works, but it also changes "Lincoln" to "L oln", and filtering "com" changes "Communications" to " munications".

The solution, I believe, lies in passing the result from replaceList to an REReplace filter, but I am not sure how to write the regular expression. I want to filter abbreviations both with and without a trailing period. Below is the code I have so far. I've shortened the list of terms I am replacing since it is quite long.
<cfset search_term = lcase(url.searchTerm)>

<cfset search_term_cleaned = replaceList(search_term, "associates,assoc,bank,companies,company,com,corp,holdings,incorporated,industries,trust,corporation"," , , , , , , , , , , , , ,")>

<cfset search_term_final = REreplace(search_term_cleaned, "REGEX here","")>

Question by:futr_vision
8 Comments
 
LVL 27

Expert Comment

by:ddrudik
ID: 24842612
\binc\b would match "inc" when it is bordered by a \W character [^A-Za-z0-9_] or start/end of a string, maybe that will help you.
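
To illustrate the word-boundary behavior described above, here is a minimal CFML sketch. The sample string and the variable names (sample, naive, bounded) are hypothetical and not part of the original question; the sketch assumes ColdFusion MX or later, where \b is supported.

```cfml
<!--- Hypothetical sample input, lowercased as in the question's code. --->
<cfset sample = "lincoln communications inc">

<!--- Without \b, "inc" is also removed from inside "lincoln". --->
<cfset naive = REReplace(sample, "inc", "", "all")>

<!--- With \b on both sides, only the standalone word "inc" is removed. --->
<cfset bounded = REReplace(sample, "\binc\b", "", "all")>

<cfoutput>
naive: [#naive#]<br>
bounded: [#bounded#]
</cfoutput>
```

The brackets in the output make leftover whitespace visible. The "all" scope argument is needed because REReplace replaces only the first match by default.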
 

Author Comment

by:futr_vision
ID: 24843230
I have a series of these abbreviations that I need to filter. Is there a way to include them all in one REReplace statement?
 
LVL 27

Accepted Solution

by:
ddrudik earned 250 total points
ID: 24843237
\b(one|two|three)\b
or
\b(?:one|two|three)\b
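
As a rough sketch of how the alternation pattern above could be applied to the question's code, the shortened term list can be joined with | and wrapped in \b(?: ... )\b. The variable names stop_terms and stop_pattern and the sample search term are assumptions for illustration, not part of the original thread.

```cfml
<!--- Hypothetical sample input; in the question this comes from lcase(url.searchTerm). --->
<cfset search_term = "lincoln communications corp">

<!--- Shortened term list from the question, joined with | into one alternation. --->
<cfset stop_terms = "associates,assoc,bank,companies,company,com,corp,holdings,incorporated,industries,trust,corporation">
<cfset stop_pattern = "\b(?:" & replace(stop_terms, ",", "|", "all") & ")\b">

<!--- Replace every whole-word match with a space, then collapse runs of whitespace. --->
<cfset search_term_cleaned = REReplace(search_term, stop_pattern, " ", "all")>
<cfset search_term_final = trim(REReplace(search_term_cleaned, "\s+", " ", "all"))>

<cfoutput>#search_term_final#</cfoutput>
```

Because each alternative must be followed by a word boundary, a short term such as "com" or "corp" does not match inside "communications" or "corporation", so this single REReplace call can stand in for the replaceList step rather than run after it.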
 

Author Comment

by:futr_vision
ID: 24843486
Great! And if they use a period after the abbreviation, as in "inc.", I need to escape the period using "\", correct?

 
LVL 27

Expert Comment

by:ddrudik
ID: 24843549
Yes, but note that . is in \W, so the trailing \b only matches when a \w character follows the period.

Given
\binc\.\b

would match:
test inc.a

but not:
test inc.
test inc.,
 
LVL 27

Expert Comment

by:ddrudik
ID: 24843573
Also, note that \binc\.\b would not match "test inc. something" given that "." and " " are both in \W.
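
One possible way around this, sketched here as an assumption rather than as part of the accepted answer, is to keep \b directly after the abbreviation and make the escaped period optional after it. The pattern and the sample strings below are hypothetical.

```cfml
<!--- \b sits right after the abbreviation; the escaped period that follows is optional. --->
<cfset abbrev_pattern = "\b(?:inc|corp)\b\.?">

<!--- Hypothetical samples, separated by | because one of them contains a comma. --->
<cfloop list="test inc.|test inc.,|test inc. something|lincoln" delimiters="|" index="sample">
  <cfoutput>[#REReplace(sample, abbrev_pattern, "", "all")#]<br></cfoutput>
</cfloop>
```

With this ordering the boundary is tested between the last letter and whatever follows it, so the abbreviation is removed whether or not a period comes next, and "lincoln" is left alone.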
 

Author Comment

by:futr_vision
ID: 24843717
Looking at this, it is probably not necessary to account for the "." since a "." will not return any results in a search. I'll go with your solution as-is. Thanks.
 
LVL 27

Expert Comment

by:ddrudik
ID: 24844411
Thanks for the question and the points.

Question has a verified solution.