  • Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 234

How do I use REreplace to filter out abbreviations?

I am using Verity and need to expand the "stop words" list to filter out common terms used in company names, such as: company, incorporated, corporation, etc.

I need to filter out not only those terms but also their abbreviations, such as: com, inc, inc., corp, etc.

I initially used replaceList, which works great but hiccups when filtering abbreviations. For example, using replaceList to filter "inc" works, but it also changes "Lincoln" to "L oln" and "Communications" to "munications".

The solution, I believe, lies in passing the result from replacelist to a REreplace filter but I am not positive how to write the regular expression. I would want to filter any abbreviations with and without a period. Below is the code I have so far. I've shortened the list of terms I am replacing since it is quite long.
<cfset search_term = lcase(url.searchTerm)>
<cfset search_term_cleaned = replaceList(search_term, "associates,assoc,bank,companies,company,com,corp,holdings,incorporated,industries,trust,corporation"," , , , , , , , , , , , , ,")>
<cfset search_term_final = REreplace(search_term_cleaned, "REGEX here","")>

Asked by: futr_vision
1 Solution
 
ddrudik commented:
\binc\b matches "inc" only when it is bordered on each side by a \W character ([^A-Za-z0-9_]) or by the start/end of the string. Maybe that will help you.
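For illustration, a minimal sketch of the word-boundary idea in a single REReplace call. The sample string is made up for this example and is not from the original post; the "all" scope tells REReplace to replace every match rather than only the first.

```cfml
<!--- Hypothetical sample input, not taken from the question --->
<cfset sample = "lincoln inc communications">
<!--- \b on both sides means the "inc" inside "lincoln" is left alone --->
<cfset result = REReplace(sample, "\binc\b", "", "all")>
<cfoutput>#result#</cfoutput>
```

Only the standalone "inc" is removed, so the space on either side of it stays behind; trimming or collapsing whitespace afterwards is a separate step.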
 
futr_vision (author) commented:
I have a series of these abbreviations I need to filter. Is there a way to include them all in one REreplace statement?
 
ddrudik commented:
\b(one|two|three)\b
or, with a non-capturing group:
\b(?:one|two|three)\b
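Applied to the code in the question, the alternation might look something like the sketch below. It assumes the variable names from the original post; the term list is the shortened one from the question with "inc" added for the example, the cfparam default is a made-up sample value, and the whitespace cleanup at the end is an extra step not discussed in the thread.

```cfml
<!--- Made-up default so the snippet runs on its own; normally url.searchTerm comes from the request --->
<cfparam name="url.searchTerm" default="Lincoln Communications Inc">
<cfset search_term = lcase(url.searchTerm)>
<!--- One alternation wrapped in \b...\b so only whole words are matched --->
<cfset stop_pattern = "\b(?:associates|assoc|bank|companies|company|com|corp|corporation|holdings|incorporated|inc|industries|trust)\b">
<!--- Replace every stop word with a space ("all" scope) --->
<cfset search_term_cleaned = REReplace(search_term, stop_pattern, " ", "all")>
<!--- Collapse leftover runs of whitespace and trim the ends --->
<cfset search_term_final = trim(REReplace(search_term_cleaned, "\s+", " ", "all"))>
<cfoutput>#search_term_final#</cfoutput>
```

Because the trailing \b forces a whole-word match, the order of the alternatives should not matter here: "corp" cannot succeed inside "corporation", so the engine falls through to the longer alternative.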
 
futr_vision (author) commented:
Great! And if they use a period after the abbreviation, as in "inc.", I need to escape the period using "\", correct?
 
ddrudik commented:
Yes, but note that . is in \W, so a \b placed after \. only matches when a \w character follows the period.

Given
\binc\.\b

would match:
test inc.a

but not:
test inc.
test inc.,
 
ddrudik commented:
Also, note that \binc\.\b would not match "test inc. something" given that "." and " " are both in \W.
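If the trailing period ever did need to be removed, one possible workaround (a sketch only, not something proposed in the thread) is to put the \b directly after the letters and make the period optional after it. The sample string and the two-term list are invented for the example.

```cfml
<!--- Hypothetical sample input, not taken from the thread --->
<cfset sample = "test inc. something corp">
<!--- \b is tested right after the letters; \.? then consumes a period if one follows --->
<cfset result = REReplace(sample, "\b(?:inc|corp)\b\.?", "", "all")>
<cfoutput>#trim(result)#</cfoutput>
```

Note that this also strips the period in something like "inc.a", which may or may not be what is wanted.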
 
futr_vision (author) commented:
Looking at this, it is probably not necessary to account for the ".", since "." will not return any results in a search. I'll go with your solution as-is. Thanks.
 
ddrudik commented:
Thanks for the question and the points.