Regular expressions and word filtering in ColdFusion - really need some help!!

Posted on 2006-05-11
Last Modified: 2013-12-20
Hi!  I am trying to create a language filter using regular expressions for detecting if any word in a paragraph is what is considered an adult word (at least on our list) - and then flag the paragraph.  I've gotten most of it done, but now I am struggling with one part and was really hoping someone might be able to help.  

What I am trying to do is check for weird characters (up to 2 between each letter) that might be separating an "adult" word and be able to determine that it is in fact a word in the adult list and therefore flag the paragraph as adult.

For example, say one of the adult words is "adult", and a person typed it in as A**D**U^^L%%T. I am trying to write a regular expression that can test for any special characters within the word and see whether, without them, the word fits the adult criteria.

Here is what I have so far, checking for spaces between letters to see if the word exists in the string and checking for any basic variation of the word, I just need help with the other regular expression to complete it.  I'm terrible with regular expressions, so any help or any improvements on what I've got so far would be welcome as well!  

<cffunction name="cleanString" returnType="string" output="false">
   <cfargument name="string" type="string" required="true">
   <cfargument name="badwords" type="string" required="false" default="adult">
   <cfset var word = "">
   <cfset var y = "">
   <cfset var newword = "">
   <cfset var count = 0>

   <cfloop index="word" list="#arguments.badwords#">
      <!--- build a pattern that allows optional whitespace after each letter --->
      <cfset newword = "">
      <cfloop index="y" from="1" to="#len(word)#">
         <cfset newword = newword & mid(word, y, 1) & "\s*">
      </cfloop>

      <!--- flag the exact word, or the word followed by a common suffix --->
      <cfif (reFindNoCase("\b#newword#\b", arguments.string)) or (reFindNoCase("#newword#(ing|ed|er|r|\b)", arguments.string))>
        <cfset count = 1>
      </cfif>
   </cfloop>

   <!--- count is not used yet; the string is returned unchanged --->
   <cfreturn arguments.string>
</cffunction>

Of course, if anyone has another solution that would work better, I am totally open to suggestions! Also, if any part of this doesn't make sense, just let me know and I'll try to explain it more clearly. Thanks ahead of time for your help!
Question by:questhaven
Accepted Solution

by:
wytcom earned 2000 total points
ID: 16661643
Your approach will detect many targeted words, but many other variations are likely to get through.  If you want to improve the efficiency consider adding a secondary check for paragraphs that come through clean.  Use occasional human review to keep a list of specific word versions that have slipped through.  For example, if A-D_V.L*T gets through you could add it to a list of words for a secondary check.
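
As a rough sketch of that secondary check (not a definitive implementation): the function name containsKnownVariant and both argument names are hypothetical, and the list of variants is assumed to be a comma-delimited string maintained by hand from human review, so a variant that itself contains a comma would need a different delimiter.

```cfml
<!--- Hypothetical helper: flags text containing any literal variant collected from human review --->
<cffunction name="containsKnownVariant" returnType="boolean" output="false">
  <cfargument name="text" type="string" required="true">
  <!--- Assumed format: comma-delimited list of exact strings that slipped through, e.g. "A-D_V.L*T" --->
  <cfargument name="variants" type="string" required="true">
  <cfset var variant = "">

  <cfloop index="variant" list="#arguments.variants#">
    <!--- findNoCase is a plain substring search, so characters like . and * need no escaping --->
    <cfif findNoCase(variant, arguments.text)>
      <cfreturn true>
    </cfif>
  </cfloop>

  <cfreturn false>
</cffunction>
```

A paragraph that passes the regular-expression filter would then be run through containsKnownVariant(paragraph, knownVariants) before being treated as clean. A plain substring search is used here because the stored variants are full of regex metacharacters.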

Author Comment

by:questhaven
ID: 16738435
wytcom - That is a really good idea, thanks! I'm going to use several rounds of filtering, but wanted to share what I came up with for the initial round (the words considered adult have been replaced with placeholders).

<cffunction name="cleanString" returnType="boolean" output="false">
  <cfargument name="string" type="string" required="true">
  <cfargument name="isAdult" type="boolean" required="false" default="false">

  <!---no spaces between word and comma--->
  <cfargument name="badwords" type="string" required="false" default="adult,adult2">
  <cfset var word = "">
  <cfset var newword = "">
  <cfset var i = 0>
  <!--- pad once, outside the loop, so a word at the very start or end still has a non-letter neighbour --->
  <cfset var padded = " #arguments.string# ">

  <!--- any value passed in is ignored; the result depends only on the text --->
  <cfset arguments.isAdult = false>

  <cfloop index="word" list="#arguments.badwords#">
    <!--- allow any run of non-letters between the letters of the word --->
    <cfset newword = "">
    <cfloop index="i" from="1" to="#len(word) - 1#">
      <cfset newword = newword & mid(word, i, 1) & "[^a-zA-Z]*">
    </cfloop>
    <cfset newword = newword & right(word, 1)>

    <!--- require a non-letter before the word, then an optional common suffix and a non-letter after it --->
    <cfset newword = "[^a-zA-Z]" & newword & "(ing|ed|er|r|y)?[^a-zA-Z]">

    <cfif reFindNoCase(newword, padded)>
      <cfset arguments.isAdult = true>
      <cfbreak>
    </cfif>
  </cfloop>
  <cfreturn arguments.isAdult>
</cffunction>
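
The function above allows any number of non-letter characters between letters, while the original question asked for at most two. A minimal sketch of that stricter variant follows. The helper name buildSeparatedPattern and its maxSeparators argument are made up for illustration, and the word is assumed to contain only letters and digits, so no regex escaping is done.

```cfml
<!--- Hypothetical helper: builds a pattern allowing a bounded number of non-letters between letters --->
<cffunction name="buildSeparatedPattern" returnType="string" output="false">
  <cfargument name="word" type="string" required="true">
  <!--- Assumed default of 2, taken from the "up to 2 between each letter" requirement --->
  <cfargument name="maxSeparators" type="numeric" required="false" default="2">
  <cfset var pattern = "">
  <cfset var i = 0>

  <cfloop index="i" from="1" to="#len(arguments.word)#">
    <cfset pattern = pattern & mid(arguments.word, i, 1)>
    <!--- add the separator class between letters only, not after the last one --->
    <cfif i LT len(arguments.word)>
      <cfset pattern = pattern & "[^a-zA-Z]{0,#arguments.maxSeparators#}">
    </cfif>
  </cfloop>

  <cfreturn pattern>
</cffunction>

<!--- Usage: A**D**U^^L%%T has exactly two separators in each gap, so this should find a match --->
<cfset pattern = buildSeparatedPattern("adult")>
<cfset found = reFindNoCase(pattern, "A**D**U^^L%%T") GT 0>
```

Bounding the separators reduces false positives where the letters of a listed word happen to appear far apart in otherwise innocent text, at the cost of missing input that pads more heavily.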

Author Comment

by:questhaven
ID: 17052327
Sorry!  I completely forgot to close the question!  Please consider wytcom's answer and my reply a solution.