
Textarea word filter with XML word list

Posted on 2006-06-15
Last Modified: 2008-01-09
Hello,

I'm trying to figure out the best way to solve the following problem. I've got an ASP.NET/VB.NET form with a textarea and an XML document containing about 600 bad words. On submit, I'd like to check whether any words entered into the textarea are listed in the XML document.

The XML document is structured like this:

<wordlist>
  <word>badword1</word>
  <word>badword2</word>
  <word>bad word or phrase</word>
</wordlist>
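One way to read this format into memory is a plain List(Of String) built with System.Xml.XmlDocument. The sketch below is an illustration, not the poster's code: LoadWordList is a hypothetical helper name, and it takes the XML as a string only so it can be tried outside a page. In the page you would call doc.Load(MapPath("xml/wordlist.xml")) instead of doc.LoadXml. Lower-casing every entry is an assumption that suits case-insensitive matching.

```vbnet
Imports System.Collections.Generic
Imports System.Xml

Module WordListLoader

    ' Reads <wordlist><word>...</word></wordlist> and returns each entry
    ' trimmed and lower-cased, skipping empty <word> elements.
    Function LoadWordList(xml As String) As List(Of String)
        Dim doc As New XmlDocument()
        doc.LoadXml(xml)

        Dim words As New List(Of String)()
        For Each node As XmlNode In doc.SelectNodes("/wordlist/word")
            Dim w As String = node.InnerText.Trim().ToLowerInvariant()
            If w.Length > 0 Then words.Add(w)
        Next
        Return words
    End Function

End Module
```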

My first attempt was to load the XML data into a DataSet, loop through it, and use InStr on the textarea value to check for words from the XML list. The offending word is then displayed to the user so they can correct the post. This works somewhat, but it is flawed: if a bad word appears inside a good word, the post is flagged even though it contains no bad word. For example, the good word "how" is flagged because it contains "ho", which is in the bad-word list.

Here's what that code looks like.

    ' Load the bad-word list; each <word> element becomes a row in Tables(0)
    Dim dsFilter As DataSet
    dsFilter = New DataSet()
    dsFilter.ReadXml(MapPath("xml/wordlist.xml"))

    Dim intI As Integer = dsFilter.Tables(0).Rows.Count
    Dim intJ As Integer = 0
    Dim strFlag As String = ""
    Dim strWord As String
    Dim strText As String = txtQuickReply.Text

    ' "intJ < (intI - 1)" skipped the last word in the list
    While intJ < intI

      strWord = CStr(dsFilter.Tables(0).Rows(intJ)(0))

      ' Plain substring tests: these still match bad words embedded in good words
      If InStr(strText, " " & strWord) > 0 Or _
         InStr(strText, strWord & " ") > 0 Or _
         InStr(strText, " " & strWord & " ") > 0 Or _
         InStr(strText, strWord) = 1 Or _
         InStr(strText, " " & strWord & ".") > 0 Or _
         InStr(strText, " " & strWord & "?") > 0 Or _
         InStr(strText, " " & strWord & "!") > 0 Then

        strFlag = strWord

      End If

      intJ = intJ + 1

    End While
   
The only other solution I can think of is to split the textarea content into an array on spaces and then use a nested loop: the outer loop goes through the words in the textarea, and for each of them the inner loop goes through the bad-word list comparing values. However, I wonder if there is a more efficient solution.
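The split idea doesn't need a nested loop if the bad words are stored in a HashSet(Of String): each word of the post then costs one hash lookup instead of a scan over all ~600 entries. A minimal sketch, assuming the set holds lower-case entries and that the separator characters listed are good enough for your posts (both are assumptions); FindBadTokens is a hypothetical name. Because it compares whole tokens, "how" no longer matches "ho", but multi-word entries such as "bad word or phrase" can never match a single token and would need a separate check.

```vbnet
Imports System
Imports System.Collections.Generic
Imports Microsoft.VisualBasic

Module TokenFilter

    ' Characters treated as word separators (an assumption; extend as needed)
    Private ReadOnly Separators As Char() = {" "c, ControlChars.Cr, ControlChars.Lf, ControlChars.Tab, _
                                             "."c, ","c, "!"c, "?"c, ";"c, ":"c, """"c, "("c, ")"c}

    ' Returns each distinct word of the post that is in badWords (lower-case entries),
    ' in the order the words first appear in the post.
    Function FindBadTokens(post As String, badWords As HashSet(Of String)) As List(Of String)
        Dim found As New List(Of String)()
        For Each token As String In post.Split(Separators, StringSplitOptions.RemoveEmptyEntries)
            Dim w As String = token.ToLowerInvariant()
            ' One hash lookup replaces the inner loop over the whole word list
            If badWords.Contains(w) AndAlso Not found.Contains(w) Then found.Add(w)
        Next
        Return found
    End Function

End Module
```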

Does anyone have any ideas? Thank you.

Scott
Question by:sneidig

Expert Comment

by:alainbryden
ID: 16916626
You're on the right track. The key is not to search for the bare word but for " " & word & " ", so that you match the word on its own and not when it is stuck to other letters.

The proper way to do this, though, would be to use regular expressions. Just import System.Text.RegularExpressions. Then you can match your words in the text however you like, and also make sure that any tags you allow in the box (like HTML) are skipped, so that users can't get away with bad words by writing something like Da[b][/b]mn.
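A minimal sketch of the regular-expression approach, assuming the word list has already been read into a collection of strings: all entries are joined into one alternation wrapped in \b word boundaries, so "ho" matches in "Ho!" but not in "how" or "echo". BuildFilter and FindBadWord are hypothetical helper names. Regex.Escape stops characters such as "." in an entry from acting as regex syntax, and the escaped spaces it produces are widened to \s+ so phrases still match across extra whitespace. The tag-splitting trick (Da[b][/b]mn) is not handled here; you would strip allowed tags before matching.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Linq
Imports System.Text.RegularExpressions

Module RegexFilter

    ' Builds one case-insensitive pattern such as
    '   \b(?:ho|badword1|bad\s+word\s+or\s+phrase)\b
    Function BuildFilter(badWords As IEnumerable(Of String)) As Regex
        ' Regex.Escape turns each space into "\ ", which is then widened to \s+
        Dim parts As String() = badWords.Select(Function(w) Regex.Escape(w.Trim()).Replace("\ ", "\s+")).ToArray()
        Dim pattern As String = "\b(?:" & String.Join("|", parts) & ")\b"
        Return New Regex(pattern, RegexOptions.IgnoreCase Or RegexOptions.CultureInvariant)
    End Function

    ' Returns the first bad word or phrase found in the post, or "" if none is found.
    Function FindBadWord(filter As Regex, post As String) As String
        Dim m As Match = filter.Match(post)
        If m.Success Then Return m.Value
        Return ""
    End Function

End Module
```

Building the Regex once, for example when the word list is loaded, and reusing it for every post avoids recompiling a 600-way alternation on each submit.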

Expert Comment

by:alainbryden
ID: 16916724
Here, I've rewritten your code; it should be much easier to manage and to extend:

    Dim dsFilter As DataSet
    Dim I As Integer
    Dim strFlag As String = ""
    Dim Text As String, Word As String

    dsFilter = New DataSet()
    dsFilter.ReadXml(MapPath("xml/wordlist.xml"))

    Text = txtQuickReply.Text

    For I = 0 To dsFilter.Tables(0).Rows.Count - 1

      Word = CStr(dsFilter.Tables(0).Rows(I)(0))

      ' Flag the word only when a space precedes it and a space
      ' or common punctuation follows it
      If InStr(Text, " " & Word & " ") > 0 Or _
         InStr(Text, " " & Word & ".") > 0 Or _
         InStr(Text, " " & Word & "?") > 0 Or _
         InStr(Text, " " & Word & "!") > 0 Then

        strFlag = Word
      End If
    Next



I removed the first two checks from your version (" " & Word and Word & " " on their own) so that words like 'ho'w and spec'ho' (special?) aren't caught by the filter. Really, it's easy to get away with a lot against this kind of filter, but at least this way it's easy to tell what you're checking for, and you can plug in more variations as you see what people try.
 

Author Comment

by:sneidig
ID: 16916876
Thanks for taking a look at this. If no one else has other ideas, I'll probably give you the points tomorrow when I revisit this issue. This works fine except when the bad word is the first word in the textarea, or the last word with no punctuation or space after it. Regular expressions have always been a challenge for me, which is frustrating since most of the time it seems like I can code whatever I want.

I appreciate you taking the time to share your thoughts with me.

Scott

Accepted Solution

by:alainbryden (earned 500 total points)
ID: 16917366
Oh right, that's why you had those checks. In that case you can add another line or two:

    InStr(Text, Word & " ") = 1 Or                                                      'Word is at the beginning
    Text = Word Or Text = Word & "." Or Text = Word & "?" Or Text = Word & "!" Or      'Only word is a bad word (with room for punctuation)
    Text.EndsWith(" " & Word) Or ...                                                    'Word is at the very end (" " & Word & "." etc. already cover end punctuation)
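Another way to cover the first-word, last-word and only-word cases without extra conditions is to normalize the post before searching: turn punctuation into spaces, collapse repeated spaces, and pad both ends with one space, so the single " " & Word & " " test applies everywhere. A minimal sketch; the normalization rule (keep letters, digits and apostrophes) and the names Normalize and FindBadWordPadded are assumptions, not part of the answer above. Multi-word entries work too, because they are normalized the same way.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Text

Module PaddedFilter

    ' Lower-cases the text, replaces anything that is not a letter, digit or
    ' apostrophe with a space, collapses runs of spaces, and pads both ends.
    Function Normalize(s As String) As String
        Dim sb As New StringBuilder(" ")
        For Each c As Char In s.ToLowerInvariant()
            If Char.IsLetterOrDigit(c) OrElse c = "'"c Then
                sb.Append(c)
            ElseIf sb.Chars(sb.Length - 1) <> " "c Then
                sb.Append(" "c)
            End If
        Next
        If sb.Chars(sb.Length - 1) <> " "c Then sb.Append(" "c)
        Return sb.ToString()
    End Function

    ' Returns the first entry of badWords found as a whole word or phrase, or "".
    Function FindBadWordPadded(post As String, badWords As IEnumerable(Of String)) As String
        Dim padded As String = Normalize(post)
        For Each word As String In badWords
            Dim needle As String = Normalize(word)   ' e.g. " bad word or phrase "
            If needle.Trim().Length > 0 AndAlso padded.Contains(needle) Then Return word
        Next
        Return ""
    End Function

End Module
```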
 

Author Comment

by:sneidig
ID: 16922813
Those are some great ideas; thanks for your help.

Scott

Expert Comment

by:alainbryden
ID: 16923333
My pleasure Scott.