sneidig
Textarea word filter with XML word list
Hello,
I'm trying to figure out the best way to solve the following problem. I've got an ASP.NET/VB.NET form with a textarea and an XML document with about 600 bad words. OnSubmit I'd like to check whether any words entered into the textarea are listed in the XML document.
The XML document is structured like this:
<wordlist>
<word>badword1</word>
<word>badword2</word>
<word>bad word or phrase</word>
</wordlist>
My first attempt was to load the XML data into a dataset, loop through the dataset and use InStr on the textarea value to check for words in the XML list. The word in question is then displayed to the user so they can correct the post. This works somewhat, but it is flawed: if a bad word appears inside a good word, the post is flagged even though it contains no bad word. For example, the good word "how" is flagged because it contains "ho", which is in the bad word list.
Here's what that code looks like.
Dim dsFilter as DataSet
dsFilter = New DataSet()
dsFilter.ReadXML(MapPath("xml/wordlist.xml"))
Dim intI as Integer = dsFilter.Tables(0).Rows.Count
Dim intJ as Integer = 0
Dim strFlag as String = ""
While intJ < (intI - 1)
If inStr(txtQuickReply.Text, " " & dsFilter.Tables(0).Rows(intJ)(0)) or _
inStr(txtQuickReply.Text, dsFilter.Tables(0).Rows(intJ)(0) & " ") or _
inStr(txtQuickReply.Text, " " & dsFilter.Tables(0).Rows(intJ)(0) & " ") or _
inStr(txtQuickReply.Text, dsFilter.Tables(0).Rows(intJ)(0)) = 1 or _
inStr(txtQuickReply.Text, " " & dsFilter.Tables(0).Rows(intJ)(0) & ".") or _
inStr(txtQuickReply.Text, " " & dsFilter.Tables(0).Rows(intJ)(0) & "?") or _
inStr(txtQuickReply.Text, " " & dsFilter.Tables(0).Rows(intJ)(0) & "!") Then
strFlag = dsFilter.Tables(0).Rows(intJ)(0)
End If
intJ = intJ + 1
End While
The only other solution I can think of is to split the textarea content into an array by space and then do a nested loop: the outer loop goes through the array of words from the textarea, and the inner loop compares each of those words against the bad word list. However, I wonder if there is a more efficient solution.
Does anyone have any ideas? Thank you.
Scott
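As a rough sketch of the split-and-compare idea described above, the inner loop can be replaced by a set lookup. The names FindListedWord and badWords are hypothetical, the list of separators is an assumption, and the sketch only handles single-word entries, so phrases such as "bad word or phrase" would still need a separate check.

```vb
Imports System
Imports System.Collections.Generic
Imports Microsoft.VisualBasic

Module WordListLookup
    ' Returns the first token of the text that is in badWords, or "" if none is.
    ' Assumption: badWords holds single words only (no multi-word phrases).
    Function FindListedWord(ByVal text As String, ByVal badWords As HashSet(Of String)) As String
        ' Assumed separators: whitespace plus a few common punctuation marks.
        Dim separators() As Char = {" "c, ControlChars.Tab, ControlChars.Cr, ControlChars.Lf, _
                                    "."c, ","c, "?"c, "!"c}
        For Each token As String In text.Split(separators, StringSplitOptions.RemoveEmptyEntries)
            ' One hash lookup per token instead of a loop over all 600 words.
            If badWords.Contains(token) Then
                Return token
            End If
        Next
        Return ""
    End Function

    Sub Main()
        ' Case-insensitive set, so "Ho" and "ho" are treated alike.
        Dim badWords As New HashSet(Of String)(StringComparer.OrdinalIgnoreCase)
        badWords.Add("ho")
        Console.WriteLine(FindListedWord("how are you", badWords) = "")  ' "how" is not flagged
        Console.WriteLine(FindListedWord("Ho there!", badWords))         ' flagged token
    End Sub
End Module
```

Because each token is compared as a whole, "how" no longer matches "ho". HashSet(Of T) requires .NET Framework 3.5 or later; on older versions a Hashtable keyed by the lowercased word would play the same role.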
Here, I redid your code for you. This should be much easier to manage and to extend:
Dim dsFilter as DataSet
Dim I as Integer
Dim strFlag as String, Text As String, Word as String
dsFilter = New DataSet()
dsFilter.ReadXML(MapPath("xml/wordlist.xml"))
Text = txtQuickReply.Text
' Check every row of the word list, including the last one
For I = 0 to dsFilter.Tables(0).Rows.Count - 1
Word = dsFilter.Tables(0).Rows(I)(0)
' Flag the word only when it is preceded by a space and followed by a space or punctuation
If inStr(Text, " " & Word & " ") or _
inStr(Text, " " & Word & ".") or _
inStr(Text, " " & Word & "?") or _
inStr(Text, " " & Word & "!") Then
strFlag = Word
End If
Next
You have to take out the first two checks from your original If statement so that words like "how" (which starts with "ho") and words that end in "ho" aren't caught by the filter. Really, users can get away with a lot with a filter like this, but at least this way it's easy to tell what you're checking for, and you can plug in more variations as you see what they try.
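One way to keep those variations easy to extend is to hold the trailing characters in an array and loop over them. This is only a sketch: the names HasDelimitedWord and Trailers are hypothetical, and the added comma and the case-insensitive comparison are assumptions that go beyond the code above.

```vb
Imports System
Imports Microsoft.VisualBasic

Module DelimiterVariations
    ' Characters allowed directly after a word; extend this list as new variations turn up.
    ' The comma is an assumed extra that is not in the code above.
    Private ReadOnly Trailers() As String = {" ", ".", "?", "!", ","}

    ' True if text contains a space, then word, then one of the trailers.
    Function HasDelimitedWord(ByVal text As String, ByVal word As String) As Boolean
        For Each trailer As String In Trailers
            ' CompareMethod.Text makes the search case-insensitive (an assumption).
            If InStr(text, " " & word & trailer, CompareMethod.Text) > 0 Then
                Return True
            End If
        Next
        Return False
    End Function

    Sub Main()
        Console.WriteLine(HasDelimitedWord("say ho, friend", "ho"))   ' True
        Console.WriteLine(HasDelimitedWord("say how, friend", "ho"))  ' False
    End Sub
End Module
```

Adding a new variation then means adding one string to Trailers instead of another InStr line.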
ASKER
Thanks for taking a look at this. If no one else has other ideas, I'll probably give you the points tomorrow when I revisit this issue. This works fine except when the word is the first word in the textarea, or the last word with no punctuation or space after it. Regular expressions have always been a challenge for me, which is frustrating since much of the time it seems like I can code whatever I want.
I appreciate you taking the time to share your thoughts with me.
Scott
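A possible way to cover the first-word and last-word cases without regular expressions is to pad the text with a space on each side before searching. This is a minimal sketch under that assumption; HasWordWithEdges is a hypothetical name, and line breaks inside the textarea are not handled.

```vb
Imports System
Imports Microsoft.VisualBasic

Module EdgePadding
    ' True if word appears in text preceded by a space (or the start of the text)
    ' and followed by a space, ".", "?", "!" (or the end of the text).
    Function HasWordWithEdges(ByVal text As String, ByVal word As String) As Boolean
        ' Pad both ends so a word at the very start or end still has a space beside it.
        Dim padded As String = " " & text & " "
        Dim trailers() As String = {" ", ".", "?", "!"}
        For Each trailer As String In trailers
            If InStr(padded, " " & word & trailer) > 0 Then
                Return True
            End If
        Next
        Return False
    End Function

    Sub Main()
        Console.WriteLine(HasWordWithEdges("ho", "ho"))   ' True: the only word in the text
        Console.WriteLine(HasWordWithEdges("how", "ho"))  ' False: "ho" is inside "how"
    End Sub
End Module
```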
ASKER
Those are some great ideas. Thanks for your help.
Scott
My pleasure Scott.
The proper way to do this, though, would be to use a regular expression checker. Just import System.Text.RegularExpressions. Then you can go crazy finding your words in the text, and also make sure that any tags you allow in the box (like HTML) are skipped, so that users can't get away with bad words by typing something like Da[b][/b]mn.
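A minimal sketch of what that could look like, assuming the allowed tags use the square-bracket form shown above. The function name FindBadWord and the tag pattern are hypothetical. The \b anchors match at word boundaries, so this relies on each listed word starting and ending with a letter or digit.

```vb
Imports System
Imports System.Text.RegularExpressions

Module RegexWordFilter
    ' Returns the first listed word or phrase found as a whole word, or "" if none is.
    Function FindBadWord(ByVal text As String, ByVal badWords() As String) As String
        ' Assumed tag syntax: [name] or [/name]. Removing the tags first means
        ' "Da[b][/b]mn" is checked as "Damn".
        Dim stripped As String = Regex.Replace(text, "\[/?\w+\]", "")
        For Each word As String In badWords
            ' Regex.Escape keeps any special characters in the list entry literal;
            ' \b on both sides stops "ho" from matching inside "how".
            Dim pattern As String = "\b" & Regex.Escape(word) & "\b"
            If Regex.IsMatch(stripped, pattern, RegexOptions.IgnoreCase) Then
                Return word
            End If
        Next
        Return ""
    End Function

    Sub Main()
        Dim badWords() As String = New String() {"ho", "damn"}
        Console.WriteLine(FindBadWord("how are you", badWords) = "")  ' True: nothing flagged
        Console.WriteLine(FindBadWord("Da[b][/b]mn it", badWords))    ' damn
        Console.WriteLine(FindBadWord("Ho", badWords))                ' ho
    End Sub
End Module
```

Word boundaries also cover the first-word and last-word cases, since \b matches at the start and end of the text. With about 600 entries this runs one regex per entry per submission; if that turns out to be slow, the entries could be joined into a single alternation pattern, which is left out here to keep the sketch short.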