Solved

Swear word filter on input

Posted on 2004-08-18
Last Modified: 2010-08-05
Hi,

I've got a really hard question for someone to solve. On insert of a record into a table, I am using a trigger to check for certain data. One of the things I want to do is check for swear words; if there are any, I want to delete the record.

Does anyone know of a method of setting up perhaps an array of swear words that I can check the input against? One of my concerns is checking a string against potentially 100 to 200 swear words. I think this could put a certain amount of strain on the server, so any suggestions for this too would be very much appreciated.

Thanks in advance guys,

Al
Question by:higgsy
 
LVL 69

Expert Comment

by:ScottPletcher
ID: 11834166
>> Does anyone know of a method of setting up perhaps an array of swear words that i can check the input against? <<

Basically just create a table to hold them.  But this will still add quite a bit of overhead, especially if you have to check fairly long entries.  You could also try full-text indexing the column(s) and using lookups on that to remove offending row(s).
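A minimal sketch of the table-based approach, assuming a hypothetical dbo.Comments(CommentID, Body) table where Body is a varchar column (both names are invented for illustration). The NaughtyWords name is borrowed from a later reply in this thread. It deletes offending rows as the question originally asked, and it scans the whole word list for every inserted row, which is where the overhead comes from:

```sql
-- Assumed schema (not from the original post):
--   dbo.Comments(CommentID int PRIMARY KEY, Body varchar(8000))
CREATE TABLE dbo.NaughtyWords (Word varchar(50) NOT NULL PRIMARY KEY);
GO

CREATE TRIGGER trg_Comments_DeleteNaughty ON dbo.Comments
AFTER INSERT
AS
BEGIN
    SET NOCOUNT ON;

    -- Remove every newly inserted row whose Body contains any listed word.
    -- CHARINDEX is a plain substring search, so this is a "fuzzy" match.
    DELETE c
    FROM dbo.Comments AS c
    JOIN inserted AS i ON i.CommentID = c.CommentID
    WHERE EXISTS (SELECT *
                  FROM dbo.NaughtyWords AS w
                  WHERE CHARINDEX(w.Word, i.Body) > 0);
END
GO
```

Whether the match is case-sensitive depends on the collation of the columns involved.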
 
LVL 15

Accepted Solution

by:
jdlambert1 earned 500 total points
ID: 11834398
You can also expect that you'll either have to do precise matches or fuzzy matches.
If you do fuzzy matches, you can expect a lot of "good" words to get hits.
If you do precise matches, you have to decide what to do with words that have more than one meaning. And if you get into evaluating context, that's an area where roomfuls of PhDs are working on automated grammatical analysis, with the bottom line that there's no easy way to do it. Plus, your precise control list is likely to always be missing some "bad" words.
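To make the precise/fuzzy distinction concrete, here is a rough sketch using 'Belgium' as a stand-in for a listed word. The variable names are made up for the example. The "precise" test pads both strings with spaces so only whole words match; it does not handle punctuation, which a real filter would need to strip first:

```sql
DECLARE @Value varchar(200), @Word varchar(50);
SET @Value = 'Belgiumesque architecture';
SET @Word  = 'Belgium';

SELECT
    -- Fuzzy: any occurrence of the word inside the text counts as a hit.
    CASE WHEN CHARINDEX(@Word, @Value) > 0
         THEN 1 ELSE 0 END AS SubstringHit,   -- 1 for this input
    -- Precise: the word must be delimited by spaces on both sides.
    CASE WHEN CHARINDEX(' ' + @Word + ' ', ' ' + @Value + ' ') > 0
         THEN 1 ELSE 0 END AS WholeWordHit;   -- 0 for this input
```

The substring test flags an innocent longer word, while the whole-word test would miss the same word written with a suffix or with punctuation attached.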

Performance aside, this is a very difficult area...
 
LVL 18

Expert Comment

by:SjoerdVerweij
ID: 11834408
Let us know if you need more specifics (i.e., code).
 

Author Comment

by:higgsy
ID: 11835429
code would be great guys...

Thanks

Al
 
LVL 69

Expert Comment

by:ScottPletcher
ID: 11835554
I urge you to carefully review jdlambert1's issues and resolve those before (prematurely) worrying about code.  You should do the "analysis" and "design" of this before you worry about code.  No offense to anyone, but coding is the easiest part of it once a proper design and clear goals are determined.
 
LVL 18

Expert Comment

by:SjoerdVerweij
ID: 11835822
True. To get into specifics, which one of these should go through? (Note: read * as i

sh*t

s h * t

smashit

sh1t

etc.
 
LVL 69

Expert Comment

by:ScottPletcher
ID: 11835863
And if you try to go phonetically, like SOUNDEX or something similar,  what about:
shiitake mushrooms?
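For anyone who wants to see how coarse phonetic matching is, a small sketch with T-SQL's built-in SOUNDEX and DIFFERENCE functions, again using neutral stand-in words (the variable names are assumptions for the example):

```sql
DECLARE @Listed varchar(50), @Typed varchar(50);
SET @Listed = 'Belgium';   -- word on the block list
SET @Typed  = 'Belgian';   -- different, harmless word

SELECT SOUNDEX(@Listed)            AS ListedCode,
       SOUNDEX(@Typed)             AS TypedCode,
       -- DIFFERENCE compares the two SOUNDEX codes:
       -- 0 = least similar, 4 = most similar.
       DIFFERENCE(@Listed, @Typed) AS Similarity;
```

SOUNDEX keeps only the first letter plus three digits, so many unrelated words collapse to the same code. Running a few ordinary words through it against your list is a quick way to judge the false-positive rate before committing to this route.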
 
LVL 18

Expert Comment

by:SjoerdVerweij
ID: 11836485
Not to mention you'd have to do the pig-Latin ones, like pr0n.
 
LVL 9

Expert Comment

by:dancebert
ID: 11836556
To quote (ok, paraphrase) the great George Carlin:

'You can prick your finger but you can't finger your prick.'

A baseball announcer can say 'Roberto Clemente has two balls on him.' But he can't say, 'I think he hurt his balls on that play'

 
LVL 18

Expert Comment

by:SjoerdVerweij
ID: 11836579
Not to mention regional variances.

"I am going to smoke a fag"

Nicotine-related in the UK, homicide-related in the US.
 

Author Comment

by:higgsy
ID: 11879356
Hi guys,

You've all made valid points, and I've had some time to reflect on the design. Instead of deleting the record if a swear word is found, I am simply going to send an alert to the website administrator so we can go and check it. In the past, when people have left comments etc. on the website, I have used a flag in the database called IsAuthorised, so until we check the content the record can't be viewed on the website. This is not only very time-consuming but also annoys users, as they don't get to see their post straight away.

What do you guys recommend? If you agree with my solution, does anyone have code that will search a string for a series of swear words, almost like an array?
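One way this revised design could look, as a sketch only: a trigger that publishes clean rows immediately and holds back only the ones that hit the word list. It assumes a hypothetical dbo.Comments(CommentID, Body, IsAuthorised) table and a dbo.NaughtyWords(Word) table; only the IsAuthorised flag name comes from the post above.

```sql
-- Assumed schema (illustrative):
--   dbo.Comments(CommentID int PRIMARY KEY, Body varchar(8000), IsAuthorised bit)
--   dbo.NaughtyWords(Word varchar(50))
CREATE TRIGGER trg_Comments_FlagForReview ON dbo.Comments
AFTER INSERT
AS
BEGIN
    SET NOCOUNT ON;

    -- 0 = held for review, 1 = visible on the site straight away.
    UPDATE c
    SET IsAuthorised = CASE
            WHEN EXISTS (SELECT *
                         FROM dbo.NaughtyWords AS w
                         WHERE CHARINDEX(w.Word, i.Body) > 0)
            THEN 0
            ELSE 1
        END
    FROM dbo.Comments AS c
    JOIN inserted AS i ON i.CommentID = c.CommentID;
END
GO
```

The administrator's review queue is then just the rows where IsAuthorised = 0, which a scheduled job or admin page can poll rather than sending mail from inside the trigger.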

Thanks guys

Al
 
LVL 18

Expert Comment

by:SjoerdVerweij
ID: 11885046
Most blog comment systems now resort to some type of authorization mechanism.

To do a rough and ready:

create table NaughtyWords(Word VarChar(50))

insert into NaughtyWords(Word) Values('Belgium')
... etc ...

Then a @value would be ok if

Not Exists(Select * From NaughtyWords
  Where CharIndex(Word, @Value) > 0)

To deal with B`e`l`g`i`u`m etc., replace @Value in the above with Replace(@Value, '`', ''). You can string these together for other characters. Note that this does increase the likelihood of false positives.
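The chained Replace idea could be wrapped in a scalar function, roughly as below. The function name and the particular substitutions (backtick, space, and the digits 1 and 0) are assumptions for illustration; pick the set that matches the tricks you actually see.

```sql
CREATE FUNCTION dbo.NormalizeForFilter (@Value varchar(8000))
RETURNS varchar(8000)
AS
BEGIN
    -- Strip separator characters, then map look-alike digits to letters.
    RETURN Replace(Replace(Replace(Replace(@Value,
               '`', ''),    -- B`e`l`g`i`u`m
               ' ', ''),    -- B e l g i u m
               '1', 'i'),   -- Belg1um
               '0', 'o');   -- digit zero used as the letter o
END
GO

-- Usage, assuming the NaughtyWords table above holds 'Belgium':
DECLARE @Value varchar(8000);
SET @Value = 'B`e`l`g`1`u`m is lovely';

SELECT CASE WHEN EXISTS (SELECT * FROM NaughtyWords
                         WHERE CharIndex(Word, dbo.NormalizeForFilter(@Value)) > 0)
            THEN 'flag' ELSE 'ok' END AS Verdict;
```

Removing spaces is the aggressive part: it joins neighbouring words together, so two innocent words side by side can now form a listed one.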
 
LVL 9

Expert Comment

by:dancebert
ID: 11885350
1. Make a list of words that are unacceptable, no matter what the context. For example:  S*it, F*ck, C*unt, C*cksucker, m*therf*cker.  This will be a short list unless you're willing to include languages other than English.

2. Make a list of phrases that are unacceptable, no matter what the context.  These phrases will include words that have multiple meanings so they can't be included in the first category.  For example, Tit is a bird species, and so is the Great Tit.  "Great Tits" is a common phrase among bird watchers in certain parts of the world. However, 'Suck my t*ts' obviously has no redeeming social value.

3. Monitor what gets through and add new phrases to #2 as needed.

4. Tell the people who object to the things that get through to get a life.
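The word list and phrase list from steps 1 and 2 can live in one table, since a phrase is just a longer term. A sketch under that assumption, with a hypothetical dbo.BlockedTerms table and harmless placeholder entries; terms are matched as whole words or whole phrases by padding with spaces, and punctuation is not handled:

```sql
CREATE TABLE dbo.BlockedTerms (Term varchar(100) NOT NULL PRIMARY KEY);

-- Step 1: words that are never acceptable (placeholder shown).
INSERT INTO dbo.BlockedTerms (Term) VALUES ('Belgium');
-- Steps 2 and 3: phrases, added over time as monitoring turns them up.
INSERT INTO dbo.BlockedTerms (Term) VALUES ('go to Brussels');

DECLARE @Value varchar(8000);
SET @Value = 'why not go to Brussels this summer';

-- Returns the terms that appear in @Value delimited by spaces.
SELECT Term
FROM dbo.BlockedTerms
WHERE CHARINDEX(' ' + Term + ' ', ' ' + @Value + ' ') > 0;
```

Keeping ambiguous words out of the table and listing only the offending phrases is what lets the bird watchers through.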