johnsit asked:

Squid with squidguard content filter/blacklist (cannot block google cache)

Hi All,

I have a Squid server with squidGuard up and running fine.

I have added the following to the squidGuard config:
rewrite google { s@(google.co.uk/search.*q=.*)@\1\&safe=active@i
                         s@(google.co.uk/images.*q=.*)@\1\&safe=active@i
                         s@(google.co.uk/news.*q=.*)@\1\&safe=active@i}

This rewrites the URLs to force Google SafeSearch, which shows that the above method for rewriting URLs is valid. OK so far.

The problem is that when users search in Google, the target page itself is blocked by the blacklists I have implemented, but they can click the cached link in the Google results and still see the cached copy of the page, although it is out of date.

The URLs of the cached pages have the form
http://74.125.77.132/search?q=cache:bAe-5jBVrukJ:www.bbc.co.uk/+bbc&cd=1&hl=en&ct=clnk&gl=uk

The part that always appears in the URL of a cached result is "search?q=cache".
Hence one would have thought that replacing the above with

rewrite google { s@(google.co.uk/search.*q=.*)@\1\&safe=active@i
                         s@(google.co.uk/images.*q=.*)@\1\&safe=active@i
                         s@(google.co.uk/news.*q=.*)@\1\&safe=active@i
                         s@^(*search?q=cache*)@http://www.someblockpage.com}


would work, but it doesn't.

In addition, I have tried

rewrite google { s@(google.co.uk/search.*q=.*)@\1\&safe=active@i
                         s@(google.co.uk/images.*q=.*)@\1\&safe=active@i
                         s@(google.co.uk/news.*q=.*)@\1\&safe=active@i
                         s@^www.yahoo.co.uk@http://www.google.com}

just as a test to redirect yahoo.co.uk to Google, which works fine, so the rewrite mechanism itself is not the issue.

Can anyone shed any light on this, please? I hope I've made my question clear; please ask if you need any further info.

Thanks.
ASKER CERTIFIED SOLUTION
jb1dev

jb1dev

The ^ (starts with) anchor would probably cause problems too, as the URL won't start with "search".
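
As a rough sketch of what a working rule might look like (untested, and assuming squidGuard treats the pattern as a POSIX extended regular expression, which the working safe-search rules suggest): drop the ^ anchor, escape the ? so it is matched literally, use .* rather than a bare *, and close the substitution with a final @ like the other rules. The block page URL is the placeholder from the question.

```
# Untested sketch. The cache rule is listed first so it is tried before the safe-search rules.
rewrite google {
    s@.*search\?q=cache.*@http://www.someblockpage.com@i
    s@(google.co.uk/search.*q=.*)@\1\&safe=active@i
    s@(google.co.uk/images.*q=.*)@\1\&safe=active@i
    s@(google.co.uk/news.*q=.*)@\1\&safe=active@i
}
```

Because the pattern is written to match the whole URL, the entire request would be replaced by the block page address rather than having text spliced into it.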

Note that Google's cached pages are all referenced directly by IP address, so you could potentially just block all URLs that use an IP address as the host with:

pass !in-addr

in your acl source.
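
For reference, a minimal sketch of where that line could sit, assuming a squidGuard.conf with a default acl block. The destination name "blocked" is hypothetical and stands for whatever blacklist categories are already defined; the redirect URL is the placeholder from the question.

```
# "blocked" is a hypothetical dest name; in-addr matches URLs whose host is an IP address.
acl {
    default {
        pass !in-addr !blocked all
        rewrite google
        redirect http://www.someblockpage.com
    }
}
```

With !in-addr in the pass list, requests such as http://74.125.77.132/search?q=cache:... should be sent to the redirect URL regardless of the blacklists.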


johnsit (ASKER)

Thanks for your comments and response; however, it doesn't seem to work for me.
johnsit (ASKER)

Sorry, regarding the !in-addr config line: I do actually have that, but I thought (apologies if I have misunderstood) that it only blocks IP addresses of sites that are on the blacklists, not all IP addresses typed into the browser. At least that's what seems to be happening with mine.

Thanks again
johnsit (ASKER)

Finally managed to get this working.

Thanks for the input; I did end up using your config line.

The reason it wasn't working was that I had the !in-addr on the same line as where the regexes were specified. I removed it from there and added it to the default area, and it is all working fantastically.

Thanks for your help. Awarding points now.

This worked fine for the initial hit on the page. It appeared to rewrite the string to show safe search. As soon as I went into the safe search settings and turned filtering off, though, it showed all the explicit content. It didn't seem to rewrite the page again. Another point is that there is google.com and google.ca and many, many more. This only rewrites google.com.
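
On the second point, one possible approach (an untested sketch, again assuming POSIX extended regular expressions) is to loosen the host part of each pattern so that it covers any Google country domain instead of a single one:

```
# Untested: [a-z.]+ is meant to cover com, ca, co.uk and similar suffixes.
rewrite google {
    s@(google\.[a-z.]+/search.*q=.*)@\1\&safe=active@i
    s@(google\.[a-z.]+/images.*q=.*)@\1\&safe=active@i
    s@(google\.[a-z.]+/news.*q=.*)@\1\&safe=active@i
}
```

This does not by itself address the first point about users switching filtering off again.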