Python/C++ web domain filter approach immune to VPN or proxy

wfskmoney
Hi,

I am developing an anti-procrastination tool. The tool is supposed to block access to distracting websites such as Facebook. Blocking should work using asterisk (wildcard) notation:

website.com
website.*
anything.website.com
anything.website.*
*.website.com
*.website.*
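A minimal Python sketch of how these patterns could be matched, assuming `*` stands for any run of characters (dots included) and that matching is case-insensitive. The function names are illustrative and not from any library. Note that under this reading `*.website.com` does not match the bare `website.com`, which is why both forms appear in the list.

```python
import re

def compile_pattern(pattern: str) -> "re.Pattern[str]":
    # Escape every literal part so dots match only dots, then let each '*' match any run of characters.
    literal_parts = pattern.strip().lower().split("*")
    return re.compile(".*".join(re.escape(part) for part in literal_parts))

def is_blocked(hostname: str, patterns: list) -> bool:
    # Normalise the hostname (lowercase, no trailing root dot) and require a full match.
    host = hostname.strip().lower().rstrip(".")
    return any(rx.fullmatch(host) for rx in patterns)

BLOCKLIST = [compile_pattern(p) for p in ("facebook.com", "*.facebook.com", "facebook.*")]

print(is_blocked("www.facebook.com", BLOCKLIST))   # True
print(is_blocked("facebook.de", BLOCKLIST))        # True
print(is_blocked("notfacebook.com", BLOCKLIST))    # False
```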

It's important that my users cannot circumvent the blocking by using (a) proxy servers, (b) VPNs or (c) websites like www.hidemyass.com.

I am looking for a framework or approach in Python/C++ that meets requirements (a), (b) and (c). If (c) cannot be done, that is fine, since I could just create a list of all websites like hidemyass.com and add them to the block list as well. So (a) and (b) are most important.

The blocking method will be compiled as a Windows .dll to be used in a VB.NET project.

Thanks for helping!

jkr
Top Expert 2012

Commented:
On which level are you planning to do the actual blocking? Application level? Transport layer? Device level?

Author

Commented:
I am not a programmer, so I am not sure what would be best. Which of the three would be best suited to realizing all four of my requirements from above (asterisk notation, VPN immunity, proxy immunity, web proxy immunity)?
jkr

Commented:
Well, that will depend on what you want to do - filtering domains for your VB.NET project only or system-wide...

Author

Commented:
Ah, thanks for asking; it seems that wasn't clear. It will be an anti-procrastination application, which means the domain/website has to be blocked for the whole system and in any browser. I would guess the lower the layer, the better.
jkr

Commented:
If you want to do that globally, there are several options as well:

- use a firewall-like setup, see e.g. http://www.codeproject.com/Articles/5602/Simple-Packet-Filter-Firewall ("Simple Packet - Filter Firewall"). This will meet all your requirements; the drawbacks are the driver installation, and regex matching in a kernel driver is something I wouldn't really want to implement, though it is feasible.

- use an API hook that hooks into 'gethostbyname()' to implement your filter, see e.g. http://www.codeproject.com/Articles/49319/Easy-way-to-set-up-global-API-hooks ("Easy way to set up global API hooks"). This will also meet all your requirements. Drawback: behaviour-analyzing virus scanners might classify it as malware.
jkr

Commented:
Oh, add "reverse DNS lookups from kernel mode" to the drawbacks of the Firewall option as well ;o)

Author

Commented:
It seems API hooking is one possible way that also allows asterisk notation for domain blocking. Two questions:

1. This seems to be immune to proxies and VPNs. However, is it also immune to the user going to www.hidemyass.com? That is different from just connecting to a proxy server at browser or system level, since the user makes an HTTP request to hidemyass, which opens the website and delivers the content.

2. Someone else suggested "nullrouting" using Python. I am not sure what nullrouting does exactly, so what kind of approach is that, and would it meet all my requirements, like asterisk notation and immunity to proxies/VPNs?

Thanks
jkr

Commented:
1. Well, as you put it, just put that domain on the blacklist and you are rid of that problem.

2. I assume "nullrouting" means setting a route in the routing table that points to oblivion. While that approach would work, it suffers from one obvious flaw: routing is IP-based, and e.g. facebook.com definitely has more than one IP address, so you'd have to add a route for each, which is not feasible in the long run.
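A small sketch with the standard `socket.getaddrinfo()` showing how the distinct addresses behind one name could be collected, each of which would need its own null route. The helper names are hypothetical, and the addresses returned vary by resolver, location and time, which is exactly the maintenance problem described here.

```python
import socket

def distinct_addresses(addrinfo_results) -> set:
    # Each getaddrinfo() entry is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the IP address. The same address repeats once per socket type.
    return {sockaddr[0] for _family, _type, _proto, _canon, sockaddr in addrinfo_results}

def addresses_for(hostname: str) -> set:
    # Live DNS lookup: every address in the result would need its own route entry.
    return distinct_addresses(socket.getaddrinfo(hostname, 443))
```

For example, `sorted(addresses_for("facebook.com"))` lists the IPv4 and IPv6 addresses your resolver currently returns; running it again later or elsewhere may give a different set.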

Author

Commented:
I see.

What about http://mitmproxy.org/, a Python-based man-in-the-middle proxy that can intercept web traffic and drop requests? It could be configured so that any "GET blockeddomain.com/index.html" request would be dropped. Mitmproxy has a command line tool called "mitmdump" to do that. However, would I have to set the system proxy to localhost if I ran mitmdump?
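For reference, a mitmproxy filter is usually written as an addon script and loaded with `mitmdump -s block_domains.py`. The sketch below assumes a recent mitmproxy release; the file name and blocklist are hypothetical. It only sees traffic from applications that are actually configured to use the proxy (mitmdump listens on port 8080 by default), and for HTTPS sites the client must trust mitmproxy's CA certificate for requests to be inspected.

```python
import fnmatch

# Hypothetical blocklist using the asterisk notation from the question.
BLOCKED_PATTERNS = ["facebook.com", "*.facebook.com", "facebook.*"]

class BlockDomains:
    """mitmproxy addon: drop any request whose host matches a blocked pattern."""

    def request(self, flow) -> None:
        # mitmproxy calls this hook for every client request it intercepts.
        host = flow.request.pretty_host.lower()
        if any(fnmatch.fnmatchcase(host, pattern) for pattern in BLOCKED_PATTERNS):
            flow.kill()  # abort the flow without forwarding the request

# mitmproxy discovers addons through this module-level list.
addons = [BlockDomains()]
```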

I also found the Squid caching proxy.

Could one use one of those proxies and use API hooking to reroute all traffic through that local proxy server?
jkr

Commented:
You could of course use a proxy to do the blocking, but not only would you have to configure 'localhost' as the proxy, you'd have to do that for every possible browser: IE, FF and Chrome each have their own proxy settings, which makes bypassing that a piece of cake.

Author

Commented:
I see. So could one use this API hooking approach or a different approach to reroute traffic through a proxy system-wide?
jkr

Commented:
IMO that would be a bit too complex, since it would require manipulating in- and outbound traffic on a per-packet basis, which effectively means writing your own NAT; that seems like overkill to me. Tackling this at the DNS level seems the easiest way to me so far.

Author

Commented:
What do you mean by tackling at DNS level exactly? Do you mean using API hooking?
jkr

Commented:
Well, "DNS level" means that if someone requests the IP address for "www.website.com", you match that against "*.website.com" and return "WSAHOST_NOT_FOUND", indicating that the host does not exist. If, however, e.g. www.experts-exchange.com is requested, you forward the call to the original 'gethostbyname()' function and return the respective result, allowing the client application to connect.
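The decision the hook makes can be modelled in a few lines of Python. The real hook would be native C/C++ code that replaces `gethostbyname()` in each process and, for a blocked name, returns `NULL` after calling `WSASetLastError(WSAHOST_NOT_FOUND)`; the sketch below only models that logic, and its function names are hypothetical. `WSAHOST_NOT_FOUND` is Winsock error 11001.

```python
import re
import socket

WSAHOST_NOT_FOUND = 11001  # Winsock error code meaning "host not found"

def _to_regex(pattern: str) -> "re.Pattern[str]":
    # '*' matches any run of characters; everything else is literal.
    return re.compile(".*".join(re.escape(p) for p in pattern.lower().split("*")))

# Hypothetical blocklist using the patterns from the question.
BLOCKLIST = [_to_regex(p) for p in ("*.website.com", "website.*")]

def filtered_gethostbyname(name: str, original=socket.gethostbyname):
    """Return (address, error_code) the way the hooked function would decide it."""
    host = name.lower().rstrip(".")
    if any(rx.fullmatch(host) for rx in BLOCKLIST):
        return None, WSAHOST_NOT_FOUND  # pretend the host does not exist
    return original(name), 0            # forward to the real resolver
```

A blocked name never reaches the resolver: `filtered_gethostbyname("www.website.com")` returns `(None, 11001)` without any network lookup.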

Author

Commented:
1. So in essence that means your API hooking approach?

2. I thought about just using the hosts file. However, the user can easily circumvent it by using a proxy. In theory, could there be ways to prevent the user from using a proxy to circumvent the hosts file? That would be a really easy solution.
jkr
Top Expert 2012

Commented:
'hosts' is what I'd have recommended if you hadn't introduced wildcards in your Q. Using a proxy won't matter at all, since DNS resolution takes place before the connection.
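For comparison, a hosts-file blocklist is just one line per exact name, because the Windows hosts file (`C:\Windows\System32\drivers\etc\hosts`) has no wildcard syntax. A minimal sketch that builds such lines and refuses wildcard patterns; the function name and the default sink address `0.0.0.0` are assumptions (`127.0.0.1` is also common).

```python
def hosts_entries(domains, sink: str = "0.0.0.0") -> str:
    """Build hosts-file lines that point each exact domain at a sink address."""
    lines = []
    for domain in domains:
        name = domain.strip().lower()
        if "*" in name:
            # The hosts file has no wildcard syntax, so patterns cannot be expressed.
            raise ValueError(f"hosts file cannot express wildcard pattern: {domain!r}")
        lines.append(f"{sink} {name}")
    return "\n".join(lines) + "\n"

print(hosts_entries(["facebook.com", "www.facebook.com"]), end="")
# 0.0.0.0 facebook.com
# 0.0.0.0 www.facebook.com
```

Every subdomain has to be listed explicitly, which is why wildcard patterns push the filter into code instead.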

Author

Commented:
I see. So by blocking at the DNS level, do you mean your API hooking approach or a different approach?
jkr
Commented:
I'd go that route, since it seems to be the simplest yet bullet-proof way of doing that.

Author

Commented:
I see, thanks a lot!

Author

Commented:
Very responsive, helpful, answered all questions!
