Transparent proxy using bridged router

The setup is installing a device between the lan and the upstream router.

If we configure a two-port device as a router between the LAN and the upstream router, then we can make the Squid client transparent to the PCs behind the device.

If we configure the device as a bridge between the LAN and the upstream router, then we have to set the proxy IP in the browser.

Is it possible to have a squid proxy client on a device configured as a bridge?

Where is the Squid box in relationship to your router and the upstream router?

When you have the router configured as a router, how does traffic get sent to the Squid box?

I personally would have the Squid box with two NICs, configure the NICs as a bridge, put that box in between the LAN and the upstream router, and have all traffic flow through it. With iptables you can direct the traffic where you want it to go: to the proxy if needed, or directly to the Internet.

You can do a search for "Squid Transparent Proxy" and you will find lots of info on how to do this.

ASKER

squid client is on a two port device.
Squid server is on another network in another location.

Yes, the box does have two NICs, it sits between the upstream router and the LAN, and we've tried both bridge and router mode.
We figured router mode worked because it inspects every packet, while bridged mode just passes packets from one interface to the other.

Obviously, we've searched and tried a number of things before posting here.

We seem to be making headway: the client has been working with the device in bridge mode since shortly after posting this.

ASKER

I should also have added in my question... is it possible to make the squid client do the dns lookups instead of the pc?

I imagine that the dns resolves would be initiated on the workstation since this is a transparent setup.

Therefore, the workstation dns would have to be set to the client IP perhaps? And the client would have a dns forwarder, allowed to communicate with a dns server where the squid proxy server resides?

Does a squid client offer dns resolve, where perhaps the client automatically gets its resolve from the server?

I can't see how any of that would work other than changing dns settings on the workstation.
And apparently, folks say that Squid is unsuitable for DNS forwarding, for multiple reasons. One is that HTTPS requests typically look like CONNECT ipaddress:443, already resolved on the client side.

You refer to "squid client." The only "squid client" I know is "squidclient"; it is a command line utility that allows you to get a webpage, sort of like wget. It can be used to manage and monitor Squid, and it can also "use" Squid.

What do you mean by "squid client?"

As far as I am aware, Squid does not do DNS forwarding.

A host always does its own DNS lookup, but it sends the DNS request to whatever IP addresses are configured as its DNS servers. If you want to control DNS lookups somehow, you could run your own caching-only DNS server and set your computers to use it as their DNS server.

ASKER

To clarify:

The 'client' is Squid Cache: Version 2.7.STABLE9 for OpenWrt using TLS v1.2.
The server is on a CentOS 7 server and is Squid Cache: Version 3.3.8.

Yes, I think I misused the word client but what I mean by that is that it is at the users end, and is the device which accepts the browser traffic, and decides where to send it, either direct to internet or through the proxy server.

The issue with DNS is that if the remote workstation enters an https address, the proxy server cannot know what it is since it is encrypted.

So you have a proxy server using another proxy server.

For HTTPS you have to set up Squid to be a "man-in-the-middle", and I would assume that for a Squid proxy server going to another Squid proxy server, you would have to do that on both of them.

Here is a link to what should work, it is for an older version of Squid:

http://tektab.com/2012/09/28/squid-transparent-proxy-for-https-ssl-traffic/

ASKER

The reason for the first proxy is so the filtering can be done before it hits the server but I'm not sure it is needed this way. As I understand it, filtering can be done on the server itself, and all traffic which is not allowed would go directly to internet.

They are both running Squid, so they can both do the filtering.

Are they both at the same location? Do only some users go through the first Squid box, and then everybody goes through the second Squid box?

Is the "upstream" Squid box an old PC that might be underpowered?

There are valid reasons for using 2 Squid boxes like this, but it is normally because of diverse locations where a central one has the only Internet connection, or very large corporations where you have thousands of employees and you want to distribute the filtering process so you don't overload a single box.

ASKER

I thought I mentioned it, but if not, here it is again.

There is a local squid server running at the user's office.
There is the 'main' squid server which is the actual proxy server over the internet.

In my case, I use the small local squid to filter *before* the traffic hits the main proxy server by using a smaller local proxy to do the filtering. However, maybe I don't need that local device because as I understand it, if I filter at the main proxy server, anything not allowed will go directly to the internet instead of through the proxy server anyhow.

The other reason was to have a transparent local proxy but that didn't work so far.

I missed the post where the "main" Squid server was at another location.

If you want a transparent proxy server, it must have two NICs and be connected in-line. The NICs need to be configured in bridge mode. The Linux box will act like a "switch", transparently passing traffic through it. With the proper configuration on the Linux box, web traffic will be proxied by Squid, where you can allow or block it.

There is really no technical reason to then forward the traffic to another Squid box, unless it is a company requirement.

ASKER

Yes, I know about the dual NIC setup, it's how we had initially done it.
The idea was to have the pre-squid proxy between lan/router but that seemed to cause issues which I can't recall at this moment. So instead, we decided to go back to making the user change their browser settings which put us right back to where we started.

The idea was to prevent users from overusing the proxy by adding one at their location. The main proxy is also there to give anonymity for remote employee work/research.

Problem is, they forget to disable the proxy when they are doing their own thing, using up bandwidth, hence the filtering.

You could set up Squid at your location to be transparent/inline and then configure the local Squid to forward to the main site Squid server.

Review: http://wiki.squid-cache.org/Features/CacheHierarchy

ASKER

Yes, that is what I'm doing now but I'm not sure if I need to do it this way. It costs another device at each location so am trying to find out if I can eliminate that, using just one server.

You don't need to, you can use a single device at a central location or a single device at each remote location.

The advantage of using a 2 tier design or using a Squid server at each remote location is you save bandwidth on the WAN connections between each remote site and the central site.

ASKER

So to confirm...

Using a proxy at each remote site simply saves WAN connections to the central site because of pre-filtering.

Using a proxy server at the central site achieves the same, filtering at the server will force user traffic to go directly onto the internet, bypassing the central server. The cost is a little more WAN bandwidth since the remotes have to check with the central server first.

Overall, that doesn't seem like any significant amount of extra WAN bandwidth?

You don't save a lot of bandwidth from the filtering. However, you have the possibility to save more bandwidth due to the caching.

If 10 people at the same office access the same site, and you have a local Squid server, it could be caching information from that site, so you only go to the central site and the Internet once. The other 9 get it from the local Squid cache.

If you don't have the local Squid cache, then everybody has to go to the central site to get it. Depending on the WAN bandwidth and what is cached, it could be a little bandwidth saved, or it could be a lot of bandwidth saved.
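
As a rough illustration of the 10-user example above, here is a minimal Python sketch. It assumes the object is cacheable and still fresh for every request after the first, which real traffic often is not. The function name and the 2 MB object size are made up for illustration.

```python
def wan_bytes(users: int, object_bytes: int, local_cache: bool) -> int:
    """Bytes crossing the WAN link when `users` people fetch the same object.

    Assumption: with a local cache, only the first request crosses the WAN
    and every later request is served from the local cache.
    """
    if users <= 0:
        return 0
    wan_fetches = 1 if local_cache else users
    return wan_fetches * object_bytes


# Hypothetical numbers: 10 users, one 2 MB object.
without_cache = wan_bytes(10, 2_000_000, local_cache=False)
with_cache = wan_bytes(10, 2_000_000, local_cache=True)
saved_fraction = 1 - with_cache / without_cache
```

Under those assumptions the local cache removes nine of the ten transfers. Dynamic or uncacheable content would push the saving toward zero.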

ASKER

I'm confused about proxies then.

In looking around for information on setups, I came across countless free and paid proxy services for example. Some of those look incredibly cheap which makes me wonder how they could be offering proxy services for next to nothing if thousands of people joined their services?

In my case, I've got hundreds of possible users, it's free to employees but if we don't manage the bandwidth, it could get ugly.

We mainly want to offer them privacy for their work, not much else but we can't have them sending everything they do through the central proxy. Yet, most don't remember to disable the proxy after work so their home/leisure stuff ends up going through the network as well.

Figured that's ok, so long as we can at least prevent the big bandwidth stuff like streaming and large file downloads, etc.

Most of the proxies on the Internet are to allow you to anonymously visit Web sites, or in some cases allow you to access sites that you are not allowed to visit based on where you live.

Now what most companies use a proxy for is NOT to hide, because they are still coming from the company's IP address, so you're not hiding. It is to cache as much static information as possible, so that instead of hundreds or thousands of people going to the same site and downloading the same information over the Internet connection, only the first person gets it downloaded over the Internet and everybody else gets it from the proxy's cache.

ASKER

>Most of the proxies on the Internet are to allow you to anonymously visit Web
>sites, or in some cases allow you to access sites that you are not allowed to visit
>based on where you live.

I understand that but they would have the same problem. Offering those services for free or at a cost has to cost them bandwidth. How are they getting around that?

I've never used one, have not even looked to use one, but my guess is economies of scale (lots of people at a low cost), advertising, and they most likely throttle what bandwidth you can use.

ASKER

Whatever their model is seems to be what I want; otherwise, all our bandwidth gets used up.

Then you need to analyze what traffic is being allowed. The possibilities:

1) Bandwidth is being used by something that does not go through Squid.
2) Bandwidth is being used by sites that are not getting filtered out.
3) Bandwidth is being used because a lot of sites use dynamic content and so it will not be cached.

ASKER

Not sure what this means? This is basically my question, problem :)
Anyhow, I don't see how this will be resolved, so it's basically a moot question now.

Read: http://wiki.squid-cache.org/Features/CacheManager

This tells you how to look at the caching stats, which show things like the cache hit ratio.

You can also run a packet capture on the Squid box on the interface that is the "outside" interface. This will show you what is not being cached.

If possible I would run a packet capture on the local firewall or the router that is your last hop so you can see what traffic is going across the WAN link.

Do you access company servers at the main office over this link? If so, none of that would be cached.

ASKER

I know all those things, that's why I'm trying to find a way of preventing certain traffic from flowing through the proxy server.
This isn't about caching as much as it is offering their employees privacy but without allowing ALL of their traffic through the proxy server. As mentioned, they keep forgetting to disable the proxy.

The question is really about whether I need to have a local proxy to make this work or if one central proxy can do the filtering, forcing the users traffic to go directly to the internet.

Again, maybe you don't understand what I am saying.

The ONLY reason to have a local proxy is to reduce the traffic on the WAN link between the local office and the main office. You do NOT need to have a local proxy. The central proxy at the HQ location can do everything you need.

However, having a proxy is NOT giving any of your employees privacy. The requests are coming from YOUR company's Internet connection. So the sites they are visiting know where they are coming from. So if your goal is to give them privacy, the way you are doing it will NOT work.

ASKER

I certainly do understand you but maybe I'm not explaining well enough. I keep asking the same question repeatedly but it doesn't seem to be coming across properly. Let me try once again.

I guess I need to explain the things which I understand about using proxies :). This is where the confusion starts, because we aren't talking face to face, so having to explain in text makes things more complicated than they are.

First, I totally understand using a local proxy for saving bandwidth. This isn't about saving bandwidth at the remote sites.
Second, I totally understand that if they use a local proxy, all traffic comes from their own site, thus, no privacy.

The question has nothing to do with these parts. I'm not asking about why and how proxies can be used.

The idea of using a local proxy is not about privacy or saving bandwidth, it's only about pre-filtering (for lack of better term) to lower the amount of traffic hitting the central proxy server.

By default, Squid proxies do some filtering where certain types of traffic are not allowed, so the request is bypassed, sending the user directly to the URL/IP on the internet.

The *only* reason for a setup where there is a local proxy was the idea of pre-filtering, so that the local proxy could prevent traffic from even hitting the central server, sending it directly to the internet.

Right now, people are setting their browsers to the central proxy server and keep forgetting to disable that. Thus, everything they are doing, day and night, is going to the proxy.

The idea at first was to prepare a little router which contained a transparent proxy which they could install between their lan and router but that ended up being too complicated. The idea was that the local proxy would pre-filter, before ever reaching the central proxy server, saving bandwidth at the central server by not having to deal with those requests. If the local router didn't filter for something, then that traffic would simply be sent to the central server.

Then in my question, I asked if this would make all that much difference? I am wondering if it really takes all that much resources for the central server to simply do it all on its own instead of using a local pre-filtering proxy.

So you see, the local proxy is not about bandwidth or privacy, it was simply about looking at ways of preventing traffic from hitting the central server if it didn't have to. The idea was trying to find a way of allowing ALL but filtered traffic to hit the central server so that it no longer mattered if they remember to disable the proxy or not.

From your prior post:

"This isn't about caching as much as it is offering their employees privacy "

Right there you say you want to offer employees privacy.

However, let's ignore that.

You can set up Squid as a transparent proxy. However, NOT while the browser is configured to point to another proxy. All proxy settings must be removed. When you configure a browser to use a proxy server, how it sends the request is different. A transparent proxy server will not intercept traffic sent to a proxy server because of how the HTTP requests are sent.
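
Here is a minimal Python sketch of the difference in request format, assuming HTTP/1.x request lines. A browser with proxy settings sends the full URL (or CONNECT host:port for HTTPS), while a browser without them sends only the path, which is the form an intercepting proxy expects. The function name and the labels it returns are made up for illustration.

```python
def request_form(request_line: str) -> str:
    """Classify an HTTP/1.x request line by the form of its request target."""
    parts = request_line.split()
    if len(parts) != 3:
        return "other"  # not a well-formed "METHOD target VERSION" line
    method, target, _version = parts
    if method.upper() == "CONNECT":
        # host:port target, sent to an explicitly configured proxy for HTTPS
        return "connect-tunnel"
    if target.lower().startswith(("http://", "https://")):
        # full URL target, sent to an explicitly configured proxy
        return "proxy-style"
    if target.startswith("/"):
        # path-only target, sent when no proxy is configured
        return "origin-style"
    return "other"


examples = [
    "GET http://example.com/index.html HTTP/1.1",  # browser with proxy settings
    "GET /index.html HTTP/1.1",                    # browser without proxy settings
    "CONNECT example.com:443 HTTP/1.1",            # HTTPS via configured proxy
]
forms = [request_form(line) for line in examples]
```

Only the second form is what a transparent setup is built to pick up, which is why the browser's proxy settings have to go first.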

You would need to remove the proxy settings from the browser and then let the transparent proxy server forward any non-filtered requests to the central site's proxy.
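
The forwarding decision described above can be sketched in Python as follows. This only models the decision and is not Squid configuration. The domain patterns and the function name are hypothetical, and it assumes filtered requests are sent directly to the Internet, as the asker describes.

```python
from fnmatch import fnmatchcase

# Hypothetical list of destinations that should NOT use the central proxy.
BYPASS_PATTERNS = ["*.video.example", "downloads.example"]


def next_hop(host: str) -> str:
    """Return 'direct' for hosts on the bypass list, else 'central-proxy'."""
    normalized = host.lower().rstrip(".")
    for pattern in BYPASS_PATTERNS:
        if fnmatchcase(normalized, pattern):
            return "direct"
    return "central-proxy"


streaming = next_hop("cdn.video.example")   # matches *.video.example
research = next_hop("journal.example")      # not on the bypass list
```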

Now you say you are trying to reduce the load on the central proxy server. That implies that you feel it is overloaded. Is it overloaded?

ASKER

Again, yes, offering privacy by using the central server! Nothing to do with the local proxy. Not important, not something I care about.

In transparent mode, we don't use any proxy settings in the browser.

Ok, this question is endless. We obviously will never understand each other.

Just as an F.Y.I., unless the central server is run by another company, it still provides no privacy. Every site you visit will see the IP address that the central server presents to the Internet.

I think part of the understanding problem is:

1) I'm getting confused as to what you want to do locally vs. what is being done centrally. Example: You mention privacy and I am assuming locally, you then state no, that is the central server's job.

2) You seem to want a definite answer about if this will prevent traffic from going to the central site. I can't answer that. Read below.

How much bandwidth a local Squid box saves you will depend on how much Internet traffic you have going from the local office to the central office. If Internet traffic from the local office takes up 5% of your total usage (not total available bandwidth, but the total you actually use), then maintaining a local Squid server really does nothing. Now if Internet traffic from the local office takes up 50%+ of your total usage, then it COULD save you quite a bit, as long as a lot of that traffic can be either filtered or cached on the local Squid server.

As for offloading load from the central server, again it depends. If your local office accounts for 1-5% of the central Squid server's load, then it really is not doing a whole lot to have a local Squid server. If your local office accounts for 50%+ of the load on the central server, then a local Squid server COULD reduce the load, again, assuming that the local Squid server ends up filtering or caching a significant amount of traffic.
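
The reasoning above comes down to one multiplication, sketched here in Python. The function name is made up, and the 40% "filtered or cached locally" figure is purely an assumed number for illustration, not a measurement.

```python
def estimated_reduction(office_share: float, avoidable_fraction: float) -> float:
    """Estimate the fraction of total load a local proxy could remove.

    office_share: the office's share of total usage or central load (0 to 1).
    avoidable_fraction: share of that office's traffic the local proxy
        could filter or serve from cache (0 to 1); an assumption here.
    """
    for name, value in (("office_share", office_share),
                        ("avoidable_fraction", avoidable_fraction)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1, got {value}")
    return office_share * avoidable_fraction


small_office = estimated_reduction(0.05, 0.40)  # office is 5% of the total
large_office = estimated_reduction(0.50, 0.40)  # office is 50% of the total
```

With these assumed inputs the small office yields a reduction of about 2% of the total, and the large office about 20%.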

What I will say is:

A local Squid proxy running in transparent mode has the possibility to reduce traffic to/from the central site and reduce load on the Squid at the central site.

It has to be set up to properly filter what you want filtered and cache as much as you can.

It must be set up to properly forward requests to the central site's Squid server.

How much load? That is something I can't answer. The way to find that out is to look at the stats on the central Squid server to see how many requests from your location are filtered out and how many are served from cache. Then see if that is a significant amount of traffic.
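
One way to get those numbers is to count result codes in the central server's access.log. The Python sketch below assumes Squid's default native log format, where the third field is the client address and the fourth is the result code and HTTP status. The sample lines, addresses, and function name are invented, and the hit test is deliberately simplified.

```python
from collections import Counter

# Invented sample lines in Squid's native access.log layout.
SAMPLE = [
    "1700000000.123     45 203.0.113.10 TCP_MISS/200 5120 GET http://example.com/ - HIER_DIRECT/192.0.2.80 text/html",
    "1700000001.456      2 203.0.113.10 TCP_HIT/200 5120 GET http://example.com/ - HIER_NONE/- text/html",
    "1700000002.789      0 203.0.113.10 TCP_DENIED/403 3900 GET http://video.example/stream - HIER_NONE/- text/html",
    "1700000003.000     30 198.51.100.7 TCP_MISS/200 800 GET http://example.org/ - HIER_DIRECT/192.0.2.81 text/html",
]


def summarize(lines, client_prefix):
    """Count denied, cache-hit, and other requests for clients matching a prefix."""
    counts = Counter()
    for line in lines:
        fields = line.split()
        if len(fields) < 7:
            continue  # skip truncated or malformed lines
        client = fields[2]
        result = fields[3].split("/")[0]
        if not client.startswith(client_prefix):
            continue
        if result.startswith("TCP_DENIED"):
            counts["denied"] += 1
        elif "HIT" in result:  # simplified: treats any *_HIT code as a cache hit
            counts["hit"] += 1
        else:
            counts["other"] += 1
    return counts


office = summarize(SAMPLE, "203.0.113.")
```

If the office sits behind NAT, the central server will only see its public address, so the prefix would be that single address rather than a LAN range.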

ASKER

Yes, I know that the central server IP will be shown and that is the point. That ALL traffic would appear to be coming from the central server and not the users, thus, offering them some privacy.

>1) I'm getting confused as to what you want to do locally vs. what is being done centrally. Example: You mention
>privacy and I am assuming locally, you then state no, that is the central server's job.

I've explained it several times now and am not sure how else to explain it. At this point, I would simply be repeating what I've already said many times.

Is this for a business? The users have the same level of privacy no matter what you do. Why? Because no matter what you do, the IP address(es) will be the company's IP address. So if you use a proxy server, or a firewall doing a many-to-one NAT, or you give each user their own public IP address, the address will map back to the company, not the user.

Why do you think that the proxy provides some level of privacy?

Again, I have explained. What you want to do is possible. The question is how much value it provides. I can't answer that; you need to look at how much traffic from your site is filtered/cached by the central site's proxy. Then see if you think that volume of traffic is worth the time and effort to set up a local proxy and maintain it.

ASKER

I keep saying we should close this question because you simply cannot understand what I am explaining no matter how many ways I try to explain it :).

ASKER CERTIFIED SOLUTION

This solution is only available to members.

ASKER

That will do :).

Thanks kindly, it's a good lead.