Oliver2000 (Brazil) asked:

htaccess: disallow anything other than images or CSS under the condition HTTP:Via

Hi Experts,

I would like my .htaccess to deny all requests from my CDN provider's servers for files other than jpg/png/gif/css/js/txt files.

Right now I use:
RewriteCond %{HTTP:VIA} ^.*\.worldcdn\..*$
RewriteRule ^robots\.txt$ robots_cdn77.txt [L]

This serves a different robots.txt file to every request coming from my CDN provider, to prevent Google from indexing my site via the CDN servers and creating duplicate content. The problem was that Google indexed my site two or three times via the CDN URLs (they are CNAME host entries).

I would like to either add another rule or change the one above so that users and site visitors also cannot open my site via the CDN links.

In plain words, something like:
If the request (according to HTTP:Via) comes from anything containing *worldcdn* and is for a file other than jpg/gif/png, then respond not with the file but with cdn_error.php.

So any request from *worldcdn* to http://www.domain.com/site/index.php would be changed into http://www.domain.com/cdn_error.php, but a request to http://www.domain.com/images/logo.jpg would respond normally with the logo.jpg file.

BUT it is important that this happens only for requests from worldcdn and not for any other visitor.
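A minimal sketch of such a rule follows. It assumes mod_rewrite is enabled, that the .htaccess file and cdn_error.php both sit in the document root, and that the CDN really adds a Via header containing ".worldcdn." (the same assumption the existing robots.txt rule already makes). The allowed extensions follow the list from the first paragraph; txt stays in the list so that robots_cdn77.txt remains reachable.

```apache
RewriteEngine On

# Existing rule: requests arriving through the CDN get a separate robots.txt.
RewriteCond %{HTTP:Via} \.worldcdn\. [NC]
RewriteRule ^robots\.txt$ robots_cdn77.txt [L]

# Requests arriving through the CDN may only fetch static files;
# everything else is internally rewritten to cdn_error.php.
RewriteCond %{HTTP:Via} \.worldcdn\. [NC]
# Let the static extensions through (case-insensitive; query strings are
# not part of REQUEST_URI, so logo.jpg?v=2 still matches).
RewriteCond %{REQUEST_URI} !\.(jpe?g|png|gif|css|js|txt)$ [NC]
# Never rewrite the error page itself, otherwise the rule would loop.
RewriteCond %{REQUEST_URI} !^/cdn_error\.php$
RewriteRule ^ /cdn_error.php [L]
```

Because every condition block starts with the Via check and conditions are ANDed, visitors who reach www.domain.com directly (no Via header) never hit either rewrite and are served normally.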

Thank you for your help in advance
DrDamnit (United States) replied:

This doesn't make sense. Your CDN provider doesn't index your site, nor does it call your site; your CDN provides content to visitors of your site.

Are you saying that you want content served by your CDN provider to be shown only when people are on your site, and to give an error if someone attempts to link directly to your content? Thus, you force people to come to your site to see your content?

And you don't want Google to index your content either?
Oliver2000 (asker) replied:

Hi DrDamnit,

Let me try to explain; it does make perfect sense. If somebody (including Google) uses the URL cdn.domain.com, the CNAME host entry directs them to my CDN provider (CDN77 in this case), which takes the request, fetches the original file from www.domain.com and forwards it to the user (or Google). The CDN in the middle acts more or less like a proxy. The result is that you can call not only any image via this CDN URL but also any HTML or PHP file, which is what I want to prevent.

The problem is that Google has now indexed the CDN URLs as duplicate content. I have already registered the CDN subdomains in Google's webmaster tools and filed a removal request for all CDN URLs, which worked very quickly. However, I want to prevent any HTML or PHP file from being re-indexed.

The main question is actually independent of my CDN, Google, etc.

How can I write a small rule that allows only certain file extensions if the request comes from a certain server, and allows all file extensions for everyone else?

Like: IF THE REQUEST COMES FROM ANY SERVER WITH CDN, allow only jpg and gif; but if the request comes from anybody else, allow all files.
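If a plain 403 Forbidden is acceptable instead of a custom error page, the generic form of this question can be sketched with a negated RewriteRule pattern and the F flag. Matching "cdn" anywhere in the Via header is an assumption taken from the wording above and is deliberately broad: any other proxy whose Via header happens to contain "cdn" would be caught as well.

```apache
RewriteEngine On

# Condition: the request was forwarded by a proxy whose Via header mentions "cdn".
RewriteCond %{HTTP:Via} cdn [NC]
# The leading "!" negates the pattern: the rule fires for every path that does
# NOT end in .jpg, .jpeg or .gif, and [F] answers it with 403 Forbidden.
RewriteRule !\.(jpe?g|gif)$ - [F,NC]
```

Requests without a matching Via header skip the rule entirely, so everyone else can still fetch all files. Note that this variant also blocks robots.txt for CDN requests; add txt to the extension list if the separate CDN robots file from the question is still needed.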
ASKER CERTIFIED SOLUTION
Oliver2000 (Brazil):

This solution is only available to Experts Exchange members.
I've requested that this question be closed as follows:

Accepted answer: 0 points for Oliver2000's comment #a39458362

for the following reason:

I found the solution myself.

DrDamnit:
Based on what you wrote, all requests come from the CDN because the CDN is the proxy. If that is really how you have it set up, I am not sure you can do what you're asking.

I use two different CDNs (Rackspace and Amazon) and do not have them set up this way. The CDN delivers large files (images and video) via their URLs, but the main site delivers the actual HTML.

In this setup, I can tell the CDN to show images and video only to requests that come from my site. But in your setup, all requests pass through the CDN as a proxy, which means .htaccess would be useless: as far as Apache is concerned, all traffic comes from your proxy, so there is no way to filter.
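For requests through the CDN, the client address Apache sees (REMOTE_ADDR, logged as %h) is indeed the CDN's edge server. In the setup the asker describes, though, www.domain.com is served directly and only cdn.domain.com goes through the CDN, and the asker's rules key on the Via header rather than on the IP. A quick way to check what actually reaches Apache is a temporary log format like the sketch below; the format name "viacheck" and the log path are hypothetical, and these directives only work in the server config or a VirtualHost, not in .htaccess.

```apache
# Server config or <VirtualHost> only: LogFormat and CustomLog are not allowed in .htaccess.
# Log client address, Host header, Via header, request line and final status.
LogFormat "%h \"%{Host}i\" \"%{Via}i\" \"%r\" %>s" viacheck
CustomLog "logs/viacheck_log" viacheck
```

If CDN requests show a Via value containing worldcdn while direct requests show "-" (Apache's placeholder for a missing header), header-based rules such as the asker's robots.txt rule can tell the two kinds of traffic apart.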

Oliver2000:
My situation is the same: the CDN is supposed to deliver only images and static files, and I changed the links to this static content on my sites to the CDN URLs. The problem was that you could request whatever you wanted via these CDN URLs, including PHP or HTML files, which would then be delivered via the CDN. Since Google got hold of the CDN URLs, it started indexing them. With the solution above this is now fixed, because the CDN gets only static files.

Thank you for your help anyway.