
prevent remote download via curl

dabobert asked
Last Modified: 2009-12-16
I'm trying to protect files on my server from being downloaded by scripts. My team lead wants me to make it so that when a file is requested by curl, the client only gets an empty file, but I am unsure how to set that up. I understand that this may not be the most secure solution, so I am also curious about the downsides, if there are any.

slyong commented:
You can detect curl by its User-Agent string:

if (isset($_SERVER['HTTP_USER_AGENT']) && preg_match('/curl/i', $_SERVER['HTTP_USER_AGENT'])) {
  die(); // serve an empty response to curl clients
}

However, you would have to include or require that in all of your PHP files. A simpler way is to use Apache's mod_setenvif:

SetEnvIf User-Agent ^curl goaway
<Limit GET POST>
Order Allow,Deny
Allow from all
Deny from env=goaway
</Limit>

Author commented:
But the goal is not to determine whether the term "curl" is in the URL; the goal is to prevent the page from being remotely accessed by a script that may or may not be using curl. Also, if I were to use curl commands in a file named foo.php, the remote download would still work, because "curl" is not in the URL of the referer. Plus, the script could use curl (or something like it) to send an incorrect referer.

slyong commented:
> ...the goal is not to determine whether the term "curl" is in the URL...
The script that I posted does not check whether the term "curl" is in the URL; it checks the User-Agent string. The User-Agent string contains information such as which browser program and operating system the client is using (http://en.wikipedia.org/wiki/User_agent).

The only way to do it is via the User-Agent string (we are talking about the User-Agent string here, not the URL of the referer). There are quite a lot of user-agent strings out there (http://www.user-agents.org/). There are a few approaches you may want to consider:

1) If you know which user-agent string the automated download script uses (you can check it in the server log), block that string.
2) If you don't know the user-agent string, you may want to allow only those that you know (i.e. Mozilla, Internet Explorer, Firefox, Safari, Opera, Googlebot, etc.); a minimal sketch follows below.
3) You may want to put a robots.txt (http://www.robotstxt.org/wc/norobots.html) file on your web server to see whether those scripts observe robots.txt.

However, having said that, any program such as curl can send an arbitrary user-agent string to your server and spoof a real browser.
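A minimal sketch of approach 2 (a whitelist of known browser tokens), assuming the check runs at the top of the PHP page that serves the file; the token list and the empty-response behaviour are illustrative only:

<?php
// Allow only user agents containing a token from known browsers/bots.
// The list below is illustrative, not exhaustive.
$allowed = array('mozilla', 'msie', 'firefox', 'safari', 'opera', 'googlebot');
$ua = isset($_SERVER['HTTP_USER_AGENT']) ? strtolower($_SERVER['HTTP_USER_AGENT']) : '';

$ok = false;
foreach ($allowed as $token) {
    if (strpos($ua, $token) !== false) {
        $ok = true;
        break;
    }
}

if (!$ok) {
    exit; // unknown client gets an empty response instead of the file
}
?>

Again, this only filters clients that report their user agent honestly.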
steelseth12 commented:
You can fake your user agent using CURLOPT_HTTPHEADER.
The best solution is to use a Turing code (CAPTCHA) on the page, which machines cannot read.
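For illustration, a short sketch of how a curl-based PHP script could spoof a browser user agent via CURLOPT_HTTPHEADER (the URL and the user-agent string are placeholders), which is why the user-agent checks above are easy to defeat:

<?php
// Fetch a protected page while pretending to be a regular browser.
$ch = curl_init('http://example.com/protected/file.zip'); // placeholder URL
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_HTTPHEADER, array(
    'User-Agent: Mozilla/5.0 (Windows NT 5.1; en-US)' // looks like a browser
));
$data = curl_exec($ch);
curl_close($ch);
?>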

Author commented:
steelseth12, could you please elaborate?

Author commented:
slyong, you were right about me confusing the user-agent and the referer, but as we both said yesterday, the user-agent can be spoofed.
steelseth12 commented:
A Turing key or CAPTCHA is a randomly generated image of letters that the user needs to type in order to proceed to the next step. Machines cannot (or at least the software spammers have at this moment in time cannot) recognise the letters in the image, so they cannot proceed to the next step.

Have a look at http://viebrock.ca/code/10/turing-protection for a sample script.
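A minimal sketch of the idea, assuming the GD extension is available; the file names, the image size, and the protected file path are placeholders, and the linked script is more complete:

<?php
// captcha.php - generate a random code, store it in the session,
// and render it as an image the user has to type back.
session_start();

$code = substr(str_shuffle('ABCDEFGHJKLMNPQRSTUVWXYZ23456789'), 0, 5);
$_SESSION['captcha_code'] = $code;

$img = imagecreatetruecolor(120, 40);
$bg  = imagecolorallocate($img, 255, 255, 255);
$fg  = imagecolorallocate($img, 0, 0, 0);
imagefilledrectangle($img, 0, 0, 120, 40, $bg);
imagestring($img, 5, 20, 12, $code, $fg);

header('Content-Type: image/png');
imagepng($img);
imagedestroy($img);
?>

<?php
// download.php - serve the file only if the typed code matches.
session_start();

if (!isset($_POST['code'], $_SESSION['captcha_code'])
    || strcasecmp($_POST['code'], $_SESSION['captcha_code']) !== 0) {
    exit; // scripts that cannot solve the CAPTCHA get an empty response
}
unset($_SESSION['captcha_code']);        // one use per code
readfile('/path/to/protected/file.zip'); // placeholder path
?>

The download form simply shows the image from captcha.php and posts the typed code along with the request for the file.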