dabobert asked:
prevent remote download via curl

I'm trying to protect files on my server from being downloaded by scripts.  My team lead wants me to make it so that when a file is requested by curl, the client gets only an empty file, but I am unsure of how to set that up.  I understand that this may not be the most secure solution, so I am also curious about the downsides, if there are any.
slyong:

You can use the User-Agent string of curl to detect it:

if (preg_match('/curl/i', $_SERVER['HTTP_USER_AGENT'])) {
  die();
}

However, you would have to include or require that check in all of your PHP files.  A simpler way is to use Apache's mod_setenvif:

SetEnvIf User-Agent ^curl goaway
<Limit GET POST>
  Order Allow,Deny
  Allow from all
  Deny from env=goaway
</Limit>
dabobert (asker):
But the goal is not to determine whether or not the term "curl" is in the URL; the goal is to prevent the page from being remotely accessed by a script that may or may not be using curl.  Also, if I were to use curl commands in a file named foo.php, the remote download would still work, because "curl" is not in the URL of the referer.  Plus, the script could use curl or something like it to send an incorrect referer.
slyong:

> ...the goal is not to determine whether or not the term "curl" is in the url...

The script that I posted does not determine whether the term "curl" is in the URL; it checks the User-Agent string.  The User-Agent string contains information such as which browser and operating system are being used (http://en.wikipedia.org/wiki/User_agent).

The only way to do it is via the User-Agent string (we are not talking about the URL of the referer, but the User-Agent string here).  There are quite a lot of User-Agent strings out there (http://www.user-agents.org/).  There are a few approaches you may want to consider:

1) If you know which User-Agent string the automated download script you want to block uses (you can check it in the server log), use that User-Agent string to block it.
2) If you don't know the User-Agent string, you may want to allow only the ones you know (i.e. Mozilla, Internet Explorer, Firefox, Safari, Opera, Googlebot, etc.).
3) You may want to put a robots.txt file (http://www.robotstxt.org/wc/norobots.html) on your web server to see whether those scripts observe robots.txt.
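For option 2, a minimal mod_setenvif sketch of an allow-list might look like the following.  The User-Agent substrings here are illustrative, not exhaustive; check your own server logs before deploying something like this.  Note that Firefox, Safari, and Internet Explorer all begin their User-Agent strings with "Mozilla", so that one substring covers most mainstream browsers:

```apache
# Mark requests from known browsers, deny everything else.
# Substrings are examples only -- verify against your logs.
SetEnvIfNoCase User-Agent "(mozilla|opera|googlebot)" browser
<Limit GET POST>
  Order Deny,Allow
  Deny from all
  Allow from env=browser
</Limit>
```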

However, having said that, any program such as curl can send any User-Agent string to your server to spoof it.
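To illustrate that last point: curl's `-A` flag replaces its default "curl/x.y.z" User-Agent with any string you like, so a check for "curl" in the User-Agent is trivially bypassed.  The sketch below (the URL and version strings are placeholders, and the real request is shown only as a comment) replays the ^curl match locally against both a default and a spoofed User-Agent:

```shell
# Default UA that curl sends if you don't override it (version is an example)
DEFAULT_UA="curl/8.0.0"
# Any browser-like string bypasses a ^curl check (hypothetical UA)
SPOOFED_UA="Mozilla/5.0 (X11; Linux x86_64; rv:115.0) Gecko/20100101 Firefox/115.0"

# The actual spoofed request would look like (placeholder URL, not run here):
#   curl -A "$SPOOFED_UA" -o file.zip https://example.com/protected/file.zip

# Replay the server's ^curl prefix match against both strings:
for ua in "$DEFAULT_UA" "$SPOOFED_UA"; do
  case "$ua" in
    curl*) echo "blocked: User-Agent matches ^curl" ;;
    *)     echo "allowed: User-Agent does not match ^curl" ;;
  esac
done
```

The first line of output reports "blocked" for the default User-Agent and the second reports "allowed" for the spoofed one, which is exactly why User-Agent filtering is only a speed bump.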
You can fake your User-Agent using CURLOPT_HTTPHEADER (or CURLOPT_USERAGENT).
The best solution is to use a Turing test (CAPTCHA) on the page, which machines cannot read.
dabobert (asker):

steelseth12, could you please elaborate?

slyong, you were right that I was confusing User-Agent and referer, but as you and I tried (but failed) to say yesterday, the User-Agent can be spoofed.
ASKER CERTIFIED SOLUTION by mattjp88 (United States of America)
A Turing key, or CAPTCHA, is a randomly generated image of letters that the user needs to type in order to proceed to the next step.  Machines cannot (or at least the software spammers have at this moment in time cannot) recognise the letters in the image, so they cannot proceed to the next step.

Have a look at http://viebrock.ca/code/10/turing-protection for a sample script.