• Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 399

Keeping PDF Files Secure from Google Search

I have created a secure login for a client who has a page with links to PDF files. The page is secure and you must log in to see these links. The problem is that a user searching Google found a link to one of the PDF files and was able to download it. With a previous client, for whom I created a digital download, I prevented users from browsing directly to the file by creating a folder outside the web root and putting all the files there. The server I am working with in this case doesn't allow any access to files outside the web root. I know I can create an .htaccess file for the particular folder, but we are trying to avoid having the user log in to the page and then log in again every time they want to look at a PDF file.

Is there another way to prevent a user from searching on Google and accessing the PDF file directly?

1 Solution
Google cannot search inside database-driven pages; it indexes only what its crawler can reach by following links.
That is why database-driven e-commerce sites generate thousands of crawler-compatible pages.

Have you thought about storing the files in a database? The user's login and clicks would then be passed to a back-end SQL database, which fetches the files.
nigelspongeAuthor Commented:
This is a good idea. I will try it and get back to you.

Thank You, VJC
nociSoftware EngineerCommented:
Google can't make up links of its own; it will only follow links on your site.
So apparently there is a page, accessible (indirectly) without logging in, that points to the .pdf files.

You can ask Googlebot to ignore your site (the polite option), or you can block Google's address ranges from inbound access on port 80 (the blocking option).

Also check these references:
Prevent content on google:

And also: (more background)

nociSoftware EngineerCommented:
Other things:

Don't allow directory access; only allow access through an index.html/index.php etc.
Use robots.txt as a guard (and allow it to be read).
Place the files in a subdirectory.
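
As a rough illustration of the robots.txt point, here is a minimal Python sketch that checks what a rule would tell a well-behaved crawler. The directory name /pdfs/ and the example.com URLs are assumptions for illustration, not taken from the site in question. Note that robots.txt is only a request to crawlers; it does not stop a person who already has the link.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; "/pdfs/" is an assumed directory name.
ROBOTS_TXT = """\
User-agent: *
Disallow: /pdfs/
"""


def is_crawlable(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # The PDF directory is disallowed, the rest of the site is not.
    print(is_crawlable(ROBOTS_TXT, "Googlebot", "https://example.com/pdfs/report.pdf"))
    print(is_crawlable(ROBOTS_TXT, "Googlebot", "https://example.com/index.php"))
```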

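Set up directory security on the subdirectory to disallow anonymous access and require basic authentication, or edit the server configuration and use a <Directory> block:
<Directory /path/to/locked/dir>
  # Apache 2.2 syntax; on Apache 2.4 the equivalent is "Require all denied"
  Deny from all
</Directory>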

In the parent directory, create a script that, when run, opens a file in the subdirectory and sends it to the user's browser (bypassing the directory security, which only applies to direct web requests).

Make your special script validate that the user has authenticated with the secure login.
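
The gatekeeper script described above could look roughly like the following. This is a sketch in Python rather than PHP, and the `session` dictionary, its `authenticated` flag, and the function name `serve_protected_pdf` are all hypothetical; a real site would plug in whatever its secure login already stores. It also adds an `X-Robots-Tag: noindex` header, which is the HTTP-header counterpart of a noindex tag for files such as PDFs that cannot carry meta tags.

```python
from pathlib import Path


def serve_protected_pdf(session: dict, requested_name: str, locked_dir: Path):
    """Return (status, headers, body) for a request to download a PDF."""
    # Hypothetical session flag; assumed to be set once the secure login succeeds.
    if not session.get("authenticated"):
        return "403 Forbidden", [("Content-Type", "text/plain")], b"Login required"

    # Resolve the path and refuse anything that escapes the locked directory
    # (for example "../other.pdf") or that is not an existing .pdf file.
    base = locked_dir.resolve()
    target = (base / requested_name).resolve()
    if (
        base not in target.parents
        or target.suffix.lower() != ".pdf"
        or not target.is_file()
    ):
        return "404 Not Found", [("Content-Type", "text/plain")], b"Not found"

    body = target.read_bytes()
    headers = [
        ("Content-Type", "application/pdf"),
        ("Content-Disposition", f'attachment; filename="{target.name}"'),
        ("Content-Length", str(len(body))),
        # Ask search engines not to index the file even if they obtain the URL.
        ("X-Robots-Tag", "noindex"),
    ]
    return "200 OK", headers, body
```

Because the script reads the file from disk itself, the "Deny from all" rule on the subdirectory does not get in its way; only direct browser requests to the subdirectory are refused.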

Either that, or use .htaccess site-wide for all client pages, and place them all as subdirectories of the common directory.

Instead of implementing login/logout yourself, have the webserver and the web browser handle it.

Use the server-side variables within your mod_php scripts to determine which user is currently logged in.
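
For illustration, when the web server handles basic authentication it passes the authenticated login name to the application in the standard CGI variable REMOTE_USER. The sketch below reads it from a Python WSGI environment; under mod_php the same value is normally exposed through `$_SERVER`. The function names and the realm string "Clients" are assumptions made for this example.

```python
def current_user(environ: dict):
    """Return the login name the web server authenticated, or None."""
    user = environ.get("REMOTE_USER", "")
    return user or None


def application(environ, start_response):
    """Minimal WSGI app that greets the user the server has authenticated."""
    user = current_user(environ)
    if user is None:
        # Normally the server rejects the request before it gets here;
        # this branch is a fallback in case authentication is not configured.
        start_response(
            "401 Unauthorized",
            [
                ("Content-Type", "text/plain"),
                ("WWW-Authenticate", 'Basic realm="Clients"'),
            ],
        )
        return [b"Authentication required"]
    start_response("200 OK", [("Content-Type", "text/plain; charset=utf-8")])
    return [f"Logged in as {user}".encode("utf-8")]
```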


Rich RumbleSecurity SamuraiCommented:
Search engines are supposed to respect the webmaster's wishes, so robots.txt and nofollow/noindex tags should be used, as linked to above.
It works for us: Yahoo, Culi, MSN, Google and even Archive.org all adhere to our wishes via the robots.txt and nofollow tags. http://www.archive.org/about/exclude.php
nigelspongeAuthor Commented:
thank you
Question has a verified solution.