Solved

Keeping PDF files secure from Google search

Posted on 2008-10-07
Last Modified: 2013-12-08
I have created a secure log-in for a client who has a page with links to PDF files. The page is secure and you must log in to see these links. The problem is that a user was searching Google and found a link to one of the PDF files, which he was able to download. With a previous client, for whom I created a digital download, I prevented users from browsing directly to the files by creating a folder outside the web root and putting all the files in there. The server I am working with in this case does not allow any access to files outside the web root. I know I can create an .htaccess file for the particular folder, but we are trying to avoid having the user log in to the page and then log in again every time they want to look at a PDF file.

Is there another way to prevent a user from searching on Google and accessing the PDF file directly?

Question by:nigelsponge
7 Comments
 
LVL 32

Expert Comment

by:aleghart
ID: 22664216
Google cannot search into database-driven pages.  It uses crawling technology.
That is why database-driven e-commerce sites will generate thousands of crawler-compatible pages.

Have you thought about storing the files in a database?  User logon and clicks are passed to a backend SQL database to fetch files.
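
A minimal sketch of the database idea, written in Python with the standard-library sqlite3 module purely for illustration (the thread does not say which language or database the site uses). The table name `documents`, the function names, and the `user_is_logged_in` flag are all hypothetical; the flag stands in for whatever check the existing secure log-in already performs.

```python
import sqlite3


def create_store(conn):
    # One row per file; the PDF bytes live in a BLOB column instead of on disk.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS documents ("
        "name TEXT PRIMARY KEY, content BLOB NOT NULL)"
    )


def save_document(conn, name, content):
    # Insert the file, or overwrite it if the name already exists.
    conn.execute(
        "INSERT OR REPLACE INTO documents (name, content) VALUES (?, ?)",
        (name, content),
    )
    conn.commit()


def fetch_document(conn, name, user_is_logged_in):
    # Hypothetical gate: hand back bytes only for a logged-in user.
    if not user_is_logged_in:
        return None
    row = conn.execute(
        "SELECT content FROM documents WHERE name = ?", (name,)
    ).fetchone()
    return bytes(row[0]) if row else None
```

Because the file never exists at a URL of its own, there is nothing under the web root for a crawler to request directly; every download has to go through the fetch function and its log-in check.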
 

Author Comment

by:nigelsponge
ID: 22665546
This is a good idea, I will try this and get back.

Thank You, VJC
 
LVL 40

Accepted Solution

by:
noci earned 375 total points
ID: 22673376
Google can't make up links of its own; it will only follow links on your site.
So apparently there is a page that is accessible (indirectly) without logging in that points to the .pdf files.

You can ask Googlebot to ignore your site (polite), or you can block Google's address ranges from inbound port 80 access (block).

Also check these references:
General:
http://www.google.com/support/webmasters/
Prevent content on google:
http://www.google.com/support/webmasters/bin/topic.py?topic=8459

And also: (more background)
http://www.google.com/support/webmasters/bin/answer.py?hl=en&answer=70897
http://www.smart-it-consulting.com/internet/google/googlebot-spoofer/
http://www.googleguide.com/google_works.html
http://www.google.com/support/webmasters/bin/answer.py?hl=en&answer=80553
 
LVL 40

Expert Comment

by:noci
ID: 22673404
Other things:

Don't allow directory access, only access through an index.html/index.php etc.
Use robots.txt as a guard (and allow it to be read).
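
As a rough illustration of the robots.txt guard, the sketch below checks a candidate rule set with Python's standard urllib.robotparser before it is published. The `/pdfs/` folder name, the example.com URLs, and the `is_crawlable` helper are assumptions for the example, not details from the question. Keep in mind that robots.txt is only a request to well-behaved crawlers: it does not stop anyone who already has the link from downloading the file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rule set: ask every crawler to stay out of the PDF folder.
ROBOTS_TXT = """\
User-agent: *
Disallow: /pdfs/
"""


def is_crawlable(robots_txt, user_agent, url):
    # Parse the rules from a string and ask whether this agent may fetch the URL.
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)
```

For example, `is_crawlable(ROBOTS_TXT, "Googlebot", "https://example.com/pdfs/report.pdf")` tells you whether the rules above would turn Googlebot away from that file.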
 
LVL 23

Expert Comment

by:Mysidia
ID: 22695707
Place the files in a subdirectory.

Set up directory security on the subdirectory to disallow anonymous access and require basic authentication, or edit the server configuration and use a <Directory> block such as:
<Directory /path/to/locked/dir>
  Deny from all
</Directory>

In the parent directory, create a script that, when run, opens a file in the subdirectory and sends the file to the user's browser (bypassing the directory security).

Make your special script validate that the user has authenticated with the secure login.
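
The gatekeeper script described above could look roughly like this. It is sketched in Python rather than the PHP this answer assumes, and the names `resolve_protected_file`, `serve_pdf`, and the `is_authenticated` flag are hypothetical; the flag stands in for the check against the existing secure login. The function returns a (status, headers, body) tuple instead of writing to a real web server, so it can be adapted to whatever framework is in use.

```python
from pathlib import Path


def resolve_protected_file(protected_dir, requested_name, is_authenticated):
    # Refuse outright if the secure login has not been passed.
    if not is_authenticated:
        return None
    base = Path(protected_dir).resolve()
    candidate = (base / requested_name).resolve()
    # Reject names such as "../secret" that escape the protected directory.
    if base not in candidate.parents:
        return None
    # Only hand out existing PDF files.
    if candidate.suffix.lower() != ".pdf" or not candidate.is_file():
        return None
    return candidate


def serve_pdf(protected_dir, requested_name, is_authenticated):
    path = resolve_protected_file(protected_dir, requested_name, is_authenticated)
    if path is None:
        return 403, {"Content-Type": "text/plain"}, b"Forbidden"
    headers = {
        "Content-Type": "application/pdf",
        "Content-Disposition": f'attachment; filename="{path.name}"',
    }
    return 200, headers, path.read_bytes()
```

The links on the secure page would then point at the script (with the file name as a parameter) instead of at the PDF itself, so a URL found through a search engine leads to the login check rather than to the file.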


Either that, or use .htaccess site-wide for all client pages, and place them all as subdirectories of a common directory.

Instead of implementing login/logout yourself, have the webserver and the web browser handle it.

Use the server-side variables within your mod_php scripts to determine which user is currently logged in.

$_SERVER['PHP_AUTH_USER']
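
For anyone not on mod_php, the same username can be recovered from the HTTP Basic `Authorization` header that the browser sends. The sketch below is in Python and the helper name `basic_auth_user` is hypothetical. It only extracts the username; it assumes the web server has already verified the password, as in the setup described above.

```python
import base64
import binascii


def basic_auth_user(authorization_header):
    # Expect a header of the form "Basic <base64 of user:password>".
    if not authorization_header:
        return None
    scheme, _, encoded = authorization_header.partition(" ")
    if scheme.lower() != "basic" or not encoded:
        return None
    try:
        decoded = base64.b64decode(encoded.strip(), validate=True).decode("utf-8")
    except (binascii.Error, UnicodeDecodeError):
        return None
    # The username is everything before the first colon.
    username, sep, _password = decoded.partition(":")
    return username if sep else None
```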




 
LVL 38

Expert Comment

by:Rich Rumble
ID: 22758211
Search engines are supposed to respect the webmaster's wishes, so robots.txt and nofollow/noindex tags should be used, as linked to above:
http://www.google.com/support/webmasters/bin/answer.py?answer=93708&topic=8846
It works for us: Yahoo, Cuil, MSN, Google and even Archive.org all adhere to our wishes via the robots and nofollow tags. http://www.archive.org/about/exclude.php
-rich
 

Author Closing Comment

by:nigelsponge
ID: 31504005
thank you
Question has a verified solution.