Solved

Advice on Web Crawlers...

Posted on 2014-11-24
Last Modified: 2014-11-27
Hi,

I have a Drupal website and am thinking of creating some pages that provide information to some software that I am developing.  

I envisage that the software contains predefined links to specific pages and other than that there are no links to the pages.

I do NOT want the content of these pages to appear in search engine results.  My question is:  

If I create a page:  www.drTribos.com.au/<SomeRandomString> and don't make any links to it, will a search engine be able (ok let's say likely) to find it?

Are there any recommendations that people can make?  Please let me know if I need to clarify the concept.

TIA
Question by:DrTribos
9 Comments
 
LVL 83

Accepted Solution

by:
Dave Baldwin earned 400 total points
ID: 40461839
'robots.txt' is used to tell 'legitimate' search engine web crawlers to stay out.  The spam scanners will ignore that.  http://www.robotstxt.org/robotstxt.html
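As a rough illustration of what a well-behaved crawler does with robots.txt, here is a minimal Python sketch using the standard library's `urllib.robotparser`. The `/software-notes/` path and the example.com URLs are made up for the example; they are not taken from the site in question. It only shows the rule being honoured, and does nothing about crawlers that ignore the file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt body. "/software-notes/" is an assumed path
# chosen for illustration, not one that exists on the asker's site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /software-notes/
"""


def is_allowed(robots_txt: str, url: str, user_agent: str = "*") -> bool:
    """Return True if a crawler that honours robots.txt may fetch the URL."""
    parser = RobotFileParser()
    # parse() takes the file content as an iterable of lines.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in (
        "http://www.example.com/software-notes/bug-123",
        "http://www.example.com/about",
    ):
        print(url, "->", "allowed" if is_allowed(ROBOTS_TXT, url) else "blocked")
```

Keep in mind that the file is itself public, so listing a path there also tells anyone who reads it where the pages live.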

There is some suspicion that it doesn't really work, though.  The crawlers won't list the page in the results, but it seems that they do keep track no matter what they say.  All browsers now check every site and page you go to against a malware database.  Firefox has been using Google's for years, as does Chrome, and IE uses Microsoft's.  If you bring it up in your browser, someone somewhere will know.  And you might find people you never heard of downloading your page.
 
LVL 53

Assisted Solution

by:COBOLdinosaur
COBOLdinosaur earned 50 total points
ID: 40462837
Yeah, Dave is right.  The simple answer is that if you put a page on a public-facing server, it will get discovered, even without a link to it.  If it is in a directory that is secured and inaccessible, it will have some protection, unless a curious hacker finds a security hole.

Of course if it gets discovered then it is possible that someone will download it and put it on another site or post links to it.  You should never put anything on a publicly accessible server that you want kept confidential.

Cd&
 
LVL 70

Assisted Solution

by:Jason C. Levine
Jason C. Levine earned 50 total points
ID: 40462955
Agree with Cd&.

Password protect the page/folder to keep spiders out
Don't post it at all if it's sensitive (unless you really, really know what you're doing)
 
LVL 15

Author Closing Comment

by:DrTribos
ID: 40463450
Thanks guys... I was actually a little surprised when I first read Dave's answer, now I think I'm surprised that I was surprised... :-/

My information is not super sensitive... I am developing some software which has automatic bug reporting.  Among other things, the bug tracker I use detects duplicates and tracks frequency.  This provides me with the opportunity to notify the user (who just experienced the bug) if:
- the bug is known
- there is a workaround
- there is an upgrade

I was planning on making some web pages to describe workaround information.   I'm just in two minds about broadcasting this to the entire web.

I think I can put pages of that nature in a specific folder which is protected by a .htaccess file - not sure of the best way to implement it.

Cheers,
 
LVL 83

Expert Comment

by:Dave Baldwin
ID: 40463475
If you're running on Apache, you can use '.htaccess' to implement Basic Auth security which will keep people out that don't have the password including search engine robots.  http://httpd.apache.org/docs/current/howto/auth.html
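Since the software itself would have to get past the password, here is a minimal Python sketch of the client side of Basic Auth, assuming the pages end up behind it. The function names, URL and credentials are hypothetical. Basic Auth only base64-encodes the credentials, it does not encrypt them, so this really wants HTTPS.

```python
import base64
import urllib.request

# For reference, a typical Basic Auth .htaccess looks roughly like this
# (the AuthUserFile path is a placeholder):
#   AuthType Basic
#   AuthName "Restricted"
#   AuthUserFile /path/to/.htpasswd
#   Require valid-user


def basic_auth_header(username: str, password: str) -> str:
    """Build the value of the Authorization header for HTTP Basic Auth."""
    raw = f"{username}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(raw).decode("ascii")


def build_request(url: str, username: str, password: str) -> urllib.request.Request:
    """Return a Request that carries Basic Auth credentials (nothing is sent yet)."""
    request = urllib.request.Request(url)
    request.add_header("Authorization", basic_auth_header(username, password))
    return request


if __name__ == "__main__":
    # Hypothetical URL and credentials, for illustration only.
    req = build_request("https://www.example.com/software-notes/bug-123",
                        "reporter", "s3cret")
    print(req.get_header("Authorization"))
    # To actually fetch the page: urllib.request.urlopen(req).read()
```

Any password shipped inside distributed software can be extracted by a determined user, so this keeps robots and casual visitors out rather than making the pages truly secret.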

As far as I know, there is literally nothing you can do about the page reporting done by the browsers.  Supposedly you can turn it off but I don't know that it really works.  Years ago there was a question here by someone who had uploaded a file by FTP and only looked at it once in their browser.  There were no links to it anywhere.  So they were quite surprised when they saw in their logs that someone from Global Crossing had downloaded the file.  Global Crossing was bought by Level3, which is one of the biggest network providers you never heard of because they don't do residential or 'last mile' networking.  They connect ISPs to each other.  The chances are very good that your request for this page went through part of Level3's network.
 
LVL 15

Author Comment

by:DrTribos
ID: 40463526
Wow...

I can probably cope with using .htaccess.  Thanks for the link :-D
 
LVL 83

Expert Comment

by:Dave Baldwin
ID: 40463578
You're welcome.  So now you understand why I say...

If you want privacy... turn off the computer and walk away.
 
LVL 10

Expert Comment

by:oliverpolden
ID: 40468605
I realise this has already been accepted, but I wanted to cover the options for protecting pages, of which there are loads:
 - Put the page behind a login (the obvious answer)
 - Protected pages module: https://www.drupal.org/project/protected_pages
 - Premium pages: https://www.drupal.org/project/nopremium
Plus many more.

It sounds like protected pages is the right one for you. You cannot use a .htaccess file in Drupal to secure a "folder of pages" since Drupal serves all pages out of the database via index.php.

To add to the point about discovery of non-linked-to pages, there are lots of reasons they could be discovered by search engines:
 - Server misconfiguration that exposes directory listings
 - Automatically generated sitemaps
 - Automatically generated feeds e.g. RSS
 - Some unexpected link from elsewhere on the site
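
For the sitemap case in particular, a quick audit is easy to script. This is only a sketch in Python: it assumes a standard sitemaps.org XML sitemap, and the `/software-notes/` prefix and sample URLs are invented for the example.

```python
import xml.etree.ElementTree as ET
from typing import List
from urllib.parse import urlparse

# Namespace used by standard sitemaps.org sitemap files.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Hypothetical sitemap content, for illustration only.
SAMPLE_SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://www.example.com/about</loc></url>
  <url><loc>http://www.example.com/software-notes/bug-123</loc></url>
</urlset>"""


def leaked_urls(sitemap_xml: str, private_prefix: str) -> List[str]:
    """Return sitemap URLs whose path starts with the supposedly private prefix."""
    root = ET.fromstring(sitemap_xml)
    urls = [loc.text.strip()
            for loc in root.findall(".//sm:loc", SITEMAP_NS)
            if loc.text]
    return [u for u in urls if urlparse(u).path.startswith(private_prefix)]


if __name__ == "__main__":
    for url in leaked_urls(SAMPLE_SITEMAP, "/software-notes/"):
        print("Listed in sitemap:", url)
```

The same idea applies to RSS feeds: fetch what the site generates automatically and check that the unlinked pages are not in it.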

Hope that's helpful.
Oliver
 
LVL 15

Author Comment

by:DrTribos
ID: 40470059
Oliver, thank you for the extra info, very much appreciated.