Solved

Advice on Web Crawlers...

Posted on 2014-11-24
Medium Priority
162 Views
Last Modified: 2014-11-27
Hi,

I have a Drupal website and am thinking of creating some pages that provide information to some software that I am developing.  

I envisage that the software will contain predefined links to specific pages; other than that, there will be no links to those pages.

I do NOT want the content of these pages to appear in search engine results.  My question is:  

If I create a page at www.drTribos.com.au/<SomeRandomString> and don't make any links to it, will a search engine be able (OK, let's say likely) to find it?

Are there any recommendations that people can make?  Please let me know if I need to clarify the concept.

TIA
Question by: DrTribos
9 Comments
 
LVL 84

Accepted Solution

by: Dave Baldwin (earned 1600 total points)
ID: 40461839
'robots.txt' is used to tell 'legitimate' search engine crawlers to stay out; spam crawlers will simply ignore it.  http://www.robotstxt.org/robotstxt.html
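
For example, a minimal robots.txt at the web root (the /private/ path here is just a placeholder for wherever you put the pages) would look like this:

    # robots.txt - served from the site root, e.g. www.example.com/robots.txt
    # Well-behaved crawlers will skip anything under /private/.
    # Trade-off: this file is public, so it also advertises the path
    # to anyone curious enough to read it.
    User-agent: *
    Disallow: /private/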

There is some suspicion that it doesn't really work, though.  They won't list the page in the results, but it seems that they keep track of it no matter what they say.  All browsers now check every site and page you go to against a malware database.  Firefox has been using Google's for years, as does Chrome, and IE uses Microsoft's.  If you bring the page up in your browser, someone somewhere will know.  And you might find people you never heard of downloading your page.
 
LVL 53

Assisted Solution

by: COBOLdinosaur (earned 200 total points)
ID: 40462837
Yeah, Dave is right.  The simple answer is that if you put a page on a public-facing server, it will get discovered.  Even without a link to it, it will be found.  A directory that is secured and inaccessible gives it some protection, unless a curious hacker finds a security hole.

Of course, if it gets discovered, it is possible that someone will download it and put it on another site or post links to it.  You should never put anything on a publicly accessible server that you want kept confidential.

Cd&
 
LVL 70

Assisted Solution

by: Jason C. Levine (earned 200 total points)
ID: 40462955
Agree with Cd&.

 - Password-protect the page/folder to keep spiders out
 - Don't post it at all if it's sensitive (unless you really, really know what you're doing)
 
LVL 15

Author Closing Comment

by: DrTribos
ID: 40463450
Thanks guys... I was actually a little surprised when I first read Dave's answer, now I think I'm surprised that I was surprised... :-/

My information is not super sensitive... I am developing some software which has automatic bug reporting.  Among other things, the bug tracker I use detects duplicates and tracks frequency.  This gives me the opportunity to notify the user (who just experienced the bug) if:
- the bug is known
- there is a workaround
- there is an upgrade

I was planning on making some web pages to provide workaround information.  I'm just in two minds about broadcasting it to the entire web.

I think I can put pages of that nature in a specific folder protected by a .htaccess file - I'm just not sure of the best way to implement it.

Cheers,
 
LVL 84

Expert Comment

by: Dave Baldwin
ID: 40463475
If you're running on Apache, you can use '.htaccess' to implement Basic Auth security, which will keep out anyone who doesn't have the password, including search engine robots.  http://httpd.apache.org/docs/current/howto/auth.html
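
A minimal sketch of what that looks like (Apache 2.4 assumed; the paths, realm name, and user name are placeholders to adapt):

    # .htaccess placed in the directory you want to protect.
    # The server must have AllowOverride AuthConfig enabled for this directory.
    AuthType Basic
    AuthName "Restricted Area"
    # Keep the password file outside the web root; create it with:
    #   htpasswd -c /etc/apache2/.htpasswd someuser
    AuthUserFile /etc/apache2/.htpasswd
    Require valid-user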

As far as I know, there is literally nothing you can do about the page reporting done by the browsers.  Supposedly you can turn it off, but I don't know that it really works.  Years ago there was a question here from someone who had uploaded a file by FTP and looked at it only once in their browser.  There were no links to it anywhere.  So they were quite surprised when they saw in their logs that someone from Global Crossing had downloaded the file.  Global Crossing was later bought by Level 3, which is one of the biggest network providers you've never heard of, because they don't do residential or 'last mile' networking.  They connect ISPs to each other.  The chances are very good that your request for this page went through part of Level 3's network.
 
LVL 15

Author Comment

by: DrTribos
ID: 40463526
Wow...

I can probably cope with using .htaccess - thanks for the link :-D
 
LVL 84

Expert Comment

by: Dave Baldwin
ID: 40463578
You're welcome.  So now you understand why I say...

If you want privacy... turn off the computer and walk away.
 
LVL 10

Expert Comment

by: oliverpolden
ID: 40468605
I realise this has already been accepted, but I wanted to cover the options for protecting pages, of which there are loads:
 - Put the page behind a login (the obvious answer)
 - Protected pages module: https://www.drupal.org/project/protected_pages
 - Premium pages: https://www.drupal.org/project/nopremium
Plus many more.

It sounds like Protected Pages is the right one for you. You cannot use a .htaccess file in Drupal to secure a "folder of pages", since Drupal serves all pages out of the database via index.php - the paths are not real directories on disk.
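
An aside: if you control the main Apache config rather than just .htaccess, <Location>-style directives match the request URL rather than the filesystem, so a sketch like the following - with /workarounds/ as a made-up path prefix - can still gate a set of Drupal paths:

    # In the Apache vhost config (not .htaccess).
    # <LocationMatch> matches URL-space, so it applies even though
    # Drupal serves these pages from the database via index.php.
    <LocationMatch "^/workarounds/">
        AuthType Basic
        AuthName "Workaround pages"
        AuthUserFile /etc/apache2/.htpasswd
        Require valid-user
    </LocationMatch>

That said, the Protected Pages module keeps everything inside Drupal, which is usually simpler.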

To add to the discussion of how non-linked pages get discovered, there are lots of ways search engines could find them (see also the sketch after this list):
 - Server misconfiguration that exposes directory listings
 - Automatically generated sitemaps
 - Automatically generated feeds e.g. RSS
 - Some unexpected link from elsewhere on the site
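
As a belt-and-braces measure - my suggestion, not something any of these modules require - a page that does get discovered can still ask legitimate engines not to index it, via a robots meta tag in its <head>; in Drupal this can be set with the Metatag module (https://www.drupal.org/project/metatag) or in a theme template:

    <!-- Well-behaved search engines will neither index this page nor
         follow its links; hostile crawlers will simply ignore it. -->
    <meta name="robots" content="noindex, nofollow">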

Hope that's helpful.
Oliver
 
LVL 15

Author Comment

by: DrTribos
ID: 40470059
Oliver, thank you for the extra info, very much appreciated.
