Solved

Advice on Web Crawlers...

Posted on 2014-11-24
Last Modified: 2014-11-27
Hi,

I have a Drupal website and am thinking of creating some pages that provide information to some software that I am developing.  

I envisage the software containing predefined links to specific pages; apart from those, nothing would link to the pages.

I do NOT want the content of these pages to appear in search engine results.  My question is:  

If I create a page:  www.drTribos.com.au/<SomeRandomString> and don't make any links to it, will a search engine be able (ok let's say likely) to find it?

Are there any recommendations that people can make?  Please let me know if I need to clarify the concept.

TIA
Question by:DrTribos
9 Comments
 
LVL 82

Accepted Solution

by:
Dave Baldwin earned 400 total points
ID: 40461839
'robots.txt' is used to tell 'legitimate' search engine web crawlers to stay out.  The spam scanners will ignore that.  http://www.robotstxt.org/robotstxt.html

There is some suspicion that robots.txt doesn't fully work, though.  The search engines won't list the page in their results, but it seems that they keep track of it no matter what they say.  All browsers now check every site and page you go to against a malware database.  Firefox has been using Google's for years, as does Chrome, and IE uses Microsoft's.  If you bring the page up in your browser, someone somewhere will know.  And you might find people you never heard of downloading your page.
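As a rough illustration of how a well-behaved crawler interprets robots.txt, here is a minimal Python sketch using the standard library's urllib.robotparser. The "/bug-info/" path, the "ExampleBot" name and the example.com URLs are assumptions made up for this example, not anything from the site in question. It only shows what a compliant crawler would do; as noted above, spam scanners can simply ignore the file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content. The "/bug-info/" path is an assumption
# invented for this example.
ROBOTS_TXT = """\
User-agent: *
Disallow: /bug-info/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if a crawler that honours robots.txt may fetch the URL."""
    parser = RobotFileParser()
    # parse() takes the file as a list of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# "ExampleBot" and the example.com URLs are placeholders.
blocked = is_allowed(ROBOTS_TXT, "ExampleBot",
                     "http://www.example.com/bug-info/workaround-1")
allowed = is_allowed(ROBOTS_TXT, "ExampleBot",
                     "http://www.example.com/about")
print(blocked, allowed)
```

Note that listing a path in robots.txt also announces that the path exists, so it is a request to crawlers rather than any form of access control.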
 
LVL 53

Assisted Solution

by:COBOLdinosaur
COBOLdinosaur earned 50 total points
ID: 40462837
Yeah, Dave is right.  The simple answer is that if you put a page on a public-facing server, it will get discovered, even without a link to it.  If it is in a directory that is secured and inaccessible, it will have some protection, unless a curious hacker finds a security hole.

Of course if it gets discovered then it is possible that someone will download it and put it on another site or post links to it.  You should never put anything on a publicly accessible server that you want kept confidential.

Cd&
 
LVL 70

Assisted Solution

by:Jason C. Levine
Jason C. Levine earned 50 total points
ID: 40462955
Agree with Cd&.

Password protect the page/folder to keep spiders out
Don't post it at all if it's sensitive (unless you really, really know what you're doing)
 
LVL 14

Author Closing Comment

by:DrTribos
ID: 40463450
Thanks guys... I was actually a little surprised when I first read Dave's answer, now I think I'm surprised that I was surprised... :-/

My information is not super sensitive... I am developing some software which has automatic bug reporting.  Among other things, the bug tracker I use detects duplicates and tracks frequency.  This gives me the opportunity to tell the user (who just experienced the bug) whether:
- the bug is known
- there is a workaround
- there is an upgrade

I was planning on making some web pages to describe workaround information.   I'm just in two minds about broadcasting this to the entire web.

I think I can put pages of that nature in a specific folder which is protected by an .htaccess file, though I'm not sure of the best way to implement it.

Cheers,
LVL 82

Expert Comment

by:Dave Baldwin
ID: 40463475
If you're running on Apache, you can use '.htaccess' to implement Basic Auth security which will keep people out that don't have the password including search engine robots.  http://httpd.apache.org/docs/current/howto/auth.html
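On the client side, software that fetches a page protected by Basic Auth has to send an Authorization header with each request. The following is a minimal Python sketch of that, using only the standard library. The URL, username and password are placeholders invented for this example. Basic Auth only base64-encodes the credentials rather than encrypting them, so this assumes the page is served over HTTPS.

```python
import base64
import urllib.request


def basic_auth_header(username: str, password: str) -> str:
    """Build the value of an HTTP Basic 'Authorization' header."""
    raw = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(raw).decode("ascii")
    return f"Basic {token}"


def build_request(url: str, username: str, password: str) -> urllib.request.Request:
    """Prepare a request that carries Basic Auth credentials."""
    request = urllib.request.Request(url)
    request.add_header("Authorization", basic_auth_header(username, password))
    return request


# Placeholder URL and credentials, purely for illustration.
request = build_request(
    "https://www.example.com/bug-info/workaround-1", "reporter", "s3cret"
)
print(request.get_header("Authorization"))

# urllib.request.urlopen(request) would perform the actual fetch; it is left
# out here so the sketch runs without network access.
```

Keep in mind that credentials embedded in distributed software can be extracted by anyone who has the software, so this keeps crawlers out more reliably than it keeps determined users out.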

As far as I know, there is literally nothing you can do about the page reporting done by the browsers.  Supposedly you can turn it off but I don't know that it really works.  Years ago there was a question here by someone who had uploaded a file by FTP and only looked at it once in their browser.  There were no links to it anywhere.  So they were quite surprised when they saw in their logs that someone from Global Crossing had downloaded the file.  Global Crossing was bought by Level3, which is one of the biggest network providers you never heard of because they don't do residential or 'last mile' networking.  They connect ISPs to each other.  The chances are very good that your request for this page went through part of Level3's network.
 
LVL 14

Author Comment

by:DrTribos
ID: 40463526
Wow...

I can probably cope with using .htaccess.  Thanks for the link :-D
 
LVL 82

Expert Comment

by:Dave Baldwin
ID: 40463578
You're welcome.  So now you understand why I say...

If you want privacy... turn off the computer and walk away.
 
LVL 9

Expert Comment

by:oliverpolden
ID: 40468605
I realise this has already been accepted, but I wanted to cover the options for protecting pages, of which there are loads:
 - Put the page behind a login (the obvious answer)
 - Protected pages module: https://www.drupal.org/project/protected_pages
 - Premium pages: https://www.drupal.org/project/nopremium
Plus many more.

It sounds like protected pages is the right one for you. You cannot use a .htaccess file in Drupal to secure a "folder of pages" since Drupal serves all pages out of the database via index.php.

To add to the point about discovery of non-linked-to pages, there are lots of ways they could be discovered by search engines:
 - Server misconfiguration that exposes directory listings
 - Automatically generated sitemaps
 - Automatically generated feeds e.g. RSS
 - Some unexpected link from elsewhere on the site
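One of these leaks, an automatically generated sitemap, is easy to check for. The sketch below is a minimal Python example that parses a sitemap in the standard sitemaps.org XML format and reports whether a given URL is listed. The sample XML and the example.com URLs are made up for illustration; a real check would first download the site's own sitemap, whose location depends on how the site is configured.

```python
import xml.etree.ElementTree as ET

# Namespace used by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Made-up sample sitemap for illustration only.
SAMPLE_SITEMAP = """\
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://www.example.com/</loc></url>
  <url><loc>http://www.example.com/bug-info/workaround-1</loc></url>
</urlset>
"""


def urls_in_sitemap(sitemap_xml: str) -> set:
    """Return the set of page URLs listed in a sitemap document."""
    root = ET.fromstring(sitemap_xml)
    return {
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text
    }


listed = urls_in_sitemap(SAMPLE_SITEMAP)
# The "hidden" page shows up in the sitemap, so crawlers can find it
# even though nothing on the site links to it.
leaked = "http://www.example.com/bug-info/workaround-1" in listed
print(leaked)
```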

Hope that's helpful.
Oliver
 
LVL 14

Author Comment

by:DrTribos
ID: 40470059
Oliver, thank you for the extra info, very much appreciated.