

Google bot requesting non-existent pages buried in appended directory names

Posted on 2011-05-09
Medium Priority
Last Modified: 2013-12-08
I was looking through my server request logs and see that Googlebot is requesting all kinds of bad URLs on my site.  It seems to be appending directory names to each other.

For example:  http://www.mydomain.com/directory1/directory2/directory3/directory4/directory5/directory6/randomfile.cfm

All of the directories appended to each other are top level directories.

Any idea why this is happening and how I can fix it?
Question by:MFredin
LVL 12

Expert Comment

ID: 35732207
Although not strictly necessary if you're willing to put up with whatever Google does, there are ways to help you control how robots search your site.

Google recommends using a sitemap. If you aren't using one, they're described here: http://www.sitemaps.org/

A robots.txt file may be useful too. They're described at http://www.robotstxt.org/.

There are many tools available to help you automate the creation of sitemaps and robots.txt files.
Here are Google links to sitemap generators and robots.txt info
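To make the sitemap suggestion concrete: a minimal sitemap is just an XML file listing the URLs you want crawled (the URLs below are hypothetical, matching the question's example domain):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://www.mydomain.com/</loc>
    <lastmod>2011-05-09</lastmod>
  </url>
  <url>
    <loc>http://www.mydomain.com/directory1/index.cfm</loc>
  </url>
</urlset>
```

You then point Google at it through Webmaster Tools, or with a `Sitemap:` line in robots.txt.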
LVL 84

Expert Comment

by:Dave Baldwin
ID: 35732417
Like @Amick said, Google Sitemaps are the best way.  If the links are coming from another site for some reason, you won't be able to do much about it.   Do a Google search to see if they are getting those links somewhere else.
LVL 23

Accepted Solution

Tony McCreath earned 1000 total points
ID: 35735050
This is often caused by badly written links.

In this case I would suspect some of your links are of the form "directory1/" rather than "/directory1/".

The first form is a relative link: if the crawler is already inside a directory, the link is appended to the current path, producing exactly the kind of URL in your example. The second form has a leading /, which makes the link relative to the root of the website.

If you can't spot the bad links, I'd suggest you use a link checker (like Xenu's Link Sleuth) to try and find them.
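The resolution difference is easy to demonstrate with Python's standard-library `urljoin` (the URLs are hypothetical, following the question's example):

```python
from urllib.parse import urljoin

# A page served from inside a directory on the site.
base = "http://www.mydomain.com/directory1/"

# Relative link (no leading /): resolved against the current
# directory, so the path grows one level deeper on every hop.
print(urljoin(base, "directory2/randomfile.cfm"))
# http://www.mydomain.com/directory1/directory2/randomfile.cfm

# Root-relative link (leading /): resolved against the site root,
# so the path stays the same no matter where the page lives.
print(urljoin(base, "/directory2/randomfile.cfm"))
# http://www.mydomain.com/directory2/randomfile.cfm
```

Run against every page on the site, the first form compounds: each crawl of a page containing such a link yields a URL one directory deeper, which matches the ever-growing paths in the logs.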
LVL 29

Assisted Solution

fibo earned 1000 total points
ID: 35866638

1. Google Sitemaps are NOT a solution to this problem.

A Google sitemap does not say "this is the complete list of pages and nothing else should be indexed." What it says is "your spiders will discover the pages on my site anyway, but to help you, here is a list of pages; I have tried to make it as complete as possible, but I would not be surprised if your spiders discover others."
That last point matters, since it is also what allows you to use several sitemaps when needed.

2. The solution is robots.txt

These loops are probably caused by some error in the site code (I know; I have one on a site of mine that I cannot solve). If you cannot find the faulty code, then the only fallback is robots.txt.
Again, robots.txt does not say "it is forbidden to do otherwise than what is written here"; it says "if you are a nice, well-educated spider/robot, you will do as written here."
Even a nice spider, i.e. the ones that count, may still discover your "wrong" URLs through links on other sites; but when it follows such a link and robots.txt tells it not to crawl those paths, it will not index them.
So you should build a list of all the wrong 2-level (or is it 3-level?) directory prefixes that should not be explored, and place it in your robots.txt.
Note that this will not remove the pages already listed in the search engines...
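A sketch of what such a robots.txt might look like (the directory names are assumed from the question's example; list the actual bad prefixes you see in your logs):

```
User-agent: *
Disallow: /directory1/directory2/
Disallow: /directory1/directory3/
Disallow: /directory2/directory1/
```

Each Disallow line is a path prefix, so blocking the bad two-level combinations also blocks everything nested deeper under them.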

3. Consider using 301 rewrites

All of this could be solved if it is simple to match all of your wrong paths with a regexp mod_rewrite rule in .htaccess: a 301 redirect would clean things up for all spiders (even ill-behaved ones).
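A hedged sketch of such a rule, assuming mod_rewrite is enabled and the directory names really follow the "directoryN" pattern from the question (adjust the regexp to your actual names, and test before deploying):

```apache
RewriteEngine On
# If the URL has two or more directory segments stacked before the
# file, 301-redirect to just the last segment plus the file name.
RewriteRule ^(?:directory[0-9]+/)+?(directory[0-9]+/[^/]+)$ /$1 [R=301,L]
```

Legitimate single-directory URLs do not match the pattern (it requires at least two stacked segments), and the redirect target has only one, so the rule cannot loop.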
LVL 29

Expert Comment

ID: 36132386
I think the thread proposes solutions which should be evaluated.

Not evaluating mine (I obviously have some bias!), I think the best contribution is Tiggerito's  #a35735050 because it points to a probable cause of the problem, thus curing the problem rather than its symptoms.
LVL 143

Expert Comment

by:Guy Hengel [angelIII / a3]
ID: 36206033
This question has been classified as abandoned and is closed as part of the Cleanup Program. See the recommendation for more details.
LVL 29

Expert Comment

ID: 36179783
Thx Angel
