Solved

GoogleBot Timeout errors

Posted on 2010-11-15
1,012 Views
Last Modified: 2012-05-10
Hello Experts,
Google has stopped indexing my site for some reason. I moved the site to my own servers a few weeks ago, and since then Google has stopped indexing it.
I get the following error:
We encountered an error while trying to access your Sitemap. Please ensure your Sitemap follows our guidelines and can be accessed at the location you provided and then resubmit.
I am able to access both the sitemap and the robots.txt file without any problems and without any timeout errors.
I am thinking my firewall is blocking Googlebot from accessing my site.
Does anyone know which ports need to be open to allow Googlebot to index my site?
My site is located in the DMZ, and my firewall is a Palo Alto firewall.
Thank you
Roy
Question by:rfinaly
8 Comments
 
LVL 83

Accepted Solution

by:Dave Baldwin earned 500 total points
ID: 34141710
Googlebot comes in on port 80 like a web browser would.  Can you post your web address so we can check it out?
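
One quick way to confirm that port 80 (and 443, if you serve HTTPS) is actually reachable from outside the firewall is a plain TCP connection test. A rough Python sketch, assuming the hostname that comes up later in this thread, and assuming you run it from a machine on the public internet rather than from inside the DMZ:

# Sketch: check that the web ports are reachable from OUTSIDE the firewall.
# Run this from a host on the public internet, not from inside the DMZ.
import socket

HOST = "www.usuniversity.edu"  # hostname taken from later in the thread

for port in (80, 443):
    try:
        with socket.create_connection((HOST, port), timeout=10):
            print("port %d: TCP connection succeeded" % port)
    except OSError as exc:
        print("port %d: connection failed (%s)" % (port, exc))

If port 80 connects fine from outside, the problem is more likely something filtering Googlebot specifically than a closed port.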
 

Author Comment

by:rfinaly
ID: 34146198
 
LVL 83

Expert Comment

by:Dave Baldwin
ID: 34148408
Your 'robots.txt' file doesn't look right to me. 'Bad robots' typically ignore 'robots.txt'. You may be blocking Googlebot because it comes in as:

Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
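
If the firewall or web server filters by user-agent, a request that identifies itself as Googlebot will behave differently from a normal browser request. A rough sketch of that comparison, assuming Python and the robots.txt URL posted later in the thread; run it from outside the firewall:

# Sketch: fetch the same URL with a browser-style user-agent and with
# Googlebot's user-agent; a hang or error only on the Googlebot request
# points at user-agent filtering somewhere in the path.
import urllib.request

URL = "http://www.usuniversity.edu/robots.txt"  # URL from later in the thread
USER_AGENTS = {
    "browser": "Mozilla/5.0",
    "googlebot": "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
}

for name, ua in USER_AGENTS.items():
    req = urllib.request.Request(URL, headers={"User-Agent": ua})
    try:
        with urllib.request.urlopen(req, timeout=15) as resp:
            print("%s: HTTP %s, %d bytes" % (name, resp.status, len(resp.read())))
    except Exception as exc:
        print("%s: request failed (%s)" % (name, exc))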


 

Author Comment

by:rfinaly
ID: 34150023
I logged in to Google Webmaster Tools and generated a new robots.txt file.
Here is the file http://www.usuniversity.edu/robots.txt
When testing the robots.txt file with the Google tools, I get:
http://www.usuniversity.edu/ Allowed by line 2: Allow: /
Detected as a directory; specific files may have different restrictions.

I am assuming it is good? I will resubmit my sitemap files and see what happens.
Thank you
Roy
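
If it helps as a second opinion on the generated file, Python's standard-library robots.txt parser can answer the same "is Googlebot allowed?" question outside of Google's own tester. A small sketch, using the URLs already posted in this thread:

# Sketch: parse the live robots.txt and ask whether Googlebot may fetch
# the site root and the sitemap page.
from urllib.robotparser import RobotFileParser

rp = RobotFileParser("http://www.usuniversity.edu/robots.txt")
rp.read()  # downloads and parses the file

for url in ("http://www.usuniversity.edu/",
            "http://www.usuniversity.edu/sitemap.html"):
    print(url, "->", "allowed" if rp.can_fetch("Googlebot", url) else "blocked")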
 

Author Comment

by:rfinaly
ID: 34150472
I also tested a few pages with the Google Fetch as Googlebot tool, and this is what I get:
This is how Googlebot fetched the page.

URL: http://www.usuniversity.edu/

Date: Tue Nov 16 13:46:57 PST 2010

Googlebot Type: Web

When submitting the sitemap.html file I get:
URL timeout: robots.txt timeout
http://www.usuniversity.edu/sitemap.html

Should I remove the robots.txt altogether?
Roy
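
Before removing robots.txt, it may be worth timing the fetch from an outside host: Google reports "robots.txt timeout" when its fetcher gives up waiting, which can happen even though the file loads instantly from inside the network. A rough sketch, again assuming Python and an external vantage point:

# Sketch: time a robots.txt fetch from outside the firewall. A response
# that takes tens of seconds (or never arrives) would explain Google's
# "robots.txt timeout" even though the file itself is valid.
import time
import urllib.request

URL = "http://www.usuniversity.edu/robots.txt"

start = time.time()
try:
    with urllib.request.urlopen(URL, timeout=60) as resp:
        body = resp.read()
        status = resp.status
    print("HTTP %s, %d bytes in %.1fs" % (status, len(body), time.time() - start))
except Exception as exc:
    print("failed after %.1fs: %s" % (time.time() - start, exc))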
 
LVL 83

Expert Comment

by:Dave Baldwin
ID: 34151227
I'm lost at this point.  Click on "Request Attention" and get some others to look at your question.
 

Author Comment

by:rfinaly
ID: 34177788
I was able to resolve the problem; it was my firewall that was blocking web crawlers.
After opening the port, everything went back to normal.
Thank you
Roy
 
LVL 83

Expert Comment

by:Dave Baldwin
ID: 34177870
Cool, thanks.