  • Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 1053

Googlebot timeout errors

Hello Experts,
Google has stopped indexing my site. I moved the site to my own servers a few weeks ago, and since then Google has not indexed it.
I get the following error:
We encountered an error while trying to access your Sitemap. Please ensure your Sitemap follows our guidelines and can be accessed at the location you provided and then resubmit.
I am able to access both the sitemap and the robots.txt file without any problems, with no timeout errors.
I think my firewall is blocking Googlebot from accessing my site.
Does anyone know which ports need to be open to allow Googlebot to index my site?
My site is located in the DMZ, and my firewall is a Palo Alto firewall.
Thank you
Roy
Asked by rfinaly
1 Solution
 
Dave Baldwin (Fixer of Problems) commented:
Googlebot comes in on port 80 like a web browser would.  Can you post your web address so we can check it out?
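A quick way to test this from outside the network is to request a page while identifying as Googlebot, since a firewall that filters crawlers often matches on the user-agent string. A minimal sketch using only Python's standard library; the URL here is a placeholder, not the asker's site:

import urllib.request

# Hypothetical target URL; substitute the real site address.
url = "http://www.example.com/"
ua = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

req = urllib.request.Request(url, headers={"User-Agent": ua})
with urllib.request.urlopen(req, timeout=10) as resp:
    # Expect "200 OK"; a hang or connection error suggests something
    # between the client and the server is dropping crawler traffic.
    print(resp.status, resp.reason)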
 
rfinaly (Author) commented:
http://www.usuniversity.edu/
 
Dave Baldwin (Fixer of Problems) commented:
Your 'robots.txt' file doesn't look right to me. 'Bad robots' typically ignore 'robots.txt'. You may be blocking Googlebot because it comes in as:

Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
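One way to separate a robots.txt problem from a firewall problem is to parse the served file with Python's built-in robots.txt parser and ask whether it permits Googlebot. A minimal sketch, using the site URL that appears later in this thread:

import urllib.robotparser

# Parse the live robots.txt and test whether Googlebot may fetch the root.
# This only evaluates the robots.txt rules themselves; a firewall that
# filters on the user-agent string would block the request before the
# file is ever read.
rp = urllib.robotparser.RobotFileParser()
rp.set_url("http://www.usuniversity.edu/robots.txt")
rp.read()
print(rp.can_fetch("Googlebot", "http://www.usuniversity.edu/"))  # True if allowed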


 
rfinaly (Author) commented:
I logged in to Google Webmaster Tools and generated a new robots.txt file.
Here is the file: http://www.usuniversity.edu/robots.txt
When testing the robots.txt file, also with Google's tools, I get:
http://www.usuniversity.edu/ Allowed by line 2: Allow: /
Detected as a directory; specific files may have different restrictions.
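The served file itself is never quoted in the thread, but a minimal robots.txt consistent with that tester output, with the Allow rule on line 2, would be the following (a reconstruction, not the actual file):

User-agent: *
Allow: /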

I assume it is good? I will resubmit my sitemap files and see what happens.
Thank you
Roy
 
rfinaly (Author) commented:
I also tested a few pages with Google's Fetch as Googlebot tool, and this is what I get:
This is how Googlebot fetched the page.

URL: http://www.usuniversity.edu/

Date: Tue Nov 16 13:46:57 PST 2010

Googlebot Type: Web

When submitting the sitemap.html file I get:
URL timeout: robots.txt timeout
http://www.usuniversity.edu/sitemap.html
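That result suggests Googlebot cannot even read robots.txt before fetching the sitemap, which points at something dropping the connection rather than refusing it. A minimal sketch to reproduce the check from outside the network, again with Python's standard library:

import urllib.request

# Fetch robots.txt with a short timeout while identifying as Googlebot.
url = "http://www.usuniversity.edu/robots.txt"
ua = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

req = urllib.request.Request(url, headers={"User-Agent": ua})
try:
    with urllib.request.urlopen(req, timeout=10) as resp:
        print(resp.status, len(resp.read()), "bytes")
except Exception as exc:
    # A timeout here mirrors the "robots.txt timeout" Google reports and
    # suggests the firewall is silently dropping the crawler's requests.
    print("failed:", exc)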

Should I remove the robots.txt altogether?
Roy
 
Dave Baldwin (Fixer of Problems) commented:
I'm lost at this point.  Click on "Request Attention" and get some others to look at your question.
 
rfinaly (Author) commented:
I was able to resolve the problem; it was my firewall blocking web crawlers.
After opening the port, everything went back to normal.
Thank you
Roy
 
Dave Baldwin (Fixer of Problems) commented:
Cool, thanks.
Question has a verified solution.

