Google Bots - Killing the Server

Hi

We do web hosting, and some of the websites hosted on our servers are hit by Googlebot so often and for so long that the server becomes slow. Sometimes a single website receives over a million hits from Googlebot in a day.

We are trying to find a solution so that, even if a customer has not configured their website correctly with Googlebot/Webmaster Tools, Google cannot generate that many hits on our server.
Currently in such cases we block the Googlebot IPs in iptables, and the servers recover, but then customers with well-behaved websites suffer too.

Can someone please suggest a solution to this?

We are running CentOS 6.5 64-bit and use Nginx and Apache on our servers.
sysautomation asked:
Scott Fell, EE MVE, Developer & EE Moderator commented:
I think, as the server owner, all you can do is determine which customer is responsible and send them a notice that they are using more than their allotted resources. Googlebot listens to the domains, not the server. If you have control of the domain, you can use Webmaster Tools to limit crawling, use robots.txt to prevent Googlebot from crawling a folder, or use a noindex tag to prevent it from indexing a page: https://support.google.com/webmasters/answer/93708
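As a sketch, a per-site robots.txt along these lines keeps Googlebot out of a folder (the path here is purely illustrative):

```
# Hypothetical robots.txt placed at the site root
User-agent: Googlebot
Disallow: /reports/       # example folder to keep Googlebot out of

# Note: Googlebot ignores Crawl-delay; crawl rate must be set in
# Webmaster Tools / Search Console instead.
```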

You can also set up your server-side code to prevent a single client from paging through too many pages in a given time window.
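Since the servers already run Nginx in front, one preventive measure is Nginx's built-in request rate limiting. A minimal sketch (the zone name and limits are illustrative, not recommendations):

```nginx
# In the http {} block: track clients by IP,
# allow 5 requests/second on average per IP
limit_req_zone $binary_remote_addr zone=crawlers:10m rate=5r/s;

server {
    location / {
        # Allow short bursts of 20 requests; excess requests are rejected
        limit_req zone=crawlers burst=20 nodelay;
    }
}
```

This throttles any aggressive client, spoofed or genuine, without blocking Google outright at the firewall.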

In any case, this is a domain function, not a server function, as far as being able to tell Googlebot what to do.
 
Zephyr ICT, Cloud Architect commented:
Did you already look into Google's Webmaster Tools? You'll have to create an account if you don't have one, though.

You can limit the Googlebot crawl rate: https://support.google.com/webmasters/answer/48620?hl=en

Might be worth checking out?
 
Scott Fell, EE MVE, Developer & EE Moderator commented:
That sounds odd for Google. Is there a special app you have created, or is there one domain with the issue? I would look for the page causing the problem and send a note to the domain owner to fix their page / limit Google, or be turned off.

It sounds like they must have a dynamic page with a lot of links, and the queries it runs take up a lot of resources.

In any case, you will probably have to have the domain owner take care of it, or limit/shut off their service.
sysautomation (Author) commented:
Yes, it is dynamic. We are hosting Oracle APEX applications and have little control over customers, other than forcing changes when the server is in trouble. What I am really looking for is a preventive measure.
 
Giovanni Heward commented:
Bear in mind the user-agent can easily be spoofed, so the bot may not actually belong to Google. (Verify the IP with ARIN to confirm.)
 
Scott Fell, EE MVE, Developer & EE Moderator commented:
That is a great point!  

As I said earlier (http:#a39998753), this did not sound right for Google.
 
Dave Baldwin, Fixer of Problems commented:
Here's what Google says about verifying their Googlebots: https://support.google.com/webmasters/answer/80553?hl=en
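Google's documented check is a reverse DNS lookup on the requesting IP, confirming the name falls under googlebot.com or google.com, then a forward lookup to confirm that name resolves back to the same IP. A minimal sketch in Python (function names are my own, not from any library):

```python
import socket

def hostname_is_google(hostname: str) -> bool:
    """True if a reverse-DNS name falls under Google's crawler domains."""
    return hostname.rstrip(".").endswith((".googlebot.com", ".google.com"))

def is_real_googlebot(ip: str) -> bool:
    """Reverse-resolve the IP, check the domain, then confirm the
    forward lookup of that hostname returns the original IP."""
    try:
        hostname = socket.gethostbyaddr(ip)[0]   # reverse DNS lookup
    except OSError:
        return False
    if not hostname_is_google(hostname):
        return False
    try:
        forward_ips = socket.gethostbyname_ex(hostname)[2]  # forward lookup
    except OSError:
        return False
    return ip in forward_ips
```

A hit that fails this check is a spoofed crawler and can be blocked without affecting real Google traffic.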
Question has a verified solution.

Are you are experiencing a similar issue? Get a personalized answer when you ask a related question.

Have a better answer? Share it in a comment.

All Courses

From novice to tech pro — start learning today.