
  • Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 694

Too much traffic?

A colleague (domain obfuscated by me) writes:

"Much as we love the idea of higher traffic, the big numbers become less desirable when it becomes clear that there are no actual eyeballs behind many of those visits.

We're seeing a surge in non-human traffic on Domain Online, and are not quite sure what to do about it. These are not day-to-day spikes separated by days of "normal" traffic. Since March of this year, there has been a steady increase in stress on our servers. Four weeks ago, traffic that was already double or triple our "normal" numbers quadrupled. This traffic has been so significant that our web servers (two IIS servers) have, on occasion, been knocked or nearly knocked offline.

For the month of May, our Urchin Tracking Monitor, which counts only those sessions and page views from browsers accepting JavaScript, showed daily traffic at about 35,000 sessions, 3,400,000 hits and 80,000 page views per day. Over the same period our non-UTM stats, which include traffic of all sorts, show daily traffic of 105,000 sessions, 3,400,000 hits and 1,175,000 page views. Of the total hits, we're showing 3,035,000 coming from robots (63% coming from "Mozilla compatible" agents, and 1.3 million identifying themselves as the Googlebot).

We're using a Windows 2003 SQL server database. Our tech folks ruled out the possibility of an SQL injection attack because we weren't getting hit by a single domain range.

Anybody have experience with these kinds of ratios of human-to-non human traffic? Will adding webservers help us? Other solutions?"

Any suggestions would be appreciated.

Eric AKA Netminder
6 Solutions
They may be being hit with a "denial of service" (DoS) attack. There are methods out there that use multiple computers or servers to essentially bombard your web server with traffic to prevent the website from being viewed or functioning reasonably. I am no expert on fixing this type of thing, unfortunately; I just thought you would appreciate the idea.
A friend of mine set up and managed a very popular forum.
He would complain that, because of how regularly Google indexed his website, his bandwidth quota would get used up quickly.

In case you have a similar site (forum, blog, etc.) with lots of pages, you could try a robots.txt file, which gives crawlers instructions on what to index and how frequently to visit. All major search engines follow these rules.
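As a sketch, a robots.txt file at the site root might ask well-behaved crawlers to slow down. (Note the Crawl-delay directive was honored by crawlers such as msnbot and Yahoo's Slurp, but not by Googlebot, whose crawl rate is set through Google's webmaster tools instead; the /search/ path below is purely illustrative.)

```
# Ask compliant bots to wait 10 seconds between requests
User-agent: *
Crawl-delay: 10

# Keep Googlebot out of an expensive, effectively infinite URL space
User-agent: Googlebot
Disallow: /search/
```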

The other possibility is that someone is scraping your website: running bots that extract data from your web pages and store or use it elsewhere.
This is not uncommon for directory-listing-type sites.

In such cases, if you can zoom in on the IPs that are hitting your website, you could block access to them at your server (reject access from a list of IPs).
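As a rough sketch, the heavy hitters can be tallied straight out of the IIS W3C logs. (This assumes a field order where c-ip is the third whitespace-separated field, as declared in the synthetic #Fields header below; adjust the index to match the #Fields line of your own logs.)

```python
from collections import Counter

def top_talkers(log_lines, n=10):
    """Tally client IPs from IIS W3C log lines and return the n busiest."""
    counts = Counter()
    for line in log_lines:
        if line.startswith("#"):   # skip W3C header/comment lines
            continue
        fields = line.split()
        if len(fields) > 2:
            counts[fields[2]] += 1  # c-ip assumed to be the third field
    return counts.most_common(n)

# Example with a few synthetic log lines:
sample = [
    "#Fields: date time c-ip cs-method cs-uri-stem",
    "2008-06-01 00:00:01 10.0.0.5 GET /index.aspx",
    "2008-06-01 00:00:02 10.0.0.5 GET /page2.aspx",
    "2008-06-01 00:00:03 192.168.1.9 GET /index.aspx",
]
print(top_talkers(sample, 2))  # [('10.0.0.5', 2), ('192.168.1.9', 1)]
```

The IPs at the top of that list are the candidates to check against whois and, if they are clearly abusive, to block.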

The final possibility is a denial-of-service (DoS) attack.
This is when some people get together and hit a website with tonnes of requests so that the servers go down.

In your case, I have a strong feeling it's one of the first two, or both.
In case it is a DoS attack, you'll probably have to contact security experts who can guide you to a better solution.

I'm sorry that I haven't been able to cleanly help you out, but I do hope this helps.
Eric AKA NetminderAuthor Commented:
It's likely that the site is being Dugg, etc., which could account for a lot of the traffic, I suppose. They seem to doubt that it's a DoS attack based on the logs, as noted in the question, but I'll pass it along too.

I'm going to leave this open to see if I get any other ideas/suggestions. I'll also answer any questions as best I can in order to get some other specifics.

Improve Your Query Performance Tuning

In this FREE six-day email course, you'll learn from Janis Griffin, Database Performance Evangelist. She'll teach 12 steps that you can use to optimize your queries as much as possible and see measurable results in your work. Get started today!

Could you let us in on the nature of the website, i.e. whether it is a forum, blog, etc.?
This would throw some more light on the situation.

Getting Dugg is unlikely, because you mentioned that most of the traffic is automated / bot traffic.
The Digg effect (when your site gets Dugg) is when actual people open the site and look at it.
That will register in your Urchin records as actual people, not automated bots.
Eric AKA NetminderAuthor Commented: (realized that there's no compelling reason to hide it).

There's a lot going on there as you can see.

Steve BinkCommented:
It could be because your popularity is skyrocketing.  :)

If people are linking to your website, spiders will follow those links back to you. If you have generated content, a blog, for example, the spider will crawl every link it finds, where a human will only follow one or two. That would obviously cause this kind of disparity in numbers. In the roughly 70,000 non-Urchin sessions, approximately 1.1 million page hits were generated. That certainly sounds like being spidered.

Depending on the nature of your site, and how well it places in search engines, you could be getting scraped by spammers.  These guys just set up a bot net to pull all the information it can find from your site.  That info is then used to populate a bait site for search engines.  These scraping clients will generally show up as non-JS bots.

I doubt it is a DoS attack. If it were a real DoS attack, your server would be unresponsive within minutes every time you put it up. Also, Google does not generally participate in such attacks, and it accounts for a large number of hits in your stats.

Do you subscribe to any third-party certification services, such as ScanAlert?
Keith AlabasterEnterprise ArchitectCommented:
Hey mate - seems to have been a while since I saw your name on a question :)

Can you provide some more detail on the hits being seen? Are they all arriving on port 80 of this site, or is the protocol spread larger than that? I.e., is the number of hits seen on the published web sites less than the number of hits on the external interface of the router/firewall on the outside?

For example, our (Government Agency) web site gets approx. 500K hits per day, but we also block over 100,000 spam mails per day, and God knows how many potential hits on ports that are not even open on the external firewall (they still get logged).

What are the ranges of ip addresses? Can they be tracked back to a particular country/continent?

We have had to cop out and introduce a pair of Cisco load balancers; we have not yet found a way to strip out unwanted traffic that still meets the IP/port requirements. We did try putting up an 'accept Terms' page that timed out and dropped the connection after 30 seconds to try to stall some of the bots, and we also put the traditional options in to exclude spiders/crawlers, etc. The exclusions helped a lot, but the Terms page had no real noticeable effect.

Gabriel OrozcoSolution ArchitectCommented:
I would check with the Google webmaster tools to see if they are the ones hitting the web site. If not, I would limit requests from any agent named Googlebot that is not coming from Google's network. This can continue until you sort out all the visits you do not want.
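Google's recommended way to tell a real Googlebot from an impostor is a reverse DNS lookup on the IP, a check that the hostname is under googlebot.com or google.com, and then a forward lookup to confirm it maps back to the same IP. A sketch of that check (the resolver functions are injectable so the logic can be exercised without network access):

```python
import socket

def is_real_googlebot(ip,
                      reverse_lookup=lambda ip: socket.gethostbyaddr(ip)[0],
                      forward_lookup=socket.gethostbyname):
    """Return True only if ip reverse-resolves to a googlebot.com or
    google.com hostname that forward-resolves back to the same ip."""
    try:
        host = reverse_lookup(ip)
    except OSError:
        return False
    if not host.endswith((".googlebot.com", ".google.com")):
        return False
    try:
        return forward_lookup(host) == ip
    except OSError:
        return False
```

Agents that claim to be Googlebot but fail this check are fair game for blocking.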

As said, your primary tool is the robots.txt file. You will need to analyze the web logs to see how much traffic comes from each crawler, and whether you want to be indexed by all of them. You can even see if adding meta tags to the web pages you do not want listed helps. This is an ongoing effort, and I suggest you get in touch with somebody at Google.
Eric AKA NetminderAuthor Commented:

I've asked if they subscribe to something like ScanAlert, but your explanation of spiders seems more plausible.


At this point, I ask questions more for other people than I do for myself; that means that either I'm not doing anything I haven't been doing for a while, or I have all my acquaintances so buffaloed that they think I know everything.

I've not received a reply to the message I sent them (see my comment to routinet; your questions were in the same email), but when I do, I will post immediately. I have also asked if they have considered load balancers.


Thanks. I've already sent to them some information on the use of robots.txt to limit the Googlebot-type scans; in it, I did mention non-Google googlebots, though I've never personally heard of such a thing. That's not to say they don't exist -- just to say that I've never seen one.

I kinda agree with routinet.
It could very well be because your popularity is increasing.

A search on Alexa and Compete does show an increase in people visiting the website since January 2008, and it has been an upward trend.

So I am guessing that as more people read the content and then blog about it, linking back to the articles, it could very well be spiders crawling your website.
It doesn't sound like an attack.
An ongoing blatant attack would generate many more requests than indicated.

But "not getting hit by a single domain range" is not a valid basis for ruling an SQL injection attack in or out. (An increased hit count is not characteristic or indicative of an SQL injection problem, either.)

Something to keep in mind is that bot traffic has a role: when the site is indexed by search engines, you get more visitors from them. The more search engines that have indexed your content, the more search-engine users will follow links to it.

You can drop an entry in robots.txt to disallow all crawlers, or all crawlers except known ones, but that is probably a bad idea if you want the site to grow and have many human visitors.

The really bad search engines won't look at robots.txt anyway; the best thing to do is identify those by their unusual activity and ban them by IP.
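For what it's worth, a minimal robots.txt for the "all crawlers except known ones" approach might look like the sketch below. Well-behaved crawlers obey the most specific User-agent group that matches them, so the bots named explicitly ignore the catch-all rule; rogue bots, of course, ignore the file entirely.

```
# Let Googlebot and msnbot crawl everything
User-agent: Googlebot
Disallow:

User-agent: msnbot
Disallow:

# Block every other crawler
User-agent: *
Disallow: /
```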

If servers are being nearly knocked offline by what is becoming ordinary traffic, then the next logical step is probably to work out a plan for scaling up the application, so there is at least a little breathing room for the site to grow and survive unexpected bursts of traffic (flash crowds).

That may involve buying more bandwidth, adding more servers -- load-balancers, web servers, database servers, etc.

And possibly development work: changes to the scripts that drive the site, increased use of memory caching, tuning of the database system, and possibly re-examination of choices of server software, database, API/framework, etc., all in order to function acceptably at the larger scale.

Many efficiency considerations that don't matter on a small site become a lot more important when you have more visitors.

5 million hits a month is actually not very many: only about 2 hits per second on average.

If a load that small is crashing web servers, then it seems like either the servers are old/slow, or there are some serious application inefficiency issues.
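A quick sanity check on that average (taking the round figure of 5 million hits over a 30-day month):

```python
hits_per_month = 5_000_000
seconds_per_month = 30 * 24 * 60 * 60   # 2,592,000 seconds in a 30-day month
avg_hits_per_second = hits_per_month / seconds_per_month
print(round(avg_hits_per_second, 2))    # -> 1.93
```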

Gabriel OrozcoSolution ArchitectCommented:
Indeed. You need to work on making the site scalable; maybe a load balancer or a caching network is in order.
Eric AKA NetminderAuthor Commented:
Thank you, all. I appreciate the ideas.

As I have not heard back, I'm going to close this question; if some specifics are requested of me, I will open a new question using the Ask A Related Question feature to ensure that you are all notified.

Great work, folks.

Keith AlabasterEnterprise ArchitectCommented:
Question has a verified solution.
