Too much traffic?

Posted on 2008-06-10
Medium Priority
Last Modified: 2013-12-07
A colleague (domain obfuscated by me) writes:

"Much as we love the idea of higher traffic, the big numbers become less desirable when it becomes clear that there are no actual eyeballs behind many of those visits.

We're seeing a surge in non-human traffic on Domain Online, and are not quite sure what to do about it. These are not day-to-day spikes separated by days of "normal" traffic. Since March of this year, there has been a steady increase in stress on our servers. Four weeks ago, traffic that was already double or triple our "normal" numbers quadrupled. This traffic has been so significant that our web servers (two IIS servers) have, on occasion, been knocked or nearly knocked offline.

For the month of May, our Urchin Tracking Monitor, which counts only those sessions and page views from browsers accepting JavaScript, showed daily traffic of about 35,000 sessions, 3,400,000 hits and 80,000 page views. Over the same period our non-UTM stats, which include traffic of all sorts, show daily traffic of 105,000 sessions, 3,400,000 hits and 1,175,000 page views. Of the total hits, we're showing 3,035,000 coming from robots (63% coming from "Mozilla compatible" agents, and 1.3 million identifying themselves as the Googlebot).

We're using a Windows 2003 SQL server database. Our tech folks ruled out the possibility of an SQL injection attack because we weren't getting hit by a single domain range.

Anybody have experience with these kinds of ratios of human-to-non human traffic? Will adding webservers help us? Other solutions?"

Any suggestions would be appreciated.

Question by:Eric AKA Netminder

Assisted Solution

Jk387 earned 260 total points
ID: 21753306
They may be getting hit with a "denial of service" attack. There are methods out there that use multiple computers or servers to bombard a web server with traffic, preventing the website from being viewed or from functioning reasonably. Unfortunately I'm no expert on fixing this type of thing; I just thought you would appreciate the idea.

Assisted Solution

sajain84 earned 260 total points
ID: 21753510
A friend of mine set up and managed a very popular forum.
He would complain that, because of how regularly Google indexed his website, his bandwidth quota would get used up quickly.

In case you have a similar site (forum, blog, etc.) with lots of pages, you could try a robots.txt file, which gives crawlers instructions on what to index and how frequently. All major search engines will follow these rules.
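A minimal robots.txt along those lines might look like the sketch below. The paths are hypothetical placeholders, and note that Crawl-delay is honored by some crawlers (e.g. Yahoo, MSN-era bots) but ignored by Googlebot, whose crawl rate is instead adjusted through Google's webmaster tools:

```
# Keep Googlebot out of high-churn, low-value pages (paths are examples)
User-agent: Googlebot
Disallow: /search/
Disallow: /calendar/

# All other well-behaved crawlers: same exclusions, plus a 10-second
# pause between requests (ignored by Googlebot)
User-agent: *
Crawl-delay: 10
Disallow: /search/
Disallow: /calendar/
```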

The other possibility is that someone is scraping your website, which means running bots that extract data from your web pages and store or use it elsewhere. This is not uncommon for sites of the directory-listing type.

In such cases, if you can zoom in on the IPs that are hitting your website, you could block access to them at your server (reject access to IPs from a list).
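As a rough sketch of that "zoom in on the IPs" step, a short script can tally requester IPs from IIS W3C extended logs and surface candidates for blocking. The column position of the client IP varies with your IIS logging configuration, so the field index here is an assumption:

```python
from collections import Counter

def top_talkers(log_lines, ip_field=2, top_n=5):
    """Count requests per client IP in W3C extended log lines.

    ip_field: zero-based index of the c-ip column; IIS logs are
    space-delimited and the column order depends on site settings.
    """
    counts = Counter()
    for line in log_lines:
        if line.startswith("#"):  # skip W3C header/directive lines
            continue
        fields = line.split()
        if len(fields) > ip_field:
            counts[fields[ip_field]] += 1
    return counts.most_common(top_n)

# Example with fabricated log lines:
sample = [
    "#Fields: date time c-ip cs-method cs-uri-stem sc-status",
    "2008-06-10 00:00:01 66.249.66.1 GET /index.html 200",
    "2008-06-10 00:00:02 66.249.66.1 GET /page2.html 200",
    "2008-06-10 00:00:03 10.0.0.9 GET /index.html 200",
]
print(top_talkers(sample))  # 66.249.66.1 appears twice, 10.0.0.9 once
```

IPs that dominate this list without honoring robots.txt are the ones worth rejecting at the server or firewall.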

The final possibility is a denial-of-service attack.
This is when some people get together and hit a website with tonnes of requests so that the servers go down.

In your case, I have a strong feeling it's one of the first two, or both.
In case it is a DoS attack, you'll probably have to contact some security experts who can guide you to a better solution.

I'm sorry that I haven't been able to cleanly help you out, but I do hope this helps.
LVL 15

Author Comment

by:Eric AKA Netminder
ID: 21753730
It's likely that the site is being Dugg etc., which could account for a lot of the traffic, I suppose. They seem to doubt that it's a DoS attack based on the logs, as noted in the question, but I'll pass it along too.

I'm going to leave this open to see if I get any other ideas/suggestions. I'll also answer any questions as best I can in order to get some other specifics.


Expert Comment

ID: 21753993
Could you let us in on the nature of the website? Whether it is a forum, blog, etc.? This would throw some more light on the situation.

Getting Dugg is unlikely, because you mentioned that most of the requests are automated/bot traffic. The Digg effect (when your site gets Dugg) comes from actual people opening the site and looking at it; that would register in your Urchin records as actual people, not automated bots.
LVL 15

Author Comment

by:Eric AKA Netminder
ID: 21754419
www.poynter.org (realized that there's no compelling reason to hide it).

There's a lot going on there as you can see.

LVL 51

Assisted Solution

by:Steve Bink
Steve Bink earned 380 total points
ID: 21754448
It could be because your popularity is skyrocketing.  :)

If people are linking to your website, spiders will follow those links back to you. If you have generated content (a blog, for example), a spider will crawl every link it finds, where a human will only follow one or two. That would obviously cause this kind of disparity in the numbers: the roughly 70,000 bot-only (non-Urchin) sessions generated approximately 1.1 million page views. That certainly sounds like being spidered.

Depending on the nature of your site, and how well it places in search engines, you could be getting scraped by spammers. These guys set up a botnet to pull all the information it can find from your site; that info is then used to populate a bait site for search engines. These scraping clients will generally show up as non-JS bots.

I doubt it is a DoS attack. If it were a real DoS attack, your server would be unresponsive within minutes each and every time you put it up. Also, Google does not generally participate in such attacks, and it accounts for a large number of hits in your stats.

Do you subscribe to any third-party certification services, such as ScanAlert?
LVL 51

Assisted Solution

by:Keith Alabaster
Keith Alabaster earned 380 total points
ID: 21754684
Hey mate - seems to have been a while since I saw your name on a question :)

Can you provide some more detail on the hits being seen? Are they all arriving on port 80 of this site, or is the protocol spread larger than that? i.e., Is the number of hits seen on the published web sites less than the number of hits on the external interface of the router/firewall?

For example, our (government agency) web site sees approx 500K hits per day, but we also block over 100,000 spam mails per day, and god knows how many potential hits on ports that are not even open on the external firewall (they still get logged).

What are the ranges of IP addresses? Can they be traced back to a particular country/continent?

We had to cop out and introduce a pair of Cisco load balancers; we have not yet found a way to strip out unwanted traffic that still meets the IP/port requirements. We did try putting up an 'accept terms' page that timed out and dropped the connection after 30 seconds to stall some of the bots, and we also put the traditional exclusions in for spiders/crawlers etc. That helped a lot, but the terms page had no real noticeable effect.

LVL 19

Assisted Solution

by:Gabriel Orozco
Gabriel Orozco earned 260 total points
ID: 21755052
I would check with the Google webmaster tools (https://www.google.com/webmasters/tools) to see if it is really Google hitting the web site. If not, I would reject requests from any agent named Googlebot that is not coming from Google's network. You can continue like this until you have sorted out all the visits you do not want.

As said, your primary tool is the robots.txt file. You will need to analyze the web logs to see how much traffic comes from each crawler, and decide whether you want to be indexed by all of them. You can even see if adding meta tags to the pages you do not want listed helps. This is an ongoing effort, and I suggest you get in touch with somebody at Google.
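One way to check the "Googlebot not from Google's network" case: a genuine Googlebot IP reverse-resolves to a googlebot.com or google.com hostname, and that hostname forward-resolves back to the same IP. A sketch of that check follows; the DNS lookups need network access, so the hostname test is split out as a pure function:

```python
import socket

def is_google_hostname(hostname):
    """True if a reverse-DNS name belongs to Google's crawler domains."""
    return hostname.endswith(".googlebot.com") or hostname.endswith(".google.com")

def verify_googlebot(ip):
    """Reverse-resolve the IP, check the domain, then forward-confirm.

    Returns True only if the PTR record is a Google crawler hostname
    AND that hostname resolves back to the original IP.
    """
    try:
        hostname, _, _ = socket.gethostbyaddr(ip)
    except socket.herror:
        return False
    if not is_google_hostname(hostname):
        return False
    try:
        return ip in socket.gethostbyname_ex(hostname)[2]
    except socket.gaierror:
        return False

# e.g. verify_googlebot("66.249.66.1") for an IP claiming to be Googlebot
```

Requests whose User-Agent says Googlebot but which fail this check can be rejected or rate-limited with a clear conscience.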
LVL 15

Author Comment

by:Eric AKA Netminder
ID: 21756136

I've asked if they subscribe to something like ScanAlert, but your explanation of spiders seems more plausible.


At this point, I ask questions more for other people than I do for myself; that means that either I'm not doing anything I haven't been doing for a while, or I have all my acquaintances so buffaloed that they think I know everything.

I've not received a reply to the message I sent them (see my comment to routinet; your questions were in the same email), but when I do, I will post immediately. I have also asked if they have considered load balancers.


Thanks. I've already sent to them some information on the use of robots.txt to limit the Googlebot-type scans; in it, I did mention non-Google googlebots, though I've never personally heard of such a thing. That's not to say they don't exist -- just to say that I've never seen one.


Expert Comment

ID: 21756660
I kind of agree with routinet.
It could very well be because your popularity is increasing.

A search on Alexa and Compete does show an increase in people visiting the website since Jan 08, and it has been an upward trend.

So I am guessing that as more people read the content and then blog about the articles and link back to them, it could very well be spiders crawling your website.
LVL 23

Accepted Solution

Mysidia earned 460 total points
ID: 21757048
It doesn't sound like an attack. An ongoing, blatant attack would generate many more requests than indicated.

But "not getting hit by a single domain range" is not a valid basis for ruling an SQL injection attack in or out. (An increased hit count is not characteristic or indicative of an SQL injection problem either, though.)

Something to keep in mind is that bot traffic has a role: when the site is indexed by search engines, you get more visitors from search engines. The more search engines that have indexed your content, the more search engine users will follow links to it.

You can drop an entry in robots.txt to disallow all crawlers, or all crawlers except known ones, but that is probably a bad idea if you want the site to grow and have many human visitors.

The really bad search engines won't look for robots.txt anyway; the best thing to do is identify those by their unusual activity and ban them by IP.

If servers are being nearly knocked offline by what is becoming ordinary traffic, then the next logical step is probably to work out a plan for scaling up the application, so there is at least a little breathing room for the site to grow and survive unexpected bursts in traffic (flash crowds).

That may involve buying more bandwidth and adding more servers: load balancers, web servers, database servers, etc.

It may also involve development work: changes to the scripts that drive the site, increased use of memory caching, tuning of the database system, and possibly a re-examination of the choices of server software, database, and API/framework, all in order to function acceptably at the larger scale.

Many efficiency considerations that don't matter on a small site become a lot more important when you have more visitors.
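On the memory-caching point: the idea is to cache expensive, frequently requested renderings (front page, popular articles) in memory for a short TTL, so that bots and humans hitting the same pages do not re-run database queries every time. A minimal, hypothetical sketch, not tied to any framework the site actually uses:

```python
import time

class TTLCache:
    """A tiny in-memory cache: entries expire after ttl seconds."""
    def __init__(self, ttl=60):
        self.ttl = ttl
        self._store = {}  # key -> (expiry_time, value)

    def get_or_compute(self, key, compute):
        now = time.time()
        entry = self._store.get(key)
        if entry and entry[0] > now:
            return entry[1]              # fresh cached value
        value = compute()                # expensive work (e.g. DB query + render)
        self._store[key] = (now + self.ttl, value)
        return value

calls = 0
def render_front_page():
    """Stand-in for an expensive database-backed page render."""
    global calls
    calls += 1
    return "<html>front page</html>"

cache = TTLCache(ttl=60)
page1 = cache.get_or_compute("/", render_front_page)
page2 = cache.get_or_compute("/", render_front_page)
print(calls)  # 1 -- the second request was served from cache
```

With heavy crawler traffic concentrated on a small set of popular URLs, even a short TTL can absorb most of the repeat hits before they reach the database.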

Note that even 3,400,000 hits a day works out to only about 40 hits per second on average.

If a load of that order is nearly knocking web servers offline, then it seems like either the servers are old/slow or there are some serious application inefficiency issues.

LVL 19

Expert Comment

by:Gabriel Orozco
ID: 21757186
Indeed. You need to work on making the site scalable; maybe a load balancer or a caching network is in order.
LVL 15

Author Comment

by:Eric AKA Netminder
ID: 21763222
Thank you, all. I appreciate the ideas.

As I have not heard back, I'm going to close this question; if some specifics are requested of me, I will open a new question using the Ask A Related Question feature to ensure that you are all notified.

Great work, folks.

