Eric - Netminder
asked:
Too much traffic?
A colleague (domain obfuscated by me) writes:
"Much as we love the idea of higher traffic, the big numbers become less desirable when it becomes clear that there are no actual eyeballs behind many of those visits.
We're seeing a surge in non-human traffic on Domain Online, and are not quite sure what to do about it. These are not day-to-day spikes separated by days of "normal" traffic. Since March of this year, there has been a steady increase in stress on our servers. Four weeks ago, traffic that was already double or triple our "normal" numbers quadrupled. This traffic has been so significant that our webservers (two IIS servers) have, on occasion, been knocked, or nearly knocked, offline.
For the month of May, our Urchin Tracking Monitor (UTM), which counts only those sessions and page views from browsers accepting JavaScript, showed daily traffic of about 35,000 sessions, 3,400,000 hits and 80,000 page views. Over the same period our non-UTM stats, which include traffic of all sorts, show daily traffic of 105,000 sessions, 3,400,000 hits and 1,175,000 page views. Of the total hits, 3,035,000 are coming from robots (63% from "Mozilla compatible" agents, and 1.3 million identifying themselves as Googlebot).
We're using a Windows 2003 SQL Server database. Our tech folks ruled out the possibility of an SQL injection attack because we weren't getting hit by a single domain range.
Anybody have experience with these kinds of ratios of human-to-non-human traffic? Will adding webservers help us? Other solutions?"
Any suggestions would be appreciated.
ep
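(For what it's worth, here is roughly how a user-agent breakdown like the one above could be pulled straight from the raw IIS W3C logs. This is only a minimal sketch: the log file name is made up, and the column positions are read from the #Fields: header, so it would have to be pointed at the actual log directory and adjusted to the site's own log format.)

from collections import Counter

LOG_FILE = "ex080601.log"  # hypothetical IIS W3C log file name

counts = Counter()
ua_index = None
with open(LOG_FILE) as log:
    for line in log:
        if line.startswith("#Fields:"):
            # The column layout is declared in the log itself
            fields = line.split()[1:]
            ua_index = fields.index("cs(User-Agent)")
        elif line.startswith("#") or ua_index is None:
            continue
        else:
            cols = line.split()
            if len(cols) > ua_index:
                counts[cols[ua_index]] += 1

# Top 20 user agents by hit count
for agent, hits in counts.most_common(20):
    print(hits, agent)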
"Much as we love the idea of higher traffic, the big numbers become less desirable when it becomes clear that there are no actual eyeballs behind many of those visits.
We're seeing a surge in non-human traffic on Domain Online, and are not quite sure what to do about it. These are not day-to-day spikes separated by days of "normal" traffic. Since March of this year, there has been a steady increase in stress on our servers. Four weeks ago, traffic that was already double or triple our "normal" numbers quadrupled. This traffic has been so significant that our webservers (Two IIS servers) have, on occasion, been knocked or nearly-knocked offline.
For the month of May, our Urchin Tracking Monitor, which counts only those sessions and page views from browsers accepting javascript, showed daily traffic at about 35,000 sessions, 3,400,000 hits and 80,000 page views per day. Over the same period our non-UTM stats, which includes traffic of all sorts, show daily traffic of 105,000 sessions, 3,400,000 hits and 1,175,000 page views. Of the total hits we're showing 3,035,000 coming from Robots (63% coming from the Mozilla Compatible agents, 1.3 million identifying themselves as the Googlebot).
We're using a Windows 2003 SQL server database. Our tech folks ruled out the possibility of an SQL injection attack because we weren't getting hit by a single domain range.
Anybody have experience with these kinds of ratios of human-to-non human traffic? Will adding webservers help us? Other solutions?"
Any suggestions would be appreciated.
ep
Could you let us in on the nature of the website?
Is it a forum, a blog, something else?
That would shed some more light on the situation.
Getting Dugg is unlikely, because you mentioned that most of the traffic is automated / bot traffic.
The Digg effect (when your site gets Dugg) happens when actual people open the site and look at it.
That would register in your Urchin records as actual people, not automated bots.
ASKER
www.poynter.org (realized that there's no compelling reason to hide it).
There's a lot going on there as you can see.
ep
ASKER
routinet,
I've asked if they subscribe to something like ScanAlert, but your explanation of spiders seems more plausible.
keith_alabaster,
At this point, I ask questions more for other people than I do for myself; that means that either I'm not doing anything I haven't been doing for a while, or I have all my acquaintances so buffaloed that they think I know everything.
I've not received a reply to the message I sent them (see my comment to routinet; your questions were in the same email), but when I do, I will post immediately. I have also asked if they have considered load balancers.
Redimido,
Thanks. I've already sent them some information on the use of robots.txt to limit the Googlebot-type scans (see the sketch below for the sort of thing I mean); in it, I did mention non-Google googlebots, though I've never personally heard of such a thing. That's not to say they don't exist -- just to say that I've never seen one.
ep
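The robots.txt sketch mentioned above would look something like this. The paths are purely hypothetical and would have to match the site's actual structure; note also that Crawl-delay is honored by crawlers such as Slurp and msnbot but ignored by Googlebot, whose crawl rate has to be throttled through Google Webmaster Tools instead.

# Example robots.txt -- paths are hypothetical placeholders
User-agent: *
Crawl-delay: 10        # seconds between requests; ignored by Googlebot
Disallow: /search      # high-cost, low-value URLs
Disallow: /calendar/
Disallow: /print/

# Googlebot ignores Crawl-delay, so keep it out of the expensive paths
# and set its crawl rate in Google Webmaster Tools.
User-agent: Googlebot
Disallow: /search
Disallow: /calendar/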
I kinda agree with routinet.
It could very well be because your popularity is increasing.
A search on Alexa and Compete does show an increase in people visiting the website since Jan '08, and it has been an upward trend.
So I am guessing that as more people read the content and then blog about it and link back to the articles, it could very well be spiders crawling your website.
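Given the 1.3 million hits identifying themselves as Googlebot, it may also be worth checking how many of them are the genuine article. The real Googlebot can be distinguished from bots that merely borrow its user agent with a reverse-then-forward DNS check on the client IPs. A minimal sketch, assuming the c-ip values have already been pulled out of the IIS logs (the addresses below are placeholders):

import socket

def is_real_googlebot(ip):
    """Reverse-resolve the IP, check the Google domain, then forward-resolve to confirm."""
    try:
        host = socket.gethostbyaddr(ip)[0]
    except socket.herror:
        return False
    if not (host.endswith(".googlebot.com") or host.endswith(".google.com")):
        return False
    try:
        return ip in socket.gethostbyname_ex(host)[2]
    except socket.gaierror:
        return False

# Placeholder IPs -- replace with c-ip values taken from the logs
for ip in ["66.249.66.1", "192.0.2.10"]:
    print(ip, is_real_googlebot(ip))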
Indeed. You need to work on making the site scalable; maybe a load balancer or a caching network is in order.
ASKER
Thank you, all. I appreciate the ideas.
As I have not heard back, I'm going to close this question; if some specifics are requested of me, I will open a new question using the Ask A Related Question feature to ensure that you are all notified.
Great work, folks.
ep
:)
ASKER
I'm going to leave this open to see if I get any other ideas/suggestions. I'll also answer any questions as best I can in order to get some other specifics.
ep