Solved

# Calculate peak online users for a given day

Posted on 2011-05-05
I need to calculate the peak number of users we have on our web application on a given day.

I can calculate the total number of users on a given day, no problem, with a MySQL DISTINCT clause, but I'm not sure how to work out peak concurrent users for a given day.

There is a session log which is updated as users login and logout of the system (with an auto logoff).
*Note: all sessions are automatically terminated at 10pm for overnight processing, so no sessions run from one day to the next.

The key fields in the MySQL session log table are:
sessionId - unsigned int - auto increment
userId - unsigned int
logoutTime - timestamp

Question by:Matthew_Way

LVL 24

Expert Comment

Why not just use Google Analytics or the like?

Okay, maybe you've convinced management/whoever to use Google Analytics, but you still need to produce a report for sessions prior to having Google Analytics working, and I cannot think of a way to do this in a single query. You need to process the rows in your session log, because a single row can reflect a user logged on for anywhere from a few seconds to multiple hours. It's like plotting each row as a bar graph and finding which time period has the most bars crossing it. Assuming it is acceptable to just determine the peak hour and the number of users in it, try something like:

create table session_map (sessionid int(11), h00 int, h01 int); /* h00, h01 represent 12mn-1am and 1am-2am; create h02, etc. for the other hours */

insert into session_map (sessionid, h00, h01)
select sessionid,
       (hour(loginTime) <= 0 and hour(logoutTime) >= 0),
       (hour(loginTime) <= 1 and hour(logoutTime) >= 1)
from session_table
where loginTime between '2011-05-05 00:00:00' and '2011-05-05 22:00:00';

select sum(h00), sum(h01) from session_map; -- gives the number of users for the first and second hours. Just extend the code, changing the numbers you compare hour(loginTime) and hour(logoutTime) against.

Note: untested.
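The same per-hour overlap test can be sketched procedurally. This is an illustrative Python translation (the thread's application is PHP/MySQL; the data here is made up): a session is counted in every hour its [login, logout] span touches.

```python
def users_per_hour(sessions, hours=24):
    """sessions: iterable of (login_hour, logout_hour) integer pairs.

    Returns a list where counts[h] is the number of sessions active
    at some point during hour h -- the per-column sums of the
    h00/h01/... scheme above.
    """
    counts = [0] * hours
    for login_h, logout_h in sessions:
        # A session overlaps hour h when login_h <= h <= logout_h.
        for h in range(login_h, logout_h + 1):
            counts[h] += 1
    return counts

# Made-up sessions: (login hour, logout hour)
counts = users_per_hour([(8, 11), (9, 9), (10, 13)])
print(max(counts))  # -> 2 (the busiest hour's user count)
```

As noted above, this only resolves the peak to the hour; a finer bucket size just means more columns (or list slots).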

Author Comment

The application is running on a private LAN / VPN so Google Analytics is out.
We also need a finer resolution than "within the same hour", e.g. if one user logs out and another logs in 10 seconds later.

We rent the application out to a third party and the license works the following way.
Licensed for X concurrent users, if X+Y users are logged in they will be charged additionally for the overflow / day.

Results will be computed each night and an Email warning will be sent to let them know they have incurred an additional charge for that day.

LVL 11

Expert Comment

First you may want to explain what you mean by 'peak'. For instance, 'peak' might mean the hour at which you had the highest number of active users. If you can define the benchmark for 'peak', then I should be able to help create an appropriate query.

LVL 24

Expert Comment

So you developed the application? Then why not modify it to track actual live sessions, instead of relying on the current session log table. Something like a live_sessions table which you insert into when a user logs on and delete from when the user logs out. Then query that table every nn minutes/seconds and insert the queried data into another sessions_history table that has something like:

sessions_history
(period_start datetime, period_end datetime, total_users int(11));

Author Comment

@madaboutasp: Peak would be maximum concurrent users during the day at any particular time.

@johanntagle: While that is an option, I would rather not run a cron job every X seconds if possible.

My current thought is to create a hash table in PHP with an index of "minute within the day",
then loop through the sessions, incrementing the hash table.
But that seems a little ugly; I just thought someone may have a more elegant solution.
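For what it's worth, the minute-bucket idea has a bucket-free variant: treat each login as a +1 event and each logout as a -1 event, sort all events by time, and track the running sum; its maximum is the exact peak at any resolution. A sketch in Python for illustration only (the application itself is PHP):

```python
def peak_concurrent(sessions):
    """sessions: iterable of (login_ts, logout_ts) pairs.

    Timestamps can be seconds-since-midnight, epoch seconds, or
    datetimes -- anything comparable.
    """
    events = []
    for login, logout in sessions:
        events.append((login, 1))    # +1 when a user logs in
        events.append((logout, -1))  # -1 when a user logs out
    # Sort by time; at equal times the -1 sorts first, so a user who
    # logs out at the same instant another logs in is not counted as
    # an extra concurrent user.
    events.sort(key=lambda e: (e[0], e[1]))
    current = peak = 0
    for _, delta in events:
        current += delta
        peak = max(peak, current)
    return peak

# Made-up sessions as seconds since midnight.
print(peak_concurrent([(0, 600), (300, 900), (600, 1200)]))  # -> 2
```

This is O(n log n) in the number of sessions and needs no fixed slot size, though the 5-minute hash table described above is equally workable.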

LVL 24

Expert Comment

Well yeah, if I needed to check every X seconds I wouldn't use a cron job either. I'd use a script that runs perpetually, idly waiting for the next time to check.

I'm not a PHP developer so I can't comment on the solution you are thinking of.

LVL 107

Expert Comment

In the RESTful implementation of the Client/Server model (which is how the WWW works) there is no concept of an online user.  Let me try to explain with an example.  Let's say I visit a web site and receive a web page in my browser.  This was an atomic request - the URL was my complete request string and the web page was the server's complete response.  At that point it is over.  The server does not know whether I am looking at the web page or in the kitchen drinking coffee.  If you visit the same web page ten seconds later, are we concurrent users?  There is no way to know because the server does not get any notification when I have left the web page.

OK, well, what about logins and sessions?  You can create a session for me when I come to your page, and the session will "expire" when I close my browser window.  But the expiration process on the client end of things consists of nothing more than discarding the session cookie.  On the server side, you might want to learn about session garbage collection (it's a little bit complicated for me to explain in a post at EE) and as you think through what the GC routines do, you will quickly come to realize that the presence of session data is a very poor proxy for client activity.  It will grossly overstate the "concurrent" indicator.

A better measure of what is going on might be a page hit counter.  You can get that information and it will be accurate.  What you cannot get is how many people are looking at a web page at the same time.  The client-server architecture does not contain this information.

Author Comment

@Ray Paseur: I think we all know about the stateless nature of web browsers.
We use a session control class to simulate state; after 30 minutes of idle time it invalidates that particular session.
The session control class logs session activity to a MySQL table "session_log".

The database fields of the session log given above are a simplification; the duration of a particular session is actually:
loginTime - timestamp (Time the session began)
lastRequest - timestamp (Last time an Ajax or Page request was sent.)

The application is heavily Ajax-based, with about 80% of all requests being Ajax.
For our stats, each day we record total Ajax requests and total page requests (page hits).

The reason for needing to know peak concurrent users is for our billing purposes.

LVL 33

Expert Comment

A thought for the question here:

You might have an integer column in your database, users_login, which is increased by one when a user logs in and decreased by one when the user logs out or is cleared by a timeout. On each login you add one and then test against the max. If this users_login number exceeds the max, i.e. the "Licensed for X concurrent users" amount, then send the Email warning.

LVL 24

Expert Comment

Hi Matthew_Way,

Just realized a way to have that sessions table I mentioned in the previous post without using a cron job or an outside script to query it regularly: use a database trigger.

So you have a sessions table that you insert into every time somebody logs in, and delete from every time somebody logs out. Then have a trigger count the rows in that table every time a change is made and store the result in another table. Of course, this will take up considerable resources if you have login/logout activity every few seconds, but if that's not the case, and you want the total session count updated the moment somebody logs in or out, this should work. Something like:

```
mysql> create table current_sessions (sessionid int(11), primary key (sessionid));
Query OK, 0 rows affected (0.08 sec)

mysql> create table simultaneous_session_history (log_date datetime, session_count int(11));
Query OK, 0 rows affected (0.05 sec)

mysql> delimiter |
mysql> create trigger count_sessions1
-> after insert on current_sessions for each row
-> begin
->   insert into simultaneous_session_history (log_date, session_count)
->     select now(), count(*) from current_sessions;
-> end;
-> |
Query OK, 0 rows affected (0.08 sec)

mysql> create trigger count_sessions2
-> after delete on current_sessions for each row
-> begin
->   insert into simultaneous_session_history (log_date, session_count)
->     select now(), count(*) from current_sessions;
-> end;
-> |
mysql> delimiter ;
mysql> insert into current_sessions (sessionid) values (1);
Query OK, 1 row affected (0.00 sec)

mysql> insert into current_sessions (sessionid) values (2);
Query OK, 1 row affected (0.00 sec)

mysql> select * from simultaneous_session_history;
+---------------------+---------------+
| log_date            | session_count |
+---------------------+---------------+
| 2011-05-06 10:58:07 |             1 |
| 2011-05-06 10:58:10 |             2 |
+---------------------+---------------+
2 rows in set (0.00 sec)

mysql> delete from current_sessions where sessionid=1;
Query OK, 1 row affected (0.00 sec)

mysql> select * from simultaneous_session_history order by log_date;
+---------------------+---------------+
| log_date            | session_count |
+---------------------+---------------+
| 2011-05-06 10:58:07 |             1 |
| 2011-05-06 10:58:10 |             2 |
| 2011-05-06 10:58:40 |             1 |
+---------------------+---------------+
3 rows in set (0.00 sec)
```

So you know that from 10:58:07 until just before 10:58:10 there was one user, then two users for the next 30 seconds, then back to one after that. Unfortunately MySQL doesn't allow multiple-event triggers, so you have to create one for insert and one for delete. See http://dev.mysql.com/doc/refman/5.1/en/create-trigger.html for more information.


LVL 107

Expert Comment

"I think we all know about the stateless nature of web browsers."

Yeah, it is a problem if you try to ignore it... Best of luck, ~Ray

Author Comment

@Slick812:
The license arrangement was negotiated by management / customer.
We charge them $X per concurrent user / year.
The customer's concern is that sometimes during the year they may need to add extra operators, and would therefore rather be charged an overflow at $Y / user / day for additional capacity.

The users are not casual users but full-time operators; we estimate 2,000 page requests / user / day.
So we can expect 2,000 requests × 70 users = 140,000 page requests.
They connect to our server by site to site VPN.
In addition we have 120 internal users, which we don't charge for but do need to gather stats.

The different user pools we internally refer to as "domains".

I hope this explains the business requirements.

LVL 33

Accepted Solution

OK Matthew, that shines some light on your problem. I'm not too sure about the logic you have for "charged an overflow at $Y / user / day" based on some determination of "peak concurrent users", and how that determines an added charge rate.

You have said that some of the columns in your session log table are:
1. sessionId - unsigned int - auto increment
2. userId - unsigned int
3. loginTime - timestamp
4. logoutTime - timestamp

It might take quite a bit of statistical analysis in a PHP function to go through this table at the end of the day and find the "peak concurrent users for a given day", at least that's my view without actually trying it. I would think you would need math in a function to sort the user sessions, determine which were running at the same time, and apply some sort of "overlap" filter (a formula for sorting and differentiation) to see how many users were logged in at the same time. As johanntagle said in a post above, "It's like plotting each row as a bar graph and finding which time period has the most graphs crossing it" - not so simple, and my mind cannot come up with a plan to do this.

The main problem in my thinking is that there are so many time blocks (say you try a "10 minute" or "one hour" block) to analyze, for all the users that day at many different times. As a note, I have gotten help with math problems in my programming in the Math area here at EE.

I think it simpler to keep a running count in your table, tracking the number of users logged in for each "time block" as it happens. What I would try for testing is to add five columns to your session log table:
5. loggedIn - unsigned int // holds the number of users logged in, starts at zero
6. overMax - boolean // true if users logged in is above Max
7. highOver - unsigned int // highest number in loggedIn while overMax is true
8. beginMax - timestamp
9. endMax - timestamp

When a user successfully logs in, I guess you already update the sessionId, userId, and loginTime fields. What you could also do is increase loggedIn by one on each new login. If the boolean overMax is false, test the new increased loggedIn number to see if it is above the MAX for concurrent users; if it is, update overMax to true, update highOver to the value of the new loggedIn number, and update beginMax to the current timestamp. As soon as overMax is set to true, I would write this date/time information to a log file -
beginMax:5-7-2011-14-32

If overMax is already true on login, test highOver against the new increased loggedIn number; if higher, replace highOver with the higher loggedIn number.

When a user logs out (or times out), decrease loggedIn by one, then see if overMax is true. If it is true and the new decreased loggedIn is now below the MAX, update overMax to false and set endMax to the current timestamp. As soon as overMax is set to false, I would write this date/time and highOver information to a log file -
endMax:5-7-2011-14-56:highOver-77

Someone can then read the log file to see if and how many times the number of current users was over the max. I suppose there are ways to parse and analyze the log file to get the info you need without having a person read it, but that's more than I want to think about now.
As in all programming there's more than one way to solve a problem; this is what I might try.
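A minimal in-memory sketch of the running-counter scheme described above, in Python for illustration only (the real version would update the loggedIn / overMax / highOver columns on each login/logout; the licensed maximum here is a made-up value):

```python
class LicenseMonitor:
    """Tracks the loggedIn / overMax / highOver / beginMax / endMax state."""

    def __init__(self, max_licensed):
        self.max_licensed = max_licensed  # "Licensed for X concurrent users"
        self.logged_in = 0                # loggedIn column, starts at zero
        self.over_max = False             # overMax column
        self.high_over = 0                # highOver column
        self.events = []                  # stands in for the beginMax/endMax log file

    def login(self, ts):
        self.logged_in += 1
        if not self.over_max and self.logged_in > self.max_licensed:
            # Crossed the licensed maximum: open an over-max period.
            self.over_max = True
            self.high_over = self.logged_in
            self.events.append(("beginMax", ts))
        elif self.over_max and self.logged_in > self.high_over:
            # Already over: just track the new high-water mark.
            self.high_over = self.logged_in

    def logout(self, ts):
        self.logged_in -= 1
        if self.over_max and self.logged_in <= self.max_licensed:
            # Dropped back under the maximum: close the over-max period.
            self.over_max = False
            self.events.append(("endMax", ts, self.high_over))

monitor = LicenseMonitor(max_licensed=2)
for t, event in [(1, "in"), (2, "in"), (3, "in"), (4, "out")]:
    monitor.login(t) if event == "in" else monitor.logout(t)
print(monitor.events)  # -> [('beginMax', 3), ('endMax', 4, 3)]
```

The overflow billing report then just has to scan the recorded beginMax/endMax pairs for the day.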


Assisted Solution

@Slick812: There are some other issues which cause a problem in evaluating the concurrent users in real time.
The main one is logout: in around 20% of cases people don't actually log out, either because they have a technical problem or because they simply close the browser.
How we handle this is to allow an idle time between requests (normally 30 minutes).
So when the next request comes in, we check the last-request timestamp; if it's been longer than 30 minutes they get sent back to the login page to create a new session.
We don't have a garbage collector which scans sessions every X minutes; instead we take the last page request as their logout time.
The second complication is that we have several user domains which need to be tracked separately.

I've now written a PHP function which creates a hash table of time slots in 5-minute blocks.
Since we will only see user activity between 8am and 8pm, the hash table shouldn't normally grow over 144 entries.
Once the hash table has been built I can then use max( hashTable ) to get the high-water mark.

I would like to award you the points for the work you have done,
but I really want something which can gather the stats from the log files.

Example code follows:

```
public function getPeakUsageForDate($statDate){
    // Get the session log for the day; map login / last-active times to
    // 5-minute cohorts from midnight.
    // Example: 09:35 -> floor((9*60 + 35) / 5) = block 115 of the day.
    $class = __METHOD__;
    // The JOIN assumes a users table holding each user's domain; adjust to your schema.
    $sql_peakUsage = <<<SQL
SELECT sl.sessionId, sl.userId, us.domain,
  FLOOR(TIMESTAMPDIFF( MINUTE, DATE(sl.loginTime), sl.loginTime )/5) AS loginTime5Minute,
  FLOOR(TIMESTAMPDIFF( MINUTE, DATE(sl.lastActive), sl.lastActive )/5) AS lastActive5Minute
FROM sessionlog sl
JOIN users us ON us.userId = sl.userId
WHERE DATE(sl.loginTime) = '{$statDate}'
-- Class = {$class}
SQL;

    $queryResult = $this->database->query($sql_peakUsage);
    $peakHash = array('LAN'=>array(), 'VPN'=>array());
    while( $dbRow = $queryResult->fetch_assoc() ){
        $domain = strtoupper($dbRow['domain']);
        if( $domain!=='LAN' && $domain!=='VPN' ){
            // Domain doesn't exist.
            continue;
        }
        $loginTime5Minute  = intval($dbRow['loginTime5Minute']);
        $lastActive5Minute = intval($dbRow['lastActive5Minute']);
        // Increment every 5-minute block this session was active in.
        for( $epoch = $loginTime5Minute; $epoch <= $lastActive5Minute; $epoch++ ){
            if( array_key_exists($epoch, $peakHash[$domain]) === false ){
                $peakHash[$domain][$epoch] = 0;
            }
            $peakHash[$domain][$epoch]++;
        }
    }
    // Get hash table max values; give a value of zero if the table is empty.
    $lanPeak = count($peakHash['LAN'])===0 ? 0 : max($peakHash['LAN']);
    $vpnPeak = count($peakHash['VPN'])===0 ? 0 : max($peakHash['VPN']);
    return array(
        'lanUsersOnlinePeak' => $lanPeak,
        'vpnUsersOnlinePeak' => $vpnPeak
    );
}
```

LVL 33

Expert Comment

@ Matthew_Way

I did some thinking on this problem, but not any code work or testing, so I do not really need the points. Since you followed your own thinking with a hash table to go through and test each 5-minute period, I don't see that I deserve many points. Your problem did not seem all that complex at first, but it was somewhat difficult to get an idea of the time-frame elements of your logged-in analysis.
Yeah, there are not many who ever bother to log out of anything on a network if it is not enforced. I had not considered your inclusion of "lastActive", but that's a way to help with the lack of proper logouts.

I hope in testing your results are good, and you can move on.

Good luck.

Author Closing Comment

Thanks Slick812 for a solution for tracking peak users in real time.
But I really wanted a method which creates peak usage stats from log data.
