Solved

# Extrapolate from range, count unique rows

Posted on 2006-03-30
1,657 Views
Given a visitDate range, for every day covered by a search that occured in that range, count unique records.

Sample data (see end of question for SQL to create this table):
+---------+--------------+----------------+--------------+
| visitor  | visitDate     | searchStart   | searchEnd  |
+---------+--------------+----------------+--------------+
|       1   | 2006-03-30 | 2006-05-01  | 2006-05-03 |
|       2   | 2006-03-30 | 2006-04-28  | 2006-05-02 |
|       1   | 2006-03-30 | 2006-05-01  | 2006-05-03 |
|       2   | 2006-03-30 | 2006-04-30  | 2006-05-02 |
|       3   | 2006-03-31 | 2006-05-01  | 2006-05-05 |
+---------+--------------+---------------+---------------+

In the sample data, the two rows for visitor 1 are identical. They count as only one. The two rows for vistor 2 have different searchStart values. They count as two.

Given: visitDate = 2006-03-30
Result:
+---------------+--------+
| searchDate  | visits   |
+---------------+--------+
|  2006-04-28 |      1   | // visitor 2's first search
|  2006-04-29 |      1   |
|  2006-04-30 |      2   | // visitor 2's two searches
|  2006-05-01 |      3   | // visitor 1's search and visitor 2's two searches
|  2006-05-02 |      3   |
|  2006-05-03 |      1   | // visitor 1's search
+---------------+--------+

Given: visitDate = 2006-03-31
Result:
+---------------+--------+
| searchDate  | visits   |
+---------------+--------+
|  2006-05-01 |      1   | // all from visitor 3
|  2006-05-02 |      1   |
|  2006-05-03 |      1   |
|  2006-05-04 |      1   |
|  2006-05-05 |      1   |
+---------------+--------+

Given: visitDate BETWEEN 2006-03-30 AND 2006-03-31
Result:
+---------------+--------+
| searchDate  | visits   |
+---------------+--------+
|  2006-04-28 |      1   | // the previous two results added together
|  2006-04-29 |      1   |
|  2006-04-30 |      2   |
|  2006-05-01 |      4   |
|  2006-05-02 |      4   |
|  2006-05-03 |      2   |
|  2006-05-04 |      1   |
|  2006-05-05 |      1   |
+---------------+--------+

Restrictions:
MySQL 4.0.12
Temporary tables not allowed.
Multiple queries, user variables, and use of a scripting language (PHP) are acceptable.
The table contains 500K rows.  The query (or queries) should complete in a reasonable time. 10 seconds is ok, 5 minutes is not.  To that end, table structure may be changed, indexes added, as appropriate.

SQL to create sample data:

CREATE TABLE visitData (
visitor int,
visitDate date,
searchStart date,
searchEnd date
);
INSERT INTO visitData VALUES (1, '2006-03-30', '2006-05-01', '2006-05-03');
INSERT INTO visitData VALUES (2, '2006-03-30', '2006-04-28', '2006-05-02');
INSERT INTO visitData VALUES (1, '2006-03-30', '2006-05-01', '2006-05-03');
INSERT INTO visitData VALUES (2, '2006-03-30', '2006-04-30', '2006-05-02');
INSERT INTO visitData VALUES (2, '2006-03-31', '2006-05-01', '2006-05-05');
0
Question by:snoyes_jw

LVL 11

Accepted Solution

Query should be
select searchStart,searchEnd,count(*) as cnt from visitData where visitDate = '2006-03-30' group by
visitor,visitDate,searchStart,searchEnd;
//for the visistData between you can use the following
select searchStart,searchEnd,count(*) as cnt from visitData where visitDate >= '2006-03-30' and visitDate <= '2006-03-31'  group by  visitor,visitDate,searchStart,searchEnd;
using your data it return the following result

+-------------+------------+----------+
| searchStart | searchEnd  | cnt   |
+-------------+------------+----------+
| 2006-05-01  | 2006-05-03 |        2 |
| 2006-04-28  | 2006-05-02 |        1 |
| 2006-04-30  | 2006-05-02 |        1 |
+-------------+------------+----------+

In PHP Code, you can you can use a while loop to get the recordset, the code is something like
\$dt_array = array(); //array to store the each date
while(\$row = mysql_fetch_array(\$result)) {
\$dt1 = \$row[0];
\$dt2 = \$row[1];
for(\$p=\$dt1;\$p<=\$dt2;) {
\$dt_array[\$p] = \$row[2];
\$p = date("Y-m-d",strtotime(\$p." +1 days"));
}
}

//print dt_array, dt_array contains all the dates and count
0

LVL 11

Expert Comment

In above the statement
\$dt_array[\$p] = \$row[2];

should be
\$dt_array[\$p] += \$row[2];
0

LVL 33

Author Comment

Might work.  I was reluctant to do that at the beginning of the project because I feared it would would be too much data for our web servers to handle in a PHP array.  But it turns out I can trim some outlying data, so I won't have more than a thousand or so entries in the \$dt_array, and since each is just an integer, I think we have the memory to handle that.

One concern is the query.  I'd need to change it to COUNT(DISTINCT visitor, visitDate, searchStart, searchEnd), and when I've done that, it takes longer to finish than I can allow.
0

LVL 14

Assisted Solution

if you create a table containing days :

CREATE TABLE days (
day date NOT NULL,
PRIMARY KEY  (day)
);

INSERT INTO days VALUES ('2006-04-26'), ('2006-04-27'), ('2006-04-28'), ('2006-04-29'), ('2006-04-30'), ('2006-05-01'), ('2006-05-02'), ('2006-05-03'), ('2006-05-04'), ('2006-05-05'), ('2006-05-06'), ('2006-05-07');

you could have your result with this simple query :

SELECT d.day, COUNT(DISTINCT v.visitor, v.visitdate, v.searchstart, v.searchend)
FROM days AS d
LEFT JOIN visitdata AS v ON d.day BETWEEN v.searchstart AND v.searchend
WHERE d.day BETWEEN \$begindate AND \$enddate
GROUP BY d.day
ORDER BY d.day;

but you will probably have the same performance problems you encountered with COUNT(DISTINCT visitor, visitDate, searchStart, searchEnd)
i have created a visit table with 2.5 M rows with random data and it takes 3 minutes to complete, with an index on  (visitor, visitDate, searchStart, searchEnd)
0

LVL 33

Author Comment

Yeah, I've tried a solution very much like that, and as you stated, it's too slow.  I think I'm going to have to switch gears and go with some sort of summary table.
0

## Featured Post

### Suggested Solutions

Foreword This is an old article.  Instead of using the MySQL extension that was used in the original code examples, please choose one of the currently supported database extensions instead.  More information is available here: MySQLi / PDO (http://…
Popularity Can Be Measured Sometimes we deal with questions of popularity, and we need a way to collect opinions from our clients.  This article shows a simple teaching example of how we might elect a favorite color by letting our clients vote for …
To add imagery to an HTML email signature, you have two options available to you. You can either add a logo/image by embedding it directly into the signature or hosting it externally and linking to it. The vast majority of email clients display l…
This video is in connection to the article "The case of a missing mobile phone (https://www.experts-exchange.com/articles/28474/The-Case-of-a-Missing-Mobile-Phone.html)". It will help one to understand clearly the steps to track a lost android phone.