EICT
asked on
How do I compare row values in a MYSQL table so that I can count how many have increased/decreased within the grouped values
I have the following table which records attainment levels of a Client in different areas on a review date.
clientcode review_date area1_level area2_level area3_level area4_level
C1013 2013-01-14 5 8 5 4
C1013 2013-04-04 5 8 5 5
C1013 2013-06-25 5 8 5 5
C1028 2014-07-07 9 5 10 9
C1031 2013-10-25 5 7 5 8
C1031 2014-01-21 3 2 5 4
C1061 2012-09-07 8 10 3 7
C1061 2012-12-04 8 10 3 7
C1061 2013-03-05 7 10 3 7
C1061 2013-06-13 7 10 4 7
C1061 2013-09-12 7 10 4 7
C1068 2012-12-17 8 1 8 9
C1068 2013-03-07 8 1 8 9
C1118 2014-03-11 8 5 8 7
Each client has a unique client code.
I would like to be able to
1) Compare the first review date values with the latest review date values in such a way as that I can count how many clients have ‘improved’, ‘worsened’ or ‘no change’ in each area.
2) Compare the latest review date values with the previous review date values in the same manner.
If there was only one review then I need to count this as a ‘no change’ in each area.
I wonder if the best way to do this is to join the first review columns to the latest review value columns, then compare the values with a comparison column for each area and then count the result.
The table above is the result of a complex query select statement already.
I have the following questions.
1. Can this be easily done with a single query in a simpler way, without having to join the complex query which created the above with it’s self?
2. If I have a series of review dates to obtain the latest date I can “GROUP the clientcode and then SORT BY review_date DESC LIMIT 1 “ to get the latest review date values but how to I select the date before the latest review date?
3. If there isn’t any previous review date values how can I instead use take the one review values and use them again, thereby showing ‘no change’?
Thanks for your help.
clientcode review_date area1_level area2_level area3_level area4_level
C1013 2013-01-14 5 8 5 4
C1013 2013-04-04 5 8 5 5
C1013 2013-06-25 5 8 5 5
C1028 2014-07-07 9 5 10 9
C1031 2013-10-25 5 7 5 8
C1031 2014-01-21 3 2 5 4
C1061 2012-09-07 8 10 3 7
C1061 2012-12-04 8 10 3 7
C1061 2013-03-05 7 10 3 7
C1061 2013-06-13 7 10 4 7
C1061 2013-09-12 7 10 4 7
C1068 2012-12-17 8 1 8 9
C1068 2013-03-07 8 1 8 9
C1118 2014-03-11 8 5 8 7
Each client has a unique client code.
I would like to be able to
1) Compare the first review date values with the latest review date values in such a way as that I can count how many clients have ‘improved’, ‘worsened’ or ‘no change’ in each area.
2) Compare the latest review date values with the previous review date values in the same manner.
If there was only one review then I need to count this as a ‘no change’ in each area.
I wonder if the best way to do this is to join the first review columns to the latest review value columns, then compare the values with a comparison column for each area and then count the result.
The table above is the result of a complex query select statement already.
I have the following questions.
1. Can this be easily done with a single query in a simpler way, without having to join the complex query which created the above with it’s self?
2. If I have a series of review dates to obtain the latest date I can “GROUP the clientcode and then SORT BY review_date DESC LIMIT 1 “ to get the latest review date values but how to I select the date before the latest review date?
3. If there isn’t any previous review date values how can I instead use take the one review values and use them again, thereby showing ‘no change’?
Thanks for your help.
How close or useful is this?
| CLIENTCODE | AREA | NUM_RECORDS | EARLIEST | LATEST | MIN_LEVEL | MAX_LEVEL | DELTA |
|------------|------|-------------|------------|------------|-----------|-----------|-------|
| C1013 | 1 | 3 | 2013-01-14 | 2013-06-25 | 5 | 5 | 0 |
| C1013 | 2 | 3 | 2013-01-14 | 2013-06-25 | 8 | 8 | 0 |
| C1013 | 3 | 3 | 2013-01-14 | 2013-06-25 | 5 | 5 | 0 |
| C1013 | 4 | 3 | 2013-01-14 | 2013-06-25 | 4 | 5 | 1 |
| C1028 | 1 | 1 | 2014-07-07 | 2014-07-07 | 9 | 9 | 0 |
| C1028 | 2 | 1 | 2014-07-07 | 2014-07-07 | 5 | 5 | 0 |
| C1028 | 3 | 1 | 2014-07-07 | 2014-07-07 | 10 | 10 | 0 |
| C1028 | 4 | 1 | 2014-07-07 | 2014-07-07 | 9 | 9 | 0 |
| C1031 | 1 | 2 | 2013-10-25 | 2014-01-21 | 3 | 5 | 2 |
| C1031 | 2 | 2 | 2013-10-25 | 2014-01-21 | 2 | 7 | 5 |
| C1031 | 3 | 2 | 2013-10-25 | 2014-01-21 | 5 | 5 | 0 |
| C1031 | 4 | 2 | 2013-10-25 | 2014-01-21 | 4 | 8 | 4 |
| C1061 | 1 | 5 | 2012-09-07 | 2013-09-12 | 7 | 8 | 1 |
| C1061 | 2 | 5 | 2012-09-07 | 2013-09-12 | 10 | 10 | 0 |
| C1061 | 3 | 5 | 2012-09-07 | 2013-09-12 | 3 | 4 | 1 |
| C1061 | 4 | 5 | 2012-09-07 | 2013-09-12 | 7 | 7 | 0 |
| C1068 | 1 | 2 | 2012-12-17 | 2013-03-07 | 8 | 8 | 0 |
| C1068 | 2 | 2 | 2012-12-17 | 2013-03-07 | 1 | 1 | 0 |
| C1068 | 3 | 2 | 2012-12-17 | 2013-03-07 | 8 | 8 | 0 |
| C1068 | 4 | 2 | 2012-12-17 | 2013-03-07 | 9 | 9 | 0 |
| C1118 | 1 | 1 | 2014-03-11 | 2014-03-11 | 8 | 8 | 0 |
| C1118 | 2 | 1 | 2014-03-11 | 2014-03-11 | 5 | 5 | 0 |
| C1118 | 3 | 1 | 2014-03-11 | 2014-03-11 | 8 | 8 | 0 |
| C1118 | 4 | 1 | 2014-03-11 | 2014-03-11 | 7 | 7 | 0 |
it was produced by the following query:
SELECT
clientcode
, area
, COUNT(*) AS num_records
, date_format(MIN(review_date),'%Y-%m-%d') AS earliest
, date_format(MAX(review_date),'%Y-%m-%d') AS latest
, MIN(level) AS min_level
, MAX(level) AS max_level
, MAX(level) - MIN(level) AS delta
FROM (
SELECT
clientcode
, review_date
, 1 AS area
, area1_level AS level
FROM Results
UNION ALL
SELECT
clientcode
, review_date
, 2 AS area
, area2_level AS level
FROM Results
UNION ALL
SELECT
clientcode
, review_date
, 3 AS area
, area3_level AS level
FROM Results
UNION ALL
SELECT
clientcode
, review_date
, 4 AS area
, area4_level AS level
FROM Results
) x
GROUP BY
clientcode
, area
;
Note. The majority of the query is using UNION ALL to "unpivot" your data into a normalized form which makes calculations much simpler. I do understand this query isn't exactly what you asked for, but what you have asked for is more complex than one query.
ASKER CERTIFIED SOLUTION
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
ASKER
Hi PortletPaul,
The second query is exactly what I want to get the LEVEL_AT_MIN_DT and LEVEL_AT MAX_DT.
Before I close this question. Do you have any idea how I can pick the Maximum Date and the Date Before. I need this as well as the Max/Min Dates.
For example, using my data above for Client C1061 I want data from rows with review dates 2013-06-13 and 2013-09-12
Thanks
The second query is exactly what I want to get the LEVEL_AT_MIN_DT and LEVEL_AT MAX_DT.
Before I close this question. Do you have any idea how I can pick the Maximum Date and the Date Before. I need this as well as the Max/Min Dates.
For example, using my data above for Client C1061 I want data from rows with review dates 2013-06-13 and 2013-09-12
Thanks
ASKER
I suppose I could do a separate sub query to find all the dates less than the Max date (i.e all the dates less than 2013-09-12) and then pick the max from this result.
Regards,
Matt
Regards,
Matt
>>" how I can pick the Maximum Date and the Date Before"
You need the equivalent of row_number() over(partition by ... order by ...) regrettably this "analytic function" isn't available in MySQL.
There was a nice blog on this topic which went missing, but here is a link to it that still works
In essence you attache 2 @variable to the row of a query using a cross join. Then you set value into those variables on each row so that you establish a "row number" starting at 1 and order by whatever fields you choose. This is "partitioned" (which is similar to "group by") so each partition gets to start at 1.
In the end you can filter for row numbers 1 and 2.
You need the equivalent of row_number() over(partition by ... order by ...) regrettably this "analytic function" isn't available in MySQL.
There was a nice blog on this topic which went missing, but here is a link to it that still works
In essence you attache 2 @variable to the row of a query using a cross join. Then you set value into those variables on each row so that you establish a "row number" starting at 1 and order by whatever fields you choose. This is "partitioned" (which is similar to "group by") so each partition gets to start at 1.
In the end you can filter for row numbers 1 and 2.
SELECT
clientcode
, review_date
, RowNumber
FROM (
SELECT
@row_num :=IF(@prev_value = r.clientcode, @row_num + 1, 1)AS RowNumber
, r.clientcode
, r.review_date
, @prev_value := r.clientcode prev_client
FROM Results r
CROSS JOIN (
SELECT @row_num :=1, @prev_value :=''
) vars
ORDER BY
r.clientcode
, r.review_date DESC
) sq
WHERE RowNumber < 3
http://sqlfiddle.com/#!9/9d789/18
ASKER
Thanks for your help.
ASKER
1. selecting the results to get the table as above but order the review_date in DESC order
2. then select this table using the Group By clause to receive the first review record for each clientcode group, called table1
3. selecting the results to get the table as above but order the review_date in ASC order
4. then select this table using the Group By clause to receive the last review record for each clientcode group, called table2
5. join subqueries table1 and table2
6. Select the result of all this and then use 'IF' statements in the SELECT clause to compare the first and last values.
Not sure how to complete No2. though and get the review before the last review. If that makes sense.