Link to home
Start Free TrialLog in
Avatar of bdichiara
bdichiaraFlag for United States of America

asked on

How to fill in missing date gaps in date range

I need to display how many records are in each month/year in a grid view. To program this in a table manner, it would be very helpful if every month in every year have 1 or more records, however that's not the case, so I was wondering if it's possible to fill in those gaps some way with 0s at the query level. I was trying to figure out a way to do it at PHP level, however it was going to get way complicated to keep track of the current month, then match it with the row value, etc. If i had representation for every month and year within the range, i could just spit out my data accordingly. Here's the query I'm using:
SELECT
    COUNT(`id`) AS `total`, DATE_FORMAT(`report_date`,'%Y-%m') AS `group_date`,
    DATE_FORMAT(`report_date`,'%Y') AS `report_year`, DATE_FORMAT(`report_date`,'%m') AS `report_month`,
    DATE_FORMAT(`report_date`,'%M') AS `disp_month`
FROM `mhl_reports`
WHERE `deleted` IS NULL AND YEAR(`report_date`) >= '2001' AND YEAR(`report_date`) <= '2009'
GROUP BY `group_date`
ORDER BY `report_month` ASC, `report_year` DESC;

Open in new window

ASKER CERTIFIED SOLUTION
Avatar of Ray Paseur
Ray Paseur
Flag of United States of America image

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Avatar of bdichiara

ASKER

I had thought about doing it this way. Do you think doing it this way is faster (performance-wise) than somehow doing it at MySQL level?
SOLUTION
Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
I don't know how to do it effectively at the mysql level!  

But seriously, with only about 100 possible rows in the result set, you will never be able to tell the difference in speed between one way or another.  It's too small to optimize.
If you try my solution, fix line 44.

If you try CXR's solution, use ENGINE=MEMORY for the temporary tables.

If you try them both, I would love for you to put a script timer on them and see if you can detect a difference of more than 1/100 second.

Best regards, ~Ray
		echo "<br/>$month HAS $total";

Open in new window

I am using cxr's method because it was the simplest to implement without having to go and change all my variables where my data was being output. it seems to work pretty good, although I had to change this:

m AS `report_month`

to this:

DATE_FORMAT(concat('2001-',m,'-01'),'%m') AS `report_month`

Having the date as a single integer messed up sorting (and my code) so I just set the output to the same format as before.

Thanks again for the help.
my sincerest apologies to Ray_Paseur for this one. I got to using the other method and with the amount of data already in the table (3k records +/-) that process was actually quite slow, when I moved to the method below, I got much much (noticeable) better performance. Sorry for mis-distributing the points, but I felt posting the results and best solution was necessary. (I created my array a bit differently, not sure if it's faster or slower, but just the way I decided to do it.
$reportsSQL = sprintf("SELECT
		COUNT(`id`) AS `total`, DATE_FORMAT(`report_date`,'%%Y-%%m') AS `group_date`,
		DATE_FORMAT(`report_date`,'%%Y') AS `report_year`, DATE_FORMAT(`report_date`,'%%m') AS `report_month`,
		DATE_FORMAT(`report_date`,'%%M') AS `disp_month`
	FROM `mhl_reports`
	WHERE `deleted` IS NULL AND YEAR(`report_date`) >= %s AND YEAR(`report_date`) <= %s
	GROUP BY `group_date`
	ORDER BY `report_month` ASC, `report_year` DESC;",
		escape($minYear),
		escape($maxYear)
);
		
#echo $reportsSQL . '<br />';
$reports = new mySQL($reportsSQL);
 
$data = array();
for($m=1; $m <= 12; $m++){
	for($y=$maxYear; $y >= $minYear; $y--){
		$report_month = str_pad($m, 2, "0", STR_PAD_LEFT);
		$key = $y . '-' . $report_month;
		$data[$key]['total'] = 0;
		$data[$key]['report_year'] = $y;
		$data[$key]['report_month'] = $report_month;
		$data[$key]['group_date'] = $key;
		$data[$key]['disp_month'] = date("F", strtotime($y.'-'.$report_month.'-01'));
	}
}
 
do {
	$key = $reports->row['group_date'];
	$data[$key] = $reports->row;
} while ($reports->row = mysql_fetch_assoc($reports->cmd));

Open in new window

Two things:

1) With only 3000 rows, the query should not have to be slow. There are some adjustments that can be done to make it faster. Make sure you have an index on the report_date column, and try the variant below.

2) You can "undo" your decission, and split the points differently. Just press the "request attention" link in the lower right corner of the original question, and a moderator will re-open this question.
create temporary table years (y smallint) engine=memory;
insert into years values 
  (2001),(2002),(2003),(2004),
  (2005),(2006),(2007),(2008),(2009);
create temporary table months(m tinyint) engine=memory;
insert into months values 
  (1),(2),(3),(4),(5),(6),
  (7),(8),(9),(10),(11),(12);
 
SELECT
    COUNT(`id`) AS `total`, concat(y,'-',m) AS `group_date`,
    y AS `report_year`, 
    DATE_FORMAT(concat('2001-',m,'-01'),'%m') AS `report_month`,
    DATE_FORMAT(concat('2001-',m,'-01'),'%M') AS `disp_month`
FROM (years,months)
LEFT JOIN `mhl_reports` ON 
  `report_date` between concat(y,'-',m,'-01') and concat(y,'-',m,'-31') and
  `deleted` IS NULL 
GROUP BY `group_date`
ORDER BY `report_month` ASC, `report_year` DESC;

Open in new window

Great suggestion by cxr and excellent alternative method, however I found the most optimal method to be in Ray_Paseur's solution. Thanks!