I am probably going to have to run this query in SQL Server but I can convert an Access solution to T-SQL so either will do.
I have a table of customers with their ServiceStartDate and ServiceDropDate. The drop date is null if the account is currently active. To do proper forecasting, I need a count of all active accounts by day, starting in January 2009. I have a calendar table populated with one record per day from then through December 2099.

The query I have uses a non-equi-join between the two tables and counts the results, but it doesn't run in an acceptable amount of time (I've let it run for almost two hours before cancelling it). There are 1.2 million customer records, so the intermediate recordset is huge. Here is the query as I have written it. ACCESS_30DayMeterForecast is a view that does a very complicated calculation to obtain Zone; SYS_Calendar is a table.
SELECT MF.Commodity, MF.UtilityID, MF.FinalZone, SYS_Calendar.[Date], COUNT(*) AS MeterCount
FROM ACCESS_30DayMeterForecast AS MF
INNER JOIN SYS_Calendar ON MF.ServiceStartDate <= SYS_Calendar.[Date]
WHERE (MF.DropDate IS NULL OR MF.DropDate > SYS_Calendar.[Date])
  AND SYS_Calendar.is_weekday = 1
GROUP BY MF.Commodity, MF.UtilityID, MF.FinalZone, SYS_Calendar.[Date];
I'm looking for alternative solutions.
At the moment, there are no useful indexes on either table. The customer table is reloaded every night and I could ask the DBA to add indexes if I am convinced that will help.
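If indexing does turn out to help, the kind of indexes I would ask for would look something like this. The view's base table isn't named above, so "CustomerBase" is a placeholder, and the included columns are just the ones the query touches:

```sql
-- Hypothetical: "CustomerBase" stands in for whatever table backs
-- ACCESS_30DayMeterForecast; the date columns are the ones from the query.
CREATE INDEX IX_CustomerBase_Dates
    ON CustomerBase (ServiceStartDate, DropDate)
    INCLUDE (Commodity, UtilityID);  -- covering columns used in the GROUP BY

CREATE INDEX IX_SYS_Calendar_Weekday
    ON SYS_Calendar ([Date], is_weekday);
```

Since the customer table is reloaded nightly, these would either have to be recreated after each load or added to the load script.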
One idea I have is to use a cursor instead of trying to do this in a query. The code would read each record in the customer table and increment buckets in an array. The array could be two-dimensional (year and day), or just as easily one-dimensional, using the integer part of the date to decide which bucket to update. The inner loop would start at the customer's ServiceStartDate and increment the date one day at a time until it reaches the drop date or the current date. At the end of the process, the array would be written out and used in the forecasting. Typically cursors are much slower than set-based queries, but that may not be true in this case, given the huge intermediate result set the query engine would have to create before aggregating the data.
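A rough sketch of what I have in mind, in T-SQL. A temp table keyed by date stands in for the array, and I've left out the Commodity/UtilityID/FinalZone grouping for brevity, so treat this as pseudocode rather than a finished procedure:

```sql
-- Sketch only: #Buckets plays the role of the array of daily counts.
-- Column names are taken from the query above.
DECLARE @Start date, @Drop date, @d date;

CREATE TABLE #Buckets ([Date] date PRIMARY KEY, MeterCount int NOT NULL);

DECLARE cust CURSOR FAST_FORWARD FOR
    SELECT ServiceStartDate, DropDate
    FROM ACCESS_30DayMeterForecast;

OPEN cust;
FETCH NEXT FROM cust INTO @Start, @Drop;
WHILE @@FETCH_STATUS = 0
BEGIN
    SET @d = @Start;
    -- walk one day at a time up to the drop date, or today for active accounts
    WHILE @d < COALESCE(@Drop, CAST(GETDATE() AS date))
    BEGIN
        UPDATE #Buckets SET MeterCount = MeterCount + 1 WHERE [Date] = @d;
        IF @@ROWCOUNT = 0
            INSERT INTO #Buckets ([Date], MeterCount) VALUES (@d, 1);
        SET @d = DATEADD(day, 1, @d);
    END;
    FETCH NEXT FROM cust INTO @Start, @Drop;
END;
CLOSE cust;
DEALLOCATE cust;
```

At the end, #Buckets would hold one row per day with the active-account count, ready to dump out for the forecasting step. I'm not sure whether walking potentially years of days per customer like this would actually beat the join, which is why I'm asking.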