Link to home
Start Free TrialLog in
Avatar of paulnwgb
paulnwgb

asked on

SQL Date query

Hey experts

I have a table, policies, which looks like this:

customer_id int,
start_date datetime,
end_date datetime

Customers may have multiple policies, so we might have the following in our table:

1,2007-06-01,2008-05-31
1,2008-06-05,2009-01-01
2,2005-01-01,2012-01-01

I need a select which returns the customer ids that had a policy in force between two dates *without any gaps in coverage*.

So, using 2008-01-01 and 2008-12-31 as parameters to the select, only the 3rd record will come back, because he is the only customer who had a policy in effect for every day throughout 2008.

Let's say that the 2nd row had a start date of 2008-06-01 instead of 2008-06-05, he would also get returned, because although he doesn't have a single policy that covers all of 2008, between rows 1 & 2, he does.

Now, here's the kicker, this application is using SQL Server CE 3.5 - which does not support subselect.

Is there another way or am I forced to pull all the rows and calculate it in code?  

This is for a report on all customers - doing multiple queries per customer is not an option for performance reasons.

Thanks!
Avatar of richard_crist
richard_crist
Flag of United States of America image

Assuming that a customer with no gaps in coverage only has one row in the table, while customers with gaps have more than one, then the following query should return the customer ids with no gap in coverage:

select
    customer_id
from
    sometable
group by
    customer_id
having
    count(*) = 1

The above query aggregates the rows by customer_id.  Once aggregated you can use the "having" clause to return only rows where the aggregate function meets your criteria, in this case customer_id's with only one row in the table.

If you have customers with two or more rows where the dates have no gap from one row to the next, then this query would not work for you.

Avatar of paulnwgb
paulnwgb

ASKER

Thanks for the comment Richard - unfortunately, it is quite possible that a customer with no gaps in coverage has multiple rows.
ASKER CERTIFIED SOLUTION
Avatar of Paul_Harris_Fusion
Paul_Harris_Fusion
Flag of United Kingdom of Great Britain and Northern Ireland image

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
paulnwgb,

The solution by Paul_Harris_Fusion is outstanding and elegant.  As he states, if your data is such that the start date of one record is the same as the end date of the previous record, or the start date is one greater than the previous end date (that is, the next day) then his query should work.  If your data has the possibility of the start date on one record being before the end date of another record, then the situation is messy.

Paul_Harris_Fusion,

Can he do a preliminary check of his data to see if he has records with more than one day overlap by using your query above with the HAVING statement as follows?

having (max(end_date) - min(start_date))  > ( Sum(end_date-start_date))+1
Also, I meant to add that if this works for you please be sure and accept the comment of Paul_Harris_Fusion as the solution. That is, do not accept my comment because I am just commenting on the correctness of Paul_Harris_Fusion.   :)
Richard - I really appreciate the kind words -thanks.

Richard's suggestion to screen the data first is a good one.   The query he suggests should show up some potential errors but be aware that an overlap of say 3 days between period 1 and 2 can be exactly cancelled out by a corresponding gap between periods 2 and  3 and the query would not catch this since it based on totals.

Assuming there is a PK field on the policies table e.g. ID then you could try the follwing sql for screening purposes:

Select p1.ID, P2.ID
from policies p1, policies p2
where
(p1.start_date<=p2.start_date and p2.end_date>p2.start_date)
or
(p1.start_date>p2.start_date and p1.end_date<=p2.end_date)

The first data clause checks for period 1 straddling the start date of period 2
The second clause checks for period 1 being entirely contained with period 2.

This sql is a bit fiddly and unintuitve.  At first glance it does not appear to cover all cases but I think that the way the join works will actually uncover all problems without 'soft duplications' such as returning both 1,2  and 2,1.

Good luck



Paul/Richard

Thanks very much for your comments.  I am currently testing but I *think* it is looking good so far.

We never have overlap - if a new policy is created, we amend the end date of any existing policies whose end date is greater than the new start date, to match the new start date.

With that in mind, I think that addresses the potential gotcha?