T-SQL for a date series to return 'x rows within y days of each other'

Hi All

I have a requirement to query against a normalized table with member id's and dates and generate a list where there are at least 5 rows where the dates are within 30 days of each other.  Anyone ever pull this off before?

CREATE TABLE #tmp (id int, dt date) 

INSERT INTO #tmp (id, dt) 
  (1, '2015-01-01'), (1, '2015-03-01'), (1, '2015-04-01'),
  (2, '2015-04-01'), (2, '2015-04-01'), (2, '2015-04-02'),  (2, '2015-04-09'), (2, '2015-04-15'), (2, '2015-04-30'),
  (3, '2015-04-01'), (3, '2015-04-01'), (3, '2015-04-02'),  (3, '2015-05-09'), (3, '2015-06-15'), (3, '2015-06-30')

Open in new window

My desired return set would contain id=2 (5+ rows with dates in April 2015), and not contain and 3 (5 rows but not within 30 days of each other), and not 1 (not 5 rows).

Thanks in advance.
LVL 67
Jim HornMicrosoft SQL Server Data DudeAsked:
Who is Participating?

[Product update] Infrastructure Analysis Tool is now available with Business Accounts.Learn More

I wear a lot of hats...

"The solutions and answers provided on Experts Exchange have been extremely helpful to me over the last few years. I wear a lot of hats - Developer, Database Administrator, Help Desk, etc., so I know a lot of things but not a lot about one thing. Experts Exchange gives me answers from people who do know a lot about one thing, in a easy to use platform." -Todd S.

dsackerContract ERP Admin/ConsultantCommented:
Hi Jim. This code should do the trick, or is this regardless of id?

SELECT  a.id, a.dt
FROM    @tmp a
JOIN   (SELECT  id, MIN(dt) AS minDt, MAX(dt) AS maxDt, COUNT(1) AS idCount
        FROM    #tmp
        GROUP BY id
        HAVING ABS(DATEDIFF(d, MIN(dt), MAX(dt))) <= 30)b
ON      b.id = a.id
WHERE   idCount >= 5

Open in new window

dsackerContract ERP Admin/ConsultantCommented:
I saw the additional objective for the counts after I posted, so I have modified the code above.
That wouldn't account for items with 5 orders somewhere in the middle of a list (within 30 days).

I had to do this for an ordering / inventory system.  The best I could come up with was running it against each record, using a cross apply.
Bit of a beastly execution plan, and takes a little over a second on a table with ~200K rows, but works:

  select distinct t1.id
  from #tmp t1
  cross apply (
	select count(1) as nRecords
	from #tmp t2
	where t1.id = t2.id and t2.dt >= t1.dt and t2.dt <= dateadd(dd, 30, t1.dt)
  ) t2
  where t2.nRecords >= 5

Open in new window

Though I have that nagging thought in the back of my head I could fit an "exists" in there somewhere to stop after the first match for each member...

Experts Exchange Solution brought to you by

Your issues matter to us.

Facing a tech roadblock? Get the help and guidance you need from experienced professionals who care. Ask your question anytime, anywhere, with no hassle.

Start your 7-day free trial
Active Protection takes the fight to cryptojacking

While there were several headline-grabbing ransomware attacks during in 2017, another big threat started appearing at the same time that didn’t get the same coverage – illicit cryptomining.

In the inventory one also looping in the actual record ID so it doesn't include itself in the apply.  If not possible then you of course should bump to nRecords >= 6.
Scott PletcherSenior DBACommented:
Haven't fully tested this yet, but I'll go ahead and post it since it should be an efficient way to do this if it works.  It uses a "standard" tally table of sequential numbers.

FROM #tmp d
INNER JOIN DBA.dbo.tally t ON
    t.tally BETWEEN 1 AND 61
    SELECT DATEADD(DAY, t.tally - 31, d.dt) AS day#
) AS ca1
GROUP BY d.id, ca1.day#
Jim HornMicrosoft SQL Server Data DudeAuthor Commented:
... looking ...
Jim HornMicrosoft SQL Server Data DudeAuthor Commented:
Points for all, more for the one I ran with.  Final code below, which includes some stuff that wasn't critical to the question.  Thanks guys.
INSERT INTO rpt_Scoring(HubspotContactId, ScoreId, Dt, Score, Notes)
SELECT DISTINCT s.HubspotContactId, 8, CAST(s_five.dt as date), 5, 'Click through 5+ emails within 30 days'
FROM rpt_Scoring s
   JOIN (
      SELECT DISTINCT t1.HubspotContactId, t2.dt
      FROM rpt_Scoring t1
         SELECT count(1) as nRecords, dt
         FROM rpt_Scoring t2
         WHERE t1.HubspotContactId = t2.HubspotContactId and t2.dt >= t1.dt and t2.dt <= dateadd(dd, 30, t1.dt) AND ScoreId = 9
         GROUP BY dt) t2
WHERE t2.nRecords >= 5 AND ScoreId = 9) s_five ON s.HubspotContactId = s_five.HubspotContactId AND s.dt = s_five.dt

Open in new window

It's more than this solution.Get answers and train to solve all your tech problems - anytime, anywhere.Try it for free Edge Out The Competitionfor your dream job with proven skills and certifications.Get started today Stand Outas the employee with proven skills.Start learning today for free Move Your Career Forwardwith certification training in the latest technologies.Start your trial today
Microsoft SQL Server

From novice to tech pro — start learning today.