T-SQL for a date series to return 'x rows within y days of each other'

Hi All

I have a requirement to query against a normalized table with member id's and dates and generate a list where there are at least 5 rows where the dates are within 30 days of each other.  Anyone ever pull this off before?

Example:
CREATE TABLE #tmp (id int, dt date) 

INSERT INTO #tmp (id, dt) 
VALUES 
  (1, '2015-01-01'), (1, '2015-03-01'), (1, '2015-04-01'),
  (2, '2015-04-01'), (2, '2015-04-01'), (2, '2015-04-02'),  (2, '2015-04-09'), (2, '2015-04-15'), (2, '2015-04-30'),
  (3, '2015-04-01'), (3, '2015-04-01'), (3, '2015-04-02'),  (3, '2015-05-09'), (3, '2015-06-15'), (3, '2015-06-30')

Open in new window


My desired return set would contain id=2 (5+ rows with dates in April 2015), and not contain and 3 (5 rows but not within 30 days of each other), and not 1 (not 5 rows).

Thanks in advance.
Jim
LVL 66
Jim HornMicrosoft SQL Server Developer, Architect, and AuthorAsked:
Who is Participating?
I wear a lot of hats...

"The solutions and answers provided on Experts Exchange have been extremely helpful to me over the last few years. I wear a lot of hats - Developer, Database Administrator, Help Desk, etc., so I know a lot of things but not a lot about one thing. Experts Exchange gives me answers from people who do know a lot about one thing, in a easy to use platform." -Todd S.

dsackerContract ERP Admin/ConsultantCommented:
Hi Jim. This code should do the trick, or is this regardless of id?

SELECT  a.id, a.dt
FROM    @tmp a
JOIN   (SELECT  id, MIN(dt) AS minDt, MAX(dt) AS maxDt, COUNT(1) AS idCount
        FROM    #tmp
        GROUP BY id
        HAVING ABS(DATEDIFF(d, MIN(dt), MAX(dt))) <= 30)b
ON      b.id = a.id
WHERE   idCount >= 5

Open in new window

0
dsackerContract ERP Admin/ConsultantCommented:
I saw the additional objective for the counts after I posted, so I have modified the code above.
0
Snarf0001Commented:
That wouldn't account for items with 5 orders somewhere in the middle of a list (within 30 days).

I had to do this for an ordering / inventory system.  The best I could come up with was running it against each record, using a cross apply.
Bit of a beastly execution plan, and takes a little over a second on a table with ~200K rows, but works:

  select distinct t1.id
  from #tmp t1
  cross apply (
	select count(1) as nRecords
	from #tmp t2
	where t1.id = t2.id and t2.dt >= t1.dt and t2.dt <= dateadd(dd, 30, t1.dt)
  ) t2
  where t2.nRecords >= 5

Open in new window


Though I have that nagging thought in the back of my head I could fit an "exists" in there somewhere to stop after the first match for each member...
0

Experts Exchange Solution brought to you by

Your issues matter to us.

Facing a tech roadblock? Get the help and guidance you need from experienced professionals who care. Ask your question anytime, anywhere, with no hassle.

Start your 7-day free trial
Ultimate Tool Kit for Technology Solution Provider

Broken down into practical pointers and step-by-step instructions, the IT Service Excellence Tool Kit delivers expert advice for technology solution providers. Get your free copy now.

Snarf0001Commented:
In the inventory one also looping in the actual record ID so it doesn't include itself in the apply.  If not possible then you of course should bump to nRecords >= 6.
0
Scott PletcherSenior DBACommented:
Haven't fully tested this yet, but I'll go ahead and post it since it should be an efficient way to do this if it works.  It uses a "standard" tally table of sequential numbers.

SELECT DISTINCT d.id
FROM #tmp d
INNER JOIN DBA.dbo.tally t ON
    t.tally BETWEEN 1 AND 61
CROSS APPLY (
    SELECT DATEADD(DAY, t.tally - 31, d.dt) AS day#
) AS ca1
GROUP BY d.id, ca1.day#
HAVING COUNT(*) >= 5
0
Jim HornMicrosoft SQL Server Developer, Architect, and AuthorAuthor Commented:
... looking ...
0
Jim HornMicrosoft SQL Server Developer, Architect, and AuthorAuthor Commented:
Points for all, more for the one I ran with.  Final code below, which includes some stuff that wasn't critical to the question.  Thanks guys.
INSERT INTO rpt_Scoring(HubspotContactId, ScoreId, Dt, Score, Notes)
SELECT DISTINCT s.HubspotContactId, 8, CAST(s_five.dt as date), 5, 'Click through 5+ emails within 30 days'
FROM rpt_Scoring s
   JOIN (
      SELECT DISTINCT t1.HubspotContactId, t2.dt
      FROM rpt_Scoring t1
      CROSS APPLY (
         SELECT count(1) as nRecords, dt
         FROM rpt_Scoring t2
         WHERE t1.HubspotContactId = t2.HubspotContactId and t2.dt >= t1.dt and t2.dt <= dateadd(dd, 30, t1.dt) AND ScoreId = 9
         GROUP BY dt) t2
WHERE t2.nRecords >= 5 AND ScoreId = 9) s_five ON s.HubspotContactId = s_five.HubspotContactId AND s.dt = s_five.dt

Open in new window

0
It's more than this solution.Get answers and train to solve all your tech problems - anytime, anywhere.Try it for free Edge Out The Competitionfor your dream job with proven skills and certifications.Get started today Stand Outas the employee with proven skills.Start learning today for free Move Your Career Forwardwith certification training in the latest technologies.Start your trial today
Microsoft SQL Server

From novice to tech pro — start learning today.

Question has a verified solution.

Are you are experiencing a similar issue? Get a personalized answer when you ask a related question.

Have a better answer? Share it in a comment.