Link to home
Start Free TrialLog in
Avatar of moe57
moe57

asked on

Finding records based on their previous dates and grouping them by location using SQL

Hi,
I have new requirement for this logic and it uses 2 main locations and the locations are categorized into these 2 groups: ER group and Clinic group
•      ER Group Locations = ER, OBS and IP
•      Clinic Group Locations = Clinic and OP
So the logic is following:
1)      ER Group
a)      Output this record, if patient makes his/her first visit within 48 hours to any of the ER group locations
b)      Output  any subsequent visits to any of the ER locations after 48 from their last visit (last visits = last output record)
c)      output if same patient makes a visit to any of the Clinic group locations after 90 days from their last visit (last visit = last output record for this patient)
2)      Clinic Group
a)      Output if patient makes his/her first visit to any of Clinic group locations
b)      Do not output any subsequent visits to any of the Clinic group locations for the same patient
c)      Output if same patient makes a visit to any of the ER group locations after 48 hours from their last clinic visit (last visit = last output record)
here is some examples for different visits
here is the final output should look like

here is sample data to test it
thank you all for your help.
Avatar of Peter Chan
Peter Chan
Flag of Hong Kong image

Try
select name,location,max(date) [Date]
from tab1
group by name,location

Open in new window

Avatar of moe57
moe57

ASKER

thanks but it is not as easy as that because there are some rules that you have to take into consideration as i have described in my initial post.
Avatar of moe57

ASKER

okay, this is what i have but it is not quite right yet. The problem with this query is that is not applying the 48 and the 90 days rule correctly because we always need to look at the last output record and from that date that is where we start counting if patient visited ER group within 48 hours.
here is the query that just need to be adjusted
;with grp as (
SELECT CASE WHEN LOCATION IN ('ER', 'OBS', 'IP') THEN 'ER' 
WHEN LOCATION IN('CLINIC', 'OP') THEN 'CLINIC' END GRP, *
FROM TABLE1 )
,grprn AS
(
SELECT ROW_NUMBER() OVER (PARTITION BY [Name] ORDER BY [DATE]) rn, * FROM grp
) 
,ERs AS
(
SELECT * FROM grprn WHERE GRP = 'ER'
)
,CLs AS
(
SELECT * FROM grprn WHERE GRP = 'CLINIC'
)
SELECT [Name], [Location], [DATE] 
FROM ERs WHERE RN = 1
UNION ALL 
SELECT [Name], [Location], [DATE] 
FROM ERs a 
WHERE RN > 1 AND EXISTS(SELECT 1 FROM ERs WHERE RN = 1 AND [Name] = a.[Name])
AND NOT EXISTS(SELECT 1 FROM ERs WHERE RN = 1 AND [Name] = a.[Name] AND a.[Date] < DateAdd(hh, 48, [Date]))
UNION ALL
SELECT [Name], [Location], [DATE] 
FROM CLs b 
WHERE RN > 1 AND EXISTS(SELECT 1 FROM ERs WHERE RN = 1 AND [Name] = b.[Name])
AND NOT EXISTS(SELECT 1 FROM ERs WHERE RN = 1 AND [Name] = b.[Name] AND b.[Date] < DateAdd(d, 90, [Date]))
UNION ALL
SELECT [Name], [Location], [DATE] 
FROM CLs WHERE RN = 1
UNION ALL 
SELECT [Name], [Location], [DATE] 
FROM ERs c 
WHERE RN > 1 AND EXISTS(SELECT 1 FROM CLs WHERE RN = 1 AND [Name] = c.[Name])
AND NOT EXISTS(SELECT 1 FROM CLs WHERE RN = 1 AND [Name] = c.[Name] AND c.[Date] < DateAdd(hh, 48, [Date]))
ORDER BY 1, 3

Open in new window

Avatar of Jim Horn
>here is some examples for different visits
>here is the final output should look like
>here is sample data to test it
Either my eyesight is going bad, or I don't see any of these samples/attachments.
Also, there is a wompload of business logic here, but without a data diagram that shows your tables and relationships it is very difficult to conceptualize what you are asking.
Avatar of moe57

ASKER

hmmm! you are right but I had details post out there for the last couple of days but it seems that some of my post was deleted this morning and i am not able to even edit it.  I am not sure what is going on..
@Jim - Weird, but the sample and final output were there earlier - I copied them into SQL fiddle. Is it possible for the OP to edit the question?
This is the result of what I copied into SQLFiddle to create the schema...
CREATE TABLE Table1
	([NAME] varchar(5), [LOCATION] varchar(6), [DATE] date, [NOTES] varchar(72))
;
	
INSERT INTO Table1
	([NAME], [LOCATION], [DATE], [NOTES])
VALUES
	('MIKE', 'ER', '2014-01-01 00:00:00', 'output it is first visit'),
	('MIKE', 'ER', '2014-01-02 00:00:00', 'do not output, it is <48'),
	('MIKE', 'OBS', '2014-01-03 10:00:00', 'output this because it is >48 from his last visit on 1/1/2014'),
	('MIKE', 'IP', '2014-01-04 00:00:00', 'do not output because it is less than 48 from his last visit on 1/3/2014'),
	('MIKE', 'IP', '2014-01-29 00:00:00', 'output this, it is >48 from his last visit on 1/3/2014'),
	('MIKE', 'OBS', '2014-04-15 01:00:00', 'output this, it is >48 from his last visit on 1/29/2014'),
	('MIKE', 'CLINIC', '2014-05-20 01:00:00', 'do not output, it is <90  from his last visit on 4/15/2014'),
	('MIKE', 'OP', '2014-06-20 01:00:00', 'do not output, it is <90  from his last visit on 4/15/2014'),
	('MIKE', 'CLINIC', '2014-09-15 01:00:00', 'output, it is >90 from his ER visit on 4/15/2014'),
	('DAVID', 'CLINIC', '2014-03-03 00:00:00', 'output it is first visit'),
	('DAVID', 'CLINIC', '2014-03-04 00:00:00', 'do not output'),
	('DAVID', 'CLINIC', '2014-07-05 01:00:00', 'do not output'),
	('DAVID', 'ER', '2014-03-15 00:00:00', 'output this because it is >48 from his last clinic visit on 3/3/2014'),
	('DAVID', 'CLINIC', '2014-03-18 00:00:00', 'do not output because it is < 90 from his ER visit on 3/15/2014'),
	('DAVID', 'OBS', '2014-03-19 00:00:00', 'output, it is >48 from his last ER visit on 3/15/2014')
;

Open in new window


@Moe, I'm thought I was close to solving it. The one part of the scenario that is causing problems is this one:
      ('MIKE', 'ER', '2014-01-01 00:00:00', 'output it is first visit'),
      ('MIKE', 'ER', '2014-01-02 00:00:00', 'do not output, it is <48'),
      ('MIKE', 'OBS', '2014-01-03 10:00:00', 'output this because it is >48 from his last visit on 1/1/2014'),

Note that the third line for MIKE is not actually >48 after his last ER visit (it is the next day), but it IS 48 hours after the most recent ER visit that matched the output rules. I'm struggling to build suitable CTE that can identify these properly. Potentially a patient could go to ER every day, and only every other visit would match the output rule.

@Experts: I'm out of time on this, so don't hesitate to jump in if you're inclined.
I couldn't give it up so close to the end:

Working example here on SQLFiddle

Returns 8 rows as required. I'm not claiming it's the most elegant way to do it...
It uses 3 CTEs.
1. order by person and identify location groups
2. Compare to date of previous row
3. Generate final result set by removing non-eligible rows

It has some superfluous columns in the CTEs but I'm out of time. Can tune on Monday if necessary.
;with cte (personID,ItemRank,ItemDate,ItemLocation,LocGroup,Notes) 
as
(select 
name
 ,ROW_NUMBER()  OVER (partition by name ORDER BY name,date)
,date
 ,location
 ,case when location in ('ER', 'OBS' ,'IP') then 'ER' ELSE 'CLINIC' END
 ,notes
from table1
),
 cteFilter (personID,ItemRank,LocRank,ItemDate,PrevDate,UseDate,ItemLocation,LocGroup,Notes) as
(select 
a.personID,
 ROW_NUMBER()  OVER (partition by a.personid ORDER BY a.personid,a.itemdate)
 ,row_number() over (order by a.locgroup,a.personid)
 ,a.ItemDate,coalesce(p.itemdate,null)
 ,case when a.locgroup = 'ER' and a.itemdate <dateadd(hh,48,coalesce(p.itemdate,null)) then p.itemdate 
   when a.locgroup='CLINIC' and a.itemdate <dateadd(d,90,coalesce(p.itemdate,null)) then p.itemdate 
  else a.itemdate end as UseDate
 ,a.ItemLocation
 ,a.LocGroup,a.Notes
from cte A 
   left join cte P on a.personID=p.personID and a.Itemrank=p.itemrank+1
),

cteFinal  (personID,ItemRank,LocRank,ItemDate,PrevDate,UseDate,ItemLocation,LocGroup,Notes) as 
(
select 
 a.personID,a.ItemRank,a.LocRank,a.ItemDate,a.PrevDate,a.UseDate,a.Itemlocation,a.LocGroup,a.Notes 
  from ctefilter A left join ctefilter P on a.personID=p.personID and a.Itemrank=p.itemrank+1
where a.itemrank=1 
or (a.locgroup='ER' and a.itemdate > dateadd(hh,48,p.usedate))
or (a.locgroup ='CLINIC' AND dateadd(d,90,coalesce(p.usedate,p.itemdate))<=a.itemdate
   AND
   (select locGroup from cte where itemrank=1 and personID=a.personid) ='ER')
)
select personID,ItemDate,Itemlocation,Notes from ctefinal
order by personid,itemdate;


--
select * from table1 where notes like 'Output%'
order by name,date;

Open in new window


Excellent link on simple talk for solving complex T-SQL problems step by step.
Avatar of moe57

ASKER

SimonAdept,
Thank you so much, the solution works great with the sample data, however when i try to test against my real data i get this error: " transaction log for database tempdb is full error".  I was doing some testing all day and it seems it does not like when we use the OR operation:
OR ( A.locgroup = 'ER' 
                      AND A.itemdate > Dateadd(hh, 48, P.usedate) ) 
                 OR ( A.locgroup = 'CLINIC' 

Open in new window

if i remove that then the query will run fine.  Thanks
Hi Moe,

How many rows in your live data?
How big is your tempdb?
How much free space is there on the drive?
Is tempdb a fixed size or is it allowed to autogrow? - allow autogrow if currently fixed (and there is drive space available)
Is it simple or full recovery model? - if full, I'd suggest setting it to simple.

If you're doing this sort of query on millions of rows of data you need appropriate server resources (disk space and RAM assigned to SQL Server).
As a short term fix for today:
1. You could try breaking your live data down into chunks (do an initial select into a staging table to get names beginning A-N, then M-Z).
2. You could also move some of the load out of tempdb by replacing the first cte with a query that creates a staging table in another database.
ASKER CERTIFIED SOLUTION
Avatar of Simon
Simon
Flag of United Kingdom of Great Britain and Northern Ireland image

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Avatar of moe57

ASKER

thanks you so much, the database is set to unlimited auto-grow and it is simple recover model.  It has 80gb free space.  I guess the only option is to break it down the data and see if that helps.  Thanks again, i really appreciate all your time and efforts on this.
Hi Moe, have you tried breaking down your source data and discovered a batch size that will work? If so, is that a workable method or would it take more than 10 batches to process all your data?

Please let me know if you would like assistance to modify the query to create and then use tables in another database to lighten the load on the tempdb.

Hope you're well down the road to completing your task.
Avatar of moe57

ASKER

Thanks Simon again for all your help, i could not do this without your precious time.  I was finally able to do this by following your suggestion by replacing the cte with a table and that did it.  thanks again for your support.
Avatar of moe57

ASKER

I've requested that this question be closed as follows:

Accepted answer: 0 points for moe57's comment #a40485588

for the following reason:

Simon was great helping and providing all kind of  support and guidance, i really appreciate your support.
That's excellent news. Now I can go to sleep happy :)

Just for the record, how may rows of live data were you processing?
Did you only have to replace the top CTE with a table?
Close Request Pending
moe57 requested that this question be closed by accepting moe57's comment #a40485588 (0 points) as the solution for the following reason:

Simon was great helping and providing all kind of  support and guidance, i really appreciate your support.

Moe, thanks for your kind words. Please accept one of my comments as the solution, or I'll get zero credit for my efforts on this.
Avatar of moe57

ASKER

great solution.  thanks
Avatar of moe57

ASKER

The clinic group is showing up multiple times and the rule was only to show one record if patient visits again another clinic group more than one time.  I remember you helped me similar issue about a week ago where clinic records where showing up multiple times for the same patient and you applied an extra parenthesis and that seemed to fix the issue.  I have tried that but did not work for me and i am not sure if i am doing it wrong.  It is weird that it works in the sample data but not production data.  I am using sql server 2008 R2.   Thanks
Hi Moe,

Can you be a bit more specific about which clinic visits are showing. Is it all clinic visits, just those 90+ days after an ER visit or subsequent clinic visits for patients whose first visit was to a clinic?

Can you post the actual SQL you are now using, which I understand to now be an initial query to write data to a staging table, and then the remaining part of the solution I posted yesterday.
Avatar of moe57

ASKER

Hi SimonAdept,
it seems the subsequent visits for the same patients.  Here is my actual query after i split the CTE:
-- this is the temp table for the first CTE 
INSERT INTO TEMP_TABLE  (INIT_UQ_NBR, RANK_UQ_NBR, UMPI_ADMT_TS, UMPI_LOC_TYPE, UMPI_LOC_GRP)
     SELECT DISTINCT UQ_NBR AS INIT_UQ_NBR, 
                ROW_NUMBER() 
                  OVER ( PARTITION BY UQ_NBR ORDER BY UQ_NBR, ADMT_TS)AS RANK_UQ_NBR,
                 Cast(ADMT_TS AS SMALLDATETIME)AS UMPI_ADMT_TS, 
                --ADMT_TS AS UMPI_ADMT_TS, 
                LOC_TYPE AS UMPI_LOC_TYPE, 
                CASE 
                  WHEN LOC_TYPE IN ( 'ER', 'OBS', 'ELECTIVE', 'EMERGENCY', 'INPATIENT', 'OBSERVATION', 'URGENT' )  THEN 'ER' 
                  WHEN LOC_TYPE IN ( 'CLINIC', 'OP', 'OUTPATIENT' ) THEN 'CLINIC' END AS UMPI_LOC_GRP 
           
         FROM   METRIC_TABLE WHERE UQ_NBR IS NOT NULL 
                AND UQ_NBR <> '0000000000' 
             --    ), 
    --***************************         
             
     ;WITH CTEFILTER (INIT_UQ_NBR, RANK_UQ_NBR, LOCRANK, UMPI_ADMT_TS, PREVADMT_TS, USEADMT_TS, 
     UMPI_LOC_TYPE, UMPI_LOC_GRP) 
     AS (
     SELECT A.INIT_UQ_NBR, 
                ROW_NUMBER() OVER ( PARTITION BY A.INIT_UQ_NBR ORDER BY A.INIT_UQ_NBR, A.UMPI_ADMT_TS),
          
                ROW_NUMBER() OVER ( ORDER BY A.UMPI_LOC_GRP, A.INIT_UQ_NBR),      
                A.UMPI_ADMT_TS, 
                COALESCE(P.UMPI_ADMT_TS, NULL) , 
                CASE 
                  WHEN A.UMPI_LOC_GRP = 'ER' 
                       AND A.UMPI_ADMT_TS < Dateadd(hh, 48, COALESCE(P.UMPI_ADMT_TS, NULL))THEN P.UMPI_ADMT_TS
  
                  WHEN A.UMPI_LOC_GRP = 'CLINIC' AND A.UMPI_ADMT_TS < Dateadd(d, 90, COALESCE(P.UMPI_ADMT_TS, NULL)) THEN P.UMPI_ADMT_TS ELSE A.UMPI_ADMT_TS END AS USEADMT_TS,
         
                A.UMPI_LOC_TYPE, 
                A.UMPI_LOC_GRP
                
         FROM   TEMP_TABLE A 
                LEFT JOIN TEMP_TABLE P 
                       ON A.INIT_UQ_NBR = P.INIT_UQ_NBR 
                          AND A.RANK_UQ_NBR = P.RANK_UQ_NBR + 1
                          
                         )  , 
     CTEFINAL (INIT_UQ_NBR, RANK_UQ_NBR, LOCRANK, UMPI_ADMT_TS, PREVADMT_TS, USEADMT_TS, 
     UMPI_LOC_TYPE, UMPI_LOC_GRP) 
     AS (SELECT A.INIT_UQ_NBR, 
                A.RANK_UQ_NBR, 
                A.LOCRANK, 
                A.UMPI_ADMT_TS, 
                A.PREVADMT_TS, 
                A.USEADMT_TS, 
                A.UMPI_LOC_TYPE, 
                A.UMPI_LOC_GRP 
                
         FROM   CTEFILTER A 
                LEFT JOIN CTEFILTER P 
                       ON A.INIT_UQ_NBR = P.INIT_UQ_NBR 
                          AND A.RANK_UQ_NBR = P.RANK_UQ_NBR + 1 
         WHERE  A.RANK_UQ_NBR = 1 
                 OR ( A.UMPI_LOC_GRP = 'ER' 
                      AND A.UMPI_ADMT_TS > Dateadd(hh, 48, P.USEADMT_TS) ) 
                 OR ( A.UMPI_LOC_GRP = 'CLINIC' 
                      AND 
                Dateadd(d, 90, COALESCE(P.USEADMT_TS, P.UMPI_ADMT_TS)) <= A.UMPI_ADMT_TS 
                      AND ( SELECT UMPI_LOC_GRP 
                           FROM   TEMP_TABLE 
                           WHERE  RANK_UQ_NBR = 1 
                                  AND INIT_UQ_NBR = A.INIT_UQ_NBR ) = 'ER' )) 
SELECT INIT_UQ_NBR AS UQ_NBR, 
       UMPI_ADMT_TS AS ADMT_DT, 
       UMPI_LOC_TYPE AS LOC_TYPE
            
FROM   CTEFINAL 

Open in new window

Thanks for looking into it
Avatar of moe57

ASKER

Simon,
I take that back, it is not the subsequent visit for the same patient but it is only  just those 90+ days after an ER visit, those are showing up multiple times.  thanks
Hi Moe, Sorry, I didn't get a notification of your most recent post for some reason, I just happened to visit this thread again.

Thanks for posting your final query. I see you had a lot of column name editing to do.

Are you saying that there is still a problem?

Is it a business rule that for patients whose first visit is to ER, you do want to output the first visit to a clinic location that occurs >90 days from the ER visit, but no further clinic visits, no matter how many days after the ER visit or the previous clinic visit?

I've added a few further records to the test table I'm using and only get output for CLINIC visits for patients whose first visit was ER where they are more than 90 days apart.
Avatar of moe57

ASKER

thanks for getting back to me.   Yes i still have the duplicate issue for the clinic group.   I do want to output the first visit to a clinic location that occurs >90 days from the ER visit, but no further clinic visits from last ER visit.  We always look at the last output record and that tells us if we output or not.  For instance
Mike visits ER for the first time on 1/3, output this
Mike visits Clinic after 90 days from his last visit on 3/15, output this
Mike visits Clinic again 4/20, do not output this
any subsequent visits to Clinic, do not output
Mike visits ER on 3/18, output this because it is greater than 48 from his last clinic visit which was 3/15
Mike visits again Clinic after 90 days on 7/20 , we output this because his last visit was at the ER on 3/18.  I hope i clarified little bit more.  thanks again
Avatar of moe57

ASKER

please try this data, you will see Mike went to clinic twice, one time on sept and the other time on October.  Please see more detail the above post in which i provided example.
CREATE TABLE Table1
	([NAME] varchar(5), [LOCATION] varchar(6), [DATE] datetime, [NOTES] varchar(72))
;
	
INSERT INTO Table1
	([NAME], [LOCATION], [DATE], [NOTES])
VALUES
	('MIKE', 'ER', '2014-01-01 00:00:00', 'output it is first visit'),
	('MIKE', 'ER', '2014-01-02 00:00:00', 'do not output, it is <48'),
	('MIKE', 'OBS', '2014-01-03 10:00:00', 'output this because it is >48 from his last visit on 1/1/2014'),
	('MIKE', 'IP', '2014-01-04 00:00:00', 'do not output because it is less than 48 from his last visit on 1/3/2014'),
	('MIKE', 'IP', '2014-01-29 00:00:00', 'output this, it is >48 from his last visit on 1/3/2014'),
	('MIKE', 'OBS', '2014-04-15 01:00:00', 'output this, it is >48 from his last visit on 1/29/2014'),
	('MIKE', 'CLINIC', '2014-05-20 01:00:00', 'do not output, it is <90  from his last visit on 4/15/2014'),
	('MIKE', 'OP', '2014-06-20 01:00:00', 'do not output, it is <90  from his last visit on 4/15/2014'),
	('MIKE', 'CLINIC', '2014-09-15 01:00:00', 'output, it is >90 from his ER visit on 4/15/2014'),
    ('MIKE', 'CLINIC', '2014-10-15 01:00:00', 'do not output, it is subsequent visit'),
	('DAVID', 'CLINIC', '2014-03-03 00:00:00', 'output it is first visit'),
	('DAVID', 'CLINIC', '2014-03-04 00:00:00', 'do not output'),
	('DAVID', 'CLINIC', '2014-07-05 01:00:00', 'do not output'),
	('DAVID', 'ER', '2014-03-15 00:00:00', 'output this because it is >48 from his last clinic visit on 3/3/2014'),
	('DAVID', 'CLINIC', '2014-03-18 00:00:00', 'do not output because it is < 90 from his ER visit on 3/15/2014'),
	('DAVID', 'OBS', '2014-03-19 00:00:00', 'output, it is >48 from his last ER visit on 3/15/2014')
;

Open in new window

Moe, these are a strange set of rules for the query. They seem inconsistent ,because if the patient first visit was to a clinic, the rules say to omit ALL future clinic visits, no matter if they happen to be >90 days after an ER visit that WAS output because it was >48 hours after the clinic visit.
2)      Clinic Group
a)      Output if patient makes his/her first visit to any of Clinic group locations
b)      Do not output any subsequent visits to any of the Clinic group locations for the same patient
c)      Output if same patient makes a visit to any of the ER group locations after 48 hours from their last clinic visit (last visit = last output record)
What I would suggest is to use the principles already demonstrated to remove the duplicate / consecutive clinic visits.
i.e. save your current final result as a table, then in new query
re-rank the now reduced set of records for each person by date.
self-join that on the same method (personID=personID AND rank=Rank+1)
exclude records where currentlocationType=prevlocationType and  currentlocationType = CLINIC

I'm sure you can do this yourself, but if you want further help, please restate as a new question, ensuring that your set of business rules covers all possible cases and describes them all in a clear, non-ambiguous way, and that your sample data table covers all the test cases. The rules as stated at the top of this question DID NOT disallow multiple visits to a clinic type location that were >90 days after an ER visit and 90 days apart.
1)      ER Group
a)      Output this record, if patient makes his/her first visit within 48 hours to any of the ER group locations
b)      Output  any subsequent visits to any of the ER locations after 48 from their last visit (last visits = last output record)
c)      output if same patient makes a visit to any of the Clinic group locations after 90 days from their last visit (last visit = last output record for this patient)

I'm concerned that each time the rules get refined, the solution built for the previous iterations becomes less and less appropriate. I know it can be difficult to be both precise and concise when defining the question, but it is key do obtaining a good and scalable solution. I don't want you to keep hitting the "tempdb full" situation when you've just spent hours translating the sample code to your live environment.
Avatar of moe57

ASKER

okay, will try to do that.  thanks for your help.