Link to home
Start Free TrialLog in
Avatar of moe57
moe57

asked on

how to find first record from dataset and group them by location using SQL

The trick for my challenge is to find where the patient was FIRST seen, meaning was the patient seen at the clinic first or at the emergency room, once we find out that then I need to group them and output the data. Here are the criteria and the logic that has to be used when outputting the final data:
here is the logic that needs to be followed.
1) ER first visit
   a) if patient's first visit is ER, then display it and all other subsequent visits to ER like 2nd, 3rd etc
   b) if same patient goes to clinic after 90 days then display it
2) Clinic first visit
  a) if patient first visit is clinic then display it
  b) if same patient goes to clinic again do not display it
  c) if same patient goes to ER after 48 hours then display it.

Example:
- Mike first visit goes to ER on 1/1/2014 :- output this data
 - Mike goes again to ER on 1/3/2014 :- output this data
-  Mike goes to Clinic on 1/20/2014 :- do not output this data
- Mike goes to Clinic on 5/2/2014 : output this data
- David first visit goes to Clinic on 3/3/2014 : output this data
-  David goes back again to Clinic on 3/4/2014 : do not output this data
-  David goes to ER on 3/15/2014 : output this data

Open in new window


here is how my data looks like :
NAME    LOCATION           DATE
MIKE             ER                  1/1/2014
MIKE             ER                  1/3/2014
MIKE         CLINIC              1/20/2014
MIKE         CLINIC              5/2/2014
DAVID       CLINIC             3/3/2014
DAVID        CLINIC            3/4/2014
DAVID           ER                 3/15/2014

Open in new window


here is the desired results after the query is run:
NAME      LOCATION         DATE
    MIKE           ER               1/1/2014
    MIKE           ER               1/3/2014       
    MIKE      CLINIC             5/2/2014
    DAVID     CLINIC            3/3/2014
    DAVID      ER                   3/15/2014

Open in new window


here is the query i have started but not finished it
SET NOCOUNT ON
GO

    DECLARE @Init_DataSource TABLE
    (
         [NAME] VARCHAR(20)
        ,[DATE] DATE
        ,[LOCATION] VARCHAR(15)
    )

    INSERT INTO @Init_DataSource ([NAME], [DATE], [LOCATION])
    VALUES ('MIKE', '1-1-2014', 'ER')
          ,('MIKE', '1-3-2014', 'ER')
          ,('MIKE', '1-20-2014', 'CLINIC')
          ,('MIKE', '5-2-2014', 'CLINIC')
          ,('DAVID', '3-3-2014', 'CLINIC')
          ,('DAVID', '3-4-2014', 'CLINIC')
          ,('DAVID', '3-15-2014', 'ER')

SELECT * 
FROM @Init_DataSource
ORDER BY 1 DESC, 2 ASC

Open in new window

Avatar of Simon
Simon
Flag of United Kingdom of Great Britain and Northern Ireland image

Hi, I've done this on SQL Fiddle:
http://sqlfiddle.com/#!3/200bb/32

Schema:
   create TABLE DataSource 
    (
         [NAME] VARCHAR(20)
        ,[DATE] DATE
        ,[LOCATION] VARCHAR(15)
    )

    INSERT INTO DataSource ([NAME], [DATE], [LOCATION])
    VALUES ('MIKE', '1-1-2014', 'ER')
          ,('MIKE', '1-3-2014', 'ER')
          ,('MIKE', '1-20-2014', 'CLINIC')
          ,('MIKE', '5-2-2014', 'CLINIC')
          ,('DAVID', '3-3-2014', 'CLINIC')
          ,('DAVID', '3-4-2014', 'CLINIC')
          ,('DAVID', '3-15-2014', 'ER')

Open in new window


Query:
with cte1 (personID,ItemRank,ItemDate,ItemLocation) 
as
(select 
name
 ,ROW_NUMBER()  OVER (partition by name ORDER BY name,date,location) AS "Row Number"
,date,location
from datasource
)
select itemrank,datasource.* 
from cte1 inner join datasource 
on cte1.personID=datasource.name AND cte1.itemdate=datasource.date

where ItemRank=1
--first visit was ER
or (select itemlocation from cte1 where itemrank=1 and personID=name)= 'ER' AND location ='ER'
or (select itemlocation from cte1 where itemrank=1 and personID=name)= 'ER' AND location ='CLINIC'
  AND dateadd(d, -90,itemdate) >(select itemdate from cte1 where itemrank=1 and personID=name)
--first visit was CLINIC
or ((select itemlocation from cte1 where itemrank=1 and personID=name)= 'CLINIC' 
  AND dateadd(d, -2,itemdate) >(select itemdate from cte1 where itemrank=1 and personID=name)
  AND location = 'ER')

Open in new window


At present, this will only work if properly if the patient has only one event per day, as the date forms part of the join condition.
The data schema could be improved by addition of a unique ID to each patient event.
Avatar of moe57
moe57

ASKER

thank you so much for your help, so the only thing i need to change now this line
--first visit was ER
or (select itemlocation from cte1 where itemrank=1 and personID=name)= 'ER' AND location ='ER'
or (select itemlocation from cte1 where itemrank=1 and personID=name)= 'ER' AND location ='CLINIC'
  AND dateadd(d, -90,itemdate) >(select itemdate from cte1 where itemrank=1 and personID=name)
--first visit was CLINIC
or ((select itemlocation from cte1 where itemrank=1 and personID=name)= 'CLINIC' 
  AND dateadd(d, -2,itemdate) >(select itemdate from cte1 where itemrank=1 and personID=name)
  AND location = 'ER')

Open in new window

because i was told to use Netezza database so i need to convert these codes to some  kind of postgresql or some syntax that supports netezza.  It seems to work find in SQL but not in Netezza.  thanks
Hi I don't know netezza. Could you first try taking the comment lines out of the where clause. I would never normally put comments anywhere inside a select statement, and was mildly surprised that they were accepted.
Your question was tagged as SQL Server 2008. Which version of postgreSQL does this need to work on? Are you using "PureData System for Analytics 7.0.3" or "InfoSphere BigInsights 2.1.2" or "PureData System for Hadoop 1.0.0" or something else?
Avatar of moe57

ASKER

okay i have decided to stick with SQL server and foret about the Postgresql or Netezza.  There is one problem with this solution, for the clinic visit, each patient should only show up one time and this is only true for first time clinic visits.  For example, patient A went to clinic twice first time on 5/1/2014 and second time on 5/3/2014 so in this case we only need to show the first visit for this patient (patient A).  Any subsequent visits to clinic for the same patient should be deleted.
i have added one more record for David just to test the result and David showed up twice for clinic visits.  here is the data i have used:
  create  TABLE dbo.DataSource 
    (
         [NAME] VARCHAR(20)
        ,[DATE] DATE
        ,[LOCATION] VARCHAR(15)
    )

    INSERT INTO DataSource ([NAME], [DATE], [LOCATION])
    VALUES ('MIKE', '1-1-2014', 'ER')
          ,('MIKE', '1-3-2014', 'ER')
          ,('MIKE', '1-20-2014', 'CLINIC')
          ,('MIKE', '5-2-2014', 'CLINIC')
          ,('DAVID', '3-3-2014', 'CLINIC')
          ,('DAVID', '3-4-2014', 'CLINIC')
          ,('DAVID', '3-7-2014', 'CLINIC')
          ,('DAVID', '3-15-2014', 'ER')

Open in new window

thanks
Hi, I am only seeing one clinic visit for each patient, even with your revised dataset:
ITEMRANK 	NAME 	DATE 	LOCATION
1 	DAVID 	2014-03-03 	CLINIC
4 	DAVID 	2014-03-15 	ER
1 	MIKE 	2014-01-01 	ER
2 	MIKE 	2014-01-03 	ER
4 	MIKE 	2014-05-02 	CLINIC

Open in new window


See http://sqlfiddle.com/#!3/0485f/2

I can't replicate your result on sqlfiddle, and don't have access to a physical SQL Server to test on today. Which version of SQL Server have you tested on and obtained the incorrect result?

To be clear on the logic, your initial description does not necessarily exclude more than one clinic visit being shown for a patient IF they were first seen in ER, and then visit the clinic several times after 90 days have elapsed.

So, if I add another record for Mike ('MIKE', '5-4-2014', 'CLINIC'), that will also show in the results.
Avatar of moe57

ASKER

it is weird, i am seeing David went to Clinic twice, one is on 3/3/2014 and the other one is 3/7/2014.  I am using SQL Server 2008 R2.  If the patient first visit was at the clinic then any subsequent visits should be excluded but it is okay if patient first visit was at the ER then goes to the clinic several times later
ASKER CERTIFIED SOLUTION
Avatar of Simon
Simon
Flag of United Kingdom of Great Britain and Northern Ireland image

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Avatar of moe57

ASKER

actually this has fixed the problem.  Thank you so much for your help.
Thanks for confirming. Are you happy that the above is the solution?
Avatar of moe57

ASKER

thanks for your follow up, i did not close the question because i was having an issue when i run the query in production and the data is huge so it was failing on me and getting this error:
in my real data, i have other locations other than ER, something like this:
OR ((SELECT ITEMLOC_TYPE FROM CTE1 WHERE ITEMRANK = 1 AND PERSONID=PI_NBR) IN ('ER', 'OBS', 'ELEC','EXY','INP','OBS', 'URG') AND LOC_TYPE IN ('ER', 'OBS', 'ELEC','EXY','INP','OBS', 'URG'))
OR ((SELECT ITEMLOC_TYPE FROM CTE1 WHERE ITEMRANK = 1 AND PERSONID=PI_NBR) IN ('ER', 'OBS', 'ELEC','EXY','INP','OBS', 'URG') AND LOC_TYPE IN ('CLINIC', 'OP', 'OUTP')

Open in new window

so is there a way to overcome this issue or adjust the query like using subquery so i can overcome this issue?  thanks
Avatar of moe57

ASKER

the query runs fine if only highlight until here:
WHERE ITEMRANK=1
but as soon as i add the OR operation then it will run forever and eventually fails.  Thanks
Avatar of moe57

ASKER

Simon, can you help please?  thanks
Hi Moe, please post your complete modified select statement that includes the other location types. If server resources do not permit it may be necessary to approach this completely differently, using temporary tables. Your greatly expanded list of location types makes the task considerably more complex. I'd really prefer that you close off this question and restate your scenario more fully in a new one, using this one as a starting point.
Avatar of moe57

ASKER

okay will do that now.  I will close this one now and i already created another post in which i have incorporated some littel changes in the logic.  Here is the link for my new post.  thank you so much for your help.
https://www.experts-exchange.com/questions/28573340/find-first-record-and-group-them-by-locations-in-sql.html
Avatar of moe57

ASKER

I've requested that this question be closed as follows:

Accepted answer: 0 points for moe57's comment #a40476067

for the following reason:

thanks for your help,
Hi Moe,

When I said 'close off this question' I meant that the answer I'd given was a working solution for the question as stated. I'll look at your new question, but I think it would be fair to give credit for the effort already made.
This comment is the one that I consider to be the solution.
Avatar of moe57

ASKER

this was the correct answer but new requirements for different logic.  Thanks