moe57
asked on
how to find first record from dataset and group them by location using SQL
The trick for my challenge is to find where the patient was FIRST seen, meaning was the patient seen at the clinic first or at the emergency room, once we find out that then I need to group them and output the data. Here are the criteria and the logic that has to be used when outputting the final data:
here is the logic that needs to be followed.
1) ER first visit
a) if patient's first visit is ER, then display it and all other subsequent visits to ER like 2nd, 3rd etc
b) if same patient goes to clinic after 90 days then display it
2) Clinic first visit
a) if patient first visit is clinic then display it
b) if same patient goes to clinic again do not display it
c) if same patient goes to ER after 48 hours then display it.
Example:
here is how my data looks like :
here is the desired results after the query is run:
here is the query i have started but not finished it
here is the logic that needs to be followed.
1) ER first visit
a) if patient's first visit is ER, then display it and all other subsequent visits to ER like 2nd, 3rd etc
b) if same patient goes to clinic after 90 days then display it
2) Clinic first visit
a) if patient first visit is clinic then display it
b) if same patient goes to clinic again do not display it
c) if same patient goes to ER after 48 hours then display it.
Example:
- Mike first visit goes to ER on 1/1/2014 :- output this data
- Mike goes again to ER on 1/3/2014 :- output this data
- Mike goes to Clinic on 1/20/2014 :- do not output this data
- Mike goes to Clinic on 5/2/2014 : output this data
- David first visit goes to Clinic on 3/3/2014 : output this data
- David goes back again to Clinic on 3/4/2014 : do not output this data
- David goes to ER on 3/15/2014 : output this data
here is how my data looks like :
NAME LOCATION DATE
MIKE ER 1/1/2014
MIKE ER 1/3/2014
MIKE CLINIC 1/20/2014
MIKE CLINIC 5/2/2014
DAVID CLINIC 3/3/2014
DAVID CLINIC 3/4/2014
DAVID ER 3/15/2014
here is the desired results after the query is run:
NAME LOCATION DATE
MIKE ER 1/1/2014
MIKE ER 1/3/2014
MIKE CLINIC 5/2/2014
DAVID CLINIC 3/3/2014
DAVID ER 3/15/2014
here is the query i have started but not finished it
SET NOCOUNT ON
GO
DECLARE @Init_DataSource TABLE
(
[NAME] VARCHAR(20)
,[DATE] DATE
,[LOCATION] VARCHAR(15)
)
INSERT INTO @Init_DataSource ([NAME], [DATE], [LOCATION])
VALUES ('MIKE', '1-1-2014', 'ER')
,('MIKE', '1-3-2014', 'ER')
,('MIKE', '1-20-2014', 'CLINIC')
,('MIKE', '5-2-2014', 'CLINIC')
,('DAVID', '3-3-2014', 'CLINIC')
,('DAVID', '3-4-2014', 'CLINIC')
,('DAVID', '3-15-2014', 'ER')
SELECT *
FROM @Init_DataSource
ORDER BY 1 DESC, 2 ASC
ASKER
thank you so much for your help, so the only thing i need to change now this line
--first visit was ER
or (select itemlocation from cte1 where itemrank=1 and personID=name)= 'ER' AND location ='ER'
or (select itemlocation from cte1 where itemrank=1 and personID=name)= 'ER' AND location ='CLINIC'
AND dateadd(d, -90,itemdate) >(select itemdate from cte1 where itemrank=1 and personID=name)
--first visit was CLINIC
or ((select itemlocation from cte1 where itemrank=1 and personID=name)= 'CLINIC'
AND dateadd(d, -2,itemdate) >(select itemdate from cte1 where itemrank=1 and personID=name)
AND location = 'ER')
because i was told to use Netezza database so i need to convert these codes to some kind of postgresql or some syntax that supports netezza. It seems to work find in SQL but not in Netezza. thanks
Hi I don't know netezza. Could you first try taking the comment lines out of the where clause. I would never normally put comments anywhere inside a select statement, and was mildly surprised that they were accepted.
Your question was tagged as SQL Server 2008. Which version of postgreSQL does this need to work on? Are you using "PureData System for Analytics 7.0.3" or "InfoSphere BigInsights 2.1.2" or "PureData System for Hadoop 1.0.0" or something else?
Your question was tagged as SQL Server 2008. Which version of postgreSQL does this need to work on? Are you using "PureData System for Analytics 7.0.3" or "InfoSphere BigInsights 2.1.2" or "PureData System for Hadoop 1.0.0" or something else?
ASKER
okay i have decided to stick with SQL server and foret about the Postgresql or Netezza. There is one problem with this solution, for the clinic visit, each patient should only show up one time and this is only true for first time clinic visits. For example, patient A went to clinic twice first time on 5/1/2014 and second time on 5/3/2014 so in this case we only need to show the first visit for this patient (patient A). Any subsequent visits to clinic for the same patient should be deleted.
i have added one more record for David just to test the result and David showed up twice for clinic visits. here is the data i have used:
i have added one more record for David just to test the result and David showed up twice for clinic visits. here is the data i have used:
create TABLE dbo.DataSource
(
[NAME] VARCHAR(20)
,[DATE] DATE
,[LOCATION] VARCHAR(15)
)
INSERT INTO DataSource ([NAME], [DATE], [LOCATION])
VALUES ('MIKE', '1-1-2014', 'ER')
,('MIKE', '1-3-2014', 'ER')
,('MIKE', '1-20-2014', 'CLINIC')
,('MIKE', '5-2-2014', 'CLINIC')
,('DAVID', '3-3-2014', 'CLINIC')
,('DAVID', '3-4-2014', 'CLINIC')
,('DAVID', '3-7-2014', 'CLINIC')
,('DAVID', '3-15-2014', 'ER')
thanks
Hi, I am only seeing one clinic visit for each patient, even with your revised dataset:
See http://sqlfiddle.com/#!3/0485f/2
I can't replicate your result on sqlfiddle, and don't have access to a physical SQL Server to test on today. Which version of SQL Server have you tested on and obtained the incorrect result?
To be clear on the logic, your initial description does not necessarily exclude more than one clinic visit being shown for a patient IF they were first seen in ER, and then visit the clinic several times after 90 days have elapsed.
So, if I add another record for Mike ('MIKE', '5-4-2014', 'CLINIC'), that will also show in the results.
ITEMRANK NAME DATE LOCATION
1 DAVID 2014-03-03 CLINIC
4 DAVID 2014-03-15 ER
1 MIKE 2014-01-01 ER
2 MIKE 2014-01-03 ER
4 MIKE 2014-05-02 CLINIC
See http://sqlfiddle.com/#!3/0485f/2
I can't replicate your result on sqlfiddle, and don't have access to a physical SQL Server to test on today. Which version of SQL Server have you tested on and obtained the incorrect result?
To be clear on the logic, your initial description does not necessarily exclude more than one clinic visit being shown for a patient IF they were first seen in ER, and then visit the clinic several times after 90 days have elapsed.
So, if I add another record for Mike ('MIKE', '5-4-2014', 'CLINIC'), that will also show in the results.
ASKER
it is weird, i am seeing David went to Clinic twice, one is on 3/3/2014 and the other one is 3/7/2014. I am using SQL Server 2008 R2. If the patient first visit was at the clinic then any subsequent visits should be excluded but it is okay if patient first visit was at the ER then goes to the clinic several times later
ASKER CERTIFIED SOLUTION
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
ASKER
actually this has fixed the problem. Thank you so much for your help.
Thanks for confirming. Are you happy that the above is the solution?
ASKER
thanks for your follow up, i did not close the question because i was having an issue when i run the query in production and the data is huge so it was failing on me and getting this error:
in my real data, i have other locations other than ER, something like this:
in my real data, i have other locations other than ER, something like this:
OR ((SELECT ITEMLOC_TYPE FROM CTE1 WHERE ITEMRANK = 1 AND PERSONID=PI_NBR) IN ('ER', 'OBS', 'ELEC','EXY','INP','OBS', 'URG') AND LOC_TYPE IN ('ER', 'OBS', 'ELEC','EXY','INP','OBS', 'URG'))
OR ((SELECT ITEMLOC_TYPE FROM CTE1 WHERE ITEMRANK = 1 AND PERSONID=PI_NBR) IN ('ER', 'OBS', 'ELEC','EXY','INP','OBS', 'URG') AND LOC_TYPE IN ('CLINIC', 'OP', 'OUTP')
so is there a way to overcome this issue or adjust the query like using subquery so i can overcome this issue? thanks
ASKER
the query runs fine if only highlight until here:
WHERE ITEMRANK=1
but as soon as i add the OR operation then it will run forever and eventually fails. Thanks
WHERE ITEMRANK=1
but as soon as i add the OR operation then it will run forever and eventually fails. Thanks
ASKER
Simon, can you help please? thanks
Hi Moe, please post your complete modified select statement that includes the other location types. If server resources do not permit it may be necessary to approach this completely differently, using temporary tables. Your greatly expanded list of location types makes the task considerably more complex. I'd really prefer that you close off this question and restate your scenario more fully in a new one, using this one as a starting point.
ASKER
okay will do that now. I will close this one now and i already created another post in which i have incorporated some littel changes in the logic. Here is the link for my new post. thank you so much for your help.
https://www.experts-exchange.com/questions/28573340/find-first-record-and-group-them-by-locations-in-sql.html
https://www.experts-exchange.com/questions/28573340/find-first-record-and-group-them-by-locations-in-sql.html
ASKER
I've requested that this question be closed as follows:
Accepted answer: 0 points for moe57's comment #a40476067
for the following reason:
thanks for your help,
Accepted answer: 0 points for moe57's comment #a40476067
for the following reason:
thanks for your help,
Hi Moe,
When I said 'close off this question' I meant that the answer I'd given was a working solution for the question as stated. I'll look at your new question, but I think it would be fair to give credit for the effort already made.
When I said 'close off this question' I meant that the answer I'd given was a working solution for the question as stated. I'll look at your new question, but I think it would be fair to give credit for the effort already made.
This comment is the one that I consider to be the solution.
ASKER
this was the correct answer but new requirements for different logic. Thanks
http://sqlfiddle.com/#!3/200bb/32
Schema:
Open in new window
Query:
Open in new window
At present, this will only work if properly if the patient has only one event per day, as the date forms part of the join condition.
The data schema could be improved by addition of a unique ID to each patient event.