biotec
asked on
Two SQL queries that need to be incorporated into one (possible Union)
One of these queries pulls internal labs and the other external labs. There are duplicate patients that need to be removed when joining these. Also wondering if instead of using multiple like statements I can use IN when it contains wildcards?
Internal Labs:
Internal Labs:
SELECT * from (select o2.acttext, o2.obsvalue, p.person_nbr, p.last_name, p.first_name, pe.enc_id, pe.person_id,
ROW_NUMBER() OVER (PARTITION BY p.person_nbr ORDER BY o2.completeddate DESC)rowno, o2.completeddate
FROM order_ o2
inner join person p with(nolock) on o2.person_id = p.person_id
INNER JOIN patient_encounter AS pe with(nolock) ON p.person_id = pe.person_id
inner join patient_diagnosis pd with(nolock) on pe.enc_id = pd.enc_id
INNER JOIN appointments AS a with(nolock) ON pe.enc_id = a.enc_id
WHERE o2.obsvalue IS NOT NULL
and (o2.completeddate >= '20150101'
AND o2.completeddate < '20160101')
and o2.acttext in ('GLYCATED HEMOGLOBIN TEST', 'Hemoglobin A1c')
and (pd.diagnosis_code_id like '250%' or pd.diagnosis_code_id like '648.0%'
or pd.diagnosis_code_id like '357.2' or pd.diagnosis_code_id like '362.0[0-7]' or pd.diagnosis_code_id like '366.41' or pd.diagnosis_code_id like 'E1[0-3].%'
or pd.diagnosis_code_id like 'O24.[0-4]0%' or pd.diagnosis_code_id like 'O24.[4-9][2-9]')
and pe.person_id in (select pe.person_id from patient_encounter pe
INNER JOIN appointments AS a with(nolock) ON pe.enc_id = a.enc_id
where (pe.enc_timestamp >= '20151001'
AND pe.enc_timestamp < '20160101'
and a.appt_kept_ind = 'Y' and a.resched_ind = 'N'))) inofficea1c
where rowno = 1
External Labs: SELECT * from (select p.last_name as LastName, p.first_name as FirstName, p.sex as Sex, p.language as Language, p.Ethnicity as Ethnicity,
p.Race as Race, p.date_of_birth as DOB, res.obs_date_time as A1cDate, res.result_desc as ResultDescription, res.observ_value as A1cResultValue, res.obs_id as OBXObsID,
ROW_NUMBER() OVER (PARTITION BY p.person_nbr ORDER BY res.obs_date_time DESC)rowno
FROM lab_results_obx res
inner join person p with(nolock) on res.person_id = p.person_id
INNER JOIN patient_encounter AS pe with(nolock) ON p.person_id = pe.person_id
inner join patient_diagnosis pd with(nolock) on pe.enc_id = pd.enc_id
INNER JOIN appointments AS a with(nolock) ON pe.enc_id = a.enc_id
WHERE res.observ_value IS NOT NULL
and (res.obs_date_time >= '20150101'
AND res.obs_date_time < '20160101')
and (res.obs_id like '%HEMOGLOBIN A1C%' or res.obs_id like '%HBG. A1C%' or res.obs_id like '%HBG.AIC%' or res.obs_id like '%HEMOGLOBIN A1c%')
and (pd.diagnosis_code_id like '250%' or pd.diagnosis_code_id like '648.0%'
or pd.diagnosis_code_id like '357.2' or pd.diagnosis_code_id like '362.0[0-7]' or pd.diagnosis_code_id like '366.41' or pd.diagnosis_code_id like 'E1[0-3].%'
or pd.diagnosis_code_id like 'O24.[0-4]0%' or pd.diagnosis_code_id like 'O24.[4-9][2-9]')
and pe.person_id in (select pe.person_id from patient_encounter pe
INNER JOIN appointments AS a with(nolock) ON pe.enc_id = a.enc_id
where (pe.enc_timestamp >= '20151001'
AND pe.enc_timestamp < '20160101'
and a.appt_kept_ind = 'Y' and a.resched_ind = 'N'))) outsidelabA1c
where rowno = 1
>Also wondering if instead of using multiple like statements I can use IN when it contains wildcards?
Afaik no, IN does not support wildcards, so the way you have it is correct.
Afaik no, IN does not support wildcards, so the way you have it is correct.
ASKER
What is the best way to paste in from an excel spreadsheet without it just garbling together here? Also, what if I used AS in the select statement so that any column that was different matched up in terms of the AS part of it.
So if I pull an A1c lab value and date on an internal lab from table Order_ and the same values and dates from an external lab called lab_results_obx but call both AS result_value and AS result_date could I then use the Union?
So if I pull an A1c lab value and date on an internal lab from table Order_ and the same values and dates from an external lab called lab_results_obx but call both AS result_value and AS result_date could I then use the Union?
ASKER
Here is an Excel file with examples. I changed the select statements to match more closely and changed the AS parts where the queries were pulling from different tables.
Also, the only reason I was pulling enc_id and person_id in one of the original queries was for troubleshooting as those are unique_id's. I don't need them in the output for the report but I need them to check for dups and inaccuracies etc.
A1c_lab_query_examples.xlsx
Also, the only reason I was pulling enc_id and person_id in one of the original queries was for troubleshooting as those are unique_id's. I don't need them in the output for the report but I need them to check for dups and inaccuracies etc.
A1c_lab_query_examples.xlsx
Just a quick note, the column names do not need to match, just the column data types, and the matching columns have to be selected in the same order. SQL will use the field names in the first query for the result set.
First, you need to have same number of columns returned from each SELECT so get rid of the '*' and replace the columns names there. They should match logically, i.e. same data types at least and if you have columns returned from one SELECT that isn't returned from the other one, just replace it with NULL.
From your example you have for query 1 these columns:
SELECT last_name, first_name, acttext, person_nbr, enc_id, person_id, o2.completeddate, obsvalue, rowno
and from query 2 these ones:
SELECT LastName, FirstName, Sex, Language, Ethnicity, Race, DOB, A1cDate, ResultDescription, A1cResultValue, OBXObsID, rowno
Can you match them for us? I could at least match some but still to need for the rest of them.
From your example you have for query 1 these columns:
SELECT last_name, first_name, acttext, person_nbr, enc_id, person_id, o2.completeddate, obsvalue, rowno
and from query 2 these ones:
SELECT LastName, FirstName, Sex, Language, Ethnicity, Race, DOB, A1cDate, ResultDescription, A1cResultValue, OBXObsID, rowno
Can you match them for us? I could at least match some but still to need for the rest of them.
According to that spreadsheet if you drop one column [ResultDescription2] from the external labs query, and you re-arrange the column order, you should be able to use UNION
Internal Labs
LastName,FirstName,Sex,Lan guage,Ethn icity,Race ,DOB,Resul tDescripti on,A1cResult,A1ccompleteddate ,rowno
External Labs
LastName,FirstName,Sex,Lan guage,Ethn icity,Race ,DOB,Resul tDescripti on,A1cResultValue,A1cComplete dDate,rown o
Internal Labs
LastName,FirstName,Sex,Lan
External Labs
LastName,FirstName,Sex,Lan
ASKER
Thanks for the help. It runs properly now. What I need help with is that this obviously creates some duplicate rows in terms of the person and I need to pull only the most recent based on one of the timestamps. I'm not sure how I wrap the whole thing in one ROW_NUMBER or something like that.
The query takes over a minute to run which isn't good but I might be able to figure something out later.
My other problem is that I had to change to using a datetime for one of the queries as it was previously a varchar 20150101 type of date. I think the timestamp date I used is ok, but could occassionally be different than the actual ordered date I wanted to use.
The query takes over a minute to run which isn't good but I might be able to figure something out later.
My other problem is that I had to change to using a datetime for one of the queries as it was previously a varchar 20150101 type of date. I think the timestamp date I used is ok, but could occassionally be different than the actual ordered date I wanted to use.
SELECT * from (select p.last_name as LastName, p.first_name as FirstName, p.sex as Sex, p.language as Language, p.Ethnicity as Ethnicity,
p.Race as Race, p.date_of_birth as DOB, res.result_desc as ResultDescription, res.observ_value as A1cResultValue, res.obs_date_time as A1cDate,
pe.enc_id, pe.person_id,
ROW_NUMBER() OVER (PARTITION BY p.person_nbr ORDER BY res.obs_date_time DESC)rowno
FROM lab_results_obx res
inner join person p with(nolock) on res.person_id = p.person_id
INNER JOIN patient_encounter AS pe with(nolock) ON p.person_id = pe.person_id
inner join patient_diagnosis pd with(nolock) on pe.enc_id = pd.enc_id
INNER JOIN appointments AS a with(nolock) ON pe.enc_id = a.enc_id
WHERE res.observ_value IS NOT NULL
and (res.obs_date_time >= '20150401'
AND res.obs_date_time < '20160331')
and (res.obs_id like '%HEMOGLOBIN A1C%' or res.obs_id like '%HBG. A1C%' or res.obs_id like '%HBG.AIC%' or res.obs_id like '%HEMOGLOBIN A1c%')
and (pd.diagnosis_code_id like '250%' or pd.diagnosis_code_id like '648.0%'
or pd.diagnosis_code_id like '357.2' or pd.diagnosis_code_id like '362.0[0-7]' or pd.diagnosis_code_id like '366.41' or pd.diagnosis_code_id like 'E1[0-3].%'
or pd.diagnosis_code_id like 'O24.[0-4]0%' or pd.diagnosis_code_id like 'O24.[4-9][2-9]')
and pe.person_id in (select pe.person_id from patient_encounter pe
INNER JOIN appointments AS a with(nolock) ON pe.enc_id = a.enc_id
where (pe.enc_timestamp >= '20160101'
AND pe.enc_timestamp < '20160331'
and a.appt_kept_ind = 'Y' and a.resched_ind = 'N'))) outsidelabA1c
where rowno = 1
UNION
SELECT * from (select p.last_name as LastName, p.first_name as FirstName, p.sex as Sex, p.language as Language, p.Ethnicity as Ethnicity,
p.Race as Race, p.date_of_birth as DOB, o2.acttext as ResultDescription, o2.obsvalue as A1cResultValue, o2.create_timestamp as A1cDate, pe.enc_id, pe.person_id,
ROW_NUMBER() OVER (PARTITION BY p.person_nbr ORDER BY o2.completeddate DESC)rowno
FROM order_ o2
inner join person p with(nolock) on o2.person_id = p.person_id
INNER JOIN patient_encounter AS pe with(nolock) ON p.person_id = pe.person_id
inner join patient_diagnosis pd with(nolock) on pe.enc_id = pd.enc_id
INNER JOIN appointments AS a with(nolock) ON pe.enc_id = a.enc_id
WHERE o2.obsvalue IS NOT NULL
and (o2.completeddate >= '20150401'
AND o2.completeddate < '20160331')
and o2.acttext in ('GLYCATED HEMOGLOBIN TEST', 'Hemoglobin A1c')
and (pd.diagnosis_code_id like '250%' or pd.diagnosis_code_id like '648.0%'
or pd.diagnosis_code_id like '357.2' or pd.diagnosis_code_id like '362.0[0-7]' or pd.diagnosis_code_id like '366.41' or pd.diagnosis_code_id like 'E1[0-3].%'
or pd.diagnosis_code_id like 'O24.[0-4]0%' or pd.diagnosis_code_id like 'O24.[4-9][2-9]')
and pe.person_id in (select pe.person_id from patient_encounter pe
INNER JOIN appointments AS a with(nolock) ON pe.enc_id = a.enc_id
where (pe.enc_timestamp >= '20160101'
AND pe.enc_timestamp < '20160331'
and a.appt_kept_ind = 'Y' and a.resched_ind = 'N'))) inofficea1c
where rowno = 1
I suggest using a CTE to arrive at the list of people, so you do that just once.
Then, link that CTE to the each of the 2 sources of data.
Keep the rows of these as "narrow" (few columns) as possible.
Use row_number() on each result to only get the latest dated rows.
Use UNION on the latest dated narrow rows to get uniqueness.
Then link to the person table for the extra columns
If sample data had been provided it should be "per table" and scrubbed of any private information.
Then, link that CTE to the each of the 2 sources of data.
Keep the rows of these as "narrow" (few columns) as possible.
Use row_number() on each result to only get the latest dated rows.
Use UNION on the latest dated narrow rows to get uniqueness.
Then link to the person table for the extra columns
;
WITH CTEpe
AS (
SELECT DISTINCT
pe.person_id
FROM patient_encounter pe
INNER JOIN appointments AS a WITH (NOLOCK) ON pe.enc_id = a.enc_id
INNER JOIN patient_diagnosis pd WITH (NOLOCK) ON pe.enc_id = pd.enc_id
WHERE pe.enc_timestamp >= '20151001'
AND pe.enc_timestamp < '20160101'
AND a.appt_kept_ind = 'Y'
AND a.resched_ind = 'N'
AND (
pd.diagnosis_code_id LIKE '250%'
OR pd.diagnosis_code_id LIKE '648.0%'
OR pd.diagnosis_code_id LIKE '357.2'
OR pd.diagnosis_code_id LIKE '362.0[0-7]'
OR pd.diagnosis_code_id LIKE '366.41'
OR pd.diagnosis_code_id LIKE 'E1[0-3].%'
OR pd.diagnosis_code_id LIKE 'O24.[0-4]0%'
OR pd.diagnosis_code_id LIKE 'O24.[4-9][2-9]'
)
)
SELECT
p.last_name AS LastName
, p.first_name AS FirstName
, p.sex AS Sex
, p.language AS Language
, p.Ethnicity AS Ethnicity
, p.Race AS Race
, p.date_of_birth AS DOB
, d.ResultDescription
, d.A1cResult
, d.A1ccompleteddate
FROM person p
INNER JOIN (
SELECT
person_id, ResultDescription, A1cResult, A1ccompleteddate
FROM (
SELECT
o2.person_id
, o2.acttext AS ResultDescription
, o2.obsvalue AS A1cResult
, o2.completeddate AS A1ccompleteddate
, ROW_NUMBER() OVER (PARTITION BY o2.person_id ORDER BY o2.completeddate DESC) AS rowno
FROM order_ o2
INNER JOIN CTEpe ON o2.person_id = CTEpe.person_id
WHERE o2.obsvalue IS NOT NULL
AND (o2.completeddate >= '20150101'
AND o2.completeddate < '20160101')
AND o2.acttext IN ('GLYCATED HEMOGLOBIN TEST', 'Hemoglobin A1c')
) inofficea1c
WHERE rowno = 1
UNION
SELECT
person_id, ResultDescription, A1cResult, A1ccompleteddate
FROM (
SELECT
res.person_id
, res.result_desc AS ResultDescription
, res.observ_value AS A1cResult
, res.obs_date_time AS A1ccompleteddate
, ROW_NUMBER() OVER (PARTITION BY p.person_nbr ORDER BY res.obs_date_time DESC) rowno
FROM lab_results_obx res
INNER JOIN CTEpe ON res.person_id = CTEpe.person_id
WHERE res.observ_value IS NOT NULL
AND (res.obs_date_time >= '20150101'
AND res.obs_date_time < '20160101')
AND (res.obs_id LIKE '%HEMOGLOBIN A1C%'
OR res.obs_id LIKE '%HBG. A1C%'
OR res.obs_id LIKE '%HBG.AIC%'
OR res.obs_id LIKE '%HEMOGLOBIN A1c%')
) outsidelabA1c
WHERE rowno = 1
) AS d ON p.person_id = d.person_id
;
none of this is tested of course as there was no sample data provided.If sample data had been provided it should be "per table" and scrubbed of any private information.
We cross posted, I had not seen your 2016-01-29 at 15:56:56 ID: 41440674 comment
perhaps this:
perhaps this:
;
WITH CTEpe
AS (
SELECT DISTINCT
pe.person_id
FROM patient_encounter pe
INNER JOIN appointments AS a WITH (NOLOCK) ON pe.enc_id = a.enc_id
INNER JOIN patient_diagnosis pd WITH (NOLOCK) ON pe.enc_id = pd.enc_id
WHERE pe.enc_timestamp >= '20151001'
AND pe.enc_timestamp < '20160101'
AND a.appt_kept_ind = 'Y'
AND a.resched_ind = 'N'
AND (
pd.diagnosis_code_id LIKE '250%'
OR pd.diagnosis_code_id LIKE '648.0%'
OR pd.diagnosis_code_id LIKE '357.2'
OR pd.diagnosis_code_id LIKE '362.0[0-7]'
OR pd.diagnosis_code_id LIKE '366.41'
OR pd.diagnosis_code_id LIKE 'E1[0-3].%'
OR pd.diagnosis_code_id LIKE 'O24.[0-4]0%'
OR pd.diagnosis_code_id LIKE 'O24.[4-9][2-9]'
)
)
SELECT
p.last_name AS LastName
, p.first_name AS FirstName
, p.sex AS Sex
, p.language AS Language
, p.Ethnicity AS Ethnicity
, p.Race AS Race
, p.date_of_birth AS DOB
, d.ResultDescription
, d.A1cResult
, d.A1ccompleteddate
FROM person p
INNER JOIN (
SELECT
person_id, ResultDescription, A1cResult, A1ccompleteddate
, ROW_NUMBER() OVER (PARTITION BY person_id ORDER BY A1ccompleteddate DESC) AS rowno
FROM (
SELECT
o2.person_id
, o2.acttext AS ResultDescription
, o2.obsvalue AS A1cResult
, o2.completeddate AS A1ccompleteddate
FROM order_ o2
INNER JOIN CTEpe ON o2.person_id = CTEpe.person_id
WHERE o2.obsvalue IS NOT NULL
AND (o2.completeddate >= '20150101'
AND o2.completeddate < '20160101')
AND o2.acttext IN ('GLYCATED HEMOGLOBIN TEST', 'Hemoglobin A1c')
UNION
SELECT
res.person_id
, res.result_desc AS ResultDescription
, res.observ_value AS A1cResult
, res.obs_date_time AS A1ccompleteddate
FROM lab_results_obx res
INNER JOIN CTEpe ON res.person_id = CTEpe.person_id
WHERE res.observ_value IS NOT NULL
AND (res.obs_date_time >= '20150101'
AND res.obs_date_time < '20160101')
AND (res.obs_id LIKE '%HEMOGLOBIN A1C%'
OR res.obs_id LIKE '%HBG. A1C%'
OR res.obs_id LIKE '%HBG.AIC%'
OR res.obs_id LIKE '%HEMOGLOBIN A1c%')
) AS DT
) AS d ON p.person_id = d.person_id
AND d.rowno = 1
ASKER
Thanks. I'll attach an output from each query. When I remove dups from mine run as the original Union I get 165 when I run yours it returns dups as well but I only get 114 when removing the dups so it appears to be eliminating some of the tests. Certainly appreciate the help. I'm trying to get used to the CTE but not crystal clear yet.
Examples keep the data similar to what you'd expect but all PHI has been removed or changed. Thanks again.
A1c_lab_query_examples.xlsx
Examples keep the data similar to what you'd expect but all PHI has been removed or changed. Thanks again.
A1c_lab_query_examples.xlsx
ASKER
Cross posted again. Ok, I tried the new one and it returns 112 rows. I'll add person_id into it and then do some comparisons to what is unique between yours and my UNION and see if I can figure out where they are falling through the cracks (code).
Getting close though as your new one no longer returns dups. Thanks
Getting close though as your new one no longer returns dups. Thanks
without data I can't do much except make suggestions.
what you describe does not make much sense because the filter is there
SELECT DISTINCT
pe.person_id
FROM patient_encounter pe
INNER JOIN appointments AS a ON pe.enc_id = a.enc_id
INNER JOIN patient_diagnosis pd ON pe.enc_id = pd.enc_id
WHERE pe.enc_timestamp >= '20151001' -- 2015-10-01
AND pe.enc_timestamp < '20160101' -- 2016-01-01
-- you have previously indicated hat these are also needed:
AND a.appt_kept_ind = 'Y'
AND a.resched_ind = 'N'
-- and these were previously in both too
AND (
pd.diagnosis_code_id LIKE '250%'
OR pd.diagnosis_code_id LIKE '648.0%'
OR pd.diagnosis_code_id LIKE '357.2'
OR pd.diagnosis_code_id LIKE '362.0[0-7]'
OR pd.diagnosis_code_id LIKE '366.41'
OR pd.diagnosis_code_id LIKE 'E1[0-3].%'
OR pd.diagnosis_code_id LIKE 'O24.[0-4]0%'
OR pd.diagnosis_code_id LIKE 'O24.[4-9][2-9]'
)
so the result would be a distinct list of person_id's meeting all of those conditions
Then we take that list (CTEpe) to restrict who we are interested in and look for the related remaining information
SELECT
o2.person_id
, o2.acttext AS ResultDescription
, o2.obsvalue AS A1cResult
, o2.completeddate AS A1ccompleteddate
FROM order_ o2
INNER JOIN CTEpe ON o2.person_id = CTEpe.person_id
WHERE o2.obsvalue IS NOT NULL
AND (o2.completeddate >= '20150101'
AND o2.completeddate < '20160101')
AND o2.acttext IN ('GLYCATED HEMOGLOBIN TEST', 'Hemoglobin A1c')
and
SELECT
res.person_id
, res.result_desc AS ResultDescription
, res.observ_value AS A1cResult
, res.obs_date_time AS A1ccompleteddate
FROM lab_results_obx res
INNER JOIN CTEpe ON res.person_id = CTEpe.person_id
WHERE res.observ_value IS NOT NULL
AND (res.obs_date_time >= '20150101'
AND res.obs_date_time < '20160101')
AND (res.obs_id LIKE '%HEMOGLOBIN A1C%'
OR res.obs_id LIKE '%HBG. A1C%'
OR res.obs_id LIKE '%HBG.AIC%'
OR res.obs_id LIKE '%HEMOGLOBIN A1c%')
Then by using ROW_NUMBER over that remaining information we take just the latest contact per person
hence, from where I sit (and with my imperfect knowledge of your needs and the data) I cannot offer much more than I have.
what you describe does not make much sense because the filter is there
SELECT DISTINCT
pe.person_id
FROM patient_encounter pe
INNER JOIN appointments AS a ON pe.enc_id = a.enc_id
INNER JOIN patient_diagnosis pd ON pe.enc_id = pd.enc_id
WHERE pe.enc_timestamp >= '20151001' -- 2015-10-01
AND pe.enc_timestamp < '20160101' -- 2016-01-01
-- you have previously indicated hat these are also needed:
AND a.appt_kept_ind = 'Y'
AND a.resched_ind = 'N'
-- and these were previously in both too
AND (
pd.diagnosis_code_id LIKE '250%'
OR pd.diagnosis_code_id LIKE '648.0%'
OR pd.diagnosis_code_id LIKE '357.2'
OR pd.diagnosis_code_id LIKE '362.0[0-7]'
OR pd.diagnosis_code_id LIKE '366.41'
OR pd.diagnosis_code_id LIKE 'E1[0-3].%'
OR pd.diagnosis_code_id LIKE 'O24.[0-4]0%'
OR pd.diagnosis_code_id LIKE 'O24.[4-9][2-9]'
)
so the result would be a distinct list of person_id's meeting all of those conditions
Then we take that list (CTEpe) to restrict who we are interested in and look for the related remaining information
SELECT
o2.person_id
, o2.acttext AS ResultDescription
, o2.obsvalue AS A1cResult
, o2.completeddate AS A1ccompleteddate
FROM order_ o2
INNER JOIN CTEpe ON o2.person_id = CTEpe.person_id
WHERE o2.obsvalue IS NOT NULL
AND (o2.completeddate >= '20150101'
AND o2.completeddate < '20160101')
AND o2.acttext IN ('GLYCATED HEMOGLOBIN TEST', 'Hemoglobin A1c')
and
SELECT
res.person_id
, res.result_desc AS ResultDescription
, res.observ_value AS A1cResult
, res.obs_date_time AS A1ccompleteddate
FROM lab_results_obx res
INNER JOIN CTEpe ON res.person_id = CTEpe.person_id
WHERE res.observ_value IS NOT NULL
AND (res.obs_date_time >= '20150101'
AND res.obs_date_time < '20160101')
AND (res.obs_id LIKE '%HEMOGLOBIN A1C%'
OR res.obs_id LIKE '%HBG. A1C%'
OR res.obs_id LIKE '%HBG.AIC%'
OR res.obs_id LIKE '%HEMOGLOBIN A1c%')
Then by using ROW_NUMBER over that remaining information we take just the latest contact per person
hence, from where I sit (and with my imperfect knowledge of your needs and the data) I cannot offer much more than I have.
ASKER
I'm definitely finding patients that had an enc in the range and had the lab done in the range from yours but can't figure out what might be causing it to lose some. It did get rid of the dups but got rid of too many in the process. Weird.
ASKER
The only real way I know to fix this is to change the code I have to eliminate dups without changing it to use CTE. I realize that is the most efficient way but I don't know enough about it and how the joins all fit together to figure out why it's excluding patients. This original one if somehow could be set to remove the dups between the ones from pre UNION and post UNION then I'd have a good way to test.
SELECT * from (select p.last_name as LastName, p.first_name as FirstName, p.sex as Sex, p.language as Language, p.Ethnicity as Ethnicity,
p.Race as Race, p.date_of_birth as DOB, res.result_desc as ResultDescription, res.observ_value as A1cResultValue, res.obs_date_time as A1cDate,
pe.enc_id, pe.person_id,
ROW_NUMBER() OVER (PARTITION BY p.person_id ORDER BY res.obs_date_time DESC)rowno
FROM lab_results_obx res
inner join person p with(nolock) on res.person_id = p.person_id
INNER JOIN patient_encounter AS pe with(nolock) ON p.person_id = pe.person_id
inner join patient_diagnosis pd with(nolock) on pe.enc_id = pd.enc_id
INNER JOIN appointments AS a with(nolock) ON pe.enc_id = a.enc_id
WHERE res.observ_value IS NOT NULL
and (res.obs_date_time >= '20150401'
AND res.obs_date_time < '20160331')
and (res.obs_id like '%HEMOGLOBIN A1C%' or res.obs_id like '%HBG. A1C%' or res.obs_id like '%HBG.AIC%' or res.obs_id like '%HEMOGLOBIN A1c%')
and (pd.diagnosis_code_id like '250%' or pd.diagnosis_code_id like '648.0%'
or pd.diagnosis_code_id like '357.2' or pd.diagnosis_code_id like '362.0[0-7]' or pd.diagnosis_code_id like '366.41' or pd.diagnosis_code_id like 'E1[0-3].%'
or pd.diagnosis_code_id like 'O24.[0-4]0%' or pd.diagnosis_code_id like 'O24.[4-9][2-9]')
and pe.person_id in (select pe.person_id from patient_encounter pe
INNER JOIN appointments AS a with(nolock) ON pe.enc_id = a.enc_id
where (pe.enc_timestamp >= '20160101'
AND pe.enc_timestamp < '20160401'
and a.appt_kept_ind = 'Y' and a.resched_ind = 'N'))) outsidelabA1c
where rowno = 1
UNION
SELECT * from (select p.last_name as LastName, p.first_name as FirstName, p.sex as Sex, p.language as Language, p.Ethnicity as Ethnicity,
p.Race as Race, p.date_of_birth as DOB, o2.acttext as ResultDescription, o2.obsvalue as A1cResultValue, o2.create_timestamp as A1cDate, pe.enc_id, pe.person_id,
ROW_NUMBER() OVER (PARTITION BY p.person_nbr ORDER BY o2.completeddate DESC)rowno
FROM order_ o2
inner join person p with(nolock) on o2.person_id = p.person_id
INNER JOIN patient_encounter AS pe with(nolock) ON p.person_id = pe.person_id
inner join patient_diagnosis pd with(nolock) on pe.enc_id = pd.enc_id
INNER JOIN appointments AS a with(nolock) ON pe.enc_id = a.enc_id
WHERE o2.obsvalue IS NOT NULL
and (o2.completeddate >= '20150401'
AND o2.completeddate < '20160331')
and o2.acttext in ('GLYCATED HEMOGLOBIN TEST', 'Hemoglobin A1c')
and (pd.diagnosis_code_id like '250%' or pd.diagnosis_code_id like '648.0%'
or pd.diagnosis_code_id like '357.2' or pd.diagnosis_code_id like '362.0[0-7]' or pd.diagnosis_code_id like '366.41' or pd.diagnosis_code_id like 'E1[0-3].%'
or pd.diagnosis_code_id like 'O24.[0-4]0%' or pd.diagnosis_code_id like 'O24.[4-9][2-9]')
and pe.person_id in (select pe.person_id from patient_encounter pe
INNER JOIN appointments AS a with(nolock) ON pe.enc_id = a.enc_id
where (pe.enc_timestamp >= '20160101'
AND pe.enc_timestamp < '20160401'
and a.appt_kept_ind = 'Y' and a.resched_ind = 'N'))) inofficea1c
where rowno = 1
Without data I am flying low in a blizzard; I see white and nothing but white.
========================== ========== ========== ========== ======
select person_id, adate, 'some text'
union
select person_id, diffdate, 'different text'
in that example NO ROWS ARE REMOVED by the use of UNION
but for this:
there MIGHT be some rows removed
and for this:
it is highly likely that rows will be removed.
What I am trying to indicate is that the UNION in your query probably removes no rows or very few rows.
In my query more rows will be removed!!!!
========================== ========== ========== ========== =========
Unless or until there is sample data to work with I have given you my best effort.
You create a small list of people you know you are interested in, we all that "CTEpe"
then you re-use that list to filter the 2 different source of more information.
COMBINE both those sets of information using UNION ALL << difference here!
now take the most recent contact from that combined list
If this isn't the correct logic, then it isn't a correct result.
==========================
select person_id, adate, 'some text'
union
select person_id, diffdate, 'different text'
but for this:
select person_id, adate
union
select person_id, diffdate
union
select person_id, diffdate
there MIGHT be some rows removed
and for this:
select person_id
union
select person_id
union
select person_id
it is highly likely that rows will be removed.
What I am trying to indicate is that the UNION in your query probably removes no rows or very few rows.
In my query more rows will be removed!!!!
==========================
Unless or until there is sample data to work with I have given you my best effort.
;
WITH CTEpe
AS (
SELECT DISTINCT
pe.person_id
FROM patient_encounter pe
INNER JOIN appointments AS a WITH (NOLOCK) ON pe.enc_id = a.enc_id
INNER JOIN patient_diagnosis pd WITH (NOLOCK) ON pe.enc_id = pd.enc_id
WHERE pe.enc_timestamp >= '20151001'
AND pe.enc_timestamp < '20160101'
AND a.appt_kept_ind = 'Y'
AND a.resched_ind = 'N'
AND (
pd.diagnosis_code_id LIKE '250%'
OR pd.diagnosis_code_id LIKE '648.0%'
OR pd.diagnosis_code_id LIKE '357.2'
OR pd.diagnosis_code_id LIKE '362.0[0-7]'
OR pd.diagnosis_code_id LIKE '366.41'
OR pd.diagnosis_code_id LIKE 'E1[0-3].%'
OR pd.diagnosis_code_id LIKE 'O24.[0-4]0%'
OR pd.diagnosis_code_id LIKE 'O24.[4-9][2-9]'
)
)
SELECT
p.last_name AS LastName
, p.first_name AS FirstName
, p.sex AS Sex
, p.language AS Language
, p.Ethnicity AS Ethnicity
, p.Race AS Race
, p.date_of_birth AS DOB
, d.ResultDescription
, d.A1cResult
, d.A1ccompleteddate
FROM person p
INNER JOIN (
SELECT
person_id, ResultDescription, A1cResult, A1ccompleteddate
, ROW_NUMBER() OVER (PARTITION BY person_id ORDER BY A1ccompleteddate DESC) AS rowno
FROM (
SELECT
o2.person_id
, o2.acttext AS ResultDescription
, o2.obsvalue AS A1cResult
, o2.completeddate AS A1ccompleteddate
FROM order_ o2
INNER JOIN CTEpe ON o2.person_id = CTEpe.person_id
WHERE o2.obsvalue IS NOT NULL
AND (o2.completeddate >= '20150101'
AND o2.completeddate < '20160101')
AND o2.acttext IN ('GLYCATED HEMOGLOBIN TEST', 'Hemoglobin A1c')
UNION ALL
SELECT
res.person_id
, res.result_desc AS ResultDescription
, res.observ_value AS A1cResult
, res.obs_date_time AS A1ccompleteddate
FROM lab_results_obx res
INNER JOIN CTEpe ON res.person_id = CTEpe.person_id
WHERE res.observ_value IS NOT NULL
AND (res.obs_date_time >= '20150101'
AND res.obs_date_time < '20160101')
AND (res.obs_id LIKE '%HEMOGLOBIN A1C%'
OR res.obs_id LIKE '%HBG. A1C%'
OR res.obs_id LIKE '%HBG.AIC%'
OR res.obs_id LIKE '%HEMOGLOBIN A1c%')
) AS DT
) AS d ON p.person_id = d.person_id
AND d.rowno = 1
;
CTEs are not complex, think of it like a temp table.You create a small list of people you know you are interested in, we all that "CTEpe"
then you re-use that list to filter the 2 different source of more information.
COMBINE both those sets of information using UNION ALL << difference here!
now take the most recent contact from that combined list
If this isn't the correct logic, then it isn't a correct result.
By the way.
The logic of your current query includes the most recent contact, from 2 sources, ONLY IF those 2 contacts are identical in ALL COLUMNS will it be reduced to a single row.
You are comparing that query, to mine that will ONLY LIST the most recent contact
in other words, I'm would not be surprised if my query produced fewer rows.
The logic of your current query includes the most recent contact, from 2 sources, ONLY IF those 2 contacts are identical in ALL COLUMNS will it be reduced to a single row.
You are comparing that query, to mine that will ONLY LIST the most recent contact
in other words, I'm would not be surprised if my query produced fewer rows.
ASKER
Yes but the problem is that I pulled all the dups out of mine then went in and did a compare against the results of yours and the ones I had that you did not were legitimate. So, even though I realize my UNION query will not remove dups (which is why I wanted a method of doing that without the CTE) it does seem to produce the correct results other than the fact that it is comparing against different columns in different tables and thus giving dups.
Also, I did attach an excel file with examples that would be pulled from each query separately (in office labs versus external labs) which is what I thought was asked for. I'm happy to return results from the UNION query or the CTE if that would be helpful.
I'm wondering if it's possible that the CTE by inner joining on the person_id is somehow eliminating all the people who had them done within the date range for both external and internal labs. Basically my queries before the UNION seem to be accurate. The only problem is that some people had 2 tests, one in-office and one externally and that creates dups that must be removed in Excel or something else which is obviously not what I would like.
Thanks
Also, I did attach an excel file with examples that would be pulled from each query separately (in office labs versus external labs) which is what I thought was asked for. I'm happy to return results from the UNION query or the CTE if that would be helpful.
I'm wondering if it's possible that the CTE by inner joining on the person_id is somehow eliminating all the people who had them done within the date range for both external and internal labs. Basically my queries before the UNION seem to be accurate. The only problem is that some people had 2 tests, one in-office and one externally and that creates dups that must be removed in Excel or something else which is obviously not what I would like.
Thanks
>>"UNION query will not remove dups (which is why I wanted a method of doing that without the CTE) "
the CTE had nothing whatever to do it. Please stop focusing on the CTE.
If you did the following replaced "JOIN CTEpe" with "JOIN ( .... ) as CTEpe" and in that ... you included the full subquery that currently forms the CTE. Then removed all of the code that created the CTE you would end up with an equivalent query, without a CTE, that produced the same results as my query. The CTE is just saving you some time (better performance).
I do not know how to say this any more politely than I have - but without DATA I can do very little at this end. Query results are not data and I am not spending my voluntary time trying to reverse engineer them into table inserts.
"sample data" is per table so that the query one is trying to develop can be run.
Regarding the "dups that are valid"
OK
but again as I cannot see them, I have no idea why they are valid except that you tell me they are.
Can you translate "they are valid" into business rules that the query should meet?
As you can surely tell, I am trying to assist, but: no data = no much help
You might want to read this: http://sscce.org/
the CTE had nothing whatever to do it. Please stop focusing on the CTE.
If you did the following replaced "JOIN CTEpe" with "JOIN ( .... ) as CTEpe" and in that ... you included the full subquery that currently forms the CTE. Then removed all of the code that created the CTE you would end up with an equivalent query, without a CTE, that produced the same results as my query. The CTE is just saving you some time (better performance).
I do not know how to say this any more politely than I have - but without DATA I can do very little at this end. Query results are not data and I am not spending my voluntary time trying to reverse engineer them into table inserts.
"sample data" is per table so that the query one is trying to develop can be run.
Regarding the "dups that are valid"
OK
but again as I cannot see them, I have no idea why they are valid except that you tell me they are.
Can you translate "they are valid" into business rules that the query should meet?
As you can surely tell, I am trying to assist, but: no data = no much help
You might want to read this: http://sscce.org/
ASKER CERTIFIED SOLUTION
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
ASKER
thank you Paul. I sincerely appreciate all the help I get here. This particular account was paid for originally by an org I worked for very shortly but they used my login for years after I left. It looks like I've been on here that long but in reality I'm fairly new to this and just learning how to properly ask questions. Trust me I'm paying for this membership on my own now and trying not to piss off all the great people who help me. :)
I do understand now about getting sample data. My problem is getting you data from these that is de-identified. That would be a bit of an undertaking. I can get data from tables using last name of 'Test' and started doing that but realized they would rarely have legitimate information in their records which would make it difficult to test with.
Bottom Line: I've spent considerable time testing all the records returned from various queries used here and I am wondering if it is possibly one simple issue that causes problems on all of them. The fact that the one most important column to Union on is the timestamp. In the first part of the query for external labs the ROW_NUMBER uses the res.obs_date_time which is a datetime and the in-office labs portion of the second half of the UNION uses o2.completeddate which is a varchar. All I'm trying to do is UNION the two but remove dups, however even if I were able to do some ROW_NUMBER on the whole big query (to only take the most recently done A1c lab from the UNION of the two thus comparing on one of the timestamps) it wouldn't be able to properly compare because maybe doing a convert on the varchar completed date would be necessary.
As far as Business Rules
I do understand now about getting sample data. My problem is getting you data from these that is de-identified. That would be a bit of an undertaking. I can get data from tables using last name of 'Test' and started doing that but realized they would rarely have legitimate information in their records which would make it difficult to test with.
Bottom Line: I've spent considerable time testing all the records returned from various queries used here and I am wondering if it is possibly one simple issue that causes problems on all of them. The fact that the one most important column to Union on is the timestamp. In the first part of the query for external labs the ROW_NUMBER uses the res.obs_date_time which is a datetime and the in-office labs portion of the second half of the UNION uses o2.completeddate which is a varchar. All I'm trying to do is UNION the two but remove dups, however even if I were able to do some ROW_NUMBER on the whole big query (to only take the most recently done A1c lab from the UNION of the two thus comparing on one of the timestamps) it wouldn't be able to properly compare because maybe doing a convert on the varchar completed date would be necessary.
As far as Business Rules
Diabetic Patients (the diagnosis codes tell us this)
Had an encounter any time between 20160101 and 20160331
If they had an encounter in that period then they also must have an A1c lab test done any time in the past year 20150401 to 20160331
If the patients had more than one test only return the most recent regardless of it being in-office or external
It may be preferrable to convert the timestamp to varchar - that way you avoid errors if an existing varchar "date" cannot be converted.
E.g. convert (varchar (8),datetimecolumn,112)
That should convert to a YYYYMMDD pattern but it would be necessary to match the stringdate column's format.
This is another instance of not seeing data, I don't know what format your varchar "dates" are. Hopefully yyyymmdd or yyyy-mm-dd so they can be sorted correctly.
E.g. convert (varchar (8),datetimecolumn,112)
That should convert to a YYYYMMDD pattern but it would be necessary to match the stringdate column's format.
This is another instance of not seeing data, I don't know what format your varchar "dates" are. Hopefully yyyymmdd or yyyy-mm-dd so they can be sorted correctly.
ASKER
Yes they are and I just thought of something that I should have thought about a long time ago. Why not just take your CTE query which incorporated the UNION in a more efficient way (or even my giant one that takes forever to load) and turn it into a view then select from the view and eliminate the dups that way. I would of course have to do a convert on the varchar date that is yyyymmdd as you mention but then I'm doing the ROW_NUMBER statement outside of the queries and using it against the view.
Let us establish the best most correct query, then translate that to a view.
some while back you said this:
WHAT was in the valid ones that made you aware they were valid?
a tiny example would be really appreciated.
before pulling dups
after pulling dups
which ones(s) is(are) valid
some while back you said this:
Yes but the problem is that I pulled all the dups out of mine then went in and did a compare against the results of yours and the ones I had that you did not were legitimate.HOW did you "pull the dups"?
WHAT was in the valid ones that made you aware they were valid?
a tiny example would be really appreciated.
before pulling dups
after pulling dups
which ones(s) is(are) valid
ASKER
I added person_id to both of our queries. Mine returned patients that had a visit in the date range and had an A1c in the date range.
I'll try to get some examples from each of the queries. My biggest issue is getting real data and making it de-identified data. No MRN's, phones, addresses, dob, names etc etc. and sometimes these tables are so large (50-60 columns) it's easy to miss something.
Can I return some examples of just the data returned from the queries rather than examples from each table referenced? thanks
I'll try to get some examples from each of the queries. My biggest issue is getting real data and making it de-identified data. No MRN's, phones, addresses, dob, names etc etc. and sometimes these tables are so large (50-60 columns) it's easy to miss something.
Can I return some examples of just the data returned from the queries rather than examples from each table referenced? thanks
Yes examples are ok
ASKER
I hope this attachment will help.
A1cTestExamples.xlsx
A1cTestExamples.xlsx
I'm really sorry but while I did study that spreadsheet I just didn't follow it.
Not sure where yo want to go with this now. By the way my time for EE is quite restricted now also as I have 2 projects working concurrently.
Not sure where yo want to go with this now. By the way my time for EE is quite restricted now also as I have 2 projects working concurrently.
ASKER
I gave real examples of just about every table used in the queries which is what I thought you wanted. I'm not sure what else to send. I appreciate your help but if you don't have time, I understand. I'll repost the question and add all the current examples we have and see if anyone else might be able to provide help. thanks
Here is some sample data (tab delimited):
Order_.A1cDate is varchar in format 2015-12-07 11:54:00.000
Lab_Results_Obx.A1cDate is a datetime column
person_id is really stored in this form 'D8FF794D-3E19-4830-82C8-0 9E2758CF1B 9'
enc_id is really stored in this form '9535E988-C284-465A-83C1-F AED1C297E0 C'
the string values for person_id and enc_id have been replaced with shorter strings for ease of reading. e.g. each unique person_id was replaced by p1 or p2 or p3 etc
for enc_id only some of these are common across more than one table, they commence with 'e'
other that are not common across tables commence with 'c'
===============
From that simplfied text I can easily create some tables, like this:
Person
person_id LastName
p1 test1
p2 test2
p3 test3
p4 test4
p5 test5
patient_encounter
person_id enc_id enc_timestamp
p1 e1 2016-01-02 11:50:00
p2 e2 2016-01-04 11:57:03
p1 e3 2016-01-05 10:36:46
p1 e4 2016-01-05 11:30:00
p3 e5 2016-01-07 15:15:00
p4 e6 2016-01-08 16:35:00
p2 e7 2016-01-11 10:00:00
p1 e8 2016-01-12 08:30:00
p4 e9 2016-01-12 12:00:00
p1 ea 2016-01-12 12:22:34
p2 eb 2016-01-18 10:17:34
p5 ec 2016-01-19 12:42:06
p5 ed 2016-01-26 14:15:00
p2 ee 2016-02-01 12:24:44
Order_
person_id A1cDate A1cResultValue ResultDescription
p1 2015-07-02 16:59:25 6.5 GLYCATED HEMOGLOBIN TEST
p1 2015-10-01 13:39:24 6.3 Hemoglobin A1C
p2 2015-11-23 09:38:23 10.3 GLYCATED HEMOGLOBIN TEST
p2 2016-01-11 10:41:37 8.3 GLYCATED HEMOGLOBIN TEST
p3 2015-05-20 10:08:55 5.7 GLYCATED HEMOGLOBIN TEST
p3 2015-11-03 09:06:45 7.2 GLYCATED HEMOGLOBIN TEST
p3 2016-01-07 18:07:26 6.5 GLYCATED HEMOGLOBIN TEST
p4 2015-12-15 16:32:59 14 GLYCATED HEMOGLOBIN TEST
p5 2015-04-14 15:31:30 7 Hemoglobin A1C
p5 2015-07-28 15:28:16 7 GLYCATED HEMOGLOBIN TEST
lab_results_obx
person_id A1cDate ResultDescription A1cResultValue
p1 2015-08-07 07:17:00 HBG.AIC 6.1
p3 2015-05-20 12:45:00 HBG.AIC 5.3
p5 2015-11-03 10:12:00 HEMOGLOBIN A1c 7.1
patient_diagnosis
person_id enc_id create_timestamp diagnosis_code_id
p1 c1 2014-02-04 10:24:38 250
p1 c2 2013-10-22 18:55:00 250
p1 c3 2014-02-06 11:24:43 250
p1 c4 2012-11-27 17:17:24 250
p1 c5 2013-02-21 15:56:17 250
p1 c6 2014-12-16 14:42:00 250
p1 c7 2015-01-22 11:12:00 250
p1 c8 2014-09-08 09:07:37 250
p1 c9 2015-07-02 16:59:25 250
p1 ca 2015-08-03 10:28:20 250
p1 cb 2014-08-12 14:53:53 250
p1 cc 2015-10-01 14:15:06 E11.40
p1 cc 2015-10-01 14:16:24 E13.42
p2 cd 2012-12-19 18:19:12 250
p2 ce 2014-02-11 16:01:20 250
p2 cf 2013-07-17 18:15:21 250
p2 cg 2013-03-09 09:12:48 250
p2 ch 2014-05-10 11:41:59 250.02
p2 ci 2014-11-04 08:45:38 250.02
p2 cj 2014-05-03 10:19:15 250.02
p2 ce 2014-02-11 16:33:42 250.6
p2 ck 2015-11-23 09:22:53 E11.42
p2 e7 2016-01-11 10:12:47 E11.65
p2 cl 2015-12-18 12:04:59 E11.65
p2 cm 2015-12-14 08:50:47 E11.65
p2 ck 2015-11-23 17:18:08 E11.65
p2 cn 2015-12-07 08:54:12 E11.65
p2 e7 2016-01-11 10:14:57 E11.9
p2 ck 2015-11-23 17:17:54 E11.9
p3 co 2013-08-07 09:17:33 250
p3 cp 2013-08-28 10:13:01 250
p3 e5 2016-01-07 16:17:06 E11.9
p3 cq 2015-11-03 09:06:34 E11.9
p4 e9 2016-01-12 12:43:11 E11.65
p4 cr 2015-12-15 16:27:09 E11.65
p5 cs 2013-04-30 14:16:29 250
p5 ct 2015-04-14 15:31:25 250
p5 cu 2013-08-23 12:05:33 250
p5 cv 2015-07-28 15:21:17 250
p5 cw 2013-01-11 17:05:15 250
p5 cx 2013-11-08 10:13:10 250
p5 cy 2015-02-12 11:55:45 250.62
p5 cz 2015-11-24 13:44:38 E11.9
Notes:Order_.A1cDate is varchar in format 2015-12-07 11:54:00.000
Lab_Results_Obx.A1cDate is a datetime column
person_id is really stored in this form 'D8FF794D-3E19-4830-82C8-0
enc_id is really stored in this form '9535E988-C284-465A-83C1-F
the string values for person_id and enc_id have been replaced with shorter strings for ease of reading. e.g. each unique person_id was replaced by p1 or p2 or p3 etc
for enc_id only some of these are common across more than one table, they commence with 'e'
other that are not common across tables commence with 'c'
===============
From that simplfied text I can easily create some tables, like this:
CREATE TABLE Person
([person_id] varchar(2), [LastName] varchar(5))
;
INSERT INTO Person
([person_id], [LastName])
VALUES
('p1', 'test1'),
('p2', 'test2'),
('p3', 'test3'),
('p4', 'test4'),
('p5', 'test5')
;
CREATE TABLE patient_encounter
([person_id] varchar(2), [enc_id] varchar(2), [enc_timestamp] datetime)
;
INSERT INTO patient_encounter
([person_id], [enc_id], [enc_timestamp])
VALUES
('p1', 'e1', '2016-01-02 11:50:00'),
('p2', 'e2', '2016-01-04 11:57:03'),
('p1', 'e3', '2016-01-05 10:36:46'),
('p1', 'e4', '2016-01-05 11:30:00'),
('p3', 'e5', '2016-01-07 15:15:00'),
('p4', 'e6', '2016-01-08 16:35:00'),
('p2', 'e7', '2016-01-11 10:00:00'),
('p1', 'e8', '2016-01-12 08:30:00'),
('p4', 'e9', '2016-01-12 12:00:00'),
('p1', 'ea', '2016-01-12 12:22:34'),
('p2', 'eb', '2016-01-18 10:17:34'),
('p5', 'ec', '2016-01-19 12:42:06'),
('p5', 'ed', '2016-01-26 14:15:00'),
('p2', 'ee', '2016-02-01 12:24:44')
;
CREATE TABLE Order_
([person_id] varchar(2), [A1cDate] varchar(24)
, [A1cResultValue] int, [ResultDescription] varchar(24))
;
INSERT INTO Order_
([person_id], [A1cDate], [A1cResultValue], [ResultDescription])
VALUES
('p1', '2015-07-02 16:59:25', 6.5, 'GLYCATED HEMOGLOBIN TEST'),
('p1', '2015-10-01 13:39:24', 6.3, 'Hemoglobin A1C'),
('p2', '2015-11-23 09:38:23', 10.3, 'GLYCATED HEMOGLOBIN TEST'),
('p2', '2016-01-11 10:41:37', 8.3, 'GLYCATED HEMOGLOBIN TEST'),
('p3', '2015-05-20 10:08:55', 5.7, 'GLYCATED HEMOGLOBIN TEST'),
('p3', '2015-11-03 09:06:45', 7.2, 'GLYCATED HEMOGLOBIN TEST'),
('p3', '2016-01-07 18:07:26', 6.5, 'GLYCATED HEMOGLOBIN TEST'),
('p4', '2015-12-15 16:32:59', 14, 'GLYCATED HEMOGLOBIN TEST'),
('p5', '2015-04-14 15:31:30', 7, 'Hemoglobin A1C'),
('p5', '2015-07-28 15:28:16', 7, 'GLYCATED HEMOGLOBIN TEST')
;
CREATE TABLE lab_results_obx
([person_id] varchar(2), [A1cDate] datetime, [ResultDescription] varchar(14), [A1cResultValue] numeric)
;
INSERT INTO lab_results_obx
([person_id], [A1cDate], [ResultDescription], [A1cResultValue])
VALUES
('p1', '2015-08-07 07:17:00', 'HBG.AIC', 6.1),
('p3', '2015-05-20 12:45:00', 'HBG.AIC', 5.3),
('p5', '2015-11-03 10:12:00', 'HEMOGLOBIN A1c', 7.1)
;
CREATE TABLE patient_diagnosis
([person_id] varchar(2), [enc_id] varchar(2), [create_timestamp] datetime, [diagnosis_code_id] varchar(6))
;
INSERT INTO patient_diagnosis
([person_id], [enc_id], [create_timestamp], [diagnosis_code_id])
VALUES
('p1', 'c1', '2014-02-04 10:24:38', '250'),
('p1', 'c2', '2013-10-22 18:55:00', '250'),
('p1', 'c3', '2014-02-06 11:24:43', '250'),
('p1', 'c4', '2012-11-27 17:17:24', '250'),
('p1', 'c5', '2013-02-21 15:56:17', '250'),
('p1', 'c6', '2014-12-16 14:42:00', '250'),
('p1', 'c7', '2015-01-22 11:12:00', '250'),
('p1', 'c8', '2014-09-08 09:07:37', '250'),
('p1', 'c9', '2015-07-02 16:59:25', '250'),
('p1', 'ca', '2015-08-03 10:28:20', '250'),
('p1', 'cb', '2014-08-12 14:53:53', '250'),
('p1', 'cc', '2015-10-01 14:15:06', 'E11.40'),
('p1', 'cc', '2015-10-01 14:16:24', 'E13.42'),
('p2', 'cd', '2012-12-19 18:19:12', '250'),
('p2', 'ce', '2014-02-11 16:01:20', '250'),
('p2', 'cf', '2013-07-17 18:15:21', '250'),
('p2', 'cg', '2013-03-09 09:12:48', '250'),
('p2', 'ch', '2014-05-10 11:41:59', '250.02'),
('p2', 'ci', '2014-11-04 08:45:38', '250.02'),
('p2', 'cj', '2014-05-03 10:19:15', '250.02'),
('p2', 'ce', '2014-02-11 16:33:42', '250.6'),
('p2', 'ck', '2015-11-23 09:22:53', 'E11.42'),
('p2', 'e7', '2016-01-11 10:12:47', 'E11.65'),
('p2', 'cl', '2015-12-18 12:04:59', 'E11.65'),
('p2', 'cm', '2015-12-14 08:50:47', 'E11.65'),
('p2', 'ck', '2015-11-23 17:18:08', 'E11.65'),
('p2', 'cn', '2015-12-07 08:54:12', 'E11.65'),
('p2', 'e7', '2016-01-11 10:14:57', 'E11.9'),
('p2', 'ck', '2015-11-23 17:17:54', 'E11.9'),
('p3', 'co', '2013-08-07 09:17:33', '250'),
('p3', 'cp', '2013-08-28 10:13:01', '250'),
('p3', 'e5', '2016-01-07 16:17:06', 'E11.9'),
('p3', 'cq', '2015-11-03 09:06:34', 'E11.9'),
('p4', 'e9', '2016-01-12 12:43:11', 'E11.65'),
('p4', 'cr', '2015-12-15 16:27:09', 'E11.65'),
('p5', 'cs', '2013-04-30 14:16:29', '250'),
('p5', 'ct', '2015-04-14 15:31:25', '250'),
('p5', 'cu', '2013-08-23 12:05:33', '250'),
('p5', 'cv', '2015-07-28 15:21:17', '250'),
('p5', 'cw', '2013-01-11 17:05:15', '250'),
('p5', 'cx', '2013-11-08 10:13:10', '250'),
('p5', 'cy', '2015-02-12 11:55:45', '250.62'),
('p5', 'cz', '2015-11-24 13:44:38', 'E11.9')
;
In fact you can see those table - live and ready to play with - here at sqlfiddle: http://sqlfiddle.com/#!6/1ddd1
It is wiser to use the REAL COLUMN NAMES in your sample data, regrettably I didn't notice until too late. But now that had the sqlfiddle working I could get some results. (see below)
What I do not know is, is the final result below "correct"? Only you know that.
Please note that for your A1cDate I suggest you use:
convert(varchar(19), res.A1cDate ,121) AS A1cDate
then you get the full date and time to the second
If in doubt over what style number to use refer to this article:
https://www.experts-exchange.com/articles/12315/SQL-Server-Date-Styles-formats-using-CONVERT.html
I trust this helps, but most of all I trust this shows why sample data and expected result are so important to resolving your questions.
What I do not know is, is the final result below "correct"? Only you know that.
Please note that for your A1cDate I suggest you use:
convert(varchar(19), res.A1cDate ,121) AS A1cDate
then you get the full date and time to the second
If in doubt over what style number to use refer to this article:
https://www.experts-exchange.com/articles/12315/SQL-Server-Date-Styles-formats-using-CONVERT.html
I trust this helps, but most of all I trust this shows why sample data and expected result are so important to resolving your questions.
SELECT
o2.person_id
, o2.ResultDescription
, o2.A1cResultValue
, o2.A1cDate
FROM order_ o2
INNER JOIN person as CTEpe ON o2.person_id = CTEpe.person_id
WHERE o2.A1cResultValue IS NOT NULL
AND (o2.A1cDate >= '20150101'
AND o2.A1cDate < '20160101')
AND o2.ResultDescription IN ('GLYCATED HEMOGLOBIN TEST', 'Hemoglobin A1c')
**[Results][2]**:
| person_id | ResultDescription | A1cResultValue | A1cDate |
|-----------|--------------------------|----------------|---------------------|
| p2 | GLYCATED HEMOGLOBIN TEST | 8 | 2016-01-11 10:41:37 |
| p3 | GLYCATED HEMOGLOBIN TEST | 6 | 2016-01-07 18:07:26 |
SELECT
res.person_id
, res.ResultDescription
, res.A1cResultValue
, convert(varchar(19), res.A1cDate ,121) AS A1cDate
FROM lab_results_obx res
INNER JOIN person as CTEpe ON res.person_id = CTEpe.person_id
WHERE res.A1cResultValue IS NOT NULL
AND (res.A1cDate >= '20150101'
AND res.A1cDate < '20160101')
AND (res.ResultDescription LIKE '%HEMOGLOBIN A1C%'
OR res.ResultDescription LIKE '%HBG. A1C%'
OR res.ResultDescription LIKE '%HBG.AIC%'
OR res.ResultDescription LIKE '%HEMOGLOBIN A1c%')
| person_id | ResultDescription | A1cResultValue | A1cDate |
|-----------|-------------------|----------------|---------------------|
| p1 | HBG.AIC | 6 | 2015-08-07 07:17:00 |
| p3 | HBG.AIC | 5 | 2015-05-20 12:45:00 |
| p5 | HEMOGLOBIN A1c | 7 | 2015-11-03 10:12:00 |
SELECT
o2.person_id
, o2.ResultDescription
, o2.A1cResultValue
, o2.A1cDate
FROM order_ o2
INNER JOIN person as CTEpe ON o2.person_id = CTEpe.person_id
WHERE o2.A1cResultValue IS NOT NULL
AND (o2.A1cDate >= '20150101'
AND o2.A1cDate < '20160101')
AND o2.ResultDescription IN ('GLYCATED HEMOGLOBIN TEST', 'Hemoglobin A1c')
UNION
SELECT
res.person_id
, res.ResultDescription
, res.A1cResultValue
, convert(varchar(19), res.A1cDate ,121) AS A1cDate
FROM lab_results_obx res
INNER JOIN person as CTEpe ON res.person_id = CTEpe.person_id
WHERE res.A1cResultValue IS NOT NULL
AND (res.A1cDate >= '20150101'
AND res.A1cDate < '20160101')
AND (res.ResultDescription LIKE '%HEMOGLOBIN A1C%'
OR res.ResultDescription LIKE '%HBG. A1C%'
OR res.ResultDescription LIKE '%HBG.AIC%'
OR res.ResultDescription LIKE '%HEMOGLOBIN A1c%')
| person_id | ResultDescription | A1cResultValue | A1cDate |
|-----------|--------------------------|----------------|---------------------|
| p1 | HBG.AIC | 6 | 2015-08-07 07:17:00 |
| p2 | GLYCATED HEMOGLOBIN TEST | 8 | 2016-01-11 10:41:37 |
| p3 | GLYCATED HEMOGLOBIN TEST | 6 | 2016-01-07 18:07:26 |
| p3 | HBG.AIC | 5 | 2015-05-20 12:45:00 |
| p5 | HEMOGLOBIN A1c | 7 | 2015-11-03 10:12:00 |
SELECT
person_id, ResultDescription, A1cResultValue, A1cDate
, ROW_NUMBER() OVER (PARTITION BY person_id ORDER BY A1cDate DESC) AS rowno
FROM (
SELECT
o2.person_id
, o2.ResultDescription
, o2.A1cResultValue
, o2.A1cDate
FROM order_ o2
INNER JOIN person as CTEpe ON o2.person_id = CTEpe.person_id
WHERE o2.A1cResultValue IS NOT NULL
AND (o2.A1cDate >= '20150101'
AND o2.A1cDate < '20160101')
AND o2.ResultDescription IN ('GLYCATED HEMOGLOBIN TEST', 'Hemoglobin A1c')
UNION
SELECT
res.person_id
, res.ResultDescription
, res.A1cResultValue
, convert(varchar(19), res.A1cDate ,121) AS A1cDate
FROM lab_results_obx res
INNER JOIN person as CTEpe ON res.person_id = CTEpe.person_id
WHERE res.A1cResultValue IS NOT NULL
AND (res.A1cDate >= '20150101'
AND res.A1cDate < '20160101')
AND (res.ResultDescription LIKE '%HEMOGLOBIN A1C%'
OR res.ResultDescription LIKE '%HBG. A1C%'
OR res.ResultDescription LIKE '%HBG.AIC%'
OR res.ResultDescription LIKE '%HEMOGLOBIN A1c%')
) AS DT
| person_id | ResultDescription | A1cResultValue | A1cDate | rowno |
|-----------|--------------------------|----------------|---------------------|-------|
| p1 | HBG.AIC | 6 | 2015-08-07 07:17:00 | 1 |
| p2 | GLYCATED HEMOGLOBIN TEST | 8 | 2016-01-11 10:41:37 | 1 |
| p3 | GLYCATED HEMOGLOBIN TEST | 6 | 2016-01-07 18:07:26 | 1 |
| p3 | HBG.AIC | 5 | 2015-05-20 12:45:00 | 2 |
| p5 | HEMOGLOBIN A1c | 7 | 2015-11-03 10:12:00 | 1 |
select
*
from (
SELECT
person_id, ResultDescription, A1cResultValue, A1cDate
, ROW_NUMBER() OVER (PARTITION BY person_id ORDER BY A1cDate DESC) AS rowno
FROM (
SELECT
o2.person_id
, o2.ResultDescription
, o2.A1cResultValue
, o2.A1cDate
FROM order_ o2
INNER JOIN person as CTEpe ON o2.person_id = CTEpe.person_id
WHERE o2.A1cResultValue IS NOT NULL
AND (o2.A1cDate >= '20150101'
AND o2.A1cDate < '20160101')
AND o2.ResultDescription IN ('GLYCATED HEMOGLOBIN TEST', 'Hemoglobin A1c')
UNION
SELECT
res.person_id
, res.ResultDescription
, res.A1cResultValue
, convert(varchar(19), res.A1cDate ,121) AS A1cDate
FROM lab_results_obx res
INNER JOIN person as CTEpe ON res.person_id = CTEpe.person_id
WHERE res.A1cResultValue IS NOT NULL
AND (res.A1cDate >= '20150101'
AND res.A1cDate < '20160101')
AND (res.ResultDescription LIKE '%HEMOGLOBIN A1C%'
OR res.ResultDescription LIKE '%HBG. A1C%'
OR res.ResultDescription LIKE '%HBG.AIC%'
OR res.ResultDescription LIKE '%HEMOGLOBIN A1c%')
) AS DT
) as dt2
where rowno = 1
| person_id | ResultDescription | A1cResultValue | A1cDate | rowno |
|-----------|--------------------------|----------------|---------------------|-------|
| p1 | HBG.AIC | 6 | 2015-08-07 07:17:00 | 1 |
| p2 | GLYCATED HEMOGLOBIN TEST | 8 | 2016-01-11 10:41:37 | 1 |
| p3 | GLYCATED HEMOGLOBIN TEST | 6 | 2016-01-07 18:07:26 | 1 |
| p5 | HEMOGLOBIN A1c | 7 | 2015-11-03 10:12:00 | 1 |
See: http://sqlfiddle.com/#!6/1ddd1/18
ASKER
Thank you very much Paul. Didn't expect all the extra effort on this, greatly appreciated. :)
>There are duplicate patients that need to be removed when joining these.
You can UNION both of these sets together which will eliminate duplicates.
UNION ALL would join the queries, but not eliminate duplicates.
BUT
UNION requires that all columns in both sets be identical, and looking at the SELECT * from (select {column list} here they seem very different.
SO
Give us a data mockup example of what these sets look like, and your definition of 'eliminate duplicates'.