asked on

Two SQL queries that need to be incorporated into one (possible Union)

One of these queries pulls internal labs and the other external labs. There are duplicate patients that need to be removed when joining these. Also wondering if instead of using multiple like statements I can use IN when it contains wildcards?

Internal Labs:

SELECT  * from (select o2.acttext, o2.obsvalue, p.person_nbr, p.last_name, p.first_name, pe.enc_id, pe.person_id,
ROW_NUMBER() OVER (PARTITION BY p.person_nbr ORDER BY o2.completeddate DESC)rowno, o2.completeddate
FROM order_ o2	
	  inner join person p with(nolock) on o2.person_id = p.person_id 
	  INNER JOIN patient_encounter AS pe with(nolock)  ON p.person_id = pe.person_id
	  inner join patient_diagnosis pd with(nolock) on pe.enc_id = pd.enc_id
	  INNER JOIN appointments AS a	 with(nolock)  ON pe.enc_id	= a.enc_id
	  WHERE o2.obsvalue IS NOT NULL 
     and (o2.completeddate >= '20150101'
                  AND o2.completeddate < '20160101')
	 and o2.acttext in ('GLYCATED HEMOGLOBIN TEST', 'Hemoglobin A1c')
	  and (pd.diagnosis_code_id  like '250%' or pd.diagnosis_code_id like '648.0%' 
	or pd.diagnosis_code_id  like '357.2' or pd.diagnosis_code_id like '362.0[0-7]' or pd.diagnosis_code_id  like '366.41' or pd.diagnosis_code_id like 'E1[0-3].%' 
	or pd.diagnosis_code_id  like 'O24.[0-4]0%' or pd.diagnosis_code_id like 'O24.[4-9][2-9]') 
	 and pe.person_id in (select pe.person_id from patient_encounter pe 
	INNER JOIN appointments AS a	 with(nolock)  ON pe.enc_id	= a.enc_id
	where  (pe.enc_timestamp >= '20151001'
                  AND pe.enc_timestamp < '20160101'
				  and 	a.appt_kept_ind = 'Y' and a.resched_ind = 'N'))) inofficea1c
	where rowno = 1

Open in new window

External Labs:

	SELECT  * from (select p.last_name as LastName, p.first_name as FirstName, p.sex as Sex, p.language as Language, p.Ethnicity as Ethnicity, 
p.Race as Race, p.date_of_birth as DOB, res.obs_date_time as A1cDate, res.result_desc as ResultDescription, res.observ_value as A1cResultValue, res.obs_id as OBXObsID,
ROW_NUMBER() OVER (PARTITION BY p.person_nbr ORDER BY  res.obs_date_time DESC)rowno
FROM lab_results_obx res	  
	  inner join person p with(nolock) on res.person_id = p.person_id 
	  INNER JOIN patient_encounter AS pe with(nolock)  ON p.person_id = pe.person_id
	  inner join patient_diagnosis pd with(nolock) on pe.enc_id = pd.enc_id
	  INNER JOIN appointments AS a	 with(nolock)  ON pe.enc_id	= a.enc_id
	  WHERE res.observ_value IS NOT NULL 
     and (res.obs_date_time >= '20150101'
                  AND res.obs_date_time < '20160101')
	 and (res.obs_id like '%HEMOGLOBIN A1C%'  or res.obs_id like '%HBG. A1C%' or res.obs_id like '%HBG.AIC%' or res.obs_id like '%HEMOGLOBIN A1c%')
	  and (pd.diagnosis_code_id  like '250%' or pd.diagnosis_code_id like '648.0%' 
	or pd.diagnosis_code_id  like '357.2' or pd.diagnosis_code_id like '362.0[0-7]' or pd.diagnosis_code_id  like '366.41' or pd.diagnosis_code_id like 'E1[0-3].%' 
	or pd.diagnosis_code_id  like 'O24.[0-4]0%' or pd.diagnosis_code_id like 'O24.[4-9][2-9]') 
	 and pe.person_id in (select pe.person_id from patient_encounter pe 
	INNER JOIN appointments AS a	 with(nolock)  ON pe.enc_id	= a.enc_id
	where  (pe.enc_timestamp >= '20151001'
                  AND pe.enc_timestamp < '20160101'
				  and 	a.appt_kept_ind = 'Y' and a.resched_ind = 'N'))) outsidelabA1c
	where rowno = 1

Open in new window

Jim Horn

Hard to troubleshoot this looking at a wompload of T-SQL with no data mockup, but...

>There are duplicate patients that need to be removed when joining these.
You can UNION both of these sets together which will eliminate duplicates.
UNION ALL would join the queries, but not eliminate duplicates.

BUT

UNION requires that all columns in both sets be identical, and looking at the SELECT * from (select {column list} here they seem very different.

SO

Give us a data mockup example of what these sets look like, and your definition of 'eliminate duplicates'.

Jim Horn

>Also wondering if instead of using multiple like statements I can use IN when it contains wildcards?
Afaik no, IN does not support wildcards, so the way you have it is correct.

biotec

ASKER

What is the best way to paste in from an excel spreadsheet without it just garbling together here? Also, what if I used AS in the select statement so that any column that was different matched up in terms of the AS part of it.
So if I pull an A1c lab value and date on an internal lab from table Order_ and the same values and dates from an external lab called lab_results_obx but call both AS result_value and AS result_date could I then use the Union?

biotec

ASKER

Here is an Excel file with examples. I changed the select statements to match more closely and changed the AS parts where the queries were pulling from different tables.
Also, the only reason I was pulling enc_id and person_id in one of the original queries was for troubleshooting as those are unique_id's. I don't need them in the output for the report but I need them to check for dups and inaccuracies etc.
A1c_lab_query_examples.xlsx

Russell Fox

Just a quick note, the column names do not need to match, just the column data types, and the matching columns have to be selected in the same order. SQL will use the field names in the first query for the result set.

Vitor Montalvão

First, you need to have same number of columns returned from each SELECT so get rid of the '*' and replace the columns names there. They should match logically, i.e. same data types at least and if you have columns returned from one SELECT that isn't returned from the other one, just replace it with NULL.

From your example you have for query 1 these columns:
SELECT last_name, first_name, acttext, person_nbr, enc_id, person_id, o2.completeddate, obsvalue, rowno

and from query 2 these ones:
SELECT LastName, FirstName, Sex, Language, Ethnicity, Race, DOB, A1cDate, ResultDescription, A1cResultValue, OBXObsID, rowno

Can you match them for us? I could at least match some but still to need for the rest of them.

PortletPaul

According to that spreadsheet if you drop one column [ResultDescription2] from the external labs query, and you re-arrange the column order, you should be able to use UNION

Internal Labs
LastName,FirstName,Sex,Language,Ethnicity,Race,DOB,ResultDescription,A1cResult,A1ccompleteddate,rowno

External Labs
LastName,FirstName,Sex,Language,Ethnicity,Race,DOB,ResultDescription,A1cResultValue,A1cCompletedDate,rowno

biotec

ASKER

Thanks for the help. It runs properly now. What I need help with is that this obviously creates some duplicate rows in terms of the person and I need to pull only the most recent based on one of the timestamps. I'm not sure how I wrap the whole thing in one ROW_NUMBER or something like that.

The query takes over a minute to run which isn't good but I might be able to figure something out later.
My other problem is that I had to change to using a datetime for one of the queries as it was previously a varchar 20150101 type of date. I think the timestamp date I used is ok, but could occassionally be different than the actual ordered date I wanted to use.

SELECT  * from (select p.last_name as LastName, p.first_name as FirstName, p.sex as Sex, p.language as Language, p.Ethnicity as Ethnicity, 
p.Race as Race, p.date_of_birth as DOB,  res.result_desc as ResultDescription, res.observ_value as A1cResultValue, res.obs_date_time as A1cDate,
pe.enc_id, pe.person_id,
ROW_NUMBER() OVER (PARTITION BY p.person_nbr ORDER BY  res.obs_date_time DESC)rowno
FROM lab_results_obx res	  
	  inner join person p with(nolock) on res.person_id = p.person_id 
	  INNER JOIN patient_encounter AS pe with(nolock)  ON p.person_id = pe.person_id
	  inner join patient_diagnosis pd with(nolock) on pe.enc_id = pd.enc_id
	  INNER JOIN appointments AS a	 with(nolock)  ON pe.enc_id	= a.enc_id
	  WHERE res.observ_value IS NOT NULL 
     and (res.obs_date_time >= '20150401'
                  AND res.obs_date_time < '20160331')
	 and (res.obs_id like '%HEMOGLOBIN A1C%'  or res.obs_id like '%HBG. A1C%' or res.obs_id like '%HBG.AIC%' or res.obs_id like '%HEMOGLOBIN A1c%')
	  and (pd.diagnosis_code_id  like '250%' or pd.diagnosis_code_id like '648.0%' 
	or pd.diagnosis_code_id  like '357.2' or pd.diagnosis_code_id like '362.0[0-7]' or pd.diagnosis_code_id  like '366.41' or pd.diagnosis_code_id like 'E1[0-3].%' 
	or pd.diagnosis_code_id  like 'O24.[0-4]0%' or pd.diagnosis_code_id like 'O24.[4-9][2-9]') 
	 and pe.person_id in (select pe.person_id from patient_encounter pe 
	INNER JOIN appointments AS a	 with(nolock)  ON pe.enc_id	= a.enc_id
	where  (pe.enc_timestamp >= '20160101'
                  AND pe.enc_timestamp < '20160331'
				  and 	a.appt_kept_ind = 'Y' and a.resched_ind = 'N'))) outsidelabA1c
	where rowno = 1
	
UNION 

	SELECT  * from (select p.last_name as LastName, p.first_name as FirstName, p.sex as Sex, p.language as Language, p.Ethnicity as Ethnicity,
	 p.Race as Race, p.date_of_birth as DOB,   o2.acttext as ResultDescription,  o2.obsvalue as A1cResultValue, o2.create_timestamp as A1cDate, pe.enc_id, pe.person_id,
ROW_NUMBER() OVER (PARTITION BY p.person_nbr ORDER BY o2.completeddate DESC)rowno
FROM order_ o2	
	  inner join person p with(nolock) on o2.person_id = p.person_id 
	  INNER JOIN patient_encounter AS pe with(nolock)  ON p.person_id = pe.person_id
	  inner join patient_diagnosis pd with(nolock) on pe.enc_id = pd.enc_id
	  INNER JOIN appointments AS a	 with(nolock)  ON pe.enc_id	= a.enc_id
	  WHERE o2.obsvalue IS NOT NULL 
     and (o2.completeddate >= '20150401'
                  AND o2.completeddate < '20160331')
	 and o2.acttext in ('GLYCATED HEMOGLOBIN TEST', 'Hemoglobin A1c')
	  and (pd.diagnosis_code_id  like '250%' or pd.diagnosis_code_id like '648.0%' 
	or pd.diagnosis_code_id  like '357.2' or pd.diagnosis_code_id like '362.0[0-7]' or pd.diagnosis_code_id  like '366.41' or pd.diagnosis_code_id like 'E1[0-3].%' 
	or pd.diagnosis_code_id  like 'O24.[0-4]0%' or pd.diagnosis_code_id like 'O24.[4-9][2-9]') 
	 and pe.person_id in (select pe.person_id from patient_encounter pe 
	INNER JOIN appointments AS a	 with(nolock)  ON pe.enc_id	= a.enc_id
	where  (pe.enc_timestamp >= '20160101'
                  AND pe.enc_timestamp < '20160331'
				  and 	a.appt_kept_ind = 'Y' and a.resched_ind = 'N'))) inofficea1c
	where rowno = 1

Open in new window

PortletPaul

I suggest using a CTE to arrive at the list of people, so you do that just once.

Then, link that CTE to the each of the 2 sources of data.
Keep the rows of these as "narrow" (few columns) as possible.
Use row_number() on each result to only get the latest dated rows.
Use UNION on the latest dated narrow rows to get uniqueness.
Then link to the person table for the extra columns

;
WITH CTEpe
AS (
      SELECT DISTINCT
            pe.person_id
      FROM patient_encounter pe
            INNER JOIN appointments AS a WITH (NOLOCK) ON pe.enc_id = a.enc_id
            INNER JOIN patient_diagnosis pd WITH (NOLOCK) ON pe.enc_id = pd.enc_id
      WHERE     pe.enc_timestamp >= '20151001'
            AND pe.enc_timestamp < '20160101'
            AND a.appt_kept_ind = 'Y'
            AND a.resched_ind = 'N'
            AND (
                 pd.diagnosis_code_id LIKE '250%'
              OR pd.diagnosis_code_id LIKE '648.0%'
              OR pd.diagnosis_code_id LIKE '357.2'
              OR pd.diagnosis_code_id LIKE '362.0[0-7]'
              OR pd.diagnosis_code_id LIKE '366.41'
              OR pd.diagnosis_code_id LIKE 'E1[0-3].%'
              OR pd.diagnosis_code_id LIKE 'O24.[0-4]0%'
              OR pd.diagnosis_code_id LIKE 'O24.[4-9][2-9]'
                )
    )
SELECT
      p.last_name AS LastName
    , p.first_name AS FirstName
    , p.sex AS Sex
    , p.language AS Language
    , p.Ethnicity AS Ethnicity
    , p.Race AS Race
    , p.date_of_birth AS DOB
    , d.ResultDescription
    , d.A1cResult
    , d.A1ccompleteddate
FROM person p
INNER JOIN (
      SELECT
            person_id, ResultDescription, A1cResult, A1ccompleteddate
      FROM (
            SELECT
                  o2.person_id
                , o2.acttext AS ResultDescription
                , o2.obsvalue AS A1cResult
                , o2.completeddate AS A1ccompleteddate
                , ROW_NUMBER() OVER (PARTITION BY o2.person_id ORDER BY o2.completeddate DESC) AS rowno
            FROM order_ o2
                  INNER JOIN CTEpe ON o2.person_id = CTEpe.person_id
            WHERE o2.obsvalue IS NOT NULL
                  AND (o2.completeddate >= '20150101'
                  AND o2.completeddate < '20160101')
                  AND o2.acttext IN ('GLYCATED HEMOGLOBIN TEST', 'Hemoglobin A1c')
            ) inofficea1c
      WHERE rowno = 1
      
      UNION
      
      SELECT
            person_id, ResultDescription, A1cResult, A1ccompleteddate
      FROM (
            SELECT
                  res.person_id
                , res.result_desc AS ResultDescription
                , res.observ_value AS A1cResult
                , res.obs_date_time AS A1ccompleteddate
                , ROW_NUMBER() OVER (PARTITION BY p.person_nbr ORDER BY res.obs_date_time DESC) rowno
            FROM lab_results_obx res
                  INNER JOIN CTEpe ON res.person_id = CTEpe.person_id
            WHERE res.observ_value IS NOT NULL
                  AND (res.obs_date_time >= '20150101'
                  AND res.obs_date_time < '20160101')
                  AND (res.obs_id LIKE '%HEMOGLOBIN A1C%'
                  OR res.obs_id LIKE '%HBG. A1C%'
                  OR res.obs_id LIKE '%HBG.AIC%'
                  OR res.obs_id LIKE '%HEMOGLOBIN A1c%')
            ) outsidelabA1c
      WHERE rowno = 1
    ) AS d  ON p.person_id = d.person_id
;

Open in new window

none of this is tested of course as there was no sample data provided.

If sample data had been provided it should be "per table" and scrubbed of any private information.

PortletPaul

We cross posted, I had not seen your 2016-01-29 at 15:56:56 ID: 41440674 comment

perhaps this:

;
WITH CTEpe
AS (
      SELECT DISTINCT
            pe.person_id
      FROM patient_encounter pe
            INNER JOIN appointments AS a WITH (NOLOCK) ON pe.enc_id = a.enc_id
            INNER JOIN patient_diagnosis pd WITH (NOLOCK) ON pe.enc_id = pd.enc_id
      WHERE     pe.enc_timestamp >= '20151001'
            AND pe.enc_timestamp < '20160101'
            AND a.appt_kept_ind = 'Y'
            AND a.resched_ind = 'N'
            AND (
                 pd.diagnosis_code_id LIKE '250%'
              OR pd.diagnosis_code_id LIKE '648.0%'
              OR pd.diagnosis_code_id LIKE '357.2'
              OR pd.diagnosis_code_id LIKE '362.0[0-7]'
              OR pd.diagnosis_code_id LIKE '366.41'
              OR pd.diagnosis_code_id LIKE 'E1[0-3].%'
              OR pd.diagnosis_code_id LIKE 'O24.[0-4]0%'
              OR pd.diagnosis_code_id LIKE 'O24.[4-9][2-9]'
                )
    )
SELECT
      p.last_name AS LastName
    , p.first_name AS FirstName
    , p.sex AS Sex
    , p.language AS Language
    , p.Ethnicity AS Ethnicity
    , p.Race AS Race
    , p.date_of_birth AS DOB
    , d.ResultDescription
    , d.A1cResult
    , d.A1ccompleteddate
FROM person p
INNER JOIN (
    SELECT
            person_id, ResultDescription, A1cResult, A1ccompleteddate
          , ROW_NUMBER() OVER (PARTITION BY person_id ORDER BY A1ccompleteddate DESC) AS rowno
    FROM (
            SELECT
                  o2.person_id
                , o2.acttext AS ResultDescription
                , o2.obsvalue AS A1cResult
                , o2.completeddate AS A1ccompleteddate
            FROM order_ o2
                  INNER JOIN CTEpe ON o2.person_id = CTEpe.person_id
            WHERE o2.obsvalue IS NOT NULL
                  AND (o2.completeddate >= '20150101'
                  AND o2.completeddate < '20160101')
                  AND o2.acttext IN ('GLYCATED HEMOGLOBIN TEST', 'Hemoglobin A1c')

            UNION

                SELECT
                      res.person_id
                    , res.result_desc AS ResultDescription
                    , res.observ_value AS A1cResult
                    , res.obs_date_time AS A1ccompleteddate
                FROM lab_results_obx res
                      INNER JOIN CTEpe ON res.person_id = CTEpe.person_id
                WHERE res.observ_value IS NOT NULL
                      AND (res.obs_date_time >= '20150101'
                      AND res.obs_date_time < '20160101')
                      AND (res.obs_id LIKE '%HEMOGLOBIN A1C%'
                      OR res.obs_id LIKE '%HBG. A1C%'
                      OR res.obs_id LIKE '%HBG.AIC%'
                      OR res.obs_id LIKE '%HEMOGLOBIN A1c%')
          ) AS DT
    ) AS d  ON p.person_id = d.person_id 
           AND d.rowno = 1

Open in new window

biotec

ASKER

Thanks. I'll attach an output from each query. When I remove dups from mine run as the original Union I get 165 when I run yours it returns dups as well but I only get 114 when removing the dups so it appears to be eliminating some of the tests. Certainly appreciate the help. I'm trying to get used to the CTE but not crystal clear yet.
Examples keep the data similar to what you'd expect but all PHI has been removed or changed. Thanks again.
A1c_lab_query_examples.xlsx

biotec

ASKER

Cross posted again. Ok, I tried the new one and it returns 112 rows. I'll add person_id into it and then do some comparisons to what is unique between yours and my UNION and see if I can figure out where they are falling through the cracks (code).
Getting close though as your new one no longer returns dups. Thanks

PortletPaul

without data I can't do much except make suggestions.

what you describe does not make much sense because the filter is there

SELECT DISTINCT
pe.person_id
FROM patient_encounter pe
INNER JOIN appointments AS a ON pe.enc_id = a.enc_id
INNER JOIN patient_diagnosis pd ON pe.enc_id = pd.enc_id

WHERE pe.enc_timestamp >= '20151001' -- 2015-10-01
AND pe.enc_timestamp < '20160101' -- 2016-01-01

-- you have previously indicated hat these are also needed:
AND a.appt_kept_ind = 'Y'
AND a.resched_ind = 'N'

-- and these were previously in both too
AND (
pd.diagnosis_code_id LIKE '250%'
OR pd.diagnosis_code_id LIKE '648.0%'
OR pd.diagnosis_code_id LIKE '357.2'
OR pd.diagnosis_code_id LIKE '362.0[0-7]'
OR pd.diagnosis_code_id LIKE '366.41'
OR pd.diagnosis_code_id LIKE 'E1[0-3].%'
OR pd.diagnosis_code_id LIKE 'O24.[0-4]0%'
OR pd.diagnosis_code_id LIKE 'O24.[4-9][2-9]'
)

so the result would be a distinct list of person_id's meeting all of those conditions

Then we take that list (CTEpe) to restrict who we are interested in and look for the related remaining information

SELECT
o2.person_id
, o2.acttext AS ResultDescription
, o2.obsvalue AS A1cResult
, o2.completeddate AS A1ccompleteddate
FROM order_ o2
INNER JOIN CTEpe ON o2.person_id = CTEpe.person_id
WHERE o2.obsvalue IS NOT NULL
AND (o2.completeddate >= '20150101'
AND o2.completeddate < '20160101')
AND o2.acttext IN ('GLYCATED HEMOGLOBIN TEST', 'Hemoglobin A1c')

and

SELECT
res.person_id
, res.result_desc AS ResultDescription
, res.observ_value AS A1cResult
, res.obs_date_time AS A1ccompleteddate
FROM lab_results_obx res
INNER JOIN CTEpe ON res.person_id = CTEpe.person_id
WHERE res.observ_value IS NOT NULL
AND (res.obs_date_time >= '20150101'
AND res.obs_date_time < '20160101')
AND (res.obs_id LIKE '%HEMOGLOBIN A1C%'
OR res.obs_id LIKE '%HBG. A1C%'
OR res.obs_id LIKE '%HBG.AIC%'
OR res.obs_id LIKE '%HEMOGLOBIN A1c%')

Then by using ROW_NUMBER over that remaining information we take just the latest contact per person

hence, from where I sit (and with my imperfect knowledge of your needs and the data) I cannot offer much more than I have.

biotec

ASKER

I'm definitely finding patients that had an enc in the range and had the lab done in the range from yours but can't figure out what might be causing it to lose some. It did get rid of the dups but got rid of too many in the process. Weird.

biotec

ASKER

The only real way I know to fix this is to change the code I have to eliminate dups without changing it to use CTE. I realize that is the most efficient way but I don't know enough about it and how the joins all fit together to figure out why it's excluding patients. This original one if somehow could be set to remove the dups between the ones from pre UNION and post UNION then I'd have a good way to test.

SELECT  * from (select p.last_name as LastName, p.first_name as FirstName, p.sex as Sex, p.language as Language, p.Ethnicity as Ethnicity, 
p.Race as Race, p.date_of_birth as DOB,  res.result_desc as ResultDescription, res.observ_value as A1cResultValue, res.obs_date_time as A1cDate,
pe.enc_id, pe.person_id,
ROW_NUMBER() OVER (PARTITION BY p.person_id ORDER BY  res.obs_date_time DESC)rowno
FROM lab_results_obx res	  
	  inner join person p with(nolock) on res.person_id = p.person_id 
	  INNER JOIN patient_encounter AS pe with(nolock)  ON p.person_id = pe.person_id
	  inner join patient_diagnosis pd with(nolock) on pe.enc_id = pd.enc_id
	  INNER JOIN appointments AS a	 with(nolock)  ON pe.enc_id	= a.enc_id
	  WHERE res.observ_value IS NOT NULL 
     and (res.obs_date_time >= '20150401'
                  AND res.obs_date_time < '20160331')
	 and (res.obs_id like '%HEMOGLOBIN A1C%'  or res.obs_id like '%HBG. A1C%' or res.obs_id like '%HBG.AIC%' or res.obs_id like '%HEMOGLOBIN A1c%')
	  and (pd.diagnosis_code_id  like '250%' or pd.diagnosis_code_id like '648.0%' 
	or pd.diagnosis_code_id  like '357.2' or pd.diagnosis_code_id like '362.0[0-7]' or pd.diagnosis_code_id  like '366.41' or pd.diagnosis_code_id like 'E1[0-3].%' 
	or pd.diagnosis_code_id  like 'O24.[0-4]0%' or pd.diagnosis_code_id like 'O24.[4-9][2-9]') 
	 and pe.person_id in (select pe.person_id from patient_encounter pe 
	INNER JOIN appointments AS a	 with(nolock)  ON pe.enc_id	= a.enc_id
	where  (pe.enc_timestamp >= '20160101'
                  AND pe.enc_timestamp < '20160401'
				  and 	a.appt_kept_ind = 'Y' and a.resched_ind = 'N'))) outsidelabA1c
	where rowno = 1
	
UNION 

	SELECT  * from (select p.last_name as LastName, p.first_name as FirstName, p.sex as Sex, p.language as Language, p.Ethnicity as Ethnicity,
	 p.Race as Race, p.date_of_birth as DOB,   o2.acttext as ResultDescription,  o2.obsvalue as A1cResultValue, o2.create_timestamp as A1cDate, pe.enc_id, pe.person_id,
ROW_NUMBER() OVER (PARTITION BY p.person_nbr ORDER BY o2.completeddate DESC)rowno
FROM order_ o2	
	  inner join person p with(nolock) on o2.person_id = p.person_id 
	  INNER JOIN patient_encounter AS pe with(nolock)  ON p.person_id = pe.person_id
	  inner join patient_diagnosis pd with(nolock) on pe.enc_id = pd.enc_id
	  INNER JOIN appointments AS a	 with(nolock)  ON pe.enc_id	= a.enc_id
	  WHERE o2.obsvalue IS NOT NULL 
     and (o2.completeddate >= '20150401'
                  AND o2.completeddate < '20160331')
	 and o2.acttext in ('GLYCATED HEMOGLOBIN TEST', 'Hemoglobin A1c')
	  and (pd.diagnosis_code_id  like '250%' or pd.diagnosis_code_id like '648.0%' 
	or pd.diagnosis_code_id  like '357.2' or pd.diagnosis_code_id like '362.0[0-7]' or pd.diagnosis_code_id  like '366.41' or pd.diagnosis_code_id like 'E1[0-3].%' 
	or pd.diagnosis_code_id  like 'O24.[0-4]0%' or pd.diagnosis_code_id like 'O24.[4-9][2-9]') 
	 and pe.person_id in (select pe.person_id from patient_encounter pe 
	INNER JOIN appointments AS a	 with(nolock)  ON pe.enc_id	= a.enc_id
	where  (pe.enc_timestamp >= '20160101'
                  AND pe.enc_timestamp < '20160401'
				  and 	a.appt_kept_ind = 'Y' and a.resched_ind = 'N'))) inofficea1c
	where rowno = 1

Open in new window

PortletPaul

Without data I am flying low in a blizzard; I see white and nothing but white.
==============================================================

select person_id, adate, 'some text'
union
select person_id, diffdate, 'different text'

in that example NO ROWS ARE REMOVED by the use of UNION

but for this:

select person_id, adate
union
select person_id, diffdate

there MIGHT be some rows removed

and for this:

select person_id
union
select person_id

it is highly likely that rows will be removed.

What I am trying to indicate is that the UNION in your query probably removes no rows or very few rows.

In my query more rows will be removed!!!!

=================================================================
Unless or until there is sample data to work with I have given you my best effort.

;
WITH CTEpe
AS (
      SELECT DISTINCT
            pe.person_id
      FROM patient_encounter pe
            INNER JOIN appointments AS a WITH (NOLOCK) ON pe.enc_id = a.enc_id
            INNER JOIN patient_diagnosis pd WITH (NOLOCK) ON pe.enc_id = pd.enc_id
      WHERE     pe.enc_timestamp >= '20151001'
            AND pe.enc_timestamp < '20160101'
            AND a.appt_kept_ind = 'Y'
            AND a.resched_ind = 'N'
            AND (
                 pd.diagnosis_code_id LIKE '250%'
              OR pd.diagnosis_code_id LIKE '648.0%'
              OR pd.diagnosis_code_id LIKE '357.2'
              OR pd.diagnosis_code_id LIKE '362.0[0-7]'
              OR pd.diagnosis_code_id LIKE '366.41'
              OR pd.diagnosis_code_id LIKE 'E1[0-3].%'
              OR pd.diagnosis_code_id LIKE 'O24.[0-4]0%'
              OR pd.diagnosis_code_id LIKE 'O24.[4-9][2-9]'
                )
    )
SELECT
      p.last_name AS LastName
    , p.first_name AS FirstName
    , p.sex AS Sex
    , p.language AS Language
    , p.Ethnicity AS Ethnicity
    , p.Race AS Race
    , p.date_of_birth AS DOB
    , d.ResultDescription
    , d.A1cResult
    , d.A1ccompleteddate
FROM person p
INNER JOIN (
    SELECT
            person_id, ResultDescription, A1cResult, A1ccompleteddate
          , ROW_NUMBER() OVER (PARTITION BY person_id ORDER BY A1ccompleteddate DESC) AS rowno
    FROM (
            SELECT
                  o2.person_id
                , o2.acttext AS ResultDescription
                , o2.obsvalue AS A1cResult
                , o2.completeddate AS A1ccompleteddate
            FROM order_ o2
                  INNER JOIN CTEpe ON o2.person_id = CTEpe.person_id
            WHERE o2.obsvalue IS NOT NULL
                  AND (o2.completeddate >= '20150101'
                  AND o2.completeddate < '20160101')
                  AND o2.acttext IN ('GLYCATED HEMOGLOBIN TEST', 'Hemoglobin A1c')

            UNION ALL

                SELECT
                      res.person_id
                    , res.result_desc AS ResultDescription
                    , res.observ_value AS A1cResult
                    , res.obs_date_time AS A1ccompleteddate
                FROM lab_results_obx res
                      INNER JOIN CTEpe ON res.person_id = CTEpe.person_id
                WHERE res.observ_value IS NOT NULL
                      AND (res.obs_date_time >= '20150101'
                      AND res.obs_date_time < '20160101')
                      AND (res.obs_id LIKE '%HEMOGLOBIN A1C%'
                      OR res.obs_id LIKE '%HBG. A1C%'
                      OR res.obs_id LIKE '%HBG.AIC%'
                      OR res.obs_id LIKE '%HEMOGLOBIN A1c%')
          ) AS DT
    ) AS d  ON p.person_id = d.person_id 
           AND d.rowno = 1
;

Open in new window

CTEs are not complex, think of it like a temp table.

You create a small list of people you know you are interested in, we all that "CTEpe"
then you re-use that list to filter the 2 different source of more information.

COMBINE both those sets of information using UNION ALL << difference here!

now take the most recent contact from that combined list

If this isn't the correct logic, then it isn't a correct result.

PortletPaul

By the way.

The logic of your current query includes the most recent contact, from 2 sources, ONLY IF those 2 contacts are identical in ALL COLUMNS will it be reduced to a single row.

You are comparing that query, to mine that will ONLY LIST the most recent contact

in other words, I'm would not be surprised if my query produced fewer rows.

biotec

ASKER

Yes but the problem is that I pulled all the dups out of mine then went in and did a compare against the results of yours and the ones I had that you did not were legitimate. So, even though I realize my UNION query will not remove dups (which is why I wanted a method of doing that without the CTE) it does seem to produce the correct results other than the fact that it is comparing against different columns in different tables and thus giving dups.

Also, I did attach an excel file with examples that would be pulled from each query separately (in office labs versus external labs) which is what I thought was asked for. I'm happy to return results from the UNION query or the CTE if that would be helpful.

I'm wondering if it's possible that the CTE by inner joining on the person_id is somehow eliminating all the people who had them done within the date range for both external and internal labs. Basically my queries before the UNION seem to be accurate. The only problem is that some people had 2 tests, one in-office and one externally and that creates dups that must be removed in Excel or something else which is obviously not what I would like.
Thanks

PortletPaul

>>"UNION query will not remove dups (which is why I wanted a method of doing that without the CTE) "

the CTE had nothing whatever to do it. Please stop focusing on the CTE.

If you did the following replaced "JOIN CTEpe" with "JOIN ( .... ) as CTEpe" and in that ... you included the full subquery that currently forms the CTE. Then removed all of the code that created the CTE you would end up with an equivalent query, without a CTE, that produced the same results as my query. The CTE is just saving you some time (better performance).

I do not know how to say this any more politely than I have - but without DATA I can do very little at this end. Query results are not data and I am not spending my voluntary time trying to reverse engineer them into table inserts.

"sample data" is per table so that the query one is trying to develop can be run.

Regarding the "dups that are valid"
OK
but again as I cannot see them, I have no idea why they are valid except that you tell me they are.
Can you translate "they are valid" into business rules that the query should meet?

As you can surely tell, I am trying to assist, but: no data = no much help

You might want to read this: http://sscce.org/

ASKER CERTIFIED SOLUTION

PortletPaul

membership

This solution is only available to members.

To access this solution, you must be a member of Experts Exchange.

Start Free Trial

biotec

ASKER

thank you Paul. I sincerely appreciate all the help I get here. This particular account was paid for originally by an org I worked for very shortly but they used my login for years after I left. It looks like I've been on here that long but in reality I'm fairly new to this and just learning how to properly ask questions. Trust me I'm paying for this membership on my own now and trying not to piss off all the great people who help me. :)

I do understand now about getting sample data. My problem is getting you data from these that is de-identified. That would be a bit of an undertaking. I can get data from tables using last name of 'Test' and started doing that but realized they would rarely have legitimate information in their records which would make it difficult to test with.

Bottom Line: I've spent considerable time testing all the records returned from various queries used here and I am wondering if it is possibly one simple issue that causes problems on all of them. The fact that the one most important column to Union on is the timestamp. In the first part of the query for external labs the ROW_NUMBER uses the res.obs_date_time which is a datetime and the in-office labs portion of the second half of the UNION uses o2.completeddate which is a varchar. All I'm trying to do is UNION the two but remove dups, however even if I were able to do some ROW_NUMBER on the whole big query (to only take the most recently done A1c lab from the UNION of the two thus comparing on one of the timestamps) it wouldn't be able to properly compare because maybe doing a convert on the varchar completed date would be necessary.
As far as Business Rules

Diabetic Patients (the diagnosis codes tell us this)

Had an encounter any time between 20160101 and 20160331

If they had an encounter in that period then they also must have an A1c lab test done any time in the past year 20150401 to 20160331

If the patients had more than one test only return the most recent regardless of it being in-office or external

PortletPaul

It may be preferrable to convert the timestamp to varchar - that way you avoid errors if an existing varchar "date" cannot be converted.

E.g. convert (varchar (8),datetimecolumn,112)

That should convert to a YYYYMMDD pattern but it would be necessary to match the stringdate column's format.

This is another instance of not seeing data, I don't know what format your varchar "dates" are. Hopefully yyyymmdd or yyyy-mm-dd so they can be sorted correctly.

biotec

ASKER

Yes they are and I just thought of something that I should have thought about a long time ago. Why not just take your CTE query which incorporated the UNION in a more efficient way (or even my giant one that takes forever to load) and turn it into a view then select from the view and eliminate the dups that way. I would of course have to do a convert on the varchar date that is yyyymmdd as you mention but then I'm doing the ROW_NUMBER statement outside of the queries and using it against the view.

PortletPaul

Let us establish the best most correct query, then translate that to a view.

some while back you said this:

Yes but the problem is that I pulled all the dups out of mine then went in and did a compare against the results of yours and the ones I had that you did not were legitimate.

HOW did you "pull the dups"?
WHAT was in the valid ones that made you aware they were valid?

a tiny example would be really appreciated.
before pulling dups
after pulling dups
which ones(s) is(are) valid

biotec

ASKER

I added person_id to both of our queries. Mine returned patients that had a visit in the date range and had an A1c in the date range.
I'll try to get some examples from each of the queries. My biggest issue is getting real data and making it de-identified data. No MRN's, phones, addresses, dob, names etc etc. and sometimes these tables are so large (50-60 columns) it's easy to miss something.

Can I return some examples of just the data returned from the queries rather than examples from each table referenced? thanks

PortletPaul

Yes examples are ok

biotec

ASKER

I hope this attachment will help.
A1cTestExamples.xlsx

PortletPaul

I'm really sorry but while I did study that spreadsheet I just didn't follow it.

Not sure where yo want to go with this now. By the way my time for EE is quite restricted now also as I have 2 projects working concurrently.

biotec

ASKER

I gave real examples of just about every table used in the queries which is what I thought you wanted. I'm not sure what else to send. I appreciate your help but if you don't have time, I understand. I'll repost the question and add all the current examples we have and see if anyone else might be able to provide help. thanks

PortletPaul

Here is some sample data (tab delimited):

Person
person_id LastName
p1	test1
p2	test2
p3	test3
p4	test4
p5	test5

patient_encounter							
person_id	enc_id	enc_timestamp					
p1	e1	2016-01-02 11:50:00					
p2	e2	2016-01-04 11:57:03					
p1	e3	2016-01-05 10:36:46					
p1	e4	2016-01-05 11:30:00					
p3	e5	2016-01-07 15:15:00					
p4	e6	2016-01-08 16:35:00					
p2	e7	2016-01-11 10:00:00					
p1	e8	2016-01-12 08:30:00					
p4	e9	2016-01-12 12:00:00					
p1	ea	2016-01-12 12:22:34					
p2	eb	2016-01-18 10:17:34					
p5	ec	2016-01-19 12:42:06					
p5	ed	2016-01-26 14:15:00					
p2	ee	2016-02-01 12:24:44					

Order_						
person_id	A1cDate	A1cResultValue	ResultDescription
p1	2015-07-02 16:59:25	6.5	GLYCATED HEMOGLOBIN TEST
p1	2015-10-01 13:39:24	6.3	Hemoglobin A1C
p2	2015-11-23 09:38:23	10.3	GLYCATED HEMOGLOBIN TEST
p2	2016-01-11 10:41:37	8.3	GLYCATED HEMOGLOBIN TEST
p3	2015-05-20 10:08:55	5.7	GLYCATED HEMOGLOBIN TEST
p3	2015-11-03 09:06:45	7.2	GLYCATED HEMOGLOBIN TEST
p3	2016-01-07 18:07:26	6.5	GLYCATED HEMOGLOBIN TEST
p4	2015-12-15 16:32:59	14	GLYCATED HEMOGLOBIN TEST
p5	2015-04-14 15:31:30	7	Hemoglobin A1C
p5	2015-07-28 15:28:16	7	GLYCATED HEMOGLOBIN TEST

lab_results_obx							
person_id	A1cDate	ResultDescription	A1cResultValue
p1	2015-08-07 07:17:00	HBG.AIC	6.1
p3	2015-05-20 12:45:00	HBG.AIC	5.3
p5	2015-11-03 10:12:00	HEMOGLOBIN A1c	7.1

					
patient_diagnosis							
person_id	enc_id	create_timestamp	diagnosis_code_id
p1	c1	2014-02-04 10:24:38	250
p1	c2	2013-10-22 18:55:00	250
p1	c3	2014-02-06 11:24:43	250
p1	c4	2012-11-27 17:17:24	250
p1	c5	2013-02-21 15:56:17	250
p1	c6	2014-12-16 14:42:00	250
p1	c7	2015-01-22 11:12:00	250
p1	c8	2014-09-08 09:07:37	250
p1	c9	2015-07-02 16:59:25	250
p1	ca	2015-08-03 10:28:20	250
p1	cb	2014-08-12 14:53:53	250
p1	cc	2015-10-01 14:15:06	E11.40
p1	cc	2015-10-01 14:16:24	E13.42
p2	cd	2012-12-19 18:19:12	250
p2	ce	2014-02-11 16:01:20	250
p2	cf	2013-07-17 18:15:21	250
p2	cg	2013-03-09 09:12:48	250
p2	ch	2014-05-10 11:41:59	250.02
p2	ci	2014-11-04 08:45:38	250.02
p2	cj	2014-05-03 10:19:15	250.02
p2	ce	2014-02-11 16:33:42	250.6
p2	ck	2015-11-23 09:22:53	E11.42
p2	e7	2016-01-11 10:12:47	E11.65
p2	cl	2015-12-18 12:04:59	E11.65
p2	cm	2015-12-14 08:50:47	E11.65
p2	ck	2015-11-23 17:18:08	E11.65
p2	cn	2015-12-07 08:54:12	E11.65
p2	e7	2016-01-11 10:14:57	E11.9
p2	ck	2015-11-23 17:17:54	E11.9
p3	co	2013-08-07 09:17:33	250
p3	cp	2013-08-28 10:13:01	250
p3	e5	2016-01-07 16:17:06	E11.9
p3	cq	2015-11-03 09:06:34	E11.9
p4	e9	2016-01-12 12:43:11	E11.65
p4	cr	2015-12-15 16:27:09	E11.65
p5	cs	2013-04-30 14:16:29	250
p5	ct	2015-04-14 15:31:25	250
p5	cu	2013-08-23 12:05:33	250
p5	cv	2015-07-28 15:21:17	250
p5	cw	2013-01-11 17:05:15	250
p5	cx	2013-11-08 10:13:10	250
p5	cy	2015-02-12 11:55:45	250.62
p5	cz	2015-11-24 13:44:38	E11.9

Open in new window

Notes:
Order_.A1cDate is varchar in format 2015-12-07 11:54:00.000
Lab_Results_Obx.A1cDate is a datetime column

person_id is really stored in this form 'D8FF794D-3E19-4830-82C8-09E2758CF1B9'
enc_id is really stored in this form '9535E988-C284-465A-83C1-FAED1C297E0C'

the string values for person_id and enc_id have been replaced with shorter strings for ease of reading. e.g. each unique person_id was replaced by p1 or p2 or p3 etc

for enc_id only some of these are common across more than one table, they commence with 'e'
other that are not common across tables commence with 'c'

===============
From that simplfied text I can easily create some tables, like this:

CREATE TABLE Person
    ([person_id] varchar(2), [LastName] varchar(5))
;
    
INSERT INTO Person
    ([person_id], [LastName])
VALUES
    ('p1', 'test1'),
    ('p2', 'test2'),
    ('p3', 'test3'),
    ('p4', 'test4'),
    ('p5', 'test5')
;


CREATE TABLE patient_encounter	
    ([person_id] varchar(2), [enc_id] varchar(2), [enc_timestamp] datetime)
;
    
INSERT INTO patient_encounter	
    ([person_id], [enc_id], [enc_timestamp])
VALUES
    ('p1', 'e1', '2016-01-02 11:50:00'),
    ('p2', 'e2', '2016-01-04 11:57:03'),
    ('p1', 'e3', '2016-01-05 10:36:46'),
    ('p1', 'e4', '2016-01-05 11:30:00'),
    ('p3', 'e5', '2016-01-07 15:15:00'),
    ('p4', 'e6', '2016-01-08 16:35:00'),
    ('p2', 'e7', '2016-01-11 10:00:00'),
    ('p1', 'e8', '2016-01-12 08:30:00'),
    ('p4', 'e9', '2016-01-12 12:00:00'),
    ('p1', 'ea', '2016-01-12 12:22:34'),
    ('p2', 'eb', '2016-01-18 10:17:34'),
    ('p5', 'ec', '2016-01-19 12:42:06'),
    ('p5', 'ed', '2016-01-26 14:15:00'),
    ('p2', 'ee', '2016-02-01 12:24:44')
;


CREATE TABLE Order_	
    ([person_id] varchar(2), [A1cDate] varchar(24)
     , [A1cResultValue] int, [ResultDescription] varchar(24))
;
    
INSERT INTO Order_	
    ([person_id], [A1cDate], [A1cResultValue], [ResultDescription])
VALUES
    ('p1', '2015-07-02 16:59:25', 6.5, 'GLYCATED HEMOGLOBIN TEST'),
    ('p1', '2015-10-01 13:39:24', 6.3, 'Hemoglobin A1C'),
    ('p2', '2015-11-23 09:38:23', 10.3, 'GLYCATED HEMOGLOBIN TEST'),
    ('p2', '2016-01-11 10:41:37', 8.3, 'GLYCATED HEMOGLOBIN TEST'),
    ('p3', '2015-05-20 10:08:55', 5.7, 'GLYCATED HEMOGLOBIN TEST'),
    ('p3', '2015-11-03 09:06:45', 7.2, 'GLYCATED HEMOGLOBIN TEST'),
    ('p3', '2016-01-07 18:07:26', 6.5, 'GLYCATED HEMOGLOBIN TEST'),
    ('p4', '2015-12-15 16:32:59', 14, 'GLYCATED HEMOGLOBIN TEST'),
    ('p5', '2015-04-14 15:31:30', 7, 'Hemoglobin A1C'),
    ('p5', '2015-07-28 15:28:16', 7, 'GLYCATED HEMOGLOBIN TEST')
;


CREATE TABLE lab_results_obx	
    ([person_id] varchar(2), [A1cDate] datetime, [ResultDescription] varchar(14), [A1cResultValue] numeric)
;
    
INSERT INTO lab_results_obx	
    ([person_id], [A1cDate], [ResultDescription], [A1cResultValue])
VALUES
    ('p1', '2015-08-07 07:17:00', 'HBG.AIC', 6.1),
    ('p3', '2015-05-20 12:45:00', 'HBG.AIC', 5.3),
    ('p5', '2015-11-03 10:12:00', 'HEMOGLOBIN A1c', 7.1)
;


CREATE TABLE patient_diagnosis	
    ([person_id] varchar(2), [enc_id] varchar(2), [create_timestamp] datetime, [diagnosis_code_id] varchar(6))
;
    
INSERT INTO patient_diagnosis	
    ([person_id], [enc_id], [create_timestamp], [diagnosis_code_id])
VALUES
    ('p1', 'c1', '2014-02-04 10:24:38', '250'),
    ('p1', 'c2', '2013-10-22 18:55:00', '250'),
    ('p1', 'c3', '2014-02-06 11:24:43', '250'),
    ('p1', 'c4', '2012-11-27 17:17:24', '250'),
    ('p1', 'c5', '2013-02-21 15:56:17', '250'),
    ('p1', 'c6', '2014-12-16 14:42:00', '250'),
    ('p1', 'c7', '2015-01-22 11:12:00', '250'),
    ('p1', 'c8', '2014-09-08 09:07:37', '250'),
    ('p1', 'c9', '2015-07-02 16:59:25', '250'),
    ('p1', 'ca', '2015-08-03 10:28:20', '250'),
    ('p1', 'cb', '2014-08-12 14:53:53', '250'),
    ('p1', 'cc', '2015-10-01 14:15:06', 'E11.40'),
    ('p1', 'cc', '2015-10-01 14:16:24', 'E13.42'),
    ('p2', 'cd', '2012-12-19 18:19:12', '250'),
    ('p2', 'ce', '2014-02-11 16:01:20', '250'),
    ('p2', 'cf', '2013-07-17 18:15:21', '250'),
    ('p2', 'cg', '2013-03-09 09:12:48', '250'),
    ('p2', 'ch', '2014-05-10 11:41:59', '250.02'),
    ('p2', 'ci', '2014-11-04 08:45:38', '250.02'),
    ('p2', 'cj', '2014-05-03 10:19:15', '250.02'),
    ('p2', 'ce', '2014-02-11 16:33:42', '250.6'),
    ('p2', 'ck', '2015-11-23 09:22:53', 'E11.42'),
    ('p2', 'e7', '2016-01-11 10:12:47', 'E11.65'),
    ('p2', 'cl', '2015-12-18 12:04:59', 'E11.65'),
    ('p2', 'cm', '2015-12-14 08:50:47', 'E11.65'),
    ('p2', 'ck', '2015-11-23 17:18:08', 'E11.65'),
    ('p2', 'cn', '2015-12-07 08:54:12', 'E11.65'),
    ('p2', 'e7', '2016-01-11 10:14:57', 'E11.9'),
    ('p2', 'ck', '2015-11-23 17:17:54', 'E11.9'),
    ('p3', 'co', '2013-08-07 09:17:33', '250'),
    ('p3', 'cp', '2013-08-28 10:13:01', '250'),
    ('p3', 'e5', '2016-01-07 16:17:06', 'E11.9'),
    ('p3', 'cq', '2015-11-03 09:06:34', 'E11.9'),
    ('p4', 'e9', '2016-01-12 12:43:11', 'E11.65'),
    ('p4', 'cr', '2015-12-15 16:27:09', 'E11.65'),
    ('p5', 'cs', '2013-04-30 14:16:29', '250'),
    ('p5', 'ct', '2015-04-14 15:31:25', '250'),
    ('p5', 'cu', '2013-08-23 12:05:33', '250'),
    ('p5', 'cv', '2015-07-28 15:21:17', '250'),
    ('p5', 'cw', '2013-01-11 17:05:15', '250'),
    ('p5', 'cx', '2013-11-08 10:13:10', '250'),
    ('p5', 'cy', '2015-02-12 11:55:45', '250.62'),
    ('p5', 'cz', '2015-11-24 13:44:38', 'E11.9')
;

Open in new window

In fact you can see those table - live and ready to play with - here at sqlfiddle: http://sqlfiddle.com/#!6/1ddd1

PortletPaul

It is wiser to use the REAL COLUMN NAMES in your sample data, regrettably I didn't notice until too late. But now that had the sqlfiddle working I could get some results. (see below)

What I do not know is, is the final result below "correct"? Only you know that.

Please note that for your A1cDate I suggest you use:
convert(varchar(19), res.A1cDate ,121) AS A1cDate
then you get the full date and time to the second

If in doubt over what style number to use refer to this article:
https://www.experts-exchange.com/articles/12315/SQL-Server-Date-Styles-formats-using-CONVERT.html

I trust this helps, but most of all I trust this shows why sample data and expected result are so important to resolving your questions.

SELECT
      o2.person_id
    , o2.ResultDescription
    , o2.A1cResultValue
    , o2.A1cDate
FROM order_ o2
      INNER JOIN person as CTEpe ON o2.person_id = CTEpe.person_id
WHERE o2.A1cResultValue IS NOT NULL
      AND (o2.A1cDate >= '20150101'
      AND o2.A1cDate < '20160101')
      AND o2.ResultDescription IN ('GLYCATED HEMOGLOBIN TEST', 'Hemoglobin A1c')


**[Results][2]**:
| person_id |        ResultDescription | A1cResultValue |             A1cDate |
|-----------|--------------------------|----------------|---------------------|
|        p2 | GLYCATED HEMOGLOBIN TEST |              8 | 2016-01-11 10:41:37 |
|        p3 | GLYCATED HEMOGLOBIN TEST |              6 | 2016-01-07 18:07:26 |

Open in new window

SELECT
      res.person_id
    , res.ResultDescription
    , res.A1cResultValue
    , convert(varchar(19), res.A1cDate ,121) AS A1cDate
FROM lab_results_obx res
      INNER JOIN person as CTEpe ON res.person_id = CTEpe.person_id
WHERE res.A1cResultValue IS NOT NULL
      AND (res.A1cDate >= '20150101'
      AND res.A1cDate < '20160101')
      AND (res.ResultDescription LIKE '%HEMOGLOBIN A1C%'
      OR res.ResultDescription LIKE '%HBG. A1C%'
      OR res.ResultDescription LIKE '%HBG.AIC%'
      OR res.ResultDescription LIKE '%HEMOGLOBIN A1c%')


| person_id | ResultDescription | A1cResultValue |             A1cDate |
|-----------|-------------------|----------------|---------------------|
|        p1 |           HBG.AIC |              6 | 2015-08-07 07:17:00 |
|        p3 |           HBG.AIC |              5 | 2015-05-20 12:45:00 |
|        p5 |    HEMOGLOBIN A1c |              7 | 2015-11-03 10:12:00 |

Open in new window

SELECT
      o2.person_id
    , o2.ResultDescription
    , o2.A1cResultValue
    , o2.A1cDate
FROM order_ o2
      INNER JOIN person as CTEpe ON o2.person_id = CTEpe.person_id
WHERE o2.A1cResultValue IS NOT NULL
      AND (o2.A1cDate >= '20150101'
      AND o2.A1cDate < '20160101')
      AND o2.ResultDescription IN ('GLYCATED HEMOGLOBIN TEST', 'Hemoglobin A1c')

UNION

    SELECT
          res.person_id
        , res.ResultDescription
        , res.A1cResultValue
        , convert(varchar(19), res.A1cDate ,121) AS A1cDate
    FROM lab_results_obx res
          INNER JOIN person as CTEpe ON res.person_id = CTEpe.person_id
    WHERE res.A1cResultValue IS NOT NULL
          AND (res.A1cDate >= '20150101'
          AND res.A1cDate < '20160101')
          AND (res.ResultDescription LIKE '%HEMOGLOBIN A1C%'
          OR res.ResultDescription LIKE '%HBG. A1C%'
          OR res.ResultDescription LIKE '%HBG.AIC%'
          OR res.ResultDescription LIKE '%HEMOGLOBIN A1c%')



| person_id |        ResultDescription | A1cResultValue |             A1cDate |
|-----------|--------------------------|----------------|---------------------|
|        p1 |                  HBG.AIC |              6 | 2015-08-07 07:17:00 |
|        p2 | GLYCATED HEMOGLOBIN TEST |              8 | 2016-01-11 10:41:37 |
|        p3 | GLYCATED HEMOGLOBIN TEST |              6 | 2016-01-07 18:07:26 |
|        p3 |                  HBG.AIC |              5 | 2015-05-20 12:45:00 |
|        p5 |           HEMOGLOBIN A1c |              7 | 2015-11-03 10:12:00 |

Open in new window

SELECT
    person_id, ResultDescription, A1cResultValue, A1cDate
  , ROW_NUMBER() OVER (PARTITION BY person_id ORDER BY A1cDate DESC) AS rowno
FROM (

    SELECT
          o2.person_id
        , o2.ResultDescription
        , o2.A1cResultValue
        , o2.A1cDate
    FROM order_ o2
          INNER JOIN person as CTEpe ON o2.person_id = CTEpe.person_id
    WHERE o2.A1cResultValue IS NOT NULL
          AND (o2.A1cDate >= '20150101'
          AND o2.A1cDate < '20160101')
          AND o2.ResultDescription IN ('GLYCATED HEMOGLOBIN TEST', 'Hemoglobin A1c')

   UNION

        SELECT
              res.person_id
            , res.ResultDescription
            , res.A1cResultValue
            , convert(varchar(19), res.A1cDate ,121) AS A1cDate
        FROM lab_results_obx res
              INNER JOIN person as CTEpe ON res.person_id = CTEpe.person_id
        WHERE res.A1cResultValue IS NOT NULL
              AND (res.A1cDate >= '20150101'
              AND res.A1cDate < '20160101')
              AND (res.ResultDescription LIKE '%HEMOGLOBIN A1C%'
              OR res.ResultDescription LIKE '%HBG. A1C%'
              OR res.ResultDescription LIKE '%HBG.AIC%'
              OR res.ResultDescription LIKE '%HEMOGLOBIN A1c%')
) AS DT


| person_id |        ResultDescription | A1cResultValue |             A1cDate | rowno |
|-----------|--------------------------|----------------|---------------------|-------|
|        p1 |                  HBG.AIC |              6 | 2015-08-07 07:17:00 |     1 |
|        p2 | GLYCATED HEMOGLOBIN TEST |              8 | 2016-01-11 10:41:37 |     1 |
|        p3 | GLYCATED HEMOGLOBIN TEST |              6 | 2016-01-07 18:07:26 |     1 |
|        p3 |                  HBG.AIC |              5 | 2015-05-20 12:45:00 |     2 |
|        p5 |           HEMOGLOBIN A1c |              7 | 2015-11-03 10:12:00 |     1 |

Open in new window

select
*
from (
    SELECT
            person_id, ResultDescription, A1cResultValue, A1cDate
          , ROW_NUMBER() OVER (PARTITION BY person_id ORDER BY A1cDate DESC) AS rowno
    FROM (
      
            SELECT
                  o2.person_id
                , o2.ResultDescription
                , o2.A1cResultValue
                , o2.A1cDate
            FROM order_ o2
                  INNER JOIN person as CTEpe ON o2.person_id = CTEpe.person_id
            WHERE o2.A1cResultValue IS NOT NULL
                  AND (o2.A1cDate >= '20150101'
                  AND o2.A1cDate < '20160101')
                  AND o2.ResultDescription IN ('GLYCATED HEMOGLOBIN TEST', 'Hemoglobin A1c')

           UNION

                SELECT
                      res.person_id
                    , res.ResultDescription
                    , res.A1cResultValue
                    , convert(varchar(19), res.A1cDate ,121) AS A1cDate
                FROM lab_results_obx res
                      INNER JOIN person as CTEpe ON res.person_id = CTEpe.person_id
                WHERE res.A1cResultValue IS NOT NULL
                      AND (res.A1cDate >= '20150101'
                      AND res.A1cDate < '20160101')
                      AND (res.ResultDescription LIKE '%HEMOGLOBIN A1C%'
                      OR res.ResultDescription LIKE '%HBG. A1C%'
                      OR res.ResultDescription LIKE '%HBG.AIC%'
                      OR res.ResultDescription LIKE '%HEMOGLOBIN A1c%')
        ) AS DT
   ) as dt2
where rowno = 1


| person_id |        ResultDescription | A1cResultValue |             A1cDate | rowno |
|-----------|--------------------------|----------------|---------------------|-------|
|        p1 |                  HBG.AIC |              6 | 2015-08-07 07:17:00 |     1 |
|        p2 | GLYCATED HEMOGLOBIN TEST |              8 | 2016-01-11 10:41:37 |     1 |
|        p3 | GLYCATED HEMOGLOBIN TEST |              6 | 2016-01-07 18:07:26 |     1 |
|        p5 |           HEMOGLOBIN A1c |              7 | 2015-11-03 10:12:00 |     1 |

Open in new window

See: http://sqlfiddle.com/#!6/1ddd1/18

biotec

ASKER

Thank you very much Paul. Didn't expect all the extra effort on this, greatly appreciated. :)