Simple SQL query need to filter by count greater than 2 within time frame

biotec
biotec used Ask the Experts™
on
Problem with this query below is it gives number of visits/encounters (pe.enc_id) ever and I need the count within the current year (20160101 - 20161231).

select distinct p.person_nbr, p.last_name, p.first_name, p.race, count(pe.enc_id) as count_of_encounters 
from patient_encounter pe
	inner join person p on pe.person_id = p.person_id
	inner join patient_diagnosis pd on pd.enc_id = pe.enc_id
WHERE (p.date_of_birth between '19420101' and '19981231' and pe.enc_timestamp >= '20160101' AND pe.enc_timestamp  < '20170101')
	and pd.person_id in (select distinct p.person_id from person p  inner join patient_diagnosis pd on p.person_id = pd.person_id
where (pd.diagnosis_code_id  like '250%' or pd.diagnosis_code_id like '648.0%' 
	or pd.diagnosis_code_id  like '357.2' or pd.diagnosis_code_id like '362.0[0-7]' or pd.diagnosis_code_id  like '366.41' or pd.diagnosis_code_id like 'E1[0-3].%' 
	or pd.diagnosis_code_id  like 'O24.[0-4]0%' or pd.diagnosis_code_id like 'O24.[4-9][2-9]') )
group by p.person_nbr, p.last_name, p.first_name, p.race
having count(pe.enc_id) >= 2

Open in new window

Comment
Watch Question

Do more with

Expert Office
EXPERT OFFICE® is a registered trademark of EXPERTS EXCHANGE®

Commented:
The query looks good, but check the enc_timestamp it may have stamped this year for alll the rows returnd

Author

Commented:
The timestamps are correct. There is nothing in the query that restricts the count of enc_id to this year and that is what is missing.
Scott PletcherSenior DBA
Most Valuable Expert 2018
Top Expert 2014

Commented:
You're not limiting the diagnosis to only 2016.  I'm not sure what the column name is for the date in the pd table, but something like below may give you what you need:


select distinct p.person_nbr, p.last_name, p.first_name, p.race, count(pe.enc_id) as count_of_encounters
from patient_encounter pe
      inner join person p on pe.person_id = p.person_id
WHERE (p.date_of_birth between '19420101' and '19981231'
    and pe.enc_timestamp >= '20160101' AND pe.enc_timestamp  < '20170101')
      and pe.person_id in (
          select distinct person_id
          from patient_diagnosis
        where
           date_column >= '20160101' --??column name??
           and date_column < '20170101'
--??column name??
           and (diagnosis_code_id  like '250%' or diagnosis_code_id like '648.0%'
               or diagnosis_code_id  like '357.2' or diagnosis_code_id like '362.0[0-7]'
               or diagnosis_code_id  like '366.41' or diagnosis_code_id like 'E1[0-3].%'
               or diagnosis_code_id  like 'O24.[0-4]0%' or diagnosis_code_id like 'O24.[4-9][2-9]')
         )
group by p.person_nbr, p.last_name, p.first_name, p.race
having count(pe.enc_id) >= 2
Ensure you’re charging the right price for your IT

Do you wonder if your IT business is truly profitable or if you should raise your prices? Learn how to calculate your overhead burden using our free interactive tool and use it to determine the right price for your IT services. Start calculating Now!

Author

Commented:
The diagnosis can happen ever. The problem is the count(pe.enc_id) as count_of_encounters. It looks for any encounters ever, or so it seems and the I'm not sure the last having statement of >=2 is only looking at encounters for this year.
It looks like you already have it:

and pe.enc_timestamp >= '20160101' AND pe.enc_timestamp  < '20170101'
Are you sure that the count is including all dates?  Or is it just that the count seems high, and you're assuming that it's because it includes all dates?

 If the counts are high, maybe they're being inflated by something else.  For example, the main query has a Join with patient_diagnosis, but you don't include any columns from pd in the Select or Group By.  Unless I'm missing something, if a patient has more than one entry in patient_diagnosis, the patient_encounter rows will be repeated for each entry in patient_diagnosis, and those "extra" rows will be included in the counts.  If that's a problem, you could add something like pd.person_id to the Group By.

 On a side note, it doesn't seem like you really need to include the person table in the sub-query.  Why not just use
(select distinct pd.person_id from patient_diagnosis pd
where (pd.diagnosis_code_id  like '250%' or pd.diagnosis_code_id like '648.0%' 
	or pd.diagnosis_code_id  like '357.2' or pd.diagnosis_code_id like '362.0[0-7]' or pd.diagnosis_code_id  like '366.41' or pd.diagnosis_code_id like 'E1[0-3].%' 
	or pd.diagnosis_code_id  like 'O24.[0-4]0%' or pd.diagnosis_code_id like 'O24.[4-9][2-9]') )

Open in new window


James

Author

Commented:
James,
This is why the monthly fee for this site is so worth it. I just did not look at the fact that the inner join to patient_diagnosis at the beginning could cause the problems. I assume things about the data and the visits in this case and just thought the logic was good. It wasn't! You helped me spot two areas where there were problems although the join in the sub-query was not the issue it wasn't done correctly and just caused extra work for the query.
Thanks!!!!!
You're welcome.  Glad I could help.

 James

Do more with

Expert Office
Submit tech questions to Ask the Experts™ at any time to receive solutions, advice, and new ideas from leading industry professionals.

Start 7-Day Free Trial