Link to home
Start Free TrialLog in
Avatar of matthew016
matthew016Flag for Belgium

asked on

UNION but avoid duplicated data

Hi,

I have two tables : patients and patients_guest.

I would like to have the patients who have a lastname starting by X (parameter) in both tables. I think that's a simple UNION with: WHERE P.lastname LIKE "' . $x . '%"
The problem is that a patient may be in both tables. I need to check that there is only one lastname-firstname-birthDate returned (note that there may be two patients with same lastname and firstname, that's why they have to have the same birthday to be considered as duplicated).

The fields to return are :
id
lastname
firstname
date_of_birth
is_guest
(this must be generated, should be true if the record is from patients_guest, false otherwise).

Thank you for any help
Avatar of cyberkiwi
cyberkiwi
Flag of New Zealand image

Just change UNION to UNION ALL
It will remove the duplicates
SOLUTION
Avatar of packratt_jk
packratt_jk
Flag of United States of America image

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
ASKER CERTIFIED SOLUTION
Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Avatar of matthew016

ASKER

how will UNION ALL know that duplicated data is data with same lastname, firstname and date_of_birth (id may be differnet) ?
@cyberkiwi

Thanks, is it possible to have the record from table "patients" when there is a duplicate, and not from "patients_guest" ?

Maybe a AND NOT EXISTS or something should be added to the second part of the union ?
Because the UNION is a "virtual" dataset, this very complicated query is needed to properly return just 1 row from duplicates.
But even this is not perfect, if you have

firstname, lastname, date_of_birth AND id duplicated in both tables, the query will give you both rows since it cannot distinguish between the two.
select B.*
from
(
select min(id) min_id, lastname,firstname,date_of_birth
from
(
select id,lastname,firstname,date_of_birth,'false' is_guest
from patients
union
select id,lastname,firstname,date_of_birth,'true' is_guest
from patients_guest
) both
group by lastname,firstname,date_of_birth
) A
inner join
(
select id,lastname,firstname,date_of_birth,'false' is_guest
from patients
union
select id,lastname,firstname,date_of_birth,'true' is_guest
from patients_guest
) B on A.lastname=B.lastname and a.date_of_birth=b.date_of_birth and a.firstname=b.firstname and A.min_id=B.id

Open in new window

> Thanks, is it possible to have the record from table "patients" when there is a duplicate, and not from "patients_guest" ?
Brilliant - that makes it easier if there is a preference.

FYI - I just woke up - UNION is the one that removes duplicates and UNION ALL does not.
-- all patients
-- patients_guest only when not already in patients by (lastname,firstname,date_of_birth)
select id,lastname,firstname,date_of_birth,'false' is_guest
from patients
union all
select A.id,A.lastname,A.firstname,A.date_of_birth,'true' is_guest
from patients_guest A
left join patients B on A.lastname=B.lastname and a.date_of_birth=b.date_of_birth and a.firstname=b.firstname
where B.id is null

Open in new window

Thanks,

So what should I use in my case, UNION or UNION ALL ?

I'll post a second question about a problem with the data returned, this is enough for now :)
Normally if you want duplicate data (defined as begin equal on every single column) removed, use UNION.
However it has a cost, since it is performing a DISTINCT filter.

In the case of the last query presented above, we already know (by design) that the two parts are distinct, so using UNION ALL is just slightly faster to avoid the additional duplicate filtering process.