pda4me
asked on
MySQL remove duplicate based on columns
I have a table with multiple duplicate records. Each has a column called f95 that can have either BEAR, FMFL or NAPLES as the data.
How do I craft a query to compare the following fields:
STREET_NUMBER, STREET_NAME, STREET_TYPE, STREET_DIRECTION, UNIT_NUMBER, CITY, ZIP_CODE
and then remove the records based on if the value is FMFL I always keep and remove the others, if not FMFL then keep NAPLES and remove other duplicates
Please show me the correct SQL query to do this.
How do I craft a query to compare the following fields:
STREET_NUMBER, STREET_NAME, STREET_TYPE, STREET_DIRECTION, UNIT_NUMBER, CITY, ZIP_CODE
and then remove the records based on if the value is FMFL I always keep and remove the others, if not FMFL then keep NAPLES and remove other duplicates
Please show me the correct SQL query to do this.
ASKER
yes, its called bID and is INT 12 auto_increment
I'm not 100% sure how to do the delete, but here is a query that will get you the information you need.
If it were me, I would loop through this query and do individual deletes via PHP or an alternate language. It's possible it could be done via a Stored Procedure. I'm not confident that MySQL will accept a GROUP BY query in a DELETE statement...
The attached query will get you all duplicates in the table and will have a 2 in the f95_test column when FMLA is present in the duplicates or a 1 when NAPLES is, but not FMLA. 0 when neither appears.
If it were me, I would loop through this query and do individual deletes via PHP or an alternate language. It's possible it could be done via a Stored Procedure. I'm not confident that MySQL will accept a GROUP BY query in a DELETE statement...
The attached query will get you all duplicates in the table and will have a 2 in the f95_test column when FMLA is present in the duplicates or a 1 when NAPLES is, but not FMLA. 0 when neither appears.
STREET_NUMBER, STREET_NAME, STREET_TYPE, STREET_DIRECTION, UNIT_NUMBER, CITY, ZIP_CODE, MAX(CASE WHEN f95 = "FMLA" THEN 2 ELSE CASE WHEN f95 = "NAPLES" then 1 ELSE 0 END END) as f95_test
FROM YOUR_TABLE
GROUP BY STREET_NUMBER, STREET_NAME, STREET_TYPE, STREET_DIRECTION, UNIT_NUMBER, CITY, ZIP_CODE
HAVING count(bID) > 1
Hi pda,
One more question. About how many rows are in the table? The simplest query to write/understand isn't very efficient, but if the data set is relatively small, that won't be an issue.
Kent
One more question. About how many rows are in the table? The simplest query to write/understand isn't very efficient, but if the data set is relatively small, that won't be an issue.
Kent
ASKER
< 50K records
Ok. This something like this should work. So should a variation of what Dirk posted.
Kent
INSERT INTO newtable
SELECT * FROM oldtable
WHERE fID in
(
SELECT
coalesce (f95_fmfl, f95_naples, f95_bear)
FROM
(
SELECT DISTINCT
STREET_NUMBER,
STREET_NAME,
STREET_TYPE,
STREET_DIRECTION,
UNIT_NUMBER,
CITY,
ZIP_CODE,
(
SELECT id
FROM mytable t1
WHERE t0.STREET_NUMBER = t1.STREET_NUMBER
AND t0.STREET_NAME = t1.STREET_NAME
AND t0.STREET_TYPE = t1.STREET_TYPE
AND t0.STREET_DIRECTION = t1. STREET_DIRECTION
AND t0.UNIT_NUMBER = t1.UNIT_NUMBER
AND t0.CITY = T1.CITY
AND t0.ZIP_CODE = t1.ZIP_CODE
AND t1.f95 = 'FMFL'
) as f95_fmfl,
(
SELECT id
FROM mytable t2
WHERE t0.STREET_NUMBER = t2.STREET_NUMBER
AND t0.STREET_NAME = t2.STREET_NAME
AND t0.STREET_TYPE = t2.STREET_TYPE
AND t0.STREET_DIRECTION = t2. STREET_DIRECTION
AND t0.UNIT_NUMBER = t2.UNIT_NUMBER
AND t0.CITY = T2.CITY
AND t0.ZIP_CODE = t2.ZIP_CODE
AND t1.f95 = 'NAPLES'
) as f95_naples,
(
SELECT id
FROM mytable t1
WHERE t0.STREET_NUMBER = t3.STREET_NUMBER
AND t0.STREET_NAME = t3.STREET_NAME
AND t0.STREET_TYPE = t3.STREET_TYPE
AND t0.STREET_DIRECTION = t3. STREET_DIRECTION
AND t0.UNIT_NUMBER = t3.UNIT_NUMBER
AND t0.CITY = T3.CITY
AND t0.ZIP_CODE = t3.ZIP_CODE
AND t1.f95 = 'BEAR'
) as f95_bear
) t10
)
ASKER
Weird, I am getting a 1054 - Unknown column 'STREET_NUMBER' in 'field list'
I verfied that this is a valid column in the table?
I verfied that this is a valid column in the table?
Oops. :)
After line 52 add this:
FROM mytable t0
:)
Kent
ASKER
Close, here is the script with database names in it...I am now getting a
1054 - Unknown column 'T1.CITY' in 'where clause'
Query:
1054 - Unknown column 'T1.CITY' in 'where clause'
Query:
INSERT INTO idx_Bear_FMFL_Naples_Final
SELECT * FROM idx_Bear_FMFL_Naples
WHERE MLS_LISTING_ID in
(
SELECT
coalesce (f95_fmfl, f95_naples, f95_bear)
FROM
(
SELECT DISTINCT
STREET_NUMBER,
STREET_NAME,
STREET_TYPE,
STREET_DIRECTION,
UNIT_NUMBER,
CITY,
ZIP_CODE,
(
SELECT MLS_LISTING_ID
FROM idx_Bear_FMFL_Naples t1
WHERE t0.STREET_NUMBER = t1.STREET_NUMBER
AND t0.STREET_NAME = t1.STREET_NAME
AND t0.STREET_TYPE = t1.STREET_TYPE
AND t0.STREET_DIRECTION = t1. STREET_DIRECTION
AND t0.UNIT_NUMBER = t1.UNIT_NUMBER
AND t0.CITY = T1.CITY
AND t0.ZIP_CODE = t1.ZIP_CODE
AND t1.f95 = 'FMFL'
) as f95_fmfl,
(
SELECT MLS_LISTING_ID
FROM idx_Bear_FMFL_Naples t2
WHERE t0.STREET_NUMBER = t2.STREET_NUMBER
AND t0.STREET_NAME = t2.STREET_NAME
AND t0.STREET_TYPE = t2.STREET_TYPE
AND t0.STREET_DIRECTION = t2. STREET_DIRECTION
AND t0.UNIT_NUMBER = t2.UNIT_NUMBER
AND t0.CITY = T2.CITY
AND t0.ZIP_CODE = t2.ZIP_CODE
AND t1.f95 = 'NAPLES'
) as f95_naples,
(
SELECT MLS_LISTING_ID
FROM idx_Bear_FMFL_Naples t1
WHERE t0.STREET_NUMBER = t3.STREET_NUMBER
AND t0.STREET_NAME = t3.STREET_NAME
AND t0.STREET_TYPE = t3.STREET_TYPE
AND t0.STREET_DIRECTION = t3. STREET_DIRECTION
AND t0.UNIT_NUMBER = t3.UNIT_NUMBER
AND t0.CITY = T3.CITY
AND t0.ZIP_CODE = t3.ZIP_CODE
AND t1.f95 = 'BEAR'
) as f95_bear
FROM idx_Bear_FMFL_Naples t0
) t10
)
SOLUTION
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Line 43. The alias should be T3, not T1.
ASKER
closer, still getting this error:
1054 - Unknown column 't1.f95' in 'where clause'
Updated Code:
1054 - Unknown column 't1.f95' in 'where clause'
Updated Code:
INSERT INTO idx_Bear_FMFL_Naples_Final
SELECT * FROM idx_Bear_FMFL_Naples
WHERE MLS_LISTING_ID in
(
SELECT
coalesce (f95_FMFL, f95_Naples, f95_BEAR)
FROM
(
SELECT DISTINCT
STREET_NUMBER,
STREET_NAME,
STREET_TYPE,
STREET_DIRECTION,
UNIT_NUMBER,
CITY,
ZIP_CODE,
(
SELECT MLS_LISTING_ID
FROM idx_Bear_FMFL_Naples t1
WHERE t0.STREET_NUMBER = t1.STREET_NUMBER
AND t0.STREET_NAME = t1.STREET_NAME
AND t0.STREET_TYPE = t1.STREET_TYPE
AND t0.STREET_DIRECTION = t1. STREET_DIRECTION
AND t0.UNIT_NUMBER = t1.UNIT_NUMBER
AND t0.CITY = t1.CITY
AND t0.ZIP_CODE = t1.ZIP_CODE
AND t1.f95 = 'FMFL'
) as f95_FMFL,
(
SELECT MLS_LISTING_ID
FROM idx_Bear_FMFL_Naples t2
WHERE t0.STREET_NUMBER = t2.STREET_NUMBER
AND t0.STREET_NAME = t2.STREET_NAME
AND t0.STREET_TYPE = t2.STREET_TYPE
AND t0.STREET_DIRECTION = t2. STREET_DIRECTION
AND t0.UNIT_NUMBER = t2.UNIT_NUMBER
AND t0.CITY = t2.CITY
AND t0.ZIP_CODE = t2.ZIP_CODE
AND t1.f95 = 'Naples'
) as f95_Naples,
(
SELECT MLS_LISTING_ID
FROM idx_Bear_FMFL_Naples t3
WHERE t0.STREET_NUMBER = t3.STREET_NUMBER
AND t0.STREET_NAME = t3.STREET_NAME
AND t0.STREET_TYPE = t3.STREET_TYPE
AND t0.STREET_DIRECTION = t3. STREET_DIRECTION
AND t0.UNIT_NUMBER = t3.UNIT_NUMBER
AND t0.CITY = t3.CITY
AND t0.ZIP_CODE = t3.ZIP_CODE
AND t1.f95 = 'BEAR'
) as f95_BEAR
FROM idx_Bear_FMFL_Naples t0
) t10
)
ASKER CERTIFIED SOLUTION
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
ASKER
Thanks!
>> The T in T1.CITY is capital (also T2.CITY and T3.CITY) it should be lowercase: t1.CITY, t2.CITY, t3.CITY
SQL is NOT case sensitive. Upper case, lower case, and mixed case should all work equally well.
Kent
SQL is NOT case sensitive. Upper case, lower case, and mixed case should all work equally well.
Kent
Is there a primary key on the rows? Perhaps an auto_increment column?
Kent