• Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 308
  • Last Modified:

MySQL remove duplicate based on columns

I have a table with multiple duplicate records.  Each has a column called f95 that can have either BEAR, FMFL or NAPLES as the data.

How do I craft a query to compare the following fields:

STREET_NUMBER, STREET_NAME, STREET_TYPE, STREET_DIRECTION, UNIT_NUMBER, CITY, ZIP_CODE

and then remove the records based on if the value is FMFL I always keep and remove the others, if not FMFL then keep NAPLES and remove other duplicates

Please show me the correct SQL query to do this.
0
pda4me
Asked:
pda4me
  • 7
  • 6
  • 2
2 Solutions
 
Kent OlsenData Warehouse Architect / DBACommented:
Hi Pda,

Is there a primary key on the rows?  Perhaps an auto_increment column?


Kent
0
 
pda4meAuthor Commented:
yes, its called bID and is INT 12 auto_increment
0
 
dirknibleckCommented:
I'm not 100% sure how to do the delete, but here is a query that will get you the information you need.

If it were me, I would loop through this query and do individual deletes via PHP or an alternate language. It's possible it could be done via a Stored Procedure. I'm not confident that MySQL will accept a GROUP BY query in a DELETE statement...


The attached query will get you all duplicates in the table and will have a 2 in the f95_test column when FMLA is present in the duplicates or a 1 when NAPLES is, but not FMLA. 0 when neither appears.
STREET_NUMBER, STREET_NAME, STREET_TYPE, STREET_DIRECTION, UNIT_NUMBER, CITY, ZIP_CODE, MAX(CASE WHEN f95 = "FMLA" THEN 2 ELSE CASE WHEN f95 = "NAPLES" then 1 ELSE 0 END END) as f95_test

FROM YOUR_TABLE

GROUP BY STREET_NUMBER, STREET_NAME, STREET_TYPE, STREET_DIRECTION, UNIT_NUMBER, CITY, ZIP_CODE

HAVING count(bID) > 1

Open in new window

0
Cloud Class® Course: MCSA MCSE Windows Server 2012

This course teaches how to install and configure Windows Server 2012 R2.  It is the first step on your path to becoming a Microsoft Certified Solutions Expert (MCSE).

 
Kent OlsenData Warehouse Architect / DBACommented:
Hi pda,

One more question.  About how many rows are in the table?  The simplest query to write/understand isn't very efficient, but if the data set is relatively small, that won't be an issue.


Kent
0
 
pda4meAuthor Commented:
< 50K records
0
 
Kent OlsenData Warehouse Architect / DBACommented:

Ok.  This something like this should work.  So should a variation of what Dirk posted.


Kent

INSERT INTO newtable
SELECT * FROM oldtable
WHERE fID in 
(
  SELECT
    coalesce (f95_fmfl, f95_naples, f95_bear)
  FROM
  (
    SELECT DISTINCT 
      STREET_NUMBER,
      STREET_NAME,
      STREET_TYPE,
      STREET_DIRECTION,
      UNIT_NUMBER,
      CITY,
      ZIP_CODE,
      (
        SELECT id 
        FROM mytable t1 
        WHERE t0.STREET_NUMBER = t1.STREET_NUMBER
          AND t0.STREET_NAME = t1.STREET_NAME
          AND t0.STREET_TYPE = t1.STREET_TYPE
          AND t0.STREET_DIRECTION = t1. STREET_DIRECTION
          AND t0.UNIT_NUMBER = t1.UNIT_NUMBER
          AND t0.CITY = T1.CITY
          AND t0.ZIP_CODE = t1.ZIP_CODE
          AND t1.f95 = 'FMFL'
      ) as f95_fmfl,
      (
        SELECT id 
        FROM mytable t2 
        WHERE t0.STREET_NUMBER = t2.STREET_NUMBER
          AND t0.STREET_NAME = t2.STREET_NAME
          AND t0.STREET_TYPE = t2.STREET_TYPE
          AND t0.STREET_DIRECTION = t2. STREET_DIRECTION
          AND t0.UNIT_NUMBER = t2.UNIT_NUMBER
          AND t0.CITY = T2.CITY
          AND t0.ZIP_CODE = t2.ZIP_CODE
          AND t1.f95 = 'NAPLES'
      ) as f95_naples,
      (
        SELECT id 
        FROM mytable t1 
        WHERE t0.STREET_NUMBER = t3.STREET_NUMBER
          AND t0.STREET_NAME = t3.STREET_NAME
          AND t0.STREET_TYPE = t3.STREET_TYPE
          AND t0.STREET_DIRECTION = t3. STREET_DIRECTION
          AND t0.UNIT_NUMBER = t3.UNIT_NUMBER
          AND t0.CITY = T3.CITY
          AND t0.ZIP_CODE = t3.ZIP_CODE
          AND t1.f95 = 'BEAR'
      ) as f95_bear
  ) t10
)

Open in new window

0
 
pda4meAuthor Commented:
Weird, I am getting a 1054 - Unknown column 'STREET_NUMBER' in 'field list'

I verfied that this is a valid column in the table?
0
 
Kent OlsenData Warehouse Architect / DBACommented:

Oops.  :)

After line 52 add this:

  FROM mytable t0



  :)

Kent
0
 
pda4meAuthor Commented:
Close, here is the script with database names in it...I am now getting a

1054 - Unknown column 'T1.CITY' in 'where clause'  

Query:
INSERT INTO idx_Bear_FMFL_Naples_Final
SELECT * FROM idx_Bear_FMFL_Naples
WHERE MLS_LISTING_ID in 
(
  SELECT
    coalesce (f95_fmfl, f95_naples, f95_bear)
  FROM
  (
    SELECT DISTINCT 
      STREET_NUMBER,
      STREET_NAME,
      STREET_TYPE,
      STREET_DIRECTION,
      UNIT_NUMBER,
      CITY,
      ZIP_CODE,
      (
        SELECT MLS_LISTING_ID
        FROM  idx_Bear_FMFL_Naples t1 
        WHERE t0.STREET_NUMBER = t1.STREET_NUMBER
          AND t0.STREET_NAME = t1.STREET_NAME
          AND t0.STREET_TYPE = t1.STREET_TYPE
          AND t0.STREET_DIRECTION = t1. STREET_DIRECTION
          AND t0.UNIT_NUMBER = t1.UNIT_NUMBER
          AND t0.CITY = T1.CITY
          AND t0.ZIP_CODE = t1.ZIP_CODE
          AND t1.f95 = 'FMFL'
      ) as f95_fmfl,
      (
        SELECT MLS_LISTING_ID
        FROM idx_Bear_FMFL_Naples t2 
        WHERE t0.STREET_NUMBER = t2.STREET_NUMBER
          AND t0.STREET_NAME = t2.STREET_NAME
          AND t0.STREET_TYPE = t2.STREET_TYPE
          AND t0.STREET_DIRECTION = t2. STREET_DIRECTION
          AND t0.UNIT_NUMBER = t2.UNIT_NUMBER
          AND t0.CITY = T2.CITY
          AND t0.ZIP_CODE = t2.ZIP_CODE
          AND t1.f95 = 'NAPLES'
      ) as f95_naples,
      (
        SELECT MLS_LISTING_ID
        FROM idx_Bear_FMFL_Naples t1 
        WHERE t0.STREET_NUMBER = t3.STREET_NUMBER
          AND t0.STREET_NAME = t3.STREET_NAME
          AND t0.STREET_TYPE = t3.STREET_TYPE
          AND t0.STREET_DIRECTION = t3. STREET_DIRECTION
          AND t0.UNIT_NUMBER = t3.UNIT_NUMBER
          AND t0.CITY = T3.CITY
          AND t0.ZIP_CODE = t3.ZIP_CODE
          AND t1.f95 = 'BEAR'
      ) as f95_bear
FROM idx_Bear_FMFL_Naples t0
  ) t10
)

Open in new window

0
 
dirknibleckCommented:
The T in T1.CITY is capital (also T2.CITY and T3.CITY) it should be lowercase: t1.CITY, t2.CITY, t3.CITY
0
 
Kent OlsenData Warehouse Architect / DBACommented:

Line 43.  The alias should be T3, not T1.


0
 
pda4meAuthor Commented:
closer, still getting this error:

1054 - Unknown column 't1.f95' in 'where clause'

Updated Code:

INSERT INTO idx_Bear_FMFL_Naples_Final
SELECT * FROM idx_Bear_FMFL_Naples
WHERE MLS_LISTING_ID in 
(
  SELECT
    coalesce (f95_FMFL, f95_Naples, f95_BEAR)
  FROM
  (
    SELECT DISTINCT 
      STREET_NUMBER,
      STREET_NAME,
      STREET_TYPE,
      STREET_DIRECTION,
      UNIT_NUMBER,
      CITY,
      ZIP_CODE,
      (
        SELECT MLS_LISTING_ID
        FROM  idx_Bear_FMFL_Naples t1 
        WHERE t0.STREET_NUMBER = t1.STREET_NUMBER
          AND t0.STREET_NAME = t1.STREET_NAME
          AND t0.STREET_TYPE = t1.STREET_TYPE
          AND t0.STREET_DIRECTION = t1. STREET_DIRECTION
          AND t0.UNIT_NUMBER = t1.UNIT_NUMBER
          AND t0.CITY = t1.CITY
          AND t0.ZIP_CODE = t1.ZIP_CODE
          AND t1.f95 = 'FMFL'
      ) as f95_FMFL,
      (
        SELECT MLS_LISTING_ID
        FROM idx_Bear_FMFL_Naples t2 
        WHERE t0.STREET_NUMBER = t2.STREET_NUMBER
          AND t0.STREET_NAME = t2.STREET_NAME
          AND t0.STREET_TYPE = t2.STREET_TYPE
          AND t0.STREET_DIRECTION = t2. STREET_DIRECTION
          AND t0.UNIT_NUMBER = t2.UNIT_NUMBER
          AND t0.CITY = t2.CITY
          AND t0.ZIP_CODE = t2.ZIP_CODE
          AND t1.f95 = 'Naples'
      ) as f95_Naples,
      (
        SELECT MLS_LISTING_ID
        FROM idx_Bear_FMFL_Naples t3 
        WHERE t0.STREET_NUMBER = t3.STREET_NUMBER
          AND t0.STREET_NAME = t3.STREET_NAME
          AND t0.STREET_TYPE = t3.STREET_TYPE
          AND t0.STREET_DIRECTION = t3. STREET_DIRECTION
          AND t0.UNIT_NUMBER = t3.UNIT_NUMBER
          AND t0.CITY = t3.CITY
          AND t0.ZIP_CODE = t3.ZIP_CODE
          AND t1.f95 = 'BEAR'
      ) as f95_BEAR
FROM idx_Bear_FMFL_Naples t0
  ) t10
)

Open in new window

0
 
Kent OlsenData Warehouse Architect / DBACommented:

Same thing.  :)

Lines 39 and 51 need to reference t2 and t3, respectively....


Kent
0
 
pda4meAuthor Commented:
Thanks!
0
 
Kent OlsenData Warehouse Architect / DBACommented:
>> The T in T1.CITY is capital (also T2.CITY and T3.CITY) it should be lowercase: t1.CITY, t2.CITY, t3.CITY

SQL is NOT case sensitive.  Upper case, lower case, and mixed case should all work equally well.



Kent
0
Question has a verified solution.

Are you are experiencing a similar issue? Get a personalized answer when you ask a related question.

Have a better answer? Share it in a comment.

Join & Write a Comment

Featured Post

Cloud Class® Course: Amazon Web Services - Basic

Are you thinking about creating an Amazon Web Services account for your business? Not sure where to start? In this course you’ll get an overview of the history of AWS and take a tour of their user interface.

  • 7
  • 6
  • 2
Tackle projects and never again get stuck behind a technical roadblock.
Join Now