?
Solved

MySQL remove duplicate based on columns

Posted on 2011-10-13
15
Medium Priority
?
305 Views
Last Modified: 2012-05-12
I have a table with multiple duplicate records.  Each has a column called f95 that can have either BEAR, FMFL or NAPLES as the data.

How do I craft a query to compare the following fields:

STREET_NUMBER, STREET_NAME, STREET_TYPE, STREET_DIRECTION, UNIT_NUMBER, CITY, ZIP_CODE

and then remove the records based on if the value is FMFL I always keep and remove the others, if not FMFL then keep NAPLES and remove other duplicates

Please show me the correct SQL query to do this.
0
Comment
Question by:pda4me
  • 7
  • 6
  • 2
15 Comments
 
LVL 46

Expert Comment

by:Kent Olsen
ID: 36962928
Hi Pda,

Is there a primary key on the rows?  Perhaps an auto_increment column?


Kent
0
 

Author Comment

by:pda4me
ID: 36962983
yes, its called bID and is INT 12 auto_increment
0
 
LVL 15

Expert Comment

by:dirknibleck
ID: 36963421
I'm not 100% sure how to do the delete, but here is a query that will get you the information you need.

If it were me, I would loop through this query and do individual deletes via PHP or an alternate language. It's possible it could be done via a Stored Procedure. I'm not confident that MySQL will accept a GROUP BY query in a DELETE statement...


The attached query will get you all duplicates in the table and will have a 2 in the f95_test column when FMLA is present in the duplicates or a 1 when NAPLES is, but not FMLA. 0 when neither appears.
STREET_NUMBER, STREET_NAME, STREET_TYPE, STREET_DIRECTION, UNIT_NUMBER, CITY, ZIP_CODE, MAX(CASE WHEN f95 = "FMLA" THEN 2 ELSE CASE WHEN f95 = "NAPLES" then 1 ELSE 0 END END) as f95_test

FROM YOUR_TABLE

GROUP BY STREET_NUMBER, STREET_NAME, STREET_TYPE, STREET_DIRECTION, UNIT_NUMBER, CITY, ZIP_CODE

HAVING count(bID) > 1

Open in new window

0
VIDEO: THE CONCERTO CLOUD FOR HEALTHCARE

Modern healthcare requires a modern cloud. View this brief video to understand how the Concerto Cloud for Healthcare can help your organization.

 
LVL 46

Expert Comment

by:Kent Olsen
ID: 36963642
Hi pda,

One more question.  About how many rows are in the table?  The simplest query to write/understand isn't very efficient, but if the data set is relatively small, that won't be an issue.


Kent
0
 

Author Comment

by:pda4me
ID: 36963649
< 50K records
0
 
LVL 46

Expert Comment

by:Kent Olsen
ID: 36963775

Ok.  This something like this should work.  So should a variation of what Dirk posted.


Kent

INSERT INTO newtable
SELECT * FROM oldtable
WHERE fID in 
(
  SELECT
    coalesce (f95_fmfl, f95_naples, f95_bear)
  FROM
  (
    SELECT DISTINCT 
      STREET_NUMBER,
      STREET_NAME,
      STREET_TYPE,
      STREET_DIRECTION,
      UNIT_NUMBER,
      CITY,
      ZIP_CODE,
      (
        SELECT id 
        FROM mytable t1 
        WHERE t0.STREET_NUMBER = t1.STREET_NUMBER
          AND t0.STREET_NAME = t1.STREET_NAME
          AND t0.STREET_TYPE = t1.STREET_TYPE
          AND t0.STREET_DIRECTION = t1. STREET_DIRECTION
          AND t0.UNIT_NUMBER = t1.UNIT_NUMBER
          AND t0.CITY = T1.CITY
          AND t0.ZIP_CODE = t1.ZIP_CODE
          AND t1.f95 = 'FMFL'
      ) as f95_fmfl,
      (
        SELECT id 
        FROM mytable t2 
        WHERE t0.STREET_NUMBER = t2.STREET_NUMBER
          AND t0.STREET_NAME = t2.STREET_NAME
          AND t0.STREET_TYPE = t2.STREET_TYPE
          AND t0.STREET_DIRECTION = t2. STREET_DIRECTION
          AND t0.UNIT_NUMBER = t2.UNIT_NUMBER
          AND t0.CITY = T2.CITY
          AND t0.ZIP_CODE = t2.ZIP_CODE
          AND t1.f95 = 'NAPLES'
      ) as f95_naples,
      (
        SELECT id 
        FROM mytable t1 
        WHERE t0.STREET_NUMBER = t3.STREET_NUMBER
          AND t0.STREET_NAME = t3.STREET_NAME
          AND t0.STREET_TYPE = t3.STREET_TYPE
          AND t0.STREET_DIRECTION = t3. STREET_DIRECTION
          AND t0.UNIT_NUMBER = t3.UNIT_NUMBER
          AND t0.CITY = T3.CITY
          AND t0.ZIP_CODE = t3.ZIP_CODE
          AND t1.f95 = 'BEAR'
      ) as f95_bear
  ) t10
)

Open in new window

0
 

Author Comment

by:pda4me
ID: 36963958
Weird, I am getting a 1054 - Unknown column 'STREET_NUMBER' in 'field list'

I verfied that this is a valid column in the table?
0
 
LVL 46

Expert Comment

by:Kent Olsen
ID: 36964040

Oops.  :)

After line 52 add this:

  FROM mytable t0



  :)

Kent
0
 

Author Comment

by:pda4me
ID: 36965431
Close, here is the script with database names in it...I am now getting a

1054 - Unknown column 'T1.CITY' in 'where clause'  

Query:
INSERT INTO idx_Bear_FMFL_Naples_Final
SELECT * FROM idx_Bear_FMFL_Naples
WHERE MLS_LISTING_ID in 
(
  SELECT
    coalesce (f95_fmfl, f95_naples, f95_bear)
  FROM
  (
    SELECT DISTINCT 
      STREET_NUMBER,
      STREET_NAME,
      STREET_TYPE,
      STREET_DIRECTION,
      UNIT_NUMBER,
      CITY,
      ZIP_CODE,
      (
        SELECT MLS_LISTING_ID
        FROM  idx_Bear_FMFL_Naples t1 
        WHERE t0.STREET_NUMBER = t1.STREET_NUMBER
          AND t0.STREET_NAME = t1.STREET_NAME
          AND t0.STREET_TYPE = t1.STREET_TYPE
          AND t0.STREET_DIRECTION = t1. STREET_DIRECTION
          AND t0.UNIT_NUMBER = t1.UNIT_NUMBER
          AND t0.CITY = T1.CITY
          AND t0.ZIP_CODE = t1.ZIP_CODE
          AND t1.f95 = 'FMFL'
      ) as f95_fmfl,
      (
        SELECT MLS_LISTING_ID
        FROM idx_Bear_FMFL_Naples t2 
        WHERE t0.STREET_NUMBER = t2.STREET_NUMBER
          AND t0.STREET_NAME = t2.STREET_NAME
          AND t0.STREET_TYPE = t2.STREET_TYPE
          AND t0.STREET_DIRECTION = t2. STREET_DIRECTION
          AND t0.UNIT_NUMBER = t2.UNIT_NUMBER
          AND t0.CITY = T2.CITY
          AND t0.ZIP_CODE = t2.ZIP_CODE
          AND t1.f95 = 'NAPLES'
      ) as f95_naples,
      (
        SELECT MLS_LISTING_ID
        FROM idx_Bear_FMFL_Naples t1 
        WHERE t0.STREET_NUMBER = t3.STREET_NUMBER
          AND t0.STREET_NAME = t3.STREET_NAME
          AND t0.STREET_TYPE = t3.STREET_TYPE
          AND t0.STREET_DIRECTION = t3. STREET_DIRECTION
          AND t0.UNIT_NUMBER = t3.UNIT_NUMBER
          AND t0.CITY = T3.CITY
          AND t0.ZIP_CODE = t3.ZIP_CODE
          AND t1.f95 = 'BEAR'
      ) as f95_bear
FROM idx_Bear_FMFL_Naples t0
  ) t10
)

Open in new window

0
 
LVL 15

Assisted Solution

by:dirknibleck
dirknibleck earned 400 total points
ID: 36965846
The T in T1.CITY is capital (also T2.CITY and T3.CITY) it should be lowercase: t1.CITY, t2.CITY, t3.CITY
0
 
LVL 46

Expert Comment

by:Kent Olsen
ID: 36965926

Line 43.  The alias should be T3, not T1.


0
 

Author Comment

by:pda4me
ID: 36966259
closer, still getting this error:

1054 - Unknown column 't1.f95' in 'where clause'

Updated Code:

INSERT INTO idx_Bear_FMFL_Naples_Final
SELECT * FROM idx_Bear_FMFL_Naples
WHERE MLS_LISTING_ID in 
(
  SELECT
    coalesce (f95_FMFL, f95_Naples, f95_BEAR)
  FROM
  (
    SELECT DISTINCT 
      STREET_NUMBER,
      STREET_NAME,
      STREET_TYPE,
      STREET_DIRECTION,
      UNIT_NUMBER,
      CITY,
      ZIP_CODE,
      (
        SELECT MLS_LISTING_ID
        FROM  idx_Bear_FMFL_Naples t1 
        WHERE t0.STREET_NUMBER = t1.STREET_NUMBER
          AND t0.STREET_NAME = t1.STREET_NAME
          AND t0.STREET_TYPE = t1.STREET_TYPE
          AND t0.STREET_DIRECTION = t1. STREET_DIRECTION
          AND t0.UNIT_NUMBER = t1.UNIT_NUMBER
          AND t0.CITY = t1.CITY
          AND t0.ZIP_CODE = t1.ZIP_CODE
          AND t1.f95 = 'FMFL'
      ) as f95_FMFL,
      (
        SELECT MLS_LISTING_ID
        FROM idx_Bear_FMFL_Naples t2 
        WHERE t0.STREET_NUMBER = t2.STREET_NUMBER
          AND t0.STREET_NAME = t2.STREET_NAME
          AND t0.STREET_TYPE = t2.STREET_TYPE
          AND t0.STREET_DIRECTION = t2. STREET_DIRECTION
          AND t0.UNIT_NUMBER = t2.UNIT_NUMBER
          AND t0.CITY = t2.CITY
          AND t0.ZIP_CODE = t2.ZIP_CODE
          AND t1.f95 = 'Naples'
      ) as f95_Naples,
      (
        SELECT MLS_LISTING_ID
        FROM idx_Bear_FMFL_Naples t3 
        WHERE t0.STREET_NUMBER = t3.STREET_NUMBER
          AND t0.STREET_NAME = t3.STREET_NAME
          AND t0.STREET_TYPE = t3.STREET_TYPE
          AND t0.STREET_DIRECTION = t3. STREET_DIRECTION
          AND t0.UNIT_NUMBER = t3.UNIT_NUMBER
          AND t0.CITY = t3.CITY
          AND t0.ZIP_CODE = t3.ZIP_CODE
          AND t1.f95 = 'BEAR'
      ) as f95_BEAR
FROM idx_Bear_FMFL_Naples t0
  ) t10
)

Open in new window

0
 
LVL 46

Accepted Solution

by:
Kent Olsen earned 1600 total points
ID: 36967995

Same thing.  :)

Lines 39 and 51 need to reference t2 and t3, respectively....


Kent
0
 

Author Closing Comment

by:pda4me
ID: 36970001
Thanks!
0
 
LVL 46

Expert Comment

by:Kent Olsen
ID: 36970439
>> The T in T1.CITY is capital (also T2.CITY and T3.CITY) it should be lowercase: t1.CITY, t2.CITY, t3.CITY

SQL is NOT case sensitive.  Upper case, lower case, and mixed case should all work equally well.



Kent
0

Featured Post

Get free NFR key for Veeam Availability Suite 9.5

Veeam is happy to provide a free NFR license (1 year, 2 sockets) to all certified IT Pros. The license allows for the non-production use of Veeam Availability Suite v9.5 in your home lab, without any feature limitations. It works for both VMware and Hyper-V environments

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

In this blog post, we’ll look at how using thread_statistics can cause high memory usage.
In this article, we’ll look at how to deploy ProxySQL.
In this video, Percona Solution Engineer Dimitri Vanoverbeke discusses why you want to use at least three nodes in a database cluster. To discuss how Percona Consulting can help with your design and architecture needs for your database and infras…
In this video, Percona Solution Engineer Rick Golba discuss how (and why) you implement high availability in a database environment. To discuss how Percona Consulting can help with your design and architecture needs for your database and infrastr…
Suggested Courses

864 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question