Solved

Remove duplicates in SQL based on specific columns only, keeping one row per group and preferring a row whose address is not blank

Posted on 2007-03-22
Last Modified: 2010-03-20
I recently got a question answered here that helped a lot with removing duplicates from my SQL Server table. I now need further assistance with the same problem. Please see my previous question and its solution here:

http://www.experts-exchange.com/Programming/Languages/SQL_Syntax/Q_22423230.html

I'd now like to specify which duplicates to remove. So if I had the table below...

id (not primary key but unique) | name  | email         | password | address
1                               | tom   | tom@aol.com   | tomrules |
2                               | sue   | sue@aol.com   | sparkle  | 25 Oak st
3                               | harry | tom@aol.com   | tomrules | 354 Elm st
4                               | sally | sally@aol.com | frisky   | 98 Walnut St
5                               | susan | sue@aol.com   | sparkle  |

I'd like to remove the duplicates above whose address is blank, so that the resulting table would be:

id (not primary key but unique) | name  | email         | password | address
2                               | sue   | sue@aol.com   | sparkle  | 25 Oak st
3                               | harry | tom@aol.com   | tomrules | 354 Elm st
4                               | sally | sally@aol.com | frisky   | 98 Walnut St

Please give me a solution that fits the format of the previously accepted answer, as this worked best for me:

delete from members where id not in (select min(id) id from members group by (email+password))

Thank you very much

Question by:trevoray
 

Expert Comment

by:Steve Dubyo
Does this do what you want?

delete from members where id not in (select min(id) id from members where address is not null group by (email+password))
 

Author Comment

by:trevoray
OK, here's a problem; I should have mentioned this in the question. Most importantly, I need to get rid of duplicates. Sometimes there will be duplicates where the address field is filled out for both rows, and sometimes there will be duplicates where only one address field is filled out. In that case, I need the row with the address filled out to stay. But if both duplicates have an address, or if neither has one, I still need one of the duplicates removed. The above code looks like it will only remove a duplicate if one of the address fields is blank. Can you help provide code that will remove duplicates (email/password) in this fashion?

Thanks
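
The keep/delete rule described in this comment can be made concrete with a dry run. The following is a minimal sketch, not something posted in the thread: the temp table name #members_demo, the column types, and rows 6 to 9 are hypothetical, and "blank" is assumed to mean either NULL or an empty string. For each email/password pair it ranks rows with an address first, then by lowest id, and marks the top row as the one to keep.

```sql
-- Hypothetical demo table; column types are assumptions, not from the thread.
CREATE TABLE #members_demo (
    id       INT NOT NULL UNIQUE,
    name     VARCHAR(50),
    email    VARCHAR(100),
    password VARCHAR(50),
    address  VARCHAR(100) NULL
);

-- Rows 1 to 5 come from the question; rows 6 to 9 are made up to cover
-- "both have an address" and "neither has an address".
INSERT INTO #members_demo VALUES (1, 'tom',   'tom@aol.com',   'tomrules', NULL);
INSERT INTO #members_demo VALUES (2, 'sue',   'sue@aol.com',   'sparkle',  '25 Oak st');
INSERT INTO #members_demo VALUES (3, 'harry', 'tom@aol.com',   'tomrules', '354 Elm st');
INSERT INTO #members_demo VALUES (4, 'sally', 'sally@aol.com', 'frisky',   '98 Walnut St');
INSERT INTO #members_demo VALUES (5, 'susan', 'sue@aol.com',   'sparkle',  NULL);
INSERT INTO #members_demo VALUES (6, 'bob',   'bob@aol.com',   'bobpw',    '1 Pine St');
INSERT INTO #members_demo VALUES (7, 'bobby', 'bob@aol.com',   'bobpw',    '2 Pine St');
INSERT INTO #members_demo VALUES (8, 'ann',   'ann@aol.com',   'annpw',    NULL);
INSERT INTO #members_demo VALUES (9, 'annie', 'ann@aol.com',   'annpw',    NULL);

-- Label every row without deleting anything.
SELECT m.id, m.name, m.email, m.address,
       CASE WHEN m.id = (
                SELECT TOP 1 k.id
                FROM #members_demo AS k
                WHERE k.email = m.email
                  AND k.password = m.password
                -- rows with an address sort first, then the lowest id wins
                ORDER BY CASE WHEN k.address IS NULL OR k.address = '' THEN 1 ELSE 0 END,
                         k.id
            ) THEN 'keep' ELSE 'delete' END AS action
FROM #members_demo AS m
ORDER BY m.email, m.password, m.id;

DROP TABLE #members_demo;
```

With these rows, ids 2, 3, 4, 6 and 8 are labeled 'keep' and ids 1, 5, 7 and 9 are labeled 'delete'.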
 

Author Comment

by:trevoray
What about this? Does anyone think this would work?

delete from members where id not in (select top 1 id from members group by (email+password) ORDER BY address)

That would put the NULL address at the top and select only the first row. I think this might work.

 

Author Comment

by:trevoray
No, that doesn't work: I can't order by a column that I'm not selecting in the returned results, and I'm only allowed to select id when I use "NOT IN".

Can anyone help?

tks
 

Accepted Solution

by:Steve Dubyo (earned 500 total points)
Yes, that is a bit more in depth! I got curious, though, and gave it a go: I recreated the table and got this working...

It looks like a lot, but what it does is delete every row from the members table whose id is not selected by one of three rules:
1, It is the only row with its combination of email & pwd
2, It is the min(id) among rows with an address, where email+pwd is not unique
3, It is the min(id) among rows without an address, where email+pwd is not unique and no row in that group has an address (so rule 2 does not cover it)

delete
from members
where id not in
      (
      select i
      from
            (
            select (email+password) e, min(id) i, count(id) c
            from members
            group by (email+password)
            ) s
      where c = 1
      union
      select  min(i) i
      from
            (
            select (email+password) e, id i
            from members a
            join
                  (
                  select count(m.id) c, (m.email+m.password) e
                  from members m
                  join
                        (
                        select count(id) c , (email+password) e
                        from members
                        group by (email+password)
                        ) t
                  on (m.email+m.password) = t.e
                  where c > 1  
                  group by (m.email+m.password)
                  ) d
            on (a.email+a.password) = e
            where address is null
            ) n
      where e not in
            (
                  select (email+password) e
                  from members a
                  join
                        (
                        select count(m.id) c, (m.email+m.password) e
                        from members m
                        join
                              (
                              select count(id) c , (email+password) e
                              from members
                              group by (email+password)
                              ) t
                        on (m.email+m.password) = t.e
                        where c > 1  
                        group by (m.email+m.password)
                        ) d
                  on (a.email+a.password) = e
                  where address is  not null
                  group by (email+password)
            )
      group by e
      union
      select  min(id) i
      from members a
      join
            (
            select count(m.id) c, (m.email+m.password) e
            from members m
            join
                  (
                  select count(id) c , (email+password) e
                  from members
                  group by (email+password)
                  ) t
            on (m.email+m.password) = t.e
            where c > 1  
            group by (m.email+m.password)
            ) d
      on (a.email+a.password) = e
      where address is  not null
      group by (email+password)
      )
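
A more compact variant is sketched below for comparison. It is not the statement that was tested in this thread; it assumes the same members table, and it assumes "blank" means either NULL or an empty string. The three rules collapse into one ordering per email/password pair: rows with an address come first, then the lowest id, and every row that is not first in its group is deleted. It compares email and password as separate columns rather than concatenating them, so pairs such as 'ab' + 'c' and 'a' + 'bc' are not treated as equal.

```sql
-- Sketch only: run inside a transaction so the result can be inspected first.
BEGIN TRANSACTION;

DELETE FROM members
WHERE id <> (
    SELECT TOP 1 k.id
    FROM members AS k
    WHERE k.email = members.email
      AND k.password = members.password
    -- rows with an address sort first, then the lowest id wins
    ORDER BY CASE WHEN k.address IS NULL OR k.address = '' THEN 1 ELSE 0 END,
             k.id
);
-- Rows whose email or password is NULL never match the subquery
-- and are therefore left untouched.

SELECT id, name, email, password, address
FROM members
ORDER BY id;

ROLLBACK TRANSACTION;  -- replace with COMMIT TRANSACTION once the result looks right
```

Against the five sample rows in the question, this leaves ids 2, 3 and 4. The correlated subquery also sidesteps the problem with the earlier TOP 1 attempt, because the ORDER BY lives inside a subquery that has no GROUP BY and is evaluated once per outer row.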

I'm using SQL Server, by the way, in case the syntax differs. What are you using?
 

Author Comment

by:trevoray
I'm using SQL Server. Thanks!
 

Expert Comment

by:Steve Dubyo
No problem !