Solved

Problems deleting duplicate rows from SQL table

Posted on 2004-04-13
871 Views
Last Modified: 2008-02-01
Hi all

I've hunted and hunted for a solution to this problem, but to no avail - I'll try to explain.

I have a table 'tbl_arc_leeds_sent' which has 18 columns, one of which, 'itemID', has been set as IDENTITY.

Through being a complete idiot one day, I managed to TRIPLICATE 5738 rows of data in said table.  I'm now stuck with trying to remove the 11,476 rows that are not required.

However, some of the column types are ntext, so the often-offered 'SELECT DISTINCT' solution won't work - ntext columns can't be compared or used with DISTINCT.

I'm not concerned with any column other than 'itemID', as this is my key to knowing whether the rows are duplicated; that's enough of a check for me for this table.

I can find all of the duplicated entries; Query Analyzer shows all 17,214 rows - but since each unique entry shows up three times, all three with the same 'itemID', I'm having trouble deleting just the surplus copies.
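
For reference, the duplicate check I'm running is roughly this (just a sketch - itemID is the only column I match on):

SELECT itemID, COUNT(*) AS copies
FROM tbl_arc_leeds_sent
GROUP BY itemID
HAVING COUNT(*) > 1
-- 5738 itemIDs come back, each with copies = 3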

Hopefully someone will be able to help me out!

Cheers in advance.
Matt
Question by:thisismatt

23 Comments
 
LVL 75

Expert Comment

by:Anthony Perkins
ID: 10813159
I see a contradiction here:
>> one of which is 'itemID' which has been set as IDENTITY.<<
...
>>each unique entry shows up three times, each of these with the same 'itemID' three times, I'm having trouble.<<

This should not be possible.
 
LVL 34

Expert Comment

by:arbert
ID: 10813175
So, if you triplicated entries and there was an identity column, couldn't you have just deleted the ones with an identity greater than your last known good identity?
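
Something like this, say (just a sketch - @lastGoodId is hypothetical, the last itemID from before the mishap):

DECLARE @lastGoodId int
SET @lastGoodId = 5738  -- hypothetical: last known good identity value
DELETE FROM tbl_arc_leeds_sent
WHERE itemID > @lastGoodId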
 

Author Comment

by:thisismatt
ID: 10813481
My bad, sorry acperkins - I was looking at the wrong table; it's the sister table that has the IDENTITY!

To clarify, there is NO IDENTITY column in the table.
 
LVL 75

Expert Comment

by:Anthony Perkins
ID: 10813763
Can you post some sample data (obfuscated if needed)?
 
LVL 7

Expert Comment

by:ChrisFretwell
ID: 10813779
Yuck, yuck and yuck.  Even if you don't have an identity column, do you have something that you can join the table to?

One way around this, while not elegant, will at least clean up your data.

Create a blank copy of your table.
Select the distinct non-text columns into the new table, and
then update the text columns by joining back to your original table on as many columns as you need; since the rows are exact duplicates, the text values will be the same whichever copy the join picks up. Something like the sketch below.
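
A rough sketch of the idea (tbl_arc_dedup, col1/col2 and textcol are placeholder names - swap in your real columns):

-- 1) empty copy of the table, keeping all columns including the ntext ones
SELECT * INTO tbl_arc_dedup FROM tbl_arc_leeds_sent WHERE 1 = 0

-- 2) one row per duplicate group, non-text columns only
INSERT INTO tbl_arc_dedup (itemID, col1, col2)
SELECT DISTINCT itemID, col1, col2
FROM tbl_arc_leeds_sent

-- 3) fill in the ntext columns; the duplicates are identical, so any matching row will do
UPDATE d
SET textcol = s.textcol
FROM tbl_arc_dedup d
JOIN tbl_arc_leeds_sent s ON s.itemID = d.itemID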

Chris
 
LVL 10

Expert Comment

by:Shailesh15
ID: 10813822
Then you have to write a T-SQL procedure to delete. You will need to modify it for your table.
***************************
DECLARE @iErrorVar int,
--table fields
@vName varchar(30),
@vId int,

@iCount int,
@chCount char(3),
@nvchCommand nvarchar(4000)

SET ROWCOUNT 0
SET NOCOUNT ON

-- Build cursor to find duplicated information

DECLARE DelDupe CURSOR FOR
SELECT COUNT(*) AS cnt, departmentName,department_Id
FROM departments
GROUP BY department_id,departmentName
HAVING COUNT(*) > 1

OPEN DelDupe

FETCH NEXT FROM DelDupe INTO @iCount,@vName,@vId

WHILE (@@fetch_status = 0)

BEGIN

-- Calculate number of rows to delete for each grouping by subtracting
-- 1 from the total count for a given group.
      SELECT @iCount = @iCount - 1
      SELECT @chCount = CONVERT(char(3),@iCount)

-- now build the rowcount and delete statements.

      SELECT @nvchCommand = N'SET ROWCOUNT ' + @chCount +
                        'DELETE departments ' +
                        ' WHERE department_id = ' +   convert(char(5),@vId)  +
                        ' AND departmentname = ''' +  @vName + ''''

-- print the statement.
      PRINT  @nvchCommand

-- execute the statement.
      EXEC sp_executesql @nvchCommand
      SELECT @iErrorVar = @@Error

      IF @iErrorVar <> 0
      BEGIN
            RETURN
      END

      FETCH NEXT FROM DelDupe INTO @iCount,@vName,@vId
END

CLOSE DelDupe
DEALLOCATE DelDupe
RETURN
****************************************
 
LVL 12

Expert Comment

by:geotiger
ID: 10813940

When you say duplicated rows, do you mean they are identical (all the corresponding columns have the same value)? Then it is easy to de-duplicate them. I did it once before without writing an SP. I will try to find the code after my meeting today.

GT
 
LVL 69

Expert Comment

by:ScottPletcher
ID: 10814156
If you don't have a datetime or some other identifying value, I do think you will have to use a cursor, but you don't need dynamic SQL.  For example:


DECLARE dupsCsr CURSOR READ_ONLY FOR
SELECT [itemId], COUNT(*) AS numDups
FROM tbl_arc_leeds_sent
GROUP BY [itemId]
HAVING COUNT(*) > 1
DECLARE @itemId INT
DECLARE @numDups INT

OPEN dupsCsr
FETCH NEXT FROM dupsCsr INTO @itemId, @numDups
WHILE @@FETCH_STATUS = 0
BEGIN
      SET @numDups = @numDups - 1 --delete all but 1 of the duplicates
      SET ROWCOUNT @numDups
      DELETE FROM tbl_arc_leeds_sent
      WHERE [itemId] = @itemId
      FETCH NEXT FROM dupsCsr INTO @itemId, @col2, @col3, @numDups
END --WHILE
CLOSE dupsCsr
DEALLOCATE dupsCsr

SET ROWCOUNT 0 --restore default
 
LVL 12

Expert Comment

by:geotiger
ID: 10816132
I do not know whether this is what you need. You can create a new table to hold the unique records. The following is what I tested:

CREATE TABLE EMP1 (
    itemID       numeric (8)    NOT NULL,
    DEPT_ID      NUMeric (8)    NOT NULL
)
go

insert into emp1 values(500,100);
insert into emp1 values(501,100);
insert into emp1 values(502,100);
insert into emp1 values(503,100);
insert into emp1 values(504,100);
insert into emp1 values(500,100);
insert into emp1 values(502,100);

select * from emp1

-- Hold unique records in a new table by grouping on all the columns
select * into emp2
 from emp1
group by itemID, dept_id
go

/*-----------------------------
select * from emp2;
-----------------------------*/
itemID,DEPT_ID
500,100
501,100
502,100
503,100
504,100

(5 row(s) affected)

The duplicates are removed.

GT


 

Author Comment

by:thisismatt
ID: 10822025
Oh my word, this is going into depths I've never been to before!

Geotiger - If I look up row 1000, I'll get three rows returned.  Each one of those rows has identical data in each column.
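
i.e. roughly:

SELECT * FROM tbl_arc_leeds_sent WHERE itemID = 1000
-- comes back with 3 rows, identical in every column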

All of the solutions shown above are VERY new to me (I only do really, REALLY, simple stuff with SQL!), so I'm going to try them in the order they were submitted - cos to be honest, none of them mean anything to me; I don't understand many of the functions being used.

I'm sure I'll come up with issues with each of them, as they weren't written with my table in mind, so I'll post back issues that I have.

Thanks in advance though :)
 

Author Comment

by:thisismatt
ID: 10822121
Hmmm, having now looked through what has been posted, I am really confused.  Sorry guys, but you're dealing with a rank amateur!

Shailesh15 - Which bits do I need to tweak exactly?  I don't want to start changing things that I think need to be changed and mess it all up.

ScottPletcher - I got the following returned: Server: Msg 137, Level 15, State 1, Line 17
Must declare the variable '@col2'.

Geotiger - Sorry, I should have seen your latest entry before replying to your first!  As with Shailesh15's, which bits do I need to change for my scenario?

I guess it would make life easier for us all if I tell you the following:

Table Name:
tbl_arc_leeds_sent

Column Names:
itemID (int, Not Null)
Eate_Of_Export (smalldatetime, Null)
CreationTime (datetime, Null)
LastModificationTime (datetime, Null)
ReceivedTime (datetime, Null)
Importance (nvarchar(50), Null)
Sensitivity (nvarchar(50), Null)
SenderName (nvarchar(255), Null)
ReceivedByName (nvarchar(255), Null)
ToAddress (ntext, Null)
CCAddress (ntext, Null)
Subject (ntext, Null)
Body (ntext, Null)
EmailSize (nvarchar(50), Null)
ReadReceiptRequested (nvarchar(50), Null)
OriginalDeliveryReportRequested (nvarchar(50), Null)
Attachments (ntext, Null)

That's how my table is built (I inherited it, before anyone says anything!), so hopefully that'll assist with my problem(s).

Cheers
Matt
 
LVL 10

Expert Comment

by:Shailesh15
ID: 10822824
Strictly speaking, we need to test each field to make sure the rows are exactly the same. Since you are sure the data is duplicated, here is a version that checks only itemID: if two rows have the same ID, it is taken as a duplicate.

Run the code in SQL Query Analyzer. (Please back up the data before you proceed - see the sketch below.)
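
One quick way to take that backup (tbl_arc_leeds_sent_bak is just a made-up name - any spare table name will do):

SELECT * INTO tbl_arc_leeds_sent_bak
FROM tbl_arc_leeds_sent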

*********************************************************
DECLARE @iErrorVar int,
--table fields
@itemID int,


@iCount int,
@chCount char(3),
@nvchCommand nvarchar(4000)

SET ROWCOUNT 0
SET NOCOUNT ON

-- Build cursor to find duplicated information

DECLARE DelDupe CURSOR FOR
    SELECT COUNT(*) AS cnt, itemID
         FROM tbl_arc_leeds_sent
       GROUP BY itemID
      HAVING COUNT(*) > 1

OPEN DelDupe

FETCH NEXT FROM DelDupe INTO @iCount,@itemID

WHILE (@@fetch_status = 0)

BEGIN

-- Calculate number of rows to delete for each grouping by subtracting
-- 1 from the total count for a given group.
     SELECT @iCount = @iCount - 1
     SELECT @chCount = CONVERT(char(3),@iCount)

-- now build the rowcount and delete statements.

     SELECT @nvchCommand = N'SET ROWCOUNT ' + @chCount +
                    'DELETE tbl_arc_leeds_sent' +
                    ' WHERE itemID= ' +   convert(char(5),@itemID)  


-- print the statement.
     PRINT  @nvchCommand

-- execute the statement.
     EXEC sp_executesql @nvchCommand
     SELECT @iErrorVar = @@Error

     IF @iErrorVar <> 0
     BEGIN
          RETURN
     END

     FETCH NEXT FROM DelDupe INTO @iCount,@itemID
END

CLOSE DelDupe
DEALLOCATE DelDupe
RETURN
********************************************************************
 
LVL 12

Expert Comment

by:geotiger
ID: 10822844
If my assumption is correct, i.e., the duplicated records are identical in every column, then you do not care which one of the duplicates is moved to the new table, so you can use my suggested method. Here are the steps involved:

1. create a new table to hold the unique records

select * into tbl_arc_temp
from tbl_arc_leeds_sent
group by itemID, Eate_Of_Export, CreationTime, LastModificationTime,
    ReceivedTime, Importance, Sensitivity, SenderName, ReceivedByName,
    ToAddress, CCAddress, Subject, Body, EmailSize, ReadReceiptRequested,
    OriginalDeliveryReportRequested, Attachments

2. if you have disk space, make a backup copy of the original table

select * into tbl_arc_backup
from tbl_arc_leeds_sent


3. drop your original table

drop table tbl_arc_leeds_sent
go

4. move the de-duplicated records back to the original table

select * into tbl_arc_leeds_sent
from tbl_arc_temp
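
5. once you have checked the data, drop the working table

drop table tbl_arc_temp
go

(One caveat: GROUP BY, like SELECT DISTINCT, cannot include ntext columns, so the ntext columns in step 1 may need to be handled separately - for example with the join-back approach Chris described above.)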

Hope this helps.

GT
 

Author Comment

by:thisismatt
ID: 10823273
Shailesh15, I got the following (after about 15 seconds):

SET ROWCOUNT 2  DELETE tbl_arc_leeds_sent WHERE itemID= *    
Server: Msg 170, Level 15, State 1, Line 1
Line 1: Incorrect syntax near '*'.
 

Author Comment

by:thisismatt
ID: 10823374
Geotiger, I got the following (pretty much immediately):

Server: Msg 8120, Level 16, State 1, Line 1
Column 'tbl_arc_leeds_sent.SentOn' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.
 
LVL 10

Accepted Solution

by: Shailesh15 earned 500 total points
ID: 10823440
This is because of the length of the itemID: convert(char(5), ...) returns '*' when the value does not fit into 5 characters. Now I have set it to 8 digits:
 convert(char(8),@itemID)  
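
You can see the overflow on its own in Query Analyzer:

SELECT CONVERT(char(5), 123456)  -- too wide for char(5): returns '*'
SELECT CONVERT(char(8), 123456)  -- fits: returns '123456' padded to 8 chars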

try this...
*******************************
DECLARE @iErrorVar int,
--table fields
@itemID int,


@iCount int,
@chCount char(3),
@nvchCommand nvarchar(4000)

SET ROWCOUNT 0
SET NOCOUNT ON

-- Build cursor to find duplicated information

DECLARE DelDupe CURSOR FOR
    SELECT COUNT(*) AS cnt, itemID
        FROM tbl_arc_leeds_sent
      GROUP BY itemID
     HAVING COUNT(*) > 1

OPEN DelDupe

FETCH NEXT FROM DelDupe INTO @iCount,@itemID

WHILE (@@fetch_status = 0)

BEGIN

-- Calculate number of rows to delete for each grouping by subtracting
-- 1 from the total count for a given group.
     SELECT @iCount = @iCount - 1
     SELECT @chCount = CONVERT(char(8),@iCount)

-- now build the rowcount and delete statements.

     SELECT @nvchCommand = N'SET ROWCOUNT ' + @chCount +
                    'DELETE tbl_arc_leeds_sent' +
                    ' WHERE itemID= ' +   convert(char(8),@itemID)  


-- print the statement.
     PRINT  @nvchCommand

-- execute the statement.
     EXEC sp_executesql @nvchCommand
     SELECT @iErrorVar = @@Error

     IF @iErrorVar <> 0
     BEGIN
          RETURN
     END

     FETCH NEXT FROM DelDupe INTO @iCount,@itemID
END

CLOSE DelDupe
DEALLOCATE DelDupe
RETURN
 
LVL 69

Expert Comment

by:ScottPletcher
ID: 10823526
D'OH, forgot to change the second FETCH.


DECLARE dupsCsr CURSOR READ_ONLY FOR
SELECT [itemId], COUNT(*) AS numDups
FROM tbl_arc_leeds_sent
GROUP BY [itemId]
HAVING COUNT(*) > 1
DECLARE @itemId INT
DECLARE @numDups INT

OPEN dupsCsr
FETCH NEXT FROM dupsCsr INTO @itemId, @numDups
WHILE @@FETCH_STATUS = 0
BEGIN
     SET @numDups = @numDups - 1 --delete all but 1 of the duplicates
     SET ROWCOUNT @numDups
     DELETE FROM tbl_arc_leeds_sent
     WHERE [itemId] = @itemId
     FETCH NEXT FROM dupsCsr INTO @itemId, @numDups
END --WHILE
CLOSE dupsCsr
DEALLOCATE dupsCsr

SET ROWCOUNT 0 --restore default
 

Author Comment

by:thisismatt
ID: 10823830
Shailesh15, it worked!  Not meaning to sound surprised, just VERY pleased!!

It took 20 minutes to complete, but it was worth the wait!

Thank you so much.

I now realise that I managed to click the wrong "Accept Answer" button - how do I ensure Shailesh15 gets the credit?!?  Erk!
 
LVL 12

Expert Comment

by:geotiger
ID: 10823832

I wrote the code based on the columns that you listed in your message. Just add the columns you did not list in your previous posting (such as SentOn); basically, you need to include every column of the table in the GROUP BY statement.

You can find out the column names using

sp_columns tbl_arc_leeds_sent

Or

you can use Query Analyzer's GUI to look for the column names.


Let me know if it works.

 
LVL 12

Expert Comment

by:geotiger
ID: 10823882

You may have to ask an EE administrator to change that. I believe that my method would work as well, since I have used it to finish a similar project.

 
LVL 75

Expert Comment

by:Anthony Perkins
ID: 10823956
>>I now realise that I managed to click the wrong "Accept Answer" button - how do I ensure Shailesh15 gets the credit?!?  Erk!<<
I accepted the wrong answer. Now what?
http://www.experts-exchange.com/help.jsp#hi17
 

Author Comment

by:thisismatt
ID: 10824717
Wow, you guys on here really are helpful!

Thanks acperkins.

Geotiger - I don't doubt that your method would work, but for fairness I tried them in the order that they were posted - I couldn't think of a fairer way I'm afraid.
