Removing duplicate rows

We have a "document archive" storage table that looks like this:
FD_RID   KEYWORD  VALUE                    INSUSER  INSDATE
93771    CRID     892448                   CC3      10/25/2005 3:47:09 PM
93771    CRID     892448                   CC3      11/1/2005 12:08:23 PM
93771    DATE     20060101                 CC3      10/25/2005 3:47:09 PM
93771    DATE     20060101                 CC3      11/1/2005 12:08:23 PM
93771    DIST     INT                      AHZ      11/2/2007 11:32:24 AM
93771    DNAME    CDues Inv                CC3      10/25/2005 3:47:09 PM
93771    DNAME    CDues Inv                CC3      11/1/2005 12:08:23 PM
93771    FNAME    \892448\CDues\2006CDues  CC3      10/21/2005 5:11:43 PM
93771    RDATE    20051021                 CC3      10/25/2005 3:47:09 PM
93771    RDATE    20051021                 CC3      11/1/2005 12:08:23 PM

Open in new window

That is the "record" for one single document. Think of this as a kind of tuple table where we have multiple "columns" that are defined by the KEYWORDs. A document may have more, less or even different KEYWORDs than above.

We have 30,000,000 records in the table and literally thousands of records where we have duplicate entries like the ones above,
CRID, DNAME, RDATE

How would write a query to delete the oldest record (based on INSDATE) that has a duplicate. Sometimes they will have the same INSDATE but most of the time, one of them will be newer.
LVL 26
Eddie ShipmanAll-around developerAsked:
Who is Participating?
I wear a lot of hats...

"The solutions and answers provided on Experts Exchange have been extremely helpful to me over the last few years. I wear a lot of hats - Developer, Database Administrator, Help Desk, etc., so I know a lot of things but not a lot about one thing. Experts Exchange gives me answers from people who do know a lot about one thing, in a easy to use platform." -Todd S.

slightwv (䄆 Netminder) Commented:
Try something like this:
delete from tab1 where rowid in (
	select myrowid from (
		select rowid myrowid, row_number() over(partition by fd_rid, keyword, value, insuser order by fd_rid, keyword, value, insuser, insdate desc) rn
		from some_table
	) where rn > 1
)
/

Open in new window

0

Experts Exchange Solution brought to you by

Your issues matter to us.

Facing a tech roadblock? Get the help and guidance you need from experienced professionals who care. Ask your question anytime, anywhere, with no hassle.

Start your 7-day free trial
Karthik RamachandranCommented:
DELETE FROM MyTable
WHERE rowid in (Select
                   rowid
                From
                   MyTable
                GROUP By
                  FD_RID,
                  KEYWORD,
                  VALUE,
                  INSURER,
                  INSDATE
                HAVING
                  INSDATE = MIN(INSDATE)
                AND COUNT(1) > 1);

Open in new window

0
slightwv (䄆 Netminder) Commented:
Karthik,
I don't think that will work.  You'll get a ora-00979 on rowed from the select.
0
Ultimate Tool Kit for Technology Solution Provider

Broken down into practical pointers and step-by-step instructions, the IT Service Excellence Tool Kit delivers expert advice for technology solution providers. Get your free copy now.

Eddie ShipmanAll-around developerAuthor Commented:
@Karthik, getting rowid not a GROUP BY expression.
0
slightwv (䄆 Netminder) Commented:
Try mine.  It was tested against version 11.2.0.2.
0
Farzad AkbarnejadDeveloperCommented:
Hello,
Try the following tested lines of code:
delete from tuple where (keyword, insdate) in (
select  keyword, insdate from (
select t.*, row_number() over (partition by keyword order by insdate) rn from TUPLE t ) where rn=1
);
commit;

Open in new window

-FA
0
pcelbaCommented:
If you have duplicate rows which do not differ in any column value then it is easier to create a new table without multiplicities... because you cannot delete one of the two or more equal rows.

1. Create a new table having structure same as the original table plus one column for row number:
CREATE TABLE YourNewTable ... etc.   (row number column can be named rn  with integer data type)

2. Insert all records from the old table into this new table
INSERT INTO YourNewTable ( ... possible column list ...)
SELECT *, ROW_NUMBER() OVER (PARTITION BY FD_RID, KEYWORD, VALUE, INSUSER ORDER BY INSDATE DESC) AS rn
  FROM YourOldTable

3. Delete duplicities
DELETE FROM YourNewTable WHERE rn > 1

4. Now you have a few options how to finish this task, e.g.:
4a. You may delete all rows from the old table and insert all rows from the new table
4b. You may delete the old table, rename the new table, and remove the rn column

Update:  The first solution should work, it just does not create any backup...
0
slightwv (䄆 Netminder) Commented:
>>Try the following tested lines of code:

I believe you will need to partition by more than just keyword.

When you add the necessary columns, then you will have what I posted.
0
It's more than this solution.Get answers and train to solve all your tech problems - anytime, anywhere.Try it for free Edge Out The Competitionfor your dream job with proven skills and certifications.Get started today Stand Outas the employee with proven skills.Start learning today for free Move Your Career Forwardwith certification training in the latest technologies.Start your trial today
Oracle Database

From novice to tech pro — start learning today.

Question has a verified solution.

Are you are experiencing a similar issue? Get a personalized answer when you ask a related question.

Have a better answer? Share it in a comment.