Partially duplicated records

SELECT ID 
FROM 
table1 A 
WHERE 
rowid > 
(SELECT min(rowid) FROM table1 B 
WHERE 
B.ID = A.ID);

Open in new window


The Above code will select duplicate rows, but will only return the rowID greater than one...  We need to return both duplicated rows.

In addition, the records we seek have duplicate values in some fields, but not all.  We need to return all fields for rows where field one, two and three are duplicated, but fields four five and six are not duplicated... And again, we need to return all fields within the partially duplicated records.

create table  table1(field1 number, field2 number, field3 number, field4 number, field5 number, field6 number) ;

insert into table1 values(1, 2, 3, 4, 5, 6);
insert into table1 values(1, 2, 3, 9, 9, 9);
insert into table1 values(1, 5, 3, 54, 15, 26);
insert into table1 values(1, 5, 3, 14, 65, 56);
insert into table1 values(9, 5, 3, 14, 65, 56);
insert into table1 values(7, 5, 3, 14, 65, 56); 

Open in new window


Expected results:

field1 field2 field3 field4 field5 field6
  1        2       3        4        5       6
  1        2       3        9        9       9
  1        5       3       54      15     26
  1        5       3       14      65     56

We need returned the two sets of two rows that are matched in the first three fields of each set.

Any and all suggestion & insights appreciated!!
J RAsked:
Who is Participating?
 
slightwv (䄆 Netminder) Commented:
First way I thought of:
select field1, field2 field3, field4, field5, field6
from (
	select field1, field2, field3, field4, field5, field6
		, count(*) over(partition by field1, field2, field3 order by field1, field2, field3) col_count
	from table1
)
where col_count > 1
/

Open in new window

0
 
slightwv (䄆 Netminder) Commented:
Is this really not a duplicate of your previous question?

http://www.experts-exchange.com/Database/Oracle/Q_28636322.html#a40665600

It seems the same to me.
0
 
J RAuthor Commented:
technically it may be.  I arrived at the two questions from different problems, but if you feel they are the same, do whatever needs to be done..

it seems the solutions are different,,,

Here we are asking about duplicates, and there it  was asking about distinct values... and when  state it like that, they seem real similar, but not duplicates.. maybe a bit distinct?
0
Cloud Class® Course: CompTIA Cloud+

The CompTIA Cloud+ Basic training course will teach you about cloud concepts and models, data storage, networking, and network infrastructure.

 
J RAuthor Commented:
using the script at slightwv2015-03-17 at 12:11:04ID: 40671579

no rows selected.  Also pulled out the sub query and no rows selected there as well.
0
 
slightwv (䄆 Netminder) Commented:
Given the sample data, I get the same results using the SQL I posted here and the example posted there.

I believe mine is more efficient but this is a small sample and you should test both to see which one works best for your environment.

Here is my setup for both queries:
--drop table  table1;
create table  table1(field1 number, field2 number, field3 number, field4 number, field5 number, field6 number) ;

insert into table1 values(1, 2, 3, 4, 5, 6);
insert into table1 values(1, 2, 3, 9, 9, 9);
insert into table1 values(1, 5, 3, 54, 15, 26);
insert into table1 values(1, 5, 3, 14, 65, 56);
insert into table1 values(9, 5, 3, 14, 65, 56);
insert into table1 values(7, 5, 3, 14, 65, 56); 
commit;
                                  
set echo on

select field1, field2, field3, field4, field5, field6
from (
	select field1, field2, field3, field4, field5, field6
		, count(*) over(partition by field1, field2, field3 order by field1, field2, field3) col_count
	from table1
)
where col_count > 1
/
                                          


SELECT field1, 
       field2, 
       field3, 
       field4, 
       field5, 
       field6 
FROM   table1 
WHERE  ( field1, field2, field3) IN (SELECT field1, field2, field3
                                           FROM   table1 
                                           GROUP  BY field1, field2, field3
                                           HAVING Count(1) > 1); 
                                          
set echo off

Open in new window



My results for both queries:
SQL> select field1, field2, field3, field4, field5, field6
  2  from (
  3          select field1, field2, field3, field4, field5, field6
  4                  , count(*) over(partition by field1, field2, field3 order b
y field1, field2, field3) col_count
  5          from table1
  6  )
  7  where col_count > 1
  8  /

    FIELD1     FIELD2     FIELD3     FIELD4     FIELD5     FIELD6
---------- ---------- ---------- ---------- ---------- ----------
         1          2          3          4          5          6
         1          2          3          9          9          9
         1          5          3         54         15         26
         1          5          3         14         65         56

SQL>
SQL>
SQL>
SQL> SELECT field1,
  2         field2,
  3         field3,
  4         field4,
  5         field5,
  6         field6
  7  FROM   table1
  8  WHERE  ( field1, field2, field3) IN (SELECT field1, field2, field3
  9                                             FROM   table1
 10                                             GROUP  BY field1, field2, field3

 11                                             HAVING Count(1) > 1);

    FIELD1     FIELD2     FIELD3     FIELD4     FIELD5     FIELD6
---------- ---------- ---------- ---------- ---------- ----------
         1          2          3          4          5          6
         1          2          3          9          9          9
         1          5          3         54         15         26
         1          5          3         14         65         56

SQL>

Open in new window

0
 
Mark GeerlingsDatabase AdministratorCommented:
Usually the quickest way to find the duplicates in a situation like this is to start by creating a non-unique index on the three columns that you are concerned about, if you don't already have an index that includes these three columns.  You can create one like this:

create index dup_index on table1
(filed1, field2, field3);

Then i usually use SQL query syntax that I find simpler to understand (but maybe not quite as efficent?) like this:

select t1.field1, t1.field2, t1.field3, t1.field4, t1.field5, t1.field6, t1.rowid
from table1 t1
where t1.rowid in (select t2.rowid from table1 t2
  where t2.field1 = t1.field1
    and t2.field2 = t1.field2
    and t2.field3 = t1.field3
    and t2.rowid <> t1.rowid)
order by t1.field1, t1.field2, t1.field3, t1.rowid;

I think the index I suggested will also help slightwv's suggestion to run faster, if your table has more than a few hundred records.  You can add the word "nologging" to the end of the "create index..." statement to allow it to complete in about half the time, if you don't use DataGuard.  And, add the word "online" if this table is very active and you can't create the index without this keyword.  Those two options are independent of each other.  You can use either one without the other.
0
Question has a verified solution.

Are you are experiencing a similar issue? Get a personalized answer when you ask a related question.

Have a better answer? Share it in a comment.

All Courses

From novice to tech pro — start learning today.