Link to home
Start Free TrialLog in
Avatar of Member_2_957366
Member_2_957366

asked on

Comparing Duplicate Records between Columns

I need to run a simple query to produce a comparable results between the columns and need your help.  Here is the sample table info:

Column 1                                Column 2        Column 3     Column 4
      
00001111                                     XXXXX            XXXX1            XXXX2
00002222                                     YYYYY               SSSS            XXXXX
00003333                                     ZZZZ                                 SSSS
00004444                                     SSSS              SSSS1

 
Each row has 4 columns.  Column 1 is a unique key so there won't be any duplicate records.  Columns 2, 3 and 4 are not unique so we might have the same record from other columns as well.  In the sample above, Record "XXXXX" from Row 1, Column 2 is also in Row 2, Column 4.  The same thing applies to record "SSSS" where it is in Row 2, Column3, Row 3, Column 4 and Row 4, column 2.

I need to produce a result showing these duplicate records.  Database is Oracle 8i, running Windows NT4.

Thanks....
Avatar of Mark Geerlings
Mark Geerlings
Flag of United States of America image

This example assumes that your table is named: "my_table".

select * from my_table t1
where exists (select 1 from my_table t2
  where t2.column1 <> t1.column1
    and (t2.column2 = t1.column2
      or t2.column3 = t1.column2
      or t2.column4 = t1.column2
      or t2.column2 = t1.column3
      or t2.column3 = t1.column3
      or t2.column4 = t1.column3
      or t2.column2 = t1.column4
      or t2.column3 = t1.column4
      or t2.column4 = t1.column4);

It may be possible to use an "in" set and reduce the number of "or" conditions, but this was the quickest solution I could think of.
Avatar of Member_2_957366
Member_2_957366

ASKER

Hi Markgeer,

Your script did not work.  It produces every single row of records in my database !!!
Hi hunart,

Try this

SELECT *
  FROM table1 a
 WHERE EXISTS (
          SELECT 1
            FROM table1 b
           WHERE (   b.column2 = a.column2
                  OR b.column2 = a.column3
                  OR b.column2 = a.column4
                 )
              OR (   b.column3 = a.column2
                  OR b.column3 = a.column3
                  OR b.column3 = a.column4
                 )
              OR     (   b.column4 = a.column2
                      OR b.column4 = a.column3
                      OR b.column4 = a.column4
                     )
                 AND b.column1 <> a.column1)

Cheers
Nickson
opss... same sql as markgeer :p

Nickson
Try this:

select t1.*
,t2.*
,t3.*
from my_table t1,my_table t2,my_table t3
where t1.column1!=t2.column1 and t2.column1!=t3.column1 and t3.column1!=t1.column1
and (
t1.column2 in (t2.column2,t2.column3,t2.column4,t3.column2,t3.column3,t3.column4)
or
t1.column3 in (t2.column2,t2.column3,t2.column4,t3.column2,t3.column3,t3.column4)
or
t1.column4 in (t2.column2,t2.column3,t2.column4,t3.column2,t3.column3,t3.column4)
)
Sorry. i was wrong.

There is better solution

select distinct t1.*
,t2.*
from my_table t1,my_table t2,my_table t3
where t1.column1!=t2.column1
and (
t1.column2 in (t2.column2,t2.column3,t2.column4)
or
t1.column3 in (t2.column2,t2.column3,t2.column4)
or
t1.column4 in (t2.column2,t2.column3,t2.column4)
)
I tried Radja7 and it did not work.  The SQLPLUS was frozen and did not produce any results.

Can anyone please help....
CREATE OR REPLACE FORCE VIEW PRES_FORM.VIEW_TEST
(IND, COLUMN01, "2-3", "2-4", "3-4")
AS
SELECT ROWNUM IND,A.* FROM (
SELECT COLUMN01,COLUMN02 "2-3",COLUMN02 "2-4",COLUMN03 "3-4" FROM TBLPRUEBAS
UNION
SELECT COLUMN01,COLUMN03,COLUMN04,COLUMN04 FROM TBLPRUEBAS
) A;


SELECT * FROM TBLPRUEBAS WHERE COLUMN01 IN
(
SELECT column01
  FROM view_test
 WHERE "2-3" IN (SELECT   "2-3"
                     FROM (SELECT *
                             FROM view_test) a
                   HAVING COUNT (*) > 1
                 GROUP BY "2-3")
UNION
SELECT column01
  FROM view_test
 WHERE "2-4" IN (SELECT   "2-4"
                     FROM (SELECT *
                             FROM view_test) a
                   HAVING COUNT (*) > 1
                 GROUP BY "2-4")
UNION
SELECT column01
  FROM view_test
 WHERE "3-4" IN (SELECT   "3-4"
                     FROM (SELECT *
                             FROM view_test) a
                   HAVING COUNT (*) > 1
                 GROUP BY "3-4")
)


TBLPRUEBAS,  YOUR TEST DATA.



A series of union and distinct keywords shold do the trick....


select distinct column2 from my_table
where column2 in  (select distinct column3 from my_table)
union
select distinct column2 from my_table
where column2 in (select distinct column4 from my_table)
union
select column2 from my_table
group by column2
having count(*) > 1
union
select distinct column3 from my_table
where column3 in  (select distinct column2 from my_table)
union
select distinct column2 from my_table
where column3 in (select distinct column4 from my_table)
union
select column3 from my_table
group by column3
having count(*) > 1
union
select distinct column4 from my_table
where column4 in  (select distinct column3 from my_table)
union
select distinct column2 from my_table
where column4 in (select distinct column2 from my_table)
union
select column4 from my_table
group by column4
having count(*) > 1

Here it assumes that column -1 values can not be present in either column2,3or 4.

Rgds,
HR

I have not had time to try pbocanegra's suggestion since it involves creating a view but I will try tomorrow.

I tried Harish_Rajani's suggestion and my SQLPLUS session froze.
Is it a very big table ?

Can you also try following?

1) If possible create the index on column2, column3 & column4.

2) Instead of 'IN' clause, you can use exist, which should be faster and more cost effective.


select distinct column2 from my_table t
where exists
(select 1 from my_table t1
where t1.column3 = t.column2)
or exists
(select 1 from my_table t1
where t1.column4 = t.column2)

UNION

select distinct column3 from my_table t
where exists
(select 1 from my_table t1
where t1.column2 = t.column3)
or exists
(select 1 from my_table t1
where t1.column4 = t1.column3)

UNION

select distinct column4 from my_table t
where exists
(select 1 from my_table t1
where t1.column2 = t.column4)
or exists
(select 1 from my_table t1
where t1.column3 = t.column4)

UNION

select column2 from my_table t
group by column2
having count(*) > 1

UNION

select column3 from my_table t
group by column3
having count(*) > 1

UNION

select column4 from my_table t
group by column4
having count(*) > 1;


Rgds,
HR



ASKER CERTIFIED SOLUTION
Avatar of pbocanegra
pbocanegra

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Hi Harish,

The table is not very big.  It has approx 38,000 records.  Yes, the table is indexed and there are other fields as well, not just these 4 fields.

I ran your script but run into error message:

SP2-0042: unknown command "UNION" - rest of line ignored.
SP2-0042: unknown command "UNION" - rest of line ignored.
SP2-0042: unknown command "UNION" - rest of line ignored.
SP2-0042: unknown command "UNION" - rest of line ignored.
SP2-0042: unknown command "UNION" - rest of line ignored.

Can you please look at this again.

Thanks...
Hi hunart,

I have given the extra line before and after the 'UNION' for purpose of showing it more clearly. While executing, you can remove this extra line.

The Query should run fine thus.

Please report if even then you face difficulties.

Rgds,
HR
Hi Harish,

I removed the extra lines and then the script froze.

Thanks...
SOLUTION
Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
hunart
Are you test my code?


pbocanegra
Why not try following :
It has the same effect..after all you just want the values that duplicate across column2 , 3 or 4.


SELECT col FROM
(
SELECT DISTINCT column2 AS col FROM my_table
UNION ALL
SELECT DISTINCT column3  AS col FROM my_table
UNION ALL
SELECT DISTINCT column4 AS col  FROM my_table
)
GROUP BY col
HAVING COUNT(*) > 1


Rgds,
HR
Hi pbocanegra,

No, I have not tried your script as yet as it involves creating a view.  However, I will look at it today and report back..

Thanks...
select col2 ,col3 from table1 where col2 in (select col3 from table )

if u apply on all colums u might get all row cuz all rows have some data matching other columns