Comparing Duplicate Records between Columns

I need to run a simple query to produce a comparable results between the columns and need your help.  Here is the sample table info:

Column 1                                Column 2        Column 3     Column 4
      
00001111                                     XXXXX            XXXX1            XXXX2
00002222                                     YYYYY               SSSS            XXXXX
00003333                                     ZZZZ                                 SSSS
00004444                                     SSSS              SSSS1

 
Each row has 4 columns.  Column 1 is a unique key so there won't be any duplicate records.  Columns 2, 3 and 4 are not unique so we might have the same record from other columns as well.  In the sample above, Record "XXXXX" from Row 1, Column 2 is also in Row 2, Column 4.  The same thing applies to record "SSSS" where it is in Row 2, Column3, Row 3, Column 4 and Row 4, column 2.

I need to produce a result showing these duplicate records.  Database is Oracle 8i, running Windows NT4.

Thanks....
LVL 8
hunartAsked:
Who is Participating?

[Product update] Infrastructure Analysis Tool is now available with Business Accounts.Learn More

x
I wear a lot of hats...

"The solutions and answers provided on Experts Exchange have been extremely helpful to me over the last few years. I wear a lot of hats - Developer, Database Administrator, Help Desk, etc., so I know a lot of things but not a lot about one thing. Experts Exchange gives me answers from people who do know a lot about one thing, in a easy to use platform." -Todd S.

Mark GeerlingsDatabase AdministratorCommented:
This example assumes that your table is named: "my_table".

select * from my_table t1
where exists (select 1 from my_table t2
  where t2.column1 <> t1.column1
    and (t2.column2 = t1.column2
      or t2.column3 = t1.column2
      or t2.column4 = t1.column2
      or t2.column2 = t1.column3
      or t2.column3 = t1.column3
      or t2.column4 = t1.column3
      or t2.column2 = t1.column4
      or t2.column3 = t1.column4
      or t2.column4 = t1.column4);

It may be possible to use an "in" set and reduce the number of "or" conditions, but this was the quickest solution I could think of.
hunartAuthor Commented:
Hi Markgeer,

Your script did not work.  It produces every single row of records in my database !!!
NicksonKohCommented:
Hi hunart,

Try this

SELECT *
  FROM table1 a
 WHERE EXISTS (
          SELECT 1
            FROM table1 b
           WHERE (   b.column2 = a.column2
                  OR b.column2 = a.column3
                  OR b.column2 = a.column4
                 )
              OR (   b.column3 = a.column2
                  OR b.column3 = a.column3
                  OR b.column3 = a.column4
                 )
              OR     (   b.column4 = a.column2
                      OR b.column4 = a.column3
                      OR b.column4 = a.column4
                     )
                 AND b.column1 <> a.column1)

Cheers
Nickson
Determine the Perfect Price for Your IT Services

Do you wonder if your IT business is truly profitable or if you should raise your prices? Learn how to calculate your overhead burden with our free interactive tool and use it to determine the right price for your IT services. Download your free eBook now!

NicksonKohCommented:
opss... same sql as markgeer :p

Nickson
radja7Commented:
Try this:

select t1.*
,t2.*
,t3.*
from my_table t1,my_table t2,my_table t3
where t1.column1!=t2.column1 and t2.column1!=t3.column1 and t3.column1!=t1.column1
and (
t1.column2 in (t2.column2,t2.column3,t2.column4,t3.column2,t3.column3,t3.column4)
or
t1.column3 in (t2.column2,t2.column3,t2.column4,t3.column2,t3.column3,t3.column4)
or
t1.column4 in (t2.column2,t2.column3,t2.column4,t3.column2,t3.column3,t3.column4)
)
radja7Commented:
Sorry. i was wrong.

There is better solution

select distinct t1.*
,t2.*
from my_table t1,my_table t2,my_table t3
where t1.column1!=t2.column1
and (
t1.column2 in (t2.column2,t2.column3,t2.column4)
or
t1.column3 in (t2.column2,t2.column3,t2.column4)
or
t1.column4 in (t2.column2,t2.column3,t2.column4)
)
hunartAuthor Commented:
I tried Radja7 and it did not work.  The SQLPLUS was frozen and did not produce any results.

Can anyone please help....
pbocanegraCommented:
CREATE OR REPLACE FORCE VIEW PRES_FORM.VIEW_TEST
(IND, COLUMN01, "2-3", "2-4", "3-4")
AS
SELECT ROWNUM IND,A.* FROM (
SELECT COLUMN01,COLUMN02 "2-3",COLUMN02 "2-4",COLUMN03 "3-4" FROM TBLPRUEBAS
UNION
SELECT COLUMN01,COLUMN03,COLUMN04,COLUMN04 FROM TBLPRUEBAS
) A;


SELECT * FROM TBLPRUEBAS WHERE COLUMN01 IN
(
SELECT column01
  FROM view_test
 WHERE "2-3" IN (SELECT   "2-3"
                     FROM (SELECT *
                             FROM view_test) a
                   HAVING COUNT (*) > 1
                 GROUP BY "2-3")
UNION
SELECT column01
  FROM view_test
 WHERE "2-4" IN (SELECT   "2-4"
                     FROM (SELECT *
                             FROM view_test) a
                   HAVING COUNT (*) > 1
                 GROUP BY "2-4")
UNION
SELECT column01
  FROM view_test
 WHERE "3-4" IN (SELECT   "3-4"
                     FROM (SELECT *
                             FROM view_test) a
                   HAVING COUNT (*) > 1
                 GROUP BY "3-4")
)


TBLPRUEBAS,  YOUR TEST DATA.



Harish_RajaniCommented:
A series of union and distinct keywords shold do the trick....


select distinct column2 from my_table
where column2 in  (select distinct column3 from my_table)
union
select distinct column2 from my_table
where column2 in (select distinct column4 from my_table)
union
select column2 from my_table
group by column2
having count(*) > 1
union
select distinct column3 from my_table
where column3 in  (select distinct column2 from my_table)
union
select distinct column2 from my_table
where column3 in (select distinct column4 from my_table)
union
select column3 from my_table
group by column3
having count(*) > 1
union
select distinct column4 from my_table
where column4 in  (select distinct column3 from my_table)
union
select distinct column2 from my_table
where column4 in (select distinct column2 from my_table)
union
select column4 from my_table
group by column4
having count(*) > 1

Here it assumes that column -1 values can not be present in either column2,3or 4.

Rgds,
HR

hunartAuthor Commented:
I have not had time to try pbocanegra's suggestion since it involves creating a view but I will try tomorrow.

I tried Harish_Rajani's suggestion and my SQLPLUS session froze.
Harish_RajaniCommented:
Is it a very big table ?

Can you also try following?

1) If possible create the index on column2, column3 & column4.

2) Instead of 'IN' clause, you can use exist, which should be faster and more cost effective.


select distinct column2 from my_table t
where exists
(select 1 from my_table t1
where t1.column3 = t.column2)
or exists
(select 1 from my_table t1
where t1.column4 = t.column2)

UNION

select distinct column3 from my_table t
where exists
(select 1 from my_table t1
where t1.column2 = t.column3)
or exists
(select 1 from my_table t1
where t1.column4 = t1.column3)

UNION

select distinct column4 from my_table t
where exists
(select 1 from my_table t1
where t1.column2 = t.column4)
or exists
(select 1 from my_table t1
where t1.column3 = t.column4)

UNION

select column2 from my_table t
group by column2
having count(*) > 1

UNION

select column3 from my_table t
group by column3
having count(*) > 1

UNION

select column4 from my_table t
group by column4
having count(*) > 1;


Rgds,
HR



pbocanegraCommented:
In my suggestion appears: "PRES_FORM." , this my Oracle schema, supress this part.

Experts Exchange Solution brought to you by

Your issues matter to us.

Facing a tech roadblock? Get the help and guidance you need from experienced professionals who care. Ask your question anytime, anywhere, with no hassle.

Start your 7-day free trial
hunartAuthor Commented:
Hi Harish,

The table is not very big.  It has approx 38,000 records.  Yes, the table is indexed and there are other fields as well, not just these 4 fields.

I ran your script but run into error message:

SP2-0042: unknown command "UNION" - rest of line ignored.
SP2-0042: unknown command "UNION" - rest of line ignored.
SP2-0042: unknown command "UNION" - rest of line ignored.
SP2-0042: unknown command "UNION" - rest of line ignored.
SP2-0042: unknown command "UNION" - rest of line ignored.

Can you please look at this again.

Thanks...
Harish_RajaniCommented:
Hi hunart,

I have given the extra line before and after the 'UNION' for purpose of showing it more clearly. While executing, you can remove this extra line.

The Query should run fine thus.

Please report if even then you face difficulties.

Rgds,
HR
hunartAuthor Commented:
Hi Harish,

I removed the extra lines and then the script froze.

Thanks...
Harish_RajaniCommented:
Well I tried this on my system HPUX/Oracle 8i on an identical table with 150,000+ rows. It gave result in under 30 seconds.


 SQL> desc my_table;
 Name                    Null?    Type
 ----------------------- -------- ----------------
 COLUMN1                          NUMBER(2)
 COLUMN2                          VARCHAR2(8)
 COLUMN3                          VARCHAR2(8)
 COLUMN4                          VARCHAR2(8)

SQL> select count(*) from my_table;

  COUNT(*)
----------
    156672


SQL> set time on;
15:36:28 SQL> SELECT DISTINCT column2 FROM my_table t
15:36:35   2  WHERE EXISTS
15:36:35   3  (SELECT 1 FROM my_table t1
15:36:35   4  WHERE t1.column3 = t.column2)
15:36:35   5  OR EXISTS
15:36:35   6  (SELECT 1 FROM my_table t1
15:36:35   7  WHERE t1.column4 = t.column2)
15:36:35   8  UNION
15:36:35   9  SELECT DISTINCT column3 FROM my_table t
15:36:35  10  WHERE EXISTS
15:36:35  11  (SELECT 1 FROM my_table t1
15:36:35  12  WHERE t1.column2 = t.column3)
15:36:35  13  OR EXISTS
15:36:35  14  (SELECT 1 FROM my_table t1
15:36:35  15  WHERE t1.column4 = t1.column3)
15:36:35  16  UNION
15:36:35  17  SELECT DISTINCT column4 FROM my_table t
15:36:35  18  WHERE EXISTS
15:36:35  19  (SELECT 1 FROM my_table t1
15:36:35  20  WHERE t1.column2 = t.column4)
15:36:35  21  OR EXISTS
15:36:35  22  (SELECT 1 FROM my_table t1
15:36:35  23  WHERE t1.column3 = t.column4)
15:36:35  24  UNION
15:36:35  25  SELECT column2 FROM my_table t
15:36:35  26  GROUP BY column2
15:36:35  27  HAVING COUNT(*) > 1
15:36:35  28  UNION
15:36:35  29  SELECT column3 FROM my_table t
15:36:35  30  GROUP BY column3
15:36:35  31  HAVING COUNT(*) > 1
15:36:35  32  UNION
15:36:35  33  SELECT column4 FROM my_table t
15:36:35  34  GROUP BY column4
15:36:35  35  HAVING COUNT(*) > 1;

COLUMN2
--------
23423
2343
234324

15:36:39 SQL>


It must be a performance problem, I suggest following:

1. Analyse the table
2. If already analyzed, please create indexes on column2, column3, and column4

Is it possible for you to publish the execution plan, please do so.

Rgds,
HR

pbocanegraCommented:
hunart
Are you test my code?


pbocanegra
Harish_RajaniCommented:
Why not try following :
It has the same effect..after all you just want the values that duplicate across column2 , 3 or 4.


SELECT col FROM
(
SELECT DISTINCT column2 AS col FROM my_table
UNION ALL
SELECT DISTINCT column3  AS col FROM my_table
UNION ALL
SELECT DISTINCT column4 AS col  FROM my_table
)
GROUP BY col
HAVING COUNT(*) > 1


Rgds,
HR
hunartAuthor Commented:
Hi pbocanegra,

No, I have not tried your script as yet as it involves creating a view.  However, I will look at it today and report back..

Thanks...
asaeedCommented:
select col2 ,col3 from table1 where col2 in (select col3 from table )

if u apply on all colums u might get all row cuz all rows have some data matching other columns
It's more than this solution.Get answers and train to solve all your tech problems - anytime, anywhere.Try it for free Edge Out The Competitionfor your dream job with proven skills and certifications.Get started today Stand Outas the employee with proven skills.Start learning today for free Move Your Career Forwardwith certification training in the latest technologies.Start your trial today
Oracle Database

From novice to tech pro — start learning today.