?
Solved

Comparing Duplicate Records between Columns

Posted on 2006-03-23
23
Medium Priority
?
898 Views
Last Modified: 2011-10-03
I need to run a simple query to produce a comparable results between the columns and need your help.  Here is the sample table info:

Column 1                                Column 2        Column 3     Column 4
      
00001111                                     XXXXX            XXXX1            XXXX2
00002222                                     YYYYY               SSSS            XXXXX
00003333                                     ZZZZ                                 SSSS
00004444                                     SSSS              SSSS1

 
Each row has 4 columns.  Column 1 is a unique key so there won't be any duplicate records.  Columns 2, 3 and 4 are not unique so we might have the same record from other columns as well.  In the sample above, Record "XXXXX" from Row 1, Column 2 is also in Row 2, Column 4.  The same thing applies to record "SSSS" where it is in Row 2, Column3, Row 3, Column 4 and Row 4, column 2.

I need to produce a result showing these duplicate records.  Database is Oracle 8i, running Windows NT4.

Thanks....
0
Comment
Question by:hunart
  • 6
  • 5
  • 3
  • +4
20 Comments
 
LVL 35

Expert Comment

by:Mark Geerlings
ID: 16271609
This example assumes that your table is named: "my_table".

select * from my_table t1
where exists (select 1 from my_table t2
  where t2.column1 <> t1.column1
    and (t2.column2 = t1.column2
      or t2.column3 = t1.column2
      or t2.column4 = t1.column2
      or t2.column2 = t1.column3
      or t2.column3 = t1.column3
      or t2.column4 = t1.column3
      or t2.column2 = t1.column4
      or t2.column3 = t1.column4
      or t2.column4 = t1.column4);

It may be possible to use an "in" set and reduce the number of "or" conditions, but this was the quickest solution I could think of.
0
 
LVL 8

Author Comment

by:hunart
ID: 16273632
Hi Markgeer,

Your script did not work.  It produces every single row of records in my database !!!
0
 
LVL 17

Expert Comment

by:NicksonKoh
ID: 16276592
Hi hunart,

Try this

SELECT *
  FROM table1 a
 WHERE EXISTS (
          SELECT 1
            FROM table1 b
           WHERE (   b.column2 = a.column2
                  OR b.column2 = a.column3
                  OR b.column2 = a.column4
                 )
              OR (   b.column3 = a.column2
                  OR b.column3 = a.column3
                  OR b.column3 = a.column4
                 )
              OR     (   b.column4 = a.column2
                      OR b.column4 = a.column3
                      OR b.column4 = a.column4
                     )
                 AND b.column1 <> a.column1)

Cheers
Nickson
0
NEW Veeam Backup for Microsoft Office 365 1.5

With Office 365, it’s your data and your responsibility to protect it. NEW Veeam Backup for Microsoft Office 365 eliminates the risk of losing access to your Office 365 data.

 
LVL 17

Expert Comment

by:NicksonKoh
ID: 16276604
opss... same sql as markgeer :p

Nickson
0
 
LVL 4

Expert Comment

by:radja7
ID: 16277283
Try this:

select t1.*
,t2.*
,t3.*
from my_table t1,my_table t2,my_table t3
where t1.column1!=t2.column1 and t2.column1!=t3.column1 and t3.column1!=t1.column1
and (
t1.column2 in (t2.column2,t2.column3,t2.column4,t3.column2,t3.column3,t3.column4)
or
t1.column3 in (t2.column2,t2.column3,t2.column4,t3.column2,t3.column3,t3.column4)
or
t1.column4 in (t2.column2,t2.column3,t2.column4,t3.column2,t3.column3,t3.column4)
)
0
 
LVL 4

Expert Comment

by:radja7
ID: 16277434
Sorry. i was wrong.

There is better solution

select distinct t1.*
,t2.*
from my_table t1,my_table t2,my_table t3
where t1.column1!=t2.column1
and (
t1.column2 in (t2.column2,t2.column3,t2.column4)
or
t1.column3 in (t2.column2,t2.column3,t2.column4)
or
t1.column4 in (t2.column2,t2.column3,t2.column4)
)
0
 
LVL 8

Author Comment

by:hunart
ID: 16279700
I tried Radja7 and it did not work.  The SQLPLUS was frozen and did not produce any results.

Can anyone please help....
0
 
LVL 4

Expert Comment

by:pbocanegra
ID: 16283396
CREATE OR REPLACE FORCE VIEW PRES_FORM.VIEW_TEST
(IND, COLUMN01, "2-3", "2-4", "3-4")
AS
SELECT ROWNUM IND,A.* FROM (
SELECT COLUMN01,COLUMN02 "2-3",COLUMN02 "2-4",COLUMN03 "3-4" FROM TBLPRUEBAS
UNION
SELECT COLUMN01,COLUMN03,COLUMN04,COLUMN04 FROM TBLPRUEBAS
) A;


SELECT * FROM TBLPRUEBAS WHERE COLUMN01 IN
(
SELECT column01
  FROM view_test
 WHERE "2-3" IN (SELECT   "2-3"
                     FROM (SELECT *
                             FROM view_test) a
                   HAVING COUNT (*) > 1
                 GROUP BY "2-3")
UNION
SELECT column01
  FROM view_test
 WHERE "2-4" IN (SELECT   "2-4"
                     FROM (SELECT *
                             FROM view_test) a
                   HAVING COUNT (*) > 1
                 GROUP BY "2-4")
UNION
SELECT column01
  FROM view_test
 WHERE "3-4" IN (SELECT   "3-4"
                     FROM (SELECT *
                             FROM view_test) a
                   HAVING COUNT (*) > 1
                 GROUP BY "3-4")
)


TBLPRUEBAS,  YOUR TEST DATA.



0
 
LVL 4

Expert Comment

by:Harish_Rajani
ID: 16302731
A series of union and distinct keywords shold do the trick....


select distinct column2 from my_table
where column2 in  (select distinct column3 from my_table)
union
select distinct column2 from my_table
where column2 in (select distinct column4 from my_table)
union
select column2 from my_table
group by column2
having count(*) > 1
union
select distinct column3 from my_table
where column3 in  (select distinct column2 from my_table)
union
select distinct column2 from my_table
where column3 in (select distinct column4 from my_table)
union
select column3 from my_table
group by column3
having count(*) > 1
union
select distinct column4 from my_table
where column4 in  (select distinct column3 from my_table)
union
select distinct column2 from my_table
where column4 in (select distinct column2 from my_table)
union
select column4 from my_table
group by column4
having count(*) > 1

Here it assumes that column -1 values can not be present in either column2,3or 4.

Rgds,
HR

0
 
LVL 8

Author Comment

by:hunart
ID: 16303784
I have not had time to try pbocanegra's suggestion since it involves creating a view but I will try tomorrow.

I tried Harish_Rajani's suggestion and my SQLPLUS session froze.
0
 
LVL 4

Expert Comment

by:Harish_Rajani
ID: 16304138
Is it a very big table ?

Can you also try following?

1) If possible create the index on column2, column3 & column4.

2) Instead of 'IN' clause, you can use exist, which should be faster and more cost effective.


select distinct column2 from my_table t
where exists
(select 1 from my_table t1
where t1.column3 = t.column2)
or exists
(select 1 from my_table t1
where t1.column4 = t.column2)

UNION

select distinct column3 from my_table t
where exists
(select 1 from my_table t1
where t1.column2 = t.column3)
or exists
(select 1 from my_table t1
where t1.column4 = t1.column3)

UNION

select distinct column4 from my_table t
where exists
(select 1 from my_table t1
where t1.column2 = t.column4)
or exists
(select 1 from my_table t1
where t1.column3 = t.column4)

UNION

select column2 from my_table t
group by column2
having count(*) > 1

UNION

select column3 from my_table t
group by column3
having count(*) > 1

UNION

select column4 from my_table t
group by column4
having count(*) > 1;


Rgds,
HR



0
 
LVL 4

Accepted Solution

by:
pbocanegra earned 252 total points
ID: 16304214
In my suggestion appears: "PRES_FORM." , this my Oracle schema, supress this part.
0
 
LVL 8

Author Comment

by:hunart
ID: 16310013
Hi Harish,

The table is not very big.  It has approx 38,000 records.  Yes, the table is indexed and there are other fields as well, not just these 4 fields.

I ran your script but run into error message:

SP2-0042: unknown command "UNION" - rest of line ignored.
SP2-0042: unknown command "UNION" - rest of line ignored.
SP2-0042: unknown command "UNION" - rest of line ignored.
SP2-0042: unknown command "UNION" - rest of line ignored.
SP2-0042: unknown command "UNION" - rest of line ignored.

Can you please look at this again.

Thanks...
0
 
LVL 4

Expert Comment

by:Harish_Rajani
ID: 16310159
Hi hunart,

I have given the extra line before and after the 'UNION' for purpose of showing it more clearly. While executing, you can remove this extra line.

The Query should run fine thus.

Please report if even then you face difficulties.

Rgds,
HR
0
 
LVL 8

Author Comment

by:hunart
ID: 16310362
Hi Harish,

I removed the extra lines and then the script froze.

Thanks...
0
 
LVL 4

Assisted Solution

by:Harish_Rajani
Harish_Rajani earned 248 total points
ID: 16310675
Well I tried this on my system HPUX/Oracle 8i on an identical table with 150,000+ rows. It gave result in under 30 seconds.


 SQL> desc my_table;
 Name                    Null?    Type
 ----------------------- -------- ----------------
 COLUMN1                          NUMBER(2)
 COLUMN2                          VARCHAR2(8)
 COLUMN3                          VARCHAR2(8)
 COLUMN4                          VARCHAR2(8)

SQL> select count(*) from my_table;

  COUNT(*)
----------
    156672


SQL> set time on;
15:36:28 SQL> SELECT DISTINCT column2 FROM my_table t
15:36:35   2  WHERE EXISTS
15:36:35   3  (SELECT 1 FROM my_table t1
15:36:35   4  WHERE t1.column3 = t.column2)
15:36:35   5  OR EXISTS
15:36:35   6  (SELECT 1 FROM my_table t1
15:36:35   7  WHERE t1.column4 = t.column2)
15:36:35   8  UNION
15:36:35   9  SELECT DISTINCT column3 FROM my_table t
15:36:35  10  WHERE EXISTS
15:36:35  11  (SELECT 1 FROM my_table t1
15:36:35  12  WHERE t1.column2 = t.column3)
15:36:35  13  OR EXISTS
15:36:35  14  (SELECT 1 FROM my_table t1
15:36:35  15  WHERE t1.column4 = t1.column3)
15:36:35  16  UNION
15:36:35  17  SELECT DISTINCT column4 FROM my_table t
15:36:35  18  WHERE EXISTS
15:36:35  19  (SELECT 1 FROM my_table t1
15:36:35  20  WHERE t1.column2 = t.column4)
15:36:35  21  OR EXISTS
15:36:35  22  (SELECT 1 FROM my_table t1
15:36:35  23  WHERE t1.column3 = t.column4)
15:36:35  24  UNION
15:36:35  25  SELECT column2 FROM my_table t
15:36:35  26  GROUP BY column2
15:36:35  27  HAVING COUNT(*) > 1
15:36:35  28  UNION
15:36:35  29  SELECT column3 FROM my_table t
15:36:35  30  GROUP BY column3
15:36:35  31  HAVING COUNT(*) > 1
15:36:35  32  UNION
15:36:35  33  SELECT column4 FROM my_table t
15:36:35  34  GROUP BY column4
15:36:35  35  HAVING COUNT(*) > 1;

COLUMN2
--------
23423
2343
234324

15:36:39 SQL>


It must be a performance problem, I suggest following:

1. Analyse the table
2. If already analyzed, please create indexes on column2, column3, and column4

Is it possible for you to publish the execution plan, please do so.

Rgds,
HR

0
 
LVL 4

Expert Comment

by:pbocanegra
ID: 16310831
hunart
Are you test my code?


pbocanegra
0
 
LVL 4

Expert Comment

by:Harish_Rajani
ID: 16310842
Why not try following :
It has the same effect..after all you just want the values that duplicate across column2 , 3 or 4.


SELECT col FROM
(
SELECT DISTINCT column2 AS col FROM my_table
UNION ALL
SELECT DISTINCT column3  AS col FROM my_table
UNION ALL
SELECT DISTINCT column4 AS col  FROM my_table
)
GROUP BY col
HAVING COUNT(*) > 1


Rgds,
HR
0
 
LVL 8

Author Comment

by:hunart
ID: 16311063
Hi pbocanegra,

No, I have not tried your script as yet as it involves creating a view.  However, I will look at it today and report back..

Thanks...
0
 
LVL 2

Expert Comment

by:asaeed
ID: 16330729
select col2 ,col3 from table1 where col2 in (select col3 from table )

if u apply on all colums u might get all row cuz all rows have some data matching other columns
0

Featured Post

Restore individual SQL databases with ease

Veeam Explorer for Microsoft SQL Server delivers an easy-to-use, wizard-driven interface for restoring your databases from a backup. No expert SQL background required. Web interface provides a complete view of all available SQL databases to simplify the recovery of lost database

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Note: this article covers simple compression. Oracle introduced in version 11g release 2 a new feature called Advanced Compression which is not covered here. General principle of Oracle compression Oracle compression is a way of reducing the d…
When it comes to protecting Oracle Database servers and systems, there are a ton of myths out there. Here are the most common.
This videos aims to give the viewer a basic demonstration of how a user can query current session information by using the SYS_CONTEXT function
This video shows how to configure and send email from and Oracle database using both UTL_SMTP and UTL_MAIL, as well as comparing UTL_SMTP to a manual SMTP conversation with a mail server.
Suggested Courses
Course of the Month14 days, 6 hours left to enroll

807 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question