asked on

Comparing Duplicate Records between Columns

I need to run a simple query to produce a comparable results between the columns and need your help. Here is the sample table info:

Column 1 Column 2 Column 3 Column 4

00001111       XXXXX      XXXX1      XXXX2
00002222       YYYYY       SSSS      XXXXX
00003333       ZZZZ SSSS
00004444       SSSS      SSSS1

Each row has 4 columns. Column 1 is a unique key so there won't be any duplicate records. Columns 2, 3 and 4 are not unique so we might have the same record from other columns as well. In the sample above, Record "XXXXX" from Row 1, Column 2 is also in Row 2, Column 4. The same thing applies to record "SSSS" where it is in Row 2, Column3, Row 3, Column 4 and Row 4, column 2.

I need to produce a result showing these duplicate records. Database is Oracle 8i, running Windows NT4.

Thanks....

Mark Geerlings

This example assumes that your table is named: "my_table".

select * from my_table t1
where exists (select 1 from my_table t2
where t2.column1 <> t1.column1
and (t2.column2 = t1.column2
or t2.column3 = t1.column2
or t2.column4 = t1.column2
or t2.column2 = t1.column3
or t2.column3 = t1.column3
or t2.column4 = t1.column3
or t2.column2 = t1.column4
or t2.column3 = t1.column4
or t2.column4 = t1.column4);

It may be possible to use an "in" set and reduce the number of "or" conditions, but this was the quickest solution I could think of.

Member_2_957366

ASKER

Hi Markgeer,

Your script did not work. It produces every single row of records in my database !!!

NicksonKoh

Hi hunart,

Try this

SELECT *
FROM table1 a
WHERE EXISTS (
SELECT 1
FROM table1 b
WHERE ( b.column2 = a.column2
OR b.column2 = a.column3
OR b.column2 = a.column4
)
OR ( b.column3 = a.column2
OR b.column3 = a.column3
OR b.column3 = a.column4
)
OR ( b.column4 = a.column2
OR b.column4 = a.column3
OR b.column4 = a.column4
)
AND b.column1 <> a.column1)

Cheers
Nickson

NicksonKoh

opss... same sql as markgeer :p

Nickson

radja7

Try this:

select t1.*
,t2.*
,t3.*
from my_table t1,my_table t2,my_table t3
where t1.column1!=t2.column1 and t2.column1!=t3.column1 and t3.column1!=t1.column1
and (
t1.column2 in (t2.column2,t2.column3,t2.column4,t3.column2,t3.column3,t3.column4)
or
t1.column3 in (t2.column2,t2.column3,t2.column4,t3.column2,t3.column3,t3.column4)
or
t1.column4 in (t2.column2,t2.column3,t2.column4,t3.column2,t3.column3,t3.column4)
)

radja7

Sorry. i was wrong.

There is better solution

select distinct t1.*
,t2.*
from my_table t1,my_table t2,my_table t3
where t1.column1!=t2.column1
and (
t1.column2 in (t2.column2,t2.column3,t2.column4)
or
t1.column3 in (t2.column2,t2.column3,t2.column4)
or
t1.column4 in (t2.column2,t2.column3,t2.column4)
)

Member_2_957366

ASKER

I tried Radja7 and it did not work. The SQLPLUS was frozen and did not produce any results.

Can anyone please help....

pbocanegra

CREATE OR REPLACE FORCE VIEW PRES_FORM.VIEW_TEST
(IND, COLUMN01, "2-3", "2-4", "3-4")
AS
SELECT ROWNUM IND,A.* FROM (
SELECT COLUMN01,COLUMN02 "2-3",COLUMN02 "2-4",COLUMN03 "3-4" FROM TBLPRUEBAS
UNION
SELECT COLUMN01,COLUMN03,COLUMN04,COLUMN04 FROM TBLPRUEBAS
) A;

SELECT * FROM TBLPRUEBAS WHERE COLUMN01 IN
(
SELECT column01
FROM view_test
WHERE "2-3" IN (SELECT "2-3"
FROM (SELECT *
FROM view_test) a
HAVING COUNT (*) > 1
GROUP BY "2-3")
UNION
SELECT column01
FROM view_test
WHERE "2-4" IN (SELECT "2-4"
FROM (SELECT *
FROM view_test) a
HAVING COUNT (*) > 1
GROUP BY "2-4")
UNION
SELECT column01
FROM view_test
WHERE "3-4" IN (SELECT "3-4"
FROM (SELECT *
FROM view_test) a
HAVING COUNT (*) > 1
GROUP BY "3-4")
)

TBLPRUEBAS, YOUR TEST DATA.

Harish_Rajani

A series of union and distinct keywords shold do the trick....

select distinct column2 from my_table
where column2 in (select distinct column3 from my_table)
union
select distinct column2 from my_table
where column2 in (select distinct column4 from my_table)
union
select column2 from my_table
group by column2
having count(*) > 1
union
select distinct column3 from my_table
where column3 in (select distinct column2 from my_table)
union
select distinct column2 from my_table
where column3 in (select distinct column4 from my_table)
union
select column3 from my_table
group by column3
having count(*) > 1
union
select distinct column4 from my_table
where column4 in (select distinct column3 from my_table)
union
select distinct column2 from my_table
where column4 in (select distinct column2 from my_table)
union
select column4 from my_table
group by column4
having count(*) > 1

Here it assumes that column -1 values can not be present in either column2,3or 4.

Rgds,
HR

Member_2_957366

ASKER

I have not had time to try pbocanegra's suggestion since it involves creating a view but I will try tomorrow.

I tried Harish_Rajani's suggestion and my SQLPLUS session froze.

Harish_Rajani

Is it a very big table ?

Can you also try following?

1) If possible create the index on column2, column3 & column4.

2) Instead of 'IN' clause, you can use exist, which should be faster and more cost effective.

select distinct column2 from my_table t
where exists
(select 1 from my_table t1
where t1.column3 = t.column2)
or exists
(select 1 from my_table t1
where t1.column4 = t.column2)

UNION

select distinct column3 from my_table t
where exists
(select 1 from my_table t1
where t1.column2 = t.column3)
or exists
(select 1 from my_table t1
where t1.column4 = t1.column3)

UNION

select distinct column4 from my_table t
where exists
(select 1 from my_table t1
where t1.column2 = t.column4)
or exists
(select 1 from my_table t1
where t1.column3 = t.column4)

UNION

select column2 from my_table t
group by column2
having count(*) > 1

UNION

select column3 from my_table t
group by column3
having count(*) > 1

UNION

select column4 from my_table t
group by column4
having count(*) > 1;

Rgds,
HR

ASKER CERTIFIED SOLUTION

pbocanegra

membership

This solution is only available to members.

To access this solution, you must be a member of Experts Exchange.

Start Free Trial

Member_2_957366

ASKER

Hi Harish,

The table is not very big. It has approx 38,000 records. Yes, the table is indexed and there are other fields as well, not just these 4 fields.

I ran your script but run into error message:

SP2-0042: unknown command "UNION" - rest of line ignored.
SP2-0042: unknown command "UNION" - rest of line ignored.
SP2-0042: unknown command "UNION" - rest of line ignored.
SP2-0042: unknown command "UNION" - rest of line ignored.
SP2-0042: unknown command "UNION" - rest of line ignored.

Can you please look at this again.

Thanks...

Harish_Rajani

Hi hunart,

I have given the extra line before and after the 'UNION' for purpose of showing it more clearly. While executing, you can remove this extra line.

The Query should run fine thus.

Please report if even then you face difficulties.

Rgds,
HR

Member_2_957366

ASKER

Hi Harish,

I removed the extra lines and then the script froze.

Thanks...

SOLUTION

Harish_Rajani

membership

This solution is only available to members.

To access this solution, you must be a member of Experts Exchange.

Start Free Trial

pbocanegra

hunart
Are you test my code?

pbocanegra

Harish_Rajani

Why not try following :
It has the same effect..after all you just want the values that duplicate across column2 , 3 or 4.

SELECT col FROM
(
SELECT DISTINCT column2 AS col FROM my_table
UNION ALL
SELECT DISTINCT column3 AS col FROM my_table
UNION ALL
SELECT DISTINCT column4 AS col FROM my_table
)
GROUP BY col
HAVING COUNT(*) > 1

Rgds,
HR

Member_2_957366

ASKER

Hi pbocanegra,

No, I have not tried your script as yet as it involves creating a view. However, I will look at it today and report back..

Thanks...

asaeed

select col2 ,col3 from table1 where col2 in (select col3 from table )

if u apply on all colums u might get all row cuz all rows have some data matching other columns