Member_2_957366
asked on
Comparing Duplicate Records between Columns
I need to run a simple query to produce a comparable results between the columns and need your help. Here is the sample table info:
Column 1 Column 2 Column 3 Column 4
00001111 XXXXX XXXX1 XXXX2
00002222 YYYYY SSSS XXXXX
00003333 ZZZZ SSSS
00004444 SSSS SSSS1
Each row has 4 columns. Column 1 is a unique key so there won't be any duplicate records. Columns 2, 3 and 4 are not unique so we might have the same record from other columns as well. In the sample above, Record "XXXXX" from Row 1, Column 2 is also in Row 2, Column 4. The same thing applies to record "SSSS" where it is in Row 2, Column3, Row 3, Column 4 and Row 4, column 2.
I need to produce a result showing these duplicate records. Database is Oracle 8i, running Windows NT4.
Thanks....
Column 1 Column 2 Column 3 Column 4
00001111 XXXXX XXXX1 XXXX2
00002222 YYYYY SSSS XXXXX
00003333 ZZZZ SSSS
00004444 SSSS SSSS1
Each row has 4 columns. Column 1 is a unique key so there won't be any duplicate records. Columns 2, 3 and 4 are not unique so we might have the same record from other columns as well. In the sample above, Record "XXXXX" from Row 1, Column 2 is also in Row 2, Column 4. The same thing applies to record "SSSS" where it is in Row 2, Column3, Row 3, Column 4 and Row 4, column 2.
I need to produce a result showing these duplicate records. Database is Oracle 8i, running Windows NT4.
Thanks....
ASKER
Hi Markgeer,
Your script did not work. It produces every single row of records in my database !!!
Your script did not work. It produces every single row of records in my database !!!
Hi hunart,
Try this
SELECT *
FROM table1 a
WHERE EXISTS (
SELECT 1
FROM table1 b
WHERE ( b.column2 = a.column2
OR b.column2 = a.column3
OR b.column2 = a.column4
)
OR ( b.column3 = a.column2
OR b.column3 = a.column3
OR b.column3 = a.column4
)
OR ( b.column4 = a.column2
OR b.column4 = a.column3
OR b.column4 = a.column4
)
AND b.column1 <> a.column1)
Cheers
Nickson
Try this
SELECT *
FROM table1 a
WHERE EXISTS (
SELECT 1
FROM table1 b
WHERE ( b.column2 = a.column2
OR b.column2 = a.column3
OR b.column2 = a.column4
)
OR ( b.column3 = a.column2
OR b.column3 = a.column3
OR b.column3 = a.column4
)
OR ( b.column4 = a.column2
OR b.column4 = a.column3
OR b.column4 = a.column4
)
AND b.column1 <> a.column1)
Cheers
Nickson
opss... same sql as markgeer :p
Nickson
Nickson
Try this:
select t1.*
,t2.*
,t3.*
from my_table t1,my_table t2,my_table t3
where t1.column1!=t2.column1 and t2.column1!=t3.column1 and t3.column1!=t1.column1
and (
t1.column2 in (t2.column2,t2.column3,t2. column4,t3 .column2,t 3.column3, t3.column4 )
or
t1.column3 in (t2.column2,t2.column3,t2. column4,t3 .column2,t 3.column3, t3.column4 )
or
t1.column4 in (t2.column2,t2.column3,t2. column4,t3 .column2,t 3.column3, t3.column4 )
)
select t1.*
,t2.*
,t3.*
from my_table t1,my_table t2,my_table t3
where t1.column1!=t2.column1 and t2.column1!=t3.column1 and t3.column1!=t1.column1
and (
t1.column2 in (t2.column2,t2.column3,t2.
or
t1.column3 in (t2.column2,t2.column3,t2.
or
t1.column4 in (t2.column2,t2.column3,t2.
)
Sorry. i was wrong.
There is better solution
select distinct t1.*
,t2.*
from my_table t1,my_table t2,my_table t3
where t1.column1!=t2.column1
and (
t1.column2 in (t2.column2,t2.column3,t2. column4)
or
t1.column3 in (t2.column2,t2.column3,t2. column4)
or
t1.column4 in (t2.column2,t2.column3,t2. column4)
)
There is better solution
select distinct t1.*
,t2.*
from my_table t1,my_table t2,my_table t3
where t1.column1!=t2.column1
and (
t1.column2 in (t2.column2,t2.column3,t2.
or
t1.column3 in (t2.column2,t2.column3,t2.
or
t1.column4 in (t2.column2,t2.column3,t2.
)
ASKER
I tried Radja7 and it did not work. The SQLPLUS was frozen and did not produce any results.
Can anyone please help....
Can anyone please help....
CREATE OR REPLACE FORCE VIEW PRES_FORM.VIEW_TEST
(IND, COLUMN01, "2-3", "2-4", "3-4")
AS
SELECT ROWNUM IND,A.* FROM (
SELECT COLUMN01,COLUMN02 "2-3",COLUMN02 "2-4",COLUMN03 "3-4" FROM TBLPRUEBAS
UNION
SELECT COLUMN01,COLUMN03,COLUMN04 ,COLUMN04 FROM TBLPRUEBAS
) A;
SELECT * FROM TBLPRUEBAS WHERE COLUMN01 IN
(
SELECT column01
FROM view_test
WHERE "2-3" IN (SELECT "2-3"
FROM (SELECT *
FROM view_test) a
HAVING COUNT (*) > 1
GROUP BY "2-3")
UNION
SELECT column01
FROM view_test
WHERE "2-4" IN (SELECT "2-4"
FROM (SELECT *
FROM view_test) a
HAVING COUNT (*) > 1
GROUP BY "2-4")
UNION
SELECT column01
FROM view_test
WHERE "3-4" IN (SELECT "3-4"
FROM (SELECT *
FROM view_test) a
HAVING COUNT (*) > 1
GROUP BY "3-4")
)
TBLPRUEBAS, YOUR TEST DATA.
(IND, COLUMN01, "2-3", "2-4", "3-4")
AS
SELECT ROWNUM IND,A.* FROM (
SELECT COLUMN01,COLUMN02 "2-3",COLUMN02 "2-4",COLUMN03 "3-4" FROM TBLPRUEBAS
UNION
SELECT COLUMN01,COLUMN03,COLUMN04
) A;
SELECT * FROM TBLPRUEBAS WHERE COLUMN01 IN
(
SELECT column01
FROM view_test
WHERE "2-3" IN (SELECT "2-3"
FROM (SELECT *
FROM view_test) a
HAVING COUNT (*) > 1
GROUP BY "2-3")
UNION
SELECT column01
FROM view_test
WHERE "2-4" IN (SELECT "2-4"
FROM (SELECT *
FROM view_test) a
HAVING COUNT (*) > 1
GROUP BY "2-4")
UNION
SELECT column01
FROM view_test
WHERE "3-4" IN (SELECT "3-4"
FROM (SELECT *
FROM view_test) a
HAVING COUNT (*) > 1
GROUP BY "3-4")
)
TBLPRUEBAS, YOUR TEST DATA.
A series of union and distinct keywords shold do the trick....
select distinct column2 from my_table
where column2 in (select distinct column3 from my_table)
union
select distinct column2 from my_table
where column2 in (select distinct column4 from my_table)
union
select column2 from my_table
group by column2
having count(*) > 1
union
select distinct column3 from my_table
where column3 in (select distinct column2 from my_table)
union
select distinct column2 from my_table
where column3 in (select distinct column4 from my_table)
union
select column3 from my_table
group by column3
having count(*) > 1
union
select distinct column4 from my_table
where column4 in (select distinct column3 from my_table)
union
select distinct column2 from my_table
where column4 in (select distinct column2 from my_table)
union
select column4 from my_table
group by column4
having count(*) > 1
Here it assumes that column -1 values can not be present in either column2,3or 4.
Rgds,
HR
select distinct column2 from my_table
where column2 in (select distinct column3 from my_table)
union
select distinct column2 from my_table
where column2 in (select distinct column4 from my_table)
union
select column2 from my_table
group by column2
having count(*) > 1
union
select distinct column3 from my_table
where column3 in (select distinct column2 from my_table)
union
select distinct column2 from my_table
where column3 in (select distinct column4 from my_table)
union
select column3 from my_table
group by column3
having count(*) > 1
union
select distinct column4 from my_table
where column4 in (select distinct column3 from my_table)
union
select distinct column2 from my_table
where column4 in (select distinct column2 from my_table)
union
select column4 from my_table
group by column4
having count(*) > 1
Here it assumes that column -1 values can not be present in either column2,3or 4.
Rgds,
HR
ASKER
I have not had time to try pbocanegra's suggestion since it involves creating a view but I will try tomorrow.
I tried Harish_Rajani's suggestion and my SQLPLUS session froze.
I tried Harish_Rajani's suggestion and my SQLPLUS session froze.
Is it a very big table ?
Can you also try following?
1) If possible create the index on column2, column3 & column4.
2) Instead of 'IN' clause, you can use exist, which should be faster and more cost effective.
select distinct column2 from my_table t
where exists
(select 1 from my_table t1
where t1.column3 = t.column2)
or exists
(select 1 from my_table t1
where t1.column4 = t.column2)
UNION
select distinct column3 from my_table t
where exists
(select 1 from my_table t1
where t1.column2 = t.column3)
or exists
(select 1 from my_table t1
where t1.column4 = t1.column3)
UNION
select distinct column4 from my_table t
where exists
(select 1 from my_table t1
where t1.column2 = t.column4)
or exists
(select 1 from my_table t1
where t1.column3 = t.column4)
UNION
select column2 from my_table t
group by column2
having count(*) > 1
UNION
select column3 from my_table t
group by column3
having count(*) > 1
UNION
select column4 from my_table t
group by column4
having count(*) > 1;
Rgds,
HR
Can you also try following?
1) If possible create the index on column2, column3 & column4.
2) Instead of 'IN' clause, you can use exist, which should be faster and more cost effective.
select distinct column2 from my_table t
where exists
(select 1 from my_table t1
where t1.column3 = t.column2)
or exists
(select 1 from my_table t1
where t1.column4 = t.column2)
UNION
select distinct column3 from my_table t
where exists
(select 1 from my_table t1
where t1.column2 = t.column3)
or exists
(select 1 from my_table t1
where t1.column4 = t1.column3)
UNION
select distinct column4 from my_table t
where exists
(select 1 from my_table t1
where t1.column2 = t.column4)
or exists
(select 1 from my_table t1
where t1.column3 = t.column4)
UNION
select column2 from my_table t
group by column2
having count(*) > 1
UNION
select column3 from my_table t
group by column3
having count(*) > 1
UNION
select column4 from my_table t
group by column4
having count(*) > 1;
Rgds,
HR
ASKER CERTIFIED SOLUTION
membership
Create a free account to see this answer
Signing up is free and takes 30 seconds. No credit card required.
ASKER
Hi Harish,
The table is not very big. It has approx 38,000 records. Yes, the table is indexed and there are other fields as well, not just these 4 fields.
I ran your script but run into error message:
SP2-0042: unknown command "UNION" - rest of line ignored.
SP2-0042: unknown command "UNION" - rest of line ignored.
SP2-0042: unknown command "UNION" - rest of line ignored.
SP2-0042: unknown command "UNION" - rest of line ignored.
SP2-0042: unknown command "UNION" - rest of line ignored.
Can you please look at this again.
Thanks...
The table is not very big. It has approx 38,000 records. Yes, the table is indexed and there are other fields as well, not just these 4 fields.
I ran your script but run into error message:
SP2-0042: unknown command "UNION" - rest of line ignored.
SP2-0042: unknown command "UNION" - rest of line ignored.
SP2-0042: unknown command "UNION" - rest of line ignored.
SP2-0042: unknown command "UNION" - rest of line ignored.
SP2-0042: unknown command "UNION" - rest of line ignored.
Can you please look at this again.
Thanks...
Hi hunart,
I have given the extra line before and after the 'UNION' for purpose of showing it more clearly. While executing, you can remove this extra line.
The Query should run fine thus.
Please report if even then you face difficulties.
Rgds,
HR
I have given the extra line before and after the 'UNION' for purpose of showing it more clearly. While executing, you can remove this extra line.
The Query should run fine thus.
Please report if even then you face difficulties.
Rgds,
HR
ASKER
Hi Harish,
I removed the extra lines and then the script froze.
Thanks...
I removed the extra lines and then the script froze.
Thanks...
SOLUTION
membership
Create a free account to see this answer
Signing up is free and takes 30 seconds. No credit card required.
hunart
Are you test my code?
pbocanegra
Are you test my code?
pbocanegra
Why not try following :
It has the same effect..after all you just want the values that duplicate across column2 , 3 or 4.
SELECT col FROM
(
SELECT DISTINCT column2 AS col FROM my_table
UNION ALL
SELECT DISTINCT column3 AS col FROM my_table
UNION ALL
SELECT DISTINCT column4 AS col FROM my_table
)
GROUP BY col
HAVING COUNT(*) > 1
Rgds,
HR
It has the same effect..after all you just want the values that duplicate across column2 , 3 or 4.
SELECT col FROM
(
SELECT DISTINCT column2 AS col FROM my_table
UNION ALL
SELECT DISTINCT column3 AS col FROM my_table
UNION ALL
SELECT DISTINCT column4 AS col FROM my_table
)
GROUP BY col
HAVING COUNT(*) > 1
Rgds,
HR
ASKER
Hi pbocanegra,
No, I have not tried your script as yet as it involves creating a view. However, I will look at it today and report back..
Thanks...
No, I have not tried your script as yet as it involves creating a view. However, I will look at it today and report back..
Thanks...
select col2 ,col3 from table1 where col2 in (select col3 from table )
if u apply on all colums u might get all row cuz all rows have some data matching other columns
if u apply on all colums u might get all row cuz all rows have some data matching other columns
select * from my_table t1
where exists (select 1 from my_table t2
where t2.column1 <> t1.column1
and (t2.column2 = t1.column2
or t2.column3 = t1.column2
or t2.column4 = t1.column2
or t2.column2 = t1.column3
or t2.column3 = t1.column3
or t2.column4 = t1.column3
or t2.column2 = t1.column4
or t2.column3 = t1.column4
or t2.column4 = t1.column4);
It may be possible to use an "in" set and reduce the number of "or" conditions, but this was the quickest solution I could think of.