Want to win a PS4? Go Premium and enter to win our High-Tech Treats giveaway. Enter to Win

x
Solved

# How to remove identify the duplicate with two distinct rows?

Posted on 2009-04-13
Medium Priority
727 Views
Hi,
I need to identify the duplicate data with two distinct rows. See my data structure below.
NAME      NAME_1   VALUE    START_DATE      END_DATE      FLAG     INDEX
SURI         SE            275        13/12/2005            31/12/2010         B          1
SURI         SE            375                                                                 A          1
SURI         SE            475                                                                 A          1
SURI         SE            275        13/12/2005            31/12/2010         B          2
SURI         SE           375                                                                  A          2
SURI         SE           475                                                                  A          2
SURI         SE           175        13/12/2006            31/12/2010         B          3
SURI         SE           375                                                                  A          3
SURI         SE           475                                                                  A          3

This is my sample data. Here data are duplicate with two different index column. INDEX 1 and 2 contains same group of combination. So i need to identify any one of duplicate combination(i.e INDEX 1 or 2). Can anyone come up with exact solution?

Thanks
0
[X]
###### Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

• Help others & share knowledge
• Earn cash & points

LVL 25

Assisted Solution

ID: 24128739
Hi Suriyaraj_Sudalaiappan,

I see combinations that occur more than twice - if you ignore the value in the INDEX column.

This row is unique
SURI         SE     175    13/12/2006      31/12/2010       B         3

These two are duplicate
SURI         SE     275    13/12/2005      31/12/2010       B         1
SURI         SE     275    13/12/2005      31/12/2010       B         2

These three are triplicate
SURI         SE     375                                     A         1
SURI         SE     375                                     A         2
SURI         SE     375                                     A         3

So are these three
SURI         SE     475                                     A         1
SURI         SE     475                                     A         2
SURI         SE     475                                     A         3

Is the below what you where looking for?
``````--If you want to identify the doubles/triples/... combinations etc
SELECT NAME, NAME_1, VALUE, START_DATE, END_DATE, FLAG
FROM yourtable
GROUP BY NAME, NAME_1, VALUE, START_DATE, END_DATE, FLAG
HAVING COUNT(*) > 1
-- If you want the individual rows, this assumes you do not have duplicates with the same INDEX value
SELECT *
FROM yourtable t,
(SELECT NAME, NAME_1, VALUE, START_DATE, END_DATE, FLAG, MIN(INDEX) as min_index
FROM yourtable
GROUP BY NAME, NAME_1, VALUE, START_DATE, END_DATE, FLAG
HAVING COUNT(*) > 1) v
WHERE t.NAME        = v.NAME
AND t.NAME_1      = v.NAME_1
AND t.VALUE       = v.VALUE
AND t.START_DATE  = v.START_DATE
AND t.END_DATE    = v.END_DATE
AND t.FLAG        = v.FLAG
AND v.INDEX       > v.min_index
``````
0

Accepted Solution

Suriyaraj_Sudalaiappan earned 0 total points
ID: 24773616
Hi,

Below query is giving the exact answer.

SELECT t4.*,
MIN(t4.diff_flag) over(PARTITION BY t4.LINE_INDEX) min_flag
FROM (SELECT t3.*,
CASE
WHEN lag(ATTR_VALUE_FROM, lag_flag)OVER(ORDER BY LINE_INDEX, START_DATE_ACTIVE, END_DATE_ACTIVE, ATTR_VALUE_FROM) != ATTR_VALUE_FROM AND BASE_ASSOCIATE_FLAG = 'A'
THEN
1
WHEN (lag(ATTR_VALUE_FROM, lag_flag)OVER(ORDER BY LINE_INDEX, START_DATE_ACTIVE, END_DATE_ACTIVE, ATTR_VALUE_FROM) != ATTR_VALUE_FROM
OR lag(START_DATE_ACTIVE, lag_flag) OVER(ORDER BY LINE_INDEX, START_DATE_ACTIVE, END_DATE_ACTIVE, ATTR_VALUE_FROM) != START_DATE_ACTIVE
OR lag(END_DATE_ACTIVE, lag_flag) OVER(ORDER BY LINE_INDEX, START_DATE_ACTIVE, END_DATE_ACTIVE, ATTR_VALUE_FROM) != END_DATE_ACTIVE)
AND BASE_ASSOCIATE_FLAG = 'B'
THEN
1
WHEN lag_flag = 0
THEN
1
ELSE
2
END diff_flag
FROM (SELECT t2.*,
first_value(lag_count_flag) OVER(PARTITION BY LINE_INDEX
ORDER BY LINE_INDEX, START_DATE_ACTIVE, END_DATE_ACTIVE) lag_flag
FROM (SELECT t1.*,
CASE
WHEN BASE_ASSOCIATE_FLAG = 'B'
THEN
lag(COUNT, 1, 0) OVER(ORDER BY LINE_INDEX, START_DATE_ACTIVE, END_DATE_ACTIVE, ATTR_VALUE_FROM)
END lag_count_flag
FROM (SELECT t.*,
COUNT(*) over(PARTITION BY LINE_INDEX) COUNT,
row_number() over(PARTITION BY NAME, FACTOR_ATTR_CONTEXT, FACTOR_ATTRIBUTE, COMP_OPERATOR, ATTR_VALUE_FROM,
ORDER BY LINE_INDEX) row_number
FROM xxqp_pam_factor_stg t
WHERE
t.process_flag = 1 AND t.request_id = v_conc_req_id) t1) t2) t3) t4
ORDER BY t4.LINE_INDEX,
t4.START_DATE_ACTIVE,
t4.ATTR_VALUE_FROM
0

## Featured Post

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

I remember the day when someone asked me to create a user for an application developement. The user should be able to create views and materialized views and, so, I used the following syntax: (CODE) This way, I guessed, I would ensure that use…
Background In several of the companies I have worked for, I noticed that corporate reporting is off loaded from the production database and done mainly on a clone database which needs to be kept up to date daily by various means, be it a logical…
This video shows how to set up a shell script to accept a positional parameter when called, pass that to a SQL script, accept the output from the statement back and then manipulate it in the Shell.
Via a live example, show how to restore a database from backup after a simulated disk failure using RMAN.
###### Suggested Courses
Course of the Month7 days, 18 hours left to enroll