Link to home
Start Free TrialLog in
Avatar of anumoses
anumosesFlag for United States of America

asked on

duplicates

select SCHEDULE_DATE, SOCIAL_SECURITY_NO,
          SITE_CODE, START_TIME, END_TIME
  from dept_staff
where schedule_date = '25-jun-2011'      
----------------------
SCHEDULE_DATE|SOCIAL_SECURITY_NO|SITE_CODE|START_TIME|END_TIME
6/25/2011|aaaaaaaaa||630|300
6/25/2011|aaaaaaaaa||630|300
6/25/2011|aaaaaaaaa||630|300
6/25/2011|aaaaaaaaa||630|300
6/25/2011|aaaaaaaaa||630|300
6/25/2011|aaaaaaaaa||630|300
6/25/2011|aaaaaaaaa||630|300
6/25/2011|aaaaaaaaa||630|300
6/25/2011|aaaaaaaaa||630|300
6/25/2011|aaaaaaaaa||630|300
6/25/2011|aaaaaaaaa||630|300
6/25/2011|aaaaaaaaa||630|300
6/25/2011|aaaaaaaaa||630|300
6/25/2011|bbbbbbbbb||800|400
6/25/2011|bbbbbbbbb||800|400
6/25/2011|bbbbbbbbb||800|400
6/25/2011|bbbbbbbbb||800|400
6/25/2011|bbbbbbbbb||800|400
6/25/2011|bbbbbbbbb||800|400
6/25/2011|bbbbbbbbb||800|400
6/25/2011|bbbbbbbbb||800|400
6/25/2011|bbbbbbbbb||800|400
6/25/2011|bbbbbbbbb||800|400
6/25/2011|bbbbbbbbb||800|400
6/25/2011|bbbbbbbbb||800|400
6/25/2011|bbbbbbbbb||800|400
-------------------
Here data for each employee is repeated 13 times. We go by social_security_no(key).
I need to find the duplicates for the schedule date.
 I tried this query below. but takes too long to execute. I have to kill the execution.

select * FROM dept_staff a
 WHERE a.ROWID >
          ANY (SELECT ROWID
                 FROM dept_staff M
                WHERE M.SOCIAL_SECURITY_NO = a.SOCIAL_SECURITY_NO
           and m.schedule_date between '01-JAN-2011' and '30-JUN-2011');
Avatar of slightwv (䄆 Netminder)
slightwv (䄆 Netminder)

Do you have an index on the date column?

Also try to not use implicit conversions and use TO_DATE on the strings and provide the format mask.

The query you used is sort of the 'standard' but you have a typo:

...
SELECT MIN(ROWID)
                 FROM dept_staff M
...
please check this article to see your options to get non-duplicates:
https://www.experts-exchange.com/A_3203.html

in regards to performance, you need to check the explain plan, aka if you have a index on the social_security_no + schedule_date field ...
Avatar of anumoses

ASKER

Indexes on
SCHEDULE_DATE      
SOCIAL_SECURITY_NO      
SITE_CODE
select * FROM dept_staff a
 WHERE a.ROWID >
          ANY (SELECT MIN(ROWID)
                 FROM dept_staff M
                WHERE M.SOCIAL_SECURITY_NO = a.SOCIAL_SECURITY_NO
                          and m.schedule_date between '01-JAN-2011' and '30-JUN-2011');
----------------
I get unwanted date form date range not chosen. I mean august 2011
unwanted data
please consider this query:
select *
  from ( 
select a.*
, row_number() over (partition by SOCIAL_SECURITY_NO order by rowid) rn
 FROM dept_staff a
 where a.schedule_date between '01-JAN-2011' and '30-JUN-2011')
where rn > 1

Open in new window

>>nwanted data

Sorry.  Think you are missing a group by in there.  Check the common duplicate queries out there.

The row_number proposed above will likely be faster but that depends.

As suggested: check the execution plans.

Not a group by but a column join on the date columns.
>>  I tried this query below. but takes too long to execute. I have to kill the execution.
In case there is an performance issue with new code, you can also think to use parallel hint as well.
What is the datatype of schedule_date?
Date
ASKER CERTIFIED SOLUTION
Avatar of awking00
awking00
Flag of United States of America image

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Also, note that using between ... and to_date('30-JUN-2011','DD-MON-YYYY') will not include any records with a schedule_date of June 30th that contains a time element. If schedule_date includes a time element use -
to_date('01-JUL-2011','DD-MON-YYYY') - 1/86400
or
to_date('30-JUN-2011 23:59:59','DD-MON-YYYY HH24:MI:SS')
instead.
Can I do order by schedule date and social_security_no?
Is the order by the last statement? I mean


select * from dept_staff
where rowid in
(select row_id from
 (select rowid row_id,
         row_number() over (partition by schedule_date, social_security_no order by rowid) rn
  from dept_staff
  where schedule_date between to_date('01-JAN-2011','DD-MON-YYYY')
                          and to_date('30-JUN-2011','DD-MON-YYYY'))
 where rn > 1)
 order by social_security_no,schedule_date desc;      

-------------
I am not geting the right data.
Did you not try the code form angelIII's in http:#a36346144.

It also uses analytics but doesn't mess with rowids.
indeed, did you try my comment?
apart from that, please clarify what bad data you get? ...
indeed, did you try my comment?
apart from that, please clarify what bad data you get? ...

yes I tried your comment with order by as I posted above. But I am not getting the data order by
please post some data, and what you get "wrong".
otherwise it will remain guesswork
>>I am not geting the right data.<<
What data are you getting and why is it not right?
SOLUTION
Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
thanks