Solved

cleaning duplicates from a table

Posted on 2014-10-10
15
69 Views
Last Modified: 2014-10-13
I have a table named tblOrgRegistrations.  Sample data from that table is attaced to this question.  There was a mistake made that allowed duplicate entries into the table.  The RegID, ActivityDate, HourFrom, HourTo cannot overlap in timeframe.

Is it possible to find all the records that are duplicates where times overlap?  For example, in the attached document, the times overlap for RegID number 35.  He has 9 to 10 and 9 to 11.  Unfortunately I need to identify all these overlaps and correct them.  How can I query to find all these duplicates.
sample1.xlsx
0
Comment
Question by:al4629740
  • 10
  • 5
15 Comments
 
LVL 48

Expert Comment

by:PortletPaul
ID: 40374333
what version of SQL Server please (sql features differ in each version)
0
 
LVL 48

Expert Comment

by:PortletPaul
ID: 40374342
Try this
select
      TOR.*
from tblOrgRegistrations TOR
inner join tblOrgRegistrations as TOR2 on TOR.RegID =  TOR2.RegID
                     and TOR.ActivityDate = TOR2.ActivityDate
                     and TOR.HourID <> TOR2.HourID
                     and (
                           TOR.HourTimeFrom between TOR2.HourTimeFrom and TOR2.HourTimeTo
                          OR
                           TOR.HourTimeTo between TOR2.HourTimeFrom and TOR2.HourTimeTo
                         )

Open in new window


Details:
**MS SQL Server 2008 Schema Setup**:

    
    
    CREATE TABLE tblOrgRegistrations
    	([HourID] int, [Agency] varchar(13), [AgencyID] int, [Program] varchar(3), [ActivityID] int, [RegID] int, [ActivityDate] datetime, [Hours] int, [HourTimeFrom] varchar(8), [HourTimeTo] varchar(8), [ParticipantHour] int, [VolunteerHour] int, [Fiscal] int, [EntryTime] varchar(8))
    ;
    	
    INSERT INTO tblOrgRegistrations
    	([HourID], [Agency], [AgencyID], [Program], [ActivityID], [RegID], [ActivityDate], [Hours], [HourTimeFrom], [HourTimeTo], [ParticipantHour], [VolunteerHour], [Fiscal], [EntryTime])
    VALUES
    	(180, 'Administrator', 74, 'CSW', 3, 35, '2014-07-03 00:00:00', 1, '09:00:00', '10:00:00', 1, 0, 2015, '10:23:32'),
    	(181, 'Administrator', 74, 'CSW', 3, 35, '2014-07-03 00:00:00', 1, '09:00:00', '11:00:00', 1, 0, 2015, '10:23:32'),
    	(183, 'Administrator', 74, 'CSW', 3, 6, '2014-07-03 00:00:00', 1, '09:00:00', '10:00:00', 1, 0, 2015, '10:23:32'),
    	(187, 'Administrator', 74, 'CSW', 3, 1, '2014-07-03 00:00:00', 1, '09:00:00', '10:00:00', 0, 1, 2015, '10:23:32'),
    	(190, 'Administrator', 74, 'CYD', 4, 8, '2014-07-06 00:00:00', 1, '09:00:00', '10:00:00', 0, 1, 2015, '23:13:03'),
    	(193, 'Administrator', 74, 'CSW', 3, 6, '2014-07-06 00:00:00', 4, '09:00:00', '13:00:00', 1, 0, 2015, '23:17:09'),
    	(196, 'Administrator', 74, 'CYD', 3, 35, '2014-07-07 00:00:00', 1, '09:00:00', '10:00:00', 0, 1, 2015, '10:33:23'),
    	(199, 'Administrator', 74, 'CYD', 3, 14, '2014-07-07 00:00:00', 1, '09:00:00', '10:00:00', 0, 1, 2015, '10:38:16'),
    	(200, 'Administrator', 74, 'CYD', 3, 4, '2014-07-07 00:00:00', 1, '09:00:00', '10:00:00', 0, 1, 2015, '10:38:16'),
    	(202, 'Administrator', 74, 'CYD', 3, 1, '2014-07-07 00:00:00', 1, '09:00:00', '10:00:00', 0, 1, 2015, '10:38:16')
    ; 

**Query 1**:

    select
          TOR.*
    from tblOrgRegistrations TOR
    inner join tblOrgRegistrations as TOR2 on TOR.RegID =  TOR2.RegID
                         and TOR.ActivityDate = TOR2.ActivityDate
                         and TOR.HourID <> TOR2.HourID
                         and (
                               TOR.HourTimeFrom between TOR2.HourTimeFrom and TOR2.HourTimeTo
                              OR
                               TOR.HourTimeTo between TOR2.HourTimeFrom and TOR2.HourTimeTo
                             )
    
    

**[Results][2]**:
    
    | HOURID |        AGENCY | AGENCYID | PROGRAM | ACTIVITYID | REGID |                ACTIVITYDATE | HOURS | HOURTIMEFROM | HOURTIMETO | PARTICIPANTHOUR | VOLUNTEERHOUR | FISCAL | ENTRYTIME |
    |--------|---------------|----------|---------|------------|-------|-----------------------------|-------|--------------|------------|-----------------|---------------|--------|-----------|
    |    180 | Administrator |       74 |     CSW |          3 |    35 | July, 03 2014 00:00:00+0000 |     1 |     09:00:00 |   10:00:00 |               1 |             0 |   2015 |  10:23:32 |
    |    181 | Administrator |       74 |     CSW |          3 |    35 | July, 03 2014 00:00:00+0000 |     1 |     09:00:00 |   11:00:00 |               1 |             0 |   2015 |  10:23:32 |



  [1]: http://sqlfiddle.com/#!3/c088f/1

Open in new window

0
 

Author Comment

by:al4629740
ID: 40374372
2012
0
 

Author Comment

by:al4629740
ID: 40374374
Paul,

It does not appear that it worked.  I am getting records that should not be in that query.
0
 

Author Comment

by:al4629740
ID: 40374375
I would like to only get those that are duplicated.
0
 
LVL 48

Expert Comment

by:PortletPaul
ID: 40374387
For the sample you provided it does exactly that
    | HOURID |        AGENCY | AGENCYID | PROGRAM | ACTIVITYID | REGID |                ACTIVITYDATE | HOURS | HOURTIMEFROM | HOURTIMETO | PARTICIPANTHOUR | VOLUNTEERHOUR | FISCAL | ENTRYTIME |
    |--------|---------------|----------|---------|------------|-------|-----------------------------|-------|--------------|------------|-----------------|---------------|--------|-----------|
    |    180 | Administrator |       74 |     CSW |          3 |    35 | July, 03 2014 00:00:00+0000 |     1 |     09:00:00 |   10:00:00 |               1 |             0 |   2015 |  10:23:32 |
    |    181 | Administrator |       74 |     CSW |          3 |    35 | July, 03 2014 00:00:00+0000 |     1 |     09:00:00 |   11:00:00 |               1 |             0 |   2015 |  10:23:32 |

Open in new window

see it working at: http://sqlfiddle.com/#!3/c088f/1

please provide a sample where it also fails

----------
different topic, if using sql 2012 please always add that in your question or as a tag so this is clear. it really can make a difference but probably not for this question
0
 

Author Comment

by:al4629740
ID: 40374438
Here is the query I used and I attached the output.  It appears that the join is bringing up records that are duplicates that are not really duplicates.  


select
      TOR.*
from tblorghours TOR
inner join tblorghours as TOR2 on TOR.RegID =  TOR2.RegID
                     and TOR.ActivityDate = TOR2.ActivityDate
                     and TOR.HourID <> TOR2.HourID
                     and (
                           TOR.HourTimeFrom between TOR2.HourTimeFrom and TOR2.HourTimeTo
                          OR
                           TOR.HourTimeTo between TOR2.HourTimeFrom and TOR2.HourTimeTo
                         ) order by activitydate, regid,hourtimefrom, ActivityID

Open in new window

sample1.xlsx
0
Find Ransomware Secrets With All-Source Analysis

Ransomware has become a major concern for organizations; its prevalence has grown due to past successes achieved by threat actors. While each ransomware variant is different, we’ve seen some common tactics and trends used among the authors of the malware.

 
LVL 48

Accepted Solution

by:
PortletPaul earned 500 total points
ID: 40374468
It is important for you to understand what the the query does.

Part of the logic is that you absolutely must NOT compare a record to itself (otherwise every record would be "duplicated")
So, in your original sample there was a unique ID column [HourID] and this worked: [HourID] <> [HourID]

You have not given me a unique ID to work with in this new sample, so I introduced one. I called it [ID]
you may have to edit this to suit your data
if that table does not have a unique primary key this query will not work.

select
      TOH.*
from tblorghours TOH
inner join tblorghours as TOH2 on TOH.RegID =  TOH2.RegID
                     and TOH.ActivityID = TOH2.ActivityID
                     and TOH.ActivityDate = TOH2.ActivityDate
                     and TOH.ID <> TOH2.ID                                                  --<< VERY IMPORTANT !!!
                     and (
                           TOH.HourTimeFrom >= TOH2.HourTimeFrom and TOH.HourTimeFrom < TOH2.HourTimeTo
                          OR
                           TOH.HourTimeTo >= TOH2.HourTimeFrom and TOH.HourTimeTo < TOH2.HourTimeTo
                         ) 
 order by activitydate, regid,hourtimefrom, ActivityID
;

Open in new window

also see: http://sqlfiddle.com/#!3/0e078/4
0
 

Author Comment

by:al4629740
ID: 40374783
In the new sample I gave you, its the same table.  It shows HourID as the primary ID
0
 

Author Comment

by:al4629740
ID: 40374787
btw, how did you post results that way in your answer above.  I would like to do that instead of uploading an excel document.
0
 
LVL 48

Expert Comment

by:PortletPaul
ID: 40374805
it is a feature of sqlfiddle, there are 3 output styles, choose "plaintext" then copy/pastesqlfidde-outputs.png
0
 

Author Comment

by:al4629740
ID: 40374867
what is sqlfiddle?
0
 

Author Comment

by:al4629740
ID: 40374869
is that a replacement to sql management studio?
0
 

Author Comment

by:al4629740
ID: 40374883
Back to the question:

Since I don't need to see overlapped times that include

9:00am to 10:00am
10:00am to 11:00am

I changed your query to

select
      TOR.*
from tblorghours TOR
inner join tblorghours as TOR2 on TOR.RegID =  TOR2.RegID
                     and TOR.ActivityDate = TOR2.ActivityDate
                     and TOR.HourID <> TOR2.HourID
                     and (
                           TOR.HourTimeFrom > TOR2.HourTimeFrom and TOR.HourTimeFrom < TOR2.HourTimeTo 
                          OR
                           TOR.HourTimeTo > TOR2.HourTimeFrom and TOR.HourTimeTo < TOR2.HourTimeTo
						  OR
						   TOR.HourTimeFrom < TOR2.HourTimeFrom and TOR.HourTimeTo > TOR2.HourTimeTo
                         ) order by Agency,activitydate, regid,hourtimefrom, ActivityID

Open in new window


Does this work for finding times of overlapping?
0
 

Author Comment

by:al4629740
ID: 40375128
It's unnecessary to find recordsWhere the ending time and start time are the same
0

Featured Post

Find Ransomware Secrets With All-Source Analysis

Ransomware has become a major concern for organizations; its prevalence has grown due to past successes achieved by threat actors. While each ransomware variant is different, we’ve seen some common tactics and trends used among the authors of the malware.

Join & Write a Comment

Suggested Solutions

Title # Comments Views Activity
Sharepoint 3.0 migration 4 40
Choosing SSD drives for SQL Server 32 79
MS SQL Backup 24 70
SQL query to summarize items per month 5 27
Let's review the features of new SQL Server 2012 (Denali CTP3). It listed as below: PERCENT_RANK(): PERCENT_RANK() function will returns the percentage value of rank of the values among its group. PERCENT_RANK() function value always in be…
For both online and offline retail, the cross-channel business is the most recent pattern in the B2C trade space.
This videos aims to give the viewer a basic demonstration of how a user can query current session information by using the SYS_CONTEXT function
Using examples as well as descriptions, and references to Books Online, show the documentation available for datatypes, explain the available data types and show how data can be passed into and out of variables.

758 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

17 Experts available now in Live!

Get 1:1 Help Now