Solved

cleaning duplicates from a table

Posted on 2014-10-10
15
84 Views
Last Modified: 2014-10-13
I have a table named tblOrgRegistrations.  Sample data from that table is attaced to this question.  There was a mistake made that allowed duplicate entries into the table.  The RegID, ActivityDate, HourFrom, HourTo cannot overlap in timeframe.

Is it possible to find all the records that are duplicates where times overlap?  For example, in the attached document, the times overlap for RegID number 35.  He has 9 to 10 and 9 to 11.  Unfortunately I need to identify all these overlaps and correct them.  How can I query to find all these duplicates.
sample1.xlsx
0
Comment
Question by:al4629740
[X]
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
  • 10
  • 5
15 Comments
 
LVL 49

Expert Comment

by:PortletPaul
ID: 40374333
what version of SQL Server please (sql features differ in each version)
0
 
LVL 49

Expert Comment

by:PortletPaul
ID: 40374342
Try this
select
      TOR.*
from tblOrgRegistrations TOR
inner join tblOrgRegistrations as TOR2 on TOR.RegID =  TOR2.RegID
                     and TOR.ActivityDate = TOR2.ActivityDate
                     and TOR.HourID <> TOR2.HourID
                     and (
                           TOR.HourTimeFrom between TOR2.HourTimeFrom and TOR2.HourTimeTo
                          OR
                           TOR.HourTimeTo between TOR2.HourTimeFrom and TOR2.HourTimeTo
                         )

Open in new window


Details:
**MS SQL Server 2008 Schema Setup**:

    
    
    CREATE TABLE tblOrgRegistrations
    	([HourID] int, [Agency] varchar(13), [AgencyID] int, [Program] varchar(3), [ActivityID] int, [RegID] int, [ActivityDate] datetime, [Hours] int, [HourTimeFrom] varchar(8), [HourTimeTo] varchar(8), [ParticipantHour] int, [VolunteerHour] int, [Fiscal] int, [EntryTime] varchar(8))
    ;
    	
    INSERT INTO tblOrgRegistrations
    	([HourID], [Agency], [AgencyID], [Program], [ActivityID], [RegID], [ActivityDate], [Hours], [HourTimeFrom], [HourTimeTo], [ParticipantHour], [VolunteerHour], [Fiscal], [EntryTime])
    VALUES
    	(180, 'Administrator', 74, 'CSW', 3, 35, '2014-07-03 00:00:00', 1, '09:00:00', '10:00:00', 1, 0, 2015, '10:23:32'),
    	(181, 'Administrator', 74, 'CSW', 3, 35, '2014-07-03 00:00:00', 1, '09:00:00', '11:00:00', 1, 0, 2015, '10:23:32'),
    	(183, 'Administrator', 74, 'CSW', 3, 6, '2014-07-03 00:00:00', 1, '09:00:00', '10:00:00', 1, 0, 2015, '10:23:32'),
    	(187, 'Administrator', 74, 'CSW', 3, 1, '2014-07-03 00:00:00', 1, '09:00:00', '10:00:00', 0, 1, 2015, '10:23:32'),
    	(190, 'Administrator', 74, 'CYD', 4, 8, '2014-07-06 00:00:00', 1, '09:00:00', '10:00:00', 0, 1, 2015, '23:13:03'),
    	(193, 'Administrator', 74, 'CSW', 3, 6, '2014-07-06 00:00:00', 4, '09:00:00', '13:00:00', 1, 0, 2015, '23:17:09'),
    	(196, 'Administrator', 74, 'CYD', 3, 35, '2014-07-07 00:00:00', 1, '09:00:00', '10:00:00', 0, 1, 2015, '10:33:23'),
    	(199, 'Administrator', 74, 'CYD', 3, 14, '2014-07-07 00:00:00', 1, '09:00:00', '10:00:00', 0, 1, 2015, '10:38:16'),
    	(200, 'Administrator', 74, 'CYD', 3, 4, '2014-07-07 00:00:00', 1, '09:00:00', '10:00:00', 0, 1, 2015, '10:38:16'),
    	(202, 'Administrator', 74, 'CYD', 3, 1, '2014-07-07 00:00:00', 1, '09:00:00', '10:00:00', 0, 1, 2015, '10:38:16')
    ; 

**Query 1**:

    select
          TOR.*
    from tblOrgRegistrations TOR
    inner join tblOrgRegistrations as TOR2 on TOR.RegID =  TOR2.RegID
                         and TOR.ActivityDate = TOR2.ActivityDate
                         and TOR.HourID <> TOR2.HourID
                         and (
                               TOR.HourTimeFrom between TOR2.HourTimeFrom and TOR2.HourTimeTo
                              OR
                               TOR.HourTimeTo between TOR2.HourTimeFrom and TOR2.HourTimeTo
                             )
    
    

**[Results][2]**:
    
    | HOURID |        AGENCY | AGENCYID | PROGRAM | ACTIVITYID | REGID |                ACTIVITYDATE | HOURS | HOURTIMEFROM | HOURTIMETO | PARTICIPANTHOUR | VOLUNTEERHOUR | FISCAL | ENTRYTIME |
    |--------|---------------|----------|---------|------------|-------|-----------------------------|-------|--------------|------------|-----------------|---------------|--------|-----------|
    |    180 | Administrator |       74 |     CSW |          3 |    35 | July, 03 2014 00:00:00+0000 |     1 |     09:00:00 |   10:00:00 |               1 |             0 |   2015 |  10:23:32 |
    |    181 | Administrator |       74 |     CSW |          3 |    35 | July, 03 2014 00:00:00+0000 |     1 |     09:00:00 |   11:00:00 |               1 |             0 |   2015 |  10:23:32 |



  [1]: http://sqlfiddle.com/#!3/c088f/1

Open in new window

0
 

Author Comment

by:al4629740
ID: 40374372
2012
0
Raise the IQ of Your IT Alerts

From IT major incidents to manufacturing line slowdowns, every business process generates insights that need to reach the people required to take action. You need a platform that integrates with your business tools to create fully enabled DevOps toolchains.

You need xMatters.

 

Author Comment

by:al4629740
ID: 40374374
Paul,

It does not appear that it worked.  I am getting records that should not be in that query.
0
 

Author Comment

by:al4629740
ID: 40374375
I would like to only get those that are duplicated.
0
 
LVL 49

Expert Comment

by:PortletPaul
ID: 40374387
For the sample you provided it does exactly that
    | HOURID |        AGENCY | AGENCYID | PROGRAM | ACTIVITYID | REGID |                ACTIVITYDATE | HOURS | HOURTIMEFROM | HOURTIMETO | PARTICIPANTHOUR | VOLUNTEERHOUR | FISCAL | ENTRYTIME |
    |--------|---------------|----------|---------|------------|-------|-----------------------------|-------|--------------|------------|-----------------|---------------|--------|-----------|
    |    180 | Administrator |       74 |     CSW |          3 |    35 | July, 03 2014 00:00:00+0000 |     1 |     09:00:00 |   10:00:00 |               1 |             0 |   2015 |  10:23:32 |
    |    181 | Administrator |       74 |     CSW |          3 |    35 | July, 03 2014 00:00:00+0000 |     1 |     09:00:00 |   11:00:00 |               1 |             0 |   2015 |  10:23:32 |

Open in new window

see it working at: http://sqlfiddle.com/#!3/c088f/1

please provide a sample where it also fails

----------
different topic, if using sql 2012 please always add that in your question or as a tag so this is clear. it really can make a difference but probably not for this question
0
 

Author Comment

by:al4629740
ID: 40374438
Here is the query I used and I attached the output.  It appears that the join is bringing up records that are duplicates that are not really duplicates.  


select
      TOR.*
from tblorghours TOR
inner join tblorghours as TOR2 on TOR.RegID =  TOR2.RegID
                     and TOR.ActivityDate = TOR2.ActivityDate
                     and TOR.HourID <> TOR2.HourID
                     and (
                           TOR.HourTimeFrom between TOR2.HourTimeFrom and TOR2.HourTimeTo
                          OR
                           TOR.HourTimeTo between TOR2.HourTimeFrom and TOR2.HourTimeTo
                         ) order by activitydate, regid,hourtimefrom, ActivityID

Open in new window

sample1.xlsx
0
 
LVL 49

Accepted Solution

by:
PortletPaul earned 500 total points
ID: 40374468
It is important for you to understand what the the query does.

Part of the logic is that you absolutely must NOT compare a record to itself (otherwise every record would be "duplicated")
So, in your original sample there was a unique ID column [HourID] and this worked: [HourID] <> [HourID]

You have not given me a unique ID to work with in this new sample, so I introduced one. I called it [ID]
you may have to edit this to suit your data
if that table does not have a unique primary key this query will not work.

select
      TOH.*
from tblorghours TOH
inner join tblorghours as TOH2 on TOH.RegID =  TOH2.RegID
                     and TOH.ActivityID = TOH2.ActivityID
                     and TOH.ActivityDate = TOH2.ActivityDate
                     and TOH.ID <> TOH2.ID                                                  --<< VERY IMPORTANT !!!
                     and (
                           TOH.HourTimeFrom >= TOH2.HourTimeFrom and TOH.HourTimeFrom < TOH2.HourTimeTo
                          OR
                           TOH.HourTimeTo >= TOH2.HourTimeFrom and TOH.HourTimeTo < TOH2.HourTimeTo
                         ) 
 order by activitydate, regid,hourtimefrom, ActivityID
;

Open in new window

also see: http://sqlfiddle.com/#!3/0e078/4
0
 

Author Comment

by:al4629740
ID: 40374783
In the new sample I gave you, its the same table.  It shows HourID as the primary ID
0
 

Author Comment

by:al4629740
ID: 40374787
btw, how did you post results that way in your answer above.  I would like to do that instead of uploading an excel document.
0
 
LVL 49

Expert Comment

by:PortletPaul
ID: 40374805
it is a feature of sqlfiddle, there are 3 output styles, choose "plaintext" then copy/pastesqlfidde-outputs.png
0
 

Author Comment

by:al4629740
ID: 40374867
what is sqlfiddle?
0
 

Author Comment

by:al4629740
ID: 40374869
is that a replacement to sql management studio?
0
 

Author Comment

by:al4629740
ID: 40374883
Back to the question:

Since I don't need to see overlapped times that include

9:00am to 10:00am
10:00am to 11:00am

I changed your query to

select
      TOR.*
from tblorghours TOR
inner join tblorghours as TOR2 on TOR.RegID =  TOR2.RegID
                     and TOR.ActivityDate = TOR2.ActivityDate
                     and TOR.HourID <> TOR2.HourID
                     and (
                           TOR.HourTimeFrom > TOR2.HourTimeFrom and TOR.HourTimeFrom < TOR2.HourTimeTo 
                          OR
                           TOR.HourTimeTo > TOR2.HourTimeFrom and TOR.HourTimeTo < TOR2.HourTimeTo
						  OR
						   TOR.HourTimeFrom < TOR2.HourTimeFrom and TOR.HourTimeTo > TOR2.HourTimeTo
                         ) order by Agency,activitydate, regid,hourtimefrom, ActivityID

Open in new window


Does this work for finding times of overlapping?
0
 

Author Comment

by:al4629740
ID: 40375128
It's unnecessary to find recordsWhere the ending time and start time are the same
0

Featured Post

Get 15 Days FREE Full-Featured Trial

Benefit from a mission critical IT monitoring with Monitis Premium or get it FREE for your entry level monitoring needs.
-Over 200,000 users
-More than 300,000 websites monitored
-Used in 197 countries
-Recommended by 98% of users

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

International Data Corporation (IDC) prognosticates that before the current the year gets over disbursing on IT framework products to be sent in cloud environs will be $37.1B.
Ever wondered why sometimes your SQL Server is slow or unresponsive with connections spiking up but by the time you go in, all is well? The following article will show you how to install and configure a SQL job that will send you email alerts includ…
Via a live example, show how to extract information from SQL Server on Database, Connection and Server properties
This videos aims to give the viewer a basic demonstration of how a user can query current session information by using the SYS_CONTEXT function

695 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question