cleaning duplicates from a table

I have a table named tblOrgRegistrations.  Sample data from that table is attaced to this question.  There was a mistake made that allowed duplicate entries into the table.  The RegID, ActivityDate, HourFrom, HourTo cannot overlap in timeframe.

Is it possible to find all the records that are duplicates where times overlap?  For example, in the attached document, the times overlap for RegID number 35.  He has 9 to 10 and 9 to 11.  Unfortunately I need to identify all these overlaps and correct them.  How can I query to find all these duplicates.
sample1.xlsx
al4629740Asked:
Who is Participating?
I wear a lot of hats...

"The solutions and answers provided on Experts Exchange have been extremely helpful to me over the last few years. I wear a lot of hats - Developer, Database Administrator, Help Desk, etc., so I know a lot of things but not a lot about one thing. Experts Exchange gives me answers from people who do know a lot about one thing, in a easy to use platform." -Todd S.

PortletPaulfreelancerCommented:
what version of SQL Server please (sql features differ in each version)
0
PortletPaulfreelancerCommented:
Try this
select
      TOR.*
from tblOrgRegistrations TOR
inner join tblOrgRegistrations as TOR2 on TOR.RegID =  TOR2.RegID
                     and TOR.ActivityDate = TOR2.ActivityDate
                     and TOR.HourID <> TOR2.HourID
                     and (
                           TOR.HourTimeFrom between TOR2.HourTimeFrom and TOR2.HourTimeTo
                          OR
                           TOR.HourTimeTo between TOR2.HourTimeFrom and TOR2.HourTimeTo
                         )

Open in new window


Details:
**MS SQL Server 2008 Schema Setup**:

    
    
    CREATE TABLE tblOrgRegistrations
    	([HourID] int, [Agency] varchar(13), [AgencyID] int, [Program] varchar(3), [ActivityID] int, [RegID] int, [ActivityDate] datetime, [Hours] int, [HourTimeFrom] varchar(8), [HourTimeTo] varchar(8), [ParticipantHour] int, [VolunteerHour] int, [Fiscal] int, [EntryTime] varchar(8))
    ;
    	
    INSERT INTO tblOrgRegistrations
    	([HourID], [Agency], [AgencyID], [Program], [ActivityID], [RegID], [ActivityDate], [Hours], [HourTimeFrom], [HourTimeTo], [ParticipantHour], [VolunteerHour], [Fiscal], [EntryTime])
    VALUES
    	(180, 'Administrator', 74, 'CSW', 3, 35, '2014-07-03 00:00:00', 1, '09:00:00', '10:00:00', 1, 0, 2015, '10:23:32'),
    	(181, 'Administrator', 74, 'CSW', 3, 35, '2014-07-03 00:00:00', 1, '09:00:00', '11:00:00', 1, 0, 2015, '10:23:32'),
    	(183, 'Administrator', 74, 'CSW', 3, 6, '2014-07-03 00:00:00', 1, '09:00:00', '10:00:00', 1, 0, 2015, '10:23:32'),
    	(187, 'Administrator', 74, 'CSW', 3, 1, '2014-07-03 00:00:00', 1, '09:00:00', '10:00:00', 0, 1, 2015, '10:23:32'),
    	(190, 'Administrator', 74, 'CYD', 4, 8, '2014-07-06 00:00:00', 1, '09:00:00', '10:00:00', 0, 1, 2015, '23:13:03'),
    	(193, 'Administrator', 74, 'CSW', 3, 6, '2014-07-06 00:00:00', 4, '09:00:00', '13:00:00', 1, 0, 2015, '23:17:09'),
    	(196, 'Administrator', 74, 'CYD', 3, 35, '2014-07-07 00:00:00', 1, '09:00:00', '10:00:00', 0, 1, 2015, '10:33:23'),
    	(199, 'Administrator', 74, 'CYD', 3, 14, '2014-07-07 00:00:00', 1, '09:00:00', '10:00:00', 0, 1, 2015, '10:38:16'),
    	(200, 'Administrator', 74, 'CYD', 3, 4, '2014-07-07 00:00:00', 1, '09:00:00', '10:00:00', 0, 1, 2015, '10:38:16'),
    	(202, 'Administrator', 74, 'CYD', 3, 1, '2014-07-07 00:00:00', 1, '09:00:00', '10:00:00', 0, 1, 2015, '10:38:16')
    ; 

**Query 1**:

    select
          TOR.*
    from tblOrgRegistrations TOR
    inner join tblOrgRegistrations as TOR2 on TOR.RegID =  TOR2.RegID
                         and TOR.ActivityDate = TOR2.ActivityDate
                         and TOR.HourID <> TOR2.HourID
                         and (
                               TOR.HourTimeFrom between TOR2.HourTimeFrom and TOR2.HourTimeTo
                              OR
                               TOR.HourTimeTo between TOR2.HourTimeFrom and TOR2.HourTimeTo
                             )
    
    

**[Results][2]**:
    
    | HOURID |        AGENCY | AGENCYID | PROGRAM | ACTIVITYID | REGID |                ACTIVITYDATE | HOURS | HOURTIMEFROM | HOURTIMETO | PARTICIPANTHOUR | VOLUNTEERHOUR | FISCAL | ENTRYTIME |
    |--------|---------------|----------|---------|------------|-------|-----------------------------|-------|--------------|------------|-----------------|---------------|--------|-----------|
    |    180 | Administrator |       74 |     CSW |          3 |    35 | July, 03 2014 00:00:00+0000 |     1 |     09:00:00 |   10:00:00 |               1 |             0 |   2015 |  10:23:32 |
    |    181 | Administrator |       74 |     CSW |          3 |    35 | July, 03 2014 00:00:00+0000 |     1 |     09:00:00 |   11:00:00 |               1 |             0 |   2015 |  10:23:32 |



  [1]: http://sqlfiddle.com/#!3/c088f/1

Open in new window

0
al4629740Author Commented:
2012
0
The Ultimate Tool Kit for Technolgy Solution Provi

Broken down into practical pointers and step-by-step instructions, the IT Service Excellence Tool Kit delivers expert advice for technology solution providers. Get your free copy for valuable how-to assets including sample agreements, checklists, flowcharts, and more!

al4629740Author Commented:
Paul,

It does not appear that it worked.  I am getting records that should not be in that query.
0
al4629740Author Commented:
I would like to only get those that are duplicated.
0
PortletPaulfreelancerCommented:
For the sample you provided it does exactly that
    | HOURID |        AGENCY | AGENCYID | PROGRAM | ACTIVITYID | REGID |                ACTIVITYDATE | HOURS | HOURTIMEFROM | HOURTIMETO | PARTICIPANTHOUR | VOLUNTEERHOUR | FISCAL | ENTRYTIME |
    |--------|---------------|----------|---------|------------|-------|-----------------------------|-------|--------------|------------|-----------------|---------------|--------|-----------|
    |    180 | Administrator |       74 |     CSW |          3 |    35 | July, 03 2014 00:00:00+0000 |     1 |     09:00:00 |   10:00:00 |               1 |             0 |   2015 |  10:23:32 |
    |    181 | Administrator |       74 |     CSW |          3 |    35 | July, 03 2014 00:00:00+0000 |     1 |     09:00:00 |   11:00:00 |               1 |             0 |   2015 |  10:23:32 |

Open in new window

see it working at: http://sqlfiddle.com/#!3/c088f/1

please provide a sample where it also fails

----------
different topic, if using sql 2012 please always add that in your question or as a tag so this is clear. it really can make a difference but probably not for this question
0
al4629740Author Commented:
Here is the query I used and I attached the output.  It appears that the join is bringing up records that are duplicates that are not really duplicates.  


select
      TOR.*
from tblorghours TOR
inner join tblorghours as TOR2 on TOR.RegID =  TOR2.RegID
                     and TOR.ActivityDate = TOR2.ActivityDate
                     and TOR.HourID <> TOR2.HourID
                     and (
                           TOR.HourTimeFrom between TOR2.HourTimeFrom and TOR2.HourTimeTo
                          OR
                           TOR.HourTimeTo between TOR2.HourTimeFrom and TOR2.HourTimeTo
                         ) order by activitydate, regid,hourtimefrom, ActivityID

Open in new window

sample1.xlsx
0
PortletPaulfreelancerCommented:
It is important for you to understand what the the query does.

Part of the logic is that you absolutely must NOT compare a record to itself (otherwise every record would be "duplicated")
So, in your original sample there was a unique ID column [HourID] and this worked: [HourID] <> [HourID]

You have not given me a unique ID to work with in this new sample, so I introduced one. I called it [ID]
you may have to edit this to suit your data
if that table does not have a unique primary key this query will not work.

select
      TOH.*
from tblorghours TOH
inner join tblorghours as TOH2 on TOH.RegID =  TOH2.RegID
                     and TOH.ActivityID = TOH2.ActivityID
                     and TOH.ActivityDate = TOH2.ActivityDate
                     and TOH.ID <> TOH2.ID                                                  --<< VERY IMPORTANT !!!
                     and (
                           TOH.HourTimeFrom >= TOH2.HourTimeFrom and TOH.HourTimeFrom < TOH2.HourTimeTo
                          OR
                           TOH.HourTimeTo >= TOH2.HourTimeFrom and TOH.HourTimeTo < TOH2.HourTimeTo
                         ) 
 order by activitydate, regid,hourtimefrom, ActivityID
;

Open in new window

also see: http://sqlfiddle.com/#!3/0e078/4
0

Experts Exchange Solution brought to you by

Your issues matter to us.

Facing a tech roadblock? Get the help and guidance you need from experienced professionals who care. Ask your question anytime, anywhere, with no hassle.

Start your 7-day free trial
al4629740Author Commented:
In the new sample I gave you, its the same table.  It shows HourID as the primary ID
0
al4629740Author Commented:
btw, how did you post results that way in your answer above.  I would like to do that instead of uploading an excel document.
0
PortletPaulfreelancerCommented:
it is a feature of sqlfiddle, there are 3 output styles, choose "plaintext" then copy/pastesqlfidde-outputs.png
0
al4629740Author Commented:
what is sqlfiddle?
0
al4629740Author Commented:
is that a replacement to sql management studio?
0
al4629740Author Commented:
Back to the question:

Since I don't need to see overlapped times that include

9:00am to 10:00am
10:00am to 11:00am

I changed your query to

select
      TOR.*
from tblorghours TOR
inner join tblorghours as TOR2 on TOR.RegID =  TOR2.RegID
                     and TOR.ActivityDate = TOR2.ActivityDate
                     and TOR.HourID <> TOR2.HourID
                     and (
                           TOR.HourTimeFrom > TOR2.HourTimeFrom and TOR.HourTimeFrom < TOR2.HourTimeTo 
                          OR
                           TOR.HourTimeTo > TOR2.HourTimeFrom and TOR.HourTimeTo < TOR2.HourTimeTo
						  OR
						   TOR.HourTimeFrom < TOR2.HourTimeFrom and TOR.HourTimeTo > TOR2.HourTimeTo
                         ) order by Agency,activitydate, regid,hourtimefrom, ActivityID

Open in new window


Does this work for finding times of overlapping?
0
al4629740Author Commented:
It's unnecessary to find recordsWhere the ending time and start time are the same
0
It's more than this solution.Get answers and train to solve all your tech problems - anytime, anywhere.Try it for free Edge Out The Competitionfor your dream job with proven skills and certifications.Get started today Stand Outas the employee with proven skills.Start learning today for free Move Your Career Forwardwith certification training in the latest technologies.Start your trial today
Microsoft SQL Server

From novice to tech pro — start learning today.