Link to home
Start Free TrialLog in
Avatar of bfuchs
bfuchsFlag for United States of America

asked on

Help needed creating a union query in SQL

Hi Experts,

I have the following fields in PatientsEmployeesSchedule table

ID (AutoNum)
EmployeeID (int)
PatientID (int)
Day (DateTime) containing a date
From (DateTime) containing time
To (DateTime) containing time

(I know the name of fields are not the best as they are reserved words, but I cant change them at this point:(

Now the table contains scheduling records that usually starts & ends at the same day, however in case it does start at one day and finishes the next day (for example from 8PM to 8AM), only one record is being entered in the table with the day field specifying the first day meaning the day schedule started.

Now my question is, how can I have a query/view that will split that in two records in the most efficient way?

So if the table has the following:
ID          EmployeeID    PatientID     Day                From           To
1           123                  456              10/27/15        8:00 PM       8:00 AM

I would like to see the following:
ID          EmployeeID    PatientID     Day                From           To
1           123                  456              10/27/15        8:00 PM       12:00 AM
1           123                  456              10/28/15        12:00 PM     08:00 AM

Thanks
Avatar of Brian Crowe
Brian Crowe
Flag of United States of America image

The easiest way to do that is to just join it against a date table.  I have created a very small sample one for this example but most production databases will have a calendar table of some kind.

DECLARE @Employee TABLE
(
	ID			INT IDENTITY(1,1) NOT NULL,
	EmployeeID	INT,
	PatientID	INT,
	[Day]		DATE,
	[From]		TIME,
	[To]		TIME
);

INSERT INTO @Employee
VALUES (123, 456, '20151027', '20:00', '08:00');

DECLARE @Date TABLE
(
	[Date]		DATE
);

INSERT INTO @Date
VALUES ('20151027'),
	('20151028');

SELECT ID, EmployeeID, PatientID, D.[Date] AS [Day],
	E.[From], E.[To]
FROM @Employee AS E
INNER JOIN @Date AS D
	ON E.[Day] = D.[Date]
	OR (E.[To] < E.[From] AND E.[Day] = DATEADD(DAY, -1, D.[Date]));

Open in new window

Brian follows your principle of using reserved words for the column names :)
You stipulate that these are DateTime data types.
Day (DateTime) containing a date
From (DateTime) containing time  <<<<<<< must have a date too, is it 1900-01-01 ?
To (DateTime) containing time  <<<<<< must have a date too, is it 1900-01-01 ?

This means that ALL OF THEM contain a date

Because of the required precision here I think you need to output some REAL data as an insert script. I think you may find that those "time only" columns do in fact have a date as well, and that date might be 1900-01-01 (but it might not be).

So, until there is confirmation of the REAL data, the following is based on a hunch. Note also I am displaying the whole date/time, not juts date or just time.
CREATE TABLE PatientsEmployeesSchedule 
    ([ID] int, [EmployeeID] int, [PatientID] int, [Day] datetime, [From] datetime, [To] datetime)
;
    
INSERT INTO PatientsEmployeesSchedule 
    ([ID], [EmployeeID], [PatientID], [Day], [From], [To])
VALUES
    (1 , 123, 456, '2015-10-27 00:00:00', '1900-01-01 00:20:00', '1900-01-01 00:08:00')
;

Open in new window

This query uses CROSS APPLY and VALUES to generate 2 rows output for each one row of input.
select
        ca.*
from PatientsEmployeesSchedule pes
cross apply (
      values
             (pes.ID, pes.EmployeeID, pes.PatientID, pes.[Day], pes.[From], dateadd(hour,24,0))
           , (pes.ID, pes.EmployeeID, pes.PatientID, dateadd(day,1,pes.[Day]), dateadd(hour,24,0), pes.[To])
    ) ca ([ID], [EmployeeID], [PatientID], [Day], [From], [To])
where pes.[From] > pes.[To]



| ID | EmployeeID | PatientID |                       Day |                      From |                        To |
|----|------------|-----------|---------------------------|---------------------------|---------------------------|
|  1 |        123 |       456 | October, 27 2015 00:00:00 | January, 01 1900 00:20:00 | January, 02 1900 00:00:00 |
|  1 |        123 |       456 | October, 28 2015 00:00:00 | January, 02 1900 00:00:00 | January, 01 1900 00:08:00 |

Open in new window

also see http://sqlfiddle.com/#!6/c57b5/5

{+edit}Those column names are terrible - sorry you know that already - but they really are bad.
Avatar of bfuchs

ASKER

Hi Experts,

First of all thanks for replying..

I will test those suggestions and keep you posted.

Just wanted to stress out this point, since we are going to force some rules here that will impact every single transaction, for example that an employee cant have over X amount of hours scheduled per week (with some exceptions tough), therefore is very important to have the most efficient way of calculating this.

@Paul,
Yes, the from & to columns do contain the date of 1899-12-30...


Thanks,
Ben
SOLUTION
Avatar of PortletPaul
PortletPaul
Flag of Australia image

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Avatar of bfuchs

ASKER

@Brian, Paul,

I forgot to mention that besides returning 2 records for that instance (where schedule ends next day), I also need to make sure this record is not showing up as a record by itself, otherwise I will end up with double the hours scheduled...

Thanks,
Ben
I'm not sure I understand.  You don't want to see rows that don't overlap days in the result set at all?
Hello,

Try this

DECLARE @Employee TABLE
(
	ID			INT IDENTITY(1,1) NOT NULL,
	EmployeeID	INT,
	PatientID	INT,
	[Day]		DATE,
	[From]		TIME,
	[To]		TIME
);

DECLARE @EmployeeOut TABLE
(
	ID			INT NULL,
	EmployeeID	INT,
	PatientID	INT,
	[Day]		DATE,
	[From]		TIME,
	[To]		TIME
);

INSERT INTO @Employee
VALUES (123, 456, '20151027', '20:00', '08:00');

INSERT INTO @EmployeeOut
SELECT ID,EmployeeID,PatientID,[Day],[From],CASE WHEN [To]<[From] THEN '12:00' ELSE [To] END FROM @Employee
INSERT INTO @EmployeeOut
SELECT ID,EmployeeID,PatientID,CONVERT(DATE,CASE WHEN [To]<[From] THEN DATEADD(D,1,[Day]) ELSE [Day] END) [DAY],[From],[To] FROM @Employee 

SELECT * FROM @Employee
SELECT * FROM @EmployeeOut

Open in new window

It's returning 2 rows - as specified

>>"also need to make sure this record is not showing up as a record by itself"
eh?

interpretation:

 you do NOT want this dual row at all if it would create a duplicate of some other row

OR

you do NOT want ALL OF THE DURATION of the dual rows if it would cause an overlap with some other row

OR

 you do NOT want either row if ????

Perhaps you could explain BY EXAMPLES e.g.
a dual row you do want
a dual row that overlaps
a dual row that duplicates

for each, what is the expected result?

Can I point out that any implicit overlap with other rows already exists (in a logical sense), all you are doing is splitting the duration over 2 calendar days with one row per day
ASKER CERTIFIED SOLUTION
Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Avatar of bfuchs

ASKER

Wow,

Such a nice response!

As of now Scott's solution seems to work perfectly as desired, see attached.

@Vikas,
See attached results of yours
I need only 3 rows returned for that scenario tested.

@Brian, Paul,
Excuse me for perhaps not being so clear.

As originally described the structure of our table was clear.

Now I was given the task of preventing schedules of above X amount of hours per employee per week.

Therefore in order to calculate that, I would have to split a record that starts from 8PM and ends 8AM in two, as one belongs to the date saved in the day field, and the other belongs to the next day

About my last post, it was just to make sure that when selecting this record along with other records that do not need to be split, I will get one record for those that do not be to be split and two records for those that do need to be split, (as opposed to 3 records for those in need for split).

Regarding the overlap minute, I am not concerned as all records are entered this way, for example (2:AM - 4AM) and next can be (4AM: - 6AM)

Now since I will need to perform this calculation before every single record is saved, I do need the most efficient method to accomplish that.

Therefore, everyone who can come up with something that will be more efficient than Scott's solution is still welcome-:)

Thanks,
Ben
Untitled.png
Untitled1.png
I believe you will find that my solution will provide records whether they overlap midnight or not so you don't need to separate queries and union them.  If you don't have a calendar table then I recommend adding one to your database or even a separate database on the instance if necessary.  You can find many examples of Date tables and how to populate them from a simple Google search.
Avatar of bfuchs

ASKER

@Brian,
I see you're right, that solution will also work well.

I guess will have to test both in our production DB that contains significant amount of data, and decide which is the ideal solution regarding performance.

Thanks,
Ben
If performance is the biggest factor then post your execution plan and we can look at where you might need indexing.
Avatar of bfuchs

ASKER

@Brian,

Actually I don't see how your solution splits the time, it only gives me two dates but both have the same time?
See attached.

Thanks,
Ben
Untitled1.png
If you are attempting to prevent new booking that would overlap, why not add one or two computed columns to the existing table (depending on design) that would provide the means for doing that

e.g.
  calculate the datetime of the startpoint (both [Day] and [From])
  calculate the datetime of the endpoint (both [Day] and [To] including the extra day as needed)

Then when inserting new appointments just compare to those new columns. May solve other issues as well, plus you can use decent column names.
Avatar of bfuchs

ASKER

@Paul,

Not sure why you guys are getting the picture I am talking about overlap here..?

My task is to calculate the quantity of hours an employee has schedules ([To] - [From]) per week, so if someone is scheduled from 8AM to 8PM I want to get 12 hours, and from 8PM to 8AM I also want to get 12 hours.

The only peace I was missing here is that the 8PM to 8AM could really belong to two weeks, as the first 4 hours till 12 AM can be on Sat night while the remaining 8 hours are of Sunday which is considered the next week already.

Thanks,
Ben
Avatar of bfuchs

ASKER

@Scott,

So far I got to test your suggestion and it took 7 seconds (for over 1M rows).

Attached is the execution plan

do you see any need for improvement?

PS. Next thing I will do is, add the time calculation and group it by week, but will leave that for another post..

Thanks,
Ben
Scott.sqlplan
Avatar of bfuchs

ASKER

Meanwhile just doing some data analysis..

tested Paul's suggestion as following

SELECT
      ca.*
from PatientsEmployeesSchedule pes
      CROSS APPLY (

            VALUES
                  (pes.ID, pes.EmployeeID, pes.PatientID, pes.[Day], pes.[From], '18991231')
                , (pes.ID, pes.EmployeeID, pes.PatientID, DATEADD(day, 1, pes.[Day]), '18991230', pes.[To])

              ) ca ([ID], [EmployeeID], [PatientID], [Day], [From], [To])
WHERE pes.[From] > pes.[To]

union all

select ID,EmployeeID,PatientID,[Day],[from],[TO] from PatientsEmployeesSchedule pes
WHERE pes.[From] <= pes.[To]

Open in new window


That took only 6 seconds.

But I see a big difference in the count of rows returned, compared to Scott's..

Scott's suggestion returns over 2 thousand rows more then Paul's..

Who can help me figure out which is correct?

Thanks,
Ben
SOLUTION
Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Avatar of bfuchs

ASKER

@Scott,

Your first suggestion results in the following:
Table 'PatientsEmployeesSchedule'. Scan count 25, logical reads 7194, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

you last suggestion:
Table 'Worktable'. Scan count 0, logical reads 5987018, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'PatientsEmployeesSchedule'. Scan count 2, logical reads 14152, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Both take the same time to execute.

Paul's  suggestion:

Table 'PatientsEmployeesSchedule'. Scan count 50, logical reads 14388, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Not sure how can I determine from these which is performing better?


Thanks,
Ben
SOLUTION
Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Avatar of bfuchs

ASKER

So you're are saying the only thing that counts is the logical reads?

And Whats about the 5.9M logical reads of the work table?-:)

Thanks,
Ben
Idk for sure, looking at the query plan might give a better idea, but often those show up when a table doesn't have a unique index.  Making the index unique can sometimes get rid of the extra I/O, even though that's not really logical, since that shouldn't affect reads that badly.
Avatar of bfuchs

ASKER

That's not the case here, as ID column is unique in PatientsEmployeesSchedule.

Perhaps you're referring to the fact this query may show 2 records per ID (in case of the split) ?

Thanks,
Ben
Avatar of bfuchs

ASKER

As of now 3 solutions posted here seems to work, Paul's solution and 2 of Scott's,
And looking at logical reads, Scott's first solution will perform the best.

Do someone has any objetion on this?

Thanks,
Ben
no objection

although instead of
 THEN '18991230 23:59:59.997'  

I would recommend
THEN '18991231'
Avatar of bfuchs

ASKER

Thank You guys
Great job!