Help needed creating a union query in SQL

Hi Experts,

I have the following fields in PatientsEmployeesSchedule table

ID (AutoNum)
EmployeeID (int)
PatientID (int)
Day (DateTime) containing a date
From (DateTime) containing time
To (DateTime) containing time

(I know the name of fields are not the best as they are reserved words, but I cant change them at this point:(

Now the table contains scheduling records that usually starts & ends at the same day, however in case it does start at one day and finishes the next day (for example from 8PM to 8AM), only one record is being entered in the table with the day field specifying the first day meaning the day schedule started.

Now my question is, how can I have a query/view that will split that in two records in the most efficient way?

So if the table has the following:
ID          EmployeeID    PatientID     Day                From           To
1           123                  456              10/27/15        8:00 PM       8:00 AM

I would like to see the following:
ID          EmployeeID    PatientID     Day                From           To
1           123                  456              10/27/15        8:00 PM       12:00 AM
1           123                  456              10/28/15        12:00 PM     08:00 AM

Thanks
LVL 6
bfuchsAsked:
Who is Participating?

[Product update] Infrastructure Analysis Tool is now available with Business Accounts.Learn More

x
I wear a lot of hats...

"The solutions and answers provided on Experts Exchange have been extremely helpful to me over the last few years. I wear a lot of hats - Developer, Database Administrator, Help Desk, etc., so I know a lot of things but not a lot about one thing. Experts Exchange gives me answers from people who do know a lot about one thing, in a easy to use platform." -Todd S.

Brian CroweDatabase AdministratorCommented:
The easiest way to do that is to just join it against a date table.  I have created a very small sample one for this example but most production databases will have a calendar table of some kind.

DECLARE @Employee TABLE
(
	ID			INT IDENTITY(1,1) NOT NULL,
	EmployeeID	INT,
	PatientID	INT,
	[Day]		DATE,
	[From]		TIME,
	[To]		TIME
);

INSERT INTO @Employee
VALUES (123, 456, '20151027', '20:00', '08:00');

DECLARE @Date TABLE
(
	[Date]		DATE
);

INSERT INTO @Date
VALUES ('20151027'),
	('20151028');

SELECT ID, EmployeeID, PatientID, D.[Date] AS [Day],
	E.[From], E.[To]
FROM @Employee AS E
INNER JOIN @Date AS D
	ON E.[Day] = D.[Date]
	OR (E.[To] < E.[From] AND E.[Day] = DATEADD(DAY, -1, D.[Date]));

Open in new window

chaauCommented:
Brian follows your principle of using reserved words for the column names :)
PortletPaulEE Topic AdvisorCommented:
You stipulate that these are DateTime data types.
Day (DateTime) containing a date
From (DateTime) containing time  <<<<<<< must have a date too, is it 1900-01-01 ?
To (DateTime) containing time  <<<<<< must have a date too, is it 1900-01-01 ?

This means that ALL OF THEM contain a date

Because of the required precision here I think you need to output some REAL data as an insert script. I think you may find that those "time only" columns do in fact have a date as well, and that date might be 1900-01-01 (but it might not be).

So, until there is confirmation of the REAL data, the following is based on a hunch. Note also I am displaying the whole date/time, not juts date or just time.
CREATE TABLE PatientsEmployeesSchedule 
    ([ID] int, [EmployeeID] int, [PatientID] int, [Day] datetime, [From] datetime, [To] datetime)
;
    
INSERT INTO PatientsEmployeesSchedule 
    ([ID], [EmployeeID], [PatientID], [Day], [From], [To])
VALUES
    (1 , 123, 456, '2015-10-27 00:00:00', '1900-01-01 00:20:00', '1900-01-01 00:08:00')
;

Open in new window

This query uses CROSS APPLY and VALUES to generate 2 rows output for each one row of input.
select
        ca.*
from PatientsEmployeesSchedule pes
cross apply (
      values
             (pes.ID, pes.EmployeeID, pes.PatientID, pes.[Day], pes.[From], dateadd(hour,24,0))
           , (pes.ID, pes.EmployeeID, pes.PatientID, dateadd(day,1,pes.[Day]), dateadd(hour,24,0), pes.[To])
    ) ca ([ID], [EmployeeID], [PatientID], [Day], [From], [To])
where pes.[From] > pes.[To]



| ID | EmployeeID | PatientID |                       Day |                      From |                        To |
|----|------------|-----------|---------------------------|---------------------------|---------------------------|
|  1 |        123 |       456 | October, 27 2015 00:00:00 | January, 01 1900 00:20:00 | January, 02 1900 00:00:00 |
|  1 |        123 |       456 | October, 28 2015 00:00:00 | January, 02 1900 00:00:00 | January, 01 1900 00:08:00 |

Open in new window

also see http://sqlfiddle.com/#!6/c57b5/5

{+edit}Those column names are terrible - sorry you know that already - but they really are bad.
Protecting & Securing Your Critical Data

Considering 93 percent of companies file for bankruptcy within 12 months of a disaster that blocked access to their data for 10 days or more, planning for the worst is just smart business. Learn how Acronis Backup integrates security at every stage

bfuchsAuthor Commented:
Hi Experts,

First of all thanks for replying..

I will test those suggestions and keep you posted.

Just wanted to stress out this point, since we are going to force some rules here that will impact every single transaction, for example that an employee cant have over X amount of hours scheduled per week (with some exceptions tough), therefore is very important to have the most efficient way of calculating this.

@Paul,
Yes, the from & to columns do contain the date of 1899-12-30...


Thanks,
Ben
PortletPaulEE Topic AdvisorCommented:
OK, assuming that BOTH [From] and [To] have the date 1899-12-30 then I would suggest:

SELECT
      ca.*
from PatientsEmployeesSchedule pes
      CROSS APPLY (

            VALUES
                  (pes.ID, pes.EmployeeID, pes.PatientID, pes.[Day], pes.[From], '18991231')
                , (pes.ID, pes.EmployeeID, pes.PatientID, DATEADD(day, 1, pes.[Day]), '18991230', pes.[To])

              ) ca ([ID], [EmployeeID], [PatientID], [Day], [From], [To])
WHERE pes.[From] > pes.[To]
;

| ID | EmployeeID | PatientID |                       Day |                       From |                         To |
|----|------------|-----------|---------------------------|----------------------------|----------------------------|
|  1 |        123 |       456 | October, 27 2015 00:00:00 | December, 30 1899 00:20:00 | December, 31 1899 00:00:00 |
|  1 |        123 |       456 | October, 28 2015 00:00:00 | December, 30 1899 00:00:00 | December, 30 1899 00:08:00 |

Open in new window


It would be a whole lot better if the table was redesigned but as this has been ruled out, what I propose above has [To] being after [From] on both rows which should simplify any date/time arithmetic you want to perform.

{+edit}in case it isn't clear...

1. source rows needing this treatment are found by using  WHERE pes.[From] > pes.[To]
2. that what you see as "time" in columns [From] and [To] is just a display format, there are dates involved
3. you need to be able to calculate "duration" hence [From] should be less than [To] always e.g.
|                       From |                         To |  Duration |
|----------------------------|----------------------------|-----------|
| December, 30 1899 00:20:00 | December, 31 1899 00:00:00 |   4 hours |
| December, 30 1899 00:00:00 | December, 30 1899 00:08:00 |   8 hours |

Open in new window

bfuchsAuthor Commented:
@Brian, Paul,

I forgot to mention that besides returning 2 records for that instance (where schedule ends next day), I also need to make sure this record is not showing up as a record by itself, otherwise I will end up with double the hours scheduled...

Thanks,
Ben
Brian CroweDatabase AdministratorCommented:
I'm not sure I understand.  You don't want to see rows that don't overlap days in the result set at all?
Vikas GargAssociate Principal EngineerCommented:
Hello,

Try this

DECLARE @Employee TABLE
(
	ID			INT IDENTITY(1,1) NOT NULL,
	EmployeeID	INT,
	PatientID	INT,
	[Day]		DATE,
	[From]		TIME,
	[To]		TIME
);

DECLARE @EmployeeOut TABLE
(
	ID			INT NULL,
	EmployeeID	INT,
	PatientID	INT,
	[Day]		DATE,
	[From]		TIME,
	[To]		TIME
);

INSERT INTO @Employee
VALUES (123, 456, '20151027', '20:00', '08:00');

INSERT INTO @EmployeeOut
SELECT ID,EmployeeID,PatientID,[Day],[From],CASE WHEN [To]<[From] THEN '12:00' ELSE [To] END FROM @Employee
INSERT INTO @EmployeeOut
SELECT ID,EmployeeID,PatientID,CONVERT(DATE,CASE WHEN [To]<[From] THEN DATEADD(D,1,[Day]) ELSE [Day] END) [DAY],[From],[To] FROM @Employee 

SELECT * FROM @Employee
SELECT * FROM @EmployeeOut

Open in new window

PortletPaulEE Topic AdvisorCommented:
It's returning 2 rows - as specified

>>"also need to make sure this record is not showing up as a record by itself"
eh?

interpretation:

 you do NOT want this dual row at all if it would create a duplicate of some other row

OR

you do NOT want ALL OF THE DURATION of the dual rows if it would cause an overlap with some other row

OR

 you do NOT want either row if ????

Perhaps you could explain BY EXAMPLES e.g.
a dual row you do want
a dual row that overlaps
a dual row that duplicates

for each, what is the expected result?

Can I point out that any implicit overlap with other rows already exists (in a logical sense), all you are doing is splitting the duration over 2 calendar days with one row per day
Scott PletcherSenior DBACommented:
/*
DECLARE @Employee TABLE
(
      ID                  INT IDENTITY(1,1) NOT NULL,
      EmployeeID      int,
      PatientID      int,
      [Day]            datetime,
      [From]            datetime,
      [To]            datetime
);
INSERT INTO @Employee VALUES (123, 456, '20151027', '18991230 20:00', '18991230 08:00');
INSERT INTO @Employee VALUES (234, 567, '20151028', '18991230 20:00', '18991230 23:15');
*/

SELECT
    e.ID, e.EmployeeID, e.PatientID,
    CASE WHEN row# = 1 THEN e.[Day] ELSE DATEADD(DAY, 1, e.[Day]) END AS [Day],
    CASE WHEN row# = 1 THEN e.[From] ELSE '18991230' END AS [From],    
    CASE WHEN row# = 1  AND row_needs_split = 1
              THEN '18991230 23:59:59.997'
              ELSE e.[To] END AS [To]
FROM @Employee AS e
CROSS APPLY (
    SELECT CAST(CASE WHEN DATEPART(HOUR, e.[From]) > DATEPART(HOUR, e.[To]) THEN 1 ELSE 0 END AS bit) AS row_needs_split
) AS ca1
CROSS APPLY (
    SELECT 1 AS row# UNION ALL
    SELECT 2 WHERE row_needs_split = 1
) AS ca2

Experts Exchange Solution brought to you by

Your issues matter to us.

Facing a tech roadblock? Get the help and guidance you need from experienced professionals who care. Ask your question anytime, anywhere, with no hassle.

Start your 7-day free trial
bfuchsAuthor Commented:
Wow,

Such a nice response!

As of now Scott's solution seems to work perfectly as desired, see attached.

@Vikas,
See attached results of yours
I need only 3 rows returned for that scenario tested.

@Brian, Paul,
Excuse me for perhaps not being so clear.

As originally described the structure of our table was clear.

Now I was given the task of preventing schedules of above X amount of hours per employee per week.

Therefore in order to calculate that, I would have to split a record that starts from 8PM and ends 8AM in two, as one belongs to the date saved in the day field, and the other belongs to the next day

About my last post, it was just to make sure that when selecting this record along with other records that do not need to be split, I will get one record for those that do not be to be split and two records for those that do need to be split, (as opposed to 3 records for those in need for split).

Regarding the overlap minute, I am not concerned as all records are entered this way, for example (2:AM - 4AM) and next can be (4AM: - 6AM)

Now since I will need to perform this calculation before every single record is saved, I do need the most efficient method to accomplish that.

Therefore, everyone who can come up with something that will be more efficient than Scott's solution is still welcome-:)

Thanks,
Ben
Untitled.png
Untitled1.png
Brian CroweDatabase AdministratorCommented:
I believe you will find that my solution will provide records whether they overlap midnight or not so you don't need to separate queries and union them.  If you don't have a calendar table then I recommend adding one to your database or even a separate database on the instance if necessary.  You can find many examples of Date tables and how to populate them from a simple Google search.
bfuchsAuthor Commented:
@Brian,
I see you're right, that solution will also work well.

I guess will have to test both in our production DB that contains significant amount of data, and decide which is the ideal solution regarding performance.

Thanks,
Ben
Brian CroweDatabase AdministratorCommented:
If performance is the biggest factor then post your execution plan and we can look at where you might need indexing.
bfuchsAuthor Commented:
@Brian,

Actually I don't see how your solution splits the time, it only gives me two dates but both have the same time?
See attached.

Thanks,
Ben
Untitled1.png
PortletPaulEE Topic AdvisorCommented:
If you are attempting to prevent new booking that would overlap, why not add one or two computed columns to the existing table (depending on design) that would provide the means for doing that

e.g.
  calculate the datetime of the startpoint (both [Day] and [From])
  calculate the datetime of the endpoint (both [Day] and [To] including the extra day as needed)

Then when inserting new appointments just compare to those new columns. May solve other issues as well, plus you can use decent column names.
bfuchsAuthor Commented:
@Paul,

Not sure why you guys are getting the picture I am talking about overlap here..?

My task is to calculate the quantity of hours an employee has schedules ([To] - [From]) per week, so if someone is scheduled from 8AM to 8PM I want to get 12 hours, and from 8PM to 8AM I also want to get 12 hours.

The only peace I was missing here is that the 8PM to 8AM could really belong to two weeks, as the first 4 hours till 12 AM can be on Sat night while the remaining 8 hours are of Sunday which is considered the next week already.

Thanks,
Ben
bfuchsAuthor Commented:
@Scott,

So far I got to test your suggestion and it took 7 seconds (for over 1M rows).

Attached is the execution plan

do you see any need for improvement?

PS. Next thing I will do is, add the time calculation and group it by week, but will leave that for another post..

Thanks,
Ben
Scott.sqlplan
bfuchsAuthor Commented:
Meanwhile just doing some data analysis..

tested Paul's suggestion as following

SELECT
      ca.*
from PatientsEmployeesSchedule pes
      CROSS APPLY (

            VALUES
                  (pes.ID, pes.EmployeeID, pes.PatientID, pes.[Day], pes.[From], '18991231')
                , (pes.ID, pes.EmployeeID, pes.PatientID, DATEADD(day, 1, pes.[Day]), '18991230', pes.[To])

              ) ca ([ID], [EmployeeID], [PatientID], [Day], [From], [To])
WHERE pes.[From] > pes.[To]

union all

select ID,EmployeeID,PatientID,[Day],[from],[TO] from PatientsEmployeesSchedule pes
WHERE pes.[From] <= pes.[To]

Open in new window


That took only 6 seconds.

But I see a big difference in the count of rows returned, compared to Scott's..

Scott's suggestion returns over 2 thousand rows more then Paul's..

Who can help me figure out which is correct?

Thanks,
Ben
Scott PletcherSenior DBACommented:
You can't go by clock time, because so many different factors can affect that.  You need to look at logical IOs, which you can get by using:
SET STATISTICS IO ON
before you run the SQL.

Interesting on the different results, not sure what is happening or which is right.  I suggest capturing both results by using SELECT ... INTO a new table, and then you can do an EXCEPT on the results sets and check out a row that is listed in mine and not the other one.


Btw, I made some tweaks to my code to see if it might perform a little bit better:


SELECT
    e.ID, e.EmployeeID, e.PatientID,
    CASE WHEN row# = 1 THEN e.[Day] ELSE DATEADD(DAY, 1, e.[Day]) END AS [Day],
    CASE WHEN row# = 1 THEN e.[From] ELSE '18991230' END AS [From],    
    CASE WHEN row# = 1 AND DATEPART(HOUR, e.[From]) > DATEPART(HOUR, e.[To])
              THEN '18991230 23:59:59.997'
              ELSE e.[To] END AS [To]
FROM PatientsEmployeesSchedule AS e
CROSS JOIN (
    SELECT 1 AS row# UNION ALL
    SELECT 2
) AS cj1
WHERE
    row# = 1 OR DATEPART(HOUR, e.[From]) > DATEPART(HOUR, e.[To])
bfuchsAuthor Commented:
@Scott,

Your first suggestion results in the following:
Table 'PatientsEmployeesSchedule'. Scan count 25, logical reads 7194, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

you last suggestion:
Table 'Worktable'. Scan count 0, logical reads 5987018, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'PatientsEmployeesSchedule'. Scan count 2, logical reads 14152, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Both take the same time to execute.

Paul's  suggestion:

Table 'PatientsEmployeesSchedule'. Scan count 50, logical reads 14388, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Not sure how can I determine from these which is performing better?


Thanks,
Ben
Scott PletcherSenior DBACommented:
Comparing this:
Table 'PatientsEmployeesSchedule'. Scan count 25, logical reads 7194, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
and this:
Table 'PatientsEmployeesSchedule'. Scan count 2, logical reads 14152, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

The first has roughly 1/2 the logical reads of the second, so it should perform better overall (btw, the "Scan count" can be ignored).
bfuchsAuthor Commented:
So you're are saying the only thing that counts is the logical reads?

And Whats about the 5.9M logical reads of the work table?-:)

Thanks,
Ben
Scott PletcherSenior DBACommented:
Idk for sure, looking at the query plan might give a better idea, but often those show up when a table doesn't have a unique index.  Making the index unique can sometimes get rid of the extra I/O, even though that's not really logical, since that shouldn't affect reads that badly.
bfuchsAuthor Commented:
That's not the case here, as ID column is unique in PatientsEmployeesSchedule.

Perhaps you're referring to the fact this query may show 2 records per ID (in case of the split) ?

Thanks,
Ben
bfuchsAuthor Commented:
As of now 3 solutions posted here seems to work, Paul's solution and 2 of Scott's,
And looking at logical reads, Scott's first solution will perform the best.

Do someone has any objetion on this?

Thanks,
Ben
PortletPaulEE Topic AdvisorCommented:
no objection

although instead of
 THEN '18991230 23:59:59.997'  

I would recommend
THEN '18991231'
bfuchsAuthor Commented:
Thank You guys
Great job!
It's more than this solution.Get answers and train to solve all your tech problems - anytime, anywhere.Try it for free Edge Out The Competitionfor your dream job with proven skills and certifications.Get started today Stand Outas the employee with proven skills.Start learning today for free Move Your Career Forwardwith certification training in the latest technologies.Start your trial today
Microsoft SQL Server

From novice to tech pro — start learning today.