We help IT Professionals succeed at work.

How to maintain the dates of a history table aligned with no gaps?

marcodelphi2010
marcodelphi2010 used Ask the Experts™
on
Hi,

I have to mantain a history table, where given a new entry or a record is updated, the table is recreated or realigned (only for records with the same ID). There should be no gaps between the records. Each record has a StartDate and EndDate. Below is the schema of the table:

CREATE TABLE [dbo].[History] (
  [HistoryID] bigint IDENTITY(1, 1) NOT NULL,
  [ID] int NOT NULL,
  [StartDate] datetime NOT NULL,
  [EndDate] datetime NULL,
  [Status] tinyint NULL,
  CONSTRAINT [History_pk] PRIMARY KEY CLUSTERED ([HistoryID]),
  CONSTRAINT [History_uq] UNIQUE ([ID], [StartDate])
)
ON [PRIMARY]
GO

Here is a sample data with the dates aligned:

The DateFormat is dd/mm/yyyy

HistoryID      ID      StartDate      EndDate      Status
1      1      01/01/2010      10/01/2010      1
2      1      11/01/2010      15/01/2010      2
3      1      16/01/2010      20/01/2010      3

Rules:
1.Get rid of any rows that are entirely within the range of the new row being inserted or updated.
example:

INSERT INTO History(ID,StartDate,EndDate,Status)
VALUES(1,'01/01/2010','17/01/2010',2)

The end ResultSet should be as follows:
      
HistoryID      ID      StartDate      EndDate      Status
6      1      01/01/2010      17/01/2010      2
3      1      18/01/2010      20/01/2010      3

Note that the record with HistoryID=3 have the StartDate increased by 1 because of the EndDate of the previous record(6) are greater than the StartDate of the record below(3), which range was from 16/01/2010 to 20/01/2010.

The same behavior should apply if the user makes an update on any record on any field (StartDate,EndDate,Status).

Following the previous resultset, lets INSERT more data into the table.

HistoryID      ID      StartDate      EndDate      Status
6      1      01/01/2010      17/01/2010      2
3      1      18/01/2010      20/01/2010      3

INSERT INTO History(ID,StartDate,EndDate,Status)
VALUES(1,'05/01/2010','10/01/2010',1)

ResultSet after the insert:

HistoryID      ID      StartDate      EndDate      Status
6      1      01/01/2010      04/01/2010      2
7      1      05/01/2010      10/01/2010      1
8      1      11/01/2010      17/01/2010      2
3      1      18/01/2010      20/01/2010      3

Now the user changes the record with HistoryID=3, with the folowing UPDATE:
UPDATE
  SET StartDate='17/01/2010', EndDate='19/01/2010', Status=1
WHERE HistoryID=3

Resultset:

HistoryID      ID      StartDate      EndDate      Status
6      1      01/01/2010      04/01/2010      2
7      1      05/01/2010      10/01/2010      1
8      1      11/01/2010      16/01/2010      2
3      1      17/01/2010      19/01/2010      1
9      1      20/01/2010      20/01/2010      3


Another INSERT:

INSERT INTO History(ID,StartDate,EndDate,Status)
VALUES(1,'01/01/2010','18/01/2010',1)

Resultset:

HistoryID      ID      StartDate      EndDate      Status
10      1      01/01/2010      19/01/2010      1
9      1      20/01/2010      20/01/2010      3

Note that the records 7,8,3 were deleted. The new added record takes the EndDate modified to 19/01/2010 because of the EndDate of record no 3 had the same status, that means, that there is no need to keep or modify the record no 3 either, just take the EndDate of it and put it on the newly inserted record.

I dont know how to achive this result together with the behavior exposed here. I dont know if its better to do it as a trigger or to write a procedure for inserting records and do all the job in it.

That´s it. If there is any doubt about the rules, please, let me know.

Thanks!

Best Regards,

Marco André
Comment
Watch Question

Do more with

Expert Office
EXPERT OFFICE® is a registered trademark of EXPERTS EXCHANGE®
Top Expert 2011

Commented:
gut feel at present is a trigger...

what happens for deletes? ( e.g a,b,c   does a expand up or c grow back?)
what happens if c is deleted.. does b grow? etc

at present though i feel that the trigger would probably disallow actions updating multiple rows for
an id since order of updates cannot be determined within a trigger....
Top Expert 2011

Commented:
gut feel at present is a trigger...

what happens for deletes? ( e.g a,b,c   does a expand up or c grow back?)
what happens if c is deleted.. does b grow? etc

at present though i feel that the trigger would probably disallow actions updating multiple rows for
an id since order of updates cannot be determined within a trigger....

Author

Commented:
The application will not allow the user to delete a record. Only INSERT and UPDATE. But, let´s supposed that a DELETE is allowed, then all the records should be re-aligned.

A
B
C

If C is greater then B and A, then do nothing. The user only wants to delete a history which will not impact on other records. But, if the user deletes B, which is higher then A and lower then C, the records should be re-aligned with the above rules.

ste5anSenior Developer

Commented:
Your problem sounds like simple range splitting, if this is the main problem, I would use a simplified structure:

HistoryID      ID      Date      Status
1                  1        01/01/2010   1
2                  1        11/01/2010   2
3                  1        16/01/2010   3

Your end-value is always the next start-value.

Overlapping can be easily detected with  SELECT * FROM yourTable WHERE Date BETWEEN  StartDate AND EndDate;

btw, use stored procedures to build your logic and use a view with instead-of triggers to call these.

E.g your second example is then simplified to

HistoryID      ID      Date             Status
6                  1        01/01/2010  2
3                  1        18/01/2010    3

INSERT INTO History(ID,StartDate,EndDate,Status)
VALUES(1,'05/01/2010','10/01/2010',1)

transforms to a simple

INSERT INTO History(ID,Date,Status) VALUES(1,'05/01/2010',1)
INSERT INTO History(ID,Date,Status) VALUES(1,'10/01/2010'+1,1)

No more logic required. ResultSet after the insert:

HistoryID      ID      Date      Status
6                  1       01/01/2010      2
7                  1       05/01/2010      1
8                  1       11/01/2010      2
3                  1       18/01/2010      3


mfG
--> stefan <--

Author

Commented:
Hi Stefan,

Thanks for the help.

This problem is not that simple. I really need the StartDate and EndDate fields. This table and their date ranges will be compared to another huge table.

Example:

HistoryTable, contains the history of certain ID, which represents another entity on the database.

[HistoryID][ID][StartDate][EndDate][Status]

I have another table, callled Events, which have millions of records. Here is the simplified schema of this table:

EventID,EventDate,Cost,ID,ETC...

I have to compare where the EventDate "fits" in certain History, using for example, BETWEEN to find where it fits.

Sample database: Events

EventID      EventDate      Cost      ID
1      03/01/2010      0,50      1
2      03/01/2010      0,32      3
3      04/01/2010      0,67      5
4      10/01/2010      0,23      1
5      06/01/2010      0,45      2

I have to find out what was the status of those events. So, I have to compare EventDate to the HistoryTable, for example:

SELECT E.*,H.* FROM Events E
INNER JOIN History H ON H.ID=E.ID AND E.EventDate BETWEEN H.StartDate AND H.EndDate

Using your suggestion, how can I obtain the status of the events? It seems that you splited the StartDate and EndDate in 2 records. What happens if there are multiple users connected to the database, modifying and INSERTING new records? Where will be  the StartDate and EndDate of each History?

--Marco André
Senior Developer
Commented:
hi Marco,

The huge number of records may introduce performance problems in both scenarios. The basic problem about performance is always the same: Test all scenarios and compare the results.

Finding your correct history record is simple:

SELECT TOP 1 * FROM [HistoryTable] WHERE [Date] <= [EventDate] ORDER BY [Date] DESC ;

You can use a user defined table-value function to calculate this record, but I cannot guess about its performance. Maybe there is a cool join, but I can't imagine one right now ;)

The multiple user problems are the same in both solutions, it is the same problem about atomicity as in the well-known bank account transfer problem. You need an explicit transaction control encapsulating your logic and an isolation level of read commit or better.

mfG
--> stefan <--
SELECT  *
FROM    [EventTable] E
        CROSS APPLY [dbo].[FindHistory](E.[ID], E.[EventDate]) H ;

-- with

CREATE FUNCTION [FindHistory]
    (
      @Id AS INT,
      @EventDate AS DATETIME
    )
RETURNS TABLE
AS
RETURN
    ( SELECT TOP 1
                H.*
      FROM      [HistoryTable] H
      WHERE     H.ID = @ID
                AND H.[HistoryDate] = @EventDate
      ORDER BY  H.[HistoryDate] DESC
    )

Open in new window

Author

Commented:
Hi Stefan,

Ok. I will test your scenario. But, i think it will do a tremendous impact on performance, because you are using a CROSS JOIN together with a table-value function. Well, I will test it and put here the results.

Thanks!

Marco André

Author

Commented:
Stefan,

I have created a test database with the following script:

CREATE TABLE [dbo].[EventTable] (
  [EventID] bigint IDENTITY(1, 1) NOT NULL,
  [EventDate] datetime NULL,
  [ID] int NULL,
  [Cost] money NULL,
  PRIMARY KEY CLUSTERED ([EventID])
)
ON [PRIMARY]
GO

CREATE TABLE [dbo].[HistoryTable] (
  [HistoryID] bigint IDENTITY(1, 1) NOT NULL,
  [ID] int NULL,
  [Date] datetime NULL,
  [Status] int NULL,
  PRIMARY KEY CLUSTERED ([HistoryID])
)
ON [PRIMARY]
GO

I used a program to generate 100000 records data into the table EventTable. With the following rules:

.ID range from 1 to 5
.EventDate range from 01/01/2010 to 07/07/2010
.Cost range from $0 to $2

I have looked into the generated data just to make sure that the records I want to compare are there, I mean, if there are events in some range there are on table HistoryTable.

After that, I inserted this records on HistoryTable:

HistoryID      ID      Date      Status
1      1      01/01/2010      2
2      1      05/01/2010      1
3      1      11/01/2010      2
4      1      18/01/2010      3


I modified your function FindHistory so that it compares only the date portion of a DATETIME, using this function:

CREATE FUNCTION dbo.GetOnlyDate (@Data DATETIME
)
RETURNS DATETIME
AS
BEGIN
  RETURN CAST(FLOOR( CAST( @Data AS FLOAT ) ) AS DATETIME)
END


CREATE FUNCTION [FindHistory]  
(  
  @Id AS INT,  
  @EventDate AS DATETIME  
)  
RETURNS TABLE  
AS  
RETURN  
(
  SELECT TOP 1  
    H.*  
  FROM [HistoryTable] H WITH(NOLOCK)
  WHERE H.ID = @ID  
    AND dbo.GetOnlyDate(H.[Date]) = dbo.GetOnlyDate(@EventDate)
--    AND H.[Date] = @EventDate
  ORDER BY H.[Date] DESC  
)

Yes, I know, there is a new datatype in SQL 2008 which only holds the DATE part. But lets continue the tests...

Then, a ran your suggested SELECT statement:

SELECT E.*,H.Status
FROM dbo.EventTable E
CROSS APPLY [dbo].[FindHistory](E.[ID],E.[EventDate]) H
WHERE E.ID=1

I only want to find the status of each Event for the Entity with ID=1.

Your procedure really worked, but it only finds the records the fits entirely within the range that its on HistoryTable. Looking again on the EventsTable, there are records with Date=02/01/2010, 03/01/2010, and so on...But the query returns only records that matches with the dates there exists on HistoryTable, which includes: 01/01/2010(2),05/01/2010(1),18/01/2010(3), and so on...
Again, what is the status, for instance, of the events of the day 02/01/2010(which have records on the EventsTable)?

Thanks,

Marco André
ste5anSenior Developer

Commented:
Oops, a typo.

The comparison operator for the date value is wrong.
CREATE FUNCTION [FindHistory]
    (
      @Id AS INT,
      @EventDate AS DATETIME
    )
RETURNS TABLE
AS
RETURN
    ( SELECT TOP 1
                H.*
      FROM      [HistoryTable] H
      WHERE     H.ID = @ID
                AND H.[HistoryDate] <= @EventDate
      ORDER BY  H.[HistoryDate] DESC
    )

Open in new window

Author

Commented:
Hi again,

I forgot to put the performance data.

To compare the performance of both worlds, I also created the original HistoryTable and called it HistoryTable2, which have the following data:

HistoryID      ID      StartDate      EndDate      Status
6      1      01/01/2010      04/01/2010      2
7      1      05/01/2010      10/01/2010      1
8      1      11/01/2010      17/01/2010      2
3      1      18/01/2010      20/01/2010      3

Here are the results of both SELECT statements:

1.
SELECT E.*,H.Status
FROM dbo.EventTable E
CROSS APPLY [dbo].[FindHistory](E.[ID],E.[EventDate]) H
WHERE E.ID=1
ORDER BY E.EventDate,E.ID

<1887 rows fetched (3,437 sec)>

2.
SELECT E.*,H.Status
FROM dbo.EventTable E
INNER JOIN dbo.HistoryTable2 H ON H.ID=E.ID
  AND Funcoes.dbo.GetOnlyDate(E.EventDate) BETWEEN H.StartDate AND H.EndDate
WHERE E.ID=1
ORDER BY E.EventDate,E.ID

<9612 rows fetched (2,562 sec)>

The query number 2, returned all Events that FITS in some range on HistoryTable2, much faster then query 1, which does not show me the status of all events that should be there.

Regards,

Marco André

Author

Commented:
Ok, I changed the function.

Here is fresh result for query no. 1:

<100000 rows fetched (2,469 sec)>

Why the SELECT returns now all the records from the database, even putting the filter to only analyse records with ID=1?

Marco André

Author

Commented:
Ooops, sorry. It´s OK, now the table have 500000 records. 100000 have ID=1.
ste5anSenior Developer

Commented:
Can you place your test data (*.mdf) somewhere online?


mfG
--> stefan <--

Author

Commented:
Yes, sure. But now I think it worked. I will go to lunch, then I will put the data online for you.

Thanks again!

Marco André

Author

Commented:
Hi Stefan,

Attached is a procedure to INSERT a new History. Can you look at it to see if the ideia is correct?

I have another question: What happens to the records of EventTable whose date are below any record found on HistoryTable?
I had to insert a new record on HistoryTable, with a very low date (01/01/1900, with a default status), so it can be found by your procedure. Doing this I will be obliged to INSERT a default History for each ID on HistoryTable with a low date. Do you have a better solution for that?

Thanks,

Marco André
CREATE PROCEDURE dbo.PrcInsertNewHistory(
  @ID INT,
  @StartDate DATETIME,
  @EndDate DATETIME,
  @Status INT = 1
)
AS
BEGIN
  SET NOCOUNT ON
  
  BEGIN TRANSACTION
  
  --DELETE the overlapping date ranges that are on the same range of the new range
  DELETE FROM dbo.HistoryTable 
  WHERE dbo.GetOnlyDate([Date]) BETWEEN @StartDate AND @EndDate
  
  --Insert the StartDate
  INSERT INTO dbo.HistoryTable
  VALUES(@ID,@StartDate,@Status)
  
  --Insere the EndDate
  INSERT INTO dbo.HistoryTable
  VALUES(@ID,@EndDate+1,@Status)
  
  IF @@ERROR = 0 
    COMMIT TRANSACTION
  ELSE
    ROLLBACK TRANSACTION
END

Open in new window

Author

Commented:
I´ve made some modifications. Now the users can INPUT the data as usual. What he will see is a vision that contains the StartDate and EndDate. Then, a created a trigger on table HistoryTable that does the magic.

Thank you very much!

Your help was very, very helpfull! I am just wating for some tests to give you your credits.

Best Regards,

Marco André
CREATE TABLE [dbo].[HistoryTable] (
  [HistoryID] bigint IDENTITY(1, 1) NOT NULL,
  [ID] int NULL,
  [Date] datetime NULL,
  [Status] int NULL,
  [StartDate] datetime NULL,
  [EndDate] datetime NULL,
  PRIMARY KEY CLUSTERED ([HistoryID])
)
ON [PRIMARY]
GO

CREATE VIEW dbo.ViewHistory
AS
  SELECT DISTINCT
    H.ID,
    H.StartDate,
    H.EndDate,
    H.[Status]
  FROM dbo.HistoryTable H WITH(NOLOCK)
GO

CREATE TRIGGER [dbo].[HistoryTable_triu] ON [dbo].[HistoryTable]
WITH EXECUTE AS CALLER
INSTEAD OF INSERT, UPDATE
AS
BEGIN
  DECLARE 
    @ID        INT,
    @StartDate DATETIME,
    @EndDate   DATETIME,
    @Status    INT
  
  IF (SELECT COUNT(*) FROM INSERTED) > 0
  BEGIN 
    IF (SELECT COUNT(*) FROM DELETED) > 0 -- ATUALIZANDO
    BEGIN
      SELECT 
        @ID        = I.[ID],
        @StartDate = I.[StartDate],
        @EndDate   = I.[EndDate],
        @Status    = I.[Status]
      FROM DELETED I 
    END
    ELSE -- INSERINDO
    BEGIN
      SELECT 
        @ID        = I.[ID],
        @StartDate = I.[StartDate],
        @EndDate   = I.[EndDate],
        @Status    = I.[Status]
      FROM INSERTED I 
    END
  END
  
  EXECUTE dbo.PrcInsertNewHistory @ID,@StartDate,@EndDate,@Status
END
GO

CREATE PROCEDURE dbo.PrcInsertNewHistory(
  @ID        INT,
  @StartDate DATETIME,
  @EndDate   DATETIME,
  @Status    INT = 1
)
AS
BEGIN
  SET NOCOUNT ON

  BEGIN TRANSACTION
  
  ALTER TABLE dbo.HistoryTable DISABLE TRIGGER ALL
  
  DECLARE @Start DATETIME, @End DATETIME
  
  SELECT 
    @Start = dbo.GetOnlyDate(@StartDate)
    ,@End  = dbo.GetOnlyDate(@EndDate)
  
  --DELETE the overlapping date ranges that are on the same range of the new range
  DELETE FROM dbo.HistoryTable 
  WHERE dbo.GetOnlyDate(Date) BETWEEN @Start AND @EndDate
  
  --Insere a data inicial
  INSERT INTO dbo.HistoryTable
  VALUES(@ID,@Start,@Status,@Start,@End)
  
  --Insere a data final
  INSERT INTO dbo.HistoryTable
  VALUES(@ID,@End+1,@Status,@Start,@End)
  
  ALTER TABLE dbo.HistoryTable ENABLE TRIGGER ALL  
  
  IF @@ERROR = 0 
    COMMIT TRANSACTION
  ELSE
    ROLLBACK TRANSACTION
END

Open in new window