Solved

Adding datetime column to table

Posted on 2011-09-28
390 Views
Last Modified: 2012-05-12
I would like to add a datetime column to a table that will contain the date and time when each row was inserted. This is for debugging purposes.

The person in charge of this particular database refuses to do it, claiming it will slow down inserts and take up too much storage!

We have about 4.5 million records and insert about 1.5 million a day, a number that will grow. The code is C#, using LINQ to access the SQL Server 2008 database. Do you agree with him that this will really slow down inserts?

Does anyone have any statistics, or a way to estimate how much more time an insert would take with a getDate() default on an additional column? And how would I figure out how much storage a datetime column adds across 5 million records? Any suggestions are welcome.

Thanks,
Abbi
Question by:Bodhi108
19 Comments
 
LVL 38

Expert Comment

by:Aaron Tomosky
ID: 36765457
I don't know about sizes, but if you don't send the time from the application and instead use getdate() as the column's default value, that is the fastest way.
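For illustration, a minimal sketch of that approach; the column and constraint names here are hypothetical, and dbo.user_log stands in for your real table:

ALTER TABLE dbo.user_log
    ADD DateInserted datetime NOT NULL
        CONSTRAINT DF_user_log_DateInserted DEFAULT (GETDATE());

-- Note: on SQL Server 2008, adding a NOT NULL column with a default
-- rewrites every existing row, so the one-time ALTER itself can take
-- a while on a large table; inserts afterwards are unaffected.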
 

Author Comment

by:Bodhi108
ID: 36812918
My question was not about the fastest way to get the date or datetime into the column, but about the performance cost. I'm wondering if there is a way to estimate it. My guess is that the extra column adds only nanoseconds. Yes, the default value would be quite fast. I believe getDate() includes the time unless you convert it to date...
 
LVL 50

Assisted Solution

by:Lowfatspread
Lowfatspread earned 315 total points
ID: 36813274
1. a datetime column takes 8 bytes of storage
2. adding a new (not null) column with a default of getdate() means that, if properly written, your applications should function without change (i.e. no SELECT *, no INSERT without a column list); if not properly written, you may need to create a view to hide the new column from your poorly written applications (see the sketch at the end of this comment).

So on the face of it you will consume an extra 40 million bytes of database storage immediately, and an extra 12 million bytes per day, with an associated impact on log sizes as well...

The exact significance depends on what proportion of the existing row size the additional 8 bytes is, and the effect it will have on page splits in the table.

Of course, if you actively wish to use the new column in search queries, then you may require new or amended indexes, which will add further storage requirements and have a more significant impact on your system.

Adding the column should have a negligible impact on performance, unless due to your application style you are at a "ripping" point in terms of row storage per page, which affects your page utilisation/splitting ratios.

It's difficult to comment further without detailed information on your system and the lifecycle of the objects being affected.
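For point 2, a sketch of such a view; the column names are hypothetical since I don't know your schema:

CREATE VIEW dbo.user_log_v
AS
SELECT Id, UserGuid, EventDate   -- the pre-existing columns only,
                                 -- deliberately omitting DateInserted
FROM dbo.user_log;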
 
LVL 50

Expert Comment

by:Lowfatspread
ID: 36813277
Sorry, that should be "tipping" point.
 

Author Comment

by:Bodhi108
ID: 36813548
Lowfatspread...
I don't have access to the table right now, but I believe the row size is quite small. It has approximately 6-7 datetime columns, a GUID, and 2 int columns. There will be no extra indexes needed. The additional column is only for debugging purposes, so it will not be used in any WHERE clause.

So what you are saying is that by adding the column with a default of getdate(), and with no additional searches or indexes, the performance hit on each insert is negligible if not zero, right?

Thanks for your information on the database storage impact.
 
LVL 50

Assisted Solution

by:Lowfatspread
Lowfatspread earned 315 total points
ID: 36814082
Yes, the impact should be negligible.

Storage impact:
current row size is 7*8 + 16 + 2*4 + 4 bytes of overhead = 84 bytes, so you get about 95 rows per page.

Adding the extra column gives a row size of 92 bytes, so a maximum of about 87 rows per page.

The extra storage requirement is therefore around 9.2%; allow 12.5% for contingency.

However, the increase may affect the decision on how much free space is initially left on each page for insert/update operations, which may need to be re-examined.

You could use a smalldatetime instead, lose the seconds component, and save 4 bytes per row...
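If you want to check those estimates against the real table, this standard DMV reports the actual average row size and page count (substitute your table name for the assumed dbo.user_log):

SELECT index_id, avg_record_size_in_bytes, page_count, record_count
FROM sys.dm_db_index_physical_stats(
        DB_ID(), OBJECT_ID('dbo.user_log'), NULL, NULL, 'DETAILED');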
 

Author Comment

by:Bodhi108
ID: 36817213
Lowfatspread...

Thank you so much for your information.

I received a response from the person who thinks there will be an impact on performance and size. He claims that adding a default value is the same as adding a trigger. I explained that it is not a trigger. Can you tell me what goes on behind the scenes when the default value is applied? What about it makes the impact negligible? GetDate() is a function that has to be executed...

He also claims that a datetime is 23 bytes, not 8 bytes...

Below is his reply.  Could you help me reply to this email?

Email I received:
-----------
Adding new date time or any other data type column definitely effects performance, size. And may be due to other factors, because it's additional data, and additional operation, it effects to size, load and data process.

Setting up GetDate() to default value makes things more worse, because it's SQL server function which call some SQL server system code, and if you setup as default value, it will work as trigger before or after insert which is biggest bottleneck maker in any RDMS.

Size? The SQL shows it's 8 bytes, but actual size for datetime (2011-09-29 10:10:43.703) is 23 bytes. So if you multiple what we currently have in user_log, it'll be about 100 MB just for data, and you will get about twice for log.
 
LVL 18

Assisted Solution

by:lludden
lludden earned 165 total points
ID: 36817893
When SQL displays an int, it can show 2000000000, but it still only takes up 4 bytes of storage. Datetime is the same way: it is encoded as two 32 byte integers (a date part and a time part).

Setting the default value to the current date is no slower than setting it to zero (it still has to copy eight bytes to the position and write it out).

If you want the value updated each time the row is modified, that will require a trigger, which will slow things down somewhat and isn't really scalable.

 
LVL 50

Expert Comment

by:Lowfatspread
ID: 36817906
@lludden: 32 bits, not bytes
 
LVL 18

Assisted Solution

by:lludden
lludden earned 165 total points
ID: 36817925
http://msdn.microsoft.com/en-us/library/ms187819.aspx
Storage size 8 bytes

More detail under the SQL 2000 entry (still applicable)

Values with the datetime data type are stored internally by Microsoft SQL Server as two 4-byte integers. The first 4 bytes store the number of days before or after the base date, January 1, 1900. The base date is the system reference date. Values for datetime earlier than January 1, 1753, are not permitted. The other 4 bytes store the time of day represented as the number of milliseconds after midnight.
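A quick way to demonstrate the 8-byte internal size versus the 23-character display format from SQL itself; the literal is just a sample value:

DECLARE @d datetime = '2011-09-29 10:10:43.703';
SELECT DATALENGTH(@d) AS bytes_stored,                      -- returns 8
       LEN(CONVERT(varchar(30), @d, 121)) AS display_chars; -- returns 23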

 
LVL 18

Expert Comment

by:lludden
ID: 36817986
Lowfatspread - yeah.  That is what my brain said.  My fingers are being rebellious.

OP - My guess is that it boils down to this: he doesn't want to do it. It doesn't appear to have any benefit for him and has a potential downside, so he is fighting it. Putting extra, mostly unnecessary data on a production server permanently usually isn't the best solution. Perhaps it would be better to find another one.

You could install a temporary trigger on the database to timestamp another table with the creation date for each primary key; see the sketch below. If the data is being submitted from a relatively small number of stored procedures, you could instead alter them to add logging information to another table.
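A rough sketch of that temporary-trigger idea; every name here is hypothetical (dbo.user_log as the table, Id as its primary key):

CREATE TABLE dbo.user_log_audit (
    Id int NOT NULL PRIMARY KEY,
    DateInserted datetime NOT NULL
        CONSTRAINT DF_user_log_audit DEFAULT (GETDATE())
);
GO
CREATE TRIGGER trg_user_log_audit ON dbo.user_log
AFTER INSERT
AS
BEGIN
    SET NOCOUNT ON;
    -- record the key and arrival time of each new row in the side table
    INSERT INTO dbo.user_log_audit (Id)
    SELECT Id FROM inserted;
END;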
 
LVL 50

Accepted Solution

by:Lowfatspread
Lowfatspread earned 315 total points
ID: 36818034
<hmmm, bite-tongue mode on>

roflmao

OK, this is the usual problem: I'm getting this second hand and don't know in detail what your system is doing and is set up to do. So whilst I can sympathise with some of what your "adversary" is saying, no, it basically isn't correct.

1. A default is definitely not the same as a trigger.
2. A default is stored in the system tables, which are readily available and always consulted when performing any database activity request (see the catalog query after this list).
3. Default processing is resolved before any trigger processing is even considered.
4. Yes, getdate() is a function and must resolve to a system call to obtain a datetime, but datetimes are essential to database processing; this must be amongst the most efficient code in the database engine, which is constantly retrieving the time for its own operations... so, no impact.
5. Bad/poor database design, bad/poor developer code, inappropriate use of cursors, and lack of physical resources: these are the causes of bottlenecks. OK, as a DBA I understand the impact of poorly written triggers, which is why I always advocate they be written by DBAs. However, a trigger in itself need not affect performance; it's a push.
6. An external character representation of a datetime is 23/26 characters (who can say how many bytes that is these days), but to the database it's 8 bytes, treated as two integers. You've said you don't normally want to do anything with the column, so it's 8 bytes. Bad coding (use of SELECT *) would cause the column to be retrieved and expanded to the 23-character form, but that's more a network issue than a database issue per se, and the impact is still only of the order of 10-12% for this table. If you're running that close to capacity, you have problems now and need to address them with hardware...
7. Again, OK, 100 MB: I personally don't see that as out of line with my 40 MB initial estimate... and if you haven't got a free-space overhead of at least 10 GB, then you are dangerously close to resource-outage problems anyway.
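To see for yourself that a default is just catalog metadata (point 2), you can list it straight from the system views; the table name is assumed:

SELECT dc.name, dc.definition      -- e.g. DF_..., (getdate())
FROM sys.default_constraints AS dc
WHERE dc.parent_object_id = OBJECT_ID('dbo.user_log');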

--------

However, it's not clear which log he's talking about; do you have some system-specific log that gets created? He also seems to imply that the system is bulk loaded: are the tables subject to batch processes which load/unload them via text files? In that case his 23-character figure may be valid for the files themselves, but again 100 MB isn't particularly large. Or did he mean performance load?

Reading again, it's a user log, so you have some sort of audit trail being produced. How and why is that being done, and do the business require this new audit column included in the "log"? But still, 100 MB isn't something I'd have thought anyone worried about these days. How is the user_log created?

Any help?
Sorry, got distracted halfway through, and I need some sleep... will check back tomorrow.



 
LVL 50

Expert Comment

by:Lowfatspread
ID: 36818069
I was going to suggest something along the lines of lludden's solution...

It depends on your application/system, but you could just audit each transaction in an audit table, if you already have identifiers which identify the change... though that would have a bigger impact on the system. Personally I think it's a no-brainer: an audit trail is always required in a system, and the fact that you've identified the requirement "late" shouldn't change what is a fundamental requirement. (Time to get some users on your side... try internal audit if necessary; they are usually very sympathetic to additional tracking mechanisms.)

P.S. I read your requirement as being for the insert only; you don't have any update-maintenance requirements as well, do you? (Those would need to be dealt with via a trigger... but again, the impact of an update/delete trigger to maintain a datetime column would be negligible; see the sketch below.)
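For completeness, a sketch of such an update trigger; the names are hypothetical, and it assumes a DateUpdated column to maintain and the RECURSIVE_TRIGGERS database option left OFF (the default):

CREATE TRIGGER trg_user_log_touch ON dbo.user_log
AFTER UPDATE
AS
BEGIN
    SET NOCOUNT ON;
    -- stamp the modified rows with the time of the update
    UPDATE t
    SET DateUpdated = GETDATE()
    FROM dbo.user_log AS t
    JOIN inserted AS i ON i.Id = t.Id;
END;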


good luck
 

Author Comment

by:Bodhi108
ID: 36818177
lludden... Thanks for your response...
1) Storage....
The response I received from my co-worker is this...
"8 bytes is not actual data size. You should look into how the windows OS stores data in disk clusters. Using the scenario I provided below gives more accurate result to figure out what is the actual data size."
...
"Since our DB file in Windows OS, so to check the actual size of the data, create a new text file. Right click and check the size on property window, it shows 0 byte.
Now copy the simple datetime "2011-09-29 10:10:43.703", and paste it in the file and save it. Then check the size again on property window, it will show what is the actual size of the data. I bet it's 23. it may be more in DB file, because SQL server encryptes the data, and adds additional tags to store the data."

2) Performance of getDate() default constraint...
We want the value set only when the row is inserted, not updated... Doesn't the database have to do some work to run getDate() versus setting it to 0? He claims it acts as a trigger, which will slow down performance...

Thanks!
 

Author Comment

by:Bodhi108
ID: 36818184
If you haven't noticed by now, I am what you call an 'accidental DBA'.
 

Author Comment

by:Bodhi108
ID: 36818463
Lowfatspread:  the log file he is referring to is the transaction log file...
 
LVL 18

Assisted Solution

by:lludden
lludden earned 165 total points
ID: 36818562
Based on this remark,
"8 bytes is not actual data size. You should look into how the windows OS stores data in disk clusters. Using the scenario I provided below gives more accurate result to figure out what is the actual data size."
I am not sure it is even worth trying to convince him, but I'll give it one try. Lowfatspread probably knows this better than I do, though.

Actually, here is a good article explaining SQL data storage. In a nutshell, SQL Server data files are divided into 8 KB pages, and rows are stored in pages; multiple rows fit on a page (a row's in-row data cannot span pages, though large values can be pushed off-row). There is a 4-byte-per-row overhead.

The data size that is read from the disk is based on the NTFS cluster size; that doesn't affect overall data size, it mostly just affects IO speed.
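As a side note, you can see the table's actual reserved and data sizes, independent of NTFS cluster size, with the standard procedure (table name assumed):

EXEC sp_spaceused 'dbo.user_log';
-- returns rows, reserved, data, index_size and unused for the table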


Now to test the speed, try the following code:
DECLARE @L int = 2000000             -- iteration count
DECLARE @T1 datetime
DECLARE @EndAt datetime2
DECLARE @StartAt datetime2 = getdate()

WHILE @L > 0
BEGIN
   --SET @T1 = getdate()             -- uncomment for the getdate() run
   --SET @T1 = '2011-10-01 12:11'    -- uncomment for the constant run
   SET @L = @L - 1
END
SET @EndAt = getdate()

PRINT @StartAt
PRINT @EndAt

Run it first as a loop to get a baseline:
2011-09-29 15:24:23.7570000
2011-09-29 15:24:30.4530000
6.696 sec base

Next, run it with the getdate() uncommented
2011-09-29 15:26:26.2900000
2011-09-29 15:26:34.3470000
8.057 sec

Finally, run it with a constant
2011-09-29 15:27:08.1070000
2011-09-29 15:27:16.1030000
7.996 sec

So it takes only about 0.06 seconds longer (8.057 vs. 7.996 seconds) to call getdate() than to just assign a constant, over 2 million iterations: effectively nothing per insert.

Good luck
 
LVL 75

Assisted Solution

by:Anthony Perkins
Anthony Perkins earned 20 total points
ID: 36819158
To put it mildly, your co-worker is an ignorant idiot, and your chances of convincing him/her that they are ill-informed are close to zero (or would that be 23 bytes?).
 

Author Closing Comment

by:Bodhi108
ID: 36907978
Yes, thanks for everyone's advice.  

They decided NOT to add the dateInserted column, which is only going to hurt them. And yes, it is Database 101 to have columns recording when a row was inserted and updated. I couldn't convince him, but I'm really worn out on this issue and have to drop it and move on to things I can control...

In over 20 years of working with databases, I've never before had to fight for such a column. It is frustrating to work where there is database ignorance!
