Attending one of Rob Farley's seminars the other day, I heard the phrase "The Accidental DBA" and fell in love with it. It got me thinking about the plight of the newcomer to SQL Server... So if you are the accidental DBA, or, simply new to SQL Server then there are a few concepts we can share to make your life easier.
Based on the frequency of questions we see in Experts-Exchange, I thought a 'light' discussion about the Transaction Log would be a good place to start.
Microsoft's SQL Server is a great database. Generally, it can look after itself, but there are one or two aspects that will need your attention. You can interactively access, configure, manage, administer, and develop your databases using SQL Server Management Studio (SSMS). With SSMS, the database administrator (i.e., you) can administer any of the components that we will be discussing. Now, this article is not an introduction into SSMS, but there are a couple of tools you will need to know. For managing your database, in SSMS, use the Object explorer to right click on your database and then Properties for a range of settings and information, or Tasks to shrink or backup files. Also in Object Explorer is the Management group, expand that for Maintenance plans.
A SQL Database is made up of three types of files, the two main files are primary data file with the extension of MDF and the transaction log file with the extension of LDF. The third is an optional secondary data file and has the extension NDF. As users add information into the database, it is stored in the data file. In turn, those transactions and database modifications can be recorded in the Transaction Log. These files are physical disk files and are built with an initial allocation of disk space. That means, you can reserve a pre-determined size for your database, and then set about filling it up (more about that later).
Now, the transaction log is an extremely important and sometimes neglected part of the database. Its use can mean the difference between data survival or devastation if needing to recover following a database or systems failure.
The way in which your database will use the transaction log is determined by the Recovery Model. The way in which you manage the transaction log is using SQL Backup. The way in which you manage the Backup is using Maintenance Plans. And the nice thing ? It can all be automated ! Phew...
The Recovery Model is a setting on your Database. There are three choices for Recovery Model FULL, SIMPLE and BULK LOGGED. While it is possible to change Recovery Model at any stage, it really should be determined before you create your database. The two most often used are Simple and Full so that is what we will talk about.
If running SIMPLE recovery mode then data transactions are minimally logged and automatically reclaims log space to keep space requirements small, essentially eliminating the need to manage the transaction log. If running FULL recovery mode then individual transactions and database modifications are fully logged in the transaction log, but then you will need to manage the log file. We mention truncation further below, and normally when we talk about size it normally refers to disk, when we talk about space, we normally refer to how much of our allocated disk size is being used (or how "full" the file is).
The idea behind FULL recovery mode is that you can recover from a failure up to a point in time by applying transaction log backups to the last full backup. Whereas recovering from a SIMPLE recovery model means going to the last full backup. So, your choice of recovery model needs to be considered in terms of your business dependency on the volume (or existence) of data that can be recovered when restoring from backup.
There are a few backup options including Full backup, Differential Backup and Transaction Log backups using the internal Backup command / facility within SQL. When DBAs talk about a backup, it is typically the SQL backup that they are most likely referring to - not the physical disk backup (which is also a crucial part of your DR plan). When restoring a database, it is the SQL backup that you will need to restore from.
Backups are crucial part of managing your transaction logs in a FULL recovery mode. Regardless or recovery model, you do need to run Full backups, the frequency really depends on the type and volume of activity. For a typical office environment, that would most likely be every night. Then for FULL recovery, run transaction log backups during the day at regular and frequent intervals - most common are multiples of 15 minutes (i.e., every quarter hour, half hour, hourly etc). That helps keep the transaction log clean by truncating those transactions that are now secured in the transaction log backup and so releases that space (i.e., inside the file), ready to accept new transactions. In FULL recovery, it is only the Transaction log backups that will make a difference to the content of the transaction log, without them, the transaction log will continue to grow.
There are two other Backup options Differential and Copy Only. Differential backups are smaller and faster to create than the base of a full backup being just the changes since last full backup. Using differential backups can speed up the process of making frequent backups to decrease the risk of data loss. At restore time, the full backup is restored first, followed by the most recent differential backup. Copy Only is a "special" back up normally a once off event that will not impact the state of any other scheduled backup operations.
Really, the frequency with which backups are taken needs to reflect the Business's ability to cope with data loss. The big attraction with full recovery is that it records the transactions in the transaction log which can be (progressively) applied to the last full backup. The greater the frequency, the fewer the transactions (relatively speaking) and so the less resources required to take the backup, and the greater likelihood of recovery to a point in time prior to failure.
Now, with all that activity, how is the DBA meant to manage ? Enter, stage right, the Maintenance Plan. The maintenance plan is part of SQL Server and found in the Management object in SSMS. These plans can be scheduled using SQL Server Agent to run automatically (not in the Express edition, but there are alternatives). The maintenance plan can also perform a variety of other jobs, such as removing old backups, database "housekeeping," email notifications and so on. So, can be almost a case of set and forget (well set and monitor really).
When creating or maintaining a database, you can allocate disk space and can be easily controlled. Consuming disk space can get a little more involved and is largely reactionary to the amount of activity. To help the DBA manage, there are the Database settings to help with sizing the database with Autoshrink and Autogrow. Immediately, without thinking, right now, disable Autoshrink. Having done that, you can then go and find out about Autoshrink and agree that turning it off is really the best thing to do. Autogrow is a bit different. When a database or log file is filling up the allocated space, it will decide at some point that it needs more disk. Autogrow sets the amount of disk that the database files will grow by. Given time, and maturity of the database, the size and growth rates will become more predictable, ideally to the point were the growth is fully managed and Autogrow is an exception.
If you are seeing continual growth in the database files it could mean that you are either shrinking, or not managing the size. It could also be due to not running frequent enough transaction log backups. While Autogrow may happen, it really should be the exception and considered a safety measure to accommodate unpredicted activity. We don't really want the overheads with the disk files to stop, grow, resume processing along with fragmentation that must surely result in trying to acquire additional disk. So, when using Autogrow, make it a reasonable amount so even if it must autogrow, the number of instances is minimal (or non-existant)
If you do set your database to simple recovery, the transaction log is managed for you. But then in the event of a database failure, your only protection is the last full backup, or last full backup and the last differential backup. For a typical OLTP type operation, SIMPLE is ill-advised. If your business depends on the information, then you will want high availability and high recoverability (meaning FULL recovery mode). A Read only database on the other hand like a data warehouse or reporting database probably does not need the same dynamic recoverability because they are typically batch updated (from the "real" transaction store) and easy to schedule a full backup after the load, or recover from the original transactional database, so ideally suited to SIMPLE recovery model.
So, if you do not already have them, and it is not the Express edition, then get those backups happening, and use the maintenance plans so it can be scheduled. In that regard, would suggest you create three new maintenance plans.
1) Full backup - is a full backup run each day / night, or really, it depends on type and amount of activity.
2) Transaction log backup - frequently during the day if running FULL recovery mode (otherwise do not worry about it). If not running FULL, you may consider Differential backups
3) Database Maintenance Tasks - run periodically to update statistics, rebuild indexes, manage / resize databases etc - normally run at a period where there are high volume turnovers such as an end of month purge / archive or similar - keeps database running efficiently -- could even run weekly (maybe overkill) over the weekend if you have some downtime available.
So, what happens if we don't ? that's simple, you will have uncontrolled disk consumption and exposure to failure. So, what do I do if that hasn't been done before ? that's also simple, do it now - it is never too late. Or, what if I am running out of disk space already ? Then it is highly likely that you may have to shrink or resize the database files and keep it clean from that point forward.
If you can, run a FULL backup and set up those maintenance plans. Maybe (big maybe) and only momentarily, change to SIMPLE recovery mode until you have your plans in place. This should not take long to set up and is done in SSMS. Changing to Simple, and then change back, do a FULL backup, will give you a "clean" log, but it will not be optimised for size and growth. Ideally you size your database files according to the maximum type of activity that will occur in the time frames / frequency dictated by your transaction log backups. That way, disk can be pre-allocated and then allowed to fill up that allocated space. These files can also autogrow, and so is not of "vital" importance, though, it is not desirable for all kinds of performance and fragmentation issues and ensuing IO performance impacts associated with frequent need to autogrow.
If needed, using SSMS, you can also inspect and shrink your files by right clicking on the database, tasks, shrink, files and a window will popup. There is a drop down selection for looking at either the data file or the log file. The down the bottom, you can select some actions - best to use middle radio button to reorganise and also nominate some space. You do need to allow enough space for the data base to grow (physical allocation of disk, and then starts to use that space with data activity). The logs will typically shrink very quickly. The data files will take considerably longer. But you will only have success shrinking files if they can be truncated. That means you must release internal space before you can release (shrink) physical disk space.
Now, we previously introduced the term truncation in terms of releasing space. Using Full Recovery, and regular transaction logs backups, the log files can be kept clean, but how does it do that ? SQL 2005 and 2008 rely on checkpoint processing after backups to truncate those logs :
Except when delayed for some reason, log truncation (releasing internal space, or making the disk file empty) occurs automatically when :
1) Under the simple recovery model, after a checkpoint.
2) Under the full recovery model or bulk-logged recovery model, after a log backup, if a checkpoint has occurred since the previous backup.
A checkpoint is essential for truncating the log under the full recovery - there might still be other factors (like an orphaned transaction) that prevent it from releasing space. If needed, explicitly invoke a checkpoint after the backup. If still needed, you may need to check (use the Activity Monitor in SSMS Mangement) for orphaned transactions / hanging process and (gulp) kill them (or simply shut down SQL Server and restart the service).
You can check via:
select log_reuse_wait, log_reuse_wait_desc, * from sys.databases -- looking for 'checkpoint' in log_reuse_wait_desc.
In FULL recovery, the inactive part of the log cannot be truncated until the log records have been captured in a transaction log backup. So, a moot point trying to shrink files without doing the backups, or releasing some of that internal space first.
Having said all that above, heavily favouring the Full Recovery mode, we had one company in EE who decided on "ease of management". Running a full backup of their Simple recovery database starting from 8:00am and then every two hours worked perfectly for them. Mind you it was not a huge database, and they processed a lot of parchments so recovery simply meant re-keying readily available information. Really, they should have entertained the notion of differential backups, but it is the model that they chose and could cope with. It is quite a bit different if everything is electronic and you lose access to the source information.
All this should be considered in line with your DR plan which most likely incorporates some wide reaching considerations. And believe it or not, with the higher reliability rates, there is less likelihood of failure, but, the more important the DR plan. Suffice it to say, you may need to restore your database at some point in time, for whatever reason (hardware failure, catastrophic failure, database corruption), and better to be in a position knowing that (if nothing else) the "Accidental DBA" has been able to save the database.
Thanks for pointing out that uncertainty. If being new to SQL server and it was unclear, then it does need spelling out a bit more...
In SQL what we have for backup options are a FULL Backup (ie the full database) and then either Transaction Logs or Differential Backups (both of which need to be used in conjunction with the last FULL backup in a restore). The Transaction Log Backup option is dependant on Recovery Model.
The "FULL" recovery model benefits from transaction logs (ie transactions are fully logged), so we must use the Transaction Log Backup option to help manage the transaction log.
The "SIMPLE" recovery model does not benefit from transaction logs (ie is minimally logged), so cannot use the Transaction Log Backup option. SQL will manage the transaction log for us.
We can use differential backups in a "SIMPLE" recovery model, so, when recovering, we have the choice of either the last full backup, or, the last full backup plus the most recent differential backup.
Does that make it clearer ?
>>If SQL Server is running in SIMPLE mode, logs are truncated, right? So if my last full backup was Sunday, and my db fails on Wednesday, will I have ability to recover up until Wednesday
No, in that case, you can only recover from the last full backup which was Sunday. meaning data loss since last full backup.
>>And _with_ differentials, it sounds like the recovery granularity is based on the time of the differential, so I would still have incomplete recovery
Correct, the recovery process with be last FULL backup plus most recent DIFFERENTIAL backup, meaning data loss since last differential.
The complexity with SQL is that there is a recovery model, and there are backup options. While the two are codependant, there is nothing enforcing the correct strategy based on that relationship. More so, you have choices and flexibility up to a degree, based on what works best for your business. Choice + Flexibility = Complexity.
I am thinking that a seperate paragraph is required. then again, our comments are not "private", so the thread does provide that clarity.
The more important factor is, if running a "FULL" recovery model, then you really have to do transaction log backups, otherwise your transaction logs will continue to grow. So, your transaction log really must be managed by doing those transaction log backups. It is not just for recoverability, backups are an important part of database health. Of course running a "FULL" recovery model is the reason in the first place to give you the highest possible mode for recoverability. Bit of a catch-22 in some regards.
Waiting for more from you in future.....