Solved

Database is starting up

Posted on 2014-12-15
30
104 Views
Last Modified: 2014-12-22
SQL Server 2012 Enterprise.
Am currently looking into why backup jobs are taking longer than usual.
Looking through the SQL logs I see the DBs being backed up one by one (I use this to do so - https://ola.hallengren.com/ - so is a single "job")

Looking through I see DB backed up, DB backed up but then I see entries saying Starting up database <dbname> several times and then some more DB backup messages. Then another few start ups and a transaction rollback message (in other words, interspersed between backup notifications are messages that look like a server start up!)
Databases are *not* set to auto close.
Any ideas what could be causing this?
0
Comment
Question by:QPR
  • 17
  • 12
30 Comments
 
LVL 23

Expert Comment

by:Racim BOUDJAKDJI
ID: 40501522
<<Am currently looking into why backup jobs are taking longer than usual.>>
How luch longer ?  Which one is the longest backup ?  Do you have a specific database bigger than any other ?  Can you put some figures on this ?
0
 
LVL 29

Author Comment

by:QPR
ID: 40501573
From an average of an hour to over 3 hours.
This is a sharepoint sql server so there are a few DBs of varying sizes from small to very large.
We have recently transferred from one environment to another (new SAN and upgraded VM, this isn't my area so can't expand much on that other than to be assured the new environment should be better and greater).

My question is more aimed at the database starting messages. The services are not being restarted, the Dbs are not set to autoclose. Why is it then behaving (logging) messages that would indicate a restart or recovery?
0
 
LVL 23

Expert Comment

by:Racim BOUDJAKDJI
ID: 40501591
Unless your database has tripled in size during the migration, you definitely have a problem.  It looks like you have an IO hog somewhere.  Please double check the IO write throughput (in MB/sec) on the disk where you put the backups.
0
 
LVL 29

Author Comment

by:QPR
ID: 40501714
Yes that is being investigated but this isn't the question I'm asking
0
 
LVL 23

Expert Comment

by:Racim BOUDJAKDJI
ID: 40501737
<<Yes that is being investigated but this isn't the question I'm asking>>
I am sorry but when you ask:

Any ideas what could be causing this?

I read what may be causing a significant increase in backup times, transactions to roll back, and server instability.  

You need to be aware that an IO Hog, due to a faulty IO hypervizor or a insufficient IO throughput may cause many of the things you described:

> An increase in backup times
> Transaction to be rolled back.
> Server Restarts (often due to system databases not being accessible for short period of time)

Which is the reason I am encouraging to look closer at this.  I suggest you install and test for yourself what is your current IO throughput.  You can use tools such as IOMeter.  

Here is a video for you: http://www.youtube.com/watch?v=XkOUGbsnGn0

The tool will quickly reveal what is your current throughput.  

But perhaps I misunderstood what you are looking for.
0
 
LVL 29

Author Comment

by:QPR
ID: 40501826
Thanks,
The database backups are all part of a single script which basically loops through all the DBs and backs them up.
So the job is an all or nothing failure or success.
An example from the log might be....

database A successfully backed up to.....
database B successfully backed up to.....
database C successfully backed up to.....
starting database D
Starting database E
Rolling back 1 transaction database E
database D successfully backed up to.....

etc.
If sql had restarted then the agent "job" would have failed and the sql logs would have been regenarated - neither happened.

We are looking into what is happening in the evenings on this new subsystem that is causing these increases in backup times but I am wondering what on Earth is causing these scary log entries
0
 
LVL 23

Expert Comment

by:Racim BOUDJAKDJI
ID: 40502132
Please take a look at the system logs in backup time window.  See if you have unusual events.
0
 
LVL 29

Author Comment

by:QPR
ID: 40503321
Nope, nothing out of the ordinary
0
 
LVL 23

Expert Comment

by:Racim BOUDJAKDJI
ID: 40503390
It looks like the backup cause the server to start back.  Is the drive hosting the system database files the same as the one hosting the backups ?
0
 
LVL 23

Expert Comment

by:Racim BOUDJAKDJI
ID: 40503393
Please use IOMeter and give some throughput info please.  We need to know how many MB/sec your system can absorb
0
 
LVL 23

Expert Comment

by:Racim BOUDJAKDJI
ID: 40503394
IO is number one on my list then Memory in second.
0
 
LVL 29

Author Comment

by:QPR
ID: 40503439
This is a screen shot showing the metrics for the time period while the backups are taking place. That looks like some very heavy CPU load to me!
metrics
0
 
LVL 45

Expert Comment

by:Vitor Montalvão
ID: 40504978
I would run a SQL Profiler to see what's starting the databases. It can be some restore process running at the same time?
0
 
LVL 23

Expert Comment

by:Racim BOUDJAKDJI
ID: 40507537
True that CPU is high but never overloaded completely: backup is *not* a CPU intensive task.  Again, please look for the disk write throughput first.  IO contention is your most probable cause so investigate it first. You must determine if you do  not have IO contention due to your backup requirement exceeding your IO hyper vizor capabilities.
0
 
LVL 23

Expert Comment

by:Racim BOUDJAKDJI
ID: 40507556
A little summary..

You mentioned that the problem appeared in the context of a migration but did not mention any new functionalities were added.  So why would be a probable cause for an IO intensive process (such as backup) take there time longer than a decrease in IO throughput OR an multiplication by three of the database size ?
0
How your wiki can always stay up-to-date

Quip doubles as a “living” wiki and a project management tool that evolves with your organization. As you finish projects in Quip, the work remains, easily accessible to all team members, new and old.
- Increase transparency
- Onboard new hires faster
- Access from mobile/offline

 
LVL 29

Author Comment

by:QPR
ID: 40513521
No new functionality or increased. I'm just trying to work out what would cause SQL to log "starting" messages when they we're not stopped
0
 
LVL 23

Expert Comment

by:Racim BOUDJAKDJI
ID: 40513972
QPR
Let me know when you got the information so that I can help you further.
0
 
LVL 29

Author Comment

by:QPR
ID: 40514123
That is the question I have been asking here all along :)
0
 
LVL 23

Expert Comment

by:Racim BOUDJAKDJI
ID: 40514154
QPR.
Perhaps there is misunderstandng here.  I have suggested

<<Please use IOMeter and give some throughput info please.  We need to know how many MB/sec your system can absorb>>

Which means I suggested you to install and run IOMeter to stress test your backup volume to know what is your current throughput in MB/sec (and IOPS).  From the value you get as a result of the stress test I can directly conclude if there are good chances of IO contention on this volume which can explain your problems.

I even provided a link for you on how to use IOMeter.  You have not provided any of the info I specifically asked for so far.  Therefore I have difficulties understanding your comment.  Note that the information you have provided does not give much indication as to the cause of the problem you described.  Did I miss something ?
0
 
LVL 29

Author Comment

by:QPR
ID: 40514159
I posted metrics a week ago in the image.
Are you saying that a stressed IO system can cause SQL to log false entries? Or that a DB can be closed?
0
 
LVL 23

Expert Comment

by:Racim BOUDJAKDJI
ID: 40514175
<<I posted metrics a week ago in the image.>>
I know but the metrics you posted does not say anything interesting IO wise about whether your IO subsystem can cope with what you throw at it.  IOMeter is a much sharper tool to do that which is why I suggested in the first place.  

<<Are you saying that a stressed IO system can cause SQL to log false entries? Or that a DB can be closed?>>
No.  

What I am saying is that a volume which has insufficient IO throughput (or bandwidth in MB/sec) can presents all the symptoms you have described.  For example, Low throughput create micro shortages of IO bandwidth which can cause servers to restart when the volumes hosting master are affected.  Low IO throughput can also slowdown backups or any IO operation when the IO requests exceed the available throughput on the volumes.

There is one way for me to know your maximum throughput in MB/sec and IOPS and that way is to run a short stress test on your backup volume and the volume hosting your master files.

I hope this clarified.
0
 
LVL 23

Expert Comment

by:Racim BOUDJAKDJI
ID: 40514184
Note that VM generally are known for a low throughput and low IOPS.  Throuhput measure IO bandwidth and IOPS measures the ability of the volume of handling simultaneous requests on the same volume.  When the volume has insufficient throughput and/or IOPS, I have seen symptoms like yours occurring.

Does it make sense better now ?
0
 
LVL 29

Author Comment

by:QPR
ID: 40514191
Thanks, these tests have been done by the SAN experts and after some drive swapping and config they have the read/write IOs under 10ms.
The original issue with slow backups (now resolved) were the CPUs running at > 95%
While checking logs and metrics for reasons I saw theSQL log entries about DBs starting up despite no restarting of the server or services and no DBs being set to auto close. If you are saying that tgese entries were caused by (temp) bad performance then I will accept that answer. The disk IOs are shown (in ms) at the bottom left of the image I posted
0
 
LVL 23

Expert Comment

by:Racim BOUDJAKDJI
ID: 40514196
I am not convinced what you are reading are false entries.  It is perfectly possible SQL Server is taking a precaution in starting a server shutdown to preserve data integrity when it finds out the volume is not stable.  This can happen when there the IO subsystem becomes unstable. Unstable means that for a period of time, a specific volume can not aborb all the IO SQL throws at it.
0
 
LVL 29

Author Comment

by:QPR
ID: 40514197
Wouldn't a server shut down recycle the error log and bring down the SharePoint farm? Neither happened.
0
 
LVL 23

Expert Comment

by:Racim BOUDJAKDJI
ID: 40514210
<<Thanks, these tests have been done by the SAN experts and after some drive swapping and config they have the read/write IOs under 10ms.>>
Time of single IO is not the same as throughput.  SQL Server can write concurrently many IOPS and has a specific read and write data at very big rates for periods of time.  SAN people generally have no clue about this aspect of things and doing a single elementary test does not say much.

If using a VM connected to a SAN, the SAN does not even handle the IO's, the hypervizor does.

<<If you are saying that tgese entries were caused by (temp) bad performance then I will accept that answer. The disk IOs are shown (in ms) at the bottom left of the image I posted>>
I have seen these and told you these values can't allow to draw any conclusion in your context.  To tell you anything conclusive I need to know what is your volume capable of producing as a throughput.  

As you are familiar with EE guidelines, I wander why you are so reluctant to provide some info that can help you get to the bottom of this, especially knowing that a stress test takes less than 10 minutes to run.  If I could have used your info to inquire I would have.  But the info you posted simply is insufficient to say anything about what may cause SQL to trigger a shutdown or shutdown attempt.

Really up to you.  I am the just the guy who faced the same problem a few hundred times on the SQL side and who tries to avoid some perfectly unuseful pain for you.

Regards
0
 
LVL 23

Accepted Solution

by:
Racim BOUDJAKDJI earned 500 total points
ID: 40514215
<<Wouldn't a server shut down recycle the error log and bring down the SharePoint farm? Neither happened.>>
I do not know what is an error log recycle on SQL Server.

SQL Server can perfectly trigger a shutdown that never occurs.  A shutdown in the logs is primarily an indication that SQL Server is taking action to preserve data integrity.  In most cases the shutdown occurs but in some cases it does not (don't ask me why I have no idea)
0
 
LVL 29

Author Comment

by:QPR
ID: 40514224
Because we have no IO issues!
The image shows both disk and SQL IOs. Neither are overly high and have since been reduced further still.
At the time of the issue (that no longer occurs) the CPU was running at 100%.
This resolved itself with a SharePoint (web servers) reboot and some rescheduling of SP timer jobs.
There is no issue anymore and therefore nothing to investigate. I am also away from work on Xmas break so not at my desk to run any tests.
My question was always about the log entries. You have since taught me that if under stress SQL Server can initiate shout downs to protect the data. This must have been what I was seeing. I did not know this so thank you
0
 
LVL 29

Author Comment

by:QPR
ID: 40514228
When SQL Server restarts it creates a new error log. Otherwise known as recycling.
0
 
LVL 23

Expert Comment

by:Racim BOUDJAKDJI
ID: 40514236
<<Because we have no IO issues!>>
Glad for you.  But everything else I said is true.

<<I am also away from work on Xmas break so not at my desk to run any tests. >>
Merry Christmas in this case. Enjoying the vacation and relaxing hard I see :).

<<Otherwise known as recycling.>>
Ah OK.  I was not aware of this terminology for creating a new file.

Regards;..
0

Featured Post

How to run any project with ease

Manage projects of all sizes how you want. Great for personal to-do lists, project milestones, team priorities and launch plans.
- Combine task lists, docs, spreadsheets, and chat in one
- View and edit from mobile/offline
- Cut down on emails

Join & Write a Comment

In this article—a derivative of my DaytaBase.org blog post (http://daytabase.org/2011/06/18/what-week-is-it/)—I will explore a few different perspectives on which week today's date falls within using Microsoft SQL Server. First, to frame this stu…
Load balancing is the method of dividing the total amount of work performed by one computer between two or more computers. Its aim is to get more work done in the same amount of time, ensuring that all the users get served faster.
This videos aims to give the viewer a basic demonstration of how a user can query current session information by using the SYS_CONTEXT function
Via a live example, show how to setup several different housekeeping processes for a SQL Server.

706 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

18 Experts available now in Live!

Get 1:1 Help Now