?
Solved

One of the databases reported SUSPECT after reboot - why at reboot?

Posted on 2014-09-10
19
Medium Priority
?
181 Views
Last Modified: 2014-09-28
What is the connection between reboot and Status of database. How can reboot reset or alert a new status for a database?
0
Comment
Question by:25112
[X]
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
  • 9
  • 7
  • 3
19 Comments
 
LVL 5

Author Comment

by:25112
ID: 40314410
This is SQL 2005, if that should make a different...

server reboots take place every 90 days. is that too much distance that can cause issues for sql server?
0
 
LVL 35

Accepted Solution

by:
ste5an earned 668 total points
ID: 40314494
The reboot per se is not a problem. The only direct cause: the drives have not enough time to settle down (write cache).

One the other hand, are you sure that the db wasn't already in suspect state?
0
 
LVL 5

Author Comment

by:25112
ID: 40314501
>>One the other hand, are you sure that the db wasn't already in suspect state?
the database was in use and hence I don't believe it could have been in suspect state. Also when I ran checkdb before the reboot, it did not give errors.. but now after the reboot it mentioned several consistency and allocation errros.

>>is that too much distance that can cause issues for sql server?
do you mean the first few minutes after the reboot?
0
Windows Server 2016: All you need to know

Learn about Hyper-V features that increase functionality and usability of Microsoft Windows Server 2016. Also, throughout this eBook, you’ll find some basic PowerShell examples that will help you leverage the scripts in your environments!

 
LVL 51

Assisted Solution

by:Vitor Montalvão
Vitor Montalvão earned 1332 total points
ID: 40314508
The server could has some issue that only a reboot will make it "alive" like for example a previous installation of a driver, hotfix or even an application.
Also can be hardware issue like storage (you can check if you can reach the database files) or even a security issue.
0
 
LVL 35

Assisted Solution

by:ste5an
ste5an earned 668 total points
ID: 40314509
Check your disks. Sounds like an problem with the IO system.
0
 
LVL 51

Assisted Solution

by:Vitor Montalvão
Vitor Montalvão earned 1332 total points
ID: 40314526
Even the first thing to do is to check SQL Server log and OS event log.
Usually you can find important information there.
0
 
LVL 5

Author Comment

by:25112
ID: 40314610
when I ran checkdb now, I got this in errorlog.. what does it tell you?
---

2014-09-10 09:39:19.090      spid54      DBCC CHECKDB (ROSTER_TEST) executed by RSBSV\agors found 539 errors and repaired 0 errors. Elapsed time: 0 hours 2 minutes 12 seconds.
2014-09-10 09:39:19.180      spid54      Using 'dbghelp.dll' version '4.0.5'
2014-09-10 09:39:19.180      spid54      **Dump thread - spid = 54, PSS = 0x179F5AC0, EC = 0x179F5AC8
2014-09-10 09:39:19.180      spid54      ***Stack Dump being sent to C:\Program Files\Microsoft SQL Server\MSSQL.1\MSSQL\LOG\SQLDump0127.txt
2014-09-10 09:39:19.180      spid54      * *******************************************************************************
2014-09-10 09:39:19.180      spid54      *
2014-09-10 09:39:19.180      spid54      * BEGIN STACK DUMP:
2014-09-10 09:39:19.180      spid54      *   09/10/14 09:39:19 spid 54
2014-09-10 09:39:19.180      spid54      *
2014-09-10 09:39:19.180      spid54      * DBCC database corruption
2014-09-10 09:39:19.180      spid54      *
2014-09-10 09:39:19.180      spid54      * Input Buffer 88 bytes -
2014-09-10 09:39:19.180      spid54      *               dbcc checkdb (ROSTER_TEST)
2014-09-10 09:39:19.180      spid54      *  
2014-09-10 09:39:19.180      spid54      * *******************************************************************************
2014-09-10 09:39:19.180      spid54      * -------------------------------------------------------------------------------
2014-09-10 09:39:19.180      spid54      * Short Stack Dump
2014-09-10 09:39:19.210      spid54      Stack Signature for the dump is 0x00000092
2014-09-10 09:39:21.250      spid54      External dump process return code 0x20002001.  The error information has been submitted to Watson error reporting.
2014-09-10 09:39:21.250      spid54      DoMiniDump () encountered error (0x800703E6) - Invalid access to memory location.
0
 
LVL 5

Author Comment

by:25112
ID: 40314618
this error is happening 100s of times:

A time-out occurred while waiting for buffer latch -- type 3, bp 0359F010, page 1:652536, stat 0xc0010b, database id: 9, allocation unit Id: 72057609715712000, task 0x008D86B8 : 0, waittime 300, flags 0x4003a, owning task 0x008D92E8. Not continuing to wait.
Error: 845, Severity: 17, State: 1.
0
 
LVL 51

Assisted Solution

by:Vitor Montalvão
Vitor Montalvão earned 1332 total points
ID: 40314623
Check the error log for the period just before and after the reboot.
0
 
LVL 35

Assisted Solution

by:ste5an
ste5an earned 668 total points
ID: 40314642
A time-out occurred while waiting for buffer latch [..]
The request IO process timed out, sounds like an IO problem. Maybe the load on the IO system is too high when rebooting.
0
 
LVL 5

Author Comment

by:25112
ID: 40314649
last error before reboot:
A time-out occurred while waiting for buffer latch -- type 3, bp 0359F010, page 1:652536, stat 0xc0010b, database id: 9, allocation unit Id: 72057609715712000, task 0x008D86B8 : 0, waittime 300, flags 0x4003a, owning task 0x008D92E8. Not continuing to wait.
 Error: 845, Severity: 17, State: 1.

After reboot:
2014-09-09 15:37:58.670      spid17s      Analysis of database 'ROSTER_TEST' (9) is 100% complete (approximately 0 seconds remain). This is an informational message only. No user action is required.
2014-09-09 15:37:59.200      spid17s      Error: 3456, Severity: 21, State: 1.
2014-09-09 15:37:59.200      spid17s      Could not redo log record (259903:17022:4), for transaction ID (0:75894198), on page (1:652523), database 'ROSTER_TEST' (database ID 9). Page: LSN = (259886:6386:12), type = 2. Log: OpCode = 3, context 19, PrevPageLSN: (259827:5755:1). Restore from a backup of the database, or repair the database.
2014-09-09 15:37:59.320      spid17s      Using 'dbghelp.dll' version '4.0.5'
2014-09-09 15:37:59.590      spid17s      **Dump thread - spid = 17, PSS = 0x040947D8, EC = 0x040947E0
2014-09-09 15:37:59.590      spid17s      ***Stack Dump being sent to C:\Program Files\Microsoft SQL Server\MSSQL.1\MSSQL\LOG\SQLDump0123.txt
2014-09-09 15:37:59.590      spid17s      * *******************************************************************************
2014-09-09 15:37:59.590      spid17s      *
2014-09-09 15:37:59.590      spid17s      * BEGIN STACK DUMP:
2014-09-09 15:37:59.590      spid17s      *   09/09/14 15:37:59 spid 17
2014-09-09 15:37:59.590      spid17s      *
2014-09-09 15:37:59.590      spid17s      * HandleAndNoteToErrorlog: Exception raised, major=34, minor=56, severity=21
2014-09-09 15:37:59.590      spid17s      *

Stack Signature for the dump is 0xA30963D5
External dump process return code 0x20002001.  The error information has been submitted to Watson error reporting.
0
 
LVL 5

Author Comment

by:25112
ID: 40314664
>>The request IO process timed out, sounds like an IO problem. Maybe the load on the IO system is too high when rebooting.

this error is seen on the error log all day yesterday.. after the reboot, this error has not happened (perhaps because the database is on EMERGENCY (suspect) mode)
0
 
LVL 51

Assisted Solution

by:Vitor Montalvão
Vitor Montalvão earned 1332 total points
ID: 40314681
I think you got transaction log full. Can you check if you find this message in the error logs?
If it's that you should backup more often the transaction log.
0
 
LVL 5

Author Comment

by:25112
ID: 40314699
just before the reboot, I see one error related to log being full.. but that is for the windows events log.. there is 200GB space left on drive, so I don't TL could have got full....

the error is
"
The current event was not reported to the Windows Events log. Operating system error = 1717(The interface is unknown.). You may need to clear the Windows Events log if it is full.
"
0
 
LVL 51

Assisted Solution

by:Vitor Montalvão
Vitor Montalvão earned 1332 total points
ID: 40314703
That's not Transaction Log full error.
Maybe the engine lost connection to the transaction log files.
0
 
LVL 5

Author Comment

by:25112
ID: 40314762
right now in EMERGENCY mode, Tlog is 0kb. (according to  sp_helpfile) but the actual file is still there and is over a GB)
0
 
LVL 51

Expert Comment

by:Vitor Montalvão
ID: 40314799
I think SQL Server created a new Tlog file.
0
 
LVL 5

Author Comment

by:25112
ID: 40315221
Thanks..

the network team says the disks are OK without errors.. that is what makes this more tricky to resolve.

so back to the original question.. is it possible to pinpoint why the reboot could have kicked off a suspect database, or can we can only theorize possibilities?

~~~
The only direct cause: the drives have not enough time to settle down (write cache).

The server could has some issue that only a reboot will make it "alive" like for example a previous installation of a driver, hotfix or even an application.
~~~

what would be constructive things to do/watch in general to avoid this in future (other than making sure to reboot more often?)
0
 
LVL 51

Assisted Solution

by:Vitor Montalvão
Vitor Montalvão earned 1332 total points
ID: 40316331
I recommend you to have a good monitor tool, like for example SCOM, so you can be warned immediately when a problem occurs.
If your company have a support contract with Microsoft then send them the dumps originated by SQL Server and Windows Server so they can analyze it. For sure there's information about the problem.
0

Featured Post

Get 15 Days FREE Full-Featured Trial

Benefit from a mission critical IT monitoring with Monitis Premium or get it FREE for your entry level monitoring needs.
-Over 200,000 users
-More than 300,000 websites monitored
-Used in 197 countries
-Recommended by 98% of users

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

This article explains how to reset the password of the sa account on a Microsoft SQL Server.  The steps in this article work in SQL 2005, 2008, 2008 R2, 2012, 2014 and 2016.
Ever wondered why sometimes your SQL Server is slow or unresponsive with connections spiking up but by the time you go in, all is well? The following article will show you how to install and configure a SQL job that will send you email alerts includ…
Using examples as well as descriptions, and references to Books Online, show the documentation available for date manipulation functions and by using a select few of these functions, show how date based data can be manipulated with these functions.
Via a live example, show how to setup several different housekeeping processes for a SQL Server.
Suggested Courses

770 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question