Want to protect your cyber security and still get fast solutions? Ask a secure question today.Go Premium

x
  • Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 212
  • Last Modified:

One of the databases reported SUSPECT after reboot - why at reboot?

What is the connection between reboot and Status of database. How can reboot reset or alert a new status for a database?
0
25112
Asked:
25112
  • 9
  • 7
  • 3
9 Solutions
 
25112Author Commented:
This is SQL 2005, if that should make a different...

server reboots take place every 90 days. is that too much distance that can cause issues for sql server?
0
 
ste5anSenior DeveloperCommented:
The reboot per se is not a problem. The only direct cause: the drives have not enough time to settle down (write cache).

One the other hand, are you sure that the db wasn't already in suspect state?
0
 
25112Author Commented:
>>One the other hand, are you sure that the db wasn't already in suspect state?
the database was in use and hence I don't believe it could have been in suspect state. Also when I ran checkdb before the reboot, it did not give errors.. but now after the reboot it mentioned several consistency and allocation errros.

>>is that too much distance that can cause issues for sql server?
do you mean the first few minutes after the reboot?
0
Get your Disaster Recovery as a Service basics

Disaster Recovery as a Service is one go-to solution that revolutionizes DR planning. Implementing DRaaS could be an efficient process, easily accessible to non-DR experts. Learn about monitoring, testing, executing failovers and failbacks to ensure a "healthy" DR environment.

 
Vitor MontalvãoMSSQL Senior EngineerCommented:
The server could has some issue that only a reboot will make it "alive" like for example a previous installation of a driver, hotfix or even an application.
Also can be hardware issue like storage (you can check if you can reach the database files) or even a security issue.
0
 
ste5anSenior DeveloperCommented:
Check your disks. Sounds like an problem with the IO system.
0
 
Vitor MontalvãoMSSQL Senior EngineerCommented:
Even the first thing to do is to check SQL Server log and OS event log.
Usually you can find important information there.
0
 
25112Author Commented:
when I ran checkdb now, I got this in errorlog.. what does it tell you?
---

2014-09-10 09:39:19.090      spid54      DBCC CHECKDB (ROSTER_TEST) executed by RSBSV\agors found 539 errors and repaired 0 errors. Elapsed time: 0 hours 2 minutes 12 seconds.
2014-09-10 09:39:19.180      spid54      Using 'dbghelp.dll' version '4.0.5'
2014-09-10 09:39:19.180      spid54      **Dump thread - spid = 54, PSS = 0x179F5AC0, EC = 0x179F5AC8
2014-09-10 09:39:19.180      spid54      ***Stack Dump being sent to C:\Program Files\Microsoft SQL Server\MSSQL.1\MSSQL\LOG\SQLDump0127.txt
2014-09-10 09:39:19.180      spid54      * *******************************************************************************
2014-09-10 09:39:19.180      spid54      *
2014-09-10 09:39:19.180      spid54      * BEGIN STACK DUMP:
2014-09-10 09:39:19.180      spid54      *   09/10/14 09:39:19 spid 54
2014-09-10 09:39:19.180      spid54      *
2014-09-10 09:39:19.180      spid54      * DBCC database corruption
2014-09-10 09:39:19.180      spid54      *
2014-09-10 09:39:19.180      spid54      * Input Buffer 88 bytes -
2014-09-10 09:39:19.180      spid54      *               dbcc checkdb (ROSTER_TEST)
2014-09-10 09:39:19.180      spid54      *  
2014-09-10 09:39:19.180      spid54      * *******************************************************************************
2014-09-10 09:39:19.180      spid54      * -------------------------------------------------------------------------------
2014-09-10 09:39:19.180      spid54      * Short Stack Dump
2014-09-10 09:39:19.210      spid54      Stack Signature for the dump is 0x00000092
2014-09-10 09:39:21.250      spid54      External dump process return code 0x20002001.  The error information has been submitted to Watson error reporting.
2014-09-10 09:39:21.250      spid54      DoMiniDump () encountered error (0x800703E6) - Invalid access to memory location.
0
 
25112Author Commented:
this error is happening 100s of times:

A time-out occurred while waiting for buffer latch -- type 3, bp 0359F010, page 1:652536, stat 0xc0010b, database id: 9, allocation unit Id: 72057609715712000, task 0x008D86B8 : 0, waittime 300, flags 0x4003a, owning task 0x008D92E8. Not continuing to wait.
Error: 845, Severity: 17, State: 1.
0
 
Vitor MontalvãoMSSQL Senior EngineerCommented:
Check the error log for the period just before and after the reboot.
0
 
ste5anSenior DeveloperCommented:
A time-out occurred while waiting for buffer latch [..]
The request IO process timed out, sounds like an IO problem. Maybe the load on the IO system is too high when rebooting.
0
 
25112Author Commented:
last error before reboot:
A time-out occurred while waiting for buffer latch -- type 3, bp 0359F010, page 1:652536, stat 0xc0010b, database id: 9, allocation unit Id: 72057609715712000, task 0x008D86B8 : 0, waittime 300, flags 0x4003a, owning task 0x008D92E8. Not continuing to wait.
 Error: 845, Severity: 17, State: 1.

After reboot:
2014-09-09 15:37:58.670      spid17s      Analysis of database 'ROSTER_TEST' (9) is 100% complete (approximately 0 seconds remain). This is an informational message only. No user action is required.
2014-09-09 15:37:59.200      spid17s      Error: 3456, Severity: 21, State: 1.
2014-09-09 15:37:59.200      spid17s      Could not redo log record (259903:17022:4), for transaction ID (0:75894198), on page (1:652523), database 'ROSTER_TEST' (database ID 9). Page: LSN = (259886:6386:12), type = 2. Log: OpCode = 3, context 19, PrevPageLSN: (259827:5755:1). Restore from a backup of the database, or repair the database.
2014-09-09 15:37:59.320      spid17s      Using 'dbghelp.dll' version '4.0.5'
2014-09-09 15:37:59.590      spid17s      **Dump thread - spid = 17, PSS = 0x040947D8, EC = 0x040947E0
2014-09-09 15:37:59.590      spid17s      ***Stack Dump being sent to C:\Program Files\Microsoft SQL Server\MSSQL.1\MSSQL\LOG\SQLDump0123.txt
2014-09-09 15:37:59.590      spid17s      * *******************************************************************************
2014-09-09 15:37:59.590      spid17s      *
2014-09-09 15:37:59.590      spid17s      * BEGIN STACK DUMP:
2014-09-09 15:37:59.590      spid17s      *   09/09/14 15:37:59 spid 17
2014-09-09 15:37:59.590      spid17s      *
2014-09-09 15:37:59.590      spid17s      * HandleAndNoteToErrorlog: Exception raised, major=34, minor=56, severity=21
2014-09-09 15:37:59.590      spid17s      *

Stack Signature for the dump is 0xA30963D5
External dump process return code 0x20002001.  The error information has been submitted to Watson error reporting.
0
 
25112Author Commented:
>>The request IO process timed out, sounds like an IO problem. Maybe the load on the IO system is too high when rebooting.

this error is seen on the error log all day yesterday.. after the reboot, this error has not happened (perhaps because the database is on EMERGENCY (suspect) mode)
0
 
Vitor MontalvãoMSSQL Senior EngineerCommented:
I think you got transaction log full. Can you check if you find this message in the error logs?
If it's that you should backup more often the transaction log.
0
 
25112Author Commented:
just before the reboot, I see one error related to log being full.. but that is for the windows events log.. there is 200GB space left on drive, so I don't TL could have got full....

the error is
"
The current event was not reported to the Windows Events log. Operating system error = 1717(The interface is unknown.). You may need to clear the Windows Events log if it is full.
"
0
 
Vitor MontalvãoMSSQL Senior EngineerCommented:
That's not Transaction Log full error.
Maybe the engine lost connection to the transaction log files.
0
 
25112Author Commented:
right now in EMERGENCY mode, Tlog is 0kb. (according to  sp_helpfile) but the actual file is still there and is over a GB)
0
 
Vitor MontalvãoMSSQL Senior EngineerCommented:
I think SQL Server created a new Tlog file.
0
 
25112Author Commented:
Thanks..

the network team says the disks are OK without errors.. that is what makes this more tricky to resolve.

so back to the original question.. is it possible to pinpoint why the reboot could have kicked off a suspect database, or can we can only theorize possibilities?

~~~
The only direct cause: the drives have not enough time to settle down (write cache).

The server could has some issue that only a reboot will make it "alive" like for example a previous installation of a driver, hotfix or even an application.
~~~

what would be constructive things to do/watch in general to avoid this in future (other than making sure to reboot more often?)
0
 
Vitor MontalvãoMSSQL Senior EngineerCommented:
I recommend you to have a good monitor tool, like for example SCOM, so you can be warned immediately when a problem occurs.
If your company have a support contract with Microsoft then send them the dumps originated by SQL Server and Windows Server so they can analyze it. For sure there's information about the problem.
0

Featured Post

[Webinar] Database Backup and Recovery

Does your company store data on premises, off site, in the cloud, or a combination of these? If you answered “yes”, you need a data backup recovery plan that fits each and every platform. Watch now as as Percona teaches us how to build agile data backup recovery plan.

  • 9
  • 7
  • 3
Tackle projects and never again get stuck behind a technical roadblock.
Join Now