
How can I find out who rebooted the server?

Posted on 2011-10-14
Medium Priority
Last Modified: 2013-12-06
How can I tell whether the server encountered a disk problem before the reboot, and who rebooted it?
HP-UX 11.31
Question by:LindaC

Expert Comment

ID: 36972119
A disk problem should be logged to syslog and hopefully monitored by the event log.  You can also use cstm/stm to look at the drives.

To see who rebooted a machine you need to enable auditing. This can be configured using SAM, but there are also a slew of commands (named aud*) to control the audit log system from the command line. The audit log system is disabled by default. When it is enabled, it can log detailed information about users' actions (the names of files read/written, the starting or stopping of processes and the like). The full list of audit action categories is available via the command "man 5 audit".

The file /.secure/etc/audnames lists the names of the audit logfiles, if the audit log system is enabled. The audit logfiles are in a binary format: you must use the "audisp" command to view them.

Author Comment

ID: 36972190
Thank you.

I have the following entries in OLDsyslog.log, which I have attached to this post.


Expert Comment

ID: 36972211
You need to look above that snippet, but excessive I/O like that can cause a crash.  Is the VG external to the server?  Check the connections, such as FC cables and GBICs.  Most arrays have some built-in drive health checks you might consider running.

Author Comment

ID: 36972215
Some of the disks are on the SAN.

This is the list of filesystems (I am an Oracle DBA):

Filesystem            kbytes     used     avail %used Mounted on
/dev/vg00/lvol3      2097152   237912   1844824   11% /
/dev/vg00/lvol1      1835008   180592   1641584   10% /stand
/dev/vg00/lvol8     18432000  1681728  16621024    9% /var
/dev/vg00/lvol7     10240000  4211992   5981024   41% /usr
/dev/vg04/lvol1     50176000 36302270  13006719   74% /u06
/dev/vg03/lvol1     70656000 33531106  34804791   49% /u04
/dev/vg02/lvol1    245760000 54511994 179295126   23% /u03
/dev/vg01/lvol1    101376000 31103631  66045976   32% /u01
/dev/vg00/lvol6      5636096  3188720   2430128   57% /tmp
/dev/vg06/lvol1     40894464  1114537  37293817    3% /prod/ARCHIVE
/dev/vg00/lvol5     12288000  4289008   7936632   35% /opt
/dev/vg00/lvol4      1048576    69288    971704    7% /home
/dev/vg07/lvol1     19922944   712798  18009628    4% /home/oracle
/dev/vg05/lvol1    102400000 27891973  69851281   29% /exports

Expert Comment

ID: 36972264
OK.  What was happening when the I/O timeouts started?  For instance, I can create this problem by deleting LUNs on my VA, because LUN deletion is a foreground process, especially if the server is busy.

What else is in syslog?  A typical disk problem looks like this:

hostname vmunix: SCSI: Async write error -- dev: b 31 0x022000, errno: 126, resid: 8192,
hostname vmunix:   blkno: 45699672, sectno: 91399344, offset: 3846791168, bcount: 8192.
hostname vmunix:   blkno: 45699128, sectno: 91398256, offset: 3846234112, bcount: 8192.
hostname vmunix: SCSI: Read error -- dev: b 31 0x022000, errno: 126, resid: 1024,
hostname vmunix: SCSI: Async write error -- dev: b 31 0x022000, errno: 126, resid: 8192,
hostname vmunix:   blkno: 8, sectno: 16, offset: 8192, bcount: 1024.
hostname vmunix: LVM: VG 64 0x000000: PVLink 31 0x022000 Failed! The PV is not accessible.
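
If you have a copy of syslog.log or OLDsyslog.log on a machine with Python, a rough way to hunt for lines like the ones above is a small filter such as this. It is only a sketch: the two patterns are modeled on the sample messages in this comment (an assumption, since real message wording can vary), and the function name is made up for illustration.

```python
import re

# Patterns modeled on the sample messages quoted in this thread.
# Assumption: real syslog wording may differ, so adjust as needed.
DISK_ERROR_PATTERNS = [
    re.compile(r"SCSI: .*error"),
    re.compile(r"LVM: .*PVLink .*Failed"),
]


def find_disk_errors(lines):
    """Return the lines that match any of the disk-error patterns."""
    return [
        line.rstrip("\n")
        for line in lines
        if any(p.search(line) for p in DISK_ERROR_PATTERNS)
    ]


# Sample input copied from the messages quoted in this thread.
sample = [
    "hostname vmunix: SCSI: Async write error -- dev: b 31 0x022000, errno: 126, resid: 8192,",
    "hostname vmunix:   blkno: 45699672, sectno: 91399344, offset: 3846791168, bcount: 8192.",
    "hostname vmunix: SCSI: Read error -- dev: b 31 0x022000, errno: 126, resid: 1024,",
    "hostname vmunix: LVM: VG 64 0x000000: PVLink 31 0x022000 Failed! The PV is not accessible.",
    "Oct 14 11:32:15 ebsprdb vmunix: Loaded ACPI revision 2.0 tables.",
]

hits = find_disk_errors(sample)
for line in hits:
    print(line)
```

To run it against a real log, pass an open file instead of the sample list, for example `find_disk_errors(open("syslog.log"))`. An empty result only means these two patterns did not match, not that the disks are healthy.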

Author Comment

ID: 36972296
Do you know where the reboot log is?  Why is it not in syslog?

Author Comment

ID: 36972322
Could it be that it crashed and did not log anything today at 11:45 am?

It is now 12:26 am (Saturday):


12:26am  up 12:55,  1 user,  load average: 0.22, 0.25, 0.23
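
The uptime line already gives an approximate boot time: subtract the "up" value from the current time. A minimal worked example, assuming the year 2011 from the post date and Saturday being October 15 (the function name is just illustrative):

```python
from datetime import datetime, timedelta


def boot_time(now, up_hours, up_minutes):
    """Estimate the boot time by subtracting the uptime from the current time."""
    return now - timedelta(hours=up_hours, minutes=up_minutes)


# "12:26am  up 12:55" read on Saturday; the date 2011-10-15 is an assumption
# based on the posting date of this thread.
now = datetime(2011, 10, 15, 0, 26)
booted = boot_time(now, 12, 55)
print(booted.strftime("%b %d %H:%M"))  # prints "Oct 14 11:31"
```

So the machine came up at about 11:31 am on October 14, which is the window to look at in syslog.log.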

Author Comment

ID: 36972327
I found the restart in syslog.log, but I don't know whether the server was shut down or whether it crashed:

Oct 14 11:32:15 ebsprdb syslogd: restart
Oct 14 11:32:15 ebsprdb vmunix:
Oct 14 11:32:15 ebsprdb vmunix: MFS is defined: base= 0xe00000010205e000  size= 39928 KB
Oct 14 11:32:15 ebsprdb vmunix: Loaded ACPI revision 2.0 tables.
Oct 14 11:32:15 ebsprdb vmunix: MCA recovery subsystem disabled, not supported on this platform.
Oct 14 11:32:15 ebsprdb vmunix: montecito_proc_features: PROC_GET_FEATURES returned 0xfffffffffffffff8
Oct 14 11:32:15 ebsprdb vmunix: Using /stand/ext_ioconfig
Oct 14 11:32:15 ebsprdb vmunix: Memory Class Setup
Oct 14 11:32:15 ebsprdb vmunix: ------------------------------------------------
Oct 14 11:32:15 ebsprdb vmunix: Class     Physmem              Lockmem
Oct 14 11:32:15 ebsprdb vmunix:
Oct 14 11:32:15 ebsprdb  above message repeats 3 times
Oct 14 11:32:15 ebsprdb vmunix: System :  16552 MB             16552 MB    16552 MB
Oct 14 11:32:15 ebsprdb vmunix: Kernel :  16552 MB             16552 MB    16552 MB
Oct 14 11:32:15 ebsprdb vmunix: User   :  15243 MB             13665 MB    13718 MB

Expert Comment

ID: 36973683
A restart is not a crash; it means someone or some process rebooted the machine.  Look above the restart for any messages that lead up to it.  If there was a crash you should have files from the 14th in /var/adm/crash.
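
To "look above the restart" in a long log, something like the following can pull out the lines just before each "syslogd: restart" entry. This is a sketch only: the function name and the two "placeholder" lines in the sample are invented for illustration and are not real log entries from this server.

```python
def context_before_restart(lines, marker="syslogd: restart", before=5):
    """For each line containing the marker, return it with the lines just before it."""
    results = []
    for i, line in enumerate(lines):
        if marker in line:
            start = max(0, i - before)
            results.append((line, lines[start:i]))
    return results


# The first two lines are made-up placeholders; the last two are from this thread.
sample = [
    "Oct 14 11:20:00 ebsprdb placeholder: earlier message 1",
    "Oct 14 11:25:00 ebsprdb placeholder: earlier message 2",
    "Oct 14 11:32:15 ebsprdb syslogd: restart",
    "Oct 14 11:32:15 ebsprdb vmunix: Loaded ACPI revision 2.0 tables.",
]

found = context_before_restart(sample, before=2)
for restart_line, preceding in found:
    print("restart:", restart_line)
    for earlier in preceding:
        print("  before:", earlier)
```

For a real file, read it into a list first (for example `open("syslog.log").read().splitlines()`), and check OLDsyslog.log as well, since the messages leading up to the reboot may have ended up there.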

Author Comment

ID: 36973722
Thank you.  No crash log.

We know that the test servers associated with this production server were affected by work done yesterday on the SAN disks by external IBM personnel.
This production server did not appear to have any disk issue yesterday (October 14), and it may have been restarted by mistake.  We need to know whether this server has any kind of disk error.  I have uploaded syslog.log and OLDsyslog.log as attachments.

But there was no crash here, and the test server has no crash either.

Accepted Solution

paulc earned 2000 total points
ID: 36973993
I don't see those files, but I'd be happy to look at them.  Like I said, if this machine is attached to that SAN then it is very likely that the work caused excessive I/O, which can cause a crash.  I had this happen to my 8420s earlier this year when the DBAs were migrating LUNs to different EMC arrays.  The syslog excerpts you posted earlier show no indication of disk issues.

Take a look at the logtool in cstm for any other issues.  cstm can also be used to test the system.  xstm is the graphical version.



Author Closing Comment

ID: 36974008
Thank you so much for your help and valuable information!
