Where can I find out who rebooted the server?

How can I tell whether the server encountered a disk problem before the reboot, and who rebooted it?
HP-UX 11.31
LindaC asked:
Paul Coffey commented:
I don't see those files, but I'd be happy to look at them. Like I said, if this machine is attached to that SAN, then it is very likely that the work there caused the excessive I/O, and that can cause a crash. I had this happen to my 8420s earlier this year when the DBAs were migrating LUNs to different EMC arrays. The syslog excerpts you posted earlier show no indication of disk issues.

Take a look at the logtool in cstm for any other issues. cstm can also be used to test the system; xstm is the graphical version.

http://h20000.www2.hp.com/bc/docs/support/SupportManual/c02620764/c02620764.pdf
http://h20000.www2.hp.com/bc/docs/support/SupportManual/c02620756/c02620756.pdf

Paul Coffey commented:
A disk problem should be logged to syslog and, hopefully, captured by the event log. You can also use cstm/stm to look at the drives.

To see who rebooted a machine, you need to enable auditing. This can be configured using SAM, but there is also a slew of commands (named aud*) to control the audit log system from the command line.

The audit log system is disabled by default. When it is enabled, it can log detailed information about users' actions (the names of files read or written, the starting or stopping of processes, and the like). The full list of audit action categories is available with the command "man 5 audit".

The file /.secure/etc/audnames lists the names of the audit log files if the audit log system is enabled. The audit log files are in a binary format; you must use the "audisp" command to view them.
 
LindaC (author) commented:
Thank you.

I have the following entries in OLDsyslog.log, which I have attached to this entry.


Extract-old-syslog.txt
Paul Coffey commented:
You need to look above that snippet, but excessive I/O like that can cause a crash. Is the VG external to the server? Check those connections, such as the FC cables and GBICs. Most arrays have some built-in drive health checks you might consider running.
LindaC (author) commented:
Some of the disks are on the SAN.

This is the list of filesystems (I'm an Oracle DBA):

Filesystem          kbytes    used   avail %used Mounted on
/dev/vg00/lvol3    2097152  237912 1844824   11% /
/dev/vg00/lvol1    1835008  180592 1641584   10% /stand
/dev/vg00/lvol8    18432000 1681728 16621024    9% /var
/dev/vg00/lvol7    10240000 4211992 5981024   41% /usr
/dev/vg04/lvol1    50176000 36302270 13006719   74% /u06
/dev/vg03/lvol1    70656000 33531106 34804791   49% /u04
/dev/vg02/lvol1    245760000 54511994 179295126   23% /u03
/dev/vg01/lvol1    101376000 31103631 66045976   32% /u01
/dev/vg00/lvol6    5636096 3188720 2430128   57% /tmp
/dev/vg06/lvol1    40894464 1114537 37293817    3% /prod/ARCHIVE
/dev/vg00/lvol5    12288000 4289008 7936632   35% /opt
/dev/vg00/lvol4    1048576   69288  971704    7% /home
/dev/vg07/lvol1    19922944  712798 18009628    4% /home/oracle
/dev/vg05/lvol1    102400000 27891973 69851281   29% /exports
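To follow up on which of these filesystems could be touched by the SAN work, it can help to group the mount points by volume group first. Below is a minimal Python sketch that does this from bdf output like the listing above. The name mounts_by_vg is hypothetical, and it assumes the usual /dev/vgNN/lvolN device naming with one complete six-column row per filesystem; any other line is skipped.

```python
import re
from collections import defaultdict

# Assumes LVM device paths of the form /dev/<vg>/<lvol>, e.g. /dev/vg04/lvol1.
LV_PATH = re.compile(r"^/dev/(?P<vg>[^/]+)/(?P<lv>[^/\s]+)$")


def mounts_by_vg(bdf_text):
    """Map each volume group name to the mount points it backs (hypothetical helper)."""
    groups = defaultdict(list)
    for line in bdf_text.splitlines():
        fields = line.split()
        # A complete row has: filesystem kbytes used avail %used mount-point.
        if len(fields) != 6:
            continue  # skips the header ("Mounted on" gives 7 fields), blanks, wrapped rows
        match = LV_PATH.match(fields[0])
        if match:
            groups[match.group("vg")].append(fields[5])
    return dict(groups)
```

Run against the listing above, this should put /, /stand, /var, /usr, /tmp, /opt and /home under vg00, with each of vg01 to vg07 backing a single filesystem. For each VG you can then run, for example, `vgdisplay -v /dev/vg04` to list its physical volumes and check which of them are SAN LUNs.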
Paul Coffey commented:
OK. What was happening when the I/O timeouts started? For instance, I can create this problem by deleting LUNs on my VA, because LUN deletion is a foreground process, especially if the server is busy.

What else is in syslog?  A typical disk problem looks like this:

hostname vmunix: SCSI: Async write error -- dev: b 31 0x022000, errno: 126, resid: 8192,
hostname vmunix:   blkno: 45699672, sectno: 91399344, offset: 3846791168, bcount: 8192.
hostname vmunix:   blkno: 45699128, sectno: 91398256, offset: 3846234112, bcount: 8192.
hostname vmunix: SCSI: Read error -- dev: b 31 0x022000, errno: 126, resid: 1024,
hostname vmunix: SCSI: Async write error -- dev: b 31 0x022000, errno: 126, resid: 8192,
hostname vmunix:   blkno: 8, sectno: 16, offset: 8192, bcount: 1024.
hostname vmunix: LVM: VG 64 0x000000: PVLink 31 0x022000 Failed! The PV is not accessible.
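If you want to check a long syslog quickly for messages like these, a small filter is enough. This is only a Python sketch: the regular expression is modeled on the sample lines above and may miss other wording, disk_error_lines and print_disk_errors are hypothetical names, and /var/adm/syslog/syslog.log is assumed as the usual HP-UX syslog location (point it at OLDsyslog.log as well).

```python
import re

# Modeled on the sample messages above (SCSI read/write errors and LVM PVLink
# failures); other drivers or releases may word their messages differently.
DISK_ERROR = re.compile(r"SCSI: .*error|LVM: .*Failed", re.IGNORECASE)


def disk_error_lines(lines):
    """Return (line_number, text) for every line that looks like a disk error."""
    return [
        (number, line.rstrip("\n"))
        for number, line in enumerate(lines, start=1)
        if DISK_ERROR.search(line)
    ]


def print_disk_errors(path="/var/adm/syslog/syslog.log"):
    """Print matching lines from one log file (the default path is an assumption)."""
    with open(path, errors="replace") as log:
        for number, text in disk_error_lines(log):
            print(f"{path}:{number}: {text}")
```

An empty result only means none of these patterns matched; it is not proof that the disks are healthy.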
LindaC (author) commented:
Do you know where the reboot log is? Why is it not in syslog?
LindaC (author) commented:
Could it have crashed without logging anything today at 11:45 am?

It is now 12:26 am (Saturday).

uptime

12:26am  up 12:55,  1 user,  load average: 0.22, 0.25, 0.23
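The uptime line can be turned into an approximate boot time by subtracting the uptime from the current clock time. Here is a Python sketch of that arithmetic; boot_time is a hypothetical helper, and it only handles the "h:mmam  up [N days,] hh:mm" form shown here (other forms such as minutes-only uptimes are rejected).

```python
import re
from datetime import date, datetime, timedelta

# Handles only "h:mm(am|pm)  up [N days,] hh:mm ..." style uptime output.
UPTIME = re.compile(
    r"^\s*(\d{1,2}):(\d{2})\s*(am|pm)\s+up\s+(?:(\d+)\s+days?,\s*)?(\d{1,2}):(\d{2})"
)


def boot_time(uptime_line: str, today: date) -> datetime:
    """Estimate the boot time from one line of uptime output taken on `today`."""
    match = UPTIME.match(uptime_line)
    if match is None:
        raise ValueError(f"unrecognised uptime format: {uptime_line!r}")
    hour, minute, meridiem, days, up_hours, up_minutes = match.groups()
    # Convert the 12-hour clock to 24-hour: 12am -> 0, 12pm -> 12.
    hour24 = int(hour) % 12 + (12 if meridiem == "pm" else 0)
    now = datetime(today.year, today.month, today.day, hour24, int(minute))
    up_for = timedelta(days=int(days or 0), hours=int(up_hours), minutes=int(up_minutes))
    return now - up_for
```

With the output above and Saturday's date, boot_time returns 11:31 on the previous day (Friday), about a minute before the 11:32:15 `syslogd: restart` entry on Oct 14 in syslog.log. That suggests the system has been up continuously since about 11:31, rather than going down again at 11:45 am.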
LindaC (author) commented:
I found the restart in syslog.log, but I don't know whether the server was shut down or whether it crashed:

Oct 14 11:32:15 ebsprdb syslogd: restart
Oct 14 11:32:15 ebsprdb vmunix:
Oct 14 11:32:15 ebsprdb vmunix: MFS is defined: base= 0xe00000010205e000  size= 39928 KB
Oct 14 11:32:15 ebsprdb vmunix: Loaded ACPI revision 2.0 tables.
Oct 14 11:32:15 ebsprdb vmunix: MCA recovery subsystem disabled, not supported on this platform.
Oct 14 11:32:15 ebsprdb vmunix: montecito_proc_features: PROC_GET_FEATURES returned 0xfffffffffffffff8
Oct 14 11:32:15 ebsprdb vmunix: Using /stand/ext_ioconfig
Oct 14 11:32:15 ebsprdb vmunix: Memory Class Setup
Oct 14 11:32:15 ebsprdb vmunix: -------------------------------------------------------------------------
Oct 14 11:32:15 ebsprdb vmunix: Class     Physmem              Lockmem              Swapmem
Oct 14 11:32:15 ebsprdb vmunix:
Oct 14 11:32:15 ebsprdb  above message repeats 3 times
Oct 14 11:32:15 ebsprdb vmunix: System :  16552 MB             16552 MB             16552 MB
Oct 14 11:32:15 ebsprdb vmunix: Kernel :  16552 MB             16552 MB             16552 MB
Oct 14 11:32:15 ebsprdb vmunix: User   :  15243 MB             13665 MB             13718 MB
Paul Coffey commented:
A restart is not a crash; it means someone or some process restarted the system. Look above the restart for any messages that led up to it. If there had been a crash, you should have files from the 14th in /var/adm/crash.
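The messages "above the restart" may be split across two files, so here is a Python sketch that collects the last few lines logged before each `syslogd: restart` entry. It assumes, as is typical on HP-UX, that the log in use before the boot was saved as OLDsyslog.log in /var/adm/syslog when syslogd restarted, so both files are read in order. lines_before_restarts and show_restart_context are hypothetical names.

```python
import itertools
from collections import deque


def lines_before_restarts(lines, context=20):
    """Yield (restart_line, preceding_lines) for each 'syslogd: restart' entry.

    preceding_lines holds up to `context` lines logged just before that
    restart, which is where any shutdown or error messages would appear.
    """
    recent = deque(maxlen=context)
    for line in lines:
        text = line.rstrip("\n")
        if "syslogd: restart" in text:
            yield text, list(recent)
            recent.clear()  # start collecting afresh for the next restart
        else:
            recent.append(text)


def show_restart_context(log_dir="/var/adm/syslog", context=20):
    """Read OLDsyslog.log then syslog.log (assumed HP-UX names) as one stream."""
    old_path = f"{log_dir}/OLDsyslog.log"
    new_path = f"{log_dir}/syslog.log"
    with open(old_path, errors="replace") as old, open(new_path, errors="replace") as new:
        for restart, before in lines_before_restarts(itertools.chain(old, new), context):
            print("\n".join(before))
            print(f">>> {restart}")
```

If nothing in those preceding lines mentions disk or I/O errors, and there is no crash dump, that is consistent with a deliberate restart rather than a crash.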
LindaC (author) commented:
Thank you.  No crash log.

The thing is, we know that the test servers associated with this production server were affected by work that external IBM personnel did on the SAN disks yesterday.
This production server, however, did not appear to have any disk issue yesterday, October 14, and it may have been restarted by mistake. We need to know whether this server has any kind of disk error. I have uploaded syslog.log and OLDsyslog.log as attachments.

But there was no crash, and the test server did not crash either.
LindaC (author) commented:
Thank you so much for your help and valuable information!