MezzutOzil

RHEL 6.2 server rebooting by itself

This is a newly set up Red Hat 6.2 64-bit server. It was formatted using LVM. The server was later found to keep rebooting by itself. The incidents always happened while nobody was using the server.

Here are the checks done so far:

  1. [root@server1 log]# df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/mapper/vg_barley-LogVol00
                      866G  7.5G  815G   1% /
tmpfs                  16G  500K   16G   1% /dev/shm
/dev/sda2             2.0G   61M  1.8G   4% /boot
/dev/sda1             2.0G  260K  2.0G   1% /boot/efi
/dev/sdb1             3.8G  2.6G  1.2G  69% /media/2854-2FE3


  2. [root@server1 log]# fdisk -l

WARNING: GPT (GUID Partition Table) detected on '/dev/sda'! The util fdisk doesn't support GPT. Use GNU Parted.


Disk /dev/sda: 1999.0 GB, 1998998994944 bytes
255 heads, 63 sectors/track, 243031 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x00000000

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1               1      243032  1952147455+  ee  GPT

Disk /dev/mapper/vg_barley-LogVol01: 2097 MB, 2097152000 bytes
255 heads, 63 sectors/track, 254 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x00000000

Disk /dev/mapper/vg_barley-LogVol01 doesn't contain a valid partition table

Disk /dev/mapper/vg_barley-LogVol00: 944.1 GB, 944125247488 bytes
255 heads, 63 sectors/track, 114783 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x00000000

Disk /dev/mapper/vg_barley-LogVol00 doesn't contain a valid partition table

Disk /dev/sdb: 4000 MB, 4000317440 bytes
114 heads, 49 sectors/track, 1398 cylinders
Units = cylinders of 5586 * 512 = 2860032 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x00000000

   Device Boot      Start         End      Blocks   Id  System
/dev/sdb1               1        1399     3906552    b  W95 FAT32

  3. After the incident, I ran "dmesg" to check for recent
     messages. I found a few strings of invalid text.

  4. Checked /var/log/messages; no errors found.

  5. My teammate tried disconnecting the NICs, and the system
     rebooting incident didn't happen. Could it be the NIC?

What could be the root cause? I appreciate your help.


rgds,
Soo How
Kerem ERSOY

Hi,

The information you've provided so far is inconclusive. Several kinds of problems can cause a system to reboot. One of them is a hardware problem such as bad memory. You can test the memory by booting from the installation disk and then choosing the memory test (memtest) option.

Another possibility is that a hardware watchdog timer has been turned on in the BIOS and is rebooting the server.
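
On the OS side, a rough way to see whether any watchdog driver is loaded is to filter the lsmod output by module name. This is only a sketch: the helper name watchdog_modules is made up for this example, and the name patterns (wdt, watchdog, softdog) are an assumption that will not catch every driver.

```shell
# List loaded kernel modules whose names look like watchdog drivers.
# Reads `lsmod` output on stdin. The name patterns are an assumption
# and may miss vendor-specific drivers.
watchdog_modules() {
    awk 'NR > 1 && tolower($1) ~ /wdt|watchdog|softdog/ { print $1 }'
}

# Usage (run as root on the affected server):
#   lsmod | watchdog_modules
#   ls -l /dev/watchdog    # present only if a watchdog driver registered a device
```

An empty result does not rule out a watchdog that is handled entirely by the BIOS or the management controller, so the BIOS setting still needs checking.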

But I don't think it is an issue with your harddisk tables.

It might be your network adapter too. You said you saw some invalid messages in dmesg. Can you please post them here? Do you see any crash dump messages or similar in your syslog output?


Cheers,
K.
I would second the option of possible hardware problems. Besides memory problems I would add overheating of any chip (is CPU ventilation working?).

On the software side, I would like to know whether your messages log shows a proper shutdown happening or whether the logs simply stop at some point.
MezzutOzil

ASKER

Hi blaz,

In the cron log I find that every time before the server crashes, "(root) CMD (/usr/lib64/sa/sa1 -S DISK 1 1)" from "/etc/cron.d/sysstat" was running.

Before that, an IBM engineer checked the hardware and confirmed that all of it was working.
> (root) CMD (/usr/lib64/sa/sa1 -S DISK 1 1)
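Just to be clear about what this command does:
It collects system activity data, including statistics for block devices (disks), and stores it in the daily system activity data file rather than printing it to the screen.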
http://linux.die.net/man/8/sadc

How often does this run? If it is quite often then it is normal that you see this before every reboot.


You still did not answer whether you see a proper shutdown in logs?

Did you check the BIOS wdt as KeremE suggested? Do you have any software wdt installed (like "ipmiutil wdt")?

Are you running out of disk or RAM?

Did you check ALL logs at the time of reboot?
cd /var/log
grep -ri 'Feb 20 05:4' *


> Before that, IBM engineer checked that all hardware was confirmed working.
What is the ambient environment - is there proper ventilation and temperature low enough?
-How often does this run? If it is quite often then it is normal that you see this  before every reboot.

ans: Normally is after rebooting the server. A server reboot is the only way out
        as the server is found hung

-You still did not answer whether you see a proper shutdown in logs?

ans: How to check log for proper shutdown? Which log to look for?

- Did you check the BIOS wdt as KeremE suggested? Do you have any software wdt installed (like "ipmiutil wdt")?

ans: There is no special setting in the BIOS. I have to check with my teammate about the wdt.

- Are you running out of disk or RAM?

ans: no, it shouldn't

- Did you check ALL logs at the time of reboot?
cd /var/log
grep -ri 'Feb 20 05:4' *

ans: I'll get my teammate to check

> Before that, IBM engineer checked that all hardware was confirmed working.
What is the ambient environment - is there proper ventilation and temperature low enough?

ans: ambient temperature and ventilation are confirmed to be fine
>> How often does this run? If it is quite often then it is normal
>> that you see this  before every reboot.

> ans: Normally is after rebooting the server. A server reboot is
> the only way out as the server is found hung

I may have misunderstood something, so please bear with me. Is the machine rebooting itself, or does it hang so that the only way out is a hard reboot?

I may not have explained clearly enough what I meant:
> every time before the server crash "(root) CMD (/usr/lib64/sa/sa1
> -S DISK 1 1)" from "/etc/cron.d/sysstat" was always running.

If I understand correctly, you found this command in the logs as (one of) the last commands logged before the reboot. Is this correct? Which log are you looking at for this information?
My point was that if this command is scheduled to run every 5 or 10 minutes and no other heavy activity is running on the server then it could be normal that this command is the last to be seen.
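
To see how often it is actually scheduled, you can pull the time fields out of the cron file. A minimal sketch, assuming the entry lives in /etc/cron.d/sysstat as in your log line; the helper name sa1_schedule is made up for this example.

```shell
# Print the five cron time fields of every active (non-comment) sa1 entry.
# Defaults to /etc/cron.d/sysstat; pass another path as the first argument.
sa1_schedule() {
    awk '!/^[[:space:]]*#/ && /\/sa1/ { print $1, $2, $3, $4, $5 }' \
        "${1:-/etc/cron.d/sysstat}"
}

# Usage:
#   sa1_schedule
```

If the first field reads something like */10, the command runs every 10 minutes, and it will nearly always be among the last cron entries logged before any reboot.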

> ans: How to check log for proper shutdown? Which log to look for?
In /var/log/messages there should be plenty of information about a shutdown if it was done softly (services stopping etc.). To see exactly which entries appear on your system, check for yourself by issuing the command:
shutdown -r now
All these entries will be missing if a hard shutdown (like the power switch) was performed: there will be some logs of normal operation and then, directly after them, the logs of system startup.
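
A rough way to automate this check is sketched below. It assumes rsyslogd writes an "exiting on signal" line when it is stopped cleanly and a "start" line at every boot, which is what a default rsyslog setup usually does. The function name classify_boots is made up for this example, and the patterns may need adjusting to the entries you actually find on your system.

```shell
# Classify each boot recorded in a syslog file as clean or unclean.
# Assumption: rsyslogd logs "... exiting on signal ..." on a soft shutdown
# and "... ] start" on every startup. A start with no exit line before it
# suggests the machine went down hard (reset, power loss, watchdog).
classify_boots() {
    awk '
        /rsyslogd:.*exiting on signal/ { state = "clean"; next }
        /rsyslogd:.*\] start/ {
            # No exit seen since the previous start: unclean, unless this is
            # the very first line of the file (for example after log rotation).
            if (state == "") state = (NR == 1) ? "unknown" : "unclean"
            print state " boot at " $1, $2, $3
            state = ""
        }
    ' "$1"
}

# Usage:
#   classify_boots /var/log/messages
```

Note that a boot can be misreported as unclean if the matching exit line ended up in an older, rotated log file, so check the rotated files too. The command "last -x" also lists the recorded shutdown and reboot entries and can be used as a cross-check.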
ASKER CERTIFIED SOLUTION
Kerem ERSOY

After replacing the motherboard, there has been no problem.
I actually don't see why you chose (only) the answer you did as the problem was a hardware malfunction as you said. Better suited answers would be #37602050 and possibly #37608538
I don't think it was a malfunction. If it were, there would be a crash and it would not be periodic.