Solved

Strange problem with Redhat 7.3

Posted on 2002-06-28
Last Modified: 2010-04-20
I installed RedHat 7.3 Server to replace the old RedHat 6.2. The replacement went well, and the new 7.3 worked fine for about a month. Then it started to act strangely.

It crashes randomly. When the system crashes, I can ping it, but no other traffic goes through: Apache, sendmail, ssh and bind don't respond, and the machine has to be rebooted. The console behaves strangely too: nobody can log in or see what is wrong.

I have looked at the log files, and there is nothing I could relate to a crash. When the system crashes, it stops writing anything to the log files.

The system runs Apache with mod_ssl and mod_php, sendmail, and other standard programs included in the RedHat distribution.

The only somewhat suspicious programs (not from the RedHat distribution) are an e-store program and RealMedia Server.

When the system crashed for the second time, I moved the disks from the server into another computer to rule out a hardware problem. The first computer had a Duron processor; the second one has an Athlon. The second machine ran for nearly a month, and now it has crashed twice in three days.

I have installed the newest patches with RedHat's up2date utility.

The computer has RealNetworks network cards with the "dmfe" driver. These cards, with the same driver, work fine in many other 6.2 and 7.2 servers.

Where should I look for the error? Should I install RedHat 7.2, which has been running on another Intel server for some time without any problems? This is the first time I have used AMD. Is it possible that the processor is the problem? Maybe I should switch to Intel.

I have searched the Internet for anyone with the same problem and found nothing.

Please help.
Question by:marko020397

Accepted Solution

by:Zoplax
ID: 7116235
Maybe your hard drive has developed some bad sectors.

I recently went through a VERY frustrating ordeal, trying to install RedHat on a hard drive which I thought was free of errors.

I literally attempted to install 7.3 about ** 12 ** times, each time trying anything from a full-blown install to a minimal install and many things in between.

I finally gave up, and tried installing the latest version of FreeBSD on this machine.  The installation went flawlessly; however, when I tried to run the configuration utility post-setup, FreeBSD informed me that there were some "hard errors" as it tried to load the config utility!!

RedHat's setup NEVER divulged this info to me; it would just go anywhere from 1/2 to 3/4 of the way into the setup and crash with an obscure error message unrelated to a hard drive problem.

Anyway...  Try checking your file system with the "fsck" utility and see if any bad blocks are found; I don't know how to deal with bad blocks when they're found, nor whether RedHat has any kind of built-in way to mark these sectors as "bad" and move the data elsewhere.
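A minimal sketch of that check, assuming an ext2/ext3 partition that is unmounted (or a system booted from rescue media): `badblocks -sv -o bad.txt /dev/hdXN` does a read-only surface scan and writes one bad block number per line to bad.txt, and `e2fsck -c /dev/hdXN` runs the same scan and adds any bad blocks it finds to the filesystem's bad block list so they are not allocated. The device name and file name are placeholders, and the Python 3 helper below is hypothetical (not part of any Red Hat tool); it only summarizes such a list.

```python
def parse_badblocks(text):
    """Return the bad block numbers listed in badblocks output.

    badblocks -o writes one block number per line; anything that is not a
    plain integer (blank lines, stray messages) is skipped.
    """
    blocks = []
    for line in text.splitlines():
        line = line.strip()
        if line.isdigit():
            blocks.append(int(line))
    return blocks


def describe(blocks):
    """Turn the parsed list into a one-line summary."""
    if not blocks:
        return "no bad blocks reported by the surface scan"
    return "%d bad block(s), first at block %d" % (len(blocks), blocks[0])


# Usage (hypothetical file name):
# with open("bad.txt") as f:
#     print(describe(parse_badblocks(f.read())))
```

An empty list only means the surface scan found nothing readable-bad; it says nothing about other kinds of drive faults.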

Expert Comment

by:jlevie
ID: 7116267
It could be a disk problem, as Zoplax pointed out, but I'd bet on it being something else.

Since nothing is being logged to files as the box dies, it's likely to be a problem that takes down part of the kernel or its services. Are you running a GUI login? If so, turn it off by switching to run level 3 or by commenting out the prefdm line in inittab. If the kernel is dying, it will probably write some error messages to the console, which the GUI login will hide. Without a GUI login, you will be able to see those errors.
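To see which case applies without rebooting, look at /etc/inittab: the `initdefault` entry sets the default run level, and the GUI login comes from the entry that runs prefdm. The Python 3 sketch below assumes the standard `id:runlevels:action:process` entry format; the function name is hypothetical.

```python
from typing import List, Optional, Tuple


def inittab_summary(text: str) -> Tuple[Optional[str], bool]:
    """Return (default run level, GUI login active at that run level).

    Entries look like id:runlevels:action:process; '#' starts a comment,
    so a commented-out prefdm line is ignored.
    """
    default = None
    prefdm_levels: List[str] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split(":", 3)
        if len(fields) < 4:
            continue  # not a well-formed entry
        _ident, runlevels, action, process = fields
        if action == "initdefault":
            default = runlevels
        elif "prefdm" in process:
            prefdm_levels.append(runlevels)
    # The GUI login only runs if its entry covers the default run level.
    gui = bool(default) and any(default in levels for levels in prefdm_levels)
    return default, gui


# Usage: print(inittab_summary(open("/etc/inittab").read()))
```

A result like ("3", False) matches what the author later reports: text console at run level 3, no GUI hiding kernel messages.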

Expert Comment

by:hindmost
ID: 7117297
It worked fine for about a month and then began to act strangely?
When you swapped disks did you reinstall or just slot them into place?
Did you change ANYTHING during that time? Software installs, hardware, system tweaks...
Certainly a console login will let you see any messages as they appear; however, they should also get written to a log file if they are making it to the console...
The kernel is still up, because you are getting a response to ping. Are you able to do a soft reboot, or do you have to do a hard reboot? (Ctrl-Alt-Delete or the big red switch <g>)
What kernel version are you using? More hardware info would also be useful (fault diagnosis is a pain; you have to go through a wee tree for everything <g>). It is beginning to sound like either a disk problem (except you haven't mentioned any fsck errors) or, more likely, a software problem. It sounds like something is killing userspace programs rather than your kernel dying (you are still getting a ping response, so something is still alive <g>). Does your keyboard still respond (Caps Lock key, etc.)?

Author Comment

by:marko020397
ID: 7118212
The server is in a remote location. I have only been there twice when a crash occurred.

Nothing was written on the console about any error. I could type on the console, but I couldn't log in, and I think Ctrl-Alt-Del didn't work either. The kernel is the one included in the RedHat distribution; I am not sure of the version number.

When I swapped the disks, I just slotted them into place.

There is nothing special about the hardware: an Athlon processor, a floppy, a CD-ROM, two network cards and two disks.

There is no X Windows installed on the system. It boots into runlevel 3.

Expert Comment

by:jlevie
ID: 7118920
There have been two kernel updates since 7.3 was released (the current one is 2.4.18-5). With no other hard clues to suggest a cause, I'd suggest updating the system to the latest kernel, Apache, etc.
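Since the author isn't sure which kernel is running, `uname -r` prints the running release (a string such as 2.4.18-5, possibly with a suffix like smp). A hypothetical Python 3 helper that compares that string against the 2.4.18-5 release mentioned above might look like this; it only understands the numeric major.minor.patch-build prefix and ignores any suffix.

```python
import re
from typing import Tuple

# Matches the numeric prefix of a release such as "2.4.18-5" or "2.4.18-5smp".
_RELEASE = re.compile(r"(\d+)\.(\d+)\.(\d+)-(\d+)")


def parse_release(release: str) -> Tuple[int, ...]:
    """Turn '2.4.18-5smp' into (2, 4, 18, 5); any suffix is ignored."""
    m = _RELEASE.match(release.strip())
    if m is None:
        raise ValueError("unrecognized kernel release: %r" % release)
    return tuple(int(g) for g in m.groups())


def needs_update(running: str, latest: str = "2.4.18-5") -> bool:
    """True if the running release is older than `latest`."""
    # Tuples compare element by element, so build 10 correctly sorts after 5,
    # which a plain string comparison would get wrong.
    return parse_release(running) < parse_release(latest)


# Usage: pass the output of `uname -r`, e.g. needs_update("2.4.18-3")
```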

Expert Comment

by:mikeyman
ID: 7119523
As a curiosity: did you upgrade the 6.2 box or do a fresh install? I know upgrading sounds easier, but I have found (on the same piece of hardware) that upgrades tend to show flakier behavior than complete installs. Since this is Linux, save your .conf files and any other bits you can't live without, and do a fresh install, reformatting the / partition. If the box is all one partition, you may want to rebuild it (saving the data to tape or another box), separating / from /home at the very least.

Author Comment

by:marko020397
ID: 7120190
I installed RedHat 7.3 on a new machine, then transferred the web sites, mailboxes, etc. When the new machine's functionality was identical, I unplugged the old one and switched in the new one.

I have taken care of software updates, and the server has the newest kernel, Apache, etc.

Expert Comment

by:jlevie
ID: 7120227
I guess we'll have to wait and see if the problem recurs. I don't know of any problems inherent in 7.3 that would cause your problem, and I do have a number of 7.3 boxes doing DNS, mail, and web that have uptimes well in excess of a month. I've not experienced any problems with those boxes, which I religiously keep up to date.

Since it might matter in this case I'll have to admit that I don't use the Internet server packages that RedHat ships. I always build my own copies of Bind, Sendmail, Cyrus/UoW IMAP, Apache, PHP, & Postgres or MySQL. That's partly because I want to be able to use the current version and partly because I want to customize the build for the environment they serve. So I can't say if the distributed copies of Apache or whatever could be part of the problem.

Author Comment

by:marko020397
ID: 7142521
I think I may have found the solution. It is the AMD Athlon/Duron bug. Read about it here:

http://www.bestcomputerbuilders.com/stories.htm

I have added the "mem=nopentium" parameter to the kernel. Now all I can do is wait and see if the server stays alive.

By the way, there is nothing wrong with the disk; I have checked it.
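After adding the parameter (on the kernel line in GRUB's configuration, or in an `append=` line for LILO, depending on the boot loader), it is worth confirming that the running kernel was actually booted with it: /proc/cmdline holds the command line the kernel was started with. A minimal Python 3 sketch with hypothetical function names; it treats each whitespace-separated token as either a bare flag or key=value, and if a key repeats, the last occurrence wins (a simplification).

```python
from typing import Dict, Optional


def kernel_params(cmdline: str) -> Dict[str, Optional[str]]:
    """Split a kernel command line into {key: value}; bare flags map to None."""
    params: Dict[str, Optional[str]] = {}
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        params[key] = value if sep else None  # a repeated key: last one wins
    return params


def has_param(cmdline: str, name: str, value: Optional[str] = None) -> bool:
    """True if `name` is present (and equals `value`, when one is given)."""
    params = kernel_params(cmdline)
    return name in params and (value is None or params[name] == value)


def booted_with_nopentium(path: str = "/proc/cmdline") -> bool:
    """Check the running kernel's command line for mem=nopentium."""
    with open(path) as f:
        return has_param(f.read(), "mem", "nopentium")
```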

Author Comment

by:marko020397
ID: 7164132
The computer crashed again. I have now put the disks in a completely new Intel machine and, of course, installed the Intel kernel instead of the AMD-optimized one. It has been working for a week now. I'll wait and see.

Author Comment

by:marko020397
ID: 7195443
The Intel processor didn't help, as I had predicted, though I was hoping my prediction was wrong.

I have come to the conclusion that the disk is the problem, although it has no bad sectors. I believe it must have some other, non-surface problem.

Indications:
- ping works because it doesn't need anything from the disk
- nothing is in the log files because the disk stopped working
- all the other services, which need disk access, stopped working
- when I tried to log in on the console, I could type the username, but then everything stopped when the server tried to check the username on disk
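One way to test this reading directly (not something tried in the thread) is a heartbeat: a small process that appends a timestamp to a file on the suspect disk every minute and forces it out with fsync. After the next hang and hard reboot, the last line shows roughly when writes stopped reaching the disk, which can be compared with the last syslog entry. A hypothetical Python 3 sketch; the file path and the 60-second interval are assumptions.

```python
import os
import time
from typing import Optional


def write_heartbeat(path: str, now: Optional[float] = None) -> str:
    """Append a timestamp line and flush it to the device before returning."""
    stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(now))
    with open(path, "a") as f:
        f.write(stamp + "\n")
        f.flush()              # push Python's buffer to the kernel
        os.fsync(f.fileno())   # ask the kernel to flush it to the device
    return stamp


def last_heartbeat(path: str) -> Optional[str]:
    """Return the last recorded timestamp, or None if nothing was written."""
    try:
        with open(path) as f:
            lines = [line.strip() for line in f if line.strip()]
    except FileNotFoundError:
        return None
    return lines[-1] if lines else None


def run(path: str, interval: float = 60.0) -> None:
    """Loop forever; if the disk hangs, the write blocks and no new lines appear."""
    while True:
        write_heartbeat(path)
        time.sleep(interval)
```

Usage: start `run("/var/tmp/heartbeat.log")` in the background, with the file on the suspect disk; after a reboot, `last_heartbeat` gives the last write that made it out.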

The server stopped again, and I installed everything on new disks. For now, the server works on the new disks. I will be 100% sure I have found the error once it has run for at least a month.

Author Comment

by:marko020397
ID: 7243018
It turned out to be a disk problem, although the disk reports no bad sectors. It must be some other, more hidden disk fault.

The server has now been running for a month with the new disks; all the other hardware is the same.