

Strange problem with RedHat 7.3

Posted on 2002-06-28
Medium Priority
Last Modified: 2010-04-20
I have installed RedHat 7.3 Server to replace the old RedHat 6.2. The replacement went well and the new 7.3 worked fine for about a month. Then it started to act strangely.

It crashes randomly. When the system crashes I can ping it, but no other traffic goes through. Apache, sendmail, ssh and bind don't respond. It must be rebooted. The console also acts strangely: no one can log in or see what is wrong.

I have looked at the log files and there is nothing I could relate to a crash. When the system crashes it stops writing anything to the log files.

The system runs Apache with mod_ssl and mod_php, sendmail, and other standard programs included in the RedHat distribution.

The only programs that are somewhat suspicious (not from the RedHat distribution) are an e-store program and RealMedia Server.

When the system crashed for the second time, I took the disks out of the server and put them in another computer to rule out a hardware problem. The first computer had a Duron processor; the second one has an Athlon. It worked for nearly a month and has now crashed twice in three days.

I have installed the newest patches with the up2date utility from RedHat.

The computer has RealNetworks network cards with the "dmfe" driver. These cards work fine with the same driver in many other 6.2 and 7.2 servers.

Where should I look for the error? Should I install RedHat 7.2, which has worked fine on another Intel server for some time without any problems? This is the first time I have used AMD. Is it possible that the problem is in the processor? Maybe I should switch to Intel.

I have searched the internet to see whether anyone else has the same problem. I have found nothing.

Please help.
Question by:marko020397

Accepted Solution

Zoplax earned 1500 total points
ID: 7116235
Maybe your hard drive has developed some bad sectors.

I recently went through a VERY frustrating ordeal, trying to install RedHat on a hard drive which I thought was free of errors.

I literally attempted to install 7.3 about ** 12 ** times, each time trying anything from a full-blown install to minimal install to many things in between.

I finally gave up, and tried installing the latest version of FreeBSD on this machine.  The installation went flawlessly; however, when I tried to run the configuration utility post-setup, FreeBSD informed me that there were some "hard errors" as it tried to load the config utility!!

RedHat's setup NEVER divulged this info to me; it would just go anywhere from 1/2 to 3/4 of the way into the setup and crash with an obscure error message unrelated to a hard drive problem.

Anyway...  Try checking your file system with the "fsck" utility and see if any bad blocks are found; I don't know how to deal with bad blocks when they're found, nor whether RedHat has any kind of built-in way to mark these sectors as "bad" and move the data elsewhere.
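On the "how do I find bad blocks" part: the usual tools are `badblocks` and `e2fsck -c` (which runs `badblocks` and records what it finds in the filesystem's bad-block list). Just to show the idea of a read-only surface scan, here is a rough Python sketch. The function name and chunk size are made up for illustration, it reads through the normal page cache, and it is no substitute for the real tools.

```python
import os

def scan_for_read_errors(path, chunk_size=64 * 1024):
    """Read `path` chunk by chunk and return the byte offsets that failed.

    `path` can be a regular file or (as root) a block device such as a
    disk partition. Nothing is written to it.
    """
    bad_offsets = []
    fd = os.open(path, os.O_RDONLY)
    try:
        # Seeking to the end gives the size for files and block devices.
        size = os.lseek(fd, 0, os.SEEK_END)
        for offset in range(0, size, chunk_size):
            try:
                os.pread(fd, chunk_size, offset)
            except OSError:
                # The kernel reported an I/O error for this chunk.
                bad_offsets.append(offset)
    finally:
        os.close(fd)
    return bad_offsets
```

An empty list only means every chunk could be read once; as this thread shows, a disk can pass that kind of check and still be faulty.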
LVL 40

Expert Comment

ID: 7116267
It could be a disk problem, as Zoplax pointed out, but I'd bet on it being something else.

Since nothing is being logged to files as the box dies, it's likely to be a problem that takes down part of the kernel or its services. Are you running a GUI login? If so turn that off by going to run level 3 or commenting out the prefdm line in inittab. If the kernel is dying it will probably write some error messages to the console, which the GUI login will hide and keep you from seeing. Not running a GUI login will let you see those errors.
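If you want to check a box without opening the file by hand, the inittab test can be sketched in Python. This is a minimal sketch, assuming the stock layout where the default run level comes from the `initdefault` entry and the graphical login is the `prefdm` respawn entry. The function name is made up for illustration.

```python
def inittab_summary(text):
    """Return (default_runlevel, prefdm_active) for the given inittab text."""
    default_runlevel = None
    prefdm_active = False
    for raw in text.splitlines():
        line = raw.strip()
        # Blank lines and commented-out entries are ignored by init.
        if not line or line.startswith("#"):
            continue
        # Entries have the form id:runlevels:action:process
        fields = line.split(":", 3)
        if len(fields) < 4:
            continue
        _ident, runlevels, action, process = fields
        if action == "initdefault":
            default_runlevel = runlevels
        elif "prefdm" in process:
            prefdm_active = True
    return default_runlevel, prefdm_active
```

Calling `inittab_summary(open("/etc/inittab").read())` on a box with no GUI login should give a default run level of "3", a commented-out prefdm line, or both.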

Expert Comment

ID: 7117297
It worked fine for about a month and then began to act strangely?
When you swapped disks did you reinstall or just slot them into place?
Did you change ANYTHING during that time? Software install, hardware, tweaks to your system...
Certainly a console login will let you see any messages as they appear, however they should get written to a logfile if they are making it to the console...
Kernel is still up coz you are getting a response to ping. Are you able to do a soft reboot or do you have to do a hard reboot? (ctrl-alt-delete or big red switch <g>)
What kernel version are you using? Also, more hardware info would be useful (fault diagnosis is a pain, you have to go through a wee tree for everything <g>). It is beginning to sound like either a problem with the disk, except you haven't mentioned any fsck errors, or a software problem, which sounds more likely. Sounds like something is killing userspace progs rather than your kernel dying (you are still getting a ping response so something is still alive <g>). Does your keyboard still respond? (Caps lock key etc)
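Since the box is remote, one way to tell "kernel alive, userspace dead" from a plain network outage is to probe the services from another machine. Here is a minimal Python sketch; the function name and defaults are made up for illustration. It waits for actual reply bytes, because the kernel can complete a TCP handshake on a listening socket even when the daemon behind it is hung, so a bare connect test can look healthy when it isn't.

```python
import socket

def service_replies(host, port, timeout=5.0, request=b""):
    """Return True only if the service at host:port sends back some data.

    Daemons like sshd and sendmail greet the client first, so `request`
    can stay empty. For HTTP, pass a request such as
    b"HEAD / HTTP/1.0\r\n\r\n".
    """
    try:
        with socket.create_connection((host, port), timeout=timeout) as conn:
            if request:
                conn.sendall(request)
            # An empty result means the peer closed without answering.
            return len(conn.recv(1)) > 0
    except OSError:
        # Refused, unreachable, or no reply within the timeout.
        return False
```

If ping answers but `service_replies(host, 22)` and `service_replies(host, 25)` both come back False, the kernel's network stack is up and the daemons are not responding, which matches the symptoms described in the question.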

Author Comment

ID: 7118212
The server is in a distant location. I have been there only twice, when the crashes occurred.

There was nothing written on the console about any error. I could type on the console but I couldn't log in, and I think Ctrl-Alt-Del didn't work either. The kernel version is the one included in the RedHat distribution. I am not sure about the version number.

When I swapped the disks I just slotted them into place.

There is nothing special about the hardware: an Athlon processor, floppy, CD-ROM, two network cards and two disks.

There is no X windows installed on the system. It boots up in runlevel 3.
LVL 40

Expert Comment

ID: 7118920
There have been two kernel updates since 7.3 was released (current is 2.4.18-5). With no other hard clues to suggest a cause I'd suggest updating the system for the latest kernel and Apache, etc.

Expert Comment

ID: 7119523
As a curiosity... did you upgrade the 6.2 box or do a fresh install? I know upgrading sounds easier, but I have found (on the same piece of hardware) that upgrades tend to show more flaky behavior than complete installs. Since this is Linux, save your .conf files and any other bits you can't live without and do a fresh install, re-formatting the / partition. If the box is all one partition, you may want to rebuild it (saving the data to tape or another box), separating / from /home at the very least.

Author Comment

ID: 7120190
I installed RedHat 7.3 on a new machine. Then I transferred the web sites, mailboxes, and so on. When the new machine's functionality was identical, I unplugged the old one and switched in the new one.

I have taken care of software updates and the server has the newest kernel, Apache, etc.
LVL 40

Expert Comment

ID: 7120227
I guess we'll have to wait and see if the problem recurs. I don't know of any problems inherent in 7.3 that would cause your problem, and I do have a number of 7.3 boxes doing DNS, mail, and web that have uptimes much in excess of a month. I've not experienced any problems with those boxes, which I religiously keep up to date.

Since it might matter in this case I'll have to admit that I don't use the Internet server packages that RedHat ships. I always build my own copies of Bind, Sendmail, Cyrus/UoW IMAP, Apache, PHP, & Postgres or MySQL. That's partly because I want to be able to use the current version and partly because I want to customize the build for the environment they serve. So I can't say if the distributed copies of Apache or whatever could be part of the problem.

Author Comment

ID: 7142521
I think I may have found the solution. It is the AMD Athlon/Duron bug. Read about it here:

I have added the "mem=nopentium" parameter to the kernel. Now all I can do is wait and see if the server stays alive.
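A boot parameter only takes effect if the bootloader actually passed it, so it is worth confirming after the reboot. A minimal Python sketch, assuming the running kernel's command line is readable from /proc/cmdline; the function names and the sample command lines in the note are made up for illustration.

```python
def has_kernel_param(cmdline, param):
    """Return True if `param` appears as a whole word on the command line."""
    return param in cmdline.split()

def running_cmdline(path="/proc/cmdline"):
    """Read the command line the running kernel was booted with."""
    with open(path) as f:
        return f.read().strip()
```

For example, `has_kernel_param("ro root=/dev/hda1 mem=nopentium", "mem=nopentium")` is True, and `has_kernel_param(running_cmdline(), "mem=nopentium")` checks the live system.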

By the way. There is nothing wrong with the disk. I have checked it.

Author Comment

ID: 7164132
The computer crashed again. I have now put the disks in a completely new Intel machine and of course installed the Intel kernel instead of the AMD-optimized one. It has now worked for a week. I'll wait and see.

Author Comment

ID: 7195443
The Intel processor didn't help. I had predicted as much, but I was hoping my prediction was wrong.

I have come to the conclusion that the disk is the problem, although it has no bad sectors. I believe it must have some other, non-surface problem.

- ping works because it doesn't need anything written on the disk
- nothing is in log files because disk stopped working
- all other services which need disk access stopped working
- when I tried to log in on the console I could type the username, then everything stopped when the server tried to check the username on disk
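The reasoning in the list above suggests a test: a process that is already running keeps forcing a small write to disk and notices when that write stops completing. Here is a minimal Python sketch of one such timed write. The function name, payload and timeout are made up for illustration, and the monitoring loop and any alerting over the network are left out.

```python
import os
import threading
import time

def timed_sync_write(path, payload=b"heartbeat\n", timeout=10.0):
    """Write `payload` to `path`, fsync it, and return the seconds taken.

    Returns None if the write failed or did not finish within `timeout`,
    which is what a stalled disk would look like from userspace.
    """
    result = {}

    def worker():
        start = time.monotonic()
        try:
            fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
            try:
                os.write(fd, payload)
                os.fsync(fd)  # force the data out to the disk itself
            finally:
                os.close(fd)
        except OSError:
            return  # I/O error: leave the result empty
        result["elapsed"] = time.monotonic() - start

    # Run the write in a helper thread so a hung disk cannot block the caller.
    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    thread.join(timeout)
    return result.get("elapsed")
```

A None result while ping still answers would point at the disk or its controller rather than the network or the CPU.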

The server stopped again and I installed everything on new disks. For now the server works on the new disks. I will be 100% sure I have found the error once it has worked for at least a month.

Author Comment

ID: 7243018
It turned out to be a disk problem, although the disk reports no bad sectors. It must be some other, more hidden disk error.

The server has now run for a month with the new disks; all the other hardware is the same.

Question has a verified solution.
