Solved

Unusual Disk Read From Primary Disks

Posted on 2003-10-31
Last Modified: 2013-12-15
Basics:

We have a server running Linux Redhat 9.0
Kernel: 2.4.20-20.9bigmem #1

We have an external RAID
and two internal drives in RAID-0 (using the same cable).

All our data and applications are mounted on the external RAID.

The primary drives hold only the root file system. To be more precise, here is the output of df:

Filesystem           1K-blocks      Used Available Use% Mounted on
/dev/md2              34336980   1965536  30627192   7% /
/dev/md0               2063440     65396   1893228   4% /boot
none                   3096472         0   3096472   0% /dev/shm
tiny:/home            75854568  27651104  44350280  39% /ftp
//fax/doze            77199360  50599936  26599424  66% /doze
/dev/Volume00/lvol1  203147960  64257220 128404980  34% /appl
/dev/Volume00/tmp     10485436    113016  10372420   2% /tmp
/dev/Volume00/proedi  10485436    867228   9618208   9% /appl/proedi
sys3:/appl/backups/fast
                      52427200  34969832  17457368  67% /appl/backups/sys3
sys4bc:/appl/backups/fast
                      36298264   3120520  31333888  10% /appl/backups/sys4b


Our usual load average is between 2 and 4. Now and then it peaks at 10, but it usually comes back down within minutes.

A couple of weeks ago we started having a problem. Roughly every 7 days (which is another interesting factor), disk reads from the primary disks increase significantly, causing I/O wait for all the web applications we run off our web server (shell scripts and so on). We have yet to find what exactly is causing these reads, or why it always happens 6-7 days after a reboot. When it happens we have no choice but to reboot the server, since the read rate on dev3-0 and dev3-1 climbs to 200-300 blks/s and everything else hangs.

dev3-0 and dev3-1 are our primary drives. They are IDE drives on the same IDE cable.

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
dev3-0            4.50        15.21        53.90      35564     126016
dev3-1            4.51        16.27        53.90      38032     126016
dev8-0          131.01      1928.25       937.88    4507854    2192568
dev8-1          131.19      2000.76       937.88    4677356    2192568

dev8 is the external raid
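
A minimal sketch for spotting the spike automatically, assuming the `iostat -d` column layout shown above (device name in column 1, Blk_read/s in column 3). The function name `flag_busy_reads` and the threshold of 200 are illustrative choices, not part of the original setup.

```shell
# Hypothetical helper: read `iostat -d` output on stdin and print every
# device whose Blk_read/s (column 3) exceeds the limit given as $1.
# Assumes device names start with "dev", as in the output above.
flag_busy_reads() {
    awk -v limit="$1" '$1 ~ /^dev/ && ($3 + 0) > (limit + 0) { print $1, $3 }'
}

# Example use (not run here): sample every 60 seconds and report busy devices.
#   iostat -d 60 | flag_busy_reads 200
```

Fed the table above, this would report only the dev8 devices, since the primary disks are idle in that sample.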

I hope I have given enough information to visualize the scenario. If not, I am more than willing to provide more. If anyone out there has encountered a problem like this, or knows of something that could cause a situation like this, all comments are greatly appreciated.

cheers

DevZer0

Question by:DevZer0
13 Comments
 

Author Comment

by:DevZer0
ID: 9659299
Here is some more info. We do have a swap partition on our primary drives, but it is set to turn off when the RAID comes to life, since there is swap on the RAID. We can't know for sure whether the primary swap is actually unused, though. To my knowledge, free -m does not show which swap device is being used. That was one theory I came up with, but I have no evidence to confirm it yet.

LVL 12

Expert Comment

by:paullamhkg
ID: 9661754
For swap there is normally an entry (or several) in /etc/fstab showing the partitions used for swap, e.g.

/dev/hda7               swap                    swap    defaults        0 0

This means partition 7 on the primary hard disk is used for swap, so every time you start or restart your Linux distro, the system reads fstab and enables that swap for you.
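
As a rough sketch of that check, the snippet below pulls the swap devices out of fstab-formatted text. The function name `fstab_swap_devices` is made up for illustration, and it assumes the usual layout where the third field is the filesystem type.

```shell
# Hypothetical helper: read fstab-formatted text on stdin and print the
# device of every uncommented entry whose type field (column 3) is "swap".
fstab_swap_devices() {
    awk '$1 !~ /^#/ && $3 == "swap" { print $1 }'
}

# Example use (not run here):
#   fstab_swap_devices < /etc/fstab
```

Note that fstab only says what is enabled at boot. Swap that is turned off later by a script would still be listed here.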
LVL 12

Assisted Solution

by:paullamhkg
paullamhkg earned 250 total points
ID: 9661785
There may be many reasons why your hard disk I/O increases. If someone is scanning your web server looking for a way in, that could increase your I/O: scanners probe the server in many ways, which keeps the system busy, and your system is installed on dev3-0 and dev3-1.

Your application requesting system resources and generating lots of system I/O may also be the reason. Check your web application to make sure that's not the case.

Until you find the cause, I suggest restarting the web service each day or twice per week.

I do not mean restarting/rebooting the server. I mean restarting the web/http service, as below:

/usr/bin/apachectl stop      <--- which will stop the web/http service
/usr/bin/apachectl start      <--- which start up the web/http service without the ssl
or
/usr/bin/apachectl startssl   <--- which start up the web/http service with ssl

I have an application running on our web server that creates lots of dead child processes and hangs my web server every 10-20 days. My programmer is still looking for the problem, but restarting the http service kills all the dead children and keeps my web server running. I know this is not a solution, but at least I can keep my server alive.

I don't know whether restarting the service will help you, but at least you can try.
LVL 40

Expert Comment

by:jlevie
ID: 9662821
If this is happening every 7 days I'd be inclined to look for a cronjob doing something at weekly intervals. In a standard RH 9 configuration only the rpm log is rotated weekly (see /etc/logrotate.d/rpm), but you could have configured something else to rotate logs (like web server logs) on a weekly basis.
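
One possible way to take that inventory is sketched below. It assumes the standard Red Hat locations `/etc/cron.weekly`, `/etc/logrotate.conf`, and `/etc/logrotate.d/`. The function name `list_weekly_jobs` and its optional root-prefix argument are illustrative only.

```shell
# Hypothetical helper: list scripts in cron.weekly and every logrotate
# config file that contains a "weekly" directive. The optional $1 is a
# path prefix (empty by default), so a copied tree can be inspected too.
list_weekly_jobs() {
    lwj_root="${1:-}"
    ls -1 "$lwj_root/etc/cron.weekly" 2>/dev/null
    grep -l '^[[:space:]]*weekly' "$lwj_root"/etc/logrotate.conf \
        "$lwj_root"/etc/logrotate.d/* 2>/dev/null
    return 0
}

# Example use (not run here):
#   list_weekly_jobs
```

This does not cover per-user crontabs or entries in /etc/crontab itself, which would need a separate look.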

Author Comment

by:DevZer0
ID: 9663451
Hi, thanks for the comments, much appreciated. I checked the weekly cron jobs, and we do have some, but they have nothing to do with the primary drives. Apache is also on our external RAID. Pretty much all that's on the root file system is:

/var      /mnt
/etc      /opt
/usr     /root
/bin     /sbin
/boot     /snap
/initrd    
/lib
/proc
/lost+found


I know for a fact that /var has a whole bunch of logs (FTP logs, telnet logs, and other error logs), but the Apache logs and our application logs are all on the external RAID.

LVL 40

Expert Comment

by:jlevie
ID: 9663913
Do the log dates in /var/log match up with when the disk load occurs? How about the Apache logs?

Note that activity on the system devices could be a result of the use of swap space or temp areas, even though Apache, DB's, whatever, are located on the external RAID. You can check to see what devices are actually being used for swap with 'swapon -s'.
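
A small sketch of that check: filter the `swapon -s` listing for any swap area that does not live under an expected device prefix. The function name `swap_outside` is hypothetical, and the prefix is whatever path your RAID volumes share.

```shell
# Hypothetical helper: read `swapon -s` output on stdin and print every
# swap device whose path does not begin with the prefix given as $1.
# The "Filename ..." header line, if present, is skipped.
swap_outside() {
    awk -v prefix="$1" '$1 != "Filename" && NF > 0 && index($1, prefix) != 1 { print $1 }'
}

# Example use (not run here), with the RAID volume group as the prefix:
#   swapon -s | swap_outside /dev/Volume00/
```

Empty output would mean all active swap is on the RAID. Any line printed is swap living somewhere else.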

Author Comment

by:DevZer0
ID: 9671715
The Apache logs are also on the external RAID, and the problem is an increase in primary disk read activity, not writes. So even a small log being written to /var/log (not the Apache logs or other application logs, because those are written to the RAID) wouldn't cause an increase in disk reads. Then again, the two primary disks are mirrored, so the only cause I can think of is that something is actually trying to read data...

LVL 40

Expert Comment

by:jlevie
ID: 9671966
Well, /tmp and swap are on the primary disks and anything that used those resources would cause disk activity on the primaries, even though the application and its data resides on the external RAID. The key to solving this is going to be in finding a correlation between the high disk I/O on the primaries and something that happens every 7 days. Once you know what triggers the I/O it then becomes a matter of figuring out why it beats on the primary disks.
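
To look for that correlation, one option is to keep a timestamped history of disk activity and of processes stuck waiting on I/O, then compare it against the time the problem starts. The sketch below is only an illustration: the function names and the log path in the comments are hypothetical, and it assumes `ps -eo stat,pid,comm` is available.

```shell
# Prefix every line read from stdin with the current date and time.
stamp_lines() {
    while IFS= read -r line; do
        printf '%s %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$line"
    done
}

# From `ps -eo stat,pid,comm` output on stdin, keep only processes in
# state D (uninterruptible sleep, usually waiting on disk I/O) and
# print their PID and command name.
blocked_procs() {
    awk '$1 ~ /^D/ { print $2, $3 }'
}

# Example use (not run here; the log path is an assumption):
#   iostat -d 300 | stamp_lines >> /appl/iostat-history.log
#   ps -eo stat,pid,comm | blocked_procs | stamp_lines >> /appl/blocked.log
```

Writing the logs to the external RAID rather than the root file system keeps the monitoring itself from adding load to the primary disks.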

Author Comment

by:DevZer0
ID: 9672109
/tmp and swap are not on the primary disks.
LVL 40

Expert Comment

by:jlevie
ID: 9672775
Oops I missed that /tmp is on the RAID. Have you verified that all of swap isn't on a primary with 'swapon -s'? Could something be using /var/tmp?

Author Comment

by:DevZer0
ID: 9674288
----- swapon -s results -----

/dev/Volume00/swap1             partition 2097144 29292 -2
/dev/Volume00/swap2             partition 2097144 0 -3
/dev/Volume00/swap3             partition 2097144 0 -4
/dev/Volume00/swap4             partition 2097144 0 -5
/dev/Volume00/swap5             partition 2097144 0 -6
/dev/Volume00/swap6             partition 2097144 0 -7


It looks like only the RAID swap areas are active, and /var/tmp is not used by anything. In fact, it's empty as well...

any other ideas ?

Author Comment

by:DevZer0
ID: 9674303
We switched the cable used by the primary disks over the weekend. Both primary drives were sharing the same cable, and they are mirrored as well, so our most reasonable theory was that sharing one IDE controller and cable could increase I/O time while a mirror operation is in progress. We have yet to see the results, but so far no problems have been noticed...

LVL 40

Accepted Solution

by:
jlevie earned 250 total points
ID: 9675482
I assume you are using RAID 1 (mirroring). Having each disk as a master on a different IDE controller is a good thing, reliability-wise. Using two controllers is also good with respect to disk I/O rates, and that should help. But it still doesn't explain what's happening or why. I think you are going to have to look carefully at what happens over a seven-day period to find the cause.
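
One thing that could be checked when the load appears, assuming the primaries are Linux software RAID (the md devices in the df output suggest so), is whether the kernel reports a mirror resync or recovery in `/proc/mdstat`. This is a sketch of one check, not a diagnosis. The function name `md_rebuilding` is hypothetical.

```shell
# Hypothetical helper: read /proc/mdstat-formatted text on stdin and print
# the name of each md array that has a resync or recovery line under it.
md_rebuilding() {
    awk '/^md[0-9]+ :/ { name = $1 } /resync|recovery/ { print name }'
}

# Example use (not run here):
#   md_rebuilding < /proc/mdstat
```

Empty output while the read load is high would suggest the reads come from something other than a mirror rebuild.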