DevZer0
asked on
Unusual Disk Read From Primary Disks
Basics:
We have a server running Red Hat Linux 9.0
Kernel: 2.4.20-20.9bigmem #1
External RAID
and two internal drives in RAID-0 (sharing the same cable)
All our data and applications are mounted on the external RAID.
The primary drives are for the root file system only. To be more precise, here is the output of df:
Filesystem                 1K-blocks      Used  Available  Use%  Mounted on
/dev/md2                    34336980   1965536   30627192    7%  /
/dev/md0                     2063440     65396    1893228    4%  /boot
none                         3096472         0    3096472    0%  /dev/shm
tiny:/home                  75854568  27651104   44350280   39%  /ftp
//fax/doze                  77199360  50599936   26599424   66%  /doze
/dev/Volume00/lvol1        203147960  64257220  128404980   34%  /appl
/dev/Volume00/tmp           10485436    113016   10372420    2%  /tmp
/dev/Volume00/proedi        10485436    867228    9618208    9%  /appl/proedi
sys3:/appl/backups/fast     52427200  34969832   17457368   67%  /appl/backups/sys3
sys4bc:/appl/backups/fast   36298264   3120520   31333888   10%  /appl/backups/sys4b
Our usual load average is between 2 and 4; now and then it peaks at 10 but usually comes down within minutes.
A couple of weeks ago we started having a small problem. It seems that every 7 days (which is another interesting factor) the disk reads from the primary disks increase significantly, causing I/O wait for all the other web applications we run off our web server (shell scripts and such). We have yet to find what exactly is causing all these reads and why it always happens 6-7 days after a reboot. When this happens we have no choice but to reboot the server, since the read rate on dev3-0 and dev3-1 goes up to 200 to 300 blks/s, causing everything else to hang.
dev3-0 and dev3-1 are our primary drives; they are IDE and share the same IDE cable.
Device:        tps  Blk_read/s  Blk_wrtn/s  Blk_read  Blk_wrtn
dev3-0        4.50       15.21       53.90     35564    126016
dev3-1        4.51       16.27       53.90     38032    126016
dev8-0      131.01     1928.25      937.88   4507854   2192568
dev8-1      131.19     2000.76      937.88   4677356   2192568
dev8 is the external raid
Well, I hope I have given enough information to visualize the scenario; if not, I am more than willing to provide extra information. If anyone out there has encountered a problem like this or knows something that could cause a situation like this, all comments are greatly appreciated.
cheers
DevZer0
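A rough way to catch the moment the primaries get busy is to run the iostat device report through a small filter. The sketch below is only an illustration: the function name `busy_devices`, the `prefix` argument, and the default threshold of 200 blocks/s (taken from the figure quoted in the question) are assumptions, and it expects the six-column layout shown in the output above.

```python
def busy_devices(iostat_text, read_threshold=200.0, prefix=""):
    """Return {device: Blk_read/s} for devices at or above read_threshold.

    Only devices whose name starts with `prefix` are considered
    (an empty prefix means every device).
    """
    busy = {}
    for line in iostat_text.splitlines():
        fields = line.split()
        # Data rows have six columns; skip the header and anything else.
        if len(fields) != 6 or fields[0].startswith("Device"):
            continue
        if not fields[0].startswith(prefix):
            continue
        try:
            reads_per_s = float(fields[2])  # third column is Blk_read/s
        except ValueError:
            continue
        if reads_per_s >= read_threshold:
            busy[fields[0]] = reads_per_s
    return busy


# The device report quoted in the question.
SAMPLE = """\
Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
dev3-0 4.50 15.21 53.90 35564 126016
dev3-1 4.51 16.27 53.90 38032 126016
dev8-0 131.01 1928.25 937.88 4507854 2192568
dev8-1 131.19 2000.76 937.88 4677356 2192568
"""

if __name__ == "__main__":
    # With the normal-load sample, no primary (dev3-*) device is flagged.
    print(busy_devices(SAMPLE, prefix="dev3-"))
```

Fed with periodic iostat output and a timestamp, something like this could log when the dev3-* devices first cross the threshold, which is the data point the rest of the thread is trying to pin down.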
For swap there is normally an entry (or entries) in the fstab showing the partitions used for swap, e.g.
/dev/hda7 swap swap defaults 0 0
This means partition 7 of the primary HDD is used for swap, so every time you start or restart your Linux distro, the system reads the fstab and enables the swap for you.
If this is happening every 7 days I'd be inclined to look for a cronjob doing something at weekly intervals. In a standard RH 9 configuration only the rpm log is rotated weekly (see /etc/logrotate.d/rpm), but you could have configured something else to rotate logs (like web server logs) on a weekly basis.
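To see which rotation configs ask for a weekly schedule, a directory scan along these lines could help. It is a minimal sketch: the function name `weekly_logrotate_configs` is made up for illustration, and it only looks for a line consisting of the `weekly` directive in each file. A frequency set globally in /etc/logrotate.conf also applies to configs that do not name one, so that file is worth reading too.

```python
import os


def weekly_logrotate_configs(conf_dir="/etc/logrotate.d"):
    """Return sorted names of files in conf_dir that contain a 'weekly' line."""
    matches = []
    if not os.path.isdir(conf_dir):
        return matches
    for name in sorted(os.listdir(conf_dir)):
        path = os.path.join(conf_dir, name)
        if not os.path.isfile(path):
            continue
        try:
            with open(path, errors="replace") as handle:
                for line in handle:
                    # Ignore comments and surrounding whitespace.
                    directive = line.split("#", 1)[0].strip()
                    if directive == "weekly":
                        matches.append(name)
                        break
        except OSError:
            # Unreadable file: skip it rather than abort the scan.
            continue
    return matches


if __name__ == "__main__":
    for name in weekly_logrotate_configs():
        print(name)
```

The same idea applies to /etc/cron.weekly: listing what is there and where each script reads from narrows down whether anything touches the root file system.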
ASKER
Hi, thanks for the comments, much appreciated. I checked the weekly cron and we do have weekly cron jobs, but they have nothing to do with the primary drives. Apache is also on our external RAID. Pretty much what's on the root file system is:
/var /mnt
/etc /opt
/usr /root
/bin /sbin
/boot /snap
/initrd
/lib
/proc
/lost+found
I know for a fact that /var has a whole bunch of logs, such as FTP logs, telnet logs and other error logs, but the Apache logs and our application logs are all on the external RAID.
Do the log dates in /var/log match up with when the disk load occurs? How about the Apache logs?
Note that activity on the system devices could be a result of the use of swap space or temp areas, even though Apache, DB's, whatever, are located on the external RAID. You can check to see what devices are actually being used for swap with 'swapon -s'.
ASKER
The Apache logs are also on the external RAID, and the problem is an increase in primary disk read activity, not writes. So even a small log being written to /var/log (not the Apache logs or other application logs, because those are written to the RAID) wouldn't cause an increase in disk reads. Then again, the two primary disks are mirrored, and that is the only reason I can think of for something actually reading from them...
Well, /tmp and swap are on the primary disks and anything that used those resources would cause disk activity on the primaries, even though the application and its data resides on the external RAID. The key to solving this is going to be in finding a correlation between the high disk I/O on the primaries and something that happens every 7 days. Once you know what triggers the I/O it then becomes a matter of figuring out why it beats on the primary disks.
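One way to look for that correlation is to record the read rate on the primaries at intervals and then check how far apart the spikes start. The sketch below assumes such samples already exist as (timestamp, blocks read per second) pairs; the function names, the 200 blocks/s threshold, and the sample values are all invented for illustration and are not measurements from this server.

```python
from datetime import datetime


def spike_onsets(samples, threshold=200.0):
    """Return the timestamps at which the rate first reaches the threshold.

    samples: (datetime, blk_read_per_s) pairs in time order. Consecutive
    samples above the threshold count as one spike.
    """
    onsets = []
    above = False
    for when, rate in samples:
        if rate >= threshold and not above:
            onsets.append(when)
        above = rate >= threshold
    return onsets


def days_between(onsets):
    """Return the gaps, in days, between consecutive spike onsets."""
    return [(b - a).total_seconds() / 86400.0
            for a, b in zip(onsets, onsets[1:])]


# Made-up samples, only to show the shape of the data.
SAMPLES = [
    (datetime(2000, 1, 1, 3, 0), 15.0),
    (datetime(2000, 1, 4, 4, 0), 250.0),
    (datetime(2000, 1, 4, 5, 0), 280.0),
    (datetime(2000, 1, 5, 0, 0), 16.0),
    (datetime(2000, 1, 11, 4, 0), 300.0),
]

if __name__ == "__main__":
    print(days_between(spike_onsets(SAMPLES)))
```

If the gaps line up with calendar days and times, a scheduled job is the likelier trigger; if they line up with time since boot instead, the cause is more likely something that accumulates while the machine is up.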
ASKER
/tmp and swap are not on the primary disks.
Oops, I missed that /tmp is on the RAID. Have you verified with 'swapon -s' that none of the swap is on a primary? Could something be using /var/tmp?
ASKER
----- swapon -s results -----
/dev/Volume00/swap1   partition    2097144   29292   -2
/dev/Volume00/swap2   partition    2097144       0   -3
/dev/Volume00/swap3   partition    2097144       0   -4
/dev/Volume00/swap4   partition    2097144       0   -5
/dev/Volume00/swap5   partition    2097144       0   -6
/dev/Volume00/swap6   partition    2097144       0   -7
Looks like only the RAID swaps are active, and /var/tmp is not used by anything; in fact it's empty as well...
Any other ideas?
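The same check can be done mechanically. This is a small sketch that reads `swapon -s` style rows (filename, type, size, used, priority) and reports any swap area sitting on a primary device. The function names are hypothetical, and the prefixes `/dev/hd` and `/dev/md` are an assumption based on the IDE drives and the md devices in the df listing.

```python
def swap_in_use(swapon_text):
    """Return {device: used_kb} parsed from `swapon -s` style rows."""
    used = {}
    for line in swapon_text.splitlines():
        fields = line.split()
        # Expect: filename, type, size, used, priority. Skip headers etc.
        if len(fields) != 5 or not fields[0].startswith("/"):
            continue
        used[fields[0]] = int(fields[3])
    return used


def on_primary(devices, primary_prefixes=("/dev/hd", "/dev/md")):
    """Return the devices whose path starts with one of the prefixes."""
    return [dev for dev in devices if dev.startswith(primary_prefixes)]


# The rows posted above.
SWAPON_OUTPUT = """\
/dev/Volume00/swap1 partition 2097144 29292 -2
/dev/Volume00/swap2 partition 2097144 0 -3
/dev/Volume00/swap3 partition 2097144 0 -4
/dev/Volume00/swap4 partition 2097144 0 -5
/dev/Volume00/swap5 partition 2097144 0 -6
/dev/Volume00/swap6 partition 2097144 0 -7
"""

if __name__ == "__main__":
    areas = swap_in_use(SWAPON_OUTPUT)
    print(on_primary(areas))  # empty list: no swap on the primaries
```

With the posted rows, only swap1 shows any usage and nothing falls under the primary prefixes, which agrees with the reading above that swap is not the source of the reads.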
ASKER
We switched the cable used by the primary disks over this weekend. Both primary drives were sharing the same IDE controller and cable, and they are mirrored as well, so our most reasonable theory was that I/O time could increase while a mirror operation is in progress. We have yet to see the results, but so far no problems have been noticed.
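To test the mirror theory the next time the reads climb, it may help to check whether an md array is resyncing at that moment. The sketch below scans /proc/mdstat style text for a resync or recovery progress line. The function name is hypothetical, the sample text is invented rather than taken from this server, and the exact layout of /proc/mdstat can differ between kernel versions, so treat the pattern as an assumption.

```python
import re


def resyncing_arrays(mdstat_text):
    """Return {array: percent_done} for arrays showing resync or recovery."""
    active = {}
    current = None
    for line in mdstat_text.splitlines():
        header = re.match(r"^(md\d+)\s*:", line)
        if header:
            current = header.group(1)  # remember which array we are in
            continue
        progress = re.search(r"(resync|recovery)\s*=\s*([\d.]+)%", line)
        if progress and current:
            active[current] = float(progress.group(2))
    return active


# Invented example of what a resync in progress can look like.
MDSTAT_SAMPLE = """\
Personalities : [raid1]
md0 : active raid1 hdb1[1] hda1[0]
      2063440 blocks [2/2] [UU]

md2 : active raid1 hdb3[1] hda3[0]
      34336980 blocks [2/2] [UU]
      [==>..................]  resync = 12.5% (4292122/34336980) finish=25.1min speed=19900K/sec

unused devices: <none>
"""

if __name__ == "__main__":
    print(resyncing_arrays(MDSTAT_SAMPLE))
```

On the real machine the text would come from reading /proc/mdstat. An empty result during a read spike would point away from the mirror and back toward whatever process is doing the reading.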