MSJoe
asked on
Linux server swap memory problem
I have a problem with my web server. My drive statistics and swap details are shown below. During web traffic spikes at a certain time of day, the server becomes unresponsive and Tomcat and Apache must be restarted. I logged in and ran the top command several times over a few days: free physical memory is below 100MB, and swap shows 2GB total but usage is always 144KB. Swap is also reported as being on /dev/cciss/c0d0p3, yet that partition does not appear in the output of df -h.
So I'm a little confused about whether these are partitions or individual drives, and whether something else is going on here. To explain a bit better: I changed directory to / and then to /dev/cciss/ and ran ls. It listed c0d0p1 through c0d0p7, but when I tried to cd into any of them it reported that it was not a directory! What the heck!
Regardless, my assumption is that c0d0p3 doesn't exist and swap is failing to be accessed, which is why my web server is crashing! This is a huge problem, and my investigation has left me confused. I'm not new to Linux, but I'm a bit confused looking at this through SSH. I don't have physical access to the machine.
Help!
[root@server ~]# df -h
Filesystem Size Used Avail Use% Mounted on
/dev/cciss/c0d0p6 989M 236M 703M 26% /
/dev/cciss/c0d0p1 99M 12M 83M 12% /boot
none 1.5G 0 1.5G 0% /dev/shm
/dev/cciss/c0d0p5 989M 18M 921M 2% /tmp
/dev/cciss/c0d0p2 2.9G 916M 1.9G 33% /usr
/dev/cciss/c0d0p7 127G 6.1G 115G 6% /var
[root@server ~]# swapon -s
Filename Type Size Used Priority
/dev/cciss/c0d0p3 partition 2032212 144 -1
From your listing it seems that swap is underused. I think this is a red herring!
ASKER
OK, but that just tells me what I already know. I know its usage, but I feel that this is an error. And what is this red herring you speak of?
I got the impression you thought swap not being listed in df was a problem.
Swap is not a mountable file system, so it wouldn't show up in df.
Use fdisk -l /dev/cciss/c0d0 to list the partitions; swap should be type 82.
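As a sketch of that check, assuming bash and a POSIX awk on the server (list_swap_partitions is a hypothetical helper name, not an existing tool), the function below reads fdisk -l output and prints the partitions whose Id is 82. Comparing the result with /proc/swaps, which lists the swap areas the kernel is actually using, confirms that the partition exists and is active.

```shell
#!/usr/bin/env bash
# Print the partitions that an `fdisk -l` listing marks as Linux swap (Id 82).
# Reads fdisk output on stdin, so it can also be run against saved output.
list_swap_partitions() {
  awk '
    $1 ~ /^\/dev\// {
      # A "*" in column 2 is the boot flag, which shifts the Id column right by one.
      id = ($2 == "*") ? $6 : $5
      if (id == "82") print $1
    }'
}

# Usage on the live system (fdisk needs root):
#   fdisk -l /dev/cciss/c0d0 | list_swap_partitions
#   cat /proc/swaps     # swap areas the kernel is currently using
```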
ASKER
[root@server ~]# fdisk -l /dev/cciss/c0d0
Disk /dev/cciss/c0d0: 145.6 GB, 145659002880 bytes
255 heads, 63 sectors/track, 17708 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Device Boot Start End Blocks Id System
/dev/cciss/c0d0p1 * 1 13 104391 83 Linux
/dev/cciss/c0d0p2 14 396 3076447+ 83 Linux
/dev/cciss/c0d0p3 397 649 2032222+ 82 Linux swap
/dev/cciss/c0d0p4 650 17708 137026417+ 5 Extended
/dev/cciss/c0d0p5 650 777 1028128+ 83 Linux
/dev/cciss/c0d0p6 778 905 1028128+ 83 Linux
/dev/cciss/c0d0p7 906 17708 134970066 83 Linux
OK, so it's there. From running swapon -s I see it is 2GB, so why is usage always at 144KB, and what can I do about it? Also, what is this red herring? I have the feeling I'm not getting a very bad joke, which might make me smile.
[root@server ~]# swapon -s
Filename Type Size Used Priority
/dev/cciss/c0d0p3 partition 2032212 144 -1
ASKER
Never mind the red herring.
A red herring is something that draws attention away from the central issue.
It's at 144KB because it's not being used (you don't actually want the system to use swap unless it has to).
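To watch this directly, a minimal sketch, assuming the standard Linux /proc/meminfo layout (swap_used_kb is a hypothetical name), computes swap in use as SwapTotal minus SwapFree. On a server that is not short of memory this value stays small and flat, in line with the Used column of swapon -s.

```shell
#!/usr/bin/env bash
# Report swap in use (kB) from /proc/meminfo-style text on stdin:
#   used = SwapTotal - SwapFree
# The kernel only moves pages out to swap under memory pressure,
# so a small, constant value is normal.
swap_used_kb() {
  awk '
    $1 == "SwapTotal:" { total = $2 }
    $1 == "SwapFree:"  { free  = $2 }
    END { print total - free }'
}

# Usage on the live system:
#   swap_used_kb < /proc/meminfo
#   cat /proc/sys/vm/swappiness    # how eagerly the kernel swaps; higher = sooner
```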
The red herring? I think you're looking in the wrong area; I don't think your problem is related to swap.
You need to look in the Apache and Tomcat log files, and perhaps use apachetop during the problem periods, to find out what's wrong.
ASKER
Well, free memory is very, very low during the problem. It happens every day when we send out a mass email, and we think the image server is being overloaded with requests and the lack of memory is crashing the server.
Regardless of whether I am wrong or not, since the system reports 2GB of swap, it should start using some swap before free physical RAM hits zero, correct? I'm just confused that it is always at 144KB. I would expect it to start using swap when free memory gets down to around 50MB, right?
Anyway, when the problem occurs we have to restart Apache, so I guess you are likely correct that this is a red herring. However, the apachetop command doesn't exist?
Maybe, maybe not. The system will page out to swap if it needs to; if it does that a lot it will start thrashing, but I haven't seen that for several years, especially on a server with 2GB of memory.
If you have sar running, it collects data on your server; running sar with various options will give you system statistics.
You can install apachetop.
Also look at vmstat and iostat. I believe sar and iostat are part of the sysstat package (vmstat comes from procps); apachetop is its own package.
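These tools only help if they are running while the problem happens. A minimal sketch, assuming bash and awk (flag_swap_activity is a hypothetical name), watches vmstat output and prints only the samples where pages are moving to or from swap (si or so non-zero), which is the signal that memory is genuinely short rather than just full of cache.

```shell
#!/usr/bin/env bash
# Flag vmstat samples that show swap activity.
# Assumes the standard vmstat column order: r b swpd free buff cache si so ...
# so si and so are columns 7 and 8. Header lines are skipped because their
# first field is not a number.
flag_swap_activity() {
  awk '
    $1 ~ /^[0-9]+$/ && ($7 > 0 || $8 > 0) {
      printf "swap activity: si=%s so=%s free=%s\n", $7, $8, $4
    }'
}

# Usage during the daily email blast (one sample every 5 s for an hour):
#   vmstat 5 720 | flag_swap_activity
```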
ASKER
Well, vmstat is installed, but it doesn't really give me anything helpful.
[root@localhost root]# vmstat
procs memory swap io system cpu
r b swpd free buff cache si so bi bo in cs us sy id wa
0 0 0 181032 207964 752200 0 0 0 1 2 0 0 0 2 3
[root@localhost root]# iostat
-bash: iostat: command not found
[root@localhost root]#
What does apachetop do exactly? Is http://www.webta.org/projects/apachetop/ what you are referring to? I'm just not sure how to troubleshoot this issue with Apache if it isn't a memory problem. I was under the impression this was a swap problem, which it appears it is not.
Since I was wrong, let me repaint the picture so you can point me in the right direction. There are two websites: one is the actual website, the other serves images. The images website is what appears to be crashing, so maybe Apache is the problem. But if Apache became unresponsive and had to be restarted, wouldn't that take down the actual website too, not just the images website?
FYI, you can give vmstat an interval and a count; for example, vmstat 5 5 takes five samples 5 seconds apart:
# vmstat 5 5
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
r b swpd free buff cache si so bi bo in cs us sy id wa st
1 0 108 61020 254300 333432 0 0 106 9 218 146 1 0 99 0 0
0 0 108 61020 254308 333432 0 0 0 20 1233 131 0 0 100 0 0
0 0 108 61020 254308 333432 0 0 0 5 1231 152 0 0 100 0 0
0 0 108 61020 254308 333432 0 0 0 12 1233 126 0 0 100 0 0
0 0 108 61020 254308 333432 0 0 0 0 1230 127 0 0 100 0 0
(my system isn't doing much)
# vmstat 5 5
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
r b swpd free buff cache si so bi bo in cs us sy id wa st
1 1 108 58116 256888 333428 0 0 106 9 218 146 1 0 99 0 0
0 1 108 52064 262460 333408 0 0 1098 234 1470 941 1 5 0 94 0
1 1 108 45432 269160 333404 0 0 1330 276 1531 1124 1 5 0 94 0
0 1 108 40540 273804 333416 0 0 919 529 1435 856 10 5 1 85 0
0 1 108 32976 281408 333420 0 0 1512 458 1578 1262 1 5 0 94 0
(with find / running in background)
sar may also help you (a little):
# sar -r
09:40:01 AM kbmemfree kbmemused %memused kbbuffers kbcached kbswpfree kbswpused %swpused kbswpcad
09:50:01 AM 66640 968372 93.56 331900 288340 2097036 108 0.01 0
10:00:01 AM 62736 972276 93.94 332248 288376 2097036 108 0.01 0
10:10:01 AM 66240 968772 93.60 332532 288380 2097036 108 0.01 0
10:20:01 AM 64380 970632 93.78 332780 288376 2097036 108 0.01 0
10:30:01 AM 65840 969172 93.64 333040 288404 2097036 108 0.01 0
10:40:01 AM 68876 966136 93.35 303552 286264 2097036 108 0.01 0
10:50:01 AM 65020 969992 93.72 304156 287332 2097036 108 0.01 0
11:00:01 AM 62796 972216 93.93 305024 287556 2097036 108 0.01 0
11:10:01 AM 62300 972712 93.98 306008 287536 2097036 108 0.01 0
11:20:01 AM 60564 974448 94.15 306892 287596 2097036 108 0.01 0
11:30:01 AM 57340 977672 94.46 307904 287664 2097036 108 0.01 0
11:40:01 AM 68488 966524 93.38 298736 287448 2097036 108 0.01 0
11:50:01 AM 27132 1007880 97.38 300224 329828 2097036 108 0.01 0
12:00:01 PM 25748 1009264 97.51 301000 329948 2097036 108 0.01 0
12:10:01 PM 25456 1009556 97.54 301728 329972 2097036 108 0.01 0
12:20:01 PM 25012 1010000 97.58 302244 329992 2097036 108 0.01 0
12:30:01 PM 23876 1011136 97.69 302696 330020 2097036 108 0.01 0
12:40:01 PM 66732 968280 93.55 263456 326128 2097036 108 0.01 0
12:50:01 PM 66568 968444 93.57 263724 326132 2097036 108 0.01 0
01:00:01 PM 66188 968824 93.61 264132 326128 2097036 108 0.01 0
01:10:01 PM 64960 970052 93.72 264472 326132 2097036 108 0.01 0
01:20:01 PM 65616 969396 93.66 264776 326140 2097036 108 0.01 0
01:30:01 PM 65244 969768 93.70 265040 326156 2097036 108 0.01 0
01:40:01 PM 67972 967040 93.43 262616 325808 2097036 108 0.01 0
01:50:01 PM 61616 973396 94.05 263152 326888 2097036 108 0.01 0
02:00:01 PM 61368 973644 94.07 263468 326920 2097036 108 0.01 0
02:10:01 PM 60480 974532 94.16 263860 326976 2097036 108 0.01 0
02:20:01 PM 59392 975620 94.26 264612 327160 2097036 108 0.01 0
02:30:01 PM 59552 975460 94.25 264940 327172 2097036 108 0.01 0
02:40:01 PM 66840 968172 93.54 257136 326328 2097036 108 0.01 0
02:50:01 PM 66988 968024 93.53 257612 326356 2097036 108 0.01 0
03:00:01 PM 67000 968012 93.53 258160 326444 2097036 108 0.01 0
03:10:01 PM 66712 968300 93.55 258612 326464 2097036 108 0.01 0
03:20:01 PM 66064 968948 93.62 259048 326488 2097036 108 0.01 0
03:30:01 PM 63388 971624 93.88 259560 326512 2097036 108 0.01 0
03:40:01 PM 70528 964484 93.19 254472 325980 2097036 108 0.01 0
03:50:01 PM 68076 966936 93.42 255156 326080 2097036 108 0.01 0
04:00:01 PM 67580 967432 93.47 255656 326100 2097036 108 0.01 0
04:10:01 PM 67208 967804 93.51 256244 326136 2097036 108 0.01 0
04:20:01 PM 68644 966368 93.37 256612 326140 2097036 108 0.01 0
04:30:01 PM 68664 966348 93.37 256984 326140 2097036 108 0.01 0
04:40:01 PM 71264 963748 93.11 254672 325892 2097036 108 0.01 0
04:50:01 PM 64308 970704 93.79 255240 327456 2097036 108 0.01 0
05:00:01 PM 66576 968436 93.57 255720 327520 2097036 108 0.01 0
05:10:01 PM 63936 971076 93.82 256328 327724 2097036 108 0.01 0
05:20:01 PM 65332 969680 93.69 256764 327728 2097036 108 0.01 0
05:30:01 PM 61684 973328 94.04 257276 328696 2097036 108 0.01 0
05:40:01 PM 68496 966516 93.38 252280 328844 2097036 108 0.01 0
05:50:01 PM 68248 966764 93.41 252552 328852 2097036 108 0.01 0
06:00:01 PM 62872 972140 93.93 252960 333452 2097036 108 0.01 0
06:10:01 PM 65232 969780 93.70 253348 331216 2097036 108 0.01 0
06:20:01 PM 66272 968740 93.60 253696 330828 2097036 108 0.01 0
06:30:01 PM 65956 969056 93.63 253940 330856 2097036 108 0.01 0
06:40:01 PM 66936 968076 93.53 252308 331280 2097036 108 0.01 0
06:50:01 PM 66352 968660 93.59 252636 331288 2097036 108 0.01 0
07:00:01 PM 66260 968752 93.60 252884 331316 2097036 108 0.01 0
07:10:01 PM 65252 969760 93.70 253352 331392 2097036 108 0.01 0
07:20:01 PM 63268 971744 93.89 253708 331416 2097036 108 0.01 0
07:20:01 PM kbmemfree kbmemused %memused kbbuffers kbcached kbswpfree kbswpused %swpused kbswpcad
07:30:01 PM 64884 970128 93.73 254100 331420 2097036 108 0.01 0
07:40:01 PM 64504 970508 93.77 252512 331212 2097036 108 0.01 0
07:50:01 PM 59732 975280 94.23 253132 333332 2097036 108 0.01 0
08:00:01 PM 60972 974040 94.11 253480 333392 2097036 108 0.01 0
08:10:01 PM 60600 974412 94.14 253764 333416 2097036 108 0.01 0
08:20:01 PM 60176 974836 94.19 254052 333424 2097036 108 0.01 0
08:30:02 PM 43920 991092 95.76 269948 333436 2097036 108 0.01 0
Average: 63501 971511 93.86 303774 302919 2097036 108 0.01 0
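The kbmemfree column alone understates how much memory is available, because Linux keeps otherwise idle RAM in buffers and cache. A rough sketch, assuming the older sar -r column layout shown above with 12-hour timestamps (effective_free_mb is a hypothetical name), adds kbmemfree, kbbuffers and kbcached for each sample. This is an approximation: not every cached page can be reclaimed instantly.

```shell
#!/usr/bin/env bash
# Approximate memory that is free or only holding buffers/cache, per sample,
# from `sar -r` output in the older layout shown above:
#   time AM/PM kbmemfree($3) kbmemused($4) %memused($5) kbbuffers($6) kbcached($7) ...
# With a 24-hour locale there is no AM/PM field and the columns shift left by one.
effective_free_mb() {
  awk '
    $2 ~ /^(AM|PM)$/ && $3 ~ /^[0-9]+$/ {
      printf "%s %s %d\n", $1, $2, ($3 + $6 + $7) / 1024
    }'
}

# Usage:
#   sar -r | effective_free_mb
```

On the example system above, the 12:30:01 PM row (the lowest kbmemfree) works out to 641 MB (23876 + 302696 + 330020 kB), so even at that point most of the "used" memory was buffers and cache.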
iostat gives disk I/O information:
# iostat
avg-cpu: %user %nice %system %iowait %steal %idle
0.61 0.07 0.18 0.55 0.00 98.59
Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
sda 2.79 106.18 20.58 17959510 3480268
sdb 2.27 109.68 18.98 18551638 3209852
md1 6.30 210.73 15.40 35644226 2605640
md0 0.16 0.73 0.00 124014 212
dm-0 0.00 0.01 0.00 1544 216
dm-1 2.90 47.01 7.09 7952146 1199912
dm-2 1.32 31.73 6.26 5367714 1058736
dm-3 1.96 130.92 1.82 22144986 308648
dm-4 0.12 0.95 0.17 161154 28456
dm-5 0.01 0.02 0.06 3218 9672
dm-6 0.56 3.37 1.07 570842 180456
dm-7 0.19 0.99 0.53 167690 89960
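Because images are read from disk, it can help to see which device is busiest during the email blast. A small sketch, assuming bash and awk (busiest_device is a hypothetical name), picks the row with the highest tps out of iostat's device table. It matches the "Device:" header shown above and, assuming the same column order, a colon-less "Device" header.

```shell
#!/usr/bin/env bash
# Print the device with the highest tps (transfers per second) from the
# device table of `iostat` output (Device, tps, Blk_read/s, ... layout).
busiest_device() {
  awk '
    $1 ~ /^Device:?$/ { in_table = 1; next }
    in_table && NF >= 2 && $2 + 0 > max {
      max = $2 + 0   # numeric value, used for the comparison
      top = $2       # original text, used for printing
      dev = $1
    }
    END { if (dev != "") print dev, top }'
}

# Usage (sysstat must be installed):
#   iostat | busiest_device
```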
apachetop shows Apache activity: the pages being accessed and the performance of the web server.
There is more information, including an example screenshot, at http://freshmeat.net/projects/apachetop/
If both websites are running on the same server, then yes; if they are on separate servers, then no, although the images server would be unavailable while it was being restarted. Not sure if this is a problem!
The problem could be any number of things at the moment, including (but not limited to):
Server hardware setup
Network setup and configuration (the physical stuff: switches, routers, etc.)
Linux server network setup
Apache server configuration
Please understand: while nothing you have posted here leads me to think there is a memory issue, that doesn't mean there isn't one.
You need to check the logs on the server first of all.
ASKER
If you were troubleshooting this issue with Apache, what logging would you enable? I have to get approval before installing apachetop, so I'm waiting on that right now. We have a strict policy about installing third-party software that has not been tested.
Anyway, I need some clarification about the website crash you mentioned. For our conversation, there is one server. It hosts two websites, website.domain.com and images.domain.com; the website points to images.domain.com for its images. The crash occurs when we send out a large email blast containing many references to images. This is our daily email, which is why there is a huge spike in traffic to that website. I suspect Apache or memory, not the physical network or the server setup itself. At least not the server itself: the only thing I was worried about was swap, but we ruled that out and cleared up my misunderstanding of swap. When the "crash" occurs, the server can still serve HTML and render website.domain.com, although very slowly. images.domain.com, however, appears to be down, but not exactly: website.domain.com eventually renders, but without any images. Maybe I am wrong, but if Apache were to blame, wouldn't both go down? To resolve the issue, Apache must be restarted.
My conclusion is that physical memory may be the issue: the server only has 3GB, and traffic during peak hours has grown hugely over the past few months. I have an upgrade to 6GB on its way, and hopefully, if it is a memory problem, that will help. I just feel that with the server running at 50-80MB of free physical memory under average load, this could be the cause. If the upgrade doesn't help, then maybe it is Apache. I suppose it could be a network issue, but I can SSH into the server during peak times without any problems. I guess the next step is to watch top to see whether free physical memory drops and swap usage increases. If it does, then it likely isn't a memory problem, unless swap is completely used up or there is a problem with a physical stick of memory.
Is there a way to check network statistics and usage over SSH? Sorry for the rambling; I was thinking out loud.
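Over SSH, netstat (from the net-tools package, assumed installed) can show how many HTTP connections are open and in which TCP state; a pile-up of connections on port 80 during the email blast could suggest Apache is struggling to keep up with requests. A hedged sketch, with http_conn_states as a hypothetical helper name:

```shell
#!/usr/bin/env bash
# Count TCP connections on local port 80 by state, from `netstat -ant` output.
# Columns: Proto Recv-Q Send-Q Local-Address Foreign-Address State
# ($1 ~ /^tcp/ also matches tcp6 lines, whose local address ends in :80 too.)
http_conn_states() {
  awk '
    $1 ~ /^tcp/ && $4 ~ /:80$/ { count[$6]++ }
    END { for (s in count) print s, count[s] }' | sort
}

# Usage:
#   netstat -ant | http_conn_states
#   netstat -i      # per-interface packet and error counters
```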
> We have a strict policy about installation of 3rd party software that has not been tested.
A very good policy to have if you ask me!
> It hosts two websites which are website.domain.com and images.domain.com.
Stopping an Apache instance that hosts two websites will take them both down.
> Now the crash occurs when we send out a large email blast containing large amounts of references to images. This is our daily email so that is why there is a huge spike in traffic to that website.
This is the part that needs further investigation, and (like all performance issues) it may take some time. Throwing memory at the problem may not fix it, although it won't hurt.
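One way to quantify the spike is to count requests per minute in the access log for images.domain.com. A sketch, assuming the log uses Apache's common or combined format and that the images site writes to its own access log (the path in the usage comment is an assumption; requests_per_minute is a hypothetical name). Sorting is on the timestamp text, so run it on one day's log at a time.

```shell
#!/usr/bin/env bash
# Requests per minute from an Apache access log in common/combined format, e.g.
#   192.0.2.1 - - [10/Oct/2008:13:55:36 -0700] "GET /a.jpg HTTP/1.1" 200 2326
# Field 4 is "[dd/Mon/yyyy:hh:mm:ss"; characters 2-18 are the date and minute.
requests_per_minute() {
  awk '{ count[substr($4, 2, 17)]++ } END { for (m in count) print m, count[m] }' | sort
}

# Usage (busiest minutes last):
#   requests_per_minute < /var/log/httpd/access_log | sort -k2,2n | tail
```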
Physical memory may be the issue... "only" 3GB :-) That is quite a bit...
You do need to spend the time going through the logs in /var/log to see if there are any errors.
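As a starting point for going through the logs, a sketch (bash and awk assumed; error_log_levels is a hypothetical name; the log paths in the usage comments are RHEL-style assumptions) tallies Apache error-log entries by severity. The grep for kernel out-of-memory messages looks for the most direct evidence a real memory shortage would leave.

```shell
#!/usr/bin/env bash
# Count Apache error-log entries by severity ([error], [warn], [notice], ...).
# Handles 2.2-style "[date] [error] ..." and 2.4-style "[date] [core:error] ..."
error_log_levels() {
  awk '
    match($0, /\] \[[a-z_0-9:]+\]/) {
      lvl = substr($0, RSTART + 3, RLENGTH - 4)   # text inside the second [...]
      sub(/.*:/, "", lvl)                         # drop a module prefix such as "core:"
      count[lvl]++
    }
    END { for (l in count) print l, count[l] }' | sort
}

# Usage:
#   error_log_levels < /var/log/httpd/error_log
#   grep -i 'out of memory' /var/log/messages    # kernel OOM-killer activity
```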
You also need to be collecting stats with sar and vmstat (see my previous post above). I'm afraid Linux performance monitoring tools are limited; one thing you may notice about top is that it's usually one of the top CPU hogs on the system itself, but you still need to look at what is happening.
You said the problem seems to occur when you send out a large email with links to the images on the website. I take it the email messages aren't necessarily big, but the volume is. Have you seen spikes in CPU/memory usage on the mail server?
Use the tools you have available, and check the man pages for the relevant options. I've given a few below (the | separates alternative options, not a shell pipe):
sar -B | -r | -R | -u | -b | -d | -n DEV
vmstat
ps -ev | -ealf
top
Don't be swayed by someone coming along and saying it's definitely memory, or this, or that. You need to check and recheck the logs.
I don't think this question is going to give you all the help you need. Although I hope we have covered the original question to your satisfaction, the title you have used and the topic areas it is posted in may not attract the right expertise. You don't want just one person's opinion; I don't know everything (don't tell my boss!). You might do better to close this question and open a new, related question in the Apache, Linux and other relevant topic areas.
A couple points:
Linux uses almost all physical memory.all of the time. Any physical memory not used by programs will be used for buffers and disk cache. The 'top' command will show physical and swap memory usage in an easily understood format. You can also have 'top' sort by memory usage by typing 'F' then 'N'. You can track how much memory apache is using during your peak times.
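The arithmetic behind the "-/+ buffers/cache" line of the classic free command makes this concrete: buffers and page cache are counted as "used" on the Mem: line, but the kernel can reclaim them, so free subtracts them from used and adds them to free. A minimal sketch (the function name is made up for illustration), checked against the free output posted later in this thread:

```python
def app_memory(used_kb, free_kb, buffers_kb, cached_kb):
    """Reproduce the '-/+ buffers/cache' line of the classic `free` output.

    Returns (used by programs, available to programs), both in kB.
    Buffers and page cache count as 'used' on the Mem: line, but the
    kernel can drop them when programs need the memory.
    """
    reclaimable = buffers_kb + cached_kb
    return used_kb - reclaimable, free_kb + reclaimable


# Mem: line from a posted `free`: used 5007816, free 966380,
# buffers 124000, cached 4116600 (all kB).
print(app_memory(5007816, 966380, 124000, 4116600))  # (767216, 5206980)
```

So on that machine programs were using roughly 750MB; most of the "used" memory was reclaimable cache.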
Entries under /dev are device handles (usually just called devices, or device files). /dev/cciss/c0d0p6 is the device handle for controller 0, disk 0, partition 6. You cannot 'cd' to these paths because they are not directories. Device handles are used by the 'mount' and 'umount' commands, fstab, and 'fdisk'.
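You can see the difference yourself: the file mode returned by stat says whether a path is a directory or a device node, which is what ls -l shows in its first column ('d' for a directory, 'b' for a block device, 'c' for a character device). A small sketch using Python's standard os and stat modules; the helper name is hypothetical:

```python
import os
import stat

def describe_path(path):
    """Classify a path by its file type, as in the first column of `ls -l`."""
    st = os.stat(path)
    mode = st.st_mode
    if stat.S_ISDIR(mode):
        return "directory"
    if stat.S_ISBLK(mode) or stat.S_ISCHR(mode):
        kind = "block device" if stat.S_ISBLK(mode) else "character device"
        # Device nodes carry a major:minor number pair instead of contents.
        return "%s %d:%d" % (kind, os.major(st.st_rdev), os.minor(st.st_rdev))
    if stat.S_ISREG(mode):
        return "regular file"
    return "other"
```

On the server, describe_path("/dev/cciss/c0d0p3") should report a block device, which is why changing directory into it fails.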
'df' displays mounted partitions. Since the swap partition(s) are not mounted, they are not displayed. 'swapon' displays swap devices, as you have seen.
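`swapon -s` prints the same table as /proc/swaps, so it can also be read directly. A minimal parser sketch (the function name and dict keys are my own, for illustration), fed here with the swapon -s output from the original question:

```python
def parse_swaps(text):
    """Parse /proc/swaps (what `swapon -s` prints) into a list of dicts.

    The first line is a header; Size and Used are in kB.
    """
    entries = []
    for line in text.splitlines()[1:]:
        fields = line.split()
        if len(fields) != 5:
            continue  # skip blank or unexpected lines
        device, kind, size, used, priority = fields
        entries.append({
            "device": device,
            "type": kind,
            "size_kb": int(size),
            "used_kb": int(used),
            "priority": int(priority),
        })
    return entries


sample = """Filename Type Size Used Priority
/dev/cciss/c0d0p3 partition 2032212 144 -1
"""
for swap in parse_swaps(sample):
    print(swap["device"], swap["used_kb"], "of", swap["size_kb"], "kB used")
```

On the server itself, the contents of /proc/swaps would replace the sample string.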
As jools said, you might have a memory problem, but it isn't shown by the information you've provided. You appear to have decided at the start that you have a memory problem and have been searching (unsuccessfully) for support for that conclusion. You need to take a step back and look at the problem again, starting with collecting evidence.
Oooh... Hi eager, welcome to the party, sausage rolls and cake are in the corner...
MSJoe,
I found this http://www.redbooks.ibm.com/abstracts/REDP4285.html
Redbooks, excellent.
Toodle pip for now...
ASKER
Thanks guys. I ruled out Apache this morning, and I think I'm getting closer to pinpointing my problem. I'm still on the fence about whether it is related to swap memory or not.
Yesterday the web server failed to load images, so I did not restart Apache. Instead I logged in and found free physical memory to be very low, under 1MB, and swap usage was still at 144k. I then restarted Tomcat, which also restarted Java, which was reported as using a significant amount of the memory.
One of our web developers told me Java has a big problem with garbage collection, and the details of that conversation made me start thinking of a memory leak.
So today I logged in, looked at the system with the top command, and saw that even though it was not a peak time, free memory was below 35MB and Java was again using the majority of it. Before the email blast went out, I decided that if it was a memory leak, restarting Tomcat (which in fact restarts Java) would free the memory. I did, and it brought the available physical memory up to 450MB.
Now, we have a fairly significant amount of traffic, and our system is heavily taxed on memory because of database connections and media streaming, so 3GB of memory for this setup is a bit weak. True, Linux uses pretty much all of the available memory, but I have some concerns about the swap file. When does it start to swap?! I don't want to compare Windows to Linux, but Windows typically moves items into virtual memory as physical memory dwindles. I expect that Linux will do the same, but at what level will it start? Can that value be changed? I wonder if the swap file has a problem, but I'm limited in how I can determine this.
My next step is to close this thread and start a Java/Tomcat thread, but I want to be certain of a few details of Linux and swap memory usage.
Found this as well... http://www.linux.com/feature/121916
You really shouldn't fixate on what you think the problem is unless you have some evidence to back it up.
From your post http:#22672036 I really can't see that memory is the issue, but you have not posted information from when the problem occurs, so we are guessing.
Have you had the time to check the sar and ps -ev outputs and compare them to when the system is running normally?
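If installing extra software is off the table, one way to make that comparison is to save a copy of /proc/meminfo while things are normal and another during the email blast, then diff the key fields. A sketch under those assumptions; the field selection and function names are illustrative, and the snapshot numbers below are made up:

```python
def parse_meminfo(text):
    """Turn /proc/meminfo text into a {field: value} dict (values mostly in kB)."""
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            values[key.strip()] = int(parts[0])
    return values


def compare_meminfo(normal_text, problem_text,
                    fields=("MemFree", "Buffers", "Cached", "SwapFree")):
    """Change in each field from the normal snapshot to the problem snapshot."""
    normal = parse_meminfo(normal_text)
    problem = parse_meminfo(problem_text)
    return {f: problem[f] - normal[f]
            for f in fields if f in normal and f in problem}


# Hypothetical snapshots, e.g. saved with: cat /proc/meminfo > normal.txt
normal = "MemFree: 450000 kB\nCached: 900000 kB\nSwapFree: 2032068 kB\n"
problem = "MemFree: 30000 kB\nCached: 400000 kB\nSwapFree: 2032068 kB\n"
print(compare_meminfo(normal, problem))
```

Fields missing from either snapshot are skipped rather than treated as zero.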
hmmm... ps -ev doesn't do what I expected it to...
Try using pmap -x <pid>
e.g.
ps -eaf | grep http
pmap -x 16045
16045: /usr/sbin/httpd
Address Kbytes RSS Anon Locked Mode Mapping
<trimmed>
b7ee9000 100 - - - rw-s- zero (deleted)
b7f02000 32 - - - rw--- [ anon ]
bfef4000 88 - - - rw--- [ stack ]
-------- ------- ------- ------- -------
total kB 25696 - - -
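Apache usually runs several httpd processes, so it can help to add up the "total kB" line across all of them. A minimal sketch, assuming the pmap -x output format shown above and Python 3.7+ for subprocess.run(capture_output=...); the function names are hypothetical. Note that the Kbytes column is the mapped (virtual) size and shared libraries are counted once per process, so the sum overstates real RAM use.

```python
import subprocess

def pmap_total_kb(pmap_output):
    """Return the number on the 'total kB' line of `pmap -x` output, or None."""
    for line in pmap_output.splitlines():
        fields = line.split()
        if fields[:2] == ["total", "kB"] and len(fields) > 2:
            return int(fields[2])
    return None


def total_mapped_kb(pids):
    """Sum the pmap totals for several PIDs (e.g. all httpd children)."""
    total = 0
    for pid in pids:
        result = subprocess.run(["pmap", "-x", str(pid)],
                                capture_output=True, text=True)
        kb = pmap_total_kb(result.stdout)
        if kb is not None:
            total += kb
    return total
```

The PIDs come from the ps -eaf | grep http output, e.g. total_mapped_kb([16045]).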
It is odd that you are not using more swap space. But swapon says that you have swap enabled.
Look at your log file (/var/log/messages) and try to identify why swap is not used and why apache dies.
Memory usage stats for Java don't necessarily indicate that there is a memory leak. As Java runs, it allocates VM from the operating system. It never returns this memory back to the OS. The memory allocated to Java represents the high water mark, not necessarily current usage. Be sure that you are not confusing virtual memory with physical memory.
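One way to keep the two apart for the Java process is to compare VmSize (virtual address space) with VmRSS (what is actually resident in RAM) in /proc/<pid>/status. A sketch; the helper names are hypothetical and the sample numbers are made up for illustration:

```python
def vm_sizes(status_text):
    """Return (VmSize, VmRSS) in kB from /proc/<pid>/status text.

    VmSize is the virtual address space the process has reserved;
    VmRSS is the part of it currently held in physical RAM.
    Either value is None if the line is missing (e.g. kernel threads).
    """
    sizes = {}
    for line in status_text.splitlines():
        key, sep, rest = line.partition(":")
        if sep and key in ("VmSize", "VmRSS"):
            sizes[key] = int(rest.split()[0])
    return sizes.get("VmSize"), sizes.get("VmRSS")


def read_vm_sizes(pid):
    """Read the sizes for a live process (Linux only)."""
    with open("/proc/%d/status" % pid) as f:
        return vm_sizes(f.read())


# Made-up example of a JVM that reserved far more than it keeps resident.
example = "Name:\tjava\nVmSize:\t 1572864 kB\nVmRSS:\t  409600 kB\n"
print(vm_sizes(example))  # (1572864, 409600)
```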
There is a small possibility that you are running into the Out of Memory killer. More info about OOM can be found here: http://linux.derkeiler.com/Mailing-Lists/RedHat/2007-08/msg00061.html
Again, fixating on one possible cause may be leading you to grasp at straws to justify that cause, such as relying on your web guy's suspicions about Java, rather than look for evidence which would lead you to a different cause.
ASKER
Sorry for the delay in responding. Here is what I have been doing: watching the server, documenting, and going through my logs each day.
Eager, you said "Look at your log file (/var/log/messages) and try to identify why swap is not used and why apache dies.". I have been looking in the message logs and I can't find anything at all in messages or its rotated copies 1 through 4 regarding a problem with swap memory. Regardless, there has to be a problem, because there's no way it will get down to 30MB of physical RAM without at least using some swap memory.
I can't say for sure that Java is the problem, eager, but I did jump to that conclusion because when the server crashes Java is using up all of the memory and no swap memory has been used. To me this suggests that Java may not be the problem, but my swap memory may be. Java may just keep taking more and more memory, not knowing that the system isn't going to compensate with swap memory as it should. I'm trying to work with our server host to resolve this issue, but it seems I have to fend for myself, so any help you can provide in pinpointing this issue through log files would be more helpful than I can say.
Anyway, I have been restarting Tomcat every day during off hours and the server does not crash. My reading is that if it runs for a long period of time and takes up a significant amount of memory, swap usage still does not go up, and I'm guessing that one or more of the processes related to Tomcat (e.g. Java) crash because they can't get more memory.
I feel pretty confident of this, I just can't prove it because I can't find the logs. I'm looking, but I can't find the information I need in the logs about swap (page file) usage or the Apache crashes. I spoke to my supervisors and I have been advised not to install additional software and to investigate the problem through log files.
Run "dmesg". You should find a message which looks like
Adding 4192956k swap on /dev/sda3. Priority:-1 extents:1 across:4192956k
The same message should show up in /var/log/messages.
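To pull that line out of dmesg or /var/log/messages text without eyeballing it, a regular expression is enough. A sketch that assumes the message wording shown above (it may differ between kernel versions); the function name is hypothetical, and the pattern simply does not require the across: field:

```python
import re

# Matches e.g. "Adding 4192956k swap on /dev/sda3.  Priority:-1 extents:1"
SWAP_LINE = re.compile(
    r"Adding (\d+)k swap on (\S+?)\.\s+Priority:(-?\d+) extents:(\d+)")

def swap_activations(log_text):
    """Return (device, size_kb, priority, extents) for each swap activation."""
    return [(device, int(size), int(priority), int(extents))
            for size, device, priority, extents in SWAP_LINE.findall(log_text)]


dmesg_text = ("Adding 2032212k swap on /dev/cciss/c0d0p3. "
              "Priority:-1 extents:1\n")
print(swap_activations(dmesg_text))  # [('/dev/cciss/c0d0p3', 2032212, -1, 1)]
```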
Run "free". Post results here.
List warning or error messages issued when Tomcat/Apache crashes. Look for "unable to allocate" or "out of memory" or "kernel error".
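A quick way to search a whole set of logs for those phrases is sketched below. The pattern list is the one suggested above plus "oom-killer", which some kernels print when the out-of-memory killer fires; the function names are illustrative only.

```python
import re

# Case-insensitive, so "Out of Memory" also matches "out of memory".
SUSPICIOUS = re.compile(
    r"unable to allocate|out of memory|kernel error|oom-killer", re.IGNORECASE)

def suspicious_lines(lines):
    """Yield (line_number, text) for each line mentioning a memory problem."""
    for number, line in enumerate(lines, start=1):
        if SUSPICIOUS.search(line):
            yield number, line.rstrip("\n")


def scan_logs(paths):
    """Scan several log files; files that can't be opened are skipped."""
    hits = {}
    for path in paths:
        try:
            with open(path, errors="replace") as f:
                found = list(suspicious_lines(f))
        except OSError:
            continue
        if found:
            hits[path] = found
    return hits
```

For example, scan_logs(["/var/log/messages", "/var/log/messages.1"]), plus Apache's and Tomcat's own log files, wherever they are installed on the server.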
I still recommend looking for causes other than memory. You have nothing more than a hunch, with no evidence to support this.
ASKER
I added more memory to the server. It still slows down during peak times because of database connections left open, but the Tomcat/Apache crashing problem is essentially resolved.
However, I did what you asked. I did find "Adding 2032212k swap on /dev/cciss/c0d0p3. Priority:-1 extents:1" (but with no across: value; does that matter?)
Output of free:
total used free shared buffers cached
Mem: 5974196 5007816 966380 0 124000 4116600
-/+ buffers/cache: 767216 5206980
Swap: 2032212 0 2032212
The free results look normal; we bumped up the amount of memory Java/Tomcat/Apache use, so 1 GB remains for the system itself. None of the data above shows any problem with memory, which is why I am turning to the log files to prove it. I started copying log files the other day just so I could go through them all, but I can't find any log files that are relevant to the crash, which is why I asked which log files I should be looking at! For instance, there are hundreds of log files, all named differently. I looked in anything labelled with our website's name, and anything essentially generic such as messages (1-5), but I feel I'm looking in the wrong logs: I've been able to identify problems, but nothing relevant to the website crashing. What I have found is mostly "invalid directory or file" errors, which refer to old directories and files on our website that our team is working to clean up, but that isn't the source of the problem. So, which log files (specific names) should I be looking in?
Regarding the slowdown, I am waiting to get more memory for my DB server, and we have limited the number of available DB connections to resolve what appears to be a bottleneck due to a large increase in volume.
# free
total used free shared buffers cached
Mem: 1035012 968276 66736 0 252924 331336
-/+ buffers/cache: 384016 650996
Swap: 2097144 108 2097036