MSJoe

Linux server swap memory problem

I have a problem with my web server. My disk and swap statistics are shown below. We get a major traffic spike at a certain time of day, the server goes into oblivion, and Tomcat and Apache must be restarted. I logged in and ran the top command several times over the course of a few days: free physical memory is below 100MB, and swap shows 2GB total but usage always sits at 144KB. It also says swap is on /dev/cciss/c0d0p3, yet when I run df -h that drive/partition doesn't appear to exist.

So I'm a little confused about whether these are partitions or individual drives, and whether something else is going on here. To explain a bit better: I changed directory to root, then to /dev/cciss/, and ran ls. It listed c0d0p1 through c0d0p7, but when I tried to cd into any of them it reported that it was not a directory! What the heck!

Regardless, my assumption is that c0d0p3 doesn't exist and swap can't be accessed, which is why my web server is crashing! This is a huge problem, and my investigation has left me confused. I'm not new to Linux, but I only have SSH access; I can't physically get to the machine.

Help!

[root@server ~]# df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/cciss/c0d0p6     989M  236M  703M  26% /
/dev/cciss/c0d0p1      99M   12M   83M  12% /boot
none                  1.5G     0  1.5G   0% /dev/shm
/dev/cciss/c0d0p5     989M   18M  921M   2% /tmp
/dev/cciss/c0d0p2     2.9G  916M  1.9G  33% /usr
/dev/cciss/c0d0p7     127G  6.1G  115G   6% /var


[root@server ~]# swapon -s
Filename                                Type            Size    Used    Priority
/dev/cciss/c0d0p3                       partition       2032212 144     -1
Julian Parker

Swap is usually a raw partition; you can also use free to view its usage.

# free
             total       used       free     shared    buffers     cached
Mem:       1035012     968276      66736          0     252924     331336
-/+ buffers/cache:     384016     650996
Swap:      2097144        108    2097036
From your listing it seems that swap is underused. I think this is a red herring!
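As a quick sanity check on those numbers, the arithmetic behind the "-/+ buffers/cache" row can be sketched in shell. This is only a minimal sketch using the figures from the free listing above; the function name app_free_kb is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: memory that applications could still claim,
# i.e. free + buffers + cached (all in kB). Older versions of free
# show this sum in the "free" column of the "-/+ buffers/cache" row.
app_free_kb() {
    free_kb=$1; buffers_kb=$2; cached_kb=$3
    echo $(( free_kb + buffers_kb + cached_kb ))
}

# Figures copied from the "Mem:" row of the listing above.
app_free_kb 66736 252924 331336    # prints 650996
```

The result matches the 650996 shown on the "-/+ buffers/cache" row, which is why a small "free" figure on the Mem: row is not, by itself, a sign of memory pressure.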
MSJoe

ASKER

OK, but that just tells me what I already know. I know its usage, but I feel that this is an error. And what is this red herring you speak of?
I got the impression you thought swap not being listed by df was a problem.

Swap is not a mountable file system, so it wouldn't show up in df.

Use fdisk -l /dev/cciss/c0d0 to list the partitions; swap should be type 82.
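Since df will never show swap, a small summary of the swapon -s listing may be easier to read. The sketch below is an illustration only: swap_summary is a made-up helper name, and the sample row is copied from the swapon -s output earlier in the thread.

```shell
#!/bin/sh
# Hypothetical helper: summarise "swapon -s" style output read from stdin.
# Columns are: Filename Type Size Used Priority (sizes in kB).
swap_summary() {
    awk 'NR > 1 && NF >= 5 {
        pct = ($3 > 0) ? 100 * $4 / $3 : 0
        printf "%s: %d kB used of %d kB (%.2f%%)\n", $1, $4, $3, pct
    }'
}

# Sample row copied from the swapon -s output earlier in the thread.
swap_summary <<'EOF'
Filename                                Type            Size    Used    Priority
/dev/cciss/c0d0p3                       partition       2032212 144     -1
EOF
```

On a live system the same function can read /proc/swaps directly (swap_summary < /proc/swaps), since that file uses the same column layout.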
MSJoe

ASKER

[root@server ~]# fdisk -l /dev/cciss/c0d0

Disk /dev/cciss/c0d0: 145.6 GB, 145659002880 bytes
255 heads, 63 sectors/track, 17708 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

           Device Boot      Start         End      Blocks   Id  System
/dev/cciss/c0d0p1   *           1          13      104391   83  Linux
/dev/cciss/c0d0p2              14         396     3076447+  83  Linux
/dev/cciss/c0d0p3             397         649     2032222+  82  Linux swap
/dev/cciss/c0d0p4             650       17708   137026417+   5  Extended
/dev/cciss/c0d0p5             650         777     1028128+  83  Linux
/dev/cciss/c0d0p6             778         905     1028128+  83  Linux
/dev/cciss/c0d0p7             906       17708   134970066   83  Linux

OK, so it's there. From running the swapon -s command I see it is 2GB, so why the heck is it always at 144KB, and what can I do about it? Also, what is this red herring? I have the feeling I'm not getting a very bad joke, which might make me smile.

[root@server ~]# swapon -s
Filename                                Type            Size    Used    Priority
/dev/cciss/c0d0p3                       partition       2032212 144     -1
MSJoe

ASKER

Never mind the red herring.

Something that draws attention away from the central issue
It's at 144KB because it's not being used (you don't actually want the system to use swap unless it has to).

The red herring?? I think you're looking at the wrong area; I don't think your problem is related to swap.

You need to be looking in the log files for Apache and Tomcat, and perhaps using apachetop during the times of the problem, to find out what's wrong.

MSJoe

ASKER

Well, the memory is very low, very very low, during the problem. Basically it happens every day when we send out a mass of emails, and we think it is overloading the image server with requests and the lack of memory is crashing the server.

Regardless of whether I am wrong or not, although the system is reporting 2GB of swap memory, it should start using some of it before it hits zero physical RAM, correct? I'm just confused that it is always at 144KB. I would expect it to start using swap when free memory gets down to something like 50MB, right?

Anyway, when the problem does occur we have to restart Apache, so I guess you are likely correct that this is a red herring. However, the apachetop command doesn't exist?
Maybe, maybe not. The system will page out to swap if it needs to; if it does so a lot it will start thrashing, but I've not seen that for several years, especially in a server with 2GB memory.

If you have sar running, it collects data on your server; running sar with various options will give you some system stats.

You can install apachetop.

Also look at vmstat and iostat. I believe iostat and sar are part of the sysstat package (vmstat comes with procps); apachetop is its own package.
MSJoe

ASKER

Well, vmstat is there, but it doesn't really give me anything helpful.

[root@localhost root]# vmstat
procs                      memory      swap          io     system         cpu
 r  b   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy id wa
 0  0      0 181032 207964 752200    0    0     0     1    2     0  0  0  2  3
[root@localhost root]# iostat
-bash: iostat: command not found
[root@localhost root]#

What does apachetop do exactly? Is http://www.webta.org/projects/apachetop/ what you are referring to? I'm just not sure how to troubleshoot this issue with Apache if it isn't a memory problem. I was under the impression this was a swap memory problem, which it appears it is not.

Since I am wrong, let me repaint the picture so you can point me in the correct direction. There are two websites: one is the actual website, the other serves images. The images website is what appears to be crashing, so maybe Apache is the problem. But if the website became unresponsive and required Apache to be restarted, wouldn't it take down the actual website, not just the images website?
FYI, you can give vmstat an interval and a count (here, 5 samples taken 5 seconds apart), for example:
# vmstat 5 5
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 1  0    108  61020 254300 333432    0    0   106     9  218  146  1  0 99  0  0
 0  0    108  61020 254308 333432    0    0     0    20 1233  131  0  0 100  0  0
 0  0    108  61020 254308 333432    0    0     0     5 1231  152  0  0 100  0  0
 0  0    108  61020 254308 333432    0    0     0    12 1233  126  0  0 100  0  0
 0  0    108  61020 254308 333432    0    0     0     0 1230  127  0  0 100  0  0
(my system isn't doing much)

# vmstat 5 5
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 1  1    108  58116 256888 333428    0    0   106     9  218  146  1  0 99  0  0
 0  1    108  52064 262460 333408    0    0  1098   234 1470  941  1  5  0 94  0
 1  1    108  45432 269160 333404    0    0  1330   276 1531 1124  1  5  0 94  0
 0  1    108  40540 273804 333416    0    0   919   529 1435  856 10  5  1 85  0
 0  1    108  32976 281408 333420    0    0  1512   458 1578 1262  1  5  0 94  0
(with find / running in background)
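When reading output like this, the columns that matter for the swap question are si and so (memory swapped in and out), while wa is the share of CPU time spent waiting on I/O. A minimal sketch for pulling out the peak values follows; vmstat_peaks is a made-up helper name, and the sample rows are copied from the second listing above.

```shell
#!/bin/sh
# Hypothetical helper: scan vmstat output on stdin and report the peak
# swap-in (si), swap-out (so) and I/O-wait (wa) values seen.
# Column positions are looked up from the header row, since they differ
# between vmstat versions.
vmstat_peaks() {
    awk '
        $1 == "r" && $2 == "b" {            # header row with column names
            for (i = 1; i <= NF; i++) col[$i] = i
            next
        }
        ("si" in col) && $1 ~ /^[0-9]+$/ {  # data rows start with a number
            if ($(col["si"]) + 0 > si) si = $(col["si"]) + 0
            if ($(col["so"]) + 0 > so) so = $(col["so"]) + 0
            if ($(col["wa"]) + 0 > wa) wa = $(col["wa"]) + 0
        }
        END { printf "max si=%d so=%d wa=%d\n", si, so, wa }
    '
}

# Two rows copied from the "find / running" sample above.
vmstat_peaks <<'EOF'
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 0  1    108  52064 262460 333408    0    0  1098   234 1470  941  1  5  0 94  0
 0  1    108  40540 273804 333416    0    0   919   529 1435  856 10  5  1 85  0
EOF
```

For these rows it reports no swap traffic at all (si and so stay at 0) and a peak wa of 94, i.e. the box is waiting on disk, not swapping. On a live system it could be fed directly with vmstat 5 5 | vmstat_peaks.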

sar may also help you (a little):
# sar -r
09:40:01 AM kbmemfree kbmemused  %memused kbbuffers  kbcached kbswpfree kbswpused  %swpused  kbswpcad
09:50:01 AM     66640    968372     93.56    331900    288340   2097036       108      0.01         0
10:00:01 AM     62736    972276     93.94    332248    288376   2097036       108      0.01         0
10:10:01 AM     66240    968772     93.60    332532    288380   2097036       108      0.01         0
10:20:01 AM     64380    970632     93.78    332780    288376   2097036       108      0.01         0
10:30:01 AM     65840    969172     93.64    333040    288404   2097036       108      0.01         0
10:40:01 AM     68876    966136     93.35    303552    286264   2097036       108      0.01         0
10:50:01 AM     65020    969992     93.72    304156    287332   2097036       108      0.01         0
11:00:01 AM     62796    972216     93.93    305024    287556   2097036       108      0.01         0
11:10:01 AM     62300    972712     93.98    306008    287536   2097036       108      0.01         0
11:20:01 AM     60564    974448     94.15    306892    287596   2097036       108      0.01         0
11:30:01 AM     57340    977672     94.46    307904    287664   2097036       108      0.01         0
11:40:01 AM     68488    966524     93.38    298736    287448   2097036       108      0.01         0
11:50:01 AM     27132   1007880     97.38    300224    329828   2097036       108      0.01         0
12:00:01 PM     25748   1009264     97.51    301000    329948   2097036       108      0.01         0
12:10:01 PM     25456   1009556     97.54    301728    329972   2097036       108      0.01         0
12:20:01 PM     25012   1010000     97.58    302244    329992   2097036       108      0.01         0
12:30:01 PM     23876   1011136     97.69    302696    330020   2097036       108      0.01         0
12:40:01 PM     66732    968280     93.55    263456    326128   2097036       108      0.01         0
12:50:01 PM     66568    968444     93.57    263724    326132   2097036       108      0.01         0
01:00:01 PM     66188    968824     93.61    264132    326128   2097036       108      0.01         0
01:10:01 PM     64960    970052     93.72    264472    326132   2097036       108      0.01         0
01:20:01 PM     65616    969396     93.66    264776    326140   2097036       108      0.01         0
01:30:01 PM     65244    969768     93.70    265040    326156   2097036       108      0.01         0
01:40:01 PM     67972    967040     93.43    262616    325808   2097036       108      0.01         0
01:50:01 PM     61616    973396     94.05    263152    326888   2097036       108      0.01         0
02:00:01 PM     61368    973644     94.07    263468    326920   2097036       108      0.01         0
02:10:01 PM     60480    974532     94.16    263860    326976   2097036       108      0.01         0
02:20:01 PM     59392    975620     94.26    264612    327160   2097036       108      0.01         0
02:30:01 PM     59552    975460     94.25    264940    327172   2097036       108      0.01         0
02:40:01 PM     66840    968172     93.54    257136    326328   2097036       108      0.01         0
02:50:01 PM     66988    968024     93.53    257612    326356   2097036       108      0.01         0
03:00:01 PM     67000    968012     93.53    258160    326444   2097036       108      0.01         0
03:10:01 PM     66712    968300     93.55    258612    326464   2097036       108      0.01         0
03:20:01 PM     66064    968948     93.62    259048    326488   2097036       108      0.01         0
03:30:01 PM     63388    971624     93.88    259560    326512   2097036       108      0.01         0
03:40:01 PM     70528    964484     93.19    254472    325980   2097036       108      0.01         0
03:50:01 PM     68076    966936     93.42    255156    326080   2097036       108      0.01         0
04:00:01 PM     67580    967432     93.47    255656    326100   2097036       108      0.01         0
04:10:01 PM     67208    967804     93.51    256244    326136   2097036       108      0.01         0
04:20:01 PM     68644    966368     93.37    256612    326140   2097036       108      0.01         0
04:30:01 PM     68664    966348     93.37    256984    326140   2097036       108      0.01         0
04:40:01 PM     71264    963748     93.11    254672    325892   2097036       108      0.01         0
04:50:01 PM     64308    970704     93.79    255240    327456   2097036       108      0.01         0
05:00:01 PM     66576    968436     93.57    255720    327520   2097036       108      0.01         0
05:10:01 PM     63936    971076     93.82    256328    327724   2097036       108      0.01         0
05:20:01 PM     65332    969680     93.69    256764    327728   2097036       108      0.01         0
05:30:01 PM     61684    973328     94.04    257276    328696   2097036       108      0.01         0
05:40:01 PM     68496    966516     93.38    252280    328844   2097036       108      0.01         0
05:50:01 PM     68248    966764     93.41    252552    328852   2097036       108      0.01         0
06:00:01 PM     62872    972140     93.93    252960    333452   2097036       108      0.01         0
06:10:01 PM     65232    969780     93.70    253348    331216   2097036       108      0.01         0
06:20:01 PM     66272    968740     93.60    253696    330828   2097036       108      0.01         0
06:30:01 PM     65956    969056     93.63    253940    330856   2097036       108      0.01         0
06:40:01 PM     66936    968076     93.53    252308    331280   2097036       108      0.01         0
06:50:01 PM     66352    968660     93.59    252636    331288   2097036       108      0.01         0
07:00:01 PM     66260    968752     93.60    252884    331316   2097036       108      0.01         0
07:10:01 PM     65252    969760     93.70    253352    331392   2097036       108      0.01         0
07:20:01 PM     63268    971744     93.89    253708    331416   2097036       108      0.01         0

07:20:01 PM kbmemfree kbmemused  %memused kbbuffers  kbcached kbswpfree kbswpused  %swpused  kbswpcad
07:30:01 PM     64884    970128     93.73    254100    331420   2097036       108      0.01         0
07:40:01 PM     64504    970508     93.77    252512    331212   2097036       108      0.01         0
07:50:01 PM     59732    975280     94.23    253132    333332   2097036       108      0.01         0
08:00:01 PM     60972    974040     94.11    253480    333392   2097036       108      0.01         0
08:10:01 PM     60600    974412     94.14    253764    333416   2097036       108      0.01         0
08:20:01 PM     60176    974836     94.19    254052    333424   2097036       108      0.01         0
08:30:02 PM     43920    991092     95.76    269948    333436   2097036       108      0.01         0
Average:        63501    971511     93.86    303774    302919   2097036       108      0.01         0

iostat mainly gives disk I/O information:
# iostat
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.61    0.07    0.18    0.55    0.00   98.59

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sda               2.79       106.18        20.58   17959510    3480268
sdb               2.27       109.68        18.98   18551638    3209852
md1               6.30       210.73        15.40   35644226    2605640
md0               0.16         0.73         0.00     124014        212
dm-0              0.00         0.01         0.00       1544        216
dm-1              2.90        47.01         7.09    7952146    1199912
dm-2              1.32        31.73         6.26    5367714    1058736
dm-3              1.96       130.92         1.82   22144986     308648
dm-4              0.12         0.95         0.17     161154      28456
dm-5              0.01         0.02         0.06       3218       9672
dm-6              0.56         3.37         1.07     570842     180456
dm-7              0.19         0.99         0.53     167690      89960

apachetop can show the apache activity, pages being accessed and performance of the web server.
There is some more information here with an example screenshot. http://freshmeat.net/projects/apachetop/

If both websites are running on the same server, then yes. If they are running on separate servers, then no, but the images server would be unavailable while it was being reset. Not sure if this is a problem!

The problem could be any number of things at the moment, including (but not limited to):
Server hardware setup
Network setup and configuration (the physical stuff, switches/routers etc.)
Linux server network setup
Apache server configuration

Please understand: whilst nothing you have posted here leads me to think there is a memory issue, that doesn't mean there isn't one.

You need to check the logs on the server first of all.

MSJoe

ASKER

If you were troubleshooting this issue with Apache, what logging would you enable? I have to get approval before installing apachetop, so I'm waiting on that right now. We have a strict policy about installation of 3rd party software that has not been tested.

Anyway, I need some clarification about the website crash you mentioned. For our conversation there is one server. It hosts two websites which are website.domain.com and images.domain.com. The website points to images.domain.com for its images, obviously. Now the crash occurs when we send out a large email blast containing large amounts of references to images. This is our daily email so that is why there is a huge spike in traffic to that website. I fear it is a problem with Apache or memory, not the physical network or the server setup itself. At least not the server itself: the only thing I was worried about there was swap, and we have ruled that out and fixed my misunderstanding of it. When the "crash" occurs, the server can still serve HTML and render website.domain.com, although very slowly. images.domain.com, however, appears to be down, though not exactly: website.domain.com renders after some time, but without any images. Maybe I am wrong, but if Apache were to blame, wouldn't both go down? To resolve the issue Apache must be restarted.

My conclusion is that physical memory may be the issue. The server only has 3GB, and a huge amount of extra traffic has arrived during peak hours over the past few months. I have an upgrade to 6GB on order, and if it is a memory problem that will hopefully help. With average usage leaving only 50-80 MB of physical memory available, I feel this could be the cause. If the upgrade doesn't help, then maybe it is Apache. I suppose it could be a network issue, but I can SSH into the server during peak times without any problems. I guess the next step is to watch top to see whether physical memory drops and page file usage increases. If it does, then it likely isn't a memory problem, unless the page file is completely used up or there is a problem with a physical stick of memory.

Is there a way to check network statistics and usage over SSH? Sorry for some of the rambling, I was thinking out loud.
> We have a strict policy about installation of 3rd party software that has not been tested.

A very good policy to have if you ask me!

> It hosts two websites which are website.domain.com and images.domain.com.

Shutting down an Apache instance that hosts two web sites will shut them both down.

> Now the crash occurs when we send out a large email blast containing large amounts of references to images. This is our daily email so that is why there is a huge spike in traffic to that website.

This is the bit that will need further investigation and (like all performance type issues) it may take some time. Throwing memory at the problem may not fix it, although it won't hurt.

Physical memory may be the issue... only have 3GB :-) That is quite a bit...

You do need to spend the time going through the logs in /var/log to see if there are any errors.
You also need to be collecting stats with sar and vmstat (see my previous post above). I'm afraid Linux performance monitoring tools are limited. One thing you may notice about top is that it's usually one of the top CPU hogs on the system itself, but you still need to look at what is happening.

You said that the problem seems to occur when you send out a large email with links to the images on the website. I take it that the email messages aren't necessarily big but the volume is. Have you seen spikes in CPU/memory usage for the mail server?

Use the tools you have available to help you, check the man pages for relevant options, I've given a few below;
   sar -B | -r  | -R | -u | -b | -d | -n DEV
   vmstat
   ps -ev | -ealf
   top

Don't be swayed by someone coming along and saying it's definitely memory or this or that. You need to check and recheck logs etc.

I don't think this question is going to give you all the help you need. Although we may have covered the original question, I hope to your satisfaction, the title you have used and the subject areas it is posted in may not hit the right "expertise". You don't want just one person's opinion; I don't know everything (don't tell my boss!). You could do better by closing this question and opening a new, perhaps related, question that includes the Apache, Linux and some other groups.

A couple points:

Linux uses almost all physical memory all of the time.  Any physical memory not used by programs will be used for buffers and disk cache.  The 'top' command will show physical and swap memory usage in an easily understood format.  You can also have 'top' sort by memory usage by typing 'F' then 'N'.  You can track how much memory Apache is using during your peak times.

Entries under /dev are device handles (usually just called devices).  /dev/cciss/c0d0p6 is the device handle for controller 0, disk 0, partition 6.  You cannot 'cd' to these paths, because they are not directories.  Device handles are used by the 'mount' and 'umount' commands, fstab, and 'fdisk'.

'df' displays mounted partitions.  Since the swap partition(s) are not mounted, they are not displayed. 'swapon' displays swap devices, as you have seen.  
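The directory-versus-device distinction can be checked from the shell with the standard test operators. A minimal sketch, where path_kind is a made-up helper name:

```shell
#!/bin/sh
# Hypothetical helper: say what kind of filesystem object a path is,
# using the standard test(1) operators -d, -b, -c, -f and -e.
path_kind() {
    if   [ -d "$1" ]; then echo "directory"
    elif [ -b "$1" ]; then echo "block device"
    elif [ -c "$1" ]; then echo "character device"
    elif [ -f "$1" ]; then echo "regular file"
    elif [ -e "$1" ]; then echo "other"
    else                   echo "missing"
    fi
}

path_kind /dev         # a directory, so cd works
path_kind /dev/null    # a device node, so cd fails with "Not a directory"
```

On the server in question, path_kind /dev/cciss/c0d0p3 would be expected to report a block device, which is consistent with ls listing the entry while cd refuses it.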

As jools said, you might have a memory problem, but it isn't shown by the information you've provided.  You appear to have decided at the start that you have a memory problem and have been searching (unsuccessfully) for support for that conclusion.  You need to take a step back and look at the problem again, starting with collecting evidence.


Oooh... Hi eager, welcome to the party, sausage rolls and cake are in the corner...

MSJoe,
I found this http://www.redbooks.ibm.com/abstracts/REDP4285.html
Redbooks, excellent.

Toodle pip for now...
MSJoe

ASKER

Thanks guys. I ruled out Apache this morning, and I think I'm getting closer to pinpointing my problem. I'm still on the fence about whether it is related to swap memory or not.

Yesterday the web server failed to load images, and this time I did not restart Apache. I logged in and found physical memory very low, under 1MB free, while swap was still at 144KB. I then restarted Tomcat, which also restarted Java, which had been reported as using a significant amount of the memory.

One of our web guys told me Java has a huge problem with garbage collection, which, given the details of our conversation, made me start thinking about a memory leak.

So today I logged in and looked at the system with top. It was not a peak time, yet free memory was below 35MB, and Java was again using the majority of it. I figured that if there was a memory leak, restarting Tomcat (which restarts Java) before the email blast went out would release the memory. I did, and it brought the available physical memory up to 450 MB.

Now we have a pretty significant amount of traffic, and our system is heavily taxed in memory because of database connections and media streaming, so 3GB of memory for this setup is a bit weak. True, Linux does use pretty much all of the available memory, but I have some concerns about the swap file. When does it start to swap!?! I don't mean to compare Windows to Linux, but Windows typically moves items into virtual memory as physical memory dwindles. I expect that Linux will do the same, but at what level will it start? Can that value be changed? I wonder if the swap file has a problem, but I'm limited in how I can determine this.

My next step is to close this thread and start a Java/Tomcat thread, but I want to be certain of a few details of Linux swap memory usage.
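On the question above of when swapping starts and whether it can be tuned: there is no fixed free-memory threshold, because the kernel normally shrinks its buffers and page cache before it swaps application pages out. 2.6-series kernels do expose a vm.swappiness setting (0 to 100) that biases that choice. The sketch below only reads the value; show_swappiness is a made-up name and it assumes /proc is mounted in the usual place.

```shell
#!/bin/sh
# Print the current vm.swappiness value, or a note if this kernel
# does not expose it (assumption: procfs is mounted at /proc).
show_swappiness() {
    f=/proc/sys/vm/swappiness
    if [ -r "$f" ]; then
        echo "vm.swappiness = $(cat "$f")"
    else
        echo "vm.swappiness not available"
    fi
}

show_swappiness

# Changing the value needs root and is deliberately not done here, e.g.:
#   sysctl -w vm.swappiness=60
```

A server showing tens of MB "free" but hundreds of MB in buffers and cache would not be expected to swap, whatever this value is set to.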
Found this as well... http://www.linux.com/feature/121916

You really shouldn't fixate on what you think the problem is unless you have some evidence to back it up.

From your post http:#22672036 I really can't see that memory is the issue, but you have not posted information from when the problem occurs, so we are guessing.

Have you had the time to check the sar and ps -ev outputs and compare them to when the system is running normally?

Hmmm... ps -ev doesn't do what I expected it to...
Try using pmap -x <pid>

eg/
ps -eaf | grep http
16045:   /usr/sbin/httpd
Address   Kbytes     RSS    Anon  Locked Mode   Mapping
<trimmed>
b7ee9000     100       -       -       - rw-s-  zero (deleted)
b7f02000      32       -       -       - rw---    [ anon ]
bfef4000      88       -       -       - rw---    [ stack ]
-------- ------- ------- ------- -------
total kB   25696       -       -       -
ahhh...keyboard....£$%£$"$!

eg/
ps -eaf | grep http
pmap -x 16045
16045:   /usr/sbin/httpd
Address   Kbytes     RSS    Anon  Locked Mode   Mapping
<trimmed>
b7ee9000     100       -       -       - rw-s-  zero (deleted)
b7f02000      32       -       -       - rw---    [ anon ]
bfef4000      88       -       -       - rw---    [ stack ]
-------- ------- ------- ------- -------
total kB   25696       -       -       -
It is odd that you are not using more swap space.  But swapon says that you have swap enabled.

Look at your log file (/var/log/messages) and try to identify why swap is not used and why apache dies.

Memory usage stats for Java don't necessarily indicate that there is a memory leak.  As Java runs, it allocates VM from the operating system.  It never returns this memory back to the OS.  The memory allocated to Java represents the high water mark, not necessarily current usage. Be sure that you are not confusing virtual memory with physical memory.  
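One way to keep virtual and physical memory apart is to read both figures for the Java process side by side. A minimal sketch, assuming a Linux /proc filesystem; proc_mem is a made-up helper name, while VmSize and VmRSS are the field names used in /proc/<pid>/status.

```shell
#!/bin/sh
# Hypothetical helper: print virtual size (VmSize) and resident set
# size (VmRSS) for one process, read from /proc/<pid>/status.
proc_mem() {
    status="/proc/$1/status"
    if [ -r "$status" ]; then
        awk -v pid="$1" '
            /^VmSize:/ { size = $2 }
            /^VmRSS:/  { rss  = $2 }
            END { printf "pid %s: VmSize=%s kB VmRSS=%s kB\n", pid, size, rss }
        ' "$status"
    else
        echo "pid $1: no /proc status available"
    fi
}

proc_mem $$    # the current shell, as a stand-in for the Java pid
```

VmSize is the address space the process has reserved; VmRSS is what is actually resident in RAM. A large VmSize alongside a modest VmRSS points to reserved rather than used memory.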

There is a small possibility that you are running into the Out of Memory killer.  More info about OOM can be found here:  http://linux.derkeiler.com/Mailing-Lists/RedHat/2007-08/msg00061.html

Again, fixating on one possible cause may be leading you to grasp at straws to justify that cause, such as relying on your web guy's suspicions about Java, rather than look for evidence which would lead you to a different cause.  
Avatar of MSJoe

ASKER

Sorry for the delay in response. So here is what I have been doing. I have been watching the server, documenting, and going through my logs each day.

Eager, you said "Look at your log file (/var/log/messages) and try to identify why swap is not used and why apache dies.". I have been looking in the message logs and I can't find anything at all in messages, messages 1, messages 2, messages 3, or messages 4 regarding a problem with swap memory. Regardless, there has to be a problem, because there's no way it should get down to 30MB of physical RAM and still not be using at least some swap memory.

I can't say for sure that Java is the problem, Eager, but I jumped to that conclusion because when the server crashes Java is using up all of the memory, and no swap memory has been used. This indicates to me that Java may not be the problem but my swap memory may be. Java may just take more and more memory, not knowing the system isn't going to compensate with swap memory as it should. I'm trying to work with our server host to resolve this issue, but it seems I have to fend for myself, so any help you guys can provide to pinpoint this issue through log files would be so helpful; I can't tell you how grateful I'd be.

Anyway, I have been restarting Tomcat every day in off hours and the server does not crash. To me that means that when it runs for a long period and takes up a significant amount of memory, swap usage does not go up, and I'm guessing one or more of the processes related to Tomcat (e.g. Java) crash because they can't get more memory.

I feel pretty confident of this; I just can't prove it because I can't find the logs. I'm looking, but I can't find the information I need about page file usage or Apache crashes. I spoke to my supervisors and I have been advised not to install additional software and to investigate the problem through log files.
Run "dmesg".  You should find a message which looks like

  Adding 4192956k swap on /dev/sda3.  Priority:-1 extents:1 across:4192956k

The same message should show up in /var/log/messages.  

Run "free".  Post results here.

List warning or error messages issued when Tomcat/Apache crashes.   Look for "unable to allocate" or "out of memory" or "kernel error".
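A search along those lines can be wrapped in a small function so the same patterns are run over every candidate log. This is a sketch, not a definitive list: find_memory_errors is a made-up name, the first three phrases come from the suggestion above, and "oom-killer" and "OutOfMemoryError" (the Java exception name) are additions that may or may not appear on this system.

```shell
#!/bin/sh
# Hypothetical helper: print lines in the given log files that mention
# memory exhaustion. Real kernel and application messages vary by
# version, so treat the pattern list as a starting point only.
find_memory_errors() {
    grep -iE 'out of memory|oom-killer|unable to allocate|kernel error|OutOfMemoryError' "$@"
}

# Typical use (needs read access to the logs):
#   find_memory_errors /var/log/messages /var/log/messages.*
```

The paths in the usage comment are assumptions based on the /var/log/messages file mentioned above; Apache and Tomcat keep their own error logs elsewhere, and where exactly depends on how they were installed.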

I still recommend looking for causes other than memory.  You have nothing more than a hunch, with no evidence to support this.
MSJoe

ASKER

I added more memory to the server. It still slows down during peak times because of database connections left open, but the Tomcat/Apache crashing problem is essentially resolved.

However, I did what you asked. I did find "Adding 2032212k swap on /dev/cciss/c0d0p3.  Priority:-1 extents:1" (but no across:number; does that matter?).

Free results
             total       used       free     shared    buffers     cached
Mem:       5974196    5007816     966380          0     124000    4116600
-/+ buffers/cache:     767216    5206980
Swap:      2032212          0    2032212

The free results look normal; we bumped up the amount of memory Java/Tomcat/Apache use so that 1 GB remains for the system itself. None of the data above shows any problem with memory, which is why I am turning to log files to prove it. I started copying log files the other day so I could go through them all, but I can't find any that are relevant to the crash, which is why I asked what log files I should be looking at! There are hundreds of log files, all named differently. I went looking in anything labeled with our website's name, and anything essentially generic such as messages (1-5), but I feel I'm looking in the wrong logs: I've been able to identify problems, but nothing relevant to the website crashing. What I have found is more or less "invalid directory or file" errors, which refer to old directories and files on our website that our team is working to clean up, but that isn't the source of the problem. So, what log files (specific names) should I be looking in?

Regarding the slowdown, I am waiting to get memory for my DB server, and we have limited the number of available DB connections to resolve what appears to be a bottleneck caused by a large increase in volume.

ASKER CERTIFIED SOLUTION
Michael Eager