Out of Memory Errors

Hello Experts,

We just launched a new server and are having a lot of performance-related problems. Our environment is as follows.

uname -a
SunOS behemoth 5.9 Generic sun4u sparc SUNW,Sun-Fire-880

java -version
java version "1.4.2_03"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.4.2_03-b02)
Java HotSpot(TM) Client VM (build 1.4.2_03-b02, mixed mode)

RAM: 4096M

 ./httpd -version
Server version: Apache/1.3.27 (Unix)

Tomcat 4.1, XSLT, OSCache.

After starting the server, the load climbs very high (currently load average: 1.25), the webserver becomes very slow, and it eventually dies when the site gets busy. I saw the following error in the Tomcat logs:

Unexpected Signal : 10 occurred at PC=0xFECD6270
Function=[Unknown. Nearest: JVM_ArrayCopy+0x5600]

Dynamic libraries:
0x10000         /usr/local/j2sdk1.4.2_03/bin/java
0xff370000      /usr/lib/libthread.so.1
0xff3a0000      /usr/lib/libdl.so.1
0xff280000      /usr/lib/libc.so.1
0xff360000      /usr/platform/SUNW,Sun-Fire-880/lib/libc_psr.so.1
0xfec00000      /usr/local/j2sdk1.4.2_03/jre/lib/sparc/server/libjvm.so
0xff240000      /usr/lib/libCrun.so.1
0xff210000      /usr/lib/libsocket.so.1
0xfeb00000      /usr/lib/libnsl.so.1
0xfebd0000      /usr/lib/libm.so.1
0xff1f0000      /usr/lib/libsched.so.1
0xff270000      /usr/lib/libw.so.1
0xfeae0000      /usr/lib/libmp.so.2
0xfeac0000      /usr/lib/librt.so.1
0xfeaa0000      /usr/lib/libaio.so.1
0xfea70000      /usr/lib/libmd5.so.1
0xfea50000      /usr/platform/SUNW,Sun-Fire-880/lib/libmd5_psr.so.1
0xfea10000      /usr/local/j2sdk1.4.2_03/jre/lib/sparc/native_threads/libhpi.so
0xfe9c0000      /usr/local/j2sdk1.4.2_03/jre/lib/sparc/libverify.so
0xfe970000      /usr/local/j2sdk1.4.2_03/jre/lib/sparc/libjava.so
0xfe950000      /usr/local/j2sdk1.4.2_03/jre/lib/sparc/libzip.so
0xfbc20000      /usr/local/j2sdk1.4.2_03/jre/lib/sparc/libnet.so
0xf9be0000      /usr/lib/nss_files.so.1

Heap at VM Abort:
 par new generation   total 8128K, used 8064K [0x75800000, 0x76000000, 0x76000000)
  eden space 8064K, 100% used [0x75800000, 0x75fe0000, 0x75fe0000)
  from space 64K,   0% used [0x75ff0000, 0x75ff0000, 0x76000000)
  to   space 64K,   0% used [0x75fe0000, 0x75fe0000, 0x75ff0000)
 concurrent mark-sweep generation total 2088960K, used 1929831K [0x76000000, 0xf5800000, 0xf580
 concurrent-mark-sweep perm gen total 16384K, used 9840K [0xf5800000, 0xf6800000, 0xf9800000)

Local Time = Sun Mar 28 03:22:11 2004
Elapsed Time = 118113
# HotSpot Virtual Machine Error : 10
# Error ID : 4F530E43505002EF 01
# Please report this error at
# http://java.sun.com/cgi-bin/bugreport.cgi
# Java VM: Java HotSpot(TM) Server VM (1.4.2_03-b02 mixed mode)

Any help will be greatly appreciated. Thanks in advance.

Tomcat is started with the following options :

CATALINA_OPTS="-server -Xms2048m -Xmx2048m -XX:+UseConcMarkSweepGC -XX:+UseParNewGC "

- Ian

twobitadder Commented:
I say this because:

Function=[Unknown. Nearest: JVM_ArrayCopy+0x5600]

implies that there could be a problem copying to the 'to' region from eden.

and consider using:
-XX:+UseAdaptiveSizePolicy, which lets the VM resize the young generation after each minor collection.

You could also adjust the ratio of young to tenured space with:
-XX:NewRatio=n (the tenured generation will be n times the size of the young generation)


These flags are for the throughput collector, which uses several threads to perform minor collections. Your eden space is saturated, so this could help.
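Putting the suggestions together, a startup line might look like the following. This is a sketch only: the flag names are from the 1.4.2 HotSpot documentation, and the heap sizes are carried over from the original config rather than being recommendations.

```shell
# Throughput (parallel) collector with adaptive young-generation sizing.
# Note: no -XX:+UseConcMarkSweepGC here; the two collectors are
# alternatives, not complements.
CATALINA_OPTS="-server -Xms2048m -Xmx2048m \
  -XX:+UseParallelGC \
  -XX:+UseAdaptiveSizePolicy"
```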
kamarjaAuthor Commented:
Thanks for the quick response. So you suggest that I start tomcat like :

"-server -Xms2048m -Xmx2048m -XX:+UseConcMarkSweepGC -XX:+UseParallelGC -XX:+UseAdaptiveSizePolicy"


kamarjaAuthor Commented:
Just wondering, do -XX:+UseConcMarkSweepGC and -XX:+UseParallelGC play well together?

Nope. Concurrent mark-sweep tries 'to reduce the time taken to collect the tenured generation', i.e. the generation of objects that have been promoted from short-lived to longer-lived, which isn't the kind of object you'll primarily be dealing with given your short-lived web connections.

Try it without -XX:+UseConcMarkSweepGC; it's a different garbage collector.

Use -XX:+UseConcMarkSweepGC when objects reside longer in the system and a high percentage of them reach the tenured generation.
kamarjaAuthor Commented:
Thanks. I will try it and let you know. It seems we are hitting long pause times when the GC runs.
By the way, signal 10 is SIGBUS, which flags a data bus error. I think this can occur from young-generation overflow, though I'm not sure.
kamarjaAuthor Commented:

It looks a lot happier right now and the site seems much faster. The load is stable and memory usage is much better. We currently have 3 GB of free memory, whereas with the previous config we had about 500 MB free. It usually crashes more in the morning, so I am going to watch it until tomorrow and will let you know. Thanks so much.

I started Tomcat with:
CATALINA_OPTS="-server -Xms2048m -Xmx2048m -XX:+UseParallelGC -XX:+UseAdaptiveSizePolicy -XX:+PrintGCDetails"
luck :P
kamarjaAuthor Commented:
Oops, not out of the woods yet. The webserver is again extremely slow, and memory usage and load are high. I found this in the catalina.out log.

Look how long the collections take now.
Before, it was:

[Full GC 9648K->9560K(27712K), 0.2556779 secs]
[GC 19736K->12858K(27648K), 0.0046218 secs]
[GC 20396K->11729K(27968K), 0.0184282 secs]
[GC 21713K->15950K(29504K), 0.0217567 secs]
[GC 23246K->14443K(31424K), 0.0294165 secs]
[Full GC 14443K->14219K(31424K), 0.2671838 secs]

At approximately midnight it's:

[Full GC 1791224K->1791224K(1944192K), 11.8895081 secs]
[Full GC 1791224K->1791224K(1944192K), 11.8294296 secs]
[Full GC 1791224K->1791008K(1944192K), 12.1928545 secs]
[Full GC 1791224K->1791133K(1944192K), 11.9237863 secs]
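Those Full GC lines can be flagged mechanically: a collection that reclaims almost nothing means live (or leaked) data is filling the old generation. A minimal sketch, assuming the bracketed -XX:+PrintGCDetails line format shown above; the class name and the 1% threshold are mine, not a standard tool:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GcLogCheck {
    // Matches lines like: [Full GC 1791224K->1791224K(1944192K), 11.8895081 secs]
    private static final Pattern FULL_GC =
        Pattern.compile("\\[Full GC (\\d+)K->(\\d+)K\\((\\d+)K\\), ([0-9.]+) secs\\]");

    // Returns true when a Full GC reclaimed less than 1% of the occupied heap,
    // the signature of strongly-referenced data filling the tenured generation.
    public static boolean isUnproductive(String line) {
        Matcher m = FULL_GC.matcher(line);
        if (!m.matches()) return false;
        long before = Long.parseLong(m.group(1));
        long after = Long.parseLong(m.group(2));
        return before > 0 && (before - after) < before / 100;
    }
}
```

Run over the midnight log above, every line trips the check, while the early-morning lines do not.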

I think there's probably a memory leak in one of the Java components; maybe try something like JProfile to find the problem.
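For context, the classic pattern a profiler would surface looks like the hypothetical sketch below (illustrative only, not code from the actual webapp): a static collection that is only ever added to, so every entry stays strongly reachable, gets promoted to the tenured generation, and can never be reclaimed by a Full GC.

```java
import java.util.HashMap;
import java.util.Map;

public class LeakyCache {
    // static: lives as long as the class loader, i.e. as long as the webapp.
    // Raw types to match the Java 1.4 era of this thread.
    private static final Map CACHE = new HashMap();

    public static void store(String key, byte[] payload) {
        CACHE.put(key, payload); // nothing ever removes or expires entries
    }

    public static int size() {
        return CACHE.size();
    }
}
```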
kamarjaAuthor Commented:
Thanks. Will do.
JProbe, not JProfile, sorry.


They have a free trial download.
kamarjaAuthor Commented:
Oh, ok thanks. I also found this error in catalina.out; not sure if it's related, though:

WebappClassLoader: Lifecycle error : CL stopped


SEVERE: Error in action code
java.net.SocketException: Broken pipe
        at java.net.SocketOutputStream.socketWrite0(Native Method)
        at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:92)
        at java.net.SocketOutputStream.write(SocketOutputStream.java:136)
        at org.apache.jk.common.ChannelSocket.send(ChannelSocket.java:457)
        at org.apache.jk.common.ChannelSocket.invoke(ChannelSocket.java:654)
        at org.apache.jk.server.JkCoyoteHandler.action(JkCoyoteHandler.java:435)
        at org.apache.coyote.Response.action(Response.java:222)
        at org.apache.coyote.Response.finish(Response.java:343)
        at org.apache.jk.server.JkCoyoteHandler.invoke(JkCoyoteHandler.java:314)
        at org.apache.jk.common.HandlerRequest.invoke(HandlerRequest.java:387)
        at org.apache.jk.common.ChannelSocket.invoke(ChannelSocket.java:673)
        at org.apache.jk.common.ChannelSocket.processConnection(ChannelSocket.java:615)
        at org.apache.jk.common.SocketConnection.runIt(ChannelSocket.java:786)
        at org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.java:666)
kamarjaAuthor Commented:

Do you think I could decrease the initial heap size as a workaround while we try to find the memory leak? Do you think it would help?
From: "-server -Xms2048m -Xmx2048m -XX:+UseParallelGC -XX:+UseAdaptiveSizePolicy -XX:+PrintGCDetails"
To:   "-server -Xms512m -Xmx512m -XX:+UseParallelGC -XX:+UseAdaptiveSizePolicy -XX:+PrintGCDetails"


For the broken pipe check:
It could be the interrupted-download case: the client closed the connection before the response finished, so the socket write failed mid-response.
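A broken pipe like that is usually benign from the server's point of view; the defensive pattern is to catch the IOException around the write rather than letting it propagate as a SEVERE error. A hedged sketch (the class and method names are illustrative, not Tomcat's API):

```java
import java.io.IOException;
import java.io.OutputStream;

public class ResponseWriter {
    // Attempts to send a complete response body. Returns false instead of
    // throwing when the client has gone away ("Broken pipe" surfaces here
    // as a SocketException, a subclass of IOException).
    public static boolean writeAll(OutputStream out, byte[] body) {
        try {
            out.write(body);
            out.flush();
            return true;          // response fully delivered
        } catch (IOException e) {
            return false;         // client disconnected; log at debug and move on
        }
    }
}
```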

I don't know the Tomcat classes, but WebappClassLoader has a stop() method and I don't know why it's being called; a printStackTrace would be needed to pinpoint the reason.


I'm sorry, but I don't know the root cause of the problem; my best guess is that it's a memory leak leaving referenced objects to fill up the heap.

I think decreasing the initial heap size will just degrade performance more by pushing the application into slower hard-disk swap space. Perhaps increasing it will help a little if some swapping is going on at the moment, but that wouldn't solve your problem.

kamarjaAuthor Commented:
Thanks, you did point me in the right direction. I think we may have found the culprit: responses from a few of our PHP files that use XML are over 5 MB. We'll see what happens when we reduce the amount of data they return.

Thanks for the pointers and your time.

- Ian
I suppose it's a kind of memory leak :P
Question has a verified solution.
