Out of Memory Errors

kamarja asked:
Hello Experts,

We just launched a new server and we are having a lot of performance-related problems. We have the following environment:

uname -a
SunOS behemoth 5.9 Generic sun4u sparc SUNW,Sun-Fire-880

java -version
java version "1.4.2_03"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.4.2_03-b02)
Java HotSpot(TM) Client VM (build 1.4.2_03-b02, mixed mode)

RAM: 4096M

 ./httpd -version
Server version: Apache/1.3.27 (Unix)

Tomcat 4.1, XSLT, OSCache.

After starting the server, the load goes up very high (currently load average: 1.25), the webserver is very slow, and it eventually dies if the site becomes very busy. I saw the following error in the Tomcat logs:

Unexpected Signal : 10 occurred at PC=0xFECD6270
Function=[Unknown. Nearest: JVM_ArrayCopy+0x5600]
Library=/usr/local/j2sdk1.4.2_03/jre/lib/sparc/server/libjvm.so


Dynamic libraries:
0x10000         /usr/local/j2sdk1.4.2_03/bin/java
0xff370000      /usr/lib/libthread.so.1
0xff3a0000      /usr/lib/libdl.so.1
0xff280000      /usr/lib/libc.so.1
0xff360000      /usr/platform/SUNW,Sun-Fire-880/lib/libc_psr.so.1
0xfec00000      /usr/local/j2sdk1.4.2_03/jre/lib/sparc/server/libjvm.so
0xff240000      /usr/lib/libCrun.so.1
0xff210000      /usr/lib/libsocket.so.1
0xfeb00000      /usr/lib/libnsl.so.1
0xfebd0000      /usr/lib/libm.so.1
0xff1f0000      /usr/lib/libsched.so.1
0xff270000      /usr/lib/libw.so.1
0xfeae0000      /usr/lib/libmp.so.2
0xfeac0000      /usr/lib/librt.so.1
0xfeaa0000      /usr/lib/libaio.so.1
0xfea70000      /usr/lib/libmd5.so.1
0xfea50000      /usr/platform/SUNW,Sun-Fire-880/lib/libmd5_psr.so.1
0xfea10000      /usr/local/j2sdk1.4.2_03/jre/lib/sparc/native_threads/libhpi.so
0xfe9c0000      /usr/local/j2sdk1.4.2_03/jre/lib/sparc/libverify.so
0xfe970000      /usr/local/j2sdk1.4.2_03/jre/lib/sparc/libjava.so
0xfe950000      /usr/local/j2sdk1.4.2_03/jre/lib/sparc/libzip.so
0xfbc20000      /usr/local/j2sdk1.4.2_03/jre/lib/sparc/libnet.so
0xf9be0000      /usr/lib/nss_files.so.1

Heap at VM Abort:
Heap
 par new generation   total 8128K, used 8064K [0x75800000, 0x76000000, 0x76000000)
  eden space 8064K, 100% used [0x75800000, 0x75fe0000, 0x75fe0000)
  from space 64K,   0% used [0x75ff0000, 0x75ff0000, 0x76000000)
  to   space 64K,   0% used [0x75fe0000, 0x75fe0000, 0x75ff0000)
 concurrent mark-sweep generation total 2088960K, used 1929831K [0x76000000, 0xf5800000, 0xf5800000)
 concurrent-mark-sweep perm gen total 16384K, used 9840K [0xf5800000, 0xf6800000, 0xf9800000)

Local Time = Sun Mar 28 03:22:11 2004
Elapsed Time = 118113
#
# HotSpot Virtual Machine Error : 10
# Error ID : 4F530E43505002EF 01
# Please report this error at
# http://java.sun.com/cgi-bin/bugreport.cgi
#
# Java VM: Java HotSpot(TM) Server VM (1.4.2_03-b02 mixed mode)


Any help will be greatly appreciated. Thanks in advance.

Tomcat is started with the following options:

CATALINA_OPTS="-server -Xms2048m -Xmx2048m -XX:+UseConcMarkSweepGC -XX:+UseParNewGC "



- Ian

Try

-XX:+UseParallelGC

for the throughput collector.
It uses several threads to perform minor collections; your eden space is saturated, so this could help.
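For example, a minimal sketch assuming you keep the current 2 GB heap (adjust the sizes to taste):

CATALINA_OPTS="-server -Xms2048m -Xmx2048m -XX:+UseParallelGC"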

Author commented:
Thanks for the quick response. So you suggest that I start Tomcat like this:

"-server -Xms2048m -Xmx2048m -XX:+UseConcMarkSweepGC -XX:+UseParallelGC -XX:+UseAdaptiveSizePolicy"



Author commented:
Just wondering: do -XX:+UseConcMarkSweepGC and -XX:+UseParallelGC play well together?

Thanks.
Nope. Concurrent mark-sweep tries "to reduce the time taken to collect the tenured generation". That is the generation of objects that have passed from short-term to longer-term life, which isn't the kind of object you'll primarily be dealing with for your short-lived web connections.

Try it without -XX:+UseConcMarkSweepGC; it's a different garbage collector.

Use ConcMarkSweep when objects reside longer in the system and a high percentage of them end up tenured.
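To make the distinction concrete, the two setups being discussed would look roughly like this (a sketch only, reusing your current heap sizes):

# throughput collector, as suggested above
CATALINA_OPTS="-server -Xms2048m -Xmx2048m -XX:+UseParallelGC"

# concurrent mark-sweep, better suited to heaps full of long-lived, tenured objects
CATALINA_OPTS="-server -Xms2048m -Xmx2048m -XX:+UseConcMarkSweepGC -XX:+UseParNewGC"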

Author commented:
Thanks. I will try it and let you know. It seems we are seeing long pause times while the GC is working.
BTW, signal 10 is SIGBUS, which flags a data bus error; I think this can occur from young-generation overflow, but I'm not sure.

Author commented:
Hey,

It looks a lot happier right now and the site seems much faster. The load is stable and memory usage is much better: we currently have 3 GB of free memory, while with the previous config we would have about 500 MB free. I am going to watch it until tomorrow and let you know; usually it crashes more in the morning. Thanks so much.

I started Tomcat with:
CATALINA_OPTS="-server -Xms2048m -Xmx2048m -XX:+UseParallelGC -XX:+UseAdaptiveSizePolicy -XX:+PrintGCDetails"
Ian
luck :P

Author commented:
Oops, not out of the woods yet. The webserver is again extremely slow, and memory usage and load are high. I found this in the catalina.out log.

Look how long it takes now. Before, it was:

[Full GC 9648K->9560K(27712K), 0.2556779 secs]
[GC 19736K->12858K(27648K), 0.0046218 secs]
[GC 20396K->11729K(27968K), 0.0184282 secs]
[GC 21713K->15950K(29504K), 0.0217567 secs]
[GC 23246K->14443K(31424K), 0.0294165 secs]
[Full GC 14443K->14219K(31424K), 0.2671838 secs]

Around midnight it's:

[Full GC 1791224K->1791224K(1944192K), 11.8895081 secs]
[Full GC 1791224K->1791224K(1944192K), 11.8294296 secs]
[Full GC 1791224K->1791008K(1944192K), 12.1928545 secs]
[Full GC 1791224K->1791133K(1944192K), 11.9237863 secs]


I think there's probably some kind of memory leak in one of the Java components; maybe try something like JProfile to try to find the problem.
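If it is a leak, it can also help to send the GC output to its own file so you can follow the heap's growth over time; assuming your 1.4.2 build supports -Xloggc, something along these lines should do it (the log path is just an example):

CATALINA_OPTS="-server -Xms2048m -Xmx2048m -XX:+UseParallelGC -XX:+UseAdaptiveSizePolicy -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xloggc:/var/tmp/gc.log"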

Author commented:
Thanks. Will do.
JProbe, not JProfile, sorry.

http://www.quest.com/jprobe/index.asp

They have a free trial download.

Author commented:
Oh, OK, thanks. I also found this error in catalina.out; not sure if it's related though:

WebappClassLoader: Lifecycle error : CL stopped

and

SEVERE: Error in action code
java.net.SocketException: Broken pipe
    at java.net.SocketOutputStream.socketWrite0(Native Method)
    at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:92)
    at java.net.SocketOutputStream.write(SocketOutputStream.java:136)
    at org.apache.jk.common.ChannelSocket.send(ChannelSocket.java:457)
    at org.apache.jk.common.ChannelSocket.invoke(ChannelSocket.java:654)
    at org.apache.jk.server.JkCoyoteHandler.action(JkCoyoteHandler.java:435)
    at org.apache.coyote.Response.action(Response.java:222)
    at org.apache.coyote.Response.finish(Response.java:343)
    at org.apache.jk.server.JkCoyoteHandler.invoke(JkCoyoteHandler.java:314)
    at org.apache.jk.common.HandlerRequest.invoke(HandlerRequest.java:387)
    at org.apache.jk.common.ChannelSocket.invoke(ChannelSocket.java:673)
    at org.apache.jk.common.ChannelSocket.processConnection(ChannelSocket.java:615)
    at org.apache.jk.common.SocketConnection.runIt(ChannelSocket.java:786)
    at org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.java:666)

Author commented:
Hey,

Do you think I could decrease the initial heap size as a workaround while we try to find the memory leak? Do you think it would help?

FROM: "-server -Xms2048m -Xmx2048m -XX:+UseParallelGC -XX:+UseAdaptiveSizePolicy -XX:+PrintGCDetails"

TO: "-server -Xms512m -Xmx512m -XX:+UseParallelGC -XX:+UseAdaptiveSizePolicy -XX:+PrintGCDetails"

Thanks.

Ian
For the broken pipe check:
http://archives.real-time.com/pipermail/tomcat-users/2003-January/091849.html
It could be the interrupted download described there.

I don't know the Tomcat classes, but WebappClassLoader has a stop() method and I don't know why it's being called; a printStackTrace would be needed to pinpoint the reason.

General docs:
http://jakarta.apache.org/tomcat/tomcat-5.0-doc/catalina/docs/api/org/apache/catalina/loader/WebappClassLoader.html
Lifecycle docs:
http://jakarta.apache.org/tomcat/tomcat-5.0-doc/catalina/docs/api/org/apache/catalina/Lifecycle.html

I'm sorry, but I don't know the reason for the problem; my best guess is it's a memory leak that's leaving referenced objects behind to fill up the heap.

I think decreasing the initial heap size will just degrade performance more by using slower hard disk swap space. Perhaps increasing it will help a little if there is some swapping going on at the moment, but it wouldn't solve your problem.
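If you want to confirm whether the box is actually swapping before changing the heap, the usual Solaris checks look something like this (run them while the server is struggling):

vmstat 5     # watch the sr (page scan rate) and free columns
swap -s      # summary of swap space allocated and in use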

Author commented:
Thanks, you did point me in the right direction. I think we may have found it: some large responses from a few of our PHP files that use XML. They are over 5 MB. We'll see what happens when we reduce the amount of data they return.

Thanks for the pointers and your time.

- Ian
I suppose it's a kind of memory leak :P