Solved

RHEL, tomcat is getting into hung situation.

Posted on 2006-07-17
Last Modified: 2008-05-03
Hi,
I am running Tomcat 5.0.18 (using JDK 1.4.2_07) on RHEL.
-bash-3.00$ uname -a
Linux localhost.com 2.6.9-22.ELsmp #1 SMP Mon Sep 19 18:32:14 EDT 2005 i686 i686 i386 GNU/Linux

After certain load runs, Tomcat gets into a hung state. netstat shows that new requests never reach the ESTABLISHED state.
If I run wget localhost:port (from the same host where Tomcat is running), netstat shows the connection in SYN_RECV state for quite some time. After 2-3 minutes, wget gives up, saying there was no answer from the server.

In the netstat -s output, I noticed overflow of the socket queue:
  7 packets pruned from receive queue because of socket buffer overrun
421277 times the listen queue of a socket overflowed
    421277 SYNs to LISTEN sockets ignored

Should I tune something in Linux?
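A minimal sketch for pulling the two listen-queue counters out of `netstat -s` output, so they can be compared from one check to the next. The function name `listen_queue_counters` is hypothetical, and the patterns assume the counter wording shown in the output above.

```shell
# Print the listen-queue counters from `netstat -s` text read on stdin.
# Output format: "<overflowed> <syns_ignored>"; a missing counter prints as 0.
# The function name is illustrative; the patterns assume the wording quoted above.
listen_queue_counters() {
    awk '
        /times the listen queue of a socket overflowed/ { overflowed = $1 }
        /SYNs to LISTEN sockets ignored/                { ignored = $1 }
        END { printf "%d %d\n", overflowed, ignored }
    '
}

# Typical use on a live host (not run here):
#   netstat -s | listen_queue_counters
```

Reading from stdin keeps the helper usable both on a live host and on saved `netstat -s` output.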

Question by:bhaskarna
19 Comments
 
LVL 22

Expert Comment

by:pjedmond
ID: 17120683
>After doing certain load runs
>From netstat -s, I noticed overflow of the socket queue

The whole point of load runs is to find where performance limitations and bottlenecks are. You want these to be at a point in the system that 'fails gracefully'. You need to decide what is acceptable. In particular, they may also highlight vulnerability to flood or DOS attacks.

Tuning:

You can increase the transmission queue for your ethernet interface by:

/sbin/ifconfig eth0 txqueuelen 4000

You can also improve kernel network performance by tweaking sysctl settings:

http://datatag.web.cern.ch/datatag/howto/tcp.html

Unfortunately, neither of these is really the solution in this case, as all they do is let the system cope with peak loads for a slightly longer period of time (while the 'buffers/queues' fill up).

You need to have a look at your overall application setup, and perhaps have your application monitor load in some way, and reject new connections or fail gracefully once a load threshold is exceeded.
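For reference, a sysctl tweak of the kind mentioned above might look like the sketch below. It only prints the commands so they can be reviewed before being run as root. The helper name `print_backlog_tuning` and the value 1024 are illustrative assumptions, not recommendations; `net.core.somaxconn` and `net.ipv4.tcp_max_syn_backlog` are the standard Linux settings that bound the listen queue and the SYN queue.

```shell
# Print (do not run) the sysctl commands that would raise the listen-queue limits.
# $1 = desired backlog; the helper name and the value used below are illustrative only.
print_backlog_tuning() {
    backlog=$1
    echo "sysctl -w net.core.somaxconn=$backlog"
    echo "sysctl -w net.ipv4.tcp_max_syn_backlog=$backlog"
}

print_backlog_tuning 1024
```

As the comment above says, larger queues only buy time under peak load; they do not remove the bottleneck.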

(   (()
(`-' _\
 ''  ''
 
LVL 22

Expert Comment

by:pjedmond
ID: 17120700
...or buy a faster processor and more memory, until the Ethernet input connection becomes the bottleneck. (You always want the input to be the bottleneck when you are designing a process. That way the 'system' has a little leeway to cope with anything 'unusual', which helps guarantee that the 'system', or in this case your application, isn't overwhelmed by anything unexpected.)

 

Author Comment

by:bhaskarna
ID: 17120817
Even after stopping all the load clients, why does the port remain in a frozen state? Also, I don't see many active connections.
 
LVL 22

Expert Comment

by:pjedmond
ID: 17120870
>Even after stopping all the load clients, why does the port remain in a frozen state? Also, I don't see many active connections.

Correct. You've just managed a successful DoS attack on your own system. Look at it from this point of view:

As root, you have permission to turn off an Ethernet connection. Once you've done so, it can't be used until it is reset.

Likewise, your Tomcat has permission to create sockets, so it can exceed the number of sockets available and fill up the queue; once it does, it won't work until you reset it. Ideally, the system wouldn't allow you to create the sockets in the first place, but as socket creation is such a key part of many processes in a Linux system, the code is designed to be as fast and efficient as possible. As a result, some checks and limitations that might otherwise be there are not. Instead, the user is expected to make the necessary checks.

It may even be that an error is being produced somewhere and your system is ignoring it, then carrying on to do something it shouldn't, thus freezing the setup. It is probably worth checking your error logs.

 

Author Comment

by:bhaskarna
ID: 17121069
Please excuse me if I am asking very basic questions.

After hitting the hung state, I deliberately left the Tomcat process running for 10 hours (without killing it). Even after 10 hours, I was seeing the same problem: new requests were not reaching the ESTABLISHED state. They stayed in SYN_RECV for 2 minutes and then failed with no answer.

What does this mean? If Tomcat were holding a lot of socket connections, I should see a huge list of ESTABLISHED connections in netstat, but I was not seeing many established connections. Does that mean the connections were released? If yes, why am I not getting new connections? Could there be a leak? If yes, how do I find out?

Will RHEL block a particular port after it has been bombarded? Do we need to tune message_burst, message_cost, etc.?

 
LVL 22

Expert Comment

by:pjedmond
ID: 17121533
Imagine a stick. If you bend it a little, it springs back (this is equivalent to using the socket queue). If you bend it a lot, the stick snaps and you need a new one (this is equivalent to filling the socket queue and then trying to keep going, which is what your load testing is doing!).

>After hitting the hung state, I deliberately left the Tomcat process running for 10 hours (without killing it). Even after 10 hours, I was seeing the same problem: new requests were not reaching the ESTABLISHED state. They stayed in SYN_RECV for 2 minutes and then failed with no answer.
You've effectively 'broken' the system, so it won't recover without a restart.

>If Tomcat were holding a lot of socket connections, I should see a huge list of ESTABLISHED connections in netstat, but I was not seeing many established connections. Does that mean the connections were released?
All connections in the queue were probably released, but the pointer for adding new connections is now pointing in the wrong place (possibly many bytes beyond where the queue should be), and as a result new connections cannot be placed in the queue.

>Will RHEL block a particular port after it has been bombarded?
No, not unless you have a firewall rule of some sort that does this.

>Do we need to tune message_burst, message_cost, etc.?
Without studying your configuration, there is no way to say. The key point is that your application is exceeding the capabilities of the setup that you have. Ideally, RHEL should prevent you from doing this, but in this case it obviously does not, probably due to optimisations that improve performance. You need to change your Tomcat code to prevent it from exceeding these limitations. For example, if there are more than, say, 100 connections, your Tomcat servlets need to reject the connection and produce a page that says "Please retry again later". A better alternative would be to configure the necessary restriction at the Tomcat level:

http://tomcat.apache.org/tomcat-4.0-doc/config/http11.html

In particular, in the server.xml file, check and adjust the 'connector' (HttpConnector) configuration to 'tune' your setup. The exact 'tuning' required will depend on your application.

 
LVL 22

Expert Comment

by:pjedmond
ID: 17121563
Sorry - noticed that you were running Tomcat 5. Worth having a look at logging, and see if that can help locate at what point the setup fails. Start here:

http://tomcat.apache.org/tomcat-5.0-doc/config/

Note that Tomcat 5.0 and 5.5 have different logging configurations.

 

Author Comment

by:bhaskarna
ID: 17128696
Until now I suspected that the socket queue overflow was caused by my load runs. But today, in the netstat -s output on a box that LoadRunner does not hit, I noticed the following entry:
"    391808 times the listen queue of a socket overflowed"

To confirm that LoadRunner didn't hit this box, I checked the Tomcat access logs for the last 15 days: I see at most 200-300 requests per day. Out of 300 requests, 200+ are heartbeat checks and around 60-80 are real requests.

What are the various reasons for seeing 'listen queue of a socket overflowed'?
Is there an easy way to debug this and find out who is causing the overflow?

 
LVL 22

Expert Comment

by:pjedmond
ID: 17128812
Interesting, as it implies that tomcat may be the cause:

http://www.linuxquestions.org/questions/showthread.php?t=457517

>What are the various reasons for seeing 'listen queue of a socket overflowed'?
Only one reason really: too many connection attempts in too short a period of time.

>Is there an easy way to debug this and find out who is causing the overflow?
Not really, unless someone is doing a 'standard' attack on your PC (is it isolated from the rest of the network?). Your best bet is probably to try "tcpdump > filename.txt" and check what's going on after the freeze. These are raw packets, so this approach is not for the faint-hearted and will be time-consuming to check and analyse!
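One way to keep such a capture manageable is to record only new connection attempts to the Tomcat port instead of all traffic. A minimal sketch: the helper name `syn_filter`, port 8080, and interface eth0 are assumptions for illustration; the filter expression itself uses standard tcpdump/pcap filter syntax.

```shell
# Build a tcpdump filter that matches only connection attempts (SYN packets)
# sent to one TCP port. The helper name is illustrative.
syn_filter() {
    port=$1
    echo "tcp dst port $port and tcp[tcpflags] & tcp-syn != 0"
}

# Typical use as root (interface eth0 and port 8080 are assumptions;
# use -i lo when testing with wget from the same host):
#   tcpdump -n -i eth0 -w tomcat-syn.pcap "$(syn_filter 8080)"
```

Writing to a file with -w and reading it back later with `tcpdump -r` avoids wading through raw output while the problem is happening.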

 

Author Comment

by:bhaskarna
ID: 17129147
I took a 5-second tcpdump and noticed:

listening on eth0, link-type EN10MB (Ethernet), capture size 96 bytes
589 packets captured
595 packets received by filter
6 packets dropped by kernel

Is that too much, or normal? Should I check specific entries in the tcpdump output to better understand the symptom?
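To put those numbers in proportion, the sketch below computes the share of packets dropped from tcpdump's summary lines. Per the tcpdump manual, "dropped by kernel" counts packets the capture mechanism could not buffer, not packets lost on the network. The function name `drop_percent` is hypothetical.

```shell
# Percentage of packets dropped by the capture mechanism, computed from
# tcpdump's summary lines read on stdin. The function name is illustrative.
drop_percent() {
    awk '
        /packets received by filter/ { received = $1 }
        /packets dropped by kernel/  { dropped = $1 }
        END {
            if (received > 0) printf "%.2f\n", 100 * dropped / received
            else print "n/a"
        }
    '
}

# tcpdump prints its summary on stderr, so on a live capture (not run here):
#   tcpdump -n -i eth0 -w capture.pcap 2>&1 | drop_percent
```

For the capture above (6 dropped out of 595 received) this works out to about 1%.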
 
LVL 22

Accepted Solution

by:
pjedmond earned 2000 total points
ID: 17129615
That could be normal. I notice the capture reports EN10MB. tcpdump uses that link-type name for any Ethernet interface, so it is worth checking the actual link speed; if the card really is a 10 Mb one, it is worth upgrading for two reasons:

1.    It can only cope with so much traffic, and due to the Ethernet protocol, two packets sent at the same time will 'collide', giving an invalid packet, which might cause a problem.
2.    Older drivers potentially have more errors in them and may not make full use of functionality in modern kernels to deal with errors or other problems with packets or the network.

The key here is that you really need to get a tcpdump during the failure process in order to see what is happening. Log files will be huge, but I suspect that once the problem occurs the packet behaviour will change.

      
Comment from pjedmond
Date: 07/17/2006 11:00AM BST
      Your Comment       

>After doing certain load runs
>From netstat -s, I noticed overflow of the socket queue

The whole point of load runs is to find where performance limitations and bottlenecks are. You want these to be at a point in the system that 'fails gracefully'. You need to decide what is acceptable. In particular, they may also highlight vulnerability to flood or DOS attacks.

Tuning:

You can increase the transmission queue for your ethernet interface by:

/sbin/ifconfig eth0 txqueuelen 4000

You can also increase improve kernel performance by tweaking sysctrl

http://datatag.web.cern.ch/datatag/howto/tcp.html

Unfortunately, neither of these is really the solution in this case, as what this does is increase the ability of the system to cope with peak loads for a slightly longer period of time. (Whilst the 'buffers/queues' fill up).

You need to have a look at your overall application set up, and perhaps have your application monitor load in some way, and reject new connection or fail gracefully once a load threshold is exceeded.

(   (()
(`-' _\
 ''  ''

Comment from pjedmond
Date: 07/17/2006 11:05AM BST
      Your Comment       

..or buy a faster processor and more memory, until the ethernet input connection becomes the bottleneck. (You always want the input to be the bottleneck with a process when you are designing it - that way the 'system' has a little bit of leaway to cope with anything 'unusual', which helps guarantee that the 'system' or in this case your application, doesn't deal with anything unexpected.)

(   (()
(`-' _\
 ''  ''


Comment from bhaskarna
Date: 07/17/2006 11:38AM BST
      Author Comment       

Even after stopping all the load clients, why will port remain in frozen state. Also i don't see too many active connections?

Comment from pjedmond
Date: 07/17/2006 11:49AM BST
      Your Comment       

>Even after stopping all the load clients, why will port remain in frozen state. Also i don't see too many active connections?

Correct - You've just managed a successful DOS attack on your system. Look at it from this point of view:

As root, you have permissions to turn off an ethernet connection. Once you've done it, it can't be used until it is reset.

As your tomcat has permissions to create sockets, and exceed the number of sockets available, and fill up the queue, then it does, and it won't work, until you reset it. Ideally, the system wouldn't allow you to create the sockets in the first place, but as the socket creation functionality is such a key part of many processes in a Linux system, the code is designed to be as fast and efficient as possible. As a result, some checks and limitations that otherwise might be there are not. Instead, the user is expected to make the checks necesary.

It may even be that there is an error somewhere being produced, and your system is ignoring it, and then carrying on to do something it shouldn't, thus freezing the setup. Probably worth checking your error logs.

(   (()
(`-' _\
 ''  ''


Comment from bhaskarna
Date: 07/17/2006 12:27PM BST
      Author Comment       

Please execuseme if i am asking very basic questions.

Purposefully, after hitting hung situation, i left the tomcat process for 10hrs(without killing). Even after 10 hrs, i was seeing the same problem. (New requests were not getting into ESTABLISHED state). It use to stay at SYN_RECV for 2 minutes and comeout with no answer.

What does this mean? If tomcat is holding lot of socket connections then i should be able to notice the huge list of ESTABLISHED connections in netstat. But i was not seeing lot of established connections. Does it mean connections were released? if yes, then why i am not getting new connection? will there be any leaks? if yes, how to find out?

Will RHEL blocks a particular port after bombarding it? Do we need to finetune the message_burst, message_cost etc?


Comment from pjedmond
Date: 07/17/2006 01:45PM BST
      Your Comment       

Imagine a stick - If you bend it a little bit, then it will spring back. (This is equivalent to using the socket queue). - If you bend it a lot, then the stick snaps, and you need to get a new one. (This is equivalent to filling the socket queue, and then trying to keep on going...which is what your load testing is doing!)

Purposefully, after hitting hung situation, i left the tomcat process for 10hrs(without killing). Even after 10 hrs, i was seeing the same problem. (New requests were not getting into ESTABLISHED state). It use to stay at SYN_RECV for 2 minutes and comeout with no answer. - You've effectively 'broken' the system, so it won't recover without a restart.

If tomcat is holding lot of socket connections then i should be able to notice the huge list of ESTABLISHED connections in netstat. But i was not seeing lot of established connections. Does it mean connections were released? - All connections in Q were probably released, but the pointer for adding new connections is now pointing in the wrong place (possibly loads of bytes beyond where the Q should be), and as a result new connections cannot be placed in this Q.

Will RHEL blocks a particular port after bombarding it? - No - not unless you have a firewall rule of some sort to do this.

Do we need to finetune the message_burst, message_cost etc? - Without studying your configuration, there is no way to say. The key point is that your application is exceeding the capabilities of the setup that you have. Ideally, RHEL should prevent you from doing this, but in this case it obviously does not - probably due to optimisations that improve performance. You need to change your tomcat code to prevent it exceeding these limitations. For example - if there are more than say 100 connections, then you tomcat servlets need to reject the connection and produce a page that says "Please retry again later". A better alternative would be to configure the necessary restriction at the Tomcat Level:

http://tomcat.apache.org/tomcat-4.0-doc/config/http11.html

In particular in the services.xml file, check and adjust the 'connector' (HttpConnector) configuration to 'tune' your setup. The exact 'tuning' required will depend on your application.

(   (()
(`-' _\
 ''  ''












Comment from pjedmond
Date: 07/17/2006 01:49PM BST
      Your Comment       

Sorry - noticed that you were running Tomcat 5. Worth having a look at logging, and see if that can help locate at what point the setup fails. Start here:

http://tomcat.apache.org/tomcat-5.0-doc/config/

Note that Tomcat 5, and 5.5 have a different logging configuration.

(   (()
(`-' _\
 ''  ''

Comment from bhaskarna
Date: 07/18/2006 10:17AM BST
      Author Comment       

Till now i was suspecting that the socket q overflow problem is caused due to my load runs. But today in nestat -s output, I noticed the following entry in the box where load runner will not hit:
"    391808 times the listen queue of a socket overflowed"

To confirm that loadrunner didn't hit this box, in the tomcat access logs (for last 15 days) i see max of 200-300 requests per day. Out of 300 requests, 200+ will be heartbeat checks and around 60-80 requests are real requests.

What are the various reasons to see 'listen queue of a socket overflowed'?
Is there a easy way to debug and find out who is causing this overflow?


Comment from pjedmond
Date: 07/18/2006 10:48AM BST
      Your Comment       

Interesting, as it implies that tomcat may be the cause:

http://www.linuxquestions.org/questions/showthread.php?t=457517

>What are the various reasons to see 'listen queue of a socket overflowed'?
Only 1 reason really - too many connection attempts in too short a period of time.

>Is there a easy way to debug and find out who is causing this overflow?
Not really, unless someone is doing a 'standard' attack on your PC (Is is isolated from the rest of the network?). Best bet is probably to try using "tcpdump > filename.txt" and check what's going on after the freeze. These are raw packets and this approach is not for the faint-hearted, and will be timeconsuming to check/analyse!

(   (()
(`-' _\
 ''  ''
0
 

Author Comment

by:bhaskarna
ID: 17137281
I noticed that the overflow count is growing steadily (even though Tomcat didn't get into the hung state). Yesterday when I checked, the count was:
    22288 times the listen queue of a socket overflowed
Today it has grown to:
    22330 times the listen queue of a socket overflowed

The sar logs don't show any spikes.
Are there any other logs that would confirm too many connection attempts in too short a period of time? Or should I explicitly turn on some specific logging to get more data on what type of requests caused this small growth?
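A minimal sketch of how the growth could be tracked from saved `netstat -s` snapshots, so that each increase can be lined up against the access log timestamps. The function names and snapshot file names are hypothetical.

```shell
# Extract the "listen queue ... overflowed" counter from `netstat -s` text on stdin.
# Prints 0 if the counter line is absent. Function names are illustrative.
overflow_count() {
    awk '
        /times the listen queue of a socket overflowed/ { print $1; found = 1 }
        END { if (!found) print 0 }
    '
}

# Growth of the counter between two saved snapshots ($1 = older file, $2 = newer file).
overflow_growth() {
    old=$(overflow_count < "$1")
    new=$(overflow_count < "$2")
    echo $((new - old))
}

# Typical use (not run here):
#   netstat -s > snap-monday.txt      # one day
#   netstat -s > snap-tuesday.txt     # the next day
#   overflow_growth snap-monday.txt snap-tuesday.txt
```

With the two counts quoted above (22288 and 22330), the growth is 42 overflows in a day. Taking snapshots more often, for example from cron, would narrow down when they happen.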

 
LVL 22

Expert Comment

by:pjedmond
ID: 17137529
Well found......

Can you try leaving the server disconnected from the network and see if these overflows continue to grow? If so, then we know that the problem is part of the configuration rather than anything external. It may be time to start looking at the Tomcat logs.

 

Author Comment

by:bhaskarna
ID: 17144986
I started the thread below for the same issue:
http://www.linuxquestions.org/questions/showthread.php?t=457517

The proc status posted in that thread was taken when Tomcat was in the frozen state.

Does the proc status of jsvc (Tomcat) give you any hint that Tomcat is the cause of this problem?
 

Author Comment

by:bhaskarna
ID: 17277637
I bombarded the Tomcat server with 100 threads again and noticed that the overflow count was growing steadily.
This time, however, I did not hit the 'no answer from the server' issue, and I don't yet have a clear explanation of why the port didn't hang.
In between, we did some tuning in Tomcat (maxThreads, acceptCount) and of JVM parameters such as -Xmx and -Xms.
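One point that may matter when tuning acceptCount: Tomcat hands this value to the kernel as the listen backlog, and Linux caps any backlog at net.core.somaxconn. The sketch below shows that cap. The function name and the sample values 500 and 128 are illustrative assumptions, not the values used on this server.

```shell
# Effective listen backlog: the smaller of Tomcat's acceptCount and the
# kernel's net.core.somaxconn. Function name and sample values are illustrative.
effective_backlog() {
    accept_count=$1
    somaxconn=$2
    if [ "$accept_count" -lt "$somaxconn" ]; then
        echo "$accept_count"
    else
        echo "$somaxconn"
    fi
}

# On a live host the second argument could come from:
#   cat /proc/sys/net/core/somaxconn
effective_backlog 500 128
```

If acceptCount is raised above somaxconn, the kernel limit has to be raised as well for the change to take effect.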


 
LVL 88

Expert Comment

by:rindi
ID: 17390219
Any reason not to close this Q? It seems there is nothing new...
 
LVL 88

Expert Comment

by:rindi
ID: 17477304
bhaskarna, still there?
 
LVL 88

Expert Comment

by:rindi
ID: 17477568
Thanks, but please don't accept a post which only reminds you to close a Question and has nothing to do with the Q at all. Select one of the other posts or follow the instructions in my link in the administrative comment above to get the Q closed. I will reopen this Q so that you can properly close it.

Thanks, rindi
PE Storage
 
LVL 88

Expert Comment

by:rindi
ID: 17526962
Please close the Q according to the EE rules, thanks.